Resilience: helvia-rag-pipelines

Error handling and retry patterns for this service. For platform-wide patterns, see docs/architecture/resilience.md

HTTP Retry

  • Library: httpx + backoff decorator (Python)
  • Attempts: Varies by call type (see table)
  • Backoff: Constant or exponential depending on call type
  • On failure: Exception raised after retries exhausted
Call type                | Max attempts                  | Backoff       | Trigger
LLM API calls            | 5                             | Constant 1s   | httpx.HTTPError
Semantic search indexing | 3 (Exception) + 5 (HTTPError) | Constant 0.5s | Exception, httpx.HTTPError
Translation              | 5                             | Constant 1s   | Exception (in translation_service.py)
Pipeline operations      | 3                             | Exponential   | Exception
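
A minimal sketch of how these policies could be expressed with the backoff decorator over a shared httpx client. Function names, the endpoint URL, and the client itself are illustrative placeholders, not the service's actual code:

```python
import backoff
import httpx

client = httpx.AsyncClient()  # placeholder; see Timeouts below for the configured limits


# LLM API calls: 5 attempts, constant 1s backoff, retried on httpx.HTTPError
@backoff.on_exception(backoff.constant, httpx.HTTPError, max_tries=5, interval=1)
async def call_llm(payload: dict) -> dict:
    response = await client.post("https://llm.example.invalid/v1/chat", json=payload)  # hypothetical URL
    response.raise_for_status()
    return response.json()


# Semantic search indexing: two stacked decorators, 3 attempts on any Exception
# plus 5 attempts on httpx.HTTPError, both with constant 0.5s backoff
@backoff.on_exception(backoff.constant, Exception, max_tries=3, interval=0.5)
@backoff.on_exception(backoff.constant, httpx.HTTPError, max_tries=5, interval=0.5)
async def index_documents(docs: list[dict]) -> None:
    ...


# Pipeline operations: 3 attempts with exponential backoff on any Exception
@backoff.on_exception(backoff.expo, Exception, max_tries=3)
async def run_pipeline_step(step_input: dict) -> dict:
    ...
```

Once the last decorator's attempts are exhausted, the underlying exception propagates to the caller, which is the "exception raised after retries exhausted" behaviour described above.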

Queue Retry

Not applicable. No queue consumers in this service.

Timeouts

Call                                 | Timeout                                                     | Configured in
LLM API calls                        | 30s                                                         | httpx client config
Translation calls                    | 10s                                                         | httpx client config
HTTP connection pool                 | max_connections=1000, max_keepalive=20, keepalive_expiry=5s | httpx pool config
Vector DB operations (Qdrant/Milvus) | None (no explicit timeout)                                  | Vector DB client
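
The values above map onto httpx configuration roughly as follows. This is a sketch under the assumption that separate clients are built for LLM and translation traffic; variable names are illustrative (note that httpx calls the keep-alive limit max_keepalive_connections):

```python
import httpx

# Connection pool limits shared by outbound HTTP clients
limits = httpx.Limits(
    max_connections=1000,           # total connections in the pool
    max_keepalive_connections=20,   # idle connections kept alive
    keepalive_expiry=5.0,           # seconds before an idle connection is closed
)

# LLM API calls: 30s timeout
llm_client = httpx.AsyncClient(timeout=httpx.Timeout(30.0), limits=limits)

# Translation calls: 10s timeout
translation_client = httpx.AsyncClient(timeout=httpx.Timeout(10.0), limits=limits)
```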

Circuit Breakers

None implemented.

Fallback Strategy

Failure scenario                              | Behaviour                                                | User impact
LLM cache lookup fails                        | Falls back to direct LLM call ("Fallback to no cache")   | Slightly higher latency, no user-visible impact
Embedding cache fails                         | Multi-tier fallback: Redis async, Redis sync, in-memory  | Transparent to user
Translation provider fails                    | Round-robin provider selection via NLPProviderService    | Next provider used, but selection is not failure-aware
LLM permanent failure (all retries exhausted) | Exception raised                                         | Pipeline step fails, caller receives error
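
To illustrate the multi-tier embedding-cache fallback, a minimal sketch assuming redis-py clients and a JSON-encoded cache value; the client setup, key scheme, and function name are assumptions, not the service's actual code:

```python
import json

from redis import Redis
from redis.asyncio import Redis as AsyncRedis

async_redis = AsyncRedis(host="localhost")        # tier 1: Redis async
sync_redis = Redis(host="localhost")              # tier 2: Redis sync
local_cache: dict[str, list[float]] = {}          # tier 3: in-memory last resort


async def get_cached_embedding(key: str) -> list[float] | None:
    """Try Redis async, then Redis sync, then the in-memory cache."""
    try:
        value = await async_redis.get(key)
        if value is not None:
            return json.loads(value)
    except Exception:
        pass  # fall through to the next tier
    try:
        value = sync_redis.get(key)
        if value is not None:
            return json.loads(value)
    except Exception:
        pass
    return local_cache.get(key)  # None simply means a cache miss; transparent to the caller
```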

Health Check

  • Endpoint: None exposed for orchestration probes
  • Startup checks: DB connection, Alembic migrations, Vector DB connectivity, collection setup (lifespan events)
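
A sketch of how such startup checks typically hang off FastAPI lifespan events; the check helpers below are hypothetical stand-ins for the real DB, migration, and vector DB checks, and no probe endpoint is exposed:

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI


# Hypothetical stand-ins for the real startup checks
# (DB connection, Alembic migrations, vector DB connectivity, collection setup)
async def check_database_connection() -> None: ...
async def check_vector_db_connectivity() -> None: ...


@asynccontextmanager
async def lifespan(app: FastAPI):
    await check_database_connection()
    await check_vector_db_connectivity()
    yield  # the app only starts serving traffic after the checks pass


app = FastAPI(lifespan=lifespan)
```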

Known Gaps

  • No timeout on Vector DB operations (Qdrant/Milvus), so a hung vector store can block indefinitely
  • No /health endpoint for Kubernetes liveness/readiness probes (startup checks exist but are not exposed)
  • No circuit breaker for LLM providers. Repeated failures against a degraded provider consume the full retry budget on every call before failing over
  • Provider selection (NLPProviderService) uses round-robin, not failure-aware routing. A failing provider keeps receiving traffic
  • LLM call retries use constant 1s backoff instead of exponential, which can amplify load on a stressed provider