Resilience: helvia-rag-pipelines
Error handling and retry patterns for this service. Platform-wide patterns: see docs/architecture/resilience.md
HTTP Retry
- Library: httpx + backoff decorator (Python)
- Attempts: Varies by call type (see the table and sketch below)
- Backoff: Constant or exponential depending on call type
- On failure: Exception raised after retries exhausted
| Call type | Max attempts | Backoff | Trigger |
|---|---|---|---|
| LLM API calls | 5 | Constant 1s | httpx.HTTPError |
| Semantic search indexing | 3 (Exception) + 5 (httpx.HTTPError) | Constant 0.5s | Exception, httpx.HTTPError |
| Translation | 5 | Constant 1s | Exception (in translation_service.py) |
| Pipeline operations | 3 | Exponential | Exception |
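As a rough sketch of how these retry settings could be expressed with the `backoff` decorator — the function names, URLs, and payloads below are placeholders, not the service's actual code:

```python
# Illustrative sketch only; endpoints and function names are placeholders.
import backoff
import httpx


# LLM API calls: up to 5 attempts, constant 1s wait, retried on httpx.HTTPError.
@backoff.on_exception(backoff.constant, httpx.HTTPError, max_tries=5, interval=1)
async def call_llm(client: httpx.AsyncClient, payload: dict) -> dict:
    response = await client.post("https://llm.example.com/v1/chat", json=payload)
    response.raise_for_status()  # 4xx/5xx become httpx.HTTPStatusError, a subclass of HTTPError
    return response.json()


# Semantic search indexing: stacked decorators give 3 attempts on generic
# Exception and 5 on httpx.HTTPError, both with a constant 0.5s wait.
@backoff.on_exception(backoff.constant, Exception, max_tries=3, interval=0.5)
@backoff.on_exception(backoff.constant, httpx.HTTPError, max_tries=5, interval=0.5)
async def index_document(client: httpx.AsyncClient, doc: dict) -> None:
    response = await client.put("https://search.example.com/index", json=doc)
    response.raise_for_status()
```

Note that with stacked decorators the inner one handles HTTP errors first; only when its retry budget is exhausted does the outer, generic-Exception decorator see the failure.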
Queue Retry
Not applicable. No queue consumers in this service.
Timeouts
| Call | Timeout | Configured in |
|---|---|---|
| LLM API calls | 30s | httpx client config |
| Translation calls | 10s | httpx client config |
| HTTP connection pool | max_connections=1000, max_keepalive_connections=20, keepalive_expiry=5s | httpx pool config |
| Vector DB operations (Qdrant/Milvus) | None (no explicit timeout) | Vector DB client |
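A minimal sketch of how these values map onto httpx client construction; the client variable names are illustrative, and in current httpx the keep-alive limit is spelled `max_keepalive_connections`:

```python
import httpx

# LLM client: 30s total timeout, shared connection pool limits.
llm_client = httpx.AsyncClient(
    timeout=httpx.Timeout(30.0),
    limits=httpx.Limits(
        max_connections=1000,          # total concurrent connections
        max_keepalive_connections=20,  # idle connections kept alive
        keepalive_expiry=5.0,          # seconds before an idle connection is dropped
    ),
)

# Translation client: tighter 10s timeout.
translation_client = httpx.AsyncClient(timeout=httpx.Timeout(10.0))
```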
Circuit Breakers
None implemented.
Fallback Strategy
| Failure scenario | Behaviour | User impact |
|---|---|---|
| LLM cache lookup fails | Falls back to direct LLM call ("Fallback to no cache") | Slightly higher latency, no user-visible impact |
| Embedding cache fails | Multi-tier fallback: Redis async, Redis sync, in-memory | Transparent to user |
| Translation provider fails | Round-robin provider selection via NLPProviderService | Next provider used, but selection is not failure-aware |
| LLM permanent failure (all retries exhausted) | Exception raised | Pipeline step fails, caller receives error |
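The embedding-cache fallback chain could look roughly like the sketch below. Class, method, and key names are assumptions; only the tier order (async Redis, sync Redis, in-memory) is taken from the table above:

```python
import redis
import redis.asyncio as aredis


class EmbeddingCache:
    """Illustrative multi-tier cache: async Redis -> sync Redis -> in-memory."""

    def __init__(self, redis_url: str) -> None:
        self._async = aredis.from_url(redis_url)
        self._sync = redis.from_url(redis_url)
        self._memory: dict[str, bytes] = {}

    async def get(self, key: str) -> bytes | None:
        # Tier 1: async Redis (normal path).
        try:
            value = await self._async.get(key)
            if value is not None:
                return value
        except Exception:
            pass  # fall through to the next tier
        # Tier 2: sync Redis (blocking, but keeps the lookup alive).
        try:
            value = self._sync.get(key)
            if value is not None:
                return value
        except Exception:
            pass
        # Tier 3: process-local memory; transparent to the caller.
        return self._memory.get(key)
```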
Health Check
- Endpoint: None exposed for orchestration probes
- Startup checks: DB connection, Alembic migrations, Vector DB connectivity, collection setup (lifespan events)
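Assuming the service is a FastAPI app (implied by the lifespan events), the startup checks hook in roughly as below. The check functions are placeholders, and the /health route is included only to illustrate the probe endpoint that is currently missing (see Known Gaps):

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI


async def check_db_and_migrations() -> None: ...  # placeholder: DB connection + Alembic
async def check_vector_db() -> None: ...           # placeholder: Qdrant/Milvus connectivity
async def ensure_collections() -> None: ...        # placeholder: collection setup


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup checks run once, before the app starts serving traffic.
    await check_db_and_migrations()
    await check_vector_db()
    await ensure_collections()
    yield


app = FastAPI(lifespan=lifespan)


# Not currently exposed (see Known Gaps); a liveness/readiness probe would look like this.
@app.get("/health")
async def health() -> dict:
    return {"status": "ok"}
```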
Known Gaps
- No timeout on Vector DB operations (Qdrant/Milvus), so a hung vector store can block indefinitely
- No /health endpoint for Kubernetes liveness/readiness probes (startup checks exist but are not exposed)
- No circuit breaker for LLM providers, so repeated calls to a degraded provider consume the full retry budget before moving on
- Provider selection (NLPProviderService) uses round-robin, not failure-aware routing. A failing provider keeps receiving traffic
- LLM call retries use a constant 1s backoff instead of exponential backoff, which can amplify load on a stressed provider (possible mitigations for this and the missing vector DB timeout are sketched below)
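A hedged sketch of two of the smaller fixes above: an explicit Qdrant client timeout and exponential backoff for LLM calls. Parameter values are illustrative, not recommendations:

```python
import backoff
import httpx
from qdrant_client import QdrantClient

# Qdrant client with an explicit timeout so a hung vector store fails fast
# instead of blocking indefinitely (value is illustrative).
qdrant = QdrantClient(url="http://qdrant:6333", timeout=10)


# Exponential backoff (1s, 2s, 4s, ... with jitter) for LLM calls, instead of
# the current constant 1s wait; capped at 5 tries or 60s of total waiting.
@backoff.on_exception(backoff.expo, httpx.HTTPError, max_tries=5, max_time=60)
async def call_llm(client: httpx.AsyncClient, payload: dict) -> dict:
    response = await client.post("https://llm.example.com/v1/chat", json=payload)
    response.raise_for_status()
    return response.json()
```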