Resilience: hbf-session-manager
Error handling and retry patterns for this service. Platform-wide patterns:
docs/architecture/resilience.md
HTTP Retry
- Library: got (v11.8.5)
- Attempts: Configurable via
options.retries(trainOne method only) - Backoff: None (immediate retry, no delay between attempts)
- On failure: Returns
falseif all retries exhausted
Queue Retry (if applicable)
N/A
Timeouts
| Call | Timeout | Configured in |
|---|---|---|
| POST/PATCH requests | SESSION_SERVICE_REQUEST_TIMEOUT env var or 5000ms default | Environment / service config |
| GET requests | May have no timeout (this.timeout may be unset) | Service config |
| NLP pipeline polling | NLP_PIPELINE_POLL_TIMEOUT_IN_SECS (default 360s / 6 min), poll interval 2s | Environment config |
Circuit Breakers
None.
Fallback Strategy
| Failure scenario | Behaviour | User impact |
|---|---|---|
| NLP training failure | Returns false after retries exhausted | Training silently fails, caller must handle |
| NLP polling timeout | Throws TimeoutError | Caller receives error, must handle |
| HTTP errors | Logged and re-thrown | Error propagated to caller |
Known Gaps
- GET requests may have no timeout, risking indefinite hangs.
- Manual retry loop has no backoff delay (thundering herd risk under load).
- No health endpoint.
- No circuit breaker for downstream dependencies.