Resilience: hbf-data-retention
Error handling and retry patterns for this service. Platform-wide patterns:
docs/architecture/resilience.md
HTTP Retry
- Library: hbf-core-api (inherited, uses axios)
- Attempts: 3 (hbf-core-api, network errors only); application-level retry via the THRESHOLD_OF_DELETION_RETIRES config
- Backoff: Exponential (hbf-core-api, network errors only); linear at application level (no backoff between deletion retries)
- On failure: Failed org/tenant IDs are collected and retried up to THRESHOLD_OF_DELETION_RETIRES times; permanently failed items are logged and execution continues
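The application-level behaviour described above (collect failed IDs, retry them for a bounded number of rounds, then report what still failed) can be sketched roughly as follows. This is an illustration only, not the service's actual code: `deleteWithRetries`, `DeleteFn`, `RetryResult`, `maxRetries` and `baseDelayMs` are hypothetical names. `maxRetries` stands in for THRESHOLD_OF_DELETION_RETIRES, and `baseDelayMs` defaults to 0 to match the current no-backoff behaviour; a positive value would add exponential backoff between rounds.

```typescript
// Hypothetical signature: deletes one org/tenant and rejects on failure.
type DeleteFn = (id: string) => Promise<void>;

interface RetryResult {
  deleted: string[];
  permanentlyFailed: string[];
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// maxRetries stands in for THRESHOLD_OF_DELETION_RETIRES (assumption).
// baseDelayMs = 0 mirrors the documented behaviour: no delay between rounds.
async function deleteWithRetries(
  ids: string[],
  deleteFn: DeleteFn,
  maxRetries: number,
  baseDelayMs: number = 0,
): Promise<RetryResult> {
  const deleted: string[] = [];
  let pending: string[] = [...ids];

  // Round 0 is the initial attempt; rounds 1..maxRetries are retries.
  for (let round = 0; round <= maxRetries && pending.length > 0; round++) {
    if (round > 0 && baseDelayMs > 0) {
      // Optional exponential backoff: base, 2 * base, 4 * base, ...
      await sleep(baseDelayMs * 2 ** (round - 1));
    }
    const failed: string[] = [];
    for (const id of pending) {
      try {
        await deleteFn(id);
        deleted.push(id);
      } catch {
        failed.push(id); // collected and retried in the next round
      }
    }
    pending = failed;
  }
  // Whatever is still pending has exhausted its retries.
  return { deleted, permanentlyFailed: pending };
}
```

The caller would log `permanentlyFailed` and carry on, which matches the "logged, execution continues" behaviour above.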
Queue Retry (if applicable)
N/A
Timeouts
| Call | Timeout | Configured in |
|---|---|---|
| hbf-core-api calls | Not set (axios default: no timeout) | N/A |
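axios accepts a `timeout` value in milliseconds in its request config, where the default of `0` means no timeout. Whether hbf-core-api exposes that option is not stated here. As a library-agnostic alternative, a call could be bounded at the call site with a small wrapper like the sketch below. `withTimeout` is a hypothetical helper, not part of hbf-core-api, and it only stops waiting for the request; it does not cancel it.

```typescript
// Rejects with an error named "TimeoutError" if `operation` has not settled
// within `ms` milliseconds. The underlying request is NOT cancelled.
function withTimeout<T>(operation: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`Operation timed out after ${ms} ms`);
      err.name = "TimeoutError";
      reject(err);
    }, ms);
  });

  // Whichever settles first wins; always clear the timer afterwards so it
  // does not keep the process alive.
  return Promise.race([operation, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

A timed-out deletion would then surface as an ordinary rejection and could follow the existing failed-ID retry path.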
Circuit Breakers
None.
Fallback Strategy
| Failure scenario | Behaviour | User impact |
|---|---|---|
| Deletion failure for org/tenant | Failed IDs collected and retried up to THRESHOLD_OF_DELETION_RETIRES times | Deletion delayed but eventually attempted |
| Permanent deletion failure | Logged; execution continues (the service runs in an infinite loop) | Stale data persists until manual intervention |
| Uncaught error | Logged at ERROR level, loop restarts | Service self-recovers but may miss deletion window |
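The "log and restart" behaviour in the last row can be pictured as a loop that catches any error from one deletion pass, logs it, and moves on to the next pass. The sketch below is an assumption about the shape of that loop, not the actual implementation. `runLoop`, `runCycle`, `Logger`, `shouldStop` and `pauseMs` are hypothetical names, and `shouldStop` is added purely for illustration, since the service as documented has no shutdown mechanism.

```typescript
interface Logger {
  error(message: string, err: unknown): void;
}

// runCycle: hypothetical stand-in for one full deletion pass.
// shouldStop: illustrative stop check; the documented service loops forever.
async function runLoop(
  runCycle: () => Promise<void>,
  logger: Logger,
  shouldStop: () => boolean,
  pauseMs: number = 0,
): Promise<number> {
  let cycles = 0;
  while (!shouldStop()) {
    try {
      await runCycle();
    } catch (err) {
      // Errors are logged at ERROR level and the loop carries on.
      logger.error("Deletion cycle failed; restarting loop", err);
    }
    cycles += 1;
    if (pauseMs > 0) {
      await new Promise<void>((resolve) => setTimeout(resolve, pauseMs));
    }
  }
  return cycles; // number of passes completed before stopping
}
```

In Node.js, a `process.on("SIGTERM", ...)` handler could flip the flag that `shouldStop` reads, so the current pass finishes before the process exits.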
Known Gaps
- No timeout on hbf-core-api calls (requests may hang indefinitely).
- No circuit breaker (cascading failures possible if hbf-core is down).
- No DLQ for permanently failed deletions (only logged, no structured tracking).
- Application-level retry is linear with no exponential backoff between attempts.
- No graceful shutdown mechanism.
- No health endpoint.
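To illustrate the circuit-breaker gap listed above, here is a minimal sketch of the pattern: after a number of consecutive failures the breaker opens and rejects calls immediately, then allows a single trial call once a cool-down has passed. The class name, constructor parameters and thresholds are all hypothetical. This is not an existing hbf-core-api feature, and a production version would likely need metrics and per-endpoint state.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetAfterMs: number;
  private readonly now: () => number;

  // `now` is injectable so the cool-down can be tested without real waiting.
  constructor(failureThreshold: number, resetAfterMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now;
  }

  getState(): BreakerState {
    return this.state;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.resetAfterMs) {
        // Fail fast without touching the downstream service.
        throw new Error("Circuit open: call skipped");
      }
      this.state = "half-open"; // cool-down elapsed: allow one trial call
    }
    try {
      const result = await fn();
      this.failures = 0;
      this.state = "closed";
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed trial call, or too many consecutive failures, opens the circuit.
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

Wrapping each hbf-core-api call in `breaker.call(...)` would let the deletion loop skip work quickly while hbf-core is down, rather than spending its retry budget on calls that are likely to fail.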