Resilience: hbf-broadcast
Error handling and retry patterns for this service. Platform-wide patterns: docs/architecture/resilience.md
HTTP Retry
- Library: Mixed (deprecated `request` for Facebook, `axios` for Slack)
- Attempts: Facebook: 5 (manual retry loop); Slack: 3 (exponential backoff on 5xx)
- Backoff: Facebook: none; Slack: `(e^attempt - 2*random) * 1000ms`; DB fetch: exponential, 3 attempts
- On failure: Facebook classifies errors via `shouldKeepError()` to decide resend or drop; Slack propagates after retries
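
The Slack backoff formula listed above can be sketched as follows. This is a minimal illustration, not the service's actual code: the names `slackBackoffMs`, `retryOn5xx`, and `defaultSleep` are hypothetical, the attempt counter is assumed to start at 0, and the clamp to zero is an assumption added because the raw formula can go negative on the first attempt (e^0 = 1, while `2*random` ranges up to 2). The `err.response.status` check follows the error shape `axios` uses for HTTP error responses.

```javascript
// Delay in milliseconds for a given attempt, following the documented
// formula (e^attempt - 2*random) * 1000. `random` is injectable for testing.
function slackBackoffMs(attempt, random = Math.random) {
  const raw = (Math.exp(attempt) - 2 * random()) * 1000;
  // Assumption: clamp at zero, since the raw value can be negative
  // when attempt is 0 and random() is above 0.5.
  return Math.max(0, Math.round(raw));
}

function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical retry wrapper: retries only on 5xx responses, up to
// `attempts` calls in total, then rethrows the last error.
async function retryOn5xx(fn, { attempts = 3, sleep = defaultSleep, random = Math.random } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      const status = err && err.response && err.response.status;
      const retryable = typeof status === "number" && status >= 500 && status < 600;
      // Stop on non-retryable errors or once the attempt budget is spent.
      if (!retryable || attempt === attempts - 1) break;
      await sleep(slackBackoffMs(attempt, random));
    }
  }
  throw lastError;
}
```

With three attempts, only the delays for attempts 0 and 1 are ever used, so the worst-case added wait under this sketch is roughly 1 s plus 2.7 s.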
Queue Retry (if applicable)
N/A
Timeouts
| Call | Timeout | Configured in |
|---|---|---|
| All HTTP requests | Not set | N/A |
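
Since no timeout is configured, a hung downstream call can block indefinitely. One way to bound the wait is sketched below. The helper `withTimeout` is hypothetical and not part of the service. It only limits how long the caller waits and does not cancel the underlying request. Both `axios` and `request` also accept a `timeout` option in their request configuration, which may be the simpler fix.

```javascript
// Rejects if `promise` does not settle within `ms` milliseconds.
// Note: this bounds the wait only; the underlying request keeps running.
function withTimeout(promise, ms, label = "operation") {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    );
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

A call site might look like `await withTimeout(sendToSlack(message), 5000, "slack send")`, where `sendToSlack` and the 5000 ms budget are placeholders.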
Circuit Breakers
None.
Fallback Strategy
| Failure scenario | Behaviour | User impact |
|---|---|---|
| Facebook send failure | shouldKeepError() determines resend vs drop | Some messages silently dropped |
| Individual broadcast failure | Promise.allSettled() isolates failures per recipient | Other recipients unaffected |
| DB fetch failure (fetchNonReceiversFromDB) | Returns empty array after 3 retries | Broadcast proceeds with incomplete recipient list |
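
The per-recipient isolation in the table relies on `Promise.allSettled()`, which waits for every promise and never rejects on an individual failure. A minimal sketch of that pattern follows. `sendToRecipient` and `broadcast` are hypothetical stand-ins, not the service's real functions.

```javascript
// Hypothetical per-recipient sender; stands in for the real channel call.
async function sendToRecipient(recipient) {
  if (recipient.startsWith("bad")) {
    throw new Error(`send failed for ${recipient}`);
  }
  return `sent:${recipient}`;
}

// Sends to every recipient and reports successes and failures separately.
// One failing recipient does not prevent delivery to the others.
async function broadcast(recipients, send = sendToRecipient) {
  // The async arrow turns synchronous throws from `send` into rejections.
  const results = await Promise.allSettled(recipients.map(async (r) => send(r)));
  const delivered = [];
  const failed = [];
  results.forEach((result, i) => {
    if (result.status === "fulfilled") {
      delivered.push(recipients[i]);
    } else {
      failed.push({ recipient: recipients[i], reason: result.reason });
    }
  });
  return { delivered, failed };
}
```

Returning the `failed` list, rather than discarding rejected results, gives the caller a place to log or count failures.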
Known Gaps
- No timeout on any HTTP request (requests may hang indefinitely).
- Inconsistent retry strategies across channels (Facebook: 5 attempts, no backoff; Slack: 3 attempts, exponential backoff).
- No circuit breaker for downstream channel APIs.
- No health endpoint.