Resilience: hbf-core
Error handling and retry patterns for this service. For platform-wide patterns, see docs/architecture/resilience.md.
HTTP Retry
- Library: Spring Retry (`@Retryable` annotations with `@EnableRetry`)
- Attempts: 3 (Spring Retry default)
- Backoff: Fixed 300ms on `AnalyticsService.addMessageTags`; default on `ChatSessionService.addMessages`
- On failure: Exception propagates to controller; `ResponseExceptionHandler` (`@ControllerAdvice`) returns structured `ErrorResponse`
Important: Retry is applied ONLY to specific MongoDB operations (UncategorizedMongoDbException), NOT to outbound HTTP calls.
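
The attempt and backoff semantics above can be illustrated without Spring. The following is a minimal plain-Java sketch, not the service's code and not the Spring Retry API: `RetrySketch` and `withFixedBackoff` are hypothetical names, and any exception type passed as `retryOn` stands in for `UncategorizedMongoDbException`. It assumes that "3 attempts" means one initial call plus at most two retries, which is how Spring Retry counts its maximum attempts.

```java
import java.util.concurrent.Callable;

// Hypothetical helper for illustration; not part of hbf-core or Spring Retry.
class RetrySketch {

    // Calls op up to maxAttempts times, sleeping backoffMs between attempts.
    // Only exceptions of type retryOn are retried; any other exception propagates at once.
    // When all attempts fail, the last exception is rethrown to the caller.
    static <T> T withFixedBackoff(Callable<T> op, int maxAttempts, long backoffMs,
                                  Class<? extends Exception> retryOn) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (!retryOn.isInstance(e)) {
                    throw e; // not retryable
                }
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs); // fixed pause before the next attempt
                }
            }
        }
        throw last;
    }
}
```

With 3 attempts and a fixed 300ms backoff, the pauses add at most 2 × 300ms = 600ms on top of the time the attempts themselves take. A call such as `RetrySketch.withFixedBackoff(op, 3, 300, IllegalStateException.class)` would mirror the configuration listed for `AnalyticsService.addMessageTags`, with `IllegalStateException` as a placeholder exception type.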
Queue Retry
Not applicable. This service does not consume queues.
Timeouts
| Call | Timeout | Configured in |
|---|---|---|
| HttpClient base default | 30000ms | HttpClient constructor |
| NotificationServiceClient | 2000ms | Client constructor (hardcoded) |
| DataManagerClient | 5000ms | Client constructor (hardcoded) |
| LanguageToolClient | 5000ms | Client constructor (hardcoded) |
| HelviaNLPSpecificationClient | 120000ms | Client constructor (hardcoded) |
| HelviaRAGPipelineClient | 120000ms | Configurable |
| HelviaGPTPipelineClient | 120000ms | Client constructor (hardcoded) |
| AzureAIClient | 350000ms | Client constructor (hardcoded) |
| OpenAIClient | 350000ms | Client constructor (hardcoded) |
| IntegrationAuthenticationServiceOIDC | 30000ms | Client constructor (hardcoded) |
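
Most timeouts in the table are fixed in client constructors. One way to make such a value configurable while keeping the current number as its default is sketched below. This is an illustration only: it uses the JDK's `java.net.http` API rather than the service's own HttpClient class, and `TimeoutSketch`, `resolveTimeout`, `buildRequest`, and the property name `hbf.sketch.timeout-ms` are all hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical helper for illustration; not part of hbf-core.
class TimeoutSketch {

    // Reads a timeout in milliseconds from a JVM system property.
    // Falls back to defaultMs when the property is unset, non-numeric, or not positive.
    static Duration resolveTimeout(String propertyName, long defaultMs) {
        String raw = System.getProperty(propertyName);
        if (raw == null) {
            return Duration.ofMillis(defaultMs);
        }
        try {
            long ms = Long.parseLong(raw.trim());
            return ms > 0 ? Duration.ofMillis(ms) : Duration.ofMillis(defaultMs);
        } catch (NumberFormatException e) {
            return Duration.ofMillis(defaultMs);
        }
    }

    // Builds a GET request that fails with a timeout exception if no response
    // arrives within the given duration.
    static HttpRequest buildRequest(String url, Duration timeout) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(timeout)
                .GET()
                .build();
    }
}
```

For example, `TimeoutSketch.resolveTimeout("hbf.sketch.timeout-ms", 5000)` yields 5000ms unless the JVM is started with `-Dhbf.sketch.timeout-ms=<value>`, so existing behaviour is preserved when nothing is configured.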
Circuit Breakers
None. No circuit breaker library is configured or used.
Fallback Strategy
| Failure scenario | Behaviour | User impact |
|---|---|---|
| MongoDB write fails (retryable) | Spring Retry makes up to 3 attempts in total | Transparent if a retry succeeds |
| Outbound HTTP call fails | Exception propagates to ResponseExceptionHandler | Structured error response returned to caller |
| AI client call fails (Azure/OpenAI) | Exception propagated, no retry | Operation fails; an unresponsive call can block for up to 350s before timing out |
Health Check
- `/actuator/health` (Spring Actuator) with `management.endpoint.health.probes.enabled=true`
- `/{tenant}/health-check` custom endpoint
Known Gaps
- No HTTP retry on external service calls (only MongoDB operations are retried)
- No circuit breaker on any dependency
- Timeout values hardcoded in client constructors, not configurable via properties (except RAG pipeline)
- AI client calls (AzureAI, OpenAI) have a 350s timeout but no retries
- NLU/pipeline clients have long timeouts (120s) with no recovery strategy if the dependency is degraded