Skip to main content

Resilience: hbf-core-api

Retry logic, timeout behavior, and error handling patterns in the HTTP client layer. Platform-wide resilience patterns: docs/architecture/resilience.md

Retry Policy

Retries are implemented in makeApiRequest in src/core.ts.

PropertyValue
Max attempts3 (initial attempt + 2 retries)
TriggerNetwork error only: axios error.request set, error.response absent
BackoffExponential with jitter: Math.round((Math.exp(attempt) - 2 * Math.random()) * 1000) ms
Max delay (approx.)~18s on attempt 3 (e^3 ≈ 20s minus jitter)
Retry on HTTP 4xx/5xxNo. Any response from the server (including 500, 503) is returned immediately without retry.
Retry on connection refusedYes (no response received).
Retry on DNS failureYes (no response received).

Retry Logic (simplified)

attempt 1: make request
-> got response (any status): return HBFCoreApiResponse
-> no response (network error) AND attempt <= 3: wait, increment, recurse
-> no response AND attempt > 3: return synthetic 503
-> setup error (bad config): return synthetic 503

Timeout Behavior

There is no timeout configured on any request.

axios uses undefined for timeout by default, meaning requests can hang indefinitely waiting for hbf-core to respond. If hbf-core becomes unresponsive without dropping the TCP connection (e.g., a stuck thread holding the socket open), the calling service will block on that request forever.

This affects every service that uses this library.

Error Handling

makeApiRequest never throws. It returns an HBFCoreApiResponse in all cases:

ScenarioResult
HTTP 2xxHBFCoreApiResponse with status from server
HTTP 4xx/5xxHBFCoreApiResponse with error status from server
Network error (no response), attempts exhaustedHBFCoreApiResponse with status: 503, body { message: <error.message> }
axios setup error (bad config, pre-request)HBFCoreApiResponse with status: 503, body { message: <error.message> }

Extractors on HBFCoreApiResponse throw synchronously if the status is outside 2xx (except getOptionalValue and getFile, which return undefined on 404).

Sub-client methods propagate whatever the extractor returns or throws. There is no additional error wrapping at the client layer.

Known Gaps

  1. No request timeout. A hung hbf-core connection blocks the caller indefinitely. All 13 consuming services are exposed to this.

  2. HTTP 5xx not retried. If hbf-core returns 500 or 503, the library returns that response immediately. Only connectivity failures (no response at all) trigger a retry. A transient hbf-core crash that returns a 503 before closing the connection will not be retried.

  3. No circuit breaker. There is no open/half-open/closed state machine. If hbf-core is down, every call goes through the full retry cycle (3 attempts, up to ~30s total) before failing.

  4. No per-client or per-method timeout override. Timeout must be handled at the consuming service level (e.g., wrapping calls in Promise.race with a setTimeout).

  5. Token not refreshed. The token is set once at construction. If the token expires during the lifetime of the HBFCoreApi instance, all subsequent requests will receive 401 responses. The library does not detect this or trigger a token refresh.

Recommendations

  • Add a default axios timeout in requestHBFCore: timeout: 30000 (30s) at minimum. Make it configurable via a constructor option.
  • Retry on 503: Extend the retry condition to include error.response.status === 503 with a short fixed delay.
  • Implement circuit breaker in the calling service if hbf-core is a critical dependency (e.g., using opossum).
  • Wrap calls with a timeout in services where a hung call is unacceptable:
    const result = await Promise.race([
    coreApi.BotDeploymentClient.findByHandle(handle),
    new Promise((_, reject) => setTimeout(() => reject(new Error("timeout")), 10000)),
    ]);
  • Recreate HBFCoreApi on token refresh rather than holding a long-lived instance.