Flow: Analytics Ingestion

End-to-end sequence for stats aggregation and report generation. Services involved: hbf-stats, hbf-reports, hbf-core, hbf-session-manager, hbf-console, and SMTP.

Sequence Diagram

(Diagram omitted; the Step-by-Step walkthrough below covers the same sequence.)

Step-by-Step

Data Source: DataCollector (Real-Time Message Recording)

DataCollector is a component in hbf-bot's lifecycle pipeline that writes chatSessionMessage records into the chat-sessions.messages[] array in MongoDB as events are processed. This is the primary mechanism for recording what happened during a conversation.
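
As a rough illustration, the write likely resembles a standard MongoDB array push into the session document. The 'chat-sessions' collection and messages[] array come from the description above; db, sessionId, and chatSessionMessage are placeholder names.

// Illustrative only: the collection and array names are from this doc,
// the surrounding variables are placeholders.
declare const db: {
  collection(name: string): {
    updateOne(filter: object, update: object): Promise<unknown>;
  };
};
declare const sessionId: string;
declare const chatSessionMessage: object;

await db.collection('chat-sessions').updateOne(
  { _id: sessionId },
  { $push: { messages: chatSessionMessage } },
);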

For live chat events, DataCollector only runs on ITC (Internal To Client) events, not raw LIVECHAT-origin events. The _isEnabled() method returns false for EventOrigin.LIVECHAT. This means analytics are recorded when the message is delivered to the end user, not when it first arrives from hbf-lcg.
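
A minimal TypeScript sketch of that gate. Only _isEnabled() and EventOrigin.LIVECHAT are named in this doc; the enum values and the record() entry point are illustrative.

enum EventOrigin {
  LIVECHAT = 'LIVECHAT', // raw live chat event arriving from hbf-lcg
  ITC = 'ITC',           // Internal To Client: delivery to the end user
}

class DataCollector {
  // Per the doc: returns false for EventOrigin.LIVECHAT, so analytics
  // are recorded on ITC delivery, not on arrival from hbf-lcg.
  private _isEnabled(origin: EventOrigin): boolean {
    return origin !== EventOrigin.LIVECHAT;
  }

  // record() is a hypothetical entry point for the lifecycle pipeline.
  record(origin: EventOrigin, message: object): void {
    if (!this._isEnabled(origin)) return;
    // ...push the chatSessionMessage into chat-sessions.messages[]
  }
}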

DataCollector also maintains the liveChats[] aggregate on the ChatSession, tracking waiting time, duration, response times, and agent info.
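
A hedged sketch of the liveChats[] aggregate's shape. The doc names the tracked metrics (waiting time, duration, response times, agent info) but not the exact field names, which are assumed here.

// Field names are assumptions; only the metrics they hold are documented.
interface LiveChatAggregate {
  agent: { id: string; name: string }; // agent info
  waitingTimeMs: number;               // time until an agent joined
  durationMs: number;                  // total live chat duration
  responseTimesMs: number[];           // agent response times
}

interface ChatSession {
  messages: unknown[];                 // chatSessionMessage records
  liveChats: LiveChatAggregate[];      // maintained by DataCollector
}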

See docs/architecture/flows/live-chat.md for the full live chat analytics data flow.

Data Source: Session Completion

hbf-session-manager feeds into the analytics pipeline by marking chat sessions as completed in hbf-core. This updates the raw data that hbf-stats later aggregates.
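
The exact hbf-core endpoint isn't specified on this page; a hypothetical completion update might look like the following, with coreClient and the field names being assumptions.

// Hypothetical call shape -- the real hbf-core API is not shown here.
declare const coreClient: {
  chatSessions: { update(id: string, patch: object): Promise<void> };
};
declare const sessionId: string;

await coreClient.chatSessions.update(sessionId, {
  status: 'completed',
  completedAt: new Date(),
});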

Stats Aggregation (hbf-stats)

  1. Polling (hbf-stats -> hbf-core): The hbf-stats daemon runs an infinite loop, querying hbf-core for tenants whose stats have not been updated recently (the statsUpdateLessThan threshold); the full loop is sketched after this list.

  2. Data fetch (hbf-stats -> hbf-core): For each stale tenant, hbf-stats fetches the organization's timezone and the analytics summary covering two windows: daily granularity for the past 60 days, and monthly granularity for the past 13 months.

  3. Aggregation (hbf-stats): The raw analytics data is aggregated into summary statistics per tenant.

  4. Write-back (hbf-stats -> hbf-core): Aggregated stats are written back to the tenant document via TenantsClient.createOrUpdateStats().
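
A TypeScript sketch of the full loop. list() with statsUpdateLessThan and createOrUpdateStats() match the contracts below; the staleness threshold, poll interval, and the fetchTimezone/fetchSummary/aggregate helpers are assumptions standing in for the hbf-core analytics calls.

interface Tenant { id: string; organizationId: string }
interface TenantsClient {
  list(q: { statsUpdateLessThan: Date }): Promise<Tenant[]>;
  createOrUpdateStats(
    tenantId: string,
    stats: { daily: unknown[]; monthly: unknown[]; updatedAt: Date },
  ): Promise<void>;
}

// Assumed helpers standing in for the hbf-core analytics calls.
declare function fetchTimezone(organizationId: string): Promise<string>;
declare function fetchSummary(
  tenantId: string,
  granularity: 'daily' | 'monthly',
  window: number,
  tz: string,
): Promise<unknown[]>;
declare function aggregate(raw: unknown[]): unknown[]; // per-tenant summaries

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));
const STALE_AFTER_MS = 60 * 60 * 1000; // refresh threshold: assumed

async function run(tenants: TenantsClient): Promise<void> {
  while (true) {
    const stale = await tenants.list({
      statsUpdateLessThan: new Date(Date.now() - STALE_AFTER_MS),
    });
    for (const t of stale) {
      const tz = await fetchTimezone(t.organizationId);
      const daily = await fetchSummary(t.id, 'daily', 60, tz);     // past 60 days
      const monthly = await fetchSummary(t.id, 'monthly', 13, tz); // past 13 months
      await tenants.createOrUpdateStats(t.id, {
        daily: aggregate(daily),
        monthly: aggregate(monthly),
        updatedAt: new Date(),
      });
    }
    await sleep(30_000); // poll interval: assumed
  }
}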

Report Generation (hbf-reports)

  1. Scheduled triggers: hbf-reports runs cron jobs on two schedules (a trigger-and-delivery sketch follows this list):

    • Weekly: Fires every Monday, generating weekly summary reports.
    • Monthly: Fires on the 1st of each month, generating monthly summary reports.

  2. Data fetch (hbf-reports -> hbf-core): For each scheduled report, hbf-reports calls 10+ analytics API methods on hbf-core, fetching summary data, live chat metrics, automated answer stats, and organization/tenant/deployment metadata.

  3. Generation: hbf-reports generates PDF or Excel files from the fetched data.

  4. Delivery (hbf-reports -> SMTP): Reports are sent as email attachments via nodemailer.
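
A condensed sketch of the trigger and delivery ends, assuming node-cron for scheduling (the doc says only "cron jobs") and nodemailer as stated. The hour of day, SMTP config, and the renderReport/recipients helpers are assumptions.

import cron from 'node-cron';
import nodemailer from 'nodemailer';

declare function renderReport(kind: 'weekly' | 'monthly'): Promise<Buffer>; // fetch + PDF/Excel generation
declare function recipients(kind: 'weekly' | 'monthly'): string[];
declare const organization: string;

const transport = nodemailer.createTransport({ host: 'smtp.example.internal' }); // placeholder SMTP config

async function generateAndSend(kind: 'weekly' | 'monthly'): Promise<void> {
  const report = await renderReport(kind);
  await transport.sendMail({
    to: recipients(kind), // report schedule recipients
    subject: `${kind} report for ${organization}`,
    attachments: [{ filename: `${kind}-report.pdf`, content: report }],
  });
}

cron.schedule('0 6 * * 1', () => generateAndSend('weekly'));  // every Monday (hour assumed)
cron.schedule('0 6 1 * *', () => generateAndSend('monthly')); // 1st of each month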

On-Demand Reports (hbf-console -> hbf-reports)

  1. Export (hbf-console -> hbf-reports): Users can request on-demand PDF exports from the console via GET /exports; hbf-reports fetches the data from hbf-core and returns the generated PDF (a route sketch follows this list).

  2. Schedule management (hbf-console -> hbf-reports): Users manage automated report schedules (create, update, delete) via REST CRUD endpoints on hbf-reports.
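
A sketch of the hbf-reports HTTP surface these flows hit, assuming an Express-style server. Route paths and query params come from the contracts below; the handler names and schedules store are illustrative.

import express from 'express';

declare function renderExport(tenantId: string, from: string, to: string): Promise<Buffer>;
declare const schedules: {
  list(): Promise<object[]>;
  create(body: object): Promise<object>;
  update(id: string, body: object): Promise<object>;
  remove(id: string): Promise<void>;
};

const app = express();
app.use(express.json());

// On-demand export: fetch from hbf-core, render, return the PDF.
app.get('/exports', async (req, res) => {
  const { tenantId, from, to } = req.query;
  const pdf = await renderExport(String(tenantId), String(from), String(to));
  res.type('application/pdf').send(pdf);
});

// Schedule CRUD, matching the contract at the end of this page.
app.get('/schedules', async (_req, res) => res.json(await schedules.list()));
app.post('/schedules', async (req, res) => res.json(await schedules.create(req.body)));
app.put('/schedules/:id', async (req, res) => res.json(await schedules.update(req.params.id, req.body)));
app.delete('/schedules/:id', async (req, res) => {
  await schedules.remove(req.params.id);
  res.sendStatus(204);
});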

Contracts

hbf-stats -> hbf-core (tenant polling):

TenantsClient.list({ statsUpdateLessThan: Date })
-> Tenant[] (tenants needing stats refresh)

hbf-stats -> hbf-core (data fetch + write-back):

Organization timezone lookup
Analytics summary (daily: 60 days, monthly: 13 months)
TenantsClient.createOrUpdateStats(tenantId, {
  daily: [...],
  monthly: [...],
  updatedAt: Date
})

hbf-reports -> hbf-core (analytics methods, 10+):

Analytics summary, live chat stats, automated answers,
organization info, tenant info, deployment info,
session breakdowns, NLP accuracy, response rates, etc.

hbf-reports -> SMTP:

nodemailer transport with PDF/Excel attachment
To: report schedule recipients
Subject: weekly/monthly report for {organization}

hbf-console -> hbf-reports:

GET    /exports?tenantId=...&from=...&to=...  -> PDF download
GET    /schedules                             -> Schedule[]
POST   /schedules                             -> Schedule
PUT    /schedules/:id                         -> Schedule
DELETE /schedules/:id                         -> 204
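
For completeness, a hedged example of how hbf-console might call the export endpoint. The query params match the contract above; the base URL and any auth handling are assumptions not covered on this page.

declare const tenantId: string, from: string, to: string;

// Base URL is a placeholder; real service discovery/auth not documented here.
const res = await fetch(
  `https://hbf-reports.internal/exports?tenantId=${tenantId}&from=${from}&to=${to}`,
);
const pdf = await res.arrayBuffer(); // PDF download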