Communication: hbf-knowledge-manager

1-hop view of how this service communicates with its siblings. For the full system view, see docs/architecture/service-communication.md.

Calls Out To

| Service | Protocol | Purpose | Key calls |
| --- | --- | --- | --- |
| hbf-core | hbf-core-api | Integration lookup, KB queries, file ingestion, KB group deletion, user auth, SharePoint subscription state persistence | IntegrationClient.getById(), IntegrationClient.findAllByWebhookKey(), IntegrationClient.update() (subscription IDs, expiries, delta links), IntegrationClient.listSharepointActiveSubscriptions(), KnowledgeBaseClient.list(), KnowledgeBaseArticleClient.fileToArticles(), KnowledgeBaseGroupClient.list(), KnowledgeBaseGroupClient.deleteAll(), KnowledgeBaseGroupClient.deleteBySourceId(), UsersClient.findCurrentUser() |
| Azure Blob Storage | Azure SDK (@azure/storage-blob) | Download blob files for ingestion; list blobs for full sync | SAS token sourced from integration config in hbf-core; no credentials stored locally |
| Microsoft Graph API | HTTPS (https://graph.microsoft.com/v1.0) | SharePoint site/drive resolution, file listing and download, delta queries, webhook subscription CRUD, item permissions, list item field metadata | OAuth2 client credentials via SHAREPOINT_CLIENT_ID + SHAREPOINT_CLIENT_SECRET; token cached per tenant with 5-minute buffer |
| Azure AD / Entra ID | HTTPS (https://login.microsoftonline.com/{tenantId}/oauth2/v2.0/token) | OAuth2 token acquisition for Graph API | Client credentials grant with scope https://graph.microsoft.com/.default |

Called By

| Caller | Protocol | How |
| --- | --- | --- |
| Azure Event Grid | HTTP POST | POST /webhooks/azure-blob: delivers Microsoft.Storage.BlobCreated and Microsoft.Storage.BlobDeleted events; handles subscription validation handshake |
| Microsoft Graph | HTTP POST | POST /webhooks/sharepoint: delivers change notifications for subscribed drives; handles subscription validation handshake (?validationToken=) |
| hbf-console (or any admin caller) | HTTP POST | POST /sync/org/:orgId/integrations/:integrationId/knowledge-bases/:knowledgeBaseId/full: guarded by HBFGuard + AdminOrgRoleGuard |
| hbf-console (or any admin caller) | HTTP POST | POST /sync/sharepoint/integrations/:integrationId/subscriptions: initializes Graph webhook subscriptions; guarded by HBFGuard + AdminOrgRoleGuard |
| hbf-console (or any admin caller) | HTTP GET | GET /sharepoint/drives?tenantId=...&siteUrl=...: lists document libraries for integration setup; guarded by HBFGuard |

Contracts

Inbound — Webhook (POST /webhooks/azure-blob)

Auth: none. Azure Event Grid delivers to the registered URL, and each event is routed to its integration by a webhook key of the form <accountName>:<containerName>.

Event Grid sends a JSON array. Three event types are handled:

Subscription validation (handshake):

[{ "eventType": "Microsoft.EventGrid.SubscriptionValidationEvent", "data": { "validationCode": "..." } }]
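
Event Grid completes this handshake when the endpoint echoes the validation code back in a validationResponse field. The sketch below shows one way to detect the handshake in an incoming batch. The function and type names are hypothetical and not taken from the service code.

```typescript
// Hypothetical helper; names are illustrative, not from the service code.
interface EventGridEvent {
  eventType: string;
  topic?: string;
  subject?: string;
  data?: { validationCode?: string };
}

const VALIDATION_EVENT = "Microsoft.EventGrid.SubscriptionValidationEvent";

// Returns the handshake response body if the batch contains a validation
// event, or null when the batch should go on to normal blob processing.
function buildValidationResponse(
  events: EventGridEvent[],
): { validationResponse: string } | null {
  for (const event of events) {
    const code = event.data?.validationCode;
    if (event.eventType === VALIDATION_EVENT && code) {
      return { validationResponse: code };
    }
  }
  return null;
}
```

A controller could return this object as the JSON body of the 200 response and skip blob processing for that batch.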

Blob created:

[{ "eventType": "Microsoft.Storage.BlobCreated", "topic": "/subscriptions/.../storageAccounts/<account>", "subject": "/blobServices/default/containers/<container>/blobs/<path>" }]

Blob deleted:

[{ "eventType": "Microsoft.Storage.BlobDeleted", "topic": "...", "subject": "/blobServices/default/containers/<container>/blobs/<path>" }]

Response: 200 OK immediately (processing is fire-and-forget).
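
As a rough illustration of how the topic and subject fields map onto the identifiers used elsewhere in this document, the sketch below derives the webhook key (<accountName>:<containerName>) and the sourceId (azure-blob:<account>:<container>:<blobPath>) from a blob event. The parsing rules and the names parseBlobEvent and BlobEventRef are assumptions made for illustration.

```typescript
// Hypothetical helper; the regular expressions are an assumption based on the
// topic and subject shapes shown in this document.
interface BlobEventRef {
  account: string;
  container: string;
  blobPath: string;
  webhookKey: string; // <accountName>:<containerName>
  sourceId: string; // azure-blob:<account>:<container>:<blobPath>
}

function parseBlobEvent(topic: string, subject: string): BlobEventRef | null {
  // The storage account name is the last segment of the topic.
  const accountMatch = /\/storageAccounts\/([^/]+)$/.exec(topic);
  // The subject carries the container and the full blob path.
  const subjectMatch =
    /^\/blobServices\/default\/containers\/([^/]+)\/blobs\/(.+)$/.exec(subject);
  if (!accountMatch || !subjectMatch) return null;

  const account = accountMatch[1];
  const container = subjectMatch[1];
  const blobPath = subjectMatch[2];
  return {
    account,
    container,
    blobPath,
    webhookKey: `${account}:${container}`,
    sourceId: `azure-blob:${account}:${container}:${blobPath}`,
  };
}
```

Returning null for unrecognized shapes lets the caller log and drop the event while still answering 200 OK.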

Inbound — SharePoint Webhook (POST /webhooks/sharepoint)

Auth: optional ?secret=<SHAREPOINT_WEBHOOK_SECRET> query param (if SHAREPOINT_WEBHOOK_SECRET env var is set, requests without a matching secret are rejected with 401).

Subscription validation (handshake): Graph sends a POST with ?validationToken=<opaque-token>. The service echoes the token back as text/plain with status 200.
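
The secret check and the handshake can be expressed as one small decision function. This is a minimal sketch, assuming the secret is checked before the validation token is echoed; the order, the function name, and the reply shape are assumptions. The 202 status for ordinary notifications is the one documented for this endpoint.

```typescript
// Hypothetical decision helper; not the service's actual controller code.
interface WebhookReply {
  status: number;
  contentType?: string;
  body?: string;
}

function replyToSharePointWebhook(
  query: { secret?: string; validationToken?: string },
  configuredSecret: string | undefined,
): WebhookReply {
  // Assumption: when a secret is configured, it is enforced for every request,
  // including the validation handshake.
  if (configuredSecret && query.secret !== configuredSecret) {
    return { status: 401 };
  }
  // Handshake: echo the opaque token back as plain text.
  if (query.validationToken !== undefined) {
    return { status: 200, contentType: "text/plain", body: query.validationToken };
  }
  // Ordinary change notification: acknowledge and process asynchronously.
  return { status: 202 };
}
```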

Change notification:

{
  "value": [
    {
      "subscriptionId": "<graph-subscription-id>",
      "clientState": "<orgId>:<integrationId>",
      "changeType": "updated",
      "resource": "drives/<driveId>/root",
      "subscriptionExpirationDateTime": "2026-05-01T00:00:00Z",
      "tenantId": "<azure-tenant-id>"
    }
  ]
}

Response: 202 Accepted immediately. Processing is fire-and-forget: the service runs a Graph delta query to resolve actual file changes.
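
Because a notification only says that something under the drive changed, the useful information is which org, integration, and drive to run the delta query for. The sketch below extracts those from clientState and resource. Collapsing repeated notifications for the same drive into one target is an assumption, as are all names.

```typescript
// Hypothetical types and helper; field names follow the payload shown above.
interface GraphNotification {
  subscriptionId: string;
  clientState?: string;
  changeType: string;
  resource: string;
  tenantId?: string;
}

interface DeltaTarget {
  orgId: string;
  integrationId: string;
  driveId: string;
}

function toDeltaTargets(payload: { value: GraphNotification[] }): DeltaTarget[] {
  const seen = new Set<string>();
  const targets: DeltaTarget[] = [];
  for (const notification of payload.value) {
    // clientState format: <orgId>:<integrationId>
    const state = /^([^:]+):([^:]+)$/.exec(notification.clientState ?? "");
    // resource format: drives/<driveId>/root
    const drive = /^\/?drives\/([^/]+)\/root$/.exec(notification.resource);
    if (!state || !drive) continue; // skip notifications we cannot attribute

    const target = { orgId: state[1], integrationId: state[2], driveId: drive[1] };
    const key = `${target.orgId}:${target.integrationId}:${target.driveId}`;
    if (seen.has(key)) continue; // assumption: one delta query per drive is enough
    seen.add(key);
    targets.push(target);
  }
  return targets;
}
```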

Inbound — Full Sync (POST /sync/org/:orgId/integrations/:integrationId/knowledge-bases/:knowledgeBaseId/full)

Auth: Authorization: Bearer <user-token> (HBFGuard validates via hbf-core GET /users/me; AdminOrgRoleGuard checks org ADMIN role).

No request body required. Path params identify the target org, integration, and KB. Works for both Azure Blob and SharePoint integrations (provider resolved from integration type). For SharePoint, also calls ensureSubscription after sync completes.
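
For callers, the request reduces to a path and a Bearer header. The following sketch builds both; the base URL and the function name are placeholders chosen for illustration.

```typescript
// Hypothetical client-side helper; baseUrl is a placeholder for wherever the
// service is deployed.
interface FullSyncRequest {
  method: "POST";
  url: string;
  headers: { Authorization: string };
}

function buildFullSyncRequest(
  baseUrl: string,
  orgId: string,
  integrationId: string,
  knowledgeBaseId: string,
  userToken: string,
): FullSyncRequest {
  const path =
    `/sync/org/${encodeURIComponent(orgId)}` +
    `/integrations/${encodeURIComponent(integrationId)}` +
    `/knowledge-bases/${encodeURIComponent(knowledgeBaseId)}/full`;
  return {
    method: "POST",
    url: baseUrl.replace(/\/+$/, "") + path, // tolerate a trailing slash
    headers: { Authorization: `Bearer ${userToken}` },
  };
}
```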

Inbound — SharePoint Subscriptions (POST /sync/sharepoint/integrations/:integrationId/subscriptions)

Auth: Authorization: Bearer <user-token> (HBFGuard + AdminOrgRoleGuard).

Request body:

{ "organizationId": "<org-id>" }

Response:

{ "subscriptionCount": 2 }

Creates Graph webhook subscriptions for each drive in the integration. Establishes baseline delta links so the first notification only picks up new changes.

Inbound — Drive Discovery (GET /sharepoint/drives?tenantId=...&siteUrl=...)

Auth: Authorization: Bearer <user-token> (HBFGuard).

Query params: tenantId (Azure tenant), siteUrl (e.g. https://contoso.sharepoint.com/sites/MySite). Validates that siteUrl is a *.sharepoint.com domain.

Response:

{ "drives": [{ "id": "<drive-id>", "name": "Documents" }, ...] }
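
The siteUrl check mentioned above could look roughly like the following. This is a sketch under assumptions: the function name is hypothetical, and requiring https is an extra constraint not stated in this document.

```typescript
// Hypothetical validator for the siteUrl query parameter.
function isSharePointSiteUrl(siteUrl: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(siteUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  // Compare against the parsed hostname rather than the raw string, so that
  // "sharepoint.com" appearing in a path or userinfo section does not pass.
  return (
    parsed.protocol === "https:" && // assumption: only https is accepted
    parsed.hostname.endsWith(".sharepoint.com")
  );
}
```

Matching on the parsed hostname rather than on a substring keeps lookalike hosts such as sharepoint.com.evil.example from passing.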

Outbound — hbf-core-api calls

Integration lookup by webhook key:

  • IntegrationClient.findAllByWebhookKey(webhookKey: string) — key format: <accountName>:<containerName>

KB list (filtered by integration):

  • KnowledgeBaseClient.list(orgId, { source: integrationId })

File ingestion (blob created):

  • KnowledgeBaseArticleClient.fileToArticles(orgId, kbId, fileBuffer, fileName, { sourceId, groupName, source, publishArticles: true })
  • sourceId format: azure-blob:<account>:<container>:<blobPath>

KB group deletion (blob deleted):

  • KnowledgeBaseGroupClient.deleteBySourceId(orgId, kbId, sourceId)

Orphan cleanup (during full sync):

  • KnowledgeBaseGroupClient.deleteBySourceId(orgId, kbId, sourceId) — removes orphaned groups no longer present in the source

Integration update (SharePoint subscription/delta state):

  • IntegrationClient.update(orgId, integrationId, { webhookSubscriptionIds, webhookSubscriptionExpiries, deltaLinks, lastSubscriptionRenewalCheck }) — persists Graph subscription IDs, expiry dates, and delta links per drive

Active SharePoint subscriptions (for renewal sweep):

  • IntegrationClient.listSharepointActiveSubscriptions() — returns all integrations with active Graph subscriptions

KB group listing (for orphan cleanup during full sync):

  • KnowledgeBaseGroupClient.list(orgId, kbId) — lists existing groups to detect orphaned sourceIds after sync
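
Orphan detection is essentially a set difference between the sourceIds already stored on KB groups and the sourceIds seen during the sync. The sketch below illustrates the idea. The KbGroup shape and the prefix guard (which keeps groups from other sources out of the result) are assumptions.

```typescript
// Hypothetical group shape; the real client response may differ.
interface KbGroup {
  id: string;
  sourceId?: string;
}

function findOrphanedSourceIds(
  existingGroups: KbGroup[],
  liveSourceIds: string[],
  sourcePrefix: string,
): string[] {
  const live = new Set(liveSourceIds);
  const orphans = new Set<string>();
  for (const group of existingGroups) {
    const sourceId = group.sourceId;
    // Assumption: only groups created by this integration (matching prefix)
    // are candidates, so manually created groups are never deleted.
    if (sourceId && sourceId.startsWith(sourcePrefix) && !live.has(sourceId)) {
      orphans.add(sourceId);
    }
  }
  return Array.from(orphans);
}
```

Each returned sourceId would then be passed to KnowledgeBaseGroupClient.deleteBySourceId(orgId, kbId, sourceId).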

User auth (HBFGuard):

  • UsersClient.findCurrentUser() — called with the caller's Bearer token

Outbound — Microsoft Graph API calls

All calls go through GraphClientService which manages OAuth2 token caching per tenant.
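
A per-tenant token cache with a 5-minute refresh buffer can be sketched as follows. This is not the GraphClientService implementation: the class name, the injected fetcher, and the injectable clock are assumptions made so the example stays self-contained. The fetcher stands in for the client credentials request to the Entra ID token endpoint.

```typescript
// Hypothetical cache; the token request itself is injected.
interface CachedToken {
  accessToken: string;
  expiresAtMs: number;
}

type TokenFetcher = (
  tenantId: string,
) => Promise<{ access_token: string; expires_in: number }>;

const REFRESH_BUFFER_MS = 5 * 60 * 1000; // refresh 5 minutes before expiry

class TenantTokenCache {
  private readonly tokens = new Map<string, CachedToken>();
  private readonly fetchToken: TokenFetcher;
  private readonly now: () => number;

  constructor(fetchToken: TokenFetcher, now: () => number = () => Date.now()) {
    this.fetchToken = fetchToken;
    this.now = now;
  }

  async getToken(tenantId: string): Promise<string> {
    const cached = this.tokens.get(tenantId);
    // Reuse the token only while it is valid for longer than the buffer.
    if (cached && cached.expiresAtMs - REFRESH_BUFFER_MS > this.now()) {
      return cached.accessToken;
    }
    const fresh = await this.fetchToken(tenantId);
    this.tokens.set(tenantId, {
      accessToken: fresh.access_token,
      expiresAtMs: this.now() + fresh.expires_in * 1000,
    });
    return fresh.access_token;
  }
}
```

This version does not coalesce concurrent requests for the same tenant. Two simultaneous cache misses would each fetch a token, which is harmless but wasteful.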

Site resolution:

  • GET /sites/{hostname}:{serverRelativePath} — resolve SharePoint site URL to siteId
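
To illustrate the addressing scheme, the sketch below turns a site URL into the Graph path for this lookup. The function name is hypothetical, and falling back to /sites/{hostname} for a root site with no path is an assumption.

```typescript
// Hypothetical helper that maps a SharePoint site URL to the Graph lookup path.
function buildSiteLookupPath(siteUrl: string): string {
  const { hostname, pathname } = new URL(siteUrl);
  // Drop trailing slashes so "/sites/MySite/" and "/sites/MySite" match.
  const serverRelativePath = pathname.replace(/\/+$/, "");
  return serverRelativePath
    ? `/sites/${hostname}:${serverRelativePath}`
    : `/sites/${hostname}`; // assumption: root site addressed by hostname only
}
```

For https://contoso.sharepoint.com/sites/MySite this yields /sites/contoso.sharepoint.com:/sites/MySite, relative to the Graph base URL.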

Drive listing and resolution:

  • GET /sites/{siteId}/drives — list document libraries
  • Drive lookup by name (client-side filter on response)

File operations:

  • GET /drives/{driveId}/items/{itemId}/children — list children (recursive traversal for full sync)
  • GET /drives/{driveId}/items/{itemId} — get item metadata + @microsoft.graph.downloadUrl
  • Download via pre-authenticated URL from @microsoft.graph.downloadUrl

Delta queries (incremental sync):

  • GET /drives/{driveId}/root/delta — initial delta query
  • GET {deltaLink} — subsequent delta queries using stored deltaLink
  • Returns paginated results with @odata.nextLink / @odata.deltaLink
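
The pagination contract above amounts to following @odata.nextLink until a page carries @odata.deltaLink, then storing that link for the next run. A minimal sketch, with the HTTP call injected as a hypothetical fetchPage function so the loop can be read on its own:

```typescript
// Hypothetical types; fetchPage stands in for an authenticated Graph GET.
interface DeltaPage<T> {
  value: T[];
  "@odata.nextLink"?: string;
  "@odata.deltaLink"?: string;
}

type PageFetcher<T> = (url: string) => Promise<DeltaPage<T>>;

async function collectDelta<T>(
  startUrl: string,
  fetchPage: PageFetcher<T>,
): Promise<{ items: T[]; deltaLink: string }> {
  const items: T[] = [];
  let url: string | undefined = startUrl;
  while (url) {
    const page: DeltaPage<T> = await fetchPage(url);
    items.push(...page.value);
    const deltaLink = page["@odata.deltaLink"];
    if (deltaLink) {
      // Final page: this link is what gets persisted for the next query.
      return { items, deltaLink };
    }
    url = page["@odata.nextLink"];
  }
  throw new Error("Delta response ended without an @odata.deltaLink");
}
```

The start URL is either the initial /drives/{driveId}/root/delta path or the stored deltaLink from the previous run.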

Subscription management:

  • POST /subscriptions — create webhook subscription for /drives/{driveId}/root with changeType updated
  • PATCH /subscriptions/{id} — renew subscription (update expirationDateTime)
  • GET /subscriptions/{id} — check subscription resource (for siteUrl staleness detection)
  • DELETE /subscriptions/{id} — remove subscription
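
As an illustration of the create call, the sketch below assembles a POST /subscriptions body using the standard Graph subscription properties. The option names, the webhook base URL, the choice to carry the secret in notificationUrl, and the lifetime parameter are assumptions. The clientState format follows the <orgId>:<integrationId> convention shown in the notification payload.

```typescript
// Hypothetical builder for the subscription request body.
interface SubscriptionRequest {
  changeType: "updated";
  notificationUrl: string;
  resource: string;
  expirationDateTime: string;
  clientState: string;
}

interface SubscriptionOptions {
  driveId: string;
  orgId: string;
  integrationId: string;
  webhookBaseUrl: string; // public base URL of this service (placeholder)
  secret?: string; // assumption: forwarded as ?secret= when configured
  lifetimeMinutes: number; // must stay within the limit Graph allows
  now?: Date;
}

function buildSubscriptionRequest(opts: SubscriptionOptions): SubscriptionRequest {
  const now = opts.now ?? new Date();
  const expires = new Date(now.getTime() + opts.lifetimeMinutes * 60 * 1000);
  const base = opts.webhookBaseUrl.replace(/\/+$/, "");
  const query = opts.secret ? `?secret=${encodeURIComponent(opts.secret)}` : "";
  return {
    changeType: "updated",
    notificationUrl: `${base}/webhooks/sharepoint${query}`,
    resource: `/drives/${opts.driveId}/root`,
    expirationDateTime: expires.toISOString(),
    clientState: `${opts.orgId}:${opts.integrationId}`,
  };
}
```

Renewal through PATCH /subscriptions/{id} only needs a new expirationDateTime computed the same way.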

Item metadata:

  • GET /sites/{siteId}/drives/{driveId}/items/{itemId}/listItem/fields — read managed metadata (taxonomy columns)
  • GET /drives/{driveId}/items/{itemId}/permissions — read item permissions (for RBAC sync)

Flows Involving This Service

This service is an ingestion pipeline triggered by external events (Azure Event Grid, Microsoft Graph) rather than a participant in the bot message-processing or live-chat flows. It also runs a background subscription renewal timer for SharePoint Graph webhook subscriptions (every 12 hours).
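
One way the renewal sweep could decide which subscriptions to renew is sketched below. The shape of the expiry map (subscription ID to ISO timestamp) and the renewal window of two sweep intervals are assumptions; only the 12-hour interval comes from this document.

```typescript
// Hypothetical selection logic for the renewal sweep.
const SWEEP_INTERVAL_MS = 12 * 60 * 60 * 1000; // timer period: every 12 hours

function subscriptionsDueForRenewal(
  expiries: Record<string, string>, // assumption: subscriptionId -> ISO expiry
  nowMs: number,
  windowMs: number = 2 * SWEEP_INTERVAL_MS,
): string[] {
  return Object.entries(expiries)
    .filter(([, iso]) => {
      const expiresAtMs = Date.parse(iso);
      // Treat unparseable expiries as due so they get repaired on the next sweep.
      return Number.isNaN(expiresAtMs) || expiresAtMs - nowMs <= windowMs;
    })
    .map(([subscriptionId]) => subscriptionId);
}
```

Using a window of two intervals means a single missed sweep does not let a subscription lapse.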