AI Brief: hbf-knowledge-manager

NestJS service that syncs external storage files into HBF Knowledge Base articles via hbf-core-api. Supports Azure Blob Storage (via Event Grid webhooks) and SharePoint Online (via Microsoft Graph API with delta sync and webhook subscriptions).

What This Repo Does

Receives webhook events from Azure Event Grid (blob creation/deletion) and Microsoft Graph (SharePoint change notifications), translating them into Knowledge Base article operations (file-to-articles conversion or KB group deletion) via hbf-core-api. Also exposes guarded REST endpoints to trigger a full sync for either source and to manage SharePoint webhook subscriptions. SharePoint integration uses delta queries for incremental sync, automatic subscription lifecycle management (creation, renewal, orphan cleanup), managed metadata/taxonomy extraction, and RBAC permission syncing.

Tech Stack

  • Language: TypeScript 5.7
  • Framework: NestJS 11 (platform-express)
  • Key dependencies: @azure/storage-blob (Azure Blob SDK), @microsoft/microsoft-graph-client (Microsoft Graph API), @helvia/hbf-core-api (hbf-core client), @nestjs/throttler (rate limiting on webhook endpoints), nestjs-pino + @elastic/ecs-pino-format (structured logging), elastic-apm-node (APM), joi (config validation)

Entry Points

  • Main: src/main.ts
  • App module: src/app.module.ts
  • Config validation: AppModule inline Joi.object schema (PORT, CORE_BASE_URL, CORE_TOKEN, PINO_*, SHAREPOINT_CLIENT_ID, SHAREPOINT_CLIENT_SECRET, SHAREPOINT_WEBHOOK_BASE_URL, SHAREPOINT_WEBHOOK_SECRET)

Key Directories

Directory | Purpose
src/webhooks/ | Webhook controller: receives Azure Event Grid and Microsoft Graph POST webhooks, dispatches to source handler/sync engine (Azure Blob) or delta service (SharePoint)
src/sync/ | Sync controller (full-sync endpoint) + SyncEngineService (handles CREATED/DELETED events) + SyncIntegrationResolverService
src/sources/ | SourceRegistryService (maps ProviderType to source/handler) + per-provider subdirs
src/sources/azure-blob/ | AzureBlobSource (download, list, fullSync via @azure/storage-blob), AzureBlobWebhookHandler (parse/validate Event Grid payloads)
src/sources/sharepoint/ | SharePointSource (download, list, fullSync via Graph API), GraphClientService (Microsoft Graph client with token cache), SharePointDeltaService (delta query processing for incremental sync), SharePointSubscriptionService (Graph subscription lifecycle: create, renew, delete, sweep), SharePointWebhookHandler (stub handler; actual processing uses delta service), SharePointController (GET /sharepoint/drives for drive discovery)
src/hbf-core/ | HbfCoreService: wraps HBFCoreApi (integration lookups, KB queries, fileToArticles, deleteKbGroup, updateIntegration for subscription/delta state, listSharepointActiveSubscriptions)
src/guards/ | HBFGuard (token validation via hbf-core GET /users/me), AdminOrgRoleGuard (org ADMIN role check)
src/common/ | Shared interfaces (KnowledgeSource, FileBasedKnowledgeSource, WebhookHandler, IntegrationConfig, NormalisedSourceEvent, SyncResult), enums (EventAction, ProviderType), utils
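
The registry role described for src/sources/ can be pictured with a minimal sketch. Everything here is hypothetical: the string values, the single-method KnowledgeSource shape, and the SourceRegistry class name are illustrative assumptions, and a string union stands in for the ProviderType enum. The real definitions live in src/common/ and src/sources/ and may differ.

```typescript
// Hypothetical stand-in for the ProviderType enum in src/common/.
type ProviderType = "AZURE_BLOB" | "SHAREPOINT";

// Assumed minimal shape; the real KnowledgeSource interface likely has more members.
interface KnowledgeSource {
  readonly provider: ProviderType;
  downloadFile(sourceId: string): Promise<Uint8Array>;
}

// Maps each provider type to the source implementation that serves it.
class SourceRegistry {
  private readonly sources = new Map<ProviderType, KnowledgeSource>();

  register(source: KnowledgeSource): void {
    this.sources.set(source.provider, source);
  }

  resolve(provider: ProviderType): KnowledgeSource {
    const source = this.sources.get(provider);
    if (!source) {
      // Fail loudly rather than silently skipping an unknown provider.
      throw new Error(`No source registered for provider ${provider}`);
    }
    return source;
  }
}
```

Keeping the lookup behind one resolve call means the sync engine does not need provider-specific branches; adding a provider is a matter of registering another source.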

External Dependencies

  • hbf-core (via CORE_BASE_URL + CORE_TOKEN): integration config, KB lookups, fileToArticles, KB group deletion, user auth, SharePoint subscription state persistence, integration updates
  • Azure Blob Storage: direct blob download via SAS token (no Azure credentials stored — SAS token comes from the integration config in hbf-core)
  • Azure Event Grid: inbound webhook delivery of Microsoft.Storage.BlobCreated and Microsoft.Storage.BlobDeleted events
  • Microsoft Graph API (https://graph.microsoft.com/v1.0): SharePoint site/drive resolution, file listing and download, delta queries for incremental sync, webhook subscription management (create/renew/delete), item permissions, list item fields (managed metadata). Authenticated via OAuth2 client credentials (SHAREPOINT_CLIENT_ID + SHAREPOINT_CLIENT_SECRET per tenant)
  • Azure AD / Entra ID (login.microsoftonline.com): OAuth2 token endpoint for Graph API client credentials flow
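
As an illustration of the "token cache" mentioned for GraphClientService, here is a minimal per-tenant cache sketch. The class name TenantTokenCache, the TokenFetcher callback, and the 60-second refresh margin are assumptions made for this example; the actual service may cache tokens differently (for instance through a Microsoft authentication library).

```typescript
interface FetchedToken {
  accessToken: string;
  expiresInSeconds: number;
}

// Hypothetical callback that performs the client credentials request for one tenant.
type TokenFetcher = (tenantId: string) => Promise<FetchedToken>;

class TenantTokenCache {
  private readonly tokens = new Map<string, { value: string; expiresAt: number }>();
  private readonly fetchToken: TokenFetcher;
  private readonly now: () => number;
  private readonly skewMs: number;

  constructor(fetchToken: TokenFetcher, now: () => number = () => Date.now(), skewMs = 60_000) {
    this.fetchToken = fetchToken;
    this.now = now;
    this.skewMs = skewMs; // assumed safety margin before expiry
  }

  async get(tenantId: string): Promise<string> {
    const cached = this.tokens.get(tenantId);
    // Reuse the token only while it is comfortably inside its lifetime.
    if (cached && cached.expiresAt - this.skewMs > this.now()) {
      return cached.value;
    }
    const fresh = await this.fetchToken(tenantId);
    this.tokens.set(tenantId, {
      value: fresh.accessToken,
      expiresAt: this.now() + fresh.expiresInSeconds * 1000,
    });
    return fresh.accessToken;
  }
}
```

Injecting the fetcher and the clock keeps the cache testable without network access; in a real deployment the fetcher would post to the tenant's token endpoint on login.microsoftonline.com.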

API Endpoints

Method | Path | Auth | Description
POST | /webhooks/azure-blob | None (Event Grid handshake + implicit key via webhook registration) | Receive Azure Event Grid events; handle subscription validation and blob events
POST | /webhooks/sharepoint | Optional ?secret= query param (matched against SHAREPOINT_WEBHOOK_SECRET) | Receive Microsoft Graph change notifications; handle subscription validation handshake (?validationToken=) and fire-and-forget delta processing
POST | /sync/org/:orgId/integrations/:integrationId/knowledge-bases/:knowledgeBaseId/full | HBFGuard + AdminOrgRoleGuard | Full sync: re-upload all files from the integration's source. For SharePoint, also ensures webhook subscriptions exist after sync
POST | /sync/sharepoint/integrations/:integrationId/subscriptions | HBFGuard + AdminOrgRoleGuard | Initialize or recreate Graph webhook subscriptions for a SharePoint integration (body: { organizationId })
GET | /sharepoint/drives | HBFGuard | List available document libraries for a SharePoint site (query: tenantId, siteUrl). Used by hbf-console for drive discovery during integration setup
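
The request handling for POST /webhooks/sharepoint can be sketched as a pure decision function. This is an illustrative sketch only: the function name, the reply type, and in particular the 401 status for a secret mismatch are assumptions, since the brief does not say how a failed secret check is answered.

```typescript
type WebhookReply =
  | { status: 200; contentType: "text/plain"; body: string }
  | { status: 202 }
  | { status: 401 };

interface WebhookQuery {
  validationToken?: string;
  secret?: string;
}

function decideSharePointReply(query: WebhookQuery, configuredSecret?: string): WebhookReply {
  // Graph validates a new subscription by expecting the token echoed back as plain text.
  if (query.validationToken !== undefined) {
    return { status: 200, contentType: "text/plain", body: query.validationToken };
  }
  // The secret check only applies when a secret is configured.
  // Assumption: a mismatch is rejected with 401.
  if (configuredSecret !== undefined && query.secret !== configuredSecret) {
    return { status: 401 };
  }
  // Accept the notification; delta processing continues in the background.
  return { status: 202 };
}
```

A plain string comparison is used here for brevity; a production handler might prefer a constant-time comparison for the secret.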

Event Flow (Webhook Path)

Azure Blob Storage

Azure Event Grid POST /webhooks/azure-blob
  -> AzureBlobWebhookHandler.handleValidation (subscription handshake? return 200)
  -> respond 200 immediately (fire-and-forget processing)
  -> extract accountName + containerName from topic/subject
  -> HbfCoreService.findAllByWebhookKey (lookup integrations by webhookKey)
  -> for each integration:
     -> HbfCoreService.findKnowledgeBasesByIntegration
     -> SyncEngineService.handleFileEvent
        -> CREATED: AzureBlobSource.downloadFile -> HbfCoreService.fileToArticles
        -> DELETED: HbfCoreService.deleteKbGroupBySourceId
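
The handshake and the "extract accountName + containerName from topic/subject" step can be illustrated with a short sketch. The helper names and the trimmed-down event interface are hypothetical; the topic and subject layouts follow the documented Event Grid schema for Blob Storage events, and the real AzureBlobWebhookHandler may parse them differently.

```typescript
// Only the fields this sketch needs; real Event Grid events carry more.
interface EventGridEvent {
  eventType: string;
  topic: string;
  subject: string;
  data: { validationCode?: string };
}

interface BlobLocation {
  accountName: string;
  containerName: string;
  blobPath: string;
}

// Returns the handshake body if the batch contains a subscription validation event.
function validationResponse(events: EventGridEvent[]): { validationResponse: string } | null {
  const handshake = events.find(
    (e) => e.eventType === "Microsoft.EventGrid.SubscriptionValidationEvent",
  );
  const code = handshake?.data.validationCode;
  return code ? { validationResponse: code } : null;
}

// topic:   /subscriptions/.../providers/Microsoft.Storage/storageAccounts/{account}
// subject: /blobServices/default/containers/{container}/blobs/{path}
function parseBlobLocation(event: EventGridEvent): BlobLocation | null {
  const account = /\/storageAccounts\/([^/]+)/.exec(event.topic);
  const blob = /^\/blobServices\/default\/containers\/([^/]+)\/blobs\/(.+)$/.exec(event.subject);
  if (!account || !blob) return null;
  return { accountName: account[1], containerName: blob[1], blobPath: blob[2] };
}
```

Returning null for unparseable events lets the caller skip them without failing the rest of the batch.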

SharePoint

Microsoft Graph POST /webhooks/sharepoint
  -> validation handshake? echo ?validationToken= as text/plain 200
  -> verify ?secret= against SHAREPOINT_WEBHOOK_SECRET (if configured)
  -> respond 202 immediately (fire-and-forget processing)
  -> SharePointDeltaService.processNotification (deduplicate by subscriptionId)
     -> decodeClientState -> orgId + integrationId
     -> HbfCoreService.getIntegration -> resolve driveName from subscriptionId
     -> GraphClientService.getDelta (paginated delta query using stored deltaLink)
     -> for each changed item:
        -> CREATED: read managed metadata + RBAC permissions (if sync toggles enabled)
           -> SharePointSource.downloadFile -> HbfCoreService.fileToArticles
        -> DELETED: HbfCoreService.deleteKbGroupBySourceId
     -> persist new deltaLink to integration via HbfCoreService.updateIntegration
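
The paginated delta step in the flow above can be sketched as a loop that follows @odata.nextLink until Graph returns an @odata.deltaLink. The function name collectDelta and the injected FetchPage callback are assumptions for illustration, as is treating every non-deleted file item as CREATED; the real GraphClientService.getDelta and SharePointDeltaService may be structured differently.

```typescript
// Subset of the Graph driveItem fields relevant to delta processing.
interface DeltaItem {
  id: string;
  name?: string;
  file?: object;     // present on file items
  folder?: object;   // present on folder items
  deleted?: object;  // present when the item was removed
}

interface DeltaPage {
  value: DeltaItem[];
  "@odata.nextLink"?: string;
  "@odata.deltaLink"?: string;
}

// Hypothetical transport: fetches one page of a delta query.
type FetchPage = (url: string) => Promise<DeltaPage>;

interface DeltaResult {
  created: DeltaItem[];
  deleted: DeltaItem[];
  deltaLink?: string;
}

async function collectDelta(startUrl: string, fetchPage: FetchPage): Promise<DeltaResult> {
  const created: DeltaItem[] = [];
  const deleted: DeltaItem[] = [];
  let deltaLink: string | undefined;
  let url: string | undefined = startUrl;

  while (url) {
    const page: DeltaPage = await fetchPage(url);
    for (const item of page.value) {
      if (item.deleted) deleted.push(item);
      else if (item.file) created.push(item); // folders are ignored in this sketch
    }
    // The final page carries the deltaLink to persist for the next run.
    deltaLink = page["@odata.deltaLink"] ?? deltaLink;
    url = page["@odata.nextLink"];
  }
  return { created, deleted, deltaLink };
}
```

Persisting the returned deltaLink only after all items are processed means a failed run can be retried from the previous link instead of losing changes.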

SharePoint Subscription Lifecycle

On module init:
  -> SharePointSubscriptionService.sweepAndRenew (immediate + every 12 hours)
     -> HbfCoreService.listSharepointActiveSubscriptions
     -> for each subscription expiring within 48 hours:
        -> GraphClientService.renewSubscription
        -> persist new expiry via HbfCoreService.updateIntegration

On full sync (SharePoint provider):
  -> SharePointSubscriptionService.ensureSubscription
     -> create missing subscriptions, remove orphaned ones (drives removed from config)
     -> establish baseline deltaLink for new drives (consume initial delta to skip existing items)
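
The "expiring within 48 hours" selection in the sweep can be sketched as a simple filter. The StoredSubscription shape and the function name are hypothetical; only the 48-hour window comes from the flow above. Note that this sketch also selects subscriptions that are already past expiry, and the brief does not say whether those are renewed or recreated.

```typescript
// Assumed shape of the subscription state persisted on the integration.
interface StoredSubscription {
  subscriptionId: string;
  expirationDateTime: string; // ISO 8601 timestamp, as used by Microsoft Graph
}

const RENEWAL_WINDOW_MS = 48 * 60 * 60 * 1000;

// Returns the subscriptions whose expiry falls at or before now + window.
function dueForRenewal(
  subscriptions: StoredSubscription[],
  now: Date,
  windowMs: number = RENEWAL_WINDOW_MS,
): StoredSubscription[] {
  const cutoff = now.getTime() + windowMs;
  return subscriptions.filter((s) => Date.parse(s.expirationDateTime) <= cutoff);
}
```

With a 12-hour sweep interval and a 48-hour window, each subscription gets several renewal attempts before it lapses, which gives some tolerance for transient Graph failures.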

Running Locally

npm run start:dev   # watch mode (ELASTIC_APM_ACTIVE=false by default)
npm run start       # normal start

Tests

npm test
npm run test:cov