AI Brief: hbf-knowledge-manager
NestJS service that syncs external storage files into HBF Knowledge Base articles via hbf-core-api. Supports Azure Blob Storage (via Event Grid webhooks) and SharePoint Online (via Microsoft Graph API with delta sync and webhook subscriptions).
What This Repo Does
Receives webhook events from Azure Event Grid (blob creation/deletion) and Microsoft Graph (SharePoint change notifications), translating them into Knowledge Base article operations (file-to-articles conversion or KB group deletion) via hbf-core-api. Also exposes guarded REST endpoints to trigger a full sync for either source and to manage SharePoint webhook subscriptions. SharePoint integration uses delta queries for incremental sync, automatic subscription lifecycle management (creation, renewal, orphan cleanup), managed metadata/taxonomy extraction, and RBAC permission syncing.
Tech Stack
- Language: TypeScript 5.7
- Framework: NestJS 11 (platform-express)
- Key dependencies:
  - `@azure/storage-blob` (Azure Blob SDK)
  - `@microsoft/microsoft-graph-client` (Microsoft Graph API)
  - `@helvia/hbf-core-api` (hbf-core client)
  - `@nestjs/throttler` (rate limiting on webhook endpoints)
  - `nestjs-pino` + `@elastic/ecs-pino-format` (structured logging)
  - `elastic-apm-node` (APM)
  - `joi` (config validation)
Entry Points
- Main: `src/main.ts`
- App module: `src/app.module.ts`
- Config validation: inline `Joi.object` schema in `AppModule` (`PORT`, `CORE_BASE_URL`, `CORE_TOKEN`, `PINO_*`, `SHAREPOINT_CLIENT_ID`, `SHAREPOINT_CLIENT_SECRET`, `SHAREPOINT_WEBHOOK_BASE_URL`, `SHAREPOINT_WEBHOOK_SECRET`)
Key Directories
| Directory | Purpose |
|---|---|
| `src/webhooks/` | Webhook controller: receives Azure Event Grid and Microsoft Graph POST webhooks, dispatches to source handler/sync engine (Azure Blob) or delta service (SharePoint) |
| `src/sync/` | Sync controller (full-sync endpoint) + SyncEngineService (handles CREATED/DELETED events) + SyncIntegrationResolverService |
| `src/sources/` | SourceRegistryService (maps ProviderType to source/handler) + per-provider subdirectories |
| `src/sources/azure-blob/` | AzureBlobSource (download, list, fullSync via `@azure/storage-blob`), AzureBlobWebhookHandler (parses and validates Event Grid payloads) |
| `src/sources/sharepoint/` | SharePointSource (download, list, fullSync via Graph API), GraphClientService (Microsoft Graph client with token cache), SharePointDeltaService (delta query processing for incremental sync), SharePointSubscriptionService (Graph subscription lifecycle: create, renew, delete, sweep), SharePointWebhookHandler (stub handler; actual processing uses the delta service), SharePointController (`GET /sharepoint/drives` for drive discovery) |
| `src/hbf-core/` | HbfCoreService: wraps HBFCoreApi for integration lookups, KB queries, fileToArticles, deleteKbGroup, updateIntegration (subscription/delta state), and listSharepointActiveSubscriptions |
| `src/guards/` | HBFGuard (token validation via hbf-core `GET /users/me`), AdminOrgRoleGuard (org ADMIN role check) |
| `src/common/` | Shared interfaces (KnowledgeSource, FileBasedKnowledgeSource, WebhookHandler, IntegrationConfig, NormalisedSourceEvent, SyncResult), enums (EventAction, ProviderType), utils |
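To make the registry-plus-engine split concrete, here is a minimal sketch of how a `SourceRegistryService`-style lookup and the CREATED/DELETED branch of `SyncEngineService.handleFileEvent` could fit together. The enum values, interface members, and the `SourceRegistry` and `CoreClient` names are assumptions for illustration; the real definitions live in `src/common/` and `src/sources/` and may differ.

```typescript
// Hypothetical shapes: the real enums/interfaces in src/common/ may differ.
enum ProviderType {
  AzureBlob = "azure-blob",
  SharePoint = "sharepoint",
}

enum EventAction {
  Created = "CREATED",
  Deleted = "DELETED",
}

interface NormalisedSourceEvent {
  provider: ProviderType;
  action: EventAction;
  sourceId: string; // assumed stable id of the file in the external store
}

interface KnowledgeSource {
  downloadFile(sourceId: string): Promise<Uint8Array>;
}

// Hypothetical stand-in for the two hbf-core calls the sync engine makes.
interface CoreClient {
  fileToArticles(kbId: string, sourceId: string, content: Uint8Array): Promise<void>;
  deleteKbGroupBySourceId(kbId: string, sourceId: string): Promise<void>;
}

// Maps each provider to the source that knows how to fetch its files.
class SourceRegistry {
  private readonly sources = new Map<ProviderType, KnowledgeSource>();

  register(provider: ProviderType, source: KnowledgeSource): void {
    this.sources.set(provider, source);
  }

  get(provider: ProviderType): KnowledgeSource {
    const source = this.sources.get(provider);
    if (!source) throw new Error(`No source registered for provider ${provider}`);
    return source;
  }
}

// Mirrors the CREATED/DELETED branch described for SyncEngineService.handleFileEvent.
async function handleFileEvent(
  registry: SourceRegistry,
  core: CoreClient,
  kbId: string,
  event: NormalisedSourceEvent,
): Promise<void> {
  switch (event.action) {
    case EventAction.Created: {
      const content = await registry.get(event.provider).downloadFile(event.sourceId);
      await core.fileToArticles(kbId, event.sourceId, content);
      return;
    }
    case EventAction.Deleted:
      // No download needed: the KB group is addressed by the source id alone.
      await core.deleteKbGroupBySourceId(kbId, event.sourceId);
      return;
  }
}
```

Keeping the provider lookup in a registry means the engine never branches on provider type; supporting another store is a matter of registering another `KnowledgeSource`.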
External Dependencies
- hbf-core (via `CORE_BASE_URL` + `CORE_TOKEN`): integration config, KB lookups, fileToArticles, KB group deletion, user auth, SharePoint subscription state persistence, integration updates
- Azure Blob Storage: direct blob download via SAS token (no Azure credentials are stored; the SAS token comes from the integration config in hbf-core)
- Azure Event Grid: inbound webhook delivery of `Microsoft.Storage.BlobCreated` and `Microsoft.Storage.BlobDeleted` events
- Microsoft Graph API (`https://graph.microsoft.com/v1.0`): SharePoint site/drive resolution, file listing and download, delta queries for incremental sync, webhook subscription management (create/renew/delete), item permissions, list item fields (managed metadata). Authenticated via OAuth2 client credentials (`SHAREPOINT_CLIENT_ID` + `SHAREPOINT_CLIENT_SECRET`, per tenant)
- Azure AD / Entra ID (`login.microsoftonline.com`): OAuth2 token endpoint for the Graph API client credentials flow
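The Graph access above relies on the OAuth2 client credentials grant against the Entra ID v2.0 token endpoint. Below is a minimal sketch of a per-tenant token cache in the spirit of `GraphClientService`; the `TenantTokenCache` class, the `Transport` type, and the five-minute refresh margin are hypothetical. The HTTP call is injected so the sketch runs offline; a real implementation would POST the form body with `Content-Type: application/x-www-form-urlencoded`.

```typescript
// Hypothetical sketch of a per-tenant client-credentials token cache.
interface TokenRequest {
  url: string;
  body: string; // application/x-www-form-urlencoded
}

interface TokenResponse {
  access_token: string;
  expires_in: number; // lifetime in seconds, as returned by the token endpoint
}

type Transport = (req: TokenRequest) => Promise<TokenResponse>;

class TenantTokenCache {
  private readonly cache = new Map<string, { token: string; expiresAt: number }>();

  constructor(
    private readonly clientId: string,
    private readonly clientSecret: string,
    private readonly transport: Transport,
    private readonly now: () => number = () => Date.now(),
    private readonly refreshMarginMs = 5 * 60 * 1000, // assumed policy: refresh 5 min early
  ) {}

  async getToken(tenantId: string): Promise<string> {
    const cached = this.cache.get(tenantId);
    if (cached && cached.expiresAt - this.refreshMarginMs > this.now()) {
      return cached.token; // still comfortably valid: no network call
    }
    const pairs: Array<[string, string]> = [
      ["client_id", this.clientId],
      ["client_secret", this.clientSecret],
      ["scope", "https://graph.microsoft.com/.default"],
      ["grant_type", "client_credentials"],
    ];
    const body = pairs.map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`).join("&");
    const res = await this.transport({
      url: `https://login.microsoftonline.com/${encodeURIComponent(tenantId)}/oauth2/v2.0/token`,
      body,
    });
    this.cache.set(tenantId, { token: res.access_token, expiresAt: this.now() + res.expires_in * 1000 });
    return res.access_token;
  }
}
```

Caching per tenant matters because one app registration can serve several customer tenants, and each tenant issues its own tokens.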
API Endpoints
| Method | Path | Auth | Description |
|---|---|---|---|
| POST | /webhooks/azure-blob | None (Event Grid handshake + implicit key via webhook registration) | Receive Azure Event Grid events; handle subscription validation and blob events |
| POST | /webhooks/sharepoint | Optional ?secret= query param (matched against SHAREPOINT_WEBHOOK_SECRET) | Receive Microsoft Graph change notifications; handle subscription validation handshake (?validationToken=) and fire-and-forget delta processing |
| POST | /sync/org/:orgId/integrations/:integrationId/knowledge-bases/:knowledgeBaseId/full | HBFGuard + AdminOrgRoleGuard | Full sync: re-upload all files from the integration's source. For SharePoint, also ensures webhook subscriptions exist after sync |
| POST | /sync/sharepoint/integrations/:integrationId/subscriptions | HBFGuard + AdminOrgRoleGuard | Initialize or recreate Graph webhook subscriptions for a SharePoint integration (body: { organizationId }) |
| GET | /sharepoint/drives | HBFGuard | List available document libraries for a SharePoint site (query: tenantId, siteUrl). Used by hbf-console for drive discovery during integration setup |
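The `POST /webhooks/sharepoint` row combines three checks in a fixed order: validation handshake, optional secret, then an immediate 202. The sketch below expresses that order as a pure function. `WebhookRequest`, `handleSharePointWebhook`, the `enqueue` callback, and the 403 status on a secret mismatch are assumptions, not the controller's actual signature.

```typescript
// Hypothetical sketch; only the ordering of the checks comes from the brief.
interface GraphNotification {
  subscriptionId: string;
  clientState?: string;
}

interface WebhookRequest {
  query: Record<string, string | undefined>;
  body: { value?: GraphNotification[] };
}

type WebhookResponse =
  | { status: 200; contentType: "text/plain"; body: string }
  | { status: 202 | 403 };

function handleSharePointWebhook(
  req: WebhookRequest,
  configuredSecret: string | undefined,
  enqueue: (subscriptionIds: string[]) => void, // stand-in for fire-and-forget delta processing
): WebhookResponse {
  // 1. Subscription validation handshake: echo the token verbatim as text/plain.
  const validationToken = req.query.validationToken;
  if (validationToken !== undefined) {
    return { status: 200, contentType: "text/plain", body: validationToken };
  }
  // 2. Shared-secret check, applied only when SHAREPOINT_WEBHOOK_SECRET is set.
  if (configuredSecret && req.query.secret !== configuredSecret) {
    return { status: 403 }; // assumed status
  }
  // 3. Queue the work, then acknowledge without waiting for it.
  enqueue((req.body.value ?? []).map((n) => n.subscriptionId));
  return { status: 202 };
}
```

The plain `!==` comparison keeps the sketch dependency-free; Node's `crypto.timingSafeEqual` is the usual choice when comparing secrets in production code.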
Event Flow (Webhook Path)
Azure Blob Storage
```text
Azure Event Grid POST /webhooks/azure-blob
  -> AzureBlobWebhookHandler.handleValidation (subscription handshake? return 200)
  -> respond 200 immediately (fire-and-forget processing)
  -> extract accountName + containerName from topic/subject
  -> HbfCoreService.findAllByWebhookKey (lookup integrations by webhookKey)
  -> for each integration:
       -> HbfCoreService.findKnowledgeBasesByIntegration
       -> SyncEngineService.handleFileEvent
            -> CREATED: AzureBlobSource.downloadFile -> HbfCoreService.fileToArticles
            -> DELETED: HbfCoreService.deleteKbGroupBySourceId
```
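The first steps of the Azure Blob path (handshake, then extracting the account and container) depend only on the Event Grid event schema. A minimal sketch, assuming the standard `topic` and `subject` formats for Blob Storage events; `parseBlobEvent`, `validationResponse`, and `ParsedBlobEvent` are hypothetical names, not the `AzureBlobWebhookHandler` API.

```typescript
// Subset of the Event Grid event schema used here.
interface EventGridEvent {
  id: string;
  topic: string;
  subject: string;
  eventType: string;
  data: { validationCode?: string; url?: string };
}

type Action = "CREATED" | "DELETED";

interface ParsedBlobEvent {
  action: Action;
  accountName: string;
  containerName: string;
  blobPath: string;
}

// Event Grid proves endpoint ownership with a validation event; the endpoint
// answers 200 with the code echoed back as { validationResponse }.
function validationResponse(events: EventGridEvent[]): { validationResponse: string } | null {
  const v = events.find((e) => e.eventType === "Microsoft.EventGrid.SubscriptionValidationEvent");
  return v?.data.validationCode ? { validationResponse: v.data.validationCode } : null;
}

const ACTIONS = new Map<string, Action>([
  ["Microsoft.Storage.BlobCreated", "CREATED"],
  ["Microsoft.Storage.BlobDeleted", "DELETED"],
]);

function parseBlobEvent(e: EventGridEvent): ParsedBlobEvent | null {
  const action = ACTIONS.get(e.eventType);
  if (!action) return null; // ignore event types this service does not handle
  // topic: /subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Storage/storageAccounts/{account}
  const account = /\/storageAccounts\/([^/]+)$/.exec(e.topic);
  // subject: /blobServices/default/containers/{container}/blobs/{path/to/blob}
  const blob = /^\/blobServices\/default\/containers\/([^/]+)\/blobs\/(.+)$/.exec(e.subject);
  if (!account || !blob) return null;
  return { action, accountName: account[1], containerName: blob[1], blobPath: blob[2] };
}
```

Returning `null` for unknown event types lets the controller acknowledge everything with 200 while only acting on create/delete.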
SharePoint
```text
Microsoft Graph POST /webhooks/sharepoint
  -> validation handshake? echo ?validationToken= as text/plain 200
  -> verify ?secret= against SHAREPOINT_WEBHOOK_SECRET (if configured)
  -> respond 202 immediately (fire-and-forget processing)
  -> SharePointDeltaService.processNotification (deduplicate by subscriptionId)
       -> decodeClientState -> orgId + integrationId
       -> HbfCoreService.getIntegration -> resolve driveName from subscriptionId
       -> GraphClientService.getDelta (paginated delta query using stored deltaLink)
       -> for each changed item:
            -> CREATED: read managed metadata + RBAC permissions (if sync toggles enabled)
                 -> SharePointSource.downloadFile -> HbfCoreService.fileToArticles
            -> DELETED: HbfCoreService.deleteKbGroupBySourceId
       -> persist new deltaLink to integration via HbfCoreService.updateIntegration
```
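`GraphClientService.getDelta` follows the standard Graph delta protocol: keep requesting `@odata.nextLink` until a page carries `@odata.deltaLink`, which is what gets persisted for the next notification. A hypothetical sketch of that loop, with the page fetcher injected (a real client would GET each URL with a bearer token). `collectDelta`, `DeltaItem`, and `Change` are illustrative names, and treating every non-deleted file as `CREATED` (so modified files are re-uploaded too) is an assumption about how the repo maps changes.

```typescript
// Only the driveItem facets used here.
interface DeltaItem {
  id: string;
  name?: string;
  file?: Record<string, unknown>;
  folder?: Record<string, unknown>;
  deleted?: { state?: string };
}

interface DeltaPage {
  value: DeltaItem[];
  "@odata.nextLink"?: string;
  "@odata.deltaLink"?: string;
}

type Change = { action: "CREATED" | "DELETED"; itemId: string; name?: string };

async function collectDelta(
  startUrl: string, // stored deltaLink, or the drive's root delta URL on first run
  fetchPage: (url: string) => Promise<DeltaPage>,
): Promise<{ changes: Change[]; deltaLink: string }> {
  // An item can appear more than once in a delta feed; keep its last occurrence.
  const latest = new Map<string, DeltaItem>();
  let url: string | undefined = startUrl;
  while (url) {
    const page: DeltaPage = await fetchPage(url);
    for (const item of page.value) {
      latest.delete(item.id); // re-insert so iteration order follows the latest position
      latest.set(item.id, item);
    }
    const deltaLink = page["@odata.deltaLink"];
    if (deltaLink) {
      const changes: Change[] = [];
      latest.forEach((item) => {
        if (item.deleted) changes.push({ action: "DELETED", itemId: item.id, name: item.name });
        else if (item.file) changes.push({ action: "CREATED", itemId: item.id, name: item.name });
        // folders and the drive root carry no file content and are skipped
      });
      return { changes, deltaLink };
    }
    url = page["@odata.nextLink"];
  }
  throw new Error("Delta feed ended without @odata.deltaLink");
}
```

Checking `deleted` before `file` matters because deleted items may come back without the `file` facet.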
SharePoint Subscription Lifecycle
```text
On module init:
  -> SharePointSubscriptionService.sweepAndRenew (immediately, then every 12 hours)
       -> HbfCoreService.listSharepointActiveSubscriptions
       -> for each subscription expiring within 48 hours:
            -> GraphClientService.renewSubscription
            -> persist new expiry via HbfCoreService.updateIntegration
```
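The renewal sweep reduces to a time-window filter. This sketch assumes an `ActiveSubscription` shape and a hypothetical 72-hour extension per renewal; only the 48-hour window comes from this brief, and `planRenewals` is an illustrative name. Already-expired entries are skipped, since an expired Graph subscription generally has to be recreated rather than renewed.

```typescript
interface ActiveSubscription {
  subscriptionId: string;
  integrationId: string;
  expirationDateTime: string; // ISO 8601, as Graph reports it
}

interface RenewalPlan {
  subscriptionId: string;
  integrationId: string;
  newExpiry: string;
}

const HOUR_MS = 60 * 60 * 1000;
const RENEW_WINDOW_MS = 48 * HOUR_MS; // from the brief: renew when expiring within 48 hours
const EXTENSION_MS = 72 * HOUR_MS; // hypothetical lifetime granted per renewal

function planRenewals(subs: ActiveSubscription[], now: Date): RenewalPlan[] {
  return subs
    .filter((s) => {
      const remaining = Date.parse(s.expirationDateTime) - now.getTime();
      // remaining <= 0 means already expired: leave it for recreation, not renewal.
      return remaining > 0 && remaining <= RENEW_WINDOW_MS;
    })
    .map((s) => ({
      subscriptionId: s.subscriptionId,
      integrationId: s.integrationId,
      newExpiry: new Date(now.getTime() + EXTENSION_MS).toISOString(),
    }));
}
```

With a 12-hour sweep and a 48-hour window, each subscription is seen by about four sweeps (48 / 12 = 4) before it expires, so a few consecutive failed renewals are tolerated. Graph caps subscription lifetime per resource type, so any real extension must stay under that limit.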
```text
On full sync (SharePoint provider):
  -> SharePointSubscriptionService.ensureSubscription
       -> create missing subscriptions, remove orphaned ones (drives removed from config)
       -> establish baseline deltaLink for new drives (consume initial delta to skip existing items)
```
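The create/remove half of `ensureSubscription` is a set difference between configured drives and existing subscriptions. A minimal sketch, keyed by a hypothetical `driveId` (the brief resolves drives by name, so the real key may differ); `reconcile`, `DriveSubscription`, and `ReconcilePlan` are illustrative names.

```typescript
interface DriveSubscription {
  driveId: string; // hypothetical key; the brief resolves drives by name
  subscriptionId: string;
}

interface ReconcilePlan {
  toCreate: string[];
  toDelete: DriveSubscription[];
}

function reconcile(configuredDriveIds: string[], existing: DriveSubscription[]): ReconcilePlan {
  const configured = new Set(configuredDriveIds); // also drops duplicate config entries
  const subscribed = new Set(existing.map((s) => s.driveId));
  return {
    // New drives: create a subscription, then record a baseline deltaLink so
    // files already in the drive are not re-imported by the first notification.
    toCreate: Array.from(configured).filter((d) => !subscribed.has(d)),
    // Orphans: the drive was removed from the integration config, so delete in Graph.
    toDelete: existing.filter((s) => !configured.has(s.driveId)),
  };
}
```

For the baseline step, Graph's driveItem delta also accepts `token=latest`, which returns a `deltaLink` without enumerating existing items; the brief describes consuming the initial delta instead, and both leave pre-existing files untouched.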
Running Locally
```bash
npm run start:dev   # watch mode (ELASTIC_APM_ACTIVE=false by default)
npm run start       # normal start
```
Tests
```bash
npm test
npm run test:cov
```