Architecture: hbf-knowledge-manager

Internal Component Diagram

Module Dependency Graph

AppModule
  HbfCoreModule (provides HbfCoreService, global)
  SourcesModule
    AzureBlobModule (provides AzureBlobSource, AzureBlobWebhookHandler)
    SharePointModule (provides GraphClientService, SharePointSource, SharePointWebhookHandler, SharePointSubscriptionService, SharePointDeltaService; controller: SharePointController; imports SyncModule via forwardRef)
    ResolverModule (global; maps ProviderType aliases to Source/Handler instances)
  SyncModule
    HbfCoreModule
    SourcesModule
    SyncController
    SyncEngineService
    SyncIntegrationResolverService
    SharePointSubscriptionService (injected into SyncController)
    HBFGuard, AdminOrgRoleGuard
  WebhooksModule
    HbfCoreModule
    SourcesModule
    SyncModule
    WebhooksController (injects SharePointDeltaService for Graph notifications)

Key Design Decisions

Fire-and-forget webhook processing. The webhook controller ACKs Azure Event Grid with HTTP 200 before any sync work begins. This avoids Event Grid retry storms if hbf-core is slow or the blob download is large.
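
A minimal sketch of the ACK-first pattern, assuming a simplified event shape. The names handleWebhook and processEvents are hypothetical and are not taken from the actual controller; the point is only that the 200 is produced before any sync work runs, and that failures are logged rather than returned to Event Grid.

```typescript
// Hypothetical event shape; real Event Grid events carry more fields.
interface BlobEvent {
  id: string;
  subject: string;
}

// Stand-in for the real sync work (integration lookup, blob download, push to hbf-core).
async function processEvents(events: BlobEvent[], log: string[]): Promise<void> {
  for (const event of events) {
    log.push(`synced ${event.id}`);
  }
}

// Returns the ACK right away; the sync work is scheduled, not awaited.
function handleWebhook(
  events: BlobEvent[],
  log: string[],
): { status: number; done: Promise<void> } {
  const done = Promise.resolve()
    .then(() => processEvents(events, log))
    .catch((err: unknown) => {
      // The caller already has its 200, so errors can only be logged here.
      log.push(`sync failed: ${String(err)}`);
    });
  return { status: 200, done };
}
```

The done promise is returned only so that tests can wait for the background work; a real controller would normally not expose it.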

Webhook key routing. Integration lookup uses a composite key <accountName>:<containerName> (built by buildAzureBlobWebhookKey). A single Event Grid subscription can match multiple integrations (e.g., different orgs sharing the same container).
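
As an illustration of the fan-out, the sketch below builds the composite key and groups integrations under it. The signature of buildAzureBlobWebhookKey and the AzureBlobIntegration shape are assumptions; the real helper may normalise its inputs differently.

```typescript
// Hypothetical integration shape, reduced to the fields used for routing.
interface AzureBlobIntegration {
  orgId: string;
  integrationId: string;
  accountName: string;
  containerName: string;
}

// Assumed signature: joins account and container into "<accountName>:<containerName>".
function buildAzureBlobWebhookKey(accountName: string, containerName: string): string {
  return `${accountName}:${containerName}`;
}

// Groups integrations by webhook key so one event can reach every match.
function indexByWebhookKey(
  integrations: AzureBlobIntegration[],
): Map<string, AzureBlobIntegration[]> {
  const index = new Map<string, AzureBlobIntegration[]>();
  for (const integration of integrations) {
    const key = buildAzureBlobWebhookKey(integration.accountName, integration.containerName);
    const bucket = index.get(key) ?? [];
    bucket.push(integration);
    index.set(key, bucket);
  }
  return index;
}
```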

sourceId stability. Azure Blob source IDs are built as azure-blob:<accountName>:<containerName>:<blobPath>. This stable key is what hbf-core uses to identify KB groups for upsert/delete, making re-ingestion idempotent at the group level.
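
The effect of a stable key can be sketched with an in-memory map standing in for hbf-core's KB groups. buildAzureBlobSourceId and FakeKbStore are hypothetical names used only for this illustration.

```typescript
// Builds "azure-blob:<accountName>:<containerName>:<blobPath>".
function buildAzureBlobSourceId(
  accountName: string,
  containerName: string,
  blobPath: string,
): string {
  return `azure-blob:${accountName}:${containerName}:${blobPath}`;
}

// In-memory stand-in for hbf-core's KB groups, keyed by sourceId.
class FakeKbStore {
  private groups = new Map<string, string>();

  // Re-ingesting the same sourceId replaces the group instead of adding a new one.
  upsert(sourceId: string, content: string): void {
    this.groups.set(sourceId, content);
  }

  delete(sourceId: string): void {
    this.groups.delete(sourceId);
  }

  get(sourceId: string): string | undefined {
    return this.groups.get(sourceId);
  }

  get size(): number {
    return this.groups.size;
  }
}
```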

pathPrefix filtering. If an integration has a pathPrefix, blob events outside that prefix are silently skipped. Full-sync passes the prefix to listFiles to scope the listing.

allowedContentTypes filtering. Full sync skips any blob whose contentType starts with none of the entries in allowedContentTypes (entries are matched as prefixes). Webhook-triggered downloads are not filtered by content type (hbf-core handles segmentation).
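
The asymmetry between the two paths (prefix and content type on full sync, prefix only on webhook events) can be sketched as follows. The FilterConfig shape and function names are hypothetical, and treating an empty or missing allowedContentTypes list as "allow everything" is an assumption rather than documented behaviour.

```typescript
interface FilterConfig {
  pathPrefix?: string;
  allowedContentTypes?: string[];
}

// True when no prefix is configured or the blob path sits under it.
function matchesPrefix(blobPath: string, config: FilterConfig): boolean {
  return !config.pathPrefix || blobPath.startsWith(config.pathPrefix);
}

// Prefix match against each allowed entry, e.g. "text/" matches "text/markdown".
function matchesContentType(contentType: string, config: FilterConfig): boolean {
  const allowed = config.allowedContentTypes;
  // Assumption: an empty or missing list disables the filter.
  if (!allowed || allowed.length === 0) return true;
  return allowed.some((prefix) => contentType.startsWith(prefix));
}

// Full sync applies both filters.
function includeInFullSync(
  blob: { path: string; contentType: string },
  config: FilterConfig,
): boolean {
  return matchesPrefix(blob.path, config) && matchesContentType(blob.contentType, config);
}

// Webhook-triggered processing applies only the prefix filter.
function includeFromWebhook(blobPath: string, config: FilterConfig): boolean {
  return matchesPrefix(blobPath, config);
}
```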

No database. hbf-knowledge-manager has no local data store. All state (integrations, KB mappings, articles) lives in hbf-core.

Extensible source architecture. Adding a new provider requires: a new ProviderType enum value, a new source class implementing KnowledgeSource (or FileBasedKnowledgeSource), an optional WebhookHandler, and registration in ResolverModule. The sync engine and webhook controller are provider-agnostic.
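
A rough sketch of what alias-based resolution might look like. Everything here is illustrative: the listFiles signature, the SourceResolver class, and the provider string values are assumptions, and ProviderType is modelled as a const object rather than the enum the codebase uses so that the snippet runs in type-stripping runtimes.

```typescript
// Stand-in for the ProviderType enum; the string values are assumed.
const ProviderType = {
  AzureBlob: "azure-blob",
  SharePoint: "sharepoint",
} as const;
type ProviderType = (typeof ProviderType)[keyof typeof ProviderType];

// Hypothetical minimal contract for a source.
interface KnowledgeSource {
  readonly provider: ProviderType;
  listFiles(pathPrefix?: string): Promise<string[]>;
}

// Maps provider names and aliases (case-insensitive) to source instances.
class SourceResolver {
  private readonly sources = new Map<string, KnowledgeSource>();

  register(source: KnowledgeSource, aliases: string[] = []): void {
    for (const name of [source.provider, ...aliases]) {
      this.sources.set(name.toLowerCase(), source);
    }
  }

  resolve(alias: string): KnowledgeSource {
    const source = this.sources.get(alias.toLowerCase());
    if (!source) throw new Error(`No source registered for provider "${alias}"`);
    return source;
  }
}

// Toy source used to show that callers depend only on the interface.
class InMemorySource implements KnowledgeSource {
  readonly provider: ProviderType = ProviderType.AzureBlob;
  private readonly files: string[];

  constructor(files: string[]) {
    this.files = files;
  }

  async listFiles(pathPrefix: string = ""): Promise<string[]> {
    return this.files.filter((file) => file.startsWith(pathPrefix));
  }
}
```

Because callers only ever hold a KnowledgeSource, a new provider in this sketch needs nothing beyond one more register call.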

SharePoint delta sync (not per-file webhooks). Unlike Azure Event Grid, which delivers per-file events, Microsoft Graph change notifications only signal that something changed in a drive. The file-level changes are then resolved by running a delta query against the Graph API. Delta links are persisted per drive in the integration record in hbf-core, which makes incremental sync resumable across restarts.
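
The paging loop behind a delta query can be sketched as below. Graph delta responses carry an @odata.nextLink while more pages remain and an @odata.deltaLink on the final page; the fetchPage callback, the DriveItemChange shape, and the runDelta name are hypothetical stand-ins for the actual Graph client and SharePointDeltaService code.

```typescript
// Reduced view of a changed drive item; deleted items carry a "deleted" facet.
interface DriveItemChange {
  id: string;
  name?: string;
  deleted?: { state?: string };
}

interface DeltaPage {
  value: DriveItemChange[];
  "@odata.nextLink"?: string;
  "@odata.deltaLink"?: string;
}

// Injected so the loop can be exercised without calling Graph.
type FetchPage = (url: string) => Promise<DeltaPage>;

// Follows nextLink until the final page, then returns the deltaLink to persist.
async function runDelta(
  startUrl: string,
  fetchPage: FetchPage,
): Promise<{ changes: DriveItemChange[]; deltaLink?: string }> {
  const changes: DriveItemChange[] = [];
  let url: string | undefined = startUrl;
  let deltaLink: string | undefined;
  while (url) {
    const page: DeltaPage = await fetchPage(url);
    changes.push(...page.value);
    deltaLink = page["@odata.deltaLink"] ?? deltaLink;
    url = page["@odata.nextLink"];
  }
  return { changes, deltaLink };
}
```

On the next notification, the persisted deltaLink would be passed as startUrl so that only changes since the previous run are returned.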

SharePoint subscription lifecycle. Graph webhook subscriptions expire after at most 29 days. SharePointSubscriptionService runs a sweep-and-renew timer every 12 hours (plus an immediate check on startup) and renews any subscription that expires within 48 hours. Subscriptions are keyed by driveName (the human-readable library name), not driveId. When drives are added to or removed from an integration, ensureSubscription creates the missing subscriptions and deletes orphaned ones.
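
The renewal window and the drive reconciliation can be expressed as pure functions, sketched here with the numbers from the paragraph above. The TrackedSubscription shape and the function names are hypothetical; the real service also talks to Graph and hbf-core, which is omitted.

```typescript
const HOUR_MS = 60 * 60 * 1000;
const SWEEP_INTERVAL_MS = 12 * HOUR_MS; // how often the sweep would be scheduled
const RENEW_WINDOW_MS = 48 * HOUR_MS; // renew anything expiring within this window
const MAX_LIFETIME_MS = 29 * 24 * HOUR_MS; // longest expiration requested on renewal

interface TrackedSubscription {
  driveName: string;
  subscriptionId: string;
  expiresAt: Date;
}

// Subscriptions that are already expired or expire within the renewal window.
function selectForRenewal(subs: TrackedSubscription[], now: Date): TrackedSubscription[] {
  return subs.filter((sub) => sub.expiresAt.getTime() - now.getTime() <= RENEW_WINDOW_MS);
}

// New expiration to request when creating or renewing a subscription.
function nextExpiration(now: Date): Date {
  return new Date(now.getTime() + MAX_LIFETIME_MS);
}

// Compares configured drives with existing subscriptions, both keyed by driveName.
function diffDrives(
  configured: string[],
  subscribed: string[],
): { toCreate: string[]; toDelete: string[] } {
  const configuredSet = new Set(configured);
  const subscribedSet = new Set(subscribed);
  return {
    toCreate: configured.filter((name) => !subscribedSet.has(name)),
    toDelete: subscribed.filter((name) => !configuredSet.has(name)),
  };
}
```

A sweep in this sketch would call selectForRenewal once at startup and then on a timer every SWEEP_INTERVAL_MS.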

SharePoint sourceId stability. Source IDs use a SHA-256 hash of sharepoint:<siteId>:<driveId>:<itemId>, producing a URL-safe sp_<digest> key. This is stable across file renames (unlike path-based keys) and deterministic for idempotent upserts.
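
One way such a key could be derived with Node's built-in crypto module is shown below. The function name is hypothetical, and hex encoding is an assumption: the text above only says the digest is URL-safe, so the real implementation may use a different encoding or truncate the digest.

```typescript
import { createHash } from "node:crypto";

// Hashes "sharepoint:<siteId>:<driveId>:<itemId>" and prefixes the digest with "sp_".
function buildSharePointSourceId(siteId: string, driveId: string, itemId: string): string {
  const input = `sharepoint:${siteId}:${driveId}:${itemId}`;
  // Assumption: hex output, which contains only [0-9a-f] and is therefore URL-safe.
  const digest = createHash("sha256").update(input, "utf8").digest("hex");
  return `sp_${digest}`;
}
```

Since the file name and path are not part of the input, a rename or move within the drive leaves the key unchanged as long as the item ID is the same.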

Sync toggles. SharePoint integrations support syncManagedMetadata and syncRbacPermissions boolean config fields (default: true). When enabled, full sync and delta processing fetch list item fields (for taxonomy extraction) and item permissions (for RBAC normalization) from Graph API, attaching them as article metadata.
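
A small sketch of how the default-true toggles might gate the extra Graph calls. The MetadataFetchers interface, the metadata key names, and both function names are hypothetical; only the two config field names and their defaults come from the description above.

```typescript
interface SharePointSyncConfig {
  syncManagedMetadata?: boolean;
  syncRbacPermissions?: boolean;
}

// Applies the documented default: a missing toggle counts as enabled.
function resolveToggles(config: SharePointSyncConfig): Required<SharePointSyncConfig> {
  return {
    syncManagedMetadata: config.syncManagedMetadata ?? true,
    syncRbacPermissions: config.syncRbacPermissions ?? true,
  };
}

// Hypothetical stand-ins for the Graph calls that fetch fields and permissions.
interface MetadataFetchers {
  fetchFields(itemId: string): Promise<Record<string, unknown>>;
  fetchPermissions(itemId: string): Promise<unknown[]>;
}

// Fetches only what the toggles ask for and returns it as article metadata.
async function collectMetadata(
  itemId: string,
  config: SharePointSyncConfig,
  fetchers: MetadataFetchers,
): Promise<Record<string, unknown>> {
  const toggles = resolveToggles(config);
  const metadata: Record<string, unknown> = {};
  if (toggles.syncManagedMetadata) {
    metadata["fields"] = await fetchers.fetchFields(itemId);
  }
  if (toggles.syncRbacPermissions) {
    metadata["permissions"] = await fetchers.fetchPermissions(itemId);
  }
  return metadata;
}
```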

clientState routing. SharePoint webhook notifications include a clientState field that is set during subscription creation and encoded as <orgId>:<integrationId>. This allows the delta service to resolve the correct integration without a webhook key lookup (unlike Azure Blob, which uses accountName:containerName).
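
Encoding and parsing that value might look like the following sketch. The function names are hypothetical, and splitting on the first colon assumes orgId never contains one. The length check reflects the 128-character limit that Microsoft Graph documents for clientState.

```typescript
// Microsoft Graph limits clientState to 128 characters.
const MAX_CLIENT_STATE_LENGTH = 128;

// Builds "<orgId>:<integrationId>" for use when creating a subscription.
function encodeClientState(orgId: string, integrationId: string): string {
  // Assumption: orgId contains no ':' so the first colon is always the separator.
  if (orgId.includes(":")) throw new Error("orgId must not contain ':'");
  const state = `${orgId}:${integrationId}`;
  if (state.length > MAX_CLIENT_STATE_LENGTH) {
    throw new Error("clientState exceeds 128 characters");
  }
  return state;
}

// Returns null for values that do not look like "<orgId>:<integrationId>".
function parseClientState(
  clientState: string,
): { orgId: string; integrationId: string } | null {
  const separator = clientState.indexOf(":");
  if (separator <= 0 || separator === clientState.length - 1) return null;
  return {
    orgId: clientState.slice(0, separator),
    integrationId: clientState.slice(separator + 1),
  };
}
```

Returning null instead of throwing lets a handler drop notifications with an unrecognised clientState without failing the whole batch.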