# AI Brief: hbf-nlp
NestJS NLP service that handles intent classification, entity extraction, language detection, LLM orchestration, and RAG pipeline routing for the Helvia Chatbricks platform. Offloads all NLP processing that previously ran inside hbf-core.
## What This Repo Does
Receives user messages and routes them through one of three NLP pipeline backends: Helvia RAG Pipelines (vector search + LLM generation), Helvia NLP Specification (custom NLP), or Dialogflow. Also exposes LLM endpoints for chat session analysis, language detection, and direct LLM requests. Persists per-message NLP metadata (pipeline used, confidence, timings) to MySQL.
## Tech Stack
- Language: TypeScript
- Framework: NestJS 11
- Key dependencies: @helvia/hbf-core-api, @nestjs/typeorm + typeorm (MySQL), @google-cloud/dialogflow, @google/genai, axios, @nestjs/cache-manager + keyv/redis, @nestjs/schedule, nestjs-pino + @elastic/ecs-pino-format, elastic-apm-node
- Dev dependencies: openai
## Entry Points
- Main: `src/main.ts`
- App module: `src/app.module.ts`
- Config: `.env` / `.env.local` (`envFilePath` in `ConfigModule`)
## Key Directories

| Directory | Purpose |
|---|---|
| `src/nlp/` | Core NLP pipeline routing, training, and processing logic |
| `src/nlp/providers/` | Pipeline provider implementations (RAG, NLP Spec, Dialogflow) |
| `src/nlp/clients/` | HTTP clients for downstream pipeline APIs |
| `src/llm/` | LLM orchestration: analyze session, detect language, direct LLM request |
| `src/llm/providers/` | Abstract LlmProvider plus OpenAI, Azure, and Gemini implementations |
| `src/llm/clients/` | Low-level API clients for Azure and OpenAI |
| `src/generation/` | Text generation module with per-provider generation providers |
| `src/core/` | hbf-core-api wrapper (tenants, pipelines, sessions, activities) |
| `src/models/` | NLP model CRUD (TypeORM entities via models.module) |
| `src/test-set/` | Test set management for NLP pipeline evaluation |
| `src/scheduler/` | Scheduled tasks (e.g., polling training status) |
| `src/notifications/` | Push notifications to hbf-core on training events |
| `src/entities/` | TypeORM entity MessageMetadata (per-message NLP trace) |
| `migrations/` | TypeORM migrations (MySQL) |
| `src/utils/` | String similarity, session utils, knowledge base article utils |
| `src/guards/` | HBFGuard, JWTGuard, role guards (CanReadTenant, CanManageTenant, etc.) |
## API Surface

Key endpoints:
NLP Processing:
- `POST /organizations/:orgId/tenants/:tenantId/process` — process a message using the tenant's default pipeline (language-aware)
- `POST /organizations/:orgId/tenants/:tenantId/nlp-pipelines/:pipelineId/process` — process with a specific pipeline
- `POST /tenants/:tenantId/process` — moderator-only processing (no org scoping)
NLP Training:
- `POST /organizations/:orgId/tenants/:tenantId/train` — train all pipelines for a tenant
- `POST /organizations/:orgId/tenants/:tenantId/nlp-pipelines/:pipelineId/train` — train a specific pipeline
- `POST /tenants/:tenantId/train` — moderator-level tenant train
- `POST /nlp-pipelines/:pipelineId/train` — moderator-level single-pipeline train
LLM:
- `POST /organizations/:orgId/tenants/:tenantId/sessions/:sessionId/analyze` — analyze/summarize a chat session
- `POST /llm-request` — direct LLM completion request (hbf-bot only, JWT auth)
- `POST /detect-language` — detect message language (hbf-bot only, JWT auth)
- `GET /organizations/:orgId/categories/:category/prompt` — retrieve the plugin prompt for a category
- `GET /organizations/:orgId/providers/:alias/version` — get an LLM provider's API version
Message Metadata:
- Endpoints in `src/nlp/message-metadata/` (read the per-message NLP trace)
## External Dependencies
- LLM Providers: OpenAI, Azure OpenAI, Google Gemini, Google Dialogflow
- Database: MySQL (via TypeORM)
- Cache: Redis (optional, falls back to in-memory via CacheableMemory)
- hbf-core-api: tenant/pipeline/session/activity data
- helvia-rag-pipelines: downstream RAG pipeline service (HTTP)
- Helvia NLP Specification service: downstream NLP spec pipeline (HTTP)
- APM: Elastic APM
## Running Locally

```bash
npm install
npm run start:dev
```
Swagger UI is available at `/api` while the app is running.
## Tests

```bash
npm test          # unit tests
npm run test:e2e  # e2e tests
npm run test:cov  # coverage report
```