AI Brief: hbf-nlp

NestJS NLP service that handles intent classification, entity extraction, language detection, LLM orchestration, and RAG pipeline routing for the Helvia Chatbricks platform. Offloads all NLP processing that previously ran inside hbf-core.

What This Repo Does

Receives user messages and routes them through one of three NLP pipeline backends: Helvia RAG Pipelines (vector search + LLM generation), Helvia NLP Specification (custom NLP), or Dialogflow. Also exposes LLM endpoints for chat session analysis, language detection, and direct LLM requests. Persists per-message NLP metadata (pipeline used, confidence, timings) to MySQL.
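
Conceptually, each incoming message is matched to its tenant's pipeline configuration and dispatched to one backend. A minimal sketch of that dispatch (the interface and names below are illustrative, not the actual code in src/nlp/providers/):

// Hypothetical provider interface; the real one lives in src/nlp/providers/.
type PipelineType = 'rag' | 'nlp-spec' | 'dialogflow';

interface NlpProvider {
  process(tenantId: string, text: string): Promise<{ intent?: string; confidence?: number; answer?: string }>;
}

// Pick the provider implementation that matches the pipeline configured for the tenant.
function resolveProvider(type: PipelineType, providers: Record<PipelineType, NlpProvider>): NlpProvider {
  const provider = providers[type];
  if (!provider) throw new Error(`Unsupported pipeline type: ${type}`);
  return provider;
}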

Tech Stack

  • Language: TypeScript
  • Framework: NestJS 11
  • Key dependencies: @helvia/hbf-core-api, @nestjs/typeorm + typeorm (MySQL), @google-cloud/dialogflow, @google/genai, axios, @nestjs/cache-manager + keyv/redis, @nestjs/schedule, nestjs-pino + @elastic/ecs-pino-format, elastic-apm-node
  • Dev dependencies: openai (dev only)
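
One concrete example of how two of those pieces typically fit together: nestjs-pino with the ECS formatter, so logs line up with Elastic APM traces. A sketch based on those libraries' documented usage, not necessarily the exact wiring in this repo:

import { Module } from '@nestjs/common';
import { LoggerModule } from 'nestjs-pino';
import { ecsFormat } from '@elastic/ecs-pino-format';

@Module({
  imports: [
    // Emit ECS-formatted JSON so Elasticsearch/Kibana can correlate logs with APM data.
    LoggerModule.forRoot({ pinoHttp: { ...ecsFormat() } }),
  ],
})
export class AppLoggingModule {}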

Entry Points

  • Main: src/main.ts
  • App module: src/app.module.ts
  • Config: .env / .env.local (envFilePath in ConfigModule)
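
A minimal sketch of what that ConfigModule setup implies (isGlobal and the exact file order are assumptions; check src/app.module.ts for the real options):

import { Module } from '@nestjs/common';
import { ConfigModule } from '@nestjs/config';

@Module({
  imports: [
    ConfigModule.forRoot({
      isGlobal: true,                       // assumption: config available app-wide
      envFilePath: ['.env.local', '.env'],  // variables from the first matching file win
    }),
  ],
})
export class AppModule {}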

Key Directories

Directory            Purpose
src/nlp/             Core NLP pipeline routing, training, and processing logic
src/nlp/providers/   Pipeline provider implementations (RAG, NLP Spec, Dialogflow)
src/nlp/clients/     HTTP clients to call downstream pipeline APIs
src/llm/             LLM orchestration: analyze session, detect language, direct LLM request
src/llm/providers/   Abstract LlmProvider + OpenAI, Azure, Gemini implementations
src/llm/clients/     Low-level API clients for Azure and OpenAI
src/generation/      Text generation module with per-provider generation providers
src/core/            hbf-core-api wrapper (tenants, pipelines, sessions, activities)
src/models/          NLP model CRUD (TypeORM entities via models.module)
src/test-set/        Test set management for NLP pipeline evaluation
src/scheduler/       Scheduled tasks (e.g., poll training status)
src/notifications/   Push notifications to hbf-core on training events
src/entities/        TypeORM entity: MessageMetadata (NLP trace per message)
migrations/          TypeORM migrations (MySQL)
src/utils/           String similarity, session utils, knowledge base article utils
src/guards/          HBFGuard, JWTGuard, role guards (CanReadTenant, CanManageTenant, etc.)
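
The MessageMetadata entity in src/entities/ holds the per-message NLP trace mentioned above. A hypothetical sketch of its shape (column names here are illustrative, not the actual schema):

import { Column, CreateDateColumn, Entity, PrimaryGeneratedColumn } from 'typeorm';

@Entity()
export class MessageMetadata {
  @PrimaryGeneratedColumn()
  id: number;

  @Column()
  messageId: string;                 // message the trace belongs to

  @Column()
  pipelineId: string;                // pipeline that handled the message

  @Column({ type: 'float', nullable: true })
  confidence?: number;               // intent/pipeline confidence, when available

  @Column({ type: 'int' })
  processingTimeMs: number;          // end-to-end NLP timing

  @CreateDateColumn()
  createdAt: Date;
}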

API Surface

Key endpoints discovered:

NLP Processing:

  • POST /organizations/:orgId/tenants/:tenantId/process — process a message using the tenant's default pipeline (language-aware)
  • POST /organizations/:orgId/tenants/:tenantId/nlp-pipelines/:pipelineId/process — process with a specific pipeline
  • POST /tenants/:tenantId/process — moderator-only process (no org scoping)
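
As a hedged example, the org-scoped process endpoint might be called along these lines (the request and response shapes are assumptions; the real DTOs live in src/nlp/):

import axios from 'axios';

async function processMessage(baseUrl: string, orgId: string, tenantId: string, token: string) {
  const { data } = await axios.post(
    `${baseUrl}/organizations/${orgId}/tenants/${tenantId}/process`,
    { text: 'Where is my order?', language: 'en' },      // hypothetical payload
    { headers: { Authorization: `Bearer ${token}` } },
  );
  return data; // e.g. intent, confidence, answer, pipeline used
}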

NLP Training:

  • POST /organizations/:orgId/tenants/:tenantId/train — train all pipelines for a tenant
  • POST /organizations/:orgId/tenants/:tenantId/nlp-pipelines/:pipelineId/train — train a specific pipeline
  • POST /tenants/:tenantId/train — moderator-level tenant train
  • POST /nlp-pipelines/:pipelineId/train — moderator-level single pipeline train

LLM:

  • POST /organizations/:orgId/tenants/:tenantId/sessions/:sessionId/analyze — analyze/summarize a chat session
  • POST /llm-request — direct LLM completion request (hbf-bot only, JWT auth)
  • POST /detect-language — detect message language (hbf-bot only, JWT auth)
  • GET /organizations/:orgId/categories/:category/prompt — retrieve plugin prompt for a category
  • GET /organizations/:orgId/providers/:alias/version — get LLM provider API version
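
A hedged sketch of the hbf-bot-facing language detection call (the body and response shapes are assumptions):

import axios from 'axios';

async function detectLanguage(baseUrl: string, jwt: string, text: string) {
  // /detect-language is restricted to hbf-bot and authenticated with a JWT.
  const { data } = await axios.post(
    `${baseUrl}/detect-language`,
    { text },                                            // assumed body shape
    { headers: { Authorization: `Bearer ${jwt}` } },
  );
  return data; // e.g. { language: 'el' }
}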

Message Metadata:

  • Endpoints in src/nlp/message-metadata/ (read per-message NLP trace)

External Dependencies

  • LLM Providers: OpenAI, Azure OpenAI, Google Gemini, Google Dialogflow
  • Database: MySQL (via TypeORM)
  • Cache: Redis (optional, falls back to in-memory via CacheableMemory)
  • hbf-core-api: tenant/pipeline/session/activity data
  • helvia-rag-pipelines: downstream RAG pipeline service (HTTP)
  • Helvia NLP Specification service: downstream NLP spec pipeline (HTTP)
  • APM: Elastic APM
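
The Redis-with-in-memory-fallback cache usually follows the standard NestJS 11 cache-manager/Keyv wiring. A sketch under that assumption (REDIS_URL is a placeholder, not necessarily the variable this repo reads):

import { Module } from '@nestjs/common';
import { CacheModule } from '@nestjs/cache-manager';
import { createKeyv } from '@keyv/redis';
import { Keyv } from 'keyv';
import { CacheableMemory } from 'cacheable';

@Module({
  imports: [
    CacheModule.registerAsync({
      useFactory: () => ({
        stores: process.env.REDIS_URL
          ? [createKeyv(process.env.REDIS_URL)]          // use Redis when configured
          : [new Keyv({ store: new CacheableMemory({ ttl: 60_000, lruSize: 5000 }) })], // in-memory fallback
      }),
    }),
  ],
})
export class AppCacheModule {}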

Running Locally

npm install
npm run start:dev

Swagger UI available at /api when running.
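
That suggests a bootstrap roughly like the following in src/main.ts (a sketch assuming @nestjs/swagger; the actual setup may differ):

import { NestFactory } from '@nestjs/core';
import { DocumentBuilder, SwaggerModule } from '@nestjs/swagger';
import { AppModule } from './app.module';

async function bootstrap() {
  const app = await NestFactory.create(AppModule);

  // Serve the OpenAPI UI at /api.
  const config = new DocumentBuilder().setTitle('hbf-nlp').build();
  const document = SwaggerModule.createDocument(app, config);
  SwaggerModule.setup('api', app, document);

  await app.listen(process.env.PORT ?? 3000);
}
bootstrap();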

Tests

npm test            # unit tests
npm run test:e2e    # e2e tests
npm run test:cov    # coverage report