Data Model: helvia-rag-pipelines
Domain objects used by this service. Python (FastAPI) application with SQLAlchemy ORM and vector database.
Objects This Service Uses
This service does not import from hbf-core-api. It receives requests from hbf-nlp and hbf-bot and manages its own data stores.
Local Entities
| Entity | Table | DB | Key Fields |
|---|---|---|---|
| Pipeline | pipelines | MySQL | id (uuid str), name, tenant_id, organization_id, status, configuration_json, consumed_llm_tokens, vdb_collection, last_trained_at |
| CorpusItem | corpus | MySQL | int_id (PK), id, title, body, language, training_text, tags, url, group, need_training, pipeline_id (FK), shared_corpus_id |
| HistoryMessage | history_messages | MySQL | id (auto), session_id, language, role, content, pipeline_id (FK) |
| SemCacheStats | sem_cache_stats | MySQL | id (auto), cache_id, query, query_processed, query_language, session_id, generated_text, examined_corpus, cache_match, pipeline_id (FK) |
| NLPProvider | nlp_providers | MySQL | id (auto), name, platform (enum), scope (enum), url, api_key |
| PipelineAnalytics | pipeline_analytics | MySQL | id (auto), pipeline_id, tenant_id, organization_id, session_id, record_type, operation, provider, model, metric, record_value |
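The relationship between `pipelines` and `corpus`, together with the `(id, pipeline_id)` uniqueness rule listed under Notes, can be illustrated with a minimal sketch. It uses an in-memory SQLite database as a stand-in for MySQL so it runs with only the standard library. The column types, the reduced column set, and the `add_corpus_item` helper are assumptions for illustration, not the service's actual schema or code.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE pipelines (
        id TEXT PRIMARY KEY,              -- uuid string
        name TEXT NOT NULL,
        tenant_id TEXT,
        organization_id TEXT
    );
    CREATE TABLE corpus (
        int_id INTEGER PRIMARY KEY AUTOINCREMENT,
        id TEXT NOT NULL,                 -- external item id, unique per pipeline
        title TEXT,
        pipeline_id TEXT NOT NULL REFERENCES pipelines(id),
        UNIQUE (id, pipeline_id)
    );
    """
)

# Seed two example pipelines (hypothetical ids and names).
with conn:
    conn.executemany(
        "INSERT INTO pipelines (id, name) VALUES (?, ?)",
        [("p1", "faq"), ("p2", "docs")],
    )


def add_corpus_item(item_id, pipeline_id, title):
    """Insert a corpus row; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO corpus (id, title, pipeline_id) VALUES (?, ?, ?)",
                (item_id, title, pipeline_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Under these assumptions, the same item `id` can exist in two different pipelines, while a second insert of the same `id` into the same pipeline is rejected, as is an item that points at a pipeline that does not exist.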
External Data Stores
| Store | Type | Purpose |
|---|---|---|
| Qdrant or Milvus | Vector DB | Stores document embeddings for semantic search. Collection per pipeline. |
| Redis | Cache | Embedding cache (optional, via embedding_cache_redis_async). |
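Because the embedding cache is optional, callers presumably need a path that works with and without it. The following is a rough sketch of that cache-aside pattern. `InMemoryAsyncCache` is a hypothetical stand-in for an async Redis client, and `cache_key`, `get_embedding`, and `fake_embed` are illustrative names. None of them reflect the actual interface of `embedding_cache_redis_async`.

```python
import asyncio
import hashlib
import json


class InMemoryAsyncCache:
    """Hypothetical stand-in for an async Redis client (get/set only)."""

    def __init__(self):
        self._data = {}

    async def get(self, key):
        return self._data.get(key)

    async def set(self, key, value):
        self._data[key] = value


def cache_key(model, text):
    # Hash model name and text together so different models never share entries.
    digest = hashlib.sha256(f"{model}\n{text}".encode("utf-8")).hexdigest()
    return f"emb:{digest}"


async def get_embedding(text, model, embed_fn, cache=None):
    """Return an embedding, consulting the cache first when one is configured."""
    if cache is None:
        # Cache is optional: go straight to the embedding provider.
        return await embed_fn(text)
    key = cache_key(model, text)
    cached = await cache.get(key)
    if cached is not None:
        return json.loads(cached)
    vector = await embed_fn(text)
    await cache.set(key, json.dumps(vector))
    return vector


calls = []  # records every time the fake provider is actually invoked


async def fake_embed(text):
    # Hypothetical embedding provider used only for this demonstration.
    calls.append(text)
    return [float(len(text)), 0.5]


async def demo():
    cache = InMemoryAsyncCache()
    first = await get_embedding("hello", "demo-model", fake_embed, cache)
    second = await get_embedding("hello", "demo-model", fake_embed, cache)
    return first, second


first, second = asyncio.run(demo())
```

In this sketch the second lookup for the same text and model is served from the cache, so the provider is called only once. Passing `cache=None` skips caching entirely.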
Notes
- SQLAlchemy with MySQL (InnoDB, utf8mb4 charset).
- Schema managed via Alembic migrations.
- Unique constraint on `(id, pipeline_id)` in corpus table.
- Pipeline configuration stored as JSON text blob, deserialized to `PipelineConfiguration` Pydantic model at read time.
- Vector DB backend is configurable (Qdrant or Milvus) via `VECTOR_DB` environment variable.
- No direct interaction with hbf-core MongoDB. Receives tenant_id/organization_id as request parameters.
- Uses JWT bearer auth (validated locally, not against hbf-core).
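
Two of the notes above, backend selection through `VECTOR_DB` and read-time deserialization of `configuration_json`, can be sketched as follows. To stay runnable with only the standard library, the sketch uses a plain dataclass as a stand-in for the Pydantic model. The field names (`embedding_model`, `top_k`), the accepted `VECTOR_DB` spellings, the case-insensitive matching, and the decision to raise when the variable is unset are all assumptions, since the real schema and defaults are not shown in this document.

```python
import json
import os
from dataclasses import dataclass, field

SUPPORTED_VECTOR_DBS = ("qdrant", "milvus")  # assumed spellings of the two backends


def resolve_vector_db(env=None):
    """Read VECTOR_DB and reject anything other than the two known backends."""
    env = os.environ if env is None else env
    value = env.get("VECTOR_DB", "").strip().lower()
    if value not in SUPPORTED_VECTOR_DBS:
        raise ValueError(
            f"VECTOR_DB must be one of {SUPPORTED_VECTOR_DBS}, got {value!r}"
        )
    return value


@dataclass
class PipelineConfigurationSketch:
    # Hypothetical fields; the real PipelineConfiguration schema is not shown here.
    embedding_model: str = "example-embedding-model"
    top_k: int = 5
    extra: dict = field(default_factory=dict)


def load_configuration(configuration_json):
    """Deserialize the JSON text stored in pipelines.configuration_json."""
    raw = json.loads(configuration_json) if configuration_json else {}
    known = {"embedding_model", "top_k"}
    return PipelineConfigurationSketch(
        embedding_model=raw.get("embedding_model", "example-embedding-model"),
        top_k=int(raw.get("top_k", 5)),
        # Keep unrecognized keys instead of silently dropping them.
        extra={k: v for k, v in raw.items() if k not in known},
    )
```

With Pydantic, the equivalent step would typically be a single model validation call on the stored JSON string. The dataclass version only makes the parse-then-populate step explicit.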