
Data Model: helvia-rag-pipelines

Domain objects used by this service, a Python (FastAPI) application with a SQLAlchemy ORM layer and a vector database backend.

Objects This Service Uses

This service does not import from hbf-core-api. It receives requests from hbf-nlp and hbf-bot and manages its own data stores.

Local Entities

| Entity | Table | DB | Key Fields |
|---|---|---|---|
| Pipeline | pipelines | MySQL | id (uuid str), name, tenant_id, organization_id, status, configuration_json, consumed_llm_tokens, vdb_collection, last_trained_at |
| CorpusItem | corpus | MySQL | int_id (PK), id, title, body, language, training_text, tags, url, group, need_training, pipeline_id (FK), shared_corpus_id |
| HistoryMessage | history_messages | MySQL | id (auto), session_id, language, role, content, pipeline_id (FK) |
| SemCacheStats | sem_cache_stats | MySQL | id (auto), cache_id, query, query_processed, query_language, session_id, generated_text, examined_corpus, cache_match, pipeline_id (FK) |
| NLPProvider | nlp_providers | MySQL | id (auto), name, platform (enum), scope (enum), url, api_key |
| PipelineAnalytics | pipeline_analytics | MySQL | id (auto), pipeline_id, tenant_id, organization_id, session_id, record_type, operation, provider, model, metric, record_value |
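As a rough illustration, the Pipeline entity above might map to a SQLAlchemy declarative model along these lines. This is a hedged sketch: the column names come from the table above, but the column types, lengths, and nullability are assumptions, not the service's actual schema.

```python
# Hypothetical sketch of the Pipeline ORM model. Column names follow the
# data-model table; types and constraints are illustrative assumptions.
from sqlalchemy import Column, DateTime, Integer, String, Text
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Pipeline(Base):
    __tablename__ = "pipelines"

    id = Column(String(36), primary_key=True)       # uuid stored as string
    name = Column(String(255), nullable=False)
    tenant_id = Column(String(36), nullable=False)
    organization_id = Column(String(36), nullable=False)
    status = Column(String(32))
    configuration_json = Column(Text)               # JSON blob, parsed at read time
    consumed_llm_tokens = Column(Integer, default=0)
    vdb_collection = Column(String(255))            # per-pipeline vector DB collection
    last_trained_at = Column(DateTime, nullable=True)
```

The child tables (corpus, history_messages, sem_cache_stats, pipeline_analytics) would reference this model through their pipeline_id foreign keys.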

External Data Stores

| Store | Type | Purpose |
|---|---|---|
| Qdrant or Milvus | Vector DB | Stores document embeddings for semantic search. One collection per pipeline. |
| Redis | Cache | Embedding cache (optional, via embedding_cache_redis_async). |
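Since the vector DB backend is selected via the VECTOR_DB environment variable, the wiring might look like the following sketch. Only the variable name and the Qdrant/Milvus choice come from this doc; the factory function and the placeholder store classes are hypothetical.

```python
# Hedged sketch of backend selection via the VECTOR_DB environment variable.
# QdrantStore/MilvusStore stand in for the real client wrappers.
import os


class QdrantStore:
    """Placeholder for a Qdrant-backed vector store client."""


class MilvusStore:
    """Placeholder for a Milvus-backed vector store client."""


def make_vector_store(env=None):
    env = os.environ if env is None else env
    backend = env.get("VECTOR_DB", "qdrant").lower()
    if backend == "qdrant":
        return QdrantStore()
    if backend == "milvus":
        return MilvusStore()
    raise ValueError(f"Unsupported VECTOR_DB backend: {backend!r}")
```

Keeping the switch in one factory lets the rest of the pipeline code stay backend-agnostic.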

Notes

  • SQLAlchemy with MySQL (InnoDB, utf8mb4 charset).
  • Schema managed via Alembic migrations.
  • Unique constraint on (id, pipeline_id) in corpus table.
  • Pipeline configuration stored as JSON text blob, deserialized to PipelineConfiguration Pydantic model at read time.
  • Vector DB backend is configurable (Qdrant or Milvus) via VECTOR_DB environment variable.
  • No direct interaction with hbf-core MongoDB. Receives tenant_id/organization_id as request parameters.
  • Uses JWT bearer auth (validated locally, not against hbf-core).
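The configuration round-trip noted above (JSON text blob in configuration_json, deserialized to a PipelineConfiguration Pydantic model at read time) could be sketched as follows. The field names inside PipelineConfiguration are illustrative assumptions; only the model's name and the JSON-blob storage pattern come from this doc.

```python
# Hedged sketch of deserializing the configuration_json blob into the
# PipelineConfiguration Pydantic model. Field names are hypothetical.
import json

from pydantic import BaseModel


class PipelineConfiguration(BaseModel):
    # Illustrative fields only; the real schema is not shown in this doc.
    llm_model: str = "default-model"
    top_k: int = 5
    use_semantic_cache: bool = False


def load_configuration(configuration_json: str) -> PipelineConfiguration:
    """Parse the stored JSON text blob into a validated config object."""
    return PipelineConfiguration(**json.loads(configuration_json))
```

Unpacking a parsed dict into the model keeps the helper compatible with both Pydantic v1 and v2 APIs.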