Data Model: helvia-rag-pipelines

Domain objects used by this service, a Python (FastAPI) application built on the SQLAlchemy ORM and a vector database.

Objects This Service Uses

This service does not import from hbf-core-api. It receives requests from hbf-nlp and hbf-bot and manages its own data stores.

Entity	Table	DB	Key Fields
Pipeline	pipelines	MySQL	id (uuid str), name, tenant_id, organization_id, status, configuration_json, consumed_llm_tokens, vdb_collection, last_trained_at
CorpusItem	corpus	MySQL	int_id (PK), id, title, body, language, training_text, tags, url, group, need_training, pipeline_id (FK), shared_corpus_id
HistoryMessage	history_messages	MySQL	id (auto), session_id, language, role, content, pipeline_id (FK)
SemCacheStats	sem_cache_stats	MySQL	id (auto), cache_id, query, query_processed, query_language, session_id, generated_text, examined_corpus, cache_match, pipeline_id (FK)
NLPProvider	nlp_providers	MySQL	id (auto), name, platform (enum), scope (enum), url, api_key
PipelineAnalytics	pipeline_analytics	MySQL	id (auto), pipeline_id, tenant_id, organization_id, session_id, record_type, operation, provider, model, metric, record_value

Store	Type	Purpose
Qdrant or Milvus	Vector DB	Stores document embeddings for semantic search. Collection per pipeline.
Redis	Cache	Embedding cache (optional, via `embedding_cache_redis_async`).
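The collection-per-pipeline layout, together with the VECTOR_DB switch mentioned in the notes below, can be sketched with an in-memory stand-in for the real client. Everything here is hypothetical: the class and function names, the accepted VECTOR_DB values (`qdrant`, `milvus`), the `qdrant` default, and the `pipeline_<uuid>` naming scheme (in the real service the name is presumably what Pipeline.vdb_collection records). Search is brute-force cosine similarity, standing in for the indexed search a real backend performs.

```python
import math
import os
from typing import Dict, List, Mapping, Optional, Tuple

# Assumed accepted values of VECTOR_DB and assumed default.
SUPPORTED_BACKENDS = ("qdrant", "milvus")
DEFAULT_BACKEND = "qdrant"


def collection_for(pipeline_id: str) -> str:
    """Hypothetical naming scheme: one collection per pipeline."""
    # Hyphens from the UUID become underscores to keep the name conservative across backends.
    return "pipeline_" + pipeline_id.replace("-", "_")


def _cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Stand-in for a Qdrant or Milvus client; records which backend was selected."""

    def __init__(self, backend: str) -> None:
        self.backend = backend
        self._collections: Dict[str, Dict[str, List[float]]] = {}

    def upsert(self, collection: str, point_id: str, vector: List[float]) -> None:
        self._collections.setdefault(collection, {})[point_id] = vector

    def search(self, collection: str, query: List[float], limit: int = 5) -> List[Tuple[str, float]]:
        # Only the given collection is scanned, so one pipeline never sees another's documents.
        points = self._collections.get(collection, {})
        scored = [(pid, _cosine(query, vec)) for pid, vec in points.items()]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:limit]


def make_vector_store(env: Optional[Mapping[str, str]] = None) -> InMemoryVectorStore:
    """Pick the backend from VECTOR_DB; a real factory would build the matching client here."""
    source = os.environ if env is None else env
    backend = source.get("VECTOR_DB", DEFAULT_BACKEND).strip().lower()
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"Unsupported VECTOR_DB value: {backend!r}")
    return InMemoryVectorStore(backend)
```

With one collection per pipeline, isolation follows from the lookup key: a query can only reach vectors upserted under its own pipeline's collection.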
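For the optional Redis embedding cache, the document only names `embedding_cache_redis_async`, so the following is a minimal read-through sketch under stated assumptions: an async client exposing `get(name)` and `set(name, value, ex=...)` (the shape of `redis.asyncio.Redis`), a hypothetical `emb:<model>:<sha256>` key scheme, JSON-encoded vectors, and an arbitrary one-day TTL. An in-memory fake replaces Redis so the example runs on its own.

```python
import hashlib
import json
from typing import Awaitable, Callable, Dict, List, Optional, Union


class FakeAsyncRedis:
    """In-memory stand-in with the get/set(ex=...) shape of redis.asyncio.Redis."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    async def get(self, name: str) -> Optional[bytes]:
        return self._data.get(name)

    async def set(self, name: str, value: Union[str, bytes], ex: Optional[int] = None) -> bool:
        # The TTL (ex) is accepted but not enforced by this fake.
        self._data[name] = value.encode("utf-8") if isinstance(value, str) else value
        return True


# Any async function that turns text into a vector (hypothetical signature).
Embedder = Callable[[str], Awaitable[List[float]]]


def cache_key(model: str, text: str) -> str:
    """Hypothetical key scheme: model name plus a hash of the exact input text."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"emb:{model}:{digest}"


async def cached_embedding(
    redis, model: str, text: str, embed: Embedder, ttl_seconds: int = 86_400
) -> List[float]:
    """Read-through cache: return a stored vector, else compute, store, and return it."""
    key = cache_key(model, text)
    hit = await redis.get(key)
    if hit is not None:
        return json.loads(hit)
    vector = await embed(text)
    await redis.set(key, json.dumps(vector), ex=ttl_seconds)
    return vector
```

Putting the model name in the key keeps vectors produced by different embedding models apart, so a cache hit never returns a vector from the wrong model.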

SQLAlchemy with MySQL (InnoDB, utf8mb4 charset).
Schema managed via Alembic migrations.
Unique constraint on (id, pipeline_id) in the corpus table: a corpus item id is unique within a pipeline but may repeat across pipelines.
Pipeline configuration is stored as a JSON text blob and deserialized into the PipelineConfiguration Pydantic model at read time.
The vector DB backend (Qdrant or Milvus) is selected via the VECTOR_DB environment variable.
No direct interaction with the hbf-core MongoDB; tenant_id and organization_id arrive as request parameters.
Uses JWT bearer auth (validated locally, not against hbf-core).
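The configuration note (a JSON text blob deserialized into PipelineConfiguration at read time) amounts to a round trip like the one below. It assumes Pydantic v2 (`model_validate_json`, `model_dump_json`). The fields shown, including the nested `RetrievalSettings`, are hypothetical, since the real schema of PipelineConfiguration is not documented here; falling back to defaults for an empty column is also an assumption.

```python
from typing import Optional

from pydantic import BaseModel, Field


class RetrievalSettings(BaseModel):
    # Hypothetical nested settings.
    top_k: int = 5
    score_threshold: float = 0.0


class PipelineConfiguration(BaseModel):
    # Hypothetical fields; the real schema is not documented here.
    llm_model: str = "default"
    temperature: float = 0.0
    retrieval: RetrievalSettings = Field(default_factory=RetrievalSettings)


def load_configuration(configuration_json: Optional[str]) -> PipelineConfiguration:
    """Parse the stored JSON text blob; an empty column falls back to defaults."""
    if not configuration_json:
        return PipelineConfiguration()
    # Parses and validates in one step; raises pydantic.ValidationError on bad input.
    return PipelineConfiguration.model_validate_json(configuration_json)


def dump_configuration(config: PipelineConfiguration) -> str:
    """Serialize back to JSON text for the configuration_json column."""
    return config.model_dump_json()
```

Keeping the column as opaque text lets the configuration schema evolve in Python without a migration, at the cost of validating on every read.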