Architecture: helvia-rag-pipelines

C4 Component Diagram

A POST /pipelines/{pipeline_id}:process request is received; JWTBearer validates the token and its pipeline_id claim.
PipelineService.process() is called.
If translation.enabled and the query language is not one of the pipeline's native languages, the query is translated via TranslationService.
Previous messages are loaded from MessageHistoryService (if chat_history.enabled and session_id provided).
If query_summarization.enabled and history exists, LLMService.summarize_user_query() rewrites the query in context.
If semantic_search.enabled, SemanticSearchService.search() embeds the query (with optional cache) and searches the vector DB for top-k results.
LLMService.process() constructs a prompt from the corpus results and calls the configured LLM client; response is parsed by TextGenerationParser.
If response language differs from query language, TranslationService.translate_response() translates back.
Chat history is stored; LLM token count is recorded in MySQL.
Optional: if sem_cache.enabled, result is stored in SemCacheService for future semantic cache hits.
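
The conditional stages of the process flow above can be sketched as a small orchestration function. This is a minimal illustration, not the actual PipelineService code: the Flags and Services classes, the flat flag names, and the callable signatures are all hypothetical stand-ins. It also assumes that the query is translated into the first native language and that the response needs translating back exactly when the query was translated. Storing chat history, recording token counts, and the semantic cache are omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Flags:
    """Hypothetical flat view of the pipeline's feature toggles."""
    native_languages: Sequence[str] = ("en",)
    translation: bool = False
    chat_history: bool = False
    query_summarization: bool = False
    semantic_search: bool = True


@dataclass
class Services:
    """Stand-ins for the real services; each one is a plain callable here."""
    translate: Callable[[str, str], str]        # (text, target_lang) -> text
    load_history: Callable[[str], List[str]]    # session_id -> previous messages
    summarize: Callable[[str, List[str]], str]  # (query, history) -> rewritten query
    search: Callable[[str], List[str]]          # query -> top-k corpus texts
    generate: Callable[[str, List[str]], str]   # (query, results) -> answer


def process(flags: Flags, svc: Services, query: str, query_lang: str,
            session_id: Optional[str] = None) -> Tuple[str, List[str]]:
    """Run the stages in order and return (answer, names of stages that ran)."""
    trace: List[str] = []
    working_lang = query_lang

    # Translate the query only if translation is on and the language is not native.
    if flags.translation and query_lang not in flags.native_languages:
        working_lang = flags.native_languages[0]  # assumption: first native language
        query = svc.translate(query, working_lang)
        trace.append("translate_query")

    # History is loaded only when enabled and a session id was supplied.
    history: List[str] = []
    if flags.chat_history and session_id:
        history = svc.load_history(session_id)
        trace.append("load_history")

    # Rewrite the query in context only when there is history to use.
    if flags.query_summarization and history:
        query = svc.summarize(query, history)
        trace.append("summarize_query")

    results: List[str] = []
    if flags.semantic_search:
        results = svc.search(query)
        trace.append("semantic_search")

    answer = svc.generate(query, results)
    trace.append("llm")

    # Translate back when the answer language differs from the query language.
    if working_lang != query_lang:
        answer = svc.translate(answer, query_lang)
        trace.append("translate_response")

    return answer, trace
```

With every flag at its default and an English query, only the semantic_search and llm stages run; enabling translation, history, and summarization for a Spanish query with a session id runs all six stages in the order listed above.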

POST /pipelines/{pipeline_id}:train sets pipeline status to TRAINING.
PipelineService.train() calls _index_corpus().
SemanticSearchService.index_corpus() fetches all corpus items with need_training=True (or all items if force_reindex=True).
For each item, the configured embedding client is called to generate a vector.
Vectors are upserted into the vector DB collection via VectorDbManager.
Pipeline status is set to READY; last_trained_at is updated; corpus items are marked trained.
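
The training flow above can be sketched as follows. This is a minimal illustration under stated assumptions: CorpusItem, the dict-based pipeline record, the embed callable, and the dict standing in for the vector DB collection are all hypothetical. The real VectorDbManager and embedding client interfaces are not shown in this document, and error handling (for example, what status is set if embedding fails) is left out because the source does not describe it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List


@dataclass
class CorpusItem:
    """Hypothetical corpus row; only the fields the sketch needs."""
    id: str
    training_text: str
    need_training: bool = True


def index_corpus(items: List[CorpusItem],
                 embed: Callable[[str], List[float]],
                 vector_store: Dict[str, List[float]],
                 force_reindex: bool = False) -> List[CorpusItem]:
    """Embed the selected items and upsert their vectors, keyed by item id."""
    selected = [it for it in items if force_reindex or it.need_training]
    for item in selected:
        # Assigning by key inserts new ids and overwrites existing ones (an upsert).
        vector_store[item.id] = embed(item.training_text)
    return selected


def train(pipeline: dict,
          items: List[CorpusItem],
          embed: Callable[[str], List[float]],
          vector_store: Dict[str, List[float]],
          force_reindex: bool = False) -> int:
    """Run one training pass and return how many items were indexed."""
    pipeline["status"] = "TRAINING"
    indexed = index_corpus(items, embed, vector_store, force_reindex)
    pipeline["status"] = "READY"
    pipeline["last_trained_at"] = datetime.now(timezone.utc)
    for item in indexed:
        item.need_training = False  # mark as trained
    return len(indexed)
```

Selecting only items with need_training=True keeps a routine training pass proportional to what changed, while force_reindex=True rebuilds every vector, which is useful after switching embedding models.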

PUT /pipelines/{pipeline_id}/corpus receives the full desired corpus.
PipelineService.update_corpus() diffs incoming items against existing MySQL corpus (insert/update/delete).
For new or changed items, _prepare_training_text() formats the article text (using article_format template with {{title}}, {{group}}, {{body}}, {{tags}}).
If the corpus language differs from the pipeline's native language, items are translated before storage.
Pipeline status is set to OUTDATED (requires re-training).
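
Two steps of the corpus update flow above lend themselves to a short sketch: the insert/update/delete diff and the article_format template rendering. The code below is an illustration only. It assumes corpus items are dicts keyed by an item id, that "changed" means the dicts compare unequal, that list-valued fields such as tags are joined with ", ", and that placeholders with no matching field render as empty strings. None of these details are confirmed by this document, and the function names are hypothetical stand-ins for update_corpus() and _prepare_training_text().

```python
import re
from typing import Dict, List, Tuple

# Matches {{title}}, {{group}}, {{body}}, {{tags}} and any other {{word}} placeholder.
_PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def diff_corpus(existing: Dict[str, dict],
                incoming: Dict[str, dict]) -> Tuple[List[str], List[str], List[str]]:
    """Return (ids to insert, ids to update, ids to delete).

    `incoming` is the full desired corpus, so anything missing from it is deleted.
    """
    to_insert = [k for k in incoming if k not in existing]
    to_update = [k for k in incoming if k in existing and incoming[k] != existing[k]]
    to_delete = [k for k in existing if k not in incoming]
    return to_insert, to_update, to_delete


def prepare_training_text(article_format: str, item: dict) -> str:
    """Fill the {{...}} placeholders in article_format from the item's fields."""
    def fill(match: re.Match) -> str:
        value = item.get(match.group(1), "")  # assumption: unknown fields render empty
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(v) for v in value)
        return str(value)

    return _PLACEHOLDER.sub(fill, article_format)


existing = {
    "1": {"title": "A", "body": "old"},
    "2": {"title": "B", "body": "same"},
    "3": {"title": "C", "body": "gone"},
}
incoming = {
    "1": {"title": "A", "body": "new"},
    "2": {"title": "B", "body": "same"},
    "4": {"title": "D", "body": "added"},
}
inserted, updated, deleted = diff_corpus(existing, incoming)

article = {
    "title": "Reset password",
    "group": "Account",
    "body": "Go to settings.",
    "tags": ["login", "security"],
}
text = prepare_training_text("{{title}} [{{group}}]\n{{body}}\nTags: {{tags}}", article)
```

Here item 4 is inserted, item 1 is updated, item 3 is deleted, and item 2 is left alone. Only inserted and updated items would need their training text regenerated, which is why the pipeline ends up OUTDATED rather than being re-indexed immediately.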