Database Schema Reference

Keep this file up to date. When models or migrations change, update this document accordingly.

The database is MySQL. All tables use the InnoDB engine, the utf8mb4 character set, and the utf8mb4_unicode_ci collation.
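A minimal sketch of how the table conventions above could be checked against a live database. The helper name and constants are hypothetical and not part of the codebase; the query reads MySQL's `information_schema.TABLES` and assumes a DB-API driver that uses the `%s` parameter style.

```python
# Hypothetical helper: flags tables that deviate from the documented conventions.
EXPECTED_ENGINE = "InnoDB"
EXPECTED_COLLATION = "utf8mb4_unicode_ci"  # a utf8mb4 collation implies the utf8mb4 charset

# Lists engine and collation for every base table; %s is the schema (database) name.
TABLE_SETTINGS_SQL = """
SELECT TABLE_NAME, ENGINE, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = %s AND TABLE_TYPE = 'BASE TABLE'
"""


def nonconforming_tables(rows):
    """Return names of tables whose engine or collation differs from the convention.

    `rows` is an iterable of (table_name, engine, collation) tuples, for example
    the result of executing TABLE_SETTINGS_SQL.
    """
    return [
        name
        for name, engine, collation in rows
        if engine != EXPECTED_ENGINE or collation != EXPECTED_COLLATION
    ]
```

Running the query and passing the fetched rows to `nonconforming_tables` yields an empty list when every table follows the convention.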

Models defined in: app/models/models.py

Migrations: alembic/versions/

Tables

document

Stores uploaded document binary data and metadata.

| Column | Type | Nullable | Default | Description |
|---|---|---|---|---|
| id | String(255) PK | no | UUID | Primary key |
| status | Enum(NEW, PROCESSED, CANCELED) | no | NEW | Processing status |
| created | DateTime | no | UTC now | Creation timestamp |
| updated | DateTime | no | UTC now | Last update (auto-updated) |
| body | LargeBinary(16MB) | no | - | Raw document bytes |
| filename | String(255) | no | - | Original filename |
| filesize | Integer | no | - | File size in bytes |
| mimetype | String(255) | no | - | MIME type |
| doctype | String(10) | no | - | Document type (pdf, docx, pptx, xlsx, txt, html, md) |
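As an illustration of the column constraints above, the following sketch builds the values for a new document row. It is not the model in app/models/models.py: `new_document_row` is a hypothetical helper, and the byte limit assumes "16MB" means 16 * 1024 * 1024 bytes.

```python
import uuid
from datetime import datetime, timezone

# Values listed in the doctype column description.
ALLOWED_DOCTYPES = {"pdf", "docx", "pptx", "xlsx", "txt", "html", "md"}
# Assumption: "16MB" means 16 * 1024 * 1024 bytes; check models.py for the exact limit.
MAX_BODY_BYTES = 16 * 1024 * 1024


def new_document_row(body: bytes, filename: str, mimetype: str, doctype: str) -> dict:
    """Build a dict of column values for a new document row (hypothetical helper)."""
    if doctype not in ALLOWED_DOCTYPES:
        raise ValueError(f"unsupported doctype: {doctype!r}")
    if len(body) > MAX_BODY_BYTES:
        raise ValueError("body exceeds the assumed 16MB limit")
    if len(filename) > 255 or len(mimetype) > 255:
        raise ValueError("filename and mimetype are limited to 255 characters")
    now = datetime.now(timezone.utc)
    return {
        "id": str(uuid.uuid4()),  # default: UUID
        "status": "NEW",          # default: NEW
        "created": now,           # default: UTC now
        "updated": now,           # default: UTC now
        "body": body,
        "filename": filename,
        "filesize": len(body),    # file size in bytes
        "mimetype": mimetype,
        "doctype": doctype,
    }
```

Deriving filesize from the body keeps the two columns consistent; whether the application does this is an assumption.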

job

Tracks document processing jobs. One job per document submission.

| Column | Type | Nullable | Default | Description |
|---|---|---|---|---|
| id | String(255) PK | no | UUID | Primary key |
| ip | String(20) | no | - | Client IP address |
| documentid | String(255) FK | no | - | References document.id (indexed) |
| status | Enum(NEW, PENDING, PROCESSING, COMPLETED, FAILED, CANCELED) | no | NEW | Job status |
| created | DateTime | no | UTC now | Creation timestamp |
| updated | DateTime | no | UTC now | Last update (auto-updated) |
| completed | DateTime | yes | null | Completion timestamp |
| timeit | Integer | yes | null | Processing time in seconds |
| callbackurl | VARCHAR(2048) | yes | - | URL for progress/result callbacks |
| maxsize | Integer | no | - | Max article size (in SEGMENTER_SIZE_UNITS) |
| article_size | String(20) | yes | null | Named article size preset (small, medium, large, xlarge). Null for custom numeric sizes. |
| usetags | JSON | no | - | Tag list for LLM tagging |
| maxtags | Integer | no | 1 | Max tags per segment |
| errormessage | TEXT | yes | - | Error details on failure |
| process_images | Boolean | no | - | Whether to handle images |
| options | JSON | no | - | Additional job options (mode, backend, OCR settings, etc.) |
| progress | Integer | no | 0 | Processing progress (0-100) |
| correlation_id | String(255) | yes | null | UUID for log correlation |
| processing_stage | String(50) | yes | null | Current processing stage (e.g. UPLOADED, CONVERTING, SEGMENTING, COMPLETED) |
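To make the required columns and defaults of the job table easier to see at a glance, here is a plain-Python mirror of the row. It is a sketch, not the actual model: `JobRow` and `utc_now` are hypothetical names, and the range check on progress is an application-level assumption drawn from the "(0-100)" note.

```python
import enum
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


class JobStatus(enum.Enum):
    NEW = "NEW"
    PENDING = "PENDING"
    PROCESSING = "PROCESSING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELED = "CANCELED"


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class JobRow:
    # Required columns: not nullable and no default in the table.
    ip: str
    documentid: str
    maxsize: int
    usetags: list
    process_images: bool
    options: dict
    # Columns with documented defaults.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    status: JobStatus = JobStatus.NEW
    created: datetime = field(default_factory=utc_now)
    updated: datetime = field(default_factory=utc_now)
    maxtags: int = 1
    progress: int = 0
    # Nullable columns.
    completed: Optional[datetime] = None
    timeit: Optional[int] = None
    callbackurl: Optional[str] = None
    article_size: Optional[str] = None
    errormessage: Optional[str] = None
    correlation_id: Optional[str] = None
    processing_stage: Optional[str] = None

    def __post_init__(self):
        # Assumed check: the table documents progress as 0-100.
        if not 0 <= self.progress <= 100:
            raise ValueError("progress must be between 0 and 100")
```

Constructing `JobRow` with only the six required fields yields a job in status NEW with maxtags 1 and progress 0, matching the Default column.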

documentsegment

Stores the output segments/articles produced by document processing.

| Column | Type | Nullable | Default | Description |
|---|---|---|---|---|
| id | String(255) PK | no | UUID | Primary key |
| documentid | String(255) FK | no | - | References document.id (indexed) |
| status | Enum(COMPLETED, FAILED) | no | - | Segment status |
| body | TEXT | no | - | Segment text content (markdown) |
| pagenr | Integer | no | - | Source page number |
| group | String(255) | yes | null | Segment grouping |
| title | String(255) | yes | null | Segment/article title |
| tags | JSON | yes | - | LLM-assigned tags |
| created | DateTime | no | UTC now | Creation timestamp |
| updated | DateTime | no | UTC now | Last update (auto-updated) |
| timeit | Integer | yes | null | Deprecated, no longer used |
| lang | String(10) | no | - | Detected language code |
| ordinal | Integer | no | - | Segment ordering index |
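The ordinal column is what restores reading order when segments are fetched back. A minimal sketch, assuming segments arrive as dicts keyed by column name; `assemble_markdown` is a hypothetical helper, and skipping FAILED segments and joining with a blank line are illustrative choices rather than documented behavior.

```python
def assemble_markdown(segments):
    """Join the bodies of COMPLETED segments in ascending ordinal order."""
    ordered = sorted(
        (s for s in segments if s["status"] == "COMPLETED"),
        key=lambda s: s["ordinal"],
    )
    # A blank line between bodies keeps adjacent markdown blocks separate.
    return "\n\n".join(s["body"] for s in ordered)
```

For example, three segments with ordinals 2, 0, and 1 are emitted in the order 0, 1, 2 regardless of the order in which the query returned them.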

Relationships

    document (1) ──< job (N)
         └──< documentsegment (N)

  • A document can have multiple job records (reprocessing)
  • A document can have multiple documentsegment records (the output articles)
  • DELETE /jobs/{jobid} cascades manually: it deletes the job, all segments for its document, and the document itself
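The manual cascade can be sketched as three deletes in one transaction, children before parent. This is an illustration only: it uses the standard-library sqlite3 module with a stripped-down demo schema rather than the real MySQL tables, and `create_demo_schema` and `delete_job_cascade` are hypothetical names, not the endpoint's actual implementation.

```python
import sqlite3


def create_demo_schema(conn: sqlite3.Connection) -> None:
    """Create minimal stand-ins for the three tables (demo only, most columns omitted)."""
    conn.executescript(
        """
        CREATE TABLE document (id TEXT PRIMARY KEY);
        CREATE TABLE job (id TEXT PRIMARY KEY, documentid TEXT NOT NULL);
        CREATE TABLE documentsegment (id TEXT PRIMARY KEY, documentid TEXT NOT NULL);
        """
    )


def delete_job_cascade(conn: sqlite3.Connection, jobid: str) -> bool:
    """Delete a job, all segments of its document, and the document itself.

    Returns False if the job does not exist.
    """
    row = conn.execute("SELECT documentid FROM job WHERE id = ?", (jobid,)).fetchone()
    if row is None:
        return False
    documentid = row[0]
    with conn:  # commits on success, rolls back if any statement raises
        # Rows that reference the document go first, the document last.
        conn.execute("DELETE FROM job WHERE id = ?", (jobid,))
        conn.execute("DELETE FROM documentsegment WHERE documentid = ?", (documentid,))
        conn.execute("DELETE FROM document WHERE id = ?", (documentid,))
    return True
```

The source does not say what happens to other jobs that reference the same document. With an enforced foreign key on job.documentid, the final delete would fail while such jobs exist, so a real implementation has to decide how to handle them.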

Migration History

| Revision | Description |
|---|---|
| 47cd9e9e0aa6 | Base schema (document, job, documentsegment) |
| 3a0135f92ea8 | Add process_images field to job |
| 97c5f3f7f0ac | Add options JSON column to job |
| ac627aa39080 | Add processing_stage and correlation_id to job |
| 854f0244c449 | Add article_size column to job (current head) |
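When a new migration is added, the "current head" note has to move with it. The sketch below shows how the head follows from the revision chain. It assumes the history is linear and listed oldest first, so that each revision's parent (Alembic's down_revision) is the row above it; the function names are hypothetical.

```python
# Assumption: linear history, listed oldest first as in the table above.
REVISIONS = [
    "47cd9e9e0aa6",
    "3a0135f92ea8",
    "97c5f3f7f0ac",
    "ac627aa39080",
    "854f0244c449",
]


def down_revisions(revisions):
    """Map each revision to its parent; the base revision maps to None."""
    return {
        rev: (revisions[i - 1] if i > 0 else None)
        for i, rev in enumerate(revisions)
    }


def find_heads(parents):
    """Heads are revisions that no other revision names as its parent."""
    referenced = {parent for parent in parents.values() if parent is not None}
    return sorted(rev for rev in parents if rev not in referenced)
```

With the list above, `find_heads(down_revisions(REVISIONS))` returns a single entry, `854f0244c449`. More than one entry would indicate a branched history that needs a merge revision.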