DynamoDB is used for workflow state management, not for search. It applies the One Table Design pattern to manage all state for projects, documents, workflows, segments, and processing steps in a single table.
| Item | Value |
|---|---|
| Billing | On-Demand |
| Partition Key | PK (String) |
| Sort Key | SK (String) |
| GSI1 | GSI1PK / GSI1SK |
| GSI2 | GSI2PK / GSI2SK |
| Stream | NEW_AND_OLD_IMAGES |
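The settings above map onto a `create_table` request roughly as follows. This is a minimal sketch, not the project's actual infrastructure code: the table name `workflow-state`, the index names `GSI1` and `GSI2`, the String type for the GSI key attributes, and the `ALL` projection are all assumptions made for illustration.

```python
def build_create_table_params(table_name: str = "workflow-state") -> dict:
    """Build keyword arguments for a DynamoDB create_table call.

    The default table name, index names, and projection type are hypothetical.
    """

    def gsi(name: str) -> dict:
        # Each GSI uses its own partition/sort attribute pair, e.g. GSI1PK / GSI1SK.
        return {
            "IndexName": name,
            "KeySchema": [
                {"AttributeName": f"{name}PK", "KeyType": "HASH"},
                {"AttributeName": f"{name}SK", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},  # assumption
        }

    key_attributes = ["PK", "SK", "GSI1PK", "GSI1SK", "GSI2PK", "GSI2SK"]
    return {
        "TableName": table_name,
        "BillingMode": "PAY_PER_REQUEST",  # on-demand billing
        "AttributeDefinitions": [
            {"AttributeName": name, "AttributeType": "S"} for name in key_attributes
        ],
        "KeySchema": [
            {"AttributeName": "PK", "KeyType": "HASH"},
            {"AttributeName": "SK", "KeyType": "RANGE"},
        ],
        "GlobalSecondaryIndexes": [gsi("GSI1"), gsi("GSI2")],
        "StreamSpecification": {
            "StreamEnabled": True,
            "StreamViewType": "NEW_AND_OLD_IMAGES",
        },
    }
```

With boto3 installed and credentials configured, the result could be passed as `boto3.client("dynamodb").create_table(**build_create_table_params())`.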
| Field | Description |
|---|---|
| data.name | Project name |
| data.language | Language (default: en) |
| data.document_prompt | Custom prompt for document analysis |
| data.ocr_model | OCR model (default: pp-ocrv5) |
Query documents belonging to a project with begins_with(SK, 'DOC#').
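As a rough illustration of that query, the helper below builds the arguments for a boto3 `Table.query` call. The function name and the example project ID are hypothetical; the `PROJ#` partition key prefix follows the query patterns listed in this document.

```python
def project_documents_query(project_id: str) -> dict:
    """Return query arguments that select every DOC# item under one project."""
    return {
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :prefix)",
        "ExpressionAttributeValues": {
            ":pk": f"PROJ#{project_id}",
            ":prefix": "DOC#",
        },
    }


# Hypothetical project ID, for illustration only.
example_query = project_documents_query("p-123")
```

Given a boto3 table resource, this would be used as `table.query(**example_query)`. A single query response is capped at 1 MB, so large projects need to follow `LastEvaluatedKey` to page through the results.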
| Field | Description |
|---|---|
| data.file_name | File name |
| data.status | Workflow status |
PK: DOC#{document_id} (or WEB#{document_id})
| Field | Description |
|---|---|
| data.project_id | Parent project |
| data.file_uri | S3 path |
| data.file_name | File name |
| data.file_type | MIME type |
| data.execution_arn | Step Functions execution ARN |
| data.status | pending / in_progress / completed / failed |
| data.total_segments | Total segment count |
| data.preprocess | Per-stage preprocessing status (ocr, bda, transcribe, webcrawler) |
GSI1PK: STEP#ANALYSIS_STATUS
GSI1SK: pending | in_progress | completed | failed
Tracks the status of each processing step in a workflow. GSI1 enables fast lookup of currently running analyses.
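The GSI1 lookup described above could look something like the following. This is a sketch under assumptions: the index is assumed to be named `GSI1`, and the helper name is hypothetical. The key values come from the GSI1PK and GSI1SK definitions above.

```python
ANALYSIS_STATUSES = ("pending", "in_progress", "completed", "failed")


def analyses_by_status_query(status: str = "in_progress") -> dict:
    """Return query arguments that find analysis steps in a given status via GSI1."""
    if status not in ANALYSIS_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    return {
        "IndexName": "GSI1",  # assumed index name
        "KeyConditionExpression": "GSI1PK = :pk AND GSI1SK = :status",
        "ExpressionAttributeValues": {
            ":pk": "STEP#ANALYSIS_STATUS",
            ":status": status,
        },
    }
```

Because the status is the GSI sort key, the lookup is a key-only query rather than a table scan with a filter.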
| Step | Description |
|---|---|
| segment_prep | Segment preparation |
| bda_processor | Bedrock Data Automation (BDA) |
| format_parser | Format parsing |
| paddleocr_processor | PaddleOCR processing |
| transcribe | Audio transcription |
| webcrawler | Web crawling |
| segment_builder | Segment construction |
| segment_analyzer | AI analysis (Claude) |
| graph_builder | Graph construction |
| document_summarizer | Document summarization |
Each step has status, label, started_at, ended_at, and error attributes.
SK: SEG#{segment_index:04d} ← 0001, 0002, ...
| Field | Description |
|---|---|
| data.segment_index | Segment index |
| data.s3_key | S3 path (segment data) |
| data.image_uri | Image URI |
| data.image_analysis | Image analysis results array |
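The zero-padded segment sort key matters because DynamoDB orders String sort keys lexicographically. The sketch below shows the formatting and its effect on ordering; the helper names are hypothetical.

```python
SEGMENT_PREFIX = "SEG#"


def segment_sk(segment_index: int) -> str:
    """Format a segment index as a zero-padded sort key, e.g. 1 -> 'SEG#0001'."""
    if segment_index < 0:
        raise ValueError("segment_index must be non-negative")
    return f"{SEGMENT_PREFIX}{segment_index:04d}"


def parse_segment_sk(sk: str) -> int:
    """Recover the integer segment index from a sort key such as 'SEG#0042'."""
    if not sk.startswith(SEGMENT_PREFIX):
        raise ValueError(f"not a segment sort key: {sk!r}")
    return int(sk[len(SEGMENT_PREFIX):])


# Padded keys sort in numeric order even though they are compared as strings.
ordered_keys = sorted(segment_sk(i) for i in (10, 2, 1))
# ordered_keys == ['SEG#0001', 'SEG#0002', 'SEG#0010']
```

With four digits of padding, string order matches numeric order only up to index 9999. An index of 10000 would format as `SEG#10000` and sort before `SEG#9999`.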
| Query | Index | Key Condition |
|---|---|---|
| Project document list | Primary | PK=PROJ#{proj_id}, SK begins_with DOC# |
| Project workflow list | Primary | PK=PROJ#{proj_id}, SK begins_with WF# |
| Workflow metadata | Primary | PK=DOC#{doc_id}, SK=WF#{wf_id} |
| Step progress | Primary | PK=WF#{wf_id}, SK=STEP |
| Segment list | Primary | PK=WF#{wf_id}, SK begins_with SEG# |
| Specific segment | Primary | PK=WF#{wf_id}, SK=SEG#{index} |
| In-progress analysis | GSI1 | GSI1PK=STEP#ANALYSIS_STATUS, GSI1SK=in_progress |
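Single transaction: Workflow metadata and step status are created together via batch_write. Note that DynamoDB's BatchWriteItem is not atomic across items; strict all-or-nothing semantics require TransactWriteItems.
Efficient queries: All documents or workflows for a project are retrieved with a single query
Cost reduction: Minimized operational complexity with a single table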
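One way to get an all-or-nothing write for the workflow metadata and step status items is DynamoDB's TransactWriteItems operation. The sketch below swaps that in for `batch_write` and is not the project's confirmed implementation. The keys follow the query patterns above; the attribute layout under `data`, the helper name, and the table name in the usage note are assumptions.

```python
def build_workflow_transaction(
    table_name: str,
    document_id: str,
    workflow_id: str,
    step_names: list,
) -> list:
    """Build TransactItems that create workflow metadata and step status together."""
    workflow_item = {
        "PK": {"S": f"DOC#{document_id}"},
        "SK": {"S": f"WF#{workflow_id}"},
        # Assumed layout: status stored under a `data` map.
        "data": {"M": {"status": {"S": "pending"}}},
    }
    step_item = {
        "PK": {"S": f"WF#{workflow_id}"},
        "SK": {"S": "STEP"},
        # Assumed layout: one nested map per step, each starting as pending.
        "data": {
            "M": {name: {"M": {"status": {"S": "pending"}}} for name in step_names}
        },
    }
    return [
        {
            "Put": {
                "TableName": table_name,
                "Item": item,
                # Fail the whole transaction if either item already exists.
                "ConditionExpression": "attribute_not_exists(PK)",
            }
        }
        for item in (workflow_item, step_item)
    ]
```

The list would be passed to the low-level client as `boto3.client("dynamodb").transact_write_items(TransactItems=build_workflow_transaction("workflow-state", "d-1", "wf-1", ["segment_prep"]))`. If any condition fails, neither item is written.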
DynamoDB stores only state and metadata, while actual data (segment content, analysis results) is stored in S3.
| DynamoDB (state and metadata) | S3 (actual data) |
|---|---|
| Workflow status | Segment raw data |
| Step progress | Analysis results (JSON) |
| Segment metadata (s3_key) | Entity extraction results |
| WebSocket connections | Document summaries |
Because of the Step Functions payload limit (256 KB), DynamoDB serves as intermediate storage. Only segment indices are passed through the workflow, which allows documents with 3000+ pages to be processed.
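To make the size argument concrete, the sketch below compares a payload that carries only segment indices with one that carries segment content inline. The field names `workflow_id`, `segment_indices`, and `segments`, and the 1,000-character segment size, are hypothetical and chosen only for illustration.

```python
import json

# Step Functions caps the input/output of a state at 256 KiB.
STEP_FUNCTIONS_PAYLOAD_LIMIT = 256 * 1024


def payload_size(payload) -> int:
    """Size in bytes of the payload once serialized as UTF-8 JSON."""
    return len(json.dumps(payload).encode("utf-8"))


def fits_payload(payload) -> bool:
    """True if the serialized payload stays within the Step Functions limit."""
    return payload_size(payload) <= STEP_FUNCTIONS_PAYLOAD_LIMIT


def index_payload(workflow_id: str, total_segments: int) -> dict:
    """Payload carrying only segment indices; the content stays in DynamoDB and S3."""
    return {
        "workflow_id": workflow_id,
        "segment_indices": list(range(total_segments)),
    }


# 3000 segments referenced by index only.
by_index = index_payload("wf-1", 3000)

# The same number of segments with roughly 1 KB of text each, passed inline.
inline = {"workflow_id": "wf-1", "segments": ["x" * 1000 for _ in range(3000)]}
```

Here `fits_payload(by_index)` holds while `fits_payload(inline)` does not. Each step can then load the segment it needs by key, for example with `PK=WF#{wf_id}` and `SK=SEG#{index}`.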