A processing pipeline that routes OCR tasks to two backends based on model characteristics.
| Backend | Models | Reason |
|---|---|---|
| Lambda (CPU) | PP-OCRv5, PP-StructureV3 | Lightweight models, no GPU needed, fast startup |
| SageMaker (GPU) | PaddleOCR-VL | Vision-language model, GPU required |

PP-OCRv5 and PP-StructureV3 run fast enough on CPU alone, so they are processed directly on Lambda, avoiding SageMaker cold-start overhead. PaddleOCR-VL needs a GPU for per-region VLM inference, so it runs on SageMaker.
→ Lambda (OCR Invoker) ─── Routes by model
    ├─ [PP-OCRv5 / PP-StructureV3] ── CPU models
    │     → Lambda (OCR Processor) ── Container image Lambda
    │           ├─ S3 (save result.json)
    │           └─ DynamoDB (update preprocess status)
    └─ [PaddleOCR-VL] ── GPU model
          ├─ Scale-out: DesiredInstanceCount → 1
          └─ InvokeEndpointAsync → SageMaker Endpoint
                ├─ Success → SNS (Success) → OCR Complete Handler → DynamoDB + S3
                └─ Failure → SNS (Error) → OCR Complete Handler → DynamoDB

CloudWatch Alarm (10 min idle)
    → SNS (Scale-in) → Scale-in Handler Lambda
    → DesiredInstanceCount → 0
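The routing step in the OCR Invoker can be sketched as follows. This is a minimal illustration, not the project's code: the lowercase model identifiers, the JSON message body with a `model` field, and the `start_vl` callback (standing in for the SageMaker scale-out and async invocation) are assumptions. The Lambda path uses boto3's `invoke` with `InvocationType="Event"`, which is the asynchronous invocation described here. Clients are passed in so the logic can be exercised without AWS.

```python
import json
from typing import Any, Callable

# Hypothetical model identifiers; the real message values may differ.
LAMBDA_MODELS = {"pp-ocrv5", "pp-structurev3"}   # CPU models -> OCR Processor Lambda
SAGEMAKER_MODELS = {"paddleocr-vl"}              # GPU model -> SageMaker endpoint


def route(model: str) -> str:
    """Return "lambda" or "sagemaker" for a model name (case-insensitive)."""
    key = model.strip().lower()
    if key in LAMBDA_MODELS:
        return "lambda"
    if key in SAGEMAKER_MODELS:
        return "sagemaker"
    raise ValueError(f"unsupported model: {model!r}")


def handle_sqs_event(event: dict, lambda_client: Any, processor_name: str,
                     start_vl: Callable[[dict], None]) -> list:
    """Dispatch each SQS record to its backend; returns the chosen backends."""
    backends = []
    for record in event["Records"]:          # batch size is 1, but loop anyway
        task = json.loads(record["body"])    # assumed: JSON body with a "model" key
        backend = route(task["model"])
        if backend == "lambda":
            # Asynchronous ("Event") invoke: returns immediately, no result payload.
            lambda_client.invoke(
                FunctionName=processor_name,
                InvocationType="Event",
                Payload=json.dumps(task).encode("utf-8"),
            )
        else:
            start_vl(task)  # hypothetical hook: scale-out + InvokeEndpointAsync
        backends.append(backend)
    return backends
```

In a deployed handler, `lambda_client` would be `boto3.client("lambda")` and `processor_name` the OCR Processor function name.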
Runs CPU-based OCR on a container image Lambda. Writes results directly to S3 and updates DynamoDB status without going through SageMaker.
| Item | Value |
|---|---|
| Function Name | idp-v2-ocr-lambda-processor |
| Runtime | Python 3.12 (Container Image) |
| Memory | 4096 MB |
| Timeout | 15 min |
| Base Image | public.ecr.aws/lambda/python:3.12 |
| Dependencies | paddleocr>=3.3.0, paddlepaddle>=3.2.2, boto3 |
| Model Cache | Model archives cached on S3 (reused after initial download) |
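The model cache row implies a lookup order that might look like the sketch below. The three hooks and the directory layout are hypothetical; a real implementation would back them with S3 `download_file`/`upload_file` calls and a Hugging Face download. Checking the local directory first is an added assumption: a warm Lambda container keeps /tmp between invocations, so a populated directory can be reused as-is.

```python
import os
from typing import Callable, Tuple

Fetch = Callable[[str, str], None]


def load_model_dir(model_name: str, local_root: str,
                   fetch_from_s3_cache: Callable[[str, str], bool],
                   fetch_from_hub: Fetch,
                   upload_to_s3_cache: Fetch) -> Tuple[str, str]:
    """Return (model_dir, source), where source is "local", "s3", or "hub".

    All three callables are hypothetical hooks taking (model_name, target_dir).
    """
    model_dir = os.path.join(local_root, model_name)
    # Warm container: /tmp survives between invocations, so reuse it if populated.
    if os.path.isdir(model_dir) and os.listdir(model_dir):
        return model_dir, "local"
    os.makedirs(model_dir, exist_ok=True)
    # Cold start: try the S3 archive cache first (returns False on a cache miss).
    if fetch_from_s3_cache(model_name, model_dir):
        return model_dir, "s3"
    # First-ever run: fall back to Hugging Face, then seed the S3 cache.
    fetch_from_hub(model_name, model_dir)
    upload_to_s3_cache(model_name, model_dir)
    return model_dir, "hub"
```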
Processing Flow:
OCR Invoker (Invoke async, Event type)
├─ Download file from S3 → /tmp
├─ Load model (S3 cache → HuggingFace fallback)
├─ Save result.json to S3
└─ Update DynamoDB status (COMPLETED/FAILED)
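The processing flow above maps onto a handler roughly like this. It is a sketch under stated assumptions: the task fields (`bucket`, `key`, `result_key`, `document_id`), the `preprocess_status` attribute name, and the DynamoDB key schema are hypothetical, and `run_ocr` stands in for model loading plus PaddleOCR inference. `s3` is expected to behave like a boto3 S3 client and `table` like a boto3 DynamoDB `Table` resource.

```python
import json
import os
from typing import Any, Callable


def process(task: dict, s3: Any, table: Any,
            run_ocr: Callable[[str], dict]) -> str:
    """Download -> OCR -> save result.json -> update status. Returns the status."""
    local_path = os.path.join("/tmp", os.path.basename(task["key"]))
    status = "FAILED"
    try:
        s3.download_file(task["bucket"], task["key"], local_path)
        result = run_ocr(local_path)  # hypothetical: model load + PaddleOCR predict
        s3.put_object(
            Bucket=task["bucket"],
            Key=task["result_key"],   # e.g. a key ending in result.json
            Body=json.dumps(result).encode("utf-8"),
            ContentType="application/json",
        )
        status = "COMPLETED"
    except Exception as exc:
        # Record the failure rather than re-raising, which would trigger async retries.
        print(f"OCR failed for {task['key']}: {exc}")
    # Status attribute name and key schema are assumptions for illustration.
    table.update_item(
        Key={"document_id": task["document_id"]},
        UpdateExpression="SET #s = :s",
        ExpressionAttributeNames={"#s": "preprocess_status"},
        ExpressionAttributeValues={":s": status},
    )
    return status
```

Catching the exception and writing FAILED is one design choice; re-raising instead would let Lambda retry the asynchronous invocation (two retries by default) at the cost of repeating the OCR work.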
No SNS callback needed: unlike SageMaker async inference, the Lambda processor handles its own results, so no SNS topic is involved.
PaddleOCR-VL is a vision-language model that runs VLM inference on each detected text region, so it requires a GPU. The endpoint auto-scales between 0 and 1 instances to keep cost down.
| Item | Value |
|---|---|
| Instance Type | ml.g5.xlarge (NVIDIA A10G, 24 GB) |
| Min Instances | 0 (scale-to-zero) |
| Max Instances | 1 |
| Max Concurrent Invocations | 4 per instance |
| Invocation Timeout | 3,600 s (1 hour) |
| Max Response Size | 100 MB |
| Base Image | PyTorch 2.2.0 GPU (CUDA 11.8, Ubuntu 20.04) |
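A `CreateEndpointConfig` request carrying these settings could be built as below. The config, model, bucket, and topic values are placeholders and the variant name `AllTraffic` is an assumption. `MaxConcurrentInvocationsPerInstance` and the success/error SNS topics correspond to the table and the notification flow. The 1-hour invocation timeout is not part of the endpoint config; it is set per request on `InvokeEndpointAsync`. This sketch also starts the variant at one instance and assumes the minimum of 0 is reached afterwards through the scale-in path.

```python
def async_endpoint_config(config_name: str, model_name: str, output_s3: str,
                          success_topic_arn: str, error_topic_arn: str) -> dict:
    """Build kwargs for SageMaker CreateEndpointConfig (async inference)."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",        # assumed variant name
            "ModelName": model_name,
            "InstanceType": "ml.g5.xlarge",     # A10G, 24 GB
            "InitialInstanceCount": 1,          # scaled to 0 later by scale-in
        }],
        "AsyncInferenceConfig": {
            "ClientConfig": {"MaxConcurrentInvocationsPerInstance": 4},
            "OutputConfig": {
                "S3OutputPath": output_s3,      # where inference results land
                "NotificationConfig": {
                    "SuccessTopic": success_topic_arn,  # -> OCR Complete Handler
                    "ErrorTopic": error_topic_arn,      # -> OCR Complete Handler
                },
            },
        },
    }
```

With boto3 this would be passed as `boto3.client("sagemaker").create_endpoint_config(**async_endpoint_config(...))`.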
The VL model internally works as follows:
→ [Step 1] Layout detection (CPU/GPU) ── Detect N text regions
→ [Step 2] Per-region VLM inference (GPU) ── N sequential calls
When N text regions are detected, the VLM is called N times sequentially. This structure has three consequences:

- ~14 s per page (independent of image size, proportional to region count)
- ~25% GPU utilization (the GPU waits on CPU pre/post-processing between VLM inferences)
- Multiple processes cannot share a single GPU (the VLM takes ~12 GB; two copies run out of memory)
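A rough latency model makes these figures easier to reason about. It assumes each region's time splits cleanly into CPU work $t_i^{\text{cpu}}$ and a GPU VLM call $t_i^{\text{gpu}}$, and it ignores the layout model's own GPU time. Because the calls are sequential, page time grows linearly with N, and only the $t_i^{\text{gpu}}$ terms keep the GPU busy; the 0.25 is the observed utilization quoted above, not a derived value.

```latex
% N: detected text regions; t_i^{cpu}: pre/post-processing; t_i^{gpu}: VLM call
T_{\text{page}} \approx T_{\text{layout}} + \sum_{i=1}^{N} \left( t_i^{\text{cpu}} + t_i^{\text{gpu}} \right),
\qquad
U_{\text{GPU}} \approx \frac{\sum_{i=1}^{N} t_i^{\text{gpu}}}{T_{\text{page}}} \approx 0.25

% Two model copies on one ml.g5.xlarge (A10G, 24 GB)
2 \times 12\,\text{GB} \approx 24\,\text{GB}
  \;\Rightarrow\; \text{no headroom for CUDA context and activations}
  \;\Rightarrow\; \text{OOM}
```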
These constraints are why the lightweight models run on Lambda, avoiding SageMaker cold starts, while only the VL model, which needs a GPU, stays on SageMaker.
Auto-scaling applies to the SageMaker (PaddleOCR-VL) backend only. The Lambda backend relies on AWS Lambda's built-in automatic scaling.

| Item | Value |
|---|---|
| Trigger | OCR Invoker Lambda |
| Timing | Just before the SageMaker async inference invocation |
| Method | Direct update_endpoint_weights_and_capacities API call |
| Action | DesiredInstanceCount: 0 → 1 |
| Response Time | Immediate (API call) |
| Idempotent | No-op if already at 1 |

When the OCR Invoker Lambda needs to process a document with the VL model, it activates the endpoint before invoking inference. Starting from 0 instances adds cold-start time before the instance becomes available.
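A sketch of the scale-out call followed by the async request, assuming the variant is named `AllTraffic` and the request payload is already staged in S3 as JSON. The explicit `describe_endpoint` check is one way to get the no-op behavior from the table; the source does not say how idempotency is implemented. `sm` and `smr` stand for boto3 `sagemaker` and `sagemaker-runtime` clients.

```python
import uuid
from typing import Any


def scale_out(sm: Any, endpoint: str, variant: str = "AllTraffic") -> bool:
    """Ensure DesiredInstanceCount >= 1; return True only if an update was issued."""
    desc = sm.describe_endpoint(EndpointName=endpoint)
    for pv in desc["ProductionVariants"]:
        if pv["VariantName"] == variant and pv.get("DesiredInstanceCount", 0) >= 1:
            return False  # already scaled out: no-op
    # This call is likely rejected while the endpoint is mid-update
    # (e.g. a scale-in in progress), so production code may need a retry.
    sm.update_endpoint_weights_and_capacities(
        EndpointName=endpoint,
        DesiredWeightsAndCapacities=[
            {"VariantName": variant, "DesiredInstanceCount": 1}
        ],
    )
    return True


def submit_vl(sm: Any, smr: Any, endpoint: str, input_s3_uri: str) -> str:
    """Scale out, then queue the request; returns where the output will be written."""
    scale_out(sm, endpoint)
    # Async requests are queued by the endpoint, so they wait out the cold start.
    resp = smr.invoke_endpoint_async(
        EndpointName=endpoint,
        InputLocation=input_s3_uri,
        ContentType="application/json",   # assumed payload format
        InferenceId=str(uuid.uuid4()),
        InvocationTimeoutSeconds=3600,    # matches the 1-hour invocation timeout
    )
    return resp["OutputLocation"]
```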
| Item | Value |
|---|---|
| Trigger | CloudWatch Alarm → SNS → Scale-in Handler Lambda |
| Metric | ApproximateBacklogSizePerInstance |
| Condition | < 0.1 (effectively zero) |
| Evaluation Period | 10 consecutive minutes (1-min intervals, 10 periods) |
| Missing Data | Treated as BREACHING (triggers alarm) |
| Action | DesiredInstanceCount: 1 → 0 |
When no work remains in the queue for 10 minutes, the CloudWatch alarm fires, triggering the Scale-in Handler Lambda via SNS to reduce instances to zero.
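The alarm and handler in the table could be expressed like this. The alarm name pattern and the variant name are hypothetical; the metric, threshold, 10 × 60 s evaluation window, and missing-data treatment come from the table. `ApproximateBacklogSizePerInstance` is published in the `AWS/SageMaker` namespace with an `EndpointName` dimension. The handler reads `NewStateValue` from the CloudWatch alarm message that SNS delivers.

```python
import json
from typing import Any


def idle_alarm_kwargs(endpoint: str, topic_arn: str) -> dict:
    """PutMetricAlarm kwargs: backlog/instance < 0.1 for 10 x 1-minute periods."""
    return {
        "AlarmName": f"{endpoint}-idle-scale-in",   # hypothetical naming
        "Namespace": "AWS/SageMaker",
        "MetricName": "ApproximateBacklogSizePerInstance",
        "Dimensions": [{"Name": "EndpointName", "Value": endpoint}],
        "Statistic": "Average",
        "Period": 60,
        "EvaluationPeriods": 10,
        "Threshold": 0.1,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "breaching",   # no datapoints also counts as idle
        "AlarmActions": [topic_arn],       # SNS scale-in topic
    }


def scale_in_handler(event: dict, sm: Any, endpoint: str,
                     variant: str = "AllTraffic") -> int:
    """SNS-triggered: set the VL endpoint to 0 instances on ALARM messages."""
    handled = 0
    for record in event.get("Records", []):
        msg = json.loads(record["Sns"]["Message"])
        if msg.get("NewStateValue") != "ALARM":
            continue  # ignore OK / INSUFFICIENT_DATA transitions
        sm.update_endpoint_weights_and_capacities(
            EndpointName=endpoint,
            DesiredWeightsAndCapacities=[
                {"VariantName": variant, "DesiredInstanceCount": 0}
            ],
        )
        handled += 1
    return handled
```

The alarm kwargs would be passed to `boto3.client("cloudwatch").put_metric_alarm(...)`, and `sm` in the handler would be `boto3.client("sagemaker")`.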
Document arrives ─→ OCR Invoker checks model
    ├─ [PP-OCRv5/PP-StructureV3] → Lambda processes immediately (no cold start)
    └─ [VL] → SageMaker scale-out (0 → 1)
          → Inference processing (including cold start)
          → Processing complete → SNS → OCR Complete Handler
          → No additional requests for 10 minutes
          → CloudWatch Alarm fires → Scale-in (1 → 0)
          → Billing stops (0 instances)
| Item | Value |
|---|---|
| Name | idp-v2-ocr-invoker |
| Runtime | Python 3.14 |
| Memory | 256 MB |
| Timeout | 1 min |
| Trigger | SQS (batch size: 1) |
| Role | Route by model: Lambda async invoke or SageMaker scale-out + async inference |

| Item | Value |
|---|---|
| Name | idp-v2-ocr-lambda-processor |
| Runtime | Python 3.12 (Container Image) |
| Memory | 4096 MB |
| Timeout | 15 min |
| Trigger | Lambda async invoke (from OCR Invoker) |
| Role | OCR inference, save results to S3, update DynamoDB status |
| Target Models | PP-OCRv5, PP-StructureV3 |

| Item | Value |
|---|---|
| Name | idp-v2-ocr-complete-handler |
| Runtime | Python 3.14 |
| Memory | 256 MB |
| Timeout | 5 min |
| Trigger | SNS (Success + Error topics) |
| Role | Process SageMaker inference results, save to S3, update DynamoDB status |
| Target Models | PaddleOCR-VL (via SageMaker) |

| Item | Value |
|---|---|
| Name | idp-v2-ocr-scale-in |
| Runtime | Python 3.14 |
| Memory | 128 MB |
| Timeout | 30 s |
| Trigger | SNS (CloudWatch Alarm) |
| Role | DesiredInstanceCount → 0 |
Used only by the SageMaker (PaddleOCR-VL) path. The Lambda path does not use SNS.
| Topic | Purpose | Subscriber |
|---|---|---|
| idp-v2-ocr-success | Inference success notification | OCR Complete Handler |
| idp-v2-ocr-error | Inference failure notification | OCR Complete Handler |
| idp-v2-ocr-scale-in | Scale-in alarm notification | Scale-in Handler |
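On the SageMaker path, the OCR Complete Handler receives the async inference notification wrapped in an SNS record. A minimal parsing sketch follows. It reads `invocationStatus`, `inferenceId`, `responseParameters.outputLocation`, and `failureReason`, the fields used in SageMaker async inference notifications (worth verifying against a real payload), and uses `.get` throughout so missing fields become `None`. Mapping the inference ID back to a document for the DynamoDB update is not described in the source and is left out; `parse_async_notification` and `split_s3_uri` are hypothetical names.

```python
import json
from typing import Tuple


def parse_async_notification(record: dict) -> dict:
    """Extract what the complete handler needs from one SNS record."""
    msg = json.loads(record["Sns"]["Message"])
    completed = msg.get("invocationStatus") == "Completed"
    return {
        "inference_id": msg.get("inferenceId"),
        "status": "COMPLETED" if completed else "FAILED",
        # Present on success: S3 URI of the inference output to copy/save.
        "output_location": msg.get("responseParameters", {}).get("outputLocation"),
        "failure_reason": msg.get("failureReason"),
        "topic": record["Sns"].get("TopicArn"),  # success vs. error topic
    }


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """'s3://bucket/a/b.out' -> ('bucket', 'a/b.out'), e.g. for s3.get_object."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key
```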
| Model | Backend | Description | Use Case |
|---|---|---|---|
| PP-OCRv5 | Lambda (CPU) | High-accuracy, general-purpose OCR for text extraction | General documents, multilingual text |
| PP-StructureV3 | Lambda (CPU) | Document structure analysis with table and layout detection | Tables, forms, complex layouts |
| PaddleOCR-VL | SageMaker (GPU) | Vision-language model for document understanding | Complex documents, contextual understanding |
PaddleOCR supports 80+ languages.
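| Language | Code | Language | Code |
|---|---|---|---|
| Chinese & English | ch | Korean | korean |
| English | en | Japanese | japan |
| Traditional Chinese | chinese_cht | French | fr |
| German | de | Spanish | es |
| Italian | it | Portuguese | pt |
| Russian | ru | Arabic | ar |
| Hindi | hi | Thai | th |
| Vietnamese | vi | Turkish | tr |

| Language | Code | Language | Code |
|---|---|---|---|
| Afrikaans | af | Albanian | sq |
| Basque | eu | Bosnian | bs |
| Catalan | ca | Croatian | hr |
| Czech | cs | Danish | da |
| Dutch | nl | Estonian | et |
| Finnish | fi | Galician | gl |
| Hungarian | hu | Icelandic | is |
| Indonesian | id | Irish | ga |
| Latvian | lv | Lithuanian | lt |
| Luxembourgish | lb | Malay | ms |
| Maltese | mt | Maori | mi |
| Norwegian | no | Occitan | oc |
| Polish | pl | Romanian | ro |
| Romansh | rm | Serbian (Latin) | rs_latin |
| Slovak | sk | Slovenian | sl |
| Swedish | sv | Tagalog | tl |
| Welsh | cy | Latin | la |

| Language | Code | Language | Code |
|---|---|---|---|
| Russian | ru | Ukrainian | uk |
| Belarusian | be | Bulgarian | bg |
| Serbian (Cyrillic) | sr | Macedonian | mk |
| Mongolian | mn | Kazakh | kk |
| Kyrgyz | ky | Tajik | tg |
| Tatar | tt | Uzbek | uz |
| Azerbaijani | az | Moldovan | mo |
| Bashkir | ba | Chuvash | cv |
| Mari | mhr | Udmurt | udm |
| Komi | kv | Ossetian | os |
| Buriat | bua | Kalmyk | xal |
| Tuvinian | tyv | Sakha | sah |
| Karakalpak | kaa | Abkhaz | ab |
| Adyghe | ady | Kabardian | kbd |
| Avar | av | Dargwa | dar |
| Ingush | inh | Chechen | ce |
| Lak | lki | Lezgian | lez |
| Tabasaran | tab | | |

| Language | Code | Language | Code |
|---|---|---|---|
| Arabic | ar | Persian | fa |
| Uyghur | ug | Urdu | ur |
| Pashto | ps | Kurdish | ku |
| Sindhi | sd | Balochi | bal |

| Language | Code | Language | Code |
|---|---|---|---|
| Hindi | hi | Marathi | mr |
| Nepali | ne | Tamil | ta |
| Telugu | te | Bihari | bh |
| Maithili | mai | Bhojpuri | bho |
| Magahi | mah | Sadri | sck |
| Newar | new | Konkani | gom |
| Sanskrit | sa | Haryanvi | bgc |
| Pali | pi | | |

| Language | Code | Language | Code |
|---|---|---|---|
| Greek | el | Swahili | sw |
| Quechua | qu | Old English | ang |

| Format | Extensions |
|---|---|
| PDF | .pdf |
| Images | .png, .jpg, .jpeg, .tiff, .bmp, .webp |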
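Before routing, a request can be checked against the tables above. This sketch assumes the Code column is the value passed as `lang` to PaddleOCR, and it includes only a handful of languages for brevity; `validate_request`, `SUPPORTED_EXTENSIONS`, and `LANG_CODES` are hypothetical names.

```python
import os

# Extensions from the format table.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".webp"}

# A few rows from the language tables; extend as needed.
LANG_CODES = {
    "Chinese & English": "ch",
    "English": "en",
    "Korean": "korean",
    "Japanese": "japan",
    "Traditional Chinese": "chinese_cht",
    "Serbian (Latin)": "rs_latin",
    "Serbian (Cyrillic)": "sr",
}


def validate_request(filename: str, language: str) -> str:
    """Reject unsupported files early and return the language code."""
    ext = os.path.splitext(filename)[1].lower()  # case-insensitive extension match
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    try:
        return LANG_CODES[language]
    except KeyError:
        raise ValueError(f"unknown language: {language}") from None
```

For example, `PaddleOCR(lang=validate_request("scan.JPG", "Korean"))` would construct a Korean recognizer, assuming the `paddleocr` package is installed.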