
# PaddleOCR on SageMaker

A processing pipeline that routes OCR tasks to two backends based on model characteristics.

| Backend | Models | Reason |
|---|---|---|
| Lambda (CPU) | PP-OCRv5, PP-StructureV3 | Lightweight models, no GPU needed, fast startup |
| SageMaker (GPU) | PaddleOCR-VL | Vision-language model, GPU required |

PP-OCRv5 and PP-StructureV3 run fast enough on CPU that they are processed directly on Lambda, avoiding SageMaker cold-start overhead entirely. PaddleOCR-VL requires a GPU for its per-region VLM inference, so it runs on SageMaker.


```
SQS (OCR Queue)
 → Lambda (OCR Invoker) ── routes by model
    ├─ [PP-OCRv5 / PP-StructureV3] ── CPU models
    │   → Lambda (OCR Processor) ── container image Lambda
    │      ├─ S3 (save result.json)
    │      └─ DynamoDB (update preprocess status)
    └─ [PaddleOCR-VL] ── GPU model
       ├─ Scale-out: DesiredInstanceCount → 1
       └─ InvokeEndpointAsync → SageMaker Endpoint
          → PaddleOCR-VL inference
             ├─ Success → SNS (Success) → OCR Complete Handler → DynamoDB + S3
             └─ Failure → SNS (Error) → OCR Complete Handler → DynamoDB

SageMaker scale-in:
CloudWatch Alarm (10 min idle)
 → SNS (Scale-in) → Scale-in Handler Lambda
 → DesiredInstanceCount → 0
```
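The routing step in the diagram can be sketched as a small decision plus a dispatch. This is illustrative only: `choose_backend`, `dispatch`, the endpoint name, and the task payload shape are assumptions, not the actual implementation.

```python
# Sketch of the OCR Invoker's routing logic (illustrative; function names,
# the endpoint name, and the task payload shape are assumptions).

CPU_MODELS = {"PP-OCRv5", "PP-StructureV3"}   # handled by the container-image Lambda
GPU_MODELS = {"PaddleOCR-VL"}                 # handled by the SageMaker endpoint

def choose_backend(model: str) -> str:
    """Route an OCR task to 'lambda' (CPU) or 'sagemaker' (GPU)."""
    if model in CPU_MODELS:
        return "lambda"
    if model in GPU_MODELS:
        return "sagemaker"
    raise ValueError(f"unknown OCR model: {model}")

def dispatch(task: dict) -> str:
    """Invoke the chosen backend. boto3 is imported lazily so the routing
    decision stays importable and testable without AWS dependencies."""
    backend = choose_backend(task["model"])
    import json
    import boto3
    if backend == "lambda":
        # Fire-and-forget async invoke of the OCR Lambda Processor.
        boto3.client("lambda").invoke(
            FunctionName="idp-v2-ocr-lambda-processor",
            InvocationType="Event",
            Payload=json.dumps(task),
        )
    else:
        # Scale out first (0 -> 1), then queue the async inference request.
        ...  # ensure DesiredInstanceCount >= 1 before invoking
        boto3.client("sagemaker-runtime").invoke_endpoint_async(
            EndpointName="paddleocr-vl",   # assumed endpoint name
            InputLocation=task["input_s3_uri"],
        )
    return backend
```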

Runs CPU-based OCR on a container image Lambda. Writes results directly to S3 and updates DynamoDB status without going through SageMaker.

| Item | Value |
|---|---|
| Function Name | idp-v2-ocr-lambda-processor |
| Runtime | Python 3.12 (Container Image) |
| Memory | 4096 MB |
| Timeout | 15 min |
| Base Image | public.ecr.aws/lambda/python:3.12 |
| Dependencies | paddleocr>=3.3.0, paddlepaddle>=3.2.2, boto3 |
| Model Cache | Model archives cached on S3 (reused after initial download) |

Processing Flow:

```
OCR Invoker (invoke async, Event type)
 → OCR Lambda Processor
    ├─ Download file from S3 → /tmp
    ├─ Load model (S3 cache → HuggingFace fallback)
    ├─ Run OCR inference
    ├─ Save result.json to S3
    └─ Update DynamoDB status (COMPLETED/FAILED)
```

No SNS callback needed: Unlike SageMaker async inference, the Lambda handles results directly, so no SNS topic is involved.
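The processor's flow can be sketched as one function. This is a simplified sketch, not the real handler: the AWS clients and OCR engine are injected as parameters so each step is visible, and the result key layout, table schema, and attribute names are assumptions.

```python
# Sketch of the OCR Lambda Processor flow (illustrative). Clients are injected
# for clarity; the result key layout and DynamoDB schema are assumptions.
import json

def process_document(event, s3, ocr_engine, table):
    """Run one OCR task: S3 download -> inference -> result.json -> status."""
    doc_id = event["document_id"]
    bucket, key = event["bucket"], event["key"]
    local_path = f"/tmp/{key.rsplit('/', 1)[-1]}"     # Lambda's writable dir
    try:
        s3.download_file(bucket, key, local_path)      # 1. fetch input
        result = ocr_engine(local_path)                # 2. run OCR inference
        s3.put_object(                                 # 3. persist result.json
            Bucket=bucket,
            Key=f"results/{doc_id}/result.json",       # assumed key layout
            Body=json.dumps(result).encode("utf-8"),
        )
        status = "COMPLETED"
    except Exception:
        status = "FAILED"
    table.update_item(                                 # 4. record final status
        Key={"document_id": doc_id},                   # assumed schema
        UpdateExpression="SET preprocess_status = :s",
        ExpressionAttributeValues={":s": status},
    )
    return status
```

Because the Lambda itself writes the result and the final status, there is no asynchronous callback to wire up, which is exactly the "no SNS topic" property noted above.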

PaddleOCR-VL is a vision-language model that performs a VLM inference for each detected text region, which requires a GPU. A 0 → 1 auto-scaling (scale-to-zero) configuration keeps the endpoint's cost low.

| Item | Value |
|---|---|
| Instance Type | ml.g5.xlarge (NVIDIA A10G 24 GB) |
| Min Instances | 0 (scale-to-zero) |
| Max Instances | 1 |
| Max Concurrent Invocations | 4 per instance |
| Invocation Timeout | 3,600 s (1 hour) |
| Max Response Size | 100 MB |
| Base Image | PyTorch 2.2.0 GPU (CUDA 11.8, Ubuntu 20.04) |

The VL model internally works as follows:

```
Image input
 → [Step 1] Layout detection (CPU/GPU) ── detect N text regions
 → [Step 2] Per-region VLM inference (GPU) ── N sequential calls
 → Merge results
```

When N text regions are detected, the VLM is called N times sequentially. This structure has three practical consequences:

- ~14 s per page, independent of image size but proportional to region count
- ~25% GPU utilization, since the GPU idles during CPU pre/post-processing between VLM inferences
- No multi-process serving on a single GPU: the VLM weighs ~12 GB, so two model instances exceed the 24 GB card (OOM)

These constraints explain the split: the lightweight models run on Lambda, avoiding SageMaker cold starts entirely, while only PaddleOCR-VL, which genuinely needs the GPU, stays on SageMaker.
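The sequential structure described above can be made concrete with a minimal sketch. `detect_layout` and `vlm_infer` stand in for PaddleOCR-VL internals; the names and the latency helper are assumptions, not the library's API.

```python
# Minimal sketch of the VL pipeline's structure: one layout pass, then one VLM
# call per detected region, strictly sequential. detect_layout and vlm_infer
# are stand-ins for PaddleOCR-VL internals (names are assumptions).

def run_vl_pipeline(image, detect_layout, vlm_infer):
    regions = detect_layout(image)          # Step 1: find N text regions
    outputs = []
    for region in regions:                  # Step 2: N sequential VLM calls;
        outputs.append(vlm_infer(region))   # the GPU idles during CPU pre/post
    return outputs                          # work between calls (merge step)

def estimate_page_seconds(n_regions, per_region_seconds):
    """Page latency grows with region count, not image size."""
    return n_regions * per_region_seconds
```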


Auto scaling applies only to the SageMaker (PaddleOCR-VL) backend; the Lambda backend relies on AWS Lambda's built-in automatic scaling.

| Item | Value |
|---|---|
| Trigger | OCR Invoker Lambda |
| Timing | Just before the SageMaker async inference invocation |
| Method | Direct `update_endpoint_weights_and_capacities` API call |
| Action | DesiredInstanceCount: 0 → 1 |
| Response Time | Immediate (API call) |
| Idempotent | No-op if already at 1 |

When the OCR Invoker Lambda needs to process a document with the VL model, it activates the endpoint before invoking inference. Starting from 0 instances, a cold-start delay is incurred until the instance becomes available.
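The idempotent scale-out step could look like the following sketch. The SageMaker client is passed in so the logic is testable; the endpoint and variant names are assumptions.

```python
# Sketch of the idempotent scale-out step (0 -> 1), using the real boto3
# SageMaker APIs describe_endpoint and update_endpoint_weights_and_capacities.
# The client is injected; endpoint and variant names are assumptions.

def ensure_endpoint_scaled_out(sm_client, endpoint_name="paddleocr-vl",
                               variant_name="AllTraffic"):
    """Set DesiredInstanceCount to 1 unless the variant is already there."""
    desc = sm_client.describe_endpoint(EndpointName=endpoint_name)
    variant = desc["ProductionVariants"][0]
    if variant.get("DesiredInstanceCount", 0) >= 1:
        return False                      # no-op: already scaled out
    sm_client.update_endpoint_weights_and_capacities(
        EndpointName=endpoint_name,
        DesiredWeightsAndCapacities=[
            {"VariantName": variant_name, "DesiredInstanceCount": 1},
        ],
    )
    return True                           # scale-out requested
```

Checking `DesiredInstanceCount` first makes repeated calls harmless, which matches the "no-op if already at 1" behavior in the table.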

| Item | Value |
|---|---|
| Trigger | CloudWatch Alarm → SNS → Scale-in Handler Lambda |
| Metric | ApproximateBacklogSizePerInstance |
| Condition | < 0.1 (effectively zero) |
| Evaluation Period | 10 consecutive minutes (1-min intervals, 10 periods) |
| Missing Data | Treated as BREACHING (counts toward the alarm) |
| Action | DesiredInstanceCount: 1 → 0 |

When no work remains in the queue for 10 minutes, the CloudWatch alarm fires, triggering the Scale-in Handler Lambda via SNS to reduce instances to zero.
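The alarm in the table can be expressed as `put_metric_alarm` parameters. The metric, period, threshold, and missing-data treatment come from the table above; the alarm name, endpoint dimension value, and SNS ARN are assumptions.

```python
# The scale-in alarm from the table, expressed as boto3 put_metric_alarm
# parameters. Apply with: boto3.client("cloudwatch").put_metric_alarm(**SCALE_IN_ALARM)
# Alarm name, endpoint dimension value, and the SNS topic ARN are assumptions.

SCALE_IN_ALARM = {
    "AlarmName": "idp-v2-ocr-vl-idle",                 # assumed name
    "Namespace": "AWS/SageMaker",
    "MetricName": "ApproximateBacklogSizePerInstance",
    "Dimensions": [{"Name": "EndpointName", "Value": "paddleocr-vl"}],
    "Statistic": "Average",
    "Period": 60,                  # 1-minute intervals
    "EvaluationPeriods": 10,       # 10 consecutive minutes
    "Threshold": 0.1,              # backlog effectively zero
    "ComparisonOperator": "LessThanThreshold",
    "TreatMissingData": "breaching",   # no data => idle => counts toward alarm
    "AlarmActions": ["arn:aws:sns:...:idp-v2-ocr-scale-in"],  # placeholder ARN
}
```

Treating missing data as breaching matters here: an idle endpoint with zero instances emits no backlog datapoints, and without this setting the alarm could never complete its ten periods.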

```
Document arrives → OCR Invoker checks model
 ├─ [PP-OCRv5/V3] → Lambda processes immediately (no cold start)
 └─ [VL] → SageMaker scale-out (0 → 1)
    → Inference processing (including cold start)
    → Processing complete → SNS → OCR Complete Handler
    → No additional requests for 10 minutes
    → CloudWatch Alarm fires → scale-in (1 → 0)
    → Billing stops (0 instances)
```

| Item | Value |
|---|---|
| Name | idp-v2-ocr-invoker |
| Runtime | Python 3.14 |
| Memory | 256 MB |
| Timeout | 1 min |
| Trigger | SQS (batch size: 1) |
| Role | Route by model: Lambda async invoke, or SageMaker scale-out + async inference |

| Item | Value |
|---|---|
| Name | idp-v2-ocr-lambda-processor |
| Runtime | Python 3.12 (Container Image) |
| Memory | 4096 MB |
| Timeout | 15 min |
| Trigger | Lambda async invoke (from OCR Invoker) |
| Role | OCR inference, save results to S3, update DynamoDB status |
| Target Models | PP-OCRv5, PP-StructureV3 |

| Item | Value |
|---|---|
| Name | idp-v2-ocr-complete-handler |
| Runtime | Python 3.14 |
| Memory | 256 MB |
| Timeout | 5 min |
| Trigger | SNS (Success + Error topics) |
| Role | Process SageMaker inference results, save to S3, update DynamoDB status |
| Target Models | PaddleOCR-VL (via SageMaker) |

| Item | Value |
|---|---|
| Name | idp-v2-ocr-scale-in |
| Runtime | Python 3.14 |
| Memory | 128 MB |
| Timeout | 30 s |
| Trigger | SNS (CloudWatch Alarm) |
| Role | DesiredInstanceCount → 0 |

Used only by the SageMaker (PaddleOCR-VL) path. The Lambda path does not use SNS.

| Topic | Purpose | Subscriber |
|---|---|---|
| idp-v2-ocr-success | Inference success notification | OCR Complete Handler |
| idp-v2-ocr-error | Inference failure notification | OCR Complete Handler |
| idp-v2-ocr-scale-in | Scale-in alarm notification | Scale-in Handler |
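On the success and error topics, the OCR Complete Handler receives SageMaker async inference notifications wrapped in SNS. A sketch of the parsing step follows; the message fields (`invocationStatus`, `responseParameters.outputLocation`, `failureReason`) follow SageMaker's async inference notification format as commonly documented, but should be verified against real messages.

```python
# Sketch of the OCR Complete Handler's SNS parsing step. Message field names
# follow SageMaker async inference notifications (invocationStatus,
# responseParameters.outputLocation, failureReason); verify against real events.
import json

def parse_completion(sns_event):
    """Extract (status, output_s3_uri, failure_reason) from an SNS record."""
    msg = json.loads(sns_event["Records"][0]["Sns"]["Message"])
    if msg.get("invocationStatus") == "Completed":
        return "COMPLETED", msg["responseParameters"]["outputLocation"], None
    return "FAILED", None, msg.get("failureReason")
```

The handler then writes the output to S3 and updates the DynamoDB status, mirroring what the Lambda processor does for CPU models.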

| Model | Backend | Description | Use Case |
|---|---|---|---|
| PP-OCRv5 | Lambda (CPU) | High-accuracy general-purpose text extraction OCR | General documents, multilingual text |
| PP-StructureV3 | Lambda (CPU) | Document structure analysis with table and layout detection | Tables, forms, complex layouts |
| PaddleOCR-VL | SageMaker (GPU) | Vision-language model for document understanding | Complex documents, contextual understanding |

PaddleOCR supports 80+ languages.

| Language | Code | Language | Code |
|---|---|---|---|
| Chinese & English | ch | Korean | korean |
| English | en | Japanese | japan |
| Traditional Chinese | chinese_cht | French | fr |
| German | de | Spanish | es |
| Italian | it | Portuguese | pt |
| Russian | ru | Arabic | ar |
| Hindi | hi | Thai | th |
| Vietnamese | vi | Turkish | tr |
| Language | Code | Language | Code |
|---|---|---|---|
| Afrikaans | af | Albanian | sq |
| Basque | eu | Bosnian | bs |
| Catalan | ca | Croatian | hr |
| Czech | cs | Danish | da |
| Dutch | nl | Estonian | et |
| Finnish | fi | Galician | gl |
| Hungarian | hu | Icelandic | is |
| Indonesian | id | Irish | ga |
| Latvian | lv | Lithuanian | lt |
| Luxembourgish | lb | Malay | ms |
| Maltese | mt | Maori | mi |
| Norwegian | no | Occitan | oc |
| Polish | pl | Romanian | ro |
| Romansh | rm | Serbian (Latin) | rs_latin |
| Slovak | sk | Slovenian | sl |
| Swedish | sv | Tagalog | tl |
| Welsh | cy | Latin | la |
| Language | Code | Language | Code |
|---|---|---|---|
| Russian | ru | Ukrainian | uk |
| Belarusian | be | Bulgarian | bg |
| Serbian (Cyrillic) | sr | Macedonian | mk |
| Mongolian | mn | Kazakh | kk |
| Kyrgyz | ky | Tajik | tg |
| Tatar | tt | Uzbek | uz |
| Azerbaijani | az | Moldovan | mo |
| Bashkir | ba | Chuvash | cv |
| Mari | mhr | Udmurt | udm |
| Komi | kv | Ossetian | os |
| Buriat | bua | Kalmyk | xal |
| Tuvinian | tyv | Sakha | sah |
| Karakalpak | kaa | Abkhaz | ab |
| Adyghe | ady | Kabardian | kbd |
| Avar | av | Dargwa | dar |
| Ingush | inh | Chechen | che |
| Lak | lki | Lezgian | lez |
| Tabasaran | tab | | |
| Language | Code | Language | Code |
|---|---|---|---|
| Arabic | ar | Persian | fa |
| Uyghur | ug | Urdu | ur |
| Pashto | ps | Kurdish | ku |
| Sindhi | sd | Balochi | bal |
| Language | Code | Language | Code |
|---|---|---|---|
| Hindi | hi | Marathi | mr |
| Nepali | ne | Tamil | ta |
| Telugu | te | Bihari | bh |
| Maithili | mai | Bhojpuri | bho |
| Magahi | mah | Sadri | sck |
| Newar | new | Konkani | gom |
| Sanskrit | sa | Haryanvi | bgc |
| Pali | pi | | |
| Language | Code | Language | Code |
|---|---|---|---|
| Greek | el | Swahili | sw |
| Quechua | qu | Old English | ang |
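The codes in these tables are the values passed to PaddleOCR's `lang` parameter. A small sketch with a guard follows; the `SUPPORTED_LANGS` sample is a subset copied from the tables, and the construction is only attempted when PaddleOCR is installed.

```python
# Sketch of language selection via PaddleOCR's `lang` parameter. The set below
# is a small sample of the codes from the tables above, not the full list.

SUPPORTED_LANGS = {"ch", "en", "korean", "japan", "fr", "de", "ar", "hi"}

def make_ocr(lang: str):
    """Build a PaddleOCR engine for one of the supported language codes."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported lang code: {lang}")
    from paddleocr import PaddleOCR   # heavyweight dependency, imported lazily
    return PaddleOCR(lang=lang)
```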

| Format | Extensions |
|---|---|
| PDF | .pdf |
| Images | .png, .jpg, .jpeg, .tiff, .bmp, .webp |