ModelId ModelSeries ModelType Supported Instances Supported Services Support China Region
glm-4-9b-chat glm4 llm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
GLM-4-9B-0414 glm4 llm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
GLM-4-32B-0414 glm4 llm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
GLM-Z1-9B-0414 glm4 llm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
GLM-Z1-32B-0414 glm4 llm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
GLM-Z1-Rumination-32B-0414 glm4 llm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
internlm2_5-20b-chat-4bit-awq internlm2.5 llm g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.12xlarge,g5.16xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
internlm2_5-20b-chat internlm2.5 llm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
internlm2_5-7b-chat internlm2.5 llm g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.12xlarge,g5.16xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
internlm2_5-7b-chat-4bit internlm2.5 llm g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.12xlarge,g5.16xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
internlm2_5-1_8b-chat internlm2.5 llm g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.12xlarge,g5.16xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
Qwen2.5-7B-Instruct qwen2.5 llm g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.12xlarge,g5.16xlarge,g5.24xlarge,g5.48xlarge,inf2.8xlarge sagemaker_realtime,sagemaker_async,ecs
Qwen2.5-72B-Instruct-AWQ qwen2.5 llm g5.12xlarge,g5.24xlarge,g5.48xlarge,inf2.24xlarge sagemaker_realtime,sagemaker_async,ecs
Qwen2.5-72B-Instruct qwen2.5 llm g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
Qwen2.5-72B-Instruct-AWQ-128k qwen2.5 llm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
Qwen2.5-32B-Instruct qwen2.5 llm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
Qwen2.5-0.5B-Instruct qwen2.5 llm g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge,inf2.8xlarge sagemaker_realtime,sagemaker_async,ecs
Qwen2.5-1.5B-Instruct qwen2.5 llm g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge sagemaker_realtime,sagemaker_async,ecs
Qwen2.5-3B-Instruct qwen2.5 llm g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge sagemaker_realtime,sagemaker_async,ecs
Qwen2.5-14B-Instruct-AWQ qwen2.5 llm g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge,g4dn.2xlarge sagemaker_realtime,sagemaker_async,ecs
Qwen2.5-14B-Instruct qwen2.5 llm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
QwQ-32B-Preview qwen reasoning model llm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
QwQ-32B qwen reasoning model llm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
Qwen3-8B qwen3 llm g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge,g4dn.2xlarge sagemaker_realtime,sagemaker_async,ecs
Qwen3-0.6B qwen3 llm g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge,g4dn.2xlarge sagemaker_realtime,sagemaker_async,ecs
Qwen3-1.7B qwen3 llm g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge sagemaker_realtime,sagemaker_async,ecs
Qwen3-4B qwen3 llm g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge,g4dn.2xlarge sagemaker_realtime,sagemaker_async,ecs
Qwen3-14B-AWQ qwen3 llm g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge,g4dn.2xlarge sagemaker_realtime,sagemaker_async,ecs
Qwen3-14B qwen3 llm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
Qwen3-32B-AWQ qwen3 llm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
Qwen3-32B qwen3 llm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
Qwen3-30B-A3B qwen3 llm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
Qwen3-235B-A22B qwen3 llm
Qwen3-235B-A22B-FP8 qwen3 llm
llama-3.3-70b-instruct-awq llama llm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
DeepSeek-R1-Distill-Qwen-32B deepseek reasoning model llm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
DeepSeek-R1-Distill-Qwen-14B deepseek reasoning model llm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
DeepSeek-R1-Distill-Qwen-7B deepseek reasoning model llm g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge sagemaker_realtime,sagemaker_async,ecs
DeepSeek-R1-Distill-Qwen-1.5B deepseek reasoning model llm g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge sagemaker_realtime,sagemaker_async,ecs
DeepSeek-R1-Distill-Qwen-1.5B_ollama deepseek reasoning model llm g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge sagemaker_realtime,sagemaker_async,ecs
DeepSeek-R1-Distill-Qwen-1.5B-GGUF deepseek reasoning model llm g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge sagemaker_realtime,sagemaker_async,ecs
DeepSeek-R1-Distill-Qwen-32B-GGUF deepseek reasoning model llm g5.12xlarge,g5.24xlarge sagemaker_realtime,sagemaker_async,ecs
DeepSeek-R1-Distill-Llama-8B deepseek reasoning model llm g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge sagemaker_realtime,sagemaker_async,ecs
deepseek-r1-distill-llama-70b-awq deepseek reasoning model llm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
deepseek-r1-671b-1.58bit_gguf deepseek reasoning model llm g5.8xlarge,g5.12xlarge,g5.16xlarge,g5.24xlarge,g5.48xlarge,g6.8xlarge,g6.12xlarge,g6.16xlarge,g6.24xlarge,g6.48xlarge,g6e.4xlarge,g6e.8xlarge,g6e.12xlarge,g6e.16xlarge,g6e.24xlarge,g6e.48xlarge sagemaker_realtime,sagemaker_async,ecs
deepseek-r1-671b-2.51bit_gguf deepseek reasoning model llm g5.12xlarge,g5.16xlarge,g5.24xlarge,g5.48xlarge,g6.12xlarge,g6.16xlarge,g6.24xlarge,g6.48xlarge,g6e.8xlarge,g6e.12xlarge,g6e.16xlarge,g6e.24xlarge,g6e.48xlarge sagemaker_realtime,sagemaker_async,ecs
DeepSeek-R1 deepseek reasoning model llm
deepseek-r1-671b-4bit_gguf deepseek reasoning model llm g5.24xlarge,g5.48xlarge,g6.24xlarge,g6.48xlarge,g6e.16xlarge,g6e.24xlarge,g6e.48xlarge sagemaker_realtime,sagemaker_async,ecs
deepseek-v3-UD-IQ1_M_ollama deepseek v3 llm g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
Baichuan-M1-14B-Instruct baichuan llm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
ReaderLM-v2 jina llm g4dn.2xlarge,g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge,inf2.8xlarge sagemaker_realtime,sagemaker_async,ecs
txgemma-9b-chat txgemma llm g5.12xlarge,g5.24xlarge,g5.48xlarge,g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge sagemaker_realtime,sagemaker_async,ecs
txgemma-27b-chat txgemma llm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
Qwen2-VL-72B-Instruct-AWQ qwen2vl vlm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async
Qwen2.5-VL-72B-Instruct-AWQ qwen2vl vlm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async
Qwen2.5-VL-32B-Instruct qwen2vl vlm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async
QVQ-72B-Preview-AWQ qwen reasoning model vlm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async
Qwen2-VL-7B-Instruct qwen2vl vlm g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.12xlarge,g5.16xlarge,g5.24xlarge,g5.48xlarge,g6e.2xlarge sagemaker_realtime,sagemaker_async
UI-TARS-1.5-7B agent vlm g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.12xlarge,g5.16xlarge,g5.24xlarge,g5.48xlarge,g6e.2xlarge sagemaker_realtime,sagemaker_async
InternVL2_5-78B-AWQ internvl2.5 vlm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async
gemma-3-4b-it gemma3 vlm g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge sagemaker_realtime,sagemaker_async,ecs
gemma-3-12b-it gemma3 vlm g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge sagemaker_realtime,sagemaker_async,ecs
gemma-3-27b-it gemma3 vlm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
Mistral-Small-3.1-24B-Instruct-2503 mistral vlm g5.12xlarge,g5.24xlarge,g5.48xlarge sagemaker_realtime,sagemaker_async,ecs
txt2video-LTX comfyui video g5.4xlarge,g5.8xlarge,g6e.2xlarge sagemaker_async
whisper whisper whisper g5.xlarge,g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge sagemaker_async
bce-embedding-base_v1 bce embedding g4dn.2xlarge,g5.xlarge,g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge sagemaker_realtime,ecs
bge-base-en-v1.5 bge embedding g5.xlarge,g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge sagemaker_realtime,ecs
bge-m3 bge embedding g5.xlarge,g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge sagemaker_realtime,ecs
jina-embeddings-v3 jina embedding g5.xlarge,g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge sagemaker_realtime,ecs
bge-reranker-v2-m3 bge rerank g4dn.2xlarge,g5.xlarge,g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge sagemaker_realtime,ecs
bge-reranker-large bge rerank g4dn.2xlarge,g5.xlarge,g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge sagemaker_realtime,ecs
jina-reranker-v2-base-multilingual jina rerank g5.xlarge,g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge sagemaker_realtime,ecs
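
Because some series names contain spaces (for example "qwen reasoning model") and a few rows list no instances or services, splitting a row on whitespace alone is ambiguous. The following is a minimal sketch of one way to read a row of the table above, assuming the layout shown here: model ID first, model type immediately before the instance list, and comma-separated instance and service lists at the end. The names `ModelRow`, `parse_row`, and `supports` are hypothetical helpers for illustration and are not part of any deployment tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelRow:
    model_id: str
    series: str
    model_type: str
    instances: List[str] = field(default_factory=list)
    services: List[str] = field(default_factory=list)


def parse_row(line: str) -> ModelRow:
    """Parse one whitespace-separated table row (assumed layout, see lead-in)."""
    tokens = line.split()
    instances: List[str] = []
    services: List[str] = []

    # The services column, when present, is the last token and starts
    # with "sagemaker" or "ecs".
    if tokens and tokens[-1].startswith(("sagemaker", "ecs")):
        services = tokens.pop().split(",")

    # The instances column, when present, is a comma-separated list of
    # instance types such as "g5.12xlarge". Duplicates are dropped while
    # keeping the original order.
    if tokens and "xlarge" in tokens[-1]:
        instances = list(dict.fromkeys(tokens.pop().split(",")))

    # What remains is: model ID, series (possibly several words), model type.
    model_type = tokens.pop()
    model_id = tokens[0]
    series = " ".join(tokens[1:])
    return ModelRow(model_id, series, model_type, instances, services)


def supports(row: ModelRow, instance: str, service: str) -> bool:
    """Return True if the row lists both the instance type and the service."""
    return instance in row.instances and service in row.services


row = parse_row(
    "QwQ-32B qwen reasoning model llm "
    "g5.12xlarge,g5.24xlarge,g5.48xlarge "
    "sagemaker_realtime,sagemaker_async,ecs"
)
print(row.series, supports(row, "g5.12xlarge", "ecs"))
```

Rows that list neither instances nor services, such as Qwen3-235B-A22B, parse with both lists empty, so `supports` returns False for them rather than raising an error.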