Document Generation MCP with AgentCore Runtime¶
AI-powered document generation for Amazon Quick Suite. Creates professional docx, pdf, pptx, xlsx, and HTML files using a Strands SDK agent with Claude Sonnet on Bedrock and AgentCore Code Interpreter.
What You Get With This Solution¶
A chat-to-document pipeline that produces real, downloadable files — not text you copy-paste and reformat. The agent writes and executes Python code in a sandboxed Code Interpreter using real libraries (openpyxl, python-docx, reportlab, python-pptx), so the output includes:
- Formatted Word documents with headings, tables, bullet points, and page numbers
- Multi-sheet Excel workbooks with formulas, conditional formatting, and charts
- PowerPoint decks with slide layouts, themes, and data visualizations
- PDF reports with professional typography, headers, footers, and page breaks
- Interactive HTML prototypes with CSS and working JavaScript
The output is a file you download with one click from the chat — ready to attach to an email or present in a meeting.
Sample Outputs¶
These were generated entirely by the agent from natural language prompts — no manual editing:
| File | Prompt Summary |
|---|---|
| Employee_Performance_Tracking.xlsx | Employee performance spreadsheet with quarterly KPI scores, weighted averages, and distribution charts |
| Cloud_Migration_Business_Proposal.pdf | Cloud migration business proposal with executive summary, cost analysis, and timeline |
| AI_BackOffice_Automation_Pitch_Deck.pptx | AI back-office automation pitch deck with ROI projections and implementation roadmap |
| dataflow-landing-page.html | Product landing page with responsive layout, feature cards, and pricing tiers |
What You Can Ask¶
Spreadsheets (xlsx):
- "Build a project budget tracker spreadsheet with cost categories, monthly actuals vs. forecast, variance formulas, and a burn-down chart"
- "Create an employee performance tracking spreadsheet with quarterly KPI scores, weighted averages, and distribution charts"
- "Generate a sales pipeline spreadsheet with deal stages, win probability, weighted revenue, and a funnel chart"
Presentations (pptx):
- "Create a 15-slide Q4 business review PowerPoint with revenue charts, regional breakdowns, and key metrics"
- "Build an architecture decision record PowerPoint comparing 3 approaches with pros/cons tables and a recommendation slide"
Documents (docx):
- "Write a technical design Word document for a microservices migration with architecture diagrams described in tables, risk matrix, and timeline"
- "Generate an onboarding guide Word document with checklists, role-specific sections, and a 30-60-90 day plan table"
PDFs (pdf):
- "Create a professional invoice PDF with line items, tax calculations, and company branding"
- "Build a compliance audit report PDF with findings table, severity ratings, and remediation timeline"
Web prototypes (frontend-design):
- "Design a dashboard landing page HTML with a sidebar nav, metric cards, and a responsive data table"
- "Create a pricing page HTML with three tiers, feature comparison grid, and toggle between monthly/annual"
The Key Difference¶
The agent doesn't just write text — it writes and executes Python code in a sandboxed Code Interpreter. That means it can use real libraries (openpyxl, python-docx, reportlab, python-pptx) to produce files with:
- Formulas and cell references (not just static numbers)
- Conditional formatting and data validation
- Charts generated from actual data
- Proper document styling, fonts, and page layout
- Multi-sheet/multi-slide structure
- Interactive HTML with working JavaScript
The document generation skills for docx, pdf, pptx, and xlsx are inspired by Anthropic's open-source skills, adapted here to run on AgentCore Runtime with Code Interpreter and enhanced with tool call budgeting and base64 capture hooks for reliability.
Architecture¶
Amazon Quick Suite (Chat)
│
▼
AgentCore Gateway (MCP tools) ← CDK-managed: Gateway + Cognito + Lambdas
│
├── create_document ──→ Lambda (submit_job)
│ │
│ ├─ 1. Create job record in DynamoDB (SUBMITTED)
│ ├─ 2. Invoke AgentCore Runtime (fire-and-forget)
│ ├─ 3. Start Step Function polling loop
│ └─ 4. Return job_id immediately
│
│ AgentCore Runtime (runs independently, no timeout)
│ ┌──────────────────────────────────┐
│ │ Strands SDK Agent │
│ │ Model: Claude Sonnet 4.6 │
│ │ (cross-region profile) │
│ │ Tool: Code Interpreter │
│ │ │
│ │ Generates Python code │
│ │ → executes in sandbox │
│ │ → produces .docx/.pdf/… │
│ │ → direct-persists to S3 + DynamoDB│
│ └──────────────────────────────────┘
│
│ Step Function (polling loop)
│ ┌──────────────────────────────────┐
│ │ Wait 30s │
│ │ → CheckAgentStatus Lambda │
│ │ → COMPLETED? → MarkCompleted │
│ │ → still running? → loop back │
│ │ → error? → MarkFailed │
│ │ Timeout: 45 minutes │
│ └──────────────────────────────────┘
│
└── get_document_job_result ──→ Lambda (get_result)
│
▼
DynamoDB → CloudFront URL
│
▼
Returns download link to chat
(CloudFront → S3, clean URLs
that work on corporate networks)
Key Design Decisions¶
- Strands SDK — agent framework running on AgentCore Runtime
- Claude Sonnet 4.6 (
us.anthropic.claude-sonnet-4-6) — cross-region inference profile on Bedrock - AgentCore Code Interpreter — secure sandbox for Python code execution
- Tool call budget — MaxToolCallsHook limits the agent to 20 code executions to prevent runaway loops
- Base64 capture hook — captures file output in real-time before conversation trimming
- Fire-and-forget invocation — submit_job invokes the agent with a 10s read timeout and doesn't wait for completion; the agent runs independently on AgentCore Runtime with no Lambda timeout constraint
- Direct-persist — the agent uploads the result to S3 and marks the job COMPLETED in DynamoDB before returning, so the result is persisted regardless of any downstream timeouts
- Step Function polling loop — polls DynamoDB every 30s to detect when the agent finishes; 45-minute timeout as a safety net
- Async submit/poll — works around Quick Suite's 60-second MCP timeout
- S3 + CloudFront — file delivery via clean
*.cloudfront.netURLs that aren't blocked by corporate proxies (S3 presigned URLs are often blocked) - CDK-managed Gateway — Gateway, Cognito, CloudFront, and all infrastructure in a single CDK stack
Prerequisites¶
Before you start, make sure you have:
- AWS CLI v2 installed and configured with credentials for your target account
- Python 3.12+ — the agent and CDK stack both use Python
- Node.js 18+ and npm — required for the AWS CDK CLI (
npx cdk) - Bedrock model access — enable
anthropic.claude-sonnet-4-6(or the cross-region inference profileus.anthropic.claude-sonnet-4-6) in the Bedrock console for your account and region - AgentCore CLI — installed via
pip install bedrock-agentcore[starter-toolkit](handled byrequirements.txtin Step 0)
Project Structure¶
agentcore_runtime/ Strands agent deployed to AgentCore Runtime
agent.py Agent code (Claude Sonnet + Code Interpreter + hooks)
requirements.txt Python dependencies for the agent
create-iam-role.sh IAM role setup for AgentCore Runtime
cdk/ CDK infrastructure (Python)
app.py CDK app entry point
document_skills_stack.py Stack: DynamoDB, S3, CloudFront, Lambdas, Step Function, Gateway, Cognito
requirements.txt Python CDK dependencies
cdk.json CDK app config (runs `python3 app.py`)
lambdas/
submit_job/ Accepts request, invokes agent (fire-and-forget), starts polling loop
check_agent_status/ Polls DynamoDB to detect when agent finishes
get_job_result/ Reads DynamoDB status, returns presigned S3 download URL
update_job/ Updates DynamoDB job status (used by Step Function)
gateway/
openapi-spec.yaml MCP tool definitions (reference documentation)
samples/ Sample outputs generated by the agent
Deployment¶
Three steps. Each step depends on the previous one.
Set your target region for all steps (used throughout):
export AWS_REGION=us-east-1 # or your preferred region
Step 0: Set up Python environment¶
Create a virtual environment and install the AgentCore CLI and CDK dependencies:
python3.12 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip install -r cdk/requirements.txt
This installs:
bedrock-agentcore[starter-toolkit]— provides theagentcoreCLI for deploying the agentaws-cdk-libandconstructs— Python CDK libraries for the infrastructure stack
Verify the CLI is available:
agentcore --help
Step 1: Create IAM role for AgentCore Runtime¶
The agent needs an IAM role that allows it to invoke Bedrock models, use Code Interpreter, and write CloudWatch logs.
cd agentcore_runtime
chmod +x create-iam-role.sh
./create-iam-role.sh
cd ..
The script outputs a Role ARN like:
arn:aws:iam::<account-id>:role/DocumentSkillsAgentCoreRole
Save this — you'll need it in Step 2.
What the role allows:
bedrock:InvokeModel/bedrock:InvokeModelWithResponseStream— call Claude Sonnet (cross-region inference profiles)bedrock-agentcore:StartCodeInterpreterSession,InvokeCodeInterpreter,StopCodeInterpreterSession, etc. — Code Interpreter sessions (managed + custom)s3:PutObject/dynamodb:UpdateItem— direct-persist (agent writes results to S3 + DynamoDB)logs:CreateLogGroup/logs:PutLogEvents— CloudWatch logging
Step 2: Deploy the agent to AgentCore Runtime¶
source .venv/bin/activate
agentcore deploy
On first run, the CLI will interactively prompt you:
Where should we create your new agent?
> 1. Create a new agent
Agent name: document_skills_agent
Region: us-east-1
Execution role ARN: <paste the role ARN from Step 1>
It creates .bedrock_agentcore.yaml with your settings and deploys the agent.
This file is gitignored because it contains account-specific config.
On success, you'll see:
✅ Agent deployed successfully
Agent ARN: arn:aws:bedrock-agentcore:<region>:<account>:runtime/document_skills_agent-XxxYyyZzz
Note the Runtime ID (the part after runtime/, e.g. document_skills_agent-XxxYyyZzz).
You'll need it in Step 3.
Subsequent deploys (after code changes) just run agentcore deploy — no prompts.
Step 3: Deploy CDK infrastructure¶
The CDK stack creates everything: DynamoDB table, S3 bucket, Lambda functions, Step Function, AgentCore Gateway, and Cognito authorizer — all in one deploy.
cd cdk
# Install the CDK CLI (if not already installed globally)
npm install -g aws-cdk
# Bootstrap CDK in your account/region (first time only)
cdk bootstrap aws://<account-id>/$AWS_REGION
# Deploy the stack, passing the Runtime ID from Step 2
cdk deploy --parameters AgentCoreRuntimeId=<runtime-id-from-step-2>
Example with a real runtime ID:
cdk deploy --parameters AgentCoreRuntimeId=document_skills_agent-XxxYyyZzz
CDK will show you the resources it plans to create and ask for confirmation.
Type y to proceed.
On success, the stack outputs all the values you need for Quick Suite:
Outputs:
QuickSuiteDocumentSkills.McpUrl = https://...gateway.bedrock-agentcore.<region>.amazonaws.com/mcp
QuickSuiteDocumentSkills.TokenUrl = https://docskills-XXXXXXXX.auth.<region>.amazoncognito.com/oauth2/token
QuickSuiteDocumentSkills.ClientId = abc123def456...
QuickSuiteDocumentSkills.Scope = document-skills-gateway/invoke
QuickSuiteDocumentSkills.UserPoolId = <region>_XxxYyy
QuickSuiteDocumentSkills.GatewayId = ...
QuickSuiteDocumentSkills.SubmitJobFnArn = arn:aws:lambda:...
QuickSuiteDocumentSkills.GetJobResultFnArn = arn:aws:lambda:...
QuickSuiteDocumentSkills.DocsBucketName = document-skills-output-...
QuickSuiteDocumentSkills.JobsTableName = document-skill-jobs
QuickSuiteDocumentSkills.StateMachineArn = arn:aws:states:...
To get the Client Secret (not included in stack outputs because Cognito doesn't expose it as a CloudFormation attribute), run:
aws cognito-idp describe-user-pool-client \
--user-pool-id <UserPoolId-from-output> \
--client-id <ClientId-from-output> \
--query 'UserPoolClient.ClientSecret' \
--output text --region $AWS_REGION
cd ..
What gets created:
- DynamoDB table (
document-skill-jobs) — tracks job status, TTL-enabled - S3 bucket (
document-skills-output-<account>-<region>) — stores generated files, 7-day auto-expiry - CloudFront distribution — serves download URLs via
*.cloudfront.net(avoids corporate proxy blocks on S3 presigned URLs) - 3 Lambda functions — submit_job (invokes agent + starts polling), check_agent_status, update_job, get_job_result
- Step Function (
document-skill-orchestrator) — polling loop: Wait → Check → Choice (loop or done) - AgentCore Gateway (
document-skills-gateway) — MCP gateway with two tool targets - Cognito User Pool — OAuth2 authorizer with
client_credentialsgrant for the gateway
Configure Quick Suite¶
Using the CDK stack outputs (and the Client Secret from the command above):
- Go to Quick Suite Admin → MCP Actions → Add MCP Server
- Fill in:
| Setting | Value |
|---|---|
| MCP URL | McpUrl from stack output |
| Token URL | TokenUrl from stack output |
| Client ID | ClientId from stack output |
| Client Secret | from describe-user-pool-client command |
| Scope | Scope from stack output (document-skills-gateway/invoke) |
The auth uses OAuth2 client_credentials flow — Quick Suite requests a token
from the Cognito token URL using the client ID + secret, then passes that JWT
in the Authorization header when calling the MCP gateway.
- Save and test by asking Quick Suite to create a document
If you need to retrieve these values later:
# All values except Client Secret
aws cloudformation describe-stacks \
--stack-name QuickSuiteDocumentSkills \
--query 'Stacks[0].Outputs' \
--output table --region $AWS_REGION
# Client Secret
aws cognito-idp describe-user-pool-client \
--user-pool-id <UserPoolId> \
--client-id <ClientId> \
--query 'UserPoolClient.ClientSecret' \
--output text --region $AWS_REGION
Create a Custom Agent with Document Skills¶
You can create a custom Quick Suite agent that uses the document generation MCP tools. The key behavior: after submitting a job, the agent should automatically poll for the result instead of asking the user what to do next.
Agent Setup¶
- Go to Quick Suite Admin → Custom Agents → Create Agent
- Configure the agent with the MCP server you set up in the previous section
- Use the system prompt below
Recommended System Prompt¶
You are a document creation assistant. You help users create professional
documents (Word, PDF, PowerPoint, Excel, and HTML) from natural language
descriptions.
You have access to two tools:
- create_document: Submit a document creation job
- get_document_job_result: Check job status and get the download link
WORKFLOW — follow this exactly for every document request:
1. Determine the skill_type from the user's request:
- Word document → "docx"
- PDF → "pdf"
- PowerPoint/presentation/deck → "pptx"
- Excel/spreadsheet → "xlsx"
- HTML/web page/landing page → "frontend-design"
2. Call create_document with the skill_type, a detailed prompt based on
the user's request, and an appropriate filename.
3. After the job is submitted, tell the user EXACTLY this:
"Your document is being generated. This can take 3-8 minutes for
complex documents.
To check status, type: **check status {job_id}**"
Replace {job_id} with the actual job ID returned by create_document.
4. When the user sends "check status <job_id>", call get_document_job_result
with that job_id.
- If status is COMPLETED: present the download link to the user.
- If status is SUBMITTED or PROCESSING: tell the user the document is
still being generated and to try again in a minute.
- If status is FAILED: tell the user what went wrong and offer to retry.
IMPORTANT RULES:
- After calling create_document, do NOT try to poll or call
get_document_job_result on your own. You MUST wait for the user to
ask for status.
- Always give the user the exact "check status {job_id}" query to copy.
- When COMPLETED, always present the download link clearly.
- If the user asks to "check status" without a job_id, ask them for it.
- You can handle multiple document requests in one conversation.
How It Works¶
With this prompt, the agent flow looks like:
User: "Create a Q4 business review PowerPoint with revenue charts"
│
Agent: calls create_document(skill_type="pptx", prompt="...", filename="Q4_Review.pptx")
│
Agent: "Your document is being generated. This can take 3-8 minutes
for complex documents.
To check status, type: check status abc-123-def-456"
│
... user waits a few minutes ...
│
User: "check status abc-123-def-456"
│
Agent: calls get_document_job_result(job_id="abc-123-def-456") → COMPLETED
Agent: "Your PowerPoint is ready! Download it here: [link]"
The user controls when to check — the agent gives them the exact query to use.
Testing¶
Test the agent directly (without Gateway/Quick Suite)¶
You can invoke the agent directly using the AgentCore CLI:
source .venv/bin/activate
agentcore invoke -a document_skills_agent '{
"skill_type": "xlsx",
"prompt": "Create a simple budget tracker with 3 months of expenses and a totals row",
"filename": "test_budget.xlsx"
}'
The response will contain file_base64 — the generated file encoded as base64.
(Without job_id/docs_bucket/jobs_table, the agent runs synchronously.)
Test the full pipeline (Lambda → Agent → Step Function)¶
Invoke the submit_job Lambda directly:
aws lambda invoke \
--function-name document-skill-submit-job \
--payload '{"skill_type":"xlsx","prompt":"Create a simple budget tracker","filename":"test.xlsx"}' \
--cli-binary-format raw-in-base64-out \
--region $AWS_REGION \
/dev/stdout
This returns a job_id. Then poll for the result:
aws lambda invoke \
--function-name document-skill-get-result \
--payload '{"job_id":"<job-id-from-above>"}' \
--cli-binary-format raw-in-base64-out \
--region $AWS_REGION \
/dev/stdout
Poll every 10 seconds until status is COMPLETED (includes a download_url)
or FAILED.
Updating the Agent¶
After making changes to agentcore_runtime/agent.py:
source .venv/bin/activate
agentcore deploy
The agent redeploys in ~2 minutes. No CDK redeploy needed for agent-only changes.
Updating the Infrastructure¶
After making changes to cdk/document_skills_stack.py:
cd cdk
cdk deploy --parameters AgentCoreRuntimeId=<your-runtime-id>
cd ..
CDK will show a diff of what changed and ask for confirmation.
Cleanup¶
To tear down all resources:
# 1. Destroy the CDK stack (DynamoDB, S3, Lambdas, Step Function, Gateway, Cognito)
cd cdk
cdk destroy
cd ..
# 2. Destroy the AgentCore Runtime agent
agentcore destroy
CDK handles the Gateway and Cognito cleanup automatically — no separate gateway deletion step needed.
Supported Document Types¶
| Skill | Output | Libraries Used |
|---|---|---|
| docx | Word document | python-docx, Pillow |
| PDF document | reportlab, Pillow | |
| pptx | PowerPoint | python-pptx, matplotlib |
| xlsx | Excel spreadsheet | openpyxl, matplotlib, pandas |
| frontend-design | HTML/CSS/JS | Pure Python file write |
Agent Reliability Features¶
The agent includes two Strands hooks to handle edge cases:
- MaxToolCallsHook (20 calls max) — Three-phase approach:
- Calls 1–18: Normal execution
- Call 19: Warning — cancels the call and tells the model to output base64 next
- Call 20: Final allowed call (should be base64 output)
-
Call 21+: Hard-stop via
stop_event_loop -
Base64CaptureHook — Captures file output from tool results in real-time, before
SlidingWindowConversationManagertrims old messages between turns.
Troubleshooting¶
agentcore deploy fails with STS global endpoint error:
export AWS_STS_REGIONAL_ENDPOINTS=regional
Or add sts_regional_endpoints = regional to your AWS config profile.
CDK bootstrap fails with "bucket already exists":
The CDK bootstrap bucket from a previous attempt may be orphaned. Delete the
CDKToolkit CloudFormation stack and the cdk-* S3 bucket manually, then
re-run cdk bootstrap.
Agent times out or produces no file:
Check CloudWatch logs at /aws/bedrock-agentcore/runtimes/<runtime-id>-DEFAULT.
The MaxToolCallsHook may have hit the limit — the agent logs show the call count.
Step Function execution stuck in polling loop: The Step Function polls every 30s for up to 45 minutes. If the agent crashed without updating DynamoDB, the job will stay in PROCESSING until the Step Function times out. Check the agent's CloudWatch logs for errors.