Skip to content

Network architecture

This document describes the network isolation layer for the AgentCore Runtime.

The Runtime runs inside a VPC with 2 Availability Zones:

┌─────────────────── VPC (10.0.0.0/16) ───────────────────┐
│ │
│ ┌─ Public Subnets ──┐ ┌─ Private Subnets ─────────┐ │
│ │ NAT Gateway ──────┼─→ │ AgentCore Runtime (ENIs) │ │
│ │ (→ IGW → GitHub) │ │ SG: egress 443 only │ │
│ └────────────────────┘ └───────────────────────────┘ │
│ │
│ VPC Endpoints: S3, DynamoDB (gw), ECR API, ECR Docker, │
│ CloudWatch Logs, Secrets Manager, │
│ Bedrock Runtime, STS, X-Ray (interface) │
└───────────────────────────────────────────────────────────┘
Outside VPC: Orchestrator Lambda, API Lambdas, API Gateway
  • Public subnets — Host the NAT Gateway and Internet Gateway. No compute resources.
  • Private subnets (with egress) — Host the AgentCore Runtime ENIs. All outbound traffic goes through VPC endpoints or the NAT Gateway.
  • Single NAT Gateway — Provides internet egress (HTTPS only) for GitHub API access. Deployed in one AZ to minimize cost.

The Runtime security group enforces HTTPS-only egress (TCP 443 to 0.0.0.0/0):

  • AWS services — Reached via VPC endpoints (no NAT traffic, no internet traversal).
  • GitHub API — Reached via NAT Gateway → Internet Gateway → github.com:443.
  • All other traffic — Blocked by default (no other egress rules).

Domain-level filtering is enforced by the DNS Firewall (see below).

EndpointTypePurpose
S3GatewayECR image layers, artifact storage
DynamoDBGatewayTask state tables
ECR APIInterfaceContainer image metadata
ECR DockerInterfaceContainer image pull
CloudWatch LogsInterfaceRuntime application and flow logs
Secrets ManagerInterfaceGitHub token retrieval
Bedrock RuntimeInterfaceModel invocation
STSInterfaceTemporary credential retrieval for AWS SDK calls
X-RayInterfaceDistributed tracing via OpenTelemetry/ADOT

Gateway endpoints are free. Interface endpoints have per-hour and per-GB costs.

VPC flow logs are enabled for all traffic (ACCEPT + REJECT) and sent to CloudWatch Logs with 30-day retention. This satisfies the AwsSolutions-VPC7 cdk-nag rule and provides audit visibility into network activity.

The following resources remain outside the VPC (public Lambda execution):

  • Orchestrator Lambda — Invokes the AgentCore Runtime API (not the Runtime itself).
  • API handler Lambdas — Serve the REST API behind API Gateway.
  • API Gateway — Public-facing REST API with Cognito auth.

These do not need VPC access and would incur unnecessary cold-start latency and ENI costs if placed in a VPC.

Route 53 Resolver DNS Firewall provides domain-level egress filtering for the agent VPC. Only domains on the allowlist can be resolved; all other DNS queries are logged (observation mode) or blocked (enforcement mode).

The DNS Firewall evaluates DNS queries at the VPC Resolver level using a rule group with three rules, evaluated in priority order:

  1. Priority 100 — ALLOW platform baseline domains. Always-allowed domains required for core agent operations: GitHub (github.com, api.github.com, *.githubusercontent.com), npm (registry.npmjs.org, *.npmjs.org), PyPI (pypi.org, *.pypi.org, files.pythonhosted.org), and AWS services (*.amazonaws.com).
  2. Priority 200 — ALLOW additional domains. Aggregated from Blueprint networking.egressAllowlist values. Empty by default.
  3. Priority 300 — ALERT or BLOCK all other domains. In observation mode (default), non-allowlisted queries are logged with an ALERT action. In enforcement mode, they are blocked with a NODATA response.

The construct deploys in observation mode by default (observationMode: true). In this mode, DNS Firewall logs all queries but does not block anything, allowing safe analysis of real traffic before switching to enforcement.

Rollout process:

  1. Deploy with observationMode: true — DNS queries are logged (ALERT) but not blocked.
  2. Analyze CloudWatch DNS query logs over 1-2 weeks of real usage.
  3. Add any missing domains to the platform baseline or Blueprint egressAllowlist.
  4. Switch to observationMode: false — non-allowlisted domains are blocked (NODATA).

DNS query logs are sent to a dedicated CloudWatch Logs log group with 30-day retention. Logs capture every DNS query from the VPC, including the queried domain, source IP, and the firewall action taken (ALLOW, ALERT, or BLOCK).

The DNS Firewall is configured with FirewallFailOpen: ENABLED. If the DNS Firewall service experiences a transient issue, DNS queries are allowed through rather than blocked. This prevents a DNS Firewall outage from killing running agent sessions (which can last up to 8 hours).

The Blueprint construct supports a networking.egressAllowlist prop:

new Blueprint(this, 'MyRepoBlueprint', {
repo: 'org/my-repo',
repoTable: repoTable.table,
networking: {
egressAllowlist: ['npm.internal.example.com', '*.private-registry.io'],
},
});

Important: Per-repo egressAllowlist values are aggregated into the platform-wide DNS Firewall policy. They document intent and feed the allowlist, but they do not provide per-session isolation. All agent sessions in the VPC share the same DNS Firewall rules.

  • VPC-wide policy, not per-session — All agent sessions share one VPC and DNS Firewall rule group. AgentCore Runtime has no per-session network configuration. Per-repo egressAllowlist entries are union-ed into the platform allowlist.
  • DNS-only — DNS Firewall intercepts DNS queries. A direct connection to an IP address (e.g. curl https://1.2.3.4/) bypasses DNS and is not blocked. This is acceptable for the “confused agent” threat model (the agent uses domain names) but not for a sophisticated adversary.
  • Wildcard scope*.amazonaws.com is broad but necessary for VPC endpoint private DNS. GitHub wildcards (*.githubusercontent.com) include GitHub Pages, which is a potential exfiltration vector. Narrowing may be considered after analyzing query logs.
  • Missing ecosystems — The platform baseline covers npm and PyPI. Go (proxy.golang.org), Rust (crates.io, static.crates.io), and OS packages (dl-cdn.alpinelinux.org) may need to be added based on observation mode logs.

Estimated monthly cost of the network and edge security layer (~$145-150/month):

ResourceEstimated Cost
NAT Gateway (1× fixed + data)~$32/month
Interface endpoints (7× $0.01/hr/AZ × 2 AZs)~$102/month
Flow logs (CloudWatch ingestion)~$3/month
DNS Firewall (queries)<$1/month
DNS query log group (CloudWatch ingestion)~$1-3/month
WAFv2 Web ACL (3 rules + requests)~$6/month
  • Defense in depth — Multiple layers restrict egress: security group (HTTPS-only), DNS Firewall (domain allowlist with observation or enforcement mode), and VPC endpoints (AWS service traffic stays on-network). See the DNS Firewall section for details and limitations.
  • AWS service isolation — VPC endpoints keep AWS API traffic on the AWS network, reducing exposure.
  • Audit trail — Flow logs record IP-level network activity; DNS query logs record domain-level resolution activity. Together they provide comprehensive egress audit visibility.
  • Remaining gap — DNS Firewall does not prevent direct IP-based connections. A connection to https://1.2.3.4/ bypasses DNS resolution entirely. The security group still allows TCP 443 to 0.0.0.0/0. This gap is acceptable for the “confused agent” threat model but not for a “sophisticated adversary” threat model. AWS Network Firewall (SNI-based filtering) would close this gap at significantly higher cost (~$274/month/endpoint).
  • Single NAT Gateway availability risk — The NAT Gateway is deployed in a single AZ to minimize cost (~$32/month vs ~$64/month for two). If that AZ experiences an outage, all agent sessions lose internet egress (GitHub API access). For a platform where sessions run up to 8 hours, losing egress mid-session means the agent cannot push code or create PRs. Mitigation options: (a) Accept the risk for cost-sensitive deployments (single-developer or small-team usage). (b) Add a second NAT Gateway in the other AZ for production deployments — the additional ~$32/month is justified by the availability improvement. (c) Use a NAT instance (cheaper, but operational overhead). The Blueprint construct or stack props should allow configuring single vs. multi-AZ NAT (default: single for cost; opt-in to multi-AZ for production).