
Compute

Every task runs in an isolated cloud compute environment. Nothing runs on the user’s machine. The agent clones the repo, writes code, runs tests, and opens a PR inside a MicroVM that is created for the task and destroyed when it ends.

  • Use this doc for: understanding the compute environment, agent harness, network architecture, and the constraints that shape the platform’s design.
  • Related docs: ORCHESTRATOR.md for session management and liveness monitoring, SECURITY.md for isolation and egress controls, REPO_ONBOARDING.md for per-repo compute configuration.

The default runtime is Amazon Bedrock AgentCore Runtime, which runs each session in a Firecracker MicroVM with per-session isolation, managed lifecycle, and built-in health monitoring. For repos that exceed AgentCore’s constraints (2 GB image limit, no GPU), the ComputeStrategy interface allows switching to alternative backends per repo.

|  | AgentCore Runtime | ECS on Fargate | ECS on EC2 | EKS | AWS Batch | Lambda | Custom EC2 + Firecracker |
|---|---|---|---|---|---|---|---|
| Isolation | MicroVM (Firecracker) | Task-level (Firecracker) | Container on shared nodes | Pod on shared nodes | Backend-dependent | Function env (Firecracker) | MicroVM (you own it) |
| Image limit | 2 GB (non-adjustable) | No hard cap | No hard cap | No hard cap | Backend-dependent | 10 GB | N/A (you define) |
| Filesystem | Ephemeral + persistent mount (preview) | 20-200 GB ephemeral | Node disk + EBS/EFS | Node disk + PVs | Backend-dependent | 512 MB-10 GB /tmp | You choose (EBS/NVMe) |
| Max duration | 8 hours | No hard cap | No hard cap | No hard cap | Configurable | 15 minutes | Unlimited |
| Startup | Service-managed | Slim images help | Warm ASGs + pre-pull | Karpenter + pre-pull | Backend-dependent | Provisioned concurrency | Snapshot pools (DIY) |
| GPU | No | No | Yes | Yes | Yes (EC2/EKS backend) | No | Yes (with passthrough) |
| Ops burden | Low (managed) | Low | Medium | High | Low-Medium | Low | Very high |
| Cost model | vCPU-hrs + GB-hrs | vCPU + mem/sec | EC2 + EBS | EKS control + EC2 | Underlying compute | Request + duration | EC2 metal + your ops |
| Fit | Default choice | Repos > 2 GB image | GPU, heavy toolchains | Max flexibility | Queued batch jobs | Poor (15 min cap) | Best potential, highest cost |

The backend is selected per repo via compute_type in the Blueprint config. The orchestrator resolves the strategy and delegates session start, polling, and termination to the strategy implementation. See REPO_ONBOARDING.md for the ComputeStrategy interface.

Each session:

  • Runs the agent harness (Claude Agent SDK) with the foundation model inference loop
  • Clones the repo, creates or checks out a branch, edits files, runs shell commands (build, test, lint)
  • Makes outbound API calls to GitHub (clone, push, PR), Bedrock (model invocation), and tool services (AgentCore Gateway, Memory)
  • Reads/writes memory via AgentCore Memory for cross-session learning

Code durability comes from the agent committing and pushing to the remote branch. Cross-session state uses external storage (Memory, DynamoDB).

The 2 GB image limit is the most significant constraint: the container image must fit the agent code, runtimes, and tools in 2 GB.

| Layer | Estimated size |
|---|---|
| Base OS (slim Linux) | ~50-100 MB |
| Python 3.x + pip | ~100-150 MB |
| Node.js 20.x + npm | ~100-150 MB |
| Git + CLI tools | ~50-80 MB |
| Agent code + SDK | ~100-200 MB |
| Available for repo deps | ~1.3-1.6 GB |
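
The "available for repo deps" range follows directly from the layer estimates; a quick arithmetic check (the figures are the table's own, the computation is only a sanity check):

```python
# Rough image-budget check against AgentCore's 2 GB (2048 MB) limit,
# using the low and high layer estimates from the table above.
LIMIT_MB = 2048

layers_low = {"base_os": 50, "python": 100, "node": 100, "git_tools": 50, "agent_sdk": 100}
layers_high = {"base_os": 100, "python": 150, "node": 150, "git_tools": 80, "agent_sdk": 200}

best_case = LIMIT_MB - sum(layers_low.values())    # 1648 MB, ~1.6 GB
worst_case = LIMIT_MB - sum(layers_high.values())  # 1368 MB, ~1.3 GB

print(f"Available for repo deps: {worst_case}-{best_case} MB")
```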

When repos exceed 2 GB: the onboarding pipeline warns the operator, attempts optimization (multi-stage builds, slim bases), falls back to runtime install (slower cold start), or flags the repo for an alternate compute backend.
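
The escalation path above can be sketched as a decision function. The function name, return values, and inputs are illustrative, not the actual pipeline code:

```python
LIMIT_MB = 2048  # AgentCore's non-adjustable image cap


def plan_for_image(size_mb: int, optimized_mb: int) -> str:
    """Hypothetical sketch of the onboarding pipeline's escalation path
    for oversized images. `optimized_mb` is the estimated size after
    multi-stage builds / slim base images."""
    if size_mb <= LIMIT_MB:
        return "agentcore"                 # fits as-is
    if optimized_mb <= LIMIT_MB:
        return "agentcore-optimized"       # optimization brought it under the cap
    # Last resorts: install heavy deps at runtime (slower cold start),
    # or flag the repo for an alternate backend such as ECS on Fargate.
    return "runtime-install-or-alternate-backend"
```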

AgentCore supports persistent session storage (preview): a per-session filesystem mounted at /mnt/workspace that survives stop/resume cycles (14-day TTL). However, the S3-backed FUSE mount does not support flock(), which breaks build tools like uv.
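
The incompatibility can be detected with a small probe; this is a sketch using POSIX `flock()`, not platform code:

```python
import fcntl
import os
import tempfile


def supports_flock(directory: str) -> bool:
    """Return True if files in `directory` can be flock()ed. S3-backed
    FUSE mounts (like the /mnt/workspace preview mount) typically fail
    this probe, which is why flock-dependent tools are kept on /tmp."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
        os.unlink(path)
```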

The platform works around this by splitting storage:

| What | Location | Why |
|---|---|---|
| Repo clone | /workspace (ephemeral) | Build tools need flock() |
| npm cache | /mnt/workspace (persistent) | npm uses lockless atomic ops |
| Claude Code config | /mnt/workspace (persistent) | No flock() needed |
| mise data, uv cache | /tmp/ (ephemeral) | Both use flock() internally |

| Limit | Value | Notes |
|---|---|---|
| Max session duration | 8 hours | Hard limit enforced by AgentCore |
| Idle timeout | 15 minutes | Agent must report HealthyBusy via /ping to stay alive |

See ORCHESTRATOR.md for how the orchestrator handles these timeouts.

The agent harness is the layer around the LLM that manages the execution loop: context, tools, guardrails, and lifecycle. It is not the agent itself but the infrastructure that makes long-running autonomous agents reliable.

The platform uses the Claude Agent SDK as the harness. It provides the agent loop, built-in tools (filesystem, shell), and streaming message reception for per-turn trajectory capture (token usage, cost, tool calls).

Execution model: Tasks are fully unattended and one-shot. The agent loop runs in a background thread so the FastAPI /ping endpoint stays responsive on the main thread. The agent thread uses asyncio.run() with the stdlib event loop (uvicorn is configured with --loop asyncio to avoid uvloop conflicts with subprocess SIGCHLD handling).
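
A minimal stdlib-only sketch of this threading model (names like `agent_loop` are illustrative, and the real harness serves /ping via FastAPI rather than a bare dict):

```python
import asyncio
import threading

# What a /ping handler would report while the agent works.
status = {"state": "HealthyBusy"}


async def agent_loop() -> None:
    # Placeholder for the Claude Agent SDK loop (clone, edit, test, PR).
    await asyncio.sleep(0.01)
    status["state"] = "Done"


def run_agent_in_background() -> threading.Thread:
    # asyncio.run() creates a fresh stdlib event loop on this thread,
    # mirroring the --loop asyncio choice for uvicorn (uvloop conflicts
    # with subprocess SIGCHLD handling). The main thread stays free to
    # answer /ping health checks.
    t = threading.Thread(target=lambda: asyncio.run(agent_loop()), daemon=True)
    t.start()
    return t
```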

System prompt: Selected by task type from a shared base template (agent/prompts/base.py) with per-task-type workflow sections (new_task, pr_iteration, pr_review). The platform defines what the agent should do; the harness executes it.

Result contract: The agent does not call back to the platform. It follows the contract (push work, create PR) and exits. The orchestrator infers the outcome from GitHub state and the agent’s poll response.
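
The inference step can be sketched as a pure function over the observable signals; the state names here are assumptions, not the orchestrator's actual enum:

```python
def infer_outcome(pr_exists: bool, branch_pushed: bool, agent_status: str) -> str:
    """Hypothetical sketch: derive a task outcome from GitHub state plus
    the session poll response, since the agent never calls back."""
    if agent_status == "running":
        return "in_progress"
    if pr_exists:
        return "completed"   # contract fulfilled: work pushed, PR opened
    if branch_pushed:
        return "partial"     # work pushed but no PR opened
    return "failed"          # session ended with nothing on the remote
```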

| Tool | Source | Description |
|---|---|---|
| Shell execution | Native (MicroVM) | Build, test, lint via bash |
| File system | Native (MicroVM) | Read/write code |
| GitHub | AgentCore Gateway + Identity | Clone, push, PR, issues |
| Web search | AgentCore Gateway | Documentation lookups |

Plugins, skills, and MCP servers are out of scope for MVP. Additional tools can be added via Gateway integration.

The harness enforces tool-call policy via Cedar-based hooks:

  • PreToolUse (agent/src/hooks.py + agent/src/policy.py) - Evaluates tool calls before execution. pr_review agents cannot use Write/Edit. Writes to .git/* are blocked. Destructive bash commands are denied. Fail-closed: if Cedar is unavailable, all calls are denied.
  • PostToolUse (agent/src/hooks.py + agent/src/output_scanner.py) - Screens tool outputs for secrets and redacts before re-entering agent context.

Per-repo custom Cedar policies are supported via Blueprint security.cedarPolicies. See SECURITY.md for the full policy enforcement model.

The agent runtime runs inside a VPC with private subnets. AWS service traffic stays on the private network via VPC endpoints. External traffic (GitHub, package registries) goes through a NAT Gateway.

```mermaid
flowchart TB
    subgraph VPC["VPC (10.0.0.0/16)"]
        subgraph Private["Private Subnets"]
            RT[AgentCore Runtime]
        end
        subgraph Public["Public Subnets"]
            NAT[NAT Gateway]
        end
        VPE[VPC Endpoints]
    end
    IGW[Internet Gateway]
    GH[GitHub / npm / PyPI]
    AWS[AWS Services]

    RT -->|AWS API calls| VPE
    VPE --> AWS
    RT -->|External HTTPS| NAT
    NAT --> IGW --> GH
```


| Destination | Path | Examples |
|---|---|---|
| AWS services | VPC endpoints (private network) | Bedrock, DynamoDB, S3, Secrets Manager, ECR, CloudWatch, STS, X-Ray |
| GitHub | NAT Gateway -> internet | github.com, api.github.com, *.githubusercontent.com |
| Package registries | NAT Gateway -> internet | registry.npmjs.org, pypi.org, files.pythonhosted.org |
| Everything else | Blocked by security group (TCP 443 only) + DNS Firewall (domain allowlist) | - |

| Endpoint | Type | Purpose |
|---|---|---|
| S3, DynamoDB | Gateway (free) | Image layers, task state |
| ECR API + Docker | Interface | Container image pull |
| CloudWatch Logs | Interface | Runtime logs |
| Secrets Manager | Interface | GitHub token |
| Bedrock Runtime | Interface | Model invocation |
| STS | Interface | Temporary credentials |
| X-Ray | Interface | Distributed tracing |

Route 53 Resolver DNS Firewall provides domain-level egress filtering. Three rules evaluate in priority order:

  1. Priority 100 - ALLOW platform baseline (GitHub, npm, PyPI, *.amazonaws.com)
  2. Priority 200 - ALLOW additional domains from Blueprint networking.egressAllowlist
  3. Priority 300 - ALERT or BLOCK everything else
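
A toy evaluator for this priority scheme (the domain lists and function are illustrative; the real rules live in the Route 53 Resolver DNS Firewall rule group):

```python
import fnmatch

# Illustrative platform baseline (priority 100) from the rules above.
BASELINE = ["github.com", "api.github.com", "*.githubusercontent.com",
            "registry.npmjs.org", "pypi.org", "files.pythonhosted.org",
            "*.amazonaws.com"]


def evaluate(domain: str, blueprint_allowlist: list[str],
             observation_mode: bool = True) -> str:
    """Evaluate rules in priority order; first match wins."""
    rules = [
        (100, BASELINE, "ALLOW"),              # platform baseline
        (200, blueprint_allowlist, "ALLOW"),   # Blueprint networking.egressAllowlist
    ]
    for _priority, patterns, action in rules:
        if any(fnmatch.fnmatch(domain, p) for p in patterns):
            return action
    # Priority 300 catch-all: log in observation mode, block when enforcing.
    return "ALERT" if observation_mode else "BLOCK"
```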

Current state: observation mode. Non-allowlisted domains are logged but not blocked. The rollout process:

  1. Deploy with observationMode: true (default)
  2. Analyze DNS query logs over 1-2 weeks
  3. Add missing domains to baseline or Blueprint egressAllowlist
  4. Switch to observationMode: false to enforce blocking

Configured with FirewallFailOpen: ENABLED so a DNS Firewall outage does not kill running sessions.

Limitations:

  • VPC-wide, not per-session - All sessions share one DNS Firewall rule group. Per-repo egressAllowlist values are aggregated (union).
  • DNS-only - Direct IP connections bypass DNS filtering. Acceptable for confused-agent threats, not for sophisticated adversaries.
  • Broad wildcards - *.amazonaws.com and *.githubusercontent.com are necessary but broad.

Multiple layers restrict egress, each catching what the others miss:

  1. Security group - TCP 443 only (always enforced)
  2. DNS Firewall - Domain allowlist (observation or enforcement mode)
  3. VPC endpoints - AWS traffic stays on private network
  4. VPC flow logs - All traffic (ACCEPT + REJECT) logged to CloudWatch (30-day retention)

Remaining gap: DNS Firewall does not block direct IP connections. AWS Network Firewall (SNI filtering) would close this at ~$274/month/endpoint.

Single NAT Gateway (~$32/month) provides internet egress for GitHub and package registries. Single-AZ deployment minimizes cost but creates an availability risk: if that AZ fails, running sessions lose egress. Configurable via natGateways prop for production deployments that need multi-AZ.

| Resource | Monthly cost |
|---|---|
| NAT Gateway (1x, fixed + data) | ~$32 |
| Interface endpoints (7x, 2 AZs) | ~$102 |
| Flow logs (CloudWatch) | ~$3 |
| DNS Firewall + query logs | ~$2-4 |
| WAFv2 (3 rules) | ~$6 |
| Total | ~$145-150 |
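
A quick cross-check of the total, taking the midpoint of the $2-4 DNS Firewall range:

```python
# Line items from the cost table above, in dollars per month.
costs = {
    "nat_gateway": 32,
    "interface_endpoints": 102,  # 7 endpoints x 2 AZs
    "flow_logs": 3,
    "dns_firewall": 3,           # midpoint of the $2-4 range
    "wafv2": 6,
}
total = sum(costs.values())      # lands inside the ~$145-150 band
print(total)
```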