Compute

The tasks requested by the user are offloaded to a cloud compute environment. Nothing runs on the user’s computer — all agent work happens in the cloud.

This compute environment is where the agent actually runs. In each session it:

  • Runs the agent — the agent harness (e.g. Claude Code SDK) and the foundation model inference loop execute here. The agent reasons, plans, and decides what to do next.
  • Clones and works on the repo — it clones the target repository (e.g. from GitHub), checks out or creates a branch, performs file edits, runs shell commands (build, test, lint), and reads and writes code on the filesystem.
  • Makes API calls — outbound calls to the GitHub API (clone, push, create PR, read issues), to the FM inference endpoint (e.g. Amazon Bedrock), and to any tool or identity services (e.g. AgentCore Gateway for tools, AgentCore Identity for OAuth tokens). The compute environment must allow this outbound HTTP/HTTPS traffic.
  • Uses tools and memory — the agent may call MCP or gateway-backed tools (e.g. web search) and read/write short- or long-term memory (e.g. AgentCore Memory) via those services.
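The per-session flow above can be sketched as follows. This is a minimal illustration, not the actual agent harness: `agent_step` stands in for the harness + inference loop, and the injectable `run` executor is an assumption made so the orchestrator (or a test) can supply its own command runner.

```python
import subprocess

def run_session(repo_url, branch, agent_step, run=None):
    """One ephemeral session: clone, create a task branch, iterate, push.

    `agent_step` stands in for the harness + inference loop (hypothetical);
    it edits files in the clone and returns False once the task is done.
    `run` executes shell commands; it defaults to subprocess so the sketch
    is runnable, but the platform would supply its own executor.
    """
    if run is None:
        run = lambda args: subprocess.run(args, check=True)

    run(["git", "clone", repo_url, "repo"])               # clone the target repo
    run(["git", "-C", "repo", "checkout", "-b", branch])  # work on a task branch
    while agent_step():                                   # reason / edit / build / test
        run(["git", "-C", "repo", "add", "-A"])
        run(["git", "-C", "repo", "commit", "-m", "agent checkpoint"])
    run(["git", "-C", "repo", "push", "origin", branch])  # durability: push to remote
```

Committing inside the loop (rather than once at the end) is what makes the ephemeral-session model safe: any work already pushed survives a timeout or crash.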

Each user task gets its own isolated session (its own compute unit — e.g. a MicroVM or container). Sessions are ephemeral: when the task ends (success, failure, cancel, or timeout), the session is torn down and any local state is discarded. Code durability comes from the agent committing and pushing to the remote branch; any cross-session state uses external storage (e.g. memory service, DynamoDB), not the compute node’s disk.

This project has the following requirements for the cloud compute environment:

  • Session isolation (isolated compute, memory, and filesystem resources): isolation prevents data leakage and cross-session contamination, ensuring that sensitive information or temporary data from one session is securely wiped when the session terminates. No shared mutable state between sessions.
  • Filesystem: access to a writable filesystem with enough capacity for cloning a repo, installing dependencies, and build artifacts (order of magnitude: multi-GB). Ephemeral per session is acceptable if the agent can commit work regularly and/or access external storage (e.g. EFS).
  • Persistent storage (beyond the session): the user and the agent need to persist some information across sessions (e.g. memory, task state). This may be provided by the compute layer or by separate services (e.g. AgentCore Memory, DynamoDB, EFS).
  • Long execution (hours): the selected service must allow runs for long periods so the agent can complete coding tasks without being killed by short time limits.
  • Startup time: we want to minimize cold start (e.g. provisioned concurrency, snapshot-based starts, or pre-warmed environments) so users are not blocked by long clone-and-install phases. Snapshot-on-schedule pattern: rebuild filesystem snapshots on a periodic schedule (e.g. every 30 minutes or on push to default branch) with pre-installed dependencies. The onboarding pipeline (Iteration 3a) triggers the initial snapshot when a repo is onboarded; subsequent rebuilds are triggered by webhooks (push to main) or scheduled EventBridge rules. Optionally begin sandbox warming proactively when a user starts composing a task, reducing perceived latency. The snapshot is stored as a container image in ECR and used as the base for new sessions targeting that repo.
  • Outbound network access: the agent must reach external services over HTTP/HTTPS — at minimum GitHub API (clone, push, PR, issues), the FM inference endpoint (e.g. Bedrock), and any tool or identity services (e.g. AgentCore Gateway, Identity). The compute environment must allow this outbound traffic; network policy may restrict it to allowlisted endpoints.
  • External termination: the platform must be able to stop a running session on demand (e.g. when the user cancels a task). The compute/runtime must expose a way to terminate a session (e.g. StopRuntimeSession or equivalent) so the orchestration layer can enforce cancellation.
  • Session liveness / health: the platform needs a way to know whether a session is still running or has ended (finished, failed, timed out, or terminated). This may be a status API, a health/ping contract, or polling; it is required for orchestration (e.g. when to finalize a task) and for observability.
  • Predictable timeouts: documented idle timeout (e.g. session killed after N minutes of no activity) and maximum session duration (e.g. hard cap in hours). These drive the durability design (e.g. commit regularly) and orchestration timeouts.
  • Concurrent sessions: the system runs multiple tasks in parallel; each task uses its own session. The compute option must support many concurrent sessions (subject to quotas) so that admission control and scaling are feasible.
  • Observability: the compute/runtime should support, or at least not block, visibility into what is happening: logs (e.g. to CloudWatch), optional metrics and traces, and optionally streaming agent output (reasoning, tool calls) for debugging and evaluation. This aligns with the design principle that it should be easy to see everything that is going on.
  • Resource profile for coding workloads: sufficient CPU, memory, and disk for typical coding tasks (clone repo, install deps, run builds/tests/linters). The exact numbers depend on the runtime (e.g. 2 vCPU, 8 GB RAM, 10 GB writable disk are cited in current options); the requirement is that the profile is viable for this workload.
  • Visual proof: support for running the application and capturing screenshots or videos as evidence that changes work. This requires a virtual display (e.g. Xvfb) for GUI/desktop apps or a headless browser (Playwright/Puppeteer) for web apps; a capture stack (browser + Playwright/Puppeteer for web, Xvfb + FFmpeg for desktop) that fits within image and disk limits; optionally higher CPU/RAM/disk for capture workloads, or strict duration/resolution limits; outbound upload (to S3 or a platform API) for the captured screenshots/videos; and scripts or tools to start the app, capture, and upload, with defined limits and a place to link the proof (task/PR).
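Several of these requirements — liveness, external termination, predictable timeouts — meet in the orchestration layer. A minimal supervision sketch follows, assuming a generic session handle with `status()`/`stop()` methods; these names are illustrative placeholders, not a specific AWS API (the runtime's actual terminate call, e.g. StopRuntimeSession, would sit behind `stop()`).

```python
import time
from dataclasses import dataclass

@dataclass
class SessionLimits:
    """Timeout configuration; values mirror the AgentCore figures cited in this doc."""
    max_duration_s: int = 8 * 3600  # hard cap enforced by the orchestrator here
    idle_timeout_s: int = 15 * 60   # enforced runtime-side; listed for completeness

def supervise(session, limits, cancelled, poll_interval_s=30):
    """Poll liveness, enforce the hard duration cap, and honor user cancellation.

    `session.status()` returns 'RUNNING', 'DONE', or 'FAILED';
    `session.stop()` wraps the runtime's terminate call (assumed interface).
    """
    started = time.monotonic()
    while True:
        status = session.status()
        if status != "RUNNING":
            return status                           # finished or failed on its own
        if cancelled():                             # user cancelled the task
            session.stop()
            return "CANCELLED"
        if time.monotonic() - started > limits.max_duration_s:
            session.stop()                          # predictable-timeout enforcement
            return "TIMED_OUT"
        time.sleep(poll_interval_s)
```

The loop is deliberately polling-based so it works whether the runtime exposes a status API, a health/ping contract, or only log inspection.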

The AgentCore Runtime imposes a non-adjustable 2GB maximum on container images. This is the most significant constraint for a coding agent platform.

| Layer | Estimated size | Notes |
| --- | --- | --- |
| Base OS (slim Linux) | ~50–100 MB | Alpine or distroless base |
| Python 3.x runtime + pip | ~100–150 MB | Agent code and dependencies |
| Node.js 20.x + npm | ~100–150 MB | For JS/TS repos |
| Git + common CLI tools | ~50–80 MB | git, curl, jq, etc. |
| Agent code + SDK dependencies | ~100–200 MB | Claude Code SDK, requirements |
| Available for repo-specific deps | ~1.3–1.6 GB | Language SDKs, compilers, package caches |
  • At onboarding time: The onboarding pipeline should estimate the image size by analyzing the repo’s dependencies (e.g. package-lock.json, requirements.txt, Cargo.toml). If the estimated image exceeds 2GB, the onboarding pipeline should:

    1. Warn the operator that the repo may exceed the image limit.
    2. Attempt optimization: multi-stage builds, strip debug symbols, use slim base images, exclude dev-only dependencies not needed at agent runtime.
    3. Fall back to runtime install: Ship a lean base image and install repo-specific dependencies at session start (slower cold start, but no image limit). The setup script (from .backgroundagent/setup.sh or onboarding config) runs npm install / pip install etc. during the HYDRATING phase.
    4. Flag for alternate runtime: Mark the repo as requiring a larger compute environment (ECS/Fargate, EKS) when the ComputeStrategy interface is available (see REPO_ONBOARDING.md).
  • At task time: If the image was built within 2GB but runtime install pushes the writable filesystem beyond its available capacity, the session fails. The orchestrator should detect this failure pattern (e.g. “no space left on device” in agent logs) and surface it as a specific error (IMAGE_SIZE_EXCEEDED).
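The onboarding-time decision and the task-time failure mapping can be sketched together; the byte figures are assumptions drawn from the layer budget above, and the strategy names and helper functions are illustrative, not a real API.

```python
LIMIT_BYTES = 2 * 1024**3          # AgentCore Runtime image cap (non-adjustable)
BASE_LAYERS_BYTES = 600 * 1024**2  # rough OS + runtimes + agent code budget (see layer table)

def onboarding_strategy(estimated_deps_bytes, can_optimize=True):
    """Pick an image strategy at onboarding time (sketch; thresholds are assumptions)."""
    total = BASE_LAYERS_BYTES + estimated_deps_bytes
    if total <= LIMIT_BYTES:
        return "BAKE_INTO_IMAGE"      # snapshot image fits under the cap
    if can_optimize:
        return "OPTIMIZE_THEN_RETRY"  # multi-stage build, slim base, strip dev-only deps
    return "RUNTIME_INSTALL"          # lean base image + setup.sh during HYDRATING

def classify_session_failure(log_text):
    """Map the disk-exhaustion pattern in agent logs to a specific orchestrator error."""
    if "no space left on device" in log_text.lower():
        return "IMAGE_SIZE_EXCEEDED"
    return "UNKNOWN_FAILURE"
```

In practice the estimate would come from parsing lockfiles (package-lock.json, requirements.txt, Cargo.toml) rather than a single number, but the decision structure is the same.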

Design implication: ComputeStrategy interface should be planned earlier


The 2GB limit is a known blocker for repos with heavy toolchains (Rust, Java/JDK, .NET SDK, monorepos with large dependency trees). The ComputeStrategy interface (see REPO_ONBOARDING.md) should be designed in Iteration 3a (as an interface contract) even if only the AgentCore implementation exists initially. This ensures the orchestrator is not tightly coupled to AgentCore-specific assumptions and that switching to an alternate runtime (ECS/Fargate) is a configuration change (compute_type: 'ecs'), not a re-architecture. Additional compute options will be explored to fill the gaps in the current runtime selection.
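A sketch of what that interface contract could look like, assuming Python; the method names are illustrative assumptions, and the authoritative definition lives in REPO_ONBOARDING.md.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class ComputeStrategy(Protocol):
    """Runtime-agnostic contract so the orchestrator never depends on AgentCore specifics."""

    def start_session(self, repo: str, image_uri: str) -> str:
        """Launch an isolated session from a snapshot image; return a session id."""
        ...

    def stop_session(self, session_id: str) -> None:
        """Terminate on demand (external-termination requirement)."""
        ...

    def session_status(self, session_id: str) -> str:
        """Liveness: 'RUNNING', 'DONE', 'FAILED', or 'TIMED_OUT'."""
        ...

class EcsStrategy:
    """Stub showing that switching runtimes is a configuration change
    (compute_type: 'ecs'), not a re-architecture. Real AWS calls omitted."""
    compute_type = "ecs"

    def start_session(self, repo: str, image_uri: str) -> str:
        return f"ecs-task-for-{repo}"  # would call ECS RunTask here

    def stop_session(self, session_id: str) -> None:
        pass                           # would call ECS StopTask here

    def session_status(self, session_id: str) -> str:
        return "RUNNING"               # would call ECS DescribeTasks here
```

With this shape, the orchestrator holds a `ComputeStrategy` and never imports a runtime-specific client directly; adding an AgentCore or EKS implementation later touches only the strategy module.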

Journey of virtualization: VM (whole machine), container (single process), MicroVM (secure sandbox)

Firecracker follows a minimalist philosophy: removing unnecessary hardware emulation (graphics, USB, BIOS, etc.) to achieve maximum efficiency. Each MicroVM boots in under 125 ms, with binaries around 3 MB and minimal memory use.

Multiple options are available for compute:

  • Fargate (uses Firecracker)
  • EKS
  • AgentCore runtime (uses Firecracker)
  • Lambda (uses Firecracker)
  • Custom: Bare metal EC2 + Firecracker
  • Custom: Bare metal EC2 + Other hypervisor

The following table provides an overview:

| Option | Max Docker image size | Filesystem size (session-local) | Cost / billing model | State management (cross-session) | Isolation mechanism | Execution duration | Guest OS | GPU support | Environment pre-warming |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ECS on Fargate (Firecracker) | No published Fargate-specific hard cap (practically bounded by image pull time + task ephemeral storage; image layers consume task storage) | 20 GiB default, configurable up to 200 GiB ephemeral per task | Pay for requested vCPU + memory (and ephemeral storage beyond included amount), billed from image pull until task stops, per-second with 1-min minimum | Externalize to DynamoDB/S3/RDS/Agent memory; local disk is ephemeral. EFS/EBS patterns possible depending on ECS design | Managed task isolation (backed by Firecracker on AWS side) | No documented ECS task hard max (you enforce timeout/cancel in orchestration) | Linux + Windows container families supported on Fargate task defs | No (gpu task-def param invalid for Fargate) | Partial: no direct prewarm knob; keep warm tasks/services, slim images |
| EKS (on EC2 nodes) | No EKS service-specific cap (depends on registry/runtime/node disk) | Node root volume / instance store + Kubernetes volumes/PVs (EBS/EFS/FSx) | EKS control plane hourly + worker compute/storage/network | Strong PV/PVC model + external stores; ephemeral pod volumes destroyed with pod unless persistent volume used | Pod/container isolation on shared nodes (can be strengthened with sandboxing choices) | No EKS-imposed pod/job hard max by default; use K8s controllers + timeouts (activeDeadlineSeconds) | Linux (AL2023/Bottlerocket) and Windows nodes supported | Yes (GPU/accelerator node AMIs supported) | Strong: warm nodes, overprovisioning, image pre-pull, Karpenter/managed node groups |
| Bedrock AgentCore Runtime (Firecracker) | 2 GB container image max (runtime quota) | Ephemeral writable filesystem is available for session-local operations | Runtime billed by vCPU-hours + GB-hours (check region/pricing page) | Designed for external state (AgentCore Memory / DynamoDB / S3 / DBs); session-local state is ephemeral | Per-session isolated runtime (Firecracker-backed service) | Up to 8 hours per session, 15-min idle timeout (keepalive /ping available) | Runtime expects Linux container images (see current runtime quotas documentation) | No (runtime quotas show max GPU allocation = 0) | No user-facing prewarm control documented (service-managed startup) |
| AWS Lambda (Firecracker) | 10 GB (container image code package, uncompressed incl. layers) | /tmp configurable 512 MB to 10,240 MB | Request + duration billing (plus optional provisioned concurrency) | External-only (S3/DynamoDB/etc.); /tmp is ephemeral | Function execution environment isolation (Firecracker-backed) | 15 min max (900 s) | Linux only (Lambda runtime/container model) | No | Yes: Provisioned Concurrency (best native prewarm option) |
| Custom: Bare metal EC2 + Firecracker | N/A (VM-first; if you run containers inside host/guest, you set the limits) | You choose (EBS / NVMe / instance store), from GBs to TBs | EC2 (metal) + EBS + your ops/control-plane costs (EC2 billed per-second, 60s min) | Anything you build (DynamoDB/S3/EBS/EFS/DB) | Firecracker microVM per session (you own the implementation) | You define it (effectively unlimited) | Firecracker supports Linux host/guest (and OSv) | Generally no native GPU device model/passthrough in stock Firecracker | Excellent but DIY: snapshot pools, pre-created microVMs |
| Custom: Bare metal EC2 + other hypervisor (KVM/QEMU, etc.) | N/A (VM-first; container support optional) | You choose (EBS / NVMe / instance store), GBs–TBs | EC2 (metal) + EBS + hypervisor/orchestration ops | Anything you build | Full VM isolation (depends on hypervisor config) | You define it (effectively unlimited) | Linux / Windows guests possible (depends on hypervisor) | Yes (with supported instance/hypervisor + passthrough strategy) | Excellent but DIY: warm VM pools, snapshots, templates |
| ECS on EC2 (relevant addition) | No ECS service-specific cap (depends on registry/runtime/node disk) | Node disk + attached EBS/EFS (you size it) | ECS control plane has no extra “cluster fee”; you pay EC2/EBS/network | External stores + optional EBS/EFS per task/workload | Container isolation on shared EC2 nodes | No documented ECS task hard max | Depends on your EC2 AMI/OS (Linux/Windows possible) | Yes (ECS supports GPU tasks on GPU EC2 container instances) | Strong: warm ASGs/capacity providers + pre-pulled images |
| AWS Batch (relevant addition; runs on ECS/EKS/Fargate/EC2) | Backend-dependent (ECS/EKS/Fargate/EC2) | Backend-dependent (e.g., Fargate 20–200 GiB; EC2/EKS node/PV sizing) | No additional AWS Batch charge; pay underlying EC2/Fargate/etc. | External stores; Batch is scheduler/orchestrator | Backend-dependent | Timeout configurable; Batch can terminate jobs when timeout exceeded (min 60s) | Backend-dependent | Backend-dependent (yes on EC2/EKS GPU backends; no on Fargate) | Good via compute environment sizing/min capacity (backend-dependent) |

This second table maps each compute option to the requirement checklist using 🟢 / 🟡 / 🔴.

Legend: 🟢 strong fit / native, 🟡 workable with extra engineering or constraints, 🔴 weak fit / notable mismatch

| Compute option | Isolation | Writable FS (multi-GB) | Cross-session state | Long-run (hours) | Startup / prewarm | Outbound egress | External termination | Liveness / health | Predictable timeouts | Concurrency / scaling | Observability | Visual proof (screenshots/video) | GPU / devices | Overall fit for autonomous coding agent |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AgentCore Runtime | 🟢 | 🟡 | 🟢 | 🟢 | 🟡 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 | 🔴 | Strong managed fit (best isolation/session lifecycle; resource/image limits matter) |
| ECS on Fargate | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 | 🟢 | 🟢 | 🟢 | 🟡 | 🟢 | 🟢 | 🟡 | 🔴 | Strong fit for most CPU-bound coding agents |
| ECS on EC2 (relevant add) | 🟡 | 🟢 | 🟢 | 🟢 | 🟡 | 🟢 | 🟢 | 🟢 | 🟡 | 🟢 | 🟢 | 🟢 | 🟢 | Very strong fit if you can operate the fleet |
| EKS (Kubernetes on EC2) | 🟡 | 🟢 | 🟢 | 🟢 | 🟡 | 🟢 | 🟢 | 🟢 | 🟡 | 🟢 | 🟢 | 🟢 | 🟢 | Very strong fit (max flexibility, max ops burden) |
| AWS Batch (EC2/EKS backend) (relevant add) | 🟡 | 🟢 | 🟢 | 🟢 | 🟡 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | Excellent fit for queued/async background coding tasks |
| AWS Batch (Fargate backend) (relevant add) | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 | 🔴 | Great fit for async jobs without GPU |
| Lambda | 🟢 | 🔴 | 🟡 | 🔴 | 🟢 | 🟢 | 🔴 | 🟡 | 🟢 | 🟢 | 🟢 | 🔴 | 🔴 | Poor fit for long-running coding sessions (good only for short helpers) |
| Custom EC2 + Firecracker | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 | 🟡 | 🟢 | 🟢 | Best potential fit, but very high platform engineering cost |
| Custom EC2 + other hypervisor | 🟡 | 🟢 | 🟢 | 🟢 | 🟡 | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 | 🟡 | 🟢 | 🟢 | Strong but heavyweight; less efficient than Firecracker-based designs |