ADR-014: Workflow-driven tasks with an agent-side step runner

Status: accepted
Date: 2026-06-04
Implementation: Shipped (#248). The task_type enum is removed; workflow_ref resolves to a pinned resolved_workflow ({id, version}) at the create-task boundary, and the agent runs first-party workflows (coding/new-task-v1, coding/pr-iteration-v1, coding/pr-review-v1, plus repo-less default/agent-v1 and knowledge/web-research-v1) via the agent-side step runner. The Cedar context.read_only migration (Phase 2a) and the repo-optional schema freeze are in place (see the 2026-06-08 addenda).

Context

The platform supports exactly three task types — new_task, pr_iteration, pr_review — fixed as a TypeScript union and enforced by a compile-time exhaustiveness assert. Behavior for each type is not centralized: it is spread across eight files in the Python agent runtime, each branching on the literal task_type string (pipeline.py, post_hooks.py, repo.py, config.py, server.py, models.py, prompts/__init__.py, runner.py) plus a Cedar policy that hard-codes Agent::TaskAgent::"pr_review" for read-only enforcement.

This has two costs:

Every new task type is a core-code change touching all of those files — high-friction, regression-prone, and gated on a CDK deploy.
Non-coding tasks are impossible. Every task unconditionally clones a repo (setup_repo runs for all types) and the create-task API hard-requires an onboarded repo (422 REPO_NOT_ONBOARDED). “Requires a repo” is an implicit universal assumption, never a declared, switchable property.

Issue #248 asks for capability-driven tasks: a declarative file that describes the ordered steps to run, so coding and non-coding workflows share the same admission, memory, policy, cost, and observability machinery — only the declarative file differs. It is the focused, current-architecture slice of the broader AKW vision (#99), and pairs with the agent asset registry (#246).

Three forces constrain the design:

Naming collision. “Blueprint” already denotes the per-repo CDK construct that writes RepoConfig (compute, model, credentials, networking). The new artifact is per-task-type. Overloading “Blueprint” would conflate two very different scopes.
Existing prior art. An unmerged branch (origin/merge/akw-integration, commit 9d066a8) already ports AKW’s YAML registry, Blueprint/ToolEntry Python models, a resolve_task() contract, and nine example blueprints. It is rich (LTM capability negotiation, quality_checkpoints, meta-agents, Mem0) — most of which is explicitly out of scope for #248.
Where steps run. The orchestrator already documents (but has not shipped) a step_sequence/StepRef model for durable, orchestrator-side custom steps (REPO_ONBOARDING.md Layers 2–3). The issue, however, asks for steps that execute “inside the container.”

Decision

Introduce workflows: versioned, declarative YAML files describing how the agent executes one kind of task. A workflow declares its ordered steps, system prompt, agent_config (tools, MCP servers, skills, plugins, rules/prompt-fragments, and Cedar policy, mirroring the #246 registry asset kinds), how repo-discovered config is layered and gated, hydration sources, terminal outcomes, domain, requires_repo, and read_only. The three shipped task types become the first three first-party workflows. The full schema and worked examples live in docs/design/WORKFLOWS.md.

Four sub-decisions:

Name the artifact “Workflow,” not “Capability” or “Blueprint.” Per scope: Blueprint = per-repo (unchanged); Workflow = per-task-type (new). A Blueprint pins which Workflows a repo may run; a Workflow is repo-agnostic. (“Capability” tested poorly for clarity; the registry in #246 still classifies these under its capability asset kind, with “workflow” as the ABCA-facing name.) The request field is workflow_ref (the #248 capability_ref is just this field’s issue name); recorded metadata is resolved_workflow.
Execute steps agent-side, via a new step runner (agent/src/workflow/runner.py) that interprets workflow.steps inside the container. The orchestrator’s durable lifecycle (admission-control → pre-flight → hydrate-context → start-session → await-agent-completion → finalize) is unchanged; the workflow drives what happens within RUNNING, not the platform lifecycle. Existing helpers (setup_repo, run_agent, verify_build, ensure_pr, …) become step handlers rather than inline branches.
Reconcile-and-adopt the prior art, scoped down. Take its proven data shapes — per-blueprint YAML structure, task_mode→(domain+requires_repo), read_only, promotion status (draft→validated→production→deprecated), the resolve pre-flight contract — but drop everything that overshoots #248: LTM capability negotiation / CapabilityIndex, Mem0, quality_checkpoints, meta-agents, and multi-agent loops. The resolver interface is designed so #246’s registry backend is a drop-in replacement for the filesystem one.
Remove task_type; resolve workflow_ref at the create-task boundary. This work replaces the task_type enum — it is deleted, not preserved as an alias (carrying both would defeat centralizing per-task-type behavior). workflow_ref becomes the sole task-selection field, resolved (ref + constraint → pinned {id, version}) where task_type is validated today (create-task-core.ts). When workflow_ref is absent, resolution falls through a short ladder (Blueprint default → platform default/agent-v1), so there is always a workflow. This is an intentional breaking API change, acceptable pre-1.0. Existing task_type callers migrate via a published one-to-one map (new_task → coding/new-task-v1, etc.). Functional fidelity of the migrated coding paths is a goal verified by the promotion gate (tests/eval), not a hard byte-for-byte constraint; intentional behavior changes are recorded in the migration PR.
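
The resolution ladder and migration map in the fourth sub-decision can be sketched as follows. This is a minimal illustration, not the create-task-core.ts implementation: the function name resolve_workflow, the catalog shape (workflow id mapped to a pinned version string), and the version values are hypothetical, and version constraints are left out for brevity. Only the new_task pairing in the map is stated in this ADR; the other two are assumed from the workflow names.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# One-to-one migration map. Only the new_task entry is spelled out in the ADR;
# the other two pairings are assumed from the first-party workflow names.
TASK_TYPE_TO_WORKFLOW: Dict[str, str] = {
    "new_task": "coding/new-task-v1",
    "pr_iteration": "coding/pr-iteration-v1",  # assumed pairing
    "pr_review": "coding/pr-review-v1",        # assumed pairing
}

PLATFORM_DEFAULT = "default/agent-v1"


@dataclass(frozen=True)
class ResolvedWorkflow:
    """The pinned {id, version} pair recorded on the task."""
    id: str
    version: str


def resolve_workflow(
    workflow_ref: Optional[str],
    blueprint_default: Optional[str],
    catalog: Dict[str, str],  # hypothetical: workflow id -> pinned version
) -> ResolvedWorkflow:
    """Walk the ladder: explicit ref -> Blueprint default -> platform default."""
    ref = workflow_ref or blueprint_default or PLATFORM_DEFAULT
    if ref not in catalog:
        raise LookupError(f"unknown workflow: {ref}")
    return ResolvedWorkflow(id=ref, version=catalog[ref])
```

Because the ladder always ends at the platform default, a request with no workflow_ref and no Blueprint default still resolves, provided the default workflow is present in the catalog.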

Why agent-side over orchestrator-side steps

The orchestrator-side StepRef/step_sequence model is real but unshipped, and pulling per-step workflow logic into durable Lambda steps would (a) be a far larger change to the durable execution path, (b) overlap the separate “Blueprint custom steps” planned work, and (c) contradict the issue’s “inside the container” framing. Agent-side keeps the blast radius off durable orchestration: the platform’s invariants (concurrency, audit, cancellation, timeouts) are untouched, and the workflow only reshapes the agent’s own execution — which is exactly the unpredictable part the architecture already isolates in the compute session.
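
To make the agent-side shape concrete, here is a rough sketch of a step runner that dispatches declared steps to registered handlers and enforces exactly one agentic step. It is not the contents of agent/src/workflow/runner.py: the step field name "kind", the step_handler decorator, and the stub handler bodies are assumptions made for illustration. The handler names (setup_repo, run_agent, ensure_pr) are the existing helpers this ADR mentions.

```python
from typing import Callable, Dict, List

StepHandler = Callable[[dict], None]

# Registry of step handlers, keyed by the step kind a workflow declares.
STEP_HANDLERS: Dict[str, StepHandler] = {}


def step_handler(name: str) -> Callable[[StepHandler], StepHandler]:
    """Register a function as the handler for one step kind (hypothetical helper)."""
    def register(fn: StepHandler) -> StepHandler:
        STEP_HANDLERS[name] = fn
        return fn
    return register


# Stub handlers: the real helpers do actual work; these only record the call.
@step_handler("setup_repo")
def setup_repo(ctx: dict) -> None:
    ctx["log"].append("setup_repo")


@step_handler("run_agent")
def run_agent(ctx: dict) -> None:
    ctx["log"].append("run_agent")


@step_handler("ensure_pr")
def ensure_pr(ctx: dict) -> None:
    ctx["log"].append("ensure_pr")


def run_workflow(steps: List[dict], ctx: dict) -> None:
    """Run steps in declared order; reject workflows without exactly one run_agent."""
    agentic = [s for s in steps if s["kind"] == "run_agent"]
    if len(agentic) != 1:
        raise ValueError(f"expected exactly one run_agent step, found {len(agentic)}")
    for step in steps:
        handler = STEP_HANDLERS.get(step["kind"])
        if handler is None:
            raise KeyError(f"no handler registered for step kind: {step['kind']}")
        handler(ctx)
```

The orchestrator never sees this loop; from its side the whole run is still a single await-agent-completion step.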

Consequences

(+) New task types are authored, not coded. A new coding or knowledge workflow is a YAML file + registered step handlers, not edits across eight runtime files (plus a Cedar policy) and a CDK deploy.
(+) Repo-optional tasks unlocked. requires_repo:false cleanly skips repo onboarding enforcement, GitHub pre-flight, clone, and PR finalization — enabling knowledge work from task_description + attachments alone.
(+) Always a workflow. With task_type gone, resolution is a short ladder (workflow_ref → Blueprint default → platform default/agent-v1), so a submission with no workflow_ref lands on a minimal “run the request through the agent” default instead of being coerced into the heavyweight new_task path.
(+) Per-task-type model preference, safely bounded. agent_config.model lets a workflow prefer a Bedrock model, validated against the platform/Blueprint allow-list (unpermitted ⇒ admission failure) and still capped by the per-task budget — model choice without privilege/cost escalation.
(+) Provider-neutral repo vocabulary. repo_config.provider and a VcsProvider seam name the GitHub-specific operations (PR/review/permission) as instances of generic “change proposal”/“review” concepts, so adding GitLab/Bitbucket/etc. later is not a schema/contract break. github is the only implemented backend in #248.
(+) Single source of truth per task type. The scattered task_type branches (PR_TASK_TYPES, _PROMPTS, is_read_only, the Cedar pr_review string-match) collapse into declared workflow fields. Read-only is enforced by both allowed_tools and Cedar, closing today’s gap where it was Cedar-only.
(+) Audit & eval. resolved_workflow pinned on every task; per-step milestones via the existing progress writer; cost segmentation by domain.
(+) Promotion is earned, not asserted. status: production is gated by a declared promotion_gate (a behavioral eval per workflow) layered onto the ADR-013 validation pyramid — so quality is machine-checked at the transition, not claimed in a PR description. The gate verifies a workflow does the right thing, which permits intentional behavior changes (recorded alongside the eval update) rather than enforcing byte-for-byte parity with today.
(+) Discovery separated from execution. Optional description/guidance fields give workflows a human- and agent-readable selection surface (for registry search / bgagent workflow list) distinct from the machine-facing prompt.
(+) Clean path to the registry (#246). Filesystem-backed first, registry-resolved later, with a stable resolve contract. agent_config’s asset kinds (tools, mcp_servers, skills, plugins, subagents, prompt_fragments, cedar_policy_modules) mirror the #246 vocabulary 1:1, so a workflow is the registry’s first concrete consumer.
(+) Full agent-config surface, three planes. A workflow declares the whole SDK session shape (not just tools), layered as Blueprint (per-repo) < agent_config (per-task-type) < repo-discovered .claude/ and .mcp.json, and can gate the repo plane via repo_config. Skills/plugins/subagents/prompt_fragments are forward-declared but registry-resolved (Phase 4), so the schema is complete without implying Phase-1 behavior.
(−) Two related-but-distinct concepts (Blueprint vs Workflow) increase the platform’s conceptual surface; mitigated by the strict per-repo-vs-per-task-type framing and the naming note in WORKFLOWS.md.
(−) A new schema and validator to maintain (JSON Schema + synth-time/CI lint over agent/workflows/**), plus the workflow_ref/resolved_workflow fields now span the CDK↔CLI type-sync contract. To avoid re-creating the cedar-parity drift hazard, the JSON Schema is the single canonical shape contract (consumed by both the TS synth validator and the Python loader via standard libraries, not re-implemented), the non-schema cross-field rules have exactly one implementation in Phases 1–3 (CI-time; the runtime loader does shape-only validation and trusts the CI verdict), and a contracts/workflow-validation/ golden corpus (annotated expected verdicts, run against every validator implementation) locks any Phase-4 second implementation to parity — the same mechanism as contracts/cedar-parity/. See WORKFLOWS.md §“Single source of truth and validator parity”.
(−) Breaking API change. Removing task_type breaks existing callers; they must move to workflow_ref via the published map. Acceptable pre-1.0, but it is a hard cutover (no dual-field grace period), so the migration map and CLI rework must ship together and be documented in API_CONTRACT.md.
(!) Migration risk. Moving the three shipped paths onto the runner can change behavior. This is mitigated mainly by substituting the runner for the branches without rewriting helper logic, and gated by existing handler/agent tests plus #236 E2E. Where behavior changes intentionally, the change is called out in the migration PR rather than slipping through silently.
(!) Memory keying. Long-term memory is keyed on repo; repo-less workflows need a fallback actorId (open question in WORKFLOWS.md, to coordinate with MEMORY.md).
(!) Single run_agent invariant. The runner enforces exactly one agentic step for now; lifting that (multi-agent workflows) is deliberately deferred to #99.
(!) Cedar principal migration is security-load-bearing. Read-only enforcement moves from a literal "pr_review" match to a context.read_only-driven rule so it applies to all read-only workflows; this must be done precisely and is gated by the existing Cedar parity fixtures. Because an error here silently weakens enforcement (the rule stops matching) rather than failing loudly, the policy rewrite + regenerated parity fixtures must be reviewed with that property front-of-mind. The schema enforces a policy floor (soft-deny mandatory for writeable workflows) so a workflow file cannot weaken the HITL posture by config. (Superseded — see Addendum 2026-06-08 on Phase 2a sequencing.)
(!) Resume-aware steps, not orchestrator-durable steps. The runner checkpoints step completion to persistent session storage (/mnt/workspace, survives stop/resume) and resumes the agent loop via the persisted SDK session UUID — so a stop/resume skips completed steps rather than replaying from turn 0. This is agent-side recovery; the orchestrator still treats the session as one await-agent-completion step (invariants stay agent-external). Worker-portable resume depends on the planned S3-backed SDK session store (tracked as a GitHub issue); until then a total worker loss falls back to a from-step-0 re-run, mitigated by mandatory step idempotency. on_failure: continue after side effects is forbidden; per-step compensation/rollback is a non-goal.
(!) Promotion gate bootstrapping. A behavioral eval per workflow depends on #236; until it lands, the gate is a concrete test target or, when that is omitted, a reviewed manual step. “Earned, not asserted” arrives in stages, weakest for the phases that ship first.
(!) Repo-optionality is a wider refactor than Phase 3’s one-liner implies — repo is required across ~6 TS interfaces + the agent config validator, and memory keying, SessionRole tenant tags, and artifact delivery all assume a repo. The requires_repo:false promise in the Phase-0 schema is a forward-declaration, not a runnable path: the web_research example is a schema-expressiveness fixture, not yet an executable acceptance test. The two blocking open questions (memory actorId, artifact delivery contract) must be resolved as recorded decisions — a short ADR addendum — before the Phase-0 schema is frozen, since either may add or reshape a schema field (e.g. actor_namespace, the deliver_artifact target contract) and deferring them past freeze risks a breaking schema revision.
(!) Governance. Publishing/promoting a production workflow is a trust decision and follows ADR-003; registry-era publish ACLs are Cedar-governed per #246 Phase 3.
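
The resume-aware behavior listed among the consequences above (skip completed steps after a stop/resume) can be illustrated with a small checkpointing loop. This is a sketch under stated assumptions: the function name run_with_checkpoints, the JSON checkpoint file, and the zero-argument handlers are all hypothetical, a temporary directory stands in for persistent session storage, and the SDK session UUID resume is not modeled.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def run_with_checkpoints(
    steps: List[str],
    handlers: Dict[str, Callable[[], None]],
    checkpoint: Path,
) -> List[str]:
    """Run steps in order, skipping any already recorded in the checkpoint file."""
    done: List[str] = json.loads(checkpoint.read_text()) if checkpoint.exists() else []
    executed: List[str] = []
    for name in steps:
        if name in done:
            continue  # completed before the stop; do not replay
        handlers[name]()
        done.append(name)
        checkpoint.write_text(json.dumps(done))  # persist after every step
        executed.append(name)
    return executed


# Demonstration: the first run stops inside run_agent, the second run resumes.
calls: List[str] = []
state = {"fail_agent": True}


def run_agent() -> None:
    if state["fail_agent"]:
        raise RuntimeError("simulated stop")
    calls.append("run_agent")


handlers = {
    "setup_repo": lambda: calls.append("setup_repo"),
    "run_agent": run_agent,
    "ensure_pr": lambda: calls.append("ensure_pr"),
}
steps = ["setup_repo", "run_agent", "ensure_pr"]

with tempfile.TemporaryDirectory() as workdir:
    ckpt = Path(workdir) / "steps.json"
    try:
        run_with_checkpoints(steps, handlers, ckpt)
    except RuntimeError:
        pass  # session stopped mid-workflow
    state["fail_agent"] = False
    resumed = run_with_checkpoints(steps, handlers, ckpt)
```

In the demonstration, setup_repo runs once in total and the resumed run executes only run_agent and ensure_pr. If the checkpoint file itself is lost (the total-worker-loss case), the loop starts again from the first step, which is why step idempotency is mandatory.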

Addendum (2026-06-08): Phase 2a sequencing — isolated PR requirement dropped

The original decision required the Cedar principal migration (Phase 2a) to ship as its own isolated PR, reviewed alone and landing ahead of the pr_iteration/pr_review workflow migrations (Phase 2b). That ordering is no longer achievable on the as-built branch: the task_type→workflow cutover (Phase 2b) already shipped without the Cedar change, made safe by a deliberate bridge — policy_principal_for() maps read_only ⇒ "pr_review", so the existing literal-principal forbid rules in hard_deny.cedar keep firing for every read-only workflow. Read-only enforcement is therefore already correct under the legacy principal; Phase 2a is a refactor of how that enforcement is keyed (principal == Agent::TaskAgent::"pr_review" → context.read_only == true), not a gap being closed.

Given the bridge, the “isolated PR ahead of 2b” rule has lost its purpose (there is no window where 2b runs without protection), so we drop the isolation+ordering requirement and implement Phase 2a as commits on the existing feat/248-workflow-driven-tasks branch. The security property the rule was protecting is preserved by other means, which remain mandatory:

The migration is gated by the contracts/cedar-parity/ golden fixtures, run against both the Python cedarpy and TypeScript cedar-wasm bindings (decision #23). The fixtures are updated in the same commit as the policy rewrite.
A dedicated parity fixture for the context.read_only deny path is added, so the new rule’s matching behavior is locked by a golden vector (a silent stop-matching regression fails CI rather than shipping).
The bridge in policy_principal_for() is removed only after the context.read_only rule is in place and fixture-verified, and in the same change, so there is never a window in which neither mechanism keys the deny.

What does not change: read-only is still enforced by both allowed_tools (no Write/Edit) and Cedar; the schema’s soft-deny policy floor for writeable workflows stands. Reviewers should still treat the hard_deny.cedar + policy.py context diff as the security-critical core of this PR and review it with the “silently weakens” failure mode in mind.
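
As an illustration of why the bridge keeps enforcement correct during the migration, the two keyings of the deny can be modeled as plain Python predicates. This is a toy model, not Cedar and not the real policy.py: the signature of policy_principal_for, the default_principal parameter, and the example principal strings are assumptions, and the predicates only mimic the conditions principal == Agent::TaskAgent::"pr_review" and context.read_only == true.

```python
from typing import Mapping

LEGACY_READ_ONLY_PRINCIPAL = 'Agent::TaskAgent::"pr_review"'


def policy_principal_for(read_only: bool, default_principal: str) -> str:
    """Bridge: every read-only workflow presents the legacy pr_review principal."""
    return LEGACY_READ_ONLY_PRINCIPAL if read_only else default_principal


def legacy_rule_denies_write(principal: str, context: Mapping[str, object]) -> bool:
    """Models the literal-principal forbid rule."""
    return principal == LEGACY_READ_ONLY_PRINCIPAL


def migrated_rule_denies_write(principal: str, context: Mapping[str, object]) -> bool:
    """Models the context.read_only-driven forbid rule."""
    return context.get("read_only") is True


def rules_agree(read_only: bool, default_principal: str) -> bool:
    """Parity check: both keyings reach the same verdict for one workflow."""
    principal = policy_principal_for(read_only, default_principal)
    context = {"read_only": read_only}
    return (
        legacy_rule_denies_write(principal, context)
        == migrated_rule_denies_write(principal, context)
    )
```

A parity fixture plays the role of rules_agree in CI: if the migrated rule stops matching, the verdicts diverge and the build fails instead of shipping weakened enforcement.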

Addendum (2026-06-08): repo-optional open questions resolved — schema freeze

WORKFLOWS.md open questions #1 (memory actorId for repo-less tasks) and #2 (artifact-delivery contract) were flagged as blocking Phase 3 and requiring resolution before the Phase-0 schema is frozen, because either might add or reshape a schema field. Both are now decided. The schema field reshape implied by #2 is applied in the same change as this addendum, so the schema can be treated as frozen.

Decision 1 — Memory actorId for repo-less tasks: per-user (user:{id}). A repo-less task uses actorId = user:{cognito_sub} (the platform user id already threaded as TaskConfig.user_id), not the repo used by coding tasks (memory.py). Rationale: it is caller-scoped (no cross-tenant knowledge bleed — the same isolation property the per-user trace prefix already relies on), and it matches the platform’s existing user-scoping pattern. Cross-workflow knowledge pooling (e.g. “all web_research tasks share learnings”) is explicitly not adopted now — it mixes tenants in one namespace and is a larger privacy decision deferrable to the registry phase.

Schema impact: none. This is a fixed platform fallback, not author-configurable, so it adds no actor_namespace selector to the workflow schema. (The earlier note that it “may introduce an actor_namespace selector” is resolved in the negative — keeping the schema smaller.) It is a Phase-3 memory.py change: when repo is absent, key on user:{user_id}; coding tasks are unchanged.
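
The fallback in Decision 1 amounts to a single branch. A minimal sketch, assuming a hypothetical helper name memory_actor_id and that coding tasks key on the repo identifier unchanged:

```python
from typing import Optional


def memory_actor_id(repo: Optional[str], user_id: str) -> str:
    """Key long-term memory on the repo when present, else on the calling user."""
    if repo:
        return repo  # coding tasks: unchanged
    return f"user:{user_id}"  # repo-less tasks: caller-scoped fallback
```

For example, memory_actor_id(None, "abc123") yields "user:abc123", while a coding task on a repo identifier such as "org/service" keeps that identifier as its key. Both values here are made up for illustration.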

Decision 2 — Artifact delivery: named Python deliverers (open target), shared S3 plumbing pinned now. deliver_artifact.target becomes an open string that resolves to a registered Python deliverer — the same “author, don’t code the core” pattern as the step-handler registry (STEP_HANDLERS in runner.py). A new delivery method is a new registered deliverer function, not a schema change. This keeps the field forward-compatible with #246 (a registry-resolved deliverer is a drop-in for the filesystem one) instead of pinning a closed s3 | comment | s3_and_comment enum that every new method would have to widen.

Plumbing pinned now (shared by all deliverers, so it can freeze): artifacts upload to a task-scoped S3 key artifacts/{task_id}/<name> in a platform artifacts bucket; the agent SessionRole gets an IAM grant scoped to that prefix (mirroring the traces/{user_id}/ per-prefix guard); a size limit applies per artifact; the delivered artifact’s URL surfaces on TaskDetail. The SessionRole repo tenant tag gains a repo-less form (workflow:{id} tag) so the role is still attributable without a repo.
Deliverer contract: each registered deliverer declares the set of terminal outcomes it produces (e.g. an S3 deliverer produces artifact; a comment deliverer produces comment). Validator rule 11 (today a hardcoded target → outcomes map, _DELIVER_TARGET_OUTCOMES) is reframed to consult the deliverer registry rather than the enum, so the “declared primary outcome is actually produced by some step” check survives the move to open targets. Until Phase 3 ships real deliverers, the registry carries the three first-party names (s3, comment, s3_and_comment) with their existing produced-outcome sets, so no current workflow, fixture, or contracts/workflow-validation/ golden vector changes behavior — the enum is widened to an open string + registry, not redefined.
Schema impact (applied with this addendum): steps[].target drops its enum constraint (stays type: string); the closed set moves into the deliverer registry. deliver_artifact itself remains a NotImplementedError stub until Phase 3 — only the contract is frozen here, not the implementation.
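
The deliverer contract and the reframed validator rule 11 can be sketched as a registry lookup. This is an illustration only: the Deliverer dataclass, register_deliverer, check_primary_outcome, and the step field "kind" are hypothetical names, the produced-outcome set for s3_and_comment is assumed to be the union of the other two, and the check considers only deliver_artifact steps rather than every step that can produce an outcome.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class Deliverer:
    name: str
    produces: FrozenSet[str]  # terminal outcomes this deliverer can produce


DELIVERERS: Dict[str, Deliverer] = {}


def register_deliverer(name: str, produces: Iterable[str]) -> None:
    """Adding a delivery method is a registry entry, not a schema change."""
    DELIVERERS[name] = Deliverer(name, frozenset(produces))


# The three first-party names carried until Phase 3 ships real deliverers.
register_deliverer("s3", {"artifact"})
register_deliverer("comment", {"comment"})
register_deliverer("s3_and_comment", {"artifact", "comment"})  # assumed union


def check_primary_outcome(steps: List[dict], primary_outcome: str) -> List[str]:
    """Rule 11 sketch: the declared primary outcome must be produced by some step."""
    errors: List[str] = []
    produced = set()
    for step in steps:
        if step.get("kind") != "deliver_artifact":
            continue
        target = step.get("target")
        deliverer = DELIVERERS.get(target)
        if deliverer is None:
            errors.append(f"unknown deliverer target: {target}")
            continue
        produced |= deliverer.produces
    if primary_outcome not in produced:
        errors.append(f"primary outcome '{primary_outcome}' is not produced by any step")
    return errors
```

Since the schema now accepts any string for target, the registry lookup is the only place an unknown target is caught, so the validator has to report it explicitly.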

With both resolved and the one schema reshape applied, the Phase-0 schema is frozen: subsequent phases add registered handlers/deliverers and platform plumbing, not schema fields.

References

Issue #248 — capability-driven tasks (this ADR’s tracking issue)
Issue #246 — agent asset registry (workflows are its first capability-kind consumer)
Issue #245 — attribution on resolved capability
Issue #236 — E2E verification (parity coverage)
Issue #99 — AKW integration (broader vision; out-of-scope items defer here)
docs/design/WORKFLOWS.md — the workflow schema, step catalog, and step-runner design
docs/design/ORCHESTRATOR.md — durable lifecycle and extension points
docs/design/REPO_ONBOARDING.md — the Blueprint construct and step_sequence model
docs/design/CEDAR_HITL_GATES.md — policy engine the agent_config feeds
Prior art: origin/merge/akw-integration (commit 9d066a8) — AKW YAML registry and models (reconciled, scoped down)
ADR-013 — the validation pyramid the promotion_gate layers onto
ADR-005 — the feedback loop that workflow trajectory-evolution would extend (future, out of scope)