aws-samples · AIDLC × AgenticOps
Tech Preview · v0.3.0-preview.1. API may change before GA. See the support policy.

The AIDLC operations gap is still human glue.

AIDLC automates design and construction. Operations — deploys, incidents, cost drift, regressions — still fall on the team. OMA is the plugin marketplace that closes the loop with AgenticOps: humans approve, agents execute everything between the checkpoints.

Plugins: 4
Tier-0 workflows: 9
AWS MCP servers: 11 pinned
Ontology entities: 8 schemas

$ claude

> /plugin marketplace add aws-samples/sample-oh-my-aidlcops

> /plugin install ai-infra agenticops aidlc modernization

✔ 4 plugins enabled · 11 AWS MCP servers pinned

> /oma:autopilot "ship the anomaly detector end to end"

OMA · Inception → Construction → Operations. Approval gates: 4. Agent steps in between: ~40.

The gap

Most AIDLC implementations stop at merge time.

Without OMA

  • Traces pile up in Langfuse but never become PRs.
  • Incident playbooks live in a wiki no on-call reads at 2am.
  • Cost anomalies surface on the next month's invoice.
  • Every operations decision is a human judgement call.

With OMA

  • Trace patterns open draft PRs against the skills that caused them.
  • SEV1 alarms get diagnosed and mitigated behind a human approval gate.
  • Budget breaches throttle or downgrade before the ceiling hits.
  • Humans approve at checkpoints. Agents do the rest.

What changes

Three mechanisms that make AIDLC close itself.

  1. One command, entire lifecycle.

    Spec → design → code → canary deploy → self-healing → cost attribution. /oma:autopilot drives the whole loop and pauses only at explicit approval checkpoints.

  2. Self-improving from production traces.

    Langfuse traces feed /oma:self-improving. Failure patterns become draft PRs against the skills and prompts that produced them — regression tests run before the PR is opened.

  3. Humans approve. Agents execute.

    Every Tier-0 workflow sandwiches agent-driven diagnosis, proposal, and execution between explicit human gates. The agent never silently mutates production.
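The gate pattern described above can be sketched in a few lines of Python. This is an illustrative model only, not OMA's implementation: the `Step` and `GatedWorkflow` names, the `needs_approval` flag, and the `approve` callback are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    name: str                     # hypothetical step label
    run: Callable[[], str]        # agent-driven work; returns a short result
    needs_approval: bool = False  # True marks a human checkpoint


@dataclass
class GatedWorkflow:
    steps: List[Step]
    approve: Callable[[str], bool]  # stands in for the human decision
    log: List[str] = field(default_factory=list)

    def execute(self) -> bool:
        """Run steps in order; stop before any gated step that is not approved."""
        for step in self.steps:
            if step.needs_approval and not self.approve(step.name):
                self.log.append(f"halted before {step.name}")
                return False
            self.log.append(f"{step.name}: {step.run()}")
        return True


# Diagnosis and proposal run unattended; execution waits for a human.
workflow = GatedWorkflow(
    steps=[
        Step("diagnose", lambda: "root cause found"),
        Step("propose", lambda: "patch drafted"),
        Step("execute", lambda: "patch applied", needs_approval=True),
    ],
    approve=lambda step_name: False,  # simulate a rejected approval
)
finished = workflow.execute()
```

With the approval rejected, the log records the diagnosis and the proposal but never the execution, which is the property the text claims: nothing reaches production without passing a gate.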

Drop-in

Ships as a plugin inside the tools you already run.

Claude Code plugin

Ships as a native Claude Code marketplace entry. Slash commands, keyword triggers, and the AWS hosted MCP layer work out of the box.

Kiro skills

install/kiro.sh symlinks every skill into ~/.kiro/skills/ and wires kiro-agents profiles with pinned MCP server versions.

Shared .omao state

Tier-0 mode, project memory, and audit logs live in .omao/. Both harnesses read and write the same directory — switch without losing context.

The AIDLC loop, closed.

Inception and Construction describe what will ship. Operations keeps it alive after it ships — and feeds learnings back to Construction without a human in the loop for routine corrections.

  1. Inception · aidlc

    Workspace detection, adaptive requirements, user stories, workflow plan. Output artifacts become the contract Construction must honor.

  2. Construction · aidlc

    Component design, code generation with human-approved gates, 12-category risk discovery, TDD for agentic systems, phase quality gates.

  3. Operations · agenticops

    Autopilot deploys, continuous eval, incident response, cost governance, and the self-improving loop that feeds learnings back into Construction.

Runtime (ai-infra) and brownfield entry (modernization) sit alongside the loop, not inside it.

AgenticOps capabilities

Purpose-built for the autonomous era.

Autopilot deploys

autopilot-deploy runs canary 1% → 10% → 50% → 100% with SLO-gated circuit breakers. Each stage waits for continuous-eval before promotion; regression trips auto-rollback.

  • Argo Rollouts / Flagger
  • Prometheus SLO gates
  • Human approval at 100%
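The staged promotion can be modeled roughly as follows. This is a sketch under assumptions, not the autopilot-deploy skill itself: `run_canary`, `eval_ok`, and `approve_full` are hypothetical names, and it assumes the human gate sits immediately before the shift to 100%.

```python
from typing import Callable, List, Sequence

STAGES = (1, 10, 50, 100)  # traffic percentages named in the text


def run_canary(
    eval_ok: Callable[[int], bool],     # stands in for the continuous-eval / SLO check
    approve_full: Callable[[], bool],   # stands in for the human approval at 100%
    stages: Sequence[int] = STAGES,
) -> List[str]:
    """Walk the canary stages and return the sequence of events."""
    events: List[str] = []
    for pct in stages:
        # Assumed placement of the gate: before full traffic is shifted.
        if pct == 100 and not approve_full():
            events.append("hold before 100%: approval denied")
            return events
        events.append(f"shift to {pct}%")
        # A failed evaluation at any stage trips the rollback.
        if not eval_ok(pct):
            events.append("rollback")
            return events
    events.append("rollout complete")
    return events


# A regression that only shows up at 50% traffic stops the rollout there.
regressed = run_canary(eval_ok=lambda pct: pct < 50, approve_full=lambda: True)
```

In a real deployment the traffic shifting would be delegated to Argo Rollouts or Flagger and the check to Prometheus SLO queries; the sketch only shows the control flow between them.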

Self-healing

incident-response classifies SEV1–4, pulls the matching runbook, issues diagnostic MCP queries, and drafts a remediation script for approval. On SEV1 it pages on-call; it never applies a remediation on its own.

Cost governance

cost-governance attributes spend per agent, vetoes deploys that would breach the monthly ceiling, and drafts Opus → Sonnet → Haiku downgrade PRs. budget.yaml runs in a simpleeval sandbox — no Python eval, no RCE vector.
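To illustrate why a restricted evaluator matters for user-editable rules, here is a minimal allow-list evaluator built on the standard library's `ast` module. It is a stand-in written for this example, not simpleeval's code and not how cost-governance is implemented; the rule text and the variable names (`spend_usd`, `forecast_usd`, `ceiling_usd`) are hypothetical.

```python
import ast
import operator

# Only these operators are permitted; everything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
}
_CMP_OPS = {
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.Lt: operator.lt, ast.LtE: operator.le, ast.Eq: operator.eq,
}


def safe_eval(expr: str, names: dict):
    """Evaluate arithmetic and comparisons over known names; reject anything else."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name) and node.id in names:
            return names[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](ev(node.left), ev(node.right))
        if (isinstance(node, ast.Compare) and len(node.ops) == 1
                and type(node.ops[0]) in _CMP_OPS):
            return _CMP_OPS[type(node.ops[0])](ev(node.left), ev(node.comparators[0]))
        # Calls, attribute access, subscripts, and unknown names all land here.
        raise ValueError(f"disallowed expression: {type(node).__name__}")

    return ev(ast.parse(expr, mode="eval"))


budget = {"spend_usd": 900.0, "forecast_usd": 250.0, "ceiling_usd": 1000.0}
would_breach = safe_eval("spend_usd + forecast_usd > ceiling_usd", budget)
```

An expression such as `__import__('os').system('id')` parses to a call node and raises `ValueError` here, whereas Python's built-in `eval()` would execute it.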

CLI first. Always.

Every skill is reachable as a slash command in Claude Code or a direct skill call in Kiro. The full state lives under .omao/ and is portable between harnesses.

> /plugin marketplace add https://github.com/aws-samples/sample-oh-my-aidlcops
> /plugin install ai-infra agenticops aidlc modernization
> /oma:platform-bootstrap
  [1/5] Gather Context  …  ok
  [2/5] Pre-flight      …  ok

Nine Tier-0 workflows

Call one slash command. Get a checkpointed plan.

autopilot · /oma:autopilot

Full AIDLC loop (Inception → Construction → Operations).

aidlc-loop · /oma:aidlc-loop

Single-feature Inception → Construction pass.

inception · /oma:inception

AIDLC Phase 1 only — spec, stories, workflow plan.

construction · /oma:construction

AIDLC Phase 2 only — design, codegen, agentic TDD.

agenticops · /oma:agenticops

Operations mode: continuous-eval + incident-response + cost-governance.

self-improving · /oma:self-improving

Langfuse traces → prompt / skill improvement PR.

platform-bootstrap · /oma:platform-bootstrap

5-checkpoint Agentic AI Platform bootstrap on EKS.

modernize · /oma:modernize

6-stage brownfield modernization (assessment → cutover).

cancel · /oma:cancel

Terminate the active Tier-0 mode.

Keyword triggers auto-suggest the right command when your prompt contains a match. See the trigger catalog.

Four plugins

Install only what you need — or all four with one marketplace command.

ai-infra · Build the runtime.

AI runtime infrastructure on AWS. Ships EKS + vLLM + Inference Gateway + Langfuse + GPU + guardrails today; Bedrock / SageMaker runtime skills planned. MCP servers pinned to exact PyPI versions — no @latest.

agentic-eks-bootstrap · vllm-serving-setup · inference-gateway-routing · langfuse-observability · gpu-resource-management · ai-gateway-guardrails
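A pinning policy like this is straightforward to check mechanically. The sketch below is not part of OMA; it assumes a `.mcp.json` with a top-level `mcpServers` map whose entries carry an `args` list containing the package spec, and the server names and version number in the sample are placeholders.

```python
import json
import re

# A spec counts as pinned when it ends in @<numeric version> or ==<numeric version>.
PINNED = re.compile(r"^[A-Za-z0-9_.\-]+(@|==)\d+(\.\d+)*$")


def unpinned_servers(mcp_json: str) -> list:
    """Return the names of servers whose awslabs package spec is not pinned."""
    config = json.loads(mcp_json)
    bad = []
    for name, server in config.get("mcpServers", {}).items():
        # Assumption: any argument starting with "awslabs." is the package spec.
        specs = [a for a in server.get("args", []) if a.startswith("awslabs.")]
        if not specs or not all(PINNED.match(s) for s in specs):
            bad.append(name)
    return sorted(bad)


SAMPLE = json.dumps({
    "mcpServers": {
        "eks": {"command": "uvx", "args": ["awslabs.eks-mcp-server@0.1.0"]},
        "cost": {"command": "uvx", "args": ["awslabs.cost-explorer-mcp-server@latest"]},
    }
})
offenders = unpinned_servers(SAMPLE)
```

Run against the sample, the check flags only the `cost` entry, since `@latest` is not a numeric version.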

aidlc · Design and build with a spec.

AIDLC Phase 1 (Inception) + Phase 2 (Construction) opt-in extensions for awslabs/aidlc-workflows. Inception captures workspace, requirements, stories, and the workflow plan. Construction turns that plan into components, code, tests, and risk-discovered quality gates.

workspace-detection · requirements-analysis · user-stories · workflow-planning · component-design · code-generation · test-strategy · risk-discovery · quality-gates

agenticops · Operate with agents.

Autonomous operations for production agentic workloads. Incident response, self-improving feedback loops, progressive rollouts with SLO circuit breakers, cost governance with a simpleeval sandbox, and verbatim audit trails.

self-improving-loop · autopilot-deploy · incident-response · continuous-eval · cost-governance · audit-trail

modernization · Lift legacy onto AWS.

Brownfield legacy workload modernization using the AWS 6R strategy. Workload assessment with Five Lenses, 6R decision matrix, to-be architecture, containerization hardening, and production cutover planning with rollback triggers.

workload-assessment · modernization-strategy · to-be-architecture · containerization · cutover-planning

30-second install

Three steps to a working loop.

  1. Register the marketplace

    $ claude
    > /plugin marketplace add https://github.com/aws-samples/sample-oh-my-aidlcops

  2. Install the four plugins

    > /plugin install ai-infra@oh-my-aidlcops
    > /plugin install agenticops@oh-my-aidlcops
    > /plugin install aidlc@oh-my-aidlcops
    > /plugin install modernization@oh-my-aidlcops

  3. Run a Tier-0 workflow

    > /oma:autopilot "ship the anomaly detector end to end"

Or start with a safer on-ramp: getting-started guide.

Secure by default

Ship-ready, not just demo-ready.

  • MCP versions pinned

    Every .mcp.json and agent profile references awslabs MCP servers by exact PyPI version. No @latest supply-chain surprises.

  • Read-only EKS MCP

    The Kiro agent profile does not enable --allow-write or --allow-sensitive-data-access by default; opt in explicitly.

  • Least-privilege IAM

    langfuse-observability uses a bucket-scoped customer-managed policy. AmazonS3FullAccess is called out as a Bad Example.

  • Sandboxed expressions

    cost-governance evaluates budget.yaml rules with simpleeval. Python eval() on user-editable config is a documented RCE vector.

  • Session state stays local

    .omao/state, .omao/plans, .omao/logs, audit-trail output, and project memory are gitignored. Verbatim prompts never leave the machine.

  • Safe JSON hooks

    session-start.sh requires jq or python3 and refuses to emit shell-interpolated JSON, preventing state-file injection into context.
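The injection risk behind the last item can be shown with a short Python comparison. This is an illustration of the principle only, not the contents of session-start.sh; the `mode` and `note` field names are hypothetical.

```python
import json


def interpolated_payload(mode: str, note: str) -> str:
    """Anti-pattern: build JSON by pasting values into a string template."""
    return '{"mode": "%s", "note": "%s"}' % (mode, note)


def serialized_payload(mode: str, note: str) -> str:
    """Preferred: let a JSON serializer escape every value."""
    return json.dumps({"mode": mode, "note": note})


# A state-file value crafted to break out of its string and add a second key.
hostile_note = 'x", "mode": "autopilot'

# Interpolation produces valid JSON with a duplicate "mode" key; Python's
# parser keeps the last one, so the attacker's value replaces the real mode.
tampered = json.loads(interpolated_payload("idle", hostile_note))

# Serialization escapes the quotes, so the value stays inside "note".
intact = json.loads(serialized_payload("idle", hostile_note))
```

Requiring `jq` or `python3` gives the hook a real serializer to lean on, which is what closes this class of bug.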

FAQ

Common questions before you install.

  • profile.yaml v1 and the 8 ontology schemas are stable; CLI surfaces and the doctor report shape may still evolve before GA. Breaking changes land in CHANGELOG under an explicit "Breaking" heading. See the support policy for the full stability contract.

Stop running the operations loop by hand.

Install once. Approve at the checkpoints. Let agents carry the rest of the AIDLC loop.