ADR-010: Error recovery and rollback protocol
Status: proposed
Date: 2026-05-19
Context
When merged code breaks something, the response is ad-hoc. Agents operating autonomously may merge code that passes CI but breaks integration. No protocol defines when to revert vs. fix forward, who decides, or how stacked PR chains recover.
Decision
Decision tree
Broken thing detected
├─ Production affected (users impacted NOW)?
│  └─ Yes → REVERT immediately, investigate after
├─ Fix obvious and < 30 minutes?
│  └─ Yes → Fix forward (new PR, not amend)
├─ Stacked PR chain?
│  └─ Yes → Pause dependent PRs, fix the base
└─ Scope of damage unclear?
   └─ Yes → REVERT (safe default), then investigate
Revert protocol
- Create a revert commit (not a force-push) to preserve history
- Open an issue: what broke, why CI did not catch it, what the fix needs
- The fix goes through normal review (no rushing, no skipping gates)
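The first step of the revert protocol could be sketched as follows. This is a minimal illustration that only builds the `git revert` argument list without running it; the helper name `build_revert_command` and the choice of parent 1 as the mainline for merge commits are assumptions, not something this ADR specifies.

```python
def build_revert_command(commit_sha: str, is_merge_commit: bool = False) -> list[str]:
    """Build a `git revert` invocation that adds a new commit instead of rewriting history."""
    # --no-edit keeps git's default revert message so no editor opens in automation.
    cmd = ["git", "revert", "--no-edit"]
    if is_merge_commit:
        # Reverting a merge commit needs a mainline parent.
        # Assumption: parent 1 is the branch that was merged into.
        cmd += ["-m", "1"]
    cmd.append(commit_sha)
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the revert itself has been approved by someone other than its author.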
Fix-forward protocol
- Only if the fix is obvious, small, and low-risk
- Must still go through PR + review
- If the fix introduces new complexity — revert instead
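The decision tree, combined with the revert and fix-forward rules above, can be sketched as a single triage function. The `Incident` fields and the `Action` names are hypothetical, since the ADR does not define a schema. The sketch also assumes that branches are evaluated top to bottom and that any case matching no earlier branch falls back to a revert, which the ADR describes as the safe default only for unclear scope.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    REVERT = "revert"
    FIX_FORWARD = "fix_forward"
    PAUSE_AND_FIX_BASE = "pause_dependents_and_fix_base"


@dataclass(frozen=True)
class Incident:
    # Hypothetical fields; the ADR does not prescribe how an incident is described.
    production_affected: bool
    fix_is_obvious: bool
    estimated_fix_minutes: int
    in_stacked_chain: bool
    fix_adds_complexity: bool = False


def triage(incident: Incident) -> Action:
    """Walk the decision tree top to bottom; the first matching branch wins."""
    if incident.production_affected:
        return Action.REVERT
    if (
        incident.fix_is_obvious
        and incident.estimated_fix_minutes < 30
        and not incident.fix_adds_complexity
    ):
        return Action.FIX_FORWARD
    if incident.in_stacked_chain:
        return Action.PAUSE_AND_FIX_BASE
    # Remaining case: scope of damage unclear. By assumption, anything else
    # that matched no branch also falls back to the safe default.
    return Action.REVERT
```

For example, `triage(Incident(False, True, 10, False))` selects `Action.FIX_FORWARD`, while the same incident with `fix_adds_complexity=True` falls through to `Action.REVERT`.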
Stacked PR chain recovery
- Identify which PR introduced the breakage
- Pause/close all PRs above it
- Fix the base PR
- Rebase and re-evaluate dependent PRs
- Re-run CI on each before re-opening
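The chain recovery steps above can be sketched as a small planning function. It assumes the chain is given as an ordered list from the bottom (closest to the main branch) to the top; the names `RecoveryPlan` and `plan_chain_recovery` are hypothetical and only illustrate the ordering, not any real tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryPlan:
    base_to_fix: str         # the PR that introduced the breakage
    paused: list[str]        # every PR stacked above it
    rebase_order: list[str]  # order in which to rebase and re-run CI


def plan_chain_recovery(chain: list[str], broken: str) -> RecoveryPlan:
    """Plan recovery for a stacked chain ordered bottom (nearest main) to top."""
    if broken not in chain:
        raise ValueError(f"{broken!r} is not part of the chain")
    # Everything above the broken PR depends on it and must be paused.
    dependents = chain[chain.index(broken) + 1:]
    # Rebase bottom-up so each PR lands on an already repaired parent.
    return RecoveryPlan(
        base_to_fix=broken,
        paused=list(dependents),
        rebase_order=list(dependents),
    )
```

With a chain of `["pr-1", "pr-2", "pr-3", "pr-4"]` and `"pr-2"` identified as the breakage, the plan leaves `pr-1` untouched, fixes `pr-2`, and pauses `pr-3` and `pr-4` until each has been rebased and has passed CI again.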
Agents must NEVER do during recovery
- Force-push to shared branches
- Delete branches with others’ work
- Amend published commits
- Skip review “because it’s urgent”
- Self-approve a revert
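One way an agent runtime might enforce the list above is a guard that rejects forbidden actions before they execute. This is a sketch under assumptions: the `RecoveryAction` names, `FORBIDDEN_DURING_RECOVERY`, and `ensure_allowed` are all hypothetical, and the ADR does not state how enforcement is implemented.

```python
from enum import Enum


class RecoveryAction(Enum):
    # Hypothetical action names covering both allowed and forbidden operations.
    REVERT_COMMIT = "revert_commit"
    OPEN_FIX_PR = "open_fix_pr"
    FORCE_PUSH_SHARED_BRANCH = "force_push_shared_branch"
    DELETE_BRANCH_WITH_OTHERS_WORK = "delete_branch_with_others_work"
    AMEND_PUBLISHED_COMMIT = "amend_published_commit"
    SKIP_REVIEW = "skip_review"
    SELF_APPROVE_REVERT = "self_approve_revert"


# Mirrors the five prohibitions listed above, one entry per bullet.
FORBIDDEN_DURING_RECOVERY = frozenset({
    RecoveryAction.FORCE_PUSH_SHARED_BRANCH,
    RecoveryAction.DELETE_BRANCH_WITH_OTHERS_WORK,
    RecoveryAction.AMEND_PUBLISHED_COMMIT,
    RecoveryAction.SKIP_REVIEW,
    RecoveryAction.SELF_APPROVE_REVERT,
})


class ForbiddenActionError(Exception):
    """Raised when an agent attempts a prohibited action during recovery."""


def ensure_allowed(action: RecoveryAction) -> None:
    """Raise if the action is on the forbidden list; otherwise do nothing."""
    if action in FORBIDDEN_DURING_RECOVERY:
        raise ForbiddenActionError(f"{action.value} is not permitted during recovery")
```

Calling `ensure_allowed` at the start of every recovery step keeps the prohibition in one place, so urgency cannot quietly bypass it.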
Consequences
- (+) Clear decision tree prevents analysis paralysis during incidents
- (+) Revert-first default limits blast radius
- (+) Stacked chain recovery is defined (not improvised)
- (+) History is preserved (revert commits, not force-push)
- (-) Reverts create noise in git history
- (-) Fix-forward temptation may lead to rushed fixes
- (!) “Production affected” requires definition per deployment (self-hosted varies)
References
- Issue #141 — full RFC with open questions
- ADR-003 — governance (no bypasses during recovery)
- ADR-001 — stacked PRs (chain recovery protocol)
- ADR-009 — security (revert authority tied to role)