CDK Stack Cleanup¶
Automated cleanup mechanism using Step Functions with async polling for reliable deletion of CDK stacks, S3 buckets, and bootstrap resources.
Problem Solved¶
- Timeout Limitations — Original Lambda approach had 15-minute timeout, insufficient for EKS/Aurora (30+ minutes)
- Orphaned Resources — Failed cleanups would delete the cleanup mechanism itself
- Silent Failures — Stack deletion could complete even if cleanup failed
Architecture¶
flowchart TD
List["List Tagged Stacks"] --> Check{"Stacks Found?"}
Check -->|Yes| Delete["Delete Stacks (sequential)"]
Check -->|No| Done["Done"]
Delete --> Eval{"All Succeeded?"}
Eval -->|Yes| Staging["Cleanup CDK Staging Bucket"]
Eval -->|No| Skip["Skip Cleanup (END)"]
Staging --> Cache["Cleanup Cache Bucket"]
Cache --> Toolkit["Delete CDK Toolkit Stack"]
Toolkit --> Complete["Cleanup Complete"] Execution Workflows¶
Successful Cleanup¶
- CloudFormation stack deletion initiated
- Custom resource starts Step Function
- CDK stacks listed and deleted sequentially
- All deletions succeed → cleanup S3 buckets and bootstrap stack
- Cleanup completion function removes all retained resources
- No resources left behind
Failed Cleanup¶
- One or more stack deletions fail
- Step Function skips cleanup steps
- CloudFormation stack deletion fails (
DELETE_FAILED) - All cleanup resources preserved for retry
Retry After Failure¶
- Investigate Step Function execution history
- Manually resolve the issue
- Delete the CloudFormation stack again
- Step Function automatically retries
Key Features¶
- No timeout limits — Step Functions can run up to 1 year
- Failure detection — Cleanup only proceeds if ALL deletions succeed
- Retry capability — Failed cleanups preserve all resources for retry
- Complete cleanup — On success, all Lambda functions, IAM roles, EventBridge rules, and the state machine itself are deleted
Triggers¶
- Stack Deletion — EventBridge rule monitors
DELETE_IN_PROGRESSstatus - CodeBuild Failure — EventBridge rule monitors
FAILED,FAULT,STOPPED,TIMED_OUT
Troubleshooting¶
CDK Bootstrap Stack Deleted Prematurely
If the bootstrap stack is removed before cleanup completes, re-bootstrap from CloudShell: