EKS Auto Mode — Cleanup Playbook
Why terraform destroy alone fails
aws_eks_cluster deletion tears down the control plane but does NOT drain in-cluster workloads. CSI drivers and the ALB Controller lose their API server mid-reconcile, so PVCs become orphaned EBS volumes, Ingresses become orphaned ALBs/TGs, and EFS access points leak. This applies equally to Auto Mode and Standard Mode clusters.
Drain-before-destroy order
- Delete all Ingresses — triggers ALB Controller finalizers; wait 30-60s for ALB/TG deletion via
describe-load-balancerspolling. - Delete LoadBalancer-type Services — NLBs live here, separate from Ingress ALBs.
- Delete all PVCs — triggers EBS CSI volume detach+delete; poll
describe-volumesuntil cluster-tagged vols are gone. - Remove KEDA ScaledObjects — prevents rescaling during drain (skip if no KEDA CRDs).
- Uninstall Helm releases — triggers external-dns record cleanup via finalizers.
- Delete custom NodePools + NodeClaims — triggers node termination; skip the
general-purposepool (AWS-managed in Auto Mode). - Wait for LB cleanup — poll
elbv2.k8s.aws/cluster=<name>tagged LBs (up to 3 min). - Run
terraform destroy(KEDA sub-module first, then main).
Post-destroy orphan checklist
Delete in this order (dependencies before parents):
| # | Resource type | How to find | Notes |
|---|---|---|---|
| 1 | Target Groups | elbv2 describe-tags with elbv2.k8s.aws/cluster=<name> | Delete before LBs |
| 2 | Load Balancers | Same tag filter | Delete listeners first |
| 3 | EC2 Instances | Tags: karpenter.sh/discovery, aws:eks:cluster-name, kubernetes.io/cluster/<name> | Terminate |
| 4 | EBS Volumes | status=available + tag kubernetes.io/cluster/<name> | Detached = safe to delete |
| 5 | ENIs | status=available + description/SG containing cluster name | |
| 6 | Security Groups | Name or tag contains cluster name | Revoke all rules first |
| 7 | IAM Roles | Name contains cluster name (skip AWSServiceRole*) | Detach policies first |
| 8 | OIDC Providers | ARN contains the 32-char OIDC ID (capture while cluster is alive) | |
| 9 | CloudWatch Logs | Prefix /aws/eks/<name> and /aws/containerinsights/<name> | |
| 10 | KMS Keys | Alias contains cluster name | Schedule deletion (7-day min) |
| 11 | Route53 Records | A/CNAME under *.<domain> + TXT with txtOwnerId | |
| 12 | SQS Queues | Queue name prefix = cluster name | |
| 13 | Launch Templates | Name or aws:eks:cluster-name tag | |
| 14 | Elastic IPs | Unassociated + name/tag containing cluster name |
Usage
./scripts/cleanup.sh [OPTIONS]
--dry-run Show what would be deleted without deleting
--yes Skip all prompts (full non-interactive delete)
--keep-storage Preserve PVC/EBS/EFS resources
--region REGION Override auto-detected region
--cluster-name N Override auto-detected cluster name
--domain DOMAIN Full domain for Route53 sweep
--skip-terraform Orphan sweep only (TF already destroyed)
--skip-keda Skip KEDA sub-module destroy
Common invocations:
./scripts/cleanup.sh --dry-run # preview
./scripts/cleanup.sh --yes # full non-interactive teardown
./scripts/cleanup.sh --skip-terraform --yes # post-hoc orphan sweep
Key gotchas (Auto Mode specific)
- Managed
defaultNodeClass tags revert silently — always use a custom NodeClass for durable tags. - OIDC provider ID must be captured while cluster is alive — script does this in Phase 1.
provider default_tagsdon't reach EKS primary SG — pass viacluster_tags.