EKS Auto Mode — Cleanup Playbook

Why terraform destroy alone fails

aws_eks_cluster deletion tears down the control plane but does NOT drain in-cluster workloads. CSI drivers and the ALB Controller lose their API server mid-reconcile, so PVCs become orphaned EBS volumes, Ingresses become orphaned ALBs/TGs, and EFS access points leak. This applies equally to Auto Mode and Standard Mode clusters.

Drain-before-destroy order

  1. Delete all Ingresses — triggers ALB Controller finalizers; wait 30-60s for ALB/TG deletion via describe-load-balancers polling.
  2. Delete LoadBalancer-type Services — NLBs live here, separate from Ingress ALBs.
  3. Delete all PVCs — triggers EBS CSI volume detach+delete; poll describe-volumes until cluster-tagged vols are gone.
  4. Remove KEDA ScaledObjects — prevents rescaling during drain (skip if no KEDA CRDs).
  5. Uninstall Helm releases — triggers external-dns record cleanup via finalizers.
  6. Delete custom NodePools + NodeClaims — triggers node termination; skip the general-purpose pool (AWS-managed in Auto Mode).
  7. Wait for LB cleanup — poll elbv2.k8s.aws/cluster=<name> tagged LBs (up to 3 min).
  8. Run terraform destroy (KEDA sub-module first, then main).
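
The drain order above can be sketched in shell roughly as follows. This is a minimal illustration, not the contents of cleanup.sh: the function names (poll_until_empty, list_cluster_lbs, list_cluster_volumes, delete_lb_services, drain_cluster) are hypothetical, kubectl and aws are assumed to already target the right cluster and region, and the Resource Groups Tagging API is used here as one possible way to list tagged load balancers. Steps 5, 6 and 8 are left out because release names, NodePool names and Terraform layout are deployment-specific.

```shell
# Sketch only (POSIX sh). All function names are hypothetical.

# Re-run a command until it prints nothing (ignoring whitespace).
# usage: poll_until_empty <max_tries> <sleep_seconds> <command...>
# returns 0 when empty, 1 on timeout, 2 if the command itself fails
poll_until_empty() {
  _max=$1; _delay=$2; shift 2
  _i=1
  while [ "$_i" -le "$_max" ]; do
    _out=$("$@") || return 2
    if [ -z "$(printf '%s' "$_out" | tr -d '[:space:]')" ]; then
      return 0
    fi
    if [ "$_i" -lt "$_max" ]; then sleep "$_delay"; fi
    _i=$((_i + 1))
  done
  return 1
}

# Load balancers tagged elbv2.k8s.aws/cluster=<name> (steps 1, 2, 7).
list_cluster_lbs() {
  aws resourcegroupstaggingapi get-resources \
    --resource-type-filters elasticloadbalancing:loadbalancer \
    --tag-filters "Key=elbv2.k8s.aws/cluster,Values=$1" \
    --query 'ResourceTagMappingList[].ResourceARN' --output text
}

# EBS volumes that still carry the cluster tag (step 3).
list_cluster_volumes() {
  aws ec2 describe-volumes \
    --filters "Name=tag-key,Values=kubernetes.io/cluster/$1" \
    --query 'Volumes[].VolumeId' --output text
}

# Delete every Service of type LoadBalancer (step 2).
delete_lb_services() {
  kubectl get svc --all-namespaces \
    -o jsonpath='{range .items[?(@.spec.type=="LoadBalancer")]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}' |
  while read -r ns name; do
    if [ -n "$name" ]; then
      kubectl delete svc "$name" --namespace "$ns" </dev/null
    fi
  done
}

# usage: drain_cluster <cluster-name>
drain_cluster() {
  kubectl delete ingress --all --all-namespaces                    # step 1
  delete_lb_services                                               # step 2
  # --wait=false: PVCs still mounted by pods are held by a finalizer
  kubectl delete pvc --all --all-namespaces --wait=false           # step 3
  poll_until_empty 30 10 list_cluster_volumes "$1" ||
    echo "warning: cluster-tagged volumes remain" >&2
  if kubectl get crd scaledobjects.keda.sh >/dev/null 2>&1; then   # step 4
    kubectl delete scaledobjects.keda.sh --all --all-namespaces
  fi
  # Steps 5 and 6 (Helm releases, custom NodePools/NodeClaims) omitted.
  poll_until_empty 18 10 list_cluster_lbs "$1" ||                  # step 7, about 3 min
    echo "warning: cluster-tagged load balancers remain" >&2
}
```

Calling drain_cluster with the cluster name would run steps 1 to 4 and 7 in order and print a warning, rather than abort, if tagged volumes or load balancers are still present when polling gives up. The retry counts and delays are placeholder values.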

Post-destroy orphan checklist

Delete in this order, so that dependent resources are removed before the resources they depend on:

| #  | Resource type   | How to find | Notes |
|----|-----------------|-------------|-------|
| 1  | Target Groups   | elbv2 describe-tags with elbv2.k8s.aws/cluster=<name> | Delete before LBs, once the listeners that reference them are gone (a TG still referenced by a listener cannot be deleted) |
| 2  | Load Balancers  | Same tag filter | Delete listeners first |
| 3  | EC2 Instances   | Tags: karpenter.sh/discovery, aws:eks:cluster-name, kubernetes.io/cluster/<name> | Terminate |
| 4  | EBS Volumes     | status=available + tag kubernetes.io/cluster/<name> | Detached = safe to delete |
| 5  | ENIs            | status=available + description/SG containing cluster name | |
| 6  | Security Groups | Name or tag contains cluster name | Revoke all rules first |
| 7  | IAM Roles       | Name contains cluster name (skip AWSServiceRole*) | Detach policies first |
| 8  | OIDC Providers  | ARN contains the 32-char OIDC ID (capture while cluster is alive) | |
| 9  | CloudWatch Logs | Prefix /aws/eks/<name> and /aws/containerinsights/<name> | |
| 10 | KMS Keys        | Alias contains cluster name | Schedule deletion (7-day min) |
| 11 | Route53 Records | A/CNAME under *.<domain> + TXT with txtOwnerId | |
| 12 | SQS Queues      | Queue name prefix = cluster name | |
| 13 | Launch Templates | Name or aws:eks:cluster-name tag | |
| 14 | Elastic IPs     | Unassociated + name/tag containing cluster name | |
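
As an illustration of one row of the checklist, the sketch below handles the EBS Volumes entry: it lists detached volumes that still carry the kubernetes.io/cluster/<name> tag and deletes them. The function names and the DRY_RUN variable are hypothetical and are not taken from cleanup.sh. The sketch defaults to dry-run, so nothing is deleted unless DRY_RUN=0 is set explicitly.

```shell
# Sketch only (POSIX sh). Function names and DRY_RUN are hypothetical.

# Detached (status=available) volumes tagged for the given cluster.
# usage: list_orphan_volumes <cluster-name>
list_orphan_volumes() {
  aws ec2 describe-volumes \
    --filters Name=status,Values=available \
              "Name=tag-key,Values=kubernetes.io/cluster/$1" \
    --query 'Volumes[].VolumeId' --output text
}

# Delete each orphaned volume, or only report it when DRY_RUN is 1 (default).
# usage: delete_orphan_volumes <cluster-name>
delete_orphan_volumes() {
  for vol in $(list_orphan_volumes "$1"); do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "would delete $vol"
    else
      aws ec2 delete-volume --volume-id "$vol"
    fi
  done
}
```

The same list-then-delete shape would apply to the other rows, with the listing command and the prerequisite step (for example detaching policies or revoking rules) swapped in per resource type.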

Usage

./scripts/cleanup.sh [OPTIONS]

  --dry-run          Show what would be deleted without deleting
  --yes              Skip all prompts (full non-interactive delete)
  --keep-storage     Preserve PVC/EBS/EFS resources
  --region REGION    Override auto-detected region
  --cluster-name N   Override auto-detected cluster name
  --domain DOMAIN    Full domain for Route53 sweep
  --skip-terraform   Orphan sweep only (TF already destroyed)
  --skip-keda        Skip KEDA sub-module destroy

Common invocations:

./scripts/cleanup.sh --dry-run # preview
./scripts/cleanup.sh --yes # full non-interactive teardown
./scripts/cleanup.sh --skip-terraform --yes # post-hoc orphan sweep

Key gotchas (Auto Mode specific)

  • Tags set on the managed default NodeClass revert silently; always use a custom NodeClass for tags that must persist.
  • The OIDC provider ID must be captured while the cluster is alive; the script does this in Phase 1.
  • Provider default_tags don't reach the EKS primary security group; pass the tags via cluster_tags.
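
For the OIDC gotcha, one way to capture the ID before the cluster is destroyed is sketched below. It is an assumption-laden illustration rather than the script's actual Phase 1 code: the function names and the default output path ./oidc-id.txt are hypothetical. It relies on the issuer URL returned by aws eks describe-cluster ending in the ID, which is then matched against IAM OIDC provider ARNs after the destroy.

```shell
# Sketch only (POSIX sh). Function names and the default file path are hypothetical.

# Extract the trailing ID from an issuer URL of the form
# https://oidc.eks.<region>.amazonaws.com/id/<ID>
oidc_id_from_issuer() {
  printf '%s\n' "${1##*/}"
}

# Save the cluster's OIDC ID to a file while the cluster still exists.
# usage: capture_oidc_id <cluster-name> [output-file]
capture_oidc_id() {
  issuer=$(aws eks describe-cluster --name "$1" \
    --query 'cluster.identity.oidc.issuer' --output text) || return 1
  oidc_id_from_issuer "$issuer" > "${2:-./oidc-id.txt}"
}

# After the destroy: print IAM OIDC provider ARNs that contain the saved ID.
# usage: find_oidc_provider_arn <oidc-id>
find_oidc_provider_arn() {
  aws iam list-open-id-connect-providers \
    --query "OpenIDConnectProviderList[?contains(Arn, '$1')].Arn" --output text
}
```

Writing the ID to a file means a later orphan sweep can still find the provider even though describe-cluster no longer works once the cluster is gone.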