Source
This page is generated from skills/eks-operation-review/references/autoscaling.md. Edit the source, not this page.
Vendored skill
This skill is sourced from eks-operation-review, also maintained by the APEX team.
Autoscaling
Purpose
Assess cluster-level autoscaling (nodes) and workload-level autoscaling (pods), plus AZ resilience.
Checks to Execute
7.1 — Cluster Autoscaling Strategy
What to check:
- Karpenter NodePools and EC2NodeClasses
- Cluster Autoscaler deployment in kube-system
- EKS Auto Mode (cluster computeConfig)
- Node group scaling config (min/max/desired)
- Spot vs On-Demand nodes
- Currently pending pods
How to check:
- List resources nodepools.karpenter.sh → check limits and consolidation policy. If 404/NotFound (CRD not installed) → Karpenter is not deployed; proceed to check for Cluster Autoscaler or Auto Mode. If 403/Forbidden → mark Karpenter status UNKNOWN.
- List Deployments in kube-system → check for cluster-autoscaler
- Describe cluster → computeConfig for Auto Mode
- List node groups → describe each for scalingConfig and capacityType
- List nodes → check labels for capacity type (karpenter.sh/capacity-type or eks.amazonaws.com/capacityType)
- List pods with field selector status.phase=Pending
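The 404/403 handling and the capacity-type label check can be sketched as below. This is a minimal illustration, assuming the HTTP status of the list call and the node objects (as parsed JSON) are already in hand. The helper names `classify_crd_lookup` and `count_capacity_types`, and the result strings, are hypothetical and not part of any tool.

```python
from collections import Counter

# Label keys named in the check: Karpenter and EKS managed node groups
# use different keys for the same concept.
CAPACITY_LABELS = ("karpenter.sh/capacity-type", "eks.amazonaws.com/capacityType")


def classify_crd_lookup(status_code: int) -> str:
    """Map the HTTP status of a CRD list call to a finding (hypothetical helper)."""
    if status_code == 200:
        return "INSTALLED"
    if status_code == 404:
        return "NOT_INSTALLED"  # CRD absent: move on to the next autoscaler check
    # 403 means no permission, so absence cannot be concluded.
    # Assumption: any other failure is treated as inconclusive too.
    return "UNKNOWN"


def count_capacity_types(nodes: list[dict]) -> Counter:
    """Count nodes per capacity type across both label keys."""
    counts: Counter = Counter()
    for node in nodes:
        labels = node.get("metadata", {}).get("labels", {})
        value = next((labels[k] for k in CAPACITY_LABELS if k in labels), None)
        # Assumption: values differ only in case and separator
        # (for example ON_DEMAND vs on-demand), so normalise both.
        key = value.lower().replace("_", "-") if value else "unlabelled"
        counts[key] += 1
    return counts
```

Keeping 404 and 403 apart matters because only the former supports the conclusion that Karpenter is absent.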
Rating:
- 🟢 GREEN: Karpenter or EKS Auto Mode with consolidation enabled (AWS-preferred path)
- 🟡 AMBER: Cluster Autoscaler present (legacy — consider migration to Karpenter), or Karpenter without consolidation
- 🔴 RED: No cluster autoscaling — manual node management
- ⬜ UNKNOWN: Cannot determine scale-up latency without testing
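One possible encoding of this rubric is sketched below. The function name and the status strings are hypothetical. Two interpretations are assumptions rather than statements from the rubric: consolidation is judged the same way for Karpenter and Auto Mode, and a 403 on the Karpenter lookup yields UNKNOWN rather than RED when no other autoscaler is found.

```python
def rate_cluster_autoscaling(karpenter: str, auto_mode: bool,
                             consolidation: bool, cluster_autoscaler: bool) -> str:
    """Rate check 7.1; karpenter is "INSTALLED", "NOT_INSTALLED", or "UNKNOWN"."""
    if karpenter == "INSTALLED" or auto_mode:
        # Assumption: consolidation is judged the same way for both options.
        return "GREEN" if consolidation else "AMBER"
    if cluster_autoscaler:
        return "AMBER"  # legacy path
    if karpenter == "UNKNOWN":
        # Assumption: an inconclusive Karpenter lookup blocks a RED verdict.
        return "UNKNOWN"
    return "RED"  # no autoscaler found: manual node management
```

Scale-up latency is not an input here; per the rubric it stays UNKNOWN until tested.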
7.2 — Horizontal Pod Autoscaler (HPA)
What to check:
- HPAs across all namespaces (targets, min/max, current replicas)
- Multi-replica deployments without HPA
- HPAs with minReplicas=1 (single point of failure)
- KEDA ScaledObjects
- VPA resources
How to check:
- List HPAs across all namespaces → check minReplicas, maxReplicas, current metrics
- List Deployments with replicas > 1 → cross-reference with HPA targets
- List HPAs where minReplicas == 1 (single point of failure for production workloads; acceptable for dev/staging)
- List ScaledObjects (KEDA CRD). If 404/NotFound → KEDA not installed, skip. If 403/Forbidden → mark UNKNOWN.
- List VPA resources. If 404/NotFound → VPA not installed, skip. If 403/Forbidden → mark UNKNOWN.
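The cross-reference between Deployments and HPA targets might look like the sketch below. It assumes both lists are parsed JSON objects in the usual Kubernetes shape (autoscaling/v2 field names). The function name `hpa_findings` and the result keys are hypothetical.

```python
def hpa_findings(deployments: list[dict], hpas: list[dict]) -> dict:
    """Return multi-replica Deployments without an HPA, and HPAs with minReplicas of 1."""
    # (namespace, name) of every Deployment that some HPA targets.
    targeted = {
        (h["metadata"]["namespace"], h["spec"]["scaleTargetRef"]["name"])
        for h in hpas
        if h["spec"]["scaleTargetRef"].get("kind") == "Deployment"
    }
    no_hpa = [
        (d["metadata"]["namespace"], d["metadata"]["name"])
        for d in deployments
        if d["spec"].get("replicas", 1) > 1
        and (d["metadata"]["namespace"], d["metadata"]["name"]) not in targeted
    ]
    # minReplicas defaults to 1 when omitted from the HPA spec.
    min_replicas_1 = [
        (h["metadata"]["namespace"], h["metadata"]["name"])
        for h in hpas
        if h["spec"].get("minReplicas", 1) == 1
    ]
    return {"no_hpa": no_hpa, "min_replicas_1": min_replicas_1}
```

Whether a flagged HPA is a real problem still depends on the environment, since minReplicas of 1 is acceptable for dev/staging.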
Rating:
- 🟢 GREEN: HPAs on stateless production workloads, min >= 2, tested under load
- 🟡 AMBER: HPAs exist but min=1, or some workloads missing HPA
- 🔴 RED: No HPAs — all workloads at fixed replica count
- ⬜ UNKNOWN: Cannot determine if HPAs have been load-tested
7.3 — Pod Topology Spread & AZ Resilience
What to check:
- Node distribution across AZs
- Deployments with topology spread constraints
- Deployments with pod anti-affinity
- Multi-replica deployments with neither (vulnerable to AZ failure)
How to check:
- List nodes → group by label topology.kubernetes.io/zone
- List Deployments → check spec.template.spec.topologySpreadConstraints
- List Deployments → check spec.template.spec.affinity.podAntiAffinity
- List multi-replica Deployments with neither topology spread nor anti-affinity
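These checks can be sketched over parsed JSON objects as follows. The helper names are hypothetical. The sketch follows the check as written: any topology spread constraint or pod anti-affinity rule counts, whichever topologyKey it uses.

```python
from collections import Counter

ZONE_LABEL = "topology.kubernetes.io/zone"


def nodes_per_zone(nodes: list[dict]) -> Counter:
    """Group nodes by their zone label; nodes without the label count as "unknown"."""
    return Counter(
        n.get("metadata", {}).get("labels", {}).get(ZONE_LABEL, "unknown")
        for n in nodes
    )


def lacks_az_spread(deployment: dict) -> bool:
    """True for a multi-replica Deployment with neither spread nor anti-affinity."""
    spec = deployment.get("spec", {})
    pod_spec = spec.get("template", {}).get("spec", {})
    has_spread = bool(pod_spec.get("topologySpreadConstraints"))
    has_anti_affinity = bool((pod_spec.get("affinity") or {}).get("podAntiAffinity"))
    return spec.get("replicas", 1) > 1 and not (has_spread or has_anti_affinity)
```

A stricter variant could also inspect each rule's topologyKey, since a hostname-only rule spreads pods across nodes but not necessarily across AZs.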
Rating:
- 🟢 GREEN: Nodes in 3 AZs, topology spread on production deployments
- 🟡 AMBER: Nodes in multiple AZs but no topology spread constraints
- 🔴 RED: Single-AZ deployment, or multi-replica services with no AZ spread
- ⬜ UNKNOWN: Cannot verify actual pod distribution without checking pod-to-node mapping
Key talking point: Having nodes in 3 AZs doesn't mean pods are spread across them. Without topology spread constraints, the scheduler may place every replica of a workload in a single AZ.
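Resolving the UNKNOWN above requires the pod-to-node mapping. A minimal sketch follows, assuming pods and nodes are parsed JSON objects and that replicas of one workload share an `app` label. That grouping label is an assumption, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

ZONE_LABEL = "topology.kubernetes.io/zone"


def pods_per_zone(pods: list[dict], nodes: list[dict], group_label: str = "app") -> dict:
    """Count scheduled pods per zone for each (namespace, label value) group."""
    zone_of = {
        n["metadata"]["name"]: n["metadata"].get("labels", {}).get(ZONE_LABEL, "unknown")
        for n in nodes
    }
    spread: dict = defaultdict(Counter)
    for pod in pods:
        node_name = pod.get("spec", {}).get("nodeName")
        if not node_name:
            continue  # unscheduled (Pending) pods have no node yet
        group = (
            pod["metadata"]["namespace"],
            pod["metadata"].get("labels", {}).get(group_label, "<none>"),
        )
        spread[group][zone_of.get(node_name, "unknown")] += 1
    return spread


def single_zone_groups(spread: dict) -> list:
    """Groups with more than one pod where every pod sits in the same zone."""
    return [g for g, zones in spread.items()
            if sum(zones.values()) > 1 and len(zones) == 1]
```

Any group returned by `single_zone_groups` is a concrete instance of the talking point: several replicas, one AZ.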