Source

This page is generated from skills/eks-upgrade-check/references/node-readiness.md. Edit the source, not this page.

Vendored skill

This skill is sourced from eks-upgrade-check, also maintained by the APEX team.

Node Readiness

Purpose

Assess node groups, AMI types, version alignment, and migration requirements for the target version.

Checks to Execute

5.1 — Node Group Inventory

How to check:

List all managed node groups → describe each for:
- Kubernetes version
- AMI type (AL2, AL2023, AL2_ARM_64, BOTTLEROCKET_x86_64, etc.)
- Instance types
- Scaling config (min/max/desired)
- Capacity type (ON_DEMAND, SPOT)
- Health status
List nodes via Kubernetes API → get:
- status.nodeInfo.kubeletVersion
- status.nodeInfo.osImage
- status.nodeInfo.kernelVersion
- status.nodeInfo.containerRuntimeVersion
- Labels: topology.kubernetes.io/zone, node.kubernetes.io/instance-type
Check for Karpenter NodePools (nodepools.karpenter.sh)
Check for EKS Auto Mode (computeConfig in cluster describe)

Output per node group:

Name, version, AMI type, instance types, scaling config
Version skew against target (calculated in version-validation)

5.2 — AL2 to AL2023 Migration Assessment

Why this matters:

AL2 standard support ended June 2025
EKS 1.33+ does NOT publish AL2 AMIs — cannot create new AL2 node groups
AL2 uses cgroup v1; AL2023 uses cgroup v2 (required for EKS 1.35+)

How to check:

From node group descriptions, identify AMI type
From node Kubernetes API, check kernelVersion for amzn2 or osImage for Amazon Linux 2
Count AL2 nodes and node groups

Rating:

No AL2 nodes → PASS
AL2 nodes present, target < 1.33 → WARN (plan migration)
AL2 nodes present, target >= 1.33 → FAIL (blocker — no AL2 AMI available)

Migration guidance:

Create new node group with AL2023 AMI type
Cordon old AL2 nodes: kubectl cordon <node-name>
Drain workloads: kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
Delete old node group after all pods rescheduled
Key differences: cgroup v2 default, dnf instead of yum, different kernel

5.3 — Container Runtime Version

How to check:

List nodes → status.nodeInfo.containerRuntimeVersion
Check for containerd 1.x vs 2.x

Rating:

All nodes on containerd 2.x → PASS
Any node on containerd 1.x, target < 1.35 → WARN (plan upgrade)
Any node on containerd 1.x, target >= 1.35 → WARN (last supported version, next will block)

5.4 — Self-Managed Nodes

How to check:

List all nodes
Compare against managed node group nodes (by labels or node group membership)
Nodes not in any managed node group or Karpenter → self-managed

Rating:

No self-managed nodes → PASS
Self-managed nodes present → WARN (no automated upgrade path, manual AMI update required)

5.5 — Subnet IP Capacity

Why this matters:

EKS requires at least 5 available IPs in each cluster subnet to update the control plane (EKS creates new ENIs for the upgraded API server). If any subnet has < 5 IPs, the update-cluster-version API call will fail immediately.
During node group rolling updates, new nodes are launched before old nodes are terminated (surge). Each new node consumes 1 IP for its primary ENI plus additional IPs for the VPC CNI warm pool (pod IPs). Insufficient capacity causes the node group update to hang.

How to check:

Get the cluster subnet IDs from the cluster description (already retrieved in pre-flight Action 2 — resourcesVpcConfig.subnetIds).

Run:

aws ec2 describe-subnets --subnet-ids <subnet-id-1> <subnet-id-2> ... \
  --query 'Subnets[].{SubnetId:SubnetId,AZ:AvailabilityZone,AvailableIPs:AvailableIpAddressCount,CIDR:CidrBlock}' \
  --output table

For each subnet, evaluate AvailableIpAddressCount against thresholds.

Thresholds:

Available IPs	Verdict	Severity
< 5	HARD BLOCKER — control plane upgrade will fail	CRITICAL
5–15	WARNING — control plane OK, but node rolling update at risk if surge needs more IPs	MEDIUM
> 15	PASS	—

Important context for the 5–15 warning: The exact number of IPs needed during node group surge depends on:

Instance type (determines max ENIs and IPs per ENI)
VPC CNI configuration (WARM_IP_TARGET, MINIMUM_IP_TARGET, ENABLE_PREFIX_DELEGATION)
Node group maxSurge setting (default: 1 additional node)

Do NOT report a precise "you need X IPs" number — instead flag the risk and advise the user to verify capacity is sufficient for their instance type and CNI config.

If subnet has < 5 IPs, report:

❌ Subnet IP exhaustion — control plane upgrade will fail

Subnet <subnet-id> in <az> has only <N> available IPs (CIDR: <cidr>). EKS requires at least 5 free IPs per subnet to place control plane ENIs during an upgrade.

Remediation (choose one):

Remove unused ENIs: aws ec2 describe-network-interfaces --filters Name=subnet-id,Values=<subnet-id> Name=status,Values=available --query 'NetworkInterfaces[].NetworkInterfaceId'

Add a new subnet to the cluster: aws eks update-cluster-config --name <cluster> --resources-vpc-config subnetIds=<existing>,<new-subnet>

Expand the subnet CIDR (if VPC allows)

If subnet has 5–15 IPs, report:

⚠️ Low subnet IP capacity — node group upgrade may stall

Subnet <subnet-id> in <az> has <N> available IPs. While this is sufficient for the control plane upgrade (minimum 5), the node group rolling update launches new nodes before terminating old ones. If your instance type + VPC CNI warm pool requires more IPs than are available, the surge node will fail to launch.

Before upgrading: Verify capacity is sufficient for your configuration, or consider adding subnets / enabling VPC CNI prefix delegation to reduce per-pod IP consumption.

Score Impact

Canonical scoring is defined in references/report-generation.md §Category 3 (Node Readiness) and §Category 8 (AL2 Nodes).

Finding	Deduction
Subnet IPs < 5 (hard blocker)	5 pts + hard blocker override (caps score ≤ 59%)
Subnet IPs 5–15 (warning)	2 pts
AL2 nodes (target < 1.33)	2-5 pts
AL2 nodes (target >= 1.33)	10-15 pts
Containerd 1.x	2 pts
Self-managed nodes	3 pts
Max category (combined with version-validation skew)	20 pts

Purpose​

Checks to Execute​

5.1 — Node Group Inventory​

5.2 — AL2 to AL2023 Migration Assessment​

5.3 — Container Runtime Version​

5.4 — Self-Managed Nodes​

5.5 — Subnet IP Capacity​

Score Impact​

Purpose

Checks to Execute

5.1 — Node Group Inventory

5.2 — AL2 to AL2023 Migration Assessment

5.3 — Container Runtime Version

5.4 — Self-Managed Nodes

5.5 — Subnet IP Capacity

Score Impact