Skip to main content
Source

This page is generated from skills/eks-build/SKILL.md. Edit the source, not this page.

EKS Build

Generate complete, production-ready EKS infrastructure projects. All generated code is terraform apply-ready with zero manual fixups.

All generated Terraform code MUST follow the conventions from these companion skills:

  • terraform-skill -- resource block ordering, variable conventions, file organization, version constraints

When to Use

  • Generating EKS Terraform code from requirements (new project)
  • Scaffolding EKS infrastructure with air-gapped, proxy, or compliance constraints
  • Creating ArgoCD GitOps integration for EKS addon management
  • Implementing two-phase webhook ordering for eks-blueprints-addons
  • Configuring IRSA or Pod Identity for EKS workloads
  • Generating ArgoCD ApplicationSets for addon lifecycle
  • Creating Ralph loop validation configurations for EKS projects
  • Adding custom addons (Kyverno, Prisma, New Relic) to an EKS project

Don't Use

  • Designing EKS architecture from requirements (use eks-design)
  • General Terraform module development or testing (use terraform-skill)
  • EKS cluster reconnaissance or discovery (use eks-recon)
  • EKS operational best practices reference (use eks-best-practices)

Internet Search Requirements

This skill generates code from scratch every time. Always search the internet for:

  • Latest versions of all addons, Helm charts, and Terraform modules before generating code -- never use hardcoded versions from cached knowledge
  • Latest EKS best practices when requirements don't exactly match existing patterns
  • Addon-specific configuration for addons not fully covered in existing patterns
  • Breaking changes in new versions of addons or Terraform modules
  • Pattern 2b: KRO, ACK, and Crossplane -- these evolve rapidly. Always search for latest RGD schema syntax, supported CRDs, ProviderConfig format, and known limitations

The references in this skill cover the most common scenarios. For anything beyond that, research first, then adapt.

Pattern Decision Matrix

Key distinction: All patterns use Terraform for the EKS cluster and VPC infrastructure. The patterns differ in how K8s addons and AWS addon resources are managed after the cluster exists.

FactorPattern 1: Full TerraformPattern 2a: ArgoCD + TF AWSPattern 2b: ArgoCD + K8s-native AWS
K8s addon managementTerraform (Helm releases)K8s-native (ArgoCD ApplicationSets)K8s-native (ArgoCD ApplicationSets)
AWS resources for addonsTerraform (IRSA roles, S3 buckets)Terraform for AWS IaC (IRSA roles, S3)K8s-native controller (ACK, Crossplane)
Deployment workflowCI/CD runs terraform applyFull GitOps -- ArgoCD reconciles from GitFull GitOps -- ArgoCD reconciles from Git
Drift detectionterraform plan (manual/scheduled)ArgoCD self-heal + terraform plan for AWSArgoCD self-heal + controller reconciliation
Terraform surface areaEverything (cluster + addons + AWS)Cluster + AWS resources (addons in ArgoCD)Cluster only (addons + AWS both in K8s)
Air-gapped supportBest (Terraform controls all images)Good (ECR mirrors + ArgoCD)Good (same as 2a)
StatusValidatedValidatedValidated

For detailed architecture, trade-offs, and selection criteria see references/pattern-guide.md.

Pattern 2a vs 2b -- The Defining Difference

The ONLY difference between 2a and 2b is WHO creates the AWS resources that addons need (S3 buckets, IAM roles, policies, Route53 zones):

Pattern 2aPattern 2b
Who creates AWS resources?Terraform (module "eks_blueprints_addons", aws_iam_role)K8s-native controller (ACK CRDs, Crossplane MRs)
Terraform surface areaCluster + IAM + S3 + policiesCluster + controller bootstrap ONLY

Self-check: If generated code still uses Terraform for IRSA roles or S3 buckets for addons, that is Pattern 2a, NOT 2b.

Pattern 2b does NOT prescribe which controller. The choice is: ACK, Crossplane, or any K8s-native AWS controller. KRO provides orchestration on top (composing multiple resources into RGDs). Always search the internet for latest KRO docs before generating Pattern 2b code.

Workflow

Step 1: Gather Requirements

Accept requirements via interactive questionnaire or a requirements YAML file. If YAML is supplied, validate coverage (cluster, compute, networking, addons, auth, compliance, multi-tenancy) and prompt for gaps.

Step 2: Select Pattern

Based on requirements, select Pattern 1, 2a, or 2b from the decision matrix.

Step 3: Create Project Structure

projects/<project-name>/
├── design/ # Architecture docs (eks-design skill)
└── code/ # Terraform code (this skill)

Step 4: Generate Pattern Scaffold

Generate root Terraform files under projects/<project-name>/code/ following terraform-skill conventions.

  • Pattern 1 -- main.tf implements two-phase module architecture (Critical Rule 1). Phase 1: LBC + Gatekeeper with wait = true. Phase 2: all remaining addons with depends_on = [module.eks_addons_webhooks].
  • Pattern 2a/2b -- main.tf provisions EKS cluster and GitOps Bridge. For 2a include IRSA roles. For 2b include only controller bootstrap IAM.

Before generating, search the internet for latest stable versions of terraform-aws-modules/eks, terraform-aws-modules/vpc, aws-ia/eks-blueprints-addons, and all Helm charts. Pin exact versions. See references/version-matrix.md.

Step 5: Generate Required Modules

Under projects/<project-name>/code/modules/:

  • eks-cluster/ -- always. Wraps terraform-aws-modules/eks with opinionated defaults from references/baseline-defaults.md.
  • custom-addons/ -- if custom addons enabled. See references/addon-catalog.md.
  • kyverno-policies/ -- if Kyverno CIS benchmark enabled.
  • eks-tenants/ -- if multi-tenancy enabled.
  • eks-gitops-bridge/ -- Pattern 2 only.

Step 6-7: Configure and Customize

Edit configs/*.yaml based on requirements and apply customization patches per references/customization-guide.md:

  • Air-gapped, Enterprise proxy, Private registry, Compliance-strict

Step 8: Generate GitOps Artifacts (Pattern 2 only)

Generate the gitops/ tree with ArgoCD ApplicationSets. Use cluster ARN as destination server (Critical Rule 10). For OCI Helm charts, omit oci:// prefix in repoURL (Critical Rule 10b).

Step 9-10: README and Validate

Generate README and run scripts/validate_project.sh to verify structure, formatting, and critical configuration.

Critical Build Rules

Non-negotiable lessons from production deployments. Violating any causes deployment failures. See references/lessons-learned.md for full context.

1. Two-Phase Module Architecture (Pattern 1)

Split eks-blueprints-addons into TWO module calls:

  • Phase 1: LBC + Gatekeeper only, wait: true
  • Phase 2: All other addons, depends_on = [module.eks_addons_webhooks]

LBC and Gatekeeper register webhooks (failurePolicy: Fail) before pods are ready. Without two-phase, these webhooks block all other addon deployments.

2. before_compute: true

MANDATORY for vpc-cni and eks-pod-identity-agent. Without this, nodes fail with NodeCreationFailure: NetworkPluginNotReady.

3. Cluster-Autoscaler Version Match

Image tag MUST match EKS K8s minor version (e.g., v1.x.y for EKS 1.x).

4. LBC Requires Explicit vpcId

IMDS fallback fails with hop-limit restrictions. Always set vpcId in LBC Helm values.

5. Multus DISABLED

Thick-plugin pod-lookup race blocks ALL new pod creation. Never enable.

6. Version Pinning -- Always Search Internet

Never rely on module defaults or hardcoded versions. Search for every addon and module version before generating. See references/version-matrix.md.

7. Kyverno syncOptions (Pattern 2)

Use Replace=true, NOT ServerSideApply=true. SSA conflicts with ArgoCD selfHeal: true + --force.

8. Velero Configuration

Three-part config: S3 bucket (encrypted) + Auth (Pod Identity or IRSA) + Helm values (upgradeCRDs: false, pin kubectl.image.tag).

9. ACK/KRO Are Capabilities, Crossplane Is Not

ACK and KRO run as EKS Capabilities (no pods). Validate via CRDs, not kubectl pods. Crossplane runs as pods via Helm chart.

10. ArgoCD Destination Server (Pattern 2)

Use cluster ARN as destination server, NOT https://kubernetes.default.svc. AppProjects require sourceNamespaces: [argocd].

10b. ArgoCD OCI Helm Chart repoURL

Do NOT include oci:// prefix in repoURL. Use public.ecr.aws/karpenter with chart: karpenter.

10c. S3 Bucket Persistence in Destroy/Apply Cycles (Pattern 2b)

S3 buckets persist after terraform destroy because the controller is deleted first. Add aws s3 rb s3://<bucket> --force || true as pre-destroy step.

11. Use HashiCorp Terraform, NOT OpenTofu

OpenTofu v1.10.6 has severe depends_on performance regression (30+ min plan vs < 2 min with Terraform v1.14.5). Require HashiCorp Terraform v1.14.5+.

12. ArgoCD EKS Capability Requires IAM Identity Center

The AWS-managed ArgoCD EKS Capability requires an existing IAM Identity Center instance. Self-managed ArgoCD can use any auth method.

Customization Rules

ConstraintTriggerGuide
Air-gappednetwork.air_gapped = truecustomization-guide.md S1
Enterprise proxynetwork.proxy.enabled = truecustomization-guide.md S2
Private registryregistry.type = "private"customization-guide.md S3
Compliance-strictSecurity posture requirementcustomization-guide.md S4

Output Structure

Pattern 1

projects/<project-name>/code/
├── main.tf # Two-phase module architecture
├── locals.tf, data.tf, providers.tf, variables.tf, outputs.tf, versions.tf
├── modules/
│ ├── eks-cluster/
│ ├── custom-addons/
│ └── kyverno-policies/
├── configs/
│ ├── cluster.yaml, compute.yaml, addons.yaml, backend.hcl
└── validation-checklist.md

Pattern 2

projects/<project-name>/code/
├── main.tf # Cluster + GitOps Bridge
├── locals.tf, data.tf, providers.tf, variables.tf, outputs.tf, versions.tf
├── modules/
│ ├── eks-cluster/
│ ├── eks-gitops-bridge/
│ └── custom-addons/
├── configs/
│ ├── cluster.yaml, compute.yaml, addons.yaml, backend.hcl
├── gitops/
│ ├── addons/applicationset.yaml
│ ├── custom-addons/applicationset.yaml
│ ├── bootstrap/argocd-projects.yaml
│ └── tenants/applicationset.yaml
└── validation-checklist.md

Deployment

Pattern 1:

cd projects/<project-name>/code
terraform init -backend-config=configs/backend.hcl
terraform plan && terraform apply

Pattern 2:

cd projects/<project-name>/code
terraform init -backend-config=configs/backend.hcl
terraform plan && terraform apply
aws eks update-kubeconfig --name <CLUSTER_NAME> --region <REGION>
kubectl apply -f gitops/bootstrap/argocd-projects.yaml
kubectl apply -f gitops/addons/applicationset.yaml
kubectl apply -f gitops/custom-addons/applicationset.yaml

Expected Timing (HashiCorp Terraform v1.14.5+)

StepPattern 1Pattern 2
terraform plan< 2 min< 1 min
terraform apply~18 min~16 min
ArgoCD syncN/A~5 min
terraform destroy~10 min~15 min

Completion Checklist

Every item must be done before handoff:

  • Internet search for latest versions (not from cached knowledge)
  • Pattern selected and confirmed
  • Project structure created with pattern scaffold
  • Required modules generated
  • Configuration customized with project-specific values
  • Customization patches applied (if applicable)
  • GitOps artifacts generated (Pattern 2 only)
  • terraform fmt -check passes
  • before_compute: true set for vpc-cni and eks-pod-identity-agent
  • Two-phase module architecture present (Pattern 1)
  • All versions explicitly pinned
  • Multus NOT enabled
  • validation-checklist.md generated

References

Read these as needed based on the task at hand:

  • Baseline Defaults -- Read when generating cluster, compute, networking, or security configuration. Covers all default values applied to every project.
  • Pattern Guide -- Read when selecting between patterns or implementing pattern-specific architecture.
  • Customization Guide -- Read when applying air-gapped, proxy, private registry, or compliance constraints.
  • Addon Catalog -- Read when configuring specific addons or adding custom addons. Covers all 29+ supported addons.
  • Lessons Learned -- Read when troubleshooting deployment failures or understanding why a Critical Build Rule exists.
  • Version Matrix -- Read when looking up authoritative version sources for addons and Terraform modules.
  • Checkov Config -- Read when setting up security scanning for generated Terraform code.