This page is generated from skills/eks-design/references/architecture-validation.md. Edit the source, not this page.

Architecture Validation Guide

Comprehensive validation of EKS architecture completeness, integration feasibility, and technical readiness before handoff to build.

Validation Framework

Step 1: Requirements Coverage Assessment

Functional requirements:

  1. Extract ALL requirements from available inputs — every phase, every answer
  2. Map each requirement to a specific section in system-architecture.md
  3. Verify every requirement has both a narrative explanation AND an architectural solution
  4. Document gaps — any requirement without a corresponding section is a FAIL

CRITICAL — Comprehensive coverage check: The system-architecture.md must address ALL of the following requirement areas. A design that only covers one domain (e.g., only security) while ignoring compute, networking, observability, etc. is INCOMPLETE and must score 0/25 on Requirements Coverage regardless of how thorough the single domain is.

EKS-specific requirements coverage (ALL mandatory unless marked conditional):

| # | Requirement Area | Architectural Solution | Mandatory? |
|---|---|---|---|
| 1 | Cluster creation | EKS cluster module, version, endpoint access | Yes |
| 2 | Node management | Compute strategy (Karpenter/MNG/Auto Mode), instance types, AMI | Yes |
| 3 | Networking | VPC CNI mode, subnets, ingress, DNS, IP planning | Yes |
| 4 | Security | IAM model, PSA, encryption, secrets, network policies | Yes |
| 5 | Addon deployment | Pattern selection (1/2a/2b), addon list with versions | Yes |
| 6 | Observability | Metrics, logs, traces stack decisions | Yes |
| 7 | Upgrades | Upgrade strategy, sequence, PDBs, disruption budgets | Yes |
| 8 | Cost & scalability | Cost strategies (Graviton, Spot, right-sizing), scaling guidance | Yes |
| 9 | DR & backup | Backup tiers, recovery scenarios, availability design | Yes |
| 10 | Multi-tenancy | Namespace isolation, RBAC, quotas, onboarding, cost attribution | If multi-tenant |
| 11 | Air-gapped | VPC endpoints, ECR pull-through, containerd mirrors | If air-gapped |
| 12 | Proxy | HTTP_PROXY injection, NO_PROXY list | If proxy required |
| 13 | Private registry | Image overrides, ImagePullSecrets | If private registry |
| 14 | Compliance | Policy engine, benchmark mapping, audit logging | If compliance required |
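
Because rows 10-14 are conditional, the set of applicable areas differs per engagement. The following is a minimal sketch of how that set could be derived. The flag names (`multi_tenant`, `air_gapped`, and so on) are hypothetical and not part of this guide.

```python
# Areas from the table above that apply to every design.
MANDATORY_AREAS = [
    "Cluster creation", "Node management", "Networking", "Security",
    "Addon deployment", "Observability", "Upgrades",
    "Cost & scalability", "DR & backup",
]

# Conditional areas keyed by a hypothetical flag name (illustrative only).
CONDITIONAL_AREAS = {
    "multi_tenant": "Multi-tenancy",
    "air_gapped": "Air-gapped",
    "proxy_required": "Proxy",
    "private_registry": "Private registry",
    "compliance_required": "Compliance",
}


def applicable_areas(**flags: bool) -> list[str]:
    """Return mandatory areas plus any conditional areas whose flag is set."""
    unknown = set(flags) - set(CONDITIONAL_AREAS)
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    extra = [area for flag, area in CONDITIONAL_AREAS.items() if flags.get(flag)]
    return MANDATORY_AREAS + extra
```

For example, `applicable_areas(air_gapped=True)` yields the nine mandatory areas plus "Air-gapped", which then becomes the denominator for coverage percentages.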

Scoring rules for Requirements Coverage (/25):

  • 0/25: Design only covers 1-2 areas (e.g., only security) — automatic fail
  • 5-10/25: Design covers <50% of applicable areas
  • 11-15/25: Design covers 50-74% of applicable areas
  • 16-20/25: Design covers 75-94% of applicable areas, some sections thin
  • 21-25/25: Design covers 95%+ of applicable areas with narrative + tables
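
The bands above can be sketched as a lookup that returns the permitted point range. This is an illustration only: the function name is hypothetical, treating a value that sits exactly on a boundary (such as 75%) as belonging to the higher band is an assumption, and the final score within a band remains a reviewer judgment.

```python
def coverage_band(covered: int, applicable: int) -> tuple[int, int]:
    """Map covered/applicable requirement areas to a (min, max) point band out of 25."""
    if applicable <= 0:
        raise ValueError("applicable must be positive")
    if covered <= 2:
        return (0, 0)  # only 1-2 areas covered: automatic fail
    pct = 100 * covered / applicable
    if pct < 50:
        return (5, 10)
    if pct < 75:
        return (11, 15)
    if pct < 95:
        return (16, 20)  # assumption: boundary values fall in the higher band
    return (21, 25)
```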

Coverage thresholds:

  • 95%+ requirements coverage to proceed
  • 100% critical requirements coverage (no gaps in security, networking, compute)
  • Every section must have narrative prose explaining WHY, not just configuration tables
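
The two numeric thresholds can be checked together. The sketch below assumes requirements are tracked by identifier. The `R1`-style identifiers and the function name are hypothetical, and the narrative-prose threshold is not modeled because it needs human review.

```python
def may_proceed(covered: set[str], all_reqs: set[str], critical: set[str]) -> bool:
    """Apply both numeric thresholds: 95%+ overall and 100% of critical requirements."""
    if not all_reqs:
        raise ValueError("all_reqs must not be empty")
    overall = len(covered & all_reqs) / len(all_reqs)
    return overall >= 0.95 and critical <= covered


# Hypothetical inventory of 40 requirements, two of them critical.
all_reqs = {f"R{i}" for i in range(1, 41)}
critical = {"R1", "R2"}
print(may_proceed(all_reqs - {"R40"}, all_reqs, critical))  # one non-critical gap
print(may_proceed(all_reqs - {"R1"}, all_reqs, critical))   # one critical gap
```

One consequence of the 95% bar: with fewer than 20 requirements in total, a single gap already drops coverage below the threshold.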

Step 2: Component Integration Validation

Interface compatibility:

| Source -> Target | Protocol | Auth | Data Format | Status |
|---|---|---|---|---|
| ALB -> EKS pods | HTTP/HTTPS | | JSON/HTML | |
| EKS -> ECR | HTTPS | IRSA/PI | Container images | |
| EKS -> S3 | HTTPS | IRSA/PI | Objects | |
| ArgoCD -> Git | HTTPS/SSH | Token/Key | YAML manifests | |
| Karpenter -> EC2 | AWS API | IRSA/PI | Instance lifecycle | |
| External Secrets -> SM | AWS API | IRSA/PI | Secret values | |

Data flow validation:

  1. Primary flows: Request -> ALB -> Pod -> Response
  2. Addon flows: ArgoCD sync, Karpenter provisioning, External Secrets sync
  3. Security flows: Authentication, authorization, audit logging
  4. Monitoring flows: Metrics collection, log aggregation, alerting
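
Each documented flow can be cross-checked against the interface table so that no hop is left without a defined interface. This is a rough sketch. The pairs are transcribed from the table above, while the function name and the `Client` node are hypothetical.

```python
# Interfaces from the compatibility table, as (source, target) pairs.
INTERFACES = {
    ("ALB", "EKS pods"),
    ("EKS", "ECR"),
    ("EKS", "S3"),
    ("ArgoCD", "Git"),
    ("Karpenter", "EC2"),
    ("External Secrets", "SM"),
}


def undocumented_hops(
    flow: list[str], interfaces: set[tuple[str, str]]
) -> list[tuple[str, str]]:
    """Return consecutive (source, target) hops in a flow that lack an interface entry."""
    hops = list(zip(flow, flow[1:]))
    return [hop for hop in hops if hop not in interfaces]


# The Client -> ALB hop has no row in the table, so it is reported as a gap.
print(undocumented_hops(["Client", "ALB", "EKS pods"], INTERFACES))
```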

Step 3: AWS Service Limits Assessment

Service inventory: List all AWS services used in the architecture.

| Service | Usage | Criticality | Default Limit | Expected Usage | Risk |
|---|---|---|---|---|---|
| EKS | Cluster | High | 100 clusters/region | 1 | Low |
| EC2 | Nodes | High | Varies by type | [count] | [assess] |
| ENI | Pod networking | High | Varies by instance | [count] | [assess] |
| ALB/NLB | Ingress | High | 50/region | [count] | Low |
| ECR | Images | Medium | 10,000 repos | [count] | Low |
| S3 | State, backups | Medium | Unlimited | [count] | Low |
| IAM Roles | IRSA/PI | Medium | 1,000/account | [count] | [assess] |

Risk levels:

  • Low: <50% of default limit
  • Medium: 50-80% of default limit — request increase proactively
  • High: >80% of default limit — requires mitigation before deployment
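
The risk levels reduce to a utilization check against the default limit. This is a minimal sketch with a hypothetical function name. Per the thresholds above, exactly 80% counts as Medium, since High requires more than 80%.

```python
def limit_risk(expected: float, limit: float) -> str:
    """Classify expected usage against a service's default limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    utilization = expected / limit
    if utilization > 0.80:
        return "High"    # requires mitigation before deployment
    if utilization >= 0.50:
        return "Medium"  # request an increase proactively
    return "Low"
```

For instance, one cluster against the 100 clusters/region default is `limit_risk(1, 100)`, which is Low and matches the EKS row in the table.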

EKS-specific limits to check:

  • Pods per node (ENI-based, check instance type)
  • Managed node groups per cluster (30 default)
  • Fargate profiles per cluster (10 default)
  • EKS addons per cluster
  • Security groups per ENI (5 default)
  • Available IP addresses per subnet (most relevant with prefix delegation, where the VPC CNI allocates /28 prefixes and needs contiguous free blocks)
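
For the pods-per-node item, the commonly cited ceiling for the VPC CNI in secondary-IP mode can be sketched as below. The function name is hypothetical. The inputs 3 and 10 are illustrative and correspond to the published ENI figures for m5.large, so verify them against current EC2 documentation for the chosen instance type. Prefix delegation changes the calculation and is not modeled here.

```python
def max_pods_secondary_ip(enis: int, ipv4_per_eni: int) -> int:
    """Pod ceiling for the VPC CNI in secondary-IP mode (no prefix delegation).

    Each ENI keeps one address for itself, so only the remainder is available
    to pods. The +2 is commonly explained as covering host-network pods
    (aws-node and kube-proxy), which do not consume a pod IP.
    """
    if enis < 1 or ipv4_per_eni < 1:
        raise ValueError("enis and ipv4_per_eni must be at least 1")
    return enis * (ipv4_per_eni - 1) + 2


# 3 ENIs with 10 IPv4 addresses each: 3 * 9 + 2
print(max_pods_secondary_ip(3, 10))  # 29
```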

Step 4: Technical Feasibility Assessment

Technology validation:

| Technology | Maturity | Team Expertise | Risk |
|---|---|---|---|
| EKS [version] | GA | [assess] | |
| Karpenter [version] | GA | [assess] | |
| ArgoCD [version] | GA | [assess] | |
| Terraform [version] | GA | [assess] | |
| [each addon] | | | |

EKS-specific feasibility checks:

  • Selected EKS version is currently supported
  • Selected instance types are available in target region
  • Selected addons are compatible with EKS version
  • Karpenter version is compatible with EKS version
  • VPC has sufficient IP space for nodes + pods (especially with prefix delegation)
  • If air-gapped: all required VPC endpoints are available in target region
  • If Graviton: all selected addons have arm64 images
  • If GPU: selected GPU instance types are available in target AZs
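
The IP-space check can be approximated with the standard library. This is a rough sketch with hypothetical function names. The 20% headroom default is an arbitrary assumption, and the check ignores prefix-delegation fragmentation and CNI warm pools, so treat a pass as necessary rather than sufficient.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_ips(cidr: str) -> int:
    """Number of assignable IPv4 addresses in one subnet."""
    total = ipaddress.ip_network(cidr).num_addresses
    return max(total - AWS_RESERVED_PER_SUBNET, 0)


def has_ip_headroom(
    subnet_cidrs: list[str], nodes: int, pods: int, headroom: float = 0.2
) -> bool:
    """True if nodes + pods, plus a headroom fraction, fit in the subnets."""
    required = (nodes + pods) * (1 + headroom)
    return required <= sum(usable_ips(c) for c in subnet_cidrs)
```

For example, a single /24 offers 251 usable addresses, so 10 nodes and 300 pods do not fit in it, while two /24 subnets cover the same load with room to spare.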

Step 5: Documentation Completeness

Required documents checklist:

  • system-architecture.md — complete with all sections filled
  • ADRs — one per required category (minimum 6)
  • security-architecture.md — IAM, pod security, encryption, secrets, audit
  • Mermaid diagrams — cluster topology, addon dependencies (minimum 2)
  • Diagrams rendered to high-res PNG (4x scale, white background) in diagrams/ folder
  • If docx/pptx generated: all rendered PNGs embedded in the documents (not just Mermaid code blocks)

ADR quality checklist:

  • Every ADR has at least 2 alternatives with pros/cons
  • Every ADR has specific rationale (not generic "best practice")
  • Every ADR has consequences (positive and negative)
  • Every ADR has research sources
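
Most of the ADR checklist is mechanical and could be linted before human review. The sketch below assumes a hypothetical in-memory `ADR` structure, and its field names are illustrative rather than a format this guide defines. Whether a rationale is specific rather than generic still needs a reviewer, so only its presence is checked.

```python
from dataclasses import dataclass, field


@dataclass
class ADR:
    """Hypothetical ADR record; field names are illustrative."""
    title: str
    alternatives: list[dict] = field(default_factory=list)  # {"name", "pros", "cons"}
    rationale: str = ""
    positive_consequences: list[str] = field(default_factory=list)
    negative_consequences: list[str] = field(default_factory=list)
    sources: list[str] = field(default_factory=list)


def adr_issues(adr: ADR) -> list[str]:
    """Return checklist violations for one ADR (empty list means it passes)."""
    issues = []
    complete = [a for a in adr.alternatives if a.get("pros") and a.get("cons")]
    if len(complete) < 2:
        issues.append("needs at least 2 alternatives with pros and cons")
    if not adr.rationale.strip():
        issues.append("missing rationale")
    if not (adr.positive_consequences and adr.negative_consequences):
        issues.append("needs positive and negative consequences")
    if not adr.sources:
        issues.append("missing research sources")
    return issues
```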

Scoring Matrix

Category Scoring

1. Requirements Coverage (25 points)

| Criteria | Points | Threshold |
|---|---|---|
| Functional requirements mapped | 10 | 95%+ = 10, 85-94% = 7, <85% = 4 |
| Non-functional requirements mapped | 10 | 95%+ = 10, 85-94% = 7, <85% = 4 |
| Constraint requirements addressed | 5 | All = 5, most = 3, gaps = 1 |

2. Component Integration (20 points)

| Criteria | Points | Threshold |
|---|---|---|
| Interfaces defined and compatible | 10 | All = 10, most = 7, gaps = 4 |
| Data flows documented | 5 | Complete = 5, partial = 3, missing = 1 |
| Integration patterns appropriate | 5 | All appropriate = 5, minor issues = 3 |

3. Service Limits (15 points)

| Criteria | Points | Threshold |
|---|---|---|
| All services identified | 5 | Complete = 5, most = 3, gaps = 1 |
| Limits analyzed | 5 | All = 5, high-risk only = 3, none = 0 |
| Mitigation for high-risk | 5 | All mitigated = 5, partial = 3, none = 0 |

4. Technical Feasibility (20 points)

| Criteria | Points | Threshold |
|---|---|---|
| Technology validation | 10 | All GA/validated = 10, some risk = 7, high risk = 4 |
| EKS-specific checks pass | 10 | All pass = 10, minor issues = 7, blockers = 0 |

5. Documentation Completeness (20 points)

| Criteria | Points | Threshold |
|---|---|---|
| Required documents present | 10 | All = 10, most = 7, major gaps = 4 |
| ADR quality | 5 | All pass checklist = 5, partial = 3, poor = 1 |
| Diagram quality | 5 | Mermaid in markdown + rendered PNGs in diagrams/ + embedded in docx/pptx = 5, Mermaid only (no PNGs) = 3, missing = 0 |

Overall Score

| Score | Status | Action |
|---|---|---|
| 90-100 | Excellent | Proceed to build |
| 85-89 | Good | Proceed, minor improvements optional |
| 70-84 | Conditional | Address identified issues before proceeding |
| <70 | Failed | Significant rework needed |

Minimum to proceed: 85/100 with no critical failures in any category.
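
Putting the category maxima and the status bands together, the overall decision can be sketched as follows. The function name and the `critical_failures` counter are hypothetical. How a "critical failure" is detected is left to the reviewer.

```python
# Category maxima from the scoring matrix (sums to 100).
CATEGORY_MAX = {
    "Requirements Coverage": 25,
    "Component Integration": 20,
    "Service Limits": 15,
    "Technical Feasibility": 20,
    "Documentation Completeness": 20,
}


def overall_status(
    scores: dict[str, int], critical_failures: int = 0
) -> tuple[int, str, bool]:
    """Return (total, status, may_proceed) for a full set of category scores."""
    for name, maximum in CATEGORY_MAX.items():
        if not 0 <= scores[name] <= maximum:
            raise ValueError(f"{name} must be between 0 and {maximum}")
    total = sum(scores[name] for name in CATEGORY_MAX)
    if total >= 90:
        status = "Excellent"
    elif total >= 85:
        status = "Good"
    elif total >= 70:
        status = "Conditional"
    else:
        status = "Failed"
    # Proceeding needs both the 85-point minimum and no critical failures.
    return total, status, total >= 85 and critical_failures == 0
```

A design can therefore score 90 or more and still be blocked if any category carries a critical failure.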

Validation Report Format

# Architecture Integration Validation Report

## Executive Summary

- **Overall Score**: X/100
- **Status**: [PASSED / CONDITIONAL / FAILED]
- **Key Findings**: [3-5 bullet points]
- **Recommendation**: [Proceed / Address issues / Rework]

## Requirements Coverage: X/25

[Details per criterion]

## Component Integration: X/20

[Details per criterion]

## Service Limits: X/15

[Details per criterion]

## Technical Feasibility: X/20

[Details per criterion]

## Documentation Completeness: X/20

[Details per criterion]

## Critical Issues

[List any blocking issues]

## Recommendations

[Prioritized improvement list]

## Next Steps

[Clear actions to address gaps or proceed to quality review]