EKS Operation Review — AWS DevOps Agent Skill
A comprehensive Amazon EKS operational review skill for AWS DevOps Agent. Conducts best-practices assessments aligned with the EKS Best Practices Guide and generates a shareable report artifact per cluster.
What It Does
When activated via Chat, this skill instructs the DevOps Agent to:
- Discover EKS clusters in the configured account/regions.
- Collect cluster configuration, K8s resources, node groups, add-ons, networking, security, and workloads. K8s API is preferred when reachable; AWS APIs are used as fallback.
- Collect 7-day historical CloudWatch metrics, control-plane logs, and CloudTrail events.
- Analyze against 12 EKS best-practices sections (Security, Reliability, Networking, Scalability, Cost Optimization, Karpenter, Cluster Upgrades, etc.).
- Generate a shareable report artifact per cluster, named eks-review-<cluster-name>-<YYYY-MM-DD>.md.
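The discovery and report-naming steps above can be approximated with the AWS CLI. This is a minimal sketch, not the agent's actual implementation: `discover_clusters` and `report_name` are hypothetical helper names, and it assumes the AWS CLI is installed with credentials for the target account.

```shell
# Sketch only: hypothetical helpers, not part of the skill itself.

# Print "<region> <cluster>" for every EKS cluster in the given regions.
discover_clusters() {
  for region in "$@"; do
    # With --output text, cluster names come back whitespace-separated.
    for cluster in $(aws eks list-clusters --region "$region" \
                       --query 'clusters[]' --output text); do
      printf '%s %s\n' "$region" "$cluster"
    done
  done
}

# Build the report file name: eks-review-<cluster-name>-<YYYY-MM-DD>.md
# The date defaults to today (UTC) when the second argument is omitted.
report_name() {
  cluster=$1
  day=${2:-$(date -u +%Y-%m-%d)}
  printf 'eks-review-%s-%s.md\n' "$cluster" "$day"
}
```

For example, `discover_clusters us-east-1 eu-west-1` lists one region/cluster pair per line, and `report_name prod 2025-01-15` prints `eks-review-prod-2025-01-15.md`.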
Agent Types
This skill is intended for the following agent types (selected in the Operator Web App at upload time):
- On-demand — conversational invocation in Chat ("review my EKS cluster", "EKS health check").
- Evaluation — proactive operational improvement recommendations.
Select Generic instead if you want the skill available to all agent types.
Prerequisites
1. An AWS DevOps Agent Space with the target AWS account
You need an existing Agent Space with the target AWS account configured as a cloud source.
2. Integrate the DevOps Agent with each EKS cluster
This grants the Agent Space's IAM role read-only Kubernetes API access via an EKS access entry. Repeat for each cluster you want to review.
Reference: AWS EKS access setup
a. Get the Agent Space IAM role ARN
In the AWS DevOps Agent console, open your Agent Space → Capabilities → Cloud → Primary Source → Edit. Copy the IAM role ARN.
b. Verify cluster authentication mode
In the Amazon EKS console, open the cluster → Access tab. The Authentication mode must include EKS API. If it doesn't, switch to a mode that does (note: this change cannot be reverted).
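The same check can be done from a terminal. The sketch below assumes the AWS CLI is configured for the cluster's account; the two function names are hypothetical, while `describe-cluster` and the `accessConfig.authenticationMode` field are standard EKS API surface.

```shell
# Succeeds (exit 0) when the authentication mode includes the EKS API.
# EKS defines three modes: CONFIG_MAP, API_AND_CONFIG_MAP, and API.
auth_mode_includes_api() {
  case "$1" in
    API|API_AND_CONFIG_MAP) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the authentication mode for a cluster.
# $1 = cluster name, $2 = region (both supplied by the caller).
cluster_auth_mode() {
  aws eks describe-cluster --name "$1" --region "$2" \
    --query 'cluster.accessConfig.authenticationMode' --output text
}
```

A typical call is `auth_mode_includes_api "$(cluster_auth_mode prod us-east-1)" || echo "switch authentication mode first"`. If a switch is needed, `aws eks update-cluster-config --name <cluster> --access-config authenticationMode=API_AND_CONFIG_MAP` is the CLI counterpart of the console change, and it is equally irreversible.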
c. Create the access entry
On the cluster's Access tab:
- Click Create access entry.
- IAM principal: paste the Agent Space IAM role ARN from step (a).
- Click Next.
- Access policy: select the AWS managed policy AmazonAIOpsAssistantPolicy.
- Access scope: choose Cluster (or specific Kubernetes namespaces if you want to limit visibility).
- Click Add Policy → Next → Create.
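The console steps above have a CLI counterpart, sketched here for scripting across many clusters. `grant_agent_access` is a hypothetical wrapper, and the policy ARN is an assumption built from the standard EKS access-policy ARN format; confirm it with `aws eks list-access-policies` before use.

```shell
# Assumed ARN, following the usual arn:aws:eks::aws:cluster-access-policy/<name> pattern.
POLICY_ARN="arn:aws:eks::aws:cluster-access-policy/AmazonAIOpsAssistantPolicy"

# $1 = cluster name, $2 = Agent Space IAM role ARN from step (a).
grant_agent_access() {
  cluster=$1
  role_arn=$2
  # Create the access entry, then attach the policy at cluster scope.
  aws eks create-access-entry \
    --cluster-name "$cluster" \
    --principal-arn "$role_arn" &&
  aws eks associate-access-policy \
    --cluster-name "$cluster" \
    --principal-arn "$role_arn" \
    --policy-arn "$POLICY_ARN" \
    --access-scope type=cluster
}
```

To limit visibility to specific namespaces, the access scope takes the form `type=namespace,namespaces=<ns1>,<ns2>` instead of `type=cluster`.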
d. Verify
In the Operator Web App Chat, ask: "list all pods in the default namespace on cluster <name>". If pods are returned, access is configured.
If the agent can't reach the cluster, check that the access entry uses the exact IAM role ARN shown in the Agent Space dialog and that AmazonAIOpsAssistantPolicy is attached.
3. (Conditional) Private connectivity for clusters with a private API endpoint
If the cluster's API server endpoint access is private only, the AWS DevOps Agent service can't reach the K8s API over the public network. You have two options:
Option A — enable public + private endpoint access
This is the simplest supported setup. In the EKS console, open the cluster → Networking → Manage networking and enable Public and private API server endpoint access. Then restrict the public endpoint to specific source ranges with public access CIDRs.
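Option A can also be applied from the CLI. The following is a sketch under stated assumptions: `enable_public_and_private` is a hypothetical wrapper, and the CIDR is a placeholder. This README does not list the source ranges the agent connects from, so the allowed CIDRs must be chosen so that the agent can still reach the endpoint.

```shell
# $1 = cluster name, $2 = a single CIDR allowed to reach the public endpoint.
enable_public_and_private() {
  cluster=$1
  cidr=$2   # placeholder; must cover every source that needs public access
  aws eks update-cluster-config \
    --name "$cluster" \
    --resources-vpc-config \
    "endpointPublicAccess=true,endpointPrivateAccess=true,publicAccessCidrs=$cidr"
}
```

The update is asynchronous; `aws eks describe-cluster --name <cluster> --query 'cluster.status'` shows when the cluster returns to `ACTIVE`.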
Option B — create a private connection from the Agent Space to your VPC
Use the AWS DevOps Agent private connection mechanism, which sets up an Amazon VPC Lattice resource gateway with ENIs in your VPC, so the agent can reach private host addresses without exposing them to the internet.
Reference: Connecting to privately hosted tools
Note on EKS coverage. The capability providers that currently bind to a private connection are GitHub, GitLab, MCP Server, and Grafana. EKS is not among them, so Option A remains the most reliable choice for fully private API endpoints today. The steps below describe the generic private-connection mechanism for completeness. Before relying on it in production, verify against the current docs that EKS is supported as a target.
Prerequisites for the private connection:
- An active Agent Space.
- A target service (here, the EKS private API endpoint) reachable at a private DNS name or IP from the chosen VPC, serving HTTPS with TLS 1.2+ on a known port (EKS API: TCP 443).
- 1–20 subnets in the VPC where the resource-gateway ENIs will live (multi-AZ recommended; one subnet per AZ).
- (Optional) Up to 5 security groups to attach to the ENIs. If omitted, a default SG scoped to the chosen ports is created.
- The cluster's API endpoint security group must allow inbound TCP 443 from the resource-gateway ENI security group (or the VPC CIDR).
- Verify the chosen subnets are not in any of the VPC Lattice unsupported AZs.
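The numeric limits in the prerequisites above (1 to 20 subnets, at most 5 security groups) are easy to check before creating the connection. This is an illustrative sketch; both function names are hypothetical and the inputs are comma-separated ID lists.

```shell
# Count comma-separated items; an empty string counts as zero.
count_items() {
  if [ -z "$1" ]; then
    echo 0
  else
    echo "$1" | tr ',' '\n' | wc -l | tr -d ' '
  fi
}

# $1 = comma-separated subnet IDs, $2 = comma-separated security group IDs (may be empty).
validate_private_connection() {
  subnets=$(count_items "$1")
  sgs=$(count_items "$2")
  if [ "$subnets" -lt 1 ] || [ "$subnets" -gt 20 ]; then
    echo "error: need 1-20 subnets, got $subnets"
    return 1
  fi
  if [ "$sgs" -gt 5 ]; then
    echo "error: at most 5 security groups, got $sgs"
    return 1
  fi
  echo "ok: $subnets subnet(s), $sgs security group(s)"
}
```

It only validates counts. Reachability, AZ support, and security group rules still need to be confirmed separately.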
Console steps:
- Open the AWS DevOps Agent console.
- Capability providers → Private connections → Create a new connection.
- Name: e.g. eks-private-api.
- VPC: the VPC routable to the cluster's private endpoint.
- Subnets: 1–20 subnets, multi-AZ.
- IP address type: IPv4 (typical for EKS).
- (Optional) Security groups: SGs that allow egress to TCP 443 of the cluster API endpoint.
- Port ranges: 443.
- Host address: the cluster's private API endpoint DNS name (from EKS console → cluster → Overview → API server endpoint, e.g. <id>.gr7.<region>.eks.amazonaws.com). Must be resolvable from the VPC.
- (Optional) Certificate public key: needed only if the endpoint uses a private CA. The EKS API server certificate is signed by a cluster-specific certificate authority (the certificateAuthority value returned by describe-cluster), so check the current docs for whether it must be supplied here.
- Create connection. Status CREATE_IN_PROGRESS → ACTIVE (up to ~10 min).
CLI equivalent:
aws devops-agent create-private-connection \
  --name eks-private-api \
  --mode '{
    "serviceManaged": {
      "hostAddress": "<cluster-id>.gr7.<region>.eks.amazonaws.com",
      "vpcId": "vpc-xxxxxxxxxxxxxxxxx",
      "subnetIds": ["subnet-aaa", "subnet-bbb"],
      "securityGroupIds": ["sg-xxxxxxxxxxxxxxxxx"],
      "portRanges": ["443"]
    }
  }'
aws devops-agent describe-private-connection --name eks-private-api
Troubleshooting:
- ENI security group must allow outbound TCP 443; the cluster's endpoint SG must allow inbound TCP 443 from the ENI SG (or VPC CIDR).
- Subnet route tables must reach the cluster's endpoint network.
- If CREATE_FAILED: check VPC Lattice quotas, subnet IP availability, and any SCPs blocking the service-linked role.
- If using a hub-and-spoke VPC Lattice topology you already manage, use the selfManaged mode with an existing resourceConfigurationId (see Advanced setup).
Uploading to AWS DevOps Agent
Reference: Uploading a skill
1. Package the skill
From the skills/ directory in this repo:
cd skills
zip -r eks-operation-review.zip eks-operation-review/ -i '*.md' '*.txt' '*.json' '*.yaml' '*.yml' '*.xml' '*.csv' '*.tsv' '*.html' '*.htm' '*.png' '*.jpg' '*.jpeg' '*.gif' '*.svg' '*.webp' '*.pdf' -x '*/.claude/*' '*/scripts/*' '*/README.md' '*/.skilleval.yaml' '*/.skilleval.yml' '*/CHANGELOG.md' '*/evals/*'
The resulting eks-operation-review.zip contains:
eks-operation-review/
├── SKILL.md # frontmatter + skill instructions (required)
└── references/
├── best-practices-checklist.md
└── metrics-thresholds.md
Constraints (enforced at upload time):
- Total zip size ≤ 6 MB.
- SKILL.md is required and must include name and description frontmatter.
- A scripts/ directory is not allowed; uploads containing scripts are rejected.
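The constraints above can be checked locally before uploading. This is a sketch of such a pre-flight check, not the service's own validation: the function names are hypothetical, "6 MB" is assumed to mean 6 * 1024 * 1024 bytes, and frontmatter is assumed to be the lines between the first two `---` markers.

```shell
# Assumption: the limit is counted in binary megabytes.
MAX_ZIP_BYTES=$((6 * 1024 * 1024))

# Succeeds when the file at $1 is within the size limit.
zip_size_ok() {
  size=$(wc -c < "$1" | tr -d ' ')
  [ "$size" -le "$MAX_ZIP_BYTES" ]
}

# Succeeds when the SKILL.md at $1 has name: and description: keys in its frontmatter.
frontmatter_ok() {
  # Keep only the lines between the first and second '---' markers.
  fm=$(awk '/^---[[:space:]]*$/ { n++; next } n == 1 { print } n >= 2 { exit }' "$1")
  echo "$fm" | grep -q '^name:' && echo "$fm" | grep -q '^description:'
}

# Succeeds when the skill directory at $1 has no scripts/ subdirectory.
no_scripts_dir() {
  [ ! -d "$1/scripts" ]
}
```

For example, `zip_size_ok eks-operation-review.zip && frontmatter_ok eks-operation-review/SKILL.md` returns success only when both checks pass. The packaging command already excludes `scripts/` from the zip, so `no_scripts_dir` matters mainly for zips built another way.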
2. Upload via the Operator Web App
- Navigate to the Skills page in your Agent Space Operator Web App.
- Click Add skill → Upload skill.
- Drag and drop eks-operation-review.zip (or browse to it).
- Select agent types: On-demand and Evaluation (or leave Generic to make it available to all agent types).
- Review the validation results.
- Click Upload.
3. (Optional) Connect additional observability sources
For richer analysis, connect your observability tools to the Agent Space:
| Tool | Setup Guide |
|---|---|
| CloudWatch | Built-in (no setup needed) |
| Datadog | Connecting Datadog |
| Dynatrace | Connecting Dynatrace |
| New Relic | Connecting New Relic |
| Splunk | Connecting Splunk |
| Grafana | Connecting Grafana |
| Custom MCP | Connecting MCP Servers |
Usage
In the DevOps Agent Chat, use natural language:
- "Run an EKS operational review for all clusters."
- "Review my EKS cluster prod in us-east-1 for best practices."
- "Audit EKS security and cost optimization."
- "Generate an EKS best-practices report for cluster genai-workshop."
The agent will:
- Collect all data automatically (no prompts for confirmation).
- Use the K8s API first when reachable; fall back to AWS APIs.
- Generate a report artifact per cluster, named eks-review-<cluster-name>-<YYYY-MM-DD>.md.
Skill Contents
eks-operation-review/
├── SKILL.md # main skill instructions (with frontmatter)
├── README.md # this file
├── references/
│ ├── best-practices-checklist.md # checklist mapped to EKS Best Practices Guide
│ └── metrics-thresholds.md # CloudWatch metric thresholds & severity rules
└── evals/ # evaluation data (not included in upload zip)
evals/ holds skill evaluation data and is not needed at upload time. The packaging command above already excludes it with -x '*/evals/*' (along with README.md and CHANGELOG.md), so no extra step is required to keep it out of the zip.
Best-Practices Sections Covered
| # | Section | Reference |
|---|---|---|
| 1 | Security (IAM, Pod Security, Network, Encryption, etc.) | security.html |
| 2 | Reliability (Applications, Control Plane, Data Plane) | reliability.html |
| 3 | Karpenter | karpenter.html |
| 4 | Cluster Autoscaler | cas.html |
| 5 | EKS Auto Mode | automode.html |
| 6 | Networking | networking.html |
| 7 | Scalability + Data Plane Scaling | scalability.html, scale-data-plane.html |
| 8 | Cluster Upgrades | cluster-upgrades.html |
| 9 | Cost Optimization | cost-opt.html |
| 10–12 | Windows / Hybrid / AI-ML (conditional) | — |
Severity Definitions
| Severity | Definition | SLA |
|---|---|---|
| CRITICAL | Immediate risk to availability, security, or data integrity | 24–48 hours |
| HIGH | Significant gap that could lead to incidents | 1 week |
| MEDIUM | Notable improvement opportunity | 30 days |
| LOW | Minor optimization or hardening | When convenient |
| INFO | Observation, no action required | N/A |
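When post-processing report findings in a script, the table above reduces to a simple lookup. The sketch below is illustrative only; `sla_for` is a hypothetical helper and is not part of the skill.

```shell
# Map a finding's severity to the remediation window from the severity table.
sla_for() {
  case "$1" in
    CRITICAL) echo "24-48 hours" ;;
    HIGH)     echo "1 week" ;;
    MEDIUM)   echo "30 days" ;;
    LOW)      echo "When convenient" ;;
    INFO)     echo "N/A" ;;
    *)        echo "unknown severity: $1" >&2; return 1 ;;
  esac
}
```

Unknown values return a non-zero status, so a typo in a severity label is caught instead of being silently treated as low priority.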
License
Internal use.