RDS Operation Review — AWS DevOps Agent Skill¶

View on GitHub

A comprehensive Amazon RDS and Aurora operational review skill for AWS DevOps Agent. Conducts best-practices assessments aligned with the Amazon RDS Best Practices Guide, the Aurora Best Practices Guide, and the AWS Well-Architected Framework. Generates a shareable report artifact per resource.

What It Does¶

When activated via Chat, this skill instructs the DevOps Agent to:

Discover RDS instances and Aurora clusters in the configured account/regions.
Collect database configuration, parameter groups, security groups, subnet groups, snapshots, and events.
Collect 7-day historical CloudWatch metrics, RDS log group entries, and 14-day RDS events.
Pull 3-month Cost Explorer data and proportionally estimate per-database monthly cost.
Analyze against five Well-Architected pillars (Security, Reliability, Performance, Cost Optimization, Operational Excellence) plus engine-specific guidance.
Generate a shareable report artifact per resource, named rds-review-<resource>-<YYYY-MM-DD>.md.

All data is gathered through native AWS APIs (rds, cloudwatch, logs, ec2, kms, costexplorer, applicationautoscaling, pi). The skill does not require Kubernetes (k8s), EKS, or Dante MCP servers.

Agent Types¶

This skill is intended for the following agent types (selected in the Operator Web App at upload time):

On-demand — conversational invocation in Chat ("review my Aurora cluster", "RDS health check").
Evaluation — proactive operational improvement recommendations.

Select Generic instead if you want the skill available to all agent types.

Prerequisites¶

1. An AWS DevOps Agent Space with the target AWS account¶

You need an existing Agent Space with the target AWS account configured as a cloud source.

2. IAM permissions for the DevOps Agent's primary cloud-source role¶

The Agent Space's IAM role must have read access to RDS, Aurora, CloudWatch, Logs, EC2, KMS, and Cost Explorer APIs. The AWS managed policy AWSDevOpsAgent...ReadOnlyAccess (or your custom equivalent) typically covers all of the following — verify in your account before running the review:

rds:Describe*, rds:ListTagsForResource, rds:DownloadDBLogFilePortion
pi:DescribeDimensionKeys, pi:GetResourceMetrics (Performance Insights — only if PI is enabled on the target databases)
cloudwatch:GetMetricData, cloudwatch:GetMetricStatistics, cloudwatch:DescribeAlarms, cloudwatch:DescribeAlarmsForMetric
logs:DescribeLogGroups, logs:DescribeLogStreams, logs:FilterLogEvents, logs:GetLogEvents
ec2:DescribeSecurityGroups, ec2:DescribeSubnets, ec2:DescribeVpcs
kms:DescribeKey, kms:ListAliases
application-autoscaling:DescribeScalableTargets, application-autoscaling:DescribeScalingPolicies
ce:GetCostAndUsage, ce:GetReservationCoverage, ce:GetReservationUtilization
tag:GetResources (optional — for cross-service tag reporting)

The skill operates entirely in read-only mode: it never calls Modify*, Reboot*, Restore*, Create*, or Delete* RDS APIs.

3. Performance Insights (recommended)¶

For richer query-level analysis, enable Performance Insights on databases you want to review:

RDS console → Modify the instance → Performance Insights → enable (free 7-day retention by default).
Or aws rds modify-db-instance --db-instance-identifier <id> --enable-performance-insights --apply-immediately.

Without Performance Insights, the skill still produces a complete report from CloudWatch metrics, logs, and configuration data — it just won't include db.load.avg and top-SQL breakdowns.

4. CloudWatch Logs export (recommended)¶

For log-pattern analysis in Step 6, enable CloudWatch Logs export for the engine logs you care about:

MySQL/MariaDB: audit, error, general, slowquery
PostgreSQL: postgresql, upgrade
Oracle: alert, audit, listener, trace
SQL Server: agent, error

RDS console → Modify the instance → Log exports → select log types. The skill reports "log exports disabled" as a finding when this is missing, but cannot retrieve historical log events without it.

5. (Conditional) Network reachability for databases in private VPCs¶

The rds, cloudwatch, logs, and other AWS APIs the skill calls are regional control-plane APIs — they're reached over the public AWS API endpoints regardless of whether the database itself sits in a private VPC. There is no need for a private connection to run the operational review against a database with a private endpoint, because the skill only reads metadata via control-plane APIs (it never connects to the database engine on port 3306 / 5432 / 1521 / 1433).

If your account requires all AWS API traffic to traverse VPC endpoints (org-level guardrail), ensure the following interface VPC endpoints are reachable from the agent's source path: com.amazonaws.<region>.rds, com.amazonaws.<region>.monitoring (CloudWatch), com.amazonaws.<region>.logs, com.amazonaws.<region>.ec2, com.amazonaws.<region>.kms. Cost Explorer is global (ce.us-east-1.amazonaws.com) and does not have a VPC endpoint.

If you separately want the agent to execute SQL inside a private database (out of scope for this skill — but a related capability), use the DevOps Agent private connection mechanism with an MCP server that fronts the database.

Uploading to AWS DevOps Agent¶

Reference: Uploading a skill

1. Package the skill¶

From the skills/ directory in this repo:

cd skills
zip -r rds-operation-review.zip rds-operation-review/ -x 'rds-operation-review/evals/*'

The resulting rds-operation-review.zip contains:

rds-operation-review/
├── SKILL.md          # frontmatter + skill instructions (required)
└── references/
    ├── best-practices-checklist.md
    └── metrics-thresholds.md

evals/ is excluded from the upload to keep the zip small (it's only used for offline evaluation).

Constraints (enforced at upload time):

Total zip size ≤ 6 MB.
SKILL.md is required and must include name and description frontmatter.
A scripts/ directory is not allowed — uploads containing scripts are rejected.

2. Upload via the Operator Web App¶

Navigate to the Skills page in your Agent Space Operator Web App.
Click Add skill → Upload skill.
Drag and drop rds-operation-review.zip (or browse to it).
Select agent types: On-demand and Evaluation (or leave Generic to make it available to all agent types).
Review the validation results.
Click Upload.

3. (Optional) Connect additional observability sources¶

For richer analysis, connect your observability tools to the Agent Space:

Tool	Setup Guide
CloudWatch	Built-in (no setup needed)
Datadog	Connecting Datadog
Dynatrace	Connecting Dynatrace
New Relic	Connecting New Relic
Splunk	Connecting Splunk
Grafana	Connecting Grafana
Custom MCP	Connecting MCP Servers

Usage¶

In the DevOps Agent Chat, use natural language:

"Run an RDS operational review for all databases."
"Review my Aurora cluster prod-aurora in us-east-1 for best practices."
"Audit RDS security and cost optimization across all regions."
"Generate an RDS best-practices report for instance analytics-pg."
"ORR for our production databases."

The agent will:

Collect all data automatically (no prompts for confirmation).
Use only AWS APIs — no Kubernetes or external scripts.
Generate a report artifact per resource, named rds-review-<resource-name>-<YYYY-MM-DD>.md.

Skill Contents¶

rds-operation-review/
├── SKILL.md                           # main skill instructions (with frontmatter)
├── README.md                          # this file
├── references/
│   ├── best-practices-checklist.md    # checklist mapped to RDS / Aurora best practices
│   └── metrics-thresholds.md          # CloudWatch metric thresholds & severity rules
└── evals/                             # evaluation data (not included in upload zip)

Best-Practices Sections Covered¶

#	Pillar	Reference
1	Security (Network, Encryption, IAM auth, Audit logging)	RDS security, Encryption
2	Reliability (Multi-AZ, Backups, DR, Replication)	Multi-AZ, Aurora Global Database
3	Performance (CPU, Memory, IOPS, PI, Connection mgmt)	Monitoring RDS, Performance Insights
4	Cost Optimization (RIs, gp3, Graviton, idle DBs)	Cost-optimized RDS
5	Operational Excellence (Alarms, Events, Maintenance, Tagging)	RDS events, Well-Architected Operational Excellence
Engine-specific	MySQL / PostgreSQL / Oracle / SQL Server / Aurora tuning	linked in SKILL.md

Severity Definitions¶

Severity	Definition	SLA
CRITICAL	Immediate risk to availability, security, or data integrity	24–48 hours
HIGH	Significant gap that could lead to incidents	1 week
MEDIUM	Notable improvement opportunity	30 days
LOW	Minor optimization or hardening	When convenient
INFO	Observation, no action required	N/A

License¶

Internal use.