RDS Operation Review — AWS DevOps Agent Skill¶
A comprehensive Amazon RDS and Aurora operational review skill for AWS DevOps Agent. Conducts best-practices assessments aligned with the Amazon RDS Best Practices Guide, the Aurora Best Practices Guide, and the AWS Well-Architected Framework. Generates a shareable report artifact per resource.
What It Does¶
When activated via Chat, this skill instructs the DevOps Agent to:
- Discover RDS instances and Aurora clusters in the configured account/regions.
- Collect database configuration, parameter groups, security groups, subnet groups, snapshots, and events.
- Collect 7-day historical CloudWatch metrics, RDS log group entries, and 14-day RDS events.
- Pull 3-month Cost Explorer data and proportionally estimate per-database monthly cost.
- Analyze against five Well-Architected pillars (Security, Reliability, Performance, Cost Optimization, Operational Excellence) plus engine-specific guidance.
- Generate a shareable report artifact per resource, named
rds-review-<resource>-<YYYY-MM-DD>.md.
All data is gathered through native AWS APIs (rds, cloudwatch, logs, ec2, kms, costexplorer, applicationautoscaling, pi). The skill does not require Kubernetes (k8s), EKS, or Dante MCP servers.
Agent Types¶
This skill is intended for the following agent types (selected in the Operator Web App at upload time):
- On-demand — conversational invocation in Chat ("review my Aurora cluster", "RDS health check").
- Evaluation — proactive operational improvement recommendations.
Select Generic instead if you want the skill available to all agent types.
Prerequisites¶
1. An AWS DevOps Agent Space with the target AWS account¶
You need an existing Agent Space with the target AWS account configured as a cloud source.
2. IAM permissions for the DevOps Agent's primary cloud-source role¶
The Agent Space's IAM role must have read access to RDS, Aurora, CloudWatch, Logs, EC2, KMS, and Cost Explorer APIs. The AWS managed policy AWSDevOpsAgent...ReadOnlyAccess (or your custom equivalent) typically covers all of the following — verify in your account before running the review:
rds:Describe*,rds:ListTagsForResource,rds:DownloadDBLogFilePortionpi:DescribeDimensionKeys,pi:GetResourceMetrics(Performance Insights — only if PI is enabled on the target databases)cloudwatch:GetMetricData,cloudwatch:GetMetricStatistics,cloudwatch:DescribeAlarms,cloudwatch:DescribeAlarmsForMetriclogs:DescribeLogGroups,logs:DescribeLogStreams,logs:FilterLogEvents,logs:GetLogEventsec2:DescribeSecurityGroups,ec2:DescribeSubnets,ec2:DescribeVpcskms:DescribeKey,kms:ListAliasesapplication-autoscaling:DescribeScalableTargets,application-autoscaling:DescribeScalingPoliciesce:GetCostAndUsage,ce:GetReservationCoverage,ce:GetReservationUtilizationtag:GetResources(optional — for cross-service tag reporting)
The skill operates entirely in read-only mode: it never calls Modify*, Reboot*, Restore*, Create*, or Delete* RDS APIs.
3. Performance Insights (recommended)¶
For richer query-level analysis, enable Performance Insights on databases you want to review:
- RDS console → Modify the instance → Performance Insights → enable (free 7-day retention by default).
- Or
aws rds modify-db-instance --db-instance-identifier <id> --enable-performance-insights --apply-immediately.
Without Performance Insights, the skill still produces a complete report from CloudWatch metrics, logs, and configuration data — it just won't include db.load.avg and top-SQL breakdowns.
4. CloudWatch Logs export (recommended)¶
For log-pattern analysis in Step 6, enable CloudWatch Logs export for the engine logs you care about:
- MySQL/MariaDB:
audit, error, general, slowquery - PostgreSQL:
postgresql, upgrade - Oracle:
alert, audit, listener, trace - SQL Server:
agent, error
RDS console → Modify the instance → Log exports → select log types. The skill reports "log exports disabled" as a finding when this is missing, but cannot retrieve historical log events without it.
5. (Conditional) Network reachability for databases in private VPCs¶
The rds, cloudwatch, logs, and other AWS APIs the skill calls are regional control-plane APIs — they're reached over the public AWS API endpoints regardless of whether the database itself sits in a private VPC. There is no need for a private connection to run the operational review against a database with a private endpoint, because the skill only reads metadata via control-plane APIs (it never connects to the database engine on port 3306 / 5432 / 1521 / 1433).
If your account requires all AWS API traffic to traverse VPC endpoints (org-level guardrail), ensure the following interface VPC endpoints are reachable from the agent's source path: com.amazonaws.<region>.rds, com.amazonaws.<region>.monitoring (CloudWatch), com.amazonaws.<region>.logs, com.amazonaws.<region>.ec2, com.amazonaws.<region>.kms. Cost Explorer is global (ce.us-east-1.amazonaws.com) and does not have a VPC endpoint.
If you separately want the agent to execute SQL inside a private database (out of scope for this skill — but a related capability), use the DevOps Agent private connection mechanism with an MCP server that fronts the database.
Uploading to AWS DevOps Agent¶
Reference: Uploading a skill
1. Package the skill¶
From the skills/ directory in this repo:
The resulting rds-operation-review.zip contains:
rds-operation-review/
├── SKILL.md # frontmatter + skill instructions (required)
└── references/
├── best-practices-checklist.md
└── metrics-thresholds.md
evals/ is excluded from the upload to keep the zip small (it's only used for offline evaluation).
Constraints (enforced at upload time):
- Total zip size ≤ 6 MB.
SKILL.mdis required and must includenameanddescriptionfrontmatter.- A
scripts/directory is not allowed — uploads containing scripts are rejected.
2. Upload via the Operator Web App¶
- Navigate to the Skills page in your Agent Space Operator Web App.
- Click Add skill → Upload skill.
- Drag and drop
rds-operation-review.zip(or browse to it). - Select agent types: On-demand and Evaluation (or leave Generic to make it available to all agent types).
- Review the validation results.
- Click Upload.
3. (Optional) Connect additional observability sources¶
For richer analysis, connect your observability tools to the Agent Space:
| Tool | Setup Guide |
|---|---|
| CloudWatch | Built-in (no setup needed) |
| Datadog | Connecting Datadog |
| Dynatrace | Connecting Dynatrace |
| New Relic | Connecting New Relic |
| Splunk | Connecting Splunk |
| Grafana | Connecting Grafana |
| Custom MCP | Connecting MCP Servers |
Usage¶
In the DevOps Agent Chat, use natural language:
- "Run an RDS operational review for all databases."
- "Review my Aurora cluster
prod-aurorainus-east-1for best practices." - "Audit RDS security and cost optimization across all regions."
- "Generate an RDS best-practices report for instance
analytics-pg." - "ORR for our production databases."
The agent will:
- Collect all data automatically (no prompts for confirmation).
- Use only AWS APIs — no Kubernetes or external scripts.
- Generate a report artifact per resource, named
rds-review-<resource-name>-<YYYY-MM-DD>.md.
Skill Contents¶
rds-operation-review/
├── SKILL.md # main skill instructions (with frontmatter)
├── README.md # this file
├── references/
│ ├── best-practices-checklist.md # checklist mapped to RDS / Aurora best practices
│ └── metrics-thresholds.md # CloudWatch metric thresholds & severity rules
└── evals/ # evaluation data (not included in upload zip)
Best-Practices Sections Covered¶
| # | Pillar | Reference |
|---|---|---|
| 1 | Security (Network, Encryption, IAM auth, Audit logging) | RDS security, Encryption |
| 2 | Reliability (Multi-AZ, Backups, DR, Replication) | Multi-AZ, Aurora Global Database |
| 3 | Performance (CPU, Memory, IOPS, PI, Connection mgmt) | Monitoring RDS, Performance Insights |
| 4 | Cost Optimization (RIs, gp3, Graviton, idle DBs) | Cost-optimized RDS |
| 5 | Operational Excellence (Alarms, Events, Maintenance, Tagging) | RDS events, Well-Architected Operational Excellence |
| Engine-specific | MySQL / PostgreSQL / Oracle / SQL Server / Aurora tuning | linked in SKILL.md |
Severity Definitions¶
| Severity | Definition | SLA |
|---|---|---|
| CRITICAL | Immediate risk to availability, security, or data integrity | 24–48 hours |
| HIGH | Significant gap that could lead to incidents | 1 week |
| MEDIUM | Notable improvement opportunity | 30 days |
| LOW | Minor optimization or hardening | When convenient |
| INFO | Observation, no action required | N/A |
License¶
Internal use.