FinServ GenAI Risk Checks (FS-01 to FS-69)
This document is the complete reference for the Financial Services (FS-XX) GenAI security
checks derived from the AWS User Guide to Governance, Risk, and Compliance for Responsible AI
Adoption
(referred to throughout as “the Responsible AI GRC guide”). It combines the shared reference material
(severity rubric, guide traceability, upstream-overlap table, compliance mapping) with the full
set of check definitions, organised into three parts:
- Part 1 — Infrastructure & Resource Controls (FS-01 to FS-26): unbounded consumption,
excessive agency, supply chain, training-data poisoning, vector & embedding weaknesses.
- Part 2 — Guardrails & Content Safety (FS-27 to FS-46): non-compliant output,
misinformation, abusive/harmful output, biased output, sensitive information disclosure.
- Part 3 — Application-Layer Controls & Material Gaps (FS-47 to FS-69): hallucination,
prompt injection, improper output handling, off-topic output, out-of-date training data,
cross-category gap checks.
Of the 69 FS numbers, 64 ship as standalone checks; 5 (FS-17, FS-18, FS-19, FS-23, FS-64)
are merged into upstream SM/BR checks and appear here as upstream-extension notes. See
Relationship to upstream SM/BR/AC checks for the
consolidation table.
Each check includes how it is detected (the AWS API calls or configuration inspected) and
how a failure is remediated (the specific AWS actions to take). Severities follow a
documented Likelihood × Impact methodology — see the
FinServ Severity Methodology, with
authoritative per-finding assignments in the
FinServ Severity Register.
Table of Contents
Shared reference
Checks
About the source
The 69 FS checks are derived from the AWS User Guide to Governance, Risk, and Compliance for
Responsible AI Adoption
(referred to throughout as “the Responsible AI GRC guide”).
Guide traceability
The Responsible AI GRC guide organizes AI-specific risks into 15 categories (§1.2.1 through
§1.2.15). Every check below is tagged with one of:
- [Guide §x.y.z] — mitigation is explicitly listed in that guide section’s “Mitigations or controls”
table or “Practical guidance” callout.
- [Guide §x.y.z, extension] — mitigation is consistent with the guide’s risk description but is
not verbatim in the guide; included because it is a widely-accepted AWS best practice for the
same risk. These are labelled so reviewers know the provenance.
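
The two tag forms above are regular enough to parse mechanically, for example when building a provenance report. The following is a minimal sketch; `GuideTag` and `parse_guide_tag` are hypothetical names introduced for illustration and are not part of the assessment code.

```python
import re
from typing import NamedTuple, Optional


class GuideTag(NamedTuple):
    section: str     # e.g. "1.2.11"
    extension: bool  # True when the tag carries the ", extension" suffix


# Matches "[Guide §1.2.11]" and "[Guide §1.2.11, extension]".
_TAG_RE = re.compile(r"\[Guide §(\d+(?:\.\d+)*)(, extension)?\]")


def parse_guide_tag(text: str) -> Optional[GuideTag]:
    """Return the first guide tag found in text, or None when there is none."""
    match = _TAG_RE.search(text)
    if match is None:
        return None
    return GuideTag(section=match.group(1), extension=match.group(2) is not None)
```

A reviewer could run this over each check's "Guide ref" field to list which checks are guide-verbatim and which are extensions.
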
Severity rubric
Severities follow a documented Likelihood × Impact methodology mapped to the AWS Security Hub
ASFF label set (Informational | Low | Medium | High; Critical is reserved, not used this
round). The full methodology, the 3×3 scoring matrix, the N/A disposition rules, and the
authoritative per-finding assignments are in
SECURITY_CHECKS_FINSERV_SEVERITY_METHODOLOGY.md
and SECURITY_CHECKS_FINSERV_SEVERITY_REGISTER.md.
| Severity | Criteria (ASFF-aligned) |
| --- | --- |
| High | Control whose absence can lead to direct regulatory breach, data exposure, large-scale financial loss, or full bypass of safety guardrails. |
| Medium | Control whose absence materially increases the likelihood or impact of a risk category but does not by itself produce a breach. |
| Low | Control that reduces residual risk or supports audit/observability but has alternative or compensating controls. |
| Informational | No actionable issue is asserted. Used for three dispositions: (1) NOT_APPLICABLE — the control’s resource type is absent (e.g., no Knowledge Bases, no guardrails); (2) ADVISORY — the control cannot be verified via AWS APIs and requires human review (finding name prefixed ADVISORY:); (3) checks awaiting manual verification. |
Disposition rules (how a finding’s severity is set): severity is a property of the control
(its Likelihood × Impact), applied to that control’s Passed and Failed rows alike. The N/A
family is fixed by disposition: NOT_APPLICABLE → Informational, ADVISORY → Informational,
COULD_NOT_ASSESS (access denied / unsupported region) → Low. The legacy “Advisory” tier in
earlier revisions of this document is reconciled to the Informational label + N/A status +
ADVISORY: name prefix.
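
The disposition rules can be expressed as a small lookup. This is a sketch under stated assumptions: the function names are hypothetical, and the status strings (`Passed`, `Failed`, `NOT_APPLICABLE`, `ADVISORY`, `COULD_NOT_ASSESS`) are taken from the text above rather than from the assessment code.

```python
ASFF_LABELS = ("Informational", "Low", "Medium", "High")

# Fixed severities for the N/A family, as stated in the disposition rules.
NA_SEVERITY = {
    "NOT_APPLICABLE": "Informational",
    "ADVISORY": "Informational",
    "COULD_NOT_ASSESS": "Low",
}


def finding_severity(status: str, control_severity: str) -> str:
    """Severity for one finding row: fixed for N/A dispositions,
    otherwise the control's own severity for Passed and Failed alike."""
    if status in NA_SEVERITY:
        return NA_SEVERITY[status]
    if control_severity not in ASFF_LABELS:
        raise ValueError(f"unknown severity label: {control_severity}")
    return control_severity


def finding_name(status: str, name: str) -> str:
    """Advisory findings carry the 'ADVISORY:' name prefix."""
    return f"ADVISORY: {name}" if status == "ADVISORY" else name
```

Note that a Passed row keeps the control's severity; the label describes the control, not the outcome.
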
Validation note
Detection and remediation guidance in this document was systematically validated against the
Responsible AI GRC guide, current AWS documentation, API references, and AWS announcements as
of April 2026. IAM action names were verified against the AWS Service Authorization Reference
for Amazon Bedrock,
Amazon Bedrock AgentCore,
and Amazon OpenSearch Serverless
(note: the OpenSearch Serverless IAM prefix is aoss:, not opensearchserverless: — the latter
is the boto3 client name).
CloudWatch metric namespaces were verified against the service-specific monitoring docs (Bedrock,
Bedrock Agents, Bedrock Guardrails, SageMaker Model Monitor, SageMaker Clarify). CloudTrail
event-type classification (management vs data) for Bedrock API operations was verified against the
Bedrock CloudTrail integration guide.
Cost Anomaly Detection monitor-type values were verified against the
AnomalyMonitor API reference.
Where AWS does not prescribe a specific value (e.g., grounding thresholds), this is explicitly
called out as an assessment recommendation rather than an AWS requirement. AWS regional
availability of new features (Automated Reasoning, AgentCore Policy, AWS Security Agent,
cross-account guardrails) evolves rapidly — region lists in Parts 1-3 reflect the state at the
cited announcement date and should be re-verified before audit reliance.
Contribution workflow
The FS checks are contributed via a pull request from a personal GitHub fork of
aws-samples/sample-aiml-security-assessment. For the contribution process — feature-request
GitHub issue, fork + feature branch, Conventional Commits, PR, and reviewer assignment — see
CONTRIBUTING.md and the Developer Guide.
Key quality gates before opening the PR:
- ruff check and ruff format --check pass on functions/security/finserv_assessments/.
- cfn-lint and sam validate --lint pass on the SAM templates.
- ASH v3 scan (ash --source-dir . --fail-on-findings --config-overrides 'global_settings.severity_threshold=MEDIUM') reports zero Critical / High findings, or suppressions are documented in the ASH configuration used for the scan.
- Amazon Code Defender (git defender scan) reports no secrets in the staged diff.
Because aws-samples is an OSPO-managed organization, pushes to your personal fork of
aws-samples/* are auto-allowed by Code Defender — a Git Defender exception ticket is
not expected for this contribution.
Relationship to upstream SM/BR/AC checks
The upstream sample-aiml-security-assessment
framework already provides 52 core security checks (SM-01 to SM-25, BR-01 to BR-14, AC-01 to AC-13).
The 69 FS checks in this document are additive: they enhance the upstream with FinServ-specific
detection and remediation guidance drawn from the Responsible AI GRC guide. A few FS
checks overlap with upstream checks — in those cases, the FS check adds FinServ-specific depth
(e.g., protected-attribute facets, regulatory cadence requirements, denied-topic content for
financial advice). The table below surfaces each overlap with a systematic recommendation based
on five factors: (1) whether the detection target is the same AWS resource/configuration, (2)
whether the FS check adds FinServ-specific regulatory specificity, (3) severity differentiation,
(4) whether a customer would remediate them differently, and (5) guide-traceability value.
Recommendation values:
- Extend upstream — merge FS detection/remediation detail into the upstream check; do not ship FS as a standalone entry in the final report. Best when both checks target the same resource and the FS content is an enhancement.
- Keep separate — ship as a standalone FS check alongside the upstream check. Best when the FS check targets a different AWS resource, has materially different severity, or encodes a FinServ-specific regulatory requirement that would be diluted by merging.
| FS check | Upstream check | Overlap analysis | Recommendation |
| --- | --- | --- | --- |
| FS-17 (Model Monitor Data Quality) | SM-07 (Model Monitor) | Same resource (sagemaker:ListMonitoringSchedules); FS-17 adds training-data-drift-specific guidance, exact CloudWatch namespace (/aws/sagemaker/Endpoints/data-metric), and emit_metrics requirement. | Extend SM-07 — add FS-17’s detection detail (namespace, emit_metrics) as a refinement of the existing check |
| FS-18 (Model Drift Detection) | SM-23 (Model Drift Detection) | Same name, same resource, same detection logic (MonitoringType=ModelQuality). FS-18 adds Guide §1.2.14 low-entropy classification monitoring as an early-warning poisoning indicator. | Extend SM-23 — add low-entropy monitoring as a new remediation step on SM-23; do not ship FS-18 separately |
| FS-19 (Model Registry Approval) | SM-08 (Model Registry) / SM-22 (Model Approval Workflow) | SM-22 is conceptually identical. FS-19 specifies exact ModelApprovalStatus=PendingManualApproval default and flags auto-approved latest versions. | Extend SM-22 — add FS-19’s detection specificity (flag auto-approved latest versions) to SM-22; do not ship FS-19 separately |
| FS-20 (Feature Store Rollback) | SM-15 (Feature Store Encryption) | Different security properties on the same resource: SM-15 checks encryption; FS-20 checks OfflineStoreConfig presence for point-in-time rollback. | Keep separate — different security property; no true overlap |
| FS-39 (SageMaker Clarify Bias) | SM-06 (Clarify Usage) | Same resource family but SM-06 is Severity Low and generic (“validates Clarify for bias detection”); FS-39 is Severity High with specific MonitoringType=ModelBias, protected-attribute facets (age/gender/race/geography), and specific bias metrics (DPL, DI, DPPL) for FinServ decision models. | Keep separate — severity, detection specificity, and FinServ regulatory context (ECOA/Fair Housing) warrant a standalone check |
| FS-41 (SageMaker Clarify Explainability) | SM-06 (Clarify Usage) | Same as FS-39 but for MonitoringType=ModelExplainability. FS-41 is Severity High with SHAP analysis for adverse-action-notice use cases. | Keep separate — severity and adverse-action-notice regulatory context justify a standalone check |
| FS-22 (KB IAM Least Privilege) | BR-01 (IAM Least Privilege) | BR-01 detects the managed policy AmazonBedrockFullAccess on any role. FS-22 inspects role policy documents for wildcard bedrock:* affecting KB actions and requires ARN-scoped resource restrictions. | Keep separate — different detection logic (managed-policy attachment vs policy-document statement analysis); FS-22 fills a detection gap BR-01 does not cover |
| FS-23 (KB CloudTrail Logging) | BR-06 (CloudTrail Logging) | BR-06 verifies CloudTrail is logging Bedrock API calls generally. FS-23 specifically requires an advanced event selector for AWS::Bedrock::KnowledgeBase to capture Retrieve/RetrieveAndGenerate data events (NOT logged by default). | Extend BR-06 — add FS-23’s data-event-selector requirement as a refinement of the same CloudTrail check |
| FS-25 (OpenSearch Serverless Encryption) | BR-09 (Knowledge Base Encryption) | Different AWS resources: BR-09 checks the Bedrock KB’s kmsKeyArn; FS-25 checks the underlying AOSS collection’s encryption policy (aoss:ListSecurityPolicies(type=encryption)). A KB can be CMK-encrypted while its vector store is not. | Keep separate — different AWS resources with independent encryption configurations; both needed for defense-in-depth |
| FS-26 (KB VPC Access) | BR-02 (VPC Endpoint Configuration) | BR-02 checks Bedrock VPC endpoints exist. FS-26 checks the AOSS collection’s network policy for AllowFromPublic=true (whether the vector store itself is internet-reachable). | Keep separate — orthogonal controls: Bedrock VPC endpoint vs vector-store network policy |
| FS-27 (Automated Reasoning / Contextual Grounding) | BR-05 (Guardrail Configuration) | BR-05 verifies a guardrail exists and is enforced. FS-27 checks for automatedReasoningPolicy or contextualGroundingPolicy with specific threshold (≥ 0.7). | Keep separate — policy-level guardrail content BR-05 does not evaluate |
| FS-28 (Financial Denied Topics) | BR-05 | BR-05 is existence; FS-28 inspects topicPolicy.topics for FinServ-specific denied topics (investment advice, tax advice, guaranteed returns). | Keep separate — FinServ denied-topic content is a regulatory-specific requirement not representable as a generic extension |
| FS-36 (Guardrail Content Filters) | BR-05 | FS-36 inspects contentPolicy.filters for HATE/VIOLENCE/SEXUAL/INSULTS/MISCONDUCT/PROMPT_ATTACK with strength ≥ MEDIUM. | Keep separate — policy-level detection BR-05 does not cover |
| FS-38 (Word Filters and Allowlists) | BR-05 | FS-38 inspects wordPolicy.words and managedWordLists for FinServ business-term allowlist guidance. | Keep separate — advisory business-term allowlist has no upstream equivalent |
| FS-45 (Guardrail PII Filters) | BR-05 | FS-45 inspects sensitiveInformationPolicy.piiEntities for 12 specific PII types critical to FinServ (SSN, bank account, SWIFT code, etc.) with inputAction=BLOCK/outputAction=ANONYMIZE. | Keep separate — FinServ-specific PII entity list is a distinct regulatory requirement |
| FS-47 (Grounding Threshold) | BR-05 | FS-47 checks contextualGroundingPolicy.filters for GROUNDING filter with threshold ≥ 0.7. | Keep separate — threshold-value check BR-05 does not perform |
| FS-50 (Relevance Grounding Filters) | BR-05 | Same as FS-47 but for RELEVANCE filter type. | Keep separate — distinct filter type |
| FS-51 (Prompt Attack Filters) | BR-05 | FS-51 checks PROMPT_ATTACK filter in Standard tier with input-tagging requirement and inputStrength=HIGH. | Keep separate — Standard-tier cross-region-inference opt-in and input-tagging nuance warrant standalone guidance |
| FS-59 (Guardrail Topic Allowlist) | BR-05 | FS-59 checks topicPolicy.topics exist to block off-topic conversations (politics, entertainment, medical advice). | Keep separate — off-topic content restrictions are distinct from FS-28’s regulated-advice restrictions; different guide section (§1.2.2 vs §1.2.1) |
| FS-64 (Guardrail Trace Logging) | BR-04 (Model Invocation Logging) | BR-04 verifies invocation logging is enabled. FS-64 additionally verifies the log output captures guardrailTrace with action/inputAssessments/outputAssessments and adds NYDFS/SR 11-7 retention guidance. | Extend BR-04 — add guardrail-trace verification as a refinement of the same invocation-logging check; retention guidance can be a remediation note |
Summary of consolidation recommendations
- Extend upstream (5 FS checks merged into 5 upstream checks): FS-17 → SM-07; FS-18 → SM-23; FS-19 → SM-22; FS-23 → BR-06; FS-64 → BR-04. These checks are replaced by upstream-extension notes in Parts 1 and 3 and are removed from
finserv_assessments/app.py.
- Keep separate (64 FS checks): All other FS checks ship as standalone entries. This includes FS-20, FS-22, FS-25, FS-26, FS-39, FS-41, all Guardrail-policy-level checks (FS-27, FS-28, FS-36, FS-38, FS-45, FS-47, FS-50, FS-51, FS-59), and all FS checks that have no upstream overlap at all.
After consolidation the combined framework contains 52 upstream + 64 FS = 116 distinct checks (down from 52 + 69 = 121 before merging). The consolidation reduces duplication without losing FinServ-specific regulatory depth.
Compliance Framework Mapping
Disclaimer: The mappings below are preliminary and illustrative, provided by the
authors of this assessment to help FSI teams start conversations with their MRM/compliance
colleagues. They are not authoritative AWS compliance guidance and they have not been
reviewed by AWS Security Assurance Services, external auditors, or the regulators whose
frameworks are named. Each firm should have its own MRM, Legal, and Compliance teams
validate these mappings against the firm’s specific interpretation of each framework before
relying on them as audit evidence.
Each FS check maps to one or more FinServ regulatory frameworks (preliminary mapping):
| Framework | Description | Relevant Checks |
| --- | --- | --- |
| SR 11-7 | Federal Reserve Model Risk Management Guidance | FS-07, FS-10, FS-12 to FS-16, FS-20, FS-21, FS-27 to FS-33, FS-34, FS-39 to FS-42, FS-66, FS-67 |
| FFIEC CAT | Cybersecurity Assessment Tool | All FS checks |
| NYDFS 500 | NY Cybersecurity Regulation | FS-22, FS-43 to FS-46, FS-51 to FS-54, FS-66 |
| PCI-DSS | Payment Card Industry Data Security Standard | FS-22, FS-25, FS-26, FS-43 to FS-46, FS-53, FS-56, FS-67, FS-68 |
| DORA | EU Digital Operational Resilience Act | FS-01 to FS-06, FS-08, FS-11, FS-54, FS-65, FS-68 |
| MAS TRM 9 | Monetary Authority of Singapore Technology Risk Management | FS-07 to FS-11, FS-15, FS-27 to FS-30, FS-32, FS-37, FS-39 to FS-42, FS-66, FS-67 |
| ISO 27001 | Information Security Management | FS-13, FS-14, FS-16, FS-21, FS-33, FS-46, FS-52, FS-63, FS-65 |
| ECOA/Fair Housing | Equal Credit Opportunity Act (US) | FS-39 to FS-42 (advisory — applicability depends on whether the model is used for ECOA-covered credit decisions; confirm with your compliance team) |
| OWASP LLM Top 10 | OWASP LLM Application Security | FS-51 to FS-58, FS-68, FS-69 |
FS-34 note: FS-34 (TPRM for FM Providers) is listed above under SR 11-7. Although the
check appears in the Misinformation section of Part 2 for numbering continuity, its
primary guide source is §1.2.12 Supply Chain, which is the lens MRM and TPRM teams will
evaluate it through.
Part 1 — Infrastructure & Resource Controls (FS-01 to FS-26)
Guide risk categories: Unbounded Consumption (FS-01..06, §1.2.11), Excessive Agency (FS-07..11, §1.2.9), Supply Chain Vulnerabilities (FS-12..16, §1.2.12), Training Data & Model Poisoning (FS-17..21, §1.2.14), Vector & Embedding Weaknesses (FS-22..26, §1.2.15). FS-17, FS-18, FS-19, and FS-23 are merged into upstream checks — see the extension notes in each section.
Unbounded Consumption (FS-01 to FS-06)
Guide source: §1.2.11 Unbounded consumption. Guide-listed mitigations: (a) AWS WAF and Shield
Advanced for LLM APIs; (b) maximum input length limits; (c) rate limits/quotas on APIs
accessing LLMs; (d) cost-and-usage tracking for generative AI. Practical guidance in the guide
also calls out max_tokens optimisation and CloudWatch metrics for token usage.
FS-01 — WAF and Shield Protection
| Field | Detail |
| --- | --- |
| Severity | Medium (WAF) / Low (Shield Advanced) |
| Guide ref | [Guide §1.2.11] — “Protect your LLM APIs and Amazon Bedrock-hosted LLMs by using AWS WAF and AWS Shield Advanced.” Also covers: “To protect your API endpoints, set maximum length limits for input requests when you use large language models (LLMs) directly or through Amazon Bedrock.” |
| Description | Verifies AWS WAF Web ACLs and Shield Advanced protect GenAI API endpoints, and verifies the Web ACL enforces both rate-based limits and body-size (input-length) constraints. |
| Detection | Calls shield:DescribeSubscription to check Shield Advanced is active. Calls wafv2:ListWebACLs(Scope=REGIONAL) in each region where GenAI API endpoints run to verify at least one regional Web ACL exists (covers API Gateway, ALB, AppSync). Additionally calls wafv2:ListWebACLs(Scope=CLOUDFRONT) in us-east-1 to detect Web ACLs protecting CloudFront distributions fronting GenAI workloads; CLOUDFRONT-scope Web ACLs must be created and queried in us-east-1 per the WAF resources documentation. For each Web ACL found, calls wafv2:GetWebACL and inspects the Rules array for (a) at least one RateBasedStatement (rate limiting) and (b) at least one SizeConstraintStatement with FieldToMatch=Body or FieldToMatch=JsonBody (the input-size limit that implements the Guide §1.2.11 maximum-input-length mitigation quoted above). Flags accounts with no Web ACL in either scope, a Web ACL with no rate-based rule, a Web ACL with no body size-constraint rule, or where Shield Advanced is inactive. |
| Remediation | 1. Subscribe to AWS Shield Advanced via the Shield console. 2. Create a WAF Web ACL with both (a) a rate-based rule (e.g., 1 000 req / 5 min per IP) and (b) a SizeConstraintStatement that blocks requests where FieldToMatch=Body (or JsonBody for JSON APIs) exceeds your LLM’s expected maximum input size, for example ComparisonOperator=GT, Size=100000 (100 KB). AWS WAF inspects only the first portion of the body (8 KB for ALB and AppSync; 16 KB by default, configurable up to 64 KB, for API Gateway and CloudFront), so a Size above the inspection limit cannot be evaluated directly: with the default OversizeHandling=CONTINUE the rule never matches, and with OversizeHandling=MATCH every body over the inspection limit matches, which makes the inspection limit the effective ceiling. Use Scope=REGIONAL for API Gateway/ALB/AppSync resources, or Scope=CLOUDFRONT (created in us-east-1) for CloudFront distributions fronting Bedrock. The body size-constraint rule directly implements the Guide §1.2.11 maximum-input-length mitigation and stops large-prompt token-exhaustion attacks before they reach Bedrock. 3. Associate the ACL with the fronting resource (API Gateway stage, ALB, or CloudFront distribution). 4. Add AWS Managed Rules (e.g., AWSManagedRulesCommonRuleSet, which includes additional size checks). 5. For CloudFront-fronted workloads, register the distribution with Shield Advanced via shield:CreateProtection to unlock automatic application-layer DDoS mitigation. 6. For API Gateway REST APIs, also note the service’s own payload-size quota: the default is 10 MB per request (see API Gateway quotas); use a request validator or Lambda authorizer for sub-10 MB limits where WAF size constraints are unsuitable. |
| Reference | Shield Advanced, WAF, WAF Size Constraint Rule, API Gateway Quotas |
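
The rule inspection described in the Detection row can be sketched as a walk over the `WebACL` dictionary that `wafv2:GetWebACL` returns. The function names below are hypothetical, and the sketch assumes the boto3 response shape (`Rules[].Statement` with nested `AndStatement`, `OrStatement`, `NotStatement`, and `ScopeDownStatement`). It does not expand rule-group references or managed rule groups, whose inner rules are not visible in this response.

```python
from typing import Any, Dict, Iterator, List, Tuple


def iter_statements(statement: Dict[str, Any]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (statement type, body) for a statement and everything nested in it."""
    for kind, body in statement.items():
        if not isinstance(body, dict):
            continue
        yield kind, body
        # AndStatement / OrStatement hold a list under "Statements".
        for child in body.get("Statements", []):
            yield from iter_statements(child)
        # NotStatement holds one "Statement"; rate-based and managed-group
        # statements may carry a "ScopeDownStatement".
        for key in ("Statement", "ScopeDownStatement"):
            if isinstance(body.get(key), dict):
                yield from iter_statements(body[key])


def web_acl_gaps(web_acl: Dict[str, Any]) -> List[str]:
    """Return the FS-01 gaps for one Web ACL (empty list means both rules exist)."""
    has_rate = False
    has_body_size = False
    for rule in web_acl.get("Rules", []):
        for kind, body in iter_statements(rule.get("Statement", {})):
            if kind == "RateBasedStatement":
                has_rate = True
            elif kind == "SizeConstraintStatement":
                field = body.get("FieldToMatch", {})
                if "Body" in field or "JsonBody" in field:
                    has_body_size = True
    gaps = []
    if not has_rate:
        gaps.append("no rate-based rule")
    if not has_body_size:
        gaps.append("no body size-constraint rule")
    return gaps
```

In a real check the input would be `response["WebACL"]` from a `get_web_acl` call on a boto3 `wafv2` client.
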
FS-02 — API Gateway Rate Limiting
| Field | Detail |
| --- | --- |
| Severity | Medium |
| Guide ref | [Guide §1.2.11] — “protect your API endpoints by implementing rate limits and quotas for APIs that access large language models (LLMs)”. |
| Description | Checks that API Gateway usage plans enforce throttling on GenAI endpoints. |
| Detection | Calls GetUsagePlans (boto3 get_usage_plans; API Gateway authorizes it through the IAM action apigateway:GET on the /usageplans resource) and inspects each plan’s throttle.rateLimit and throttle.burstLimit. Flags plans where either is zero or absent. |
| Remediation | 1. Create or update usage plans with rateLimit and burstLimit values appropriate for your traffic. 2. Associate plans with API stages serving Bedrock. 3. Issue per-consumer API keys with individual quotas. |
| Reference | API Gateway Throttling |
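
A minimal sketch of the throttle check, assuming the input is the `items` list from a boto3 `get_usage_plans` response. The function name is hypothetical.

```python
from typing import Any, Dict, List


def unthrottled_plans(usage_plans: List[Dict[str, Any]]) -> List[str]:
    """Return the names of usage plans whose rateLimit or burstLimit is zero or absent."""
    flagged = []
    for plan in usage_plans:
        throttle = plan.get("throttle") or {}
        rate = throttle.get("rateLimit") or 0
        burst = throttle.get("burstLimit") or 0
        if rate <= 0 or burst <= 0:
            flagged.append(plan.get("name", plan.get("id", "<unnamed>")))
    return flagged
```

A plan with no `throttle` block at all is flagged the same way as one with explicit zeros, which matches the "zero or absent" rule in the Detection row.
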
FS-03 — Bedrock Token Quota Review
| Field | Detail |
| --- | --- |
| Severity | Medium |
| Guide ref | [Guide §1.2.11, extension] — guide practical guidance notes “Bedrock has default quota on model inference based on token usage” and recommends optimising max_tokens. Quota review as an operational control is an extension aligned with this guidance. |
| Description | Verifies Bedrock TPM/RPM quotas have been reviewed and set appropriately. |
| Detection | Calls servicequotas:ListServiceQuotas(ServiceCode=bedrock) for applied quotas and servicequotas:ListAWSDefaultServiceQuotas for defaults (the IAM prefix is servicequotas:; service-quotas is the boto3 client name), then compares each adjustable quota’s Value against the default Value. Flags accounts where every quota equals the service default (indicating no quota review or increase has been requested). |
| Remediation | 1. Review current quotas in the Service Quotas console. 2. Request increases aligned with expected peak load via servicequotas:RequestServiceQuotaIncrease. 3. Implement client-side token counting and pre-flight quota checks. 4. Use Bedrock cross-region inference profiles to distribute load. Cross-region inference routes requests across destination regions automatically with no additional cost, but requires the invoked model to be available in the destination regions defined in the inference profile. |
| Reference | Bedrock Quotas |
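
The applied-versus-default comparison can be sketched as below. It assumes both inputs are the `Quotas` lists from the two Service Quotas calls, each entry carrying `QuotaCode`, `Value`, and (for applied quotas) `Adjustable`. The function name and the choice to return `False` when no adjustable quota exists are assumptions of this sketch.

```python
from typing import Any, Dict, List


def quotas_never_reviewed(applied: List[Dict[str, Any]],
                          defaults: List[Dict[str, Any]]) -> bool:
    """True when every adjustable applied quota still equals its service default."""
    default_by_code = {q["QuotaCode"]: q["Value"] for q in defaults}
    adjustable = [q for q in applied if q.get("Adjustable")]
    if not adjustable:
        # Nothing to compare; the caller may prefer a could-not-assess result.
        return False
    return all(q["Value"] == default_by_code.get(q["QuotaCode"]) for q in adjustable)
```

An unchanged quota is only a signal that no increase was requested; a team may have reviewed the defaults and found them adequate, so a match is best treated as a prompt for confirmation.
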
FS-04 — Cost Anomaly Detection
| Field | Detail |
| --- | --- |
| Severity | Medium |
| Guide ref | [Guide §1.2.11] — “Track, allocate, and manage your costs and usage for generative AI.” |
| Description | Checks AWS Cost Anomaly Detection monitors cover Bedrock/SageMaker. |
| Detection | Calls ce:GetAnomalyMonitors and inspects each monitor. AWS Cost Anomaly Detection supports exactly two MonitorType values per the AnomalyMonitor API: DIMENSIONAL (AWS-managed, where MonitorDimension is one of SERVICE, LINKED_ACCOUNT, TAG, or COST_CATEGORY) and CUSTOM (customer-managed, scoped via MonitorSpecification to specific values). For DIMENSIONAL monitors, checks MonitorDimension=SERVICE (the AWS-managed “AWS services” monitor that automatically covers all services including Bedrock and SageMaker — the recommended default). For CUSTOM monitors, inspects MonitorSpecification for references to Bedrock or SageMaker. Flags accounts with no monitors, or with only narrowly-scoped monitors that would not detect Bedrock cost anomalies (e.g., DIMENSIONAL with MonitorDimension=LINKED_ACCOUNT only). |
| Remediation | 1. Create an AWS-managed DIMENSIONAL monitor with MonitorDimension=SERVICE for comprehensive coverage across all AWS services (the recommended default — in the console this appears as “AWS services” under “Managed by AWS”). For narrower scope, add a CUSTOM monitor using MonitorSpecification with a Dimensions expression scoped to specific service values (e.g., {"Dimensions": {"Key": "SERVICE", "Values": ["Amazon Bedrock", "Amazon SageMaker"]}}) — note that for CUSTOM monitors you use MonitorSpecification, not MonitorDimension. 2. Configure alert subscriptions (SNS/email) for anomalies above threshold. 3. Set daily spend budgets with AWS Budgets as a secondary control. 4. Enable Bedrock IAM principal cost allocation: tag IAM users/roles with team or cost-center attributes, activate them as cost allocation tags in the Billing and Cost Management console, and include caller identity data in CUR 2.0 exports for per-user/per-team Bedrock spend attribution. |
| Reference | Cost Anomaly Detection, Bedrock IAM Cost Allocation |
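
The monitor classification in the Detection row might look like the sketch below. It assumes the input is the `AnomalyMonitors` list from a boto3 `get_anomaly_monitors` response; the function names and the keyword search over the serialized `MonitorSpecification` are illustrative choices, not the assessment's actual logic.

```python
import json
from typing import Any, Dict, List

KEYWORDS = ("bedrock", "sagemaker")


def covers_genai(monitor: Dict[str, Any]) -> bool:
    """True when one monitor would see Bedrock or SageMaker spend."""
    monitor_type = monitor.get("MonitorType")
    if monitor_type == "DIMENSIONAL":
        # Only the AWS-managed per-service monitor covers every service.
        return monitor.get("MonitorDimension") == "SERVICE"
    if monitor_type == "CUSTOM":
        spec = json.dumps(monitor.get("MonitorSpecification", {})).lower()
        return any(keyword in spec for keyword in KEYWORDS)
    return False


def account_is_covered(monitors: List[Dict[str, Any]]) -> bool:
    """True when at least one monitor covers GenAI services."""
    return any(covers_genai(m) for m in monitors)
```

An account with only a `LINKED_ACCOUNT` dimensional monitor returns `False`, which corresponds to the "narrowly-scoped monitors" case that the check flags.
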
FS-05 — CloudWatch Token Usage Alarms
| Field | Detail |
| --- | --- |
| Severity | Medium |
| Guide ref | [Guide §1.2.11] — guide practical guidance cites CloudWatch metrics for token usage; alarms operationalise that guidance. |
| Description | Verifies CloudWatch alarms exist for Bedrock throttling and token metrics. |
| Detection | Paginates cloudwatch:DescribeAlarms(AlarmTypes=MetricAlarm) and filters for alarms in the AWS/Bedrock namespace or with “bedrock” in the alarm name. Separately counts throttle-specific alarms. |
| Remediation | 1. Create alarms for AWS/Bedrock InvocationThrottles (threshold > 0). 2. Create alarms for AWS/Bedrock EstimatedTPMQuotaUsage to track approach to token quota limits, and separately on InputTokenCount + OutputTokenCount (sum via CloudWatch metric math) for absolute token consumption. Note: TokensProcessed is not a valid Bedrock metric — the correct runtime metrics are InputTokenCount, OutputTokenCount, InvocationThrottles, EstimatedTPMQuotaUsage, Invocations, InvocationLatency, TimeToFirstToken. 3. Publish custom application-level token counters via Embedded Metric Format (EMF) if you need per-tenant or per-feature attribution. 4. Attach SNS actions to all alarms. |
| Reference | Bedrock CloudWatch Metrics |
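
One detail worth sketching is that metric-math alarms (such as the InputTokenCount + OutputTokenCount sum) carry no top-level `Namespace`; their inputs sit under `Metrics[].MetricStat.Metric`. The sketch below assumes the input is the `MetricAlarms` list from a boto3 `describe_alarms` response, and the function names are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def alarm_metrics(alarm: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (namespace, metric name) pairs an alarm watches, including metric-math inputs."""
    pairs = []
    if alarm.get("Namespace"):
        pairs.append((alarm["Namespace"], alarm.get("MetricName", "")))
    for query in alarm.get("Metrics", []):
        metric = query.get("MetricStat", {}).get("Metric", {})
        if metric.get("Namespace"):
            pairs.append((metric["Namespace"], metric.get("MetricName", "")))
    return pairs


def classify_bedrock_alarms(alarms: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Split alarm names into Bedrock-related alarms and throttle-specific alarms."""
    bedrock, throttle = [], []
    for alarm in alarms:
        name = alarm.get("AlarmName", "")
        pairs = alarm_metrics(alarm)
        if any(ns == "AWS/Bedrock" for ns, _ in pairs) or "bedrock" in name.lower():
            bedrock.append(name)
        if any(ns == "AWS/Bedrock" and metric == "InvocationThrottles" for ns, metric in pairs):
            throttle.append(name)
    return {"bedrock": bedrock, "throttle": throttle}
```

Matching on the alarm name is a heuristic carried over from the Detection row; the namespace match is the stronger signal.
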
FS-06 — AWS Budgets AI/ML Spend
| Field | Detail |
| --- | --- |
| Severity | Medium |
| Guide ref | [Guide §1.2.11] — “Track, allocate, and manage your costs and usage for generative AI.” |
| Description | Checks AWS Budgets are configured with alerts for AI/ML service spend. |
| Detection | Calls DescribeBudgets (authorized by the IAM action budgets:ViewBudget) and inspects each budget’s FilterExpression (the current field) and CostFilters (deprecated in the AWS Budgets API but possibly still populated on older budgets) for references to “bedrock” or “sagemaker”. New budgets use FilterExpression with an Expression object, so the detection checks both fields to cover old and new budgets. |
| Remediation | 1. Create cost budgets for Bedrock and SageMaker with 80 %/100 % alert thresholds. 2. Add SNS notifications to on-call channels. 3. Consider budget actions to apply IAM deny policies when thresholds are breached. 4. Enable Bedrock IAM principal cost allocation to attribute inference costs to specific IAM users/roles via Cost Explorer and CUR 2.0: tag IAM principals with team or cost-center attributes and activate them as cost allocation tags. |
| Reference | AWS Budgets, Bedrock IAM Cost Allocation |
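
Checking both filter fields can be sketched as a walk over every string value in `FilterExpression` and `CostFilters`. This assumes each budget is one entry from the `Budgets` list of a boto3 `describe_budgets` response; the helper names are hypothetical.

```python
from typing import Any, Dict, Iterator

AI_KEYWORDS = ("bedrock", "sagemaker")


def iter_string_values(node: Any) -> Iterator[str]:
    """Yield every string value nested anywhere inside dicts and lists."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_string_values(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_string_values(item)


def budget_covers_ai(budget: Dict[str, Any]) -> bool:
    """True when either filter field references Bedrock or SageMaker."""
    for field in ("FilterExpression", "CostFilters"):
        for value in iter_string_values(budget.get(field) or {}):
            if any(keyword in value.lower() for keyword in AI_KEYWORDS):
                return True
    return False
```

Walking values rather than matching a fixed path keeps the check tolerant of nested `And`/`Or` expressions inside `FilterExpression`.
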
Excessive Agency (FS-07 to FS-11)
Guide source: §1.2.9 Excessive agency. Guide-listed mitigations: (a) Amazon Bedrock AgentCore
for managing complex tasks; (b) least-privilege permissions on plugins; (c) human-in-the-loop
output validation; (d) explicit action boundaries in agent configuration (AgentCore Policy);
(e) audit logging of agent actions with reasoning chain (AgentCore Observability);
(f) transaction-value thresholds on agent tool calls; (g) monitoring agent call rates with
alarms (AgentCore Evaluations). Mitigation (e) is covered by the expanded FS-08 check, which
now verifies both AgentCore Policy Engine and AgentCore Observability are configured.
FS-07 — Agent Action Boundaries
| Field | Detail |
| --- | --- |
| Severity | High |
| Guide ref | [Guide §1.2.9] — “grant only the minimum permissions required”; “Define and enforce explicit action boundaries in the agent configuration”. |
| Description | Verifies Bedrock agent execution roles have no wildcard sensitive actions (iam:*, s3:*, ec2:*, lambda:*, *). |
| Detection | Calls ListAgents and GetAgent (via the bedrock-agent boto3 client; IAM actions are bedrock:ListAgents and bedrock:GetAgent) to retrieve each agent’s agentResourceRoleArn. Resolves the role name and inspects attached and inline policy documents from the permissions cache for wildcard Allow statements. |
| Remediation | 1. Replace wildcard actions with the specific actions the agent needs. 2. Apply IAM permission boundaries to agent execution roles. 3. Use resource-level conditions to restrict to specific ARNs. 4. Implement human-in-the-loop approval for high-impact actions. 5. For agents deployed in a VPC, use AWS Network Firewall with domain-based filtering to control which external domains agents can reach — this provides a network-layer boundary that limits agent tool access to approved endpoints regardless of IAM permissions. |
| Reference | Bedrock Agent Permissions, Control Agent Domain Access |
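
The wildcard inspection can be sketched against a decoded IAM policy document. The set of sensitive wildcards comes from the Description row; the function names are hypothetical, and the sketch deliberately ignores `NotAction` and partial wildcards such as `s3:Get*`, which a fuller check would need to consider.

```python
from typing import Any, Dict, List

SENSITIVE_WILDCARDS = {"*", "iam:*", "s3:*", "ec2:*", "lambda:*"}


def _as_list(value: Any) -> List[Any]:
    """IAM allows a single item or a list for Statement and Action."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def wildcard_allows(policy_document: Dict[str, Any]) -> List[str]:
    """Return the sensitive wildcard actions granted by Allow statements."""
    found = set()
    for statement in _as_list(policy_document.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        for action in _as_list(statement.get("Action")):
            # IAM action names are case-insensitive.
            if action.lower() in SENSITIVE_WILDCARDS:
                found.add(action.lower())
    return sorted(found)
```

Running this over every attached and inline policy of the role named in `agentResourceRoleArn` gives the list of statements to narrow in remediation step 1.
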
FS-08 — AgentCore Policy Engine and Observability
| Field |
Detail |
| Severity |
High |
| Guide ref |
[Guide §1.2.9] — “Use Amazon Bedrock AgentCore to manage complex tasks and connect securely”; “Define and enforce explicit action boundaries”; “Implement audit logging of all actions taken by AI agents, including the reasoning chain that led to each action.” (The audit-logging mitigation’s guide reference is “Observe your agent applications on Amazon Bedrock AgentCore Observability.”) |
| Description |
Checks AgentCore Gateways have a Policy Engine attached to authorize agent-to-tool interactions, verifies AgentCore Runtimes have an inbound authorizer configured, and verifies AgentCore Observability is enabled so agent reasoning chains and tool calls are auditable. |
| Detection |
(a) Calls ListGateways and GetGateway (via the bedrock-agentcore-control boto3 client; IAM actions are bedrock-agentcore:ListGateways and bedrock-agentcore:GetGateway); inspects policyEngineConfiguration.arn and policyEngineConfiguration.mode (must be ENFORCE for production). (b) Calls ListAgentRuntimes (IAM action bedrock-agentcore:ListAgentRuntimes) and inspects each runtime’s authorizerConfiguration.customJWTAuthorizer for inbound auth. (c) Verifies AgentCore Observability is enabled by (i) checking that CloudWatch Transaction Search is on via xray:GetTraceSegmentDestination (destination should be CloudWatchLogs) and that the X-Ray → CloudWatch Logs resource policy is in place via logs:GetResourcePolicy, and (ii) calling logs:DescribeDeliveries / logs:DescribeDeliverySources for AgentCore resource sources (runtime, memory, gateway, built-in tools, identity) — flags runtimes/gateways with no log delivery configured. For memory resources, additionally checks that tracing was enabled at memory creation time. Flags gateways without a Policy Engine in ENFORCE mode, runtimes without an authorizer, or accounts where Transaction Search is not enabled or no delivery exists for AgentCore resources. |
| Remediation |
1. Configure a Policy Engine: create via CreatePolicyEngine (IAM action bedrock-agentcore:CreatePolicyEngine), then author Cedar policies using one of three methods: (a) write Cedar directly for fine-grained control via CreatePolicy (IAM action bedrock-agentcore:CreatePolicy), (b) use the form-based console UI, or (c) generate Cedar from natural language descriptions (natural-language-to-Cedar is a documented capability in the GA announcement; verify the exact IAM action name against the current AgentCore Service Authorization Reference before writing IAM policies for it). Policy in AgentCore went GA on March 3, 2026 in thirteen AWS regions (US East N. Virginia, US East Ohio, US West Oregon, Asia Pacific Mumbai/Seoul/Singapore/Sydney/Tokyo, Europe Frankfurt/Ireland/London/Paris/Stockholm) — verify current regional availability on the launch announcement before audit reliance. 2. Attach the Policy Engine to each Gateway by specifying the Policy Engine ARN in the policyEngineConfiguration field during CreateGateway, or attach later via UpdateGateway. 3. Start in LOG_ONLY mode — the policy engine evaluates actions and logs whether they would be allowed or denied without enforcing the decision — then switch to ENFORCE mode once confident. 4. Configure a JWT inbound authorizer on each Runtime with discovery URL, allowed audiences, and allowed clients. 5. 
Enable AgentCore Observability so agent reasoning chains are captured (directly addresses the Guide §1.2.9 audit-logging mitigation): (a) one-time enable CloudWatch Transaction Search — console path CloudWatch → Application Signals (APM) → Transaction search → Enable Transaction Search, or CLI: aws xray update-trace-segment-destination --destination CloudWatchLogs plus a logs:PutResourcePolicy granting xray.amazonaws.com permission to logs:PutLogEvents on aws/spans:* and /aws/application-signals/data:*; (b) configure log delivery for AgentCore runtime, memory, gateway, built-in tools, and identity resources via logs:PutDeliverySource + logs:PutDeliveryDestination + logs:CreateDelivery (CloudWatch Logs / S3 / Firehose destinations supported; note the write APIs use Put* for source and destination but Create* for the delivery pairing); (c) enable tracing at memory creation. For traditional Bedrock Agents (non-AgentCore), set enableTrace=true on InvokeAgent calls to receive the reasoning-chain trace in the response. |
| Reference |
Policy in AgentCore, Inbound JWT Authorizer, AgentCore Observability Configuration, Bedrock Agent Trace View |
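The gateway and runtime parts of the detection can be sketched as pure functions over already-fetched response dictionaries, so the logic can be exercised without AWS credentials. The field names policyEngineConfiguration.arn, policyEngineConfiguration.mode and authorizerConfiguration.customJWTAuthorizer come from the Detection row above; the function names and finding strings are illustrative assumptions, and the exact response shape should be confirmed against the current AgentCore API reference.

```python
from typing import Any


def gateway_findings(gateway: dict[str, Any]) -> list[str]:
    """Return findings for one gateway description (GetGateway-style dict)."""
    policy = gateway.get("policyEngineConfiguration") or {}
    if not policy.get("arn"):
        return ["no Policy Engine attached"]
    if policy.get("mode") != "ENFORCE":
        # LOG_ONLY evaluates and logs decisions but does not enforce them.
        return [f"Policy Engine mode is {policy.get('mode')}, expected ENFORCE"]
    return []


def runtime_findings(runtime: dict[str, Any]) -> list[str]:
    """Return findings for one runtime description."""
    authorizer = runtime.get("authorizerConfiguration") or {}
    if not authorizer.get("customJWTAuthorizer"):
        return ["no inbound JWT authorizer configured"]
    return []
```

A caller would feed these functions the dictionaries returned for each gateway and runtime and report any non-empty result as a failed resource.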
FS-09 — Agent Transaction Limits
| Field |
Detail |
| Severity |
Medium |
| Guide ref |
[Guide §1.2.9, extension] — Lambda reserved concurrency is not named in the guide, but it directly implements the guide mitigation “Monitor agent call rates and alarm upon exceeding defined thresholds” by capping execution parallelism. |
| Description |
Verifies agent Lambda functions have reserved concurrency limits to cap execution parallelism. |
| Detection |
Calls lambda:ListFunctions and filters for functions with agent-related naming patterns. For each, calls lambda:GetFunctionConcurrency and flags functions with no reserved concurrency set. |
| Remediation |
1. Set reserved concurrency on each agent action-group Lambda (e.g., 10–50 depending on expected load). 2. Add CloudWatch alarms on the Throttles metric for these functions. 3. Consider Step Functions execution limits as an additional control. |
| Reference |
Lambda Reserved Concurrency |
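A minimal sketch of the filtering and flagging logic, assuming the GetFunctionConcurrency responses have already been collected into a dictionary keyed by function name. The name pattern is a hypothetical example of "agent-related naming patterns" and should be replaced by the convention your accounts actually use.

```python
import re

# Hypothetical naming convention; adjust to match your own agent Lambdas.
AGENT_NAME_PATTERN = re.compile(r"agent|action[-_]?group", re.IGNORECASE)


def is_agent_function(function_name: str) -> bool:
    return bool(AGENT_NAME_PATTERN.search(function_name))


def lacks_reserved_concurrency(concurrency_response: dict) -> bool:
    # GetFunctionConcurrency omits ReservedConcurrentExecutions
    # when no reserved concurrency is configured.
    return "ReservedConcurrentExecutions" not in concurrency_response


def flag_functions(concurrency_by_name: dict[str, dict]) -> list[str]:
    """Return agent-like function names with no reserved concurrency."""
    return sorted(
        name
        for name, response in concurrency_by_name.items()
        if is_agent_function(name) and lacks_reserved_concurrency(response)
    )
```

Name-based filtering is a heuristic: functions that back an agent but do not follow the convention are missed, so tagging agent Lambdas explicitly is a more reliable selector where it is available.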
FS-10 — Human-in-the-Loop Approval
| Field |
Detail |
| Severity |
High |
| Guide ref |
[Guide §1.2.9, §1.2.1, §1.2.2, §1.2.3, §1.2.7, §1.2.10] — “For internal AI systems, validate outputs with human review before business use (human-in-the-loop).” HITL is referenced in six separate guide risk sections. |
| Description |
Checks Step Functions workflows have human approval steps for high-risk agent actions. |
| Detection |
Calls ListStateMachines (boto3 client stepfunctions; IAM action states:ListStateMachines) and filters for agent/GenAI-related names. Retrieves each definition via DescribeStateMachine (IAM action states:DescribeStateMachine) and parses the ASL JSON for Task states whose Resource ends in .waitForTaskToken, the callback pattern that indicates a human approval gate. |
| Remediation |
1. Add a callback-pattern task state in your Step Functions workflow before any high-risk action (financial transactions, data modifications, external communications). 2. Route the approval token to a human reviewer via SNS/SQS/Slack. 3. Set a HeartbeatSeconds timeout so stale approvals expire. 4. Enable user confirmation on Bedrock Agent action groups for inline approval — when configured, the agent returns a confirmation prompt in the returnControl.invocationInputs field of the InvokeAgent response (alongside invocationType and a unique invocationId); the client displays the prompt, collects confirm/deny, and returns the user’s decision via sessionState.returnControlInvocationResults (with confirmationState on each apiResult/functionResult) in the next InvokeAgent request (there is no standalone GetUserConfirmation API). |
| Reference |
Step Functions Callback Pattern, Bedrock Agent User Confirmation |
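The ASL parsing step can be sketched as follows. This is an illustrative approach rather than the check's actual implementation: it walks the definition string returned by DescribeStateMachine, descends into Parallel and Map states, and lists Task states whose Resource uses the .waitForTaskToken suffix. A callback state indicates, but does not prove, that a human is the party returning the token.

```python
import json
from typing import Any, Iterator


def iter_states(definition: dict[str, Any]) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (name, state) pairs, descending into Parallel and Map states."""
    for name, state in definition.get("States", {}).items():
        yield name, state
        for branch in state.get("Branches", []):  # Parallel state
            yield from iter_states(branch)
        for key in ("ItemProcessor", "Iterator"):  # Map state (current and legacy field)
            if key in state:
                yield from iter_states(state[key])


def approval_gates(definition_json: str) -> list[str]:
    """Return the names of Task states that use the callback pattern."""
    definition = json.loads(definition_json)
    return [
        name
        for name, state in iter_states(definition)
        if state.get("Type") == "Task"
        and str(state.get("Resource", "")).endswith(".waitForTaskToken")
    ]
```

A state machine for which approval_gates returns an empty list would be flagged for review.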
FS-11 — Agent Rate Alarms
| Field |
Detail |
| Severity |
Medium |
| Guide ref |
[Guide §1.2.9] — “Monitor agent call rates and alarm upon exceeding defined thresholds.” |
| Description |
Verifies CloudWatch alarms exist for agent invocation rates. |
| Detection |
Paginates cloudwatch:DescribeAlarms and filters for alarms referencing “agent” in the alarm name or targeting AWS/Bedrock/Agents agent-related metrics (such as InvocationCount or InvocationThrottles with the Operation, AgentAliasArn, ModelId dimension combination). |
| Remediation |
1. Create CloudWatch alarms on the AWS/Bedrock/Agents namespace for InvocationCount and InvocationThrottles. Per AWS docs, the available dimensions are: Operation alone; Operation, ModelId; or Operation, AgentAliasArn, ModelId — use the Operation, AgentAliasArn, ModelId combination to scope alarms to a specific agent alias. 2. Set thresholds based on expected peak agent call rates, established via CloudWatch metric math on historical InvocationCount data. 3. Attach SNS actions for on-call notification. 4. Use AgentCore Evaluations (GA March 2026, available in 9 AWS regions — verify current regional availability on the GA announcement) to monitor agent quality alongside rate-based alarms: online evaluation continuously scores production traffic against 13 built-in evaluators (response quality, safety, task completion, tool usage), and on-demand evaluation supports regression testing. |
| Reference |
Bedrock Agents CloudWatch Metrics, AgentCore Evaluation Types |
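A sketch of the alarm filter, assuming each alarm is a MetricAlarms entry from a paginated DescribeAlarms response. The namespace and metric names are taken from the text above; the handling of metric-math alarms (the Metrics list) is an addition for illustration.

```python
AGENT_NAMESPACE = "AWS/Bedrock/Agents"
RATE_METRICS = {"InvocationCount", "InvocationThrottles"}


def _metrics_referenced(alarm: dict) -> list[tuple[str, str]]:
    """Collect (namespace, metric name) pairs from simple and metric-math alarms."""
    pairs = []
    if "MetricName" in alarm:
        pairs.append((alarm.get("Namespace", ""), alarm["MetricName"]))
    for query in alarm.get("Metrics", []):
        metric = query.get("MetricStat", {}).get("Metric", {})
        if metric:
            pairs.append((metric.get("Namespace", ""), metric.get("MetricName", "")))
    return pairs


def is_agent_rate_alarm(alarm: dict) -> bool:
    """True if the alarm name mentions 'agent' or it targets an agent rate metric."""
    if "agent" in alarm.get("AlarmName", "").lower():
        return True
    return any(
        namespace == AGENT_NAMESPACE and name in RATE_METRICS
        for namespace, name in _metrics_referenced(alarm)
    )
```

The check would pass when at least one alarm in the account satisfies is_agent_rate_alarm.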
Supply Chain Vulnerabilities (FS-12 to FS-16)
Guide source: §1.2.12 Supply chain vulnerabilities. Guide-listed mitigations:
(a) control access to serverless and marketplace models (IAM policies, SCPs);
(b) model onboarding process — EULA review, procurement, security/compliance review,
MRM assessment, documentation, stakeholder approvals;
(c) update TPRM to continuously monitor model providers — vendor security advisories,
deprecation notices, T&C changes;
(d) maintain a model inventory recording provenance, version, license terms, and risk
assessment status;
(e) use Bedrock Evaluations against attack test cases (practical guidance);
(f) allow-list approved models via SCP (practical guidance).
FS-12 — SCP Model Access Restrictions
| Field |
Detail |
| Severity |
High |
| Guide ref |
[Guide §1.2.12 — Practical guidance] — “Implement an allow-list of models using a Service Control Policy (SCP) for your AWS organization.” |
| Description |
Checks SCPs restrict Bedrock model access to approved models only. |
| Detection |
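Calls organizations:ListPolicies(Filter=SERVICE_CONTROL_POLICY), retrieves each SCP document via organizations:DescribePolicy (ListPolicies returns summaries only), and inspects it for Deny statements on bedrock:InvokeModel* with StringNotEquals conditions on bedrock:ModelId or an equivalent NotResource allow-list of model ARNs. Flags if no SCP restricts model access. |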
| Remediation |
1. Create an SCP that denies bedrock:InvokeModel* except for an explicit allowlist of approved model ARNs. 2. Attach the SCP to the OU containing GenAI workload accounts. 3. For multi-account guardrail enforcement, use the Bedrock cross-account safeguards feature (GA April 3, 2026, available in all AWS commercial and GovCloud regions where Bedrock Guardrails is supported): enable the Amazon Bedrock policy type in AWS Organizations, create a guardrail in the management account, create a versioned guardrail, optionally attach a resource-based policy granting bedrock:ApplyGuardrail to member accounts for cross-account access, then create and attach an AWS Organizations Bedrock policy referencing the guardrail ARN and version to the target OUs or accounts. This automatically enforces content filters, denied topics, word filters, sensitive information filters, and contextual grounding checks across all member accounts for every model invocation — no application code changes required. Important limitation: Automated Reasoning checks are not supported with cross-account safeguards — omit Automated Reasoning policies from any guardrail used for org-level enforcement. If you rely on AR (see FS-27), you must configure AR guardrails separately at the application or account level. 4. Test with both allowed and denied model IDs. |
| Reference |
Managing Access in AWS Organizations, Bedrock Cross-Account Guardrails |
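The SCP inspection can be sketched as a pure function over the policy document's JSON text. This is an assumption-laden illustration: it treats any Deny statement that covers bedrock:InvokeModel and carries either a NotResource element or a Condition as an allow-list, and it does not validate that the listed models are the approved ones.

```python
import json
from fnmatch import fnmatchcase


def _as_list(value):
    """IAM policy elements may be a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def has_model_allow_list(scp_json: str) -> bool:
    """True if a scoped Deny statement covers Bedrock model invocation."""
    document = json.loads(scp_json)
    for statement in _as_list(document.get("Statement")):
        if statement.get("Effect") != "Deny":
            continue
        # IAM action names are case-insensitive, so compare in lowercase.
        covers_invoke = any(
            fnmatchcase("bedrock:invokemodel", action.lower())
            for action in _as_list(statement.get("Action"))
        )
        scoped = "NotResource" in statement or bool(statement.get("Condition"))
        if covers_invoke and scoped:
            return True
    return False
```

An unscoped Deny (no NotResource and no Condition) blocks every model rather than allow-listing some, so the sketch deliberately does not count it.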
FS-13 — Model Inventory Tagging
| Field |
Detail |
| Severity |
Medium |
| Guide ref |
[Guide §1.2.12] — “Maintain a model inventory that records the provenance, version, license terms, and risk assessment status of all models in use across the organization.” |
| Description |
Verifies custom models are tagged with provenance metadata (model-source, model-version, approval-date, risk-tier). |
| Detection |
Calls bedrock:ListFoundationModels and bedrock:ListCustomModels. For custom models, calls bedrock:ListTagsForResource and checks for required tag keys: model-source, model-version, approval-date, risk-tier. |
| Remediation |
1. Define a mandatory tagging policy for all AI/ML models. 2. Tag each custom model with provenance metadata. 3. Create an AWS Config rule (required-tags) to enforce the tagging policy. 4. For foundation models, maintain an external inventory spreadsheet or CMDB entry. |
| Reference |
Bedrock Tagging |
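The tag comparison reduces to a set difference. The sketch below assumes the tag list has the lowercase key/value shape of a Bedrock ListTagsForResource response; confirm that shape against the API reference before relying on it. Treating an empty value as missing is an illustrative choice.

```python
REQUIRED_TAG_KEYS = {"model-source", "model-version", "approval-date", "risk-tier"}


def missing_tags(tags: list[dict]) -> set[str]:
    """Return required tag keys that are absent or have an empty value."""
    present = {tag.get("key") for tag in tags if tag.get("value")}
    return REQUIRED_TAG_KEYS - present
```

A custom model is flagged when missing_tags returns a non-empty set, and the set itself tells the owner which keys to add.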
FS-14 — Model Onboarding Governance
| Field |
Detail |
| Severity |
Medium |
| Guide ref |
[Guide §1.2.12] — “To onboard a model, follow these steps: Review EULA, Complete procurement, Follow security and compliance procedures, Assess MRM requirements, Document findings, Get necessary approvals from stakeholders.” |
| Description |
Checks AWS Config rules enforce model onboarding governance (EULA review, MRM assessment, stakeholder approval). |
| Detection |
Calls config:DescribeConfigRules and searches for rules targeting AWS::Bedrock::* resources or custom rules with “model” or “onboarding” in the name. |
| Remediation |
1. Create a custom AWS Config rule that checks new Bedrock custom models have required tags (approval-date, risk-tier, eula-reviewed). 2. Document the model onboarding process: EULA review → procurement → security/compliance review → MRM assessment → stakeholder sign-off. 3. Store approval artifacts in a versioned S3 bucket. |
| Reference |
AWS Config Custom Rules |
FS-15 — Adversarial Model Evaluation
| Field |
Detail |
| Severity |
Medium |
| Guide ref |
[Guide §1.2.12 — Practical guidance] — “Amazon Bedrock Evaluations can help to evaluate models against specific types of attacks by automating your test cases, scoring, reporting and to enable comparison of different models.” |
| Description |
Verifies Bedrock evaluation jobs include adversarial test datasets. |
| Detection |
Calls bedrock:ListEvaluationJobs and inspects each job’s configuration for evaluation datasets. Flags if no evaluation jobs exist or if none reference adversarial/red-team test data. |
| Remediation |
1. Create a Bedrock model evaluation job with adversarial prompt datasets (prompt injection attempts, jailbreak sequences, harmful content probes). 2. Include both automated metrics and human evaluation. 3. Run evaluations before production deployment and after model updates. 4. Store results for audit. |
| Reference |
Bedrock Model Evaluation |
FS-16 — ECR Image Scanning
| Field |
Detail |
| Severity |
High |
| Guide ref |
[Guide §1.2.12, extension] — ECR image scanning is not named in the guide, but directly mitigates the guide’s listed risk “Third-party package vulnerabilities” in LLM supply chains. Included for completeness of the supply-chain risk category. |
| Description |
Checks ECR repositories have scan-on-push enabled for supply chain security of model containers. |
| Detection |
Calls ecr:DescribeRepositories and for each repository checks imageScanningConfiguration.scanOnPush. Also checks whether Amazon Inspector ECR scanning is enabled via inspector2:BatchGetAccountStatus. Flags repositories relying solely on basic scanning or with no scanning configured. |
| Remediation |
1. Enable enhanced scanning via Amazon Inspector (the current best practice) — Inspector provides continuous vulnerability monitoring, re-scanning images automatically when new CVEs are published, and covers both OS and programming language package vulnerabilities. This requires two steps: (a) enable Inspector ECR scanning at the account level — aws inspector2 enable --account-ids <account-id> --resource-types ECR; (b) set the ECR registry scanning configuration to enhanced mode — aws ecr put-registry-scanning-configuration --scan-type ENHANCED --rules '[{"scanFrequency":"CONTINUOUS_SCAN","repositoryFilters":[{"filter":"*","filterType":"WILDCARD"}]}]'. Important limitations: (i) When enhanced scanning is first enabled, Amazon Inspector only discovers images pushed within the last 14 days — older images receive SCAN_ELIGIBILITY_EXPIRED status and must be re-pushed to be scanned. (ii) After the initial scan, scan duration is controlled by the ECR re-scan duration setting in the Amazon Inspector console (defaults to LIFETIME); if you shorten this duration, images whose last scan exceeds the new window also move to SCAN_ELIGIBILITY_EXPIRED. (iii) Enhanced scanning incurs Amazon Inspector charges (no additional ECR cost). (iv) Repositories not matching a scan filter will have Off scan frequency and won’t be scanned. 2. If enhanced scanning is not available in your region, enable basic scan-on-push as a fallback: aws ecr put-image-scanning-configuration --repository-name <name> --image-scanning-configuration scanOnPush=true. 3. Create EventBridge rules to alert on CRITICAL/HIGH findings from Inspector. 4. Integrate findings into your vulnerability management workflow. |
| Reference |
ECR Enhanced Scanning, Amazon Inspector ECR Scanning |
Training Data & Model Poisoning (FS-17 to FS-21)
Guide source: §1.2.14 Training data and model poisoning. Guide-listed mitigations:
(a) protect training datasets via data protection best practices;
(b) use trusted data sources with audit controls tracking changes (who/when);
(c) monitor training data for pattern/distribution changes (data drift);
(d) compare retrained model performance against baseline before production;
(e) rollback plan using versioned training data and models (Feature Store);
(f) monitor low-entropy classification with thresholds and alerts;
(g) AI Service Cards for evaluating third-party model testing procedures.
FS-17 — Model Monitor Data Quality → Merged into upstream SM-07
Upstream extension note (do not ship as a standalone check): The detection and remediation
content from FS-17 should be added as a refinement of the existing SM-07 (Model Monitor)
check in the upstream aws-samples/sample-aiml-security-assessment repo.
What to add to SM-07:
- Filter ListMonitoringSchedules results for MonitoringType=DataQuality (not just any schedule). Note the format difference: ListMonitoringSchedules/MonitoringScheduleSummary returns MonitoringType in PascalCase (DataQuality, ModelQuality, ModelBias, ModelExplainability); DescribeMonitoringSchedule returns the same type in SCREAMING_SNAKE_CASE (DATA_QUALITY, MODEL_QUALITY, MODEL_BIAS, MODEL_EXPLAINABILITY). The detection should normalise both forms.
- Require emit_metrics to be enabled on the monitoring schedule.
- Verify CloudWatch alarms exist on the feature_baseline_drift_<feature_name> metrics published
to namespace /aws/sagemaker/Endpoints/data-metric (real-time endpoints, dimensions
EndpointName + ScheduleName) or /aws/sagemaker/ModelMonitoring/data-metric (batch
transform, dimension MonitoringSchedule).
- Guide traceability: [Guide §1.2.14] — “Monitor your training data for pattern and distribution
changes to detect data drift”; “Amazon SageMaker Model Monitor – Data quality.”
Reference: SageMaker Model Monitor Data Quality
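One way to normalise the two MonitoringType spellings described above is to strip underscores and lowercase the value before comparing. The helper names are illustrative; MonitoringScheduleName and MonitoringType are the summary fields the filter reads.

```python
def normalise_monitoring_type(value: str) -> str:
    """Map 'DataQuality' and 'DATA_QUALITY' to the same token, 'dataquality'."""
    return value.replace("_", "").lower()


def data_quality_schedules(summaries: list[dict]) -> list[str]:
    """Return the names of schedules whose type is data quality, in either spelling."""
    return [
        summary["MonitoringScheduleName"]
        for summary in summaries
        if normalise_monitoring_type(summary.get("MonitoringType", "")) == "dataquality"
    ]
```

The same normaliser can be reused for the ModelQuality filter by comparing against "modelquality".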
FS-18 — Model Drift Detection → Merged into upstream SM-23
Upstream extension note (do not ship as a standalone check): The detection and remediation
content from FS-18 should be added as a refinement of the existing SM-23 (Model Drift
Detection) check in the upstream repo.
What to add to SM-23:
- Filter ListMonitoringSchedules results for MonitoringType=ModelQuality.
- Add a new remediation step for low-entropy classification monitoring (Guide §1.2.14
mitigation): publish custom CloudWatch metrics tracking prediction confidence distributions,
set threshold boundaries for unexpected low-confidence/high-confidence clusters, and alert
when the retrained model produces unexpected classification patterns — this can indicate
training data poisoning before accuracy metrics degrade.
- Guide traceability: [Guide §1.2.14] — “Before deploying to production, compare your retrained
model’s performance against previous iterations using historical test data as a baseline.”
Reference: SageMaker Model Monitor Model Quality
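The low-entropy monitoring step can be made concrete with Shannon entropy over each prediction's class probabilities. Using entropy as the confidence measure, and the specific threshold, are assumptions made for illustration; the guide only asks for thresholds and alerts on low-entropy classification. The resulting fraction is the kind of value one might publish as a custom CloudWatch metric and alarm on.

```python
import math


def shannon_entropy(probabilities: list[float]) -> float:
    """Entropy in bits of one prediction's class-probability vector."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def low_entropy_fraction(predictions: list[list[float]], threshold_bits: float) -> float:
    """Fraction of predictions whose entropy falls below the threshold."""
    if not predictions:
        return 0.0
    flagged = sum(1 for probs in predictions if shannon_entropy(probs) < threshold_bits)
    return flagged / len(predictions)
```

A sudden rise in this fraction after retraining means the model has become unusually certain across many inputs, which is worth investigating even when accuracy on the test set looks unchanged.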
FS-19 — Model Registry Approval → Merged into upstream SM-22
Upstream extension note (do not ship as a standalone check): The detection and remediation
content from FS-19 should be added as a refinement of the existing SM-22 (Model Approval
Workflow) check in the upstream repo.
What to add to SM-22:
- Explicitly check that ModelApprovalStatus=PendingManualApproval is the default for new
model package versions (not Approved).
- Flag any model package group where the latest version has ModelApprovalStatus=Approved
without evidence of a manual approval step (i.e., auto-approved at creation time).
- Guide traceability: [Guide §1.2.14] — cites “Amazon SageMaker AI – Model Registration and
Deployment with Model Registry” as a reference for staged deployment with rollback.
Reference: SageMaker Model Registry
FS-20 — Feature Store Rollback
| Field |
Detail |
| Severity |
Medium |
| Guide ref |
[Guide §1.2.14] — “Create a rollback plan by using versioned training data and models. This ensures that you can revert to a stable, working model if failures occur.” References “Amazon SageMaker AI Feature Store”. |
| Description |
Checks that SageMaker Feature Store feature groups have an offline store for rollback capability. |
| Detection |
Calls sagemaker:ListFeatureGroups to enumerate all groups, then sagemaker:DescribeFeatureGroup for each to inspect OfflineStoreConfig. Flags feature groups where OfflineStoreConfig is absent (online-only groups with no offline store for rollback). |
| Remediation |
1. Enable the offline store on each feature group: specify an S3 URI and data catalog in OfflineStoreConfig. 2. The offline store provides a versioned, immutable history of feature values for point-in-time rollback. 3. Test rollback by querying the offline store with a historical timestamp. |
| Reference |
SageMaker Feature Store |
FS-21 — Training Data S3 Versioning and Audit Trail
| Field |
Detail |
| Severity |
High |
| Guide ref |
[Guide §1.2.14] — “Use trusted data sources for your training data. Implement audit controls that let you track and review changes, including who made them and when they occurred.” |
| Description |
Verifies S3 buckets used for training data have versioning enabled so poisoned datasets can be rolled back. Recommends CloudTrail data-event logging as remediation to record who modified training data and when. |
| Detection |
Identifies training-data S3 buckets by naming convention (train/dataset/model/sagemaker/bedrock). Calls s3:GetBucketVersioning to verify Status=Enabled. (CloudTrail data-event logging is recommended in remediation but is not asserted by this check — verifying it is covered by the upstream BR-06 CloudTrail control and the FS-23 extension.) |
| Remediation |
1. Enable versioning: aws s3api put-bucket-versioning --bucket <name> --versioning-configuration Status=Enabled. 2. Enable CloudTrail S3 data events for the training-data buckets to capture PutObject/DeleteObject with caller identity. 3. Enable MFA Delete for critical training datasets. 4. Apply S3 Object Lock for immutable baselines. |
| Reference |
S3 Versioning, CloudTrail Data Events |
Vector & Embedding Weaknesses (FS-22 to FS-26)
Guide source: §1.2.15 Vector and embedding weaknesses. Guide-listed mitigations:
(a) apply least privilege to vector and embedding database access;
(b) validate knowledge base data sources;
(c) add data only from trusted sources to knowledge bases;
(d) monitor and log all activities in knowledge base control plane (CloudTrail);
(e) enable encryption at rest and in transit for vector and embedding databases;
(f) implement document/record-level access controls via KB metadata filtering for
multi-tenancy.
FS-22 — Knowledge Base IAM Least Privilege
| Field |
Detail |
| Severity |
High |
| Guide ref |
[Guide §1.2.15] — “Apply the principle of least privilege to control access to your vector and embedding database. Only grant users and services the minimum permissions they need to perform their tasks.” |
| Description |
Checks IAM roles accessing Knowledge Bases have no wildcard bedrock:* permissions covering KB actions. |
| Detection |
Inspects the permissions cache for all IAM roles. Flags any role with an Allow statement granting bedrock:* without resource-level restrictions, or broad bedrock: actions covering KB operations without a specific knowledge-base ARN. Note: Bedrock agent and KB operations use the single IAM service prefix bedrock: (not bedrock-agent:) — the bedrock-agent token refers to the boto3 SDK client name, not the IAM action prefix. |
| Remediation |
1. Replace wildcard bedrock:* with specific KB actions: bedrock:Retrieve, bedrock:RetrieveAndGenerate, bedrock:GetKnowledgeBase (these are the actual IAM action names — verify via the AWS Service Authorization Reference for Amazon Bedrock). 2. Scope the resource ARN to specific Knowledge Base IDs (e.g., arn:aws:bedrock:<region>:<account>:knowledge-base/<kb-id>). 3. Apply IAM permission boundaries to limit blast radius. |
| Reference |
Bedrock Knowledge Base Permissions |
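A sketch of the wildcard test over a single IAM policy document, already parsed into a dictionary. It flags Allow statements whose action pattern covers one of the KB actions named in the remediation and whose resource is "*" or a knowledge-base ARN with a wildcard ID. The helper names are illustrative, and a production check would also need to consider NotAction, NotResource and conditions.

```python
from fnmatch import fnmatchcase

# KB actions from the remediation text, lowercased for case-insensitive matching.
KB_ACTIONS = ("bedrock:retrieve", "bedrock:retrieveandgenerate", "bedrock:getknowledgebase")


def _as_list(value):
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def _covers_kb_action(action_pattern: str) -> bool:
    pattern = action_pattern.lower()
    return any(fnmatchcase(action, pattern) for action in KB_ACTIONS)


def _is_unscoped(resource: str) -> bool:
    # Either every resource, or every knowledge base.
    return resource == "*" or resource.endswith("knowledge-base/*")


def broad_kb_statements(policy_document: dict) -> list[dict]:
    """Return Allow statements granting KB actions without a specific KB ARN."""
    flagged = []
    for statement in _as_list(policy_document.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action"))
        resources = _as_list(statement.get("Resource"))
        if any(_covers_kb_action(a) for a in actions) and any(_is_unscoped(r) for r in resources):
            flagged.append(statement)
    return flagged
```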
FS-23 — Knowledge Base CloudTrail Logging → Merged into upstream BR-06
Upstream extension note (do not ship as a standalone check): The detection and remediation
content from FS-23 should be added as a refinement of the existing BR-06 (CloudTrail
Logging) check in the upstream repo.
What to add to BR-06:
- After verifying that a CloudTrail trail is active and logging Bedrock management events,
additionally check for an advanced event selector with
resources.type = AWS::Bedrock::KnowledgeBase to capture Retrieve and
RetrieveAndGenerate data events (these are NOT logged by default — they require an
explicit advanced event selector).
- Note: InvokeAgent / InvokeInlineAgent are also data events requiring
resources.type = AWS::Bedrock::AgentAlias or AWS::Bedrock::InlineAgent respectively.
Data events incur additional CloudTrail charges and can produce high volumes under load.
- Guide traceability: [Guide §1.2.15] — “Monitor and log all activities in knowledge base
control plane” with reference “Monitor Amazon Bedrock API calls using CloudTrail.”
Reference: CloudTrail Bedrock Logging
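The advanced event selector test can be sketched as a function over the AdvancedEventSelectors list returned by CloudTrail GetEventSelectors for a trail. The function name is illustrative; the Field and Equals keys are part of the CloudTrail selector structure.

```python
KB_RESOURCE_TYPE = "AWS::Bedrock::KnowledgeBase"


def logs_resource_type(advanced_event_selectors: list[dict], resource_type: str = KB_RESOURCE_TYPE) -> bool:
    """True if any selector matches data events for the given resources.type."""
    for selector in advanced_event_selectors:
        for field_selector in selector.get("FieldSelectors", []):
            if (
                field_selector.get("Field") == "resources.type"
                and resource_type in field_selector.get("Equals", [])
            ):
                return True
    return False
```

Passing "AWS::Bedrock::AgentAlias" or "AWS::Bedrock::InlineAgent" as resource_type covers the agent invocation data events mentioned in the note.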
FS-24 — Knowledge Base Metadata Filtering
| Field |
Detail |
| Severity |
Informational |
| Guide ref |
[Guide §1.2.15] — “Implement access controls at the document or record level within knowledge bases where different users or applications should only have access to specific subsets of data. Use Amazon Bedrock Knowledge Bases metadata filtering to enforce data segmentation.” |
| Description |
Advisory: verifies KB metadata fields support tenant-level filtering for multi-tenancy. |
| Detection |
Calls ListKnowledgeBases and GetKnowledgeBase (via the bedrock-agent boto3 client; IAM actions are bedrock:ListKnowledgeBases and bedrock:GetKnowledgeBase). Inspects the storage configuration for metadata field definitions. Flags KBs with no metadata fields defined (no tenant isolation possible). |
| Remediation |
1. Define metadata fields on your KB data sources (e.g., tenant_id, department, classification). 2. Populate metadata during document ingestion. 3. Use the filter parameter in Retrieve/RetrieveAndGenerate API calls to enforce tenant-scoped queries. 4. Test that cross-tenant data leakage is prevented. |
| Reference |
Bedrock KB Metadata Filtering |
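Remediation step 3 can be illustrated by building the retrievalConfiguration argument that a Retrieve call (boto3 client bedrock-agent-runtime) accepts. The tenant_id key follows the example metadata field above; the helper name and the default result count are hypothetical. Deriving the tenant value from the caller's authenticated identity, never from user-supplied input, is what makes the filter an isolation control.

```python
def tenant_retrieval_configuration(tenant_id: str, number_of_results: int = 5) -> dict:
    """Build a retrievalConfiguration that restricts results to one tenant."""
    if not tenant_id:
        # Refuse to build an unfiltered query; that would return every tenant's data.
        raise ValueError("tenant_id is required for tenant-scoped retrieval")
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": number_of_results,
            "filter": {"equals": {"key": "tenant_id", "value": tenant_id}},
        }
    }
```

The returned dictionary would be passed as the retrievalConfiguration parameter alongside knowledgeBaseId and retrievalQuery.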
FS-25 — OpenSearch Serverless Encryption
| Field |
Detail |
| Severity |
High |
| Guide ref |
[Guide §1.2.15] — “Enable encryption at rest and in transit for vector and embedding databases.” |
| Description |
Checks OpenSearch Serverless collections used by KBs have CMK encryption policies. |
| Detection |
Calls ListCollections and ListSecurityPolicies(type=encryption) via the opensearchserverless boto3 client (IAM actions aoss:ListCollections and aoss:ListSecurityPolicies; the IAM service prefix is aoss:, not opensearchserverless:), then retrieves each policy document via GetSecurityPolicy (IAM action aoss:GetSecurityPolicy), because ListSecurityPolicies returns summaries without the document. Inspects each encryption policy document for AWSOwnedKey=true or a missing KmsARN, and flags collections using AWS-owned keys instead of customer-managed KMS keys. Note: the encryption policy JSON document uses PascalCase field names (AWSOwnedKey and KmsARN), while the direct API EncryptionConfig struct uses camelCase (aWSOwnedKey, kmsKeyArn); detection should inspect the policy document form. Note also: encryption in transit is automatic (TLS 1.2, AES-256) for all OpenSearch Serverless traffic and is not configurable, so this check focuses on encryption at rest. |
| Remediation |
1. Create an encryption security policy specifying a customer-managed KMS key: set AWSOwnedKey=false and provide KmsARN with the ARN of your KMS key. 2. Apply the policy to the collection by matching the collection name or prefix pattern in the policy Rules. 3. Ensure the KMS key policy grants the OpenSearch Serverless service principal kms:Decrypt and kms:GenerateDataKey. Note: if you provide a KMS key directly in the CreateCollection request, it takes precedence over any matching security policies. |
| Reference |
OpenSearch Serverless Encryption |
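A sketch of the policy-document test, using the PascalCase field names described in the Detection row. It accepts the document either as JSON text or as an already-parsed dictionary, since callers may hold it in either form; the function name is illustrative.

```python
import json
from typing import Union


def uses_customer_managed_key(policy_document: Union[str, dict]) -> bool:
    """True if the encryption policy names a KMS key and opts out of AWS-owned keys."""
    document = json.loads(policy_document) if isinstance(policy_document, str) else policy_document
    return document.get("AWSOwnedKey") is False and bool(document.get("KmsARN"))
```

Collections whose matching encryption policy makes this function return False would be flagged.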
FS-26 — Knowledge Base VPC Access
| Field |
Detail |
| Severity |
High |
| Guide ref |
[Guide §1.2.15, extension] — network isolation is not verbatim in the guide but directly implements “Apply the principle of least privilege to control access to your vector and embedding database” at the network layer. |
| Description |
Verifies OpenSearch Serverless collections have VPC-only network policies (no public access). |
| Detection |
Calls ListSecurityPolicies(type=network) via the opensearchserverless boto3 client (IAM action aoss:ListSecurityPolicies; the service prefix for OpenSearch Serverless is aoss, not opensearchserverless), retrieves each policy document via GetSecurityPolicy, and inspects each policy rule for AllowFromPublic=true. Flags collections accessible from the public internet. Note: a policy with AllowFromPublic=false may still grant private access to Bedrock via SourceServices: ["bedrock.amazonaws.com"] or to specific VPC endpoints via SourceVPCEs; these are the recommended private-access patterns and are not flagged. |
| Remediation |
1. Create a network security policy that restricts access to specific VPC endpoints only via SourceVPCEs, or grants private AWS service access (e.g., Bedrock) via SourceServices: ["bedrock.amazonaws.com"]. Per AWS docs, private access to AWS services applies only to the collection’s OpenSearch endpoint, not to the OpenSearch Dashboards endpoint. 2. Create an OpenSearch Serverless VPC endpoint in your VPC if VPC-private access is required. 3. Remove any policy rules with AllowFromPublic=true. 4. Test connectivity from within the VPC. |
| Reference |
OpenSearch Serverless Network Access |
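The network-policy inspection can be sketched as below. A network policy document is a JSON array of entries, each with Rules and an AllowFromPublic flag; the function collects the resource patterns of every entry that allows public access. Entries that use SourceVPCEs or SourceServices with AllowFromPublic=false are left alone, matching the note in the Detection row. The function name is illustrative.

```python
import json
from typing import Union


def publicly_exposed_resources(network_policy: Union[str, list]) -> list[str]:
    """Return resource patterns reachable from the public internet."""
    entries = json.loads(network_policy) if isinstance(network_policy, str) else network_policy
    exposed = []
    for entry in entries:
        if entry.get("AllowFromPublic") is True:
            for rule in entry.get("Rules", []):
                exposed.extend(rule.get("Resource", []))
    return exposed
```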
Part 2 — Guardrails & Content Safety (FS-27 to FS-46)
Guide risk categories: Non-Compliant Output (FS-27..30, §1.2.1), Misinformation (FS-31..34, §1.2.3; FS-34 sources from §1.2.12 — see note), Abusive or Harmful Output (FS-35..38, §1.2.4), Biased Output (FS-39..42, §1.2.5), Sensitive Information Disclosure (FS-43..46, §1.2.6).
Non-Compliant Output (FS-27 to FS-30)
Guide source: §1.2.1 Non-compliant output. Guide-listed mitigations:
(a) prompt engineering to guide the model and prevent unwanted responses;
(b) content filters and denied topics in Bedrock Guardrails;
(c) RAG with Bedrock Knowledge Bases;
(d) Automated Reasoning checks in Bedrock Guardrails;
(e) human-in-the-loop validation for internal AI systems;
(f) audit logs of AI-generated outputs and guardrails applied for regulatory reporting.
FS-27 — Automated Reasoning Checks
| Field |
Detail |
| Severity |
High (contextual grounding) / Medium (Automated Reasoning) |
| Guide ref |
[Guide §1.2.1, §1.2.7] — “Automated Reasoning checks in Amazon Bedrock Guardrails uses automated reasoning to verify that natural language content complies with your defined policies. This mathematical verification helps ensure that your content strictly follows your guardrails.” |
| Description |
Verifies Bedrock Guardrails have Automated Reasoning checks or contextual grounding enabled. |
| Detection |
Calls bedrock:ListGuardrails and bedrock:GetGuardrail for each. Inspects the response fields contextualGroundingPolicy and automatedReasoningPolicy. Flags guardrails with neither enabled. |
| Remediation |
1. Enable contextual grounding filters (type GROUNDING) with a threshold ≥ 0.7 — these filters CAN block content that fails grounding checks. Note: valid threshold values are 0 to 0.99; a threshold of 1.0 is invalid and will block all content. Important use-case limitation: Contextual grounding checks support summarization, paraphrasing, and question answering use cases only — Conversational QA / Chatbot use cases are not supported. If your FinServ application is a conversational chatbot, contextual grounding cannot be used for hallucination detection; use Automated Reasoning checks or human-in-the-loop validation instead. 2. If available in your region, additionally enable Automated Reasoning checks by creating an Automated Reasoning policy and attaching it to the guardrail. Cross-Region inference is REQUIRED for AR: Guardrails that use Automated Reasoning checks require a cross-Region inference profile — set crossRegionConfig.guardrailProfileIdentifier to a profile matching your Region (for example, us.guardrail.v1:0 for US Regions or eu.guardrail.v1:0 for EU Regions). Omitting this parameter returns ValidationException. As of April 2026, AR is generally available in US East (N. Virginia), US East (Ohio), US West (Oregon), EU (Frankfurt), EU (Ireland), and EU (Paris) — verify current regional availability on the AR documentation page before audit reliance, as AWS regularly expands coverage. Attach the versioned policy ARN (for example, ...:1) — the unversioned ARN returns an error. You can attach a maximum of 2 AR policies per guardrail. Important: Automated Reasoning operates in detect mode only — it returns findings and feedback but does NOT block content. 
AR finding types (per the AWS user guide) are: VALID (response is consistent with policy), INVALID (response contradicts policy rules), SATISFIABLE (response could be true or false depending on unstated conditions), IMPOSSIBLE (premises are contradictory), TRANSLATION_AMBIGUOUS (natural language could not be reliably translated to formal logic), TOO_COMPLEX (policy complexity exceeded processing limits), and NO_TRANSLATIONS (some or all input was not translated into logic due to irrelevance or lack of matching policy variables). Note: in the AutomatedReasoningCheckFinding runtime response, these appear as a union with lowercase camelCase keys (valid, invalid, satisfiable, impossible, translationAmbiguous, tooComplex, noTranslations) — exactly one key is present per finding. Per AWS docs, AR also does not protect against prompt injection attacks, cannot detect off-topic responses, does not support streaming APIs, and supports English (US) only — use content filters, topic policies, and other guardrail components alongside AR. Critical limitation for cross-account enforcement: AR policies are NOT supported with Bedrock Guardrails cross-account safeguards (org-level or account-level enforcement) — including an AR policy in a guardrail used for enforcement will cause runtime failures. If you rely on AR, configure it at the application or account level separately. Your application must inspect the AR findings via the ApplyGuardrail (or Converse / InvokeModel / InvokeAgent / RetrieveAndGenerate) API response and decide whether to serve the response, rewrite it using AR feedback, ask the user for clarification, or fall back to a default behavior. 3. For INVALID responses, implement an iterative rewriting loop that feeds AR feedback (contradicting rules) back to the LLM to self-correct. 4. Build an audit trail of all AR validation iterations — log supportingRules and claimsTrueScenario for VALID findings as mathematically verifiable compliance evidence. |
| Reference |
Automated Reasoning in Bedrock Guardrails, AR Checks Concepts (Validation Results Reference), Integrate AR Checks in Your Application, Deploy Automated Reasoning Policy |
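Because AR only detects and never blocks, the application has to turn findings into an action. The following is a minimal sketch of that decision step, assuming each finding is a dict carrying exactly one of the seven union keys listed above. The function names and the mapping from finding type to action (`serve`, `rewrite`, `clarify`, `fallback`) are hypothetical choices for illustration, not behavior defined by AWS.

```python
from typing import Iterable

# The seven union keys named in the remediation text; one per finding.
AR_FINDING_KEYS = (
    "valid", "invalid", "satisfiable", "impossible",
    "translationAmbiguous", "tooComplex", "noTranslations",
)


def finding_type(finding: dict) -> str:
    """Return the single union key present on one AR finding."""
    present = [key for key in AR_FINDING_KEYS if key in finding]
    if len(present) != 1:
        raise ValueError(f"expected exactly one finding key, got {present}")
    return present[0]


def decide(findings: Iterable[dict]) -> str:
    """Map a list of AR findings to a (hypothetical) application action."""
    kinds = {finding_type(f) for f in findings}
    if "invalid" in kinds:
        return "rewrite"    # feed the contradicting rules back to the LLM
    if kinds & {"satisfiable", "translationAmbiguous"}:
        return "clarify"    # ask the user for the unstated conditions
    if kinds == {"valid"}:
        return "serve"      # every finding is consistent with the policy
    return "fallback"       # empty, impossible, tooComplex, noTranslations
```

A caller would log each finding before acting on the returned value, so that the audit trail in step 4 covers every iteration of the rewriting loop.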
FS-28 — Financial Denied Topics
| Field |
Detail |
| Severity |
High |
| Guide ref |
[Guide §1.2.1] — “Configure content filters and guardrails to restrict model responses to approved topics” with reference “Amazon Bedrock User Guide – Guardrails – Denied topics”. |
| Description |
Checks guardrails have denied topics for regulated financial advice. |
| Detection |
Calls bedrock:GetGuardrail and inspects topicPolicy.topics for entries with type=DENY. Flags guardrails with no denied topics or with no topics related to financial advice, investment recommendations, or tax guidance. |
| Remediation |
1. Add denied topics to the guardrail following the AWS best-practice golden rules: (a) Be crisp and precise: e.g., “Investment advice is inquiries, guidance, or recommendations about the management or allocation of funds or assets with the goal of generating returns or achieving specific financial objectives” rather than a vague “Investment advice”. (b) Define, don’t instruct: write “All content associated with specific investment recommendations”, not “Block all investment advice”. (c) Stay positive: never define topics negatively (e.g., avoid “All content except general financial education”). (d) Focus on themes, not words: denied topics capture subjects contextually; use word filters for specific names or entities. (e) Provide sample phrases: add up to 5 representative inputs per topic (each up to 100 characters). 2. Quantity and character limits: a guardrail can contain a maximum of 30 denied topics. In Classic tier, topic definitions are limited to 200 characters; in Standard tier, up to 1,000 characters, so use Standard tier for complex financial topic definitions. 3. Recommended denied topics for FinServ: “specific investment recommendations”, “tax advice”, “specific financial product recommendations”, “guaranteed returns or performance claims”. 4. For multi-account enforcement, use Bedrock cross-account safeguards to apply denied topics from a management-account guardrail across all member accounts automatically. When configuring account-level or org-level enforcement, set both selectiveContentGuarding.messages AND selectiveContentGuarding.system to COMPREHENSIVE to ensure guardrails evaluate all user messages AND system prompts regardless of input tags; use SELECTIVE only when you trust callers to correctly tag content. Setting only messages to COMPREHENSIVE leaves system prompts potentially unguarded. 5. Enforce guardrails via IAM policy conditions (bedrock:GuardrailIdentifier) to prevent any Bedrock inference call without a guardrail attached. 6. Test with prompts that attempt to elicit regulated financial advice. |
| Reference |
Bedrock Guardrails Denied Topics, Safeguard Tiers for Guardrails, Cross-Account Safeguards with Enforcements, Guardrails Best Practices |
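The detection logic above can be sketched as a pure function over a GetGuardrail response. This is an illustrative sketch only: the keyword list is a hypothetical heuristic for "related to financial advice", and the field names (`topicPolicy.topics`, `type`, `name`, `definition`) follow the response shape described in the Detection row.

```python
# Hypothetical keyword heuristic; tune it to your own topic wording.
FINANCIAL_KEYWORDS = (
    "investment", "tax", "financial advice",
    "financial product", "guaranteed return",
)


def denied_topic_findings(guardrail: dict) -> list[str]:
    """Return findings for a guardrail dict shaped like a GetGuardrail response."""
    topics = (guardrail.get("topicPolicy") or {}).get("topics", [])
    denied = [t for t in topics if t.get("type") == "DENY"]
    if not denied:
        return ["no denied topics configured"]
    # Search topic names and definitions together, case-insensitively.
    text = " ".join(
        f"{t.get('name', '')} {t.get('definition', '')}".lower() for t in denied
    )
    if not any(keyword in text for keyword in FINANCIAL_KEYWORDS):
        return ["no denied topic appears to cover regulated financial advice"]
    return []
```

A keyword match only shows that a topic mentions the subject; whether the definition is precise enough still needs the prompt testing in step 6.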
FS-29 — Compliance Disclaimer
| Field |
Detail |
| Severity |
Informational |
| Guide ref |
[Guide §1.2.1, extension] — disclaimers are not verbatim in §1.2.1 but the guide references “Implement response disclaimers in customer-facing applications” under §1.2.7 Hallucination, which is conceptually the same control applied here for non-compliant financial-advice output. |
| Description |
Advisory: verifies application adds required regulatory disclaimers to AI-generated outputs. |
| Detection |
Advisory check — cannot be fully automated. Inspects application Lambda function environment variables or configuration for disclaimer-related settings (e.g., DISCLAIMER_ENABLED, COMPLIANCE_FOOTER). |
| Remediation |
1. Add a standard regulatory disclaimer to all customer-facing AI-generated responses (e.g., “This information is generated by AI and does not constitute financial advice. Please consult a qualified financial advisor.”). 2. Make the disclaimer text configurable via environment variable or parameter store. 3. Ensure disclaimers are not removable by prompt manipulation. |
| Reference |
AWS Well-Architected GenAI Lens — Guardrails |
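One way to keep the disclaimer out of reach of prompt manipulation is to append it in application code after the model has responded, rather than asking the model to produce it. The sketch below assumes the two environment variable names mentioned in the Detection row (`DISCLAIMER_ENABLED`, `COMPLIANCE_FOOTER`); the function name and the default-on behavior are illustrative assumptions.

```python
import os

DEFAULT_DISCLAIMER = (
    "This information is generated by AI and does not constitute "
    "financial advice. Please consult a qualified financial advisor."
)


def with_disclaimer(model_output: str, env=os.environ) -> str:
    """Append the configured disclaimer server-side, after generation."""
    # Assumption: the disclaimer is on unless explicitly disabled.
    if env.get("DISCLAIMER_ENABLED", "true").lower() == "false":
        return model_output
    footer = env.get("COMPLIANCE_FOOTER", DEFAULT_DISCLAIMER)
    body = model_output.rstrip()
    if body.endswith(footer):
        return body  # already present, do not duplicate
    return f"{body}\n\n{footer}"
```

Since the footer is added outside the prompt, nothing the user types can instruct the model to omit it.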
FS-30 — Compliance Evaluation Datasets
| Field |
Detail |
| Severity |
Informational |
| Guide ref |
[Guide §1.2.1, extension] — the Guide §1.2.12 practical guidance mentions “Amazon Bedrock Evaluations can help to evaluate models against specific types of attacks”; this check extends that concept to compliance-specific evaluation for FS-regulated outputs. |
| Description |
Checks Bedrock evaluation jobs use compliance-specific test datasets. |
| Detection |
Calls bedrock:ListEvaluationJobs to enumerate existing jobs, then calls bedrock:GetEvaluationJob for each to inspect the full evaluationConfig including dataset configuration. Flags if no evaluation jobs exist or if none reference compliance/regulatory test data. Note: ListEvaluationJobs returns only job summaries — dataset configuration details require GetEvaluationJob. |
| Remediation |
1. Create a compliance-specific evaluation dataset containing: prompts requesting regulated financial advice, prompts testing disclaimer presence, prompts testing denied-topic enforcement. 2. Run Bedrock evaluation jobs with this dataset before each production deployment. 3. Set pass/fail thresholds and gate deployments on results. |
| Reference |
Bedrock Model Evaluation |
Misinformation (FS-31 to FS-34)
Guide source: §1.2.3 Misinformation through inadvertent or malicious action. Guide-listed mitigations:
(a) prompt engineering;
(b) verify knowledge base data sources are up-to-date, accurate, reliable, and complete;
(c) human-in-the-loop validation for internal AI systems;
(d) source attribution in RAG responses for end users to verify provenance;
(e) integrity monitoring on knowledge base data sources — e.g., S3 event notifications to
track document changes.
FS-31 — Knowledge Base Data Source Sync
| Field |
Detail |
| Severity |
Medium |
| Guide ref |
[Guide §1.2.3, §1.2.10] — “Verify that your knowledge base data sources are up-to-date, accurate, reliable, and complete”; “Sync your data with your Amazon Bedrock knowledge base”. |
| Description |
Verifies KB data sources have been synced within 7 days. |
| Detection |
Calls ListDataSources then ListIngestionJobs for each data source (via the bedrock-agent boto3 client; IAM actions are bedrock:ListDataSources and bedrock:ListIngestionJobs). Checks the most recent successful ingestion job’s updatedAt timestamp. Flags data sources not synced within 7 days. |
| Remediation |
1. Create an EventBridge scheduled rule to trigger KB data source sync at least weekly. 2. Use StartIngestionJob (IAM action bedrock:StartIngestionJob) as the rule target. 3. Add CloudWatch alarms for failed ingestion jobs. 4. For rapidly changing data, increase sync frequency. |
| Reference |
Bedrock KB Data Source Sync |
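The 7-day staleness test can be sketched as follows. The sketch assumes the input is the `ingestionJobSummaries` list returned by ListIngestionJobs, that a successful job has `status` equal to `COMPLETE`, and that `updatedAt` is a timezone-aware datetime (as boto3 returns). The function names are illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def last_successful_sync(job_summaries: list[dict]) -> Optional[datetime]:
    """Return the updatedAt of the newest COMPLETE ingestion job, if any."""
    done = [j["updatedAt"] for j in job_summaries if j.get("status") == "COMPLETE"]
    return max(done) if done else None


def is_stale(job_summaries: list[dict],
             now: Optional[datetime] = None,
             max_age: timedelta = timedelta(days=7)) -> bool:
    """True when the data source has no successful sync within max_age."""
    now = now or datetime.now(timezone.utc)
    last = last_successful_sync(job_summaries)
    return last is None or now - last > max_age
```

A data source with only failed or in-progress jobs is treated as stale, which matches the intent of checking the most recent successful job.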
FS-32 — Source Attribution
| Field |
Detail |
| Severity |
Informational |
| Guide ref |
[Guide §1.2.3, §1.2.10] — “Use source attribution in RAG-based response for end users to verify provenance of information” (§1.2.3); “Use source attribution in RAG-based response for end users to verify currency of information” (§1.2.10). |
| Description |
Advisory: verifies application implements source citations in RAG responses. |
| Detection |
Advisory check — inspects application code or configuration for use of the citations field in RetrieveAndGenerate API responses. Checks Lambda environment variables for attribution-related settings. |
| Remediation |
1. Use the RetrieveAndGenerate API (IAM action bedrock:RetrieveAndGenerate) which returns citations with source document references. Each citation contains retrievedReferences — an array where each reference has a content object (the cited text), a location object (data source type and URI — for S3 sources, location.type=S3 and location.s3Location.uri contains the S3 URI), and optional metadata (a string-to-JSON map with any custom metadata attributes stored on the chunk, which can hold document title and other fields). Note: there is no fixed title field in the API — if you need to display document titles to end users, store them as a metadata attribute during KB ingestion and retrieve them via retrievedReferences[].metadata. 2. Display source citations to end users alongside AI-generated responses. 3. Include the data source location (URI or other location identifier depending on source type: S3, Web, Confluence, SharePoint, Salesforce, Kendra, SQL, or Custom) and the cited text excerpt (from content). 4. If document titles are required, ensure they are populated in KB metadata and propagated to your UI. 5. Allow users to click through to the original source document where possible. |
| Reference |
Bedrock RetrieveAndGenerate API |
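To make the citation structure concrete, the sketch below flattens a RetrieveAndGenerate response into rows a UI could render. It follows the field paths described in the Remediation row; the `title` metadata key is a hypothetical custom attribute that only exists if you stored it during ingestion, and only the S3 location type is handled here.

```python
def flatten_citations(response: dict) -> list[dict]:
    """Flatten citations[].retrievedReferences[] into display rows."""
    rows = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            location = ref.get("location", {})
            metadata = ref.get("metadata", {})
            rows.append({
                "excerpt": ref.get("content", {}).get("text", ""),
                "source_type": location.get("type"),
                # Only S3 is resolved here; other source types need their own key.
                "uri": location.get("s3Location", {}).get("uri"),
                # Hypothetical custom attribute; there is no fixed title field.
                "title": metadata.get("title"),
            })
    return rows
```

Rows with a `title` of `None` indicate chunks whose metadata was not populated at ingestion time, which is the gap step 4 asks you to close.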
FS-33 — Knowledge Base Integrity Monitoring
| Field |
Detail |
| Severity |
High (deleted bucket) / Medium (versioning) |
| Guide ref |
[Guide §1.2.3] — “Use integrity monitoring on knowledge base data sources to detect unauthorized modifications. Track changes to documents used in knowledge bases.” References “For example on S3 data sources use Amazon S3 event notification to track changes to documents.” |
| Description |
Checks KB data source S3 buckets have versioning enabled and S3 event notifications (EventBridge, SNS, SQS, or Lambda) configured to detect unauthorized document modifications in real time. |
| Detection |
Identifies KB data-source S3 buckets via GetDataSource (via the bedrock-agent boto3 client; IAM action bedrock:GetDataSource). Calls s3:GetBucketVersioning to verify Status=Enabled. Calls s3:GetBucketNotificationConfiguration and checks for EventBridgeConfiguration, TopicConfigurations, QueueConfigurations, or LambdaFunctionConfigurations. Flags buckets missing either control. |
| Remediation |
1. Enable versioning: aws s3api put-bucket-versioning --bucket <name> --versioning-configuration Status=Enabled. 2. Enable EventBridge notifications on the bucket: aws s3api put-bucket-notification-configuration --bucket <name> --notification-configuration '{"EventBridgeConfiguration":{}}'. Once enabled, S3 automatically sends all bucket events to EventBridge — you do not select specific event types at the bucket level. 3. Create an EventBridge rule that matches S3 events for this bucket — use the detail-type field values Object Created and Object Deleted (these are the EventBridge event type names; note: s3:ObjectCreated:* and s3:ObjectRemoved:* are the legacy SNS/SQS/Lambda notification event type names and are NOT used in EventBridge rules). Route matched events to an SNS topic or Lambda function for alerting. 4. Integrate alerts into your security incident response workflow. |
| Reference |
S3 EventBridge Integration |
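The two halves of this check can be sketched as below: a helper that builds the EventBridge event pattern from step 3, and a function that evaluates the versioning and notification responses from the Detection row. The function names and finding strings are illustrative, and the bucket name in the usage is a placeholder.

```python
import json

# Legacy notification targets are lists; EventBridge is an (often empty) object.
LIST_TARGET_KEYS = (
    "TopicConfigurations", "QueueConfigurations", "LambdaFunctionConfigurations",
)


def s3_change_event_pattern(bucket_name: str) -> str:
    """Build an EventBridge pattern matching object create/delete events."""
    pattern = {
        "source": ["aws.s3"],
        "detail-type": ["Object Created", "Object Deleted"],
        "detail": {"bucket": {"name": [bucket_name]}},
    }
    return json.dumps(pattern)


def integrity_findings(versioning: dict, notifications: dict) -> list[str]:
    """Evaluate GetBucketVersioning / GetBucketNotificationConfiguration output."""
    findings = []
    if versioning.get("Status") != "Enabled":
        findings.append("versioning not enabled")
    has_eventbridge = "EventBridgeConfiguration" in notifications
    has_targets = any(notifications.get(key) for key in LIST_TARGET_KEYS)
    if not (has_eventbridge or has_targets):
        findings.append("no event notifications configured")
    return findings
```

The pattern string can be passed as the `EventPattern` of an EventBridge rule; a suspended or never-enabled bucket both surface as "versioning not enabled".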
Note: FS-34 (Third-Party Risk Management for FM Providers) is kept adjacent to Misinformation
in this file for continuity with the prior draft numbering, but its guide source is §1.2.12
Supply Chain Vulnerabilities. Treat FS-34 as a Supply Chain check for compliance-framework
mapping purposes.
FS-34 — Third-Party Risk Management (TPRM) for Foundation Model Providers
| Field |
Detail |
| Severity |
Medium |
| Guide ref |
[Guide §1.2.12] — “Update existing third-party risk management processes to continuously monitor model providers and third-party dependencies, including tracking vendor security advisories, model deprecation notices, and change to terms and conditions.” (Note: moved from the Misinformation section in the prior draft; the guide places TPRM under Supply Chain.) |
| Description |
Verifies a documented third-party risk management (TPRM) process exists to monitor FM providers for security advisories, model deprecation notices, and T&C changes; also flags legacy FMs currently in use. |
| Detection |
Calls bedrock:ListFoundationModels, then bedrock:GetFoundationModel for each in-use model; inspects modelLifecycle.status and flags models with status LEGACY. Note: the FoundationModelLifecycle.status API field has only two valid values — ACTIVE and LEGACY. There is no EOL status value in the API; models that have passed their EOL date are removed from the service entirely and API calls referencing them will fail. The user-facing lifecycle page describes three conceptual states (Active, Legacy, EOL) but the API only exposes two. Advisory component checks for evidence of a TPRM process — e.g., an AWS Config rule or a tag on Bedrock resources indicating periodic review (tprm-last-reviewed=<ISO-date>). |
| Remediation |
1. Establish a documented TPRM process: at least quarterly review of each in-use FM provider’s security advisories, model lifecycle announcements, and T&C changes. 2. Assign an owner for the TPRM process and record review evidence in your MRM system. 3. Subscribe to Amazon Bedrock model lifecycle notifications. 4. Migrate workloads from LEGACY models to active versions before their published EOL date. Note that for models with EOL dates after February 1, 2026, there is a “public extended access” period where Legacy models remain usable but at higher pricing set by the model provider. 5. For third-party models procured via AWS Marketplace or consumed directly, evaluate the provider’s own testing procedures; AWS AI Service Cards provide this transparency for Amazon-trained models. |
| Reference |
Bedrock Model Lifecycle, Access Amazon Bedrock foundation models |
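The automated part of this check reduces to filtering model summaries by lifecycle status. The sketch below assumes the input is the `modelSummaries` list from ListFoundationModels and that `in_use_ids` is a set of model IDs you have derived elsewhere (for example from invocation logs); both names are illustrative.

```python
def legacy_models_in_use(model_summaries: list[dict], in_use_ids: set[str]) -> list[str]:
    """Return in-use model IDs whose lifecycle status is LEGACY."""
    return sorted(
        m["modelId"]
        for m in model_summaries
        if m["modelId"] in in_use_ids
        and m.get("modelLifecycle", {}).get("status") == "LEGACY"
    )
```

Because EOL models are removed from the service, an in-use ID that is absent from the summaries altogether is worth flagging separately.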
Abusive or Harmful Output (FS-35 to FS-38)
Guide source: §1.2.4 Model output is abusive or harmful. Guide-listed mitigations:
(a) AWS AI Service Cards to understand how Amazon addresses toxicity per model;
(b) Amazon Bedrock Guardrails to detect and filter harmful content;
(c) FMEval to evaluate for inappropriate content (sexual, profanity, hate, aggression,
insults, flirtation, identity attacks, threats);
(d) user reporting mechanism so end users can flag abusive outputs, reviewed within a
defined process;
(e) Practical guidance: create allowlists for approved business terminology to reduce
false positives on brand, product, industry, and technical vocabulary.
FS-35 — FMEval Harmful Content
| Field |
Detail |
| Severity |
Informational |
| Guide ref |
[Guide §1.2.4] — “Foundation Model Evaluations (FMEval) evaluates your model to detect inappropriate content, including sexual references, profanity, hate speech, aggression, insults, flirtation, identity-based attacks, and threats.” |
| Description |
Checks Bedrock evaluation jobs test for harmful/toxic content. |
| Detection |
Calls bedrock:ListEvaluationJobs to enumerate existing jobs, then calls bedrock:GetEvaluationJob for each to inspect the full evaluationConfig. The correct metric name depends on the evaluation job type: (a) For automated model evaluation jobs (pre-computed metrics), the toxicity metric is "Builtin.Toxicity", the only valid harmful-content metric for this job type. (b) For judge-based model evaluation jobs (LLM-as-judge), the harmful content metrics are "Builtin.Harmfulness" and "Builtin.Stereotyping". (c) For knowledge base (RAG) evaluation jobs, "Builtin.Harmfulness" and "Builtin.Stereotyping" are also valid. Flags if no evaluation jobs exist or none include a harmful-content metric (Builtin.Toxicity for automated, Builtin.Harmfulness for judge-based/RAG). Note: ListEvaluationJobs returns only job summaries; metric and dataset configuration details require GetEvaluationJob. |
| Remediation |
1. For automated model evaluation (fastest, no judge model required): create a Bedrock evaluation job with "Builtin.Toxicity" in the metricNames array. Valid task types are Summarization, Classification, QuestionAndAnswer, Generation, and Custom. 2. For judge-based model evaluation (more nuanced, requires a judge model): create a Bedrock evaluation job with "Builtin.Harmfulness" and/or "Builtin.Stereotyping" in the metricNames array — these metrics are only valid for judge-based and RAG evaluation jobs, not automated model evaluation jobs. 3. Include test prompts designed to elicit harmful content. 4. Set pass/fail thresholds based on the scores returned. 5. Run evaluations before production deployment and after model updates. 6. For more granular toxicity scoring (the 7-category UnitaryAI Detoxify-unbiased scores: toxicity, severe_toxicity, obscene, threat, insult, sexual_explicit, identity_attack — or the Toxigen-roberta binary classifier), use SageMaker FMEval via SageMaker Studio or the fmeval Python library as a complementary evaluation path. |
| Reference |
Bedrock Model Evaluation Metrics, SageMaker FMEval Toxicity |
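The metric inspection can be sketched as a function over GetEvaluationJob responses. The nested path used here (`evaluationConfig.automated.datasetMetricConfigs[].metricNames`) is an assumption about the response shape and should be verified against the API reference; the metric set mirrors the flag rule in the Detection row, and `Builtin.Stereotyping` can be added if you want to count it as coverage.

```python
# Metrics the Detection row accepts as harmful-content coverage.
HARMFUL_METRICS = {"Builtin.Toxicity", "Builtin.Harmfulness"}


def harmful_metrics_in_job(job: dict) -> set[str]:
    """Collect harmful-content metric names configured on one evaluation job."""
    configs = (
        job.get("evaluationConfig", {})
        .get("automated", {})
        .get("datasetMetricConfigs", [])
    )
    found: set[str] = set()
    for config in configs:
        found |= HARMFUL_METRICS.intersection(config.get("metricNames", []))
    return found


def has_harmful_content_coverage(jobs: list[dict]) -> bool:
    """True when at least one job includes a harmful-content metric."""
    return any(harmful_metrics_in_job(job) for job in jobs)
```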
FS-36 — Guardrail Content Filters
| Field |
Detail |
| Severity |
High |
| Guide ref |
[Guide §1.2.4] — “Use Amazon Bedrock’s guardrails to detect and filter harmful content.” |
| Description |
Verifies guardrails have content filters for hate, violence, sexual, and other harmful content. |
| Detection |
Calls bedrock:GetGuardrail and inspects contentPolicy.filters. Flags guardrails missing filters for HATE, VIOLENCE, SEXUAL, INSULTS, or MISCONDUCT categories. Also checks that inputStrength and outputStrength are at least MEDIUM. |
| Remediation |
1. Update the guardrail to include content filters for all harmful categories: HATE, VIOLENCE, SEXUAL, INSULTS, MISCONDUCT. 2. Select the Standard tier (not Classic) for content filters — it offers better accuracy, broader language support (extensive multilingual support vs. English/French/Spanish only in Classic), prompt leakage detection, and extends protection to harmful content within code elements. Standard tier requires cross-Region inference to be enabled on the guardrail (configurable at creation or by modifying an existing guardrail). 3. Start with HIGH filter strength for customer-facing applications; evaluate false-positive rates on representative sample traffic and lower to MEDIUM only if necessary. 4. Apply filters to both INPUT and OUTPUT. 5. Before enabling blocking in production, use detect mode (action=NONE) to test guardrail behavior on live traffic — review trace output to validate decisions, then switch to action=BLOCK once confident. 6. Enforce guardrails organization-wide via IAM policy-based enforcement: add an IAM condition key (bedrock:GuardrailIdentifier) to deny any InvokeModel/Converse call that does not include a guardrail. For account-level or org-level enforcement configurations, set both selectiveContentGuarding.messages AND selectiveContentGuarding.system to COMPREHENSIVE to ensure guardrails evaluate all user messages AND system prompts regardless of input tags (use SELECTIVE only when you trust callers to correctly tag content). Setting only messages to COMPREHENSIVE leaves system prompts potentially unguarded. |
| Reference |
Bedrock Guardrails Content Filters, Safeguard Tiers for Guardrails, Cross-Account Safeguards with Enforcements, Guardrails Best Practices, IAM Guardrail Enforcement |
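The category and strength test described in the Detection row can be sketched as follows, operating on a GetGuardrail response dict. The strength ordering (NONE, LOW, MEDIUM, HIGH) and the finding strings are illustrative; the minimum of MEDIUM comes from the Detection row.

```python
REQUIRED_CATEGORIES = ("HATE", "VIOLENCE", "SEXUAL", "INSULTS", "MISCONDUCT")
STRENGTH_RANK = {"NONE": 0, "LOW": 1, "MEDIUM": 2, "HIGH": 3}


def content_filter_findings(guardrail: dict, minimum: str = "MEDIUM") -> list[str]:
    """Flag missing categories and input/output strengths below the minimum."""
    filters = {
        f["type"]: f
        for f in (guardrail.get("contentPolicy") or {}).get("filters", [])
    }
    findings = []
    for category in REQUIRED_CATEGORIES:
        config = filters.get(category)
        if config is None:
            findings.append(f"{category}: filter missing")
            continue
        for side in ("inputStrength", "outputStrength"):
            # A missing or unknown strength is treated as NONE.
            if STRENGTH_RANK.get(config.get(side, "NONE"), 0) < STRENGTH_RANK[minimum]:
                findings.append(f"{category}: {side} below {minimum}")
    return findings
```

Passing `minimum="HIGH"` applies the stricter starting point recommended for customer-facing applications in step 3.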
FS-37 — User Feedback Mechanism
| Field |
Detail |
| Severity |
Informational |
| Guide ref |
[Guide §1.2.4] — “Implement a user reporting mechanism that allows end users to flag abusive or harmful outputs. Reported incidents [are] reviewed within a defined process to refine content filters.” |
| Description |
Advisory: verifies application has a user reporting mechanism for harmful outputs. |
| Detection |
Advisory check — inspects application configuration for feedback-related settings (e.g., FEEDBACK_ENABLED, REPORT_ABUSE_ENDPOINT). Checks for Lambda functions with “feedback” or “report” in the name. |
| Remediation |
1. Implement a “Report this response” button in the application UI. 2. Route reported responses to an SQS queue or DynamoDB table for review. 3. Define an SLA for reviewing reported content (e.g., 24 hours). 4. Use reported incidents to refine guardrail content filters and word lists. 5. Log all reports with Bedrock invocation logging correlation IDs. |
| Reference |
Bedrock Model Invocation Logging |
FS-38 — Guardrail Word Filters and Business Term Allowlists
| Field |
Detail |
| Severity |
Medium |
| Guide ref |
[Guide §1.2.4 — Practical guidance] — “Create allowlists for business terms that include approved terminology for: brand names, product names, industry terms, and technical vocabulary. Also test filter settings to verify that your content filters allow necessary business communications and generate accurate alerts. Monitor and adjust regularly your filtering system to reduce false positives.” |
| Description |
Checks guardrails have word/phrase block filters configured and that approved business terminology allowlists are defined to prevent false positives on legitimate financial services vocabulary. |
| Detection |
Calls bedrock:GetGuardrail and inspects wordPolicy. Flags guardrails with no custom words array (blocked phrases). Also checks managedWordLists for the AWS-managed PROFANITY list. Note: a guardrail with only the profanity filter and no custom FinServ-specific blocked terms should still be flagged as incomplete for financial services use cases. |
| Remediation |
1. Add blocked words/phrases to the guardrail word filter (profanity, slurs, competitor names if applicable). Each custom word/phrase entry has a maximum length of 100 characters per the API (GuardrailWordConfig.text); the console UI additionally limits each phrase to three words. You can add up to 10,000 items to the custom word filter. 2. Enable the AWS-managed profanity filter (managedWordListsConfig with type=PROFANITY) as a baseline. 3. Create an allowlist of approved business terminology: brand names, product names, industry terms, technical vocabulary. Document this list separately, because the guardrail word filter only blocks terms and has no allowlist capability. Test filter settings to verify legitimate business communications are not blocked. 4. Monitor and adjust regularly to reduce false positives. |
| Reference |
Bedrock Guardrails Word Filters |
Biased Output (FS-39 to FS-42)
Guide source: §1.2.5 Model output is biased. Guide-listed mitigations:
(a) AWS AI Service Cards to understand how providers address fairness/bias per model;
(b) prompt engineering;
(c) Amazon Bedrock Guardrails;
(d) Bedrock Evaluations to measure bias;
(e) Amazon SageMaker Clarify for bias detection, transparency, and prediction explanation
on fine-tuned and self-trained models;
(f) develop and maintain a bias testing dataset with representative cases across
demographic groups, geographic regions, and sensitive attributes — run periodically and
after each model update.
FS-39 — SageMaker Clarify Bias
| Field |
Detail |
| Severity |
High |
| Guide ref |
[Guide §1.2.5] — “Use Amazon SageMaker Clarify to detect bias, increase transparency, and explain predictions for your fine-tuned and self-trained AI models.” |
| Description |
Verifies Clarify model bias monitoring is configured for financial decision models. |
| Detection |
Calls sagemaker:ListMonitoringSchedules with the MonitoringTypeEquals=ModelBias filter parameter (the MonitoringType field on the MonitoringScheduleSummary response has one of four values: DataQuality, ModelQuality, ModelBias, ModelExplainability). Flags if no bias monitoring schedules exist. Cross-references with endpoints tagged use-case=financial-decision or similar. Clarify bias monitoring publishes metrics to the aws/sagemaker/Endpoints/bias-metrics namespace for real-time endpoints (and aws/sagemaker/ModelMonitoring/bias-metrics for batch transform jobs) with Endpoint, MonitoringSchedule, BiasStage, Label, LabelValue, Facet, and FacetValue dimensions. |
| Remediation |
1. Create a SageMaker Clarify bias monitoring schedule for each financial decision model endpoint. 2. Specify facets (protected attributes: age, gender, race, geography) and bias metrics (DPL, DI, DPPL). 3. Provide a baseline bias report from training data. 4. Configure CloudWatch alarms on bias metric violations on the aws/sagemaker/Endpoints/bias-metrics namespace. Note: publish_cloudwatch_metrics is enabled by default — do NOT set it to Disabled in the model bias job definition’s Environment map, as that would stop metrics from being published to CloudWatch. |
| Reference |
SageMaker Clarify Bias Detection |
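The cross-reference between tagged endpoints and monitoring schedules can be sketched as a set difference. The sketch assumes `schedule_summaries` is the `MonitoringScheduleSummaries` list from ListMonitoringSchedules and that `endpoint_names` has already been narrowed to financial-decision endpoints by tag; the function name is illustrative.

```python
def endpoints_without_monitor(endpoint_names: list[str],
                              schedule_summaries: list[dict],
                              monitoring_type: str = "ModelBias") -> list[str]:
    """Return endpoints that have no schedule of the given monitoring type."""
    covered = {
        s.get("EndpointName")
        for s in schedule_summaries
        if s.get("MonitoringType") == monitoring_type
    }
    return sorted(set(endpoint_names) - covered)
```

The `monitoring_type` parameter accepts any of the four values listed in the Detection row, so the same helper works for other monitor types.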
FS-40 — Bedrock Bias Evaluation Datasets and Cadence
| Field |
Detail |
| Severity |
Informational |
| Guide ref |
[Guide §1.2.5] — “Develop and maintain a bias testing dataset that includes representative test cases across demographic groups, geographic regions, and other sensitive attributes relevant to your use case. Run these test cases periodically and after model updates.” |
| Description |
Checks evaluation jobs include demographic fairness test cases across protected groups and verifies evaluations are run on a defined periodic schedule and after each model update. |
| Detection |
Calls bedrock:ListEvaluationJobs to enumerate existing jobs, then calls bedrock:GetEvaluationJob for each to inspect the full evaluationConfig including dataset configuration for demographic diversity test cases. Checks the creationTime of the most recent evaluation job and flags if it is older than 90 days or if no evaluation was run after the most recent model deployment. Note: ListEvaluationJobs returns only job summaries — dataset configuration details require GetEvaluationJob. |
| Remediation |
1. Create a bias evaluation dataset with representative test cases across demographic groups, geographic regions, and other sensitive attributes. 2. Schedule evaluation jobs to run at least quarterly via EventBridge. 3. Trigger an evaluation job automatically after each model update in your CI/CD pipeline. 4. Store results for audit and trend analysis. |
| Reference |
Bedrock Model Evaluation |
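The cadence rule (90 days, and at least one run after the latest deployment) can be sketched as below. It assumes `job_summaries` is the `jobSummaries` list from ListEvaluationJobs with timezone-aware `creationTime` values, and that `last_deployment` is a timestamp you obtain from your own deployment records; the function name and finding strings are illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def cadence_findings(job_summaries: list[dict],
                     last_deployment: Optional[datetime],
                     now: Optional[datetime] = None,
                     max_age: timedelta = timedelta(days=90)) -> list[str]:
    """Flag evaluation gaps relative to a schedule and the latest deployment."""
    now = now or datetime.now(timezone.utc)
    if not job_summaries:
        return ["no evaluation jobs found"]
    latest = max(j["creationTime"] for j in job_summaries)
    findings = []
    if now - latest > max_age:
        findings.append(f"most recent evaluation is older than {max_age.days} days")
    if last_deployment is not None and latest < last_deployment:
        findings.append("no evaluation since the most recent model deployment")
    return findings
```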
FS-41 — SageMaker Clarify Explainability
| Field |
Detail |
| Severity |
High |
| Guide ref |
[Guide §1.2.5, extension] — Guide §1.2.5 recommends “Amazon SageMaker Clarify to detect bias, increase transparency, and explain predictions”. ECOA/Fair Housing adverse-action-notice use case is an FS-specific extension of Clarify explainability not named verbatim in the guide. |
| Description |
Verifies Clarify explainability monitoring for adverse action notices (commonly cited under ECOA for credit decisions; this is an FS industry-practice extension, not a guide-prescribed control). |
| Detection |
Calls sagemaker:ListMonitoringSchedules with the MonitoringTypeEquals=ModelExplainability filter parameter. Flags if no explainability monitoring schedules exist for financial decision model endpoints. Clarify explainability monitoring publishes metrics to the aws/sagemaker/Endpoints/explainability-metrics namespace for real-time endpoints (and aws/sagemaker/ModelMonitoring/explainability-metrics for batch transform jobs) with Endpoint, MonitoringSchedule, ExplainabilityMethod (value: KernelShap), Label, and ValueType (values: GlobalShapValues or ExpectedValue) dimensions. |
| Remediation |
1. Create a SageMaker Clarify explainability monitoring schedule using SHAP analysis. 2. Configure feature attribution baselines. 3. Use explainability outputs to generate adverse action notices (top contributing factors for negative decisions) where your firm’s use case and regulatory interpretation require them. 4. Retain explainability reports for regulatory audit. |
| Reference |
SageMaker Clarify Explainability |
FS-42 — AI Service Cards
| Field |
Detail |
| Severity |
Medium |
| Guide ref |
[Guide §1.2.4, §1.2.5, §1.2.14] — “Amazon provides AI Service Cards for models that are pre-trained for AWS services like Amazon Bedrock and Amazon Q. These cards help you understand how Amazon addresses toxicity in each model.” Referenced in three separate guide risk sections. |
| Description |
Checks SageMaker Model Cards document intended use and bias evaluations. |
| Detection |
Calls sagemaker:ListModelCards. For each card, calls sagemaker:DescribeModelCard and inspects the content JSON for intended_uses, business_details, and evaluation_details sections. Flags cards missing these sections. |
| Remediation |
1. Create a SageMaker Model Card for each production model. 2. Document: intended use cases, out-of-scope uses, training data description, bias evaluation results, performance metrics. 3. Review and update cards after each model retrain. 4. For Bedrock foundation models, reference the AWS AI Service Cards published by Amazon. |
| Reference |
SageMaker Model Cards, AWS AI Service Cards |
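The section check on a model card can be sketched as follows. It assumes the input is a DescribeModelCard response whose `Content` field is a JSON string, and it treats an empty section the same as a missing one, which is a design choice rather than something the check definition prescribes.

```python
import json

REQUIRED_SECTIONS = ("intended_uses", "business_details", "evaluation_details")


def missing_card_sections(describe_response: dict) -> list[str]:
    """Return required model card sections that are absent or empty."""
    content = json.loads(describe_response.get("Content") or "{}")
    return [section for section in REQUIRED_SECTIONS if not content.get(section)]
```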
Sensitive Information Disclosure (FS-43 to FS-46)
Guide source: §1.2.6 Sensitive information disclosure. Guide-listed mitigations:
(a) Bedrock Guardrails sensitive information filters for PII, PHI;
(b) data classification scanning and access controls on AI data sources;
(c) strict IAM access controls for Bedrock API;
(d) mask sensitive information in CloudWatch Logs and custom application logging;
(e) protect training and fine-tuning data via data protection best practices;
(f) monitor PII in training/fine-tuning/RAG data with Amazon Macie;
(g) remove, mask, or tokenize PII before use in training, fine-tuning, or RAG;
(h) Practical guidance: least privilege for agent identities; user-authorized communications
to tool services; propagate end-user identities so tool services can validate them without
revealing them to unauthorized third parties.
FS-43 — CloudWatch Log PII Masking
| Field |
Detail |
| Severity |
High |
| Guide ref |
[Guide §1.2.6] — “If you implement model invocation logging for the LLM or custom logging logic in your application, make sure to mask sensitive information in your log data.” References “Amazon CloudWatch – Help protect sensitive log data with masking”. |
| Description |
Checks CloudWatch Logs data protection policies mask PII in Bedrock invocation logs. |
| Detection |
Identifies CloudWatch log groups used by Bedrock invocation logging (from bedrock:GetModelInvocationLoggingConfiguration). Calls logs:GetDataProtectionPolicy for each log group. Flags log groups with no data protection policy or policies missing PII identifiers. Note: model invocation logging only captures calls made through the bedrock-runtime endpoint (Converse, ConverseStream, InvokeModel, InvokeModelWithResponseStream); calls through other endpoints such as the Responses API (bedrock-mantle endpoint) are not captured. |
| Remediation |
1. Create a CloudWatch Logs data protection policy on each Bedrock log group. 2. Include managed data identifiers using their exact ARN-based IDs — country-code suffixes are required in the ARN for most identifiers (the data-types table uses the short name such as Ssn, but the ARN must include the country code): Ssn-US (US Social Security Number; Ssn-ES for Spain — there is no bare Ssn ARN), CreditCardNumber (no suffix), CreditCardSecurityCode (no suffix), EmailAddress (no suffix), Address (no suffix), PhoneNumber-US, BankAccountNumber-US, DriversLicense-US, PassportNumber-US, IndividualTaxIdentificationNumber-US. 3. Add a Deidentify operation statement (no hyphen — this is the exact JSON key required in the policy document, even though AWS prose documentation uses “De-identify”) to mask sensitive data, and a separate Audit statement to emit findings to CloudWatch. The Deidentify operation must contain an empty "MaskConfig": {} object. 4. Retroactive masking scope: A log group-level data protection policy only masks data ingested after the policy is applied — historical log events are not retroactively masked. However, an account-level data protection policy applies to both existing log groups and log groups created in the future. For maximum coverage, consider creating an account-level policy in addition to log group-level policies. Apply policies at log group creation time or as early as possible. 5. Test by sending a log entry containing sample PII and verifying it is masked in subsequent reads. |
| Reference |
CloudWatch Logs Data Protection, PII Data Identifier ARNs, Financial Data Identifier ARNs |
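A policy document with the two statements from step 3 might be assembled as in the sketch below. The identifier names come from step 2; the ARN prefix and the `Version` value reflect the CloudWatch Logs policy syntax as understood here and should be checked against the current documentation. The policy name, `Sid` values, and the empty `FindingsDestination` are illustrative choices.

```python
import json

ARN_PREFIX = "arn:aws:dataprotection::aws:data-identifier/"

# Identifier names as listed in step 2, country suffixes included.
IDENTIFIERS = [
    "Ssn-US", "CreditCardNumber", "CreditCardSecurityCode", "EmailAddress",
    "Address", "PhoneNumber-US", "BankAccountNumber-US", "DriversLicense-US",
    "PassportNumber-US", "IndividualTaxIdentificationNumber-US",
]


def build_policy(identifiers=IDENTIFIERS, name="bedrock-invocation-log-masking") -> dict:
    """Build a policy with one Audit and one Deidentify statement."""
    arns = [ARN_PREFIX + identifier for identifier in identifiers]
    return {
        "Name": name,
        "Version": "2021-06-01",
        "Statement": [
            {   # Emits findings; add a destination if you want them delivered.
                "Sid": "audit",
                "DataIdentifier": arns,
                "Operation": {"Audit": {"FindingsDestination": {}}},
            },
            {   # Masks matches; MaskConfig must be present and empty.
                "Sid": "redact",
                "DataIdentifier": arns,
                "Operation": {"Deidentify": {"MaskConfig": {}}},
            },
        ],
    }
```

The result would be serialized with `json.dumps` and supplied as the `policyDocument` argument of the CloudWatch Logs `put_data_protection_policy` call for each Bedrock log group.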
FS-44 — Amazon Macie PII Scanning and Pre-Processing
| Field |
Detail |
| Severity |
High |
| Guide ref |
[Guide §1.2.6] — “Monitor personally identifiable information (PII) in your data when you train models, fine-tune them, or use retrieval-augmented generation (RAG)” and “Remove, mask, or tokenize personally identifiable information (PII) or sensitive data before you use it for training, fine-tuning, or retrieval-augmented generation (RAG).” |
| Description |
Verifies Macie is enabled and scanning AI/ML data buckets, and checks that a PII pre-processing step (tokenization, masking, or removal) exists in training and RAG ingestion pipelines before data reaches the model. |
| Detection |
Calls macie2:GetMacieSession to verify Macie is enabled. Calls macie2:GetAutomatedDiscoveryConfiguration to check whether automated sensitive data discovery is enabled (preferred over manual classification jobs — automated discovery evaluates S3 buckets daily without explicit job creation). Also calls macie2:ListClassificationJobs to check for any additional targeted jobs covering S3 buckets tagged for AI/ML use. Additionally inspects SageMaker Processing jobs or Glue jobs for PII-related naming patterns indicating a pre-processing pipeline. |
| Remediation |
1. Enable Amazon Macie in the account. 2. Preferred: Enable Macie Automated Sensitive Data Discovery (via macie2:UpdateAutomatedDiscoveryConfiguration set to ENABLED) — this continuously evaluates ALL S3 buckets in the account or organization daily, selects representative objects, and produces sensitive-data findings without requiring manual job creation. 3. For higher-priority AI/ML buckets where you need full-depth scans, supplement with targeted classification jobs (macie2:CreateClassificationJob) scheduled at least weekly. 4. Implement a PII pre-processing step in your data pipeline (SageMaker Processing job, Glue job, or Lambda) that tokenizes, masks, or removes PII before data is used for training or RAG ingestion. 5. Use Amazon Comprehend DetectPiiEntities or Macie findings to identify PII locations and feed them into the pre-processing step. 6. Route Macie findings to EventBridge and then to your SIEM or ticketing system for timely investigation. |
| Reference |
Amazon Macie, Macie Automated Sensitive Data Discovery, Amazon Comprehend PII Detection |
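The pre-processing step in FS-44 remediation items 4 and 5 can be sketched as a pure masking function. This is an illustrative sketch only: the function name mask_pii and the min_score default are assumptions, and the entities argument is assumed to have the shape of the Entities list returned by Amazon Comprehend DetectPiiEntities (Type, Score, BeginOffset, EndOffset). Overlapping entities are not handled.

```python
def mask_pii(text, entities, min_score=0.8):
    """Replace each detected PII span in `text` with a [TYPE] placeholder.

    `entities` is assumed to look like the `Entities` list returned by
    Amazon Comprehend DetectPiiEntities. Entities scoring below `min_score`
    (a hypothetical threshold) are left untouched.
    """
    kept = [e for e in entities if e["Score"] >= min_score]
    # Work from right to left so earlier offsets stay valid after each edit.
    for entity in sorted(kept, key=lambda e: e["BeginOffset"], reverse=True):
        begin, end = entity["BeginOffset"], entity["EndOffset"]
        text = text[:begin] + "[" + entity["Type"] + "]" + text[end:]
    return text


# Example with hand-written entities (no AWS call is made here).
sample = "SSN 123-45-6789 belongs to Ana"
sample_entities = [
    {"Type": "SSN", "Score": 0.99, "BeginOffset": 4, "EndOffset": 15},
    {"Type": "NAME", "Score": 0.95, "BeginOffset": 27, "EndOffset": 30},
]
masked = mask_pii(sample, sample_entities)
```

In a real pipeline the entities would come from the detection service, and the masked text, not the original, would be written to the training or RAG ingestion bucket.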
FS-45 — Guardrail PII Filters
| Field |
Detail |
| Severity |
High |
| Guide ref |
[Guide §1.2.6] — “Use Amazon Bedrock Guardrails to detect and filter structured sensitive information in model inputs and outputs, such as personally identifiable information (PII), protected health information (PHI).” |
| Description |
Checks that guardrails define PII entity filters for the critical financial identifiers listed under Detection (SSN, payment card, bank account, tax ID, and contact details). |
| Detection |
Calls bedrock:GetGuardrail and inspects sensitiveInformationPolicy.piiEntities. Flags guardrails missing filters for critical PII types: US_SOCIAL_SECURITY_NUMBER, CREDIT_DEBIT_CARD_NUMBER, CREDIT_DEBIT_CARD_CVV, CREDIT_DEBIT_CARD_EXPIRY, US_BANK_ACCOUNT_NUMBER, US_BANK_ROUTING_NUMBER, PIN, SWIFT_CODE, INTERNATIONAL_BANK_ACCOUNT_NUMBER, US_INDIVIDUAL_TAX_IDENTIFICATION_NUMBER, EMAIL, PHONE. |
| Remediation |
1. Update the guardrail to add PII entity filters for all relevant types. 2. Configure separate input and output actions using the inputAction and outputAction fields: set outputAction=ANONYMIZE (replace with placeholder such as {US_SOCIAL_SECURITY_NUMBER}) so PII in model responses is masked before reaching the user; set inputAction=BLOCK for PII types that should never be submitted (e.g., SSN, credit card numbers). 3. Use inputEnabled and outputEnabled to selectively enable evaluation per direction; disable evaluation on a direction you don’t need to reduce cost and latency. 4. PHI coverage nuance: The Bedrock Guardrails sensitive information filter has only limited built-in PHI entities, specifically CA_HEALTH_NUMBER (Canada) and UK_NATIONAL_HEALTH_SERVICE_NUMBER (UK). For US HIPAA PHI (for example, Medical Record Numbers, Health Plan Beneficiary Numbers, Medicare Beneficiary Identifiers), there is no built-in entity type; use regexesConfig (custom regex patterns) on the guardrail to detect these patterns, complemented by downstream CloudWatch Logs data protection policies (see FS-43), which have PHI identifiers under the HIPAA category. 5. Critical limitation (tool_use outputs): The sensitive information filter does NOT detect PII when models respond with tool_use (function call) output parameters via supported APIs. For FinServ agentic applications where models invoke tools and return structured function-call responses, implement application-layer PII scanning on tool outputs before they are processed or displayed. 6. Critical limitation (invocation logs): Guardrail PII masking applies only to content sent to and returned from the inference model. It does NOT apply to model invocation logs: the input field in CloudWatch Logs always contains the original, unmasked request regardless of guardrail intervention. Use CloudWatch Logs data protection policies (see FS-43) to mask PII in logs separately. Similarly, the match field in guardrail trace output contains the original PII value, not the masked output. 7. Test with sample inputs containing each PII type and verify both input blocking and output anonymization work as expected. |
| Reference |
Bedrock Guardrails Sensitive Information Filters |
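The FS-45 detection logic can be sketched as a comparison between the required entity types and those configured on the guardrail. This is a sketch, not the check's actual implementation: the function name missing_pii_filters is hypothetical, and the input is assumed to be the dict returned by GetGuardrail, with entity types under sensitiveInformationPolicy.piiEntities[].type.

```python
# Entity types listed in the FS-45 detection text.
REQUIRED_PII_TYPES = {
    "US_SOCIAL_SECURITY_NUMBER", "CREDIT_DEBIT_CARD_NUMBER",
    "CREDIT_DEBIT_CARD_CVV", "CREDIT_DEBIT_CARD_EXPIRY",
    "US_BANK_ACCOUNT_NUMBER", "US_BANK_ROUTING_NUMBER", "PIN", "SWIFT_CODE",
    "INTERNATIONAL_BANK_ACCOUNT_NUMBER",
    "US_INDIVIDUAL_TAX_IDENTIFICATION_NUMBER", "EMAIL", "PHONE",
}


def missing_pii_filters(guardrail):
    """Return the required PII entity types that the guardrail does not filter.

    `guardrail` is assumed to be a GetGuardrail response dict. A guardrail
    with no sensitive information policy is treated as missing every type.
    """
    policy = guardrail.get("sensitiveInformationPolicy") or {}
    configured = {entity["type"] for entity in policy.get("piiEntities", [])}
    return sorted(REQUIRED_PII_TYPES - configured)


# Example with a hand-written response fragment (no AWS call is made here).
example_guardrail = {
    "sensitiveInformationPolicy": {
        "piiEntities": [
            {"type": "US_SOCIAL_SECURITY_NUMBER", "action": "BLOCK"},
            {"type": "EMAIL", "action": "ANONYMIZE"},
        ]
    }
}
gaps = missing_pii_filters(example_guardrail)
```

A non-empty result corresponds to a finding; an empty list means every critical type has a filter configured.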
FS-46 — Data Classification Tagging
| Field |
Detail |
| Severity |
Medium |
| Guide ref |
[Guide §1.2.6] — “Implement data classification scanning and access controls on the data sources connected to your AI system to prevent disclosure of company-confidential or proprietary information.” |
| Description |
Verifies AI/ML S3 buckets are tagged with data classification labels. |
| Detection |
Lists S3 buckets and filters for AI/ML-related names or tags. Calls s3:GetBucketTagging for each and checks for a data-classification tag with values like public, internal, confidential, restricted. Flags buckets missing the tag. |
| Remediation |
1. Define a data classification taxonomy (e.g., Public, Internal, Confidential, Restricted). 2. Tag all AI/ML S3 buckets with data-classification=<level>. 3. Detective enforcement: Create an AWS Config managed rule (required-tags, which checks up to six tag keys at a time) to identify buckets missing the tag, and trigger remediation via a custom Systems Manager automation document. Note: the AWS-managed AWS-SetRequiredTags automation document does NOT work as a remediation with this rule, so you must author your own. 4. Preventive enforcement: Use AWS Organizations Tag Policies to require the data-classification tag key with allowed values (Public, Internal, Confidential, Restricted) across accounts. Tag Policies are preventive and complement the detective Config rule. 5. Use tag-based IAM policies (via the condition key aws:ResourceTag/data-classification) to restrict S3 access based on classification level. 6. Pair with Macie classification jobs (see FS-44) so that buckets automatically classified as containing sensitive data are flagged if their data-classification tag is missing or inconsistent with the Macie findings. |
| Reference |
AWS Tagging Best Practices, AWS Config required-tags Rule, AWS Organizations Tag Policies |
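The FS-46 tag evaluation can be sketched as a small function over a bucket's tag set. This is an illustrative sketch: the function name and the result strings are hypothetical, and tag_set is assumed to have the shape of the TagSet list returned by s3:GetBucketTagging.

```python
ALLOWED_LEVELS = {"public", "internal", "confidential", "restricted"}


def classification_finding(tag_set, key="data-classification"):
    """Classify a bucket's tags as PASS, MISSING_TAG, or INVALID_VALUE.

    `tag_set` is assumed to be the `TagSet` list from GetBucketTagging,
    i.e. a list of {"Key": ..., "Value": ...} dicts. Values are compared
    case-insensitively against the taxonomy in the remediation text.
    """
    tags = {tag["Key"]: tag["Value"] for tag in tag_set}
    value = tags.get(key)
    if value is None:
        return "MISSING_TAG"
    if value.lower() not in ALLOWED_LEVELS:
        return "INVALID_VALUE"
    return "PASS"


# Note: for a bucket with no tags at all, GetBucketTagging raises a
# NoSuchTagSet error rather than returning an empty list; a caller would
# catch that error and pass [] to this function.
result = classification_finding([{"Key": "data-classification", "Value": "Confidential"}])
```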
Part 3 — Application-Layer Controls & Material Gaps (FS-47 to FS-69)
Guide risk categories: Hallucination (FS-47..50, §1.2.7), Prompt Injection (FS-51..54, §1.2.8), Improper Output Handling (FS-55..58, §1.2.13), Off-Topic & Inappropriate Output (FS-59..60, §1.2.2), Out-of-Date Training Data (FS-61..63, §1.2.10), Additional Controls — Material Gaps (FS-64..69). FS-64 is merged into upstream BR-04 — see the extension note in the Material Gaps section.
Hallucination (FS-47 to FS-50)
Guide source: §1.2.7 Hallucination. Guide-listed mitigations:
(a) prompt engineering;
(b) RAG with Bedrock Knowledge Bases;
(c) detect hallucinations in RAG and agent-based systems;
(d) HITL validation for internal AI systems;
(e) Automated Reasoning checks in Bedrock Guardrails;
(f) Bedrock Guardrails contextual grounding checks with reference source and query;
(g) response disclaimers in customer-facing applications informing users that AI responses
should be verified for critical decisions.
FS-47 — Guardrail Grounding Threshold
| Field |
Detail |
| Severity |
High |
| Guide ref |
[Guide §1.2.7] — “You can use Amazon Bedrock Guardrails to detect and filter hallucinations in model responses by performing contextual grounding checks when you provide a reference source and query.” |
| Description |
Verifies guardrail grounding thresholds are set appropriately for financial use cases. This assessment flags thresholds below 0.7; AWS does not prescribe a specific minimum, and the valid range is 0 to 0.99. Note: contextual grounding checks are not supported for conversational chatbot use cases, only for summarization, paraphrasing, and Q&A. |
| Detection |
Calls bedrock:GetGuardrail and inspects contextualGroundingPolicy.filters for the GROUNDING filter type. Checks that the threshold value is ≥ 0.7. Flags guardrails with lower thresholds or no grounding filter. |
| Remediation |
1. Update the guardrail to set the grounding filter threshold to at least 0.7, the minimum this check accepts; this assessment recommends 0.8 for financial services to reduce hallucination risk. AWS does not prescribe a specific minimum; the valid range is 0 to 0.99, and AWS documentation states that a value of 1.0 is invalid because it would block all content. 2. Configure contextual grounding filters of both types, GROUNDING and RELEVANCE (see FS-50 for the relevance filter). 3. Test with prompts that should and should not be grounded in the reference source, and tune the threshold based on your false-positive/false-negative tolerance. 4. Monitor grounding filter invocation rates via CloudWatch using the AWS/Bedrock/Guardrails namespace. Important limitation: Contextual grounding checks support only summarization, paraphrasing, and question-answering use cases; Conversational QA / Chatbot use cases are explicitly not supported per AWS documentation. For FinServ chatbot deployments, use denied topics and content filters (FS-28, FS-36, FS-59) as the primary hallucination-mitigation controls instead. |
| Reference |
Bedrock Guardrails Contextual Grounding |
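The threshold inspection in FS-47 (and the relevance filter presence test in FS-50) can be sketched as one function over a GetGuardrail response. This is a hedged sketch: the function name and finding strings are hypothetical, and the input is assumed to expose contextualGroundingPolicy.filters as a list of dicts with type and threshold keys.

```python
def grounding_findings(guardrail, minimum=0.7):
    """Return a list of findings for GROUNDING and RELEVANCE filters.

    `guardrail` is assumed to be a GetGuardrail response dict. A filter type
    that is absent, or whose threshold is below `minimum`, yields a finding.
    """
    policy = guardrail.get("contextualGroundingPolicy") or {}
    thresholds = {f["type"]: f["threshold"] for f in policy.get("filters", [])}
    findings = []
    for filter_type in ("GROUNDING", "RELEVANCE"):
        if filter_type not in thresholds:
            findings.append(f"{filter_type}: no filter configured")
        elif thresholds[filter_type] < minimum:
            findings.append(
                f"{filter_type}: threshold {thresholds[filter_type]} below {minimum}"
            )
    return findings


# Example with a hand-written response fragment (no AWS call is made here).
weak = {"contextualGroundingPolicy": {"filters": [{"type": "GROUNDING", "threshold": 0.5}]}}
weak_findings = grounding_findings(weak)
```

Passing minimum=0.8 applies the stricter value this assessment recommends for financial services.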
FS-48 — RAG Knowledge Base
| Field |
Detail |
| Severity |
Medium |
| Guide ref |
[Guide §1.2.1, §1.2.7, §1.2.10] — “Use Retrieval-Augmented Generation (RAG) to enhance your model responses with information from trusted knowledge bases.” Referenced in three separate guide risk sections. |
| Description |
Checks active Knowledge Bases are configured for RAG grounding. |
| Detection |
Calls ListKnowledgeBases (via the bedrock-agent boto3 client; IAM action bedrock:ListKnowledgeBases) and checks that at least one KB exists with status=ACTIVE. Flags accounts with no active KBs when Bedrock models are in use (indicating responses are ungrounded). |
| Remediation |
1. Create a Bedrock Knowledge Base with your authoritative data sources. 2. Configure the KB with an appropriate embedding model and vector store. 3. Use the RetrieveAndGenerate API instead of direct InvokeModel calls for customer-facing use cases. 4. Sync data sources on a regular schedule (see FS-61). |
| Reference |
Bedrock Knowledge Bases |
FS-49 — Hallucination Disclaimer
| Field |
Detail |
| Severity |
Informational |
| Guide ref |
[Guide §1.2.7] — “Implement response disclaimers in customer-facing applications, to inform end users that AI-generated responses should be verified for critical decisions.” References “AWS Well-Architected Framework Generative AI Lens - Implement guardrails to mitigate harmful or incorrect model responses”. |
| Description |
Advisory: verifies application adds hallucination disclaimers to AI-generated outputs. |
| Detection |
Advisory check — inspects application Lambda environment variables for disclaimer-related settings. Checks for post-processing Lambda functions that append disclaimers. |
| Remediation |
1. Add a standard disclaimer to all AI-generated responses: “This response is generated by AI and may contain inaccuracies. Please verify critical information independently.” 2. Append the disclaimer in application code after the model responds, so that the text is configurable but cannot be removed by prompt manipulation. 3. For financial decisions, add: “This does not constitute financial advice.” |
| Reference |
AWS Well-Architected GenAI Lens |
FS-50 — Relevance Grounding Filters
| Field |
Detail |
| Severity |
Medium |
| Guide ref |
[Guide §1.2.2, §1.2.7] — “Use Amazon Bedrock Guardrails to detect and filter hallucinations in model responses by performing contextual grounding checks.” Contextual grounding covers both GROUNDING and RELEVANCE filter sub-types. |
| Description |
Checks guardrails have relevance grounding filters to prevent off-topic responses. |
| Detection |
Calls bedrock:GetGuardrail and inspects contextualGroundingPolicy.filters for the RELEVANCE filter type. Flags guardrails with no relevance filter configured. |
| Remediation |
1. Update the guardrail to enable the RELEVANCE contextual grounding filter. 2. Set the threshold to at least 0.7 (valid range is 0 to 0.99; a value of 1.0 is explicitly invalid per AWS documentation). 3. This ensures responses are relevant to the user’s query and the provided reference source, filtering out off-topic hallucinations. Important limitation: Contextual grounding checks (both GROUNDING and RELEVANCE) support only summarization, paraphrasing, and question-answering use cases — Conversational QA / Chatbot use cases are explicitly not supported per AWS documentation. For FinServ chatbot deployments, use denied topics (FS-59) as the primary off-topic control. |
| Reference |
Bedrock Guardrails Contextual Grounding |
Prompt Injection (FS-51 to FS-54)
Guide source: §1.2.8 Prompt injection. Guide-listed mitigations:
(a) prompt engineering best practices to avoid prompt injection;
(b) input validation — sanitize user input, remove special characters or use escape sequences,
match expected format;
(c) secure coding practices — parameterized queries, avoid string concatenation, minimal
privileges;
(d) security testing — regular testing for prompt injection and vulnerabilities, pentest,
static code analysis, DAST;
(e) stay updated — keep Bedrock SDK, libraries, and dependencies current;
(f) Bedrock Guardrails to detect and block user inputs attempting to override system
instructions through prompt attacks.
FS-51 — Prompt Attack Filters
| Field |
Detail |
| Severity |
High |
| Guide ref |
[Guide §1.2.8] — “Use Amazon Bedrock Guardrails to detect and block user inputs that attempt to override system instructions through prompt attacks.” |
| Description |
Verifies guardrails have PROMPT_ATTACK content filters enabled and are configured correctly for the Standard tier. |
| Detection |
Calls bedrock:GetGuardrail and inspects contentPolicy.filters for a filter with type=PROMPT_ATTACK. Flags guardrails where this filter is absent, has inputStrength set to NONE or LOW (note: PROMPT_ATTACK only applies to inputs — there is no outputStrength for this filter type), or where contentPolicy.tier.tierName=CLASSIC (the PROMPT_ATTACK filter in Classic tier detects jailbreaks and prompt injection; in Standard tier it additionally detects prompt leakage — attempts to extract system prompts or developer instructions). |
| Remediation |
1. Ensure the guardrail is configured with the Standard content filters tier. Prompt leakage detection (attempts to extract system prompts or developer instructions) is available only in Standard tier; jailbreak and prompt injection detection are available in both tiers. Standard tier requires cross-Region inference to be enabled on the guardrail. You can configure Standard tier on a new or existing guardrail: for an existing guardrail, modify it via UpdateGuardrail (set tierConfig.tierName=STANDARD in contentPolicyConfig and add a crossRegionConfig.guardrailProfileIdentifier), or use the console by editing the guardrail and selecting Standard tier with cross-Region inference. 2. Add a PROMPT_ATTACK content filter with inputStrength=HIGH. 3. Wrap user input in guardrail input tags when using InvokeModel or InvokeModelWithResponseStream. For these APIs, PROMPT_ATTACK only evaluates content enclosed in input tags (e.g., <amazon-bedrock-guardrails-guardContent_xyz>user text</amazon-bedrock-guardrails-guardContent_xyz>); untagged content is not evaluated. The reserved prefix is amazon-bedrock-guardrails-guardContent, and the suffix should be a unique random string per request to prevent an attacker from closing the tag and appending malicious content. Note: when using the Converse API, use the guardContent field (GuardrailConverseContentBlock) in user messages to scope PROMPT_ATTACK evaluation to specific content; this is the Converse API equivalent of input tags. If no guardContent blocks are present, the guardrail evaluates the entire messages array. Using guardContent in user messages ensures only user-provided content is evaluated for prompt attacks, while system prompts and conversation history are excluded. 4. Test with known prompt injection patterns (role-play attacks, instruction override, delimiter injection). 5. Monitor filter invocation rates via CloudWatch guardrail metrics (InvocationsIntervened in the AWS/Bedrock/Guardrails namespace, filtered by GuardrailPolicyType=ContentPolicy) for trending attack patterns. |
| Reference |
Bedrock Guardrails Prompt Attack, Safeguard tiers for guardrails, Securing Amazon Bedrock Agents against indirect prompt injections |
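The input-tagging step in FS-51 remediation item 3 can be sketched as follows. This is a minimal sketch under stated assumptions: the helper name wrap_user_input is hypothetical, and the amazon-bedrock-guardrailConfig / tagSuffix request-body field reflects our reading of the Bedrock input-tag documentation, so verify it against the current API reference before relying on it. The rest of the InvokeModel request body is model-specific and is not shown.

```python
import secrets

# Reserved tag prefix named in the remediation text.
TAG_PREFIX = "amazon-bedrock-guardrails-guardContent"


def wrap_user_input(user_text):
    """Wrap untrusted user text in guardrail input tags with a random suffix.

    Returns (tagged_text, guardrail_config). A fresh suffix per request means
    an attacker cannot predict the closing tag and break out of the span.
    """
    suffix = secrets.token_hex(8)  # 16 hex characters, new for every call
    tag = f"{TAG_PREFIX}_{suffix}"
    tagged_text = f"<{tag}>{user_text}</{tag}>"
    # Assumption: this fragment is merged into the InvokeModel request body
    # so the guardrail knows which suffix to look for.
    guardrail_config = {"amazon-bedrock-guardrailConfig": {"tagSuffix": suffix}}
    return tagged_text, guardrail_config


tagged, config = wrap_user_input("What is my account balance?")
```

Only the user-supplied portion of the prompt is wrapped; system instructions stay outside the tags so they are not evaluated as a potential attack.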
FS-52 — Bedrock SDK Version Currency
| Field |
Detail |
| Severity |
Medium |
| Guide ref |
[Guide §1.2.8] — “Stay Updated – Keep your Amazon Bedrock SDK, libraries, and dependencies current to receive the latest security patches and updates.” |
| Description |
Checks Bedrock Lambda functions use current (non-deprecated) runtimes and SDK versions. |
| Detection |
Calls lambda:ListFunctions and filters for functions with Bedrock-related names or environment variables referencing Bedrock. Checks each function’s Runtime against the list of deprecated Lambda runtimes. |
| Remediation |
1. Update Lambda functions to use a currently supported runtime. As of April 2026, recommended runtimes are python3.13 or python3.14 for Python (both with a deprecation date of June 30, 2029; python3.12 remains supported through Oct 31, 2028), and nodejs22.x or nodejs24.x for Node.js (nodejs20.x reaches deprecation on April 30, 2026 and should not be used for new deployments). 2. Update the Bedrock SDK to the latest version: boto3/botocore in requirements.txt for Python, or the AWS SDK for JavaScript in package.json for Node.js. 3. Test after upgrading to verify no breaking changes. 4. Subscribe to AWS Lambda runtime deprecation notifications via EventBridge or SNS (Lambda also surfaces runtime deprecation notices via AWS Health Dashboard and Trusted Advisor). |
| Reference |
Lambda Runtime Deprecation Policy |
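The runtime comparison in FS-52 detection can be sketched as a filter over function configurations. This is an illustrative sketch: the function name is hypothetical, the input is assumed to be the Functions list from lambda:ListFunctions, and the deprecated set below is a short illustrative sample, not the complete or current list (consult the Lambda runtime deprecation policy for that).

```python
# Illustrative sample only; NOT the complete list of deprecated runtimes.
ILLUSTRATIVE_DEPRECATED = {"python3.7", "python3.8", "nodejs14.x", "nodejs16.x"}


def functions_on_deprecated_runtimes(functions, deprecated=ILLUSTRATIVE_DEPRECATED):
    """Return the names of functions whose Runtime is in `deprecated`.

    `functions` is assumed to be the `Functions` list from ListFunctions.
    Container-image functions carry no Runtime key and are skipped here.
    """
    return [
        fn["FunctionName"]
        for fn in functions
        if fn.get("Runtime") in deprecated
    ]


# Example with hand-written configurations (no AWS call is made here).
sample_functions = [
    {"FunctionName": "bedrock-chat", "Runtime": "python3.8"},
    {"FunctionName": "bedrock-summarize", "Runtime": "python3.13"},
    {"FunctionName": "bedrock-image", "PackageType": "Image"},
]
flagged = functions_on_deprecated_runtimes(sample_functions)
```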
FS-53 — WAF Injection Protection Rules
| Field |
Detail |
| Severity |
High |
| Guide ref |
[Guide §1.2.8, extension] — WAF SQLi and known-bad-inputs rule groups are not named in the guide, but implement the guide mitigation “Secure Coding Practices – use parameterized queries, avoid string concatenation for input, grant minimal access privileges” at the network edge for web-facing GenAI endpoints. |
| Description |
Verifies WAF ACLs include SQL injection (AWSManagedRulesSQLiRuleSet) and known-bad-inputs (AWSManagedRulesKnownBadInputsRuleSet) managed rule groups for GenAI endpoints. |
| Detection |
Calls wafv2:ListWebACLs(Scope=REGIONAL) and for each calls wafv2:GetWebACL. Inspects the rules list for AWSManagedRulesSQLiRuleSet and AWSManagedRulesKnownBadInputsRuleSet. Flags ACLs missing either rule group. |
| Remediation |
1. Add AWSManagedRulesSQLiRuleSet to your WAF Web ACL (contains SQLi detection rules for body, URI path, cookie, and query-string components). 2. Add AWSManagedRulesKnownBadInputsRuleSet for known Remote Command Execution (RCE) and vulnerability-discovery patterns (e.g., Log4j, Spring Core deserialization, path traversal) — note this rule group does NOT cover XSS; XSS is in AWSManagedRulesCommonRuleSet (see FS-56). 3. Set both rule groups to COUNT mode initially, review logs for false positives, then switch to BLOCK. 4. Create custom rules for GenAI-specific injection patterns if needed. |
| Reference |
AWS WAF Managed Rules |
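The FS-53 rule-group inspection can be sketched as a set difference over a Web ACL's rules. This is a sketch, not the check's actual code: the function names are hypothetical, and web_acl is assumed to be the WebACL object from a wafv2:GetWebACL response, with AWS managed groups referenced directly under Rules[].Statement.ManagedRuleGroupStatement. Managed groups nested inside other statements are not examined.

```python
FS53_REQUIRED = ("AWSManagedRulesSQLiRuleSet", "AWSManagedRulesKnownBadInputsRuleSet")


def aws_managed_rule_groups(web_acl):
    """Collect names of AWS managed rule groups referenced by the Web ACL."""
    names = set()
    for rule in web_acl.get("Rules", []):
        statement = rule.get("Statement", {}).get("ManagedRuleGroupStatement")
        if statement and statement.get("VendorName") == "AWS":
            names.add(statement["Name"])
    return names


def missing_rule_groups(web_acl, required=FS53_REQUIRED):
    """Return the required rule groups that the Web ACL does not include."""
    return sorted(set(required) - aws_managed_rule_groups(web_acl))


# Example with a hand-written Web ACL fragment (no AWS call is made here).
sample_acl = {
    "Rules": [
        {"Name": "sqli", "Statement": {"ManagedRuleGroupStatement": {
            "VendorName": "AWS", "Name": "AWSManagedRulesSQLiRuleSet"}}},
        {"Name": "rate", "Statement": {"RateBasedStatement": {"Limit": 1000}}},
    ]
}
missing = missing_rule_groups(sample_acl)
```

The same helper could be called with required=("AWSManagedRulesCommonRuleSet",) for the XSS check in FS-56.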
FS-54 — Penetration Testing Evidence
| Field |
Detail |
| Severity |
Informational |
| Guide ref |
[Guide §1.2.8] — “Security Testing – Test your applications regularly for prompt injection and other security vulnerabilities. Use penetration testing, static code analysis, and dynamic application security testing (DAST).” |
| Description |
Advisory: verifies GenAI applications have been penetration tested for prompt injection and other AI-specific vulnerabilities. |
| Detection |
Advisory check — inspects resource tags for last-pentest-date or checks for a documented penetration testing schedule. Cannot be fully automated. |
| Remediation |
1. Conduct penetration testing of your GenAI application at least annually and before major releases. 2. Include AI-specific test cases: prompt injection, jailbreak attempts, data extraction, system prompt leakage. 3. Use tools like Garak, PyRIT, manual red-teaming, or the AWS Security Agent. As of the March 2026 GA announcement, Security Agent runs from 6 AWS regions (N. Virginia, Oregon, Ireland, Frankfurt, Sydney, Tokyo) but can test targets across AWS, Azure, GCP, and on-premises environments. For multi-account FinServ deployments, Security Agent supports penetration testing on VPC resources shared across AWS accounts in the same AWS Organization via AWS Resource Access Manager (RAM) — enable this by launching Security Agent from a central security account and sharing VPC resources from sub-accounts via RAM. Verify current region coverage on the AWS Security Agent page before citing, as AWS has been expanding regional availability and feature set rapidly. 4. Document findings and track remediation. 5. Tag resources with last-pentest-date for audit trail. |
| Reference |
AWS Penetration Testing Policy, AWS Security Agent GA |
Improper Output Handling (FS-55 to FS-58)
Guide source: §1.2.13 Improper output handling. Guide-listed mitigations:
(a) implement output validation rules against expected response format (e.g., JSON schema,
SQL schema);
(b) apply context-specific output sanitization — HTML encoding for web apps, SQL
parameterization for database queries, command escaping for system integrations;
(c) Practical guidance: treat model output as untrusted user input; use Bedrock Agents
action-group Lambda to implement output encoding so output text is non-executable by
JavaScript or Markdown.
FS-55 — Output Validation Lambda
| Field |
Detail |
| Severity |
Medium |
| Guide ref |
[Guide §1.2.13] — “Implement output validation rules specific to the expected response format. For example, if the AI system is expected to return structured data (JSON, SQL), validate the output against the expected schema before processing.” |
| Description |
Checks for Lambda functions implementing output validation/sanitization before AI responses reach downstream consumers. |
| Detection |
Calls lambda:ListFunctions and searches for functions with naming patterns indicating output validation (e.g., “output-valid”, “sanitiz”, “post-process”, “response-filter”). Flags if no such functions exist. |
| Remediation |
1. Implement a post-processing Lambda that validates AI model output before it reaches the end user or downstream system. 2. Validate output against expected schema (JSON schema validation for structured responses). 3. Strip or escape any executable content (HTML tags, JavaScript, SQL fragments). 4. Log rejected outputs for security monitoring. |
| Reference |
AWS Well-Architected Security Pillar — Application Security, Bedrock Prompt Injection Security, Well-Architected FSI Lens — FSISEC14 Monitor AI system outputs for security issues |
FS-56 — XSS Prevention WAF
| Field |
Detail |
| Severity |
Medium |
| Guide ref |
[Guide §1.2.13, extension] — WAF XSS rule groups are not named in the guide, but implement the guide mitigation “Apply context-specific output sanitization … apply HTML encoding for web applications” at the network edge. |
| Description |
Verifies WAF ACLs include XSS prevention rules to protect against AI-generated outputs containing malicious scripts. |
| Detection |
Calls wafv2:GetWebACL for each regional ACL and inspects rules for AWSManagedRulesCommonRuleSet (which includes the four CrossSiteScripting_* rules covering request body, query arguments, cookies, and URI path) or custom rules using XssMatchStatement on request components. Flags ACLs missing XSS protection. |
| Remediation |
1. Add AWSManagedRulesCommonRuleSet to your WAF Web ACL (includes CrossSiteScripting_COOKIE, CrossSiteScripting_QUERYARGUMENTS, CrossSiteScripting_BODY, and CrossSiteScripting_URIPATH rules; all four inspect inbound request components). 2. XssMatchStatement and the CRS XSS rules inspect request components only (body, query string, URI path, cookies, headers). WAF does NOT inspect arbitrary response bodies for XSS: response inspection (ResponseInspection) is available only in AWSManagedRulesATPRuleSet/AWSManagedRulesACFPRuleSet for CloudFront-protected ACLs and only scans for configured success/failure strings. 3. To protect against XSS in AI-generated output, enforce output encoding at the application layer (see FS-57). Rendering raw model output in a browser without encoding is the root cause, and WAF cannot mitigate it after the fact; treat the WAF rules as defense-in-depth for inbound requests. |
| Reference |
AWS WAF XSS Protection |
FS-57 — Output Encoding
| Field |
Detail |
| Severity |
Informational |
| Guide ref |
[Guide §1.2.13] — “Apply context-specific output sanitization based on the downstream consumer. For example, apply HTML encoding for web applications, SQL parameterization for database queries, and command escaping for system integrations.” Practical guidance: “Use Amazon Bedrock Agents to securely integrate with AWS native and third-party services and implement output encoding in the action group Lambda function under an Amazon Bedrock Agent. Encoding all output text presented to end-users makes it automatically non-executable by JavaScript or Markdown.” |
| Description |
Advisory: verifies application encodes GenAI outputs appropriately for the rendering context (HTML, JSON, SQL). |
| Detection |
Advisory check — inspects application Lambda functions for encoding libraries or patterns (e.g., html.escape, json.dumps, markupsafe). Checks environment variables for encoding-related configuration. |
| Remediation |
1. Treat all model output as untrusted user input. 2. Apply context-specific encoding: HTML encoding for web display, SQL parameterization for database queries, command escaping for system integrations. 3. Use Bedrock Agents action-group Lambda functions to implement output encoding — encoding all output text makes it non-executable by JavaScript or Markdown renderers. 4. Never render raw model output in a web page without encoding. |
| Reference |
OWASP Output Encoding |
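Two of the context-specific encodings named in FS-57 can be sketched with the Python standard library. This is a minimal sketch: the function names, the answers table, and the use of sqlite3 are assumptions chosen so the example is self-contained; an action-group Lambda would apply the same ideas with its own database driver and templating layer.

```python
import html
import sqlite3


def to_html(raw_model_output):
    """HTML-encode model output so a browser renders it as inert text."""
    return html.escape(raw_model_output, quote=True)


def store_answer(conn, raw_model_output):
    """Store model output using a parameterized query, never string concatenation."""
    conn.execute("INSERT INTO answers(body) VALUES (?)", (raw_model_output,))


# Example: hostile-looking output is encoded for HTML and stored verbatim.
hostile = "<script>alert('x')</script>"
encoded = to_html(hostile)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers(body TEXT)")  # hypothetical table
store_answer(conn, "x'); DROP TABLE answers;--")
stored = conn.execute("SELECT body FROM answers").fetchone()[0]
```

The stored value comes back byte for byte because the driver binds it as data; the HTML-encoded value is what should reach the page template.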
FS-58 — Output Schema Validation
| Field |
Detail |
| Severity |
Informational |
| Guide ref |
[Guide §1.2.13] — “Implement output validation rules specific to the expected response format. For example, if the AI system is expected to return structured data (JSON, SQL), validate the output against the expected schema before processing.” |
| Description |
Checks for structured output validation in GenAI pipelines (JSON schema, XML schema, or custom validators). |
| Detection |
Inspects Step Functions state machine definitions for states that perform schema validation (e.g., Choice states with JSON path conditions, Lambda states with “schema” or “validate” in the name). Does not rely on API Gateway response models as a validation signal because those are used for SDK generation, not runtime validation. |
| Remediation |
1. Define a JSON schema for expected AI output format. 2. Add a validation step in your pipeline (Lambda function or Step Functions Choice state) that rejects non-conforming outputs before returning the response to clients — this is the runtime enforcement point. 3. Note: API Gateway response models in REST APIs are used for SDK generation (user-defined data types) and documentation — they do NOT perform runtime validation of response payloads. API Gateway request validators only validate inbound requests against request models. To validate AI output at runtime, implement the check in Lambda/Step Functions before the response reaches API Gateway. 4. Return a safe fallback response when validation fails. 5. Log rejected outputs (without leaking sensitive content) for security monitoring. |
| Reference |
API Gateway Request and Response Validation |
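The runtime validation step in FS-58 remediation items 2 and 4 can be sketched with the standard library alone. This is a hedged sketch: the expected fields, the fallback text, and the function name are hypothetical, and the hand-rolled type checks stand in for a full JSON Schema validator, which a production pipeline would more likely use.

```python
import json

# Hypothetical output contract for illustration only.
EXPECTED_FIELDS = {"answer": str, "confidence": float, "sources": list}

FALLBACK = {
    "answer": "Sorry, a valid response could not be produced.",
    "confidence": 0.0,
    "sources": [],
}


def validate_output(raw_text):
    """Return (payload, is_valid); payload is a safe fallback when invalid.

    The model output must be a JSON object with exactly the expected keys,
    each holding a value of the expected type.
    """
    try:
        data = json.loads(raw_text)
    except (json.JSONDecodeError, TypeError):
        return dict(FALLBACK), False
    if not isinstance(data, dict) or set(data) != set(EXPECTED_FIELDS):
        return dict(FALLBACK), False
    for field, expected_type in EXPECTED_FIELDS.items():
        if not isinstance(data[field], expected_type):
            return dict(FALLBACK), False
    return data, True


good, good_ok = validate_output('{"answer": "4.5% APR", "confidence": 0.9, "sources": ["rates.pdf"]}')
bad, bad_ok = validate_output("Sure! Here is the JSON you asked for...")
```

Rejected outputs would be logged by the caller (without the sensitive payload) before the fallback is returned to the client.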
Off-Topic & Inappropriate Output (FS-59 to FS-60)
Guide source: §1.2.2 Off-topic and inappropriate output. Guide-listed mitigations:
(a) prompt engineering with an allowlist of approved topics aligned with business purpose;
(b) content filters and denied topics in Bedrock Guardrails;
(c) Bedrock Guardrails contextual grounding check with reference source and query;
(d) HITL validation for internal AI systems.
FS-59 — Guardrail Topic Allowlist
| Field |
Detail |
| Severity |
Medium |
| Guide ref |
[Guide §1.2.2] — “Configure content filters and guardrails to restrict model responses to approved topics.” The check name uses “allowlist” loosely — implementation uses denied-topic lists to block out-of-scope content. |
| Description |
Verifies guardrails restrict GenAI to on-topic financial services responses via denied topics. |
| Detection |
Calls bedrock:GetGuardrail and inspects topicPolicy.topics. Checks that denied topics exist to block off-topic conversations (e.g., politics, entertainment, medical advice). Flags guardrails with no topic restrictions. |
| Remediation |
1. Define denied topics that are outside your business scope (e.g., “medical advice”, “legal advice”, “political opinions”, “entertainment recommendations”). 2. Add these as denied topics in the guardrail with clear descriptions and sample phrases. 3. Test with off-topic prompts to verify they are blocked. 4. Use the system prompt to positively scope the assistant’s role. |
| Reference |
Bedrock Guardrails Topic Policies |
FS-60 — Contextual Grounding for Off-Topic
| Field |
Detail |
| Severity |
Informational |
| Guide ref |
[Guide §1.2.2] — “Use prompt engineering techniques to guide the model toward appropriate topics and prevent unwanted responses. Include an allowlist of approved topics aligned with the business purpose.” Use of Bedrock Prompt Management for system prompt versioning is an implementation choice. |
| Description |
Advisory: verifies system prompts explicitly scope the assistant’s role to prevent off-topic responses. |
| Detection |
Advisory check — inspects Bedrock Prompt Management templates (via ListPrompts on the bedrock-agent boto3 client; IAM action bedrock:ListPrompts) for system prompt content that defines the assistant’s role, scope, and boundaries. Flags if no prompt templates exist. |
| Remediation |
1. Define a clear system prompt that states: the assistant’s role, allowed topics, prohibited topics, and response format. 2. Use Bedrock Prompt Management to version and manage system prompts. 3. Include explicit instructions like “You are a financial services assistant. Only answer questions related to [specific topics]. Decline all other requests politely.” 4. Test with boundary-case prompts. |
| Reference |
Bedrock Prompt Management |
Out-of-Date Training Data (FS-61 to FS-63)
Guide source: §1.2.10 Out-of-date training data. Guide-listed mitigations:
(a) RAG with Bedrock Knowledge Bases;
(b) keep knowledge bases up to date (sync data sources);
(c) HITL validation for internal AI systems;
(d) data currency disclaimers in AI system responses; source attribution via
RetrieveAndGenerate API for users to verify currency.
FS-61 — Knowledge Base Sync Schedule
| Field |
Detail |
| Severity |
Medium |
| Guide ref |
[Guide §1.2.10] — “Keep your knowledge bases up to date.” Automated scheduling via EventBridge operationalises this mitigation. |
| Description |
Checks that EventBridge Scheduler schedules or EventBridge rules automate KB data source sync on a regular schedule. |
| Detection |
Calls events:ListRules and searches for rules with targets that invoke StartIngestionJob (IAM action bedrock:StartIngestionJob) or Lambda functions that trigger KB sync. Also calls scheduler:ListSchedules to find EventBridge Scheduler schedules targeting KB sync. Flags if no scheduled sync mechanism exists. |
| Remediation |
1. Use EventBridge Scheduler (the current recommended approach — EventBridge scheduled rules are a legacy feature) to create a recurring schedule that triggers KB data source sync: create a schedule with a rate expression (e.g., rate(1 day)) or cron expression (e.g., cron(0 2 * * ? *)) targeting a Lambda function. 2. The Lambda function calls StartIngestionJob (IAM action bedrock:StartIngestionJob) for each data source. 3. Add error handling and CloudWatch alarms for failed syncs. |
| Reference |
EventBridge Scheduler, EventBridge Scheduled Rules (legacy) |
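The Lambda function described in the remediation steps could look roughly like the sketch below. It assumes the `bedrock-agent` boto3 client's `list_data_sources` and `start_ingestion_job` operations; the `KNOWLEDGE_BASE_ID` environment variable name is hypothetical, and the error handling and alarms from step 3 are left out for brevity.

```python
import os


def start_sync_for_all_sources(client, knowledge_base_id):
    """Start one ingestion job per data source and return the job IDs."""
    job_ids = []
    kwargs = {"knowledgeBaseId": knowledge_base_id}
    while True:
        page = client.list_data_sources(**kwargs)
        for source in page.get("dataSourceSummaries", []):
            job = client.start_ingestion_job(
                knowledgeBaseId=knowledge_base_id,
                dataSourceId=source["dataSourceId"],
            )
            job_ids.append(job["ingestionJob"]["ingestionJobId"])
        token = page.get("nextToken")
        if not token:
            return job_ids
        kwargs["nextToken"] = token  # follow pagination until exhausted


def handler(event, context):
    """Lambda entry point invoked by the recurring schedule."""
    import boto3  # imported lazily so the helper above can be tested offline

    client = boto3.client("bedrock-agent")
    # KNOWLEDGE_BASE_ID is a hypothetical environment variable name.
    kb_id = os.environ["KNOWLEDGE_BASE_ID"]
    return {"started": start_sync_for_all_sources(client, kb_id)}
```

Passing the client in as a parameter keeps the sync logic testable with a stub; a production version would also catch per-source failures so one failed job does not stop the rest.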
FS-62 — Data Currency Disclaimer
| Field |
Detail |
| Severity |
Informational |
| Guide ref |
[Guide §1.2.10] — “Include data currency disclaimers in AI system responses where appropriate. Use source attribution in RAG-based response for end users to verify currency of information.” |
| Description |
Advisory: verifies application adds data currency disclaimers to AI-generated outputs. |
| Detection |
Advisory check — inspects application configuration for data-currency disclaimer settings. Checks system prompts for instructions to include data freshness information. |
| Remediation |
1. Add a data currency disclaimer to responses: “This information is based on data available as of [date]. It may not reflect the most recent changes.” 2. Use the RetrieveAndGenerate API’s source attribution to display document dates. 3. Configure the system prompt to instruct the model to caveat time-sensitive information. |
| Reference |
Bedrock RetrieveAndGenerate API |
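As an illustration of remediation step 1, a response post-processor might append the disclaimer as sketched below. The function name and the choice to pass the data date explicitly are assumptions; in practice the date would come from the knowledge base's last successful sync or from document metadata.

```python
from datetime import date


def with_currency_disclaimer(answer, data_as_of):
    """Append a data-currency disclaimer to a generated answer."""
    disclaimer = (
        f"This information is based on data available as of {data_as_of.isoformat()}. "
        "It may not reflect the most recent changes."
    )
    return f"{answer.rstrip()}\n\n{disclaimer}"


if __name__ == "__main__":
    print(with_currency_disclaimer("The published base rate is 5%.", date(2025, 1, 31)))
```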
FS-63 — Foundation Model Lifecycle Policy
| Field |
Detail |
| Severity |
Medium |
| Guide ref |
[Guide §1.2.10, extension] — FM currency is conceptually related to “out-of-date training data” but the specific Bedrock lifecycle-status check is not named in the guide. The guide’s “1.1.6 Monitor and improve” general guidance says “Update your foundation models when new versions become available” — this FS check operationalises that guidance. See also FS-34 (TPRM) which the guide places under §1.2.12. |
| Description |
Checks for a model lifecycle management process and Config rules to ensure models are updated when new versions are available. |
| Detection |
Calls config:DescribeConfigRules and searches for rules targeting Bedrock resources. Calls bedrock:GetFoundationModel for each model in use and inspects modelLifecycle.status. Flags models with status LEGACY (note: the Bedrock API exposes only two lifecycle status values — ACTIVE and LEGACY; models past their endOfLifeTime are removed from the service entirely and return a ResourceNotFound error, so any model still reachable via the API that is not ACTIVE will be LEGACY). |
| Remediation |
1. Create an AWS Config custom rule that flags Bedrock models with modelLifecycle.status=LEGACY. 2. Establish a model lifecycle policy: evaluate new model versions within 30 days of release, test in staging, migrate production within 90 days (and before the endOfLifeTime published on the Bedrock model lifecycle page). 3. Subscribe to Amazon Bedrock model lifecycle notifications. 4. Document the policy and assign an owner. 5. Budget planning for FinServ: for models with EOL dates after February 1, 2026, a model enters a public extended access period after a minimum of 3 months in Legacy state, and during that period the model provider may set higher pricing. The publicExtendedAccessTime timestamp in the FoundationModelLifecycle response indicates when this phase begins. Include this phase in contract-and-budget review so FinServ cost governance teams are aware of potential price changes before migrating off Legacy models. |
| Reference |
Bedrock Model Lifecycle |
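The lifecycle inspection in the detection step can be sketched as follows, assuming the `bedrock` boto3 client's `get_foundation_model` operation and its `modelDetails.modelLifecycle.status` response field. The helper name is hypothetical, and handling of models that have already been removed from the service is omitted.

```python
def legacy_models(client, model_ids):
    """Return the model IDs whose Bedrock lifecycle status is LEGACY."""
    flagged = []
    for model_id in model_ids:
        response = client.get_foundation_model(modelIdentifier=model_id)
        lifecycle = response["modelDetails"].get("modelLifecycle", {})
        if lifecycle.get("status") == "LEGACY":
            flagged.append(model_id)
    return flagged


def handler(event, context):
    """Hypothetical entry point; event["modelIds"] is an assumed input shape."""
    import boto3  # imported lazily so legacy_models can be tested with a stub

    return {"legacy": legacy_models(boto3.client("bedrock"), event["modelIds"])}
```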
Additional Controls — Material Gaps (FS-64 to FS-69)
These checks address mitigations explicitly called out in the Responsible AI GRC guide that were
not covered by the original checks in the upstream AIML Security Assessment (BR/SM/AC).
FS-64 is merged into upstream BR-04 (see extension note below); FS-65 to FS-69 ship as
standalone checks.
FS-64 — Guardrail Trace Logging → Merged into upstream BR-04
Upstream extension note (do not ship as a standalone check): The detection and remediation
content from FS-64 should be added as a refinement of the existing BR-04 (Model Invocation
Logging) check in the upstream repo.
What to add to BR-04:
- After verifying that bedrock:GetModelInvocationLoggingConfiguration shows logging is
enabled, additionally verify the log output captures guardrail trace data: when
guardrails are applied during inference, the invocation log contains a guardrailTrace
object with action (values: INTERVENED or NONE), inputAssessments, and
outputAssessments arrays detailing which policies were evaluated and their results.
- Important logging coverage gap: Model invocation logging only captures calls made through the
bedrock-runtime endpoint (Converse, ConverseStream, InvokeModel, InvokeModelWithResponseStream). Calls made through the bedrock-mantle endpoint (e.g., the Responses API) are not currently captured by invocation logging. If your application uses the Responses API, implement application-level logging as a compensating control.
- Add a remediation note on retention requirements: NYDFS 23 NYCRR 500.06 requires records
sufficient to reconstruct material financial transactions to be kept for at least 5 years, and
audit trails designed to detect and respond to cybersecurity events for at least 3 years;
SR 11-7 does not prescribe a specific period but requires documentation be maintained for the
duration of model use plus a reasonable period thereafter (commonly met with 5–7 year
retention per firm policy). Consult your compliance and records-management team for exact
requirements.
- Suggest creating CloudWatch Logs metric filters to track guardrail intervention rates (filter
on guardrailTrace.action = INTERVENED) and applying CloudWatch Logs data protection
policies to mask PII in traces.
- Guide traceability: [Guide §1.2.1] — “Maintain audit logs of AI-generated outputs and the
guardrails applied to support regulatory reporting and post-incident analysis.” Also
§1.2.9 — “Implement audit logging of all actions taken by AI agents.”
Reference: Bedrock Model Invocation Logging
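To illustrate the intervention-rate metric suggested above, the sketch below computes it offline from already-parsed invocation log records. The exact position of the `guardrailTrace` object inside a log record is an assumption here, so the helper searches the whole record rather than relying on a fixed path; both function names are hypothetical.

```python
def find_guardrail_actions(node):
    """Yield every guardrailTrace.action value found anywhere in a parsed log record."""
    if isinstance(node, dict):
        trace = node.get("guardrailTrace")
        if isinstance(trace, dict) and "action" in trace:
            yield trace["action"]
        for value in node.values():
            yield from find_guardrail_actions(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_guardrail_actions(item)


def intervention_rate(records):
    """Fraction of guardrail evaluations that intervened, or None if none were logged."""
    actions = [a for record in records for a in find_guardrail_actions(record)]
    if not actions:
        return None
    return sum(a == "INTERVENED" for a in actions) / len(actions)
```

Returning `None` when no trace is present distinguishes "guardrails never intervened" from "no guardrail trace data was captured", which is the condition the BR-04 refinement is meant to surface.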
FS-65 — KB Data Source S3 Event Notifications
| Field |
Detail |
| Severity |
High (deleted bucket) / Medium (notifications) |
| Guide ref |
[Guide §1.2.3] — “Use integrity monitoring on knowledge base data sources to detect unauthorized modifications… For example on S3 data sources use Amazon S3 event notification to track changes to documents.” Note: This check overlaps with FS-33; FS-33 verifies notifications are enabled on the bucket, while FS-65 verifies that notifications are routed to an alerting destination (SNS/Lambda/EventBridge rule with a target). In the final PR to aws-samples these two checks may be consolidated into a single check at the reviewer’s discretion. |
| Description |
Checks that S3 event notifications on KB data-source buckets are routed to an alerting destination (EventBridge rule with SNS/Lambda target, or direct SNS/SQS/Lambda notification) — not just enabled with no consumer. |
| Detection |
Identifies KB data-source S3 buckets via ListDataSources and GetDataSource (via the bedrock-agent boto3 client; IAM actions bedrock:ListDataSources and bedrock:GetDataSource). For each bucket, calls s3:GetBucketNotificationConfiguration and checks for the presence of EventBridgeConfiguration, TopicConfigurations, QueueConfigurations, or LambdaFunctionConfigurations. Flags buckets with no notifications configured. |
| Remediation |
1. Enable EventBridge notifications on each KB data-source bucket: aws s3api put-bucket-notification-configuration --bucket <name> --notification-configuration '{"EventBridgeConfiguration":{}}'. 2. Create an EventBridge rule matching S3 event detail types "Object Created" and "Object Deleted" for the bucket (note: when S3 sends events to EventBridge, the event detail types are Object Created/Object Deleted; the s3:ObjectCreated:* and s3:ObjectRemoved:* wildcard names are used only for direct SNS/SQS/Lambda notification configurations, not for EventBridge rule patterns). 3. Route events to an SNS topic or Lambda function for alerting. 4. Integrate alerts into your security incident response workflow. |
| Reference |
S3 EventBridge Integration |
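A minimal sketch of the two pieces above: classifying a `GetBucketNotificationConfiguration` response, and building the EventBridge rule pattern for step 2. The function names are hypothetical. Presence of the `EventBridgeConfiguration` key is tested rather than its truthiness, because the enabled setting is an empty object.

```python
DIRECT_CONSUMER_KEYS = (
    "TopicConfigurations",
    "QueueConfigurations",
    "LambdaFunctionConfigurations",
)


def notification_targets(config):
    """List the notification mechanisms present in a bucket notification configuration."""
    found = [key for key in DIRECT_CONSUMER_KEYS if config.get(key)]
    if "EventBridgeConfiguration" in config:
        found.append("EventBridgeConfiguration")
    return found


def s3_change_event_pattern(bucket_name):
    """EventBridge rule pattern matching object creation and deletion in one bucket."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created", "Object Deleted"],
        "detail": {"bucket": {"name": [bucket_name]}},
    }
```

A bucket for which `notification_targets` returns an empty list would be flagged. The pattern dictionary can be serialized with `json.dumps` and supplied as the rule's event pattern.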
FS-66 — AgentCore End-User Identity Propagation
| Field |
Detail |
| Severity |
High |
| Guide ref |
[Guide §1.2.6 — Practical guidance] — “1. Implement least privilege for identities associated with agents and tool services. 2. Where supported by the tool service ensure that communications to tool services or agents are authorized by the end user. 3. Customers building their own tool services should consider propagating end-user identities separately; ensuring these identities can be validated and are not revealed to unauthorized third parties.” |
| Description |
Verifies AgentCore runtimes are configured to propagate end-user identities to downstream tool services, ensuring tool calls are authorized by the originating user and not solely by the agent execution role. |
| Detection |
Calls ListAgentRuntimes (via the bedrock-agentcore-control boto3 client; IAM action bedrock-agentcore:ListAgentRuntimes) and inspects each runtime’s authorizerConfiguration.customJWTAuthorizer for a discoveryUrl and allowed audiences/clients/scopes. Flags runtimes with no JWT authorizer (meaning inbound calls carry no verifiable end-user identity), and advises configuring outbound OAuth for downstream tool services. |
| Remediation |
1. Configure a custom JWT inbound authorizer on each AgentCore runtime: specify discoveryUrl, allowedAudience, allowedClients, and optional required custom claims. 2. Propagate the end-user’s identity via the X-Amzn-Bedrock-AgentCore-Runtime-User-Id header and JWT token in the Authorization header when calling downstream tool services. Important: Invoking InvokeAgentRuntime with the X-Amzn-Bedrock-AgentCore-Runtime-User-Id header requires the distinct IAM action bedrock-agentcore:InvokeAgentRuntimeForUser in addition to bedrock-agentcore:InvokeAgentRuntime. Only trusted principals should hold this permission — scope it to specific runtime resources with IAM resource conditions, never via wildcard. For runtimes that do not need user-id delegation, explicitly deny bedrock-agentcore:InvokeAgentRuntimeForUser to prevent the header from being accepted. Additionally, derive the user-id from the authenticated principal’s context (IAM caller identity or JWT claims) rather than from arbitrary client-supplied values to prevent user impersonation, and log the relationship between the authenticated IAM principal (via CloudTrail’s SigV4 context) and the user-id value passed. 3. Configure outbound OAuth 2.0 for agents accessing third-party resources on behalf of the user. 4. Ensure tool services validate the propagated JWT before executing actions. 5. Implement agent identity segregation: assign distinct identities to each sub-agent in multi-agent workflows so actions are separately attributable. 6. Apply a maker-checker pattern for critical financial actions — require a second agent or human to verify before execution. 7. Do not log or expose propagated identity tokens to unauthorized third parties. |
| Reference |
Configure Inbound JWT Authorizer, Inbound and Outbound Auth |
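The inbound-authorizer inspection could be expressed as the sketch below. It assumes each runtime is available as a dictionary exposing the `authorizerConfiguration.customJWTAuthorizer` fields named in the detection text (`discoveryUrl`, `allowedAudience`, `allowedClients`); the function name and finding strings are hypothetical.

```python
def jwt_authorizer_gaps(runtime):
    """Return a list of findings for one AgentCore runtime description."""
    config = runtime.get("authorizerConfiguration") or {}
    authorizer = config.get("customJWTAuthorizer")
    if not authorizer:
        # Without a JWT authorizer, inbound calls carry no verifiable end-user identity.
        return ["no customJWTAuthorizer configured"]
    gaps = []
    if not authorizer.get("discoveryUrl"):
        gaps.append("missing discoveryUrl")
    if not (authorizer.get("allowedAudience") or authorizer.get("allowedClients")):
        gaps.append("no allowedAudience or allowedClients restriction")
    return gaps
```

An empty list means the runtime passes this part of the check; outbound OAuth configuration for downstream tool services still needs separate review.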
FS-67 — Agent Financial Transaction Value Thresholds
| Field |
Detail |
| Severity |
High |
| Guide ref |
[Guide §1.2.9] — “Enforce transaction value thresholds and action boundaries on agent tool calls (for example to cap financial transaction amounts).” |
| Description |
Checks that the AgentCore Policy Engine (attached to Gateways) or action-group Lambda functions enforce maximum transaction-value limits (e.g., a cap on the financial amounts an agent can initiate) to prevent runaway or unauthorized high-value transactions. |
| Detection |
(a) Calls ListGateways (via the bedrock-agentcore-control boto3 client; IAM action bedrock-agentcore:ListGateways) and for each inspects attached Policy Engine Cedar policies for transaction-value constraints (policies referencing amount, limit, or threshold context attributes). (b) Calls lambda:ListFunctions and filters for agent action-group Lambda functions. Inspects each function’s environment variables for threshold-related keys (e.g., MAX_TRANSACTION_AMOUNT, TRANSACTION_LIMIT). Flags gateways and functions with no threshold configuration. |
| Remediation |
1. Add transaction-value threshold environment variables to each agent action-group Lambda (e.g., MAX_TRANSACTION_AMOUNT=10000). 2. Implement threshold enforcement logic in the Lambda handler that rejects or escalates transactions exceeding the limit. 3. Author Cedar policies in the AgentCore Policy Engine that evaluate tool-call context attributes (amount, currency, tool) and deny calls exceeding defined limits. 4. Route transactions exceeding thresholds to a human-in-the-loop approval step via Step Functions callback pattern. |
| Reference |
Policy in AgentCore, AgentCore Example Policies |
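Remediation steps 1 and 2 might be implemented along the lines of this sketch. The event shape (`event["amount"]`) and the three decision labels are assumptions; `MAX_TRANSACTION_AMOUNT` is the example variable name used above. The logic fails closed: a missing or malformed limit or amount is rejected rather than allowed.

```python
import os
from decimal import Decimal, InvalidOperation


def evaluate_transaction(raw_amount, raw_limit):
    """Return 'allow', 'escalate' (over the limit), or 'reject' (invalid input)."""
    try:
        amount = Decimal(str(raw_amount))
        limit = Decimal(str(raw_limit))
    except (InvalidOperation, ValueError):
        return "reject"
    # NaN and Infinity parse successfully, so exclude them before comparing.
    if not amount.is_finite() or not limit.is_finite() or amount <= 0:
        return "reject"
    return "allow" if amount <= limit else "escalate"


def handler(event, context):
    """Hypothetical action-group Lambda entry point."""
    decision = evaluate_transaction(
        event.get("amount"), os.environ.get("MAX_TRANSACTION_AMOUNT")
    )
    # An 'escalate' decision would be routed to the human approval step.
    return {"decision": decision}
```

`Decimal` is used instead of `float` so that amounts such as 10000.01 compare exactly against the limit.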
FS-68 — API Gateway Request Body Size Limits
| Field |
Detail |
| Severity |
Medium |
| Guide ref |
[Guide §1.2.11] — “To protect your API endpoints, set maximum length limits for input requests when you use large language models (LLMs) directly or through Amazon Bedrock.” |
| Description |
Verifies API Gateway REST/HTTP APIs fronting GenAI endpoints have WAF SizeConstraintStatement rules enforcing a maximum request body size, optionally paired with an API Gateway request-body JSON schema that bounds individual field lengths — to prevent token-exhaustion attacks via oversized prompts. |
| Detection |
Calls apigateway:GetRestApis and for each calls apigateway:GetRequestValidators to check for validators (validators enforce parameter-existence and request-body JSON schema conformance — not total body size). Calls wafv2:GetWebACL for associated ACLs and inspects rules for SizeConstraintStatement targeting the request body. Flags APIs with no WAF SizeConstraintStatement on body, since that is the only AWS-native mechanism that enforces a custom maximum body size in front of API Gateway. |
| Remediation |
1. Primary control — WAF SizeConstraintStatement: Add a WAF SizeConstraintStatement rule on your regional Web ACL that blocks requests whose body size exceeds your maximum allowed prompt length (e.g., 32 KB). Verify that the Web ACL’s AssociationConfig.RequestBody.DefaultSizeInspectionLimit is set high enough (16 KB default; can be increased to 32/48/64 KB) so WAF can actually inspect bodies at the size you are enforcing against — if the inspection limit is lower than the SizeConstraintStatement threshold, oversized requests fall through to oversize handling instead of the rule. This is the only AWS-native way to enforce a custom maximum body size before requests reach API Gateway. 2. Secondary control — API Gateway request validation: Add an API Gateway request validator with a request-body model (JSON schema). Request validators do not enforce total body size, but a JSON schema can constrain individual string fields with maxLength and arrays with maxItems, which indirectly bounds payload content. Note API Gateway REST APIs also enforce a service-level hard limit of 10 MB per request (6 MB when integrated with Lambda) that you cannot lower. 3. Set the max_tokens parameter in Bedrock API calls to cap output length. 4. Implement client-side token counting before submitting requests. |
| Reference |
WAF Size Constraint, WAF Body Inspection Size Limit, API Gateway Request Validation |
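The Web ACL inspection in the detection step can be sketched as a recursive search over rule statements, since a `SizeConstraintStatement` may be nested inside `AndStatement`, `OrStatement`, or `NotStatement`. The input is assumed to be the `WebACL` object from a `wafv2:GetWebACL` response; the function names are hypothetical. The sketch does not expand rule group references, nor does it verify the rule action or the configured size.

```python
def has_body_size_constraint(statement):
    """True if a SizeConstraintStatement on the request body appears anywhere in the statement."""
    if isinstance(statement, dict):
        size = statement.get("SizeConstraintStatement")
        if isinstance(size, dict):
            field = size.get("FieldToMatch", {})
            if "Body" in field or "JsonBody" in field:
                return True
        return any(has_body_size_constraint(v) for v in statement.values())
    if isinstance(statement, list):
        return any(has_body_size_constraint(v) for v in statement)
    return False


def web_acl_limits_body_size(web_acl):
    """True if any rule in the Web ACL constrains request body size."""
    return any(
        has_body_size_constraint(rule.get("Statement", {}))
        for rule in web_acl.get("Rules", [])
    )
```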
| Field |
Detail |
| Severity |
Medium |
| Guide ref |
[Guide §1.2.8] — “Input Validation – Before you send user input to Amazon Bedrock or the tokenizer, validate and sanitize it by removing special characters or using escape sequences. Make sure the input matches your expected format.” |
| Description |
Checks for a Lambda function or API Gateway request validator that sanitizes user prompt input (strips special characters, enforces expected format, rejects oversized inputs) before forwarding to Bedrock, complementing WAF-level controls. |
| Detection |
Calls lambda:ListFunctions and searches for functions with input-validation naming patterns (e.g., “sanitiz”, “validat”, “input-filter”, “prompt-guard”, “preprocess”). Flags if no such functions exist. |
| Remediation |
1. Implement a Lambda authorizer or pre-processing function that: strips or escapes special characters from user input; validates input against an expected format (e.g., regex allowlist); rejects inputs exceeding maximum token/character limits; logs rejected inputs for security monitoring. 2. Use parameterized prompt templates (Bedrock Prompt Management) instead of string concatenation. 3. Apply Bedrock Guardrails PROMPT_ATTACK filter as a complementary control. 4. Integrate the validation function as an API Gateway Lambda authorizer or Step Functions pre-processing step. 5. Implement schema validation for all tool interactions — validate both inputs to and outputs from tools against defined JSON schemas per AWS Prescriptive Guidance for tool integration security. 6. Enforce TLS for all remote tool communications. |
| Reference |
Bedrock Prompt Injection Security, Security Best Practices for Tool Integration |
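A pre-processing function of the kind described in remediation step 1 might look like the sketch below. The character allowlist and the 4,000-character limit are illustrative assumptions, not values from the guide; they should be tuned to the application's expected input format. Logging of rejected inputs is omitted.

```python
import re
import unicodedata

MAX_PROMPT_CHARS = 4000  # hypothetical limit; align with your model's input budget
# Anything outside this allowlist (word characters, whitespace, common punctuation) is stripped.
_DISALLOWED = re.compile(r"""[^\w\s.,;:!?'"()%$@/+\-]""")


def sanitize_prompt(text, max_chars=MAX_PROMPT_CHARS):
    """Normalize, length-check, and strip disallowed characters from user input."""
    if not isinstance(text, str):
        raise ValueError("prompt must be a string")
    normalized = unicodedata.normalize("NFKC", text)
    if len(normalized) > max_chars:
        raise ValueError("prompt exceeds maximum length")
    cleaned = _DISALLOWED.sub("", normalized)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    if not cleaned:
        raise ValueError("prompt is empty after sanitization")
    return cleaned
```

Character stripping alone does not stop prompt injection written in plain language, which is why the steps above pair it with parameterized templates and the Guardrails prompt-attack filter.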