# Deploy Options

## Configuration Method
GenU allows you to change settings in `parameter.ts` or through AWS CDK context. Since you can define settings for multiple environments, we recommend using `parameter.ts` when building a new environment. For backward compatibility, parameters are resolved in the order context > `parameter.ts`.

While CDK context values can be specified with `-c`, doing so does not trigger a frontend build because no code changes are made. For this asset, it is recommended to change settings in `cdk.json` rather than via `-c`.
### How to change parameter.ts values

Change the values in `packages/cdk/parameter.ts` to configure settings.
```typescript
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
  dev: {
    ragEnabled: false,
    // Customize other parameters as needed
  },
  staging: {
    ragEnabled: false,
    // Customize other parameters as needed
  },
  prod: {
    ragEnabled: true,
    // Customize other parameters as needed
  },
};
```
If the environment specified by `env` in the CDK context is defined in `parameter.ts`, the values in `parameter.ts` take precedence. If the specified `env` environment is not defined in `parameter.ts`, the environment is created with context values. You can specify `env` in the `context` of `packages/cdk/cdk.json` or switch it with `-c`.
```json
// cdk.json
{
  "context": {
    "env": "dev"
  }
}
```
```bash
# Deploy with the env specified in cdk.json context.env
npm run cdk:deploy

# Deploy with environment set to prod
npm run cdk:deploy -- -c env=prod
```
When developing locally, specify `env` as follows:
```bash
# Develop locally using the backend specified by cdk.json context.env
npm run web:devw

# Develop locally using the dev2 environment backend
npm run web:devw --env=dev2
```
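The precedence rules above can be sketched as follows. This is a minimal illustration only, not GenU's actual implementation; `StackInput` is simplified here to a plain record, and `resolveParams` is a hypothetical helper name:

```typescript
// Minimal sketch of the env/parameter precedence described above.
// Assumption: StackInput is reduced to a plain record for illustration.
type StackInput = Record<string, unknown>;

const envs: Record<string, Partial<StackInput>> = {
  dev: { ragEnabled: false },
  prod: { ragEnabled: true },
};

// "context > parameter.ts": context entries (from cdk.json or -c flags)
// override parameter.ts entries; an env missing from parameter.ts
// falls back to context values alone.
function resolveParams(
  env: string,
  context: Partial<StackInput>
): Partial<StackInput> {
  const fromParameterTs = envs[env] ?? {};
  return { ...fromParameterTs, ...context };
}
```

For example, `resolveParams('dev', { ragEnabled: true })` yields `ragEnabled: true` because the context value wins over the `parameter.ts` value.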
### How to change cdk.json values

Change values under `context` in `packages/cdk/cdk.json` to configure settings. For example, setting `"ragEnabled": true` enables the RAG Chat use case. After setting context values, redeploy with the following command to apply them:

```bash
npm run cdk:deploy
```
## Use Case Configuration

### Enabling RAG Chat (Amazon Kendra) Use Case
Set `ragEnabled` to `true` (the default is `false`). You can also set `kendraIndexLanguage` to your preferred language, which may improve search performance.
Edit `parameter.ts`:

```typescript
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
  dev: {
    ragEnabled: true,
    kendraIndexLanguage: 'en',
  },
};
```
Edit `cdk.json`:

```json
// cdk.json
{
  "context": {
    "ragEnabled": true,
    "kendraIndexLanguage": "en"
  }
}
```
After making changes, redeploy with `npm run cdk:deploy` to apply them. Data stored in `/packages/cdk/rag-docs/docs` will be automatically uploaded to the S3 bucket used as the Kendra data source. (Note that files whose names start with `logs` will not be synchronized.)
Next, sync the Kendra data source with the following steps:

- Open the Amazon Kendra console
- Click on `generative-ai-use-cases-index`
- Click on Data sources
- Click on `s3-data-source`
- Click on Sync now

When the Status/Summary in Sync run history shows Completed, the process is finished. Files stored in S3 are synchronized and can be searched through Kendra.
#### Using an Existing Amazon Kendra Index

When using an existing Kendra index, note that `ragEnabled` still needs to be `true`. Specify the index ARN in `kendraIndexArn`. If the existing Kendra index uses an S3 data source, specify the bucket name in `kendraDataSourceBucketName`.
Edit `parameter.ts`:

```typescript
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
  dev: {
    kendraIndexArn: '<Kendra Index ARN>',
    kendraDataSourceBucketName: '<Kendra S3 Data Source Bucket Name>',
  },
};
```
Edit `cdk.json`:

```json
// cdk.json
{
  "context": {
    "kendraIndexArn": "<Kendra Index ARN>",
    "kendraDataSourceBucketName": "<Kendra S3 Data Source Bucket Name>"
  }
}
```
After making changes, redeploy with `npm run cdk:deploy` to apply them.

`<Kendra Index ARN>` has the following format:

```
arn:aws:kendra:<Region>:<AWS Account ID>:index/<Index ID>
```

For example:

```
arn:aws:kendra:ap-northeast-1:333333333333:index/77777777-3333-4444-aaaa-111111111111
```
### Enabling RAG Chat (Knowledge Base) Use Case

Set `ragKnowledgeBaseEnabled` to `true` (the default is `false`). If you have an existing Knowledge Base, set `ragKnowledgeBaseId` to its knowledge base ID. (If `null`, an OpenSearch Serverless knowledge base will be created.)
Edit `parameter.ts`:

```typescript
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
  dev: {
    ragKnowledgeBaseEnabled: true,
    ragKnowledgeBaseId: 'XXXXXXXXXX',
    ragKnowledgeBaseStandbyReplicas: false,
    ragKnowledgeBaseAdvancedParsing: false,
    ragKnowledgeBaseAdvancedParsingModelId:
      'anthropic.claude-3-sonnet-20240229-v1:0',
    embeddingModelId: 'amazon.titan-embed-text-v2:0',
  },
};
```
Edit `cdk.json`:

```json
// cdk.json
{
  "context": {
    "ragKnowledgeBaseEnabled": true,
    "ragKnowledgeBaseId": "XXXXXXXXXX",
    "ragKnowledgeBaseStandbyReplicas": false,
    "ragKnowledgeBaseAdvancedParsing": false,
    "ragKnowledgeBaseAdvancedParsingModelId": "anthropic.claude-3-sonnet-20240229-v1:0",
    "embeddingModelId": "amazon.titan-embed-text-v2:0",
    "rerankingModelId": "amazon.rerank-v1:0",
    "queryDecompositionEnabled": true
  }
}
```
`ragKnowledgeBaseStandbyReplicas` controls the redundancy of the automatically created OpenSearch Serverless collection:

- `false`: Suitable for development and testing. Runs in a single AZ, reducing OCU costs by half.
- `true`: Suitable for production environments. Runs across multiple AZs for high availability.
`embeddingModelId` is the model used for embedding. Currently, the following models are supported:

- `amazon.titan-embed-text-v1`
- `amazon.titan-embed-text-v2:0`
- `cohere.embed-multilingual-v3`
- `cohere.embed-english-v3`
`rerankingModelId` is the model used for reranking (the default is `null`). Currently, the following models are supported:

- `amazon.rerank-v1:0`
- `cohere.rerank-v3-5:0`
`queryDecompositionEnabled` enables query decomposition (the default is `false`).
After making changes, redeploy with `npm run cdk:deploy` to apply them. The Knowledge Base will be deployed in the region specified by `modelRegion`. Please note the following:

- At least one model from the `modelRegion` region must be defined in `modelIds`.
- The `embeddingModelId` model must be enabled in Bedrock in the `modelRegion` region.
- The `rerankingModelId` model must be enabled in Bedrock in the `modelRegion` region.
- AWS CDK bootstrap must be completed in the `modelRegion` region before running `npm run cdk:deploy`.
```bash
# Example bootstrap command (if modelRegion is us-east-1)
npx -w packages/cdk cdk bootstrap --region us-east-1
```
During deployment, data stored in `/packages/cdk/rag-docs/docs` will be automatically uploaded to the S3 bucket used as the Knowledge Base data source. (Note that files whose names start with `logs` will not be synchronized.) After deployment is complete, follow these steps to sync the Knowledge Base data source:

- Open the Knowledge Base console
- Click on `generative-ai-use-cases-jp`
- Select `s3-data-source` and click Sync

When the Status becomes Available, the process is complete. Files stored in S3 have been ingested and can be searched through the Knowledge Base.
> [!NOTE]
> After enabling RAG Chat (Knowledge Base), if you want to disable it again, set `ragKnowledgeBaseEnabled: false` and redeploy. This disables RAG Chat (Knowledge Base), but the `RagKnowledgeBaseStack` itself will remain. To remove it completely, open the management console and delete the `RagKnowledgeBaseStack` stack from CloudFormation in the `modelRegion` region.
#### Enabling Advanced Parsing

You can enable the Advanced Parsing feature, which analyzes and extracts information from unstructured data such as tables and graphs in files. Adding the data extracted from tables and graphs, in addition to the text in files, can improve RAG accuracy.
- `ragKnowledgeBaseAdvancedParsing`: Set to `true` to enable Advanced Parsing
- `ragKnowledgeBaseAdvancedParsingModelId`: Specify the model ID used for extracting information
  - Supported models (as of 2024/08):
    - `anthropic.claude-3-sonnet-20240229-v1:0`
    - `anthropic.claude-3-haiku-20240307-v1:0`
Edit `parameter.ts`:

```typescript
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
  dev: {
    ragKnowledgeBaseEnabled: true,
    ragKnowledgeBaseId: 'XXXXXXXXXX',
    ragKnowledgeBaseStandbyReplicas: false,
    ragKnowledgeBaseAdvancedParsing: true,
    ragKnowledgeBaseAdvancedParsingModelId:
      'anthropic.claude-3-sonnet-20240229-v1:0',
    embeddingModelId: 'amazon.titan-embed-text-v2:0',
  },
};
```
Edit `cdk.json`:

```json
// cdk.json
{
  "context": {
    "ragKnowledgeBaseEnabled": true,
    "ragKnowledgeBaseId": "XXXXXXXXXX",
    "ragKnowledgeBaseStandbyReplicas": false,
    "ragKnowledgeBaseAdvancedParsing": true,
    "ragKnowledgeBaseAdvancedParsingModelId": "anthropic.claude-3-sonnet-20240229-v1:0",
    "embeddingModelId": "amazon.titan-embed-text-v2:0"
  }
}
```
#### Changing Chunking Strategy

In `rag-knowledge-base-stack.ts` there is a section that specifies `chunkingConfiguration`. Uncomment it and refer to the CDK documentation or the CloudFormation documentation to switch to the desired chunking strategy. For example, to change to semantic chunking, uncomment and specify as follows:
```typescript
// packages/cdk/lib/rag-knowledge-base-stack.ts
// Semantic chunking
chunkingConfiguration: {
  chunkingStrategy: 'SEMANTIC',
  semanticChunkingConfiguration: {
    maxTokens: 300,
    bufferSize: 0,
    breakpointPercentileThreshold: 95,
  },
},
```
Then apply the changes by following the steps in the next section, Recreating Knowledge Base or OpenSearch Service.
#### Recreating Knowledge Base or OpenSearch Service to apply changes

Changes to the Knowledge Base chunking strategy or to the following OpenSearch Service parameters are not reflected simply by running `npm run cdk:deploy`:

- `embeddingModelId`
- `ragKnowledgeBaseStandbyReplicas`
- `ragKnowledgeBaseAdvancedParsing`
- `ragKnowledgeBaseAdvancedParsingModelId`
To apply such changes, follow these steps to delete and recreate the existing Knowledge Base-related resources:

- Set `ragKnowledgeBaseEnabled` to `false` and deploy
- Open CloudFormation (note the region) and click on `RagKnowledgeBaseStack`
- Click Delete in the upper right to delete `RagKnowledgeBaseStack` (this deletes the S3 buckets and RAG files, temporarily making RAG chat unavailable)
- Change the parameters or chunking strategy
- After the `RagKnowledgeBaseStack` deletion is complete, redeploy with `npm run cdk:deploy`

Deleting `RagKnowledgeBaseStack` also deletes the S3 bucket for RAG chat and the RAG files stored in it. If you have uploaded RAG files to the S3 bucket, back them up and upload them again after redeployment. Then follow the previously mentioned steps to sync the data source again.
#### How to check OpenSearch Service Index in the management console

By default, when you open the Indexes tab of OpenSearch Service in the management console, you will see the error message `User does not have permissions for the requested resource`. This is because the data access policy does not allow the IAM user logged into the management console. Follow these steps to manually add the necessary permissions:
- Open OpenSearch Service (note the region) and click on `generative-ai-use-cases-jp`
- Click on the associated policy named `generative-ai-use-cases-jp` at the bottom of the page
- Click Edit in the upper right
- Click Add principals in the Select principals section in the middle of the page, and add the IAM user/role you are logged into the management console with
- Save

After saving, wait a moment and try accessing again.
#### Metadata Filter Configuration

Filter settings can be configured in `packages/common/src/custom/rag-knowledge-base.ts`. Customize them as needed.
- `dynamicFilters`: Dynamically creates and applies filters on the application side (for example, creating filters based on user attributes such as department). Currently only Claude 3.5 Sonnet is supported. (Throttling may occur due to quotas.) You can also map Cognito groups or SAML IdP groups to attributes (see SAML Integration with Microsoft Entra ID for details).
- `implicitFilters`: If specified, the LLM creates and applies filters for the specified metadata based on the user's question (for example, filtering by a year mentioned in the question to search only that year's data). If the array is empty, no filters are applied.
- `hiddenStaticExplicitFilters`: Applies filters at the application level (for example, excluding data classified as secret).
- `userDefinedExplicitFilters`: Defines filters displayed in the application UI.
### Enabling Agent Chat Use Case
In the Agent Chat use case, you can:
- Use Code Interpreter for data visualization, code execution, and data analysis
- Execute actions using Agents for Amazon Bedrock
- Reference vector databases from Knowledge Bases for Amazon Bedrock
Agents are created in the region specified by `modelRegion`. The `agentEnabled: true` option mentioned below creates the Code Interpreter agent and search agent; it is not required when adding manually created agents.
#### Deploying Code Interpreter Agent

You can perform data visualization, code execution, data analysis, and more using Code Interpreter. The Code Interpreter agent is deployed when you enable Agent: set `agentEnabled` to `true` (the default is `false`).
Edit `parameter.ts`:

```typescript
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
  dev: {
    agentEnabled: true,
  },
};
```
Edit `cdk.json`:

```json
// cdk.json
{
  "context": {
    "agentEnabled": true
  }
}
```
#### Deploying Search Agent

Creates an agent that connects to an API to reference the latest information in its responses. You can customize the agent to add other actions, or create multiple agents and switch between them. The default search agent uses the Brave Search API's Data for AI plan because of its large free tier, request limits, and cost; you can customize it to use other APIs. Note that obtaining an API key requires credit card registration even for the free plan.
> [!NOTE]
> When the Agent Chat use case is enabled, data is sent to an external API (by default, the Brave Search API) only within the Agent Chat use case. Other use cases continue to run entirely within AWS. Please check your internal policies and the API's terms of service before enabling it.
Set `agentEnabled` and `searchAgentEnabled` to `true` (both default to `false`), and specify the search engine API key in `searchApiKey`.
Edit `parameter.ts`:

```typescript
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
  dev: {
    agentEnabled: true,
    searchAgentEnabled: true,
    searchApiKey: '<Search Engine API Key>',
  },
};
```
Edit `cdk.json`:

```json
// cdk.json
{
  "context": {
    "agentEnabled": true,
    "searchAgentEnabled": true,
    "searchApiKey": "<Search Engine API Key>"
  }
}
```
After making changes, redeploy with `npm run cdk:deploy` to apply them. This deploys the default search engine agent.
> [!NOTE]
> If you want to disable the search agent after enabling it, set `searchAgentEnabled: false` and redeploy. This disables the search agent, but the `WebSearchAgentStack` itself will remain. To remove it completely, open the management console and delete the `WebSearchAgentStack` stack from CloudFormation in the `modelRegion` region.
#### Adding Manually Created Agents

If you want to register manually created agents other than the default agents, add them to `agents`. Note that agents should be created in the `modelRegion` region.
> [!NOTE]
> `agentEnabled: true` is an option that creates the Code Interpreter agent and search agent, so it is not required when adding manually created agents.
Edit `parameter.ts`:

```typescript
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
  dev: {
    agents: [
      {
        displayName: 'MyCustomAgent',
        agentId: 'XXXXXXXXX',
        aliasId: 'YYYYYYYY',
      },
    ],
  },
};
```
Edit `cdk.json`:

```json
// cdk.json
{
  "context": {
    "agents": [
      {
        "displayName": "MyCustomAgent",
        "agentId": "XXXXXXXXX",
        "aliasId": "YYYYYYYY"
      }
    ]
  }
}
```
You can also modify `packages/cdk/lib/construct/agent.ts` to define new agents. If you use agents defined in CDK, set `agentEnabled: true`.
#### Deploying Knowledge Bases for Amazon Bedrock Agent

You can also manually create and register agents that integrate with Knowledge Bases for Amazon Bedrock.

First, create a knowledge base from the Knowledge Bases AWS console, referring to the Knowledge Bases for Amazon Bedrock documentation. Create it in the same region as `modelRegion`.

Next, manually create an agent from the Agents AWS console. Keep the settings mostly default and enter an agent prompt based on the example below. Proceed without setting any action groups, register the knowledge base created in the previous step, and enter a knowledge base prompt based on the example below.
Agent prompt example: You are an assistant that responds to instructions. Search for information according to instructions and respond appropriately based on the content. Do not answer about things not mentioned in the information. You can search multiple times.
Knowledge Base prompt example: Search by keywords and get information. You can use it for tasks such as research, asking about X, or summarizing. Guess search keywords from the conversation. Search results may include low-relevance content, so only refer to highly relevant content when answering. You can execute multiple times.
Create an alias for the created agent, copy the `agentId` and `aliasId`, and add them in the following format. Set `displayName` to the name you want displayed in the UI. Also set `agentEnabled` to `true`.
Edit `parameter.ts`:

```typescript
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
  dev: {
    agentEnabled: true,
    agents: [
      {
        displayName: 'Knowledge Base',
        agentId: 'XXXXXXXXX',
        aliasId: 'YYYYYYYY',
      },
    ],
  },
};
```
Edit `cdk.json`:

```json
// cdk.json
{
  "context": {
    "agentEnabled": true,
    "agents": [
      {
        "displayName": "Knowledge Base",
        "agentId": "XXXXXXXXX",
        "aliasId": "YYYYYYYY"
      }
    ]
  }
}
```
#### Displaying Agents Inline

By default, agents are selected from within the Agent Chat use case. When the inline display option is enabled, the Agent Chat use case is no longer displayed, and each available agent is displayed like any other use case. Set `inlineAgents` to `true` when you have valid agents.
Edit `parameter.ts`:

```typescript
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
  dev: {
    inlineAgents: true,
  },
};
```
Edit `cdk.json`:

```json
// cdk.json
{
  "context": {
    "inlineAgents": true
  }
}
```
### Enabling Flow Chat Use Case

In the Flow Chat use case, you can call Flows you have created. Add or edit the `flows` array.

Manually create Flows from the Amazon Bedrock Flows AWS console. Then create an alias and add the created Flow's `flowId`, `aliasId`, and `flowName`. In `description`, write an explanation that prompts the user for input; it is displayed in the Flow Chat text box. Here's an example:
Edit `parameter.ts`:

```typescript
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
  dev: {
    flows: [
      {
        flowId: 'XXXXXXXXXX',
        aliasId: 'YYYYYYYYYY',
        flowName: 'WhatIsItFlow',
        description:
          'This flow searches the web for any keyword and returns an explanation. Please enter text',
      },
    ],
  },
};
```
Edit `cdk.json`:

```json
// cdk.json
{
  "context": {
    "flows": [
      {
        "flowId": "XXXXXXXXXX",
        "aliasId": "YYYYYYYYYY",
        "flowName": "WhatIsItFlow",
        "description": "This flow searches the web for any keyword and returns an explanation. Please enter text"
      },
      {
        "flowId": "ZZZZZZZZZZ",
        "aliasId": "OOOOOOOOOO",
        "flowName": "RecipeFlow",
        "description": "Creates a recipe based on the given JSON.\nPlease enter like {\"dish\": \"curry rice\", \"people\": 3}."
      },
      {
        "flowId": "PPPPPPPPPP",
        "aliasId": "QQQQQQQQQQQ",
        "flowName": "TravelPlanFlow",
        "description": "Creates a travel plan based on the given array.\nPlease enter like [{\"place\": \"Tokyo\", \"day\": 3}, {\"place\": \"Osaka\", \"day\": 2}]."
      }
    ]
  }
}
```
### Enabling Image Generation Use Case

This use case is enabled when you define one or more models in `imageGenerationModelIds`. For `imageGenerationModelIds`, refer to Changing Amazon Bedrock Models. For default values, refer to `packages/cdk/lib/stack-input.ts`.
### Enabling Video Generation Use Case

This use case is enabled when you define one or more models in `videoGenerationModelIds`. For `videoGenerationModelIds`, refer to Changing Amazon Bedrock Models. For default values, refer to `packages/cdk/lib/stack-input.ts`.
### Enabling Video Analysis Use Case

In the video analysis use case, you input video image frames together with text and have an LLM analyze the image content. There is no direct option that enables the video analysis use case, but a multimodal model must be enabled in the parameters. As of 2025/03, the multimodal models are:
```
"anthropic.claude-3-5-sonnet-20241022-v2:0",
"anthropic.claude-3-5-sonnet-20240620-v1:0",
"anthropic.claude-3-opus-20240229-v1:0",
"anthropic.claude-3-sonnet-20240229-v1:0",
"anthropic.claude-3-haiku-20240307-v1:0",
"us.anthropic.claude-3-5-sonnet-20240620-v1:0",
"us.anthropic.claude-3-opus-20240229-v1:0",
"us.anthropic.claude-3-sonnet-20240229-v1:0",
"us.anthropic.claude-3-haiku-20240307-v1:0",
"eu.anthropic.claude-3-5-sonnet-20240620-v1:0",
"eu.anthropic.claude-3-sonnet-20240229-v1:0",
"eu.anthropic.claude-3-haiku-20240307-v1:0",
"apac.anthropic.claude-3-haiku-20240307-v1:0",
"apac.anthropic.claude-3-sonnet-20240229-v1:0",
"apac.anthropic.claude-3-5-sonnet-20240620-v1:0",
"apac.anthropic.claude-3-5-sonnet-20241022-v2:0",
"us.meta.llama3-2-90b-instruct-v1:0",
"us.meta.llama3-2-11b-instruct-v1:0",
"us.mistral.pixtral-large-2502-v1:0",
"eu.mistral.pixtral-large-2502-v1:0",
"amazon.nova-pro-v1:0",
"amazon.nova-lite-v1:0",
"us.amazon.nova-pro-v1:0",
"us.amazon.nova-lite-v1:0",
"eu.amazon.nova-pro-v1:0",
"eu.amazon.nova-lite-v1:0",
"apac.amazon.nova-pro-v1:0",
"apac.amazon.nova-lite-v1:0"
```
At least one of these must be defined in `modelIds`. For details, refer to Changing Amazon Bedrock Models.
Edit `parameter.ts`:

```typescript
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
  dev: {
    modelIds: ['anthropic.claude-3-sonnet-20240229-v1:0'],
  },
};
```
Edit `cdk.json`:

```json
// cdk.json
{
  "context": {
    "modelIds": ["anthropic.claude-3-sonnet-20240229-v1:0"]
  }
}
```
### Enabling Prompt Optimization Tool

The prompt optimization tool converts an input prompt into a form optimized for the specified model. There is no direct option that enables the prompt optimization tool, but the parameter settings must meet the following two conditions:

- `modelRegion`: A region where Amazon Bedrock Prompt optimization is supported
- `modelIds`: At least one model supported by Amazon Bedrock Prompt optimization is specified
For Prompt optimization support status, please refer to this link.
### Hiding Specific Use Cases

You can hide use cases with the following options. If an option is not specified or is set to `false`, the use case is displayed.
Edit `parameter.ts`:

```typescript
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
  dev: {
    hiddenUseCases: {
      generate: true, // Hide text generation
      summarize: true, // Hide summarization
      writer: true, // Hide writing
      translate: true, // Hide translation
      webContent: true, // Hide Web content extraction
      image: true, // Hide image generation
      video: true, // Hide video generation
      videoAnalyzer: true, // Hide video analysis
      diagram: true, // Hide diagram generation
    },
  },
};
```
Edit `cdk.json`:

```json
// cdk.json
{
  "context": {
    "hiddenUseCases": {
      "generate": true,
      "summarize": true,
      "writer": true,
      "translate": true,
      "webContent": true,
      "image": true,
      "video": true,
      "videoAnalyzer": true,
      "diagram": true
    }
  }
}
```
## Use Case Builder Configuration

The Use Case Builder is enabled by default and can be accessed from the Builder Mode option displayed on screen after deployment. To disable the Use Case Builder, set the `useCaseBuilderEnabled` parameter to `false` (the default is `true`).
Edit `parameter.ts`:

```typescript
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
  dev: {
    useCaseBuilderEnabled: false,
  },
};
```
Edit `cdk.json`:

```json
// cdk.json
{
  "context": {
    "useCaseBuilderEnabled": false
  }
}
```
## Changing Amazon Bedrock Models

Specify the model region and models in `parameter.ts` or `cdk.json` using `modelRegion`, `modelIds`, `imageGenerationModelIds`, and `videoGenerationModelIds`. For `modelIds`, `imageGenerationModelIds`, and `videoGenerationModelIds`, specify a list of the models you want to use from those available in the specified region. The AWS documentation provides a list of models and model support by region.

The solution also supports cross-region inference models. Cross-region inference model IDs have the form `{us|eu|apac}.{model-provider}.{model-name}`, and the `{us|eu|apac}` prefix must match the region specified in `modelRegion`.
For example, if `modelRegion` is `us-east-1`, `us.anthropic.claude-3-5-sonnet-20240620-v1:0` is valid, but `eu.anthropic.claude-3-5-sonnet-20240620-v1:0` is not.
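The prefix rule can be sketched as follows. This is an illustration only, not GenU's actual validation code; `regionPrefix` and `isValidForRegion` are hypothetical helper names:

```typescript
// Sketch of the cross-region inference prefix rule described above.
// Maps a region code to the inference-profile prefix it belongs to.
function regionPrefix(region: string): string | null {
  if (region.startsWith('us-')) return 'us';
  if (region.startsWith('eu-')) return 'eu';
  // ap-* regions (e.g. ap-northeast-1) use the "apac" prefix
  if (region.startsWith('ap-')) return 'apac';
  return null;
}

// A cross-region model ID is valid for modelRegion when its prefix
// matches the region's prefix; plain model IDs have no prefix to check.
function isValidForRegion(modelId: string, modelRegion: string): boolean {
  const m = modelId.match(/^(us|eu|apac)\./);
  if (!m) return true; // not a cross-region inference model
  return m[1] === regionPrefix(modelRegion);
}
```

With `modelRegion` set to `us-east-1`, `isValidForRegion` accepts `us.`-prefixed and unprefixed model IDs and rejects `eu.`/`apac.` ones, matching the example above.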
This solution supports the following text generation models:
```
"anthropic.claude-3-5-sonnet-20241022-v2:0",
"anthropic.claude-3-5-haiku-20241022-v1:0",
"anthropic.claude-3-5-sonnet-20240620-v1:0",
"anthropic.claude-3-opus-20240229-v1:0",
"anthropic.claude-3-sonnet-20240229-v1:0",
"anthropic.claude-3-haiku-20240307-v1:0",
"us.anthropic.claude-3-7-sonnet-20250219-v1:0",
"us.anthropic.claude-3-5-sonnet-20241022-v2:0",
"us.anthropic.claude-3-5-haiku-20241022-v1:0",
"us.anthropic.claude-3-5-sonnet-20240620-v1:0",
"us.anthropic.claude-3-opus-20240229-v1:0",
"us.anthropic.claude-3-sonnet-20240229-v1:0",
"us.anthropic.claude-3-haiku-20240307-v1:0",
"eu.anthropic.claude-3-5-sonnet-20240620-v1:0",
"eu.anthropic.claude-3-sonnet-20240229-v1:0",
"eu.anthropic.claude-3-haiku-20240307-v1:0",
"apac.anthropic.claude-3-haiku-20240307-v1:0",
"apac.anthropic.claude-3-sonnet-20240229-v1:0",
"apac.anthropic.claude-3-5-sonnet-20240620-v1:0",
"apac.anthropic.claude-3-5-sonnet-20241022-v2:0",
"us.deepseek.r1-v1:0",
"amazon.titan-text-premier-v1:0",
"us.meta.llama3-3-70b-instruct-v1:0",
"us.meta.llama3-2-90b-instruct-v1:0",
"us.meta.llama3-2-11b-instruct-v1:0",
"us.meta.llama3-2-3b-instruct-v1:0",
"us.meta.llama3-2-1b-instruct-v1:0",
"meta.llama3-1-405b-instruct-v1:0",
"meta.llama3-1-70b-instruct-v1:0",
"meta.llama3-1-8b-instruct-v1:0",
"meta.llama3-70b-instruct-v1:0",
"meta.llama3-8b-instruct-v1:0",
"cohere.command-r-plus-v1:0",
"cohere.command-r-v1:0",
"mistral.mistral-large-2407-v1:0",
"mistral.mistral-large-2402-v1:0",
"mistral.mistral-small-2402-v1:0",
"us.mistral.pixtral-large-2502-v1:0",
"eu.mistral.pixtral-large-2502-v1:0",
"anthropic.claude-v2:1",
"anthropic.claude-v2",
"anthropic.claude-instant-v1",
"mistral.mixtral-8x7b-instruct-v0:1",
"mistral.mistral-7b-instruct-v0:2",
"amazon.nova-pro-v1:0",
"amazon.nova-lite-v1:0",
"amazon.nova-micro-v1:0",
"us.amazon.nova-pro-v1:0",
"us.amazon.nova-lite-v1:0",
"us.amazon.nova-micro-v1:0",
"eu.amazon.nova-pro-v1:0",
"eu.amazon.nova-lite-v1:0",
"eu.amazon.nova-micro-v1:0",
"apac.amazon.nova-pro-v1:0",
"apac.amazon.nova-lite-v1:0",
"apac.amazon.nova-micro-v1:0"
```
This solution supports the following image generation models:
```
"amazon.nova-canvas-v1:0",
"amazon.titan-image-generator-v2:0",
"amazon.titan-image-generator-v1",
"stability.sd3-large-v1:0",
"stability.sd3-5-large-v1:0",
"stability.stable-image-core-v1:0",
"stability.stable-image-core-v1:1",
"stability.stable-image-ultra-v1:0",
"stability.stable-image-ultra-v1:1",
"stability.stable-diffusion-xl-v1"
```
This solution supports the following video generation models:
```
"amazon.nova-reel-v1:0",
"luma.ray-v2:0"
```
Please verify that the models you specify are enabled in the specified region.
### Using Models from Multiple Regions Simultaneously

By default, GenU uses models from the `modelRegion` region. If you want to use the latest models that are only available in certain regions, you can specify `{modelId: '<model name>', region: '<region code>'}` in `modelIds`, `imageGenerationModelIds`, or `videoGenerationModelIds` to call that model in the specified region.
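Each entry in these arrays is therefore either a bare model ID string or an object with an explicit region. A minimal sketch of how such mixed entries can be normalized (an illustration only; GenU's actual types live in `packages/cdk/lib/stack-input.ts` and may differ, and `normalizeModels` is a hypothetical helper):

```typescript
// Illustrative only: normalize mixed model entries to {modelId, region}.
type ModelEntry = string | { modelId: string; region: string };

// Bare strings fall back to the stack's modelRegion; objects keep
// their explicitly specified region.
function normalizeModels(
  entries: ModelEntry[],
  defaultRegion: string
): { modelId: string; region: string }[] {
  return entries.map((e) =>
    typeof e === 'string' ? { modelId: e, region: defaultRegion } : e
  );
}
```

For example, with `defaultRegion` set to `ap-northeast-1`, a bare `'amazon.nova-pro-v1:0'` entry resolves to Tokyo, while `{ modelId: 'luma.ray-v2:0', region: 'us-west-2' }` is called in Oregon.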
> [!NOTE]
> When using both the monitoring dashboard and models from multiple regions, the default dashboard settings will not display prompt logs for models outside the primary region (`modelRegion`). To view prompt logs from all regions in a single dashboard, the following additional configuration is required:
>
> - Manually enable "Model invocation logging" in the Amazon Bedrock settings for each region
> - Add widgets to the CloudWatch dashboard to aggregate logs from each region
#### Example: Using Tokyo region as primary while also using the latest models from Northern Virginia and Oregon regions
Edit `parameter.ts`:

```typescript
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
  dev: {
    modelRegion: 'ap-northeast-1',
    modelIds: [
      {
        modelId: 'us.anthropic.claude-3-7-sonnet-20250219-v1:0',
        region: 'us-east-1',
      },
      'apac.anthropic.claude-3-5-sonnet-20241022-v2:0',
      'anthropic.claude-3-5-sonnet-20240620-v1:0',
      {
        modelId: 'us.anthropic.claude-3-5-haiku-20241022-v1:0',
        region: 'us-east-1',
      },
      'apac.amazon.nova-pro-v1:0',
      'apac.amazon.nova-lite-v1:0',
      'apac.amazon.nova-micro-v1:0',
      { modelId: 'us.deepseek.r1-v1:0', region: 'us-east-1' },
      { modelId: 'us.meta.llama3-3-70b-instruct-v1:0', region: 'us-east-1' },
      { modelId: 'us.meta.llama3-2-90b-instruct-v1:0', region: 'us-east-1' },
      { modelId: 'us.mistral.pixtral-large-2502-v1:0', region: 'us-east-1' },
    ],
    imageGenerationModelIds: [
      'amazon.nova-canvas-v1:0',
      { modelId: 'stability.sd3-5-large-v1:0', region: 'us-west-2' },
      { modelId: 'stability.stable-image-core-v1:1', region: 'us-west-2' },
      { modelId: 'stability.stable-image-ultra-v1:1', region: 'us-west-2' },
    ],
    videoGenerationModelIds: [
      'amazon.nova-reel-v1:0',
      { modelId: 'luma.ray-v2:0', region: 'us-west-2' },
    ],
  },
};
```
Edit `cdk.json`:

```json
// cdk.json
{
  "context": {
    "modelRegion": "ap-northeast-1",
    "modelIds": [
      {
        "modelId": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
        "region": "us-east-1"
      },
      "apac.anthropic.claude-3-5-sonnet-20241022-v2:0",
      "anthropic.claude-3-5-sonnet-20240620-v1:0",
      {
        "modelId": "us.anthropic.claude-3-5-haiku-20241022-v1:0",
        "region": "us-east-1"
      },
      "apac.amazon.nova-pro-v1:0",
      "apac.amazon.nova-lite-v1:0",
      "apac.amazon.nova-micro-v1:0",
      {
        "modelId": "us.deepseek.r1-v1:0",
        "region": "us-east-1"
      },
      {
        "modelId": "us.meta.llama3-3-70b-instruct-v1:0",
        "region": "us-east-1"
      },
      {
        "modelId": "us.meta.llama3-2-90b-instruct-v1:0",
        "region": "us-east-1"
      },
      {
        "modelId": "us.mistral.pixtral-large-2502-v1:0",
        "region": "us-east-1"
      }
    ],
    "imageGenerationModelIds": [
      "amazon.nova-canvas-v1:0",
      {
        "modelId": "stability.sd3-5-large-v1:0",
        "region": "us-west-2"
      },
      {
        "modelId": "stability.stable-image-core-v1:1",
        "region": "us-west-2"
      },
      {
        "modelId": "stability.stable-image-ultra-v1:1",
        "region": "us-west-2"
      }
    ],
    "videoGenerationModelIds": [
      "amazon.nova-reel-v1:0",
      {
        "modelId": "luma.ray-v2:0",
        "region": "us-west-2"
      }
    ]
  }
}
```
### Example: Using Amazon Bedrock Models in us-east-1 (Virginia)
Edit `parameter.ts`:

```typescript
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
  dev: {
    modelRegion: 'us-east-1',
    modelIds: [
      'anthropic.claude-3-5-sonnet-20240620-v1:0',
      'anthropic.claude-3-sonnet-20240229-v1:0',
      'anthropic.claude-3-haiku-20240307-v1:0',
      'amazon.nova-pro-v1:0',
      'amazon.nova-lite-v1:0',
      'amazon.nova-micro-v1:0',
      'amazon.titan-text-premier-v1:0',
      'meta.llama3-70b-instruct-v1:0',
      'meta.llama3-8b-instruct-v1:0',
      'cohere.command-r-plus-v1:0',
      'cohere.command-r-v1:0',
      'us.mistral.pixtral-large-2502-v1:0',
      'mistral.mistral-large-2402-v1:0',
    ],
    imageGenerationModelIds: [
      'amazon.nova-canvas-v1:0',
      'amazon.titan-image-generator-v2:0',
      'amazon.titan-image-generator-v1',
      'stability.stable-diffusion-xl-v1',
    ],
    videoGenerationModelIds: ['amazon.nova-reel-v1:0'],
  },
};
```
Edit `cdk.json`:

```json
// cdk.json
{
  "context": {
    "modelRegion": "us-east-1",
    "modelIds": [
      "anthropic.claude-3-5-sonnet-20240620-v1:0",
      "anthropic.claude-3-sonnet-20240229-v1:0",
      "anthropic.claude-3-haiku-20240307-v1:0",
      "amazon.nova-pro-v1:0",
      "amazon.nova-lite-v1:0",
      "amazon.nova-micro-v1:0",
      "amazon.titan-text-premier-v1:0",
      "meta.llama3-70b-instruct-v1:0",
      "meta.llama3-8b-instruct-v1:0",
      "cohere.command-r-plus-v1:0",
      "cohere.command-r-v1:0",
      "mistral.mistral-large-2402-v1:0"
    ],
    "imageGenerationModelIds": [
      "amazon.nova-canvas-v1:0",
      "amazon.titan-image-generator-v2:0",
      "amazon.titan-image-generator-v1",
      "stability.stable-diffusion-xl-v1"
    ],
    "videoGenerationModelIds": ["amazon.nova-reel-v1:0"]
  }
}
```
### Example: Using Amazon Bedrock Models in us-west-2 (Oregon)
Edit `parameter.ts`:

```typescript
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
  dev: {
    modelRegion: 'us-west-2',
    modelIds: [
      'anthropic.claude-3-5-sonnet-20241022-v2:0',
      'anthropic.claude-3-5-haiku-20241022-v1:0',
      'anthropic.claude-3-5-sonnet-20240620-v1:0',
      'anthropic.claude-3-opus-20240229-v1:0',
      'anthropic.claude-3-sonnet-20240229-v1:0',
      'anthropic.claude-3-haiku-20240307-v1:0',
      'meta.llama3-1-70b-instruct-v1:0',
      'meta.llama3-1-8b-instruct-v1:0',
      'cohere.command-r-plus-v1:0',
      'cohere.command-r-v1:0',
      'mistral.mistral-large-2407-v1:0',
    ],
    imageGenerationModelIds: [
      'amazon.titan-image-generator-v2:0',
      'amazon.titan-image-generator-v1',
      'stability.sd3-large-v1:0',
      'stability.sd3-5-large-v1:0',
      'stability.stable-image-core-v1:0',
      'stability.stable-image-core-v1:1',
      'stability.stable-image-ultra-v1:0',
      'stability.stable-image-ultra-v1:1',
      'stability.stable-diffusion-xl-v1',
    ],
  },
};
```
// cdk.json
{
"context": {
"modelRegion": "us-west-2",
"modelIds": [
"anthropic.claude-3-5-sonnet-20241022-v2:0",
"anthropic.claude-3-5-haiku-20241022-v1:0",
"anthropic.claude-3-5-sonnet-20240620-v1:0",
"anthropic.claude-3-opus-20240229-v1:0",
"anthropic.claude-3-sonnet-20240229-v1:0",
"anthropic.claude-3-haiku-20240307-v1:0",
"meta.llama3-1-70b-instruct-v1:0",
"meta.llama3-1-8b-instruct-v1:0",
"cohere.command-r-plus-v1:0",
"cohere.command-r-v1:0",
"mistral.mistral-large-2407-v1:0"
],
"imageGenerationModelIds": [
"amazon.titan-image-generator-v2:0",
"amazon.titan-image-generator-v1",
"stability.sd3-large-v1:0",
"stability.sd3-5-large-v1:0",
"stability.stable-image-core-v1:0",
"stability.stable-image-core-v1:1",
"stability.stable-image-ultra-v1:0",
"stability.stable-image-ultra-v1:1",
"stability.stable-diffusion-xl-v1"
]
}
}
Example: Using Amazon Bedrock us cross-region inference models from us-east-1 (Virginia) or us-west-2 (Oregon)¶
Edit parameter.ts
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
dev: {
modelRegion: 'us-west-2',
modelIds: [
'us.anthropic.claude-3-7-sonnet-20250219-v1:0',
'us.anthropic.claude-3-5-sonnet-20241022-v2:0',
'us.anthropic.claude-3-5-haiku-20241022-v1:0',
'us.anthropic.claude-3-5-sonnet-20240620-v1:0',
'us.anthropic.claude-3-opus-20240229-v1:0',
'us.anthropic.claude-3-sonnet-20240229-v1:0',
'us.anthropic.claude-3-haiku-20240307-v1:0',
'us.deepseek.r1-v1:0',
'us.meta.llama3-3-70b-instruct-v1:0',
'us.meta.llama3-2-90b-instruct-v1:0',
'us.meta.llama3-2-11b-instruct-v1:0',
'us.meta.llama3-2-3b-instruct-v1:0',
'us.meta.llama3-2-1b-instruct-v1:0',
'us.amazon.nova-pro-v1:0',
'us.amazon.nova-lite-v1:0',
'us.amazon.nova-micro-v1:0',
'cohere.command-r-plus-v1:0',
'cohere.command-r-v1:0',
'mistral.mistral-large-2407-v1:0',
],
imageGenerationModelIds: [
'amazon.titan-image-generator-v2:0',
'amazon.titan-image-generator-v1',
'stability.sd3-large-v1:0',
'stability.sd3-5-large-v1:0',
'stability.stable-image-core-v1:0',
'stability.stable-image-core-v1:1',
'stability.stable-image-ultra-v1:0',
'stability.stable-image-ultra-v1:1',
'stability.stable-diffusion-xl-v1',
],
},
};
// cdk.json
{
"context": {
"modelRegion": "us-west-2",
"modelIds": [
"us.anthropic.claude-3-7-sonnet-20250219-v1:0",
"us.anthropic.claude-3-5-sonnet-20241022-v2:0",
"us.anthropic.claude-3-5-haiku-20241022-v1:0",
"us.anthropic.claude-3-5-sonnet-20240620-v1:0",
"us.anthropic.claude-3-opus-20240229-v1:0",
"us.anthropic.claude-3-sonnet-20240229-v1:0",
"us.anthropic.claude-3-haiku-20240307-v1:0",
"us.deepseek.r1-v1:0",
"us.meta.llama3-3-70b-instruct-v1:0",
"us.meta.llama3-2-90b-instruct-v1:0",
"us.meta.llama3-2-11b-instruct-v1:0",
"us.meta.llama3-2-3b-instruct-v1:0",
"us.meta.llama3-2-1b-instruct-v1:0",
"us.amazon.nova-pro-v1:0",
"us.amazon.nova-lite-v1:0",
"us.amazon.nova-micro-v1:0",
"cohere.command-r-plus-v1:0",
"cohere.command-r-v1:0",
"mistral.mistral-large-2407-v1:0"
],
"imageGenerationModelIds": [
"amazon.titan-image-generator-v2:0",
"amazon.titan-image-generator-v1",
"stability.sd3-large-v1:0",
"stability.sd3-5-large-v1:0",
"stability.stable-image-core-v1:0",
"stability.stable-image-core-v1:1",
"stability.stable-image-ultra-v1:0",
"stability.stable-image-ultra-v1:1",
"stability.stable-diffusion-xl-v1"
]
}
}
Example: Using Amazon Bedrock Models in ap-northeast-1 (Tokyo)¶
Edit parameter.ts
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
dev: {
modelRegion: 'ap-northeast-1',
modelIds: [
'anthropic.claude-3-5-sonnet-20240620-v1:0',
'anthropic.claude-3-haiku-20240307-v1:0',
],
imageGenerationModelIds: ['amazon.nova-canvas-v1:0'],
videoGenerationModelIds: ['amazon.nova-reel-v1:0'],
},
};
// cdk.json
{
"context": {
"modelRegion": "ap-northeast-1",
"modelIds": [
"anthropic.claude-3-5-sonnet-20240620-v1:0",
"anthropic.claude-3-haiku-20240307-v1:0"
],
"imageGenerationModelIds": ["amazon.nova-canvas-v1:0"],
"videoGenerationModelIds": ["amazon.nova-reel-v1:0"]
}
}
Using Custom Models with Amazon SageMaker¶
You can use large language models deployed to Amazon SageMaker endpoints. This solution supports SageMaker endpoints using Hugging Face's Text Generation Inference (TGI) LLM inference containers. Ideally, the models should support chat-formatted prompts where user and assistant take turns speaking. Currently, image generation use cases are not supported with Amazon SageMaker endpoints.
There are two ways to deploy models using TGI containers to SageMaker endpoints:
Deploy pre-packaged models from SageMaker JumpStart
SageMaker JumpStart offers one-click deployment of packaged open-source large language models. You can deploy these models by opening them in the JumpStart screen in SageMaker Studio and clicking the "Deploy" button. Examples of Japanese models provided include:
- SageMaker JumpStart Elyza Japanese Llama 2 7B Instruct
- SageMaker JumpStart Elyza Japanese Llama 2 13B Instruct
- SageMaker JumpStart CyberAgentLM2 7B Chat
- SageMaker JumpStart Stable LM Instruct Alpha 7B v2
- SageMaker JumpStart Rinna 3.6B
- SageMaker JumpStart Bilingual Rinna 4B
Deploy with a few lines of code using SageMaker SDK
Thanks to AWS's partnership with Hugging Face, you can deploy models by simply specifying the model ID from Hugging Face using the SageMaker SDK.
From a model's Hugging Face page, select Deploy > Amazon SageMaker to see the code for deploying the model. Copy and run this code to deploy the model. (You may need to adjust parameters such as the instance size or SM_NUM_GPUS depending on the model. If deployment fails, check the logs in CloudWatch Logs.)
Note
There's one modification needed when deploying: The endpoint name will be displayed in the GenU application and is used to determine the model's prompt template (explained in the next section). Therefore, you need to specify a distinguishable endpoint name.
Add endpoint_name="<distinguishable endpoint name>" as an argument to huggingface_model.deploy() when deploying.
Configuring GenU to Call Deployed Models¶
To use deployed SageMaker endpoints with this solution, specify them as follows:
endpointNames is a list of SageMaker endpoint names. (Example: ["elyza-llama-2", "rinna"])
To have the backend pick the right prompt template when constructing prompts, include the prompt type in the endpoint name (example: llama-2, rinna). See packages/cdk/lambda/utils/models.ts for details, and add prompt templates as needed.
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
dev: {
modelRegion: 'us-east-1',
endpointNames: [
'jumpstart-dft-hf-llm-rinna-3-6b-instruction-ppo-bf16',
'jumpstart-dft-bilingual-rinna-4b-instruction-ppo-bf16',
],
},
};
// cdk.json
{
"context": {
"modelRegion": "<SageMaker Endpoint Region>",
"endpointNames": ["<SageMaker Endpoint Name>"]
}
}
Example: Using Rinna 3.6B and Bilingual Rinna 4B
// cdk.json
{
"context": {
"modelRegion": "us-west-2",
"endpointNames": [
"jumpstart-dft-hf-llm-rinna-3-6b-instruction-ppo-bf16",
"jumpstart-dft-bilingual-rinna-4b-instruction-ppo-bf16"
]
}
}
Example: Using ELYZA-japanese-Llama-2-7b-instruct
// cdk.json
{
"context": {
"modelRegion": "us-west-2",
"endpointNames": ["elyza-japanese-llama-2-7b-inference"]
}
}
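To illustrate why the prompt type must appear in the endpoint name, the backend's template selection can be sketched like this. This is a hypothetical simplification: detectPromptType and the key list are illustrative only, and the actual mapping lives in packages/cdk/lambda/utils/models.ts.

```typescript
// Hypothetical sketch: pick a prompt template by substring-matching
// the SageMaker endpoint name. Not the actual GenU implementation.
type PromptType = 'llama-2' | 'rinna' | 'bilingual-rinna';

// Longer keys first so 'bilingual-rinna' wins over plain 'rinna'.
const PROMPT_TYPE_KEYS: PromptType[] = ['bilingual-rinna', 'llama-2', 'rinna'];

function detectPromptType(endpointName: string): PromptType | undefined {
  return PROMPT_TYPE_KEYS.find((key) => endpointName.includes(key));
}

console.log(detectPromptType('elyza-llama-2-7b-inference')); // llama-2
console.log(
  detectPromptType('jumpstart-dft-bilingual-rinna-4b-instruction-ppo-bf16')
); // bilingual-rinna
```

If no key matches, the sketch returns undefined, which is why an endpoint name that contains no recognizable prompt type cannot be handled without adding a new template.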
Security-Related Settings¶
Disable Self-Signup¶
Set selfSignUpEnabled to false (default is true).
Edit parameter.ts
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
dev: {
selfSignUpEnabled: false,
},
};
// cdk.json
{
"context": {
"selfSignUpEnabled": false
}
}
Restrict Email Domains for Sign-up¶
Specify a list of allowed domains in allowedSignUpEmailDomains (default is null).
Specify values as a list of strings, and do not include "@" in each string. Users can sign up if their email domain matches any of the allowed domains. Specifying null means no restrictions, allowing all domains. Specifying [] prohibits all domains, preventing any email address from registering.
When configured, users with non-allowed domains will receive an error when trying to "Create Account" on the web signup screen, preventing them from signing up for GenU. Also, attempting to "Create User" from the Cognito service screen in the AWS Management Console will result in an error.
This does not affect users already created in Cognito. It only applies to new users attempting to sign up or be created.
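The semantics described above can be sketched as a small predicate. This is a hypothetical simplification of the assumed sign-up-time validation; isSignUpAllowed is illustrative only, not the actual implementation.

```typescript
// Hypothetical sketch of the domain check applied when a user signs up.
// null  -> no restriction (everyone allowed)
// []    -> all domains rejected
// list  -> only listed domains allowed
function isSignUpAllowed(
  email: string,
  allowedSignUpEmailDomains: string[] | null
): boolean {
  if (allowedSignUpEmailDomains === null) return true;
  const domain = email.split('@')[1];
  return allowedSignUpEmailDomains.includes(domain);
}

console.log(isSignUpAllowed('user@amazon.com', ['amazon.com'])); // true
console.log(isSignUpAllowed('user@example.com', ['amazon.com'])); // false
console.log(isSignUpAllowed('user@example.com', null)); // true
console.log(isSignUpAllowed('user@example.com', [])); // false
```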
Configuration Examples
- Example to allow sign-up only with email addresses on the amazon.com domain
Edit parameter.ts
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
dev: {
allowedSignUpEmailDomains: ['amazon.com'],
},
};
// cdk.json
{
"context": {
"allowedSignUpEmailDomains": ["amazon.com"] // Change from null to specify allowed domains to enable
}
}
- Example to allow sign-up with email addresses on either the amazon.com or amazon.jp domain
Edit parameter.ts
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
dev: {
allowedSignUpEmailDomains: ['amazon.com', 'amazon.jp'],
},
};
// cdk.json
{
"context": {
"allowedSignUpEmailDomains": ["amazon.com", "amazon.jp"] // Change from null to specify allowed domains to enable
}
}
Enable AWS WAF Restrictions¶
IP Address Restrictions¶
To restrict web app access by IP address, you can enable AWS WAF IP address restrictions. Specify allowed IPv4 CIDRs as an array in allowedIpV4AddressRanges and allowed IPv6 CIDRs as an array in allowedIpV6AddressRanges.
Edit parameter.ts
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
dev: {
allowedIpV4AddressRanges: ['192.168.0.0/24'],
allowedIpV6AddressRanges: ['2001:0db8::/32'],
},
};
// cdk.json
{
"context": {
"allowedIpV4AddressRanges": ["192.168.0.0/24"], // Change from null to specify allowed CIDR list to enable
"allowedIpV6AddressRanges": ["2001:0db8::/32"] // Change from null to specify allowed CIDR list to enable
}
}
Geographic Restrictions¶
To restrict web app access by country of origin, you can enable AWS WAF geographic restrictions. Specify allowed countries as an array of Country Codes in allowedCountryCodes.
For Country Codes, please refer to ISO 3166-2 from Wikipedia.
If "IP Address Restrictions" are also configured, only access from "source IP addresses included in the allowed IP addresses AND from allowed countries" will be permitted.
Edit parameter.ts
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
dev: {
allowedCountryCodes: ['JP'],
},
};
// cdk.json
{
"context": {
"allowedCountryCodes": ["JP"] // Change from null to specify allowed country list to enable
}
}
When you specify any of allowedIpV4AddressRanges, allowedIpV6AddressRanges, or allowedCountryCodes and run npm run cdk:deploy again, a WAF stack will be deployed in us-east-1 (AWS WAF V2 currently only supports us-east-1 when used with CloudFront). If you have never used CDK in us-east-1 before, run the following command to bootstrap before deployment:
npx -w packages/cdk cdk bootstrap --region us-east-1
SAML Authentication¶
You can integrate with SAML authentication features provided by IdPs such as Google Workspace or Microsoft Entra ID (formerly Azure Active Directory). Here are detailed integration procedures:
Edit parameter.ts
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
dev: {
samlAuthEnabled: true,
samlCognitoDomainName:
'your-preferred-name.auth.ap-northeast-1.amazoncognito.com',
samlCognitoFederatedIdentityProviderName: 'EntraID',
},
};
// cdk.json
{
"context": {
"samlAuthEnabled": true,
"samlCognitoDomainName": "your-preferred-name.auth.ap-northeast-1.amazoncognito.com",
"samlCognitoFederatedIdentityProviderName": "EntraID"
}
}
- samlAuthEnabled: Setting to true switches to a SAML-specific authentication screen. The conventional authentication using Cognito user pools will no longer be available.
- samlCognitoDomainName: Specify the Cognito Domain name to be set in Cognito's App integration.
- samlCognitoFederatedIdentityProviderName: Specify the Identity Provider name to be set in Cognito's Sign-in experience.
Guardrails¶
When using the Converse API (i.e., generative AI models that produce text output), guardrails can be applied. To configure this, change guardrailEnabled to true and redeploy.
Edit parameter.ts
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
dev: {
guardrailEnabled: true,
},
};
// cdk.json
{
"context": {
"guardrailEnabled": true
}
}
The default guardrail applies sensitive information filters that have proven effective in Japanese conversations. We've also confirmed that custom word filters and regular expressions for sensitive information filters work, so modify packages/cdk/lib/construct/guardrail.ts
as needed. For details, refer to Guardrails for Amazon Bedrock and CfnGuardrail.
Note
If you want to disable guardrails after enabling them, set guardrailEnabled: false and redeploy. This disables guardrails when calling generative AI, but the guardrail itself remains. You can remove it completely by opening the management console and deleting the GuardrailStack stack from CloudFormation in the modelRegion. Having the guardrail remain doesn't incur costs, but it's best to delete unused resources.
Cost-Related Settings¶
Set Schedule for Automatic Creation and Deletion of Kendra Index¶
Configure settings to automatically create and delete the Kendra index created by GenerativeAiUseCasesStack on a predetermined schedule. This helps reduce usage fees that accrue based on Kendra index uptime. After creating a Kendra index, it automatically synchronizes with the S3 data source created by default in this repository.
This feature is only effective when ragEnabled is true AND kendraIndexArn is null (i.e., it doesn't work with externally created Kendra indexes).
Configure as shown in the example below:
- Setting kendraIndexScheduleEnabled to true enables schedule settings; setting it to false disables scheduling from that deployment forward.
- Specify creation and deletion start times in Cron format using kendraIndexScheduleCreateCron and kendraIndexScheduleDeleteCron.
- For Cron format details, refer to this documentation. However, to comply with EventBridge specifications, specify times in UTC. Currently, only minute, hour, month, and weekDay can be specified. These items must be specified, and other items will be ignored even if specified.
- Setting to null means creation/deletion won't be executed. You can set just one to null (configure only one) or both to null (execute nothing).
The example below configures index creation to start at 8:00 AM JST (23:00 UTC the previous day) Monday through Friday and deletion to start at 8:00 PM JST (11:00 UTC) Monday through Friday.
Edit parameter.ts
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
dev: {
kendraIndexScheduleEnabled: true,
kendraIndexScheduleCreateCron: {
minute: '0',
hour: '23',
month: '*',
weekDay: 'SUN-THU',
},
kendraIndexScheduleDeleteCron: {
minute: '0',
hour: '11',
month: '*',
weekDay: 'MON-FRI',
},
},
};
// cdk.json
{
"context": {
"kendraIndexScheduleEnabled": true,
"kendraIndexScheduleCreateCron": {
"minute": "0",
"hour": "23",
"month": "*",
"weekDay": "SUN-THU"
},
"kendraIndexScheduleDeleteCron": {
"minute": "0",
"hour": "11",
"month": "*",
"weekDay": "MON-FRI"
}
}
}
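As a cross-check for schedules like the one above, the JST-to-UTC conversion can be sketched with a small helper. jstToUtcHour is a hypothetical illustration; EventBridge itself only accepts the UTC cron fields.

```typescript
// Hypothetical helper illustrating the JST -> UTC conversion needed for
// EventBridge cron fields (EventBridge schedules are specified in UTC).
// JST is UTC+9, so times before 9:00 JST fall on the previous UTC day,
// which shifts the weekDay field (e.g. MON-FRI becomes SUN-THU).
function jstToUtcHour(jstHour: number): { hour: number; previousDay: boolean } {
  const hour = (jstHour - 9 + 24) % 24;
  return { hour, previousDay: jstHour < 9 };
}

// 8:00 AM JST Mon-Fri => hour 23 UTC, Sun-Thu (previous day)
console.log(jstToUtcHour(8)); // { hour: 23, previousDay: true }
// 8:00 PM JST Mon-Fri => hour 11 UTC, Mon-Fri (same day)
console.log(jstToUtcHour(20)); // { hour: 11, previousDay: false }
```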
Even when the Kendra index is deleted, the RAG feature remains on. RAG-related menus continue to be displayed in the web application (GenU). When executing RAG chat, an error occurs because the index doesn't exist, and an error message indicating "Please check the index creation/deletion schedule" will be displayed.
EventBridge rules are used for scheduling, and Step Functions for process control. You can stop scheduling by manually disabling the EventBridge rule. You can also manually execute the Step Functions state machine to create or delete the index.
Note
- After index recreation, only the default S3 data source is added.
- If you added other data sources after index creation, they will be deleted when the index is deleted, and won't be recreated when the index is recreated; you'll need to add them again.
- If you added data sources within this repository's CDK, the data sources will be created but not synchronized. To synchronize CDK-added data sources, either manually synchronize them or modify the code to add them as targets for the Step Functions state machine.
- It takes time from starting Kendra index creation until it becomes available. Specifically, index creation and data source synchronization take time. Therefore, if you have a specific time when you want to start using RAG chat, set the startup time earlier than that. Since this varies depending on resource availability, data source type, and document size/number, check the actual time required if you need precise uptime settings.
- As a rough guideline, index creation takes about 30 minutes, and synchronizing an S3 data source with a few hundred text files takes about 10 minutes (estimates only). Based on these figures, you would set the start time about 40 minutes early.
- Be especially careful when using external services as data sources, as required time can vary significantly. Also be mindful of API call limits.
- This doesn't guarantee that the index will be stopped outside the set times; it simply executes startup/shutdown on schedule. Be careful with deployment and scheduling timing.
- For example, if you deploy a setting to delete at 8:00 PM at 9:00 PM, it won't be deleted at that point, but will start deletion at 8:00 PM the next day.
- When creating a stack (executing cdk:deploy when GenerativeAiUseCasesStack doesn't exist), if ragEnabled is true, a Kendra index is created even if schedule times are set. The index remains until the next scheduled deletion time.
- Currently, there's no feature to notify of startup/shutdown errors.
- Each time the index is recreated, the IndexId and DataSourceId change. If other services reference these, you'll need to adapt to these changes.
Enabling Monitoring Dashboard¶
Create a dashboard that aggregates input/output token counts and recent prompts. This dashboard is not built into GenU but is an Amazon CloudWatch dashboard. Amazon CloudWatch dashboards can be viewed from the management console. To view the dashboard, you need to create an IAM user with permissions to log into the management console and view dashboards.
Set dashboard to true (default is false).
Edit parameter.ts
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
dev: {
dashboard: true,
},
};
// cdk.json
{
"context": {
"dashboard": true
}
}
After changing, redeploy with npm run cdk:deploy to apply the changes. A stack named GenerativeAiUseCasesDashboardStack will be deployed in the region specified by modelRegion. The output values will be used in the next steps.
Next, configure Amazon Bedrock log output. Open Amazon Bedrock Settings and enable Model invocation logging. For Select the logging destinations, choose CloudWatch Logs only. (You can also choose Both S3 and CloudWatch Logs if you want to output to S3 as well.) For Log group name, specify the GenerativeAiUseCasesDashboardStack.BedrockLogGroup value that was output during npm run cdk:deploy. (Example: GenerativeAiUseCasesDashboardStack-LogGroupAAAAAAAA-BBBBBBBBBBBB) Create a new Service role with any name. Note that Model invocation logging should be configured in the region specified as modelRegion.
After configuration, open the GenerativeAiUseCasesDashboardStack.DashboardUrl that was output during npm run cdk:deploy.
Note
If you want to disable the monitoring dashboard after enabling it, set dashboard: false and redeploy. This disables the monitoring dashboard, but the GenerativeAiUseCasesDashboardStack itself remains. To remove it completely, open the management console and delete the GenerativeAiUseCasesDashboardStack stack from CloudFormation in the modelRegion.
Using a Custom Domain¶
You can use a custom domain for your website URL. A public hosted zone must already be created in Route53 in the same AWS account. For public hosted zones, please refer to: Working with public hosted zones - Amazon Route 53
If you don't have a public hosted zone in the same AWS account, you can also manually add DNS records for AWS ACM SSL certificate validation or use email validation. To use these methods, refer to the CDK documentation for customization: aws-cdk-lib.aws_certificatemanager module · AWS CDK
Set the following values:
hostName ... The hostname for your website. An A record will be created by CDK; you don't need to create it in advance
domainName ... The domain name of the pre-created public hosted zone
hostedZoneId ... The ID of the pre-created public hosted zone
With the example below, the website is served at genai.example.com.
Edit parameter.ts
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
dev: {
hostName: 'genai',
domainName: 'example.com',
hostedZoneId: 'XXXXXXXXXXXXXXXXXXXX',
},
};
// cdk.json
{
"context": {
"hostName": "genai",
"domainName": "example.com",
"hostedZoneId": "XXXXXXXXXXXXXXXXXXXX"
}
}
Using Bedrock from a Different AWS Account¶
Note
Agent-related tasks (Agent, Flow, Prompt Optimization Tool) do not support using a different AWS account and may result in errors during execution.
You can use Bedrock from a different AWS account. As a prerequisite, the initial deployment of GenU must be completed.
To use Bedrock from a different AWS account, you need to create one IAM role in that account. You can name the IAM role anything, but you must specify the following IAM role names (that were created during GenU deployment) in the Principal of the IAM role created in the different account:
GenerativeAiUseCasesStack-APIPredictTitleService
GenerativeAiUseCasesStack-APIPredictService
GenerativeAiUseCasesStack-APIPredictStreamService
GenerativeAiUseCasesStack-APIGenerateImageService
For details on how to specify Principals, refer to: AWS JSON Policy Elements: Principal
Principal configuration example (set in the different account)
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": [
"arn:aws:iam::111111111111:role/GenerativeAiUseCasesStack-APIPredictTitleServiceXXX-XXXXXXXXXXXX",
"arn:aws:iam::111111111111:role/GenerativeAiUseCasesStack-APIPredictServiceXXXXXXXX-XXXXXXXXXXXX",
"arn:aws:iam::111111111111:role/GenerativeAiUseCasesStack-APIPredictStreamServiceXX-XXXXXXXXXXXX",
"arn:aws:iam::111111111111:role/GenerativeAiUseCasesStack-APIGenerateImageServiceXX-XXXXXXXXXXXX"
]
},
"Action": "sts:AssumeRole",
"Condition": {}
}
]
}
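The trust policy above only controls who may assume the role. The role in the different account also needs a permissions policy that allows calling Bedrock itself. The following is a minimal sketch under that assumption; adjust the actions and resources to the features you actually use (for example, image or video generation):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "*"
    }
  ]
}
```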
Set the following parameter:
crossAccountBedrockRoleArn ... The ARN of the IAM role created in advance in the different account
Edit parameter.ts
// parameter.ts
const envs: Record<string, Partial<StackInput>> = {
dev: {
crossAccountBedrockRoleArn:
'arn:aws:iam::AccountID:role/PreCreatedRoleName',
},
};
// cdk.json
{
"context": {
"crossAccountBedrockRoleArn": "arn:aws:iam::AccountID:role/PreCreatedRoleName"
}
}
Run npm run cdk:deploy after changing the settings to apply the changes.
Deploying Multiple Environments in the Same Account¶
When deploying multiple environments in the same account, you need to deploy with different stack names.
Setting env will append it as a suffix to each stack name, deploying a separate environment.
env is also used for environment determination in parameter.ts: if the environment specified by env exists in parameter.ts, all parameters are overwritten with the values from parameter.ts; if it doesn't exist in parameter.ts, the application is deployed with the parameters from the context in cdk.json.
Set the following value:
env ... Environment name (default: "" (empty string))
// cdk.json
{
"context": {
"env": "<environment name>"
}
}
Alternatively, you can specify context in the command when deploying:
npm run cdk:deploy -- -c env=<environment name>
Configuration example
// cdk.json
{
"context": {
"env": "dev"
}
}
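The precedence rule described above can be sketched as follows. This is a hypothetical simplification: resolveParams is illustrative only, and the actual resolution happens in packages/cdk/parameter.ts and the CDK app. Here env-specific values shallow-merge over the context defaults; adjust the sketch if the real behavior differs.

```typescript
// Hypothetical sketch of env-based parameter resolution.
type StackInput = { ragEnabled?: boolean; modelRegion?: string };

const envs: Record<string, Partial<StackInput>> = {
  dev: { ragEnabled: false },
  prod: { ragEnabled: true },
};

function resolveParams(env: string, contextDefaults: StackInput): StackInput {
  // If the env is defined in parameter.ts, its values take precedence;
  // otherwise fall back entirely to the cdk.json context values.
  return env in envs ? { ...contextDefaults, ...envs[env] } : contextDefaults;
}

console.log(resolveParams('prod', { ragEnabled: false }).ragEnabled); // true
console.log(resolveParams('qa', { ragEnabled: false }).ragEnabled); // false
```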