Prerequisites
Deployment prerequisites
Before you start the deployment, review your development environment and tools, your AWS Service Quotas, and access to Amazon Bedrock models.
Development environment and tools
Tool | Version | Recommendation |
---|---|---|
pnpm | >=8 <9 | Use these instructions to install pnpm |
NodeJS | >=18 | Use Node Version Manager (nvm) |
Python | >=3.10,<4 | Use Python Version Manager (pyenv) |
Poetry | >=1.5,<2 | https://python-poetry.org/docs/ |
AWS CLI | v2 | https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html |
Docker [1] | v20+ | https://docs.docker.com/desktop/ |
JDK | v17+ | Amazon Corretto 17 |
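The version ranges in the table above can be checked with a short script. The following is a minimal sketch, not part of the sample itself: the command names and version flags (for example `python3 --version` and `java -version`) are assumptions about a typical installation, and `REQUIREMENTS`, `parse_version`, `in_range`, and `check_tools` are hypothetical helper names. It compares only the major and minor version numbers.

```python
import re
import shutil
import subprocess

# Minimum (inclusive) and maximum (exclusive) major.minor versions, taken from
# the table above. The commands and flags are assumptions, not part of the sample.
REQUIREMENTS = {
    "pnpm": (["pnpm", "--version"], (8, 0), (9, 0)),
    "NodeJS": (["node", "--version"], (18, 0), None),
    "Python": (["python3", "--version"], (3, 10), (4, 0)),
    "Poetry": (["poetry", "--version"], (1, 5), (2, 0)),
    "AWS CLI": (["aws", "--version"], (2, 0), None),
    "Docker": (["docker", "--version"], (20, 0), None),
    "JDK": (["java", "-version"], (17, 0), None),
}


def parse_version(text):
    """Return the first major.minor pair found in text, or None if absent."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))


def in_range(version, minimum, maximum):
    """Check minimum <= version < maximum; a maximum of None means no upper bound."""
    return version >= minimum and (maximum is None or version < maximum)


def check_tools(requirements=REQUIREMENTS):
    """Run each tool's version command and report a status string per tool."""
    results = {}
    for name, (command, minimum, maximum) in requirements.items():
        if shutil.which(command[0]) is None:
            results[name] = "missing"
            continue
        completed = subprocess.run(command, capture_output=True, text=True)
        # Some tools (java, for example) print their version to stderr.
        version = parse_version(completed.stdout + completed.stderr)
        if version is None:
            results[name] = "unknown version"
        elif in_range(version, minimum, maximum):
            results[name] = "ok"
        else:
            results[name] = f"unsupported ({version[0]}.{version[1]})"
    return results
```

Calling `check_tools()` returns a dictionary such as `{"pnpm": "ok", "JDK": "missing", ...}`, which you can print before starting the deployment.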
AWS Service Quotas
Ensure the necessary service quota limits are increased based on your configuration before deploying. The deployment performs a check and will fail early if limits are not met.
Warning
The embedding model is required for all deployments at this time, and its usage quota must be at least 5 unless you configure it differently in the code.
The SageMaker processing job quota ml.g4dn.2xlarge for processing job usage must be >= 5. This is required for the current bulk processing of the dataset into the vector store.
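To confirm the processing job quota before deploying, you could query the Service Quotas API. The sketch below assumes that `boto3` is installed and that AWS credentials are configured. The helper names (`fetch_sagemaker_quotas`, `quota_value`, `meets_minimum`) are hypothetical. It also assumes the quota appears in the `list_service_quotas` output under exactly the name used above.

```python
PROCESSING_QUOTA_NAME = "ml.g4dn.2xlarge for processing job usage"


def fetch_sagemaker_quotas(region_name=None):
    """Return the SageMaker quotas visible to the caller as a list of dicts."""
    import boto3  # imported lazily so the helpers below work without boto3

    client = boto3.client("service-quotas", region_name=region_name)
    paginator = client.get_paginator("list_service_quotas")
    quotas = []
    for page in paginator.paginate(ServiceCode="sagemaker"):
        quotas.extend(page["Quotas"])
    return quotas


def quota_value(quotas, quota_name):
    """Return the value of the named quota, or None if it is not listed."""
    for quota in quotas:
        if quota["QuotaName"] == quota_name:
            return quota["Value"]
    return None


def meets_minimum(quotas, quota_name, minimum):
    """Return True only if the quota is listed and its value is >= minimum."""
    value = quota_value(quotas, quota_name)
    return value is not None and value >= minimum
```

For example, `meets_minimum(fetch_sagemaker_quotas(), PROCESSING_QUOTA_NAME, 5)` returns `False` when the quota is below 5 or is not listed for the region.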
Quota limits for predefined models
For predefined models, check the instance type to determine the quota limits you need to increase.
Predefined Model
Disclaimer: Use of Third-Party Models
By using this sample, you agree that you may be deploying third-party models (“Third-Party Model”) into your specified user account. AWS does not own and does not exercise any control over these Third-Party Models. You should perform your own independent assessment, and take measures to ensure that you comply with your own specific quality control practices and standards, and the local rules, laws, regulations, licenses and terms of use that apply to you, your content, and the Third-Party Models, and any outputs from the Third-Party Models. AWS does not make any representations or warranties regarding the Third-Party Models.
Bedrock Model access
Provider | Model | Instance / Size | Model Status | Prompt Status | Notes |
---|---|---|---|---|---|
SageMaker | Falcon Lite | ml.g5.12xlarge | | | Quite stable and great for general-purpose use, with flexible prompt engineering |
SageMaker | Falcon 7B | ml.g5.16xlarge | | | Prefer the Lite version |
SageMaker | Falcon 40B | ml.g5.48xlarge | | | Expensive for unquantifiable benefits; the Lite version is preferred at this time |
SageMaker | Llama 2 | ml.g5.12xlarge | | | Follow-up questions are inconsistent, and responses contain formatting markup; requires complex prompt engineering |
Bedrock | Claude V2 | - | | | Good results and easy to work with |
Bedrock | Jurassic | - | | | Should work |
Bedrock | Titan | - | | | Should work |
Service Quotas
Ensure the necessary Service Quota limits for SageMaker models meet the capacity of your deployment configurations (<instance> for endpoint usage).
Status Keys
- Model Status: Defines stability of deployment/integration with model and model/endpoint kwargs configuration optimization.
- Prompt Status: Defines robustness and adaptability of prompt templates and engineering for this model.
Status | Description |
---|---|
- | Not applicable |
 | Not tested yet, might work, might not |
 | Very experimental, with high probability of undesirable results or errors |
 | Works for specific use case, but not vetted in the wild |
 | Should work for general use cases, but not fully battle-tested |
 | Awesome, battle-tested and ready for use |
If you deploy only the Falcon Lite predefined model, you only need to ensure that ml.g5.12xlarge for endpoint usage is >= 1, while the other X for endpoint usage quotas can remain 0. The minimum requirements in the AWS Service Quotas section still apply.
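The relationship between the selected predefined models and the endpoint quotas can be sketched as a simple lookup. This example assumes one endpoint instance per deployed model, which the source does not state, so adjust the counts to match your configuration. The instance types come from the predefined model table, and `required_endpoint_quotas` is a hypothetical helper name.

```python
from collections import Counter

# Instance type per SageMaker predefined model, from the table in this section.
MODEL_INSTANCE_TYPES = {
    "Falcon Lite": "ml.g5.12xlarge",
    "Falcon 7B": "ml.g5.16xlarge",
    "Falcon 40B": "ml.g5.48xlarge",
    "Llama 2": "ml.g5.12xlarge",
}


def required_endpoint_quotas(models):
    """Map each needed quota name to the minimum value for the chosen models.

    Assumes one endpoint instance per model (an assumption, not from the source).
    """
    counts = Counter(MODEL_INSTANCE_TYPES[model] for model in models)
    return {
        f"{instance} for endpoint usage": count
        for instance, count in counts.items()
    }
```

Under that assumption, `required_endpoint_quotas(["Falcon Lite"])` yields a single entry for ml.g5.12xlarge with a minimum of 1, and any quota not in the result can remain 0.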
Access to Amazon Bedrock models
Request access to Bedrock models
Before using the models, you will need to request access to specific models via Amazon Bedrock.
To request access to additional models, select the Model access link from the left side navigation panel in the Amazon Bedrock console. For detailed instructions, refer to the Bedrock Model access user guide.
[1] Docker virtual disk space should have at least 30GB of free space. If you see a "no space left on device" error during the build, free up space by running docker system prune -f and/or increasing the virtual disk size.