Deployment Planning
Please check the following considerations before deployment:
Deployable Regions
The services used in this solution, or the Amazon EC2 instance types, may not be available in all AWS Regions at this time. Please launch this solution in an AWS Region that provides the required services.
Verified Deployable Regions
Region Name | Verified |
---|---|
US East (N. Virginia) | |
US West (Oregon) | |
If you deploy in an unverified Region, you may run into the following issues and need to work around them:
- When deploying in Regions that do not support `g5` instance types, you need to manually set the instance type used by Karpenter to `g4dn` or another GPU instance type.
Deploying in AWS China Regions
Please refer to Deploying in AWS China Regions
IAM Permissions
Deploying this solution requires administrator or equivalent permissions. Due to the number of components involved, we do not provide a minimal permissions list.
Service Quotas
Each AWS account has quotas, per AWS Region, on the number of resources you can create. You can view your quotas in the Service Quotas console in the AWS Management Console. If a quota is adjustable, you can request an increase from the same console, which opens a case on your behalf.
The main service quotas related to this solution are:
AWS Service | Quota Entry | Estimated Usage | Adjustable |
---|---|---|---|
Amazon EC2 | Running On-Demand G and VT instances | Based on max concurrent GPU instances | |
Amazon EC2 | All G and VT Spot Instance Requests | Based on max concurrent GPU instances | |
Amazon SNS | Messages Published per Second | Based on max concurrent requests | |
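The two Amazon EC2 quotas above are counted in vCPUs rather than in instances, so "max concurrent GPU instances" has to be converted before you request an increase. The following is a minimal sketch of that arithmetic. The function name, the choice of instance sizes, and the headroom percentage are illustrative assumptions, not part of this solution.

```python
# vCPU counts for a few GPU instance sizes, taken from the public EC2 specs.
# Extend this table with the sizes you actually plan to use.
VCPUS_PER_INSTANCE = {
    "g5.xlarge": 4,
    "g5.2xlarge": 8,
    "g4dn.xlarge": 4,
    "g4dn.2xlarge": 8,
}


def required_vcpu_quota(instance_type: str, max_instances: int,
                        headroom_percent: int = 20) -> int:
    """Estimate the vCPU quota needed to run `max_instances` at the same time.

    `headroom_percent` is a hypothetical safety margin (for example, to cover
    instances that overlap briefly during scale-in and scale-out).
    """
    if max_instances < 0 or headroom_percent < 0:
        raise ValueError("max_instances and headroom_percent must be >= 0")
    vcpus = VCPUS_PER_INSTANCE[instance_type] * max_instances
    # Integer ceiling of vcpus * (100 + headroom_percent) / 100.
    return (vcpus * (100 + headroom_percent) + 99) // 100


if __name__ == "__main__":
    print(required_vcpu_quota("g5.2xlarge", 10))
```

Apply the same estimate separately to the On-Demand quota and the Spot quota, depending on which capacity types you allow.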
In addition, you need to consider the following service quotas during deployment:
AWS Service | Quota Entry | Estimated Usage | Adjustable |
---|---|---|---|
Amazon VPC | VPCs per Region | 1 | |
Amazon VPC | NAT gateways per Availability Zone | 1 | |
Amazon EC2 | EC2-VPC Elastic IPs | 1 | |
Amazon S3 | General purpose buckets | 1 per queue | |
Choosing a Stable Diffusion Runtime
You need a runtime to deploy the Stable Diffusion model and provide API access.
Currently, there are multiple community Stable Diffusion runtimes available:
Runtime Name | Link | Verified |
---|---|---|
Stable Diffusion Web UI | GitHub | |
ComfyUI | GitHub | |
InvokeAI | GitHub | |
You can also choose other runtimes or build your own runtime. You need to package the runtime as a container image to run it on EKS.
You need to fully understand and comply with the license terms of the Stable Diffusion runtime you are using.
Example Runtime
You can use the community-provided example Dockerfile to build runtime container images for Stable Diffusion Web UI and ComfyUI. Please note that these images are intended only for technical evaluation and testing, and should not be deployed to production environments.
Model Storage
By default, this solution loads models into the `/opt/ml/code/models` directory. Please ensure your runtime is configured to read models from this directory.
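To confirm that a runtime container actually sees the models, you could list the contents of that directory from inside the container. This is only a sketch: the helper name and the set of file extensions are assumptions, and your runtime may expect a different layout under the models directory.

```python
from pathlib import Path

# Default model directory named in the text above.
MODEL_DIR = Path("/opt/ml/code/models")

# Assumption: common Stable Diffusion checkpoint extensions.
MODEL_SUFFIXES = {".safetensors", ".ckpt"}


def list_models(model_dir: Path = MODEL_DIR) -> list[str]:
    """Return model files under `model_dir`, as sorted relative POSIX paths."""
    if not model_dir.is_dir():
        return []
    return sorted(
        path.relative_to(model_dir).as_posix()
        for path in model_dir.rglob("*")
        if path.is_file() and path.suffix.lower() in MODEL_SUFFIXES
    )


if __name__ == "__main__":
    for name in list_models():
        print(name)
```

An empty result suggests that the directory is not mounted or that no model has been loaded yet.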
You need to disable mmap to achieve the highest performance for your runtime.
- For SD Web UI, you need to set `disable_mmap_load_safetensors: true` in `config.json`
- For ComfyUI, you need to manually modify the source code as guided in the community issue.
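For SD Web UI, the setting can be written when you build or start the image. The following is a minimal sketch that sets the key in an existing `config.json` and leaves all other settings untouched. The helper name is hypothetical, and the location of `config.json` depends on how your image is laid out.

```python
import json
from pathlib import Path


def disable_mmap(config_path: Path) -> dict:
    """Set disable_mmap_load_safetensors to true in an SD Web UI config file.

    Existing keys are preserved. If the file does not exist yet, it is
    created with only this setting.
    """
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    config["disable_mmap_load_safetensors"] = True
    config_path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return config


if __name__ == "__main__":
    # Assumption: config.json sits in the current working directory.
    print(disable_mmap(Path("config.json")))
```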
Notes on SD Web UI Runtime
For the SD Web UI runtime, depending on the model being run, the runtime can be either a static runtime (pre-loading the model) or a dynamic runtime (loading the model on-demand).
- Static runtime requires specifying the model to be used in `modelFilename`. This model will be loaded into GPU memory at startup.
- Dynamic runtime requires specifying `dynamicModel: true`. In this case, there is no need to specify the model in advance. The runtime will load the model from Amazon S3 and perform model inference based on the model used in the request.
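The distinction between the two modes can be sketched as a small check over a runtime's settings. Only the `modelFilename` and `dynamicModel` keys come from the text above. The function, the flat dictionary shape, and the rule that `dynamicModel: true` wins when both keys are present are assumptions made for illustration.

```python
def runtime_mode(runtime_config: dict) -> str:
    """Classify an SD Web UI runtime as "static" or "dynamic".

    Assumption: `runtime_config` is a flat mapping of the runtime's settings.
    """
    if runtime_config.get("dynamicModel") is True:
        # Models are loaded on demand, so no model name is needed up front.
        return "dynamic"
    if runtime_config.get("modelFilename"):
        # The named model is pre-loaded into GPU memory at startup.
        return "static"
    raise ValueError(
        "Set either modelFilename (static) or dynamicModel: true (dynamic)"
    )


if __name__ == "__main__":
    # Hypothetical file name, used only to show the call.
    print(runtime_mode({"modelFilename": "example-model.safetensors"}))
    print(runtime_mode({"dynamicModel": True}))
```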
Other Important Notes and Limitations
- In the current version, this solution will automatically create a new VPC when deployed. This VPC includes:
  - CIDR `10.0.0.0/16`
  - 3 public subnets distributed across different Availability Zones, with subnet size `/19`
  - 3 private subnets distributed across different Availability Zones, with subnet size `/19`
  - 3 NAT gateways (placed in public subnets)
  - 1 Internet gateway
  - Corresponding route tables and security groups

  Currently, the parameters of this VPC cannot be customized.
- In the current version, this solution can only be deployed on a newly created EKS cluster, and the version is fixed at `1.28`. We will update the cluster version as Amazon EKS releases new versions.
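To get a feel for the address space of the VPC described in this list, the `/16` range can be split into `/19` blocks with Python's standard `ipaddress` module. This is a sketch for capacity planning only. Which `/19` block the solution assigns to which subnet is not stated here, so the enumeration order below is an assumption.

```python
import ipaddress

# AWS reserves 5 IP addresses in every subnet.
AWS_RESERVED_PER_SUBNET = 5


def split_vpc(vpc_cidr: str = "10.0.0.0/16", subnet_prefix: int = 19):
    """Return all subnets of size `subnet_prefix` that fit in `vpc_cidr`."""
    vpc = ipaddress.ip_network(vpc_cidr)
    return list(vpc.subnets(new_prefix=subnet_prefix))


def usable_addresses(subnet) -> int:
    """Addresses left for pods and nodes after the AWS-reserved ones."""
    return subnet.num_addresses - AWS_RESERVED_PER_SUBNET


if __name__ == "__main__":
    for block in split_vpc():
        print(block, usable_addresses(block))
```

A `/16` range holds eight `/19` blocks, so the three public and three private subnets occupy six of them.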