# Benchmark models on EC2
You can use `FMBench` to benchmark models hosted on EC2. This can be done in one of two ways:

- Deploy the model on your EC2 instance independently of `FMBench` and then benchmark it through the Bring your own endpoint mode.
- Deploy the model on your EC2 instance through `FMBench` and then benchmark it.
The steps for deploying the model on your EC2 instance are described below.
👉 In this configuration, both the model being benchmarked and `FMBench` are deployed on the same EC2 instance.
Create a new EC2 instance suitable for hosting an LMI as per the steps described here. Note that you will need to select the correct AMI based on your instance type; this is called out in the instructions.
The steps for benchmarking on different types of EC2 instances (GPU/CPU/Neuron) and different inference containers differ slightly. These are all described below.
## Benchmarking options on EC2
- Benchmarking on an instance type with NVIDIA GPUs or AWS Chips
- Benchmarking on an instance type with NVIDIA GPU and the Triton inference server
- Benchmarking on an instance type with AWS Chips and the Triton inference server
- Benchmarking on a CPU instance type with AMD processors
- Benchmarking on a CPU instance type with Intel processors
- Benchmarking on a CPU instance type with ARM processors (Graviton 4)
- Benchmarking models on Ollama
## Benchmarking on an instance type with NVIDIA GPUs or AWS Chips
- Connect to your instance using any of the options in EC2 (SSH/EC2 Connect) and run the following in the EC2 terminal. This command installs `uv` on the instance, which is then used to create a new virtual environment for `FMBench`.
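A minimal sketch of that install step, assuming the standard `uv` installer script is used (the exact command in the `FMBench` repo may differ):

```bash
# Install uv using the official standalone installer
curl -LsSf https://astral.sh/uv/install.sh | sh
# Make uv available in the current shell session
export PATH="$HOME/.local/bin:$PATH"
```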
- Install `docker-compose`.
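The CPU sections later on this page install `docker-compose` as a Docker CLI plugin; the same commands apply here:

```bash
DOCKER_CONFIG=${DOCKER_CONFIG:-$HOME/.docker}
mkdir -p $DOCKER_CONFIG/cli-plugins
sudo curl -L https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m) -o $DOCKER_CONFIG/cli-plugins/docker-compose
sudo chmod +x $DOCKER_CONFIG/cli-plugins/docker-compose
docker compose version
```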
- Setup the `.fmbench_python311` Python environment.
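A sketch of this setup, assuming `uv` manages the virtual environment and that the `fmbench` package is installed from PyPI (follow the repo's instructions if they differ):

```bash
# Create and activate a Python 3.11 virtual environment for FMBench
uv venv .fmbench_python311 --python 3.11
source .fmbench_python311/bin/activate
# Install the fmbench package into the environment
uv pip install -U fmbench
```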
- Create the local directory structure needed for `FMBench` and copy all publicly available dependencies from the AWS S3 bucket for `FMBench`. This is done by running the `copy_s3_content.sh` script available as part of the `FMBench` repo. Replace `/tmp` in the command below with a different path if you want to store the config files and the `FMBench` generated data in a different directory.
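A hedged sketch of running `copy_s3_content.sh`; the raw-file URL and the argument convention (base directory as the first argument) are assumptions:

```bash
# Download and run the script that creates /tmp/fmbench-read and copies
# public FMBench dependencies from S3; replace /tmp with your custom path if needed
curl -s https://raw.githubusercontent.com/aws-samples/foundation-model-benchmarking-tool/main/copy_s3_content.sh | sh -s -- /tmp
```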
- To download the model files from Hugging Face, create a `hf_token.txt` file in the `/tmp/fmbench-read/scripts/` directory containing the Hugging Face token you would like to use. In the command below replace the `hf_yourtoken` string with your Hugging Face token. Replace `/tmp` in the command below if you are using `/path/to/your/custom/tmp` to store the config files and the `FMBench` generated data.
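For example (a sketch; replace the token placeholder with your own):

```bash
# Write the Hugging Face token into the location FMBench reads it from
echo hf_yourtoken > /tmp/fmbench-read/scripts/hf_token.txt
```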
- Run `FMBench` with a packaged or a custom config file. This step will also deploy the model on the EC2 instance. The `--write-bucket` parameter value is just a placeholder; an actual S3 bucket is not required. Skip to the next step if benchmarking for AWS Chips. You could set the `--tmp-dir` flag to an Amazon EFS path instead of `/tmp` if using a shared path for storing config files and reports.
- For example, to run `FMBench` on a `llama3-8b-Instruct` model on an `inf2.48xlarge` instance, run the command below. The config file for this example can be viewed here.
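A hedged version of that command; the packaged config file path shown is illustrative, so substitute the actual config you intend to run:

```bash
# Run FMBench in local mode; --write-bucket is a placeholder, no S3 bucket is needed
fmbench --config-file /tmp/fmbench-read/configs/llama3/8b/config-llama3-8b-inf2-48xl.yml \
    --local-mode yes --write-bucket placeholder --tmp-dir /tmp > fmbench.log 2>&1
```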
- Open a new Terminal and do a `tail` on `fmbench.log` to see a live log of the run.
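For example:

```bash
# Follow the live log of the benchmarking run
tail -f fmbench.log
```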
- All metrics are stored in the `/tmp/fmbench-write` directory created automatically by the `fmbench` package. Once the run completes, all files are copied locally into a `results-*` folder as usual.
## Benchmarking on an instance type with NVIDIA GPU and the Triton inference server
- Follow the steps in the Benchmarking on an instance type with NVIDIA GPUs or AWS Chips section to install `FMBench`, but do not run any benchmarking tests yet.
- Once `FMBench` is installed, install the following additional dependencies for Triton.

    ```bash
    cd ~
    git clone https://github.com/triton-inference-server/tensorrtllm_backend.git --branch v0.12.0
    # Update the submodules
    cd tensorrtllm_backend
    # Install git-lfs if needed
    sudo apt --fix-broken install
    sudo apt-get update && sudo apt-get install git-lfs -y --no-install-recommends
    git lfs install
    git submodule update --init --recursive
    ```
- Now you are ready to run benchmarking with Triton. For example, to benchmark the `Llama3-8b` model on a `g5.12xlarge`, use a command like the one sketched below.
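The packaged config file name for Triton on a `g5.12xlarge` is an assumption here; substitute the actual Triton config from the repo:

```bash
# Benchmark Llama3-8b behind the Triton inference server on g5.12xlarge
fmbench --config-file /tmp/fmbench-read/configs/llama3/8b/config-llama3-8b-g5-12xl-triton.yml \
    --local-mode yes --write-bucket placeholder --tmp-dir /tmp > fmbench.log 2>&1
```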
## Benchmarking on an instance type with AWS Chips and the Triton inference server
As of 2024-09-26, this has been tested on a `trn1.32xlarge` instance.
- Connect to your instance using any of the options in EC2 (SSH/EC2 Connect), then install `uv` on the instance as shown in the NVIDIA GPUs or AWS Chips section above; `uv` is then used to create a new Python virtual environment for `FMBench`. (Note: your EC2 instance needs to have at least 200 GB of disk space for this test.)
- Setup the `.fmbench_python311` Python virtual environment.
- First we need to build the required Docker image for `triton` and push it locally. To do this, curl the Triton Dockerfile and the script that builds and pushes the Triton image locally:

    ```bash
    # curl the docker file for triton
    curl -o ./Dockerfile_triton https://raw.githubusercontent.com/aws-samples/foundation-model-benchmarking-tool/main/fmbench/scripts/triton/Dockerfile_triton

    # curl the script that builds and pushes the triton image locally
    curl -o build_and_push_triton.sh https://raw.githubusercontent.com/aws-samples/foundation-model-benchmarking-tool/main/fmbench/scripts/triton/build_and_push_triton.sh

    # Make the triton build and push script executable, and run it
    chmod +x build_and_push_triton.sh
    ./build_and_push_triton.sh
    ```

- Now wait until the Docker image is saved locally, then follow the instructions below to start a benchmarking test.
- Create the local directory structure needed for `FMBench` and copy all publicly available dependencies from the AWS S3 bucket for `FMBench`. This is done by running the `copy_s3_content.sh` script available as part of the `FMBench` repo (a sketch is shown in the NVIDIA GPUs or AWS Chips section above). Replace `/tmp` with a different path if you want to store the config files and the `FMBench` generated data in a different directory.
- To download the model files from Hugging Face, create a `hf_token.txt` file in the `/tmp/fmbench-read/scripts/` directory containing the Hugging Face token you would like to use. Replace the `hf_yourtoken` string with your Hugging Face token, and replace `/tmp` if you are using `/path/to/your/custom/tmp` to store the config files and the `FMBench` generated data.
- Run `FMBench` with a packaged or a custom config file. This step will also deploy the model on the EC2 instance. The `--write-bucket` parameter value is just a placeholder; an actual S3 bucket is not required. You could set the `--tmp-dir` flag to an Amazon EFS path instead of `/tmp` if using a shared path for storing config files and reports.
- Open a new Terminal and do a `tail` on `fmbench.log` to see a live log of the run.
- All metrics are stored in the `/tmp/fmbench-write` directory created automatically by the `fmbench` package. Once the run completes, all files are copied locally into a `results-*` folder as usual.
- Note: To deploy a model on AWS Chips using Triton with the `djl` or `vllm` backend, the configuration file requires the `backend` and `container_params` parameters within the `inference_spec` dictionary. The backend options are `vllm`/`djl`, and `container_params` contains container-specific parameters used to deploy the model, for example tensor parallel degree, n positions, etc. Tensor parallel degree is a mandatory field. If no other parameters are provided, the container will choose the default parameters during deployment.

    ```yaml
    # Backend options: [djl, vllm]
    backend: djl

    # Container parameters that are used during model deployment
    container_params:
      # tp degree is a mandatory parameter
      tp_degree: 8
      amp: "f16"
      attention_layout: 'BSH'
      collectives_layout: 'BSH'
      context_length_estimate: 3072, 3584, 4096
      max_rolling_batch_size: 8
      model_loader: "tnx"
      model_loading_timeout: 2400
      n_positions: 4096
      output_formatter: "json"
      rolling_batch: "auto"
      rolling_batch_strategy: "continuous_batching"
      trust_remote_code: true
      # modify the serving properties to match your model and requirements
      serving.properties:
    ```
## Benchmarking on a CPU instance type with AMD processors
As of 2024-08-27, this has been tested on an `m7a.16xlarge` instance.
- Connect to your instance using any of the options in EC2 (SSH/EC2 Connect), then install `uv` on the instance as shown in the NVIDIA GPUs or AWS Chips section above; `uv` is then used to create a new Python virtual environment for `FMBench`.
- Setup the `.fmbench_python311` Python virtual environment.
- Build the `vllm` container for serving the model.

    - 👉 The `vllm` container we are building locally is going to be referenced in the `FMBench` config file.
    - The container being built is for CPU only (GPU support might be added in the future).

    ```bash
    # Clone the vLLM project repository from GitHub
    git clone https://github.com/vllm-project/vllm.git
    # Change the directory to the cloned vLLM project
    cd vllm
    # Build a Docker image using the provided Dockerfile for CPU, with a shared memory size of 4GB
    sudo docker build -f Dockerfile.cpu -t vllm-cpu-env --shm-size=4g .
    ```
- Create the local directory structure needed for `FMBench` and copy all publicly available dependencies from the AWS S3 bucket for `FMBench`. This is done by running the `copy_s3_content.sh` script available as part of the `FMBench` repo (a sketch is shown in the NVIDIA GPUs or AWS Chips section above). Replace `/tmp` with a different path if you want to store the config files and the `FMBench` generated data in a different directory.
- To download the model files from Hugging Face, create a `hf_token.txt` file in the `/tmp/fmbench-read/scripts/` directory containing the Hugging Face token you would like to use. Replace the `hf_yourtoken` string with your Hugging Face token, and replace `/tmp` if you are using `/path/to/your/custom/tmp` to store the config files and the `FMBench` generated data.
- Before running FMBench, add the current user to the docker group. Run the following commands to be able to run Docker without using `sudo` each time.
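These are the standard Docker post-install commands for this step (the exact commands in the repo may differ slightly):

```bash
# Add the current user to the docker group and apply the new group membership
sudo usermod -aG docker $USER
newgrp docker
```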
- Install `docker-compose`.

    ```bash
    DOCKER_CONFIG=${DOCKER_CONFIG:-$HOME/.docker}
    mkdir -p $DOCKER_CONFIG/cli-plugins
    sudo curl -L https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m) -o $DOCKER_CONFIG/cli-plugins/docker-compose
    sudo chmod +x $DOCKER_CONFIG/cli-plugins/docker-compose
    docker compose version
    ```
- Run `FMBench` with a packaged or a custom config file. This step will also deploy the model on the EC2 instance. The `--write-bucket` parameter value is just a placeholder; an actual S3 bucket is not required. You could set the `--tmp-dir` flag to an Amazon EFS path instead of `/tmp` if using a shared path for storing config files and reports.
- Open a new Terminal and do a `tail` on `fmbench.log` to see a live log of the run.
- All metrics are stored in the `/tmp/fmbench-write` directory created automatically by the `fmbench` package. Once the run completes, all files are copied locally into a `results-*` folder as usual.
## Benchmarking on a CPU instance type with Intel processors
As of 2024-08-27, this has been tested on `c5.18xlarge` and `m5.16xlarge` instances.
- Connect to your instance using any of the options in EC2 (SSH/EC2 Connect), then install `uv` on the instance as shown in the NVIDIA GPUs or AWS Chips section above; `uv` is then used to create a new Python virtual environment for `FMBench`.
- Setup the `.fmbench_python311` Python virtual environment.
- Build the `vllm` container for serving the model.

    - 👉 The `vllm` container we are building locally is going to be referenced in the `FMBench` config file.
    - The container being built is for CPU only (GPU support might be added in the future).

    ```bash
    # Clone the vLLM project repository from GitHub
    git clone https://github.com/vllm-project/vllm.git
    # Change the directory to the cloned vLLM project
    cd vllm
    # Build a Docker image using the provided Dockerfile for CPU, with a shared memory size of 12GB
    sudo docker build -f Dockerfile.cpu -t vllm-cpu-env --shm-size=12g .
    ```
- Create the local directory structure needed for `FMBench` and copy all publicly available dependencies from the AWS S3 bucket for `FMBench`. This is done by running the `copy_s3_content.sh` script available as part of the `FMBench` repo (a sketch is shown in the NVIDIA GPUs or AWS Chips section above). Replace `/tmp` with a different path if you want to store the config files and the `FMBench` generated data in a different directory.
- To download the model files from Hugging Face, create a `hf_token.txt` file in the `/tmp/fmbench-read/scripts/` directory containing the Hugging Face token you would like to use. Replace the `hf_yourtoken` string with your Hugging Face token, and replace `/tmp` if you are using `/path/to/your/custom/tmp` to store the config files and the `FMBench` generated data.
- Before running FMBench, add the current user to the docker group. Run the following commands to be able to run Docker without using `sudo` each time.
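As in the AMD section, these are the standard Docker post-install commands (the exact commands in the repo may differ slightly):

```bash
# Add the current user to the docker group and apply the new group membership
sudo usermod -aG docker $USER
newgrp docker
```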
- Install `docker-compose`.

    ```bash
    DOCKER_CONFIG=${DOCKER_CONFIG:-$HOME/.docker}
    mkdir -p $DOCKER_CONFIG/cli-plugins
    sudo curl -L https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m) -o $DOCKER_CONFIG/cli-plugins/docker-compose
    sudo chmod +x $DOCKER_CONFIG/cli-plugins/docker-compose
    docker compose version
    ```
- Run `FMBench` with a packaged or a custom config file. This step will also deploy the model on the EC2 instance. The `--write-bucket` parameter value is just a placeholder; an actual S3 bucket is not required. You could set the `--tmp-dir` flag to an Amazon EFS path instead of `/tmp` if using a shared path for storing config files and reports.
- Open a new Terminal and do a `tail` on `fmbench.log` to see a live log of the run.
- All metrics are stored in the `/tmp/fmbench-write` directory created automatically by the `fmbench` package. Once the run completes, all files are copied locally into a `results-*` folder as usual.
## Benchmarking models on Ollama
As of 2024-10-24, this has been tested on a `g6e.2xlarge` with Llama 3.1 8B.
- Install Ollama.
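The official Linux install script can be used for this:

```bash
# Install Ollama using its official install script
curl -fsSL https://ollama.com/install.sh | sh
```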
- Pull the required model.
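For example, for the Llama 3.1 8B model this section was tested with (the exact model tag is an assumption):

```bash
# Pull the model weights locally
ollama pull llama3.1:8b
```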
- Serve the model. This might produce the following error message: `Error: accepts 0 arg(s), received 1`, but you can safely ignore this error.
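A sketch of this step; `ollama serve` itself takes no arguments, so a command that also passes the model name is what triggers the ignorable error above:

```bash
# Start the Ollama server in the background
ollama serve &
```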
- Create the local directory structure needed for `FMBench` and copy all publicly available dependencies from the AWS S3 bucket for `FMBench`. This is done by running the `copy_s3_content.sh` script available as part of the `FMBench` repo (a sketch is shown in the NVIDIA GPUs or AWS Chips section above). Replace `/tmp` with a different path if you want to store the config files and the `FMBench` generated data in a different directory.
- Run `FMBench` with a packaged or a custom config file. The `--write-bucket` parameter value is just a placeholder; an actual S3 bucket is not required. You could set the `--tmp-dir` flag to an Amazon EFS path instead of `/tmp` if using a shared path for storing config files and reports.
## Benchmarking on a CPU instance type with ARM processors (Graviton 4)
As of 2024-12-24, this has been tested on a `c8g.24xlarge` with Llama 3 8B Instruct on Ubuntu Server 24.04 LTS (HVM), SSD Volume Type.
- Connect to your instance using any of the options in EC2 (SSH/EC2 Connect), then install `Docker` and `uv` on the instance (the `uv` install is shown in the NVIDIA GPUs or AWS Chips section above); `uv` is then used to create a new Python virtual environment for `FMBench`.
- Setup the `.fmbench_python311` Python virtual environment.
- Build the `vllm` container for serving the model.

    - 👉 The `vllm` container we are building locally is going to be referenced in the `FMBench` config file.
    - The container being built is for ARM CPUs only.

    ```bash
    # Clone the vLLM project repository from GitHub
    git clone https://github.com/vllm-project/vllm.git
    # Change the directory to the cloned vLLM project
    cd vllm
    # Build a Docker image using the provided Dockerfile for ARM CPUs, with a shared memory size of 12GB
    sudo docker build -f Dockerfile.arm -t vllm-cpu-env --shm-size=12g .
    ```
- Create the local directory structure needed for `FMBench` and copy all publicly available dependencies from the AWS S3 bucket for `FMBench`. This is done by running the `copy_s3_content.sh` script available as part of the `FMBench` repo (a sketch is shown in the NVIDIA GPUs or AWS Chips section above). Replace `/tmp` with a different path if you want to store the config files and the `FMBench` generated data in a different directory.
- To download the model files from Hugging Face, create a `hf_token.txt` file in the `/tmp/fmbench-read/scripts/` directory containing the Hugging Face token you would like to use. Replace the `hf_yourtoken` string with your Hugging Face token, and replace `/tmp` if you are using `/path/to/your/custom/tmp` to store the config files and the `FMBench` generated data.
- Before running FMBench, add the current user to the docker group. Run the following commands to be able to run Docker without using `sudo` each time.
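As in the AMD and Intel sections, these are the standard Docker post-install commands (the exact commands in the repo may differ slightly):

```bash
# Add the current user to the docker group and apply the new group membership
sudo usermod -aG docker $USER
newgrp docker
```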
- Install `docker-compose`.

    ```bash
    DOCKER_CONFIG=${DOCKER_CONFIG:-$HOME/.docker}
    mkdir -p $DOCKER_CONFIG/cli-plugins
    sudo curl -L https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m) -o $DOCKER_CONFIG/cli-plugins/docker-compose
    sudo chmod +x $DOCKER_CONFIG/cli-plugins/docker-compose
    docker compose version
    ```
- Run `FMBench` with a packaged or a custom config file. This step will also deploy the model on the EC2 instance. The `--write-bucket` parameter value is just a placeholder; an actual S3 bucket is not required. You could set the `--tmp-dir` flag to an Amazon EFS path instead of `/tmp` if using a shared path for storing config files and reports.
- Open a new Terminal and do a `tail` on `fmbench.log` to see a live log of the run.
- All metrics are stored in the `/tmp/fmbench-write` directory created automatically by the `fmbench` package. Once the run completes, all files are copied locally into a `results-*` folder as usual.