Benchmark models on EC2¶
You can use FMBench to benchmark models hosted on EC2. This can be done in one of two ways:

- Deploy the model on your EC2 instance independently of FMBench and then benchmark it through the Bring your own endpoint mode.
- Deploy the model on your EC2 instance through FMBench and then benchmark it.
The steps for deploying the model on your EC2 instance are described below.
👉 In this configuration both the model being benchmarked and FMBench are deployed on the same EC2 instance.
Create a new EC2 instance suitable for hosting an LMI as per the steps described here. Note that you will need to select the correct AMI based on your instance type; this is called out in the instructions.
The steps for benchmarking on different types of EC2 instances (GPU/CPU/Neuron) and different inference containers differ slightly. These are all described below.
Benchmarking options on EC2¶
- Benchmarking on an instance type with NVIDIA GPUs or AWS Chips
- Benchmarking on an instance type with NVIDIA GPU and the Triton inference server
- Benchmarking on an instance type with AWS Chips and the Triton inference server
- Benchmarking on a CPU instance type with AMD processors
- Benchmarking on a CPU instance type with Intel processors
- Benchmarking models on Ollama
Benchmarking on an instance type with NVIDIA GPUs or AWS Chips¶
- Connect to your instance using any of the options in EC2 (SSH/EC2 Connect) and run the following in the EC2 terminal. These commands install Miniconda on the instance, which is then used to create a new `conda` environment for FMBench.

  ```bash
  # Download the Miniconda installer for Linux
  wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
  # Run the Miniconda installer in batch mode (no manual intervention)
  bash Miniconda3-latest-Linux-x86_64.sh -b
  # Remove the installer script after installation
  rm -f Miniconda3-latest-Linux-x86_64.sh
  # Initialize conda for bash shell
  eval "$(/home/$USER/miniconda3/bin/conda shell.bash hook)"
  # Initialize conda, adding it to the shell
  conda init
  ```
- Install `docker-compose` (a sketch of the commands is shown below).
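  The commands below mirror the `docker-compose` installation used in the CPU sections of this guide; they are not instance-type specific, so they should work unchanged here:

  ```bash
  DOCKER_CONFIG=${DOCKER_CONFIG:-$HOME/.docker}
  mkdir -p $DOCKER_CONFIG/cli-plugins
  sudo curl -L https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m) -o $DOCKER_CONFIG/cli-plugins/docker-compose
  sudo chmod +x $DOCKER_CONFIG/cli-plugins/docker-compose
  docker compose version
  ```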
- Set up the `fmbench_python311` conda environment (see the sketch below).
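  The same commands used in the other sections of this guide apply here:

  ```bash
  # Create a new conda environment named 'fmbench_python311' with Python 3.11 and ipykernel
  conda create --name fmbench_python311 -y python=3.11 ipykernel
  # Activate the newly created conda environment
  source activate fmbench_python311
  # Upgrade pip and install the fmbench package
  pip install -U fmbench
  ```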
- Create the local directory structure needed for FMBench and copy all publicly available dependencies from the AWS S3 bucket for FMBench. This is done by running the `copy_s3_content.sh` script available as part of the FMBench repo. Replace `/tmp` in the command below with a different path if you want to store the config files and the FMBench generated data in a different directory.
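  A sketch of that command; the raw URL assumes `copy_s3_content.sh` sits at the top level of the FMBench GitHub repo (verify against the repo before running), and the last argument is the base directory:

  ```bash
  curl -s https://raw.githubusercontent.com/aws-samples/foundation-model-benchmarking-tool/main/copy_s3_content.sh | sh -s -- /tmp
  ```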
- To download the model files from Hugging Face, create a `hf_token.txt` file in the `/tmp/fmbench-read/scripts/` directory containing the Hugging Face token you would like to use. In the command below, replace `hf_yourtokenstring` with your Hugging Face token.
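  For example (the `hf_yourtokenstring` placeholder comes from this guide; substitute your own token):

  ```bash
  echo hf_yourtokenstring > /tmp/fmbench-read/scripts/hf_token.txt
  ```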
- Run FMBench with a packaged or a custom config file (a sketch of the command is shown below). This step will also deploy the model on the EC2 instance. The `--write-bucket` parameter value is just a placeholder and an actual S3 bucket is not required. Skip to the next step if benchmarking for AWS Chips. You could set the `--tmp-dir` flag to an EFS path instead of `/tmp` if using a shared path for storing config files and reports.
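  A representative invocation; the config file path is a placeholder for whichever packaged or custom config you want to use, and the `--local-mode yes` flag is an assumption about how FMBench is typically run when the model and FMBench share the same instance (check `fmbench --help` on your installed version):

  ```bash
  fmbench --config-file /tmp/fmbench-read/configs/<your-config-file>.yml \
    --local-mode yes \
    --write-bucket placeholder \
    --tmp-dir /tmp > fmbench.log 2>&1
  ```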
- For example, to run FMBench on a `llama3-8b-Instruct` model on an `inf2.48xlarge` instance, run the command below. The config file for this example can be viewed here.
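  The same invocation pattern, pointed at the `inf2.48xlarge` config; the file name below is hypothetical, substitute the actual config linked above:

  ```bash
  fmbench --config-file /tmp/fmbench-read/configs/<llama3-8b-instruct-inf2-48xl-config>.yml \
    --local-mode yes \
    --write-bucket placeholder \
    --tmp-dir /tmp > fmbench.log 2>&1
  ```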
- Open a new terminal and do a `tail` on `fmbench.log` to see a live log of the run (see below).
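  For example, from the directory where FMBench was launched:

  ```bash
  tail -f fmbench.log
  ```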
- All metrics are stored in the `/tmp/fmbench-write` directory created automatically by the `fmbench` package. Once the run completes, all files are copied locally in a `results-*` folder as usual.
Benchmarking on an instance type with NVIDIA GPU and the Triton inference server¶
- Follow the steps in the Benchmarking on an instance type with NVIDIA GPUs or AWS Chips section to install FMBench, but do not run any benchmarking tests yet.
- Once FMBench is installed, install the following additional dependencies for Triton.

  ```bash
  cd ~
  git clone https://github.com/triton-inference-server/tensorrtllm_backend.git --branch v0.12.0
  cd tensorrtllm_backend
  # Install git-lfs if needed
  sudo apt --fix-broken install
  sudo apt-get update && sudo apt-get install git-lfs -y --no-install-recommends
  git lfs install
  # Update the submodules
  git submodule update --init --recursive
  ```
- Now you are ready to run benchmarking with Triton. For example, to benchmark the `Llama3-8b` model on a `g5.12xlarge`, use the following command:
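  A sketch of that command; the config file name is hypothetical and stands in for the Triton config for Llama3-8b on `g5.12xlarge` that you intend to use, and the `--local-mode yes` flag is an assumption (check `fmbench --help`):

  ```bash
  fmbench --config-file /tmp/fmbench-read/configs/<llama3-8b-g5-12xl-triton-config>.yml \
    --local-mode yes \
    --write-bucket placeholder \
    --tmp-dir /tmp > fmbench.log 2>&1
  ```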
Benchmarking on an instance type with AWS Chips and the Triton inference server¶
As of 2024-09-26 this has been tested on a `trn1.32xlarge` instance.
- Connect to your instance using any of the options in EC2 (SSH/EC2 Connect) and run the following in the EC2 terminal. These commands install Docker, Git, and Miniconda on the instance; Miniconda is then used to create a new `conda` environment for FMBench. See instructions for downloading Anaconda here. (Note: your EC2 instance needs to have at least 200GB of disk space for this test.)

  ```bash
  # Install Docker and Git using the YUM package manager
  sudo yum install docker git -y
  # Start the Docker service
  sudo systemctl start docker
  # Download the Miniconda installer for Linux
  wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
  # Run the Miniconda installer in batch mode (no manual intervention)
  bash Miniconda3-latest-Linux-x86_64.sh -b
  # Remove the installer script after installation
  rm -f Miniconda3-latest-Linux-x86_64.sh
  # Initialize conda for bash shell
  eval "$(/home/$USER/miniconda3/bin/conda shell.bash hook)"
  # Initialize conda, adding it to the shell
  conda init
  ```
- Setup the `fmbench_python311` conda environment.

  ```bash
  # Create a new conda environment named 'fmbench_python311' with Python 3.11 and ipykernel
  conda create --name fmbench_python311 -y python=3.11 ipykernel
  # Activate the newly created conda environment
  source activate fmbench_python311
  # Upgrade pip and install the fmbench package
  pip install -U fmbench
  ```
- First we need to build the required docker image for `triton` and push it locally. To do this, curl the Triton `Dockerfile` and the script that builds and pushes the Triton image locally:

  ```bash
  # curl the docker file for triton
  curl -o ./Dockerfile_triton https://raw.githubusercontent.com/aws-samples/foundation-model-benchmarking-tool/main/src/fmbench/scripts/triton/Dockerfile_triton
  # curl the script that builds and pushes the triton image locally
  curl -o build_and_push_triton.sh https://raw.githubusercontent.com/aws-samples/foundation-model-benchmarking-tool/main/src/fmbench/scripts/triton/build_and_push_triton.sh
  # Make the triton build and push script executable, and run it
  chmod +x build_and_push_triton.sh
  ./build_and_push_triton.sh
  ```

  - Now wait until the docker image is saved locally and then follow the instructions below to start a benchmarking test.
- Create the local directory structure needed for FMBench and copy all publicly available dependencies from the AWS S3 bucket for FMBench. This is done by running the `copy_s3_content.sh` script available as part of the FMBench repo. Replace `/tmp` in the command below with a different path if you want to store the config files and the FMBench generated data in a different directory.
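  The same hedged sketch as in the GPU/AWS Chips section (the raw URL assumes `copy_s3_content.sh` sits at the top level of the FMBench repo; verify before running):

  ```bash
  curl -s https://raw.githubusercontent.com/aws-samples/foundation-model-benchmarking-tool/main/copy_s3_content.sh | sh -s -- /tmp
  ```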
- To download the model files from Hugging Face, create a `hf_token.txt` file in the `/tmp/fmbench-read/scripts/` directory containing the Hugging Face token you would like to use. In the command below, replace `hf_yourtokenstring` with your Hugging Face token.
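  For example (the `hf_yourtokenstring` placeholder comes from this guide; substitute your own token):

  ```bash
  echo hf_yourtokenstring > /tmp/fmbench-read/scripts/hf_token.txt
  ```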
- Run FMBench with a packaged or a custom config file. This step will also deploy the model on the EC2 instance. The `--write-bucket` parameter value is just a placeholder and an actual S3 bucket is not required. You could set the `--tmp-dir` flag to an EFS path instead of `/tmp` if using a shared path for storing config files and reports.
- Open a new terminal and do a `tail` on `fmbench.log` to see a live log of the run.
- All metrics are stored in the `/tmp/fmbench-write` directory created automatically by the `fmbench` package. Once the run completes, all files are copied locally in a `results-*` folder as usual.
- Note: To deploy a model on AWS Chips using Triton with a `djl` or `vllm` backend, the configuration file requires the `backend` and `container_params` parameters within the `inference_spec` dictionary. The backend options are `vllm`/`djl`, and `container_params` contains container-specific parameters used to deploy the model, for example `tensor parallel degree`, `n positions`, etc. Tensor parallel degree is a mandatory field. If no other parameters are provided, the container will choose the default parameters during deployment.

  ```yaml
  # Backend options: [djl, vllm]
  backend: djl
  # Container parameters that are used during model deployment
  container_params:
    # tp degree is a mandatory parameter
    tp_degree: 8
    amp: "f16"
    attention_layout: 'BSH'
    collectives_layout: 'BSH'
    context_length_estimate: 3072, 3584, 4096
    max_rolling_batch_size: 8
    model_loader: "tnx"
    model_loading_timeout: 2400
    n_positions: 4096
    output_formatter: "json"
    rolling_batch: "auto"
    rolling_batch_strategy: "continuous_batching"
    trust_remote_code: true
    # modify the serving properties to match your model and requirements
    serving.properties:
  ```
Benchmarking on a CPU instance type with AMD processors¶
As of 2024-08-27 this has been tested on an `m7a.16xlarge` instance.
- Connect to your instance using any of the options in EC2 (SSH/EC2 Connect) and run the following in the EC2 terminal. These commands install Docker, Git, and Miniconda on the instance; Miniconda is then used to create a new `conda` environment for FMBench. See instructions for downloading Anaconda here.

  ```bash
  # Install Docker and Git using the YUM package manager
  sudo yum install docker git -y
  # Start the Docker service
  sudo systemctl start docker
  # Download the Miniconda installer for Linux
  wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
  # Run the Miniconda installer in batch mode (no manual intervention)
  bash Miniconda3-latest-Linux-x86_64.sh -b
  # Remove the installer script after installation
  rm -f Miniconda3-latest-Linux-x86_64.sh
  # Initialize conda for bash shell
  eval "$(/home/$USER/miniconda3/bin/conda shell.bash hook)"
  # Initialize conda, adding it to the shell
  conda init
  ```
- Setup the `fmbench_python311` conda environment.

  ```bash
  # Create a new conda environment named 'fmbench_python311' with Python 3.11 and ipykernel
  conda create --name fmbench_python311 -y python=3.11 ipykernel
  # Activate the newly created conda environment
  source activate fmbench_python311
  # Upgrade pip and install the fmbench package
  pip install -U fmbench
  ```
- Build the `vllm` container for serving the model.
  - 👉 The `vllm` container we are building locally is going to be referenced in the FMBench config file.
  - The container being built is for CPU only (GPU support might be added in future).

  ```bash
  # Clone the vLLM project repository from GitHub
  git clone https://github.com/vllm-project/vllm.git
  # Change the directory to the cloned vLLM project
  cd vllm
  # Build a Docker image using the provided Dockerfile for CPU, with a shared memory size of 4GB
  sudo docker build -f Dockerfile.cpu -t vllm-cpu-env --shm-size=4g .
  ```
- Create the local directory structure needed for FMBench and copy all publicly available dependencies from the AWS S3 bucket for FMBench. This is done by running the `copy_s3_content.sh` script available as part of the FMBench repo. Replace `/tmp` in the command below with a different path if you want to store the config files and the FMBench generated data in a different directory.
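  The same hedged sketch as in the GPU/AWS Chips section (the raw URL assumes `copy_s3_content.sh` sits at the top level of the FMBench repo; verify before running):

  ```bash
  curl -s https://raw.githubusercontent.com/aws-samples/foundation-model-benchmarking-tool/main/copy_s3_content.sh | sh -s -- /tmp
  ```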
- To download the model files from Hugging Face, create a `hf_token.txt` file in the `/tmp/fmbench-read/scripts/` directory containing the Hugging Face token you would like to use. In the command below, replace `hf_yourtokenstring` with your Hugging Face token.
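  For example (the `hf_yourtokenstring` placeholder comes from this guide; substitute your own token):

  ```bash
  echo hf_yourtokenstring > /tmp/fmbench-read/scripts/hf_token.txt
  ```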
- Before running FMBench, add the current user to the docker group. Run the following commands to be able to run Docker without needing to use `sudo` each time.
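  One common way to do this (the group change takes effect in the current shell after `newgrp docker`, or after you log out and back in):

  ```bash
  # Add the current user to the docker group
  sudo usermod -a -G docker $USER
  # Start a new shell with the updated group membership
  newgrp docker
  ```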
- Install `docker-compose`.

  ```bash
  DOCKER_CONFIG=${DOCKER_CONFIG:-$HOME/.docker}
  mkdir -p $DOCKER_CONFIG/cli-plugins
  sudo curl -L https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m) -o $DOCKER_CONFIG/cli-plugins/docker-compose
  sudo chmod +x $DOCKER_CONFIG/cli-plugins/docker-compose
  docker compose version
  ```
- Run FMBench with a packaged or a custom config file. This step will also deploy the model on the EC2 instance. The `--write-bucket` parameter value is just a placeholder and an actual S3 bucket is not required. You could set the `--tmp-dir` flag to an EFS path instead of `/tmp` if using a shared path for storing config files and reports.
- Open a new terminal and do a `tail` on `fmbench.log` to see a live log of the run.
- All metrics are stored in the `/tmp/fmbench-write` directory created automatically by the `fmbench` package. Once the run completes, all files are copied locally in a `results-*` folder as usual.
Benchmarking on a CPU instance type with Intel processors¶
As of 2024-08-27 this has been tested on `c5.18xlarge` and `m5.16xlarge` instances.
- Connect to your instance using any of the options in EC2 (SSH/EC2 Connect) and run the following in the EC2 terminal. These commands install Docker, Git, and Miniconda on the instance; Miniconda is then used to create a new `conda` environment for FMBench. See instructions for downloading Anaconda here.

  ```bash
  # Install Docker and Git using the YUM package manager
  sudo yum install docker git -y
  # Start the Docker service
  sudo systemctl start docker
  # Download the Miniconda installer for Linux
  wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
  # Run the Miniconda installer in batch mode (no manual intervention)
  bash Miniconda3-latest-Linux-x86_64.sh -b
  # Remove the installer script after installation
  rm -f Miniconda3-latest-Linux-x86_64.sh
  # Initialize conda for bash shell
  eval "$(/home/$USER/miniconda3/bin/conda shell.bash hook)"
  # Initialize conda, adding it to the shell
  conda init
  ```
- Setup the `fmbench_python311` conda environment.

  ```bash
  # Create a new conda environment named 'fmbench_python311' with Python 3.11 and ipykernel
  conda create --name fmbench_python311 -y python=3.11 ipykernel
  # Activate the newly created conda environment
  source activate fmbench_python311
  # Upgrade pip and install the fmbench package
  pip install -U fmbench
  ```
- Build the `vllm` container for serving the model.
  - 👉 The `vllm` container we are building locally is going to be referenced in the FMBench config file.
  - The container being built is for CPU only (GPU support might be added in future).

  ```bash
  # Clone the vLLM project repository from GitHub
  git clone https://github.com/vllm-project/vllm.git
  # Change the directory to the cloned vLLM project
  cd vllm
  # Build a Docker image using the provided Dockerfile for CPU, with a shared memory size of 12GB
  sudo docker build -f Dockerfile.cpu -t vllm-cpu-env --shm-size=12g .
  ```
- Create the local directory structure needed for FMBench and copy all publicly available dependencies from the AWS S3 bucket for FMBench. This is done by running the `copy_s3_content.sh` script available as part of the FMBench repo. Replace `/tmp` in the command below with a different path if you want to store the config files and the FMBench generated data in a different directory.
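  The same hedged sketch as in the GPU/AWS Chips section (the raw URL assumes `copy_s3_content.sh` sits at the top level of the FMBench repo; verify before running):

  ```bash
  curl -s https://raw.githubusercontent.com/aws-samples/foundation-model-benchmarking-tool/main/copy_s3_content.sh | sh -s -- /tmp
  ```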
- To download the model files from Hugging Face, create a `hf_token.txt` file in the `/tmp/fmbench-read/scripts/` directory containing the Hugging Face token you would like to use. In the command below, replace `hf_yourtokenstring` with your Hugging Face token.
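  For example (the `hf_yourtokenstring` placeholder comes from this guide; substitute your own token):

  ```bash
  echo hf_yourtokenstring > /tmp/fmbench-read/scripts/hf_token.txt
  ```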
- Before running FMBench, add the current user to the docker group. Run the following commands to be able to run Docker without needing to use `sudo` each time.
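  One common way to do this (the group change takes effect in the current shell after `newgrp docker`, or after you log out and back in):

  ```bash
  # Add the current user to the docker group
  sudo usermod -a -G docker $USER
  # Start a new shell with the updated group membership
  newgrp docker
  ```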
- Install `docker-compose`.

  ```bash
  DOCKER_CONFIG=${DOCKER_CONFIG:-$HOME/.docker}
  mkdir -p $DOCKER_CONFIG/cli-plugins
  sudo curl -L https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m) -o $DOCKER_CONFIG/cli-plugins/docker-compose
  sudo chmod +x $DOCKER_CONFIG/cli-plugins/docker-compose
  docker compose version
  ```
- Run FMBench with a packaged or a custom config file. This step will also deploy the model on the EC2 instance. The `--write-bucket` parameter value is just a placeholder and an actual S3 bucket is not required. You could set the `--tmp-dir` flag to an EFS path instead of `/tmp` if using a shared path for storing config files and reports.
- Open a new terminal and do a `tail` on `fmbench.log` to see a live log of the run.
- All metrics are stored in the `/tmp/fmbench-write` directory created automatically by the `fmbench` package. Once the run completes, all files are copied locally in a `results-*` folder as usual.
Benchmarking models on Ollama¶
As of 2024-10-24 this has been tested on `g6e.2xlarge` with `llama 3.1 8b`.
- Install Ollama (a sketch is shown below).
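  A sketch using the official installer script from ollama.com (assumes the instance has outbound internet access):

  ```bash
  curl -fsSL https://ollama.com/install.sh | sh
  ```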
- Pull the required model (a sketch is shown below).
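  A sketch for the `llama 3.1 8b` model mentioned in the note above; the exact tag is an assumption, adjust it to the model you want to benchmark:

  ```bash
  ollama pull llama3.1:8b
  ```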
- Serve the model (a sketch is shown below). This might produce the error message `Error: accepts 0 arg(s), received 1`, but you can safely ignore this error.
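  A sketch, reusing the tag from the previous step; passing a model name to `ollama serve` is likely what triggers the harmless `Error: accepts 0 arg(s), received 1` message, since the Ollama service installed earlier is typically already running and will serve the pulled model on demand:

  ```bash
  ollama serve llama3.1:8b
  ```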
- Create the local directory structure needed for FMBench and copy all publicly available dependencies from the AWS S3 bucket for FMBench. This is done by running the `copy_s3_content.sh` script available as part of the FMBench repo. Replace `/tmp` in the command below with a different path if you want to store the config files and the FMBench generated data in a different directory.
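  The same hedged sketch as in the earlier sections (the raw URL assumes `copy_s3_content.sh` sits at the top level of the FMBench repo; verify before running):

  ```bash
  curl -s https://raw.githubusercontent.com/aws-samples/foundation-model-benchmarking-tool/main/copy_s3_content.sh | sh -s -- /tmp
  ```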
- Run FMBench with a packaged or a custom config file. The `--write-bucket` parameter value is just a placeholder and an actual S3 bucket is not required. You could set the `--tmp-dir` flag to an EFS path instead of `/tmp` if using a shared path for storing config files and reports.