# Benchmark models on EC2
You can use `FMBench` to benchmark models hosted on EC2. This can be done in one of two ways:

- Deploy the model on your EC2 instance independently of `FMBench` and then benchmark it through the Bring your own endpoint mode.
- Deploy the model on your EC2 instance through `FMBench` and then benchmark it.
The steps for deploying the model on your EC2 instance are described below.
👉 In this configuration, both the model being benchmarked and `FMBench` are deployed on the same EC2 instance.
Create a new EC2 instance suitable for hosting an LMI as per the steps described here. Note that you will need to select the correct AMI based on your instance type; this is called out in the instructions.
The steps for benchmarking on different types of EC2 instances (GPU/CPU/Neuron) and different inference containers differ slightly. These are all described below.
## Benchmarking options on EC2
- Benchmarking on an instance type with NVIDIA GPUs or AWS Chips
- Benchmarking on a CPU instance type with AMD processors
- Benchmarking on a CPU instance type with Intel processors
### Benchmarking on an instance type with NVIDIA GPUs or AWS Chips
- Connect to your instance using any of the options in EC2 (SSH/EC2 Connect) and run the following in the EC2 terminal. These commands install Miniconda on the instance, which is then used to create a new `conda` environment for `FMBench`. See instructions for downloading Anaconda here.

    ```bash
    # Download the Miniconda installer for Linux
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    # Run the Miniconda installer in batch mode (no manual intervention)
    bash Miniconda3-latest-Linux-x86_64.sh -b
    # Remove the installer script after installation
    rm -f Miniconda3-latest-Linux-x86_64.sh
    # Initialize conda for bash shell
    eval "$(/home/$USER/miniconda3/bin/conda shell.bash hook)"
    # Initialize conda, adding it to the shell
    conda init
    ```
- Install `docker-compose`.
- Setup the `fmbench_python311` conda environment.
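These two steps use the same commands that appear in the CPU sections later in this guide; a sketch, assuming the `fmbench_python311` environment name used throughout:

```shell
# Install the docker-compose CLI plugin
DOCKER_CONFIG=${DOCKER_CONFIG:-$HOME/.docker}
mkdir -p $DOCKER_CONFIG/cli-plugins
sudo curl -L https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m) -o $DOCKER_CONFIG/cli-plugins/docker-compose
sudo chmod +x $DOCKER_CONFIG/cli-plugins/docker-compose
docker compose version

# Create a new conda environment named 'fmbench_python311' with Python 3.11 and ipykernel
conda create --name fmbench_python311 -y python=3.11 ipykernel
# Activate the newly created conda environment
source activate fmbench_python311
# Upgrade pip and install the fmbench package
pip install -U fmbench
```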
- Create the local directory structure needed for `FMBench` and copy all publicly available dependencies from the AWS S3 bucket for `FMBench`. This is done by running the `copy_s3_content.sh` script available as part of the `FMBench` repo. Replace `/tmp` in the command below with a different path if you want to store the config files and the `FMBench` generated data in a different directory.
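For reference, a minimal sketch of the directory layout under the default `/tmp` base path; `fmbench-read/scripts` and `fmbench-write` are referenced elsewhere in this guide, while the `configs` subdirectory name is an assumption (the `copy_s3_content.sh` script populates the real content):

```shell
# Sketch only: copy_s3_content.sh creates and populates these directories;
# the configs subdirectory name is an assumption
mkdir -p /tmp/fmbench-read/configs /tmp/fmbench-read/scripts /tmp/fmbench-write
ls /tmp/fmbench-read
```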
- To download the model files from Hugging Face, create a `hf_token.txt` file in the `/tmp/fmbench-read/scripts/` directory containing the Hugging Face token you would like to use. In the command below replace the `hf_yourtokenstring` with your Hugging Face token.
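For example, a one-liner that writes the token file (`hf_yourtokenstring` stands in for your actual token):

```shell
# Create the scripts directory if copy_s3_content.sh has not created it yet,
# then write the token file (replace hf_yourtokenstring with your actual token)
mkdir -p /tmp/fmbench-read/scripts
echo hf_yourtokenstring > /tmp/fmbench-read/scripts/hf_token.txt
```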
- Run `FMBench` with a packaged or a custom config file. This step will also deploy the model on the EC2 instance. The `--write-bucket` parameter value is just a placeholder; an actual S3 bucket is not required. Skip to the next step if benchmarking for AWS Chips. You could set the `--tmp-dir` flag to an Amazon EFS path instead of `/tmp` if using a shared path for storing config files and reports.
- For example, to run `FMBench` with a `llama3-8b-Instruct` model on an `inf2.48xlarge` instance, run the command below. The config file for this example can be viewed here.
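A sketch of what such an invocation typically looks like; the config file path here is hypothetical (use the packaged config referenced above), and the `--config-file`/`--local-mode` flags are assumptions based on FMBench's CLI that you should verify against the repo README:

```shell
# Hypothetical config path: substitute the packaged config file referenced above.
# --write-bucket is just a placeholder; no S3 bucket is actually required.
fmbench --config-file /tmp/fmbench-read/configs/llama3-8b-instruct-inf2-48xl.yml \
  --local-mode yes --write-bucket placeholder --tmp-dir /tmp > fmbench.log 2>&1
```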
- Open a new terminal and do a `tail` on `fmbench.log` to see a live log of the run.
- All metrics are stored in the `/tmp/fmbench-write` directory created automatically by the `fmbench` package. Once the run completes, all files are copied locally into a `results-*` folder as usual.
### Benchmarking on a CPU instance type with AMD processors

As of 2024-08-27 this has been tested on an `m7a.16xlarge` instance.
- Connect to your instance using any of the options in EC2 (SSH/EC2 Connect) and run the following in the EC2 terminal. These commands install Docker, Git, and Miniconda on the instance; Miniconda is then used to create a new `conda` environment for `FMBench`. See instructions for downloading Anaconda here.

    ```bash
    # Install Docker and Git using the YUM package manager
    sudo yum install docker git -y
    # Start the Docker service
    sudo systemctl start docker
    # Download the Miniconda installer for Linux
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    # Run the Miniconda installer in batch mode (no manual intervention)
    bash Miniconda3-latest-Linux-x86_64.sh -b
    # Remove the installer script after installation
    rm -f Miniconda3-latest-Linux-x86_64.sh
    # Initialize conda for bash shell
    eval "$(/home/$USER/miniconda3/bin/conda shell.bash hook)"
    # Initialize conda, adding it to the shell
    conda init
    ```
- Setup the `fmbench_python311` conda environment.

    ```bash
    # Create a new conda environment named 'fmbench_python311' with Python 3.11 and ipykernel
    conda create --name fmbench_python311 -y python=3.11 ipykernel
    # Activate the newly created conda environment
    source activate fmbench_python311
    # Upgrade pip and install the fmbench package
    pip install -U fmbench
    ```
- Build the `vllm` container for serving the model.

    - 👉 The `vllm` container we are building locally is going to be referenced in the `FMBench` config file.
    - The container being built is for CPU only (GPU support might be added in the future).

    ```bash
    # Clone the vLLM project repository from GitHub
    git clone https://github.com/vllm-project/vllm.git
    # Change the directory to the cloned vLLM project
    cd vllm
    # Build a Docker image using the provided Dockerfile for CPU, with a shared memory size of 4GB
    sudo docker build -f Dockerfile.cpu -t vllm-cpu-env --shm-size=4g .
    ```
- Create the local directory structure needed for `FMBench` and copy all publicly available dependencies from the AWS S3 bucket for `FMBench`. This is done by running the `copy_s3_content.sh` script available as part of the `FMBench` repo. Replace `/tmp` in the command below with a different path if you want to store the config files and the `FMBench` generated data in a different directory.
- To download the model files from Hugging Face, create a `hf_token.txt` file in the `/tmp/fmbench-read/scripts/` directory containing the Hugging Face token you would like to use. In the command below replace the `hf_yourtokenstring` with your Hugging Face token.
- Before running FMBench, add the current user to the `docker` group. Run the following commands to run Docker without needing to use `sudo` each time.
- Install `docker-compose`.

    ```bash
    DOCKER_CONFIG=${DOCKER_CONFIG:-$HOME/.docker}
    mkdir -p $DOCKER_CONFIG/cli-plugins
    sudo curl -L https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m) -o $DOCKER_CONFIG/cli-plugins/docker-compose
    sudo chmod +x $DOCKER_CONFIG/cli-plugins/docker-compose
    docker compose version
    ```
- Run `FMBench` with a packaged or a custom config file. This step will also deploy the model on the EC2 instance. The `--write-bucket` parameter value is just a placeholder; an actual S3 bucket is not required. You could set the `--tmp-dir` flag to an Amazon EFS path instead of `/tmp` if using a shared path for storing config files and reports.
- Open a new terminal and do a `tail` on `fmbench.log` to see a live log of the run.
- All metrics are stored in the `/tmp/fmbench-write` directory created automatically by the `fmbench` package. Once the run completes, all files are copied locally into a `results-*` folder as usual.
### Benchmarking on a CPU instance type with Intel processors

As of 2024-08-27 this has been tested on `c5.18xlarge` and `m5.16xlarge` instances.
- Connect to your instance using any of the options in EC2 (SSH/EC2 Connect) and run the following in the EC2 terminal. These commands install Docker, Git, and Miniconda on the instance; Miniconda is then used to create a new `conda` environment for `FMBench`. See instructions for downloading Anaconda here.

    ```bash
    # Install Docker and Git using the YUM package manager
    sudo yum install docker git -y
    # Start the Docker service
    sudo systemctl start docker
    # Download the Miniconda installer for Linux
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    # Run the Miniconda installer in batch mode (no manual intervention)
    bash Miniconda3-latest-Linux-x86_64.sh -b
    # Remove the installer script after installation
    rm -f Miniconda3-latest-Linux-x86_64.sh
    # Initialize conda for bash shell
    eval "$(/home/$USER/miniconda3/bin/conda shell.bash hook)"
    # Initialize conda, adding it to the shell
    conda init
    ```
- Setup the `fmbench_python311` conda environment.

    ```bash
    # Create a new conda environment named 'fmbench_python311' with Python 3.11 and ipykernel
    conda create --name fmbench_python311 -y python=3.11 ipykernel
    # Activate the newly created conda environment
    source activate fmbench_python311
    # Upgrade pip and install the fmbench package
    pip install -U fmbench
    ```
- Build the `vllm` container for serving the model.

    - 👉 The `vllm` container we are building locally is going to be referenced in the `FMBench` config file.
    - The container being built is for CPU only (GPU support might be added in the future).

    ```bash
    # Clone the vLLM project repository from GitHub
    git clone https://github.com/vllm-project/vllm.git
    # Change the directory to the cloned vLLM project
    cd vllm
    # Build a Docker image using the provided Dockerfile for CPU, with a shared memory size of 12GB
    sudo docker build -f Dockerfile.cpu -t vllm-cpu-env --shm-size=12g .
    ```
- Create the local directory structure needed for `FMBench` and copy all publicly available dependencies from the AWS S3 bucket for `FMBench`. This is done by running the `copy_s3_content.sh` script available as part of the `FMBench` repo. Replace `/tmp` in the command below with a different path if you want to store the config files and the `FMBench` generated data in a different directory.
- To download the model files from Hugging Face, create a `hf_token.txt` file in the `/tmp/fmbench-read/scripts/` directory containing the Hugging Face token you would like to use. In the command below replace the `hf_yourtokenstring` with your Hugging Face token.
- Before running FMBench, add the current user to the `docker` group. Run the following commands to run Docker without needing to use `sudo` each time.
- Install `docker-compose`.

    ```bash
    DOCKER_CONFIG=${DOCKER_CONFIG:-$HOME/.docker}
    mkdir -p $DOCKER_CONFIG/cli-plugins
    sudo curl -L https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m) -o $DOCKER_CONFIG/cli-plugins/docker-compose
    sudo chmod +x $DOCKER_CONFIG/cli-plugins/docker-compose
    docker compose version
    ```
- Run `FMBench` with a packaged or a custom config file. This step will also deploy the model on the EC2 instance. The `--write-bucket` parameter value is just a placeholder; an actual S3 bucket is not required. You could set the `--tmp-dir` flag to an Amazon EFS path instead of `/tmp` if using a shared path for storing config files and reports.
- Open a new terminal and do a `tail` on `fmbench.log` to see a live log of the run.
- All metrics are stored in the `/tmp/fmbench-write` directory created automatically by the `fmbench` package. Once the run completes, all files are copied locally into a `results-*` folder as usual.
## Benchmarking the Triton inference server on GPU instances
Here are the steps for using the Triton inference server and benchmarking model performance. The steps presented here are based on several publicly available resources such as Deploying Hugging Face Llama2-7b Model in Triton, TensorRT-LLM README, End to end flow to run Llama-7b and others.
- Install the TensorRT-LLM backend.

    ```bash
    # Clone the TensorRT-LLM backend repository at the v0.12.0 tag
    git clone https://github.com/triton-inference-server/tensorrtllm_backend.git --branch v0.12.0
    cd tensorrtllm_backend
    # Install git-lfs if needed
    apt-get update && apt-get install git-lfs -y --no-install-recommends
    git lfs install
    # Update the submodules
    git submodule update --init --recursive
    ```
- That's it, everything else is encapsulated within the `FMBench` code. `FMBench` will copy the relevant scripts into the `${HOME}/deploy_on_triton` directory and run `docker compose up -d` to deploy the model. Here is an example command for benchmarking the `Llama3-8b-instruct` model served using Triton.