Benchmark models on EC2

You can use FMBench to benchmark models hosted on EC2. This can be done in one of two ways:

  • Deploy the model on your EC2 instance independently of FMBench and then benchmark it through the Bring your own endpoint mode.
  • Deploy the model on your EC2 instance through FMBench and then benchmark it.

The steps for deploying the model on your EC2 instance are described below.

👉 In this configuration both the model being benchmarked and FMBench are deployed on the same EC2 instance.

Create a new EC2 instance suitable for hosting an LMI as per the steps described here. Note that you will need to select the correct AMI based on your instance type; this is called out in the instructions.

The steps for benchmarking on different types of EC2 instances (GPU/CPU/Neuron) and different inference containers differ slightly. These are all described below.

Benchmarking options on EC2

Benchmarking on an instance type with NVIDIA GPUs or AWS Chips

  1. Connect to your instance using any of the options in EC2 (SSH/EC2 Connect) and run the following in the EC2 terminal. These commands install Miniconda on the instance, which is then used to create a new conda environment for FMBench.

    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    bash Miniconda3-latest-Linux-x86_64.sh -b  # Run the Miniconda installer in batch mode (no manual intervention)
    rm -f Miniconda3-latest-Linux-x86_64.sh    # Remove the installer script after installation
    eval "$(/home/$USER/miniconda3/bin/conda shell.bash hook)" # Initialize conda for bash shell
    conda init  # Initialize conda, adding it to the shell  
    
  2. Install docker-compose.

    sudo apt-get update                            # Refresh the package index
    sudo apt-get install --reinstall docker.io -y  # Install (or reinstall) the Docker engine
    sudo apt-get install -y docker-compose         # Install docker-compose
    docker compose version                         # Verify the installation
    
  3. Set up the fmbench_python311 conda environment.

    conda create --name fmbench_python311 -y python=3.11 ipykernel
    source activate fmbench_python311;
    pip install -U fmbench
    
  4. Create the local directory structure needed for FMBench and copy all publicly available dependencies from the AWS S3 bucket for FMBench. This is done by running the copy_s3_content.sh script available as part of the FMBench repo. Replace /tmp in the command below with a different path if you want to store the config files and the FMBench generated data in a different directory.

    curl -s https://raw.githubusercontent.com/aws-samples/foundation-model-benchmarking-tool/main/copy_s3_content.sh | sh -s -- /tmp
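
    A quick optional check (assuming /tmp was used above): the configs and scripts directories referenced in the next steps should now exist locally.

      ls /tmp/fmbench-read/configs /tmp/fmbench-read/scripts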
    
  5. To download the model files from HuggingFace, create an hf_token.txt file in the /tmp/fmbench-read/scripts/ directory containing the Hugging Face token you would like to use. In the command below replace hf_yourtokenstring with your Hugging Face token.

    echo hf_yourtokenstring > /tmp/fmbench-read/scripts/hf_token.txt
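
    You can optionally confirm that the token file is in place without printing its contents:

      test -s /tmp/fmbench-read/scripts/hf_token.txt && echo "hf_token.txt is in place"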
    
  6. Run FMBench with a packaged or a custom config file. This step will also deploy the model on the EC2 instance. The --write-bucket parameter value is just a placeholder; an actual S3 bucket is not required. Skip to the next step if you are benchmarking on AWS Chips. You could set the --tmp-dir flag to an EFS path instead of /tmp if you are using a shared path for storing config files and reports.

    fmbench --config-file /tmp/fmbench-read/configs/llama3/8b/config-ec2-llama3-8b.yml --local-mode yes --write-bucket placeholder --tmp-dir /tmp > fmbench.log 2>&1
    
  7. For example, to run FMBench for a llama3-8b-Instruct model on an inf2.48xlarge instance (AWS Chips), run the command below. The config file for this example can be viewed here.

    fmbench --config-file /tmp/fmbench-read/configs/llama3/8b/config-ec2-llama3-8b-inf2-48xl.yml --local-mode yes --write-bucket placeholder --tmp-dir /tmp > fmbench.log 2>&1
    
  8. Open a new Terminal and do a tail on fmbench.log to see a live log of the run.

    tail -f fmbench.log
    
  9. All metrics are stored in the /tmp/fmbench-write directory created automatically by the fmbench package. Once the run completes all files are copied locally in a results-* folder as usual.
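
    For example, once the run has finished you should see a results folder in the directory from which fmbench was launched (the exact folder name depends on the specific run):

      ls -d results-*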

Benchmarking on an instance type with NVIDIA GPU and the Triton inference server

  1. Follow steps in the Benchmarking on an instance type with NVIDIA GPUs or AWS Chips section to install FMBench but do not run any benchmarking tests yet.

  2. Once FMBench is installed, install the following additional dependencies for Triton.

    cd ~
    git clone https://github.com/triton-inference-server/tensorrtllm_backend.git  --branch v0.12.0
    cd tensorrtllm_backend
    # Install git-lfs if needed
    sudo apt --fix-broken install
    sudo apt-get update && sudo apt-get install git-lfs -y --no-install-recommends
    git lfs install
    # Update the submodules
    git submodule update --init --recursive
    
  3. Now you are ready to run benchmarking with Triton. For example, to benchmark the Llama3-8b model on a g5.12xlarge instance, use the following command:

    fmbench --config-file /tmp/fmbench-read/configs/llama3/8b/config-llama3-8b-g5.12xl-tp-2-mc-max-triton-ec2.yml --local-mode yes --write-bucket placeholder --tmp-dir /tmp > fmbench.log 2>&1
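
  4. As in the previous section, open a new Terminal and do a tail on fmbench.log to see a live log of the run; all metrics are stored in the /tmp/fmbench-write directory and copied to a local results-* folder once the run completes.

    tail -f fmbench.log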
    

Benchmarking on an instance type with AWS Chips and the Triton inference server

As of 2024-09-26, this has been tested on a trn1.32xlarge instance.

  1. Connect to your instance using any of the options in EC2 (SSH/EC2 Connect) and run the following in the EC2 terminal. These commands install Miniconda on the instance, which is then used to create a new conda environment for FMBench. See instructions for downloading Anaconda here. (Note: your EC2 instance needs to have at least 200 GB of disk space for this test.)

    # Install Docker and Git using the YUM package manager
    sudo yum install docker git -y
    
    # Start the Docker service
    sudo systemctl start docker
    
    # Download the Miniconda installer for Linux
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    bash Miniconda3-latest-Linux-x86_64.sh -b  # Run the Miniconda installer in batch mode (no manual intervention)
    rm -f Miniconda3-latest-Linux-x86_64.sh    # Remove the installer script after installation
    eval "$(/home/$USER/miniconda3/bin/conda shell.bash hook)" # Initialize conda for bash shell
    conda init  # Initialize conda, adding it to the shell
    
  2. Set up the fmbench_python311 conda environment.

    # Create a new conda environment named 'fmbench_python311' with Python 3.11 and ipykernel
    conda create --name fmbench_python311 -y python=3.11 ipykernel
    
    # Activate the newly created conda environment
    source activate fmbench_python311
    
    # Upgrade pip and install the fmbench package
    pip install -U fmbench
    
  3. First, build the required Docker image for Triton and make it available locally. To do this, curl the Triton Dockerfile and the script that builds and pushes the Triton image locally:

    # curl the docker file for triton
    curl -o ./Dockerfile_triton https://raw.githubusercontent.com/aws-samples/foundation-model-benchmarking-tool/main/src/fmbench/scripts/triton/Dockerfile_triton
    
    # curl the script that builds and pushes the triton image locally
    curl -o build_and_push_triton.sh https://raw.githubusercontent.com/aws-samples/foundation-model-benchmarking-tool/main/src/fmbench/scripts/triton/build_and_push_triton.sh
    
    # Make the triton build and push script executable, and run it
    chmod +x build_and_push_triton.sh
    ./build_and_push_triton.sh
    
    - Now wait until the docker image is saved locally and then follow the instructions below to start a benchmarking test.
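
    - A quick way to confirm that the image is available locally is to list the local Docker images; the exact repository name and tag are set by the build_and_push_triton.sh script, so they may differ from what you expect.

      docker images   # prefix with sudo if you get a permission error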

  4. Create the local directory structure needed for FMBench and copy all publicly available dependencies from the AWS S3 bucket for FMBench. This is done by running the copy_s3_content.sh script available as part of the FMBench repo. Replace /tmp in the command below with a different path if you want to store the config files and the FMBench generated data in a different directory.

    curl -s https://raw.githubusercontent.com/aws-samples/foundation-model-benchmarking-tool/main/copy_s3_content.sh | sh -s -- /tmp
    
  5. To download the model files from HuggingFace, create an hf_token.txt file in the /tmp/fmbench-read/scripts/ directory containing the Hugging Face token you would like to use. In the command below replace hf_yourtokenstring with your Hugging Face token.

    echo hf_yourtokenstring > /tmp/fmbench-read/scripts/hf_token.txt
    
  6. Run FMBench with a packaged or a custom config file. This step will also deploy the model on the EC2 instance. The --write-bucket parameter value is just a placeholder; an actual S3 bucket is not required. You could set the --tmp-dir flag to an EFS path instead of /tmp if you are using a shared path for storing config files and reports.

    fmbench --config-file /tmp/fmbench-read/configs/llama3/8b/config-llama3-8b-trn1-32xlarge-triton-djl.yml --local-mode yes --write-bucket placeholder --tmp-dir /tmp > fmbench.log 2>&1
    
  7. Open a new Terminal and do a tail on fmbench.log to see a live log of the run.

    tail -f fmbench.log
    
  8. All metrics are stored in the /tmp/fmbench-write directory created automatically by the fmbench package. Once the run completes all files are copied locally in a results-* folder as usual.

  9. Note: To deploy a model on AWS Chips using Triton with the djl or vllm backend, the configuration file requires the backend and container_params parameters within the inference_spec dictionary. The backend options are djl and vllm, and container_params contains container-specific parameters used to deploy the model, for example the tensor parallel degree, n_positions, etc. The tensor parallel degree (tp_degree) is a mandatory field. If no other parameters are provided, the container chooses default parameters during deployment.

      # Backend options: [djl, vllm]
      backend: djl
    
      # Container parameters that are used during model deployment
      container_params:
        # tp degree is a mandatory parameter
        tp_degree: 8
        amp: "f16"
        attention_layout: 'BSH'
        collectives_layout: 'BSH'
        context_length_estimate: 3072, 3584, 4096
        max_rolling_batch_size: 8
        model_loader: "tnx"
        model_loading_timeout: 2400
        n_positions: 4096
        output_formatter: "json"
        rolling_batch: "auto"
        rolling_batch_strategy: "continuous_batching"
        trust_remote_code: true
        # modify the serving properties to match your model and requirements
        serving.properties:
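
      For reference, here is a minimal sketch of the same snippet with the vllm backend. Apart from tp_degree and n_positions, which appear in the note and example above, the exact set of container_params accepted by the vllm backend is not shown here, so treat this as illustrative and refer to the packaged config files for a complete working example.

      # Backend options: [djl, vllm]
      backend: vllm

      # Container parameters that are used during model deployment
      container_params:
        # tp degree is a mandatory parameter
        tp_degree: 8
        n_positions: 4096
        # any parameters not specified here fall back to the
        # container defaults during deployment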
    

Benchmarking on a CPU instance type with AMD processors

As of 2024-08-27, this has been tested on an m7a.16xlarge instance.

  1. Connect to your instance using any of the options in EC2 (SSH/EC2 Connect) and run the following in the EC2 terminal. These commands install Miniconda on the instance, which is then used to create a new conda environment for FMBench. See instructions for downloading Anaconda here.

    # Install Docker and Git using the YUM package manager
    sudo yum install docker git -y
    
    # Start the Docker service
    sudo systemctl start docker
    
    # Download the Miniconda installer for Linux
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    bash Miniconda3-latest-Linux-x86_64.sh -b  # Run the Miniconda installer in batch mode (no manual intervention)
    rm -f Miniconda3-latest-Linux-x86_64.sh    # Remove the installer script after installation
    eval "$(/home/$USER/miniconda3/bin/conda shell.bash hook)" # Initialize conda for bash shell
    conda init  # Initialize conda, adding it to the shell
    
  2. Set up the fmbench_python311 conda environment.

    # Create a new conda environment named 'fmbench_python311' with Python 3.11 and ipykernel
    conda create --name fmbench_python311 -y python=3.11 ipykernel
    
    # Activate the newly created conda environment
    source activate fmbench_python311
    
    # Upgrade pip and install the fmbench package
    pip install -U fmbench
    
  3. Build the vllm container for serving the model.

    1. 👉 The vllm container we are building locally is going to be referenced in the FMBench config file.

    2. The container being built is for CPU only (GPU support might be added in the future).

      # Clone the vLLM project repository from GitHub
      git clone https://github.com/vllm-project/vllm.git
      
      # Change the directory to the cloned vLLM project
      cd vllm
      
      # Build a Docker image using the provided Dockerfile for CPU, with a shared memory size of 4GB
      sudo docker build -f Dockerfile.cpu -t vllm-cpu-env --shm-size=4g .
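
    Once the build finishes, you can optionally confirm that the vllm-cpu-env image (the tag used in the build command above and referenced by the FMBench config file) is available locally:

      sudo docker images | grep vllm-cpu-env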
      
  4. Create the local directory structure needed for FMBench and copy all publicly available dependencies from the AWS S3 bucket for FMBench. This is done by running the copy_s3_content.sh script available as part of the FMBench repo. Replace /tmp in the command below with a different path if you want to store the config files and the FMBench generated data in a different directory.

    curl -s https://raw.githubusercontent.com/aws-samples/foundation-model-benchmarking-tool/main/copy_s3_content.sh | sh -s -- /tmp
    
  5. To download the model files from HuggingFace, create an hf_token.txt file in the /tmp/fmbench-read/scripts/ directory containing the Hugging Face token you would like to use. In the command below replace hf_yourtokenstring with your Hugging Face token.

    echo hf_yourtokenstring > /tmp/fmbench-read/scripts/hf_token.txt
    
  6. Before running FMBench, add the current user to the docker group. Run the following commands to run Docker without needing to use sudo each time.

    sudo usermod -a -G docker $USER
    newgrp docker
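
    To verify that the group change took effect, the following should now list containers (possibly an empty list) without a permission error and without sudo:

      docker ps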
    
  7. Install docker-compose.

    DOCKER_CONFIG=${DOCKER_CONFIG:-$HOME/.docker}
    mkdir -p $DOCKER_CONFIG/cli-plugins
    sudo curl -L https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m) -o $DOCKER_CONFIG/cli-plugins/docker-compose
    sudo chmod +x $DOCKER_CONFIG/cli-plugins/docker-compose
    docker compose version
    
  8. Run FMBench with a packaged or a custom config file. This step will also deploy the model on the EC2 instance. The --write-bucket parameter value is just a placeholder; an actual S3 bucket is not required. You could set the --tmp-dir flag to an EFS path instead of /tmp if you are using a shared path for storing config files and reports.

    fmbench --config-file /tmp/fmbench-read/configs/llama3/8b/config-ec2-llama3-8b-m7a-16xlarge.yml --local-mode yes --write-bucket placeholder --tmp-dir /tmp > fmbench.log 2>&1
    
  9. Open a new Terminal and do a tail on fmbench.log to see a live log of the run.

    tail -f fmbench.log
    
  10. All metrics are stored in the /tmp/fmbench-write directory created automatically by the fmbench package. Once the run completes all files are copied locally in a results-* folder as usual.

Benchmarking on a CPU instance type with Intel processors

As of 2024-08-27, this has been tested on c5.18xlarge and m5.16xlarge instances.

  1. Connect to your instance using any of the options in EC2 (SSH/EC2 Connect) and run the following in the EC2 terminal. These commands install Miniconda on the instance, which is then used to create a new conda environment for FMBench. See instructions for downloading Anaconda here.

    # Install Docker and Git using the YUM package manager
    sudo yum install docker git -y
    
    # Start the Docker service
    sudo systemctl start docker
    
    # Download the Miniconda installer for Linux
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    bash Miniconda3-latest-Linux-x86_64.sh -b # Run the Miniconda installer in batch mode (no manual intervention)
    rm -f Miniconda3-latest-Linux-x86_64.sh    # Remove the installer script after installation
    eval "$(/home/$USER/miniconda3/bin/conda shell.bash hook)" # Initialize conda for bash shell
    conda init  # Initialize conda, adding it to the shell
    
  2. Set up the fmbench_python311 conda environment.

    # Create a new conda environment named 'fmbench_python311' with Python 3.11 and ipykernel
    conda create --name fmbench_python311 -y python=3.11 ipykernel
    
    # Activate the newly created conda environment
    source activate fmbench_python311
    
    # Upgrade pip and install the fmbench package
    pip install -U fmbench
    
  3. Build the vllm container for serving the model.

    1. 👉 The vllm container we are building locally is going to be referenced in the FMBench config file.

    2. The container being built is for CPU only (GPU support might be added in the future).

      # Clone the vLLM project repository from GitHub
      git clone https://github.com/vllm-project/vllm.git
      
      # Change the directory to the cloned vLLM project
      cd vllm
      
      # Build a Docker image using the provided Dockerfile for CPU, with a shared memory size of 12GB
      sudo docker build -f Dockerfile.cpu -t vllm-cpu-env --shm-size=12g .
      
  4. Create the local directory structure needed for FMBench and copy all publicly available dependencies from the AWS S3 bucket for FMBench. This is done by running the copy_s3_content.sh script available as part of the FMBench repo. Replace /tmp in the command below with a different path if you want to store the config files and the FMBench generated data in a different directory.

    curl -s https://raw.githubusercontent.com/aws-samples/foundation-model-benchmarking-tool/main/copy_s3_content.sh | sh -s -- /tmp
    
  5. To download the model files from HuggingFace, create an hf_token.txt file in the /tmp/fmbench-read/scripts/ directory containing the Hugging Face token you would like to use. In the command below replace hf_yourtokenstring with your Hugging Face token.

    echo hf_yourtokenstring > /tmp/fmbench-read/scripts/hf_token.txt
    
  6. Before running FMBench, add the current user to the docker group. Run the following commands to run Docker without needing to use sudo each time.

    sudo usermod -a -G docker $USER
    newgrp docker
    
  7. Install docker-compose.

    DOCKER_CONFIG=${DOCKER_CONFIG:-$HOME/.docker}
    mkdir -p $DOCKER_CONFIG/cli-plugins
    sudo curl -L https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m) -o $DOCKER_CONFIG/cli-plugins/docker-compose
    sudo chmod +x $DOCKER_CONFIG/cli-plugins/docker-compose
    docker compose version
    
  8. Run FMBench with a packaged or a custom config file. This step will also deploy the model on the EC2 instance. The --write-bucket parameter value is just a placeholder; an actual S3 bucket is not required. You could set the --tmp-dir flag to an EFS path instead of /tmp if you are using a shared path for storing config files and reports.

    fmbench --config-file /tmp/fmbench-read/configs/llama3/8b/config-ec2-llama3-8b-c5-18xlarge.yml --local-mode yes --write-bucket placeholder --tmp-dir /tmp > fmbench.log 2>&1
    
  9. Open a new Terminal and do a tail on fmbench.log to see a live log of the run.

    tail -f fmbench.log
    
  10. All metrics are stored in the /tmp/fmbench-write directory created automatically by the fmbench package. Once the run completes all files are copied locally in a results-* folder as usual.

Benchmarking models on Ollama

As of 2024-10-24, this has been tested on a g6e.2xlarge instance with llama3.1:8b.

  1. Install Ollama.

    curl -fsSL https://ollama.com/install.sh | sh
    
  2. Pull the model required.

    ollama pull llama3.1:8b
    
  3. Serve the model. This might produce the error message Error: accepts 0 arg(s), received 1, which you can safely ignore.

    ollama serve llama3.1:8b
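
    Optionally, confirm that Ollama is up and that the model has been pulled by querying the local API; port 11434 is the Ollama default (adjust if you have changed it):

      curl http://localhost:11434/api/tags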
    
  4. Create the local directory structure needed for FMBench and copy all publicly available dependencies from the AWS S3 bucket for FMBench. This is done by running the copy_s3_content.sh script available as part of the FMBench repo. Replace /tmp in the command below with a different path if you want to store the config files and the FMBench generated data in a different directory.

    curl -s https://raw.githubusercontent.com/aws-samples/foundation-model-benchmarking-tool/main/copy_s3_content.sh | sh -s -- /tmp
    
  5. Run FMBench with a packaged or a custom config file. The --write-bucket parameter value is just a placeholder; an actual S3 bucket is not required. You could set the --tmp-dir flag to an EFS path instead of /tmp if you are using a shared path for storing config files and reports.

    fmbench --config-file /tmp/fmbench-read/configs/llama3.1/8b/config-ec2-llama3-1-8b-g6e-2xlarge-byoe-ollama.yml --local-mode yes --write-bucket placeholder --tmp-dir /tmp > fmbench.log 2>&1
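
  6. As with the other benchmarking options described above, open a new Terminal and do a tail on fmbench.log to see a live log of the run; all metrics are stored in the /tmp/fmbench-write directory and copied to a local results-* folder once the run completes.

    tail -f fmbench.log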