Quickstart - run `FMBench` on SageMaker Notebook¶

Each FMBench run works with a configuration file that contains the information about the model, the deployment steps, and the tests to run. A typical FMBench workflow involves either directly using an already provided config file from the configs folder in the FMBench GitHub repo or editing an already provided config file as per your own requirements (say you want to try benchmarking on a different instance type, or a different inference container etc.).

👉 A simple config file with key parameters annotated is included in this repo, see config-llama2-7b-g5-quick.yml. This file benchmarks performance of Llama2-7b on an ml.g5.xlarge instance and an ml.g5.2xlarge instance. You can use this config file as it is for this Quickstart.
Launch the AWS CloudFormation template included in this repository using one of the buttons from the table below. The CloudFormation template creates the following resources within your AWS account: Amazon S3 buckets, Amazon IAM role and an Amazon SageMaker Notebook with this repository cloned. A read S3 bucket is created which contains all the files (configuration files, datasets) required to run FMBench and a write S3 bucket is created which will hold the metrics and reports generated by FMBench. The CloudFormation stack takes about 5-minutes to create.

AWS Region	Link
us-east-1 (N. Virginia)
us-west-2 (Oregon)
us-gov-west-1 (GovCloud West)

Once the CloudFormation stack is created, navigate to SageMaker Notebooks and open the fmbench-notebook.

On the fmbench-notebook open a Terminal and run the following commands.

conda create --name fmbench_python311 -y python=3.11 ipykernel
source activate fmbench_python311;
pip install -U fmbench

Now you are ready to fmbench with the following command line. We will use a sample config file placed in the S3 bucket by the CloudFormation stack for a quick first run.
1. We benchmark performance for the Llama2-7b model on a ml.g5.xlarge and a ml.g5.2xlarge instance type, using the huggingface-pytorch-tgi-inference inference container. This test would take about 30 minutes to complete and cost about $0.20.
2. It uses a simple relationship of 750 words equals 1000 tokens, to get a more accurate representation of token counts use the Llama2 tokenizer (instructions are provided in the next section). It is strongly recommended that for more accurate results on token throughput you use a tokenizer specific to the model you are testing rather than the default tokenizer. See instructions provided later in this document on how to use a custom tokenizer.
```
account=`aws sts get-caller-identity | jq .Account | tr -d '"'`
region=`aws configure get region`
fmbench --config-file s3://sagemaker-fmbench-read-${region}-${account}/configs/llama2/7b/config-llama2-7b-g5-quick.yml > fmbench.log 2>&1
```
3. Open another terminal window and do a tail -f on the fmbench.log file to see all the traces being generated at runtime.
```
tail -f fmbench.log
```
4. 👉 For streaming support on SageMaker and Bedrock checkout these config files:
  1. config-llama3-8b-g5-streaming.yml
  2. config-bedrock-llama3-streaming.yml
The generated reports and metrics are available in the sagemaker-fmbench-write-<replace_w_your_aws_region>-<replace_w_your_aws_account_id> bucket. The metrics and report files are also downloaded locally and in the results directory (created by FMBench) and the benchmarking report is available as a markdown file called report.md in the results directory. You can view the rendered Markdown report in the SageMaker notebook itself or download the metrics and report files to your machine for offline analysis.

`FMBench` on GovCloud¶

No special steps are required for running FMBench on GovCloud. The CloudFormation link for us-gov-west-1 has been provided in the section above.

Not all models available via Bedrock or other services may be available in GovCloud. The following commands show how to run FMBench to benchmark the Amazon Titan Text Express model in the GovCloud. See the Amazon Bedrock GovCloud page for more details.

account=`aws sts get-caller-identity | jq .Account | tr -d '"'`
region=`aws configure get region`
fmbench --config-file s3://sagemaker-fmbench-read-${region}-${account}/configs/bedrock/config-bedrock-titan-text-express.yml > fmbench.log 2>&1

Quickstart - run FMBench on SageMaker Notebook¶

FMBench on GovCloud¶

Quickstart - run `FMBench` on SageMaker Notebook¶

`FMBench` on GovCloud¶