Bring your own dataset

By default, FMBench uses the LongBench dataset for testing models, but you are not limited to it. You can test with other datasets available on HuggingFace or with your own datasets, as long as you convert them to the JSON lines format. The bring_your_own_dataset notebook provides a code sample that converts any HuggingFace dataset into JSON lines format and uploads it to the S3 bucket used by FMBench. Follow the steps described in the notebook to bring your own dataset for testing with FMBench.
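As a rough illustration of the conversion step, the sketch below writes a sequence of records to a JSON lines file (one JSON object per line) and optionally uploads it to S3. It is not the code from the bring_your_own_dataset notebook. The function names `write_jsonl` and `upload_to_s3`, the field names `input` and `context`, the file name, and the bucket and key placeholders are all assumptions made for this example; use the names your FMBench configuration and prompt template actually expect.

```python
import json
from pathlib import Path
from typing import Any, Iterable, Mapping


def write_jsonl(records: Iterable[Mapping[str, Any]], path: str) -> int:
    """Write one JSON object per line and return the number of lines written."""
    count = 0
    with Path(path).open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(dict(record), ensure_ascii=False) + "\n")
            count += 1
    return count


def upload_to_s3(path: str, bucket: str, key: str) -> None:
    """Upload a local file to S3 (needs boto3 and valid AWS credentials)."""
    import boto3  # imported lazily so the conversion step works without it

    boto3.client("s3").upload_file(path, bucket, key)


if __name__ == "__main__":
    # Stand-in rows with hypothetical field names. With the `datasets`
    # library installed, the rows of a loaded HuggingFace dataset split
    # (which iterate as dicts) could be passed to write_jsonl instead.
    rows = [
        {"input": "What is FMBench?", "context": "FMBench benchmarks models."},
        {"input": "Which format is used?", "context": "JSON lines."},
    ]
    written = write_jsonl(rows, "my_dataset.jsonl")
    print(f"wrote {written} records")

    # Placeholder bucket and key; replace with the bucket FMBench reads from.
    # upload_to_s3("my_dataset.jsonl", "<your-bucket>", "<prefix>/my_dataset.jsonl")
```

The upload call is left commented out because it depends on your AWS credentials and on the bucket configured for your FMBench deployment.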

Support for Open-Orca dataset

FMBench supports the Open-Orca dataset and provides corresponding prompts for Llama3, Llama2 and Mistral. See:

  1. bring_your_own_dataset.ipynb
  2. prompt templates
  3. Llama3 config file with OpenOrca