Bulk test questions against your deployed system

Why?

If users (prompt engineers, developers) want to test their prompts with a long list of questions (50-100+), without automation, it would take too long, especially if one wants to test multiple versions of prompts, with various model kwargs (i.e.: temperature) or search kwargs (filters).

With this feature, users can feed in their chat configuration and questions, and run an automated test.

Preparation

Chat settings

Once you're done with your prompt setup, you can export it via the UI, or create your own.

Use the Actions menu feature on the UI
Copy your settings and save is as a JSON file on your developer machine

Questions

The CLI expects an array of strings saved in a JSON file:

// test-questions.json
[
 "Question 1",
 "Question 2",
 ...
 "Question N"
]

Additional parameters

You'll need to provide the following cloud resources for the CLI:

Userpool used to manage users (retrieves automatically the list, you just need to choose)
The Lambda Url Endpoint to trigger (AWS Console > CloudFormation > Stacks > Galileo-InferenceEngine* > Outputs > InferenceEngineLambdaUrl)
Create a chat on the UI manually, and use the newly created chat's ID

Running the test

To run the automated test, use the following command:

pnpm run galileo-cli invoke chat-bulk

Note: The test will directly invoke the chat endpoint in the deployed solution, so first you will need to authenticate yourself with your user.