Deploy Galileo

Deploying Galileo

Before you start the deployment, make sure:

Docker is running, with sufficient virtual disk space.
Your AWS credentials are set up and available in the shell.
You have reviewed the EULA before requesting access to Bedrock models.

Using the CLI

Tip

We recommend using the CLI for individuals, developer account, trials, and demos.

To deploy Galileo:

Open a CLI terminal and navigate to the Galileo directory.
Run these commands:
```
export AWS_REGION={current aws region you're in}
export AWS_PROFILE=default
pnpm bootstrap-account
pnpm galileo-cli deploy
```
This will display a guided CLI workflow for input.

Note

If you get a (node:12100) [EACCES] Error: spawn galileo-cli EACCES message, ignore it.

The following options are displayed for selecting a foundation model.
To navigate these prompts:
- The circle filled green is the currently selected option. All other options are unselected.
- The underlined and blue text option is the currently highlighted option.
- Use the keyboard arrows up and down to move the currently highlighted option.
- Use the spacebar to select/deselect the currently highlighted option.
- To submit your final answer, press Enter.
Select the following options using the CLI:
- AWS Profile: default
- AWS Region: (press Enter, the filled in region should be the correct region. If not, fill in the region code)
- Administrator email address: (enter your email address)
- Administrator username: admin
- Deploy main application stack?: Y
- Choose the foundation models to suq:qpport: (unselect all then press enter)
- Foundation model region?: us-west-2
- Enable Bedrock?: Y
- Bedrock Region: us-west-2
- Bedrock model ids: Anthropic Claude (anthropic.claude-v2)
- Bedrock endpoint url (optional): (press Enter, should be blank)
- Choose the default foundation model: bedrock::anthropic.claude-v2
- Press Enter for the rest of the prompts.

Your terminal displays this information:

    ____                            __             _  _  _                            _  _
   / __ \   __ _ __      __ ___    / /__ _   __ _ | |(_)| |  ___   ___           ___ | |(_)
  / / _` | / _` |\ \ /\ / // __|  / // _` | / _` || || || | / _ \ / _ \  _____  / __|| || |
 | | (_| || (_| | \ V  V / \__ \ / /| (_| || (_| || || || ||  __/| (_) ||_____|| (__ | || |
  \ \__,_| \__,_|  \_/\_/  |___//_/  \__, | \__,_||_||_||_| \___| \___/         \___||_||_|
   \____/                            |___/
✔ Config file name? … config.json
✔ Application Name (stack/resource naming) … Galileo
✔ AWS Profile … default
✔ AWS Region (app) … us-west-2
✔ Administrator email address Enter email address to automatically create Cognito admin user, otherwise leave blank
 … someone@somewhere.com
✔ Administrator username … yourusername
✔ Choose the foundation models to support ›
✔ Foundation model region? … us-west-2
✔ Enable Bedrock? … yes
✔ Bedrock region … us-west-2
✔ Loading available Bedrock models
✔ Bedrock model ids › Anthropic Claude (anthropic.claude-v2)
✔ Bedrock endpoint url (optional) …
✔ Choose the default foundation model › bedrock::anthropic.claude-v2
✔ Embedding model
Enter the model id to use for embeddings, supports any AutoML model

Example: sentence-transformers/all-mpnet-base-v2, intfloat/multilingual-e5-large, sentence-transformers/all-MiniLM-L6-v2
 … sentence-transformers/all-mpnet-base-v2
✔ Embedding Vector Size
Enter the vector size for the chosen embedding model
 … 768
✔ Embedding model instance type
Enable autoscaling the embedding instance capacity based

Recommend "ml.g4dn.xlarge" for smaller datasets, and "ml.g4dn.2xlarge" for larger datasets
 … ml.g4dn.xlarge
✔ Embedding model max capacity (autoscaling)
Enable autoscaling the embedding instance capacity based

Ensure adequate Service Quota limit for SageMaker > "ml.g4dn.xlarge for endpoint usage"
 … 1
✔ Indexing Pipeline instance type
Instance type used for processing dataset files and indexing to vector store
 … ml.t3.large
✔ Indexing Pipeline max containers
Number of containers used for indexing files to vector store

Ensure adequate Service Quota limit for SageMaker > "ml.t3.large for processing job"
 … 5
✔ Create vector store "index"?
If enabled, will create a database index for the data to improve search over large datasets

Recommended for very large datasets
 … no
✔ Deploy sample dataset? ›
✔ Enable tooling in dev stage (SageMaker Studio, PgAdmin)? ›
Synthesizing project repository...
? [CDK DEPLOY] Execute the following command in 615092085770?
cdk deploy --require-approval never --region us-west-2 --profile default -c "configPath=config.json" Dev/Galileo
 … yes

Info

It takes about 40 minutes to build and deploy everything. While we wait, continue to the next page to have a look at how this project was built and how to extend it.

Updating configuration settings

The CLI will generate an application configuration file in demo/infra/config.json, which will persist your configuration. You can modify this file and redeploy to change the configuration, or use the CLI.

pnpm run galileo-cli --help for cli help info

For more details on CLI operations, refer to the CLI page.

Cross-Region deployments

Galileo CLI allows you to deploy your LLM stack and application stack into different Regions.

Using a CI/CD pipeline

Tip

We recommend using the CI/CD pipeline deployment method for live services and for shared team accounts.

Note: Make sure your AWS credentials in your shell are correct.

Create a CodeCommit repository in your target account/Region name "galileo".
Push this git repository to the mainline branch
Run pnpm run deploy:pipeline

Deploying manually

Tip

We recommend using a manual deployment method only if you need to have full control and want to modify the application.

pnpm install
pnpm build

cd demo/infra
pnpm exec cdk deploy --app cdk.out --require-approval never Dev/Galileo
pnpm exec cdk deploy --app cdk.out --require-approval never Dev/Galileo-SampleDataset # (optional)

What is deployed?

As part of the deployment, the following services are deployed in your AWS account:

A pre-built conversational UI that enables contextual conversation with memory,
An optimized embeddings vector store based on RDS Postgres and pgvector,
A scalable and elastic data ingestion pipeline,
A low latency text embeddings inference engine,
Retrieval augmented generation (RAG) features, and
A choice of open source large language models.