Inference script
We use a multi-model endpoint hosted on SageMaker and provide an inference script that processes requests and sends responses back.
The supported models are currently hardcoded in the inference script (lib/rag-engines/sagemaker-rag-models/model/inference.py):
py
embeddings_models = [
"intfloat/multilingual-e5-large",
"sentence-transformers/all-MiniLM-L6-v2",
]
cross_encoder_models = ["cross-encoder/ms-marco-MiniLM-L-12-v2"]
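Because the model lists are hardcoded, a request that names any other model cannot be served. The sketch below shows one way a handler could reject such requests early. The `SUPPORTED` mapping and the `validate_request` helper are hypothetical names used for illustration and are not taken from inference.py; the "type" and "model" keys follow the request format used by the API.

```python
embeddings_models = [
    "intfloat/multilingual-e5-large",
    "sentence-transformers/all-MiniLM-L6-v2",
]
cross_encoder_models = ["cross-encoder/ms-marco-MiniLM-L-12-v2"]

# Hypothetical lookup table (an assumption, not part of inference.py):
# maps each request type to the models allowed for it.
SUPPORTED = {
    "embeddings": embeddings_models,
    "cross-encoder": cross_encoder_models,
}


def validate_request(request: dict) -> None:
    """Raise ValueError if the request type or model is not supported."""
    request_type = request.get("type")
    if request_type not in SUPPORTED:
        raise ValueError(f"Unsupported request type: {request_type!r}")

    model = request.get("model")
    if model not in SUPPORTED[request_type]:
        raise ValueError(
            f"Model {model!r} is not supported for type {request_type!r}"
        )
```

Keeping the check in one place means that adding a model only requires extending the corresponding list.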
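The API takes a JSON request body whose "type" field selects the task and whose "model" field names one of the supported models: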
json
{
"type": "embeddings",
"model": "intfloat/multilingual-e5-large",
"input": "I love Berlin"
}
json
{
"type": "cross-encoder",
"model": "cross-encoder/ms-marco-MiniLM-L-12-v2",
"input": "I love Berlin",
"passages": ["I love Paris", "I love London"]
}
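As a rough illustration of how a client might build these request bodies and send them to the endpoint, here is a minimal sketch. The helper names are hypothetical, and the endpoint name is a placeholder. The sketch assumes the endpoint accepts `application/json` and returns a JSON body; the response format is not specified here, so the final parsing step is an assumption. The client is expected to be a boto3 `sagemaker-runtime` client, passed in by the caller.

```python
import json


def build_embeddings_request(model: str, text: str) -> dict:
    """Build the JSON body for an embeddings request."""
    return {"type": "embeddings", "model": model, "input": text}


def build_cross_encoder_request(model: str, query: str, passages: list) -> dict:
    """Build the JSON body for a cross-encoder (re-ranking) request."""
    return {
        "type": "cross-encoder",
        "model": model,
        "input": query,
        "passages": passages,
    }


def invoke(client, endpoint_name: str, payload: dict):
    """Send a payload to the endpoint and decode the response.

    `client` is expected to be boto3.client("sagemaker-runtime").
    Assumption: the response body is JSON.
    """
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    return json.loads(response["Body"].read())
```

A call could then look like `invoke(client, "my-endpoint", build_embeddings_request("intfloat/multilingual-e5-large", "I love Berlin"))`, where `"my-endpoint"` stands in for the deployed endpoint name.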