/embeddings
Quick Start
from litellm import embedding
import os
os.environ['OPENAI_API_KEY'] = ""
response = embedding(model='text-embedding-ada-002', input=["good morning from litellm"])
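The response follows the OpenAI embeddings format (see "Output from litellm.embedding()" below), so the vector can be read straight off the result; a minimal sketch:
print(response.data[0]["embedding"][:5])  # first 5 dimensions of the vector
print(response.usage)                     # token usage for the call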
Proxy Usage
NOTE
For vertex_ai, set your service account credentials first:
export GOOGLE_APPLICATION_CREDENTIALS="absolute/path/to/service_account.json"
Add model to config
model_list:
  - model_name: textembedding-gecko
    litellm_params:
      model: vertex_ai/textembedding-gecko

general_settings:
  master_key: sk-1234
Start proxy
litellm --config /path/to/config.yaml
# RUNNING on http://0.0.0.0:4000
Test
- Curl
- OpenAI (python)
- Langchain Embeddings
curl --location 'http://0.0.0.0:4000/embeddings' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{"input": ["Academia.edu uses"], "model": "textembedding-gecko", "encoding_format": "base64"}'
from openai import OpenAI
client = OpenAI(
api_key="sk-1234",
base_url="http://0.0.0.0:4000"
)
client.embeddings.create(
model="textembedding-gecko",
input="The food was delicious and the waiter...",
encoding_format="float"
)
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="textembedding-gecko", openai_api_base="http://0.0.0.0:4000", openai_api_key="sk-1234")
text = "This is a test document."
query_result = embeddings.embed_query(text)
print(f"VERTEX AI EMBEDDINGS")
print(query_result[:5])
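langchain's embed_documents batches several texts through the same route; a short sketch reusing the embeddings client above:
docs = ["This is a test document.", "This is another test document."]
doc_results = embeddings.embed_documents(docs)
print(len(doc_results), len(doc_results[0]))  # number of vectors, vector dimensionality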
Image Embeddings
For models that support image embeddings, you can pass in a base64 encoded image string to the input param.
- SDK
- PROXY
from litellm import embedding
import os
# set your api key
os.environ["COHERE_API_KEY"] = ""
response = embedding(model="cohere/embed-english-v3.0", input=["<base64 encoded image>"])
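To build that input from an image on disk, standard-library base64 encoding is enough; a minimal sketch (image.png is a hypothetical local file):
import base64

with open("image.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = embedding(model="cohere/embed-english-v3.0", input=[image_b64])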
- Setup config.yaml
model_list:
  - model_name: cohere-embed
    litellm_params:
      model: cohere/embed-english-v3.0
      api_key: os.environ/COHERE_API_KEY
- Start proxy
litellm --config /path/to/config.yaml
# RUNNING on http://0.0.0.0:4000
- Test it!
curl -X POST 'http://0.0.0.0:4000/v1/embeddings' \
-H 'Authorization: Bearer sk-54d77cd67b9febbb' \
-H 'Content-Type: application/json' \
-d '{
"model": "cohere/embed-english-v3.0",
"input": ["<base64 encoded image>"]
}'
Input Params for litellm.embedding()
Any non-OpenAI params will be treated as provider-specific params and sent in the request body as kwargs to the provider (see Provider-specific Params below).
Required Fields
- model: string - ID of the model to use. Example: model='text-embedding-ada-002'
- input: string or array - Input text to embed, encoded as a string or array of tokens. To embed multiple inputs in a single request, pass an array of strings or array of token arrays. The input must not exceed the max input tokens for the model (8192 tokens for text-embedding-ada-002), cannot be an empty string, and any array must be 2048 dimensions or less. Example: input=["good morning from litellm"]
Optional LiteLLM Fields
- user: string (optional) - A unique identifier representing your end-user
- dimensions: integer (optional) - The number of dimensions the resulting output embeddings should have. Only supported in OpenAI/Azure text-embedding-3 and later models.
- encoding_format: string (optional) - The format to return the embeddings in. Can be either "float" or "base64". Defaults to encoding_format="float"
- timeout: integer (optional) - The maximum time, in seconds, to wait for the API to respond. Defaults to 600 seconds (10 minutes).
- api_base: string (optional) - The API endpoint you want to call the model with
- api_version: string (optional) - (Azure-specific) the API version for the call
- api_key: string (optional) - The API key to authenticate and authorize requests. If not provided, the default API key is used.
- api_type: string (optional) - The type of API to use.
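A sketch combining several of these optional fields in one call (values are illustrative):
from litellm import embedding

response = embedding(
    model="text-embedding-3-small",
    input=["good morning from litellm"],
    dimensions=256,           # shrink the output vector; text-embedding-3 models only
    encoding_format="float",
    timeout=30,               # fail faster than the 600s default
    user="user-1234",         # illustrative end-user id
)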
Output from litellm.embedding()
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.0022326677571982145,
        0.010749882087111473,
        ...
      ]
    }
  ],
  "model": "text-embedding-ada-002-v2",
  "usage": {
    "prompt_tokens": 10,
    "total_tokens": 10
  }
}
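Reading fields off the returned object; a short sketch (litellm response objects generally allow both attribute and dict-style access):
vector = response.data[0]["embedding"]
print(len(vector))                  # embedding dimensionality
print(response.usage.total_tokens)  # tokens billed for the request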
OpenAI Embedding Models
Usage
from litellm import embedding
import os
os.environ['OPENAI_API_KEY'] = ""
response = embedding(
model="text-embedding-3-small",
input=["good morning from litellm", "this is another item"],
metadata={"anything": "good day"},
dimensions=5 # Only supported in text-embedding-3 and later models.
)
| Model Name | Function Call | Required OS Variables |
|---|---|---|
| text-embedding-3-small | embedding('text-embedding-3-small', input) | os.environ['OPENAI_API_KEY'] |
| text-embedding-3-large | embedding('text-embedding-3-large', input) | os.environ['OPENAI_API_KEY'] |
| text-embedding-ada-002 | embedding('text-embedding-ada-002', input) | os.environ['OPENAI_API_KEY'] |
OpenAI Compatible Embedding Models
Use this for calling /embeddings endpoints on OpenAI-compatible servers, e.g. https://github.com/xorbitsai/inference
Note: add the openai/ prefix to the model so litellm knows to route the request using the OpenAI format.
Usage
from litellm import embedding
response = embedding(
    model="openai/<your-llm-name>",   # add `openai/` prefix so litellm routes via the OpenAI format
    api_base="http://0.0.0.0:4000/",  # set API base of your custom OpenAI endpoint
    input=["good morning from litellm"]
)
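If your compatible server also checks an API key, it can be passed per request (api_key is one of the standard optional params listed above); the value below is a placeholder:
response = embedding(
    model="openai/<your-llm-name>",
    api_base="http://0.0.0.0:4000/",
    api_key="sk-my-server-key",  # placeholder - whatever your server expects
    input=["good morning from litellm"]
)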
Bedrock Embedding
API keys
These can be set as environment variables or passed as params to litellm.embedding()
import os
os.environ["AWS_ACCESS_KEY_ID"] = "" # Access key
os.environ["AWS_SECRET_ACCESS_KEY"] = "" # Secret access key
os.environ["AWS_REGION_NAME"] = "" # us-east-1, us-east-2, us-west-1, us-west-2
Usage
from litellm import embedding
response = embedding(
model="amazon.titan-embed-text-v1",
input=["good morning from litellm"],
)
print(response)
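As noted above, the credentials can also be passed directly to the call instead of via the environment; a sketch with placeholder values:
from litellm import embedding

response = embedding(
    model="amazon.titan-embed-text-v1",
    input=["good morning from litellm"],
    aws_access_key_id="",      # placeholder - your access key
    aws_secret_access_key="",  # placeholder - your secret key
    aws_region_name="us-east-1",
)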
| Model Name | Function Call |
|---|---|
| Titan Embeddings - G1 | embedding(model="amazon.titan-embed-text-v1", input=input) |
| Cohere Embeddings - English | embedding(model="cohere.embed-english-v3", input=input) |
| Cohere Embeddings - Multilingual | embedding(model="cohere.embed-multilingual-v3", input=input) |
Cohere Embedding Models
https://docs.cohere.com/reference/embed
Usage
from litellm import embedding
import os

os.environ["COHERE_API_KEY"] = "cohere key"
# cohere call
response = embedding(
model="embed-english-v3.0",
input=["good morning from litellm", "this is another item"],
input_type="search_document" # optional param for v3 llms
)
| Model Name | Function Call |
|---|---|
| embed-english-v3.0 | embedding(model="embed-english-v3.0", input=["good morning from litellm", "this is another item"]) |
| embed-english-light-v3.0 | embedding(model="embed-english-light-v3.0", input=["good morning from litellm", "this is another item"]) |
| embed-multilingual-v3.0 | embedding(model="embed-multilingual-v3.0", input=["good morning from litellm", "this is another item"]) |
| embed-multilingual-light-v3.0 | embedding(model="embed-multilingual-light-v3.0", input=["good morning from litellm", "this is another item"]) |
| embed-english-v2.0 | embedding(model="embed-english-v2.0", input=["good morning from litellm", "this is another item"]) |
| embed-english-light-v2.0 | embedding(model="embed-english-light-v2.0", input=["good morning from litellm", "this is another item"]) |
| embed-multilingual-v2.0 | embedding(model="embed-multilingual-v2.0", input=["good morning from litellm", "this is another item"]) |
NVIDIA NIM Embedding Models
API keys
These can be set as environment variables or passed as params to litellm.embedding()
import os
os.environ["NVIDIA_NIM_API_KEY"] = "" # api key
os.environ["NVIDIA_NIM_API_BASE"] = "" # nim endpoint url
Usage
from litellm import embedding
import os
os.environ['NVIDIA_NIM_API_KEY'] = ""
response = embedding(
model='nvidia_nim/<model_name>',
input=["good morning from litellm"],
input_type="query"
)
input_type Parameter for Embedding Models
Certain embedding models, such as nvidia/embed-qa-4 and the E5 family, operate in dual modes—one for indexing documents (passages) and another for querying. To maintain high retrieval accuracy, it's essential to specify how the input text is being used by setting the input_type parameter correctly.
Usage
Set the input_type parameter to one of the following values:
"passage"– for embedding content during indexing (e.g., documents)."query"– for embedding content during retrieval (e.g., user queries).
Warning: Incorrect usage of
input_typecan lead to a significant drop in retrieval performance.
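A sketch of the two modes side by side, using a model from the table below:
from litellm import embedding

# index time: embed documents as passages
doc_vectors = embedding(
    model="nvidia_nim/nvidia/embed-qa-4",
    input=["LiteLLM provides a unified interface to embedding models."],
    input_type="passage",
)

# query time: embed the user's question as a query
query_vectors = embedding(
    model="nvidia_nim/nvidia/embed-qa-4",
    input=["What does LiteLLM provide?"],
    input_type="query",
)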
All models listed here are supported:
| Model Name | Function Call |
|---|---|
| NV-Embed-QA | embedding(model="nvidia_nim/NV-Embed-QA", input) |
| nvidia/nv-embed-v1 | embedding(model="nvidia_nim/nvidia/nv-embed-v1", input) |
| nvidia/nv-embedqa-mistral-7b-v2 | embedding(model="nvidia_nim/nvidia/nv-embedqa-mistral-7b-v2", input) |
| nvidia/nv-embedqa-e5-v5 | embedding(model="nvidia_nim/nvidia/nv-embedqa-e5-v5", input) |
| nvidia/embed-qa-4 | embedding(model="nvidia_nim/nvidia/embed-qa-4", input) |
| nvidia/llama-3.2-nv-embedqa-1b-v1 | embedding(model="nvidia_nim/nvidia/llama-3.2-nv-embedqa-1b-v1", input) |
| nvidia/llama-3.2-nv-embedqa-1b-v2 | embedding(model="nvidia_nim/nvidia/llama-3.2-nv-embedqa-1b-v2", input) |
| snowflake/arctic-embed-l | embedding(model="nvidia_nim/snowflake/arctic-embed-l", input) |
| baai/bge-m3 | embedding(model="nvidia_nim/baai/bge-m3", input) |
HuggingFace Embedding Models
LiteLLM supports all Feature-Extraction + Sentence Similarity Embedding models: https://huggingface.co/models?pipeline_tag=feature-extraction
Usage
from litellm import embedding
import os
os.environ['HUGGINGFACE_API_KEY'] = ""
response = embedding(
model='huggingface/microsoft/codebert-base',
input=["good morning from litellm"]
)
Usage - Set input_type
LiteLLM infers the input type (feature-extraction or sentence-similarity) by making a GET request to the api base.
Override this by setting the input_type yourself.
from litellm import embedding
import os
os.environ['HUGGINGFACE_API_KEY'] = ""
response = embedding(
model='huggingface/microsoft/codebert-base',
input=["good morning from litellm", "you are a good bot"],
api_base = "https://p69xlsj6rpno5drq.us-east-1.aws.endpoints.huggingface.cloud",
input_type="sentence-similarity"
)
Usage - Custom API Base
from litellm import embedding
import os
os.environ['HUGGINGFACE_API_KEY'] = ""
response = embedding(
model='huggingface/microsoft/codebert-base',
input=["good morning from litellm"],
api_base = "https://p69xlsj6rpno5drq.us-east-1.aws.endpoints.huggingface.cloud"
)
| Model Name | Function Call | Required OS Variables |
|---|---|---|
| microsoft/codebert-base | embedding('huggingface/microsoft/codebert-base', input=input) | os.environ['HUGGINGFACE_API_KEY'] |
| BAAI/bge-large-zh | embedding('huggingface/BAAI/bge-large-zh', input=input) | os.environ['HUGGINGFACE_API_KEY'] |
| any-hf-embedding-model | embedding('huggingface/hf-embedding-model', input=input) | os.environ['HUGGINGFACE_API_KEY'] |
Mistral AI Embedding Models
All models listed here https://docs.mistral.ai/platform/endpoints are supported
Usage
from litellm import embedding
import os
os.environ['MISTRAL_API_KEY'] = ""
response = embedding(
model="mistral/mistral-embed",
input=["good morning from litellm"],
)
print(response)
| Model Name | Function Call |
|---|---|
| mistral-embed | embedding(model="mistral/mistral-embed", input) |
Gemini AI Embedding Models
API keys
These can be set as environment variables or passed as params to litellm.embedding()
import os
os.environ["GEMINI_API_KEY"] = ""
Usage - Embedding
from litellm import embedding
response = embedding(
model="gemini/text-embedding-004",
input=["good morning from litellm"],
)
print(response)
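The key can also be passed per call rather than via the environment (api_key is a standard optional param; the value is a placeholder):
response = embedding(
    model="gemini/text-embedding-004",
    input=["good morning from litellm"],
    api_key="",  # placeholder - your Gemini API key
)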
All models listed here are supported:
| Model Name | Function Call |
|---|---|
| text-embedding-004 | embedding(model="gemini/text-embedding-004", input) |
Vertex AI Embedding Models
Usage - Embedding
import litellm
from litellm import embedding
litellm.vertex_project = "hardy-device-38811" # Your Project ID
litellm.vertex_location = "us-central1" # proj location
response = embedding(
model="vertex_ai/textembedding-gecko",
input=["good morning from litellm"],
)
print(response)
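The project and location can also be supplied per call; a hedged sketch, assuming per-call vertex_project / vertex_location kwargs mirroring the module attributes above:
from litellm import embedding

response = embedding(
    model="vertex_ai/textembedding-gecko",
    input=["good morning from litellm"],
    vertex_project="hardy-device-38811",  # assumption: per-call kwarg mirroring litellm.vertex_project
    vertex_location="us-central1",        # assumption: per-call kwarg mirroring litellm.vertex_location
)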
Supported Models
All models listed here are supported
| Model Name | Function Call |
|---|---|
| textembedding-gecko | embedding(model="vertex_ai/textembedding-gecko", input) |
| textembedding-gecko-multilingual | embedding(model="vertex_ai/textembedding-gecko-multilingual", input) |
| textembedding-gecko-multilingual@001 | embedding(model="vertex_ai/textembedding-gecko-multilingual@001", input) |
| textembedding-gecko@001 | embedding(model="vertex_ai/textembedding-gecko@001", input) |
| textembedding-gecko@003 | embedding(model="vertex_ai/textembedding-gecko@003", input) |
| text-embedding-preview-0409 | embedding(model="vertex_ai/text-embedding-preview-0409", input) |
| text-multilingual-embedding-preview-0409 | embedding(model="vertex_ai/text-multilingual-embedding-preview-0409", input) |
Voyage AI Embedding Models
Usage - Embedding
from litellm import embedding
import os
os.environ['VOYAGE_API_KEY'] = ""
response = embedding(
model="voyage/voyage-01",
input=["good morning from litellm"],
)
print(response)
Supported Models
All models listed here https://docs.voyageai.com/embeddings/#models-and-specifics are supported
| Model Name | Function Call |
|---|---|
| voyage-01 | embedding(model="voyage/voyage-01", input) |
| voyage-lite-01 | embedding(model="voyage/voyage-lite-01", input) |
| voyage-lite-01-instruct | embedding(model="voyage/voyage-lite-01-instruct", input) |
Provider-specific Params
Any non-OpenAI params will be treated as provider-specific params and sent in the request body as kwargs to the provider.
Example
Cohere v3 models have a required parameter, input_type, which can be one of the following four values:
- input_type="search_document": (default) Use this for texts (documents) you want to store in your vector database
- input_type="search_query": Use this for search queries to find the most relevant documents in your vector database
- input_type="classification": Use this if you use the embeddings as an input for a classification system
- input_type="clustering": Use this if you use the embeddings for text clustering
https://txt.cohere.com/introducing-embed-v3/
- SDK
- PROXY
from litellm import embedding
import os

os.environ["COHERE_API_KEY"] = "cohere key"
# cohere call
response = embedding(
model="embed-english-v3.0",
input=["good morning from litellm", "this is another item"],
input_type="search_document" # 👈 PROVIDER-SPECIFIC PARAM
)
via config
model_list:
  - model_name: "cohere-embed"
    litellm_params:
      model: embed-english-v3.0
      input_type: search_document # 👈 PROVIDER-SPECIFIC PARAM
via request ("input_type" is the provider-specific param here)
curl -X POST 'http://0.0.0.0:4000/v1/embeddings' \
-H 'Authorization: Bearer sk-54d77cd67b9febbb' \
-H 'Content-Type: application/json' \
-d '{
  "model": "cohere-embed",
  "input": ["Are you authorized to work in United States of America?"],
  "input_type": "search_document"
}'
Nebius AI Studio Embedding Models
Usage - Embedding
from litellm import embedding
import os
os.environ['NEBIUS_API_KEY'] = ""
response = embedding(
model="nebius/BAAI/bge-en-icl",
input=["Good morning from litellm!"],
)
print(response)
Supported Models
All supported models can be found here: https://studio.nebius.ai/models/embedding
| Model Name | Function Call |
|---|---|
| BAAI/bge-en-icl | embedding(model="nebius/BAAI/bge-en-icl", input) |
| BAAI/bge-multilingual-gemma2 | embedding(model="nebius/BAAI/bge-multilingual-gemma2", input) |
| intfloat/e5-mistral-7b-instruct | embedding(model="nebius/intfloat/e5-mistral-7b-instruct", input) |