DeepInfra

https://deepinfra.com/

tip

We support ALL DeepInfra models. Just set model=deepinfra/<any-model-on-deepinfra> when sending LiteLLM requests.

API Key

```python
# env variable
os.environ['DEEPINFRA_API_KEY']
```

Sample Usage

```python
from litellm import completion
import os

os.environ['DEEPINFRA_API_KEY'] = ""
response = completion(
    model="deepinfra/meta-llama/Llama-2-70b-chat-hf",
    messages=[{"role": "user", "content": "write code for saying hi from litellm"}],
)
```

Sample Usage - Streaming

```python
from litellm import completion
import os

os.environ['DEEPINFRA_API_KEY'] = ""
response = completion(
    model="deepinfra/meta-llama/Llama-2-70b-chat-hf",
    messages=[{"role": "user", "content": "write code for saying hi from litellm"}],
    stream=True,
)

for chunk in response:
    print(chunk)
```

Chat Models

| Model Name | Function Call |
|---|---|
| meta-llama/Meta-Llama-3-8B-Instruct | completion(model="deepinfra/meta-llama/Meta-Llama-3-8B-Instruct", messages) |
| meta-llama/Meta-Llama-3-70B-Instruct | completion(model="deepinfra/meta-llama/Meta-Llama-3-70B-Instruct", messages) |
| meta-llama/Llama-2-70b-chat-hf | completion(model="deepinfra/meta-llama/Llama-2-70b-chat-hf", messages) |
| meta-llama/Llama-2-7b-chat-hf | completion(model="deepinfra/meta-llama/Llama-2-7b-chat-hf", messages) |
| meta-llama/Llama-2-13b-chat-hf | completion(model="deepinfra/meta-llama/Llama-2-13b-chat-hf", messages) |
| codellama/CodeLlama-34b-Instruct-hf | completion(model="deepinfra/codellama/CodeLlama-34b-Instruct-hf", messages) |
| mistralai/Mistral-7B-Instruct-v0.1 | completion(model="deepinfra/mistralai/Mistral-7B-Instruct-v0.1", messages) |
| jondurbin/airoboros-l2-70b-gpt4-1.4.1 | completion(model="deepinfra/jondurbin/airoboros-l2-70b-gpt4-1.4.1", messages) |

Rerank Endpoint

LiteLLM provides a Cohere-API-compatible /rerank endpoint for DeepInfra rerank models.
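To expose one of these models through the LiteLLM proxy's /rerank endpoint, a config entry along the following lines should work (a minimal sketch; the model_name alias qwen-reranker is arbitrary):

```yaml
model_list:
  - model_name: qwen-reranker
    litellm_params:
      model: deepinfra/Qwen/Qwen3-Reranker-0.6B
      api_key: os.environ/DEEPINFRA_API_KEY
```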

Supported Rerank Models

| Model Name | Description |
|---|---|
| deepinfra/Qwen/Qwen3-Reranker-0.6B | Lightweight rerank model (0.6B parameters) |
| deepinfra/Qwen/Qwen3-Reranker-4B | Medium rerank model (4B parameters) |
| deepinfra/Qwen/Qwen3-Reranker-8B | Large rerank model (8B parameters) |

Usage - LiteLLM Python SDK

```python
from litellm import rerank
import os

os.environ["DEEPINFRA_API_KEY"] = "your-api-key"

response = rerank(
    model="deepinfra/Qwen/Qwen3-Reranker-0.6B",
    query="What is the capital of France?",
    documents=[
        "Paris is the capital of France.",
        "London is the capital of the United Kingdom.",
        "Berlin is the capital of Germany.",
        "Madrid is the capital of Spain.",
        "Rome is the capital of Italy.",
    ],
)
print(response)
```
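Each entry in the response's results list carries the index of the source document plus a relevance_score, so mapping scores back to your input documents is a small post-processing step. A sketch using a stand-in results list shaped like the Response Format section (top_documents is an illustrative helper, not part of LiteLLM):

```python
def top_documents(documents, results, k=2):
    """Pair each rerank result with its source document and return the
    top-k by relevance_score (each result's index points into `documents`)."""
    ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [(documents[r["index"]], r["relevance_score"]) for r in ranked[:k]]


documents = [
    "Paris is the capital of France.",
    "London is the capital of the United Kingdom.",
]
# Stand-in for the "results" array of a real rerank response:
results = [
    {"index": 0, "relevance_score": 0.9975},
    {"index": 1, "relevance_score": 0.0117},
]
print(top_documents(documents, results, k=1))
# → [('Paris is the capital of France.', 0.9975)]
```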

Supported Cohere Rerank API Params

| Param | Type | Description |
|---|---|---|
| query | str | The query to rerank the documents against |
| documents | list[str] | The documents to rerank |

Provider-specific parameters

Pass any DeepInfra-specific parameters as keyword arguments to the rerank function, e.g.:

```python
response = rerank(
    model="deepinfra/Qwen/Qwen3-Reranker-0.6B",
    query="What is the capital of France?",
    documents=[
        "Paris is the capital of France.",
        "London is the capital of the United Kingdom.",
        "Berlin is the capital of Germany.",
        "Madrid is the capital of Spain.",
        "Rome is the capital of Italy.",
    ],
    my_custom_param="my_custom_value",  # any other DeepInfra-specific parameters
)
```

Response Format

```json
{
  "id": "request-id",
  "results": [
    {
      "index": 0,
      "relevance_score": 0.9975274205207825
    },
    {
      "index": 1,
      "relevance_score": 0.011687257327139378
    }
  ],
  "meta": {
    "billed_units": {
      "total_tokens": 427
    },
    "tokens": {
      "input_tokens": 427,
      "output_tokens": 0
    }
  }
}
```