MiniMax
MiniMax - v1/messages
Overview
LiteLLM provides Anthropic /v1/messages spec-compatible support for MiniMax.
Supported Models
MiniMax offers three models through their Anthropic-compatible API:
| Model | Description | Input Cost | Output Cost | Prompt Caching Read | Prompt Caching Write |
|---|---|---|---|---|---|
| MiniMax-M2.1 | Powerful Multi-Language Programming with Enhanced Programming Experience (~60 tps) | $0.3/M tokens | $1.2/M tokens | $0.03/M tokens | $0.375/M tokens |
| MiniMax-M2.1-lightning | Faster and More Agile (~100 tps) | $0.3/M tokens | $2.4/M tokens | $0.03/M tokens | $0.375/M tokens |
| MiniMax-M2 | Agentic capabilities, Advanced reasoning | $0.3/M tokens | $1.2/M tokens | $0.03/M tokens | $0.375/M tokens |
Usage Examples
Basic Chat Completion
import asyncio
import litellm

# acreate is async; run it with asyncio.run() or await it from async code
response = asyncio.run(litellm.anthropic.messages.acreate(
    model="minimax/MiniMax-M2.1",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    api_key="your-minimax-api-key",
    api_base="https://api.minimax.io/anthropic/v1/messages",
    max_tokens=1000
))

# The response follows the Anthropic Messages format (content is a list of blocks)
print(response)
Using Environment Variables
export MINIMAX_API_KEY="your-minimax-api-key"
export MINIMAX_API_BASE="https://api.minimax.io/anthropic/v1/messages"
import asyncio
import litellm

response = asyncio.run(litellm.anthropic.messages.acreate(
    model="minimax/MiniMax-M2.1",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=1000
))
print(response)
With Thinking (M2.1 Feature)
response = asyncio.run(litellm.anthropic.messages.acreate(
    model="minimax/MiniMax-M2.1",
    messages=[{"role": "user", "content": "Solve: 2+2=?"}],
    thinking={"type": "enabled", "budget_tokens": 1000},
    max_tokens=2000,  # max_tokens should exceed the thinking budget
    api_key="your-minimax-api-key"
))

# Access thinking content (dict-style access into the Anthropic Messages-format response)
for block in response["content"]:
    if block.get("type") == "thinking":
        print(f"Thinking: {block.get('thinking')}")
With Tool Calling
# Tools follow the Anthropic Messages tool schema (name / description / input_schema)
tools = [
    {
        "name": "get_weather",
        "description": "Get current weather",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"]
        }
    }
]
response = asyncio.run(litellm.anthropic.messages.acreate(
    model="minimax/MiniMax-M2.1",
    messages=[{"role": "user", "content": "What's the weather in SF?"}],
    tools=tools,
    api_key="your-minimax-api-key",
    max_tokens=1000
))
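If the model decides to use the tool, it shows up as a tool_use block in the response content. A minimal sketch of reading it, assuming the dict-style Anthropic Messages-format response used above (block and field names follow the Anthropic Messages spec):

for block in response["content"]:
    if block.get("type") == "tool_use":
        print(f"Tool: {block.get('name')}")    # e.g. "get_weather"
        print(f"Input: {block.get('input')}")  # e.g. {"location": "San Francisco"}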
Usage with LiteLLM Proxy
You can use MiniMax models with the Anthropic SDK by routing through LiteLLM Proxy:
| Step | Description |
|---|---|
| 1. Start LiteLLM Proxy | Configure proxy with MiniMax models in config.yaml |
| 2. Set Environment Variables | Point Anthropic SDK to proxy endpoint |
| 3. Use Anthropic SDK | Call MiniMax models using native Anthropic SDK |
Step 1: Configure LiteLLM Proxy
Create a config.yaml:
model_list:
- model_name: minimax/MiniMax-M2.1
litellm_params:
model: minimax/MiniMax-M2.1
api_key: os.environ/MINIMAX_API_KEY
api_base: https://api.minimax.io/anthropic/v1/messages
Start the proxy:
litellm --config config.yaml
Step 2: Use with Anthropic SDK
import os
os.environ["ANTHROPIC_BASE_URL"] = "http://localhost:4000"
os.environ["ANTHROPIC_API_KEY"] = "sk-1234" # Your LiteLLM proxy key
import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
model="minimax/MiniMax-M2.1",
max_tokens=1000,
system="You are a helpful assistant.",
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": "Hi, how are you?"
}
]
}
]
)
for block in message.content:
if block.type == "thinking":
print(f"Thinking:\n{block.thinking}\n")
elif block.type == "text":
print(f"Text:\n{block.text}\n")
MiniMax - v1/chat/completions
Usage with LiteLLM SDK
You can use MiniMax's OpenAI-compatible API directly with LiteLLM:
Basic Chat Completion
import litellm
response = litellm.completion(
model="minimax/MiniMax-M2.1",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello, how are you?"}
],
api_key="your-minimax-api-key",
api_base="https://api.minimax.io/v1"
)
print(response.choices[0].message.content)
Using Environment Variables
export MINIMAX_API_KEY="your-minimax-api-key"
export MINIMAX_API_BASE="https://api.minimax.io/v1"
import litellm
response = litellm.completion(
model="minimax/MiniMax-M2.1",
messages=[{"role": "user", "content": "Hello!"}]
)
With Reasoning Split
response = litellm.completion(
model="minimax/MiniMax-M2.1",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Solve: 2+2=?"}
],
extra_body={"reasoning_split": True},
api_key="your-minimax-api-key",
api_base="https://api.minimax.io/v1"
)
# Access reasoning details if available
if hasattr(response.choices[0].message, 'reasoning_details'):
print(f"Thinking: {response.choices[0].message.reasoning_details}")
print(f"Response: {response.choices[0].message.content}")
With Tool Calling
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string"}
},
"required": ["location"]
}
}
}
]
response = litellm.completion(
model="minimax/MiniMax-M2.1",
messages=[{"role": "user", "content": "What's the weather in SF?"}],
tools=tools,
api_key="your-minimax-api-key",
api_base="https://api.minimax.io/v1"
)
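If the model decides to call the tool, the OpenAI-format response carries it in tool_calls:

message = response.choices[0].message
if message.tool_calls:
    for tool_call in message.tool_calls:
        print(f"Tool: {tool_call.function.name}")            # e.g. "get_weather"
        print(f"Arguments: {tool_call.function.arguments}")  # JSON string, e.g. '{"location": "San Francisco"}'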
Streaming
response = litellm.completion(
model="minimax/MiniMax-M2.1",
messages=[{"role": "user", "content": "Tell me a story"}],
stream=True,
api_key="your-minimax-api-key",
api_base="https://api.minimax.io/v1"
)
for chunk in response:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="")
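The same streaming pattern works from async code with litellm.acompletion; a minimal sketch:

import asyncio
import litellm

async def stream_story():
    response = await litellm.acompletion(
        model="minimax/MiniMax-M2.1",
        messages=[{"role": "user", "content": "Tell me a story"}],
        stream=True,
        api_key="your-minimax-api-key",
        api_base="https://api.minimax.io/v1",
    )
    # The async stream yields OpenAI-format chunks
    async for chunk in response:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

asyncio.run(stream_story())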
Usage with OpenAI SDK via LiteLLM Proxy
You can also use MiniMax models with the OpenAI SDK by routing through LiteLLM Proxy:
| Step | Description |
|---|---|
| 1. Start LiteLLM Proxy | Configure proxy with MiniMax models in config.yaml |
| 2. Set Environment Variables | Point OpenAI SDK to proxy endpoint |
| 3. Use OpenAI SDK | Call MiniMax models using native OpenAI SDK |
Step 1: Configure LiteLLM Proxy
Create a config.yaml:
model_list:
- model_name: minimax/MiniMax-M2.1
litellm_params:
model: minimax/MiniMax-M2.1
api_key: os.environ/MINIMAX_API_KEY
api_base: https://api.minimax.io/v1
Start the proxy:
litellm --config config.yaml
Step 2: Use with OpenAI SDK
import os
os.environ["OPENAI_BASE_URL"] = "http://localhost:4000"
os.environ["OPENAI_API_KEY"] = "sk-1234" # Your LiteLLM proxy key
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
model="minimax/MiniMax-M2.1",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi, how are you?"},
],
# Set reasoning_split=True to separate thinking content
extra_body={"reasoning_split": True},
)
# Access thinking and response
if hasattr(response.choices[0].message, 'reasoning_details'):
print(f"Thinking:\n{response.choices[0].message.reasoning_details[0]['text']}\n")
print(f"Text:\n{response.choices[0].message.content}\n")
Streaming with OpenAI SDK
from openai import OpenAI
client = OpenAI()
stream = client.chat.completions.create(
model="minimax/MiniMax-M2.1",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Tell me a story"},
],
extra_body={"reasoning_split": True},
stream=True,
)
reasoning_buffer = ""
text_buffer = ""
for chunk in stream:
if hasattr(chunk.choices[0].delta, "reasoning_details") and chunk.choices[0].delta.reasoning_details:
for detail in chunk.choices[0].delta.reasoning_details:
if "text" in detail:
reasoning_text = detail["text"]
new_reasoning = reasoning_text[len(reasoning_buffer):]
if new_reasoning:
print(new_reasoning, end="", flush=True)
reasoning_buffer = reasoning_text
if chunk.choices[0].delta.content:
content_text = chunk.choices[0].delta.content
new_text = content_text[len(text_buffer):] if text_buffer else content_text
if new_text:
print(new_text, end="", flush=True)
text_buffer = content_text
Cost Calculation
Cost calculation works automatically using the pricing information in model_prices_and_context_window.json.
Example:
response = litellm.completion(
model="minimax/MiniMax-M2.1",
messages=[{"role": "user", "content": "Hello!"}],
api_key="your-minimax-api-key"
)
# Access cost information
print(f"Cost: ${response._hidden_params.get('response_cost', 0)}")
MiniMax - Text-to-Speech
Quick Start
LiteLLM Python SDK Usage
Basic Usage
from pathlib import Path
from litellm import speech
import os
os.environ["MINIMAX_API_KEY"] = "your-api-key"
speech_file_path = Path(__file__).parent / "speech.mp3"
response = speech(
model="minimax/speech-2.6-hd",
voice="alloy",
input="The quick brown fox jumped over the lazy dogs",
)
response.stream_to_file(speech_file_path)
Async Usage
from litellm import aspeech
from pathlib import Path
import os, asyncio
os.environ["MINIMAX_API_KEY"] = "your-api-key"
async def test_async_speech():
speech_file_path = Path(__file__).parent / "speech.mp3"
response = await aspeech(
model="minimax/speech-2.6-hd",
voice="alloy",
input="The quick brown fox jumped over the lazy dogs",
)
response.stream_to_file(speech_file_path)
asyncio.run(test_async_speech())
Voice Selection
MiniMax supports many voices. LiteLLM provides OpenAI-compatible voice names that map to MiniMax voices:
from litellm import speech
# OpenAI-compatible voice names
voices = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]
for voice in voices:
response = speech(
model="minimax/speech-2.6-hd",
voice=voice,
input=f"This is the {voice} voice",
)
response.stream_to_file(f"speech_{voice}.mp3")
You can also use MiniMax-native voice IDs directly:
response = speech(
model="minimax/speech-2.6-hd",
voice="male-qn-qingse", # MiniMax native voice ID
input="Using native MiniMax voice ID",
)
Custom Parameters
MiniMax TTS supports additional parameters for fine-tuning audio output:
from litellm import speech
response = speech(
model="minimax/speech-2.6-hd",
voice="alloy",
input="Custom audio parameters",
speed=1.5, # Speed: 0.5 to 2.0
response_format="mp3", # Format: mp3, pcm, wav, flac
extra_body={
"vol": 1.2, # Volume: 0.1 to 10
"pitch": 2, # Pitch adjustment: -12 to 12
"sample_rate": 32000, # 16000, 24000, or 32000
"bitrate": 128000, # For MP3: 64000, 128000, 192000, 256000
"channel": 1, # 1 for mono, 2 for stereo
}
)
response.stream_to_file("custom_speech.mp3")
Response Formats
from litellm import speech
# MP3 format (default)
response = speech(
model="minimax/speech-2.6-hd",
voice="alloy",
input="MP3 format audio",
response_format="mp3",
)
# PCM format
response = speech(
model="minimax/speech-2.6-hd",
voice="alloy",
input="PCM format audio",
response_format="pcm",
)
# WAV format
response = speech(
model="minimax/speech-2.6-hd",
voice="alloy",
input="WAV format audio",
response_format="wav",
)
# FLAC format
response = speech(
model="minimax/speech-2.6-hd",
voice="alloy",
input="FLAC format audio",
response_format="flac",
)
LiteLLM Proxy Usage
LiteLLM provides an OpenAI-compatible /audio/speech endpoint for MiniMax TTS.
Setup
Add MiniMax to your proxy configuration:
model_list:
- model_name: tts
litellm_params:
model: minimax/speech-2.6-hd
api_key: os.environ/MINIMAX_API_KEY
- model_name: tts-turbo
litellm_params:
model: minimax/speech-2.6-turbo
api_key: os.environ/MINIMAX_API_KEY
Start the proxy:
litellm --config /path/to/config.yaml
# RUNNING on http://0.0.0.0:4000
Making Requests
curl http://0.0.0.0:4000/v1/audio/speech \
-H "Authorization: Bearer sk-1234" \
-H "Content-Type: application/json" \
-d '{
"model": "tts",
"input": "The quick brown fox jumped over the lazy dog.",
"voice": "alloy"
}' \
--output speech.mp3
With custom parameters:
curl http://0.0.0.0:4000/v1/audio/speech \
-H "Authorization: Bearer sk-1234" \
-H "Content-Type: application/json" \
-d '{
"model": "tts",
"input": "Custom parameters example.",
"voice": "nova",
"speed": 1.5,
"response_format": "mp3",
"extra_body": {
"vol": 1.2,
"pitch": 1,
"sample_rate": 32000
}
}' \
--output custom_speech.mp3
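The proxy's /v1/audio/speech endpoint is OpenAI-compatible, so the OpenAI SDK works against it as well. A minimal sketch, assuming the proxy from the setup above is running on http://0.0.0.0:4000 with the proxy key sk-1234:

from pathlib import Path
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:4000", api_key="sk-1234")

response = client.audio.speech.create(
    model="tts",  # model_name from the proxy config above
    voice="alloy",
    input="The quick brown fox jumped over the lazy dog.",
)
response.stream_to_file(Path(__file__).parent / "speech.mp3")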
Voice Mappings
LiteLLM maps OpenAI-compatible voice names to MiniMax voice IDs:
| OpenAI Voice | MiniMax Voice ID | Description |
|---|---|---|
| alloy | male-qn-qingse | Male voice |
| echo | male-qn-jingying | Male voice |
| fable | female-shaonv | Female voice |
| onyx | male-qn-badao | Male voice |
| nova | female-yujie | Female voice |
| shimmer | female-tianmei | Female voice |
You can also use any MiniMax-native voice ID directly by passing it as the voice parameter.
Streaming (WebSocket)
The current implementation uses MiniMax's HTTP endpoint. For WebSocket streaming support, please refer to MiniMax's official documentation at https://platform.minimax.io/docs.
Error Handling
from litellm import speech
import litellm
try:
response = speech(
model="minimax/speech-2.6-hd",
voice="alloy",
input="Test input",
)
response.stream_to_file("output.mp3")
except litellm.exceptions.BadRequestError as e:
print(f"Bad request: {e}")
except litellm.exceptions.AuthenticationError as e:
print(f"Authentication failed: {e}")
except Exception as e:
print(f"Error: {e}")
Extra Body Parameters
Pass these via extra_body:
| Parameter | Type | Description | Default |
|---|---|---|---|
| vol | float | Volume (0.1 to 10) | 1.0 |
| pitch | int | Pitch adjustment (-12 to 12) | 0 |
| sample_rate | int | Sample rate: 16000, 24000, 32000 | 32000 |
| bitrate | int | Bitrate for MP3: 64000, 128000, 192000, 256000 | 128000 |
| channel | int | Audio channels: 1 (mono) or 2 (stereo) | 1 |
| output_format | string | Output format: "hex" or "url" (url returns a URL valid for 24 hours) | hex |