
Thinking / Reasoning Content

Supported Providers:

  • Deepseek (deepseek/)
  • Anthropic API (anthropic/)
  • Bedrock (Anthropic + Deepseek + GPT-OSS) (bedrock/)
  • Vertex AI (Anthropic) (vertexai/)
  • OpenRouter (openrouter/)
  • XAI (xai/)
  • Google AI Studio (google/)
  • Vertex AI (vertex_ai/)
  • Perplexity (perplexity/)
  • Mistral AI (Magistral models) (mistral/)
  • Groq (groq/)

haimaker standardizes reasoning_content in the response and thinking_blocks in the assistant message across supported providers.

# Example response structure
"message": {
    ...
    "reasoning_content": "The capital of France is Paris.",
    "thinking_blocks": [  # only returned for Anthropic models
        {
            "type": "thinking",
            "thinking": "The capital of France is Paris.",
            "signature": "EqoBCkgIARABGAIiQL2UoU0b1OHYi+..."
        }
    ]
}

Quick Start

Python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

response = client.chat.completions.create(
    model="anthropic/claude-3-7-sonnet-20250219",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    extra_body={
        "reasoning_effort": "low"
    }
)

print(response.choices[0].message.content)
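
Because reasoning_content is not part of the standard OpenAI response schema, the SDK exposes it as an extra field on the message model. A minimal sketch for reading it, using model_dump() so it works regardless of how the SDK surfaces extra attributes:

msg = response.choices[0].message.model_dump()
print(msg.get("content"))
print(msg.get("reasoning_content"))  # reasoning text standardized by haimaker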

cURL

curl https://api.haimaker.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "anthropic/claude-3-7-sonnet-20250219",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ],
    "reasoning_effort": "low"
  }'
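
If you prefer not to use the OpenAI SDK, the endpoint can also be called directly over HTTP. A minimal sketch using the requests library (the library choice is illustrative, not a requirement):

import requests

resp = requests.post(
    "https://api.haimaker.ai/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "anthropic/claude-3-7-sonnet-20250219",
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
        "reasoning_effort": "low",
    },
    timeout=60,
)
data = resp.json()
print(data["choices"][0]["message"].get("reasoning_content"))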

Using the Thinking Parameter

For Anthropic models, you can use the thinking parameter for finer-grained control over the reasoning token budget:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

response = client.chat.completions.create(
    model="anthropic/claude-3-7-sonnet-20250219",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    extra_body={
        "thinking": {"type": "enabled", "budget_tokens": 1024}
    }
)

print(response.choices[0].message.content)

cURL

curl https://api.haimaker.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "anthropic/claude-3-7-sonnet-20250219",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "thinking": {"type": "enabled", "budget_tokens": 1024}
  }'
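
With thinking enabled, Anthropic responses also populate thinking_blocks. A minimal sketch for inspecting them from the Python example above (again via model_dump(), since these fields sit outside the standard OpenAI schema):

msg = response.choices[0].message.model_dump()
for block in msg.get("thinking_blocks") or []:
    print(block["type"])            # "thinking"
    print(block["thinking"])        # the model's reasoning text
    print(block["signature"][:24])  # signature, truncated for display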

Using Different Models

Deepseek

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

response = client.chat.completions.create(
    model="deepseek/deepseek-chat",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    extra_body={
        "reasoning_effort": "low"
    }
)

XAI Grok

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

response = client.chat.completions.create(
    model="xai/grok-2-latest",
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ],
    extra_body={
        "reasoning_effort": "medium"
    }
)

Response Format

The response includes reasoning content:

{
  "id": "3b66124d79a708e10c603496b363574c",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "The capital of France is Paris.",
        "role": "assistant",
        "reasoning_content": "Let me think about this...",
        "thinking_blocks": [
          {
            "type": "thinking",
            "thinking": "Let me think about this...",
            "signature": "..."
          }
        ]
      }
    }
  ],
  "model": "claude-3-7-sonnet-20250219",
  "usage": {
    "completion_tokens": 12,
    "prompt_tokens": 16,
    "total_tokens": 28
  }
}
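
Pulling these fields out of the raw JSON is plain dictionary access. A minimal sketch, assuming the body above has been decoded into a dict named data:

choice = data["choices"][0]
message = choice["message"]

print(choice["finish_reason"])                    # "stop"
print(message["content"])                         # final answer
print(message.get("reasoning_content"))           # reasoning text
print(len(message.get("thinking_blocks") or []))  # 0 for non-Anthropic models
print(data["usage"]["total_tokens"])              # token accounting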

Reasoning Effort Levels

Level    Description
low      Minimal reasoning, faster responses
medium   Balanced reasoning
high     Maximum reasoning, more thorough but slower
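
Higher effort levels generally spend more reasoning tokens before answering, trading latency for thoroughness. A quick sketch comparing the three levels on one prompt, reusing the client from the Quick Start (the length comparison is illustrative; actual behavior varies by model):

for effort in ["low", "medium", "high"]:
    r = client.chat.completions.create(
        model="anthropic/claude-3-7-sonnet-20250219",
        messages=[{"role": "user", "content": "What is the capital of France?"}],
        extra_body={"reasoning_effort": effort},
    )
    reasoning = r.choices[0].message.model_dump().get("reasoning_content") or ""
    print(f"{effort}: {len(reasoning)} characters of reasoning")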

Spec

These fields can be accessed from the response:

  • reasoning_content - str: The reasoning content from the model. Returned across all providers.
  • thinking_blocks - Optional[List[Dict[str, str]]]: A list of thinking blocks from the model. Only returned for Anthropic models.
    • type - str: The type of thinking block.
    • thinking - str: The thinking from the model.
    • signature - str: The signature delta from the model.
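
If you want type hints around these fields, here is a sketch mirroring the spec above (the class names are illustrative, not part of the API):

from typing import List, Optional, TypedDict

class ThinkingBlock(TypedDict):
    type: str        # type of thinking block, e.g. "thinking"
    thinking: str    # the model's reasoning text
    signature: str   # signature delta from the model

class ReasoningFields(TypedDict, total=False):
    reasoning_content: str                          # returned across providers
    thinking_blocks: Optional[List[ThinkingBlock]]  # Anthropic models only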