
Thinking / Reasoning Content

Supported Providers:

  • Deepseek (deepseek/)
  • Anthropic API (anthropic/)
  • Bedrock (Anthropic + Deepseek + GPT-OSS) (bedrock/)
  • Vertex AI (Anthropic) (vertexai/)
  • OpenRouter (openrouter/)
  • XAI (xai/)
  • Google AI Studio (google/)
  • Vertex AI (vertex_ai/)
  • Perplexity (perplexity/)
  • Mistral AI (Magistral models) (mistral/)
  • Groq (groq/)

haimaker standardizes reasoning_content in the response and thinking_blocks in the assistant message across supported providers.

# Example response structure
"message": {
    ...
    "reasoning_content": "The capital of France is Paris.",
    "thinking_blocks": [  # only returned for Anthropic models
        {
            "type": "thinking",
            "thinking": "The capital of France is Paris.",
            "signature": "EqoBCkgIARABGAIiQL2UoU0b1OHYi+..."
        }
    ]
}

Quick Start

Python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

response = client.chat.completions.create(
    model="anthropic/claude-3-7-sonnet-20250219",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    extra_body={
        "reasoning_effort": "low"
    }
)

print(response.choices[0].message.content)
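
Because reasoning_content is not part of the standard OpenAI response schema, the SDK exposes it as an extra field on the message model. A minimal sketch for reading it, using model_dump() so it works regardless of how the SDK surfaces extra attributes:

msg = response.choices[0].message.model_dump()
print(msg.get("content"))
print(msg.get("reasoning_content"))  # reasoning text standardized by haimaker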

cURL

curl https://api.haimaker.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "anthropic/claude-3-7-sonnet-20250219",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ],
    "reasoning_effort": "low"
  }'
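
If you prefer not to use the OpenAI SDK, the endpoint can also be called directly over HTTP. A minimal sketch using the requests library (the library choice is illustrative, not a requirement):

import requests

resp = requests.post(
    "https://api.haimaker.ai/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "anthropic/claude-3-7-sonnet-20250219",
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
        "reasoning_effort": "low",
    },
    timeout=60,
)
data = resp.json()
print(data["choices"][0]["message"].get("reasoning_content"))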

Using the Thinking Parameter

For Anthropic models, you can use the thinking parameter for finer-grained control over the reasoning token budget:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

response = client.chat.completions.create(
    model="anthropic/claude-3-7-sonnet-20250219",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    extra_body={
        "thinking": {"type": "enabled", "budget_tokens": 1024}
    }
)

print(response.choices[0].message.content)

cURL

curl https://api.haimaker.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "anthropic/claude-3-7-sonnet-20250219",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "thinking": {"type": "enabled", "budget_tokens": 1024}
  }'
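
With thinking enabled, Anthropic responses also populate thinking_blocks. A minimal sketch for inspecting them from the Python example above (again via model_dump(), since these fields sit outside the standard OpenAI schema):

msg = response.choices[0].message.model_dump()
for block in msg.get("thinking_blocks") or []:
    print(block["type"])            # "thinking"
    print(block["thinking"])        # the model's reasoning text
    print(block["signature"][:24])  # signature, truncated for display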

Using Different Models

Deepseek

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

response = client.chat.completions.create(
    model="deepseek/deepseek-chat",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    extra_body={
        "reasoning_effort": "low"
    }
)

XAI Grok

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

response = client.chat.completions.create(
    model="xai/grok-2-latest",
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ],
    extra_body={
        "reasoning_effort": "medium"
    }
)

Response Format

The response includes reasoning content:

{
  "id": "3b66124d79a708e10c603496b363574c",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "The capital of France is Paris.",
        "role": "assistant",
        "reasoning_content": "Let me think about this...",
        "thinking_blocks": [
          {
            "type": "thinking",
            "thinking": "Let me think about this...",
            "signature": "..."
          }
        ]
      }
    }
  ],
  "model": "claude-3-7-sonnet-20250219",
  "usage": {
    "completion_tokens": 12,
    "prompt_tokens": 16,
    "total_tokens": 28
  }
}
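
Pulling these fields out of the raw JSON is plain dictionary access. A minimal sketch, assuming the body above has been decoded into a dict named data:

choice = data["choices"][0]
message = choice["message"]

print(choice["finish_reason"])                    # "stop"
print(message["content"])                         # final answer
print(message.get("reasoning_content"))           # reasoning text
print(len(message.get("thinking_blocks") or []))  # 0 for non-Anthropic models
print(data["usage"]["total_tokens"])              # token accounting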

Reasoning Effort Levels

Level    Description
low      Minimal reasoning, faster responses
medium   Balanced reasoning
high     Maximum reasoning, more thorough but slower
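
Higher effort levels generally spend more reasoning tokens before answering, trading latency for thoroughness. A quick sketch comparing the three levels on one prompt, reusing the client from the Quick Start (the length comparison is illustrative; actual behavior varies by model):

for effort in ["low", "medium", "high"]:
    r = client.chat.completions.create(
        model="anthropic/claude-3-7-sonnet-20250219",
        messages=[{"role": "user", "content": "What is the capital of France?"}],
        extra_body={"reasoning_effort": effort},
    )
    reasoning = r.choices[0].message.model_dump().get("reasoning_content") or ""
    print(f"{effort}: {len(reasoning)} characters of reasoning")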

Spec

These fields can be accessed from the response:

  • reasoning_content - str: The reasoning content from the model. Returned across all providers.
  • thinking_blocks - Optional[List[Dict[str, str]]]: A list of thinking blocks from the model. Only returned for Anthropic models.
    • type - str: The type of thinking block.
    • thinking - str: The thinking from the model.
    • signature - str: The signature delta from the model.
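
If you want type hints around these fields, here is a sketch mirroring the spec above (the class names are illustrative, not part of the API):

from typing import List, Optional, TypedDict

class ThinkingBlock(TypedDict):
    type: str        # type of thinking block, e.g. "thinking"
    thinking: str    # the model's reasoning text
    signature: str   # signature delta from the model

class ReasoningFields(TypedDict, total=False):
    reasoning_content: str                          # returned across providers
    thinking_blocks: Optional[List[ThinkingBlock]]  # Anthropic models only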