# Input Params
## Quick Start

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    max_tokens=100,
    temperature=0.7
)

print(response.choices[0].message.content)
```
## Supported Parameters

haimaker accepts the standard OpenAI Chat Completions parameters and translates them to each provider's native API. Support varies by provider:
| Provider | temperature | max_completion_tokens | max_tokens | top_p | stream | stream_options | stop | n | presence_penalty | frequency_penalty | functions | function_call | logit_bias | user | response_format | seed | tools | tool_choice | logprobs | top_logprobs | extra_headers |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Anthropic | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |  |  |  |  |  |  |  |  |
| OpenAI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Azure OpenAI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| xAI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |  |  |  |
| Cohere | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |  |  |  |  |  |  |  |  |  |  |  |  |
| VertexAI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Bedrock | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ (model dependent) |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| TogetherAI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |  |  |  |  |  |  |  |  |  |  |
| Groq | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |  |  |  |  |  |  |
| Mistral | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |  |  |  |  |  |  |  |  |  |
## Required Fields

### model

string - ID of the model to use. Model IDs are prefixed with the provider name:

```python
model="openai/gpt-4o"
model="anthropic/claude-3-7-sonnet-latest"
model="gemini/gemini-1.5-pro"
```
### messages

array - A list of messages comprising the conversation so far.

```python
messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
]
```
#### Properties of messages

- role: string - The role of the message's author. One of: system, user, assistant, function, or tool.
- content: string or list[dict] or null - The contents of the message.
- name: string (optional) - The name of the author of the message.
- tool_call_id: string (optional) - The tool call that this message is responding to (see the sketch below).
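For illustration, a message list exercising name and tool_call_id during a tool-call round trip; the ID value here is made up, and in practice it comes from the model's previous response:

```python
messages = [
    {"role": "user", "content": "What's the weather in Boston?"},
    # Assistant turn returned by the model, requesting a tool call
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_abc123",  # illustrative ID; use the one the model returned
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"location": "Boston"}'},
        }],
    },
    # Tool result, linked back to the request via tool_call_id
    {"role": "tool", "tool_call_id": "call_abc123", "name": "get_weather", "content": "72°F, sunny"},
]
```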
## Optional Fields

### temperature

number or null (optional) - The sampling temperature to use, between 0 and 2. Higher values make the output more random; lower values make it more focused and deterministic.

```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    temperature=0.7
)
```
### max_tokens

integer (optional) - The maximum number of tokens to generate in the chat completion.

```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=100
)
```
### max_completion_tokens

integer (optional) - An upper bound on the number of tokens that can be generated for a completion, including both visible output tokens and reasoning tokens.
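A sketch, reusing the client from Quick Start; on reasoning-capable models this cap also counts the hidden reasoning tokens:

```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    max_completion_tokens=100,  # caps visible output plus any reasoning tokens
)
```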
### top_p

number or null (optional) - An alternative to sampling with temperature, known as nucleus sampling: the model considers only the tokens comprising the top_p probability mass.
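For example (it is generally recommended to adjust temperature or top_p, but not both):

```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    top_p=0.9,  # sample only from the top 90% of the probability mass
)
```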
### n

integer or null (optional) - The number of chat completion choices to generate for each input message.
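For example, generating three alternatives in one request and reading each choice:

```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Suggest a name for a cat"}],
    n=3,  # three completions for the same prompt
)
for choice in response.choices:
    print(choice.index, choice.message.content)
```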
### stream

boolean or null (optional) - If set to true, partial message deltas are sent as they become available.

```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True
)

for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
```
### stream_options

dict or null (optional) - Options for the streaming response. Only set this when stream is true.

- include_usage: boolean (optional) - If set, an additional chunk is streamed with token usage statistics, as shown below.
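A sketch combining both options; with include_usage, the final chunk carries the usage statistics and an empty choices list:

```python
stream = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
    if chunk.usage:  # only the final chunk has usage populated
        print("\ntokens used:", chunk.usage.total_tokens)
```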
### stop

string/array/null (optional) - Up to 4 sequences where the API will stop generating further tokens.
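For example, halting generation at the first comma or newline (the stop sequence itself is not included in the output):

```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Count: 1, 2, 3, 4"}],
    stop=[",", "\n"],  # generation stops before emitting either sequence
)
```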
### presence_penalty

number or null (optional) - Penalizes new tokens based on whether they have appeared in the text so far.
### frequency_penalty

number or null (optional) - Penalizes new tokens based on their frequency in the text so far.
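Both penalties accept values between -2.0 and 2.0; positive values discourage repetition. A sketch using the two together:

```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Write a short poem"}],
    presence_penalty=0.5,   # discourage reusing tokens that already appeared at all
    frequency_penalty=0.5,  # discourage tokens in proportion to how often they appeared
)
```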
### response_format

object (optional) - An object specifying the format that the model must output.

```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {"role": "system", "content": "Output JSON."},
        {"role": "user", "content": "List 3 colors"}
    ],
    response_format={"type": "json_object"}
)
```
### seed

integer or null (optional) - If specified, the system will make a best effort to sample deterministically.
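For example, pinning a seed so repeated requests aim for the same output (determinism is best-effort, not guaranteed):

```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Pick a random fruit"}],
    seed=42,        # same seed + same params should usually reproduce the output
    temperature=0,  # low temperature further reduces variation
)
```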
### tools

array (optional) - A list of tools the model may call.

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the weather in a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    tools=tools
)
```
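The model's chosen call, if any, comes back on the assistant message; a sketch of reading it:

```python
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print(call.function.name)       # e.g. "get_weather"
    print(call.function.arguments)  # JSON string, e.g. '{"location": "Boston"}'
```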
### tool_choice

string or object (optional) - Controls which function is called by the model.

- "none" - The model will not call a function
- "auto" - The model can pick between generating a message or calling a function
- {"type": "function", "function": {"name": "my_function"}} - Forces the model to call that function
### parallel_tool_calls

boolean (optional) - Whether to enable parallel function calling during tool use.
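For example, disabling it so the model makes at most one tool call per turn:

```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Weather in Boston and Denver?"}],
    tools=tools,
    parallel_tool_calls=False,  # at most one tool call per assistant turn
)
```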
### logit_bias

map (optional) - Used to modify the probability of specific tokens appearing in the completion.
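Keys are token IDs (which are tokenizer-specific, so the ID below is purely illustrative) and values range from -100 to 100:

```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    logit_bias={"1234": -100},  # illustrative token ID; -100 effectively bans the token
)
```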
### user

string (optional) - A unique identifier representing your end-user.
### logprobs

boolean (optional) - Whether to return log probabilities of the output tokens.
### top_logprobs

integer (optional) - An integer between 0 and 5 specifying the number of most likely tokens to return at each token position. logprobs must be set to true to use this parameter.
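A sketch requesting the top 3 alternatives per position and reading them back:

```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    logprobs=True,
    top_logprobs=3,
)

first = response.choices[0].logprobs.content[0]
print(first.token, first.logprob)  # the sampled token and its log probability
for alt in first.top_logprobs:     # the 3 most likely candidates at this position
    print(" ", alt.token, alt.logprob)
```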
### extra_headers

dict (optional) - Additional headers to send with the request.

```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={"X-Custom-Header": "value"}
)
```
### provider

dict (optional) - Configuration for model selection and routing. Controls how requests are routed when multiple providers serve the same model. Because provider is not a standard OpenAI parameter, pass it via extra_body in the Python SDK, as in the example below.

- sort: string (optional) - Sorting strategy for provider selection. Options:
  - "price" (default): Select the lowest-cost provider based on input + output token costs
  - "latency": Optimize for the fastest response time
- compliance: string (optional) - ISO 3166-1 alpha-2 region code to filter providers by geographic compliance requirements (e.g., "US", "EU", "AE"). Only providers with certifications for the specified region will be considered. Returns 403 if no compliant providers are available.
Example:

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

response = client.chat.completions.create(
    model="deepseek/deepseek-v3",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={
        "provider": {
            "sort": "price",    # select the cheapest provider
            "compliance": "AE"  # must be compliant with UAE regulations
        }
    }
)
```
cURL Example:

```bash
curl https://api.haimaker.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v3",
    "provider": {
      "sort": "price",
      "compliance": "AE"
    },
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```