# Input Params
## Quick Start

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    max_tokens=100,
    temperature=0.7
)

print(response.choices[0].message.content)
```
## Supported Parameters

haimaker accepts the standard OpenAI Chat Completions parameters and translates them to each provider's native API. Support varies by provider:
| Provider | temperature | max_completion_tokens | max_tokens | top_p | stream | stream_options | stop | n | presence_penalty | frequency_penalty | functions | function_call | logit_bias | user | response_format | seed | tools | tool_choice | logprobs | top_logprobs | extra_headers |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Anthropic | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |  |  |  |  |  |  |  |  |
| OpenAI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Azure OpenAI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| xAI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |  |  |  |
| Cohere | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |  |  |  |  |  |  |  |  |  |  |  |  |
| VertexAI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Bedrock | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ (model dependent) |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| TogetherAI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |  |  |  |  |  |  |  |  |  |  |
| Groq | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |  |  |  |  |  |  |
| Mistral | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |  |  |  |  |  |  |  |  |  |
## Required Fields

### model

string - ID of the model to use. Model IDs are prefixed with the provider name:

```python
model="openai/gpt-4o"
model="anthropic/claude-3-7-sonnet-latest"
model="gemini/gemini-1.5-pro"
```
### messages

array - A list of messages comprising the conversation so far.

```python
messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
]
```
#### Properties of messages

- role: string - The role of the message's author. One of: system, user, assistant, function, or tool.
- content: string or list[dict] or null - The contents of the message.
- name: string (optional) - The name of the author of the message.
- tool_call_id: string (optional) - The tool call that this message is responding to (see the sketch below).
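For illustration, a message list exercising name and tool_call_id during a tool-call round trip; the ID value here is made up, and in practice it comes from the model's previous response:

```python
messages = [
    {"role": "user", "content": "What's the weather in Boston?"},
    # Assistant turn returned by the model, requesting a tool call
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_abc123",  # illustrative ID; use the one the model returned
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"location": "Boston"}'},
        }],
    },
    # Tool result, linked back to the request via tool_call_id
    {"role": "tool", "tool_call_id": "call_abc123", "name": "get_weather", "content": "72°F, sunny"},
]
```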
## Optional Fields

### temperature

number or null (optional) - The sampling temperature to use, between 0 and 2. Higher values make the output more random; lower values make it more focused and deterministic.

```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    temperature=0.7
)
```
### max_tokens

integer (optional) - The maximum number of tokens to generate in the chat completion.

```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=100
)
```
### max_completion_tokens

integer (optional) - An upper bound on the number of tokens that can be generated for a completion, including both visible output tokens and reasoning tokens.
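A sketch, reusing the client from Quick Start; on reasoning-capable models this cap also counts the hidden reasoning tokens:

```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    max_completion_tokens=100,  # caps visible output plus any reasoning tokens
)
```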
### top_p

number or null (optional) - An alternative to sampling with temperature, known as nucleus sampling: the model considers only the tokens comprising the top_p probability mass.
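For example (it is generally recommended to adjust temperature or top_p, but not both):

```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    top_p=0.9,  # sample only from the top 90% of the probability mass
)
```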
### n

integer or null (optional) - The number of chat completion choices to generate for each input message.
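For example, generating three alternatives in one request and reading each choice:

```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Suggest a name for a cat"}],
    n=3,  # three completions for the same prompt
)
for choice in response.choices:
    print(choice.index, choice.message.content)
```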
### stream

boolean or null (optional) - If set to true, partial message deltas are sent as they become available.

```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True
)

for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
```
### stream_options

dict or null (optional) - Options for the streaming response. Only set this when stream is true.

- include_usage: boolean (optional) - If set, an additional chunk is streamed with token usage statistics, as shown below.
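A sketch combining both options; with include_usage, the final chunk carries the usage statistics and an empty choices list:

```python
stream = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
    if chunk.usage:  # only the final chunk has usage populated
        print("\ntokens used:", chunk.usage.total_tokens)
```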
### stop

string/array/null (optional) - Up to 4 sequences where the API will stop generating further tokens.
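For example, halting generation at the first comma or newline (the stop sequence itself is not included in the output):

```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Count: 1, 2, 3, 4"}],
    stop=[",", "\n"],  # generation stops before emitting either sequence
)
```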
### presence_penalty

number or null (optional) - Penalizes new tokens based on whether they have appeared in the text so far.
### frequency_penalty

number or null (optional) - Penalizes new tokens based on their frequency in the text so far.
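Both penalties accept values between -2.0 and 2.0; positive values discourage repetition. A sketch using the two together:

```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Write a short poem"}],
    presence_penalty=0.5,   # discourage reusing tokens that already appeared at all
    frequency_penalty=0.5,  # discourage tokens in proportion to how often they appeared
)
```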
### response_format

object (optional) - An object specifying the format that the model must output.

```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {"role": "system", "content": "Output JSON."},
        {"role": "user", "content": "List 3 colors"}
    ],
    response_format={"type": "json_object"}
)
```
### seed

integer or null (optional) - If specified, the system will make a best effort to sample deterministically.
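For example, pinning a seed so repeated requests aim for the same output (determinism is best-effort, not guaranteed):

```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Pick a random fruit"}],
    seed=42,        # same seed + same params should usually reproduce the output
    temperature=0,  # low temperature further reduces variation
)
```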
### tools

array (optional) - A list of tools the model may call.

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the weather in a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    tools=tools
)
```
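The model's chosen call, if any, comes back on the assistant message; a sketch of reading it:

```python
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print(call.function.name)       # e.g. "get_weather"
    print(call.function.arguments)  # JSON string, e.g. '{"location": "Boston"}'
```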
### tool_choice

string or object (optional) - Controls which function is called by the model.

- "none" - The model will not call a function
- "auto" - The model can pick between generating a message or calling a function
- {"type": "function", "function": {"name": "my_function"}} - Forces the model to call that function
### parallel_tool_calls

boolean (optional) - Whether to enable parallel function calling during tool use.
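For example, disabling it so the model makes at most one tool call per turn:

```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Weather in Boston and Denver?"}],
    tools=tools,
    parallel_tool_calls=False,  # at most one tool call per assistant turn
)
```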
### logit_bias

map (optional) - Used to modify the probability of specific tokens appearing in the completion.
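Keys are token IDs (which are tokenizer-specific, so the ID below is purely illustrative) and values range from -100 to 100:

```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    logit_bias={"1234": -100},  # illustrative token ID; -100 effectively bans the token
)
```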
### user

string (optional) - A unique identifier representing your end-user.
### logprobs

boolean (optional) - Whether to return log probabilities of the output tokens.
### top_logprobs

integer (optional) - An integer between 0 and 5 specifying the number of most likely tokens to return at each token position. logprobs must be set to true to use this parameter.
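A sketch requesting the top 3 alternatives per position and reading them back:

```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    logprobs=True,
    top_logprobs=3,
)

first = response.choices[0].logprobs.content[0]
print(first.token, first.logprob)  # the sampled token and its log probability
for alt in first.top_logprobs:     # the 3 most likely candidates at this position
    print(" ", alt.token, alt.logprob)
```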
### extra_headers

dict (optional) - Additional headers to send with the request.

```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={"X-Custom-Header": "value"}
)
```
### provider

dict (optional) - Configuration for model selection and routing. Controls how requests are routed when multiple providers serve the same model. Because provider is not a standard OpenAI parameter, pass it via extra_body in the Python SDK, as in the example below.

- sort: string (optional) - Sorting strategy for provider selection. Options:
  - "price" (default): Select the lowest-cost provider based on input + output token costs
  - "latency": Optimize for the fastest response time
- compliance: string (optional) - ISO 3166-1 alpha-2 region code to filter providers by geographic compliance requirements (e.g., "US", "EU", "AE"). Only providers with certifications for the specified region will be considered. Returns 403 if no compliant providers are available.
Example:

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

response = client.chat.completions.create(
    model="deepseek/deepseek-v3",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={
        "provider": {
            "sort": "price",    # select the cheapest provider
            "compliance": "AE"  # must be compliant with UAE regulations
        }
    }
)
```
cURL Example:

```bash
curl https://api.haimaker.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v3",
    "provider": {
      "sort": "price",
      "compliance": "AE"
    },
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```