/responses
haimaker provides an endpoint that follows the spec of OpenAI's /responses API.
Requests to /chat/completions may be bridged to this endpoint automatically when the provider does not support /chat/completions for the requested model. The model's default mode determines how requests are bridged.
| Feature | Supported | Notes |
|---|---|---|
| Cost Tracking | ✅ | Works with all supported models |
| Logging | ✅ | Works across all integrations |
| End-user Tracking | ✅ | |
| Streaming | ✅ | |
| Image Generation Streaming | ✅ | Progressive image generation with partial images (1-3) |
| Fallbacks | ✅ | Works between supported models |
| Load balancing | ✅ | Works between supported models |
| Guardrails | ✅ | Applies to input and output text (non-streaming only) |
| Supported operations | Create, Get, Cancel, Delete a response | |
| Supported Providers | All providers | openai, anthropic, bedrock, vertex_ai, gemini, azure, azure_ai, etc. |
Quick Start
Python
from openai import OpenAI
client = OpenAI(
api_key="YOUR_API_KEY",
base_url="https://api.haimaker.ai/v1"
)
# Non-streaming response
response = client.responses.create(
model="openai/o1-pro",
input="Tell me a three sentence bedtime story about a unicorn.",
max_output_tokens=100
)
print(response)
cURL
curl https://api.haimaker.ai/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "openai/o1-pro",
"input": "Tell me a three sentence bedtime story about a unicorn."
}'
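For environments without the OpenAI SDK, the same request can be built with Python's standard library. This is a minimal sketch that assumes the endpoint accepts exactly the JSON body and headers shown in the cURL example; `build_responses_request` is a hypothetical helper name, and the request is only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.haimaker.ai/v1"


def build_responses_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /responses request.

    Hypothetical helper that mirrors the cURL example.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/responses",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_responses_request(
    "YOUR_API_KEY",
    {
        "model": "openai/o1-pro",
        "input": "Tell me a three sentence bedtime story about a unicorn.",
    },
)

# To actually send it (requires a valid key and network access):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```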
Streaming
from openai import OpenAI
client = OpenAI(
api_key="YOUR_API_KEY",
base_url="https://api.haimaker.ai/v1"
)
response = client.responses.create(
model="openai/o1-pro",
input="Tell me a three sentence bedtime story about a unicorn.",
stream=True
)
for event in response:
    print(event)
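Each streamed event carries a `type`. In OpenAI's Responses streaming format, incremental text arrives in `response.output_text.delta` events with a `delta` field; the sketch below assumes haimaker forwards these events unchanged and joins the deltas into the full text. The `SimpleNamespace` objects are stand-ins for real SDK events, and `collect_output_text` is a hypothetical helper.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_output_text(events: Iterable) -> str:
    """Join the text deltas from a Responses API event stream."""
    parts = []
    for event in events:
        # Only text-delta events carry incremental output text.
        if getattr(event, "type", None) == "response.output_text.delta":
            parts.append(event.delta)
    return "".join(parts)


# Stand-in events; with the real client, pass the object returned by
# client.responses.create(..., stream=True) instead.
fake_events = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="Once upon "),
    SimpleNamespace(type="response.output_text.delta", delta="a time."),
    SimpleNamespace(type="response.completed"),
]
text = collect_output_text(fake_events)
```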
Response Format
{
"id": "resp_abc123",
"object": "response",
"created_at": 1734366691,
"status": "completed",
"model": "o1-pro-2025-01-30",
"output": [
{
"type": "message",
"id": "msg_abc123",
"status": "completed",
"role": "assistant",
"content": [
{
"type": "output_text",
"text": "Once upon a time, a little unicorn named Stardust lived in a magical meadow...",
"annotations": []
}
]
}
],
"usage": {
"input_tokens": 18,
"output_tokens": 98,
"total_tokens": 116
}
}
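The generated text sits two levels deep: `output` holds items, and each `message` item holds `content` parts of type `output_text`. The OpenAI Python SDK exposes a `response.output_text` convenience property for this; the sketch below does the same walk over a plain dict (for example, parsed JSON from a raw HTTP call). `extract_output_text` is a hypothetical helper name.

```python
def extract_output_text(response: dict) -> str:
    """Concatenate every output_text part of every message item."""
    texts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip tool calls, image generation calls, etc.
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                texts.append(part["text"])
    return "".join(texts)


# Trimmed version of the response format shown above.
sample = {
    "id": "resp_abc123",
    "status": "completed",
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "Once upon a time...", "annotations": []}
            ],
        }
    ],
    "usage": {"input_tokens": 18, "output_tokens": 98, "total_tokens": 116},
}
text = extract_output_text(sample)
```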
Using Different Models
OpenAI
from openai import OpenAI
client = OpenAI(
api_key="YOUR_API_KEY",
base_url="https://api.haimaker.ai/v1"
)
response = client.responses.create(
model="openai/o1-pro",
input="What is the capital of France?"
)
Anthropic Claude
from openai import OpenAI
client = OpenAI(
api_key="YOUR_API_KEY",
base_url="https://api.haimaker.ai/v1"
)
response = client.responses.create(
model="anthropic/claude-3-5-sonnet-20240620",
input="Tell me a three sentence bedtime story about a unicorn.",
max_output_tokens=100
)
Google Gemini
from openai import OpenAI
client = OpenAI(
api_key="YOUR_API_KEY",
base_url="https://api.haimaker.ai/v1"
)
response = client.responses.create(
model="gemini/gemini-1.5-flash",
input="Tell me a three sentence bedtime story about a unicorn.",
max_output_tokens=100
)
Get a Response
Retrieve a response by its ID:
from openai import OpenAI
client = OpenAI(
api_key="YOUR_API_KEY",
base_url="https://api.haimaker.ai/v1"
)
# Create a response
response = client.responses.create(
model="openai/o1-pro",
input="Tell me a story."
)
# Retrieve it by ID
retrieved_response = client.responses.retrieve(response.id)
print(retrieved_response)
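Retrieving by ID is also how you would wait on a response that is still running. The sketch below polls until the status is terminal; it assumes haimaker reports the same status values as OpenAI's Response object (`queued`, `in_progress`, `completed`, `failed`, `cancelled`, `incomplete`). `wait_for_response` is a hypothetical helper, and a fake `retrieve` stands in for `client.responses.retrieve`.

```python
import time
from types import SimpleNamespace
from typing import Callable

# Statuses after which a response will not change any further.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "incomplete"}


def wait_for_response(retrieve: Callable, response_id: str,
                      poll_interval: float = 1.0, timeout: float = 60.0):
    """Call retrieve(response_id) until the status is terminal or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        response = retrieve(response_id)
        if response.status in TERMINAL_STATUSES:
            return response
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{response_id} still {response.status!r} after {timeout}s"
            )
        time.sleep(poll_interval)


# Stand-in for client.responses.retrieve that advances through statuses.
statuses = iter(["queued", "in_progress", "completed"])


def fake_retrieve(response_id):
    return SimpleNamespace(id=response_id, status=next(statuses))


result = wait_for_response(fake_retrieve, "resp_abc123", poll_interval=0)
```

With the real client, the call would be `wait_for_response(client.responses.retrieve, response.id)`.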
Cancel a Response
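You can cancel an in-progress response if the provider supports it. On OpenAI, only responses created with background=True can be cancelled; a response returned by a regular create call has already finished.
from openai import OpenAI
client = OpenAI(
api_key="YOUR_API_KEY",
base_url="https://api.haimaker.ai/v1"
)
# Create a response in background mode so it is still in progress
# (background is an OpenAI parameter; support may vary by provider)
response = client.responses.create(
model="openai/o1-pro",
input="Tell me a three sentence bedtime story about a unicorn.",
max_output_tokens=100,
background=True
)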
# Cancel the response by ID
cancel_response = client.responses.cancel(response.id)
print(cancel_response)
REST API:
curl -X POST https://api.haimaker.ai/v1/responses/RESPONSE_ID/cancel \
-H "Authorization: Bearer YOUR_API_KEY"
This will attempt to cancel the in-progress response with the given ID.
Note: Not all providers support response cancellation. If unsupported, an error will be raised.
Delete a Response
Delete a response by its ID:
from openai import OpenAI
client = OpenAI(
api_key="YOUR_API_KEY",
base_url="https://api.haimaker.ai/v1"
)
# Create a response
response = client.responses.create(
model="openai/o1-pro",
input="Tell me a story."
)
# Delete it
delete_response = client.responses.delete(response.id)
print(delete_response)
Image Generation
Generate images with the responses API:
import base64
from openai import OpenAI
client = OpenAI(
api_key="YOUR_API_KEY",
base_url="https://api.haimaker.ai/v1"
)
# OpenAI models require tools parameter for image generation
response = client.responses.create(
model="openai/gpt-4o",
input="Generate a futuristic city at sunset",
tools=[{"type": "image_generation"}]
)
# Access generated images from output
for item in response.output:
    if item.type == "image_generation_call":
        image_bytes = base64.b64decode(item.result)
        with open(f"generated_{item.id}.png", "wb") as f:
            f.write(image_bytes)
Gemini Image Generation
Gemini image generation models don't require the tools parameter:
import base64
from openai import OpenAI
client = OpenAI(
api_key="YOUR_API_KEY",
base_url="https://api.haimaker.ai/v1"
)
response = client.responses.create(
model="gemini/gemini-2.5-flash-image",
input="Generate a cute cat playing with yarn"
)
# Access generated images from output
for item in response.output:
    if item.type == "image_generation_call":
        # item.result contains pure base64 (no data: prefix)
        image_bytes = base64.b64decode(item.result)
        # Save the image, named after this output item's ID
        filename = f"generated_{item.id}.png"
        with open(filename, "wb") as f:
            f.write(image_bytes)
        print(f"Image saved: {filename}")
Image Generation Response Format
When image generation is successful, the response contains:
{
"id": "resp_abc123",
"status": "completed",
"output": [
{
"type": "image_generation_call",
"id": "resp_abc123_img_0",
"status": "completed",
"result": "iVBORw0KGgo..."
}
]
}
Note: The result field contains pure base64-encoded image data without the data:image/png;base64, prefix. You must decode it with base64.b64decode() before saving.
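A small decoding helper makes this step explicit. The documented format is pure base64, so stripping a `data:` URL prefix is purely a defensive extra and not something this endpoint is documented to send. `decode_image_result` is a hypothetical name; the PNG signature check shows that the sample `result` value above (`iVBORw0KGgo...`) is the start of a PNG file.

```python
import base64

# The first 8 bytes of every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_image_result(result: str) -> bytes:
    """Decode an image_generation_call `result` into raw image bytes."""
    if result.startswith("data:"):
        # Defensive: "data:image/png;base64,<payload>" -> "<payload>"
        result = result.split(",", 1)[1]
    return base64.b64decode(result)


encoded = base64.b64encode(PNG_SIGNATURE).decode("ascii")
raw = decode_image_result(encoded)
is_png = raw.startswith(PNG_SIGNATURE)
```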
Supported Image Generation Models
| Provider | Models | Requires tools Parameter |
|---|---|---|
| Google AI Studio | gemini/gemini-2.5-flash-image | No |
| Vertex AI | vertex_ai/gemini-2.5-flash-image-preview | No |
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o3 | Yes |
| AWS Bedrock | Stability AI, Amazon Nova Canvas models | Model-specific |
Image Generation with Streaming
import base64
from openai import OpenAI
client = OpenAI(
api_key="YOUR_API_KEY",
base_url="https://api.haimaker.ai/v1"
)
stream = client.responses.create(
model="openai/gpt-4.1",
input="Draw a gorgeous image of a river made of white owl feathers",
stream=True,
tools=[{"type": "image_generation", "partial_images": 2}],
)
for event in stream:
    if event.type == "response.image_generation_call.partial_image":
        idx = event.partial_image_index
        image_base64 = event.partial_image_b64
        image_bytes = base64.b64decode(image_base64)
        with open(f"river{idx}.png", "wb") as f:
            f.write(image_bytes)
Session Management with Previous Response
Continue conversations by referencing previous responses:
from openai import OpenAI
client = OpenAI(
api_key="YOUR_API_KEY",
base_url="https://api.haimaker.ai/v1"
)
# Initial request
response = client.responses.create(
model="anthropic/claude-3-5-sonnet-latest",
input="Who is Michael Jordan?"
)
print(f"Response ID: {response.id}")
# Follow-up request referencing the previous response
follow_up = client.responses.create(
model="anthropic/claude-3-5-sonnet-latest",
input="Can you tell me more about him?",
previous_response_id=response.id
)
print(follow_up.output[0].content[0].text)
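For more than two turns, the pattern generalizes to "remember the last response ID and pass it on the next call". The sketch below wraps that bookkeeping in a hypothetical `Conversation` class that takes any `create` callable (in practice `client.responses.create`); a recording fake stands in for the client so the chaining is visible.

```python
from types import SimpleNamespace
from typing import Callable, Optional


class Conversation:
    """Chain turns by passing each response's id as previous_response_id."""

    def __init__(self, create: Callable, model: str):
        self.create = create  # e.g. client.responses.create
        self.model = model
        self.last_id: Optional[str] = None

    def send(self, text: str):
        kwargs = {"model": self.model, "input": text}
        if self.last_id is not None:
            kwargs["previous_response_id"] = self.last_id
        response = self.create(**kwargs)
        self.last_id = response.id  # the next turn continues from here
        return response


# Stand-in for client.responses.create that records each call.
calls = []


def fake_create(**kwargs):
    calls.append(kwargs)
    return SimpleNamespace(id=f"resp_{len(calls)}")


chat = Conversation(fake_create, "anthropic/claude-3-5-sonnet-latest")
chat.send("Who is Michael Jordan?")
chat.send("Can you tell me more about him?")
```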
Session Example: Step by Step
Step 1: Make the initial request (new session)
curl https://api.haimaker.ai/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "anthropic/claude-3-5-sonnet-latest",
"input": "who is Michael Jordan"
}'
Response:
{
"id":"resp_123abc",
"model":"claude-3-5-sonnet-20241022",
"output":[{
"type":"message",
"content":[{
"type":"output_text",
"text":"Michael Jordan is widely considered one of the greatest basketball players of all time. He played for the Chicago Bulls (1984-1993, 1995-1998) and Washington Wizards (2001-2003), winning 6 NBA Championships with the Bulls."
}]
}]
}
Step 2: Continue the conversation with previous_response_id
curl https://api.haimaker.ai/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "anthropic/claude-3-5-sonnet-latest",
"input": "can you tell me more about him",
"previous_response_id": "resp_123abc"
}'
Response:
{
"id":"resp_456def",
"model":"claude-3-5-sonnet-20241022",
"output":[{
"type":"message",
"content":[{
"type":"output_text",
"text":"Michael Jordan was born February 17, 1963. He attended University of North Carolina before being drafted 3rd overall by the Bulls in 1984. Beyond basketball, he built the Air Jordan brand with Nike and later became owner of the Charlotte Hornets."
}]
}]
}
Step 3: Starting a new session (no previous_response_id) shows context is not maintained
curl https://api.haimaker.ai/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "anthropic/claude-3-5-sonnet-latest",
"input": "can you tell me more about him"
}'
Response:
{
"id":"resp_789ghi",
"model":"claude-3-5-sonnet-20241022",
"output":[{
"type":"message",
"content":[{
"type":"output_text",
"text":"I don't see who you're referring to in our conversation. Could you let me know which person you'd like to learn more about?"
}]
}]
}
Server-side Compaction
For long-running conversations, you can enable server-side compaction: when the rendered context size crosses a threshold, the server automatically runs compaction in-stream and emits a compaction item, so no separate POST /v1/responses/compact call is required.
This is supported on the OpenAI Responses API when using the openai or azure provider. Pass context_management with a compaction entry and a compact_threshold (a token count; minimum 1000). When the context crosses the threshold, the server compacts in-stream and continues. Chain turns with previous_response_id, or by appending the output items to the input array of your next request. See the OpenAI Compaction guide for details.
For explicit control over when compaction runs, use the standalone compact endpoint (POST /v1/responses/compact) instead.
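The sketch below shows both halves of that setup as plain data: building the context_management value with the documented 1000-token minimum enforced client-side, and the "append output items to your next input" chaining style. Both helper names are hypothetical. With the SDK, `previous_output` would be `response.output` (model objects rather than the plain dicts used here), and it may include a compaction item.

```python
from typing import Any


def compaction_settings(threshold: int) -> list[dict[str, Any]]:
    """Build the context_management value; 1000 tokens is the documented minimum."""
    if threshold < 1000:
        raise ValueError("compact_threshold must be at least 1000 tokens")
    return [{"type": "compaction", "compact_threshold": threshold}]


def next_turn_input(history: list, previous_output: list, user_text: str) -> list:
    """Manual chaining: prior input + prior output items + the new user message."""
    return history + list(previous_output) + [{"role": "user", "content": user_text}]


settings = compaction_settings(200_000)
history = [{"role": "user", "content": "Draft a three-step plan."}]
previous_output = [
    {
        "type": "message",
        "role": "assistant",
        "content": [{"type": "output_text", "text": "Step 1: ..."}],
    }
]
next_input = next_turn_input(history, previous_output, "Expand step two.")
# Then: client.responses.create(model=..., input=next_input,
#                               context_management=settings)
```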
Non-streaming
from openai import OpenAI
client = OpenAI(
api_key="YOUR_API_KEY",
base_url="https://api.haimaker.ai/v1"
)
# Enable compaction when context exceeds 200k tokens
response = client.responses.create(
model="openai/gpt-4o",
input="Your conversation input...",
context_management=[{"type": "compaction", "compact_threshold": 200000}],
max_output_tokens=1024,
)
print(response)
Streaming
from openai import OpenAI
client = OpenAI(
api_key="YOUR_API_KEY",
base_url="https://api.haimaker.ai/v1"
)
# Compaction runs in-stream if threshold is crossed
stream = client.responses.create(
model="openai/gpt-4o",
input="Your conversation input...",
context_management=[{"type": "compaction", "compact_threshold": 200000}],
stream=True,
)
for event in stream:
    print(event)
cURL
curl -X POST "https://api.haimaker.ai/v1/responses" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "openai/gpt-4o",
"input": "Your conversation input...",
"context_management": [{"type": "compaction", "compact_threshold": 200000}],
"max_output_tokens": 1024
}'
Shell Tool
The Shell tool (OpenAI Responses API) lets the model run commands in a hosted container or a local runtime. Pass tools=[{"type": "shell", "environment": {...}}]; the environment object configures the runtime (e.g. type: "container_auto" for an auto-provisioned container). See the OpenAI Shell tool guide for the full set of options.
Supported when using the openai or azure provider with a model that supports the Shell tool.
Python
from openai import OpenAI
client = OpenAI(
api_key="YOUR_API_KEY",
base_url="https://api.haimaker.ai/v1"
)
response = client.responses.create(
model="openai/gpt-5.2",
input="List files in /mnt/data and run python --version.",
tools=[{"type": "shell", "environment": {"type": "container_auto"}}],
tool_choice="auto",
max_output_tokens=1024,
)
cURL
curl -X POST "https://api.haimaker.ai/v1/responses" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "openai/gpt-5.2",
"input": "List files in /mnt/data.",
"tools": [{"type": "shell", "environment": {"type": "container_auto"}}],
"tool_choice": "auto",
"max_output_tokens": 1024
}'
Supported Responses API Parameters
| Provider | Supported Parameters |
|---|---|
| openai | All Responses API parameters are supported |
| azure | All Responses API parameters are supported |
| anthropic | See provider documentation for supported parameters |
| bedrock | See provider documentation for supported parameters |
| gemini | See provider documentation for supported parameters |
| vertex_ai | See provider documentation for supported parameters |
| All other providers | See provider documentation for supported parameters |
Supported Providers
All models available on haimaker can be used with the /responses endpoint. See haimaker.ai/models or call /v1/models for the full list of available models.
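Model IDs in this document use a provider/model form (for example, openai/o1-pro). Assuming /v1/models returns IDs in that same form, the sketch below filters them by provider prefix; `models_for_provider` is a hypothetical helper, and the hard-coded list stands in for a live call.

```python
def models_for_provider(model_ids, provider: str) -> list[str]:
    """Return the model IDs whose prefix before the first '/' matches provider."""
    return [m for m in model_ids if m.split("/", 1)[0] == provider]


# With the SDK, the IDs could come from: [m.id for m in client.models.list()]
ids = [
    "openai/o1-pro",
    "anthropic/claude-3-5-sonnet-latest",
    "gemini/gemini-1.5-flash",
    "openai/gpt-4o",
]
openai_models = models_for_provider(ids, "openai")
```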