/responses

haimaker exposes a /responses endpoint that follows the specification of OpenAI's Responses API.

Requests to /chat/completions may be bridged here automatically when the provider lacks support for that endpoint. The model's default mode determines how bridging works.

| Feature | Supported | Notes |
| --- | --- | --- |
| Cost Tracking | Yes | Works with all supported models |
| Logging | Yes | Works across all integrations |
| End-user Tracking | Yes | |
| Streaming | Yes | |
| Image Generation Streaming | Yes | Progressive image generation with partial images (1-3) |
| Fallbacks | Yes | Works between supported models |
| Loadbalancing | Yes | Works between supported models |
| Guardrails | Yes | Applies to input and output text (non-streaming only) |
| Supported operations | | Create, Get, Cancel, Delete a response |
| Supported Providers | | All providers: openai, anthropic, bedrock, vertex_ai, gemini, azure, azure_ai etc. |

Quick Start

Python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

# Non-streaming response
response = client.responses.create(
    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    max_output_tokens=100
)

print(response)

cURL

curl https://api.haimaker.ai/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "openai/o1-pro",
    "input": "Tell me a three sentence bedtime story about a unicorn."
  }'

Streaming

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

response = client.responses.create(
    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True
)

# Each event is a typed object; print them as they arrive
for event in response:
    print(event)
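
Printing raw events is useful for debugging, but most callers want the assembled text. The following is a minimal sketch of that step. It assumes haimaker forwards OpenAI's streaming event types unchanged, so that text chunks arrive as `response.output_text.delta` events carrying a `delta` string. The helper name `collect_output_text` is hypothetical, and the events are stand-ins so the sketch runs without a network call.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def collect_output_text(events: Iterable[Any]) -> str:
    """Join the text deltas of a streamed response into one string."""
    parts = []
    for event in events:
        # Assumption: text chunks arrive as "response.output_text.delta"
        # events with a string `delta` attribute, as in OpenAI's API.
        if getattr(event, "type", None) == "response.output_text.delta":
            parts.append(event.delta)
    return "".join(parts)


# Stand-in events so the sketch runs offline.
fake_stream = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="Once upon "),
    SimpleNamespace(type="response.output_text.delta", delta="a time."),
    SimpleNamespace(type="response.completed"),
]
print(collect_output_text(fake_stream))
```

In real use, pass the object returned by `client.responses.create(..., stream=True)` in place of `fake_stream`.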

Response Format

{
  "id": "resp_abc123",
  "object": "response",
  "created_at": 1734366691,
  "status": "completed",
  "model": "o1-pro-2025-01-30",
  "output": [
    {
      "type": "message",
      "id": "msg_abc123",
      "status": "completed",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Once upon a time, a little unicorn named Stardust lived in a magical meadow...",
          "annotations": []
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 18,
    "output_tokens": 98,
    "total_tokens": 116
  }
}
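
To show how the fields above fit together, here is a small sketch that pulls the generated text and token usage out of a response parsed from JSON (for example, the cURL output passed through `json.loads`). The helper name `extract_output_text` is hypothetical, and the sample dictionary is a trimmed copy of the format shown above.

```python
from typing import Any, Dict


def extract_output_text(response: Dict[str, Any]) -> str:
    """Concatenate every output_text part found in response["output"]."""
    texts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip non-message items such as image_generation_call
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                texts.append(part.get("text", ""))
    return "".join(texts)


# Trimmed copy of the response format documented above.
sample = {
    "output": [
        {
            "type": "message",
            "content": [{"type": "output_text", "text": "Once upon a time..."}],
        }
    ],
    "usage": {"input_tokens": 18, "output_tokens": 98, "total_tokens": 116},
}

print(extract_output_text(sample))
print(sample["usage"]["total_tokens"])
```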

Using Different Models

OpenAI

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

response = client.responses.create(
    model="openai/o1-pro",
    input="What is the capital of France?"
)

Anthropic Claude

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

response = client.responses.create(
    model="anthropic/claude-3-5-sonnet-20240620",
    input="Tell me a three sentence bedtime story about a unicorn.",
    max_output_tokens=100
)

Google Gemini

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

response = client.responses.create(
    model="gemini/gemini-1.5-flash",
    input="Tell me a three sentence bedtime story about a unicorn.",
    max_output_tokens=100
)

Get a Response

Retrieve a response by its ID:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

# Create a response
response = client.responses.create(
    model="openai/o1-pro",
    input="Tell me a story."
)

# Retrieve it by ID
retrieved_response = client.responses.retrieve(response.id)
print(retrieved_response)
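
If a response is still running when you retrieve it, you may want to poll until it finishes. The sketch below shows one way to do that. It assumes the status values follow OpenAI's Responses API (`queued`, `in_progress`, then one of `completed`, `failed`, `cancelled`, `incomplete`). The helper `wait_for_response` is hypothetical, and a stand-in replaces `client.responses.retrieve` so the example runs offline.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable

# Assumption: terminal status values as used by OpenAI's Responses API.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "incomplete"}


def wait_for_response(
    retrieve: Callable[[str], Any],
    response_id: str,
    interval: float = 1.0,
    max_attempts: int = 30,
) -> Any:
    """Call retrieve(response_id) until the status is terminal."""
    for _ in range(max_attempts):
        response = retrieve(response_id)
        if response.status in TERMINAL_STATUSES:
            return response
        time.sleep(interval)
    raise TimeoutError(f"{response_id} not finished after {max_attempts} attempts")


# Stand-in for client.responses.retrieve so the sketch runs offline.
_statuses = iter(["queued", "in_progress", "completed"])


def fake_retrieve(response_id: str) -> SimpleNamespace:
    return SimpleNamespace(id=response_id, status=next(_statuses))


result = wait_for_response(fake_retrieve, "resp_abc123", interval=0)
print(result.status)
```

With a real client, the call would look like `wait_for_response(client.responses.retrieve, response.id)`.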

Cancel a Response

You can cancel an in-progress response (if supported by the provider):

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

# Create a response
response = client.responses.create(
    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    max_output_tokens=100
)

# Cancel the response by ID
cancel_response = client.responses.cancel(response.id)
print(cancel_response)

REST API:

curl -X POST https://api.haimaker.ai/v1/responses/RESPONSE_ID/cancel \
  -H "Authorization: Bearer YOUR_API_KEY"

This attempts to cancel the in-progress response with the given ID.

Note: Not all providers support response cancellation; if the provider does not, the request raises an error. On OpenAI, only responses created with background set to true can be cancelled.

Delete a Response

Delete a response by its ID:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

# Create a response
response = client.responses.create(
    model="openai/o1-pro",
    input="Tell me a story."
)

# Delete it
delete_response = client.responses.delete(response.id)
print(delete_response)

Image Generation

Generate images with the responses API:

import base64
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

# OpenAI models require tools parameter for image generation
response = client.responses.create(
    model="openai/gpt-4o",
    input="Generate a futuristic city at sunset",
    tools=[{"type": "image_generation"}]
)

# Access generated images from output
for item in response.output:
    if item.type == "image_generation_call":
        image_bytes = base64.b64decode(item.result)
        with open(f"generated_{item.id}.png", "wb") as f:
            f.write(image_bytes)

Gemini Image Generation

Gemini image generation models don't require the tools parameter:

import base64
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

response = client.responses.create(
    model="gemini/gemini-2.5-flash-image",
    input="Generate a cute cat playing with yarn"
)

# Access generated images from output
for item in response.output:
    if item.type == "image_generation_call":
        # item.result contains pure base64 (no data: prefix)
        image_bytes = base64.b64decode(item.result)

        # Save the image
        with open(f"generated_{item.id}.png", "wb") as f:
            f.write(image_bytes)

        # Report the file that was actually written for this item
        print(f"Image saved: generated_{item.id}.png")

Image Generation Response Format

When image generation is successful, the response contains:

{
  "id": "resp_abc123",
  "status": "completed",
  "output": [
    {
      "type": "image_generation_call",
      "id": "resp_abc123_img_0",
      "status": "completed",
      "result": "iVBORw0KGgo..."
    }
  ]
}

Note: The result field contains pure base64-encoded image data without the data:image/png;base64, prefix. You must decode it with base64.b64decode() before saving.
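
As a defensive variant of the decoding step described in the note, the sketch below also tolerates a `data:` URI prefix, in case the same code handles image payloads from other sources, and checks the PNG signature before saving. Both helper names are hypothetical; the test payload is built locally rather than taken from a real response.

```python
import base64

# The eight-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_image_result(result: str) -> bytes:
    """Decode a base64 image payload, tolerating an optional data: URI prefix."""
    if result.startswith("data:"):
        # Keep only what follows the first comma, i.e. the base64 payload.
        _, _, result = result.partition(",")
    return base64.b64decode(result)


def looks_like_png(data: bytes) -> bool:
    """Return True if the decoded bytes start with the PNG signature."""
    return data.startswith(PNG_SIGNATURE)


# Locally built payload standing in for item.result.
encoded = base64.b64encode(PNG_SIGNATURE + b"rest-of-file").decode("ascii")

print(looks_like_png(decode_image_result(encoded)))
print(looks_like_png(decode_image_result("data:image/png;base64," + encoded)))
```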

Supported Image Generation Models

| Provider | Models | Requires tools Parameter |
| --- | --- | --- |
| Google AI Studio | gemini/gemini-2.5-flash-image | No |
| Vertex AI | vertex_ai/gemini-2.5-flash-image-preview | No |
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o3 | Yes |
| AWS Bedrock | Stability AI, Amazon Nova Canvas models | Model-specific |
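
The table can be turned into a small request builder that adds the tools parameter only where it is required. This is a sketch, not part of the haimaker SDK: the function name `image_request_kwargs` is hypothetical, it assumes models are addressed as `provider/model`, and it deliberately rejects Bedrock because the table marks that case as model-specific.

```python
from typing import Any, Dict

# Derived from the table above. Bedrock is model-specific and not handled here.
PROVIDERS_REQUIRING_TOOLS = {"openai"}
PROVIDERS_WITHOUT_TOOLS = {"gemini", "vertex_ai"}


def image_request_kwargs(model: str, prompt: str) -> Dict[str, Any]:
    """Build keyword arguments for client.responses.create for image generation."""
    provider = model.split("/", 1)[0]
    kwargs: Dict[str, Any] = {"model": model, "input": prompt}
    if provider in PROVIDERS_REQUIRING_TOOLS:
        kwargs["tools"] = [{"type": "image_generation"}]
    elif provider not in PROVIDERS_WITHOUT_TOOLS:
        raise ValueError(f"No image generation rule known for provider {provider!r}")
    return kwargs


print(image_request_kwargs("openai/gpt-4o", "A futuristic city at sunset"))
print(image_request_kwargs("gemini/gemini-2.5-flash-image", "A cute cat"))
```

The result can be passed straight through, as in `client.responses.create(**image_request_kwargs(model, prompt))`.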

Image Generation with Streaming

import base64
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

stream = client.responses.create(
    model="openai/gpt-4.1",
    input="Draw a gorgeous image of a river made of white owl feathers",
    stream=True,
    tools=[{"type": "image_generation", "partial_images": 2}],
)

# Save each partial image as it arrives
for event in stream:
    if event.type == "response.image_generation_call.partial_image":
        idx = event.partial_image_index
        image_base64 = event.partial_image_b64
        image_bytes = base64.b64decode(image_base64)
        with open(f"river{idx}.png", "wb") as f:
            f.write(image_bytes)
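
The feature table at the top of this page lists partial images as ranging from 1 to 3. A small builder can enforce that range before the request is sent. The sketch below is illustrative only: `image_generation_tool` is a hypothetical helper, and the range check reflects this page rather than a confirmed server-side validation rule.

```python
from typing import Any, Dict, Optional


def image_generation_tool(partial_images: Optional[int] = None) -> Dict[str, Any]:
    """Build the image_generation tool entry, validating partial_images (1-3)."""
    tool: Dict[str, Any] = {"type": "image_generation"}
    if partial_images is not None:
        # Range taken from the feature table on this page.
        if not 1 <= partial_images <= 3:
            raise ValueError("partial_images must be between 1 and 3")
        tool["partial_images"] = partial_images
    return tool


print(image_generation_tool(2))
print(image_generation_tool())
```

The returned dictionary goes into the tools list, for example `tools=[image_generation_tool(2)]`.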

Session Management with Previous Response

Continue conversations by referencing previous responses:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

# Initial request
response = client.responses.create(
    model="anthropic/claude-3-5-sonnet-latest",
    input="Who is Michael Jordan?"
)

print(f"Response ID: {response.id}")

# Follow-up request referencing the previous response
follow_up = client.responses.create(
    model="anthropic/claude-3-5-sonnet-latest",
    input="Can you tell me more about him?",
    previous_response_id=response.id
)

print(follow_up.output[0].content[0].text)

Session Example: Step by Step

Step 1: Make the initial request (new session)

curl https://api.haimaker.ai/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet-latest",
    "input": "who is Michael Jordan"
  }'

Response:

{
  "id": "resp_123abc",
  "model": "claude-3-5-sonnet-20241022",
  "output": [{
    "type": "message",
    "content": [{
      "type": "output_text",
      "text": "Michael Jordan is widely considered one of the greatest basketball players of all time. He played for the Chicago Bulls (1984-1993, 1995-1998) and Washington Wizards (2001-2003), winning 6 NBA Championships with the Bulls."
    }]
  }]
}

Step 2: Continue the conversation with previous_response_id

curl https://api.haimaker.ai/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet-latest",
    "input": "can you tell me more about him",
    "previous_response_id": "resp_123abc"
  }'

Response:

{
  "id": "resp_456def",
  "model": "claude-3-5-sonnet-20241022",
  "output": [{
    "type": "message",
    "content": [{
      "type": "output_text",
      "text": "Michael Jordan was born February 17, 1963. He attended University of North Carolina before being drafted 3rd overall by the Bulls in 1984. Beyond basketball, he built the Air Jordan brand with Nike and later became owner of the Charlotte Hornets."
    }]
  }]
}

Step 3: Start a new session (no previous_response_id) to see that context is not carried over

curl https://api.haimaker.ai/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet-latest",
    "input": "can you tell me more about him"
  }'

Response:

{
  "id": "resp_789ghi",
  "model": "claude-3-5-sonnet-20241022",
  "output": [{
    "type": "message",
    "content": [{
      "type": "output_text",
      "text": "I don't see who you're referring to in our conversation. Could you let me know which person you'd like to learn more about?"
    }]
  }]
}
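
The three steps above can be wrapped in a small helper that remembers the last response ID and passes it as previous_response_id on the next turn. This is a sketch under stated assumptions: the `Session` class is hypothetical, and a stand-in replaces `client.responses.create` so the chaining logic can be run and inspected offline.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, List, Optional


class Session:
    """Chains calls by passing the last response ID as previous_response_id."""

    def __init__(self, create: Callable[..., Any], model: str) -> None:
        self._create = create
        self._model = model
        self.last_response_id: Optional[str] = None

    def send(self, text: str) -> Any:
        kwargs: Dict[str, Any] = {"model": self._model, "input": text}
        # The first turn has no previous response, so the field is omitted.
        if self.last_response_id is not None:
            kwargs["previous_response_id"] = self.last_response_id
        response = self._create(**kwargs)
        self.last_response_id = response.id
        return response


# Stand-in for client.responses.create that records what it was called with.
calls: List[Dict[str, Any]] = []


def fake_create(**kwargs: Any) -> SimpleNamespace:
    calls.append(kwargs)
    return SimpleNamespace(id=f"resp_{len(calls)}")


session = Session(fake_create, "anthropic/claude-3-5-sonnet-latest")
session.send("Who is Michael Jordan?")
session.send("Can you tell me more about him?")
print(calls[1]["previous_response_id"])
```

With a real client, construct it as `Session(client.responses.create, model)`. Creating a fresh `Session` corresponds to Step 3: the first call carries no previous_response_id, so no context is carried over.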

Server-side Compaction

For long-running conversations, you can enable server-side compaction. When the rendered context size crosses a threshold, the server automatically runs compaction in-stream, emits a compaction item, and continues, so no separate POST /v1/responses/compact call is required.

Server-side compaction is supported on the OpenAI Responses API when using the openai or azure provider. Pass context_management with a compaction entry and a compact_threshold (a token count; minimum 1000). Chain turns with previous_response_id or by appending the output items to your next input array. See the OpenAI Compaction guide for details.

For explicit control over when compaction runs, use the standalone compact endpoint (POST /v1/responses/compact) instead.
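
Because compact_threshold has a documented minimum of 1000 tokens, it can be worth validating the value before sending a request. The helper below is a minimal sketch of that check; `compaction_config` is a hypothetical name, and the entry shape mirrors the context_management examples in this section.

```python
from typing import Any, Dict, List

# Minimum token count stated in this section.
MIN_COMPACT_THRESHOLD = 1000


def compaction_config(compact_threshold: int) -> List[Dict[str, Any]]:
    """Build a context_management value that enables server-side compaction."""
    if compact_threshold < MIN_COMPACT_THRESHOLD:
        raise ValueError(
            f"compact_threshold must be at least {MIN_COMPACT_THRESHOLD} tokens"
        )
    return [{"type": "compaction", "compact_threshold": compact_threshold}]


print(compaction_config(200000))
```

The return value is passed as `context_management=compaction_config(200000)`.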

Non-streaming

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

# Enable compaction when context exceeds 200k tokens
response = client.responses.create(
    model="openai/gpt-4o",
    input="Your conversation input...",
    context_management=[{"type": "compaction", "compact_threshold": 200000}],
    max_output_tokens=1024,
)
print(response)

Streaming

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

# Compaction runs in-stream if threshold is crossed
stream = client.responses.create(
    model="openai/gpt-4o",
    input="Your conversation input...",
    context_management=[{"type": "compaction", "compact_threshold": 200000}],
    stream=True,
)
for event in stream:
    print(event)

cURL

curl -X POST "https://api.haimaker.ai/v1/responses" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "openai/gpt-4o",
    "input": "Your conversation input...",
    "context_management": [{"type": "compaction", "compact_threshold": 200000}],
    "max_output_tokens": 1024
  }'

Shell Tool

The Shell tool, a feature of the OpenAI Responses API, lets the model run commands in a hosted container or a local runtime. Pass tools=[{"type": "shell", "environment": {...}}]; the environment object configures the runtime (for example, type: "container_auto" for auto-provisioned containers). See the OpenAI Shell tool guide for the full set of options.

The Shell tool is supported when using the openai or azure provider with a model that supports it.

Python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

response = client.responses.create(
    model="openai/gpt-5.2",
    input="List files in /mnt/data and run python --version.",
    tools=[{"type": "shell", "environment": {"type": "container_auto"}}],
    tool_choice="auto",
    max_output_tokens=1024,
)

cURL

curl -X POST "https://api.haimaker.ai/v1/responses" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "openai/gpt-5.2",
    "input": "List files in /mnt/data.",
    "tools": [{"type": "shell", "environment": {"type": "container_auto"}}],
    "tool_choice": "auto",
    "max_output_tokens": 1024
  }'

Supported Responses API Parameters

| Provider | Supported Parameters |
| --- | --- |
| openai | All Responses API parameters are supported |
| azure | All Responses API parameters are supported |
| anthropic | See provider documentation for supported parameters |
| bedrock | See provider documentation for supported parameters |
| gemini | See provider documentation for supported parameters |
| vertex_ai | See provider documentation for supported parameters |
| All other providers | See provider documentation for supported parameters |

Supported Providers

All models available on haimaker can be used with the /responses endpoint. See haimaker.ai/models or call /v1/models for the full list of available models.

| Provider | Notes |
| --- | --- |
| openai | All Responses API parameters are supported |
| azure | All Responses API parameters are supported |
| anthropic | See provider documentation for supported parameters |
| bedrock | See provider documentation for supported parameters |
| gemini | See provider documentation for supported parameters |
| vertex_ai | See provider documentation for supported parameters |
| All other providers | See provider documentation for supported parameters |
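
Once you have the model list from /v1/models, you can filter it by provider. The sketch below assumes, without confirmation from this page, that the endpoint returns an OpenAI-style list object (`{"object": "list", "data": [{"id": ...}]}`) and that model IDs use the `provider/model` form seen in the examples above. The function name and the sample payload are hypothetical.

```python
from typing import Any, Dict, List


def model_ids_for_provider(models_payload: Dict[str, Any], provider: str) -> List[str]:
    """Return the sorted model IDs whose prefix matches the given provider."""
    prefix = provider + "/"
    return sorted(
        entry["id"]
        for entry in models_payload.get("data", [])
        if entry.get("id", "").startswith(prefix)
    )


# Hypothetical payload; the real /v1/models response may differ in shape.
sample_payload = {
    "object": "list",
    "data": [
        {"id": "openai/o1-pro"},
        {"id": "openai/gpt-4o"},
        {"id": "anthropic/claude-3-5-sonnet-latest"},
    ],
}

print(model_ids_for_provider(sample_payload, "openai"))
```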