Using Audio Models

How to send audio to, and receive audio from, a /chat/completions endpoint.

Audio Output from a Model

An example of generating a human-like audio response to a text prompt.

Python

import base64
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

response = client.chat.completions.create(
    model="openai/gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": "Is a golden retriever a good family dog?"
        }
    ]
)

print(response.choices[0])

# Decode the base64-encoded audio and write it to a WAV file
wav_bytes = base64.b64decode(response.choices[0].message.audio.data)
with open("dog.wav", "wb") as f:
    f.write(wav_bytes)

cURL

curl https://api.haimaker.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "openai/gpt-4o-audio-preview",
    "modalities": ["text", "audio"],
    "audio": {"voice": "alloy", "format": "wav"},
    "messages": [
      {
        "role": "user",
        "content": "Is a golden retriever a good family dog?"
      }
    ]
  }'

Audio Input to a Model

An example of sending an audio recording as input alongside a text prompt.

Python

import base64
import requests
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

# Fetch the audio file and convert it to a base64 encoded string
url = "https://openaiassets.blob.core.windows.net/$web/API/docs/audio/alloy.wav"
response = requests.get(url)
response.raise_for_status()
wav_data = response.content
encoded_string = base64.b64encode(wav_data).decode('utf-8')

response = client.chat.completions.create(
    model="openai/gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is in this recording?"
                },
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": encoded_string,
                        "format": "wav"
                    }
                }
            ]
        },
    ]
)

print(response.choices[0].message)
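Since the request above also asks for audio output, the reply carries the same audio object as in the first example. A small helper, sketched under the assumption that the message object exposes audio.data and audio.transcript as documented below (the name save_audio_reply is illustrative, not part of the SDK):

```python
import base64

def save_audio_reply(message, path):
    # Write the base64-decoded audio bytes to disk and return the
    # model-generated transcript of the clip.
    with open(path, "wb") as f:
        f.write(base64.b64decode(message.audio.data))
    return message.audio.transcript

# Usage, assuming the `response` from the example above:
# transcript = save_audio_reply(response.choices[0].message, "reply.wav")
# print(transcript)
```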

Response Format with Audio

Below is an example JSON data structure for a message you might receive from a /chat/completions endpoint when requesting audio output.

{
  "index": 0,
  "message": {
    "role": "assistant",
    "content": null,
    "refusal": null,
    "audio": {
      "id": "audio_abc123",
      "expires_at": 1729018505,
      "data": "<bytes omitted>",
      "transcript": "Yes, golden retrievers are known to be ..."
    }
  },
  "finish_reason": "stop"
}
  • audio: If the audio output modality is requested, this object contains data about the audio response from the model.
    • audio.id: Unique identifier for the audio response
    • audio.expires_at: The Unix timestamp (in seconds) for when this audio response will no longer be accessible on the server for use in multi-turn conversations.
    • audio.data: Base64 encoded audio bytes generated by the model, in the format specified in the request.
    • audio.transcript: Transcript of the audio generated by the model.
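Because audio.data can be large, OpenAI's Chat Completions API lets later turns reference a previous audio reply by its id rather than re-sending the base64 payload, for as long as the audio has not passed audio.expires_at. A minimal sketch, assuming this endpoint supports that same pattern; the helper name build_followup_messages is illustrative, and "audio_abc123" is the example id from the response above:

```python
def build_followup_messages(first_question, audio_id, followup_question):
    # Reference the assistant's earlier audio reply by id instead of
    # re-sending its base64 data; valid until audio.expires_at.
    return [
        {"role": "user", "content": first_question},
        {"role": "assistant", "audio": {"id": audio_id}},
        {"role": "user", "content": followup_question},
    ]

# Usage, assuming the `client` from the examples above:
# response = client.chat.completions.create(
#     model="openai/gpt-4o-audio-preview",
#     modalities=["text", "audio"],
#     audio={"voice": "alloy", "format": "wav"},
#     messages=build_followup_messages(
#         "Is a golden retriever a good family dog?",
#         "audio_abc123",
#         "Why do you say that?",
#     ),
# )
```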