
/completions

Overview

| Feature | Supported | Notes |
|---|---|---|
| Cost Tracking | ✅ | Works with all supported models |
| Logging | ✅ | Works across all integrations |
| Streaming | ✅ | |
| Fallbacks | ✅ | Works between supported models |
| Loadbalancing | ✅ | Works between supported models |
| Supported Providers | openai, azure | |

Quick Start

Python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

response = client.completions.create(
    model="text-completion-openai/gpt-3.5-turbo-instruct",
    prompt="Say this is a test",
    max_tokens=7
)

print(response)

cURL

curl https://api.haimaker.ai/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "text-completion-openai/gpt-3.5-turbo-instruct",
    "prompt": "Say this is a test",
    "max_tokens": 7
  }'

Input Params

haimaker accepts and translates the OpenAI Text Completion params across all supported providers; a request combining several of them is sketched after the field lists below.

Required Fields

  • model: string - ID of the model to use
  • prompt: string or array - The prompt(s) to generate completions for

Optional Fields

  • best_of: integer - Generates best_of completions server-side and returns the "best" one.
  • echo: boolean - Echo back the prompt in addition to the completion.
  • frequency_penalty: number - Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency.
  • logit_bias: map - Modify the likelihood of specified tokens appearing in the completion.
  • logprobs: integer - Include the log probabilities on the logprobs most likely tokens. Max value of 5.
  • max_tokens: integer - The maximum number of tokens to generate.
  • n: integer - How many completions to generate for each prompt.
  • presence_penalty: number - Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far.
  • seed: integer - If specified, the system will attempt to sample deterministically.
  • stop: string or array - Up to 4 sequences where the API will stop generating tokens.
  • stream: boolean - Whether to stream back partial progress. Defaults to false.
  • suffix: string - The suffix that comes after a completion of inserted text.
  • temperature: number - What sampling temperature to use, between 0 and 2.
  • top_p: number - An alternative to sampling with temperature, called nucleus sampling.
  • user: string - A unique identifier representing your end-user.
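
The optional fields combine freely with the Quick Start request. Below is a minimal sketch reusing the client from the Quick Start; the parameter values are illustrative, not recommendations.

Python

response = client.completions.create(
    model="text-completion-openai/gpt-3.5-turbo-instruct",
    prompt=["Say this is a test", "Say this is another test"],  # prompt accepts a string or an array
    max_tokens=7,
    temperature=0.2,     # lower values make sampling more deterministic
    stop=["\n"],         # stop generating at the first newline
    n=1,                 # one completion per prompt
    user="end-user-123"  # illustrative end-user identifier
)

print(response)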

Output Format

Responses follow OpenAI's text completion output format.

Non-Streaming Response

{
  "id": "cmpl-uqkvlQyYK7bGYrRHQ0eXlWi7",
  "object": "text_completion",
  "created": 1589478378,
  "model": "gpt-3.5-turbo-instruct",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "text": "\n\nThis is indeed a test",
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 7,
    "total_tokens": 12
  }
}
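
Because the response matches OpenAI's schema, fields can be read directly off the SDK object from the Quick Start; a minimal sketch:

Python

choice = response.choices[0]
print(choice.text)                  # the generated completion
print(choice.finish_reason)         # e.g. "length" when max_tokens is reached
print(response.usage.total_tokens)  # prompt_tokens + completion_tokens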

Streaming Response

{
  "id": "cmpl-7iA7iJjj8V2zOkCGvWF2hAkDWBQZe",
  "object": "text_completion",
  "created": 1690759702,
  "choices": [
    {
      "text": "This",
      "index": 0,
      "logprobs": null,
      "finish_reason": null
    }
  ],
  "model": "gpt-3.5-turbo-instruct",
  "system_fingerprint": "fp_44709d6fcb"
}
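
To receive chunks in this shape from the SDK, set stream=True and iterate over the result; a minimal sketch reusing the client from the Quick Start:

Python

stream = client.completions.create(
    model="text-completion-openai/gpt-3.5-turbo-instruct",
    prompt="Say this is a test",
    max_tokens=7,
    stream=True
)

# finish_reason is null on intermediate chunks and set on the final one
for chunk in stream:
    print(chunk.choices[0].text, end="", flush=True)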

Supported Providers

| Provider | Notes |
|---|---|
| OpenAI | Use text-completion-openai/ prefix |
| Azure OpenAI | Use azure/ prefix |
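
Switching providers only changes the model prefix. A minimal sketch for Azure OpenAI, assuming a text-completion deployment exists; the deployment name below is a placeholder:

Python

response = client.completions.create(
    model="azure/my-instruct-deployment",  # placeholder: your Azure deployment name
    prompt="Say this is a test",
    max_tokens=7
)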