
/completions

Overview

| Feature | Supported | Notes |
|---|---|---|
| Cost Tracking | ✅ | Works with all supported models |
| Logging | ✅ | Works across all integrations |
| Streaming | ✅ | |
| Fallbacks | ✅ | Works between supported models |
| Loadbalancing | ✅ | Works between supported models |
| Supported Providers | openai, azure | |

Quick Start

Python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.haimaker.ai/v1"
)

response = client.completions.create(
    model="text-completion-openai/gpt-3.5-turbo-instruct",
    prompt="Say this is a test",
    max_tokens=7
)

print(response)

cURL

curl https://api.haimaker.ai/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "text-completion-openai/gpt-3.5-turbo-instruct",
    "prompt": "Say this is a test",
    "max_tokens": 7
  }'

Input Params

haimaker accepts and translates the OpenAI Text Completion params across all supported providers; a request combining several of them is sketched after the field lists below.

Required Fields

  • model: string - ID of the model to use
  • prompt: string or array - The prompt(s) to generate completions for

Optional Fields

  • best_of: integer - Generates best_of completions server-side and returns the "best" one.
  • echo: boolean - Echo back the prompt in addition to the completion.
  • frequency_penalty: number - Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency.
  • logit_bias: map - Modify the likelihood of specified tokens appearing in the completion.
  • logprobs: integer - Include the log probabilities on the logprobs most likely tokens. Max value of 5.
  • max_tokens: integer - The maximum number of tokens to generate.
  • n: integer - How many completions to generate for each prompt.
  • presence_penalty: number - Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far.
  • seed: integer - If specified, the system will attempt to sample deterministically.
  • stop: string or array - Up to 4 sequences where the API will stop generating tokens.
  • stream: boolean - Whether to stream back partial progress. Defaults to false.
  • suffix: string - The suffix that comes after a completion of inserted text.
  • temperature: number - What sampling temperature to use, between 0 and 2.
  • top_p: number - An alternative to sampling with temperature, called nucleus sampling.
  • user: string - A unique identifier representing your end-user.
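
The optional fields combine freely with the Quick Start request. Below is a minimal sketch reusing the client from the Quick Start; the parameter values are illustrative, not recommendations.

Python

response = client.completions.create(
    model="text-completion-openai/gpt-3.5-turbo-instruct",
    prompt=["Say this is a test", "Say this is another test"],  # prompt accepts a string or an array
    max_tokens=7,
    temperature=0.2,     # lower values make sampling more deterministic
    stop=["\n"],         # stop generating at the first newline
    n=1,                 # one completion per prompt
    user="end-user-123"  # illustrative end-user identifier
)

print(response)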

Output Format

Responses follow OpenAI's text completion output format.

Non-Streaming Response

{
  "id": "cmpl-uqkvlQyYK7bGYrRHQ0eXlWi7",
  "object": "text_completion",
  "created": 1589478378,
  "model": "gpt-3.5-turbo-instruct",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "text": "\n\nThis is indeed a test",
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 7,
    "total_tokens": 12
  }
}
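
Because the response matches OpenAI's schema, fields can be read directly off the SDK object from the Quick Start; a minimal sketch:

Python

choice = response.choices[0]
print(choice.text)                  # the generated completion
print(choice.finish_reason)         # e.g. "length" when max_tokens is reached
print(response.usage.total_tokens)  # prompt_tokens + completion_tokens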

Streaming Response

{
  "id": "cmpl-7iA7iJjj8V2zOkCGvWF2hAkDWBQZe",
  "object": "text_completion",
  "created": 1690759702,
  "choices": [
    {
      "text": "This",
      "index": 0,
      "logprobs": null,
      "finish_reason": null
    }
  ],
  "model": "gpt-3.5-turbo-instruct",
  "system_fingerprint": "fp_44709d6fcb"
}
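
To receive chunks in this shape from the SDK, set stream=True and iterate over the result; a minimal sketch reusing the client from the Quick Start:

Python

stream = client.completions.create(
    model="text-completion-openai/gpt-3.5-turbo-instruct",
    prompt="Say this is a test",
    max_tokens=7,
    stream=True
)

# finish_reason is null on intermediate chunks and set on the final one
for chunk in stream:
    print(chunk.choices[0].text, end="", flush=True)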

Supported Providers

| Provider | Notes |
|---|---|
| OpenAI | Use text-completion-openai/ prefix |
| Azure OpenAI | Use azure/ prefix |
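
Switching providers only changes the model prefix. A minimal sketch for Azure OpenAI, assuming a text-completion deployment exists; the deployment name below is a placeholder:

Python

response = client.completions.create(
    model="azure/my-instruct-deployment",  # placeholder: your Azure deployment name
    prompt="Say this is a test",
    max_tokens=7
)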