
v1.70.1-stable - Gemini Realtime API Support

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaffer
CTO, LiteLLM

Deploy this version

docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
docker.litellm.ai/berriai/litellm:main-v1.70.1-stable

Key Highlights

LiteLLM v1.70.1-stable is live now. Here are the key highlights of this release:

  • Gemini Realtime API: You can now call Gemini's Live API via the OpenAI /v1/realtime API
  • Spend Logs Retention Period: Enable deleting spend logs older than a certain period
  • PII Masking 2.0: Easily configure masking or blocking specific PII/PHI entities on the UI

Gemini Realtime API

This release brings support for calling Gemini's realtime models (e.g. gemini-2.0-flash-live) via OpenAI's /v1/realtime API. Developers can switch from OpenAI to Gemini by changing only the model name.

Key Highlights:

  • Support for text + audio input/output
  • Support for setting session configurations (modality, instructions, activity detection) in the OpenAI format
  • Support for logging + usage tracking for realtime sessions

This is currently supported via Google AI Studio. We plan to release VertexAI support over the coming week.
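
A rough client-side sketch, assuming a LiteLLM proxy on localhost:4000 with a model named gemini-2.0-flash-live configured (both placeholders - adjust for your deployment); the OpenAI Python SDK's realtime client is pointed at the proxy unchanged:

```python
# Sketch: Gemini Live via the OpenAI-format /v1/realtime route on a LiteLLM
# proxy. Proxy URL, key, and model name are placeholders for your setup.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="http://localhost:4000/v1",  # LiteLLM proxy, not api.openai.com
    api_key="sk-1234",                    # your LiteLLM virtual key
)

async def main() -> None:
    # /v1/realtime is a websocket session; the SDK wraps it in an async
    # context manager. Only the model name differs from an OpenAI call.
    async with client.beta.realtime.connect(model="gemini-2.0-flash-live") as conn:
        # Session config is sent in the OpenAI format; LiteLLM translates
        # it to the Gemini Live API equivalents.
        await conn.session.update(session={"modalities": ["text"]})
        await conn.conversation.item.create(
            item={
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": "Say hello!"}],
            }
        )
        await conn.response.create()
        async for event in conn:
            if event.type == "response.text.delta":
                print(event.delta, end="", flush=True)
            elif event.type == "response.done":
                break

asyncio.run(main())
```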

Read more

Spend Logs Retention Period

This release enables deleting LiteLLM Spend Logs older than a certain period. Since we now enable storing the raw request/response in the logs, deleting old logs ensures the database remains performant in production.
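
Retention is configured in the proxy's config.yaml. A minimal sketch, assuming the key name below (verify it against the linked docs for your version):

```yaml
general_settings:
  # Spend logs older than this window are deleted (assumed key name).
  maximum_spend_logs_retention_period: "7d"
```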

Read more

PII Masking 2.0

This release brings improvements to our Presidio PII Integration. As a Proxy Admin, you now have the ability to:

  • Mask or block specific entities (e.g., block medical licenses while masking other entities like emails) - see the config sketch after this list.
  • Monitor guardrails in production. LiteLLM Logs now show the guardrail run, the entities it detected, and the confidence score for each entity.
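
For example, a Presidio guardrail along these lines would mask emails while blocking medical licenses. This is a sketch: the pii_entities_config key and the MASK/BLOCK action values are assumptions based on this release - confirm against the linked docs:

```yaml
guardrails:
  - guardrail_name: "presidio-pii"       # any name; referenced per-request
    litellm_params:
      guardrail: presidio
      mode: "pre_call"                   # run before the LLM call
      pii_entities_config:
        EMAIL_ADDRESS: "MASK"            # replace emails with a placeholder
        MEDICAL_LICENSE: "BLOCK"         # reject the request entirely
```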

Read more

New Models / Updated Models

  • Gemini (VertexAI + Google AI Studio)
    • /chat/completion
      • Handle audio input - PR
      • Fix maximum recursion depth issue when using deeply nested response schemas with Vertex AI by increasing DEFAULT_MAX_RECURSE_DEPTH from 10 to 100 in constants - PR
      • Capture reasoning tokens in streaming mode - PR
  • Google AI Studio
    • /realtime
      • Gemini Multimodal Live API support
      • Audio input/output support, optional param mapping, accurate usage calculation - PR
  • VertexAI
    • /chat/completion
      • Fix llama streaming error where the model response was nested in the returned streaming chunk - PR
  • Ollama
    • /chat/completion
      • Structured responses fix - PR
  • Bedrock
    • /chat/completion
      • Handle thinking_blocks when assistant.content is None - PR
      • Fix to only allow accepted fields in the tool JSON schema - PR
      • Add Bedrock Sonnet prompt caching cost information
      • Mistral Pixtral support - PR
      • Tool caching support - PR
    • /messages
      • Allow using dynamic AWS params - PR
  • Nvidia NIM
  • Novita AI
    • New Provider added for /chat/completion routes - PR
  • Azure
  • Cohere
    • /embeddings
      • Migrate embedding to use /v2/embed - adds support for the output_dimensions param (see the sketch after this list) - PR
  • Anthropic
  • VLLM
    • /embeddings
      • Support embedding input as a list of integers
  • OpenAI
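
For the Cohere change above, a sketch of requesting reduced-dimension embeddings through the LiteLLM SDK; the model name and the mapping of the OpenAI-style dimensions param onto Cohere's output dimension are assumptions:

```python
# Sketch: smaller Cohere embeddings via /v2/embed. Model name and param
# mapping are assumptions - check the Cohere provider docs.
import litellm

response = litellm.embedding(
    model="cohere/embed-v4.0",
    input=["hello world"],
    dimensions=512,  # assumed to map to Cohere's output dimension
)
print(len(response.data[0]["embedding"]))  # expect 512 if supported
```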

LLM API Endpoints

  • Responses API
    • Fix delete API support - PR
  • Rerank API
    • /v2/rerank now registered as ‘llm_api_route’ - enabling non-admins to call it - PR

Spend Tracking Improvements

  • /chat/completion, /messages
    • Anthropic - web search tool cost tracking - PR
    • Groq - update model max tokens + cost information - PR
  • /audio/transcription
    • Azure - Add gpt-4o-mini-tts pricing - PR
    • Proxy - Fix tracking spend by tag - PR
  • /embeddings
    • Azure AI - Add cohere embed v4 pricing - PR

Management Endpoints / UI

Logging / Alerting Integrations

Guardrails

  • Guardrails
    • New /apply_guardrail endpoint for directly testing a guardrail (request sketch after this list) - PR
  • Lakera
    • /v2 endpoints support - PR
  • Presidio
    • Fix handling of message content in the Presidio guardrail integration - PR
    • Allow specifying PII Entities Config - PR
  • Aim Security
    • Support for anonymization in AIM Guardrails - PR
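
A sketch of hitting the new /apply_guardrail endpoint directly; the path, auth header, and body fields are assumptions - see the linked PR for the exact schema:

```python
# Sketch: directly testing a configured guardrail without an LLM call.
# Endpoint path and request fields are assumptions - verify in the PR/docs.
import requests

resp = requests.post(
    "http://localhost:4000/apply_guardrail",
    headers={"Authorization": "Bearer sk-1234"},   # your LiteLLM key
    json={
        "guardrail_name": "presidio-pii",          # name from your config
        "text": "My email is jane@example.com",    # content to screen
    },
    timeout=30,
)
print(resp.json())  # e.g. masked text / detected entities
```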

Performance / Loadbalancing / Reliability improvements

General Proxy Improvements

  • Authentication
    • Handle Bearer $LITELLM_API_KEY in the x-litellm-api-key custom header (see the sketch after this list) - PR
  • New enterprise pip package - litellm-enterprise - fixes an issue where the enterprise folder was not found when using the pip package
  • Proxy CLI
    • Add models import command - PR
  • OpenWebUI
    • Configure LiteLLM to Parse User Headers from Open Web UI
  • LiteLLM Proxy w/ LiteLLM SDK
    • Option to force/always use the litellm proxy when calling via LiteLLM SDK
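
For the Authentication item above, a sketch of the now-accepted header form; the proxy URL, key, and model are placeholders:

```python
# Sketch: the x-litellm-api-key custom header now tolerates a
# "Bearer "-prefixed key instead of requiring the bare key.
import requests

resp = requests.post(
    "http://localhost:4000/v1/chat/completions",
    headers={"x-litellm-api-key": "Bearer sk-1234"},  # previously bare "sk-1234"
    json={
        "model": "gemini-2.0-flash",  # any model configured on the proxy
        "messages": [{"role": "user", "content": "hi"}],
    },
    timeout=30,
)
print(resp.json()["choices"][0]["message"]["content"])
```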

New Contributors

Demo Instance

Here's a Demo Instance to test changes:

Git Diff