Skip to main content

v1.80.5-stable - Gemini 3.0 Support

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaff
CTO, LiteLLM

Deploy this version

docker run litellm
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
docker.litellm.ai/berriai/litellm:v1.80.5-stable

Key Highlights


Prompt Management



This release introduces LiteLLM Prompt Studio - a comprehensive prompt management solution built directly into the LiteLLM UI. Create, test, and version your prompts without leaving your browser.

You can now do the following on LiteLLM Prompt Studio:

  • Create & Test Prompts: Build prompts with developer messages (system instructions) and test them in real-time with an interactive chat interface
  • Dynamic Variables: Use {{variable_name}} syntax to create reusable prompt templates with automatic variable detection
  • Version Control: Automatic versioning for every prompt update with complete version history tracking and rollback capabilities
  • Prompt Studio: Edit prompts in a dedicated studio environment with live testing and preview

API Integration:

Use your prompts in any application with simple API calls:

response = client.chat.completions.create(
model="gpt-4",
extra_body={
"prompt_id": "your-prompt-id",
"prompt_version": 2, # Optional: specify version
"prompt_variables": {"name": "value"} # Optional: pass variables
}
)

Get started here: LiteLLM Prompt Management Documentation


Performance – /realtime 182× Lower p99 Latency

This update reduces /realtime latency by removing redundant encodings on the hot path, reusing shared SSL contexts, and caching formatting strings that were being regenerated twice per request despite rarely changing.

Results

MetricBeforeAfterImprovement
Median latency2,200 ms59 ms−97% (~37× faster)
p95 latency8,500 ms67 ms−99% (~127× faster)
p99 latency18,000 ms99 ms−99% (~182× faster)
Average latency3,214 ms63 ms−98% (~51× faster)
RPS1651,207+631% (~7.3× increase)

Test Setup

CategorySpecification
Load TestingLocust: 1,000 concurrent users, 500 ramp-up
System4 vCPUs, 8 GB RAM, 4 workers, 4 instances
DatabasePostgreSQL (Redis unused)
Configurationconfig.yaml
Load Scriptno_cache_hits.py

Model Compare UI

New interactive playground UI enables side-by-side comparison of multiple LLM models, making it easy to evaluate and compare model responses.

Features:

  • Compare responses from multiple models in real-time
  • Side-by-side view with synchronized scrolling
  • Support for all LiteLLM-supported models
  • Cost tracking per model
  • Response time comparison
  • Pre-configured prompts for quick and easy testing

Details:

  • Parameterization: Configure API keys, endpoints, models, and model parameters, as well as interaction types (chat completions, embeddings, etc.)

  • Model Comparison: Compare up to 3 different models simultaneously with side-by-side response views

  • Comparison Metrics: View detailed comparison information including:

    • Time To First Token
    • Input / Output / Reasoning Tokens
    • Total Latency
    • Cost (if enabled in config)
  • Safety Filters: Configure and test guardrails (safety filters) directly in the playground interface

Get Started with Model Compare

New Providers and Endpoints

New Providers

ProviderSupported EndpointsDescription
Docker Model Runner/v1/chat/completionsRun LLM models in Docker containers

New Models / Updated Models

New Model Support

ProviderModelContext WindowInput ($/1M tokens)Output ($/1M tokens)Features
Azureazure/gpt-5.1272K$1.38$11.00Reasoning, vision, PDF input, responses API
Azureazure/gpt-5.1-2025-11-13272K$1.38$11.00Reasoning, vision, PDF input, responses API
Azureazure/gpt-5.1-codex272K$1.38$11.00Responses API, reasoning, vision
Azureazure/gpt-5.1-codex-2025-11-13272K$1.38$11.00Responses API, reasoning, vision
Azureazure/gpt-5.1-codex-mini272K$0.275$2.20Responses API, reasoning, vision
Azureazure/gpt-5.1-codex-mini-2025-11-13272K$0.275$2.20Responses API, reasoning, vision
Azure EUazure/eu/gpt-5-2025-08-07272K$1.375$11.00Reasoning, vision, PDF input
Azure EUazure/eu/gpt-5-mini-2025-08-07272K$0.275$2.20Reasoning, vision, PDF input
Azure EUazure/eu/gpt-5-nano-2025-08-07272K$0.055$0.44Reasoning, vision, PDF input
Azure EUazure/eu/gpt-5.1272K$1.38$11.00Reasoning, vision, PDF input, responses API
Azure EUazure/eu/gpt-5.1-codex272K$1.38$11.00Responses API, reasoning, vision
Azure EUazure/eu/gpt-5.1-codex-mini272K$0.275$2.20Responses API, reasoning, vision
Geminigemini-3-pro-preview2M$1.25$5.00Reasoning, vision, function calling
Geminigemini-3-pro-image2M$1.25$5.00Image generation, reasoning
OpenRouteropenrouter/deepseek/deepseek-v3p1-terminus164K$0.20$0.40Function calling, reasoning
OpenRouteropenrouter/moonshot/kimi-k2-instruct262K$0.60$2.50Function calling, web search
OpenRouteropenrouter/gemini/gemini-3-pro-preview2M$1.25$5.00Reasoning, vision, function calling
XAIxai/grok-4.1-fast2M$0.20$0.50Reasoning, function calling
Together AItogether_ai/z-ai/glm-4.6203K$0.40$1.75Function calling, reasoning
Cerebrascerebras/gpt-oss-120b131K$0.60$0.60Function calling
Bedrockanthropic.claude-sonnet-4-5-20250929-v1:0200K$3.00$15.00Computer use, reasoning, vision

Features

  • Gemini (Google AI Studio + Vertex AI)

    • Add Day 0 gemini-3-pro-preview support - PR #16719
    • Add support for Gemini 3 Pro Image model - PR #16938
    • Add reasoning_content to streaming responses with tools enabled - PR #16854
    • Add includeThoughts=True for Gemini 3 reasoning_effort - PR #16838
    • Support thought signatures for Gemini 3 in responses API - PR #16872
    • Correct wrong system message handling for gemma - PR #16767
    • Gemini 3 Pro Image: capture image_tokens and support cost_per_output_image - PR #16912
    • Fix missing costs for gemini-2.5-flash-image - PR #16882
    • Gemini 3 thought signatures in tool call id - PR #16895
  • Azure

    • Add azure gpt-5.1 models - PR #16817
    • Add Azure models 2025 11 to cost maps - PR #16762
    • Update Azure Pricing - PR #16371
    • Add SSML Support for Azure Text-to-Speech (AVA) - PR #16747
  • OpenAI

    • Support GPT-5.1 reasoning.effort='none' in proxy - PR #16745
    • Add gpt-5.1-codex and gpt-5.1-codex-mini models to documentation - PR #16735
    • Inherit BaseVideoConfig to enable async content response for OpenAI video - PR #16708
  • Anthropic

    • Add support for strict parameter in Anthropic tool schemas - PR #16725
    • Add image as url support to anthropic - PR #16868
    • Add thought signature support to v1/messages api - PR #16812
    • Anthropic - support Structured Outputs output_format for Claude 4.5 sonnet and Opus 4.1 - PR #16949
  • Bedrock

    • Haiku 4.5 correct Bedrock configs - PR #16732
    • Ensure consistent chunk IDs in Bedrock streaming responses - PR #16596
    • Add Claude 4.5 to US Gov Cloud - PR #16957
    • Fix images being dropped from tool results for bedrock - PR #16492
  • Vertex AI

    • Add Vertex AI Image Edit Support - PR #16828
    • Update veo 3 pricing and add prod models - PR #16781
    • Fix Video download for veo3 - PR #16875
  • Snowflake

    • Snowflake provider support: added embeddings, PAT, account_id - PR #15727
  • OCI

    • Add oci_endpoint_id Parameter for OCI Dedicated Endpoints - PR #16723
  • XAI

    • Add support for Grok 4.1 Fast models - PR #16936
  • Together AI

  • Cerebras

    • Fix Cerebras GPT-OSS-120B model name - PR #16939

Bug Fixes

  • OpenAI

    • Fix for 16863 - openai conversion from responses to completions - PR #16864
    • Revert "Make all gpt-5 and reasoning models to responses by default" - PR #16849
  • General

    • Get custom_llm_provider from query param - PR #16731
    • Fix optional param mapping - PR #16852
    • Add None check for litellm_params - PR #16754

LLM API Endpoints

Features

Bugs

  • General
    • Responses API cost tracking with custom deployment names - PR #16778
    • Trim logged response strings in spend-logs - PR #16654

Management Endpoints / UI

Features

  • Proxy CLI Auth

    • Allow using JWTs for signing in with Proxy CLI - PR #16756
  • Virtual Keys

    • Fix Key Model Alias Not Working - PR #16896
  • Models + Endpoints

    • Add additional model settings to chat models in test key - PR #16793
    • Deactivate delete button on model table for config models - PR #16787
    • Change Public Model Hub to use proxyBaseUrl - PR #16892
    • Add JSON Viewer to request/response panel - PR #16687
    • Standarize icon images - PR #16837
  • Teams

  • Fallbacks

    • Fallbacks icon button tooltips and delete with friction - PR #16737
  • MCP Servers

    • Delete user and MCP Server Modal, MCP Table Tooltips - PR #16751
  • Callbacks

    • Expose backend endpoint for callbacks settings - PR #16698
    • Edit add callbacks route to use data from backend - PR #16699
  • Usage & Analytics

    • Allow partial matches for user ID in User Table - PR #16952
  • General UI

    • Allow setting base_url in API reference docs - PR #16674
    • Change /public fields to honor server root path - PR #16930
    • Correct ui build - PR #16702
    • Enable automatic dark/light mode based on system preference - PR #16748

Bugs

  • UI Fixes

    • Fix flaky tests due to antd Notification Manager - PR #16740
    • Fix UI MCP Tool Test Regression - PR #16695
    • Fix edit logging settings not appearing - PR #16798
    • Add css to truncate long request ids in request viewer - PR #16665
    • Remove azure/ prefix in Placeholder for Azure in Add Model - PR #16597
    • Remove UI Session Token from user/info return - PR #16851
    • Remove console logs and errors from model tab - PR #16455
    • Change Bulk Invite User Roles to Match Backend - PR #16906
    • Mock Tremor's Tooltip to Fix Flaky UI Tests - PR #16786
    • Fix e2e ui playwright test - PR #16799
    • Fix Tests in CI/CD - PR #16972
  • SSO

    • Ensure role from SSO provider is used when a user is inserted onto LiteLLM - PR #16794
    • Docs - SSO - Manage User Roles via Azure App Roles - PR #16796
  • Auth

    • Ensure Team Tags works when using JWT Auth - PR #16797
    • Fix key never expires - PR #16692
  • Swagger UI

    • Fixes Swagger UI resolver errors for chat completion endpoints caused by Pydantic v2 $defs not being properly exposed in the OpenAPI schema - PR #16784

AI Integrations

Logging

Guardrails

Prompt Management

  • Prompt Management
    • Allow specifying just prompt_id in a request to a model - PR #16834
    • Add support for versioning prompts - PR #16836
    • Allow storing prompt version in DB - PR #16848
    • Add UI for editing the prompts - PR #16853
    • Allow testing prompts with Chat UI - PR #16898
    • Allow viewing version history - PR #16901
    • Allow specifying prompt version in code - PR #16929
    • UI, allow seeing model, prompt id for Prompt - PR #16932
    • Show "get code" section for prompt management + minor polish of showing version history - PR #16941

Secret Managers


MCP Gateway

  • MCP Hub - Publish/discover MCP Servers within a company - PR #16857
  • MCP Resources - MCP resources support - PR #16800
  • MCP OAuth - Docs - mcp oauth flow details - PR #16742
  • MCP Lifecycle - Drop MCPClient.connect and use run_with_session lifecycle - PR #16696
  • MCP Server IDs - Add mcp server ids - PR #16904
  • MCP URL Format - Fix mcp url format - PR #16940

Performance / Loadbalancing / Reliability improvements

  • Realtime Endpoint Performance - Fix bottlenecks degrading realtime endpoint performance - PR #16670
  • SSL Context Caching - Cache SSL contexts to prevent excessive memory allocation - PR #16955
  • Cache Optimization - Fix cache cooldown key generation - PR #16954
  • Router Cache - Fix routing for requests with same cacheable prefix but different user messages - PR #16951
  • Redis Event Loop - Fix redis event loop closed at first call - PR #16913
  • Dependency Management - Upgrade pydantic to version 2.11.0 - PR #16909

Documentation Updates

  • Provider Documentation

    • Add missing details to benchmark comparison - PR #16690
    • Fix anthropic pass-through endpoint - PR #16883
    • Cleanup repo and improve AI docs - PR #16775
  • API Documentation

    • Add docs related to openai metadata - PR #16872
    • Update docs with all supported endpoints and cost tracking - PR #16872
  • General Documentation

    • Add mini-swe-agent to Projects built on LiteLLM - PR #16971

Infrastructure / CI/CD

  • UI Testing

  • Dependency Management

    • Bump js-yaml from 3.14.1 to 3.14.2 in /tests/proxy_admin_ui_tests/ui_unit_tests - PR #16755
    • Bump js-yaml from 3.14.1 to 3.14.2 - PR #16802
  • Migration

  • Config

  • Release Notes

    • Add perf improvements on embeddings to release notes - PR #16697
    • Docs - v1.80.0 - PR #16694
  • Investigation


New Contributors

  • @mattmorgis made their first contribution in PR #16371
  • @mmandic-coatue made their first contribution in PR #16732
  • @Bradley-Butcher made their first contribution in PR #16725
  • @BenjaminLevy made their first contribution in PR #16757
  • @CatBraaain made their first contribution in PR #16767
  • @tushar8408 made their first contribution in PR #16831
  • @nbsp1221 made their first contribution in PR #16845
  • @idola9 made their first contribution in PR #16832
  • @nkukard made their first contribution in PR #16864
  • @alhuang10 made their first contribution in PR #16852
  • @sebslight made their first contribution in PR #16838
  • @TsurumaruTsuyoshi made their first contribution in PR #16905
  • @cyberjunk made their first contribution in PR #16492
  • @colinlin-stripe made their first contribution in PR #16895
  • @sureshdsk made their first contribution in PR #16883
  • @eiliyaabedini made their first contribution in PR #16875
  • @justin-tahara made their first contribution in PR #16957
  • @wangsoft made their first contribution in PR #16913
  • @dsduenas made their first contribution in PR #16891

Known Issues

  • /audit and /user/available_users routes return 404. Fixed in PR #17337

Full Changelog

View complete changelog on GitHub