Skip to main content

v1.76.3-stable - Performance, Video Generation & CloudZero Integration

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaffer
CTO, LiteLLM
warning

This release has a known issue where startup is leading to Out of Memory errors when deploying on Kubernetes. We recommend waiting before upgrading to this version.

Deploy this version

docker run litellm
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
docker.litellm.ai/berriai/litellm:v1.76.3

Key Highlights

  • Major Performance Improvements +400 RPS when using correct amount of workers + CPU cores combination
  • Video Generation Support - Added Google AI Studio and Vertex AI Veo Video Generation through LiteLLM Pass through routes
  • CloudZero Integration - New cost tracking integration for exporting LiteLLM Usage and Spend data to CloudZero.

Major Changes

  • Performance Optimization: LiteLLM Proxy now achieves +400 RPS when using correct amount of CPU cores - PR #14153, PR #14242

    By default, LiteLLM will now use num_workers = os.cpu_count() to achieve optimal performance.

    Override Options:

    Set environment variable:

    DEFAULT_NUM_WORKERS_LITELLM_PROXY=1

    Or start LiteLLM Proxy with:

    litellm --num_workers 1
  • Security Fix: Fixed memory_usage_in_mem_cache cache endpoint vulnerability - PR #14229


Performance Improvements

This release includes significant performance optimizations. On our internal benchmarks we saw 1 instance get +400 RPS when using correct amount of workers + CPU cores combination.

  • +400 RPS Performance Boost - LiteLLM Proxy now uses correct amount of CPU cores for optimal performance - PR #14153
  • Default CPU Workers - Changed DEFAULT_NUM_WORKERS_LITELLM_PROXY default to number of CPUs - PR #14242

New Models / Updated Models

New Model Support

ProviderModelContext WindowInput ($/1M tokens)Output ($/1M tokens)Features
OpenRouteropenrouter/openai/gpt-4.11M$2.00$8.00Chat completions with vision
OpenRouteropenrouter/openai/gpt-4.1-mini1M$0.40$1.60Efficient chat completions
OpenRouteropenrouter/openai/gpt-4.1-nano1M$0.10$0.40Ultra-efficient chat
Vertex AIvertex_ai/openai/gpt-oss-20b-maas131K$0.075$0.30Reasoning support
Vertex AIvertex_ai/openai/gpt-oss-120b-maas131K$0.15$0.60Advanced reasoning
Geminigemini/veo-3.0-generate-preview1K-$0.75/secVideo generation
Geminigemini/veo-3.0-fast-generate-preview1K-$0.40/secFast video generation
Geminigemini/veo-2.0-generate-0011K-$0.35/secVideo generation
Volcenginedoubao-embedding-large4KFreeFree2048-dim embeddings
Together AItogether_ai/deepseek-ai/DeepSeek-V3.1128K$0.60$1.70Reasoning support

Features

Bug Fixes

New Provider Support

  • Volcengine
    • Added Volcengine embedding module with handler and transformation logic - PR #14028

LLM API Endpoints

Features

Bugs

  • General
    • Remove "/" or ":" from model name when being used as h11 header name - PR #14191
    • Bug fix for openai.gpt-oss when using reasoning_effort parameter - PR #14300

Spend Tracking, Budgets and Rate Limiting

Features

  • Added header support for spend_logs_metadata - PR #14186
  • Litellm passthrough cost tracking for chat completion - PR #14256

Bug Fixes

  • Fixed TPM Rate Limit Bug - PR #14237
  • Fixed Key Budget not resets at expectable times - PR #14241

Management Endpoints / UI

Features

  • UI Improvements
    • Logs page screen size fixed - PR #14135
    • Create Organization Tooltip added on Success - PR #14132
    • Back to Keys should say Back to Logs - PR #14134
    • Add client side pagination on All Models table - PR #14136
    • Model Filters UI improvement - PR #14131
    • Remove table filter on user info page - PR #14169
    • Team name badge added on the User Details - PR #14003
    • Fix: Log page parameter passing error - PR #14193
  • Authentication & Authorization
    • Support for ES256/ES384/ES512 and EdDSA JWT verification - PR #14118
    • Ensure team_id is a required field for generating service account keys - PR #14270

Bugs

  • General
    • Validate store model in db setting - PR #14269

Logging / Guardrail Integrations

Features

Guardrails

  • Added guardrail to the Anthropic API endpoint - PR #14107

New Integration


Performance / Loadbalancing / Reliability improvements

Features

  • Performance
    • LiteLLM Proxy: +400 RPS when using correct amount of CPU cores - PR #14153
    • Allow using x-litellm-stream-timeout header for stream timeout in requests - PR #14147
    • Change DEFAULT_NUM_WORKERS_LITELLM_PROXY default to number CPUs - PR #14242
  • Monitoring
    • Added Prometheus missing metrics - PR #14139
  • Timeout
    • Stream Timeout Control - Allow using x-litellm-stream-timeout header for stream timeout in requests - PR #14147
  • Routing
    • Fixed x-litellm-tags not routing with Responses API - PR #14289

Bugs

  • Security
    • Fixed memory_usage_in_mem_cache cache endpoint vulnerability - PR #14229

General Proxy Improvements

Features

  • SCIM Support
    • Added better SCIM debugging - PR #14221
    • Bug fixes for handling SCIM Group Memberships - PR #14226
  • Kubernetes
    • Added optional PodDisruptionBudget for litellm proxy - PR #14093
  • Error Handling
    • Add model to azure error message - PR #14294

New Contributors

  • @iabhi4 made their first contribution in PR #14093
  • @zainhas made their first contribution in PR #14087
  • @LifeDJIK made their first contribution in PR #14146
  • @retanoj made their first contribution in PR #14133
  • @zhxlp made their first contribution in PR #14193
  • @kayoch1n made their first contribution in PR #14191
  • @kutsushitaneko made their first contribution in PR #14171
  • @mjmendo made their first contribution in PR #14176
  • @HarshavardhanK made their first contribution in PR #14213
  • @eycjur made their first contribution in PR #14207
  • @22mSqRi made their first contribution in PR #14241
  • @onlylhf made their first contribution in PR #14028
  • @btpemercier made their first contribution in PR #11319
  • @tremlin made their first contribution in PR #14287
  • @TobiMayr made their first contribution in PR #14262
  • @Eitan1112 made their first contribution in PR #14252

Full Changelog