
DAY 0 Support: Gemini 3 Flash on LiteLLM

Sameer Kankute
SWE @ LiteLLM (LLM Translation)
Krrish Dholakia
CEO, LiteLLM
Ishaan Jaff
CTO, LiteLLM

LiteLLM now supports gemini-3-flash-preview and all the new API changes along with it.

note

If you only need cost tracking, no change to your current LiteLLM version is required. If you want support for the new features introduced alongside the model, such as thinking levels, you will need v1.80.8-stable.1 or above.

Deploy this version

docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-v1.80.8-stable.1

What's New

1. New Thinking Levels: thinkingLevel with MINIMAL & MEDIUM

Gemini 3 Flash introduces granular thinking control with thinkingLevel instead of thinkingBudget.

  • MINIMAL: Ultra-lightweight thinking for fast responses
  • LOW: Light reasoning for simpler tasks
  • MEDIUM: Balanced thinking for complex reasoning
  • HIGH: Maximum reasoning depth

LiteLLM automatically maps the OpenAI reasoning_effort parameter to Gemini's thinkingLevel, so you can use familiar reasoning_effort values (minimal, low, medium, high) without changing your code!
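This mapping can be sketched as a simple lookup. The dictionary below is an illustration of the translation described above and in the mapping table at the end of this post, not LiteLLM's actual internals; the function name is hypothetical.

```python
# Sketch (not LiteLLM source) of mapping OpenAI's reasoning_effort
# values onto Gemini 3's thinkingLevel. "disable" and "none" fall
# back to minimal, per the mapping table at the end of this post.
REASONING_EFFORT_TO_THINKING_LEVEL = {
    "minimal": "minimal",
    "low": "low",
    "medium": "medium",
    "high": "high",
    "disable": "minimal",
    "none": "minimal",
}

def map_reasoning_effort(effort: str) -> str:
    """Translate an OpenAI-style reasoning_effort into a Gemini thinkingLevel."""
    try:
        return REASONING_EFFORT_TO_THINKING_LEVEL[effort]
    except KeyError:
        raise ValueError(f"Unsupported reasoning_effort: {effort!r}")
```

Because the translation happens inside LiteLLM, the same `reasoning_effort` value works unchanged across OpenAI and Gemini models.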

2. Thought Signatures

Like gemini-3-pro, this model also includes thought signatures for tool calls. LiteLLM handles signature extraction and embedding internally. Learn more about thought signatures.

Edge Case Handling: If thought signatures are missing in the request, LiteLLM adds a dummy signature so the API call doesn't break.
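The dummy-signature behavior can be pictured with a short sketch. The message shape and the `thought_signature` field name below are assumptions for illustration, not LiteLLM's actual internal representation.

```python
import base64

# Placeholder value; the real signature content is opaque to callers.
DUMMY_SIGNATURE = base64.b64encode(b"dummy").decode()

def ensure_thought_signatures(messages: list[dict]) -> list[dict]:
    """Backfill a placeholder signature on any assistant tool call
    that is missing one, so the upstream API call does not fail.
    Existing signatures are left untouched."""
    for message in messages:
        if message.get("role") != "assistant":
            continue
        for tool_call in message.get("tool_calls", []):
            tool_call.setdefault("thought_signature", DUMMY_SIGNATURE)
    return messages
```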


Supported Endpoints

LiteLLM provides full end-to-end support for Gemini 3 Flash on:

  • /v1/chat/completions - OpenAI-compatible chat completions endpoint
  • /v1/responses - OpenAI Responses API endpoint (streaming and non-streaming)
  • /v1/messages - Anthropic-compatible messages endpoint
  • /v1/generateContent - Google Gemini API compatible endpoint

All endpoints support:

  • Streaming and non-streaming responses
  • Function calling with thought signatures
  • Multi-turn conversations
  • All Gemini 3-specific features
  • Conversion of provider-specific thinking-related params to thinkingLevel
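To serve the model through the LiteLLM proxy on these endpoints, it can be added to `config.yaml`. The `model_name` alias below is arbitrary; the snippet assumes your Gemini key is in the `GEMINI_API_KEY` environment variable.

```yaml
model_list:
  - model_name: gemini-3-flash
    litellm_params:
      model: gemini/gemini-3-flash-preview
      api_key: os.environ/GEMINI_API_KEY
```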

Quick Start

Basic Usage with MEDIUM thinking (NEW)

from litellm import completion

# No need to make any changes to your code as we map openai reasoning param to thinkingLevel
response = completion(
    model="gemini/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Solve this complex math problem: 25 * 4 + 10"}],
    reasoning_effort="medium",  # NEW: MEDIUM thinking level
)

print(response.choices[0].message.content)

Key Features

Thinking Levels: MINIMAL, LOW, MEDIUM, HIGH
Thought Signatures: Track reasoning with unique identifiers
Seamless Integration: Works with existing OpenAI-compatible client
Backward Compatible: Gemini 2.5 models continue using thinkingBudget
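The backward-compatible behavior can be sketched as a branch on the model generation. This is assumed logic for illustration, not LiteLLM's source, and the token budgets for the 2.5 path are made-up placeholder values.

```python
# Sketch (assumed logic): Gemini 3+ models receive a thinkingLevel,
# while Gemini 2.5 models keep the older thinkingBudget parameter.

# Hypothetical effort-to-token budgets for 2.5-style models.
EFFORT_TO_BUDGET = {"low": 1024, "medium": 2048, "high": 4096}

def build_thinking_config(model: str, reasoning_effort: str) -> dict:
    """Choose thinkingLevel vs thinkingBudget based on model generation."""
    if "gemini-3" in model:
        return {"thinkingLevel": reasoning_effort}
    return {"thinkingBudget": EFFORT_TO_BUDGET.get(reasoning_effort, 1024)}
```

Callers keep passing `reasoning_effort` either way; only the wire-level parameter changes per model.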


Installation

pip install litellm --upgrade

import litellm
from litellm import completion

response = completion(
    model="gemini/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Your question here"}],
    reasoning_effort="medium",  # Use MEDIUM thinking
)
print(response)
note

If using this model via vertex_ai, set the location to global, as this is the only supported location for now.

reasoning_effort Mapping for Gemini 3+

reasoning_effort | thinking_level
---------------- | --------------
minimal          | minimal
low              | low
medium           | medium
high             | high
disable          | minimal
none             | minimal