Gemini 2.5 Flash

Fast, Efficient AI with Controllable Reasoning

Website: developers.googleblog.com
Overview

What it is

Gemini 2.5 Flash is now in preview, offering improved reasoning while prioritizing speed and cost efficiency for developers.

Intent

I need it when

Build AI applications with improved reasoning while maintaining cost efficiency

Gemini 2.5 Flash delivers major upgrades in reasoning capabilities while prioritizing speed and cost, and is positioned as offering the best price-to-performance ratio among thinking models. Developers can toggle thinking on or off and set a thinking budget to balance quality, cost, and latency for their specific use case.

Solve complex multi-step problems like math, research analysis, or code generation

As a fully hybrid reasoning model, Gemini 2.5 Flash performs a thinking process before responding, allowing it to break down complex tasks and plan responses. It ranks second only to Gemini 2.5 Pro on Hard Prompts in LMArena, making it suitable for reasoning-intensive workloads.

Maintain fast response times while improving performance over previous generation models

With the thinking budget set to 0, Gemini 2.5 Flash maintains the fast speeds of 2.0 Flash while delivering improved performance. This lets developers get performance gains without a latency penalty for use cases that don't require extended reasoning.
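
As a rough illustration of the zero-budget setting, the sketch below builds a request body with thinking switched off. The field names (generationConfig, thinkingConfig, thinkingBudget) are assumed to follow the Gemini API's REST layout, and build_request is a hypothetical helper; check both against the current API reference before relying on them.

```python
import json


def build_request(prompt: str, thinking_budget: int = 0) -> dict:
    """Build a JSON-serializable request body with an explicit thinking budget.

    Assumption: the nested field names mirror the Gemini API REST layout.
    A budget of 0 is meant to disable thinking for latency-sensitive calls.
    """
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }


# Hypothetical low-reasoning request: thinking stays off by default.
body = build_request("Summarize this changelog in one sentence.")
print(json.dumps(body, indent=2))
```

The helper only assembles the payload; sending it (endpoint, model name, API key) is left out because those details are not given here.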

Fine-tune the tradeoff between answer quality and API costs for different application scenarios

Developers can set a thinking budget from 0 to 24,576 tokens, which caps the number of tokens the model may spend on reasoning. Within that cap, the model decides how much to think based on task complexity, so developers can tune anything from low-reasoning tasks (simple queries) to high-reasoning tasks (engineering calculations) without overspending.
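
One way to apply the budget range in application code is to map task tiers to budgets and clamp anything outside 0 to 24,576. This is a minimal sketch: the tier names and the "medium" value are illustrative assumptions, not recommended settings, and clamp_budget and budget_for are hypothetical helpers.

```python
MAX_THINKING_BUDGET = 24_576  # upper bound stated in the text above

# Hypothetical tiers; the "medium" figure is an arbitrary illustrative value.
BUDGET_BY_TIER = {
    "low": 0,                      # simple queries: thinking off
    "medium": 4_096,               # assumed middle ground
    "high": MAX_THINKING_BUDGET,   # e.g. engineering calculations
}


def clamp_budget(requested: int) -> int:
    """Force a requested budget into the supported 0..24,576 range."""
    return max(0, min(requested, MAX_THINKING_BUDGET))


def budget_for(tier: str) -> int:
    """Look up the budget for a tier; unknown tiers raise KeyError."""
    return clamp_budget(BUDGET_BY_TIER[tier])


for tier in ("low", "medium", "high"):
    print(tier, budget_for(tier))
```

Because the budget is a ceiling rather than a fixed spend, a generous tier does not by itself mean every request uses that many reasoning tokens.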

Drop

Not a fit when

  • User requires a production-ready model; Gemini 2.5 Flash is in preview and not yet generally available for full production use
  • User needs maximum reasoning capability without cost constraints; Gemini 2.5 Pro is positioned as the superior reasoning model
  • User requires offline or on-device inference; Gemini 2.5 Flash is accessed via the Gemini API only
  • User needs guaranteed low latency with complex reasoning; a larger thinking budget increases latency as reasoning depth grows
  • User requires pricing transparency before commitment; no pricing information is disclosed in available documentation
Commercials

Pricing

Pricing not specified