Gemini 2.5 Flash

Google's price-performance workhorse with thinking and a 1M-token context.

Context window

1,048,576 tokens (1M) input; up to 65,536 output

Input / 1M tokens

$0.30

Output / 1M tokens

$2.50

Provider

Google DeepMind

Paid tier: text/image/video input $0.30/1M, audio input $1.00/1M; output $2.50/1M. A free tier is also available with rate limits. · Data verified 2026-07-02
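Working out a workload's bill from these rates takes only per-token arithmetic. The sketch below hard-codes the paid-tier prices listed on this page, which may change, so check the pricing docs before relying on them. It also assumes that thinking tokens are billed at the output rate, as Google's pricing page states at the time of writing. The workload figures are illustrative, and `estimate_cost_usd` is a hypothetical helper.

```python
# Paid-tier prices from this page, in USD per 1M tokens (subject to change).
TEXT_IMAGE_VIDEO_INPUT_PER_M = 0.30
AUDIO_INPUT_PER_M = 1.00
OUTPUT_PER_M = 2.50  # assumption: thinking tokens are billed at this rate too


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      audio_input_tokens: int = 0) -> float:
    """Estimate the paid-tier cost of a workload from its token counts."""
    cost = (
        input_tokens * TEXT_IMAGE_VIDEO_INPUT_PER_M
        + audio_input_tokens * AUDIO_INPUT_PER_M
        + output_tokens * OUTPUT_PER_M
    ) / 1_000_000
    return round(cost, 6)


if __name__ == "__main__":
    # Illustrative workload: 10,000 requests of ~2,000 input and ~300 output tokens.
    requests = 10_000
    print(estimate_cost_usd(requests * 2_000, requests * 300))  # 13.5
```

In this example, output accounts for more than half of the $13.50 even though there are far fewer output tokens than input tokens. That is why the thinking budget and response length matter for cost.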

Gemini 2.5 Flash is Google's best price-performance model in the 2.5 generation, built for low-latency, high-volume tasks. It includes optional built-in 'thinking' with a configurable thinking budget for greater accuracy on reasoning, coding, math, and scientific tasks, and supports a 1M-token context window with multimodal input. It is significantly cheaper than 2.5 Pro while remaining capable, making it a common default for production apps.
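Here is a minimal sketch of setting a thinking budget per task, assuming the `google-genai` SDK used in the Quickstart. The task names and budget values are hypothetical starting points that you should tune against your own evaluations. The budget semantics follow Google's thinking documentation at the time of writing: 0 turns thinking off, -1 lets the model decide, and 24,576 is the maximum for 2.5 Flash.

```python
MAX_FLASH_BUDGET = 24_576  # documented maximum thinking budget for 2.5 Flash

# Hypothetical per-task budgets; tune these against your own evals.
# 0 disables thinking; -1 requests dynamic thinking (the model decides).
TASK_BUDGETS = {
    "classify": 0,
    "extract": 0,
    "summarize": 1_024,
    "code": 8_192,
    "math": MAX_FLASH_BUDGET,
}


def thinking_config(task: str) -> dict:
    """Return a request config (dict form) with a thinking budget for `task`."""
    budget = TASK_BUDGETS.get(task, -1)  # unknown task types fall back to dynamic
    return {"thinking_config": {"thinking_budget": budget}}


def ask_flash(client, prompt: str, task: str) -> str:
    """Send one prompt to Gemini 2.5 Flash; `client` is a google-genai genai.Client."""
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=prompt,
        config=thinking_config(task),  # the SDK accepts config as a plain dict
    )
    return response.text
```

With a real client, you would call `ask_flash(genai.Client(), "Label this ticket: ...", "classify")`. Sending simple, high-volume calls with a budget of 0 keeps thinking tokens off most of the traffic.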

Capability index

Relative estimates (0-100) to place this model against its peers, grounded in published benchmarks.

Dimensions scored: Coding · Reasoning · Math · Multimodal · Long context · Speed · Cost efficiency

How to access it

Available through Google AI Studio, the Gemini API, and Vertex AI. Create an API key at aistudio.google.com and call model 'gemini-2.5-flash'.
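Both access paths use the same `google-genai` SDK and the same model id; only the way you construct the client differs. This minimal sketch assumes credentials come from environment variables: `GEMINI_API_KEY` for an AI Studio key, or `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` for Vertex AI. For Vertex AI it also assumes Application Default Credentials are already set up. The `client_kwargs` helper and the `us-central1` fallback region are illustrative choices, not part of the SDK.

```python
import os


def client_kwargs(use_vertex: bool) -> dict:
    """Return keyword arguments for google-genai's genai.Client(...).

    Uses the Gemini API with an AI Studio key by default, or Vertex AI
    when use_vertex is True.
    """
    if use_vertex:
        return {
            "vertexai": True,
            "project": os.environ["GOOGLE_CLOUD_PROJECT"],
            # Assumed fallback region; pick one where the model is available.
            "location": os.environ.get("GOOGLE_CLOUD_LOCATION", "us-central1"),
        }
    return {"api_key": os.environ["GEMINI_API_KEY"]}
```

Next, create the client with `client = genai.Client(**client_kwargs(use_vertex=False))` and call `client.models.generate_content(model="gemini-2.5-flash", ...)` as usual.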

Strengths

✓Excellent cost-to-performance ratio
✓Low latency and high throughput
✓Configurable thinking budget to trade off cost vs quality
✓1M-token context window and multimodal input

Best for

High-volume production workloads · Cost-sensitive chat and RAG apps · Fast multimodal processing

When to choose it (and when not to)

Reach for Gemini 2.5 Flash when...

→High-volume, latency-sensitive production workloads
→Chatbots, extraction, classification, and summarization at scale
→You need decent reasoning but must control costs

Look elsewhere if...

✕The hardest reasoning/STEM tasks (use 2.5 Pro or a frontier model)
✕Tasks needing the newest agentic capabilities (consider Gemini 3.5 Flash)
✕Open-weight / self-hosting requirements
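You can express the checklist above as a simple routing rule. The sketch below is hypothetical: the `Task` fields and the order of the checks are illustrative. It covers only the Flash, Pro, and self-hosting branches, and it returns `None` to signal that you need an open-weight model from another family.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    hard_reasoning: bool = False      # hardest reasoning/STEM work
    needs_self_hosting: bool = False  # open-weight or on-prem requirement


def pick_model(task: Task) -> Optional[str]:
    """Hypothetical router mirroring the choose/avoid checklist."""
    if task.needs_self_hosting:
        # Gemini models are served only through Google's APIs.
        return None
    if task.hard_reasoning:
        return "gemini-2.5-pro"
    # Default: high-volume, latency-sensitive, cost-controlled work.
    return "gemini-2.5-flash"
```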

How to use it

›Tune the thinking budget: raise it for hard tasks, set it low or off for simple high-volume calls
›Batch similar requests to maximize throughput and cost efficiency
›Give explicit output schemas for structured extraction
›Prefer text, image, or video input where you can: audio input is billed at $1.00/1M tokens, versus $0.30/1M for the others
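For structured extraction, the SDK can constrain replies to JSON that matches a schema: set `response_mime_type` and `response_schema` in the request config. This is a minimal sketch. The invoice fields and the `parse_invoice` helper are hypothetical, and the schema uses the SDK's OpenAPI-style `Schema` format with uppercase type names.

```python
import json

# Illustrative schema in the SDK's OpenAPI-style Schema format.
INVOICE_SCHEMA = {
    "type": "OBJECT",
    "properties": {
        "vendor": {"type": "STRING"},
        "total": {"type": "NUMBER"},
        "currency": {"type": "STRING"},
    },
    "required": ["vendor", "total", "currency"],
}

# Request config (dict form) asking for JSON that follows the schema.
EXTRACTION_CONFIG = {
    "response_mime_type": "application/json",
    "response_schema": INVOICE_SCHEMA,
}


def parse_invoice(response_text: str) -> dict:
    """Parse the model's JSON reply and check that required fields are present."""
    data = json.loads(response_text)
    missing = [key for key in INVOICE_SCHEMA["required"] if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data
```

Typical use: `response = client.models.generate_content(model="gemini-2.5-flash", contents=invoice_text, config=EXTRACTION_CONFIG)`, then `parse_invoice(response.text)`. The local check is a cheap guard before the data reaches downstream code.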

Quickstart

Python

from google import genai

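# Pass the key explicitly, or call genai.Client() with no arguments to read
# it from the GEMINI_API_KEY environment variable.
client = genai.Client(api_key="YOUR_API_KEY")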
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarize this text in one sentence: ...",
)
print(response.text)

Install with `pip install google-genai`. Get an API key at aistudio.google.com.
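To see what a call actually cost, including any thinking tokens, read `response.usage_metadata` after the Quickstart call. This minimal sketch assumes text input at the paid-tier rates on this page. It also assumes that thinking tokens are billed at the output rate, as Google's pricing page states at the time of writing. `response_cost_usd` is a hypothetical helper, and a `SimpleNamespace` stands in for the real metadata object so the example runs offline.

```python
from types import SimpleNamespace

INPUT_PER_M = 0.30   # USD per 1M text/image/video input tokens
OUTPUT_PER_M = 2.50  # USD per 1M output tokens (assumed to include thinking)


def response_cost_usd(usage) -> float:
    """Cost of one response from its usage_metadata (text input assumed)."""
    prompt = usage.prompt_token_count or 0
    visible = usage.candidates_token_count or 0
    thinking = usage.thoughts_token_count or 0  # None when thinking is off
    return (prompt * INPUT_PER_M + (visible + thinking) * OUTPUT_PER_M) / 1_000_000


if __name__ == "__main__":
    # Stand-in for response.usage_metadata from a real call.
    usage = SimpleNamespace(prompt_token_count=1_000,
                            candidates_token_count=200,
                            thoughts_token_count=800)
    print(response_cost_usd(usage))  # 0.0028
```

In this example, thinking tokens make up 80% of the billed output. Logging `thoughts_token_count` is a quick way to check whether a thinking budget is worth what it costs.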

API model id: gemini-2.5-flash

Benchmarks

Metric	Value	Notes
Context window	1M tokens	1,048,576 input tokens, up to 65,536 output
Input price	$0.30/1M	Text/image/video; audio input $1.00/1M

Source: Google Gemini API pricing docs

Compare Gemini 2.5 Flash with

Gemini 2.5 Flash vs Gemini 2.5 Pro

Google DeepMind · Context: 1,048,576 tokens (1M) input; up to 65K output

Gemini 2.5 Flash vs Claude Haiku 4.5

Anthropic · Context: 200K tokens

Gemini 2.5 Flash vs GPT-4o

OpenAI · Context: 128,000 tokens (16,384 max output)

Learn the concepts

Inference · RAG (Retrieval-Augmented Generation) · Context Window · Latency vs Throughput
