Gemini
Our most intelligent AI models, built for the agentic era
Gemini 2.5 models can reason through their thoughts before responding, resulting in enhanced performance and improved accuracy.
Model family
Gemini 2.5 builds on the best of Gemini — with native multimodality and a long context window.
Hands-on with 2.5 Pro
See how Gemini 2.5 Pro uses its reasoning capabilities to create interactive simulations and tackle advanced coding tasks.
Performance
Gemini 2.5 is state-of-the-art across a wide range of benchmarks.
Benchmarks
Gemini 2.5 Pro demonstrates significantly improved performance against other leading models across a wide range of benchmarks.
| Benchmark | Gemini 2.5 Pro Preview (05-06) | OpenAI o3 | OpenAI GPT-4.1 | Claude 3.7 Sonnet (64k extended thinking) | Grok 3 Beta (extended thinking) | DeepSeek R1 |
|---|---|---|---|---|---|---|
| Input price ($/1M tokens) | $1.25 (≤200k tokens), $2.50 (>200k) | $10.00 | $2.00 | $3.00 | $3.00 | $0.55 |
| Output price ($/1M tokens) | $10.00 (≤200k tokens), $15.00 (>200k) | $40.00 | $8.00 | $15.00 | $15.00 | $2.19 |
| Reasoning & knowledge: Humanity's Last Exam (no tools) | 17.8% | 20.3% | 5.4% | 8.9% | — | 8.6%* |
| Science: GPQA diamond, single attempt (pass@1) | 83.0% | 83.3% | 66.3% | 78.2% | 80.2% | 71.5% |
| GPQA diamond, multiple attempts | — | — | — | 84.8% | 84.6% | — |
| Mathematics: AIME 2025, single attempt (pass@1) | 83.0% | 88.9% | — | 49.5% | 77.3% | 70.0% |
| AIME 2025, multiple attempts | — | — | — | — | 93.3% | — |
| Code generation: LiveCodeBench v5, single attempt (pass@1) | 75.6% | — | — | — | 70.6% | 64.3% |
| LiveCodeBench v5, multiple attempts | — | — | — | — | 79.4% | — |
| Code editing: Aider Polyglot (whole / diff) | 76.5% / 72.7% | 81.3% / 79.6% | 51.6% / 52.9% | 64.9% (diff only) | — | 56.9% (diff only) |
| Agentic coding: SWE-bench Verified | 63.2% | 69.1% | 54.6% | 70.3% | — | 49.2% |
| Factuality: SimpleQA | 50.8% | 49.4% | 41.6% | — | 43.6% | 30.1% |
| Visual reasoning: MMMU, single attempt (pass@1) | 79.6% | 82.9% | 75.0% | 75.0% | 76.0% | no MM support |
| MMMU, multiple attempts | — | — | — | — | 78.0% | no MM support |
| Image understanding: Vibe-Eval (Reka) | 65.6% | — | — | — | — | no MM support |
| Video: Video-MME | 84.8% | — | — | — | — | no MM support |
| Long context: MRCR, 128k (average) | 93.0% | — | — | — | — | — |
| MRCR, 1M (pointwise) | 82.9% | — | — | — | — | — |
| Multilingual performance: Global MMLU (Lite) | 88.6% | — | — | — | — | — |
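To make the tiered pricing in the table concrete, here is a minimal back-of-the-envelope cost sketch in Python. It assumes the tier is selected by the request's input-token count, as the "≤200k tokens" qualifier suggests; the function and constant names are illustrative, not part of any official SDK.

```python
# Illustrative cost calculator for the tiered Gemini 2.5 Pro pricing listed
# in the table above. Assumption (not stated in the table): a request with
# <= 200k input tokens is billed at the lower rate for both input and output.

GEMINI_25_PRO_PRICING = {
    # tier -> (input $/1M tokens, output $/1M tokens)
    "small_prompt": (1.25, 10.00),   # <= 200k input tokens
    "large_prompt": (2.50, 15.00),   # >  200k input tokens
}

TIER_BOUNDARY = 200_000  # input tokens


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single Gemini 2.5 Pro request."""
    tier = "small_prompt" if input_tokens <= TIER_BOUNDARY else "large_prompt"
    input_rate, output_rate = GEMINI_25_PRO_PRICING[tier]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    # 150k-token prompt, 8k-token response: billed at the <= 200k rates.
    print(f"${request_cost(150_000, 8_000):.4f}")   # $0.2675
    # 500k-token prompt, 8k-token response: billed at the > 200k rates.
    print(f"${request_cost(500_000, 8_000):.4f}")   # $1.3700
```

Under these assumed rules, a long-context request pays the higher rate on all of its tokens once the prompt crosses the 200k boundary, not just on the excess.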