Gemini 3.5 Flash + Gemini Spark: Google I/O 2026
Google's I/O 2026 keynote on May 19 was unusually focused. Instead of dropping a half-dozen models across tiers, Sundar Pichai's team led with two things: Gemini 3.5 Flash (a Flash-class model that meaningfully outperforms last cycle's Pro tier) and Gemini Spark (a persistent personal agent that runs on top of it). Everything else — Antigravity updates, AI Mode in Search, the new Intelligent Eyewear preview — orbited those two announcements.
This guide is what we'd send a developer asking "what actually shipped, what's it good for, and how does it compare to Claude Opus 4.7 or GPT-5.5?" It separates what Google confirmed in official docs from what's still rumored or only available behind closed-tester access.
Want the full picture? Read our continuously-updated Claude Opus 4.7 — Anthropic's flagship general-purpose model — see the full guide for benchmarks and pricing comparisons referenced throughout this article..
What is Gemini 3.5 Flash?
Gemini 3.5 Flash is the first model in Google's 3.5 series. The pitch from DeepMind is that this Flash-tier model is now strong enough to handle the coding and agentic workloads that previously required Gemini 3.1 Pro — at a fraction of the latency and a meaningful (but not enormous) cost premium over the previous Flash generation.
Key reported numbers from the official model card and launch posts:
| Capability | Gemini 3.5 Flash |
|---|---|
| Context window (input) | 1,048,576 tokens (~1M) |
| Output tokens (max) | 65,536 |
| Modalities (input) | Text, image, video, audio |
| Modalities (output) | Text + rich graphics generation |
| Output speed | ~280 tokens/sec (Artificial Analysis) |
| Terminal-Bench 2.1 | 76.2% |
| GDPval-AA (agentic) | 1656 Elo |
| MCP Atlas (tool use) | 83.6% |
| CharXiv Reasoning (multimodal) | 84.2% |
| MMMU-Pro | 83.6% (record on this benchmark) |
| MRCR v2 128k (long-context) | 77.3% |
The long-context number is the one most worth pausing on. A 1M-token context window has been table stakes since the Gemini 1.5 days, but actually using 1M tokens effectively is another problem. On MRCR v2 at 128k, Gemini 3.5 Flash scores 77.3% — versus 46.9% for Claude Opus 4.7 and 41.4% for GPT-5.5 on the same eval. If you're doing real codebase-spanning work or long document analysis, that gap matters more than headline reasoning scores.
How is 3.5 Flash different from 3.1 Flash?
Three things changed materially between the April Gemini 3.1 cycle and the May 3.5 release:
- Agentic capability. 3.5 Flash is the model Google is putting under Antigravity and Spark. Long-horizon tool use (the MCP Atlas number) is where most of the headline gains land, not raw reasoning.
- Coding tier shift. On Terminal-Bench 2.1 (76.2%), 3.5 Flash beats Gemini 3.1 Pro. That's the line Google's been waiting to cross — "last cycle's Pro for less than this cycle's Pro will cost."
- Output speed. ~280 tok/s is roughly 4x what frontier competitors push. That's the actual product unlock for agents: an agent that takes 40 seconds per turn feels broken; one that takes 8 seconds feels usable.
What didn't change much: the input context window (still ~1M tokens) and the modalities supported. The architecture refinements are doing the work, not raw scale.
How much does Gemini 3.5 Flash cost?
Confirmed API pricing at launch:
| Token type | Price (per 1M tokens) |
|---|---|
| Input | $1.50 |
| Output | $9.00 |
| Cached input | $0.15 |
That's roughly 3x the per-token cost of Gemini 3.1 Flash (3x the Gemini 3 Flash Preview it replaces, 6x Gemini 3.1 Flash-Lite), which is why launch coverage led with the price hike. The honest framing: it's still meaningfully cheaper than Claude Opus 4.7 or GPT-5.5 for the same workload, and Google's published data shows tasks completing in fewer tokens because the agentic loops converge faster. On a per-task basis (not per-token), most workloads end up cheaper than the comparable Anthropic or OpenAI run.
Where the math gets uncomfortable: if you were happily running cheap classification or simple generation jobs on Gemini 3.1 Flash and don't need any of the agentic upgrades, your bill just tripled. Google clearly expects everyone to use 3.5 Flash for everything, but if you're cost-sensitive on volume workloads, 3.1 Flash is still around and still cheap.
What is Gemini Spark?
Gemini Spark is positioned as a personal AI agent rather than a model. It runs on top of Gemini 3.5 Flash and the Antigravity agentic platform, and the framing is that it persists across your digital life — checking email, drafting replies, browsing on your behalf, executing multi-step tasks — under your direction and with safety check-ins before major actions.
What's confirmed:
- Powered by Gemini 3.5 Flash, deployed via Google Antigravity
- Lives inside the Gemini app rather than as a separate product surface
- "Trusted tester" access at I/O launch; broader beta to Google AI Ultra ($100/month, also included in the $200 tier reduced from $250) subscribers in the U.S. the week after I/O
- Designed to ask before taking material actions on your behalf
What's still rumored or unverified at the time of writing:
- Whether Spark will have a developer API or stays consumer-only
- Whether free Gemini app users get a watered-down version, or none at all
- How Spark handles cross-account credentials (Gmail, Calendar, Drive are confirmed; third-party SaaS is gestured at but not detailed)
- International rollout timeline beyond the U.S.
If your mental model is "OpenAI Operator but inside the Gemini app and running on a frontier-speed model," you're in the right neighborhood. The difference Google is leaning into is persistence: Spark isn't a session you start, it's an agent that's always there with a memory of what you've asked it to handle.
Where does 3.5 Flash fit in the Gemini lineup?
Google's current tier structure (as of the I/O 2026 announcement):
| Tier | Status | Use case |
|---|---|---|
| Gemini 3.5 Pro | Internal use; public rollout "next month" | Maximum reasoning, expected to land in June 2026 |
| Gemini 3.5 Flash | Generally available now | Frontier-class agentic and coding workloads |
| Gemini 3.1 Ultra / Pro | Still in API | Legacy workloads; native multimodal reasoning at 2M context |
| Gemini 3.1 Flash | Still in API | Cost-sensitive bulk workloads |
| Gemini Nano | On-device | Pixel, Chrome, Android workloads |
The unusual move here is that 3.5 launched with Flash before Pro. Historically Google led with the heavy tier and trickled down. The new sequencing — Flash first, Pro "next month" — is the most concrete sign that the bottleneck is no longer raw capability, it's getting a fast, cheap, agent-shaped model into developers' hands.
How does Gemini 3.5 Flash compare to Claude Opus 4.7 and GPT-5.5?
Honest read on the comparison: Gemini 3.5 Flash is best-in-class for two things — long-context retrieval and multimodal input — and competitive on agentic coding. It is not the strongest pure-reasoning model on the market.
| Capability | Gemini 3.5 Flash | Claude Opus 4.7 | GPT-5.5 |
|---|---|---|---|
| Context window | 1M input / 64k output | 1M input (lower output cap) | 1M input (lower output cap) |
| Multimodal input | Text, image, video, audio | Text, image | Text, image |
| Output speed (tok/s) | ~280 | ~67 | ~71 |
| MRCR v2 128k | 77.3% | 46.9% | 41.4% |
| Input price ($/1M) | $1.50 | significantly higher | significantly higher |
| Best at | Long-context, multimodal, fast agents | Deep reasoning, code architecture | General reasoning, breadth |
If you're shipping an agent that reads a 200k-token codebase and acts on it, 3.5 Flash is the practical pick. If you're doing dense analytical reasoning where time-per-token is irrelevant, GPT-5.5 or Claude Opus 4.7 are still the safer choices. Most production teams will end up routing across models per task rather than committing to one — that's the reality the agent frameworks have been quietly preparing for.
Where does Gemini 3.5 Flash win?
From the early benchmarks and our own poking around, the wins cluster in:
- Video understanding. Native video input means you can pass an actual MP4 (not a transcript) and ask questions about it. Neither Claude nor GPT-5.5 does this end-to-end.
- Audio analysis. Same story — native audio input. For voice-driven product flows, you can skip the Whisper step entirely.
- Long-codebase agents. The MRCR gap is real. Passing a whole repo and asking for cross-file refactors is where Gemini 3.5 Flash punches above its tier.
- Latency-bound agentic loops. ~280 tok/s means a tool-using agent feels fluid. The difference between 8-second and 40-second turn times changes what's product-shippable.
- Cost on multi-step tasks. Even though per-token pricing went up, agentic loops converging in fewer steps tends to beat slower, cheaper-per-token models on total bill.
How do you access Gemini 3.5 Flash via the API?
Three official surfaces at launch:
- Google AI Studio — quickest path; model identifier is
gemini-3.5-flash. Free tier with rate limits, paid tier at the pricing above. - Vertex AI — for enterprise workloads with VPC, audit logs, regional deployment.
- Google Antigravity — the agent-first platform; this is where Spark itself runs and where you'd build production agents.
The Gemini API is OpenAI-compatible at the chat-completions endpoint, so if you're already abstracted behind an OpenAI client, switching is a base-URL change plus a model name change. SDKs available for Python, Node, Go.
For tool use, Google is pushing the Model Context Protocol (MCP) as the standard. If you've already built MCP tools against Claude or another MCP-aware model, they should drop in with minor adjustment. That's reinforced by the 83.6% MCP Atlas score — Google is investing in MCP being the agentic interop layer.
Should you switch to Gemini 3.5 Flash?
Rough decision framework:
- You're building an agent that needs video or audio input. Switch. Nothing else competes on native multimodal.
- You're doing whole-repo code work. Try it side-by-side with Claude Opus 4.7. The MRCR numbers suggest a real edge, but model-of-the-month claims don't always survive contact with your specific codebase.
- You're running high-volume cheap classification. Stay on 3.1 Flash for now. 3.5 Flash is 3x the price; the upgrade may not pay back.
- You're doing deep analytical reasoning where speed doesn't matter. Claude Opus 4.7 is still where we'd reach first.
- You're building a consumer-facing agent. Worth a serious look. The latency advantage is product-level, not benchmark-level.
For most engineering teams, the practical move is to set up a routing layer (or use a framework that does) and run side-by-side comparisons on your real workload before committing. The AI coding agents landscape is where this multi-model reality is most visible — every serious tool now routes across providers, and which model wins on which task is constantly shifting.
FAQ
When was Gemini 3.5 Flash released?
May 19, 2026, at Google I/O. Generally available the same day via Google AI Studio, Vertex AI, and Antigravity. Gemini Spark, which runs on top of it, started rolling out to trusted testers at the same time and reached AI Ultra subscribers in the U.S. the following week.
How big is the context window?
1,048,576 input tokens (a hair over 1M) with a 65,536 output token cap. The model also performs unusually well at actually using that window — 77.3% on MRCR v2 at 128k is the headline retrieval number.
Is Gemini Spark a model or a product?
A product. Gemini Spark is a persistent personal agent inside the Gemini app, built on top of the Gemini 3.5 Flash model and the Antigravity agentic platform. There's no separate "Spark model" to call from the API.
Can I access Gemini Spark via the API?
Not at launch. Spark is consumer-side only and available to Google AI Ultra ($100/month, also included in the $200 tier reduced from $250) subscribers in the U.S. as of late May 2026. Developer-side access has not been announced.
How does Gemini 3.5 Flash compare to Gemini 3.1 Ultra?
3.5 Flash beats 3.1 Pro on agentic and coding benchmarks, which Google's framing implies puts it ahead of 3.1 Ultra on most tasks too. 3.1 Ultra retains a slightly larger 2M context window and is still useful for workloads where pure single-shot reasoning matters more than speed or agentic behavior.
Is Gemini 3.5 Flash cheaper than Claude Opus 4.7 or GPT-5.5?
Per token, materially cheaper — $1.50 in, $9.00 out vs. comparable frontier rates above that. Per task, even cheaper because the convergence is faster. The price hike is only painful if you were previously running 3.1 Flash and don't need the upgrade.
Does Gemini 3.5 Flash support MCP?
Yes, and Google is leaning into MCP as the agentic tool-use standard. MCP Atlas score is 83.6%. If you've built MCP tools for Claude, they should largely drop in.
When does Gemini 3.5 Pro launch?
Google's launch post says "next month," so realistically June 2026. The current 3.5 Pro is being used internally for training and red-teaming work.
Can I use Gemini 3.5 Flash with the OpenAI SDK?
Yes. The Gemini API is OpenAI-compatible at the chat-completions endpoint. Swap the base URL and model name; the rest of your client code stays put.