Gemini 3.5 Pro: The June 2026 Launch Guide

Gemini 3.5 Pro was announced at Google I/O 2026 with a June general-availability target. Here's what's confirmed, what's likely, and how to prepare your stack.

Quick answer. Gemini 3.5 Pro was announced at Google I/O on May 19, 2026 but is still in limited Vertex preview, with GA expected in June 2026. Pro targets a 2M-token context window, Deep Think reasoning, and frontier multimodal — the use cases Gemini Ultra used to cover. Gemini 3.5 Flash is the public 3.5 model until Pro lands.

If you're trying to figure out whether to wait for Gemini 3.5 Pro or move now, this is the briefing. Google announced 3.5 Pro at I/O 2026 alongside Gemini 3.5 Flash, but only Flash went live on May 19. Pro is in internal use and limited Vertex preview — Sundar Pichai's exact words on stage were "Give us until next month to get it to you," which drew audible groans from the live audience. June 2026 is the working date.

This guide covers what's confirmed, what's strongly signaled, and the decisions you can make today: pricing math, the migration path from Gemini 3.1 Pro and Gemini 3 Pro, head-to-head positioning against Claude Opus 4.7 and GPT-5.5, and how to get on the access list now.

What is Gemini 3.5 Pro and when does it launch?

Gemini 3.5 Pro is the flagship tier of Google DeepMind's 3.5 family — announced at Google I/O 2026 on May 19, with general availability targeted for June 2026. It is the model that absorbs the use cases Google previously routed to Gemini Ultra: frontier reasoning, deep multimodal understanding, and very long context.

What's actually live right now:

  • Gemini 3.5 Flash — generally available in the Gemini app, AI Mode in Search, AI Studio, the Gemini API, Vertex AI, and Gemini Enterprise. Default model for the consumer Gemini experience globally.
  • Gemini 3.5 Pro — limited preview for select Vertex enterprise customers, in internal use at Google, public GA "next month."
  • Antigravity 2.0 — the desktop app and new Antigravity CLI shipped at I/O. The CLI replaces Gemini CLI on June 18, 2026.

The Pro launch matters because Flash, despite outperforming Gemini 3.1 Pro on most published coding and agent benchmarks (Terminal-Bench 2.1, MCP Atlas, GDPval-AA, CharXiv), still trails on the things Pro is built for: long-context retrieval at the upper end, hard reasoning (Humanity's Last Exam, ARC-AGI-2), and the 128k+ MRCR slice where 3.1 Pro keeps a 7.6-point lead. Pro is meant to close those last gaps and then some.

What's the context window and what can 2M tokens actually do?

Gemini 3.5 Pro is expected to ship with a 2M-token input context window — double Flash's 1M and the largest of any production frontier model in May 2026. For reference, the rest of the field:

  • Gemini 3.5 Flash: 1M tokens
  • Gemini 3.1 Pro: 1M tokens
  • Claude Opus 4.7: 200K standard, 1M in beta
  • GPT-5.5: 256K (922K input + 128K output in extended mode)

Two million tokens means roughly:

  • 1,500-page legal filing fits as a single prompt with room to reason
  • An entire mid-size monorepo (~150k lines of code) plus its tests and docs
  • About 30 hours of audio transcription, or ~30 minutes of video at standard sampling (1 FPS = 18,000 tokens per minute of video)
  • A full quarter of customer support tickets for a small SaaS

The catch: window size and retrieval quality are different problems. On MRCR v2 long-context retrieval at 128k tokens, Gemini 3.1 Pro hits 84.9% while 3.5 Flash drops to 77.3% — at the 1M slice both models drop to ~26%. Pro 3.5 is expected to improve retrieval inside long windows, not just stretch them. Until the model card lands, treat the 2M number as a ceiling for batch document analysis, not a guarantee of perfect needle-in-haystack accuracy.

How does Gemini 3.5 Pro compare to Claude Opus 4.7 and GPT-5.5?

Today, three frontier models matter for serious engineering work: Claude Opus 4.7 (released April 16, 2026), GPT-5.5 (April 23), and the Gemini 3.5 family. Here's the per-task picture based on what's published, with Pro 3.5 entered where Google has signaled intent rather than confirmed numbers:

Repository-level coding (multi-file refactor)

Opus 4.7 leads on SWE-Bench Pro at 64.3%. Gemini 3.5 Flash sits at 55.1%, GPT-5.5 at 57.7%, Gemini 3.1 Pro at 54.2%. Pro 3.5 should narrow this — the question is by how much.

Terminal and agent loops

GPT-5.5 leads Terminal-Bench 2.0 at 82.7%. Gemini 3.5 Flash leads Terminal-Bench 2.1 at 76.2%. Opus 4.7 trails at 66.1%. Pro 3.5 is the most likely model to take the top of the next leaderboard.

Multimodal understanding

Gemini 3.5 Flash currently tops the Roboflow vision leaderboard across 66 models and leads CharXiv Reasoning at 84.2% and MMMU-Pro at 83.6%. Opus 4.7 sits at 84.1% on MMMU. Pro 3.5 should extend the lead here, especially on video.

Long-context retrieval

Gemini's structural advantage. Pro 3.5's 2M window plus better retrieval should be the cleanest pick for legal review, codebase analysis, and multi-document RAG over technical PDFs.

Math and scientific reasoning

GPT-5.5 leads FrontierMath. Gemini 3 Deep Think hits 84.6% on ARC-AGI-2 (verified by the ARC Prize Foundation), and Gemini 3.1 Pro with Deep Think Mini hits 77.1%. With HIGH thinking enabled, Pro 3.5 is the model to watch on these benchmarks.

Speed

Flash 3.5 outputs at ~289 tok/sec, roughly 4x faster than Opus 4.7 (67) and GPT-5.5 (71). Pro 3.5 will give up some of that speed for reasoning quality, but should remain competitive with Opus and GPT in the same tier.

The honest summary

No single model wins everything. The pattern emerging in mid-2026 is: Gemini for breadth and multimodal, Claude for code quality and defensive engineering, OpenAI for math and consumer reach. Pro 3.5 looks designed to take the Gemini-for-everything thesis and make it real.

What benchmarks should you expect from Gemini 3.5 Pro?

Google hasn't published Pro 3.5's model card. Based on the gap between Flash and Pro in previous generations, plus Sundar Pichai's framing of "frontier coding, reasoning, and long-context use cases that previously needed Gemini Ultra," here's what to expect at GA:

BenchmarkGemini 3.5 Flash (confirmed)Gemini 3.1 Pro (reference)Gemini 3.5 Pro (likely range)
Terminal-Bench 2.176.2%70.3%78-82%
MCP Atlas83.6%78.2%85-88%
SWE-Bench Verified80.6%84-88%
SWE-Bench Pro55.1%54.2%60-66%
GPQA Diamond94.3%94-96%
ARC-AGI-2 (Deep Think HIGH)77.1%80-85%
MRCR v2 @ 128k77.3%84.9%86-90%
CharXiv Reasoning84.2%85-88%
LiveCodeBench Pro Elo28872950-3050

Take the "likely range" column as informed extrapolation, not a published spec. The gap between Flash and Pro in the 3.x series has consistently been 5-10 points on hard reasoning and 3-6 points on coding, and Google is positioning 3.5 Pro as the model that beats every competitor on at least one headline benchmark.

How much will Gemini 3.5 Pro cost?

Google has not announced Pro 3.5 pricing. The reference points everyone is using:

ModelInput ($/1M)Output ($/1M)Long-context (>200K)
Gemini 3.5 Pro (expected)$2.00-3.00$12.00-18.002x / 1.5x likely
Gemini 3.5 Flash$1.50$9.00n/a
Gemini 3.1 Pro$2.00$12.002x / 1.5x
Claude Opus 4.7$5.00$25.00
GPT-5.5$5.00$30.002x / 1.5x >272K
GPT-5.5-pro$30.00$180.00

The base case: Pro 3.5 matches or slightly exceeds 3.1 Pro pricing — roughly $2.00 input and $12.00 output, with a long-context surcharge above 200K. That keeps it 2-3x cheaper than Opus 4.7 and GPT-5.5 standard, while still giving Google room to charge a premium for Deep Think HIGH or batch / cached workloads.

Two pricing levers to plan around now:

  • Context caching at -90%. Cached input on the Gemini API drops to ~$0.15/M today. If you're sending the same system prompt or document repeatedly, caching is the difference between viable and unviable.
  • Batch mode at half rate. Async batch is half the standard price across Google, OpenAI, and Anthropic. For offline document processing, this is the path.

How do you access Gemini 3.5 Pro today?

Pro 3.5 is not generally available as of May 28, 2026. The access ladder, in increasing order of likelihood you'll actually get in this week:

  1. Vertex AI Model Garden allowlist. Open Vertex AI, search "gemini-3.5-pro," request allowlist access if you have an active GCP enterprise contract. Some I/O attendees were granted access on stage; most everyone else is on the list.
  2. Gemini Enterprise Agent Platform. If your company is on Gemini Enterprise, your account team can request early Pro access.
  3. Google AI Studio waitlist. When Pro lands publicly, the first surface is aistudio.google.com. Watch the model picker on a paid project — Google typically adds new models there before announcing GA.
  4. Google AI Pro and Ultra subscriptions. The $20/month Pro plan and $250/month Ultra plan will get access in the Gemini consumer app first. Ultra subscribers get Deep Think access too.
  5. Antigravity 2.0 CLI. Once Pro is on the Gemini API, Antigravity will list it as a backend model. Gemini CLI sunsets June 18, 2026, so this is also the moment to migrate.

When GA happens, expect Google to ship across all surfaces in a single day — that's what Flash did on May 19. The June 2026 announcement will almost certainly include the model card, pricing, and API model name on day one.

Sample API call (post-GA)

from google import genai

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

response = client.models.generate_content(
    model="gemini-3.5-pro",  # name TBC; AI Studio will show the canonical id
    contents="Refactor this 800-line file to use typed dataclasses ...",
    config={
        "thinking_config": {"thinking_level": "high"},
        "max_output_tokens": 4096,
    },
)
print(response.text)

If you're already on the google-genai SDK with Flash, the only line that changes is the model argument. The thinking config and the rest of the surface (streaming, function calling, multimodal inputs) are preserved across the family.

Which coding tools already support Gemini 3.5?

The integration surface around Gemini 3.5 is mature even before Pro lands. If you build with any of these, Pro 3.5 will appear as a model option within days of GA:

  • Antigravity 2.0 (desktop and CLI) — Google's flagship agentic coding surface. Replaces Gemini CLI on June 18, 2026.
  • Gemini Code Assist — IDE plugins for VS Code, JetBrains, and Android Studio.
  • Cursor — model picker already supports Gemini 3.x; will add 3.5 Pro on availability.
  • Cline — open-source agent in VS Code with BYOM support.
  • Aider — CLI pair-programming with full BYOM via the Gemini API.
  • Continue, Zed, OpenHands, Devin — all support arbitrary Gemini models via API key.

The pattern is consistent: any tool that lets you paste an API key from AI Studio already runs on Gemini 3.5 Flash and will run on Pro the moment the model id resolves on Google's side.

What are the real-world gotchas?

Five operational details that have bitten teams on the 3.x series and will carry forward to Pro 3.5:

  1. The 200K pricing cliff. On Gemini Pro models, the entire request reprices at 2x input and 1.5x output once you cross 200K tokens. A 250K-token prompt costs the same as a 250K prompt at 2x rates — not just the overage. Plan around it: batch into 199K-token chunks where you can.
  2. Free-tier prompts train future models. Google's privacy terms allow free-tier requests to be used for model improvement. Move any proprietary code or customer data to a billed project.
  3. Safety filters return PROHIBITED_CONTENT. Adjustable filters default to OFF on Gemini 3.5, but built-in protections (child safety, terms of service violations) cannot be disabled and surface as PROHIBITED_CONTENT. Build retry logic for SAFETY blocks but not for PROHIBITED_CONTENT.
  4. Video burns tokens fast. At default 1 FPS sampling, an hour of video is ~1.08M tokens. Even a 2M context window doesn't give you that much headroom for analysis on top. Sub-sample if you can.
  5. Gemini CLI sunsets June 18, 2026. If your CI pipeline or local automation calls gemini, migrate to Antigravity CLI now. The shape is similar but extensions become plugins and the agent harness changes.

Should you wait for Pro or ship on Flash now?

The honest answer depends on what you're building.

Ship on Gemini 3.5 Flash now if

  • Your task fits in 1M context
  • You care about per-token cost (Flash is 3x cheaper than GPT-5.5 and 6x cheaper than Opus 4.7 on input)
  • You're building an agentic loop where speed matters (Flash is ~4x faster output)
  • Your benchmark target is Terminal-Bench, MCP Atlas, or multimodal vision — Flash already leads here

Wait for Gemini 3.5 Pro if

  • You need 1M+ context with reliable retrieval (Flash drops to 77% on MRCR v2 at 128k; Pro 3.1 hits 85%)
  • Your task is repository-level coding where Opus 4.7's 64.3% SWE-Bench Pro is the gap to close
  • You need Deep Think HIGH for hard math or scientific reasoning
  • You're processing >1-hour videos or 500+ page documents in single prompts
  • You're an enterprise that needs Vertex SLAs and governance — Pro is the SKU your account team will route you toward

Hedge by building on the API surface

Because Pro and Flash share the same SDK, thinking config, and feature set, you can build today on Flash and swap to Pro by changing one string. Most teams should do exactly that: ship on Flash now, A/B Pro against Flash on your specific workload when it lands, route by cost-per-quality.

For broader context on the 3.5 family — Flash specs, Gemini Spark, Antigravity 2.0 — see our complete Gemini 3.5 guide.

Codersera: hire engineers who already know the Gemini stack

If you're integrating Gemini 3.5 — Flash now, Pro when it lands — and need engineers who understand context caching, long-context pricing, the Vertex governance story, and how to migrate from Gemini CLI to Antigravity 2.0, Codersera matches you with vetted remote developers who ship production AI systems. Risk-free trial, technical fit guaranteed.

FAQ

Is Gemini 3.5 Pro available right now?

No. As of May 28, 2026, Gemini 3.5 Pro is in limited preview for select Vertex enterprise customers. Sundar Pichai announced it at Google I/O on May 19 with a June 2026 general-availability target. Gemini 3.5 Flash is the public-facing 3.5 model right now.

What is the Gemini 3.5 Pro context window?

The expected context window is 2M tokens for input — the largest of any production frontier model in May 2026 and double Gemini 3.5 Flash's 1M. The model card with confirmed numbers will land at GA.

How much will Gemini 3.5 Pro cost?

Google has not announced pricing. The reference point is Gemini 3.1 Pro at $2.00 per 1M input tokens and $12.00 per 1M output tokens, with a 2x / 1.5x surcharge above 200K. Expect Pro 3.5 to match or modestly exceed that, with cached input around $0.15/M and batch mode at half rate.

Will Gemini 3.5 Pro beat Claude Opus 4.7 on coding?

Probably on Terminal-Bench and MCP Atlas, where Gemini 3.5 Flash already leads. SWE-Bench Pro is where Opus 4.7 dominates at 64.3%, and Pro 3.5 needs a 10-point jump from Flash's 55.1% to close that gap. Google's framing suggests it will — but until the model card publishes, this is the open question.

Can I use Gemini 3.5 Pro in Cursor or Antigravity?

Antigravity 2.0 already runs Gemini 3.5 Flash and will add Pro on the day of GA. Cursor's model picker supports the Gemini 3.x series and will pick up Pro as soon as Google publishes the API model id. The migration is a one-line change in both surfaces.

What happens to Gemini CLI when Antigravity ships?

Gemini CLI sunsets on June 18, 2026 for Google AI Pro, Ultra, and Gemini Code Assist users. Antigravity CLI is the direct replacement — same backend agent harness, same hook and skills features renamed as plugins. Migrate any automation that depends on gemini command output before mid-June.

How do I get on the Gemini 3.5 Pro preview list?

If you're on Google Cloud, open Vertex AI Model Garden, search "gemini-3.5-pro," and request allowlist access through your account team. Gemini Enterprise customers should ask their CSM directly. Solo developers should watch aistudio.google.com — Google typically adds new models to AI Studio's picker the moment the API is ready.

Should I rewrite my Gemini 3.1 Pro integration before Pro 3.5 lands?

Probably not. The google-genai SDK surface is stable across 3.1, 3.5 Flash, and 3.5 Pro — the model name is the only field that changes. Swap when Pro lands, A/B against your current workload, decide based on quality-per-dollar.

How does Gemini 3.5 Pro compare to GPT-5.5 on price?

GPT-5.5 costs $5.00 input and $30.00 output per 1M tokens — 2-3x what Gemini 3.5 Pro is expected to cost. GPT-5.5-pro is $30/$180. For volume workloads, Gemini Pro is the cheaper frontier-class option even before factoring in context caching.

Is Deep Think coming to Gemini 3.5 Pro?

Almost certainly. Gemini 3.1 Pro ships with a three-tier thinking system (LOW / MEDIUM / HIGH) where HIGH activates Deep Think Mini. Pro 3.5 is expected to inherit and improve this. Deep Think is the mode that drives the headline ARC-AGI-2 numbers (84.6% on the full Deep Think) and pushes hard math benchmarks.

When exactly in June will Gemini 3.5 Pro launch?

Google has not given a specific date — only "next month" as of May 19. Polymarket and Manifold prediction markets are pricing in a late-June 2026 launch. The most likely surface for the first signal is the model picker in AI Studio, which is where Flash showed up before its official announcement.