Gemini 3.5 Complete Guide: Flash, Pro, Pricing & How It Compares (2026)

Google's Gemini 3.5 Flash shipped at I/O 2026 as the new default Gemini. The complete developer's guide — pricing, context, multimodal, how it stacks up against Claude and GPT-5.5.

Updated 27 May 2026 • 9 min read

Quick answer. Gemini 3.5 is Google DeepMind's mid-2026 frontier model family — Gemini 3.5 Flash launched at Google I/O on May 20, 2026 and is now the default Gemini in the consumer app, AI Mode in Search, and Vertex / Gemini API. Gemini 3.5 Pro was announced at the same event with full availability expected in June 2026. Compared to Gemini 3 Flash, the 3.5 generation brings stronger multimodal reasoning, sharper coding, lower API pricing (around $1.50 input / $9 output per 1M tokens for Flash), and tighter integration with the agent stack — particularly Google's renamed "Gemini Enterprise Agent Platform" (formerly Vertex AI Agent Builder). For most developers it competes head-on with Claude Sonnet 4.6, GPT-5.5, and DeepSeek V4-Flash; the Gemini differentiator remains native multimodality (text + image + audio + video in a single request), the largest production context window (up to 2M tokens on Pro), and tight Google Cloud + Workspace integration. This guide covers the lineup, benchmarks, pricing, ecosystem (AI Studio, Vertex, Gemini API, Gemini in Workspace, Gemini CLI), how to choose between Gemini 3.5 Flash and Pro, multimodal patterns that actually work, and how Gemini 3.5 compares to Claude Opus 4.7 and GPT-5.5 in practice.

What is Gemini 3.5 and when did it launch?

Gemini 3.5 is the May 2026 evolution of Google DeepMind's Gemini family. Two variants are shipping or imminent:

Gemini 3.5 Flash — launched at Google I/O 2026 on May 20, 2026. Replaced Gemini 3 Flash as the default in the Gemini app, in Google Search's AI Mode globally, and in the Gemini API (model id gemini-3.5-flash). Pricing announced: $1.50 per 1M input tokens, $9 per 1M output tokens — meaningfully cheaper than Gemini 3 at the same speed tier.
Gemini 3.5 Pro — announced at I/O, full general availability expected June 2026. Targets the frontier coding / reasoning / long-context use cases that previously needed Gemini Ultra. Context window of up to 2M tokens.

The previous generation — Gemini 3 Flash — hit Q1 2026 with strong scores (GPQA Diamond 90.4%, Humanity's Last Exam 33.7%) and rapidly became the consumer default. Gemini 3.5 is the price/quality re-pivot on top of that base, with explicit positioning against Claude Sonnet 4.6 and GPT-5.5.

Gemini 2.x and earlier are now legacy. The migration story is: Gemini 1.5 / 1.5 Pro → Gemini 2.0 (Feb 2025) → Gemini 2.5 Pro (mid-2025) → Gemini 3 (early 2026) → Gemini 3.5 (May 2026). Each step kept the API surface stable; model id changes plus pricing updates are the main migration work.

How does Gemini 3.5 compare to Claude Opus 4.7 and GPT-5.5?

The honest 2026 picture: three frontier families that trade places on different axes, with different strengths.

Dimension	Gemini 3.5 Pro	Claude Opus 4.7	GPT-5.5
Coding (SWE-bench)	Strong	~80.8% SWE-bench Verified (highest, Apr 2026)	Strong
Long-context	Up to 2M tokens	1M tokens (extended)	~400K tokens
Multimodal (text+image+audio+video)	Native, single-request	Text + image (strong)	Text + image
Tool use / agents	Strong, Vertex / Agent Platform integration	Computer-use agent, Agent Skills, Managed Agents	Strong tool use, function calling
Reasoning (extended thinking)	Yes	Yes (extended thinking)	o3/o4-mini reasoning line is the leader
Pricing tier (frontier)	Mid (~$1.50/$9 Flash; Pro TBC)	Higher	Higher
Free-tier access	Generous (AI Studio)	Limited (Claude.ai)	Free GPT-5.5 Instant
Workspace / Office integration	Native Google Workspace	Via Claude for Work	Microsoft 365 Copilot

Where each one wins in 2026:

Choose Gemini 3.5 when you need the longest context (2M tokens), native multimodal (video + audio in one request), generous free-tier prototyping in AI Studio, or deep Google Cloud / Workspace integration.
Choose Claude Opus 4.7 when you need the strongest coding agent, computer-use capability, or the cleanest agent-development experience (Claude Code, Agent Skills, Managed Agents). See our Claude Opus 4.7 guide.
Choose GPT-5.5 when you need the o3/o4-mini reasoning ladder, Microsoft 365 integration, or you're already deep in the OpenAI ecosystem. See our GPT-5.5 guide.

For agent workflows, Claude leads. For multimodal and ultra-long context, Gemini leads. For pure reasoning on hard problems, GPT-5.5's o3/o4-mini line is still the reference. Most production teams in 2026 use two or three of these via a model router rather than betting on one.

What's new in Gemini 3.5 vs Gemini 3?

The headline improvements Google demonstrated at I/O 2026:

Cheaper Flash. ~$1.50/$9 per 1M tokens — meaningfully below Gemini 3 Flash's prior tier and competitive with Claude Haiku 4.5 and GPT-5.5 Instant.
Sharper coding. Google demoed live multi-file refactors in the Gemini Code Assist extension and in the new Gemini CLI. Internal benchmarks show ~10-15 point gains on SWE-bench Verified vs 3.0 Flash.
Better extended thinking — Pro's reasoning chain is now visible to developers (similar to Claude's extended thinking and o3's reasoning traces) and can be capped to budget compute.
Longer context with cleaner long-context recall. 2M tokens on Pro with stronger needle-in-haystack performance than Gemini 1.5 Pro's earlier 2M context.
Native multimodal in a single request. Send an image, an audio file, and a video clip with one prompt — no separate API calls. The vision-audio-video joint reasoning is genuinely differentiated.
Agent-platform integration. Tighter coupling with Gemini Enterprise Agent Platform (the renamed Vertex AI Agent Builder), with first-class tool definitions, retrieval, evaluation, and deployment pipelines.

How do I access Gemini 3.5?

Five surfaces, depending on what you're building:

1. Gemini app (consumer). The free Gemini at gemini.google.com and the mobile apps now run Gemini 3.5 Flash as the default. Useful for quick prototyping.

2. Google AI Studio. Free developer playground at aistudio.google.com. Generous free tier — you can prototype real agents without a credit card. The best place to evaluate Gemini before committing to API.

3. Gemini API (direct). SDK in Python, Node, Go, with REST as the universal fallback. Model ids: gemini-3.5-flash, gemini-3.5-pro (when GA). The simplest production path — no Vertex needed.

4. Vertex AI / Gemini Enterprise Agent Platform. Enterprise hosting with IAM, VPC-SC, audit logs, fine-tuning workflows for the older Gemini variants, and the full Agent Builder pipeline. Use Vertex when you need GCP-native security and governance.

5. Gemini CLI. Google's open-source command-line agent (similar to Claude Code, Cursor's CLI). One binary, MCP server support, runs against your API key. The 2026 entry point for terminal-first developers — see github.com/google-gemini/gemini-cli.

For Workspace users: Gemini 3.5 is being rolled into Google Docs, Sheets, Gmail, Meet, and the Drive search experience. The "@Gemini" experience inside docs handles drafting, refactoring, multilingual translation, and the new "audio overview" feature for documents.

What's Gemini 3.5 actually good at?

Long-context document analysis. Drop a 1.5M-token document (a codebase, a legal contract, a 500-page PDF) and ask questions across it. No other major frontier model has this context length at usable recall. The 2026 sweet-spot use case: due diligence, code-base understanding, multi-document research synthesis.

Multimodal reasoning. Send a screenshot of a UI + a screen recording + a spec doc and ask the model to write the fix. Gemini handles all three in one request, with cross-modal attention. The video-input quality genuinely beats GPT-5.5 in our testing — Google's video training data advantage shows.

Workspace-embedded productivity. "Summarise this thread", "draft a reply", "build a sheet from this PDF" — the native Workspace surface is dramatically faster than the equivalent API workflow. For knowledge workers, this is the most-used Gemini surface.

Code review and refactor at file-tree scale. With the 2M context, Gemini 3.5 Pro can hold a small codebase in mind and reason across it. Code Assist's "review this PR" with full repo context is meaningfully better than Gemini 3.

Real-time multilingual translation. Gemini consistently leads on lower-resource language pairs, particularly Indian languages, Thai, Vietnamese, Indonesian, Swahili. If you're shipping product into India, Southeast Asia, or sub-Saharan Africa, Gemini's multilingual tail is the strongest of the three frontier families.

Where Gemini 3.5 lags in 2026: agentic coding (Claude leads), terminal-first dev experience (Claude Code is more polished), the hardest reasoning benchmarks (GPT-5.5 with o3/o4-mini still wins on competition-math and the highest tiers).

How do I structure my first Gemini 3.5 integration?

Three patterns that work in 2026:

Pattern A — Direct API with the Python SDK.

pip install google-genai
from google import genai
client = genai.Client(api_key=YOUR_API_KEY)
response = client.models.generate_content(
  model="gemini-3.5-flash",
  contents=["Summarise this:", open("doc.pdf", "rb").read()],
  config={"thinking_config": {"thinking_budget": 8192}}
)
print(response.text)

Use for: pure backend integration, no GCP dependency.

Pattern B — Vertex AI with IAM. Same SDK, but authenticated via Application Default Credentials and routed through your Google Cloud project. Use when you need audit logs, VPC-SC, or you're already in GCP.

Pattern C — Gemini CLI as an agent shell. Install with npm install -g @google/gemini-cli (or via brew). Authenticate with your API key. Run in a terminal — the CLI handles tool registration, MCP server discovery, multi-turn agent loops. Use for terminal-first development, code review, ops automation. See our AI coding agents guide for how it compares to Claude Code and Cursor.

What does Gemini 3.5 cost?

Gemini 3.5 Flash pricing announced at I/O 2026: $1.50 per 1M input tokens / $9 per 1M output tokens. Image and audio inputs priced as their token-equivalent. Free tier on AI Studio remains generous (60 requests per minute, with daily token quotas).

Gemini 3.5 Pro pricing is to be confirmed at GA in June 2026; expected to follow the same ratio (~10× Flash) as prior Pro tiers, putting it in the $15/$60 per 1M tokens range — competitive with Claude Sonnet 4.6 and below Claude Opus 4.7 / GPT-5.5 Pro.

Cached inputs: Google's prompt caching (released alongside Gemini 1.5 and improved with each generation) is the cost-control hammer for agent workloads. Cached tokens are charged at ~25% of input pricing. For any production agent loop that re-sends the same system prompt + tool definitions, prompt caching is the single highest-leverage optimisation.

What about Gemma 4 (the open-weight side)?

Gemini 3.5 is closed-weight, API-only. Google's open-weight family is the separate Gemma line — Gemma 4 launched April 2, 2026 with four sizes (E2B, E4B, 26B MoE, 31B dense) under Apache 2.0, 256K context, multimodal. Gemma is the right answer when you need to self-host, fine-tune locally, or run on edge devices. See our Gemma 4 complete guide.

The pattern in 2026 for teams using both: Gemma 4 for self-hosted or fine-tuned workloads, Gemini 3.5 for frontier API tasks. They share lineage but serve different deployment models.

FAQ

Is Gemini 3.5 Pro actually released yet?

Gemini 3.5 Flash launched at Google I/O 2026 on May 20, 2026 and is generally available now. Gemini 3.5 Pro was announced at the same event but full general availability is expected in June 2026. As of late May 2026, Pro is in limited preview for select Vertex customers.

What's the context window?

Gemini 3.5 Flash supports up to 1M tokens. Gemini 3.5 Pro supports up to 2M tokens — the largest of any production frontier model in May 2026. Long-context recall (the "needle in a haystack" benchmark) is meaningfully improved over Gemini 1.5 Pro at the same 2M cap.

Is Gemini 3.5 free?

Yes via Google AI Studio with a generous free tier (60 requests/min). The Gemini consumer app and AI Mode in Google Search use Gemini 3.5 Flash for free by default. Paid tiers exist for API usage and for the Gemini Advanced consumer subscription (Pro features in the app).

Does Gemini 3.5 support function calling and tool use?

Yes — function calling and structured outputs (with JSON schema validation) are first-class. The Gemini API exposes a tools parameter; Vertex / Agent Platform adds higher-level tool orchestration. For agent workflows, the integration with the renamed Gemini Enterprise Agent Platform (formerly Vertex AI Agent Builder) is the main 2026 surface.

Can I fine-tune Gemini 3.5?

Not the 3.5 generation directly as of May 2026 — Google offers fine-tuning on older Gemini variants (and Gemma family) through Vertex AI. For a fine-tunable Google-lineage model, use Gemma 4. See our fine-tuning guide for the broader landscape.

How does Gemini 3.5 compare to Claude Opus 4.7 for coding?

Claude Opus 4.7 leads on SWE-bench Verified (~80.8% in April 2026 — the highest in the category) and on agentic coding workflows via Claude Code. Gemini 3.5 Pro is competitive but lags by ~5–10 points on hard agentic tasks. For pure long-context code understanding (huge monorepo, multi-file analysis), Gemini's 2M context window gives it a different angle that Claude can't match in one request.

What about Gemini's multimodal advantage?

Genuine. Gemini accepts text, image, audio, and video in a single request with cross-modal attention. Video input quality in particular leads the field — Google's training data advantage shows. For workflows involving screen recordings, lecture videos, image+audio analysis, or any cross-modal reasoning, Gemini is the strongest choice.

What's the cheapest way to evaluate Gemini 3.5?

Google AI Studio (aistudio.google.com) — free, no credit card, full access to gemini-3.5-flash. Sufficient for production-scale prototyping. Move to the Gemini API once you exceed free quotas; move to Vertex when you need enterprise governance.

Is there a Gemini equivalent to Claude Code or Cursor?

Yes — the open-source Gemini CLI (github.com/google-gemini/gemini-cli), plus Gemini Code Assist as an IDE extension. The CLI is the closest analog to Claude Code; Code Assist competes with Cursor for IDE-integrated workflows. See our AI coding agents guide for the full comparison.

What's coming next?

Gemini 3.5 Pro full GA in June 2026. Expected: a Gemini 4 generation in late 2026 or early 2027 (no official roadmap), more specialised models in the Gemini family (a reasoning-focused variant similar to OpenAI's o3 line is widely rumoured), and continued Gemma 4 point releases on the open-weight side.

Claude Opus 4.7 guide — the direct frontier competitor
GPT-5.5 guide — the other major frontier
Gemma 4 guide — the open-weight Gemini-family sibling
AI coding agents — Gemini CLI vs Claude Code vs Cursor
Open-source LLMs landscape
Fine-tuning LLMs
Apple Silicon LLMs