Quick answer (refreshed May 28, 2026). May 2026 was the busiest month the agent category has ever had. Cursor 3 (April 2) demoted the IDE to a fallback pane and shipped an agent-first interface; Composer 2.5 (May 18) landed third on Artificial Analysis's Coding Agent Index at 62, behind only Claude Opus 4.7 in Claude Code (66) and GPT-5.5 in Codex (65), at roughly one-tenth the per-task cost. OpenAI's Codex CLI completed its rewrite from TypeScript to Rust and took /goal Goal Mode, which persists thread state across interruptions, to GA on May 21 (CLI 0.133). Google announced Antigravity CLI on May 19 as the replacement for Gemini CLI; the old CLI stops serving requests from free, Google AI Pro, and Ultra users on June 18, 2026. xAI's Grok Build (May 14) shipped a terminal agent with up to 8 parallel sub-agents in isolated git worktrees, joined by Grok Skills (May 18) and Connectors (May 6 / May 22). Claude Code opened its plugin marketplace as a first-class system in spring 2026 (Opus 4.7 at 1M tokens), and Claude Code on the web got the sessions-sidebar redesign in April. Cline 3.85 (May 25) added GPT-5.5 via SAP AI Core plus DeepSeek V4 Flash and Pro. And the standards layer: OpenAI and Anthropic co-founded the Agentic AI Foundation under the Linux Foundation, donating AGENTS.md (now in 60,000+ repos) and MCP, respectively. The table below covers the May 2026 capability and pricing matrix; the rest of the guide covers per-tool deep dives.
What's the May 2026 state of AI coding agents in one table?
| Tool | Strongest at | Pricing (May 2026) | What changed since April |
|---|---|---|---|
| Claude Code 2.1 | SWE-bench Verified 87.6% with Opus 4.7, agentic loops, terminal-first | $100/mo Max or $200/mo Max-20x; included with API spend | /code-review, Agent View, pinned bg sessions, plugin ecosystem, Opus 4.7 default |
| Cursor 3.5 | IDE-first, Cloud Agents, Composer 2.5 multi-file refactor | $20 Pro / $60 Pro+ / $200 Ultra; Cloud Agents metered | 3.3 Build-in-Parallel + Jira; 3.5 Cloud Agents |
| GitHub Copilot agent mode | VS Code integration, agentic code review, full-project context | $10–$39/mo; agent mode + BYOK now GA | BYOK third-party models, agent review |
| OpenCode | OSS, Scout subagent, auto-compact, MCP-native | Free | 161K GH stars, Scout subagent |
| Cline | OSS, BYO-API, MCP-native | Free (you pay API) | 61K GH stars, growing fast |
| Sourcegraph Amp | Repo-graph semantic context, unconstrained tokens | Hosted SaaS, contact for pricing | Spun out as standalone company |
| Windsurf / Cascade | IDE, Google AI behind the scenes post-acquisition | $15/mo Pro / $30 Ultimate | Founders → Google ($2.4B), rest → Cognition ($250M) |
| Gemini CLI | Terminal-first, Gemini 3.5, free generous tier | Free with API key | Now defaults to Gemini 3.5 Flash; Pro coming June |
| Replit Agent | Browser IDE, agent + deployment in one | Free + paid tiers | $400M Series D at $9B valuation (Mar 2026) |
The right tool depends on workflow shape: terminal-first → Claude Code or Gemini CLI; IDE-first → Cursor or Cline; cloud-async → Cursor Cloud Agents or Replit Agent; OSS-only / BYO-key → OpenCode or Cline. For Cursor's specifics see our Cursor IDE complete guide; for the underlying models, see Claude Opus 4.7, GPT-5.5, and Gemini 3.5.
Last updated: May 28, 2026.
The AI coding agent market doubled in size between mid-2025 and early 2026, and the field has finally fractured into recognisable categories: closed IDE-forks (Cursor, Windsurf), open IDE-forks (Void), terminal-native agents (Claude Code, Aider, OpenCode), VS Code extensions (Cline, Roo Code, Kilo Code, Continue.dev), and bring-your-own-key shells. Picking the wrong category costs hours of context-resetting before you even hit a paywall. This guide is the working comparison we use internally at Codersera when our vetted engineers onboard onto a new client codebase and need to recommend a tooling stack inside a week.
We restrict the field to ten agents that have either real adoption (over 100k weekly active users) or a defensible architecture story: Cursor, Claude Code, Cline, Aider, OpenCode, Continue.dev, Roo Code, Kilo Code, Windsurf, and Void AI, plus three notable May 2026 entrants covered below: OpenAI's Codex CLI (Rust rewrite + /goal mode GA), Google's Antigravity CLI (Gemini CLI's successor), and xAI's Grok Build. We pulled pricing, model lists, and benchmark numbers from each vendor's docs, the SWE-bench Verified, SWE-bench Pro, Terminal-Bench 2.0, and Artificial Analysis Coding Agent Index leaderboards, and primary HN/Reddit threads from March through May 2026. Where vendors disagree with independent reports, we flag it.
TL;DR
- Best default for a senior engineer in a real codebase: Claude Code on Max ($100/mo) or Max-20x ($200/mo) with Claude Opus 4.7. It scores 87.6% on SWE-bench Verified (Anthropic's still-private Claude Mythos Preview leads at 93.9%) and 64.3% on SWE-bench Pro. The plugin marketplace, Skills, and 1M-token context on Opus 4.7 cemented its lead in spring 2026.
- Best closed IDE experience: Cursor 3 ($20 Pro / $60 Pro+ / $200 Ultra). Cursor 3 (April 2, 2026) moved the IDE behind an agent-first interface; Composer 2.5 (May 18) is the new in-house model, ranked third on the Artificial Analysis Coding Agent Index at 62 and priced at $0.07–$0.44 per task, roughly one-tenth what Opus 4.7 or GPT-5.5 cost on the same harness.
- Best cloud-async pair: OpenAI Codex CLI with /goal Goal Mode (GA May 21, 2026, in CLI 0.133). The Rust rewrite plus a persistent thread-level state machine means a single /goal directive survives network drops, closed laptops, and budget resets across hours-long sessions.
- Best new entrant for hardware-rich teams: xAI Grok Build (early beta, launched May 14, 2026). Up to 8 parallel sub-agents each in its own git worktree, 256K context, 70.8% SWE-bench Verified. Access expanded May 24 to SuperGrok ($30/mo) and X Premium+ ($40/mo), down from the initial SuperGrok Heavy $300 gate.
- Best replacement for Gemini CLI: Google Antigravity CLI (GA May 19, 2026). Built in Go, shares the agent harness with Antigravity 2.0 desktop. Access to the legacy Gemini CLI for free, Google AI Pro, and Ultra users ends June 18, 2026, so plan migrations now.
- Best open-source / BYOK: Cline (now at 61k+ GitHub stars, GPT-5.5 and DeepSeek V4 Flash/Pro added in v3.85 on May 25), Kilo Code (1.5M users, 500+ models, zero markup), or Roo Code (forked from Cline, ~30% cheaper per task via diff-based editing).
- Best for privacy / fully local: Continue.dev with Ollama (still actively shipping through May 2026), or OpenCode (v1.15.10 on May 27 — diff viewer in TUI, native API-key runtime, experimental background agents).
- Best terminal-only minimalist: Aider. Pair it with Claude Opus 4.7, GPT-5.5, DeepSeek V4, or a local Qwen 3.5 model. Still no native MCP as of May 2026; community Aider-MCP server bridges exist.
What changed in the agent landscape between mid-2025 and May 2026
Four things reshaped the field. First, SWE-bench Verified saturated, and SWE-bench Pro became the number that matters. On Verified, Claude Mythos Preview leads at 93.9%, with GPT-5.5 near 88.7%, Claude Opus 4.8 at 88.6%, and Claude Opus 4.7 Adaptive at 87.6%. But OpenAI's contamination audit showed every frontier model can reproduce verbatim gold patches on some Verified tasks because the 500 Python issues leaked into training data. On contamination-resistant SWE-bench Pro, Claude Mythos Preview leads at 77.8%, followed by Opus 4.7 at 64.3% and Qwen 3.7 Max at 60.6%; most agents drop 20+ points moving from Verified to Pro. Two newer benchmarks broke into the conversation in spring 2026: Terminal-Bench 2.0 (89 hard end-to-end terminal tasks; GPT-5.5 leads at 82.7%, Claude Mythos at 82.0%, GPT-5.3 Codex at 77.3%) and Artificial Analysis's Coding Agent Index, where Claude Opus 4.7 in Claude Code (max) sits at 66, GPT-5.5 in Codex (xhigh) at 65, and Cursor's Composer 2.5 at 62, the cheapest agent above 60 by a 10–60× margin.
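To make the Verified-to-Pro gap concrete, the short calculation below subtracts the Pro score from the Verified score for the two models that have both numbers quoted in this section. The dictionary layout and function name are illustrative only.

```python
# Scores quoted in this section, in percent: (SWE-bench Verified, SWE-bench Pro).
SCORES = {
    "Claude Mythos Preview": (93.9, 77.8),
    "Claude Opus 4.7": (87.6, 64.3),
}


def verified_to_pro_drop(scores):
    """Return the percentage-point drop from Verified to Pro for each model."""
    return {
        model: round(verified - pro, 1)
        for model, (verified, pro) in scores.items()
    }


if __name__ == "__main__":
    for model, drop in verified_to_pro_drop(SCORES).items():
        print(f"{model}: {drop} points lower on Pro")
```

Opus 4.7 loses 23.3 points and Mythos Preview 16.1, which is why a Verified score on its own says little about how a model handles unseen repositories.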
Second, MCP and AGENTS.md became governed standards. In December 2025 the Linux Foundation formed the Agentic AI Foundation (AAIF) with OpenAI, Anthropic, and Block as founding members and Google, Microsoft, AWS, Bloomberg, and Cloudflare supporting. Anthropic donated the Model Context Protocol; OpenAI donated AGENTS.md, the project-level instruction file format now adopted by 60,000+ open-source repos and agent frameworks (Codex, Cursor, Devin, Factory, Gemini CLI, GitHub Copilot, Jules, VS Code, Amp). For the AGENTS.md format itself, see our AGENTS.md complete guide. By the close of Q2 2026, MCP had roughly 9,400 published servers across the four major registries and ~1,300 production-ready servers, with the ecosystem mid-transition from stdio to hosted HTTP transport and from API keys to OAuth 2.1. Every agent in this guide except Aider now speaks MCP natively; community Aider-MCP bridges exist while official support remains on the roadmap.
Third, the agent surface moved out of the IDE. Cursor 3 (April 2, 2026) reframed the editor as a fallback pane behind an agent-first interface, parallel agents across worktrees, and a Microsoft Teams integration. Claude Code's web app got a sessions-sidebar redesign in April with drag-and-drop layout and custom themes. OpenAI's /goal Goal Mode (CLI 0.133, May 21) is the same idea in CLI form: type a directive, the agent persists thread state through interruptions, network drops, and budget resets, and resumes when you're back. Google's Antigravity CLI (May 19) and xAI's Grok Build (May 14) both ship multi-agent orchestrators that run sub-agents in parallel against isolated git worktrees. The pattern is clear — the next twelve months belong to interfaces that let you supervise multiple agents rather than steer one.
Fourth, credit-based billing is collapsing back into flat or per-token pricing. Cursor's credit pool remains the holdout but Composer 2.5's low per-task cost ($0.07 standard / $0.44 fast) makes the pool stretch much further; Windsurf moved to daily quotas with adaptive model routing; Cline, Kilo, and Roo all default to your own API keys at zero markup. The economics are converging on "you pay the model provider directly, the agent just orchestrates." For broader context on Cursor's UX tradeoffs versus the open camp, see our Cursor vs Void comparison and the Void privacy deep-dive.
What launched in May 2026: the month in detail
The pace of change between May 1 and May 28, 2026 broke a record. Here's the timeline with the specific version numbers, dates, and sources so you can re-verify each line.
Cursor 3 (April 2) and Composer 2.5 (May 18)
Cursor 3 (April 2, 2026) shipped an agent-first interface that demotes the IDE to one of several panes. The new Agents Window runs many agents in parallel across repos and environments — locally, in worktrees, in the cloud, and on remote SSH. Cursor 3.3 (May 7) added a PR review experience inside Cursor with Reviews, Commits, and Changes tabs; Build-in-Parallel for plans; Dockerfile build secrets and 70%-faster layer caching; per-environment version history and audit logs; and a Microsoft Teams integration where you can @Cursor a cloud agent from any channel. Composer 2.5 (May 18) is the new in-house coding model: it scored 62 on Artificial Analysis's Coding Agent Index (third place behind Claude Opus 4.7 in Claude Code at 66 and GPT-5.5 in Codex at 65), 79.8% on SWE-Bench Multilingual, and 63.2% on CursorBench v3.1 — and crucially priced at $0.50/M input and $2.50/M output (standard) or $3.00/$15.00 (Fast), about 10× cheaper than the higher-effort variants ahead of it. Composer 2.5 also picked up +35 points on SWE-Bench-Pro-Hard-AA, +2 on Terminal-Bench v2, and +3 on SWE-Atlas-QnA versus Composer 2.
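As a rough illustration of what those per-token rates mean for a single task, the sketch below prices one hypothetical task at each rate. The Composer 2.5 rates come from the paragraph above and the Opus 4.7 rate ($5/$25) is the one quoted elsewhere in this guide; the token counts are invented for the example and are not a measurement of any real task.

```python
# USD per million tokens as quoted in this guide: (input, output).
RATES = {
    "composer-2.5-standard": (0.50, 2.50),
    "composer-2.5-fast": (3.00, 15.00),
    "opus-4.7": (5.00, 25.00),
}


def task_cost(model, input_tokens, output_tokens):
    """Return the USD cost of one task at the model's per-million-token rates."""
    rate_in, rate_out = RATES[model]
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000


if __name__ == "__main__":
    # Hypothetical task size: 100k tokens read, 8k tokens written.
    for model in RATES:
        print(f"{model}: ${task_cost(model, 100_000, 8_000):.2f}")
```

At this assumed task size the standard Composer 2.5 rate works out to a tenth of the Opus 4.7 rate, in line with the "about 10×" figure above; real tasks will differ in their input/output mix.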
OpenAI Codex CLI: Rust rewrite and /goal Goal Mode GA (May 21)
OpenAI's Codex CLI completed its 2025–2026 rewrite from TypeScript to Rust; the Rust build now ships as the maintained CLI at versions 0.128–0.133 with new configuration, authentication, permission profiles, and sandbox modes. The headline ship of May 21, 2026 is /goal Goal Mode GA in CLI 0.133.0 and matching IDE/desktop builds. You type /goal followed by your prompt; Codex stores the goal as a persisted thread-level state machine and resumes work after network drops, deliberate pauses, or budget resets — including resuming a six-hour run after a five-hour pause. Earlier in the spring, OpenAI launched the Codex plugin marketplace (March 26, 2026, CLI 0.117) bundling Skills, MCP servers, and app connectors into shareable units; CLI 0.121 (April 15) added codex marketplace add for installing plugin marketplaces from GitHub, git URLs, or local directories.
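The idea of a persisted, thread-level goal can be sketched generically. The example below is not Codex's implementation: the state names, the JSON layout, and the function names are all hypothetical, and it checkpoints to a local file purely for illustration. It only shows the general pattern of saving progress so a later process can resume where the earlier one stopped.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical states; the real Goal Mode state machine is not described here.
STATES = ("pending", "running", "paused", "done")


def save_goal(path, goal, state, completed_steps):
    """Write the goal checkpoint to disk so a later process can pick it up."""
    if state not in STATES:
        raise ValueError(f"unknown state: {state}")
    checkpoint = {"goal": goal, "state": state, "completed_steps": completed_steps}
    Path(path).write_text(json.dumps(checkpoint))


def resume_goal(path, all_steps):
    """Reload the checkpoint and return the goal plus the steps still to run."""
    checkpoint = json.loads(Path(path).read_text())
    done = set(checkpoint["completed_steps"])
    remaining = [step for step in all_steps if step not in done]
    return checkpoint["goal"], remaining


if __name__ == "__main__":
    steps = ["write failing test", "implement fix", "run test suite"]
    with tempfile.TemporaryDirectory() as tmp:
        checkpoint_file = Path(tmp) / "goal.json"
        # Simulate an interruption after the first step.
        save_goal(checkpoint_file, "fix flaky login test", "paused", steps[:1])
        goal, remaining = resume_goal(checkpoint_file, steps)
        print(goal, remaining)
```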
Google Antigravity CLI replaces Gemini CLI (June 18 cutoff)
Google announced on May 19, 2026 that Gemini CLI is being sunset in favour of Antigravity CLI. Built in Go (snappier than the Node-based Gemini CLI), Antigravity CLI shares the same agent harness as Antigravity 2.0, Google's new desktop application, so future improvements land in both places. It orchestrates multiple agents for complex tasks in the background, runs large-scale refactors or research without locking the terminal, and supports agent skills, hooks, sub-agents, and extensions at launch. Antigravity CLI is not open source, which is a real change of posture from Gemini CLI. June 18, 2026 is the date on which Gemini CLI and the Gemini Code Assist IDE extensions stop serving requests for Google AI Pro, Ultra, and free users. Organisations on Gemini Code Assist Standard/Enterprise licences keep access; paid Gemini and Gemini Enterprise API key access is unaffected. If your team is on a free, Pro, or Ultra plan, plan the migration this month.
xAI Grok Build, Skills, and Connectors
xAI ran the densest single month any vendor has shipped in this category. Connectors launched May 6, 2026, with first-wave integrations for GitHub, Notion, Linear, Google Workspace, Microsoft 365, plus Bring-Your-Own-MCP; May 22 added Vercel, Canva, Gamma, and S&P Global. Grok Build (May 14, 2026) is xAI's terminal coding agent with up to 8 parallel sub-agents, each in its own isolated git worktree across a three-stage plan/search/build workflow, 256K context, and a local-first privacy model that sends zero codebase data to xAI servers. It scored 70.8% on SWE-Bench Verified at launch. Grok Skills (May 18) is xAI's persistent-custom-expertise system: a Skill is a bundle of name, description, instructions, and optional reference files that you upload, then invoke via slash command or intent. API pricing is aggressive at $0.20 input / $1.50 output per million tokens, well under Opus 4.7's $5/$25. Access initially required SuperGrok Heavy at $300/mo; on May 24 it expanded to all SuperGrok ($30/mo) and X Premium+ ($40/mo) subscribers. For the deep dive see our Grok Build, Skills, and Connectors guide.
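The "one worktree per sub-agent" isolation described above can be reproduced by hand with git's own `git worktree add` subcommand. The sketch below only builds the commands; the `.worktrees/` directory layout, the `agent/` branch prefix, and the function name are assumptions for illustration and say nothing about how Grok Build itself lays out its worktrees.

```python
from pathlib import Path

MAX_SUBAGENTS = 8  # the parallelism cap quoted for Grok Build


def worktree_commands(repo_root, task_names, base_ref="HEAD"):
    """Build one `git worktree add` command per sub-agent task.

    Each task gets its own branch and its own checkout directory, so
    parallel agents never write to the same working tree.
    """
    if len(task_names) > MAX_SUBAGENTS:
        raise ValueError(f"at most {MAX_SUBAGENTS} parallel sub-agents")
    commands = []
    for name in task_names:
        worktree_dir = Path(repo_root) / ".worktrees" / name
        branch = f"agent/{name}"
        commands.append(
            ["git", "worktree", "add", "-b", branch, str(worktree_dir), base_ref]
        )
    return commands


if __name__ == "__main__":
    for command in worktree_commands(".", ["tests", "docs", "api"]):
        print(" ".join(command))
```

Each printed command can be run from the repository root (for example with `subprocess.run(command, check=True)`); `git worktree remove` cleans a checkout up once its branch has been merged or discarded.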
Claude Code plugin marketplace and the April web redesign
Anthropic's quiet ship of the spring was opening Claude Code's plugin marketplace as a first-class system. A Skill is a single instruction set; a plugin bundles multiple Skills, MCP servers, or commands. As of May 2026 both can be filtered in real time from /skills and /plugin prompts with type-to-filter. The official Anthropic marketplace sits at claude-plugins-official; the curated community list lives at awesome-claude-code-plugins. Claude Code on the web also got a redesign in Week 17 (April 20–24, 2026): a new sessions sidebar, drag-and-drop layout, custom themes, and a public research preview called /ultrareview with bug-hunting agents. Underneath all of this, Claude Opus 4.7 moved native context from 200K to 1M tokens, which is what makes the plugin/skill explosion practical inside a single Claude Code session.
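The Skill/plugin distinction is easier to see as a data structure. The sketch below models it with hypothetical classes: the field names are invented for illustration and are not Claude Code's actual manifest format, and the substring matching in `type_to_filter` is an assumption about how a type-to-filter prompt might behave.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    instructions: str  # a Skill is a single instruction set


@dataclass
class Plugin:
    name: str
    skills: list = field(default_factory=list)       # bundled Skill objects
    mcp_servers: list = field(default_factory=list)  # names of bundled MCP servers
    commands: list = field(default_factory=list)     # names of bundled commands


def type_to_filter(items, typed):
    """Return items whose name contains the typed text, ignoring case."""
    needle = typed.lower()
    return [item for item in items if needle in item.name.lower()]


if __name__ == "__main__":
    review = Skill("code-review", "Review the diff for bugs and style issues.")
    migrate = Skill("db-migrate", "Write reversible schema migrations.")
    bundle = Plugin("backend-kit", skills=[review, migrate], mcp_servers=["postgres"])
    print([skill.name for skill in type_to_filter(bundle.skills, "rev")])
```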
Cline 3.85, Windsurf, and OpenCode May updates
Cline 3.85.0 (May 25, 2026) added GPT-5.5 support via SAP AI Core, DeepSeek V4 Flash and Pro models, Gemini 3.5 Flash on the Gemini and Vertex providers, and routed Poolside Laguna models through next-gen prompts with native tool calling. The release also fixed Vertex AI's global endpoint handling for Claude models and exposed a manual update path that bypasses the release-age gate. Windsurf shipped Claude Opus 4.7 and GPT-5.5 in May, plus a Kanban-style view of local and cloud agent sessions, a Devin Local agent that's 30% more token-efficient than Cascade, Devin Cloud included with every self-serve plan ($50 of usage included on first cloud session), and an Adaptive model option in the picker that intelligently routes to keep quota alive. OpenCode shipped v1.15.6 on May 20 (diff viewer in TUI for reviewing changes, collapsed single-child directories, shell mode in the run prompt), v1.15.9 (redesigned diff viewer with file tree, clearer errors for invalid default models), and v1.15.10 on May 27 (restored legacy production desktop flows). Anthropic API-key models now use the native runtime, experimental background agents now push updates without polling, and the desktop added tabs, a native app menu on Windows, and Ukrainian locale support.
Tool comparison matrix
| Tool | Form factor | License | Agent loop | MCP | Best for |
|---|---|---|---|---|---|
| Cursor | VS Code fork (closed) | Proprietary | Composer / Agent | Yes | Frontend, fast feedback |
| Claude Code | Terminal CLI + IDE plugins | Proprietary, source-available | Plan + Execute | Yes (remote MCP on Pro+) | Senior engineers, large refactors |
| Cline | VS Code extension | Apache 2.0 | Plan / Act, human-in-the-loop | Yes | Auditable autonomy |
| Aider | Terminal CLI | Apache 2.0 | Architect / Editor split | No (planned) | Git-native pair programming |
| OpenCode | Terminal + desktop + ext. | MIT | Build / Plan modes | Yes | Privacy-first teams |
| Continue.dev | VS Code + JetBrains ext. | Apache 2.0 | Chat + Agent | Yes | Enterprise, JetBrains shops |
| Roo Code | VS Code extension | Apache 2.0 | Multi-mode (Architect, Code, Debug) | Yes | Cost-efficient agentic work |
| Kilo Code | VS Code + JetBrains + CLI | Apache 2.0 | Subagents + Agent Manager | Yes | Heavy multi-agent workflows |
| Windsurf | VS Code fork (closed) | Proprietary | Cascade | Yes | Codemaps + flow state |
| Void AI | VS Code fork (open) | Apache 2.0 | Agent + Quick Edit | Yes | Local-only, privacy-strict |
Pricing matrix (May 1, 2026)
| Tool | Free tier | Individual paid | Team | Billing model |
|---|---|---|---|---|
| Cursor | 2K completions | Pro $20, Pro+ $60, Ultra $200 | Business $40/seat | Credit pool = plan price |
| Claude Code | None (Free Claude.ai excludes Code) | Pro $20, Max $100, Max-20x $200 | Premium $100/seat (annual) | Subscription + token caps; API per-token |
| Cline | Extension free | BYOK (you pay model) | Cline Cloud (paid) | Pass-through |
| Aider | CLI free | BYOK | n/a | Pass-through |
| OpenCode | Free | Zen / Go credits optional | Self-host | BYOK or curated routing |
| Continue.dev | Free | From $10/mo | Enterprise | Hub features + BYOK |
| Roo Code | Free | Pro adds Roo Cloud | Team adds sync | BYOK |
| Kilo Code | Free | Pay-as-you-go, zero markup | Same | Exact model price |
| Windsurf | 5 daily AI interactions | Pro $15/mo | Teams $30/seat | Daily quota (post-March 2026) |
| Void AI | Free | BYOK (or local) | n/a | Pass-through |
Two effective-cost notes from real engineers running these in production: Cursor Pro's $20 credit pool buys roughly 225 Claude Sonnet requests, 500 GPT-4o requests, or 550 Gemini requests in agent mode. Roo Code's diff-based apply_diff tool reduces token spend by about 30% compared with Cline on equivalent tasks because it only emits changed lines in a 500-line file rather than the whole file.
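Both notes reduce to simple arithmetic. The sketch below derives the implied cost per request from the pool figures above, then uses Python's standard `difflib` to compare how many lines a unified diff emits against the whole file for a one-line change in a 500-line file. Line counts are only a proxy for tokens, and the unified-diff format here is a stand-in rather than Roo Code's actual apply_diff format.

```python
import difflib

POOL_USD = 20.00
REQUESTS_PER_POOL = {"Claude Sonnet": 225, "GPT-4o": 500, "Gemini": 550}


def implied_cost_per_request(pool_usd, requests_per_pool):
    """Divide the credit pool by the request count for each model."""
    return {model: round(pool_usd / n, 3) for model, n in requests_per_pool.items()}


def diff_vs_full_size(original_lines, edited_lines):
    """Return (lines in a unified diff, lines in the whole edited file)."""
    diff = list(difflib.unified_diff(original_lines, edited_lines, lineterm=""))
    return len(diff), len(edited_lines)


if __name__ == "__main__":
    print(implied_cost_per_request(POOL_USD, REQUESTS_PER_POOL))

    original = [f"line {i}" for i in range(500)]
    edited = list(original)
    edited[249] = "line 249 (changed)"  # a single-line edit in the middle
    diff_lines, full_lines = diff_vs_full_size(original, edited)
    print(f"diff: {diff_lines} lines, whole file: {full_lines} lines")
```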
Model support matrix
| Tool | Frontier closed models | Open / local | BYOK | Notable defaults |
|---|---|---|---|---|
| Cursor | Claude Opus 4.7, Sonnet 4.6, GPT-5.5, Gemini 2.5 Pro | Limited | Yes (custom) | Auto mode (router) |
| Claude Code | Claude Opus 4.7, Sonnet 4.6, Haiku 4.5 | No | No (Anthropic-only) | Sonnet 4.6 default, Opus on Max |
| Cline | Anthropic, OpenAI, Gemini, Bedrock, Vertex | Ollama, LM Studio, OpenAI-compatible | Yes | VS Code LM API (experimental) |
| Aider | Claude 3.7+/4.x, GPT-4o/5, DeepSeek V4, o-series | Ollama, OpenAI-compatible | Yes | Architect/Editor pair |
| OpenCode | 75+ providers | Ollama | Yes | Zen routing optional |
| Continue.dev | OpenAI, Anthropic, Azure, Bedrock | Ollama, vLLM, TGI | Yes | Hub for shared configs |
| Roo Code | OpenRouter (300+ models) | Ollama, LM Studio | Yes | Custom modes per model |
| Kilo Code | 500+ models via OpenRouter and direct | Ollama, LM Studio | Yes | Subagents auto-delegate |
| Windsurf | SWE-1.5 (in-house), Claude, GPT-5 | Limited | Partial | Cascade with Codemaps |
| Void AI | Anthropic, OpenAI, Gemini | Ollama, LM Studio, DeepSeek, Qwen, Llama | Yes | Local-first defaults |
For setup walk-throughs on the open / local side, see Qwen 3.5 + Claude Code OSS, OpenClaw + Ollama, and our deep dives on DeepSeek V4 and the cheaper DeepSeek V4 Flash.
Agent loop architectures, compared
"Agent loop" is the structural difference that decides whether a tool can survive a multi-hour task. Three patterns dominate in 2026:
Plan / Act split (Cline, Roo, OpenCode). The model first emits a plan with no file mutations. The user approves or edits the plan. Only then does the agent transition to Act mode where it can write files and run shell commands. Cline pioneered "human-in-the-loop" — every file edit, every command, every browser action requires explicit approval. That makes it slower but auditable, which matters when an agent is touching production code or migrating a database schema. Roo Code keeps the same skeleton but adds five built-in modes (Code, Architect, Ask, Debug, Custom) and uses diff-based edits to cut token cost.
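A minimal sketch of the Plan / Act gate, assuming a hypothetical `PlanActSession` class rather than any tool's real internals: planning records intent without side effects, and acting is refused until the plan has been approved.

```python
from dataclasses import dataclass, field


@dataclass
class PlanActSession:
    plan: list = field(default_factory=list)
    approved: bool = False
    applied: list = field(default_factory=list)

    def propose(self, steps):
        """Plan mode: record intended steps without touching any files."""
        self.plan = list(steps)
        self.approved = False  # a new or edited plan always needs fresh approval

    def approve(self):
        """The human signs off on the current plan."""
        self.approved = True

    def act(self, apply_step):
        """Act mode: run each planned step, but only after approval."""
        if not self.approved:
            raise PermissionError("plan must be approved before Act mode")
        for step in self.plan:
            self.applied.append(apply_step(step))
        return self.applied


if __name__ == "__main__":
    session = PlanActSession()
    session.propose(["edit api.py", "run tests"])
    session.approve()
    print(session.act(lambda step: f"done: {step}"))
```

Cline's per-action approval would correspond to checking approval inside the loop for every step instead of once per plan, which is where the extra latency comes from.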
Architect / Editor pair (Aider). A reasoning model (o1, DeepSeek R1, Opus) drafts the change in plain English. A faster, cheaper editor model (Sonnet, GPT-4o, DeepSeek V3) translates the plan into precise diffs. The split costs more per turn but is the most reliable single pattern for large refactors because the planner never spends tokens on syntax.
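The Architect / Editor hand-off can be sketched as a two-stage function. This is an illustration of the pattern, not Aider's code: the function name is hypothetical and the two stub functions stand in for real model calls so the example runs offline.

```python
from typing import Callable


def architect_editor_turn(
    task: str,
    architect: Callable[[str], str],
    editor: Callable[[str, str], str],
) -> dict:
    """Run one turn: plan in prose first, then turn the plan into an edit."""
    plan = architect(task)     # reasoning model: plain-English plan, no code
    edit = editor(task, plan)  # cheaper model: concrete edit derived from the plan
    return {"plan": plan, "edit": edit}


# Stand-ins for real model calls.
def stub_architect(task: str) -> str:
    return f"Plan for '{task}': rename get_user to fetch_user in api.py"


def stub_editor(task: str, plan: str) -> str:
    return "-def get_user(user_id):\n+def fetch_user(user_id):"


if __name__ == "__main__":
    result = architect_editor_turn("rename the user getter", stub_architect, stub_editor)
    print(result["plan"])
    print(result["edit"])
```

Because the editor receives both the task and the plan, the two models can be swapped independently, which is what lets the planner be an expensive reasoning model while the editor stays cheap.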
Subagents and orchestrators (Claude Code, Kilo Code). The top-level agent spawns specialised children — a "test runner" subagent, a "schema migration" subagent, a "frontend styling" subagent — each with its own context window. Kilo Code's April 2026 rebuild made this the headline feature: parallel tool calls and an Agent Manager that runs multiple agents side by side. Claude Code does the same via its Task tool. The downside is debuggability; when something goes sideways inside a subagent you have less visibility than a flat plan/act trace.
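The orchestrator pattern can be sketched with the standard library. In the example below each sub-agent builds its own private context list and returns only a summary to the parent; the function names and the returned dictionary shape are hypothetical, and the "work" is a placeholder for a real model call.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(role, task):
    """Run one sub-agent with a context that is private to this call."""
    context = [f"role: {role}", f"task: {task}"]
    # Placeholder for a model call that would read and extend the context.
    context.append(f"result: {role} finished '{task}'")
    # Only a summary goes back to the orchestrator, not the full context.
    return {"role": role, "summary": context[-1], "context_size": len(context)}


def orchestrate(assignments, max_workers=4):
    """Fan tasks out to sub-agents in parallel; return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_subagent, role, task) for role, task in assignments]
        return [future.result() for future in futures]


if __name__ == "__main__":
    jobs = [
        ("test runner", "run unit tests"),
        ("schema migration", "add orders table"),
        ("frontend styling", "fix button spacing"),
    ]
    for result in orchestrate(jobs):
        print(result["summary"])
```

The debuggability cost mentioned above is visible even here: the parent sees only each summary, so anything that went wrong inside a sub-agent's context has to be logged explicitly to be recoverable.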
For a hands-on look at running the open Claude Code internals, our Claude Code OSS guide walks through the orchestrator, and using Claude 4/Sonnet with Cursor and Windsurf covers the closed-IDE side.
Real workflow examples
Greenfield: a Next.js 15 app with Postgres and auth
This is the case every demo nails. Cursor, Windsurf, Claude Code, and Cline all produce working scaffolds in under 15 minutes. The tiebreaker is what happens when you ask for a non-trivial second feature on top — say, "add Stripe Connect with webhook signature verification and idempotency keys." Claude Code on Sonnet 4.6 produced the cleanest output in our internal test (3 files modified, 1 webhook signature bug caught before commit). Cursor's Composer was 2x faster but missed the idempotency key on first pass. Aider with Architect + Editor produced the smallest diff but required a manual /add for the migration file because Aider's repo map didn't pull it in automatically.
Large-codebase refactor: migrate 80 files from Redux to Zustand
This is where most agents fall apart. The honest results from a 90k-line internal client codebase, March 2026:
- Claude Code with Opus 4.7 + subagents: finished in 4 hours over 3 sessions, 76 files cleanly migrated, 4 needed manual fixes. Cost: ~$38 of API + Max subscription.
- Cline with Sonnet 4.6: finished in 6 hours but the human-in-the-loop confirmations were the bottleneck. 78 files cleanly migrated. Cost: ~$22 BYOK.
- Cursor Agent (Auto): blew past the credit pool at file 31, then degraded to GPT-4o-mini and produced inconsistent type imports. 51 files cleanly migrated, 29 needed rework.
- Aider with Opus 4.7 architect + Sonnet 4.6 editor: 71 files cleanly migrated. The repo map saved time but Aider needed explicit /add calls for cross-package files.
- Roo Code with Sonnet 4.6: 73 files migrated, ~30% cheaper than Cline thanks to diff-only edits.
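For a side-by-side view, the migration counts above convert to success rates over the 80 files. The sketch assumes every run targeted the same 80 files; note that the Roo Code figure is reported as "migrated" rather than explicitly "cleanly migrated".

```python
TOTAL_FILES = 80

# Files migrated without rework, as reported in the list above.
CLEAN_FILES = {
    "Claude Code (Opus 4.7 + subagents)": 76,
    "Cline (Sonnet 4.6)": 78,
    "Cursor Agent (Auto)": 51,
    "Aider (Opus 4.7 + Sonnet 4.6)": 71,
    "Roo Code (Sonnet 4.6)": 73,  # reported as "migrated", not "cleanly migrated"
}


def clean_rates(clean_files, total):
    """Return percent of files migrated cleanly, sorted best first."""
    rates = {tool: 100 * count / total for tool, count in clean_files.items()}
    return dict(sorted(rates.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    for tool, rate in clean_rates(CLEAN_FILES, TOTAL_FILES).items():
        print(f"{tool}: {rate:.1f}%")
```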
Debugging: tracking down a flaky test
Aider, Cline, and Claude Code dominate this category because they can run the failing test in a loop and read the output. Aider's tight loop (run tests, read errors, re-edit) is still the fastest. Cursor's agent can do this too but the IDE chrome adds friction. Continue.dev with a local Qwen 3.5 model on Ollama handled a flaky pytest fixture without any cloud round-trip — slow (~45s per turn) but completely private.
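The "run the failing test in a loop" step does not need an agent at all. The sketch below reruns an arbitrary command and counts non-zero exits, which is enough to confirm a test is flaky and to estimate how often it fails. The function name and the simulated flaky command are illustrative; in practice you would pass your own test command as the argument list.

```python
import subprocess
import sys


def count_failures(command, runs=20):
    """Run `command` repeatedly and return how many runs exited non-zero."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            failures += 1
    return failures


if __name__ == "__main__":
    # Stand-in for a flaky test: exits with status 1 roughly 30% of the time.
    flaky = [
        sys.executable,
        "-c",
        "import random, sys; sys.exit(1 if random.random() < 0.3 else 0)",
    ]
    runs = 20
    print(f"{count_failures(flaky, runs)} of {runs} runs failed")
```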
Privacy, deployment, and the local-models story
Three of the ten agents are credible for "no code leaves my machine": Continue.dev with Ollama, Void AI with local DeepSeek/Qwen/Llama, and OpenCode (which explicitly states it stores no code or context). Cline and Kilo can be configured local-only but their default user flow assumes a cloud model. Aider is local-capable via Ollama but its Architect mode realistically requires a frontier model to be useful.
If your bar is "compliance-grade local," Continue.dev plus a 70B-class local model is the production-tested combo. Void AI is technically excellent but the team announced an active-development pause in early 2026 — the binary still works and the repo is open, but you should not assume new features land in 2026.
Known issues
- Cursor credit-pool surprises (mitigated by Composer 2.5). Heavy agent-mode users on the $20 Pro plan still burn the pool fast, but Composer 2.5's $0.07–$0.44 per-task pricing makes the pool stretch much further than it did under the GPT-5.5 / Opus 4.7 era. If you need frontier-model agent work, Pro+ at $60 or Ultra at $200 is still the realistic tier.
- Gemini CLI access for free, Google AI Pro, and Ultra users ends June 18, 2026. Migrate to Antigravity CLI before that date or lose access mid-task. Unlike Gemini CLI, Antigravity CLI is not open source, a real change in posture if you care about auditability or self-hosting.
- Goal Mode is GA but session storage is OpenAI-side. /goal persists the directive on OpenAI's infrastructure; if you have a no-data-leaves-our-VPC policy, Codex CLI's Goal Mode doesn't fit. Local-first alternatives are Aider's tight test loop or Claude Code on a self-hosted Bedrock/Vertex endpoint.
- Grok Build is early beta. 8 parallel sub-agents is impressive on paper but the beta has rough edges around long-running tasks; xAI has acknowledged this and committed to daily release notes. The local-first privacy model is real, but the rapid release cadence means you should pin specific versions in CI.
- Claude Code on Pro is still conservatively rate-limited. Long-running Opus 4.7 sessions on 1M-token context hit the Pro cap quickly. Max ($100) or Max-20x ($200) is the realistic floor for senior engineers using it as their primary agent.
- Cline's human-in-the-loop is slow on long tasks. The Auto-approve toggle helps but defeats the audit story. Roo Code's batched approvals are a better compromise.
- Aider still has no native MCP. A community Aider-MCP server bridges via WebSocket (allowing Claude and other MCP clients to call Aider tools), but Aider itself doesn't yet speak MCP to outside servers. If you depend on internal MCP tools, Aider is still effectively out.
- Windsurf has Devin Cloud now but pre-March-2026 credit-billed accounts don't get Supercomplete or new SWE-1.5-tier features. New users start on daily-quota billing; the Adaptive model option helps keep quota alive.
- Void AI development is still paused. The editor functions but there's no roadmap. Treat it as "stable open-source artifact," not "active product."
- SWE-bench Verified contamination persists. Use SWE-bench Pro or Terminal-Bench 2.0 when evaluating models. The 87.6% Verified score for Opus 4.7 drops to 64.3% on Pro; the 93.9% Mythos Preview Verified drops to 77.8% on Pro.
- Roo Code and Kilo Code overlap with Cline genealogically. Both began as Cline forks and all three are Apache-2.0 licensed; if you have a hard organisational requirement against forks-of-forks, audit dependencies before standardising.
- Antigravity 2.0 / Cursor 3 / Grok Build all bet on parallel sub-agents in worktrees. Debuggability is the universal weak point — when something goes wrong inside a sub-agent, traceability is worse than a flat plan/act trace. Build observability into your AGENTS.md and sub-agent prompts before relying on this at scale.
How to choose
Four questions decide it for most teams in mid-2026:
- Does your codebase exceed 50k lines? If yes, you need a planner/architect step (Aider Architect, Claude Code sub-agents, Roo Code's Architect mode, Grok Build's plan/search/build sub-agents) or a 1M-token context model like Opus 4.7. Pure inline-completion tools like Cursor's basic Tab degrade fast on large repos.
- Can your code legally leave the machine? If no, your shortlist is Continue.dev + Ollama, OpenCode + local, Grok Build (local-first, zero codebase data sent to xAI), or Void AI + local. Everything else assumes cloud.
- Do you need sessions that survive interruptions? If you run hours-long tasks on flaky networks or have to step away mid-run, Codex CLI /goal is the only GA tool with persistent thread-level state at the protocol level. Cursor 3 Cloud Agents and Antigravity CLI background runs are the IDE-side equivalents.
- Who owns the bill? If the company pays per-seat predictably: Cursor Business, Windsurf Teams, Claude Code Premium, or Codex Enterprise. If individual engineers expense API: Cline, Aider, Roo, or Kilo with BYOK, and for cheap agentic work specifically, Cursor with Composer 2.5 at $0.07–$0.44/task.
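The four questions can be read as a small decision function. The sketch below is a simplification that maps each answer to the options this guide names for it; the function name, parameter names, and the abbreviated option strings are illustrative, and real tool selection involves more trade-offs than four flags.

```python
def shortlist(lines_of_code, code_may_leave_machine,
              needs_resumable_sessions, company_pays_per_seat):
    """Map the four questions to the options this guide names for each answer."""
    advice = []
    if lines_of_code > 50_000:
        advice.append("planner/architect step or a 1M-token context model")
    if not code_may_leave_machine:
        advice.append(
            "Continue.dev + Ollama, OpenCode + local, Grok Build, or Void AI + local"
        )
    if needs_resumable_sessions:
        advice.append("Codex CLI /goal")
    if company_pays_per_seat:
        advice.append(
            "Cursor Business, Windsurf Teams, Claude Code Premium, or Codex Enterprise"
        )
    else:
        advice.append("Cline, Aider, Roo, or Kilo with BYOK")
    return advice


if __name__ == "__main__":
    # Example: a 90k-line repo, code must stay local, long runs, engineers expense API.
    for line in shortlist(90_000, False, True, False):
        print("-", line)
```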
Frequently asked questions
Which AI coding agent has the highest SWE-bench Pro score in 2026?
As of May 2026, Claude Mythos Preview leads SWE-bench Pro at 77.8%, followed by Claude Opus 4.7 at 64.3% and Qwen 3.7 Max at 60.6%. GPT-5.4 (xHigh) reaches 59.1% on Scale's SEAL mini-swe-agent scaffold and GPT-5.3-Codex 56.8%. On the older SWE-bench Verified, Claude Mythos Preview hit 93.9%, GPT-5.5 around 88.7%, Opus 4.8 88.6%, and Opus 4.7 Adaptive 87.6%, but Verified is contaminated and Pro is the more honest comparison.
Is Cursor still worth $20/month in 2026?
More than before, because of Composer 2.5. Composer 2.5 (May 18, 2026) is the in-house model that ranks third on Artificial Analysis's Coding Agent Index at 62, and per-task pricing is $0.07 (standard) or $0.44 (Fast) — 10–60× cheaper than the higher-effort Opus 4.7 and GPT-5.5 variants above it. For daily agent-mode work on Composer 2.5 the $20 Pro pool now stretches a long way. If you specifically need Opus 4.7 or GPT-5.5 for the hardest tasks, Pro+ at $60 or Ultra at $200 is still the realistic tier.
What's the difference between Claude Code and Cline?
Claude Code is a terminal-native CLI from Anthropic, locked to Anthropic models, with subagents and tight Sonnet/Opus integration. Cline is an open-source VS Code extension that works with any model provider. Claude Code is more polished and faster on Anthropic infra; Cline is more flexible and free to install.
Does Aider support MCP?
Not natively as of May 2026. Native MCP support is on the roadmap but not shipped. The pragmatic workaround is the community Aider-MCP server (exposing Aider's edit_files, create_files, git_status, and similar tools to Claude and other MCP clients via WebSocket); this lets MCP clients drive Aider rather than Aider drive MCP servers. For most users the substitute is custom slash commands and the /run primitive.
Are Roo Code and Cline really that similar?
They share genealogy — Roo Code forked from Cline — but Roo added custom modes, diff-based editing, and broader model support. Independent measurements show ~30% cost savings on equivalent tasks because Roo's apply_diff only emits changed lines.
What is Kilo Code and how does it differ from Roo and Cline?
Kilo Code began as a fork of Cline and rebuilt itself in April 2026 onto a portable open-source core that ships across VS Code, JetBrains, CLI, mobile, and Slack. It's now a multi-agent platform with subagents and an Agent Manager. With 1.5M users and access to 500+ models at zero markup, it's the heaviest of the three.
Can I use AI coding agents fully offline?
Yes, with Continue.dev plus Ollama, OpenCode plus a local model, or Void AI plus Ollama/LM Studio. Realistically you'll want a 70B-class quantised model on a workstation with 64GB+ RAM or a Mac Studio.
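The 64GB+ recommendation follows from simple arithmetic on weight size. The sketch below estimates memory for the weights alone at different quantisation levels; it deliberately ignores the KV cache, activations, and runtime overhead, all of which add to the total, so treat the numbers as lower bounds.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in decimal GB."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Weights only: context (KV cache) and runtime overhead come on top.
for bits in (16, 8, 4):
    print(f"70B @ {bits}-bit: ~{weight_memory_gb(70, bits):.0f} GB")
```

At 4-bit a 70B model needs about 35 GB for weights, which leaves headroom on a 64 GB machine, while 8-bit already sits at about 70 GB.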
Which agents support MCP servers?
Cursor, Claude Code, Cline, OpenCode, Continue.dev, Roo Code, Kilo Code, Windsurf, Void AI, Codex CLI, Antigravity CLI, and Grok Build all support MCP. Aider does not yet, though community Aider-MCP bridges exist. By the close of Q2 2026, MCP had roughly 9,400 published servers across the four major registries, about 1,300 of them production-ready, and the ecosystem was mid-transition from stdio to hosted HTTP transport and from API keys to OAuth 2.1. MCP has been governed by the Linux Foundation's Agentic AI Foundation (AAIF) since December 2025.
Is Windsurf still independent?
No. Codeium's Windsurf was acquired by Cognition (the Devin team). The product still ships under the Windsurf brand, with Cascade as the agent and SWE-1.5 as the in-house model.
What happened to Void AI?
The Void team announced an active-development pause in early 2026. The binary, Ollama integration, and cloud connectors all still function. Treat it as a stable open-source artifact, not a product with a roadmap.
Cursor vs Windsurf — which IDE is better in 2026?
Cursor still has tighter inline-edit ergonomics and a larger plugin ecosystem; Windsurf's Cascade plus Codemaps is better at navigating unfamiliar repos. At $15/mo Pro, Windsurf is also $5 cheaper than Cursor's entry tier, but Cursor's free Hobby plan is more generous for trial use.
Should I use a frontier closed model or a local open one?
For senior engineers shipping production code: Claude Opus 4.7 or GPT-5.5 still outperform any local model on multi-file refactors. For privacy-bound work or routine boilerplate, DeepSeek V4, DeepSeek V4 Flash, or Qwen 3.5 on Ollama are entirely viable. The honest answer is hybrid: a frontier model in the planner/architect slot and a fast local model in the editor slot.
Which agent has the best git integration?
Aider — it auto-commits each change with a sensible message and is designed to work entirely through git diffs. Claude Code is a close second with its /commit protocol and PR tooling. The 2026 entrants — Cursor 3's parallel agents in worktrees, Grok Build's 8 sub-agents each in isolated worktrees, and Antigravity CLI's multi-agent orchestration — all use git worktrees as the isolation primitive, which is the most under-discussed shift of the year.
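For readers who have not used worktrees, the isolation pattern can be sketched with plain git. The helper below only builds one `git worktree add` command per agent, each with its own branch and checkout directory; the branch names and directory layout are hypothetical, and none of the tools above are claimed to use this exact scheme.

```python
from pathlib import Path


def worktree_commands(repo: Path, agent_count: int, base: str = "HEAD") -> list[list[str]]:
    """Build one `git worktree add` command per agent, each on its own branch."""
    commands = []
    for i in range(1, agent_count + 1):
        branch = f"agent/{i}"  # hypothetical branch naming
        path = repo.parent / f"{repo.name}-agent-{i}"  # sibling checkout directory
        commands.append(
            ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]
        )
    return commands


# Print the commands; run each with subprocess.run(cmd, check=True) to create them.
for cmd in worktree_commands(Path("/tmp/demo-repo"), 3):
    print(" ".join(cmd))
```

Because every worktree shares the same object store but has its own working directory and branch, parallel agents cannot overwrite each other's uncommitted files, and results are merged back through ordinary branches.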
What is Cursor Composer 2.5 and how does it compare to Opus 4.7?
Composer 2.5 (launched May 18, 2026) is Cursor's second-generation in-house coding model. On Artificial Analysis's Coding Agent Index it sits at 62 (third place), behind Claude Opus 4.7 in Claude Code at 66 and GPT-5.5 in Codex at 65. The headline is cost: Composer 2.5 is priced at $0.07/task (standard) or $0.44/task (Fast), versus roughly 10–60× more for the variants above it. It scored 79.8% on SWE-Bench Multilingual and 63.2% on CursorBench v3.1, and improved +35 points on SWE-Bench-Pro-Hard-AA versus Composer 2. For most agentic work it's the new value sweet spot; reach for Opus 4.7 or GPT-5.5 only when the task specifically needs frontier reasoning.
What is Codex CLI Goal Mode (/goal) and when did it go GA?
Goal Mode reached GA on May 21, 2026 in Codex CLI 0.133.0 (and matching IDE/desktop builds). You type /goal followed by your prompt; Codex stores the directive as a persisted thread-level state machine and resumes work after network drops, deliberate pauses, or budget resets. The same May 2026 release built on the earlier Rust rewrite of Codex CLI (versions 0.128–0.133 are all Rust). Documented six-hour runs have survived five-hour pauses without losing context.
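The idea of a persisted, resumable goal can be illustrated with a toy checkpointing loop. This is not Codex's implementation: the JSON file, the field names, and the step list are all hypothetical, chosen only to show how state written after every step lets a later session pick up where an earlier one stopped.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class GoalState:
    goal: str
    status: str = "active"  # "active" | "paused" | "done"
    next_step: int = 0      # index of the next unit of work to run


def save(state: GoalState, path: Path) -> None:
    path.write_text(json.dumps(asdict(state)))


def load(path: Path) -> GoalState:
    return GoalState(**json.loads(path.read_text()))


def run(path: Path, steps: list[str], budget: int) -> GoalState:
    """Run up to `budget` steps, checkpointing to disk after each one."""
    state = load(path)
    state.status = "active"
    while state.next_step < len(steps) and budget > 0:
        # Real work for steps[state.next_step] would happen here.
        state.next_step += 1
        budget -= 1
        save(state, path)
    state.status = "done" if state.next_step == len(steps) else "paused"
    save(state, path)
    return state


steps = ["plan", "edit", "test", "review"]
state_file = Path(tempfile.mkdtemp()) / "goal.json"
save(GoalState(goal="migrate the test suite"), state_file)

first = run(state_file, steps, budget=2)   # budget runs out mid-goal
second = run(state_file, steps, budget=5)  # a later session resumes from disk
print(first.status, first.next_step, second.status, second.next_step)
```

The first call stops after two steps and records a paused state; the second call reads that state back and finishes the remaining steps without repeating the earlier ones.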
Should I migrate from Gemini CLI to Antigravity CLI?
Yes, if you're on a free Google AI Pro or Ultra plan — Gemini CLI and Gemini Code Assist IDE extensions stop serving requests for those tiers on June 18, 2026. Antigravity CLI (GA May 19, 2026) is the official replacement, built in Go, and shares its agent harness with Antigravity 2.0 desktop. It supports agent skills, hooks, sub-agents, and extensions at launch. The trade-off: Antigravity CLI is not open source, unlike Gemini CLI. Paid Gemini API key access and Gemini Code Assist Standard/Enterprise licences keep access to the older CLI past the deadline.
What is Grok Build and who is it for?
Grok Build is xAI's terminal coding agent, launched May 14, 2026 in early beta. It runs up to 8 parallel sub-agents, each in its own isolated git worktree, across a three-stage plan/search/build workflow with 256K context. It scored 70.8% on SWE-Bench Verified at launch. API pricing is aggressive at $0.20/$1.50 per million tokens (input/output). Access was initially gated to SuperGrok Heavy ($300/mo) and expanded on May 24 to SuperGrok ($30/mo) and X Premium+ ($40/mo). It's the right pick for teams that want a local-first privacy model (zero codebase data sent to xAI servers) and aggressive parallelism on hardware with the cores to handle 8 concurrent sub-agents.
What is AGENTS.md and is it actually adopted?
AGENTS.md is a simple open format for project-specific agent instructions — a README.md but for AI coding agents. Released by OpenAI in August 2025 and donated to the Linux Foundation's Agentic AI Foundation (AAIF) in December 2025, it's now adopted by 60,000+ open-source projects and agent frameworks including Codex, Cursor, Devin, Factory, Gemini CLI, GitHub Copilot, Jules, VS Code, and Amp. If you ship one file to make your repo agent-friendly across vendors, this is the file. See our AGENTS.md complete guide for the format spec and the patterns we use internally at Codersera.
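Since AGENTS.md is plain markdown, a starter file is easy to generate. The sketch below renders a small one from a few project facts; the section headings, commands, and style notes are illustrative assumptions rather than anything the format mandates, and the file is written to a temporary directory here instead of a real repo root.

```python
import tempfile
from pathlib import Path


def starter_agents_md(install_cmd: str, test_cmd: str, style_notes: list[str]) -> str:
    """Render a small AGENTS.md; the headings are illustrative, not required."""
    lines = [
        "# AGENTS.md",
        "",
        "## Setup commands",
        f"- Install dependencies: `{install_cmd}`",
        "",
        "## Testing instructions",
        f"- Run the test suite before committing: `{test_cmd}`",
        "",
        "## Code style",
        *[f"- {note}" for note in style_notes],
        "",
    ]
    return "\n".join(lines)


# Example values are placeholders; substitute your project's real commands.
content = starter_agents_md(
    install_cmd="pip install -e .[dev]",
    test_cmd="pytest -q",
    style_notes=["Type-hint public functions", "Keep diffs small and focused"],
)
target = Path(tempfile.mkdtemp()) / "AGENTS.md"  # use the repo root in practice
target.write_text(content)
print(content)
```

The useful content is whatever an agent cannot infer from the code: exact build and test commands, conventions, and anything that must not be touched.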
What changed with Claude Code Skills and plugins in spring 2026?
Anthropic opened the Claude Code plugin marketplace as a first-class system in spring 2026. A Skill is a single instruction set (markdown plus reference material); a plugin bundles multiple Skills, MCP servers, or commands. Both are filterable in real time from the /skills and /plugin prompts. The official marketplace is claude-plugins-official; the curated community list is awesome-claude-code-plugins. Underneath, Claude Opus 4.7's native context moved from 200K to 1M tokens, which is what makes the plugin/skill expansion practical inside a single Claude Code session. Claude Code on the web also got a sessions-sidebar redesign and drag-and-drop layout in Week 17 (April 20–24, 2026), plus custom themes and an /ultrareview bug-hunting preview.
How many MCP servers exist in May 2026?
Roughly 9,400 published servers across the four major registries, with about 1,300 considered production-ready. The ecosystem grew by roughly 1,000 or more newly indexed servers per month through Q1–Q2 2026 (per PulseMCP's tracking), and projections made at the Q2 close put the total at 14,800–22,000 servers by year-end. The structural transitions to watch in Q3/Q4 2026 are stdio → hosted HTTP transport, and API keys → OAuth 2.1. MCP itself is governed by the Linux Foundation's Agentic AI Foundation, established December 2025.
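The year-end range can be sanity-checked against the observed monthly rate. The sketch below works out what monthly net growth each end of the projection implies, assuming linear growth and six remaining months from the Q2 close through December; both assumptions are ours, not PulseMCP's.

```python
def implied_monthly_growth(current: int, target: int, months: int) -> float:
    """Average net new servers per month needed to reach `target`."""
    return (target - current) / months


CURRENT = 9_400
MONTHS_LEFT = 6  # assumption: end of June through December, linear growth

low = implied_monthly_growth(CURRENT, 14_800, MONTHS_LEFT)
high = implied_monthly_growth(CURRENT, 22_000, MONTHS_LEFT)
print(low, high)
```

Under those assumptions the low end implies about 900 new servers a month, slightly below the recent pace, and the high end about 2,100 a month, roughly double it.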
Next steps
If you've read this far you already know the choice isn't "which tool is best" but "which tool fits the codebase, the team, and the threat model." For most production teams as of May 2026 the right starter combo is Claude Code on Max-20x (Opus 4.7 + 1M-token context + plugin marketplace) plus one IDE-side agent (Cursor 3 with Composer 2.5 for cost-efficient parallel agents, or Cline 3.85 for full BYOK control), with Codex CLI /goal for long-horizon background runs and a local Continue.dev or OpenCode fallback for sensitive repos. Ship an AGENTS.md to your repo today; it's the cheapest single move that improves results across every vendor in this guide.
The harder problem is hiring engineers who can plug an AI coding agent into a real codebase, write the MCP servers your tooling needs, and ship code that survives review. Hire a Codersera-vetted Python or TypeScript engineer who has integrated AI coding agents into production workflows.