Claude Code vs OpenAI Codex (May 2026): The Honest Engineering-Team Comparison

Updated 2026-05-26 with the current SWE-bench leaderboard (GPT-5.5 now leads Verified; Opus 4.7 leads Pro), Anthropic's May 14 billing announcement, Haiku 4.5, and Codex Goal mode GA.

Quick answer. Claude Code and OpenAI Codex are roughly at parity on the May 2026 leaderboards but differ sharply in workflow. GPT-5.5 narrowly leads SWE-bench Verified (88.7% vs 87.6%) and Terminal-Bench 2.0 (82.7% vs 79.8% for the best Claude-based harness); Claude Opus 4.7 leads SWE-bench Pro (64.3% vs 58.6%), but Claude Code burns roughly 3–4× more tokens per task. Pick Claude Code for quality on hard multi-file refactors and IDE depth; pick Codex for async PRs, OS-kernel sandboxing, and per-task cost. From June 15, 2026, Anthropic splits Claude billing: interactive use stays on plan limits, and programmatic use moves to a metered Agent SDK credit.

Heads up — June 15, 2026 billing change

Anthropic announced on May 14, 2026 that it is splitting Claude subscription billing into two pools on June 15, 2026. Interactive Claude Code in your terminal/IDE keeps using your existing Pro/Max limits. Programmatic usage (Claude Agent SDK, claude -p, the GitHub Actions integration, third-party tools) draws from a new dollar-denominated Agent SDK credit billed at full API rates — $20 on Pro, $100 on Max 5×, $200 on Max 20×. Unused credit doesn't carry over. Watch for an Anthropic email on June 8 with a one-time opt-in. Full breakdown: Anthropic's June 15 billing change explained.

By 2026, "Claude Code or OpenAI Codex" has become one of the most common AI tooling decisions engineering teams face. Both are agentic CLI coders. Both can open pull requests, run tests, refactor across files, and operate from your terminal, your IDE, your phone, or a cloud sandbox. They have very different defaults, very different price points, and they win on different benchmarks. This is the honest engineering-team comparison.


Codex vs Claude Code — TL;DR which one to pick

  • Pick Claude Code if code quality, multi-file refactor reliability, and IDE integration matter more than monthly cost. It leads SWE-bench Pro and feels like a senior pair-programmer.
  • Pick OpenAI Codex if you want async parallel work, terminal/CI-first flows, GitHub-PR-native automation, and lower per-task token spend. It leads SWE-bench Verified and Terminal-Bench 2.0 and ships PRs from Slack.
  • Use both if you're on a senior team. Most do — Claude for design and surgical edits, Codex for bulk-parallel work.

If you care most about…

Priority | Pick
Highest code quality on hard refactors | Claude Code (Opus 4.7)
Lowest token spend per delivered task | OpenAI Codex
Async "fire and forget" PRs | OpenAI Codex Cloud
Tight interactive loop with sub-agents | Claude Code
OS-level sandbox security by default | OpenAI Codex CLI
Open-source agent harness | OpenAI Codex CLI (Apache-2.0)
Headless multi-day "Goal mode" runs | OpenAI Codex (GA May 2026)

What each tool actually is

Claude Code (Anthropic)

Claude Code is Anthropic's agentic coder. It runs in the terminal, in VS Code and JetBrains via plugins, in a desktop app on macOS and Windows, on the web at claude.ai/code, on iOS, and inside Slack. It defaults to Claude Sonnet 4.6 on the Pro plan and gives access to Claude Opus 4.7 on Max. Its differentiating features are sub-agents (Agent Teams), Skills, Hooks, slash commands, the project-rooted CLAUDE.md memory file, Routines (managed scheduled cloud sessions), Remote Control from a phone, and a headless claude -p mode for piping into Unix toolchains. MCP support is first-class.

Important for June 15, 2026 onward: interactive Claude Code in the terminal and IDE keeps drawing from your Pro/Max plan's existing message and weekly limits (which Anthropic temporarily raised by 50% through July 13, 2026 in response to Codex pressure). Programmatic use (claude -p, the Agent SDK, the Claude Code GitHub Actions integration, and ACP-based third-party tools) moves to a separate monthly Agent SDK credit billed at full API list prices. If you script Claude Code into CI or build your own agent on the Agent SDK, budget against that new pool, not the plan limits you're used to.
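To make the budgeting point concrete, here is a minimal Python sketch of how far the monthly Agent SDK credit might stretch. The credit sizes and per-million-token list prices are the figures quoted in this article; the function names, the model labels used as dictionary keys, and the example job size are hypothetical, and the sketch assumes full list price with no caching or batch discount.

```python
# Budgeting sketch for the monthly Agent SDK credit.
# Credit sizes and list prices are the figures quoted in this article;
# the function names, model labels, and example job size are hypothetical.

CREDIT_USD = {"Pro": 20.0, "Max 5x": 100.0, "Max 20x": 200.0}

# (input, output) list price in USD per million tokens.
LIST_PRICE = {
    "sonnet-4.6": (3.0, 15.0),
    "opus-4.7": (5.0, 25.0),
    "haiku-4.5": (1.0, 5.0),
}


def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one programmatic run at full list price (no caching)."""
    in_rate, out_rate = LIST_PRICE[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def runs_per_month(plan: str, model: str, input_tokens: int, output_tokens: int) -> int:
    """Whole runs that fit in the plan's credit; unused credit does not roll over."""
    return int(CREDIT_USD[plan] // run_cost(model, input_tokens, output_tokens))


if __name__ == "__main__":
    # Hypothetical CI job: 200k input tokens and 20k output tokens per run.
    for plan in CREDIT_USD:
        for model in LIST_PRICE:
            print(plan, model, runs_per_month(plan, model, 200_000, 20_000))
```

Under that assumed job size, the $20 Pro credit covers 22 Sonnet 4.6 runs or 13 Opus 4.7 runs per month, which is why scripted CI usage deserves its own line in the budget.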

OpenAI Codex

Codex is OpenAI's agentic coder. The CLI is open source (Apache-2.0, ~85k GitHub stars as of May 2026, written in Rust, install with npm i -g @openai/codex or Homebrew). The repo ships roughly weekly releases — latest is v0.133.0 (May 21, 2026) with persisted Goal workflows, model tools, runtime continuation, and TUI controls. Codex Cloud is a cloud sandbox you can dispatch tasks to from ChatGPT, Slack, the macOS desktop app, or GitHub Code Review. Codex runs on GPT-5.5, GPT-5.4, and GPT-5.3-Codex. It uses an AGENTS.md file for project memory. The CLI sandbox is OS-level (Seatbelt on macOS, Landlock on Linux), with three approval modes — Suggest, Auto-Edit, Full Auto. Goal mode went GA on May 21, 2026 — point Codex at a multi-hour or multi-day objective and let it iterate against tests in a sandbox.

One-line architecture summary: Claude Code is a local-first interactive loop with optional cloud spillover. Codex is a local CLI plus a strong cloud-async sandbox dispatched from ChatGPT.

Claude Code vs Codex — feature by feature

Feature | Claude Code | OpenAI Codex
Latest model | Opus 4.7 / Sonnet 4.6 / Haiku 4.5 | GPT-5.5 (default) / GPT-5.4 / GPT-5.3-Codex
Open source | No (Agent SDK is) | Yes (Apache-2.0, Rust, ~85k stars)
Install | "curl claude.ai/install.sh | bash" | "npm i -g @openai/codex"
Project memory file | CLAUDE.md | AGENTS.md
Context window | Up to 1M tokens (Sonnet 4.6 GA) | 400K
IDE plugins | VS Code, JetBrains, Cursor | VS Code, JetBrains, Cursor
Desktop app | macOS + Windows + Win-ARM64 | macOS (Windows planned)
Web / mobile | claude.ai/code, iOS app, Slack | ChatGPT web, Slack
Cloud async agent | Yes (Routines, Web sessions) | Yes (Codex Cloud, the flagship)
Sub-agents / multi-agent | Yes (Agent Teams) | Yes (subagents)
Long-horizon autonomous mode | Routines (scheduled) | Goal mode (GA May 21, 2026)
MCP support | First-class | Yes; HTTP-MCP still maturing
Sandboxing | App-layer hooks + permissions | OS-kernel (Seatbelt/Landlock) + cloud sandbox
Approval modes | Per-tool prompts | Suggest / Auto-Edit / Full Auto
Headless / scripting | claude -p (Agent SDK credit pool from June 15) | codex exec non-interactive
Voice input | No | Yes (spacebar transcribe)
Scheduled tasks | Routines | Cron via CI only
Billing model (as of June 15, 2026) | Two pools: plan limits for interactive; Agent SDK credit for programmatic | Token-based billing on Plus/Pro/Business/Enterprise (April 2026)

Benchmarks — what the leaderboards say

Leaderboards aren't workflows, but they're the only third-party signal we have. As of May 2026:

Benchmark | Best Claude | Best Codex / GPT | What it measures
SWE-bench Verified | Claude Opus 4.7: 87.6% | GPT-5.5: 88.7% | Real GitHub issues; GPT-5.5 now narrowly leads (May 2026 flip)
SWE-bench Pro | Claude Opus 4.7: 64.3% | GPT-5.5: 58.6% | Contamination-resistant; Claude leads by ~5.7pts
Terminal-Bench 2.0 | ForgeCode + Opus 4.6: 79.8% | Codex CLI + GPT-5.5: 82.7% | Pure terminal/DevOps tasks; Codex's home turf
Token efficiency (Composio task) | 6.23M tokens | 1.5M tokens | Single experiment; Codex ≈ 4× more efficient

Sources: swebench.com Verified leaderboard; Scale SWE-Bench Pro; tbench.ai Terminal-Bench 2.0; Composio's measured task experiment.

OpenAI has historically flagged that some Verified items may be contaminated in training data, so SWE-bench Pro (contamination-resistant) is the more trustworthy head-to-head — and Pro now puts Claude Opus 4.7 ahead by ~5.7 points (64.3% vs 58.6%). Verified flipped to GPT-5.5 in late April 2026 after the GPT-5.5 launch (88.7% vs 87.6%), so on paper Codex now leads two of the three public leaderboards, while Claude leads the contamination-resistant one. Terminal-Bench 2.0 remains Codex's home turf: pure terminal/DevOps tasks where the harness matters as much as the model. SWE-bench Pro stays Claude's home turf: multi-file repository changes that look like the issues a senior engineer triages.
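The "who leads by how much" arithmetic in this section can be checked with a few lines of Python. The scores are copied from the benchmark table above; the data layout and the function name are just one illustrative way to organize them.

```python
# Leader and margin per benchmark, using the scores quoted in this article.
# The dictionary layout and function name are illustrative only.

SCORES = {
    "SWE-bench Verified": {"Claude Opus 4.7": 87.6, "GPT-5.5": 88.7},
    "SWE-bench Pro": {"Claude Opus 4.7": 64.3, "GPT-5.5": 58.6},
    "Terminal-Bench 2.0": {"ForgeCode + Opus 4.6": 79.8, "Codex CLI + GPT-5.5": 82.7},
}


def leader_and_margin(entries: dict) -> tuple:
    """Return (top entry name, lead over the runner-up in percentage points)."""
    ranked = sorted(entries.items(), key=lambda item: item[1], reverse=True)
    (leader, top_score), (_, runner_up_score) = ranked[0], ranked[1]
    return leader, round(top_score - runner_up_score, 1)


if __name__ == "__main__":
    for benchmark, entries in SCORES.items():
        leader, margin = leader_and_margin(entries)
        print(f"{benchmark}: {leader} leads by {margin} pts")
```

The margins come out to 1.1 points on Verified, 5.7 on Pro, and 2.9 on Terminal-Bench 2.0, so the one leaderboard Claude leads is also the one with the widest gap.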

Pricing and real per-month cost

Plan | Monthly | Includes Claude Code? | Includes Codex? | Notes
ChatGPT Free | $0 | No | Limited | Demo Codex only
Claude Pro | $20 ($17 annual) | Yes | No | Sonnet 4.6 + Opus 4.7. From June 15, 2026: $20 Agent SDK credit for programmatic use.
ChatGPT Plus | $20 | No | Yes | Token-based limits (April 2026 change)
Claude Max 5× | $100 | Yes | No | 5× Pro limits + $100 Agent SDK credit (June 15)
Team Premium (Claude) | $125 / seat | Yes | No | 5-seat min
ChatGPT Pro | $200 | No | Yes | 25× Plus through May 31, 2026 (was 2× promo)
Claude Max 20× | $200 | Yes | No | 20× Pro limits + $200 Agent SDK credit (June 15)
API / pay-go | per-token | - | - | Sonnet 4.6 $3/$15; Opus 4.7 $5/$25; GPT-5.x per OpenAI rate card

Per-million-token API pricing (May 2026): Claude Sonnet 4.6 is $3 input / $15 output. Claude Opus 4.7 is $5 input / $25 output. Claude Haiku 4.5 is $1 input / $5 output for the cheap-and-fast tier. GPT-5.x Codex pricing varies by tier and is generally lower per-token than Opus. Prompt caching (up to 90% savings) and batch APIs (50% off) can compound discounts on either side.
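As a rough illustration of how those discounts compound, the sketch below prices a single request. It rests on assumptions that go beyond the text: the "up to 90%" caching saving is modeled as cached input tokens billed at 10% of the input rate, the batch discount is modeled as halving the whole bill, and the two are treated as stackable. The function name and parameters are hypothetical; check the vendor's current pricing page before relying on any of this.

```python
# Sketch of compounding discounts on per-token pricing.
# Assumptions (not confirmed by the article): cached input is billed at 10%
# of the input rate, batch halves the total, and the two discounts stack.

def discounted_cost(
    input_tokens: int,
    output_tokens: int,
    in_rate: float,          # USD per million input tokens
    out_rate: float,         # USD per million output tokens
    cached_fraction: float = 0.0,
    batch: bool = False,
) -> float:
    """Estimated USD cost of one request under the stated assumptions."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    input_cost = (fresh + cached * 0.10) * in_rate
    total = (input_cost + output_tokens * out_rate) / 1_000_000
    return total * 0.5 if batch else total


if __name__ == "__main__":
    # Opus 4.7 list prices from the article: $5 input / $25 output.
    base = discounted_cost(1_000_000, 100_000, 5.0, 25.0)
    cached = discounted_cost(1_000_000, 100_000, 5.0, 25.0, cached_fraction=0.8)
    both = discounted_cost(1_000_000, 100_000, 5.0, 25.0, cached_fraction=0.8, batch=True)
    print(f"list ${base:.2f}, cached ${cached:.2f}, cached+batch ${both:.2f}")
```

In this hypothetical request the bill drops from $7.50 at list price to $3.90 with 80% of the input cached, and to $1.95 if the batch discount also applies. Output tokens are untouched by caching, so output-heavy workloads benefit less.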

The hidden cost: tokens, not subscription dollars

If both tools cost $20/mo, why does this matter? Because per-task token consumption differs by 3–4×. In Composio's measured Figma-clone task, Claude Code burned 6.23 million tokens and Codex burned 1.5 million for the same end result. On pay-as-you-go API pricing, that's the difference between roughly $93 and $7.50 for one task, figures that work out to about $15 and $5 per million tokens respectively. On subscription plans, it's the difference between hitting your 5-hour rate-limit ceiling and not.
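The dollar figures above can be unpacked with a short calculation. The token counts and dollar amounts are the ones quoted in this section; the helper name is hypothetical, and the "blended rate" is simply total dollars divided by total tokens, not a published price.

```python
# Back out the blended per-million-token rate implied by the quoted figures.
# Inputs are the article's numbers; the helper name is illustrative.

CLAUDE_TOKENS = 6_230_000
CODEX_TOKENS = 1_500_000
CLAUDE_USD = 93.0
CODEX_USD = 7.50


def blended_rate(total_usd: float, tokens: int) -> float:
    """USD per million tokens implied by a total bill and a token count."""
    return total_usd / (tokens / 1_000_000)


if __name__ == "__main__":
    print(f"token ratio: {CLAUDE_TOKENS / CODEX_TOKENS:.2f}x")
    print(f"cost ratio:  {CLAUDE_USD / CODEX_USD:.2f}x")
    print(f"Claude blended rate: ${blended_rate(CLAUDE_USD, CLAUDE_TOKENS):.2f}/M")
    print(f"Codex blended rate:  ${blended_rate(CODEX_USD, CODEX_TOKENS):.2f}/M")
```

The token gap is about 4.15×, but the dollar gap is about 12.4×, because the quoted totals imply a blended rate near $15 per million tokens on the Claude side against $5 on the Codex side. The per-task cost difference is therefore driven by both token volume and price per token.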

The honest summary: Codex is meaningfully cheaper per delivered task. Claude is meaningfully better at hard ones. Whether the savings justify the quality gap depends on what you're shipping — and from June 15, 2026, whether your usage is interactive (still bundled in the plan) or programmatic (now metered at API rates against the Agent SDK credit).

Workflow — how they feel day to day

Claude Code: the interactive loop

Claude Code is built around a tight session: you talk to it, it runs tools, you review, you iterate. Sub-agents (Agent Teams) let you parallelize within a session — one agent fixes the failing test while another updates the docs. Hooks let you intercept tool calls (e.g., block edits to migrations/). Routines schedule cloud sessions that run on a cron. CLAUDE.md stays at the project root and loads automatically.
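A hook of the "block edits to migrations/" kind could look roughly like the Python sketch below. It assumes, based on my reading of Claude Code's hook documentation, that a PreToolUse hook receives a JSON payload on stdin containing tool_input.file_path and that exiting with status 2 blocks the tool call and surfaces stderr to the model; verify both against the current docs. The protected-directory list, the function names, and the --hook flag are hypothetical.

```python
# Sketch of a PreToolUse guard that refuses edits under protected directories.
# Assumed contract (verify against current Claude Code docs): the hook gets a
# JSON payload on stdin with tool_input.file_path, and exit code 2 blocks the call.
import json
import sys
from pathlib import PurePosixPath

PROTECTED_DIRS = ("migrations",)  # hypothetical project policy


def is_protected(file_path: str) -> bool:
    """True if any directory component of the path is a protected directory."""
    parts = PurePosixPath(file_path.replace("\\", "/")).parts
    return any(part in PROTECTED_DIRS for part in parts[:-1])


def decide(payload: dict) -> tuple[int, str]:
    """Return (exit_code, message) for one hook payload."""
    tool_input = payload.get("tool_input") or {}
    file_path = tool_input.get("file_path", "")
    if file_path and is_protected(file_path):
        return 2, f"Blocked: {file_path} is under a protected directory."
    return 0, ""


def main(stdin_text: str) -> int:
    try:
        payload = json.loads(stdin_text)
    except json.JSONDecodeError:
        return 0  # unreadable payload: allow rather than stall the session
    if not isinstance(payload, dict):
        return 0
    code, message = decide(payload)
    if message:
        print(message, file=sys.stderr)
    return code


# The --hook flag is a convention of this sketch, not a Claude Code feature.
if __name__ == "__main__" and sys.argv[1:] == ["--hook"]:
    sys.exit(main(sys.stdin.read()))
```

Because the decision is a plain function over the payload, the policy can be unit-tested without launching an agent session, which is the practical advantage of deterministic hooks over prompt instructions.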

Codex: the async hand-off loop

Codex Cloud's model is different. You describe a task in ChatGPT, Slack, or the macOS desktop app; Codex Cloud spins up a sandbox, runs the task to completion, and opens a PR. The sandbox is internet-disabled by default. The desktop app is built around managing many parallel sessions, each one a different agent. AGENTS.md is increasingly a quasi-standard, adopted by Cursor, Aider, and other tools beyond Codex itself. Goal mode lets a single session iterate for hours or days against an objective without you babysitting it.

CLAUDE.md vs AGENTS.md

Both are project-rooted markdown files that the agent reads at session start. Both encode coding conventions, repo structure, gotchas, and project goals. Many teams now keep both, because one or the other tool may show up in the workflow. They can be near-identical files with different filenames.

Security and sandboxing

Codex CLI uses OS-kernel sandboxing: Seatbelt on macOS, Landlock on Linux. Suggest mode (labeled Read-only in the feature deep dive below) has you approve every action; Auto-Edit (Auto) lets it edit files in the sandboxed working directory; Full Auto (Full Access) runs without prompts in the sandbox. Codex Cloud runs with the internet disabled by default. Compliance: SOC 2 + zero-data-retention options on Business and Enterprise.

Claude Code uses application-layer permissions. Per-tool prompts ("Allow Claude to edit src/foo.ts?"), hooks for guardrails ("block any edit to schemas/"), and project-scoped permissions in settings.json. It runs in your shell with your permissions — secure if configured, looser by default. Compliance: HIPAA-ready Enterprise tier with 500K context.

Net: Codex is stricter by default; Claude is more flexible but requires deliberate configuration to match Codex's defaults.

When to pick which

Pick Claude Code if

  • You do large multi-file refactors and consistency across files matters.
  • You need deep IDE integration (VS Code + JetBrains both).
  • You want sub-agents to parallelize within a single session.
  • Output quality on architectural decisions justifies higher token spend.
  • Your team is mostly senior frontend or full-stack.
  • Your usage is primarily interactive (the new Agent SDK credit pool from June 15 only bites if you script claude -p, the Agent SDK, or GitHub Actions).

Pick OpenAI Codex if

  • You want to delegate well-scoped tickets and walk away.
  • Your work is terminal-heavy: shell scripts, CI tweaks, Dockerfile fixes.
  • Your team is on ChatGPT Pro/Business already.
  • Token economics matter — many small tasks per day.
  • You want OS-level sandbox isolation by default.
  • You want Goal mode to run long-horizon tasks unattended.

Use both

The honest answer for senior teams: Claude Code as the daily driver for design and surgical edits, Codex Cloud for bulk parallel PRs from a single product spec. Many engineers run one terminal pane with claude and a separate ChatGPT/Codex thread for fire-and-forget tasks. From June 15, 2026, the hybrid pattern gets a budget twist — keep Claude Code interactive (covered by your plan), and route Agent-SDK-style automation to whichever pool is cheaper for that workload.

A note for engineering leaders hiring or scaling

The multiplier on these tools isn't the tool — it's the operator. A vetted senior dev with Claude Code or Codex outputs roughly 3–5× more per week than a junior with the same tool, because the AI doesn't level up the operator's judgment about what to build, what to refactor, and what to leave alone. Codersera matches you with vetted remote engineers who already work fluently with these CLIs — interviewed, reference-checked, and ready with a risk-free trial period.

Feature-by-feature deep dive (May 2026)

The comparison table above is the 30-second answer. This is the 5-minute one: the features that actually drive day-to-day choice, broken out side by side, anchored to what each vendor shipped through May 2026.

What features does OpenAI Codex CLI offer?

  • Model picker. /model switches between GPT-5.5, GPT-5.4, and GPT-5.3-Codex mid-session, or pass --model at launch. GPT-5.5 is the default for complex coding work; lighter models save tokens on routine edits.
  • Goal mode (GA May 21, 2026). Point Codex at a multi-hour or multi-day objective and let it iterate against your tests without a human in the loop. Now backed by dedicated storage with progress tracking across turns. Available in the CLI (v0.128.0+), VS Code/JetBrains extensions, and the Codex desktop app.
  • Three approval modes. Read-only (suggest), Auto (default — edit files in working dir), and Full Access (skip prompts entirely). OS-kernel sandboxing via Seatbelt on macOS and Landlock on Linux means even Full Access stays inside its blast radius.
  • MCP support. STDIO and streaming HTTP servers configurable in ~/.codex/config.toml or via the codex mcp CLI. Servers launch automatically when a session starts.
  • Customization. Single TOML config (~/.codex/config.toml) drives custom slash commands, themes, MCP servers, and approval defaults. Enterprises can ship reusable plugin bundles containing skills, app integrations, and MCP servers — added in v0.116.0 (March 2026) alongside codex doctor for diagnostics and a userPromptSubmit hook for policy enforcement.
  • Install footprint. npm i -g @openai/codex or brew install --cask codex. Single Rust binary, Apache-2.0, ~85k stars. Roughly weekly releases.
  • Headless / scripting. codex exec runs non-interactively for CI pipelines. Sessions persist locally and can be resumed with codex resume.
  • What makes it feel different. Defaults that just work — strict sandbox, autonomous loops, /status for token usage, Ctrl+T for transcript view. Less ceremony, fewer permission prompts, faster turnaround on bounded tasks.
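The headless entry points named in this article (codex exec on the Codex side, claude -p on the Claude side) can be wrapped for CI along the lines of this Python sketch. Only those two invocations come from the article; the wrapper functions, the timeout, and the choice to return None when the binary is missing are assumptions, and neither CLI's output format is assumed here.

```python
# Sketch of a thin CI wrapper around the two headless modes the article names.
# Only "claude -p" and "codex exec" come from the article; the rest is illustrative.
import shutil
import subprocess
from typing import Optional

HEADLESS_ARGV = {
    "claude": ["claude", "-p"],
    "codex": ["codex", "exec"],
}


def build_command(tool: str, prompt: str) -> list:
    """Return the argv list for a one-shot, non-interactive run."""
    if tool not in HEADLESS_ARGV:
        raise ValueError(f"unknown tool: {tool!r}")
    return [*HEADLESS_ARGV[tool], prompt]


def run_headless(tool: str, prompt: str, timeout_s: int = 600) -> Optional[subprocess.CompletedProcess]:
    """Run the tool if it is installed; return None when the binary is absent."""
    argv = build_command(tool, prompt)
    if shutil.which(argv[0]) is None:
        return None
    # Passing argv as a list avoids shell quoting problems with the prompt text.
    return subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s, check=False)
```

A CI step could call run_headless("codex", "fix the failing lint job") and inspect returncode and stdout on the result. Keeping command construction separate from execution makes the wrapper testable on machines where neither CLI is installed.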

What features does Claude Code offer?

  • Model picker. Claude Sonnet 4.6 (1M context, default on Pro) and Opus 4.7 (Max). Fast mode now uses Opus 4.7 by default. Switch with /model mid-session.
  • Sub-agents / Agent Teams. A lead agent coordinates work; specialized sub-agents handle bounded tasks (code review, test running, security checks) in isolated context windows. Only the summary returns to the main thread — verbose output stays out of the parent context.
  • Hooks at 25 lifecycle points. Deterministic shell commands that fire on events like UserPromptSubmit (block or modify the prompt before Claude sees it) and PreToolUse (the primary security checkpoint — block edits to migrations/, force lint before commit, etc.). Unlike instructions, hooks aren't subject to model interpretation.
  • Skills. Named bundles of instructions plus optional helper files, invoked via the Skill tool. Package repeatable workflows your team can share — e.g., /review-pr, /deploy-staging.
  • Slash commands. User-defined prompt templates at .claude/commands/<name>.md (project-level) or ~/.claude/commands/ (user-level). YAML frontmatter sets description, allowed tools, and default model.
  • MCP support. First-class — same MCP servers work across Terminal, VS Code, JetBrains, Desktop, and Web. Connect Google Drive, Jira, Slack, custom tooling.
  • Customization. CLAUDE.md at project root loads at session start; the entire .claude/ directory holds commands, skills, agent definitions, hooks, and per-project settings. Auto memory captures build commands and debugging insights across sessions automatically.
  • Routines. Anthropic-managed scheduled sessions that keep running with your computer off. Trigger on cron, API calls, or GitHub events.
  • Surface coverage. Terminal, VS Code, JetBrains, macOS + Windows + Win-ARM64 desktop, web (claude.ai/code), iOS, Slack. Remote Control lets you continue a local session from your phone; Dispatch lets your phone kick off a desktop session.
  • Token-budget controls. Per-tool permissions in settings.json, hooks for hard guardrails. From June 15, 2026, the Agent SDK credit pool meters programmatic usage (claude -p, GitHub Actions, ACP tools) separately from your interactive plan limits — see our June 15 billing change explainer for the budgeting math.
  • What makes it feel different. Extensibility on top of a strong planning model. Hooks + skills + sub-agents are deeper than Codex's customization surface today, at the cost of more setup. Pairs naturally with the workflows in our Opus 4.7 task budgets for engineering teams writeup.
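To show what a user-defined slash command from the list above might look like on disk, here is a small Python sketch that renders and writes one. The .claude/commands/<name>.md location comes from this article; the exact frontmatter key spellings (description, allowed-tools, model) reflect my understanding of Claude Code's conventions and should be checked against the current docs. The function names and the example values are hypothetical.

```python
# Sketch: generate a project-level slash command file.
# The path convention is from the article; frontmatter key spellings are an
# assumption to verify, and all example values are hypothetical.
from pathlib import Path
from typing import Optional, Sequence


def render_command(
    description: str,
    body: str,
    allowed_tools: Optional[Sequence[str]] = None,
    model: Optional[str] = None,
) -> str:
    """Return the file text: YAML frontmatter followed by the prompt template."""
    # Note: a description containing ':' would need YAML quoting; not handled here.
    lines = ["---", f"description: {description}"]
    if allowed_tools:
        lines.append(f"allowed-tools: {', '.join(allowed_tools)}")
    if model:
        lines.append(f"model: {model}")
    lines += ["---", "", body.rstrip(), ""]
    return "\n".join(lines)


def write_command(project_root, name: str, content: str) -> Path:
    """Write the command under <project_root>/.claude/commands/<name>.md."""
    path = Path(project_root) / ".claude" / "commands" / f"{name}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return path
```

Calling write_command(".", "review-pr", render_command("Review the current PR", "Review the diff and list risks.")) would create a file that a teammate could then invoke as /review-pr. Generating these files from code is optional; most teams simply commit them by hand.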

AGENTS.md is becoming the cross-tool standard

AGENTS.md emerged from a Sourcegraph / OpenAI / Google / Cursor / Factory collaboration and is now stewarded by the Agentic AI Foundation under the Linux Foundation. By May 2026 it's used by 60,000+ open-source projects and supported natively by Codex, Cursor, GitHub Copilot, Gemini CLI, Windsurf, Aider, Zed, Warp, and RooCode. Claude Code still uses CLAUDE.md as its native format — the open GitHub issue requesting AGENTS.md support has thousands of upvotes but no timeline. The standard workaround is keeping AGENTS.md as the source of truth and symlinking CLAUDE.md to it, or keeping CLAUDE.md as a thin Claude-specific layer that imports AGENTS.md.
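The symlink workaround described above can be scripted. This Python sketch assumes a POSIX-style filesystem (creating symlinks on Windows may require extra privileges) and uses a hypothetical helper name; it only creates the link when AGENTS.md exists and refuses to overwrite a real CLAUDE.md.

```python
# Sketch: make CLAUDE.md a symlink to AGENTS.md so both tools read one file.
# The helper name is hypothetical; symlink behavior on Windows may differ.
import os
from pathlib import Path


def link_claude_md(repo_root) -> Path:
    """Create CLAUDE.md -> AGENTS.md in repo_root and return the link path."""
    root = Path(repo_root)
    source = root / "AGENTS.md"
    link = root / "CLAUDE.md"
    if not source.is_file():
        raise FileNotFoundError(f"{source} does not exist")
    if link.is_symlink():
        if os.readlink(link) == "AGENTS.md":
            return link  # already linked; nothing to do
        raise FileExistsError(f"{link} points somewhere else")
    if link.exists():
        raise FileExistsError(f"{link} is a regular file; merge it into AGENTS.md first")
    # A relative target keeps the link valid if the repository is moved or cloned.
    link.symlink_to("AGENTS.md")
    return link
```

Running link_claude_md(".") from the repository root is idempotent, so it is safe to call from a setup script. Teams that need Claude-specific instructions would keep a separate thin CLAUDE.md instead, as the paragraph above notes.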

What do developers actually say? (community signal, May 2026)

Benchmarks and feature matrices are one input. What people actually report after spending real money and real hours is another. Here's the consensus from Reddit, Hacker News, and developer Twitter through May 2026.

Reddit consensus

The most-cited finding from a r/ClaudeAI and r/OpenAI cross-survey of 500+ developers: 65% preferred Codex for daily coding, yet blind reviews of the produced code rated Claude Code as cleaner, more idiomatic, and better structured (67% win rate in blind comparisons). The gap between "what people pick" and "what produces better code" maps almost entirely to rate limits and per-task token economics, not capability.

Claude Code has better code quality but hits usage limits too quickly to be a daily driver. Codex is slightly lower quality but actually usable. (Paraphrasing the consensus across r/ClaudeCode and r/Anthropic threads through Q2 2026.)

The pain point that drove most of the spring 2026 switching: in late March, Claude Max subscribers reported their 5-hour session windows burning out in 60-90 minutes on workloads that had previously been fine. Anthropic acknowledged the issue publicly ("people are hitting usage limits in Claude Code way faster than expected") and later temporarily lifted Pro/Max limits 50% through July 13, 2026. That damage-control move kept many subscribers from churning, but the perception that Claude Code's limits are tighter than its marketing implies is now baked in.

Hacker News

The recurring HN thread "Ask HN: Is Codex really on par with Claude Code?" (April 2026) captures the nuanced mid-2026 sentiment. Top-voted positions:

  • Codex's sandbox-by-default model wins for autonomous, fire-and-forget work — fewer permission prompts mean fewer interruptions.
  • Claude Code's persistent sessions, skills, and hooks remain stronger for complex agentic loops that need browser automation or custom tool access.
  • Several commenters reported Claude Code "got slower" through Q1 2026, occasionally stalling for minutes on simple prompts — a complaint that surfaced repeatedly enough to be worth taking seriously.
  • The "both tools are roughly on par" line dominates: Claude generates cleaner code, Codex understands intent with fewer tokens, and a review pass tightens up Codex's output to Claude's level on most tasks.

What developer Twitter/X is saying

The dominant framing on X through April and May 2026, repeated by multiple prominent developers: "2026 power stack: Codex for keystroke, Claude Code for commits." The pattern that crystallized — both during and after Anthropic's late-March rate-limit incident — is hybrid use, not one-or-the-other.

One widely-shared comparison thread (Ian Nuttall, side-by-side trial) reported that GPT-5 used ~90,000 tokens to build a worker that Opus did in ~50,000 — Claude was the higher-quality output per token, but Codex was meaningfully cheaper on a per-task basis at the same dollar plan tier. The same thread praised Codex's /status command for visible token usage and Ctrl+T transcript view as small ergonomic wins that compound over a day.

OpenAI's response to the perception shift was unusually direct for the company: a "two free months of Codex" promo aimed at Claude Code switchers landed in early May, and Greg Brockman was given permanent product leadership across ChatGPT, Codex, and the developer API on May 16, signalling that Codex is now a top-line product priority, not a side project. Anthropic's countermove — the temporary 50% lift on Pro/Max limits, plus the June 15 billing-pool split that protects interactive Claude Code users from programmatic-usage surprises — is documented in our June 15 explainer.

Recent updates worth knowing (May 2026)

  • GPT-5.5 takes the Verified lead (April 23, 2026 launch) — first fully retrained base model since GPT-4.5; SWE-bench Verified 88.7% (vs Opus 4.7's 87.6%) and Terminal-Bench 2.0 82.7%. Now the default Codex model.
  • Anthropic billing change announced (May 14, 2026, effective June 15) — interactive Claude Code stays on Pro/Max plan limits; programmatic usage moves to a metered Agent SDK credit pool. Watch for a June 8 email with a one-time opt-in. See the billing change explainer.
  • Anthropic temporary limit lift (May 13, 2026) — Pro/Max weekly limits raised 50% through July 13, 2026 in response to Codex pressure. Third capacity bump in five weeks.
  • OpenAI "Codex for Claude Code switchers" promo (early May 2026) — two free months aimed at users frustrated by Anthropic's March rate-limit incident.
  • OpenAI org consolidation (May 16, 2026) — ChatGPT, Codex, and the developer API merged under one product organization led by Greg Brockman.
  • Codex Goal mode GA (May 21, 2026) — long-horizon autonomous mode, available in CLI v0.128.0+, IDE extensions, and the desktop app.
  • Codex v0.116.0 (March 19, 2026) — enterprise plugin bundles, codex doctor diagnostics, userPromptSubmit hook for prompt auditing/policy, websocket reliability fixes.

The honest read on the community signal: this is not a winner-takes-all race in 2026. The strongest pattern across Reddit, HN, and X is teams running both tools — Codex for cost-sensitive bulk work and autonomous PRs, Claude Code for high-stakes refactors and architecture. The Grok Build vs Claude Code vs Codex CLI comparison and our best MCP servers for Claude Code and Cursor guide both reflect the same multi-tool-by-default norm.

FAQ

Is Claude Code or OpenAI Codex better in 2026?

There's no single winner. Since the GPT-5.5 launch on April 23, 2026, Codex narrowly leads SWE-bench Verified (88.7% vs 87.6%) and Terminal-Bench 2.0 (82.7%), while Claude Opus 4.7 still leads the contamination-resistant SWE-bench Pro (64.3% vs 58.6%) and wins blind code-quality reviews ~67% of the time. Most senior teams run both.

What changes for Claude Code on June 15, 2026?

Anthropic splits Claude subscription billing into two pools. Interactive Claude Code in your terminal and IDE keeps drawing from your existing Pro/Max plan limits. Programmatic usage — the Claude Agent SDK, claude -p, the Claude Code GitHub Actions integration, and ACP-based third-party tools — moves to a separate monthly Agent SDK credit billed at full API list prices ($20 on Pro, $100 on Max 5×, $200 on Max 20×; unused credit forfeits monthly). See our June 15 billing change explainer.

Do they use the same models?

No. Claude Code runs Anthropic's Claude family (Opus 4.7, Sonnet 4.6, Haiku 4.5). Codex runs OpenAI's GPT family (GPT-5.5 default, GPT-5.4, GPT-5.3-Codex).

Which one is cheaper?

At equal $20/mo plans, Codex is cheaper per delivered task because of ~3–4× lower token consumption per workflow. For "must be right first try" multi-file refactors, Claude's higher quality often justifies the higher token spend. After June 15, 2026, the gap widens for heavy programmatic Claude users because the Agent SDK credit is metered at API rates.

Is OpenAI Codex CLI still actively maintained?

Yes — actively. The repo at github.com/openai/codex sits at ~85k stars (May 2026) with roughly weekly releases. Latest stable is v0.133.0 (May 21, 2026) with persisted Goal workflows, model tools, runtime continuation, and modal Vim editing in the TUI composer. Apache-2.0, Rust.

Is OpenAI Codex open source?

The Codex CLI is Apache-2.0 at github.com/openai/codex. Claude Code's CLI is closed-source, though Anthropic publishes the Agent SDK separately.

Does Claude Code work asynchronously like Codex Cloud?

Yes, via Routines and Web/iOS sessions. Codex's cloud-sandbox-by-default model is more battle-tested for fire-and-forget PRs, and the GitHub PR integration is tighter. Codex's Goal mode (GA May 21, 2026) is the closest equivalent to "set it and forget it for a day" in either ecosystem.

CLAUDE.md vs AGENTS.md — what's the difference?

Same idea, different file. Both sit at project root and load on session start. Many teams keep both so either tool works on the same repo. AGENTS.md is becoming a quasi-standard.

Which is more secure?

Codex by default — it sandboxes at the OS kernel level (Seatbelt/Landlock) and runs cloud tasks with the internet disabled. Claude Code relies on per-tool permission prompts and hooks; secure if configured, looser if not.

Can I use both Claude Code and Codex on the same repo?

Yes. Hybrid pattern: Claude Code for architecture and complex changes, Codex Cloud for bulk parallel PRs. Keep both CLAUDE.md and AGENTS.md at the project root.

Which one has the bigger context window?

Claude Sonnet 4.6 ships a 1M-token window at standard pricing. Codex's GPT-5.x models are at 400K. For very long-context work, Claude has the edge.

Does either one support voice input?

Codex's CLI supports hold-spacebar voice transcription. Claude Code currently does not.

The bottom line

Claude Code is the better hands-on-the-keyboard pair-programmer in 2026: higher quality on hard tasks, larger context, deeper IDE integration. OpenAI Codex is the better autonomous worker: cheaper per task, better sandboxing by default, smoother PR-from-anywhere workflow. Neither one obviates the other; the best engineers in 2026 don't pick, they use both.

If you're picking one: start with whichever ecosystem you're already paying for. If you're paying for both: keep paying for both. The expensive part is always the developer.