Claude Code: 10 Common Errors and the /doctor Cheatsheet (2026)

A senior-engineer field guide to the ten errors you will actually hit in Claude Code in 2026 — 529, 401, MCP -32000, Client Closed, plugin and skill failures, subagent timeouts, sandbox denials — plus how to read the /doctor output line by line.

Quick answer. Most Claude Code errors are environmental, not model bugs. HTTP 529 means Anthropic is overloaded — back off and switch models. 401 means your OAuth token expired — /logout then /login. MCP -32000 and Client Closed almost always mean a server process died on launch. Run /doctor first; it surfaces 80% of misconfigurations in one pass.

The Claude Code error surface in 2026 is small but unforgiving: a handful of transport, auth, and configuration failures account for almost every broken session. The SERP for "Claude Code error" is dominated by forum threads with wrong fixes from six months ago. This is the field reference we use at Codersera when an engineer pings us mid-session — what triggers each error, what /doctor says about it, the fixes ordered by likelihood, and a line-by-line cheatsheet for /doctor itself.

Why does Claude Code error at all?

Claude Code is a thin shell over three things: the Anthropic API, your local OS sandbox, and a growing set of subprocess transports (MCP servers, hooks, plugins, skills). The model itself rarely fails. What fails is the boundary — an OAuth token expiring mid-run, an MCP server writing to stdout and corrupting JSON-RPC, a marketplace 403 behind a corporate proxy, a YAML frontmatter line prettier wrapped onto two lines. Treat every error as environmental first; only escalate to "model bug" after that.

Why am I getting an HTTP 529 overloaded error?

What causes it

529 is overloaded_error — Anthropic’s capacity for the specific model you requested is temporarily saturated. Not a quota error (that’s 429) and not counted against your usage. Common during peak hours on Opus, especially on the Max plan.

What /doctor shows

API connectivity passes green — /doctor can’t see capacity, only reachability. The error appears inline as API Error: 529 {"type":"overloaded_error"}.

How to fix it

  1. Check status.claude.com first. If the API is red, you’re wasting effort debugging locally. Wait for the incident to clear.
  2. Switch model with /model. Capacity is tracked per model. If Opus is overloaded, Sonnet often isn’t. The model switch is the single fastest unblock during a 529 wave.
  3. Let Claude Code retry. The CLI already retries transient failures up to 10 times with exponential backoff. Do not wrap your own retry loop on top — tight retries make overload worse and you’ll get rate-limited as collateral damage.

When to escalate

If 529s persist across every model for >30 minutes with a green status page, file a ticket with requestId values. Sustained per-account 529s have been a known 2026 regression.

What does an HTTP 503 from Claude Code mean?

What causes it

503 is the edge telling you Anthropic’s backend is unreachable on your route. Unlike 529 (application-level), 503 usually means the request didn’t make it past Cloudflare or a regional gateway — corporate proxies, a flapping VPN, or a brief edge incident in your region.

How to fix it

  1. Test raw connectivity: curl -I https://api.anthropic.com/v1/messages. If that 503s too, the problem is your network, not Claude Code.
  2. Disable your VPN or zero-trust agent for one request. Most 503 reports we see come from a corporate ZTNA that selectively breaks long-lived HTTPS streams.
  3. Try a different egress. Tether to a phone hotspot — if the 503 vanishes, the issue lives in your office network.

When to escalate

Persistent 503s on a clean network with a green status page warrant a ticket. Include the cf-ray header and a traceroute.

How do I fix an HTTP 401 auth failure in Claude Code?

What causes it

Your OAuth token expired, was revoked, or doesn’t carry the scope a new feature requires. The painful variant is mid-session expiry during an autonomous run — every subsequent tool call returns authentication_error and the session is unrecoverable until you re-auth.

What /doctor shows

Auth: failed (401) in red, usually with the scope mismatch (for example, user:profile).

How to fix it

  1. Run /logout then /login. The logout is the part most people skip; a stale cached token will poison the new session. After /logout, the browser flow mints a new token with current scopes.
  2. If /login itself 401s, clear the credentials cache: rm -rf ~/.claude/auth, then restart Claude Code and re-authenticate.
  3. For long autonomous runs, kick off from a fresh login and keep jobs under the token lifetime — OAuth refresh during autonomous runs is still imperfect in 2026.

When to escalate

If a fresh browser login completes but the first API call still 401s — a documented 2026 regression — file an issue with timestamps. Don’t keep retrying; the bug is server-side.

What does MCP error -32000 actually mean?

What causes it

-32000 is the JSON-RPC generic server-error code. In practice it means your MCP server process crashed — failed to start, threw during initialization, or exited mid-handshake. The matching message is usually Connection closed.

What /doctor shows

A red MCP server <name>: error -32000 tells you which server died but not why. The why is in the server’s stderr, which Claude Code captures but doesn’t surface inline.

How to fix it

  1. Run the server command manually. Copy command + args from your MCP config, run it in a terminal. If it exits, fix that first — missing dependency, wrong Node version, missing env var.
  2. Use absolute paths. MCP servers run in a stripped environment; your .zshrc is not loaded. Use /usr/local/bin/node instead of node, and set every env var explicitly in the env block.
  3. Don’t write to stdout. The stdio transport multiplexes JSON-RPC on stdout; any console.log() corrupts the stream. Move logging to console.error().

When to escalate

If the server runs cleanly in isolation but still -32000s under Claude Code, capture stderr and file on the server’s repo first — Anthropic owns the transport but not third-party servers.

What is the MCP "Client Closed" error?

What causes it

The stdio process died between connect and the first tools/list response. The OS pipe opened, then the child exited — usually a missing binary, a dependency load failure, or Node.js exiting because synchronous init left no event-loop work.

How to fix it

  1. On Windows, wrap npx in cmd /c. Without it, Windows can’t exec npx and the child closes immediately. Example: {"command":"cmd","args":["/c","npx","-y","@modelcontextprotocol/server-filesystem"]}.
  2. Add process.stdin.resume() to custom Node servers — otherwise Node exits as soon as init is done.
  3. Verify the binary exists. A surprising number of Client Closed errors are typos in command — the OS dispatches to /bin/sh, which exits non-zero.

When to escalate

Claude Code has a known issue where clean shutdowns are mis-reported as "failed" in /doctor. If your server works in-session but /doctor flags it red after exit, that’s the bug, not your config.

Why does plugin install fail?

What causes it

Three failure modes: the marketplace isn’t registered (common on Windows, where claude-plugins-official is not pre-installed), the network blocks the marketplace host (corporate proxies, VPNs, zero-trust agents), or a partial download wedges the retry loop in ~/.claude/plugins/cache.

How to fix it

  1. Clear the cache: rm -rf ~/.claude/plugins/cache and restart Claude Code. This unblocks most stuck installs.
  2. On Windows, add claude-plugins-official to extraKnownMarketplaces in ~/.claude/settings.json manually.
  3. Verify .claude/plugins is writable. If you ever ran Claude Code with sudo, you likely have root-owned plugin dirs — chown them back.

Why are my skills not loading?

What causes it

Skill discovery is unusually sensitive to YAML frontmatter. The most common failure is invisible: prettier wraps a long description: field onto two lines and Claude Code silently skips the skill. No error, no warning, just absence.

How to fix it

  1. Keep description on one line. If long, use a YAML block scalar (description: >) rather than letting prettier wrap a plain string.
  2. Remove paths: if you added it. A known 2026 bug makes a skill with paths in its frontmatter completely undiscoverable.
  3. Don’t rely on allowed-tools for security. It parses but is not currently enforced in the CLI. Treat it as documentation, not a sandbox.

Verify with /doctor — the skills section lists every discovered SKILL.md by path. If yours isn’t there, the frontmatter didn’t parse.

Companion guide

For the full landscape of agentic coding tools, see our AI coding agents complete guide for 2026.

Why am I hitting the context limit with a 1M window?

What causes it

The 1M window is GA on Opus 4.6 and Sonnet 4.6, but Claude Code reserves a compaction buffer (~33K tokens). You have roughly 830K usable tokens before automatic compaction kicks in. Most "context full" errors at 1M are compaction-thrash on noisy tool output, not real exhaustion.

How to fix it

  1. Chunk-and-summarize beats "more context". A 1M window is not a substitute for narrowing the task. Feed Claude one subsystem at a time with a running summary — better answers, fewer tokens.
  2. Trim tool output at the source. Long shell output, large file reads, and verbose MCP responses are the biggest context consumers. Use ripgrep with line limits instead of cat, head -n instead of full reads, and have MCP servers return summaries by default.
  3. Use the compaction API explicitly with the compact-2026-01-12 beta header for long autonomous sessions. Default trigger is ~83.5%; raise it earlier to control where the summary boundary falls.

Why does my subagent time out or stop early?

What causes it

Subagents spawned via the Task tool with run_in_background: true sometimes terminate before completing and report "completed" back to the parent — reproduces in ~14-30% of long runs in 2026. Separately, builds and log-watchers that legitimately need 10+ minutes hit the per-tool timeout if you don’t structure them async.

How to fix it

  1. Use async subagents explicitly. Claude Code now supports background subagents that keep working after the main task completes and wake the parent when done. Spawn with the async pattern, then poll with /tasks.
  2. For builds and test suites, push them into a background task and use Monitor (or an explicit Wait) to stream output. Don’t Bash them synchronously — you’ll burn context on stdout chatter and hit the timeout.
  3. Press Ctrl+B while the agent is running — not after — to background the current task. After-the-fact backgrounding is what the broken cases look like.

What do I do about bash sandbox permission denied?

What causes it

Claude Code’s default permission mode runs Bash in an OS-level sandbox that blocks network egress and writes outside the project root. Anything that reaches the public network, touches ~/.ssh, or writes outside the sandbox EACCES’s.

How to fix it

  1. Add an explicit permission rule in ~/.claude/settings.json for the specific command. Permissions follow deny → ask → allow precedence.
  2. Run the command manually first. If it fails in your terminal too, the sandbox isn’t the problem.
  3. Avoid dangerouslyDisableSandbox: true as a habit. It lifts all restrictions for that call, not just the one that failed. On macOS Tahoe (26.x) the sandbox sometimes fails to initialize and Claude Code falls back to this flag per call — an OS-compat workaround, not a config choice.

When to escalate

EACCES writing to /tmp/claude/…/tasks after an upgrade is a known post-upgrade regression. Fix: rm -rf /tmp/claude and let Claude Code recreate it.

How do you read /doctor output?

/doctor is Claude Code’s built-in diagnostic. It runs ~20 checks in a few seconds and prints each with a color status: green pass, yellow warning, red fail. This is the cheatsheet for what each line actually means and what you do about it.

/doctor lineWhat it’s checkingIf it’s red
InstallationInstall method (npm global, native, Homebrew, ~/.claude/local) and how many parallel installs exist.Two installs on PATH — uninstall the older one. Mixed install methods are the #1 cause of upgrades that "don’t take".
VersionCurrent vs latest, plus the update channel.Run the upgrade command the line suggests. If on latest, your channel is opting into pre-stable; switch to stable if you want fewer surprises.
AuthOAuth token presence, expiry, and scope./logout then /login. If /login itself 401s, clear ~/.claude/auth manually.
API connectivityA reachability ping to api.anthropic.com.Network problem — VPN, proxy, DNS. Not a Claude Code problem.
MCP serversEach configured server, its transport, and connect status. New in 2026: warns if the same server is defined in multiple scopes.Run the server command in a terminal first. Fix it standalone, then re-run /doctor.
HooksHook syntax validity in settings.json.JSON parse error or shell parse error in a hook block. Validate the JSON, then dry-run the hook command.
SkillsEvery SKILL.md discovered, with path.If a skill is missing, its frontmatter didn’t parse. Check for wrapped description: lines or a paths: field.
Plugins / MarketplacePlugin dir permissions, registered marketplaces, fetch status.Clear ~/.claude/plugins/cache. On Windows, manually register claude-plugins-official.
Shell integrationWhether Claude Code can spawn your default shell and inherit env.Usually a corrupt .zshrc/.bashrc or a missing shell binary at the configured path.
SandboxOS sandbox availability and initialization.macOS 26 / Tahoe has a known sandbox-init regression. If red there, expect per-call dangerouslyDisableSandbox prompts until patched.
Context usageCurrent session token usage vs window, and time-to-compaction.Yellow at ~70% is normal. Red means compaction is imminent — trim tool output or split the task.

Rule of thumb: run /doctor immediately after any upgrade and any time a session behaves oddly. About 80% of the time, one of the lines above is the answer.

When should you actually contact Anthropic support?

Most Claude Code errors are environmental; support can’t debug your .zshrc. Escalate when:

  • You have a requestId for a 5xx that persists across models for >30 minutes with a green status page.
  • A fresh /login still 401s on the first request — documented 2026 bug.
  • You can reproduce a behavior change between two specific releases. Bisecting a regression is the most useful thing you can hand support.
  • You see data leakage, sandbox bypass without prompts, or unexpected auth flow — file at security@anthropic.com, not the public tracker.

For everything else, GitHub issues on the relevant repo with full /doctor output is fastest. Support asks for it first anyway.

Hire engineers who have debugged Claude Code in anger

Building with Claude Code in production is a different skill set from writing a prompt that demos well. Engineers who ship with it day to day know which errors are transient, which point at infrastructure, and which need a workflow change — they write hooks that survive an OAuth refresh, MCP servers that don’t crash on cold start, and skills whose frontmatter parses.

If you’re scaling a team and want vetted developers who already work this way, talk to Codersera. We place Claude Code-experienced engineers on a risk-free trial so you can see the workflow before you commit.

FAQ

Is a 529 the same as being rate-limited?

No. 429 is rate_limit_error — your account hit a quota. 529 is overloaded_error — Anthropic’s shared capacity is saturated for the model you asked for. 529 does not count against your quota, and Claude Code retries it automatically.

Does /doctor fix anything on its own?

No, it’s diagnostic-only. It surfaces broken configs and stale state, but each red line points to a fix you have to run yourself. Treat it as triage, not auto-repair.

Why does my MCP server work in Claude Desktop but not Claude Code?

Usually environment. Claude Desktop and Claude Code launch MCP servers in slightly different isolation contexts — PATH, working directory, and which env vars survive can differ. Switch to absolute paths in the command and put every required env var in the env block.

Should I use dangerouslyDisableSandbox as a default?

No. It lifts all sandbox restrictions for the call, not just the one that failed, and it has known bypass-without-prompt issues when combined with auto-approve modes. Add specific permission rules instead and only fall back to disabling the sandbox per-call when a host bug forces your hand.

How do I know if my skill actually loaded?

Run /doctor. The skills section enumerates every SKILL.md it discovered, by path. If yours is missing, the frontmatter didn’t parse — check for a wrapped description: field or a stray paths: key.

Does the 1M context window cost more?

No. As of March 2026, the 1M window is generally available on Opus 4.6 and Sonnet 4.6 with no pricing premium — a 900K-token request bills at the same per-token rate as a 9K one. The cost is in token count, not window size.

Why did my autonomous run stop mid-task with a 401?

Your OAuth token expired during the run. Claude Code’s OAuth refresh is still imperfect in 2026 for long autonomous jobs. The practical fix is to chunk long runs into pieces shorter than your token lifetime, kicking each off from a fresh /login.