The short version: set ANTHROPIC_BASE_URL=https://openrouter.ai/api, set ANTHROPIC_AUTH_TOKEN to your OpenRouter key, and set ANTHROPIC_API_KEY="" (explicitly blank). No proxy is needed. You can then route the Opus, Sonnet, and Haiku slots to cheaper models like GLM-5.2 or DeepSeek V4, but reliable tool use is only guaranteed on Anthropic's own models.
If you have spent any time in r/ClaudeCode this month, you have seen the mood. Threads titled "anthropic isn't the only reason you're hitting claude code limits" (557 upvotes, 137 comments) and "Why Claude Code Max burns limits 40% faster" (356 upvotes) are pulling real engagement. The recurring frustration (rate limits hit mid-task, bills that climb faster than expected) keeps pushing developers to ask the same question: can I keep the Claude Code harness I like and stop paying Anthropic for the tokens behind it?
The answer is yes, and it is officially documented. OpenRouter — the model-routing marketplace — publishes a Claude Code integration cookbook that shows how to redirect the CLI to its own Anthropic-compatible endpoint with three environment variables. Once you do, you can keep using real Claude models billed through OpenRouter credits, or swap the Opus/Sonnet/Haiku "slots" for cheaper open models like GLM-5.2, DeepSeek V4, Qwen3-Coder, or Kimi.
This guide covers the exact setup, which models actually behave as drop-in backends, what it costs, and the part the hype threads skip: OpenRouter itself only guarantees Claude Code works against Anthropic's first-party models. Everything else is a tradeoff you should make with your eyes open.
What does pointing Claude Code at OpenRouter actually do?
Claude Code is two things bolted together: an agentic harness (the loop that reads files, runs commands, edits code, calls tools, and manages sub-agents) and a backend (the model that does the thinking). By default the harness talks to Anthropic's API and bills against your Anthropic subscription or API key. The trick here separates those two layers — you keep the harness and replace the backend.
Anthropic's CLI honours an environment variable, ANTHROPIC_BASE_URL, that tells it where to send requests. OpenRouter exposes an endpoint at https://openrouter.ai/api that speaks the native Anthropic Messages protocol — OpenRouter calls this its "Anthropic Skin." When Claude Code points there, it keeps sending the exact same request shape it always sends Anthropic; OpenRouter receives it, maps the model name, and forwards it to whichever provider actually serves the model. Crucially, the Anthropic Skin passes through both extended thinking blocks and native tool use, so the agentic features Claude Code depends on survive the round trip.
The important consequence: no local proxy is required. A lot of older guides told you to run a translation server on localhost that converted Anthropic-format requests into OpenAI-format ones. With OpenRouter's Anthropic-compatible endpoint, that translation happens server-side. You set three variables, relaunch claude, and you are done. (There is still a proxy-based path — claude-code-router — and we cover it later, because it buys you routing features the env-var approach does not.)
If you build with the Anthropic Agent SDK (claude-agent-sdk / @anthropic-ai/claude-agent-sdk), the same trick works there too — the SDK uses Claude Code as its runtime, so the identical ANTHROPIC_BASE_URL / ANTHROPIC_AUTH_TOKEN configuration redirects programmatic agents the same way it redirects the interactive CLI.
How do you set up Claude Code with OpenRouter?
Here is the full path from nothing to a working OpenRouter-backed Claude Code session.
1. Install Claude Code
If you do not already have it, install the CLI. The native installer is the simplest route:
# macOS / Linux / WSL
curl -fsSL https://claude.ai/install.sh | bash
# or via npm (needs Node 18+)
npm install -g @anthropic-ai/claude-code
2. Get an OpenRouter API key
Create an account at OpenRouter, add credits (it is prepaid, pay-as-you-go), and generate an API key from your account settings. This single key is what authenticates every model request, regardless of which underlying model you route to.
3. Set the three environment variables
Add these to your shell profile — ~/.zshrc, ~/.bashrc, or equivalent. The blank Anthropic key is not optional; leaving a stale Anthropic key in place causes a credential conflict.
export OPENROUTER_API_KEY="<your-openrouter-api-key>"
export ANTHROPIC_BASE_URL="https://openrouter.ai/api"
export ANTHROPIC_AUTH_TOKEN="$OPENROUTER_API_KEY"
export ANTHROPIC_API_KEY="" # MUST be explicitly empty
Reload your shell (source ~/.zshrc or open a new terminal) so the variables take effect.
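Because a missing or stale variable fails quietly, a small pre-flight check can save some head-scratching. The sketch below is only an illustration: check_openrouter_env is a hypothetical helper name, not something Claude Code or OpenRouter ships, and it checks nothing beyond the three variables described in this step.

```shell
# Hypothetical helper: verifies the three variables from this step.
# Returns 0 when everything looks right, 1 otherwise.
check_openrouter_env() {
  rc=0
  if [ "${ANTHROPIC_BASE_URL:-}" != "https://openrouter.ai/api" ]; then
    echo "ANTHROPIC_BASE_URL is not https://openrouter.ai/api" >&2
    rc=1
  fi
  if [ -z "${ANTHROPIC_AUTH_TOKEN:-}" ]; then
    echo "ANTHROPIC_AUTH_TOKEN is empty or unset" >&2
    rc=1
  fi
  # ${VAR+x} expands to "x" only when VAR is set, even if it is set to "".
  # That lets us tell "explicitly empty" apart from "never set".
  if [ -z "${ANTHROPIC_API_KEY+x}" ] || [ -n "${ANTHROPIC_API_KEY}" ]; then
    echo "ANTHROPIC_API_KEY must be set to an empty string" >&2
    rc=1
  fi
  return "$rc"
}

# Usage (after reloading your shell):
#   check_openrouter_env && echo "environment looks right"
```

Paste the function into your shell profile below the exports if you want it available in every terminal. It only inspects your environment; it does not contact OpenRouter.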
4. Or scope it to a single project
If you do not want OpenRouter to be your global default, set it per project instead. Create .claude/settings.local.json in the repo root:
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://openrouter.ai/api",
    "ANTHROPIC_AUTH_TOKEN": "<your-openrouter-api-key>",
    "ANTHROPIC_API_KEY": ""
  }
}
Do not put these in a project .env file. The native Claude Code installer does not read standard .env files, so the variables silently do nothing and you spend an hour wondering why you are still being billed by Anthropic. Use the shell profile or settings.local.json; those are the two paths the CLI actually reads.
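If you set this up in more than one repo, you may prefer to generate the file rather than hand-edit it. The following is a minimal sketch under stated assumptions: write_openrouter_settings is a hypothetical helper name, it expects OPENROUTER_API_KEY to be set in your shell, and it assumes the key contains no double quotes or backslashes (it does no JSON escaping).

```shell
# Hypothetical helper: writes .claude/settings.local.json under a repo root.
# Usage: write_openrouter_settings <repo-root>   (defaults to the current dir)
write_openrouter_settings() {
  repo_root="${1:-.}"
  if [ -z "${OPENROUTER_API_KEY:-}" ]; then
    echo "OPENROUTER_API_KEY is not set" >&2
    return 1
  fi
  mkdir -p "$repo_root/.claude" || return 1
  # The subshell keeps the tighter umask from leaking into your session.
  # A newly created file will be readable by your user only.
  (
    umask 077
    cat > "$repo_root/.claude/settings.local.json" <<EOF
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://openrouter.ai/api",
    "ANTHROPIC_AUTH_TOKEN": "${OPENROUTER_API_KEY}",
    "ANTHROPIC_API_KEY": ""
  }
}
EOF
  )
}
```

The generated file contains your key in plain text, so check that it is excluded from version control before you commit anything.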
5. Clear any cached Anthropic login
If you were previously signed in with an Anthropic account, Claude Code has a cached session that overrides your new environment. That cache produces confusing "model not found" errors (often for openrouter/auto) that have nothing to do with the model and everything to do with the stale login. Run, inside Claude Code:
/logout
Then quit, relaunch claude, and verify.
6. Verify with /status
Inside the CLI, run /status. A correctly configured session reports:
Auth token: ANTHROPIC_AUTH_TOKEN
Anthropic base URL: https://openrouter.ai/api
If you see those two lines, your traffic is flowing through OpenRouter. You can now watch every request and its token cost in real time on the OpenRouter Activity dashboard — which, the first time you run a big agentic task, is a genuinely useful gut-check on what these sessions actually cost.
Which models can you use as Claude Code backends?
Out of the box, the redirect gives you real Claude models billed through OpenRouter. The more interesting move — and the one driving most of the social-media excitement — is repointing the model slots at cheaper open-weight models. OpenRouter's catalog is large; the models that show up repeatedly in working Claude Code configs are the coding-tuned, long-context ones below. All prices are per 1M tokens (prompt / completion) and come from OpenRouter's live model catalog.
| Model (OpenRouter slug) | Input / 1M | Output / 1M | Context | Notes |
|---|---|---|---|---|
| z-ai/glm-5.2 | $0.94 | $3.00 | 1M | Most-recommended GLM coding backend |
| z-ai/glm-5 | $0.60 | $1.92 | not listed | Cheaper previous GLM generation |
| deepseek/deepseek-v4-pro | $0.435 | $0.87 | 1M | Strong price-to-capability for the Opus slot |
| deepseek/deepseek-v4-flash | $0.09 | $0.18 | not listed | Cheapest serious option; good Haiku-slot fit |
| qwen/qwen3-coder | $0.22 | $1.80 | 1M | Coding-specialised; long context |
| qwen/qwen3-coder:free | $0.00 | $0.00 | not listed | Free tier; rate-limited and less reliable |
| moonshotai/kimi-k2.7-code | $0.74 | $3.50 | 262k | Kimi coding variant |
| minimax/minimax-m2.5 | $0.12 | $0.48 | 204.8k | Cheap, smaller context |
For comparison's sake, this is why people bother: routing the Opus slot to deepseek/deepseek-v4-pro at $0.435 input / $0.87 output is an order of magnitude cheaper per token than Anthropic's frontier pricing, and DeepSeek V4 carries a 1M-token context window. GLM-5.2 is the model the community reaches for most often when they want one open backend that behaves well across the board — we go deeper on it in the GLM-5.2 complete guide, and on how it stacks up against frontier Claude in GLM-5.2 vs Claude Opus 4.8 for coding.
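To turn the per-1M prices in the table into a figure for an actual session, the arithmetic is simply tokens times price, divided by a million. Here is a rough sketch: session_cost is a hypothetical helper, and the token counts (2M input, 200k output) are made-up numbers for illustration, not measurements from a real session. It ignores caching discounts and any reasoning-token overhead.

```shell
# Hypothetical helper.
# Usage: session_cost <input $/1M> <output $/1M> <input tokens> <output tokens>
session_cost() {
  awk -v pin="$1" -v pout="$2" -v tin="$3" -v tout="$4" \
    'BEGIN { printf "%.4f\n", (tin * pin + tout * pout) / 1000000 }'
}

# deepseek/deepseek-v4-pro at the table prices, illustrative token counts
session_cost 0.435 0.87 2000000 200000   # prints 1.0440

# deepseek/deepseek-v4-flash, same token counts
session_cost 0.09 0.18 2000000 200000    # prints 0.2160
```

Swap in the token counts from your own OpenRouter Activity dashboard to get numbers that mean something for your workload.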
One naming trap worth flagging: GLM-5.2 on OpenRouter is the pay-per-token route z-ai/glm-5.2. Do not confuse it with Z.ai's separate flat-rate "GLM Coding Plan" subscription, which people also use as a Claude Code backend but which is billed and rate-limited completely differently. They are two distinct products from the same vendor.
How do you route each model slot?
Claude Code internally asks for three model tiers — Opus (heavy reasoning), Sonnet (the everyday workhorse), and Haiku (cheap, fast utility calls) — plus a separate model for sub-agents. You override each tier with its own environment variable:
export ANTHROPIC_DEFAULT_OPUS_MODEL="z-ai/glm-5.2"
export ANTHROPIC_DEFAULT_SONNET_MODEL="deepseek/deepseek-v4-pro"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="qwen/qwen3-coder-flash"
export CLAUDE_CODE_SUBAGENT_MODEL="z-ai/glm-5.2"
A sensible pattern is to put your strongest model in the Opus and sub-agent slots (those do the planning and the parallel grunt work), a fast mid-tier model in Sonnet, and the cheapest viable model in Haiku, which Claude Code calls constantly for small bookkeeping tasks like generating commit messages and summarising tool output. Spending GLM-5.2 money on Haiku-tier calls is wasteful; that is what deepseek/deepseek-v4-flash or qwen/qwen3-coder-flash is for.
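If you expect to flip between slot layouts, wrapping the exports in a function keeps them in one place. This is a sketch, not an official mechanism: set_slot_profile and the profile names "budget" and "default" are invented for this example, and the slugs are the ones discussed in this article.

```shell
# Hypothetical helper with made-up profile names.
# Usage: set_slot_profile budget | set_slot_profile default
set_slot_profile() {
  case "${1:-}" in
    budget)
      # Strong model for planning and sub-agents, cheapest model for Haiku calls.
      export ANTHROPIC_DEFAULT_OPUS_MODEL="z-ai/glm-5.2"
      export ANTHROPIC_DEFAULT_SONNET_MODEL="deepseek/deepseek-v4-pro"
      export ANTHROPIC_DEFAULT_HAIKU_MODEL="deepseek/deepseek-v4-flash"
      export CLAUDE_CODE_SUBAGENT_MODEL="z-ai/glm-5.2"
      ;;
    default)
      # Remove every override so Claude Code asks for its own model names.
      unset ANTHROPIC_DEFAULT_OPUS_MODEL ANTHROPIC_DEFAULT_SONNET_MODEL \
            ANTHROPIC_DEFAULT_HAIKU_MODEL CLAUDE_CODE_SUBAGENT_MODEL
      ;;
    *)
      echo "unknown profile: ${1:-} (expected: budget, default)" >&2
      return 1
      ;;
  esac
}
```

Run set_slot_profile budget in the terminal before launching claude. The change only affects sessions started from that shell.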
Presets: abstract the backend server-side
Hard-coding model slugs in your shell profile works, but it means editing local config every time you want to try a different backend. OpenRouter Presets solve that. Create a preset at openrouter.ai/settings/presets — a preset bundles a model choice plus an optional system prompt and routing rules — then reference it by slug:
export ANTHROPIC_DEFAULT_SONNET_MODEL="@preset/<your-preset-slug>"
Now you can swap the model behind that slot from the OpenRouter dashboard without touching your machine. The top r/ClaudeCode tutorial on this setup used exactly this technique to run GPT-5.2-Codex behind the Sonnet slot, which shows that the slots are just names and that you can put genuinely non-Claude models behind them if the provider handles tool calls well enough.
What's the catch with non-Anthropic models?
Here is the sentence the viral threads tend to bury. OpenRouter's own documentation states plainly that Claude Code "is only guaranteed to work with the Anthropic first-party provider" and that it "is optimized for Anthropic models and may not work correctly with other providers." That is not boilerplate. It is the load-bearing caveat of this entire approach.
Tool use and parsing can break
Claude Code is an agentic harness. It does not just generate text — it asks the model to emit structured tool calls (read this file, run this command, apply this edit) in a specific format, then parses the response and acts on it. Anthropic's models are trained against exactly that format. GLM, Qwen, DeepSeek, and Kimi are trained against their own tool-calling conventions, and OpenRouter's skin does its best to translate, but the translation is not perfect. The failure modes you will actually hit: a model that narrates what it would do instead of emitting a tool call, an agentic loop that stalls, edits that come back malformed, or sub-agents that never return. None of this shows up in a one-line "hello world" test — it shows up three steps into a real refactor.
Practical guidance: if you are staying on Claude models through OpenRouter, set Anthropic as your top-priority provider so you get first-party behaviour. If you are routing to open models, treat tool-use reliability as the thing you are trading cost for, and keep a known-good Claude config one /logout away. Our Claude Code troubleshooting guide covers the error patterns that show up when a backend mishandles tool calls.
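One way to keep that fallback close at hand is a wrapper that strips the OpenRouter overrides for a single command and leaves your shell untouched. Treat this as a sketch: with_first_party is a hypothetical helper, and it assumes you still have a valid Anthropic login or key available once the overrides are gone. It also cannot undo values set in a project's .claude/settings.local.json.

```shell
# Hypothetical helper: runs one command without the OpenRouter overrides.
# The function body is a subshell, so the unset never reaches your session.
with_first_party() (
  unset ANTHROPIC_BASE_URL ANTHROPIC_AUTH_TOKEN ANTHROPIC_API_KEY \
        ANTHROPIC_DEFAULT_OPUS_MODEL ANTHROPIC_DEFAULT_SONNET_MODEL \
        ANTHROPIC_DEFAULT_HAIKU_MODEL CLAUDE_CODE_SUBAGENT_MODEL
  "$@"
)

# Usage:
#   with_first_party claude
```

Because nothing is changed globally, closing that session puts you straight back on your OpenRouter configuration.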
Prompt caching only behaves natively on Anthropic models
This is the subtle cost killer. Anthropic models support prompt caching: you mark stable parts of the context (system prompt, file contents, instructions) with explicit cache_control "ephemeral" breakpoints, and subsequent requests that reuse that prefix read it from cache at a steep discount instead of paying full input price every turn. In an agentic session that re-sends a large context on every step, caching is the difference between a sane bill and a brutal one.
OpenRouter's prompt-caching guide documents how this works through its endpoint: it uses provider "sticky routing" to keep your requests hitting the same backend so the cache stays warm. But there are two things to understand. First, on Anthropic models the cache write carries a premium — a 1-hour TTL cache costs 2x the normal input price to write, with reads discounted afterward, so caching pays off only when you actually reuse the prefix. Second: caching behaviour varies by provider and model on non-Anthropic backends. Some open backends cache automatically (OpenRouter lists DeepSeek among the providers with implicit caching), others cache only on specific models or not at all — so the discount you can count on is less predictable than the explicitly-controlled caching you get on Anthropic models. If a backend you route to caches weakly or not at all, a heavy agentic session that would have been mostly cache reads on Claude pays closer to full input price each turn — which can erode the savings that motivated the switch.
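A quick calculation shows why reuse matters. The sketch below compares paying full input price on every turn against one cache write followed by cache reads. Several things here are assumptions for illustration: cache_cost is a hypothetical helper, the $3.00 per 1M input price and the 100k-token prefix are made-up figures, and the 0.1x read multiplier is an assumed discount you should check against current pricing. Only the 2x write multiplier comes from the text above. It also assumes the prefix stays identical and the cache never expires mid-session.

```shell
# Hypothetical helper.
# Usage: cache_cost <prefix tokens> <turns> <input $/1M> <write mult> <read mult>
cache_cost() {
  awk -v n="$1" -v t="$2" -v p="$3" -v w="$4" -v r="$5" 'BEGIN {
    base = n * p / 1000000             # cost of sending the prefix once, uncached
    uncached = t * base                # every turn pays full input price
    cached = base * (w + (t - 1) * r)  # one cache write, then t-1 cache reads
    printf "uncached=%.2f cached=%.2f\n", uncached, cached
  }'
}

# 20 turns that all reuse the same 100k-token prefix
cache_cost 100000 20 3.00 2 0.1   # prints uncached=6.00 cached=1.17

# A single turn: the write premium is never recovered
cache_cost 100000 1 3.00 2 0.1    # prints uncached=0.30 cached=0.60
```

On a backend that does not cache, you are on the "uncached" side of that comparison for the whole session, whatever its headline per-token price.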
Is OpenRouter actually cheaper than an Anthropic plan?
Not always — and this is the most common misconception. OpenRouter is metered pay-as-you-go. You buy credits and burn them per token, and reasoning/thinking tokens are billed too. Anthropic's subscriptions (Pro, and the $200/month Max plan) are flat-rate: you pay once and run until you hit rate limits.
Those are different cost curves. For light or bursty use, metered billing wins easily — you pay cents for an afternoon of work instead of a fixed monthly fee. But for very heavy users, the math flips. If you are running Claude Code for hours every day across multiple parallel sessions, a flat $200 Max plan can be cheaper than the equivalent token volume on OpenRouter, even at open-model prices. Pay-per-token gets expensive fast for power users — which is exactly why some of the heaviest among them move to a flat-rate plan (such as Z.ai's GLM Coding Plan) rather than metered OpenRouter.
The honest framing: OpenRouter buys you model choice and rate-limit relief, not guaranteed savings. The rate-limit relief is often the real win — if Anthropic's limits are blocking you mid-task, paying per token to keep working has value that does not show up in a simple cost-per-token comparison. Watch your OpenRouter Activity dashboard for the first week and compare it honestly against what a flat plan would have cost you.
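For a first-pass sanity check before that week of measurement, you can estimate how many tokens a month the flat fee would buy at metered prices. This is a deliberately crude sketch: breakeven_mtok is a hypothetical helper, the 90% input share is an assumed mix rather than a measured one, and the result ignores caching, reasoning-token overhead, and the fact that a flat plan is itself rate-limited.

```shell
# Hypothetical helper: monthly token volume (in millions) at which metered
# spend equals a flat fee.
# Usage: breakeven_mtok <flat $/month> <input $/1M> <output $/1M> <input share 0..1>
breakeven_mtok() {
  awk -v flat="$1" -v pin="$2" -v pout="$3" -v share="$4" 'BEGIN {
    blended = share * pin + (1 - share) * pout   # average $ per 1M tokens
    printf "%.0f\n", flat / blended
  }'
}

# $200 flat plan vs deepseek/deepseek-v4-pro, assuming 90% of tokens are input
breakeven_mtok 200 0.435 0.87 0.9   # prints 418
```

If your dashboard shows you well below the break-even volume, metered billing is likely the cheaper curve for you. If you are near or above it, the flat plan deserves a serious look.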
What is claude-code-router, and when should you use it?
The env-var method points all of Claude Code at one OpenRouter endpoint. claude-code-router (CCR) is the heavier-weight alternative: a local gateway that sits between Claude Code and a whole fleet of providers, with per-workload routing and fallback. Install and start it:
npm install -g claude-code-router
ccr start # local gateway on http://127.0.0.1:3456, web UI on :3458
CCR routes Claude Code to OpenAI-compatible endpoints, Anthropic, Gemini, OpenRouter, DeepSeek, Moonshot/Kimi, Z.AI, and more. Its appeal over the plain env-var setup is the routing logic: you can send different workloads to different providers (cheap model for background tasks, strong model for the main loop), configure automatic fallback if one provider errors, and mix direct provider keys with OpenRouter in one config. The r/ClaudeCode post "I stopped paying Anthropic" describes precisely this setup: CCR plus DeepInfra, with the Opus slot mapped to DeepSeek V4 Pro and the Sonnet slot to DeepSeek. The author kept Claude Code itself because the harness was the part they valued, not the backend.
When to choose which: use the three env vars if you want one backend (real Claude through OpenRouter, or one open model) with minimal moving parts. Use CCR if you want to mix providers, route by workload, or add fallback — at the cost of running and maintaining a local gateway. Both put the same harness in front; they differ only in how much routing intelligence sits behind it. If you are weighing this against staying first-party, our guide to AI coding agents puts Claude Code in context against the broader field.
Is "free Claude Code" real, or is it hype?
Your feed is full of it. "BREAKING: You can now run Claude Code for FREE w/ Ollama + OpenRouter. No API costs. No rate limits." (~225k views). "Goodbye Claude Code subscription fees... a proxy that runs Claude Code completely free." (~33k views). "CLAUDE CODE IS FREE TO RUN AND MOST PEOPLE STILL THINK YOU NEED A PAID SETUP." These posts are not lying, exactly — but read the fine print before you reorganise your workflow around them.
The "free" routes fall into a few buckets, each with a catch:
- Free model tiers like qwen/qwen3-coder:free ($0/$0 on OpenRouter). Genuinely free, genuinely rate-limited, and the free endpoints are less reliable under load. Fine for tinkering; frustrating for sustained work.
- Free-proxy projects like free-claude-code, which route through providers' free tiers. Workable for a solo developer, but free tiers carry hard request-rate ceilings that an agent firing many tool calls hits quickly, and you are depending on a third party's free quota staying open.
- Local models via Ollama or llama.cpp. The r/LocalLLaMA guide "Wrote a guide for running Claude Code with GLM-4.7 Flash locally with llama.cpp" (123 upvotes) is real — llama.cpp's server gained Anthropic-API compatibility, so you can point Claude Code at a model running on your own hardware. "Free" in the sense of no per-token bill, very much not free in hardware, setup, and the capability gap against a frontier model. If local is the direction you want, the best free local LLM tools and the self-hosting LLMs guide are the right starting points.
One more thing worth saying plainly: anything that routes real Claude models through an unofficial back door to dodge billing is a terms-of-service and reliability risk, not a clever hack — and it is the kind of setup that tends to break the week it gets noticed. The clean version of this is the supported one: OpenRouter credits, the official cookbook, and your own dashboard showing exactly what you spend.
Who should actually do this?
To pull the threads together:
- You keep hitting Anthropic rate limits mid-task and want a release valve → point Claude Code at OpenRouter with real Claude models. You keep first-party behaviour and pay per token only when the flat plan would have blocked you.
- You want dramatically lower token costs and can tolerate some tool-use flakiness → route the slots to GLM-5.2 or DeepSeek V4 and accept the caveats. Keep a Claude fallback config handy.
- You run many parallel sessions for hours daily → do the math before switching. A flat $200 Max plan may still be your cheapest option; metered billing punishes the heaviest users.
- You want per-workload routing and provider fallback → use claude-code-router instead of the bare env vars.
- You are chasing a "completely free" setup → it exists, on free tiers and local models, with real ceilings on rate, reliability, and capability. Manage your expectations.
The underlying point is liberating either way: Claude Code's value is the harness, and the harness is portable. You are not locked to one backend. Once you internalise that, the question stops being "Anthropic or not" and becomes "which backend for which job" — which is a much better question to be asking.
If you are making backend and tooling calls like this across a team and want engineers who already live in these workflows, Codersera places vetted remote developers who can extend your team without the ramp-up.
FAQ
Do I need a proxy to use Claude Code with OpenRouter?
No. Setting ANTHROPIC_BASE_URL=https://openrouter.ai/api makes Claude Code speak its native Anthropic protocol directly to OpenRouter's Anthropic-compatible endpoint, which handles model mapping and passes through thinking blocks and tool use. A proxy (claude-code-router) is optional and only worth it if you want per-workload routing or provider fallback.
Why must ANTHROPIC_API_KEY be set to empty?
If a stale Anthropic API key is present alongside your OpenRouter auth token, the two credentials conflict and requests fail. Setting ANTHROPIC_API_KEY="" explicitly clears it so Claude Code authenticates only through ANTHROPIC_AUTH_TOKEN (your OpenRouter key). A leftover cached Anthropic login causes the same class of problem — fix it by running /logout and relaunching.
Will every model work as a Claude Code backend?
No. OpenRouter only guarantees Claude Code against Anthropic's first-party models. Open models like GLM-5.2, DeepSeek V4, Qwen3-Coder, and Kimi can work, but Claude Code is optimised for Anthropic's tool-calling format, so non-Anthropic backends may mishandle tool calls, stall agentic loops, or return malformed edits. Test on a real task before relying on one.
Is OpenRouter cheaper than an Anthropic subscription?
It depends on usage. OpenRouter is metered pay-per-token (reasoning tokens included); Anthropic's $200 Max plan is flat-rate. For light or bursty work, metered billing is cheaper. For very heavy daily use across parallel sessions, a flat plan can win — and prompt caching, which only behaves natively on Anthropic models, can erode the savings on open backends.
Does the same setup work for the Anthropic Agent SDK?
Yes. The Claude Agent SDK (claude-agent-sdk / @anthropic-ai/claude-agent-sdk) uses Claude Code as its runtime, so the identical ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN configuration redirects programmatic agents through OpenRouter. OpenRouter documents this with Python and TypeScript examples.
How do I confirm Claude Code is actually using OpenRouter?
Run /status inside Claude Code. It should report Auth token: ANTHROPIC_AUTH_TOKEN and Anthropic base URL: https://openrouter.ai/api. You can then verify token spend per request on the OpenRouter Activity dashboard, which is the fastest way to catch an agentic session that is costing more than you expected.