How to Stop Claude Code From Over-Engineering Your Code
You ask Claude Code to fix a null check on one line. You get back a new utils/ directory, a custom error class, a retry wrapper "for robustness," a config flag nobody requested, and three files renamed "while I was here." The one-line bug is fixed. So is a lot of code that wasn't broken.
This is one of the most common complaints from developers using AI coding agents, and it shows up in code reviews almost word for word: the agent behaves like a well-meaning senior engineer who can't stop gold-plating. It adds abstraction layers for a single use case, defensive fallbacks for inputs that can't occur, and "future-proofing" for requirements that don't exist. The code usually works. It's just three times bigger than the change you asked for, and now your diff is unreviewable.
The good news: this is a controllable behavior, not a fixed trait. Claude Code over-engineers because of how it's prompted, configured, and reviewed — and every one of those is a lever you control. This guide walks through the practical controls that produce surgical changes: CLAUDE.md rules, prompting patterns, plan mode, scoping discipline, model and effort settings, and review gates. None of it requires a different tool. It requires telling the tool what "done" means before it starts.
What does Claude Code over-engineering actually look like?
Before you can suppress a behavior you have to name it. "Over-engineering" is a fuzzy word, so here is the concrete catalog — the specific patterns that drive the loudest complaints. If you've used an AI coding agent on real code, you'll recognize most of these on sight.
- Unrequested abstractions. You ask for one function; you get an interface, a base class, a factory, and a "strategy" pattern to support the second implementation that will never exist. Abstraction is a cost you pay now to buy flexibility later. The agent keeps buying flexibility you didn't order.
- Defensive fallbacks for impossible states. Try/except blocks around code that can't throw, null guards on values the type system already guarantees, default branches for enums that are exhaustively handled. Each one looks harmless. Together they bury the actual logic and hide real bugs behind swallowed errors.
- New files nobody asked for. A two-line helper gets its own module. Constants get extracted into a
constants.ts. A single config value spawns a config loader. File sprawl makes the change harder to review and harder to revert. - Scope drift mid-task. While fixing the thing you asked for, the agent "notices" unrelated issues and fixes those too — reformatting, renaming, re-ordering imports, upgrading a pattern it dislikes. Your one-file change becomes a fourteen-file diff.
- Premature optimization. Caching layers, memoization, and batching added to code that runs once at startup. Complexity justified by performance that was never a constraint.
- Comment and docstring inflation. Every line narrated, obvious code over-explained, and verbose block comments restating what the code plainly says. Noise that ages badly the moment the code changes.
- Backward-compatibility theater. Keeping the old code path "just in case," shimming a function signature that has exactly one caller, versioning an internal API. Dead weight dressed up as caution.
The throughline is that none of these are wrong in the abstract. A factory pattern is good engineering — when you have three implementations. A fallback is good engineering — when the input is genuinely untrusted. The failure is applying senior-engineer instincts indiscriminately, to a one-line change, without being asked. That's why it's tempting to call it "lazy senior dev mode": it's the pattern of someone who reaches for the heavyweight solution on autopilot because it feels rigorous, not because the problem called for it.
Why does Claude Code over-engineer in the first place?
You'll write better controls if you understand the mechanism. There are four structural reasons the model trends toward more code, not less.
Training tends to reward thoroughness. Assistant models are generally tuned on human feedback, and humans rating code tend to prefer answers that look complete, handle edge cases, and include error handling. "Here's the one line you asked for" can read as lazy to a rater; "here's a robust, well-structured solution" reads as helpful. A plausible result is that the model leans toward more structure as the safer, more-helpful answer — which is exactly backwards when you wanted a surgical edit.
It can't see your judgment, only your words. A human engineer who's lived in your codebase knows that you don't add a retry wrapper here because this path is internal and synchronous. The agent doesn't have that context unless you give it. Absent a constraint, it defaults to the textbook-safe choice, which is almost always the more-code choice.
It optimizes for "not getting blamed." Adding a fallback or a guard is cheap insurance against an edge case the model can imagine but can't rule out. Leaving it out requires confidence the agent doesn't have about your specific runtime. So it hedges, and hedging accretes.
Underspecified prompts invite filling in the blanks. "Fix the login bug" leaves enormous room. The model resolves that ambiguity by doing more, because doing more covers more interpretations of what you might have meant. Every gap in your instruction is a place the agent will pour extra code.
Put together, the lesson is that over-engineering is the model's default in the absence of constraint. You don't stop it by asking nicely once. You stop it by changing the defaults — in your project config, in how you phrase tasks, and in what you let through review. The rest of this guide is those three layers. For a wider view of how these agents reason and where they reliably slip, the AI coding agents guide is a useful companion.
How do you write CLAUDE.md rules that curb scope creep?
CLAUDE.md is the project-level instruction file Claude Code reads on every session. It's the single highest-leverage control you have, because it changes the default for every prompt without you re-typing constraints each time. Most teams use it for setup commands and conventions. Fewer use it to actively fight over-engineering — which is a missed opportunity, because this is exactly where a standing rule beats a one-off reminder.
The mistake is writing vague aspirations like "write clean code" or "keep it simple." Those are noise; the model already thinks it's doing that. Effective rules are specific, negative, and behavioral — they tell the agent what not to do and what to ask before doing. A practical anti-over-engineering block looks like this:
## Change discipline
- Make the SMALLEST change that satisfies the request. Do not refactor,
rename, reformat, or "improve" code you were not asked to touch.
- Do NOT create new files unless strictly required. Prefer editing an
existing file over adding a new one.
- Do NOT add abstractions (interfaces, base classes, factories, wrappers,
generic helpers) for a single use case. Inline first; abstract only when
there are 3+ real call sites.
- Do NOT add error handling, fallbacks, retries, or defensive guards unless
the input is genuinely untrusted or I explicitly ask. No try/except around
code that cannot throw.
- Do NOT add caching, memoization, or batching unless I name a performance
requirement.
- Match the surrounding code's style, naming, and patterns. Do not introduce
a new pattern.
- If a change seems to need more than ~20 lines or a new file, STOP and
propose the plan first instead of writing it.
- Comments only where the "why" is non-obvious. No narration of what the
code plainly does.A few notes on why this wording works. It's phrased as prohibitions because "don't add X" is far harder to rationalize around than "be minimal." It includes the "3+ call sites" rule as a concrete trigger, so the model has a bright line instead of a vibe. And it includes the "stop and propose" escape hatch, which converts a silent 200-line diff into a one-paragraph plan you can approve or redirect — the single most valuable line in the block.
Keep the file short and load-bearing. A CLAUDE.md bloated with fifty rules gets diluted; the model can't weight everything equally, and the rules you actually care about drown. Five to ten sharp prohibitions beat a wall of text. If you maintain shared conventions across an org, the same discipline applies to the broader instruction-file format — the AGENTS.md guide covers how to structure these files so agents actually honor them rather than skimming past.
One more guardrail worth its line: tell the agent that deleting code is a valid and preferred outcome. Models are strongly biased toward addition — they'll keep a dead branch "for safety" by default. A rule like "removing unnecessary code is as valuable as adding code; delete what the change makes obsolete" gives explicit permission to subtract, which it otherwise won't take.
Which prompting patterns produce surgical changes?
CLAUDE.md sets the standing defaults; the prompt sets the specifics. Even with a strong config, a sloppy prompt re-opens the door. The pattern that consistently produces tight diffs has four parts: a tightly bounded task, an explicit "don't," a definition of done, and a permission to ask. Compare:
Weak prompt: "Fix the bug where users sometimes don't get logged out."
Strong prompt: "In auth/session.ts, the clearSession() function doesn't remove the refresh-token cookie. Add one line to delete that cookie. Change nothing else in the file. Don't add logging, error handling, or new helpers. Done = the refresh-token cookie is cleared on logout. If you think more is needed, tell me before changing anything."
The strong version names the file, names the function, states the exact change, fences the scope ("change nothing else"), enumerates the tempting additions to forbid, defines done as a single verifiable fact, and grants permission to push back instead of guessing. That last clause matters more than it looks: without it, an agent that disagrees with your narrow scope will silently do the bigger thing anyway. With it, the disagreement surfaces as a question you can answer.
A handful of reusable phrasings carry most of the weight:
- "Make the minimal change." Blunt, but it reliably shifts the default. Pair it with a line count when you can estimate one: "this should be a 1-3 line change."
- "Don't touch anything outside
X." Scope-fences the blast radius to a file or function. The agent will still read other files for context, but won't edit them. - "No new files. No new dependencies." Kills file sprawl and the reflex to
npm installa library for something you can write in five lines. - "Show me the plan first; don't write code yet." Forces a cheap, reviewable proposal before any edits — see the plan-mode section below.
- "If you're unsure, ask. Don't assume." Converts the model's hedging instinct from "add more code to be safe" into "ask a clarifying question."
- "YAGNI." Shorthand for "you aren't gonna need it." The model knows the principle; invoking it by name is a surprisingly effective nudge against speculative generality.
Two anti-patterns to drop. First, stop saying "make it production-ready" or "make it robust" unless you mean it — those phrases are explicit licenses to add fallbacks, logging, validation, and structure. They're the exact words that summon the behavior you're trying to suppress. Second, stop accepting a sprawling diff and fixing it by asking for another change. Each follow-up is a fresh opportunity for scope creep. It's cheaper to reject the diff and re-prompt with tighter scope than to negotiate it down edit by edit.
How does plan mode keep changes scoped?
Prompting and config reduce over-engineering probabilistically. Plan mode reduces it structurally, by inserting a checkpoint between intent and edits. In plan mode, Claude Code reads your code and the task, then produces a written plan — the files it intends to touch, the changes it intends to make — and waits for your approval before writing anything. It doesn't edit your files until you say go.
This is one of the most reliable controls in the toolkit, because over-engineering is usually visible in the plan. When the plan says "create RetryHandler class, add utils/validation.ts, wrap the call in exponential backoff," you can see the gold-plating coming and cut it with one reply — "skip all of that, just add the null check" — before a single file changes. You're reviewing a paragraph instead of a diff, which is faster and catches scope creep at its cheapest possible moment.
A workflow that consistently produces surgical changes:
- Enter plan mode before describing the task. Toggle it on so the agent knows to propose rather than execute. This sets the mode for the whole exchange.
- State the task with scope fences, using the strong-prompt structure above.
- Read the plan for the tells: new files, new classes/interfaces, words like "robust," "wrapper," "handler," "fallback," "for flexibility," "to be safe." Each is a flag.
- Cut before approving. Reply with the deletions: "Remove the new file and the retry logic. Inline it. Then proceed." The agent revises the plan, not the code.
- Approve only the minimal plan, then let it execute. Because you pre-negotiated scope, the diff lands close to what you wanted.
Plan mode also fixes the worst failure mode of agentic editing: the silent 14-file change you discover only after it's done. With a plan gate, large changes can't sneak up on you — they have to be proposed, and a proposal is something you can decline. Treat any plan that wants to touch more files than you expected as guilty until proven necessary, and ask the agent to justify each one. Half the time, it can't, and the plan shrinks on the spot.
What model and effort settings reduce gold-plating?
Configuration beyond the prompt matters too, and there's a counterintuitive trap here. It's tempting to assume the most capable model with the highest reasoning effort will give you the cleanest result. For surgical edits, the opposite often holds: more reasoning budget can mean more elaboration, because a model given room to think will think its way into edge cases, alternatives, and "improvements" you didn't ask for. High effort is excellent for genuinely hard architectural problems. For "add a null check," it's the setting most likely to over-build.
Match the setting to the task:
| Task type | Suggested approach | Why |
|---|---|---|
| One-line fix, rename, small tweak | Lower effort / faster model, tight prompt, no plan needed | Less reasoning budget means less elaboration; the task is too small to benefit from deep thinking |
| Bounded feature in one file | Mid effort, plan mode on, scope-fenced prompt | Enough capability to get it right, with the plan gate catching any scope drift |
| Multi-file refactor or new subsystem | Higher effort / strongest model, plan mode mandatory, review gate after | Genuinely needs the reasoning; here the elaboration instinct is often warranted, but still review the plan |
| Debugging an unclear failure | Higher effort, but instruct "diagnose first, propose fix, don't change code yet" | Reasoning helps find the cause; the constraint stops it from rewriting half the module while it's in there |
The principle is to spend reasoning budget where the problem is hard, not where the change is small. A common, effective habit is to keep a faster, lower-effort configuration as the default for everyday edits and reach for the heavyweight model deliberately, only when a task's difficulty actually warrants it. If you're weighing which model and effort tier to keep on hand, the Claude Opus 4.8 launch guide covers the current capability and reasoning-effort tradeoffs in detail.
One more setting-level lever: context hygiene. A long, messy session where the agent has accumulated a tangle of prior decisions tends to produce more elaborate output — it's pattern-matching on its own earlier verbosity. Starting a fresh session (clearing context) for a new, unrelated task strips that momentum and often yields a notably tighter result. If a session has drifted into over-building, don't fight it inside the same thread — reset and re-prompt cleanly.
How do you build review gates that catch over-engineering?
Controls upstream of the diff reduce over-engineering; a review gate catches what slips through. The gate is simple: no diff merges without a human (or a second, adversarial pass) checking it specifically for unrequested complexity. This is the same separation-of-concerns logic that keeps human teams honest — the engineer who wrote the code is the worst person to judge whether it's too much code.
A focused review checklist for AI-generated diffs, beyond ordinary correctness:
- Did it touch files outside the task? Anything edited that wasn't part of the request is suspect. Revert it unless there's a stated reason.
- Are there new files? For each one, ask whether the code could have lived in an existing file. Usually it could.
- Are there new abstractions with one call site? An interface, base class, or generic helper used exactly once is premature. Inline it.
- Is there error handling for impossible states? Try/except around non-throwing code, guards on type-guaranteed values — strip them.
- Did it add dependencies? A new package for a five-line task is almost always the wrong trade.
- Are the comments earning their place? Delete narration; keep only the non-obvious "why."
- Is anything kept "for backward compatibility" that has no second caller? Dead-weight compatibility shims go.
You can run this gate two ways, and the strongest teams do both. The human pass is you, reading the diff with the checklist in mind before merge. The agent pass is a second, separate Claude Code invocation — a fresh session with no stake in the original code — prompted: "Review this diff only for over-engineering. Flag every unrequested abstraction, fallback, new file, and dependency. Propose the minimal version." A reviewer that didn't write the code has no sunk-cost attachment to it and tends to cut hard. The key is that authoring and reviewing happen in separate contexts; an agent asked to review its own just-written diff in the same session tends to defend it.
If you want a more rigorous loop, codify the review as a slash command or a standing prompt template so it runs identically every time, and consider adding linters or CI checks that fail the build on the mechanical tells — new top-level files in a PR that wasn't supposed to add any, a dependency-count increase, a diff that exceeds an expected size. Automated gates don't catch subtle over-abstraction, but they catch the gross cases for free. When a change genuinely goes sideways, the Claude Code troubleshooting guide covers the recovery patterns for diffs that have already landed badly.
Which control fixes which over-engineering complaint?
Every complaint from the catalog maps to a specific control. Pulling them together makes it clear that you don't need all of these at maximum — you need the one or two that target your most frequent failure. In practice, a strong CLAUDE.md block plus a plan-mode habit handles the large majority of the problem, with prompt discipline and the review gate cleaning up the rest.
| Complaint | Primary control | How it applies |
|---|---|---|
| Unrequested abstractions | CLAUDE.md "3+ call sites" rule + plan review | Standing prohibition on single-use abstraction; catch the rest in the plan |
| Defensive fallbacks for impossible states | CLAUDE.md "no error handling unless input is untrusted" | Removes the reflex; review gate strips any that slip through |
| New files nobody asked for | Prompt "no new files" + review checklist | Per-task fence backed by a merge-time check |
| Scope drift mid-task | Prompt "don't touch anything outside X" + plan mode | Scope fence plus a plan that exposes any extra files before edits land |
| Premature optimization | CLAUDE.md "no caching/batching unless I name a perf requirement" | Default-off for performance complexity |
| Comment inflation | CLAUDE.md "comments only for non-obvious why" | Cuts narration at the source |
| Backward-compat theater | Review gate + "deleting is valid" rule | Explicit permission to subtract plus a merge-time check for dead shims |
| Silent large diffs | Plan mode | Nothing large lands without an approved proposal |
The pattern across the table is layering. No single control is airtight — a prompt can be forgotten, a CLAUDE.md rule can be under-weighted in a long session, a plan can be approved too quickly. But the controls are cheap and they stack: a rule the agent skims past gets caught in the plan; a plan you approve too fast gets caught at review. Defense in depth is what turns "Claude Code over-engineers" from a recurring frustration into an occasional, easily-corrected blip.
Does this generalize to other AI coding agents?
Mostly, yes — the over-engineering tendency shows up across today's large coding agents, not just Claude, so the same complaints surface with competing tools. The controls transfer with minor renaming: the instruction-file mechanism, the plan-before-edit checkpoint, scope-fenced prompts, effort settings, and a review gate all have analogs across the major tools. What differs is the exact configuration surface and how aggressively each agent defaults to elaboration. If you work across more than one, it's worth knowing those differences; this comparison of Claude Code, Cursor, Codex and Gemini CLI and the Claude Code vs Codex breakdown both get into where each one tends to over-build and where it stays lean.
One caveat worth stating plainly: the goal is not to make the agent never add structure. Sometimes the abstraction, the fallback, or the new file is genuinely the right call, and an agent that's been clamped too hard will under-build — leaving you with brittle, unguarded code that breaks on the first real edge case. The controls here aren't about minimizing code as an end in itself; they're about making the agent's additions deliberate and requested rather than reflexive. When you do want the robust version, ask for it explicitly. The win is that "more code" becomes a decision you make, not a default you have to keep undoing.
The anti-over-engineering checklist
A compact reference you can keep next to your editor. Set up the first two once; run the rest per task.
- Once, in CLAUDE.md: add the change-discipline block — smallest change, no new files, no single-use abstractions, no unrequested error handling, no premature optimization, match surrounding style, stop-and-propose past ~20 lines, deleting is valid.
- Once, in settings: default to a faster/lower-effort configuration for everyday edits; reserve the heavyweight model for genuinely hard work.
- Per task — prompt: name the file and function, state the exact change, fence the scope ("nothing else"), forbid the tempting additions by name, define done as one verifiable fact, grant permission to ask.
- Per task — plan: for anything non-trivial, plan mode on; read the plan for new files, new classes, "robust/wrapper/fallback/handler"; cut before approving.
- Per task — context: start a fresh session for an unrelated task; if a session has drifted into over-building, reset rather than negotiate.
- Per task — review: run the diff through the over-engineering checklist (out-of-scope files, new files, single-use abstractions, impossible-state guards, new deps, comment inflation, compat shims) — human pass, agent pass in a separate session, or both.
If your team is leaning on AI agents heavily enough that over-engineering is eating real review time, it can also be worth pairing them with engineers who already work this way by instinct — people who scope tightly, review hard, and treat "less code" as a feature. That's the bar teams adopting AI coding agents are starting to set, and it's the same judgment Codersera's vetted remote developers bring to a codebase — the human discipline that keeps an agent's output surgical instead of sprawling. The tooling controls get you most of the way; experienced reviewers close the gap.
FAQ
Why does Claude Code add so much code I didn't ask for?
Because its defaults lean toward thoroughness and it can't see your judgment about what's unnecessary. Assistant models are generally tuned on feedback that prefers "complete, robust" answers, so more structure and more error handling read as more helpful — which is backwards for surgical edits. Absent an explicit constraint, it fills ambiguity by doing more. You correct it by changing the defaults in CLAUDE.md, scoping each prompt tightly, and reviewing for unrequested complexity before merge.
What's the single most effective control against over-engineering?
Plan mode. Forcing the agent to propose its changes — files, abstractions, additions — before writing any code lets you see and cut the gold-plating in a one-paragraph plan instead of a sprawling diff. It catches scope creep at its cheapest moment. CLAUDE.md rules are the highest-leverage standing control, but plan mode is the most reliable per-task one because it makes large changes far harder to land silently.
Will strict CLAUDE.md rules make Claude Code worse at hard problems?
If you clamp too hard, yes — over-constrained agents can under-build, leaving brittle code that breaks on real edge cases. The fix isn't milder rules; it's asking for the robust version explicitly when you actually want it. The standing rules suppress reflexive elaboration on small tasks. For genuinely hard work, raise the effort setting and tell the agent the complexity is warranted. The goal is deliberate additions, not minimal code as a dogma.
Should I let Claude Code review its own diffs for over-engineering?
Not in the same session — an agent asked to review code it just wrote tends to defend it, the same sunk-cost bias humans have. Run the review in a separate, fresh invocation with no stake in the original, prompted specifically to flag unrequested abstractions, fallbacks, new files, and dependencies. A reviewer with no attachment cuts hard. Authoring and reviewing should always happen in separate contexts.
How do I stop Claude Code from creating new files?
Two layers. In CLAUDE.md, add a standing rule: "Do not create new files unless strictly required; prefer editing an existing file." Per task, add "no new files" to the prompt. Then enforce it at review — for every new file in the diff, ask whether the code could have lived in an existing module. It almost always could, and file sprawl is one of the easiest over-engineering tells to catch mechanically.
Does over-engineering happen with other AI coding agents too?
Yes. It shows up across current coding agents, not just Claude, so Cursor, Codex, Gemini CLI and others show the same tendency to varying degrees. The controls transfer: instruction files, plan-before-edit checkpoints, scope-fenced prompts, effort settings, and review gates all have equivalents. What varies is the configuration surface and how aggressively each defaults to elaboration.