How to Secure MCP Servers: Auth, Prompt Injection & Defenses

MCP's real attack surface in 2026 - prompt injection, tool poisoning, token passthrough, confused deputy, SSRF - and how to harden a server with OAuth 2.1, scoping, input validation, and human-in-the-loop.

To secure an MCP server, turn on OAuth 2.1 authorization with mandatory PKCE, validate the token audience (RFC 8707 / 9068) so you only accept tokens minted for you, never pass a client token through to upstream APIs, allow-list and validate every tool input, block SSRF egress to private IP ranges, and require human confirmation for any sensitive or irreversible action.

In March 2025 the security firm Equixly scanned a batch of popular Model Context Protocol (MCP) servers and the results read like a worst-case audit: 43% had command-injection flaws, 22% allowed path traversal or arbitrary file reads, and 30% were exploitable via server-side request forgery (SSRF) — with no authentication by default. Worse than the bugs was the reaction: when Equixly disclosed the findings, 45% of the notified vendors dismissed the risks as theoretical or acceptable.

A year of hardening later, the numbers have barely moved. A 2026 audit summarized by the API gateway vendor Zuplo found that 40% of MCP servers still require no authentication, 43% still carry command-injection vulnerabilities, and 79% handle credentials in plaintext. MCP has become the default plumbing between large language models and the real world — it is what lets AI coding agents read your files, hit your APIs, and run your shell — yet most servers were written for the demo, not for production.

The spec has moved fast to catch up. The 2025-06-18 revision introduced a full OAuth 2.1-based authorization rework, and the current protocol revision is 2025-11-25. There is now an official Security Best Practices document that names the attacks and prescribes the mitigations. This guide walks through that real attack surface — prompt injection, tool poisoning, token passthrough, the confused deputy, SSRF, over-broad scopes — and exactly how to harden a server against each one.

Why are MCP servers such an easy target?

MCP inverts the trust model most developers carry in their heads. A normal API has one client you wrote, one server you wrote, and a token boundary you control. An MCP server sits between an autonomous model and a set of tools, and the model is steered by text that can come from anywhere — a web page it just fetched, a GitHub issue someone else filed, the description of a tool published by a third party. The thing deciding which tool to call is, by design, manipulable by untrusted input.

On top of that, three structural facts make exploitation cheap:

  • Authorization is optional. The spec says HTTP-based transports SHOULD conform to the auth spec, while stdio transports SHOULD NOT use it and should pull credentials from the environment instead. In practice "optional" gets read as "skip it," which is how you end up with 40% of remote servers wide open.
  • Tool descriptions are treated as trusted. A tool's description and parameter schema get injected into the model's context with near-instruction-level authority, before any human reviews what the model is about to do.
  • Anyone can call the server, not just the LLM. An MCP endpoint is still an HTTP or process boundary. If a tool handler shells out with unsanitized arguments, that is remote code execution reachable by anyone who can reach the port — the model is just one possible caller.

A CyberArk write-up summarizing MCP risk for developers puts it bluntly: the headline problems are malicious tool registration, hidden prompt injection, and weak authentication. None of this is exotic. These are the same OWASP-flavored bugs you already know, plus a language-model-shaped layer of prompt injection on top.

What is the real MCP attack surface in 2026?

It helps to separate the attacks into two families. The first is classic application security: command injection, path traversal, SSRF, missing auth — bugs in the server code itself. The second is model-layer manipulation: tool poisoning, rug pulls, tool shadowing, and the broader prompt-injection problem. A secure MCP server has to defend both, because in MCP they chain: a prompt-injection payload in untrusted content can trigger a tool call that then exploits a command-injection bug in a handler.

| Attack | What happens | Primary mitigation |
|---|---|---|
| Missing / weak auth | Server accepts any caller; 40% require no auth at all | OAuth 2.1 + audience-validated bearer tokens |
| Command injection | Handler shells out with unsanitized args, giving RCE (43% of scanned servers) | No shell-out; parameterized exec; strict input validation |
| Path traversal | Arbitrary file read via crafted paths (22% of scanned servers) | Canonicalize + allow-list paths; sandbox the filesystem |
| SSRF | Server fetches attacker-chosen URLs, incl. cloud metadata at 169.254.169.254 (30%) | Block private ranges; HTTPS-only; egress proxy |
| Token passthrough | Client token forwarded upstream, breaking the audience boundary | Explicitly forbidden; mint a separate upstream token |
| Confused deputy | Proxy with static client ID is tricked into leaking an auth code | Per-client consent; exact redirect_uri match; single-use state |
| Tool poisoning | Hidden instructions in a tool description exfiltrate secrets | Inspect full descriptions; pin + re-confirm on change |
| Rug pull | Server mutates a tool's description after you approved it | Hash descriptions; re-prompt on any change |
| Tool shadowing | Malicious server overrides behavior toward a trusted server | Isolate servers; don't co-mingle trust domains |

Two of these deserve to be singled out because they are uniquely MCP-flavored and because they were demonstrated live, against tools developers actually use.

Tool poisoning. On April 1, 2025, Invariant Labs showed that hidden instructions placed inside a tool's description — text the model reads in full, but which the client UI truncates or hides — can make an agent read ~/.ssh/id_rsa and ~/.cursor/mcp.json and exfiltrate them through an innocuous-looking tool parameter. They demonstrated it live against Cursor. The user sees a tool called something harmless; the model sees a paragraph of attacker instructions.

Rug pulls and tool shadowing. The same research showed two follow-ons. A server can present a clean tool, wait for you to approve it, then silently swap in a malicious description afterward — a "rug pull." And a malicious server's description can override the agent's behavior toward a separate, trusted server connected in the same session, hijacking its authority (cross-server shadowing). Approval at install time is not approval forever.

What is the lethal trifecta, and why does MCP make it worse?

Simon Willison's framing from June 16, 2025 — the "lethal trifecta" — is the single most useful mental model for reasoning about MCP risk. The claim is simple: you get data exfiltration when an agent simultaneously has all three of:

  1. Access to private data (your files, your database, your tokens).
  2. Exposure to untrusted content (a web page, an email, a tool description, an issue someone else wrote).
  3. A way to communicate externally (an HTTP tool, an email send, a webhook).

Any one or two of these is fine. All three together is an exfiltration primitive, because the untrusted content can instruct the model to read the private data and ship it out. MCP is dangerous precisely because it encourages assembling tools that combine all three in one agent — a filesystem server, a web-fetch server, and an outbound-API server, all live in the same session.

Willison's other point is the part vendors hate: a guardrail product that "catches 95% of prompt-injection attacks" is a failing grade, not a win. Security boundaries that hold 95% of the time are boundaries an attacker simply probes until the 5% case lands. This is why the durable defenses below are architectural — break the trifecta, validate the audience, require human confirmation — rather than "run the payload through a classifier and hope."

The exploits bear this out. Willison points to a string of real systems against which researchers have demonstrated the attack, among them Microsoft 365 Copilot, GitLab's Duo chatbot, and GitHub's official MCP server, where an attacker-filed public issue could steer the agent into reading a private repository and exfiltrating its contents through a pull request. In each case the trifecta was assembled and untrusted content closed the loop.

How does MCP authorization actually work?

MCP's authorization model, first added in the 2025-03-26 revision and reworked in 2025-06-18, is not a bespoke scheme: it is standard OAuth, which is good news because the security properties are well understood. The MCP server acts as an OAuth 2.1 resource server. The stack is:

  • OAuth 2.1 (draft-ietf-oauth-v2-1-13) as the base framework.
  • RFC 9728 Protected Resource Metadata — the server MUST implement this and return a WWW-Authenticate header on a 401 so the client can discover which authorization server to use.
  • RFC 8414 Authorization Server Metadata — how the client discovers the AS endpoints.
  • RFC 7591 Dynamic Client Registration — optional client onboarding without manual setup.
  • RFC 8707 Resource Indicators — the audience-binding mechanism that prevents token reuse across services.

A few requirements are stated as hard MUSTs and are where most of the security actually lives:

  • PKCE is mandatory. MCP clients MUST implement PKCE (OAuth 2.1 Section 7.5.2). All authorization-server endpoints MUST be HTTPS, and redirect URIs MUST be either localhost or HTTPS, with exact-match validation — no wildcard or prefix matching.
  • The resource parameter is required on both legs. Clients MUST send the RFC 8707 resource parameter — the canonical MCP server URI — in both the authorization request and the token request, regardless of whether the AS advertises support for it. In a real request it looks like &resource=https%3A%2F%2Fmcp.example.com, URL-encoded.
  • The server must validate the audience. The server MUST only accept tokens specifically issued for it (the audience claim, per RFC 9068) and MUST reject tokens issued for any other resource. Symmetrically, clients MUST NOT send the server a token that was not issued by that server's own authorization server.
  • The token goes in the header, never the URL. Every HTTP request carries Authorization: Bearer <access-token>, and tokens MUST NOT appear in the URI query string (where they leak into logs, referers, and history).
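On the client side, the PKCE and resource-indicator requirements above come down to a few lines. The sketch below uses only the Python standard library; the endpoint URLs, client ID, and scope name are hypothetical placeholders, and a real client would discover the endpoints through RFC 9728 and RFC 8414 metadata rather than hard-coding them.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode


def s256_challenge(verifier: str) -> str:
    """PKCE S256: base64url(SHA-256(verifier)) with '=' padding stripped (RFC 7636)."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair() -> tuple[str, str]:
    # 32 random bytes encode to a 43-character verifier, the minimum RFC 7636 allows.
    verifier = secrets.token_urlsafe(32)
    return verifier, s256_challenge(verifier)


def authorization_url(auth_endpoint: str, client_id: str, redirect_uri: str,
                      resource: str, challenge: str, state: str, scope: str) -> str:
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,  # must exactly match a registered URI
        "scope": scope,
        "state": state,
        "code_challenge": challenge,
        "code_challenge_method": "S256",
        "resource": resource,  # RFC 8707: the canonical MCP server URI
    }
    return f"{auth_endpoint}?{urlencode(params)}"


def token_request_body(code: str, verifier: str, client_id: str,
                       redirect_uri: str, resource: str) -> str:
    # The resource parameter is sent again on the token leg, as the spec requires.
    return urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "code_verifier": verifier,
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "resource": resource,
    })
```

Because `urlencode` percent-encodes `:` and `/`, a resource of `https://mcp.example.com` appears in both requests as `resource=https%3A%2F%2Fmcp.example.com`, the form quoted above.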

One important caveat: this whole layer applies to HTTP transports. For stdio servers — the kind a coding agent launches as a local subprocess — the spec says you SHOULD NOT use the OAuth flow and instead retrieve credentials from the environment. That is fine for a process the user controls on their own machine, but it is exactly why credentials handled in plaintext (79% of servers) are both expected and risky: a local secret pulled from the environment is only as safe as the process that can read it.

How do you harden an MCP server? An eight-step checklist

Here is the order of operations for taking a working-but-insecure MCP server to something you would expose. Each step maps to a specific class of attack from the table above.

1. Turn authorization on (for remote servers)

If your server is reachable over HTTP, implement the OAuth 2.1 resource-server role: RFC 9728 metadata, the WWW-Authenticate 401 challenge, PKCE, HTTPS-only endpoints, and exact-match redirect URIs. "Optional in the spec" does not mean "optional for production." Treat an unauthenticated remote MCP server the same way you would treat an unauthenticated database on the public internet.
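Here is a minimal, framework-free sketch of the two HTTP-facing pieces of step 1: the RFC 9728 metadata document and the 401 challenge that points clients to it. The URLs and scope names are hypothetical, and the function assumes your web framework has already lowercased header names.

```python
import json

# Hypothetical deployment values; replace with your own canonical URIs.
RESOURCE = "https://mcp.example.com"
AUTH_SERVER = "https://auth.example.com"
METADATA_URL = RESOURCE + "/.well-known/oauth-protected-resource"


def protected_resource_metadata() -> str:
    """JSON body to serve at METADATA_URL (RFC 9728 Protected Resource Metadata)."""
    return json.dumps({
        "resource": RESOURCE,
        "authorization_servers": [AUTH_SERVER],
        "scopes_supported": ["files:read", "files:write"],
        "bearer_methods_supported": ["header"],  # header only, never the query string
    })


def authenticate(headers: dict[str, str], query: dict[str, str]):
    """Return (token, None) on success, or (None, (status, headers, body)) to send back.

    Assumes header names were lowercased by the web framework.
    """
    if "access_token" in query:
        # Tokens MUST NOT travel in the URI, where they leak into logs and history.
        return None, (400, {}, "access tokens are not accepted in the query string")
    scheme, _, token = headers.get("authorization", "").partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        # The challenge tells the client where to discover the authorization server.
        challenge = f'Bearer resource_metadata="{METADATA_URL}"'
        return None, (401, {"WWW-Authenticate": challenge}, "")
    return token, None
```

A token returned here is still unverified; it has to pass signature, issuer, expiry, and audience checks before any tool runs.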

2. Validate the token audience on every request

Reject any token whose audience claim is not your canonical server URI. This single check defeats the token-reuse and passthrough classes outright: a token minted for some other service is useless against you. (The confused-deputy attack needs its own mitigations, covered below.) Pair it with the client-side rule that a client only sends your server tokens issued by your server's own authorization server.
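The audience check itself is small once the token's signature has been verified, whether by your JWT library or by token introspection for opaque tokens; that step is assumed and not shown. The sketch assumes a JWT access token in the RFC 9068 shape, where `aud` may be a string or a list; the issuer and audience values are hypothetical.

```python
import time
from typing import Any, Optional

EXPECTED_AUDIENCE = "https://mcp.example.com"  # hypothetical canonical server URI
EXPECTED_ISSUER = "https://auth.example.com"   # hypothetical authorization server


class TokenRejected(Exception):
    """Raised for any token this server must not accept; map it to HTTP 401."""


def check_claims(claims: dict[str, Any], now: Optional[float] = None) -> dict[str, Any]:
    """Validate the claims of an access token whose signature was already verified."""
    now = time.time() if now is None else now
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else aud if isinstance(aud, list) else []
    if EXPECTED_AUDIENCE not in audiences:
        # A token minted for any other resource is useless here, by design.
        raise TokenRejected(f"audience {audiences!r} does not include this server")
    if claims.get("iss") != EXPECTED_ISSUER:
        raise TokenRejected("token was not issued by this server's authorization server")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        raise TokenRejected("token is expired or has no exp claim")
    return claims
```

Rejecting on a missing `aud` (rather than treating it as a wildcard) is the important design choice: a token that does not name this server is not for this server.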

3. Never pass tokens through

If your tool handler calls an upstream API, do not forward the caller's token to it. Token passthrough is an explicitly forbidden anti-pattern in the spec: an MCP server MUST NOT accept tokens that were not issued to it, and when it needs to act against an upstream service it MUST obtain a separate token scoped to that service. Passing the token through silently destroys the audience boundary and turns your server into a relay for stolen credentials.
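One way to make passthrough structurally impossible is to give the code that calls upstream APIs no access to the inbound token at all. In this sketch, `fetch` is a hypothetical callable that obtains a token from the upstream service's own authorization server, using whatever grant that service supports. Tokens are cached per user (`subject`, for example the validated token's `sub` claim), upstream, and scope, so one user's upstream token is never reused for another.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical: (subject, upstream, scope) -> (access_token, expires_in_seconds),
# obtained from the UPSTREAM service's authorization server, never from the MCP client.
Fetcher = Callable[[str, str, str], tuple[str, int]]


@dataclass
class UpstreamTokens:
    fetch: Fetcher
    _cache: dict = field(default_factory=dict)

    def get(self, subject: str, upstream: str, scope: str) -> str:
        key = (subject, upstream, scope)
        token, expires_at = self._cache.get(key, ("", 0.0))
        if time.time() >= expires_at - 30:  # refresh 30 seconds before expiry
            token, ttl = self.fetch(subject, upstream, scope)
            self._cache[key] = (token, time.time() + ttl)
        return token


def upstream_headers(tokens: UpstreamTokens, subject: str,
                     upstream: str, scope: str) -> dict[str, str]:
    # No parameter accepts the inbound token, so it cannot be forwarded by accident.
    return {"Authorization": f"Bearer {tokens.get(subject, upstream, scope)}"}
```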

4. Apply least-privilege scopes

Request and grant the narrowest scopes that make the tool work. An MCP server that can read one repository should not hold a token that can write to every repository in the org. Over-broad scopes are what turn a single poisoned tool call into a breach instead of a nuisance. Scope per tool where the protocol and your AS allow it.
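Per-tool scoping can be enforced with a deny-by-default map from tool name to required scopes, checked against the validated token's space-delimited `scope` claim (the format RFC 9068 uses). The tool names and scope strings below are hypothetical.

```python
# Hypothetical tool -> required scopes map; keep every set as small as the tool allows.
TOOL_SCOPES: dict[str, frozenset[str]] = {
    "read_file": frozenset({"files:read"}),
    "write_file": frozenset({"files:write"}),
    "list_issues": frozenset({"issues:read"}),
}


def authorize_tool_call(tool: str, granted_scope: str) -> None:
    """Raise PermissionError unless the token's scopes cover this tool.

    granted_scope is the space-delimited `scope` claim of an already-validated token.
    Over HTTP, a failure maps naturally to 403 with error="insufficient_scope".
    """
    required = TOOL_SCOPES.get(tool)
    if required is None:
        raise PermissionError(f"unknown tool {tool!r}")  # deny by default
    missing = required - set(granted_scope.split())
    if missing:
        raise PermissionError(f"{tool} needs scope(s) {sorted(missing)}")
```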

5. Validate inputs and allow-list aggressively

This is where the 43% command-injection and 22% path-traversal numbers come from, and it is the cheapest win. Never build a shell command by string-concatenating tool arguments — use parameterized execution or, better, avoid shelling out entirely. Canonicalize and allow-list file paths against a sandbox root before touching the filesystem. Constrain every parameter with a strict schema (types, lengths, enums, regex) and reject anything that does not match. Remember the caller may not be the model.
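The three habits from step 5 look like this in Python: canonicalize paths and confine them to a sandbox root, validate arguments against a strict allow-list, and run subprocesses from an argument vector so no shell ever parses user input. The sandbox path, the `git log` tool, and the branch-name pattern are hypothetical examples; `Path.is_relative_to` needs Python 3.9 or later.

```python
import re
import subprocess
from pathlib import Path

SANDBOX_ROOT = Path("/srv/mcp-sandbox")  # hypothetical sandbox directory
# Allow-list for a branch argument: must start alphanumeric (blocks "--option"
# injection) and may contain only letters, digits, '.', '_', '/', '-'.
BRANCH_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._/-]{0,99}")


def resolve_in_sandbox(user_path: str, root: Path = SANDBOX_ROOT) -> Path:
    """Canonicalize user_path; reject '../' escapes, absolute paths, and symlinks out of root."""
    root = root.resolve()
    candidate = (root / user_path).resolve()  # resolve() follows symlinks and '..'
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes sandbox: {user_path!r}")
    return candidate


def git_log_argv(branch: str) -> list[str]:
    if not BRANCH_RE.fullmatch(branch):
        raise ValueError(f"invalid branch name: {branch!r}")
    # The trailing "--" tells git that `branch` is a revision, not a path.
    return ["git", "log", "--oneline", "-n", "20", branch, "--"]


def git_log(branch: str, repo: str) -> str:
    # An argv list with the default shell=False: no shell parses ';', '$()', or '|'.
    result = subprocess.run(git_log_argv(branch), cwd=resolve_in_sandbox(repo),
                            capture_output=True, text=True, timeout=10, check=True)
    return result.stdout
```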

6. Lock down egress to stop SSRF

During OAuth discovery — and in any tool that fetches a URL — a malicious server or input can point a request at internal or cloud-metadata endpoints, classically 169.254.169.254 to steal cloud credentials. The mitigation is concrete: block requests to private and link-local ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16, loopback, fc00::/7, fe80::/10), enforce HTTPS, and route outbound traffic through an egress proxy you control so the allow-list is enforced in one place rather than in every handler.
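A sketch of the per-request check, useful inside handlers or as the core of the egress proxy itself. It resolves the hostname, rejects the whole request if any resolved address falls in a blocked range, and unwraps IPv4-mapped IPv6 addresses so `::ffff:169.254.169.254` cannot slip past. The resolver is injectable so it can be tested without DNS; `default_resolver` is a thin wrapper over `socket.getaddrinfo`.

```python
import ipaddress
import socket
from typing import Callable, Iterable
from urllib.parse import urlsplit

# The ranges listed above, with "loopback" spelled out for IPv4 and IPv6.
# Extend as needed (for example 0.0.0.0/8 or 100.64.0.0/10) for your network.
BLOCKED_NETWORKS = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "169.254.0.0/16",
    "127.0.0.0/8", "::1/128", "fc00::/7", "fe80::/10",
)]


def is_blocked(ip_text: str) -> bool:
    ip = ipaddress.ip_address(ip_text.split("%")[0])  # drop any IPv6 zone suffix
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped  # ::ffff:169.254.169.254 still reaches the metadata service
    return any(ip in net for net in BLOCKED_NETWORKS)


def default_resolver(host: str) -> list[str]:
    return [info[4][0] for info in socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)]


def check_outbound_url(url: str,
                       resolve: Callable[[str], Iterable[str]] = default_resolver) -> list[str]:
    """Return the vetted IP addresses for url, or raise ValueError.

    Connect to one of the returned IPs (or enforce the same rule in an egress proxy):
    resolving the hostname again at connect time reopens the door to DNS rebinding.
    """
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("only https:// URLs are allowed")
    if not parts.hostname:
        raise ValueError("URL has no host")
    ips = list(resolve(parts.hostname))
    if not ips:
        raise ValueError(f"{parts.hostname} did not resolve")
    blocked = [ip for ip in ips if is_blocked(ip)]
    if blocked:
        raise ValueError(f"{parts.hostname} resolves to blocked address(es): {blocked}")
    return ips
```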

7. Inspect and pin tool descriptions

Before connecting a third-party server, read the full tool descriptions and parameter schemas — not the truncated version the UI shows. Hash them at approval time and re-prompt the user if a description changes, which neutralizes rug pulls. Keep servers from different trust domains isolated rather than co-mingled in one session, which limits cross-server shadowing. Treat a tool description as untrusted input, because it is.
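Pinning can be as simple as hashing a canonical JSON serialization of each tool definition at approval time and comparing on every later `tools/list` response. This sketch assumes the tool objects use the MCP field names (`name`, `description`, `inputSchema`, and so on); where the pins are stored, and what the re-approval prompt looks like, is left to the client.

```python
import hashlib
import json
from typing import Any

# Fields of an MCP tool definition that reach the model or change how it is used.
PINNED_FIELDS = ("name", "title", "description", "inputSchema", "outputSchema", "annotations")


def tool_fingerprint(tool: dict[str, Any]) -> str:
    """SHA-256 of a canonical JSON encoding, so key order and whitespace do not matter."""
    canonical = json.dumps({k: tool.get(k) for k in PINNED_FIELDS},
                           sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def pin_tools(tools: list[dict[str, Any]]) -> dict[str, str]:
    """Call once, after the user has read the full definitions and approved them."""
    return {t["name"]: tool_fingerprint(t) for t in tools}


def tools_needing_review(pinned: dict[str, str], tools: list[dict[str, Any]]) -> list[str]:
    """Names of tools that are new or whose definition changed since approval."""
    return [t["name"] for t in tools if pinned.get(t["name"]) != tool_fingerprint(t)]
```

Any name this returns should be blocked until the user re-reads and re-approves the full definition; that is what turns a rug pull into a visible prompt.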

8. Require human-in-the-loop confirmation for sensitive actions

For anything irreversible or sensitive — sending data externally, deleting, paying, writing to production — require an explicit human confirmation that shows the actual resolved action, not a label. This is the backstop that holds even when prompt injection succeeds, because it breaks the automation between "model decided" and "world changed." It is also the one defense that does not depend on perfectly classifying malicious text.
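The gate belongs in code the model cannot influence, wrapped around tool execution. In this sketch `confirm` is a hypothetical callback into your client UI and `execute` is whatever actually runs the tool; the set of sensitive tool names is illustrative. A stricter variant inverts the set into an allow-list of read-only tools and confirms everything else.

```python
from typing import Any, Callable

# Hypothetical tools with external or irreversible effects.
SENSITIVE_TOOLS = {"send_email", "delete_file", "http_post", "deploy"}

Executor = Callable[[str, dict[str, Any]], dict[str, Any]]


def describe_action(tool: str, args: dict[str, Any]) -> str:
    # Show every resolved argument verbatim, not a summary the model wrote.
    return "\n".join([f"Tool: {tool}"] + [f"  {k} = {v!r}" for k, v in sorted(args.items())])


def call_with_confirmation(tool: str, args: dict[str, Any], execute: Executor,
                           confirm: Callable[[str], bool]) -> dict[str, Any]:
    if tool in SENSITIVE_TOOLS and not confirm(describe_action(tool, args)):
        # Result shaped like an MCP tool error so the model learns the call was declined.
        return {"isError": True, "content": [{"type": "text", "text": "Declined by the user."}]}
    return execute(tool, args)
```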

How do you defend against prompt injection and tool poisoning?

Prompt injection is the defense problem with no clean solution, so the honest answer is: you reduce the blast radius rather than promising to block the payload. The techniques that actually help, in rough order of leverage:

  • Break the lethal trifecta by design. Don't give one agent private-data access, untrusted-content exposure, and an external-communication tool at the same time. Split capabilities across separate agents or sessions so no single context can read secrets and ship them out. This is architectural and it holds regardless of how clever the injection is.
  • Treat tool descriptions and tool outputs as untrusted. A description is attacker-controllable; so is the body of a web page your fetch tool returns. Don't let either silently expand the agent's authority. Inspecting full descriptions before approval catches the original tool-poisoning vector.
  • Pin and re-confirm. Hash tool descriptions at approval and re-prompt on change to defeat rug pulls; isolate trust domains to defeat shadowing.
  • Confirm sensitive actions with a human. The same human-in-the-loop gate from step 8 is your last line against a successful injection — it converts "the model was tricked" into "the model proposed something the human declined."
  • Don't trust the 95% guardrail. Input/output classifiers are a useful extra layer, never the primary boundary. A filter that stops 95% of attacks is a filter an attacker iterates against until they find the 5%.
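Breaking the trifecta can be checked when a session is assembled, provided each tool is labeled with the capabilities it contributes. The labels below are hypothetical and assigned by hand, which is the weak point of the approach: a URL-fetching tool must count as external communication, because the URL it requests can itself carry stolen data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCaps:
    name: str
    private_data: bool = False     # can read files, databases, tokens
    untrusted_input: bool = False  # returns content an outsider can influence
    external_comm: bool = False    # can send data off the machine (incl. via a URL)


def assert_no_trifecta(tools: list[ToolCaps]) -> None:
    """Refuse a session whose combined tools supply all three legs of the trifecta."""
    legs = {
        "private data": [t.name for t in tools if t.private_data],
        "untrusted content": [t.name for t in tools if t.untrusted_input],
        "external communication": [t.name for t in tools if t.external_comm],
    }
    if all(legs.values()):
        detail = "; ".join(f"{leg}: {', '.join(names)}" for leg, names in legs.items())
        raise RuntimeError(f"lethal trifecta in one session ({detail})")
```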

If you are choosing which servers to even connect, prefer audited, narrowly-scoped ones. Our roundups of the best MCP servers for Claude Code and Cursor and the best free open-source MCP servers flag which projects are actively maintained — maintenance status is itself a security signal, since an abandoned server is one nobody is patching.

How do you stop token passthrough and confused-deputy attacks?

These two are the OAuth-specific traps, and they are subtle enough to catch careful engineers.

Token passthrough happens when a server, asked to call an upstream service on the user's behalf, just forwards whatever token the client presented. It feels efficient. It is forbidden for a reason: the upstream service has no way to know the token is being relayed by an intermediary it never authenticated, the audience claim no longer means anything, and a compromised MCP server becomes a credential laundromat. The fix is mechanical — your server obtains its own, separately-scoped token for each upstream, and refuses any inbound token whose audience is not itself.

The confused deputy is the nastier one. The spec describes the canonical setup: an MCP proxy that uses a static client ID, supports dynamic client registration, and relies on a third-party consent cookie. If a user already consented once, the authorization server may skip the consent screen on the next request with that static client ID — so an attacker can craft a malicious authorization request, the AS silently redirects with a valid auth code, and the attacker captures it. The mitigations are specific and all three matter:

  • Store and check per-client consent server-side — don't let a shared static client ID inherit another user's consent.
  • Enforce exact redirect_uri matching, so a code can never be sent to an attacker-chosen callback.
  • Use a cryptographically-random, single-use state parameter to bind the request and detect replay.
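The three mitigations can live together in a small guard on the proxy. This is an in-memory sketch with hypothetical class and method names; a real deployment would persist state in a shared store. The assumed flow: call `begin` when an MCP client starts authorization, show your own consent screen whenever `needs_consent` is true before redirecting to the third-party authorization server, and call `complete` on the callback.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass
class PendingAuth:
    client_id: str
    redirect_uri: str
    user_id: str
    created: float


class ProxyAuthGuard:
    """Per-client consent, exact redirect_uri matching, and single-use state for an MCP proxy."""

    def __init__(self, registered: dict[str, set[str]], ttl_seconds: float = 600.0):
        self.registered = registered                 # client_id -> exact redirect URIs
        self.consents: set[tuple[str, str]] = set()  # approved (user_id, client_id) pairs
        self.pending: dict[str, PendingAuth] = {}
        self.ttl = ttl_seconds

    def begin(self, user_id: str, client_id: str, redirect_uri: str) -> str:
        # Exact string comparison: no prefix, wildcard, or "same host" matching.
        if redirect_uri not in self.registered.get(client_id, set()):
            raise ValueError("redirect_uri does not exactly match a registered URI")
        state = secrets.token_urlsafe(32)
        self.pending[state] = PendingAuth(client_id, redirect_uri, user_id, time.time())
        return state

    def needs_consent(self, user_id: str, client_id: str) -> bool:
        # Consent belongs to this (user, MCP client) pair, never to the proxy's
        # static client ID at the third-party authorization server.
        return (user_id, client_id) not in self.consents

    def record_consent(self, user_id: str, client_id: str) -> None:
        self.consents.add((user_id, client_id))

    def complete(self, state: str, user_id: str) -> PendingAuth:
        pending = self.pending.pop(state, None)  # pop makes every state single-use
        if (pending is None or pending.user_id != user_id
                or time.time() - pending.created > self.ttl):
            raise ValueError("unknown, replayed, expired, or mismatched state")
        return pending
```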

If you are building a server in a typed ecosystem and want the auth wiring spelled out concretely, the patterns transfer cleanly to .NET — our walkthrough on creating an MCP server with .NET is a good place to see where the resource-server checks slot into a real handler pipeline.

A practical hardening checklist

Use this as a pre-deployment gate. If you cannot check every box, the server is not ready to face untrusted input.

| Control | Requirement | Defends against |
|---|---|---|
| OAuth 2.1 + PKCE | SHOULD for HTTP transports; PKCE MUST for clients; HTTPS-only | Missing auth, code interception |
| Audience validation | MUST reject tokens not issued for this server (RFC 9068) | Token reuse / passthrough |
| Resource indicator | MUST send resource param on auth + token requests (RFC 8707) | Token misbinding |
| No token passthrough | MUST mint a separate upstream token | Credential relay / laundering |
| Bearer in header only | Token MUST NOT appear in URI query | Token leakage via logs |
| Least-privilege scopes | Narrowest scope per tool | Blast-radius limitation |
| Input validation | No shell-out; schema + allow-list every param | Command injection, path traversal |
| Egress controls | Block private ranges; HTTPS-only; egress proxy | SSRF / cloud-metadata theft |
| Pin tool descriptions | Hash + re-confirm on change; isolate trust domains | Tool poisoning, rug pull, shadowing |
| Human-in-the-loop | Confirm resolved sensitive/irreversible actions | Successful prompt injection |

Notice how few of these are about clever detection and how many are about boundaries — audience claims, scopes, egress allow-lists, human gates. That is the right ratio. The MCP exploits researchers have demonstrated against production systems were not stopped by smarter classifiers; they would have been stopped by an audience check, a scope limit, or a confirmation prompt.

FAQ

Is authentication required for an MCP server?

Not by the letter of the spec. Authorization is optional: HTTP-based transports SHOULD conform to the OAuth 2.1 auth spec, while stdio transports SHOULD NOT use it and instead read credentials from the environment. In practice this "optional" wording is why roughly 40% of remote MCP servers ship with no auth at all. For any server reachable over the network, treat OAuth 2.1 authorization as mandatory regardless of what the spec permits.

What is token passthrough and why is it forbidden?

Token passthrough is when an MCP server forwards the client's access token to an upstream API instead of obtaining its own token. The spec explicitly forbids it: a server MUST NOT accept tokens that were not issued to it, and when it calls upstream services it MUST get a separate, correctly-scoped token. Passthrough breaks the OAuth audience boundary and turns a compromised server into a relay for credentials it should never have been able to reuse.

Can prompt injection be fully prevented in MCP?

No — and any tool claiming a complete fix should be treated with suspicion. As Simon Willison's "lethal trifecta" framing argues, a guardrail that catches 95% of attacks is failing, because attackers iterate against the remaining 5%. The durable defenses are architectural: don't combine private-data access, untrusted-content exposure, and external communication in one agent, and require human confirmation for sensitive actions so a successful injection still can't act unsupervised.

What is a tool poisoning attack?

Demonstrated by Invariant Labs in April 2025, a tool poisoning attack hides malicious instructions inside a tool's description — text the model reads in full but the client UI hides or truncates. In their live demo against Cursor, the hidden instructions made the agent read ~/.ssh/id_rsa and ~/.cursor/mcp.json and exfiltrate them through a normal-looking tool parameter. Defenses are inspecting full descriptions before approval, pinning their hashes, and re-confirming when they change.

How do RFC 8707 resource indicators help secure MCP?

RFC 8707 binds a token to a specific resource. MCP clients MUST send the resource parameter (the canonical server URI) in both the authorization and token requests, and servers MUST validate that the token's audience matches. Together these stop a token minted for one service from being accepted by another, which is the core defense against token-reuse and token-passthrough attacks. The confused-deputy attack additionally needs per-client consent, exact redirect_uri matching, and a single-use state parameter.

Which MCP attacks are most common in the wild?

Empirical scans put command injection at the top (43% of servers), followed by SSRF (30%) and path traversal / arbitrary file read (22%), against a backdrop of no authentication by default and credentials handled in plaintext (79%). The model-layer attacks — tool poisoning, rug pulls, tool shadowing — are rarer in raw counts but higher in impact because they weaponize the agent's own trusted tools.

The takeaway

MCP did not invent new vulnerability classes; it wired the old ones directly to an autonomous agent and then made the riskiest configuration the default. The fix is not a product you buy — it is a sequence of boring, well-understood boundaries: OAuth 2.1 with audience validation, least-privilege scopes, strict input validation, SSRF egress controls, no token passthrough, pinned tool descriptions, and a human gate on anything that matters. Every one of those is in the official spec or the Security Best Practices doc; almost none of them are switched on by default.

If you are wiring MCP servers into an agent stack that touches production systems and you want a second pair of hands who has done the OAuth-resource-server and egress-hardening work before, Codersera can extend your team with vetted remote engineers who treat agent security as a first-class concern rather than a demo-day afterthought. Either way, run the checklist above before you expose a server — the attackers already have.