2026

DeepSeek V4-Pro Review: Benchmarks, Pricing, Verdict (2026)

Honest review of DeepSeek V4-Pro. Permanent $0.435/$0.87 pricing since 2026-05-22, 80.6% SWE-bench Verified, where it beats Claude and where it loses.

Published 27 Apr 2026 • Updated 02 Jun 2026 • 15 min read

Quick answer. DeepSeek V4-Pro is the cheapest frontier-class coding model in 2026: $0.435/M input, $0.87/M output (75% cut, now permanent), 80.6% on SWE-bench Verified, 1M-token context, open weights on Hugging Face. Verdict: best pick for cost-sensitive coding and reasoning; Claude Opus 4.7 still wins long agentic loops, GPT-5.5 still wins multimodal.

Updated 2026-05-23. DeepSeek announced on 2026-05-22 that the 75%-off promo on V4-Pro is now the standing price. The pre-discount sticker rates ($1.74 / $3.48) are historical.

DeepSeek V4-Pro, released April 24, 2026, is the first open-weight model that lands within striking distance of Claude Opus 4.7 and GPT-5.5 on real-world coding and reasoning benchmarks — at roughly one-thirtieth of the per-token cost. This review covers what it is, where it wins, where it loses, the new permanent pricing, and how to access it through the DeepSeek API, OpenRouter, or a self-host stack.

Companion guide

For the deeper architecture and deployment patterns, read the continuously-updated DeepSeek V4 complete guide.

DeepSeek V4-Pro vs the frontier: at a glance

What is the DeepSeek V4 Pro in one line?

The deepseek v4 pro is the flagship 1.6 trillion-parameter Mixture-of-Experts model in the DeepSeek V4 family, activating 49 billion parameters per token. It is the cheapest frontier-class coding model in 2026 at $0.435/M input and $0.87/M output, scores 80.6% on SWE-bench Verified, ships a 1M-token context window, and has open weights on Hugging Face. Use it for cost-sensitive coding and reasoning; reach for Claude Opus 4.7 on the longest agentic loops and GPT-5.5 for multimodal work.

The table below compares V4-Pro against the three closed-source frontier models teams typically evaluate against. SWE-bench, LiveCodeBench, MMLU-Pro and GPQA figures are vendor-reported except where noted. The "winner" column flags the model with the best public number on that row; ties get both names.

Benchmark / spec	DeepSeek V4-Pro	Claude Opus 4.7	GPT-5.5	Gemini 3.1 Pro	Winner
SWE-bench Verified	80.6%	80.8%	74.9%	76.2%	Claude / V4-Pro (tie)
LiveCodeBench	93.5%	~89%	~86%	~84%	V4-Pro
Terminal-Bench 2.0	67.9%	65.4%	not reported	not reported	V4-Pro
Codeforces rating	3,206	not reported	3,168	not reported	V4-Pro
MMLU-Pro	~parity GPT-5.5	~parity	baseline	~parity	Tie
Input price ($/M)	$0.435	$5.00	$5.00	$3.50	V4-Pro (~11x cheaper)
Output price ($/M)	$0.87	$25.00	$30.00	$21.00	V4-Pro (~29x cheaper)
Context window	1M tokens	1M tokens	1M tokens	2M tokens	Gemini
Open weights	Yes (HF)	No	No	No	V4-Pro

Net read: V4-Pro is the cost-efficiency king and the strongest open-weight coding model. Claude Opus 4.7 still has the tightest agentic-loop story; GPT-5.5 still has the broadest tool-use ecosystem; Gemini 3.1 Pro still leads on context-length and multimodal.

What is DeepSeek V4-Pro?

DeepSeek V4-Pro is the flagship of the DeepSeek V4 family, released April 24, 2026 by DeepSeek (Hangzhou-based; the same lab behind V3 and the R1 reasoning series). It is a 1.6 trillion parameter Mixture-of-Experts model that activates 49 billion parameters per token, putting per-token inference cost close to a 49B dense model while retaining the knowledge capacity of a much larger one.

Two facts make V4-Pro distinct from prior open-weight efforts:

Hybrid attention. V4 combines Compressed Sparse Attention (CSA) with Heavily Compressed Attention (HCA) so the model can serve a 1M-token context at roughly 27% of V3.2's per-token FLOPs and 10% of its KV-cache memory. This is what makes long-context production deployments financially viable.
Domestic training stack. V4 was trained on Huawei Ascend 950 chips plus Cambricon accelerators rather than Nvidia GPUs — the first frontier-class model to do so. Geopolitically meaningful; technically it means the recipe is reproducible outside the Nvidia ecosystem.

The other architectural details worth knowing: V4-Pro uses Manifold-Constrained Hyper-Connections (mHC) to stabilize training at 1.6T parameters — the residual-connection scheme that prior MoE efforts at this scale struggled to keep stable. Post-training combines supervised fine-tuning, Group Relative Policy Optimization (GRPO) for preference alignment, and on-policy distillation to clean up the artifacts that RL training tends to introduce. The result is a base model that scores well on benchmarks and a final policy that behaves coherently in production.

V4-Pro ships alongside V4-Flash (284B total / 13B active), a leaner sibling priced at $0.14/$0.28 per million tokens for high-volume, latency-sensitive workloads. Both share the same training data, the same hybrid-attention design, and the same 1M-token context — V4-Flash is the speed/cost-optimized smaller model, not a feature-stripped variant. For most production teams, V4-Flash handles 70-80% of traffic and V4-Pro is reserved for the requests where reasoning depth genuinely changes the answer.

Where DeepSeek V4-Pro wins

The honest case for V4-Pro is narrower than the launch headlines suggest, but the wins it does have are decisive.

Best open-weight coding model, by a margin

SWE-bench Verified at 80.6%, LiveCodeBench at 93.5%, Terminal-Bench 2.0 at 67.9%, Codeforces rating 3,206. No other open-weight model is in the same conversation in mid-2026. If you need a model you can self-host that holds its own against Claude Opus 4.7 on real-world repo-level coding, V4-Pro is the only credible answer today.

The practical implications matter. Teams that previously had to choose between open-weight cost savings and frontier code quality no longer have to choose. A self-hosted V4-Pro can drive PR-review automation, code-generation features, or repo-aware copilots at quality levels that, six months ago, required a managed API to a closed-source model. For organizations with strict data-residency or IP-isolation requirements, this is the first time the trade-off has actually disappeared.

Cost-per-token is not close

Permanent input pricing is $0.435/M; output is $0.87/M; cache hits drop to $0.003625/M. Against Claude Opus 4.7's $25/M output and GPT-5.5's $30/M, the math is on a different page. For a team burning 10M output tokens a day, V4-Pro lands at ~$261/month; the same workload on Claude Opus 4.7 is ~$7,500/month. The discount widens further on cache-heavy workloads with stable system prompts.

The cache-hit pricing deserves special attention. At $0.003625/M for cached input tokens — a 120x discount on the standard input rate — workloads with stable system prompts effectively run with free context. A RAG application that injects the same 50K-token system prompt on every call pays cents per million requests for that prefix, rather than dollars. Architect the prompt cache deliberately and the total spend often lands an additional 5-10x lower than the naive "$0.87/M output" calculation suggests.

Reasoning depth on structured problems

V4-Pro's competitive-programming and mathematical-reasoning numbers (Codeforces 3,206, AIME-style problem sets) are top-tier across all models, open or closed. If your workload is well-bounded reasoning — algorithmic synthesis, math, deterministic code generation — V4-Pro performs at the frontier.

The architecture choices behind this matter. V4-Pro's hybrid attention preserves attention quality on nearby tokens while compressing distant context, which keeps short-range reasoning chains crisp even inside long prompts. The GRPO + on-policy distillation post-training pipeline yields cleaner reasoning chains than vanilla RLHF tends to produce. The net effect: V4-Pro thinks through problems in a way that feels closer to Claude's reasoning trace than to the typical open-weight model.

Long context that actually works at price

1M-token context is now offered by several models, but most charge punitive prices or hit memory walls. V4-Pro's hybrid attention keeps long-context inference inside the same cost envelope as short prompts, which is what makes full-repo analysis or multi-hour agent transcripts financially sane.

Concretely: a 500K-token context call on V4-Pro is roughly 10x cheaper in memory and 4x cheaper in compute than the same call on V3.2 would have been. Combined with cache-hit pricing, this opens up workloads that were nominally possible on prior 1M-context models but never economically deployed — full monorepo code review, multi-day customer-support transcripts, 200-page contract review with extracted clause Q&A. The model can use the whole context window without the cost spiking past the value of the answer.

Open weights and data sovereignty

Weights are on Hugging Face. Self-hosting unlocks customization, on-prem deployment for regulated industries, and the ability to audit and fine-tune. No proprietary frontier model offers this.

Where DeepSeek V4-Pro loses

If the post stopped at the wins it would be marketing. Here is the honest other side.

Agentic loops still belong to Claude Opus 4.7

On long, multi-step agent runs — the kind Anthropic's Computer Use and Claude Code workflows are built around — Opus 4.7 still recovers from errors more reliably and is less prone to subtle drift over 100+ turns. Independent reviewers consistently flag this gap; V4-Pro is closing it, not closed.

The failure modes show up specifically in long-running coding agents: V4-Pro is more likely to lose track of an earlier-stated constraint, propose an action that contradicts a previous step, or fail to re-read its own scratchpad before continuing. Claude Opus 4.7's training pipeline appears to weight self-consistency and error-recovery more aggressively. For products that depend on autonomous multi-step execution (Claude Code-style coding agents, browser automation agents, anything chaining 50+ tool calls), the reliability gap is real and currently decisive.

Tool-use breadth favors GPT-5.5

OpenAI ships the broadest first-party tool surface (Code Interpreter, web browsing, image generation, structured outputs, function calling at scale, real-time voice). If your product depends on stitched-together tool calls inside the OpenAI stack, V4-Pro is a peer on the language model but a stranger to the ecosystem.

The DeepSeek API does support function calling and structured (JSON) outputs, and OpenRouter exposes parallel-tool-call patterns that mirror OpenAI's surface. But the supporting infrastructure — managed Code Interpreter sandboxes, hosted retrieval, first-party browsing, assistants threads — is OpenAI-only. Teams that have built on top of those primitives will face real migration cost to move to V4-Pro, and that migration cost frequently outweighs the per-token savings.

No vision, no audio

V4-Pro is text-only at launch. Workloads that need image understanding (PDF screenshots, screen-OCR agents, visual QA) or audio I/O still need GPT-5.5, Claude Opus 4.7, or Gemini 3.1 Pro. DeepSeek has hinted at a multimodal variant but it does not exist today.

Timeouts on the hardest problems

In published evaluations V4-Pro completed 29 of 38 hard reasoning tasks before timing out — about a 24% timeout rate on the upper end of difficulty. For latency-bounded production endpoints or hard-deadline workloads, you may need to pair it with a fallback model.

Real-world vs benchmark gap

Several independent reviewers report V4-Pro trails Claude on tasks with ambiguity, implicit context, or precise factual recall — the kind of stuff that benchmarks don't measure well. Benchmark inflation is universal across the industry, but V4-Pro is no exception. Run your own evals on your real workload.

Concretely, the reports that surface most often: V4-Pro will occasionally produce confidently-wrong factual claims on edge-of-knowledge topics, miss the intent behind an underspecified user request more often than Claude does, and over-explain code where a shorter answer would have served. None of these are deal-breakers — they are the trade-offs that come with weighting the training pipeline toward benchmark scores. But they are also exactly the failure modes that your eval harness needs to catch before V4-Pro replaces Claude in a user-facing surface.

Preview status

V4-Pro is labelled a preview, not a final release. The behavior — and the system prompt expectations — may shift before the GA milestone. Production deployments should pin the version and budget for a re-validation pass.

DeepSeek V4-Pro pricing in depth

DeepSeek announced on 2026-05-22 that the 75% discount that originally launched as a time-boxed promo through 2026-05-31 is now permanent pricing. The previously published list rates of $1.74 / $3.48 are historical; the standing rate is $0.435 / $0.87.

Model	Input ($/M, cache miss)	Input ($/M, cache hit)	Output ($/M)	Notes
DeepSeek V4-Pro	$0.435	$0.003625	$0.87	Permanent since 2026-05-22; cache-hit price is 10x lower than the pre-permanent rate
DeepSeek V4-Flash	$0.14	$0.0028	$0.28	Best cost-per-token in class
GPT-5.5	$5.00	$0.625	$30.00	Official list pricing
Claude Opus 4.7	$5.00	$0.50	$25.00	Official list pricing

At 10M output tokens/day the monthly arithmetic:

V4-Pro: ~$261/month
V4-Flash: ~$84/month
Claude Opus 4.7: ~$7,500/month
GPT-5.5: ~$9,000/month

For workloads dominated by repeated system prompts (RAG, multi-turn assistants, agentic loops), the cache-hit price of $0.003625/M is the line item that compounds. A system prompt that resolves to a cache hit on every call effectively becomes free.

Source: api-docs.deepseek.com/quick_start/pricing.

Benchmarks deep-dive

The headline numbers in the comparison table above need a little context.

SWE-bench Verified: 80.6%

SWE-bench Verified is the standard benchmark for autonomous resolution of real GitHub issues — a much harder evaluation than synthetic code generation. V4-Pro's 80.6% (DeepSeek-reported) lands 0.2 percentage points behind Claude Opus 4.7 (80.8%) and well ahead of GPT-5.5 (74.9%, OpenAI-reported). Independent reproduction is still landing; treat V4-Pro and Claude as effectively tied until third-party numbers stabilize.

The SWE-bench score is the single most important number in this entire post. It is the benchmark closest to what real engineering work looks like — open a real bug report, navigate a real codebase, write a patch that passes real tests. A model at 80%+ is meaningfully better than a model at 70%+ for production code-fix automation. V4-Pro being inside half a point of the closed-source leader, at one-thirtieth the price, is the headline result of the V4 release.

LiveCodeBench: 93.5%

LiveCodeBench is run on freshly-released programming problems to mitigate training-set contamination. V4-Pro at 93.5% is the highest open-weight result on record and ahead of the publicly-reported closed-model numbers. The contamination-resistant design matters here: many models look stronger than they really are on benchmarks they may have indirectly memorized. LiveCodeBench is harder to game, which makes V4-Pro's lead more credible than a similar margin on, say, HumanEval would be.

Terminal-Bench 2.0: 67.9%

Terminal-Bench measures the model's ability to drive a real shell to completion on systems-level tasks — package installs, file manipulation, build pipelines, debugging. V4-Pro's 67.9% beats Claude Opus 4.7's 65.4%, suggesting strong filesystem and CLI competence. For products that build coding-agent surfaces (Cursor-style IDEs, Devin-style autonomous developers, internal devops automation), this is the benchmark to pay attention to alongside SWE-bench.

Codeforces rating: 3,206

Codeforces' competitive-programming Elo translates to the high International Master / low Grandmaster band. The 3,206 score is the highest reported by any model at the time of release, edging GPT-5.5's 3,168.

Cross-reference live model leaderboards on Artificial Analysis for third-party verification as those numbers update.

How to access DeepSeek V4-Pro

There are four practical access paths.

1. DeepSeek Platform API (most teams start here)

Sign up at platform.deepseek.com and call the OpenAI-compatible endpoint. The model id is deepseek-v4-pro; reasoning effort can be tuned via the reasoning parameter (high / xhigh). Streaming, function calling, and JSON mode are supported. Existing OpenAI SDK code typically requires changing the base URL and model name only — no broader refactor — which makes initial proof-of-concepts cheap to spin up.

2. OpenRouter (multi-model gateway)

openrouter.ai/deepseek/deepseek-v4-pro exposes the same model behind a unified billing layer alongside Claude, GPT, and Gemini. Useful when you want to A/B between models without managing four separate keys.

3. DashScope (Alibaba Cloud)

Available in regions where the DeepSeek API has latency or compliance friction. Pricing approximately matches the DeepSeek list rate.

4. Self-host from Hugging Face

Weights are on Hugging Face. Practical inference at 1.6T parameters requires meaningful infrastructure — 8x H100 or 8x H200 for tensor-parallel serving with vLLM or SGLang. V4-Flash is far more tractable for single-node deployment with 4-bit quantization. For most teams, self-hosting V4-Pro only makes sense at high steady-state volume (>$3-5K/month of equivalent API spend) or when data-sovereignty requirements rule out the managed API.

V4-Pro vs V4-Flash vs Claude Opus 4.7: which to pick

The choice between the three depends on the dominant axis of your workload.

If your priority is…	Pick	Why
Highest reasoning quality, cost matters	V4-Pro	Frontier-tier reasoning at ~3% of Claude's price
Highest throughput, latency-bounded	V4-Flash	13B active params; ~$0.28/M output
Multi-step agentic loops with tool use	Claude Opus 4.7	Most reliable long-running agent behavior
Broad ecosystem + multimodal	GPT-5.5	Widest first-party tool surface, vision, audio
Open weights, on-prem, fine-tuning	V4-Pro or V4-Flash	Only viable open-weight options at this tier

Many teams will use multiple models: V4-Flash for high-volume classification or extraction, V4-Pro for complex code generation, Claude Opus 4.7 for the longest-running agent workflows. The cost arbitrage often pays for the integration work in the first month.

Frequently asked questions

What is the current price of DeepSeek V4-Pro?

As of 2026-05-22, permanent pricing is $0.435 per million input tokens, $0.87 per million output tokens, and $0.003625 per million cache-hit input tokens. The pre-discount sticker rates ($1.74 / $3.48) are no longer in effect. Source: api-docs.deepseek.com.

Is DeepSeek V4-Pro better than Claude Opus 4.7?

On SWE-bench Verified the two are within 0.2 points (80.6% vs 80.8%) and V4-Pro is ahead on Terminal-Bench, LiveCodeBench, and Codeforces. On long-running agent loops with tool use, Claude Opus 4.7 still has the edge — and on multimodal workloads it is not close. For cost-sensitive coding workloads V4-Pro wins; for the trickiest agentic systems, Claude still wins.

Is DeepSeek V4-Pro better than GPT-5.5?

On pure coding and reasoning benchmarks, yes — V4-Pro beats GPT-5.5 on Codeforces, SWE-bench Verified, and LiveCodeBench. On tool-use breadth, ecosystem, multimodal capability, and real-time/voice, GPT-5.5 is well ahead. Pick on workload, not aggregate "better."

Is DeepSeek V4 truly open source?

It is open-weight, not fully open-source. Model weights are on Hugging Face under a permissive license that allows self-hosting, fine-tuning, and commercial use. Training code and training data are not fully released. This is consistent with DeepSeek's prior model releases.

Can I fine-tune DeepSeek V4-Pro?

Yes, on self-hosted weights. The DeepSeek Platform API does not yet expose managed fine-tuning. For LoRA-style adaptation at 1.6T parameters, expect a meaningful infrastructure footprint; many teams fine-tune V4-Flash (284B) instead for cost reasons.

What is the context window?

1 million tokens of input context with up to 384K tokens of output. The hybrid CSA + HCA attention mechanism keeps long-context inference inside roughly 10% of V3.2's KV-cache memory and 27% of its per-token FLOPs, which is what makes long-context workloads viable at the standing price.

How does cache-hit pricing work?

If a prefix of the prompt matches a previously-seen sequence (typically a stable system prompt or RAG context), DeepSeek bills those tokens at $0.003625/M instead of $0.435/M — a 120x discount on input. Workloads with repetitive system prompts can see total bills shrink dramatically; design system prompts to maximize the cached prefix.

Should I use V4-Pro or V4-Flash?

V4-Pro for tasks where reasoning quality is the bottleneck (complex code generation, multi-step planning, research synthesis). V4-Flash for high-volume, well-defined tasks (classification, extraction, summarization, structured output). At ~3x the cost, V4-Pro is worth it whenever quality directly drives business outcomes; below that bar, V4-Flash is the cheaper default.

Does V4-Pro support images, audio, or video?

No. V4-Pro is text-only at launch. For multimodal workloads use Claude Opus 4.7, GPT-5.5, or Gemini 3.1 Pro.

Is V4-Pro safe for production?

It is shipping in production at many teams, but the model is still labelled a preview. Pin to a specific model version, hold a fallback (V4-Flash or a closed-model peer), and budget a re-validation pass when the stable release lands. For latency-bounded endpoints, factor in the ~24% timeout rate observed on the hardest reasoning tasks.

What is DeepSeek V4-Pro good for?

DeepSeek V4-Pro is purpose-built for high-volume coding, reasoning, and long-context workloads where per-token cost is a real constraint. Concrete strengths: production code generation and review (80.6% SWE-bench Verified, 93.5% LiveCodeBench), competitive-programming-style problems (Codeforces 3,206), multi-step structured reasoning, and 1M-token RAG over codebases or research corpora. Weak fits: anything requiring images/audio/video (no multimodal), the trickiest open-ended agentic loops (Claude Opus 4.7 leads), and tool-use breadth (GPT-5.5 ecosystem is wider).

Is DeepSeek V4-Pro worth it in 2026?

For coding and reasoning workloads, yes — by a wide margin. At $0.435/M input and $0.87/M output it is roughly 11x cheaper on input and 29x cheaper on output than Claude Opus 4.7 while landing within 0.2 points on SWE-bench Verified and ahead on LiveCodeBench, Terminal-Bench, and Codeforces. With cache-hit pricing at $0.003625/M (a 120x input discount), repeated-prefix workloads like RAG over a fixed codebase get dramatically cheaper still. The honest caveats: still a preview label, no multimodal, and Claude Opus 4.7 remains the safer pick for production agent systems.

DeepSeek V4-Pro vs V4-Pro-Max: which one?

V4-Pro is the standard flagship at $0.435/M input and $0.87/M output, suitable for nearly every coding and reasoning workload. V4-Pro-Max is a higher-tier sibling targeted at the hardest reasoning tasks with longer chain-of-thought budgets — useful when V4-Pro hits its ~24% timeout rate on the most demanding problems. For most teams V4-Pro is the right default; reach for V4-Pro-Max only when you have benchmarked a specific reasoning workload that V4-Pro cannot complete reliably.

References

DeepSeek pricing: api-docs.deepseek.com/quick_start/pricing
DeepSeek Platform: platform.deepseek.com
OpenRouter listing: openrouter.ai/deepseek/deepseek-v4-pro
Cross-vendor benchmarks: artificialanalysis.ai
Hugging Face weights: huggingface.co/deepseek-ai

Build with DeepSeek V4, the right way

V4-Pro changes the cost-quality frontier for AI-powered features. A workload that used to need a Claude or GPT budget can now ship on a hobbyist's monthly spend — but the integration, evaluation harness, and fallback strategy still need real engineers.

Codersera places vetted remote developers who can ship AI features end-to-end: prompt engineering, inference cost optimization, eval harnesses, fallback routing, and the production plumbing around them. Whether you are migrating from Claude to V4-Pro or building a multi-model router, the right team turns model capability into shipped product.

Hire AI-ready developers with Codersera — lower hiring risk, faster ramp-up, engineers who ship.