DeepSeek V4 API Guide: Setup, Code, Cost

Quick answer. The DeepSeek V4 API is OpenAI-compatible. Point any OpenAI SDK at https://api.deepseek.com, set your key, and call deepseek-v4-pro (top reasoning/agentic) or deepseek-v4-flash (cheap, fast). Minimal request: curl https://api.deepseek.com/chat/completions -H "Authorization: Bearer $DEEPSEEK_API_KEY" -H "Content-Type: application/json" -d '{"model":"deepseek-v4-flash","messages":[{"role":"user","content":"hi"}]}'

DeepSeek V4 is one of the most capable open-weight model families available in 2026, and its API is a production-grade option for coding assistants, agents, and high-volume inference. It ships with a 1M-token context window, OpenAI- and Anthropic-compatible endpoints, and pricing that is aggressive even by 2026 standards. If you already call the OpenAI API, switching is a matter of changing the base URL, the API key, and the model name.

This guide gets you from zero to a working call fast, then goes deep enough for production: setup, authentication, current pricing, and copy-paste code in Python, JavaScript/TypeScript, and curl, followed by streaming, tool use, the new thinking mode, caching economics, and the legacy-alias deprecation you must plan around before July 24, 2026. Every endpoint, model name, and price below was verified against api-docs.deepseek.com in May 2026.

What is the fastest way to call the DeepSeek V4 API?

You need three things to call the model: an account with a small balance, an API key in an environment variable, and a single HTTP request to the chat completions endpoint. The base URL is https://api.deepseek.com and the canonical endpoint is /chat/completions. The OpenAI SDK convention https://api.deepseek.com/v1 also works, so if your existing OpenAI code already appends /v1 you can keep that form: point it at the DeepSeek host, swap in your DeepSeek key, and change the model name.

Set your key once:

export DEEPSEEK_API_KEY="sk-your-key-here"

The minimal working request (this is the one-liner from the Quick Answer, expanded):

curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${DEEPSEEK_API_KEY}" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {"role": "user", "content": "Say hello in one short sentence."}
    ]
  }'

Python (OpenAI SDK — the most common path):

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function that merges two sorted lists without using sorted()."},
    ],
)

print(response.choices[0].message.content)

JavaScript / TypeScript (OpenAI Node SDK):

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.DEEPSEEK_API_KEY,
  baseURL: "https://api.deepseek.com",
});

const response = await client.chat.completions.create({
  model: "deepseek-v4-pro",
  messages: [
    { role: "system", content: "You are a senior backend engineer." },
    { role: "user", content: "Write a TS function that retries fetch 3x with exponential backoff." },
  ],
});

console.log(response.choices[0].message.content);

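For CommonJS, swap the import for const OpenAI = require("openai").default; and wrap the await call in an async function, because CommonJS modules do not support top-level await. Everything else is identical.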

Want the full picture? Read our continuously updated DeepSeek V4 complete guide for benchmarks, deployment patterns, and how V4 compares to GPT-5.5 and Claude Opus 4.7.

How do I get a DeepSeek API key?

You need a funded account before any call succeeds — DeepSeek does not offer a free request tier on the paid API.

Go to platform.deepseek.com and create an account.
Open the billing section and add credit. A small balance (a couple of dollars) is enough to start; calls fail with a 402 until a balance exists.
Open the API keys page from your dashboard.
Click Create new API key, name it (for example my-app-dev), and copy it immediately — it is shown only once.
Store it in an environment variable or secret manager. Never commit it or ship it in a client bundle.
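A minimal sketch of the "environment variable, never in code" rule: read the key once at startup and fail fast with a readable message if it is missing, instead of sending an unauthenticated request and debugging a 401 later. The helper name load_deepseek_key is hypothetical; only the DEEPSEEK_API_KEY variable name comes from this guide.

```python
import os


def load_deepseek_key(env_var: str = "DEEPSEEK_API_KEY") -> str:
    """Return the API key from the environment, or fail fast with a clear error."""
    # Strip stray whitespace that often sneaks in when pasting the key.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Create a key at platform.deepseek.com "
            "and export it before starting the app."
        )
    return key
```

Pass the result to OpenAI(api_key=load_deepseek_key(), base_url="https://api.deepseek.com") so a missing key surfaces at startup rather than on the first request.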

The same key works for the OpenAI-compatible endpoint (https://api.deepseek.com) and the Anthropic-compatible endpoint (https://api.deepseek.com/anthropic), so you can integrate with OpenAI SDKs, Anthropic SDKs, or tools like Claude Code without separate credentials.

What does the DeepSeek V4 API cost in 2026?

Pricing below is per 1M tokens in USD, verified against the official DeepSeek Models & Pricing page as of May 2026. Prices change; the pricing page is the source of truth.

Model	Input — cache miss	Input — cache hit	Output	Context	Max output
deepseek-v4-flash	$0.14	$0.0028	$0.28	1M tokens	384K tokens
deepseek-v4-pro (standard)	$1.74	$0.0145	$3.48	1M tokens	384K tokens
deepseek-v4-pro (promo, 75% off — until 2026/05/31 15:59 UTC)	$0.435	$0.003625	$0.87	1M tokens	384K tokens

Two things matter most for cost planning:

The cache-hit input rate is dramatically cheaper. On V4-Flash, a cache hit is $0.0028/M versus $0.14/M for a cache miss, about 50x cheaper. On V4-Pro it is $0.0145/M versus $1.74/M, about 120x cheaper. Reused prompt prefixes (system prompts, pinned documents) are nearly free on input. For high-volume or privacy-sensitive workloads, running DeepSeek V4-Flash locally removes the per-token bill entirely.
V4-Pro is on a temporary 75% discount. As of May 2026 the promo runs until 2026/05/31 15:59 UTC. Budget against the standard rate ($1.74 / $0.0145 / $3.48) so a price reversion does not surprise you, and re-check the pricing page near the end of the promo window.

Choose V4-Pro for strong reasoning, multi-step tool use, complex/agentic coding, and long-context analysis. Choose V4-Flash when latency and cost dominate: classification, extraction, templated generation, simple Q&A. Our DeepSeek V4-Flash deep dive walks through where it holds up and where it does not. V4-Flash output costs roughly 8% of standard V4-Pro output ($0.28 vs $3.48 per 1M tokens), so model choice is the single biggest lever on your bill.
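To turn the pricing table into a budget, multiply each token bucket by its rate. A minimal sketch: the rates are copied from the May 2026 table above (standard rates only, so the estimate survives the V4-Pro promo ending), while estimate_cost and the PRICES mapping are hypothetical names. Token counts per bucket are assumed to come from your own usage logs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per 1M tokens."""
    input_miss: float
    input_hit: float
    output: float


# Standard May 2026 rates from the pricing table; the V4-Pro promo is
# deliberately excluded so budgets assume the post-promo price.
PRICES = {
    "deepseek-v4-flash": Price(input_miss=0.14, input_hit=0.0028, output=0.28),
    "deepseek-v4-pro": Price(input_miss=1.74, input_hit=0.0145, output=3.48),
}


def estimate_cost(model: str, miss_tokens: int, hit_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for a workload, given token counts per billing bucket."""
    p = PRICES[model]
    return (
        miss_tokens * p.input_miss
        + hit_tokens * p.input_hit
        + output_tokens * p.output
    ) / 1_000_000


# Example workload: 10M input tokens at an 80% cache-hit rate, plus 2M output tokens.
for model in PRICES:
    print(model, round(estimate_cost(model, 2_000_000, 8_000_000, 2_000_000), 4))
```

For that example workload the arithmetic works out to about $0.86 on V4-Flash versus about $10.56 on standard V4-Pro, which is the model-choice lever described above in concrete terms.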

How do I call the API with raw HTTP or requests?

If you cannot or do not want to install an SDK, the API is plain JSON over HTTPS. This works in any language with an HTTP client.

import os
import requests

resp = requests.post(
    "https://api.deepseek.com/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['DEEPSEEK_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "deepseek-v4-flash",
        "messages": [
            {"role": "user", "content": "Explain the difference between a mutex and a semaphore in two sentences."}
        ],
    },
    timeout=120,
)
resp.raise_for_status()  # raise on 401/402/429 instead of failing with a KeyError below

print(resp.json()["choices"][0]["message"]["content"])

The response shape matches the OpenAI ChatCompletions spec, so existing parsing code that reads choices[0].message.content works unchanged. The 120-second timeout matters for long generations — long-context requests can take a while to first byte.
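The parsing step can be made slightly more defensive. A minimal sketch, assuming the OpenAI-style finish_reason field on each choice: "length" signals that generation stopped at max_tokens, so the helper raises instead of silently returning a cut-off answer. extract_reply and TruncatedResponse are hypothetical names.

```python
class TruncatedResponse(Exception):
    """Raised when the model stopped because it hit the max_tokens limit."""


def extract_reply(payload: dict) -> str:
    """Pull the assistant text out of a ChatCompletions-style JSON payload."""
    choice = payload["choices"][0]
    # "stop" is a normal finish; "length" means the output was cut at max_tokens.
    if choice.get("finish_reason") == "length":
        raise TruncatedResponse("Output hit max_tokens; raise the limit or ask for less.")
    # content can be null (e.g. on tool-call turns), so fall back to "".
    return choice["message"].get("content") or ""
```

With the requests example above, call extract_reply(resp.json()) after resp.raise_for_status() in place of the direct indexing.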

How do I stream responses from DeepSeek V4?

Streaming displays tokens as they arrive instead of blocking on the full response. Both models support it, and it adds no extra cost — you pay the same per-token rate either way.

Python streaming:

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")

stream = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Explain how B-trees work and why databases use them."}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:  # defensive: some chunks (e.g. usage-only) may carry no choices
        continue
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
print()

Node.js / TypeScript streaming:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.DEEPSEEK_API_KEY,
  baseURL: "https://api.deepseek.com",
});

const stream = await client.chat.completions.create({
  model: "deepseek-v4-pro",
  messages: [{ role: "user", content: "Explain how event loops work in Node.js." }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
console.log();

Raw SSE with curl: add "stream": true and the endpoint returns server-sent events. The -N flag turns off curl's output buffering so events print as they arrive:

curl -N https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${DEEPSEEK_API_KEY}" \
  -d '{
    "model": "deepseek-v4-pro",
    "messages": [{"role": "user", "content": "Count from 1 to 5."}],
    "stream": true
  }'
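Without an SDK you have to parse the event stream yourself. A minimal sketch, assuming the OpenAI-style framing: each event arrives as a data: {json} line, the stream ends with data: [DONE], and everything else (blank separators, keep-alive comments starting with a colon) is skipped. iter_sse_content is a hypothetical helper that takes already-decoded text lines, so it can be tested without a network connection.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from OpenAI-style server-sent event lines."""
    for line in lines:
        if not line.startswith("data:"):
            continue  # blank separators, ": keep-alive" comments, etc.
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                yield content
```

With requests, send the POST with stream=True (plus the 120-second timeout) and feed the helper line.decode("utf-8") for line in resp.iter_lines(). Decoding the bytes yourself avoids requests falling back to a Latin-1 guess when a text/* response carries no charset.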

How do I use function calling and tool use?

DeepSeek V4 supports OpenAI-style function calling, which is the foundation of agentic workflows: the model decides when to invoke structured functions you define, you run them, and you feed the result back. V4-Pro handles multi-turn and parallel tool calls reliably.

import os, json
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. 'San Francisco'"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Tokyo right now?"}]

first = client.chat.completions.create(
    model="deepseek-v4-pro", messages=messages, tools=tools, tool_choice="auto",
)
msg = first.choices[0].message

if msg.tool_calls:
    messages.append(msg)  # keep the assistant turn that requested the tools
    # Answer every call: each tool_call_id needs its own "tool" message,
    # and the model may request several calls in parallel.
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        # Run the real function here; we stub the result:
        result = json.dumps({"city": args["city"], "temp": 22, "unit": "celsius", "condition": "partly cloudy"})
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})

    final = client.chat.completions.create(model="deepseek-v4-pro", messages=messages, tools=tools)
    print(final.choices[0].message.content)
else:
    print(msg.content)

For agentic loops that chain many tool calls, V4-Pro is the recommended model — DeepSeek positions V4 for agent integrations with Claude Code, OpenClaw, and OpenCode via the Anthropic-compatible endpoint.
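When an agent has more than one tool, a dispatch table keeps the loop tidy. A minimal sketch under stated assumptions: the TOOLS registry, run_tool_calls, agent_loop, and the stub get_weather are hypothetical; each tool call is assumed to expose .id, .function.name, and .function.arguments like the SDK objects in the example above. Errors are sent back to the model as tool output instead of being raised, so it can correct itself on the next turn, and max_turns caps runaway loops.

```python
import json
from typing import Any, Callable, Dict, List


def get_weather(city: str, unit: str = "celsius") -> dict:
    # Stub: replace with a real weather lookup.
    return {"city": city, "temp": 22, "unit": unit, "condition": "partly cloudy"}


# Map the function names declared in `tools` to Python callables.
TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(tool_calls: List[Any]) -> List[dict]:
    """Execute every requested call and build one "tool" message per call id."""
    results = []
    for call in tool_calls:
        try:
            fn = TOOLS.get(call.function.name)
            if fn is None:
                raise KeyError(f"unknown tool {call.function.name!r}")
            args = json.loads(call.function.arguments or "{}")
            output = fn(**args)
        except Exception as exc:  # report failures back to the model
            output = {"error": str(exc)}
        results.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(output)})
    return results


def agent_loop(client, messages: list, tools: list, max_turns: int = 8) -> str:
    """Call the model until it answers without requesting tools, or give up."""
    for _ in range(max_turns):
        reply = client.chat.completions.create(
            model="deepseek-v4-pro", messages=messages, tools=tools
        )
        msg = reply.choices[0].message
        if not msg.tool_calls:
            return msg.content or ""
        messages.append(msg)  # the assistant turn that requested the tools
        messages.extend(run_tool_calls(msg.tool_calls))
    raise RuntimeError("agent did not finish within max_turns")
```

With the client, tools, and messages defined in the function-calling example, print(agent_loop(client, messages, tools)) replaces the single hand-written round trip.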


How do I enable thinking (reasoning) mode?

DeepSeek V4 supports a dual-mode design: a fast non-thinking mode and an extended thinking mode that performs chain-of-thought before answering — strong for math, logic, and multi-step coding. Thinking mode is controlled by an explicit thinking parameter plus a reasoning_effort level, not by inflating max_tokens.

curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${DEEPSEEK_API_KEY}" \
  -d '{
    "model": "deepseek-v4-pro",
    "messages": [
      {"role": "user", "content": "A farmer has 3 fields. Field A yields 20% more than B. Field C yields 15% less than A. Together they total 5,400 kg. How much does each field produce? Show your reasoning."}
    ],
    "thinking": {"type": "enabled"},
    "reasoning_effort": "high"
  }'

In the OpenAI SDK, pass the same fields via extra_body:

# client is the OpenAI client configured earlier with base_url="https://api.deepseek.com"
response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Solve and show every step: ..."}],
    extra_body={"thinking": {"type": "enabled"}, "reasoning_effort": "high"},
)
print(response.choices[0].message.content)

Reasoning tokens are billed as output tokens, so reserve thinking mode for problems where accuracy justifies the cost; use non-thinking mode (the default) for routine generation. If you previously enabled "Think Max" by setting max_tokens to a very large value, migrate to the explicit thinking / reasoning_effort parameters — that is the supported mechanism in the V4 API.
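Before trusting a reasoning trace, it helps to know the right answer. The sample farmer prompt in the curl request above reduces to one linear equation: taking B as the base, A = 1.2B and C = 0.85A = 1.02B, so 3.22B = 5,400. A quick exact check with Python's standard fractions module (variable names are illustrative):

```python
from fractions import Fraction

total = Fraction(5400)
a_per_b = Fraction(6, 5)               # Field A yields 20% more than B
c_per_b = Fraction(85, 100) * a_per_b  # Field C yields 15% less than A, i.e. 1.02 * B

b = total / (1 + a_per_b + c_per_b)    # 5400 / 3.22
a = a_per_b * b
c = c_per_b * b

for name, value in (("A", a), ("B", b), ("C", c)):
    print(f"Field {name}: {float(value):.2f} kg")
```

The fields come out to roughly A ≈ 2,012.42 kg, B ≈ 1,677.02 kg, and C ≈ 1,710.56 kg. Because the answer is not a whole number, it is a useful probe of whether a model rounds too early in its reasoning.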

How does prompt caching cut DeepSeek API costs?

DeepSeek V4 automatically caches prompt prefixes. When later requests share the same prefix (a long system prompt, a pinned document), those tokens are billed at the cache-hit rate: about 50x cheaper than a cache miss on V4-Flash and about 120x cheaper on V4-Pro (see the pricing table). No setup, no configuration, no cache-management API.

Keep system prompts byte-identical across calls. Any drift breaks the prefix match and you pay full input price again.
Front-load static context. Put large, unchanging material (documents, schemas, instructions) first; put the variable user input last. The cache matches from the start of the prompt forward.
Pick the cheaper model when quality allows. V4-Flash output costs roughly 8% of standard V4-Pro output; at scale this dominates the bill.
Batch related questions about the same context into one request instead of repeating the context across many calls.
Watch cache-hit rate in the dashboard — a low hit rate usually means your prefix is changing when it should not.
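The ordering rules above can be enforced in code. A minimal sketch, assuming the cache matches on the leading content of the request as described: static material lives in module-level constants that never change between calls, the variable question always goes last, and a small helper reports how much of two requests' serialized messages overlap. build_messages, SYSTEM_PROMPT, REFERENCE_DOC, and shared_prefix_chars are hypothetical, and character overlap is only a rough proxy for the server's token-level match.

```python
import json

# Static context: keep these strings byte-identical across calls.
SYSTEM_PROMPT = "You are a support assistant. Answer only from the reference below."
REFERENCE_DOC = "(large, unchanging reference document goes here)"


def build_messages(question: str) -> list:
    """Static material first, variable user input last, so requests share a prefix."""
    return [
        {"role": "system", "content": f"{SYSTEM_PROMPT}\n\nReference:\n{REFERENCE_DOC}"},
        {"role": "user", "content": question},
    ]


def shared_prefix_chars(a: list, b: list) -> int:
    """Count the leading characters two serialized message lists have in common."""
    sa, sb = json.dumps(a), json.dumps(b)
    n = 0
    while n < min(len(sa), len(sb)) and sa[n] == sb[n]:
        n += 1
    return n


q1 = build_messages("How do I rotate my API key?")
q2 = build_messages("What does a 429 error mean?")
print(shared_prefix_chars(q1, q2), "of", len(json.dumps(q1)), "characters shared")
```

If you interpolated something like the current timestamp into the system prompt, the shared prefix would collapse to the first few characters of the request, which is exactly the drift the first rule warns about.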

What changes before the July 24, 2026 deprecation?

This is the single migration item every existing DeepSeek integration must handle. The legacy model aliases deepseek-chat and deepseek-reasoner are deprecated and fully retired after July 24, 2026, 15:59 UTC. After that date, requests using those names will fail. The exact retirement timeline and the rationale behind it are tracked in our DeepSeek V4 release, features and benchmarks breakdown.

deepseek-chat currently maps to the non-thinking behaviour, now served by deepseek-v4-flash.
deepseek-reasoner currently maps to the thinking behaviour, now served by deepseek-v4-flash with thinking mode enabled.

Action before July 24, 2026: grep your codebase and prompt configs for the strings deepseek-chat and deepseek-reasoner, replace them with deepseek-v4-flash or deepseek-v4-pro, and add thinking: {"type": "enabled"} where you previously relied on deepseek-reasoner for chain-of-thought. The base URL and your API key do not change.
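If grep is not convenient (on Windows, or as a CI check that should fail the build), a small scanner does the same job. A minimal sketch: find_legacy_models, the scanned file suffixes, and the regular expression are assumptions to adapt to your repository; you may also want to skip directories such as node_modules.

```python
import re
from pathlib import Path

# Matches the two retired aliases but not the deepseek-v4-* model names.
LEGACY = re.compile(r"\bdeepseek-(?:chat|reasoner)\b")
SUFFIXES = {".py", ".js", ".ts", ".json", ".yaml", ".yml", ".toml", ".md"}


def find_legacy_models(root: str) -> list:
    """Return (path, line number, line) for every legacy alias under root."""
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if LEGACY.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

In CI, print each hit as path:line: text and exit with a non-zero status when the list is non-empty, so the migration cannot regress before the July 24, 2026 cutoff.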

What are the common DeepSeek API errors and fixes?

401 Unauthorized — key missing, invalid, or revoked. Re-check the environment variable and the dashboard.
402 Payment Required / Insufficient Balance — fund the account at platform.deepseek.com; calls fail with no balance.
429 Too Many Requests — rate limited. Implement exponential backoff; honour the Retry-After header when present.
Model not found — only deepseek-v4-pro and deepseek-v4-flash are current. deepseek-chat / deepseek-reasoner are legacy and retire July 24, 2026.
Truncated output — raise max_tokens (max output is 384K). Note that for thinking mode you enable reasoning via the thinking parameter, not by maxing out max_tokens.
Connection timeouts — set the HTTP client timeout to at least 120s for long generations; stream user-facing requests to mask latency.
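For the 429 case, here is a minimal sketch of exponential backoff that honours Retry-After when the server sends it in seconds (HTTP-date values fall back to backoff). It wraps any zero-argument callable that returns a requests.Response-like object with status_code and headers, so the retry policy can be tested offline. post_with_backoff, the RETRYABLE set, and the delay constants are assumptions, not DeepSeek-documented values.

```python
import random
import time
from typing import Callable

# Assumption: retry rate limits and transient server errors only; 401/402 need a human.
RETRYABLE = {429, 500, 503}


def post_with_backoff(send: Callable[[], object], max_attempts: int = 5,
                      base_delay: float = 1.0, max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        resp = send()
        if resp.status_code not in RETRYABLE or attempt == max_attempts - 1:
            return resp
        retry_after = resp.headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            delay = float(retry_after)  # server-specified wait, in seconds
        else:
            # Exponential backoff with full jitter: a random wait of up to
            # 1s, 2s, 4s, ... (capped at max_delay).
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
        sleep(delay)
```

Usage with the requests example: resp = post_with_backoff(lambda: requests.post(url, headers=headers, json=body, timeout=120)). Injecting sleep keeps the function testable and lets async code substitute its own wait.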

FAQ

Is the DeepSeek V4 API compatible with OpenAI client libraries?

Yes. It implements the OpenAI ChatCompletions specification. Use the official openai Python or Node SDK, set the base URL to https://api.deepseek.com (or https://api.deepseek.com/v1), and use your DeepSeek key. It also exposes an Anthropic-compatible endpoint at https://api.deepseek.com/anthropic for Anthropic SDKs and tools like Claude Code.

Which base URL should I use with the OpenAI SDK?

Both https://api.deepseek.com and https://api.deepseek.com/v1 work with the OpenAI SDK. The official docs use the form without /v1. If you are migrating existing OpenAI code, point the base URL at the DeepSeek host (keeping the /v1 suffix if your code already uses it), swap in your DeepSeek key, and change the model name to deepseek-v4-pro or deepseek-v4-flash.

How much does the DeepSeek V4 API cost?

As of May 2026, V4-Flash is $0.14/M input (cache miss), $0.0028/M cache hit, $0.28/M output. Standard V4-Pro is $1.74 / $0.0145 / $3.48. V4-Pro is on a temporary 75% discount ($0.435 / $0.003625 / $0.87) until 2026/05/31 15:59 UTC. Always confirm on the official pricing page.

When are deepseek-chat and deepseek-reasoner being removed?

The legacy aliases deepseek-chat and deepseek-reasoner are fully retired after July 24, 2026, 15:59 UTC. Migrate to deepseek-v4-flash or deepseek-v4-pro before then; for reasoning, enable thinking mode explicitly via the thinking parameter.

How do I turn on reasoning mode?

Pass "thinking": {"type": "enabled"} along with "reasoning_effort" (for example "high") in the request body, or via extra_body in the OpenAI SDK. This replaces the old approach of inflating max_tokens to trigger extended reasoning.

Can I use DeepSeek V4 for code generation and agents?

Yes. V4-Pro is strong at multi-file refactoring, debugging, and agentic coding, and supports parallel and multi-turn tool calls. V4-Flash handles boilerplate and single-function generation cheaply. DeepSeek positions V4 for integration with agent tools like Claude Code, OpenClaw, and OpenCode.

Does the 1M-token context window work in practice?

Both V4-Pro and V4-Flash accept up to 1M tokens of input — enough for whole codebases or long documents in one request. Very long contexts increase latency and input cost, so include only what is relevant and rely on prompt caching for any static prefix.

Build faster with vetted AI engineers

The DeepSeek V4 API is a practical, low-cost foundation for coding assistants, document-analysis tools, and autonomous agents — its OpenAI-compatible interface means you can integrate it in an afternoon, and the pricing makes it viable at production scale. The harder part is building the product around it well: robust tool-use loops, caching that actually hits, graceful failure, and cost controls.

If you are hiring vetted remote developers experienced with DeepSeek V4 and other LLM APIs — engineers who have shipped agent frameworks, inference pipelines, and LLM features in production — Codersera can match you with the right technical fit, fast. See codersera.com/hire to extend your engineering team with developers who know this stack.