The DeepSeek API is a drop-in replacement for the OpenAI API. Change your base URL and model name, and every Python or Node.js app built against OpenAI's SDK starts routing requests to DeepSeek V3.2 instead. This DeepSeek API tutorial walks you through the full setup — from generating an API key to handling the reasoning tokens that deepseek-reasoner produces — so you can integrate either model into production code today.
DeepSeek V3.2 competes with GPT-4o and Claude Sonnet on most developer benchmarks while costing a fraction of what those APIs charge per million tokens. More importantly for teams already on OpenAI: the DeepSeek API is fully compatible with the OpenAI REST spec. There is no new SDK to learn, no new message format, and no migration script. You point your existing client at https://api.deepseek.com and the rest stays the same.
The API exposes two models, each serving a distinct use case:
deepseek-chat is DeepSeek V3.2 in standard mode. It behaves like any capable large language model — fast, cost-efficient, suited to classification, summarisation, code generation, and general Q&A. No special output structure, no extra tokens.
deepseek-reasoner is DeepSeek V3.2 in thinking mode. Before it writes the final answer, the model produces a full chain-of-thought trace stored in a separate reasoning_content field. This makes it significantly stronger on multi-step problems: maths, complex code review, legal analysis, and anything that benefits from explicit intermediate reasoning. The trade-off is latency and cost — reasoning tokens are generated and billed in addition to the answer tokens.
For a deeper look at how DeepSeek's reasoning and chat variants compare on benchmarks, see our DeepSeek V3 vs DeepSeek R1 technical comparison.
# Linux / macOS
export DEEPSEEK_API_KEY="sk-..."
# Windows (PowerShell)
$Env:DEEPSEEK_API_KEY = "sk-..."
Add the export to your .bashrc, .zshrc, or your CI/CD secrets store so it persists across sessions.
You do not need to install a DeepSeek-specific package. The standard OpenAI SDK handles everything once you override base_url.
pip install openai
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com",
)
That is the entire configuration. Every client.chat.completions.create() call you write from this point targets the DeepSeek API.
npm install openai
import OpenAI from "openai";
const client = new OpenAI({
apiKey: process.env.DEEPSEEK_API_KEY,
baseURL: "https://api.deepseek.com",
});
The rest of this guide uses Python examples, but every pattern translates directly to the JavaScript SDK — the method names and response shapes are identical.
The following example sends a single user message to deepseek-chat and prints the response:
response = client.chat.completions.create(
model="deepseek-chat",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain what a context window is in one sentence."},
],
)
print(response.choices[0].message.content)
The response object is identical to an OpenAI response. response.choices[0].message.content holds the assistant text. response.usage breaks down prompt tokens, completion tokens, and total tokens for billing.
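Those usage fields are worth logging on every call. A minimal sketch of a formatter — the format_usage name and output layout are mine, not part of the SDK; it works with any object exposing the three token fields the API returns:

```python
def format_usage(usage) -> str:
    """Format a chat completion usage object for logging.

    Works with any object exposing prompt_tokens, completion_tokens,
    and total_tokens -- the fields the DeepSeek API returns.
    """
    return (
        f"prompt={usage.prompt_tokens} "
        f"completion={usage.completion_tokens} "
        f"total={usage.total_tokens}"
    )

# Typical use after a call:
# print(format_usage(response.usage))
```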
Switching to deepseek-reasoner is a one-word change:
response = client.chat.completions.create(
model="deepseek-reasoner",
messages=[
{"role": "user", "content": "What is the derivative of x^3 + 2x?"},
],
)
# The final answer
print(response.choices[0].message.content)
# The full chain-of-thought trace (unique to deepseek-reasoner)
print(response.choices[0].message.reasoning_content)
For interactive applications, streaming avoids the long blank wait before any output appears. Set stream=True and iterate over the returned chunks:
stream = client.chat.completions.create(
model="deepseek-chat",
messages=[
{"role": "user", "content": "Write a Python function to flatten a nested list."},
],
stream=True,
)
for chunk in stream:
delta = chunk.choices[0].delta
if delta.content:
print(delta.content, end="", flush=True)
When streaming with deepseek-reasoner, the response carries two sequential channels — delta.reasoning_content arrives first (the thinking phase), followed by delta.content (the answer). The hasattr guard below prevents attribute errors on SDK versions whose delta objects do not expose a reasoning_content attribute at all:
stream = client.chat.completions.create(
model="deepseek-reasoner",
messages=[
{"role": "user", "content": "Prove that sqrt(2) is irrational."},
],
stream=True,
)
for chunk in stream:
delta = chunk.choices[0].delta
# reasoning_content is only present on deepseek-reasoner responses
if hasattr(delta, "reasoning_content") and delta.reasoning_content:
print(delta.reasoning_content, end="", flush=True)
elif delta.content:
print(delta.content, end="", flush=True)
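If you need the full text after the stream ends rather than printed output, the same branching can be factored into a small helper that routes each delta into separate buffers. A sketch — the route_delta name and the list-buffer convention are mine, not part of the SDK:

```python
def route_delta(delta, reasoning_parts: list, answer_parts: list) -> None:
    """Append a streamed delta to the correct buffer.

    getattr() guards against SDK versions where the delta object
    has no reasoning_content attribute at all.
    """
    reasoning = getattr(delta, "reasoning_content", None)
    if reasoning:
        reasoning_parts.append(reasoning)
    elif delta.content:
        answer_parts.append(delta.content)

# After the stream is exhausted:
#   full_reasoning = "".join(reasoning_parts)
#   full_answer = "".join(answer_parts)
```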
The reasoning_content field is the most powerful — and most mishandled — part of the deepseek-reasoner API. Here is what developers get wrong in multi-turn conversations.
Critical: Never include reasoning_content from a previous assistant turn when constructing the next request's message list. The API returns a 400 error if it detects reasoning_content in the input messages. Strip it before building the next turn.
Correct multi-turn pattern:
messages = [{"role": "user", "content": "What is 17 x 23?"}]
response = client.chat.completions.create(
model="deepseek-reasoner",
messages=messages,
)
assistant_msg = response.choices[0].message
# Build next turn: only include role + content, NOT reasoning_content
messages.append({
"role": "assistant",
"content": assistant_msg.content,
# Do NOT append reasoning_content here -- it causes a 400 error
})
messages.append({"role": "user", "content": "Now multiply that result by 4."})
response2 = client.chat.completions.create(
model="deepseek-reasoner",
messages=messages,
)
print(response2.choices[0].message.content)
If you are storing conversation history in a database, save reasoning_content separately for observability or debugging — but never rehydrate it into the messages array sent to the API.
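One way to enforce that rule at a single point is a sanitiser applied to stored history before every request. A sketch — the strip_reasoning name is mine, and it assumes history is stored as plain dicts:

```python
def strip_reasoning(history: list[dict]) -> list[dict]:
    """Return a copy of the message history that is safe to send.

    Drops the reasoning_content key from any stored assistant turn;
    sending it back to deepseek-reasoner causes a 400 error.
    """
    return [
        {k: v for k, v in msg.items() if k != "reasoning_content"}
        for msg in history
    ]
```

The stored copy keeps reasoning_content for debugging; only the sanitised copy ever reaches the API.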
For details on how DeepSeek V3.2 performs across different API providers and which endpoint configurations give the best throughput, see our DeepSeek V3.2 API providers and performance guide.
To guarantee that the model returns a valid JSON string, set response_format to json_object and include the word "json" in your system or user prompt. Without the prompt keyword, the model may ignore the format instruction.
response = client.chat.completions.create(
model="deepseek-chat",
messages=[
{
"role": "system",
"content": "Return a JSON object with keys: name, language, stars.",
},
{
"role": "user",
"content": "Describe the FastAPI framework as json.",
},
],
response_format={"type": "json_object"},
)
import json
data = json.loads(response.choices[0].message.content)
print(data)
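Even with response_format set, it is worth validating the parsed object before trusting it downstream. A minimal sketch — the helper name and the required-keys check are assumptions about this example's schema, not part of the API:

```python
import json

def parse_json_reply(text: str, required_keys: set[str]) -> dict:
    """Parse a JSON-mode reply and verify the expected keys exist."""
    data = json.loads(text)
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"model reply missing keys: {sorted(missing)}")
    return data
```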
DeepSeek V3.2 supports OpenAI-style tool use in both deepseek-chat and deepseek-reasoner modes. Define tools as JSON Schema objects and pass them in the tools parameter:
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Returns current weather for a city.",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string", "description": "City name"},
},
"required": ["city"],
},
},
}
]
response = client.chat.completions.create(
model="deepseek-chat",
messages=[{"role": "user", "content": "What is the weather in Berlin?"}],
tools=tools,
tool_choice="auto",
)
tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name) # get_weather
print(tool_call.function.arguments) # {"city": "Berlin"}
After receiving the tool call, execute your function, then append a tool role message with the result and make a second API call to get the model's final response. The pattern is identical to OpenAI function calling.
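That round trip can be sketched as follows. The get_weather implementation and the dispatch table are stand-ins of mine; only the tool-role message shape follows the OpenAI convention:

```python
import json

def get_weather(city: str) -> dict:
    # Stand-in implementation; call a real weather service here.
    return {"city": city, "temp_c": 18, "conditions": "cloudy"}

DISPATCH = {"get_weather": get_weather}

def run_tool_call(tool_call) -> dict:
    """Execute a tool call, then build the tool-role message for the next request."""
    fn = DISPATCH[tool_call.function.name]
    args = json.loads(tool_call.function.arguments)
    result = fn(**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result),
    }

# messages.append(response.choices[0].message)  # assistant turn carrying tool_calls
# messages.append(run_tool_call(tool_call))     # tool result
# final = client.chat.completions.create(
#     model="deepseek-chat", messages=messages, tools=tools)
```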
temperature defaults to 1.0. For deterministic tasks (code, structured data), use 0.0 to 0.3. For creative tasks, 1.0 to 1.3.
In thinking mode, top_p, presence_penalty, and frequency_penalty are accepted by the API but have no effect on output.
deepseek-chat and deepseek-reasoner share a 128K token context window. For long documents, chunk and summarise rather than stuffing the full context — reasoning token costs scale with context length.
Retry on 429 (rate throttling during high traffic) with exponential backoff. DeepSeek does not impose hard rate limits but the API can slow under load.
import time
from openai import RateLimitError

def call_with_backoff(client, **kwargs):
    for attempt in range(4):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError:
            if attempt == 3:
                raise
            time.sleep(2 ** attempt)
If you are deploying DeepSeek API calls as part of a serverless application, our guide on integrating DeepSeek with Vercel covers environment variable handling and edge function patterns.
DeepSeek's API pricing is substantially lower than equivalent OpenAI models. The approximate current rates are listed below — verify at platform.deepseek.com/pricing before committing to production budgets.
For cost-sensitive workloads, use deepseek-chat for classification, summarisation, and standard generation. Reserve deepseek-reasoner for tasks where the quality improvement from chain-of-thought reasoning is measurable and justifies the higher output token cost.
DeepSeek does not publish hard rate limits. The platform attempts to serve all requests, but during peak usage you may see increased latency. Implement the retry pattern above, and monitor response.usage to track token consumption per call.
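A small accumulator makes that monitoring concrete. A sketch — the class is mine, and the per-million-token rates are values you supply from the pricing page, not figures I can vouch for:

```python
class UsageTracker:
    """Accumulate token counts across calls and estimate spend.

    Rates are USD per million tokens, taken from the pricing page.
    """

    def __init__(self, input_rate: float, output_rate: float):
        self.input_rate = input_rate
        self.output_rate = output_rate
        self.prompt_tokens = 0
        self.completion_tokens = 0

    def record(self, usage) -> None:
        # Call once per response with response.usage
        self.prompt_tokens += usage.prompt_tokens
        self.completion_tokens += usage.completion_tokens

    def estimated_cost(self) -> float:
        return (self.prompt_tokens * self.input_rate
                + self.completion_tokens * self.output_rate) / 1_000_000
```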
For a broader look at what the current V3.2 model is capable of across different tasks, see the DeepSeek V3.2 benchmarks and feature guide.