DeepSeek-R1-0528 is a cutting-edge open-source large language model (LLM) designed for developers, researchers, and AI enthusiasts. With state-of-the-art benchmark performance, advanced reasoning capabilities, and support for JSON output and function calling, this model stands out for both experimentation and production use.
In this guide, you'll learn how to run and install DeepSeek-R1-0528 on your local machine using Ollama, vLLM, and Hugging Face Transformers.
DeepSeek-R1-0528 is the latest entry in the DeepSeek-R1 series, offering stronger benchmark performance, deeper reasoning, and support for JSON output and function calling.
The model is freely available with open-source weights on Hugging Face.
Before setting up DeepSeek-R1-0528, ensure your hardware meets the minimum system requirements based on the model size.
| Component | Minimum (1.5B) | Recommended (7B–8B) | Large Models (14B–32B) | Enterprise (671B) |
|---|---|---|---|---|
| CPU | Intel i7 / AMD Ryzen 7 (8 cores) | 3.5GHz+ latest-gen | Server-grade, multi-socket | Multi-socket, high core count |
| RAM | 16GB | 32–64GB | 64–128GB | 256GB+ |
| GPU | NVIDIA RTX 3060 (12GB VRAM) | A100, H100 (16–24GB VRAM) | 24–48GB VRAM | 80GB+ VRAM |
| Storage | 512GB NVMe SSD | 1–2TB NVMe SSD (PCIe Gen 4/5) | Multiple NVMe SSDs (RAID) | Enterprise SSD arrays |
Tip: Use quantized versions for better performance on consumer-grade GPUs.
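Not sure what your GPU offers? A quick PyTorch check prints the VRAM per device so you can match it against the table above (a minimal sketch; assumes PyTorch with CUDA support is installed):

```python
import torch

# Print each visible GPU and its total VRAM.
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB")
else:
    print("No CUDA-capable GPU detected.")
```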
You can install and run DeepSeek-R1-0528 locally using three main methods: Ollama, vLLM, or Hugging Face Transformers.
Ollama simplifies LLM deployment and is ideal for getting started quickly.
1. Check GPU Compatibility
```bash
nvidia-smi
```
2. Update System and Install Dependencies
```bash
sudo apt-get update
sudo apt-get install pciutils -y
```
3. Install Ollama
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
4. Start the Ollama Server
```bash
ollama serve
```
5. Verify Installation
```bash
ollama --version
```
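You can additionally confirm that the server from step 4 is reachable (assumes the default port 11434):

```bash
# The root endpoint replies "Ollama is running" when the server is up.
curl http://localhost:11434
```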
6. Install the DeepSeek-R1-0528 Model
Currently available model (quantized):
```bash
ollama run hf.co/unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF:Q4_K_XL
```
7. Run Inference
Use the terminal interface to send prompts and receive real-time responses.
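The same model can also be queried over Ollama's local REST API (port 11434 by default); a minimal sketch:

```bash
# Send a single prompt and receive the full response as one JSON object.
curl http://localhost:11434/api/generate -d '{
  "model": "hf.co/unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF:Q4_K_XL",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```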
vLLM is ideal for high-throughput inference in scalable environments.
1. Install vLLM
```bash
pip install vllm
```
2. Download Model Weights
From Hugging Face:
https://huggingface.co/deepseek-ai/DeepSeek-R1-0528
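Optionally, pre-download the weights with the Hugging Face CLI rather than letting vLLM fetch them on first launch (assumes `huggingface_hub` is installed; the full checkpoint is several hundred gigabytes):

```bash
# Download the full DeepSeek-R1-0528 checkpoint into the local HF cache.
huggingface-cli download deepseek-ai/DeepSeek-R1-0528
```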
3. Launch the API Server
```bash
python -m vllm.entrypoints.openai.api_server \
  --model deepseek-ai/DeepSeek-R1-0528 \
  --tokenizer deepseek-ai/DeepSeek-R1-0528
```
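At full size, the model will not fit on a single GPU; vLLM can shard the weights across devices with tensor parallelism (a sketch assuming an 8-GPU node):

```bash
# --tensor-parallel-size splits the model's weights across 8 GPUs.
python -m vllm.entrypoints.openai.api_server \
  --model deepseek-ai/DeepSeek-R1-0528 \
  --tensor-parallel-size 8
```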
4. Query the Model
Use the OpenAI-compatible API to send and receive prompts.
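Because the endpoint is OpenAI-compatible (port 8000 by default), the standard `openai` Python client works unchanged; a minimal sketch:

```python
from openai import OpenAI

# Point the client at the local vLLM server; the API key is unused locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528",
    messages=[{"role": "user", "content": "Explain chain-of-thought reasoning in one paragraph."}],
    temperature=0.6,  # DeepSeek's recommended sampling temperature
)
print(response.choices[0].message.content)
```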
Use Hugging Face Transformers for programmatic integration and full control over inference.
1. Install Required Packages
```bash
pip install torch transformers
```
2. Load the Model in Python
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# The full checkpoint is very large; deepseek-ai/DeepSeek-R1-0528-Qwen3-8B is a lighter distilled alternative.
model_id = "deepseek-ai/DeepSeek-R1-0528"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

prompt = "Explain the difference between monorepos and turborepos."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
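For chat-style use, passing messages through the tokenizer's chat template usually gives better results with reasoning models (a sketch using the standard Transformers API and DeepSeek's recommended 0.6 temperature, noted below):

```python
# Format the conversation with the model's own chat template, then sample.
messages = [{"role": "user", "content": "Explain the difference between monorepos and turborepos."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```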
DeepSeek-R1-0528 supports structured prompts, file handling, and web search integration. The following recommended defaults and templates apply.

System prompt (the date is filled in per request):

```
The assistant is DeepSeek-R1, created by DeepSeek.
Today is {current date}.
```

Recommended sampling temperature: 0.6.
Template for prompts that include an uploaded file:

```python
file_template = """[file name]: {file_name}
[file content begin]
{file_content}
[file content end]
{question}"""
```
Template for answering from web search results (the `...` marks content elided here):

```python
search_answer_en_template = '''
# The following contents are the search results related to the user's message:
{search_results}
...
# The user's message is:
{question}'''
```
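Both templates are plain Python format strings, so wiring them into any of the three inference paths is a single `.format(...)` call; a small sketch:

```python
# Fill the file template with a hypothetical uploaded document and
# use the result as the prompt for any of the methods above.
prompt = file_template.format(
    file_name="notes.txt",  # hypothetical example file
    file_content=open("notes.txt", encoding="utf-8").read(),
    question="Summarize the key points of this file.",
)
```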
Tip: Use venv or conda for dependency isolation.

DeepSeek-R1-0528 is licensed under the MIT License, allowing unrestricted commercial use, modification, and redistribution.
DeepSeek-R1-0528 empowers you to run a powerful, open-source LLM locally with a method that fits your workflow—Ollama for simplicity, vLLM for scalability, or Transformers for flexibility.