DeepSeek AI has rapidly gained prominence as a Chinese AI lab whose models rival OpenAI's ChatGPT. Its open-source model, DeepSeek R1, is released under the MIT License, which keeps it accessible for both personal and professional use.
DeepSeek-VL2, billed as the first MIT-licensed open-source MoE (Mixture of Experts) vision-language model, can likewise be run locally on Windows using the approaches described below.
Ollama simplifies the installation process and removes the need for a cloud subscription.
Start Ollama: enter the following PowerShell command to launch the app with debug logging enabled:
$env:OLLAMA_DEBUG="1"; & "ollama app.exe"
%LOCALAPPDATA%\Ollama
%LOCALAPPDATA%\Programs\Ollama
%HOMEPATH%\.ollama
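These default folders hold, respectively, Ollama's logs and app data, its installed binaries, and the downloaded models; that folder-purpose mapping reflects Ollama's standard Windows layout and is otherwise an assumption. A quick Python check using the paths exactly as listed above:

# Check the default Ollama locations on Windows (paths as listed above)
import os
from pathlib import Path

locations = {
    "logs / app data": Path(os.environ["LOCALAPPDATA"]) / "Ollama",
    "installed binaries": Path(os.environ["LOCALAPPDATA"]) / "Programs" / "Ollama",
    "models and keys": Path.home() / ".ollama",
}
for label, path in locations.items():
    print(f"{label:20s} {path} -> {'exists' if path.exists() else 'missing'}")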
If you prefer not to download any software, you can access DeepSeek through its web interface instead.
This guide explains how to deploy the DeepSeek model using the vLLM framework.
Install Required Packages: install the necessary libraries with pip:
pip install vllm==0.6.6.post1
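Once vLLM is installed, a minimal text-only generation sketch with its Python API looks like the following; the model ID deepseek-ai/deepseek-vl2-tiny is an assumption here, so confirm that your vLLM build actually supports DeepSeek-VL2 before relying on it:

# Minimal vLLM generation sketch (text-only); the model ID is an assumption
from vllm import LLM, SamplingParams

llm = LLM(model="deepseek-ai/deepseek-vl2-tiny", trust_remote_code=True)
params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=256)

outputs = llm.generate(["Describe what a vision-language model does."], params)
print(outputs[0].outputs[0].text)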
setx OLLAMA_GPUS "1"
Verify the installation from an Administrator Command Prompt:
ollama list
Launch with debugging:
$env:OLLAMA_DEBUG="1"; ollama run deepseek-vl2-tiny
Pros: One-click setup, automatic updates
Cons: Limited model customization
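Once a model is running under Ollama, you can also query it programmatically through Ollama's local REST API (default port 11434). The sketch below uses only the standard library; the model tag is an assumption, so substitute whichever tag you have pulled locally (deepseek-r1:8b is used later in this guide):

# Query a locally running Ollama server via its REST API (default port 11434)
# The model tag is an assumption; use whatever tag you have pulled locally.
import json
import urllib.request

payload = {
    "model": "deepseek-r1:8b",
    "prompt": "Summarize what DeepSeek-VL2 is in one sentence.",
    "stream": False,
}
req = urllib.request.Request(
    "http://127.0.0.1:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])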
# Create isolated environment
python -m venv deepseek_env
.\deepseek_env\Scripts\activate
# Install core dependencies
pip install torch==2.3.0+cu121 -f https://download.pytorch.org/whl/torch_stable.html
pip install vllm==0.6.6.post1 deepseek-vl2[gradio]
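Before moving on, it is worth confirming that the fresh environment actually sees the GPU; a short sanity check using the packages just installed:

# Verify the environment: CUDA-enabled PyTorch and an importable vLLM
import torch
import vllm

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("vllm:", vllm.__version__)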
FROM nvidia/cuda:12.1.1-devel-ubuntu22.04
# Install Python and pip inside the CUDA base image
RUN apt-get update && apt-get install -y python3.11 python3-pip && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip3 install -r requirements.txt
To set up DeepSeek-VL2 from source, follow these steps:
From the root of the cloned DeepSeek-VL2 repository, install the necessary dependencies using pip:
pip install -e .[gradio]
Example usage of DeepSeek-VL2 in Python:
import torch
from transformers import AutoModelForCausalLM
from deepseek_vl2.models import DeepseekVLV2Processor, DeepseekVLV2ForCausalLM
from deepseek_vl2.utils.io import load_pil_images
# specify the path to the model
model_path = "deepseek-ai/deepseek-vl2-tiny"
# load the processor (handles image and text preprocessing) and grab its tokenizer
vl_chat_processor = DeepseekVLV2Processor.from_pretrained(model_path)
tokenizer = vl_chat_processor.tokenizer

# load the model weights and move them to the GPU in bfloat16 for inference
vl_gpt = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()
To run the Gradio demo, use the following commands:
CUDA_VISIBLE_DEVICES=2 python web_demo.py \
--model_name "deepseek-ai/deepseek-vl2-tiny" \
--port 37914
from deepseek_vl2.utils.io import load_pil_images
conversation = [
{
"role": "<|User|>",
"content": "<image>\nAnalyze this medical scan",
"images": ["./patient_scan.png"]
},
{"role": "<|Assistant|>", "content": ""}
]
pil_images = load_pil_images(conversation, max_size=(1024,1024))
processor = DeepseekVLV2Processor.from_pretrained("deepseek-ai/deepseek-vl2-small")
inputs = processor(
conversations=conversation,
images=pil_images,
force_batchify=True
).to("cuda")
model = DeepseekVLV2ForCausalLM.from_pretrained(
"deepseek-ai/deepseek-vl2-small",
torch_dtype=torch.bfloat16,
device_map="auto"
)
# build multimodal input embeddings from the processed inputs before generation
inputs_embeds = model.prepare_inputs_embeds(**inputs)

outputs = model.generate(
inputs_embeds=inputs_embeds,
attention_mask=inputs.attention_mask,
max_new_tokens=512,
temperature=0.7,
top_p=0.8
)
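The generate call returns token IDs, not text; to read the answer you still need to decode them with the processor's tokenizer. A minimal follow-up sketch, assuming the outputs and processor variables from the snippet above:

# Decode the generated token IDs into a readable answer
answer = processor.tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True)
print(answer)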
Use --chunk_size 512 for memory-constrained systems.
Batch Processing:
llm = LLM(model="deepseek-vl2", max_batch_size=8, gpu_memory_utilization=0.85)
Quantization (FP16 → INT8):
# load the model in 8-bit with bitsandbytes via the transformers quantization config
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-vl2-small",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True, llm_int8_threshold=6.0),
    trust_remote_code=True,
)
ollama serve --tls-cert cert.pem --tls-key key.pem
CI/CD Pipeline:
# GitHub Actions example (a step inside a job's steps list)
- name: DeepSeek Model Test
  run: |
    ollama pull deepseek-vl2-tiny
    pytest vision_tests/
  env:
    OLLAMA_HOST: 127.0.0.1
    CUDA_VISIBLE_DEVICES: 0
Monitoring:
ollama logs --format json | jq '.latency, .gpu_util'
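If your Ollama build does not expose structured log output like this, you can watch GPU utilization directly from Python with NVIDIA's NVML bindings instead; this is a generic monitoring sketch (it assumes the nvidia-ml-py package is installed), not an Ollama-specific API:

# Poll GPU utilization and VRAM usage via NVML (pip install nvidia-ml-py)
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU
try:
    for _ in range(5):
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"GPU util: {util.gpu}%  VRAM used: {mem.used / 2**20:.0f} MiB")
        time.sleep(2)
finally:
    pynvml.nvmlShutdown()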
Problem: CUDA Out-of-Memory Error
Solution:
# Reduce image resolution
processor.image_size = (512,512)
# Enable gradient checkpointing
model.gradient_checkpointing_enable()
Problem: Ollama Connection Refused
Fix:
netsh advfirewall firewall add rule name="Ollama Port" dir=in action=allow protocol=TCP localport=11434
Problem: Slow Inference Speed
Optimizations:
nvidia-smi -pm 1
export OLLAMA_FP8_MATH=1
Edge Deployment:
torch.onnx.export(model, inputs, "deepseek-vl2.onnx", opset_version=18)
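After a successful export, the ONNX file can be loaded and inspected with onnxruntime; whether the full multimodal model exports cleanly is not guaranteed, so treat this as a sketch (it assumes onnxruntime-gpu is installed):

# Load the exported graph and list its expected inputs
import onnxruntime as ort

session = ort.InferenceSession(
    "deepseek-vl2.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
for inp in session.get_inputs():
    print(inp.name, inp.shape, inp.type)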
Model Updates:
ollama pull deepseek-vl2-2024Q3
To start using DeepSeek with Ollama, run the following command:
ollama run deepseek-r1:8b
By following these instructions and using the code examples above, you can run DeepSeek-VL2 effectively in a Windows environment.
Need expert guidance? Connect with a top Codersera professional today!