Last updated April 2026 — refreshed for current model versions, CUDA 12.8+, and PyTorch 2.7.
JanusFlow 1.3B is DeepSeek's unified multimodal model that handles both image understanding and image generation in a single 1.3B-parameter package. Unlike Janus-Pro (which uses autoregressive generation), JanusFlow uses rectified flow, the flow-matching technique behind recent image generators such as Stable Diffusion 3 and FLUX.1, to produce 384×384 images directly inside an LLM framework. This guide walks through a complete, working Windows installation, updated for CUDA 12.8 and PyTorch 2.7 (April 2026).
What changed since the original 2025 guide (read this first)
- CUDA version: The original guide specified CUDA 12.4. CUDA 12.6 and 12.8 are now the practical targets for Windows + PyTorch. CUDA 13.x is available, but cuDNN 9.x for CUDA 13 is not supported on Windows, so stick with cu126 or cu128 wheels.
- PyTorch version: PyTorch 2.7 (released April 2025) ships pre-built cu128 wheels and adds Blackwell GPU support. Avoid Python 3.13, because PyTorch CUDA wheels are not yet available for it; use Python 3.10, 3.11, or 3.12.
- JanusFlow vs Janus-Pro: JanusFlow 1.3B (November 2024) and Janus-Pro (January 2025) are separate model lines. JanusFlow uses rectified flow; Janus-Pro uses autoregressive generation. Both are still the latest releases from DeepSeek in this series as of April 2026.
- PyTorch install pitfall: DeepSeek's requirements.txt and pyproject.toml pin PyTorch without CUDA support. On Windows you must remove those lines and install the cu126/cu128 wheel manually, or GPU inference will silently fall back to CPU.
- Image quality context: JanusFlow is capped at 384×384 output. For high-quality standalone image generation at larger resolutions, FLUX.1 or Stable Diffusion 3.5 remain stronger. JanusFlow's value is the unified understanding + generation in one model.
TL;DR — Quick Reference
| Item | Requirement |
|---|---|
| OS | Windows 10 / 11 (64-bit) |
| GPU | NVIDIA with 8 GB+ VRAM (CUDA-capable) |
| CUDA Toolkit | 12.6 or 12.8 (recommended; avoid CUDA 13.x on Windows) |
| Python | 3.10, 3.11, or 3.12 (avoid 3.13) |
| PyTorch | 2.7.x with cu126 or cu128 wheel |
| Model size (JanusFlow 1.3B) | ~5 GB on disk (BF16 safetensors) |
| Min disk space | 15 GB free (model + venv + repo) |
| License | MIT (code), DeepSeek Model License (weights) |
Understanding the Janus Model Family
Before diving into installation it helps to know which model you actually want. DeepSeek has released three distinct model lines under the Janus name, with Janus-Pro available in two sizes:
| Model | Released | Parameters | Generation method | Image understanding |
|---|---|---|---|---|
| Janus-1.3B | Oct 2024 | 1.3B | Autoregressive | Yes |
| JanusFlow-1.3B | Nov 2024 | 1.3B (2B effective) | Rectified flow | Yes |
| Janus-Pro-1B | Jan 2025 | 1B | Autoregressive | Yes (improved) |
| Janus-Pro-7B | Jan 2025 | 7B | Autoregressive | Yes (best in series) |
JanusFlow 1.3B is the right pick if you have around 8 GB of VRAM and want the fastest image generation path. Its rectified-flow backend is conceptually closer to Stable Diffusion than to a pure LLM, which makes image output more coherent at low step counts. The Gradio demo for JanusFlow lives in demo/app_janusflow.py, separate from the Janus-Pro demo (demo/app_januspro.py).
If you are building a local AI stack that runs multiple models, check the OpenClaw + Ollama setup guide for running local AI agents — it covers orchestrating Ollama-served models alongside custom Python-deployed models like JanusFlow.
Prerequisites
Hardware
- GPU (required for inference): NVIDIA GPU with 8 GB+ VRAM. The JanusFlow 1.3B model fits in 8 GB in BF16 precision. The 7B Janus-Pro variant needs 16 GB+. AMD and Intel GPUs are not supported by the official code.
- RAM: 16 GB system RAM minimum; 32 GB recommended when running the Gradio demo alongside a browser.
- Storage: 15 GB free space (5 GB for model weights, ~2 GB for the Python venv, remainder for the repo and temp files).
Software dependencies (install before cloning)
- NVIDIA Driver (latest studio or game-ready)
Download from https://www.nvidia.com/download/index.aspx. Verify with nvidia-smi in PowerShell. CUDA 12.x needs a 525-series or newer driver (NVIDIA's compatibility table lists 528.33 as the Windows minimum); a current driver is the safer choice for CUDA 12.8.
- Python 3.10, 3.11, or 3.12
Download from python.org. During setup, check "Add Python to PATH." Verify: python --version.
Do not use Python 3.13: PyTorch CUDA wheels are not yet built for it.
- Git for Windows
Download from https://git-scm.com/download/win.
- Microsoft Visual C++ Build Tools 2022
Required for compiling any native extensions. Install via the Visual Studio installer and choose the "Desktop development with C++" workload. Without this, pip install -e . will fail on C extension steps.
CUDA Toolkit 12.6 or 12.8
CUDA 12.8 is the current recommended version for PyTorch 2.7 on Windows. Download from the CUDA Toolkit Archive. After installation, verify:
nvcc --version
# Expected: Cuda compilation tools, release 12.8, V12.8.x
Do not install CUDA 13.x for this use case — cuDNN 9.x for CUDA 13 is not supported on Windows as of April 2026.
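The toolkit check can also be scripted. The following is a minimal sketch, assuming nvcc --version prints a line containing "release X.Y" as in the expected output shown earlier; the function names and the sample string are illustrative, not part of any NVIDIA tooling.

```python
import re

def parse_nvcc_release(nvcc_output: str):
    """Return (major, minor) from `nvcc --version` text, or None if no release is found."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def is_recommended_toolkit(release) -> bool:
    """True for the CUDA 12.6 and 12.8 toolkits this guide targets."""
    return release in {(12, 6), (12, 8)}

# Illustrative sample line; real output contains several more lines.
sample = "Cuda compilation tools, release 12.8, V12.8.61"
release = parse_nvcc_release(sample)
print(release, is_recommended_toolkit(release))
```

To check a real installation, pass the stdout of subprocess.run(["nvcc", "--version"], capture_output=True, text=True) to parse_nvcc_release instead of the sample string.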
Step-by-Step Installation
Step 1: Clone the repository
Open PowerShell or Windows Terminal and run:
git clone https://github.com/deepseek-ai/Janus.git
cd Janus
Step 2: Create a Python virtual environment
Using a virtual environment keeps JanusFlow's dependencies isolated from your system Python:
python -m venv janus-env
janus-env\Scripts\activate
Your prompt should now show (janus-env).
Step 3: Remove the CPU-only PyTorch pin from the project files
This is the most common source of the "Torch not compiled with CUDA enabled" error. DeepSeek's repo declares PyTorch as a dependency without a CUDA index URL, which means pip will silently install the CPU-only wheel.
Open pyproject.toml in a text editor and delete any line that starts with torch. Then open requirements.txt (if it exists) and do the same. Save both files.
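If you prefer to script this edit, here is a minimal sketch of the same rule (drop every line that starts with torch once indentation and quotes are ignored). The helper name strip_torch_pins and the example dependency list are hypothetical; check the real files before overwriting them.

```python
def strip_torch_pins(text: str) -> str:
    """Drop every line whose first token starts with 'torch', ignoring indentation and quotes."""
    kept = []
    for line in text.splitlines():
        token = line.strip().lstrip("\"'")
        if token.startswith("torch"):
            continue  # matches e.g. torch>=2.0.1 and "torch>=2.0.1",
        kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")

# Illustrative pyproject.toml fragment, not copied from the repo.
example = 'dependencies = [\n    "torch>=2.0.1",\n    "transformers>=4.38.2",\n]\n'
print(strip_torch_pins(example))
```

To apply it, read the file with pathlib.Path("requirements.txt").read_text(), pass the text through strip_torch_pins, and write the result back after keeping a backup copy.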
Step 4: Install PyTorch with CUDA support
Install the PyTorch 2.7 wheel that matches your CUDA installation. Choose the command for your CUDA version:
For CUDA 12.8 (recommended):
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
For CUDA 12.6:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
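The two commands differ only in the wheel tag at the end of the index URL. As a small sketch of that mapping (the helper names are hypothetical, and only the two CUDA versions covered in this guide are included):

```python
# The two CUDA targets used in this guide and their wheel tags.
SUPPORTED_TAGS = {"12.6": "cu126", "12.8": "cu128"}

def pytorch_index_url(cuda_version: str) -> str:
    """Map a CUDA toolkit version such as '12.8' to the matching wheel index URL."""
    try:
        tag = SUPPORTED_TAGS[cuda_version]
    except KeyError:
        raise ValueError(f"No wheel target in this guide for CUDA {cuda_version}") from None
    return f"https://download.pytorch.org/whl/{tag}"

def pip_command(cuda_version: str) -> str:
    """Build the full pip install command for the given CUDA version."""
    return ("pip install torch torchvision torchaudio "
            f"--index-url {pytorch_index_url(cuda_version)}")

print(pip_command("12.8"))
```

Raising on unknown versions is deliberate: a silent fallback is exactly how the CPU-only wheel ends up installed.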
Verify GPU access after installation:
python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0))"
# Expected: True
# Expected: NVIDIA GeForce RTX XXXX
If this prints False, stop here and re-check your CUDA installation before continuing — running JanusFlow on CPU is extremely slow (30–60 minutes per image).
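When the check prints False, it helps to know whether the wheel itself is CPU-only or whether the driver is the problem. The sketch below distinguishes the two using torch.version.cuda, which is None on CPU-only builds. The function name cuda_report is hypothetical, and it takes the torch module as an argument so it can be exercised without a GPU.

```python
def cuda_report(torch_module) -> str:
    """Summarise whether the given torch module can see a CUDA GPU."""
    built_for = torch_module.version.cuda  # None on CPU-only wheels
    if not torch_module.cuda.is_available():
        if built_for is None:
            return "CPU-only PyTorch wheel: reinstall from the cu126/cu128 index"
        return f"Wheel built for CUDA {built_for}, but no usable GPU or driver was found"
    name = torch_module.cuda.get_device_name(0)
    return f"OK: {name} (wheel built for CUDA {built_for})"

if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print(cuda_report(torch))
```

The first message points back to Step 3 and Step 4; the second usually points to the NVIDIA driver rather than to pip.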
Step 5: Install JanusFlow and its dependencies
JanusFlow requires the diffusers library for its VAE decoder. Install everything:
pip install -e .
pip install diffusers[torch]
For the Gradio demo interface, add:
pip install -e .[gradio]
Step 6: Download the model weights
The model will auto-download on first run from Hugging Face (~5 GB). To pre-download and avoid timeouts during demo startup, use:
from huggingface_hub import snapshot_download
snapshot_download(
repo_id="deepseek-ai/JanusFlow-1.3B",
local_dir="./models/JanusFlow-1.3B"
)
Save the snippet above as download_model.py in the repo root, then run it: python download_model.py
The model card is at huggingface.co/deepseek-ai/JanusFlow-1.3B. You can also browse it to verify checksums before running.
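For the checksum comparison, a minimal sketch using only the standard library is shown here. It assumes the weights are stored as *.safetensors files in the download directory; the helper names are hypothetical. Hugging Face shows a SHA256 value on the page of each large file, which is what the output should be compared against.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large weight files never load fully into RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def hash_weights(model_dir) -> dict:
    """Map each *.safetensors file name in model_dir to its SHA-256 hex digest."""
    return {p.name: sha256_of(p) for p in sorted(Path(model_dir).glob("*.safetensors"))}
```

Calling hash_weights("./models/JanusFlow-1.3B") after the download returns one digest per weight file; an empty dictionary means no safetensors files were found at that path.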
Step 7: Run the JanusFlow demo
The JanusFlow demo is separate from the Janus-Pro demo:
python demo/app_janusflow.py
If you pre-downloaded the model to a custom path, edit the model_path variable at the top of demo/app_janusflow.py:
model_path = "./models/JanusFlow-1.3B"
The Gradio interface will open at http://127.0.0.1:7860. On first load expect 30–60 seconds for the model to warm up on GPU. Once loaded, text-to-image generation takes 10–30 seconds per image on an RTX 3080/4070-class card.
Using JanusFlow Programmatically
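To run JanusFlow in a script without the Gradio UI, use the MultiModalityCausalLM class directly. Image-understanding example:
import torch
from PIL import Image
# JanusFlow's classes live under janus.janusflow in the official repo;
# janus.models holds the Janus and Janus-Pro variants.
from janus.janusflow.models import MultiModalityCausalLM, VLChatProcessor
model_path = "deepseek-ai/JanusFlow-1.3B"
processor = VLChatProcessor.from_pretrained(model_path)
tokenizer = processor.tokenizer
# Load the weights, cast to BF16, move to the GPU, and switch to eval mode
model = MultiModalityCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True
).to(torch.bfloat16).cuda().eval()
# Image understanding: one user turn with an image, one empty assistant turn
conversation = [
    {
        "role": "User",
        "content": "<image_placeholder>\nDescribe this image.",
        "images": ["./example.png"]
    },
    {"role": "Assistant", "content": ""}
]
# Load the image referenced in the conversation as RGB
pil_images = [Image.open("./example.png").convert("RGB")]
# Tokenize the text and preprocess the image into a single batch
prepare_inputs = processor(
    conversations=conversation,
    images=pil_images,
    force_batchify=True
).to(model.device)
# Run the vision encoder and merge image features into the token embeddings
inputs_embeds = model.prepare_inputs_embeds(**prepare_inputs)
# Generate the answer with the language model (greedy decoding)
outputs = model.language_model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=prepare_inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=512,
    do_sample=False,
)
answer = tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True)
print(answer)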
Choosing the Right Model: JanusFlow vs Janus-Pro
The key technical difference: JanusFlow uses rectified flow (a diffusion-adjacent approach) for image generation, while Janus-Pro uses autoregressive token prediction. In practice:
- JanusFlow generates smoother, more coherent images at low step counts. It is the better choice when you want fast text-to-image output with a small model (8 GB VRAM). Understanding performance is slightly lower than Janus-Pro-7B.
- Janus-Pro-7B scores higher on multimodal understanding benchmarks and on generation benchmarks (the Janus-Pro report lists 0.80 on GenEval and 84.19 on DPG-Bench), but needs 16 GB+ VRAM. It is the better choice for visual question answering, chart interpretation, and complex image-text reasoning.
- Janus-Pro-1B is roughly comparable to JanusFlow on 8 GB hardware but uses autoregressive generation — try both and compare for your use case.
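To make "low step counts" concrete: rectified flow sampling integrates an ODE dz/dt = v(z, t) from noise at t = 0 to data at t = 1, usually with fixed-size Euler steps. The toy sketch below uses a hand-written velocity field in place of the network, so toy_velocity, noise, and target are all hypothetical stand-ins. In JanusFlow the velocity is predicted by the model; this is not its sampling code.

```python
def euler_sample(velocity, z0, num_steps):
    """Integrate dz/dt = velocity(z, t) from t=0 to t=1 with fixed-size Euler steps."""
    z = list(z0)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        v = velocity(z, t)
        z = [zi + dt * vi for zi, vi in zip(z, v)]
    return z

target = [0.5, -1.0, 2.0]   # stand-in for image data (hypothetical values)
noise = [1.2, 0.3, -0.7]    # stand-in for the Gaussian starting point

def toy_velocity(z, t):
    # On a perfectly straight path z_t = (1 - t) * noise + t * target,
    # the velocity is the constant (target - noise) at every z and t.
    return [x1 - x0 for x0, x1 in zip(noise, target)]

for steps in (1, 4, 30):
    print(steps, euler_sample(toy_velocity, noise, steps))
```

With a perfectly straight path, even a single Euler step lands on the target up to floating-point rounding. A trained model's paths are only approximately straight, which is why more steps still help but fewer are needed than with curved diffusion trajectories.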
Performance and Benchmarks
The following numbers come from the JanusFlow paper (arXiv:2411.07975) published November 2024. No 2026 benchmarks for this specific model have been published — it has received no updates since release.
Multimodal Understanding (JanusFlow 1.3B)
| Benchmark | JanusFlow 1.3B | Notes |
|---|---|---|
| MMBench | 74.9 | General vision-language understanding |
| SeedBench | 70.5 | Comprehensive multimodal evaluation |
| GQA | 60.3 | Compositional visual question answering |
Image Generation (JanusFlow 1.3B)
| Benchmark | JanusFlow 1.3B | SDXL (reference) | SDv1.5 (reference) |
|---|---|---|---|
| GenEval Overall | 0.63 | ~0.55 | ~0.43 |
| MJHQ FID-30k | 9.51 | ~9.55 | ~38.0 |
| DPG-Bench | 80.09% | — | — |
Lower FID is better. Higher GenEval and DPG-Bench are better. All numbers from the November 2024 paper — verify against current leaderboards for production decisions.
Key limitation: JanusFlow is capped at 384×384 pixel output. For production image generation at 1024×1024 or higher, FLUX.1 (Black Forest Labs) or Stable Diffusion 3.5 are stronger choices. JanusFlow's value is the single unified model that does both understanding and generation.
Running Configuration
Choosing inference settings
- 8 GB VRAM: Run JanusFlow-1.3B in BF16 (default). Avoid batching more than one image at a time.
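- GPU memory pressure: Set the environment variable before launching. In PowerShell use $env:PYTORCH_CUDA_ALLOC_CONF = "expandable_segments:True"; in cmd.exe use set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True. The setting is intended to reduce fragmentation-related OOM errors. If PyTorch warns that expandable segments are not supported on your platform, it is ignored and can be dropped.
- Temperature / CFG: The demo exposes classifier-free guidance scale and temperature. Lower CFG (5–7) produces more coherent images. Higher values (12+) can cause oversaturation.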
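For intuition about what the CFG scale does, here is a sketch of the general classifier-free guidance formula: the final prediction starts at the unconditional output and moves toward the conditional one, overshooting when the scale exceeds 1. This is the textbook form, not code taken from the demo, and apply_cfg and the numbers are illustrative.

```python
def apply_cfg(cond, uncond, scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), element by element."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

cond = [0.8, -0.2]    # prediction with the text prompt (illustrative numbers)
uncond = [0.5, 0.1]   # prediction with an empty prompt

for scale in (1.0, 5.0, 12.0):
    print(scale, apply_cfg(cond, uncond, scale))
```

At scale 1 the result equals the conditional prediction. Larger scales push every element further from the unconditional one, which is consistent with very high values producing exaggerated, oversaturated output.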
Model switcher (1B → 7B)
To run Janus-Pro-7B instead of JanusFlow-1.3B, open demo/app_januspro.py and change:
# Change from:
model_path = "deepseek-ai/Janus-Pro-1B"
# To:
model_path = "deepseek-ai/Janus-Pro-7B"
Janus-Pro-7B requires 16 GB+ VRAM. On an RTX 3090 (24 GB) it runs without modification.
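The VRAM requirement follows from simple arithmetic: BF16 stores each parameter in 2 bytes. The sketch below estimates weight memory only, using the nominal parameter counts from the model names. Activations, the CUDA context, and any extra encoder weights come on top, so treat the figures as lower bounds.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone; BF16 and FP16 use 2 bytes per parameter."""
    return num_params * bytes_per_param / 1024**3

# Nominal parameter counts taken from the model names (assumption).
for name, params in [("JanusFlow-1.3B", 1.3e9), ("Janus-Pro-7B", 7e9)]:
    print(f"{name}: {weight_memory_gib(params):.1f} GiB of weights in BF16")
```

Roughly 13 GiB of weights for the 7B model leaves no room on a 12 GB card and little headroom on a 16 GB one, which is why 16 GB+ is the stated floor.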
Common Pitfalls and Troubleshooting
"AssertionError: Torch not compiled with CUDA enabled"
This is the most common issue on Windows. Cause: pip installed the CPU-only PyTorch wheel because the CUDA index URL was not specified.
Fix: Uninstall torch, then reinstall with the explicit CUDA index:
pip uninstall torch torchvision torchaudio -y
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
Gradio loads but model produces no output
Usually a VRAM exhaustion issue. The Gradio interface may initialize without the model actually fitting into GPU memory. Check:
nvidia-smi
If VRAM is near 100% before you submit a prompt, other processes are consuming GPU memory. Close Chrome, gaming overlays, or other GPU processes, then restart the demo. Set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True before launching.
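The same check can be automated with nvidia-smi's CSV query mode. This is a minimal sketch: the query flags are standard nvidia-smi options, while the function names and the sample line are illustrative.

```python
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]

def parse_memory_csv(csv_text: str):
    """Turn lines like '7621, 8192' (MiB) into (used, total) integer pairs, one per GPU."""
    pairs = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        pairs.append((used, total))
    return pairs

def query_gpu_memory():
    """Run nvidia-smi and return parsed usage; raises if nvidia-smi is missing or fails."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory_csv(result.stdout)

# Illustrative sample output for a single 8 GB card.
for used, total in parse_memory_csv("7621, 8192\n"):
    print(f"{used}/{total} MiB used ({100 * used / total:.0f}%)")
```

Calling query_gpu_memory() before launching the demo gives a baseline; if most of the memory is already in use at that point, the model has little chance of fitting.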
"DLL not found" or import errors on Windows
Install Microsoft Visual C++ Redistributable 2022 from https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist. Then update your NVIDIA drivers via GeForce Experience or the NVIDIA Driver download page.
Slow download from Hugging Face
If snapshot_download times out, set HF_ENDPOINT to a mirror, or download the safetensor files individually from the model card and place them in a local directory. Point model_path to that local directory in the demo script — the from_pretrained call accepts local paths.
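Before switching to a mirror, it can be worth simply retrying. The helper below is a generic retry-with-backoff sketch; with_retries is a hypothetical name, and the broad exception handling is an assumption made because the exact error types depend on the library version.

```python
import time

def with_retries(action, attempts: int = 3, base_delay: float = 2.0, sleep=time.sleep):
    """Call action() until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return action()
        except Exception as error:  # broad on purpose: network errors vary by library version
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error unchanged
            wait = base_delay * 2 ** attempt
            print(f"Attempt {attempt + 1} failed ({error}); retrying in {wait:.0f}s")
            sleep(wait)
```

A typical call would wrap the download from Step 6, for example with_retries(lambda: snapshot_download(repo_id="deepseek-ai/JanusFlow-1.3B", local_dir="./models/JanusFlow-1.3B")). Files that already finished downloading should not need to be fetched again on the next attempt.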
Python 3.13 / PyTorch incompatibility
PyTorch 2.7 CUDA wheels are not available for Python 3.13 (as of April 2026). Downgrade to 3.12 or use pyenv-win to manage multiple Python versions side by side.
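A quick way to confirm which interpreter a virtual environment actually uses is to check it from Python itself. This is a small sketch; the SUPPORTED set simply mirrors the versions recommended in this guide.

```python
import sys

SUPPORTED = {(3, 10), (3, 11), (3, 12)}  # versions this guide recommends

def is_supported_python(version_info=sys.version_info) -> bool:
    """True when the interpreter's major.minor is one of the recommended versions."""
    return tuple(version_info[:2]) in SUPPORTED

print(sys.version.split()[0], "supported:", is_supported_python())
```

Run it inside the activated janus-env environment, since the system Python and the venv Python can differ.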
CUDA 13.x + Windows cuDNN failure
NVIDIA has not released cuDNN 9.x for CUDA 13.x on Windows as of April 2026. If you upgraded to CUDA 13.x, uninstall it and install CUDA 12.8 instead. Your GPU driver does not need to be downgraded — just the CUDA Toolkit.
What About Janus and JanusFlow on Ollama?
As of April 2026, Ollama does not support JanusFlow or Janus-Pro natively. The rectified flow generation pipeline requires custom PyTorch code not yet packaged in GGUF/GGML format. Running JanusFlow requires the Python install method described in this guide. Monitor the Janus GitHub repo for community-contributed Ollama or llama.cpp ports.
For models that do run in Ollama today (including DeepSeek-R2 and Qwen-3 family models), the OpenClaw + Ollama setup guide for running local AI agents provides a practical orchestration approach alongside custom Python inference.
Integrating JanusFlow With Your Development Workflow
If you are building a product or research tool on top of JanusFlow, the setup described here works well for experimentation, but productionizing it typically requires a dedicated inference server with proper resource management. Codersera maintains a guide to running DeepSeek Janus-Pro 7B on Windows and a ComfyUI integration for Janus-Pro on Windows that covers workflow-based generation pipelines. If you need a team of engineers who specialize in AI model deployment, Codersera's vetted AI engineers are available for contract and full-time remote engagements.
FAQ
What is the difference between JanusFlow and Janus-Pro?
JanusFlow uses rectified flow (similar to a diffusion model) for image generation, while Janus-Pro uses autoregressive next-token prediction. JanusFlow is better for fast, smooth image generation on 8 GB VRAM. Janus-Pro-7B is better for complex visual understanding tasks but needs 16 GB+ VRAM.
Can I run JanusFlow 1.3B without a GPU?
Technically yes — the model loads on CPU — but it is impractical. CPU inference takes 30–60 minutes per image. A CUDA-capable NVIDIA GPU with 8 GB+ VRAM is required for usable performance.
Does JanusFlow work with AMD GPUs via ROCm on Windows?
ROCm on Windows is still limited to specific AMD GPUs under WSL2. The official Janus code does not include ROCm-specific optimizations. You can try installing PyTorch's ROCm wheels but community support for this combination is sparse. NVIDIA is the only reliably supported path on Windows.
What CUDA version should I use in 2026?
CUDA 12.8 is the recommended version for Windows + PyTorch 2.7. CUDA 12.6 also works. Avoid CUDA 13.x on Windows because cuDNN 9.x for CUDA 13 is not supported on Windows as of April 2026.
Is JanusFlow free to use commercially?
The code is MIT-licensed. The model weights are under the DeepSeek Model License. Review that license for commercial use restrictions — it permits most research and commercial applications with attribution, but check the specific terms for your use case.
How does JanusFlow compare to FLUX.1 for image generation?
FLUX.1 produces significantly higher resolution and higher fidelity images. JanusFlow is capped at 384×384 pixels, while FLUX.1 Schnell can generate 1024×1024 images faster on similar hardware. JanusFlow's advantage is the integrated understanding + generation in one 1.3B-parameter model — you get a multimodal chatbot that can also generate images, all under 8 GB VRAM.
Will there be a JanusFlow 2 or JanusFlow 7B?
DeepSeek has not announced a JanusFlow update as of April 2026. The Janus-Pro series (January 2025) replaced Janus for most production use cases. Monitor the official GitHub repo and DeepSeek's Hugging Face page for new releases.
The model downloads fine but the demo crashes immediately
Check that diffusers[torch] is installed. JanusFlow's image generation path requires the diffusers library for SDXL-VAE decoding — this is separate from the base pip install -e . and is the second most common missing dependency after the CUDA wheel issue.
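A missing dependency can be confirmed without launching the demo. The sketch below uses importlib to test whether each top-level module can be found; the list of module names is an assumption based on the packages mentioned in this guide.

```python
from importlib.util import find_spec

def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]

# Assumed list, based on the packages this guide installs.
required = ["torch", "diffusers", "transformers", "gradio"]
print("Missing:", missing_modules(required) or "none")
```

Anything printed after "Missing:" can be installed with pip inside the activated virtual environment before retrying the demo.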
References and Further Reading
- JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation (arXiv, Nov 2024)
- deepseek-ai/Janus — Official GitHub Repository (README, release notes, demos)
- deepseek-ai/JanusFlow-1.3B — Hugging Face Model Card
- PyTorch 2.7 Release Notes — CUDA 12.8 support, Blackwell GPU, Windows improvements (April 2025)
- NVIDIA CUDA Toolkit Archive — download CUDA 12.8 for Windows
- Practical Windows install walkthrough with CUDA 12.6 + Python 3.12 (Haber, 2025)
- DeepSeek Janus Models — Are They Worth the Hype? (getimg.ai, May 2025)
- PyTorch: Introducing CUDA 13.2 and deprecating CUDA 12.8 (PyTorch Dev Discussion)