Run DeepSeek Janus-Pro 7B on Mac (Apple Silicon Guide)

Quick answer. DeepSeek Janus-Pro 7B is a vision-generation multimodal model and is not available on Ollama. Run it on Apple Silicon (M2/M3/M4 with 32 GB+ unified memory) by cloning deepseek-ai/Janus, installing PyTorch with the MPS backend, downloading the official deepseek-ai/Janus-Pro-7B weights from Hugging Face, and loading the model in float16 with PYTORCH_ENABLE_MPS_FALLBACK=1.

DeepSeek Janus-Pro 7B is a unified multimodal model: it can both understand images and generate them from text prompts. If you're a Mac user looking to run it locally, this guide walks through the PyTorch + MPS path that works on Apple Silicon.

Introduction

DeepSeek released Janus-Pro 7B in January 2025 as a unified autoregressive multimodal model: one network handles both vision understanding (image-in, text-out) and vision generation (text-in, image-out). The code is MIT-licensed; the weights ship under the DeepSeek Model License.

Two things to know before you start, because they are the source of most confusion online:

Janus-Pro is not on Ollama. Ollama serves chat/instruct LLMs through GGUF; it does not support Janus-Pro's vision-generation pipeline. Any "ollama pull deepseek-ai/janus-pro-7b" instruction you've seen elsewhere is wrong — that tag does not exist in the official Ollama library, and the unofficial GGUF community uploads only contain the language backbone, not the image-generation head.
PyTorch's MPS backend has historically lacked bfloat16 support (the dtype the official Janus example assumes), and support still depends on your PyTorch and macOS versions. On Mac, the dependable path is to load the model in float16 and set PYTORCH_ENABLE_MPS_FALLBACK=1 so unsupported ops fall back to CPU.

What Mac do I need to run Janus-Pro 7B?

Chip: Apple Silicon (M2 Pro / M3 / M4 / M5). The software installs on Intel Macs and base M1s, but generation there is impractically slow.
Unified memory: 32 GB recommended. Janus-Pro 7B in fp16 is roughly 14 GB on disk, and you need headroom for MPS overhead, the image-generation buffers, and the OS. 16 GB will swap heavily.
Storage: ~20 GB free (model weights + Python environment).
macOS: 14.0 or newer (for current PyTorch MPS support).
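
The memory figures above follow from simple arithmetic, which the sketch below reproduces. It assumes a round 7 billion parameters (taken from the model name, not measured) and a hypothetical 10 GB overhead budget for MPS, generation buffers, and the OS; adjust both to match what you observe.

```python
# Back-of-envelope memory estimate. The parameter count and the overhead
# budget are assumptions for illustration, not measured values.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_size_gb(num_params: float, dtype: str = "float16") -> float:
    """Approximate size of the weights in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def fits_in_memory(num_params: float, unified_memory_gb: float,
                   overhead_gb: float = 10.0, dtype: str = "float16") -> bool:
    """Check weights plus an assumed overhead budget against total memory."""
    return weight_size_gb(num_params, dtype) + overhead_gb <= unified_memory_gb


if __name__ == "__main__":
    params = 7e9  # assumption: "7B" read as 7 billion parameters
    print(f"fp16 weights: ~{weight_size_gb(params):.0f} GB")
    for mem in (16, 32):
        print(f"{mem} GB Mac fits: {fits_in_memory(params, mem)}")
```

With these assumptions a 16 GB machine falls short and a 32 GB machine has room to spare, which matches the recommendation above.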

Step-by-step installation

1. Install Homebrew and Python

If you don't have Homebrew, install it with the first command below; the second installs Python, Git, and Git LFS:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install python@3.11 git git-lfs

Python 3.10 or 3.11 is the safe range for the Janus repo's dependencies.

2. Create a virtual environment

mkdir -p ~/janus-pro && cd ~/janus-pro
python3.11 -m venv .venv
source .venv/bin/activate

3. Install PyTorch with MPS support

PyTorch 2.x ships MPS support out of the box on Apple Silicon — no nightly needed for inference:

pip install --upgrade pip
pip install torch torchvision torchaudio

Verify MPS is available:

python -c "import torch; print('MPS available:', torch.backends.mps.is_available())"
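
If you'd rather have scripts choose a device at runtime instead of hard-coding "mps", a small helper along these lines may be useful. This is a minimal sketch: pick_device is a hypothetical name, and the fallback order (MPS, then CUDA, then CPU) is an assumption about what you want.

```python
def pick_device() -> str:
    """Return "mps" if usable, else "cuda" if present, else "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed in this environment
    mps = getattr(torch.backends, "mps", None)  # absent on very old builds
    if mps is not None and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    print("Selected device:", pick_device())
```

On a correctly configured Apple Silicon Mac this should select "mps"; any other result suggests the PyTorch install or macOS version needs attention.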

4. Clone the Janus repo and install dependencies

git clone https://github.com/deepseek-ai/Janus.git
cd Janus
pip install -e .
pip install transformers accelerate sentencepiece pillow

5. Download the Janus-Pro 7B weights

Pull the official model from Hugging Face. You can let transformers download it automatically on first use (about 14 GB), or pre-fetch it with huggingface-cli:

pip install huggingface_hub
huggingface-cli download deepseek-ai/Janus-Pro-7B --local-dir ./Janus-Pro-7B

Model card and license: huggingface.co/deepseek-ai/Janus-Pro-7B.
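
A large download can be interrupted, so it may be worth sanity-checking the local folder before loading it. The sketch below is a hypothetical helper, not part of the Janus repo. It assumes the checkpoint contains a config.json plus weight shards ending in .safetensors or .bin, and that the shards total at least roughly 10 GB; verify those assumptions against the model card's file list.

```python
from pathlib import Path


def check_model_dir(model_dir: str, min_total_gb: float = 10.0) -> list[str]:
    """Return a list of problems; an empty list means the folder looks usable."""
    root = Path(model_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    if not (root / "config.json").is_file():
        problems.append("config.json is missing")
    # Assumed shard naming: any top-level *.safetensors or *.bin file.
    weights = list(root.glob("*.safetensors")) + list(root.glob("*.bin"))
    if not weights:
        problems.append("no *.safetensors or *.bin weight files found")
    else:
        total_gb = sum(f.stat().st_size for f in weights) / 1e9
        if total_gb < min_total_gb:
            problems.append(
                f"weights total {total_gb:.1f} GB, "
                f"expected at least {min_total_gb:.0f} GB"
            )
    return problems


if __name__ == "__main__":
    for line in check_model_dir("./Janus-Pro-7B") or ["looks complete"]:
        print(line)
```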

6. Run image-understanding inference

Save the following as understand.py. The MPS-specific changes vs. the upstream example: the model is moved to "mps" instead of CUDA, and both the model and the image inputs are cast to torch.float16 (not bfloat16).

import os
# Route ops that MPS lacks to the CPU; set this before importing torch.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch
from transformers import AutoModelForCausalLM
from janus.models import VLChatProcessor
from janus.utils.io import load_pil_images

model_path = "deepseek-ai/Janus-Pro-7B"  # or "./Janus-Pro-7B" if pre-downloaded
processor = VLChatProcessor.from_pretrained(model_path)
tokenizer = processor.tokenizer

# Cast to float16 and move to the Apple GPU (upstream uses bfloat16 + .cuda()).
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
model = model.to(torch.float16).to("mps").eval()

conversation = [
    {
        "role": "<|User|>",
        "content": "<image_placeholder>\nDescribe what's in this image.",
        "images": ["./example.jpg"],
    },
    {"role": "<|Assistant|>", "content": ""},
]

pil_images = load_pil_images(conversation)
# The processor output's .to() casts pixel values to bfloat16 by default,
# so pass float16 explicitly to match the model's dtype.
prepare_inputs = processor(
    conversations=conversation, images=pil_images, force_batchify=True
).to("mps", dtype=torch.float16)

inputs_embeds = model.prepare_inputs_embeds(**prepare_inputs)
outputs = model.language_model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=prepare_inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=256,
    do_sample=False,
)
print(tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True))

Run it:

PYTORCH_ENABLE_MPS_FALLBACK=1 python understand.py
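
The script sets the fallback variable before importing torch, and the command line sets it again for good measure. The working assumption, worth verifying against your PyTorch version, is that PyTorch reads the variable when it loads, so setting it afterwards may have no effect. A hypothetical guard that makes the ordering explicit could look like this:

```python
import os
import sys


def enable_mps_fallback() -> bool:
    """Set PYTORCH_ENABLE_MPS_FALLBACK=1.

    Returns False (and warns) if torch was already imported, since the
    setting may then be too late to take effect. That timing is an
    assumption, not something this helper can verify.
    """
    already_imported = "torch" in sys.modules
    os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
    if already_imported:
        print("warning: torch was imported before the fallback was enabled",
              file=sys.stderr)
    return not already_imported


if __name__ == "__main__":
    enable_mps_fallback()
    print(os.environ["PYTORCH_ENABLE_MPS_FALLBACK"])
```

Call it at the very top of a script, before any import that pulls in torch.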

7. Run text-to-image generation

The official deepseek-ai/Janus repo ships generation_inference.py. Edit it before running: change every .cuda() call to .to("mps") and every torch.bfloat16 to torch.float16. Then:

PYTORCH_ENABLE_MPS_FALLBACK=1 python generation_inference.py
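
If you'd rather not hand-edit the script, the substitutions described above can be applied mechanically before you run it. The following is a minimal sketch: patch_for_mps and the output filename generation_inference_mps.py are hypothetical, and a plain text replacement assumes the script spells these calls exactly as .cuda() and torch.bfloat16.

```python
from pathlib import Path


def patch_for_mps(source: str) -> str:
    """Rewrite CUDA-specific calls in a script's source so it targets MPS."""
    patched = source.replace(".cuda()", '.to("mps")')
    patched = patched.replace("torch.bfloat16", "torch.float16")
    return patched


if __name__ == "__main__":
    src_path = Path("generation_inference.py")
    if src_path.exists():
        # Write a patched copy and leave the upstream file untouched.
        out_path = Path("generation_inference_mps.py")  # hypothetical name
        out_path.write_text(patch_for_mps(src_path.read_text()))
        print(f"wrote {out_path}")
    else:
        print("run this from the root of the cloned Janus repo")
```

Review the patched copy before running it; a blind text replacement will not catch device handling written in any other form.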

Expect image generation to be slow on Apple Silicon — a single 384x384 image typically takes 30-90 seconds on an M2 Pro with 32 GB, because the autoregressive image decoder produces tokens one at a time and several ops fall back to CPU. For real-time image-gen work, a CUDA GPU is materially faster.
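
To see why the timing lands where it does, it helps to count the image tokens. The sketch below assumes a 384-pixel square image and a 16-pixel patch, which I believe match the defaults in the upstream script; check your copy. The throughput numbers are simply implied by the 30-90 second range above, not benchmarks.

```python
def image_tokens(img_size: int = 384, patch_size: int = 16) -> int:
    """Number of tokens the decoder emits for one square image."""
    side = img_size // patch_size
    return side * side


def implied_tokens_per_second(tokens: int, seconds: float) -> float:
    """Average decode rate implied by a wall-clock time for one image."""
    return tokens / seconds


if __name__ == "__main__":
    tokens = image_tokens()  # 24 * 24 = 576 under the assumed defaults
    print(f"tokens per image: {tokens}")
    for seconds in (30, 90):
        rate = implied_tokens_per_second(tokens, seconds)
        print(f"{seconds} s per image implies ~{rate:.1f} tokens/s")
```

Because each token requires its own forward pass, time per image grows roughly linearly with the token count.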

Prefer a GUI? Use ComfyUI.

If you'd rather drive Janus-Pro through a node graph instead of Python, see our companion guide: Running DeepSeek Janus-Pro 7B on Mac using ComfyUI. The setup uses the CY-CHENYUE/ComfyUI-Janus-Pro custom node and the same Hugging Face weights.

Troubleshooting

"TypeError: BFloat16 is not supported on MPS": the model loading line still uses torch.bfloat16. Swap it for torch.float16; the MPS backend in your PyTorch/macOS combination does not implement bfloat16.
Slow first run / very high RAM pressure: that's the model loading and the MPS allocator warming up. After the first inference, subsequent calls are much faster. If macOS starts swapping aggressively, drop generation resolution or close other GPU-heavy apps.
Flash-attention import errors: flash-attn is CUDA-only, so there is no Apple Silicon build. The Janus code paths default to standard attention on MPS; a flash-attn import error means a dependency expected CUDA. Set attn_implementation="eager" when loading the model.
Operator not implemented for MPS: PYTORCH_ENABLE_MPS_FALLBACK=1 sends unsupported ops to CPU automatically. If you see a hard crash anyway, upgrade PyTorch.
Out-of-memory on 16 GB Macs: Janus-Pro 7B realistically needs 32 GB unified memory for the generation path. On 16 GB, try Janus-Pro-1B (deepseek-ai/Janus-Pro-1B) instead; it has the same architecture and is much smaller.
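
The last item can be turned into a quick check that picks a checkpoint from the machine's memory. This is a minimal sketch: the function names are hypothetical, the 32 GB threshold comes from this guide rather than from DeepSeek, and memory detection relies on macOS's sysctl hw.memsize (it returns None elsewhere).

```python
import subprocess


def total_memory_gb():
    """Total physical memory in GiB via macOS sysctl, or None if unavailable."""
    try:
        out = subprocess.run(["sysctl", "-n", "hw.memsize"],
                             capture_output=True, text=True, check=True)
        return int(out.stdout.strip()) / 2**30
    except (OSError, subprocess.CalledProcessError, ValueError):
        return None


def recommend_checkpoint(unified_memory_gb: float) -> str:
    """Pick a Janus-Pro checkpoint using this guide's 32 GB rule of thumb."""
    if unified_memory_gb >= 32:
        return "deepseek-ai/Janus-Pro-7B"
    return "deepseek-ai/Janus-Pro-1B"


if __name__ == "__main__":
    mem = total_memory_gb()
    if mem is None:
        print("could not detect memory; choose a checkpoint manually")
    else:
        print(f"{mem:.0f} GiB detected: {recommend_checkpoint(mem)}")
```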

FAQ

Can I run Janus-Pro 7B with Ollama?

No. Ollama's GGUF runtime does not support Janus-Pro's vision-generation architecture. Community GGUF uploads that exist on the Ollama hub are language-backbone conversions only — they will not generate images. Use the PyTorch + MPS path documented above.

Why float16 instead of bfloat16?

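The official Janus example uses bfloat16 because it assumes CUDA, and bfloat16 support in PyTorch's MPS backend has been missing or incomplete depending on the PyTorch and macOS version. On Apple Silicon, float16 is the dependable choice. It keeps more mantissa bits than bfloat16 but has a much narrower exponent range, so results are usually close for inference, though very large activations can overflow.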
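
One practical difference between the two dtypes is range, which can be worked out from their bit layouts. The sketch below uses the standard layouts (float16: 5 exponent bits and 10 mantissa bits; bfloat16: 8 and 7) and plain Python arithmetic; max_finite is an illustrative helper, not a PyTorch function.

```python
def max_finite(exponent_bits: int, mantissa_bits: int) -> float:
    """Largest finite value of an IEEE-style binary floating-point format."""
    bias = 2 ** (exponent_bits - 1) - 1
    # The all-ones exponent code is reserved for infinity and NaN.
    max_exponent = (2 ** exponent_bits - 2) - bias
    return (2 - 2.0 ** -mantissa_bits) * 2.0 ** max_exponent


def machine_epsilon(mantissa_bits: int) -> float:
    """Gap between 1.0 and the next representable value."""
    return 2.0 ** -mantissa_bits


if __name__ == "__main__":
    print("float16  max:", max_finite(5, 10), "eps:", machine_epsilon(10))
    print("bfloat16 max:", max_finite(8, 7), "eps:", machine_epsilon(7))
```

float16 tops out at 65504 but resolves finer steps near 1.0, while bfloat16 trades that precision for a range comparable to float32. That is why float16 inference can overflow on very large intermediate values, where bfloat16 would not.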

How much RAM do I actually need?

The 7B weights in fp16 are about 14 GB; the model plus activation buffers plus macOS overhead push real usage closer to 22-26 GB. 32 GB unified memory is the practical floor. 16 GB Macs should use Janus-Pro-1B instead.

How slow is image generation on Mac?

Roughly 30-90 seconds per 384x384 image on an M2 Pro / M3 with 32 GB. Janus-Pro is autoregressive, so generation cost scales with the number of image tokens. For volume image work, a CUDA GPU is materially faster — the Mac path is best for development, prototyping, and image understanding.

Is the model free to use commercially?

The repo code is MIT-licensed. The weights ship under the DeepSeek Model License, which permits commercial use subject to the conditions and use restrictions it lists. Always re-read the current license text before shipping.

Conclusion

Running Janus-Pro 7B on a Mac is a PyTorch-and-MPS exercise, not an Ollama one. Clone the official deepseek-ai/Janus repo, install PyTorch with MPS, download weights from Hugging Face, load in float16, and set PYTORCH_ENABLE_MPS_FALLBACK=1. From there you have full access to both image understanding and text-to-image generation locally — slower than a CUDA GPU, but completely offline and on your own hardware.