Run Nari Dia 1.6B on Mac (2026): MLX Install Guide for Apple Silicon

Last updated April 2026 — refreshed for current model/tool versions.

Nari Labs' Dia 1.6B is one of the few open-weights, dialogue-native text-to-speech models that can rival ElevenLabs on expressiveness — but the official PyTorch repo still ships CUDA-only. This guide is the practical, current path to running Dia on a Mac in 2026: the MLX route via mlx-audio (the one that actually works on M-series silicon today), the PyTorch MPS path (works on recent builds, slow), and the Dia2 streaming variant (CUDA-only, plan for cloud).

What changed in 2026

  • MLX is the Mac-native path now. Dia 1.6B runs on Apple Silicon through mlx-audio (v0.4.3, released April 28, 2026) using the mlx-community/Dia-1.6B port. No CUDA, no Docker tricks.
  • The Dia-1.6B-0626 checkpoint is the current official release on Hugging Face (June 26, 2025), and Dia became natively supported in Hugging Face Transformers on June 27, 2025.
  • Dia2 launched November 19, 2025 as a streaming successor with 1B and 2B variants. It is still CUDA 12.8+ only as of April 2026, so Mac users stay on Dia 1.6B locally or run Dia2 in the cloud.
  • An MPS pull request ("feat: add full MPS (Apple Silicon) support") was opened on the upstream nari-labs/dia repo on December 11, 2025. It works in practice but has not been merged; expect rough edges on the official PyTorch path.
  • CPU support and quantization are still on the official TODO list a year after launch. Don't wait for them; use MLX.
  • The original install instructions in this post were broken (truncated brew commands, missing fences, a CUDA-only assumption). They've been rewritten end-to-end.

TL;DR — which path should you use?

| Your setup | Recommended path | Why |
| --- | --- | --- |
| Apple Silicon (M1/M2/M3/M4), 16GB+ unified memory | mlx-audio + mlx-community/Dia-1.6B | Native Metal kernels, unified memory, runs offline; roughly 1–1.5× realtime on an M3 Max per community reports. |
| Apple Silicon, want streaming / lowest latency | Dia2 via cloud GPU (Modal, RunPod, Lambda) | Dia2 is CUDA-only as of April 2026; stream it back to your Mac over HTTP. |
| Intel Mac | Hugging Face ZeroGPU Space, or remote GPU | No Metal/MLX support; CPU inference of Dia is impractically slow. |
| Apple Silicon, prefer official PyTorch repo | PyTorch MPS via the open community PR | Works but unmerged; expect bugs and speeds 2–4× slower than MLX. |

What Dia 1.6B actually is

  • 1.6 billion parameters, autoregressive, audio-tokenized via the Descript Audio Codec.
  • Dialogue-native: uses [S1] / [S2] tags in the input transcript and produces a single coherent multi-speaker take in one pass — no second-pass speaker conditioning required.
  • Non-verbal cues as inline tokens: (laughs), (coughs), (sighs), (gasps), (singing), (clears throat).
  • Voice cloning via short audio prompt conditioning.
  • Apache 2.0 license, weights and code public on nari-labs/dia and Hugging Face.
  • English only. No multilingual fine-tunes from Nari Labs as of April 2026.
  • Reference benchmark: ~40 tokens/sec on an NVIDIA A4000, where 86 tokens ≈ 1 second of audio (so ~0.46× realtime on that GPU). Real-time on RTX 4090; faster with torch.compile.
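
The realtime figure in the last bullet is plain arithmetic: decoder throughput divided by the number of tokens that make up one second of audio. A minimal sketch of that conversion follows; the function and constant names are illustrative and not part of Dia or mlx-audio, and the 86 tokens-per-second ratio is the one quoted above.

```python
# Illustrative helper (not part of Dia or mlx-audio): turn decoder
# throughput into a "multiple of realtime" figure.
TOKENS_PER_AUDIO_SECOND = 86  # ratio quoted from the model card above


def realtime_multiple(tokens_per_second: float) -> float:
    """Seconds of audio produced per second of wall-clock time."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens_per_second / TOKENS_PER_AUDIO_SECOND


if __name__ == "__main__":
    # 40 tokens/sec is the A4000 figure quoted above.
    print(f"A4000: {realtime_multiple(40):.3f}x realtime")
    # Throughput needed to keep up with playback is 86 tokens/sec.
    print(f"Break-even: {realtime_multiple(86):.1f}x realtime")
```

Anything below 1.0 means the model generates audio more slowly than it plays back, which is acceptable for batch work but not for live use.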

Hardware and software requirements (Mac, 2026)

Mac, MLX path (recommended)

  • Apple Silicon: M1, M2, M3, or M4 (Pro/Max/Ultra all fine). Confirm with uname -m — must print arm64.
  • 16GB unified memory minimum for the bf16 weights (~6.5 GB on disk, ~7–8 GB in memory at inference). 24GB+ is comfortable.
  • ~10 GB free disk for weights + Descript Audio Codec + spaCy English model.
  • macOS 14 (Sonoma), 15 (Sequoia), or later; MLX requires a recent Metal stack.
  • Python 3.10+ (3.11 recommended), uv or pip, and FFmpeg if you want MP3/FLAC output.
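
The checks in this list can be scripted before you download anything. The sketch below uses only the Python standard library; check_requirements and its thresholds are hypothetical names that restate the list above, and it deliberately skips the unified-memory check, which needs a macOS-specific query.

```python
import platform
import shutil
import sys

# Thresholds restated from the requirements list above.
MIN_PYTHON = (3, 10)
MIN_FREE_DISK_GB = 10


def check_requirements(machine, python_version, free_disk_gb):
    """Return a list of problems; an empty list means the basics look fine."""
    problems = []
    if machine != "arm64":
        problems.append(f"architecture is {machine!r}, expected 'arm64'")
    if tuple(python_version[:2]) < MIN_PYTHON:
        found = ".".join(str(part) for part in python_version[:2])
        problems.append(f"Python {found} found, 3.10 or newer required")
    if free_disk_gb < MIN_FREE_DISK_GB:
        problems.append(f"only {free_disk_gb:.1f} GB free, about 10 GB needed")
    return problems


if __name__ == "__main__":
    # Gather the real values for the machine this script runs on.
    free_gb = shutil.disk_usage(".").free / 1e9
    issues = check_requirements(platform.machine(), sys.version_info, free_gb)
    for issue in issues:
        print("problem:", issue)
    if not issues:
        print("basic checks passed")
```

Passing the values in as arguments keeps the function easy to test on any machine; only the block under the main guard touches the real system.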

Mac, PyTorch MPS path (experimental)

  • Same Apple Silicon + 16GB+ requirement.
  • PyTorch 2.4+ with MPS backend.
  • Currently requires applying the open MPS PR manually — upstream main assumes CUDA.

Intel Mac

Not viable for local inference. Use a remote GPU or the ZeroGPU Space.

Install Dia 1.6B on Apple Silicon with mlx-audio

This is the path that works in 2026. mlx-audio is Apple Silicon-only and ships a port of Dia at mlx-community/Dia-1.6B.

1. Verify hardware

uname -m            # expect: arm64
sw_vers             # expect ProductVersion 14.0 or newer
python3 --version   # 3.10 or newer

2. Install Homebrew dependencies

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install python@3.11 ffmpeg uv

3. Create an isolated environment

mkdir ~/dia-mlx && cd ~/dia-mlx
uv venv --python 3.11
source .venv/bin/activate

4. Install mlx-audio

uv pip install mlx-audio
uv pip install pip   # uv-created venvs omit pip, which spaCy's downloader calls
python -m spacy download en_core_web_sm

mlx-audio v0.4.3 (April 28, 2026) bundles MLX kernels for Dia, Kokoro, Qwen3-TTS, CSM, KugelAudio, and Voxtral. It supports 3-bit through 8-bit quantization on the Kokoro family; the Dia port runs in bf16.

5. Generate your first take

python -m mlx_audio.tts.generate \
  --model mlx-community/Dia-1.6B \
  --text "[S1] Dia is an open-weights text-to-dialogue model. [S2] You get full control over scripts and voices. (laughs)" \
  --sample_rate 44100 \
  --play

The first run pulls ~6.5 GB of weights from Hugging Face plus the Descript Audio Codec, so expect a one-time wait of several minutes on a typical home connection.

6. Use it from Python

from mlx_audio.tts.generate import generate_audio

generate_audio(
    text="[S1] Welcome to the show. [S2] Glad to be here. (laughs)",
    model_path="mlx-community/Dia-1.6B",
    speed=1.0,
    file_prefix="episode-001",
    audio_format="wav",
    sample_rate=44100,
    join_audio=True,
)
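
The text argument above is a single string of [S1] / [S2] tagged turns. When a script lives in a list or a file, a small helper can assemble that string. This is a sketch under stated assumptions: build_transcript is a hypothetical helper, not an mlx-audio function, and it only enforces the two speaker tags described in this guide.

```python
def build_transcript(turns):
    """Join (speaker, line) pairs into one Dia-style tagged transcript.

    speaker must be 1 or 2, matching the [S1] and [S2] tags.
    """
    parts = []
    for speaker, line in turns:
        if speaker not in (1, 2):
            raise ValueError(f"unsupported speaker index: {speaker!r}")
        cleaned = " ".join(line.split())  # collapse stray whitespace and newlines
        if not cleaned:
            raise ValueError("empty line in script")
        parts.append(f"[S{speaker}] {cleaned}")
    return " ".join(parts)


if __name__ == "__main__":
    script = [
        (1, "Welcome to the show."),
        (2, "Glad to be here. (laughs)"),
    ]
    print(build_transcript(script))
```

The resulting string can be passed as the text value in the call above.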

For voice cloning, pass 5–15 seconds of clean reference speech as an audio prompt and prepend its exact transcript to the text you want generated; this is the same conditioning protocol the upstream Dia repo uses.
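
Before handing a reference clip to the model, it helps to confirm it falls in the 5–15 second window and to build the combined text. The sketch below uses the standard-library wave module, so it only reads uncompressed PCM WAV files; all function names are hypothetical, and how the clip and text are passed to mlx-audio is left out because that depends on the version you have installed.

```python
import wave

# Window recommended above for reference speech.
MIN_PROMPT_SECONDS = 5.0
MAX_PROMPT_SECONDS = 15.0


def wav_duration_seconds(path):
    """Duration of a PCM WAV file, computed from frame count and sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def prompt_is_usable(path):
    """True when the clip length falls inside the recommended window."""
    return MIN_PROMPT_SECONDS <= wav_duration_seconds(path) <= MAX_PROMPT_SECONDS


def cloning_text(reference_transcript, new_text):
    """Prepend the reference clip's transcript to the text to be generated."""
    return f"{reference_transcript.strip()} {new_text.strip()}"
```

A clip outside the window is not an error as far as this guide knows; the check simply flags prompts that are likely to give weaker or slower results.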

Optional: PyTorch MPS path (advanced)

If you specifically need the upstream nari-labs/dia codebase (e.g. you have custom forks or research scripts depending on it), you can run it on Apple Silicon via the open MPS pull request. Caveat: as of April 2026 the PR is unmerged and you should expect rough edges.

git clone https://github.com/nari-labs/dia.git
cd dia
git fetch origin pull/<PR-NUMBER>/head:mps
git checkout mps

uv venv --python 3.11
source .venv/bin/activate
uv pip install -e .
uv pip install torch torchaudio  # PyTorch 2.4+ with MPS

python app.py    # launches the Gradio UI; selects MPS automatically

Performance: in our reading of the PR thread and community reports, MPS Dia 1.6B is roughly 2–4× slower than the MLX port on the same hardware. Use MLX unless you have a specific reason not to.

Performance: realistic 2026 numbers

There is no published, peer-reviewed Dia-on-Mac benchmark suite as of April 2026. The numbers below are sourced from Nari Labs' own model card and community reports — verify against your own hardware before quoting them.

| Hardware | Path | Approx. speed | Notes |
| --- | --- | --- | --- |
| NVIDIA RTX 4090 | PyTorch + CUDA 12.6 | Real-time (≥1× realtime) | Nari Labs reference setup; torch.compile available. |
| NVIDIA A4000 | PyTorch + CUDA 12.6 | ~40 tokens/sec ≈ 0.46× realtime | Per the official Hugging Face model card. |
| M3 Max, 36 GB | mlx-audio (bf16) | ~1.0–1.5× realtime (community) | Unified memory makes the 7–8 GB working set comfortable. |
| M2 Pro, 16 GB | mlx-audio (bf16) | ~0.5–0.8× realtime (community) | Tight memory; close other apps. |
| M1, 16 GB | mlx-audio (bf16) | ~0.3–0.5× realtime (community) | Usable for batch generation, painful for interactive use. |
| Intel Mac | CPU only | Impractically slow | Use ZeroGPU Space or rent a CUDA box. |
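
To turn the community ranges in the table into a planning number, divide the length of the audio you need by the speed multiple. The sketch below does that for the three Mac rows; the ranges are copied from the table and carry the same caveats, and the names are illustrative.

```python
# (low, high) multiples of realtime, copied from the Mac rows of the table.
SPEED_RANGES = {
    "M3 Max, 36 GB": (1.0, 1.5),
    "M2 Pro, 16 GB": (0.5, 0.8),
    "M1, 16 GB": (0.3, 0.5),
}


def generation_time_range(audio_seconds, speed_range):
    """Return (best_case, worst_case) wall-clock seconds for a clip."""
    low, high = speed_range
    if low <= 0 or high < low:
        raise ValueError("speed_range must be positive and ordered (low, high)")
    return audio_seconds / high, audio_seconds / low


if __name__ == "__main__":
    clip_seconds = 60  # one minute of finished dialogue
    for hardware, speeds in SPEED_RANGES.items():
        best, worst = generation_time_range(clip_seconds, speeds)
        print(f"{hardware}: {best:.0f} to {worst:.0f} s per minute of audio")
```

These are rough estimates built on unverified community numbers, so measure on your own machine before committing to a schedule.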

For interactive workloads (assistants, agents, voice UI), Dia 1.6B on a Mac is fine for prototyping, but you will want a CUDA host for production. If you also need to orchestrate the model behind a local agent, the OpenClaw + Ollama setup guide for running local AI agents covers the agent side; pair it with mlx-audio through a small Python wrapper for voice output.

Should you use Dia2 instead?

Dia2 (released November 19, 2025) is the streaming successor: it begins synthesizing audio from the first few input tokens, supports up to ~2 minutes of generation, and ships in 1B and 2B variants under Apache 2.0. The trade-off:

  • Pros: dramatically lower time-to-first-audio (good for live agents and IVR), more parameters available (2B), still open weights.
  • Cons (for Mac users): requires CUDA 12.8+; the nari-labs/dia2 README does not list Mac/MPS support. There is no MLX port of Dia2 as of April 2026.
  • Quality caveat in the README: "Quality and voices vary per generation, as the model is not fine-tuned on a specific voice." Use a prefix audio prompt for stable output.

The pragmatic 2026 stack: keep Dia 1.6B local on the Mac for batch / offline work, and call a remote Dia2 endpoint (Modal, RunPod, Lambda Labs) for streaming use cases.

Decision tree

  1. Are you on Apple Silicon with 16GB+? → Use mlx-audio + Dia 1.6B locally.
  2. Apple Silicon but <16GB? → Run small smoke tests locally; do real generation on cloud GPU.
  3. Intel Mac? → Skip local. Use the Hugging Face ZeroGPU Space for evaluation, then deploy to a CUDA host.
  4. Need real-time streaming for an agent or call bot? → Use Dia2 on a cloud CUDA box; don't try to make Dia 1.6B stream locally.
  5. Need multilingual TTS? → Dia is English-only. Look at Qwen3-TTS, CSM, or Kokoro multilingual checkpoints (also supported by mlx-audio).
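
The decision tree above can be written as one small function, which is handy if you provision machines or route jobs from a script. This is a sketch: recommend_path, its arguments, and the returned strings are hypothetical, and it assumes the multilingual and streaming questions take priority over the hardware questions, an ordering the list implies but does not state.

```python
def recommend_path(apple_silicon, memory_gb,
                   needs_streaming=False, needs_multilingual=False):
    """Map the decision tree in this guide to a recommendation string."""
    if needs_multilingual:
        # Dia is English-only, so no Dia path applies.
        return "Qwen3-TTS, CSM, or Kokoro multilingual checkpoints via mlx-audio"
    if needs_streaming:
        # Dia 1.6B does not stream and Dia2 is CUDA-only.
        return "Dia2 on a cloud CUDA host"
    if not apple_silicon:
        return "Hugging Face ZeroGPU Space for evaluation, then a CUDA host"
    if memory_gb < 16:
        return "local smoke tests only; cloud GPU for real generation"
    return "mlx-audio + Dia 1.6B locally"


if __name__ == "__main__":
    print(recommend_path(apple_silicon=True, memory_gb=36))
    print(recommend_path(apple_silicon=True, memory_gb=8))
    print(recommend_path(apple_silicon=False, memory_gb=32))
```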

Common pitfalls and troubleshooting

  • uname -m returns x86_64 on a Mac. You're either on Intel (no MLX) or running your shell under Rosetta. Open a native arm64 terminal and reinstall Python via Homebrew under arm64.
  • "No module named 'mlx'" / Metal errors. macOS < 14 isn't supported by current MLX. Update or use a remote GPU.
  • Out-of-memory on 16GB. Quit Chrome, Slack, and any IDE; confirm Activity Monitor shows ≥10 GB free before generating. Or move to a 24GB+ machine.
  • First generation is suspiciously slow. Expected. The Descript Audio Codec and weights download once; subsequent runs are dramatically faster.
  • Robotic / unstable voice on every run. Dia is not voice-locked by default — output drifts between runs. Pass an audio prompt for consistency.
  • Following the upstream nari-labs/dia README on a Mac. The official README assumes CUDA, and the original Codersera post recommended uv run app.py verbatim, which crashes on Mac. Use mlx-audio.
  • Trying to install Dia2 with uv sync on Mac. Dia2 hard-requires CUDA 12.8; uv sync will succeed but inference will fail. Use Dia 1.6B on Mac, Dia2 on cloud.
  • Audio sounds clipped or has artifacts. Confirm --sample_rate 44100; lower rates downsample badly through DAC.

What was removed and why

  • Original "wait for CPU support" advice. Still on the TODO list a year later — don't plan around it. Use MLX today.
  • "Use Orpheus.CPP for a CPU fallback" recommendation. The 2026 Mac-native landscape is mlx-audio (Dia, Kokoro, Qwen3-TTS, CSM); CPU-only TTS isn't competitive on quality.
  • The truncated uv run app.py single-line instructions. Replaced with full step-by-step blocks.

FAQ

Can I run Dia 1.6B on an M1 with 8GB RAM?

No. The bf16 working set is ~7–8 GB; with macOS overhead you'll OOM. 16GB is the floor, 24GB+ is the comfortable target.

Is Dia 1.6B free for commercial use?

Yes — Apache 2.0 covers both weights and code. Voice cloning still has its own ethical/legal considerations; obtain consent for any voice you clone.

How does Dia compare to ElevenLabs in 2026?

For dialogue and non-verbal cues, Dia is competitive on a high-end GPU. ElevenLabs still leads on multilingual coverage, voice library, and out-of-the-box stability. For a self-hosted, open-weights pipeline, Dia is the most credible open contender.

Why MLX instead of PyTorch MPS on Mac?

MLX is Apple-native, exploits unified memory directly, and the mlx-community/Dia-1.6B port is actively maintained. PyTorch MPS works in principle (community PR exists) but is slower and unmerged upstream.

Does Dia support languages other than English?

Not officially as of April 2026. For multilingual TTS on Mac, look at Qwen3-TTS or Kokoro via mlx-audio.

Can I stream Dia output token-by-token on Mac?

Dia 1.6B is not a streaming model. Dia2 is, but it's CUDA-only. The practical streaming setup on Mac is: run Dia2 on a cloud GPU, stream chunks back to the Mac.

Is there a quantized Dia checkpoint?

Five quantized variants are listed in the model tree on the Dia-1.6B Hugging Face page (community uploads). Nari Labs' own quantization work is still on the TODO list. Quality varies — test before deploying.

Where do I file a bug for the Mac path?

For mlx-audio: github.com/Blaizzy/mlx-audio. For the upstream Dia model itself: github.com/nari-labs/dia. Use the mlx-community/Dia-1.6B Hugging Face page for issues with the MLX port specifically.


If you're building a product around Dia or any local-AI stack and want experienced engineers who have shipped this kind of pipeline, Codersera connects you with vetted remote developers fluent in PyTorch, MLX, and inference infra. See hire developers or our adjacent guides on running DeepSeek Janus-Pro 7B on Mac and on Windows.

References & further reading