Last updated April 2026 — refreshed for current model/tool versions.
Genmo's Mochi 1 is still the canonical "open-source video model with realistic motion" — a 10B-parameter Asymmetric Diffusion Transformer (AsymmDiT) released under Apache 2.0 in October 2024 — but the macOS story around it has changed substantially since the original guide. macOS Tahoe 26 is the current OS, M4- and M5-series chips ship with up to 128 GB of unified memory, ComfyUI's Mochi node now runs in under 24 GB, and competing open weights (Wan 2.2, HunyuanVideo, LTX-2) have caught up or pulled ahead on benchmark scores. This guide walks through a working 2026 install on Apple Silicon, where Mochi still wins, and when to pick something else.
What changed in 2026
- macOS baseline is now Tahoe 26 (released Sep 15, 2025). Sonoma 14.x is two majors behind. macOS 26.4 added M5-specific MLX optimizations; macOS 26.5 is in developer beta as of April 2026.
- M5 Pro / M5 Max MacBook Pros shipped March 2026 with up to 128 GB unified memory and a Neural Accelerator inside every GPU core. Apple cites up to 3.8x faster image generation vs M4 Max for FLUX-class workloads under MLX.
- Mochi 1 still has not shipped an HD/720p release. The 480p preview from October 2024 remains the only public weights; GitHub issue #132 tracking the HD release has been open since February 2025 with no official ETA.
- ComfyUI is now the practical Apple Silicon path for Mochi, rather than the upstream genmoai/mochi repo, which targets H100s. The official ComfyUI Mochi node ships fp8 and bf16 variants, supports multiple attention backends, and runs in under 24 GB.
- VBench leadership has moved. Wan 2.2 (Apache 2.0, Alibaba) scores ~84.7% aggregate on VBench, ahead of Mochi on raw quality, but Mochi still wins on natural motion physics.
- Sora 2 was retired by OpenAI on April 26, 2026. Open-weights options (Mochi, Wan 2.2, HunyuanVideo, LTX-2) and Veo 3.1 / Kling 3.0 now make up the field for serious video work.

Want the full picture? Read our continuously-updated Open-Source LLMs Landscape (2026) — every notable open-weights model, license, and hosting cost.
TL;DR — Should you run Mochi 1 on a Mac in 2026?
| If you want… | Use | Why |
|---|---|---|
| Best open-source motion physics, learning project | Mochi 1 via ComfyUI (this guide) | Asymmetric DiT still produces the most natural motion of any OSS video model |
| Highest VBench quality, Apache-2.0 | Wan 2.2 (T2V-A14B) | ~84.7% VBench aggregate, MoE backbone, native 720p |
| Largest open community, broad ComfyUI support | HunyuanVideo (13B, Tencent) | Most-discussed OSS video model, strong temporal consistency |
| Native Mac app, no Python, with audio | Draw Things + LTX-2 | Swift/Metal app; LTX-2 generates audio alongside video |
| Best closed-model output, will pay per clip | Veo 3.1 or Kling 3.0 | Veo 3.1: cinematic 4K + audio. Kling 3.0: native 4K@60, free tier |
If you're on an M1/M2 with 16 GB unified memory, expect to use the fp8 ComfyUI workflow and accept long render times. If you're on an M3 Max / M4 Max / M5 Max with 64 GB+, Mochi runs comfortably alongside text-encoder offload. The original guide's "60 fps at 1024×1024 in 11.5 minutes" claim from January 2025 was never reproducible on Apple Silicon and has been removed.
Why Mochi 1 is still worth running locally
Per the Hugging Face model card, Mochi 1 is a 10B-parameter AsymmDiT (48 layers, 24 attention heads, 3072 visual / 1536 text dimension) plus a 362M AsymmVAE doing 8×8 spatial and 6× temporal compression. The 480p preview generates 30 fps clips of roughly 5 seconds; the licence is permissive Apache 2.0, which matters if you plan to ship anything commercial.
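To make those compression numbers concrete, here's a back-of-envelope sketch (our arithmetic, not from the model card — the VAE's exact temporal frame math may differ slightly) of the latent grid a 480×848, 73-frame clip reduces to:

```bash
# 8x8 spatial and ~6x temporal compression, per the model card
python3 -c "h, w, t = 480, 848, 73; print(h // 8, w // 8, -(-t // 6))"
# -> 60 106 13  (latent height, width, frames)
```

That 64× per-frame reduction (and roughly 6× across time) is a big part of why a 10B-parameter video model fits on a laptop at all.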
The motion quality is the single thing that keeps Mochi relevant: independent reviews and VBench breakouts consistently note that Mochi's asymmetric architecture produces the most physically plausible movement of any open-weights video model in 2026 — water sloshes correctly, fabric drapes correctly, hair settles correctly. Wan 2.2 wins on aggregate quality, HunyuanVideo wins on community size, but for a specific shot where the motion has to read as real, Mochi is still the first thing to try. Pair this with the broader local-AI workflow we cover in the OpenClaw + Ollama setup guide for running local AI agents, and you have an end-to-end on-device creative stack.
System requirements (April 2026)
| Component | Minimum (will run, slowly) | Recommended (comfortable) |
|---|---|---|
| macOS | Sonoma 14.5 (PyTorch MPS reliable from this release) | Tahoe 26.4 or later |
| Apple Silicon | M1 Pro / M2 | M3 Max, M4 Max, M5 Pro, or M5 Max |
| Unified memory | 16 GB (fp8 only, expect swap) | 48–64 GB+ for bf16, 128 GB on M5 Max for headroom |
| Storage | 40 GB free (fp8 model + T5 + VAE) | 100 GB+ NVMe; weights, samples, ComfyUI cache grow fast |
| Python | 3.10 or 3.11 | 3.11.x via Homebrew or pyenv |
| PyTorch | 2.5 stable | PyTorch nightly (MPS gets fixes there first) |
Apple Silicon's unified memory architecture means there is no discrete "VRAM" — the GPU and CPU share the same pool. ComfyUI exposes this as `VRAMState.SHARED`. In practice, a 64 GB M3/M4/M5 Max behaves like a "60 GB GPU" for diffusion workloads, which is why Macs are unexpectedly competitive with 24 GB consumer NVIDIA cards on memory-bound video models.
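To see what you're working with, a quick terminal check (a sketch using the built-in `sysctl`; macOS and other apps will keep a few GB of this resident):

```bash
# Total unified memory in GB — the single pool the GPU and CPU share
echo "$(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 )) GB"
```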
Step-by-step install — the ComfyUI path
The original guide pointed at genmoai/mochi upstream, which assumes ≥1 H100 and 60 GB VRAM. On Apple Silicon, use ComfyUI — its Mochi node has the fp8 weights and the attention-backend toggles that actually let it run on a Mac. Skip the upstream repo unless you want to read the reference implementation.
1. Update macOS and install Xcode CLT
```bash
# Confirm you are on macOS 14.5+ (preferably Tahoe 26.x)
sw_vers

# Install Xcode Command Line Tools (compiler, git)
xcode-select --install
```
2. Install Homebrew, Python 3.11, ffmpeg
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install python@3.11 ffmpeg git
```
The original guide ran `sudo spctl --master-disable`. Don't. Disabling Gatekeeper system-wide is not necessary for any step here and weakens the OS substantially. Homebrew, Python wheels, and the model weights all install cleanly with Gatekeeper on.
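If you want to verify Gatekeeper's state — particularly if you ran the old guide's command at some point — `spctl` reports it directly:

```bash
# Prints "assessments enabled" when Gatekeeper is on
spctl --status

# Only needed if the old guide's command was ever run:
# sudo spctl --master-enable
```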
3. Clone ComfyUI and create a venv
```bash
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
python3.11 -m venv venv
source venv/bin/activate
```
4. Install PyTorch nightly (MPS) and ComfyUI deps
ComfyUI on macOS runs noticeably better on PyTorch nightly — MPS fixes land there months before the stable channel.
```bash
pip install --pre torch torchvision torchaudio \
  --index-url https://download.pytorch.org/whl/nightly/cpu
pip install -r requirements.txt
```
Note: on Apple Silicon you install the CPU wheel — MPS support is built in and is selected at runtime via `torch.backends.mps.is_available()`. There is no separate "MPS wheel".
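Before moving on, a one-line sanity check that the wheel you just installed can actually see the GPU:

```bash
# Should print True on Apple Silicon with a healthy install
python -c "import torch; print(torch.backends.mps.is_available())"
```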
5. Download the Mochi 1 weights (fp8 for Macs)
From the ComfyUI examples page (comfyanonymous.github.io/ComfyUI_examples/mochi), grab the all-in-one packaged checkpoint or the individual files:
- Diffusion model: `mochi_preview_dit_fp8_e4m3fn.safetensors` → `ComfyUI/models/diffusion_models/`
- Text encoder: `t5xxl_fp8_e4m3fn_scaled.safetensors` → `ComfyUI/models/clip/`
- VAE: `mochi_vae.safetensors` → `ComfyUI/models/vae/`
If you have 64 GB+ of unified memory, swap the fp8 diffusion model for the bf16 variant for marginally better quality. The bf16 file is roughly 2× the size on disk.
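Whichever precision you choose, a quick listing from the ComfyUI root confirms the files landed where the example workflow looks for them:

```bash
# Any "No such file or directory" line means a misplaced download
ls -lh models/diffusion_models/mochi_preview_dit_fp8_e4m3fn.safetensors \
       models/clip/t5xxl_fp8_e4m3fn_scaled.safetensors \
       models/vae/mochi_vae.safetensors
```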
6. Launch ComfyUI with MPS-friendly flags
```bash
export PYTORCH_ENABLE_MPS_FALLBACK=1
python main.py --force-fp16
```
`PYTORCH_ENABLE_MPS_FALLBACK=1` tells PyTorch to silently fall back to CPU for any op that isn't yet implemented on MPS, instead of crashing. It is the single most useful env var on a Mac. Open http://127.0.0.1:8188 and load the official Mochi example workflow.
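If you relaunch often, a small wrapper keeps the env var and flag together — a sketch; `launch_mochi.sh` is our name, not a ComfyUI convention:

```bash
#!/usr/bin/env bash
# launch_mochi.sh — save in the ComfyUI checkout, chmod +x, then ./launch_mochi.sh
cd "$(dirname "$0")"
source venv/bin/activate
export PYTORCH_ENABLE_MPS_FALLBACK=1
exec python main.py --force-fp16
```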
7. First generation
Use the example prompt to validate end-to-end before iterating on your own. Frame count and fps — e.g. the 73 frames at 24 fps mentioned in the original — are set in the workflow's nodes, not parsed from the prompt text:

```
A close-up shot of a strawberry falling into a glass of milk in slow motion,
photorealistic, 35mm film, soft window light
```
Expect first-generation startup to take several minutes — the T5-XXL text encoder (~10 GB) is being loaded. Subsequent generations re-use the warm encoder. The output writes to `ComfyUI/output/` as an MP4.
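Since ffmpeg came in with step 2, `ffprobe` makes a quick sanity check that the clip matches the settings you dialled in:

```bash
# Inspect the newest render: resolution, frame rate, frame count
latest=$(ls -t output/*.mp4 | head -n 1)
ffprobe -v error -select_streams v:0 \
  -show_entries stream=width,height,r_frame_rate,nb_frames \
  -of default=noprint_wrappers=1 "$latest"
```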
Performance on Apple Silicon — what's realistic
The numbers in the original 2025 guide ("4.2 minutes at 512×512", "11.5 minutes at 1024×1024") were not measured on Apple Silicon, and Mochi can't actually generate at 1024×1024 — its trained resolution is 480×848. Realistic April-2026 numbers:
| Hardware | Resolution / frames | Precision | Approx. wall-clock |
|---|---|---|---|
| M2 Pro, 32 GB | 480×848 / 37 frames | fp8 | ~25–40 min, swap pressure |
| M3 Max, 64 GB | 480×848 / 49 frames | fp8 | ~10–15 min |
| M4 Max, 64 GB | 480×848 / 49 frames | fp8 | ~7–11 min |
| M5 Max, 128 GB | 480×848 / 73 frames | bf16 | ~5–8 min (Neural Accelerator) |
| NVIDIA H100 (reference) | 480×848 / 73 frames | bf16 | ~1–3 min per generated second |
These are community-reported ranges from r/StableDiffusion and r/comfyui threads in early 2026, not vendor numbers — your mileage will vary with cooling, sampler choice, step count, and whether the text encoder stays resident. Apple's own benchmarks for image generation on M5 (3.8× M4 Max for FLUX-dev-4bit under MLX) are not directly applicable to Mochi, which runs through PyTorch/MPS rather than MLX.
VBench context
On the public VBench leaderboard, Wan 2.2 currently leads open-weights models with around 84.7% aggregate. Mochi 1 sits below that on aggregate but notably higher on the "Dynamic Degree" / motion-quality sub-scores. If your shot is a moving subject, Mochi is often the better starting point even though Wan 2.2 has the better overall number. See the VBench leaderboard linked in References.
Alternatives if Mochi isn't the right tool
Open-weights options worth knowing
- Wan 2.2 (Alibaba, Apache 2.0): MoE diffusion backbone, native 720p, T2V-A14B and I2V-A14B variants. Best aggregate quality of any open model in early 2026.
- HunyuanVideo (Tencent, 13B): Largest open-weights video model. Excellent temporal consistency on 5-second clips. Largest community ecosystem on r/StableDiffusion.
- LTX-2: Generates synchronized audio with video. Native macOS path via Draw Things; covered in our guide on running it free in ComfyUI.
- SkyReels V1 (Hunyuan I2V): Image-to-video specialist; runs on consumer hardware.
Closed models (pay per clip)
- Veo 3.1 (Google): Cinematic 4K with native audio. Best quality of the closed pack as of April 2026.
- Kling 3.0: Native 4K @ 60 fps, has a free tier; widely cited as the value pick.
- Sora 2: Discontinued by OpenAI on April 26, 2026. Don't build on it; migrate any pipelines.
Decision tree
- Need it commercial-clean and on-device? → Open weights only. Continue.
- Need realistic motion above all else? → Mochi 1.
- Need the highest aggregate quality, OK with MoE complexity? → Wan 2.2.
- Need synchronized audio? → LTX-2 (or pay for Veo 3.1).
- Want a one-click Mac app, no Python? → Draw Things with Wan 2.2 5B or LTX-2.
- Need best output and budget allows pay-per-clip? → Veo 3.1 or Kling 3.0.
Prompt template that still works
```
[Subject doing action] | [environment / lighting] | [camera + film stock] | [motion notes]
```
Example:

```
"A barista pouring espresso into a glass cup | warm window light, morning kitchen |
shot on 35mm film, shallow depth of field | slow-motion, crema swirling"
```
Mochi rewards specificity about motion — "slow-motion, crema swirling" tells the model what to physically simulate. Generic style tags ("8k, masterpiece") are mostly noise and waste tokens against the 256-token text-encoder limit.
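If you're unsure whether a long prompt fits, you can count tokens with the same T5 tokenizer family Mochi's encoder uses — a sketch assuming the `transformers` and `sentencepiece` packages from ComfyUI's requirements are installed; the first run downloads a few small tokenizer files from Hugging Face:

```bash
python -c "
from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained('google/t5-v1_1-xxl')
prompt = 'A barista pouring espresso into a glass cup, warm window light'
print(len(tok(prompt).input_ids), 'tokens (limit: 256)')
"
```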
Common pitfalls and troubleshooting
| Symptom | Fix |
|---|---|
| `NotImplementedError: aten::… not implemented for MPS` | Set `PYTORCH_ENABLE_MPS_FALLBACK=1` before launching ComfyUI |
| "FP8 not supported on MPS" | Open ComfyUI issue #10292 — use the bf16 diffusion weights, or apply the `channels_last` workaround discussed in ComfyUI Discussion #13273 |
| Process killed mid-generation, "out of memory" | Switch to fp8 weights, lower frame count to 25–37, close other apps; on 16 GB Macs swap pressure can kill the process |
| Generation runs but output is solid colour / noise | Almost always a wrong VAE or wrong text encoder file — re-download to `models/vae` and `models/clip` |
| T5 encoder load takes 10+ minutes | Expected on first run; SSD vs HDD makes a 5× difference. Move ComfyUI to internal NVMe if you're on external storage |
| "Animated" prompts produce mush | Mochi 1 is trained on photorealistic data; it underperforms on cartoon / anime styles. Use HunyuanVideo or a Wan 2.2 LoRA tuned for animation instead |
What was removed from the 2025 version of this guide and why
- `sudo spctl --master-disable` — disabling Gatekeeper system-wide is not required for any of these installs and is bad security hygiene.
- "60 fps at 1024×1024 in 11.5 minutes" benchmark — Mochi 1 generates at 480×848, ~30 fps. The original numbers were not reproducible.
- "CleanMyMac X" recommendation — unnecessary; `du -sh ~/Library/Caches/*` and Storage Settings cover it natively.
- "Course: Master AI Video Creation on Udemy" link — couldn't verify an authoritative source; removed.
- The Genmo Discord link in the original is preserved, but verify it's still active before relying on it; the community has largely shifted to GitHub Issues and r/StableDiffusion.
Where this fits if you're hiring
If you're spinning up an AI video pipeline at a startup, the cost of running Mochi (or Wan 2.2, or HunyuanVideo) locally is not the model — it's the engineer who can keep the workflow productive while ComfyUI, PyTorch nightly, and macOS itself all move. Codersera works with companies that want to hire vetted remote developers for exactly this kind of fast-moving ML / creative-tooling work, and our OpenClaw + Ollama setup guide is the companion piece on the agent side of local AI.
FAQ
Has Mochi 1 HD (720p) shipped yet?
No. As of April 2026, the only public Mochi weights are the October 2024 480p preview. GitHub issue #132 tracking the HD release has been open without an official ETA since February 2025.
Can I run Mochi 1 on an M1 with 16 GB?
Technically yes, with the fp8 ComfyUI workflow and aggressive frame-count limits (25–37 frames at 480×848). Practically, swap pressure makes it slow and unreliable. M3 Max with 36 GB+ is a much better starting point.
Does Mochi 1 support image-to-video?
Not in the public weights. Image-to-video is a frequently-requested feature on the Hugging Face discussions but has not been added. For I2V, use Wan 2.2 I2V-A14B, HunyuanVideo I2V, or SkyReels V1.
What's the licence — can I use Mochi 1 commercially?
Yes. Apache 2.0. You can fine-tune, ship products built on it, and integrate it into commercial pipelines. Wan 2.2 is also Apache 2.0; HunyuanVideo has its own permissive licence — check the specific terms before redistributing weights.
Should I use the upstream genmoai/mochi repo or ComfyUI?
On a Mac, ComfyUI. The upstream repo targets H100s with 60 GB VRAM and prioritises flexibility over memory efficiency. ComfyUI ships the fp8 weights and attention-backend options that make Mac inference practical.
Is Mochi better than Wan 2.2?
It depends. Wan 2.2 wins on aggregate VBench (~84.7%) and native 720p. Mochi wins on natural motion physics. For a moving subject where motion has to read as physically real, try Mochi first; for everything else, Wan 2.2 is usually the stronger default.
Does Apple's MLX framework run Mochi 1?
Not directly. Mochi 1 ships as PyTorch checkpoints; ComfyUI runs them through the PyTorch MPS backend. There are community efforts to port diffusion models to MLX (Apple's M5 announcement explicitly highlighted MLX gains for image generation), but no production Mochi-on-MLX path exists as of April 2026.
Why did Sora 2 shut down?
OpenAI announced on April 2, 2026 that Sora 2 would cease operations on April 26, framed as "strategic reallocation of compute". Industry reporting attributes it to Sora's compute-per-second-of-video being uneconomic at competing prices. If you had Sora 2 in a pipeline, migrate to Veo 3.1, Kling 3.0, or open weights.
Related Codersera guides
- OpenClaw + Ollama setup guide for running local AI agents (2026)
- Run LTX-2 on ComfyUI locally and free — generate videos with audio
- Run SkyReels V1 (Hunyuan I2V) on macOS — step-by-step
- Install and run Hunyuan3D 2 on macOS
- Run Mochi 1 on Windows — step-by-step guide
- Set up and run ComfyUI Copilot on macOS
References and further reading
- Genmo Mochi 1 model card on Hugging Face — architecture details, parameter counts, licence
- genmoai/mochi on GitHub — upstream reference implementation
- Genmo blog: Mochi 1, a new SOTA in open text-to-video — original release post
- ComfyUI Mochi example workflow — official Mac/consumer-GPU workflow
- ComfyUI blog: Run Mochi in ComfyUI with consumer GPU — fp8/bf16 variant guidance
- VBench Leaderboard on Hugging Face — current open-source video benchmark scores
- macOS Tahoe (Wikipedia) — release timeline, supported hardware
- Apple Newsroom: MacBook Pro with M5 Pro and M5 Max (March 2026) — chip specs, unified memory ceiling
- ComfyUI issue #10292: macOS MPS — FP8 not supported, channels_last errors — current known-good workarounds