Last updated April 2026 — refreshed for current model/tool versions.
Genmo's Mochi 1 is still the canonical "open-source video model with realistic motion" — a 10B-parameter Asymmetric Diffusion Transformer (AsymmDiT) released under Apache 2.0 in October 2024 — but the macOS story around it has changed substantially since the original guide. macOS Tahoe 26 is the current OS, M4- and M5-series chips ship with up to 128 GB of unified memory, ComfyUI's Mochi node now runs in under 24 GB, and competing open weights (Wan 2.2, HunyuanVideo, LTX-2) have caught up or pulled ahead on benchmark scores. This guide walks through a working 2026 install on Apple Silicon, where Mochi still wins, and when to pick something else.
What changed in 2026
- macOS baseline is now Tahoe 26 (released Sep 15, 2025). Sonoma 14.x is two majors behind. macOS 26.4 added M5-specific MLX optimizations; macOS 26.5 is in developer beta as of April 2026.
- M5 Pro / M5 Max MacBook Pros shipped March 2026 with up to 128 GB unified memory and a Neural Accelerator inside every GPU core. Apple cites up to 3.8x faster image generation vs M4 Max for FLUX-class workloads under MLX.
- Mochi 1 still has not shipped an HD/720p release. The 480p preview from October 2024 remains the only public weights; GitHub issue #132 tracking the HD release has been open since February 2025 with no official ETA.
- ComfyUI is now the practical Apple Silicon path for Mochi, rather than the upstream genmoai/mochi repo, which targets H100s. The official ComfyUI Mochi node ships fp8 and bf16 variants, supports multiple attention backends, and runs in under 24 GB.
- VBench leadership has moved. Wan 2.2 (Apache 2.0, Alibaba) scores ~84.7% aggregate on VBench, ahead of Mochi on raw quality, but Mochi still wins on natural motion physics.
- Sora 2 was retired by OpenAI on April 26, 2026. Open-weights options (Mochi, Wan 2.2, HunyuanVideo, LTX-2) and Veo 3.1 / Kling 3.0 now make up the field for serious video work.

Want the full picture? Read our continuously-updated Open-Source LLMs Landscape (2026) — every notable open-weights model, license, and hosting cost.
TL;DR — Should you run Mochi 1 on a Mac in 2026?
| If you want… | Use | Why |
|---|---|---|
| Best open-source motion physics, learning project | Mochi 1 via ComfyUI (this guide) | Asymmetric DiT still produces the most natural motion of any OSS video model |
| Highest VBench quality, Apache-2.0 | Wan 2.2 (T2V-A14B) | ~84.7% VBench aggregate, MoE backbone, native 720p |
| Largest open community, broad ComfyUI support | HunyuanVideo (13B, Tencent) | Most-discussed OSS video model, strong temporal consistency |
| Native Mac app, no Python, with audio | Draw Things + LTX-2 | Swift/Metal app; LTX-2 generates audio alongside video |
| Best closed-model output, will pay per clip | Veo 3.1 or Kling 3.0 | Veo 3.1: cinematic 4K + audio. Kling 3.0: native 4K@60, free tier |
If you're on an M1/M2 with 16 GB unified memory, expect to use the fp8 ComfyUI workflow and accept long render times. If you're on an M3 Max / M4 Max / M5 Max with 64 GB+, Mochi runs comfortably alongside text-encoder offload. The original guide's "60 fps at 1024×1024 in 11.5 minutes" claim from January 2025 was never reproducible on Apple Silicon and has been removed.
Why Mochi 1 is still worth running locally
Per the Hugging Face model card, Mochi 1 is a 10B-parameter AsymmDiT (48 layers, 24 attention heads, 3072 visual / 1536 text dimension) plus a 362M AsymmVAE doing 8×8 spatial and 6× temporal compression. The 480p preview generates 30 fps clips of roughly 5 seconds; the licence is permissive Apache 2.0, which matters if you plan to ship anything commercial.
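To make those compression numbers concrete, here's a back-of-envelope sketch (our arithmetic, not from the model card — the VAE's exact temporal frame math may differ slightly) of the latent grid a 480×848, 73-frame clip reduces to:

```bash
# 8x8 spatial and ~6x temporal compression, per the model card
python3 -c "h, w, t = 480, 848, 73; print(h // 8, w // 8, -(-t // 6))"
# -> 60 106 13  (latent height, width, frames)
```

That 64× per-frame reduction (and roughly 6× across time) is a big part of why a 10B-parameter video model fits on a laptop at all.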
The motion quality is the single thing that keeps Mochi relevant: independent reviews and VBench breakouts consistently note that Mochi's asymmetric architecture produces the most physically plausible movement of any open-weights video model in 2026 — water sloshes correctly, fabric drapes correctly, hair settles correctly. Wan 2.2 wins on aggregate quality, HunyuanVideo wins on community size, but for a specific shot where the motion has to read as real, Mochi is still the first thing to try. Pair this with the broader local-AI workflow we cover in the OpenClaw + Ollama setup guide for running local AI agents, and you have an end-to-end on-device creative stack.
System requirements (April 2026)
| Component | Minimum (will run, slowly) | Recommended (comfortable) |
|---|---|---|
| macOS | Sonoma 14.5 (PyTorch MPS reliable from this release) | Tahoe 26.4 or later |
| Apple Silicon | M1 Pro / M2 | M3 Max, M4 Max, M5 Pro, or M5 Max |
| Unified memory | 16 GB (fp8 only, expect swap) | 48–64 GB+ for bf16, 128 GB on M5 Max for headroom |
| Storage | 40 GB free (fp8 model + T5 + VAE) | 100 GB+ NVMe; weights, samples, ComfyUI cache grow fast |
| Python | 3.10 or 3.11 | 3.11.x via Homebrew or pyenv |
| PyTorch | 2.5 stable | PyTorch nightly (MPS gets fixes there first) |
Apple Silicon's unified memory architecture means there is no discrete "VRAM" — the GPU and CPU share the same pool. ComfyUI exposes this as `VRAMState.SHARED`. In practice, a 64 GB M3/M4/M5 Max behaves like a "60 GB GPU" for diffusion workloads, which is why Macs are unexpectedly competitive with 24 GB consumer NVIDIA cards on memory-bound video models.
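To see what you're working with, a quick terminal check (a sketch using the built-in `sysctl`; macOS and other apps will keep a few GB of this resident):

```bash
# Total unified memory in GB — the single pool the GPU and CPU share
echo "$(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 )) GB"
```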
Step-by-step install — the ComfyUI path
The original guide pointed at genmoai/mochi upstream, which assumes ≥1 H100 and 60 GB VRAM. On Apple Silicon, use ComfyUI — its Mochi node has the fp8 weights and the attention-backend toggles that actually let it run on a Mac. Skip the upstream repo unless you want to read the reference implementation.
1. Update macOS and install Xcode CLT
```bash
# Confirm you are on macOS 14.5+ (preferably Tahoe 26.x)
sw_vers

# Install Xcode Command Line Tools (compiler, git)
xcode-select --install
```
2. Install Homebrew, Python 3.11, ffmpeg
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install python@3.11 ffmpeg git
```
The original guide ran `sudo spctl --master-disable`. Don't. Disabling Gatekeeper system-wide is not necessary for any step here and weakens the OS substantially. Homebrew, Python wheels, and the model weights all install cleanly with Gatekeeper on.
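If you want to verify Gatekeeper's state — particularly if you ran the old guide's command at some point — `spctl` reports it directly:

```bash
# Prints "assessments enabled" when Gatekeeper is on
spctl --status

# Only needed if the old guide's command was ever run:
# sudo spctl --master-enable
```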
3. Clone ComfyUI and create a venv
```bash
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
python3.11 -m venv venv
source venv/bin/activate
```
4. Install PyTorch nightly (MPS) and ComfyUI deps
ComfyUI on macOS runs noticeably better on PyTorch nightly — MPS fixes land there months before the stable channel.
```bash
pip install --pre torch torchvision torchaudio \
  --index-url https://download.pytorch.org/whl/nightly/cpu
pip install -r requirements.txt
```
Note: on Apple Silicon you install the CPU wheel — MPS support is built in and is selected at runtime via `torch.backends.mps.is_available()`. There is no separate "MPS wheel".
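Before moving on, a one-line sanity check that the wheel you just installed can actually see the GPU:

```bash
# Should print True on Apple Silicon with a healthy install
python -c "import torch; print(torch.backends.mps.is_available())"
```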
5. Download the Mochi 1 weights (fp8 for Macs)
From the ComfyUI examples page (comfyanonymous.github.io/ComfyUI_examples/mochi), grab the all-in-one packaged checkpoint or the individual files:
- Diffusion model: `mochi_preview_dit_fp8_e4m3fn.safetensors` → `ComfyUI/models/diffusion_models/`
- Text encoder: `t5xxl_fp8_e4m3fn_scaled.safetensors` → `ComfyUI/models/clip/`
- VAE: `mochi_vae.safetensors` → `ComfyUI/models/vae/`
If you have 64 GB+ of unified memory, swap the fp8 diffusion model for the bf16 variant for marginally better quality. The bf16 file is roughly 2× the size on disk.
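Whichever precision you choose, a quick listing from the ComfyUI root confirms the files landed where the example workflow looks for them:

```bash
# Any "No such file or directory" line means a misplaced download
ls -lh models/diffusion_models/mochi_preview_dit_fp8_e4m3fn.safetensors \
       models/clip/t5xxl_fp8_e4m3fn_scaled.safetensors \
       models/vae/mochi_vae.safetensors
```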
6. Launch ComfyUI with MPS-friendly flags
```bash
export PYTORCH_ENABLE_MPS_FALLBACK=1
python main.py --force-fp16
```
`PYTORCH_ENABLE_MPS_FALLBACK=1` tells PyTorch to silently fall back to CPU for any op that isn't yet implemented on MPS, instead of crashing. It is the single most useful env var on a Mac. Open http://127.0.0.1:8188 and load the official Mochi example workflow.
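If you relaunch often, a small wrapper keeps the env var and flag together — a sketch; `launch_mochi.sh` is our name, not a ComfyUI convention:

```bash
#!/usr/bin/env bash
# launch_mochi.sh — save in the ComfyUI checkout, chmod +x, then ./launch_mochi.sh
cd "$(dirname "$0")"
source venv/bin/activate
export PYTORCH_ENABLE_MPS_FALLBACK=1
exec python main.py --force-fp16
```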
7. First generation
Use the example prompt to validate end-to-end before iterating on your own. Frame count and fps — e.g. the 73 frames at 24 fps mentioned in the original — are set in the workflow's nodes, not parsed from the prompt text:

```
A close-up shot of a strawberry falling into a glass of milk in slow motion,
photorealistic, 35mm film, soft window light
```
Expect first-generation startup to take several minutes — the T5-XXL text encoder (~10 GB) is being loaded. Subsequent generations re-use the warm encoder. The output writes to `ComfyUI/output/` as an MP4.
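Since ffmpeg came in with step 2, `ffprobe` makes a quick sanity check that the clip matches the settings you dialled in:

```bash
# Inspect the newest render: resolution, frame rate, frame count
latest=$(ls -t output/*.mp4 | head -n 1)
ffprobe -v error -select_streams v:0 \
  -show_entries stream=width,height,r_frame_rate,nb_frames \
  -of default=noprint_wrappers=1 "$latest"
```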
Performance on Apple Silicon — what's realistic
The numbers in the original 2025 guide ("4.2 minutes at 512×512", "11.5 minutes at 1024×1024") were not measured on Apple Silicon, and Mochi can't actually generate at 1024×1024 — its trained resolution is 480×848. Realistic April-2026 numbers:
| Hardware | Resolution / frames | Precision | Approx. wall-clock |
|---|---|---|---|
| M2 Pro, 32 GB | 480×848 / 37 frames | fp8 | ~25–40 min, swap pressure |
| M3 Max, 64 GB | 480×848 / 49 frames | fp8 | ~10–15 min |
| M4 Max, 64 GB | 480×848 / 49 frames | fp8 | ~7–11 min |
| M5 Max, 128 GB | 480×848 / 73 frames | bf16 | ~5–8 min (Neural Accelerator) |
| NVIDIA H100 (reference) | 480×848 / 73 frames | bf16 | ~1–3 min per generated second |
These are community-reported ranges from r/StableDiffusion and r/comfyui threads in early 2026, not vendor numbers — your mileage will vary with cooling, sampler choice, step count, and whether the text encoder stays resident. Apple's own benchmarks for image generation on M5 (3.8× M4 Max for FLUX-dev-4bit under MLX) are not directly applicable to Mochi, which runs through PyTorch/MPS rather than MLX.
VBench context
On the public VBench leaderboard, Wan 2.2 currently leads open-weights models with around 84.7% aggregate. Mochi 1 sits below that on aggregate but notably higher on the "Dynamic Degree" / motion-quality sub-scores. If your shot is a moving subject, Mochi is often the better starting point even though Wan 2.2 has the better overall number. See the VBench leaderboard linked in References.
Alternatives if Mochi isn't the right tool
Open-weights options worth knowing
- Wan 2.2 (Alibaba, Apache 2.0): MoE diffusion backbone, native 720p, T2V-A14B and I2V-A14B variants. Best aggregate quality of any open model in early 2026.
- HunyuanVideo (Tencent, 13B): Largest open-weights video model. Excellent temporal consistency on 5-second clips. Largest community ecosystem on r/StableDiffusion.
- LTX-2: Generates synchronized audio with video. Native macOS path via Draw Things; covered in our guide on running it free in ComfyUI.
- SkyReels V1 (Hunyuan I2V): Image-to-video specialist; runs on consumer hardware.
Closed models (pay per clip)
- Veo 3.1 (Google): Cinematic 4K with native audio. Best quality of the closed pack as of April 2026.
- Kling 3.0: Native 4K @ 60 fps, has a free tier; widely cited as the value pick.
- Sora 2: Discontinued by OpenAI on April 26, 2026. Don't build on it; migrate any pipelines.
Decision tree
- Need it commercial-clean and on-device? → Open weights only. Continue.
- Need realistic motion above all else? → Mochi 1.
- Need the highest aggregate quality, OK with MoE complexity? → Wan 2.2.
- Need synchronized audio? → LTX-2 (or pay for Veo 3.1).
- Want a one-click Mac app, no Python? → Draw Things with Wan 2.2 5B or LTX-2.
- Need best output and budget allows pay-per-clip? → Veo 3.1 or Kling 3.0.
Prompt template that still works
```
[Subject doing action] | [environment / lighting] | [camera + film stock] | [motion notes]
```
Example:

```
"A barista pouring espresso into a glass cup | warm window light, morning kitchen |
shot on 35mm film, shallow depth of field | slow-motion, crema swirling"
```
Mochi rewards specificity about motion — "slow-motion, crema swirling" tells the model what to physically simulate. Generic style tags ("8k, masterpiece") are mostly noise and waste tokens against the 256-token text-encoder limit.
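If you're unsure whether a long prompt fits, you can count tokens with the same T5 tokenizer family Mochi's encoder uses — a sketch assuming the `transformers` and `sentencepiece` packages from ComfyUI's requirements are installed; the first run downloads a few small tokenizer files from Hugging Face:

```bash
python -c "
from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained('google/t5-v1_1-xxl')
prompt = 'A barista pouring espresso into a glass cup, warm window light'
print(len(tok(prompt).input_ids), 'tokens (limit: 256)')
"
```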
Common pitfalls and troubleshooting
| Symptom | Fix |
|---|---|
| `NotImplementedError: aten::… not implemented for MPS` | Set `PYTORCH_ENABLE_MPS_FALLBACK=1` before launching ComfyUI |
| "FP8 not supported on MPS" | Open ComfyUI issue #10292 — use the bf16 diffusion weights, or apply the `channels_last` workaround discussed in ComfyUI Discussion #13273 |
| Process killed mid-generation, "out of memory" | Switch to fp8 weights, lower frame count to 25–37, close other apps; on 16 GB Macs swap pressure can kill the process |
| Generation runs but output is solid colour / noise | Almost always a wrong VAE or wrong text encoder file — re-download to `models/vae` and `models/clip` |
| T5 encoder load takes 10+ minutes | Expected on first run; SSD vs HDD makes a 5× difference. Move ComfyUI to internal NVMe if you're on external storage |
| "Animated" prompts produce mush | Mochi 1 is trained on photorealistic data; it underperforms on cartoon / anime styles. Use HunyuanVideo or a Wan 2.2 LoRA tuned for animation instead |
What was removed from the 2025 version of this guide and why
- `sudo spctl --master-disable` — disabling Gatekeeper system-wide is not required for any of these installs and is bad security hygiene.
- "60 fps at 1024×1024 in 11.5 minutes" benchmark — Mochi 1 generates at 480×848, ~30 fps. The original numbers were not reproducible.
- "CleanMyMac X" recommendation — unnecessary; `du -sh ~/Library/Caches/*` and Storage Settings cover it natively.
- "Course: Master AI Video Creation on Udemy" link — couldn't verify an authoritative source; removed.
- The Genmo Discord link in the original is preserved, but verify it's still active before relying on it; the community has largely shifted to GitHub Issues and r/StableDiffusion.
Where this fits if you're hiring
If you're spinning up an AI video pipeline at a startup, the cost of running Mochi (or Wan 2.2, or HunyuanVideo) locally is not the model — it's the engineer who can keep the workflow productive while ComfyUI, PyTorch nightly, and macOS itself all move. Codersera works with companies that want to hire vetted remote developers for exactly this kind of fast-moving ML / creative-tooling work, and our OpenClaw + Ollama setup guide is the companion piece on the agent side of local AI.
FAQ
Has Mochi 1 HD (720p) shipped yet?
No. As of April 2026, the only public Mochi weights are the October 2024 480p preview. GitHub issue #132 tracking the HD release has been open without an official ETA since February 2025.
Can I run Mochi 1 on an M1 with 16 GB?
Technically yes, with the fp8 ComfyUI workflow and aggressive frame-count limits (25–37 frames at 480×848). Practically, swap pressure makes it slow and unreliable. M3 Max with 36 GB+ is a much better starting point.
Does Mochi 1 support image-to-video?
Not in the public weights. Image-to-video is a frequently-requested feature on the Hugging Face discussions but has not been added. For I2V, use Wan 2.2 I2V-A14B, HunyuanVideo I2V, or SkyReels V1.
What's the licence — can I use Mochi 1 commercially?
Yes. Apache 2.0. You can fine-tune, ship products built on it, and integrate it into commercial pipelines. Wan 2.2 is also Apache 2.0; HunyuanVideo has its own permissive licence — check the specific terms before redistributing weights.
Should I use the upstream genmoai/mochi repo or ComfyUI?
On a Mac, ComfyUI. The upstream repo targets H100s with 60 GB VRAM and prioritises flexibility over memory efficiency. ComfyUI ships the fp8 weights and attention-backend options that make Mac inference practical.
Is Mochi better than Wan 2.2?
It depends. Wan 2.2 wins on aggregate VBench (~84.7%) and native 720p. Mochi wins on natural motion physics. For a moving subject where motion has to read as physically real, try Mochi first; for everything else, Wan 2.2 is usually the stronger default.
Does Apple's MLX framework run Mochi 1?
Not directly. Mochi 1 ships as PyTorch checkpoints; ComfyUI runs them through the PyTorch MPS backend. There are community efforts to port diffusion models to MLX (Apple's M5 announcement explicitly highlighted MLX gains for image generation), but no production Mochi-on-MLX path exists as of April 2026.
Why did Sora 2 shut down?
OpenAI announced on April 2, 2026 that Sora 2 would cease operations on April 26, framed as "strategic reallocation of compute". Industry reporting attributes it to Sora's compute-per-second-of-video being uneconomic at competing prices. If you had Sora 2 in a pipeline, migrate to Veo 3.1, Kling 3.0, or open weights.
Related Codersera guides
- OpenClaw + Ollama setup guide for running local AI agents (2026)
- Run LTX-2 on ComfyUI locally and free — generate videos with audio
- Run SkyReels V1 (Hunyuan I2V) on macOS — step-by-step
- Install and run Hunyuan3D 2 on macOS
- Run Mochi 1 on Windows — step-by-step guide
- Set up and run ComfyUI Copilot on macOS
References and further reading
- Genmo Mochi 1 model card on Hugging Face — architecture details, parameter counts, licence
- genmoai/mochi on GitHub — upstream reference implementation
- Genmo blog: Mochi 1, a new SOTA in open text-to-video — original release post
- ComfyUI Mochi example workflow — official Mac/consumer-GPU workflow
- ComfyUI blog: Run Mochi in ComfyUI with consumer GPU — fp8/bf16 variant guidance
- VBench Leaderboard on Hugging Face — current open-source video benchmark scores
- macOS Tahoe (Wikipedia) — release timeline, supported hardware
- Apple Newsroom: MacBook Pro with M5 Pro and M5 Max (March 2026) — chip specs, unified memory ceiling
- ComfyUI issue #10292: macOS MPS — FP8 not supported, channels_last errors — current known-good workarounds