Last updated April 2026 — refreshed for current model/tool versions.
Genmo's Mochi 1 was, in October 2024, the first genuinely open text-to-video model that produced motion competitive with closed systems. Eighteen months later it is no longer the strongest open option — Wan 2.2 / 2.7, HunyuanVideo and LTX-Video have moved past it on benchmarks and VRAM efficiency — but it is still useful for natural human motion, it is Apache-2.0, and the ComfyUI workflow that runs it on a single 24 GB consumer GPU is now stable. This guide is the up-to-date Windows path: ComfyUI native nodes, FP8 weights, and a realistic comparison so you can decide whether Mochi is still the right tool for what you are building.
What changed since the last version of this guide
- Use ComfyUI, not the GenmoAI/Mochi repo's standalone scripts. ComfyUI added native Mochi support in November 2024 (BF16 + FP8 variants); the standalone path requires ~60 GB VRAM and is not consumer-friendly.
- Mochi 1 HD (720p) was promised for 2024 and has not shipped. The current public weights still output 480p (up to 84 frames). Treat it as a 480p preview, not a production HD model.
- The repo URL changed. It is github.com/genmoai/mochi (lower-case), not GenmoAI/Mochi-1.
- SwarmUI is no longer the recommended frontend for Mochi on Windows — ComfyUI's native nodes are. SwarmUI now wraps ComfyUI under the hood for video models.
- Cloud GPU prices fell. RunPod RTX 4090 is now ~$0.34/hr Community / ~$0.69/hr Secure; H100 is ~$2.39/hr on-demand (April 2026).
- Stronger 2026 alternatives exist. Wan 2.7 (Alibaba, March 2026), HunyuanVideo (Tencent), LTX-Video 0.9 (Lightricks), and the closed Veo 3.1 / Kling 3.0 are now the defaults most practitioners reach for. Sora 2 is being shut down on April 26 2026.
Want the full picture? Read our continuously-updated Open-Source LLMs Landscape (2026) — every notable open-weights model, license, and hosting cost.
TL;DR
| Question | Short answer |
|---|---|
| Can I run Mochi 1 on a single Windows GPU? | Yes — RTX 4090 / 3090 (24 GB) via ComfyUI's FP8 path. 16 GB cards work with offloading; 12 GB is borderline. |
| What resolution / length? | 480p (848×480), up to 84 frames (~3.5 s @ 24 fps). No official HD weights yet. |
| Time per clip on a 4090? | ~3–6 minutes for a max-length (~3.5 s) clip @ 480p with FP8 + sage-attention. |
| Best 2026 open alternative if Mochi falls short? | Wan 2.2 (or Wan 2.7 cloud) for quality, HunyuanVideo for cinematic scenes, LTX-Video for speed. |
| License? | Apache 2.0 — commercial use allowed. |
Why Mochi 1 still matters in 2026
- Apache 2.0 weights. Unlike many closed and source-available competitors, Mochi 1 is fully open for commercial use, fine-tuning and redistribution.
- Natural motion. On qualitative comparisons across r/StableDiffusion and the VBench leaderboard, Mochi 1 is consistently called out for fluid human motion and physically plausible movement at 30 fps.
- 10B-parameter Asymmetric Diffusion Transformer (AsymmDiT). Larger than CogVideoX-5B and competitive on text adherence with HunyuanVideo's 13B for many prompts.
- LoRA fine-tuning is mature. Genmo and the community ship trainers; you can fine-tune on a few seconds of source footage and get usable style transfer.
That said: if you only need a finished clip and do not care about open weights, Veo 3.1 and Kling 3.0 produce visibly better output and longer clips. This guide focuses on the case where you specifically want to run video generation locally on Windows.
System requirements (April 2026)
Hardware
| Component | Minimum (FP8, with offload) | Recommended (BF16, fast) |
|---|---|---|
| GPU | NVIDIA RTX 3060 12 GB / RTX 4060 Ti 16 GB | RTX 4090 24 GB or RTX 5090 32 GB (Blackwell) |
| System RAM | 32 GB DDR4/DDR5 | 64 GB DDR5 |
| CPU | 6-core (Ryzen 5 / Intel i5 12th gen+) | 8–16 core (Ryzen 7 7800X / i7-14700K) |
| Storage | 50 GB free NVMe SSD | 200 GB NVMe (fine-tunes, multiple models) |
| OS | Windows 10/11 64-bit, latest WDDM driver | Windows 11 24H2 + Studio driver 560+ |
The official Genmo standalone implementation needs ~60 GB VRAM. Do not use the standalone path on Windows consumer hardware. Use ComfyUI.
Software prerequisites
- Python 3.11+ (3.12 works; 3.13 still has rough edges with Torch on Windows).
- Git for Windows.
- NVIDIA driver 552+ with CUDA 12.4 or 12.6 runtime (ComfyUI's portable build ships its own Torch + CUDA — you do not need a system CUDA install).
- FFmpeg on PATH for export and post-processing.
- (Optional) Visual C++ Build Tools if you compile sage-attention or flash-attn for extra speed.
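Two quick sanity checks in PowerShell are worth running before installing anything else; the exact version numbers they print will vary, the point is simply that both commands resolve:
nvidia-smi          # confirms the NVIDIA driver is present and shows the CUDA runtime it supports
ffmpeg -version     # confirms FFmpeg is on PATH for the export step later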
Step-by-step installation on Windows
Step 1: Install Python and Git
- Install Python 3.11.x from python.org and tick Add Python to PATH in the installer.
- Install Git for Windows with default options.
- Verify in PowerShell: python --version and git --version.
Step 2: Install ComfyUI (portable build, recommended)
The portable Windows build of ComfyUI is the lowest-friction path — it bundles a working PyTorch + CUDA so you do not have to fight CUDA toolkit versions.
- Download the latest ComfyUI_windows_portable_nvidia.7z release from the ComfyUI releases page.
- Extract it (7-Zip required) to a fast SSD, e.g. D:\ComfyUI. Avoid paths with spaces.
- Double-click run_nvidia_gpu.bat once to confirm it launches at http://127.0.0.1:8188, then close it.
- Install ComfyUI-Manager by cloning it into ComfyUI\custom_nodes:
cd ComfyUI\custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager.git
- Restart ComfyUI. Click Manager > Update ComfyUI to pull the latest commit (Mochi support is in mainline).
Step 3: Download Mochi 1 weights
You need three files. Place each in the matching folder under ComfyUI\models\.
| File | Folder | Why |
|---|---|---|
| mochi_preview_dit_fp8_e4m3fn.safetensors (~10 GB) | models\diffusion_models\ | FP8 diffusion weights — fits 24 GB and below |
| t5xxl_fp8_e4m3fn_scaled.safetensors (~5 GB) | models\clip\ (or text_encoders on newer ComfyUI) | T5-XXL text encoder, FP8 scaled |
| mochi_vae.safetensors (~900 MB) | models\vae\ | Mochi-specific VAE for video |
All three are mirrored on Hugging Face under Comfy-Org/mochi_preview_repackaged and the original genmo/mochi-1-preview repo. If you have a 32 GB+ card, you can use the BF16 variant mochi_preview_dit_bf16.safetensors for slightly higher quality.
If you prefer one-file convenience, an all-in-one checkpoint that bundles the FP8 DiT + text encoder is on Civitai; drop it into models\checkpoints\ and use the corresponding Load Checkpoint workflow.
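If you prefer to script the download, the hf CLI used later in the troubleshooting section works here too. A minimal sketch, assuming you extracted the portable build to D:\ComfyUI and that the repackaged repo contains the three filenames from the table above (check the actual layout inside the repo before copying, and run the copy commands from D:\ComfyUI so the relative ComfyUI\models\ paths resolve):
hf download Comfy-Org/mochi_preview_repackaged --local-dir D:\staging\mochi   # roughly 16 GB staging download
Get-ChildItem D:\staging\mochi -Recurse -Filter mochi_preview_dit_fp8_e4m3fn.safetensors | Copy-Item -Destination ComfyUI\models\diffusion_models\
Get-ChildItem D:\staging\mochi -Recurse -Filter t5xxl_fp8_e4m3fn_scaled.safetensors | Copy-Item -Destination ComfyUI\models\clip\
Get-ChildItem D:\staging\mochi -Recurse -Filter mochi_vae.safetensors | Copy-Item -Destination ComfyUI\models\vae\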
Step 4: Load the example workflow
- Open ComfyUI at http://127.0.0.1:8188.
- From the menu, choose Workflow > Browse Templates > Video > Mochi, or download the example JSON from ComfyUI examples – Mochi.
- Drag the JSON onto the canvas. ComfyUI will auto-link nodes; verify the Load Diffusion Model, Load CLIP and Load VAE point at the files you downloaded.
- Set length=25 (≈1 s) for a smoke test, then queue the prompt. The first run downloads nothing extra; the JIT compile takes 30–60 s.
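Before queuing anything you can also confirm ComfyUI is actually seeing the GPU through its HTTP API; the /system_stats endpoint returns device and VRAM information (field names vary a little between ComfyUI versions, so treat this as a sanity check rather than a contract):
Invoke-RestMethod http://127.0.0.1:8188/system_stats | ConvertTo-Json -Depth 5   # PowerShell; prints GPU name plus total and free VRAM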
Step 5: Tune for your VRAM
- 24 GB (4090, 3090): FP8 DiT + FP8 T5 + default VAE tiling. ~3–6 min for a max-length (~3.5 s) clip @ 480p.
- 16 GB (4060 Ti, 4080 Mobile): FP8 everything, enable --lowvram in run_nvidia_gpu.bat and reduce length to 49 frames.
- 12 GB (3060, 4070): Add --novram with sequential offload, set VAE tile size to 128, expect 12–20 min per clip. Quality drops modestly.
- 32 GB+ (5090, A6000): Use BF16 DiT for the small but visible quality bump.
For an extra ~25% speed-up on Ada and Blackwell cards, install SageAttention or flash-attn 2.6+ and pass --use-sage-attention to ComfyUI.
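These flags all go into run_nvidia_gpu.bat in the folder you extracted the portable build to. The stock file is roughly the two lines below (exact contents shift slightly between releases); append whichever flags your card needs to the python line:
REM run_nvidia_gpu.bat: launch flags go after main.py
.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --use-sage-attention
pause
SageAttention itself has to be installed into the portable interpreter first (.\python_embeded\python.exe -m pip install sageattention); on Windows it also needs a working Triton build, so fall back to --use-pytorch-cross-attention if that install fails.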
Generate your first clip
- Prompt: Mochi rewards specific, motion-led prompts. Lead with subject + verb + camera. Example: "A neon-lit Tokyo alleyway at night, light rain, a black umbrella moves through the frame from right to left, slow tracking shot, 35mm film grain".
- Negative prompt: Mochi does not use a CFG-style negative prompt; leave it empty in the example workflow.
- Sampler: the default euler + simple scheduler with 50 steps is the canonical setting. Dropping to 30 steps cuts time ~40% with a small quality loss.
- CFG: 4.5 is the Mochi-recommended value (not the SD 7–12 range). Going above 6 introduces artifacts.
- Seed: fix it (e.g. 42) when iterating on a prompt so changes are attributable to the prompt, not the seed.
- Length: max 84 frames (~3.5 s @ 24 fps). Longer outputs come from concatenation or stitching, not a single inference.
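For stitching, FFmpeg's concat demuxer is the simplest route once the clips share resolution, frame rate and codec. A minimal sketch with placeholder filenames, where clips.txt lists the segments one per line:
file 'mochi_clip_01.mp4'
file 'mochi_clip_02.mp4'
ffmpeg -f concat -safe 0 -i clips.txt -c copy mochi_stitched.mp4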
Save the output via the Save Video node (h264 or webp). For sharing, post-process with FFmpeg:
ffmpeg -i mochi_out.mp4 -vf "minterpolate=fps=30,scale=1024:-2:flags=lanczos" -c:v libx264 -crf 18 mochi_out_1024.mp4
Performance and benchmarks (April 2026)
| Model | Params | Native res | Length | Min consumer VRAM | License |
|---|---|---|---|---|---|
| Mochi 1 preview | 10B | 480p | ~3.4 s (84 frames) | 12–24 GB | Apache 2.0 |
| HunyuanVideo | 13B | 720p | ~5 s | 24 GB (with offload), 40+ GB ideal | Tencent community license |
| Wan 2.2 (A14B MoE) | 14B (MoE) | 720p @ 24 fps | 5 s I2V / T2V | 16–24 GB (FP8/GGUF) | Apache 2.0 |
| Wan 2.7 (cloud) | n/a (closed weights as of April 2026) | 720p / 1080p | 2–15 s | API only via Alibaba Cloud Model Studio | Commercial API |
| LTX-Video 0.9 | ~700M | 768×512 | 5 s | 8–12 GB | RAIL-M (commercial-permissive) |
| CogVideoX-5B | 5B | 720p | ~6 s | 12 GB | Apache 2.0 |
On the public VBench leaderboard, Wan 2.2 currently leads the open-source field on aggregate score; HunyuanVideo leads on cinematic visual quality; Mochi 1 still ranks well on motion smoothness specifically. None of the open models beats Veo 3.1 or Kling 3.0 on aggregate human-preference benchmarks.
2026 closed-model alternatives worth knowing
- Google Veo 3.1 — true 4K @ 60 fps with synchronized audio; the current human-preference leader for short cinematic clips. API via Google AI Studio / Vertex.
- Kuaishou Kling 3.0 — best-in-class human motion, single-generation clips up to 3 minutes, strongest among Chinese closed models.
- OpenAI Sora 2 — strong physics-aware output, but OpenAI announced Sora 2 will be shut down on April 26 2026. Plan migrations to Veo 3.1 or Kling 3.0.
- Alibaba Wan 2.7 — released March 2026; introduces "Thinking Mode", first/last-frame conditioning, and an editing variant (wan2.7-videoedit). Currently API-only; weights for the open Wan line lag the closed releases by 1–2 versions.
How to choose: quick decision tree
- You need open weights, commercial-permissive, 480p is fine, single 24 GB GPU → Mochi 1 (this guide).
- You need 720p open weights and run Windows + 16–24 GB → Wan 2.2 (FP8 / GGUF) is now the community default.
- You need cinematic 5 s clips and have 24+ GB → HunyuanVideo via ComfyUI.
- You need fast (sub-30 s) iteration on a 12 GB GPU → LTX-Video 0.9.
- You want best-in-class output and don't need local → Veo 3.1 (API) or Kling 3.0.
- You need 1080p, 15 s clips with editing → Wan 2.7 via Alibaba Cloud Model Studio.
If your project is a one-off marketing video and you do not care about training or fine-tuning, the closed APIs will save you days. If you are building a product on top of video generation — e.g. a creator tool, a personalization pipeline, or a research artifact — open weights matter and Mochi / Wan 2.2 / HunyuanVideo are the realistic choices. Teams shipping product-grade pipelines around these models often pair them with strong infra and ML engineers; if that is your bottleneck, our vetted remote developer pool covers Python/Diffusers/ComfyUI experience explicitly. Local model orchestration at the agent layer is covered in our OpenClaw + Ollama setup guide for running local AI agents, which is the natural next step once your video pipeline is wired up.
Cloud GPU options (April 2026 pricing)
If your local card is below 12 GB, renting an hour or two beats fighting offloading.
| Provider | GPU | On-demand price | Notes |
|---|---|---|---|
| RunPod (Community) | RTX 4090 24 GB | ~$0.34/hr | Spot — can be interrupted |
| RunPod (Secure) | RTX 4090 24 GB | ~$0.69/hr | Templates for ComfyUI exist |
| RunPod | A100 80 GB | ~$1.19–1.89/hr | Comfortable for BF16 + long queues |
| RunPod | H100 80 GB | ~$2.39/hr on-demand | Cheaper with a 3-month savings-plan commitment |
| Massed Compute | A6000 48 GB | ~$0.40–0.60/hr | Pre-built ComfyUI image |
Verify rates on the RunPod pricing page before booking — these change quarterly. The typical recipe: deploy a Secure Cloud RTX 4090 with the official ComfyUI template, mount a network volume for your weights so you do not re-download 16 GB on every restart, and shut the pod down between sessions.
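Once the pod is up, a typical first session in its terminal looks like the commands below; /workspace is RunPod's persistent network-volume mount, and the download step mirrors the hf CLI usage shown later in this guide (adjust repo and paths to your setup):
cd /workspace
pip install -U "huggingface_hub[cli]"
hf download Comfy-Org/mochi_preview_repackaged --local-dir /workspace/models/mochi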
Common pitfalls and troubleshooting
CUDA out-of-memory at the VAE decode step
The DiT fits in 24 GB but the VAE decode briefly spikes. Lower VAE tile size to 128 or 64 in the VAE Decode (Tiled) node. If you used the non-tiled VAE Decode node, swap it for the tiled version.
Output is black or noise
Almost always a model-mismatch problem: a CLIP/text-encoder file that does not match the diffusion checkpoint, or a non-Mochi VAE. Re-download the three files from Comfy-Org/mochi_preview_repackaged and verify checksums.
Each step takes 20+ seconds on a 4090
You are most likely on PyTorch's default attention. Update ComfyUI, install sage-attention or flash-attn, and launch with --use-sage-attention or --use-pytorch-cross-attention. Also confirm Windows is running on the dedicated GPU (NVIDIA Control Panel > Manage 3D Settings > python.exe = High-performance NVIDIA processor).
"Torch was not compiled with CUDA enabled"
You ran ComfyUI from a system Python instead of the portable Python under ComfyUI\python_embeded. Use run_nvidia_gpu.bat; it points at the right interpreter. If you must use a venv, install Torch from the official CUDA wheel index (pip install torch --index-url https://download.pytorch.org/whl/cu124).
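To confirm which interpreter you are actually on and whether it sees CUDA, run a one-liner against the portable Python from the portable build's root folder:
.\python_embeded\python.exe -c "import torch; print(torch.__version__, torch.cuda.is_available())"   # should print a +cu12x build and True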
Windows long-path errors during model download
Enable Win32 long paths in Group Policy or the registry, and set git config --system core.longpaths true before cloning. Hugging Face's hf CLI avoids long-path issues entirely:
pip install -U "huggingface_hub[cli]"
hf download Comfy-Org/mochi_preview_repackaged --local-dir D:\models\mochi
What was removed from this guide and why
- The standalone GenmoAI/Mochi-1 repo path. The repo was renamed to genmoai/mochi, and on Windows consumer hardware the standalone path needs ~60 GB VRAM. Use ComfyUI.
- SwarmUI as the primary frontend. SwarmUI now wraps ComfyUI for video models; for Mochi specifically, ComfyUI's example workflow is the fastest path. SwarmUI is still fine if you prefer that UX.
- The "Step 1 typo / Option-2-before-Option-1" section. Fixed in this rewrite.
- CUDA 11.7 / PyTorch 2.0 instructions. Replaced with CUDA 12.4–12.6 and Torch 2.5+ via the ComfyUI portable build.
FAQ
When is Mochi 1 HD (720p) shipping?
Genmo announced an HD model alongside the October 2024 launch. As of April 2026 it has not been publicly released. The current weights remain 480p preview. If you need 720p open weights today, use Wan 2.2 or HunyuanVideo.
Will Mochi 1 run on a 12 GB GPU?
Yes, with offloading and FP8 — but slowly (12–20 min per 3 s clip on an RTX 3060). LTX-Video is a better fit at 12 GB and below.
Can I use Mochi 1 for commercial work?
Yes. The weights are Apache 2.0. Note this applies to the model; you are still responsible for ensuring your prompts and outputs do not infringe third-party rights or violate platform policies.
Does Mochi 1 do image-to-video?
Not officially. The released checkpoint is text-to-video only. Community wrappers like ComfyUI-MochiEdit add editing-style workflows, but for first-class image-to-video on open weights, Wan 2.2-I2V-A14B is the right tool.
What about running Mochi on macOS or AMD?
Apple Silicon support exists through community ports (MPS backend in ComfyUI), but is significantly slower and quality-fragile. AMD on Windows via DirectML is not viable. ROCm on Linux works on RX 7900 XTX-class cards. For a Mac walkthrough see our Run Mochi 1 on macOS guide.
Can I fine-tune a Mochi 1 LoRA on my own footage?
Yes — Genmo ships an official trainer, and there are community ComfyUI training nodes. You will want a 24 GB+ GPU for the training pass, even if inference runs on 12 GB.
Should I just use Wan 2.2 instead?
For most 2026 projects — yes. Wan 2.2 outputs 720p, has stronger benchmark numbers, runs on 16 GB cards via FP8/GGUF, and is also Apache-2.0. The reasons to still pick Mochi 1 are (1) you specifically want its motion characteristics, (2) you have an existing Mochi LoRA, or (3) you are evaluating multiple open models and want apples-to-apples comparisons.
Is Sora 2 still an option?
OpenAI announced Sora 2 will shut down on April 26 2026. Plan migrations to Veo 3.1 or Kling 3.0 if your pipeline depends on a closed API; plan migrations to Wan 2.2 or HunyuanVideo if you can move to open weights.
Related Codersera guides
- Run Mochi 1 on macOS: Step-by-Step Guide
- Run DeepSeek Janus-Pro on Windows: Complete Installation Guide
- Run DeepSeek Janus-Pro on Mac with ComfyUI
- OpenClaw + Ollama setup guide for running local AI agents
References & further reading
- genmo/mochi-1-preview — Hugging Face model card
- genmoai/mochi — official repository (GitHub)
- ComfyUI blog: Run Mochi in ComfyUI with consumer GPU
- ComfyUI examples — Mochi workflow
- VBench Leaderboard (Hugging Face)
- Wan-Video/Wan2.2 — Alibaba's open video model
- RunPod GPU pricing
- r/StableDiffusion — community video-gen discussion