Last updated April 2026 — refreshed for current model/tool versions.
Genmo's Mochi 1 was, in October 2024, the first genuinely open text-to-video model that produced motion competitive with closed systems. Eighteen months later it is no longer the strongest open option — Wan 2.2 / 2.7, HunyuanVideo and LTX-Video have moved past it on benchmarks and VRAM efficiency — but it is still useful for natural human motion, it is Apache-2.0, and the ComfyUI workflow that runs it on a single 24 GB consumer GPU is now stable. This guide is the up-to-date Windows path: ComfyUI native nodes, FP8 weights, and a realistic comparison so you can decide whether Mochi is still the right tool for what you are building.
What changed since the last version of this guide
- Use ComfyUI, not the GenmoAI/Mochi repo's standalone scripts. ComfyUI added native Mochi support in November 2024 (BF16 + FP8 variants); the standalone path requires ~60 GB VRAM and is not consumer-friendly.
- Mochi 1 HD (720p) was promised for 2024 and has not shipped. The current public weights still output 480p (up to 84 frames). Treat it as a 480p preview, not a production HD model.
- The repo URL changed. It is github.com/genmoai/mochi (lower-case), not GenmoAI/Mochi-1.
- SwarmUI is no longer the recommended frontend for Mochi on Windows — ComfyUI's native nodes are. SwarmUI now wraps ComfyUI under the hood for video models.
- Cloud GPU prices fell. RunPod RTX 4090 is now ~$0.34/hr Community / ~$0.69/hr Secure; H100 is ~$2.39/hr on-demand (April 2026).
- Stronger 2026 alternatives exist. Wan 2.7 (Alibaba, March 2026), HunyuanVideo (Tencent), LTX-Video 0.9 (Lightricks), and the closed Veo 3.1 / Kling 3.0 are now the defaults most practitioners reach for. Sora 2 is being shut down on April 26 2026.
Want the full picture? Read our continuously-updated Open-Source LLMs Landscape (2026) — every notable open-weights model, license, and hosting cost.
TL;DR
| Question | Short answer |
|---|---|
| Can I run Mochi 1 on a single Windows GPU? | Yes — RTX 4090 / 3090 (24 GB) via ComfyUI's FP8 path. 16 GB cards work with offloading; 12 GB is borderline. |
| What resolution / length? | 480p (848×480), up to 84 frames (~3.5 s @ 24 fps). No official HD weights yet. |
| Time per clip on a 4090? | ~3–6 minutes for a max-length (~3.5 s) clip @ 480p with FP8 + sage-attention. |
| Best 2026 open alternative if Mochi falls short? | Wan 2.2 (or Wan 2.7 cloud) for quality, HunyuanVideo for cinematic scenes, LTX-Video for speed. |
| License? | Apache 2.0 — commercial use allowed. |
Why Mochi 1 still matters in 2026
- Apache 2.0 weights. Unlike many closed and source-available competitors, Mochi 1 is fully open for commercial use, fine-tuning and redistribution.
- Natural motion. On qualitative comparisons across r/StableDiffusion and the VBench leaderboard, Mochi 1 is consistently called out for fluid human motion and physically plausible movement at 30 fps.
- 10B-parameter Asymmetric Diffusion Transformer (AsymmDiT). Larger than CogVideoX-5B and competitive on text adherence with HunyuanVideo's 13B for many prompts.
- LoRA fine-tuning is mature. Genmo and the community ship trainers; you can fine-tune on a few seconds of source footage and get usable style transfer.
That said: if you only need a finished clip and do not care about open weights, Veo 3.1 and Kling 3.0 produce visibly better output and longer clips. This guide focuses on the case where you specifically want to run video generation locally on Windows.
System requirements (April 2026)
Hardware
| Component | Minimum (FP8, with offload) | Recommended (BF16, fast) |
|---|---|---|
| GPU | NVIDIA RTX 3060 12 GB / RTX 4060 Ti 16 GB | RTX 4090 24 GB or RTX 5090 32 GB (Blackwell) |
| System RAM | 32 GB DDR4/DDR5 | 64 GB DDR5 |
| CPU | 6-core (Ryzen 5 / Intel i5 12th gen+) | 8–16 core (Ryzen 7 7800X / i7-14700K) |
| Storage | 50 GB free NVMe SSD | 200 GB NVMe (fine-tunes, multiple models) |
| OS | Windows 10/11 64-bit, latest WDDM driver | Windows 11 24H2 + Studio driver 560+ |
The official Genmo standalone implementation needs ~60 GB VRAM. Do not use the standalone path on Windows consumer hardware. Use ComfyUI.
Software prerequisites
- Python 3.11+ (3.12 works; 3.13 still has rough edges with Torch on Windows).
- Git for Windows.
- NVIDIA driver 552+ with CUDA 12.4 or 12.6 runtime (ComfyUI's portable build ships its own Torch + CUDA — you do not need a system CUDA install).
- FFmpeg on PATH for export and post-processing.
- (Optional) Visual C++ Build Tools if you compile sage-attention or flash-attn for extra speed.
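Two quick sanity checks in PowerShell are worth running before installing anything else; the exact version numbers they print will vary, the point is simply that both commands resolve:
nvidia-smi          # confirms the NVIDIA driver is present and shows the CUDA runtime it supports
ffmpeg -version     # confirms FFmpeg is on PATH for the export step later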
Step-by-step installation on Windows
Step 1: Install Python and Git
- Install Python 3.11.x from python.org and tick Add Python to PATH in the installer.
- Install Git for Windows with default options.
- Verify in PowerShell: python --version and git --version.
Step 2: Install ComfyUI (portable build, recommended)
The portable Windows build of ComfyUI is the lowest-friction path — it bundles a working PyTorch + CUDA so you do not have to fight CUDA toolkit versions.
- Download the latest ComfyUI_windows_portable_nvidia.7z release from the ComfyUI releases page.
- Extract it (7-Zip required) to a fast SSD, e.g. D:\ComfyUI. Avoid paths with spaces.
- Double-click run_nvidia_gpu.bat once to confirm it launches at http://127.0.0.1:8188, then close it.
- Install ComfyUI-Manager by cloning it into ComfyUI\custom_nodes:
cd ComfyUI\custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager.git
- Restart ComfyUI. Click Manager > Update ComfyUI to pull the latest commit (Mochi support is in mainline).
Step 3: Download Mochi 1 weights
You need three files. Place each in the matching folder under ComfyUI\models\.
| File | Folder | Why |
|---|---|---|
| mochi_preview_dit_fp8_e4m3fn.safetensors (~10 GB) | models\diffusion_models\ | FP8 diffusion weights — fits 24 GB and below |
| t5xxl_fp8_e4m3fn_scaled.safetensors (~5 GB) | models\clip\ (or text_encoders on newer ComfyUI) | T5-XXL text encoder, FP8 scaled |
| mochi_vae.safetensors (~900 MB) | models\vae\ | Mochi-specific VAE for video |
All three are mirrored on Hugging Face under Comfy-Org/mochi_preview_repackaged and the original genmo/mochi-1-preview repo. If you have a 32 GB+ card, you can use the BF16 variant mochi_preview_dit_bf16.safetensors for slightly higher quality.
If you prefer one-file convenience, an all-in-one checkpoint that bundles the FP8 DiT + text encoder is on Civitai; drop it into models\checkpoints\ and use the corresponding Load Checkpoint workflow.
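If you prefer to script the download, the hf CLI used later in the troubleshooting section works here too. A minimal sketch, assuming you extracted the portable build to D:\ComfyUI and that the repackaged repo contains the three filenames from the table above (check the actual layout inside the repo before copying, and run the copy commands from D:\ComfyUI so the relative ComfyUI\models\ paths resolve):
hf download Comfy-Org/mochi_preview_repackaged --local-dir D:\staging\mochi   # roughly 16 GB staging download
Get-ChildItem D:\staging\mochi -Recurse -Filter mochi_preview_dit_fp8_e4m3fn.safetensors | Copy-Item -Destination ComfyUI\models\diffusion_models\
Get-ChildItem D:\staging\mochi -Recurse -Filter t5xxl_fp8_e4m3fn_scaled.safetensors | Copy-Item -Destination ComfyUI\models\clip\
Get-ChildItem D:\staging\mochi -Recurse -Filter mochi_vae.safetensors | Copy-Item -Destination ComfyUI\models\vae\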
Step 4: Load the example workflow
- Open ComfyUI at http://127.0.0.1:8188.
- From the menu, choose Workflow > Browse Templates > Video > Mochi, or download the example JSON from ComfyUI examples – Mochi.
- Drag the JSON onto the canvas. ComfyUI will auto-link nodes; verify the Load Diffusion Model, Load CLIP and Load VAE point at the files you downloaded.
- Set length=25 (≈1 s) for a smoke test, then queue the prompt. The first run downloads nothing extra; the JIT compile takes 30–60 s.
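Before queuing anything you can also confirm ComfyUI is actually seeing the GPU through its HTTP API; the /system_stats endpoint returns device and VRAM information (field names vary a little between ComfyUI versions, so treat this as a sanity check rather than a contract):
Invoke-RestMethod http://127.0.0.1:8188/system_stats | ConvertTo-Json -Depth 5   # PowerShell; prints GPU name plus total and free VRAM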
Step 5: Tune for your VRAM
- 24 GB (4090, 3090): FP8 DiT + FP8 T5 + default VAE tiling. ~3–6 min for a max-length (~3.5 s) clip @ 480p.
- 16 GB (4060 Ti, 4080 Mobile): FP8 everything, enable --lowvram in run_nvidia_gpu.bat and reduce length to 49 frames.
- 12 GB (3060, 4070): Add --novram with sequential offload, set VAE tile size to 128, expect 12–20 min per clip. Quality drops modestly.
- 32 GB+ (5090, A6000): Use BF16 DiT for the small but visible quality bump.
For an extra ~25% speed-up on Ada and Blackwell cards, install SageAttention or flash-attn 2.6+ and pass --use-sage-attention to ComfyUI.
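These flags all go into run_nvidia_gpu.bat in the folder you extracted the portable build to. The stock file is roughly the two lines below (exact contents shift slightly between releases); append whichever flags your card needs to the python line:
REM run_nvidia_gpu.bat: launch flags go after main.py
.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --use-sage-attention
pause
SageAttention itself has to be installed into the portable interpreter first (.\python_embeded\python.exe -m pip install sageattention); on Windows it also needs a working Triton build, so fall back to --use-pytorch-cross-attention if that install fails.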
Generate your first clip
- Prompt: Mochi rewards specific, motion-led prompts. Lead with subject + verb + camera. Example: "A neon-lit Tokyo alleyway at night, light rain, a black umbrella moves through the frame from right to left, slow tracking shot, 35mm film grain".
- Negative prompt: Mochi does not use a CFG-style negative prompt; leave it empty in the example workflow.
- Sampler: the default euler + simple scheduler with 50 steps is the canonical setting. Dropping to 30 steps cuts time ~40% with a small quality loss.
- CFG: 4.5 is the Mochi-recommended value (not the SD 7–12 range). Going above 6 introduces artifacts.
- Seed: fix it (e.g. 42) when iterating on a prompt so changes are attributable to the prompt, not the seed.
- Length: max 84 frames (~3.5 s @ 24 fps). Longer outputs come from concatenation or stitching, not a single inference.
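For stitching, FFmpeg's concat demuxer is the simplest route once the clips share resolution, frame rate and codec. A minimal sketch with placeholder filenames, where clips.txt lists the segments one per line:
file 'mochi_clip_01.mp4'
file 'mochi_clip_02.mp4'
ffmpeg -f concat -safe 0 -i clips.txt -c copy mochi_stitched.mp4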
Save the output via the Save Video node (h264 or webp). For sharing, post-process with FFmpeg:
ffmpeg -i mochi_out.mp4 -vf "minterpolate=fps=30,scale=1024:-2:flags=lanczos" -c:v libx264 -crf 18 mochi_out_1024.mp4
Performance and benchmarks (April 2026)
| Model | Params | Native res | Length | Min consumer VRAM | License |
|---|---|---|---|---|---|
| Mochi 1 preview | 10B | 480p | ~3.4 s (84 frames) | 12–24 GB | Apache 2.0 |
| HunyuanVideo | 13B | 720p | ~5 s | 24 GB (with offload), 40+ GB ideal | Tencent community license |
| Wan 2.2 (A14B MoE) | 14B (MoE) | 720p @ 24 fps | 5 s I2V / T2V | 16–24 GB (FP8/GGUF) | Apache 2.0 |
| Wan 2.7 (cloud) | n/a (closed weights as of April 2026) | 720p / 1080p | 2–15 s | API only via Alibaba Cloud Model Studio | Commercial API |
| LTX-Video 0.9 | ~700M | 768×512 | 5 s | 8–12 GB | RAIL-M (commercial-permissive) |
| CogVideoX-5B | 5B | 720p | ~6 s | 12 GB | Apache 2.0 |
On the public VBench leaderboard, Wan 2.2 currently leads the open-source field on aggregate score; HunyuanVideo leads on cinematic visual quality; Mochi 1 still ranks well on motion smoothness specifically. None of the open models beats Veo 3.1 or Kling 3.0 on aggregate human-preference benchmarks.
2026 closed-model alternatives worth knowing
- Google Veo 3.1 — true 4K @ 60 fps with synchronized audio; the current human-preference leader for short cinematic clips. API via Google AI Studio / Vertex.
- Kuaishou Kling 3.0 — best-in-class human motion, single-generation clips up to 3 minutes, strongest among Chinese closed models.
- OpenAI Sora 2 — strong physics-aware output, but OpenAI announced Sora 2 will be shut down on April 26 2026. Plan migrations to Veo 3.1 or Kling 3.0.
- Alibaba Wan 2.7 — released March 2026; introduces "Thinking Mode", first/last-frame conditioning, and an editing variant (wan2.7-videoedit). Currently API-only; weights for the open Wan line lag the closed releases by 1–2 versions.
How to choose: quick decision tree
- You need open weights, commercial-permissive, 480p is fine, single 24 GB GPU → Mochi 1 (this guide).
- You need 720p open weights and run Windows + 16–24 GB → Wan 2.2 (FP8 / GGUF) is now the community default.
- You need cinematic 5 s clips and have 24+ GB → HunyuanVideo via ComfyUI.
- You need fast (sub-30 s) iteration on a 12 GB GPU → LTX-Video 0.9.
- You want best-in-class output and don't need local → Veo 3.1 (API) or Kling 3.0.
- You need 1080p, 15 s clips with editing → Wan 2.7 via Alibaba Cloud Model Studio.
If your project is a one-off marketing video and you do not care about training or fine-tuning, the closed APIs will save you days. If you are building a product on top of video generation — e.g. a creator tool, a personalization pipeline, or a research artifact — open weights matter and Mochi / Wan 2.2 / HunyuanVideo are the realistic choices. Teams shipping product-grade pipelines around these models often pair them with strong infra and ML engineers; if that is your bottleneck, our vetted remote developer pool covers Python/Diffusers/ComfyUI experience explicitly. Local model orchestration at the agent layer is covered in our OpenClaw + Ollama setup guide for running local AI agents, which is the natural next step once your video pipeline is wired up.
Cloud GPU options (April 2026 pricing)
If your local card is below 12 GB, renting an hour or two beats fighting offloading.
| Provider | GPU | On-demand price | Notes |
|---|---|---|---|
| RunPod (Community) | RTX 4090 24 GB | ~$0.34/hr | Spot — can be interrupted |
| RunPod (Secure) | RTX 4090 24 GB | ~$0.69/hr | Templates for ComfyUI exist |
| RunPod | A100 80 GB | ~$1.19–1.89/hr | Comfortable for BF16 + long queues |
| RunPod | H100 80 GB | ~$2.39/hr on-demand | Cheaper with a 3-month savings-plan commitment |
| Massed Compute | A6000 48 GB | ~$0.40–0.60/hr | Pre-built ComfyUI image |
Verify rates on the RunPod pricing page before booking — these change quarterly. The typical recipe: deploy a Secure Cloud RTX 4090 with the official ComfyUI template, mount a network volume for your weights so you do not re-download 16 GB on every restart, and shut the pod down between sessions.
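Once the pod is up, a typical first session in its terminal looks like the commands below; /workspace is RunPod's persistent network-volume mount, and the download step mirrors the hf CLI usage shown later in this guide (adjust repo and paths to your setup):
cd /workspace
pip install -U "huggingface_hub[cli]"
hf download Comfy-Org/mochi_preview_repackaged --local-dir /workspace/models/mochi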
Common pitfalls and troubleshooting
CUDA out-of-memory at the VAE decode step
The DiT fits in 24 GB but the VAE decode briefly spikes. Lower VAE tile size to 128 or 64 in the VAE Decode (Tiled) node. If you used the non-tiled VAE Decode node, swap it for the tiled version.
Output is black or noise
Almost always a model-mismatch problem: a CLIP/text-encoder file that does not match the diffusion checkpoint, or a non-Mochi VAE. Re-download the three files from Comfy-Org/mochi_preview_repackaged and verify checksums.
Each step takes 20+ seconds on a 4090
You are most likely on PyTorch's default attention. Update ComfyUI, install sage-attention or flash-attn, and launch with --use-sage-attention or --use-pytorch-cross-attention. Also confirm Windows is running on the dedicated GPU (NVIDIA Control Panel > Manage 3D Settings > python.exe = High-performance NVIDIA processor).
"Torch was not compiled with CUDA enabled"
You ran ComfyUI from a system Python instead of the portable Python under ComfyUI\python_embeded. Use run_nvidia_gpu.bat; it points at the right interpreter. If you must use a venv, install Torch from the official CUDA wheel index (pip install torch --index-url https://download.pytorch.org/whl/cu124).
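To confirm which interpreter you are actually on and whether it sees CUDA, run a one-liner against the portable Python from the portable build's root folder:
.\python_embeded\python.exe -c "import torch; print(torch.__version__, torch.cuda.is_available())"   # should print a +cu12x build and True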
Windows long-path errors during model download
Enable Win32 long paths in Group Policy or the registry, and set git config --system core.longpaths true before cloning. Hugging Face's hf CLI avoids long-path issues entirely:
pip install -U "huggingface_hub[cli]"
hf download Comfy-Org/mochi_preview_repackaged --local-dir D:\models\mochi
What was removed from this guide and why
- The standalone GenmoAI/Mochi-1 repo path. The repo was renamed to genmoai/mochi, and on Windows consumer hardware the standalone path needs ~60 GB VRAM. Use ComfyUI.
- SwarmUI as the primary frontend. SwarmUI now wraps ComfyUI for video models; for Mochi specifically, ComfyUI's example workflow is the fastest path. SwarmUI is still fine if you prefer that UX.
- The "Step 1 typo / Option-2-before-Option-1" section. Fixed in this rewrite.
- CUDA 11.7 / PyTorch 2.0 instructions. Replaced with CUDA 12.4–12.6 and Torch 2.5+ via the ComfyUI portable build.
FAQ
When is Mochi 1 HD (720p) shipping?
Genmo announced an HD model alongside the October 2024 launch. As of April 2026 it has not been publicly released. The current weights remain 480p preview. If you need 720p open weights today, use Wan 2.2 or HunyuanVideo.
Will Mochi 1 run on a 12 GB GPU?
Yes, with offloading and FP8 — but slowly (12–20 min per 3 s clip on an RTX 3060). LTX-Video is a better fit at 12 GB and below.
Can I use Mochi 1 for commercial work?
Yes. The weights are Apache 2.0. Note this applies to the model; you are still responsible for ensuring your prompts and outputs do not infringe third-party rights or violate platform policies.
Does Mochi 1 do image-to-video?
Not officially. The released checkpoint is text-to-video only. Community wrappers like ComfyUI-MochiEdit add editing-style workflows, but for first-class image-to-video on open weights, Wan 2.2-I2V-A14B is the right tool.
What about running Mochi on macOS or AMD?
Apple Silicon support exists through community ports (MPS backend in ComfyUI), but is significantly slower and quality-fragile. AMD on Windows via DirectML is not viable. ROCm on Linux works on RX 7900 XTX-class cards. For a Mac walkthrough see our Run Mochi 1 on macOS guide.
Can I fine-tune a Mochi 1 LoRA on my own footage?
Yes — Genmo ships an official trainer, and there are community ComfyUI training nodes. You will want a 24 GB+ GPU for the training pass, even if inference runs on 12 GB.
Should I just use Wan 2.2 instead?
For most 2026 projects — yes. Wan 2.2 outputs 720p, has stronger benchmark numbers, runs on 16 GB cards via FP8/GGUF, and is also Apache-2.0. The reasons to still pick Mochi 1 are (1) you specifically want its motion characteristics, (2) you have an existing Mochi LoRA, or (3) you are evaluating multiple open models and want apples-to-apples comparisons.
Is Sora 2 still an option?
OpenAI announced Sora 2 will shut down on April 26 2026. Plan migrations to Veo 3.1 or Kling 3.0 if your pipeline depends on a closed API; plan migrations to Wan 2.2 or HunyuanVideo if you can move to open weights.
Related Codersera guides
- Run Mochi 1 on macOS: Step-by-Step Guide
- Run DeepSeek Janus-Pro on Windows: Complete Installation Guide
- Run DeepSeek Janus-Pro on Mac with ComfyUI
- OpenClaw + Ollama setup guide for running local AI agents
References & further reading
- genmo/mochi-1-preview — Hugging Face model card
- genmoai/mochi — official repository (GitHub)
- ComfyUI blog: Run Mochi in ComfyUI with consumer GPU
- ComfyUI examples — Mochi workflow
- VBench Leaderboard (Hugging Face)
- Wan-Video/Wan2.2 — Alibaba's open video model
- RunPod GPU pricing
- r/StableDiffusion — community video-gen discussion