Run SkyReels V1 Hunyuan I2V on Ubuntu: Step-by-Step Guide (2026)

Last updated April 2026 — refreshed for current model/tool versions.

SkyReels-V1-Hunyuan-I2V is an open-source image-to-video model from SkyworkAI that produces cinematic, human-centric video from still images on a single consumer GPU. This guide walks through the complete Ubuntu setup — from NVIDIA drivers to running your first generation — and covers where SkyReels V1 fits now that V2, V3, and V4 have been released.

What changed in 2025–2026 (read this first if you set this up before)

  • SkyReels has a full version lineage now. V1 (Feb 2025) → V2 with infinite-length diffusion forcing (Apr 2025) → V3 with multimodal in-context learning (Jan 2026) → V4 with native audio-video synthesis (Feb 2026). V1 is still fully functional and the easiest to run on a single RTX 4090.
  • CUDA 12.2 is no longer the recommended version. SkyReels V3 requires CUDA 12.8+. For V1, CUDA 12.4 or 12.6 works fine and avoids compatibility issues with modern drivers.
  • Python 3.12 is now the recommended baseline for V3 work. V1 still runs on Python 3.10.
  • NVIDIA driver 525 is outdated. Current recommended driver is 550+ (for RTX 4090 on Ubuntu 22.04/24.04).
  • ComfyUI has first-class SkyReels V1 support via Kijai's HunyuanVideo wrapper, a viable alternative to the command-line approach.
  • Wan 2.1 and SkyReels V2 have largely superseded V1 for quality-critical work, but V1 remains the lightest-weight option for RTX 4090 class hardware.

TL;DR

| Item | Value |
| --- | --- |
| Model | SkyReels-V1-Hunyuan-I2V (fine-tuned HunyuanVideo) |
| Released | February 18, 2025 |
| OS | Ubuntu 22.04 LTS or 24.04 LTS |
| GPU | NVIDIA RTX 4090 (24 GB VRAM) recommended minimum for full-res |
| CUDA | 12.4 or 12.6 (12.2 still works; 12.8 for V3+) |
| Python | 3.10 or 3.11 |
| Peak VRAM (4 s, 544×960) | 18.5 GB with --quant --offload |
| Inference time (single RTX 4090) | ~889 s (4 s clip, no multi-GPU) |
| Output resolution | 544×960, 24 fps, up to 97 frames |
| License | Apache 2.0 |

Where SkyReels V1 Fits in 2026

The SkyReels family has expanded significantly since V1 launched. Here is the full version timeline so you can decide which model to actually use:

| Version | Released | Key capability | Min VRAM |
| --- | --- | --- | --- |
| V1 (Hunyuan-I2V) | Feb 2025 | Human-centric I2V, 544p, single GPU | ~18.5 GB (with quant) |
| V2 (14B I2V) | Apr 2025 | Infinite-length via diffusion forcing, 540p/720p | ~43.4 GB (540p 14B) |
| V3 (R2V-14B) | Jan 2026 | Multi-reference, audio-guided, video extension | ~24 GB with --low_vram |
| V4 | Feb 2026 | Native video+audio co-generation, 1080p/32fps | Not yet open-source |

If you have a single RTX 4090 and want image-to-video generation today, V1 is still the practical choice. V2's 14B model at 540p needs ~43 GB VRAM; the 1.3B variant needs only ~14.7 GB and is worth testing if V1's generation time (~15 minutes per 4-second clip) is a bottleneck. V3 is now available with Python 3.12 and CUDA 12.8 but targets multi-reference and audio workflows.

For reference on running broader local AI workflows alongside video generation, the OpenClaw + Ollama setup guide for running local AI agents covers setting up a complete local inference stack — useful when you want an LLM captioning pipeline feeding your video model.

Prerequisites

  • OS: Ubuntu 22.04 LTS or 24.04 LTS (both tested)
  • GPU: NVIDIA RTX 4090 (24 GB VRAM) for 544×960 at 97 frames. RTX 3090 (24 GB) also works with the same flags. Cards below 20 GB VRAM will require reducing resolution or frame count.
  • CUDA Toolkit: 12.4 or 12.6 (CUDA 12.2 works but is end-of-support; avoid 12.8 for V1 unless you also need V3)
  • Python: 3.10 or 3.11
  • Disk: ~30 GB free for model weights + environment
  • RAM: 32 GB system RAM recommended when using --high_cpu_memory

Installation Steps

Step 1: Update Ubuntu

sudo apt update && sudo apt upgrade -y

Step 2: Install NVIDIA Drivers

Use Ubuntu's automatic driver detection rather than hardcoding a version number:

sudo apt install ubuntu-drivers-common
sudo ubuntu-drivers autoinstall

This installs the recommended driver for your GPU. As of April 2026, the recommended driver for RTX 4090 on Ubuntu 22.04/24.04 is typically 550 or higher. Verify after reboot:

nvidia-smi

You should see your GPU listed with the driver version and CUDA version shown.
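
If you want to check the driver and VRAM from a script rather than by eye, the following is a minimal sketch. It assumes the `nvidia-smi --query-gpu` CSV interface is available on your driver; the function names are illustrative and not part of SkyReels.

```python
import subprocess

# Ask nvidia-smi for machine-readable output: one CSV row per GPU.
QUERY = ["nvidia-smi", "--query-gpu=name,driver_version,memory.total",
         "--format=csv,noheader,nounits"]

def parse_gpu_rows(csv_text):
    """Turn nvidia-smi CSV rows into dicts (memory.total is reported in MiB)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, driver, mem_mib = [field.strip() for field in line.split(",")]
        gpus.append({"name": name, "driver": driver,
                     "vram_gib": int(mem_mib) / 1024})
    return gpus

def query_gpus():
    """Run nvidia-smi and return the parsed rows (raises if the tool is missing)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_rows(out)
```

Calling `query_gpus()` after the reboot should return one entry per GPU; a card reporting well under 24 GiB will likely need the reduced settings described in the troubleshooting section.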

Step 3: Install CUDA Toolkit

Download the CUDA 12.6 toolkit from the NVIDIA CUDA Downloads page. Choose: Linux → x86_64 → Ubuntu → your version → deb (network). Example for Ubuntu 22.04:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install cuda-toolkit-12-6

Set environment variables. Add these two lines to your ~/.bashrc:

export PATH=/usr/local/cuda-12.6/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.6/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Then reload the file in your current shell. Run this at the prompt; do not add it to ~/.bashrc itself:

source ~/.bashrc

Verify:

nvcc --version
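
To confirm the toolkit version programmatically, a small sketch like the one below can parse the `nvcc --version` output. It assumes the usual "release X.Y" wording in that output; the helper names and the `RECOMMENDED_FOR_V1` set are illustrative and simply mirror the versions this guide recommends.

```python
import re

# CUDA versions this guide recommends for V1 (12.2 also works but is older).
RECOMMENDED_FOR_V1 = {(12, 4), (12, 6)}

def nvcc_release(version_text):
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    return (int(match.group(1)), int(match.group(2))) if match else None

def is_recommended(version_text):
    """True when the reported toolkit release is one the guide recommends."""
    return nvcc_release(version_text) in RECOMMENDED_FOR_V1
```

Feed it the captured text, for example from `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout`.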

Step 4: Install Python and Create Virtual Environment

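Never install heavy ML dependencies into the system Python. Use a virtual environment, created in your home directory so that the activation commands later in this guide resolve. On Ubuntu 24.04 the system Python is 3.12 and python3.10 is not in the default repositories, so install it from another source (such as the deadsnakes PPA) first.

sudo apt install python3.10 python3.10-venv python3-pip -y
python3.10 -m venv ~/skyreels-env
source ~/skyreels-env/bin/activate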

Step 5: Clone the SkyReels V1 Repository

git clone https://github.com/SkyworkAI/SkyReels-V1
cd SkyReels-V1

Step 6: Install Python Dependencies

pip install -r requirements.txt

The model weights (~13 GB) are downloaded automatically on first run from Hugging Face. If the default cache location is short on disk space, set HF_HOME to a path with enough room. If you are behind a firewall or want to avoid a slow first run, pre-download the weights:

pip install huggingface_hub
python3 -c "from huggingface_hub import snapshot_download; snapshot_download('Skywork/SkyReels-V1-Hunyuan-I2V')"
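
Before starting a multi-gigabyte download it can help to confirm the cache location has room. The sketch below is one way to do that under stated assumptions: it resolves the cache root from HF_HOME (falling back to ~/.cache/huggingface) and compares free space with the ~30 GB figure from the prerequisites. The function names are illustrative.

```python
import os
import shutil

def hf_cache_dir():
    """Cache root used by huggingface_hub: $HF_HOME, else ~/.cache/huggingface."""
    return os.environ.get("HF_HOME", os.path.expanduser("~/.cache/huggingface"))

def free_gb(path):
    """Free space (decimal GB) on the filesystem that holds `path`."""
    path = os.path.abspath(path)
    while not os.path.exists(path):   # the cache dir may not exist yet
        path = os.path.dirname(path)
    return shutil.disk_usage(path).free / 1e9

def enough_space(required_gb=30):
    """True when the cache filesystem has at least `required_gb` free."""
    return free_gb(hf_cache_dir()) >= required_gb
```

If `enough_space()` returns False, point HF_HOME at a larger disk before running the download.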

Running SkyReels V1 Hunyuan I2V

Step 1: Prepare Your Input Image

SkyReels V1 I2V takes a still image as the starting frame plus a text prompt. The model was fine-tuned on human-centric content, so it performs best with images of people, actors, or characters in defined scenes. Provide the image path via --image.

Prompts should begin with "FPS-24," followed by your description. This token signals the desired frame rate to the model's conditioning system.

Step 2: Run the Generation Command

source ~/skyreels-env/bin/activate
cd SkyReels-V1

SkyReelsModel="Skywork/SkyReels-V1-Hunyuan-I2V"
python3 video_generate.py \
  --model_id "${SkyReelsModel}" \
  --task_type i2v \
  --image /path/to/your/input.jpg \
  --guidance_scale 6.0 \
  --height 544 \
  --width 960 \
  --num_frames 97 \
  --prompt "FPS-24, A woman walking through a sunlit forest, cinematic, slow motion" \
  --embedded_guidance_scale 1.0 \
  --quant \
  --offload \
  --high_cpu_memory \
  --parameters_level
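
If you plan to run many generations, wrapping the command in a small helper avoids retyping the flags. The following is a sketch, not part of the SkyReels repository: `build_command` is a hypothetical helper that only assembles the flags shown above and prepends the "FPS-24," prefix when it is missing.

```python
import shlex

def build_command(image, prompt,
                  model_id="Skywork/SkyReels-V1-Hunyuan-I2V",
                  height=544, width=960, num_frames=97, guidance_scale=6.0):
    """Assemble the video_generate.py argument list used in this guide."""
    if not prompt.startswith("FPS-24,"):
        prompt = f"FPS-24, {prompt}"   # frame-rate token expected by the model
    return ["python3", "video_generate.py",
            "--model_id", model_id,
            "--task_type", "i2v",
            "--image", image,
            "--guidance_scale", str(guidance_scale),
            "--height", str(height),
            "--width", str(width),
            "--num_frames", str(num_frames),
            "--prompt", prompt,
            "--embedded_guidance_scale", "1.0",
            "--quant", "--offload", "--high_cpu_memory", "--parameters_level"]

def as_shell(cmd):
    """Render the argument list as a copy-pasteable shell command."""
    return shlex.join(cmd)
```

Pass the list to `subprocess.run(cmd, check=True)` from inside the cloned SkyReels-V1 directory with the virtual environment active. Building a list rather than a string avoids shell-quoting problems with prompts that contain commas or quotes.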

Key Parameters Explained

| Parameter | What it does | Notes |
| --- | --- | --- |
| --task_type i2v | Image-to-video mode | Use t2v for text-only generation |
| --image | Path to input image | Required for i2v tasks |
| --guidance_scale 6.0 | Prompt adherence strength | 4–8 is the effective range; 6 is well-tested |
| --height 544 --width 960 | Output resolution | 544×960 is the model's native resolution |
| --num_frames 97 | Frame count (97 = ~4 s at 24 fps) | Up to 289 frames (~12 s) on single RTX 4090 with optimization |
| --quant | Weight quantization | Reduces VRAM usage; essential on 24 GB cards |
| --offload | CPU offloading | Reduces peak VRAM at cost of speed |
| --high_cpu_memory | Uses system RAM as VRAM overflow | Requires ~32 GB RAM |
| --parameters_level | Parameter-level memory optimization | Use with --offload for best VRAM savings |
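
The frame counts used in this guide (65, 97, 289) all have the form 4k + 1. The sketch below converts between clip duration and frame count on the assumption that the model expects counts of that form, as the HunyuanVideo base model does; treat that constraint as an assumption to verify against the repository rather than a documented rule.

```python
def clip_seconds(num_frames, fps=24):
    """Duration of a clip in seconds at the given frame rate."""
    return num_frames / fps

def nearest_valid_frames(seconds, fps=24):
    """Nearest frame count of the form 4k + 1 (assumed constraint) for a duration."""
    k = max(0, round((seconds * fps - 1) / 4))
    return 4 * k + 1

for target in (2.7, 4, 12):
    frames = nearest_valid_frames(target)
    print(f"{target} s -> {frames} frames ({clip_seconds(frames):.2f} s)")
# 2.7 s -> 65 frames (2.71 s)
# 4 s -> 97 frames (4.04 s)
# 12 s -> 289 frames (12.04 s)
```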

Step 3: Monitor GPU Usage

watch -n 1 nvidia-smi

Peak VRAM for a 97-frame, 544×960 generation with the flags above is approximately 18.5 GB on a single RTX 4090. Inference takes roughly 889 seconds (~15 minutes) on a single GPU.
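
Watching nvidia-smi by eye makes it easy to miss the peak. As a rough sketch, the script below polls the `memory.used` query field and keeps the highest value seen; it assumes nvidia-smi is on PATH, and `track_peak` is an illustrative name.

```python
import subprocess
import time

QUERY = ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"]

def parse_used_mib(csv_text):
    """One integer (MiB in use) per GPU line of nvidia-smi CSV output."""
    return [int(value) for value in csv_text.split()]

def track_peak(seconds=60, interval=1.0):
    """Poll nvidia-smi and return the highest per-GPU usage seen, in GiB."""
    peak = 0
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        peak = max([peak, *parse_used_mib(out)])
        time.sleep(interval)
    return peak / 1024
```

Run `track_peak(seconds=900)` in a second terminal while a generation is in progress; one-second polling can still miss very short spikes.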

Multi-GPU Acceleration

SkyReels V1 supports 1–8 GPU parallelism via xDiT's context, CFG, and VAE parallelism. If you have access to multiple GPUs (cloud or workstation), the inference time drops dramatically:

| Setup | Inference time (97 frames, 544p) |
| --- | --- |
| Single RTX 4090 | 889 s (~15 min) |
| 4× RTX 4090 | 293 s (~5 min) |
| Single A800 80GB | 771 s |
| 4× A800 | 205 s |

These numbers are from the official SkyReels V1 GitHub benchmarks, measured on a 4-second (97 frame) clip at 544p resolution.
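
To put those timings in perspective, the short calculation below derives the speedup and parallel efficiency (speedup divided by GPU count) from the figures quoted above. Nothing here is measured independently.

```python
def scaling(single_gpu_s, multi_gpu_s, gpus):
    """Return (speedup, parallel efficiency) for a multi-GPU run."""
    speedup = single_gpu_s / multi_gpu_s
    return speedup, speedup / gpus

for name, single, multi in (("RTX 4090", 889, 293), ("A800", 771, 205)):
    speedup, efficiency = scaling(single, multi, gpus=4)
    print(f"{name}: {speedup:.2f}x, {efficiency:.0%}")
# RTX 4090: 3.03x, 76%
# A800: 3.76x, 94%
```

On these numbers the A800 setup scales closer to linearly than the RTX 4090 setup, so four consumer cards buy roughly a 3× reduction in wall-clock time rather than 4×.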

ComfyUI Alternative

If you prefer a GUI-based workflow, SkyReels V1 has native ComfyUI support via Kijai's ComfyUI-HunyuanVideoWrapper. Since SkyReels V1 is fine-tuned on HunyuanVideo, the workflow is identical:

  1. Install ComfyUI and the HunyuanVideoWrapper custom node
  2. Download the GGUF-quantized weights from Kijai/SkyReels-V1-Hunyuan_comfy on Hugging Face
  3. Place in ComfyUI/models/diffusion_models/
  4. Use the native SkyReels I2V workflow from Civitai or the wrapper's example workflows

GGUF quantization reduces VRAM further, making 544p generation feasible on cards with as little as 12 GB VRAM (though at reduced quality and slower speed).

Performance and Hardware Guide

Real-world video generation speeds and VRAM usage for 2025-era hardware (source: SkyReels V1 GitHub repo, independent community testing):

| GPU (VRAM) | Feasibility | Notes |
| --- | --- | --- |
| RTX 4090 (24 GB) | Full quality, native flags | ~889 s for 4 s clip; peak 18.5 GB VRAM |
| RTX 3090 (24 GB) | Full quality, same flags | Slower CUDA cores; expect ~1,100–1,200 s |
| RTX 4080 (16 GB) | Works with reduced frames or resolution | Reduce to 544×544 or 65 frames to stay under 16 GB |
| RTX 4070 (12 GB) | GGUF quantized via ComfyUI only | Lower quality; generation 8–12 min for 2–4 s clips |
| A100/H100 (80 GB) | Full quality, no offloading needed | Best for multi-GPU and V2 14B models |

How to Choose: V1 vs V2 vs V3

Use this checklist to pick the right SkyReels version:

  • Single RTX 4090, general I2V (human subjects), fast setup: Use V1. This guide covers it completely.
  • Need longer than 12-second clips or infinite-length autoregressive video: Use V2 (1.3B variant at 14.7 GB VRAM, or 14B on multi-GPU).
  • Multi-reference input (1–4 characters), audio-driven avatars, or video extension: Use V3 (released January 2026, requires Python 3.12 + CUDA 12.8+).
  • Native audio-video co-generation at 1080p/32fps: Wait for V4's open-source release; it is currently in limited preview (as of April 2026).
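  • Prefer GUI over CLI: Use ComfyUI with Kijai's HunyuanVideoWrapper for SkyReels V1 (see ComfyUI Alternative above); check whether a matching wrapper exists before assuming the same nodes work for later versions.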

Common Pitfalls and Troubleshooting

Issue: CUDA Not Found / nvcc command not found

Your PATH and LD_LIBRARY_PATH are not set. Ensure you added the export lines to ~/.bashrc and ran source ~/.bashrc in the same terminal. Check the CUDA version in your path matches what is installed:

ls /usr/local/ | grep cuda

Issue: CUDA out of memory / Insufficient VRAM

Ensure all the memory-saving flags from the generation command are active: --quant --offload --high_cpu_memory --parameters_level. If generation still fails, reduce --num_frames to 65 (for a ~2.7 s clip) or reduce resolution to --height 480 --width 848. Do not attempt 289-frame generation on a single GPU without all optimization flags.
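
To judge how much each fallback is likely to help, the sketch below compares the pixel-frame volume of a setting against the default 544×960×97. This is only a rough proxy: it assumes activation memory grows roughly with height × width × frames, while the model weights take a fixed amount regardless, so actual VRAM savings will be smaller than the ratio suggests.

```python
BASE = (544, 960, 97)   # default height, width, frame count from this guide

def relative_load(height, width, num_frames, base=BASE):
    """Pixel-frame volume of a setting relative to the default (1.0 = default)."""
    base_h, base_w, base_f = base
    return (height * width * num_frames) / (base_h * base_w * base_f)

print(f"{relative_load(544, 960, 65):.2f}")   # fewer frames only    -> 0.67
print(f"{relative_load(480, 848, 97):.2f}")   # lower resolution only -> 0.78
print(f"{relative_load(480, 848, 65):.2f}")   # both fallbacks        -> 0.52
```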

Issue: nvidia-smi not found after driver install

Reboot is required after driver installation:

sudo reboot

Issue: Driver version mismatch with CUDA

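The CUDA 12.6 toolkit is built against NVIDIA driver 560.28.03 and needs that version or newer for full feature support; CUDA 12.x minor-version compatibility lets most applications run on older 12.x-era drivers (525.60.13 and up), which is why a 550-series driver usually works for V1. Running nvidia-smi shows the driver version and the maximum CUDA version it supports. If the CUDA version shown is lower than 12.6 and you hit runtime errors, update your driver: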

sudo apt install nvidia-driver-560
sudo reboot
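
Driver versions compare component by component, not as plain strings or decimals (for example, 550.120 is older than 560.28.03). A minimal sketch of that comparison, using the 560.28.03 threshold quoted above as the default; the function names are illustrative.

```python
def version_tuple(text):
    """Split a dotted driver version such as '560.28.03' into integers."""
    return tuple(int(part) for part in text.strip().split("."))

def driver_at_least(installed, required="560.28.03"):
    """True when the installed driver is the required version or newer."""
    return version_tuple(installed) >= version_tuple(required)

print(driver_at_least("550.120"))     # False
print(driver_at_least("560.35.03"))   # True
```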

Issue: Python package errors or version conflicts

Always activate the virtual environment before installing or running:

source ~/skyreels-env/bin/activate
python --version  # should show 3.10.x or 3.11.x

If a dependency fails, check whether you have the CUDA-enabled build of PyTorch installed. The requirements.txt may default to CPU PyTorch. Install the CUDA-enabled version explicitly:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
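
To check which PyTorch build ended up in the environment, a short sketch like this can be run inside the virtual environment. It relies on `torch.version.cuda` being `None` for CPU-only wheels; `describe_torch` is an illustrative helper name.

```python
import importlib.util

def describe_torch():
    """Report whether PyTorch is installed and which CUDA build it is, if any."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch
    return {
        "installed": True,
        "version": torch.__version__,
        "cuda_build": torch.version.cuda,              # None on CPU-only wheels
        "cuda_available": torch.cuda.is_available(),   # False if the driver is unusable
    }

print(describe_torch())
```

A `cuda_build` of `None` indicates a CPU-only wheel, in which case the reinstall command above is the likely fix. A CUDA build with `cuda_available` set to False points at the driver instead.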

Issue: Model weights download fails or is very slow

Set the Hugging Face cache directory to a location with sufficient space and use the Hub CLI to pre-fetch:

export HF_HOME=/path/to/fast/disk/.cache/huggingface
huggingface-cli download Skywork/SkyReels-V1-Hunyuan-I2V

Issue: Generated video has minimal motion or artifacts

Increase --guidance_scale toward 7–8 for more prompt adherence. Community testing shows motion quality does not approach closed-source results until around 100 inference steps. If you are running fewer steps for speed, expect lower motion fidelity. Higher resolution (closer to 960×960) also improves consistency, but requires multi-GPU or significant VRAM overhead.

What Was Removed and Why

The original version of this post recommended nvidia-driver-525 by hardcoded version number. Driver 525 dates from late 2022 and lacks optimizations for current workloads. It also recommended CUDA 12.2, which has been superseded by 12.4, 12.6, and 12.8. Both version-specific recommendations are replaced above with the ubuntu-drivers autoinstall approach, which always installs the current recommended driver for your hardware. Always verify the installed version with nvidia-smi rather than assuming a version number is current.

FAQ

Can I run SkyReels V1 Hunyuan I2V without an RTX 4090?

Yes. Any GPU with 24 GB VRAM (RTX 3090, RTX 4090) can run the full-resolution model with --quant --offload. GPUs with 16 GB (RTX 4080) work at reduced resolution or frame count. Below 16 GB, use the GGUF-quantized version via ComfyUI with Kijai's wrapper — expect lower quality and longer generation times.

How long does it take to generate a 4-second video?

On a single RTX 4090, approximately 889 seconds (~15 minutes) for a 97-frame, 544×960 clip. On 4× RTX 4090 using multi-GPU parallelism, this drops to ~293 seconds (~5 minutes). Cloud GPU options (RunPod H100, etc.) with 80 GB VRAM remove memory constraints and are faster still.

Is SkyReels V1 still worth using in 2026 when V2, V3, and V4 exist?

For single-GPU users with an RTX 4090 doing human-centric I2V, yes. V1 is the most memory-efficient option at full quality. V2's 14B I2V variant needs ~43 GB VRAM at 540p. V3 adds multimodal capabilities but requires Python 3.12 and CUDA 12.8+. V4 is not yet open-source. V1 remains the practical daily driver for the standard consumer GPU stack.

Does the model require an internet connection to run?

Only for the initial model weight download. Once the ~13 GB model is cached locally (default: ~/.cache/huggingface/), generation runs fully offline. Set HF_HOME to control cache location.
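
If you want to be sure a run never touches the network, huggingface_hub honors the HF_HUB_OFFLINE environment variable. The sketch below builds an environment for a child process with that variable set; `offline_env` is an illustrative helper, and whether every code path in the SkyReels scripts respects it is an assumption to verify.

```python
import os

def offline_env(cache_dir=None):
    """Copy the current environment with Hugging Face Hub access disabled."""
    env = dict(os.environ)
    env["HF_HUB_OFFLINE"] = "1"       # use cached files only, no HTTP calls
    if cache_dir:
        env["HF_HOME"] = cache_dir    # optional: non-default cache location
    return env
```

Pass the result as `env=offline_env()` to `subprocess.run` when launching video_generate.py. If the weights are not fully cached, the run should fail fast instead of starting a download.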

Can I fine-tune SkyReels V1 on my own images?

Yes — the model is Apache 2.0 licensed and the base architecture (HunyuanVideo) has an established fine-tuning community. See the SkyReels V1 GitHub issues and Hugging Face community discussions for fine-tuning recipes. Expect to need 40+ GB VRAM for fine-tuning runs.

What is the difference between SkyReels V1 T2V and I2V?

T2V (text-to-video) generates a video purely from a text prompt. I2V (image-to-video) uses a still image as the starting frame and animates it according to the text prompt. For text-only generation, set --task_type t2v, omit --image, and change --model_id to Skywork/SkyReels-V1-Hunyuan-T2V.

How does SkyReels V1 compare to Wan 2.1 and HunyuanVideo?

SkyReels V1 outperforms base HunyuanVideo on human-centric content (people, actors, characters) due to fine-tuning on 10M+ film and TV clips. Wan 2.1's 14B model at 720p generally produces higher overall quality but requires more VRAM. At 100 inference steps at high resolution, SkyReels V1 I2V has been reported to match or beat closed-source models like Kling and Sora in human-motion fidelity, according to community benchmarks on Hugging Face.

Where do I find more SkyReels workflows and examples?

The official GitHub repository contains the reference inference code. Kijai's Hugging Face repository hosts ComfyUI-compatible checkpoints. Civitai has community-contributed workflows for ComfyUI native support.


References & Further Reading

  1. SkyReels-V1 Official GitHub Repository (SkyworkAI)
  2. SkyReels-V1-Hunyuan-I2V Model Card — Hugging Face
  3. SkyReels-V2 GitHub: Infinite-length Film Generative Model
  4. SkyReels-V3 GitHub: Multimodal Video Generation Model (released Jan 2026)
  5. Kijai/SkyReels-V1-Hunyuan_comfy — ComfyUI-compatible GGUF weights
  6. NVIDIA CUDA Toolkit Downloads (official)
  7. Ubuntu Server: How to Install NVIDIA Drivers (official Ubuntu docs)
  8. SkyReels-V4 Technique Report — arXiv 2602.21818 (Feb 2026)