Qwen2.5-Omni 3B is Alibaba Cloud’s compact, multimodal AI model optimized for local deployment on consumer-grade hardware. Unlike the 7B variant, the 3B model significantly reduces VRAM usage—by more than 50%—while maintaining robust performance across text, image, audio, and video tasks.
With real-time output and simultaneous multimodal input support, Qwen2.5-Omni 3B is ideal for building local virtual assistants, media analytics tools, and interactive content engines.
This guide walks you through installing Qwen2.5-Omni 3B on Windows, including dependency management, GPU compatibility, and handling multimodal inputs.
Download Miniconda from the official site, install it, then run:
conda create -n qwen python=3.10 -y
conda activate qwen
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install sentencepiece bitsandbytes protobuf numpy einops timm pillow soundfile
pip uninstall -y transformers
pip install git+https://github.com/huggingface/transformers@v4.51.3-Qwen2.5-Omni-preview
pip install accelerate
pip install qwen-omni-utils[decord]
Note: If installation fails, fall back to:
pip install qwen-omni-utils
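Before moving on, it can help to confirm that the packages above actually landed in the qwen environment. The following is a minimal sketch rather than part of the official setup: the helper name check_packages and the module list are illustrative assumptions, and some import names (PIL, qwen_omni_utils) differ from their pip package names.

```python
import importlib.util

# Import names for the packages installed above (assumed mapping:
# pillow -> PIL, qwen-omni-utils -> qwen_omni_utils).
REQUIRED = [
    "torch", "torchvision", "torchaudio", "transformers", "accelerate",
    "sentencepiece", "einops", "timm", "PIL", "soundfile", "qwen_omni_utils",
]


def check_packages(names):
    """Return {module_name: bool} without actually importing the modules."""
    report = {}
    for name in names:
        try:
            report[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            report[name] = False
    return report


if __name__ == "__main__":
    for name, ok in check_packages(REQUIRED).items():
        print(f"{name:18s} {'ok' if ok else 'MISSING'}")
    try:
        import torch
        # False here usually points to a CPU-only wheel or a driver problem
        print("CUDA available:", torch.cuda.is_available())
    except ImportError:
        print("torch is not installed; skipping the CUDA check")
```

Run it inside the activated qwen environment; any line marked MISSING points to a pip step that needs repeating.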
Install FFmpeg and add C:\path\to\ffmpeg\bin to your system PATH, then verify the installation:
ffmpeg -version
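The same FFmpeg check can be done from Python, which is handy because it tests the PATH that your Python process actually sees. This is a small sketch using only the standard library; the function names are hypothetical helpers, not part of any Qwen tooling.

```python
import shutil
import subprocess


def find_executable(name="ffmpeg"):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


def ffmpeg_version_line(path):
    """Return the first line printed by `<path> -version` (empty if none)."""
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=True
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("ffmpeg not found; re-check the PATH entry and reopen the terminal")
    else:
        print("Found:", path)
        print(ffmpeg_version_line(path))
```

If the script reports that ffmpeg is missing while the command works in another window, the terminal was probably opened before PATH was updated.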
from transformers import Qwen2_5OmniForConditionalGeneration
model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-Omni-3B",
    device_map="auto",
)
Use BF16 and flash attention for lower VRAM usage:
import torch
from transformers import Qwen2_5OmniForConditionalGeneration
model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-Omni-3B",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
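Requesting flash_attention_2 fails at load time if the flash-attn package is not installed, and that package can be awkward to build on Windows. One option is to choose the attention backend at runtime. The sketch below assumes the installed transformers build accepts "sdpa" (PyTorch's built-in scaled-dot-product attention) for this model; pick_attn_implementation is a hypothetical helper name.

```python
import importlib.util


def pick_attn_implementation():
    """Use FlashAttention 2 only when the flash_attn package is importable."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    # Assumed fallback: PyTorch scaled-dot-product attention
    return "sdpa"


if __name__ == "__main__":
    import torch
    from transformers import Qwen2_5OmniForConditionalGeneration

    model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
        "Qwen/Qwen2.5-Omni-3B",
        torch_dtype=torch.bfloat16,
        attn_implementation=pick_attn_implementation(),
        device_map="auto",
    )
```

The fallback trades some of the memory savings for a setup that loads without extra build steps.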
Set FORCE_QWENVL_VIDEO_READER to use the proper backend:
set FORCE_QWENVL_VIDEO_READER=decord
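The set command only affects the current Command Prompt session (in PowerShell the equivalent is $env:FORCE_QWENVL_VIDEO_READER="decord"). If you would rather keep the setting next to your code, the variable can also be set from Python. This is a sketch under two assumptions: that qwen_omni_utils reads the variable when it is imported, so it must be set first, and that "decord" and "torchvision" are the values it recognizes. select_video_backend is a hypothetical helper.

```python
import os

# Assumed set of backend names accepted by qwen_omni_utils
ALLOWED_BACKENDS = {"decord", "torchvision"}


def select_video_backend(name):
    """Set FORCE_QWENVL_VIDEO_READER after validating the backend name."""
    if name not in ALLOWED_BACKENDS:
        raise ValueError(f"unknown video backend: {name!r}")
    os.environ["FORCE_QWENVL_VIDEO_READER"] = name
    return name


select_video_backend("decord")
# Import qwen_omni_utils only after this point so that it sees the variable.
```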
import torch
import soundfile as sf
from transformers import Qwen2_5OmniForConditionalGeneration, Qwen2_5OmniProcessor
from qwen_omni_utils import process_mm_info

# Keep this flag identical everywhere it is passed below
USE_AUDIO_IN_VIDEO = True

# Load model and processor
model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-Omni-3B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = Qwen2_5OmniProcessor.from_pretrained("Qwen/Qwen2.5-Omni-3B")

# Define conversation with video input
conversation = [
    {
        "role": "user",
        "content": [{"type": "video", "video": "https://example.com/sample.mp4"}],
    }
]

# Process inputs (add_generation_prompt appends the assistant turn marker)
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audios, images, videos = process_mm_info(conversation, use_audio_in_video=USE_AUDIO_IN_VIDEO)
inputs = processor(
    text=text,
    audio=audios,
    images=images,
    videos=videos,
    return_tensors="pt",
    padding=True,
    use_audio_in_video=USE_AUDIO_IN_VIDEO,
).to(model.device).to(model.dtype)

# Generate output (text token IDs plus a speech waveform)
text_ids, audio = model.generate(**inputs, use_audio_in_video=USE_AUDIO_IN_VIDEO)
print(processor.batch_decode(text_ids, skip_special_tokens=True)[0])
# Move the waveform to the CPU and flatten it before writing a 24 kHz WAV file
sf.write("output.wav", audio.reshape(-1).detach().cpu().numpy(), samplerate=24000)
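The example above sends a single video, but the conversation format can mix modalities in one user turn. The helper below is a hypothetical convenience function, not part of transformers or qwen_omni_utils; it assumes that image, audio, and text entries follow the same {"type": ..., <type>: ...} pattern as the video entry shown above.

```python
def build_conversation(text=None, image=None, audio=None, video=None):
    """Build a single-turn user conversation from whichever inputs are given."""
    content = []
    if image is not None:
        content.append({"type": "image", "image": image})
    if audio is not None:
        content.append({"type": "audio", "audio": audio})
    if video is not None:
        content.append({"type": "video", "video": video})
    if text is not None:
        content.append({"type": "text", "text": text})
    if not content:
        raise ValueError("provide at least one of text, image, audio, or video")
    return [{"role": "user", "content": content}]


if __name__ == "__main__":
    conversation = build_conversation(
        text="Describe what happens in this clip.",
        video="https://example.com/sample.mp4",
    )
    print(conversation)
```

The returned list can be passed to processor.apply_chat_template and process_mm_info in place of the hand-written conversation.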
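Use bitsandbytes for 4-bit quantization (if supported).
Set use_audio_in_video=False to save memory.
KeyError: 'qwen2_5_omni' usually means the installed transformers build does not include this model; reinstall transformers from the preview branch used in the install steps.
Use decord for HTTP URLs.
Ensure torchvision>=0.19.0 is installed.
Task | Qwen2.5-Omni 3B | Qwen2.5-Omni 7B |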
---|---|---|
15s Video (BF16) | 18.38 GB* | 31.11 GB* |
Text-Only Inference | 6–8 GB | 10–12 GB |
*Values represent minimum theoretical usage with flash attention.
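To relate these figures to your own GPU, a rough pre-flight check can compare available VRAM against the table. The sketch below copies the numbers from the table and uses the upper bound of each range; workloads_that_fit is a hypothetical helper, and because the table values are theoretical minimums, treat a "fit" as optimistic.

```python
# VRAM figures in GB, copied from the table above.
# Ranges (text-only inference) use their upper bound.
VRAM_GB = {
    ("3B", "15s video (BF16)"): 18.38,
    ("7B", "15s video (BF16)"): 31.11,
    ("3B", "text-only"): 8.0,
    ("7B", "text-only"): 12.0,
}


def workloads_that_fit(available_gb):
    """Return the (model, task) pairs whose listed VRAM is within budget."""
    return sorted(key for key, need in VRAM_GB.items() if need <= available_gb)


if __name__ == "__main__":
    try:
        import torch
        total = torch.cuda.get_device_properties(0).total_memory / 1024**3
    except Exception:
        total = 0.0  # no torch or no CUDA device detected
    print(f"Detected VRAM: {total:.1f} GB")
    for model_size, task in workloads_that_fit(total):
        print(f"  {model_size}: {task}")
```

For example, a 24 GB budget covers both 3B rows and 7B text-only inference, but not the 7B video row.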
Qwen2.5-Omni 3B brings advanced multimodal AI capabilities to local setups without requiring massive infrastructure. While setup requires careful attention to dependencies and GPU specs, the model's real-time performance and flexibility make it a powerful tool for researchers and developers alike.