Qwen2.5-Omni 3B is an advanced multimodal AI model that accepts text, images, audio, and video and generates both text and speech from a single 3-billion-parameter architecture. This guide provides step-by-step instructions for installing Qwen2.5-Omni 3B on Ubuntu, covering three installation methods optimized for GPU usage.
sudo apt update && sudo apt upgrade -y
sudo apt install -y python3-pip git cmake build-essential
wget https://developer.download.nvidia.com/compute/cuda/12.1.1/local_installers/cuda_12.1.1_530.30.02_linux.run
sudo sh cuda_12.1.1_530.30.02_linux.run
Add to .bashrc:
export PATH=/usr/local/cuda-12.1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64:$LD_LIBRARY_PATH
python3 -m venv qwen_env
source qwen_env/bin/activate
pip install --upgrade pip
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
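With PyTorch installed, it's worth confirming that the CUDA build can actually see your GPU before pulling in the remaining dependencies. A minimal check, assuming the virtual environment above is active:

```python
# Sanity check: the cu121 wheel of PyTorch should detect the GPU
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    print("CUDA runtime:", torch.version.cuda)
```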
pip install sentencepiece bitsandbytes protobuf numpy einops timm pillow
pip uninstall -y transformers
pip install git+https://github.com/huggingface/transformers@3a1ead0aabed473eafe527915eea8c197d424356
pip install accelerate soundfile qwen-omni-utils[decord]
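A quick import check confirms the pinned transformers build and qwen-omni-utils installed cleanly. The Qwen2.5-Omni class names below are an assumption based on the upstream model card and may differ between preview commits:

```python
# Verify the preview transformers build and qwen-omni-utils are importable
import transformers
import qwen_omni_utils  # noqa: F401

print("transformers:", transformers.__version__)
# Soft check: the Omni class name has changed between preview builds
print("Omni support:", hasattr(transformers, "Qwen2_5OmniForConditionalGeneration")
      or hasattr(transformers, "Qwen2_5OmniModel"))
```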
git clone -b qwen2_omni_public https://github.com/fyabc/vllm.git
cd vllm
git checkout de8f43fbe9428b14d31ac5ec45d065cd3e5c3ee0
pip install setuptools_scm torchdiffeq resampy x_transformers qwen-omni-utils accelerate
pip install -r requirements/cuda.txt
pip install --upgrade setuptools wheel
pip install .
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen2.5-Omni-3B",
          dtype="bfloat16",
          tensor_parallel_size=2)  # For multi-GPU setups; use 1 on a single GPU
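With the engine constructed, plain-text generation follows the standard vLLM API and makes a good smoke test; multimodal inputs go through the fork's own multi-modal input format, so treat this as a sketch:

```python
# Text-only smoke test of the vLLM engine
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

outputs = llm.generate(["Give a one-sentence overview of multimodal AI."], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```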
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl enable ollama
ollama pull qwen2.5:3b-omni
ollama run qwen2.5:3b-omni
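Ollama also exposes a local REST API on port 11434, so the model can be queried programmatically once it is running. A minimal sketch using `requests`, assuming the `qwen2.5:3b-omni` tag above pulled successfully (library tags change over time):

```python
# Query the local Ollama server (default port 11434); requires `pip install requests`
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen2.5:3b-omni",  # tag used in the pull command above
        "prompt": "Summarize what a multimodal model can do.",
        "stream": False,
    },
    timeout=120,
)
print(resp.json()["response"])
```

For reference, the table below shows roughly how much VRAM inference needs at different precisions and video lengths.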
| Precision | 15s Video | 30s Video | 60s Video |
|---|---|---|---|
| FP32 | 89.10 GB | N/A | N/A |
| BF16 | 18.38 GB | 22.43 GB | 28.22 GB |
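To check how much headroom your GPU has against the figures above, PyTorch can report free and total device memory:

```python
# Compare free GPU memory against the requirements table above
import torch

free_bytes, total_bytes = torch.cuda.mem_get_info()
print(f"Free VRAM:  {free_bytes / 1024**3:.2f} GB")
print(f"Total VRAM: {total_bytes / 1024**3:.2f} GB")
```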
To reduce memory usage, load the model with `attn_implementation="flash_attention_2"` and apply bitsandbytes 4-bit quantization.
# For HTTP/HTTPS support
FORCE_QWENVL_VIDEO_READER=torchvision python script.py
# For local video files
FORCE_QWENVL_VIDEO_READER=decord python script.py
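The same backend selection can be made from inside a script by setting the variable before qwen-omni-utils is imported; this assumes the variable is read when the library loads or processes video, as the commands above imply:

```python
# Select the video reader backend before importing qwen_omni_utils
import os

os.environ["FORCE_QWENVL_VIDEO_READER"] = "decord"  # or "torchvision" for HTTP/HTTPS sources

from qwen_omni_utils import process_mm_info  # import after the variable is set
```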
from transformers import AutoProcessor, Qwen2_5OmniForConditionalGeneration  # older preview builds expose Qwen2_5OmniModel instead
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-Omni-3B")
model = Qwen2_5OmniForConditionalGeneration.from_pretrained("Qwen/Qwen2.5-Omni-3B", torch_dtype="auto", device_map="auto")
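The 4-bit bitsandbytes option mentioned in the memory tips can be applied at load time through `BitsAndBytesConfig`. A sketch, assuming the Omni model class accepts a quantization config like other Transformers models:

```python
# Load the model in 4-bit to cut weight memory roughly to a quarter (small quality trade-off)
import torch
from transformers import AutoProcessor, BitsAndBytesConfig, Qwen2_5OmniForConditionalGeneration

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-Omni-3B")
model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-Omni-3B",
    quantization_config=bnb_config,
    device_map="auto",
)
```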
FROM nvidia/cuda:12.1.1-base-ubuntu22.04
RUN apt-get update && apt-get install -y python3-pip && rm -rf /var/lib/apt/lists/*
RUN pip install vllm qwen-omni-utils[decord]
CMD ["python3", "-m", "vllm.entrypoints.api_server"]
pip install gradio
python -m gradio webapp.py
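`webapp.py` is not shown above; here is a minimal sketch of what it could contain, with `generate_reply` as a placeholder to be wired into whichever backend you installed:

```python
# webapp.py - minimal Gradio front end (illustrative placeholder for the model call)
import gradio as gr

def generate_reply(prompt: str) -> str:
    # Replace with a call into your Qwen2.5-Omni inference code
    return f"(model reply for: {prompt})"

demo = gr.Interface(
    fn=generate_reply,
    inputs=gr.Textbox(label="Prompt"),
    outputs=gr.Textbox(label="Response"),
    title="Qwen2.5-Omni 3B",
)

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", server_port=7860)
```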
CUDA Out of Memory
Reduce `max_new_tokens`, enable 4-bit loading (`--load-in-4bit`), or spread the model across GPUs with `--device-map="balanced"`.
Audio Generation Issues
sudo apt install libsndfile1
pip install soundfile
Video Processing Errors
sudo apt install ffmpeg
pip install av
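After installing ffmpeg and PyAV, a short script confirms a clip actually decodes before handing it to the model:

```python
# Confirm a video file opens and decodes with PyAV
import av

container = av.open("video.mp4")
stream = container.streams.video[0]
print("Codec:", stream.codec_context.name, "| Frames:", stream.frames)
for i, frame in enumerate(container.decode(video=0)):
    if i >= 3:  # a few frames are enough for a smoke test
        break
print("Decoded sample frames OK")
container.close()
```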
| Hardware | Tokens/sec | VRAM Usage |
|---|---|---|
| RTX 3090 (24GB) | 42.1 | 19.8 GB |
| A100 40GB | 78.3 | 22.1 GB |
| Dual RTX 4090 | 135.7 | 28.4 GB |
# Multimodal inference sketch following the upstream chat-template flow;
# exact argument names can differ between preview transformers builds.
from qwen_omni_utils import process_mm_info

conversation = [{"role": "user", "content": [
    {"type": "image", "image": "image.jpg"},
    {"type": "audio", "audio": "audio.wav"},
    {"type": "text", "text": "Describe this image"},
]}]
# For video, use {"type": "video", "video": "video.mp4"} with a prompt
# such as "Summarize the key events" in the text entry.
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audios, images, videos = process_mm_info(conversation, use_audio_in_video=False)
inputs = processor(text=text, audio=audios, images=images, videos=videos,
                   return_tensors="pt", padding=True).to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256, return_audio=False)
response = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
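Because the model can also speak its answers, the generated waveform can be written out with the soundfile dependency installed earlier. A sketch, assuming the talker is enabled and `generate` returns a (text_ids, audio) pair as in the upstream examples; the 24 kHz sample rate follows the model card:

```python
# Request spoken output as well and save it as a WAV file
# (the upstream examples also set a specific system prompt to enable speech output)
import soundfile as sf

text_ids, audio = model.generate(**inputs, max_new_tokens=256, return_audio=True)
sf.write("reply.wav", audio.reshape(-1).detach().cpu().numpy(), samplerate=24000)
```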
Security Patches
sudo unattended-upgrade
Model Updates
pip install --upgrade transformers qwen-omni-utils
Note that upgrading transformers replaces the pinned preview commit installed earlier; only do this once Qwen2.5-Omni support is in a stable release, or re-install the pinned commit afterwards.
Installing and optimizing Qwen2.5-Omni 3B on Ubuntu can seem daunting, but by following the steps outlined in this guide, you’ll be able to take full advantage of its powerful multimodal capabilities.
Whether you choose to go with the Hugging Face Transformers method, the vLLM optimization setup, or the Ollama quick deployment, each option provides a flexible solution tailored to different hardware configurations.