Unleash Your Creativity
AI Image Editor
Create, edit, and transform images with AI - completely free
3 min to read
Qwen2.5-Omni 3B is an advanced multimodal AI model capable of processing text, image, audio, and video in a single, 3-billion-parameter architecture. This guide provides step-by-step instructions for installing Qwen2.5-Omni 3B on Ubuntu, including three different installation methods optimized for GPU usage.
sudo apt update && sudo apt upgrade -y
sudo apt install -y python3-pip git cmake build-essential
wget https://developer.download.nvidia.com/compute/cuda/12.1.1/local_installers/cuda_12.1.1_530.30.02_linux.run
sudo sh cuda_12.1.1_530.30.02_linux.run
Add to .bashrc
:
export PATH=/usr/local/cuda-12.1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64:$LD_LIBRARY_PATH
python3 -m venv qwen_env
source qwen_env/bin/activate
pip install --upgrade pip
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install sentencepiece bitsandbytes protobuf numpy einops timm pillow
pip uninstall -y transformers
pip install git+https://github.com/huggingface/transformers@3a1ead0aabed473eafe527915eea8c197d424356
pip install accelerate soundfile qwen-omni-utils[decord]
git clone -b qwen2_omni_public https://github.com/fyabc/vllm.git
cd vllm
git checkout de8f43fbe9428b14d31ac5ec45d065cd3e5c3ee0
pip install setuptools_scm torchdiffeq resampy x_transformers qwen-omni-utils accelerate
pip install -r requirements/cuda.txt
pip install --upgrade setuptools wheel
pip install .
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen2.5-Omni-3B",
dtype="bfloat16",
tensor_parallel_size=2) # For multi-GPU setups
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl enable ollama
ollama pull qwen2.5:3b-omni
ollama run qwen2.5:3b-omni
Precision | 15s Video | 30s Video | 60s Video |
---|---|---|---|
FP32 | 89.10 GB | N/A | N/A |
BF16 | 18.38 GB | 22.43 GB | 28.22 GB |
attn_implementation="flash_attention_2"
bitsandbytes
4-bit quantization# For HTTP/HTTPS support
FORCE_QWENVL_VIDEO_READER=torchvision python script.py
# For local video files
FORCE_QWENVL_VIDEO_READER=decord python script.py
from transformers import AutoProcessor, AutoModelForCausalLM
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-Omni-3B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Omni-3B", device_map="auto")
FROM nvidia/cuda:12.1-base
RUN pip install vllm qwen-omni-utils[decord]
CMD ["python3", "-m", "vllm.entrypoints.api_server"]
pip install gradio
python -m gradio webapp.py
CUDA Out of Memory
max_new_tokens
--load-in-4bit
--device-map="balanced"
Audio Generation Issues
sudo apt install libsndfile1
pip install soundfile
Video Processing Errors
sudo apt install ffmpeg
pip install av
Hardware | Tokens/sec | VRAM Usage |
---|---|---|
RTX 3090 (24GB) | 42.1 | 19.8 GB |
A100 40GB | 78.3 | 22.1 GB |
Dual RTX 4090 | 135.7 | 28.4 GB |
response = model.generate(
input_text="Describe this image",
input_images=["image.jpg"],
audio_prompt="audio.wav"
)
video_summary = model.process_video(
"video.mp4",
prompt="Summarize the key events"
)
Security Patches
sudo unattended-upgrade
Model Updates
pip install --upgrade transformers qwen-omni-utils
Installing and optimizing Qwen2.5-Omni 3B on Ubuntu can seem daunting, but by following the steps outlined in this guide, you’ll be able to take full advantage of its powerful multimodal capabilities.
Whether you choose to go with the Hugging Face Transformers method, the vLLM optimization setup, or the Ollama quick deployment, each option provides a flexible solution tailored to different hardware configurations.
Need expert guidance? Connect with a top Codersera professional today!