Qwen2.5-Omni 3B is a cutting-edge multimodal AI model built to handle text, image, audio, and video processing tasks. While macOS lacks the CUDA GPU acceleration available on Linux or Windows machines with NVIDIA hardware, it is still possible to run Qwen2.5-Omni 3B locally with some optimization.
This guide walks you through the complete installation process on macOS, with additional tips to improve performance on Apple Silicon and CPU-based systems.
To ensure smooth operation, first confirm that your Mac runs a recent macOS release and has enough memory for a multi-billion-parameter model (a quick check is shown below), then work through the setup steps.
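A quick way to confirm your macOS version, chip, and installed memory from Terminal (assuming the default zsh shell; these are standard macOS commands):
sw_vers -productVersion                          # macOS version
sysctl -n machdep.cpu.brand_string               # chip (e.g. Apple M1/M2/M3)
echo "$(($(sysctl -n hw.memsize) / 1073741824)) GB RAM"   # installed memory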
Install Homebrew by running:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Add it to your shell environment:
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zshrc
source ~/.zshrc
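You can confirm that Homebrew is on your PATH before continuing:
brew --version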
Install Python 3.10 and the core build dependencies:
brew install python@3.10
brew install cmake ffmpeg
Create and activate a virtual environment, then upgrade pip and install PyTorch inside it (packages installed before the environment is activated would not be visible inside it):
python3.10 -m venv qwen-env
source qwen-env/bin/activate
pip install --upgrade pip
pip install torch torchvision torchaudio
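Before going further, it is worth checking that PyTorch imported correctly and can see Apple's Metal (MPS) backend; half-precision on the GPU only helps if this reports True:
python -c "import torch; print(torch.__version__, torch.backends.mps.is_available())"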
Uninstall any existing transformers library and install the custom preview version that supports Qwen2.5-Omni:
pip uninstall transformers -y
pip install git+https://github.com/huggingface/transformers@v4.51.3-Qwen2.5-Omni-preview
pip install accelerate sentencepiece soundfile einops
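A quick import check confirms that the preview transformers build and the other dependencies are visible inside the environment (the exact version string printed for the preview branch may differ from a stable release):
python -c "import transformers, accelerate, soundfile, einops; print(transformers.__version__)"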
Install the qwen-omni-utils toolkit with video decoding support (the quotes keep zsh from interpreting the brackets):
pip install "qwen-omni-utils[decord]"
Note: If the decord installation fails on macOS, use:
pip install qwen-omni-utils
This fallback may be slower for video processing.
Use the huggingface_hub API to download the model:
from huggingface_hub import snapshot_download
snapshot_download(repo_id="Qwen/Qwen2.5-Omni-3B", local_dir="qwen-3b")
Alternatively, download it manually from the Hugging Face Hub.
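If you prefer the command line, the huggingface-cli tool (installed together with the huggingface_hub library) downloads the same snapshot:
huggingface-cli download Qwen/Qwen2.5-Omni-3B --local-dir qwen-3b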
Create an inference.py file with the following code:
import torch
from transformers import Qwen2_5OmniModel, Qwen2_5OmniProcessor
from qwen_omni_utils import process_mm_info
# Load model and processor
model = Qwen2_5OmniModel.from_pretrained("qwen-3b", device_map="auto", torch_dtype=torch.float16)
processor = Qwen2_5OmniProcessor.from_pretrained("qwen-3b")
# Prepare input ([img_path] is just a placeholder string; see the multimodal example below)
inputs = processor("Describe this image: [img_path]", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs)
# Decode and print the generated text
print(processor.decode(outputs[0], skip_special_tokens=True))
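The script above imports process_mm_info but never calls it, and [img_path] is not actually loaded as an image. For real image, audio, or video inputs, the upstream examples build a chat-style message list rather than a raw string; the sketch below follows that pattern, but argument names may differ slightly in the preview branch, so treat it as a starting point rather than a definitive recipe (image.jpg is a placeholder path):
# Describe a local image via the chat template
conversation = [
    {"role": "user", "content": [
        {"type": "image", "image": "image.jpg"},
        {"type": "text", "text": "Describe this image."},
    ]},
]
# Render the prompt and gather the multimodal inputs
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audios, images, videos = process_mm_info(conversation, use_audio_in_video=False)
inputs = processor(text=text, audio=audios, images=images, videos=videos,
                   return_tensors="pt", padding=True).to(model.device)
# return_audio=False keeps the output to text only
text_ids = model.generate(**inputs, return_audio=False)
print(processor.batch_decode(text_ids, skip_special_tokens=True)[0])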
To improve performance on Apple Silicon and CPU-only Macs, keep a few tips in mind.
Use torch_dtype=torch.bfloat16 when your hardware and PyTorch build support it.
Use device_map="auto" to split the workload across the available devices.
For low-RAM machines, try 4-bit quantization:
model = Qwen2_5OmniModel.from_pretrained("qwen-3b", load_in_4bit=True)
Note that load_in_4bit depends on bitsandbytes, which generally requires CUDA and may not work on macOS.
To force the torchvision video backend:
export FORCE_QWENVL_VIDEO_READER=torchvision
If you run into problems: confirm you installed the preview transformers branch (the standard release lacks the Qwen2.5-Omni classes), update torchvision to at least version 0.19.0, and reduce the max_length value passed to generate() if generation is slow or exhausts memory.
As an alternative, you can use Ollama as a managed local LLM runtime:
brew install --cask ollama
ollama pull qwen2.5-omni-3b
⚠️ You may need to configure custom templates for Qwen2.5-Omni compatibility.
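If the pulled model does not respond properly, one option is to wrap it in a custom Modelfile. The sketch below assumes the ChatML-style prompt format used by the Qwen family and an arbitrary model name, my-qwen-omni; adjust the template to whatever the model actually expects:
# Modelfile
FROM qwen2.5-omni-3b
TEMPLATE """<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop "<|im_end|>"
Then build and run it:
ollama create my-qwen-omni -f Modelfile
ollama run my-qwen-omni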
You can also clone and run a custom vLLM fork with Qwen2.5-Omni support:
git clone -b qwen2_omni_public https://github.com/fyabc/vllm.git
cd vllm && pip install -e .
python -m vllm.entrypoints.api_server --model Qwen/Qwen2.5-Omni-3B
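Once the server is running (it listens on port 8000 by default), a quick smoke test against vLLM's simple /generate endpoint looks like this, assuming the fork keeps the standard API server interface:
curl http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Describe Qwen2.5-Omni in one sentence.", "max_tokens": 64}'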
inputs = processor("Speak a welcome message.", voice="Ethan", return_tensors="pt")
audio = model.generate_audio(**inputs)
sf.write("output.wav", audio.numpy(), 16000)
inputs = processor("Summarize this video: [video_url]", return_tensors="pt")
Audio output is written to disk with the soundfile package installed earlier.
Installing and running Qwen2.5-Omni 3B on macOS is entirely feasible with the right configuration, even without powerful GPUs.
By following the steps outlined above—setting up Python environments, installing custom libraries, and managing performance through quantization and precision tuning—you can leverage this powerful multimodal AI model for local experimentation and prototyping.