3 min to read
Qwen2.5-Omni 3B is a cutting-edge multimodal AI model developed to handle text, image, audio, and video processing tasks. While macOS doesn't offer the same native GPU acceleration as Linux or Windows systems, it's still possible to run Qwen2.5-Omni 3B locally with some optimization.
This guide walks you through the complete installation process on macOS, with additional tips to improve performance on Apple Silicon and CPU-based systems.
To ensure smooth operation, check the following prerequisites:
Install Homebrew by running:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Add it to your shell environment:
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zshrc
source ~/.zshrc
Install Python 3.10 and core dependencies:
brew install python@3.10
pip install --upgrade pip
brew install cmake ffmpeg
pip install torch torchvision torchaudio
Create and activate a virtual environment:
python -m venv qwen-env
source qwen-env/bin/activate
Uninstall any existing transformers
library and install the custom preview version that supports Qwen2.5-Omni:
pip uninstall transformers -y
pip install git+https://github.com/huggingface/transformers@v4.51.3-Qwen2.5-Omni-preview
pip install accelerate sentencepiece soundfile einops
Install the toolkit with video decoding support:
pip install qwen-omni-utils[decord]
Note: If the decord
installation fails on macOS, use:
pip install qwen-omni-utils
This fallback may be slower for video processing.
Use the huggingface_hub
API to download the model:
from huggingface_hub import snapshot_download
snapshot_download(repo_id="Qwen/Qwen2.5-Omni-3B", local_dir="qwen-3b")
Alternatively, download it manually from the Hugging Face Hub.
Create an inference.py
file with the following code:
import torch
from transformers import Qwen2_5OmniModel, Qwen2_5OmniProcessor
from qwen_omni_utils import process_mm_info
# Load model and processor
model = Qwen2_5OmniModel.from_pretrained("qwen-3b", device_map="auto", torch_dtype=torch.float16)
processor = Qwen2_5OmniProcessor.from_pretrained("qwen-3b")
# Prepare input
inputs = processor("Describe this image: [img_path]", return_tensors="pt").to("cpu")
outputs = model.generate(**inputs)
# Output
print(processor.decode(outputs[0]))
torch_dtype=torch.bfloat16
when supported.device_map="auto"
to split workloads.Quantization for Low RAM:
model = Qwen2_5OmniModel.from_pretrained("qwen-3b", load_in_4bit=True)
Force torchvision backend:
export FORCE_QWENVL_VIDEO_READER=torchvision
transformers
branch.torchvision
to at least version 0.19.0.max_length
value in generate()
.Use Ollama for a managed local LLM runtime:
brew install --cask ollama
ollama pull qwen2.5-omni-3b
⚠️ You may need to configure custom templates for Qwen2.5-Omni compatibility.
Clone and run a custom vLLM fork:
git clone -b qwen2_omni_public https://github.com/fyabc/vllm.git
cd vllm && pip install -e .
python -m vllm.entrypoints.api_server --model Qwen/Qwen2.5-Omni-3B
inputs = processor("Speak a welcome message.", voice="Ethan", return_tensors="pt")
audio = model.generate_audio(**inputs)
sf.write("output.wav", audio.numpy(), 16000)
inputs = processor("Summarize this video: [video_url]", return_tensors="pt")
soundfile
.Installing and running Qwen2.5-Omni 3B on macOS is entirely feasible with the right configuration, even without powerful GPUs.
By following the steps outlined above—setting up Python environments, installing custom libraries, and managing performance through quantization and precision tuning—you can leverage this powerful multimodal AI model for local experimentation and prototyping.
Connect with top remote developers instantly. No commitment, no risk.
Tags
Discover our most popular articles and guides
Running Android emulators on low-end PCs—especially those without Virtualization Technology (VT) or a dedicated graphics card—can be a challenge. Many popular emulators rely on hardware acceleration and virtualization to deliver smooth performance.
The demand for Android emulation has soared as users and developers seek flexible ways to run Android apps and games without a physical device. Online Android emulators, accessible directly through a web browser.
Discover the best free iPhone emulators that work online without downloads. Test iOS apps and games directly in your browser.
Top Android emulators optimized for gaming performance. Run mobile games smoothly on PC with these powerful emulators.
The rapid evolution of large language models (LLMs) has brought forth a new generation of open-source AI models that are more powerful, efficient, and versatile than ever.
ApkOnline is a cloud-based Android emulator that allows users to run Android apps and APK files directly from their web browsers, eliminating the need for physical devices or complex software installations.
Choosing the right Android emulator can transform your experience—whether you're a gamer, developer, or just want to run your favorite mobile apps on a bigger screen.
The rapid evolution of large language models (LLMs) has brought forth a new generation of open-source AI models that are more powerful, efficient, and versatile than ever.