Qwen3-8B is one of the latest large language models (LLMs) from Alibaba's Qwen series, designed for high performance and versatility in a wide range of natural language processing tasks.
Running Qwen3-8B locally on Ubuntu allows developers and researchers to leverage its capabilities without relying on cloud APIs, ensuring data privacy, low latency, and cost efficiency.
Qwen3-8B is an 8-billion parameter transformer model, offering a balance between computational requirements and language understanding. It is available in various formats and can be deployed using multiple frameworks, such as Ollama, vLLM, Hugging Face Transformers, and more.
Before installing and running Qwen3-8B, make sure your system meets some baseline requirements: a recent Ubuntu release (20.04 or later), Python 3.9+, at least 16 GB of system RAM, and ideally an NVIDIA GPU with 16 GB or more of VRAM for full-precision inference. Quantized variants can run with considerably less memory, including on CPU-only machines.
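As a rough sanity check on those numbers, the memory needed just for the weights of an 8-billion-parameter model can be estimated from the bytes per parameter. This is a back-of-the-envelope sketch; real usage adds activations, the KV cache, and runtime overhead on top:

```python
# Approximate weight memory for an 8B-parameter model at common precisions.
# Ignores activations, KV cache, and framework overhead, which add more.
PARAMS = 8_000_000_000

def weight_gb(bytes_per_param: float) -> float:
    """Gigabytes occupied by the raw weights alone."""
    return PARAMS * bytes_per_param / 1e9

fp16 = weight_gb(2.0)   # 16-bit floats: 2 bytes per parameter
int8 = weight_gb(1.0)   # 8-bit quantization
int4 = weight_gb(0.5)   # 4-bit quantization (e.g. GGUF Q4 variants)

print(f"fp16: {fp16:.0f} GB, int8: {int8:.0f} GB, int4: {int4:.0f} GB")
```

This is why 4-bit quantized builds of Qwen3-8B fit comfortably on consumer GPUs or even in system RAM.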
There are several ways to run Qwen3-8B on Ubuntu. The most popular and user-friendly methods are:
Ollama is a streamlined tool for running and managing LLMs locally. It handles model downloads, updates, and provides a simple CLI and API server.
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
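Once installed, you can chat interactively with `ollama run qwen3:8b` (the exact tag depends on how the model is published in the Ollama library; check with `ollama list`). Ollama also serves a local REST API, by default on port 11434. A minimal sketch of the request body for its chat endpoint, assuming the `qwen3:8b` tag:

```python
import json

# Ollama listens on localhost:11434 by default; /api/chat takes a JSON body.
# The "qwen3:8b" tag is an assumption -- verify it with `ollama list`.
payload = {
    "model": "qwen3:8b",
    "messages": [{"role": "user", "content": "Hello, Qwen3-8B!"}],
    "stream": False,
}

body = json.dumps(payload)
print(body)

# To actually send it (requires the Ollama server to be running):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:11434/api/chat",
#     data=body.encode(), headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode())
```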
vLLM is a high-throughput inference engine designed for serving LLMs efficiently, supporting advanced features like tensor parallelism and quantization.
Create and activate a virtual environment, then install vLLM:

```bash
python3 -m venv qwen3_env
source qwen3_env/bin/activate
pip install -U vllm
```

Start an OpenAI-compatible server (pass `--max-model-len` if you need longer prompts):

```bash
vllm serve Qwen/Qwen3-8B
```

The API is then available at http://localhost:8000/v1/. Test it with curl:

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-8B",
    "messages": [{"role": "user", "content": "Hello, Qwen3-8B!"}]
  }'
```
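The server replies with an OpenAI-style chat-completion object. A small sketch of pulling the assistant's reply out of such a response; the sample payload below is illustrative, not actual model output:

```python
import json

# Illustrative response in the OpenAI chat-completions shape that an
# OpenAI-compatible endpoint returns; the content string is made up.
raw = '''{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "model": "Qwen/Qwen3-8B",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hello! How can I help?"},
     "finish_reason": "stop"}
  ]
}'''

response = json.loads(raw)
reply = response["choices"][0]["message"]["content"]
print(reply)
```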
For those who want more control or wish to fine-tune the model, Hugging Face Transformers is the go-to library.
Install the system packages, create and activate a virtual environment, then install PyTorch (here the CUDA 11.8 build) and Transformers:

```bash
sudo apt update && sudo apt upgrade -y
sudo apt install -y python3 python3-pip python3-venv git
python3 -m venv qwen_env
source qwen_env/bin/activate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install transformers
```

Then load the model and run a prompt:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

prompt = "What are the main features of Qwen3-8B?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
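Qwen3 models can emit their reasoning wrapped in `<think>…</think>` tags when thinking mode is enabled, so you may want to separate that from the final answer. A small stdlib sketch; the sample string is illustrative, not actual model output:

```python
import re

# Illustrative generation: Qwen3's thinking mode wraps reasoning in
# <think>...</think> before the final answer. This sample text is made up.
generated = "<think>The user greets me; reply politely.</think>Hello! How can I help?"

def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no think block."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_thinking(generated)
print(answer)
```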
To reduce memory usage, you can load quantized versions of the model, for example with bitsandbytes or in GGUF format.

For lightweight inference, especially on CPUs or with quantized models, llama.cpp is a popular choice.
Download a GGUF build of Qwen3-8B (for example from Hugging Face) and pass it with `-m`:

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
./main -m qwen3-8b.gguf -p "Hello, Qwen3-8B!"
```

Note that recent llama.cpp releases have renamed the `main` binary to `llama-cli`.
Use `nvidia-smi` and `htop` to monitor GPU and CPU usage.

Method | Ease of Use | Performance | Flexibility | Best For |
---|---|---|---|---|
Ollama | ★★★★★ | ★★★★☆ | ★★★☆☆ | Beginners, quick setup |
vLLM | ★★★★☆ | ★★★★★ | ★★★★☆ | Production, high throughput |
Transformers | ★★★☆☆ | ★★★☆☆ | ★★★★★ | Research, customization |
llama.cpp | ★★★★☆ | ★★★☆☆ | ★★★☆☆ | Lightweight, quantized |
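Beyond inference, the model can also be fine-tuned, and LoRA in particular trains only a tiny fraction of the weights. A back-of-the-envelope sketch of the savings; the 4096 hidden size is an illustrative assumption, not a quoted Qwen3-8B figure:

```python
# LoRA replaces a full update of a (d_out x d_in) weight matrix with two
# low-rank factors A (r x d_in) and B (d_out x r), so only
# (d_in + d_out) * r parameters are trained instead of d_out * d_in.
def full_params(d_out: int, d_in: int) -> int:
    return d_out * d_in

def lora_params(d_out: int, d_in: int, r: int) -> int:
    return (d_in + d_out) * r

d = 4096   # illustrative hidden size, not an official Qwen3-8B figure
r = 8      # a commonly used LoRA rank

full = full_params(d, d)
lora = lora_params(d, d, r)
print(f"full: {full:,}  lora: {lora:,}  ratio: {full // lora}x")
```

For a single square 4096-wide projection this trains 256x fewer parameters, which is why LoRA fine-tuning of an 8B model fits on a single GPU.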
For fine-tuning, you can use Hugging Face's `Trainer` class or parameter-efficient methods such as LoRA.

Running Qwen3-8B on Ubuntu is now accessible to anyone with a modern workstation or server. Whether you prefer the simplicity of Ollama, the speed of vLLM, or the flexibility of Hugging Face Transformers, you can deploy this powerful LLM for research, prototyping, or production workloads.