Microsoft Phi-4 is a state-of-the-art multimodal AI model that can process and generate text, understand images, and transcribe or translate audio. Running Phi-4 locally on Ubuntu gives developers, researchers, and enthusiasts a powerful, flexible, and private AI system under their own control.
Before installing Phi-4, make sure your Ubuntu system is up to date, that Python 3 and common build tools are available, and that you have a CUDA-capable NVIDIA GPU with working drivers. Open your terminal and run:
```bash
sudo apt update && sudo apt upgrade -y
sudo apt install python3 python3-pip python3-venv git unzip curl -y
```
Install NVIDIA drivers and CUDA Toolkit if not already present:
```bash
nvidia-smi
nvcc --version
```
If these commands fail, refer to the official NVIDIA documentation to install the latest drivers and CUDA toolkit.
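If you prefer to install from the Ubuntu repositories, one common route is the `ubuntu-drivers` helper plus the packaged CUDA toolkit. Treat this as a sketch only; package names and versions vary by release, and NVIDIA's own instructions should take precedence for your specific GPU:

```bash
# Install the recommended proprietary NVIDIA driver via Ubuntu's driver helper
sudo ubuntu-drivers autoinstall

# Install the CUDA toolkit from the Ubuntu repositories (may lag NVIDIA's releases)
sudo apt install -y nvidia-cuda-toolkit

# Reboot so the new driver loads, then re-run nvidia-smi and nvcc --version
sudo reboot
```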
There are several ways to run Phi-4 on Ubuntu. The best method depends on your use case, technical comfort, and hardware:
| Method | Best For | Ease of Setup | Flexibility | GPU Required |
|---|---|---|---|---|
| Ollama | Quick setup, chat UI | Easiest | Moderate | Yes |
| Python Direct | Custom scripts, research | Moderate | High | Yes |
| vLLM | High performance, APIs | Advanced | High | Yes |
Ollama is a user-friendly platform for running large language models locally. It abstracts much of the complexity, making it ideal for quick deployment and experimentation.
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
This script installs Ollama and its dependencies on your system.
```bash
ollama serve
```
This command starts the Ollama server, allowing you to interact with models.
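With the server running, you can also talk to it through its local REST API (it listens on port 11434 by default). For example, once you have pulled the Phi-4 model in the next step:

```bash
# Send a one-off generation request to the local Ollama server
curl http://localhost:11434/api/generate -d '{
  "model": "vanilj/Phi-4",
  "prompt": "Explain quantum entanglement in simple terms.",
  "stream": false
}'
```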
List available models:
```bash
ollama list
```
Pull the Phi-4 model:
```bash
ollama pull vanilj/Phi-4
```
If you have more RAM available, you can pull the 8-bit quantized build instead:
```bash
ollama pull vanilj/Phi-4-q8_0
```
Start an interactive chat session:
```bash
ollama run vanilj/Phi-4
```
You can now type queries and receive responses directly in your terminal.
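You can also pass a single prompt on the command line instead of starting an interactive session, which is the pattern used in the examples later in this guide:

```bash
ollama run vanilj/Phi-4 -- "Explain quantum entanglement in simple terms."
```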
For a graphical interface, you can use OpenWebUI via Docker:
```bash
docker pull openwebui/openwebui
docker run -d -p 8080:8080 openwebui/openwebui
```
Then connect OpenWebUI to your local Ollama instance.
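If the web UI cannot see your models, it usually needs to be told where the Ollama server is running. Assuming the image honors Open WebUI's `OLLAMA_BASE_URL` environment variable, one way to wire it up is:

```bash
# Point the web UI at the Ollama server running on the host
# (host.docker.internal needs the extra --add-host flag on Linux)
docker run -d -p 8080:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  openwebui/openwebui
```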
For full flexibility, especially for research or custom pipelines, install and run Phi-4 using Python and Hugging Face Transformers.
```bash
python3 -m venv phi4env
source phi4env/bin/activate
```
Create a `requirements.txt` file with the following content:
```text
flash_attn==2.7.4.post1
torch==2.6.0
transformers==4.48.2
accelerate==1.3.0
soundfile==0.13.1
pillow==11.1.0
scipy==1.15.2
torchvision==0.21.0
backoff==2.2.1
peft==0.13.2
huggingface-hub
```
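Install the dependencies into the virtual environment:

```bash
pip install -r requirements.txt
```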
Create a directory for the model:
```bash
mkdir model
```
Download the model from Hugging Face:
bashpip install "huggingface_hub[cli]"
huggingface-cli download microsoft/Phi-4-multimodal-instruct --local-dir ./model
Create a Python script that loads the model and processor:

```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig
from PIL import Image
import soundfile as sf
import io
import requests
from urllib.request import urlopen

model_path = "./model"

# Load the processor (handles text, image, and audio inputs) and the model itself
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
).cuda()

generation_config = GenerationConfig.from_pretrained(model_path)
```
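The script above only loads the model; to actually generate text you build a prompt, tokenize it with the processor, and call `model.generate`. Below is a minimal text-only sketch; the `<|user|>`/`<|end|>`/`<|assistant|>` chat markers follow the format documented on the Phi-4-multimodal-instruct model card, so verify them against the card for the version you downloaded:

```python
# Minimal text-only generation sketch (append to the loading script above)
prompt = "<|user|>Explain quantum entanglement in simple terms.<|end|><|assistant|>"

inputs = processor(text=prompt, return_tensors="pt").to("cuda")

output_ids = model.generate(
    **inputs,
    max_new_tokens=256,
    generation_config=generation_config,
)

# Strip the prompt tokens and decode only the newly generated text
new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```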
Follow the official vLLM installation instructions (typically via pip):
```bash
pip install vllm
```
Download a quantized GGUF build of the model:
```bash
wget https://huggingface.co/microsoft/phi-4-gguf/resolve/main/phi-4-q4.gguf
```
```bash
vllm serve ./phi-4-q4.gguf --tokenizer microsoft/phi-4 --host 0.0.0.0 --port 7000
```
This will start the vLLM API server, accessible on port 7000.
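vLLM exposes an OpenAI-compatible HTTP API, so you can test the server with a simple completion request. In this sketch the model name is assumed to match the path passed to `vllm serve`; adjust it (or set `--served-model-name`) if your server reports a different name:

```bash
# Query the OpenAI-compatible completions endpoint exposed by vLLM
curl http://localhost:7000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "./phi-4-q4.gguf",
    "prompt": "Explain quantum entanglement in simple terms.",
    "max_tokens": 128
  }'
```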
Note: Some users have reported issues with engine process failures and LoRA adapter errors. Ensure all dependencies are compatible and your GPU drivers are up to date. See the Troubleshooting section for more details.
The Python walkthrough above targets the multimodal Phi-4-multimodal-instruct checkpoint. If you only need the text-only Phi-4, a 14-billion-parameter language model optimized for complex reasoning tasks such as mathematical problem-solving, code generation, and natural language understanding, the workflow is the same; here is a condensed run-through.

Update Ubuntu:
```bash
sudo apt update && sudo apt upgrade -y
```

Install Dependencies:
```bash
sudo apt install python3 python3-pip python3-venv git unzip -y
```

Create a Virtual Environment:
```bash
python3 -m venv venv
source venv/bin/activate
```

Install Python Packages: Create a `requirements.txt` file with the following dependencies:
```text
flash_attn==2.7.4.post1
torch==2.6.0
transformers==4.48.2
accelerate==1.3.0
soundfile==0.13.1
pillow==11.1.0
scipy==1.15.2
torchvision==0.21.0
backoff==2.2.1
peft==0.13.2
```

Install the dependencies:
```bash
pip install -r requirements.txt
```

Download the Phi-4 Model:
```bash
mkdir model
pip install "huggingface_hub[cli]"
huggingface-cli download microsoft/Phi-4 --local-dir ./model
```

Run Phi-4: Create a Python script to load and run the model:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the downloaded model and tokenizer
model = AutoModelForCausalLM.from_pretrained('./model')
tokenizer = AutoTokenizer.from_pretrained('./model')

# Tokenize the input prompt
input_text = "What are the applications of quantum computing?"
inputs = tokenizer(input_text, return_tensors='pt')

# Generate and decode the response
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
Example 1: code generation. Ask Phi-4 (via Ollama) to continue a Fibonacci sequence with a Python script.

Steps:
```bash
ollama pull vanilj/Phi-4
ollama run vanilj/Phi-4 -- "Generate a Python script to continue the Fibonacci sequence: 1, 1, 2, 3, 5, 8"
```

Phi-4 responds with a function `fibonacci_sequence` that computes the Fibonacci sequence up to `n` terms and prints the sequence.

Output:
```python
# Python script to continue the Fibonacci sequence
def fibonacci_sequence(n):
    a, b = 1, 1
    sequence = [a, b]
    for _ in range(n - 2):
        a, b = b, a + b
        sequence.append(b)
    return sequence

# Continue the sequence up to 10 terms
fib_sequence = fibonacci_sequence(10)
print(fib_sequence)
```
Example 2: mathematical reasoning. Ask Phi-4 to solve a quadratic equation.

Steps:
```bash
ollama pull vanilj/Phi-4
ollama run vanilj/Phi-4 -- "Solve the quadratic equation 2x^2 + 3x - 2 = 0"
```

Output:
```text
To solve the quadratic equation 2x^2 + 3x - 2 = 0, we use the quadratic formula:

x = (-b ± sqrt(b^2 - 4ac)) / (2a)

Here, a = 2, b = 3, and c = -2.

First, calculate the discriminant (D):
D = b^2 - 4ac
D = 3^2 - 4 * 2 * (-2)
D = 9 + 16
D = 25

Now, calculate the roots:
x1 = (-3 + sqrt(25)) / (2 * 2)
x1 = (-3 + 5) / 4
x1 = 2 / 4
x1 = 0.5

x2 = (-3 - sqrt(25)) / (2 * 2)
x2 = (-3 - 5) / 4
x2 = -8 / 4
x2 = -2

The roots of the equation 2x^2 + 3x - 2 = 0 are x1 = 0.5 and x2 = -2.
```
You can send the same kind of prompt through any of the methods above.

Prompt:
```python
prompt = "Explain quantum entanglement in simple terms."
```

Command (Ollama):
```bash
ollama run vanilj/Phi-4
# Then type your prompt
```
Python:
Use the processor and model as shown in the previous script.
Provide an image and ask for a description or analysis, or provide an audio file and prompt Phi-4 to transcribe and translate it; a sketch of both is shown below.
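This is a minimal sketch only: the `<|image_1|>` and `<|audio_1|>` placeholders, the chat markers, and the `images`/`audios` arguments follow the usage documented on the Phi-4-multimodal-instruct model card, so verify the exact format against the card before relying on it. The image URL and audio filename are placeholders.

```python
# Assumes `processor`, `model`, and `generation_config` from the loading script above
from PIL import Image
import soundfile as sf
import requests

# --- Image: describe a picture (placeholder URL) ---
image = Image.open(requests.get("https://example.com/sample.jpg", stream=True).raw)
prompt = "<|user|><|image_1|>Describe this image in detail.<|end|><|assistant|>"
inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=256, generation_config=generation_config)
print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0])

# --- Audio: transcribe and translate a recording (placeholder file) ---
audio, sample_rate = sf.read("sample.wav")
prompt = "<|user|><|audio_1|>Transcribe this audio, then translate it to English.<|end|><|assistant|>"
inputs = processor(text=prompt, audios=[(audio, sample_rate)], return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=256, generation_config=generation_config)
print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0])
```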
Use `nvidia-smi` to monitor GPU utilization and temperature while the model is running.

Running Microsoft Phi-4 on Ubuntu empowers you with a powerful, flexible, and private AI system capable of advanced language, vision, and audio tasks. Whether you choose the simplicity of Ollama, the flexibility of Python, or the performance of vLLM, Phi-4 can be tailored to your workflow.
By following the steps outlined above, you can deploy and run Microsoft Phi-4 on Ubuntu. Phi-4 excels at mathematical reasoning and outperforms many larger models on complex problems, and its detailed step-by-step solutions make it a powerful tool for students, educators, and professionals alike.
Need expert guidance? Connect with a top Codersera professional today!