Microsoft's Phi-4 is an efficient language model that delivers strong reasoning performance from a 14-billion-parameter architecture. Although it was originally designed for Linux environments, this guide gives Windows users detailed steps for running it and for using the image and audio inputs of the Phi-4 multimodal variant.
Set-ExecutionPolicy Bypass -Scope Process -Force
[System.Net.ServicePointManager]::SecurityProtocol = [System.Net.SecurityProtocolType]::Tls12
iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))
Install base dependencies
choco install -y git python310 cuda vcredist2022
mkdir Phi4-Windows
cd Phi4-Windows
python -m venv phi4_env
.\phi4_env\Scripts\activate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install flash-attn --no-build-isolation
pip install transformers accelerate soundfile pillow scipy peft

ollama run vanilj/Phi-4 "Explain quantum computing in simple terms"
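Besides the interactive `ollama run` command, a running Ollama server can also be queried over HTTP. The following is a minimal sketch, assuming the server listens on Ollama's default port 11434 and that the `vanilj/Phi-4` model has already been pulled; the helper names `build_payload` and `generate` are illustrative, not part of any library.

```python
import json
import urllib.request

# Assumption: Ollama is reachable locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt, model="vanilj/Phi-4"):
    """Encode a non-streaming generate request as JSON bytes."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(body).encode("utf-8")


def generate(prompt, model="vanilj/Phi-4"):
    """Send the prompt to a running Ollama server and return the generated text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(prompt, model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Example (needs a running server):
# print(generate("Explain quantum computing in simple terms"))
```

Setting `"stream": False` asks the server for a single JSON object instead of a stream of partial responses, which keeps the client code short.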
docker pull ollama/ollama:latest
docker run -d --gpus all -p 11434:11434 ollama/ollama

import transformers
model_id = "C:\\phi4"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": "auto"},
    device_map="cuda",
)
messages = [
    {"role": "system", "content": "You are a funny teacher trying to make lectures as interesting as possible and you give real-life examples"},
    {"role": "user", "content": "How to explain gravity to high-school students?"},
]
outputs = pipeline(messages, max_new_tokens=128)
print(outputs[0]["generated_text"][-1])
Download the Model:
from huggingface_hub import snapshot_download
snapshot_download(repo_id="microsoft/phi-4", local_dir="C:\\phi4")
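After the download finishes, it can be worth confirming that the target folder really contains a model before pointing `transformers` at it. The sketch below is one possible check, not part of `huggingface_hub`; the function name and the default list of required files (only `config.json`) are assumptions chosen for illustration.

```python
from pathlib import Path


def missing_model_files(model_dir, required=("config.json",)):
    """Return the names in `required` that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


# An empty list means every required file was found.
print(missing_model_files("C:\\phi4"))
```

Extend `required` with whatever tokenizer or weight files your setup depends on.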
Install Additional Libraries:
pip install huggingface-hub
pip install transformers
pip install accelerate
For GPU:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
For CPU:
pip install torch torchvision torchaudio
Create a Virtual Environment:
mkdir phi4
cd phi4
python -m venv venv
venv\Scripts\activate
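To confirm that activation worked, you can ask the interpreter itself. This is a small sketch using only the standard library; the minimum version of 3.10 is an assumption based on the Python package installed earlier in this guide.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def python_version_ok(minimum=(3, 10)):
    """Return True when the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum


print("virtual environment active:", in_virtualenv())
print("Python version:", sys.version.split()[0])
```

If the first line prints `False`, the `pip install` commands that follow would go into the global Python installation instead of the project environment.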
Install CUDA and add the following environment variables:
CUDA_HOME = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8
Path += %CUDA_HOME%\bin; %CUDA_HOME%\libnvvp
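A quick way to verify these variables is to inspect the environment from Python. The helper below is a hypothetical sketch (the name `cuda_env_problems` is made up for this example); it only checks that `CUDA_HOME` is set and that its `bin` and `libnvvp` folders appear on `Path`, not that CUDA itself works.

```python
import os
from pathlib import PureWindowsPath


def cuda_env_problems(env):
    """Return a list of problems found in the CUDA-related variables of env."""
    cuda_home = env.get("CUDA_HOME")
    if not cuda_home:
        return ["CUDA_HOME is not set"]
    raw_path = env.get("Path") or env.get("PATH") or ""
    # Windows paths are case-insensitive, so compare normalized lowercase entries.
    entries = {
        str(PureWindowsPath(item.strip())).lower()
        for item in raw_path.split(";")
        if item.strip()
    }
    problems = []
    for subdir in ("bin", "libnvvp"):
        expected = PureWindowsPath(cuda_home) / subdir
        if str(expected).lower() not in entries:
            problems.append(f"{expected} is missing from Path")
    return problems


print(cuda_env_problems(dict(os.environ)))
```

An empty list means both folders were found. Open a new terminal after editing the variables, since running shells keep their old environment.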
Output:
Feedback: The solution provided is incorrect. The correct first derivative of ln(x^2 + 1) is 2x / (x^2 + 1). Here's the step-by-step reasoning:
1. Apply the chain rule: d/dx [ln(u)] = 1/u * du/dx, where u = x^2 + 1.
2. Compute du/dx: d/dx [x^2 + 1] = 2x.
3. Combine the results: (1 / (x^2 + 1)) * 2x = 2x / (x^2 + 1).
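The derivative in the model's answer can be checked numerically. The sketch below compares the closed form 2x / (x^2 + 1) against a central-difference approximation of ln(x^2 + 1); the sample points and step size are arbitrary choices.

```python
import math


def f(x):
    """The function being differentiated: ln(x^2 + 1)."""
    return math.log(x**2 + 1)


def analytic_derivative(x):
    """Chain-rule result: 2x / (x^2 + 1)."""
    return 2 * x / (x**2 + 1)


def central_difference(func, x, h=1e-6):
    """Approximate func'(x) with a symmetric finite difference."""
    return (func(x + h) - func(x - h)) / (2 * h)


for x in (-2.0, 0.5, 3.0):
    numeric = central_difference(f, x)
    exact = analytic_derivative(x)
    print(f"x={x}: numeric={numeric:.6f}, analytic={exact:.6f}")
```

The two columns should agree to several decimal places, which supports the chain-rule result.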
Output:
{'role': 'assistant', 'content': 'Alright, class, gather around! Today, we\'re diving into the mysterious and mind-bending world of gravity. Now, I know what you\'re thinking: "Gravity? Isn\'t that just why we don\'t float away into space?" Well, yes, but there\'s so much more to it! Let\'s break it down with some real-life examples that\'ll make your heads spin—figuratively, of course, because gravity keeps them attached to your bodies!'}
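The printed value is the whole message dictionary. If you only need the text, you can pull out the `content` field. This is a minimal sketch that assumes the chat-style output shape shown above; `assistant_reply` is an illustrative helper and `sample_outputs` is a stand-in for real pipeline output.

```python
def assistant_reply(outputs):
    """Return the text of the last assistant message from chat-style pipeline output."""
    conversation = outputs[0]["generated_text"]
    last_message = conversation[-1]
    if last_message.get("role") != "assistant":
        raise ValueError("last message was not produced by the assistant")
    return last_message["content"]


# Stand-in data with the same shape as the pipeline output.
sample_outputs = [{"generated_text": [
    {"role": "user", "content": "How to explain gravity to high-school students?"},
    {"role": "assistant", "content": "Alright, class, gather around!"},
]}]
print(assistant_reply(sample_outputs))
```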
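import torch
from transformers import AutoModelForCausalLM
# Load the model in half precision with FlashAttention-2 (requires the flash-attn package)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-4",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.float16,
)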
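The choice of `torch_dtype` largely determines how much memory the weights need. As a rough back-of-the-envelope sketch, the snippet below multiplies the parameter count by the bytes per parameter; it covers the weights only and ignores activations, the KV cache, and framework overhead, so real usage will be higher.

```python
# Bytes needed to store one parameter in each data type.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype):
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("float32", "float16", "int8"):
    print(f"{dtype}: {weight_memory_gb(14e9, dtype):.1f} GB")
```

For a 14-billion-parameter model this gives about 28 GB in float16, so cards with less memory generally need quantized weights or offloading.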
python -m transformers.onnx --model=microsoft/Phi-4 --feature=causal-lm --quantize=avx512_vnni

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    device=0,
    batch_size=4,
    max_new_tokens=512
)
vcredist --all --quiet --norestart

from PIL import Image
import soundfile as sf

# `processor` is assumed to be an already-loaded multimodal processor for the model.
# Image Analysis
image = Image.open("street_view.jpg")
inputs = processor(
    text="<|user|><|image_1|>Describe traffic conditions<|end|><|assistant|>",
    images=image,
    return_tensors="pt"
).to("cuda")

# Audio Transcription
audio, rate = sf.read("meeting_recording.flac")
audio_inputs = processor(
    text="<|user|><|audio_1|>Transcribe and summarize<|end|><|assistant|>",
    audios=[(audio, rate)],
    return_tensors="pt"
).to("cuda")
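Both prompts follow the same pattern: a `<|user|>` tag, numbered media placeholders, the instruction, then `<|end|><|assistant|>`. A small helper can build such strings consistently. This is a sketch only: `build_prompt` is a hypothetical function, and numbering additional images or audio clips as `<|image_2|>`, `<|audio_2|>`, and so on is an assumption extrapolated from the examples above.

```python
def build_prompt(instruction, num_images=0, num_audios=0):
    """Build a single-turn prompt with numbered image and audio placeholders."""
    image_tags = "".join(f"<|image_{i}|>" for i in range(1, num_images + 1))
    audio_tags = "".join(f"<|audio_{i}|>" for i in range(1, num_audios + 1))
    return f"<|user|>{image_tags}{audio_tags}{instruction}<|end|><|assistant|>"


print(build_prompt("Describe traffic conditions", num_images=1))
print(build_prompt("Transcribe and summarize", num_audios=1))
```

The two printed strings match the `text=` arguments used in the image and audio examples.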
Hardware | Tokens/Second | VRAM Usage | Latency |
---|---|---|---|
RTX 3060 12GB | 18.2 | 11.4GB | 550ms |
RTX 3090 24GB | 42.7 | 19.8GB | 230ms |
A100 40GB | 89.1 | 33.2GB | 110ms |
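To put the throughput column in perspective, you can convert tokens per second into wall-clock time for a response of a given length. The sketch below uses the figures from the table and an example length of 512 tokens; it ignores prompt processing and the latency column, so treat the result as a rough lower bound.

```python
# Throughput figures copied from the table above (tokens per second).
THROUGHPUT = {"RTX 3060 12GB": 18.2, "RTX 3090 24GB": 42.7, "A100 40GB": 89.1}


def generation_seconds(num_tokens, tokens_per_second):
    """Time needed to generate num_tokens at a steady decoding rate."""
    return num_tokens / tokens_per_second


for gpu, tps in THROUGHPUT.items():
    print(f"{gpu}: {generation_seconds(512, tps):.1f} s for 512 tokens")
```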
from transformers import AutoModelForCausalLM

# Multi-GPU Setup
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-4",
    device_map="auto",
    max_memory={0: "20GB", 1: "20GB"},
    offload_folder="offload"
)

# DeepSpeed Integration
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2}
}
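The `max_memory` mapping in the multi-GPU setup is keyed by GPU index. When the number of GPUs varies between machines, the mapping can be generated instead of written by hand. The helper below is a hypothetical sketch; the 20GB default mirrors the example above, and the optional `"cpu"` entry caps how much system RAM may be used for offloaded weights.

```python
def build_max_memory(num_gpus, per_gpu="20GB", cpu=None):
    """Build a max_memory mapping: GPU index -> memory cap, plus an optional CPU cap."""
    max_memory = {index: per_gpu for index in range(num_gpus)}
    if cpu is not None:
        max_memory["cpu"] = cpu
    return max_memory


print(build_max_memory(2))
print(build_max_memory(1, per_gpu="10GB", cpu="32GB"))
```

The first call reproduces the two-GPU mapping used in the setup above.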
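from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-4")
# Strip special-token strings from untrusted input so users cannot inject control tokens
sanitized_input = user_input
for token in tokenizer.all_special_tokens:
    sanitized_input = sanitized_input.replace(token, "")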
python -m onnxruntime.transformers.optimizer --input=phi4.onnx --output=phi4_optimized.onnx

import torch_directml  # provided by the torch-directml package
device = torch_directml.device()
Microsoft Phi-4 is a versatile model that excels in complex reasoning tasks. By following the steps outlined above, you can successfully run Phi-4 on Windows and leverage its capabilities for a variety of applications, from educational content creation to solving complex mathematical problems.