Connect with OneDrive
High Quality Video Sharing
Store & share your recordings seamlessly with OneDrive integration
4 min to read
Microsoft's Phi-4 represents a breakthrough in efficient language models, offering state-of-the-art reasoning capabilities with its 14-billion parameter architecture. While originally designed for Linux environments, this guide provides detailed methodologies for Windows users to harness its multimodal capabilities.
Set-ExecutionPolicy Bypass -Scope Process -Force
[System.Net.ServicePointManager]::SecurityProtocol = [System.Net.SecurityProtocolType]::Tls12
iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))
Install base dependencies
choco install -y git python310 cuda vcredist2022
mkdir Phi4-Windows && cd
Phi4-Windowspython -
m venv phi4_env.
\phi4_env\Scripts\activatepip install
torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121pip install
flash-attn --no-build-isolationpip install
transformers accelerate soundfile pillow scipy peftollama run vanilj/Phi-4 "Explain quantum computing in simple terms"
docker
pull ollama/ollama:latestdocker run -d --gpus all -p 11434
:11434 ollama/ollamaimport transformers
model_id = "C:\\phi4"
pipeline = transformers.pipeline(
"text-generation",
model=model_id,
model_kwargs={"torch_dtype": "auto"},
device_map="cuda",
)
messages = [
{"role": "system", "content": "You are a funny teacher trying to make lectures as interesting as possible and you give real-life examples"},
{"role": "user", "content": "How to explain gravity to high-school students?"},
]
outputs = pipeline(messages, max_new_tokens=128)
print(outputs[0]["generated_text"][-1])
Download the Model:PythonCopy
from huggingface_hub import snapshot_download
snapshot_download(repo_id="microsoft/phi-4", local_dir="C:\\phi4")
Install Additional Libraries:bashCopy
pip install huggingface-hub
pip install transformers
pip install accelerate
For GPU:bashCopy
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
For CPU:bashCopy
pip install torch torchvision torchaudio
Create a Virtual Environment:bashCopy
mkdir phi4
cd phi4
python -m venv venv
venv\Scripts\activate
Install CUDA and add the following environment variables:bashCopy
CUDA_HOME = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8
Path += %CUDA_HOME%\bin; %CUDA_HOME%\libnvvp
Output:plaintextCopy
Feedback: The solution provided is incorrect. The correct first derivative of ln(x^2 + 1) is 2x / (x^2 + 1). Here's the step-by-step reasoning:
1. Apply the chain rule: d/dx [ln(u)] = 1/u * du/dx, where u = x^2 + 1.
2. Compute du/dx: d/dx [x^2 + 1] = 2x.
3. Combine the results: (1 / (x^2 + 1)) * 2x = 2x / (x^2 + 1).
Output:plaintextCopy
{'role': 'assistant', 'content': 'Alright, class, gather around! Today, we\'re diving into the mysterious and mind-bending world of gravity. Now, I know what you\'re thinking: "Gravity? Isn\'t that just why we don\'t float away into space?" Well, yes, but there\'s so much more to it! Let\'s break it down with some real-life examples that\'ll make your heads spin—figuratively, of course, because gravity keeps them attached to your bodies!'}
model = AutoModelForCausalLM.from_pretrained(
attn_implementation="flash_attention_2",
torch_dtype=torch.
float16)
python -m transformers.onnx --model=microsoft/Phi-4 --feature=causal-lm --quantize=
avx512_vnnipipeline = transformers.pipeline(
"text-generation",
model=model,
device=0,
batch_size=4,
max_new_tokens=512
)
vcredist --all --quiet --
norestartpython# Image Analysis
image = Image.open("street_view.jpg")
inputs = processor(
text="<|user|><|image_1|>Describe traffic conditions<|end|><|assistant|>",
images=image,
return_tensors="pt"
).to("cuda")
# Audio Transcription
audio, rate = sf.read("meeting_recording.flac")
audio_inputs = processor(
text="<|user|><|audio_1|>Transcribe and summarize<|end|><|assistant|>",
audios=[(audio, rate)],
return_tensors="pt"
).to("cuda")
Hardware | Tokens/Second | VRAM Usage | Latency |
---|---|---|---|
RTX 3060 12GB | 18.2 | 11.4GB | 550ms |
RTX 3090 24GB | 42.7 | 19.8GB | 230ms |
A100 40GB | 89.1 | 33.2GB | 110ms |
python# Multi-GPU Setup
model = AutoModelForCausalLM.from_pretrained(
"microsoft/Phi-4",
device_map="auto",
max_memory={0:"20GB",1:"20GB"},
offload_folder="offload"
)
# DeepSpeed Integration
ds_config = {
"train_batch_size": 8,
"fp16": {"enabled": True},
"zero_optimization": {"stage": 2}
}
from transformers import
AutoTokenizertokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-4")
sanitized_input = tokenizer.sanitize_special_tokens(user_input)
python -m onnxruntime.transformers.optimizer --input=phi4.onnx --output=
phi4_optimized.onnxtorch.backends.directml.enabled(True)
device = torch.directml.device()
Microsoft Phi-4 is a versatile model that excels in complex reasoning tasks. By following the steps outlined above, you can successfully run Phi-4 on Windows and leverage its capabilities for a variety of applications, from educational content creation to solving complex mathematical problems.
Need expert guidance? Connect with a top Codersera professional today!