Deploying the Qwen2.5-1M model locally on a Windows machine may seem complex due to its advanced features and hardware requirements. This guide provides a detailed, step-by-step approach to setting up Qwen2.5-1M, enabling users to leverage its cutting-edge capabilities in natural language processing and machine learning.
The Qwen2.5-1M model is a powerful language model developed by Alibaba's Qwen team, supporting a context window of up to 1 million tokens. With advanced features like Dual Chunk Attention, Qwen2.5-1M excels at a wide range of NLP and ML tasks. The model comes in two primary configurations:
- Qwen2.5-7B-Instruct-1M
- Qwen2.5-14B-Instruct-1M
Each configuration has significant VRAM requirements (the Qwen team recommends roughly 120 GB of total GPU memory for the 7B model and 320 GB for the 14B model at the full 1M-token context), so it's essential to ensure your system can handle the load for optimal performance. This guide uses the 7B variant throughout.
Before you begin, make sure your system meets the following hardware and software requirements:
- A 64-bit Windows 10 or 11 installation.
- One or more NVIDIA GPUs with CUDA support; the launch command later in this guide assumes four GPUs for tensor parallelism, and the full 1M-token context needs a large amount of total VRAM (see above).
- The NVIDIA CUDA Toolkit (12.x) with a matching driver.
- Python 3 with pip.
- Git.
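To check which GPUs and driver version Windows currently sees, you can run:
nvidia-smi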
CUDA is necessary for utilizing the GPU capabilities of your system. Follow these steps:
- Download the CUDA Toolkit installer for Windows from NVIDIA's developer site (developer.nvidia.com/cuda-downloads).
- Run the installer, keeping the default options unless you have a reason not to.
- Open a fresh terminal so the updated PATH is picked up, then verify the install as shown below.
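To confirm the toolkit is installed and on your PATH:
nvcc --version
The reported release should match the version you downloaded.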
Ensure you have a compatible version of Python:
- Install a recent Python 3 release (3.9 or later) from python.org.
- During installation, tick "Add Python to PATH".
- Verify the interpreter as shown below.
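You can check the version and, optionally, create an isolated virtual environment for everything that follows (the qwen-env name here is just an example):
python --version
python -m venv qwen-env
qwen-env\Scripts\activate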
Git is required to clone repositories. If it's not already installed, follow these steps:
- Download the Windows installer from git-scm.com and run it, keeping the default options.
- Verify the install with the command below.
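A quick check that Git is available on your PATH:
git --version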
Clone the necessary repository and install it in editable mode by running:
git clone -b dev/dual-chunk-attn git@github.com:QwenLM/vllm.git
cd vllm
pip install -e . -v
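The command above clones over SSH, which requires an SSH key registered with GitHub. If you don't have one set up, the HTTPS URL works just as well:
git clone -b dev/dual-chunk-attn https://github.com/QwenLM/vllm.git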
To run Qwen2.5-1M efficiently, install the following dependencies:
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu121
pip install transformers
If you have a different CUDA version installed, replace cu121 with the matching wheel tag (for example, cu118 for CUDA 11.8); pytorch.org lists the published builds.
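Once the installs finish, you can quickly confirm that the PyTorch build can see your GPU:
python -c "import torch; print(torch.cuda.is_available())"
This should print True; if it prints False, recheck the CUDA install and the wheel tag above.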
To configure your system to recognize CUDA, follow these steps:
- Set the CUDA_HOME environment variable to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.X (replace v12.X with your installed CUDA version).
- Add C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.X\bin to your Path variable.
Both can be edited under System Properties > Environment Variables, or from the command line as shown below.
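A minimal command-line sketch, assuming CUDA 12.1 installed at the default location (run from an elevated Command Prompt; note that setx only affects new terminal sessions, not the current one):
setx CUDA_HOME "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1"
setx Path "%Path%;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin"
Because setx can truncate long Path values, editing Path through the System Properties dialog is the safer route on machines with many entries.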
Once the environment is set up, launch the API service with the following command:
vllm serve Qwen/Qwen2.5-7B-Instruct-1M \
--tensor-parallel-size 4 \
--max-model-len 1010000 \
--enable-chunked-prefill --max-num-batched-tokens 131072 \
--enforce-eager --max-num-seqs 1
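Note that the trailing backslashes are Unix shell line continuations; in Windows Command Prompt or PowerShell, run everything on a single line instead:
vllm serve Qwen/Qwen2.5-7B-Instruct-1M --tensor-parallel-size 4 --max-model-len 1010000 --enable-chunked-prefill --max-num-batched-tokens 131072 --enforce-eager --max-num-seqs 1
The --tensor-parallel-size 4 setting assumes four GPUs; set it to however many GPUs your machine actually has.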
To confirm that everything is working, you can test with a simple chat completion request using Python:
from openai import OpenAI

# Point the client at the local vLLM server. vLLM only validates the key if
# the server was started with --api-key, so a placeholder works here.
client = OpenAI(base_url='http://localhost:8000/v1/', api_key='your_api_key')

response = client.chat.completions.create(
    messages=[
        {'role': 'user', 'content': 'Hello! How can I use Qwen?'}
    ],
    model='Qwen/Qwen2.5-7B-Instruct-1M',  # must match the name passed to vllm serve
)
print("Response:", response.choices[0].message.content)
Replace 'your_api_key' with a real key only if you launched the server with the --api-key option; otherwise any placeholder string is accepted.
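The test script depends on the OpenAI Python client, which the earlier steps didn't install; add it with pip if you don't already have it:
pip install openai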
If you encounter VRAM-related errors, try reducing --max-model-len or adjusting --tensor-parallel-size, as in the example below.
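For instance, a lower-memory launch might cap the context well below the full 1M tokens; a sketch, assuming two GPUs and a 256K-token limit (adjust both numbers to fit your hardware):
vllm serve Qwen/Qwen2.5-7B-Instruct-1M --tensor-parallel-size 2 --max-model-len 262144 --enable-chunked-prefill --max-num-batched-tokens 131072 --enforce-eager --max-num-seqs 1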
Ensure the API is running at http://localhost:8000. If you face connection issues, check your firewall settings and confirm the service is active.
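A quick reachability check from another terminal is to ask the server for its model list:
curl http://localhost:8000/v1/models
This should return a JSON payload that includes Qwen/Qwen2.5-7B-Instruct-1M.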
This guide covered the essential steps to deploy Qwen2.5-1M on Windows. With the server running, you can put this powerful model to work on advanced long-context language processing tasks. Keep up to date with future improvements from Alibaba's Qwen team to get the most out of its performance and capabilities.
For those who prefer a macOS setup, refer to our dedicated guide on setting up Qwen2.5-1M on Mac for detailed instructions tailored to Apple devices.