
How to Run DeepSeek Janus Pro 7B on Hugging Face: A Step-by-Step Guide

DeepSeek Janus Pro 7B is a powerful model from DeepSeek capable of advanced text generation. This guide provides a clear, structured approach to running it on Hugging Face so that even beginners can follow along.

Prerequisites

Before starting, ensure you have:

  1. A Hugging Face Account
    • Sign up at huggingface.co if you don’t have one.
  2. Python 3.8 or Higher
  3. Basic Libraries Installed
    • Install transformers, torch, and other dependencies:

pip install transformers torch torchvision torchaudio

Installation

  1. Install Required Packages:

pip install --upgrade transformers torch

  2. Verify Installation:
    Ensure the command completes without errors; a quick import check is shown below.
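A quick import check confirms both packages are installed and prints their versions:

python -c "import torch, transformers; print(transformers.__version__, torch.__version__)"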

Running the Model

Step 1: Load the Model and Tokenizer

Use Hugging Face’s AutoModelForCausalLM and AutoTokenizer to load Janus Pro 7B:

from transformers import AutoModelForCausalLM, AutoTokenizer  

model = AutoModelForCausalLM.from_pretrained("deepseek-ai/Janus-Pro-7B")  
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/Janus-Pro-7B")  
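Note: if from_pretrained raises an error about an unrecognized architecture, Janus Pro may need its publisher-provided loading code; check the model card on Hugging Face for the recommended instructions. Passing trust_remote_code=True (which runs code shipped in the model repository, so review it first) is one common workaround:

model = AutoModelForCausalLM.from_pretrained("deepseek-ai/Janus-Pro-7B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/Janus-Pro-7B", trust_remote_code=True)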

Step 2: Prepare Input Text

Tokenize your input prompt:

input_text = "Describe a futuristic city."  
inputs = tokenizer(input_text, return_tensors="pt")  
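If the model has been placed on a GPU (for example with device_map="auto" or model.to("cuda")), move the tokenized inputs to the same device before generating:

inputs = inputs.to(model.device)  # keep the inputs on the same device as the model weights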

Step 3: Generate Output

Run the model to generate text:

outputs = model.generate(
    **inputs,
    max_new_tokens=200,  # Limit output length
    do_sample=True,      # Enable sampling so temperature takes effect
    temperature=0.7,     # Control randomness (lower = more deterministic)
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  

Example Code

Save this as run_janus_pro.py:

from transformers import AutoModelForCausalLM, AutoTokenizer  

def main():  
    # Load model and tokenizer  
    model = AutoModelForCausalLM.from_pretrained("deepseek-ai/Janus-Pro-7B")  
    tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/Janus-Pro-7B")  

    # Input prompt  
    input_text = "Explain the impact of AI on climate change."  
    inputs = tokenizer(input_text, return_tensors="pt")  

    # Generate response  
    outputs = model.generate(  
        **inputs,  
        max_new_tokens=250,  
        temperature=0.7,  
        do_sample=True  
    )  

    # Decode and print  
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))  

if __name__ == "__main__":  
    main()  

Run the Script:

python run_janus_pro.py  

Key Notes

  • Hardware Requirements:
    • A GPU with at least 16GB of VRAM is recommended for reasonably fast inference (a quick hardware check is shown below).
    • The model can run on a CPU, but generation will be significantly slower.
  • Model Size: roughly 14GB of weights (ensure sufficient disk space).
  • Library Updates: keep transformers up to date:

pip install --upgrade transformers
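A quick check of the hardware PyTorch can see (this only reports availability; it does not change how the model is loaded):

import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))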

Troubleshooting

  1. CUDA Out of Memory:
    • Reduce max_new_tokens or use a smaller batch size.
  2. Model Loading Errors:
    • Ensure you’re connected to the internet for the first-time download.
    • Check Hugging Face’s model card for updates.
  3. Slow Performance:
    • Use device_map="auto" to spread the model across available GPU/CPU resources:

model = AutoModelForCausalLM.from_pretrained("deepseek-ai/Janus-Pro-7B", device_map="auto")

The sections below expand on the basics with additional insights, optimizations, and practical use cases for running DeepSeek Janus Pro 7B on Hugging Face:

Advanced Tips for Running Janus Pro 7B

1. Optimizing Model Performance

Leverage Hugging Face Pipelines:
Simplify inference with the pipeline API:

from transformers import pipeline  

generator = pipeline("text-generation", model="deepseek-ai/Janus-Pro-7B")  
output = generator("Write a poem about the ocean:", max_length=150)  
print(output[0]['generated_text'])  
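On a GPU machine, the pipeline can also place the model automatically; recent transformers versions accept device_map when constructing the pipeline:

generator = pipeline("text-generation", model="deepseek-ai/Janus-Pro-7B", device_map="auto")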

Use 4-Bit Quantization (for GPU-limited systems):
Reduce memory usage by loading the model in 4-bit mode with bitsandbytes:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(  
    load_in_4bit=True,  
    bnb_4bit_use_double_quant=True,  
    bnb_4bit_quant_type="nf4",  
    bnb_4bit_compute_dtype=torch.bfloat16  
)  

model = AutoModelForCausalLM.from_pretrained(  
    "deepseek-ai/Janus-Pro-7B",  
    quantization_config=quantization_config,  
    device_map="auto"  
)  

Install bitsandbytes first:

pip install bitsandbytes  
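To confirm how much memory the quantized weights occupy once loaded, get_memory_footprint() reports the size in bytes:

# Rough size of the loaded (quantized) weights, in gigabytes
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")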

2. Hardware Recommendations

For CPU Inference:
Add device_map="cpu" when loading the model, but expect slower performance:

model = AutoModelForCausalLM.from_pretrained("deepseek-ai/Janus-Pro-7B", device_map="cpu")  

Minimum Requirements:

Component     Requirement
RAM           32GB
GPU           NVIDIA RTX 3090/4090 (24GB VRAM)
Disk Space    30GB

3. Deployment and Integration

Deploy via Hugging Face Inference API (Serverless):
Avoid local setup by using Hugging Face’s hosted API (requires an API token):

import requests  

API_URL = "https://api-inference.huggingface.co/models/deepseek-ai/Janus-Pro-7B"  
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}  

def query(payload):  
    response = requests.post(API_URL, headers=headers, json=payload)  
    return response.json()  

output = query({"inputs": "Explain quantum computing to a 5-year-old."})  
print(output)  
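The hosted endpoint can return an error while the model is still loading or when rate limits apply. A minimal retry sketch with a fixed delay (the retry count and wait time are arbitrary choices), reusing API_URL and headers from above:

import time

def query_with_retry(payload, retries=3, wait_seconds=30):
    # Retry a few times in case the model is not ready yet on the server
    for _ in range(retries):
        response = requests.post(API_URL, headers=headers, json=payload)
        if response.status_code == 200:
            return response.json()
        time.sleep(wait_seconds)
    response.raise_for_status()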

With LangChain:
Use Janus Pro 7B in LangChain workflows for chatbots or document analysis:

from langchain.llms import HuggingFacePipeline  

llm = HuggingFacePipeline.from_model_id(  
    model_id="deepseek-ai/Janus-Pro-7B",  
    task="text-generation",  
    model_kwargs={"temperature": 0.5, "max_length": 200}  
)  
response = llm("Summarize the French Revolution in 3 sentences.")  
print(response)  

4. Fine-Tuning the Model

For domain-specific tasks (e.g., medical or legal text), fine-tune Janus Pro 7B:

Training Script:
Use the Trainer class:

from transformers import TrainingArguments, Trainer  

training_args = TrainingArguments(  
    output_dir="./results",  
    per_device_train_batch_size=4,  
    num_train_epochs=3,  
    learning_rate=5e-5  
)  

trainer = Trainer(  
    model=model,  
    args=training_args,  
    train_dataset=dataset  
)  
trainer.train()  

Prepare a Dataset:
Use a dataset in Hugging Face’s datasets format. Example:

from datasets import load_dataset  

dataset = load_dataset("your_dataset_name")  
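Note that Trainer expects tokenized examples rather than raw strings. A minimal sketch, assuming the dataset has a "text" column (adjust the column name to your data), that tokenizes it and adds a causal-LM data collator before training:

from transformers import DataCollatorForLanguageModeling

# Tokenize the raw text; "text" is an assumed column name
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset["train"].column_names)

# mlm=False produces standard next-token (causal) language-modeling labels
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()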

5. Common Use Cases

Code Generation:
Generate Python snippets (if the model is code-trained):

input_text = "Write a Python function to calculate Fibonacci numbers."  

Technical Explanations:
Simplify complex topics:

input_text = "Explain how blockchain works in simple terms."  

Creative Writing:
Generate stories, poems, or dialogue.

input_text = "Write a sci-fi story about a robot discovering emotions."  

6. Advanced Generation Parameters

Customize outputs with these parameters in model.generate():

Parameter            Effect                                                          Example Value
temperature          Controls randomness (0–1); lower = more deterministic.          0.3
top_k                Limits sampling to the k most likely tokens.                    50
top_p (nucleus)      Samples from the top tokens whose probabilities sum to top_p.   0.9
repetition_penalty   Penalizes repetition to reduce repetitive outputs.              1.2

Example:

outputs = model.generate(
    **inputs,
    max_new_tokens=300,
    do_sample=True,       # sampling must be enabled for temperature/top_k/top_p to apply
    temperature=0.5,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.1
)

7. Handling Long-Form Text

For multi-paragraph outputs, use stopping criteria:

from transformers import StoppingCriteria, StoppingCriteriaList  

class StopAfterParagraph(StoppingCriteria):
    def __call__(self, input_ids, scores, **kwargs):
        # Decode the full sequence generated so far (prompt included) and stop
        # as soon as a blank line (two consecutive newlines) appears.
        decoded_text = tokenizer.decode(input_ids[0])
        return "\n\n" in decoded_text

stopping_criteria = StoppingCriteriaList([StopAfterParagraph()])  

outputs = model.generate(  
    **inputs,  
    stopping_criteria=stopping_criteria  
)  

8. Safety and Ethical Considerations

  • Bias Mitigation:
    Use post-processing tools such as the detoxify library (from Unitary) to filter harmful content; a usage sketch follows the moderation example below.
  • Content Moderation:
    Add a moderation layer to outputs:

from transformers import pipeline  

moderator = pipeline("text-classification", model="unitary/toxic-bert")  
# output_text holds the text generated earlier by the model
if moderator(output_text)[0]['label'] == 'toxic':
    print("Content flagged as inappropriate.")

Final Thoughts

The DeepSeek Janus Pro 7B is a versatile model for both creative and technical tasks. Experiment with parameters, integrate it into workflows, and always validate outputs for accuracy and safety.


Happy coding! 🚀