Codersera

Run Teapot LLM on Windows: Step by Step Installation Guide

Teapot LLM is an open-source language model with approximately 800 million parameters, fine-tuned on synthetic data and optimized to run locally on resource-constrained devices such as smartphones and CPUs.

Developed by the community, Teapot LLM is designed to perform a variety of tasks, including hallucination-resistant Question Answering (QnA), Retrieval-Augmented Generation (RAG), and JSON extraction.

Key Features

  • Hallucination Resistance: Teapot LLM is trained to only answer questions using context from provided documents, reducing the likelihood of generating inaccurate or irrelevant responses.
  • Retrieval-Augmented Generation: The model can determine which documents are relevant before answering a question, ensuring responses are based on the most pertinent information.
  • Information Extraction: Teapot LLM can extract structured information from context using predefined JSON structures, making it useful for parsing documents.

Training Details

Teapot LLM is fine-tuned from flan-t5-large on a synthetic dataset of LLM tasks generated using DeepSeek-V3. The training process involves:

  • Dataset: A ~10MB synthetic dataset consisting of QnA pairs with a variety of task-specific formats.
  • Methodology: The model is trained to mimic task-specific output formats and is scored based on its ability to output relevant, succinct, and verifiable answers.
  • Hardware: Trained for approximately 10 hours on an A100 GPU provided by Google Colab.
  • Hyperparameters: Various learning rates were used, and the model was monitored to ensure task-specific performance without catastrophic forgetting.

System Requirements

Before installing Teapot LLM, ensure your system meets the following requirements:

Hardware

  • CPU: A modern multi-core processor (Intel i5/i7 or AMD Ryzen recommended).
  • GPU: NVIDIA RTX GPU with at least 8 GB VRAM for optimal performance (optional for CPU-only inference).
  • RAM: Minimum 16 GB; 32 GB or more recommended for larger models.
  • Storage: SSD with at least 100 GB free space for model files and dependencies.
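As a rough sanity check on these figures, you can estimate the model's weight memory directly from its parameter count; an ~800-million-parameter model needs only a few gigabytes of weights even in full precision. The sketch below is a back-of-the-envelope estimate only, since activations and runtime overhead add to the real footprint:

```python
# Back-of-the-envelope estimate of weight memory for an ~800M-parameter model.
# Real memory usage is higher: activations, caches, and runtime overhead add to this.
params = 800_000_000

bytes_per_param = {"fp32": 4, "fp16": 2, "int8": 1}

for dtype, nbytes in bytes_per_param.items():
    gb = params * nbytes / 1e9
    print(f"{dtype}: ~{gb:.1f} GB")
# fp32: ~3.2 GB, fp16: ~1.6 GB, int8: ~0.8 GB
```

This is why the 16 GB RAM recommendation is comfortable for Teapot LLM itself; the larger storage figure mainly covers dependencies, datasets, and any additional models you download.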

Software

  • Operating System: Windows 10 or later.
  • Python: Version 3.10 or higher.
  • CUDA Toolkit: Version 12.8 or higher (for GPU acceleration).
  • Docker (Optional): For containerized setups.

Installation Methods

1. Using Docker Containers

Docker simplifies the setup process by bundling dependencies into containers.

  1. Install Docker Desktop for Windows.
  2. Create directories to store model files and configurations:

mkdir ollama-files open-webui-files

  3. Pull and run the Teapot LLM Docker image:

docker run -d -p 4000:8080 -v /path/to/ollama-files:/root/.ollama -v /path/to/open-webui-files:/app/backend/data --name teapot-webui --restart always ghcr.io/open-webui/open-webui:teapot

  4. Access the web interface:
    Open your browser and navigate to http://localhost:4000 to interact with the model.

2. Native Installation

For users who prefer a direct installation without containers:

  1. Install Python:
    • Download Python from the official website.
    • During installation, check "Add Python to PATH."
  2. Install the CUDA Toolkit (if using a GPU):
    Follow NVIDIA's installation guide for your GPU and driver version.
  3. Clone the Teapot repository:

git clone https://github.com/teapot-ai/teapot.git
cd teapot

  4. Install dependencies using PowerShell:

./setup_env.ps1

  5. Run the model:

python main.py --model teapot --port 8080

3. Using Llamafile

Llamafile simplifies running LLMs by bundling them into single executables.

  1. Download the Teapot Llamafile Executable:
    Obtain it from the official release page.
  2. Launch the Application:
    Double-click the .exe file to start the application.
  3. Interact with the Model:
    Use the provided web interface or command-line prompts to work with Teapot LLM.

Getting Started

To use Teapot LLM, you can leverage the teapotai library, which simplifies model integration into production environments. Here’s a basic example of using Teapot LLM for general question answering:

from teapotai import TeapotAI

# Sample context
context = """
The Eiffel Tower is a wrought iron lattice tower in Paris, France. It was designed by Gustave Eiffel and completed in 1889.
It stands at a height of 330 meters and is one of the most recognizable structures in the world.
"""

teapot_ai = TeapotAI()

answer = teapot_ai.query(
    query="What is the height of the Eiffel Tower?",
    context=context
)
print(answer)  # Output: "The Eiffel Tower stands at a height of 330 meters."

For more advanced use cases, such as Retrieval-Augmented Generation, Teapot LLM can be used with multiple documents to answer questions based on the most relevant information.

Optimization Techniques

1. GPU Acceleration

Leverage NVIDIA TensorRT for faster inference:

  • Install TensorRT: Follow NVIDIA's guidelines for installation.

Configure Teapot to Use GPU:

python main.py --model teapot --gpu

2. Quantization

Reduce model size by quantizing weights (e.g., converting to INT8). This process can greatly improve performance on machines with limited resources while maintaining acceptable accuracy.
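To make the idea concrete, here is a minimal, self-contained sketch of symmetric INT8 quantization. This is toy code for illustration only; real toolkits quantize per-tensor or per-channel and use calibration data to choose scales:

```python
# Toy illustration of symmetric INT8 quantization: map floats into [-127, 127]
# with a single scale factor, then map back. Production quantizers work
# per-layer and calibrate scales on real data.

def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

weights = [0.52, -1.27, 0.03, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)         # integers in [-127, 127], stored in 1 byte each
print(restored)  # close to the original weights
```

Each weight shrinks from 4 bytes (FP32) to 1 byte (INT8) at the cost of a small, bounded rounding error per value.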

3. Batch Processing

Increase batch sizes for tasks like text generation to improve throughput and overall efficiency.
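As a simple illustration of the batching idea, the sketch below groups incoming prompts into fixed-size batches before they would be handed to the model. The `batched` helper here is illustrative; actual batch support depends on the serving stack you use:

```python
# Group prompts into fixed-size batches so the model can process several
# inputs per forward pass instead of one at a time.

def batched(items, batch_size):
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

prompts = [f"question {n}" for n in range(5)]
for batch in batched(prompts, batch_size=2):
    # In a real pipeline, each batch would be sent to the model together.
    print(batch)
```

Larger batches amortize per-call overhead and keep the hardware busier, at the cost of higher latency for any single prompt.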

Practical Coding Examples of Teapot LLM

Example 1: General Question Answering (QnA)

In this example, we showcase how to use Teapot LLM to answer questions based on a provided context. The model is optimized for conversational responses and is trained to avoid answering questions beyond the given context, thereby reducing hallucinations.

from teapotai import TeapotAI

# Sample context about the Eiffel Tower
context = """
The Eiffel Tower is a wrought iron lattice tower in Paris, France. It was designed by Gustave Eiffel and completed in 1889.
It stands at a height of 330 meters and is one of the most recognizable structures in the world.
"""

# Initialize TeapotAI
teapot_ai = TeapotAI()

# Get the answer using the provided context
answer = teapot_ai.query(
    query="What is the height of the Eiffel Tower?",
    context=context
)
print(answer)  # Expected Output: "The Eiffel Tower stands at a height of 330 meters."

# Example demonstrating hallucination resistance:
context_without_height = """
The Eiffel Tower is a wrought iron lattice tower in Paris, France. It was designed by Gustave Eiffel and completed in 1889.
"""

answer = teapot_ai.query(
    query="What is the height of the Eiffel Tower?",
    context=context_without_height
)
print(answer)  # Expected Output: "I don't have information on the height of the Eiffel Tower."

Example 2: Chat with Retrieval-Augmented Generation (RAG)

This example illustrates how to use Teapot LLM with Retrieval-Augmented Generation (RAG) to automatically select the most relevant documents before generating an answer. This approach is particularly useful when you have multiple documents and need the model to extract the most pertinent information.

from teapotai import TeapotAI

# Sample documents about various famous landmarks
documents = [
    "The Eiffel Tower is located in Paris, France. It was built in 1889 and stands 330 meters tall.",
    "The Great Wall of China is a historic fortification that stretches over 13,000 miles.",
    "The Amazon Rainforest is the largest tropical rainforest in the world, covering over 5.5 million square kilometers.",
    "The Grand Canyon is a natural landmark located in Arizona, USA, carved by the Colorado River.",
    "Mount Everest is the tallest mountain on Earth, located in the Himalayas along the border between Nepal and China.",
    "The Colosseum in Rome, Italy, is an ancient amphitheater known for its gladiator battles.",
    "The Sahara Desert is the largest hot desert in the world, located in North Africa.",
    "The Nile River is the longest river in the world, flowing through northeastern Africa.",
    "The Empire State Building is an iconic skyscraper in New York City that was completed in 1931 and stands at 1454 feet tall."
]

# Initialize TeapotAI with documents for RAG
teapot_ai = TeapotAI(documents=documents)

# Start a chat session with a retrieval prompt
answer = teapot_ai.chat([
    {
        "role": "system",
        "content": "You are an agent designed to answer facts about famous landmarks."
    },
    {
        "role": "user",
        "content": "What landmark was constructed in the 1800s?"
    }
])
print(answer)  # Expected Output: "The Eiffel Tower was constructed in the 1800s."
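Under the hood, RAG-style document selection boils down to scoring every document against the query and keeping the best matches. The sketch below uses a simple bag-of-words cosine similarity purely for illustration; TeapotAI's actual retrieval uses learned embeddings, so treat this as a conceptual stand-in, not its real implementation:

```python
# Conceptual sketch of RAG document selection: score each document against
# the query with bag-of-words cosine similarity and return the best match.
# Real RAG pipelines use dense embeddings instead of raw word counts.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    num = sum(a[t] * b[t] for t in a.keys() & b.keys())
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def most_relevant(query: str, docs: list[str]) -> str:
    qv = Counter(query.lower().split())
    return max(docs, key=lambda d: cosine(qv, Counter(d.lower().split())))

docs = [
    "The Eiffel Tower is located in Paris and stands 330 meters tall.",
    "The Sahara Desert is the largest hot desert in the world.",
]
print(most_relevant("how tall is the eiffel tower", docs))
```

Only the highest-scoring documents are passed to the model as context, which is what keeps RAG answers grounded in the most pertinent source material.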

Additional Tips and Best Practices

Using a Virtual Environment

Creating a virtual environment is a best practice to manage project dependencies effectively. Use the following commands to set up a virtual environment:

python -m venv teapot-env
teapot-env\Scripts\activate  # On macOS/Linux use: source teapot-env/bin/activate

Keeping TeapotAI Updated

Always ensure you have the latest version of TeapotAI to take advantage of new features and improvements:

pip install --upgrade teapotai

Saving and Loading Models with Precomputed Embeddings

To reduce loading times, you can save a TeapotAI instance with precomputed embeddings using Python’s pickle module:

import pickle

# Save the TeapotAI model to a file
with open("teapot_ai.pkl", "wb") as f:
    pickle.dump(teapot_ai, f)

# Load the saved TeapotAI model
with open("teapot_ai.pkl", "rb") as f:
    loaded_teapot_ai = pickle.load(f)

# Verify the loaded model works as expected
print(len(loaded_teapot_ai.documents))  # Expected Output: Number of documents, e.g., 9
loaded_teapot_ai.query("What city is the Eiffel Tower in?")  # Expected Output: "The Eiffel Tower is located in Paris, France."

Applications

Teapot LLM is particularly useful for:

  • Conversational QnA: Providing friendly, conversational answers using context and documents as references.
  • Document Parsing: Efficiently extracting information from documents in various formats.
  • Educational Tools: Assisting in teaching core computer science subjects by generating examples and visualizing step-by-step logic.

Troubleshooting Common Issues

  1. Docker Container Not Starting: Ensure Docker Desktop is running and properly configured.
  2. Python Path Errors: Verify that Python is correctly added to the system PATH.
  3. Insufficient VRAM: Switch to CPU inference if your GPU resources are inadequate.

Limitations

While Teapot LLM excels in question answering and information extraction, it is not intended for code generation, creative writing, or critical decision-making applications. Additionally, Teapot LLM has been trained primarily on English and may not perform well in other languages.

Conclusion

Running Teapot LLM locally on Windows offers unparalleled flexibility, enhanced privacy, and significant cost savings for developers and AI enthusiasts alike. Whether you choose Docker containers, native installation, or executables like Llamafile, this guide provides the steps needed for a smooth setup process.

Need expert guidance? Connect with a top Codersera professional today!
