Codersera

How to Run Sesame CSM 1B on Windows: Step-by-Step Installation

Sesame CSM 1B is an open-source speech model designed for lifelike AI-generated voices, enabling offline voice synthesis and cloning on local hardware. This guide provides a step-by-step approach to installing and running Sesame CSM 1B on a Windows machine, covering prerequisites, installation steps, testing, and troubleshooting.

What is Sesame CSM 1B?

Sesame CSM 1B is engineered to deliver high-quality voice generation and cloning capabilities. It can replicate voices with impressive accuracy and generate speech from text inputs. The model is particularly useful for applications such as voice assistants, content creation, and accessibility tools.

Key Features

  • Offline operation – No internet connectivity required after installation.
  • Voice cloning – Replicates specific voices for personalization.
  • Efficient resource usage – Works effectively on systems with limited VRAM (e.g., 8GB).

System Requirements

Ensure your system meets the following minimum requirements before installation:

Hardware

  • Processor: Dual-core CPU or better.
  • Memory: 8GB RAM (recommended for optimal performance).
  • VRAM: Dedicated GPU with at least 8GB VRAM (e.g., NVIDIA RTX 4060).
  • Disk Space: Minimum of 10GB free storage.

Software

  • Operating System: Windows 10 or Windows 11 (64-bit).
  • Python: Version 3.8 or higher.
  • Microsoft Visual C++ Redistributable Packages: Latest version installed.
  • Hugging Face CLI: Required for downloading model dependencies.
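
Some of the requirements above can be checked from Python itself. The following is a minimal sketch using only the standard library; the function name check_system and the result keys are illustrative choices, not part of Sesame CSM, and GPU memory is not checked here.

```python
import platform
import shutil
import sys

MIN_PYTHON = (3, 8)   # minimum Python version from the list above
MIN_FREE_GB = 10      # minimum free disk space from the list above


def check_system(path=".", min_python=MIN_PYTHON, min_free_gb=MIN_FREE_GB):
    """Return a dict mapping each check name to True (pass) or False (fail)."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "windows": platform.system() == "Windows",
        "64_bit": sys.maxsize > 2**32,
        "python_version": sys.version_info[:2] >= tuple(min_python),
        "disk_space": free_gb >= min_free_gb,
    }


if __name__ == "__main__":
    for name, ok in check_system().items():
        print(f"{name}: {'OK' if ok else 'CHECK'}")
```

Run it from the drive where you plan to store the project, since the disk check measures the drive that contains the current folder.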

Prerequisites

Install Python

  1. Download Python from the official website.
  2. During installation, check the box to add Python to PATH.

Verify the installation:

python --version

Install Git

Git is required to clone the Sesame CSM repository:

  1. Download Git from the official website.
  2. Install it with default settings.

Verify the installation:

git --version
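
If either command is not recognized, the tool is probably missing from PATH. As a rough sketch, the same check can be scripted with the standard library; tool_version is a hypothetical helper name used only for this example.

```python
import shutil
import subprocess


def tool_version(command):
    """Return the output of `<command> --version`, or None if it is not on PATH."""
    executable = shutil.which(command)
    if executable is None:
        return None
    result = subprocess.run(
        [executable, "--version"], capture_output=True, text=True, check=False
    )
    # Some tools print their version to stderr instead of stdout.
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    for tool in ("python", "git"):
        version = tool_version(tool)
        print(f"{tool}: {version if version else 'not found on PATH'}")
```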

Install Hugging Face CLI

Hugging Face CLI is necessary for model authentication. Open Command Prompt and install it:

pip install huggingface_hub

Then generate an access token from your Hugging Face account and authenticate by running:

huggingface-cli login
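
To confirm that a token is actually stored after logging in, a small check like the one below can help. It is a sketch under two assumptions about huggingface_hub defaults: that the HF_TOKEN environment variable is honored, and that the login command saves the token in a file named token under the Hugging Face home folder (HF_HOME, or .cache\huggingface in your user folder). find_hf_token is a hypothetical helper, and it does not verify that the token is valid.

```python
import os
from pathlib import Path


def find_hf_token(env=None, home=None):
    """Return where a Hugging Face token was found ("env" or "file"), else None."""
    env = os.environ if env is None else env
    if env.get("HF_TOKEN"):
        return "env"
    # Assumed default location; HF_HOME overrides it when set.
    default_home = Path(home or Path.home()) / ".cache" / "huggingface"
    hf_home = Path(env.get("HF_HOME") or default_home)
    token_file = hf_home / "token"
    if token_file.is_file() and token_file.read_text().strip():
        return "file"
    return None


if __name__ == "__main__":
    print(f"Token source: {find_hf_token()}")
```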

Step-by-Step Installation

Follow these steps to install and configure Sesame CSM 1B on your Windows system:

Step 1: Clone the Repository

  1. Open Command Prompt or PowerShell.
  2. Navigate to the directory where you want to store project files.

Run:

git clone <REPOSITORY_URL>

Replace <REPOSITORY_URL> with the URL of the Sesame CSM GitHub repository.

Step 2: Set Up a Virtual Environment

Navigate to the cloned repository folder (the folder name matches the repository you cloned):

cd sesame-csm

Create a virtual environment:

python -m venv venv

Activate the virtual environment:

.\venv\Scripts\activate
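
To confirm that the virtual environment is active before installing anything, you can compare the interpreter's prefix with its base prefix; the two differ inside a venv. The helper name in_virtualenv below is an illustrative choice.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when the interpreter runs inside a venv.

    Inside a venv, sys.prefix points at the environment folder while
    sys.base_prefix still points at the base Python installation.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
```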

Step 3: Install Dependencies

Run the following command to install all required libraries:

pip install -r requirements.txt
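
Once pip finishes, it can be worth confirming that the key libraries are importable from the active environment. The sketch below checks for torch and torchaudio, which the usage examples in this guide import, plus huggingface_hub; the exact contents of requirements.txt are not assumed, and missing_modules is a hypothetical helper name.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["torch", "torchaudio", "huggingface_hub"])
    print("All found" if not missing else f"Missing: {', '.join(missing)}")
```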

Step 4: Download Models

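Ensure Hugging Face CLI is authenticated, then run the download script provided in the repository to fetch the necessary models. Script names can differ between repository versions, so check the repository README if download_models.py is not present: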

python download_models.py

Testing Sesame CSM 1B

After installation, test the model using a sample script:

Generate Speech from Text

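  1. Create a Python script named test.py in the repository folder.
  2. Add the following code, which uses the load_csm_1b loader shown in the Usage section:

import torch
import torchaudio
from generator import load_csm_1b

# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
generator = load_csm_1b(device=device)

text = "Hello from Sesame!"
audio_path = "output.wav"

audio = generator.generate(text=text, speaker=0, context=[], max_audio_length_ms=10_000)
torchaudio.save(audio_path, audio.unsqueeze(0).cpu(), generator.sample_rate)
print(f"Audio saved at {audio_path}")

  3. Run the script:

python test.py

  4. Locate and play output.wav in your file explorer.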
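
If the file does not play, a quick sanity check is to confirm that it exists and begins with a WAV container header. The sketch below only inspects the first 12 bytes; it does not decode the audio, and looks_like_wav is a hypothetical helper name.

```python
from pathlib import Path


def looks_like_wav(path):
    """Return True if the file exists and starts with a RIFF/WAVE header."""
    file = Path(path)
    if not file.is_file():
        return False
    with file.open("rb") as handle:
        header = handle.read(12)
    # Bytes 0-3 are "RIFF" and bytes 8-11 are "WAVE" in a standard WAV file.
    return len(header) == 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE"


if __name__ == "__main__":
    print("output.wav looks valid:", looks_like_wav("output.wav"))
```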

Voice Cloning

To clone a voice:

  1. Provide reference audio files as input.
  2. Modify the script to include voice cloning parameters as described in the repository documentation.
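
Assuming reference clips are supplied the same way as the context segments in the Usage section (each recording paired with its transcript and a speaker ID), it helps to validate those pairs before loading any audio. The sketch below is illustrative only: ReferenceClip and pair_references are hypothetical names, not part of the repository.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ReferenceClip:
    """Hypothetical container: one reference recording and what is said in it."""
    audio_path: Path
    transcript: str
    speaker: int


def pair_references(audio_paths, transcripts, speaker=0):
    """Pair each reference file with its transcript, rejecting mismatched input."""
    if len(audio_paths) != len(transcripts):
        raise ValueError("each reference audio file needs exactly one transcript")
    clips = []
    for path, text in zip(audio_paths, transcripts):
        if not text.strip():
            raise ValueError(f"empty transcript for {path}")
        clips.append(ReferenceClip(Path(path), text.strip(), speaker))
    return clips
```

Each resulting clip carries the three pieces of information that a context segment in the Usage section needs: text, speaker, and the audio to load.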

Usage

  1. Generate a Sentence:

from generator import load_csm_1b
import torchaudio
import torch

# Pick the best available device: Apple MPS, then CUDA, then CPU.
if torch.backends.mps.is_available():
    device = "mps"
elif torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"

generator = load_csm_1b(device=device)

audio = generator.generate(
    text="Hello from Sesame.",
    speaker=0,
    context=[],
    max_audio_length_ms=10_000,
)

torchaudio.save("audio.wav", audio.unsqueeze(0).cpu(), generator.sample_rate)

  2. Generate Speech with Context (this example reuses the generator and imports from the previous one):

from generator import Segment

# One speaker ID, transcript, and audio file per earlier utterance.
speakers = [0, 1, 0, 0]
transcripts = [
    "Hey how are you doing.",
    "Pretty good, pretty good.",
    "I'm great.",
    "So happy to be speaking to you.",
]
audio_paths = [
    "utterance_0.wav",
    "utterance_1.wav",
    "utterance_2.wav",
    "utterance_3.wav",
]

def load_audio(audio_path):
    # Load the clip and resample it to the generator's sample rate.
    audio_tensor, sample_rate = torchaudio.load(audio_path)
    audio_tensor = torchaudio.functional.resample(
        audio_tensor.squeeze(0), orig_freq=sample_rate, new_freq=generator.sample_rate
    )
    return audio_tensor

segments = [
    Segment(text=transcript, speaker=speaker, audio=load_audio(audio_path))
    for transcript, speaker, audio_path in zip(transcripts, speakers, audio_paths)
]
audio = generator.generate(
    text="Me too, this is some cool stuff huh?",
    speaker=1,
    context=segments,
    max_audio_length_ms=10_000,
)

torchaudio.save("audio.wav", audio.unsqueeze(0).cpu(), generator.sample_rate)

Troubleshooting

Common Issues and Solutions

Model Download Fails

  • Ensure Hugging Face CLI is authenticated properly.
  • Check internet connectivity.

High VRAM Usage

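  • Reduce batch size if your script batches requests, generate shorter clips (a lower max_audio_length_ms), or pass fewer context segments; use a smaller model if one is available.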
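
To see how much VRAM is in use while the model runs, you can query the NVIDIA driver. The sketch below assumes an NVIDIA GPU with the nvidia-smi tool on PATH and returns an empty list otherwise; parse_gpu_memory and query_gpu_memory are hypothetical helper names.

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'used, total' MiB rows (one per GPU) into a list of (used, total)."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(part.strip()) for part in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Return per-GPU (used, total) MiB from nvidia-smi, or [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    try:
        return parse_gpu_memory(result.stdout)
    except ValueError:
        # The driver reported a non-numeric value for at least one GPU.
        return []


if __name__ == "__main__":
    for index, (used, total) in enumerate(query_gpu_memory()):
        print(f"GPU {index}: {used} MiB used of {total} MiB")
```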

Audio Quality Variations

  • Use consistent reference audio files for cloning tasks.

Advanced Configuration

Customizing Output Audio

Modify parameters such as sampling rate and voice tone in configuration files provided with the repository.

Integration with Other Applications

Use APIs or scripts to integrate Sesame CSM into larger projects like chatbots or multimedia tools.
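
One way to do this is to hide the model behind a small wrapper so the rest of the application never touches it directly. The sketch below assumes a generator object with the generate(text, speaker, context, max_audio_length_ms) call shown in the Usage section; SpeechService is a hypothetical class name, not something the repository provides.

```python
class SpeechService:
    """Thin wrapper around any object exposing generate(text=..., speaker=..., ...)."""

    def __init__(self, generator, speaker=0, max_audio_length_ms=10_000):
        self.generator = generator
        self.speaker = speaker
        self.max_audio_length_ms = max_audio_length_ms

    def synthesize(self, text):
        """Validate the text, then delegate to the wrapped generator."""
        cleaned = text.strip()
        if not cleaned:
            raise ValueError("text must not be empty")
        return self.generator.generate(
            text=cleaned,
            speaker=self.speaker,
            context=[],
            max_audio_length_ms=self.max_audio_length_ms,
        )
```

A chatbot could then create one SpeechService at startup with the loaded generator and call synthesize for each reply, which keeps model loading out of the request path.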

Conclusion

Running Sesame CSM 1B on Windows enables powerful offline speech synthesis and voice cloning capabilities for personal projects or professional applications. By following this guide, you can set up and test the model efficiently while troubleshooting common issues along the way.
