Sesame CSM 1B is a cutting-edge, open-source speech synthesis model optimized for local deployment. It enables lifelike voice generation and cloning with efficient VRAM usage, making it ideal for users with consumer GPUs like the RTX 4060 (8GB VRAM). This guide covers installation, configuration, and advanced usage on Ubuntu systems to ensure a seamless deployment.
# Update system packages
sudo apt update && sudo apt upgrade -y
# Install Python and essential packages
sudo apt install python3 python3-pip python3-venv git -y
# Install NVIDIA drivers and CUDA for GPU acceleration
sudo apt install nvidia-driver-535 cuda-12-2 -y
git clone https://github.com/sesame-ai/csm-1b.git
cd csm-1b
python3 -m venv venv
source venv/bin/activate
pip install torch torchaudio --extra-index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
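Before downloading the models, it can be worth confirming that the virtual environment actually sees the GPU. The snippet below is a minimal sketch and is not part of the project: `gpu_environment_report` is a hypothetical helper name, and it relies only on `shutil.which`, `importlib.util.find_spec`, and `torch.cuda.is_available()`.

```python
import importlib.util
import shutil


def gpu_environment_report():
    """Collect basic facts about the local GPU setup (hypothetical helper)."""
    report = {
        # True when the NVIDIA driver's command-line tool is on PATH.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
        # True when PyTorch is importable in the active environment.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check also works without PyTorch

        report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in gpu_environment_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as `False` while `nvidia_smi_found` is `True`, the installed PyTorch wheel may not match the CUDA version, and reinstalling `torch` from the CUDA wheel index is a reasonable first thing to try.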
python scripts/download_models.py
Models are stored in ~/.cache/sesame by default.
Create a test script:
# test_hello.py
from sesame import Synthesizer
synth = Synthesizer("sesame-1b")
audio = synth.generate("Hello from Sesame CSM 1B")
audio.save("output.wav")
Run the script:
python test_hello.py
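Once the script has run, a quick way to confirm that it produced real audio is to inspect the file's header. This is a sketch using Python's standard `wave` module; `describe_wav` is a hypothetical helper, and it assumes the output is a PCM-encoded WAV file (the `wave` module raises an error for other encodings, such as 32-bit float).

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
        # Frames divided by frames-per-second gives the clip length.
        duration = wav_file.getnframes() / sample_rate
    return sample_rate, channels, duration
```

Calling `describe_wav("output.wav")` after the test run should return a non-zero duration. A duration of zero suggests that generation failed silently.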
Dependency warnings (for example, from librosa or numba) can be ignored initially.
Place a .wav file of the target voice in ./samples.
python scripts/clone_voice.py --text "Custom speech here" --reference samples/your_voice.wav
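To clone several lines of text with the same reference voice, the command can be wrapped in a small Python driver. This is a sketch only: `build_clone_command` and `clone_lines` are hypothetical helpers, the flags are the ones this guide mentions (`--text`, `--reference`, `--seed`), and the assumption that `--seed` takes an integer is not confirmed by the source. Where the script writes its output, and whether repeated runs overwrite it, is also not stated here, so check that before batching.

```python
import subprocess
import sys


def build_clone_command(text, reference, seed=None):
    """Build the argument list for scripts/clone_voice.py."""
    command = [
        sys.executable, "scripts/clone_voice.py",
        "--text", text,
        "--reference", reference,
    ]
    if seed is not None:
        # Assumption: --seed accepts an integer value.
        command += ["--seed", str(seed)]
    return command


def clone_lines(lines, reference, seed=None):
    """Run the cloning script once per line of text, stopping on the first failure."""
    for line in lines:
        subprocess.run(build_clone_command(line, reference, seed), check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the text contains apostrophes or quotation marks.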
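Use --seed for reproducibility.

Technique | Command/Setting | VRAM Reduction |
---|---|---|
FP16 Precision | torch.set_float32_matmul_precision('medium') | 30% |
Batch Size Reduction | --batch_size 1 | 20% |
Gradient Checkpointing | --use_checkpointing | 15% |

Note that torch.set_float32_matmul_precision('medium') changes how float32 matrix multiplications are computed internally; it does not by itself store the weights in FP16, so treat the figure in that row as approximate.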
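For a rough sense of where the memory goes, the weight footprint can be estimated from the parameter count and the storage precision. The arithmetic below is a sketch that covers weights only; it ignores activations, caches, and any audio codec. The figure of one billion parameters is read off the model name and is an approximation, and `weight_memory_gb` is a hypothetical helper.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAMETER = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(parameter_count, dtype):
    """Estimate memory for model weights alone, in decimal gigabytes."""
    return parameter_count * BYTES_PER_PARAMETER[dtype] / 1e9


for dtype in BYTES_PER_PARAMETER:
    # Assumption: roughly 1e9 parameters, taken from the "1B" in the model name.
    print(f"{dtype}: {weight_memory_gb(1_000_000_000, dtype):.1f} GB")
```

Under these assumptions the weights need about 4 GB in float32 and about 2 GB in a 16-bit format, which is why half-precision storage matters most on an 8 GB card.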
# Reinstall specific library versions
pip install Flask==2.0.3 PyMySQL==1.0.2 --force-reinstall
export http_proxy=http://proxy.example.com:80
export https_proxy=$http_proxy
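If downloads still fail behind a proxy, it may help to check which proxy settings Python itself picks up from the environment. This sketch uses the standard library's `urllib.request.getproxies()`; `active_proxies` is a hypothetical wrapper name. Tools such as `pip` have their own proxy handling, so this only indicates what the environment exposes.

```python
import urllib.request


def active_proxies():
    """Return the proxy mapping Python's standard library detects."""
    return urllib.request.getproxies()


if __name__ == "__main__":
    proxies = active_proxies()
    if not proxies:
        print("No proxy settings detected")
    for scheme, url in proxies.items():
        print(f"{scheme}: {url}")
```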
Modify config.yaml to adjust settings:
voice:
  pitch_range: [60, 80]  # Adjust for tonal variation
  speed: 1.2  # 1.0 = default speed
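A typo in the configuration is easier to catch before the model loads. The following sketch validates the two settings shown here once they have been parsed into a dictionary. The rules it applies (an ordered pair of numbers for `pitch_range`, a positive number for `speed`) are assumptions rather than documented constraints, and `validate_voice_config` is a hypothetical helper.

```python
def validate_voice_config(config):
    """Return a list of problems found in the `voice` section of a config dict."""
    problems = []
    voice = config.get("voice") or {}

    pitch_range = voice.get("pitch_range")
    if (
        not isinstance(pitch_range, (list, tuple))
        or len(pitch_range) != 2
        or not all(isinstance(value, (int, float)) for value in pitch_range)
    ):
        problems.append("voice.pitch_range must be a pair of numbers")
    elif pitch_range[0] > pitch_range[1]:
        # Assumption: the range is written as [low, high].
        problems.append("voice.pitch_range must be ordered low to high")

    speed = voice.get("speed")
    if not isinstance(speed, (int, float)) or speed <= 0:
        problems.append("voice.speed must be a positive number")

    return problems


example = {"voice": {"pitch_range": [60, 80], "speed": 1.2}}
print(validate_voice_config(example))  # → []
```

With PyYAML installed, the dictionary can be obtained by passing the opened config.yaml file to `yaml.safe_load`.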
Expose endpoints using Flask:
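from flask import Flask, request
from sesame import Synthesizer

app = Flask(__name__)
# Create the synthesizer once at startup rather than on every request.
synth = Synthesizer()

@app.route('/synthesize', methods=['POST'])
def synthesize():
    # Expects a JSON body such as {"text": "..."}.
    text = request.json['text']
    audio = synth.generate(text)
    return audio.to_bytes()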
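A client for the /synthesize route shown above can be written with the standard library alone. This is a sketch under several assumptions: the server is running locally on port 5000 (Flask's development default), the response body is raw audio bytes that can be written straight to disk, and the helper names `build_synthesize_request` and `synthesize_to_file` are hypothetical.

```python
import json
import urllib.request


def build_synthesize_request(text, base_url="http://127.0.0.1:5000"):
    """Build a POST request carrying the text as a JSON body."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/synthesize",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text, output_path, base_url="http://127.0.0.1:5000"):
    """Send the request and write the raw response bytes to output_path."""
    request = build_synthesize_request(text, base_url)
    # Generation can be slow, so allow a generous timeout.
    with urllib.request.urlopen(request, timeout=60) as response:
        audio_bytes = response.read()
    with open(output_path, "wb") as output_file:
        output_file.write(audio_bytes)
```

The `Content-Type: application/json` header matters here because Flask's `request.json` only parses the body when the request declares a JSON content type.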
import torchaudio
from generator import Segment  # assumed to live alongside load_csm_1b

# `generator` is the object returned by load_csm_1b(device=device),
# as created in the basic example in this guide.

speakers = [0, 1, 0, 0]
transcripts = [
    "Hey how are you doing.",
    "Pretty good, pretty good.",
    "I'm great.",
    "So happy to be speaking to you.",
]
audio_paths = [
    "utterance_0.wav",
    "utterance_1.wav",
    "utterance_2.wav",
    "utterance_3.wav",
]

def load_audio(audio_path):
    # Load one utterance and resample it to the generator's sample rate.
    audio_tensor, sample_rate = torchaudio.load(audio_path)
    audio_tensor = torchaudio.functional.resample(
        audio_tensor.squeeze(0), orig_freq=sample_rate, new_freq=generator.sample_rate
    )
    return audio_tensor

# Pair each transcript with its speaker ID and recorded audio.
segments = [
    Segment(text=transcript, speaker=speaker, audio=load_audio(audio_path))
    for transcript, speaker, audio_path in zip(transcripts, speakers, audio_paths)
]

# Generate the next utterance, conditioned on the prior conversation.
audio = generator.generate(
    text="Me too, this is some cool stuff huh?",
    speaker=1,
    context=segments,
    max_audio_length_ms=10_000,
)
torchaudio.save("audio.wav", audio.unsqueeze(0).cpu(), generator.sample_rate)
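One pitfall in the context example is that `zip` stops at the shortest input, so a missing transcript or audio path silently drops utterances from the context. A small guard can catch this before any audio is loaded. This is a sketch, and `check_aligned` is a hypothetical helper; on Python 3.10 or later, passing `strict=True` to `zip` is an alternative.

```python
def check_aligned(speakers, transcripts, audio_paths):
    """Raise ValueError unless the three context lists have equal length."""
    lengths = {
        "speakers": len(speakers),
        "transcripts": len(transcripts),
        "audio_paths": len(audio_paths),
    }
    if len(set(lengths.values())) != 1:
        raise ValueError(f"context lists differ in length: {lengths}")
    # All three lengths match, so any one of them is the utterance count.
    return lengths["speakers"]
```

Calling `check_aligned(speakers, transcripts, audio_paths)` just before building the segments turns a quiet mismatch into an explicit error.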
from generator import load_csm_1b
import torchaudio
import torch

# Pick the best available device: Apple MPS, then CUDA, then CPU.
if torch.backends.mps.is_available():
    device = "mps"
elif torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"

generator = load_csm_1b(device=device)

audio = generator.generate(
    text="Hello from Sesame.",
    speaker=0,
    context=[],
    max_audio_length_ms=10_000,
)
torchaudio.save("audio.wav", audio.unsqueeze(0).cpu(), generator.sample_rate)
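The `max_audio_length_ms` argument appears to cap how long the generated clip can be, so the largest tensor to expect follows from the sample rate. The arithmetic below is a sketch: `max_output_samples` is a hypothetical helper, and the 24 kHz rate is an assumed example value, so read `generator.sample_rate` for the real one.

```python
def max_output_samples(sample_rate_hz, max_audio_length_ms):
    """Upper bound on the number of samples in a clip capped at max_audio_length_ms."""
    return sample_rate_hz * max_audio_length_ms // 1000


# 24_000 Hz is an assumed example rate, not a value confirmed by this guide.
print(max_output_samples(24_000, 10_000))  # → 240000
```

Longer passages can be handled by raising the cap or by generating sentence by sentence and concatenating the results.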
Sesame CSM 1B offers enterprise-grade voice synthesis on consumer hardware. By following this guide, users can deploy it on Ubuntu with GPU acceleration, troubleshoot common issues, and extend functionality through APIs or custom voice profiles.