Zonos-TTS is an open-source, multilingual, real-time text-to-speech (TTS) model with expressive output and voice cloning. Released by Zyphra under the Apache 2.0 license, it supports real-time voice cloning, audio prefix input, and fine control over speech attributes such as rate, pitch, and emotion.
This guide provides a step-by-step method to install and run Zonos-TTS locally on an Ubuntu system.
Zonos-TTS uses deep learning to generate natural-sounding speech from text. It conditions generation on speaker embeddings and audio prefixes to improve voice fidelity. Notable features include:
Ensure your Ubuntu system meets the following requirements:
You can install Zonos-TTS via Docker or a manual (DIY) installation.
Docker simplifies dependency management and deployment.
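Steps:
Install Docker & Docker Compose:
sudo apt update
sudo apt install docker.io docker-compose
sudo systemctl start docker
sudo systemctl enable docker
Clone the Zonos Repository:
git clone https://github.com/Zyphra/Zonos.git
cd Zonos
Run Docker Compose:
docker compose up
If the docker compose subcommand is not recognized, note that the docker-compose package installs the older standalone command, which is invoked as docker-compose up.
Generate Sample Audio:
python3 sample.py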
For manual installation, follow these steps:
Install eSpeak:
sudo apt install espeak-ng
Clone the Zonos Repository:
git clone https://github.com/Zyphra/Zonos.git
cd Zonos
Install Python Dependencies:
python3 -m pip install --upgrade uv
uv venv
source .venv/bin/activate
uv sync --no-group main
uv sync
Generate Sample Audio:
python3 sample.py
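Before moving on, it can help to confirm that the command-line tools used in the installation steps are actually on the PATH. The following is a minimal sketch using only the Python standard library; the helper name find_tools is hypothetical and is not part of Zonos.

```python
import shutil


def find_tools(names):
    """Map each command name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # git is used to clone the repository; espeak-ng and uv are installed
    # in the manual steps of this guide.
    for name, path in find_tools(["git", "espeak-ng", "uv"]).items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

Any tool reported as NOT FOUND points back to the installation step that provides it.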
Once installed, use Python to generate speech:
import torch
import torchaudio
from zonos.model import Zonos
from zonos.conditioning import make_cond_dict

# Load the model onto the GPU and switch it to bfloat16
model = Zonos.from_pretrained("Zyphra/Zonos-v0.1-hybrid", device="cuda")
model.bfloat16()

# Load example audio for voice cloning and compute a speaker embedding
wav, sampling_rate = torchaudio.load("./exampleaudio.mp3")
spk_embedding = model.embed_spk_audio(wav, sampling_rate)

# Fix the random seed so repeated runs produce the same output
torch.manual_seed(421)

# Define conditioning parameters
cond_dict = make_cond_dict(
    text="Hello, world!",
    speaker=spk_embedding.to(torch.bfloat16),
    language="en-us",
)

# Prepare conditioning and generate speech
conditioning = model.prepare_conditioning(cond_dict)
codes = model.generate(conditioning)
wavs = model.autoencoder.decode(codes).cpu()

# Save the first generated waveform; torchaudio.save expects a 2-D
# (channels, samples) tensor, so the batch dimension is indexed away
torchaudio.save("sample.wav", wavs[0], model.autoencoder.sampling_rate)
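The example above passes device="cuda", so it assumes PyTorch can see a CUDA GPU. A quick check before loading the model can save a confusing failure later. This is only a sketch: cuda_status is a hypothetical helper name, while torch.cuda.is_available and torch.cuda.get_device_name are standard PyTorch calls.

```python
def cuda_status():
    """Return a short description of whether PyTorch can see a CUDA device."""
    try:
        # Imported lazily so the check also runs before dependencies are installed.
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "torch is installed, but no CUDA device is visible"


print(cuda_status())
```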
Zonos provides two models. The basic example loads the hybrid model (Zyphra/Zonos-v0.1-hybrid). The transformer model gives higher-fidelity output at the cost of increased computational demand:
model = Zonos.from_pretrained("Zyphra/Zonos-v0.1-transformer", device="cuda")
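To switch between the two checkpoints without editing string literals, one option is a small lookup table. The sketch below uses only the two model identifiers that appear in this guide; MODEL_IDS and resolve_model_id are hypothetical names, not part of the Zonos API.

```python
# The two identifiers below are the ones used elsewhere in this guide.
MODEL_IDS = {
    "hybrid": "Zyphra/Zonos-v0.1-hybrid",
    "transformer": "Zyphra/Zonos-v0.1-transformer",
}


def resolve_model_id(variant):
    """Return the pretrained model identifier for a variant name."""
    try:
        return MODEL_IDS[variant]
    except KeyError:
        expected = ", ".join(sorted(MODEL_IDS))
        raise ValueError(f"unknown variant {variant!r}; expected one of: {expected}") from None


# Usage (requires Zonos to be installed):
# model = Zonos.from_pretrained(resolve_model_id("transformer"), device="cuda")
```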
Adjust the language parameter for speech synthesis in different languages:
cond_dict = make_cond_dict(
    text="Bonjour le monde!",
    speaker=spk_embedding.to(torch.bfloat16),
    language="fr-fr",
)
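When generating the same kind of clip in several languages, the per-language arguments can be prepared up front. The following sketch covers only the two language codes shown in this guide; SAMPLES and build_cond_kwargs are hypothetical names, and the resulting dictionaries are meant to be unpacked into make_cond_dict.

```python
# Language code -> sample text, limited to the codes used in this guide.
SAMPLES = {
    "en-us": "Hello, world!",
    "fr-fr": "Bonjour le monde!",
}


def build_cond_kwargs(speaker, samples=SAMPLES):
    """Return one keyword-argument dict per language, ready for make_cond_dict."""
    return [
        {"text": text, "speaker": speaker, "language": lang}
        for lang, text in samples.items()
    ]


# Usage (requires Zonos and a speaker embedding):
# for kwargs in build_cond_kwargs(spk_embedding.to(torch.bfloat16)):
#     cond_dict = make_cond_dict(**kwargs)
```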
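Fine-tune output speech by modifying expressive parameters:
# Keyword names and values below are as given in this guide; verify them
# against make_cond_dict in zonos/conditioning.py for your installed version.
cond_dict = make_cond_dict(
    text="I am very happy!",
    speaker=spk_embedding.to(torch.bfloat16),
    language="en-us",
    emotion="happiness",
    speaking_rate=1.2,
    pitch_variation=0.1,
)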
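Expressive settings are easiest to judge by ear, so it can be useful to render a small grid of variations and compare the files. The sketch below only prepares output filenames and extra keyword arguments. The expressive_grid helper is hypothetical, and the emotion and speaking_rate keywords are copied from this guide's example rather than verified against the library.

```python
from itertools import product


def expressive_grid(emotions, rates):
    """Yield (output filename, extra kwargs) for every emotion/rate combination."""
    for emotion, rate in product(emotions, rates):
        # The :g format drops trailing zeros, which keeps filenames short.
        filename = f"sample_{emotion}_{rate:g}.wav"
        # Keyword names follow this guide's example and are an assumption.
        yield filename, {"emotion": emotion, "speaking_rate": rate}


for filename, extra in expressive_grid(["happiness"], [1.0, 1.2]):
    print(filename, extra)
```

Each pair can then be merged into the arguments passed to make_cond_dict, with the generated audio saved under the matching filename.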
If packages are missing, run uv sync to fix them.
Zonos-TTS can be used for:
Engage with the Zonos-TTS ecosystem via:
Zonos-TTS is a powerful open-source TTS model with multilingual support and expressive voice synthesis. Whether you use Docker for quick deployment or a manual installation for greater control, this guide helps you set up and run Zonos-TTS on Ubuntu. Its applications range from content creation to accessibility and research, making it a versatile tool for real-time voice synthesis.