ZONOS2

Efficient MoE-based TTS with zero-shot voice cloning capabilities.

About

ZONOS2 is an open-source real-time text-to-speech model by Zyphra that delivers high-fidelity voice cloning through its novel mixture-of-experts (MoE) architecture, the first in open-source TTS. ZONOS2 leads among open-source TTS models in naturalness and produces life-like zero-shot voice clones across a wide range of speakers and languages.

ZONOS2 achieves state-of-the-art performance and is competitive with leading open-source TTS models on widely used open benchmarks such as seed-tts eval, as well as on our newly proposed ZTTS1-Eval benchmark. We encourage you to listen to the comparisons below. ZONOS2 performs especially well on speaker similarity and prosody metrics, a testament to the versatility and naturalness of its voice cloning capabilities.

Dwarkesh

"I guess what’s interesting is that it no longer has that synthetic, slop-y cadence you expect from these models. It actually feels like you're talking to a person, right? And maybe the M.O.E thing is actually doing a lot of work here. You get the feel of a much bigger model but with great realtime performance at the same time."

Audio samples: Zyphra, Fish Audio, Qwen, Cartesia, ElevenLabs

Trump

"Folks, nobody talks about Shinji Ikari the right way, okay? Nobody. They are all saying, oh, he's conflicted, he's emotional, he's hesitant, and I say, maybe. But let me tell you something: the kid got in the robot. A lot of people forget that very important detail. They said, Shinji, get in the robot, and eventually, he did. That's courage, folks. That's results. And his father, Gendo? Terrible guy. Really cold, not a people person. Bad father, bad vibes, bad glasses, everything about him is a problem. Shinji never got support, no encouragement. Nobody said, 'You're doing a fantastic job, Shinji.' They should have said it. Who knows things might have gone a lot better if they had."

Audio samples: Zyphra, Fish Audio, Qwen, Cartesia

British Female

"I must not overfit. Overfitting is the model-killer. Overfitting is the little loss that brings total generalization failure. I will face my loss. I will permit the gradient to pass backward over me and through me. And when it has propagated, I will turn the validation eye to see its path. Where the loss has gone there will be convergence. Only the learned representation will remain."

Audio samples: Zyphra, Fish Audio, Qwen, Cartesia, ElevenLabs

Parks and Recreation Guy

"You leak your test set? Straight to jail. No validation, no nothing. Data scientists, we have a special jail for data scientists. You overfit? Jail! You underfit? Also jail! Overfit, underfit. You normalize the training data but not the test data? Jail, right away. You tune hyperparameters on the leaderboard? Believe it or not, jail! You call it AI when it’s just logistic regression? Jail! You use a neural net for a problem solved by a decision tree? Jail! You forget to set a random seed? Jail! You set the seed but don’t log the experiment? Also jail! You say 'the model is 99% accurate' on an imbalanced dataset? Right to jail! You deploy without monitoring drift? Jail, immediately."

Audio samples: Zyphra, Fish Audio, Qwen, Cartesia, ElevenLabs

David Attenborough

"Here, in the dense and unforgiving wilds of synthetic speech, a rare new species begins to stir. This is Zonos 2: alert, expressive, and built for survival in places where lesser models falter. It has learned not from a single habitat, but from many — the steady rhythm of audiobooks, the chatter of podcasts, the pulse of conversation, and the many tongues of a multilingual world. Before any sound becomes part of its instincts, it must pass a trial: voices are detected, transcripts are challenged, and weak utterances are quietly left behind. What remains is clean, varied, and remarkably alive. Across languages and symbols that might confuse a simpler creature, Zonos 2 moves with confidence."

Audio samples: Zyphra, Fish Audio, Qwen, Cartesia, ElevenLabs

Arlechino

"So it turns out it’s not chlorover. It’s not chlorover at all. Zero distillation required: the talent was organic the whole time."

Audio samples: Zyphra, Fish Audio, Qwen, Cartesia, ElevenLabs

Obama

"That’s the thing about ambitious projects. They’re never just about the machine itself. The moon landing wasn’t just about a rocket. It was about what happened to our science, our industry, our confidence, and our sense of possibility when we chose to do something extraordinary. So yes, build the Gundam. Build it in Detroit, or Pittsburgh, or Houston, or wherever American workers are ready to show the world what they can do. Build it with union labor. Build it with American steel and American ingenuity. Build it in partnership with our universities and our private sector. And while you’re at it, make sure the technology that comes out of that effort helps us improve prosthetics, disaster response, construction, and national defense. Because that’s what America does at our best. We dream bigger than what seems practical. We turn imagination into industry. We transform science fiction into economic opportunity."

Audio samples: Zyphra, Fish Audio, Qwen, Cartesia

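API Usage

from openai import OpenAI

# The standard OpenAI client is pointed at the Zyphra endpoint via base_url.
client = OpenAI(
    api_key="<YOUR_ZYPHRA_API_KEY>",
    base_url="https://api.zyphracloud.com/api/v1",
)

# Send the input text as a single user message to the ZONOS2 model.
response = client.chat.completions.create(
    model="zyphra/ZONOS2",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Print the content of the first returned choice.
print(response.choices[0].message.content)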

Provider: Zyphra
Release Date:
Model License: Apache 2.0
Model Architecture: Transformer
Mixture-of-experts: Yes
Total parameters: 8B
Activated parameters: 900M
Context length: 6144
Input modality: Text, Audio
Output modality: Audio
Price: $0.00 / 1M UTF-8 bytes
Concurrent Requests: 1
Max Daily Generations: 50
Max Input Characters: 500
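
The limits listed above (500 input characters per request, 50 generations per day, pricing per UTF-8 byte) suggest splitting longer scripts before sending them. The following is a minimal sketch, not part of the Zyphra API: the helper names `split_for_tts`, `utf8_bytes`, and `fits_daily_budget` are hypothetical, and it assumes that the character limit counts Unicode code points and that each chunk consumes one generation.

```python
MAX_INPUT_CHARS = 500        # "Max Input Characters" from the spec list
MAX_DAILY_GENERATIONS = 50   # "Max Daily Generations" from the spec list


def split_for_tts(text: str, limit: int = MAX_INPUT_CHARS) -> list[str]:
    """Greedily pack whitespace-separated words into chunks of at most `limit` characters."""
    chunks: list[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is hard-split into limit-sized pieces.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = word if not current else f"{current} {word}"
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def utf8_bytes(text: str) -> int:
    """Size of the text in UTF-8 bytes, the unit the price is quoted in."""
    return len(text.encode("utf-8"))


def fits_daily_budget(chunks: list[str], limit: int = MAX_DAILY_GENERATIONS) -> bool:
    """Assumption: one chunk corresponds to one generation."""
    return len(chunks) <= limit
```

Each chunk returned by `split_for_tts` could then be sent as its own request, one at a time given the concurrency limit of 1. Splitting on whitespace is the simplest choice; splitting on sentence boundaries would likely give more natural prosody at chunk edges.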

© 2026 Zyphra Technologies Inc. All rights reserved.
