Hebrew Text-to-Speech: A Provider Comparison

This is a snapshot of Hebrew text-to-speech (TTS) capabilities as of March 2026. It compares voice quality across several providers and includes voice cloning experiments run via Replicate.

GitHub repository: danielrosehill/Hebrew-TTS-Providers

Point in time snapshot of TTS providers with Modern Hebrew support

Updated Mar 2026

Key Findings

MiniMax (voice cloning via Replicate) — Best results. Cloned voices sounded natural in Hebrew using T2A v2.6 Turbo with Hebrew boost
Edge TTS (Microsoft stock voices Avri, Hila) — Good quality, free, tested at 100% and 70% speed
Gemini (Puck, Zephyr via Google AI Studio) — Good quality via Gemini 2.5 Flash Preview TTS
ElevenLabs (v3 model) — Good, but requires language_code: "he". Multilingual v2 is unintelligible for Hebrew
Chatterbox (voice cloning via Replicate) — Poor. Cloning didn't carry through to Hebrew
Resemble AI — Poor. Needs nekudot (diacritics) for intelligible output
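
The ElevenLabs finding above depends on one request field, so a minimal sketch of a request that sets it may help. It assumes the public REST endpoint `https://api.elevenlabs.io/v1/text-to-speech/{voice_id}`, the `xi-api-key` header, and the model ID `eleven_v3`. The helper name `build_hebrew_request` is hypothetical, and the field names should be checked against the current API reference before use. The request is built but deliberately not sent.

```python
import json
import urllib.request

# Assumed base URL of the ElevenLabs text-to-speech endpoint.
API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_hebrew_request(text, voice_id, api_key, model_id="eleven_v3"):
    """Build (but do not send) a TTS request with the language set to Hebrew.

    Hypothetical helper for illustration; the payload field names follow
    the public API as understood here and are assumptions.
    """
    payload = {
        "text": text,
        "model_id": model_id,
        # Per the finding above, the language is set explicitly.
        "language_code": "he",
    }
    return urllib.request.Request(
        f"{API_BASE}/{voice_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. The response body is expected to be audio bytes, but that is not verified here.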

Voice Cloning Method

Voice clones were created on Replicate from approximately one minute of English source audio. Each clone was then used to generate Hebrew speech by setting the model's Hebrew language parameter. Because the source audio is English, this tests how well each provider's voice cloning transfers across languages.
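
The two-step method described above (clone from English audio, then synthesize Hebrew) could be sketched roughly as follows with the `replicate` Python client. Only `replicate.run(model, input=...)` is a known client call. The input field names (`voice_file`, `voice_id`, `language_boost`), the shape of the cloning output, and the helper names are assumptions made for illustration, and the model identifiers are left as parameters because the source does not list them. Each model's schema on Replicate should be consulted for the real fields.

```python
def build_tts_input(text, voice_id, language_boost="Hebrew"):
    """Assemble the input dict for a cloned-voice TTS call.

    Field names are assumptions; check the model's input schema.
    """
    return {
        "text": text,
        "voice_id": voice_id,
        "language_boost": language_boost,
    }


def clone_and_speak(clone_model, tts_model, sample_path, hebrew_text):
    """Clone a voice from an English sample, then request Hebrew speech.

    clone_model and tts_model are Replicate model identifiers supplied
    by the caller. Requires REPLICATE_API_TOKEN in the environment.
    """
    import replicate  # third-party: pip install replicate

    # Step 1: create the clone from roughly one minute of English audio.
    with open(sample_path, "rb") as sample:
        clone = replicate.run(clone_model, input={"voice_file": sample})

    # Assumed output shape: a mapping that carries the new voice ID.
    voice_id = clone["voice_id"]

    # Step 2: synthesize Hebrew text with the cloned voice.
    return replicate.run(tts_model, input=build_tts_input(hebrew_text, voice_id))
```

Keeping the input assembly in its own function makes it easy to swap the language setting or field names per provider without touching the cloning step.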

Provider Details

MiniMax T2A v2.6 Turbo — Used via Replicate with voice cloning and Hebrew language boost. Produced the most natural-sounding Hebrew output.
Microsoft Edge TTS — Stock Hebrew voices (Avri male, Hila female). Free to use via edge-tts Python package. Tested at both full and reduced speed.
Google Gemini TTS — Gemini 2.5 Flash Preview TTS accessed through Google AI Studio. Voices Puck and Zephyr tested.
ElevenLabs — v3 model works well with explicit language_code parameter. The older Multilingual v2 model fails completely for Hebrew.
Chatterbox — Voice cloning via Replicate. The cloned voice character did not transfer to Hebrew output.
Resemble AI — Requires fully diacritized (nekudot) Hebrew text input to produce intelligible speech, making it impractical for most use cases.
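
For the Edge TTS entry above, a minimal sketch using the `edge-tts` Python package might look like this. It assumes the voice short names `he-IL-AvriNeural` and `he-IL-HilaNeural`, and it assumes that "70% speed" corresponds to a relative rate of `-30%`, since `edge_tts.Communicate` takes its rate as a signed percentage string. The helper names `speed_to_rate` and `synthesize` are hypothetical.

```python
import asyncio

# Assumed short names for the two stock Hebrew voices.
HEBREW_VOICES = {"Avri": "he-IL-AvriNeural", "Hila": "he-IL-HilaNeural"}


def speed_to_rate(speed_percent):
    """Convert an absolute speed (100 = normal) to a signed relative rate."""
    return f"{speed_percent - 100:+d}%"


async def synthesize(text, voice, speed_percent, out_path):
    """Save Hebrew speech to out_path using one of the stock voices."""
    import edge_tts  # third-party: pip install edge-tts

    communicate = edge_tts.Communicate(
        text,
        HEBREW_VOICES[voice],
        rate=speed_to_rate(speed_percent),
    )
    await communicate.save(out_path)
```

A call such as `asyncio.run(synthesize("שלום עולם", "Hila", 70, "hila_70.mp3"))` would request the reduced-speed variant. It needs network access, so it is not run here.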

Resources

Phonikud TTS (Hugging Face)
ivrit.ai
Replicate TTS Collection