Daniel Rosehill

Hebrew Text-to-Speech: A Provider Comparison

This is a snapshot of Hebrew text-to-speech capabilities as of March 2025. It compares voice quality across six providers and includes voice cloning experiments run through Replicate.

GitHub repository: danielrosehill/Hebrew-TTS-Providers

Point in time snapshot of TTS providers with Modern Hebrew support

Updated Mar 2026

Key Findings

  • MiniMax (voice cloning via Replicate): Best results. Cloned voices sounded natural in Hebrew using T2A v2.6 Turbo with the Hebrew language boost.

  • Edge TTS (Microsoft stock voices Avri and Hila): Good quality and free. Tested at 100% and 70% speed.

  • Gemini (Puck and Zephyr voices via Google AI Studio): Good quality using Gemini 2.5 Flash Preview TTS.

  • ElevenLabs (v3 model): Good, but requires the language_code parameter set to "he". Output from the older Multilingual v2 model is unintelligible in Hebrew.

  • Chatterbox (voice cloning via Replicate): Poor. The cloned voice did not carry through to Hebrew.

  • Resemble AI: Poor. Needs nekudot (diacritics) for intelligible output.
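The two Edge TTS speeds in the list above can be reproduced with the edge-tts Python package. This is a minimal sketch, not the exact script used for the comparison. It assumes the Avri and Hila voices correspond to the voice IDs he-IL-AvriNeural and he-IL-HilaNeural, and that "70% speed" means a rate offset of -30%. The helper names, sample text, and output file name are illustrative.

```python
import asyncio

# Assumed mapping from the voice names in this post to Edge TTS voice IDs.
HEBREW_VOICES = {"Avri": "he-IL-AvriNeural", "Hila": "he-IL-HilaNeural"}


def rate_for_speed(speed_percent: int) -> str:
    """Convert an absolute speed (100 = normal) to a signed offset such as '-30%'."""
    return f"{speed_percent - 100:+d}%"


async def synthesize(text: str, voice: str, speed_percent: int, out_path: str) -> None:
    """Generate speech with one stock Hebrew voice and save it as an audio file."""
    # Imported lazily so rate_for_speed() works without the package installed.
    import edge_tts  # pip install edge-tts

    communicate = edge_tts.Communicate(
        text, HEBREW_VOICES[voice], rate=rate_for_speed(speed_percent)
    )
    await communicate.save(out_path)


def main() -> None:
    # Illustrative run: Hila at 70% speed (requires network access).
    asyncio.run(synthesize("שלום, מה שלומך?", "Hila", 70, "hila_70.mp3"))
```

Calling main() writes a single MP3. Looping over both voices and both speeds would produce the four samples needed for a side-by-side listen.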

Voice Cloning Method

Voice clones were created from approximately one minute of English source audio on Replicate, then used with the Hebrew language parameter to generate Hebrew speech. This tests how well each provider's voice cloning transfers across languages.
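The two-step flow described above (clone from English audio, then synthesize Hebrew) might look roughly like the sketch below when driven through the Replicate Python client. Only replicate.run(model, input=...) is a known client call here. The model slugs are left as arguments, and every input and output key (voice_file, voice_id, text, language_boost) is an assumption that should be checked against the schema of the specific models on Replicate.

```python
def build_clone_input(sample_url: str) -> dict:
    """Input for the cloning step. 'voice_file' is an assumed key name."""
    return {"voice_file": sample_url}


def build_speech_input(text: str, voice_id: str, language: str = "Hebrew") -> dict:
    """Input for the synthesis step. All three key names are assumptions."""
    return {"text": text, "voice_id": voice_id, "language_boost": language}


def clone_then_speak(clone_model: str, tts_model: str, sample_url: str, text: str):
    """Clone a voice from a short sample, then generate Hebrew speech with it."""
    # Imported lazily; needs `pip install replicate` and REPLICATE_API_TOKEN set.
    import replicate

    cloned = replicate.run(clone_model, input=build_clone_input(sample_url))
    # Assumed output shape: a mapping that contains the new voice identifier.
    voice_id = cloned["voice_id"]
    return replicate.run(tts_model, input=build_speech_input(text, voice_id))
```

Keeping the input builders separate from the network call makes it easy to swap in a different provider's parameter names while running the same cross-language test.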

Provider Details

  • MiniMax T2A v2.6 Turbo: Used via Replicate with voice cloning and the Hebrew language boost. Produced the most natural-sounding Hebrew output.

  • Microsoft Edge TTS: Stock Hebrew voices (Avri, male, and Hila, female). Free to use via the edge-tts Python package. Tested at both full and reduced speed.

  • Google Gemini TTS: Gemini 2.5 Flash Preview TTS, accessed through Google AI Studio. The Puck and Zephyr voices were tested.

  • ElevenLabs: The v3 model works well when given an explicit language_code parameter. The older Multilingual v2 model fails completely for Hebrew.

  • Chatterbox: Voice cloning via Replicate. The character of the cloned voice did not transfer to the Hebrew output.

  • Resemble AI: Requires fully diacritized Hebrew text (with nekudot) to produce intelligible speech, which makes it impractical for most use cases.
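For the ElevenLabs entry above, the key detail is passing the language explicitly. The sketch below builds such a request against the ElevenLabs text-to-speech REST endpoint using only the standard library. The model ID "eleven_v3" is my assumption for the v3 model, and the voice ID and API key are placeholders. Treat it as an illustration of where language_code goes, not as the code used for this comparison.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_request(
    text: str,
    voice_id: str,
    api_key: str,
    model_id: str = "eleven_v3",  # assumed ID for the v3 model
    language_code: str = "he",  # explicit Hebrew, as the findings require
) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request with an explicit language."""
    payload = {"text": text, "model_id": model_id, "language_code": language_code}
    return urllib.request.Request(
        f"{API_BASE}/{voice_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize(request: urllib.request.Request, out_path: str) -> None:
    """Send the request and write the returned audio bytes to a file."""
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

Separating request construction from sending makes it simple to confirm that language_code is actually in the payload before spending API credits.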

Resources

Phonikud TTS (Hugging Face) | ivrit.ai | Replicate TTS Collection

Daniel Rosehill

AI developer and technologist specializing in AI systems, workflow orchestration, and automation. Specific interests include agentic AI, workflows, MCP, STT and ASR, and multimodal AI.