neural-voice-models.diary — VidFlow
PRODUCT · ⏱ 6 min · APR 28, 2026 · By The VidFlow Team

Three voice providers, picked for trade-offs

Three TTS providers wired — ElevenLabs, Inworld, MiniMax. No custom-trained models. When to pick which.


There are three TTS providers in the VidFlow code: ElevenLabs (via the GenAIPro wrapper), Inworld, and MiniMax. The `VoiceProvider` enum has exactly those three values. There are no custom fine-tuned models — that claim was in a previous version of this page and it wasn't true. Here's what's actually there and how to pick.

ElevenLabs (via GenAIPro). Strongest for emotional range and expressive delivery. The GenAIPro client is a thin wrapper around ElevenLabs' API that adds our authentication and credit accounting. The voice catalog is fetched live from ElevenLabs, which is why the count changes, and you set per-voice settings (stability, similarity boost, style) in the voiceover panel.

Inworld. Newer in the lineup. We use the `inworld-tts-1.5-max` model. Its strength is real-time and streaming use cases; the trade-off is fewer voices and slightly less expressive prosody than ElevenLabs. A good fit for short-form where speed matters more than nuance.

MiniMax. The widest model selection of the three. Wired: `speech-2.6-hd`, `speech-2.6-turbo`, `speech-02-hd`, `speech-02-turbo`, `speech-01-hd`, `speech-01-turbo`. The `hd` variants are slower but cleaner; `turbo` runs at half the latency with a noticeable quality loss. Multilingual support is the standout: if you're shipping in more than English, MiniMax is the default.
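To make the provider and model lineup above concrete, here is a minimal sketch of how it might be laid out as data. The model IDs are the ones listed in this post; the member names, string values, and the `miniMaxModels` helper are assumptions for illustration, not the actual VidFlow source. A const object stands in for the `VoiceProvider` enum.

```typescript
// Stand-in for the VoiceProvider enum. Member names and values are assumed.
const VoiceProvider = {
  ELEVENLABS: "elevenlabs",
  INWORLD: "inworld",
  MINIMAX: "minimax",
} as const;
type VoiceProvider = (typeof VoiceProvider)[keyof typeof VoiceProvider];

// Model IDs as listed in this post. No ElevenLabs model is named here,
// so that entry is left empty rather than guessed.
const MODELS: Record<VoiceProvider, readonly string[]> = {
  [VoiceProvider.ELEVENLABS]: [],
  [VoiceProvider.INWORLD]: ["inworld-tts-1.5-max"],
  [VoiceProvider.MINIMAX]: [
    "speech-2.6-hd",
    "speech-2.6-turbo",
    "speech-02-hd",
    "speech-02-turbo",
    "speech-01-hd",
    "speech-01-turbo",
  ],
};

type MiniMaxTier = "hd" | "turbo";

// Hypothetical helper: returns the MiniMax models of one tier, in listed order.
function miniMaxModels(tier: MiniMaxTier): string[] {
  return MODELS[VoiceProvider.MINIMAX].filter((id) => id.endsWith(`-${tier}`));
}

console.log(miniMaxModels("hd"));
```

Keeping the tier in the model ID suffix means the `hd` versus `turbo` choice can be made without a second lookup table.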

How many voices, total. Honest answer: we don't curate a count. Each provider exposes its own voice catalog through its API; we fetch the catalog at runtime, so the number drifts as providers add and remove voices. The previous '50+ neural voices' headline was misleading because it implied a VidFlow-curated count. The real ceiling is whatever the three providers ship combined. That is usually well past 50 across all of them, but it's not our number to claim.
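A small sketch of why the total is a moving number: the count is just the sum of whatever each catalog contains at the moment it is read. This assumes each provider's catalog has already been fetched into an array; the `Voice` shape, the `countVoices` helper, and the placeholder voice IDs are all hypothetical.

```typescript
// Hypothetical minimal voice record; real provider payloads carry more fields.
interface Voice {
  id: string;
  name: string;
}

interface VoiceCount {
  perProvider: Record<string, number>;
  total: number;
}

// Tallies voices per provider and overall from already-fetched catalogs.
function countVoices(catalogs: Record<string, Voice[]>): VoiceCount {
  const perProvider: Record<string, number> = {};
  let total = 0;
  for (const provider of Object.keys(catalogs)) {
    const size = catalogs[provider].length;
    perProvider[provider] = size;
    total += size;
  }
  return { perProvider, total };
}

// Placeholder data standing in for live catalog responses.
const snapshot = countVoices({
  elevenlabs: [
    { id: "el-1", name: "Placeholder A" },
    { id: "el-2", name: "Placeholder B" },
  ],
  inworld: [{ id: "iw-1", name: "Placeholder C" }],
  minimax: [
    { id: "mm-1", name: "Placeholder D" },
    { id: "mm-2", name: "Placeholder E" },
    { id: "mm-3", name: "Placeholder F" },
  ],
});

console.log(snapshot.total); // 6 for this placeholder data
```

Run the same tally a week later against fresh catalogs and the total can differ, which is the reason no fixed headline count is quoted.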

Multi-voice. `VoiceMode.MULTI` lets you assign a different voice ID per character. Each `Character` row in the Visual Bible carries an optional `voiceId`; the voiceover stage uses it when generating that character's lines. The narrator stays consistent across the project (one voice for all narration), and named characters get their own voices.
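The voice lookup described above could be sketched roughly as follows. Only `VoiceMode.MULTI`, `Character`, and `voiceId` come from this post; the `SINGLE` member, the `ScriptLine` shape, the `resolveVoiceId` helper, and the fallback to the narrator voice when a character has no `voiceId` are assumptions for illustration.

```typescript
// Stand-in for the VoiceMode enum. Only MULTI is named in the post; SINGLE is assumed.
const VoiceMode = { SINGLE: "SINGLE", MULTI: "MULTI" } as const;
type VoiceMode = (typeof VoiceMode)[keyof typeof VoiceMode];

// A Character row with its optional voiceId, as described above.
interface Character {
  name: string;
  voiceId?: string;
}

// Hypothetical script line: speaker is null for narration.
interface ScriptLine {
  speaker: string | null;
  text: string;
}

// Picks the voice for one line. Narration always uses the narrator voice;
// in MULTI mode a named character uses its own voiceId when one is set.
function resolveVoiceId(
  line: ScriptLine,
  mode: VoiceMode,
  narratorVoiceId: string,
  characters: Character[],
): string {
  if (mode !== VoiceMode.MULTI || line.speaker === null) {
    return narratorVoiceId;
  }
  const character = characters.find((c) => c.name === line.speaker);
  // Assumed fallback: characters without a voiceId are read by the narrator.
  return character?.voiceId ?? narratorVoiceId;
}

const cast: Character[] = [
  { name: "MAYA", voiceId: "voice-maya" },
  { name: "GUARD" },
];

console.log(
  resolveVoiceId({ speaker: "MAYA", text: "Hold the door." }, VoiceMode.MULTI, "voice-narrator", cast),
);
```

With this shape, switching a project back to a single voice is a mode change only; the per-character `voiceId` values can stay in place.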

No custom fine-tunes. None. We don't train. The provider's catalog is the catalog. Some marketing variants of this product imply otherwise; the code doesn't back that claim, and we'd rather you know.

Voice design — not yet. MiniMax exposes a voice-design API (describe a voice in words, get a synthesized voice ID back). The client is wired in `src/services/minimax/`, but the UI for it isn't built. When it ships, you'll be able to design a custom voice from a prompt and use it across a project without uploading sample audio.

Picking guide, short version.
- English, emotional range, single narrator → ElevenLabs.
- Multilingual or fast iteration → MiniMax (an `hd` model, not `turbo`).
- Real-time / streaming use cases → Inworld.
- More than one character → `VoiceMode.MULTI`, with a different voice ID per character in the Visual Bible.
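The picking guide can be written down as a small decision function. This is a sketch under stated assumptions: the `ProjectNeeds` fields and the `pickVoiceSetup` helper are hypothetical, and the order in which the rules are checked (real-time first, then multilingual or fast iteration, then the ElevenLabs default) is a choice made for this example, since the guide itself does not rank them.

```typescript
type Provider = "ElevenLabs" | "MiniMax" | "Inworld";

// Hypothetical description of what a project needs from its voiceover.
interface ProjectNeeds {
  realtime: boolean;
  multilingual: boolean;
  fastIteration: boolean;
  characterCount: number;
}

interface VoiceSetup {
  provider: Provider;
  multiVoice: boolean;
}

// Applies the guide's rules. Precedence between rules is assumed, not documented.
function pickVoiceSetup(needs: ProjectNeeds): VoiceSetup {
  let provider: Provider = "ElevenLabs"; // English, emotional range, single narrator
  if (needs.realtime) {
    provider = "Inworld";
  } else if (needs.multilingual || needs.fastIteration) {
    provider = "MiniMax";
  }
  // More than one character means per-character voices, whichever provider is used.
  return { provider, multiVoice: needs.characterCount > 1 };
}

console.log(
  pickVoiceSetup({ realtime: false, multilingual: true, fastIteration: false, characterCount: 3 }),
);
```

Provider choice and multi-voice are kept as separate outputs because the guide treats them as independent decisions.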

One thing that doesn't matter as much as people think. Voice quality variance across providers is smaller than the variance from prosody settings within a single provider. The default stability / similarity values are not the right values for your specific narrator. Spend ten minutes tuning them before you generate a full chapter; you'll save a generation round.
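One way to spend those ten minutes is to audition a short sample line across a handful of settings before committing to a chapter. The sketch below only builds the list of combinations to try; it makes no API calls. The `VoiceSettings` shape, the `settingsGrid` helper, the specific values, and the assumption that both settings live on a 0 to 1 scale are illustrative rather than taken from the VidFlow code.

```typescript
// Hypothetical settings record for one audition pass.
interface VoiceSettings {
  stability: number;
  similarityBoost: number;
}

// Builds every combination of the given values.
// Assumes both settings use a 0 to 1 scale and rejects anything outside it.
function settingsGrid(stabilities: number[], similarities: number[]): VoiceSettings[] {
  for (const value of [...stabilities, ...similarities]) {
    if (value < 0 || value > 1) {
      throw new RangeError(`setting out of range: ${value}`);
    }
  }
  const grid: VoiceSettings[] = [];
  for (const stability of stabilities) {
    for (const similarityBoost of similarities) {
      grid.push({ stability, similarityBoost });
    }
  }
  return grid;
}

// Illustrative values only: three stability levels times two similarity levels.
const candidates = settingsGrid([0.3, 0.5, 0.7], [0.6, 0.8]);
console.log(candidates.length); // 6 combinations to audition
```

Six short auditions on one sentence cost far less than regenerating a full chapter after hearing that the defaults were wrong for the narrator.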
