Real-time dictation, zero-shot voice cloning, and cinematic video dubbing — all on your desktop.
Open-source, no API keys, fully local. 646 languages.
<a href="#quickstart">Quickstart</a> ·
<a href="#features">Features</a> ·
<a href="#why-ovs">Why OVS</a> ·
<a href="#tts-engines">TTS Engines</a> ·
<a href="#asr-engines">ASR Engines</a> ·
<a href="#sponsor--donate">Donate</a> ·
<a href="#contributing">Contributing</a> ·
<a href="https://discord.gg/bzQavDfVV9">Discord</a> ·
<a href="https://github.com/debpalash/OmniVoice-Studio/raw/v0.3.8/README_CN.md"><strong>简体中文</strong></a>
<a href="https://github.com/debpalash/OmniVoice-Studio/stargazers"><img src="https://img.shields.io/github/stars/debpalash/OmniVoice-Studio?style=flat-square&color=f59e0b" alt="Stars" /></a>
<a href="https://github.com/debpalash/OmniVoice-Studio/releases/latest"><img src="https://img.shields.io/github/v/release/debpalash/OmniVoice-Studio?style=flat-square&color=10b981" alt="Release" /></a>
<a href="https://github.com/debpalash/OmniVoice-Studio/raw/v0.3.8/LICENSE"><img src="https://img.shields.io/badge/license-AGPL--3.0-blue?style=flat-square" alt="License" /></a>
<a href="https://github.com/debpalash/OmniVoice-Studio/issues"><img src="https://img.shields.io/github/issues/debpalash/OmniVoice-Studio?style=flat-square&color=ef4444" alt="Issues" /></a>
<a href="https://discord.gg/bzQavDfVV9"><img src="https://img.shields.io/badge/Discord-Join_Community-5865F2?style=flat-square&logo=discord&logoColor=white" alt="Discord" /></a>
<a href="https://ko-fi.com/debpalash"><img src="https://img.shields.io/badge/Ko--fi-Support_Us-FF5E5B?style=flat-square&logo=ko-fi&logoColor=white" alt="Ko-fi" /></a>
<a href="https://paypal.me/palashCoder"><img src="https://img.shields.io/badge/PayPal-Donate-00457C?style=flat-square&logo=paypal&logoColor=white" alt="PayPal" /></a>

> [!WARNING]
> OmniVoice Studio is in active beta. Things may break between releases. For the latest features and fixes, clone the repo and run from source rather than using pre-built installers. Bug reports and PRs are very welcome: open an issue or join Discord.
Get setup help · Share your dubs · Vote on the roadmap · Early access to new engines

| Feature | Description |
|---|---|
| 🎙️ Voice Cloning | 3-second clip → mirror any voice. 646 languages, zero-shot. |
| 🎨 Voice Design | Gender, age, accent, pitch, speed, emotion, dialect: dial it in. |
| 🎬 Video Dubbing | YouTube URL or file → transcribe → translate → re-voice → MP4. |
| 📖 Audiobook Editor | Import text, EPUB, or PDF. Auto-chapter, loudnorm, metadata. Export `.m4b`. |
| 🎭 Stories | Multi-voice editor. Assign voices per-line, preview, export full cast. |
| ⌨️ Dictation Widget | ⌘+⇧+Space from any app. Transcribes, auto-pastes, disappears. |
| 🔊 Vocal Isolation | Demucs-powered. Splits speech from music, keeps the background. |
| 👥 Speaker Diarization | Pyannote + WhisperX. Auto-identifies who said what. |
| 📦 Batch Queue | Drop 50 videos, walk away. Progress bars per job. |
| 🤖 MCP Server | Use OmniVoice from Claude, Cursor, or any MCP client. |
| 🛡️ AI Watermark | AudioSeal (Meta). Invisible, survives compression. |
| 🔬 Diagnostics | Self-check, error journal, scrubbed diagnostic bundle. |
| 🔐 100% Local | No keys, no cloud, no accounts. Your machine only. |
| ⚡ GPU Auto-Detect | CUDA · MPS · ROCm · CPU. ≤8 GB? Auto-offloads. |
| 🧩 Extensible | Subclass `TTSbackend`, add any engine in ~50 lines. |
| 🧭 Engine Routing | Preflight GPU check per engine. No silent CPU fallback. |
| 🎒 Portable Personas | Export voices as `.ovsvoice` bundles (identity + watermark). |
| ♾️ Unlimited TTS | Sentence-chunked generation. No length cap. Streaming via WS. |
| 🌐 Remote Backend | Point UI at a remote server. Tailscale-friendly. Bearer auth. |
| 🧠 Dictation + LLM | Local LLM cleanup of transcripts. Optional echo cancellation. |

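
The "Unlimited TTS" entry above mentions sentence-chunked generation. As a rough illustration of that idea (not OmniVoice's actual implementation), the sketch below splits text at sentence boundaries and packs sentences into chunks under a size budget. The function name `chunk_sentences` and the `max_chars` parameter are hypothetical, and the splitter only handles `.`, `!`, and `?` followed by whitespace.

```python
import re

# Split after ., !, or ? when followed by whitespace (a deliberately simple rule).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk_sentences(text, max_chars=200):
    """Pack whole sentences into chunks of at most max_chars where possible.

    A single sentence longer than max_chars is kept intact as its own chunk.
    """
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be synthesized on its own and the audio concatenated or streamed, which is why the total input length does not need a cap.
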
macOS: first launch needs a one-time approval — right-click → Open (or System Settings → Privacy & Security → "Open Anyway" on macOS 15). No Terminal needed. Why?
Per-OS install guides — pick yours and follow it end-to-end:
palashdeb/omnivoice-studio

Stuck? Run the built-in self-check first: Settings → About → "Run self-check" in the app, or `uv run python backend/main.py --diagnose` from a checkout (`--deep` also test-loads the active engine). Then see `docs/install/troubleshooting.md` for the top 10 install errors. The in-app error UI deep-links to those entries when something breaks at runtime, and Settings → About → "Save diagnostic bundle" packages scrubbed logs + the self-check report for bug reports.

For Hugging Face token setup, see docs/setup/huggingface-token.md. For diarization-specific gating, see docs/features/diarization.md. For download speed, the ⚡ fast-download (Xet) status, and restricted-network / mirror options, see docs/downloading-models.md.

| View | Description |
|---|---|
| Voice Clone | Drop a 3-second clip → mirror any voice. 646 languages, zero-shot. |
| Voice Design | Build new voices from scratch: gender, age, accent, pitch, style. |
| Video Dubbing | Upload or paste a YouTube URL. Transcribe, translate, re-voice, export. |
| Voice Gallery | Search YouTube, browse categories, download clips, build your library. |
| Settings → Models | 15 models. One-click install. Auto-detects your platform (CUDA / MPS / CPU). |
| Projects | Dub projects, voice profiles, generation history, and exports, all searchable. |
| Settings → Logs | Live backend, frontend, and Tauri runtime logs. Filter, refresh, clear. |

ElevenLabs charges $5–$330/mo and processes your audio on their servers. OmniVoice Studio runs on your hardware, with no usage limits.

| | ElevenLabs | OmniVoice Studio |
|---|---|---|
| Pricing | $5–$330/mo, per-character billing | Free & open-source (AGPL-3.0) · Commercial license for proprietary use |
| Voice Cloning | ✅ 3s clip | ✅ 3s clip, zero-shot |
| Voice Design | ✅ Gender, age | ✅ Gender, age, accent, pitch, style, dialect |
| Audiobook / Stories | ❌ | ✅ Full audiobook editor + multi-voice stories (EPUB/PDF import, .m4b export) |
| Languages | 32 | 646 |
| Video Dubbing | ✅ Cloud-only | ✅ Fully local |
| Data Privacy | Audio sent to cloud | Nothing leaves your machine |
| API Keys | Required | Not needed |
| GPU Support | N/A (cloud) | CUDA · Apple Silicon · ROCm · CPU |
| Desktop App | ❌ | ✅ macOS · Windows · Linux |
| TTS Engines | 1 | 11 (OmniVoice, CosyVoice 3, GPT-SoVITS, VoxCPM2, MOSS-TTS-Nano, KittenTTS, MLX-Audio, Sherpa-ONNX, IndexTTS 2, OmniVoice GGUF, Supertonic 3) |
| ASR Engines | 1 | 9 (WhisperX, Faster-Whisper, MLX Whisper, PyTorch Whisper, Parakeet, Moonshine, FunASR, isolated Faster-Whisper, sherpa-onnx live dictation) |
| MCP Server | ❌ | ✅ Use from Claude, Cursor, any MCP client |
| Self-check | ❌ | ✅ Diagnostics suite, error journal, scrubbed debug bundles |
| Customizable | ❌ Closed | ✅ Fork it, extend it, ship it |
OmniVoice Studio gives you professional-grade AI tools without the subscription or the cloud.
Convinced? Come build with us.

| | Minimum | Recommended |
|---|---|---|
| OS | Windows 10, macOS 12+, Ubuntu 20.04+ | Any modern 64-bit OS |
| RAM | 8 GB | 16 GB+ |
| VRAM (GPU) | 4 GB (auto-offloads TTS to CPU) | 8 GB+ (NVIDIA RTX 3060+) |
| Disk | 10 GB free (models + cache) | 20 GB+ SSD |
| Python | 3.10+ (managed by `uv`) | 3.11–3.12 |
| GPU | Optional — CPU works | NVIDIA CUDA · Apple Silicon MPS · AMD ROCm |

> [!TIP]
> On GPUs with ≤8 GB VRAM, OmniVoice automatically offloads TTS to CPU during transcription, with no config needed. A dedicated GPU is not required; the entire pipeline runs on CPU (just slower).

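
The auto-offload rule in the tip above can be pictured as a simple threshold check. The sketch below is an illustration only, not OmniVoice's actual code: `detect_device`, `plan_for`, `DevicePlan`, and `OFFLOAD_THRESHOLD_GB` are hypothetical names, and the 8 GB cutoff is taken from the tip. It assumes PyTorch for device probing and falls back to CPU when PyTorch is not installed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

OFFLOAD_THRESHOLD_GB = 8.0  # assumption: mirrors the "≤8 GB VRAM" rule in the tip


@dataclass(frozen=True)
class DevicePlan:
    device: str                 # "cuda", "mps", or "cpu"
    vram_gb: Optional[float]    # None when no dedicated VRAM figure is known
    offload_tts: bool           # True when TTS should move to CPU


def detect_device() -> Tuple[str, Optional[float]]:
    """Probe for an accelerator; returns (device, vram_gb)."""
    try:
        import torch
    except ImportError:
        return "cpu", None
    if torch.cuda.is_available():  # also reports True on ROCm builds of PyTorch
        total_bytes = torch.cuda.get_device_properties(0).total_memory
        return "cuda", total_bytes / 1024**3
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps", None  # unified memory, so no separate VRAM number
    return "cpu", None


def plan_for(device: str, vram_gb: Optional[float]) -> DevicePlan:
    """Decide whether TTS should be offloaded, given the detected device."""
    offload = (
        device == "cuda"
        and vram_gb is not None
        and vram_gb <= OFFLOAD_THRESHOLD_GB
    )
    return DevicePlan(device, vram_gb, offload)
```

For example, `plan_for(*detect_device())` yields a plan for the current machine, and `plan_for("cuda", 6.0)` marks TTS for offload while `plan_for("cuda", 12.0)` does not.
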
OmniVoice ships a multi-engine TTS backend. The default engine (OmniVoice) is always available; additional engines are opt-in and auto-detected. Switch engines in Settings → TTS Engine or via the OMNIVOICE_TTS_BACKEND env var.
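
To make the env-var override concrete, here is a minimal sketch of how a backend name might be resolved from `OMNIVOICE_TTS_BACKEND`. Only the variable name comes from this README; the function `resolve_tts_backend`, the engine IDs (`"omnivoice"`, `"cosyvoice"`), and the choice to raise on an unknown name instead of falling back silently are assumptions made for illustration.

```python
import os

DEFAULT_BACKEND = "omnivoice"  # hypothetical ID for the always-available default


def resolve_tts_backend(available, env=None):
    """Return the engine ID to use, honoring OMNIVOICE_TTS_BACKEND if set.

    `available` is the set of engine IDs detected on this machine (hypothetical).
    """
    env = os.environ if env is None else env
    requested = env.get("OMNIVOICE_TTS_BACKEND", "").strip().lower()
    if not requested:
        return DEFAULT_BACKEND
    if requested not in available:
        # Assumption: fail loudly rather than quietly picking another engine.
        raise ValueError(
            f"Unknown TTS backend {requested!r}; available: {sorted(available)}"
        )
    return requested
```

With `OMNIVOICE_TTS_BACKEND` unset, the sketch returns the default; with it set to an ID in `available`, it returns that ID in lowercase.
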
| Engine | Languages | Clone | Instruct | Linux | macOS ARM | Windows | License |
|---|---|---|---|---|---|---|---|
| OmniVoice (default) | 600+ | ✅ | ✅ | ✅ CUDA/CPU | ✅ MPS | ✅ CUDA/CPU | Built-in |
| CosyVoice 3 | 9 + 18 dialects | ✅ | ✅ | ✅ CUDA/CPU | ✅ MPS | ✅ CUDA/CPU | Apache-2.0 |
| GPT-SoVITS | 5 | ✅ | — | ✅ CUDA/CPU | — | ✅ CUDA/CPU | MIT |
| VoxCPM2 | 30 | ✅ | ✅ | ✅ CUDA/CPU | ✅ MPS | ✅ CUDA/CPU | Apache-2.0 |
| MOSS-TTS-Nano | 20 | ✅ | — | ✅ CUDA/CPU | ✅ CPU | ✅ CUDA/CPU | Apache-2.0 |
| KittenTTS | English | — | — | ✅ CPU | ✅ CPU | ✅ CPU | MIT |
| MLX-Audio (Kokoro, Qwen3-TTS, CSM, Dia, …) | Multi | Varies | Varies | ❌ | ✅ Native | ❌ | Varies |
| Sherpa-ONNX | 20+ | — | — | ✅ CUDA/CPU | ✅ CPU | ✅ CUDA/CPU | Apache-2.0 |
| IndexTTS 2 ⚡ | Multi | ✅ | — | ✅ CUDA | — |

```bash
claude mcp add OmniVoice-Studio \
  -- python -m otcore.mcp_server <graph>
```