
github.com/debpalash/OmniVoice-Studio @v0.3.8 sqlite


6,228 symbols · 21,366 edges · 834 files · 1,775 documented (29%)

README

OmniVoice Studio

The open-source ElevenLabs alternative.

Real-time dictation, zero-shot voice cloning, and cinematic video dubbing — all on your desktop.

Open-source, no API keys, fully local. 646 languages.

Quickstart · Features · Why OVS · TTS Engines · ASR Engines · Donate · Contributing · Discord: https://discord.gg/bzQavDfVV9 · 简体中文 (Simplified Chinese README): https://github.com/debpalash/OmniVoice-Studio/raw/v0.3.8/README_CN.md








[!WARNING] OmniVoice Studio is in active beta. Things may break between releases. For the latest features and fixes, clone the repo and run from source rather than using pre-built installers. Bug reports and PRs are very welcome — open an issue or join Discord.

Get setup help · Share your dubs · Vote on the roadmap · Early access to new engines

Features

🎙️ Voice Cloning: 3-second clip → mirror any voice. 646 languages, zero-shot.
🎨 Voice Design: Gender, age, accent, pitch, speed, emotion, dialect. Dial it in.
🎬 Video Dubbing: YouTube URL or file → transcribe → translate → re-voice → MP4.
📖 Audiobook Editor: Import text, EPUB, or PDF. Auto-chapter, loudnorm, metadata. Export .m4b.
🎭 Stories: Multi-voice editor. Assign voices per-line, preview, export full cast.
⌨️ Dictation Widget: `⌘+⇧+Space` from any app. Transcribes, auto-pastes, disappears.
🔊 Vocal Isolation: Demucs-powered. Splits speech from music, keeps the background.
👥 Speaker Diarization: Pyannote + WhisperX. Auto-identifies who said what.
📦 Batch Queue: Drop 50 videos, walk away. Progress bars per job.
🤖 MCP Server: Use OmniVoice from Claude, Cursor, or any MCP client.
🛡️ AI Watermark: AudioSeal (Meta). Invisible, survives compression.
🔬 Diagnostics: Self-check, error journal, scrubbed diagnostic bundle.
🔐 100% Local: No keys, no cloud, no accounts. Your machine only.
⚡ GPU Auto-Detect: CUDA · MPS · ROCm · CPU. ≤8 GB? Auto-offloads.
🧩 Extensible: Subclass `TTSbackend`, add any engine in ~50 lines.
🧭 Engine Routing: Preflight GPU check per engine. No silent CPU fallback.
🎒 Portable Personas: Export voices as `.ovsvoice` bundles (identity + watermark).
♾️ Unlimited TTS: Sentence-chunked generation. No length cap. Streaming via WS.
🌐 Remote Backend: Point UI at a remote server. Tailscale-friendly. Bearer auth.
🧠 Dictation + LLM: Local LLM cleanup of transcripts. Optional echo cancellation.
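
The "Unlimited TTS" entry above mentions sentence-chunked generation. As a rough illustration of that idea only, here is a minimal sketch that splits text into sentences and packs whole sentences into chunks under a size budget. The function names, the punctuation rule, and the `max_chars` default are all assumptions made for this example; the project's own chunker (the index lists `backend/services/sentence_chunker.py`) may work quite differently.

```python
import re
from typing import Iterator, List

# Assumption: sentences end at ., ! or ? followed by whitespace.
# Real-world chunkers usually handle abbreviations, quotes, and CJK punctuation too.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split text at whitespace that follows sentence-ending punctuation."""
    return [part.strip() for part in _SENTENCE_BOUNDARY.split(text) if part.strip()]


def chunk_text(text: str, max_chars: int = 200) -> Iterator[str]:
    """Yield chunks made of whole sentences, each at most max_chars long.

    A single sentence longer than max_chars is yielded on its own,
    so no sentence is ever cut in the middle.
    """
    current = ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            yield current          # budget exceeded: emit what we have
            current = sentence     # start a new chunk with this sentence
        else:
            current = candidate
    if current:
        yield current


if __name__ == "__main__":
    for chunk in chunk_text("Hello world. How are you? Fine!", max_chars=30):
        print(repr(chunk))
```

Because each chunk is yielded as soon as it is complete, a caller could hand chunks to a synthesizer one at a time, which is one plausible way to avoid a length cap and to stream audio as it is produced.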

Quickstart

macOS: first launch needs a one-time approval. Right-click → Open (or System Settings → Privacy & Security → "Open Anyway" on macOS 15). No Terminal needed.

Per-OS install guides — pick yours and follow it end-to-end:

macOS — docs/install/macos.md
Windows — docs/install/windows.md
Linux — docs/install/linux.md
Docker — docs/install/docker.md · Docker Hub: palashdeb/omnivoice-studio

Stuck? Run the built-in self-check first: Settings → About → "Run self-check" in the app, or `uv run python backend/main.py --diagnose` from a checkout (`--deep` also test-loads the active engine). Then see docs/install/troubleshooting.md for the top 10 install errors. The in-app error UI deeplinks to those entries when something breaks at runtime, and Settings → About → "Save diagnostic bundle" packages scrubbed logs + the self-check report for bug reports.

For Hugging Face token setup, see docs/setup/huggingface-token.md. For diarization-specific gating, see docs/features/diarization.md. For download speed, the ⚡ fast-download (Xet) status, and restricted-network / mirror options, see docs/downloading-models.md.

Screenshots

Voice Clone: Drop a 3-second clip → mirror any voice. 646 languages, zero-shot.
Voice Design: Build new voices from scratch (gender, age, accent, pitch, style).
Video Dubbing: Upload or paste a YouTube URL. Transcribe, translate, re-voice, export.
Voice Gallery: Search YouTube, browse categories, download clips, build your library.
Settings → Models: 15 models. One-click install. Auto-detects your platform (CUDA / MPS / CPU).
Projects: Dub projects, voice profiles, generation history, exports, all searchable.
Settings → Logs: Live backend, frontend, and Tauri runtime logs. Filter, refresh, clear.

Why OVS?

ElevenLabs charges $5–$330/mo and processes your audio on their servers. OmniVoice Studio runs on your hardware, with no usage limits.

Feature	ElevenLabs	OmniVoice Studio
Pricing	$5–$330/mo, per-character billing	Free & open-source (AGPL-3.0) · Commercial license for proprietary use
Voice Cloning	✅ 3s clip	✅ 3s clip, zero-shot
Voice Design	✅ Gender, age	✅ Gender, age, accent, pitch, style, dialect
Audiobook / Stories	❌	✅ Full audiobook editor + multi-voice stories (EPUB/PDF import, .m4b export)
Languages	32	646
Video Dubbing	✅ Cloud-only	✅ Fully local
Data Privacy	Audio sent to cloud	Nothing leaves your machine
API Keys	Required	Not needed
GPU Support	N/A (cloud)	CUDA · Apple Silicon · ROCm · CPU
Desktop App	❌	✅ macOS · Windows · Linux
TTS Engines	1	11 (OmniVoice, CosyVoice 3, GPT-SoVITS, VoxCPM2, MOSS-TTS-Nano, KittenTTS, MLX-Audio, Sherpa-ONNX, IndexTTS 2, OmniVoice GGUF, Supertonic 3)
ASR Engines	1	9 (WhisperX, Faster-Whisper, MLX Whisper, PyTorch Whisper, Parakeet, Moonshine, FunASR, isolated Faster-Whisper, sherpa-onnx live dictation)
MCP Server	❌	✅ Use from Claude, Cursor, any MCP client
Self-check	❌	✅ Diagnostics suite, error journal, scrubbed debug bundles
Customizable	❌ Closed	✅ Fork it, extend it, ship it

OmniVoice Studio gives you professional-grade AI tools without the subscription or the cloud.

Convinced? Come build with us.

System Requirements

Component	Minimum	Recommended
OS	Windows 10, macOS 12+, Ubuntu 20.04+	Any modern 64-bit OS
RAM	8 GB	16 GB+
VRAM (GPU)	4 GB (auto-offloads TTS to CPU)	8 GB+ (NVIDIA RTX 3060+)
Disk	10 GB free (models + cache)	20 GB+ SSD
Python	3.10+ (managed by `uv`)	3.11–3.12
GPU	Optional — CPU works	NVIDIA CUDA · Apple Silicon MPS · AMD ROCm

[!TIP] On GPUs with ≤8 GB VRAM, OmniVoice automatically offloads TTS to CPU during transcription — no config needed. A dedicated GPU is not required; the entire pipeline runs on CPU (just slower).
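
The offload rule in the tip above can be read as a small decision function. The sketch below is only an illustration under stated assumptions: the function name, its parameters, and the `"gpu"`/`"cpu"` return values are invented for this example, and the comment about freeing VRAM for the transcription model is an inference rather than something the README states. Only the 8 GB threshold and the "during transcription" condition come from the text.

```python
from typing import Optional

# From the tip above: GPUs with <= 8 GB VRAM trigger the offload.
LOW_VRAM_THRESHOLD_GB = 8.0


def tts_device(vram_gb: Optional[float], transcribing: bool) -> str:
    """Return "gpu" or "cpu" as the place to run TTS (hypothetical helper).

    vram_gb is None when no GPU was detected.
    """
    if vram_gb is None:
        # No dedicated GPU: the whole pipeline runs on CPU, just slower.
        return "cpu"
    if transcribing and vram_gb <= LOW_VRAM_THRESHOLD_GB:
        # Assumed rationale: leave the limited VRAM to the transcription model.
        return "cpu"
    return "gpu"


if __name__ == "__main__":
    print(tts_device(6.0, transcribing=True))    # small GPU, ASR running
    print(tts_device(6.0, transcribing=False))   # small GPU, ASR idle
    print(tts_device(12.0, transcribing=True))   # large GPU
```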

TTS Engines

OmniVoice ships a multi-engine TTS backend. The default engine (OmniVoice) is always available; additional engines are opt-in and auto-detected. Switch engines in Settings → TTS Engine or via the `OMNIVOICE_TTS_BACKEND` environment variable.

Engine	Languages	Clone	Instruct	Linux	macOS ARM	Windows	License
OmniVoice (default)	600+	✅	✅	✅ CUDA/CPU	✅ MPS	✅ CUDA/CPU	Built-in
CosyVoice 3	9 + 18 dialects	✅	✅	✅ CUDA/CPU	✅ MPS	✅ CUDA/CPU	Apache-2.0
GPT-SoVITS	5	✅	—	✅ CUDA/CPU	—	✅ CUDA/CPU	MIT
VoxCPM2	30	✅	✅	✅ CUDA/CPU	✅ MPS	✅ CUDA/CPU	Apache-2.0
MOSS-TTS-Nano	20	✅	—	✅ CUDA/CPU	✅ CPU	✅ CUDA/CPU	Apache-2.0
KittenTTS	English	—	—	✅ CPU	✅ CPU	✅ CPU	MIT
MLX-Audio (Kokoro, Qwen3-TTS, CSM, Dia, …)	Multi	Varies	Varies	❌	✅ Native	❌	Varies
Sherpa-ONNX	20+	—	—	✅ CUDA/CPU	✅ CPU	✅ CUDA/CPU	Apache-2.0
IndexTTS 2 ⚡	Multi	✅	—	✅ CUDA	—
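
To make the multi-engine idea above more concrete, here is a minimal sketch of an engine registry with an availability check and selection through `OMNIVOICE_TTS_BACKEND`. Everything except that variable name is an assumption for illustration: the class `TTSBackendSketch`, the `register` decorator, the engine ids `"omnivoice"` and `"optional-demo"`, and the choice to raise an error for an unavailable engine are hypothetical and are not taken from the project's `backend/services/tts_backend.py`.

```python
import os
from abc import ABC, abstractmethod
from typing import Dict, Mapping, Type

ENV_VAR = "OMNIVOICE_TTS_BACKEND"  # named in the README text
DEFAULT_ENGINE = "omnivoice"       # assumed id for the default engine


class TTSBackendSketch(ABC):
    """Hypothetical base class; not the project's real interface."""

    name: str = ""

    @classmethod
    def is_available(cls) -> bool:
        """Opt-in engines would probe for their optional dependencies here."""
        return True

    @abstractmethod
    def synthesize(self, text: str) -> bytes:
        """Return audio bytes for the given text."""


REGISTRY: Dict[str, Type[TTSBackendSketch]] = {}


def register(cls: Type[TTSBackendSketch]) -> Type[TTSBackendSketch]:
    """Class decorator that stores an engine in the registry under its name."""
    REGISTRY[cls.name] = cls
    return cls


@register
class DefaultEngine(TTSBackendSketch):
    name = DEFAULT_ENGINE

    def synthesize(self, text: str) -> bytes:
        return b""  # placeholder: a real engine would return audio


@register
class MissingOptionalEngine(TTSBackendSketch):
    name = "optional-demo"  # made-up engine id

    @classmethod
    def is_available(cls) -> bool:
        return False  # simulates an opt-in engine that is not installed

    def synthesize(self, text: str) -> bytes:
        return b""


def select_backend(env: Mapping[str, str] = os.environ) -> TTSBackendSketch:
    """Instantiate the engine named by the environment, else the default."""
    requested = env.get(ENV_VAR, DEFAULT_ENGINE)
    engine_cls = REGISTRY.get(requested)
    if engine_cls is None:
        raise ValueError(f"unknown TTS engine {requested!r}")
    if not engine_cls.is_available():
        # Assumed policy: fail loudly instead of silently switching engines.
        raise RuntimeError(f"TTS engine {requested!r} is not installed")
    return engine_cls()
```

Passing the environment as a mapping keeps the selection logic easy to test, for example `select_backend({"OMNIVOICE_TTS_BACKEND": "omnivoice"})`, without touching the real process environment.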

Extension points

Exported contracts: how you extend this code.

StoryTrack (Interface)

Long-form project state (#31). ONE project concept both long-form editors bind to: Stories (multi-voice cast + tra…

frontend/src/store/longformSlice.ts

DubPrepProgress (Interface)

Per-stage progress for the prep pipeline (download, demucs).

frontend/src/store/dubSlice.ts

DubFailure (Interface)

Structured pipeline failure (plan-04 #131): carries the specific cause, an actionable hint, an optional docs-topic…

frontend/src/store/dubSlice.ts

FitOptions (Interface)

Knob overrides for the `smart_fit` strategy. `null` (default) sends no `fit_options` and the backend uses its canon…

frontend/src/store/prefsSlice.ts

AudiobookMetadata (Interface)

Global tags embedded in the output file (player-visible).

frontend/src/api/audiobook.ts

Core symbols

Most depended-on inside this repo.

Symbol	Called by	File
get	1177	backend/core/job_queue.py
get	516	backend/api/routers/setup/models.py
push	144	backend/services/sentence_chunker.py, backend/services/llm_providers.py
write	130	backend/utils/hf_progress.py
run	128	tests/evals/runner.py
db_conn	109	backend/core/db.py

Shape

Function 4,580

Method 940

Class 395

Route 225

Interface 88

Languages

Python 82%

TypeScript 18%

Modules by API surface

backend/services/asr_backend.py: 85 symbols

backend/services/tts_backend.py: 81 symbols

backend/api/routers/system.py: 68 symbols

backend/api/routers/dub_export.py: 62 symbols

tests/test_smart_fit_export.py: 60 symbols

tests/test_api.py: 55 symbols

backend/api/routers/settings.py: 54 symbols

backend/services/model_manager.py: 52 symbols

omnivoice/models/omnivoice.py: 43 symbols

tests/test_longform_render.py: 41 symbols

tests/test_pronunciation.py: 40 symbols

tests/test_segmentation.py: 39 symbols

Dependencies

From manifests, versioned.

@fontsource-variable/inter 5.2.8 · 1×

@fontsource-variable/source-serif-4 5.2.9 · 1×

@fontsource/ibm-plex-mono 5.2.7 · 1×

@playwright/test 1.61.0 · 1×

@radix-ui/react-dialog 1.1.17 · 1×

@radix-ui/react-dropdown-menu 2.1.18 · 1×

@radix-ui/react-progress 1.1.10 · 1×

@radix-ui/react-select 2.3.1 · 1×

@radix-ui/react-slider 1.4.1 · 1×

@radix-ui/react-slot 1.3.0 · 1×

@radix-ui/react-tabs 1.1.15 · 1×

@radix-ui/react-toggle 1.1.12 · 1×

For agents

$ claude mcp add OmniVoice-Studio \
  -- python -m otcore.mcp_server <graph>
