
github.com/debpalash/OmniVoice-Studio @v0.3.8 sqlite

6,228 symbols · 21,366 edges · 834 files · 1,775 documented (29%)
README


OmniVoice Studio

The open-source ElevenLabs alternative.

Real-time dictation, zero-shot voice cloning, and cinematic video dubbing — all on your desktop.

Open-source, no API keys, fully local. 646 languages.

<a href="#quickstart">Quickstart</a> ·
<a href="#features">Features</a> ·
<a href="#why-ovs">Why OVS</a> ·
<a href="#tts-engines">TTS Engines</a> ·
<a href="#asr-engines">ASR Engines</a> ·
<a href="#sponsor--donate">Donate</a> ·
<a href="#contributing">Contributing</a> ·
<a href="https://discord.gg/bzQavDfVV9">Discord</a> ·
<a href="https://github.com/debpalash/OmniVoice-Studio/raw/v0.3.8/README_CN.md"><strong>简体中文</strong></a>







<a href="https://github.com/debpalash/OmniVoice-Studio/stargazers"><img src="https://img.shields.io/github/stars/debpalash/OmniVoice-Studio?style=flat-square&color=f59e0b" alt="Stars" /></a>
<a href="https://github.com/debpalash/OmniVoice-Studio/releases/latest"><img src="https://img.shields.io/github/v/release/debpalash/OmniVoice-Studio?style=flat-square&color=10b981" alt="Release" /></a>
<a href="https://github.com/debpalash/OmniVoice-Studio/raw/v0.3.8/LICENSE"><img src="https://img.shields.io/badge/license-AGPL--3.0-blue?style=flat-square" alt="License" /></a>
<a href="https://github.com/debpalash/OmniVoice-Studio/issues"><img src="https://img.shields.io/github/issues/debpalash/OmniVoice-Studio?style=flat-square&color=ef4444" alt="Issues" /></a>
<a href="https://discord.gg/bzQavDfVV9"><img src="https://img.shields.io/badge/Discord-Join_Community-5865F2?style=flat-square&logo=discord&logoColor=white" alt="Discord" /></a>
<a href="https://ko-fi.com/debpalash"><img src="https://img.shields.io/badge/Ko--fi-Support_Us-FF5E5B?style=flat-square&logo=ko-fi&logoColor=white" alt="Ko-fi" /></a>
<a href="https://paypal.me/palashCoder"><img src="https://img.shields.io/badge/PayPal-Donate-00457C?style=flat-square&logo=paypal&logoColor=white" alt="PayPal" /></a>


[!WARNING] OmniVoice Studio is in active beta. Things may break between releases. For the latest features and fixes, clone the repo and run from source rather than using pre-built installers. Bug reports and PRs are very welcome — open an issue or join Discord.

Join Discord

Get setup help · Share your dubs · Vote on the roadmap · Early access to new engines

Features

🎙️ Voice Cloning

3-second clip → mirror any voice. 646 languages, zero-shot.

🎨 Voice Design

Gender, age, accent, pitch, speed, emotion, dialect — dial it in.

🎬 Video Dubbing

YouTube URL or file → transcribe → translate → re-voice → MP4.

📖 Audiobook Editor

Import text, EPUB, or PDF. Auto-chapter, loudnorm, metadata. Export .m4b.

🎭 Stories

Multi-voice editor. Assign voices per-line, preview, export full cast.

⌨️ Dictation Widget

⌘+⇧+Space from any app. Transcribes, auto-pastes, disappears.

🔊 Vocal Isolation

Demucs-powered. Splits speech from music, keeps the background.

👥 Speaker Diarization

Pyannote + WhisperX. Auto-identifies who said what.

📦 Batch Queue

Drop 50 videos, walk away. Progress bars per job.

🤖 MCP Server

Use OmniVoice from Claude, Cursor, or any MCP client.

🛡️ AI Watermark

AudioSeal (Meta). Invisible, survives compression.

🔬 Diagnostics

Self-check, error journal, scrubbed diagnostic bundle.

🔐 100% Local

No keys, no cloud, no accounts. Your machine only.

⚡ GPU Auto-Detect

CUDA · MPS · ROCm · CPU. ≤8 GB? Auto-offloads.

🧩 Extensible

Subclass TTSbackend, add any engine in ~50 lines.

🧭 Engine Routing

Preflight GPU check per engine. No silent CPU fallback.

🎒 Portable Personas

Export voices as .ovsvoice bundles — identity + watermark.

♾️ Unlimited TTS

Sentence-chunked generation. No length cap. Streaming via WS.

🌐 Remote Backend

Point UI at a remote server. Tailscale-friendly. Bearer auth.

🧠 Dictation + LLM

Local LLM cleanup of transcripts. Optional echo cancellation.
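
The "Unlimited TTS" entry above describes sentence-chunked generation. As a rough illustration of that idea (not the project's actual chunker, whose code is not shown here), the sketch below splits text at sentence boundaries and packs sentences into chunks under a character budget. The function name `chunk_sentences` and the `max_chars` parameter are assumptions made for this example.

```python
import re


def chunk_sentences(text: str, max_chars: int = 200) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars characters.

    Hypothetical helper for illustration only. A single sentence longer
    than max_chars is kept whole and becomes its own oversized chunk.
    """
    # Split after '.', '!' or '?' when followed by whitespace, so the
    # punctuation stays attached to its sentence.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: flush the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in chunk_sentences("Hello world. How are you? Fine!", max_chars=20):
        print(chunk)
```

Because each chunk can be synthesized independently, a caller could stream audio chunk by chunk instead of waiting for the whole text, which is one plausible way to avoid a length cap.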

Quickstart

Downloads: macOS DMG · Windows MSI · Linux AppImage · Debian .deb

macOS: first launch needs a one-time approval. Right-click → Open (or System Settings → Privacy & Security → "Open Anyway" on macOS 15). No Terminal needed.

Per-OS install guides — pick yours and follow it end-to-end:

Stuck? Run the built-in self-check first: Settings → About → "Run self-check" in the app, or `uv run python backend/main.py --diagnose` from a checkout (`--deep` also test-loads the active engine). Then see `docs/install/troubleshooting.md` for the top 10 install errors. The in-app error UI deep-links to those entries when something breaks at runtime, and Settings → About → "Save diagnostic bundle" packages scrubbed logs and the self-check report for bug reports.

For Hugging Face token setup, see `docs/setup/huggingface-token.md`. For diarization-specific gating, see `docs/features/diarization.md`. For download speed, the ⚡ fast-download (Xet) status, and restricted-network / mirror options, see `docs/downloading-models.md`.

Screenshots

Voice Clone

Drop a 3-second clip → mirror any voice. 646 languages, zero-shot.

Voice Design

Build new voices from scratch: gender, age, accent, pitch, style.

Video Dubbing

Upload or paste a YouTube URL. Transcribe, translate, re-voice, export.

Voice Gallery

Search YouTube, browse categories, download clips, build your library.

Settings → Models

15 models. One-click install. Auto-detects your platform (CUDA / MPS / CPU).

Projects

Dub projects, voice profiles, generation history, exports, all searchable.

Settings → Logs

Live backend, frontend, and Tauri runtime logs. Filter, refresh, clear.

Why OVS?

ElevenLabs charges $5–$330/mo and processes your audio on their servers. OmniVoice Studio runs on your hardware, with no usage limits.

| | ElevenLabs | OmniVoice Studio |
|---|---|---|
| Pricing | $5–$330/mo, per-character billing | Free & open-source (AGPL-3.0) · Commercial license for proprietary use |
| Voice Cloning | ✅ 3s clip | ✅ 3s clip, zero-shot |
| Voice Design | ✅ Gender, age | ✅ Gender, age, accent, pitch, style, dialect |
| Audiobook / Stories | | ✅ Full audiobook editor + multi-voice stories (EPUB/PDF import, .m4b export) |
| Languages | 32 | 646 |
| Video Dubbing | ✅ Cloud-only | ✅ Fully local |
| Data Privacy | Audio sent to cloud | Nothing leaves your machine |
| API Keys | Required | Not needed |
| GPU Support | N/A (cloud) | CUDA · Apple Silicon · ROCm · CPU |
| Desktop App | | ✅ macOS · Windows · Linux |
| TTS Engines | 1 | 11 (OmniVoice, CosyVoice 3, GPT-SoVITS, VoxCPM2, MOSS-TTS-Nano, KittenTTS, MLX-Audio, Sherpa-ONNX, IndexTTS 2, OmniVoice GGUF, Supertonic 3) |
| ASR Engines | 1 | 9 (WhisperX, Faster-Whisper, MLX Whisper, PyTorch Whisper, Parakeet, Moonshine, FunASR, isolated Faster-Whisper, sherpa-onnx live dictation) |
| MCP Server | | ✅ Use from Claude, Cursor, any MCP client |
| Self-check | | ✅ Diagnostics suite, error journal, scrubbed debug bundles |
| Customizable | ❌ Closed | ✅ Fork it, extend it, ship it |

OmniVoice Studio gives you professional-grade AI tools without the subscription or the cloud.

Convinced? Come build with us.

Join Discord


System Requirements

| | Minimum | Recommended |
|---|---|---|
| OS | Windows 10, macOS 12+, Ubuntu 20.04+ | Any modern 64-bit OS |
| RAM | 8 GB | 16 GB+ |
| VRAM (GPU) | 4 GB (auto-offloads TTS to CPU) | 8 GB+ (NVIDIA RTX 3060+) |
| Disk | 10 GB free (models + cache) | 20 GB+ SSD |
| Python | 3.10+ (managed by uv) | 3.11–3.12 |
| GPU | Optional (CPU works) | NVIDIA CUDA · Apple Silicon MPS · AMD ROCm |

[!TIP] On GPUs with ≤8 GB VRAM, OmniVoice automatically offloads TTS to CPU during transcription — no config needed. A dedicated GPU is not required; the entire pipeline runs on CPU (just slower).
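
To make the offload rule in the tip concrete, here is a minimal sketch of the decision it describes. The 8 GB threshold comes from the tip; the function name `should_offload_tts`, its parameters, and the treatment of CPU-only machines are assumptions for illustration, since the project's actual detection logic is not shown here.

```python
from typing import Optional


def should_offload_tts(vram_gb: Optional[float], threshold_gb: float = 8.0) -> bool:
    """Decide whether TTS should move to CPU while transcription runs.

    Hypothetical helper: vram_gb is the detected GPU memory in GB, or
    None when no GPU is present.
    """
    if vram_gb is None:
        # No GPU at all: the whole pipeline already runs on CPU,
        # so there is nothing to offload.
        return False
    # At or below the threshold, free the GPU for transcription.
    return vram_gb <= threshold_gb


if __name__ == "__main__":
    for vram in (4.0, 8.0, 12.0, None):
        print(vram, should_offload_tts(vram))
```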

TTS Engines

OmniVoice ships a multi-engine TTS backend. The default engine (OmniVoice) is always available; additional engines are opt-in and auto-detected. Switch engines in Settings → TTS Engine or via the OMNIVOICE_TTS_BACKEND env var.
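
One way a multi-engine backend with an environment-variable override can be wired is sketched below. Only the `OMNIVOICE_TTS_BACKEND` variable name comes from the text above. The `TTSBackend` base class, its `synthesize` method, the `SilenceBackend` engine, the registry, and the `"silence"` key are all hypothetical stand-ins, not the project's real interface.

```python
import os
from abc import ABC, abstractmethod
from typing import Mapping, Optional


class TTSBackend(ABC):
    """Hypothetical engine interface (the real one may differ)."""

    name: str

    @abstractmethod
    def synthesize(self, text: str) -> bytes:
        """Return raw audio bytes for the given text."""


class SilenceBackend(TTSBackend):
    """Toy engine: emits 16-bit mono PCM silence, 160 samples per character."""

    name = "silence"

    def synthesize(self, text: str) -> bytes:
        return b"\x00\x00" * (160 * len(text))


# Hypothetical registry mapping backend keys to engine classes.
REGISTRY = {SilenceBackend.name: SilenceBackend}
DEFAULT_BACKEND = "silence"


def select_backend(env: Optional[Mapping[str, str]] = None) -> TTSBackend:
    """Instantiate the engine named by OMNIVOICE_TTS_BACKEND, else the default."""
    env = os.environ if env is None else env
    key = env.get("OMNIVOICE_TTS_BACKEND", DEFAULT_BACKEND)
    if key not in REGISTRY:
        # Fail loudly rather than silently falling back to another engine.
        raise ValueError(f"unknown TTS backend {key!r}; available: {sorted(REGISTRY)}")
    return REGISTRY[key]()


if __name__ == "__main__":
    backend = select_backend({"OMNIVOICE_TTS_BACKEND": "silence"})
    print(backend.name, len(backend.synthesize("hi")))
```

Under this design, adding an engine means writing one subclass and registering it under a key. Passing `env` explicitly keeps the selection logic testable without touching the real process environment.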

Engine Languages Clone Instruct Linux macOS ARM Windows License
OmniVoice (default) 600+ ✅ CUDA/CPU ✅ MPS ✅ CUDA/CPU Built-in
CosyVoice 3 9 + 18 dialects ✅ CUDA/CPU ✅ MPS ✅ CUDA/CPU Apache-2.0
GPT-SoVITS 5 ✅ CUDA/CPU ✅ CUDA/CPU MIT
VoxCPM2 30 ✅ CUDA/CPU ✅ MPS ✅ CUDA/CPU Apache-2.0
MOSS-TTS-Nano 20 ✅ CUDA/CPU ✅ CPU ✅ CUDA/CPU Apache-2.0
KittenTTS English ✅ CPU ✅ CPU ✅ CPU MIT
MLX-Audio (Kokoro, Qwen3-TTS, CSM, Dia, …) Multi Varies Varies ✅ Native Varies
Sherpa-ONNX 20+ ✅ CUDA/CPU ✅ CPU ✅ CUDA/CPU Apache-2.0
IndexTTS 2 Multi ✅ CUDA

Extension points

Exported contracts: how you extend this code

StoryTrack (Interface)
Long-form project state (#31), the ONE project concept both long-form editors bind to: Stories (multi-voice cast + tra
frontend/src/store/longformSlice.ts
DubPrepProgress (Interface)
Per-stage progress for the prep pipeline (download, demucs).
frontend/src/store/dubSlice.ts
DubFailure (Interface)
Structured pipeline failure (plan-04 #131): carries the specific cause, an actionable hint, an optional docs-topic
frontend/src/store/dubSlice.ts
FitOptions (Interface)
Knob overrides for the `smart_fit` strategy. `null` (default) sends no `fit_options` and the backend uses its canon
frontend/src/store/prefsSlice.ts
AudiobookMetadata (Interface)
Global tags embedded in the output file (player-visible).
frontend/src/api/audiobook.ts

Core symbols

Most depended-on inside this repo

get
called by 1177
backend/core/job_queue.py
get
called by 516
backend/api/routers/setup/models.py
push
called by 144
backend/services/sentence_chunker.py
filter
called by 140
backend/main.py
describe
called by 132
backend/services/llm_providers.py
write
called by 130
backend/utils/hf_progress.py
run
called by 128
tests/evals/runner.py
db_conn
called by 109
backend/core/db.py

Shape

Function 4,580
Method 940
Class 395
Route 225
Interface 88

Languages

Python 82%
TypeScript 18%

Modules by API surface

backend/services/asr_backend.py · 85 symbols
backend/services/tts_backend.py · 81 symbols
backend/api/routers/system.py · 68 symbols
backend/api/routers/dub_export.py · 62 symbols
tests/test_smart_fit_export.py · 60 symbols
tests/test_api.py · 55 symbols
backend/api/routers/settings.py · 54 symbols
backend/services/model_manager.py · 52 symbols
omnivoice/models/omnivoice.py · 43 symbols
tests/test_longform_render.py · 41 symbols
tests/test_pronunciation.py · 40 symbols
tests/test_segmentation.py · 39 symbols

Dependencies

From manifests, versioned

@fontsource-variable/source-serif-4 5.2.9 · 1×
@fontsource/ibm-plex-mono 5.2.7 · 1×
@playwright/test 1.61.0 · 1×
@radix-ui/react-dialog 1.1.17 · 1×
@radix-ui/react-dropdown-menu 2.1.18 · 1×
@radix-ui/react-progress 1.1.10 · 1×
@radix-ui/react-slider 1.4.1 · 1×
@radix-ui/react-slot 1.3.0 · 1×
@radix-ui/react-toggle 1.1.12 · 1×

For agents

$ claude mcp add OmniVoice-Studio \
  -- python -m otcore.mcp_server <graph>
