
github.com/abus-aikorea/voice-pro @v3.2.0 sqlite


2,038 symbols · 6,610 edges · 235 files · 344 documented (17%)

README

Voice-Pro

The best AI speech recognition, translation, and multilingual dubbing solution 🚀

[Screenshot: Dubbing Studio]

🎙️ An AI-powered web application for speech recognition, translation, and dubbing

한국어 (Korean) ∙ English ∙ 中文简体 (Simplified Chinese) ∙ 中文繁體 (Traditional Chinese) ∙ 日本語 (Japanese) ∙ Deutsch (German) ∙ Español (Spanish) ∙ Português (Portuguese)

Voice-Pro is a state-of-the-art web app for multimedia content creation. It integrates YouTube video downloading, voice separation, speech recognition, translation, and text-to-speech into a single tool for creators, researchers, and multilingual professionals.

- 🔊 Top-tier speech recognition: Whisper, Faster-Whisper, Whisper-Timestamped, WhisperX
- 🎤 Zero-shot voice cloning: F5-TTS, E2-TTS, CosyVoice
- 📢 Multilingual text-to-speech: Edge-TTS, kokoro (the paid version also includes Azure TTS)
- 🎥 YouTube processing and audio extraction: yt-dlp
- 🌍 Instant translation for 100+ languages: Deep-Translator (the paid version also includes Azure Translator)

A robust alternative to ElevenLabs, Voice-Pro empowers podcasters, developers, and creators with advanced voice solutions.

⚠️ Please Note

Because we are focused on WeConnect development, Voice-Pro will not be developed or updated for the time being.
We have made all Voice-Pro code open source and completely free. Voice-Pro can now be freely distributed and modified by anyone.
It works well on Windows with an NVIDIA GPU. Operation on macOS and Linux has not been verified.
Please leave your requests on the or pages.
Troubleshooting: In most cases, issues can be resolved by deleting the installer_files folder and then running configure.bat followed by start.bat.
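The troubleshooting steps above can be scripted. This is a minimal sketch that assumes it runs on Windows and that `repo_dir` is the repository root containing `installer_files`, `configure.bat`, and `start.bat`; the `reset_install` helper and its `run_scripts` switch are hypothetical and not part of Voice-Pro.

```python
import shutil
import subprocess
from pathlib import Path


def reset_install(repo_dir: Path, run_scripts: bool = True) -> list[str]:
    """Hypothetical helper: delete installer_files, then run configure.bat and start.bat.

    Returns the scripts that were run (or would be run), in order.
    """
    installer_dir = repo_dir / "installer_files"
    if installer_dir.exists():
        # Remove the cached environment so configure.bat rebuilds it from scratch.
        shutil.rmtree(installer_dir)

    scripts = ["configure.bat", "start.bat"]
    if run_scripts:
        for script in scripts:
            # cmd /c runs the batch file from repo_dir; check=True stops the
            # sequence (raises CalledProcessError) if configure.bat fails.
            subprocess.run(["cmd", "/c", script], cwd=repo_dir, check=True)
    return scripts
```

For example, `reset_install(Path(r"C:\voice-pro"))`. Since `start.bat` launches the web UI, the call blocks until the UI process exits.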

📰 News & History

version 3.2

We have been focusing on WeConnect development for the past few months and have not been able to manage Voice-Pro at all.
We have decided to open source all Voice-Pro code.
Voice-Pro is completely free and supports Windows, Mac, and Linux.
WeConnect is an application for global cultural exchange.
Connect with people from all over the world for meaningful cultural exchanges, language learning, and international friendships.

version 3.1

🪄 Support for fine-tuned models of F5-TTS
🌍 Supported languages
English & Chinese: SWivid/F5-TTS_v1
Finnish: AsmoKoskinen/F5-TTS_Finnish_Model
French: RASPIAUDIO/F5-French-MixedSpeakers-reduced
Hindi: SPRINGLab/F5-Hindi-24KHz
Italian: alien79/F5-TTS-italian
Japanese: Jmica/F5TTS/JA_21999120
Russian: hotstone228/F5-TTS-Russian
Spanish: jpgallegoar/F5-Spanish
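One way to select among these checkpoints is a language-keyed lookup. In this sketch the model identifiers are copied verbatim from the list above, while the ISO 639-1 keys, the `pick_f5_model` helper, and the fallback to the base English/Chinese model are assumptions, not Voice-Pro's actual implementation.

```python
# Hypothetical lookup table: language code -> fine-tuned F5-TTS checkpoint (IDs from the v3.1 notes).
F5_TTS_MODELS = {
    "en": "SWivid/F5-TTS_v1",
    "zh": "SWivid/F5-TTS_v1",
    "fi": "AsmoKoskinen/F5-TTS_Finnish_Model",
    "fr": "RASPIAUDIO/F5-French-MixedSpeakers-reduced",
    "hi": "SPRINGLab/F5-Hindi-24KHz",
    "it": "alien79/F5-TTS-italian",
    "ja": "Jmica/F5TTS/JA_21999120",
    "ru": "hotstone228/F5-TTS-Russian",
    "es": "jpgallegoar/F5-Spanish",
}


def pick_f5_model(lang_code: str, default: str = "SWivid/F5-TTS_v1") -> str:
    """Return the checkpoint for a language, falling back to the base model."""
    # Normalize codes such as "ja-JP", "pt_BR", or "FR" to a lowercase primary subtag.
    primary = lang_code.replace("_", "-").split("-")[0].lower()
    return F5_TTS_MODELS.get(primary, default)
```

Falling back to the base model keeps English and Chinese working but would give poor results for other unlisted languages; raising an error instead is an equally reasonable design choice.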

version 3.0

🔥 Removed the AI Cover feature.
🚀 Added support for m-bain/whisperX.

version 2.0

🐍 Built with Python 3.10.15, Torch 2.5.1+cu124, and Gradio 5.14.0.
🆓 Free trial supports media up to 60 seconds in length.
🔥 Added the AI Cover feature.
🎤 Introduced support for CosyVoice and kokoro.
⏳ The initial run downloads CosyVoice2-0.5B (9 GB), which may take over an hour depending on network speed.
🎧 Voice samples for cloning will be continuously updated.
📝 Added spaCy for natural sentence-by-sentence translation and TTS.
☁️ Subscription version includes Microsoft Azure Translator and TTS.
🏪 Subscription offers unlimited usage (no 60-second limit) during the subscription period, available via .
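Sentence-by-sentence translation and TTS means the text is segmented first, and each sentence is then translated and voiced on its own. The notes say Voice-Pro uses spaCy for segmentation; this sketch swaps in a naive regex splitter so it runs without spaCy or a downloaded language model, and unlike spaCy it will wrongly split after abbreviations such as "Dr.". `translate` and `synthesize` are hypothetical callables standing in for a translator (such as Deep-Translator) and a TTS engine.

```python
import re
from typing import Callable

# Naive stand-in for spaCy sentence segmentation: split after . ! ? when followed
# by whitespace, or directly after CJK sentence-ending punctuation 。！？
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+|(?<=[。！？])\s*")


def split_sentences(text: str) -> list[str]:
    # Drop empty pieces (e.g. after a trailing 。).
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def dub_by_sentence(
    text: str,
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> list[tuple[str, str, bytes]]:
    """Translate and synthesize one sentence at a time, keeping them aligned."""
    results = []
    for sentence in split_sentences(text):
        translated = translate(sentence)
        audio = synthesize(translated)
        results.append((sentence, translated, audio))
    return results
```

Keeping each `(source, translation, audio)` triple together makes it straightforward to align dubbed audio with the original timing later.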

🎥 YouTube Showcase

Demo for Voice-Pro (v2.0)
F5-TTS: Voice Cloning
Live Transcription & Translation
Multi-Lingual Voice Cloning: Korean - German
Multi-Lingual Voice Cloning: English - Korean
Multi-Lingual Voice Cloning: Korean - Japanese
NVIDIA RTX Video Super-Resolution
AI Karaoke

⭐ Key Features

1. Dubbing Studio

YouTube video downloads & audio extraction
Voice separation with Demucs
Supports 100+ languages for speech recognition & translation
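The first two Dubbing Studio steps map onto the yt-dlp and Demucs command-line tools. This sketch builds those commands directly and does not reproduce Voice-Pro's own wrappers; the helper names and output folders are assumptions. yt-dlp needs ffmpeg on the PATH to convert the extracted audio to WAV.

```python
import subprocess
import sys
from pathlib import Path


def download_audio_cmd(url: str, out_dir: Path) -> list[str]:
    # -x extracts the audio track; --audio-format wav converts it via ffmpeg.
    return [
        sys.executable, "-m", "yt_dlp",
        "-x", "--audio-format", "wav",
        "-o", str(out_dir / "%(id)s.%(ext)s"),
        url,
    ]


def separate_vocals_cmd(audio: Path, out_dir: Path) -> list[str]:
    # --two-stems=vocals splits the track into vocals and everything else.
    return [
        sys.executable, "-m", "demucs",
        "--two-stems=vocals",
        "-o", str(out_dir),
        str(audio),
    ]


def run(cmd: list[str]) -> None:
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(cmd, check=True)
```

For example, `run(download_audio_cmd(url, Path("work")))` followed by `run(separate_vocals_cmd(Path("work/<id>.wav"), Path("separated")))`. Demucs typically writes the stems into a model-named subfolder of the `-o` directory.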

2. Speech Technologies

Speech-to-Text: Whisper, Faster-Whisper, Whisper-Timestamped, WhisperX
Text-to-Speech:
Edge-TTS: 100+ languages, 400+ voices
E2-TTS, F5-TTS, CosyVoice: Zero-shot cloning
kokoro: Ranked #2 in HuggingFace TTS Arena

3. Real-Time Translation

Instant speech recognition
Multilingual translation on the fly
Customizable audio inputs
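Real-time translation implies a buffering loop: audio arrives in small chunks, is collected into windows, and each window is transcribed and then translated. This is a minimal sketch assuming fixed-length, non-overlapping windows (for example `window=16000 * 3` for three seconds of 16 kHz audio, the sample rate Whisper models expect); `transcribe` and `translate` are hypothetical callables, and a production pipeline would more likely cut windows with voice-activity detection.

```python
from typing import Callable, Iterable, Iterator


def stream_translate(
    chunks: Iterable[list[float]],
    window: int,
    transcribe: Callable[[list[float]], str],
    translate: Callable[[str], str],
) -> Iterator[tuple[str, str]]:
    """Buffer incoming audio chunks and yield (transcript, translation) per window."""
    buffer: list[float] = []
    for chunk in chunks:
        buffer.extend(chunk)
        # Process every complete window currently held in the buffer.
        while len(buffer) >= window:
            segment, buffer = buffer[:window], buffer[window:]
            text = transcribe(segment)
            if text:
                yield text, translate(text)
    # Flush the remaining partial window when the stream ends.
    if buffer:
        text = transcribe(buffer)
        if text:
            yield text, translate(text)
```

Because it is a generator, results can be shown as soon as each window is processed instead of after the whole recording.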

🤖 WebUI

`Dubbing Studio` Tab

All-in-one hub: YouTube downloads, noise removal, subtitles, translation, & TTS
Supports all ffmpeg-compatible formats
Output
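The Dubbing Studio tab produces subtitles from recognition results. Faster-Whisper's `WhisperModel.transcribe` returns `(segments, info)`, where each segment carries `start` and `end` times in seconds plus `text`. The sketch below turns such `(start, end, text)` tuples into SRT; the helper names are hypothetical, not Voice-Pro's own.

```python
def srt_timestamp(seconds: float) -> str:
    # SRT timestamps use HH:MM:SS,mmm with a comma before the milliseconds.
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Convert an iterable of (start, end, text) tuples into SRT text."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue: sequence number, time range, then the stripped caption text.
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

With Faster-Whisper output this would be called as `to_srt((s.start, s.end, s.text) for s in segments)`.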

Core symbols most depended-on inside this repo

cosyvoice/cli/model.py

Shape

Method 1,275

Function 464

Class 299

Languages

Python 99%

TypeScript 1%

Modules by API surface

src/aicover/infer_pack/models.py: 61 symbols

cosyvoice/utils/scheduler.py: 53 symbols

src/aicover/infer_pack/models_onnx_moess.py: 46 symbols

src/aicover/infer_pack/models_onnx.py: 46 symbols

app/gradio_gulliver.py: 39 symbols

src/demucs/transformer.py: 37 symbols

src/aicover/rmvpe.py: 36 symbols

src/aicover/infer_pack/modules.py: 35 symbols

app/abus_path.py: 31 symbols

third_party/Matcha-TTS/matcha/models/components/text_encoder.py: 28 symbols

src/demucs/repo.py: 28 symbols

third_party/Matcha-TTS/matcha/hifigan/models.py: 27 symbols

Dependencies from manifests, versioned

HyperPyYAML 1.2.2 · 1×

WeTextProcessing 1.0.3 · 1×

azure-ai-translation-text 1.0.0b1 · 1×

cached_path 1.6.7 · 1×

conformer 0.3.2 · 1×

demucs 4.0.1 · 1×

diffusers 0.29.0 · 1×

f5-tts 1.0.8 · 1×

fastapi-cli 0.0.4 · 1×

faster-whisper 1.1.0 · 1×

ffmpeg-python 0.2.0 · 1×

gdown 5.1.0 · 1×
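To check whether a local environment matches the versions listed above, the standard library's `importlib.metadata` can report installed distributions. This sketch treats each listed version as an exact pin, which is an assumption (the underlying manifests may specify ranges), and checks only a subset; `PINS` and `check_pins` are hypothetical names.

```python
from importlib.metadata import PackageNotFoundError, version

# Subset of the versions listed above, treated here as exact pins (an assumption).
PINS = {
    "demucs": "4.0.1",
    "faster-whisper": "1.1.0",
    "f5-tts": "1.0.8",
    "ffmpeg-python": "0.2.0",
}


def check_pins(pins: dict[str, str]) -> dict[str, str]:
    """Report 'ok', 'missing', or 'mismatch (<installed>)' for each distribution."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if installed == expected else f"mismatch ({installed})"
    return report
```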

For agents

$ claude mcp add voice-pro \
  -- python -m otcore.mcp_server <graph>
