(简体中文|English)

<a href="https://github.com/PaddlePaddle/PaddleSpeech/raw/r1.5.0/LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-red.svg"></a>
<a href="https://github.com/PaddlePaddle/PaddleSpeech/releases"><img src="https://img.shields.io/github/v/release/PaddlePaddle/PaddleSpeech?color=ffa"></a>
<a href="https://github.com/PaddlePaddle/PaddleSpeech/raw/r1.5.0/support os"><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
<a href=""><img src="https://img.shields.io/badge/python-3.8+-aff.svg"></a>
<a href="https://github.com/PaddlePaddle/PaddleSpeech/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/PaddleSpeech?color=9ea"></a>
<a href="https://github.com/PaddlePaddle/PaddleSpeech/commits"><img src="https://img.shields.io/github/commit-activity/m/PaddlePaddle/PaddleSpeech?color=3af"></a>
<a href="https://github.com/PaddlePaddle/PaddleSpeech/issues"><img src="https://img.shields.io/github/issues/PaddlePaddle/PaddleSpeech?color=9cc"></a>
<a href="https://github.com/PaddlePaddle/PaddleSpeech/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleSpeech?color=ccf"></a>
<a href="https://pypi.org/project/paddlespeech/"><img src="https://img.shields.io/pypi/dm/PaddleSpeech"></a>
<a href="https://pypi.org/project/paddlespeech/"><img src="https://static.pepy.tech/badge/paddlespeech"></a>
<a href="https://huggingface.co/spaces"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue"></a>

PaddleSpeech is an open-source toolkit on the PaddlePaddle platform for a variety of critical tasks in speech and audio, with state-of-the-art and influential models.
PaddleSpeech won the NAACL2022 Best Demo Award; please check out our paper on arXiv.

| Input Audio | Recognition Result |
|---|---|
| (audio clip) | I knocked at the door on the ancient side of the building. |
| (audio clip) | 我认为跑步最重要的就是给我带来了身体健康。 |

| Input Audio | Translation Result |
|---|---|
| (audio clip) | 我 在 这栋 建筑 的 古老 门上 敲门。 |

For more synthesized audio samples, please refer to PaddleSpeech Text-to-Speech samples.

| Input Text | Output Text |
|---|---|
| 今天的天气真不错啊你下午有空吗我想约你一起去吃饭 | 今天的天气真不错啊!你下午有空吗?我想约你一起去吃饭。 |

Via an easy-to-use, efficient, flexible, and scalable implementation, our vision is to empower both industrial applications and academic research, covering training, inference and testing modules, and the deployment process. More specifically, this toolkit features:

- 📦 Ease of Use: low barrier to installation; CLI, Server, and Streaming Server are available to quick-start your journey.
- 🏆 Aligned with the State of the Art: we provide high-speed and ultra-lightweight models as well as cutting-edge technology.
- 🏆 Streaming ASR and TTS System: we provide production-ready streaming ASR and streaming TTS systems.
- 💯 Rule-based Chinese frontend: our frontend contains Text Normalization and Grapheme-to-Phoneme (G2P, including Polyphone and Tone Sandhi). Moreover, we use self-defined linguistic rules to adapt to the Chinese context.
- 📦 Varieties of Functions that Vitalize both Industry and Academia:
  - 🛎️ Implementation of critical audio tasks: this toolkit contains audio functions such as Automatic Speech Recognition, Text-to-Speech Synthesis, Speaker Verification, Keyword Spotting, Audio Classification, and Speech Translation.
  - 🔬 Integration of mainstream models and datasets: the toolkit implements modules that participate in the whole pipeline of speech tasks, and uses mainstream datasets such as LibriSpeech, LJSpeech, AIShell, and CSMSC. See also the model list for more details.
  - 🧩 Cascaded models application: as an extension of typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields such as Natural Language Processing (NLP) and Computer Vision (CV).

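To give a feel for what a rule in a rule-based frontend can look like, here is a toy sketch of Mandarin third-tone sandhi: a tone-3 syllable directly followed by another tone-3 syllable is read with tone 2. This is an illustration only and is not PaddleSpeech's frontend code; the function name `third_tone_sandhi` and the numbered-pinyin input format are assumptions made for the example, and real systems also take word and prosody boundaries into account.

```python
def third_tone_sandhi(syllables):
    """Toy third-tone sandhi over numbered pinyin (hypothetical helper).

    A syllable ending in "3" that is directly followed by another syllable
    ending in "3" is rewritten to end in "2". Decisions are based on the
    original tones, so the input list is never modified.
    """
    result = list(syllables)
    for i in range(len(syllables) - 1):
        if syllables[i].endswith("3") and syllables[i + 1].endswith("3"):
            result[i] = syllables[i][:-1] + "2"
    return result


# "ni3 hao3" is pronounced "ni2 hao3".
print(third_tone_sandhi(["ni3", "hao3"]))  # ['ni2', 'hao3']
```

Longer runs of third tones are grouped by prosody in natural speech, which this pairwise rule does not model.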

- PaddleSpeech Streaming Server is available for Streaming ASR with Punctuation Restoration and Token Timestamp, and for Text-to-Speech.
- PaddleSpeech Server is available for Audio Classification, Automatic Speech Recognition, Text-to-Speech, Speaker Verification, and Punctuation Restoration.
- PaddleSpeech CLI is available for Speaker Verification.
- PaddleSpeech CLI is available for Audio Classification, Automatic Speech Recognition, Speech Translation (English to Chinese), and Text-to-Speech.

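The tasks offered through the CLI can also be called from Python. The sketch below wraps the `ASRExecutor` and `TTSExecutor` classes from the `paddlespeech.cli` package; it assumes PaddleSpeech and PaddlePaddle are already installed and that pretrained models can be downloaded on first use. The wrapper names `transcribe` and `synthesize` and the file names are hypothetical, and the executor arguments should be checked against the version you have installed.

```python
def transcribe(audio_path):
    """Run speech recognition on a WAV file and return the recognized text.

    The import is done lazily so that this module can be loaded even when
    PaddleSpeech is not installed.
    """
    from paddlespeech.cli.asr.infer import ASRExecutor

    asr = ASRExecutor()  # default model; downloaded on first use
    return asr(audio_file=audio_path)


def synthesize(text, output_path="output.wav"):
    """Synthesize speech for `text` and write it to `output_path`."""
    from paddlespeech.cli.tts.infer import TTSExecutor

    tts = TTSExecutor()  # default model; downloaded on first use
    return tts(text=text, output=output_path)
```

For example, `transcribe("zh.wav")` would return the transcript of a local recording named `zh.wav` (a placeholder file name). Creating the executor once and reusing it is preferable when processing many files.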

We strongly recommend installing PaddleSpeech on Linux with python>=3.8.
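Before installing, you may want to check whether the current machine matches this recommendation. The following is a small standard-library sketch; the helper name `is_recommended_setup` is made up for this example and is not part of PaddleSpeech.

```python
import platform
import sys

MIN_PYTHON = (3, 8)  # minimum Python version recommended above


def is_recommended_setup(version=None, system=None):
    """Return True when running on Linux with Python >= 3.8.

    `version` and `system` default to the current interpreter and OS;
    they can be passed explicitly to check another configuration.
    """
    if version is None:
        version = sys.version_info
    if system is None:
        system = platform.system()
    return tuple(version[:2]) >= MIN_PYTHON and system == "Linux"


print("Recommended setup:", is_recommended_setup())
```

A `False` result only means the setup differs from the recommended one; it does not by itself mean installation will fail.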
PaddleSpeech depends on pad