
github.com/PaddlePaddle/PaddleSpeech @r1.5.0 sqlite


7,162 symbols · 27,490 edges · 1,106 files · 2,974 documented (42%)

README

License: Apache 2.0 · Python 3.8+ · OS: Linux, Windows, macOS

Quick Start | Documents | Models List | AIStudio Courses | NAACL2022 Best Demo Award Paper | Gitee

PaddleSpeech is an open-source toolkit built on the PaddlePaddle platform for a variety of critical tasks in speech and audio, with state-of-the-art and influential models.

PaddleSpeech won the NAACL 2022 Best Demo Award; please check out our paper on arXiv.

Speech Recognition

Input Audio	Recognition Result
	I knocked at the door on the ancient side of the building.
	我认为跑步最重要的就是给我带来了身体健康。 (English: I think the most important thing about running is that it has brought me good health.)

Speech Translation (English to Chinese)

Input Audio	Translation Result
	我在这栋建筑的古老门上敲门。 (English: I knocked on the ancient door of this building.)

Text-to-Speech

Input Text	Synthetic Audio
Life was like a box of chocolates, you never know what you're gonna get.
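早上好，今天是2020/10/29，最低温度是-3°C。 (English: Good morning, today is 2020/10/29, and the lowest temperature is -3°C.)
季姬寂，集鸡，鸡即棘鸡。棘鸡饥叽，季姬及箕稷济鸡。鸡既济，跻姬笈，季姬忌，急咭鸡，鸡急，继圾几，季姬急，即籍箕击鸡，箕疾击几伎，伎即齑，鸡叽集几基，季姬急极屐击鸡，鸡既殛，季姬激，即记《季姬击鸡记》。 (English: the homophonic classical text 《季姬击鸡记》, "The Story of Lady Ji Beating the Chicken", in which every syllable is pronounced "ji".)
大家好，我是 parrot 虚拟老师，我们来读一首诗，我与春风皆过客，I and the spring breeze are passing by，你携秋水揽星河，you take the autumn water to take the galaxy。 (English: Hello everyone, I am parrot, a virtual teacher. Let's read a poem; each Chinese line is followed by its English rendering.)
宜家唔系事必要你讲，但系你所讲嘅说话将会变成呈堂证供。 (Cantonese; English: You are not required to say anything now, but what you say will be used as evidence in court.)
各个国家有各个国家嘅国歌 (Cantonese; English: Every country has its own national anthem.)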

For more synthesized audio, please refer to the PaddleSpeech Text-to-Speech samples.
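The sample 早上好，今天是2020/10/29，最低温度是-3°C。 reads naturally only if the Chinese frontend first rewrites the date and the temperature as words. Below is a minimal sketch of that text-normalization step. It assumes two hand-written regex rules. `normalize`, `num_to_zh` and `digits_to_zh` are hypothetical helpers, not the API of PaddleSpeech's own frontend.

```python
import re

DIGITS = "零一二三四五六七八九"

def digits_to_zh(s: str) -> str:
    """Read a digit string one digit at a time, e.g. '2020' -> '二零二零'."""
    return "".join(DIGITS[int(d)] for d in s)

def num_to_zh(n: int) -> str:
    """Cardinal reading for 0 <= n < 100, e.g. 10 -> '十', 29 -> '二十九'."""
    if n < 10:
        return DIGITS[n]
    tens, ones = divmod(n, 10)
    head = "" if tens == 1 else DIGITS[tens]
    return head + "十" + (DIGITS[ones] if ones else "")

def normalize(text: str) -> str:
    # Hypothetical rule 1: YYYY/MM/DD -> year read digit by digit, then 月/日.
    def date(m):
        y, mo, d = m.groups()
        return f"{digits_to_zh(y)}年{num_to_zh(int(mo))}月{num_to_zh(int(d))}日"
    text = re.sub(r"(\d{4})/(\d{1,2})/(\d{1,2})", date, text)

    # Hypothetical rule 2: optional minus sign + N°C -> (零下)N度.
    def temp(m):
        sign, n = m.groups()
        return ("零下" if sign else "") + num_to_zh(int(n)) + "度"
    return re.sub(r"(-?)(\d{1,2})°C", temp, text)

print(normalize("早上好，今天是2020/10/29，最低温度是-3°C。"))
```

A production normalizer covers many more patterns, such as phone numbers, fractions and units. It also has to disambiguate cases like whether "-" is a minus sign or a range marker.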

Punctuation Restoration

Input Text	Output Text
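今天的天气真不错啊你下午有空吗我想约你一起去吃饭	今天的天气真不错啊！你下午有空吗？我想约你一起去吃饭。
(English: The weather is really nice today! Are you free this afternoon? I'd like to invite you out for a meal.)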
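Punctuation restoration is often framed as sequence labeling. A model predicts, for each input character, which punctuation mark (if any) follows it, and the output is rebuilt by interleaving characters and marks. The sketch below shows only that reconstruction step, under an assumed five-class label set. `ID2PUNC` and `restore_punctuation` are hypothetical names, and the labels are written by hand rather than produced by a model.

```python
from typing import List

# Hypothetical label set: each character gets one class saying which
# punctuation mark (if any) should follow it.
ID2PUNC = {0: "", 1: "，", 2: "。", 3: "？", 4: "！"}

def restore_punctuation(text: str, label_ids: List[int]) -> str:
    """Re-insert predicted punctuation after each character of `text`."""
    if len(text) != len(label_ids):
        raise ValueError("expected exactly one label per character")
    return "".join(ch + ID2PUNC[i] for ch, i in zip(text, label_ids))

# Hand-written labels for the sample sentence: nothing after most characters,
# then ！ after 啊 (index 8), ？ after 吗 (index 14) and 。 after 饭 (index 23).
text = "今天的天气真不错啊你下午有空吗我想约你一起去吃饭"
labels = [0] * len(text)
labels[8], labels[14], labels[23] = 4, 3, 2
print(restore_punctuation(text, labels))
```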

Features

With an easy-to-use, efficient, flexible and scalable implementation, our vision is to empower both industrial applications and academic research, covering training, inference and testing modules as well as deployment. Specifically, this toolkit features:
- 📦 Ease of Use: low barriers to installation; the CLI, Server, and Streaming Server are available to quick-start your journey.
- 🏆 Align to the State-of-the-Art: we provide high-speed and ultra-lightweight models, as well as cutting-edge technology.
- 🏆 Streaming ASR and TTS System: we provide production-ready streaming ASR and streaming TTS systems.
- 💯 Rule-based Chinese frontend: our frontend contains Text Normalization and Grapheme-to-Phoneme (G2P, including Polyphone and Tone Sandhi). Moreover, we use self-defined linguistic rules to adapt to the Chinese context.
- 📦 Varieties of Functions that Vitalize both Industry and Academia:
  - 🛎️ Implementation of critical audio tasks: this toolkit contains audio functions such as Automatic Speech Recognition, Text-to-Speech Synthesis, Speaker Verification, Keyword Spotting, Audio Classification, and Speech Translation.
  - 🔬 Integration of mainstream models and datasets: the toolkit implements modules that participate in the whole pipeline of the speech tasks, and uses mainstream datasets such as LibriSpeech, LJSpeech, AIShell, CSMSC, etc. See also the model list for more details.
  - 🧩 Cascaded models application: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields such as Natural Language Processing (NLP) and Computer Vision (CV).
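The rule-based frontend's G2P step includes tone sandhi. The sketch below is an illustration only; it is not PaddleSpeech's implementation, which also has to handle word segmentation, 一 sandhi and neutral tones. It applies two well-known Mandarin rules to numbered pinyin:

- a third tone before another third tone is read as a second tone;
- 不 (bu4) is read bu2 before a fourth tone.

`apply_sandhi` and its (character, pinyin) input format are hypothetical.

```python
from typing import List, Tuple

def tone_of(py: str) -> int:
    """Tone digit of a numbered-pinyin syllable, e.g. 'hao3' -> 3."""
    return int(py[-1])

def with_tone(py: str, tone: int) -> str:
    """Replace the tone digit, e.g. ('ni3', 2) -> 'ni2'."""
    return py[:-1] + str(tone)

def apply_sandhi(sylls: List[Tuple[str, str]]) -> List[str]:
    """sylls: (character, numbered pinyin) pairs, e.g. [('你', 'ni3'), ('好', 'hao3')]."""
    out = [py for _, py in sylls]
    for i in range(len(sylls) - 1):
        char, py = sylls[i]
        next_tone = tone_of(sylls[i + 1][1])
        if char == "不" and next_tone == 4:
            # 不 changes to a second tone before a fourth-tone syllable.
            out[i] = with_tone(py, 2)
        elif tone_of(py) == 3 and next_tone == 3:
            # Third-tone sandhi, checked against the original (unmodified) tones.
            out[i] = with_tone(py, 2)
    return out

print(apply_sandhi([("你", "ni3"), ("好", "hao3")]))
print(apply_sandhi([("不", "bu4"), ("是", "shi4")]))
```

Because each pair is checked against the original tones, a run of three third tones (我很好) comes out as wo2 hen2 hao3. Real frontends decide such runs from word and prosodic grouping instead.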

Recent Update

👑 2023.05.31: Add WavLM ASR-en, WavLM fine-tuning for ASR on LibriSpeech.
🎉 2023.05.18: Add Squeezeformer, Squeezeformer training for ASR on Aishell.
👑 2023.05.04: Add HuBERT ASR-en, HuBERT fine-tuning for ASR on LibriSpeech.
⚡ 2023.04.28: Fix 0-d tensor: with the upgrade to paddlepaddle==2.5, the problem of modifying 0-d tensors has been solved.
👑 2023.04.25: Add AMP for U2 conformer.
🔥 2023.04.06: Add subtitle file (.srt format) generation example.
🔥 2023.03.14: Add SVS (Singing Voice Synthesis) examples with the Opencpop dataset, including DiffSinger, PWGAN and HiFiGAN; the quality is being continuously optimized.
👑 2023.03.09: Add Wav2vec2ASR-zh.
🎉 2023.03.07: Add TTS ARM Linux C++ Demo (with C++ Chinese Text Frontend).
🔥 2023.03.03: Add Voice Conversion StarGANv2-VC synthesis pipeline.
🎉 2023.02.16: Add Cantonese TTS.
🔥 2023.01.10: Add code-switch ASR CLI and Demos.
👑 2023.01.06: Add code-switch ASR tal_cs recipe.
🎉 2022.12.02: Add end-to-end Prosody Prediction pipeline (including using prosody labels in Acoustic Model).
🎉 2022.11.30: Add TTS Android Demo.
🤗 2022.11.28: PP-TTS and PP-ASR demos are available in AIStudio and on the official PaddlePaddle website.
👑 2022.11.18: Add Whisper CLI and Demos, supporting multilingual recognition and translation.
🔥 2022.11.18: Add Wav2vec2 CLI and Demos, supporting ASR and Feature Extraction.
🎉 2022.11.17: Add male voice for TTS.
🔥 2022.11.07: Add U2/U2++ C++ High Performance Streaming ASR Deployment.
👑 2022.11.01: Add Adversarial Loss for Chinese-English mixed TTS.
🔥 2022.10.26: Add Prosody Prediction for TTS.
🎉 2022.10.21: Add SSML for TTS Chinese Text Frontend.
👑 2022.10.11: Add Wav2vec2ASR-en, wav2vec2.0 fine-tuning for ASR on LibriSpeech.
🔥 2022.09.26: Add Voice Cloning, TTS finetune, and ERNIE-SAT in PaddleSpeech Web Demo.
⚡ 2022.09.09: Add AISHELL-3 Voice Cloning example with ECAPA-TDNN speaker encoder.
⚡ 2022.08.25: Release TTS finetune example.
🔥 2022.08.22: Add ERNIE-SAT models: ERNIE-SAT-vctk, ERNIE-SAT-aishell3, ERNIE-SAT-zh_en.
🔥 2022.08.15: Add g2pW into TTS Chinese Text Frontend.
🔥 2022.08.09: Release Chinese-English mixed TTS.
⚡ 2022.08.03: Add ONNXRuntime inference for TTS CLI.
🎉 2022.07.18: Release VITS: VITS-csmsc, VITS-aishell3, VITS-VC.
🎉 2022.06.22: All TTS models support ONNX format.
🍀 2022.06.17: Add PaddleSpeech Web Demo.
👑 2022.05.13: Release PP-ASR, PP-TTS, PP-VPR.
👏🏻 2022.05.06: PaddleSpeech Streaming Server is available for Streaming ASR (with Punctuation Restoration and Token Timestamps) and Text-to-Speech.
👏🏻 2022.05.06: PaddleSpeech Server is available for Audio Classification, Automatic Speech Recognition, Text-to-Speech, Speaker Verification and Punctuation Restoration.
👏🏻 2022.03.28: PaddleSpeech CLI is available for Speaker Verification.
👏🏻 2021.12.10: PaddleSpeech CLI is available for Audio Classification, Automatic Speech Recognition, Speech Translation (English to Chinese) and Text-to-Speech.

Community

Scan the QR code below with WeChat to join the official technical exchange group and receive bonus materials (more than 20 GB of learning materials, such as papers, code and videos) as well as live links to the lessons. We look forward to your participation.

Installation

We strongly recommend that users install PaddleSpeech on Linux with python>=3.8.

Dependency Introduction

gcc >= 4.8.5
paddlepaddle
python >= 3.8
OS support: Linux (recommended), Windows, macOS
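Below is a small preflight check against the requirements listed above: Python >= 3.8, Linux recommended, and PaddlePaddle installed. `check_environment` is a hypothetical helper, not part of PaddleSpeech. It uses only the standard library, assumes the PaddlePaddle wheel is importable as `paddle`, and does not check the gcc version.

```python
import importlib.util
import platform
import sys

def check_environment() -> dict:
    """Report whether this interpreter meets the documented minimums."""
    system = platform.system()  # 'Linux', 'Windows' or 'Darwin' (macOS)
    return {
        "python_ok": sys.version_info >= (3, 8),
        "os": system,
        "os_recommended": system == "Linux",
        # find_spec returns None when the top-level package cannot be found.
        "paddle_installed": importlib.util.find_spec("paddle") is not None,
    }

if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```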

PaddleSpeech depends on pad

Extension points: exported contracts, i.e. how you extend this code

VadListener (Interface)

Created by George Konovalov on 11/16/2019. [2 implementers]

runtime/examples/vad/vad-android-demo/vad/src/main/java/com/konovalov/vad/VadListener.java

Listener (Interface)

(no doc) [2 implementers]

runtime/examples/vad/vad-android-demo/example/src/main/java/com/konovalov/vad/example/recorder/VoiceRecorder.java
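Both Java interfaces above are callback contracts: a voice-activity detector classifies incoming audio frames and notifies whoever implements the listener. Below is a minimal Python sketch of that pattern.

- The callback names `on_speech_detected` and `on_noise_detected` are hypothetical; they are not copied from the Java source.
- The detector is a swapped-in toy. It uses a plain RMS energy threshold, not whatever model the Android demo wraps.

```python
import math
from typing import List, Protocol, Sequence

class VadListener(Protocol):
    """Hypothetical Python counterpart of a VAD callback contract."""
    def on_speech_detected(self) -> None: ...
    def on_noise_detected(self) -> None: ...

class EnergyVad:
    """Toy VAD: a frame counts as speech if its RMS energy exceeds a threshold."""
    def __init__(self, listener: VadListener, threshold: float = 0.02) -> None:
        self.listener = listener
        self.threshold = threshold

    def process_frame(self, frame: Sequence[float]) -> bool:
        if not frame:
            raise ValueError("frame must contain at least one sample")
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        is_speech = rms > self.threshold
        # Notify the listener; the detector does not care who implements it.
        if is_speech:
            self.listener.on_speech_detected()
        else:
            self.listener.on_noise_detected()
        return is_speech

class EventLog:
    """Example implementer that records which callback fired."""
    def __init__(self) -> None:
        self.events: List[str] = []

    def on_speech_detected(self) -> None:
        self.events.append("speech")

    def on_noise_detected(self) -> None:
        self.events.append("noise")

log = EventLog()
vad = EnergyVad(log)
vad.process_frame([0.0] * 160)         # silence
vad.process_frame([0.5, -0.5] * 80)    # loud square wave
print(log.events)
```

Keeping detection and reaction behind a small interface is what lets the Android demo ship two independent implementers.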

Core symbols: the most depended-on inside this repo

append · called by 749 · paddlespeech/audio/streamdata/pipeline.py
numpy · called by 321 · paddlespeech/audiotools/core/audio_signal.py
append · called by 292 · paddlespeech/s2t/models/wav2vec2/modules/containers.py
write · called by 262 · paddlespeech/audio/streamdata/gopen.py
zeros · called by 222 · paddlespeech/audiotools/core/_julius.py
load · called by 217 · paddlespeech/t2s/training/updater.py
report · called by 206 · paddlespeech/t2s/training/reporter.py
get · called by 165 · paddlespeech/audio/functional/window.py
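Judging only by its module name, `get` in `paddlespeech/audio/functional/window.py` is likely a lookup of window functions by name; this has not been verified against the source. Below is a pure-Python sketch of such a name-to-function registry with Hann and Hamming windows. `register`, `get`, `hann` and `hamming` here are hypothetical and do not reproduce the real signatures.

```python
import math
from typing import Callable, Dict, List

WindowFn = Callable[..., List[float]]
_REGISTRY: Dict[str, WindowFn] = {}

def register(name: str) -> Callable[[WindowFn], WindowFn]:
    """Decorator that stores a window function under `name`."""
    def wrap(fn: WindowFn) -> WindowFn:
        _REGISTRY[name] = fn
        return fn
    return wrap

def get(name: str) -> WindowFn:
    """Look up a registered window function by name."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown window: {name!r}") from None

@register("hann")
def hann(n: int, periodic: bool = True) -> List[float]:
    # Periodic windows (used for STFT frames) divide by n; symmetric ones by n - 1.
    if n == 1:
        return [1.0]
    denom = n if periodic else n - 1
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / denom) for i in range(n)]

@register("hamming")
def hamming(n: int, periodic: bool = True) -> List[float]:
    if n == 1:
        return [1.0]
    denom = n if periodic else n - 1
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / denom) for i in range(n)]

print([round(w, 3) for w in get("hann")(4)])
```

A registry keeps callers decoupled from the concrete window implementations: new windows become available by name as soon as they are decorated.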

Shape

Method 3,880

Function 2,136

Class 1,073

Route 68

Enum 3

Interface 2

Languages

Python 97%

Java 2%

TypeScript 1%

Modules by API surface

paddlespeech/s2t/models/wav2vec2/modules/wav2vec2_model.py · 115 symbols

paddlespeech/s2t/models/whisper/whisper.py · 90 symbols

paddlespeech/audiotools/data/transforms.py · 78 symbols

paddlespeech/t2s/modules/losses.py · 76 symbols

utils/zh_tn.py · 72 symbols

paddlespeech/s2t/models/wav2vec2/modules/modeling_wav2vec2.py · 68 symbols

paddlespeech/audiotools/core/audio_signal.py · 64 symbols

paddlespeech/audio/streamdata/filters.py · 55 symbols

paddlespeech/audiotools/core/_julius.py · 46 symbols

paddlespeech/t2s/models/starganv2_vc/starganv2_vc.py · 45 symbols

paddlespeech/t2s/models/waveflow.py · 43 symbols

paddlespeech/vector/io/augment.py · 41 symbols

Dependencies from manifests, versioned

@element-plus/icons-vue 2.0.9 · 1×

@vitejs/plugin-vue 2.3.0 · 1×

@vue/compiler-sfc 3.1.0 · 1×

ant-design-vue 2.2.8 · 1×

axios 0.26.1 · 1×

element-plus 2.1.9 · 1×

js-audio-recorder 0.5.7 · 1×

lamejs 1.2.1 · 1×

less 4.1.2 · 1×

vite 2.9.13 · 1×

vue 3.2.25 · 1×

ToJyutping 0.2.1 · 1×

Datastores touched

Database (mysql) · 1 repo

For agents

$ claude mcp add PaddleSpeech \
  -- python -m otcore.mcp_server <graph>
