
github.com/PaddlePaddle/PaddleSpeech @r1.5.0 sqlite

repository ↗ · DeepWiki ↗ · release r1.5.0 ↗
7,162 symbols 27,490 edges 1,106 files 2,974 documented · 42%
README


License: Apache 2 · OS: Linux, Windows, Mac · Python: 3.8+

Quick Start | Documents | Models List | AIStudio Courses | NAACL2022 Best Demo Award Paper | Gitee


PaddleSpeech is an open-source toolkit on the PaddlePaddle platform for a variety of critical speech and audio tasks, with state-of-the-art and influential models.

PaddleSpeech won the NAACL2022 Best Demo Award; please check out our paper on arXiv.

Speech Recognition
Input Audio | Recognition Result
I knocked at the door on the ancient side of the building.
我认为跑步最重要的就是给我带来了身体健康。
Speech Translation (English to Chinese)
Input Audio | Translation Result
我 在 这栋 建筑 的 古老 门上 敲门。
Text-to-Speech
Input Text | Synthetic Audio
Life was like a box of chocolates, you never know what you're gonna get.
早上好,今天是2020/10/29,最低温度是-3°C。
季姬寂,集鸡,鸡即棘鸡。棘鸡饥叽,季姬及箕稷济鸡。鸡既济,跻姬笈,季姬忌,急咭鸡,鸡急,继圾几,季姬急,即籍箕击鸡,箕疾击几伎,伎即齑,鸡叽集几基,季姬急极屐击鸡,鸡既殛,季姬激,即记《季姬击鸡记》。
大家好,我是 parrot 虚拟老师,我们来读一首诗,我与春风皆过客,I and the spring breeze are passing by,你携秋水揽星河,you take the autumn water to take the galaxy。
宜家唔系事必要你讲,但系你所讲嘅说话将会变成呈堂证供。
各个国家有各个国家嘅国歌

For more synthesized audios, please refer to PaddleSpeech Text-to-Speech samples.

Punctuation Restoration
Input Text | Output Text
今天的天气真不错啊你下午有空吗我想约你一起去吃饭 | 今天的天气真不错啊!你下午有空吗?我想约你一起去吃饭。

Features

Through an easy-to-use, efficient, flexible, and scalable implementation, our vision is to empower both industrial applications and academic research, covering training, inference and testing modules, and the deployment process. More specifically, this toolkit features:

  • 📦 Ease of Use: a low barrier to installation; a CLI, Server, and Streaming Server are available to quick-start your journey.
  • 🏆 Aligned with the State of the Art: we provide high-speed and ultra-lightweight models, as well as cutting-edge technology.
  • 🏆 Streaming ASR and TTS System: we provide production-ready streaming ASR and streaming TTS systems.
  • 💯 Rule-based Chinese frontend: our frontend contains Text Normalization and Grapheme-to-Phoneme (G2P, including Polyphone and Tone Sandhi). Moreover, we use self-defined linguistic rules to adapt to the Chinese context.
  • 📦 Varieties of Functions that Vitalize both Industry and Academia:
      • 🛎️ Implementation of critical audio tasks: this toolkit contains audio functions like Automatic Speech Recognition, Text-to-Speech Synthesis, Speaker Verification, Keyword Spotting, Audio Classification, and Speech Translation, etc.
      • 🔬 Integration of mainstream models and datasets: the toolkit implements modules that participate in the whole pipeline of speech tasks, and uses mainstream datasets like LibriSpeech, LJSpeech, AIShell, CSMSC, etc. See also the model list for more details.
      • 🧩 Cascaded models application: as an extension of typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural Language Processing (NLP) and Computer Vision (CV).
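To give a rough feel for what a rule-based text-normalization step does before synthesis, here is a toy sketch applied to the date and temperature from the Text-to-Speech sample above. It is not PaddleSpeech's frontend: the function names `read_number` and `normalize`, the two regular expressions, and the reading rules (years digit by digit, months and days as numbers up to 99) are all assumptions made for illustration.

```python
import re

DIGITS = "零一二三四五六七八九"


def read_number(n: int) -> str:
    """Read an integer in 0..99 as a Chinese number (toy rule, hypothetical helper)."""
    if n < 10:
        return DIGITS[n]
    tens, ones = divmod(n, 10)
    head = "" if tens == 1 else DIGITS[tens]   # 10..19 are read without a leading "一"
    tail = "" if ones == 0 else DIGITS[ones]
    return head + "十" + tail


def _date(match: re.Match) -> str:
    year, month, day = match.groups()
    year_text = "".join(DIGITS[int(c)] for c in year)  # years are read digit by digit
    return f"{year_text}年{read_number(int(month))}月{read_number(int(day))}日"


def _temperature(match: re.Match) -> str:
    sign, value = match.groups()
    prefix = "零下" if sign else ""  # a leading minus is read as "below zero"
    return f"{prefix}{read_number(int(value))}摄氏度"


def normalize(text: str) -> str:
    """Apply the two toy rules: YYYY/M/D dates, then one- or two-digit °C values."""
    text = re.sub(r"(\d{4})/(\d{1,2})/(\d{1,2})", _date, text)
    text = re.sub(r"(-?)(\d{1,2})°C", _temperature, text)
    return text


if __name__ == "__main__":
    print(normalize("早上好,今天是2020/10/29,最低温度是-3°C。"))
    # 早上好,今天是二零二零年十月二十九日,最低温度是零下三摄氏度。
```

A real frontend has to cover many more patterns (phone numbers, currencies, ranges, polyphones) and resolve ambiguity between them; this sketch only shows the shape of a pattern-plus-rewrite rule.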

Recent Update

Community

  • Scan the QR code below with WeChat to join the official technical exchange group and get the bonus (more than 20GB of learning materials, such as papers, code, and videos) as well as the live links to the lessons. We look forward to your participation.

Installation

We strongly recommend installing PaddleSpeech on Linux with python>=3.8.

Dependency Introduction

  • gcc >= 4.8.5
  • paddlepaddle
  • python >= 3.8
  • OS support: Linux (recommended), Windows, Mac OSX
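Before installing, the Python and OS requirements listed above can be checked with a few lines of standard-library code. This is a minimal sketch, not part of PaddleSpeech: the function name `check_environment` and its warning messages are hypothetical, and it does not check gcc or paddlepaddle.

```python
import platform
import sys

MIN_PYTHON = (3, 8)  # minimum version from the dependency list


def check_environment(version_info=None, system=None):
    """Return a list of warnings; an empty list means nothing was flagged.

    Both arguments default to the running interpreter and OS and exist
    only so the check can be exercised with other values.
    """
    version = tuple(version_info or sys.version_info)[:2]
    system = system or platform.system()
    warnings = []
    if version < MIN_PYTHON:
        warnings.append(
            f"Python {version[0]}.{version[1]} found; python >= 3.8 is required."
        )
    if system != "Linux":
        warnings.append(
            f"Running on {system}; Linux is the recommended platform."
        )
    return warnings


if __name__ == "__main__":
    for message in check_environment():
        print("warning:", message)
```

A non-Linux platform only produces a warning here, since Windows and Mac OSX are listed as supported.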

PaddleSpeech depends on pad

Extension points exported contracts — how you extend this code

VadListener (Interface)
Created by George Konovalov on 11/16/2019. [2 implementers]
runtime/examples/vad/vad-android-demo/vad/src/main/java/com/konovalov/vad/VadListener.java
Listener (Interface)
(no doc) [2 implementers]
runtime/examples/vad/vad-android-demo/example/src/main/java/com/konovalov/vad/example/recorder/VoiceRecorder.java

Core symbols most depended-on inside this repo

append
called by 749
paddlespeech/audio/streamdata/pipeline.py
numpy
called by 321
paddlespeech/audiotools/core/audio_signal.py
append
called by 292
paddlespeech/s2t/models/wav2vec2/modules/containers.py
write
called by 262
paddlespeech/audio/streamdata/gopen.py
zeros
called by 222
paddlespeech/audiotools/core/_julius.py
load
called by 217
paddlespeech/t2s/training/updater.py
report
called by 206
paddlespeech/t2s/training/reporter.py
get
called by 165
paddlespeech/audio/functional/window.py

Shape

Method 3,880
Function 2,136
Class 1,073
Route 68
Enum 3
Interface 2

Languages

Python 97%
Java 2%
TypeScript 1%

Modules by API surface

paddlespeech/s2t/models/wav2vec2/modules/wav2vec2_model.py · 115 symbols
paddlespeech/s2t/models/whisper/whisper.py · 90 symbols
paddlespeech/audiotools/data/transforms.py · 78 symbols
paddlespeech/t2s/modules/losses.py · 76 symbols
utils/zh_tn.py · 72 symbols
paddlespeech/s2t/models/wav2vec2/modules/modeling_wav2vec2.py · 68 symbols
paddlespeech/audiotools/core/audio_signal.py · 64 symbols
paddlespeech/audio/streamdata/filters.py · 55 symbols
paddlespeech/audiotools/core/_julius.py · 46 symbols
paddlespeech/t2s/models/starganv2_vc/starganv2_vc.py · 45 symbols
paddlespeech/t2s/models/waveflow.py · 43 symbols
paddlespeech/vector/io/augment.py · 41 symbols

Dependencies from manifests, versioned

@element-plus/icons-vue 2.0.9 · 1×
@vitejs/plugin-vue 2.3.0 · 1×
@vue/compiler-sfc 3.1.0 · 1×
ant-design-vue 2.2.8 · 1×
axios 0.26.1 · 1×
element-plus 2.1.9 · 1×
js-audio-recorder 0.5.7 · 1×
lamejs 1.2.1 · 1×
less 4.1.2 · 1×
vite 2.9.13 · 1×
vue 3.2.25 · 1×
ToJyutping 0.2.1 · 1×

Datastores touched

(mysql) Database · 1 repo

For agents

$ claude mcp add PaddleSpeech \
  -- python -m otcore.mcp_server <graph>
