
Docs | Example (ESPnet2) | Docker | Notebook
ESPnet is an end-to-end speech processing toolkit covering end-to-end speech recognition, text-to-speech, speech translation, speech enhancement, speaker diarization, spoken language understanding, and more. ESPnet uses PyTorch as its deep learning engine and follows Kaldi-style data processing, feature extraction/format, and recipes to provide a complete setup for various speech processing experiments.
- ASR recipes (WSJ, Switchboard, CHiME-4/5, Librispeech, TED, CSJ, AMI, HKUST, Voxforge, REVERB, Gigaspeech, etc.)
- TTS recipes in a similar manner to the ASR recipes (LJSpeech, LibriTTS, M-AILABS, etc.)
- ST recipes (Fisher-CallHome Spanish, Libri-trans, IWSLT'18, How2, Must-C, Mboshi-French, etc.)
- MT recipes (IWSLT'14, IWSLT'16, the above ST recipes, etc.)
- SLU recipes (CATSLU-MAPS, FSC, Grabo, IEMOCAP, JDCINAL, SNIPS, SLURP, SWBD-DA, etc.)
- SE/SS recipes (DNS-IS2020, LibriMix, SMS-WSJ, VCTK-noisyreverb, WHAM!, WHAMR!, WSJ-2mix, etc.)

Please refer to the tutorial page for complete documentation.
Set frontend to s3prl and frontend_conf to the corresponding name.
Demonstration
- Real-time ASR demo with ESPnet2
- Gradio Web Demo on Hugging Face Spaces. Check out the Web Demo
- Streaming Transformer ASR Local Demo with ESPnet2.
Demonstration
- Real-time TTS demo with ESPnet2
- Integrated into Hugging Face Spaces with Gradio.
To train the neural vocoder, please check the following repositories:
- kan-bayashi/ParallelWaveGAN
- r9y9/wavenet_vocoder
Demonstration
- Interactive SE demo with ESPnet2
- Streaming SE demo with ESPnet2
$ claude mcp add espnet \
-- python -m otcore.mcp_server <graph>