github.com/speechbrain/speechbrain @v1.1.0 sqlite

6,874 symbols 26,109 edges 727 files 5,214 documented · 76%
README

| 📘 Tutorials | 🌐 Website | 📚 Documentation | 🤝 Contributing | 🤗 HuggingFace | ▶️ YouTube | 🐦 X |

Please help our community project. Star on GitHub!

Exciting News (January, 2024): Discover what is new in SpeechBrain 1.0 here!

🗣️💬 What SpeechBrain Offers

  • SpeechBrain is an open-source PyTorch toolkit that accelerates Conversational AI development, i.e., the technology behind speech assistants, chatbots, and large language models.

  • It is crafted for fast and easy creation of advanced technologies for Speech and Text Processing.

🌐 Vision

  • With the rise of deep learning, once-distant domains like speech processing and NLP are now very close. A well-designed neural network and large datasets are all you need.

  • We think it is now time for a holistic toolkit that, mimicking the human brain, jointly supports diverse technologies for complex Conversational AI systems.

  • This spans speech recognition, speaker recognition, speech enhancement, speech separation, language modeling, dialogue, and beyond.

  • Aligned with our long-term goal of natural human-machine conversation, including for non-verbal individuals, we have recently added support for the EEG modality.

📚 Training Recipes

  • We share over 200 competitive training recipes on more than 40 datasets supporting 20 speech and text processing tasks (see below).

  • We support both training from scratch and fine-tuning pretrained models such as Whisper, Wav2Vec2, WavLM, Hubert, GPT2, Llama2, and beyond. The models on HuggingFace can be easily plugged in and fine-tuned.

  • For any task, you train the model with a command of this form:

python train.py hparams/train.yaml
  • The hyperparameters are encapsulated in a YAML file, while the training process is orchestrated through a Python script.

  • We maintain a consistent code structure across different tasks.

  • For better replicability, training logs and checkpoints are hosted on Dropbox.
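The split between a YAML hyperparameter file and a Python training script can be sketched roughly as follows. This is a minimal illustration, not the code of any particular recipe: the helper name `load_hparams` is hypothetical, and the sketch assumes that `speechbrain` and `hyperpyyaml` are installed (both are imported lazily inside the function).

```python
def load_hparams(argv):
    """Parse recipe-style CLI arguments and load the YAML hyperparameters.

    `load_hparams` is a hypothetical helper used for illustration only.
    Imports are deferred so this module can be imported even on a machine
    where SpeechBrain is not installed.
    """
    import speechbrain as sb
    from hyperpyyaml import load_hyperpyyaml

    # The first positional argument is the YAML file; the remaining flags
    # are split into run options and overrides for values in the YAML.
    hparams_file, run_opts, overrides = sb.parse_arguments(argv)

    # The YAML file is resolved into a plain dictionary of hyperparameters.
    with open(hparams_file, encoding="utf-8") as fin:
        hparams = load_hyperpyyaml(fin, overrides)
    return hparams, run_opts
```

In a script launched as `python train.py hparams/train.yaml`, a call such as `load_hparams(sys.argv[1:])` would return the resolved hyperparameters together with the run options, which the rest of the script can then use to drive training.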

Pretrained Models and Inference

  • Access over 100 pretrained models hosted on HuggingFace.
  • Each model comes with a user-friendly interface for seamless inference. For example, transcribing speech using a pretrained model requires just three lines of code:
from speechbrain.inference import EncoderDecoderASR

asr_model = EncoderDecoderASR.from_hparams(source="speechbrain/asr-conformer-transformerlm-librispeech", savedir="pretrained_models/asr-transformer-transformerlm-librispeech")
asr_model.transcribe_file("speechbrain/asr-conformer-transformerlm-librispeech/example.wav")
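The same interface extends naturally to several files. The sketch below is one possible way to do it, under stated assumptions: `load_asr` and `transcribe_files` are hypothetical helper names, the `savedir` default is just an illustrative cache directory, and only the `from_hparams` and `transcribe_file` calls shown above are used.

```python
def load_asr(
    source="speechbrain/asr-conformer-transformerlm-librispeech",
    savedir="pretrained_models/asr-conformer-transformerlm-librispeech",
):
    """Load the pretrained ASR model (hypothetical helper).

    The import is deferred so this module can be imported without
    SpeechBrain installed; calling the function does require it.
    """
    from speechbrain.inference import EncoderDecoderASR

    return EncoderDecoderASR.from_hparams(source=source, savedir=savedir)


def transcribe_files(asr_model, paths):
    """Return a dict mapping each audio path to its transcription.

    Works with any object exposing `transcribe_file(path)`.
    """
    return {path: asr_model.transcribe_file(path) for path in paths}
```

Loading the model once and reusing it across files avoids repeating the download and initialization for every call.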

Documentation

  • We are deeply dedicated to promoting inclusivity and education.
  • We have authored over 30 tutorials that not only describe how SpeechBrain works but also help users familiarize themselves with Conversational AI.
  • Every class or function has clear explanations and examples that you can run. Check out the documentation for more details 📚.

🎯 Use Cases

  • 🚀 Research Acceleration: Speeding up academic and industrial research. You can develop and integrate new models effortlessly, comparing their performance against our baselines.

  • ⚡️ Rapid Prototyping: Ideal for quick prototyping in time-sensitive projects.

  • 🎓 Educational Tool: SpeechBrain's simplicity makes it a valuable educational resource. It is used by institutions like Mila, Concordia University, Avignon University, and many others for student training.

🚀 Quick Start

To get started with SpeechBrain, follow these simple steps:

🛠️ Installation

Install via PyPI

  1. Install SpeechBrain using PyPI:

    pip install speechbrain

  2. Access SpeechBrain in your Python code:

    import speechbrain as sb

Install from GitHub

This installation is recommended for users who wish to conduct experiments and customize the toolkit according to their needs.

  1. Clone the GitHub repository and install the requirements:

    git clone https://github.com/speechbrain/speechbrain.git
    cd speechbrain
    pip install -r requirements.txt
    pip install --editable .

  2. Access SpeechBrain in your Python code:

    import speechbrain as sb

Any modifications made to the speechbrain package will be automatically reflected, thanks to the --editable flag.

✔️ Test Installation

Ensure your installation is correct by running the following commands:

pytest tests
pytest --doctest-modules speechbrain
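Before running the full test suite, a quick check that the package is visible to the current Python environment can save time. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package="speechbrain"):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print("speechbrain:", version if version else "not installed")
```

If this reports "not installed", the `pytest` commands above are likely to fail at import time, so it is worth revisiting the installation steps first.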

🏃‍♂️ Running an Experiment

In SpeechBrain, you can train a model for any task using the following steps:

cd recipes/<dataset>/<task>/
python experiment.py params.yaml

The results will be saved in the output_folder specified in the YAML file.
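When launching several experiments from a script, the two shell steps above can be wrapped in a small helper. This is only a sketch: `recipe_command` and `run_recipe` are hypothetical names, the script and YAML file names default to the ones shown above, and the actual directory layout and file names vary between recipes.

```python
import subprocess
import sys
from pathlib import Path


def recipe_command(dataset, task, script="experiment.py",
                   params="params.yaml", repo_root="."):
    """Build the working directory and argv for a recipe run.

    Mirrors `cd recipes/<dataset>/<task>/` followed by
    `python experiment.py params.yaml`.
    """
    workdir = Path(repo_root) / "recipes" / dataset / task
    argv = [sys.executable, script, params]
    return workdir, argv


def run_recipe(dataset, task, **kwargs):
    """Run the recipe in its own directory; raises if the script fails."""
    workdir, argv = recipe_command(dataset, task, **kwargs)
    return subprocess.run(argv, cwd=workdir, check=True)
```

Using `sys.executable` keeps the run inside the same Python environment in which SpeechBrain was installed.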

📘 Learning SpeechBrain

  • Website: Explore general information on the official website.

  • Tutorials: Start with basic tutorials covering fundamental functionalities. Find advanced tutorials and topics in the Tutorial notebooks category in the SpeechBrain documentation.

  • Documentation: Detailed information on the SpeechBrain API, contribution guidelines, and code is available in the documentation.

🔧 Supported Technologies

  • SpeechBrain is a versatile framework designed for implementing a wide range of technologies within the field of Conversational AI.
  • It excels not only in individual task implementations but also in combining various technologies into complex pipelines.

🎙️ Speech/Audio Processing

| Tasks | Datasets | Technologies/Models |
| --- | --- | --- |
| Speech Recognition | AISHELL-1, CommonVoice, DVoice, LibriSpeech, MEDIA, RescueSpeech, Switchboard, TIMIT, Tedlium2, Voicebank | CTC, Transducers, Transformers, Seq2Seq, Beamsearch techniques (for CTC, seq2seq, transducers), Rescoring, Conformer, Branchformer, Hyperconformer, Kaldi2-FST |
| Speaker Recognition | VoxCeleb | ECAPA-TDNN, ResNET, Xvectors, PLDA, Score Normalization |
| Speech Separation | WSJ0Mix, LibriMix, WHAM!, WHAMR!, Aishell1Mix, BinauralWSJ0Mix | SepFormer, RESepFormer, SkiM, DualPath RNN, ConvTasNET |
| Speech Enhancement | DNS, Voicebank | SepFormer, MetricGAN, MetricGAN-U, SEGAN, spectral masking, time masking |
| Interpretability | ESC50 | Listenable Maps for Audio Classifiers (L-MAC), Learning-to-Interpret (L2I), Non-Negative Matrix Factorization (NMF), PIQ |
| Speech Generation | AudioMNIST | Diffusion, Latent Diffusion |
| Text-to-Speech | LJSpeech, LibriTTS | Tacotron2, Zero-Shot Multi-Speaker Tacotron2, FastSpeech2 |
| Vocoding | LJSpeech, LibriTTS | HiFiGAN, DiffWave |
| Spoken Language Understanding | MEDIA, SLURP, Fluent Speech Commands, Timers-and-Such | Direct SLU, Decoupled SLU, Multistage SLU |
| Speech-to-Speech Translation | CVSS | Discrete Hubert, HiFiGAN, wav2vec2 |
| Speech Translation | Fisher CallHome (Spanish), IWSLT22 (lowresource) | wav2vec2 |
| Emotion Classification | IEMOCAP, ZaionEmotionDataset | ECAPA-TDNN, wav2vec2, Emotion Diarization |
| Language Identification | VoxLingua107, CommonLanguage | ECAPA-TDNN |
| Voice Activity Detection | LibriParty | CRDNN |
| Sound Classification | ESC50, UrbanSound | CNN14, ECAPA-TDNN |
| Self-Supervised Learning | CommonVoice, LibriSpeech | wav2vec2 |
| Metric Learning | REAL-M, Voicebank | Blind SNR-Estimation, PESQ Learning |
| Alignment | TIMIT | CTC, Viterbi, Forward Forward |
| Diarization | AMI | ECAPA-TDNN, X-vectors, Spectral Clustering |

📝 Text Processing

| Tasks | Datasets | Technologies/Models |
| --- | --- | --- |
| Language Modeling | CommonVoice, LibriSpeech | n-grams, RNNLM, TransformerLM |
Response

Core symbols · most depended-on inside this repo

append · called by 1094 · recipes/LibriTTS/focalcodec/metrics/dwer.py
split · called by 965 · speechbrain/utils/parameter_transfer.py
to · called by 631 · speechbrain/dataio/batch.py
append · called by 455 · speechbrain/nnet/containers.py
filtered_sorted · called by 337 · speechbrain/dataio/dataset.py
get_logger · called by 311 · speechbrain/utils/logger.py
log_stats · called by 274 · speechbrain/utils/train_logger.py
get · called by 223 · speechbrain/utils/run_opts.py

Shape

Method 3,268
Function 2,789
Class 810
Route 7

Languages

Python 100%

Modules by API surface

speechbrain/decoders/seq2seq.py · 91 symbols
speechbrain/nnet/schedulers.py · 90 symbols
speechbrain/lobes/models/HifiGAN.py · 69 symbols
speechbrain/decoders/scorer.py · 67 symbols
speechbrain/nnet/losses.py · 60 symbols
speechbrain/processing/features.py · 59 symbols
speechbrain/lobes/models/FastSpeech2.py · 58 symbols
speechbrain/nnet/RNN.py · 57 symbols
speechbrain/dataio/encoder.py · 57 symbols
speechbrain/utils/metric_stats.py · 53 symbols
speechbrain/lobes/models/dual_path.py · 52 symbols
recipes/AudioMNIST/diffusion/train.py · 52 symbols

Used by 2 indexed graphs · manifest dependencies, hub-wide

Dependencies · from manifests, versioned

huggingface_hub 0.8.0 · 1×
hyperpyyaml 0.0.1 · 1×
joblib 0.14.1 · 1×
numpy 1.17.0 · 1×
packaging
pandas 1.0.1 · 1×
pre-commit 2.3.0 · 1×
requests 2.20.0 · 1×
scipy 1.4.1 · 1×
sentencepiece 0.1.91 · 1×
soundfile 0.12.1 · 1×
torch 2.1.0 · 1×

For agents

$ claude mcp add speechbrain \
  -- python -m otcore.mcp_server <graph>
