| 📘 Tutorials | 🌐 Website | 📚 Documentation | 🤝 Contributing | 🤗 HuggingFace | ▶️ YouTube | 🐦 X |
Please help our community project by starring it on GitHub!
Exciting News (January, 2024): Discover what is new in SpeechBrain 1.0 here!
SpeechBrain is an open-source PyTorch toolkit that accelerates Conversational AI development, i.e., the technology behind speech assistants, chatbots, and large language models.
It is crafted for fast and easy creation of advanced technologies for Speech and Text Processing.
With the rise of deep learning, once-distant domains like speech processing and NLP are now very close. A well-designed neural network and large datasets are all you need.
We think it is now time for a holistic toolkit that, mimicking the human brain, jointly supports diverse technologies for complex Conversational AI systems.
This spans speech recognition, speaker recognition, speech enhancement, speech separation, language modeling, dialogue, and beyond.
Aligned with our long-term goal of natural human-machine conversation, including for non-verbal individuals, we have recently added support for the EEG modality.
We share over 200 competitive training recipes on more than 40 datasets supporting 20 speech and text processing tasks (see below).
We support both training from scratch and fine-tuning pretrained models such as Whisper, Wav2Vec2, WavLM, Hubert, GPT2, Llama2, and beyond. The models on HuggingFace can be easily plugged in and fine-tuned.
For any task, you train the model with a command of this form:
```bash
python train.py hparams/train.yaml
```
The hyperparameters are encapsulated in a YAML file, while the training process is orchestrated through a Python script.
We maintain a consistent code structure across different tasks.
For better replicability, training logs and checkpoints are hosted on Dropbox.
```python
from speechbrain.inference import EncoderDecoderASR

asr_model = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-conformer-transformerlm-librispeech",
    savedir="pretrained_models/asr-transformer-transformerlm-librispeech",
)
asr_model.transcribe_file("speechbrain/asr-conformer-transformerlm-librispeech/example.wav")
```
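To transcribe several recordings with the same pretrained model, one option is a small wrapper around `transcribe_file`. The sketch below is illustrative only: `transcribe_folder` and its `pattern` argument are hypothetical names, not part of SpeechBrain, and the code assumes nothing about the model beyond the `transcribe_file(path)` method used above.

```python
from pathlib import Path


def transcribe_folder(asr_model, folder, pattern="*.wav"):
    """Transcribe every file in `folder` whose name matches `pattern`.

    `asr_model` is assumed to expose `transcribe_file(path)`, as the
    EncoderDecoderASR object above does. Returns {filename: transcript}.
    """
    transcripts = {}
    # Sort the paths so files are processed in a deterministic order.
    for audio_path in sorted(Path(folder).glob(pattern)):
        transcripts[audio_path.name] = asr_model.transcribe_file(str(audio_path))
    return transcripts
```

With the model loaded as above, `transcribe_folder(asr_model, "my_recordings")` would return one transcript per `.wav` file, where `my_recordings` is a placeholder directory name.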
🚀 Research Acceleration: Speeding up academic and industrial research. You can develop and integrate new models effortlessly, comparing their performance against our baselines.
⚡️ Rapid Prototyping: Ideal for quick prototyping in time-sensitive projects.
🎓 Educational Tool: SpeechBrain's simplicity makes it a valuable educational resource. It is used by institutions like Mila, Concordia University, Avignon University, and many others for student training.
To get started with SpeechBrain, follow these simple steps:
Install SpeechBrain using PyPI:
```bash
pip install speechbrain
```
Access SpeechBrain in your Python code:
```python
import speechbrain as sb
```
Installing from the GitHub repository is recommended for users who wish to conduct experiments and customize the toolkit according to their needs.
Clone the GitHub repository and install the requirements:
```bash
git clone https://github.com/speechbrain/speechbrain.git
cd speechbrain
pip install -r requirements.txt
pip install --editable .
```
Access SpeechBrain in your Python code:
```python
import speechbrain as sb
```
Any modifications made to the speechbrain package will be automatically reflected, thanks to the --editable flag.
Ensure your installation is correct by running the following commands:
```bash
pytest tests
pytest --doctest-modules speechbrain
```
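As a lighter first check before running the test suite, you can confirm that the package is visible to the current Python interpreter. This is a minimal sketch using only the standard library. `check_install` is a hypothetical helper name, and it does not replace the `pytest` commands above.

```python
import importlib.util
from importlib import metadata


def check_install(package="speechbrain"):
    """Return (importable, version) for `package`.

    `version` is None when no installed distribution metadata is found.
    """
    # find_spec locates the top-level package without importing it.
    importable = importlib.util.find_spec(package) is not None
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None
    return importable, version
```

Calling `print(check_install())` in the environment where you installed SpeechBrain should report `True` as the first element if the installation is visible to that interpreter.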
In SpeechBrain, you can train a model for any task using the following steps:
```bash
cd recipes/<dataset>/<task>/
python experiment.py params.yaml
```
The results will be saved in the output_folder specified in the YAML file.
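When launching several runs from a script, it can help to assemble the command programmatically. The sketch below makes two assumptions: that YAML entries such as `output_folder` can be overridden on the command line as `--key=value` (check the recipe's README to confirm), and that `results/run1` is just a placeholder path. `build_train_command` is a hypothetical helper, not part of SpeechBrain.

```python
import shlex


def build_train_command(script="experiment.py", hparams="params.yaml", overrides=None):
    """Build the argument list for one recipe run.

    `overrides` maps YAML keys to values and is rendered as `--key=value`
    (assumed override syntax, see the note above).
    """
    command = ["python", script, hparams]
    for key, value in (overrides or {}).items():
        command.append(f"--{key}={value}")
    return command


cmd = build_train_command(overrides={"output_folder": "results/run1"})
print(shlex.join(cmd))
# python experiment.py params.yaml --output_folder=results/run1
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from inside the recipe directory.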
Website: Explore general information on the official website.
Tutorials: Start with basic tutorials covering fundamental functionalities. Find advanced tutorials and topics in the Tutorial notebooks category in the SpeechBrain documentation.
Documentation: Detailed information on the SpeechBrain API, contribution guidelines, and code is available in the documentation.
| Tasks | Datasets | Technologies/Models |
|---|---|---|
| Language Modeling | CommonVoice, LibriSpeech | n-grams, RNNLM, TransformerLM |
| Response |