:zany_face: TensorFlowTTS provides real-time state-of-the-art speech synthesis architectures such as Tacotron-2, Melgan, Multiband-Melgan, FastSpeech, FastSpeech2 based-on TensorFlow 2. With Tensorflow 2, we can speed-up training/inference progress, optimizer further by using fake-quantize aware and pruning, make TTS models can be run faster than real-time and be able to deploy on mobile devices or embedded systems.
This repository is tested on Ubuntu 18.04 with:
Different Tensorflow version should be working but not tested yet. This repo will try to work with the latest stable TensorFlow version. We recommend you install TensorFlow 2.6.0 to training in case you want to use MultiGPU.
$ pip install TensorFlowTTS
Examples are included in the repository but are not shipped with the framework. Therefore, to run the latest version of examples, you need to install the source below.
$ git clone https://github.com/TensorSpeech/TensorFlowTTS.git
$ cd TensorFlowTTS
$ pip install .
If you want to upgrade the repository and its dependencies:
$ git pull
$ pip install --upgrade .
TensorFlowTTS currently provides the following architectures:
We are also implementing some techniques to improve quality and convergence speed from the following papers:
Here in an audio samples on valid set. tacotron-2, fastspeech, melgan, melgan.stft, fastspeech2, multiband_melgan
Prepare a dataset in the following format:
|- [NAME_DATASET]/
| |- metadata.csv
| |- wavs/
| |- file1.wav
| |- ...
Where metadata.csv has the following format: id|transcription. This is a ljspeech-like format; you can ignore preprocessing steps if you have other format datasets.
Note that NAME_DATASET should be [ljspeech/kss/baker/libritts/synpaflex] for example.
The preprocessing has two steps:
To reproduce the steps above:
tensorflow-tts-preprocess --rootdir ./[ljspeech/kss/baker/libritts/thorsten/synpaflex] --outdir ./dump_[ljspeech/kss/baker/libritts/thorsten/synpaflex] --config preprocess/[ljspeech/kss/baker/thorsten/synpaflex]_preprocess.yaml --dataset [ljspeech/kss/baker/libritts/thorsten/synpaflex]
tensorflow-tts-normalize --rootdir ./dump_[ljspeech/kss/baker/libritts/thorsten/synpaflex] --outdir ./dump_[ljspeech/kss/baker/libritts/thorsten/synpaflex] --config preprocess/[ljspeech/kss/baker/libritts/thorsten/synpaflex]_preprocess.yaml --dataset [ljspeech/kss/baker/libritts/thorsten/synpaflex]
Right now we only support ljspeech, kss, baker, libritts, thorsten and
synpaflex for dataset argument. In the future, we intend to support more datasets.
Note: To run libritts preprocessing, please first read the instruction in examples/fastspeech2_libritts. We need to reformat it first before run preprocessing.
Note: To run synpaflex preprocessing, please first run the notebook notebooks/prepare_synpaflex.ipynb. We need to reformat it first before run preprocessing.
After preprocessing, the structure of the project folder should be:
|- [NAME_DATASET]/
| |- metadata.csv
| |- wav/
| |- file1.wav
| |- ...
|- dump_[ljspeech/kss/baker/libritts/thorsten]/
| |- train/
| |- ids/
| |- LJ001-0001-ids.npy
| |- ...
| |- raw-feats/
| |- LJ001-0001-raw-feats.npy
| |- ...
| |- raw-f0/
| |- LJ001-0001-raw-f0.npy
| |- ...
| |- raw-energies/
| |- LJ001-0001-raw-energy.npy
| |- ...
| |- norm-feats/
| |- LJ001-0001-norm-feats.npy
| |- ...
| |- wavs/
| |- LJ001-0001-wave.npy
| |- ...
| |- valid/
| |- ids/
| |- LJ001-0009-ids.npy
| |- ...
| |- raw-feats/
| |- LJ001-0009-raw-feats.npy
| |- ...
| |- raw-f0/
| |- LJ001-0001-raw-f0.npy
| |- ...
| |- raw-energies/
| |- LJ001-0001-raw-energy.npy
| |- ...
| |- norm-feats/
| |- LJ001-0009-norm-feats.npy
| |- ...
| |- wavs/
| |- LJ001-0009-wave.npy
| |- ...
| |- stats.npy
| |- stats_f0.npy
| |- stats_energy.npy
| |- train_utt_ids.npy
| |- valid_utt_ids.npy
|- examples/
| |- melgan/
| |- fastspeech/
| |- tacotron2/
| ...
stats.npy contains the mean and std from the training split mel spectrogramsstats_energy.npy contains the mean and std of energy values from the training splitstats_f0.npy contains the mean and std of F0 values in the training splittrain_utt_ids.npy / valid_utt_ids.npy contains training and validation utterances IDs respectivelyWe use suffix (ids, raw-feats, raw-energy, raw-f0, norm-feats, and wave) for each input type.
IMPORTANT NOTES:
- This preprocessing step is based on ESPnet so you can combine all models here with other models from ESPnet repository.
- Regardless of how your dataset is formatted, the final structure of the dump folder SHOULD follow the above structure to be able to use the training script, or you can modify it by yourself 😄.
To know how to train model from scratch or fine-tune with other datasets/languages, pl
$ claude mcp add TensorFlowTTS \
-- python -m otcore.mcp_server <graph>