github.com/TensorSpeech/TensorFlowTTS @v1.8 sqlite

867 symbols 2,644 edges 109 files 429 documented · 49%

README

TensorFlowTTS

Real-Time State-of-the-art Speech Synthesis for Tensorflow 2

TensorFlowTTS provides real-time, state-of-the-art speech synthesis architectures such as Tacotron-2, MelGAN, Multi-band MelGAN, FastSpeech, and FastSpeech2, based on TensorFlow 2. With TensorFlow 2, we can speed up training and inference, optimize further with fake-quantization-aware training and pruning, and make TTS models run faster than real time and deployable on mobile devices or embedded systems.

What's new

2021/08/18 (NEW!) Integrated to Huggingface Spaces with Gradio. See Gradio Web Demo.
2021/08/12 (NEW!) Support French TTS (Tacotron2, Multiband MelGAN). Please see the Colab. Many thanks to Samuel Delalez
2021/06/01 Integrated with Huggingface Hub. See the PR. Thanks patrickvonplaten and osanseviero
2021/03/18 Support iOS for FastSpeech2 and MB MelGAN. Thanks kewlbear. See here
2021/01/18 Support TFLite C++ inference. Thanks luan78zaoha. See here
2020/12/02 Support German TTS with Thorsten dataset. See the Colab. Thanks thorstenMueller and monatis
2020/11/24 Add HiFi-GAN vocoder. See here
2020/11/19 Add Multi-GPU gradient accumulator. See here
2020/08/23 Add Parallel WaveGAN tensorflow implementation. See here
2020/08/23 Add MBMelGAN G + ParallelWaveGAN G example. See here
2020/08/20 Add C++ inference code. Thanks @ZDisket. See here
2020/08/18 Update new base processor. Add AutoProcessor and pretrained processor JSON file
2020/08/14 Support Chinese TTS. Please see the Colab. Thanks @azraelkuan
2020/08/05 Support Korean TTS. Please see the Colab. Thanks @crux153
2020/07/17 Support multi-GPU for all trainers
2020/07/05 Support converting Tacotron-2 and FastSpeech to TFLite. Please see the Colab. Thanks to @jaeyoo from the TFLite team for his support
2020/06/20 FastSpeech2 implementation with TensorFlow is supported.
2020/06/07 Multi-band MelGAN (MB MelGAN) implementation with TensorFlow is supported

Features

High performance on speech synthesis.
Can be fine-tuned on other languages.
Fast, scalable, and reliable.
Suitable for deployment.
Easy to implement a new model, based on an abstract class.
Mixed precision to speed up training where possible.
Supports single/multi-GPU gradient accumulation.
Supports both single and multi-GPU training in the base trainer class.
TFLite conversion for all supported models.
Android example.
Supports many languages (currently Chinese, Korean, English, French, and German).
Supports C++ inference.
Supports converting weights for some models from PyTorch to TensorFlow to improve speed.

Requirements

This repository is tested on Ubuntu 18.04 with:

Python 3.7+
CUDA 10.1
cuDNN 7.6.5
TensorFlow 2.2/2.3/2.4/2.5/2.6
TensorFlow Addons >= 0.10.0

Other TensorFlow versions should work but have not been tested yet. This repo will try to work with the latest stable TensorFlow version. We recommend installing TensorFlow 2.6.0 for training if you want to use multi-GPU.
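As a quick sanity check before training, the tested version list above can be encoded in a few lines of Python. This is only an illustrative sketch: the helper name `is_tested_tf_version` is hypothetical and is not part of TensorFlowTTS.

```python
# Tested TensorFlow (major, minor) releases, copied from the list above.
TESTED_TF_MINORS = {(2, 2), (2, 3), (2, 4), (2, 5), (2, 6)}


def is_tested_tf_version(version: str) -> bool:
    """Return True if `version` (e.g. "2.6.0") is a tested major.minor release.

    Hypothetical helper for illustration; not provided by the library.
    """
    parts = version.split(".")
    try:
        major_minor = (int(parts[0]), int(parts[1]))
    except (IndexError, ValueError):
        # Not a recognizable "major.minor[.patch]" string.
        return False
    return major_minor in TESTED_TF_MINORS
```

With TensorFlow installed, it could be called as `is_tested_tf_version(tf.__version__)`; a `False` result only means the version is untested, not that it will fail.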

Installation

With pip

$ pip install TensorFlowTTS

From source

Examples are included in the repository but are not shipped with the framework. Therefore, to run the latest version of the examples, you need to install from source as shown below.

$ git clone https://github.com/TensorSpeech/TensorFlowTTS.git
$ cd TensorFlowTTS
$ pip install .

If you want to upgrade the repository and its dependencies:

$ git pull
$ pip install --upgrade .

Supported Model architectures

TensorFlowTTS currently provides the following architectures:

MelGAN released with the paper MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis by Kundan Kumar, Rithesh Kumar, Thibault de Boissiere, Lucas Gestin, Wei Zhen Teoh, Jose Sotelo, Alexandre de Brebisson, Yoshua Bengio, Aaron Courville.
Tacotron-2 released with the paper Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions by Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu.
FastSpeech released with the paper FastSpeech: Fast, Robust, and Controllable Text to Speech by Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu.
Multi-band MelGAN released with the paper Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech by Geng Yang, Shan Yang, Kai Liu, Peng Fang, Wei Chen, Lei Xie.
FastSpeech2 released with the paper FastSpeech 2: Fast and High-Quality End-to-End Text to Speech by Yi Ren, Chenxu Hu, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu.
Parallel WaveGAN released with the paper Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram by Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim.
HiFi-GAN released with the paper HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis by Jungil Kong, Jaehyeon Kim, Jaekyoung Bae.

We are also implementing some techniques to improve quality and convergence speed from the following papers:

Guided Attention Loss released with the paper Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention by Hideyuki Tachibana, Katsuya Uenoyama, Shunsuke Aihara.

Audio Samples

Here are audio samples on the validation set: tacotron-2, fastspeech, melgan, melgan.stft, fastspeech2, multiband_melgan

Tutorial End-to-End

Prepare Dataset

Prepare a dataset in the following format:

|- [NAME_DATASET]/
|   |- metadata.csv
|   |- wavs/
|       |- file1.wav
|       |- ...

Where metadata.csv has the following format: id|transcription. This is an LJSpeech-like format; you can skip the preprocessing steps if your dataset is in another format.

Note that NAME_DATASET should be [ljspeech/kss/baker/libritts/synpaflex] for example.
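Before running preprocessing, it can help to check that a dataset folder matches the layout above. The following is a minimal sketch, not part of TensorFlowTTS: the function names are hypothetical, and it assumes each `id` in metadata.csv corresponds to a file `wavs/<id>.wav`. Fields after the second `|` are ignored.

```python
from pathlib import Path


def read_metadata(dataset_dir):
    """Parse `metadata.csv` (id|transcription) into (id, text, wav_path) tuples.

    Assumption: the audio for each id lives at `<dataset_dir>/wavs/<id>.wav`.
    """
    dataset_dir = Path(dataset_dir)
    entries = []
    with open(dataset_dir / "metadata.csv", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            fields = line.split("|")
            if len(fields) < 2:
                raise ValueError(f"line {line_no}: expected 'id|transcription'")
            utt_id, text = fields[0], fields[1]
            entries.append((utt_id, text, dataset_dir / "wavs" / f"{utt_id}.wav"))
    return entries


def missing_wavs(entries):
    """Return the ids whose wav file does not exist on disk."""
    return [utt_id for utt_id, _, wav_path in entries if not wav_path.is_file()]
```

An empty list from `missing_wavs(read_metadata("./ljspeech"))` would indicate that every transcription has a matching audio file.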

Preprocessing

The preprocessing has two steps:

1. Preprocess audio features
   - Convert characters to IDs
   - Compute mel spectrograms
   - Normalize mel spectrograms to [-1, 1] range
   - Split the dataset into train and validation
   - Compute the mean and standard deviation of multiple features from the training split
2. Standardize mel spectrograms based on the computed statistics
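The split, statistics, and standardization steps above can be sketched with NumPy as follows. This is an illustration of the idea only, not the code behind `tensorflow-tts-preprocess` or `tensorflow-tts-normalize`: the function names, the 5% validation fraction, and the seed are all assumptions, and mel spectrograms are assumed to be arrays of shape [frames, n_mels].

```python
import random

import numpy as np


def split_train_valid(utt_ids, valid_fraction=0.05, seed=0):
    """Shuffle utterance ids deterministically and hold out a validation set."""
    ids = sorted(utt_ids)
    random.Random(seed).shuffle(ids)
    n_valid = max(1, round(len(ids) * valid_fraction))
    return ids[n_valid:], ids[:n_valid]


def compute_stats(mels):
    """Per-bin mean and std over all frames of the training-split mels."""
    frames = np.concatenate(mels, axis=0)  # [total_frames, n_mels]
    return frames.mean(axis=0), frames.std(axis=0)


def standardize(mel, mean, std, eps=1e-8):
    """Standardize one mel spectrogram with the training-split statistics."""
    return (mel - mean) / (std + eps)
```

The statistics are computed from the training split only and then applied to both splits, so no information from the validation utterances leaks into the normalization.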

To reproduce the steps above:

tensorflow-tts-preprocess --rootdir ./[ljspeech/kss/baker/libritts/thorsten/synpaflex] --outdir ./dump_[ljspeech/kss/baker/libritts/thorsten/synpaflex] --config preprocess/[ljspeech/kss/baker/thorsten/synpaflex]_preprocess.yaml --dataset [ljspeech/kss/baker/libritts/thorsten/synpaflex]
tensorflow-tts-normalize --rootdir ./dump_[ljspeech/kss/baker/libritts/thorsten/synpaflex] --outdir ./dump_[ljspeech/kss/baker/libritts/thorsten/synpaflex] --config preprocess/[ljspeech/kss/baker/libritts/thorsten/synpaflex]_preprocess.yaml --dataset [ljspeech/kss/baker/libritts/thorsten/synpaflex]
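To avoid editing the bracketed placeholders by hand, the two commands can be assembled for one dataset name. The sketch below only builds argument lists from the flags shown above; the helper name is hypothetical, and it assumes every dataset has a config file following the `preprocess/<dataset>_preprocess.yaml` pattern.

```python
# Dataset names accepted by the --dataset argument, as listed in this README.
SUPPORTED_DATASETS = ("ljspeech", "kss", "baker", "libritts", "thorsten", "synpaflex")


def build_preprocess_commands(dataset, rootdir=None, dumpdir=None):
    """Return the (preprocess, normalize) argument lists for one dataset.

    Hypothetical helper: it mirrors the two commands above and runs nothing.
    """
    if dataset not in SUPPORTED_DATASETS:
        raise ValueError(f"unsupported dataset: {dataset!r}")
    rootdir = rootdir or f"./{dataset}"
    dumpdir = dumpdir or f"./dump_{dataset}"
    config = f"preprocess/{dataset}_preprocess.yaml"  # assumed naming pattern
    preprocess = [
        "tensorflow-tts-preprocess",
        "--rootdir", rootdir,
        "--outdir", dumpdir,
        "--config", config,
        "--dataset", dataset,
    ]
    # The normalize step reads from and writes to the same dump folder.
    normalize = [
        "tensorflow-tts-normalize",
        "--rootdir", dumpdir,
        "--outdir", dumpdir,
        "--config", config,
        "--dataset", dataset,
    ]
    return preprocess, normalize
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the repository root, in order, once the package is installed.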

Right now we only support ljspeech, kss, baker, libritts, thorsten and synpaflex for the dataset argument. In the future, we intend to support more datasets.

Note: To run libritts preprocessing, please first read the instructions in examples/fastspeech2_libritts. The dataset needs to be reformatted before running preprocessing.

Note: To run synpaflex preprocessing, please first run the notebook notebooks/prepare_synpaflex.ipynb. The dataset needs to be reformatted before running preprocessing.

After preprocessing, the structure of the project folder should be:

|- [NAME_DATASET]/
|   |- metadata.csv
|   |- wavs/
|       |- file1.wav
|       |- ...
|- dump_[ljspeech/kss/baker/libritts/thorsten]/
|   |- train/
|       |- ids/
|           |- LJ001-0001-ids.npy
|           |- ...
|       |- raw-feats/
|           |- LJ001-0001-raw-feats.npy
|           |- ...
|       |- raw-f0/
|           |- LJ001-0001-raw-f0.npy
|           |- ...
|       |- raw-energies/
|           |- LJ001-0001-raw-energy.npy
|           |- ...
|       |- norm-feats/
|           |- LJ001-0001-norm-feats.npy
|           |- ...
|       |- wavs/
|           |- LJ001-0001-wave.npy
|           |- ...
|   |- valid/
|       |- ids/
|           |- LJ001-0009-ids.npy
|           |- ...
|       |- raw-feats/
|           |- LJ001-0009-raw-feats.npy
|           |- ...
|       |- raw-f0/
|           |- LJ001-0009-raw-f0.npy
|           |- ...
|       |- raw-energies/
|           |- LJ001-0009-raw-energy.npy
|           |- ...
|       |- norm-feats/
|           |- LJ001-0009-norm-feats.npy
|           |- ...
|       |- wavs/
|           |- LJ001-0009-wave.npy
|           |- ...
|   |- stats.npy
|   |- stats_f0.npy
|   |- stats_energy.npy
|   |- train_utt_ids.npy
|   |- valid_utt_ids.npy
|- examples/
|   |- melgan/
|   |- fastspeech/
|   |- tacotron2/
|   ...

stats.npy contains the mean and std from the training split mel spectrograms
stats_energy.npy contains the mean and std of energy values from the training split
stats_f0.npy contains the mean and std of F0 values in the training split
train_utt_ids.npy / valid_utt_ids.npy contain the training and validation utterance IDs, respectively

We use a suffix (ids, raw-feats, raw-energy, raw-f0, norm-feats, or wave) for each input type.
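The directory and suffix convention in the tree above can be captured in a small path helper, which is handy for checking a dump folder produced by a custom pipeline. This is a sketch under the layout shown in this README; the names `FEATURE_LAYOUT`, `feature_path`, and `missing_features` are hypothetical and not part of the library. Note that the `raw-energies` directory uses the `raw-energy` suffix and the `wavs` directory uses `wave`.

```python
from pathlib import Path

# Directory name -> file suffix, read off the dump tree above.
FEATURE_LAYOUT = {
    "ids": "ids",
    "raw-feats": "raw-feats",
    "raw-f0": "raw-f0",
    "raw-energies": "raw-energy",
    "norm-feats": "norm-feats",
    "wavs": "wave",
}


def feature_path(dump_dir, split, feature_dir, utt_id):
    """Build the expected .npy path for one utterance and feature type."""
    if split not in ("train", "valid"):
        raise ValueError(f"unknown split: {split!r}")
    suffix = FEATURE_LAYOUT[feature_dir]
    return Path(dump_dir) / split / feature_dir / f"{utt_id}-{suffix}.npy"


def missing_features(dump_dir, split, utt_ids):
    """List expected feature files that are absent from the dump folder."""
    missing = []
    for utt_id in utt_ids:
        for feature_dir in FEATURE_LAYOUT:
            path = feature_path(dump_dir, split, feature_dir, utt_id)
            if not path.is_file():
                missing.append(path)
    return missing
```

For example, `feature_path("dump_ljspeech", "train", "raw-energies", "LJ001-0001")` points at `dump_ljspeech/train/raw-energies/LJ001-0001-raw-energy.npy`.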

IMPORTANT NOTES:
- This preprocessing step is based on ESPnet, so you can combine all models here with other models from the ESPnet repository.
- Regardless of how your dataset is formatted, the final structure of the dump folder SHOULD follow the above structure to be able to use the training script, or you can modify it yourself 😄.

Training models

To know how to train model from scratch or fine-tune with other datasets/languages, pl

Extension points: exported contracts (how you extend this code)

OnTtsStateListener (Interface)

@author "mailto:xuefeng.ding@outlook.com" "Xuefeng Ding" Created 2020-07-28 14:25 [2 implementers]

examples/android/app/src/main/java/com/tensorspeech/tensorflowtts/dispatcher/OnTtsStateListener.java

Core symbols: most depended-on inside this repo

calculate_3d_loss (called by 21): tensorflow_tts/utils/strategy.py

find_files (called by 20): tensorflow_tts/utils/utils.py

calculate_2d_loss (called by 14): tensorflow_tts/utils/strategy.py

compile (called by 14): tensorflow_tts/trainers/base_trainer.py

synthesis (called by 13): tensorflow_tts/models/mb_melgan.py

return_strategy (called by 12): tensorflow_tts/utils/strategy.py

create (called by 12): examples/melgan/audio_mel_dataset.py

collater (called by 12): examples/melgan/train_melgan.py

Shape

Method 586

Function 134

Class 131

Route 15

Interface 1

Languages

Python 91%

Java 9%

Modules by API surface

tensorflow_tts/trainers/base_trainer.py: 89 symbols

tensorflow_tts/models/tacotron2.py: 72 symbols

tensorflow_tts/models/fastspeech.py: 60 symbols

tensorflow_tts/models/parallel_wavegan.py: 27 symbols

tensorflow_tts/models/melgan.py: 27 symbols

tensorflow_tts/models/hifigan.py: 23 symbols

tensorflow_tts/utils/korean.py: 17 symbols

tensorflow_tts/utils/group_conv.py: 17 symbols

tensorflow_tts/processor/base_processor.py: 17 symbols

examples/fastspeech/fastspeech_dataset.py: 17 symbols

examples/android/app/src/main/java/com/tensorspeech/tensorflowtts/utils/Processor.java: 16 symbols

tensorflow_tts/optimizers/adamweightdecay.py: 15 symbols

For agents

$ claude mcp add TensorFlowTTS \
  -- python -m otcore.mcp_server <graph>
