
github.com/DrewThomasson/ebook2audiobook @v26.6.26 sqlite


1,888 symbols 7,166 edges 293 files 259 documented · 14%

README

📚 ebook2audiobook (E2A)

CPU/GPU Converter from E-Book to audiobook with chapters and metadata

using advanced TTS engines and much more.

Supports voice cloning and 1158 languages!

[!IMPORTANT] This tool is intended for use with non-DRM, legally acquired eBooks only.

The authors are not responsible for any misuse of this software or any resulting legal consequences.

Use this tool responsibly and in accordance with all applicable laws.

Thanks for supporting the ebook2audiobook developers!

Run locally

Run Remotely

GUI Interface

[Images: web GUI demo and three GUI screenshots (GUI Screen 1, GUI Screen 2, GUI Screen 3)]

Demos

New Default Voice Demo

https://github.com/user-attachments/assets/750035dc-e355-46f1-9286-05c1d9e88cea

More Demos

ASMR Voice

https://github.com/user-attachments/assets/68eee9a1-6f71-4903-aacd-47397e47e422

Rainy Day Voice

https://github.com/user-attachments/assets/d25034d9-c77f-43a9-8f14-0d167172b080

Scarlett Voice

https://github.com/user-attachments/assets/b12009ee-ec0d-45ce-a1ef-b3a52b9f8693

David Attenborough Voice

https://github.com/user-attachments/assets/81c4baad-117e-4db5-ac86-efc2b7fea921


Table of Contents

ebook2audiobook
Features
GUI Interface
Demos
Supported Languages
Minimum Requirements
Usage
Run Locally
Run Remotely
Docker
- Steps to Run
- Common Docker Issues
Fine Tuned TTS models
Collection of Fine-Tuned TTS Models
Train XTTSv2
Supported eBook Formats
Output Formats
Revert to older Version
Common Issues
Special Thanks

Features

🔧 TTS Engines supported: XTTSv2, Bark, Fairseq, VITS, Tacotron2, Tortoise, GlowTTS, YourTTS
📚 Convert multiple file formats: .epub, .mobi, .azw3, .fb2, .lrf, .rb, .snb, .tcr, .pdf, .txt, .rtf, .doc, .docx, .html, .odt, .azw, .tiff, .tif, .png, .jpg, .jpeg, .bmp, .zip
💻 Text area for converting a short text directly to audio
🔍 OCR scanning for files with text pages as images
🔊 High-quality text-to-speech, ranging from near-real-time synthesis to near-natural voices
🗣️ Optional voice cloning using your own voice file
🌐 Supports 1158 languages (supported languages list)
💻 Low-resource friendly — runs on 2 GB RAM / 1 GB VRAM (minimum)
🎵 Audiobook output formats: mono or stereo aac, flac, mp3, m4b, m4a, mp4, mov, ogg, wav, webm
🧠 SML tags supported — fine-grained control of breaks, pauses, voice switching and more (see below)
🧩 Optional custom model using your own trained model (XTTSv2, VITS, FAIRSEQ, PIPER, others on request)
🎛️ Fine-tuned preset models trained by the E2A Team

(Contact us if you need additional fine-tuned models, or if you’d like to add yours to the official preset list)

Hardware Requirements

2GB RAM min, 8GB recommended.
1GB VRAM min, 4GB recommended.
Virtualization enabled if running on Windows (Docker only).
CPU, XPU (Intel, AMD, ARM)*.
CUDA, ROCm, JETSON
MPS (Apple Silicon GPU)

* Modern TTS engines are very slow on CPU, so use a lower-quality TTS engine such as YourTTS or Tacotron2.

Supported Languages

Arabic (ar)	Chinese (zh)	English (en)	Spanish (es)
French (fr)	German (de)	Italian (it)	Portuguese (pt)
Polish (pl)	Turkish (tr)	Russian (ru)	Dutch (nl)
Czech (cs)	Japanese (ja)	Hindi (hi)	Bengali (bn)
Hungarian (hu)	Korean (ko)	Vietnamese (vi)	Swedish (sv)
Persian (fa)	Yoruba (yo)	Swahili (sw)	Indonesian (id)
Slovak (sk)	Croatian (hr)	Tamil (ta)	Danish (da)
- Plus 1130 more languages and dialects (see the supported languages list)

Supported eBook Formats

.epub, .pdf, .mobi, .txt, .html, .rtf, .chm, .lit, .pdb, .fb2, .odt, .cbr, .cbz, .prc, .lrf, .pml, .snb, .cbc, .rb, .tcr
Best results: .epub or .mobi for automatic chapter detection

Output and process Formats

.m4b, .m4a, .mp4, .webm, .mov, .mp3, .flac, .wav, .ogg, .aac
Process format can be changed in lib/conf.py

SML tags available

[break] — silence (random range 0.3–0.6 sec.)
[pause] — silence (random range 1.0–1.6 sec.)
[pause:N] — fixed pause (N sec.)
[voice:/path/to/voice/file]...[/voice] — switch voice from default or selected voice from GUI/CLI

See our other repo, E2A-SML, which adds SML tags to your ebook automatically.
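To make the tag syntax listed above concrete, here is a minimal Python sketch that splits a string into plain text and SML tags. It is only an illustration: `tokenize_sml`, `Token`, and the regular expression are hypothetical names written for this example, not the parser ebook2audiobook uses, and accepting a decimal value for `N` in `[pause:N]` is an assumption.

```python
import re
from dataclasses import dataclass

# Tag grammar taken from the list above. The tokenizer is a hypothetical
# illustration, not the parser used by ebook2audiobook.
SML_TAG = re.compile(r"\[(break|pause(?::(\d+(?:\.\d+)?))?|voice:([^\]]+)|/voice)\]")


@dataclass
class Token:
    kind: str            # "text", "break", "pause", "voice_start" or "voice_end"
    value: object = None  # text content, pause seconds, or voice file path


def tokenize_sml(text):
    """Split text into plain-text tokens and SML tag tokens, in order."""
    tokens = []
    pos = 0
    for match in SML_TAG.finditer(text):
        if match.start() > pos:
            tokens.append(Token("text", text[pos:match.start()]))
        tag = match.group(1)
        if tag == "break":
            tokens.append(Token("break"))
        elif tag.startswith("pause"):
            # [pause] has no fixed duration; [pause:N] carries N seconds.
            seconds = float(match.group(2)) if match.group(2) else None
            tokens.append(Token("pause", seconds))
        elif tag.startswith("voice:"):
            tokens.append(Token("voice_start", match.group(3)))
        else:
            tokens.append(Token("voice_end"))
        pos = match.end()
    if pos < len(text):
        tokens.append(Token("text", text[pos:]))
    return tokens
```

For example, `tokenize_sml("Intro[pause:2]Next")` yields a text token, a pause token with value `2.0`, and another text token. A real converter would then turn the break and pause tokens into silence of the stated durations.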

[!IMPORTANT] **Before posting an installation or bug issue, search the open and closed issues tabs carefully

to make sure your issue has not already been reported.**

[!NOTE] **The EPUB format has no standard way of marking what is a chapter, paragraph, preface, etc.

You should therefore first manually remove any text you don't want converted to audio.**

Instructions

Clone the repo:
    git clone https://github.com/DrewThomasson/ebook2audiobook.git
    cd ebook2audiobook
Install / Run ebook2audiobook:
Linux/MacOS:
    ./ebook2audiobook.command
Note for MacOS users: Homebrew is installed in order to install missing programs.
Mac Launcher:
    Double-click Mac Ebook2Audiobook Launcher.command
Windows:
    ebook2audiobook.cmd
or double-click ebook2audiobook.cmd

Note for Windows users: Scoop is installed in order to install missing programs without administrator privileges.
Open the Web App: Click the URL provided in the terminal to access the web app and convert eBooks. http://localhost:7860/
For a Public Link:
    ./ebook2audiobook.command --share (Linux/MacOS)
    ebook2audiobook.cmd --share (Windows)
    python app.py --share (all OS)

[!IMPORTANT] **If the script is stopped and run again, you need to refresh the Gradio GUI in your browser

so that the web page can reconnect to the new connection socket.**

Basic Usage

Linux/MacOS:
    ./ebook2audiobook.command --headless --ebook <path_to_ebook_file> --voice <path_to_voice_file> --language <language_code>
Windows:
    ebook2audiobook.cmd --headless --ebook <path_to_ebook_file> --voice <path_to_voice_file> --language <language_code>
[--ebook]: Path to your eBook file
[--voice]: Voice cloning file path (optional)
[--language]: Language code in ISO-639-3 (e.g., ita for Italian, eng for English, deu for German).

The default language is eng, as set in ./lib/lang.py, so --language is optional when converting in the default language.

Two-letter ISO-639-1 codes are also supported.
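If you want to drive the headless mode from a script, the command line above can be assembled programmatically. The sketch below only builds the argument list and does not start a conversion. `build_headless_command` is a hypothetical helper written for this example, and it assumes `app.py` is invoked directly with the flags shown above.

```python
import sys


def build_headless_command(ebook, voice=None, language=None, entry_point="app.py"):
    """Return the argv list for a headless conversion (nothing is executed here)."""
    cmd = [sys.executable, entry_point, "--headless", "--ebook", str(ebook)]
    if voice is not None:
        # --voice is optional; the default voice is used when it is omitted.
        cmd += ["--voice", str(voice)]
    if language is not None:
        # --language is optional; the default language applies when it is omitted.
        cmd += ["--language", language]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Passing a list rather than a single shell string avoids quoting problems with paths that contain spaces.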

Example of Custom Model Zip Upload

(The custom model must be a .zip file containing the mandatory model files. Example for XTTSv2: config.json, model.pth, vocab.json and ref.wav)

Linux/MacOS:
    ./ebook2audiobook.command --headless --ebook <ebook_file_path> --language <language> --custom_model <custom_model_path>
Windows:
    ebook2audiobook.cmd --headless --ebook <ebook_file_path> --language <language> --custom_model <custom_model_path>
Note: the ref.wav of your custom model is always the voice selected for the conversion.

<custom_model_path>: Path to the model_name.zip file, which must contain all the files that are mandatory for the chosen TTS engine

(see ./lib/models.py).
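Before uploading a custom model, it can help to check that the archive actually contains the mandatory files. The following sketch does this for the XTTSv2 file names given above. `missing_model_files` is a hypothetical helper, not part of ebook2audiobook. The authoritative per-engine list is in ./lib/models.py, and matching on base names (so that files inside a folder in the zip still count) is an assumption about what the tool accepts.

```python
import zipfile
from pathlib import PurePosixPath

# File names taken from the XTTSv2 example above. Other engines need other
# files; the authoritative list is in ./lib/models.py (not reproduced here).
XTTSV2_REQUIRED = {"config.json", "model.pth", "vocab.json", "ref.wav"}


def missing_model_files(zip_source, required=XTTSV2_REQUIRED):
    """Return the sorted required file names that are absent from the archive."""
    with zipfile.ZipFile(zip_source) as archive:
        # Assumption: compare base names only, so nested folders are tolerated.
        present = {PurePosixPath(name).name for name in archive.namelist()}
    return sorted(set(required) - present)
```

`zip_source` can be a path or an open binary file object. An empty list means every required file name was found.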

For Detailed Guide with list of all Parameters to use

Linux/MacOS:
    ./ebook2audiobook.command --help
Windows:
    ebook2audiobook.cmd --help
Or, for all OS:
    python app.py --help

```bash
usage: app.py [-h] [--session SESSION] [--share] [--headless] [--ebook EBOOK]
              [--ebooks_dir EBOOKS_DIR] [--language LANGUAGE] [--voice VOICE]
              [--voice_map VOICE_MAP]
              [--device {CPU,CUDA,MPS,ROCM,XPU,JETSON}]
              [--tts_engine {XTTS,BARK,VITS,FAIRSEQ,TACOTRON,YOURTTS,xtts,bark,vits,fairseq,tacotron,yourtts}]
              [--custom_model CUSTOM_MODEL] [--fine_tuned FINE_TUNED]
              [--output_format OUTPUT_FORMAT]
              [--output_channel OUTPUT_CHANNEL] [--temperature TEMPERATURE]
              [--length_penalty LENGTH_PENALTY] [--num_beams NUM_BEAMS]
              [--repetition_penalty REPETITION_PENALTY] [--top_k TOP_K]
              [--top_p TOP_P] [--speed SPEED] [--enable_text_splitting]
              [--text_temp TEXT_TEMP] [--waveform_temp WAVEFORM_TEMP]
              [--output_dir OUTPUT_DIR] [--version]

Convert eBooks to Audiobooks using a Text-to-Speech model. You can either
launch the Gradio interface or run the script in headless mode for direct
conversion.

options:
  -h, --help            show this help message and exit
  --session SESSION     Session to resume the conversion in case of
                        interruption, crash, or reuse of custom models and
                        custom cloning voices.

**** The following option is for gradio/gui mode only:
  --share               (Optional) Enable a public shareable Gradio link.

**** The following options are for --headless mode only:
  --headless            Run the script in headless mode
  --ebook EBOOK         Path to the ebook file for conversion. Cannot be used
                        when --ebooks_dir is present.
  --ebooks_dir EBOOKS_DIR
                        Relative or absolute path of the directory containing
                        the files to convert. Cannot be used when --ebook is
                        present.
  --text TEXT           Raw text for conversion. Cannot be used when --ebook
                        or --ebooks_dir is present.
  --language LANGUAGE   Language of the e-book. Default language is set in
                        ./lib/lang.py sed as default if not present. All
                        compatible language codes are in ./lib/lang.py

optional parameters:
  --translate ISO3      (Optional) Translate ebook to a target language (ISO
                        639-3 code, e.g. eng, fra, deu) before TTS synthesis.
                        Uses argostranslate. The target language becomes the
                        effective TTS language for the run. A copy of the
                        source ebook is made with the _ suffix so translated
                        and non-translated outputs stay isolated (independent
                        process folder, audio chunks, and final file).
  --voice VOICE         (Optional) Path to the voice cloning file for TTS
                        engine. Uses the default voice if not present.
  --voice_map VOICE_MAP
                        (Optional, --ebooks_dir only) Path to a JSON file
                        mapping ebook path -> voice path. Each entry overrides
                        --voice for that specific ebook. Missing/null entries
                        fall back to --voice. Keys may be absolute paths or
                        basenames. Example:
```

Core symbols most depended-on inside this repo

join · called by 579 · ext/py/num2words/num2words/lang_ID.py

update · called by 356 · ext/py/demucs/demucs/ema.py, ext/py/demucs/demucs/audio.py

write · called by 63 · lib/classes/std_filter.py

update · called by 60 · components/E2A-SML/booknlp/english/gender_inference_model_1.py, components/Universal_TTS_Finetune/piper/piper/voice.py

Shape

Method 1,027

Function 643

Class 217

Route 1

Languages

Python 100%

Modules by API surface

lib/core.py · 109 symbols

lib/gradio.py · 72 symbols

components/Universal_TTS_Finetune/utils/pipeline.py · 52 symbols

lib/classes/device_installer.py · 37 symbols

ext/py/demucs/demucs/transformer.py · 37 symbols

components/Universal_TTS_Finetune/web_gui.py · 35 symbols

components/Universal_TTS_Finetune/piper/piper_train/vits/modules.py · 34 symbols

components/Universal_TTS_Finetune/piper/piper_train/vits/models.py · 33 symbols

components/Universal_TTS_Finetune/utils/tokenizer.py · 30 symbols

ext/py/demucs/demucs/repo.py · 28 symbols

ext/py/num2words/num2words/base.py · 27 symbols

lib/classes/tts_engines/common/utils.py · 26 symbols

Dependencies from manifests, versioned

Cython 0.29.0 · 1×

accelerate 0.33.0 · 1×

argostranslate 1.11.0 · 1×

audiocraft 1.3.0 · 1×

av 11.0.0 · 1×

blis 0.7.11 · 1×

catalogue 2.0.10 · 1×

cutlet · 1×

faster_whisper 1.0.2 · 1×

ffmpeg-python 0.2.0 · 1×

gradio 5.49.1 · 1×

huggingface-hub 0.25.2 · 1×

For agents

$ claude mcp add ebook2audiobook \
  -- python -m otcore.mcp_server <graph>
