github.com/AIGC-Audio/AudioGPT @main sqlite

3,356 symbols · 9,697 edges · 260 files · 671 documented (20%)

README

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

We provide our implementation and pretrained models as open source in this repository.

Get Started

Please refer to run.md

Capabilities

The capabilities AudioGPT currently supports are listed below. More models and tasks are coming soon. For prompt examples, refer to asset.

Not every model currently has its own repository.

Speech

Task	Supported Foundation Models	Status
Text-to-Speech	FastSpeech, SyntaSpeech, VITS	Yes (WIP)
Style Transfer	GenerSpeech	Yes
Speech Recognition	Whisper, Conformer	Yes
Speech Enhancement	ConvTasNet	Yes (WIP)
Speech Separation	TF-GridNet	Yes (WIP)
Speech Translation	Multi-decoder	WIP
Mono-to-Binaural	NeuralWarp	Yes

Sing

Task	Supported Foundation Models	Status
Text-to-Sing	DiffSinger, VISinger	Yes (WIP)

Audio

Task	Supported Foundation Models	Status
Text-to-Audio	Make-An-Audio	Yes
Audio Inpainting	Make-An-Audio	Yes
Image-to-Audio	Make-An-Audio	Yes
Sound Detection	Audio-transformer	Yes
Target Sound Detection	TSDNet	Yes
Sound Extraction	LASSNet	Yes

Talking Head

Task	Supported Foundation Models	Status
Talking Head Synthesis	GeneFace	Yes (WIP)

Acknowledgement

We are grateful to the following open-source projects:

ESPNet, NATSpeech, Visual ChatGPT, Hugging Face, LangChain, Stable Diffusion

Core symbols

The symbols most depended on inside this repo:

Symbol	Called by	File
append	587	audio_detection/audio_infer/utils/utilities.py
size	220	NeuralSeq/tasks/base_task.py
load	157	mono2binaural/src/utils.py
update	148	NeuralSeq/utils/__init__.py
_load_metrics	127	audio_detection/audio_infer/utils/plot_statistics.py
register_buffer	101	text_to_audio/Make_An_Audio/ldm/models/diffusion/ddim.py
keys	68	NeuralSeq/modules/parallel_wavegan/utils/utils.py
pad	58	NeuralSeq/utils/text_encoder.py

Shape

Method 2,142

Function 610

Class 604
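As a quick sanity check on the figures above, the three symbol kinds should add up to the 3,356 symbols reported in the page summary, and the 671 documented symbols should round to the stated 20%. The sketch below only re-derives those numbers; the variable names are illustrative and not part of the repository.

```python
# Symbol counts copied from the "Shape" section of this page.
counts = {"Method": 2142, "Function": 610, "Class": 604}

# Total symbols and documented symbols as reported in the page summary.
total = sum(counts.values())
documented = 671

# Share of each symbol kind, in percent, rounded to one decimal place.
shares = {kind: round(100 * n / total, 1) for kind, n in counts.items()}

# Fraction of symbols that carry documentation.
coverage = documented / total

if __name__ == "__main__":
    print(f"total symbols: {total}")
    for kind, pct in shares.items():
        print(f"{kind}: {pct}%")
    print(f"documented: {100 * coverage:.1f}%")
```

Methods make up roughly two thirds of the symbols, with functions and classes at about 18% each.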

Languages

Python 100%

Modules by API surface

audio_detection/target_sound_detection/src/models.py · 104 symbols

audio-chatgpt.py · 88 symbols

text_to_audio/Make_An_Audio/ldm/models/diffusion/ddpm.py · 67 symbols

NeuralSeq/utils/pl_utils.py · 63 symbols

audio_detection/audio_infer/pytorch/models.py · 62 symbols

sound_extraction/model/modules.py · 58 symbols

audio_to_text/captioning/models/encoder.py · 58 symbols

NeuralSeq/utils/text_norm.py · 56 symbols

text_to_audio/Make_An_Audio/ldm/modules/x_transformer.py · 54 symbols

text_to_audio/Make_An_Audio/ldm/modules/diffusionmodules/model.py · 54 symbols

NeuralSeq/modules/GenerSpeech/model/glow_modules.py · 53 symbols

text_to_audio/Make_An_Audio/ldm/modules/encoders/modules.py · 52 symbols

Dependencies (from manifests, versioned)

Cython 0.29.24 · 1×

Resemblyzer 0.1.1.dev0 · 1×

TextGrid 1.5 · 1×

addict 2.4.0 · 1×

albumentations 1.3.0 · 1×

appdirs 1.4.4 · 1×

basicsr 1.4.2 · 1×

beautifulsoup4 4.10.0 · 1×

einops 0.3.0 · 1×

g2p-en 2.1.0 · 1×

google 3.0.0 · 1×

imageio 2.9.0 · 1×
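Because these versions are pinned exactly, it can help to compare them against what is installed in the current environment before running anything. The following is a minimal sketch using only the standard library's `importlib.metadata`. The `PINNED` table copies just the twelve entries shown above (the repository's manifests may list more), and `check_pins` is a hypothetical helper written for this illustration, not something the repository provides.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the dependency list on this page (only the entries shown).
PINNED = {
    "Cython": "0.29.24",
    "Resemblyzer": "0.1.1.dev0",
    "TextGrid": "1.5",
    "addict": "2.4.0",
    "albumentations": "1.3.0",
    "appdirs": "1.4.4",
    "basicsr": "1.4.2",
    "beautifulsoup4": "4.10.0",
    "einops": "0.3.0",
    "g2p-en": "2.1.0",
    "google": "3.0.0",
    "imageio": "2.9.0",
}


def check_pins(pins, lookup=version):
    """Return {name: (expected, installed_or_None)} for every mismatch.

    `lookup` defaults to importlib.metadata.version and can be swapped
    for a stub when testing. This helper is hypothetical, for illustration.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # distribution is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in sorted(check_pins(PINNED).items()):
        found = installed or "not installed"
        print(f"{name}: expected {expected}, found {found}")
```

The comparison is a plain string match, which suits exact pins like these. Range specifiers would need a proper version parser instead.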

For agents

$ claude mcp add AudioGPT \
  -- python -m otcore.mcp_server <graph>
