github.com/AIGC-Audio/AudioGPT @main sqlite
3,356 symbols 9,697 edges 260 files 671 documented · 20%
README

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

We provide our implementation and pretrained models as open source in this repository.

Get Started

Please refer to run.md

Capabilities

The capabilities AudioGPT currently supports are listed below. More models and tasks are coming soon. For prompt examples, refer to asset.

Not every model currently has a public repository.

Speech

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Text-to-Speech | FastSpeech, SyntaSpeech, VITS | Yes (WIP) |
| Style Transfer | GenerSpeech | Yes |
| Speech Recognition | whisper, Conformer | Yes |
| Speech Enhancement | ConvTasNet | Yes (WIP) |
| Speech Separation | TF-GridNet | Yes (WIP) |
| Speech Translation | Multi-decoder | WIP |
| Mono-to-Binaural | NeuralWarp | Yes |

Sing

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Text-to-Sing | DiffSinger, VISinger | Yes (WIP) |

Audio

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Text-to-Audio | Make-An-Audio | Yes |
| Audio Inpainting | Make-An-Audio | Yes |
| Image-to-Audio | Make-An-Audio | Yes |
| Sound Detection | Audio-transformer | Yes |
| Target Sound Detection | TSDNet | Yes |
| Sound Extraction | LASSNet | Yes |

Talking Head

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Talking Head Synthesis | GeneFace | Yes (WIP) |

Acknowledgement

We are grateful to the following open-source projects:

ESPNet, NATSpeech, Visual ChatGPT, Hugging Face, LangChain, Stable Diffusion

Core symbols (most depended-on inside this repo)

append · called by 587 · audio_detection/audio_infer/utils/utilities.py
size · called by 220 · NeuralSeq/tasks/base_task.py
load · called by 157 · mono2binaural/src/utils.py
update · called by 148 · NeuralSeq/utils/__init__.py
_load_metrics · called by 127 · audio_detection/audio_infer/utils/plot_statistics.py
register_buffer · called by 101 · text_to_audio/Make_An_Audio/ldm/models/diffusion/ddim.py
keys · called by 68 · NeuralSeq/modules/parallel_wavegan/utils/utils.py
pad · called by 58 · NeuralSeq/utils/text_encoder.py

Shape

Method 2,142
Function 610
Class 604
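
The three counts above add up to the 3,356 symbols reported for the repository, and the 671 documented symbols work out to roughly 20%. A quick sanity check of that arithmetic, with variable names chosen here purely for illustration:

```python
# Symbol counts by kind, copied from the Shape listing.
shape = {"Method": 2142, "Function": 610, "Class": 604}

total_symbols = sum(shape.values())
documented = 671  # documented symbols reported in the index summary

# Share of symbols that carry documentation, as a percentage.
coverage_pct = 100 * documented / total_symbols

print(total_symbols)        # 3356
print(round(coverage_pct))  # 20
```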

Languages

Python 100%

Modules by API surface

audio_detection/target_sound_detection/src/models.py · 104 symbols
audio-chatgpt.py · 88 symbols
text_to_audio/Make_An_Audio/ldm/models/diffusion/ddpm.py · 67 symbols
NeuralSeq/utils/pl_utils.py · 63 symbols
audio_detection/audio_infer/pytorch/models.py · 62 symbols
sound_extraction/model/modules.py · 58 symbols
audio_to_text/captioning/models/encoder.py · 58 symbols
NeuralSeq/utils/text_norm.py · 56 symbols
text_to_audio/Make_An_Audio/ldm/modules/x_transformer.py · 54 symbols
text_to_audio/Make_An_Audio/ldm/modules/diffusionmodules/model.py · 54 symbols
NeuralSeq/modules/GenerSpeech/model/glow_modules.py · 53 symbols
text_to_audio/Make_An_Audio/ldm/modules/encoders/modules.py · 52 symbols

Dependencies (from manifests, versioned)

Cython 0.29.24 · 1×
Resemblyzer 0.1.1.dev0 · 1×
TextGrid 1.5 · 1×
addict 2.4.0 · 1×
albumentations 1.3.0 · 1×
appdirs 1.4.4 · 1×
basicsr 1.4.2 · 1×
beautifulsoup4 4.10.0 · 1×
einops 0.3.0 · 1×
g2p-en 2.1.0 · 1×
google 3.0.0 · 1×
imageio 2.9.0 · 1×
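
Because the versions above are pinned, it can help to check a local environment against them before running anything. The following is a minimal sketch, not part of the repository: it takes a subset of the pins listed here, renders them as pip-style `name==version` lines, and compares them with what is installed. The helper names `to_requirement_lines` and `check_installed` are hypothetical, and whether the repository's manifests use this exact `==` syntax is an assumption.

```python
from importlib import metadata

# Pinned versions copied from the dependency list (a subset, for illustration).
PINNED = {
    "Cython": "0.29.24",
    "einops": "0.3.0",
    "g2p-en": "2.1.0",
    "imageio": "2.9.0",
}


def to_requirement_lines(pins):
    """Render name/version pairs as pip-style 'name==version' lines."""
    return [f"{name}=={version}" for name, version in pins.items()]


def check_installed(pins):
    """Map each package to 'ok', 'missing', or the mismatching installed version."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if found == wanted else found
    return report


print("\n".join(to_requirement_lines(PINNED)))
print(check_installed(PINNED))
```

Any entry in the report that is neither `ok` nor `missing` is the installed version that differs from the pin.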

For agents

$ claude mcp add AudioGPT \
  -- python -m otcore.mcp_server <graph>