github.com/FoundationVision/LlamaGen @main sqlite

553 symbols · 1,882 edges · 66 files · 82 documented (15%)
README

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

This repo contains the pre-trained model weights and the PyTorch (torch>=2.1.0) training and sampling code used in

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

Peize Sun, Yi Jiang, Shoufa Chen, Shilong Zhang, Bingyue Peng, Ping Luo, Zehuan Yuan

HKU, ByteDance

You can find more visualizations on the project page.

🔥 Update

  • [2024.06.28] Image tokenizers and AR models for text-conditional image generation are released! Try it!
  • [2024.06.15] All models ranging from 100M to 3B parameters are supported by vLLM!
  • [2024.06.11] Image tokenizers and AR models for class-conditional image generation are released!
  • [2024.06.11] Code and demo are released!

🌿 Introduction

We introduce LlamaGen, a new family of image generation models that apply the original next-token prediction paradigm of large language models to the visual generation domain. It is an affirmative answer to whether vanilla autoregressive models, e.g., Llama, without inductive biases on visual signals, can achieve state-of-the-art image generation performance if scaled properly. We reexamine the design space of image tokenizers, the scalability properties of image generation models, and their training data quality.

In this repo, we release:

  • Two image tokenizers with downsample ratios of 16 and 8.
  • Seven class-conditional generation models ranging from 100M to 3B parameters.
  • Two text-conditional generation models of 700M parameters.
  • Online demos in Hugging Face Spaces for running the pre-trained models.
  • Support for the vLLM serving framework, which enables a 300% - 400% speedup.

🦄 Class-conditional image generation on ImageNet

VQ-VAE models

Method params tokens rFID (256x256) weight
vq_ds16_c2i 72M 16x16 2.19 vq_ds16_c2i.pt
vq_ds16_c2i 72M 24x24 0.94 above
vq_ds16_c2i 72M 32x32 0.70 above
vq_ds8_c2i 70M 32x32 0.59 vq_ds8_c2i.pt
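
The token counts in the table are consistent with dividing the input side length by the tokenizer's downsample ratio. A minimal sketch of that arithmetic follows. `token_grid` and `num_tokens` are hypothetical helpers, not functions from this repo, and the 384 and 512 input sizes behind the 24x24 and 32x32 rows are assumptions inferred from the `--image-size` values used in the demo commands.

```python
def token_grid(image_size: int, downsample: int) -> tuple[int, int]:
    """Return the (rows, cols) token grid for a square input image.

    Assumes the tokenizer reduces each spatial side by `downsample`.
    """
    if image_size % downsample != 0:
        raise ValueError("image_size must be divisible by the downsample ratio")
    side = image_size // downsample
    return side, side


def num_tokens(image_size: int, downsample: int) -> int:
    """Total number of tokens the AR model would have to predict."""
    rows, cols = token_grid(image_size, downsample)
    return rows * cols


# (image size, downsample ratio) pairs assumed to match the table rows.
for size, ds in [(256, 16), (384, 16), (512, 16), (256, 8)]:
    rows, cols = token_grid(size, ds)
    print(f"{size}px, ds{ds}: {rows}x{cols} = {num_tokens(size, ds)} tokens")
```

Under this assumption, the downsample-8 tokenizer at 256 pixels and the downsample-16 tokenizer at 512 pixels both yield a 32x32 grid, so the sequence length the AR model sees is four times that of the 16x16 setting.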

AR models

Method | params | training | tokens | FID (256x256) | weight
--- |:---:|:---:|:---:|:---:|:---:|
LlamaGen-B | 111M | DDP | 16x16 | 5.46 | c2i_B_256.pt
LlamaGen-B | 111M | DDP | 24x24 | 6.09 | c2i_B_384.pt
LlamaGen-L | 343M | DDP | 16x16 | 3.80 | c2i_L_256.pt
LlamaGen-L | 343M | DDP | 24x24 | 3.07 | c2i_L_384.pt
LlamaGen-XL | 775M | DDP | 24x24 | 2.62 | c2i_XL_384.pt
LlamaGen-XXL | 1.4B | FSDP | 24x24 | 2.34 | c2i_XXL_384.pt
LlamaGen-3B | 3.1B | FSDP | 24x24 | 2.18 | c2i_3B_384.pt

Demo

Please download models, put them in the folder ./pretrained_models, and run

python3 autoregressive/sample/sample_c2i.py --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --gpt-ckpt ./pretrained_models/c2i_L_384.pt --gpt-model GPT-L --image-size 384
# or
python3 autoregressive/sample/sample_c2i.py --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --gpt-ckpt ./pretrained_models/c2i_XXL_384.pt --gpt-model GPT-XXL --from-fsdp --image-size 384

The generated images will be saved to sample_c2i.png.
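
The two commands above differ only in the checkpoint, the model name, and the `--from-fsdp` flag, which appears on the checkpoint listed as FSDP-trained. As a rough sketch, the argument list can be assembled in Python. `build_c2i_command` and the `CHECKPOINTS` mapping are hypothetical and cover only the two variants shown; every flag and file name is copied from the commands above.

```python
import shlex

# Only the two variants shown in the demo commands (values copied from them).
CHECKPOINTS = {
    "GPT-L": {"ckpt": "c2i_L_384.pt", "from_fsdp": False},
    "GPT-XXL": {"ckpt": "c2i_XXL_384.pt", "from_fsdp": True},
}


def build_c2i_command(gpt_model, image_size=384, model_dir="./pretrained_models"):
    """Return the argv list for the class-conditional sampling script."""
    spec = CHECKPOINTS[gpt_model]
    argv = [
        "python3", "autoregressive/sample/sample_c2i.py",
        "--vq-ckpt", f"{model_dir}/vq_ds16_c2i.pt",
        "--gpt-ckpt", f"{model_dir}/{spec['ckpt']}",
        "--gpt-model", gpt_model,
    ]
    if spec["from_fsdp"]:
        # The demo passes this flag for the FSDP-trained checkpoint.
        argv.append("--from-fsdp")
    argv += ["--image-size", str(image_size)]
    return argv


print(shlex.join(build_c2i_command("GPT-XXL")))
```

The returned list can be passed to `subprocess.run(argv, check=True)` from the repository root, which avoids shell quoting issues.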

Gradio Demo

You can use our online Gradio demo on Hugging Face Spaces, or run Gradio locally:

python app.py

🚀 Text-conditional image generation

VQ-VAE models

Method params tokens data weight
vq_ds16_t2i 72M 16x16 LAION COCO (50M) + internal data (10M) vq_ds16_t2i.pt

AR models

Method params tokens data weight
LlamaGen-XL 775M 16x16 LAION COCO (50M) t2i_XL_stage1_256.pt
LlamaGen-XL 775M 32x32 internal data (10M) t2i_XL_stage2_512.pt

Demo

Before running the demo, please refer to the language readme to install the required packages and language models.

Please download models, put them in the folder ./pretrained_models, and run

python3 autoregressive/sample/sample_t2i.py --vq-ckpt ./pretrained_models/vq_ds16_t2i.pt --gpt-ckpt ./pretrained_models/t2i_XL_stage1_256.pt --gpt-model GPT-XL --image-size 256
# or
python3 autoregressive/sample/sample_t2i.py --vq-ckpt ./pretrained_models/vq_ds16_t2i.pt --gpt-ckpt ./pretrained_models/t2i_XL_stage2_512.pt --gpt-model GPT-XL --image-size 512

The generated images will be saved to sample_t2i.png.
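
Both commands expect the tokenizer and AR checkpoints to be present under ./pretrained_models, so a quick check before sampling can save a failed run. The sketch below is illustrative only: `missing_checkpoints` is a hypothetical helper, not part of this repo, and the file names are copied from the commands above.

```python
from pathlib import Path

# Checkpoint names taken from the text-conditional demo commands.
T2I_FILES = ["vq_ds16_t2i.pt", "t2i_XL_stage1_256.pt", "t2i_XL_stage2_512.pt"]


def missing_checkpoints(model_dir, names):
    """Return the names that are not regular files inside model_dir."""
    root = Path(model_dir)
    return [name for name in names if not (root / name).is_file()]


missing = missing_checkpoints("./pretrained_models", T2I_FILES)
if missing:
    print("Missing checkpoints:", ", ".join(missing))
else:
    print("All text-conditional checkpoints found.")
```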

Local Gradio Demo

⚡ Serving

We use the vLLM serving framework to enable higher throughput. Please refer to the serving readme to install the required packages.

python3 autoregressive/serve/sample_c2i.py --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --gpt-ckpt ./pretrained_models/c2i_XXL_384.pt --gpt-model GPT-XXL --from-fsdp --image-size 384

The generated images will be saved to sample_c2i_vllm.png.

Getting Started

See Getting Started for installation, training and evaluation.

License

The majority of this project is licensed under the MIT License. Portions of the project are available under the separate licenses of the referenced projects, as detailed in the corresponding files.

BibTeX

@article{sun2024autoregressive,
  title={Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation},
  author={Sun, Peize and Jiang, Yi and Chen, Shoufa and Zhang, Shilong and Peng, Bingyue and Luo, Ping and Yuan, Zehuan},
  journal={arXiv preprint arXiv:2406.06525},
  year={2024}
}

Core symbols most depended-on inside this repo

Symbol | called by | file
--- |:---:|:---:|
print | 137 | utils/distributed.py
load | 40 | evaluations/c2i/evaluator.py
update | 22 | autoregressive/models/gpt.py
encode | 11 | tokenizer/vqgan/model.py
decode_code | 9 | tokenizer/vqgan/model.py
from_pretrained | 9 | tokenizer/tokenizer_image/lpips.py
create_logger | 8 | utils/logger.py
build_dataset | 7 | dataset/build.py

Shape

Method 291
Function 174
Class 88

Languages

Python 100%

Modules by API surface

evaluations/c2i/evaluator.py 51 symbols
autoregressive/models/gpt.py 47 symbols
tokenizer/tokenizer_image/vq_model.py 35 symbols
autoregressive/serve/gpt_model.py 35 symbols
autoregressive/serve/model_runner.py 33 symbols
autoregressive/serve/llm_engine.py 24 symbols
tokenizer/vqgan/layer.py 20 symbols
tokenizer/tokenizer_image/discriminator.py 20 symbols
autoregressive/serve/sampler.py 20 symbols
autoregressive/serve/worker.py 19 symbols
tokenizer/tokenizer_image/lpips.py 18 symbols
evaluations/t2i/evaluation.py 14 symbols

Dependencies from manifests, versioned

torch 2.1.0 · 1×

For agents

$ claude mcp add LlamaGen \
  -- python -m otcore.mcp_server <graph>