
github.com/POSTECH-CVLab/PyTorch-StudioGAN @v.0.4.0 sqlite


670 symbols 1,441 edges 56 files 93 documented · 14%

README

StudioGAN is a PyTorch library providing implementations of representative Generative Adversarial Networks (GANs) for conditional/unconditional image generation. StudioGAN aims to offer an identical playground for modern GANs so that machine learning researchers can readily compare and analyze a new idea.

Moreover, StudioGAN provides an unprecedented-scale benchmark for generative models. The benchmark includes results from GANs (BigGAN-Deep, StyleGAN-XL), auto-regressive models (MaskGIT, RQ-Transformer), and Diffusion models (LSGM++, CLD-SGM, ADM-G-U).

News

Our new paper "StudioGAN: A Taxonomy and Benchmark of GANs for Image Synthesis" is made public on arXiv.
StudioGAN provides implementations of 7 GAN architectures, 9 conditioning methods, 4 adversarial losses, 13 regularization modules, 3 differentiable augmentations, 8 evaluation metrics, and 5 evaluation backbones.
StudioGAN supports both clean and architecture-friendly metrics (IS, FID, PRDC, IFID) with a comprehensive benchmark.
StudioGAN provides wandb logs and pre-trained models (will be ready soon).

Release Notes (v.0.4.0)

We checked the reproducibility of implemented GANs.
We provide Baby, Papa, and Grandpa ImageNet datasets whose images are processed with an anti-aliasing, high-quality resizer.
StudioGAN provides a dedicated benchmark on standard datasets (CIFAR10, ImageNet, AFHQv2, and FFHQ).
StudioGAN supports InceptionV3, ResNet50, SwAV, DINO, and Swin Transformer backbones for GAN evaluation.

Features

Coverage: StudioGAN is a self-contained library that provides 7 GAN architectures, 9 conditioning methods, 4 adversarial losses, 13 regularization modules, 6 augmentation modules, 8 evaluation metrics, and 5 evaluation backbones. Among these configurations, we formulate 30 GANs as representatives.
Flexibility: Each modularized option is managed through a configuration system that works through a YAML file, so users can train a large combination of GANs by mix-matching distinct options.
Reproducibility: With StudioGAN, users can compare and debug various GANs in a unified computing environment without worrying about hidden details and tricks.
Plentifulness: StudioGAN provides a large collection of pre-trained GAN models, training logs, and evaluation results.
Versatility: StudioGAN supports 5 types of acceleration methods with synchronized batch normalization for training: single-GPU training, data-parallel training (DP), distributed data-parallel training (DDP), multi-node distributed data-parallel training (MDDP), and mixed-precision training.
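The configuration-driven design described under Flexibility can be pictured as a set of defaults that a per-model YAML file overrides. The sketch below is only an illustration: the key names (`MODEL`, `LOSS`, `g_cond_mtd`, and so on) and the `merge_config` helper are hypothetical and are not taken from StudioGAN's configuration schema, and a plain dictionary stands in for the parsed YAML file.

```python
import copy

# Hypothetical defaults; the key names are illustrative, not StudioGAN's schema.
DEFAULTS = {
    "MODEL": {"backbone": "resnet", "g_cond_mtd": "N/A", "d_cond_mtd": "N/A"},
    "LOSS": {"adv_loss": "vanilla"},
}


def merge_config(defaults, overrides):
    """Return a copy of `defaults` with `overrides` applied recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Descend into nested sections so untouched keys keep their defaults.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Stand-in for the contents of a parsed YAML file describing one GAN variant.
overrides = {
    "MODEL": {"g_cond_mtd": "cBN", "d_cond_mtd": "PD"},
    "LOSS": {"adv_loss": "hinge"},
}
config = merge_config(DEFAULTS, overrides)
```

Mixing options then amounts to writing a different override file; keys that the file does not mention (here `backbone`) keep their default values.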

Implemented GANs

Method	Venue	Architecture	GC	DC	Loss	EMA
DCGAN	arXiv'15	DCGAN/ResNetGAN¹	N/A	N/A	Vanilla	False
InfoGAN	NIPS'16	DCGAN/ResNetGAN¹	N/A	N/A	Vanilla	False
LSGAN	ICCV'17	DCGAN/ResNetGAN¹	N/A	N/A	Least Square	False
GGAN	arXiv'17	DCGAN/ResNetGAN¹	N/A	N/A	Hinge	False
WGAN-WC	ICLR'17	ResNetGAN	N/A	N/A	Wasserstein	False
WGAN-GP	NIPS'17	ResNetGAN	N/A	N/A	Wasserstein	False
WGAN-DRA	arXiv'17	ResNetGAN	N/A	N/A	Wasserstein	False
ACGAN-Mod²	-	ResNetGAN	cBN	AC	Hinge	False
PDGAN	ICLR'18	ResNetGAN	cBN	PD	Hinge	False
SNGAN	ICLR'18	ResNetGAN	cBN	PD	Hinge	False
SAGAN	ICML'19	ResNetGAN	cBN	PD	Hinge	False
TACGAN	Neurips'19	BigGAN	cBN	TAC	Hinge	True
LGAN	ICML'19	ResNetGAN	N/A	N/A	Vanilla	False
Unconditional BigGAN	ICLR'19	BigGAN	N/A	N/A	Hinge	True
BigGAN	ICLR'19	BigGAN	cBN	PD	Hinge	True
BigGAN-Deep-CompareGAN	ICLR'19	BigGAN-Deep CompareGAN	cBN	PD	Hinge	True
BigGAN-Deep-StudioGAN	-	BigGAN-Deep StudioGAN	cBN	PD	Hinge	True
StyleGAN2	CVPR'20	StyleGAN2	cAdaIN	SPD	Logistic	True
CRGAN	ICLR'20	BigGAN	cBN	PD	Hinge	True
ICRGAN	AAAI'21	BigGAN	cBN	PD	Hinge	True
LOGAN	arXiv'19	ResNetGAN	cBN	PD	Hinge	True
ContraGAN	Neurips'20	BigGAN	cBN	2C	Hinge	True
MHGAN	WACV'21	BigGAN	cBN	MH	MH	True
BigGAN + DiffAugment	Neurips'20	BigGAN	cBN	PD	Hinge	True
StyleGAN2 + ADA	Neurips'20	StyleGAN2	cAdaIN	SPD	Logistic	True
BigGAN + LeCam	CVPR'21	BigGAN	cBN	PD	Hinge	True
ADCGAN	arXiv'21	BigGAN	cBN	ADC	Hinge	True
ReACGAN	Neurips'21	BigGAN	cBN	D2D-CE	Hinge	True
StyleGAN2 + APA	Neurips'21	StyleGAN2	cAdaIN	SPD	Logistic	True
StyleGAN3-t	Neurips'21	StyleGAN3	cAdaIN	SPD	Logistic	True
StyleGAN3-r	Neurips'21	StyleGAN3	cAdaIN	SPD	Logistic	True

GC/DC indicate how label information is injected into the Generator or Discriminator, respectively.

EMA: Exponential Moving Average update to the generator. cBN: conditional Batch Normalization. cAdaIN: conditional version of Adaptive Instance Normalization. AC: Auxiliary Classifier. PD: Projection Discriminator. TAC: Twin Auxiliary Classifier. SPD: Modified PD for StyleGAN. 2C: Conditional Contrastive loss. MH: Multi-Hinge loss. ADC: Auxiliary Discriminative Classifier. D2D-CE: Data-to-Data Cross-Entropy.
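The EMA column refers to keeping a running average of the generator weights alongside the trained weights. A minimal sketch of that update rule follows, using plain Python floats in place of tensors; the function name `ema_update` and the default `decay` value are assumptions made for illustration, not StudioGAN's code or settings.

```python
def ema_update(ema_params, params, decay=0.999):
    """Blend the current weights into their running average, in place.

    Each averaged weight moves a fraction (1 - decay) toward the current one.
    """
    for i, (avg, cur) in enumerate(zip(ema_params, params)):
        ema_params[i] = decay * avg + (1.0 - decay) * cur
    return ema_params


# Toy example: two scalar "weights", updated three times toward [1.0, 1.0].
ema = [0.0, 1.0]
for _ in range(3):
    ema_update(ema, [1.0, 1.0], decay=0.5)
```

With `decay=0.5` the first averaged weight moves 0.0, 0.5, 0.75, 0.875 across the three updates, while the second stays at 1.0. A decay close to 1 makes the average change slowly, which smooths out step-to-step fluctuations in the trained weights.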

Evaluation Metrics

Method	Venue	Architecture
Inception Score (IS)	Neurips'16	InceptionV3
Frechet Inception Distance (FID)	Neurips'17	InceptionV3
Improved Precision & Recall	Neurips'19	InceptionV3
Classifier Accuracy Score (CAS)	Neurips'19	InceptionV3
Density & Coverage	ICML'20	InceptionV3
Intra-class FID	-	InceptionV3
SwAV FID	ICLR'21	SwAV
Clean metrics (IS, FID, PRDC)	CVPR'22	InceptionV3
Architecture-friendly metrics (IS, FID, PRDC)	arXiv'22	Not limited to InceptionV3
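For orientation, FID fits a Gaussian to backbone features of real images and another to features of generated images, then measures the Fréchet distance between them: ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)). The sketch below assumes diagonal covariances so that it needs no linear-algebra library; this is a simplification for illustration, since the metric itself uses full covariance matrices and a matrix square root. The function name `frechet_distance_diag` is hypothetical.

```python
import math


def frechet_distance_diag(mu1, var1, mu2, var2):
    """Fréchet distance between two Gaussians with diagonal covariances.

    mu1/mu2 are per-dimension means and var1/var2 per-dimension variances.
    With diagonal covariances the trace term reduces to a per-dimension sum.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum(
        v1 + v2 - 2.0 * math.sqrt(v1 * v2) for v1, v2 in zip(var1, var2)
    )
    return mean_term + cov_term


same = frechet_distance_diag([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
shifted = frechet_distance_diag([0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [4.0, 1.0])
```

Identical statistics give a distance of 0; in the second call the mean shift contributes 1 and the variance mismatch contributes 1, for a total of 2. Lower values indicate that the generated feature statistics are closer to the real ones.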

Training and Inference Techniques

Method	Venue	Target Architecture
FreezeD	CVPRW'20	Except for StyleGAN2
Top-K Training	Neurips'20	-
DDLS	Neurips'20	-
SeFa	CVPR'21	BigGAN
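As a rough picture of Top-K Training: when the generator is updated, only the k generated samples that the discriminator scores highest contribute to the loss, and the rest of the batch is discarded. The sketch below works on a list of scalar critic scores and uses a hinge-style generator loss (the negated mean score). The function name `topk_generator_loss` is hypothetical, and the schedule for choosing k during training is left out.

```python
def topk_generator_loss(critic_scores, k):
    """Generator loss averaged over the k highest-scoring generated samples.

    critic_scores holds the discriminator output for each fake in the batch.
    """
    if not 1 <= k <= len(critic_scores):
        raise ValueError("k must be between 1 and the batch size")
    kept = sorted(critic_scores, reverse=True)[:k]
    # Hinge-style generator objective: maximise the critic score of the fakes.
    return -sum(kept) / k


scores = [0.2, -1.0, 0.8, -0.3]
full_batch_loss = topk_generator_loss(scores, k=len(scores))
top2_loss = topk_generator_loss(scores, k=2)
```

With k equal to the batch size this reduces to the ordinary loss over all samples; with `k=2` only the scores 0.8 and 0.2 are used.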

Reproducibility

We check the reproducibility of GANs implemented in StudioGAN by comparing IS and FID with the original papers. Our platform successfully reproduces most of the representative GANs, except for PD-GAN, ACGAN, LOGAN, SAGAN, and BigGAN-Deep. FQ denotes the Flickr-Faces-HQ Dataset (FFHQ). The resolutions of the ImageNet, AFHQv2, and FQ datasets are 128, 512, and 1024, respectively.

Requirements

First, install a PyTorch version that matches your environment (at least 1.7, recommended 1.10):

pip3 install torch==1.10.0+cu111 torchvision==0.11.1+cu111 torchaudio==0.10.0+cu111 -f https://download.pytorch.org/whl/cu111/torch_stable.html

Then, use the following command to install the rest of the libraries:

pip3 install tqdm ninja h5py kornia matplotlib pandas scikit-learn scipy seaborn wandb PyYaml click requests pyspng imageio-ffmpeg

With docker, you can use:

docker pull mgkang/studiogan:latest

The following command creates a container named "StudioGAN":

docker run -it --gpus all --shm-size 128g --name StudioGAN -v /home/USER:/root/code --workdir /root/code mgkang/studiogan:latest /bin/bash

Dataset

CIFAR10/CIFAR100: StudioGAN will automatically download the dataset once you execute main.py.
Tiny ImageNet, ImageNet, or a custom dataset:
Download Tiny ImageNet, Baby ImageNet, Papa ImageNet, Grandpa ImageNet, or ImageNet, or prepare your own dataset.
Arrange the dataset folder as follows:

data
└── ImageNet, Tiny_ImageNet, Baby ImageNet, Papa ImageNet, or Grandpa ImageNet
    ├── train
    │   ├── cls0
    │   │   ├── train0.png
    │   │   ├── train1.png
    │   │   └── ...
    │   ├── cls1
    │   └── ...
    └── valid
        ├── cls0
        │   ├── valid0.png
        │   ├── valid1.png
        │   └── ...
        ├── cls1
        └── ...
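Before launching a long training run, it can help to confirm that a dataset folder follows the layout above. The helper below is a hypothetical sanity check and not part of StudioGAN. It assumes that `train` and `valid` should contain the same class folders and that images use a `.png`, `.jpg`, or `.jpeg` suffix; both assumptions may need adjusting for a custom dataset.

```python
from pathlib import Path

# Assumed set of image suffixes; extend it if your dataset uses other formats.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def summarize_split(root, split):
    """Map each class folder under root/split to the number of images in it."""
    split_dir = Path(root) / split
    if not split_dir.is_dir():
        raise FileNotFoundError(f"missing split folder: {split_dir}")
    counts = {}
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def check_dataset(root):
    """Return a list of human-readable problems; an empty list means none found."""
    splits = {name: summarize_split(root, name) for name in ("train", "valid")}
    problems = []
    if set(splits["train"]) != set(splits["valid"]):
        problems.append("train and valid contain different class folders")
    for split, counts in splits.items():
        for class_name, n_images in counts.items():
            if n_images == 0:
                problems.append(f"empty class folder: {split}/{class_name}")
    return problems
```

Calling `check_dataset("data/ImageNet")` returns an empty list when every class folder exists in both splits and holds at least one image.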

Quick Start

Before starting, users should log in to wandb using their personal API key.

wandb login PERSONAL_API_KEY

Since release 0.3.0, the -metrics option selects which evaluation metrics to compute. If the option is omitted, only FID is calculated. For example, -metrics is fid calculates only IS and FID, and -metrics none skips evaluation.
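The behaviour described above (a list-valued option, FID as the default, `none` to skip evaluation) can be mimicked with Python's standard `argparse` module. This is a sketch of how such an option might be parsed, not StudioGAN's actual argument parser; `build_parser` and `resolve_metrics` are hypothetical names.

```python
import argparse


def build_parser():
    """Parser with a single list-valued -metrics option that defaults to FID."""
    parser = argparse.ArgumentParser(description="metric-selection sketch")
    parser.add_argument(
        "-metrics",
        dest="metrics",
        nargs="+",          # accept one or more metric names after the flag
        default=["fid"],    # used when the flag is omitted entirely
        help="metrics to compute, e.g. 'is fid prdc', or 'none' to skip",
    )
    return parser


def resolve_metrics(argv):
    """Return the list of metrics to evaluate; an empty list means skip."""
    metrics = build_parser().parse_args(argv).metrics
    return [] if metrics == ["none"] else metrics


default_metrics = resolve_metrics([])
chosen_metrics = resolve_metrics(["-metrics", "is", "fid", "prdc"])
skipped_metrics = resolve_metrics(["-metrics", "none"])
```

Here `default_metrics` is `["fid"]`, `chosen_metrics` is `["is", "fid", "prdc"]`, and `skipped_metrics` is an empty list.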

Train (-t) and evaluate IS, FID, Prc, Rec, Dns, Cvg (-metrics is fid prdc) of the model defined in CONFIG_PATH using GPU 0.

CUDA_VISIBLE_DEVICES=0 python3 src/main.py -t -metrics is fid prdc -cfg CONFIG_PATH -data DATA_PATH -save SAVE_PATH

Preprocess images for training and evaluation using PIL.LANCZOS filter (--pre_resizer lanczos). Then, train (-t) and evaluate friendly-IS, friendly-FID, friendly-Prc, friendly-Rec, friendly-Dns, friendly-Cvg (-metrics is fid prdc --post_resizer clean) of the model defined in CONFIG_PATH using GPU 0.

CUDA_VISIBLE_DEVICES=0 python3 src/main.py -t -metrics is fid prdc --pre_resizer lanczos --post_resizer clean -cfg CONFIG_PATH -data DATA_PATH -save SAVE_PATH

Train (-t) and evaluate FID of the model defined in CONFIG_PATH through DataParallel using GPUs (0, 1, 2, 3). Evaluating FID alone does not require the -metrics argument.

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 src/main.py -t -cfg CONFIG_PATH -data DATA_PATH -save

Core symbols most depended-on inside this repo

src/metrics/preparation.py

src/metrics/resnet.py

Shape

Method: 313
Function: 238
Class: 119

Languages

Python: 100%

Modules by API surface

src/utils/misc.py: 58 symbols
src/utils/losses.py: 46 symbols
src/models/stylegan2.py: 39 symbols
src/metrics/swin_transformer.py: 37 symbols
src/metrics/vit.py: 35 symbols
src/utils/simclr_aug.py: 33 symbols
src/utils/style_ops/dnnlib/util.py: 30 symbols
src/models/stylegan3.py: 25 symbols
src/utils/ops.py: 23 symbols
src/worker.py: 20 symbols
src/sync_batchnorm/batchnorm.py: 16 symbols
src/metrics/inception_net.py: 16 symbols

For agents

$ claude mcp add PyTorch-StudioGAN \
  -- python -m otcore.mcp_server <graph>
