
PyTorch implementation of our method for high-resolution (e.g. 2048x1024) photorealistic image-to-image translation. It can be used for turning semantic label maps into photorealistic images or synthesizing portraits from face label maps.
High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
Ting-Chun Wang¹, Ming-Yu Liu¹, Jun-Yan Zhu², Andrew Tao¹, Jan Kautz¹, Bryan Catanzaro¹
¹NVIDIA Corporation, ²UC Berkeley
In CVPR 2018.







pip install dominate
git clone https://github.com/NVIDIA/pix2pixHD
cd pix2pixHD
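A few example Cityscapes test images are included in the datasets folder.
Place the pre-trained Cityscapes model under ./checkpoints/label2city_1024p/.
Test the model (bash ./scripts/test_1024p.sh):
#!./scripts/test_1024p.sh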
python test.py --name label2city_1024p --netG local --ngf 32 --resize_or_crop none
The test results will be saved to an HTML file here: ./results/label2city_1024p/test_latest/index.html.
More example scripts can be found in the scripts directory.
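As a small illustration of where the output lands, the sketch below builds the results page path from the experiment name. It assumes the layout ./results/<name>/<phase>_<epoch>/index.html, which is inferred from the example path above; the helper results_index and its parameter names are hypothetical and not part of the repository.

```python
from pathlib import Path


def results_index(name, phase="test", which_epoch="latest", results_dir="./results"):
    """Build the path of the HTML results page for an experiment.

    Hypothetical helper: the directory layout is an assumption inferred
    from the example path in this README.
    """
    return Path(results_dir) / name / f"{phase}_{which_epoch}" / "index.html"


if __name__ == "__main__":
    page = results_index("label2city_1024p")
    # Report whether the page has been generated yet.
    status = "found" if page.exists() else "not generated yet"
    print(f"{page.as_posix()} ({status})")
```

Running the test script first and then this snippet should report the page as found, provided the assumed layout holds.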
To train a model on the full Cityscapes dataset, download it and put it under the datasets folder in the same way the example images are provided.
Train a model (bash ./scripts/train_512p.sh):
#!./scripts/train_512p.sh
python train.py --name label2city_512p
To view training results, check the intermediate results in ./checkpoints/label2city_512p/web/index.html.
If you have TensorFlow installed, you can see TensorBoard logs in ./checkpoints/label2city_512p/logs by adding --tf_log to the training scripts.
Train a model using multiple GPUs (bash ./scripts/train_512p_multigpu.sh):
#!./scripts/train_512p_multigpu.sh
python train.py --name label2city_512p --batchSize 8 --gpu_ids 0,1,2,3,4,5,6,7
Note: this is not tested; we trained our model using a single GPU only. Please use at your own discretion.
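To make the multi-GPU command concrete, the sketch below parses a --gpu_ids style string and works out how many samples each GPU would receive, assuming the batch is split evenly across devices (as torch.nn.DataParallel does along the batch dimension). The helper names are hypothetical, and the divisibility check is a choice made for this sketch rather than a requirement of the training script.

```python
def parse_gpu_ids(gpu_ids):
    """Parse a comma-separated id string such as "0,1,2" into a list of ints.

    Negative ids (commonly used to mean "CPU only") are dropped.
    """
    return [int(tok) for tok in gpu_ids.split(",") if tok.strip() and int(tok) >= 0]


def per_gpu_batch(batch_size, gpu_ids):
    """Return the number of samples per GPU, assuming an even split."""
    n_devices = max(1, len(parse_gpu_ids(gpu_ids)))
    if batch_size % n_devices != 0:
        # Sketch-level safeguard: insist on an even split across devices.
        raise ValueError(
            f"batch size {batch_size} is not divisible by {n_devices} devices"
        )
    return batch_size // n_devices


if __name__ == "__main__":
    # Mirrors the example command: --batchSize 8 --gpu_ids 0,1,2,3,4,5,6,7
    print(per_gpu_batch(8, "0,1,2,3,4,5,6,7"))  # prints 1
```

With the flags from the example command, each of the eight GPUs processes one image per iteration under this assumption.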
To train with automatic mixed precision (AMP), add --fp16. For example,
#!./scripts/train_512p_fp16.sh
python -m torch.distributed.launch train.py --name label2city_512p --fp16
In our test case, it trains about 80% faster with AMP on a Volta machine.
Training at full resolution (2048x1024) requires a GPU with 24G memory (bash ./scripts/train_1024p_24G.sh), or 16G memory if using mixed precision (AMP).
If only GPUs with 12G memory are available, use the 12G script (bash ./scripts/train_1024p_12G.sh), which will crop the images during training. Performance is not guaranteed using this script.
To train with your own dataset, generate one-channel label maps whose pixel values correspond to the object labels (i.e. 0, 1, ..., N-1, where N is the number of labels), and specify --label_nc N during both training and testing.
If your input is not a label map, specify --label_nc 0, which will directly use the RGB colors as input. The folders should then be named train_A, train_B instead of train_label, train_img, where the goal is to translate images from A to B.
If you don't have instance maps or don't want to use them, specify --no_instance.
The default preprocessing setting is scale_width, which scales the width of all training images to opt.loadSize (1024) while keeping the aspect ratio. To use a different setting, change it with the --resize_or_crop option. For example, scale_width_and_crop first resizes the image to have width opt.loadSize and then does random cropping of size (opt.fineSize, opt.fineSize). crop skips the resizing step and only performs random cropping. If you don't want any preprocessing, specify none, which does nothing other than making sure the image size is divisible by 32.
Flags: see options/train_options.py and options/base_options.py for all the training flags; see options/test_options.py and options/base_options.py for all the test flags.
Instance map: both label maps and instance maps are taken as input. If you don't want to use instance maps, specify the flag --no_instance.
If you find this useful for your research, please use the following.
@inproceedings{wang2018pix2pixHD,
title={High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs},
author={Ting-Chun Wang and Ming-Yu Liu and Jun-Yan Zhu and Andrew Tao and Jan Kautz and Bryan Catanzaro},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
year={2018}
}
This code borrows heavily from pytorch-CycleGAN-and-pix2pix.
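The --resize_or_crop modes described earlier can be sketched as a size calculation. This is an illustration based only on the description in this README, not the repository's preprocessing code: the function preprocessed_size is hypothetical, the fine_size default of 512 is an assumed value, and random crop positions are ignored because only the resulting size is computed.

```python
MODES = ("scale_width", "scale_width_and_crop", "crop", "none")


def preprocessed_size(width, height, mode, load_size=1024, fine_size=512, base=32):
    """Return the (width, height) an image would have after preprocessing.

    Hypothetical helper; load_size and fine_size stand in for
    opt.loadSize and opt.fineSize (the fine_size default is assumed).
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "none":
        # Only round each side to the nearest multiple of `base`.
        return (max(base, round(width / base) * base),
                max(base, round(height / base) * base))
    if mode.startswith("scale_width"):
        # Scale so the width equals load_size, keeping the aspect ratio.
        height = int(load_size * height / width)
        width = load_size
    if mode.endswith("crop"):
        # A random crop of (fine_size, fine_size); only the size matters here.
        width, height = min(width, fine_size), min(height, fine_size)
    return width, height


if __name__ == "__main__":
    for mode in MODES:
        print(mode, preprocessed_size(2048, 1024, mode))
```

For a 2048x1024 input, scale_width yields 1024x512 and both cropping modes yield 512x512 under these assumptions, while none leaves the size unchanged because both sides are already multiples of 32.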