
github.com/Kwai-Kolors/Kolors @main sqlite

439 symbols 1,382 edges 45 files 134 documented · 31%
README

<img src="https://github.com/Kwai-Kolors/Kolors/raw/main/imgs/logo.png" width="400"/>

Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis

🎉 News

📑 Open-source Plan

  • Kolors (Text-to-Image Model)
  • [x] Inference
  • [x] Checkpoints
  • [x] IP-Adapter
  • [x] ControlNet (Canny, Depth)
  • [x] Inpainting
  • [x] IP-Adapter-FaceID
  • [x] LoRA
  • [x] ControlNet (Pose)
  • [x] ComfyUI
  • [x] Gradio
  • [x] Diffusers

📖 Introduction

Kolors is a large-scale text-to-image generation model based on latent diffusion, developed by the Kuaishou Kolors team. Trained on billions of text-image pairs, Kolors exhibits significant advantages over both open-source and closed-source models in visual quality, complex semantic accuracy, and text rendering for both Chinese and English characters. Furthermore, Kolors supports both Chinese and English inputs, demonstrating strong performance in understanding and generating Chinese-specific content. For more details, please refer to this technical report.

📊 Evaluation

We have collected a comprehensive text-to-image evaluation dataset named KolorsPrompts to compare Kolors with other state-of-the-art open-source and closed-source models. KolorsPrompts includes over 1,000 prompts across 14 categories and 12 evaluation dimensions. The evaluation process incorporates both human and machine assessments. In relevant benchmark evaluations, Kolors demonstrated highly competitive performance, achieving industry-leading standards.

Human Assessment

For the human evaluation, we invited 50 imagery experts to conduct comparative evaluations of the results generated by different models. The experts rated the generated images based on three criteria: visual appeal, text faithfulness, and overall satisfaction. In the evaluation, Kolors achieved the highest overall satisfaction score and significantly led in visual appeal compared to other models.

| Model | Average Overall Satisfaction | Average Visual Appeal | Average Text Faithfulness |
| --- | --- | --- | --- |
| Adobe-Firefly | 3.03 | 3.46 | 3.84 |
| Stable Diffusion 3 | 3.26 | 3.50 | 4.20 |
| DALL-E 3 | 3.32 | 3.54 | 4.22 |
| Midjourney-v5 | 3.32 | 3.68 | 4.02 |
| Playground-v2.5 | 3.37 | 3.73 | 4.04 |
| Midjourney-v6 | 3.58 | 3.92 | 4.18 |
| Kolors | 3.59 | 3.99 | 4.17 |

All model results are tested with the April 2024 product versions
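As a quick sanity check on the table above, the following Python sketch finds the top-scoring model for each criterion. The scores are copied from the table; the names `SCORES`, `CRITERIA`, and `leaders` are illustrative and not part of the Kolors codebase.

```python
# Human-assessment scores copied from the table above:
# (overall satisfaction, visual appeal, text faithfulness)
SCORES = {
    "Adobe-Firefly": (3.03, 3.46, 3.84),
    "Stable Diffusion 3": (3.26, 3.50, 4.20),
    "DALL-E 3": (3.32, 3.54, 4.22),
    "Midjourney-v5": (3.32, 3.68, 4.02),
    "Playground-v2.5": (3.37, 3.73, 4.04),
    "Midjourney-v6": (3.58, 3.92, 4.18),
    "Kolors": (3.59, 3.99, 4.17),
}
CRITERIA = ("overall satisfaction", "visual appeal", "text faithfulness")


def leaders(scores):
    """Return the highest-scoring model for each criterion."""
    return {
        name: max(scores, key=lambda model: scores[model][index])
        for index, name in enumerate(CRITERIA)
    }


if __name__ == "__main__":
    for criterion, model in leaders(SCORES).items():
        print(f"{criterion}: {model}")
```

On these numbers, Kolors leads in overall satisfaction and visual appeal, while DALL-E 3 has the highest text-faithfulness score (4.22, against 4.17 for Kolors).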

Machine Assessment

We used MPS (Multi-dimensional Human Preference Score) on KolorsPrompts as the evaluation metric for machine assessment. Kolors achieved the highest MPS score, which is consistent with the results of the human evaluations.

| Model | Overall MPS |
| --- | --- |
| Adobe-Firefly | 8.5 |
| Stable Diffusion 3 | 8.9 |
| DALL-E 3 | 9.0 |
| Midjourney-v5 | 9.4 |
| Playground-v2.5 | 9.8 |
| Midjourney-v6 | 10.2 |
| Kolors | 10.3 |
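The statement that the MPS ranking is consistent with the human evaluation can be checked directly against the two tables. The sketch below sorts the models by MPS and verifies that the human overall-satisfaction scores never decrease along that order. The data is copied from the tables; `RESULTS` and `rankings_agree` are illustrative names only.

```python
# (overall MPS, average overall satisfaction) per model, copied from the two tables
RESULTS = {
    "Adobe-Firefly": (8.5, 3.03),
    "Stable Diffusion 3": (8.9, 3.26),
    "DALL-E 3": (9.0, 3.32),
    "Midjourney-v5": (9.4, 3.32),
    "Playground-v2.5": (9.8, 3.37),
    "Midjourney-v6": (10.2, 3.58),
    "Kolors": (10.3, 3.59),
}


def rankings_agree(results):
    """True if ordering models by MPS never reverses the human-score order."""
    ordered = sorted(results.values(), key=lambda pair: pair[0])
    human = [score for _, score in ordered]
    # Non-decreasing check, so tied human scores count as agreement.
    return all(a <= b for a, b in zip(human, human[1:]))


if __name__ == "__main__":
    print(rankings_agree(RESULTS))
```

Ties are treated as agreement: DALL-E 3 and Midjourney-v5 share a human score of 3.32 but differ in MPS (9.0 and 9.4).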

For more experimental results and details, please refer to our technical report.

🎥 Visualization

  • High-quality Portrait

  • Chinese Elements Generation

  • Complex Semantic Understanding

  • Text Rendering

The visualized case prompts mentioned above can be accessed here.

🛠️ Usage

Requirements

  • Python 3.8 or later
  • PyTorch 1.13.1 or later
  • Transformers 4.26.1 or later
  • Recommended: CUDA 11.7 or later
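The minimum versions listed above can be verified before installation with a short script. This is a minimal sketch using only the Python standard library; the helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical, and the version parsing is deliberately simple: it keeps only the leading numeric components.

```python
import re
import sys
from importlib import metadata

# Minimum versions taken from the requirements list above.
MINIMUMS = {"torch": "1.13.1", "transformers": "4.26.1"}
MIN_PYTHON = (3, 8)


def parse_version(text):
    """Turn '2.1.0+cu118' into (2, 1, 0), ignoring any non-numeric suffix."""
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, required):
    """Compare two version strings numerically, component by component."""
    return parse_version(installed) >= parse_version(required)


def check_environment():
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.8 or later is required")
    for package, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{package} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

The CUDA recommendation is not checked here, since it depends on the driver and on how PyTorch was built.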

  1. Repository Cloning and Dependency Installation:

apt-get install git-lfs
git clone https://github.com/Kwai-Kolors/Kolors
cd Kolors
conda create --name kolors python=3.8
conda activate kolors
pip install -r requirements.txt
python3 setup.py install
  2. Weights download (link):
huggingface-cli download --resume-download Kwai-Kolors/Kolors --local-dir weights/Kolors

or

git lfs clone https://huggingface.co/Kwai-Kolors/Kolors weights/Kolors
  3. Inference:
# Prompt (English gloss): "A photo of a ladybug, macro, zoom, high quality, cinematic, holding a sign that reads '可图'"
python3 scripts/sample.py "一张瓢虫的照片,微距,变焦,高质量,电影,拿着一个牌子,写着“可图”"
# The image will be saved to "scripts/outputs/sample_text.jpg"
  4. Web demo:
python3 scripts/sampleui.py
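The weights download step above can also be done from Python rather than the command line. This is a minimal sketch that assumes the `huggingface_hub` package is installed; `snapshot_download` is that library's function for fetching a whole repository, while the helper name `download_weights` is hypothetical.

```python
from pathlib import Path

REPO_ID = "Kwai-Kolors/Kolors"          # same repository as in the CLI command
LOCAL_DIR = Path("weights") / "Kolors"  # same target directory as --local-dir


def download_weights(repo_id=REPO_ID, local_dir=LOCAL_DIR):
    """Download every file in the model repository into local_dir."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=str(local_dir))


# Call download_weights() to start the download; it returns the local folder path.
```

Passing a different `repo_id` and `local_dir` should work the same way for the other weight repositories mentioned in this README.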

Using with Diffusers

Make sure you upgrade diffusers to the latest version (0.30.0.dev0):

git clone https://github.com/huggingface/diffusers
cd diffusers
python3 setup.py install

Notes:

  • The pipeline uses the EulerDiscreteScheduler by default. We recommend using this scheduler with guidance_scale=5.0 and num_inference_steps=50.
  • The pipeline also supports the EDMDPMSolverMultistepScheduler; guidance_scale=5.0 and num_inference_steps=25 is a good default for this scheduler.
  • In addition to Text-to-Image, Image-to-Image is also supported via KolorsImg2ImgPipeline.

And then you can run:

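import torch
from diffusers import KolorsPipeline

# Load the diffusers-format checkpoint in half precision and move it to the GPU.
pipe = KolorsPipeline.from_pretrained(
    "Kwai-Kolors/Kolors-diffusers",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Prompt (English gloss): "A photo of a ladybug, macro, zoom, high quality,
# cinematic, holding a sign that reads '可图'"
prompt = '一张瓢虫的照片,微距,变焦,高质量,电影,拿着一个牌子,写着"可图"'

image = pipe(
    prompt=prompt,
    negative_prompt="",
    guidance_scale=5.0,
    num_inference_steps=50,
    # A fixed seed makes the result reproducible.
    generator=torch.Generator(pipe.device).manual_seed(66),
).images[0]
image.show()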
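The scheduler recommendations in the notes above can be kept in one place and passed to the pipeline as keyword arguments. This is only a sketch: the values come from the notes, while `RECOMMENDED_SETTINGS` and `sampling_kwargs` are hypothetical names, not part of diffusers or Kolors.

```python
# Recommended sampling settings per scheduler, taken from the notes above.
RECOMMENDED_SETTINGS = {
    "EulerDiscreteScheduler": {"guidance_scale": 5.0, "num_inference_steps": 50},
    "EDMDPMSolverMultistepScheduler": {"guidance_scale": 5.0, "num_inference_steps": 25},
}


def sampling_kwargs(scheduler_name):
    """Return a copy of the recommended pipeline keyword arguments for a scheduler."""
    try:
        return dict(RECOMMENDED_SETTINGS[scheduler_name])
    except KeyError:
        raise ValueError(f"No recommendation recorded for {scheduler_name!r}") from None


# Example (requires the pipe and prompt objects from the snippet above):
# settings = sampling_kwargs(type(pipe.scheduler).__name__)
# image = pipe(prompt=prompt, **settings).images[0]
```

Looking the settings up by the scheduler's class name keeps the step count in sync if the scheduler is swapped later.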

IP-Adapter-Plus

We provide IP-Adapter-Plus weights and inference code; see the ipadapter directory for details.

# Weights download
huggingface-cli download --resume-download Kwai-Kolors/Kolors-IP-Adapter-Plus --local-dir weights/Kolors-IP-Adapter-Plus
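# Inference:
# Prompt (English gloss): "Wearing a black T-shirt with '可图' written on it in large green Chinese characters"
python3 ipadapter/sample_ipadapter_plus.py ./ipadapter/asset/test_ip.jpg "穿着黑色T恤衫,上面中文绿色大字写着“可图”"

# Prompt (English gloss): "A cute puppy running"
python3 ipadapter/sample_ipadapter_plus.py ./ipadapter/asset/test_ip2.png "一只可爱的小狗在奔跑"

# The image will be saved to "scripts/outputs/"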

ControlNet

We provide weights for three ControlNet variants (Canny, Depth, and Pose) together with inference code; see the controlnet directory for details.

# Weights download

# Canny - ControlNet
huggingface-cli download --resume-download Kwai-Kolors/Kolors-ControlNet-Canny --local-dir weights/Kolors-ControlNet-Canny

# Depth - ControlNet
huggingface-cli download --resume-download Kwai-Kolors/Kolors-ControlNet-Depth --local-dir weights/Kolors-ControlNet-Depth

# Pose - ControlNet
huggingface-cli download --resume-download Kwai-Kolors/Kolors-ControlNet-Pose --local-dir weights/Kolors-ControlNet-Pose

If you intend to utilize the depth estimation network, please make sure to download its corresponding model weights.

huggingface-cli download lllyasviel/Annotators ./dpt_hybrid-midas-501f0c75.pt --local-dir ./controlnet/annotator/ckpts

Thanks to DWPose, a pose estimation network is also available. Download the pose model dw-ll_ucoco_384.onnx (baidu, google) and the detection model yolox_l.onnx (baidu, google), then place both files in controlnet/annotator/ckpts/.

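# Inference:

# Prompt (English gloss): "A beautiful girl, high quality, ultra clear, vivid colors, ultra-high resolution, best quality, 8k, HD, 4K"
python ./controlnet/sample_controlNet.py ./controlnet/assets/woman_1.png 一个漂亮的女孩,高品质,超清晰,色彩鲜艳,超高分辨率,最佳品质,8k,高清,4K Canny

# Prompt (English gloss): "Makoto Shinkai style, rich colors, a woman in a green shirt standing in a field, beautiful scenery, fresh and bright, dappled light and shadow, best quality, ultra-detailed, 8K quality"
python ./controlnet/sample_controlNet.py ./controlnet/assets/woman_2.png 新海诚风格,丰富的色彩,穿着绿色衬衫的女人站在田野里,唯美风景,清新明亮,斑驳的光影,最好的质量,超细节,8K画质 Depth

# Prompt (English gloss): "A girl in a purple puff-sleeve dress, wearing a crown and white lace gloves, resting her face in both hands, high quality, ultra clear, vivid colors, ultra-high resolution, best quality, 8k, HD, 4K"
python ./controlnet/sample_controlNet.py ./controlnet/assets/woman_3.png 一位穿着紫色泡泡袖连衣裙、戴着皇冠和白色蕾丝手套的女孩双手托脸,高品质,超清晰,色彩鲜艳,超高分辨率,最佳品质,8k,高清,4K Pose

# The image will be saved to "controlnet/outputs/"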
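The commands above pass the prompt unquoted, which only works while the prompt contains no spaces or shell metacharacters. A hedged alternative is to build the argument list in Python and let the standard library handle quoting. The argument order (image, prompt, condition type) is inferred from the commands above, and `controlnet_command` is a hypothetical helper, not part of the repository.

```python
import shlex

SCRIPT = "./controlnet/sample_controlNet.py"
CONDITIONS = ("Canny", "Depth", "Pose")  # the three condition types used above


def controlnet_command(image_path, prompt, condition):
    """Build the argument list for one ControlNet inference run."""
    if condition not in CONDITIONS:
        raise ValueError(f"condition must be one of {CONDITIONS}, got {condition!r}")
    return ["python", SCRIPT, image_path, prompt, condition]


if __name__ == "__main__":
    argv = controlnet_command(
        "./controlnet/assets/woman_1.png", "一个漂亮的女孩,高品质", "Canny"
    )
    # Shell-quoted form of the command, suitable for pasting into a terminal.
    print(shlex.join(argv))
    # To run it directly instead: subprocess.run(argv, check=True)
```

Passing the list to `subprocess.run` avoids the shell entirely, so a prompt containing spaces or quotes reaches the script as a single argument.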

Inpainting

We provide Inpainting weights and inference code; see the inpainting directory for details.

# Weights download
huggingface-cli download --resume-download Kwai-Kolors/Kolors-Inpainting --local-dir weights/Kolors-Inpainting
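# Inference:
# Prompt (English gloss): "Wearing a Sailor Moon outfit, a sailor-uniform-style garment with a white fitted top and a large red bow on the chest. The collar is blue with white stripes. She also wears a blue pleated skirt, ultra HD, Octane render, premium texture, 32k, high resolution, best quality, super detailed, depth of field"
python3 inpainting/sample_inpainting.py ./inpainting/asset/3.png ./inpainting/asset/3_mask.png 穿着美少女战士的衣服,一件类似于水手服风格的衣服,包括一个白色紧身上衣,前胸搭配一个大大的红色蝴蝶结。衣服的领子部分呈蓝色,并且有白色条纹。她还穿着一条蓝色百褶裙,超高清,辛烷渲染,高级质感,32k,高分辨率,最好的质量,超级细节,景深

# Prompt (English gloss): "Wearing an Iron Man suit, high-tech armor, mainly red and gold with some silver accents. A glowing circular reactor on the chest, full of futuristic technology. Ultra clear, high quality, ultra realistic, high resolution, best quality, super detailed, depth of field"
python3 inpainting/sample_inpainting.py ./inpainting/asset/4.png ./inpainting/asset/4_mask.png 穿着钢铁侠的衣服,高科技盔甲,主要颜色为红色和金色,并且有一些银色装饰。胸前有一个亮起的圆形反应堆装置,充满了未来科技感。超清晰,高质量,超逼真,高分辨率,最好的质量,超级细节,景深

# The image will be saved to "scripts/outputs/"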

IP-Adapter-FaceID-Plus

We p

Core symbols most depended-on inside this repo

| Symbol | Called by | File |
| --- | --- | --- |
| get_activation | 12 | controlnet/annotator/midas/midas/vit.py |
| decode | 11 | kolors/models/tokenization_chatglm.py |
| get_command | 11 | kolors/models/tokenization_chatglm.py |
| encode | 10 | kolors/models/tokenization_chatglm.py |
| HWC3 | 8 | controlnet/annotator/util.py |
| constrain_to_multiple_of | 6 | controlnet/annotator/midas/midas/transforms.py |
| load | 6 | controlnet/annotator/midas/midas/base_model.py |
| enable_model_cpu_offload | 6 | kolors/pipelines/pipeline_stable_diffusion_xl_chatglm_256.py |

Shape

Method 272
Function 109
Class 58

Languages

Python 100%

Modules by API surface

kolors/models/modeling_chatglm.py · 64 symbols
kolors/pipelines/pipeline_stable_diffusion_xl_chatglm_256_inpainting.py · 30 symbols
kolors/models/tokenization_chatglm.py · 28 symbols
kolors/models/unet_2d_condition.py · 27 symbols
controlnet/annotator/midas/midas/vit.py · 25 symbols
kolors/pipelines/pipeline_stable_diffusion_xl_chatglm_256_ipadapter_FaceID.py · 23 symbols
kolors/pipelines/pipeline_controlnet_xl_kolors_img2img.py · 21 symbols
dreambooth/train_dreambooth_lora.py · 21 symbols
controlnet/annotator/midas/midas/blocks.py · 21 symbols
kolors/pipelines/pipeline_stable_diffusion_xl_chatglm_256_ipadapter.py · 19 symbols
kolors/models/controlnet.py · 18 symbols
kolors/pipelines/pipeline_stable_diffusion_xl_chatglm_256.py · 17 symbols

Dependencies from manifests, versioned

Pillow 9.4.0 · 1×
accelerate 0.27.2 · 1×
deepspeed 0.8.1 · 1×
diffusers 0.28.2 · 1×
gradio 4.37.2 · 1×
huggingface-hub 0.23.4 · 1×
imageio 2.25.1 · 1×
numpy 1.21.6 · 1×
omegaconf 2.3.0 · 1×
pandas 1.3.5 · 1×
pydantic 2.8.2 · 1×
safetensors 0.3.3 · 1×

For agents

$ claude mcp add Kolors \
  -- python -m otcore.mcp_server <graph>
