
github.com/Tongyi-MAI/Z-Image @main sqlite

126 symbols · 468 edges · 18 files · 18 documented · 14%
README

⚡️ Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer


Welcome to the official repository for the Z-Image (造相) project!

✨ Z-Image

Z-Image is a powerful and highly efficient image generation model family with 6B parameters. Currently there are four variants:

  • 🚀 Z-Image-Turbo – A distilled version of Z-Image that matches or exceeds leading competitors with only 8 NFEs (Number of Function Evaluations). It offers ⚡️sub-second inference latency⚡️ on enterprise-grade H800 GPUs and fits comfortably on consumer devices with 16 GB of VRAM. It excels in photorealistic image generation, bilingual text rendering (English & Chinese), and robust instruction adherence.

  • 🎨 Z-Image – The foundation model behind Z-Image-Turbo. Z-Image focuses on high-quality generation, rich aesthetics, strong diversity, and controllability, well-suited for creative generation, fine-tuning, and downstream development. It supports a wide range of artistic styles, effective negative prompting, and high diversity across identities, poses, compositions, and layouts.

  • 🧱 Z-Image-Omni-Base – The versatile foundation model capable of both generation and editing tasks. By releasing this checkpoint, we aim to unlock the full potential for community-driven fine-tuning and custom development, providing the most "raw" and diverse starting point for the open-source community.

  • ✍️ Z-Image-Edit – A variant fine-tuned on Z-Image specifically for image editing tasks. It supports creative image-to-image generation with impressive instruction-following capabilities, allowing for precise edits based on natural language prompts.
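The VRAM claim above can be sanity-checked with simple arithmetic. The sketch below estimates the memory needed for the weights alone, assuming all 6B parameters are stored in a single dtype; it ignores activations and any other pipeline components, so it is a lower bound rather than a measured footprint. The helper name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


# 6B is the parameter count quoted above; the byte sizes are the standard
# storage sizes of each dtype.
for dtype, nbytes in {"float32": 4, "bfloat16": 2}.items():
    print(f"{dtype}: {weight_memory_gib(6e9, nbytes):.1f} GiB")
# float32: 22.4 GiB
# bfloat16: 11.2 GiB
```

Under these assumptions, bfloat16 weights leave a few GiB of headroom on a 16 GB card, while float32 weights would not fit.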

📣 News

  • [2026-01-27] 🔥 Z-Image is released! We have released the model checkpoint on Hugging Face and ModelScope. Try our online demo!
  • [2025-12-08] 🏆 Z-Image-Turbo ranked 8th overall on the Artificial Analysis Text-to-Image Leaderboard, making it the 🥇 #1 open-source model! Check out the full leaderboard.
  • [2025-12-01] 🎉 Our technical report for Z-Image is now available on arXiv.
  • [2025-11-26] 🔥 Z-Image-Turbo is released! We have released the model checkpoint on Hugging Face and ModelScope. Try our online demo!

📥 Model Zoo

| Model | Pre-Training | SFT | RL | Step | CFG | Task | Visual Quality | Diversity | Fine-Tunability | Hugging Face | ModelScope |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Z-Image-Omni-Base | | | | 50 | | Gen. / Editing | Medium | High | Easy | To be released | To be released |
| Z-Image | | | | 50 | | Gen. | High | Medium | Easy | Hugging Face, Hugging Face Space | ModelScope Model, ModelScope Space |
| Z-Image-Turbo | ✅ | ✅ | ✅ | 8 | ❌ | Gen. | Very High | Low | N/A | Hugging Face, Hugging Face Space | ModelScope Model, ModelScope Space |
| Z-Image-Edit | ✅ | ✅ | ❌ | 50 | ✅ | Editing | High | Medium | Easy | To be released | To be released |
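The Step and CFG columns together determine how many transformer evaluations one image costs. The sketch below works through that arithmetic for two rows of the table. It assumes the common convention that classifier-free guidance evaluates the model twice per step (conditional and unconditional); an implementation may batch those two evaluations, so this counts model evaluations, not wall-clock time. The function name `forward_passes` is hypothetical.

```python
def forward_passes(denoising_steps: int, uses_cfg: bool) -> int:
    """Count transformer evaluations needed for one image.

    Assumption: with classifier-free guidance each step evaluates the model
    twice (conditional and unconditional); without it, once.
    """
    return denoising_steps * (2 if uses_cfg else 1)


# Step and CFG values taken from the table rows for these two variants.
turbo = forward_passes(8, uses_cfg=False)
edit = forward_passes(50, uses_cfg=True)

print(turbo, edit, edit / turbo)  # 8 100 12.5
```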

The figure below illustrates at which training stage each model is produced.

Training Pipeline of Z-Image

🖼️ Showcase

📸 Photorealistic Quality: Z-Image-Turbo delivers strong photorealistic image generation while maintaining excellent aesthetic quality.

Showcase of Z-Image on Photo-realistic image Generation

📖 Accurate Bilingual Text Rendering: Z-Image-Turbo excels at accurately rendering complex Chinese and English text.

Showcase of Z-Image on Bilingual Text Rendering

💡 Prompt Enhancing & Reasoning: Prompt Enhancer empowers the model with reasoning capabilities, enabling it to transcend surface-level descriptions and tap into underlying world knowledge.

reasoning.jpg

🧠 Creative Image Editing: Z-Image-Edit shows a strong understanding of bilingual editing instructions, enabling imaginative and flexible image transformations.

Showcase of Z-Image-Edit on Image Editing

🏗️ Model Architecture

We adopt a Scalable Single-Stream DiT (S3-DiT) architecture. In this setup, text, visual semantic tokens, and image VAE tokens are concatenated at the sequence level to serve as a unified input stream, maximizing parameter efficiency compared to dual-stream approaches.

Architecture of Z-Image and Z-Image-Edit
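To make "concatenated at the sequence level" concrete, here is a minimal pure-Python sketch of building a single input stream from three token groups and recovering the image tokens afterwards. It is an illustration only: the segment order, the embedding width, the token counts, and the names `SingleStream` and `build_single_stream` are assumptions, and the real model operates on tensors rather than lists.

```python
from dataclasses import dataclass

Token = list[float]  # one embedding vector


@dataclass
class SingleStream:
    tokens: list[Token]                # the unified sequence
    spans: dict[str, tuple[int, int]]  # segment name -> (start, end) indices


def build_single_stream(
    text: list[Token], semantic: list[Token], vae: list[Token]
) -> SingleStream:
    """Concatenate the three token groups along the sequence axis."""
    tokens: list[Token] = []
    spans: dict[str, tuple[int, int]] = {}
    for name, segment in (("text", text), ("semantic", semantic), ("vae", vae)):
        start = len(tokens)
        tokens.extend(segment)
        spans[name] = (start, len(tokens))
    return SingleStream(tokens, spans)


# Toy input: 3 text tokens, 2 visual semantic tokens, 5 image VAE tokens.
dim = 4
stream = build_single_stream(
    text=[[0.0] * dim for _ in range(3)],
    semantic=[[1.0] * dim for _ in range(2)],
    vae=[[2.0] * dim for _ in range(5)],
)

# After the shared transformer blocks, the image tokens can be sliced back out.
start, end = stream.spans["vae"]
image_tokens = stream.tokens[start:end]
print(len(stream.tokens), stream.spans["vae"])  # 10 (5, 10)
```

Because every token type passes through the same blocks, no parameters are duplicated per modality, which is the efficiency argument made above against dual-stream designs.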

📈 Performance

Z-Image-Turbo has been evaluated on multiple independent benchmarks, where it consistently achieves state-of-the-art results and ranks as the leading open-source model.

Artificial Analysis Text-to-Image Leaderboard

On the highly competitive Artificial Analysis Leaderboard, Z-Image-Turbo ranked 8th overall and secured the top position as the 🥇 #1 Open-Source Model, outperforming all other open-source alternatives.

Z-Image Rank on Artificial Analysis Leaderboard


Z-Image Rank on Artificial Analysis Leaderboard (Open-Source Model Only)


Alibaba AI Arena Text-to-Image Leaderboard

According to the Elo-based Human Preference Evaluation on Alibaba AI Arena, Z-Image-Turbo also achieves state-of-the-art results among open-source models and shows highly competitive performance against leading proprietary models.

Z-Image Elo Rating on AI Arena
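For readers unfamiliar with Elo-based preference rankings, the sketch below shows the standard Elo update applied to a single pairwise comparison between two models. The K-factor of 32, the 400-point scale, and the starting rating of 1500 are conventional defaults used here as assumptions; the exact settings used by Alibaba AI Arena are not stated in this document.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(
    rating_a: float, rating_b: float, score_a: float, k: float = 32.0
) -> tuple[float, float]:
    """Return updated ratings after one comparison.

    score_a is 1.0 if A's image was preferred, 0.0 if B's was, 0.5 for a tie.
    """
    e_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - e_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b


# Two equally rated models; a human rater prefers model A's image.
a, b = elo_update(1500.0, 1500.0, score_a=1.0)
print(a, b)  # 1516.0 1484.0
```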


🚀 Quick Start

(1) PyTorch Native Inference

Create a virtual environment of your choice, then install the dependencies:

```bash
pip install -e .
```

Then run the inference script to generate an image:

```bash
python inference.py
```

(2) Diffusers Inference

Install the latest version of diffusers from source.

Why from source: we submitted two pull requests (#12703 and #12715) to the 🤗 diffusers repository to add support for Z-Image, and both have been merged. Installing from source ensures you get Z-Image support and the latest features even if your installed release predates those changes.

```bash
pip install git+https://github.com/huggingface/diffusers
```

Z-Image-Turbo

Then, try the following code to generate an image:

```python
import torch
from diffusers import ZImagePipeline

# 1. Load the pipeline
# Use bfloat16 for optimal performance on supported GPUs
pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)
pipe.to("cuda")

# [Optional] Attention Backend
# Diffusers uses SDPA by default. Switch to Flash Attention for better efficiency if supported:
# pipe.transformer.set_attention_backend("flash")    # Enable Flash-Attention-2
# pipe.transformer.set_attention_backend("_flash_3") # Enable Flash-Attention-3

# [Optional] Model Compilation
# Compiling the DiT model accelerates inference, but the first run will take longer to compile.
# pipe.transformer.compile()

# [Optional] CPU Offloading
# Enable CPU offloading for memory-constrained devices.
# pipe.enable_model_cpu_offload()

prompt = "Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern. Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda (西安大雁塔), blurred colorful distant lights."

# 2. Generate Image
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=9,  # This actually results in 8 DiT forwards
    guidance_scale=0.0,     # Guidance should be 0 for the Turbo models
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]

image.save("example.png")
```

Z-Image

Recommended Parameters:

  • Resolution: 512×512 to 2048×2048 (total pixel area, any aspect ratio)
  • Guidance scale: 3.0 – 5.0
  • Inference steps: 28 – 50
  • Negative prompts: Strongly recommended for better control
  • CFG normalization: False for general stylization, True for realism
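The numeric recommendations above can be checked before launching a generation. The sketch below is a small pre-flight helper; the function name `check_z_image_settings` is hypothetical, and it reads "total pixel area" as a bound on height × width, which is an interpretation rather than something the pipeline enforces.

```python
def check_z_image_settings(
    height: int, width: int, guidance_scale: float, num_inference_steps: int
) -> list[str]:
    """Return warnings for settings outside the recommended ranges."""
    warnings = []
    area = height * width
    # Assumption: the resolution range bounds total pixels, so any aspect
    # ratio is fine as long as the area stays within it.
    if not 512 * 512 <= area <= 2048 * 2048:
        warnings.append(f"pixel area {area} outside 512x512 to 2048x2048")
    if not 3.0 <= guidance_scale <= 5.0:
        warnings.append(f"guidance_scale {guidance_scale} outside 3.0 to 5.0")
    if not 28 <= num_inference_steps <= 50:
        warnings.append(f"num_inference_steps {num_inference_steps} outside 28 to 50")
    return warnings


print(check_z_image_settings(1280, 720, guidance_scale=4.0, num_inference_steps=50))
# []
print(len(check_z_image_settings(256, 256, guidance_scale=0.0, num_inference_steps=9)))
# 3
```

These ranges apply to Z-Image only; the Turbo example earlier intentionally uses `guidance_scale=0.0` and far fewer steps.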

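Then, try the following code to generate an image:

```python
import torch
from diffusers import ZImagePipeline

# Load the pipeline
pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)
pipe.to("cuda")

# Generate image
# English gloss of the Chinese prompt: two young Asian women stand close
# together in front of a plain gray textured wall. The woman on the left has
# long curly hair and wears a navy sweater over a white shirt; the woman on
# the right wears a cream sweatshirt printed with "Tun the tables" and
# "New ideas". Both smile and look straight at the camera.
# (The prompt is truncated in the source.)
prompt = '两名年轻亚裔女性紧密站在一起,背景为朴素的灰色纹理墙面,可能是室内地毯地面。左侧女性留着长卷发,身穿藏青色毛衣,左袖有奶油色褶皱装饰,内搭白色立领衬衫,下身白色裤子;佩戴小巧金色耳钉,双臂交叉于背后。右侧女性留直肩长发,身穿奶油色卫衣,胸前印有"Tun the tables"字样,下方为"New ideas",搭配白色裤子;佩戴银色小环耳环,双臂交叉于胸前。两人均面带微笑直视镜头。照'
```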

Core symbols most depended-on inside this repo

  • get: called by 40 (src/zimage/scheduler.py)
  • format_bytes: called by 4 (src/utils/helpers.py)
  • set_timesteps: called by 3 (src/zimage/scheduler.py)
  • create_coordinate_grid: called by 3 (src/zimage/transformer.py)
  • load_config: called by 3 (src/utils/loader.py)
  • _native_attention_wrapper: called by 3 (src/utils/attention.py)
  • _sigma_to_t: called by 2 (src/zimage/scheduler.py)
  • generate: called by 2 (src/zimage/pipeline.py)

Shape

Method 56
Function 46
Class 24

Languages

Python 100%

Modules by API surface

  • src/zimage/autoencoder.py: 37 symbols
  • src/zimage/transformer.py: 30 symbols
  • src/utils/attention.py: 23 symbols
  • src/zimage/scheduler.py: 13 symbols
  • src/utils/helpers.py: 6 symbols
  • batch_inference.py: 4 symbols
  • src/zimage/pipeline.py: 3 symbols
  • src/utils/loader.py: 3 symbols
  • src/utils/import_utils.py: 3 symbols
  • src/tools/generate_manifest.py: 3 symbols
  • inference.py: 1 symbol

Dependencies from manifests, versioned

  • huggingface_hub 0.25.0 · 1×
  • safetensors
  • torch 2.5.0 · 1×
  • transformers 4.51.0 · 1×
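The listed versions can be compared against a local environment with the standard library alone. The sketch below treats the manifest versions as minimums, which is an assumption: the listing does not say whether they are exact pins or lower bounds. `safetensors` is left out because no version is given for it. The names `MANIFEST`, `parse_version`, and `report` are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile

# Versions copied from the dependency list; treated here as minimums.
MANIFEST = {"huggingface_hub": "0.25.0", "torch": "2.5.0", "transformers": "4.51.0"}


def parse_version(v: str) -> tuple[int, ...]:
    """Turn '2.5.1+cu121' into (2, 5, 1), ignoring local and pre-release tags."""
    core = v.split("+", 1)[0]
    parts = []
    for piece in core.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def report(manifest: dict[str, str] = MANIFEST) -> dict[str, str]:
    """Map each package to its installed version and how it compares."""
    result = {}
    for name, minimum in manifest.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            result[name] = "missing"
            continue
        ok = parse_version(installed) >= parse_version(minimum)
        result[name] = f"{installed} ({'ok' if ok else 'older than ' + minimum})"
    return result


for package, status in report().items():
    print(f"{package}: {status}")
```

The parser is deliberately simple; for full PEP 440 semantics, the `packaging` library would be a sturdier choice.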

For agents

$ claude mcp add Z-Image \
  -- python -m otcore.mcp_server <graph>
