OminiControl: Minimal and Universal Control for Diffusion Transformer
Zhenxiong Tan, Songhua Liu, Xingyi Yang, Qiaochu Xue, and Xinchao Wang
xML Lab, National University of Singapore
OminiControl2: Efficient Conditioning for Diffusion Transformers
Zhenxiong Tan, Qiaochu Xue, Xingyi Yang, Songhua Liu, and Xinchao Wang
xML Lab, National University of Singapore
OminiControl is a minimal yet powerful universal control framework for Diffusion Transformer models like FLUX.
Universal Control 🌐: A unified control framework that supports both subject-driven control and spatial control (such as edge-guided and in-painting generation).
Minimal Design 🚀: Injects control signals while preserving original model structure. Only introduces 0.1% additional parameters to the base model.
diffusers 0.38. generate() gains a condition_scale argument to adjust condition strength, subject-driven generation now works on FLUX.1-dev (see example), and OminiControl2's KV-cache fast inference (kv_cache=True) is now documented (see Usage example).
conda create -n omini python=3.12
conda activate omini
pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
- examples/subject.ipynb (on FLUX.1-dev: examples/subject_dev.ipynb)
- examples/subject_1024.ipynb (best quality with the 1024-trained model)
- examples/inpainting.ipynb
- examples/spatial.ipynb
- examples/combine_with_style_lora.ipynb
- examples/ominicontrol_art.ipynb

Note (multiple LoRAs): if you load more than one adapter via repeated pipe.load_lora_weights(..., adapter_name=...) calls, activate them explicitly with pipe.set_adapters(["adapter_a", "adapter_b"]). Otherwise only the last-loaded adapter stays active and the others are silently ignored. See examples/spatial.ipynb for the pattern.
generate() accepts a condition_scale argument (default 1.0, which reproduces the original behavior exactly). Values > 1 strengthen the condition image's influence, values < 1 weaken it, and 0 suppresses it entirely.
result = generate(pipe, prompt=prompt, conditions=[condition], condition_scale=1.3)
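To build intuition for how a single scalar can behave this way, the toy below treats condition_scale as an additive log-bias on the attention logits of the condition keys. This mechanism is an assumption made for illustration, not a description of how generate() implements the argument, and the function name and shapes are hypothetical. Under that assumption, a scale of 1.0 adds log(1) = 0 and leaves attention unchanged, values above 1 shift attention mass toward the condition tokens, and 0 adds -inf so the condition tokens receive no attention at all.

```python
import numpy as np


def attention_mass_on_condition(logits, n_cond, condition_scale):
    """Fraction of each query's attention that lands on the last n_cond keys.

    Toy model only: the scale is applied as log(scale) added to the logits of
    the condition keys. This is an illustrative assumption, not OminiControl's code.
    """
    if condition_scale < 0:
        raise ValueError("condition_scale must be non-negative")
    # log(0) is -inf, which removes the condition keys from the softmax entirely.
    bias = -np.inf if condition_scale == 0 else np.log(condition_scale)

    biased = np.array(logits, dtype=float)  # copy so the caller's array is untouched
    biased[:, -n_cond:] += bias

    # Numerically stable softmax over the key axis.
    biased -= biased.max(axis=1, keepdims=True)
    weights = np.exp(biased)
    weights /= weights.sum(axis=1, keepdims=True)

    return weights[:, -n_cond:].sum(axis=1)
```

Calling the function on the same logits with scales 0.5, 1.0, and 1.3 gives a per-query attention share on the condition keys that grows with the scale, which mirrors the weaken / unchanged / strengthen behavior described above.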
Pass kv_cache=True to generate() to compute the condition branch's keys/values once and reuse them across all remaining steps (~1.5x end-to-end speedup at 8 steps). This requires a LoRA trained with independent_condition: true — see Efficient Generation (OminiControl2).
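The caching idea can be illustrated with a toy denoising loop. The sketch below is not the repository's implementation: the projection matrices, token counts, update rule, and function names are all made up. It assumes that the condition tokens' keys and values do not depend on the denoising step or on the noisy image tokens (the property that independent_condition: true is meant to provide), so they can be projected once and reused.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # toy hidden size
W_q = rng.normal(size=(D, D))
W_k = rng.normal(size=(D, D))
W_v = rng.normal(size=(D, D))


def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def attend(image_tokens, cond_k, cond_v):
    """Image queries attend over [image keys; condition keys]."""
    q = image_tokens @ W_q
    k = np.concatenate([image_tokens @ W_k, cond_k])
    v = np.concatenate([image_tokens @ W_v, cond_v])
    return softmax(q @ k.T / np.sqrt(D)) @ v


def denoise(image_tokens, cond_tokens, steps, kv_cache):
    """Run a toy loop; return final tokens and how often the condition was projected."""
    cond_projections = 0
    cache = None
    for _ in range(steps):
        if kv_cache and cache is not None:
            cond_k, cond_v = cache  # reuse keys/values computed at the first step
        else:
            cond_k, cond_v = cond_tokens @ W_k, cond_tokens @ W_v
            cond_projections += 1
            cache = (cond_k, cond_v)
        # Stand-in for one denoising update of the image tokens.
        image_tokens = 0.5 * image_tokens + 0.5 * attend(image_tokens, cond_k, cond_v)
    return image_tokens, cond_projections
```

With 8 steps, the cached run projects the condition tokens once instead of 8 times and produces the same output, because nothing about the condition branch changes between steps in this toy setup.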
Prepare the condition image at the LoRA's training resolution (512x512 for subject/subject_512, 1024x1024 for subject_1024_beta) before passing it in; the pipeline does not do this automatically. See the example notebooks for the preprocessing code.
In prompts, refer to the subject as this item, the object, or it.
Demos (Left: condition image; Right: generated image)
Text Prompts
More results
Oye-cartoon finetune:
Prompt: The Mona Lisa is wearing a white VR headset with 'Omini' written on it.
- Prompt: A yellow book with the word 'OMINI' in large font on the cover. The text 'for FLUX' appears at the bottom.
2. Other spatially aligned tasks (Canny edge to image, depth to image, colorization, deblurring)
<img src='./assets/demo/room_corner_canny.jpg' width='48%'/>
<img src='./assets/demo/room_corner_depth.jpg' width='48%' />
<img src='./assets/demo/room_corner_coloring.jpg' width='48%' />
<img src='./assets/demo/room_corner_deblurring.jpg' width='48%' />
Prompt: *A light gray sofa stands against a white wall, featuring a black and white geometric patterned pillow. A white side table sits next to the sofa, topped with a white adjustable desk lamp and some books. Dark hardwood flooring contrasts with the pale walls and furniture.*
Subject-driven control:
| Model | Description | Resolution |
| ------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------ |
| experimental / subject | The model used in the paper. | (512, 512) |
| omini / subject_512 | The model has been fine-tuned on a larger dataset. | (512, 512) |
| omini / subject_1024_beta | The model has been fine-tuned on a larger dataset and trained at 1024x1024. | (1024, 1024) |
| oye-cartoon | Fine-tuned on the oye-cartoon dataset by @saquib764 (for FLUX.1-dev) | (512, 512) |
The subject LoRAs were trained on FLUX.1-dev. When running them on FLUX.1-dev, enable real image guidance (image_guidance_scale > 1.0, keep guidance_scale=3.5); see examples/subject_dev.ipynb. They also run on FLUX.1-schnell (as in the example notebooks), where no image guidance is needed.

subject_1024_beta was trained at 1024x1024 and gives its best results at that resolution. The weight currently lives in a non-main revision of the HF repo, so pass revision= when loading it (see examples/subject_1024.ipynb).
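Since each LoRA works best at its training resolution, a small helper can bring a condition image to the right size before it is passed in. The sketch below is one plausible approach (center crop to a square, then resize). It is not the notebooks' preprocessing code, and the helper names and dictionary keys are illustrative, with the sizes taken from the model table. It assumes a Pillow-style image object that exposes size, crop(), and resize().

```python
# Training resolutions from the model table; the keys are illustrative labels.
TRAINED_SIZE = {"subject": 512, "subject_512": 512, "subject_1024_beta": 1024}


def center_crop_box(width, height):
    """Return (left, upper, right, lower) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return (left, upper, left + side, upper + side)


def prepare_condition(image, lora_name):
    """Center-crop a Pillow-style image to a square, then resize it to the
    training resolution listed for lora_name."""
    size = TRAINED_SIZE[lora_name]
    width, height = image.size
    return image.crop(center_crop_box(width, height)).resize((size, size))
```

With Pillow installed, something like prepare_condition(Image.open(path).convert("RGB"), "subject_512") would yield a 512x512 image, where path is a placeholder for your own file. Center cropping is only one choice; padding may suit subjects that sit near the image border.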
Spatially aligned control:
| Model | Description | Resolution |
| --------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------- | ------------ |
| experimental / <task_name> | Canny edge to image (canny), depth to image (depth), colorization (coloring), deblurring (deblurring), in-painting (fill). Works on both FLUX.1-dev and FLUX.1-schnell. | (512, 512) |
These LoRAs were trained on FLUX.1-dev. When running them on FLUX.1-dev, you must use real image guidance: call generate(...) with image_guidance_scale > 1.0 (e.g. 1.5) and more steps (~20–28). Without it, FLUX.1-dev tends to ignore the condition. image_guidance_scale is the tunable CFG knob; the distilled guidance_scale must be kept at 3.5 (the value used in training, for train/inference consistency), so it is not a free hyperparameter. See examples/subject_dev.ipynb. On FLUX.1-schnell (as in the example notebooks), no image guidance is needed.

subject/subject_512 and spatial LoRAs were trained at 512x512 and work best at that resolution; subject_1024_beta was trained at 1024x1024 and gives its best results there.
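The guidance rules above can be collected into a small helper that returns keyword arguments for generate(). This is only a sketch: the helper name is hypothetical, 1.5 and 28 are example values picked from the ranges mentioned above, and passing the step count as num_inference_steps is an assumption about generate()'s signature that should be checked against the example notebooks.

```python
def guidance_kwargs(base_model, image_guidance_scale=1.5, steps=28):
    """Suggested generate() keyword arguments for a given base model.

    Hypothetical helper; it encodes the guidance notes from this README.
    """
    if base_model == "FLUX.1-dev":
        if image_guidance_scale <= 1.0:
            raise ValueError("FLUX.1-dev needs image_guidance_scale > 1.0")
        return {
            "image_guidance_scale": image_guidance_scale,  # the tunable CFG knob
            "guidance_scale": 3.5,  # fixed: the value used in training
            "num_inference_steps": steps,  # assumed parameter name
        }
    if base_model == "FLUX.1-schnell":
        # No image guidance needed; leave everything at the notebook defaults.
        return {}
    raise ValueError(f"unknown base model: {base_model!r}")
```

A call might then look like generate(pipe, prompt=prompt, conditions=[condition], **guidance_kwargs("FLUX.1-dev")). Keeping guidance_scale out of the helper's parameters reflects the note that it is not meant to be tuned.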