MCPcopy
hub / github.com/Yuanshi9815/OminiControl

github.com/Yuanshi9815/OminiControl @main sqlite

repository ↗ · DeepWiki ↗
60 symbols 258 edges 8 files 5 documented · 8%
README

OminiControl

HuggingFace HuggingFace HuggingFace GitHub HuggingFace

arXiv arXiv

OminiControl: Minimal and Universal Control for Diffusion Transformer

Zhenxiong Tan, Songhua Liu, Xingyi Yang, Qiaochu Xue, and Xinchao Wang

xML Lab, National University of Singapore

OminiControl2: Efficient Conditioning for Diffusion Transformers

Zhenxiong Tan, Qiaochu Xue, Xingyi Yang, Songhua Liu, and Xinchao Wang

xML Lab, National University of Singapore

Features

OminiControl is a minimal yet powerful universal control framework for Diffusion Transformer models like FLUX.

  • Universal Control 🌐: A unified control framework that supports both subject-driven control and spatial control (such as edge-guided and in-painting generation).

  • Minimal Design 🚀: Injects control signals while preserving original model structure. Only introduces 0.1% additional parameters to the base model.

News

  • 2026-07-02: Codebase updated for diffusers 0.38. generate() gains a condition_scale argument to adjust condition strength, subject-driven generation now works on FLUX.1-dev (see example), and OminiControl2's KV-cache fast inference (kv_cache=True) is now documented (see Usage example).
  • 2025-05-12: ⭐️ The code of OminiControl2 is released. It introduces a new efficient conditioning method for diffusion transformers. (Check out the training code here).
  • 2025-05-12: Support custom style LoRA. (Check out the example).
  • 2025-04-09: ⭐️ OminiControl Art is released. It can stylize any image with an artistic style. (Check out the demo and inference examples).
  • 2024-12-26: Training code is released. Now you can create your own OminiControl model by customizing any control tasks (3D, multi-view, pose-guided, try-on, etc.) with the FLUX model. Check the training folder for more details.

Quick Start

Setup (Optional)

  1. Environment setup
conda create -n omini python=3.12
conda activate omini
  1. Torch installation (pick the index URL matching your CUDA version)
pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu128
  1. Requirements installation
pip install -r requirements.txt

Usage example

  1. Subject-driven generation: examples/subject.ipynb (on FLUX.1-dev: examples/subject_dev.ipynb)
  2. Subject-driven generation at 1024x1024: examples/subject_1024.ipynb (best quality with the 1024-trained model)
  3. In-painting: examples/inpainting.ipynb
  4. Canny edge to image, depth to image, colorization, deblurring: examples/spatial.ipynb
  5. Combining with a custom style LoRA: examples/combine_with_style_lora.ipynb
  6. Image stylization (OminiControl Art): examples/ominicontrol_art.ipynb

Note (multiple LoRAs): if you load more than one adapter via repeated pipe.load_lora_weights(..., adapter_name=...) calls, activate them explicitly with pipe.set_adapters(["adapter_a", "adapter_b"]) — otherwise only the last-loaded adapter stays active and the others are silently ignored. See examples/spatial.ipynb for the pattern.

Adjusting condition strength

generate() accepts a condition_scale argument (default 1.0, which reproduces the original behavior exactly). Values > 1 strengthen the condition image's influence, values < 1 weaken it, and 0 suppresses it entirely.

result = generate(pipe, prompt=prompt, conditions=[condition], condition_scale=1.3)

Faster generation with KV-cache (OminiControl2)

Pass kv_cache=True to generate() to compute the condition branch's keys/values once and reuse them across all remaining steps (~1.5x end-to-end speedup at 8 steps). This requires a LoRA trained with independent_condition: true — see Efficient Generation (OminiControl2).

Guidelines for subject-driven generation

  1. Center-crop and resize your input image to the model's native resolution (512x512 for subject/subject_512, 1024x1024 for subject_1024_beta) before passing it in — the pipeline does not do this automatically. See the example notebooks for the preprocessing code.
  2. When writing prompts, refer to the subject using phrases like this item, the object, or it. e.g.
  3. A close up view of this item. It is placed on a wooden table.
  4. A young lady is wearing this shirt.
  5. The model primarily works with objects rather than human subjects currently, due to the absence of human data in training.

Generated samples

Subject-driven generation

HuggingFace

Demos (Left: condition image; Right: generated image)

Text Prompts

  • Prompt1: A close up view of this item. It is placed on a wooden table. The background is a dark room, the TV is on, and the screen is showing a cooking show. With text on the screen that reads 'Omini Control!.'
  • Prompt2: A film style shot. On the moon, this item drives across the moon surface. A flag on it reads 'Omini'. The background is that Earth looms large in the foreground.
  • Prompt3: In a Bauhaus style room, this item is placed on a shiny glass table, with a vase of flowers next to it. In the afternoon sun, the shadows of the blinds are cast on the wall.
  • Prompt4: On the beach, a lady sits under a beach umbrella with 'Omini' written on it. She's wearing this shirt and has a big smile on her face, with her surfboard hehind her. The sun is setting in the background. The sky is a beautiful shade of orange and purple.

More results

  • Try on:
  • Scene variations:
  • Dreambooth dataset:
  • Oye-cartoon finetune:

Spatially aligned control

  1. Image Inpainting (Left: original image; Center: masked image; Right: filled image)
  2. Prompt: The Mona Lisa is wearing a white VR headset with 'Omini' written on it.

    - Prompt: A yellow book with the word 'OMINI' in large font on the cover. The text 'for FLUX' appears at the bottom.

    2. Other spatially aligned tasks (Canny edge to image, depth to image, colorization, deblurring)

Click to show

  <img src='./assets/demo/room_corner_canny.jpg' width='48%'/>
  <img src='./assets/demo/room_corner_depth.jpg' width='48%' />
  <img src='./assets/demo/room_corner_coloring.jpg' width='48%' />
  <img src='./assets/demo/room_corner_deblurring.jpg' width='48%' />




Prompt: *A light gray sofa stands against a white wall, featuring a black and white geometric patterned pillow. A white side table sits next to the sofa, topped with a white adjustable desk lamp and some books. Dark hardwood flooring contrasts with the pale walls and furniture.*

Stylize images

HuggingFace

Models

Subject-driven control: | Model | Description | Resolution | | ------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------ | | experimental / subject | The model used in the paper. | (512, 512) | | omini / subject_512 | The model has been fine-tuned on a larger dataset. | (512, 512) | | omini / subject_1024_beta | The model has been fine-tuned on a larger dataset and trained at 1024x1024. | (1024, 1024) | | oye-cartoon | Fine-tuned on the oye-cartoon dataset by @saquib764 (for FLUX.1-dev) | (512, 512) |

The subject LoRAs were trained on FLUX.1-dev. When running them on FLUX.1-dev, enable real image guidance (image_guidance_scale > 1.0, keep guidance_scale=3.5) — see examples/subject_dev.ipynb. They also run on FLUX.1-schnell (as in the example notebooks), where no image guidance is needed.

subject_1024_beta was trained at 1024x1024 and gives its best results at that resolution. The weight currently lives in a non-main revision of the HF repo — pass revision= when loading it (see examples/subject_1024.ipynb).

Spatially aligned control: | Model | Description | Resolution | | --------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------- | ------------ | | experimental / <task_name> | Canny edge to image (canny), depth to image (depth), colorization (coloring), deblurring (deblurring), in-painting (fill). Works on both FLUX.1-dev and FLUX.1-schnell. | (512, 512) |

Community Extensions

Limitations

  1. The model's subject-driven generation primarily works with objects rather than human subjects due to the absence of human data in training.
  2. The subject-driven LoRAs were trained on FLUX.1-dev. When running them on FLUX.1-dev, you must use real image guidance: call generate(...) with image_guidance_scale > 1.0 (e.g. 1.5) and more steps (~20–28) — without it, FLUX.1-dev tends to ignore the condition. image_guidance_scale is the tunable CFG knob; the distilled guidance_scale must be kept at 3.5 (the value used in training, for train/inference consistency — it is not a free hyperparameter). See examples/subject_dev.ipynb. On FLUX.1-schnell (as in the example notebooks), no image guidance is needed.
  3. The subject/subject_512 and spatial LoRAs were trained at 512x512 and work best at that resolution; subject_1024_beta was trained at 1024x1024 and gives its best results there (see

Core symbols most depended-on inside this repo

lora_forward
called by 10
omini/pipeline/flux_omini.py
get_config
called by 5
omini/train_flux/trainer.py
train
called by 5
omini/train_flux/trainer.py
convert_to_condition
called by 4
omini/pipeline/flux_omini.py
generate
called by 4
omini/pipeline/flux_omini.py
clip_hidden_states
called by 3
omini/pipeline/flux_omini.py
encode_images
called by 3
omini/pipeline/flux_omini.py
encode
called by 3
omini/pipeline/flux_omini.py

Shape

Function 32
Method 20
Class 8

Languages

Python100%

Modules by API surface

omini/pipeline/flux_omini.py20 symbols
omini/train_flux/trainer.py14 symbols
omini/train_flux/train_subject.py7 symbols
omini/train_flux/train_spatial_alignment.py7 symbols
omini/train_flux/train_token_integration.py4 symbols
omini/train_flux/train_multi_condition.py4 symbols
omini/train_flux/train_custom.py4 symbols

Dependencies from manifests, versioned

diffusers0.38.0 · 1×
transformers4.55 · 1×

For agents

$ claude mcp add OminiControl \
  -- python -m otcore.mcp_server <graph>

⬇ download graph artifact