
github.com/OpenSenseNova/SenseNova-U1 @comfyui-v0.1.4 sqlite


857 symbols · 2,813 edges · 76 files · 264 documented (31%)

README

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture



visualization

📣 Updated News

[2026.05.15] Release SenseNova-U1-8B-MoT-Infographic 📊 model for improved infographic generation. See U1 Infographic Model for details, and ✨ Infographic Showcases for 100 generated examples.
[2026.05.10] Release 🔥SenseNova-U1 Technical Report🔥 and the weights for SenseNova-U1-A3B-MoT-SFT & SenseNova-U1-A3B-MoT.
[2026.05.08] Add GGUF quantized checkpoints and layer-offload VRAM modes for low-VRAM single-GPU inference. See Memory-efficient inference. GGUF weights for SenseNova-U1-8B-MoT-Merger are available at 🤗 smthem/SenseNova-U1-8B-MoT-Merger-gguf — many thanks to @smthem for contributing the quantized weights.
[2026.05.06] Release SenseNova-U1-8B-MoT-LoRA-8step-V1.0. Please see the example script.
[2026.04.30] Release the preview version of the 8-step inference model SenseNova-U1-8B-MoT-8step-preview. In most cases, the image generation quality of this model closely matches that of the base model (see comparison and existing issues). To test this model, use the inference scripts with the parameters --cfg_scale 1.0 --num_steps 8.
[2026.04.27] Initial release of the weights for SenseNova-U1-8B-MoT-SFT and SenseNova-U1-8B-MoT.
[2026.04.27] Initial release of the inference code for SenseNova-U1.
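
The 8-step preview entry above pairs --num_steps 8 with --cfg_scale 1.0. The following is a minimal sketch of why that combination is cheap, assuming the common classifier-free guidance formula. The README does not state which formulation the inference scripts use, and cfg_combine and forward_passes are hypothetical helper names, not functions from the repository.

```python
def cfg_combine(v_cond, v_uncond, cfg_scale):
    """Common classifier-free guidance rule (an assumption about the sampler):
    result = v_uncond + cfg_scale * (v_cond - v_uncond), element-wise."""
    return [u + cfg_scale * (c - u) for c, u in zip(v_cond, v_uncond)]


def forward_passes(num_steps, cfg_scale):
    """Model evaluations per image, assuming the unconditional pass
    is skipped whenever cfg_scale == 1.0 (it has no effect there)."""
    passes_per_step = 1 if cfg_scale == 1.0 else 2
    return num_steps * passes_per_step


if __name__ == "__main__":
    v_cond, v_uncond = [1.0, 2.0], [0.5, 0.0]
    # With a scale of 1.0 the guided output equals the conditional prediction.
    print(cfg_combine(v_cond, v_uncond, 1.0))
    # Eight steps with no unconditional pass: eight model evaluations.
    print(forward_passes(num_steps=8, cfg_scale=1.0))
```

Under these assumptions, a scale of 1.0 makes the unconditional branch redundant, so each sampling step needs a single model evaluation.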

🌟 Overview

🚀 SenseNova U1 is a new series of native multimodal models that unifies multimodal understanding, reasoning, and generation within a monolithic architecture. It marks a fundamental paradigm shift in multimodal AI: from modality integration to true unification. Rather than relying on adapters to translate between modalities, SenseNova U1 models think-and-act across language and vision natively.

Unifying visual understanding and generation in an end-to-end architecture from pixel to word opens tremendous possibilities, enabling highly efficient and strong understanding, generation, and interleaved reasoning in a natively multimodal manner.

radar plot

🏗️ Key Pillars:

At the core of SenseNova U1 is NEO-unify, a novel architecture designed from first principles for multimodal AI. It eliminates both the Visual Encoder (VE) and the Variational Auto-Encoder (VAE), so that pixel and word information are inherently and deeply correlated. Its key features are as follows:

🔗 Model language and visual information end-to-end as a unified compound.
🖼️ Preserve semantic richness while maintaining pixel-level visual fidelity.
🧠 Reason across modalities with high efficiency & minimal conflict via native MoTs.

✨ What This Unlocks:

Left: Generation latency vs. average performance on OneIG (EN, ZH), LongText (EN, ZH), BizGenEval (Easy, Hard), CVTG, and IGenBench.

Right: Generation latency vs. average performance on infographic benchmarks, i.e., BizGenEval (Easy, Hard) and IGenBench.

🏆 Open-source SoTA in both understanding and generation: SenseNova U1 sets a new standard for unified multimodal understanding and generation, achieving state-of-the-art performance among open-source models across a wide range of understanding, reasoning, and generation benchmarks.
📖 Native interleaved image-text generation: SenseNova U1 can generate coherent interleaved text and images in a single flow with one model, enabling use cases such as practical guides and travel diaries that combine clear communication with vivid storytelling and transform complex information into intuitive visuals.
📰 High-density information rendering: SenseNova U1 demonstrates strong capabilities in dense visual communication, generating richly structured layouts for knowledge illustrations, posters, presentations, comics, resumes, and other information-rich formats.

🌍 Beyond Multimodality:

🤖 Vision–Language–Action (VLA)
🌐 World Modeling (WM)

🦁 Models

In this release, we are open-sourcing the SenseNova U1 Lite series in two sizes:

SenseNova U1-8B-MoT — dense backbone
SenseNova U1-A3B-MoT — MoE backbone

Model	Params	HF Weights
SenseNova-U1-8B-MoT-Infographic	8B MoT	🤗 link
SenseNova-U1-8B-MoT-SFT	8B MoT	🤗 link
SenseNova-U1-8B-MoT	8B MoT	🤗 link
SenseNova-U1-8B-MoT-LoRA-8step-V1.0	0.4B	🤗 link
SenseNova-U1-A3B-MoT-SFT	A3B MoT	🤗 link
SenseNova-U1-A3B-MoT	A3B MoT	🤗 link

The SFT models (×32 downsampling ratio) are trained in four stages: Understanding Warmup, Generation Pre-training, Unified Mid-training, and Unified SFT. The final models are obtained after an initial round of T2I RL training on top of the SFT models.

Although relatively compact by today’s standards, these models already show strong performance across diverse tasks, comparable to commercial models with excellent cost efficiency. Notably, larger-scale versions are planned to further enhance capability and performance in the future.

💡 The 8B-MoT in SenseNova-U1-8B-MoT refers to ~8B understanding parameters and ~8B generation parameters. See parameter breakdown for details.
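
As a rough illustration of what that note implies for memory, the sketch below estimates weight storage under stated assumptions: exactly 8e9 parameters in each group, the two groups disjoint, and weights only (no activations or KV cache). The helper name and the dtype table are illustrative and not taken from the repository; the linked parameter breakdown has the real numbers.

```python
# Bytes needed to store one parameter in a few common dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params, dtype):
    """Approximate weight storage in GiB for a given parameter count."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Assumed round figures from the "~8B + ~8B" note; not exact counts.
UNDERSTANDING_PARAMS = 8e9
GENERATION_PARAMS = 8e9
TOTAL_PARAMS = UNDERSTANDING_PARAMS + GENERATION_PARAMS

if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        gib = weight_memory_gib(TOTAL_PARAMS, dtype)
        print(f"{dtype}: about {gib:.1f} GiB of weights")
```

Under these assumptions the combined weights come to roughly 30 GiB in bfloat16, which is consistent with the README offering GGUF quantized checkpoints and layer-offload modes for low-VRAM single-GPU inference.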

📋 ToDo List

[ ] Training code of SenseNova-U1
[x] Final weights and technical report of SenseNova-U1

🎨 Showcases

🖼️ Text-to-Image (General)

🖼️ Text-to-Image (Reasoning)

Original Text	Reasoning Process	Resulting Image
A male peacock trying to attract a female	1. Instruction Understanding: The core subject is a single male peacock. The action involves the male peacock trying to attract a female. 2. Reasoning Process: Male peacocks perform this ritual by fanning their tail feathers into a massive, colorful circular display. 3. Establish the frame: The frame captures the male peacock in a full-body shot to show the entire display. The angle is eye-level to emphasize the height of the tail feathers. 4. Set the lighting and color: Natural lighting illuminates the colorful circular display with soft shadows. The palette features vibrant blues and greens on the tail feathers. 5. Lock the style: The style is a realistic photograph with sharp focus on the male peacock. The finish is natural without artistic filters. 6. Explicit Prompt: A vibrant male peacock with his tail feathers fully fanned out in a wide, iridescent display of blues and greens, standing in a garden.
A small piece of dry wood and a dense iron block are in a transparent water tank	1. Instruction Understanding: The core subjects are a small piece of dry wood and a dense iron block positioned within a transparent water tank. The wood is floating on the surface while the iron block is submerged at the bottom. 2. Reasoning Process: The wood is less dense than water so it will float, while the iron is denser and will sink to the bottom. 3. Establish the frame: The composition is a vertical medium shot centering the rectangular tank within the frame. The camera angle is eye-level to clearly display the water line and the submerged base. Focus is sharp across the entire depth of the tank to ensure both materials are distinct. 4. Build the environment: The scene is contained entirely within the clear glass walls of the water tank. The water fills the majority of the volume, providing a medium for the floating wood and sunken iron block. The background remains out of focus to keep attention on the tan

Core symbols (most depended-on inside this repo)

Symbol	Called by	File
info	23	apps/comfyui/local_pipeline.py
tqdm	19	examples/vqa/inference.py
write	19	evaluation/gen/tiif/eval/summary_dimension_results.py
_t2i_predict_v	17	src/sensenova_u1/models/neo_unify/modeling_neo_chat.py
append_message	16	src/sensenova_u1/models/neo_unify/conversation.py
from_pretrained	15	src/sensenova_u1/models/neo_unify/configuration_neo_vit.py
_build_t2i_image_indexes	13	src/sensenova_u1/models/neo_unify/modeling_neo_chat.py
prepare_flash_kv_cache	11	src/sensenova_u1/models/neo_unify/modeling_neo_chat.py

Shape

Function 497

Method 269

Class 87

Route 4

Languages

Python 99%

TypeScript 1%

Modules by API surface

src/sensenova_u1/models/neo_unify/modeling_fm_modules.py · 58 symbols
evaluation/interleave/OpenING/infer_opening.py · 45 symbols
src/sensenova_u1/models/neo_unify/modeling_qwen3.py · 44 symbols
src/sensenova_u1/models/neo_unify/modeling_neo_chat.py · 39 symbols
apps/comfyui/nodes.py · 38 symbols
apps/comfyui/local_pipeline.py · 37 symbols
evaluation/interleave/Unimmmu/inference_unimmmu.py · 30 symbols
evaluation/interleave/Realunify/inference_realunify.py · 29 symbols
src/sensenova_u1/utils/layer_offload.py · 28 symbols
evaluation/interleave/BabyVision/infer_babyvision.py · 27 symbols
evaluation/interleave/OpenING/eval_opening.py · 24 symbols
src/sensenova_u1/utils/profiler.py · 20 symbols

Dependencies from manifests, versioned

accelerate 1.10.1 · 1×
httpx · 1×
huggingface-hub 0.36.2 · 1×
numpy · 1×
packaging 25.0 · 1×
pillow 12.0.0 · 1×
pre-commit 4.5.1 · 1×
python-dotenv · 1×
safetensors 0.6.2 · 1×
sentencepiece 0.2.1 · 1×
tokenizers 0.22.1 · 1×
torch 2.8.0 · 1×

For agents

$ claude mcp add SenseNova-U1 \
  -- python -m otcore.mcp_server <graph>
