hub / github.com/showlab/ShowUI

github.com/showlab/ShowUI @main sqlite

503 symbols 1,586 edges 49 files 95 documented · 19%

README

ShowUI

Open-source, End-to-end, Lightweight, Vision-Language-Action model for GUI Agent & Computer Use.

ShowUI 是一款开源的、端到端、轻量级的视觉-语言-动作模型，专为 GUI 智能体设计。

ShowUI

    &nbsp&nbsp 📑 <a href="https://arxiv.org/abs/2411.17465">Paper</a> &nbsp&nbsp 
    | 🤗 <a href="https://huggingface.co/showlab/ShowUI-2B">Hugging Models</a>&nbsp&nbsp 
    | &nbsp&nbsp 🤗 <a href="https://huggingface.co/spaces/showlab/ShowUI">Spaces Demo</a> &nbsp&nbsp 
    | &nbsp&nbsp 📝 <a href="https://github.com/showlab/ShowUI/raw/main/assets/slide.pdf">Slides</a> &nbsp&nbsp 
    | &nbsp&nbsp 🕹️ <a href="https://openbayes.com/console/public/tutorials/I8euxlahBAm">OpenBayes贝式计算 Demo</a>

🤗 Datasets&nbsp&nbsp | &nbsp&nbsp💬 X (Twitter)&nbsp&nbsp | &nbsp&nbsp 🖥️ Computer Use &nbsp&nbsp | &nbsp&nbsp 📖 GUI Paper List &nbsp&nbsp | &nbsp&nbsp 🤖 ModelScope

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Kevin Qinghong Lin, Linjie Li, Difei Gao, Zhengyuan Yang, Shiwei Wu, Zechen Bai, Weixian Lei, Lijuan Wang, Mike Zheng Shou

Show Lab @ National University of Singapore, Microsoft

🔥 Update

[x] [2026.2.21] ShowUI-pi has been accepted to CVPR 2026.
[x] [2025.12.31] We released ShowUI-Aloha for human demonstration workflow.
[x] [2025.12.31] We released ShowUI-π for GUI dragging.
[x] [2025.3.2] Support fine-tuning and inference of the lastest base model Qwen2.5-VL.
[x] [2025.2.27] ShowUI has been accepted to CVPR 2025.
[x] [2025.2.13] Support vllm inference.
[x] [2025.1.20] Support Navigation tasks: Mind2Web, AITW, Miniwob training and evaluator.
[x] [2025.1.17] Support API Calling via Gradio Client, simply run python3 api.py.
[x] [2025.1.5] Release the ShowUI-web dataset.
[x] [2024.12.28] Update GPT-4o annotation recaptioning scripts.
[x] [2024.12.27] Update training codes and instructions.
[x] [2024.12.23] Update showui for UI-guided token selection implementation.
[x] [2024.12.15] ShowUI received Outstanding Paper Award at NeurIPS2024 Open-World Agents workshop.
[x] [2024.12.9] Support int8 Quantization.
[x] [2024.12.5] Major Update: ShowUI is integrated into OOTB for local run!
[x] [2024.12.1] We support iterative refinement to improve grounding accuracy. Try it at HF Spaces demo.
[x] [2024.11.27] We release the arXiv paper, HF Spaces demo and ShowUI-desktop.
[x] [2024.11.16] showlab/ShowUI-2B is available at huggingface.

🤖 vllm Inference

See inference_vllm.ipynb for vllm inference.

To leverage multiple GPUs for faster inference, you can adjust the gpu_num parameter

⚡ API Calling

Run python3 api.py by providing a screenshot and a query.

Since we are based on huggingface gradio client, you don't need a GPU to deploy the model locally 🤗

🖥️ Computer Use

See Computer Use OOTB for using ShowUI to control your PC.

https://github.com/user-attachments/assets/f50b7611-2350-4712-af9e-3d31e30020ee

⭐ Quick Start

See Quick Start for local model usage.

🤗 Local Gradio

See Gradio for installation.

🚀 Training

Our Training codebases supports: - [x] Grounding and Navigation training: Mind2Web, AITW, Miniwob - [x] Self-customized model: ShowUI, Qwen2VL, Qwen2.5VL - [x] Efficient Training: DeepSpeed, BF16, QLoRA, SDPA / FlashAttention2, Liger-Kernel - [x] Multiple datasets mixed training - [x] Interleaved data streaming - [x] Image randomly resize (crop, pad) - [x] Wandb training monitor - [x] Multi-GPUs, Multi-nodes training

See Train for training set up.

🕹️ UI-Guided Token Selection

Try test.ipynb, which seamless support for Qwen2VL models.

(a) Screenshot patch number: 1296 (b) By applying UI-graph, UI Component number: 167

✍️ Annotate your own data

Try recaption.ipynb, where we provide instructions on how to recaption the original annotations using GPT-4o.

❤ Acknowledgement

We extend our gratitude to SeeClick for providing their codes and datasets.

Special thanks to Siyuan for assistance with the Gradio demo and OOTB support.

🎓 BibTeX

If you find our work helpful, please kindly consider citing our paper.

@misc{lin2024showui,
      title={ShowUI: One Vision-Language-Action Model for GUI Visual Agent}, 
      author={Kevin Qinghong Lin and Linjie Li and Difei Gao and Zhengyuan Yang and Shiwei Wu and Zechen Bai and Weixian Lei and Lijuan Wang and Mike Zheng Shou},
      year={2024},
      eprint={2411.17465},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.17465}, 
}

If you like our project, please give us a star ⭐ on GitHub for the latest update.

Core symbols most depended-on inside this repo

apply_rotary_pos_emb_vision

called by 6

model/showui/modeling_showui.py

repeat_kv

called by 6

model/showui/modeling_showui.py

repeat_kv

called by 6

model/qwen2_5_vl/modeling_qwen2_5_vl.py

Shape

Method 287

Class 109

Function 104

Route 3

Languages

Python100%

Modules by API surface

model/qwen2_5_vl/modeling_qwen2_5_vl.py83 symbols

model/showui/modeling_showui.py80 symbols

model/qwen2_vl/modeling_qwen2_vl.py78 symbols

model/qwen2_5_vl/modular_qwen2_5_vl.py37 symbols

utils/utils.py18 symbols

data/data_utils.py17 symbols

model/showui/image_processing_showui.py15 symbols

data/dset_shared_navigation.py12 symbols

data/dset_mind2web.py12 symbols

main/utils_aitw.py11 symbols

data/dset_aitw.py11 symbols

data/dset_miniwob.py10 symbols

Dependencies from manifests, versioned

accelerate0.30.1 · 1×

bitsandbytes0.43.1 · 1×

deepspeed0.13.1 · 1×

selenium4.12.0 · 1×

transformers4.47.0 · 1×

For agents

$ claude mcp add ShowUI \
  -- python -m otcore.mcp_server <graph>

⬇ download graph artifact