github.com/AILab-CVC/YOLO-World @main sqlite

352 symbols 1,252 edges 113 files 128 documented · 36%

README

Tianheng Cheng^2,3,*, Lin Song^1,📧,*, Yixiao Ge^1,🌟,2, Wenyu Liu^3, Xinggang Wang^3,📧, Ying Shan^1,2

* Equal contribution 🌟 Project lead 📧 Corresponding author

¹ Tencent AI Lab, ² ARC Lab, Tencent PCG, ³ Huazhong University of Science and Technology

Notice

YOLO-World is still under active development!

We recommend that everyone use English to communicate on issues, as this helps developers from around the world discuss, share experiences, and answer questions together.

For business licensing and other related inquiries, don't hesitate to contact yixiaoge@tencent.com.

🔥 Updates

[2025-2-8]: We release YOLO-World-V2.1, which includes new pre-trained weights and training code for image prompts. Please see the YOLO-World-V2.1-Blog for details.
[2024-11-5]: We updated YOLO-World-Image; you can try it at HuggingFace YOLO-World-Image (Preview Version). It's a preview version and we are still improving it! Detailed documents about training and few-shot inference are coming soon.
[2024-7-8]: YOLO-World has now been integrated into ComfyUI! Come and try adding YOLO-World to your workflow! You can access it at StevenGrove/ComfyUI-YOLOWorld!
[2024-5-18]: YOLO-World models have been integrated with the FiftyOne computer vision toolkit for streamlined open-vocabulary inference across image and video datasets.
[2024-5-16]: Hey guys! Long time no see! This update contains (1) fine-tuning guide and (2) TFLite Export with INT8 Quantization.
[2024-5-9]: This update contains the real reparameterization 🪄, and it's better for fine-tuning on custom datasets and improves the training/inference efficiency 🚀!
[2024-4-28]: Long time no see! This update contains bug fixes and improvements: (1) ONNX demo; (2) image demo (supports tensor input); (3) new pre-trained models; (4) image prompts; (5) simple version for fine-tuning / deployment; (6) installation guide (including a requirements.txt).
[2024-3-28]: We provide: (1) more high-resolution pre-trained models (e.g., S, M, X) (#142); (2) pre-trained models with CLIP-Large text encoders. Most importantly, we preliminarily fix fine-tuning without mask-refine and explore a new fine-tuning setting (#160,#76). In addition, fine-tuning YOLO-World with mask-refine also obtains significant improvements; see configs/finetune_coco for details.
[2024-3-16]: We fix the demo bugs (#110,#94,#129, #125) with visualizations of segmentation masks, and release YOLO-World with Embeddings, which supports prompt tuning, text prompts and image prompts.
[2024-3-3]: We add the high-resolution YOLO-World, which supports 1280x1280 resolution with higher accuracy and better performance for small objects!
[2024-2-29]: We release the newest version of YOLO-World-v2 with higher accuracy and faster speed! We hope the community can join us to improve YOLO-World!
[2024-2-28]: Excited to announce that YOLO-World has been accepted by CVPR 2024! We're continuing to make YOLO-World faster and stronger, as well as easier for everyone to use.
[2024-2-22]: We sincerely thank RoboFlow and @Skalskip92 for the Video Guide about YOLO-World, nice work!
[2024-2-18]: We thank @Skalskip92 for developing the wonderful segmentation demo via connecting YOLO-World and EfficientSAM. You can try it now at the 🤗 HuggingFace Spaces.
[2024-2-17]: The largest model X of YOLO-World is released, which achieves better zero-shot performance!
[2024-2-17]: We release the code & models for YOLO-World-Seg now! YOLO-World now supports open-vocabulary / zero-shot object segmentation!
[2024-2-15]: The pre-trained YOLO-World-L with CC3M-Lite is released!
[2024-2-14]: We provide the image_demo for inference on images or directories.
[2024-2-10]: We provide the fine-tuning and data details for fine-tuning YOLO-World on the COCO dataset or custom datasets!
[2024-2-3]: We support the Gradio demo now in the repo and you can build the YOLO-World demo on your own device!
[2024-2-1]: We've released the code and weights of YOLO-World now!
[2024-2-1]: We deploy the YOLO-World demo on HuggingFace 🤗, you can try it now!
[2024-1-31]: We are excited to launch YOLO-World, a cutting-edge real-time open-vocabulary object detector.

TODO

YOLO-World is under active development, so please stay tuned ☕️! If you have suggestions 📃 or ideas 💡, we would love for you to bring them up in the Roadmap ❤️!

FAQ (Frequently Asked Questions)

We have set up an FAQ about YOLO-World in the GitHub discussions, where we collect common questions. Everyone is welcome to raise issues encountered during use or share solutions there, and we hope it helps you find answers quickly.

Highlights & Introduction

This repo contains the PyTorch implementation, pre-trained weights, and pre-training/fine-tuning code for YOLO-World.

YOLO-World is pre-trained on large-scale datasets, including detection, grounding, and image-text datasets.
YOLO-World is the next-generation YOLO detector, with a strong open-vocabulary detection capability and grounding ability.
YOLO-World presents a prompt-then-detect paradigm for efficient user-vocabulary inference, which re-parameterizes vocabulary embeddings as parameters of the model and achieves superior inference speed. You can try to export your own detection model without extra training or fine-tuning in our online demo!
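
The prompt-then-detect idea can be illustrated with a small sketch. This is not the repository's implementation: `PromptThenDetectHead`, `toy_encode_text`, and the three-dimensional embeddings are hypothetical names and values chosen for illustration. The sketch only assumes what the paragraph above states, namely that vocabulary embeddings are computed once and then kept as fixed model parameters, so inference does not call the text encoder again.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


class PromptThenDetectHead:
    """Toy classification head: the vocabulary is encoded once, then reused."""

    def __init__(self, encode_text, vocabulary):
        # "Prompt" step: run the (expensive) text encoder once, offline,
        # and store the normalized embeddings as fixed weights.
        self.vocabulary = list(vocabulary)
        self.weights = [l2_normalize(encode_text(w)) for w in self.vocabulary]

    def score(self, region_feature):
        # "Detect" step: no text encoder call, only dot products between
        # the region feature and the cached vocabulary weights.
        feat = l2_normalize(region_feature)
        return {
            word: sum(f * w for f, w in zip(feat, weight))
            for word, weight in zip(self.vocabulary, self.weights)
        }


def toy_encode_text(word):
    # Hypothetical stand-in for a real text encoder; values are made up.
    table = {
        "cat": [1.0, 0.0, 0.0],
        "dog": [0.0, 1.0, 0.0],
        "bus": [0.0, 0.0, 1.0],
    }
    return table[word]


head = PromptThenDetectHead(toy_encode_text, ["cat", "dog", "bus"])
scores = head.score([0.9, 0.1, 0.0])
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

Changing the vocabulary in this sketch means building a new head with a different word list; the per-region scoring code stays the same, which is the property the paragraph above attributes to re-parameterized vocabulary embeddings.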

Zero-shot Evaluation Results for Pre-trained Models

We evaluate all YOLO-World-V2.1 models on LVIS, LVIS-mini, and COCO in a zero-shot manner, and compare them with the previous version (the improvements are annotated in the superscripts).

Model	Resolution	LVIS AP	LVIS AP_r	LVIS AP_c	LVIS AP_f	LVIS-mini AP	LVIS-mini AP_r	LVIS-mini AP_c	LVIS-mini AP_f	COCO AP	COCO AP₅₀	COCO AP₇₅
YOLO-World-S	640	18.5^+1.2	12.6	15.8	24.1	23.6^+0.9	16.4	21.5	26.6	36.6	51.0	39.7
YOLO-World-S	1280	19.7^+0.9	13.5	16.3	26.3	25.5^+1.4	19.1	22.6	29.3	38.2	54.2	41.6
YOLO-World-M	640	24.1^+0.6	16.9	21.1	30.6	30.6^+0.6	19.7	29.0	34.1	43.0	58.6	46.7
YOLO-World-M	1280	26.0^+0.7	19.9	22.5	32.7	32.7^+1.1	24.4	30.2	36.4	43.8	60.3	47.7
YOLO-World-L	640	26.8^+0.7	19.8	23.6	33.4	33.8^+0.9	24.5	32.3	36.8	44.9	60.4	48.9
YOLO-World-L	800	28.3	22.5	24.4	35.1	35.2	27.8	32.6	38.8	47.4	63.3	51.8
YOLO-World-L	1280	28.7^+1.1	22.9	24.9	35.4	35.5^+1.2	24.4	34.0	38.8	46.0	62.5	50.0
YOLO-World-X	640	28.6^+0.2	22.0	25.6	34.9	35.8^+0.4	31.0	33.7	38.5	46.7	62.5	51.0
YOLO-World-X-1280 is coming soon.

Model Card

Model	Resolution	Training	Data	Model Weights
YOLO-World-S	640	PT (100e)	O365v1+GoldG+CC-LiteV2	🤗 HuggingFace
YOLO-World-S	1280	CPT (40e)	O365v1+GoldG+CC-LiteV2	🤗 HuggingFace
YOLO-World-M	640	PT (100e)	O365v1+GoldG+CC-LiteV2	🤗 HuggingFace
YOLO-World-M	1280	CPT (40e)	O365v1+GoldG+CC-LiteV2	🤗 HuggingFace
YOLO-World-L	640	PT (100e)	O365v1+GoldG+CC-LiteV2	🤗 HuggingFace
YOLO-World-L	800 / 1280	CPT (40e)	O365v1+GoldG+CC-LiteV2	🤗 HuggingFace
YOLO-World-X	640	PT (100e)	O365v1

Core symbols (most depended-on inside this repo)

Symbol	Called by	File
build	18	deploy/easydeploy/tools/build_engine.py
sigmoid	6	deploy/easydeploy/examples/numpy_coder.py
predict_by_feat	4	yolo_world/models/dense_heads/yolo_world_head.py
reparameterize	4	yolo_world/models/detectors/yolo_world.py
forward	4	yolo_world/models/detectors/yolo_world_image.py
get_data_info	3	yolo_world/datasets/mm_dataset.py
train	3	yolo_world/models/backbones/mm_backbone.py
forward_text	3	yolo_world/models/backbones/mm_backbone.py

Shape

Method 210

Function 73

Class 69

Languages

Python 100%

Modules by API surface

yolo_world/models/dense_heads/yolo_world_head.py	30 symbols
yolo_world/models/layers/yolo_bricks.py	27 symbols
yolo_world/datasets/transformers/mm_mix_img_transforms.py	27 symbols
yolo_world/models/backbones/mm_backbone.py	23 symbols
yolo_world/models/dense_heads/yolo_world_seg_head.py	17 symbols
yolo_world/models/detectors/yolo_world_image.py	16 symbols
yolo_world/models/detectors/yolo_world.py	13 symbols
deploy/easydeploy/examples/numpy_coder.py	12 symbols
deploy/easydeploy/model/backendwrapper.py	11 symbols
yolo_world/datasets/mm_dataset.py	10 symbols
deploy/tflite_demo.py	10 symbols
deploy/easydeploy/nms/trt_nms.py	10 symbols

Dependencies from manifests, versioned

mmcv · 1×
mmcv-lite 2.0.0rc4 · 1×
mmdet 3.0.0 · 1×
mmengine 0.7.1 · 1×
numpy · 1×
opencv-python 4.7.0.72 · 1×
openmim · 1×
supervision 0.19.0 · 1×
tokenizers · 1×
torch 1.11.0 · 1×
torchvision 0.16.2 · 1×
transformers · 1×
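
To compare a local environment against the versions listed above, a small check along the following lines may help. It is a sketch under stated assumptions: each listed version is treated as an exact pin (the manifests may actually specify ranges), packages listed without a version are skipped, and `check_pins` is a hypothetical helper, not part of the repository. It relies only on the standard-library `importlib.metadata` module.

```python
from importlib import metadata

# Versions copied from the dependency list above. Treating them as exact
# pins is an assumption; the manifests may allow ranges.
PINNED = {
    "mmcv-lite": "2.0.0rc4",
    "mmdet": "3.0.0",
    "mmengine": "0.7.1",
    "opencv-python": "4.7.0.72",
    "supervision": "0.19.0",
    "torch": "1.11.0",
    "torchvision": "0.16.2",
}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (pinned, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            # Drop a local build suffix such as "+cu113" before comparing.
            installed = get_version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in check_pins(PINNED).items():
        print(f"{name}: listed {wanted}, installed {installed or 'not installed'}")
```

The `get_version` parameter exists so the lookup can be replaced in tests; by default the function queries the packages installed in the current interpreter.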

For agents

$ claude mcp add YOLO-World \
  -- python -m otcore.mcp_server <graph>
