
github.com/fundamentalvision/BEVFormer @v2.0 sqlite


392 symbols 1,467 edges 141 files 206 documented · 53%

README

BEVFormer: a Cutting-edge Baseline for Camera-based Detection

https://user-images.githubusercontent.com/27915819/161392594-fc0082f7-5c37-4919-830a-2dd423c1d025.mp4

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers - Paper | Blog (in Chinese) | Presentation Slides at CVPR 2022 Workshop (soon) | Live-streaming video on BEV Perception (soon)

News

[2022/6/16]: We added two BEVFormer configurations, which require less GPU memory than the base version. Please pull this repo to obtain the latest code.
[2022/6/13]: We released an initial version of BEVFormer. It achieves a baseline result of 51.7% NDS on nuScenes.
[2022/5/23]: 🚀🚀Built on top of BEVFormer, BEVFormer++ gathers the best practices of recent SOTA methods together with our own modifications and ranks 1st on the Waymo Open Dataset 3D Camera-Only Detection Challenge. We will present BEVFormer++ at the CVPR 2022 Autonomous Driving Workshop.
[2022/3/10]: 🚀BEVFormer achieves SOTA on the nuScenes Detection Task with 56.9% NDS (camera-only)!

Abstract

In this work, the authors present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries. To aggregate spatial information, the authors design a spatial cross-attention in which each BEV query extracts spatial features from the regions of interest across camera views. For temporal information, the authors propose a temporal self-attention that recurrently fuses the historical BEV information. The proposed approach achieves a new state of the art of 56.9% NDS on the nuScenes test set, which is 9.0 points higher than the previous best method and on par with the performance of LiDAR-based baselines.
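As a rough illustration of the "predefined grid-shaped BEV queries" described in the abstract, the sketch below lays a grid of BEV cells over a square region around the ego vehicle and computes the metric center of each cell. The function name `bev_cell_centers`, the 4 x 4 grid, and the ±8 m range are assumptions chosen to keep the example small; they are not taken from this repository's code or configs.

```python
from typing import List, Tuple


def bev_cell_centers(
    bev_h: int,
    bev_w: int,
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
) -> List[List[Tuple[float, float]]]:
    """Return the (x, y) center in meters of every cell in a bev_h x bev_w grid.

    Illustrative only: the name and layout are assumptions, not BEVFormer's code.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    cell_w = (x_max - x_min) / bev_w  # meters covered by one column
    cell_h = (y_max - y_min) / bev_h  # meters covered by one row
    return [
        [
            (x_min + (col + 0.5) * cell_w, y_min + (row + 0.5) * cell_h)
            for col in range(bev_w)
        ]
        for row in range(bev_h)
    ]


# Hypothetical 4 x 4 grid covering a 16 m x 16 m area around the ego vehicle.
centers = bev_cell_centers(4, 4, (-8.0, 8.0), (-8.0, 8.0))
print(centers[0][0])  # center of the first cell
print(centers[3][3])  # center of the last cell
```

In this picture, one query is attached to each cell, and the cell's location is what ties that query to particular regions of the camera views. The attention steps themselves are omitted here.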

Methods

[Figure: method]

Getting Started

Model Zoo

Backbone	Method	Lr Schd	NDS	mAP	Memory	Config	Download
R50	BEVFormer-tiny	24ep	35.4	25.2	6500M	config	model/log
R101-DCN	BEVFormer-small	24ep	47.9	37.0	10500M	config	model/log
R101-DCN	BEVFormer-base	24ep	51.7	41.6	28500M	config	model/log
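To make the memory/accuracy trade-off in the table concrete, here is a minimal sketch that picks the highest-NDS configuration whose listed memory fits a given GPU budget. The numbers are copied from the table; the `ZooEntry` type and `best_fit` helper are hypothetical names for illustration, and reading the `M` suffix as megabytes is an assumption.

```python
from typing import NamedTuple, Optional


class ZooEntry(NamedTuple):
    method: str
    backbone: str
    nds: float
    map_: float
    memory_mb: int  # assumes the table's "M" suffix means megabytes


# Values copied from the Model Zoo table.
MODEL_ZOO = [
    ZooEntry("BEVFormer-tiny", "R50", 35.4, 25.2, 6500),
    ZooEntry("BEVFormer-small", "R101-DCN", 47.9, 37.0, 10500),
    ZooEntry("BEVFormer-base", "R101-DCN", 51.7, 41.6, 28500),
]


def best_fit(budget_mb: int) -> Optional[ZooEntry]:
    """Return the highest-NDS entry whose listed memory fits the budget, or None."""
    candidates = [entry for entry in MODEL_ZOO if entry.memory_mb <= budget_mb]
    return max(candidates, key=lambda entry: entry.nds) if candidates else None


# Hypothetical budget of 24,000 MB.
choice = best_fit(24_000)
print(choice.method if choice else "no listed configuration fits")
```

The listed figures are only a starting point for choosing a configuration: actual memory use may differ with batch size, input resolution, and library versions.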

Catalog

[ ] BEV Segmentation checkpoints
[ ] BEV Segmentation code
[x] 3D Detection checkpoints
[x] 3D Detection code
[x] Initialization

Bibtex

If this work is helpful for your research, please consider citing the following BibTeX entry.

@article{li2022bevformer,
  title={BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers},
  author={Li, Zhiqi and Wang, Wenhai and Li, Hongyang and Xie, Enze and Sima, Chonghao and Lu, Tong and Qiao, Yu and Dai, Jifeng},
  journal={arXiv preprint arXiv:2203.17270},
  year={2022}
}

Acknowledgement

Many thanks to these excellent open source projects:
- detr3d
- mmdet3d


Core symbols (most depended-on inside this repo)

_extend_matrix

called by 12

tools/data_converter/kitti_data_utils.py

_create_reduced_point_cloud

called by 6

tools/data_converter/kitti_converter.py

get_kitti_info_path

called by 5

tools/data_converter/kitti_data_utils.py

extract_feat

called by 5

projects/mmdet3d_plugin/bevformer/detectors/bevformer.py

decode

called by 5

projects/mmdet3d_plugin/core/bbox/coders/nms_free_coder.py

show

called by 5

projects/mmdet3d_plugin/datasets/nuscenes_mono_dataset.py

get_color

called by 4

tools/analysis_tools/visual.py

_read_imageset_file

called by 4

tools/data_converter/kitti_converter.py

Shape

Method 202

Function 139

Class 50

Route 1

Languages

Python 100%

Modules by API surface

projects/mmdet3d_plugin/datasets/pipelines/transform_3d.py 21 symbols

projects/mmdet3d_plugin/models/backbones/vovnet.py 20 symbols

projects/mmdet3d_plugin/datasets/nuscnes_eval.py 18 symbols

projects/mmdet3d_plugin/datasets/nuscenes_mono_dataset.py 17 symbols

tools/data_converter/scannet_data_utils.py 16 symbols

tools/data_converter/kitti_data_utils.py 15 symbols

tools/data_converter/waymo_converter.py 13 symbols

tools/data_converter/sunrgbd_data_utils.py 13 symbols

projects/mmdet3d_plugin/bevformer/detectors/bevformer.py 12 symbols

tools/data_converter/s3dis_data_utils.py 11 symbols

projects/mmdet3d_plugin/core/evaluation/kitti2waymo.py 11 symbols

projects/mmdet3d_plugin/bevformer/dense_heads/bevformer_head.py 11 symbols

For agents

$ claude mcp add BEVFormer \
  -- python -m otcore.mcp_server <graph>
