https://user-images.githubusercontent.com/27915819/161392594-fc0082f7-5c37-4919-830a-2dd423c1d025.mp4
BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers - Paper | Blog (in Chinese) | Presentation Slides at CVPR 2022 Workshop (soon) | Live-streaming video on BEV Perception (soon)
In this work, the authors present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries. To aggregate spatial information, the authors design a spatial cross-attention that each BEV query extracts the spatial features from the regions of interest across camera views. For temporal information, the authors propose a temporal self-attention to recurrently fuse the history BEV information. The proposed approach achieves the new state-of-the-art 56.9\% in terms of NDS metric on the nuScenes test set, which is 9.0 points higher than previous best arts and on par with the performance of LiDAR-based baselines.

| Backbone | Method | Lr Schd | NDS | mAP | memroy | Config | Download |
|---|---|---|---|---|---|---|---|
| R50 | BEVFormer-tiny | 24ep | 35.4 | 25.2 | 6500M | config | modle/log |
| R101-DCN | BEVFormer-small | 24ep | 47.9 | 37.0 | 10500M | config | model/log |
| R101-DCN | BEVFormer-base | 24ep | 51.7 | 41.6 | 28500M | config | model/log |
If this work is helpful for your research, please consider citing the following BibTeX entry.
@article{li2022bevformer,
title={BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers},
author={Li, Zhiqi and Wang, Wenhai and Li, Hongyang and Xie, Enze and Sima, Chonghao and Lu, Tong and Qiao, Yu and Dai, Jifeng}
journal={arXiv preprint arXiv:2203.17270},
year={2022}
}
Many thanks to these excellent open source projects: - detr3d - mmdet3d
$ claude mcp add BEVFormer \
-- python -m otcore.mcp_server <graph>