Xie, Enze and Yu, Zhiding and Zhou, Daquan and Philion, Jonah and Anandkumar, Anima and Fidler, Sanja and Luo, Ping and Alvarez, Jose M. (2022) M²BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation. (Unpublished) https://resolver.caltech.edu/CaltechAUTHORS:20220714-212525848
PDF - Submitted Version (15MB). See Usage Policy.
Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20220714-212525848
Abstract
In this paper, we propose M²BEV, a unified framework that jointly performs 3D object detection and map segmentation in the Bird's-Eye View (BEV) space with multi-camera image inputs. Unlike the majority of previous works, which process detection and segmentation separately, M²BEV infers both tasks with a unified model and improves efficiency. M²BEV efficiently transforms multi-view 2D image features into a 3D BEV feature in ego-car coordinates. Such a BEV representation is important because it enables different tasks to share a single encoder. Our framework further contains four important designs that benefit both accuracy and efficiency: (1) an efficient BEV encoder design that reduces the spatial dimension of the voxel feature map; (2) a dynamic box assignment strategy that uses learning-to-match to assign ground-truth 3D boxes to anchors; (3) a BEV centerness re-weighting that assigns larger weights to more distant predictions; and (4) large-scale 2D detection pre-training and auxiliary supervision. We show that these designs significantly benefit the ill-posed camera-based 3D perception tasks, where depth information is missing. M²BEV is memory efficient, allowing significantly higher-resolution images as input along with faster inference speed. Experiments on nuScenes show that M²BEV achieves state-of-the-art results in both 3D object detection and BEV segmentation, with the best single model achieving 42.5 mAP and 57.0 mIoU in these two tasks, respectively.
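As a rough illustration of designs (1) and (3), the following NumPy sketch shows one way a voxel feature map could be collapsed along its height axis, and one way distant BEV locations could be given larger loss weights. The abstract gives no formulas, so the function names, the tensor layout `(X, Y, Z, C)`, and the weight form `1 + sqrt(d² / d_max²)` are assumptions made for illustration and may differ from the paper's implementation.

```python
import numpy as np


def spatial_to_channel(voxels: np.ndarray) -> np.ndarray:
    """Fold the height axis into channels: (X, Y, Z, C) -> (X, Y, Z*C).

    Assumed layout; this removes one spatial dimension so that later
    layers can operate on a 2D BEV map instead of a 3D voxel grid.
    """
    x, y, z, c = voxels.shape
    return voxels.reshape(x, y, z * c)


def bev_centerness_weights(xs, ys, center=(0.0, 0.0)) -> np.ndarray:
    """Hypothetical re-weighting: weights in [1, 2] that grow with the
    distance of each BEV location from the ego-car center."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    xc, yc = center
    dist_sq = (xs - xc) ** 2 + (ys - yc) ** 2
    # Normalize by the farthest extent along each axis.
    max_sq = np.max(np.abs(xs - xc)) ** 2 + np.max(np.abs(ys - yc)) ** 2
    if max_sq == 0.0:
        return np.ones_like(dist_sq)
    return 1.0 + np.sqrt(dist_sq / max_sq)


# Usage: a 5 x 5 BEV grid spanning [-50 m, 50 m] around the ego car.
grid_x, grid_y = np.meshgrid(np.linspace(-50, 50, 5), np.linspace(-50, 50, 5))
weights = bev_centerness_weights(grid_x, grid_y)
bev_feature = spatial_to_channel(np.zeros((4, 5, 3, 2)))
```

Under these assumptions the weight is smallest at the ego position and largest at the grid corners, so errors on far-away predictions contribute more to the loss.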
Item Type: | Report or Paper (Discussion Paper)
---|---
Subject Keywords: | Multi-Camera, Multi-Task Learning, Autonomous Driving
Record Number: | CaltechAUTHORS:20220714-212525848
Persistent URL: | https://resolver.caltech.edu/CaltechAUTHORS:20220714-212525848
Usage Policy: | No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: | 115587
Collection: | CaltechAUTHORS
Deposited By: | George Porter
Deposited On: | 15 Jul 2022 22:39
Last Modified: | 15 Jul 2022 22:39