RCBEVDet: Radar-camera Fusion in Bird's Eye View for 3D Object Detection
CVPR 2024 (2024)
Abstract
Three-dimensional object detection is one of the key tasks in autonomous
driving. To reduce costs in practice, low-cost multi-view cameras have been
proposed to replace expensive LiDAR sensors for 3D object detection. However,
it is difficult to achieve highly accurate and robust 3D object detection with
cameras alone. An effective solution to this issue is combining multi-view cameras
with the economical millimeter-wave radar sensor to achieve more reliable
multi-modal 3D object detection. In this paper, we introduce RCBEVDet, a
radar-camera fusion 3D object detection method in the bird's eye view (BEV).
Specifically, we first design RadarBEVNet for radar BEV feature extraction.
RadarBEVNet consists of a dual-stream radar backbone and a Radar Cross-Section
(RCS) aware BEV encoder. In the dual-stream radar backbone, a point-based
encoder and a transformer-based encoder are proposed to extract radar features,
with an injection and extraction module to facilitate communication between the
two encoders. The RCS-aware BEV encoder uses RCS as an object-size prior when
scattering point features into BEV. In addition, we present the Cross-Attention
Multi-layer Fusion module to automatically align the multi-modal BEV features
from radar and camera with the deformable attention mechanism, and then fuse
the features with channel and spatial fusion layers. Experimental results show
that RCBEVDet achieves new state-of-the-art radar-camera fusion results on
nuScenes and View-of-Delft (VoD) 3D object detection benchmarks. Furthermore,
RCBEVDet achieves better 3D detection results than all real-time camera-only
and radar-camera 3D object detectors with a faster inference speed of 21~28
FPS. The source code will be released at https://github.com/VDIGPKU/RCBEVDet.
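
To make the idea of using RCS as an object-size prior more concrete, the following is a minimal sketch and not the authors' implementation. It assumes each radar point carries a position, an RCS value, and a single scalar feature, and that a larger RCS should spread the feature over more BEV cells. The function name `rcs_aware_scatter`, the linear RCS-to-radius mapping, the parameters `base_radius` and `gain`, and the Gaussian-style weighting are all hypothetical choices made for illustration.

```python
import math


def rcs_aware_scatter(points, grid_size=8, cell=1.0, base_radius=0.5, gain=0.5):
    """Scatter radar point features into a square BEV grid.

    points: iterable of (x, y, rcs, feature), with x and y in metres inside
            [0, grid_size * cell).
    Returns a grid_size x grid_size list of accumulated scalar features,
    indexed as bev[row (y)][column (x)].
    """
    bev = [[0.0] * grid_size for _ in range(grid_size)]
    for x, y, rcs, feat in points:
        # Assumption: the scatter radius grows linearly with RCS, and
        # negative RCS values are clamped to zero.
        radius = base_radius + gain * max(rcs, 0.0)
        cx, cy = int(x // cell), int(y // cell)
        reach = int(math.ceil(radius / cell))
        for i in range(max(cx - reach, 0), min(cx + reach, grid_size - 1) + 1):
            for j in range(max(cy - reach, 0), min(cy + reach, grid_size - 1) + 1):
                # Distance from the radar point to the centre of cell (i, j).
                d = math.hypot((i + 0.5) * cell - x, (j + 0.5) * cell - y)
                if d <= radius:
                    # Assumption: weight decays smoothly with distance.
                    bev[j][i] += math.exp(-((d / radius) ** 2)) * feat
    return bev


# Usage: the same point scattered with a low and a high RCS value.
low_rcs = rcs_aware_scatter([(4.5, 4.5, 0.0, 1.0)])
high_rcs = rcs_aware_scatter([(4.5, 4.5, 3.0, 1.0)])
```

In this sketch the high-RCS point contributes to more BEV cells than the low-RCS point, which is the intuition behind treating RCS as a size prior. A real encoder would scatter learned feature vectors rather than scalars.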