Weakly Supervised Monocular 3D Detection with a Single-View Image
CVPR 2024
Abstract
Monocular 3D detection (M3D) aims for precise 3D object localization from a single-view image, which usually involves labor-intensive annotation of 3D detection boxes. Weakly supervised M3D has recently been studied to obviate the 3D annotation process by leveraging many existing 2D annotations, but it often requires extra training data such as LiDAR point clouds or multi-view images, which greatly degrades its applicability and usability in various applications. We propose SKD-WM3D, a weakly supervised monocular 3D detection framework that exploits depth information to achieve M3D with a single-view image exclusively, without any 3D annotations or other training data. One key design in SKD-WM3D is a self-knowledge distillation framework, which transforms image features into 3D-like representations by fusing depth information and effectively mitigates the inherent depth ambiguity in monocular scenarios with little computational overhead in inference. In addition, we design an uncertainty-aware distillation loss and a gradient-targeted transfer modulation strategy which facilitate knowledge acquisition and knowledge transfer, respectively. Extensive experiments show that SKD-WM3D surpasses the state-of-the-art clearly and is even on par with many fully supervised methods.
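The abstract names an uncertainty-aware distillation loss but gives no formula. A minimal sketch of one common formulation of such a loss, where a predicted log-variance down-weights the feature-matching error under high uncertainty while a regularization term discourages trivially large uncertainty (the function name, per-dimension formulation, and exact form are assumptions, not the paper's definition):

```python
import math

def uncertainty_weighted_loss(student, teacher, log_var):
    """Hypothetical uncertainty-weighted feature distillation loss.

    For each feature dimension: exp(-s) * (f_s - f_t)^2 + s,
    where s is the predicted log-variance for that dimension.
    High s shrinks the error weight; the + s term penalizes
    claiming high uncertainty everywhere.
    """
    terms = [
        math.exp(-s) * (fs - ft) ** 2 + s
        for fs, ft, s in zip(student, teacher, log_var)
    ]
    return sum(terms) / len(terms)
```

With zero predicted uncertainty this reduces to a plain mean-squared feature-matching error; as uncertainty grows, a given squared error contributes less to the total loss.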