MonoDETRNext: Next-Generation Accurate and Efficient Monocular 3D Object Detector

Pan Liao, Feng Yang, Di Wu, Wenhui Zhao, Jinwen Yu

arXiv · Computer Vision and Pattern Recognition (2024)

Abstract
Monocular 3D object detection has vast application potential across various fields. DETR-type models have shown remarkable performance in different areas, but there is still considerable room for improvement in monocular 3D detection, especially with the existing DETR-based method, MonoDETR. After addressing the query initialization issues in MonoDETR, we explored several performance enhancement strategies, such as incorporating a more efficient encoder and utilizing a more powerful depth estimator. Ultimately, we proposed MonoDETRNext, a model that comes in two variants based on the choice of depth estimator: MonoDETRNext-E, which prioritizes speed, and MonoDETRNext-A, which focuses on accuracy. We posit that MonoDETRNext establishes a new benchmark in monocular 3D object detection and opens avenues for future research. We conducted an exhaustive evaluation demonstrating the model's superior performance against existing solutions. Notably, MonoDETRNext-A demonstrated a 3.52% improvement in the AP_3D metric on the KITTI test benchmark over MonoDETR, while MonoDETRNext-E showed a 2.35% increase. Additionally, the computational efficiency of MonoDETRNext-E slightly exceeds that of its predecessor.

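Key points: MonoDETRNext is a next-generation accurate and efficient monocular 3D object detection method. It balances accuracy against processing speed and builds on strategies that have proven successful in 2D detection and depth estimation. It proposes an efficient hybrid vision encoder, an enhanced depth prediction mechanism, and a novel query generation strategy, and it introduces an advanced depth predictor.

Method: MonoDETRNext develops an efficient hybrid vision encoder, enhances the depth prediction mechanism, introduces a novel query generation strategy, and is further strengthened by an advanced depth predictor.

Experiments: Building on MonoDETR, MonoDETRNext introduces two variants: the speed-oriented MonoDETRNext-F and the accuracy-oriented MonoDETRNext-A. A comprehensive evaluation shows that the model outperforms existing solutions. In particular, MonoDETRNext-A improves mAP by 4.60%, while the efficiency of MonoDETRNext-F slightly exceeds that of its predecessor.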