EFRNet-VL: An end-to-end feature refinement network for monocular visual localization in dynamic environments

Expert Systems with Applications (2024)

Abstract
This study addresses visual localization from monocular images, a key technology that allows autonomous systems to navigate and interact with their surroundings. Deep-learning-based visual localization techniques have demonstrated improved robustness across diverse environments. Existing end-to-end models apply convolutional neural networks (CNNs) to extract salient features and directly estimate continuous spatial poses from map models that allow for implicit differentiation. However, these models often fail to adapt their feature representations to extreme variations in environmental conditions, which leads to critical localization errors under altered lighting, varying weather, or in the presence of moving objects. To overcome these limitations, we introduce the end-to-end feature refinement network for visual localization (EFRNet-VL). The architecture is designed to prioritize the extraction of static features that are crucial for six degrees of freedom (6DoF) pose estimation, and it outperforms prior methods. EFRNet-VL integrates a convolutional network structure with self-attention mechanisms and Long Short-Term Memory (LSTM) modules, which together allow a single image to be associated accurately with its corresponding camera pose, even in dynamic environments. The proposed feature refinement approach is straightforward to implement and can enhance the performance of existing neural pose estimators. Our comprehensive evaluations of EFRNet-VL confirm its effectiveness. Compared with the popular PoseNet model, it reduces the average position and orientation errors by 54.5% and 25.7%, respectively, across various indoor settings. In large-scale outdoor environments, it achieves an average localization accuracy of 7.02 m / 2.79°. EFRNet-VL sets a new benchmark for end-to-end learning-based methods in visual localization and runs in real time, processing each image frame in 9.8 ms.
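The abstract reports position errors in metres, orientation errors in degrees, and percentage reductions relative to PoseNet, but does not say how these quantities are computed. The sketch below shows one common convention in pose regression benchmarks, assumed here rather than taken from the paper: Euclidean distance between predicted and ground-truth translations, and the rotation angle between predicted and ground-truth unit quaternions. The function names are hypothetical and chosen only for illustration.

```python
import math


def position_error(t_pred, t_true):
    """Euclidean distance between two 3D positions (same unit as the inputs)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(t_pred, t_true)))


def orientation_error_deg(q_pred, q_true):
    """Angle in degrees between two rotations given as quaternions (w, x, y, z).

    Assumed convention: both quaternions are normalized first, and the
    absolute value of the dot product is used because q and -q describe
    the same rotation.
    """
    def normalize(q):
        norm = math.sqrt(sum(c * c for c in q))
        return [c / norm for c in q]

    qp, qt = normalize(q_pred), normalize(q_true)
    dot = abs(sum(a * b for a, b in zip(qp, qt)))
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return math.degrees(2.0 * math.acos(dot))


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline
```

Under this convention, a statement such as "position error reduced by 54.5%" corresponds to `relative_reduction(baseline_error, new_error)` evaluated on the averaged per-scene errors; whether the paper averages means or medians is not stated in the abstract.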
Keywords
Visual localization, Pose estimation, Dynamic objects, Learning-based localization