Deep Monocular Video Depth Estimation Using Temporal Attention

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP 2020)

Abstract
Monocular video depth estimation (MVDE) plays a crucial role in 3D computer vision. In this paper, we propose an end-to-end monocular video depth estimation network based on temporal attention. Our network starts with a motion compensation module, in which a spatial temporal transformer network (STN) warps the input frames using the estimated optical flow. Next, a temporal attention module combines features from the warped frames while emphasizing temporal consistency. Finally, a monocular depth estimation network predicts depth from the temporally fused features. Experimental results demonstrate that the proposed framework outperforms state-of-the-art single image depth estimation (SIDE) networks as well as existing MVDE methods.
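The abstract describes the pipeline only at a high level; the PyTorch sketch below is an illustrative assumption of how the three stages could be wired together: flow-based backward warping standing in for the STN motion compensation module, a simple per-pixel softmax over the time axis as temporal attention, and a small encoder-decoder predicting depth. All module names, layer sizes, and the score-based attention design are hypothetical and do not reproduce the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def warp_with_flow(frame, flow):
    """Backward-warp a frame toward the reference view with a dense flow field.

    frame: (B, 3, H, W), flow: (B, 2, H, W) in pixel units.
    This stands in for the paper's STN-based motion compensation module,
    whose exact details are not given in the abstract.
    """
    b, _, h, w = frame.shape
    # Build a sampling grid normalized to [-1, 1], as grid_sample expects.
    ys, xs = torch.meshgrid(
        torch.arange(h, device=frame.device, dtype=frame.dtype),
        torch.arange(w, device=frame.device, dtype=frame.dtype),
        indexing="ij",
    )
    grid_x = (xs.unsqueeze(0) + flow[:, 0]) / (w - 1) * 2 - 1
    grid_y = (ys.unsqueeze(0) + flow[:, 1]) / (h - 1) * 2 - 1
    grid = torch.stack((grid_x, grid_y), dim=-1)            # (B, H, W, 2)
    return F.grid_sample(frame, grid, align_corners=True)


class TemporalAttention(nn.Module):
    """Fuse per-frame features with softmax weights along the time axis."""

    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)  # per-pixel score

    def forward(self, feats):                                # (B, T, C, H, W)
        b, t, c, h, w = feats.shape
        scores = self.score(feats.reshape(b * t, c, h, w)).reshape(b, t, 1, h, w)
        weights = torch.softmax(scores, dim=1)               # attention over time
        return (weights * feats).sum(dim=1)                  # (B, C, H, W)


class VideoDepthNet(nn.Module):
    """Warp neighbours to the reference frame, attend over time, predict depth."""

    def __init__(self, channels=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.attention = TemporalAttention(channels)
        self.decoder = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 3, padding=1), nn.Softplus(),  # positive depth
        )

    def forward(self, frames, flows):
        # frames: (B, T, 3, H, W); flows: (B, T, 2, H, W) toward the reference frame.
        warped = torch.stack(
            [warp_with_flow(frames[:, i], flows[:, i]) for i in range(frames.shape[1])],
            dim=1,
        )
        b, t, c, h, w = warped.shape
        feats = self.encoder(warped.reshape(b * t, c, h, w)).reshape(b, t, -1, h, w)
        fused = self.attention(feats)                        # temporally fused features
        return self.decoder(fused)                           # (B, 1, H, W) depth map


# Toy usage: a 4-frame clip at 64x64 resolution with zero flow (no motion).
net = VideoDepthNet()
frames = torch.rand(1, 4, 3, 64, 64)
flows = torch.zeros(1, 4, 2, 64, 64)
depth = net(frames, flows)            # (1, 1, 64, 64)
```

The design choice sketched here, weighting warped frames per pixel before a single depth decoder, is one plausible way to "emphasize temporal consistency" as the abstract states; the paper's actual attention formulation may differ.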
Keywords
Depth estimation, temporal attention, spatial temporal transformer, optical flow