Unified Spatio-Temporal Tri-Perspective View Representation for 3D Semantic Occupancy Prediction
arXiv (2024)
Abstract
Holistic understanding and reasoning in 3D scenes play a vital role in the
success of autonomous driving systems. 3D semantic occupancy prediction, which has evolved as a pretraining task for autonomous driving and robotic downstream
tasks, captures finer 3D details than methods such as 3D detection. Existing
approaches predominantly focus on spatial cues such as tri-perspective view
embeddings (TPV), often overlooking temporal cues. This study introduces a
spatiotemporal transformer architecture, S2TPVFormer, for temporally coherent 3D
semantic occupancy prediction. We enrich the prior pipeline with temporal cues
through a novel temporal cross-view hybrid attention mechanism
(TCVHA) and generate spatiotemporal TPV embeddings (i.e., S2TPV embeddings).
Experimental evaluations on the nuScenes dataset demonstrate a substantial 4.1%
improvement in mean Intersection over Union (mIoU) for 3D semantic occupancy prediction
compared to TPVFormer, confirming the effectiveness of the proposed S2TPVFormer
in enhancing 3D scene perception.
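
For readers unfamiliar with the mechanism named in the abstract, the following is a minimal, hypothetical PyTorch sketch of what a temporal cross-view hybrid attention (TCVHA) step could look like: each current TPV-plane query attends jointly to the other current planes (cross-view) and to ego-motion-aligned planes from the previous frame (temporal). The class name, tensor shapes, and fusion strategy below are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only -- NOT the S2TPVFormer authors' code.
# Assumes three flattened TPV planes (HW, DH, WD) per frame, each [B, N_i, C],
# with previous-frame planes already warped into the current ego frame.
import torch
import torch.nn as nn

class TemporalCrossViewHybridAttention(nn.Module):
    """Each current TPV-plane query attends to all current planes and the
    aligned previous-frame planes, producing spatiotemporal (S2TPV) planes."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, curr_planes, prev_planes):
        # Hybrid context: current planes (cross-view) + previous planes (temporal).
        context = torch.cat(curr_planes + prev_planes, dim=1)  # [B, 2 * sum(N_i), C]
        out = []
        for q in curr_planes:
            attended, _ = self.attn(q, context, context)
            out.append(self.norm(q + attended))  # residual update per plane
        return out

# Toy usage: batch of 2, 256-dim features, three tiny 8x8 planes per frame.
if __name__ == "__main__":
    B, C, N = 2, 256, 64
    curr = [torch.randn(B, N, C) for _ in range(3)]
    prev = [torch.randn(B, N, C) for _ in range(3)]
    fused = TemporalCrossViewHybridAttention(C)(curr, prev)
    print([t.shape for t in fused])  # three [2, 64, 256] S2TPV planes
```

In this sketch the temporal and cross-view contexts are simply concatenated into one key/value set; the actual paper may fuse them differently (e.g., deformable attention or separate temporal and cross-view stages).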