Multi-modal Motion Prediction with Transformer-based Neural Network for Autonomous Driving

IEEE International Conference on Robotics and Automation (2022)

Abstract
Predicting the behaviors of other agents on the road is critical for autonomous driving to ensure safety and efficiency. However, the challenging part is how to represent the social interactions between agents and output different possible trajectories with interpretability. In this paper, we introduce a neural prediction framework based on the Transformer structure to model the relationship among the interacting agents and extract the attention of the target agent on the map waypoints. Specifically, we organize the interacting agents into a graph and utilize the multi-head attention Transformer encoder to extract the relations between them. To address the multi-modality of motion prediction, we propose a multi-modal attention Transformer encoder, which modifies the multi-head attention mechanism to multi-modal attention, and each predicted trajectory is conditioned on an independent attention mode. The proposed model is validated on the Argoverse motion forecasting dataset and shows state-of-the-art prediction accuracy while maintaining a small model size and a simple training process. We also demonstrate that the multi-modal attention module can automatically identify different modes of the target agent's attention on the map, which improves the interpretability of the model.
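The abstract describes replacing multi-head attention with "multi-modal attention," where each predicted trajectory is conditioned on an independent attention mode over the map waypoints. A minimal PyTorch sketch of that idea is below; the class name, shapes, and the choice of one single-head attention module per mode are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MultiModalAttention(nn.Module):
    """Hypothetical sketch: K independent attention 'modes', each attending
    from the target agent to map waypoints, so each mode can condition one
    predicted trajectory. (Shapes and structure are assumptions.)"""
    def __init__(self, d_model: int, num_modes: int):
        super().__init__()
        # one independent attention module per mode, instead of
        # concatenating heads as in standard multi-head attention
        self.modes = nn.ModuleList(
            nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)
            for _ in range(num_modes)
        )

    def forward(self, agent_query, map_features):
        # agent_query:  (B, 1, d)  target-agent embedding
        # map_features: (B, M, d)  M map-waypoint embeddings
        contexts, attentions = [], []
        for attn in self.modes:
            out, w = attn(agent_query, map_features, map_features)
            contexts.append(out)   # (B, 1, d) per-mode context vector
            attentions.append(w)   # (B, 1, M) per-mode waypoint attention
        # stack along a new mode dimension
        return torch.stack(contexts, dim=1), torch.stack(attentions, dim=1)

# usage: 6 modes, 50 waypoints
mma = MultiModalAttention(d_model=32, num_modes=6)
q = torch.randn(2, 1, 32)
m = torch.randn(2, 50, 32)
ctx, attn = mma(q, m)
print(ctx.shape, attn.shape)  # (2, 6, 1, 32) and (2, 6, 1, 50)
```

Each of the stacked attention maps can be visualized separately, which is what gives the per-mode interpretability the abstract claims: different modes attend to different map waypoints (e.g. different lanes at an intersection).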
Keywords
map waypoints, multihead attention transformer encoder, multimodality, independent attention mode, multimodal motion prediction, transformer-based neural network, autonomous driving, social interactions, neural prediction framework, multimodal attention transformer encoder, Argoverse motion forecasting dataset