谷歌浏览器插件
订阅小程序
在清言上使用

Cross-Modality Attention and Multimodal Fusion Transformer for Pedestrian Detection.

ECCV Workshops (5)(2022)

引用 0|浏览9
暂无评分
摘要
Pedestrian detection is an important challenge in computer vision due to its various applications. To achieve more accurate results, thermal images have been widely exploited as complementary information to assist conventional RGB-based detection. Although existing methods have developed numerous fusion strategies to utilize the complementary features, research that focuses on exploring features exclusive to each modality is limited. On this account, the features specific to one modality cannot be fully utilized and the fusion results could be easily dominated by the other modality, which limits the upper bound of discrimination ability. Hence, we propose the Cross-modality Attention Transformer (CAT) to explore the potential of modality-specific features. Further, we introduce the Multimodal Fusion Transformer (MFT) to identify the correlations between the modality data and perform feature fusion. In addition, a content-aware objective function is proposed to learn better feature representations. The experiments show that our method can achieve state-of-the-art detection performance on public datasets. The ablation studies also show the effectiveness of the proposed components.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要