TMNet: Triple-modal interaction encoder and multi-scale fusion decoder network for V-D-T salient object detection

Pattern Recognition (2024)

Abstract
Salient object detection methods based on two-modal images have achieved remarkable success, aided by advances in image acquisition equipment. However, environmental factors often interfere with the Depth and Thermal maps, rendering them ineffective at providing object information. To address this weakness, we utilize the VDT dataset, which includes Visible, Depth, and Thermal images, and propose a triple-modal interaction encoder and multi-scale fusion decoder network (TMNet) to highlight the salient regions. The triple-modal interaction encoder comprises the separation context-aware feature module, the channel-wise fusion module, and the triple-modal refinement and fusion module, which together fully explore and exploit the complementarity among Visible, Depth, and Thermal information. The multi-scale fusion decoder comprises the semantic-aware localizing module and the contour-aware refinement module, which extract and fuse location and boundary information to yield a high-quality saliency map. Extensive experiments on the public VDT-2048 dataset demonstrate that our TMNet outperforms existing state-of-the-art methods on all evaluation metrics.
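To make the abstract's encoder-decoder description more concrete, the following is a minimal PyTorch sketch of a triple-modal (Visible-Depth-Thermal) saliency pipeline: per-modality encoders, per-scale channel-wise fusion, and a two-branch decoder that combines a coarse localization map with a boundary-level refinement. The class names (`TMNetSketch`, `ChannelWiseFusion`) and all module internals are illustrative assumptions; the abstract does not specify the designs of TMNet's actual modules, so this is only a sketch of the overall data flow, not the authors' implementation.

```python
# Minimal sketch of a V-D-T saliency pipeline (assumed structure, not TMNet's actual design).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelWiseFusion(nn.Module):
    """Hypothetical channel-wise fusion: gate each modality's channels, then sum."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(1),
                          nn.Conv2d(channels, channels, 1),
                          nn.Sigmoid())
            for _ in range(3)  # one gate per modality: V, D, T
        ])

    def forward(self, feats):  # feats: list of 3 tensors [B, C, H, W]
        return sum(gate(f) * f for gate, f in zip(self.attn, feats))


class TMNetSketch(nn.Module):
    """Skeleton only: per-modality encoders, per-scale fusion, and a two-branch
    decoder (coarse localization + contour-level refinement)."""
    def __init__(self, channels=(64, 128, 256)):
        super().__init__()

        # One lightweight conv encoder per modality (stand-ins for real backbones).
        def encoder():
            layers, in_c = [], 3
            for c in channels:
                layers.append(nn.Sequential(
                    nn.Conv2d(in_c, c, 3, stride=2, padding=1),
                    nn.BatchNorm2d(c), nn.ReLU(inplace=True)))
                in_c = c
            return nn.ModuleList(layers)

        self.enc_v, self.enc_d, self.enc_t = encoder(), encoder(), encoder()
        self.fuse = nn.ModuleList([ChannelWiseFusion(c) for c in channels])
        # Decoder heads: semantic localization on the deepest scale,
        # refinement on the shallowest (boundary-rich) scale.
        self.locate = nn.Conv2d(channels[-1], 1, 1)
        self.refine = nn.Conv2d(channels[0], 1, 1)

    def forward(self, v, d, t):
        feats = []
        for i in range(len(self.fuse)):
            v, d, t = self.enc_v[i](v), self.enc_d[i](d), self.enc_t[i](t)
            feats.append(self.fuse[i]([v, d, t]))
        coarse = self.locate(feats[-1])                  # low-resolution localization map
        coarse_up = F.interpolate(coarse, size=feats[0].shape[2:],
                                  mode='bilinear', align_corners=False)
        fine = self.refine(feats[0]) + coarse_up         # add boundary-level detail
        return torch.sigmoid(F.interpolate(fine, scale_factor=2,
                                           mode='bilinear', align_corners=False))


if __name__ == "__main__":
    net = TMNetSketch()
    # Depth/Thermal maps are treated as 3-channel inputs here purely for the demo.
    v = d = t = torch.randn(1, 3, 256, 256)
    print(net(v, d, t).shape)  # torch.Size([1, 1, 256, 256])
```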
Keywords
V-D-T salient object detection,Triple-modal interaction encoder,Multi-scale fusion decoder,Triple-modal interaction unit