TRF-Net: a transformer-based RGB-D fusion network for desktop object instance segmentation

Neural Comput. Appl. (2023)

Abstract
To perform object-specific tasks on a desktop, robots need to perceive many different objects. The challenge is to compute a pixel-wise mask for each object, even in the presence of occlusions and unseen objects. We take a step toward this problem by proposing a metric learning-based network, TRF-Net, for desktop object instance segmentation. We design two ResNet-based branches to process the RGB and depth images separately, and then propose a Transformer-based fusion module, TranSE, to fuse the features from both branches. This module also passes the fused features to the decoder, helping it generate fine-grained decoder features. We further propose a multi-scale feature embedding loss (MFE loss) that reduces intra-class distances and increases inter-class distances, which improves feature clustering in the embedding space. Due to the lack of large-scale real-world datasets of desktop objects, TRF-Net is trained on a synthetic dataset and tested on a small-scale real-world dataset. The target objects in the test set do not appear in the training set, ensuring the novelty of the test objects. We demonstrate that our method produces accurate instance segmentation masks and outperforms other state-of-the-art methods on desktop object instance segmentation.
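The two-branch RGB-D design described above can be illustrated with a minimal PyTorch sketch. The names and choices below (TransformerFusion, TwoBranchRGBD, a ResNet-34 backbone, a single self-attention layer, embed_dim=64) are assumptions made for illustration only and are not the authors' TranSE implementation or the actual TRF-Net architecture.

# Hypothetical sketch of a two-branch RGB-D network with a transformer-based
# fusion step, in the spirit of the abstract. All module names, dimensions,
# and the fusion layout are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torchvision.models as models


class TransformerFusion(nn.Module):
    """Fuse RGB and depth feature maps with multi-head self-attention."""

    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, rgb_feat, depth_feat):
        # Flatten spatial dimensions: (B, C, H, W) -> (B, H*W, C) tokens.
        b, c, h, w = rgb_feat.shape
        tokens = (rgb_feat + depth_feat).flatten(2).transpose(1, 2)
        fused, _ = self.attn(tokens, tokens, tokens)
        fused = self.norm(fused + tokens)          # residual connection
        return fused.transpose(1, 2).reshape(b, c, h, w)


class TwoBranchRGBD(nn.Module):
    """ResNet-based RGB and depth branches followed by transformer fusion."""

    def __init__(self, embed_dim=64):
        super().__init__()
        rgb_backbone = models.resnet34(weights=None)
        depth_backbone = models.resnet34(weights=None)
        # Keep everything up to the last residual stage (output: 512 channels).
        self.rgb_branch = nn.Sequential(*list(rgb_backbone.children())[:-2])
        self.depth_branch = nn.Sequential(*list(depth_backbone.children())[:-2])
        self.fusion = TransformerFusion(channels=512)
        # Lightweight head mapping fused features to per-pixel embeddings.
        self.decoder = nn.Sequential(
            nn.Conv2d(512, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, embed_dim, 1),
        )

    def forward(self, rgb, depth):
        # Depth is single-channel; repeat it so the ResNet stem accepts it.
        depth3 = depth.repeat(1, 3, 1, 1)
        fused = self.fusion(self.rgb_branch(rgb), self.depth_branch(depth3))
        return self.decoder(fused)                 # (B, embed_dim, H/32, W/32)


if __name__ == "__main__":
    net = TwoBranchRGBD()
    emb = net(torch.randn(2, 3, 224, 224), torch.randn(2, 1, 224, 224))
    print(emb.shape)  # torch.Size([2, 64, 7, 7])

In this sketch the fused tokens go straight to a small convolutional head; the paper's decoder instead receives the fused features at multiple stages to produce fine-grained features, which is not reproduced here.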
Keywords
Desktop object instance segmentation, Metric learning, Transformer-based fusion module, Multi-scale embedding loss
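The multi-scale embedding loss named in the keywords is a metric-learning objective that pulls pixel embeddings toward their instance center and pushes different instance centers apart. The sketch below is a simplified single-scale version under assumed margins (pull_margin, push_margin) and equal term weights; the paper's MFE loss applies this idea across multiple feature scales and may weight the terms differently.

# Illustrative single-scale embedding loss: intra-class pull plus inter-class
# push. Margins, weights, and the function signature are assumptions.
import torch


def embedding_loss(embeddings, instance_masks, pull_margin=0.5, push_margin=1.5):
    """embeddings: (D, H, W) per-pixel features; instance_masks: list of (H, W) bool masks."""
    means = []
    pull = embeddings.new_tensor(0.0)
    for mask in instance_masks:
        feats = embeddings[:, mask]                 # (D, N) pixels of one instance
        mean = feats.mean(dim=1, keepdim=True)      # (D, 1) instance center
        means.append(mean.squeeze(1))
        # Intra-class term: penalize pixels farther than pull_margin from their center.
        dist = (feats - mean).norm(dim=0)
        pull = pull + torch.clamp(dist - pull_margin, min=0).pow(2).mean()

    push = embeddings.new_tensor(0.0)
    centers = torch.stack(means)                    # (K, D)
    k = centers.shape[0]
    for i in range(k):
        for j in range(i + 1, k):
            # Inter-class term: penalize centers closer than push_margin.
            gap = push_margin - (centers[i] - centers[j]).norm()
            push = push + torch.clamp(gap, min=0).pow(2)

    pull = pull / max(len(instance_masks), 1)
    push = push / max(k * (k - 1) / 2, 1)
    return pull + push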