Multi-Branch Distance-Sensitive Self-Attention Network for Image Captioning

IEEE Transactions on Multimedia (2023)

Self-attention (SA) based networks have achieved great success in image captioning, consistently dominating the leaderboards of online benchmarks. However, existing SA networks still suffer from distance insensitivity and the low-rank bottleneck. In this paper, we optimize SA in two respects to address these issues. First, we introduce Distance-sensitive Self-Attention (DSA), which incorporates the raw geometric distances between query-key pairs in the 2D image into SA modeling. Second, we present a simple yet effective approach, named Multi-branch Self-Attention (MSA), to alleviate the low-rank bottleneck. MSA treats a multi-head self-attention layer as a branch and duplicates it multiple times to increase the expressive power of SA. To validate the effectiveness of the two designs, we apply them to the standard self-attention network and conduct extensive experiments on the highly competitive MS-COCO dataset. We achieve new state-of-the-art performance on both the local and online test sets, i.e., 135.1% CIDEr on the Karpathy split and 135.4% CIDEr on the official online split.
Keywords: image captioning, multi-branch techniques, distance-sensitive positional embedding
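The two ideas in the abstract can be sketched concretely: DSA biases the attention logits with the pairwise geometric distance between image regions, and MSA runs several such attention branches and combines their outputs to raise the rank of the overall mapping. The following is a minimal NumPy illustration of that idea, not the paper's exact formulation; the `alpha` distance-weighting parameter and the branch-averaging scheme are assumptions for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distance_sensitive_attention(Q, K, V, coords, alpha=1.0):
    """One attention branch with a geometric distance bias (DSA sketch).

    Q, K, V: (n, d) arrays for n image regions; coords: (n, 2) region
    positions on the 2D grid. Subtracting alpha * Euclidean distance from
    the logits makes nearby regions attend to each other more strongly.
    (alpha is an illustrative parameter, not from the paper.)"""
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return softmax(logits - alpha * dist) @ V

def multi_branch_attention(Q, K, V, coords, alphas=(0.5, 1.0)):
    """MSA sketch: duplicate the attention layer into several branches and
    average their outputs; how branches are combined is assumed here."""
    outs = [distance_sensitive_attention(Q, K, V, coords, a) for a in alphas]
    return np.mean(outs, axis=0)
```

With `alpha=0` the distance bias vanishes and a branch reduces to standard scaled dot-product attention, which makes the role of the bias easy to isolate in experiments.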