Towards local visual modeling for image captioning

Pattern Recognition (2023)

Abstract
• Local visual modeling with grid features for image captioning.
• Locality-Sensitive Attention (LSA) is deployed for the intra-layer interaction via local visual modeling.
• Locality-Sensitive Fusion (LSF) is used for inter-layer information fusion.
• Locality-Sensitive Transformer Network (LSTNet) outperforms SOTA captioning models on MS-COCO.
• The generalization of LSTNet is also verified on the Flickr8k and Flickr30k datasets.
Keywords
Image captioning, Attention mechanism, Local visual modeling