Transformer with a Parallel Decoder for Image Captioning

Peilang Wei, Xu Liu, Jun Luo, Huayan Pu, Xiaoxu Huang, Shilong Wang, Huajun Cao, Shouhong Yang, Xu Zhuang, Jason Wang, Hong Yue, Cheng Ji, Mingliang Zhou

INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE (2024)


Abstract

In this paper, we propose a parallel decoder and a word group prediction module to speed up decoding and improve caption quality. The image features extracted by the encoder are linearly projected to different word groups, and a relaxed mask matrix is then designed to improve both decoding speed and caption quality. First, since a caption is composed of many words, its sentences can be broken down into word groups or single words according to their syntactic structure; we obtain this decomposition through constituency parsing. Second, we make full use of the extracted features to predict the size of each word group. Third, building on word embedding, we propose a new embedding that represents the information of the word. Finally, with the help of the word groups, we design a mask matrix that modifies the decoding process so that each step of the model can produce one or more words in parallel. Experiments on public datasets demonstrate that our method reduces the time complexity while maintaining competitive performance.
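The abstract does not spell out the mask matrix, so the following is only a minimal sketch of one plausible reading: a group-wise causal mask in which a position may attend to every position in its own word group and in all earlier groups, so that one decoding step can cover a whole group. The function names (`group_ids`, `relaxed_mask`, `decoding_steps`) and the exact attention rule are assumptions made for illustration, not the authors' definition.

```python
from typing import List


def group_ids(group_sizes: List[int]) -> List[int]:
    """Map each token position to the index of its word group.

    group_sizes is assumed to come from a word group size predictor
    (or from constituency parsing at training time); here it is just
    a list of positive integers.
    """
    ids: List[int] = []
    for group_index, size in enumerate(group_sizes):
        if size < 1:
            raise ValueError("each word group must contain at least one word")
        ids.extend([group_index] * size)
    return ids


def relaxed_mask(group_sizes: List[int]) -> List[List[bool]]:
    """Build a group-wise causal mask (an assumed form of the relaxed mask).

    mask[i][j] is True when position i may attend to position j, i.e. when
    j belongs to the same word group as i or to an earlier one. With all
    group sizes equal to 1 this reduces to the usual lower-triangular mask.
    """
    ids = group_ids(group_sizes)
    n = len(ids)
    return [[ids[j] <= ids[i] for j in range(n)] for i in range(n)]


def decoding_steps(group_sizes: List[int]) -> int:
    """Number of decoder steps if each step emits one whole word group."""
    return len(group_sizes)
```

For a hypothetical six-word caption split into groups of sizes `[2, 1, 3]`, `relaxed_mask([2, 1, 3])` lets the first two positions see each other, and `decoding_steps([2, 1, 3])` is 3 rather than the 6 steps of word-by-word decoding. This is the sense in which grouping can lower the number of sequential steps.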


Keywords

Image captioning, constituency parsing, word groups, time complexity, transformer
