KAN: Keyframe Attention Network for Person Video Captioning

Xiangyun Zhang, Min Yang, Xu Zhang, Fan Ni, Fangqiang Hu, Aichun Zhu

2023 China Automation Congress (CAC) (2023)

Abstract

This paper presents Keyframe Attention Network (KAN), a novel video captioning algorithm that combines keyframe feature extraction with an attention allocation mechanism. The method first applies a threshold-based keyframe extraction technique to obtain keyframes. A keyframe representation module, built on a deep residual network, then extracts essential features from these keyframes. Finally, the extracted feature vectors, along with reference captions, are fed into an attention allocation module to generate descriptive captions. The deep residual network allows greater network depth without encountering gradient explosions. Moreover, the attention module adopts an encoder-decoder structure with additional attention layers, enabling effective attention allocation and yielding more accurate captions.
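
The abstract does not state which criterion the threshold-based keyframe extraction uses. As a minimal sketch of the general idea, and not the authors' implementation, the following assumes each frame is a flat sequence of grayscale pixel intensities and keeps a frame whenever its mean absolute difference from the most recent keyframe exceeds a threshold. The names `mean_abs_diff` and `extract_keyframes`, the difference measure, and the threshold value are all illustrative assumptions.

```python
from typing import List, Sequence


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    if not a:
        return 0.0  # two empty frames are treated as identical
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def extract_keyframes(frames: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Return indices of frames that differ from the last kept keyframe
    by more than `threshold` (hypothetical criterion, assumed for illustration)."""
    if not frames:
        return []
    keyframes = [0]  # the first frame is always kept as the initial reference
    for i in range(1, len(frames)):
        last_key = frames[keyframes[-1]]
        if mean_abs_diff(frames[i], last_key) > threshold:
            keyframes.append(i)
    return keyframes


# Toy video: five 4-pixel frames with two abrupt changes in content.
toy_frames = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [50, 50, 50, 50],
    [52, 52, 52, 52],
    [120, 120, 120, 120],
]
selected = extract_keyframes(toy_frames, threshold=10.0)
```

Comparing against the last kept keyframe rather than the immediately preceding frame means slow, gradual drift still eventually triggers a new keyframe; a frame-to-frame comparison would be an equally plausible reading of "threshold-based".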

Key words

video captioning, keyframe extraction, keyframe representation, attention mechanism
