Locality-Aware Transformer for Video-Based Sign Language Translation

IEEE Signal Process. Lett. (2023)

Abstract
Recently, transformer-based methods have made significant progress in sign language translation. However, existing methods neglect several characteristics of sign videos, which hinders translation performance. First, in sign videos multiple consecutive frames represent a single sign gloss, so local temporal relations are crucial. Second, the misalignment between video and text demands non-local, global context modeling. To address these issues, a locality-aware transformer is proposed for sign language translation. Concretely, a multi-stride position encoding scheme assigns the same position index to adjacent frames at various strides to strengthen local dependency. An adaptive temporal interaction module then captures non-local and flexible local frame correlations simultaneously. Moreover, a gloss counting task is designed to facilitate holistic understanding of sign videos. Experimental results on two benchmark datasets demonstrate the effectiveness of the proposed framework.
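To make the multi-stride position encoding concrete, here is a minimal sketch of one plausible reading: for each stride s, adjacent frames within the same window receive the position index t // s, so nearby frames share identical encodings. The stride set (1, 2, 4), the sinusoidal basis, and the averaging across strides are assumptions for illustration; the abstract does not specify these details.

```python
import math
import torch

def sinusoidal_encoding(positions: torch.Tensor, d_model: int) -> torch.Tensor:
    """Standard sinusoidal encoding evaluated at the given position indices.

    positions: (T,) tensor of (possibly repeated) integer position indices.
    Returns a (T, d_model) tensor; d_model must be even.
    """
    pe = torch.zeros(positions.size(0), d_model)
    div = torch.exp(torch.arange(0, d_model, 2).float()
                    * (-math.log(10000.0) / d_model))
    angles = positions.float().unsqueeze(1) * div  # (T, d_model // 2)
    pe[:, 0::2] = torch.sin(angles)
    pe[:, 1::2] = torch.cos(angles)
    return pe

def multi_stride_position_encoding(num_frames: int, d_model: int,
                                   strides=(1, 2, 4)) -> torch.Tensor:
    """Assign the same position index to adjacent frames at each stride.

    For stride s, frames in the same window of length s share the index
    t // s, so neighboring frames get identical encodings and local
    dependency is emphasized. Encodings from all strides are averaged
    here; the stride set and the fusion rule are assumptions.
    """
    t = torch.arange(num_frames)
    per_stride = [sinusoidal_encoding(t // s, d_model) for s in strides]
    return torch.stack(per_stride, dim=0).mean(dim=0)  # (num_frames, d_model)

# Example: add locality-aware positional information to frame features.
frames = torch.randn(100, 512)  # (T, d_model) frame embeddings
frames = frames + multi_stride_position_encoding(100, 512)
```

With stride 1 this reduces to the standard per-frame encoding; larger strides deliberately blur fine-grained positions so that frames belonging to the same gloss look positionally alike to the attention layers.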
Keywords
sign language translation, locality-aware, video-based