TCS-LipNet: Temporal & Channel & Spatial Attention-Based Lip Reading Network

Artificial Neural Networks and Machine Learning, ICANN 2023, Part IX (2023)

Abstract
Lip reading translates a sequence of lip-movement images into a text sequence. The task depends on both temporal and spatial information, which makes feature extraction difficult. This paper proposes a new lip-reading model, TCS-LipNet, built around a temporal-channel-spatial attention module (TCSAM). Compared with a channel-spatial attention mechanism, TCSAM additionally associates channel and spatial features along the temporal dimension, which improves model performance. TCS-LipNet uses a TCSAM-based ResNet18 network as the front-end module to strengthen visual feature extraction, and DC-TCN (Densely Connected Temporal Convolutional Networks) as the back-end module to model the temporal correlation of the sequence. Experiments show that TCS-LipNet reaches 92.2% accuracy on LRW, which the authors report as the highest accuracy at the time of publication.
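The abstract does not spell out how TCSAM computes its attention weights, so the following is only a rough illustration of the general idea of gating a feature tensor along the temporal, channel, and spatial axes in turn. It is a parameter-free NumPy sketch under stated assumptions: the function name `tcs_attention`, the (T, C, H, W) layout, the pooling choices, and the ordering of the three gates are all hypothetical, and the learned layers that an actual attention module would use are replaced by plain pooling followed by a sigmoid.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def tcs_attention(x):
    """Gate a feature tensor along time, channel, and space (illustrative only).

    x: array of shape (T, C, H, W), i.e. C feature maps of size H x W
       for each of T video frames. This layout is an assumption.
    Returns an array of the same shape.
    """
    # Temporal gate: one weight per frame, from the frame's global average.
    t_weight = sigmoid(x.mean(axis=(1, 2, 3), keepdims=True))      # (T, 1, 1, 1)
    x = x * t_weight

    # Channel gate: one weight per channel, pooled over time and space.
    # Average and max pooling are summed, loosely in the spirit of
    # channel-spatial attention; no learned MLP is applied here.
    c_avg = x.mean(axis=(0, 2, 3), keepdims=True)                  # (1, C, 1, 1)
    c_max = x.max(axis=(0, 2, 3), keepdims=True)                   # (1, C, 1, 1)
    x = x * sigmoid(c_avg + c_max)

    # Spatial gate: one weight per pixel of each frame, pooled over channels.
    s_avg = x.mean(axis=1, keepdims=True)                          # (T, 1, H, W)
    s_max = x.max(axis=1, keepdims=True)                           # (T, 1, H, W)
    return x * sigmoid(s_avg + s_max)


# Usage: 5 frames, 4 channels, 6 x 6 feature maps of random values.
rng = np.random.default_rng(0)
features = rng.standard_normal((5, 4, 6, 6))
attended = tcs_attention(features)
print(attended.shape)  # (5, 4, 6, 6)
```

Because every gate lies strictly between 0 and 1, the output keeps the input shape and only rescales each activation. In a trainable model the pooled statistics would normally pass through learned layers before the sigmoid, and the module would sit inside the residual blocks of the front-end network; neither detail is taken from the paper.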
Keywords
Lip reading, attention mechanism, feature extraction