TCS-LipNet: Temporal & Channel & Spatial Attention-Based Lip Reading Network

Huanjie Chen,Wenjuan Li,Zhigang Cheng,Xiubo Liang,Qifei Zhang

ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT IX（2023）

引用 0|浏览3

暂无评分

摘要

Lip-reading is the process of translating input lip-movement image sequences into text sequences, which is a task that requires both temporal and spatial information to be considered, and feature extraction is difficult. In this regard, this paper proposes a new lip reading model, TCS-LipNet, which innovatively proposes the temporal channel space attention mechanism module TCSAM, and compared with the channel space attention mechanism, TCS increases the association of channel space features in the temporal dimension and improves the performance of the model. TCS-LipNet uses the TCSAM-based ResNet18 network as the front-end module to enhance the extraction of visual features, and DC-TCN (Densely Connected Temporal Convolutional Networks) as the back-end module to address the temporal correlation of sequences. The experimental data show that TCS-LipNet achieves 92.2% accuracy on LRW, which is the highest accuracy rate currently.

查看译文

关键词

Lip reading,attention mechanism,feature extraction

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要