The GUA-Speech System Description for CNVSRC Challenge 2023
CoRR(2023)
摘要
This study describes our system for Task 1 Single-speaker Visual Speech
Recognition (VSR) fixed track in the Chinese Continuous Visual Speech
Recognition Challenge (CNVSRC) 2023. Specifically, we use intermediate
connectionist temporal classification (Inter CTC) residual modules to relax the
conditional independence assumption of CTC in our model. Then we use a
bi-transformer decoder to enable the model to capture both past and future
contextual information. In addition, we use Chinese characters as the modeling
units to improve the recognition accuracy of our model. Finally, we use a
recurrent neural network language model (RNNLM) for shallow fusion in the
inference stage. Experiments show that our system achieves a character error
rate (CER) of 38.09% on the Eval set which reaches a relative CER reduction of
21.63% over the official baseline, and obtains a second place in the challenge.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要