Spelling-Aware Word-Based End-to-End ASR

IEEE Signal Processing Letters (2022)

Abstract
We propose a new end-to-end architecture for automatic speech recognition that extends the "listen, attend and spell" (LAS) paradigm. The main network is trained to predict words, while a secondary speller network is optimized to predict word spellings from the inner representations of the main network (e.g., word embeddings or context vectors from the attention module). We show that this joint training improves the word error rate of a word-based system and enables solving additional tasks, such as out-of-vocabulary (OOV) word detection and recovery. The tests are conducted on the LibriSpeech dataset, which consists of 1000 h of read speech.
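The joint training described above can be illustrated with a minimal sketch. The abstract does not state how the two objectives are combined, so the weighted sum below, the `speller_weight` parameter, and all function names are assumptions made for illustration, not the authors' formulation. In a real system the character logits would be produced by a speller conditioned on the main network's word embeddings or attention context vectors; here they are simply passed in.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log-probability of the target index under softmax(logits)."""
    return -math.log(softmax(logits)[target])

def joint_loss(word_logits, word_target, char_logits_seq, char_targets,
               speller_weight=0.5):
    """Hypothetical joint objective: word loss plus weighted spelling loss.

    speller_weight is an assumed hyperparameter, not taken from the paper.
    """
    # Loss of the main word-predicting network for one output word.
    word_loss = cross_entropy(word_logits, word_target)
    # Loss of the speller, averaged over the characters of that word.
    spell_loss = sum(
        cross_entropy(logits, target)
        for logits, target in zip(char_logits_seq, char_targets)
    ) / len(char_targets)
    return word_loss + speller_weight * spell_loss

# Toy example: a 4-word vocabulary and a 3-letter word over a 2-symbol alphabet.
loss = joint_loss(
    word_logits=[0.0, 0.0, 0.0, 0.0],
    word_target=0,
    char_logits_seq=[[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
    char_targets=[0, 1, 0],
)
```

Because both terms share the main network's representations, gradients from the spelling term also shape those representations, which is one plausible reading of why joint training could help the word-level predictions.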
Keywords
Training,Vocabulary,Task analysis,Decoding,Predictive models,Training data,Recurrent neural networks,ASR,end-to-end,listen attend and spell architecture,OOV