Improving the Performance of Online Neural Transducer Models

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Citations 56 | Views 332
Abstract
Having a sequence-to-sequence model that can operate in an online fashion is important for streaming applications such as Voice Search. The neural transducer (NT) is a streaming sequence-to-sequence model, but it has shown significant degradation in performance compared to non-streaming models such as Listen, Attend and Spell (LAS). In this paper, we present various improvements to NT. Specifically, we look at increasing the window over which NT computes attention, mainly by looking backwards in time so that the model remains online. In addition, we explore initializing an NT model from a LAS-trained model so that it is guided by a better alignment. Finally, we explore stronger language models, such as using wordpiece models and applying an external LM during the beam search. On a Voice Search task, we find that with these improvements we can get NT to match the performance of LAS.
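One of the improvements mentioned above, applying an external LM during beam search, is commonly done by interpolating the model's log-probability with a weighted external-LM log-probability when ranking hypotheses (shallow fusion). The sketch below illustrates that scoring idea only; the function names, hypothesis tuples, and the weight value are assumptions for illustration, not the paper's exact implementation.

```python
def shallow_fusion_score(model_logprob: float, lm_logprob: float,
                         lm_weight: float = 0.3) -> float:
    """Fused beam-search score: seq2seq model log-prob plus a weighted
    external-LM log-prob. lm_weight is a tunable hyperparameter
    (0.3 is an arbitrary illustrative value)."""
    return model_logprob + lm_weight * lm_logprob

def rescore_beam(hypotheses, lm_weight: float = 0.3):
    """Rank (text, model_logprob, lm_logprob) hypotheses by fused score,
    best first."""
    return sorted(
        hypotheses,
        key=lambda h: shallow_fusion_score(h[1], h[2], lm_weight),
        reverse=True,
    )

# Toy beam: the external LM penalizes the implausible transcript,
# flipping the ranking the acoustic model alone would produce.
beam = [
    ("voice search", -2.0, -1.0),  # fused: -2.0 + 0.3*(-1.0) = -2.3
    ("vice search", -1.8, -4.0),   # fused: -1.8 + 0.3*(-4.0) = -3.0
]
best_text = rescore_beam(beam)[0][0]  # "voice search"
```

In practice the LM score would be applied per expansion step inside the beam search rather than as a final rescoring pass, but the score combination is the same.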
Keywords
sequence-to-sequence model, NT model, LAS-trained model, wordpiece models, online neural transducer models, language models, streaming applications, external LM, beam search, voice search task