Reducing Total Latency In Online Real-Time Inference And Decoding Via Combined Context Window And Model Smoothing Latencies
2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017
Abstract
Probabilistic models are important in many interactive systems, including automatic speech recognition (ASR) and streaming environments. We study the total inference latency (TL) of such systems: the sum of the look-ahead latency inherent in a deep neural network's (DNN) context window (CWL) in a DNN-HMM hybrid system and the latency incurred by Kalman-style smoothing in a dynamic probabilistic model (MSL), so that TL = CWL + MSL. For a fixed TL, the best accuracy can occur at a strictly positive MSL, often by a wide margin, a surprising result given the DNN's modeling power. Furthermore, we find that accuracy often improves with a smaller TL and a larger MSL. These results suggest that, for optimal low-latency real-time decoding, the size of the DNN context window and the degree of model smoothing should be chosen jointly.
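A minimal sketch of the latency decomposition TL = CWL + MSL described above, assuming a standard 10 ms frame shift; the function name and frame parameters are illustrative and not from the paper:

```python
def total_latency_ms(right_context_frames: int,
                     smoothing_lag_frames: int,
                     frame_shift_ms: float = 10.0) -> float:
    """Total inference latency (TL) in milliseconds:
    CWL, the look-ahead of the DNN's input context window,
    plus MSL, the fixed-lag (Kalman-style) smoothing delay."""
    cwl = right_context_frames * frame_shift_ms   # context-window latency
    msl = smoothing_lag_frames * frame_shift_ms   # model-smoothing latency
    return cwl + msl

# Under a fixed latency budget, frames can be traded between the
# context window and the smoother without changing TL:
budget_frames = 10
configs = [(c, budget_frames - c) for c in range(budget_frames + 1)]
assert all(total_latency_ms(c, m) == budget_frames * 10.0 for c, m in configs)
```

The paper's finding is that, within such a fixed budget, allocating some frames to the smoothing lag (MSL > 0) can yield better accuracy than spending the entire budget on the DNN's context window.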
Keywords
Streaming inference, online inference, hybrid models, speech recognition, deep learning