Noise robust acoustic modeling for single-channel speech recognition based on a stream-wise transformer architecture
Interspeech(2021)
摘要
This paper addresses a noise-robust automatic speech recognition (ASR) method under the constraints of real-time, one-pass, and single-channel processing. Under such strong constraints, single-channel speech enhancement becomes a key technology because methods with multiple-passes or batch processing, such as acoustic model adaptation, are not suitable for use. However, single-channel speech enhancement often degrades ASR performance due to speech distortion. To overcome this problem, we propose a noise robust acoustic modeling method based on the stream-wise transformer model. The proposed method accepts multi-stream features obtained by multiple single-channel speech enhancement methods as input and selectively uses an appropriate feature stream according to the noise environment by paying attention to the noteworthy stream on the basis of multi-head attention. The proposed method considers the attention for the stream direction instead of the time series direction, and it is thus capable of real-time and low-latency processing. Comparative evaluations reveal that the proposed method successfully improves the accuracy of ASR in noisy environments and reduces the number of model parameters even under strong constraints.
更多查看译文
关键词
noise robust ASR,one-pass single-channel processing,speech enhancement,stream-wise transformer
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要