Robust Features in Deep-Learning-Based Speech Recognition.

Vikramjit Mitra, Horacio Franco, Richard M. Stern, Julien van Hout, Luciana Ferrer, Martin Graciarena, Wen Wang, Dimitra Vergyri, Abeer Alwan, John H. L. Hansen

New Era for Robust Speech Recognition, Exploiting Deep Learning (2017)

Abstract

Recent progress in deep learning has revolutionized speech recognition research, with Deep Neural Networks (DNNs) becoming the new state of the art for acoustic modeling. DNNs offer significantly lower speech recognition error rates than the previously used Gaussian Mixture Models (GMMs). Unfortunately, DNNs are data sensitive, and unseen data conditions can degrade their performance. Acoustic distortions such as noise, reverberation, and channel differences add variation to the speech signal, which in turn impacts DNN acoustic model performance. A straightforward solution to this issue is training the DNN models with these types of variation, which typically provides quite impressive performance. However, anticipating such variation is not always possible; in these cases, DNN recognition performance can deteriorate quite sharply. To avoid subjecting acoustic models to such variation, robust features have traditionally been used to create an invariant representation of the acoustic space. Most commonly, robust feature-extraction strategies have explored three principal areas: (a) enhancing the speech signal, with a goal of improving the perceptual quality of speech; (b) reducing the distortion footprint, with signal-theoretic techniques used to learn the distortion characteristics and subsequently filter them out of the speech signal; and (c) leveraging knowledge from auditory neuroscience and psychoacoustics, by using robust features inspired by auditory perception. In this chapter, we present prominent robust feature-extraction strategies explored by the speech recognition research community, and we discuss their relevance to coping with data-mismatch problems in DNN-based acoustic modeling. We present results demonstrating the efficacy of robust features in the new paradigm of DNN acoustic models. We also discuss future directions in feature design for mak…

Vikramjit Mitra, SRI International, STAR Lab, 333 Ravenswood Ave., Menlo Park, CA, 94532, USA, e-mail: vikramjit.mitra@sri.com
Horacio Franco, SRI International, STAR Lab, 333 Ravenswood Ave., Menlo Park, CA, 94532, USA, e-mail: horacio.franco@sri.com
