Fmllr Based Feature-Space Speaker Adaptation Of Dnn Acoustic Models

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5(2015)

引用 42|浏览118
暂无评分
摘要
We investigate the problem of speaker adaptation of DNN acoustic models in two settings: the traditional unsupervised adaptation and a supervised adaptation (SuA) where a few minutes of transcribed speech is available. SuA presents additional difficulties when a test speaker's adaptation information does not match the registered speaker's information. Employing feature-space maximum likelihood linear regression (fMLLR) transformed features as side-information to the DNN, we reintroduce some classical ideas for combining adapted and unadapted features: early and late fusion methods, as well as the estimation of the fMLLR transforms using simple target models (STM). Results show that early fusion helps DNNs generalize better when features are combined after a non-linear bottleneck layer, while late fusion improves robustness, specifically in mismatched cases. STM give consistent improvements in both settings.
更多
查看译文
关键词
Speech recognition, feature-space speaker adaptation, DNN acoustic models
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要