Multi-domain adversarial training of neural network acoustic models for distant speech recognition.

Speech Communication (2019)

Abstract
Building deep neural network acoustic models directly on far-field speech from multiple recording environments with different acoustic properties is an increasingly popular approach to the problem of distant speech recognition. The common current approach to building such multi-condition (multi-domain) models is to pool the available data from all environments into a single training set, discarding the information about which specific environment each utterance belongs to. We propose a novel strategy for training neural network acoustic models based on adversarial training, which makes use of environment labels during training. By adjusting the parameters of the initial layers of the network adversarially with respect to a domain classifier trained to recognize the recording environments, we enforce better invariance to the diversity of recording conditions. We provide a motivating study of the mechanism by which a deep network learns environmental invariance, and discuss relations with existing approaches for improving the robustness of DNN models. The proposed multi-domain adversarial training is evaluated on an end-to-end speech recognition task based on the AMI meeting corpus, achieving relative character error rate reductions of 3.3% with respect to a conventional multi-condition trained baseline and 25.4% with respect to a clean-trained baseline.
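For illustration, the sketch below shows the gradient-reversal mechanism commonly used to implement this kind of domain-adversarial training: a shared encoder feeds both the recognition head and a domain classifier, and the domain-classification gradient is reversed before reaching the shared layers, so the encoder is pushed toward environment-invariant representations. This is a minimal PyTorch-style assumption, not the authors' implementation; the feedforward encoder, layer sizes, and names such as GradReverse and MultiDomainAdversarialAM are hypothetical simplifications (the actual system in the paper is a recurrent end-to-end model).

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies gradients by -lambda in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing into the shared encoder.
        return -ctx.lam * grad_output, None

class MultiDomainAdversarialAM(nn.Module):
    """Hypothetical acoustic model with a domain-adversarial branch (illustrative only)."""
    def __init__(self, feat_dim=80, hidden=512, n_outputs=2000, n_domains=4, lam=0.1):
        super().__init__()
        self.lam = lam
        # "Initial layers" shared by both heads; they receive reversed gradients
        # from the domain classifier, enforcing environment invariance.
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.asr_head = nn.Linear(hidden, n_outputs)      # main recognition task
        self.domain_head = nn.Linear(hidden, n_domains)   # recording-environment classifier

    def forward(self, feats):
        h = self.encoder(feats)
        asr_logits = self.asr_head(h)
        # The domain loss is minimized by the classifier head but, because of the
        # reversed gradient, maximized (adversarially) by the shared encoder.
        domain_logits = self.domain_head(GradReverse.apply(h, self.lam))
        return asr_logits, domain_logits

# Usage sketch: joint loss = recognition loss + domain-classification loss.
model = MultiDomainAdversarialAM()
feats = torch.randn(8, 80)                       # a batch of frame-level features
asr_targets = torch.randint(0, 2000, (8,))
domain_targets = torch.randint(0, 4, (8,))       # environment label per utterance
asr_logits, dom_logits = model(feats)
loss = nn.functional.cross_entropy(asr_logits, asr_targets) \
     + nn.functional.cross_entropy(dom_logits, domain_targets)
loss.backward()
```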
Keywords
Distant speech recognition, Far-field microphone, Recurrent neural network, Adversarial training, Multi-domain speech data