On Using Heterogeneous Data for Vehicle-Based Speech Recognition: A DNN-Based Approach

Xue Feng, Brigitte Richardson, Scott Amman, James Glass

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2015)

Abstract

Most automatic speech recognition (ASR) systems incorporate a single source of information about their input, namely, features and transformations derived from the speech signal. However, in many applications, e.g., vehicle-based speech recognition, sensor data and environmental information are often available to complement the audio information. In this paper, we show how these data can be used to improve hybrid DNN-HMM ASR systems for a vehicle-based speech recognition task. Feature fusion is accomplished by augmenting the acoustic features with additional side information before they are presented to the DNN acoustic model. The additional features are extracted from the vehicle speed, HVAC status, windshield wiper status, and vehicle type. This supplementary information improves the DNN's ability to discriminate phonetic events in an environment-aware way without requiring any modification to the DNN training algorithms. Experimental results show that heterogeneous data are effective irrespective of whether cross-entropy (CE) or sequence training is used. For CE training, a word error rate (WER) reduction of 6.3% is obtained, while for sequence training the reduction is 5.5%.
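The feature fusion described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the side information is encoded, so everything below is an assumption made for illustration: the function names `one_hot` and `augment_frames` are hypothetical, the side information is assumed constant over an utterance, the speed is scaled by an arbitrary factor of 100, and the vehicle type is one-hot encoded.

```python
from typing import List, Sequence


def one_hot(index: int, size: int) -> List[float]:
    """Return a one-hot vector of length `size` with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def augment_frames(
    acoustic: Sequence[Sequence[float]],
    speed_kmh: float,
    hvac_level: int,
    wiper_on: bool,
    vehicle_id: int,
    num_vehicles: int,
) -> List[List[float]]:
    """Append the same side-information vector to every acoustic frame.

    Assumed encoding (not taken from the paper): scaled speed, raw HVAC
    level, binary wiper flag, and a one-hot vehicle type.
    """
    side = [
        speed_kmh / 100.0,            # assumed scaling, for illustration only
        float(hvac_level),
        1.0 if wiper_on else 0.0,
    ] + one_hot(vehicle_id, num_vehicles)
    # Each output row is the original frame followed by the side vector.
    return [list(frame) + side for frame in acoustic]


# Two toy 2-dimensional "acoustic" frames from one utterance.
frames = [[0.1, 0.2], [0.3, 0.4]]
augmented = augment_frames(
    frames, speed_kmh=50.0, hvac_level=2, wiper_on=True,
    vehicle_id=1, num_vehicles=3,
)
print(len(augmented), len(augmented[0]))
```

Because the fusion happens purely at the input, the augmented frames can in principle be fed to an otherwise unchanged acoustic model, which is consistent with the abstract's remark that no modification to the training algorithms is needed.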

Keywords

Noise Robustness, Deep Neural Network, Additional Feature for ASR, Condition-aware DNN
