Ear-model derived features for automatic speech recognition.
ICASSP (2000)
Abstract
The paper provides a theoretical justification that gravity centers (GCs) in frequency bands computed from zero-crossing information are far more robust to additive telephone noise than GCs computed from FFT spectra. Experiments on two different corpora confirm the theoretical results when GCs are added to standard Mel-Frequency Cepstral Coefficients (MFCCs) and their time derivatives. On a large telephone corpus of Italian city names with an average Signal-to-Noise Ratio (SNR) of 15 dB, a 20.1% word error rate reduction is observed when GCs are computed from zero-crossings, whereas performance deteriorates when GCs are computed from FFT spectra.
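The abstract contrasts two ways of computing a band's gravity center but does not give the formulas. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes the FFT-based GC is the power-weighted mean frequency of the spectral bins inside a band, and that the zero-crossing GC is the zero-crossing rate of a signal that has already been band-pass filtered (the filterbank / ear-model front end is omitted). The function names `band_gravity_center_fft` and `zero_crossing_frequency` are hypothetical.

```python
import math


def band_gravity_center_fft(frame, sample_rate, f_lo, f_hi):
    """Assumed FFT-style GC: power-weighted mean frequency (Hz) of bins in [f_lo, f_hi)."""
    n = len(frame)
    bin_hz = sample_rate / n
    num = 0.0
    den = 0.0
    # Only bins from DC up to Nyquist are considered.
    for k in range(n // 2 + 1):
        freq = k * bin_hz
        if not (f_lo <= freq < f_hi):
            continue
        # Naive DFT coefficient X[k]; a real system would use an FFT routine.
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        power = re * re + im * im
        num += freq * power
        den += power
    # Fall back to the band midpoint if the band carries no energy.
    return num / den if den > 0 else 0.5 * (f_lo + f_hi)


def zero_crossing_frequency(band_signal, sample_rate):
    """Assumed zero-crossing GC: dominant-frequency estimate (Hz) of a band-passed signal."""
    # Count sign changes between consecutive samples (0 is treated as non-negative).
    crossings = sum(
        1 for a, b in zip(band_signal, band_signal[1:]) if (a < 0) != (b < 0)
    )
    duration = (len(band_signal) - 1) / sample_rate
    # A sinusoid crosses zero twice per period.
    return crossings / (2 * duration)


# Usage: a clean 1 kHz tone sampled at 8 kHz, analysed in a 500-1500 Hz band.
tone = [math.sin(2 * math.pi * 1000 * i / 8000 + 0.3) for i in range(256)]
print(band_gravity_center_fft(tone, 8000, 500, 1500))
print(zero_crossing_frequency(tone, 8000))
```

Both estimates land near 1000 Hz for this noiseless tone. The sketch does not reproduce the paper's noise-robustness comparison; it only makes the two assumed GC definitions concrete.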
Keywords
frequency,speech recognition,signal to noise ratio,automatic speech recognition,telephony,gravity,hidden markov models,frequency bands,snr,acoustic noise,neural networks,performance