Integration of fixed and multiple resolution analysis in a speech recognition system

Roberto Gemello, Dario Albesano, Loreta Moisa, Renato De Mori

ICASSP '01, 2001 IEEE International Conference (2001)


Abstract

This paper compares the performance of an operational automatic speech recognition system when Mel frequency-scaled cepstral coefficients (MFCCs), J-Rasta perceptual linear prediction coefficients (J-Rasta PLP), and energies from a multi-resolution analysis (MRA) tree of filters are used as input features. The recognizer is a hybrid system in which a neural network (NN) provides observation probabilities for a network of hidden Markov models (HMMs). The paper also compares performance for various combinations of these features: combining J-Rasta PLP coefficients with the energies computed at the leaves of an MRA filter tree reduces the word error rate (WER) by 16% with respect to J-Rasta PLP coefficients alone. Such a combination is practically feasible thanks to the NN architecture used in the system. Recognition is performed without any language model on a very large test set that includes many speakers uttering proper names from different locations of the Italian public telephone network.
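As a rough illustration of what "energies at the leaves of an MRA filter tree" combined with a cepstral-style vector could look like, here is a minimal sketch. It is not the paper's method: the Haar filter pair, the full binary tree of depth 3, the 256-sample frame, the 13-dimensional placeholder vector, and the function names `haar_split`, `mra_leaf_energies`, and `combine_features` are all assumptions made for the example.

```python
import numpy as np

def haar_split(x):
    """Split a signal into low-pass and high-pass half-rate bands (Haar filters, assumed)."""
    x = x[: len(x) // 2 * 2]            # drop a trailing odd sample, if any
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)   # orthonormal low-pass branch
    high = (even - odd) / np.sqrt(2.0)  # orthonormal high-pass branch
    return low, high

def mra_leaf_energies(frame, depth=3):
    """Log energies at the 2**depth leaves of a full binary filter tree (assumed shape)."""
    bands = [np.asarray(frame, dtype=float)]
    for _ in range(depth):
        # Every band at this level is split again into its two half-bands.
        bands = [half for band in bands for half in haar_split(band)]
    # Small floor avoids log(0) on silent bands.
    return np.array([np.log(np.sum(b ** 2) + 1e-10) for b in bands])

def combine_features(cepstral, mra):
    """Concatenate two per-frame feature streams into one NN input vector."""
    return np.concatenate([cepstral, mra])

rng = np.random.default_rng(0)
frame = rng.standard_normal(256)        # hypothetical speech frame
plp_like = rng.standard_normal(13)      # stand-in for a J-Rasta PLP vector
features = combine_features(plp_like, mra_leaf_energies(frame, depth=3))
```

In this sketch the combined vector simply has the two streams side by side, which is the kind of input a feedforward NN can accept without any change to the HMM side of a hybrid recognizer.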


Keywords

cepstral analysis, feedforward neural nets, filtering theory, hidden Markov models, linear predictive coding, probability, speech recognition, wavelet transforms, Italian public telephone network, J-Rasta perceptual linear prediction coefficients, MFCCs, Mel frequency-scaled cepstral coefficients, filter tree, fixed resolution analysis, multiple resolution analysis, neural network, observation probabilities, speech recognition system, word error rate
