Standalone training of context-dependent deep neural network acoustic models

Acoustics, Speech and Signal Processing (2014)

Abstract
Recently, context-dependent (CD) deep neural network (DNN) hidden Markov models (HMMs) have been widely used as acoustic models for speech recognition. However, the standard method for building such models requires target training labels from a system using HMMs with Gaussian mixture model output distributions (GMM-HMMs). In this paper, we introduce a method for training state-of-the-art CD-DNN-HMMs without relying on such a pre-existing system. We achieve this in two steps: first, build a context-independent (CI) DNN iteratively using only word transcriptions; then, cluster the equivalent output distributions of the untied CD-DNN-HMM states using the decision-tree-based state-tying approach. Experiments were performed on the Wall Street Journal corpus, and the resulting system gave word error rates (WERs) comparable to those of CD-DNN-HMMs built from GMM-HMM alignments and state clustering.
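The second step above, decision-tree-based state tying, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the representation of states as pooled feature matrices, the context-set "questions", and the single diagonal-Gaussian likelihood criterion with a fixed gain threshold are all simplifying assumptions chosen for illustration. Nodes are split greedily on whichever question yields the largest log-likelihood gain, and states falling in the same leaf are tied.

```python
import numpy as np

def gaussian_loglik(X):
    """Log-likelihood of the rows of X under an ML-fitted diagonal Gaussian."""
    n, d = X.shape
    var = X.var(axis=0) + 1e-6  # variance floor to avoid log(0)
    return -0.5 * n * (d * np.log(2 * np.pi) + np.sum(np.log(var)) + d)

def tie_states(states, questions, min_gain=50.0):
    """Greedy top-down decision-tree state tying (illustrative sketch).

    states: (context, feature-matrix) pairs, one per untied CD state.
    questions: sets of contexts ("is the neighbouring phone in this set?").
    Returns lists of contexts whose states end up tied together.
    """
    def split(node):
        base = gaussian_loglik(np.vstack([x for _, x in node]))
        best = None
        for q in questions:
            yes = [s for s in node if s[0] in q]
            no = [s for s in node if s[0] not in q]
            if not yes or not no:
                continue  # question does not partition this node
            gain = (gaussian_loglik(np.vstack([x for _, x in yes]))
                    + gaussian_loglik(np.vstack([x for _, x in no])) - base)
            if best is None or gain > best[0]:
                best = (gain, yes, no)
        if best is not None and best[0] > min_gain:
            return split(best[1]) + split(best[2])
        return [[c for c, _ in node]]  # stop splitting: tie this node's states
    return split(states)

# Toy demo: contexts "a"/"b" share one distribution and "x"/"y" another,
# so the tree should tie the four states into exactly two leaves.
rng = np.random.default_rng(0)
states = [("a", rng.normal(0, 1, (20, 2))), ("b", rng.normal(0, 1, (20, 2))),
          ("x", rng.normal(5, 1, (20, 2))), ("y", rng.normal(5, 1, (20, 2)))]
questions = [{"a", "b"}, {"a"}, {"x"}]
leaves = tie_states(states, questions)
```

In a real system the per-state statistics would come from the CI-DNN alignments of the training data and the questions from a phonetic question set; the greedy split-until-gain-falls-below-threshold structure is the same.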
Keywords
Gaussian processes, acoustic analysis, decision trees, hidden Markov models, iterative methods, learning (artificial intelligence), mixture models, neural nets, speech recognition, CD-DNN-HMM training, GMM-HMM, Gaussian mixture model output distributions, WER, Wall Street Journal corpus, context-dependent deep neural network acoustic models, context-dependent deep neural network hidden Markov models, decision tree based state tying approach, target training labels, word error rates