Segmental Conditional Random Fields With Deep Neural Networks As Acoustic Models For First-Pass Word Recognition

Yanzhang He, Eric Fosler-Lussier

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 (2015)

Citations: 28 | Views: 60
Abstract
Discriminative segmental models, such as segmental conditional random fields (SCRFs), have recently been applied successfully to speech recognition in lattice rescoring, where they integrate detectors across different levels of units, such as phones and words. However, lattice generation has been constrained by a baseline decoder, typically a frame-based hybrid HMM-DNN system, which still suffers from the well-known frame-independence assumption. In this paper, we propose to use SCRFs with DNNs directly as the acoustic model: a one-pass unified framework that can utilize local phone classifiers, phone transitions, and long-span features in direct word decoding, modeling phones or sub-phonetic segments of variable length. We describe a WFST-based approach to use the proposed acoustic model efficiently with the language model in first-pass word recognition. Our evaluation on the WSJ0 corpus shows that our SCRF-DNN system outperforms both a hybrid HMM-DNN system and a frame-level CRF-DNN system using the same monophone label space.
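To illustrate what "modeling segments of variable length" means at decoding time, here is a minimal sketch of generic semi-Markov (segmental) CRF Viterbi decoding over a single utterance. It is not the authors' implementation: it assumes a segment's score is simply the sum of per-frame label scores (for example, DNN log-posteriors) over the segment, plus a hypothetical per-label duration score and a label-transition score. The paper's long-span features and its WFST integration with the language model are not reproduced here, and the function and parameter names (`segmental_viterbi`, `duration_score`, `max_dur`) are illustrative only.

```python
from typing import Callable, List, Optional, Sequence, Tuple

NEG_INF = float("-inf")


def segmental_viterbi(
    frame_scores: Sequence[Sequence[float]],   # [T][L] per-frame label scores (assumed log-domain)
    transition: Sequence[Sequence[float]],     # [L][L] score for label p followed by label y
    duration_score: Callable[[int, int], float],  # hypothetical score for label y lasting d frames
    max_dur: int,                               # longest segment considered
) -> Tuple[float, List[Tuple[int, int, int]]]:
    """Best segmentation as (start, end_exclusive, label) triples, with its total score."""
    T, L = len(frame_scores), len(frame_scores[0])

    # Prefix sums make the summed frame score of any segment an O(1) lookup.
    prefix = [[0.0] * L]
    for t in range(T):
        prefix.append([prefix[-1][y] + frame_scores[t][y] for y in range(L)])

    # best[e][y]: best score of segmenting frames [0, e) with the last segment labeled y.
    best = [[NEG_INF] * L for _ in range(T + 1)]
    back: List[List[Optional[Tuple[int, Optional[int]]]]] = [[None] * L for _ in range(T + 1)]

    for e in range(1, T + 1):
        for d in range(1, min(max_dur, e) + 1):
            s = e - d
            for y in range(L):
                seg = prefix[e][y] - prefix[s][y] + duration_score(y, d)
                if s == 0:
                    cand, prev = seg, None  # first segment: no incoming transition
                else:
                    # Best predecessor label ending at frame s, including the transition score.
                    prev = max(range(L), key=lambda p: best[s][p] + transition[p][y])
                    cand = best[s][prev] + transition[prev][y] + seg
                if cand > best[e][y]:
                    best[e][y] = cand
                    back[e][y] = (s, prev)

    # Backtrace from the best final label.
    y = max(range(L), key=lambda lab: best[T][lab])
    total = best[T][y]
    segments: List[Tuple[int, int, int]] = []
    e: int = T
    while e > 0:
        s, prev = back[e][y]
        segments.append((s, e, y))
        e, y = s, prev
    segments.reverse()
    return total, segments


if __name__ == "__main__":
    # Frames 0-1 favor label 0, frames 2-3 favor label 1; a per-segment penalty of -0.5
    # discourages splitting a run of identical labels into many short segments.
    scores = [[0.0, -5.0], [0.0, -5.0], [-5.0, 0.0], [-5.0, 0.0]]
    trans = [[0.0, -1.0], [-1.0, 0.0]]
    print(segmental_viterbi(scores, trans, lambda y, d: -0.5, max_dur=4))
```

Unlike a frame-level CRF, each decision here spans a whole segment, so features that depend on the entire segment (such as duration) can enter the score directly; the cost is an extra loop over candidate durations, bounded by `max_dur`.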
Keywords
word recognition, segmental conditional random fields, first-pass decoder