Unsupervised Neural Network Based Feature Extraction Using Weak Top-Down Constraints

Herman Kamper, Micha Elsner, Aren Jansen, Sharon Goldwater

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2015)

Abstract

Deep neural networks (DNNs) have become a standard component in supervised ASR, used in both data-driven feature extraction and acoustic modelling. Supervision is typically obtained from a forced alignment that provides phone class targets, which requires transcriptions and pronunciations. We propose a novel unsupervised DNN-based feature extractor that can be trained without these resources in zero-resource settings. Using unsupervised term discovery, we find pairs of isolated word examples of the same unknown type; these provide weak top-down supervision. For each pair, dynamic programming is used to align the feature frames of the two words. Matching frames are presented as input-output pairs to a deep autoencoder (AE) neural network. Using this AE as a feature extractor in a word discrimination task, we achieve a 64% relative improvement over a previous state-of-the-art system and a 57% relative improvement over a bottom-up trained deep AE, and come within 23% of a supervised system.
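The frame-pairing step described above (align two discovered word examples with dynamic programming, then treat matched frames as input-output pairs) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes Euclidean distance between frames and the basic DTW step pattern (diagonal, horizontal, vertical moves), and the names `dtw_path`, `frame_pairs` and the `both_directions` option are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def euclidean(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames (assumed frame cost)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def dtw_path(x: Sequence[Frame], y: Sequence[Frame]) -> List[Tuple[int, int]]:
    """Return the minimum-cost DTW alignment as a list of (i, j) frame indices."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = cheapest cumulative cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(x[i - 1], y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the final cell, always stepping to the cheapest predecessor.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def frame_pairs(
    x: Sequence[Frame], y: Sequence[Frame], both_directions: bool = True
) -> List[Tuple[Frame, Frame]]:
    """Turn one discovered word pair into (input, target) frame pairs for the AE.

    both_directions is a hypothetical option: when True, each aligned frame
    pair is also added with input and target swapped.
    """
    pairs: List[Tuple[Frame, Frame]] = []
    for i, j in dtw_path(x, y):
        pairs.append((x[i], y[j]))
        if both_directions:
            pairs.append((y[j], x[i]))
    return pairs


if __name__ == "__main__":
    # Two toy one-dimensional "words"; y is a time-stretched version of x.
    x = [[0.0], [1.0], [2.0]]
    y = [[0.0], [0.0], [1.0], [2.0], [2.0]]
    print(dtw_path(x, y))  # → [(0, 0), (0, 1), (1, 2), (2, 3), (2, 4)]
```

In this sketch, each frame of the shorter word is reused wherever DTW stretches it, so one word pair yields as many training pairs as there are steps on the alignment path. The resulting (input, target) pairs would then be used to train an autoencoder to reconstruct the target frame from the input frame; that training step is not shown here.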

Keywords

Unsupervised feature extraction, deep neural networks, zero-resource speech processing, top-down constraints