SparseSpeech: Unsupervised Acoustic Unit Discovery with Memory-Augmented Sequence Autoencoders

INTERSPEECH (2019)

Abstract
We propose a sparse sequence autoencoder model for unsupervised acoustic unit discovery, based on bidirectional LSTM encoders/decoders with a sparsity-inducing bottleneck. The sparsity layer is based on memory-augmented neural networks, with a differentiable embedding memory bank addressed from the encoder. The decoder reconstructs the encoded input feature sequence from an utterance-level context embedding and the bottleneck representation. At some time steps, the input to the decoder is randomly omitted by applying sequence dropout, forcing the decoder to learn the temporal structure of the sequence. We propose a bootstrapping training procedure, after which the network can be trained end-to-end with standard back-propagation. Sparsity of the generated representation can be controlled with a parameter in the proposed loss function. We evaluate the units with the ABX discriminability on minimal triphone pairs and also on entire words. Forcing the network to favor highly sparse addressings in the memory component yields symbolic-like representations of speech that are very compact and still offer better ABX discriminability than MFCCs.
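The core mechanism described above (a differentiable memory bank addressed from the encoder, with a tunable sparsity pressure on the addressings) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the softmax-with-temperature addressing, and the entropy-based sparsity penalty are assumptions standing in for the details of the actual model.

```python
import numpy as np

def sparse_memory_bottleneck(h, M, temperature=0.5):
    """Address a memory bank M (K x d) from encoder states h (T x d).

    Returns addressing weights w (T x K, rows sum to 1) and the
    bottleneck representation z (T x d), a convex combination of
    memory entries. A lower temperature sharpens the addressing.
    (Illustrative sketch; not the paper's exact formulation.)
    """
    logits = h @ M.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)            # softmax addressing over memory slots
    z = w @ M
    return w, z

def sparsity_penalty(w, eps=1e-8):
    """Mean entropy of the addressing weights.

    Adding this (scaled by a sparsity parameter) to the reconstruction
    loss pushes the addressings toward near one-hot, symbolic-like units.
    """
    return float(-(w * np.log(w + eps)).sum(axis=1).mean())

# Toy usage: 5 encoder frames, 8 memory slots, embedding dimension 16.
rng = np.random.default_rng(0)
h = rng.standard_normal((5, 16))
M = rng.standard_normal((8, 16))
w, z = sparse_memory_bottleneck(h, M)
penalty = sparsity_penalty(w)
```

Because both the addressing and the penalty are differentiable, a sparsity weight on `penalty` in the total loss directly trades reconstruction quality against how discrete (symbolic-like) the discovered units become.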
Keywords
unsupervised learning, sparse autoencoders, acoustic unit discovery