A Factorial Deep Markov Model For Unsupervised Disentangled Representation Learning From Speech

Sameer Khurana, Shafiq Rayhan Joty, Ahmed Ali, James Glass

2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2019)

Abstract

We present the Factorial Deep Markov Model (FDMM) for representation learning of speech. The FDMM learns disentangled, interpretable, lower-dimensional latent representations from speech without supervision. We use a static and a dynamic latent variable to exploit the fact that information in a speech signal evolves at different time scales. Latent representations learned by the FDMM outperform a baseline i-vector system on speaker verification and dialect identification, and they also reduce the error rate of a phone recognition system in a domain-mismatch scenario.
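
The abstract does not give the model's equations. As a rough, hypothetical illustration of the static/dynamic split it describes, the Python sketch below samples from a toy factorial latent-variable model. One static latent is drawn once per utterance and shared by every frame. A first-order Markov chain of dynamic latents changes frame by frame. The two are combined through linear-Gaussian emissions. The function name `sample_utterance`, the dimensions, the transition coefficient, and the linear emissions are all assumptions made for illustration. This is not the authors' FDMM implementation, which is a deep model trained with variational inference.

```python
import random


def sample_utterance(num_frames, static_dim=2, dynamic_dim=2, feat_dim=3,
                     transition=0.9, seed=0):
    """Sample a toy sequence from a hypothetical factorial latent-variable model.

    Assumption: one static latent per utterance (slowly varying factors such as
    speaker or dialect) plus a first-order Markov chain of dynamic latents
    (fast-varying factors such as phonetic content). Linear-Gaussian emissions
    stand in for the learned neural networks of a real deep Markov model.
    """
    rng = random.Random(seed)

    # Static latent: drawn once and shared by every frame of the utterance.
    z_static = [rng.gauss(0.0, 1.0) for _ in range(static_dim)]

    # Fixed random emission weights (hypothetical stand-ins for learned decoders).
    w_static = [[rng.gauss(0.0, 0.5) for _ in range(static_dim)]
                for _ in range(feat_dim)]
    w_dynamic = [[rng.gauss(0.0, 0.5) for _ in range(dynamic_dim)]
                 for _ in range(feat_dim)]

    z_prev = [0.0] * dynamic_dim  # start the Markov chain at zero
    dynamics, frames = [], []
    for _ in range(num_frames):
        # Markov transition: z_t depends only on z_{t-1} plus Gaussian noise.
        z_t = [transition * z + rng.gauss(0.0, 0.1) for z in z_prev]

        # Emission: each frame mixes the shared static latent and the
        # per-frame dynamic latent, plus small observation noise.
        x_t = [
            sum(w * z for w, z in zip(w_static[i], z_static))
            + sum(w * z for w, z in zip(w_dynamic[i], z_t))
            + rng.gauss(0.0, 0.01)
            for i in range(feat_dim)
        ]
        dynamics.append(z_t)
        frames.append(x_t)
        z_prev = z_t

    return z_static, dynamics, frames
```

For example, `z_s, z_d, x = sample_utterance(num_frames=5)` returns one static vector of length 2, five dynamic vectors of length 2, and five 3-dimensional frames. Keeping the static latent outside the time loop is what lets it capture utterance-level information, while the Markov chain carries frame-level variation.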

Keywords

Disentangled Representation Learning, Variational Inference, Factorial Deep Markov Model
