A Temporal Coherence Loss Function for Learning Unsupervised Acoustic Embeddings.

Procedia Computer Science (2016)

Abstract
We train neural networks of varying depth with a loss function that constrains the output representations to have a temporal profile resembling that of phonemes. We show that a simple loss function which maximizes the dissimilarity between nearby frames and long-distance frames helps to construct a speech embedding that improves phoneme discriminability, both within and across speakers, even though the loss function uses only within-speaker information. However, when the architecture is too deep, this loss function leads to overfitting, suggesting the need for more data and/or regularization.
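As a rough illustration of the idea in the abstract, the Python sketch below scores a sequence of embedding frames so that the loss is low when temporally close frames are similar and temporally distant frames are dissimilar. The specific form used here (cosine similarity, a (1 - cos)/2 term for near pairs, a cos^2 term for far pairs, and the hypothetical `near` and `far` thresholds) is an assumption made for illustration; it is not taken from the paper and should not be read as the authors' exact loss.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def temporal_coherence_loss(frames, near=1, far=5):
    """Hypothetical temporal coherence loss over one utterance.

    frames: list of embedding vectors, one per time step.
    near:   pairs at most this many steps apart should be similar.
    far:    pairs at least this many steps apart should be dissimilar.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    near_terms, far_terms = [], []
    n = len(frames)
    for i in range(n):
        for j in range(i + 1, n):
            gap = j - i
            c = cosine(frames[i], frames[j])
            if gap <= near:
                # Penalize near pairs that point in different directions.
                near_terms.append((1.0 - c) / 2.0)
            elif gap >= far:
                # Penalize far pairs that are aligned or anti-aligned.
                far_terms.append(c * c)
    if not near_terms or not far_terms:
        raise ValueError("sequence too short for the chosen near/far gaps")
    return sum(near_terms) / len(near_terms) + sum(far_terms) / len(far_terms)


# A piecewise-constant sequence (phoneme-like) versus a rapidly alternating one.
coherent = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3
alternating = [[1.0, 0.0], [0.0, 1.0]] * 3
print(temporal_coherence_loss(coherent, near=1, far=3))
print(temporal_coherence_loss(alternating, near=1, far=3))
```

Under these assumptions, the piecewise-constant sequence receives a lower loss than the alternating one, which matches the intuition of a phoneme-like temporal profile. Note that all pairs come from a single sequence, in line with the abstract's remark that only within-speaker information is used.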
Keywords
unsupervised learning, speech embeddings, speech recognition, temporal coherence, zero resource speech challenge, feature extraction