A Temporal Coherence Loss Function for Learning Unsupervised Acoustic Embeddings.

Procedia Computer Science (2016)

Abstract

We train neural networks of varying depth with a loss function that constrains the output representations to have a temporal profile resembling that of phonemes. We show that a simple loss function, which maximizes the dissimilarity between near frames and long-distance frames, helps construct a speech embedding that improves phoneme discriminability both within and across speakers, even though the loss function uses only within-speaker information. However, when the architecture is too deep, this loss function leads to overfitting, suggesting the need for more data and/or regularization.
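
The abstract does not give the exact form of the loss, so the following is only a minimal sketch of one way a temporal coherence objective of this kind could be written. It assumes a cosine-based contrast in which temporally near frames are pulled together and long-distance frames are pushed apart. The function names, the `near` and `far` offsets, and the specific penalty terms are illustrative assumptions, not the authors' formulation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def temporal_coherence_loss(frames, near=1, far=10):
    """Average contrast between near and far frame pairs (hypothetical form).

    frames: embedded frames of ONE utterance, in temporal order, so only
            within-speaker information is used.
    near:   offset (in frames) of the pair that should stay similar (assumed).
    far:    offset (in frames) of the pair that should be dissimilar (assumed).
    """
    if not 0 < near < far:
        raise ValueError("expected 0 < near < far")
    if len(frames) <= far:
        raise ValueError("sequence is too short for the chosen far offset")

    total = 0.0
    count = len(frames) - far
    for t in range(count):
        anchor = frames[t]
        # Near pair: penalty is 0 when the two embeddings point the same way.
        near_term = (1.0 - cosine(anchor, frames[t + near])) / 2.0
        # Far pair: penalty is 0 when the two embeddings are orthogonal.
        far_term = cosine(anchor, frames[t + far]) ** 2
        total += near_term + far_term
    return total / count


# A piecewise-constant, phoneme-like toy sequence versus a constant one.
phoneme_like = [[1.0, 0.0]] * 10 + [[0.0, 1.0]] * 10
constant = [[1.0, 0.0]] * 20
loss_phoneme_like = temporal_coherence_loss(phoneme_like)
loss_constant = temporal_coherence_loss(constant)
```

Under these assumptions, an embedding that stays stable over short spans and changes over long spans scores lower than one that never changes, which matches the phoneme-like temporal profile the abstract describes. In training, the same quantity would be computed on network outputs and minimized by gradient descent.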

Keywords

unsupervised learning, speech embeddings, speech recognition, temporal coherence, zero resource speech challenge, feature extraction
