The Representation of Speech in Deep Neural Networks.

Odette Scharenborg, Nikki van der Gouw, Martha Larson, Elena Marchiori

MMM (2019)

Abstract

In this paper, we investigate the connection between how people understand speech and how speech is understood by a deep neural network. A naive, general feed-forward deep neural network was trained for the task of vowel/consonant classification. Subsequently, the representations of the speech signal in the different hidden layers of the DNN were visualized. The visualizations allow us to study the distance between the representations of different types of input frames and observe the clustering structures formed by these representations. In the different visualizations, the input frames were labeled with different linguistic categories: sounds in the same phoneme class, sounds with the same manner of articulation, and sounds with the same place of articulation. We investigate whether the DNN clusters speech representations in a way that corresponds to these linguistic categories and observe evidence that the DNN does indeed appear to learn structures that humans use to understand speech without being explicitly trained to do so.
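
The pipeline described above (train a feed-forward network on vowel/consonant labels, then inspect the hidden-layer representations) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the input is synthetic 13-dimensional data standing in for acoustic frames, the layer sizes and learning rate are arbitrary, the helper names (`init_mlp`, `forward`, `train`, `pca_2d`) are hypothetical, and PCA is used as the 2-D projection only because the abstract does not name the visualization technique.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Create (weights, biases) pairs for a feed-forward net (He initialization)."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Return the activations of every hidden layer and the output probabilities."""
    hidden, h = [], x
    for w, b in params[:-1]:
        h = np.maximum(0.0, h @ w + b)  # ReLU hidden layer
        hidden.append(h)
    w, b = params[-1]
    logits = np.clip((h @ w + b).ravel(), -30.0, 30.0)
    probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid: P(frame is a vowel)
    return hidden, probs

def train(params, x, y, lr=0.05, epochs=300):
    """Full-batch gradient descent on the mean binary cross-entropy loss."""
    n = len(x)
    for _ in range(epochs):
        hidden, probs = forward(params, x)
        acts = [x] + hidden  # acts[i] is the input to layer i
        delta = ((probs - y) / n)[:, None]  # gradient w.r.t. the output logits
        updated = list(params)
        for i in range(len(params) - 1, -1, -1):
            w, b = params[i]
            grad_w = acts[i].T @ delta
            grad_b = delta.sum(axis=0)
            if i > 0:
                # Propagate through the weights and the ReLU of the layer below.
                delta = (delta @ w.T) * (acts[i] > 0)
            updated[i] = (w - lr * grad_w, b - lr * grad_b)
        params = updated
    return params

def pca_2d(acts):
    """Project activations onto their first two principal components."""
    centered = acts - acts.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Synthetic stand-in for speech frames (assumption): two Gaussian clusters,
# label 0 = "consonant", label 1 = "vowel".
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(-1.0, 1.0, size=(200, 13)),
               rng.normal(1.0, 1.0, size=(200, 13))])
y = np.concatenate([np.zeros(200), np.ones(200)])

params = train(init_mlp([13, 16, 16, 1], rng), x, y)
hidden, probs = forward(params, x)
accuracy = float(np.mean((probs > 0.5) == (y == 1)))

# One 2-D projection per hidden layer. Points can then be colored by any
# label set (phoneme class, manner or place of articulation) to look for clusters.
projections = [pca_2d(h) for h in hidden]
```

Plotting each entry of `projections` as a scatter plot, colored by a linguistic label the network was never trained on, is the kind of inspection the abstract describes; with real speech, the frames and labels would come from an annotated corpus rather than from random clusters.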

Keywords

Deep neural networks, Speech representations, Visualizations
