LSTM-Based Pitch Range Estimation from Spectral Information of Brief Speech Input

2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP)

Abstract
In human speech communication, listeners normalize pitch automatically by subjectively estimating the speaker’s overall pitch range, even from a very brief speech input. In speech technologies, pitch range has traditionally been estimated by direct analysis of F0 values from a lengthy speech input, and reliable estimation from a brief speech input remains an open problem. In this study, we propose a novel method of estimating pitch range from the spectral structure of a very brief speech input, using a recurrent neural network with long short-term memory (RNN-LSTM) to mimic the perceptual process of human listeners. Our experiments show that the model gives its best estimates when the speech input is as short as 300 to 500 ms, and that in this condition the estimation is more reliable than the conventional method of direct F0 analysis. These results support the validity of the proposed model.
Keywords
Recurrent neural networks,Analytical models,Reliability,Estimation error,Standards