Statistical Phrase/Accent Command Estimation Algorithm Utilizing Linguistic Information
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018
Abstract
The importance of extracting non-linguistic information has been highlighted in a growing variety of speech signal processing applications. Among the audio features that carry such information, fundamental frequency (F0) contours are considered the most important. The Fujisaki model is a physical model that describes an F0 contour with only a small number of parameters, namely the timings and magnitudes of the phrase and accent commands. A stochastic formulation of the model and an accompanying estimation algorithm have recently been proposed. However, the use of linguistic information has so far been limited, even though accent commands are known to be strongly related to linguistic information in many languages, and such information can be obtained from the input audio signal with speech recognition techniques. Against this background, this paper introduces a novel F0 command parameter estimation method that incorporates linguistic information into the stochastic framework. Experiments using real speech data show that, when linguistic information is appropriately utilized, the estimation accuracy of accent command parameters improves by 43% under the proposed criteria.
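To make the role of the phrase and accent commands concrete, the following is a minimal sketch of the standard deterministic Fujisaki model in Python. It is not the paper's stochastic formulation or its estimation algorithm. The function names, the command tuples, and the constants `ALPHA`, `BETA`, and `GAMMA` (values commonly quoted in the literature) are assumptions chosen for illustration.

```python
import math

# Commonly quoted constants; assumed here, not taken from the paper.
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism (zero for t < 0)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, clipped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def log_f0(t, base_hz, phrase_cmds, accent_cmds):
    """Log F0 at time t (seconds).

    phrase_cmds: list of (onset, magnitude) impulses.
    accent_cmds: list of (onset, offset, amplitude) rectangular pulses.
    """
    value = math.log(base_hz)
    for onset, magnitude in phrase_cmds:
        value += magnitude * phrase_response(t - onset)
    for onset, offset, amplitude in accent_cmds:
        value += amplitude * (
            accent_response(t - onset) - accent_response(t - offset)
        )
    return value


# Hypothetical commands: one phrase impulse and one accent pulse.
phrase_cmds = [(0.0, 0.5)]
accent_cmds = [(0.2, 0.5, 0.4)]
contour_hz = [
    math.exp(log_f0(i * 0.01, 100.0, phrase_cmds, accent_cmds))
    for i in range(200)
]
```

Estimation, the task the paper addresses, is the inverse problem: recovering the command timings and magnitudes from an observed contour rather than synthesizing the contour from known commands.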
Keywords
voice fundamental frequency contour estimation, Fujisaki model, prosodic information processing, EM algorithm, speech recognition