Incorporation of Happiness in Neutral Speech by Modifying Time-Domain Parameters of Emotive-Keywords

Circuits, Systems, and Signal Processing (2021)

Abstract
Human-computer interaction can be enhanced by making machines recognize the emotional state of a user and respond accordingly. This requires text-to-speech systems that can produce natural emotional speech. While several existing methods are data driven, the current work attempts to incorporate happiness into neutral speech using signal processing algorithms. Analysis shows that it is mainly the speech rate, pitch period, and energy that vary with emotion. Further, emotion is predominantly expressed in certain emotive words in the sentence. Accordingly, several variations are introduced into these parameters, and it is observed that fitting a hat-shaped pitch contour onto the emotive keywords in a sentence and increasing their energy suffices to incorporate happiness into neutral speech. An HMM-based approach is used to spot the keywords. Linear prediction-based synthesis and the time-domain pitch-synchronous overlap-and-add (TD-PSOLA) method are then used to modify the keywords and synthesize emotional speech. The latter produces happy speech of better quality, with a mean opinion score (MOS) of 2.51 out of a maximum of 3. Further, to verify whether modifying only the keywords suffices, happy speech is also synthesized by modifying all the words of a neutral utterance to match the corresponding natural happy speech. An MOS of 2.34 is obtained for speech synthesized by this method, indicating that modifying the keywords alone suffices to incorporate happiness into neutral speech. Finally, the use of the proposed method as a post-processing module in a text-to-speech synthesis system, to generate happy speech instead of neutral speech, is also demonstrated.
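
The two keyword modifications named in the abstract, a hat-shaped pitch contour and an energy increase, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the polynomial order, the peak height, or the energy gain, so the second-order shape, the function names `hat_contour` and `scale_energy`, and all numeric values below are assumptions chosen only for illustration.

```python
def hat_contour(n_frames, base_f0, peak_gain=1.2):
    """Return a rise-fall ("hat-shaped") F0 target, one value per frame.

    Assumption: a second-order polynomial that starts and ends at base_f0
    and peaks at base_f0 * peak_gain in the middle of the keyword.
    """
    if n_frames < 2:
        raise ValueError("need at least two frames to shape a contour")
    contour = []
    for i in range(n_frames):
        t = i / (n_frames - 1)          # normalized position in [0, 1]
        bump = 4.0 * t * (1.0 - t)      # 0 at both edges, 1 at the midpoint
        contour.append(base_f0 * (1.0 + (peak_gain - 1.0) * bump))
    return contour


def scale_energy(samples, start, end, gain):
    """Multiply samples[start:end] by gain; other samples are unchanged.

    An amplitude gain of g raises the energy of that span by g squared.
    """
    return [s * gain if start <= i < end else s
            for i, s in enumerate(samples)]


# Illustrative values only: a 5-frame keyword at 200 Hz, raised 20% at its peak.
f0_targets = hat_contour(5, 200.0, peak_gain=1.2)
louder = scale_energy([1.0, 1.0, 1.0, 1.0], start=1, end=3, gain=2.0)
```

In a full system, the keyword boundaries would come from the keyword spotter, and the target contour would be passed to a pitch-modification stage such as TD-PSOLA; neither step is shown here.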
Keywords
Emotion incorporation, Keyword-spotting, Polynomial curve fitting, PSOLA, Emotional speech synthesis