Sound to expression: Using emotional sound to guide facial expression editing

Journal of King Saud University - Computer and Information Sciences (2024)

Abstract
Recently, image generation technology has demonstrated impressive results. However, precisely recognizing the emotion in a sound and accurately expressing it on the face of a designated person remains a major challenge. To address this challenge, a new framework, Sound to Expression (S2E), which uses the emotion in sound to guide facial expression image generation, is proposed, and a speech dataset for emotion recognition is constructed. S2E can edit the facial expressions of different people according to the emotions in different sounds. S2E consists of the Continuous Wavelet Transform (CWT), YOLOv3, ChatGPT-3, and a facial expression diffusion editing model (FEDEM). The CWT is used to extract emotional features from sounds, and YOLOv3 identifies the emotion category. The emotion category and a specific person's name are input into ChatGPT-3 to randomly generate a description of the person and the emotion. This description is then input into FEDEM to generate the facial expression image. To generate more accurate images and address emotional semantic deviation, a new facial detail emotional preservation loss is proposed. Experimental results show that S2E accurately recognizes the emotion in a voice and uses it to guide the editing of the specified person's facial expression, generating more accurate images.
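To make the four-stage pipeline concrete, below is a minimal Python sketch of the data flow the abstract describes. Only the CWT step uses a real library call (pywt.cwt from PyWavelets, with librosa assumed for audio loading); classify_emotion_yolov3, ask_chatgpt, and fedem_edit are hypothetical placeholders standing in for the paper's YOLOv3 emotion classifier, ChatGPT-3 prompting, and FEDEM diffusion editor, none of which are publicly released.

```python
import numpy as np
import pywt      # PyWavelets: Continuous Wavelet Transform
import librosa   # audio loading (assumed; any loader works)

def sound_to_scalogram(wav_path, scales=np.arange(1, 128)):
    """CWT stage: turn speech into a 2-D time-frequency feature map,
    the emotional feature representation fed to the classifier."""
    signal, _sr = librosa.load(wav_path, sr=16000)
    coeffs, _freqs = pywt.cwt(signal, scales, wavelet="morl")
    return np.abs(coeffs)  # magnitude scalogram as an image-like array

def s2e(wav_path, person_name, source_image):
    """Illustrative end-to-end flow; helper functions are placeholders."""
    scalogram = sound_to_scalogram(wav_path)
    emotion = classify_emotion_yolov3(scalogram)        # e.g. "happy"
    # ChatGPT-3 turns (person, emotion) into a textual description...
    description = ask_chatgpt(
        f"Describe {person_name} with a {emotion} facial expression.")
    # ...which conditions the FEDEM diffusion editor on the source photo.
    return fedem_edit(image=source_image, prompt=description)
```

The sketch only fixes the interfaces between stages; the paper's contribution lies inside the placeholders, particularly the facial detail emotional preservation loss used when training FEDEM.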
Keywords
Sound to Expression, Emotion recognition, Facial expression, Emotional sound, ChatGPT-3, Diffusion editing model