Large-Scale Nonverbal Vocalization Detection Using Transformers

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

Abstract
Detecting emotionally expressive nonverbal vocalizations is essential to developing technologies that can converse fluently with humans. The affective computing community has largely focused on understanding the intonation of emotional speech and language. However, advances in the study of vocal emotional behavior suggest that emotions may be more readily conveyed not by speech but by nonverbal vocalizations such as laughs, sighs, shrieks, and grunts – vocalizations that often occur in lieu of speech. The task of detecting such emotional vocalizations has been largely overlooked by researchers, likely due to the limited availability of data capturing a sufficiently wide variety of vocalizations. Most studies in the literature focus on detecting laughter or cries. In this paper, we present what is, to the best of our knowledge, the first nonverbal vocalization detection model trained to detect as many as 67 types of emotional vocalizations. To this end, we use the large-scale, in-the-wild HUME-VB dataset, which provides more than 156 hours of data. We thoroughly investigate the use of pre-trained audio transformer models, such as Wav2Vec2 and Whisper, and provide useful insights for the task at hand using different types of noise signals.
Keywords
Nonverbal vocalization, transformers, vocal burst detection