Sentiment Analysis From Sound Spectrograms Via Soft BoVW And Temporal Structure Modelling

ICPRAM: PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION APPLICATIONS AND METHODS (2020)

Abstract
Monitoring and analysis of human sentiments is currently one of the most active research topics in human-computer interaction, with many applications. To be practical in daily life, however, sentiment recognition techniques should analyze data collected in an unobtrusive way. For this reason, analyzing audio signals of human speech (as opposed to, say, biometrics) is considered key to potential emotion recognition systems. In this work, we expand upon previous efforts to analyze speech signals by applying computer vision techniques to their spectrograms. In particular, we compute ORB descriptors at keypoints distributed on a regular grid over the spectrogram to obtain an intermediate representation. First, a technique similar to Bag-of-Visual-Words (BoVW) is used: a visual vocabulary is created by clustering keypoint descriptors, but instead of hard assignment, a soft candidacy score is used to construct the histogram descriptors of the signal. Second, a technique that takes into account the temporal structure of the spectrograms is examined, allowing for effective model regularization. Both techniques are evaluated on several popular emotion recognition datasets, with results indicating an improvement over the simple BoVW method.
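
The soft histogram construction described above can be illustrated with a short sketch. This is not the authors' implementation: the abstract does not say how the soft candidacy score is computed, so the softmax over negative squared Euclidean distances, the `beta` sharpness parameter, and the function name `soft_bovw_histogram` are all assumptions made for illustration. Descriptors and visual words are treated here as plain real-valued vectors.

```python
import math


def soft_bovw_histogram(descriptors, codebook, beta=1.0):
    """Return an L1-normalized soft-assignment histogram over visual words.

    Hypothetical sketch: each descriptor spreads one unit of weight across
    all visual words instead of voting only for its nearest word.
    """
    hist = [0.0] * len(codebook)
    for d in descriptors:
        # Squared Euclidean distance from this descriptor to every visual word.
        dists = [sum((a - b) ** 2 for a, b in zip(d, w)) for w in codebook]
        # Softmax over negative distances (assumed scoring rule);
        # subtracting the minimum keeps the exponentials numerically stable.
        m = min(dists)
        weights = [math.exp(-beta * (x - m)) for x in dists]
        total = sum(weights)
        for k, wk in enumerate(weights):
            hist[k] += wk / total
    n = len(descriptors)
    # Average over descriptors so the histogram sums to 1.
    return [h / n for h in hist] if n else hist
```

With a large `beta` the weights approach the hard assignment of plain BoVW, while a small `beta` spreads each descriptor more evenly over the vocabulary.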
Keywords
Sentiment Analysis, Speech Analysis, Bag-of-Visual-Words