Automated Emotional Valence Estimation in Infants with Stochastic and Strided Temporal Sampling

2023 11th International Conference on Affective Computing and Intelligent Interaction (ACII 2023)

Abstract
We propose the first automated approach to estimate the emotional valence of infants from their facial behavior. As a backbone, we use the state-of-the-art transformer-based video masked autoencoder (VideoMAE), pre-trained on a large video dataset, and fine-tune it on two large, well-annotated infant video datasets (SIBSMILE and MODELING). To augment the limited data, we propose a novel video temporal augmentation method called Stochastic and Strided Temporal Sampling (SSTS). We demonstrate the effectiveness of our approach for infant valence estimation by achieving a Concordance Correlation Coefficient (CCC) of 0.671 on SIBSMILE and MODELING. Experiments show that SSTS speeds up training by a factor of 8 while also yielding the best valence estimation performance. Lastly, we suggest that face detection and cropping (coarse registration) is a promising alternative to landmark-based registration (i.e., fine registration) in data pre-processing when accurate infant facial landmark detectors are unavailable.
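For readers unfamiliar with the evaluation metric, the following is a minimal sketch of the standard Concordance Correlation Coefficient (Lin's CCC), which measures agreement between predicted and annotated values by penalizing both low correlation and differences in mean or scale. The function name and the NumPy-based implementation are assumptions made for illustration; the abstract does not describe the authors' evaluation code.

```python
import numpy as np


def concordance_correlation_coefficient(y_true, y_pred):
    """Lin's CCC between two 1-D sequences (hypothetical helper, not the authors' code).

    CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2),
    using population (biased) variance and covariance.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.ndim != 1 or y_true.shape != y_pred.shape:
        raise ValueError("inputs must be 1-D arrays of equal length")

    mean_true, mean_pred = y_true.mean(), y_pred.mean()
    covariance = np.mean((y_true - mean_true) * (y_pred - mean_pred))
    denominator = y_true.var() + y_pred.var() + (mean_true - mean_pred) ** 2
    if denominator == 0.0:
        # Both sequences are constant and identical: CCC is undefined.
        return float("nan")
    return float(2.0 * covariance / denominator)
```

A CCC of 1 indicates perfect agreement, 0 indicates no agreement, and -1 indicates perfect disagreement. Unlike Pearson correlation, a constant offset between predictions and annotations lowers the score.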
Keywords
infant emotional valence estimation, facial expression recognition, video transformers