EALD-MLLM: Emotion Analysis in Long-sequential and De-identity videos with Multi-modal Large Language Model
arXiv (2024)
Abstract
Emotion AI is the ability of computers to understand human emotional states.
Existing works have achieved promising progress, but two limitations remain to
be solved: 1) Previous studies have focused more on emotion analysis in short
sequential videos while overlooking long sequential videos. However, short
sequential videos capture only instantaneous emotions, which may be
deliberately guided or hidden. In contrast, long sequential videos can reveal
authentic emotions; 2) Previous studies commonly utilize various signals such
as facial, speech, and even sensitive biological signals (e.g.,
electrocardiogram). However, due to the increasing demand for privacy,
developing Emotion AI without relying on sensitive signals is becoming
important. To address the aforementioned limitations, in this paper, we
construct a dataset for Emotion Analysis in Long-sequential and De-identity
videos called EALD by collecting and processing the sequences of athletes'
post-match interviews. In addition to providing annotations of the overall
emotional state of each video, we also provide the Non-Facial Body Language
(NFBL) annotations for each player. NFBL is an inner-driven emotional
expression and can serve as an identity-free clue to understanding the
emotional state. Moreover, we provide a simple but effective baseline for
further research. More precisely, we evaluate Multimodal Large Language
Models (MLLMs) with de-identified signals (e.g., visual, speech, and NFBLs)
to perform emotion analysis. Our experimental results demonstrate that: 1)
MLLMs can achieve comparable or even better performance than supervised
single-modal models, even in a zero-shot scenario; 2) NFBL is an important cue
in long sequential emotion analysis. EALD will be available on an open-source
platform.
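As a rough sketch of how the zero-shot baseline described above might be invoked, the snippet below assembles de-identified cues (a speech transcript and NFBL annotations) into a single text prompt for an MLLM. The function name, prompt wording, and label set here are illustrative assumptions, not the paper's actual pipeline.

```python
# Hypothetical sketch: building a zero-shot emotion-analysis prompt from
# de-identified signals (speech transcript + NFBL annotations).
# All names and prompt wording are illustrative assumptions.

EMOTIONS = ["positive", "negative", "neutral"]  # assumed label set


def build_prompt(transcript: str, nfbl_events: list[str]) -> str:
    """Combine de-identified cues into one instruction for an MLLM."""
    nfbl_text = "; ".join(nfbl_events) if nfbl_events else "none observed"
    return (
        "You are analyzing a long post-match athlete interview.\n"
        f"Speech transcript: {transcript}\n"
        f"Non-facial body language (NFBL) events: {nfbl_text}\n"
        f"Classify the athlete's overall emotion as one of {EMOTIONS}."
    )


prompt = build_prompt(
    "We fought hard, but it wasn't enough today.",
    ["head down", "covering face with hands"],
)
print(prompt)
```

In a real evaluation, the resulting prompt (together with the de-identified video or audio input) would be passed to the chosen MLLM, and the model's free-form answer mapped back to one of the emotion labels.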