Multimodal Fusion with Pre-Trained Model Features in Affective Behaviour Analysis In-the-wild
arXiv (2024)
Abstract
Multimodal fusion is a key technique in most multimodal tasks. With the
recent surge in the number of large pre-trained models, combining
multimodal fusion methods with pre-trained model features can achieve
outstanding performance in many multimodal tasks. In this paper, we present our
approach, which leverages both advantages to address the tasks of Expression
(Expr) Recognition and Valence-Arousal (VA) Estimation. We evaluate pre-trained
models on the Aff-Wild2 database, then extract the final hidden layers of the
models as features. After preprocessing, the extracted features are aligned by
interpolation or convolution, and different models are then employed for
modal fusion. Our code is available on GitHub at FulgenceWen/ABAW6th.
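
The alignment step described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names (align_by_interpolation, fuse_by_concatenation), the choice of linear interpolation, and fusion by simple concatenation are all assumptions made for illustration, and the toy "visual" and "audio" sequences stand in for real pre-trained model features.

```python
from typing import List

Sequence2D = List[List[float]]  # shape: (time steps, feature dimension)


def align_by_interpolation(features: Sequence2D, target_len: int) -> Sequence2D:
    """Linearly resample a (T, D) feature sequence to (target_len, D).

    Hypothetical helper: one plausible way to bring a modality onto
    another modality's frame rate before fusion.
    """
    src_len = len(features)
    if src_len == 0 or target_len <= 0:
        raise ValueError("need a non-empty sequence and a positive target length")
    if src_len == 1 or target_len == 1:
        # Nothing to interpolate between: repeat the first frame.
        return [list(features[0]) for _ in range(target_len)]

    scale = (src_len - 1) / (target_len - 1)
    aligned: Sequence2D = []
    for t in range(target_len):
        pos = t * scale                    # fractional index in the source
        lo = min(int(pos), src_len - 2)    # left neighbour, clamped
        w = pos - lo                       # weight of the right neighbour
        aligned.append(
            [(1.0 - w) * a + w * b for a, b in zip(features[lo], features[lo + 1])]
        )
    return aligned


def fuse_by_concatenation(visual: Sequence2D, audio: Sequence2D) -> Sequence2D:
    """Align audio to the visual frame count, then concatenate per frame."""
    aligned_audio = align_by_interpolation(audio, len(visual))
    return [v + a for v, a in zip(visual, aligned_audio)]


# Toy features: 3 visual frames (D=2) and 5 audio frames (D=1).
visual = [[10.0, 20.0], [11.0, 21.0], [12.0, 22.0]]
audio = [[0.0], [1.0], [2.0], [3.0], [4.0]]

fused = fuse_by_concatenation(visual, audio)
print(fused)  # [[10.0, 20.0, 0.0], [11.0, 21.0, 2.0], [12.0, 22.0, 4.0]]
```

Concatenation is used here only because it is the simplest fusion to show; the abstract states that different models are employed for fusion, and a learned fusion model would consume the aligned sequences in place of the final concatenation.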