Multimodal Fusion with Pre-Trained Model Features in Affective Behaviour Analysis In-the-wild

Zhuofan Wen, Fengyu Zhang, Siyuan Zhang, Haiyang Sun, Mingyu Xu, Licai Sun, Zheng Lian, Bin Liu, Jianhua Tao

arXiv (2024)

Abstract

Multimodal fusion is a key technique in most multimodal tasks. With the recent surge in large pre-trained models, combining multimodal fusion methods with pre-trained model features can achieve strong performance on many of these tasks. In this paper, we present an approach that leverages both for Expression (Expr) Recognition and Valence-Arousal (VA) Estimation. We apply pre-trained models to the Aff-Wild2 database and extract their final hidden layers as features. After preprocessing, the extracted features are aligned by interpolation or convolution, and different models are then employed for modal fusion. Our code is available on GitHub at FulgenceWen/ABAW6th.
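The alignment step mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the authors' repository: the function names `resample_linear` and `fuse_by_concatenation`, the choice of linear interpolation, and the use of plain concatenation as the fusion step are all assumptions made for illustration. The sketch resamples one modality's frame-level features to the frame count of another and then joins them frame by frame.

```python
from typing import List

Sequence = List[List[float]]  # frames x feature dimensions


def resample_linear(features: Sequence, target_len: int) -> Sequence:
    """Linearly interpolate a (frames x dims) sequence to target_len frames.

    Hypothetical helper; the paper does not specify its interpolation scheme.
    """
    if not features or target_len <= 0:
        raise ValueError("need a non-empty sequence and a positive target length")
    src_len = len(features)
    if src_len == 1 or target_len == 1:
        # Nothing to interpolate between: repeat the first frame.
        return [list(features[0]) for _ in range(target_len)]
    out = []
    for i in range(target_len):
        # Map output index i onto the source time axis.
        pos = i * (src_len - 1) / (target_len - 1)
        lo = int(pos)
        hi = min(lo + 1, src_len - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(features[lo], features[hi])])
    return out


def fuse_by_concatenation(visual: Sequence, audio: Sequence) -> Sequence:
    """Align audio to the visual frame count, then concatenate per frame.

    Concatenation is an assumed stand-in for the fusion models in the paper.
    """
    aligned_audio = resample_linear(audio, len(visual))
    return [v + a for v, a in zip(visual, aligned_audio)]


# Toy example: 3 visual frames (2 dims each) and 2 audio frames (1 dim each).
visual = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
audio = [[0.0], [10.0]]
fused = fuse_by_concatenation(visual, audio)
```

Here `fused` has one row per visual frame, each holding the visual features followed by the interpolated audio feature. A real pipeline would operate on tensors of pre-trained model hidden states and feed the aligned features to a learned fusion model instead of concatenating them directly.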
