Sliding Window Seq2seq Modeling for Engagement Estimation

Jun Yu, Keda Lu, Mohan Jing, Ziqi Liang, Bingyuan Zhang, Jianqing Sun, Jiaen Liang

MM '23: Proceedings of the 31st ACM International Conference on Multimedia (2023)

Abstract

Engagement estimation in human conversations is one of the most important research problems for natural human-robot interaction. However, previous datasets and studies mainly focus on video-level engagement estimation and therefore can hardly reflect a person's constantly changing engagement. The MultiMediate '23 challenge addresses this gap by providing a frame-level engagement estimation task. In this paper, we propose Sliding Window Seq2seq Modeling, built on BiLSTM and Transformer, both of which have powerful sequence modeling capabilities. Our method fully utilizes the global and local multi-modal feature information in the participants' videos and accurately expresses each participant's engagement at every moment. Our method achieves a state-of-the-art concordance correlation coefficient (CCC) of 0.71 for engagement estimation on the corresponding test sets.
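The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names: cutting a frame sequence into overlapping windows for seq2seq prediction, and scoring frame-level predictions with the CCC. The function names, the window length, the stride, and the averaging of overlapping predictions are all assumptions made for illustration, not the authors' settings. The CCC uses its standard definition, 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²).

```python
from statistics import fmean


def sliding_windows(frames, window, stride):
    """Split a per-frame sequence into overlapping (start, chunk) windows.

    `window` and `stride` are hypothetical hyperparameters. A final window
    aligned to the end of the sequence is added so every frame is covered.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    n = len(frames)
    if n <= window:
        return [(0, list(frames))]
    starts = list(range(0, n - window + 1, stride))
    if starts[-1] != n - window:
        starts.append(n - window)
    return [(s, list(frames[s:s + window])) for s in starts]


def merge_windows(n, window_preds):
    """Average overlapping per-frame predictions into one length-n sequence.

    Averaging is an assumed merge rule; other choices are possible.
    """
    totals = [0.0] * n
    counts = [0] * n
    for start, preds in window_preds:
        for offset, value in enumerate(preds):
            totals[start + offset] += value
            counts[start + offset] += 1
    return [t / c for t, c in zip(totals, counts)]


def ccc(pred, target):
    """Concordance correlation coefficient with population variances.

    Undefined (division by zero) when both inputs are the same constant.
    """
    mp, mt = fmean(pred), fmean(target)
    vp = fmean([(p - mp) ** 2 for p in pred])
    vt = fmean([(t - mt) ** 2 for t in target])
    cov = fmean([(p - mp) * (t - mt) for p, t in zip(pred, target)])
    return 2 * cov / (vp + vt + (mp - mt) ** 2)


# Stand-in for a seq2seq model: it simply echoes its input window.
labels = [float(i) for i in range(11)]
windows = sliding_windows(labels, window=4, stride=3)
merged = merge_windows(len(labels), windows)
score = ccc(merged, labels)
```

In a real pipeline each window would hold multi-modal feature vectors and the echo step would be replaced by a BiLSTM or Transformer that outputs one engagement value per frame; the CCC penalizes both low correlation and a shift in mean or scale, which is why a constant offset lowers the score even when the predictions track the labels perfectly.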
