Predicting User Confidence in Video Recordings with Spatio-Temporal Multimodal Analytics

International Conference on Multimodal Interaction (ICMI), 2022

Abstract
A critical component of effective communication is the ability to project confidence. In video presentations (e.g., video interviews), many factors influence the confidence perceived by a listener. Advances in computer vision, speech processing, and natural language processing have enabled the automatic extraction of salient features that can be used to model a presenter’s perceived confidence. Moreover, these multimodal features can be used to automatically provide users with feedback on ways to improve their projected confidence. This paper introduces a multimodal approach to modeling user confidence in video presentations that leverages features from visual cues (i.e., eye gaze) and speech patterns. We investigate the degree to which the extracted multimodal features are predictive of user confidence on a dataset of 48 two-minute videos, in which participants used a webcam and microphone to record themselves responding to a prompt. Comparative experimental results indicate that our modeling approach, which uses both visual and speech features, achieves improvements of 83% and 78% over the random and majority-label baselines, respectively. We discuss the implications of using multimodal features to model confidence, as well as the potential for automated feedback to users who want to improve their confidence in video presentations.
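
To make the fusion-and-baseline comparison concrete, below is a minimal sketch, not the authors' pipeline: it concatenates per-video visual (eye-gaze) and speech feature vectors and evaluates a classifier against the random and majority baselines the abstract mentions. The feature arrays, their dimensions, and the labels are synthetic placeholders; a real system would extract them with gaze-tracking and speech-prosody tools.

```python
# Hedged sketch of feature-level multimodal fusion for confidence prediction.
# All feature values and labels below are synthetic placeholders.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_videos = 48  # dataset size reported in the abstract

# Hypothetical aggregated features per two-minute video.
gaze_feats = rng.normal(size=(n_videos, 8))     # e.g., gaze-direction statistics
speech_feats = rng.normal(size=(n_videos, 12))  # e.g., pitch/energy/pause statistics
X = np.hstack([gaze_feats, speech_feats])       # early (feature-level) fusion
y = rng.integers(0, 2, size=n_videos)           # placeholder confidence labels

for name, clf in [
    ("random baseline", DummyClassifier(strategy="uniform", random_state=0)),
    ("majority baseline", DummyClassifier(strategy="most_frequent")),
    ("multimodal model", RandomForestClassifier(random_state=0)),
]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.2f}")
```

With real extracted features, the gap between the fused model and the two dummy baselines is what the abstract's 83% and 78% improvement figures quantify.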
Keywords
video recordings, user confidence, spatio-temporal