When Text and Speech are Not Enough: A Multimodal Dataset of Collaboration in a Situated Task

Ibrahim Khebour, Richard Brutti, Indrani Dey, Rachel Dickler, Kelsey Sikes, Kenneth Lai, Mariah Bradford, Brittany Cates, Paige Hansen, Changsoo Jung, Brett Wisniewski, Corbyn Terpstra, Leanne Hirshfield, Sadhana Puntambekar, Nathaniel Blanchard, James Pustejovsky, Nikhil Krishnaswamy

Journal of Open Humanities Data (2024)

Abstract

Speech or text alone leaves out many critical modalities needed to adequately model the information exchanged in real human-human interactions. The channels that contribute to the “making of sense” in such interactions include, but are not limited to, gesture, speech, user-interaction modeling, gaze, joint attention, and involvement/engagement; all of these must be modeled to automatically extract correct and meaningful information. In this paper, we present a multimodal dataset of a novel situated and shared collaborative task, with the above channels annotated to encode these different aspects of the participants' situated and embodied involvement in the joint activity.

Keywords

multimodal interaction, collaboration, problem solving, situated tasks
