Task-Oriented Multi-Modal Question Answering For Collaborative Applications

2020 IEEE International Conference on Image Processing (ICIP), 2020

Abstract
Cobots that work in human workspaces and adapt to human needs must understand and respond to human inquiries and instructions. In this paper, we propose a new question answering (QA) task and dataset for human-robot collaboration in task-oriented operation, i.e., task-oriented collaborative QA (TC-QA). Unlike conventional video QA, which answers questions about what happened in video clips constrained by scripts and subtitles, TC-QA aims to establish common ground for task-oriented operation through question answering. We propose an open-ended (OE) answer format comprising a text reply, an image with annotations of the related objects, and a video marking the operation duration to guide operation execution. Designed for grounding, the TC-QA dataset comprises query videos and questions that seek acknowledgement, correction, attention to task-related objects, and information on objects or operations. Because real-world tasks are highly variable and training samples are limited, we propose and evaluate a baseline method based on a hybrid approach: deep learning methods for object detection, hand detection, and gesture recognition, combined with symbolic reasoning that grounds questions on these observations to produce the answer. Our experiments show that the hybrid method is effective for the TC-QA task.
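The hybrid pipeline described above, where symbolic reasoning grounds a question on the outputs of learned perception modules, can be illustrated with a minimal sketch. All names here (`Detection`, `ground_question`, the answer dictionary layout) are hypothetical and not taken from the paper; the sketch only shows the general idea of matching question terms against detector output to assemble an open-ended answer.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One perception result, e.g. from an object or hand detector."""
    label: str
    box: tuple  # bounding box (x, y, w, h) in pixels

def ground_question(question: str, detections: list) -> dict:
    """Symbolically ground a question on perception output.

    Matches object labels from the detector against words in the
    question and returns an open-ended answer: a text reply plus the
    bounding boxes of the related objects (which a real system would
    render as an annotated image).
    """
    q = question.lower()
    mentioned = [d for d in detections if d.label in q]
    if mentioned:
        reply = "Yes, I see: " + ", ".join(d.label for d in mentioned)
    else:
        reply = "I cannot find the object you mentioned."
    return {"text": reply, "objects": [d.box for d in mentioned]}
```

A usage example: given detections for a screwdriver and a bolt, the question "Where is the screwdriver?" grounds to the screwdriver's box, while a question about an unseen object yields a corrective text reply with no annotations.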
Keywords
question answering, multi-modal grounding, human-robot collaboration, hybrid system, corpora