Intent based Multimodal Speech and Gesture Fusion for Human-Robot Communication in Assembly Situation.

ICMLA (2022)

Abstract
Understanding intent is an essential step toward effective communication, a capability needed in assembly, patrolling, and surveillance tasks. This paper presents a fused, interactive multimodal system for human-robot communication in assembly applications. Communication is inherently multimodal: having multiple modes available, such as gestures, text, symbols, graphics, images, and speech, increases the chance of effective communication. Intent is the main component we aim to model, specifically in human-machine dialogues. To this end, we extract intents from spoken dialogues and fuse each intent with any detected matching gesture used in the interaction with the robot. The main components of the presented system are: (1) a speech recognition system built on Kaldi, (2) a deep-learning-based Dual Intent and Entity Transformer (DIET) classifier for intent and entity extraction, (3) a hand-gesture recognition system, and (4) a dynamic fusion model for speech- and gesture-based communication. These components are evaluated in a contextual assembly situation using a simulated interactive robot.
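The abstract does not spell out how the dynamic fusion step works, but one common pattern it suggests is late fusion: the intent classifier leaves an entity slot open (e.g. "pick that one") and a confident co-occurring gesture resolves the referent. The sketch below illustrates that idea only; all class names, fields, and the `fuse` function are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SpeechIntent:
    """Hypothetical output of an intent/entity classifier (DIET-style)."""
    name: str                     # e.g. "pick_part"
    confidence: float
    entities: dict = field(default_factory=dict)  # e.g. {"part": "gear"}

@dataclass
class Gesture:
    """Hypothetical output of a hand-gesture recognizer."""
    label: str                    # e.g. "pointing"
    confidence: float
    target: Optional[str] = None  # object resolved from the pointing direction

def fuse(intent: SpeechIntent, gesture: Optional[Gesture],
         threshold: float = 0.5) -> dict:
    """Merge a spoken intent with a matching gesture (illustrative only).

    If speech leaves the "part" slot unspecified, a confident pointing
    gesture fills in the referent; spoken entities otherwise take priority.
    """
    command = {"intent": intent.name, **intent.entities}
    if (gesture is not None and gesture.confidence >= threshold
            and gesture.label == "pointing"
            and "part" not in command and gesture.target):
        command["part"] = gesture.target  # deictic resolution via gesture
    return command

# "Pick that one" while pointing at the gear:
print(fuse(SpeechIntent("pick_part", 0.92),
           Gesture("pointing", 0.81, target="gear")))
# {'intent': 'pick_part', 'part': 'gear'}
```

In this toy form, speech carries the action and gesture supplies the missing argument; the paper's actual fusion model is dynamic and presumably handles richer cases.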
Keywords
Intents, Multimodal Fusion, HRC, NLP, NLU, Gestures, Assembly Situation