OmniActions: Predicting Digital Actions in Response to Real-World Multimodal Sensory Inputs with LLMs
Proceedings of the CHI Conference on Human Factors in Computing Systems (2024)
Abstract
The progression to "Pervasive Augmented Reality" envisions easy access to
multimodal information continuously. However, in many everyday scenarios, users
are occupied physically, cognitively or socially. This may increase the
friction to act upon the multimodal information that users encounter in the
world. To reduce such friction, future interactive interfaces should
intelligently provide quick access to digital actions based on users' context.
To explore the range of possible digital actions, we conducted a diary study
that required participants to capture and share the media that they intended to
perform actions on (e.g., images or audio), along with their desired actions
and other contextual information. Using this data, we generated a holistic
design space of digital follow-up actions that could be performed in response
to different types of multimodal sensory inputs. We then designed OmniActions,
a pipeline powered by large language models (LLMs) that processes multimodal
sensory inputs and predicts follow-up actions on the target information
grounded in the derived design space. Using the empirical data collected in the
diary study, we performed quantitative evaluations on three variations of LLM
techniques (intent classification, in-context learning and finetuning) and
identified the most effective technique for our task. Additionally, as an
instantiation of the pipeline, we developed an interactive prototype and
reported preliminary user feedback about how people perceive and react to the
action predictions and their errors.
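The abstract does not include implementation details. As a rough illustration of the in-context learning variant mentioned above, the sketch below shows how few-shot examples of (sensory input description, follow-up action) pairs could be assembled into a prompt for an LLM. The action vocabulary, example data, and the call_llm placeholder are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch of an in-context learning setup: few-shot examples of
# (multimodal input description, follow-up action) pairs are packed into a
# prompt, and an LLM is asked to predict the action for a new input.
# All labels, examples, and the call_llm placeholder are illustrative only.

from dataclasses import dataclass

# A small, made-up action vocabulary; the paper derives its own design space
# of follow-up actions from the diary-study data.
ACTIONS = ["save for later", "share with a contact", "set a reminder",
           "search for more information", "translate text"]

@dataclass
class Example:
    context: str   # text description of the captured image/audio and situation
    action: str    # the follow-up action the user wanted to perform

FEW_SHOT_EXAMPLES = [
    Example("Photo of a concert poster seen while walking to work", "set a reminder"),
    Example("Audio snippet of a song playing in a cafe", "search for more information"),
    Example("Photo of a restaurant menu in a foreign language", "translate text"),
]

def build_prompt(new_context: str) -> str:
    """Assemble a few-shot prompt for predicting a follow-up digital action."""
    lines = ["Given a description of what the user just captured, predict the "
             "most likely follow-up digital action from this list: "
             + ", ".join(ACTIONS) + ".", ""]
    for ex in FEW_SHOT_EXAMPLES:
        lines.append(f"Input: {ex.context}\nAction: {ex.action}\n")
    lines.append(f"Input: {new_context}\nAction:")
    return "\n".join(lines)

def call_llm(prompt: str) -> str:
    """Placeholder for a call to whichever LLM backend is used."""
    raise NotImplementedError("plug in an LLM client here")

if __name__ == "__main__":
    # Demonstrates prompt construction only; no model is called.
    print(build_prompt("Photo of a business card handed over at a meetup"))
```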