Raise to Speak: An Accurate, Low-power Detector for Activating Voice Assistants on Smartwatches

pp. 2736–2744 (2019)


Abstract

The two most common ways to activate intelligent voice assistants (IVAs) are button presses and trigger phrases. This paper describes a new way to invoke IVAs on smartwatches: simply raise your hand and speak naturally. To achieve this experience, we designed an accurate, low-power detector that works in a wide range of environments and a ...

Introduction
  • The two most common ways to invoke IVAs are using physical buttons or issuing specific trigger phrases such as "Hey Siri".
  • In this paper, the authors propose a more natural way to invoke IVAs on smartwatches: raise the device and speak to it.
  • To enable this interaction, the authors designed an accurate, low-power detector that uses only an accelerometer and a microphone.
  • The detector is designed to run mostly on-device to preserve user privacy and provide a low-latency experience for users.
Highlights
  • Intelligent voice assistants (IVAs) have become ubiquitous [15].
  • In this paper, we present an accurate, low-power detector to facilitate interacting with IVAs on smartwatches.
  • A four-component detector is presented, consisting of an on-device gesture detector (GestureCNN), an on-device speech detector (SpeechCNN), a policy model, and an off-device false trigger mitigator (FTM); a minimal composition sketch follows this list.
  • In Section 5.2 we evaluate three other baseline approaches for gesture detection in addition to GestureCNN.
  • Experimentation shows that the closest matches to the GestureCNN in model performance are the gradient boosting tree (GBT) models.
  • Other common approaches are either too computationally expensive for an embedded system (e.g. dynamic time warping (DTW)) or not complex enough to capture the non-linearity in the data.
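To make the four-component design concrete, here is a minimal sketch of how the two on-device detectors, the policy model, and the off-device FTM might be composed. The names, interfaces, and thresholds are placeholders invented for illustration; the paper does not publish code, and its actual models and policy differ.

```python
# Hypothetical composition of the four RTS components described above.
# All names, interfaces, and thresholds are placeholders for illustration.

from dataclasses import dataclass


@dataclass
class OnDeviceScores:
    gesture: float  # score from the on-device gesture detector (GestureCNN)
    speech: float   # score from the on-device speech detector (SpeechCNN)


def policy_model(scores: OnDeviceScores,
                 gesture_threshold: float = 0.8,
                 speech_threshold: float = 0.7) -> bool:
    """Fuse the two on-device scores into a single trigger decision.
    The thresholds are illustrative, not values from the paper."""
    return (scores.gesture >= gesture_threshold
            and scores.speech >= speech_threshold)


def detect(accel_window, audio_window,
           gesture_cnn, speech_cnn, false_trigger_mitigator) -> bool:
    """One detection step: run the cheap on-device models first, and only
    if they fire consult the off-device false trigger mitigator (FTM)."""
    scores = OnDeviceScores(gesture=gesture_cnn(accel_window),
                            speech=speech_cnn(audio_window))
    if not policy_model(scores):
        return False  # no trigger; audio never leaves the device
    return false_trigger_mitigator(audio_window)  # server-side second check
```

The point of the sketch is the ordering: audio is consulted off-device only after the on-device policy has already fired, which is consistent with the privacy and latency goals stated in the Introduction.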
Methods
  • One of the advantages of the RTS detector is that the four loosely coupled components (GestureCNN, SpeechCNN, Policy Model, and FTM) can mostly be trained independently (see the illustrative training sketch after this list).
  • The authors outline the experiments that led to the final architecture of the RTS detector by going through each of the components individually.
  • The data collection process is divided into three stages.
  • The first stage (S1) consists of data collected in the most ideal environments.
  • In this stage, the majority (53.2%) of sessions consist of users who are either sitting or standing.
  • The second stage (S2) consists of users walking while performing RTS.
  • In total, the authors collected 4,228 sessions from 92 different users.
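Because the components are loosely coupled, a gesture detector analogous to GestureCNN can be trained on labeled accelerometer windows alone. The following is a hypothetical PyTorch sketch of such an independent training step; the window length, layer sizes, and optimizer settings are invented and are not the architecture or hyperparameters from the paper.

```python
# Illustrative, independently trained gesture model on accelerometer windows.
# Shapes and hyperparameters are invented; this is not the paper's GestureCNN.

import torch
import torch.nn as nn


class TinyGestureCNN(nn.Module):
    """Small 1D CNN over (batch, 3 axes, window_len) accelerometer windows."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(3, 16, kernel_size=5, padding=2),  # 3 accelerometer axes
            nn.BatchNorm1d(16),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(32, 1)  # logit for "raise gesture present"

    def forward(self, x):
        return self.classifier(self.features(x).squeeze(-1))


def train_step(model, optimizer, accel_batch, labels):
    """One supervised step using only accelerometer data and gesture labels."""
    optimizer.zero_grad()
    logits = model(accel_batch).squeeze(-1)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()


model = TinyGestureCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```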
Results
  • There are four distinct stages related to an RTS gesture: raising, raised, dropping, and dropped (see the state-machine sketch after this list).
  • In each of these stages, the authors identify constraints on the gesture that can help improve the accuracy of the detector while still ensuring that the gesture feels natural to most users.
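As an illustration of how per-stage constraints could be enforced, the sketch below tracks the four stages with a small state machine and rejects raises that are too slow or not held long enough. The specific constraints and thresholds are invented for the example and are not those identified in the paper.

```python
# Illustrative state machine over the four RTS gesture stages. The
# constraints (maximum raise duration, minimum hold time) and thresholds
# are invented to show the idea; they are not the paper's values.

RAISING, RAISED, DROPPING, DROPPED = range(4)


class GestureStageTracker:
    MAX_RAISE_SECONDS = 1.5   # a raise that takes too long is rejected
    MIN_HOLD_SECONDS = 0.3    # the wrist must stay raised at least this long

    def __init__(self):
        self.stage = DROPPED
        self.time_in_stage = 0.0

    def update(self, stage: int, dt: float) -> bool:
        """Advance by one frame; return True once a valid raise completes."""
        if stage == self.stage:
            self.time_in_stage += dt
        else:
            if (stage == RAISED and self.stage == RAISING
                    and self.time_in_stage > self.MAX_RAISE_SECONDS):
                stage = DROPPED  # raising took too long: not a natural raise
            self.stage = stage
            self.time_in_stage = 0.0
        return (self.stage == RAISED
                and self.time_in_stage >= self.MIN_HOLD_SECONDS)
```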
Conclusion
  • In this paper the authors present an accurate and low-power detector to facilitate interacting with IVAs on smartwatches.
  • A four-component detector is presented, consisting of an on-device gesture detector (GestureCNN), an on-device speech detector (SpeechCNN), a policy model, and an off-device false trigger mitigator (FTM).
  • In Section 5.2 the authors evaluate three other baseline approaches for gesture detection in addition to GestureCNN.
  • Experimentation shows that the closest matches to the GestureCNN in model performance are the GBT models (see the illustrative GBT baseline sketch after this list).
  • SpeechCNN outperforms the GBT model by a large margin (see Section 5.3).
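For context on the GBT comparison, a baseline of this kind could be built with XGBoost [3] on hand-crafted temporal features of the accelerometer window (Table 1 lists the 31 gesture temporal features actually used). The feature set and hyperparameters below are illustrative stand-ins, not the paper's configuration.

```python
# Illustrative GBT baseline for gesture detection using XGBoost [3].
# The simple per-axis statistics below stand in for the 31 temporal
# features of Table 1; the real feature set and hyperparameters differ.

import numpy as np
import xgboost as xgb


def temporal_features(accel_window: np.ndarray) -> np.ndarray:
    """Summarize a (window_len, 3) accelerometer window into a flat vector
    of per-axis mean, standard deviation, min, max, and range."""
    return np.concatenate([
        accel_window.mean(axis=0),
        accel_window.std(axis=0),
        accel_window.min(axis=0),
        accel_window.max(axis=0),
        accel_window.max(axis=0) - accel_window.min(axis=0),
    ])


def fit_gbt_baseline(windows, labels):
    """Fit a binary GBT classifier on featurized accelerometer windows."""
    X = np.stack([temporal_features(w) for w in windows])
    model = xgb.XGBClassifier(n_estimators=200, max_depth=4,
                              learning_rate=0.1, objective="binary:logistic")
    model.fit(X, np.asarray(labels))
    return model
```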
Tables
  • Table 1: Details of 31 gesture temporal features
  • Table 2: Distributions of scenarios in positive sessions
  • Table 3: Distributions of acoustic environment
  • Table 4: Model performance
  • Table 5: FRR measured under different scenarios in the validation data set
Related work
  • Combining gesture and speech signals to facilitate human-computer interaction is an area of active research [2, 16, 20]; [20] provides a more detailed review of this topic. The two most relevant systems are from [16] and [2]. [16] describes a system that combines gesture, as captured by a camera, with speech, as captured by a head-mounted microphone, to estimate cues in conversational interactions. Shake2Talk is another experience in which users can send audio messages via simple gestures [2]. These gestures were captured using a SHAKE device with multiple motion-capturing sensors such as accelerometers, gyroscopes, and capacitive sensors. One of the key differences between these systems and RTS is that RTS is deployed on a resource-constrained embedded device using only sensors (an accelerometer and a microphone) that are available today on almost all smartwatches.

    The goal of the RTS system is to enable a gestural activation method whereby the IVA on a smartwatch can be activated without a trigger phrase or button press. The system is designed to trigger on a gesture in which the user raises their smartwatch to their mouth and speaks into the watch to converse with the IVA. A conversation can include issuing commands (e.g. "Set a timer for 2 minutes"), asking questions (e.g. "What is the weather today?"), replying to messages (e.g. "Reply: I am running late"), or performing any action that could otherwise be done without needing to press buttons or say trigger phrases.
References
  • Ahmad Akl and Shahrokh Valaee. 2010. Accelerometer-based gesture recognition via dynamic-time warping, affinity propagation, & compressive sensing. In Acoustics, Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on. IEEE, 2270–2273.
  • Lorna M. Brown and John Williamson. 2007. Shake2Talk: multimodal messaging for interpersonal communication. In International Workshop on Haptic and Audio Interaction Design. Springer, 44–55.
  • Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). ACM, New York, NY, USA, 785–794. https://doi.org/10.1145/2939672.2939785
  • Florian Eyben, Felix Weninger, Stefano Squartini, and Björn Schuller. 2013. Real-life voice activity detection with LSTM recurrent neural networks and an application to Hollywood movies. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 483–487.
  • Brendan J. Frey and Delbert Dueck. 2007. Clustering by passing messages between data points. Science 315, 5814 (2007), 972–976.
  • Eleftheria Georganti, Tobias May, Steven van de Par, Aki Harma, and John Mourjopoulos. 2011. Speaker distance detection using a single microphone. IEEE Transactions on Audio, Speech, and Language Processing 19, 7 (2011), 1949–1961.
  • Amir Gholami, Kiseok Kwon, Bichen Wu, Zizheng Tai, Xiangyu Yue, Peter Jin, Sicheng Zhao, and Kurt Keutzer. 2018. SqueezeNext: Hardware-Aware Neural Network Design. arXiv preprint arXiv:1803.10615 (2018).
  • Thad Hughes and Keir Mierle. 2013. Recurrent neural networks for voice activity detection. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 7378–7382.
  • Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, and Kurt Keutzer. 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360 (2016).
  • Sergey Ioffe and Christian Szegedy. 2015. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In International Conference on Machine Learning. 448–456. http://proceedings.mlr.press/v37/ioffe15.html
  • Eamonn Keogh and Chotirat Ann Ratanamahatana. 2005. Exact indexing of dynamic time warping. Knowledge and Information Systems 7, 3 (2005), 358–386.
  • Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. (Dec. 2014). https://arxiv.org/abs/1412.6980
  • Jiayang Liu, Lin Zhong, Jehan Wickramasuriya, and Venu Vasudevan. 2009. uWave: Accelerometer-based personalized gesture recognition and its applications. Pervasive and Mobile Computing 5, 6 (2009), 657–675.
  • Ewa Luger and Abigail Sellen. 2016. Like having a really bad PA: the gulf between user expectation and experience of conversational agents. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, 5286–5297.
  • Michael McTear, Zoraida Callejas, and David Griol. 2016. The Conversational Interface: Talking to Smart Devices. Springer.
  • Francis Quek, David McNeill, Robert Bryll, Susan Duncan, Xin-Feng Ma, Cemil Kirbas, Karl E. McCullough, and Rashid Ansari. 2002. Multimodal human discourse: gesture and speech. ACM Transactions on Computer-Human Interaction (TOCHI) 9, 3 (2002), 171–193.
  • Javier Ramirez, Juan Manuel Górriz, and José Carlos Segura. 2007. Voice activity detection: fundamentals and speech recognition system robustness. In Robust Speech Recognition and Understanding. InTech.
  • Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15, 1 (2014), 1929–1958.
  • Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1–9.
  • Matthew Turk. 2014. Multimodal interaction: A review. Pattern Recognition Letters 36 (2014), 189–195.
  • Xiao-Lei Zhang and Ji Wu. 2013. Deep belief networks based voice activity detection. IEEE Transactions on Audio, Speech, and Language Processing 21, 4 (2013), 697–710.