Monophone-Based Background Modeling for Two-Stage On-Device Wake Word Detection

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018

Cited by 84 | Viewed 174
Abstract
Accurate on-device wake word detection is crucial for products with far-field voice control such as the Amazon Echo. It is challenging to build a wake word system with both a low False Reject Rate (FRR) and a low False Alarm Rate (FAR) in real scenarios containing various types of background speech, music, or noise, especially when computational resources on the device are limited. In this paper, we introduce a two-stage wake word system based on Deep Neural Network (DNN) acoustic modeling, propose a new way to model non-keyword background events using monophone-based units, and show how richer information can be extracted from those monophone units for the final wake word decision. The new system achieves roughly a 16% relative reduction in FRR at a fixed false alarm level, and roughly a 37% relative reduction in FAR at a fixed miss rate. On its own, the second-stage classifier reduces the false alarm rate by about 67% relative on top of the first-stage hypotheses, using very little additional computation.
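To make the two-stage architecture described above more concrete, the sketch below illustrates the general idea: a first-stage DNN emits per-frame posteriors over keyword states and monophone background units, and a lightweight second-stage classifier re-scores each first-stage hypothesis using features pooled from those posteriors. This is an illustrative sketch, not the paper's implementation; all names (pooled_features, SecondStageClassifier, the state counts, and the pooling/feature choices) are hypothetical stand-ins for the richer monophone-derived features the paper describes.

```python
# Minimal sketch of a two-stage wake word pipeline with monophone background units.
# Assumptions: the first stage already produced a (T, K) matrix of per-frame
# softmax posteriors and a hypothesized keyword segment (start, end) in frames.
import numpy as np

N_KEYWORD_STATES = 4   # assumed number of keyword sub-word states
N_MONOPHONES = 40      # assumed size of the monophone background inventory


def pooled_features(posteriors: np.ndarray, start: int, end: int) -> np.ndarray:
    """Pool frame-level posteriors over a hypothesized wake word segment.

    posteriors: (T, N_KEYWORD_STATES + N_MONOPHONES) first-stage DNN outputs.
    Returns a fixed-size vector (per-unit mean and max over the segment),
    a simple placeholder for richer monophone-derived features.
    """
    seg = posteriors[start:end]
    return np.concatenate([seg.mean(axis=0), seg.max(axis=0)])


class SecondStageClassifier:
    """Tiny logistic-regression re-scorer that accepts or rejects a first-stage hit."""

    def __init__(self, dim: int):
        rng = np.random.default_rng(0)
        self.w = rng.normal(scale=0.01, size=dim)  # would be trained offline in practice
        self.b = 0.0

    def score(self, feats: np.ndarray) -> float:
        return 1.0 / (1.0 + np.exp(-(feats @ self.w + self.b)))


def detect(posteriors: np.ndarray, hyp: tuple, clf: SecondStageClassifier,
           threshold: float = 0.5) -> bool:
    """Second-stage accept/reject decision on a first-stage hypothesis (start, end)."""
    start, end = hyp
    return clf.score(pooled_features(posteriors, start, end)) >= threshold


if __name__ == "__main__":
    # Random stand-in for first-stage posteriors over a 200-frame utterance.
    T = 200
    posteriors = np.random.default_rng(1).dirichlet(
        np.ones(N_KEYWORD_STATES + N_MONOPHONES), size=T)
    clf = SecondStageClassifier(dim=2 * (N_KEYWORD_STATES + N_MONOPHONES))
    print("accept wake word:", detect(posteriors, (50, 120), clf))
```

Because the second stage only runs on segments the first stage already flagged and operates on pooled posterior features rather than raw audio, its computational cost stays small, which is consistent with the paper's claim of large FAR reductions at negligible extra on-device cost.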
Keywords
wake word detection, deep neural network, monophone-based units