Query-by-example keyword spotting system using multi-head attention and softtriple loss
2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)
Abstract
This paper proposes a neural network architecture for the query-by-example user-defined keyword spotting task. A multi-head attention module is added on top of a multi-layered GRU for effective feature extraction, and a normalized multi-head attention module is proposed for feature aggregation. We also adopt the softtriple loss, a combination of triplet loss and softmax loss, and showcase its effectiveness. We demonstrate the performance of our model on internal datasets in different languages and on the public Hey-Snips dataset. We compare the performance of our model to a baseline system [1] and conduct an ablation study to show the benefit of each component in our architecture. The proposed work shows solid performance while preserving simplicity.
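To make the phrase "a combination of triplet loss and softmax loss" concrete, the following is a minimal pure-Python sketch of the SoftTriple loss as it is generally formulated in the deep metric learning literature: each class holds several centers, the similarity to a class is a softmax-weighted sum over its centers, and a margin is subtracted from the true-class similarity before a softmax cross-entropy. The function names, the toy centers, and the values of `lam`, `gamma`, and `delta` are illustrative assumptions and are not taken from this paper.

```python
import math


def _normalize(v):
    # Scale a vector to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def class_similarity(embedding, centers, gamma=0.1):
    # Relaxed maximum over one class's centers: a softmax-weighted
    # sum of the cosine similarities between the embedding and each center.
    x = _normalize(embedding)
    sims = [_dot(x, _normalize(c)) for c in centers]
    shift = max(s / gamma for s in sims)  # for numerical stability
    weights = [math.exp(s / gamma - shift) for s in sims]
    total = sum(weights)
    return sum(w / total * s for w, s in zip(weights, sims))


def softtriple_loss(embedding, label, class_centers,
                    lam=20.0, gamma=0.1, delta=0.01):
    # class_centers[c] is the list of center vectors for class c.
    # lam, gamma, delta are assumed illustrative values, not the paper's.
    sims = [class_similarity(embedding, centers, gamma)
            for centers in class_centers]
    # Subtract the margin delta from the true class only, then scale by lam.
    logits = [lam * (s - delta) if c == label else lam * s
              for c, s in enumerate(sims)]
    shift = max(logits)
    log_sum = shift + math.log(sum(math.exp(z - shift) for z in logits))
    # Softmax cross-entropy with respect to the true class.
    return log_sum - logits[label]


# Toy example: two classes, two centers each, in a 2-D embedding space.
toy_centers = [
    [[1.0, 0.0], [0.9, 0.1]],  # class 0
    [[0.0, 1.0], [0.1, 0.9]],  # class 1
]
toy_embedding = [1.0, 0.05]
loss_correct = softtriple_loss(toy_embedding, 0, toy_centers)
loss_wrong = softtriple_loss(toy_embedding, 1, toy_centers)
```

In this toy setup the embedding lies close to the class 0 centers, so `loss_correct` comes out smaller than `loss_wrong`. In a trained system the centers would be learnable parameters and the embedding would come from the network's aggregated output.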
Key words
User-defined Keyword Spotting, Query-by-Example, Multi-head Attention, Softtriple, Deep Metric Learning