
TACos: Learning Temporally Structured Embeddings for Few-Shot Keyword Spotting with Dynamic Time Warping.

IEEE International Conference on Acoustics, Speech, and Signal Processing (2024)

Abstract
To segment a signal into blocks to be analyzed, few-shot keyword spotting (KWS) systems often utilize a sliding window of fixed size. Because the lengths of different keywords and of their spoken instances vary, choosing the right window size is a problem: a window should be long enough to contain all the information needed to recognize a keyword, but a longer window may contain irrelevant information such as multiple words or noise and thus make it difficult to reliably detect the onsets and offsets of keywords. We propose TACos, a novel angular margin loss for deriving two-dimensional embeddings that retain temporal properties of the underlying speech signal. In experiments conducted on KWS-DailyTalk, a few-shot KWS dataset presented in this work, using these embeddings as templates for dynamic time warping is shown to outperform using other representations or a sliding window. Using time-reversed segments of the keywords during training is shown to further improve performance.
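The abstract does not spell out how the templates are matched, so the following is only a generic sketch of dynamic time warping between two sequences of frame-wise embeddings, not the authors' implementation. It assumes cosine distance as the local cost (a plausible choice for embeddings trained with an angular margin loss, but an assumption here), the standard three-step recursion, and normalization by the summed sequence lengths. The function names `cosine_distance_matrix` and `dtw_cost` are hypothetical.

```python
import numpy as np


def cosine_distance_matrix(a, b):
    """Pairwise cosine distance between the rows of a (T_a, D) and b (T_b, D)."""
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a_n @ b_n.T


def dtw_cost(template, query):
    """Length-normalized DTW alignment cost between two embedding sequences.

    Assumption: each input is a 2-D array of shape (frames, dims), i.e. one
    embedding vector per time step. This is plain textbook DTW.
    """
    dist = cosine_distance_matrix(template, query)
    n, m = dist.shape
    # acc[i, j] holds the cheapest cumulative cost of aligning the first
    # i template frames with the first j query frames.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j],      # advance in the template only
                acc[i, j - 1],      # advance in the query only
                acc[i - 1, j - 1],  # advance in both
            )
    return acc[n, m] / (n + m)
```

Under these assumptions, a query that is a time-stretched copy of the template aligns at near-zero cost, while a time-reversed copy does not. This illustrates why embeddings that preserve temporal order can serve as DTW templates without fixing a window length in advance.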
Keywords
keyword spotting, representation learning, angular margin loss, few-shot learning