A Comparison of Five Multiple Instance Learning Pooling Functions for Sound Event Detection with Weak Labeling.

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)(2019)

引用 186|浏览139
暂无评分
摘要
Sound event detection (SED) entails two subtasks: recognizing what types of sound events are present in an audio stream (audio tagging), and pinpointing their onset and offset times (localization). In the popular multiple instance learning (MIL) framework for SED with weak labeling, an important component is the pooling function. This paper compares five types of pooling functions both theoretically and experimentally, with special focus on their performance of localization. Although the attention pooling function is currently receiving the most attention, we find the linear softmax pooling function to perform the best among the five. Using this pooling function, we build a neural network called TALNet. It is the first system to reach state-of-the-art audio tagging performance on Audio Set, while exhibiting strong localization performance on the DCASE 2017 challenge at the same time.
更多
查看译文
关键词
Training,Labeling,Tagging,Task analysis,Event detection,Neural networks,Google
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要