AENet: Learning Deep Audio Features for Video Analysis.
IEEE Transactions on Multimedia (2018)
Abstract
We propose a new deep network for audio event recognition, called AENet. In contrast to speech, sounds coming from audio events may be produced by a wide variety of sources. Furthermore, distinguishing them often requires analyzing an extended time period due to the lack of clear subword units that are present in speech. In order to incorporate this long-time frequency structure of audio events, w...
Keywords
Feature extraction, Hidden Markov models, Mel frequency cepstral coefficient, Visualization, Speech, Network architecture