Voice Activity Detector (VAD) Based on Long-Term Mel Frequency Band Features.

Lecture Notes in Artificial Intelligence (2016)

Abstract
We propose a VAD that uses long-term (200 ms) Mel frequency band statistics, auditory masking, and a pre-trained classifier based on a two-level decision tree ensemble. This combination captures the syllable-level structure of speech and discriminates it from common noises. On the test dataset, the proposed algorithm achieves almost 100% acceptance of clear voice for English, Chinese, Russian, and Polish speech and 100% rejection of stationary noises, independently of loudness. The algorithm is intended for use as a trigger for ASR. It reuses the short-term FFT analysis (STFFT) from the ASR frontend, at the cost of an additional 2 KB of memory and a 15% complexity overhead.
Keywords
Voice Activity Detector, Classification, Decision tree ensemble, Auditory masking
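
The long-term band statistics mentioned in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which statistics are used, so the per-band mean and standard deviation of log energy are assumed here, along with a 10 ms frame hop (so that 200 ms spans 20 frames). The function name `long_term_band_stats` and its parameters are hypothetical, and the Mel band energies are taken as already computed by the ASR frontend.

```python
import math
from collections import deque


def long_term_band_stats(frames, hop_ms=10, window_ms=200):
    """Yield (means, stds) of log band energies over a sliding window.

    frames: iterable of per-frame Mel band energies (equal-length lists).
    hop_ms and the choice of mean/std are assumptions, not from the paper.
    """
    win = max(1, window_ms // hop_ms)  # 200 ms / 10 ms = 20 frames
    history = deque(maxlen=win)        # keeps only the most recent frames
    for energies in frames:
        # Small offset avoids log(0) for silent bands.
        history.append([math.log(e + 1e-10) for e in energies])
        if len(history) < win:
            continue  # not enough context yet
        n_bands = len(energies)
        means = [sum(f[b] for f in history) / win for b in range(n_bands)]
        stds = [
            math.sqrt(sum((f[b] - means[b]) ** 2 for f in history) / win)
            for b in range(n_bands)
        ]
        yield means, stds


# Toy input: band 0 alternates every 50 ms (a crude syllable-like
# modulation), band 1 is stationary.
modulated = [[1.0 if (i // 5) % 2 == 0 else 100.0, 2.0] for i in range(25)]
for means, stds in long_term_band_stats(modulated):
    pass  # stds would feed a downstream classifier
```

In this toy input the standard deviation of the modulated band stays well above that of the stationary band, which is the kind of long-term cue a classifier could use to separate speech from stationary noise regardless of loudness.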