Convolutional Neural Networks with 3-D Kernels for Voice Activity Detection in a Multiroom Environment

Smart Innovation, Systems and Technologies (2018)

Abstract
This paper focuses on employing Convolutional Neural Networks (CNNs) with 3-D kernels for Voice Activity Detection in multi-room domestic scenarios (mVAD). The approach is compared with the Multi-Layer Perceptron (MLP), and improvements are observed with respect to the authors' previous work. To approximate real-life scenarios, the DIRHA dataset is used; it was recorded in a home environment with several microphones arranged in various rooms. The study consists of a multi-stage analysis covering the selection of the network size and of the input microphones, in relation to their number and position. Results are evaluated in terms of the Speech Activity Detection (SAD) error rate. The CNN-mVAD outperforms the MLP and shows markedly more robust performance statistics, achieving a SAD of 7.0% in the best overall case.
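To make the idea of a 3-D kernel concrete, the following is a minimal sketch of a single valid-mode 3-D convolution over features stacked from several microphones. The abstract does not state the feature type, the kernel sizes, or the network layout, so the input layout (microphone, time frame, frequency band), the dimensions, the averaging kernel, and the function name `conv3d_valid` are all assumptions made for illustration and should not be read as the authors' implementation.

```python
from itertools import product


def conv3d_valid(volume, kernel):
    """Valid-mode 3-D cross-correlation (the 'convolution' used in CNNs).

    volume: nested list indexed [mic][frame][band]  (assumed layout)
    kernel: nested list indexed [dm][dt][df]
    """
    M, T, F = len(volume), len(volume[0]), len(volume[0][0])
    km, kt, kf = len(kernel), len(kernel[0]), len(kernel[0][0])
    out_shape = (M - km + 1, T - kt + 1, F - kf + 1)
    out = [[[0.0] * out_shape[2] for _ in range(out_shape[1])]
           for _ in range(out_shape[0])]
    # Slide the kernel jointly along the microphone, time, and frequency axes.
    for m, t, f in product(*(range(n) for n in out_shape)):
        out[m][t][f] = sum(
            volume[m + dm][t + dt][f + df] * kernel[dm][dt][df]
            for dm, dt, df in product(range(km), range(kt), range(kf))
        )
    return out


# Hypothetical input: 3 microphones, 4 time frames, 5 frequency bands.
volume = [[[1.0] * 5 for _ in range(4)] for _ in range(3)]
# Hypothetical 2x2x2 averaging kernel spanning all three axes.
kernel = [[[0.125] * 2 for _ in range(2)] for _ in range(2)]

features = conv3d_valid(volume, kernel)
shape = (len(features), len(features[0]), len(features[0][0]))
```

Unlike a 2-D kernel applied to each microphone separately, the kernel here also extends along the microphone axis, so each output value combines information from neighbouring channels as well as neighbouring time frames and frequency bands. In a trained network the kernel weights would be learned rather than fixed.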