Leveraging Frequency-Dependent Kernel and DIP-Based Clustering for Robust Speech Activity Detection in Naturalistic Audio Streams.
IEEE/ACM Transactions on Audio, Speech, and Language Processing (2018)
Abstract
Speech activity detection (SAD) is a front-end component of most speech systems, e.g., speaker verification and speech recognition. Supervised SAD typically leverages machine learning models trained on annotated data. For applications such as zero-resource speech processing and the NIST-OpenSAT-2017 public safety communications task, it might not be feasible to collect SAD annotations. SAD is challenging for natura...
Keywords
Training, Robustness, Speech processing, Rats, Signal to noise ratio, Kernel, Gaussian mixture model