Multi-Microphone Speaker Separation Based on Deep DOA Estimation

2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO)(2019)

Cited 23 | Views 21
Abstract
In this paper, we present a multi-microphone speech separation algorithm based on masking inferred from the speakers' direction of arrival (DOA). According to the W-disjoint orthogonality property of speech signals, each time-frequency (TF) bin is dominated by a single speaker. Each TF bin can therefore be associated with a single DOA. In our procedure, we apply a deep neural network (DNN) with a U-net architecture to infer the DOA of each TF bin from a concatenated set of the spectra of the microphone signals. Separation is obtained by multiplying the reference microphone signal by the masks associated with the different DOAs. Our proposed deep direction estimation for speech separation (DDESS) method is inspired by recent advances in deep clustering methods. Unlike established methods that apply clustering in a latent embedded space, in our approach the embedding is closely associated with the spatial information, as manifested by the different speakers' directions of arrival.
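The masking step described in the abstract can be illustrated with a minimal sketch. This is not the paper's code: it assumes a per-bin DOA map has already been produced (in the paper, by the U-net) and a known DOA per speaker, and the helper name, tolerance parameter, and toy inputs are hypothetical. Under W-disjoint orthogonality, each TF bin is assigned to the speaker whose DOA is closest, and the resulting binary mask multiplies the reference-microphone STFT.

```python
import numpy as np

def separate_by_doa(ref_stft, doa_map, speaker_doas, tol_deg=10.0):
    """Mask-based separation sketch (hypothetical helper).

    ref_stft     : complex STFT of the reference microphone, shape (F, T)
    doa_map      : estimated DOA in degrees for each TF bin, shape (F, T)
    speaker_doas : list of per-speaker DOAs in degrees
    tol_deg      : assumed angular tolerance for assigning a bin to a speaker
    """
    outputs = []
    for doa in speaker_doas:
        # Binary mask: 1 where the bin's estimated DOA matches this speaker
        mask = (np.abs(doa_map - doa) <= tol_deg).astype(float)
        # Separation = masked reference-microphone spectrum
        outputs.append(mask * ref_stft)
    return outputs

# Toy example: two speakers at 30 and 90 degrees on a 4x4 TF grid
ref = np.ones((4, 4), dtype=complex)
doa_map = np.array([[30.0, 30.0, 90.0, 90.0]] * 4)
s1, s2 = separate_by_doa(ref, doa_map, [30.0, 90.0])
```

Because the masks partition the TF plane under the disjointness assumption, the two masked outputs here sum back to the reference spectrum; an inverse STFT of each output would then yield the separated waveforms.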
Keywords
speakers direction,W-disjoint orthogonality property,speech signals,time-frequency bin,single speaker,TF bin,single DOA,deep neural network,U-net architecture,microphone signals,reference microphone,masks,deep direction estimation,deep clustering methods,multimicrophone speaker separation,deep DOA,multimicrophone speech separation algorithm,latent embedded space