Deep recurrent neural networks based binaural speech segregation for the selection of closest target of interest

Multimedia Tools Appl. (2017)

Abstract
An auditory attention model comprising binaural source segregation and full localization of a target speech signal in a multi-talker environment is presented. Joint acoustic features, namely monaural, binaural, and direct-to-reverberant ratio (DRR) features, are incorporated into a joint discriminative model based on a deep recurrent neural network (DRNN) for speech source segregation. The monaural and binaural features are extracted from binaural mixtures of two speakers using mean Hilbert envelope coefficients (MHEC) and interaural time and level differences (ITD and ILD), respectively. The performance of DRNN-based speech segregation is evaluated in terms of signal-to-interference ratio (SIR), signal-to-distortion ratio (SDR), and signal-to-artifacts ratio (SAR), and compared with existing architectures, including a deep neural network (DNN). The proposed system is found to be more suitable than monaural speech segregation, especially when the desired target and interfering sources are located at different positions. The study also proposes full localization of the segregated speech source, which makes it possible to select the desired speaker of interest from an input acoustic speech mixture in a reverberant environment. The developed system can handle the binaural segregation problem under multi-source and reverberant conditions. The auditory attention model provides accurate information about the speech sources even when the desired targets are located 2 m or more away and the reverberation time is long.
Keywords
Deep recurrent neural network, binaural speech segregation, distance and position information, Computational Auditory Scene Analysis, direct-to-reverberant ratio (DRR)
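The abstract names interaural time and level differences as the binaural features. As a minimal sketch of what those quantities measure (not the authors' extraction code), the Python/NumPy function below estimates the ITD of one frame from the peak of the cross-correlation between the two ears, limited to roughly ±1 ms, and the ILD as the left/right energy ratio in dB. The function name `itd_ild` and the `max_lag_s` parameter are hypothetical; a binaural segregation front end would typically compute these per frequency band and time frame, whereas this sketch uses a single broadband frame for brevity.

```python
import numpy as np


def itd_ild(left, right, fs, max_lag_s=1e-3):
    """Estimate ITD (seconds) and ILD (dB) for one binaural frame.

    Hypothetical helper. ITD is the lag of the peak of the normalized
    cross-correlation, searched within +/- max_lag_s. A positive ITD means
    the left channel lags the right one. ILD is 10*log10(E_left / E_right).
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    max_lag = int(round(max_lag_s * fs))
    lags = np.arange(-max_lag, max_lag + 1)

    # np.correlate 'full' gives r[k] = sum_n left[n + k] * right[n];
    # index len(right) - 1 of the output corresponds to lag k = 0.
    full = np.correlate(left, right, mode="full")
    zero = len(right) - 1
    r = full[zero - max_lag: zero + max_lag + 1]
    r = r / (np.sqrt(np.sum(left ** 2) * np.sum(right ** 2)) + 1e-12)

    itd = lags[np.argmax(r)] / fs
    eps = 1e-12
    ild = 10.0 * np.log10((np.sum(left ** 2) + eps) / (np.sum(right ** 2) + eps))
    return itd, ild
```

For example, delaying a noise signal by 5 samples and halving its amplitude in the left channel should give an ITD of 5/fs and an ILD close to -6 dB.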
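The monaural features in the abstract are mean Hilbert envelope coefficients (MHEC). The general idea is: split the signal into frequency bands, take the Hilbert envelope of each band, average it over short frames, log-compress, and decorrelate across bands with a DCT. The NumPy sketch below follows that outline under simplifying assumptions: it substitutes FFT brick-wall bands for the gammatone filterbank that MHEC is normally built on, omits any separate envelope low-pass smoothing, and the names `analytic_envelope` and `mhec_like` and all default parameter values are hypothetical.

```python
import numpy as np


def analytic_envelope(x):
    """Magnitude of the analytic signal, built with the FFT."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # zero negative frequencies.
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(X * h))


def mhec_like(x, fs, n_bands=8, fmin=100.0, fmax=4000.0,
              frame=400, hop=160, n_ceps=6):
    """Simplified MHEC-style features, shape (n_frames, n_ceps)."""
    x = np.asarray(x, dtype=float)
    edges = np.geomspace(fmin, fmax, n_bands + 1)  # log-spaced band edges
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    X = np.fft.rfft(x)
    n_frames = 1 + (len(x) - frame) // hop

    E = np.empty((n_frames, n_bands))
    for b in range(n_bands):
        in_band = (freqs >= edges[b]) & (freqs < edges[b + 1])
        band = np.fft.irfft(X * in_band, n=len(x))  # brick-wall band signal
        env = analytic_envelope(band)
        for t in range(n_frames):
            E[t, b] = env[t * hop: t * hop + frame].mean()  # mean envelope per frame

    log_e = np.log(E + 1e-10)
    # Unnormalized DCT-II across the band axis.
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n_bands)[None, :]
    dct = np.cos(np.pi * k * (2 * m + 1) / (2 * n_bands))
    return log_e @ dct.T
```

With one second of audio at 16 kHz and the defaults above, the output has 98 frames of 6 coefficients each.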
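The direct-to-reverberant ratio (DRR) is listed as a feature that carries distance information: the farther the source, the smaller the direct-path energy relative to the reverberant tail. The sketch below only illustrates the definition on a known room impulse response; in the described system DRR serves as an input feature, and how it is estimated from the binaural mixture is not stated in the abstract. The ±2.5 ms direct-path window is a common convention assumed here, and `drr_db` is a hypothetical name.

```python
import numpy as np


def drr_db(rir, fs, direct_window_ms=2.5):
    """Direct-to-reverberant ratio (dB) of a room impulse response.

    The direct path is the energy within +/- direct_window_ms of the
    strongest peak; everything after that window counts as reverberant.
    Samples before the window (pre-arrival) are ignored.
    """
    rir = np.asarray(rir, dtype=float)
    peak = int(np.argmax(np.abs(rir)))
    half = int(round(direct_window_ms * 1e-3 * fs))
    start = max(peak - half, 0)
    stop = peak + half + 1
    direct = np.sum(rir[start:stop] ** 2)
    reverb = np.sum(rir[stop:] ** 2)
    return 10.0 * np.log10(direct / (reverb + 1e-12))
```

A unit direct impulse followed by ten late reflections of amplitude 0.1 has a reverberant energy of 0.1, so its DRR is about 10 dB.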
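The abstract does not specify the DRNN architecture or its "joint discriminative" objective. The NumPy sketch below assumes the formulation common in DRNN source-separation work: a recurrent hidden layer, two output heads (one per speaker), a soft time-frequency masking layer that forces the two magnitude estimates to sum to the mixture, and a discriminative loss that subtracts a γ-weighted error against the competing source. It is a forward pass with untrained random weights, not the authors' model; `init_params`, `drnn_separate`, `discriminative_loss`, and the value of `gamma` are hypothetical.

```python
import numpy as np


def init_params(n_in, n_hidden, n_freq, seed=0):
    """Hypothetical small random initialization (untrained weights)."""
    rng = np.random.default_rng(seed)
    s = 0.1
    return {
        "W": rng.normal(0.0, s, (n_hidden, n_in)),      # input -> hidden
        "U": rng.normal(0.0, s, (n_hidden, n_hidden)),  # hidden -> hidden (recurrence)
        "b": np.zeros(n_hidden),
        "V1": rng.normal(0.0, s, (n_freq, n_hidden)),   # hidden -> speaker 1 head
        "V2": rng.normal(0.0, s, (n_freq, n_hidden)),   # hidden -> speaker 2 head
    }


def drnn_separate(params, feats, mix_mag, eps=1e-8):
    """One recurrent ReLU layer, two output heads, soft-mask layer.

    feats:   (T, n_in) per-frame features (e.g. monaural + binaural + DRR)
    mix_mag: (T, n_freq) mixture magnitude spectrogram
    Returns (s1_hat, s2_hat); their sum reproduces mix_mag.
    """
    T, n_freq = feats.shape[0], mix_mag.shape[1]
    h = np.zeros_like(params["b"])
    y1 = np.zeros((T, n_freq))
    y2 = np.zeros((T, n_freq))
    for t in range(T):
        h = np.maximum(0.0, params["W"] @ feats[t] + params["U"] @ h + params["b"])
        y1[t] = np.abs(params["V1"] @ h)
        y2[t] = np.abs(params["V2"] @ h)
    mask = y1 / (y1 + y2 + eps)  # soft mask in [0, 1)
    return mask * mix_mag, (1.0 - mask) * mix_mag


def discriminative_loss(s1_hat, s2_hat, s1, s2, gamma=0.05):
    """Error to the correct targets minus gamma * error to the other source."""
    fit = np.sum((s1_hat - s1) ** 2) + np.sum((s2_hat - s2) ** 2)
    cross = np.sum((s1_hat - s2) ** 2) + np.sum((s2_hat - s1) ** 2)
    return fit - gamma * cross


# Usage on random data: 10 frames, 20 input features, 8 frequency bins.
rng = np.random.default_rng(1)
feats = rng.normal(size=(10, 20))
mix = np.abs(rng.normal(size=(10, 8)))
s1_hat, s2_hat = drnn_separate(init_params(20, 16, 8), feats, mix)
```

The masking layer is the design choice worth noting: because the two masks sum to one, the network only has to decide how to split each time-frequency bin, never how much energy to create.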
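The evaluation metrics SIR, SDR, and SAR come from the BSS Eval decomposition of an estimate into a target part, an interference part, and an artifacts part. The standard toolbox allows the target to be distorted by a time-invariant FIR filter (512 taps by default); the sketch below is a simplified variant that allows only a scalar gain, so its numbers will not match the toolbox exactly. `bss_eval_gain_only` is a hypothetical name.

```python
import numpy as np


def bss_eval_gain_only(estimate, references, j):
    """SDR, SIR, SAR (dB) for an estimate of source j.

    references: (n_src, n_samples) array of true sources.
    Decomposition: estimate = s_target + e_interf + e_artif, where
    s_target is the projection onto source j and s_target + e_interf is
    the projection onto the span of all sources.
    """
    est = np.asarray(estimate, dtype=float)
    S = np.asarray(references, dtype=float)
    sj = S[j]
    s_target = (est @ sj) / (sj @ sj) * sj
    coeffs, *_ = np.linalg.lstsq(S.T, est, rcond=None)
    p_s = S.T @ coeffs  # projection onto span of all sources
    e_interf = p_s - s_target
    e_artif = est - p_s

    def db(num, den):
        return 10.0 * np.log10(np.sum(num ** 2) / (np.sum(den ** 2) + 1e-12))

    sdr = db(s_target, e_interf + e_artif)
    sir = db(s_target, e_interf)
    sar = db(s_target + e_interf, e_artif)
    return sdr, sir, sar
```

For an estimate equal to speaker 1 plus 0.1 times speaker 2 plus 0.01 times independent noise (all unit-variance white signals), SIR comes out near 20 dB and SAR near 40 dB.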