Multichannel speech separation with recurrent neural networks from high-order ambisonics recordings

2018

Abstract
We present a source separation system for high-order ambisonics (HOA) contents. We derive a multichannel spatial filter from a mask estimated by a long short-term memory (LSTM) recurrent neural network. We combine one channel of the mixture with the outputs of basic HOA beamformers as inputs to the LSTM, assuming that we know the directions of arrival of the directional sources. In our experiments, the speech of interest can be corrupted either by diffuse noise or by an equally loud competing speaker. We show that adding as input the output of the beamformer steered toward the competing speech in addition to that of the beamformer steered toward the target speech brings significant improvements in terms of word error rate.
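The network inputs described above (one mixture channel plus the outputs of beamformers steered toward the target and toward the competing speaker) can be illustrated with a minimal sketch. The abstract does not give the ambisonics order, the normalization, or the beamformer design, so everything here is an assumption made for illustration: first-order ambisonics, ACN channel ordering with SN3D normalization, a simple matched beamformer, and real-valued single-sample frames instead of the short-time Fourier transform coefficients a real system would process. The function names `foa_steering`, `beamform`, and `lstm_input_features` are hypothetical.

```python
import math


def foa_steering(azimuth, elevation):
    """Encoding vector of a plane wave in first-order ambisonics.

    Assumed convention: ACN channel order (W, Y, Z, X), SN3D normalization,
    angles in radians.
    """
    ce = math.cos(elevation)
    return [
        1.0,                      # W: omnidirectional
        math.sin(azimuth) * ce,   # Y: left-right
        math.sin(elevation),      # Z: up-down
        math.cos(azimuth) * ce,   # X: front-back
    ]


def beamform(frame, azimuth, elevation):
    """Basic matched beamformer steered toward a known direction of arrival.

    Under the assumed convention the beam pattern is a cardioid: gain 1 in
    the steering direction, gain 0 in the opposite direction.
    """
    y = foa_steering(azimuth, elevation)
    norm = sum(c * c for c in y)  # equals 2 for first-order SN3D
    return sum(c * x for c, x in zip(y, frame)) / norm


def lstm_input_features(frame, target_doa, interferer_doa):
    """Stack the three signals that would feed the mask-estimating LSTM."""
    return [
        frame[0],                          # one channel of the mixture (W)
        beamform(frame, *target_doa),      # beam toward the target speech
        beamform(frame, *interferer_doa),  # beam toward the competing speech
    ]


# Toy mixture: target in front, competing speaker directly behind.
target_doa = (0.0, 0.0)
interferer_doa = (math.pi, 0.0)
target_sample, interferer_sample = 1.0, 0.5

frame = [
    target_sample * a + interferer_sample * b
    for a, b in zip(foa_steering(*target_doa), foa_steering(*interferer_doa))
]
features = lstm_input_features(frame, target_doa, interferer_doa)
```

In this toy case the beam steered toward the target recovers the target sample and the beam steered toward the competing speaker recovers the competing sample, because the two directions are exactly opposite. For closer directions each beam would contain a blend of both sources, which is the situation in which a learned mask is expected to help.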
Keywords
Recurrent neural network, Word error rate, Source separation, Ambisonics, Communication channel, Spatial filter, Speech recognition, Separation (aeronautics), Computer science, High order