Processing multi-channel audio waveforms

Tara N. Sainath, Ron J. Weiss, Kevin William Wilson, Andrew W. Senior, Arun Narayanan, Yedid Hoshen, Michiel A.U. Bacchiani

(2017)


Abstract

Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio waveform data corresponding to an utterance; convolving each of multiple filters, in the time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that were learned during a training process that jointly trains the multiple filters and a deep neural network serving as an acoustic model; combining, for each of the multiple filters, the convolution outputs of that filter across the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined based on the output of the deep neural network.
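
The filter-and-combine steps in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the abstract: each filter holds one kernel per channel, "combining" is taken to be an element-wise sum across channels, and the kernels are fixed by hand rather than learned. The names `convolve_valid` and `filter_and_combine` are hypothetical, and the deep neural network that would consume the combined outputs is omitted.

```python
from typing import List

Signal = List[float]


def convolve_valid(signal: Signal, kernel: Signal) -> Signal:
    """Time-domain convolution, keeping only fully overlapping positions."""
    n, k = len(signal), len(kernel)
    if k > n:
        raise ValueError("kernel must not be longer than the signal")
    flipped = kernel[::-1]  # convolution flips the kernel before sliding it
    return [
        sum(signal[t + i] * flipped[i] for i in range(k))
        for t in range(n - k + 1)
    ]


def filter_and_combine(
    channels: List[Signal], filters: List[List[Signal]]
) -> List[Signal]:
    """Return one combined output per filter.

    filters[p][c] is the kernel that filter p applies to channel c
    (an assumed layout). Combination is an element-wise sum across
    channels, which is also an assumption made for illustration.
    """
    combined = []
    for per_channel_kernels in filters:
        if len(per_channel_kernels) != len(channels):
            raise ValueError("each filter needs one kernel per channel")
        outputs = [
            convolve_valid(x, h) for x, h in zip(channels, per_channel_kernels)
        ]
        # Sum the per-channel convolution outputs sample by sample.
        combined.append([sum(samples) for samples in zip(*outputs)])
    return combined


# Two channels of toy waveform data and two hand-picked filters.
channels = [[1, 2, 3, 4], [0, 1, 0, 1]]
filters = [
    [[1, 0], [1, 0]],         # filter 0: same kernel on both channels
    [[1, -1], [0.5, 0.5]],    # filter 1: a different kernel per channel
]
features = filter_and_combine(channels, filters)
```

In the method described above, the kernel values would come from joint training with the acoustic model, and `features` would be the input to the deep neural network.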


Keywords

Audio filter, Audio signal flow, Acoustic model, Convolution, Artificial neural network, Waveform, Time domain, Communication channel, Speech recognition, Computer science
