End to End Speech Recognition System

user-5f165ac04c775ed682f5819f (2017)

Abstract

Speech recognition is the task of transcribing a speech signal into equivalent text. Automatic speech recognition has benefited greatly from the introduction of neural networks. However, in state-of-the-art systems the networks are at present only a single component in a complex pipeline. In existing systems, the first stage of the pipeline is input feature extraction; standard techniques include mel-scale filterbanks. Neural networks are then trained to classify individual frames of acoustic data, and their output distributions are reformulated as emission probabilities for a hidden Markov model (HMM). The objective function used to train the networks is therefore substantially different from the true performance measure, which is sequence-level transcription accuracy. This is precisely the sort of inconsistency that end-to-end learning seeks to avoid. In practice, even a large gain in frame accuracy may translate into only a negligible improvement, or even a deterioration, in transcription accuracy. An additional problem is that the frame-level training targets must be inferred from the alignments determined by the HMM. This leads to an awkward iterative procedure in which network retraining is alternated with HMM realignment to generate more accurate targets. In this report, we describe various models for the sequence labelling task and the problem of labelling unsegmented sequence data. We explain an end-to-end speech recognition system that directly transcribes audio data into text or phonemes. The system aims to replace the conventional speech recognition pipeline with a single recurrent neural network (RNN) architecture. We use spectrograms as a minimal preprocessing scheme. The system is based on the combination of a deep bidirectional LSTM recurrent neural network architecture and the Connectionist Temporal Classification (CTC) objective function.
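To make the CTC objective more concrete, here is a minimal illustrative sketch in plain Python. It is not the system described in this report. It assumes that the network (for example, the deep bidirectional LSTM) has already produced per-frame log-softmax outputs over the label set plus a blank symbol. The choice of blank index 0 and the function names `ctc_neg_log_likelihood` and `greedy_decode` are hypothetical. The sketch shows why no frame-level alignment from an HMM is needed: the forward recursion sums the probability of every alignment that collapses to the target label sequence.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def logsumexp(*xs: float) -> float:
    """Numerically stable log(sum(exp(x))) over a few scalars."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_neg_log_likelihood(
    log_probs: Sequence[Sequence[float]], labels: Sequence[int], blank: int = 0
) -> float:
    """Return -log p(labels | input) using the CTC forward (alpha) recursion.

    log_probs[t][k] is the network's log-probability of symbol k at frame t.
    Index `blank` (assumed 0 here) is the extra CTC blank symbol.
    """
    # Extended target: blank, l1, blank, l2, ..., lU, blank
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S, T = len(ext), len(log_probs)

    # alpha[s]: log-probability of all partial paths ending at ext[s] after frame t
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            candidates = [alpha[s]]  # stay on the same extended symbol
            if s >= 1:
                candidates.append(alpha[s - 1])  # advance by one position
            # Skipping a blank is allowed only between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new[s] = logsumexp(*candidates) + log_probs[t][ext[s]]
        alpha = new

    # Valid paths end on the final label or on the trailing blank
    total = logsumexp(alpha[-1], alpha[-2]) if S > 1 else alpha[-1]
    return -total


def greedy_decode(log_probs: Sequence[Sequence[float]], blank: int = 0) -> List[int]:
    """Best-path decoding: take the argmax per frame, merge repeats, drop blanks."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in log_probs]
    out: List[int] = []
    prev = None
    for k in best:
        if k != prev and k != blank:
            out.append(k)
        prev = k
    return out
```

Consider two symbols (blank at index 0, one label `a` at index 1) and a uniform output over two frames. The three valid alignments `aa`, `a-` and `-a` each have probability 0.25, so `ctc_neg_log_likelihood(lp, [1])` returns -log 0.75 ≈ 0.288. Greedy decoding is the simplest way to read a transcription off the network outputs. A beam search, optionally combined with a language model, is a common alternative.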
