Bi-directional recurrent end-to-end neural network classifier for spoken Arab digit recognition

Naima Zerari, Samir Abdelhamid, Hassen Bouzgou, Christian Raymond

2018 2nd International Conference on Natural Language and Speech Processing (ICNLSP) (2018)


Abstract

Automatic Speech Recognition can be viewed as the transcription of spoken utterances into text, which can then be used to monitor or command a specific system. In this paper, we propose a general end-to-end approach to sequence learning that uses Long Short-Term Memory (LSTM) to handle the non-uniform length of speech utterances. The neural architecture recognizes isolated spoken Arabic digits using a classification approach, with the aim of enabling natural human-machine interaction. The proposed system first extracts the relevant features from the input speech signal as Mel Frequency Cepstral Coefficients (MFCC); these features are then processed by a deep neural network able to handle sequences of non-uniform length. A recurrent LSTM or GRU architecture encodes each sequence of MFCC features as a fixed-size vector, which feeds a multilayer perceptron network that performs the classification. The whole neural network classifier is trained in an end-to-end manner. The proposed system outperforms previously published results on the same database by a large margin.
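
The pipeline described in the abstract (variable-length MFCC sequence, bi-directional recurrent encoder, fixed-size vector, multilayer perceptron) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the GRU variant is chosen only for brevity, the weights are random and untrained, the inputs are random arrays standing in for real MFCC frames, and the sizes (13 coefficients, 16 hidden units per direction, 32 MLP units) as well as all function names are assumptions made for illustration. The only point it shows is that utterances of different lengths map to a vector of the same size before classification into 10 digit classes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes, not taken from the paper.
N_MFCC, HIDDEN, MLP_HIDDEN, N_CLASSES = 13, 16, 32, 10


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_gru(input_dim, hidden_dim):
    """Random (untrained) GRU weights; index 0 = update, 1 = reset, 2 = candidate."""
    return {
        "W": rng.normal(0.0, 0.1, (3, input_dim, hidden_dim)),
        "U": rng.normal(0.0, 0.1, (3, hidden_dim, hidden_dim)),
        "b": np.zeros((3, hidden_dim)),
    }


def gru_encode(frames, p):
    """Run a GRU over a (T, input_dim) sequence and return the final hidden state."""
    h = np.zeros(p["b"].shape[1])
    for x in frames:
        z = sigmoid(x @ p["W"][0] + h @ p["U"][0] + p["b"][0])  # update gate
        r = sigmoid(x @ p["W"][1] + h @ p["U"][1] + p["b"][1])  # reset gate
        c = np.tanh(x @ p["W"][2] + (r * h) @ p["U"][2] + p["b"][2])  # candidate state
        h = (1.0 - z) * h + z * c  # one common GRU update convention
    return h


def bidirectional_encode(frames, fwd, bwd):
    """Concatenate the final states of a forward pass and a time-reversed pass."""
    return np.concatenate([gru_encode(frames, fwd), gru_encode(frames[::-1], bwd)])


def mlp_classify(v, W1, b1, W2, b2):
    """One ReLU hidden layer followed by a softmax over the digit classes."""
    hidden = np.maximum(0.0, v @ W1 + b1)
    logits = hidden @ W2 + b2
    e = np.exp(logits - logits.max())
    return e / e.sum()


fwd, bwd = init_gru(N_MFCC, HIDDEN), init_gru(N_MFCC, HIDDEN)
W1 = rng.normal(0.0, 0.1, (2 * HIDDEN, MLP_HIDDEN))
b1 = np.zeros(MLP_HIDDEN)
W2 = rng.normal(0.0, 0.1, (MLP_HIDDEN, N_CLASSES))
b2 = np.zeros(N_CLASSES)

# Random stand-ins for MFCC frames of two utterances with different durations.
short_utt = rng.normal(size=(25, N_MFCC))
long_utt = rng.normal(size=(60, N_MFCC))

enc_short = bidirectional_encode(short_utt, fwd, bwd)
enc_long = bidirectional_encode(long_utt, fwd, bwd)
probs_short = mlp_classify(enc_short, W1, b1, W2, b2)
probs_long = mlp_classify(enc_long, W1, b1, W2, b2)

print(enc_short.shape, enc_long.shape)      # both encodings have the same size
print(probs_short.shape, probs_long.shape)  # one probability per digit class
```

In an end-to-end setup, the encoder and the classifier weights would be optimized jointly from the classification loss rather than fixed at random values as they are here.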


Keywords

Arabic digits, Speech recognition, Auto-encoder, Mel Frequency Cepstral Coefficients, Long Short-Term Memory, Multilayer perceptron network
