Isolated Word Recognition with Audio Derivation and CNN

2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI), 2017

Cited by 9 | Views 26
Abstract
We present a speaker-independent isolated word recognition approach based on audio derivation and a convolutional neural network (CNN). Instead of the sophisticated phonetic features traditionally extracted from audio, we use the spectrogram of the audio as training data for the CNN, turning isolated word recognition into an image recognition problem. Deep learning requires large amounts of training data, but collecting such corpora reduces the efficiency of the system. We therefore present an audio-level data derivation approach that achieves a high recognition rate from only a small set of collected seed recordings: new samples are derived by formant perturbation, pitch shifting, time stretching and volume perturbation while the semantic content is preserved. The approach reduces the amount of seed data that deep learning needs for isolated word recognition. Results show that the accuracy improvement from derived data is significant, and only 7.57%-15.14% of the seed data is needed to achieve the same level of accuracy.
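A minimal sketch of the audio-level derivation idea described in the abstract, assuming librosa is used for signal processing. The function names (derive_variants, to_spectrogram) and the perturbation ranges are illustrative assumptions, not the paper's actual pipeline, and formant perturbation is omitted for brevity.

```python
import numpy as np
import librosa


def derive_variants(path, sr=16000):
    """Generate perturbed copies of one seed recording (pitch shift,
    time stretch, volume perturbation) while keeping its word label."""
    y, sr = librosa.load(path, sr=sr)
    variants = [y]
    # Pitch shifting by a few semitones up and down.
    for n_steps in (-2, -1, 1, 2):
        variants.append(librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps))
    # Time stretching (speed up / slow down) without changing pitch.
    for rate in (0.9, 1.1):
        variants.append(librosa.effects.time_stretch(y, rate=rate))
    # Volume perturbation via simple gain scaling, clipped to valid range.
    for gain in (0.5, 1.5):
        variants.append(np.clip(y * gain, -1.0, 1.0))
    return variants


def to_spectrogram(y):
    """Convert a waveform to a log-magnitude spectrogram 'image' for the CNN."""
    S = np.abs(librosa.stft(y, n_fft=512, hop_length=160))
    return librosa.amplitude_to_db(S, ref=np.max)
```

Each derived waveform is then converted to a spectrogram and fed to the CNN as an image, which is how the word recognition task is recast as image recognition.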
Keywords
convolutional neural network, audio derivation, limited training set, spectrogram, isolated word recognition