Semi-sparse Residual Recurrent Neural Network Via Dictionary Representation for Throat Microphone Quality Enhancement

Applied Soft Computing (2022)

Abstract

Throat microphone (TM) speech can be used for communication in noisy environments because it collects signals directly from the skin, but its clarity and intelligibility need improvement due to the severe loss of high-frequency components. Because recovering TM speech directly with neural networks does not achieve satisfactory performance, we propose a dictionary-representation-based neural network to address this issue. Specifically, a magnitude-spectrum dictionary of air-conducted speech is computed via sparse non-negative matrix factorization (SNMF) and is then used to represent the transformed speech in a hidden layer of the network. Meanwhile, a compensating dictionary is adopted to improve the representation accuracy. A memory-efficient Semi-sparse Residual Recurrent Neural Network (SResRNN) with an interactive mechanism and a special ResNet is employed to generate the coefficients on the dictionaries. Lastly, a three-layer neural network using a special initialization scheme is constructed as the recovery model. In the experiments, the model is compared with five other recovery models under several evaluation criteria, and both the objective and subjective results demonstrate the superiority of the proposed model.
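To make the dictionary step more concrete, here is a minimal sketch of learning a magnitude-spectrum dictionary with sparse NMF and then representing a new spectrum as non-negative coefficients on that fixed dictionary. It is not the authors' method: it assumes a generic SNMF objective, ||V - WH||_F^2 + sparsity * sum(H) with W, H >= 0, solved by multiplicative updates with unit-norm dictionary atoms. The function names `sparse_nmf` and `encode`, the `sparsity` weight, the iteration counts, and the synthetic data are all hypothetical. In the paper the coefficients are produced by the SResRNN rather than by an iterative solver, and the compensating dictionary is not modelled here.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero in the multiplicative updates


def sparse_nmf(V, n_atoms, sparsity=0.1, n_iter=200, seed=0):
    """Hypothetical SNMF: factor a non-negative spectrogram V (freq x frames) as V ~= W @ H.

    Minimises ||V - W H||_F^2 + sparsity * sum(H) with W, H >= 0. Columns of W
    are kept at unit L2 norm so the L1 penalty on H cannot be dodged by scaling W.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_atoms)) + EPS
    H = rng.random((n_atoms, n_frames)) + EPS
    W /= np.linalg.norm(W, axis=0, keepdims=True)
    for _ in range(n_iter):
        # Coefficient update: the sparsity weight enters the denominator.
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + EPS)
        # Dictionary update, then renormalise atoms and rescale H so W @ H is unchanged.
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
        norms = np.linalg.norm(W, axis=0, keepdims=True) + EPS
        W /= norms
        H *= norms.T
    return W, H


def encode(W, v, sparsity=0.1, n_iter=200):
    """Hypothetical encoder: sparse non-negative coefficients h for spectrum v, with W held fixed."""
    h = np.full((W.shape[1], 1), 1.0 / W.shape[1])
    v = v.reshape(-1, 1)
    for _ in range(n_iter):
        h *= (W.T @ v) / (W.T @ W @ h + sparsity + EPS)
    return h.ravel()


# Synthetic stand-in for an air-conducted magnitude spectrogram (64 bins x 100 frames).
rng = np.random.default_rng(1)
V = rng.random((64, 8)) @ rng.random((8, 100))
W, H = sparse_nmf(V, n_atoms=8)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
h = encode(W, V[:, 0])  # coefficients of one frame on the learned dictionary
```

Fixing the atom norms is a common design choice in sparse NMF: without it, an L1 penalty on H could be reduced arbitrarily by shrinking H and enlarging W.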

Keywords

Speech recovery, Throat microphone speech, Long short-term memory, Non-negative matrix factorization