Data Augmentation Techniques for Transfer Learning-Based Continuous Dysarthric Speech Recognition

Circuits, Systems, and Signal Processing (2022)

Abstract
Data augmentation is an essential component in building a dysarthric speech recognition system, because collecting speech data from dysarthric speakers with varying degrees of disorder is difficult. Dysarthric speech recognition systems are mostly built for isolated words, as most dysarthric speakers with low intelligibility are fluent when speaking words in isolation. However, speakers with mild and moderate dysarthria can produce a few phrases of up to five words. Data augmentation procedures for continuous dysarthric speech are yet to be explored. In the current work, the virtual microphone array synthesis and multi-resolution feature extraction-based data augmentation (VM-MRFE) proposed by the authors, Mariya Celin et al. (IEEE J Sel Top Signal Process 14(2):346–354, 2020), is used to augment continuous dysarthric speech. The current work proposes a transfer learning-based speaker-dependent ASR system (isolated and continuous) for 15 dysarthric speakers from the UA corpus and 20 dysarthric speakers from the SSN-Tamil Dysarthric Speech corpus developed by the authors. Conventional speed and volume perturbation-based data augmentation is carried out for comparison. For isolated word recognition, the combination of speed and volume perturbation with the proposed VM-MRFE-based technique reduced the WER by up to 29.98%, whereas for continuous speech the proposed VM-MRFE reduced the WER by 24.95% relative to conventional speed and volume perturbation-based data augmentation.
Keywords
Data augmentation, Transfer learning, Dysarthria, Speech recognition
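
The abstract uses conventional speed and volume perturbation as its comparison baseline but does not give the settings. As a rough illustration only, the sketch below shows one common way such perturbation is done on a waveform held as a list of samples in [-1, 1]. The function names, the speed factors (0.9, 1.0, 1.1), the gains (0.5, 1.0, 1.5), and the use of linear interpolation are all assumptions made for this example; they are not taken from the paper, and the VM-MRFE method itself is not shown.

```python
from typing import Dict, List, Sequence, Tuple


def speed_perturb(samples: Sequence[float], factor: float) -> List[float]:
    """Resample by linear interpolation so the signal plays `factor` times faster.

    factor > 1 shortens the signal, factor < 1 lengthens it (hypothetical helper).
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(round(len(samples) / factor))
    out: List[float] = []
    for i in range(n_out):
        pos = i * factor          # fractional read position in the input
        lo = int(pos)
        if lo >= len(samples) - 1:
            # Past the last full interval: hold the final sample.
            out.append(float(samples[-1]))
            continue
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[lo + 1])
    return out


def volume_perturb(samples: Sequence[float], gain: float) -> List[float]:
    """Scale the amplitude by `gain`, clipping to the range [-1, 1]."""
    return [max(-1.0, min(1.0, gain * s)) for s in samples]


def augment(
    samples: Sequence[float],
    speed_factors: Sequence[float] = (0.9, 1.0, 1.1),  # assumed values
    gains: Sequence[float] = (0.5, 1.0, 1.5),          # assumed values
) -> Dict[Tuple[float, float], List[float]]:
    """Return one perturbed copy per (speed factor, gain) combination."""
    return {
        (sf, g): volume_perturb(speed_perturb(samples, sf), g)
        for sf in speed_factors
        for g in gains
    }
```

With the assumed defaults, `augment` turns each utterance into nine variants, one of which (speed 1.0, gain 1.0) is the original signal. Linear interpolation is used here only to keep the sketch dependency-free; a real pipeline would more likely use a proper band-limited resampler.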