Impact of Including Pathological Speech in Pre-training on Pathology Detection.

TSD (2023)

Abstract
Transfer learning has achieved state-of-the-art performance across many different areas while requiring orders of magnitude less labeled data than traditional methods. Pre-trained weights are learned in a self-supervised way on large amounts of unlabeled data and are then fine-tuned for the desired downstream task using labeled data. An example of this in the speech domain is the wav2vec2.0 framework, which was originally designed for automatic speech recognition (ASR) but can also be fine-tuned for general sequence classification tasks. This paper analyses the effects of including pathological speech during the pre-training of wav2vec2.0, where quantized speech representations are learned, on the performance of a fine-tuned pathology detection task. We show that this architecture can be successfully fine-tuned for cleft lip and palate (CLP) detection, where the best-performing model yields an F1-score of 82.3% when pre-trained on healthy speech only. Our experiments show that including pathological speech during pre-training drastically degrades the performance on detection of the same pathology for which the model was fine-tuned. The worst-performing model was pre-trained exclusively on CLP speech, resulting in an F1-score of 33.9%. While the experiments focus only on CLP, the magnitude of the results suggests that other pathologies will follow the same trend.
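The pipeline described in the abstract, a self-supervised wav2vec2.0 encoder fine-tuned with a classification head for pathology detection, can be sketched roughly as below. This is a minimal illustration assuming the Hugging Face transformers implementation of wav2vec2.0; the checkpoint name, label mapping, and dummy audio are placeholders and do not reflect the authors' actual data or configuration.

```python
# Hypothetical sketch: fine-tuning a pre-trained wav2vec2.0 checkpoint for
# binary pathology (healthy vs. CLP) detection as sequence classification.
# Checkpoint, labels, and input data are illustrative assumptions only.
import torch
from transformers import (
    Wav2Vec2FeatureExtractor,
    Wav2Vec2ForSequenceClassification,
)

# Load self-supervised pre-trained weights and attach a 2-class head;
# the head is randomly initialised and trained during fine-tuning.
checkpoint = "facebook/wav2vec2-base"  # stand-in for the paper's own pre-trained weights
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(checkpoint)
model = Wav2Vec2ForSequenceClassification.from_pretrained(
    checkpoint,
    num_labels=2,
    label2id={"healthy": 0, "clp": 1},
    id2label={0: "healthy", 1: "clp"},
)

# One 16 kHz waveform (random noise stands in for a speech recording).
waveform = torch.randn(16000 * 3)  # 3 seconds of audio
inputs = feature_extractor(
    waveform.numpy(), sampling_rate=16000, return_tensors="pt"
)

# Forward pass with a dummy label; in practice this runs inside a training
# loop (or the Trainer API) over the labelled pathology corpus.
labels = torch.tensor([1])
outputs = model(**inputs, labels=labels)
loss, logits = outputs.loss, outputs.logits
predicted = logits.argmax(dim=-1)  # 0 = healthy, 1 = CLP
```

In this setup only the checkpoint passed to `from_pretrained` changes between the paper's conditions (pre-training on healthy speech, CLP speech, or a mix); the fine-tuning procedure for the detection task stays the same.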
Keywords
pathological speech, pathology detection, pre-training