Speech motion anomaly detection via cross-modal translation of 4D motion fields from tagged MRI
CoRR (2024)
Abstract
Understanding the relationship between tongue motion patterns during speech
and the resulting speech acoustics (i.e., the articulatory-acoustic
relation) is of great importance for assessing speech quality and developing
innovative treatment and rehabilitative strategies. This is especially
important when evaluating and detecting abnormal articulatory features in
patients with speech-related disorders. In this work, we aim to develop a
framework for detecting speech motion anomalies in conjunction with their
corresponding speech acoustics. This is achieved through the use of a deep
cross-modal translator trained on data from healthy individuals only, which
bridges the gap between 4D motion fields obtained from tagged MRI and 2D
spectrograms derived from speech acoustic data. The trained translator is used
as an anomaly detector by measuring the quality of the spectrograms it
reconstructs for healthy individuals or patients. In particular, the
translator is expected to generalize poorly to patient data, which contain
unseen out-of-distribution patterns, and therefore to perform worse on patients
than on healthy individuals. A one-class SVM is then
used to distinguish the spectrograms of healthy individuals from those of
patients. To validate our framework, we collected a total of 39 pairs of tagged
MRI and speech waveforms, comprising data from 36 healthy individuals and 3
tongue cancer patients. We used both 3D convolutional and transformer-based
deep translation models, training them on the healthy training set and then
applying them to both the healthy and patient testing sets. Our framework is
able to detect abnormal patient data, illustrating its potential for enhancing
the understanding of the articulatory-acoustic relation in both healthy
individuals and patients.
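The detection step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: spectrograms are represented as hypothetical nested lists of magnitudes (frequency bins by time frames), reconstruction quality is measured as mean squared error, and, to keep the example dependency-free, the paper's one-class SVM is replaced by a simpler stand-in that flags a subject when its error exceeds the healthy mean plus k standard deviations. The names `reconstruction_error`, `fit_threshold`, and `is_anomalous` and the default `k = 3.0` are hypothetical.

```python
import statistics
from typing import List, Sequence

# Hypothetical representation: a spectrogram as rows of frequency bins,
# each row holding magnitudes over time frames.
Spectrogram = List[List[float]]


def reconstruction_error(predicted: Spectrogram, target: Spectrogram) -> float:
    """Mean squared error between a translated and a measured spectrogram."""
    if len(predicted) != len(target) or any(
        len(p) != len(t) for p, t in zip(predicted, target)
    ):
        raise ValueError("spectrograms must have the same shape")
    diffs = [
        (p - t) ** 2
        for prow, trow in zip(predicted, target)
        for p, t in zip(prow, trow)
    ]
    return sum(diffs) / len(diffs)


def fit_threshold(healthy_errors: Sequence[float], k: float = 3.0) -> float:
    """Stand-in for the one-class SVM (an assumption, not the paper's method):
    the decision boundary is mean + k * sample std of healthy errors."""
    mu = statistics.mean(healthy_errors)
    sigma = statistics.stdev(healthy_errors)  # needs at least two values
    return mu + k * sigma


def is_anomalous(error: float, threshold: float) -> bool:
    """Flag a subject whose reconstruction error exceeds the healthy boundary."""
    return error > threshold


if __name__ == "__main__":
    # Illustrative numbers only, not data from the paper.
    healthy = [0.10, 0.12, 0.09, 0.11, 0.10]
    thr = fit_threshold(healthy)
    for err in (0.105, 0.35):
        print(err, is_anomalous(err, thr))
```

In the paper's setting, `predicted` would be the translator's output for a subject's 4D motion field and `target` the spectrogram computed from that subject's recorded speech. A one-class SVM (for example scikit-learn's `OneClassSVM`) fitted on healthy errors or spectrogram features would take the place of `fit_threshold`; the threshold rule is used here only because it needs nothing beyond the standard library.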