Improved Classification Of Speaking Styles For Mental Health Monitoring Using Phoneme Dynamics

12th Annual Conference of the International Speech Communication Association 2011 (INTERSPEECH 2011), Vols. 1-5 (2011)

Abstract

This paper investigates the usefulness of segmental phoneme dynamics for classifying speaking styles. We modeled phoneme transition details from the phoneme sequences emitted by a speech recognizer, using recordings of 39 depressed patients in 7 different speaking styles: normal, pressured, slurred, stuttered, flat, slow, and fast speech. We designed and compared two sets of phoneme models: a phoneme-level language model for each style, which treats each phoneme as a word unit, and a context-dependent phoneme duration model based on Gaussians for each speaking style. The experiments showed that language modeling at the phoneme level performed better than the duration model. We also found that user normalization further improved performance. To test whether the phoneme-based models complement spectral information, the classifiers were combined at the decision level with a Hidden Markov Model (HMM) classifier built from spectral features. The combination yielded an improvement of 5.7% absolute (10.4% relative), reaching 60.3% accuracy in 7-class and 71.0% in 4-class classification.
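
The phoneme-level language modeling idea can be illustrated with a minimal sketch, assuming add-one smoothed phoneme bigrams (the abstract does not state the n-gram order or smoothing used), one model per speaking style, and classification by the highest log-likelihood. The decision-level fusion shown here as a weighted sum of per-class scores is also an assumption for illustration; the names `PhonemeBigramLM`, `classify`, and `fuse` and the toy phoneme data are hypothetical.

```python
import math
from collections import Counter, defaultdict


class PhonemeBigramLM:
    """Add-one smoothed phoneme bigram model (hypothetical; one per style)."""

    def __init__(self, sequences, vocab):
        # Sentence boundary symbols are added to the phoneme vocabulary.
        self.vocab = set(vocab) | {"<s>", "</s>"}
        self.bigrams = defaultdict(Counter)
        for seq in sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[prev][cur] += 1

    def log_prob(self, seq):
        """Log-likelihood of a phoneme sequence under this style's model."""
        padded = ["<s>"] + list(seq) + ["</s>"]
        v = len(self.vocab)
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            counts = self.bigrams.get(prev, Counter())
            total += math.log((counts[cur] + 1) / (sum(counts.values()) + v))
        return total


def classify(seq, models):
    """Return the style whose language model scores the sequence highest."""
    return max(models, key=lambda style: models[style].log_prob(seq))


def fuse(score_sets, weights):
    """Assumed decision-level fusion: weighted sum of per-class scores
    from several classifiers (e.g. phoneme LM and spectral HMM)."""
    classes = score_sets[0].keys()
    return max(
        classes,
        key=lambda c: sum(w * scores[c] for w, scores in zip(weights, score_sets)),
    )


# Toy training data (hypothetical phoneme strings, not from the paper).
train = {
    "normal": [["h", "e", "l", "o"], ["h", "e", "l", "o", "w"]],
    "stuttered": [["h", "h", "h", "e", "l", "o"], ["h", "h", "e", "e", "l", "o"]],
}
vocab = {"h", "e", "l", "o", "w"}
models = {style: PhonemeBigramLM(seqs, vocab) for style, seqs in train.items()}

print(classify(["h", "h", "h", "e", "l", "o"], models))  # stuttered
print(classify(["h", "e", "l", "o", "w"], models))       # normal
```

In practice, scores from different classifiers would likely need normalization (for example, by sequence length or via posterior probabilities) before a weighted sum is meaningful; the sketch omits this, as well as the user normalization mentioned above.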

Keywords

speaking styles, language model, phoneme, duration