Harnessing the Power of CNN-Transformer Encoders in Stress Speech Analysis

2023 International Conference on Information Technology and Computing (ICITCOM), 2023

Stress is the physiological response to mental, emotional, or physical pressure, and it varies between individuals. A survey by Ipsos Global showed that around 30% of respondents identified stress as a significant health issue. Some countries in Southeast Asia, such as Cambodia, have much higher rates of depression than the world average. In Indonesia, the stress rate reached 9.8% in 2018. This research focuses on Speech Stress Recognition (SSR), an automated method that recognizes stress levels by analyzing speech characteristics. We use Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction and a CNN-Transformer Encoder model for classification. Evaluation on the SUSAS dataset showed an overall accuracy of 73.76%. When the classification results are broken down by gender, stress levels are classified more accurately for male speakers than for female speakers. To improve performance, we applied Voice Activity Detection, which resulted in an accuracy of 81% for male speakers and 69.23% for female speakers. The findings of this research have potential applications in various fields, including mental health and emotion analysis in human communication.
Keywords: CNN, MFCC, Speech Stress Recognition, Stress, Transformer Encoder
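
The abstract does not say which Voice Activity Detection method was used. As a rough illustration of the idea only, the sketch below implements a simple energy-based VAD in plain Python: it splits a signal into fixed-length frames, measures each frame's mean squared amplitude, and keeps the frames whose energy reaches a fraction of the loudest frame. The function names, the 160-sample frame length, and the 0.1 relative threshold are all assumptions made for this example, not values from the paper.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping, full-length frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def energy_vad(samples, frame_len=160, rel_threshold=0.1):
    """Return one flag per frame: True if energy >= rel_threshold * peak energy.

    frame_len and rel_threshold are illustrative defaults (assumptions).
    """
    energies = frame_energies(samples, frame_len)
    if not energies:
        return []
    peak = max(energies)
    if peak == 0.0:
        # An all-silent signal has no active frames.
        return [False] * len(energies)
    return [e >= rel_threshold * peak for e in energies]


def drop_silence(samples, frame_len=160, rel_threshold=0.1):
    """Concatenate only the frames that the VAD marks as active."""
    flags = energy_vad(samples, frame_len, rel_threshold)
    kept = []
    for i, is_active in enumerate(flags):
        if is_active:
            kept.extend(samples[i * frame_len:(i + 1) * frame_len])
    return kept


# Synthetic test signal at an assumed 8 kHz sample rate:
# two silent frames, two frames of a 200 Hz tone, one silent frame.
SAMPLE_RATE = 8000
silence = [0.0] * 320
tone = [0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(320)]
signal = silence + tone + [0.0] * 160

flags = energy_vad(signal)
voiced = drop_silence(signal)
print(flags)
print(len(voiced))
```

In a pipeline like the one described, the retained samples would then be passed to MFCC extraction, so that silent stretches do not dilute the features fed to the classifier. Practical systems usually use overlapping frames and a more robust detector than a single energy threshold.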