Unusable Spoken Response Detection with BLSTM Neural Networks

2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP), 2018

Abstract
Voice biometrics has been applied to enhance the security of spoken language proficiency tests and to ensure valid test scores by detecting fraudulent activity. These methods can, however, be triggered by certain distortions, including background noise and adjacent test-takers, resulting in false-positive alarms. In this paper, a two-layer bi-directional LSTM RNN model is employed to detect these distorted (unusable) responses, and a sub-sampling method is applied to reduce the difficulties of model training caused by very long input sequences and imbalanced training data. The system is evaluated on a corpus collected from an assessment of English language proficiency administered around the world. Results show that our approach significantly outperforms two baselines: a Gaussian mixture model (GMM) classifying frame-level features and an AdaBoost classifier operating on i-vectors. Our system’s F-score in unusable response detection is 0.60, compared to 0.43 and 0.49 for the two baseline systems.
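The abstract does not spell out how the sub-sampling works, so the following is only a minimal sketch of one plausible reading: each long frame sequence is split into several shorter interleaved sequences (frame skipping with different starting offsets), and all offsets are kept only for the rarer "unusable" class so that the classes become less imbalanced. The function names `subsample` and `build_training_set`, the `factor` parameter, and the label convention (1 = unusable) are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def subsample(frames: Sequence[Frame], factor: int) -> List[List[Frame]]:
    """Split one long frame sequence into up to `factor` interleaved ones.

    Sub-sequence k keeps frames k, k + factor, k + 2 * factor, ...
    (hypothetical scheme; the abstract does not give the exact method).
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    # Skip offsets beyond the sequence length so no empty piece is returned.
    return [list(frames[k::factor]) for k in range(factor) if len(frames) > k]


def build_training_set(
    responses: Sequence[Sequence[Frame]],
    labels: Sequence[int],
    minority_label: int = 1,  # assumed convention: 1 marks an unusable response
    factor: int = 4,
) -> Tuple[List[List[Frame]], List[int]]:
    """Shorten every response; keep all offsets only for the minority class."""
    out_x: List[List[Frame]] = []
    out_y: List[int] = []
    for frames, label in zip(responses, labels):
        pieces = subsample(frames, factor)
        if label != minority_label:
            pieces = pieces[:1]  # majority class: keep a single offset
        out_x.extend(pieces)
        out_y.extend([label] * len(pieces))
    return out_x, out_y
```

Under these assumptions, every training sequence becomes roughly `factor` times shorter, which eases recurrent training on long inputs, while each minority-class response contributes up to `factor` training examples instead of one.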
Keywords
Feature extraction, Task analysis, Training, Speech recognition, Neural networks, Data models, Training data