Confidence-Based Scoring: A Useful Diagnostic Tool For Detection Tasks

14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5(2013)

Abstract
This paper uses an unconventional analysis to diagnose the problems with three different speech activity detection systems. The analysis scores the frames in an audio file in order of confidence, starting with the frame in which we have the most confidence and progressing towards less and less confident frames. By keeping track of the cumulative number of errors, we can determine how the errors are distributed across the data. Using speech activity detection on highly degraded audio as a case example, we show how this simple analysis can yield useful insight into both system performance and the data itself. In our case example, we use the analysis to establish three main points. First, a small percentage of the frames accounts for the lion's share of the errors. Second, three different systems perform very poorly on the same small subset of data, even though the systems adopt very different decoding algorithms and features. In other words, three very different systems agree on which data is 'hard'. Third, the 'hard' data is primarily characterized by its proximity to speech-nonspeech boundaries. Through follow-up analyses, we show that this phenomenon is not merely an artifact of ground truth inaccuracy, but rather a steady progression in which the data becomes harder and harder to classify correctly as one moves closer to the boundaries. Through this case example, we demonstrate the utility of confidence-based scoring as a general diagnostic tool for detection tasks on time-series data.
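The scoring procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `cumulative_errors_by_confidence`, the use of a single decision threshold, and the definition of confidence as the distance of a frame's score from that threshold are all assumptions made here for illustration, since the abstract does not state how confidence is computed.

```python
def cumulative_errors_by_confidence(scores, labels, threshold=0.0):
    """Score frames from most to least confident and track cumulative errors.

    scores    -- per-frame detector scores (hypothetical; higher means "speech")
    labels    -- per-frame ground truth, 1 for speech and 0 for nonspeech
    threshold -- decision threshold (an assumption of this sketch)
    """
    # Hard decision per frame: speech if the score exceeds the threshold.
    decisions = [1 if s > threshold else 0 for s in scores]

    # Assumed confidence measure: distance of the score from the threshold.
    confidence = [abs(s - threshold) for s in scores]

    # Frame indices ordered from most confident to least confident.
    order = sorted(range(len(scores)), key=lambda i: -confidence[i])

    # Running count of errors as less and less confident frames are scored.
    cumulative = []
    total = 0
    for i in order:
        total += int(decisions[i] != labels[i])
        cumulative.append(total)
    return order, cumulative


if __name__ == "__main__":
    order, cumulative = cumulative_errors_by_confidence(
        [2.0, -1.5, 0.1, -0.2, 1.0], [1, 0, 0, 0, 1]
    )
    print(order)
    print(cumulative)
```

Plotting the cumulative count against the number of frames scored shows how errors are distributed: a curve that stays flat and then rises sharply at the end would correspond to a small, low-confidence subset of frames accounting for most of the errors.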
Keywords
confidence-based scoring,speech activity detection