Sequential Testing In Classifier Evaluation Yields Biased Estimates Of Effectiveness
SIGIR '13: The 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, Dublin, Ireland, July 2013
Abstract
It is common to develop and validate classifiers through a process of repeated testing, with nested training and/or test sets of increasing size. We demonstrate in this paper that such repeated testing leads to biased estimates of classifier effectiveness. Experiments on a range of text classification tasks under three sequential testing frameworks show that all three lead to optimistic estimates of effectiveness. We calculate empirical adjustments to unbias the estimates on our data set, and identify directions for research that could lead to general techniques for avoiding bias while reducing labeling costs.
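The mechanism behind this optimistic bias can be illustrated with a small simulation. The Python sketch below is not the paper's experimental framework; it assumes a hypothetical sequential protocol that draws a fresh test sample each round and reports the first estimate to reach a target value, with per-item Bernoulli successes standing in for F-measure. Because stopping on a favorable estimate selects positive sampling noise, the reported mean drifts above the true effectiveness, while a single fixed test remains unbiased.

```python
from statistics import mean
import random

def simulate(true_eff=0.70, n_rounds=10, test_size=200,
             target=0.72, trials=10_000, seed=0):
    """Monte Carlo sketch of sequential-testing bias (hypothetical setup).

    Each round draws a fresh test sample of `test_size` items whose
    per-item success probability is `true_eff`. The sequential protocol
    reports the first round whose estimate reaches `target`, falling
    back to the final round otherwise.
    """
    rng = random.Random(seed)
    one_shot, sequential = [], []
    for _ in range(trials):
        # Noisy per-round estimates: fraction of items scored correct.
        estimates = [
            sum(rng.random() < true_eff for _ in range(test_size)) / test_size
            for _ in range(n_rounds)
        ]
        one_shot.append(estimates[-1])  # single fixed test: unbiased
        sequential.append(              # stop-when-good: optimistic
            next((e for e in estimates if e >= target), estimates[-1])
        )
    print(f"true effectiveness:       {true_eff:.3f}")
    print(f"one-shot mean estimate:   {mean(one_shot):.3f}")
    print(f"sequential mean estimate: {mean(sequential):.3f}")

if __name__ == "__main__":
    simulate()
```

With these (assumed) parameters, the one-shot mean stays near the true 0.70 while the sequential mean lands noticeably higher, mirroring the direction of bias the paper reports for its three sequential testing frameworks.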
Keywords
Supervised learning, text categorization, evaluation, F-measure