Active learning for logistic regression: an evaluation

Machine Learning 68(3) (2007): 235–265

Cited by 243 | Views 38
Abstract

Which active learning methods can we expect to yield good performance in learning binary and multi-category logistic regression classifiers? Addressing this question is a natural first step in providing robust solutions for active learning across a wide variety of exponential models including maximum entropy, generalized linear, log-linear…

Introduction
Highlights
  • Procurement of labeled training data is the first step in training a supervised machine learning algorithm
  • Focus soon turned to methods applicable to pool-based active learning, including the query by committee method (Seung et al. 1992) and experimental design methods based on A-optimality (Cohn 1996); a minimal sketch of A-optimality selection appears after this list
  • The evaluations indicate that experimental design active learning of logistic regression is one of the more robust strategies available
  • Future work in active learning using logistic regression will benefit from evaluating against these gold standard methods
  • Throughout the active learning literature, we found statements to the effect that these methods are too computationally expensive to evaluate, but our results demonstrate that experimental design approaches are tractable for many data sets
  • The experimental design methods have the disadvantage of memory and computational complexity, and we were unable to evaluate them on two of the larger document classification tasks
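The A-optimality criterion referenced above selects the candidate whose addition most reduces the estimated prediction variance of the logistic regression model over the pool. The paper's exact estimator is not reproduced on this page, so the following is a minimal sketch under standard assumptions: expected Fisher information with a small hypothetical ridge term `alpha`, and illustrative function names.

```python
import numpy as np

def fisher_information(X, w, alpha=0.01):
    """Expected Fisher information of binary logistic regression at
    weights w, plus a small ridge term (illustrative regularizer) so the
    matrix stays invertible."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    v = p * (1.0 - p)                        # per-example Bernoulli variance
    return (X * v[:, None]).T @ X + alpha * np.eye(X.shape[1])

def a_optimality_scores(X_train, X_pool, w):
    """Score each unlabeled candidate by the total delta-method prediction
    variance over the pool that would remain after adding it; the next
    query is the candidate with the smallest score."""
    p = 1.0 / (1.0 + np.exp(-X_pool @ w))
    g = X_pool * (p * (1.0 - p))[:, None]    # gradients of pi(x) w.r.t. w
    scores = np.empty(len(X_pool))
    for i in range(len(X_pool)):
        # The expected information does not depend on the candidate's
        # unknown label, which is what makes the criterion computable.
        F = fisher_information(np.vstack([X_train, X_pool[i]]), w)
        F_inv = np.linalg.inv(F)
        scores[i] = np.sum((g @ F_inv) * g)  # sum_j g_j^T F^{-1} g_j
    return scores

# next_query = np.argmin(a_optimality_scores(X_train, X_pool, w))
```

The per-candidate matrix inverse is what makes these methods memory- and compute-intensive on large pools, consistent with the limitation noted in the last bullet above.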
Results
  • The evaluations in this study have specific goals: to discover which methods work, and to understand why methods perform badly when they do
  • Towards this end, the authors assembled a suite of machine learning data sets spanning a diverse range of predictors, categories, and domains
  • Table 4 contains the results of a hypothesis test on mean stopping-point accuracy, comparing the alternatives to random sampling, while Table 5 presents the same experiments as the percentage of random examples needed to reach each method's stopping-point accuracy (a sketch of such a test appears after this list)
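The '+'/'–' entries in the hypothesis-test tables come from significance tests against random sampling at the stopping point. The paper's exact test statistic is not reproduced on this page; the sketch below assumes a paired two-sided t-test over repeated pool/test splits, with hypothetical argument names.

```python
import numpy as np
from scipy import stats

def compare_to_random(acc_method, acc_random, alpha=0.05):
    """Paired two-sided t-test on stopping-point accuracies collected over
    repeated pool/test splits. Returns '+', '-' or '' in the style of
    Table 4, where '' means no statistically significant difference."""
    _, p_value = stats.ttest_rel(acc_method, acc_random)
    if p_value >= alpha:
        return ''
    return '+' if np.mean(acc_method) > np.mean(acc_random) else '-'
```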
Conclusion
  • The evaluations indicate that experimental design active learning of logistic regression is one of the more robust strategies available.
  • The experimental design methods produced attractive results much of the time without ever performing worse than random sampling.
  • This can be seen in the hypothesis-testing results and the deficiency measurements in Table 6 (a sketch of the deficiency measure appears after this list).
  • The finding that some heuristic methods can perform worse than random sampling is so surprising that a separate section (Sect. 5.5) explores whether this negative performance is an artifact of an "unlucky" evaluation design
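Equation (47), the deficiency measure cited in Table 6, is not reproduced on this page. A common formulation, due to Baram et al. (2003), which this paper cites, aggregates the gap between a method's learning curve and the final random-sampling accuracy; the sketch below assumes that formulation.

```python
import numpy as np

def deficiency(acc_active, acc_random):
    """Deficiency of an active learner relative to random sampling, with
    both accuracy curves measured on a shared grid of training set sizes.
    Follows the Baram et al. (2003) formulation, assumed here since the
    paper's (47) is not shown; values below 1 favor the active learner."""
    acc_active = np.asarray(acc_active, dtype=float)
    acc_random = np.asarray(acc_random, dtype=float)
    final = acc_random[-1]            # random-sampling accuracy at the end
    return np.sum(final - acc_active) / np.sum(final - acc_random)
```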
Tables
  • Table 1: Notation used in the decomposition of squared error
  • Table 2: Descriptions of the data sets used in the evaluation. Included are counts of: the number of categories (Classes), the number of observations (Obs), the test set size after splitting the data set into pool/test sets (Test), the number of predictors (Pred), the number of observations in the majority category (Maj), and the training set stopping point for the evaluation (Stop)
  • Table 3: Average accuracy and squared error ((18), left-hand side) results for the tested data sets when the entire pool is used as the training set. The data sets are sorted by squared error as detailed in Sect. 5.4
  • Table 4: Results of hypothesis tests comparing bagging and seven active learning method accuracies to random sampling at the final training set size. '+' indicates statistically significant improvement and '–' indicates statistically significant deterioration. 'NA' indicates 'not applicable.' Figures 2–5 display the actual results used for hypothesis testing as box plots
  • Table 5: Results comparing random sampling, bagging, and seven active learning methods, reported as the percentage of random examples over (or under) the final training set size needed to give similar accuracies. Active learning methods were seeded with 20 random examples and stopped when training set sizes reached the final tested size (300 observations with exceptions; see Sect. 5.3 for the rationale behind different stopping points)
  • Table 6: Average deficiency (see (47)) achieved by the various methods. For each data set the winner appears in boldface and is marked with a star; the runner-up appears in boldface
  • Table 7: The average percentage of matching test set margins when comparing models trained on data sets of size 300 to a model trained on the entire pool. Margins match if they are formed from the same pair of categories. Ten repetitions of the experiment produce the averages below
  • Table 8: Results of hypothesis tests comparing four heuristic active learning method accuracies to random sampling at the final training set size. These active learners used the larger candidate size of 300. '+' indicates statistically significant improvement and '–' indicates statistically significant deterioration compared to random sampling. 'NA' indicates 'not applicable'
  • Table 9: Average deficiency (see (47)) achieved by the various methods using a larger candidate size of 300. For each data set the winner appears in boldface and is marked with a star; the runner-up appears in boldface
  • Table 10: Results of hypothesis tests comparing bagging and four active learning method accuracies to random sampling at training set size 600. '+' indicates statistically significant improvement and '–' indicates statistically significant deterioration. 'NA' indicates 'not applicable'
  • Table 11: Average deficiency (see (47)) achieved by the various methods beginning at 300 observations and ending at 600. For each data set the winner appears in boldface and is marked with a star; the runner-up appears in boldface
  • Table 12: Results of hypothesis tests comparing bagging and two query by bagging methods using a bag size of 15. '+' indicates statistically significant improvement and '–' indicates statistically significant deterioration. 'NA' indicates 'not applicable'
  • Table 13: Average deficiency (see (47)) achieved by bagging and the two query by bagging methods using bag size 15. For each data set the winner appears in boldface and is marked with a star; the runner-up appears in boldface. A sketch of query-by-bagging selection appears after this list
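Tables 12 and 13 evaluate query by bagging (Abe & Mamitsuka 1998) with bag size 15: a committee of logistic regressions trained on bootstrap resamples queries the pool point on which its members disagree most. The sketch below uses vote entropy as the disagreement measure and scikit-learn's LogisticRegression; both choices are illustrative rather than the paper's exact configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def query_by_bagging(X_train, y_train, X_pool, n_bags=15, seed=0):
    """Return the pool index with maximal committee vote entropy.
    Assumes labels are 0..K-1 and that each bootstrap resample contains
    every class (reasonable after the 20-example random seeding phase)."""
    rng = np.random.default_rng(seed)
    n, k = len(X_train), int(np.max(y_train)) + 1
    votes = np.zeros((len(X_pool), k))
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)              # bootstrap resample
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X_train[idx], y_train[idx])
        votes[np.arange(len(X_pool)), clf.predict(X_pool)] += 1
    p = votes / n_bags
    mask = p > 0                       # avoid log(0) for unanimous labels
    plogp = np.zeros_like(p)
    plogp[mask] = p[mask] * np.log(p[mask])
    entropy = -plogp.sum(axis=1)
    return int(np.argmax(entropy))
```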
Funding
  • Andrew Schein was supported by NSF grant ITR-0205448
References
  • Abe, N., & Mamitsuka, H. (1998). Query learning strategies using boosting and bagging. In Proceedings of the 15th international conference on machine learning (ICML-1998) (pp. 1–10).
  • Angluin, D. (1987). Learning regular sets from queries and counterexamples. Information and Computation, 75, 87–106.
  • Banko, M., & Brill, E. (2001). Scaling to very very large corpora for natural language disambiguation. In Proceedings of the 39th annual meeting of the ACL (ACL-2001).
  • Baram, Y., El-Yaniv, R., & Luz, K. (2003). Online choice of active learning algorithms. In Proceedings of the twentieth international conference on machine learning (ICML-2003).
  • Baum, E. B. (1991). Neural net algorithms that learn in polynomial time from examples and queries. IEEE Transactions on Neural Networks, 2(1).
  • Berger, A. L., Della Pietra, S. A., & Della Pietra, V. J. (1996). A maximum entropy approach to natural language processing. Computational Linguistics, 22(1), 39–71.
  • Bickel, P. J., & Doksum, K. A. (2001). Mathematical statistics (2nd ed., Vol. 1). Englewood Cliffs: Prentice Hall.
  • Blake, C., & Merz, C. (1998). UCI repository of machine learning databases.
  • Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140.
  • Buja, A., Stuetzle, W., & Shen, Y. (2005). Degrees of boosting: a study of loss functions for classification and class probability estimation. Working paper.
  • Chaloner, K., & Larntz, K. (1989). Optimal Bayesian design applied to logistic regression experiments. Journal of Statistical Planning and Inference, 21, 191–208.
  • Chen, J., Schein, A. I., Ungar, L. H., & Palmer, M. S. (2006). An empirical study of the behavior of active learning for word sense disambiguation. In Proceedings of the 2006 human language technology conference of the North American chapter of the Association for Computational Linguistics (HLT-NAACL 2006).
  • Cohn, D. A. (1996). Neural network exploration using optimal experimental design. Neural Networks, 9(6), 1071–1083.
  • Cohn, D. A. (1997). Minimizing statistical bias with queries. In Advances in neural information processing systems 9. Cambridge: MIT Press.
  • Craven, M., DiPasquo, D., Freitag, D., McCallum, A. K., Mitchell, T. M., Nigam, K., et al. (2000). Learning to construct knowledge bases from the World Wide Web. Artificial Intelligence, 118(1/2), 69–113.
  • Dagan, I., & Engelson, S. P. (1995). Committee-based sampling for training probabilistic classifiers. In International conference on machine learning (pp. 150–157).
  • Darroch, J. N., & Ratcliff, D. (1972). Generalized iterative scaling for log-linear models. Annals of Mathematical Statistics, 43, 1470–1480.
  • Davis, R., & Prieditis, A. (1999). Designing optimal sequential experiments for a Bayesian classifier. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(3).
  • Freund, Y., Seung, H. S., Shamir, E., & Tishby, N. (1997). Selective sampling using the query by committee algorithm. Machine Learning, 28, 133–168.
  • Frey, P. W., & Slate, D. J. (1991). Letter recognition using Holland-style adaptive classifiers. Machine Learning, 6(2).
  • Garofolo, J., Lamel, L., Fisher, W., Fiscus, J., Pallett, D., & Dahlgren, N. (1993). DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST.
  • Geman, S., Bienenstock, E., & Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4, 1–58.
  • Gilad-Bachrach, R., Navot, A., & Tishby, N. (2003). Kernel query by committee (KQBC) (Tech. Rep. No. 2003-88). Leibniz Center, the Hebrew University.
  • Hosmer, D. E., & Lemeshow, S. (1989). Applied logistic regression. New York: Wiley.
  • Hwa, R. (2004). Sample selection for statistical parsing. Computational Linguistics, 30(3).
  • Hwang, J.-N., Choi, J. J., Oh, S., & Marks, R. J. (1991). Query-based learning applied to partially trained multilayer perceptrons. IEEE Transactions on Neural Networks, 2(1).
  • Jin, R., Yan, R., Zhang, J., & Hauptmann, A. G. (2003). A faster iterative scaling algorithm for conditional exponential model. In Proceedings of the twentieth international conference on machine learning (ICML-2003), Washington, DC.
  • Kaynak, C. (1995). Methods of combining multiple classifiers and their applications to handwritten digit recognition. Unpublished master's thesis, Bogazici University.
  • Lafferty, J. D., McCallum, A., & Pereira, F. C. N. (2001). Conditional random fields: probabilistic models for segmenting and labeling sequence data. In Proceedings of the eighteenth international conference on machine learning (pp. 282–289). Los Altos: Kaufmann.
  • Lewis, D. D., & Gale, W. A. (1994). A sequential algorithm for training text classifiers. In W. B. Croft & C. J. van Rijsbergen (Eds.), Proceedings of SIGIR-94, 17th ACM international conference on research and development in information retrieval (pp. 3–12), Dublin. Heidelberg: Springer.
  • MacKay, D. J. C. (1991). Bayesian methods for adaptive models. Unpublished doctoral dissertation, California Institute of Technology.
  • MacKay, D. J. C. (1992). The evidence framework applied to classification networks. Neural Computation, 4(5), 698–714.
  • Malouf, R. (2002). A comparison of algorithms for maximum entropy parameter estimation. In Proceedings of the sixth conference on natural language learning (CoNLL-2002).
  • McCallum, A., & Nigam, K. (1998). Employing EM in pool-based active learning for text classification. In Proceedings of the 15th international conference on machine learning (ICML-1998).
  • McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (2nd ed.). Boca Raton: CRC Press.
  • Melville, P., & Mooney, R. (2004). Diverse ensembles for active learning. In Proceedings of the 21st international conference on machine learning (ICML-2004) (pp. 584–591).
  • Mitchell, T. M. (1997). Machine learning. New York: McGraw–Hill.
  • Nigam, K., Lafferty, J., & McCallum, A. (1999). Using maximum entropy for text classification. In IJCAI-99 workshop on machine learning for information filtering.
  • Nocedal, J., & Wright, S. J. (1999). Numerical optimization. Berlin: Springer.
  • Roy, N., & McCallum, A. (2001). Toward optimal active learning through sampling estimation of error reduction. In Proceedings of the 18th international conference on machine learning (pp. 441–448). San Francisco: Kaufmann.
  • Saar-Tsechansky, M., & Provost, F. (2001). Active learning for class probability estimation and ranking. In Proceedings of the international joint conference on artificial intelligence (pp. 911–920).
  • Schein, A. I. (2005). Active learning for logistic regression. Dissertation in Computer and Information Science, The University of Pennsylvania.
  • Seung, H. S., Opper, M., & Sompolinsky, H. (1992). Query by committee. In Computational learning theory (pp. 287–294).
  • Steedman, M., Hwa, R., Clark, S., Osborne, M., Sarkar, A., & Hockenmaier, J. (2003). Example selection for bootstrapping statistical parsers. In Proceedings of the annual meeting of the North American chapter of the ACL, Edmonton, Canada.
  • Tang, M., Luo, X., & Roukos, S. (2002). Active learning for statistical natural language parsing. In ACL 2002.
  • Zheng, Z., & Padmanabhan, B. (2006). Selectively acquiring customer information: a new data acquisition problem and an active learning-based solution. Management Science, 52(5), 697–712.