Querying Discriminative and Representative Samples for Batch Mode Active Learning

ACM Transactions on Knowledge Discovery from Data, no. 3 (2015)

Cited: 134

Abstract

Empirical risk minimization (ERM) provides a principled guideline for many machine learning and data mining algorithms. Under the ERM principle, one minimizes an upper bound of the true risk, which is approximated by the summation of empirical risk and the complexity of the candidate classifier class. To guarantee a satisfactory learning …

Introduction
  • In many machine learning tasks, the authors need to collect training data and have it manually annotated by experts.
  • This procedure is expensive in most real-world applications, such as text classification [Xu et al 2003], collaborative filtering [Koren 2008], outlier detection [Abe et al 2006], and biomedicine [Warmuth et al 2001].
Highlights
  • In many machine learning tasks, we need to collect training data and have it manually annotated by experts
  • We extend the empirical risk minimization (ERM) principle to the active learning setting and present a novel active learning framework
  • Empirical risk minimization is a successful guideline for designing machine learning and data mining methods [Burges 1998; Vapnik 1998]
  • It minimizes an upper bound of the true risk under the unknown data distribution. This upper bound is approximated by the summation of the empirical risk on the available data and a properly designed regularization term, which constrains the complexity of the candidate classifiers [Vapnik 1998; Bartlett and Mendelson 2002] (a generic form of this objective is sketched after this list)
  • We query the samples that are expected to rapidly reduce the empirical risk and preserve the original source distribution at the same time. This enables our method to achieve consistently good performance during the whole active learning process. Under this active learning framework, we propose a practical batch mode active learning algorithm that is solved by alternating optimization
  • The superior performance of our method is verified by our extensive evaluations on a number of datasets compared to state-of-the-art batch mode active learning methods
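
In generic notation (not necessarily the authors' own; $\ell$ is a loss function, $\lambda$ a trade-off parameter, and $\Omega$ a complexity measure), the ERM objective referred to above can be sketched as

$$\min_{f \in \mathcal{F}} \ \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(f(x_i), y_i\bigr) + \lambda\, \Omega(f),$$

where the first term is the empirical risk on the $n$ labeled samples and the regularization term $\Omega(f)$ constrains the complexity of the candidate classifier $f \in \mathcal{F}$; together they form the upper bound on the true risk under the unknown data distribution [Vapnik 1998; Bartlett and Mendelson 2002].
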
Methods
  • The authors compare the method with random selection and state-of-the-art batch mode active learning methods in both binary and multiclass classification problems.

  • The authors use the training set for active learning and compare the prediction accuracy for different methods on the test set.
  • For Fbatch and Dbatch methods that need initial labeled data, the authors randomly sample the initial labeled data until there are enough labeled samples to train an initial classifier.
  • The number of these initial samples is usually smaller than 10 in the experiments.
  • The experiment stops when 80% of the training set has been labeled or the learning accuracy does not increase for any method (the overall protocol is sketched after this list).
  • The authors provide results for this method only on relatively small datasets.
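
As a reading aid, the evaluation protocol summarized in the bullets above can be expressed as the following sketch. The helper functions train_classifier and query_batch are hypothetical placeholders for the base learner and the batch selection strategy under comparison; this is not the authors' code.

    import numpy as np

    def run_active_learning(X_train, y_train, X_test, y_test,
                            train_classifier, query_batch, batch_size=10, seed=0):
        # Illustrative sketch of the protocol described above; train_classifier
        # and query_batch are hypothetical placeholders, not the authors' code.
        n = len(X_train)
        rng = np.random.default_rng(seed)

        # Randomly add labeled samples until an initial classifier can be trained
        # (at least two classes present); usually fewer than 10 samples.
        labeled = []
        while len(np.unique(y_train[labeled])) < 2:
            labeled.append(int(rng.integers(n)))

        accuracies = []
        # Stop once 80% of the training set has been labeled (the "no further
        # accuracy gain" stopping rule is omitted for brevity).
        while len(set(labeled)) < 0.8 * n:
            clf = train_classifier(X_train[labeled], y_train[labeled])
            accuracies.append(clf.score(X_test, y_test))
            unlabeled = [i for i in range(n) if i not in set(labeled)]
            batch = query_batch(clf, X_train, labeled, unlabeled, batch_size)
            labeled.extend(batch)
        return accuracies
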
Results
  • ERM is a successful guideline for designing machine learning and data mining methods [Burges 1998; Vapnik 1998].
  • It minimizes an upper bound of the true risk under the unknown data distribution.
  • In most cases, the method performs consistently better than the competitors during the whole active learning process.
  • These results demonstrate that both discriminative and representative information are critical to active learning, and a proper balance of these two sources of information will boost the active learning performance
Conclusion
  • In the proposed active learning algorithm, the authors properly adopt both labeled and unlabeled samples for each query.
  • Most of the existing semi-supervised active learning methods directly apply the semi-supervised learning technique to update the learning model during the active learning process
  • For such active learning methods, if the data follow the clustering assumption, the learning accuracy can be boosted by the use of a large amount of unlabeled samples.
  • The authors query the samples that are expected to rapidly reduce the empirical risk and preserve the original source distribution at the same time (an illustrative scoring scheme is sketched after this list).
  • This enables the method to achieve consistently good performance during the whole active learning process.
  • The authors plan to extend the method to the semi-supervised learning setting
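
One illustrative way to make "reducing the empirical risk while preserving the source distribution" concrete is to score candidate batches by a weighted combination of a discriminative term (e.g., the classifier's decision margin) and a representativeness term (e.g., the kernel maximum mean discrepancy between the selected samples and the whole pool, cf. Gretton et al. 2012 in the references). The greedy selector below is a minimal sketch under these assumptions; the names greedy_query, lam, and gamma are illustrative, and this is not the authors' algorithm.

    import numpy as np

    def rbf_kernel(A, B, gamma=1.0):
        # Gaussian RBF kernel matrix between the rows of A and the rows of B.
        sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
        return np.exp(-gamma * sq)

    def mmd_squared(X_sel, X_all, gamma=1.0):
        # Biased empirical estimate of the squared maximum mean discrepancy
        # between the selected samples and the whole pool.
        return (rbf_kernel(X_sel, X_sel, gamma).mean()
                + rbf_kernel(X_all, X_all, gamma).mean()
                - 2.0 * rbf_kernel(X_sel, X_all, gamma).mean())

    def greedy_query(clf, X, labeled, unlabeled, batch_size, lam=1.0, gamma=1.0):
        # Greedily pick samples the current (binary) classifier is uncertain
        # about (small decision margin) while keeping the labeled-plus-queried
        # set close to the full data distribution (small MMD). Illustrative only.
        selected = []
        for _ in range(batch_size):
            best_i, best_score = None, np.inf
            for i in unlabeled:
                if i in selected:
                    continue
                margin = abs(float(clf.decision_function(X[[i]])[0]))
                cand = list(labeled) + selected + [i]
                score = margin + lam * mmd_squared(X[cand], X, gamma)
                if score < best_score:
                    best_i, best_score = i, score
            selected.append(best_i)
        return selected
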
Tables
  • Table1: Characteristics of the Datasets, Including the Numbers of the Features and Samples
  • Table2: The Win/Tie/Loss Counts for Our Method Versus Each Competing Method during the Whole Active Learning Process Based on Paired t-Tests at the 95% Confidence Level
  • Table3: The Win/Tie/Loss Counts for Our Method Versus Each Competing Method during the Whole Multiclass Active Learning Process Based on Paired t-Tests at the 95% Confidence Level (a computation sketch follows this list)
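
The win/tie/loss counts in Tables 2 and 3 are based on paired t-tests at the 95% confidence level. Assuming the per-run accuracies of two methods are available at each step of the active learning process, such counts could be computed roughly as follows (a sketch with illustrative names, not the authors' evaluation code):

    import numpy as np
    from scipy import stats

    def win_tie_loss(ours, theirs, alpha=0.05):
        # ours, theirs: lists of arrays; each array holds the accuracies of the
        # repeated runs of one method at a given step of the active learning process.
        wins = ties = losses = 0
        for a, b in zip(ours, theirs):
            _, p = stats.ttest_rel(a, b)   # paired t-test over the runs
            if p >= alpha:
                ties += 1                  # no significant difference at the 95% level
            elif np.mean(a) > np.mean(b):
                wins += 1
            else:
                losses += 1
        return wins, ties, losses
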
Funding
  • This work is supported in part by NSF CCF-1025177, NIH LM010730, and ONR N00014-11-1-0108
References
  • Naoki Abe, Bianca Zadrozny, and John Langford. 2006. Outlier detection by active learning. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 504–509.
  • Peter L. Bartlett and Shahar Mendelson. 2002. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research 3, 463–482.
  • Alina Beygelzimer, Sanjoy Dasgupta, and John Langford. 2009. Importance weighted active learning. In Proceedings of the 26th International Conference on Machine Learning (ICML). 49–56.
  • James C. Bezdek and Richard J. Hathaway. 2003. Convergence of alternating optimization. Neural, Parallel, and Scientific Computations 11, 4, 351–368.
  • Karsten M. Borgwardt, Arthur Gretton, Malte J. Rasch, Hans-Peter Kriegel, Bernhard Scholkopf, and Alex J. Smola. 2006. Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics 22, 14, 49–57.
  • Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. 2011. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 3, 1–122.
  • Christopher J. C. Burges. 1998. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 2, 2, 121–167.
  • Colin Campbell, Nello Cristianini, and Alex J. Smola. 2000. Query learning with large margin classifiers. In Proceedings of the 17th International Conference on Machine Learning (ICML). 111–118.
  • Shayok Chakraborty, Vineeth Balasubramanian, and Sethuraman Panchanathan. 2011. Dynamic batch mode active learning. In Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2649–2656.
  • Chih-Chung Chang and Chih-Jen Lin. 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 3, 27:1–27:27.
  • Olivier Chapelle, Bernhard Scholkopf, and Alexander Zien (Eds.). 2006. Semi-Supervised Learning. MIT Press, Cambridge, MA.
  • Rita Chattopadhyay, Zheng Wang, Wei Fan, Ian Davidson, Sethuraman Panchanathan, and Jieping Ye. 2012. Batch mode active sampling based on marginal probability distribution matching. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 741–749.
  • Yuxin Chen and Andreas Krause. 2013. Near-optimal batch mode active learning and adaptive submodular optimization. In Proceedings of the 30th International Conference on Machine Learning (ICML). 160–168.
  • Yunmei Chen and Xiaojing Ye. 2011. Projection onto a simplex. arXiv preprint arXiv:1101.6081.
  • David A. Cohn, Zoubin Ghahramani, and Michael I. Jordan. 1996. Active learning with statistical models. Journal of Artificial Intelligence Research 4, 1, 129–145.
  • F. d'Alche Buc, Yves Grandvalet, and Christophe Ambroise. 2002. Semi-supervised MarginBoost. In Advances in Neural Information Processing Systems 14, 553–563.
  • Sanjoy Dasgupta. 2011. Two faces of active learning. Theoretical Computer Science 412, 19, 1767–1781.
  • Richard M. Dudley. 2002. Real Analysis and Probability. Cambridge University Press.
  • Andrew Frank and Arthur Asuncion. 2010. UCI Machine Learning Repository. Retrieved December 28, 2014, from http://archive.ics.uci.edu/ml.
  • Yoav Freund, H. Sebastian Seung, Eli Shamir, and Naftali Tishby. 1997. Selective sampling using the query by committee algorithm. Machine Learning 28, 2–3, 133–168.
  • Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Scholkopf, and Alexander Smola. 2012. A kernel two-sample test. Journal of Machine Learning Research 13, 723–773.
  • Yuhong Guo. 2010. Active instance sampling via matrix partition. In Advances in Neural Information Processing Systems 23, 802–810.
  • Yuhong Guo and Dale Schuurmans. 2008. Discriminative batch mode active learning. In Advances in Neural Information Processing Systems 20, 593–600.
  • Steven C. H. Hoi, Rong Jin, Jianke Zhu, and Michael R. Lyu. 2006a. Batch mode active learning and its application to medical image classification. In Proceedings of the 23rd International Conference on Machine Learning (ICML). 417–424.
  • Steven C. H. Hoi, Rong Jin, and Michael R. Lyu. 2006b. Large-scale text categorization by batch mode active learning. In Proceedings of the 15th International Conference on World Wide Web (WWW). 633–642.
  • Steven C. H. Hoi, Rong Jin, and Michael R. Lyu. 2009a. Batch mode active learning with applications to text categorization and image retrieval. IEEE Transactions on Knowledge and Data Engineering 21, 9, 1233–1248.
  • Steven C. H. Hoi, Rong Jin, Jianke Zhu, and Michael R. Lyu. 2008. Semi-supervised SVM batch mode active learning for image retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1–7.
  • Steven C. H. Hoi, Rong Jin, Jianke Zhu, and Michael R. Lyu. 2009b. Semisupervised SVM batch mode active learning with applications to image retrieval. ACM Transactions on Information Systems 27, 3, Article No. 16.
  • Sheng-Jun Huang, Rong Jin, and Zhi-Hua Zhou. 2010. Active learning by querying informative and representative examples. In Advances in Neural Information Processing Systems 23, 892–900.
  • Ajay J. Joshi, Fatih Porikli, and Nikolaos Papanikolopoulos. 2010. Multi-class batch-mode active learning for image classification. In Proceedings of the 2010 IEEE International Conference on Robotics and Automation (ICRA). 1873–1878.
  • Yehuda Koren. 2008. Factorization meets the neighborhood: A multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 426–434.
  • Hieu T. Nguyen and Arnold Smeulders. 2004. Active learning using pre-clustering. In Proceedings of the 21st International Conference on Machine Learning (ICML). 79–86.
  • Ryan Rifkin and Aldebaro Klautau. 2004. In defense of one-vs-all classification. Journal of Machine Learning Research 5, 101–141.
  • Nicholas Roy and Andrew McCallum. 2001. Toward optimal active learning through sampling estimation of error reduction. In Proceedings of the 18th International Conference on Machine Learning (ICML). 441–448.
  • Burr Settles. 2009. Active Learning Literature Survey. Computer Sciences Technical Report 1648. University of Wisconsin–Madison.
  • H. Sebastian Seung, Manfred Opper, and Haim Sompolinsky. 1992. Query by committee. In Proceedings of the 5th Annual Conference on Computational Learning Theory (COLT). 287–294.
  • Bharath K. Sriperumbudur, Arthur Gretton, Kenji Fukumizu, Bernhard Scholkopf, and Gert R. G. Lanckriet. 2010. Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research 11, 1517–1561.
  • Masashi Sugiyama. 2006. Active learning in approximately linear regression based on conditional expectation of generalization error. Journal of Machine Learning Research 7, 141–166.
  • Simon Tong and Daphne Koller. 2002. Support vector machine active learning with applications to text classification. Journal of Machine Learning Research 2, 45–66.
  • Vladimir Vapnik. 1998. Statistical Learning Theory. Wiley.
  • Zheng Wang, Shuicheng Yan, and Changshui Zhang. 2011. Active learning with adaptive regularization. Pattern Recognition 44, 10–11, 2375–2383.
  • Manfred K. Warmuth, Gunnar Ratsch, Michael Mathieson, Jun Liao, and Christian Lemmen. 2001. Active learning in the drug discovery process. In Advances in Neural Information Processing Systems 14, 1449–1456.
  • Zhao Xu, Kai Yu, Volker Tresp, Xiaowei Xu, and Jizhi Wang. 2003. Representative sampling for text classification using support vector machines. In Proceedings of the European Conference on Information Retrieval (ECIR). 393–407.
  • Kai Yu, Jinbo Bi, and Volker Tresp. 2006. Active learning via transductive experimental design. In Proceedings of the 23rd International Conference on Machine Learning (ICML). 1081–1088.
  • Xiaojin Zhu, John Lafferty, and Zoubin Ghahramani. 2003. Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of the ICML 2003 Workshop on the Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining.

Received October 2013; revised April 2014; accepted September 2014