MESA: Boost Ensemble Imbalanced Learning with MEta-SAmpler

Zhining Liu
Pengfei Wei
Wei Cao

NeurIPS 2020.


Abstract:

Imbalanced learning (IL), i.e., learning unbiased models from class-imbalanced data, is a challenging problem. Typical IL methods, including resampling and reweighting, were designed based on heuristic assumptions. They often suffer from unstable performance, poor applicability, and high computational cost in complex tasks where their ...
Introduction
  • Class imbalance, due to naturally-skewed class distributions, has been widely observed in many real-world applications such as click prediction, fraud detection, and medical diagnosis [13, 15, 20].
  • Canonical classification algorithms usually induce a bias when solving class imbalance problems, i.e., they perform well in terms of global accuracy but poorly on the minority class.
  • Typical imbalanced learning (IL) algorithms attempt to eliminate the bias through data resampling [6, 16, 17, 25, 32] or reweighting [27, 30, 36] in the learning process.
  • All these methods have been observed to suffer from three major limitations: (I) unstable performance due to sensitivity to outliers, (II) poor applicability, and (III) high computational cost.
Highlights
  • Class imbalance, due to the naturally-skewed class distributions, has been widely observed in many real-world applications such as click prediction, fraud detection, and medical diagnosis [13, 15, 20]
  • All the results show the superiority of MESA over other traditional ensemble imbalanced learning (EIL) baselines in handling class overlapping, noise, and poor minority class representation
  • The results show that MESA achieves competitive performance on various real-world tasks
  • Compared with prevailing meta-learning imbalanced learning (IL) solutions that must be co-optimized with DNNs, MESA is a generic framework capable of working with various learning models
  • Our meta-sampler is trained over task-agnostic meta-data and can be transferred to new tasks, which greatly reduces the meta-training cost
  • Empirical results show that MESA achieves superior performance on various tasks with high sample efficiency
Methods
  • Ensemble imbalanced learning (EIL) is known to effectively improve typical IL solutions by combining the outputs of multiple classifiers (e.g., [7, 29, 31, 35, 42]).
  • These EIL approaches have proven highly competitive [22] and have gained increasing popularity [15] in IL.
  • However, since the success of deep learning relies on massive training data, mainly from well-structured domains like computer vision and natural language processing, applying deep-learning-based methods to other learning models in traditional classification tasks is highly constrained.
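The core EIL recipe, training each base classifier on a re-balanced subsample and combining their votes (as in UNDERBAGGING [2] and EasyEnsemble [29]), can be sketched in plain Python. This is a generic illustration under assumed simplifications (1-D features, a decision-stump base learner standing in for any classifier), not the authors' implementation:

```python
import random

def undersample(X, y, seed=0):
    """Randomly under-sample the majority class to match the minority size."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = minority + rng.sample(majority, len(minority))
    return [X[i] for i in kept], [y[i] for i in kept]

class Stump:
    """1-D decision stump: predict 1 if x >= midpoint of the class means."""
    def fit(self, X, y):
        m1 = sum(x for x, t in zip(X, y) if t == 1) / max(1, sum(y))
        m0 = sum(x for x, t in zip(X, y) if t == 0) / max(1, len(y) - sum(y))
        self.thr = (m0 + m1) / 2
        return self
    def predict(self, X):
        return [1 if x >= self.thr else 0 for x in X]

def underbagging(X, y, n_estimators=5):
    """Train each base learner on a re-balanced subsample; majority-vote."""
    models = [Stump().fit(*undersample(X, y, seed=k)) for k in range(n_estimators)]
    def predict(Xq):
        votes = [m.predict(Xq) for m in models]
        return [1 if sum(col) * 2 >= len(models) else 0 for col in zip(*votes)]
    return predict

# Imbalanced toy data: 20 majority points near 0, 4 minority points near 5.
X = [0.1 * i for i in range(20)] + [5.0, 5.2, 5.4, 5.6]
y = [0] * 20 + [1] * 4
predict = underbagging(X, y)
```

Each stump sees a different random majority subsample, so the ensemble covers more of the majority class than any single under-sampled model would.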
Conclusion
  • The authors propose a novel imbalanced learning framework MESA.
  • It contains a meta-sampler that adaptively selects training data to learn effective cascade ensemble classifiers from imbalanced data.
  • Rather than following random heuristics, MESA directly optimizes its sampling strategy for better generalization performance.
  • Compared with prevailing meta-learning IL solutions that must be co-optimized with DNNs, MESA is a generic framework capable of working with various learning models.
  • The authors plan to explore the potential of meta-knowledge-driven ensemble learning in the long-tail multi-class classification problem.
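As a rough sketch of how a meta-sampler can act on ensemble training state: the paper builds a meta-state from per-instance error statistics on training and validation data, and the sampler's output steers which instances the next base classifier trains on. The simplified functions below illustrate that interface; the names, bin count, and Gaussian weighting form are illustrative assumptions rather than the exact design:

```python
import math

def meta_state(train_errors, val_errors, bins=5):
    """Simplified meta-state: normalized histograms of per-instance errors
    on the training and validation sets, concatenated into one vector."""
    def hist(errors):
        counts = [0] * bins
        for e in errors:
            counts[min(int(e * bins), bins - 1)] += 1
        total = max(1, len(errors))
        return [c / total for c in counts]
    return hist(train_errors) + hist(val_errors)

def sampling_weights(errors, mu, sigma=0.2):
    """Gaussian weighting over instance errors: the sampler's scalar output
    mu selects which error region the next base classifier focuses on."""
    return [math.exp(-((e - mu) ** 2) / (2 * sigma ** 2)) for e in errors]
```

Instances whose current ensemble error is closest to mu receive the largest weights, so optimizing mu for validation performance replaces the hand-crafted heuristics of earlier resampling methods.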
Tables
  • Table 1: Comparisons of MESA with existing imbalanced learning methods; note that |N| ≫ |P|
  • Table 2: Comparisons of MESA with other representative resampling methods
  • Table 3: Comparisons of MESA with other representative under-sampling-based EIL methods
  • Table 4: Cross-task transferability of the meta-sampler
Related work
  • Fernández et al. [1], Guo et al. [15], and He et al. [18, 19] provided systematic reviews of algorithms and applications of imbalanced learning. In this paper, we focus on the binary imbalanced classification problem, which is one of the most widely studied problem settings [15, 22] in imbalanced learning. Such a problem extensively exists in practical applications, e.g., fraud detection (fraud vs. normal), medical diagnosis (sick vs. healthy), and cybersecurity (intrusion vs. user connection). We mainly review existing works on this problem as follows.

    Resampling. Resampling methods focus on modifying the training set to balance the class distribution (i.e., over/under-sampling [6, 16, 17, 32, 38]) or to filter noise (i.e., cleaning resampling [25, 41]). Random resampling usually leads to severe information loss or overfitting, hence many advanced methods explore distance information to guide their sampling process [15]. However, calculating the distance between instances is computationally expensive on large-scale datasets, and such strategies may even fail to work when the data does not fit their assumptions.
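For concreteness, the two random baselines mentioned above can be sketched in a few lines of plain Python (illustrative only; SMOTE-style methods would instead synthesize interpolated minority neighbors rather than drop or duplicate points):

```python
import random
from collections import Counter

def random_undersample(X, y, seed=0):
    """Drop majority instances until classes match: fast, but discards data."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    keep = [i for i, t in enumerate(y) if t == minority]
    majority_idx = [i for i, t in enumerate(y) if t != minority]
    keep += rng.sample(majority_idx, counts[minority])
    return [X[i] for i in keep], [y[i] for i in keep]

def random_oversample(X, y, seed=0):
    """Duplicate minority instances until classes match: keeps all data, but
    exact copies of minority points invite overfitting."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    idx = [i for i, t in enumerate(y) if t == minority]
    need = max(counts.values()) - counts[minority]
    extra = [rng.choice(idx) for _ in range(need)]
    return list(X) + [X[i] for i in extra], list(y) + [y[i] for i in extra]
```

Both run in linear time with no distance computations, which is exactly why they scale where neighbor-based resampling does not, at the cost of the information-loss and overfitting risks noted above.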
Funding
  • This work is supported by the National Natural Science Foundation of China (No.61976102, No.U19A2065)
Study subjects and analysis
We select 13 representative methods from 4 major branches of resampling-based IL, i.e., under/over/cleaning-sampling and over-sampling with a cleaning-sampling post-process. We test all methods on the challenging, highly imbalanced (IR=111, 87,450 samples) Protein Homo. task to check their efficiency and effectiveness. Five different classifiers, i.e., K-nearest neighbors (KNN), Gaussian Naïve Bayes (GNB), decision tree (DT), adaptive boosting (Boost), and gradient boosting machine (GBM), are used in combination with the different resampling approaches.

Figure captions: Some examples of different meta-states. Comparisons of MESA with 4 representative traditional EIL methods (SMOTEBOOST [7], SMOTEBAGGING [42], RUSBOOST [35], and UNDERBAGGING [2]) on 3 toy datasets with different levels of underlying class distribution overlap (less/mid/highly overlapped in the 1st/2nd/3rd row); the number in the lower right corner of each subfigure is the AUCPRC score of the corresponding classifier (best viewed in color). Comparisons of MESA with other representative over-sampling-based EIL methods.
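AUCPRC, the area under the precision-recall curve used as the score here, is commonly estimated via average precision. A minimal sketch, assuming unique scores (ties are broken arbitrarily in this simplified version):

```python
def average_precision(y_true, scores):
    """Average precision: sweep down the score ranking; each time a positive
    is recovered, add precision-at-that-rank divided by the positive count."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(y_true)
    tp, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            ap += (tp / rank) / n_pos
    return ap
```

A perfect ranking yields 1.0, while a random ranking on highly imbalanced data scores near the positive-class prevalence, which is why PR-based metrics are preferred over ROC curves in this setting [9].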

References
  • [1] Alberto Fernández, Salvador García, Mikel Galar, Ronaldo C. Prati, and Bartosz Krawczyk. Learning from Imbalanced Data Sets. Springer, 2018.
  • [2] Ricardo Barandela, Rosa Maria Valdovinos, and José Salvador Sánchez. New applications of ensembles of classifiers. Pattern Analysis & Applications, 6(3):245–256, 2003.
  • [3] Gustavo EAPA Batista, Ana LC Bazzan, and Maria Carolina Monard. Balancing training data for automated annotation of keywords: a case study. In WOB, pages 10–18, 2003.
  • [4] Gustavo EAPA Batista, Ronaldo C Prati, and Maria Carolina Monard. A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter, 6(1):20–29, 2004.
  • [5] Xiaoyong Chai, Lin Deng, Qiang Yang, and Charles X Ling. Test-cost sensitive naive Bayes classification. In Fourth IEEE International Conference on Data Mining (ICDM'04), pages 51–58. IEEE, 2004.
  • [6] Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321–357, 2002.
  • [7] Nitesh V Chawla, Aleksandar Lazarevic, Lawrence O Hall, and Kevin W Bowyer. SMOTEBoost: Improving prediction of the minority class in boosting. In European Conference on Principles of Data Mining and Knowledge Discovery, pages 107–119.
  • [8] Sheng Chen, Haibo He, and Edwardo A Garcia. RAMOBoost: Ranked minority oversampling in boosting. IEEE Transactions on Neural Networks, 21(10):1624–1642, 2010.
  • [9] Jesse Davis and Mark Goadrich. The relationship between precision-recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning, pages 233–240. ACM, 2006.
  • [10] Dheeru Dua and Casey Graff. UCI machine learning repository, 2017.
  • [11] Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, pages 1126–1135. JMLR.org, 2017.
  • [12] Yoav Freund and Robert E Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.
  • [13] Thore Graepel, Joaquin Quiñonero Candela, Thomas Borchert, and Ralf Herbrich. Web-scale Bayesian click-through rate prediction for sponsored search advertising in Microsoft's Bing search engine. Omnipress, 2010.
  • [14] Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning, pages 1861–1870, 2018.
  • [15] Guo Haixiang, Li Yijing, Jennifer Shang, Gu Mingyun, Huang Yuanyue, and Gong Bing. Learning from class-imbalanced data: Review of methods and applications. Expert Systems with Applications, 73:220–239, 2017.
  • [16] Hui Han, Wen-Yuan Wang, and Bing-Huan Mao. Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. In International Conference on Intelligent Computing, pages 878–887.
  • [17] Haibo He, Yang Bai, Edwardo A Garcia, and Shutao Li. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), pages 1322–1328. IEEE, 2008.
  • [18] Haibo He and Edwardo A Garcia. Learning from imbalanced data. IEEE Transactions on Knowledge & Data Engineering, (9):1263–1284, 2008.
  • [19] Haibo He and Yunqian Ma. Imbalanced Learning: Foundations, Algorithms, and Applications. John Wiley & Sons, 2013.
  • [20] Nathalie Japkowicz and Shaju Stephen. The class imbalance problem: A systematic study. Intelligent Data Analysis, 6(5):429–449, 2002.
  • [21] Lu Jiang, Zhengyuan Zhou, Thomas Leung, Li-Jia Li, and Li Fei-Fei. MentorNet: Learning data-driven curriculum for very deep neural networks on corrupted labels. arXiv preprint arXiv:1712.05055, 2017.
  • [22] Bartosz Krawczyk. Learning from imbalanced data: open challenges and future directions. Progress in Artificial Intelligence, 5(4):221–232, 2016.
  • [23] Miroslav Kubat, Stan Matwin, et al. Addressing the curse of imbalanced training sets: one-sided selection. In ICML, volume 97, pages 179–186. Nashville, USA, 1997.
  • [24] Brenden M Lake, Ruslan Salakhutdinov, and Joshua B Tenenbaum. Human-level concept learning through probabilistic program induction. Science, 350(6266):1332–1338, 2015.
  • [25] Jorma Laurikkala. Improving identification of difficult small classes by balancing class distribution. In Conference on Artificial Intelligence in Medicine in Europe, pages 63–66.
  • [26] Buyu Li, Yu Liu, and Xiaogang Wang. Gradient harmonized single-stage detector. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 8577–8584, 2019.
  • [27] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, pages 2980–2988, 2017.
  • [28] Charles X Ling, Qiang Yang, Jianning Wang, and Shichao Zhang. Decision trees with minimal costs. In Proceedings of the Twenty-First International Conference on Machine Learning, page 69. ACM, 2004.
  • [29] Xu-Ying Liu, Jianxin Wu, and Zhi-Hua Zhou. Exploratory undersampling for class-imbalance learning. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 39(2):539–550, 2009.
  • [30] Xu-Ying Liu and Zhi-Hua Zhou. The influence of class imbalance on cost-sensitive learning: An empirical study. In Sixth International Conference on Data Mining (ICDM'06), pages 970–974. IEEE, 2006.
  • [31] Zhining Liu, Wei Cao, Zhifeng Gao, Jiang Bian, Hechang Chen, Yi Chang, and Tie-Yan Liu. Self-paced ensemble for highly imbalanced massive data classification. In 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 2020.
  • [32] Inderjeet Mani and I Zhang. kNN approach to unbalanced data distributions: a case study involving information extraction. In Proceedings of the Workshop on Learning from Imbalanced Datasets, volume 126, 2003.
  • [33] Minlong Peng, Qi Zhang, Xiaoyu Xing, Tao Gui, Xuanjing Huang, Yu-Gang Jiang, Keyu Ding, and Zhigang Chen. Trainable undersampling for class-imbalance learning. 2019.
  • [34] Mengye Ren, Wenyuan Zeng, Bin Yang, and Raquel Urtasun. Learning to reweight examples for robust deep learning. In International Conference on Machine Learning, pages 4334–4343, 2018.
  • [35] Chris Seiffert, Taghi M Khoshgoftaar, Jason Van Hulse, and Amri Napolitano. RUSBoost: A hybrid approach to alleviating class imbalance. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 40(1):185–197, 2010.
  • [36] Abhinav Shrivastava, Abhinav Gupta, and Ross Girshick. Training region-based object detectors with online hard example mining. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 761–769, 2016.
  • [37] Jun Shu, Qi Xie, Lixuan Yi, Qian Zhao, Sanping Zhou, Zongben Xu, and Deyu Meng. Meta-Weight-Net: Learning an explicit mapping for sample weighting. In NeurIPS, 2019.
  • [38] Michael R Smith, Tony Martinez, and Christophe Giraud-Carrier. An instance level analysis of data complexity. Machine Learning, 95(2):225–256, 2014.
  • [39] Yanmin Sun, Mohamed S Kamel, Andrew KC Wong, and Yang Wang. Cost-sensitive boosting for classification of imbalanced data. Pattern Recognition, 40(12):3358–3378, 2007.
  • [40] Ivan Tomek. An experiment with the edited nearest-neighbor rule. IEEE Transactions on Systems, Man, and Cybernetics, (6):448–452, 1976.
  • [41] Ivan Tomek. Two modifications of CNN. IEEE Transactions on Systems, Man and Cybernetics, 6:769–772, 1976.
  • [42] Shuo Wang and Xin Yao. Diversity analysis on imbalanced data sets by using ensemble models. In 2009 IEEE Symposium on Computational Intelligence and Data Mining, pages 324–331. IEEE, 2009.
  • [43] Dennis L Wilson. Asymptotic properties of nearest neighbor rules using edited data. IEEE Transactions on Systems, Man, and Cybernetics, (3):408–421, 1972.
  • [44] Lijun Wu, Fei Tian, Yingce Xia, Yang Fan, Tao Qin, Lai Jian-Huang, and Tie-Yan Liu. Learning to teach with dynamic loss functions. In Advances in Neural Information Processing Systems, pages 6466–6477, 2018.