# Learning with Multiple Complementary Labels

ICML, pp. 3072-3081, 2020.

Abstract:

A complementary label (CL) simply indicates an incorrect class of an example, but learning with CLs results in multi-class classifiers that can predict the correct class. Unfortunately, the problem setting only allows a single CL for each example, which notably limits its potential since our labelers may easily identify multiple CLs (MCLs) […]

Introduction

- Ordinary machine learning tasks generally require massive data with accurate supervision information, but it is expensive and time-consuming to collect data with high-quality labels.
- To alleviate this problem, researchers have studied various weakly supervised learning frameworks (Zhou, 2018), including semi-supervised learning (Chapelle et al., 2006; Li & Liang, 2019; Miyato et al., 2018; Niu et al., 2013; Zhu & Goldberg, 2009), positive- […]
- Multiple complementary labels (MCLs) would be more widespread than a single CL

Highlights

- Ordinary machine learning tasks generally require massive data with accurate supervision information, but it is expensive and time-consuming to collect data with high-quality labels
- To solve the above problems, we further propose an unbiased risk estimator (Section 4.2) for learning with multiple complementary labels, which processes each set of multiple complementary labels as a whole
- We propose a novel problem setting called learning with multiple complementary labels (MCLs), which is a generalization of complementary-label learning (Ishida et al., 2017; 2019; Yu et al., 2018)
- We find that the supervision information that multiple complementary labels hold is conceptually diluted after decomposition
- We further propose an unbiased risk estimator for learning with multiple complementary labels, which processes each set of multiple complementary labels as a whole
- Although our risk estimator does not rely on specific models or loss functions, we show that bounded losses are generally better than unbounded losses in our empirical risk estimator
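
The bounded-versus-unbounded contrast in the last bullet can be illustrated with two common multi-class losses: softmax MAE, which is bounded by 2, versus cross entropy, which grows without limit as the correct-class probability shrinks. This is a sketch under that assumption, not the paper's exact experimental setup:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift by max for numerical stability
    return e / e.sum()

def mae_loss(f_x, y):
    """Mean absolute error between softmax output and one-hot label.
    Bounded: always lies in [0, 2]."""
    p = softmax(f_x)
    onehot = np.eye(len(f_x))[y]
    return float(np.abs(p - onehot).sum())

def ce_loss(f_x, y):
    """Cross entropy; unbounded as the probability of class y tends to 0."""
    p = softmax(f_x)
    return float(-np.log(p[y]))

# A confidently wrong prediction: huge score for class 0, true class is 1.
f_x = np.array([10.0, 0.0, 0.0])
assert mae_loss(f_x, 1) <= 2.0   # bounded regardless of how wrong f is
assert ce_loss(f_x, 1) > 2.0     # already exceeds the MAE bound
```

A bounded loss caps the contribution of any single (possibly mislabeled or hard) example to the empirical risk, which is one intuition for why it behaves better in the estimator above.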

Methods

- The authors conduct extensive experiments to evaluate the performance of the proposed approaches, including the two wrappers, the unbiased risk estimator with various loss functions, and the two upper-bound surrogate loss functions.

Datasets. - The authors use four base models: a linear model, an MLP (d-500-k), ResNet (34 layers) (He et al., 2016), and DenseNet (22 layers) (Huang et al., 2017).
- The detailed descriptions of these datasets with the corresponding base models are provided in Appendix E.1.
- The authors first randomly sample the set size $s$ from $p(s)$, then uniformly and randomly sample a complementary-label set $\bar{Y}$ of size $s$
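
The two-step sampling procedure in the last bullet can be sketched as follows (the helper name `generate_mcls` and the dictionary encoding of p(s) are illustrative assumptions, not the authors' code):

```python
import random

def generate_mcls(true_label, k, size_dist, rng=random):
    """Generate one set of multiple complementary labels (MCLs).

    Step 1: draw the set size s from a distribution p(s) over {1, ..., k-1},
            given here as a dict mapping size -> probability.
    Step 2: draw a size-s subset of the k-1 incorrect classes uniformly at random.
    """
    sizes = list(size_dist.keys())
    probs = [size_dist[s] for s in sizes]
    s = rng.choices(sizes, weights=probs, k=1)[0]
    incorrect = [c for c in range(k) if c != true_label]
    return set(rng.sample(incorrect, s))

# Example: k = 5 classes, p(s) uniform over sizes 1..4
k = 5
size_dist = {s: 1 / (k - 1) for s in range(1, k)}
mcl = generate_mcls(true_label=2, k=k, size_dist=size_dist)
assert 2 not in mcl and 1 <= len(mcl) <= k - 1
```

By construction the true class never appears in the returned set, matching the definition of complementary labels.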

Conclusion

- The authors propose a novel problem setting called learning with multiple complementary labels (MCLs), which is a generalization of complementary-label learning (Ishida et al., 2017; 2019; Yu et al., 2018)
- To solve this learning problem, the authors first design two wrappers that enable them to use arbitrary complementary-label learning approaches for learning with MCLs. The authors find that the supervision information that MCLs hold is conceptually diluted after decomposition.
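
The "dilution" after decomposition can be made concrete with a toy sketch (the helper names `decompose` and `candidates` are hypothetical): a size-s MCL set jointly rules out s classes at once, but after decomposition into single-CL examples, each example rules out only one class.

```python
def decompose(mcl):
    """Decomposition wrapper: treat a set of MCLs as independent single-CL examples."""
    return [{c} for c in mcl]

def candidates(k, cl_set):
    """Classes still possible after excluding the given complementary labels."""
    return set(range(k)) - cl_set

k = 10
mcl = {1, 4, 7}  # one example carrying s = 3 complementary labels

# Processed as a whole, the MCL set narrows the label down to k - s candidates.
assert len(candidates(k, mcl)) == k - len(mcl)   # 7 candidates remain

# After decomposition, each single-CL example leaves k - 1 candidates.
for single in decompose(mcl):
    assert len(candidates(k, single)) == k - 1   # 9 candidates each
```

This is the intuition behind the authors' unbiased risk estimator that keeps each MCL set as a whole rather than decomposing it.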

Summary


- Table 1: Supervision information for a set of MCLs (with size s). Columns: setting, #TP (true positives), #FP (false positives), and supervision purity
- Table 2: Classification accuracy (mean ± std) of each algorithm on the four UCI datasets using a linear model for 5 trials. The best performance among all the approaches is highlighted in boldface. In addition, •/◦ indicates whether the performance of our approach (the best of EXP and LOG) is statistically superior/inferior to the comparing algorithm on each dataset (paired t-test at 0.05 significance level)
- Table 3: Classification accuracy (mean ± std) of each algorithm on the four benchmark datasets using a linear model for 5 trials. The best performance among all the approaches is highlighted in boldface. In addition, •/◦ indicates whether the performance of our approach (the best of EXP and LOG) is statistically superior/inferior to the comparing algorithm on each dataset (paired t-test at 0.05 significance level)
- Table 4: Classification accuracy (mean ± std) of each algorithm on the five benchmark datasets using neural networks for 5 trials. The best performance among all the approaches is highlighted in boldface. In addition, •/◦ indicates whether the performance of our approach (the best of EXP and LOG) is statistically superior/inferior to the comparing algorithm on each dataset (paired t-test at 0.05 significance level)

Related work

- In this section, we introduce some notations and briefly review the formulations of multi-class classification and complementary-label learning.

2.1. Multi-Class Classification

Suppose the feature space is $\mathcal{X} \subseteq \mathbb{R}^d$ with $d$ dimensions and the label space is $\mathcal{Y} = \{1, 2, \ldots, k\}$ with $k$ classes. The instance $x \in \mathcal{X}$ with its class label $y \in \mathcal{Y}$ is sampled from an unknown probability distribution with density $p(x, y)$. Ordinary multi-class classification aims to induce a learning function $f(x): \mathbb{R}^d \rightarrow \mathbb{R}^k$ that minimizes the classification risk:

$$R(f) = \mathbb{E}_{p(x, y)}\big[\mathcal{L}(f(x), y)\big], \tag{1}$$

where $\mathcal{L}(f(x), y)$ is a multi-class loss function. The predicted label is given as $\hat{y} = \arg\max_{y \in \mathcal{Y}} f_y(x)$, where $f_y(x)$ is the $y$-th coordinate of $f(x)$.
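
As a minimal illustration of the classification risk in Eq. (1) and the argmax prediction rule (the cross-entropy loss here is just one possible choice of the multi-class loss, and the empirical mean is the usual Monte Carlo approximation of the expectation):

```python
import numpy as np

def predict(f_x):
    """Predicted label: argmax over the k coordinates of f(x)."""
    return int(np.argmax(f_x))

def cross_entropy(f_x, y):
    """Softmax cross entropy as an example multi-class loss L(f(x), y)."""
    z = f_x - np.max(f_x)                        # numerical stability
    log_probs = z - np.log(np.sum(np.exp(z)))    # log softmax
    return float(-log_probs[y])

def empirical_risk(scores, labels, loss):
    """Empirical approximation of R(f) = E_{p(x,y)}[L(f(x), y)]."""
    return float(np.mean([loss(fx, y) for fx, y in zip(scores, labels)]))

scores = [np.array([5.0, 0.0]), np.array([0.0, 5.0])]
labels = [0, 1]
assert predict(scores[0]) == 0
assert empirical_risk(scores, labels, cross_entropy) < 0.1  # near-perfect fit
```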

2.2. Complementary-Label Learning

Suppose the dataset for complementary-label learning is denoted by $\{(x_i, \bar{y}_i)\}_{i=1}^{n}$, where $\bar{y}_i \in \mathcal{Y}$ is a complementary label of $x_i$, and each complementarily labeled example is sampled from $\bar{p}(x, \bar{y})$. Ishida et al. (2017; 2019) assumed that $\bar{p}(x, \bar{y})$ is expressed as:

$$\bar{p}(x, \bar{y}) = \frac{1}{k-1} \sum_{y \neq \bar{y}} p(x, y).$$
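
Under the uniform assumption of Ishida et al. (2017), the complementary label is effectively drawn uniformly from the k − 1 incorrect classes. A quick simulation on a toy class prior (the prior values below are made up for illustration) confirms the implied marginal distribution of complementary labels:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4
prior = np.array([0.1, 0.2, 0.3, 0.4])  # toy class prior p(y)

# Analytic complementary-label marginal under the uniform assumption:
# p_bar(y_bar) = (1/(k-1)) * sum_{y != y_bar} p(y)
p_bar = np.array([(prior.sum() - prior[c]) / (k - 1) for c in range(k)])

# Simulation: draw y from the prior, then y_bar uniformly from the other k-1 classes.
n = 200_000
ys = rng.choice(k, size=n, p=prior)
offsets = rng.integers(1, k, size=n)   # shift in 1..k-1, never 0
y_bars = (ys + offsets) % k            # a uniformly random incorrect class
empirical = np.bincount(y_bars, minlength=k) / n

assert np.allclose(empirical, p_bar, atol=0.01)
```

The cyclic-shift trick `(y + offset) % k` with `offset` in 1..k−1 is just a compact way to sample uniformly from the incorrect classes.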

Funding

- This research was supported by the National Research Foundation, Singapore under its AI Singapore Programme (AISG Award No: AISG-RP-2019-0013), National Satellite of Excellence in Trustworthy Software Systems (Award No: NSOE-TSS2019-01), and NTU
- BH was partially supported by HKBU Tier-1 Start-up Grant and HKBU CSD Start-up Grant
- GN and MS were supported by JST AIP Acceleration Research Grant Number JPMJCR20U3, Japan

References

- Bao, H., Niu, G., and Sugiyama, M. Classification from pairwise similarity and unlabeled data. In ICML, pp. 452–461, 2018.
- Bartlett, P. L. and Mendelson, S. Rademacher and gaussian complexities: Risk bounds and structural results. JMLR, 3(11):463–482, 2002.
- Blake, C. L. and Merz, C. J. UCI repository of machine learning databases, 1998. URL http://archive.ics.uci.edu/ml/index.php.
- Han, B., Yao, J., Niu, G., Zhou, M., Tsang, I., Zhang, Y., and Sugiyama, M. Masking: A new perspective of noisy supervision. In NeurIPS, pp. 5836–5846, 2018a.
- Han, B., Yao, Q., Yu, X., Niu, G., Xu, M., Hu, W., Tsang, I., and Sugiyama, M. Co-teaching: Robust training of deep neural networks with extremely noisy labels. In NeurIPS, pp. 8527–8537, 2018b.
- He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In CVPR, pp. 770–778, 2016.
- Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K. Q. Densely connected convolutional networks. In CVPR, pp. 4700–4708, 2017.
- Ishida, T., Niu, G., Hu, W., and Sugiyama, M. Learning from complementary labels. In NeurIPS, pp. 5644–5654, 2017.
- Ishida, T., Niu, G., and Sugiyama, M. Binary classification for positive-confidence data. In NeurIPS, pp. 5917–5928, 2018.
- Chapelle, O., Scholkopf, B., and Zien, A. Semi-Supervised Learning. MIT Press, 2006.
- Clanuwat, T., Bober-Irizar, M., Kitamoto, A., Lamb, A., Yamamoto, K., and Ha, D. Deep learning for classical japanese literature. arXiv preprint arXiv:1812.01718, 2018.
- Cour, T., Sapp, B., and Taskar, B. Learning from partial labels. JMLR, 12(5):1501–1536, 2011.
- du Plessis, M. C., Niu, G., and Sugiyama, M. Analysis of learning from positive and unlabeled data. In NeurIPS, pp. 703–711, 2014.
- du Plessis, M. C., Niu, G., and Sugiyama, M. Convex formulation for learning from positive and unlabeled data. In ICML, pp. 1386–1394, 2015.
- Elkan, C. and Noto, K. Learning classifiers from only positive and unlabeled data. In KDD, pp. 213–220, 2008.
- Feng, L. and An, B. Leveraging latent label distributions for partial label learning. In IJCAI, pp. 2107–2113, 2018.
- Feng, L. and An, B. Partial label learning with self-guided retraining. In AAAI, pp. 3542–3549, 2019a.
- Feng, L. and An, B. Partial label learning by semantic difference maximization. In IJCAI, pp. 2294–2300, 2019b.
- Ishida, T., Niu, G., Menon, A. K., and Sugiyama, M. Complementary-label learning for arbitrary losses and models. In ICML, pp. 2971–2980, 2019.
- Kaneko, T., Sato, I., and Sugiyama, M. Online multiclass classification based on prediction margin for partial feedback. arXiv preprint arXiv:1902.01056, 2019.
- Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. In ICLR, 2015.
- Kiryo, R., Niu, G., du Plessis, M. C., and Sugiyama, M. Positive-unlabeled learning with non-negative risk estimator. In NeurIPS, pp. 1674–1684, 2017.
- Krizhevsky, A., Hinton, G., et al. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
- Lang, K. Newsweeder: Learning to filter netnews. In ICML, 1995.
- LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
- Li, Y.-F. and Liang, D.-M. Safe semi-supervised learning: a brief introduction. Frontiers of Computer Science, 13(4): 669–676, 2019.
- Ghosh, A., Kumar, H., and Sastry, P. Robust loss functions under label noise for deep neural networks. In AAAI, 2017.
- Lu, N., Niu, G., Menon, A. K., and Sugiyama, M. On the minimal supervision for training any binary classifier from only unlabeled data. In ICLR, 2019.
- Lu, N., Zhang, T., Niu, G., and Sugiyama, M. Mitigating overfitting in supervised classification from two unlabeled datasets: A consistent risk correction approach. In AISTATS, 2020.
- Menon, A., Van Rooyen, B., Ong, C. S., and Williamson, B. Learning from corrupted binary labels via classprobability estimation. In ICML, pp. 125–134, 2015.
- Menon, A. K., Rawat, A. S., Reddi, S. J., and Kumar, S. Can gradient clipping mitigate label noise? In ICLR, 2020.
- Miyato, T., Maeda, S.-i., Koyama, M., and Ishii, S. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. TPAMI, 41(8): 1979–1993, 2018.
- Niu, G., Jitkrittum, W., Dai, B., Hachiya, H., and Sugiyama, M. Squared-loss mutual information regularization: A novel information-theoretic approach to semi-supervised learning. In ICML, pp. 10–18, 2013.
- Rezaei, M., Yang, H., and Meinel, C. Recurrent generative adversarial network for learning imbalanced medical image semantic segmentation. Multimedia Tools and Applications, pp. 1–20, 2019.
- Sakai, T., du Plessis, M. C., Niu, G., and Sugiyama, M. Semi-supervised classification based on classification from positive and unlabeled data. In ICML, pp. 2998– 3006, 2017.
- Sakai, T., Niu, G., and Sugiyama, M. Semi-supervised auc optimization based on positive-unlabeled learning. MLJ, 107(4):767–794, 2018.
- Wang, X., Kodirov, E., Hua, Y., and Robertson, N. M. Improving mae against cce under label noise. arXiv preprint arXiv:1903.12141, 2019.
- Wei, H., Feng, L., Chen, X., and An, B. Combating noisy labels by agreement: A joint training method with coregularization. In CVPR, June 2020.
- Xia, X., Liu, T., Wang, N., Han, B., Gong, C., Niu, G., and Sugiyama, M. Are anchor points really indispensable in label-noise learning? In NeurIPS, pp. 6835–6846, 2019.
- Xiao, H., Rasul, K., and Vollgraf, R. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.
- Yu, X., Liu, T., Gong, M., and Tao, D. Learning with biased complementary labels. In ECCV, pp. 68–83, 2018.
- Zhang, M.-L. and Yu, F. Solving the partial label learning problem: An instance-based approach. In IJCAI, pp. 4048–4054, 2015.
- Zhang, T. Statistical analysis of some multi-category large margin classification methods. JMLR, 5(10):1225–1251, 2004.
- Zhang, Z. and Sabuncu, M. Generalized cross entropy loss for training deep neural networks with noisy labels. In NeurIPS, pp. 8778–8788, 2018.
- Zhou, Z. A brief introduction to weakly supervised learning. National Science Review, 5(1):44–53, 2018.
- Zhu, X. and Goldberg, A. B. Introduction to semisupervised learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 3(1):1–130, 2009.
