Can Cross Entropy Loss Be Robust to Label Noise?

IJCAI, pp. 2206-2212, 2020.

DOI: https://doi.org/10.24963/ijcai.2020/305

Abstract:

Trained with the standard cross entropy loss, deep neural networks can achieve great performance on correctly labeled data. However, if the training data is corrupted with label noise, deep models tend to overfit the noisy labels, thereby achieving poor generalization performance. To remedy this issue, several loss functions have been proposed...

Introduction
  • Deep Neural Networks (DNNs) have achieved great advances over the past years.
  • DNNs can achieve great classification performance.
  • Incorrect labels in large-scale datasets are often inevitable.
  • It can be more beneficial to have datasets with more but noisier labels than with fewer but more accurate labels [Khetan et al., 2017].
  • Training a robust classifier in the presence of label noise is an increasingly valued task.
Highlights
  • Deep Neural Networks (DNNs) have achieved great advances over the past years
  • We present a detailed theoretical analysis to certify the robustness of Taylor Cross Entropy against label noise
  • In all cases, Lt-CE is superior to CE, Mean Absolute Error and Mean Squared Error
  • Lt-CE is derived from CE, with MAE and MSE as average components
  • We propose a general framework dubbed Taylor cross entropy loss to train deep models in the presence of label noise
  • We present a detailed theoretical analysis to certify the robustness of this framework
Results
  • Table 1 reports the detailed experimental results of each loss function on the benchmark datasets.
  • In Table 1, •/◦ indicates whether the performance of the proposed approach is statistically superior/inferior to the comparing approaches on each dataset.
  • Out of the total 240 cases, the proposed approach is statistically superior to the comparing approaches in 83.33% of cases, and inferior in only 8.33% of cases.
  • In all cases, Lt-CE is superior to CE, MAE and MSE.
  • Since Lt-CE is derived from CE with MAE and MSE as average components, the authors may infer that Lt-CE maintains their advantages and surpasses them.
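The statistical comparisons above rely on a paired t-test over 5 trials at the 0.05 significance level. A minimal sketch of that test is below; the per-trial accuracies are hypothetical numbers for illustration, not values from Table 1:

```python
import math

# Hypothetical test accuracies (%) over 5 trials for two losses on one dataset
ours = [91.2, 91.0, 91.5, 90.8, 91.3]
other = [89.9, 90.1, 90.4, 89.7, 90.0]

# Paired t-test: work on the per-trial differences
d = [a - b for a, b in zip(ours, other)]
n = len(d)
mean_d = sum(d) / n
var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)  # sample variance
t_stat = mean_d / math.sqrt(var_d / n)

T_CRIT = 2.776  # two-tailed critical value, alpha = 0.05, df = n - 1 = 4
superior = t_stat > T_CRIT    # the paper's "•": statistically superior
inferior = t_stat < -T_CRIT   # the paper's "◦": statistically inferior
```

With these illustrative numbers the differences are large and consistent, so the test flags the first method as statistically superior.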
Conclusion
  • The authors propose a general framework dubbed Taylor cross entropy loss to train deep models in the presence of label noise.
  • The authors' framework enables weighting the extent of fitting the training labels by controlling the order of the Taylor series of the Categorical Cross Entropy (CCE) loss, and reveals the intrinsic relationships between CCE and other loss functions, such as Mean Absolute Error (MAE) and Mean Squared Error (MSE).
  • The authors will explore whether there exist robust loss functions that do not include any hyper-parameters.
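The idea of controlling the fitting extent via the Taylor order can be sketched numerically, assuming the loss truncates the series −log(p) = Σ_{j≥1} (1−p)^j / j at order t. This is an illustrative reconstruction of the mechanism, not the paper's exact implementation:

```python
import numpy as np

def taylor_cross_entropy(probs, labels, order=2):
    """Sketch of a Taylor cross entropy: truncate -log(p) = sum_{j>=1} (1-p)^j / j
    at `order` terms, applied to the predicted probability of the true class.
    Lower orders fit the labels less aggressively; order -> infinity recovers CE."""
    p_true = probs[np.arange(len(labels)), labels]  # p_y for each sample
    loss = np.zeros_like(p_true)
    for j in range(1, order + 1):
        loss += (1.0 - p_true) ** j / j
    return loss.mean()
```

With order = 1 the loss reduces to the mean of 1 − p_y, which for one-hot targets coincides with MAE up to a constant factor, while large orders approach the standard cross entropy −log p_y.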
Tables
  • Table1: Average test accuracy (%) and standard deviation (over 5 trials) on benchmark datasets with symmetric label noise and asymmetric label noise. The best results are highlighted in bold. In addition, •/◦ indicates whether the performance of our approach is statistically superior/inferior to the comparing approach on each dataset (paired t-test at 0.05 significance level)
  • Table2: Summary of benchmark datasets and models
Related work
  • In this section, we briefly review existing works on learning in the presence of label noise.

    Noise rate estimation. Some of the early works [Natarajan et al, 2013; Sukhbaatar and Fergus, 2014; Menon et al, 2015; Patrini et al, 2017] aim to estimate the label transition matrix (sometimes called confusion matrix), and use it to train the target model. For this type of approach, the classification performance hinges on the quality of noise rate estimation [Goldberger and Ben-Reuven, 2017; Hendrycks et al, 2018; Han et al, 2018b; Xia et al, 2019]. However, noise rate estimation is challenging, especially on datasets with a huge number of classes.

    Robust loss functions. Designing loss functions that are robust to label noise has received increasing attention from researchers. The first work is from [Ghosh et al., 2015], which demonstrates that binary loss functions satisfying the symmetric condition ℓ(z) + ℓ(−z) = c, where c is a constant (e.g., ramp loss and sigmoid loss), are robust to label noise for binary classification. Then, for multi-class classification, loss functions that satisfy the symmetric condition ∑_{j=1}^{k} L(f(x), j) = C, where k is the number of classes and C is a constant, are shown to be noise-tolerant [Ghosh et al., 2017].
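The multi-class symmetric condition can be checked numerically. The sketch below (hypothetical helper names) verifies that MAE against one-hot targets sums to a constant over all labels, while cross entropy does not:

```python
import numpy as np

def mae(probs, j):
    # MAE against the one-hot target for class j: ||e_j - p||_1 = 2 * (1 - p_j)
    target = np.eye(len(probs))[j]
    return np.abs(target - probs).sum()

def ce(probs, j):
    # Standard cross entropy against class j: -log p_j
    return -np.log(probs[j])

def label_sum(loss, probs):
    # Left-hand side of the symmetric condition: sum the loss over all k labels
    return sum(loss(probs, j) for j in range(len(probs)))

p1 = np.array([0.7, 0.2, 0.1])
p2 = np.array([0.4, 0.4, 0.2])
# MAE: label_sum equals 2*(k-1) = 4 for ANY prediction -> symmetric
# CE: label_sum changes with the prediction -> not symmetric
```

Because the MAE sum is the same constant regardless of the prediction, mislabeling cannot systematically bias the risk, which is the intuition behind the noise-tolerance result.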
Funding
  • This research is supported by NSOE-TSS2019-01, AISG-RP2019-0013, and NTU
Reference
  • [Berthon et al., 2020] Antonin Berthon, Bo Han, Gang Niu, Tongliang Liu, and Masashi Sugiyama. Confidence scores make instance-dependent label-noise learning possible. arXiv preprint arXiv:2001.03772, 2020.
  • [Clanuwat et al., 2018] Tarin Clanuwat, Mikel Bober-Irizar, Asanobu Kitamoto, Alex Lamb, Kazuaki Yamamoto, and David Ha. Deep learning for classical Japanese literature. arXiv preprint arXiv:1812.01718, 2018.
  • [Ghosh et al., 2015] Aritra Ghosh, Naresh Manwani, and PS Sastry. Making risk minimization tolerant to label noise. Neurocomputing, 160:93–107, 2015.
  • [Ghosh et al., 2017] Aritra Ghosh, Himanshu Kumar, and PS Sastry. Robust loss functions under label noise for deep neural networks. In AAAI, 2017.
  • [Goldberger and Ben-Reuven, 2017] Jacob Goldberger and Ehud Ben-Reuven. Training deep neural-networks using a noise adaptation layer. In ICLR, 2017.
  • [Han et al., 2018a] Bo Han, Gang Niu, Jiangchao Yao, Xingrui Yu, Miao Xu, Ivor Tsang, and Masashi Sugiyama. Pumpout: A meta approach to robust deep learning with noisy labels. arXiv preprint arXiv:1809.11008, 2018.
  • [Han et al., 2018b] Bo Han, Jiangchao Yao, Gang Niu, Mingyuan Zhou, Ivor Tsang, Ya Zhang, and Masashi Sugiyama. Masking: A new perspective of noisy supervision. In NeurIPS, pages 5836–5846, 2018.
  • [Han et al., 2018c] Bo Han, Quanming Yao, Xingrui Yu, Gang Niu, Miao Xu, Weihua Hu, Ivor Tsang, and Masashi Sugiyama. Co-teaching: Robust training of deep neural networks with extremely noisy labels. In NeurIPS, pages 8527–8537, 2018.
  • [He et al., 2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, pages 770–778, 2016.
  • [Hendrycks et al., 2018] Dan Hendrycks, Mantas Mazeika, Duncan Wilson, and Kevin Gimpel. Using trusted data to train deep networks on labels corrupted by severe noise. In NeurIPS, pages 10456–10465, 2018.
  • [Jiang et al., 2017] Lu Jiang, Zhengyuan Zhou, Thomas Leung, Li-Jia Li, and Li Fei-Fei. MentorNet: Learning data-driven curriculum for very deep neural networks on corrupted labels. arXiv preprint arXiv:1712.05055, 2017.
  • [Khetan et al., 2017] Ashish Khetan, Zachary C Lipton, and Anima Anandkumar. Learning from noisy singly-labeled data. arXiv preprint arXiv:1712.04577, 2017.
  • [Kingma and Ba, 2014] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • [Krizhevsky et al., 2009] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. Technical Report, 2009.
  • [LeCun et al., 1998] Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  • [Menon et al., 2015] Aditya Menon, Brendan Van Rooyen, Cheng Soon Ong, and Bob Williamson. Learning from corrupted binary labels via class-probability estimation. In ICML, pages 125–134, 2015.
  • [Menon et al., 2020] Aditya Krishna Menon, Ankit Singh Rawat, Sashank J. Reddi, and Sanjiv Kumar. Can gradient clipping mitigate label noise? In ICLR, 2020.
  • [Natarajan et al., 2013] Nagarajan Natarajan, Inderjit S Dhillon, Pradeep K Ravikumar, and Ambuj Tewari. Learning with noisy labels. In NeurIPS, pages 1196–1204, 2013.
  • [Patrini et al., 2017] Giorgio Patrini, Alessandro Rozza, Aditya Krishna Menon, Richard Nock, and Lizhen Qu. Making deep neural networks robust to label noise: A loss correction approach. In CVPR, pages 1944–1952, 2017.
  • [Sukhbaatar and Fergus, 2014] Sainbayar Sukhbaatar and Rob Fergus. Learning from noisy labels with deep neural networks. arXiv preprint arXiv:1406.2080, 2(3):4, 2014.
  • [Tanaka et al., 2018] Daiki Tanaka, Daiki Ikami, Toshihiko Yamasaki, and Kiyoharu Aizawa. Joint optimization framework for learning with noisy labels. In CVPR, pages 5552–5560, 2018.
  • [Wang et al., 2019] Yisen Wang, Xingjun Ma, Zaiyi Chen, Yuan Luo, Jinfeng Yi, and James Bailey. Symmetric cross entropy for robust learning with noisy labels. In ICCV, pages 322–330, 2019.
  • [Wei et al., 2020] Hongxin Wei, Lei Feng, Xiangyu Chen, and Bo An. Combating noisy labels by agreement: A joint training method with co-regularization. arXiv preprint arXiv:2003.02752v3, 2020.
  • [Xia et al., 2019] Xiaobo Xia, Tongliang Liu, Nannan Wang, Bo Han, Chen Gong, Gang Niu, and Masashi Sugiyama. Are anchor points really indispensable in label-noise learning? In NeurIPS, pages 6835–6846, 2019.
  • [Xiao et al., 2017] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.
  • [Yang et al., 2019] Hansi Yang, Quanming Yao, Bo Han, and Gang Niu. Searching to exploit memorization effect in learning from corrupted labels. arXiv preprint arXiv:1911.02377, 2019.
  • [Yi and Wu, 2019] Kun Yi and Jianxin Wu. Probabilistic end-to-end noise correction for learning with noisy labels. In CVPR, pages 7017–7025, 2019.
  • [Yu et al., 2019] Xingrui Yu, Bo Han, Jiangchao Yao, Gang Niu, Ivor W Tsang, and Masashi Sugiyama. How does disagreement help generalization against label corruption? In ICML, pages 7164–7173, 2019.
  • [Zhang and Sabuncu, 2018] Zhilu Zhang and Mert Sabuncu. Generalized cross entropy loss for training deep neural networks with noisy labels. In NeurIPS, pages 8778–8788, 2018.