Multifaceted Uncertainty Estimation for Label-Efficient Deep Learning
NeurIPS 2020
We present a novel multi-source uncertainty prediction approach that enables deep learning (DL) models to be actively trained with much less labeled data. By leveraging the second-order uncertainty representation provided by subjective logic (SL), we conduct evidence-based theoretical analysis and formally decompose the predicted entropy …
- The model may provide misleading information that makes data sampling from a high-dimensional search space even more difficult.
- Complex data may contain a large number of classes
- Deep learning (DL) models have established a dominant position among supervised learning models by achieving state-of-the-art performance in various application domains. However, such an advantage emerges only when a huge amount of labeled training data is available. This limitation slows down the adoption of DL, especially in knowledge-rich domains, such as medicine and biology, where large-scale labeled samples are too expensive to obtain from well-trained experts
- We develop a novel loss function that augments DL-based evidence prediction with uncertainty anchor sample identification
- We present a novel active deep learning model that systematically leverages two distinct sources of uncertainty, vacuity and dissonance, to effectively explore a large and high-dimensional data space for label-efficient training of DL models
- The proposed active deep learning (ADL) model benefits from the evidence-based entropy decomposition that follows from our theoretical analysis of belief vacuity and belief dissonance under the subjective logic (SL) framework
- The multi-source uncertainty can be accurately estimated through a novel loss function that augments DL-based evidence prediction with vacuity-aware regularization of the model parameters
- The authors report the experimental results on both synthetic and real-world data. The former aims to verify the key theoretical properties of ADL, including entropy decomposition and multi-source uncertainty prediction, and how these properties contribute to AL.
- In each AL iteration, the authors sample one data instance
- This is fundamentally different from some recent DL-based AL methods, such as [3, 17], which perform batch-mode sampling with a large batch size.
- These models are not applicable when only a limited label budget is available, which is the case in many specialized domains where labeling is very costly.
- The authors use LeNet with ReLU activations
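The single-instance sampling protocol noted above (one query per AL iteration, suited to tight label budgets) can be sketched as a generic loop. The `acquire` and `train` callables are illustrative placeholders, not the authors' implementation:

```python
import numpy as np

def active_learning_loop(X_pool, acquire, train, budget):
    """Generic single-instance active learning loop: one query per
    iteration. `acquire` scores pool points by an uncertainty measure
    (e.g. vacuity/dissonance); `train` refits the model on the labeled
    indices. Both are placeholders for illustration only."""
    labeled = []                            # indices the oracle has labeled
    unlabeled = list(range(len(X_pool)))    # remaining pool indices
    for _ in range(budget):
        model = train(labeled)                          # refit on labels so far
        scores = acquire(model, X_pool[unlabeled])      # score the pool
        pick = unlabeled[int(np.argmax(scores))]        # most uncertain point
        labeled.append(pick)                            # query the oracle
        unlabeled.remove(pick)
    return labeled
```

For example, with a toy acquisition function that scores each point by its first feature, the loop greedily picks the highest-scoring remaining point each round.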
- Uncertainty Quantification in Belief/Evidence Theory: In belief/evidence theory, uncertainty reasoning has been substantially explored through Fuzzy Logic, Dempster-Shafer Theory (DST), and Subjective Logic (SL). Unlike the efforts made in ML/DL, belief theorists have focused on reasoning about the inherent uncertainty in information resulting from unreliable, incomplete, deceptive, and/or conflicting evidence. SL considers uncertainty in subjective opinions in terms of vacuity (i.e., lack of evidence) and vagueness (i.e., failure to discriminate a belief state). Vacuity has been used as an effective vehicle to detect OOD queries through evidence learning, achieved under the typical DL setting with ample training samples. Recently, other dimensions of uncertainty have been studied, such as dissonance (due to conflicting evidence) and consonance (due to evidence about composite subsets of state values).
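The two SL quantities discussed above can be computed directly from per-class evidence. A minimal sketch, assuming the standard SL mapping (with K classes and Dirichlet parameters alpha_k = e_k + 1, belief mass b_k = e_k / S and vacuity u = K / S where S = sum of the alphas) and the dissonance formula of Jøsang et al. (2018); names are illustrative, not the paper's code:

```python
import numpy as np

def vacuity_and_dissonance(evidence):
    """Vacuity and dissonance of a subjective opinion from per-class
    evidence, under the standard SL mapping (illustrative sketch)."""
    e = np.asarray(evidence, dtype=float)
    K = e.size
    S = e.sum() + K            # Dirichlet strength: sum of alpha_k = e_k + 1
    b = e / S                  # belief masses
    u = K / S                  # vacuity: uncertainty from lack of evidence

    diss = 0.0                 # dissonance: uncertainty from conflicting evidence
    for i in range(K):
        others = np.delete(b, i)
        denom = others.sum()
        if denom > 0:
            # relative mass balance Bal(b_j, b_i) in [0, 1]
            bal = 1.0 - np.abs(others - b[i]) / (others + b[i] + 1e-12)
            diss += b[i] * np.sum(others * bal) / denom
    return float(u), float(diss)

# No evidence at all -> maximal vacuity, zero dissonance
print(vacuity_and_dissonance([0, 0, 0]))        # (1.0, 0.0)
# Strong but conflicting evidence -> low vacuity, high dissonance
print(vacuity_and_dissonance([50, 50, 0]))
```

The two cases illustrate why the paper treats these as distinct sources: vacuity flags out-of-distribution regions with no evidence, while dissonance flags in-distribution regions near class boundaries where evidence conflicts.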
Uncertainty in Deep Learning: In DL, aleatoric uncertainty (AU) and epistemic uncertainty (EU) have been studied using Bayesian Neural Networks (BNNs) for computer vision. AU consists of homoscedastic uncertainty (i.e., constant errors for different inputs) and heteroscedastic uncertainty (i.e., different errors for different inputs). A Bayesian DL (BDL) framework was presented to estimate both AU and EU simultaneously in regression (e.g., depth regression) and classification settings (e.g., semantic segmentation). A new type of uncertainty, called distributional uncertainty, is defined based on the distributional mismatch between the test and training data distributions. Beyond exploring new sources of uncertainty, recent works also focus on better estimating the well-known first-order uncertainty, predictive entropy, in DL models through calibration or ensemble methods. Even though these recent efforts offer abundant uncertainty measurements for DL, how to leverage this uncertainty information for better active sampling remains underexplored. For example, while distributional uncertainty can be used for data sampling in AL, the prior network needs to be properly trained, as its parameters must encapsulate knowledge of both the in-domain distribution and the decision boundary, making it less suitable for AL. This is also evidenced by our experimental results on real-world data. Noise-Contrastive Priors can also be used to support better exploration in data sampling, as they encourage high uncertainty near the boundary of the training data. However, in the initial phase of AL, when the training data is very limited, this measure can be insufficient to explore data samples far away from the training data.
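The distinction between first-order predictive entropy and distributional uncertainty can be made concrete with the standard decomposition used in the prior-network line of work (cf. Malinin & Gales, 2018): total uncertainty H[E[p]] splits into expected data uncertainty E[H[p]] plus a distributional (mutual-information) term. A Monte-Carlo sketch for a Dirichlet over class probabilities, with illustrative names (this is the generic decomposition, not the paper's evidence-based one):

```python
import numpy as np

def entropy_decomposition(alpha, n_samples=50_000, seed=0):
    """Monte-Carlo first-order entropy decomposition for Dirichlet(alpha):
    total = H[E[p]], data = E[H[p]], distributional = total - data."""
    rng = np.random.default_rng(seed)
    a = np.asarray(alpha, dtype=float)
    p_mean = a / a.sum()
    # entropy of the mean prediction (total uncertainty)
    total = -np.sum(p_mean * np.log(p_mean + 1e-12))
    # expected entropy over sampled categoricals (data uncertainty)
    samples = rng.dirichlet(a, size=n_samples)
    data = -np.mean(np.sum(samples * np.log(samples + 1e-12), axis=1))
    return total, data, total - data

# Flat Dirichlet: large distributional gap despite uniform mean prediction
total, data, mi = entropy_decomposition([1.0, 1.0, 1.0])
print(f"total={total:.3f} data={data:.3f} distributional={mi:.3f}")  # total = ln 3 ≈ 1.099
```

A concentrated Dirichlet such as `[50, 50, 50]` has the same total entropy but a much smaller distributional term, which is why predictive entropy alone cannot separate "confidently uncertain" from "ignorant" inputs.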
- Weishi Shi and Qi Yu are supported in part by an ONR award N00014-18-1-2875 and an NSF IIS award IIS-1814450
- Xujiang Zhao and Feng Chen are supported by the NSF under Grants No. 1815696 and 1750911
- The views and conclusions contained in this paper are those of the authors and should not be interpreted as representing any funding agency
Such an issue may become more severe when training a neural network (NN)/DL active learner due to model overfitting, as described above. Figure 1(a) shows the entropy predicted by an NN active learner trained using nine labeled data samples, shown in black and evenly distributed across three classes. A standard softmax layer is used in the output layer to generate class probabilities over the three classes, each of which is a mixture of two Gaussians
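The overconfidence issue behind Figure 1(a) can be reproduced with toy logits: because a network's logits typically grow with distance from the decision boundary, softmax entropy drops for points far outside the training data instead of rising. A small sketch (toy numbers, not the paper's network):

```python
import numpy as np

def softmax_entropy(logits):
    """Predictive entropy of a softmax over the given logits."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()                      # stabilize the exponentials
    p = np.exp(z) / np.exp(z).sum()
    return -np.sum(p * np.log(p + 1e-12))

# Near the decision boundary: logits are close, entropy is high
print(softmax_entropy([0.1, 0.0, -0.1]))
# Far from the data: logits are extreme, entropy collapses toward zero,
# so entropy-based sampling never queries such regions
print(softmax_entropy([10.0, 0.0, -10.0]))
```

This is exactly the misleading signal the summary mentions: a softmax learner reports near-zero uncertainty in regions it has never seen, which vacuity is designed to catch.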
To mimic the existence of OOD, we generate three Gaussian mixtures. Each mixture consists of a major cluster and a smaller (i.e., OOD) cluster with 750 and 50 samples, respectively. We center the major Gaussian components from each class and place their corresponding OOD components away from them
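The synthetic setup above can be sketched as follows. The cluster centers and covariances are illustrative guesses, not the paper's exact values; only the class/cluster counts (three classes, 750 major + 50 OOD samples each) come from the description:

```python
import numpy as np

def make_mixture_data(seed=0):
    """Three classes, each a mixture of a major cluster (750 pts) and a
    small OOD cluster (50 pts) placed away from the class centers.
    Centers and spread are illustrative, not from the paper."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    # major components of the three classes, near the data center
    major_centers = np.array([[0.0, 2.0], [-2.0, -1.0], [2.0, -1.0]])
    # matching OOD components, pushed away from the center
    ood_centers = 4.0 * major_centers
    for c in range(3):
        X.append(rng.normal(major_centers[c], 0.5, size=(750, 2)))
        X.append(rng.normal(ood_centers[c], 0.5, size=(50, 2)))
        y.extend([c] * 800)
    return np.vstack(X), np.array(y)

X, y = make_mixture_data()
print(X.shape, y.shape)   # (2400, 2) (2400,)
```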
5.2 Real data. The real-world experiment is conducted on three datasets, MNIST, notMNIST, and CIFAR-10, all of which have ten classes. To mimic the real-world AL scenario, we leave 2–5 classes out of the initial training set
We compare the proposed model with EDL (entropy, vacuity+dissonance), BALD (epistemic), PriorNN (distributional uncertainty), and softmax (entropy, random), where the terms in parentheses are the uncertainty measures used for sampling. Figures 4 and 5 show that ADL consistently outperforms the other models on all three datasets. The advantages of ADL are twofold
- Burr Settles. Active learning literature survey. Technical report, University of Wisconsin–Madison, Department of Computer Sciences, 2009.
- Yarin Gal, Riashat Islam, and Zoubin Ghahramani. Deep Bayesian active learning with image data. In Proceedings of the 34th International Conference on Machine Learning, pages 1183–1192. JMLR.org, 2017.
- Ozan Sener and Silvio Savarese. Active learning for convolutional neural networks: A core-set approach. In International Conference on Learning Representations, 2018.
- Audun Jøsang. Subjective logic. Springer, 2016.
- Clarence W De Silva. Intelligent control: fuzzy logic applications. CRC press, 2018.
- Kari Sentz, Scott Ferson, et al. Combination of evidence in Dempster-Shafer theory, volume 4015. Sandia National Laboratories, 2002.
- Murat Sensoy, Lance Kaplan, and Melih Kandemir. Evidential deep learning to quantify classification uncertainty. In Advances in Neural Information Processing Systems, pages 3179– 3189, 2018.
- Audun Jøsang, Jin-Hee Cho, and Feng Chen. Uncertainty characteristics of subjective opinions. In Fusion, pages 1998–2005. IEEE, 2018.
- Yarin Gal. Uncertainty in deep learning. PhD thesis, University of Cambridge, 2016.
- Alex Kendall and Yarin Gal. What uncertainties do we need in bayesian deep learning for computer vision? In NIPS, pages 5574–5584, 2017.
- Andrey Malinin and Mark Gales. Predictive uncertainty estimation via prior networks. arXiv preprint arXiv:1802.10501, 2018.
- Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q Weinberger. On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning, pages 1321–1330. JMLR.org, 2017.
- Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in neural information processing systems, pages 6402–6413, 2017.
- Danijar Hafner, Dustin Tran, Timothy Lillicrap, Alex Irpan, and James Davidson. Reliable uncertainty estimates in deep neural networks using noise contrastive priors. arXiv preprint, 2018.
- Dan Wang and Yi Shang. A new active labeling method for deep learning. In 2014 International joint conference on neural networks (IJCNN), pages 112–119. IEEE, 2014.
- Keze Wang, Dongyu Zhang, Ya Li, Ruimao Zhang, and Liang Lin. Cost-effective active learning for deep image classification. IEEE Transactions on Circuits and Systems for Video Technology, 27(12):2591–2600, 2016.
- Jordan T Ash, Chicheng Zhang, Akshay Krishnamurthy, John Langford, and Alekh Agarwal. Deep batch active learning by diverse, uncertain gradient lower bounds. In ICLR, 2020.
- Nils J Nilsson. Probabilistic logic. Artificial intelligence, 28(1):71–87, 1986.
- Glenn Shafer. A mathematical theory of evidence, volume 42. Princeton university press, 1976.
- Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. In Advances in neural information processing systems, pages 3630–3638, 2016.
- Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In Advances in Neural Information Processing Systems, pages 7167–7177, 2018.