Multifaceted Uncertainty Estimation for Label-Efficient Deep Learning

NeurIPS 2020

Abstract

We present a novel multi-source uncertainty prediction approach that enables deep learning (DL) models to be actively trained with much less labeled data. By leveraging the second-order uncertainty representation provided by subjective logic (SL), we conduct evidence-based theoretical analysis and formally decompose the predicted entropy …

Introduction
  • Deep learning (DL) models have established a dominant position among supervised learning models by achieving state-of-the-art performance in various application domains.
  • However, this advantage only emerges when a huge amount of labeled training data is available.
  • The model may provide misleading information that makes data sampling from a high-dimensional search space even more difficult.
  • Complex data may contain a large number of classes.
Highlights
  • Deep learning (DL) models have established a dominant position among supervised learning models by achieving state-of-the-art performance in various application domains. This advantage only emerges when a huge amount of labeled training data is available, which slows down the adoption of DL, especially in knowledge-rich domains such as medicine and biology, where large-scale labeled samples are too expensive to obtain from well-trained experts.
  • We develop a novel loss function that augments DL-based evidence prediction with uncertainty anchor sample identification.
  • We present a novel active deep learning model that systematically leverages two distinct sources of uncertainty, vacuity and dissonance, to effectively explore a large and high-dimensional data space for label-efficient training of DL models.
  • The proposed active deep learning (ADL) model benefits from the evidence-based entropy decomposition that follows from our theoretical analysis of belief vacuity and belief dissonance under the subjective logic (SL) framework.
  • The multi-source uncertainty can be accurately estimated through a novel loss function that augments DL-based evidence prediction with vacuity-aware regularization of the model parameters.
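Under SL, both uncertainty sources can be computed in closed form from per-class evidence (e.g., the output of an evidential network). The following is a minimal NumPy sketch using the standard subjective-logic formulas (Dirichlet strength S = W + Σe with prior weight W = K); the function name and toy inputs are illustrative, not the paper's code:

```python
import numpy as np

def vacuity_dissonance(evidence):
    """Compute subjective-logic vacuity and dissonance from per-class evidence.

    evidence: non-negative array of shape (K,), e.g. evidence predicted by a NN.
    """
    e = np.asarray(evidence, dtype=float)
    K = e.size
    S = e.sum() + K            # Dirichlet strength with prior weight W = K
    b = e / S                  # per-class belief masses
    u = K / S                  # vacuity: uncertainty due to lack of evidence

    # Dissonance: conflict among beliefs, weighted by their relative balance.
    diss = 0.0
    for k in range(K):
        others = np.delete(b, k)
        denom = others.sum()
        if denom > 0:
            mask = (others + b[k]) > 0
            bal = np.zeros_like(others)
            bal[mask] = 1.0 - np.abs(others[mask] - b[k]) / (others[mask] + b[k])
            diss += b[k] * (others * bal).sum() / denom
    return u, diss
```

Zero evidence yields maximal vacuity and zero dissonance, while equally strong evidence for every class yields low vacuity but high dissonance, matching the intuition that the two measures capture lack of evidence versus conflict of evidence.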
Methods
  • The authors report experimental results on both synthetic and real-world data. The former aims to verify the key theoretical properties of ADL, including entropy decomposition and multi-source uncertainty prediction, and how these properties contribute to AL.
  • In each AL iteration, the authors sample one data instance.
  • This is fundamentally different from some recent DL-based AL methods, such as [3, 17], which perform batch-mode sampling with a large batch size.
  • These models are not applicable when only a limited label budget is available, which is the case in many specialized domains where labeling is very costly.
  • The authors use LeNet with ReLU activation.
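The single-instance sampling loop described above can be sketched as follows. This is a toy stand-in rather than the paper's model: `fit` is a least-squares placeholder for DNN training, and `uncertainty` is a hypothetical acquisition score where ADL would plug in predicted vacuity or dissonance.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit(X, y):
    # Stand-in "training": a least-squares linear fit. The paper's ADL instead
    # trains a DNN with its evidence-based loss (a hypothetical substitution).
    return np.linalg.lstsq(X, y, rcond=None)[0]

def uncertainty(w, X):
    # Placeholder score: distance to the linear decision boundary, negated so
    # that higher = more uncertain. ADL would use vacuity/dissonance here.
    return -np.abs(X @ w)

# Toy pool of unlabeled 2-D points labeled by a hidden linear rule (the oracle).
pool_X = rng.normal(size=(200, 2))
oracle = lambda X: np.sign(X @ np.array([1.0, -0.5]))

labeled_idx, budget = [0, 1], 10  # tiny seed set, small label budget
for _ in range(budget):
    w = fit(pool_X[labeled_idx], oracle(pool_X[labeled_idx]))
    candidates = [i for i in range(len(pool_X)) if i not in labeled_idx]
    scores = uncertainty(w, pool_X[candidates])
    labeled_idx.append(candidates[int(np.argmax(scores))])  # query ONE instance
```

Each pass through the loop retrains on all labels collected so far and then queries exactly one new instance, which is what distinguishes this regime from large-batch sampling methods.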
Conclusion
  • The authors present a novel active deep learning model that systematically leverages two distinct sources of uncertainty, vacuity and dissonance, to effectively explore a large and high-dimensional data space for label-efficient training of DL models.
  • The proposed ADL model benefits from the evidence-based entropy decomposition that follows from the theoretical analysis of belief vacuity and belief dissonance under the SL framework.
  • The multi-source uncertainty can be accurately estimated through a novel loss function that augments DL-based evidence prediction with vacuity-aware regularization of the model parameters.
Related Work
  • Uncertainty Quantification in Belief/Evidence Theory: In belief/evidence theory, uncertainty reasoning has been substantially explored through Fuzzy Logic [5], Dempster-Shafer Theory (DST) [6], and Subjective Logic (SL) [4]. Unlike the efforts made in ML/DL, belief theorists focused on reasoning of inherent uncertainty in information resulting from unreliable, incomplete, deceptive, and/or conflicting evidence. SL considered uncertainty in subjective opinions in terms of vacuity (i.e., lack of evidence) and vagueness (i.e., failure of discriminating a belief state) [4]. Vacuity has been used as an effective vehicle to detect OOD queries through evidence learning, achieved under the typical DL setting with ample training samples [7]. Recently, other dimensions of uncertainty have been studied, such as dissonance (due to conflicting evidence) and consonance (due to evidence about composite subsets of state values) [8].

    Uncertainty in Deep Learning: In DL, aleatoric uncertainty (AU) and epistemic uncertainty (EU) have been studied using Bayesian Neural Networks (BNNs) for computer vision. AU consists of homoscedastic uncertainty (i.e., constant errors for different inputs) and heteroscedastic uncertainty (i.e., different errors for different inputs) [9]. A Bayesian DL (BDL) framework was presented to estimate both AU and EU simultaneously in regression (e.g., depth regression) and classification settings (e.g., semantic segmentation) [10]. A new type of uncertainty, called distributional uncertainty, is defined based on the distributional mismatch between the test and training data distributions [11]. Beyond exploring new sources of uncertainty, recent works also focus on better estimating the well-known first-order uncertainty, predictive entropy, in DL models through calibration [12] or ensemble [13] methods. Even though these recent efforts offer abundant uncertainty measurements for DL, work on leveraging this uncertainty information for better active sampling remains sparse. For example, while distributional uncertainty can be used for data sampling in AL, the prior network in [11] needs to be properly trained, as its parameters must encapsulate knowledge of both the in-domain distribution and the decision boundary, making it not well suited for AL. This is also evidenced by our experimental results on real-world data. Noise-Contrastive Priors can also support better exploration in data sampling, as they encourage high uncertainty near the boundary of the training data [14]. However, in the initial phase of AL, when the training data is very limited, this measure can be insufficient to explore data samples far away from the training data.
Funding
  • Weishi Shi and Qi Yu are supported in part by an ONR award N00014-18-1-2875 and an NSF IIS award IIS-1814450
  • Xujiang Zhao and Feng Chen are supported by the NSF under Grant No #1815696 and #1750911
  • The views and conclusions contained in this paper are those of the authors and should not be interpreted as representing any funding agency
Data and Analysis
Labeled data samples: 9
Such an issue may become more severe when training a neural network (NN)/DL active learner due to model overfitting, as described above. Figure 1(a) shows the entropy predicted by an NN active learner trained on nine labeled data samples, shown in black and evenly distributed across three classes. The standard softmax layer is used in the output layer to generate class probabilities over the three classes, each of which is a mixture of two Gaussians.

Samples: 50
To mimic the existence of OOD data, we generate three mixtures of Gaussians. Each mixture consists of a major cluster and a smaller (i.e., OOD) cluster, with 750 and 50 samples, respectively. We center the major Gaussian components for each class and place their corresponding OOD components away from them.
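A synthetic setup of this shape (three Gaussian mixtures, each a 750-sample major cluster plus a 50-sample OOD cluster placed far away) might be generated as below; the specific centers and scales are illustrative assumptions, not the paper's values:

```python
import numpy as np

rng = np.random.default_rng(42)

def make_class(center, ood_center, n_major=750, n_ood=50, scale=0.5):
    # One class = a major Gaussian cluster plus a small, displaced OOD cluster.
    major = rng.normal(loc=center, scale=scale, size=(n_major, 2))
    ood = rng.normal(loc=ood_center, scale=scale, size=(n_ood, 2))
    return np.vstack([major, ood])

centers = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.5)]           # major components
ood_centers = [(-6.0, -6.0), (10.0, -6.0), (2.0, 10.0)]  # OOD components, far away
X = np.vstack([make_class(c, o) for c, o in zip(centers, ood_centers)])
y = np.repeat([0, 1, 2], 750 + 50)                       # 800 samples per class
```

Placing the OOD components far from the major ones is what lets vacuity flag those regions as lacking evidence, while dissonance peaks near the boundaries between the major clusters.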

Datasets: 3
5.2 Real data. The real-world experiments are conducted on three datasets, MNIST, notMNIST, and CIFAR-10, all of which have ten classes. To mimic the real-world AL scenario, we leave 2–5 classes out for initial training.

Datasets: 3
We compare the proposed model with EDL [7] (entropy, vacuity+dissonance), BALD [2] (epistemic), PriorNN [11] (distributional uncertainty), and softmax (entropy, random), where the uncertainty measures used for sampling are given in parentheses. Figures 4 and 5 show that ADL consistently outperforms the other models on all three datasets. The advantages of ADL are twofold.

References
  • Burr Settles. Active learning literature survey. Technical report, University of Wisconsin–Madison Department of Computer Sciences, 2009.
  • Yarin Gal, Riashat Islam, and Zoubin Ghahramani. Deep Bayesian active learning with image data. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1183–1192. JMLR.org, 2017.
  • Ozan Sener and Silvio Savarese. Active learning for convolutional neural networks: A core-set approach. In International Conference on Learning Representations, 2018.
  • Audun Jøsang. Subjective Logic. Springer, 2016.
  • Clarence W. De Silva. Intelligent Control: Fuzzy Logic Applications. CRC Press, 2018.
  • Kari Sentz, Scott Ferson, et al. Combination of Evidence in Dempster-Shafer Theory, volume 4015.
  • Murat Sensoy, Lance Kaplan, and Melih Kandemir. Evidential deep learning to quantify classification uncertainty. In Advances in Neural Information Processing Systems, pages 3179–3189, 2018.
  • Audun Jøsang, Jin-Hee Cho, and Feng Chen. Uncertainty characteristics of subjective opinions. In Fusion, pages 1998–2005. IEEE, 2018.
  • Yarin Gal. Uncertainty in Deep Learning. University of Cambridge, 2016.
  • Alex Kendall and Yarin Gal. What uncertainties do we need in Bayesian deep learning for computer vision? In NIPS, pages 5574–5584, 2017.
  • Andrey Malinin and Mark Gales. Predictive uncertainty estimation via prior networks. arXiv preprint arXiv:1802.10501, 2018.
  • Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q. Weinberger. On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1321–1330. JMLR.org, 2017.
  • Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems, pages 6402–6413, 2017.
  • Danijar Hafner, Dustin Tran, Timothy Lillicrap, Alex Irpan, and James Davidson. Reliable uncertainty estimates in deep neural networks using noise contrastive priors. 2018.
  • Dan Wang and Yi Shang. A new active labeling method for deep learning. In 2014 International Joint Conference on Neural Networks (IJCNN), pages 112–119. IEEE, 2014.
  • Keze Wang, Dongyu Zhang, Ya Li, Ruimao Zhang, and Liang Lin. Cost-effective active learning for deep image classification. IEEE Transactions on Circuits and Systems for Video Technology, 27(12):2591–2600, 2016.
  • Jordan T. Ash, Chicheng Zhang, Akshay Krishnamurthy, John Langford, and Alekh Agarwal. Deep batch active learning by diverse, uncertain gradient lower bounds. In ICLR, 2020.
  • Nils J. Nilsson. Probabilistic logic. Artificial Intelligence, 28(1):71–87, 1986.
  • Glenn Shafer. A Mathematical Theory of Evidence, volume 42. Princeton University Press, 1976.
  • Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. In Advances in Neural Information Processing Systems, pages 3630–3638, 2016.
  • Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In Advances in Neural Information Processing Systems, pages 7167–7177, 2018.
Authors
Xujiang Zhao