
Incorporating Interpretable Output Constraints in Bayesian Neural Networks

NeurIPS 2020 (2020)


Abstract

Domains where supervised models are deployed often come with task-specific constraints, such as prior expert knowledge on the ground-truth function, or desiderata like safety and fairness. We introduce a novel probabilistic framework for reasoning with such constraints and formulate a prior that enables us to effectively incorporate these constraints into Bayesian neural networks (BNNs).
Introduction
  • In domains where predictive errors are prohibitively costly, the authors desire models that can both capture predictive uncertainty and enforce prior human expertise or knowledge.
  • Recent work has addressed the challenge of incorporating richer functional knowledge into BNNs, such as preventing miscalibrated model predictions out-of-distribution [9], enforcing smoothness constraints [2] or specifying priors induced by covariance structures in the dataset [25, 19].
  • Unlike other types of functional beliefs, output constraints are intuitive and interpretable, and can be specified directly by domain experts.
Highlights
  • In domains where predictive errors are prohibitively costly, we desire models that can both capture predictive uncertainty and enforce prior human expertise or knowledge
  • Our contributions are: (a) we present a formal framework that lays out what it means to learn from output constraints in the probabilistic setting that Bayesian neural networks (BNNs) operate in; (b) we formulate a prior that enforces output-constraint satisfaction on the resulting posterior predictive, including a variant that can be amortized across multiple tasks (see the sketch after this list); and (c) we demonstrate proofs of concept on toy simulations and apply Output-Constrained BNNs (OC-BNNs) to three real-world, high-dimensional datasets: (i) enforcing physiologically feasible interventions on a clinical action prediction task, (ii) enforcing a racial fairness constraint on a recidivism prediction task where the training data is biased, and (iii) enforcing recourse on a credit scoring task where a subpopulation is poorly represented by data
  • This is because the constraints are intentionally specified in out-of-distribution input regions, and incorporating this knowledge augments what the OC-BNN learns from Dtr alone
  • We propose OC-BNNs, which allow us to incorporate interpretable and intuitive prior knowledge, in the form of output constraints, into BNNs
  • Through a series of low-dimensional simulations as well as real-world applications with realistic constraints, we show that OC-BNNs generally maintain the desirable properties of ordinary BNNs while satisfying specified constraints
  • Our work shows promise in various high-stakes domains, such as healthcare and criminal justice, where both uncertainty quantification and prior expert constraints are necessary for safe and desirable model behavior
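
Contribution (b) centers on a prior over weights that penalizes functions violating the specified output constraints, so that ordinary parameter-space inference (e.g. SVGD or HMC) yields a constraint-respecting posterior predictive. The following is a minimal sketch of that idea, not the paper's exact COCP formulation: the tiny network, the constrained region Cx = [-1, 1] with forbidden outputs Cy = [1, 2.5] (echoing the negative-constraint example in Figure 2b), and the penalty scale gamma are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def nn_forward(w, x):
    """Tiny one-hidden-layer network f_w(x); the architecture is illustrative only."""
    w1 = w[:10].reshape(1, 10)
    b1 = w[10:20]
    w2 = w[20:30].reshape(10, 1)
    b2 = w[30]
    return np.tanh(x @ w1 + b1) @ w2 + b2

def log_constraint_prior(w, n_samples=20, gamma=10.0):
    """Soft penalty on functions that violate a negative output constraint:
    for x in Cx = [-1, 1], outputs should stay out of Cy = [1, 2.5].
    This is a stand-in for the paper's constraint prior, not its exact form."""
    xs = rng.uniform(-1.0, 1.0, size=(n_samples, 1))          # sample constrained inputs
    ys = nn_forward(w, xs)
    depth = np.clip(np.minimum(ys - 1.0, 2.5 - ys), 0.0, None)  # how far inside Cy
    return -gamma * float(np.sum(depth))

def log_posterior(w, X, y, noise_var=0.1):
    """Standard BNN log joint plus the output-constraint prior term."""
    resid = y - nn_forward(w, X)
    log_lik = -0.5 * np.sum(resid ** 2) / noise_var
    log_prior_w = -0.5 * np.sum(w ** 2)   # isotropic Gaussian weight prior
    return log_lik + log_prior_w + log_constraint_prior(w)

# Toy usage on synthetic data.
w0 = rng.normal(size=31)
X = rng.normal(size=(50, 1))
y = np.sin(X)
print(log_posterior(w0, X, y))
```

Running SVGD or HMC on log_posterior would then concentrate posterior samples on weights whose functions avoid the constrained region, which is the behavior the SVGD particles in Figure 2 illustrate.
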
Methods
  • Experiments with Real-World Data: To demonstrate the efficacy of OC-BNNs, the authors apply meaningful and interpretable output constraints to real-life datasets.
  • The authors construct a dataset (N = 405K) of 8 relevant features and consider a binary classification task of whether clinical interventions for hypotension management — namely, vasopressors or IV fluids — should be taken for any patient.
  • The authors specify two physiologically feasible, positive constraints: (1) if the patient has high creatinine, high BUN and low urine, action should be taken (Cy = {1}); (2) if the patient has high lactate and low bicarbonate, action should be taken.
  • In addition to accuracy and F1 score on the test set, the authors report the fraction of predictions that violate the specified constraints (a sketch of this check follows the list).
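
The two positive constraints above amount to predicates over a handful of vitals together with a required action label. Below is a minimal sketch of how such constraints could be encoded and audited; the feature indices, clinical thresholds, and helper functions are hypothetical illustrations rather than the paper's implementation, and the "violation fraction" mirrors the metric reported in Table 1.

```python
import numpy as np

# Hypothetical feature indices and clinical cutoffs -- the paper's exact
# thresholds are not given in this summary, so these are illustrative only.
CREATININE, BUN, URINE, LACTATE, BICARBONATE = 0, 1, 2, 3, 4
HIGH_CREATININE, HIGH_BUN, LOW_URINE = 1.5, 30.0, 30.0
HIGH_LACTATE, LOW_BICARBONATE = 4.0, 18.0

def in_constrained_region(x):
    """True if x falls in either positive-constraint region Cx,
    where the allowed output set is Cy = {1}, i.e. 'take action'."""
    c1 = (x[CREATININE] > HIGH_CREATININE and x[BUN] > HIGH_BUN
          and x[URINE] < LOW_URINE)
    c2 = x[LACTATE] > HIGH_LACTATE and x[BICARBONATE] < LOW_BICARBONATE
    return c1 or c2

def violation_fraction(X, y_pred):
    """Fraction of constrained inputs whose predicted label is not in Cy."""
    mask = np.array([in_constrained_region(x) for x in X])
    if mask.sum() == 0:
        return 0.0
    return float(np.mean(y_pred[mask] != 1))
```
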
Results
  • By constraining recidivism predictions to depend only on the defendant’s actual criminal history, OC-BNNs strictly enforce a fairness constraint
  • On both versions of Dtr, the baseline BNN predicts unequal risk for the two groups, since the output labels (COMPAS decisions) are themselves biased (a per-group rate check is sketched after this list).
  • This disparity is starker when the race feature is included, as the model learns the explicit positive correlation between race and the output label.
  • When an actionability constraint is enforced, the OC-BNN reduces the effort of recourse without sacrificing predictive accuracy on the test set, coming closest to the ground-truth recourse.
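
The fairness result summarized in Table 2 reduces to comparing the rate of high-risk predictions across the two racial groups. A minimal sketch of that per-group rate check is below; the arrays and group labels are synthetic placeholders, not the COMPAS data.

```python
import numpy as np

def high_risk_rate_by_group(y_pred, group):
    """Mean rate of high-risk predictions (y_pred == 1) for each group value.

    y_pred : array of predicted labels (1 = profiled as high-risk)
    group  : array of protected-attribute values (e.g. race), same length
    """
    return {g: float(np.mean(y_pred[group == g])) for g in np.unique(group)}

# Synthetic example: near-equal rates across groups would correspond to the
# fairness constraint being satisfied, as reported for the OC-BNN in Table 2.
rates = high_risk_rate_by_group(
    y_pred=np.array([1, 0, 1, 0, 1, 0]),
    group=np.array(["A", "A", "A", "B", "B", "B"]),
)
print(rates)  # a large gap between groups would indicate disparate predictions
```
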
Conclusion
  • The usage of OC-BNNs depends on how the authors view constraints in relation to data. The clinical action prediction and credit scoring tasks are cases where the constraint is a complementary source of information, being defined in input regions where Dtr is sparse.
  • In contrast with [7, 19, 25], OC-BNNs take a sampling-based approach to bridge functional and parametric objectives
  • The simplicity of this approach can be advantageous: output constraints are a common currency of knowledge specified by domain experts, in contrast to more technical forms such as stochastic process priors.
  • OC-BNNs allow practitioners to manipulate an interpretable form of knowledge
  • They can be useful even to domain experts without technical machine learning expertise, who can specify such constraints for model behavior.
  • The authors intentionally showcase applications of high societal relevance, such as recidivism prediction and credit scoring, where the ability to specify and satisfy constraints can lead to fairer and more ethical model behavior
Tables
  • Table1: Compared to the baseline, the OC-BNN maintains equally high accuracy and F1 score on both train and test sets. The violation fraction decreased about six-fold when using OC-BNNs
  • Table2: The OC-BNN predicts both racial groups with almost equal rates of high-risk recidivism, compared to a 5-fold difference on the baseline. However, accuracy metrics decrease (expectedly)
  • Table3: All three models have comparable accuracy on the test set. However, the OC-BNN has the lowest recourse effort (closest to ground truth)
Related work
  • Noise contrastive priors: Hafner et al. [9] propose a generative “data prior” in function space, modeled as zero-mean Gaussians if the input is out-of-distribution. Noise contrastive priors are similar to OC-BNNs as both methods involve placing a prior on function space but performing inference in parameter space. However, OC-BNNs model output constraints, which encode a richer class of functional beliefs than the simpler Gaussian assumptions encoded by NCPs.

    Global functional properties: Previous work has enforced various functional properties such as Lipschitz smoothness [2] or monotonicity [29]. The constraints considered in those works differ from output constraints, which can be defined over local regions of the input space. Furthermore, those works focus on classical NNs rather than BNNs.
Funding
  • HL acknowledges support from Google
  • WY and FDV acknowledge support from the Sloan Foundation
Study subjects and analysis
defendants: 6172
A study by ProPublica in 2016 found it to be racially biased against African American defendants [1, 16]. We use the same dataset as this study, containing 9 features on N = 6172 defendants related to their criminal history and demographic attributes. We consider the same binary classification task as in Slack et al [24] — predicting whether a defendant is profiled by COMPAS as being high-risk

cases: 3
Motivated by Ustun et al [26]’s work on recourse (defined as the extent that input features must be altered to change the model’s outcome), we consider the feature RevolvingUtilizationOfUnsecuredLines (RUUL), which has a ground-truth positive correlation with financial distress. We analyze how much a young adult under 35 has to reduce RUUL to flip their prediction to negative in three cases: (i) a BNN trained on the full dataset, (ii) a BNN trained on a blind dataset (age ≥ 35), (iii) an OC-BNN with an actionability constraint: for young adults, predict “no financial distress” even if RUUL is large. The positive Dirichlet COCP (5) is used

data: 1
The positive Dirichlet COCP (5) is used. In addition to accuracy and F1 score on the entire test set (N = 10K), we measure the effort of recourse as the mean difference in RUUL between the two outcomes (Y = 0 or 1) on the subset of individuals with age < 35 (N = 1.5K). Results: As seen in Table 3, the ground-truth positive correlation between RUUL and the output is weak, and the effort of recourse is consequently low
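
The "effort of recourse" described above is a simple aggregate: how much RUUL differs, on average, between the two predicted outcomes within the under-35 subgroup. A minimal sketch is below; the column names and the toy table are hypothetical, and this is one plausible reading of the metric rather than the paper's exact computation.

```python
import pandas as pd

def recourse_effort(df, pred_col="y_pred", feature="RUUL", age_col="age"):
    """Mean difference in the actionable feature (RUUL) between predicted
    outcomes, restricted to young adults (age < 35)."""
    young = df[df[age_col] < 35]
    return float(
        young.loc[young[pred_col] == 1, feature].mean()
        - young.loc[young[pred_col] == 0, feature].mean()
    )

# Tiny synthetic example; real columns and values are hypothetical.
df = pd.DataFrame({
    "age":    [25, 30, 28, 40, 33],
    "RUUL":   [0.9, 0.2, 0.8, 0.5, 0.3],
    "y_pred": [1, 0, 1, 1, 0],
})
print(recourse_effort(df))  # a larger gap means more effort needed to flip the outcome
```
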

COCP samples: 5
Figure 2: (a) 1D regression with the positive constraint Cx+ = R and Cy+(x) = {y | x · y ≥ 0} (green), using AOCP. (b) 1D regression with the negative constraint Cx− = [−1, 1] and Cy− = [1, 2.5] (red), using the negative exponential COCP (6); the 50 SVGD particles represent functions passing above and below the constrained region, capturing two distinct predictive modes. (c) Fraction of rejected SVGD particles (out of 100) for the OC-BNN (blue, plotted as a function of the log-number of COCP samples) and the baseline (black): all baseline particles were rejected, whereas only 4% of OC-BNN particles were rejected using just 5 COCP samples.

References
  • Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine Bias: Risk Assessments in Criminal Sentencing. ProPublica, 2016.
  • Cem Anil, James Lucas, and Roger Grosse. Sorting Out Lipschitz Function Approximation. In Proceedings of the 36th International Conference on Machine Learning, 2019.
  • Christopher M Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
  • Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. Weight Uncertainty in Neural Networks. In Proceedings of the 32nd International Conference on Machine Learning, 2015.
  • Simon Duane, Anthony D Kennedy, Brian J Pendleton, and Duncan Roweth. Hybrid Monte Carlo. Physics Letters B, 195(2):216–222, 1987.
  • John Duchi, Elad Hazan, and Yoram Singer. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research, 12(Jul):2121–2159, 2011.
  • Marta Garnelo, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola, Danilo J Rezende, S.M. Ali Eslami, and Yee Whye Teh. Neural Processes. In 35th ICML Workshop on Theoretical Foundations and Applications of Deep Generative Models, 2018.
  • Alex Graves. Practical Variational Inference for Neural Networks. In Advances in Neural Information Processing Systems, pages 2348–2356, 2011.
  • Danijar Hafner, Dustin Tran, Timothy Lillicrap, Alex Irpan, and James Davidson. Noise Contrastive Priors for Functional Uncertainty. arXiv:1807.09289, 2018.
  • Geoffrey E Hinton and Drew Van Camp. Keeping the Neural Networks Simple by Minimizing the Description Length of the Weights. In Proceedings of the 6th Annual Conference on Computational Learning Theory, pages 5–13, 1993.
  • Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer Feedforward Networks are Universal Approximators. Neural Networks, 2(5):359–366, 1989.
  • Alistair EW Johnson, Tom J Pollard, Lu Shen, H Lehman Li-wei, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G Mark. MIMIC-III, A Freely Accessible Critical Care Database. Scientific Data, 3:160035, 2016.
  • Kaggle. Give Me Some Credit. http://www.kaggle.com/c/GiveMeSomeCredit/, 2011.
  • Nathan Kallus and Angela Zhou. Residual Unfairness in Fair Machine Learning from Prejudiced Data. In Proceedings of the 35th International Conference on Machine Learning, 2018.
  • Diederik P Kingma and Max Welling. Auto-Encoding Variational Bayes. In Proceedings of the 2nd International Conference on Learning Representations, 2014.
  • Jeff Larson, Surya Mattu, Lauren Kirchner, and Julia Angwin. How We Analyzed the COMPAS Recidivism Algorithm. ProPublica, 2016.
  • Qiang Liu and Dilin Wang. Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm. In Advances in Neural Information Processing Systems, pages 2378–2386, 2016.
  • Marco Lorenzi and Maurizio Filippone. Constraining the Dynamics of Deep Probabilistic Models. In Proceedings of the 35th International Conference on Machine Learning, 2018.
  • Christos Louizos, Xiahan Shi, Klamer Schutte, and Max Welling. The Functional Neural Process. In Advances in Neural Information Processing Systems, pages 8743–8754, 2019.
  • David J C MacKay. Probable Networks and Plausible Predictions — a Review of Practical Bayesian Methods for Supervised Neural Networks. Network: Computation in Neural Systems, 6(3):469–505, 1995.
  • Radford M Neal. Bayesian Learning for Neural Networks. PhD thesis, University of Toronto, 1995.
  • Radford M Neal. MCMC Using Hamiltonian Dynamics. Handbook of Markov Chain Monte Carlo, 2(11):2, 2011.
  • Bernt Øksendal. Stochastic Differential Equations. In Stochastic Differential Equations, pages 65–84.
  • Dylan Slack, Sophie Hilgard, Emily Jia, Sameer Singh, and Himabindu Lakkaraju. Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods. In Proceedings of the 3rd AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society, pages 180–186, 2020.
  • Shengyang Sun, Guodong Zhang, Jiaxin Shi, and Roger Grosse. Functional Variational Bayesian Neural Networks. In Proceedings of the 7th International Conference on Learning Representations, 2019.
  • Berk Ustun, Alexander Spangher, and Yang Liu. Actionable Recourse in Linear Classification. In Proceedings of the ACM Conference on Fairness, Accountability and Transparency, pages 10–19, 2019.
  • Andrew Gordon Wilson. The Case for Bayesian Deep Learning. arXiv:2001.10995, 2020.
  • Wanqian Yang, Lars Lorch, Moritz A Graule, Srivatsan Srinivasan, Anirudh Suresh, Jiayu Yao, Melanie F Pradier, and Finale Doshi-Velez. Output-Constrained Bayesian Neural Networks. In 36th ICML Workshop on Uncertainty and Robustness in Deep Learning, 2019.
  • Seungil You, David Ding, Kevin Canini, Jan Pfeifer, and Maya Gupta. Deep Lattice Networks and Partial Monotonic Functions. In Advances in Neural Information Processing Systems, pages 2981–2989, 2017.