# Counterfactual Predictions under Runtime Confounding

NeurIPS 2020.

Abstract:

Algorithms are commonly used to predict outcomes under a particular decision or intervention, such as predicting whether an offender will succeed on parole if placed under minimal supervision. Generally, to learn such counterfactual prediction models from observational data on historical decisions and corresponding outcomes, one must me...

Introduction

- Algorithmic tools are increasingly prevalent in domains such as health care, education, lending, criminal justice, and child welfare [2, 7, 12, 15, 30].
- Decision-makers need to know what is likely to happen if they choose to take a particular action.
- An undergraduate program advisor determining which students to recommend for a personalized case management program might wish to know the likelihood that a given student will graduate if enrolled in the program.
- A parole board determining whether to release an offender may wish to know the likelihood that the offender will succeed on parole under different possible levels of supervision intensity

Highlights

- Algorithmic tools are increasingly prevalent in domains such as health care, education, lending, criminal justice, and child welfare [2, 7, 12, 15, 30]
- A parole board determining whether to release an offender may wish to know the likelihood that the offender will succeed on parole under different possible levels of supervision intensity
- Contributions: Drawing upon techniques used in low-dimensional treatment effect estimation [6, 37, 41], we propose a procedure for the full pipeline of learning and evaluating prediction models under runtime confounding
- Our goal is to predict outcomes under a proposed decision in order to inform human decision-makers about what is likely to happen under that treatment
- We propose a generic procedure for learning counterfactual predictions under runtime confounding that can be used with any parametric or nonparametric learning algorithm
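The two-stage idea behind this procedure can be sketched as follows. This is a minimal illustration under assumed data-generating choices (the variable names, linear models, and AIPW pseudo-outcome are our construction; the paper's exact estimator may differ): first estimate nuisance functions on the full confounder set X = (V, Z), form a doubly-robust pseudo-outcome, then regress it on the runtime-available predictors V only.

```python
# Hedged sketch of a two-stage doubly-robust (DR) procedure for counterfactual
# prediction under runtime confounding. Illustrative only: the linear models,
# dimensions, and pseudo-outcome construction are assumptions, not the paper's
# exact algorithm.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(0)
n = 5000
V = rng.normal(size=(n, 2))          # runtime-available predictors
Z = rng.normal(size=(n, 1))          # confounder hidden at runtime
X = np.hstack([V, Z])                # training-time confounders X = (V, Z)

# Treatment depends on Z (confounding); outcome depends on V and Z.
p = 1 / (1 + np.exp(-(V[:, 0] + Z[:, 0])))
A = rng.binomial(1, p)
Y = V @ np.array([1.0, -0.5]) + 2.0 * Z[:, 0] + rng.normal(size=n)

a = 1  # proposed treatment

# First stage: nuisance estimates using the full confounder set X.
pi_hat = LogisticRegression().fit(X, A).predict_proba(X)[:, 1]
mu_hat = LinearRegression().fit(X[A == a], Y[A == a]).predict(X)

# AIPW pseudo-outcome: its conditional mean given V equals nu(v) = E[Y^a | V=v]
# if either the propensity or the outcome model is well estimated.
phi = mu_hat + (A == a) / np.clip(pi_hat, 1e-3, None) * (Y - mu_hat)

# Second stage: regress the pseudo-outcome on the runtime predictors V only.
nu_hat = LinearRegression().fit(V, phi)
print(nu_hat.coef_)  # should be close to the true coefficients [1.0, -0.5]
```

Because Z is independent of V in this toy setup, the true ν(v) is v·(1.0, −0.5), and the second-stage fit recovers it even though Z is unavailable at prediction time.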

Methods

- Standard counterfactual prediction methods train models on the cases that received treatment a [8, 28], a procedure the authors will refer to as treatment-conditional regression (TCR).
- This procedure estimates ω(v) = E[Y | A = a, V = v].
- The authors can characterize the bias of this approach by analyzing b(v) := ω(v) − ν(v), a quantity the authors term the pointwise confounding bias
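A small numeric illustration makes the pointwise confounding bias b(v) = ω(v) − ν(v) concrete. The construction below is ours, not from the paper: a hidden confounder Z drives treatment, there is no treatment effect (so Y^a = Y and ν(v) = v), yet TCR's fit of ω(v) = E[Y | A = a, V = v] picks up a constant offset.

```python
# Minimal illustration (our construction) of TCR's confounding bias when a
# confounder Z is unavailable at runtime.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 200_000
V = rng.normal(size=(n, 1))
Z = rng.normal(size=n)                      # runtime-hidden confounder
A = rng.binomial(1, 1 / (1 + np.exp(-Z)))   # treatment driven by Z only
Y = V[:, 0] + Z + rng.normal(size=n)        # no treatment effect: Y^a = Y

# True nu(v) = E[Y^1 | V=v] = v, since Z is independent of V with mean zero.
# TCR conditions on A=1, which selects high-Z cases, so the fitted intercept
# absorbs E[Z | A=1] > 0: here b(v) is a constant positive bias.
omega = LinearRegression().fit(V[A == 1], Y[A == 1])
print(omega.intercept_)  # noticeably above 0: the confounding bias
print(omega.coef_[0])    # slope still close to 1
```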

Results

**Evaluation method**

The authors describe an approach for evaluating the prediction methods using observed data.

- The authors propose a doubly-robust procedure to estimate the prediction error, extending the approach of [8], which focused on classification metrics and did not consider MSE.
- Algorithm 5 describes this procedure.
- This evaluation method can be used to select the regression estimators for the first and second stages
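The doubly-robust error estimate can be sketched with the standard AIPW construction applied to the squared-error loss. This is our reading, not necessarily the exact estimator of Algorithm 5: all model choices and the simulated data below are illustrative assumptions.

```python
# Hedged sketch of a doubly-robust estimate of the counterfactual MSE
# E[(nu_hat(V) - Y^a)^2] from observational data, via AIPW on the loss.
# Illustrative only; the paper's Algorithm 5 may differ in detail.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(2)
n = 20_000
V = rng.normal(size=(n, 1))
Z = rng.normal(size=n)
X = np.column_stack([V[:, 0], Z])
A = rng.binomial(1, 1 / (1 + np.exp(-(V[:, 0] + Z))))
Y = V[:, 0] + Z + rng.normal(size=n)   # no treatment effect: Y^a = Y
a = 1

def nu_hat(v):                         # prediction model under evaluation
    return v[:, 0]                     # here: the true nu(v) = v

loss = (nu_hat(V) - Y) ** 2            # observed squared error
pi = LogisticRegression().fit(X, A).predict_proba(X)[:, 1]
# eta(x) ~ E[loss | X=x, A=a], fit on the treated cases only.
eta = LinearRegression().fit(X[A == a], loss[A == a]).predict(X)

# AIPW estimate: consistent if either pi or eta is well estimated.
w = (A == a) / np.clip(pi, 1e-3, None)
mse_dr = np.mean(w * (loss - eta) + eta)
print(mse_dr)
```

In this toy setup the true counterfactual MSE is Var(Z) + Var(noise) = 2, and the AIPW average recovers it even though the linear model for eta is misspecified, because the propensity model is correct.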

Conclusion

- The authors propose a generic procedure for learning counterfactual predictions under runtime confounding that can be used with any parametric or nonparametric learning algorithm.
- The authors' theoretical and empirical analysis suggests this procedure will often outperform other methods when the level of runtime confounding is significant

Summary

## Introduction:

Algorithmic tools are increasingly prevalent in domains such as health care, education, lending, criminal justice, and child welfare [2, 7, 12, 15, 30].

- Decision-makers need to know what is likely to happen if they choose to take a particular action.
- An undergraduate program advisor determining which students to recommend for a personalized case management program might wish to know the likelihood that a given student will graduate if enrolled in the program.
- A parole board determining whether to release an offender may wish to know the likelihood that the offender will succeed on parole under different possible levels of supervision intensity
## Objectives:

The authors' goal is to predict outcomes under a proposed decision in order to inform human decision-makers about what is likely to happen under that treatment.

- The authors' goal is to predict outcomes under a proposed treatment A = a ∈ {0, 1} based on runtime-available predictors V ∈ V ⊆ R^{d_V}.
- The authors' goal is to assess risk under the null treatment as per [8], and the authors construct π such that, historically, the riskier cases were more likely to receive the risk-mitigating treatment and the less risky cases were more likely to receive the baseline treatment
## Methods:

Standard counterfactual prediction methods train models on the cases that received treatment a [8, 28], a procedure the authors will refer to as treatment-conditional regression (TCR).

- This procedure estimates ω(v) = E[Y | A = a, V = v].
- The authors can characterize the bias of this approach by analyzing b(v) := ω(v) − ν(v), a quantity the authors term the pointwise confounding bias
## Results:

**Evaluation method**

The authors describe an approach for evaluating the prediction methods using observed data.

- The authors propose a doubly-robust procedure to estimate the prediction error, extending the approach of [8], which focused on classification metrics and did not consider MSE.
- Algorithm 5 describes this procedure.
- This evaluation method can be used to select the regression estimators for the first and second stages
## Conclusion:

The authors propose a generic procedure for learning counterfactual predictions under runtime confounding that can be used with any parametric or nonparametric learning algorithm.

- The authors' theoretical and empirical analysis suggests this procedure will often outperform other methods when the level of runtime confounding is significant

- Table 1: MSE E[(ν̂(V) − ν(V))²] under correct specification vs. misspecification in the second stage, for d = 500, d_V = 400, k_v = 24, k_z = 20, and n = 3000 (with 95% confidence intervals). Our DR method has the lowest error in both settings. Errors are larger for all methods under misspecification

Related work

- Our work builds upon a growing literature on counterfactual risk assessments for decision support that proposes methods for the unconfounded prediction setting [8, 28]. Following this literature, our goal is to predict outcomes under a proposed decision (interchangeably referred to as 'treatment' or 'intervention') in order to inform human decision-makers about what is likely to happen under that treatment. This prediction task is different from the common causal inference problem of treatment effect estimation, which targets a contrast of outcomes under two different treatments [29, 38]. Treatment effects are useful for describing responsiveness to treatment. While responsiveness is relevant to some types of decisions, it is insufficient, or even irrelevant, to consider for others. For instance, a doctor considering an invasive procedure may make a different recommendation for two patients with the same responsiveness if one has a good probability of successful recovery without the procedure and the other does not. In other settings, such as loan approval, the responsiveness to different loan terms is irrelevant; all that matters is that the likelihood of default be sufficiently small under some feasible terms.

Funding

- This project received funding from the Tata Consultancy Services (TCS) Presidential Fellowship, the Block Center for Technology and Society at Carnegie Mellon University, and NSF grant DMS-1810979.

References

- Tim Bezemer, Mark CH De Groot, Enja Blasse, Maarten J Ten Berg, Teus H Kappen, Annelien L Bredenoord, Wouter W Van Solinge, Imo E Hoefer, and Saskia Haitjema. A human (e) factor in clinical decision support systems. Journal of medical Internet research, 21(3):e11732, 2019.
- Rich Caruana, Yin Lou, Johannes Gehrke, Paul Koch, Marc Sturm, and Noemie Elhadad. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1721–1730. ACM, 2015.
- Sourav Chatterjee. Assumptionless consistency of the lasso. arXiv preprint arXiv:1303.5817, 2013.
- Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. Double/debiased machine learning for treatment and causal parameters. The Econometrics Journal, 2018.
- Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. Double/debiased machine learning for treatment and structural parameters, 2018.
- Victor Chernozhukov, Mert Demirer, Esther Duflo, and Ivan Fernandez-Val. Generic machine learning inference on heterogenous treatment effects in randomized experiments. Technical report, National Bureau of Economic Research, 2018.
- Alexandra Chouldechova, Diana Benavides-Prado, Oleksandr Fialko, and Rhema Vaithianathan. A case study of algorithm-assisted decision making in child maltreatment hotline screening decisions. In Conference on Fairness, Accountability and Transparency, pages 134–148, 2018.
- Amanda Coston, Alan Mishler, Edward H Kennedy, and Alexandra Chouldechova. Counterfactual risk assessments, evaluation, and fairness. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pages 582–593, 2020.
- Maria De-Arteaga, Riccardo Fogliato, and Alexandra Chouldechova. A case for humansin-the-loop: Decisions in the presence of erroneous algorithmic scores. arXiv preprint arXiv:2002.08035, 2020.
- László Györfi, Michael Kohler, Adam Krzyzak, and Harro Walk. A distribution-free theory of nonparametric regression. Springer Science & Business Media, 2006.
- Nathan Kallus and Angela Zhou. Confounding-robust policy improvement. In Advances in Neural Information Processing Systems, pages 9269–9279, 2018.
- Danielle Leah Kehl and Samuel Ari Kessler. Algorithms in the criminal justice system: Assessing the use of risk assessments in sentencing. 2017.
- Edward H Kennedy. Semiparametric theory and empirical processes in causal inference. In Statistical causal inferences and their applications in public health research, pages 141–167.
- Edward H Kennedy. Optimal doubly robust estimation of heterogeneous causal effects. arXiv preprint arXiv:2004.14497, 2020.
- Amir E Khandani, Adlar J Kim, and Andrew W Lo. Consumer credit-risk models via machine-learning algorithms. Journal of Banking & Finance, 34(11):2767–2787, 2010.
- David Madras, Elliot Creager, Toniann Pitassi, and Richard Zemel. Fairness through causal awareness: Learning causal latent-variable models for biased data. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 349–358. ACM, 2019.
- Sara Magliacane, Thijs van Ommen, Tom Claassen, Stephan Bongers, Philip Versteeg, and Joris M Mooij. Domain adaptation by using causal inference to predict invariant conditional distributions. In Advances in Neural Information Processing Systems, pages 10846–10856, 2018.
- Maggie Makar, Adith Swaminathan, and Emre Kıcıman. A distillation approach to data efficient individual treatment effect estimation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 4544–4551, 2019.
- J Neyman. Sur les applications de la theorie des probabilites aux experiences agricoles: essai des principes (masters thesis); justification of applications of the calculus of probabilities to the solutions of certain questions in agricultural experimentation. excerpts english translation (reprinted). Stat Sci, 5:463–472, 1923.
- James Robins, Lingling Li, Eric Tchetgen, Aad van der Vaart, et al. Higher order influence functions and minimax estimation of nonlinear functionals. In Probability and statistics: essays in honor of David A. Freedman, pages 335–421. Institute of Mathematical Statistics, 2008.
- James M Robins. Marginal structural models versus structural nested models as tools for causal inference. In Statistical models in epidemiology, the environment, and clinical trials, pages 95–133.
- James M Robins and Andrea Rotnitzky. Semiparametric efficiency in multivariate regression models with missing data. Journal of the American Statistical Association, 90(429): 122–129, 1995.
- James M Robins, Andrea Rotnitzky, and Lue Ping Zhao. Estimation of regression coefficients when some regressors are not always observed. Journal of the American statistical Association, 89(427):846–866, 1994.
- James M Robins, Miguel Angel Hernan, and Babette Brumback. Marginal structural models and causal inference in epidemiology, 2000.
- Daniel Rubin and Mark J van der Laan. Extending marginal structural models through local, penalized, and additive learning. 2006.
- Donald B Rubin. Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association, 100(469):322–331, 2005.
- Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
- Peter Schulam and Suchi Saria. Reliable decision support using counterfactual models. In Advances in Neural Information Processing Systems, pages 1697–1708, 2017.
- Uri Shalit, Fredrik D Johansson, and David Sontag. Estimating individual treatment effect: generalization bounds and algorithms. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 3076–3085. JMLR. org, 2017.
- Vernon C Smith, Adam Lange, and Daniel R Huston. Predictive modeling to forecast student outcomes and drive effective interventions in online community college courses. Journal of Asynchronous Learning Networks, 16(3):51–61, 2012.
- Adarsh Subbaswamy and Suchi Saria. Counterfactual normalization: Proactively addressing dataset shift and improving reliability using causal mechanisms. Uncertainty in Artificial Intelligence, 2018.
- Adarsh Subbaswamy, Peter Schulam, and Suchi Saria. Preventing failures due to dataset shift: Learning predictive models that transport. arXiv preprint arXiv:1812.04597, 2018.
- Sarah Tan, Rich Caruana, Giles Hooker, and Yin Lou. Distill-and-compare: Auditing blackbox models using transparent model distillation. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, pages 303–310, 2018.
- Mark J Van Der Laan and Sandrine Dudoit. Unified cross-validation methodology for selection among estimators and a general cross-validated adaptive epsilon-net estimator: Finite sample oracle inequalities and examples. 2003.
- Mark J van der Laan and Alexander R Luedtke. Targeted learning of an optimal dynamic treatment, and statistical inference for its mean outcome. 2014.
- Mark J Van der Laan, MJ Laan, and James M Robins. Unified methods for censored longitudinal data and causality. Springer Science & Business Media, 2003.
- Stefan Wager and Susan Athey. Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523): 1228–1242, 2018.
- Jiaming Zeng, Berk Ustun, and Cynthia Rudin. Interpretable classification models for recidivism prediction. Journal of the Royal Statistical Society: Series A (Statistics in Society), 180(3):689–722, 2017.
- Wenjing Zheng and Mark J van der Laan. Asymptotic theory for cross-validated targeted maximum likelihood estimation. UC Berkeley Division of Biostatistics Working Paper Series, 2010.
- Michael Zimmert and Michael Lechner. Nonparametric estimation of causal heterogeneity under high-dimensional confounding. arXiv preprint arXiv:1908.08779, 2019.
- Footnote 2: The specification for ν follows from the marginalization of μ over Z. The propensity score π depends on the sigmoid of a sparse linear function in V and Z that uses coefficients 1/√(k_v + k_z) in order to satisfy our positivity condition.
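The footnote's propensity specification can be sketched as follows, using the simulation dimensions from Table 1 (our reading: the nonzero coefficients all equal 1/√(k_v + k_z), which keeps the linear index at unit scale and hence the propensity scores bounded away from 0 and 1).

```python
# Sketch of the footnote's propensity specification: sigmoid of a sparse
# linear function of (V, Z) with coefficients 1/sqrt(k_v + k_z). Dimensions
# follow Table 1 (d = 500, d_V = 400, k_v = 24, k_z = 20); which coordinates
# are active is our illustrative choice.
import numpy as np

rng = np.random.default_rng(3)
n, d_v, d_z, k_v, k_z = 10_000, 400, 100, 24, 20
V = rng.normal(size=(n, d_v))
Z = rng.normal(size=(n, d_z))

c = 1 / np.sqrt(k_v + k_z)
beta_v = np.zeros(d_v); beta_v[:k_v] = c   # sparse: only k_v active coords
beta_z = np.zeros(d_z); beta_z[:k_z] = c   # sparse: only k_z active coords

index = V @ beta_v + Z @ beta_z            # variance (k_v + k_z) * c^2 = 1
pi = 1 / (1 + np.exp(-index))
print(pi.min(), pi.max())                  # well inside (0, 1): positivity
```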
