Counterfactual Predictions under Runtime Confounding

NeurIPS 2020.

Abstract:

Algorithms are commonly used to predict outcomes under a particular decision or intervention, such as predicting whether an offender will succeed on parole if placed under minimal supervision. Generally, to learn such counterfactual prediction models from observational data on historical decisions and corresponding outcomes, one must me…

Introduction
  • Algorithmic tools are increasingly prevalent in domains such as health care, education, lending, criminal justice, and child welfare [2, 7, 12, 15, 30].
  • Decision-makers need to know what is likely to happen if they choose to take a particular action.
  • An undergraduate program advisor determining which students to recommend for a personalized case management program might wish to know the likelihood that a given student will graduate if enrolled in the program.
  • A parole board determining whether to release an offender may wish to know the likelihood that the offender will succeed on parole under different possible levels of supervision intensity
Highlights
  • Contributions: Drawing upon techniques used in low-dimensional treatment effect estimation [6, 37, 41], we propose a procedure for the full pipeline of learning and evaluating prediction models under runtime confounding
  • Our goal is to predict outcomes under a proposed decision in order to inform human decision-makers about what is likely to happen under that treatment
  • We propose a generic procedure for learning counterfactual predictions under runtime confounding that can be used with any parametric or nonparametric learning algorithm
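The two-stage shape of such a procedure can be illustrated with a minimal sketch: estimate nuisance functions (outcome regression and propensity score) on the full training-time features, then regress doubly-robust pseudo-outcomes on the runtime-available predictors alone. The data-generating process and estimator choices below are illustrative assumptions, not the paper's experimental setup.

```python
# Sketch of a two-stage doubly-robust (DR) counterfactual prediction learner.
# DGP and estimators are illustrative assumptions, not the paper's setup.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n = 40_000
V = rng.normal(size=n)   # predictors available at runtime
Z = rng.normal(size=n)   # confounders available only at training time
A = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-(V + Z)))).astype(int)
# For simplicity we generate the potential outcome under a = 0 for everyone;
# only units with A = 0 contribute their Y to the estimator below.
Y = V + Z + rng.normal(scale=0.5, size=n)
# Target: nu(v) = E[Y^{a=0} | V = v] = v, since Z is independent of V.

X = np.column_stack([V, Z])
fold1 = np.arange(n) < n // 2   # sample splitting (cross-fitting would swap folds)
fold2 = ~fold1

# Stage 1: nuisance estimation on fold 1 using the full features (V, Z).
pi = LogisticRegression().fit(X[fold1], A[fold1])                       # P(A | V, Z)
mu = LinearRegression().fit(X[fold1 & (A == 0)], Y[fold1 & (A == 0)])   # E[Y | A=0, V, Z]

# Stage 2: regress DR pseudo-outcomes on V alone, on fold 2.
pi0 = np.clip(pi.predict_proba(X[fold2])[:, 0], 0.05, 1.0)  # P(A=0|V,Z), truncated
mu0 = mu.predict(X[fold2])
phi = (A[fold2] == 0) / pi0 * (Y[fold2] - mu0) + mu0        # E[phi | V] = nu(V)
second = LinearRegression().fit(V[fold2][:, None], phi)

slope_dr, intercept_dr = second.coef_[0], second.intercept_
```

Because the second stage is an ordinary regression of pseudo-outcomes on V, any off-the-shelf parametric or nonparametric learner can be substituted for the final `LinearRegression`.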
Methods
  • Standard counterfactual prediction methods train models on the cases that received treatment a [8, 28], a procedure the authors will refer to as treatment-conditional regression (TCR).
  • This procedure estimates ω(v) = E[Y | A = a, V = v].
  • The authors can characterize the bias of this approach by analyzing b(v) := ω(v) − ν(v), a quantity they term the pointwise confounding bias.
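The confounding bias of TCR can be seen in a small simulation. The data-generating process below is an illustrative assumption, chosen so that ν(v) = E[Y^{a=0} | V = v] = v exactly; TCR fit on the untreated cases alone still misses this target because conditioning on A = 0 induces dependence on the hidden confounder Z.

```python
# Sketch: treatment-conditional regression (TCR) under runtime confounding.
# Illustrative DGP: nu(v) = E[Y^{a=0} | V = v] = v, since Z is independent of V.
import numpy as np

rng = np.random.default_rng(1)
n = 40_000
V = rng.normal(size=n)   # runtime-available predictor
Z = rng.normal(size=n)   # training-only confounder, independent of V
A = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-(V + Z)))).astype(int)
Y0 = V + Z + rng.normal(scale=0.5, size=n)   # potential outcome under a = 0

# TCR: fit E[Y | A = 0, V = v] by least squares on the untreated cases only.
mask = A == 0
design = np.column_stack([np.ones(mask.sum()), V[mask]])
(intercept_tcr, slope_tcr), *_ = np.linalg.lstsq(design, Y0[mask], rcond=None)

# Conditioning on A = 0 makes Z negative on average given V, so TCR estimates
# omega(v) = v + E[Z | A = 0, V = v] rather than nu(v) = v: the gap is b(v).
```

With this design the fitted intercept is well below 0 and the slope well below 1, even though the true ν(v) = v, which is exactly the pointwise confounding bias b(v) described above.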
Results
  • Evaluation method

    The authors describe an approach for evaluating the prediction methods using observed data.
  • The authors propose a doubly-robust procedure to estimate the prediction error that follows the approach in [8], which focused on classification metrics and did not consider MSE.
  • Algorithm 5 describes this procedure.
  • This evaluation method can be used to select the regression estimators for the first and second stages.
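The model-selection use of such an evaluation can be sketched with pseudo-outcomes: since E[φ | V] = ν(V), the mean squared distance between φ and any candidate predictor equals that candidate's true counterfactual MSE plus a constant that does not depend on the candidate, so ranking candidates by it recovers the MSE ranking. A minimal sketch under an assumed DGP; for brevity the true nuisance functions are plugged in, where in practice they would be cross-fit estimates.

```python
# Sketch: ranking candidate predictors with doubly-robust pseudo-outcomes.
# Assumed DGP; true nuisances used for brevity (in practice, cross-fit estimates).
import numpy as np

rng = np.random.default_rng(2)
n = 40_000
V = rng.normal(size=n)
Z = rng.normal(size=n)
pi0 = 1.0 / (1.0 + np.exp(V + Z))            # true P(A = 0 | V, Z)
A = (rng.uniform(size=n) > pi0).astype(int)
Y0 = V + Z + rng.normal(scale=0.5, size=n)   # potential outcome under a = 0

mu0 = V + Z                                  # true E[Y | A = 0, V, Z]
phi = (A == 0) / np.clip(pi0, 0.05, 1.0) * (Y0 - mu0) + mu0   # E[phi | V] = nu(V) = V

def pseudo_mse(nu_hat):
    # E[(phi - nu_hat(V))^2] = E[(nu(V) - nu_hat(V))^2] + const, so this
    # ranks candidate predictors by their true counterfactual MSE.
    return np.mean((phi - nu_hat) ** 2)

mse_correct = pseudo_mse(V)            # candidate nu_hat(v) = v (well specified)
mse_misspec = pseudo_mse(np.zeros(n))  # candidate nu_hat(v) = 0 (misspecified)
```

The correct candidate attains the smaller pseudo-MSE, so comparing candidates this way selects the better second-stage regression without ever observing the counterfactual outcomes directly.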
Conclusion
  • The authors propose a generic procedure for learning counterfactual predictions under runtime confounding that can be used with any parametric or nonparametric learning algorithm.
  • The authors' theoretical and empirical analysis suggests this procedure will often outperform other methods when the level of runtime confounding is significant.
Objectives
  • The authors' goal is to predict outcomes under a proposed decision in order to inform human decision-makers about what is likely to happen under that treatment.
  • Concretely, they predict outcomes under a proposed treatment A = a ∈ {0, 1} based on runtime-available predictors V ∈ V ⊆ R^dV.
  • Following [8], they assess risk under the null treatment, and they construct π such that, historically, the riskier cases were more likely to receive the risk-mitigating treatment and the less risky cases were more likely to receive the baseline treatment.
Tables
  • Table 1: MSE E[(ν̂(V) − ν(V))²] under correct specification vs. misspecification in the second stage, for d = 500, dV = 400, kv = 24, kz = 20, and n = 3000 (with 95% confidence intervals). Our DR method has the lowest error in both settings. Errors are larger for all methods under misspecification.
Related work
  • Our work builds upon a growing literature on counterfactual risk assessments for decision support that proposes methods for the unconfounded prediction setting [8, 28]. Following this literature, our goal is to predict outcomes under a proposed decision (interchangeably referred to as 'treatment' or 'intervention') in order to inform human decision-makers about what is likely to happen under that treatment. This prediction task is different from the common causal inference problem of treatment effect estimation, which targets a contrast of outcomes under two different treatments [29, 38]. Treatment effects are useful for describing responsiveness to treatment. While responsiveness is relevant to some types of decisions, it is insufficient, or even irrelevant, to consider for others. For instance, a doctor considering an invasive procedure may make a different recommendation for two patients with the same responsiveness if one has a good probability of successful recovery without the procedure and the other does not. In other settings, such as loan approval, the responsiveness to different loan terms is irrelevant; all that matters is that the likelihood of default be sufficiently small under some feasible terms.
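The distinction between treatment effects and counterfactual predictions can be made concrete with a toy numeric example (the probabilities are made up for illustration): two patients with identical responsiveness to a procedure but very different counterfactual recovery probabilities.

```python
# Toy illustration: identical treatment effects, different counterfactual
# predictions. Probabilities are hypothetical, invented for this example.
p_recover = {
    "patient_1": {"no_procedure": 0.90, "procedure": 0.95},
    "patient_2": {"no_procedure": 0.40, "procedure": 0.45},
}
# Treatment effect = contrast of outcomes under the two treatments.
effects = {k: v["procedure"] - v["no_procedure"] for k, v in p_recover.items()}
# Both effects equal 0.05, so effect estimation cannot distinguish the
# patients; the counterfactual prediction under "no_procedure" (0.90 vs
# 0.40) is what separates them for the doctor's recommendation.
```
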
Funding
  • This project received funding from the Tata Consultancy Services (TCS) Presidential Fellowship, Block Center for Technology and Society at Carnegie Mellon University, and NSF grant DMS1810979
Reference
  • Tim Bezemer, Mark CH De Groot, Enja Blasse, Maarten J Ten Berg, Teus H Kappen, Annelien L Bredenoord, Wouter W Van Solinge, Imo E Hoefer, and Saskia Haitjema. A human(e) factor in clinical decision support systems. Journal of Medical Internet Research, 21(3):e11732, 2019.
  • Rich Caruana, Yin Lou, Johannes Gehrke, Paul Koch, Marc Sturm, and Noemie Elhadad. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1721–1730. ACM, 2015.
  • Sourav Chatterjee. Assumptionless consistency of the lasso. arXiv preprint arXiv:1303.5817, 2013.
  • Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. Double/debiased machine learning for treatment and causal parameters. The Econometrics Journal, 2018.
  • Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. Double/debiased machine learning for treatment and structural parameters, 2018.
  • Victor Chernozhukov, Mert Demirer, Esther Duflo, and Ivan Fernandez-Val. Generic machine learning inference on heterogenous treatment effects in randomized experiments. Technical report, National Bureau of Economic Research, 2018.
  • Alexandra Chouldechova, Diana Benavides-Prado, Oleksandr Fialko, and Rhema Vaithianathan. A case study of algorithm-assisted decision making in child maltreatment hotline screening decisions. In Conference on Fairness, Accountability and Transparency, pages 134–148, 2018.
  • Amanda Coston, Alan Mishler, Edward H Kennedy, and Alexandra Chouldechova. Counterfactual risk assessments, evaluation, and fairness. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pages 582–593, 2020.
  • Maria De-Arteaga, Riccardo Fogliato, and Alexandra Chouldechova. A case for humans-in-the-loop: Decisions in the presence of erroneous algorithmic scores. arXiv preprint arXiv:2002.08035, 2020.
  • László Györfi, Michael Kohler, Adam Krzyzak, and Harro Walk. A Distribution-Free Theory of Nonparametric Regression. Springer Science & Business Media, 2006.
  • Nathan Kallus and Angela Zhou. Confounding-robust policy improvement. In Advances in Neural Information Processing Systems, pages 9269–9279, 2018.
  • Danielle Leah Kehl and Samuel Ari Kessler. Algorithms in the criminal justice system: Assessing the use of risk assessments in sentencing. 2017.
  • Edward H Kennedy. Semiparametric theory and empirical processes in causal inference. In Statistical Causal Inferences and Their Applications in Public Health Research, pages 141–167.
  • Edward H Kennedy. Optimal doubly robust estimation of heterogeneous causal effects. arXiv preprint arXiv:2004.14497, 2020.
  • Amir E Khandani, Adlar J Kim, and Andrew W Lo. Consumer credit-risk models via machine-learning algorithms. Journal of Banking & Finance, 34(11):2767–2787, 2010.
  • David Madras, Elliot Creager, Toniann Pitassi, and Richard Zemel. Fairness through causal awareness: Learning causal latent-variable models for biased data. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 349–358. ACM, 2019.
  • Sara Magliacane, Thijs van Ommen, Tom Claassen, Stephan Bongers, Philip Versteeg, and Joris M Mooij. Domain adaptation by using causal inference to predict invariant conditional distributions. In Advances in Neural Information Processing Systems, pages 10846–10856, 2018.
  • Maggie Makar, Adith Swaminathan, and Emre Kıcıman. A distillation approach to data-efficient individual treatment effect estimation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 4544–4551, 2019.
  • J Neyman. Sur les applications de la theorie des probabilites aux experiences agricoles: essai des principes (master's thesis); justification of applications of the calculus of probabilities to the solutions of certain questions in agricultural experimentation. Excerpts, English translation (reprinted). Stat Sci, 5:463–472, 1923.
  • James Robins, Lingling Li, Eric Tchetgen, Aad van der Vaart, et al. Higher order influence functions and minimax estimation of nonlinear functionals. In Probability and Statistics: Essays in Honor of David A. Freedman, pages 335–421. Institute of Mathematical Statistics, 2008.
  • James M Robins. Marginal structural models versus structural nested models as tools for causal inference. In Statistical Models in Epidemiology, the Environment, and Clinical Trials, pages 95–133.
  • James M Robins and Andrea Rotnitzky. Semiparametric efficiency in multivariate regression models with missing data. Journal of the American Statistical Association, 90(429):122–129, 1995.
  • James M Robins, Andrea Rotnitzky, and Lue Ping Zhao. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89(427):846–866, 1994.
  • James M Robins, Miguel Angel Hernan, and Babette Brumback. Marginal structural models and causal inference in epidemiology, 2000.
  • Daniel Rubin and Mark J van der Laan. Extending marginal structural models through local, penalized, and additive learning. 2006.
  • Donald B Rubin. Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association, 100(469):322–331, 2005.
  • Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
  • Peter Schulam and Suchi Saria. Reliable decision support using counterfactual models. In Advances in Neural Information Processing Systems, pages 1697–1708, 2017.
  • Uri Shalit, Fredrik D Johansson, and David Sontag. Estimating individual treatment effect: generalization bounds and algorithms. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pages 3076–3085. JMLR.org, 2017.
  • Vernon C Smith, Adam Lange, and Daniel R Huston. Predictive modeling to forecast student outcomes and drive effective interventions in online community college courses. Journal of Asynchronous Learning Networks, 16(3):51–61, 2012.
  • Adarsh Subbaswamy and Suchi Saria. Counterfactual normalization: Proactively addressing dataset shift and improving reliability using causal mechanisms. Uncertainty in Artificial Intelligence, 2018.
  • Adarsh Subbaswamy, Peter Schulam, and Suchi Saria. Preventing failures due to dataset shift: Learning predictive models that transport. arXiv preprint arXiv:1812.04597, 2018.
  • Sarah Tan, Rich Caruana, Giles Hooker, and Yin Lou. Distill-and-compare: Auditing black-box models using transparent model distillation. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, pages 303–310, 2018.
  • Mark J van der Laan and Sandrine Dudoit. Unified cross-validation methodology for selection among estimators and a general cross-validated adaptive epsilon-net estimator: Finite sample oracle inequalities and examples. 2003.
  • Mark J van der Laan and Alexander R Luedtke. Targeted learning of an optimal dynamic treatment, and statistical inference for its mean outcome. 2014.
  • Mark J van der Laan and James M Robins. Unified Methods for Censored Longitudinal Data and Causality. Springer Science & Business Media, 2003.
  • Stefan Wager and Susan Athey. Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523):1228–1242, 2018.
  • Jiaming Zeng, Berk Ustun, and Cynthia Rudin. Interpretable classification models for recidivism prediction. Journal of the Royal Statistical Society: Series A (Statistics in Society), 180(3):689–722, 2017.
  • Wenjing Zheng and Mark J van der Laan. Asymptotic theory for cross-validated targeted maximum likelihood estimation. UC Berkeley Division of Biostatistics Working Paper Series, 2010.
  • Michael Zimmert and Michael Lechner. Nonparametric estimation of causal heterogeneity under high-dimensional confounding. arXiv preprint arXiv:1908.08779, 2019.
Footnotes
  • 2. The specification for ν follows from the marginalization of μ over Z. The propensity score π depends on the sigmoid of a sparse linear function in V and Z that uses coefficients 1/√(kv + kz) in order to satisfy our positivity condition.