We develop a new general setting of observational causal effect estimation called estimation with functional confounders where the confounder can be expressed as a function of the data, meaning positivity is violated

Causal Estimation with Functional Confounders

NIPS 2020, (2020): 5115-5125


Abstract

Causal inference relies on two fundamental assumptions: ignorability and positivity. We study causal inference when the true confounder value can be expressed as a function of the observed data; we call this setting estimation with functional confounders (EFC). In this setting, ignorability is satisfied; however, positivity is violated, ...

Introduction
  • Determining the effect of interventions on outcomes using observational data lies at the core of many fields like medicine, economic policy, and genomics.
  • Variables that affect both the intervention and the outcome, called confounders, may be unobserved.
  • A necessary condition for the causal effect to be identified is that all confounders are observed; this condition is called ignorability.
  • A sufficient condition for causal effect estimation is adequate variation in the intervention after conditioning on the confounders; this condition is called positivity.
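In potential-outcome notation, the two assumptions named above are commonly written as shown here. The symbols are notation assumed for this sketch rather than taken from the summary: t is the intervention, c the confounder, y the observed outcome, and y_a the potential outcome under intervention value a.

```latex
\begin{align*}
\text{Ignorability:} \quad & y_a \perp\!\!\!\perp t \mid c
  && \text{for every intervention value } a, \\
\text{Positivity:} \quad & p(t = a \mid c = c') > 0
  && \text{whenever } p(t = a) > 0 \text{ and } p(c = c') > 0, \\
\text{Identification:} \quad & \mathbb{E}[y_a] = \mathbb{E}_{c}\big[\, \mathbb{E}[\, y \mid t = a,\ c \,] \,\big]
  && \text{when both hold.}
\end{align*}
```

The last line is the standard adjustment formula: the inner conditional expectation is only well defined at (a, c) pairs that can actually occur together, which is what positivity guarantees.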
Highlights
  • Determining the effect of interventions on outcomes using observational data lies at the core of many fields like medicine, economic policy, and genomics.
  • When positivity is violated in traditional observational causal inference (OBS-CI), not all effects are estimable without further assumptions.
  • We develop a new general setting of observational causal effect estimation called estimation with functional confounders (EFC), where the confounder can be expressed as a function of the data, meaning positivity is violated.
  • We develop a sufficient condition called functional positivity (F-POSITIVITY) to estimate effects of functional interventions.
  • Given a confounder value, C-REDUNDANCY allows us to compute a surrogate intervention such that the conditional effect of the surrogate is equal to that of the original intervention. We show that such surrogate interventions exist only under a certain condition that we call Effect Connectivity, which is necessary for nonparametric effect estimation in EFC.
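A short worked derivation may help show why a confounder that is a function of the data breaks positivity. It assumes a discrete intervention t and writes the confounder as c = h(t); the function h and the toy example that follows are illustrative choices, not taken from the paper.

```latex
\begin{align*}
c = h(t)
  \;&\Longrightarrow\; p(t = a \mid c = c') = \frac{p(t = a)\,\mathbb{1}[h(a) = c']}{p(c = c')}, \\
h(a) \neq c'
  \;&\Longrightarrow\; p(t = a \mid c = c') = 0 \quad \text{even if } p(t = a) > 0 .
\end{align*}
```

For instance, take t = (t_1, t_2) uniform on {0, 1}^2 and h(t) = t_1 + t_2. Conditioning on c = 1 leaves only (0, 1) and (1, 0) with positive probability, while (0, 0) and (1, 1) have conditional probability zero although each has marginal probability 1/4. Positivity therefore fails whenever h is not constant on the support of t.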
Methods
  • The authors first evaluate LODE (Level-set Orthogonal Descent Estimation) on simulated data, where ground truth is available, and show that it can correct for confounding.
  • They also investigate different properties of LODE on the simulated data.
  • In the simulations, the intervention t has dimension T = 20 and the outcome noise is η ∼ N(0, 0.1).
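The summary does not give the full data-generating process, so the following is only a minimal sketch of what a simulation with a functional confounder could look like. Only T = 20 and the N(0, 0.1) outcome noise come from the text; the sample size, the linear outcome model, the coefficients `beta` and `gamma`, the confounder function `h`, and the reading of 0.1 as a variance are all assumptions made for illustration. It does not implement LODE; it only shows the confounding bias that an estimator would need to correct.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 20            # dimension of the intervention t (from the text)
n = 5000          # sample size (assumed)
noise_var = 0.1   # outcome noise N(0, 0.1), read here as variance 0.1 (assumed)

beta = rng.normal(size=T)   # hypothetical true effect of each coordinate of t
gamma = 2.0                 # hypothetical strength of confounding


def h(t):
    """Hypothetical functional confounder: the mean of t's coordinates."""
    return t.mean(axis=-1)


t = rng.normal(size=(n, T))
c = h(t)                                    # confounder is a function of t
eta = rng.normal(scale=np.sqrt(noise_var), size=n)
y = t @ beta + gamma * c + eta              # assumed linear outcome model

# Naive least squares of y on t absorbs the confounder into the coefficients.
naive_coef, *_ = np.linalg.lstsq(t, y, rcond=None)
bias = naive_coef - beta                    # roughly gamma / T per coordinate
print(np.round(bias.mean(), 3))
```

Under these assumptions, regressing y on t recovers beta + gamma / T rather than beta, because the contribution of c = h(t) cannot be separated from t without further structure.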
Results
  • The authors select relevant SNPs by thresholding estimated effects at a magnitude > 0.1.
  • From 1050 SNPs (1000 not reported before), LODE returned 31 SNPs, 13 of which were previously reported as being associated with Celiac disease [8, 25, 14, 1].
  • In Appendix B.2, the authors plot the true positive and false negative rates of identifying previously reported SNPs as a function of the effect threshold.
  • In Table 1, the authors list a few SNPs that were both deemed relevant by LODE and reported in existing literature.
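The selection rule described above (keep SNPs whose estimated effect magnitude exceeds 0.1, then compare against previously reported SNPs) can be sketched as follows. The function names, SNP identifiers, and effect values are made up for illustration; only the 0.1 threshold comes from the text.

```python
import numpy as np


def select_snps(names, effects, threshold=0.1):
    """Return names of SNPs whose estimated effect magnitude exceeds threshold."""
    effects = np.asarray(effects, dtype=float)
    keep = np.abs(effects) > threshold
    return [name for name, k in zip(names, keep) if k]


def true_positive_rate(selected, reported):
    """Fraction of previously reported SNPs that were selected."""
    reported = set(reported)
    if not reported:
        return float("nan")
    return len(reported & set(selected)) / len(reported)


# Made-up SNP identifiers and effect estimates, for illustration only.
names = ["snp_a", "snp_b", "snp_c", "snp_d"]
effects = [0.25, -0.12, 0.03, -0.08]
reported = ["snp_a", "snp_c"]

selected = select_snps(names, effects)
tpr = true_positive_rate(selected, reported)
```

Sweeping `threshold` over a grid and recording `true_positive_rate` at each value would give a curve of the kind the authors describe plotting against the effect threshold.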
Conclusion
  • When positivity is violated in traditional OBS-CI, not all effects are estimable without further assumptions.
  • In such cases, practitioners have to turn to parametric models to estimate causal effects.
  • The authors develop a sufficient condition called functional positivity (F-POSITIVITY) to estimate effects of functional interventions.
  • Such effects could be of independent interest, for example the effect of the cumulative dosage of a drug instead of the joint effects of multiple dosages at different times.
Tables
  • Table 1: A few SNPs previously reported as relevant and recovered by LODE, with estimated effects and Lasso coefficients. LODE produces effect estimates that do not rely purely on the coefficients.
Related work
  • The problem of genome-wide association studies (GWAS) is to estimate the effect of genetic variations (also called single nucleotide polymorphisms, SNPs) on the phenotype [29]. The ancestry of the subjects acts as a confounder in GWAS. In GWAS practice, principal component analysis (PCA) and linear mixed models (LMMs) are used to compute this confounding structure [19, 31]. Lippert et al. [15] suggest estimating the confounders and effects on separate subsets of the SNPs. This separation disregards the confounding that is captured in the interaction of the two subsets of SNPs. GWAS is a special case of effects from multiple treatments (MTE) where the confounder value is specified via optimization as a function of the pre-outcome variables [20, 30]. In all these settings, positivity is violated and not all effects are estimable. We provide an avenue for nonparametric effect estimation of the full intervention under a new sufficient condition.
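To make the PCA step concrete, here is a minimal sketch of computing principal-component scores from a genotype matrix with NumPy. The function name, matrix sizes, and synthetic genotypes are assumptions for illustration; this is generic PCA via SVD, not the specific pipeline of any cited work.

```python
import numpy as np


def top_principal_components(genotypes, k):
    """Project individuals onto the top-k principal components of the SNP matrix."""
    centered = genotypes - genotypes.mean(axis=0)
    # SVD of the centered matrix; rows of vt are the principal directions.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]


rng = np.random.default_rng(0)
# Synthetic genotypes: 200 individuals x 50 SNPs, minor-allele counts in {0, 1, 2}.
genotypes = rng.binomial(2, 0.3, size=(200, 50)).astype(float)
pcs = top_principal_components(genotypes, k=5)
```

The scores in `pcs` are a deterministic function of the SNP matrix itself, which illustrates why a confounder estimated this way is a function of the pre-outcome variables.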
Funding
  • The authors were partly supported by NIH/NHLBI Award R01HL148248, and by NSF Award 1922658 NRT-HDR: FUTURE Foundations, Translation, and Responsibility for Data Science
Study subjects and analysis
cases: 3796
In this experiment, we explore the associations of genetic factors and Celiac disease. We utilize data from the Wellcome Trust Celiac disease GWAS dataset [8, 6] consisting of individuals with Celiac disease, called cases (n = 3796), and controls (n = 8154). We construct our dataset by filtering from the ∼550,000 SNPs

people: 11950
The only preprocessing in our experiments is linkage disequilibrium pruning of adjacent SNPs (at R² = 0.5) and PLINK [5] quality control. After this, 337,642 SNPs remain for 11,950 people. We imputed missing SNPs for each person by sampling from the marginal distribution of that SNP.
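The imputation step described above can be sketched as follows, assuming missing genotypes are encoded as NaN and that "the marginal distribution of that SNP" is approximated by the empirical distribution of its observed values. The function name and the toy matrix are hypothetical.

```python
import numpy as np


def impute_from_marginal(genotypes, rng):
    """Fill NaNs in each SNP column by sampling from that column's observed values."""
    imputed = genotypes.copy()
    for j in range(imputed.shape[1]):
        column = imputed[:, j]            # view into `imputed`
        missing = np.isnan(column)
        observed = column[~missing]
        if missing.any() and observed.size > 0:
            # Sampling observed values with replacement draws from the
            # empirical marginal distribution of this SNP.
            column[missing] = rng.choice(observed, size=missing.sum(), replace=True)
    return imputed


rng = np.random.default_rng(0)
# Toy matrix: rows are people, columns are SNPs, NaN marks a missing genotype.
genotypes = np.array([
    [0.0, 2.0, np.nan],
    [1.0, np.nan, 1.0],
    [np.nan, 2.0, 1.0],
    [1.0, 0.0, 1.0],
])
filled = impute_from_marginal(genotypes, rng)
```

Sampling each SNP independently preserves its marginal allele frequencies but ignores correlation between SNPs, which is a simplification of this kind of imputation.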

Reference
  • Svetlana Adamovic, SS Amundsen, BA Lie, AH Gudjonsdottir, H Ascher, J Ek, DA Van Heel, S Nilsson, LM Sollid, and A Torinsson Naluai. Association study of IL2/IL21 and FcgRIIa: significant association with the IL2/IL21 region in Scandinavian coeliac disease families. Genes and immunity, 9(4):364, 2008.
  • Carl A Anderson, Gabrielle Boucher, Charlie W Lees, Andre Franke, Mauro D’Amato, Kent D Taylor, James C Lee, Philippe Goyette, Marcin Imielinski, Anna Latiano, et al. Meta-analysis identifies 29 additional ulcerative colitis risk loci, increasing the number of confirmed associations to 47. Nature genetics, 43(3):246, 2011.
  • Uri M Ascher and Linda R Petzold. Computer methods for ordinary differential equations and differential-algebraic equations, volume 61.
  • William Astle, David J Balding, et al. Population structure and cryptic relatedness in genetic association studies. Statistical Science, 24(4):451–471, 2009.
  • Christopher C Chang, Carson C Chow, Laurent CAM Tellier, Shashaank Vattikuti, Shaun M Purcell, and James J Lee. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience, 4(1):s13742–015, 2015.
  • Wellcome Trust Case Control Consortium et al. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 447(7145):661, 2007.
  • J. Correa and E. Bareinboim. A calculus for stochastic interventions: Causal effect identification and surrogate experiments. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, NY, 2020. AAAI Press.
  • Patrick CA Dubois, Gosia Trynka, Lude Franke, Karen A Hunt, Jihane Romanos, Alessandra Curtotti, Alexandra Zhernakova, Graham AR Heap, Roza Adany, Arpo Aromaa, et al. Multiple common variants for celiac disease influencing immune gene expression. Nature genetics, 42(4):295, 2010.
  • Frederick Eberhardt and Richard Scheines. Interventions and causal inference. Philosophy of Science, 74(5):981–995, 2007.
  • Miguel A Hernan and James M Robins. Causal inference: what if. Boca Raton: Chapman & Hall/CRC, 2020.
  • Jennifer L. Hill. Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics, 20(1):217–240, 2011. doi: 10.1198/jcgs.2010.08162. URL https://doi.org/10.1198/jcgs.2010.08162.
  • Lucia A Hindorff, Praveen Sethupathy, Heather A Junkins, Erin M Ramos, Jayashri P Mehta, Francis S Collins, and Teri A Manolio. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proceedings of the National Academy of Sciences, 106(23):9362–9367, 2009.
  • Morris W Hirsch, Robert L Devaney, and Stephen Smale. Differential equations, dynamical systems, and linear algebra, volume 60. Academic press, 1974.
  • Karen A Hunt, Alexandra Zhernakova, Graham Turner, Graham AR Heap, Lude Franke, Marcel Bruinenberg, Jihane Romanos, Lotte C Dinesen, Anthony W Ryan, Davinder Panesar, et al. Novel celiac disease genetic determinants related to the immune response. Nature genetics, 40(4):395, 2008.
  • Christoph Lippert, Jennifer Listgarten, Ying Liu, Carl M Kadie, Robert I Davidson, and David Heckerman. Fast linear mixed models for genome-wide association studies. Nature methods, 8(10):833, 2011.
  • Virginia Pascual, Romina Dieli-Crimi, Natalia Lopez-Palacios, Andres Bodas, Luz María Medrano, and Concepcion Nunez. Inflammatory bowel disease and celiac disease: overlaps and differences. World journal of gastroenterology: WJG, 20(17):4846, 2014.
  • Judea Pearl et al. Causal inference in statistics: An overview. Statistics surveys, 3:96–146, 2009.
  • Fabian Pedregosa, Gael Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of machine learning research, 12(Oct):2825–2830, 2011.
  • Alkes L Price, Nick J Patterson, Robert M Plenge, Michael E Weinblatt, Nancy A Shadick, and David Reich. Principal components analysis corrects for stratification in genome-wide association studies. Nature genetics, 38(8):904, 2006.
  • Rajesh Ranganath and Adler Perotte. Multiple causal inference with latent confounding. arXiv preprint arXiv:1805.08273, 2018.
  • Marc Ratkovic. Balancing within the margin: Causal effect estimation with support vector machines. Department of Politics, Princeton University, Princeton, NJ, 2014.
  • James M Robins. Robust estimation in sequentially ignorable missing data and causal inference models. In Proceedings of the American Statistical Association, volume 1999, pages 6–10. Indianapolis, IN, 2000.
  • Paul R Rosenbaum and Donald B Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55, 1983.
  • Donald B Rubin. Randomization analysis of experimental data: The Fisher randomization test comment. Journal of the American Statistical Association, 75(371):591–593, 1980.
  • Ludvig M Sollid. Coeliac disease: dissecting a complex inflammatory disorder. Nature Reviews Immunology, 2(9):647, 2002.
  • Michael Spivak. Calculus on manifolds: a modern approach to classical theorems of advanced calculus. CRC press, 2018.
  • Gerald Teschl. Ordinary differential equations and dynamical systems, volume 140. American Mathematical Soc., 2012.
  • Timothy Thornton and Michael Wu. Summer institute in statistical genetics 2015.
  • Peter M Visscher, Naomi R Wray, Qian Zhang, Pamela Sklar, Mark I McCarthy, Matthew A Brown, and Jian Yang. 10 years of GWAS discovery: biology, function, and translation. The American Journal of Human Genetics, 101(1):5–22, 2017.
  • Yixin Wang and David M Blei. The blessings of multiple causes. Journal of the American Statistical Association, (just-accepted):1–71, 2019.
  • Jianming Yu, Gael Pressoir, William H Briggs, Irie Vroh Bi, Masanori Yamasaki, John F Doebley, Michael D McMullen, Brandon S Gaut, Dahlia M Nielsen, James B Holland, et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature genetics, 38(2):203, 2006.
Author
Aahlad Manas Puli