Differentiable Causal Discovery from Interventional Data

NeurIPS 2020.


Abstract:

Discovering causal relationships in data is a challenging task that involves solving a combinatorial problem for which the solution is not always identifiable. A new line of work reformulates the combinatorial problem as a continuous constrained optimization one, enabling the use of different powerful optimization techniques. However, m…
Introduction
  • The inference of causal relationships is a problem of fundamental interest in science.
  • The authors aim to learn a causal graphical model (CGM) [28], which consists of a joint distribution coupled with a directed acyclic graph (DAG), where edges indicate direct causal relationships
  • Achieving this based on observational data alone is challenging since, under the faithfulness assumption, the true DAG is only identifiable up to a Markov equivalence class [38].
  • Given an intervention on a target set I_k ⊆ {1, …, d}, the kth interventional joint density is p^{(k)}(x_1, · · · , x_d) := ∏_{j ∉ I_k} f_j(x_j | x_{π_j}) ∏_{j ∈ I_k} f̃^{(k)}_j(x_j | x_{π_j}), where π_j denotes the parents of x_j in the DAG, f_j are the observational conditional densities, and f̃^{(k)}_j are the modified conditionals of the intervened variables.
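
As a concrete illustration of this factorization, here is a minimal sketch (not the authors' implementation) of the interventional log-density of one sample under a fixed DAG, assuming linear-Gaussian conditionals with unit noise variance; the names adj, weights, int_weights and targets are illustrative.

    import numpy as np
    from scipy.stats import norm

    def interventional_log_density(x, adj, weights, int_weights, targets):
        # x: sample of length d; adj[i, j] = 1 iff x_i is a parent of x_j.
        # weights: observational linear-Gaussian parameters (column j for node j).
        # int_weights: modified parameters used for nodes in the target set I_k.
        # targets: set of node indices intervened on in this regime.
        d = len(x)
        logp = 0.0
        for j in range(d):
            w = int_weights[:, j] if j in targets else weights[:, j]
            mean = x @ (adj[:, j] * w)            # linear function of the parents
            logp += norm.logpdf(x[j], loc=mean)   # unit-variance Gaussian noise
        return logp

Under a perfect intervention, the modified conditional would ignore the parents entirely (e.g., a fixed marginal for each targeted node); an imperfect intervention, as sketched above, keeps the parents but changes the parameters.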
Highlights
  • The inference of causal relationships is a problem of fundamental interest in science
  • We aim to learn a causal graphical model (CGM) [28], which consists of a joint distribution coupled with a directed acyclic graph (DAG), where edges indicate direct causal relationships
  • We propose the approach Differentiable Causal Discovery with Interventions (DCDI): a general differentiable causal structure learning method that can leverage perfect, imperfect and unknown interventions (Section 3; an edge-relaxation sketch follows this list)
  • We proposed a general continuous-constrained method for causal discovery which can leverage various types of interventional data as well as expressive neural architectures, such as normalizing flows
  • One direction is to extend DCDI to time-series data, where non-stationarities can be modeled as unknown interventions [29]
  • Another exciting direction is to learn representations of variables across multiple systems that could serve as prior knowledge for causal discovery in low data settings
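
The "differentiable" aspect rests on relaxing the discrete graph: one standard device, which DCDI adopts via the Gumbel-Softmax trick [17, 22], is to sample relaxed Bernoulli edge indicators that admit gradients. Below is a minimal binary-Concrete sketch with illustrative names, not the authors' code.

    import torch

    def sample_soft_adjacency(logits, temperature=1.0):
        # logits: (d, d) tensor of unnormalized edge log-odds.
        # Returns entries in (0, 1) that approach {0, 1} as temperature -> 0,
        # while keeping the sampling step differentiable w.r.t. logits.
        u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
        logistic_noise = torch.log(u) - torch.log(1 - u)
        soft = torch.sigmoid((logits + logistic_noise) / temperature)
        return soft * (1 - torch.eye(logits.shape[0]))  # zero diagonal: no self-loops

A straight-through estimator can then binarize the sample in the forward pass while backpropagating through the relaxation.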
Methods
  • In Table 3 the authors report SHD and SID for all methods, along with the numbers of true positive, false negative, false positive, and reversed edges, and the F1 score (a sketch of how these counts are computed follows this list).
  • IGSP has a low SHD but a high SID, which can be explained by its relatively high number of false negatives.
  • DCDI-G and DCDI-DSF have SHDs comparable to GIES and CAM, but higher than IGSP.
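
For reference, these counts can be computed directly from the estimated and ground-truth adjacency matrices. A minimal sketch of the edge counts, SHD and F1 follows (conventions for reversed edges vary across papers; SID requires the graphical criterion of [26] and is omitted):

    import numpy as np

    def edge_metrics(pred, true):
        # pred, true: (d, d) binary adjacency matrices, entry [i, j] = 1 for edge i -> j.
        tp = int(((pred == 1) & (true == 1)).sum())
        extra = (pred == 1) & (true == 0)      # predicted edges absent from the true graph
        missing = (pred == 0) & (true == 1)    # true edges absent from the prediction
        rev = int((extra & missing.T).sum())   # edges predicted in the wrong direction
        fp = int(extra.sum()) - rev
        fn = int(missing.sum()) - rev
        shd = fp + fn + rev                    # additions + deletions + reversals
        precision = tp / max(tp + fp + rev, 1)
        recall = tp / max(tp + fn + rev, 1)
        f1 = 2 * precision * recall / max(precision + recall, 1e-12)
        return dict(tp=tp, fn=fn, fp=fp, rev=rev, shd=shd, f1=f1)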
Results
  • Results for different intervention types.
  • The authors compare the methods to GIES [12], a modified version of CAM [2] that supports interventions, and IGSP [39].
  • Boxplots for SHD and SID over 10 graphs are shown in Figure 2.
  • DCDI-G and DCDI-DSF show competitive results in terms of SHD and SID.
  • For graphs with a higher average number of edges, DCDI-G and DCDI-DSF outperform all other methods.
  • GIES often shows the best performance on the linear data set, which is not surprising given that it makes the right assumptions, i.e., linear functions with Gaussian noise.
Conclusion
  • The authors proposed a general continuous-constrained method for causal discovery which can leverage various types of interventional data as well as expressive neural architectures, such as normalizing flows (a toy conditional-flow sketch follows this section).
  • This approach is rooted in a sound theoretical framework and is competitive with other state-of-the-art algorithms on real and simulated data sets, both in terms of graph recovery and scalability.
  • Another exciting direction is to learn representations of variables across multiple systems that could serve as prior knowledge for causal discovery in low data settings
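
To illustrate how a normalizing flow can serve as a conditional density p(x_j | parents) in this framework, here is a minimal sketch using a single affine transform whose parameters are produced by a network over the parents. DCDI-DSF uses the far more expressive deep sigmoidal flows of [15]; all names here are illustrative.

    import math
    import torch
    import torch.nn as nn

    class ConditionalAffineFlow(nn.Module):
        # Models p(x_j | parents) as a standard normal pushed through an
        # affine map whose shift and log-scale depend on the parents.
        def __init__(self, n_parents, hidden=16):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_parents, hidden), nn.ReLU(), nn.Linear(hidden, 2))

        def log_prob(self, x_j, parents):
            # x_j: (batch, 1); parents: (batch, n_parents).
            shift, log_scale = self.net(parents).chunk(2, dim=-1)
            z = (x_j - shift) * torch.exp(-log_scale)   # invert x = shift + scale * z
            log_base = -0.5 * z.pow(2) - 0.5 * math.log(2 * math.pi)
            return (log_base - log_scale).squeeze(-1)   # change-of-variables correction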
Summary
  • Objectives:

    The authors' goal is to design an algorithm that can automatically discover causal relationships from data.
  • They aim to learn a causal graphical model (CGM) [28], which consists of a joint distribution coupled with a directed acyclic graph (DAG), where edges indicate direct causal relationships.
Tables
  • Table1: Hyperparameter search spaces for each algorithm
  • Table2: Default Hyperparameter for DCDI-G and DCDI-DSF
  • Table3: Results for the flow cytometry data sets
  • Table4: Results for the linear data set with perfect intervention
  • Table5: Results for the additive noise model data set with perfect intervention
  • Table6: Results for the nonlinear with non-additive noise data set with perfect intervention
  • Table7: Results for the linear data set with imperfect intervention
  • Table8: Results for the additive noise model data set with imperfect intervention
  • Table9: Results for the nonlinear with non-additive noise data set with imperfect intervention
  • Table10: Results for the linear data set with perfect intervention with unknown targets
  • Table11: Results for the additive noise model data set with perfect intervention with unknown targets
  • Table12: Results for the nonlinear with non-additive noise data set with perfect intervention with unknown targets
  • Table13: Table 13
  • Table14: Table 14
  • Table15: Table 15
  • Table16: Table 16
  • Table17: Table 17
  • Table18: Table 18
  • Table19: Results for the linear data set with perfect intervention
Funding
  • This research was partially supported by the Canada CIFAR AI Chair Program, by an IVADO excellence PhD scholarship and by a Google Focused Research award
Study subjects and analysis
So far the experiments focused on moderate-size data sets, both in terms of number of variables (10 or 20) and number of examples (≈ 10^4). In Appendix C.3, we compare the running times of DCDI to those of other methods on graphs of up to 100 nodes and on data sets of up to 1 million examples. The augmented Lagrangian procedure on which DCDI relies requires the computation of a matrix exponential at each gradient step, which costs O(d^3).
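
The constraint in question is the continuous acyclicity penalty popularized by [44], h(A) = tr(exp(A ∘ A)) − d, which is zero exactly when the weighted adjacency matrix A is acyclic; evaluating it and its gradient requires the d × d matrix exponential, hence the O(d^3) cost. A minimal sketch of the penalized objective, with illustrative names rather than the authors' code:

    import torch

    def acyclicity(adj):
        # h(A) = tr(exp(A * A)) - d, zero iff adj encodes an acyclic graph [44].
        # torch.matrix_exp is the O(d^3) operation dominating each gradient step.
        d = adj.shape[0]
        return torch.trace(torch.matrix_exp(adj * adj)) - d

    def augmented_lagrangian(nll, adj, mu, lam):
        # Inner objective: negative log-likelihood plus the usual linear and
        # quadratic penalty terms on the constraint violation h.
        h = acyclicity(adj)
        return nll + lam * h + 0.5 * mu * h ** 2

Between inner optimizations, the standard schedule updates lam by mu * h and increases mu whenever the violation h does not shrink sufficiently, until h falls below a tolerance.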

References
  • [1] L. Bottou. Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT'2010, 2010.
  • [2] P. Bühlmann, J. Peters, and J. Ernest. CAM: Causal additive models, high-dimensional order search and penalized regression. The Annals of Statistics, 2014.
  • [3] D. M. Chickering. Optimal structure identification with greedy search. Journal of Machine Learning Research, 2003.
  • [4] A. Dixit, O. Parnas, B. Li, J. Chen, C. P. Fulco, L. Jerby-Arnon, N. D. Marjanovic, D. Dionne, T. Burks, R. Raychowdhury, T. M. Adamson, B. Norman, E. S. Lander, J. S. Weissman, N. Friedman, and A. Regev. Perturb-seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell, 2016.
  • [5] D. Eaton and K. Murphy. Exact Bayesian structure learning from uncertain interventions. In Artificial Intelligence and Statistics, 2007.
  • [6] F. Eberhardt. Causation and intervention. Unpublished doctoral dissertation, Carnegie Mellon University, 2007.
  • [7] F. Eberhardt. Almost optimal intervention sets for causal discovery. In Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence, 2008.
  • [8] F. Eberhardt and R. Scheines. Interventions and causal inference. Philosophy of Science, 2007.
  • [9] F. Eberhardt, C. Glymour, and R. Scheines. On the number of experiments sufficient and in the worst case necessary to identify all causal relations among N variables. In Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence, 2005.
  • [10] K. Fukumizu, A. Gretton, X. Sun, and B. Schölkopf. Kernel measures of conditional dependence. In Advances in Neural Information Processing Systems, 2008.
  • [11] X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010.
  • [12] A. Hauser and P. Bühlmann. Characterization and greedy learning of interventional Markov equivalence classes of directed acyclic graphs. Journal of Machine Learning Research, 2012.
  • [13] C. Heinze-Deml, M. H. Maathuis, and N. Meinshausen. Causal structure learning. Annual Review of Statistics and Its Application, 2018.
  • [14] C. Heinze-Deml, J. Peters, and N. Meinshausen. Invariant causal prediction for nonlinear models. Journal of Causal Inference, 2018.
  • [15] C.-W. Huang, D. Krueger, A. Lacoste, and A. Courville. Neural autoregressive flows. In Proceedings of the 35th International Conference on Machine Learning, 2018.
  • [16] A. Hyttinen, F. Eberhardt, and M. Järvisalo. Constraint-based causal discovery: Conflict resolution with answer set programming. In Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence, 2014.
  • [17] E. Jang, S. Gu, and B. Poole. Categorical reparameterization with Gumbel-Softmax. In Proceedings of the 5th International Conference on Learning Representations, 2017.
  • [18] D. Kalainathan, O. Goudet, I. Guyon, D. Lopez-Paz, and M. Sebag. SAM: Structural agnostic model, causal discovery and penalized adversarial learning. arXiv preprint arXiv:1803.04929, 2018.
  • [19] N. R. Ke, O. Bilaniuk, A. Goyal, S. Bauer, H. Larochelle, C. Pal, and Y. Bengio. Learning neural causal models from unknown interventions. arXiv preprint arXiv:1910.01075, 2019.
  • [20] K. B. Korb, L. R. Hope, A. E. Nicholson, and K. Axnick. Varieties of causal intervention. In Pacific Rim International Conference on Artificial Intelligence, 2004.
  • [21] S. Lachapelle, P. Brouillard, T. Deleu, and S. Lacoste-Julien. Gradient-based neural DAG learning. In Proceedings of the 8th International Conference on Learning Representations, 2020.
  • [22] C. J. Maddison, A. Mnih, and Y. W. Teh. The Concrete distribution: A continuous relaxation of discrete random variables. In Proceedings of the 5th International Conference on Learning Representations, 2017.
  • [23] J. M. Mooij, S. Magliacane, and T. Claassen. Joint causal inference from multiple contexts. arXiv preprint arXiv:1611.10351, 2016.
  • [24] I. Ng, Z. Fang, S. Zhu, Z. Chen, and J. Wang. Masked gradient-based causal structure learning. arXiv preprint arXiv:1910.08527, 2019.
  • [25] J. Pearl. Causality. Cambridge University Press, 2009.
  • [26] J. Peters and P. Bühlmann. Structural intervention distance (SID) for evaluating causal graphs. Neural Computation, 2015.
  • [27] J. Peters, P. Bühlmann, and N. Meinshausen. Causal inference by using invariant prediction: identification and confidence intervals. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2016.
  • [28] J. Peters, D. Janzing, and B. Schölkopf. Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press, 2017.
  • [29] N. Pfister, P. Bühlmann, and J. Peters. Invariant causal prediction for sequential data. Journal of the American Statistical Association, 2019.
  • [30] D. J. Rezende and S. Mohamed. Variational inference with normalizing flows. In Proceedings of the 32nd International Conference on Machine Learning, 2015.
  • [31] D. J. Rezende, S. Mohamed, and D. Wierstra. Stochastic backpropagation and approximate inference in deep generative models. In Proceedings of the 31st International Conference on Machine Learning, 2014.
  • [32] K. Sachs, O. Perez, D. Pe'er, D. A. Lauffenburger, and G. P. Nolan. Causal protein-signaling networks derived from multiparameter single-cell data. Science, 2005.
  • [33] P. Spirtes, C. N. Glymour, R. Scheines, and D. Heckerman. Causation, Prediction, and Search. MIT Press, 2000.
  • [34] C. Squires, Y. Wang, and C. Uhler. Permutation-based causal structure learning with unknown intervention targets. In Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence, 2020.
  • [35] E. V. Strobl, K. Zhang, and S. Visweswaran. Approximate kernel-based conditional independence tests for fast non-parametric causal discovery. Journal of Causal Inference, 2019.
  • [36] T. Tieleman and G. Hinton. Lecture 6.5-RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 2012.
  • [37] S. Triantafillou and I. Tsamardinos. Constraint-based causal discovery from multiple interventions over overlapping variable sets. Journal of Machine Learning Research, 2015.
  • [38] T. Verma and J. Pearl. Equivalence and synthesis of causal models. In Proceedings of the Sixth Annual Conference on Uncertainty in Artificial Intelligence, 1990.
  • [39] Y. Wang, L. Solus, K. Yang, and C. Uhler. Permutation-based causal inference algorithms with interventions. In Advances in Neural Information Processing Systems, 2017.
  • [40] K. D. Yang, A. Katcoff, and C. Uhler. Characterizing and learning equivalence classes of causal DAGs under interventions. In Proceedings of the 35th International Conference on Machine Learning, 2018.
  • [41] Y. Yu, J. Chen, T. Gao, and M. Yu. DAG-GNN: DAG structure learning with graph neural networks. In Proceedings of the 36th International Conference on Machine Learning, 2019.
  • [42] K. Zhang, J. Peters, D. Janzing, and B. Schölkopf. Kernel-based conditional independence test and application in causal discovery. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, 2011.
  • [43] Q. Zhang, S. Filippi, A. Gretton, and D. Sejdinovic. Large-scale kernel methods for independence testing. Statistics and Computing, 2018.
  • [44] X. Zheng, B. Aragam, P. K. Ravikumar, and E. P. Xing. DAGs with NO TEARS: Continuous optimization for structure learning. In Advances in Neural Information Processing Systems 31, 2018.
  • [45] X. Zheng, C. Dan, B. Aragam, P. Ravikumar, and E. Xing. Learning sparse nonparametric DAGs. In Proceedings of the Twenty-Third International Conference on Artificial Intelligence and Statistics, 2020.
  • [46] S. Zhu and Z. Chen. Causal discovery with reinforcement learning. In Proceedings of the 8th International Conference on Learning Representations, 2020.
  • [47] A. M. Zimmer, Y. K. Pan, T. Chandrapalan, R. W. M. Kwong, and S. F. Perry. Loss-of-function approaches in comparative physiology: is there a future for knockdown experiments in the era of genome editing? Journal of Experimental Biology, 2019.