Causal Reasoning from Meta-reinforcement Learning

arXiv: Learning, 2018.


Abstract:

Discovering and exploiting the causal structure in the environment is a crucial challenge for intelligent agents. Here we explore whether causal reasoning can emerge via meta-reinforcement learning. We train a recurrent network with model-free reinforcement learning to solve a range of problems that each contain causal structure. We find that the trained agent learns to use the available data for causal reasoning in novel situations, including drawing inferences from passive observation, actively intervening, and making counterfactual predictions.
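As a rough illustration of the setup the abstract describes, the sketch below implements a toy version of such a problem family: each episode samples a fresh linear-Gaussian causal graph, an information phase yields observational samples, and a quiz phase rewards choosing the node with the highest value under an external intervention. All specifics (graph size, edge weights, phase length, the intervention value of -5, and the function names) are illustrative assumptions, not the paper's exact environment.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_graph(n_nodes=5):
    """Random linear-Gaussian DAG: strict upper-triangular weight matrix W,
    so node i is a linear function of nodes j < i plus unit Gaussian noise."""
    return np.triu(rng.choice([-1.0, 0.0, 1.0], size=(n_nodes, n_nodes)), k=1)

def sample_values(W, noise, do_node=None, do_value=0.0):
    """Ancestral sampling; if do_node is given, its incoming edges are cut."""
    n = W.shape[0]
    x = np.zeros(n)
    for i in range(n):
        if i == do_node:
            x[i] = do_value              # intervention ignores the parents
        else:
            x[i] = W[:, i] @ x + noise[i]
    return x

def run_episode(n_nodes=5, info_steps=8):
    W = sample_graph(n_nodes)
    # Information phase: purely observational samples of the graph.
    observations = [sample_values(W, rng.normal(size=n_nodes))
                    for _ in range(info_steps)]
    # Quiz phase: an external intervention sets a random node to -5; the agent
    # is rewarded according to the value of the node it then selects.
    do_node = int(rng.integers(n_nodes))
    quiz_values = sample_values(W, rng.normal(size=n_nodes),
                                do_node=do_node, do_value=-5.0)
    optimal_choice = int(np.argmax(quiz_values))
    return np.array(observations), do_node, quiz_values, optimal_choice

obs, do_node, quiz, best = run_episode()
print(f"intervened node: {do_node}, optimal choice: {best}, reward: {quiz[best]:.2f}")
```

Because a new graph is drawn every episode, a recurrent agent trained across many such episodes must infer the causal structure from the information-phase data rather than memorize any particular graph.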

Introduction
  • Many machine learning algorithms are rooted in discovering patterns of correlation in data
  • While this has been sufficient to excel in several areas (Krizhevsky et al., 2012; Cho et al., 2014), the problems the authors are interested in are sometimes fundamentally causal.
  • By learning end-to-end, the algorithm has the potential to find the internal representations of causal structure best suited for the types of causal inference required
Highlights
  • Many machine learning algorithms are rooted in discovering patterns of correlation in data
  • We introduced and tested a framework for learning causal reasoning in various data settings—observational, interventional, and counterfactual—using deep meta-reinforcement learning (meta-RL)
  • By optimizing an agent to perform a task that depended on causal structure, the agent learned implicit strategies to use the available data for causal reasoning, including drawing inferences from passive observation, actively intervening, and making counterfactual predictions
  • Traditional formal approaches usually decouple the two problems of causal induction and causal inference, and despite advances in both (Ortega & Stocker, 2015; Bramley et al., 2017; Parida et al., 2018; Sen et al., 2017; Forney et al., 2017; Lattimore et al., 2016), inducing models often requires assumptions that are difficult to fit to complex real-world conditions
  • By learning these end-to-end, our method can potentially find representations of causal structure best tuned to the specific causal inferences required
  • Our agents’ active intervention policy was close to optimal, which demonstrates the promise of agents that can learn to experiment on their environment and perform rich causal reasoning on the observations
Methods
  • Asynchronous methods for deep reinforcement learning (Mnih et al., 2016). CoRR, abs/1602.01783. URL http://arxiv.org/abs/1602.01783. (A minimal sketch of the recurrent actor-critic agent these methods build on follows this list.)
  • Causal reasoning in a prediction task with hidden causes (Ortega et al., 2015).
  • Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (Pearl, 1988).
  • Matching networks for one shot learning (Vinyals et al., 2016).
  • Prefrontal cortex as a meta-reinforcement learning system (Wang et al., 2018).
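The entries above name the main methodological ingredients: a recurrent (LSTM) agent trained with model-free advantage actor-critic updates, in the meta-RL style of Wang et al. (2016) and Duan et al. (2016). The sketch below is a minimal, assumed implementation of that kind of agent in PyTorch; the layer sizes, loss coefficients, the single-worker synchronous update (rather than the asynchronous or IMPALA variants cited), and names such as RecurrentA2CAgent are illustrative choices, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class RecurrentA2CAgent(nn.Module):
    """LSTM policy/value network. As is common in meta-RL, the input is the
    current observation concatenated with the previous action (one-hot) and
    the previous reward, so the recurrent state can integrate an episode's
    information-phase data."""

    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.core = nn.LSTMCell(obs_dim + n_actions + 1, hidden)
        self.policy_head = nn.Linear(hidden, n_actions)
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, obs, prev_action_onehot, prev_reward, state=None):
        x = torch.cat([obs, prev_action_onehot, prev_reward], dim=-1)
        h, c = self.core(x, state)
        dist = Categorical(logits=self.policy_head(h))
        value = self.value_head(h).squeeze(-1)
        return dist, value, (h, c)

def a2c_loss(log_probs, values, rewards, entropies,
             gamma=0.9, value_coef=0.5, entropy_coef=0.05):
    """Advantage actor-critic loss for one unrolled episode. log_probs, values
    and entropies are lists of scalar tensors; rewards is a list of floats."""
    returns, ret = [], torch.tensor(0.0)
    for r in reversed(rewards):
        ret = r + gamma * ret
        returns.append(ret)
    returns = torch.stack(returns[::-1]).reshape(-1)
    values = torch.stack(values).reshape(-1)
    advantages = returns - values.detach()
    policy_loss = -(torch.stack(log_probs).reshape(-1) * advantages).mean()
    value_loss = (returns - values).pow(2).mean()
    entropy_bonus = torch.stack(entropies).reshape(-1).mean()
    return policy_loss + value_coef * value_loss - entropy_coef * entropy_bonus
```

In use, the agent would be unrolled over the information and quiz phases of each episode (for example, episodes generated as in the earlier sketch), collecting log-probabilities, values, rewards, and entropies, and a2c_loss would be backpropagated through the unrolled LSTM.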
Results
  • The authors focus on three key questions in this experiment: (i) Can the agents learn to do associative reasoning with observational data? (ii) Can they learn to do cause-effect reasoning from observational data? (iii) In addition to making causal inferences, can the agent choose good actions in the information phase to generate the data it observes?
  • By optimizing an agent to perform a task that depended on causal structure, the agent learned implicit strategies to use the available data for causal reasoning, including drawing inferences from passive observation, actively intervening, and making counterfactual predictions.
  • This observation is corroborated by Fig. 2(b), which shows that performance increased selectively in cases where do-calculus made a prediction distinguishable from the predictions based on correlations.
  • These are situations where the externally intervened node had a parent, meaning that the intervention resulted in a different graph; a minimal worked example of this distinction follows the list.
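The distinction in the last two bullets can be made concrete with a small worked example. The three-node chain A → B → C below, with its copy-with-probability-0.9 mechanisms, is hypothetical rather than the paper's environment: when node B, which has a parent, is intervened on, conditioning and intervening give different predictions about the parent A, while their predictions about the downstream node C coincide.

```python
import itertools

# Chain A -> B -> C, all binary; each child copies its parent with prob 0.9.
p_a = {0: 0.5, 1: 0.5}

def p_child(child, parent):
    """P(child | parent) for the copy-with-noise mechanism."""
    return 0.9 if child == parent else 0.1

def joint(a, b, c, do_b=None):
    """Joint probability of (A, B, C); if do_b is set, B is forced to that
    value and its incoming edge from A is cut."""
    pb = (1.0 if b == do_b else 0.0) if do_b is not None else p_child(b, a)
    return p_a[a] * pb * p_child(c, b)

def prob_one(query, b_value, do=False):
    """P(query = 1 | B = b_value), either conditioning on B or intervening."""
    do_b = b_value if do else None
    num = den = 0.0
    for a, b, c in itertools.product([0, 1], repeat=3):
        if b != b_value:
            continue
        p = joint(a, b, c, do_b=do_b)
        den += p
        num += p * {"A": a, "C": c}[query]
    return num / den

print("P(A=1 | B=1)     =", round(prob_one("A", 1), 3))           # 0.9: seeing B=1 is evidence about A
print("P(A=1 | do(B=1)) =", round(prob_one("A", 1, do=True), 3))  # 0.5: setting B says nothing about A
print("P(C=1 | B=1)     =", round(prob_one("C", 1), 3))           # 0.9
print("P(C=1 | do(B=1)) =", round(prob_one("C", 1, do=True), 3))  # 0.9: downstream predictions coincide
```

Only the prediction about the parent changes under intervention, which matches the observation that the agent's gains appear precisely where the intervened node has a parent.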
Conclusion
  • Discussion and future work: This work is the first demonstration that causal reasoning can arise out of model-free reinforcement learning.
  • Traditional formal approaches usually decouple the two problems of causal induction and causal inference, and despite advances in both (Ortega & Stocker, 2015; Bramley et al., 2017; Parida et al., 2018; Sen et al., 2017; Forney et al., 2017; Lattimore et al., 2016), inducing models often requires assumptions that are difficult to fit to complex real-world conditions
  • By learning these end-to-end, the method can potentially find representations of causal structure best tuned to the specific causal inferences required.
  • The authors' agents’ active intervention policy was close to optimal, which demonstrates the promise of agents that can learn to experiment on their environment and perform rich causal reasoning on the observations
Further results
  • Fig. 4a shows the crucial result that the Passive-Interventional Agent's performance is significantly better than that of the Passive-Conditional Agent.
  • In Fig. 5(b), the increased performance is observed only in cases where the maximum mean value in the graph is degenerate and the optimal choice is affected by the exogenous noise, i.e. where multiple nodes have the same value on average and only the specific noise realization distinguishes their actual values in that particular episode (a small sketch of this effect follows below).
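The second point can be illustrated with a toy counterfactual computation. The graph, weights, and intervention values below are hypothetical (not the paper's environment): two nodes Y1 and Y2 have identical means under any intervention on X, so purely interventional reasoning must guess between them, whereas abducing this episode's exogenous noise from an observed intervention and reusing it under the counterfactual intervention identifies the actually larger node.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear-Gaussian graph: X -> Y1 and X -> Y2 with equal weights,
# so Y1 and Y2 have identical means under any intervention on X.
w1 = w2 = 1.0

# This episode's exogenous noise, fixed for the whole episode.
e1, e2 = rng.normal(size=2)

def outcome(x):
    """Node values under do(X = x) with this episode's noise realization."""
    return {"Y1": w1 * x + e1, "Y2": w2 * x + e2}

# Information phase: the agent observes the outcome of one intervention,
# do(X = 2), and can abduce the noise terms from it.
seen = outcome(2.0)
e1_hat = seen["Y1"] - w1 * 2.0
e2_hat = seen["Y2"] - w2 * 2.0

# Quiz phase: which node is larger under the counterfactual do(X = 5)?
mean_y1, mean_y2 = w1 * 5.0, w2 * 5.0                   # interventional means: a tie, so guess
cf_y1, cf_y2 = w1 * 5.0 + e1_hat, w2 * 5.0 + e2_hat     # counterfactual predictions

actual = outcome(5.0)
print("means under do(X=5):       ", mean_y1, mean_y2)
print("counterfactual predictions:", round(cf_y1, 3), round(cf_y2, 3))
print("actual values:             ", round(actual["Y1"], 3), round(actual["Y2"], 3))
```

When the means are not tied, picking the larger mean already succeeds, which is why the counterfactual agent's advantage shows up only in the degenerate cases described above.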
Reference
  • J. Andreas, M. Rohrbach, T. Darrell, and D. Klein. Neural module networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 39–48, 2016.
  • M. Andrychowicz, M. Denil, S. Gomez, M. W. Hoffman, D. Pfau, T. Schaul, B. Shillingford, and N. De Freitas. Learning to learn by gradient descent by gradient descent. In Advances in Neural Information Processing Systems, pp. 3981–3989, 2016.
  • D. Barber. Bayesian Reasoning and Machine Learning. Cambridge University Press, 2012.
  • P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner, et al. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261, 2018.
  • C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
  • A. P. Blaisdell, K. Sawa, K. J. Leising, and M. R. Waldmann. Causal reasoning in rats. Science, 311(5763):1020–1022, 2006.
  • N. R. Bramley, P. Dayan, T. L. Griffiths, and D. A. Lagnado. Formalizing Neurath's ship: Approximate algorithms for online causal learning. Psychological Review, 124(3):301, 2017.
  • K. Cho, B. Van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
  • P. Dawid. Fundamentals of statistical causality. Technical report, University College London, 2007.
  • Y. Duan, J. Schulman, X. Chen, P. L. Bartlett, I. Sutskever, and P. Abbeel. RL2: Fast reinforcement learning via slow reinforcement learning. arXiv preprint arXiv:1611.02779, 2016. URL http://arxiv.org/abs/1611.02779.
  • L. Espeholt, H. Soyer, R. Munos, K. Simonyan, V. Mnih, T. Ward, Y. Doron, V. Firoiu, T. Harley, I. Dunning, et al. IMPALA: Scalable distributed deep-RL with importance weighted actor-learner architectures. arXiv preprint arXiv:1802.01561, 2018.
  • C. Finn, P. Abbeel, and S. Levine. Model-agnostic meta-learning for fast adaptation of deep networks. arXiv preprint arXiv:1703.03400, 2017.
  • A. Forney, J. Pearl, and E. Bareinboim. Counterfactual data-fusion for online reinforcement learners. In International Conference on Machine Learning, pp. 1156–1164, 2017.
  • Y. Ganin, T. Kulkarni, I. Babuschkin, S. M. A. Eslami, and O. Vinyals. Synthesizing programs for images using reinforced adversarial learning. arXiv preprint arXiv:1804.01118, 2018.
  • A. Gopnik, D. M. Sobel, L. E. Schulz, and C. Glymour. Causal learning mechanisms in very young children: Two-, three-, and four-year-olds infer causal relations from patterns of variation and covariation. Developmental Psychology, 37(5):620, 2001.
  • A. Gopnik, C. Glymour, D. M. Sobel, L. E. Schulz, T. Kushnir, and D. Danks. A theory of causal learning in children: Causal maps and Bayes nets. Psychological Review, 111(1):3, 2004.
  • M. Hessel, H. Soyer, L. Espeholt, W. Czarnecki, S. Schmitt, and H. van Hasselt. Multi-task deep reinforcement learning with PopArt. arXiv preprint arXiv:1809.04474, 2018.
  • T. Hester, M. Vecerik, O. Pietquin, M. Lanctot, T. Schaul, B. Piot, D. Horgan, J. Quan, A. Sendonaris, G. Dulac-Arnold, et al. Deep Q-learning from demonstrations. arXiv preprint arXiv:1704.03732, 2017.
  • S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
  • D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
  • A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.
  • D. A. Lagnado, T. Gerstenberg, and R. Zultan. Causal responsibility and counterfactuals. Cognitive Science, 37(6):1036–1073, 2013.
  • F. Lattimore, T. Lattimore, and M. D. Reid. Causal bandits: Learning good interventions via causal inference. In Advances in Neural Information Processing Systems, pp. 1181–1189, 2016.
  • A. M. Leslie. The perception of causality in infants. Perception, 11(2):173–186, 1982.
  • V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu. Asynchronous methods for deep reinforcement learning. CoRR, abs/1602.01783, 2016. URL http://arxiv.org/abs/1602.01783.
  • K. P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
  • P. A. Ortega, D. D. Lee, and A. A. Stocker. Causal reasoning in a prediction task with hidden causes. In 37th Annual Cognitive Science Society Meeting (CogSci), 2015.
  • P. K. Parida, T. Marwala, and S. Chakraverty. A multivariate additive noise model for complete causal discovery. Neural Networks, 103:44–54, 2018.
  • J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.
  • J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000.
  • J. Pearl, M. Glymour, and N. P. Jewell. Causal Inference in Statistics: A Primer. Wiley, 2016.
  • A. Santoro, S. Bartunov, M. Botvinick, D. Wierstra, and T. Lillicrap. Meta-learning with memory-augmented neural networks. In International Conference on Machine Learning, pp. 1842–1850, 2016.
  • R. Sen, K. Shanmugam, A. G. Dimakis, and S. Shakkottai. Identifying best interventions through online importance sampling. arXiv preprint arXiv:1701.02789, 2017.
  • P. Spirtes, C. N. Glymour, R. Scheines, D. Heckerman, C. Meek, G. Cooper, and T. Richardson. Causation, Prediction, and Search. MIT Press, 2000.
  • O. Vinyals, C. Blundell, T. Lillicrap, D. Wierstra, et al. Matching networks for one shot learning. In Advances in Neural Information Processing Systems, pp. 3630–3638, 2016.
  • J. X. Wang, Z. Kurth-Nelson, D. Tirumala, H. Soyer, J. Z. Leibo, R. Munos, C. Blundell, D. Kumaran, and M. Botvinick. Learning to reinforcement learn. CoRR, abs/1611.05763, 2016. URL http://arxiv.org/abs/1611.05763.
  • J. X. Wang, Z. Kurth-Nelson, D. Kumaran, D. Tirumala, H. Soyer, J. Z. Leibo, D. Hassabis, and M. Botvinick. Prefrontal cortex as a meta-reinforcement learning system. Nature Neuroscience, 21, 2018.