# Causal Reasoning from Meta-reinforcement Learning

arXiv: Learning, 2018.


Abstract:

Discovering and exploiting the causal structure in the environment is a crucial challenge for intelligent agents. Here we explore whether causal reasoning can emerge via meta-reinforcement learning. We train a recurrent network with model-free reinforcement learning to solve a range of problems that each contain causal structure. We find …


Introduction

- Many machine learning algorithms are rooted in discovering patterns of correlation in data.
- While this has been sufficient to excel in several areas (Krizhevsky et al., 2012; Cho et al., 2014), some problems of interest are fundamentally causal.
- By learning end-to-end, the algorithm can potentially find the internal representations of causal structure best suited to the kinds of causal inference required.
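The gap between correlation and causation that motivates this work can be illustrated with a minimal sketch (the model, noise scales, and function names below are invented for illustration, not taken from the paper): a hidden common cause makes two variables strongly correlated in observational data, yet clamping one of them by intervention has no effect on the other.

```python
import random

random.seed(0)

def sample_observational(n):
    # hidden cause H drives both X and Y, inducing a strong correlation
    xs, ys = [], []
    for _ in range(n):
        h = random.gauss(0, 1)
        xs.append(h + random.gauss(0, 0.5))
        ys.append(h + random.gauss(0, 0.5))
    return xs, ys

def sample_do_x(x, n):
    # clamping X severs its dependence on H, so Y is unaffected by x
    return [random.gauss(0, 1) + random.gauss(0, 0.5) for _ in range(n)]

def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return sxy / (sx * sy)

xs, ys = sample_observational(20000)
obs_corr = corr(xs, ys)                         # strongly positive (about 0.8)
do_mean = sum(sample_do_x(2.0, 20000)) / 20000  # near 0 whatever x we clamp
```

A purely correlational learner would predict Y from X here; only a causal one predicts that intervening on X leaves Y unchanged.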

Highlights

- Many machine learning algorithms are rooted in discovering patterns of correlation in data
- We introduced and tested a framework for learning causal reasoning in various data settings—observational, interventional, and counterfactual—using deep meta-reinforcement learning (RL)
- By optimizing an agent to perform a task that depended on causal structure, the agent learned implicit strategies to use the available data for causal reasoning, including drawing inferences from passive observation, actively intervening, and making counterfactual predictions
- Traditional formal approaches usually decouple the two problems of causal induction and causal inference, and despite advances in both (Ortega & Stocker, 2015; Bramley et al., 2017; Parida et al., 2018; Sen et al., 2017; Forney et al., 2017; Lattimore et al., 2016), inducing models often requires assumptions that are difficult to fit to complex real-world conditions.
- By learning these end-to-end, our method can potentially find representations of causal structure best tuned to the specific causal inferences required
- Our agents’ active intervention policy was close to optimal, which demonstrates the promise of agents that can learn to experiment on their environment and perform rich causal reasoning on the observations
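The episodic setup described above, in which the agent gathers observational or interventional data and is then quizzed on the hidden causal graph, can be sketched as follows. The class name `CausalEnv`, the node count, and the random weight scheme are assumptions for illustration only, not the authors' implementation.

```python
import random

class CausalEnv:
    """Each episode samples a hidden causal DAG; the agent spends an
    information phase gathering data, then a quiz phase picks the node it
    expects to be largest under a given intervention. Illustrative sketch."""

    def __init__(self, n_nodes=5, seed=None):
        self.rng = random.Random(seed)
        self.n = n_nodes
        # random weights only on j > i edges => an upper-triangular DAG
        self.w = [[self.rng.choice([-1, 0, 1]) if j > i else 0
                   for j in range(n_nodes)] for i in range(n_nodes)]

    def sample(self, do=None):
        """Ancestral sampling; `do=(node, value)` clamps a node, which
        cuts the influence of its parents (a hard intervention)."""
        v = [0.0] * self.n
        for j in range(self.n):
            if do is not None and j == do[0]:
                v[j] = do[1]
            else:
                v[j] = sum(self.w[i][j] * v[i] for i in range(j)) \
                       + self.rng.gauss(0, 0.1)
        return v

env = CausalEnv(seed=1)
obs = [env.sample() for _ in range(3)]               # observational data
inter = [env.sample(do=(0, 5.0)) for _ in range(3)]  # interventional data
# quiz phase: guess the node with the largest mean under the intervention
means = [sum(s[j] for s in inter) / len(inter) for j in range(1, env.n)]
guess = 1 + max(range(env.n - 1), key=lambda k: means[k])
```

In the paper the quiz answer comes from a recurrent agent trained across many such episodes, rather than the explicit mean comparison used here.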

Methods

**Key methods referenced**

- Asynchronous methods for deep reinforcement learning (Mnih et al., CoRR, abs/1602.01783, 2016. URL http://arxiv.org/abs/1602.01783).
- Causal reasoning in a prediction task with hidden causes (Ortega & Stocker, 2015).
- Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (Pearl, 1988).
- Matching networks for one shot learning (Vinyals et al., 2016).
- Prefrontal cortex as a meta-reinforcement learning system (Wang et al., 2018).

Results

- The authors focus on three key questions in this experiment: (i) Can the agents learn to do associative reasoning with observational data? (ii) Can they learn to do cause-effect reasoning from observational data? (iii) In addition to making causal inferences, can the agent choose good actions in the information phase to generate the data it observes?
- By optimizing an agent to perform a task that depended on causal structure, the agent learned implicit strategies to use the available data for causal reasoning, including drawing inferences from passive observation, actively intervening, and making counterfactual predictions.
- This observation is corroborated by Fig. 2(b), which shows that performance increased selectively in cases where do-calculus made a prediction distinguishable from the predictions based on correlations.
- These are situations where the externally intervened node had a parent, meaning that the intervention resulted in a different graph.
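The distinction driving Fig. 2(b) can be made concrete with a tiny discrete graph in which the intervened node has a parent (the probability tables below are invented for illustration): conditioning on X = 1 shifts belief about its parent A, while do(X = 1) severs the A → X edge and leaves the prior on A untouched.

```python
# Discrete graph A -> X, A -> Y: the intervened node X has a parent A.
# All probability tables are illustrative, not from the paper.
p_a = {0: 0.5, 1: 0.5}
p_x_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}
p_y_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}

def p_y1_given_x1():
    # conditioning: seeing X=1 shifts belief about the parent A (Bayes' rule)
    num = sum(p_a[a] * p_x_given_a[a][1] * p_y_given_a[a][1] for a in (0, 1))
    den = sum(p_a[a] * p_x_given_a[a][1] for a in (0, 1))
    return num / den

def p_y1_do_x1():
    # intervening: do(X=1) cuts the A -> X edge, so A keeps its prior
    return sum(p_a[a] * p_y_given_a[a][1] for a in (0, 1))
```

Here p(Y=1 | X=1) ≈ 0.82 while p(Y=1 | do(X=1)) = 0.5: exactly the kind of gap between correlational and do-calculus predictions on which the agents' performance increased selectively.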

Conclusion

**Discussion and Future Work**

This work is the first demonstration that causal reasoning can arise out of model-free reinforcement learning.

- Traditional formal approaches usually decouple the two problems of causal induction and causal inference, and despite advances in both (Ortega & Stocker, 2015; Bramley et al., 2017; Parida et al., 2018; Sen et al., 2017; Forney et al., 2017; Lattimore et al., 2016), inducing models often requires assumptions that are difficult to fit to complex real-world conditions.
- By learning these end-to-end, the method can potentially find representations of causal structure best tuned to the specific causal inferences required.
- The agents' active intervention policy was close to optimal, which demonstrates the promise of agents that can learn to experiment on their environment and perform rich causal reasoning on the observations.


Additional Results

- For (ii), we see in Fig. 4a the crucial result that the Passive-Interventional Agent's performance is significantly better than the Passive-Conditional Agent's.
- In Fig. 5(b), we find that the increased performance is observed only in cases where the maximum mean value in the graph is degenerate and the optimal choice is affected by the exogenous noise, i.e. where multiple nodes have the same value on average and the specific randomness can be used to distinguish their actual values in that specific case.
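This degenerate-maximum situation can be sketched as follows (the linear model, coefficients, and variable names are invented for illustration): two leaves share the same mean under any intervention, an observation under the factual intervention pins down the episode's exogenous noise, and that recovered noise then yields an exact counterfactual prediction that breaks the tie.

```python
import random

rng = random.Random(3)

# Two leaf nodes with identical means: Y_i = 2*x + u_i. Which one is
# larger depends only on the episode-specific exogenous noise (u1, u2).
u1, u2 = rng.gauss(0, 1), rng.gauss(0, 1)

def outcome(x, noise):
    n1, n2 = noise
    return (2 * x + n1, 2 * x + n2)

y1, y2 = outcome(1, (u1, u2))      # observed under the factual do(X=1)
inferred = (y1 - 2, y2 - 2)        # abduction: recover the noise draw
cf1, cf2 = outcome(-1, inferred)   # prediction under counterfactual do(X=-1)
best = 0 if cf1 > cf2 else 1       # equal means; the noise breaks the tie
```

An agent using only means would have to guess between the two leaves; one that carries the episode's noise through the counterfactual can choose correctly every time.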

References

- Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. Neural module networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 39–48, 2016.
- M. Andrychowicz, M. Denil, S. Gomez, M. W. Hoffman, D. Pfau, T. Schaul, B. Shillingford, and N. De Freitas. Learning to learn by gradient descent by gradient descent. In Advances in Neural Information Processing Systems, pp. 3981–3989, 2016.
- D. Barber. Bayesian Reasoning and Machine Learning. Cambridge University Press, 2012.
- Peter W Battaglia, Jessica B Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, et al. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261, 2018.
- C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
- A. P. Blaisdell, K. Sawa, K. J. Leising, and M. R. Waldmann. Causal reasoning in rats. Science, 311(5763): 1020–1022, 2006.
- N. R. Bramley, P. Dayan, T. L. Griffiths, and D. A. Lagnado. Formalizing Neurath's ship: Approximate algorithms for online causal learning. Psychological review, 124(3):301, 2017.
- K. Cho, B. Van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
- P. Dawid. Fundamentals of statistical causality. Technical report, University College London, 2007.
- Y. Duan, J. Schulman, X. Chen, P. L. Bartlett, I. Sutskever, and P. Abbeel. Rl2: Fast reinforcement learning via slow reinforcement learning. arXiv preprint arXiv:1611.02779, 2016. URL http://arxiv.org/abs/1611.02779.
- Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymir Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, et al. Impala: Scalable distributed deep-rl with importance weighted actor-learner architectures. arXiv preprint arXiv:1802.01561, 2018.
- C. Finn, P. Abbeel, and S. Levine. Model-agnostic meta-learning for fast adaptation of deep networks. arXiv preprint arXiv:1703.03400, 2017.
- Andrew Forney, Judea Pearl, and Elias Bareinboim. Counterfactual data-fusion for online reinforcement learners. In International Conference on Machine Learning, pp. 1156–1164, 2017.
- Yaroslav Ganin, Tejas Kulkarni, Igor Babuschkin, SM Eslami, and Oriol Vinyals. Synthesizing programs for images using reinforced adversarial learning. arXiv preprint arXiv:1804.01118, 2018.
- A. Gopnik, D. M. Sobel, L. E. Schulz, and C. Glymour. Causal learning mechanisms in very young children: two-, three-, and four-year-olds infer causal relations from patterns of variation and covariation. Developmental psychology, 37(5):620, 2001.
- A. Gopnik, C. Glymour, D. M. Sobel, L. E. Schulz, T. Kushnir, and D. Danks. A theory of causal learning in children: causal maps and bayes nets. Psychological review, 111(1):3, 2004.
- Matteo Hessel, Hubert Soyer, Lasse Espeholt, Wojciech Czarnecki, Simon Schmitt, and Hado van Hasselt. Multi-task deep reinforcement learning with popart. arXiv preprint arXiv:1809.04474, 2018.
- Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, et al. Deep q-learning from demonstrations. arXiv preprint arXiv:1704.03732, 2017.
- S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
- D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.
- David A Lagnado, Tobias Gerstenberg, and Ro’i Zultan. Causal responsibility and counterfactuals. Cognitive science, 37(6):1036–1073, 2013.
- Finnian Lattimore, Tor Lattimore, and Mark D Reid. Causal bandits: Learning good interventions via causal inference. In Advances in Neural Information Processing Systems, pp. 1181–1189, 2016.
- A. M. Leslie. The perception of causality in infants. Perception, 11(2):173–186, 1982.
- V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu. Asynchronous methods for deep reinforcement learning. CoRR, abs/1602.01783, 2016. URL http://arxiv.org/abs/1602.01783.
- K. P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
- P. A. Ortega, D. D. Lee, and A. A. Stocker. Causal reasoning in a prediction task with hidden causes. In 37th Annual Cognitive Science Society Meeting, CogSci, 2015.
- P. K. Parida, T. Marwala, and S. Chakraverty. A multivariate additive noise model for complete causal discovery. Neural Networks, 103:44–54, 2018.
- J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.
- J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000.
- J. Pearl, M. Glymour, and N. P. Jewell. Causal Inference in Statistics: A Primer. Wiley, 2016.
- A. Santoro, S. Bartunov, M. Botvinick, D. Wierstra, and T. Lillicrap. Meta-learning with memory-augmented neural networks. In International Conference on Machine Learning, pp. 1842–1850, 2016.
- Rajat Sen, Karthikeyan Shanmugam, Alexandros G Dimakis, and Sanjay Shakkottai. Identifying best interventions through online importance sampling. arXiv preprint arXiv:1701.02789, 2017.
- P. Spirtes, C. N. Glymour, R. Scheines, D. Heckerman, C. Meek, G. Cooper, and T. Richardson. Causation, prediction, and search. MIT press, 2000.
- O. Vinyals, C. Blundell, T. Lillicrap, D. Wierstra, et al. Matching networks for one shot learning. In Advances in Neural Information Processing Systems, pp. 3630–3638, 2016.
- J. X. Wang, Z. Kurth-Nelson, D. Tirumala, H. Soyer, J. Z. Leibo, R. Munos, C. Blundell, D. Kumaran, and M. Botvinick. Learning to reinforcement learn. CoRR, abs/1611.05763, 2016. URL http://arxiv.org/abs/1611.05763.
- J. X. Wang, Z. Kurth-Nelson, D. Kumaran, D. Tirumala, H. Soyer, J. Z. Leibo, D. Hassabis, and M. Botvinick. Prefrontal cortex as a meta-reinforcement learning system. Nature Neuroscience, 21, 2018.
