Progressive Neural Networks

CoRR abs/1606.04671, 2016.

Keywords:
progressive neural networks, catastrophic forgetting, positive transfer, Average Fisher Sensitivity, Markov Decision Process

Abstract:

Learning to solve complex sequences of tasks--while both leveraging transfer and avoiding catastrophic forgetting--remains a key obstacle to achieving human-level intelligence. The progressive networks approach represents a step forward in this direction: they are immune to forgetting and can leverage prior knowledge via lateral connections to previously learned features.

Introduction
  • Finetuning remains the method of choice for transfer learning with neural networks: a model is pretrained on a source domain, the output layers of the model are adapted to the target domain, and the network is finetuned via backpropagation (a minimal sketch of this baseline appears after this list).
  • This approach was pioneered in [7] by transferring knowledge from a generative to a discriminative model, and has since been generalized with great success [11].
  • While distillation [8] offers one potential solution to multitask learning [17], it requires a reservoir of persistent training data for all tasks, an assumption which may not always hold
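
For concreteness, here is a minimal sketch of the finetuning baseline described above, written in PyTorch. It is illustrative only: the two-layer network, the layer sizes, and the helper name finetune_step are hypothetical, not the architecture used in the paper.

    # Minimal finetuning sketch (illustrative; layer sizes are hypothetical).
    import torch
    import torch.nn as nn

    # Stand-in for a model pretrained on the source domain.
    source_model = nn.Sequential(
        nn.Linear(128, 64), nn.ReLU(),
        nn.Linear(64, 10),            # source-task output head
    )

    # Adapt the output layer to the target domain (here: 4 target outputs),
    # keeping the pretrained lower layers as the initialization.
    finetuned = nn.Sequential(*list(source_model.children())[:-1], nn.Linear(64, 4))

    optimizer = torch.optim.SGD(finetuned.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    def finetune_step(x, y):
        """One backpropagation step on target-domain data (x, y)."""
        optimizer.zero_grad()
        loss = loss_fn(finetuned(x), y)
        loss.backward()
        optimizer.step()
        return loss.item()

Note that only the initialization carries over from the source model; nothing prevents the shared layers from drifting during finetuning, which is exactly the catastrophic-forgetting weakness the progressive approach addresses.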
Highlights
  • Finetuning remains the method of choice for transfer learning with neural networks: a model is pretrained on a source domain, the output layers of the model are adapted to the target domain, and the network is finetuned via backpropagation
  • The approach has drawbacks which make it unsuitable for transferring across multiple tasks: if we wish to leverage knowledge acquired over a sequence of experiences, which model should we use to initialize subsequent models? This seems to require not only a learning method that can support transfer learning without catastrophic forgetting, but also foreknowledge of task similarity
  • While progressive networks are widely applicable, this paper focuses on their application to deep reinforcement learning
  • Progressive neural networks are a stepping stone towards continual learning, and this work has demonstrated their potential through experiments and analysis across three RL domains, including Atari, which contains orthogonal or even adversarial tasks
  • We have shown that the progressive approach is able to effectively exploit transfer for compatible source and task domains; that the approach is robust to harmful features learned in incompatible tasks; and that positive transfer increases with the number of columns, corroborating the constructive, rather than destructive, nature of the progressive architecture
Methods
  • The authors evaluate progressive networks across three different RL domains. First, the authors consider synthetic versions of Pong, altered to have visual or control-level similarities.
  • The authors experiment broadly with random sequences of Atari games and perform a feature-level transfer analysis.
  • The authors demonstrate performance on a set of 3D maze games.
  • A3C is trained on CPU using multiple threads and has been shown to converge faster than GPU-trained DQN.
  • This made it a more natural fit for the large number of sequential experiments required for this work.
Conclusion
  • The ability to accumulate and transfer knowledge to new domains is a core characteristic of intelligent beings.
  • Progressive neural networks are a stepping stone towards continual learning, and this work has demonstrated their potential through experiments and analysis across three RL domains, including Atari, which contains orthogonal or even adversarial tasks.
  • The authors believe they are the first to show positive transfer in deep RL agents within a continual learning framework.
  • The authors have shown that the progressive approach is able to effectively exploit transfer for compatible source and task domains; that the approach is robust to harmful features learned in incompatible tasks; and that positive transfer increases with the number of columns, corroborating the constructive, rather than destructive, nature of the progressive architecture.
Summary
  • Finetuning remains the method of choice for transfer learning with neural networks: a model is pretrained on a source domain, the output layers of the model are adapted to the target domain, and the network is finetuned via backpropagation.
  • While finetuning incorporates prior knowledge only at initialization, progressive networks retain a pool of pretrained models throughout training, and learn lateral connections from these to extract useful features for the new task.
  • Progressive networks integrate these desiderata directly into the model architecture: catastrophic forgetting is prevented by instantiating a new neural network (a column) for each task being solved, while transfer is enabled via lateral connections to features of previously learned columns (a minimal sketch follows this summary).
  • Columns in progressive networks are free to reuse, modify or ignore previously learned features via the lateral connections.
  • The progressive net approach, in contrast, uses lateral connections to access previously learned features for deep compositionality.
  • The transfer score is defined as the relative performance of an architecture compared with a single-column baseline trained only on the target task (a sketch of this computation appears after Table 1 below).
  • To this end we start by training single columns on three source games (Pong, River Raid, and Seaquest) and assess if the learned features transfer to a different subset of randomly selected target games (Alien, Asterix, Boxing, Centipede, Gopher, Hero, James Bond, Krull, Robotank, Road Runner, Star Gunner, and Wizard of Wor).
  • We observe from Fig. 6 that progressive nets result in positive transfer in 8 out of 12 target tasks, with only two cases of negative transfer.
  • The statistics across all 3-column nets (Figure 7b) show that positive transfer in Atari occurs at a "sweet spot" between heavy reliance on features from the source task, and heavy reliance on all new features for the target task.
  • This result appears unintuitive: if a progressive net finds a valuable feature set from a source task, shouldn’t we expect a high degree of transfer?
  • Note that even for these easy cases, baseline 2 shows negative transfer because it cannot learn new low-level visual features, which are important because the reward items change from task to task.
  • Progressive neural networks are a stepping stone towards continual learning, and this work has demonstrated their potential through experiments and analysis across three RL domains, including Atari, which contains orthogonal or even adversarial tasks.
  • We have shown that the progressive approach is able to effectively exploit transfer for compatible source and task domains; that the approach is robust to harmful features learned in incompatible tasks; and that positive transfer increases with the number of columns, corroborating the constructive, rather than destructive, nature of the progressive architecture.
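
As a rough illustration of the architecture summarized above, the following PyTorch sketch builds one column per task and wires lateral connections from the frozen, previously trained columns into the new column. It is a simplification under stated assumptions: the paper's columns are deeper and use non-linear adapters on the lateral connections, and all names and sizes here (ProgressiveColumn, the hidden widths) are hypothetical.

    # Sketch of a progressive-network column with lateral connections
    # (illustrative; adapters and column-specific scaling from the paper
    # are omitted, and all layer sizes are hypothetical).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ProgressiveColumn(nn.Module):
        def __init__(self, in_dim, hidden, out_dim, prev_columns=()):
            super().__init__()
            self.prev_columns = list(prev_columns)   # frozen columns from earlier tasks
            self.layer1 = nn.Linear(in_dim, hidden)
            self.layer2 = nn.Linear(hidden, hidden)
            self.out = nn.Linear(hidden, out_dim)
            # Lateral connections: layer 2 also receives each previous
            # column's layer-1 activations.
            self.lateral2 = nn.ModuleList(
                [nn.Linear(hidden, hidden) for _ in self.prev_columns])
            for col in self.prev_columns:            # previous columns are never updated
                for p in col.parameters():
                    p.requires_grad = False

        def forward(self, x):
            with torch.no_grad():                    # activations of the frozen columns
                prev_h1 = [F.relu(col.layer1(x)) for col in self.prev_columns]
            h1 = F.relu(self.layer1(x))
            h2 = self.layer2(h1)
            for lat, ph in zip(self.lateral2, prev_h1):
                h2 = h2 + lat(ph)                    # add lateral contributions
            h2 = F.relu(h2)
            return self.out(h2)

    # Task 1: train column 1 on its own; Task 2: add a new column with laterals to it.
    col1 = ProgressiveColumn(in_dim=16, hidden=32, out_dim=4)
    col2 = ProgressiveColumn(in_dim=16, hidden=32, out_dim=4, prev_columns=[col1])
    y2 = col2(torch.randn(8, 16))                    # forward pass for task 2

The key property is visible in the sketch: parameters of earlier columns stay frozen (no forgetting), while the new column can reuse, modify, or ignore their features through the learned lateral weights.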
Tables
  • Table 1: Transfer percentages in three domains. Baselines are defined in Fig. 3.
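
Table 1 reports transfer percentages, i.e. the transfer score defined in the summary above: performance of an architecture relative to a single-column baseline trained only on the target task. A minimal sketch of such a computation follows; how returns are aggregated over training (here a plain mean over evaluation returns) is an assumption, not the paper's exact protocol.

    # Hedged sketch of a transfer-score computation (aggregation is assumed).
    def transfer_score(arch_returns, baseline_returns):
        """Return transfer as a percentage; >100% indicates positive transfer."""
        arch = sum(arch_returns) / len(arch_returns)
        base = sum(baseline_returns) / len(baseline_returns)
        return 100.0 * arch / base

    # Example with hypothetical evaluation returns on a target game.
    print(transfer_score([210, 230, 250], [180, 190, 200]))  # ~121%, positive transfer

A value above 100% indicates positive transfer; below 100%, negative transfer.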
Contributions
  • Evaluates this architecture extensively on a wide variety of reinforcement learning tasks, and shows that it outperforms common baselines based on pretraining and finetuning
  • Demonstrates that transfer occurs at both low-level sensory and high-level control layers of the learned policy
  • Introduces progressive networks, a novel model architecture with explicit support for transfer across sequences of tasks
  • Evaluates alternative approaches to transfer within the RL domain
  • While progressive networks are widely applicable, this paper focuses on their application to deep reinforcement learning
Study subjects and analysis
cases: 2
The transfer matrix and selected transfer curves are shown in Figure 6, and the results are summarized in Table 1. Across all games, progressive nets result in positive transfer in 8 out of 12 target tasks, with only two cases of negative transfer. This compares favourably to baseline 3, which yields positive transfer in only 5 of 12 games.

References
  • Forest Agostinelli, Michael R. Anderson, and Honglak Lee. Adaptive multi-column deep neural networks with application to robust image denoising. In Advances in Neural Information Processing Systems, 2013.
  • Shun-ichi Amari. Natural gradient works efficiently in learning. Neural Computation, 1998.
  • M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research (JAIR), 47:253–279, 2013.
  • Yoshua Bengio. Deep learning of representations for unsupervised and transfer learning. In JMLR: Workshop on Unsupervised and Transfer Learning, 2012.
  • Dan C. Ciresan, Ueli Meier, and Jürgen Schmidhuber. Multi-column deep neural networks for image classification. In Conf. on Computer Vision and Pattern Recognition, 2012.
  • Scott E. Fahlman and Christian Lebiere. The cascade-correlation learning architecture. In Advances in Neural Information Processing Systems, 1990.
  • G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, July 2006.
  • Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. CoRR, abs/1503.02531, 2015.
  • Yann LeCun, John S. Denker, and Sara A. Solla. Optimal brain damage. In Advances in Neural Information Processing Systems, 1990.
  • Min Lin, Qiang Chen, and Shuicheng Yan. Network in network. In Proc. of Int'l Conference on Learning Representations (ICLR), 2013.
  • G. Mesnil, Y. Dauphin, X. Glorot, S. Rifai, Y. Bengio, I. Goodfellow, E. Lavoie, X. Muller, G. Desjardins, D. Warde-Farley, P. Vincent, A. Courville, and J. Bergstra. Unsupervised and transfer learning challenge: a deep learning approach. In JMLR W&CP: Proc. of the Unsupervised and Transfer Learning Challenge and Workshop, volume 27, 2012.
  • V. Mnih, K. Kavukcuoglu, D. Silver, A. Rusu, J. Veness, M. Bellemare, A. Graves, M. Riedmiller, A. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
  • Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In Int'l Conf. on Machine Learning (ICML), 2016.
  • Emilio Parisotto, Lei Jimmy Ba, and Ruslan Salakhutdinov. Actor-mimic: Deep multitask and transfer reinforcement learning. In Proc. of Int'l Conference on Learning Representations (ICLR), 2016.
  • Mark B. Ring. Continual Learning in Reinforcement Environments. R. Oldenbourg Verlag, 1995.
  • Artem Rozantsev, Mathieu Salzmann, and Pascal Fua. Beyond sharing weights for deep domain adaptation. CoRR, abs/1603.06432, 2016.
  • A. Rusu, S. Colmenarejo, Ç. Gülçehre, G. Desjardins, J. Kirkpatrick, R. Pascanu, V. Mnih, K. Kavukcuoglu, and R. Hadsell. Policy distillation. CoRR, abs/1511.06295, 2016.
  • Paul Ruvolo and Eric Eaton. ELLA: An efficient lifelong learning algorithm. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), June 2013.
  • Daniel L. Silver, Qiang Yang, and Lianghao Li. Lifelong machine learning systems: Beyond learning algorithms. In AAAI Spring Symposium: Lifelong Machine Learning, 2013.
  • Matthew E. Taylor and Peter Stone. An introduction to inter-task transfer for reinforcement learning. AI Magazine, 32(1):15–34, 2011.
  • Alexander V. Terekhov, Guglielmo Montone, and J. Kevin O'Regan. Knowledge Transfer in Deep Block-Modular Neural Networks, pages 268–279. Springer International Publishing, Cham, 2015.
  • C. Tessler, S. Givony, T. Zahavy, D. J. Mankowitz, and S. Mannor. A deep hierarchical approach to lifelong learning in Minecraft. ArXiv e-prints, 2016.
  • Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems, pages 3320–3328, 2014.
  • Guanyu Zhou, Kihyuk Sohn, and Honglak Lee. Online incremental feature learning with denoising autoencoders. In Proc. of Int'l Conf. on Artificial Intelligence and Statistics (AISTATS), pages 1453–1461, 2012.