A Combinatorial Perspective On Transfer Learning
Advances in Neural Information Processing Systems (NeurIPS 2020), 2020
Human intelligence is characterized not only by the capacity to learn complex skills, but the ability to rapidly adapt and acquire new skills within an ever-changing environment. In this work we study how the learning of modular solutions can allow for effective generalization to both unseen and potentially differently distributed data. …
- Humans learn new tasks from a single temporal stream by efficiently transferring experience of previously encountered tasks.
- Instead of considering "online" and "continual" learning as inconvenient constraints to avoid, in this paper the authors describe a framework that leverages them as desirable properties to enable effective, data-efficient transfer of previously acquired skills.
- Core to this framework is the ability to ensemble task-specific neural networks at the level of individual nodes.
- The authors refer to the intersection of these challenges as online continual learning.
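The node-level ensembling mentioned above is possible because each neuron in a Gated Linear Network is itself a probabilistic predictor, and predictions are combined by geometric mixing in logit space. A minimal sketch of geometric mixing (illustrative only, not the authors' implementation):

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def geometric_mix(probs, weights):
    # Combine probabilistic predictions by mixing in logit space; with
    # convex weights this is a weighted geometric mean of the odds.
    return sigmoid(np.dot(weights, logit(probs)))

# Three upstream predictions for the same binary event, plus mixing weights.
probs = np.array([0.7, 0.8, 0.6])
w = np.array([0.5, 0.3, 0.2])
p = geometric_mix(probs, w)  # lies between min(probs) and max(probs)
```

Because mixing happens per neuron rather than per network, swapping a single neuron's weights swaps that neuron's contribution without disturbing the rest of the ensemble.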
- We present our analysis in three parts: in Section 5.1, we demonstrate that Neural Combinatorial Transfer Learning (NCTL) exhibits combinatorial transfer using a more challenging variant of the standard Split MNIST protocol; in Section 5.2, we compare the performance of NCTL to many previous continual learning algorithms across standard Permuted and Split MNIST variants, using the same test and train splits as previously published; in Section 5.3, we further evaluate NCTL on the widely used real-world Electricity dataset (Elec2-3), which exhibits temporal dependencies and distribution drift.
- In this paper we described a framework for combinatorial transfer, whereby a network with m nodes trained on h tasks generalizes to h^m possible task instantiations.
- This framework relies on the ability to meaningfully ensemble networks at the level of individual nodes, which is not a property of contemporary neural networks trained via back-propagation.
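To make the combinatorial claim concrete: if each of the m nodes can independently adopt the solution it learned for any of the h tasks, a joint configuration is one task choice per node, giving h^m distinct instantiations. A toy count (illustrative numbers, not from the paper):

```python
from itertools import product

h, m = 3, 4  # 3 tasks, 4 nodes (hypothetical sizes)

# Enumerate every joint configuration: one task assignment per node.
configs = list(product(range(h), repeat=m))

assert len(configs) == h ** m  # 81 instantiations from only 3 trained tasks
```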
- We provide a concrete instantiation of our framework using the recent and complementary Forget-Me-Not Process and Gated Linear Network models.
- We provide a variety of experimental evidence that our NCTL algorithm does exhibit combinatorial transfer, and that this leads to both positive forward and backward transfer in a difficult online setting with no access to task boundaries or identities.
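The Forget-Me-Not Process contributes the task-inference side of this combination. The sketch below shows only its simplest ingredient, Bayesian model selection at a single node over hypothetical task-specific weight vectors; the full process additionally handles task segmentation over time via partition tree weighting, which is omitted here:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical per-node setup: h candidate task-specific weight vectors.
h, d, n = 3, 5, 500
weights = np.stack([np.zeros(d), np.ones(d), 2.0 * np.ones(d)])

# A stream of (x, y) pairs actually generated by task 1.
X = rng.normal(size=(n, d))
y = (rng.random(n) < sigmoid(X @ weights[1])).astype(int)

# Maintain a Bayesian posterior over which task generated the stream.
posterior = np.full(h, 1.0 / h)
for x_t, y_t in zip(X, y):
    preds = sigmoid(weights @ x_t)                 # each model's p(y=1 | x)
    lik = np.where(y_t == 1, preds, 1.0 - preds)   # likelihood of the label
    posterior *= lik
    posterior /= posterior.sum()                   # normalize every step
```

With enough data the posterior concentrates on the true generating task, which is what lets each node bind to the right previously learned solution without being told task identities.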
- Each feature vector is standardized componentwise to have zero mean and unit variance.
- This normalized feature vector is broadcast to every neuron as side information z.
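The preprocessing described in the two points above can be sketched as follows (toy data; the paper's actual pipeline details may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(1000, 8))  # toy feature matrix

# Component-wise standardization: zero mean, unit variance per feature.
mu = X.mean(axis=0)
sigma = X.std(axis=0)
Z = (X - mu) / sigma

# Each standardized row is then broadcast unchanged to every neuron as
# side information z, which the neurons use for gating.
z = Z[0]
```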
- The authors demonstrate empirically that the distribution of pseudo-tasks is semantically meaningful by comparing NCTL to a number of contemporary continual learning algorithms on the standard Split/Permuted MNIST benchmarks and the real-world Electricity dataset.
- This new perspective on continual learning opens up exciting new opportunities for data-efficient learning from a single temporal stream of experience in the absence of clearly defined tasks.
- Table 1: Average domain-incremental Split/Permuted MNIST accuracies of NCTL versus a benchmark suite of continual learning methods, using the setup described in [HLRK18]. NCTL significantly outperforms all other replay-free algorithms, and achieves comparable performance with the replay-augmented GEM, DGR and RtF algorithms. Note that NCTL alone is solving a strictly more difficult variant of the problem, where task boundaries and identities are not provided.
- Funding Disclosure: All authors are employees of DeepMind.
- Privacy and algorithmic bias should be considered for real world applications to ensure that they are ethical and of positive benefit to society. Our proposed algorithm is online and therefore does not necessarily require storing data, which is potentially beneficial in terms of privacy. The algorithm inherits many interpretability properties from GLNs [VLB+19], which might be helpful for understanding and addressing any potential bias issues in deployment.
- [ABE+18] Rahaf Aljundi, Francesca Babiloni, Mohamed Elhoseiny, Marcus Rohrbach, and Tinne Tuytelaars. Memory aware synapses: Learning what (not) to forget. In Proceedings of the European Conference on Computer Vision (ECCV), pages 139–154, 2018.
- [ABT+19] Rahaf Aljundi, Eugene Belilovsky, Tinne Tuytelaars, Laurent Charlin, Massimo Caccia, Min Lin, and Lucas Page-Caccia. Online continual learning with maximal interfered retrieval. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 11849–11860. Curran Associates, Inc., 2019.
- [BDR+19] Yoshua Bengio, Tristan Deleu, Nasim Rahaman, Rosemary Ke, Sébastien Lachapelle, Olexa Bilaniuk, Anirudh Goyal, and Christopher Pal. A meta-transfer objective for learning to disentangle causal mechanisms. arXiv preprint arXiv:1901.10912, 2019.
- [BFH+18] James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, and Skye Wanderman-Milne. JAX: composable transformations of Python+NumPy programs, 2018.
- [BHK+20] David Budden, Matteo Hessel, Iurii Kemaev, Stephen Spencer, and Fabio Viola. Chex: Testing made fun, in JAX!, 2020.
- [BHQ+20] David Budden, Matteo Hessel, John Quan, Steven Kapturowski, Kate Baumli, Surya Bhupatiraju, Aurelia Guy, and Michael King. RLax: Reinforcement Learning in JAX, 2020.
- [BMS+20] David Budden, Adam Marblestone, Eren Sezener, Tor Lattimore, Greg Wayne, and Joel Veness. Gaussian gated linear networks, 2020.
- [CFB+19] Yutian Chen, Abram L Friesen, Feryal Behbahani, David Budden, Matthew W Hoffman, Arnaud Doucet, and Nando de Freitas. Modular meta-learning with shrinkage. arXiv preprint arXiv:1909.05557, 2019.
- [CG88] G. A. Carpenter and S. Grossberg. The ART of adaptive pattern recognition by a self-organizing neural network. Computer, 21(3):77–88, March 1988.
- [CL18] Zhiyuan Chen and Bing Liu. Lifelong machine learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 12(3):1–207, 2018.
- [CRE+19] Arslan Chaudhry, Marcus Rohrbach, Mohamed Elhoseiny, Thalaiyasingam Ajanthan, Puneet Kumar Dokania, Philip H. S. Torr, and Marc'Aurelio Ranzato. Continual learning with tiny episodic memories. CoRR, abs/1902.10486, 2019.
- [DJV+13] Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and Trevor Darrell. DeCAF: A deep convolutional activation feature for generic visual recognition. CoRR, abs/1310.1531, 2013.
- [GDDM14] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. 2014 IEEE Conference on Computer Vision and Pattern Recognition, Jun 2014.
- [GMX+13] Ian J. Goodfellow, Mehdi Mirza, Da Xiao, Aaron Courville, and Yoshua Bengio. An empirical investigation of catastrophic forgetting in gradient-based neural networks, 2013.
- [HBV+20] Matteo Hessel, David Budden, Fabio Viola, Mihaela Rosca, Eren Sezener, and Tom Hennigan. Optax: Composable gradient transformation and optimisation, in JAX!, 2020.
- [HCNB20] Tom Hennigan, Trevor Cai, Tamara Norman, and Igor Babuschkin. Haiku: Sonnet for JAX, 2020.
- [HLRK18] Yen-Chang Hsu, Yen-Cheng Liu, Anita Ramasamy, and Zsolt Kira. Re-evaluating continual learning scenarios: A categorization and case for strong baselines. arXiv preprint arXiv:1810.12488, 2018.
- [Hus18] Ferenc Huszár. Note on the quadratic penalties in elastic weight consolidation. Proceedings of the National Academy of Sciences, 115(11):E2496–E2497, 2018.
- [HW99] Michael Harries. Splice-2 comparative evaluation: Electricity pricing. Technical report, The University of New South Wales, 1999.
- [KM07] J. Zico Kolter and Marcus A. Maloof. Dynamic weighted majority: An ensemble method for drifting concepts. Journal of Machine Learning Research, 8(Dec):2755–2790, 2007.
- [KPR+17] James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, and Raia Hadsell. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526, 2017.
- [LH17] Zhizhong Li and Derek Hoiem. Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12):2935–2947, 2017.
- [LPR17] David Lopez-Paz and Marc'Aurelio Ranzato. Gradient episodic memory for continual learning. In Advances in Neural Information Processing Systems, pages 6467–6476, 2017.
- [Mat12] Christopher Mattern. Mixing strategies in data compression. In 2012 Data Compression Conference, Snowbird, UT, USA, April 10-12, pages 337–346, 2012.
- [Mat13] Christopher Mattern. Linear and geometric mixtures - analysis. In 2013 Data Compression Conference, DCC 2013, Snowbird, UT, USA, March 20-22, 2013, pages 301–310, 2013.
- [MC89] Michael McCloskey and Neal J. Cohen. Catastrophic interference in connectionist networks: The sequential learning problem. Volume 24 of Psychology of Learning and Motivation, pages 109–165. Academic Press, 1989.
- [MVK+16] Kieran Milan, Joel Veness, James Kirkpatrick, Michael Bowling, Anna Koop, and Demis Hassabis. The forget-me-not process. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 3702–3710. Curran Associates, Inc., 2016.
- [PKP+19] German I Parisi, Ronald Kemker, Jose L Part, Christopher Kanan, and Stefan Wermter. Continual lifelong learning with neural networks: A review. Neural Networks, 2019.
- [PKRCS17] Giambattista Parascandolo, Niki Kilbertus, Mateo Rojas-Carulla, and Bernhard Schölkopf. Learning independent causal mechanisms. arXiv preprint arXiv:1712.00961, 2017.
- [Qui14] John Quinlan. C4.5: Programs for machine learning. Elsevier, 2014.
- [RKSL17] Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H. Lampert. iCaRL: Incremental classifier and representation learning. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jul 2017.
- [Rob95] Anthony V. Robins. Catastrophic forgetting, rehearsal and pseudorehearsal. Connect. Sci., 7:123–146, 1995.
- [SHB+20] Eren Sezener, Marcus Hutter, David Budden, Jianan Wang, and Joel Veness. Online learning in contextual bandits using gated linear networks. arXiv e-prints, page arXiv:2002.11611, February 2020.
- [SLC+18] Jonathan Schwarz, Jelena Luketina, Wojciech M. Czarnecki, Agnieszka Grabska-Barwinska, Yee Whye Teh, Razvan Pascanu, and Raia Hadsell. Progress & compress: A scalable framework for continual learning. arXiv preprint arXiv:1805.06370, 2018.
- [SLKK17] Hanul Shin, Jung Kwon Lee, Jaehong Kim, and Jiwon Kim. Continual learning with deep generative replay. In Advances in Neural Information Processing Systems, pages 2990–2999, 2017.
- [TS09] Lisa Torrey and Jude W. Shavlik. Transfer learning. In Handbook of Research on Machine Learning Applications, chapter 11, pages 242–264, 2009.
- [vdVT18] Gido M. van de Ven and Andreas S. Tolias. Generative replay with feedback connections as a general strategy for continual learning. arXiv preprint arXiv:1809.10635, 2018.
- [VLB+17] Joel Veness, Tor Lattimore, Avishkar Bhoopchand, Agnieszka Grabska-Barwinska, Christopher Mattern, and Peter Toth. Online learning with gated linear networks. CoRR, abs/1712.01897, 2017.
- [VLB+19] Joel Veness, Tor Lattimore, Avishkar Bhoopchand, David Budden, Christopher Mattern, Agnieszka Grabska-Barwinska, Peter Toth, Simon Schmitt, and Marcus Hutter. Gated linear networks. arXiv preprint arXiv:1910.01526, 2019.
- [VWBG13] Joel Veness, Martha White, Michael Bowling, and András György. Partition tree weighting. In 2013 Data Compression Conference, pages 321–330. IEEE, 2013.
- [Zin03] Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Machine Learning, Proceedings of the Twentieth International Conference (ICML 2003), August 21-24, 2003, Washington, DC, USA, pages 928–936, 2003.
- [ZPG17] Friedemann Zenke, Ben Poole, and Surya Ganguli. Continual learning through synaptic intelligence. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, ICML'17, pages 3987–3995. JMLR.org, 2017.