A Combinatorial Perspective On Transfer Learning

Advances in Neural Information Processing Systems (NeurIPS 2020), 2020

Abstract

Human intelligence is characterized not only by the capacity to learn complex skills, but the ability to rapidly adapt and acquire new skills within an ever-changing environment. In this work we study how the learning of modular solutions can allow for effective generalization to both unseen and potentially differently distributed data.
Introduction
  • Humans learn new tasks from a single temporal stream by efficiently transferring experience of previously encountered tasks.
  • Instead of considering "online" and "continual" learning as inconvenient constraints to avoid, in this paper the authors describe a framework that leverages them as desirable properties to enable effective, data-efficient transfer of previously acquired skills.
  • Core to this framework is the ability to ensemble task-specific neural networks at the level of individual nodes (a minimal sketch of this idea follows this list).
  • The authors refer to the intersection of these challenges as online continual learning.
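The toy sketch below (not the authors' implementation; all class and variable names are illustrative) shows one way node-level ensembling could look: each node keeps one weight vector per task hypothesis and mixes their predictions under its own posterior over tasks, loosely in the spirit of combining a Forget-Me-Not-style mixture with locally trained Gated Linear Network neurons.

```python
# Minimal sketch of node-level ensembling with logistic nodes and a binary
# target; an illustration of the idea, not the paper's NCTL algorithm.
import numpy as np


class EnsembledNode:
    def __init__(self, n_inputs, n_tasks, rng):
        # One weight vector per task hypothesis (illustrative only).
        self.weights = rng.normal(scale=0.1, size=(n_tasks, n_inputs))
        self.log_posterior = np.full(n_tasks, -np.log(n_tasks))  # uniform prior

    def _per_task_probs(self, x):
        return 1.0 / (1.0 + np.exp(-self.weights @ x))

    def predict(self, x):
        # Mix per-task predictions under the node's current task posterior.
        posterior = np.exp(self.log_posterior - self.log_posterior.max())
        posterior /= posterior.sum()
        return float(posterior @ self._per_task_probs(x))

    def update(self, x, y, lr=0.1):
        # Score each hypothesis by its predictive likelihood of the label,
        # then take a responsibility-weighted online logistic step.
        probs = self._per_task_probs(x)
        likelihood = np.where(y == 1, probs, 1.0 - probs)
        self.log_posterior += np.log(np.clip(likelihood, 1e-12, 1.0))
        posterior = np.exp(self.log_posterior - self.log_posterior.max())
        posterior /= posterior.sum()
        self.weights -= lr * posterior[:, None] * np.outer(probs - y, x)


rng = np.random.default_rng(0)
node = EnsembledNode(n_inputs=4, n_tasks=3, rng=rng)
x = rng.normal(size=4)
node.update(x, y=1)
print(node.predict(x))
```

Because each node carries its own per-task hypotheses and posterior, nodes trained on different tasks can be recombined freely, which is what makes node-level (rather than whole-network) ensembling useful here.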
Highlights
  • Humans learn new tasks from a single temporal stream by efficiently transferring experience of previously encountered tasks.
  • We present our analysis in three parts: in Section 5.1, we demonstrate that Neural Combinatorial Transfer Learning (NCTL) exhibits combinatorial transfer using a more challenging variant of the standard Split MNIST protocol; in Section 5.2, we compare the performance of NCTL to many previous continual learning algorithms across standard Permuted and Split MNIST variants, using the same test and train splits as previously published; in Section 5.3, we further evaluate NCTL on the widely used real-world Electricity dataset (Elec2-3), which exhibits temporal dependencies and distribution drift.
  • In this paper we described a framework for combinatorial transfer, whereby a network with m nodes trained on h tasks generalizes to h^m possible task instantiations (a toy counting example follows this list).
  • This framework relies on the ability to meaningfully ensemble networks at the level of individual nodes, which is not a property of contemporary neural networks trained via back-propagation.
  • We provide a concrete instantiation of our framework using the recent and complementary Forget-Me-Not Process and Gated Linear Network models.
  • We provide a variety of experimental evidence that our NCTL algorithm does exhibit combinatorial transfer, and that this leads to both positive forward and backward transfer in a difficult online setting with no access to task boundaries or identities.
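As referenced above, here is a toy illustration of the combinatorial-transfer count (the sizes are assumed for readability, not taken from the paper): if each of m nodes can independently reuse one of h per-node task solutions, the network can realize h^m distinct node-level configurations.

```python
# Toy count of node-level task combinations; m and h are illustrative sizes.
from itertools import product

m, h = 3, 2                                   # 3 nodes, 2 trained tasks
configs = list(product(range(h), repeat=m))   # one task choice per node
assert len(configs) == h ** m                 # 2**3 == 8 instantiations
print(f"{len(configs)} possible instantiations from {h} trained tasks")
```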
Results
  • The authors explore the properties of the NCTL algorithm empirically, presenting the analysis in three parts: in Section 5.1, they demonstrate that NCTL exhibits combinatorial transfer using a more challenging variant of the standard Split MNIST protocol; in Section 5.2, they compare the performance of NCTL to many previous continual learning algorithms across standard Permuted and Split MNIST variants, using the same test and train splits as previously published; in Section 5.3, they further evaluate NCTL on the widely used real-world Electricity dataset (Elec2-3), which exhibits temporal dependencies and distribution drift.
  • Each feature vector is standardized componentwise to have zero mean and unit variance.
  • This normalized feature vector is broadcast to every neuron as side information z (a minimal preprocessing sketch follows this list).
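A minimal sketch of the two preprocessing steps above, assuming an online setting (the class and helper names are hypothetical, not the authors' code): features are standardized componentwise with running statistics, and the resulting vector is handed unchanged to every neuron as side information z.

```python
# Illustrative online componentwise standardization plus broadcast of the
# normalized vector as side information z; not the paper's exact pipeline.
import numpy as np


class RunningStandardizer:
    """Standardize each feature to zero mean and unit variance online."""

    def __init__(self, dim, eps=1e-8):
        self.count = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros(dim)
        self.eps = eps

    def __call__(self, x):
        # Welford's online update of per-component mean and variance.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        var = self.m2 / max(self.count - 1, 1)
        return (x - self.mean) / np.sqrt(var + self.eps)


def broadcast_side_info(z, neurons):
    # Every neuron receives the same normalized feature vector z.
    return [neuron(z) for neuron in neurons]


standardize = RunningStandardizer(dim=3)
raw = np.array([100.0, 0.5, -7.0])
z = standardize(raw)  # approximately zero-mean, unit-variance (online)
toy_neurons = [lambda z, w=w: float(w @ z) for w in np.eye(3)]
print(broadcast_side_info(z, toy_neurons))
```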
Conclusion
  • In this paper the authors described a framework for combinatorial transfer, whereby a network with m nodes trained on h tasks generalizes to h^m possible task instantiations.
  • The authors demonstrate empirically that the distribution of pseudo-tasks is semantically meaningful by comparing NCTL to a number of contemporary continual learning algorithms on the standard Split/Permuted MNIST benchmarks and the real-world Electricity dataset.
  • This new perspective on continual learning opens up exciting new opportunities for data-efficient learning from a single temporal stream of experience in the absence of clearly defined tasks.
Tables
  • Table 1: Average domain-incremental Split/Permuted MNIST accuracies of NCTL versus a benchmark suite of continual learning methods, using the setup described in [HLRK18]. NCTL significantly outperforms all other replay-free algorithms and achieves performance comparable to the replay-augmented GEM, DGR and RtF algorithms. Note that NCTL alone is solving a strictly more difficult variant of the problem, where task boundaries and identities are not provided.
Funding
  • All authors are employees of DeepMind.
References
  • Privacy and algorithmic bias should be considered for real world applications to ensure that they are ethical and of positive benefit to society. Our proposed algorithm is online and therefore does not necessarily require storing data, which is potentially beneficial in terms of privacy. The algorithm inherits many interpretability properties from GLNs [VLB+19], which might be helpful for understanding and addressing any potential bias issues in deployment.
  • [ABE+18] Rahaf Aljundi, Francesca Babiloni, Mohamed Elhoseiny, Marcus Rohrbach, and Tinne Tuytelaars. Memory aware synapses: Learning what (not) to forget. In Proceedings of the European Conference on Computer Vision (ECCV), pages 139–154, 2018.
  • Rahaf Aljundi, Eugene Belilovsky, Tinne Tuytelaars, Laurent Charlin, Massimo Caccia, Min Lin, and Lucas Page-Caccia. Online continual learning with maximal interfered retrieval. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 11849–11860. Curran Associates, Inc., 2019.
  • [BDR+19] Yoshua Bengio, Tristan Deleu, Nasim Rahaman, Rosemary Ke, Sébastien Lachapelle, Olexa Bilaniuk, Anirudh Goyal, and Christopher Pal. A meta-transfer objective for learning to disentangle causal mechanisms. arXiv preprint arXiv:1901.10912, 2019.
  • [BFH+18] James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, and Skye Wanderman-Milne. JAX: composable transformations of Python+NumPy programs, 2018.
  • [BHK+20] David Budden, Matteo Hessel, Iurii Kemaev, Stephen Spencer, and Fabio Viola. Chex: Testing made fun, in JAX!, 2020.
  • [BHQ+20] David Budden, Matteo Hessel, John Quan, Steven Kapturowski, Kate Baumli, Surya Bhupatiraju, Aurelia Guy, and Michael King. RLax: Reinforcement Learning in JAX, 2020.
  • [BMS+20] David Budden, Adam Marblestone, Eren Sezener, Tor Lattimore, Greg Wayne, and Joel Veness. Gaussian gated linear networks, 2020.
  • [CFB+19] Yutian Chen, Abram L Friesen, Feryal Behbahani, David Budden, Matthew W Hoffman, Arnaud Doucet, and Nando de Freitas. Modular meta-learning with shrinkage. arXiv preprint arXiv:1909.05557, 2019.
  • G. A. Carpenter and S. Grossberg. The ART of adaptive pattern recognition by a self-organizing neural network. Computer, 21(3):77–88, March 1988.
  • Zhiyuan Chen and Bing Liu. Lifelong machine learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 12(3):1–207, 2018.
  • Arslan Chaudhry, Marcus Rohrbach, Mohamed Elhoseiny, Thalaiyasingam Ajanthan, Puneet Kumar Dokania, Philip H. S. Torr, and Marc’Aurelio Ranzato. Continual learning with tiny episodic memories. CoRR, abs/1902.10486, 2019.
  • Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and Trevor Darrell. Decaf: A deep convolutional activation feature for generic visual recognition. CoRR, abs/1310.1531, 2013.
  • [GDDM14] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. 2014 IEEE Conference on Computer Vision and Pattern Recognition, Jun 2014.
  • [GMX+13] Ian J. Goodfellow, Mehdi Mirza, Da Xiao, Aaron Courville, and Yoshua Bengio. An empirical investigation of catastrophic forgetting in gradient-based neural networks, 2013.
  • [HBV+20] Matteo Hessel, David Budden, Fabio Viola, Mihaela Rosca, Eren Sezener, and Tom Hennigan. Optax: Composable gradient transformation and optimisation, in JAX!, 2020.
  • [HCNB20] Tom Hennigan, Trevor Cai, Tamara Norman, and Igor Babuschkin. Haiku: Sonnet for JAX, 2020.
  • [HLRK18] Yen-Chang Hsu, Yen-Cheng Liu, Anita Ramasamy, and Zsolt Kira. Re-evaluating continual learning scenarios: A categorization and case for strong baselines. arXiv preprint arXiv:1810.12488, 2018.
  • [Hus18] Ferenc Huszár. Note on the quadratic penalties in elastic weight consolidation. Proceedings of the National Academy of Sciences, 115(11):E2496–E2497, 2018.
  • [HW99] Michael Harries and New South Wales. Splice-2 comparative evaluation: Electricity pricing. 1999.
  • J Zico Kolter and Marcus A Maloof. Dynamic weighted majority: An ensemble method for drifting concepts. Journal of Machine Learning Research, 8(Dec):2755–2790, 2007.
  • James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, and Raia Hadsell. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526, 2017.
  • Zhizhong Li and Derek Hoiem. Learning without forgetting. IEEE transactions on pattern analysis and machine intelligence, 40(12):2935–2947, 2017.
  • David Lopez-Paz and Marc’Aurelio Ranzato. Gradient episodic memory for continual learning. In Advances in Neural Information Processing Systems, pages 6467–6476, 2017.
  • [Mat12] Christopher Mattern. Mixing strategies in data compression. In 2012 Data Compression Conference, Snowbird, UT, USA, April 10-12, pages 337–346, 2012.
  • Christopher Mattern. Linear and geometric mixtures - analysis. In 2013 Data Compression Conference, DCC 2013, Snowbird, UT, USA, March 20-22, 2013, pages 301–310, 2013.
  • [MC89] Michael McCloskey and Neal J. Cohen. Catastrophic interference in connectionist networks: The sequential learning problem. volume 24 of Psychology of Learning and Motivation, pages 109 – 165. Academic Press, 1989.
  • [MVK+16] Kieran Milan, Joel Veness, James Kirkpatrick, Michael Bowling, Anna Koop, and Demis Hassabis. The forget-me-not process. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 3702–3710. Curran Associates, Inc., 2016.
  • [PKP+19] German I Parisi, Ronald Kemker, Jose L Part, Christopher Kanan, and Stefan Wermter. Continual lifelong learning with neural networks: A review. Neural Networks, 2019.
  • [PKRCS17] Giambattista Parascandolo, Niki Kilbertus, Mateo Rojas-Carulla, and Bernhard Schölkopf. Learning independent causal mechanisms. arXiv preprint arXiv:1712.00961, 2017.
  • [Qui14] John Quinlan. C4.5: programs for machine learning. Elsevier, 2014.
  • [RKSL17] Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H. Lampert. iCaRL: Incremental classifier and representation learning. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jul 2017.
  • [Rob95] Anthony V. Robins. Catastrophic forgetting, rehearsal and pseudorehearsal. Connect. Sci., 7:123–146, 1995.
  • Eren Sezener, Marcus Hutter, David Budden, Jianan Wang, and Joel Veness. Online Learning in Contextual Bandits using Gated Linear Networks. arXiv e-prints, page arXiv:2002.11611, February 2020.
  • Jonathan Schwarz, Jelena Luketina, Wojciech M Czarnecki, Agnieszka Grabska-Barwinska, Yee Whye Teh, Razvan Pascanu, and Raia Hadsell. Progress & compress: A scalable framework for continual learning. arXiv preprint arXiv:1805.06370, 2018.
  • [SLKK17] Hanul Shin, Jung Kwon Lee, Jaehong Kim, and Jiwon Kim. Continual learning with deep generative replay. In Advances in Neural Information Processing Systems, pages 2990–2999, 2017.
  • Lisa Torrey and Jude W. Shavlik. Handbook of research on machine learning applications, Chapter 11: Transfer Learning. pages 242–264, 2009.
  • Gido M van de Ven and Andreas S Tolias. Generative replay with feedback connections as a general strategy for continual learning. arXiv preprint arXiv:1809.10635, 2018.
  • Joel Veness, Tor Lattimore, Avishkar Bhoopchand, Agnieszka Grabska-Barwinska, Christopher Mattern, and Peter Toth. Online learning with gated linear networks. CoRR, abs/1712.01897, 2017.
  • [VLB+19] Joel Veness, Tor Lattimore, Avishkar Bhoopchand, David Budden, Christopher Mattern, Agnieszka Grabska-Barwinska, Peter Toth, Simon Schmitt, and Marcus Hutter. Gated linear networks. arXiv preprint arXiv:1910.01526, 2019.
  • [VWBG13] Joel Veness, Martha White, Michael Bowling, and András György. Partition tree weighting. In 2013 Data Compression Conference, pages 321–330. IEEE, 2013.
  • Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Machine Learning, Proceedings of the Twentieth International Conference (ICML 2003), August 21-24, 2003, Washington, DC, USA, pages 928–936, 2003.
  • Friedemann Zenke, Ben Poole, and Surya Ganguli. Continual learning through synaptic intelligence. In Proceedings of the 34th International Conference on Machine LearningVolume 70, ICML’17, pages 3987–3995. JMLR. org, 2017.