Online Structured Meta-learning

NeurIPS 2020

Abstract

Learning quickly is of great importance for machine intelligence deployed in online platforms. With the capability of transferring knowledge from learned tasks, meta-learning has shown its effectiveness in online scenarios by continuously updating the model with the learned prior. However, current online meta-learning algorithms are lim...

Introduction
  • Meta-learning has shown its effectiveness in adapting to new tasks by transferring prior experience learned from other related tasks [7, 34, 38].
  • To equip agents with this capability, Finn et al. [10] recently presented the online meta-learning framework, which connects meta-learning and online learning.
  • Under this setting, the meta-learner both benefits the learning of the current task and continuously updates itself with newly accumulated knowledge.
Highlights
  • Meta-learning has shown its effectiveness in adapting to new tasks by transferring prior experience learned from other related tasks [7, 34, 38].
  • All online meta-learning methods (i.e., FTML, DPM, HSML, and online structured meta-learning (OSML)) achieve better performance than the non-meta-learning baselines (i.e., NT, FT), which further demonstrates the effectiveness of task-specific adaptation.
  • We analyze the amount of data needed to learn each task in Appendix D, and the results indicate that OSML is able to learn new tasks efficiently.
  • We propose OSML, a novel framework that addresses online meta-learning under a heterogeneous task distribution.
  • Inspired by knowledge organization in the human brain, OSML maintains a meta-hierarchical structure that consists of various knowledge blocks.
  • Two baselines (FTML and DPM) are selected for comparison, and we observe that OSML consistently improves performance under different settings.
  • The information from the new task is further incorporated into the meta-hierarchy by meta-updating the selected knowledge blocks (a minimal code sketch of this block-selection idea follows this list).
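The meta-hierarchy and pathway construction described above can be pictured with a small code sketch. The snippet below is a hypothetical illustration rather than the authors' implementation: each layer holds a few interchangeable knowledge blocks, and per-task selection weights (here a plain softmax, in the spirit of differentiable architecture search) decide which block joins the task's meta-knowledge pathway. All names (BlockLayer, alpha, selected_block) are invented for this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlockLayer(nn.Module):
    """One level of the meta-hierarchy: several interchangeable knowledge blocks."""
    def __init__(self, in_dim, out_dim, num_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList(nn.Linear(in_dim, out_dim) for _ in range(num_blocks))
        # Per-task selection logits; in an online setting these would be
        # reset or re-learned for every incoming task.
        self.alpha = nn.Parameter(torch.zeros(num_blocks))

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)                      # soft pathway weights
        outs = torch.stack([blk(x) for blk in self.blocks])   # (num_blocks, batch, out_dim)
        return torch.relu((w[:, None, None] * outs).sum(dim=0))

    def selected_block(self):
        """Hard pathway choice: the block with the largest selection weight."""
        return int(self.alpha.argmax())

# A two-level hierarchy; the selected block of each layer defines the task's pathway.
net = nn.Sequential(BlockLayer(784, 128), BlockLayer(128, 64))
features = net(torch.randn(32, 784))
pathway = [layer.selected_block() for layer in net]  # these blocks would then be meta-updated
```

In the full method the selection is learned per task from its support set, and the hierarchy can expand with new blocks when no existing block fits; the sketch above only conveys the underlying data structure.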
Methods
  • The authors conduct experiments on both homogeneous and heterogeneous datasets to show the effectiveness of the proposed OSML.
  • The following algorithms are adopted as baselines: (1) Non-transfer (NT), which only uses the support set of task Tt to train the base learner; and (2) Fine-tune (FT), which continuously fine-tunes the base model without task-specific adaptation.
  • For comparison, the authors also evaluate HSML under this setting, which introduces task-aware parameter customization and a hierarchical clustering structure.
  • Each combination of image transformations is treated as one task, yielding a total of 56 tasks in the Rainbow MNIST dataset (an illustrative enumeration of such task combinations follows this list).
  • Additional information about the experimental settings is provided in Appendix A.1.
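As a small illustration of how "each combination of image transformations is one task" yields the reported task count, the snippet below enumerates combinations of a color, a scale, and a rotation. The particular values (7 colors, 2 scales, 4 rotations) are an assumption chosen only so that the product matches the 56 tasks mentioned above; the paper's exact transformation set may differ.

```python
from itertools import product

# Hypothetical transformation values; 7 * 2 * 4 = 56 matches the reported task count.
colors = ["red", "orange", "yellow", "green", "blue", "indigo", "violet"]
scales = [1.0, 0.5]
rotations = [0, 90, 180, 270]

tasks = [
    {"color": c, "scale": s, "rotation": r}
    for c, s, r in product(colors, scales, rotations)
]
assert len(tasks) == 56  # one classification task per transformation combination
```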
Results
  • Results and Analysis

    The results on Rainbow MNIST are shown in Figure 2.
  • The authors observe that task-specific online meta-learning methods (i.e., OSML, DPM, and HSML) achieve better performance than FTML
  • This is further supported by the summarized performance on Meta-dataset, where FTML achieves relatively better performance on Aircraft compared with Fungi and Flower.
  • This suggests that the shared meta-learner is possibly attracted into a specific mode/region and is not able to make use of the information from all tasks.
  • OSML outperforms DPM and HSML on both datasets, indicating that the meta-hierarchical structure effectively captures heterogeneous task-specific information and encourages more flexible knowledge sharing.
Conclusion
  • Discussion with Related Work

    In meta-learning, the ultimate goal is to enhance the learning ability by utilizing and transferring learned knowledge from related tasks.
  • In the traditional meta-learning setting, all tasks are generated from a stationary distribution.
  • Inspired by knowledge organization in the human brain, OSML maintains a meta-hierarchical structure that consists of various knowledge blocks.
  • For each task, it constructs a meta-knowledge pathway by automatically selecting the most relevant knowledge blocks (a sketch of the overall online loop follows this list).
  • The authors plan to investigate this problem from three aspects: (1) effectively and efficiently structuring the memory buffer and storing the most representative samples for each task; (2) theoretically analyzing the generalization ability of the proposed OSML; and (3) investigating the performance of OSML in more real-world applications.
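To make the online setting concrete, the following is a minimal, hypothetical sketch of the outer loop only: tasks arrive one at a time, a copy of the meta-parameters is adapted on each task's support set, query performance is recorded, and the meta-parameters are then updated with the new task's information. For simplicity it uses a first-order, Reptile-style meta-update on all parameters, whereas OSML would meta-update only the knowledge blocks on the task's selected pathway; task_stream, inner_adapt, and the step sizes are illustrative.

```python
import copy
import torch

def inner_adapt(meta_model, loss_fn, support_x, support_y, lr=0.01, steps=5):
    """MAML-style task-specific adaptation of a copy of the meta-model."""
    task_model = copy.deepcopy(meta_model)
    opt = torch.optim.SGD(task_model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(task_model(support_x), support_y).backward()
        opt.step()
    return task_model

def online_meta_learning(meta_model, loss_fn, task_stream, meta_lr=0.1):
    """task_stream yields (support_x, support_y, query_x, query_y) one task at a time."""
    for support_x, support_y, query_x, query_y in task_stream:
        # 1) adapt to the current task using only its support set
        task_model = inner_adapt(meta_model, loss_fn, support_x, support_y)
        # 2) evaluate the adapted model on the current task's query set
        with torch.no_grad():
            query_loss = loss_fn(task_model(query_x), query_y).item()
        print(f"query loss on current task: {query_loss:.4f}")
        # 3) fold the new task's knowledge back into the meta-parameters
        #    (first-order, Reptile-style step; a pathway-aware variant would
        #    touch only the selected knowledge blocks)
        with torch.no_grad():
            for p_meta, p_task in zip(meta_model.parameters(), task_model.parameters()):
                p_meta.add_(meta_lr * (p_task - p_meta))
```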
Summary
  • Objectives:

    In meta-learning, the authors aim to construct a well-organized meta-learner that can (1) support fast adaptation to the current task with a task-specific structured prior; (2) accumulate and organize newly learned experience; and (3) automatically adapt and expand to handle unseen structured knowledge.
Tables
  • Table 1: Performance w.r.t. the number of samples per task on Meta-dataset
Funding
  • The work was supported in part by NSF awards #1652525 and #1618448
  • The views and conclusions contained in this paper are those of the authors and should not be interpreted as representing any funding agencies
Study subjects and analysis
Datasets: 3
In addition, new knowledge is further incorporated into the selected blocks. Experiments on three datasets demonstrate the effectiveness and interpretability of the proposed framework in the context of both homogeneous and heterogeneous tasks.

Training samples per task: 900
Each combination of image transformations is treated as one task, so a total of 56 tasks are generated in the Rainbow MNIST dataset. Each task contains 900 training samples and 100 testing samples. The classical four-block convolutional network is adopted as the base model.

References
  • Tameem Adel, Han Zhao, and Richard E Turner. Continual learning with adaptive weights (CLAW). arXiv preprint arXiv:1911.09514, 2019.
  • Maruan Al-Shedivat, Trapit Bansal, Yuri Burda, Ilya Sutskever, Igor Mordatch, and Pieter Abbeel. Continuous adaptation via meta-learning in nonstationary and competitive environments. arXiv preprint arXiv:1710.03641, 2017.
  • Ferran Alet, Tomas Lozano-Perez, and Leslie P Kaelbling. Modular meta-learning. In Conference on Robot Learning, pages 856–868, 2018.
  • Arslan Chaudhry, Puneet K Dokania, Thalaiyasingam Ajanthan, and Philip HS Torr. Riemannian walk for incremental learning: Understanding forgetting and intransigence. In Proceedings of the European Conference on Computer Vision (ECCV), pages 532–547, 2018.
  • Arslan Chaudhry, Marc’Aurelio Ranzato, Marcus Rohrbach, and Mohamed Elhoseiny. Efficient lifelong learning with A-GEM. In International Conference on Learning Representations, 2019.
  • Chrisantha Fernando, Dylan Banarse, Charles Blundell, Yori Zwols, David Ha, Andrei A Rusu, Alexander Pritzel, and Daan Wierstra. PathNet: Evolution channels gradient descent in super neural networks. arXiv preprint arXiv:1701.08734, 2017.
  • Chelsea Finn and Sergey Levine. Meta-learning and universality: Deep representations and gradient descent can approximate any learning algorithm. arXiv preprint arXiv:1710.11622, 2017.
  • Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1126–1135. JMLR.org, 2017.
  • Chelsea Finn, Kelvin Xu, and Sergey Levine. Probabilistic model-agnostic meta-learning. arXiv preprint arXiv:1806.02817, 2018.
  • Chelsea Finn, Aravind Rajeswaran, Sham Kakade, and Sergey Levine. Online meta-learning. In International Conference on Machine Learning, pages 1920–1930, 2019.
  • Sebastian Flennerhag, Andrei A Rusu, Razvan Pascanu, Hujun Yin, and Raia Hadsell. Meta-learning with warped gradient descent. arXiv preprint arXiv:1909.00025, 2019.
  • Victor Garcia and Joan Bruna. Few-shot learning with graph neural networks. arXiv preprint arXiv:1711.04043, 2017.
  • Erin Grant, Chelsea Finn, Sergey Levine, Trevor Darrell, and Thomas Griffiths. Recasting gradient-based meta-learning as hierarchical Bayes. arXiv preprint arXiv:1801.08930, 2018.
  • Jiatao Gu, Yong Wang, Yun Chen, Victor OK Li, and Kyunghyun Cho. Meta-learning for low-resource neural machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3622–3631, 2018.
  • Ghassen Jerfel, Erin Grant, Tom Griffiths, and Katherine A Heller. Reconciling meta-learning and continual learning with online mixtures of tasks. In Advances in Neural Information Processing Systems, pages 9119–9130, 2019.
  • James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526, 2017.
  • Yoonho Lee and Seungjin Choi. Gradient-based meta-learning with learned layerwise metric and subspace. In International Conference on Machine Learning, pages 2927–2936, 2018.
  • Jeongtae Lee, Jaehong Yun, Sungju Hwang, and Eunho Yang. Lifelong learning with dynamically expandable networks. arXiv preprint arXiv:1708.01547, 2017.
  • Zhenguo Li, Fengwei Zhou, Fei Chen, and Hang Li. Meta-SGD: Learning to learn quickly for few-shot learning. arXiv preprint arXiv:1707.09835, 2017.
  • Xilai Li, Yingbo Zhou, Tianfu Wu, Richard Socher, and Caiming Xiong. Learn to grow: A continual structure learning framework for overcoming catastrophic forgetting. arXiv preprint arXiv:1904.00310, 2019.
  • Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: Differentiable architecture search. arXiv preprint arXiv:1806.09055, 2018.
  • Yanbin Liu, Juho Lee, Minseop Park, Saehoon Kim, and Yi Yang. Transductive propagation network for few-shot learning. arXiv preprint arXiv:1805.10002, 2018.
  • David Lopez-Paz and Marc’Aurelio Ranzato. Gradient episodic memory for continual learning. In Advances in Neural Information Processing Systems, pages 6467–6476, 2017.
  • Nikhil Mishra, Mostafa Rohaninejad, Xi Chen, and Pieter Abbeel. A simple neural attentive meta-learner. arXiv preprint arXiv:1707.03141, 2017.
  • Anusha Nagabandi, Chelsea Finn, and Sergey Levine. Deep online learning via meta-learning: Continual adaptation for model-based RL. arXiv preprint arXiv:1812.07671, 2018.
  • Alex Nichol and John Schulman. Reptile: A scalable meta-learning algorithm. arXiv preprint arXiv:1803.02999, 2018.
  • Boris Oreshkin, Pau Rodríguez López, and Alexandre Lacoste. TADAM: Task dependent adaptive metric for improved few-shot learning. In Advances in Neural Information Processing Systems, pages 721–731, 2018.
  • Aravind Rajeswaran, Chelsea Finn, Sham M Kakade, and Sergey Levine. Meta-learning with implicit gradients. In Advances in Neural Information Processing Systems, pages 113–124, 2019.
  • Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H Lampert. iCaRL: Incremental classifier and representation learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2001–2010, 2017.
  • Andrei A Rusu, Neil C Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, and Raia Hadsell. Progressive neural networks. arXiv preprint arXiv:1606.04671, 2016.
  • Andrei A Rusu, Dushyant Rao, Jakub Sygnowski, Oriol Vinyals, Razvan Pascanu, Simon Osindero, and Raia Hadsell. Meta-learning with latent embedding optimization. arXiv preprint arXiv:1807.05960, 2018.
  • Jonathan Schwarz, Wojciech Czarnecki, Jelena Luketina, Agnieszka Grabska-Barwinska, Yee Whye Teh, Razvan Pascanu, and Raia Hadsell. Progress & compress: A scalable framework for continual learning. In International Conference on Machine Learning, pages 4528–4537, 2018.
  • Hanul Shin, Jung Kwon Lee, Jaehong Kim, and Jiwon Kim. Continual learning with deep generative replay. In Advances in Neural Information Processing Systems, pages 2990–2999, 2017.
  • Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. In Advances in Neural Information Processing Systems, pages 4077–4087, 2017.
  • Flood Sung, Yongxin Yang, Li Zhang, Tao Xiang, Philip HS Torr, and Timothy M Hospedales. Learning to compare: Relation network for few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1199–1208, 2018.
  • Ming Tan, Yang Yu, Haoyu Wang, Dakuo Wang, Saloni Potdar, Shiyu Chang, and Mo Yu. Out-of-domain detection for low-resource text classification tasks. arXiv preprint arXiv:1909.05357, 2019.
  • Eleni Triantafillou, Tyler Zhu, Vincent Dumoulin, Pascal Lamblin, Kelvin Xu, Ross Goroshin, Carles Gelada, Kevin Swersky, Pierre-Antoine Manzagol, and Hugo Larochelle. Meta-Dataset: A dataset of datasets for learning to learn from few examples. arXiv preprint arXiv:1903.03096, 2019.
  • Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. In Advances in Neural Information Processing Systems, pages 3630–3638, 2016.
  • Risto Vuorio, Shao-Hua Sun, Hexiang Hu, and Joseph J Lim. Multimodal model-agnostic meta-learning via task-aware modulation. In Advances in Neural Information Processing Systems, pages 1–12, 2019.
  • Xin Wang, Thomas E Huang, Trevor Darrell, Joseph E Gonzalez, and Fisher Yu. Frustratingly simple few-shot object detection. arXiv preprint arXiv:2003.06957, 2020.
  • Huaxiu Yao, Ying Wei, Junzhou Huang, and Zhenhui Li. Hierarchically structured meta-learning. In International Conference on Machine Learning, pages 7045–7054, 2019.
  • Huaxiu Yao, Xian Wu, Zhiqiang Tao, Yaliang Li, Bolin Ding, Ruirui Li, and Zhenhui Li. Automated relational meta-learning. arXiv preprint arXiv:2001.00745, 2020.
  • Sung Whan Yoon, Jun Seo, and Jaekyun Moon. TapNet: Neural network augmented with task-adaptive projection for few-shot learning. In International Conference on Machine Learning, pages 7115–7123, 2019.
  • Friedemann Zenke, Ben Poole, and Surya Ganguli. Continual learning through synaptic intelligence. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 3987–3995. JMLR.org, 2017.