
Modeling and Optimization Trade-off in Meta-learning

NeurIPS 2020


Abstract

By searching for shared inductive biases across tasks, meta-learning promises to accelerate learning on novel tasks, but with the cost of solving a complex bilevel optimization problem. We introduce and rigorously define the trade-off between accurate modeling and optimization ease in meta-learning. At one end, classic meta-learning algorithms…

Introduction
  • The major bottleneck of applying machine learning to many practical problems is the cost associated with data and/or labeling.
  • While the cost of labeling and data makes supervised learning problems expensive, the high sample complexity of reinforcement learning makes it downright inapplicable for many practical settings.
  • Meta-learning is designed to ease the sample complexity of these methods.
  • It has seen success on a wide range of problems, including image recognition and reinforcement learning [14].
  • Under the PAC framework, Baxter [2] shows that, given sufficiently many tasks and data per task during meta-training, there are guarantees on the generalization of learned biases to novel tasks.
Highlights
  • The major bottleneck of applying machine learning to many practical problems is the cost associated with data and/or labeling
  • Meta-learning, or ‘learning to learn’ [24], makes the observation that if the learner has access to a collection of tasks sampled from a distribution p(γ), it can utilize an offline meta-training stage to search for shared inductive biases that assist in learning future tasks from p(γ)
  • In other words, our result shows that domain randomized search (DRS), which ignores the meta-learning problem structure as discussed in Section 1, provably solves the problem of meta-learning the initialization of an iterative optimization problem under sensible assumptions.
  • This paper introduces an important trade-off in meta-learning: that between accurately modeling the meta-learning problem and the complexity of the resulting optimization problem.
  • Classic meta-learning algorithms account for the structure of the problem space but define complex optimization objectives.
  • Through an analysis of the sample complexity for smooth nonconvex risk functions, we show that DRS and MAML both solve the meta-learning problem and delineate the roles of optimization complexity and modeling accuracy; a minimal sketch contrasting the two objectives follows this list.
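To make the contrast concrete, here is a minimal sketch, not the authors' code, of the two objectives on synthetic one-dimensional linear regression tasks: DRS (joint training) minimizes the average task loss directly, a single-level problem, while MAML minimizes the average loss after one inner gradient step, a bilevel problem. The task distribution, the inner step size alpha, and the finite-difference outer gradient are illustrative assumptions.

```python
# Minimal sketch contrasting DRS (joint training) and MAML objectives on
# synthetic 1-D linear regression tasks; all settings are illustrative.
import numpy as np

rng = np.random.default_rng(0)
num_tasks, n_per_task, alpha = 50, 10, 0.1

# Each task: y = w_true * x + noise, with w_true drawn from the task distribution.
tasks = []
for _ in range(num_tasks):
    w_true = rng.normal(loc=2.0, scale=1.0)
    x = rng.normal(size=n_per_task)
    y = w_true * x + 0.1 * rng.normal(size=n_per_task)
    tasks.append((x, y))

def task_loss(w, x, y):
    return np.mean((w * x - y) ** 2)

def task_loss_grad(w, x, y):
    # d/dw of the squared-error risk for one task.
    return 2.0 * np.mean((w * x - y) * x)

# DRS / joint training: single-level problem, average task loss.
def drs_grad(w):
    return np.mean([task_loss_grad(w, x, y) for x, y in tasks])

# MAML: bilevel problem, average loss *after* one inner gradient step.
def maml_grad(w, eps=1e-5):
    def outer(w0):
        adapted = [w0 - alpha * task_loss_grad(w0, x, y) for x, y in tasks]
        return np.mean([task_loss(wi, x, y) for wi, (x, y) in zip(adapted, tasks)])
    # Finite differences keep the sketch free of second-order autodiff machinery.
    return (outer(w + eps) - outer(w - eps)) / (2.0 * eps)

w_drs, w_maml, lr = 0.0, 0.0, 0.05
for _ in range(500):
    w_drs -= lr * drs_grad(w_drs)
    w_maml -= lr * maml_grad(w_maml)

print(f"DRS initialization:  {w_drs:.3f}")   # roughly the mean of w_true
print(f"MAML initialization: {w_maml:.3f}")  # accounts for the inner adaptation step
```

The finite-difference outer gradient hides the second-order terms that the true MAML objective carries; those terms are exactly where the additional optimization complexity discussed above comes from.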
Results
  • The authors carried out simulations to empirically study the trade-off in the linear regression case.
  • Figure 3 shows contour plots of the fraction of the datasets for which the MAML estimate has lower expected loss before meta-testing optimization and after, for several values of α.
  • For the first two environments with variations in system dynamics only, seen in Figures 4(a,b) and 5(a,b), DRS is superior to MAML throughout training.
  • For the four environments with variations in reward functions only, either DRS and MAML are comparable (Figures 4(d) and 5(e,f)), or MAML eventually overtakes DRS in the remaining panels.
  • In the final two environments, with variations in both system dynamics and reward functions, the standard errors are generally too large to make a definite statement (see Figures 4(g,h) and 5(h)).
Conclusion
  • This paper introduces an important trade-off in meta-learning: that between accurately modeling the meta-learning problem and the complexity of the resulting optimization problem.
  • Classic meta-learning algorithms account for the structure of the problem space but define complex optimization objectives.
  • Through an analysis of the sample complexity for smooth nonconvex risk functions, the authors show that DRS and MAML both solve the meta-learning problem and delineate the roles of optimization complexity and modeling accuracy.
  • All three studies show that the balance of the trade-off is determined not only by the sample sizes but also by characteristics of the meta-learning problem, such as the smoothness of the task risk functions.
Tables
  • Table 1: Learning rates (LR), step sizes, and inner learning rates chosen by grid search.
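As a concrete illustration of how such a search might be set up, here is a hypothetical grid-search sketch; the candidate grids and the scoring stub are assumptions for illustration, not the search space behind Table 1.

```python
# Hypothetical grid search over the hyperparameter families named in Table 1;
# grids and the scoring stub are illustrative, not the paper's actual values.
import itertools

learning_rates = [1e-3, 1e-2, 1e-1]
step_sizes = [0.01, 0.1]
inner_learning_rates = [1e-2, 1e-1]

def validation_return(lr, step_size, inner_lr):
    # Stub standing in for a full meta-training run plus validation;
    # it simply prefers one fixed configuration so the sketch runs end to end.
    return -(abs(lr - 1e-2) + abs(step_size - 0.1) + abs(inner_lr - 0.1))

best = max(
    itertools.product(learning_rates, step_sizes, inner_learning_rates),
    key=lambda cfg: validation_return(*cfg),
)
print("selected (LR, step size, inner LR):", best)
```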
Related work
  • Recent work on few-shot image classification has shown that features from learning a deep network classifier on a large training set, combined with a simple classifier at meta-testing, may outperform many meta-learning algorithms [32, 4, 25]; a similar observation has been made for few-shot object detection [31]. Packer et al. [17] show that DRS outperforms RL2 [5] on simple reinforcement learning environments where tasks correspond to different system dynamics. Our meta-RL experiments complement these works, and our theoretical studies partially explain them. We argue that there is a larger picture to be considered; the trade-off between modeling accuracy and optimization ease depends on characteristics of the dataset, model, and optimization, and should be studied on a case-by-case basis.

    Previous theoretical studies of MAML have primarily focused on the meta-training stage, e.g., by Finn et al.
Study subjects and analysis (3 studies)
For meta-linear regression, we prove theoretically and verify in simulations that while MAML can utilize the geometry of the distribution of task losses to improve performance through meta-testing optimization, this modeling gain can be counterbalanced by its greater optimization error for small sample sizes. All three studies show that the balance of the trade-off is determined not only by the sample sizes but also by characteristics of the meta-learning problem, such as the smoothness of the task risk functions. There are several interesting directions for future work. A minimal sketch of the before-and-after meta-testing comparison follows.
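The following is a hedged sketch of the meta-testing comparison described above: starting from a learned initialization w0, adapt with a few gradient steps on a small sample from a novel task, and compare the expected loss before and after adaptation, in the spirit of the fraction reported in Figure 3. The task distribution, sample sizes, and the fixed initialization are illustrative assumptions, not the paper's experimental settings.

```python
# Hedged sketch of the meta-testing protocol on synthetic 1-D linear regression;
# all settings below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)

def meta_test(w0, alpha=0.1, inner_steps=1, n_support=5, n_query=1000):
    """Return (risk before adaptation, risk after adaptation) on a novel task."""
    w_true = rng.normal(loc=2.0, scale=1.0)       # novel task from the distribution
    xs = rng.normal(size=n_support)               # small adaptation sample
    ys = w_true * xs + 0.1 * rng.normal(size=n_support)

    w = w0
    for _ in range(inner_steps):
        w -= alpha * 2.0 * np.mean((w * xs - ys) * xs)  # gradient step on support

    xq = rng.normal(size=n_query)                 # large sample approximates the risk
    yq = w_true * xq + 0.1 * rng.normal(size=n_query)
    return np.mean((w0 * xq - yq) ** 2), np.mean((w * xq - yq) ** 2)

# Fraction of novel tasks on which one adaptation step lowers the expected loss.
results = [meta_test(w0=2.0) for _ in range(200)]
print(np.mean([after < before for before, after in results]))
```

With only a handful of support points, the adaptation step is itself noisy, which is the optimization-error side of the trade-off; the fraction above captures how often the modeling gain of adapting wins out.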

References
  • [1] Dario Amodei and Danny Hernandez. AI and compute, 2018. URL https://openai.com/blog/ai-and-compute.
  • [2] Jonathan Baxter. A model of inductive bias learning. Journal of Artificial Intelligence Research, 12:149–198, 2000.
  • [3] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016.
  • [4] Yinbo Chen, Xiaolong Wang, Zhuang Liu, Huijuan Xu, and Trevor Darrell. A new meta-baseline for few-shot learning. arXiv preprint arXiv:2003.04390, 2020.
  • [5] Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, and Pieter Abbeel. RL2: Fast reinforcement learning via slow reinforcement learning. arXiv preprint arXiv:1611.02779, 2016.
  • [6] Alireza Fallah, Aryan Mokhtari, and Asuman Ozdaglar. On the convergence theory of gradient-based model-agnostic meta-learning algorithms. arXiv preprint arXiv:1908.10400, 2019.
  • [7] Alireza Fallah, Aryan Mokhtari, and Asuman Ozdaglar. Provably convergent policy gradient methods for model-agnostic meta-reinforcement learning. arXiv preprint arXiv:2002.05135, 2020.
  • [8] Chelsea Finn and Sergey Levine. Meta-learning and universality: Deep representations and gradient descent can approximate any learning algorithm. arXiv preprint arXiv:1710.11622, 2017.
  • [9] Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, pages 1126–1135. JMLR, 2017.
  • [10] Chelsea Finn, Aravind Rajeswaran, Sham Kakade, and Sergey Levine. Online meta-learning. arXiv preprint arXiv:1902.08438, 2019.
  • [11] Luca Franceschi, Paolo Frasconi, Saverio Salzo, Riccardo Grazzi, and Massimiliano Pontil. Bilevel programming for hyperparameter optimization and meta-learning. arXiv preprint arXiv:1806.04910, 2018.
  • [12] Saeed Ghadimi and Guanghui Lan. Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization, 23(4):2341–2368, 2013.
  • [13] Riccardo Grazzi, Luca Franceschi, Massimiliano Pontil, and Saverio Salzo. On the iteration complexity of hypergradient computation. arXiv preprint arXiv:2006.16218, 2020.
  • [14] Timothy Hospedales, Antreas Antoniou, Paul Micaelli, and Amos Storkey. Meta-learning in neural networks: A survey. arXiv preprint arXiv:2004.05439, 2020.
  • [15] Patrice Marcotte and Gilles Savard. Bilevel programming: A combinatorial perspective. In Graph Theory and Combinatorial Optimization, pages 191–217.
  • [16] Alex Nichol, Joshua Achiam, and John Schulman. On first-order meta-learning algorithms. arXiv preprint arXiv:1803.02999, 2018.
  • [17] Charles Packer, Katelyn Gao, Jernej Kos, Philipp Krähenbühl, Vladlen Koltun, and Dawn Song. Assessing generalization in deep reinforcement learning. arXiv preprint arXiv:1810.12282, 2018.
  • [18] Roger Penrose. On best approximate solutions of linear matrix equations. Mathematical Proceedings of the Cambridge Philosophical Society, 52(1):17–19, 1956.
  • [19] Aniruddh Raghu, Maithra Raghu, Samy Bengio, and Oriol Vinyals. Rapid learning or feature reuse? Towards understanding the effectiveness of MAML. arXiv preprint arXiv:1909.09157, 2019.
  • [20] Aravind Rajeswaran, Chelsea Finn, Sham M. Kakade, and Sergey Levine. Meta-learning with implicit gradients. In Advances in Neural Information Processing Systems, pages 113–124, 2019.
  • [21] Jonas Rothfuss, Dennis Lee, Ignasi Clavera, Tamim Asfour, and Pieter Abbeel. ProMP: Proximal meta-policy search. arXiv preprint arXiv:1810.06784, 2018.
  • [22] John Schulman, Sergey Levine, Pieter Abbeel, Michael I. Jordan, and Philipp Moritz. Trust region policy optimization. In International Conference on Machine Learning (ICML), pages 1889–1897, 2015.
  • [23] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
  • [24] Sebastian Thrun and Lorien Pratt. Learning to learn: Introduction and overview. In Learning to Learn, pages 3–17.
  • [25] Yonglong Tian, Yue Wang, Dilip Krishnan, Joshua B. Tenenbaum, and Phillip Isola. Rethinking few-shot image classification: A good embedding is all you need? arXiv preprint arXiv:2003.11539, 2020.
  • [26] Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, and Pieter Abbeel. Domain randomization for transferring deep neural networks from simulation to the real world. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 23–30. IEEE, 2017.
  • [27] Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. In Intelligent Robots and Systems (IROS), 2012.
  • [28] Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science, volume 47. Cambridge University Press, 2018.
  • [29] Haoxiang Wang, Ruoyu Sun, and Bo Li. Global convergence and induced kernels of gradient-based meta-learning with neural nets. arXiv preprint arXiv:2006.14606, 2020.
  • [30] Lingxiao Wang, Qi Cai, Zhuoran Yang, and Zhaoran Wang. On the global optimality of model-agnostic meta-learning. arXiv preprint arXiv:2006.13182, 2020.
  • [31] Xin Wang, Thomas E. Huang, Trevor Darrell, Joseph E. Gonzalez, and Fisher Yu. Frustratingly simple few-shot object detection. arXiv preprint arXiv:2003.06957, 2020.
  • [32] Yan Wang, Wei-Lun Chao, Kilian Q. Weinberger, and Laurens van der Maaten. SimpleShot: Revisiting nearest-neighbor classification for few-shot learning. arXiv preprint arXiv:1911.04623, 2019.
  • [33] Tianhe Yu, Deirdre Quillen, Zhanpeng He, Ryan Julian, Karol Hausman, Chelsea Finn, and Sergey Levine. Meta-World: A benchmark and evaluation for multi-task and meta reinforcement learning. In Conference on Robot Learning (CoRL), 2019.

Note on the meta-RL experimental setup: Each iteration of ProMP (TRPO-MAML) requires twice as many steps from the simulator as DRS+PPO (DRS+TRPO). Therefore, to ensure that each algorithm utilizes the same amount of data, we run ProMP (TRPO-MAML) for half as many iterations as DRS+PPO (DRS+TRPO). More specifically, for the robotic locomotion environments, we run ProMP (TRPO-MAML) for 1000 iterations and DRS+PPO (DRS+TRPO) for 2000. For the manipulation environments, we run ProMP (TRPO-MAML) for 10000 iterations and DRS+PPO (DRS+TRPO) for 20000. These go beyond the number of training steps used in Rothfuss et al. [21] and Yu et al. [33].
Authors
Katelyn Gao
Ozan Sener