Training Stronger Baselines for Learning to Optimize

NeurIPS 2020


Abstract

Learning to optimize (L2O) has gained increasing attention since classical optimizers require laborious problem-specific design and hyperparameter tuning. However, there is a gap between the practical demand and the achievable performance of existing L2O models. Specifically, those learned optimizers are applicable to only a limited cla...

Introduction
  • Learning to optimize (L2O) [1,2,3,4,5,6,7,8,9,10], a rising sub-field of meta-learning, aims to replace manually designed analytical optimizers with learned optimizers, i.e., update rules as functions that can be fit from data.
  • An L2O method uses a model (typically an LSTM) to parameterize the target update rule.
  • The L2O model acts as an algorithm itself that can be applied to training other machine learning models, called optimizees, sampled from a specific class of similar problems.
  • The training of the L2O model is usually done in a meta-fashion, by enforcing it to decrease the loss values over optimizees sampled from the same class, via certain training techniques.
  • The LSTM is unrolled to mimic the behavior of an iterative optimizer and trained by minimizing the accumulated optimizee losses over the unrolled steps; a minimal sketch of this setup follows this list.
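Below is a minimal sketch of this setup, assuming a PyTorch implementation; the names LSTMOptimizer, meta_loss, and optimizee_loss_fn are illustrative assumptions, not the paper's code. It shows a coordinate-wise LSTM that receives the optimizee's gradients, emits parameter updates, and is unrolled so that the accumulated optimizee losses serve as the meta-training objective.

```python
import torch
import torch.nn as nn

class LSTMOptimizer(nn.Module):
    """Coordinate-wise learned optimizer: each parameter coordinate is treated
    as one row of the LSTM batch, in the spirit of [1]."""
    def __init__(self, hidden_size=20):
        super().__init__()
        self.lstm = nn.LSTMCell(1, hidden_size)
        self.head = nn.Linear(hidden_size, 1)  # hidden state -> scalar update

    def forward(self, grad, state):
        h, c = self.lstm(grad.view(-1, 1), state)
        update = self.head(h).view_as(grad)
        return update, (h, c)

def meta_loss(l2o, optimizee_loss_fn, theta, unroll=20):
    """Unroll the learned optimizer and accumulate the optimizee losses,
    which become the meta-training objective for the L2O model."""
    hs = l2o.lstm.hidden_size
    state = (torch.zeros(theta.numel(), hs), torch.zeros(theta.numel(), hs))
    total = 0.0
    for _ in range(unroll):
        loss = optimizee_loss_fn(theta)
        grad, = torch.autograd.grad(loss, theta, create_graph=True)
        update, state = l2o(grad, state)
        theta = theta + update          # apply the learned update rule
        total = total + loss
    return total

# Usage sketch: a toy quadratic optimizee; backpropagating through the
# unrolled steps trains the LSTM optimizer's parameters.
l2o = LSTMOptimizer()
theta = torch.randn(10, requires_grad=True)
meta_loss(l2o, lambda p: (p ** 2).sum(), theta).backward()
```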
Highlights
  • Learning to optimize (L2O) [1,2,3,4,5,6,7,8,9,10], a rising sub-field of meta-learning, aims to replace manually designed analytical optimizers with learned optimizers, i.e., update rules as functions that can be fit from data
  • Imitation learning (Section 4.1): multi-task regularization by analytical optimizers. We propose another L2O training method based on imitating the behaviours of analytical optimizers, in a multi-task learning form, which is found to further stabilize our training, prevent overfitting, and improve the trained L2O models’ generalization (a minimal sketch follows this list)
  • Our improved training techniques can be further plugged into previous state-of-the-art L2O methods and yield extra performance boosts for them all
  • We propose a set of improved training techniques to unleash the great potential of L2O models
  • The contributions made in this work are of practical nature; we hope them to lay a solid and fair evaluation ground by offering strong baselines for the L2O community
  • This paper proposes several improved training techniques to tackle the dilemma of training instability and poor generalization in learned optimizers
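The imitation-learning idea from the highlights can be sketched as follows, reusing the LSTMOptimizer sketch from the Introduction. The Adam "teacher", the squared-error matching term, and the weight lam are illustrative assumptions for a multi-task regularizer, not the paper's exact formulation.

```python
import torch

def adam_update(grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One step of analytical Adam, used here as the 'teacher' update."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)
    return -lr * m_hat / (v_hat.sqrt() + eps), m, v

def meta_loss_with_imitation(l2o, optimizee_loss_fn, theta, unroll=20, lam=0.1):
    """Multi-task objective: the usual accumulated optimizee loss, plus a term
    that pushes the learned updates toward those of an analytical optimizer."""
    hs = l2o.lstm.hidden_size
    state = (torch.zeros(theta.numel(), hs), torch.zeros(theta.numel(), hs))
    m, v = torch.zeros_like(theta), torch.zeros_like(theta)
    total = 0.0
    for t in range(1, unroll + 1):
        loss = optimizee_loss_fn(theta)
        grad, = torch.autograd.grad(loss, theta, create_graph=True)
        update, state = l2o(grad, state)
        teacher, m, v = adam_update(grad.detach(), m, v, t)
        imitation = ((update - teacher) ** 2).mean()   # imitate the analytical step
        theta = theta + update
        total = total + loss + lam * imitation
    return total
```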
Methods
  • Experiments and Analysis: the authors conduct systematic experiments to evaluate the proposed training techniques.
  • The authors' results come with multiple independent runs, and the error bars are reported in the Appendix
Results
  • Results and Analysis: the results are presented in Figure A2. The authors observe that the model trained by curriculum learning outperforms the two baselines (i.e., L2O-DM and L2O-DM-AUG) with fewer training iterations.
Conclusion
  • Learning to optimize (L2O) is a promising field of meta-learning that has so far been held back by unstable L2O training and the poor generalization of learned optimizers.
  • This work provides practical solutions to push this field forward.
  • The authors propose a set of improved training techniques to unleash the great potential of L2O models.
  • The contributions made in this work are of practical nature; the authors hope them to lay a solid and fair evaluation ground by offering strong baselines for the L2O community.
Related Work
  • Learning to Optimize: L2O uses a data-driven learned model as the optimizer, instead of handcrafted rules (e.g., SGD, RMSprop, and Adam). [1] was the first to leverage an LSTM as the coordinate-wise optimizer, which is fed with the optimizee gradients and outputs the optimizee parameter updates. [6] instead took the optimizee’s objective value history as the input state of a reinforcement learning agent, which outputs the updates as actions. To train an L2O with better generalization and longer horizons, [7] proposes random-scaling and convex-function-regularizer tricks. [8,23] introduce a hierarchical RNN to capture the relationships across the optimizee parameters and train it via meta-learning on an ensemble of small representative problems. Besides learning the full update rule, L2O has also been customized to automatic hyperparameter tuning in specific tasks [24,25,26].
  • Curriculum Learning: the idea [27] is to first focus on learning from a subset of simple training examples and gradually expand to include the remaining harder samples. Curriculum learning often yields faster convergence and better generalization, especially when the training set is varied or noisy. [28] unifies it with self-paced learning. [29] automates curriculum learning by employing a non-stationary multi-armed bandit algorithm rewarded by learning-progress indicators. [30,31,32,33] describe a number of applications where curriculum learning plays important roles. A minimal sketch of such a schedule, applied to L2O training, follows this list.
  • Imitation Learning: imitation learning [34,35], also known as "learning from demonstration", imitates an expert demonstration instead of learning from rewards as in reinforcement learning.
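One simple way to instantiate such a curriculum for L2O training is to start with short unrolled horizons and lengthen them as meta-training progresses. The schedule below is an illustrative assumption (start length, growth step, and epoch boundaries are made up for the example), not the paper's exact configuration.

```python
def curriculum_unroll_length(epoch, start=5, step=5, grow_every=10, max_len=100):
    """Return the unroll length to use at a given meta-training epoch:
    short (easy) horizons first, gradually extended to long (hard) ones."""
    return min(start + step * (epoch // grow_every), max_len)

# Example schedule: epochs 0-9 use 5-step unrolls, epochs 10-19 use 10, and so on.
print([curriculum_unroll_length(e) for e in (0, 10, 50, 200)])  # [5, 10, 30, 100]
```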
Study Subjects and Analysis
Representative optimizees: 5
L2O-DM-CL-IL denotes the enhanced L2O model. Learned optimizers are evaluated on five representative optimizees, and the corresponding optimizee training losses are collected in Figure 2. From the results in Figure 2, we observe that the previously non-competitive L2O-DM, which initially could not even converge stably on Optimizee i) at long horizons, now consistently and substantially outperforms all previous SOTA L2O methods (RNNprop, L2O-Scale, and L2O-Scale-Meta), driving the objective loss value much lower.

References
  • Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W Hoffman, David Pfau, Tom Schaul, Brendan Shillingford, and Nando De Freitas. Learning to learn by gradient descent by gradient descent. In Advances in Neural Information Processing Systems, 2016.
  • Samy Bengio, Yoshua Bengio, and Jocelyn Cloutier. On the search for new learning rules for ANNs. Neural Processing Letters, 2(4):26–30, 1995.
  • Yoshua Bengio, Samy Bengio, and Jocelyn Cloutier. Learning a synaptic learning rule. Université de Montréal, Département d’informatique et de recherche..., 1990.
  • A Steven Younger, Peter R Conwell, and Neil E Cotter. Fixed-weight on-line learning. IEEE Transactions on Neural Networks, 10(2):272–283, 1999.
  • Sepp Hochreiter, A Steven Younger, and Peter R Conwell. Learning to learn using gradient descent. In International Conference on Artificial Neural Networks, pages 87–94.
  • Yutian Chen, Matthew W Hoffman, Sergio Gómez Colmenarejo, Misha Denil, Timothy P Lillicrap, Matt Botvinick, and Nando de Freitas. Learning to learn without gradient descent by gradient descent. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 748–756. JMLR.org, 2017.
  • Kaifeng Lv, Shunhua Jiang, and Jian Li. Learning gradient descent: Better generalization and longer horizons. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 2247–2255. JMLR.org, 2017.
  • Olga Wichrowska, Niru Maheswaranathan, Matthew W Hoffman, Sergio Gomez Colmenarejo, Misha Denil, Nando de Freitas, and Jascha Sohl-Dickstein. Learned optimizers that scale and generalize. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 3751–3760. JMLR.org, 2017.
  • Yue Cao, Tianlong Chen, Zhangyang Wang, and Yang Shen. Learning to optimize in swarms. In Advances in Neural Information Processing Systems, pages 15018–15028, 2019.
  • Zhaohui Yang, Yunhe Wang, Kai Han, Chunjing Xu, Chao Xu, Dacheng Tao, and Chang Xu. Searching for low-bit weights in quantized neural networks. arXiv preprint arXiv:2009.08695, 2020.
  • David E Goldberg and John Henry Holland. Genetic algorithms and machine learning. 1988.
  • Samy Bengio, Yoshua Bengio, Jocelyn Cloutier, and Jan Gecsei. On the optimization of a synaptic learning rule. In Preprints Conf. Optimality in Artificial and Biological Neural Networks, pages 6–8. Univ. of Texas, 1992.
  • James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. Journal of machine learning research, 13(Feb):281–305, 2012.
  • Irwan Bello, Barret Zoph, Vijay Vasudevan, and Quoc V Le. Neural optimizer search with reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 459–468. JMLR.org, 2017.
  • Ke Li and Jitendra Malik. Learning to optimize. arXiv preprint arXiv:1606.01885, 2016.
  • Jasper Snoek, Hugo Larochelle, and Ryan P Adams. Practical bayesian optimization of machine learning algorithms. In Advances in neural information processing systems, pages 2951–2959, 2012.
  • Luke Metz, Niru Maheswaranathan, Jeremy Nixon, C Daniel Freeman, and Jascha Sohl-Dickstein. Understanding and correcting pathologies in the training of learned optimizers. arXiv preprint arXiv:1810.10180, 2018.
  • Corentin Tallec and Yann Ollivier. Unbiasing truncated backpropagation through time. arXiv preprint arXiv:1705.08209, 2017.
  • Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. On the difficulty of training recurrent neural networks. In International conference on machine learning, pages 1310–1318, 2013.
  • Paavo Parmas, Carl Edward Rasmussen, Jan Peters, and Kenji Doya. Pipps: Flexible model-based policy search robust to the curse of chaos. arXiv preprint arXiv:1902.01240, 2019.
  • Xinshi Chen, Yu Li, Ramzan Umarov, Xin Gao, and Le Song. RNA secondary structure prediction by learning unrolled algorithms. In International Conference on Learning Representations, 2019.
  • Xinshi Chen, Hanjun Dai, Yu Li, Xin Gao, and Le Song. Learning to stop while learning to predict. arXiv preprint arXiv:2006.05082, 2020.
  • Chaojian Li, Tianlong Chen, Haoran You, Zhangyang Wang, and Yingyan Lin. Halo: Hardware-aware learning to optimize. In Proceedings of the European Conference on Computer Vision (ECCV), September 2020.
  • Yuning You, Tianlong Chen, Zhangyang Wang, and Yang Shen. L2-gcn: Layer-wise and learned efficient training of graph convolutional networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2127–2135, 2020.
  • Wuyang Chen, Zhiding Yu, Zhangyang Wang, and Anima Anandkumar. Automated synthetic-to-real generalization. International Conference on Machine Learning (ICML), 2020.
  • Xuxi Chen, Wuyang Chen, Tianlong Chen, Ye Yuan, Chen Gong, Kewei Chen, and Zhangyang Wang. Self-pu: Self boosted and calibrated positive-unlabeled training. International Conference on Machine Learning (ICML), 2020.
  • Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In Proceedings of the 26th annual international conference on machine learning, pages 41–48, 2009.
  • Lu Jiang, Deyu Meng, Qian Zhao, Shiguang Shan, and Alexander G Hauptmann. Self-paced curriculum learning. In Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
  • Alex Graves, Marc G Bellemare, Jacob Menick, Remi Munos, and Koray Kavukcuoglu. Automated curriculum learning for neural networks. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1311–1320. JMLR.org, 2017.
  • Wojciech Zaremba and Ilya Sutskever. Learning to execute. arXiv preprint arXiv:1410.4615, 2014.
  • Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. Scheduled sampling for sequence prediction with recurrent neural networks. In Advances in Neural Information Processing Systems, pages 1171–1179, 2015.
  • M Pawan Kumar, Benjamin Packer, and Daphne Koller. Self-paced learning for latent variable models. In Advances in Neural Information Processing Systems, pages 1189–1197, 2010.
  • Carlos Florensa, David Held, Markus Wulfmeier, Michael Zhang, and Pieter Abbeel. Reverse curriculum generation for reinforcement learning. arXiv preprint arXiv:1707.05300, 2017.
  • Stefan Schaal. Is imitation learning the route to humanoid robots? Trends in cognitive sciences, 3(6):233–242, 1999.
  • Stefan Schaal, Auke Ijspeert, and Aude Billard. Computational approaches to motor learning by imitation. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 358(1431):537–547, 2003.
  • Michael Zhang, James Lucas, Jimmy Ba, and Geoffrey E Hinton. Lookahead optimizer: k steps forward, 1 step back. In Advances in Neural Information Processing Systems, pages 9593–9604, 2019.
  • Long-Ji Lin. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine learning, 8(3-4):293–321, 1992.
  • Binghong Chen, Bo Dai, Qinjie Lin, Guo Ye, Han Liu, and Le Song. Learning to plan in high dimensions via neural exploration-exploitation trees. In International Conference on Learning Representations, 2020.
  • Melanie Coggan. Exploration and exploitation in reinforcement learning. Research supervised by Prof. Doina Precup, CRA-W DMP Project at McGill University, 2004.
  • Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • Hamid Reza Maei, Csaba Szepesvári, Shalabh Bhatnagar, and Richard S Sutton. Toward off-policy learning control with function approximation. In ICML, 2010.
  • Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  • Xuanyi Dong and Yi Yang. NAS-Bench-201: Extending the scope of reproducible neural architecture search. In International Conference on Learning Representations, 2020.