Bayesian Optimization for Iterative Learning

NeurIPS 2020.


Abstract:

The success of deep (reinforcement) learning systems crucially depends on the correct choice of hyperparameters, which are notoriously sensitive and expensive to evaluate. Training these systems typically requires running iterative processes over multiple epochs or episodes. Traditional approaches only consider the final performance of a hyperparameter…
Introduction
  • Deep learning (DL) and deep reinforcement learning (DRL) have led to impressive breakthroughs in a broad range of applications such as game play (Mnih et al., 2013; Silver et al., 2016), motor control (Todorov et al., 2012), and image recognition (Krizhevsky et al., 2012).
  • The flexibility with which these models adapt to the task at hand comes at the price of an additional set of hyperparameters to tune; in DL and DRL, poor settings lead to drastic performance losses or divergence (Sprague, 2015; Smith, 2018; Henderson et al., 2018).
  • Tuning DRL parameters is further complicated because only noisy evaluations of an agent's final performance are obtainable.
Highlights
  • Deep learning (DL) and deep reinforcement learning (DRL) have led to impressive breakthroughs in a broad range of applications such as game play (Mnih et al., 2013; Silver et al., 2016), motor control (Todorov et al., 2012), and image recognition (Krizhevsky et al., 2012).
  • This flexibility comes at the price of having to tune an additional set of hyperparameters; in DL and DRL, poor settings lead to drastic performance losses or divergence (Sprague, 2015; Smith, 2018; Henderson et al., 2018).
  • We present a Bayesian optimization approach for tuning systems that learn iteratively, as is the case for deep learning and deep reinforcement learning.
  • We demonstrate our proposed model by tuning hyperparameters for several deep reinforcement learning agents and convolutional neural networks (CNNs).
  • Our framework complements the existing Bayesian optimization (BO) toolbox for hyperparameter tuning with iterative learning.
  • We present a way of leveraging our understanding that later stages of the training process are informed by progress made in earlier ones. This results in a more iteration-efficient hyperparameter tuning algorithm that is applicable to a broad range of machine learning systems.
Methods
  • The authors demonstrate the proposed model by tuning hyperparameters for several deep reinforcement learning agents and convolutional neural networks (CNNs).
  • The authors use a greedy approximation, sequentially selecting points to build a batch of M points (Kathuria et al., 2016).
  • Under this relaxation, the M-DPP reduces to M sequential draws from 1-DPPs, in each of which the determinant computation in Eq. (6) collapses to a scalar equal to the GP's predictive variance in Eq. (2); see the sketch after this list.
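As a rough illustration of this greedy batch construction (a minimal sketch, not the authors' implementation), the code below picks each of the M points at the location of maximum GP posterior variance, conditioning on the previously selected points with hallucinated targets; since the predictive variance does not depend on target values, each step plays the role of a sequential 1-DPP draw. The scikit-learn surrogate, RBF kernel, and random candidate grid are placeholder choices.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF


def greedy_variance_batch(X_obs, y_obs, candidates, batch_size=3):
    """Greedily build a batch of `batch_size` points for parallel evaluation.

    Each point is taken where the GP posterior variance is largest, after
    conditioning on previously selected points with hallucinated targets
    (predictive variance does not depend on target values). This mirrors the
    relaxation of an M-DPP into M sequential 1-DPP draws, where the
    determinant term collapses to the GP predictive variance.
    """
    # Fit the surrogate once on the real observations (kernel is a placeholder).
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-6)
    gp.fit(X_obs, y_obs)

    X_aug, y_aug = X_obs.copy(), y_obs.copy()
    batch = []
    for _ in range(batch_size):
        # Re-condition on observed + hallucinated points with the kernel fixed,
        # so only the predictive variance changes between iterations.
        gp_step = GaussianProcessRegressor(kernel=gp.kernel_, optimizer=None, alpha=1e-6)
        gp_step.fit(X_aug, y_aug)
        _, std = gp_step.predict(candidates, return_std=True)
        x_new = candidates[int(np.argmax(std))]
        batch.append(x_new)
        # Hallucinate the target at the chosen point as the posterior mean.
        mu_new = gp_step.predict(x_new.reshape(1, -1))
        X_aug = np.vstack([X_aug, x_new])
        y_aug = np.concatenate([y_aug, mu_new])
    return np.array(batch)


# Toy usage: five observed 2-D settings, a random candidate grid, batch of three.
rng = np.random.default_rng(0)
X_obs = rng.uniform(size=(5, 2))
y_obs = np.sin(3 * X_obs[:, 0]) + X_obs[:, 1]
candidates = rng.uniform(size=(200, 2))
print(greedy_variance_batch(X_obs, y_obs, candidates, batch_size=3))
```

Conditioning on hallucinated targets is what lets a batch be assembled before any of its points has actually been trained.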
Conclusion
  • Conclusion and Future work

    The authors' framework complements the existing BO toolbox for hyperparameter tuning with iterative learning.
  • The authors present a way of leveraging the understanding that later stages of the training process are informed by progress made in earlier ones.
  • This results in a more iteration-efficient hyperparameter tuning algorithm that is applicable to a broad range of machine learning systems.
  • The results demonstrate that the model can surpass the performance of well-established alternatives while consuming significantly fewer resources.
  • The authors would like to note that the approach is not necessarily specific to machine learning algorithms but applies more generally to any process exhibiting an iterative structure that can be exploited.
Summary
  • Introduction:

    Deep learning (DL) and deep reinforcement learning (DRL) have led to impressive breakthroughs in a broad range of applications such as game play (Mnih et al., 2013; Silver et al., 2016), motor control (Todorov et al., 2012), and image recognition (Krizhevsky et al., 2012).
  • The flexibility with which these models adapt to the task at hand comes at the price of an additional set of hyperparameters to tune; in DL and DRL, poor settings lead to drastic performance losses or divergence (Sprague, 2015; Smith, 2018; Henderson et al., 2018).
  • Tuning DRL parameters is further complicated because only noisy evaluations of an agent's final performance are obtainable.
  • Objectives:

    The authors aim to optimize $x^* = \arg\max_{x \in \mathcal{X}} f(x, t)$ while at the same time keeping the overall training time $\sum_{i=1}^{N} c(x_i, t_i)$ of the evaluated settings $[x_i, t_i]$ as low as possible (an illustrative cost-aware selection rule is sketched after this summary).
  • Methods:

    The authors demonstrate the proposed model by tuning hyperparameters for several deep reinforcement learning agents and convolutional neural networks (CNNs).
  • The authors use a greedy approximation, sequentially selecting points to build a batch of M points (Kathuria et al., 2016).
  • Under this relaxation, the M-DPP reduces to M sequential draws from 1-DPPs, in each of which the determinant computation in Eq. (6) collapses to a scalar equal to the GP's predictive variance in Eq. (2).
  • Conclusion:

    Conclusion and Future work

    The authors' framework complements the existing BO toolbox for hyperparameter tuning with iterative learning.
  • The authors present a way of leveraging the understanding that later stages of the training process are informed by progress made in earlier ones.
  • This results in a more iteration-efficient hyperparameter tuning algorithm that is applicable to a broad range of machine learning systems.
  • The results demonstrate that the model can surpass the performance of well-established alternatives while consuming significantly fewer resources.
  • The authors would like to note that the approach is not necessarily specific to machine learning algorithms but applies more generally to any process exhibiting an iterative structure that can be exploited.
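To make the trade-off in the objective stated under Objectives concrete, the sketch below scores candidate (hyperparameter, training-iteration) pairs by expected improvement per unit of predicted training cost. This is a minimal illustration of cost-aware selection, not the acquisition function proposed in the paper; the helper names and the toy cost model are assumptions made for the example.

```python
import numpy as np
from scipy.stats import norm


def expected_improvement(mu, sigma, best_y):
    """Standard expected improvement (for maximization) from per-candidate
    GP posterior means and standard deviations."""
    sigma = np.maximum(sigma, 1e-12)
    z = (mu - best_y) / sigma
    return (mu - best_y) * norm.cdf(z) + sigma * norm.pdf(z)


def select_next(candidates, mu, sigma, cost, best_y):
    """Pick the candidate [x, t] that maximizes improvement per unit of
    predicted training cost c(x, t)."""
    scores = expected_improvement(mu, sigma, best_y) / np.maximum(cost, 1e-12)
    return candidates[int(np.argmax(scores))]


# Toy usage with made-up posterior statistics and a cost that grows with t.
rng = np.random.default_rng(1)
candidates = rng.uniform(size=(100, 3))      # each row: [x1, x2, t]
mu = rng.normal(size=100)                    # posterior mean of f(x, t)
sigma = rng.uniform(0.1, 1.0, size=100)      # posterior std of f(x, t)
cost = 1.0 + 10.0 * candidates[:, 2]         # predicted c(x, t)
print(select_next(candidates, mu, sigma, cost, best_y=mu.max()))
```

Dividing the acquisition value by a cost estimate is one common heuristic for spending a limited training budget on settings that are both promising and cheap to evaluate.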
Tables
  • Table 1: Dueling DQN algorithm on CartPole problem
  • Table 2: A2C algorithm on Reacher problem
  • Table 3: A2C algorithm on InvertedPendulum problem
  • Table 4: Convolutional Neural Network on SVHN dataset
  • Table 5: Further specification for DRL agents
Related work
  • In the following, we review a number of approaches that aim to minimize the number of training iterations needed to identify the optimal hyperparameters.

    The first category employs stopping criteria to terminate some training runs early and instead allocate resources towards more promising settings. These criteria typically revolve around projecting a final score from earlier training stages. Freeze-thaw BO (Swersky et al., 2014) models the training loss over time using a GP regressor under the assumption that the training loss roughly follows an exponential decay; based on this projection, training resources are allocated to the most promising settings. Hyperband (Li and Jamieson, 2018; Falkner et al., 2018) dynamically allocates computational resources (e.g., training epochs or dataset size) through random sampling and eliminates under-performing hyperparameter settings by successive halving (a generic sketch of this step follows). In addition, attempts have been made to improve the epoch efficiency of other hyperparameter optimization algorithms, including (Baker et al., 2017; Domhan et al., 2015; Klein et al., 2017b; Dai et al., 2019), which predict the final learning outcome from partially trained learning curves in order to identify hyperparameter settings that are predicted to under-perform and early-stop their training. In the context of DRL, however, these stopping criteria may not be applicable due to the unpredictable fluctuations of DRL reward curves. In the supplement, we illustrate the noisiness of DRL training.
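The successive-halving step underlying Hyperband can be sketched as follows. This is a generic illustration of the resource-allocation idea, not the interface of any particular package; `train_for` stands in for a hypothetical user-supplied routine that trains a configuration for a given number of epochs and returns a validation score.

```python
import random


def successive_halving(configs, train_for, min_epochs=1, eta=3):
    """Train all surviving configurations for a growing epoch budget and keep
    only the top 1/eta at each rung, as in Hyperband's inner loop."""
    epochs = min_epochs
    survivors = list(configs)
    while len(survivors) > 1:
        scores = {c: train_for(c, epochs) for c in survivors}
        survivors.sort(key=lambda c: scores[c], reverse=True)   # best first
        survivors = survivors[: max(1, len(survivors) // eta)]  # keep top 1/eta
        epochs *= eta                                           # grow the budget
    return survivors[0]


# Toy usage: configurations are learning rates; the fake trainer's validation
# score peaks near lr = 1e-2 and improves slightly with more epochs.
def fake_train(lr, epochs):
    return -abs(lr - 1e-2) + 0.001 * epochs + random.gauss(0, 1e-3)


lrs = [10 ** random.uniform(-4, -1) for _ in range(27)]
print(successive_halving(lrs, fake_train, min_epochs=1, eta=3))
```

With eta = 3, each rung spends roughly the same total budget: many configurations are trained briefly, and only the survivors receive longer runs.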
Reference
  • Bowen Baker, Otkrist Gupta, Ramesh Raskar, and Nikhil Naik. Accelerating neural architecture search using performance prediction. arXiv preprint arXiv:1705.10823, 2017.
  • Eric Brochu, Vlad M Cora, and Nando De Freitas. A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599, 2010.
  • Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. Openai gym. arXiv preprint arXiv:1606.01540, 2016.
  • Yutian Chen, Aja Huang, Ziyu Wang, Ioannis Antonoglou, Julian Schrittwieser, David Silver, and Nando de Freitas. Bayesian optimization in alphago. arXiv preprint arXiv:1812.06855, 2018.
  • Zhongxiang Dai, Haibin Yu, Bryan Kian Hsiang Low, and Patrick Jaillet. Bayesian optimization meets bayesian optimal stopping. In International Conference on Machine Learning, pages 1496–1506, 2019.
  • Prafulla Dhariwal, Christopher Hesse, Oleg Klimov, Alex Nichol, Matthias Plappert, Alec Radford, John Schulman, Szymon Sidor, Yuhuai Wu, and Peter Zhokhov. Openai baselines. GitHub, GitHub repository, 2017.
  • Tobias Domhan, Jost Tobias Springenberg, and Frank Hutter. Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves. In Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.
  • Stefan Falkner, Aaron Klein, and Frank Hutter. Bohb: Robust and efficient hyperparameter optimization at scale. In International Conference on Machine Learning, pages 1436–1445, 2018.
  • Peter I Frazier. A tutorial on bayesian optimization. arXiv preprint arXiv:1807.02811, 2018.
  • Yarin Gal, Riashat Islam, and Zoubin Ghahramani. Deep bayesian active learning with image data. In Proceedings of the 34th International Conference on Machine Learning, pages 1183–1192. JMLR.org, 2017.
  • Peter Henderson, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, and David Meger. Deep reinforcement learning that matters. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
  • Philipp Hennig and Christian J Schuler. Entropy search for information-efficient global optimization. Journal of Machine Learning Research, 13:1809–1837, 2012.
  • José Miguel Hernández-Lobato, Matthew W Hoffman, and Zoubin Ghahramani. Predictive entropy search for efficient global optimization of black-box functions. In Advances in Neural Information Processing Systems, pages 918–926, 2014.
  • Geoffrey Hinton, Nitish Srivastava, and Kevin Swersky. Overview of mini-batch gradient descent. Neural Networks for Machine Learning, 575, 2012.
  • M. I. Jordan and T. M. Mitchell. Machine learning: Trends, perspectives, and prospects. Science, 349(6245):255–260, 2015. doi: 10.1126/science.aaa8415.
  • Kirthevasan Kandasamy, Gautam Dasarathy, Jeff Schneider, and Barnabás Póczos. Multi-fidelity bayesian optimisation with continuous approximations. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1799–1808. JMLR.org, 2017.
  • Tarun Kathuria, Amit Deshpande, and Pushmeet Kohli. Batched gaussian process bandit optimization via determinantal point processes. In Advances in Neural Information Processing Systems, pages 4206–4214, 2016.
  • Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • Aaron Klein, Stefan Falkner, Simon Bartels, Philipp Hennig, and Frank Hutter. Fast bayesian optimization of machine learning hyperparameters on large datasets. In Artificial Intelligence and Statistics, pages 528–536, 2017a.
  • Aaron Klein, Stefan Falkner, Jost Tobias Springenberg, and Frank Hutter. Learning curve prediction with bayesian neural networks. International Conference on Learning Representations (ICLR), 2017b.
  • Andreas Krause and Cheng S Ong. Contextual gaussian process bandit optimization. In Advances in Neural Information Processing Systems, pages 2447–2455, 2011.
  • Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
  • Alex Kulesza, Ben Taskar, et al. Determinantal point processes for machine learning. Foundations and Trends in Machine Learning, 5(2–3):123–286, 2012.
  • Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  • Benjamin Letham, Brian Karrer, Guilherme Ottoni, Eytan Bakshy, et al. Constrained bayesian optimization with noisy experiments. Bayesian Analysis, 14(2):495–519, 2019.
  • Lisha Li and Kevin Jamieson. Hyperband: A novel bandit-based approach to hyperparameter optimization. Journal of Machine Learning Research, 18:1–52, 2018.
  • Mark McLeod, Stephen Roberts, and Michael A Osborne. Optimization, fast and slow: Optimally switching between local and bayesian optimization. In International Conference on Machine Learning, pages 3440–3449, 2018.
  • Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
  • Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International conference on machine learning, pages 1928–1937, 2016.
  • Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
  • Favour M Nyikosa, Michael A Osborne, and Stephen J Roberts. Bayesian optimization for dynamic problems. arXiv preprint arXiv:1803.03432, 2018.
  • Michael Osborne, Roman Garnett, Zoubin Ghahramani, David K Duvenaud, Stephen J Roberts, and Carl E Rasmussen. Active learning of model evidence using bayesian quadrature. In Advances in neural information processing systems, pages 46–54, 2012.
  • Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian processes for machine learning. MIT Press, 2006.
  • Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver. Prioritized experience replay. International Conference on Learning Representations, 2016.
  • Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P Adams, and Nando de Freitas. Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1):148–175, 2016.
  • David Silver, Aja Huang, Chris J. Maddison, et al. Mastering the game of go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
  • Leslie N Smith. A disciplined approach to neural network hyper-parameters: Part 1–learning rate, batch size, momentum, and weight decay. arXiv preprint arXiv:1803.09820, 2018.
  • Jasper Snoek, Hugo Larochelle, and Ryan P Adams. Practical Bayesian optimization of machine learning algorithms. In Advances in neural information processing systems, pages 2951–2959, 2012.
  • Nathan Sprague. Parameter selection for the deep Q-learning algorithm. In Proceedings of the Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM), page 24, 2015.
  • Kevin Swersky, Jasper Snoek, and Ryan P Adams. Multitask Bayesian optimization. In Advances in neural information processing systems, pages 2004–2012, 2013.
  • Kevin Swersky, Jasper Snoek, and Ryan Prescott Adams. Freeze-thaw bayesian optimization. arXiv preprint arXiv:1406.3896, 2014.
  • Emanuel Todorov, Tom Erez, and Yuval Tassa. Mujoco: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5026–5033. IEEE, 2012.
  • Zi Wang and Stefanie Jegelka. Max-value entropy search for efficient bayesian optimization. In International Conference on Machine Learning, pages 3627–3635, 2017.
  • Ziyu Wang and Nando de Freitas. Theoretical analysis of bayesian optimisation with unknown gaussian process hyper-parameters. arXiv preprint arXiv:1406.7758, 2014.
  • Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, and Nando de Freitas. Dueling network architectures for deep reinforcement learning. In International Conference on Machine Learning, pages 1995–2003, 2016.
  • Jian Wu and Peter Frazier. The parallel knowledge gradient method for batch Bayesian optimization. In Advances In Neural Information Processing Systems, pages 3126–3134, 2016.