Google Vizier: A Service for Black-Box Optimization

Benjamin Solnik
Subhodeep Moitra
John Karro

KDD, pp. 1487-1495, 2017.

Abstract:

Any sufficiently complex system acts as a black box when it becomes easier to experiment with than to understand. Hence, black-box optimization has become increasingly important as systems have become more complex. In this paper we describe Google Vizier, a Google-internal service for performing black-box optimization that has become the …

Introduction
  • Black-box optimization is the task of optimizing an objective function f : X → R with a limited budget for evaluations.
  • Black-box optimization algorithms can be used to find the best operating parameters for any system whose performance can be measured as a function of adjustable parameters.
  • It has many important applications, such as automated tuning of the hyperparameters of machine learning systems and optimization of the user interfaces of web services (e.g., optimizing colors and fonts).
  • In this paper the authors discuss a state-of-the-art system for black-box optimization developed within Google, called Google Vizier, named after a high official who offers advice to rulers.
  • It is a service for black-box optimization that supports several advanced algorithms.
  • The authors discuss the architecture of the system, design choices, and some of the algorithms used.
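The limited-budget setting above can be made concrete with a minimal sketch (illustrative only, not Vizier's code): Random Search, the simplest black-box optimizer, which observes only function values at sampled points, never gradients or internal structure.

```python
import random

def random_search(f, bounds, budget, seed=0):
    """Minimize a black-box function f over a box, spending exactly
    `budget` evaluations; only f's values at sampled points are observed."""
    rng = random.Random(seed)
    best_x, best_y = None, float("inf")
    for _ in range(budget):
        # Sample a point uniformly from the feasible box X.
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        y = f(x)
        if y < best_y:
            best_x, best_y = x, y
    return best_x, best_y

# Example: a 3-parameter system whose "performance" is a simple quadratic.
best_x, best_y = random_search(lambda x: sum(v * v for v in x),
                               bounds=[(-5.0, 5.0)] * 3, budget=100)
```

Any tuning problem whose performance is measurable fits this loop; the algorithms Vizier serves replace the uniform sampling line with something adaptive.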
Highlights
  • Black-box optimization is the task of optimizing an objective function f : X → R with a limited budget for evaluations.
  • To evaluate the performance of Google Vizier we require functions that can be used to benchmark the results. These are pre-selected, calculated functions with known optimal points that have proven challenging for black-box optimization algorithms.
  • In Figure 6 we look at result quality for four optimization algorithms currently implemented in the Vizier framework: a multi-armed bandit technique using a Gaussian process regressor [29], the SMAC algorithm [19], the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [16], and a probabilistic search method of our own.
  • While some authors have claimed that 2×Random Search is highly competitive with Bayesian Optimization methods [20], our data suggests this is only true when the dimensionality of the problem is sufficiently high.
  • We found that the use of the performance curve stopping rule resulted in achieving optimality gaps comparable to those achieved without the stopping rule, while using approximately 50% fewer CPU-hours when tuning hyperparameters for deep neural networks.
  • It has already proven to be a valuable platform for research and development, and we expect it will only grow more so as the area of black-box optimization grows in importance.
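The paper's performance-curve stopping rule fits a regression to partial learning curves; a much simpler illustration of the same idea (a median rule, NOT the paper's algorithm) stops a trial whose intermediate values fall below the median of completed trials at the same step.

```python
from statistics import median

def should_stop(partial_curve, completed_curves):
    """Illustrative early-stopping sketch (a simple median rule, not the
    paper's regression-based performance-curve rule): stop a trial whose
    best objective so far is below the median of completed trials' values
    at the same step. Curves are lists of objective values, higher = better."""
    step = len(partial_curve) - 1
    peers = [c[step] for c in completed_curves if len(c) > step]
    if not peers:
        return False  # nothing to compare against yet
    return max(partial_curve) < median(peers)

# Two completed accuracy curves; a trial lagging at step 1 is stopped.
done = [[0.2, 0.5, 0.7], [0.3, 0.6, 0.8]]
print(should_stop([0.1, 0.2], done))   # True: 0.2 < median(0.5, 0.6)
```

Either flavor of rule frees budget for fresh trials, which is where the reported ~50% CPU-hour savings come from.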
Methods
  • Design Goals and Constraints

    Vizier’s design satisfies the following desiderata: Ease of use.
  • The authors implemented Vizier as a managed service that stores the state of each optimization.
  • This approach drastically reduces the effort a new user needs to get up and running; and a managed service with a well-documented and stable RPC API allows them to upgrade the service without user effort.
  • The authors choose to make the algorithms stateless, so that they can seamlessly switch algorithms during a study.
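The stateless design means each suggestion call receives the full persisted study state and returns new trials, with no hidden algorithm state in between; a hypothetical sketch of that shape (the names `Trial`, `Study`, and `suggest` are illustrative, not Vizier's actual RPC API):

```python
import random
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Trial:
    params: dict
    objective: Optional[float] = None   # None while the trial is pending

@dataclass
class Study:
    trials: list = field(default_factory=list)

def suggest(study: Study, rng: random.Random) -> Trial:
    """Stateless suggestion: all state lives in `study`, which the service
    persists, so the algorithm behind this call can be swapped between
    calls. This toy policy perturbs the best completed trial (minimization)."""
    done = [t for t in study.trials if t.objective is not None]
    if not done:
        return Trial(params={"x": rng.uniform(-1.0, 1.0)})
    best = min(done, key=lambda t: t.objective)
    return Trial(params={"x": best.params["x"] + rng.gauss(0.0, 0.1)})
```

Because no state lives in the algorithm itself, restarting the process or switching `suggest` implementations mid-study loses nothing.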
Results
  • To evaluate the performance of Google Vizier the authors require functions that can be used to benchmark the results.
  • These are pre-selected, calculated functions with known optimal points that have proven challenging for black-box optimization algorithms.
  • A good black-box optimizer applied to the Rastrigin function might achieve an optimality gap of 160, while simple random sampling of the Beale function can quickly achieve an optimality gap of 60 [10].
  • One can see that transfer learning from one study to the next leads to steady progress towards the optimum, as the stack of regressors gradually builds up information about the shape of the objective function.
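The two named benchmarks have standard closed forms with known optima, so the optimality gap (best value found minus the known optimum) is directly computable; a sketch using the textbook definitions (not the paper's benchmark harness):

```python
import math
import random

def rastrigin(x):
    """Rastrigin benchmark: highly multimodal; known optimum 0 at the origin."""
    return 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v) for v in x)

def beale(x, y):
    """Beale benchmark: known optimum 0 at (3, 0.5)."""
    return ((1.5 - x + x * y) ** 2
            + (2.25 - x + x * y ** 2) ** 2
            + (2.625 - x + x * y ** 3) ** 2)

# Optimality gap for 1000 uniform random samples of the Beale function;
# since the known optimum is 0, the gap is just the best value found.
rng = random.Random(0)
gap = min(beale(rng.uniform(-4.5, 4.5), rng.uniform(-4.5, 4.5))
          for _ in range(1000))
```

Because the optima are known exactly, gaps from different optimizers and budgets are directly comparable.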
Conclusion
  • The authors have presented the design for Vizier, a scalable, state-of-the-art internal service for black-box optimization within Google, explained many of its design choices, and described its use cases and benefits.
  • It has already proven to be a valuable platform for research and development, and the authors expect it will only grow more so as the area of black-box optimization grows in importance.
  • It designs excellent cookies, which is a very rare capability among computational systems.
Related work
  • Black-box optimization makes minimal assumptions about the problem under consideration, and thus is broadly applicable across many domains; it has been studied in multiple scholarly fields under names including Bayesian Optimization [2, 25, 26], Derivative-free Optimization [7, 24], Sequential Experimental Design [5], and assorted variants of the multi-armed bandit problem [13, 20, 29].

    Several classes of algorithms have been proposed for the problem. The simplest of these are non-adaptive procedures such as Random Search, which selects xt uniformly at random from X at each time step t independent of the previously selected points {xτ : 1 ≤ τ < t}, and Grid Search, which selects along a grid (i.e., the Cartesian product of finite sets of feasible values for each parameter). Classic algorithms such as Simulated Annealing and assorted genetic algorithms have also been investigated, e.g., Covariance Matrix Adaptation [16].

    Another class of algorithms performs a local search by maintaining a search pattern of points, such as a simplex in the case of the classic Nelder–Mead algorithm [22]. More modern variants of these algorithms maintain simple models of the objective f within a subset of the feasible region (called the trust region), and select a point xt to improve the model within the trust region [7].
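The non-adaptive Grid Search baseline described above is a one-liner over the Cartesian product of per-parameter value sets; a minimal illustrative sketch:

```python
from itertools import product

def grid_search(f, axes):
    """Grid Search: evaluate f at every point of the Cartesian product of
    per-parameter value lists; point selection ignores all results seen so
    far, which is what makes the procedure non-adaptive."""
    return min(product(*axes), key=lambda x: f(list(x)))

# 5 values per axis -> 25 evaluations of a 2-D quadratic.
best = grid_search(lambda x: (x[0] - 1) ** 2 + x[1] ** 2,
                   axes=[[-2, -1, 0, 1, 2]] * 2)  # best point is (1, 0)
```

Unlike Random Search, the evaluation count is fixed by the grid and grows exponentially with the number of parameters.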
References
  • Rémi Bardenet, Mátyás Brendel, Balázs Kégl, and Michele Sebag. 2013. Collaborative hyperparameter tuning. ICML 2 (2013), 199.
  • James S Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. 2011. Algorithms for hyper-parameter optimization. In Advances in Neural Information Processing Systems. 2546–2554.
  • J Bernardo, MJ Bayarri, JO Berger, AP Dawid, D Heckerman, AFM Smith, and M West. 2011. Optimization under unknown constraints. Bayesian Statistics 9 9 (2011), 229.
  • Michael Bostock, Vadim Ogievetsky, and Jeffrey Heer. 2011. D3 data-driven documents. IEEE Transactions on Visualization and Computer Graphics 17, 12 (2011), 2301–2309.
  • Herman Chernoff. 1959. Sequential Design of Experiments. Ann. Math. Statist. 30, 3 (1959), 755–770. https://doi.org/10.1214/aoms/1177706205
  • Jasmine Collins, Jascha Sohl-Dickstein, and David Sussillo. 2017. Capacity and Trainability in Recurrent Neural Networks. In Proceedings of the International Conference on Learning Representations (ICLR).
  • Andrew R Conn, Katya Scheinberg, and Luis N Vicente. 2009. Introduction to Derivative-Free Optimization. SIAM.
  • Thomas Desautels, Andreas Krause, and Joel W Burdick. 2014. Parallelizing exploration-exploitation tradeoffs in Gaussian process bandit optimization. Journal of Machine Learning Research 15, 1 (2014), 3873–3923.
  • Tobias Domhan, Jost Tobias Springenberg, and Frank Hutter. 2015. Speeding Up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves. In IJCAI. 3460–3468.
  • Steffen Finck, Nikolaus Hansen, Raymond Ros, and Anne Auger. 2009. Real-Parameter Black-Box Optimization Benchmarking 2009: Presentation of the Noiseless Functions. http://coco.gforge.inria.fr/lib/exe/fetch.php?media=download3.6:bbobdocfunctions.pdf
  • Jacob R Gardner, Matt J Kusner, Zhixiang Eddie Xu, Kilian Q Weinberger, and John P Cunningham. 2014. Bayesian Optimization with Inequality Constraints. In ICML. 937–945.
  • Michael A Gelbart, Jasper Snoek, and Ryan P Adams. 2014. Bayesian optimization with unknown constraints. In Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence. AUAI Press, 250–259.
  • Josep Ginebra and Murray K. Clayton. 1995. Response Surface Bandits. Journal of the Royal Statistical Society, Series B (Methodological) 57, 4 (1995), 771–784. http://www.jstor.org/stable/2345943
  • Google. 2017. Polymer: Build modern apps using web components. https://github.com/Polymer/polymer
  • Google. 2017. Protocol Buffers: Google's data interchange format. https://github.com/google/protobuf
  • Nikolaus Hansen and Andreas Ostermeier. 2001. Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation 9, 2 (2001), 159–195.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778.
  • Julian Heinrich and Daniel Weiskopf. 2013. State of the Art of Parallel Coordinates. In Eurographics (STARs). 95–116.
  • Frank Hutter, Holger H Hoos, and Kevin Leyton-Brown. 2011. Sequential model-based optimization for general algorithm configuration. In International Conference on Learning and Intelligent Optimization. Springer, 507–523.
  • Lisha Li, Kevin G. Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, and Ameet Talwalkar. 2016. Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization. CoRR abs/1603.06560 (2016). http://arxiv.org/abs/1603.06560
  • J Močkus, V Tiesis, and A Žilinskas. 1978. The Application of Bayesian Methods for Seeking the Extremum. Vol. 2. Elsevier. 117–128.
  • John A Nelder and Roger Mead. 1965. A simplex method for function minimization. The Computer Journal 7, 4 (1965), 308–313.
  • Carl Edward Rasmussen and Christopher K. I. Williams. 2005. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press.
  • Luis Miguel Rios and Nikolaos V Sahinidis. 2013. Derivative-free optimization: a review of algorithms and comparison of software implementations. Journal of Global Optimization 56, 3 (2013), 1247–1293.
  • Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P Adams, and Nando de Freitas. 2016. Taking the human out of the loop: A review of Bayesian optimization. Proc. IEEE 104, 1 (2016), 148–175.
  • Jasper Snoek, Hugo Larochelle, and Ryan P Adams. 2012. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems. 2951–2959.
  • Jasper Snoek, Oren Rippel, Kevin Swersky, Ryan Kiros, Nadathur Satish, Narayanan Sundaram, Md. Mostofa Ali Patwary, Prabhat, and Ryan P. Adams. 2015. Scalable Bayesian Optimization Using Deep Neural Networks. In Proceedings of the 32nd International Conference on Machine Learning (ICML). JMLR.org, 2171–2180. http://jmlr.org/proceedings/papers/v37/snoek15.html
  • Jost Tobias Springenberg, Aaron Klein, Stefan Falkner, and Frank Hutter. 2016. Bayesian Optimization with Robust Bayesian Neural Networks. In Advances in Neural Information Processing Systems 29. Curran Associates, Inc., 4134–4142. http://papers.nips.cc/paper/6117-bayesian-optimization-with-robust-bayesian-neural-networks.pdf
  • Niranjan Srinivas, Andreas Krause, Sham Kakade, and Matthias Seeger. 2010. Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design. ICML (2010).
  • Kevin Swersky, Jasper Snoek, and Ryan Prescott Adams. 2014. Freeze-thaw Bayesian optimization. arXiv preprint arXiv:1406.3896 (2014).
  • Andrew Gordon Wilson, Zhiting Hu, Ruslan Salakhutdinov, and Eric P Xing. 2016. Deep kernel learning. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics. 370–378.
  • Dani Yogatama and Gideon Mann. 2014. Efficient Transfer Learning Method for Automatic Hyperparameter Tuning. JMLR: W&CP 33 (2014), 1077–1085.
  • Wojciech Zaremba, Ilya Sutskever, and Oriol Vinyals. 2014. Recurrent neural network regularization. arXiv preprint arXiv:1409.2329 (2014).