# Knowing The What But Not The Where in Bayesian Optimization

ICML 2020.

Abstract:

Bayesian optimization has demonstrated impressive success in finding the optimum location $x^{*}$ and value $f^{*}=f(x^{*})=\max_{x\in\mathcal{X}}f(x)$ of the black-box function $f$. In some applications, however, the optimum value is known in advance and the goal is to find the corresponding optimum location. Existing work in Bayesian …

Introduction

- Bayesian optimization (BO) (Brochu et al., 2010; Shahriari et al., 2016; Oh et al., 2018; Frazier, 2018) is an efficient method for the global optimization of a black-box function.
- BO has been successfully employed in selecting chemical compounds (Hernández-Lobato et al., 2017), in material design (Frazier & Wang, 2016; Li et al., 2018), and in searching for hyperparameters of machine learning algorithms (Snoek et al., 2012; Klein et al., 2017; Chen et al., 2018).
- These results suggest BO is more efficient than manual, random, or grid search.
- BO fits a probabilistic surrogate model, typically a Gaussian process, to the evaluations observed so far; this surrogate is used to define an acquisition function, which determines the next query of the black-box function.
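The surrogate-plus-acquisition loop above can be sketched in a few lines. This is a minimal illustration, not the paper's code: the RBF kernel, the standard expected improvement (EI) acquisition, the toy 1-D objective, and all function names are assumptions made for the sketch.

```python
import numpy as np
from math import erf

def rbf_kernel(a, b, lengthscale=0.2):
    # Squared-exponential kernel on 1-D inputs, k(x, x) = 1
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(X, y, Xq, noise=1e-6):
    # Standard zero-mean GP regression posterior at the query points Xq
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xq)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def norm_pdf(z):
    return np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + np.vectorize(erf)(z / np.sqrt(2.0)))

def expected_improvement(mu, sigma, best_y):
    # Standard EI for maximization with incumbent best_y
    z = (mu - best_y) / sigma
    return (mu - best_y) * norm_cdf(z) + sigma * norm_pdf(z)

# Toy black-box with maximum value 0 at x = 0.3 (illustrative only)
f = lambda x: -(x - 0.3) ** 2
X = np.array([0.05, 0.5, 0.95])
y = f(X)
Xq = np.linspace(0.0, 1.0, 201)
for _ in range(15):
    mu, sigma = gp_posterior(X, y, Xq)
    # The acquisition function determines the next query point
    x_next = Xq[np.argmax(expected_improvement(mu, sigma, y.max()))]
    X = np.append(X, x_next)
    y = np.append(y, f(x_next))
```

After a handful of iterations the queried points concentrate near the true maximizer; the paper's contribution is to replace the surrogate and the acquisition in this loop with versions that exploit the known optimum value f∗.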

Highlights

- Bayesian optimization (BO) (Brochu et al., 2010; Shahriari et al., 2016; Oh et al., 2018; Frazier, 2018) is an efficient method for the global optimization of a black-box function.
- We first encode f∗ into an informed Gaussian process surrogate model through a transformation, and we propose two acquisition functions that effectively exploit knowledge of f∗.
- We develop our second acquisition function using f∗, called expected regret minimization (ERM).
- We consider a new setting in Bayesian optimization where the optimum output is known.
- We present a transformed Gaussian process surrogate that models the objective function better by exploiting knowledge of f∗.
- By using the extra knowledge of f∗, we demonstrate that our expected regret minimization converges quickly to the optimum on benchmark functions and real-world applications.
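The ERM idea admits a short sketch. Since f(x) ≤ f∗, the GP posterior N(μ(x), σ²(x)) at a point x can be viewed as a Gaussian truncated above at f∗, whose mean gives a closed form for the expected regret f∗ − E[f(x)]. This is a hedged reconstruction from the description here, not necessarily the paper's exact formula:

```python
import numpy as np
from math import erf

def _pdf(z):
    return np.exp(-0.5 * z * z) / np.sqrt(2.0 * np.pi)

def _cdf(z):
    return 0.5 * (1.0 + np.vectorize(erf)(z / np.sqrt(2.0)))

def expected_regret(mu, sigma, f_star):
    # Treat the posterior at x as N(mu, sigma^2) truncated above at f_star.
    # The truncated mean is mu - sigma * pdf(z)/cdf(z) with z = (f_star - mu)/sigma,
    # so the expected regret f_star - E[f] has the closed form below.
    z = (f_star - mu) / sigma
    trunc_mean = mu - sigma * _pdf(z) / np.maximum(_cdf(z), 1e-12)
    return f_star - trunc_mean  # to be minimized over candidate x
```

Note the behavior: the regret shrinks as the posterior mean approaches f∗ and as the posterior variance shrinks, which makes ERM an exploitative acquisition compared with standard EI.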

Methods

- The main goal of the experiments is to show that the authors can effectively exploit the known optimum output to improve Bayesian optimization performance.
- The authors perform hyperparameter optimization for an XGBoost classifier on the Skin Segmentation dataset and a deep reinforcement learning task on the CartPole problem, where the optimum values are publicly available.
- The experiments are independently repeated 20 times.
- The authors run the deep reinforcement learning experiment on an NVIDIA GTX 2080 GPU machine.

Results

- The authors' approaches with f∗ perform significantly better than the baselines on the gSobol and Alpine1 benchmark functions.

Conclusion

- The authors show how the model fails when the value of f∗ is misspecified, with different effects depending on the misspecification.
- The authors set f∗ both larger and smaller than the true value in a maximization problem.
- The transformed Gaussian process (TGP) surrogate takes the knowledge of the optimum value f∗ into account to inform the surrogate.
- This transformation may create additional uncertainty in regions where the function value is low.
- The authors can extend the model to handle an f∗ known only within a range ε of the true output.
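The TGP idea can be illustrated with a square-root-style transformation in the spirit of the Bayesian quadrature transform of Gunter et al. (2014), cited below: define g(x) = √(2(f∗ − f(x))), model g with an ordinary GP, and recover f through the inverse map, which can never exceed f∗. The exact parametrization here is a hedged sketch, not necessarily the paper's:

```python
import numpy as np

F_STAR = 1.0  # the known optimum value (illustrative choice)

def to_g(f_vals, f_star=F_STAR):
    # g(x) = sqrt(2 * (f_star - f(x))), well-defined because f <= f_star
    return np.sqrt(2.0 * (f_star - np.asarray(f_vals)))

def to_f(g_vals, f_star=F_STAR):
    # Inverse map f = f_star - g^2 / 2: any real-valued GP sample for g
    # induces a surrogate for f that respects the bound f <= f_star
    return f_star - 0.5 * np.asarray(g_vals) ** 2
```

Because the map is quadratic, a fixed amount of posterior uncertainty in g translates into larger uncertainty in f where g is large, i.e. where the function value is far below f∗, which is consistent with the extra-uncertainty observation above.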

Tables

- Table 1: Hyperparameters for XGBoost
- Table 2: Hyperparameters of the Advantage Actor Critic (A2C) algorithm (f∗ = 200)
- Table 3: Examples of known optimum value settings

References

- Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al. Tensorflow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283, 2016.
- Astudillo, R. and Frazier, P. Bayesian optimization of composite functions. In International Conference on Machine Learning, pp. 354–363, 2019.
- Barto, A. G., Sutton, R. S., and Anderson, C. W. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, (5):834–846, 1983.
- Berk, J., Nguyen, V., Gupta, S., Rana, S., and Venkatesh, S. Exploration enhanced expected improvement for Bayesian optimization. In Machine Learning and Knowledge Discovery in Databases. Springer, 2018.
- Brochu, E., Cora, V. M., and De Freitas, N. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599, 2010.
- Chen, T. and Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794. ACM, 2016.
- Chen, Y., Huang, A., Wang, Z., Antonoglou, I., Schrittwieser, J., Silver, D., and de Freitas, N. Bayesian optimization in AlphaGo. arXiv preprint arXiv:1812.06855, 2018.
- Damianou, A. and Lawrence, N. Deep Gaussian processes. In Artificial Intelligence and Statistics, pp. 207–215, 2013.
- Frazier, P. I. A tutorial on Bayesian optimization. arXiv preprint arXiv:1807.02811, 2018.
- Gunter, T., Osborne, M. A., Garnett, R., Hennig, P., and Roberts, S. J. Sampling for inference in probabilistic models with fast Bayesian quadrature. In Advances in neural information processing systems, pp. 2789–2797, 2014.
- Hernández-Lobato, D. and Hernández-Lobato, J. M. Scalable Gaussian process classification via expectation propagation. In Artificial Intelligence and Statistics, pp. 168–176, 2016.
- Hernández-Lobato, J. M., Hoffman, M. W., and Ghahramani, Z. Predictive entropy search for efficient global optimization of black-box functions. In Advances in Neural Information Processing Systems, pp. 918–926, 2014.
- Hernández-Lobato, J. M., Requeima, J., Pyzer-Knapp, E. O., and Aspuru-Guzik, A. Parallel and distributed Thompson sampling for large-scale accelerated exploration of chemical space. In International Conference on Machine Learning, pp. 1470–1479, 2017.
- Hoffman, M. W. and Ghahramani, Z. Output-space predictive entropy search for flexible global optimization. 2015.
- Klein, A., Falkner, S., Bartels, S., Hennig, P., and Hutter, F. Fast Bayesian optimization of machine learning hyperparameters on large datasets. In Artificial Intelligence and Statistics, pp. 528–536, 2017.
- Kuss, M. and Rasmussen, C. E. Assessing approximate inference for binary Gaussian process classification. Journal of Machine Learning Research, 6(Oct):1679–1704, 2005.
- Le, T., Nguyen, V., Nguyen, T. D., and Phung, D. Nonparametric budgeted stochastic gradient descent. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, pp. 654–572, 2016.
- Letham, B., Karrer, B., Ottoni, G., Bakshy, E., et al. Constrained Bayesian optimization with noisy experiments. Bayesian Analysis, 14(2):495–519, 2019.
- Li, C., Santu, R., Gupta, S., Nguyen, V., Venkatesh, S., Sutti, A., Leal, D. R. D. C., Slezak, T., Height, M., Mohammed, M., and Gibson, I. Accelerating experimental design by incorporating experimenter hunches. In IEEE International Conference on Data Mining (ICDM), pp. 257–266, 2018.
- MacKay, D. J. Introduction to Gaussian processes. NATO ASI Series F Computer and Systems Sciences, 168:133–166, 1998.
- Hennig, P. and Schuler, C. J. Entropy search for information-efficient global optimization. Journal of Machine Learning Research, 13:1809–1837, 2012.
- Mockus, J., Tiesis, V., and Zilinskas, A. The application of Bayesian methods for seeking the extremum. Towards global optimization, 2(117-129):2, 1978.
- Neal, R. M. Bayesian learning for neural networks, volume 118. Springer Science & Business Media, 2012.
- Nickisch, H. and Rasmussen, C. E. Approximations for binary Gaussian process classification. Journal of Machine Learning Research, 9(Oct):2035–2078, 2008.
- Oh, C., Gavves, E., and Welling, M. Bock: Bayesian optimization with cylindrical kernels. In International Conference on Machine Learning, pp. 3865–3874, 2018.
- Osborne, M., Garnett, R., Ghahramani, Z., Duvenaud, D. K., Roberts, S. J., and Rasmussen, C. E. Active learning of model evidence using Bayesian quadrature. In Advances in neural information processing systems, pp. 46–54, 2012.
- Rasmussen, C. E. and Williams, C. K. I. Gaussian Processes for Machine Learning. MIT Press, 2006.
- Riihimäki, J., Jylänki, P., and Vehtari, A. Nested expectation propagation for Gaussian process classification with a multinomial probit likelihood. Journal of Machine Learning Research, 14(Jan):75–109, 2013.
- Ru, B., McLeod, M., Granziol, D., and Osborne, M. A. Fast information-theoretic Bayesian optimisation. In International Conference on Machine Learning, pp. 4381–4389, 2018.
- Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., and de Freitas, N. Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104 (1):148–175, 2016.
- Snelson, E., Ghahramani, Z., and Rasmussen, C. E. Warped Gaussian processes. In Advances in neural information processing systems, pp. 337–344, 2004.
- Snoek, J., Larochelle, H., and Adams, R. P. Practical Bayesian optimization of machine learning algorithms. In Advances in neural information processing systems, pp. 2951–2959, 2012.
- Snoek, J., Rippel, O., Swersky, K., Kiros, R., Satish, N., Sundaram, N., Patwary, M., Prabhat, M., and Adams, R. Scalable Bayesian optimization using deep neural networks. In Proceedings of the 32nd International Conference on Machine Learning, pp. 2171–2180, 2015.
- Springenberg, J. T., Klein, A., Falkner, S., and Hutter, F. Bayesian optimization with robust Bayesian neural networks. In Advances in Neural Information Processing Systems, pp. 4134–4142, 2016.
- Srinivas, N., Krause, A., Kakade, S., and Seeger, M. Gaussian process optimization in the bandit setting: No regret and experimental design. In Proceedings of the 27th International Conference on Machine Learning, pp. 1015–1022, 2010.
- Sutton, R. S. and Barto, A. G. Reinforcement learning: An introduction, volume 1. MIT press Cambridge, 1998.
- Wang, Z. and de Freitas, N. Theoretical analysis of Bayesian optimisation with unknown Gaussian process hyper-parameters. arXiv preprint arXiv:1406.7758, 2014.
- Wang, Z. and Jegelka, S. Max-value entropy search for efficient Bayesian optimization. In International Conference on Machine Learning, pp. 3627–3635, 2017.
- Wang, Z., Zhou, B., and Jegelka, S. Optimization as estimation with Gaussian processes in bandit settings. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, pp. 1022–1031, 2016.
- Wang, Z., Gehring, C., Kohli, P., and Jegelka, S. Batched large-scale Bayesian optimization in high-dimensional spaces. In International Conference on Artificial Intelligence and Statistics, pp. 745–754, 2018.
