Knowing The What But Not The Where in Bayesian Optimization

ICML 2020.

Abstract:

Bayesian optimization has demonstrated impressive success in finding the optimum location $x^{*}$ and value $f^{*}=f(x^{*})=\max_{x\in\mathcal{X}}f(x)$ of the black-box function $f$. In some applications, however, the optimum value is known in advance and the goal is to find the corresponding optimum location. Existing work in Bayesian ...

Introduction
Highlights
  • Bayesian optimization (BO) (Brochu et al., 2010; Shahriari et al., 2016; Oh et al., 2018; Frazier, 2018) is an efficient method for the global optimization of a black-box function
  • We first encode f* through a transformation to build an informed Gaussian process surrogate model, and we propose two acquisition functions which effectively exploit knowledge of f*
  • We develop our second acquisition function using f*, called expected regret minimization (ERM); a minimal sketch of the idea follows this list
  • We have considered a new setting in Bayesian optimization with known optimum output
  • We present a transformed Gaussian process surrogate to model the objective function better by exploiting the knowledge of f*
  • By using the extra knowledge of f*, we demonstrate that our expected regret minimization can converge quickly to the optimum on benchmark functions and in real-world applications
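The sketch below illustrates how a known optimum value can drive the acquisition step. Under a Gaussian posterior N(μ(x), σ²(x)), the expected positive regret E[max(f* − f(x), 0)] has the standard closed form (f* − μ)Φ(z) + σφ(z) with z = (f* − μ)/σ, and an ERM-style rule queries the candidate that minimizes it. This is a minimal illustration of the idea, not the authors' implementation; the posterior values and candidate points below are assumed for the example.

```python
import numpy as np
from scipy.stats import norm

def expected_regret(mu, sigma, f_star):
    """E[max(f* - f(x), 0)] under a Gaussian posterior N(mu, sigma^2).

    Closed form: (f* - mu) * Phi(z) + sigma * phi(z), with z = (f* - mu) / sigma.
    """
    sigma = np.maximum(sigma, 1e-12)          # guard against zero posterior variance
    z = (f_star - mu) / sigma
    return (f_star - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def next_query(candidates, mu, sigma, f_star):
    """Pick the candidate with the smallest expected regret (ERM-style choice)."""
    return candidates[np.argmin(expected_regret(mu, sigma, f_star))]

# Toy usage: posterior over five hypothetical candidates, known optimum f* = 1.0
candidates = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
mu = np.array([0.2, 0.6, 0.9, 0.7, 0.4])      # posterior means (assumed)
sigma = np.array([0.3, 0.2, 0.1, 0.2, 0.3])   # posterior standard deviations (assumed)
print(next_query(candidates, mu, sigma, f_star=1.0))
```

Because f* is known, the regret of a candidate is measured against the true optimum rather than against the best observation so far.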
Methods
  • The main goal of the experiments is to show that the authors can effectively exploit the known optimum output to improve Bayesian optimization performance.
  • The authors perform hyperparameter optimization for an XGBoost classifier on the Skin Segmentation dataset and a deep reinforcement learning task on the CartPole problem, where the optimum values are publicly available (for CartPole, f* = 200); an illustrative objective for the XGBoost task is sketched after this list.
  • The experiments are independently performed 20 times.
  • The authors run the deep reinforcement learning experiment on an NVIDIA GTX 2080 GPU machine.
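To make the XGBoost experiment concrete, the kind of black-box objective being optimized might look like the sketch below: the mean cross-validated accuracy of an XGBoost classifier as a function of a few hyperparameters. The specific hyperparameters and their encoding are illustrative assumptions, not the paper's Table 1 settings.

```python
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

def xgb_objective(params, X, y):
    """Black-box objective: mean 5-fold CV accuracy of an XGBoost classifier.

    `params` = (max_depth, learning_rate, n_estimators) is an assumed search
    space for illustration; the paper's actual space is listed in Table 1.
    """
    max_depth, learning_rate, n_estimators = params
    model = XGBClassifier(
        max_depth=int(round(max_depth)),
        learning_rate=float(learning_rate),
        n_estimators=int(round(n_estimators)),
    )
    return cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
```

The known optimum f* for such a task (and for the CartPole return, f* = 200 as in Table 2) is supplied to the acquisition function exactly as in the ERM sketch above.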
Results
  • The authors' approaches with f* perform significantly better than the baselines on the gSobol and Alpine1 benchmark functions.
Conclusion
  • The authors show that the model will fail with a misspecified value of f*, with different effects depending on the direction of the misspecification
  • The authors set f* both larger and smaller than the true value in a maximization problem.
  • The transformed Gaussian process (TGP) surrogate incorporates the knowledge of the optimum value f* into the surrogate model.
  • This transformation may create additional uncertainty in regions where the function value is low (see the sketch after this list).
  • The authors can extend the model to handle an f* that lies within ε of the true output
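As an illustration of how f* can be encoded into the surrogate, one standard construction (in the spirit of the square-root transform used for Bayesian quadrature by Gunter et al., 2014) places an ordinary GP on a latent function g and sets f(x) = f* − ½ g(x)², so that f(x) ≤ f* by construction. The sketch below shows only the forward and inverse maps; it is a generic illustration of such a transformed surrogate, not necessarily the authors' exact TGP.

```python
import numpy as np

def to_latent(y, f_star):
    """Map observations y (assumed <= f*) to the latent space: g = sqrt(2 * (f* - y))."""
    return np.sqrt(2.0 * np.clip(f_star - y, 0.0, None))

def from_latent(g, f_star):
    """Inverse map: f = f* - 0.5 * g**2, which is bounded above by f* for all g."""
    return f_star - 0.5 * g ** 2

# A GP is fitted to g = to_latent(y, f_star) and its posterior is pushed back
# through from_latent. Since the inverse map is quadratic in g, a fixed amount
# of latent uncertainty translates into a wider interval on f where g is large,
# i.e. where f is far below f* -- consistent with the extra uncertainty in
# low-value regions noted above.
```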
Summary
  • Introduction:

    Bayesian optimization (BO) (Brochu et al., 2010; Shahriari et al., 2016; Oh et al., 2018; Frazier, 2018) is an efficient method for the global optimization of a black-box function.
  • BO has been successfully employed in selecting chemical compounds (Hernández-Lobato et al., 2017), in material design (Frazier & Wang, 2016; Li et al., 2018), and in searching for hyperparameters of machine learning algorithms (Snoek et al., 2012; Klein et al., 2017; Chen et al., 2018)
  • These recent results suggest BO is more efficient than manual, random, or grid search.
  • BO places a probabilistic surrogate model, typically a Gaussian process, over the black-box function; this surrogate model is used to define an acquisition function which determines where to query the black-box function next, as sketched below
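Putting the pieces together, a minimal BO loop alternates between refitting the surrogate on the observations collected so far and querying the point chosen by the acquisition function. The sketch below uses scikit-learn's GaussianProcessRegressor as a stand-in surrogate and the `next_query` helper from the ERM sketch above; the 1-D candidate grid and the objective are placeholders.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def bayes_opt_loop(objective, grid, f_star, n_init=3, n_iter=20, seed=0):
    """Minimal BO loop over a fixed 1-D candidate grid, with ERM-style queries."""
    rng = np.random.default_rng(seed)
    X = list(rng.choice(grid, size=n_init, replace=False))    # random initial design
    y = [objective(x) for x in X]
    for _ in range(n_iter):
        gp = GaussianProcessRegressor(normalize_y=True)
        gp.fit(np.asarray(X).reshape(-1, 1), np.asarray(y))
        mu, sigma = gp.predict(grid.reshape(-1, 1), return_std=True)
        x_next = next_query(grid, mu, sigma, f_star)           # ERM choice (sketch above)
        X.append(x_next)
        y.append(objective(x_next))
    best = int(np.argmax(y))
    return X[best], y[best]
```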
Tables
  • Table 1: Hyperparameters for XGBoost
  • Table 2: Hyperparameters of the Advantage Actor-Critic (A2C) algorithm (f* = 200)
  • Table 3: Examples of known optimum value settings
Reference
  • Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al. TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283, 2016.
  • Astudillo, R. and Frazier, P. Bayesian optimization of composite functions. In International Conference on Machine Learning, pp. 354–363, 2019.
  • Barto, A. G., Sutton, R. S., and Anderson, C. W. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, (5):834–846, 1983.
  • Berk, J., Nguyen, V., Gupta, S., Rana, S., and Venkatesh, S. Exploration enhanced expected improvement for Bayesian optimization. In Machine Learning and Knowledge Discovery in Databases. Springer, 2018.
  • Brochu, E., Cora, V. M., and De Freitas, N. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599, 2010.
  • Chen, T. and Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794. ACM, 2016.
  • Chen, Y., Huang, A., Wang, Z., Antonoglou, I., Schrittwieser, J., Silver, D., and de Freitas, N. Bayesian optimization in AlphaGo. arXiv preprint arXiv:1812.06855, 2018.
  • Damianou, A. and Lawrence, N. Deep Gaussian processes. In Artificial Intelligence and Statistics, pp. 207–215, 2013.
  • Frazier, P. I. A tutorial on Bayesian optimization. arXiv preprint arXiv:1807.02811, 2018.
  • Gunter, T., Osborne, M. A., Garnett, R., Hennig, P., and Roberts, S. J. Sampling for inference in probabilistic models with fast Bayesian quadrature. In Advances in Neural Information Processing Systems, pp. 2789–2797, 2014.
  • Hernández-Lobato, D. and Hernández-Lobato, J. M. Scalable Gaussian process classification via expectation propagation. In Artificial Intelligence and Statistics, pp. 168–176, 2016.
  • Hernández-Lobato, J. M., Hoffman, M. W., and Ghahramani, Z. Predictive entropy search for efficient global optimization of black-box functions. In Advances in Neural Information Processing Systems, pp. 918–926, 2014.
  • Hernández-Lobato, J. M., Requeima, J., Pyzer-Knapp, E. O., and Aspuru-Guzik, A. Parallel and distributed Thompson sampling for large-scale accelerated exploration of chemical space. In International Conference on Machine Learning, pp. 1470–1479, 2017.
  • Hoffman, M. W. and Ghahramani, Z. Output-space predictive entropy search for flexible global optimization. 2015.
  • Klein, A., Falkner, S., Bartels, S., Hennig, P., and Hutter, F. Fast Bayesian optimization of machine learning hyperparameters on large datasets. In Artificial Intelligence and Statistics, pp. 528–536, 2017.
  • Kuss, M. and Rasmussen, C. E. Assessing approximate inference for binary Gaussian process classification. Journal of Machine Learning Research, 6(Oct):1679–1704, 2005.
  • Le, T., Nguyen, V., Nguyen, T. D., and Phung, D. Nonparametric budgeted stochastic gradient descent. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, pp. 654–572, 2016.
  • Letham, B., Karrer, B., Ottoni, G., Bakshy, E., et al. Constrained Bayesian optimization with noisy experiments. Bayesian Analysis, 14(2):495–519, 2019.
  • Li, C., Santu, R., Gupta, S., Nguyen, V., Venkatesh, S., Sutti, A., Leal, D. R. D. C., Slezak, T., Height, M., Mohammed, M., and Gibson, I. Accelerating experimental design by incorporating experimenter hunches. In IEEE International Conference on Data Mining (ICDM), pp. 257–266, 2018.
  • MacKay, D. J. Introduction to Gaussian processes. NATO ASI Series F Computer and Systems Sciences, 168:133–166, 1998.
  • Hennig, P. and Schuler, C. J. Entropy search for information-efficient global optimization. Journal of Machine Learning Research, 13:1809–1837, 2012.
  • Mockus, J., Tiesis, V., and Zilinskas, A. The application of Bayesian methods for seeking the extremum. Towards Global Optimization, 2:117–129, 1978.
  • Neal, R. M. Bayesian learning for neural networks, volume 118. Springer Science & Business Media, 2012.
  • Nickisch, H. and Rasmussen, C. E. Approximations for binary Gaussian process classification. Journal of Machine Learning Research, 9(Oct):2035–2078, 2008.
  • Oh, C., Gavves, E., and Welling, M. BOCK: Bayesian optimization with cylindrical kernels. In International Conference on Machine Learning, pp. 3865–3874, 2018.
  • Osborne, M., Garnett, R., Ghahramani, Z., Duvenaud, D. K., Roberts, S. J., and Rasmussen, C. E. Active learning of model evidence using Bayesian quadrature. In Advances in Neural Information Processing Systems, pp. 46–54, 2012.
  • Rasmussen, C. E. Gaussian processes for machine learning. 2006.
  • Riihimäki, J., Jylänki, P., and Vehtari, A. Nested expectation propagation for Gaussian process classification with a multinomial probit likelihood. Journal of Machine Learning Research, 14(Jan):75–109, 2013.
  • Ru, B., McLeod, M., Granziol, D., and Osborne, M. A. Fast information-theoretic Bayesian optimisation. In International Conference on Machine Learning, pp. 4381–4389, 2018.
  • Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., and de Freitas, N. Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1):148–175, 2016.
  • Snelson, E., Ghahramani, Z., and Rasmussen, C. E. Warped Gaussian processes. In Advances in Neural Information Processing Systems, pp. 337–344, 2004.
  • Snoek, J., Larochelle, H., and Adams, R. P. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems, pp. 2951–2959, 2012.
  • Snoek, J., Rippel, O., Swersky, K., Kiros, R., Satish, N., Sundaram, N., Patwary, M., Prabhat, M., and Adams, R. Scalable Bayesian optimization using deep neural networks. In Proceedings of the 32nd International Conference on Machine Learning, pp. 2171–2180, 2015.
  • Springenberg, J. T., Klein, A., Falkner, S., and Hutter, F. Bayesian optimization with robust Bayesian neural networks. In Advances in Neural Information Processing Systems, pp. 4134–4142, 2016.
  • Srinivas, N., Krause, A., Kakade, S., and Seeger, M. Gaussian process optimization in the bandit setting: No regret and experimental design. In Proceedings of the 27th International Conference on Machine Learning, pp. 1015–1022, 2010.
  • Sutton, R. S. and Barto, A. G. Reinforcement learning: An introduction, volume 1. MIT Press, Cambridge, 1998.
  • Wang, Z. and de Freitas, N. Theoretical analysis of Bayesian optimisation with unknown Gaussian process hyper-parameters. arXiv preprint arXiv:1406.7758, 2014.
  • Wang, Z. and Jegelka, S. Max-value entropy search for efficient Bayesian optimization. In International Conference on Machine Learning, pp. 3627–3635, 2017.
  • Wang, Z., Zhou, B., and Jegelka, S. Optimization as estimation with Gaussian processes in bandit settings. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, pp. 1022–1031, 2016.
  • Wang, Z., Gehring, C., Kohli, P., and Jegelka, S. Batched large-scale Bayesian optimization in high-dimensional spaces. In International Conference on Artificial Intelligence and Statistics, pp. 745–754, 2018.