Re-Examining Linear Embeddings for High-Dimensional Bayesian Optimization

NeurIPS 2020.


Abstract:

Bayesian optimization (BO) is a popular approach to optimize expensive-to-evaluate black-box functions. A significant challenge in BO is to scale to high-dimensional parameter spaces while retaining sample efficiency. A solution considered in existing literature is to embed the high-dimensional space in a lower-dimensional manifold, oft...

Introduction
  • Bayesian optimization (BO) is a robust, sample-efficient technique for optimizing expensive-to-evaluate black-box functions (Mockus, 1989; Jones, 2001).
  • HeSBO (Nayebi et al, 2019) is a recent extension of REMBO that avoids both the clipping to B and the heuristic box bounds in the embedding by changing the projection matrix A (see the sketch after this list).
  • The authors highlight one recent observation from Binois et al (2019), that most points in the embedding project up outside the box bounds, and discuss three novel observations about how existing methods can make it difficult to learn high-dimensional surrogates.
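To make the two project-up maps above concrete, here is a minimal numpy sketch (not the authors' code), assuming the ambient box B = [-1, 1]^D; the helper names and the choice of sampling y uniformly in [-1, 1]^de are illustrative.
```python
import numpy as np

def rembo_project_up(y, A):
    """REMBO: x = clip_B(A y), with A a D x d_e matrix of N(0, 1) entries.
    Points whose image A y leaves B = [-1, 1]^D are clipped to its boundary."""
    return np.clip(A @ y, -1.0, 1.0)

def hesbo_project_up(y, h, sign):
    """HeSBO: each ambient coordinate i copies a signed embedding coordinate h[i],
    so interior points of the embedding always map to the interior of B."""
    return sign * y[h]

rng = np.random.default_rng(0)
D, d_e = 100, 4
A = rng.standard_normal((D, d_e))        # REMBO projection matrix
h = rng.integers(0, d_e, size=D)         # HeSBO: random coordinate assignment
sign = rng.choice([-1.0, 1.0], size=D)   # HeSBO: random signs

y = rng.uniform(-1, 1, size=d_e)         # a point in the embedding
x_rembo = rembo_project_up(y, A)
x_hesbo = hesbo_project_up(y, h, sign)
```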
Highlights
  • Bayesian optimization (BO) is a robust, sample-efficient technique for optimizing expensive-to-evaluate black-box functions (Mockus, 1989; Jones, 2001)
  • We make two contributions. 1) We show that existing approaches produce representations that cannot be well-modeled by a Gaussian process (GP), or representations that likely do not contain an optimum (Sec. 4). 2) We construct a representation with better properties for BO (Sec. 5): we improve modelability by deriving a Mahalanobis kernel tailored for linear embeddings and adding polytope bounds to the embedding, and we show how to maintain a high probability that the embedding contains an optimum (a minimal kernel sketch follows this list)
  • We evaluate the performance of adaptive linear embedding BO (ALEBO) on synthetic high-dimensional BO (HDBO) tasks, and compare its performance to a broad selection of HDBO methods
  • Relative to other linear embedding approaches, ALEBO had low variance in the final best-value, which is important in real applications where one can typically only run one optimization run
  • We showed how polytope constraints on the embedding eliminate boundary distortions, and we derived a Mahalanobis kernel appropriate for GP modeling in a linear embedding
  • When constructing a VAE for BO it will be important to ensure the function remains well-modeled on the embedding and that box bounds are not handled in a way that adds distortion
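The sketch below illustrates the Mahalanobis kernel idea referenced above: instead of an ARD kernel with a diagonal lengthscale matrix, squared distances in the embedding are computed with a full positive semi-definite metric Gamma = U^T U. The parameterization and scaling here are illustrative assumptions, not necessarily the paper's exact formulation.
```python
import numpy as np

def mahalanobis_rbf(Y1, Y2, U, outputscale=1.0):
    """RBF kernel with a full Mahalanobis metric Gamma = U^T U on the embedding.

    Y1: (n, d_e), Y2: (m, d_e), U: (d_e, d_e) (or (r, d_e) for a low-rank metric).
    Returns the (n, m) matrix k(y, y') = s * exp(-(y - y')^T Gamma (y - y'))."""
    Z1, Z2 = Y1 @ U.T, Y2 @ U.T  # map points into the metric's coordinates
    sq = (
        np.sum(Z1**2, axis=1)[:, None]
        + np.sum(Z2**2, axis=1)[None, :]
        - 2.0 * Z1 @ Z2.T
    )
    return outputscale * np.exp(-np.maximum(sq, 0.0))

rng = np.random.default_rng(0)
d_e = 4
U = rng.standard_normal((d_e, d_e)) * 0.5  # defines Gamma = U^T U (PSD by construction)
Y = rng.uniform(-1, 1, size=(5, d_e))
K = mahalanobis_rbf(Y, Y, U)               # 5 x 5 covariance matrix
```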
Results
  • Even if the function is well-modeled by a GP in the true low-dimensional space, the distortion produced by the REMBO projection transforms it into one on the embedding that is not appropriate for a GP.
  • From these results the authors conclude that, for the REMBO projection with box bounds, one cannot expect to successfully model the function on the embedding with a standard GP.
  • HeSBO avoids the challenges of REMBO related to box bounds: all interior points in the embedding map to interior points of B, so there is no need for the L2 projection, and the ability to model in the embedding is improved.
  • To determine the covariance in function values of points in the embedding, the authors first project up to the ambient space and then project down to the true subspace: f_B(y) = f(B†y) = f_d(T B†y).
  • Fig. 3 shows these probabilities for D = 100 as a function of d and de, for three strategies for generating the projection matrix: the REMBO strategy of i.i.d. N(0, 1) entries, the HeSBO projection matrix, and the unit hypersphere sampling described in Sec. 4 (a sampling sketch follows this list).
  • The linear embedding methods (ALEBO, REMBO, and HeSBO) can naturally be extended to constrained optimization as described in Appendix A.5.
  • Relative to other linear embedding approaches, ALEBO had low variance in the final best-value, which is important in real applications where one can typically only run one optimization run.
  • Fig. 6 shows optimization performance for the linear embedding methods on this task, which is a maximization problem.
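The "unit hypersphere" generation strategy compared in Fig. 3 can be sketched by normalizing Gaussian draws. Whether the normalization is applied to rows or columns of the d_e x D matrix is an assumption here and should be checked against Sec. 4 of the paper; the helper name is my own.
```python
import numpy as np

def sample_embedding_matrix(D, d_e, rng):
    """Sample a d_e x D embedding matrix with each column drawn uniformly from the
    unit hypersphere S^{d_e - 1} (normalized Gaussian draws). Normalizing columns
    rather than rows is an assumption; see Sec. 4 of the paper."""
    B = rng.standard_normal((d_e, D))
    return B / np.linalg.norm(B, axis=0, keepdims=True)

rng = np.random.default_rng(0)
B = sample_embedding_matrix(D=100, d_e=4, rng=rng)
B_pinv = np.linalg.pinv(B)   # D x d_e; project up with x = B_pinv @ y
y = rng.uniform(-1, 1, size=4)
x = B_pinv @ y               # candidate point in the D-dimensional ambient space
```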
Conclusion
  • The authors showed how polytope constraints on the embedding eliminate boundary distortions, and the authors derived a Mahalanobis kernel appropriate for GP modeling in a linear embedding.
  • When constructing a VAE for BO it will be important to ensure the function remains well-modeled on the embedding and that box bounds are not handled in a way that adds distortion.
  • The authors applied linear constraints to restrict the acquisition function optimization to points that project up inside the ambient box bounds.
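A minimal sketch of the constraint described in the last bullet, assuming B = [-1, 1]^D: a candidate y in the embedding is feasible only if its project-up B†y satisfies -1 ≤ B†y ≤ 1, a polytope that can be handed to an off-the-shelf optimizer as linear inequalities. The acquisition function below is a stand-in, not the paper's.
```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
D, d_e = 100, 4
B = rng.standard_normal((d_e, D))  # illustrative embedding matrix (d_e x D)
B_pinv = np.linalg.pinv(B)         # D x d_e project-up matrix

def neg_acquisition(y):
    # Stand-in for the (negated) acquisition function; a real implementation
    # would use the GP posterior on the embedding.
    return float(np.sum((y - 0.3) ** 2))

# Polytope feasibility: -1 <= B_pinv @ y <= 1, written as 2D linear inequalities >= 0.
polytope = {
    "type": "ineq",
    "fun": lambda y: np.concatenate([1.0 - B_pinv @ y, 1.0 + B_pinv @ y]),
}

res = minimize(neg_acquisition, x0=np.zeros(d_e), method="SLSQP", constraints=[polytope])
y_next = res.x                     # candidate that projects up inside the ambient box
```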
Summary
  • Bayesian optimization (BO) is a robust, sample-efficient technique for optimizing expensive-to-evaluate black-box functions (Mockus, 1989; Jones, 2001).
  • HeSBO (Nayebi et al, 2019) is a recent extension of REMBO that avoids clipping to B and heuristic box bounds in the embedding by changing the projection matrix A.
  • The authors highlight one recent observation from Binois et al (2019), that most points in the embedding project up outside the box bounds, and discuss three novel observations about how existing methods can make it difficult to learn high-dimensional surrogates.
  • Even if the function is well-modeled by a GP in the true low-dimensional space, the distortion produced by the REMBO projection transforms it into one on the embedding that is not appropriate for a GP.
  • From these results the authors conclude that, for the REMBO projection with box bounds, one cannot expect to successfully model the function on the embedding with a standard GP.
  • HeSBO avoids the challenges of REMBO related to box bounds: all interior points in the embedding map to interior points of B, so there is no need for the L2 projection, and the ability to model in the embedding is improved.
  • To determine the covariance in function values of points in the embedding, the authors first project up to the ambient space and then project down to the true subspace: f_B(y) = f(B†y) = f_d(T B†y).
  • Fig. 3 shows these probabilities for D = 100 as a function of d and de, for three strategies for generating the projection matrix: the REMBO strategy of i.i.d. N(0, 1) entries, the HeSBO projection matrix, and the unit hypersphere sampling described in Sec. 4.
  • The linear embedding methods (ALEBO, REMBO, and HeSBO) can naturally be extended to constrained optimization as described in Appendix A.5.
  • Relative to other linear embedding approaches, ALEBO had low variance in the final best-value, which is important in real applications where one can typically only run one optimization run.
  • Fig. 6 shows optimization performance for the linear embedding methods on this task, which is a maximization problem.
  • The authors showed how polytope constraints on the embedding eliminate boundary distortions, and the authors derived a Mahalanobis kernel appropriate for GP modeling in a linear embedding.
  • When constructing a VAE for BO it will be important to ensure the function remains well-modeled on the embedding and that box bounds are not handled in a way that adds distortion.
  • The authors applied linear constraints to restrict the acquisition function optimization to points that project up inside the ambient box bounds.
Tables
  • Table 1: Average running time per iteration in seconds on the Hartmann6 problem, D = 100 and
Related work
  • There are generally two approaches to extending BO into high dimensions. The first is to produce a low-dimensional embedding, do standard BO in this low-dimensional space, and then project up to the original space for function evaluations. The foundational work on embeddings for BO is REMBO (Wang et al, 2016), which creates a linear embedding by generating a random projection matrix. Sec. 3 provides a thorough description of REMBO and several subsequent approaches based on random linear embeddings (Qian et al, 2016; Binois et al, 2019; Nayebi et al, 2019). If derivatives of f are available, the active subspace method can be used to recover a linear embedding (Constantine et al, 2014; Eriksson et al, 2018), or approximate gradients can be used (Djolonga et al, 2013). BO can also be done in nonlinear embeddings through VAEs (Gomez-Bombarelli et al, 2018; Lu et al, 2018; Moriconi et al, 2019). An attractive aspect of random embeddings is that they can be extremely sample-efficient, since the only model to be estimated is a low-dimensional GP.
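As an illustration of this first approach, the sketch below shows the data flow of embedding-based BO (search in the low-dimensional embedding, project up to evaluate the expensive function). The model/acquisition step is replaced by random search to keep the sketch short, so it is not any particular published method.
```python
import numpy as np

def embedding_bo(f, D, d_e, n_iters, rng):
    """Toy loop: BO-style search in a d_e-dimensional linear embedding of [-1, 1]^D.
    The 'fit GP / maximize acquisition' step is replaced by random search here."""
    A = rng.standard_normal((D, d_e))                 # random linear embedding (REMBO-style)
    Y, fX = [], []
    for _ in range(n_iters):
        # Placeholder for: fit a GP on (Y, fX) and maximize an acquisition function.
        y = rng.uniform(-np.sqrt(d_e), np.sqrt(d_e), size=d_e)
        x = np.clip(A @ y, -1.0, 1.0)                 # project up and clip to the box B
        Y.append(y)
        fX.append(f(x))                               # expensive evaluation in ambient space
    best = int(np.argmin(fX))
    return Y[best], fX[best]

rng = np.random.default_rng(0)
f = lambda x: float(np.sum(x[:6] ** 2))               # toy objective with low-dim structure
y_best, f_best = embedding_bo(f, D=100, d_e=4, n_iters=20, rng=rng)
```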
Study subjects and analysis
samples: 1000
The probability that a point y drawn uniformly from the REMBO embedding box [−√de, √de]^de projects up to the interior of B. This is measured empirically by sampling A with N(0, 1) entries and then checking whether Ay ∈ B (with 1000 samples). Even for small D, with de > 2 practically all of the volume in the embedding projects up outside the box bounds, and is thus clipped to a facet of B
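This empirical check can be reproduced with a few lines of numpy. The sketch below assumes B = [-1, 1]^D, A with N(0, 1) entries resampled for each draw, and y uniform on the REMBO box [−√de, √de]^de; whether A is fixed or resampled per sample is an assumption.
```python
import numpy as np

def frac_inside_box(D, d_e, n_samples=1000, rng=None):
    """Estimate the fraction of the REMBO embedding whose project-up Ay stays in B."""
    rng = np.random.default_rng() if rng is None else rng
    inside = 0
    for _ in range(n_samples):
        A = rng.standard_normal((D, d_e))                      # fresh projection matrix
        y = rng.uniform(-np.sqrt(d_e), np.sqrt(d_e), size=d_e)
        inside += int(np.all(np.abs(A @ y) <= 1.0))
    return inside / n_samples

# For even modest D, the estimate collapses toward 0 once d_e > 2.
print(frac_inside_box(D=20, d_e=4))
```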

Reference
  • Rika Antonova, Akshara Rai, and Christopher G. Atkeson. Deep kernels for optimizing locomotion controllers. In 1st Conference on Robot Learning, CoRL, pp. 47–56, 2017.
  • Maximilian Balandat, Brian Karrer, Daniel R. Jiang, Samuel Daulton, Benjamin Letham, Andrew Gordon Wilson, and Eytan Bakshy. BoTorch: Programmable Bayesian optimization in PyTorch. arXiv preprint arXiv:1910.06403, 2019.
  • Mickaël Binois. Uncertainty quantification on Pareto fronts and high-dimensional strategies in Bayesian optimization, with applications in multi-objective automotive design. PhD thesis, École Nationale Supérieure des Mines de Saint-Étienne, 2015.
  • Mickaël Binois, David Ginsbourger, and Olivier Roustant. A warped kernel improving robustness in Bayesian optimization via random embeddings. In Proceedings of the International Conference on Learning and Intelligent Optimization, LION, pp. 281–286, 2015.
  • Mickaël Binois, David Ginsbourger, and Olivier Roustant. On the choice of the low-dimensional domain for global optimization via random embeddings. Journal of Global Optimization, 2019.
  • Roberto Calandra, André Seyfarth, Jan Peters, and Marc P. Deisenroth. Bayesian optimization for learning gaits under uncertainty. Annals of Mathematics and Artificial Intelligence, 76(1):5–23, 2015.
  • Paul G. Constantine, Eric Dow, and Qiqi Wang. Active subspace methods in theory and practice: applications to Kriging surfaces. SIAM Journal on Scientific Computing, 36:A1500–A1524, 2014.
  • Erwin Coumans and John McCutchan. Pybullet simulator. https://github.com/bulletphysics/bullet3. Accessed: 2019-09.
  • Alessandro Crespi and Auke Jan Ijspeert. Online optimization of swimming and crawling in an amphibious snake robot. IEEE Transactions on Robotics, 24(1):75–87, 2008.
  • Josip Djolonga, Andreas Krause, and Volkan Cevher. High-dimensional Gaussian process bandits. In Advances in Neural Information Processing Systems 26, NIPS, pp. 1025–1033, 2013.
  • David Eriksson, Kun Dong, Eric Hans Lee, David Bindel, and Andrew Gordon Wilson. Scaling Gaussian process regression with derivatives. In Advances in Neural Information Processing Systems 31, NIPS, pp. 6867–6877, 2018.
  • Jean-Albert Ferrez, Komei Fukuda, and Th. M. Liebling. Solving the fixed rank convex quadratic maximization in binary variables by a parallel zonotope construction algorithm. European Journal of Operational Research, 166(1):35–50, 2005.
  • Jacob Gardner, Chuan Guo, Kilian Q. Weinberger, Roman Garnett, and Roger Grosse. Discovering and exploiting additive structure for Bayesian optimization. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS, pp. 1311–1319, 2017.
  • Jacob R. Gardner, Matt J. Kusner, Zhixiang Xu, Kilian Q. Weinberger, and John P. Cunningham. Bayesian optimization with inequality constraints. In Proceedings of the 31st International Conference on Machine Learning, ICML, 2014.
  • Rafael Gómez-Bombarelli, Jennifer N. Wei, David Duvenaud, José Miguel Hernández-Lobato, Benjamín Sánchez-Lengeling, Dennis Sheberla, Jorge Aguilera-Iparraguirre, Timothy D. Hirzel, Ryan P. Adams, and Alán Aspuru-Guzik. Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 4(2):268–276, 2018.
  • Robert B. Gramacy, Genetha A. Gray, Sébastien Le Digabel, Herbert K. H. Lee, Pritam Ranjan, Garth Wells, and Stefan M. Wild. Modeling an augmented Lagrangian for blackbox constrained optimization. Technometrics, 58(1):1–11, 2016.
  • Nikolaus Hansen, Sibylle D. Müller, and Petros Koumoutsakos. Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). Evolutionary Computation, 11(1):1–18, 2003.
  • Hebi Robotics. Daisy hexapod, 2019. URL https://www.hebirobotics.com/robotic-kits.
  • Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. Sequential model-based optimization for general algorithm configuration. In International Conference on Learning and Intelligent Optimization, LION, pp. 507–523, 2011.
  • William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26(189–206):1, 1984.
  • Donald R. Jones. A taxonomy of global optimization methods based on response surfaces. Journal of Global Optimization, 21(4):345–383, 2001.
  • Donald R. Jones, Matthias Schonlau, and William J. Welch. Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13:455–492, 1998.
  • Kirthevasan Kandasamy, Jeff Schneider, and Barnabás Póczos. High dimensional Bayesian optimisation and bandits via additive models. In Proceedings of the 32nd International Conference on Machine Learning, ICML, pp. 295–304, 2015.
  • Johannes Kirschner, Mojmír Mutný, Nicole Hiller, Rasmus Ischebeck, and Andreas Krause. Adaptive and safe Bayesian optimization in high dimensions via one-dimensional subspaces. In Proceedings of the 36th International Conference on Machine Learning, ICML, 2019.
  • Benjamin Letham, Brian Karrer, Guilherme Ottoni, and Eytan Bakshy. Constrained Bayesian optimization with noisy experiments. Bayesian Analysis, 14(2):495–519, 2019.
  • Chun-Liang Li, Kirthevasan Kandasamy, Barnabás Póczos, and Jeff Schneider. High dimensional Bayesian optimization via restricted projection pursuit models. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, AISTATS, pp. 884–892, 2016.
  • Daniel J. Lizotte, Tao Wang, Michael Bowling, and Dale Schuurmans. Automatic gait optimization with Gaussian process regression. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, IJCAI, pp. 944–949, 2007.
  • Xiaoyu Lu, Javier González, Zhenwen Dai, and Neil Lawrence. Structured variationally autoencoded optimization. In Proceedings of the 35th International Conference on Machine Learning, ICML, pp. 3267–3275, 2018.
  • Jonas Mockus. Bayesian approach to global optimization: theory and applications. Mathematics and its Applications: Soviet Series. Kluwer Academic, 1989.
  • Riccardo Moriconi, K. S. Sesh Kumar, and Marc P. Deisenroth. High-dimensional Bayesian optimization with manifold Gaussian processes. arXiv preprint arXiv:1902.10675, 2019.
  • Mojmír Mutný and Andreas Krause. Efficient high dimensional Bayesian optimization with additivity and quadrature Fourier features. In Advances in Neural Information Processing Systems 31, NIPS, pp. 9005–9016, 2018.
  • Amin Nayebi, Alexander Munteanu, and Matthias Poloczek. A framework for Bayesian optimization in embedded subspaces. In Proceedings of the 36th International Conference on Machine Learning, ICML, pp. 4752–4761, 2019.
  • ChangYong Oh, Efstratios Gavves, and Max Welling. BOCK: Bayesian optimization with cylindrical kernels. In Proceedings of the 35th International Conference on Machine Learning, ICML, pp. 3868–3877, 2018.
  • Hong Qian, Yi-Qi Hu, and Yang Yu. Derivative-free optimization of high-dimensional non-convex functions by sequential random embeddings. In Proceedings of the 25th International Joint Conference on Artificial Intelligence, IJCAI, 2016.
  • Akshara Rai, Rika Antonova, Seungmoon Song, William Martin, Hartmut Geyer, and Christopher G. Atkeson. Bayesian optimization using domain knowledge on the ATRIAS biped. In Proceedings of the IEEE International Conference on Robotics and Automation, ICRA, pp. 1771–1778, 2018.
  • Paul Rolland, Jonathan Scarlett, Ilija Bogunovic, and Volkan Cevher. High-dimensional Bayesian optimization via additive models with overlapping groups. In Proceedings of the 21st International Conference on Artificial Intelligence and Statistics, AISTATS, pp. 298–307, 2018.
  • Edward Snelson and Zoubin Ghahramani. Variable noise and dimensionality reduction for sparse Gaussian processes. In Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence, UAI, pp. 461–468, 2006.
  • Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems 25, NIPS, pp. 2951–2959, 2012.
  • Francesco Vivarelli and Christopher K. I. Williams. Discovering hidden features with Gaussian processes regression. In Advances in Neural Information Processing Systems 11, pp. 613–619, 1999.
  • Zi Wang, Chengtao Li, Stefanie Jegelka, and Pushmeet Kohli. Batched high-dimensional Bayesian optimization via structural kernel learning. In Proceedings of the 34th International Conference on Machine Learning, ICML, pp. 3656–3664, 2017.
  • Zi Wang, Clement Gehring, Pushmeet Kohli, and Stefanie Jegelka. Batched large-scale Bayesian optimization in high-dimensional spaces. In Proceedings of the 21st International Conference on Artificial Intelligence and Statistics, AISTATS, 2018.
  • Ziyu Wang, Frank Hutter, Masrour Zoghi, David Matheson, and Nando de Freitas. Bayesian optimization in a billion dimensions via random embeddings. Journal of Artificial Intelligence Research, 55:361–387, 2016.