# Explicit Gradient Learning for Black-Box Optimization

ICML, pp. 8480-8490, 2020.

Abstract:

Black-Box Optimization (BBO) methods can find optimal policies for systems that interact with complex environments with no analytical representation. As such, they are of interest in many Artificial Intelligence (AI) domains. Yet classical BBO methods fall short in high-dimensional non-convex problems. They are thus often overlooked in re…

Introduction

- Optimization problems are prevalent in many artificial intelligence applications, from search-and-rescue optimal deployment (Zhen et al, 2014) to triage policy in emergency rooms (Rosemarin et al, 2019) to hyperparameter tuning in machine learning (Bardenet et al, 2013)
- In these tasks, the objective is to find a policy that minimizes a cost or maximizes a reward.
- Derivative-based methods are restricted to differentiable functions, but here the authors show that EGL can be applied successfully whenever the objective function is merely locally integrable
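To see why local integrability suffices, consider the ε-mean-gradient: the average slope of the objective over an ε-neighborhood, which exists even where the function has kinks. The sketch below is our own illustrative Monte-Carlo estimator (the name `mean_gradient` and the sampling scheme are assumptions, not the paper's exact construction), applied to f(x) = |x|, which is not differentiable at 0:

```python
import numpy as np

def mean_gradient(f, x, eps, n_samples=20000, rng=None):
    """Monte-Carlo estimate of a 1-D mean-gradient at x: the average slope
    (f(x + u) - f(x)) / u over perturbations u drawn from [-eps, eps].
    Only requires f to be locally integrable, not differentiable."""
    rng = rng or np.random.default_rng(0)
    u = rng.uniform(-eps, eps, n_samples)
    return np.mean((f(x + u) - f(x)) / u)

f = np.abs                              # non-differentiable at 0
print(mean_gradient(f, 0.5, 0.1))       # -> 1.0 (f is smooth here)
print(mean_gradient(f, 0.0, 0.1))       # well defined even at the kink; close to 0 by symmetry
```

At the kink the one-sided slopes are ±1, but their average over a symmetric neighborhood is near zero, so the mean-gradient gives a usable descent signal where the true derivative does not exist.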

Highlights

- Many Black-Box Optimization methods operate with a two-phase iterative algorithm: (1) search or collect data with some heuristic; and (2) update a model to obtain a new candidate solution and improve the heuristic
- We presented Explicit Gradient Learning, a derivative-based Black-Box Optimization algorithm that achieves state-of-the-art results on a wide range of optimization problems
- Decreasing the smoothness factor lets Explicit Gradient Learning converge to a local minimum
- The concept of Explicit Gradient Learning can be generalized to other related fields, such as sequential decision-making problems (i.e. Reinforcement Learning), by directly learning the gradient of the Q-function
- We demonstrated the use of Explicit Gradient Learning in an applicative high-dimensional Black-Box problem, searching the latent space of generative models

Methods

**Design & Analysis**

The authors lay out the practical EGL algorithm and analyze its asymptotic properties.

- (4.1) For a model gθ : Ω → Rⁿ and a dataset Dk = {(xi, yi)}, i = 1, …, m, define the loss function Lk,ε(θ) = Σᵢ₌₁..ₘ Σ_{xj ∈ Vε(xi)} ( gθ(xi) · (xj − xi) − (yj − yi) )², and learn θk* = arg minθ Lk,ε(θ), e.g. with gradient descent.
- This formulation can be used to estimate the mean-gradient for any x.
- The authors assume that the dataset Dk holds samples only from Vε
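For intuition, when the model gθ is reduced to a single vector g fitted around one point, minimizing this loss is an ordinary least-squares problem over sample pairs. The numpy sketch below is our own construction (the cube neighborhood standing in for Vε and the helper `fit_mean_gradient` are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def fit_mean_gradient(f, x0, eps, m=256, rng=None):
    """Minimize sum_{i,j} (g . (x_j - x_i) - (y_j - y_i))^2 over all ordered
    sample pairs drawn near x0, via least squares. A single vector g stands
    in for the model g_theta at this one point."""
    rng = rng or np.random.default_rng(1)
    n = x0.size
    xs = x0 + rng.uniform(-eps, eps, size=(m, n))   # samples in a cube around x0
    ys = f(xs)
    # all ordered pairs: regressor rows are x_j - x_i, targets are y_j - y_i
    A = (xs[None, :, :] - xs[:, None, :]).reshape(-1, n)
    b = (ys[None, :] - ys[:, None]).reshape(-1)
    g, *_ = np.linalg.lstsq(A, b, rcond=None)
    return g

f = lambda X: np.sum(X**2, axis=-1)             # true gradient: 2x
x0 = np.array([1.0, -0.5, 2.0])
print(fit_mean_gradient(f, x0, eps=0.05))       # close to [2.0, -1.0, 4.0]
```

On a quadratic the fitted vector recovers the true gradient up to sampling noise, which is the sense in which minimizing the loss estimates the mean-gradient at x.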

Conclusion

- The authors presented EGL, a derivative-based BBO algorithm that achieves state-of-the-art results on a wide range of optimization problems.
- Starting with a high smoothness factor lets EGL identify global regions of the function containing low valleys.
- Decreasing it lets EGL converge to a local minimum.
- The concept of EGL can be generalized to other related fields, such as sequential decision-making problems (i.e. Reinforcement Learning), by directly learning the gradient of the Q-function.
- The authors demonstrated the use of EGL in an applicative high-dimensional Black-Box problem, searching the latent space of generative models
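The coarse-to-fine schedule described above can be sketched as a toy loop: at each iterate, estimate the mean-gradient from black-box samples by least squares, step downhill, and shrink ε. This is our own minimal rendition, not the paper's algorithm; the hyperparameters, cube sampling, and the name `egl_minimize` are illustrative assumptions.

```python
import numpy as np

def egl_minimize(f, x0, eps0=0.5, gamma=0.9, lr=0.05, steps=80, m=64, rng=None):
    """Toy EGL-style loop: fit the eps-mean-gradient from samples around the
    current iterate, take a descent step, then shrink eps so the search anneals
    from coarse (global) to fine (local). Returns the best point seen."""
    rng = rng or np.random.default_rng(0)
    x, eps = np.asarray(x0, float), eps0
    best_x, best_y = x.copy(), f(x)
    for _ in range(steps):
        d = rng.uniform(-eps, eps, size=(m, x.size))        # perturbations near x
        y = np.array([f(x + di) for di in d])
        g, *_ = np.linalg.lstsq(d, y - f(x), rcond=None)    # g . d ~ f(x+d) - f(x)
        x = x - lr * g
        eps *= gamma                                        # decrease the smoothness factor
        fx = f(x)
        if fx < best_y:
            best_x, best_y = x.copy(), fx
    return best_x, best_y

# multimodal test function with global minimum at the origin
f = lambda x: np.sum(x**2 + 1.0 - np.cos(3 * np.pi * x))
bx, by = egl_minimize(f, x0=[1.3, -0.8])
print(bx, by)
```

Tracking the best-seen iterate keeps the sketch robust: the returned value can only improve on the starting point, while the annealed ε realizes the high-to-low smoothness schedule.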


Related work

- BBO problems have been studied in multiple fields with diverse approaches. Many works investigated derivative–free methods (Rios & Sahinidis, 2013), from the classic Nelder–Mead algorithm (Nelder & Mead, 1965) and Powell’s method (Powell, 1964) to more recent evolutionary algorithms such as CMA-ES (Hansen, 2006). Another line of research is derivative-based algorithms, which first approximate the gradient and then apply line-search methods such as the Conjugate Gradient (CG) Method (Shewchuk et al, 1994) and Quasi-Newton Methods, e.g. BFGS (Nocedal & Wright, 2006). Other model-based methods such as SLSQP (Bonnans et al, 2006) and COBYLA (Powell, 2007) iteratively solve quadratic or linear approximations of the objective function. Some variants apply trust-region methods and iteratively find an optimum within a trusted subset of the domain (Conn et al, 2009; Chen et al, 2018). Another line of research is more focused on stochastic discrete problems, e.g. Bayesian methods (Snoek et al, 2015), and multi-armed bandit problems (Flaxman et al, 2004).

Funding

- This work was supported in part by the Ministry of Science & Technology, Israel

References

- Audet, C. and Hare, W. Derivative-free and blackbox optimization. Springer, 2017.
- Bäck, T. Evolutionary algorithms in theory and practice: evolution strategies, evolutionary programming, genetic algorithms. Oxford university press, 1996.
- Balandat, M., Karrer, B., Jiang, D. R., Daulton, S., Letham, B., Wilson, A. G., and Bakshy, E. BoTorch: Programmable Bayesian Optimization in PyTorch. arxiv e-prints, 2019. URL http://arxiv.org/abs/1910.06403.
- Bardenet, R., Brendel, M., Kégl, B., and Sebag, M. Collaborative hyperparameter tuning. In International conference on machine learning, pp. 199–207, 2013.
- Bau, D., Zhu, J.-Y., Wulff, J., Peebles, W., Strobelt, H., Zhou, B., and Torralba, A. Seeing what a gan cannot generate. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4502–4511, 2019.
- Bertsekas, D. P. and Scientific, A. Convex optimization algorithms. Athena Scientific Belmont, 2015.
- Bonnans, J.-F., Gilbert, J. C., Lemaréchal, C., and Sagastizábal, C. A. Numerical optimization: theoretical and practical aspects. Springer Science & Business Media, 2006.
- Brock, A., Donahue, J., and Simonyan, K. Large scale gan training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018.
- Brownlee, J. Better Deep Learning: Train Faster, Reduce Overfitting, and Make Better Predictions. Machine Learning Mastery, 2018.
- Chen, R., Menickelly, M., and Scheinberg, K. Stochastic optimization using a trust-region method and random models. Mathematical Programming, 169(2):447–487, 2018.
- Cohen, N. and Shashua, A. Inductive bias of deep convolutional networks through pooling geometry. arXiv preprint arXiv:1605.06743, 2016.
- Conn, A. R., Gould, N. I., and Toint, P. L. Trust region methods. SIAM, 2000.
- Dolan, E. D. and Moré, J. J. Benchmarking optimization software with performance profiles. Mathematical programming, 91(2):201–213, 2002.
- Fey, M., Eric Lenssen, J., Weichert, F., and Müller, H. Splinecnn: Fast geometric deep learning with continuous b-spline kernels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 869–877, 2018.
- Flaxman, A. D., Kalai, A. T., and McMahan, H. B. Online convex optimization in the bandit setting: gradient descent without a gradient. arXiv preprint cs/0408007, 2004.
- Golovin, D., Solnik, B., Moitra, S., Kochanski, G., Karro, J., and Sculley, D. Google vizier: A service for black-box optimization. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pp. 1487–1495, 2017.
- Hansen, N., Auger, A., Ros, R., Finck, S., and Pošík, P. Comparing results of 31 algorithms from the black-box optimization benchmarking bbob-2009. In Proceedings of the 12th annual conference companion on Genetic and evolutionary computation, pp. 1689–1696, 2010.
- Hansen, N., Brockhoff, D., Mersmann, O., Tušar, T., Tušar, D., ElHara, O. A., Sampaio, P. R., Atamna, A., Varelas, K., Batu, U., Nguyen, D. M., Matzner, F., and Auger, A. COmparing Continuous Optimizers: numbbo/COCO on Github, March 2019. URL https://doi.org/10.5281/zenodo.2594848.
- Kazemi, V. and Sullivan, J. One millisecond face alignment with an ensemble of regression trees. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1867–1874, 2014.
- Kingma, D. P. and Welling, M. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
- Lee, J. D., Simchowitz, M., Jordan, M. I., and Recht, B. Gradient descent converges to minimizers. arXiv preprint arXiv:1602.04915, 2016.
- Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.
- Liu, Z., Luo, P., Wang, X., and Tang, X. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), December 2015.
- Maheswaranathan, N., Metz, L., Tucker, G., Choi, D., and Sohl-Dickstein, J. Guided evolutionary strategies: Augmenting random search with surrogate gradients. arXiv preprint arXiv:1806.10230, 2018.
- Mania, H., Guy, A., and Recht, B. Simple random search provides a competitive approach to reinforcement learning. arXiv preprint arXiv:1803.07055, 2018.
- Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al. Human-level control through deep reinforcement learning. Nature, 518(7540): 529–533, 2015.
- Nelder, J. A. and Mead, R. A simplex method for function minimization. The computer journal, 7(4):308–313, 1965.
- Nesterov, Y. Introductory lectures on convex optimization: A basic course, volume 87. Springer Science & Business Media, 2013.
- Nocedal, J. and Wright, S. Numerical optimization. Springer Science & Business Media, 2006.
- Pan, Z., Yu, W., Yi, X., Khan, A., Yuan, F., and Zheng, Y. Recent progress on generative adversarial networks (gans): A survey. IEEE Access, 7:36322–36333, 2019.
- Powell, M. J. An efficient method for finding the minimum of a function of several variables without calculating derivatives. The computer journal, 7(2):155–162, 1964.
- Powell, M. J. A view of algorithms for optimization without derivatives. Mathematics Today-Bulletin of the Institute of Mathematics and its Applications, 43(5):170–174, 2007.
- Reinsch, C. H. Smoothing by spline functions. Numerische mathematik, 10(3):177–183, 1967.
- Rios, L. M. and Sahinidis, N. V. Derivative-free optimization: a review of algorithms and comparison of software implementations. Journal of Global Optimization, 56(3): 1247–1293, 2013.
- Rosemarin, H., Rosenfeld, A., and Kraus, S. Emergency department online patient-caregiver scheduling. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pp. 695–701, 2019.
- Ruder, S. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747, 2016.
- Saremi, S. On approximating ∇f with neural networks. arXiv preprint arXiv:1910.12744, 2019.
- Schulman, J., Levine, S., Abbeel, P., Jordan, M., and Moritz, P. Trust region policy optimization. In International Conference on Machine Learning, pp. 1889–1897, 2015.
- Sener, O. and Koltun, V. Learning to guide random search. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=B1gHokBKwS.
- Shewchuk, J. R. et al. An introduction to the conjugate gradient method without the agonizing pain, 1994.
- Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., et al. Mastering the game of go without human knowledge. Nature, 550(7676):354, 2017.
- Snoek, J., Rippel, O., Swersky, K., Kiros, R., Satish, N., Sundaram, N., Patwary, M., Prabhat, M., and Adams, R. Scalable bayesian optimization using deep neural networks. In International conference on machine learning, pp. 2171–2180, 2015.
- Vemula, A., Sun, W., and Bagnell, J. A. Contrasting exploration in parameter and action space: A zeroth-order optimization perspective. arXiv preprint arXiv:1901.11503, 2019.
- Volz, V., Schrum, J., Liu, J., Lucas, S. M., Smith, A., and Risi, S. Evolving mario levels in the latent space of a deep convolutional generative adversarial network. In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 221–228, 2018.
- Xiao, H., Rasul, K., and Vollgraf, R. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms, 2017.
- Yuan, X., He, P., Zhu, Q., and Li, X. Adversarial examples: Attacks and defenses for deep learning. IEEE transactions on neural networks and learning systems, 30(9): 2805–2824, 2019.
- Zhen, L., Wang, K., Hu, H., and Chang, D. A simulation optimization framework for ambulance deployment and relocation problems. Computers & Industrial Engineering, 72:12–23, 2014.
