Explicit Gradient Learning for Black-Box Optimization

Elad Sarafian
Mor Sinay
Yoram Louzoun

ICML, pp. 8480-8490, 2020.


Abstract:

Black-Box Optimization (BBO) methods can find optimal policies for systems that interact with complex environments with no analytical representation. As such, they are of interest in many Artificial Intelligence (AI) domains. Yet classical BBO methods fall short in high-dimensional non-convex problems. They are thus often overlooked in re…

Introduction
  • Optimization problems are prevalent in many artificial intelligence applications, from search-and-rescue optimal deployment (Zhen et al., 2014) to triage policy in emergency rooms (Rosemarin et al., 2019) to hyperparameter tuning in machine learning (Bardenet et al., 2013)
  • In these tasks, the objective is to find a policy that minimizes a cost or maximizes a reward.
  • Derivative-based methods are restricted to differentiable functions, but here the authors show that EGL can be applied successfully whenever the objective function is merely locally integrable
Highlights
  • Optimization problems are prevalent in many artificial intelligence applications, from search-and-rescue optimal deployment (Zhen et al., 2014) to triage policy in emergency rooms (Rosemarin et al., 2019) to hyperparameter tuning in machine learning (Bardenet et al., 2013)
  • Many Black-Box Optimization methods operate with a two-phase iterative algorithm: (1) search or collect data with some heuristic; and (2) update a model to obtain a new candidate solution and improve the heuristic
  • We presented Explicit Gradient Learning, a derivative-based Black-Box Optimization algorithm that achieves state-of-the-art results on a wide range of optimization problems
  • Decreasing the smoothness factor lets Explicit Gradient Learning converge to a local minimum
  • The concept of Explicit Gradient Learning can be generalized to other related fields, such as sequential decision-making problems (i.e. Reinforcement Learning), by directly learning the gradient of the Q-function
  • We demonstrated the use of Explicit Gradient Learning in an applicative high-dimensional Black-Box problem, searching the latent space of generative models
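The two-phase pattern named in the highlights (search with a heuristic, then update a model to propose the next candidate) can be sketched generically. This is a minimal illustration, not the paper's method: the function name `two_phase_bbo` is hypothetical, and the "model" in phase 2 is deliberately trivial (keep the best point seen), standing in for the surrogate a real method would fit.

```python
import random

def two_phase_bbo(f, x0, rounds=50, batch=8, sigma=0.3, seed=0):
    """Generic two-phase Black-Box Optimization skeleton.

    Phase 1: search -- collect new evaluations of f around the incumbent
    with a simple Gaussian-perturbation heuristic.
    Phase 2: update -- refit a (here, trivial) model from the evaluations
    and use it to pick the next candidate. Real methods replace this step
    with a surrogate: a Gaussian process, a gradient network, etc.
    """
    rng = random.Random(seed)
    best = list(x0)
    for _ in range(rounds):
        # Phase 1: search around the incumbent.
        points = [[xi + rng.gauss(0.0, sigma) for xi in best]
                  for _ in range(batch)]
        # Phase 2: update the incumbent from the collected evaluations.
        best = min(points + [best], key=f)
    return best
```

With the best-of-batch "model", this reduces to random local search; swapping phase 2 is exactly where methods such as CMA-ES, Bayesian optimization, or EGL differ.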
Methods
  • Design & Analysis: the authors lay out the practical EGL algorithm and analyze its asymptotic properties (Section 4.1).
  • For a model gθ : Ω → R^n and a dataset Dk = {(xi, yi)}, i = 1, …, m, define the loss function Lk,ε(θ) = Σ_{i=1..m} Σ_{xj ∈ Vε(xi)} |gθ(xi)·(xj − xi) − (yj − yi)|², and learn θk∗ = arg minθ Lk,ε(θ), e.g. with gradient descent.
  • This formulation can be used to estimate the mean-gradient for any x.
  • The authors assume that the dataset Dk holds samples only from Vε
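The loop above can be sketched in NumPy. This is a simplified illustration, not the paper's implementation: the per-iterate least-squares fit below stands in for the neural gradient model gθ, and the names (`egl_minimize`, `decay`) are illustrative. It keeps the key ingredients: fitting a gradient estimate from Taylor residuals over an ε-ball Vε, descending along it, and shrinking ε (the smoothness factor) over time.

```python
import numpy as np

def egl_minimize(f, x0, eps=0.5, lr=0.2, samples=32, iters=200,
                 decay=0.97, seed=0):
    """Minimal Explicit-Gradient-Learning-style loop (local linear model).

    At each iterate x, draw perturbations tau inside an eps-ball, fit a
    gradient estimate g by least squares on the Taylor residual
    |g . tau - (f(x + tau) - f(x))|^2, then take a descent step along g.
    Annealing eps sharpens the mean-gradient estimate toward the true
    gradient, so the loop converges to a local minimum.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        tau = rng.uniform(-eps, eps, size=(samples, x.size))
        fx = f(x)
        dy = np.array([f(x + t) - fx for t in tau])
        # Least-squares fit of the mean gradient:
        # g = argmin_g sum_j |tau_j . g - dy_j|^2
        g, *_ = np.linalg.lstsq(tau, dy, rcond=None)
        x = x - lr * g
        eps *= decay  # shrink the smoothing ball
    return x
```

The closed-form least-squares fit plays the role of minimizing Lk,ε(θ) over a single ε-ball; EGL instead amortizes this fit in a network gθ that generalizes across the domain.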
Conclusion
  • The authors presented EGL, a derivative-based BBO algorithm that achieves state-of-the-art results on a wide range of optimization problems.
  • Starting with a high smoothness factor lets EGL find broad regions of the function with low valleys.
  • Decreasing it lets EGL converge to a local minimum.
  • The concept of EGL can be generalized to other related fields, such as sequential decision-making problems (i.e. Reinforcement Learning), by directly learning the gradient of the Q-function.
  • The authors demonstrated the use of EGL in an applicative high-dimensional Black-Box problem, searching the latent space of generative models
Related work
  • BBO problems have been studied in multiple fields with diverse approaches. Many works investigated derivative-free methods (Rios & Sahinidis, 2013), from the classic Nelder–Mead algorithm (Nelder & Mead, 1965) and Powell's method (Powell, 1964) to more recent evolutionary algorithms such as CMA-ES (Hansen, 2006). Another line of research is derivative-based algorithms, which first approximate the gradient and then apply line-search methods such as the Conjugate Gradient (CG) method (Shewchuk, 1994) and quasi-Newton methods, e.g. BFGS (Nocedal & Wright, 2006). Other model-based methods such as SLSQP (Bonnans et al., 2006) and COBYLA (Powell, 2007) iteratively solve quadratic or linear approximations of the objective function. Some variants apply trust-region methods and iteratively find an optimum within a trusted subset of the domain (Conn et al., 2009; Chen et al., 2018). Another line of research focuses on stochastic discrete problems, e.g. Bayesian methods (Snoek et al., 2015) and multi-armed bandit problems (Flaxman et al., 2004).
Funding
  • This work was supported in part by the Ministry of Science & Technology, Israel
References
  • Audet, C. and Hare, W. Derivative-free and blackbox optimization. Springer, 2017.
  • Back, T. Evolutionary algorithms in theory and practice: evolution strategies, evolutionary programming, genetic algorithms. Oxford University Press, 1996.
  • Balandat, M., Karrer, B., Jiang, D. R., Daulton, S., Letham, B., Wilson, A. G., and Bakshy, E. BoTorch: Programmable Bayesian Optimization in PyTorch. arXiv e-prints, 2019. URL http://arxiv.org/abs/1910.06403.
  • Bardenet, R., Brendel, M., Kegl, B., and Sebag, M. Collaborative hyperparameter tuning. In International Conference on Machine Learning, pp. 199–207, 2013.
  • Bau, D., Zhu, J.-Y., Wulff, J., Peebles, W., Strobelt, H., Zhou, B., and Torralba, A. Seeing what a GAN cannot generate. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4502–4511, 2019.
  • Bertsekas, D. P. Convex optimization algorithms. Athena Scientific, 2015.
  • Bonnans, J.-F., Gilbert, J. C., Lemarechal, C., and Sagastizabal, C. A. Numerical optimization: theoretical and practical aspects. Springer Science & Business Media, 2006.
  • Brock, A., Donahue, J., and Simonyan, K. Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018.
  • Brownlee, J. Better Deep Learning: Train Faster, Reduce Overfitting, and Make Better Predictions. Machine Learning Mastery, 2018.
  • Chen, R., Menickelly, M., and Scheinberg, K. Stochastic optimization using a trust-region method and random models. Mathematical Programming, 169(2):447–487, 2018.
  • Cohen, N. and Shashua, A. Inductive bias of deep convolutional networks through pooling geometry. arXiv preprint arXiv:1605.06743, 2016.
  • Conn, A. R., Gould, N. I., and Toint, P. L. Trust region methods. SIAM, 2000.
  • Dolan, E. D. and More, J. J. Benchmarking optimization software with performance profiles. Mathematical Programming, 91(2):201–213, 2002.
  • Fey, M., Eric Lenssen, J., Weichert, F., and Muller, H. SplineCNN: Fast geometric deep learning with continuous B-spline kernels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 869–877, 2018.
  • Flaxman, A. D., Kalai, A. T., and McMahan, H. B. Online convex optimization in the bandit setting: gradient descent without a gradient. arXiv preprint cs/0408007, 2004.
  • Golovin, D., Solnik, B., Moitra, S., Kochanski, G., Karro, J., and Sculley, D. Google Vizier: A service for black-box optimization. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1487–1495, 2017.
  • Hansen, N., Auger, A., Ros, R., Finck, S., and Posık, P. Comparing results of 31 algorithms from the black-box optimization benchmarking BBOB-2009. In Proceedings of the 12th Annual Conference Companion on Genetic and Evolutionary Computation, pp. 1689–1696, 2010.
  • Hansen, N., Brockhoff, D., Mersmann, O., Tusar, T., Tusar, D., ElHara, O. A., Sampaio, P. R., Atamna, A., Varelas, K., Batu, U., Nguyen, D. M., Matzner, F., and Auger, A. COmparing Continuous Optimizers: numbbo/COCO on GitHub, March 2019. URL https://doi.org/10.5281/zenodo.2594848.
  • Kazemi, V. and Sullivan, J. One millisecond face alignment with an ensemble of regression trees. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1867–1874, 2014.
  • Kingma, D. P. and Welling, M. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
  • Lee, J. D., Simchowitz, M., Jordan, M. I., and Recht, B. Gradient descent converges to minimizers. arXiv preprint arXiv:1602.04915, 2016.
  • Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.
  • Liu, Z., Luo, P., Wang, X., and Tang, X. Deep learning face attributes in the wild. In Proceedings of the International Conference on Computer Vision (ICCV), December 2015.
  • Maheswaranathan, N., Metz, L., Tucker, G., Choi, D., and Sohl-Dickstein, J. Guided evolutionary strategies: Augmenting random search with surrogate gradients. arXiv preprint arXiv:1806.10230, 2018.
  • Mania, H., Guy, A., and Recht, B. Simple random search provides a competitive approach to reinforcement learning. arXiv preprint arXiv:1803.07055, 2018.
  • Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
  • Nelder, J. A. and Mead, R. A simplex method for function minimization. The Computer Journal, 7(4):308–313, 1965.
  • Nesterov, Y. Introductory lectures on convex optimization: A basic course, volume 87. Springer Science & Business Media, 2013.
  • Nocedal, J. and Wright, S. Numerical optimization. Springer Science & Business Media, 2006.
  • Pan, Z., Yu, W., Yi, X., Khan, A., Yuan, F., and Zheng, Y. Recent progress on generative adversarial networks (GANs): A survey. IEEE Access, 7:36322–36333, 2019.
  • Powell, M. J. An efficient method for finding the minimum of a function of several variables without calculating derivatives. The Computer Journal, 7(2):155–162, 1964.
  • Powell, M. J. A view of algorithms for optimization without derivatives. Mathematics Today-Bulletin of the Institute of Mathematics and its Applications, 43(5):170–174, 2007.
  • Reinsch, C. H. Smoothing by spline functions. Numerische Mathematik, 10(3):177–183, 1967.
  • Rios, L. M. and Sahinidis, N. V. Derivative-free optimization: a review of algorithms and comparison of software implementations. Journal of Global Optimization, 56(3):1247–1293, 2013.
  • Rosemarin, H., Rosenfeld, A., and Kraus, S. Emergency department online patient-caregiver scheduling. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pp. 695–701, 2019.
  • Ruder, S. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747, 2016.
  • Saremi, S. On approximating ∇f with neural networks. arXiv preprint arXiv:1910.12744, 2019.
  • Schulman, J., Levine, S., Abbeel, P., Jordan, M., and Moritz, P. Trust region policy optimization. In International Conference on Machine Learning, pp. 1889–1897, 2015.
  • Sener, O. and Koltun, V. Learning to guide random search. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=B1gHokBKwS.
  • Shewchuk, J. R. An introduction to the conjugate gradient method without the agonizing pain, 1994.
  • Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., et al. Mastering the game of Go without human knowledge. Nature, 550(7676):354, 2017.
  • Snoek, J., Rippel, O., Swersky, K., Kiros, R., Satish, N., Sundaram, N., Patwary, M., Prabhat, M., and Adams, R. Scalable Bayesian optimization using deep neural networks. In International Conference on Machine Learning, pp. 2171–2180, 2015.
  • Vemula, A., Sun, W., and Bagnell, J. A. Contrasting exploration in parameter and action space: A zeroth-order optimization perspective. arXiv preprint arXiv:1901.11503, 2019.
  • Volz, V., Schrum, J., Liu, J., Lucas, S. M., Smith, A., and Risi, S. Evolving Mario levels in the latent space of a deep convolutional generative adversarial network. In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 221–228, 2018.
  • Xiao, H., Rasul, K., and Vollgraf, R. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms, 2017.
  • Yuan, X., He, P., Zhu, Q., and Li, X. Adversarial examples: Attacks and defenses for deep learning. IEEE Transactions on Neural Networks and Learning Systems, 30(9):2805–2824, 2019.
  • Zhen, L., Wang, K., Hu, H., and Chang, D. A simulation optimization framework for ambulance deployment and relocation problems. Computers & Industrial Engineering, 72:12–23, 2014.