We proposed a method for efficient black-box optimization over high-dimensional, structured input spaces, combining latent space optimization with weighted retraining
Sample-Efficient Optimization in the Latent Space of Deep Generative Models via Weighted Retraining
NeurIPS 2020 (2020)
Many important problems in science and engineering, such as drug design, involve optimizing an expensive black-box objective function over a complex, high-dimensional, and structured input space. Although machine learning techniques have shown promise in solving such problems, existing approaches substantially lack sample efficiency. We ...
- Many important problems in science and engineering can be formulated as optimizing an objective function over an input space.
- Machine learning has shown promising results in many problems that can be framed as optimization, such as conditional image [57, 40] and text generation, molecular and materials design [11, 47], and neural architecture search.
- Despite these successes, applying machine learning to structured input spaces with limited data remains an open research area, which makes machine learning infeasible for many practical applications
- While a large body of work is applicable to the general problem formulated in Section 2, we focus only on the most relevant machine learning literature
- We proposed a method for efficient black-box optimization over high-dimensional, structured input spaces, combining latent space optimization with weighted retraining
- We showed that while being conceptually simple and easy to implement on top of previous methods, weighted retraining significantly boosts their efficiency and performance on challenging real-world optimization problems
- We observed that weighted retraining was less beneficial when used with poorly-performing optimization algorithms
- A further interesting direction would be to consider different classes of weighting functions, such as those that are robust to noise in the objective function evaluations
- Similar to Fig. 3, the performance can be seen to significantly improve immediately after several of the retraining steps, suggesting that the retraining does incorporate new information into the latent space, as conjectured.
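The weighted retraining idea summarized in the bullets above can be sketched as rank-based sample weights: each training point's weight depends only on how its objective value ranks within the dataset, controlled by a concentration hyperparameter `k`. The exact functional form and the value of `k` below are illustrative assumptions, not necessarily the paper's precise choice.

```python
import numpy as np

def rank_weights(scores, k=1e-3):
    """Rank-based weights for weighted retraining (maximization).

    w_i is proportional to 1 / (k*N + rank_i), where rank 0 is the
    best-scoring point. Smaller k concentrates the weight mass on
    the top of the dataset; large k approaches uniform weighting.
    """
    scores = np.asarray(scores, dtype=float)
    N = len(scores)
    # double-argsort trick: rank 0 = highest score
    ranks = np.argsort(np.argsort(-scores))
    weights = 1.0 / (k * N + ranks)
    return weights / weights.sum()  # normalize to a distribution

# Example: the last point has the best score, so it gets the most weight
w = rank_weights([0.1, 0.5, 0.3, 0.2, 0.9], k=1e-3)
```

These weights would then be used either to resample the dataset or as per-example loss weights when retraining the generative model.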
- The latent space of DGMs can be challenging to optimize over, motivating further research to optimize more effectively and/or make the space more amenable to optimization
- Another promising idea is the use of a weighting schedule instead of a fixed weighting, which may allow balancing exploration vs. exploitation, similar to simulated annealing.
- The authors envision weighted retraining to become a core component of model-based optimization methods, further establishing machine learning as a critical tool for advancing science and engineering
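The weighting-schedule idea mentioned above could be realized by annealing the rank-weighting hyperparameter over retraining iterations: start near-uniform (exploration) and gradually concentrate weight on top scorers (exploitation). The geometric decay and the parameter values below are hypothetical choices for illustration.

```python
def k_schedule(t, k_start=1.0, k_end=1e-3, n_steps=10):
    """Geometric anneal of a rank-weighting hyperparameter k.

    Large k gives near-uniform weights (exploration); small k
    concentrates weight on high-scoring points (exploitation),
    loosely analogous to a simulated-annealing temperature.
    """
    t = min(t, n_steps)
    return k_start * (k_end / k_start) ** (t / n_steps)
```

At each retraining step `t`, one would recompute the sample weights with `k = k_schedule(t)` before retraining the model.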
- Table 1: Comparison of top 3 scores on the chemical design task. Baseline results are copied from [65]. All our results state the worst of 3 runs (unless otherwise stated), each run being 500 epochs
- Table 2: Approximate runtimes of main experiments
- While a large body of work is applicable to the general problem formulated in Section 2 (both using and not using machine learning), in this section we focus only on the most relevant machine learning literature. Early formulations of LSO were motivated by scaling Gaussian processes to high-dimensional problems with simple linear manifolds, using either random projections or a learned transformation matrix. LSO using DGMs was first applied to chemical design, and further built upon in subsequent papers [21, 29, 9, 24, 5, 15, 35]. It has also been applied to other fields such as automated machine learning [33, 34] and conditional image generation [41, 40]. If the optimization model is a Gaussian process, the DGM can be viewed as a form of "extended kernel", making LSO conceptually related to deep kernel learning [63, 19].
There are several previous papers that use ideas closely related to weighted retraining. Perhaps the closest method to ours is the Feedback GAN, wherein samples are generated with a GAN and evaluated, discarding samples with low scores. The n surviving samples replace the n oldest points in the training set, after which the GAN is retrained on the new dataset. This can be viewed as a crude version of weighted retraining, using only the weights 0 and 1/N, and assigning weights based not on scores but on novelty. Similarly, another approach trains a generative model on drug-like molecules, then repeatedly samples it, evaluating all samples and keeping only those with high scores. The model is then fine-tuned on the high-scoring samples, and this process is repeated. Again, this can be viewed as a special case of weighted retraining, where the weights are implicitly defined by the number of fine-tuning epochs. Furthermore, both of these techniques are purely generative and have no optimization component, so we believe that they are fundamentally sample-inefficient.
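The Feedback-GAN-style update described above (new high-scoring samples displace the oldest training points, leaving every surviving point at uniform weight 1/N and every replaced point at weight 0) can be sketched as a simple dataset operation. This is an illustrative reconstruction of the scheme as described, not the original implementation, and it assumes `new_samples` have already been filtered for high scores.

```python
def feedback_replace(dataset, ages, new_samples):
    """Feedback-GAN-style dataset update as binary (0 or 1/N) weighting.

    The n new samples replace the n oldest training points; each
    surviving point implicitly keeps uniform weight 1/N, and each
    replaced point gets weight 0.
    """
    n = len(new_samples)
    # indices of the n oldest points (largest age), to be dropped
    oldest = set(sorted(range(len(dataset)),
                        key=lambda i: ages[i], reverse=True)[:n])
    keep = [x for i, x in enumerate(dataset) if i not in oldest]
    return keep + list(new_samples)
```

Contrasting this with the rank-based weighting above makes the "crude special case" framing concrete: replacement is just a hard 0/1 weighting keyed on age rather than a smooth weighting keyed on score.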
- AT acknowledges funding via a C T Taylor Cambridge International Scholarship
- ED acknowledges funding by the EPSRC and Qualcomm
- This work has been performed using resources provided by the Cambridge Tier-2 system operated by the University of Cambridge Research Computing Service (http://www.hpc.cam.ac.uk) funded by EPSRC Tier-2 capital grant EP/P020259/1
- R. Baptista and M. Poloczek. Bayesian optimization of combinatorial structures. arXiv preprint arXiv:1806.08838, 2018.
- S. R. Bowman, L. Vilnis, O. Vinyals, A. M. Dai, R. Jozefowicz, and S. Bengio. Generating Sentences from a Continuous Space. arXiv:1511.06349 [cs], May 2016. arXiv: 1511.06349.
- E. Brochu, V. M. Cora, and N. De Freitas. A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599, 2010.
- D. H. Brookes and J. Listgarten. Design by adaptive sampling. arXiv:1810.03714 [cs, q-bio, stat], Feb 2020. arXiv: 1810.03714.
- H. Dai, Y. Tian, B. Dai, S. Skiena, and L. Song. Syntax-Directed Variational Autoencoder for Structured Data. arXiv:1802.08786 [cs], Feb. 2018. arXiv: 1802.08786.
- E. Daxberger, A. Makarova, M. Turchetta, and A. Krause. Mixed-variable bayesian optimization. arXiv preprint arXiv:1907.01329, 2019.
- N. De Cao and T. Kipf. MolGAN: An implicit generative model for small molecular graphs. arXiv:1805.11973 [cs, stat], May 2018. arXiv: 1805.11973.
- A. G. De G. Matthews, M. Van Der Wilk, T. Nickson, K. Fujii, A. Boukouvalas, P. León-Villagrá, Z. Ghahramani, and J. Hensman. Gpflow: A gaussian process library using tensorflow. The Journal of Machine Learning Research, 18(1):1299–1304, 2017.
- S. Eismann, D. Levy, R. Shu, S. Bartzsch, and S. Ermon. Bayesian optimization and attribute adjustment. In Proceedings of the Thirty-Fourth Conference (2018), page 11, Monterey, California, USA, Aug. 2018. Association for Uncertainty in Artificial Intelligence.
- T. Elsken, J. H. Metzen, and F. Hutter. Neural Architecture Search: A Survey. Journal of Machine Learning Research, 20(55):1–21, 2019.
- D. Elton, Z. Boukouvalas, M. Fuge, and P. Chung. Deep learning for molecular design—a review of the state of the art. Molecular Systems Design & Engineering, 4(4):828–849, 2019. Publisher: Royal Society of Chemistry.
- R. Garnett, M. A. Osborne, and P. Hennig. Active Learning of Linear Embeddings for Gaussian Processes. arXiv:1310.6740 [cs, stat], Oct. 2013. arXiv: 1310.6740.
- R. Gómez-Bombarelli, J. N. Wei, D. Duvenaud, J. M. Hernández-Lobato, B. Sánchez-Lengeling, D. Sheberla, J. Aguilera-Iparraguirre, T. D. Hirzel, R. P. Adams, and A. Aspuru-Guzik. Automatic chemical design using a data-driven continuous representation of molecules. ACS central science, 4(2):268–276, 2018.
- I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
- R.-R. Griffiths and J. Miguel Hernández-Lobato. Constrained Bayesian optimization for automatic chemical design using variational autoencoders. Chemical Science, 11(2):577–586, 2020. Publisher: Royal Society of Chemistry.
- G. L. Guimaraes, B. Sanchez-Lengeling, C. Outeiral, P. L. C. Farias, and A. Aspuru-Guzik. Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models. arXiv:1705.10843 [cs, stat], Feb. 2018. arXiv: 1705.10843.
- A. Gupta and J. Zou. Feedback GAN for DNA optimizes protein functions. Nature Machine Intelligence, 1(2):105–111, Feb. 2019. Number: 2 Publisher: Nature Publishing Group.
- T. N. Hoang, Q. M. Hoang, R. Ouyang, and K. H. Low. Decentralized high-dimensional bayesian optimization with factor graphs. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
- W. Huang, D. Zhao, F. Sun, H. Liu, and E. Chang. Scalable Gaussian Process Regression Using Deep Neural Networks. In Twenty-Fourth International Joint Conference on Artificial Intelligence, June 2015.
- J. J. Irwin, T. Sterling, M. M. Mysinger, E. S. Bolstad, and R. G. Coleman. ZINC: A Free Tool to Discover Chemistry for Biology. Journal of Chemical Information and Modeling, 52(7):1757–1768, July 2012. Publisher: American Chemical Society.
- W. Jin, R. Barzilay, and T. Jaakkola. Junction Tree Variational Autoencoder for Molecular Graph Generation. arXiv:1802.04364 [cs, stat], Mar. 2019. arXiv: 1802.04364.
- W. Jin, K. Yang, R. Barzilay, and T. Jaakkola. Learning Multimodal Graph-to-Graph Translation for Molecular Optimization. arXiv:1812.01070 [cs, stat], Jan. 2019. arXiv: 1812.01070.
- D. R. Jones, M. Schonlau, and W. J. Welch. Efficient global optimization of expensive black-box functions. Journal of Global optimization, 13(4):455–492, 1998.
- H. Kajino. Molecular Hypergraph Grammar with Its Application to Molecular Optimization. In International Conference on Machine Learning, pages 3183–3191, May 2019. ISSN: 1938-7228 Section: Machine Learning.
- K. Kandasamy, J. Schneider, and B. Póczos. High dimensional bayesian optimisation and bandits via additive models. In International Conference on Machine Learning, pages 295–304, 2015.
- S. Kang and K. Cho. Conditional Molecular Design with Deep Generative Models. Journal of Chemical Information and Modeling, 59(1):43–52, Jan. 2019.
- J. Kim, M. McCourt, T. You, S. Kim, and S. Choi. Bayesian optimization over sets. arXiv preprint arXiv:1905.09780, 2019.
- D. P. Kingma and M. Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
- M. J. Kusner, B. Paige, and J. M. Hernández-Lobato. Grammar variational autoencoder. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML’17, pages 1945–1954, Sydney, NSW, Australia, Aug. 2017. JMLR.org.
- Y. Li. Deep Reinforcement Learning. arXiv:1810.06339 [cs, stat], Oct. 2018. arXiv: 1810.06339.
- Y. Li, L. Zhang, and Z. Liu. Multi-objective de novo drug design with conditional graph generative model. Journal of Cheminformatics, 10(1):33, July 2018.
- J. Lim, S. Ryu, J. W. Kim, and W. Y. Kim. Molecular generative model based on conditional variational autoencoder for de novo molecular design. Journal of Cheminformatics, 10(1):31, July 2018.
- X. Lu, J. Gonzalez, Z. Dai, and N. Lawrence. Structured variationally auto-encoded optimization. In International Conference on Machine Learning, pages 3267–3275, 2018.
- R. Luo, F. Tian, T. Qin, E. Chen, and T.-Y. Liu. Neural architecture optimization. In Advances in neural information processing systems, pages 7816–7827, 2018.
- O. Mahmood and J. M. Hernández-Lobato. A COLD Approach to Generating Optimal Samples. arXiv:1905.09885 [cs, q-bio, stat], May 2019. arXiv: 1905.09885.
- L. Matthey, I. Higgins, D. Hassabis, and A. Lerchner. dsprites: Disentanglement testing sprites dataset. https://github.com/deepmind/dsprites-dataset/, 2017.
- M. McCloskey and N. J. Cohen. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem. In G. H. Bower, editor, Psychology of Learning and Motivation, volume 24, pages 109–165. Academic Press, Jan. 1989.
- M. Mirza and S. Osindero. Conditional Generative Adversarial Nets. arXiv:1411.1784 [cs, stat], Nov. 2014. arXiv: 1411.1784.
- M. Mutny and A. Krause. Efficient high dimensional bayesian optimization with additivity and quadrature fourier features. In Advances in Neural Information Processing Systems, pages 9005–9016, 2018.
- A. Nguyen, J. Clune, Y. Bengio, A. Dosovitskiy, and J. Yosinski. Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space. pages 4467–4477, 2017.
- A. Nguyen, A. Dosovitskiy, J. Yosinski, T. Brox, and J. Clune. Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 3387–3395. Curran Associates, Inc., 2016.
- C. Oh, J. M. Tomczak, E. Gavves, and M. Welling. Combinatorial bayesian optimization using graph representations. arXiv preprint arXiv:1902.00448, 2019.
- M. Olivecrona, T. Blaschke, O. Engkvist, and H. Chen. Molecular de-novo design through deep reinforcement learning. Journal of Cheminformatics, 9(1):48, Sept. 2017.
- D. W. Otter, J. R. Medina, and J. K. Kalita. A Survey of the Usages of Deep Learning for Natural Language Processing. IEEE Transactions on Neural Networks and Learning Systems, pages 1–21, 2020. Conference Name: IEEE Transactions on Neural Networks and Learning Systems.
- M. Popova, O. Isayev, and A. Tropsha. Deep reinforcement learning for de novo drug design. Science Advances, 4(7):eaap7885, July 2018.
- D. J. Rezende, S. Mohamed, and D. Wierstra. Stochastic backpropagation and approximate inference in deep generative models. arXiv preprint arXiv:1401.4082, 2014.
- B. Sanchez-Lengeling and A. Aspuru-Guzik. Inverse molecular design using machine learning: Generative models for matter engineering. Science, 361(6400):360–365, July 2018.
- M. H. S. Segler, T. Kogej, C. Tyrchan, and M. P. Waller. Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks. ACS Central Science, 4(1):120–131, Jan. 2018. Publisher: American Chemical Society.
- O. Sener and S. Savarese. Active Learning for Convolutional Neural Networks: A Core-Set Approach. arXiv:1708.00489 [cs, stat], June 2018. arXiv: 1708.00489.
- B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. De Freitas. Taking the human out of the loop: A review of bayesian optimization. Proceedings of the IEEE, 104(1):148–175, 2015.
- G. N. C. Simm, R. Pinsler, and J. M. Hernández-Lobato. Reinforcement Learning for Molecular Design Guided by Quantum Mechanics. arXiv:2002.07717 [cs, stat], Feb. 2020. arXiv: 2002.07717.
- M. Simonovsky and N. Komodakis. GraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders. Feb. 2018.
- J. Snoek, H. Larochelle, and R. P. Adams. Practical bayesian optimization of machine learning algorithms. In Advances in neural information processing systems, pages 2951–2959, 2012.
- K. Sohn, H. Lee, and X. Yan. Learning Structured Output Representation using Deep Conditional Generative Models. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 3483–3491. Curran Associates, Inc., 2015.
- R. S. Sutton, A. G. Barto, et al. Introduction to reinforcement learning, volume 135. MIT press Cambridge, 1998.
- M. Titsias. Variational learning of inducing variables in sparse gaussian processes. In Artificial Intelligence and Statistics, pages 567–574, 2009.
- A. van den Oord, N. Kalchbrenner, L. Espeholt, K. Kavukcuoglu, O. Vinyals, and A. Graves. Conditional Image Generation with PixelCNN Decoders. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 4790–4798. Curran Associates, Inc., 2016.
- P. J. Van Laarhoven and E. H. Aarts. Simulated annealing. In Simulated annealing: Theory and applications, pages 7–15.
- H. Wang, J. Wang, J. Wang, M. Zhao, W. Zhang, F. Zhang, X. Xie, and M. Guo. GraphGAN: Graph Representation Learning With Generative Adversarial Nets. In Thirty-Second AAAI Conference on Artificial Intelligence, Apr. 2018.
- Z. Wang, M. Zoghi, F. Hutter, D. Matheson, and N. De Freitas. Bayesian Optimization in High Dimensions via Random Embeddings. In Twenty-Third International Joint Conference on Artificial Intelligence, June 2013.
- T. White. Sampling Generative Networks. arXiv:1609.04468 [cs, stat], Dec. 2016. arXiv: 1609.04468.
- C. K. Williams and C. E. Rasmussen. Gaussian processes for machine learning, volume 2. MIT press Cambridge, MA, 2006.
- A. G. Wilson, Z. Hu, R. Salakhutdinov, and E. P. Xing. Deep Kernel Learning. In Artificial Intelligence and Statistics, pages 370–378, May 2016. ISSN: 1938-7228 Section: Machine Learning.
- J. You, B. Liu, Z. Ying, V. Pande, and J. Leskovec. Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 6410–6421. Curran Associates, Inc., 2018.
- Z. Zhou, S. Kearnes, L. Li, R. N. Zare, and P. Riley. Optimization of Molecules via Deep Reinforcement Learning. Scientific Reports, 9(1):1–10, July 2019. Number: 1 Publisher: Nature Publishing Group.