High-Dimensional Contextual Policy Search with Unknown Context Rewards using Bayesian Optimization

Qing Feng
Ben Letham

NeurIPS 2020.


Abstract:

Contextual policies are used in many settings to customize system parameters and actions to the specifics of a particular setting. In some real-world settings, such as randomized controlled trials or A/B tests, it may not be possible to measure policy outcomes at the level of context—we observe only aggregate rewards across a distribution...

Introduction
  • Contextual policies are used in a wide range of applications, such as robotics [22, 30] and computing platforms [9].
  • The optimal policy for a particular adaptive bitrate (ABR) controller may depend on the network: for instance, a stream with large fluctuations in bandwidth will benefit from different ABR parameters than a stream with stable bandwidth.
  • This motivates the use of a contextual policy where ABR parameters are personalized by context variables such as country or network type (2G, 3G, 4G, etc.).
  • Another set of methods for high-dimensional BO has assumed low-dimensional linear [44, 6, 36, 7, 34, 25] or nonlinear [15, 27, 32] structure in the problem.
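The aggregated setting described above can be made concrete with a small sketch. Everything here (context names, population weights, and the per-context reward function) is hypothetical and for illustration only: each context has its own parameter vector, so the joint policy space is high-dimensional, but only the population-weighted aggregate reward is observable, as in an A/B test.

```python
import numpy as np

# Hypothetical contexts (e.g., network types) with population weights w_c.
contexts = ["2g", "3g", "4g", "wifi"]
weights = np.array([0.1, 0.2, 0.4, 0.3])  # context distribution, sums to 1

# A contextual policy assigns one d-dimensional parameter vector to each
# context, so the full search space has len(contexts) * d dimensions.
d = 3
rng = np.random.default_rng(0)
policy = {c: rng.uniform(size=d) for c in contexts}

# Stand-in per-context reward f(x, c); in the aggregated CPO problem this
# quantity is never observed directly.
OPTIMA = {"2g": 0.2, "3g": 0.4, "4g": 0.6, "wifi": 0.8}

def context_reward(x, c):
    return -float(np.sum((x - OPTIMA[c]) ** 2))

def aggregate_reward(policy):
    # Only this weighted average over the context distribution is
    # measurable, e.g., the top-line metric of a randomized experiment.
    return float(sum(w * context_reward(policy[c], c)
                     for w, c in zip(weights, contexts)))
```

A non-contextual policy corresponds to forcing every context to share one parameter vector; the contextual policy relaxes exactly that constraint, at the cost of a much larger search space.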
Highlights
  • Contextual policies are used in a wide range of applications, such as robotics [22, 30] and computing platforms [9]
  • (2) We develop new Gaussian process (GP) models that take advantage of the problem structure to significantly improve over existing Bayesian optimization (BO) approaches
  • (3) We provide a thorough simulation study that shows how the models scale with factors such as the number of contexts and the population distribution of contexts, considering both aggregate rewards and fairness
  • (4) We introduce a new real-world problem for contextual policy optimization (CPO), optimizing a contextual adaptive bitrate (ABR) policy, and show that our models perform best relative to a wide range of alternative approaches
  • We develop two kernels that allow for effective BO in this space by taking advantage of the particular structure of the aggregated CPO problem
  • The latent context embedding additive (LCE-A) model makes it possible to optimize in high-dimensional policy spaces by leveraging plausible inductive biases for contextual policies
Results
  • ... rapidly degraded while the other methods found significantly better policies across the full range of ...
Conclusion
  • The authors have shown that it is possible to deploy and optimize contextual policies even when rewards cannot be measured at the level of context.
  • The LCE-A model makes it possible to optimize in high-dimensional policy spaces by leveraging plausible inductive biases for contextual policies.
  • This improves top-level aggregate rewards relative to non-contextual policies, and improves the fairness of the policy by improving outcomes across all contexts.
  • The authors hope that future work can consider leveraging pre-trained, unsupervised representations of contexts to reduce the burden of learning good embeddings of contexts from scratch, which would further enable the method to scale to a very large number of contexts.
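With a kernel over full contextual policies in hand, the optimization itself is standard BO on the aggregate reward. The toy loop below is a generic sketch (a plain GP posterior with an upper-confidence-bound acquisition maximized over random candidates, on a made-up 2-D objective); the authors' actual experiments use richer models and acquisition optimization, so treat this only as the shape of the algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def objective(x):
    # Made-up stand-in for the aggregate reward of a flattened 2-D policy.
    return -float(np.sum((x - 0.3) ** 2))

def gp_posterior(X, y, Xstar, ls=0.3, noise=1e-4):
    # Standard GP regression with a squared-exponential kernel.
    def k(A, B):
        d2 = (np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :]
              - 2.0 * A @ B.T)
        return np.exp(-0.5 * d2 / ls**2)
    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(Xstar, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mu, np.maximum(var, 1e-12)

# BO loop: fit the GP to observed aggregate rewards, then evaluate the
# candidate maximizing an upper confidence bound over random proposals.
X = rng.uniform(size=(5, 2))
y = np.array([objective(x) for x in X])
for _ in range(20):
    cand = rng.uniform(size=(256, 2))
    mu, var = gp_posterior(X, (y - y.mean()) / (y.std() + 1e-9), cand)
    x_next = cand[np.argmax(mu + 2.0 * np.sqrt(var))]
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next))
```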
References
  [1] Introducing TensorFlow feature columns. https://developers.googleblog.com/2017/11/introducing-tensorflow-feature-columns.html. Accessed: 2020-06-05.
  [2] Mohan R. Akella, Rajan Batta, Moises Sudit, Peter Rogerson, and Alan Blatt. Cellular network configuration with co-channel and adjacent-channel interference constraints. Computers & Operations Research, 35(12):3738–3757, 2008.
  [3] Mauricio A. Álvarez, Lorenzo Rosasco, and Neil D. Lawrence. Kernels for vector-valued functions: A review. Foundations and Trends in Machine Learning, 4(3):195–266, 2012.
  [4] Raul Astudillo and Peter I. Frazier. Bayesian optimization of composite functions. arXiv preprint arXiv:1906.01537, 2019.
  [5] Maximilian Balandat, Brian Karrer, Daniel R. Jiang, Samuel Daulton, Benjamin Letham, Andrew Gordon Wilson, and Eytan Bakshy. BoTorch: A framework for efficient Monte-Carlo Bayesian optimization. In Advances in Neural Information Processing Systems 33, NeurIPS, 2020.
  [6] Mickaël Binois, David Ginsbourger, and Olivier Roustant. A warped kernel improving robustness in Bayesian optimization via random embeddings. In Proceedings of the International Conference on Learning and Intelligent Optimization, LION, pages 281–286, 2015.
  [7] Mickaël Binois, David Ginsbourger, and Olivier Roustant. On the choice of the low-dimensional domain for global optimization via random embeddings. Journal of Global Optimization, 76(1):69–90, 2020.
  [8] Edwin V. Bonilla, Kian Ming A. Chai, and Christopher K. I. Williams. Multi-task Gaussian process prediction. In Advances in Neural Information Processing Systems 20, NIPS, pages 153–160, 2007.
  [9] Ian Char, Youngseog Chung, Willie Neiswanger, Kirthevasan Kandasamy, Andrew Oakleigh Nelson, Mark Boyer, Egemen Kolemen, and Jeff Schneider. Offline contextual Bayesian optimization. In Advances in Neural Information Processing Systems 32, NeurIPS, pages 4627–4638, 2019.
  [10] Sam Corbett-Davies and Sharad Goel. The measure and mismeasure of fairness: A critical review of fair machine learning. arXiv preprint arXiv:1808.00023, 2018.
  [11] Jeffrey Dean and Luiz André Barroso. The tail at scale. Communications of the ACM, 56(2):74–80, February 2013.
  [12] David K. Duvenaud, Hannes Nickisch, and Carl E. Rasmussen. Additive Gaussian processes. In Advances in Neural Information Processing Systems 24, NIPS, pages 226–234, 2011.
  [13] David Eriksson, Kun Dong, Eric Lee, David Bindel, and Andrew G. Wilson. Scaling Gaussian process regression with derivatives. In Advances in Neural Information Processing Systems 31, NIPS, pages 6867–6877, 2018.
  [14] Jacob Gardner, Chuan Guo, Kilian Q. Weinberger, Roman Garnett, and Roger Grosse. Discovering and exploiting additive structure for Bayesian optimization. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS, pages 1311–1319, 2017.
  [15] Rafael Gómez-Bombarelli, Jennifer N. Wei, David Duvenaud, José Miguel Hernández-Lobato, Benjamín Sánchez-Lengeling, Dennis Sheberla, Jorge Aguilera-Iparraguirre, Timothy D. Hirzel, Ryan P. Adams, and Alán Aspuru-Guzik. Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 4(2):268–276, 2018.
  [16] Cheng Guo and Felix Berkhahn. Entity embeddings of categorical variables. arXiv preprint arXiv:1604.06737, 2016.
  [17] Nikolaus Hansen, Sibylle D. Müller, and Petros Koumoutsakos. Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). Evolutionary Computation, 11(1):1–18, 2003.
  [18] Kohei Hayashi, Takashi Takenouchi, Ryota Tomioka, and Hisashi Kashima. Self-measuring similarity for multi-task Gaussian process. In Proceedings of the ICML Workshop on Unsupervised and Transfer Learning, pages 145–153, 2012.
  [19] Donald R. Jones, Matthias Schonlau, and William J. Welch. Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13(4):455–492, 1998.
  [20] Kirthevasan Kandasamy, Jeff Schneider, and Barnabás Póczos. High dimensional Bayesian optimisation and bandits via additive models. In International Conference on Machine Learning, ICML, pages 295–304, 2015.
  [21] Andreas Krause and Cheng S. Ong. Contextual Gaussian process bandit optimization. In Advances in Neural Information Processing Systems 24, NIPS, pages 2447–2455, 2011.
  [22] Andras Gabor Kupcsik, Marc Peter Deisenroth, Jan Peters, and Gerhard Neumann. Data-efficient generalization of robot skills with contextual policy search. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, AAAI, pages 1401–1407, 2013.
  [23] Jeffrey Scott Lehman. Sequential design of computer experiments for robust parameter design. PhD thesis, The Ohio State University, 2002.
  [24] Benjamin Letham and Eytan Bakshy. Bayesian optimization for policy search via online-offline experimentation. Journal of Machine Learning Research, 20(145):1–30, 2019.
  [25] Benjamin Letham, Roberto Calandra, Akshara Rai, and Eytan Bakshy. Re-examining linear embeddings for high-dimensional Bayesian optimization. In Advances in Neural Information Processing Systems 33, NeurIPS, 2020.
  [26] Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web, WWW, pages 661–670, 2010.
  [27] Xiaoyu Lu, Javier González, Zhenwen Dai, and Neil Lawrence. Structured variationally auto-encoded optimization. In Proceedings of the 35th International Conference on Machine Learning, ICML, pages 3267–3275, 2018.
  [28] Hongzi Mao, Shannon Chen, Drew Dimmery, Shaun Singh, Drew Blaisdell, Yuandong Tian, Mohammad Alizadeh, and Eytan Bakshy. Real-world video adaptation with reinforcement learning. arXiv preprint arXiv:2008.12858, 2020.
  [29] Hongzi Mao, Parimarjan Negi, Akshay Narayan, Hanrui Wang, Jiacheng Yang, Haonan Wang, Ryan Marcus, Ravichandra Addanki, Mehrdad Khani, Songtao He, Vikram Nathan, Frank Cangialosi, Shaileshh Bojja Venkatakrishnan, Wei-Hung Weng, Song Han, Tim Kraska, and Mohammad Alizadeh. Park: An open platform for learning-augmented computer systems. In Advances in Neural Information Processing Systems 32, NeurIPS, pages 2490–2502, 2019.
  [30] Jan Hendrik Metzen, Alexander Fabisch, and Jonas Hansen. Bayesian optimization for contextual policy search. In Proceedings of the Second Machine Learning in Planning and Control of Robot Motion Workshop, IROS Workshop, MLPC, 2015.
  [31] Jacob M. Montgomery, Brendan Nyhan, and Michelle Torres. How conditioning on posttreatment variables can ruin your experiment and what to do about it. American Journal of Political Science, 62(3):760–775, 2018.
  [32] Riccardo Moriconi, K. S. Sesh Kumar, and Marc P. Deisenroth. High-dimensional Bayesian optimization with manifold Gaussian processes. arXiv preprint arXiv:1902.10675, 2019.
  [33] Mojmír Mutný and Andreas Krause. Efficient high dimensional Bayesian optimization with additivity and quadrature Fourier features. In Advances in Neural Information Processing Systems 31, NIPS, pages 9005–9016, 2018.
  [34] Amin Nayebi, Alexander Munteanu, and Matthias Poloczek. A framework for Bayesian optimization in embedded subspaces. In Proceedings of the 36th International Conference on Machine Learning, ICML, pages 4752–4761, 2019.
  [35] Art B. Owen. Scrambling Sobol' and Niederreiter–Xing points. Journal of Complexity, 14(4):466–489, 1998.
  [36] Hong Qian, Yi-Qi Hu, and Yang Yu. Derivative-free optimization of high-dimensional nonconvex functions by sequential random embeddings. In Proceedings of the 25th International Joint Conference on Artificial Intelligence, IJCAI, pages 1946–1952, 2016.
  [37] Peter Z. G. Qian, Huaiqing Wu, and C. F. Jeff Wu. Gaussian process models for computer experiments with qualitative and quantitative factors. Technometrics, 50(3):383–396, 2008.
  [38] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press, Cambridge, Massachusetts, 2006.
  [39] Paul Rolland, Jonathan Scarlett, Ilija Bogunovic, and Volkan Cevher. High-dimensional Bayesian optimization via additive models with overlapping groups. In Proceedings of the 21st International Conference on Artificial Intelligence and Statistics, AISTATS, pages 298–307, 2018.
  [40] Kevin Swersky, Jasper Snoek, and Ryan P. Adams. Multi-task Bayesian optimization. In Advances in Neural Information Processing Systems 26, NIPS, pages 2004–2012, 2013.
  [41] Matthew Tesch, Jeff Schneider, and Howie Choset. Adapting control policies for expensive systems to changing environments. In IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS, pages 357–364, 2011.
  [42] Zi Wang, Clement Gehring, Pushmeet Kohli, and Stefanie Jegelka. Batched large-scale Bayesian optimization in high-dimensional spaces. In Proceedings of the 21st International Conference on Artificial Intelligence and Statistics, AISTATS, 2018.
  [43] Zi Wang, Chengtao Li, Stefanie Jegelka, and Pushmeet Kohli. Batched high-dimensional Bayesian optimization via structural kernel learning. In Proceedings of the 34th International Conference on Machine Learning, ICML, pages 3656–3664, 2017.
  [44] Ziyu Wang, Frank Hutter, Masrour Zoghi, David Matheson, and Nando de Freitas. Bayesian optimization in a billion dimensions via random embeddings. Journal of Artificial Intelligence Research, 55:361–387, 2016.
  [45] Brian J. Williams, Thomas J. Santner, and William I. Notz. Sequential design of computer experiments to minimize integrated response functions. Statistica Sinica, 10(4):1133–1152, 2000.
  [46] Xiaoqi Yin, Abhishek Jindal, Vyas Sekar, and Bruno Sinopoli. A control-theoretic approach for dynamic adaptive video streaming over HTTP. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, SIGCOMM, pages 325–338, 2015.
  [47] Yichi Zhang, Daniel W. Apley, and Wei Chen. Bayesian optimization for materials design with mixed quantitative and qualitative variables. Scientific Reports, 10:4924, 2020.