# Federated Bayesian Optimization via Thompson Sampling

NeurIPS 2020

Abstract

Bayesian optimization (BO) is a prominent approach to optimizing expensive-to-evaluate black-box functions. The massive computational capability of edge devices such as mobile phones, coupled with privacy concerns, has led to a surging interest in federated learning (FL) which focuses on collaborative training of deep neural networks (DNNs) […]

Introduction

- Bayesian optimization (BO) has recently become a prominent approach to optimizing expensive-to-evaluate black-box functions with no access to gradients, such as in hyperparameter tuning of deep neural networks (DNNs) [49].
- Some common ML tasks such as hyperparameter tuning of DNNs lack access to gradients and require zeroth-order/black-box optimization, and a recent survey [24] has pointed out that hyperparameter optimization of DNNs in the FL setting is one of the promising research directions for FL.
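The Thompson sampling (TS) principle underlying the paper's algorithm can be illustrated with a minimal single-agent BO loop: fit a Gaussian process (GP) posterior to the observations so far, draw one random function from that posterior, and query its maximizer. The sketch below is generic, using only NumPy over a discrete candidate set; all function names, the kernel lengthscale, and the candidate-set setup are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def rbf(A, B, ls=0.2):
    # Squared-exponential kernel between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def thompson_sampling_bo(f, candidates, n_iters=20, noise=1e-4, seed=0):
    """Single-agent Thompson sampling over a discrete candidate set."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    # Start from a randomly chosen candidate.
    x0 = candidates[rng.integers(len(candidates))]
    X.append(x0); y.append(f(x0))
    for _ in range(n_iters):
        Xa, ya = np.array(X), np.array(y)
        K = rbf(Xa, Xa) + noise * np.eye(len(Xa))
        Ks = rbf(candidates, Xa)
        Kinv = np.linalg.inv(K)
        mu = Ks @ Kinv @ ya                                   # GP posterior mean
        cov = rbf(candidates, candidates) - Ks @ Kinv @ Ks.T  # posterior covariance
        # Draw one function from the GP posterior and query its maximizer.
        sample = rng.multivariate_normal(
            mu, cov + 1e-8 * np.eye(len(candidates)))
        x_next = candidates[int(np.argmax(sample))]
        X.append(x_next); y.append(f(x_next))
    best = int(np.argmax(y))
    return np.array(X)[best], y[best]
```

On a toy objective such as a concave quadratic over a grid, the loop concentrates its queries near the maximizer within a handful of iterations.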

Highlights

- Bayesian optimization (BO) has recently become a prominent approach to optimizing expensive-to-evaluate black-box functions with no access to gradients, such as in hyperparameter tuning of deep neural networks (DNNs) [49]
- Since we allow the presence of heterogeneous agents, we do not aim to show that federated Thompson sampling (FTS) achieves a faster convergence than standard Thompson sampling (TS) and instead prove a convergence guarantee that is robust against heterogeneous agents
- This is consistent with most works proving the convergence of federated learning (FL) algorithms [34, 35] and makes the theoretical results more applicable in general since the presence of heterogeneous agents is a major and inevitable challenge of FL and federated BO (FBO)
- This paper introduces FTS, the first algorithm for the FBO setting, which addresses some key challenges in FBO in a principled manner
- We theoretically prove a convergence guarantee for FTS that is robust against heterogeneous agents, and empirically demonstrate its communication efficiency, computational efficiency, and practical effectiveness using three real-world experiments
- Other than the random Fourier features (RFF) approximation used in this work, other Gaussian process (GP) approximation techniques may be used to derive the parameters to be exchanged between agents, which is worth exploring in future work
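
The RFF approximation mentioned in the last bullet can be sketched as follows: each input is mapped to an m-dimensional random feature vector whose inner products approximate an RBF kernel, so agents can exchange finite-dimensional parameters instead of raw observations. This is the generic construction of Rahimi and Recht [43], not the paper's exact federated protocol; the function names and the lengthscale are illustrative:

```python
import numpy as np

def rff_features(X, m=500, ls=0.5, seed=0):
    """Map inputs X (n x d) to m random Fourier features so that
    Phi @ Phi.T approximates the RBF kernel matrix of X."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies sampled from the spectral density of the RBF kernel.
    W = rng.normal(0.0, 1.0 / ls, size=(d, m))
    b = rng.uniform(0.0, 2 * np.pi, size=m)
    return np.sqrt(2.0 / m) * np.cos(X @ W + b)

def rbf(A, B, ls=0.5):
    # Exact RBF kernel, for comparison with the RFF approximation.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)
```

With enough features (say m = 5000), the entrywise error of `Phi @ Phi.T` against the exact kernel matrix shrinks at roughly the 1/sqrt(m) rate.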

Results

- Since the authors allow the presence of heterogeneous agents, the authors do not aim to show that FTS achieves a faster convergence than standard TS and instead prove a convergence guarantee that is robust against heterogeneous agents.
- With probability of at least 1 − δ, the cumulative regret incurred by FTS satisfies an upper bound involving the term ψ_t; the explicit bound and the definition of ψ_t are given in the paper

Conclusion

- Using 3 real-world experiments, the authors demonstrate the effectiveness of FTS in terms of communication efficiency, computational efficiency, and practical performance.
- In Appendix D.2.1, the authors evaluate the performance of FTS in the most general setting where the other agents are performing optimization tasks such that they may collect more observations between different rounds
**Conclusion and Future Works**

- The authors theoretically prove a convergence guarantee for FTS that is robust against heterogeneous agents, and empirically demonstrate its communication efficiency, computational efficiency, and practical effectiveness using three real-world experiments.
- The authors will consider incentivizing collaboration in FBO [50] and generalizing FBO to nonmyopic BO [28, 36] and high-dimensional BO [22] settings

Related work

- Since its recent introduction in [39], FL has gained tremendous attention mainly due to its prominent practical relevance in the collaborative training of ML models such as DNNs [39] or decision tree-based models [31, 32]. Meanwhile, efforts have also been made to derive theoretical convergence guarantees for FL algorithms [34, 35]. Refer to recent surveys [24, 30, 33] for more comprehensive reviews of FL.
- TS [54] has been known as a highly effective practical technique for multi-armed bandit problems [4, 47]. The Bayesian regret [46] and frequentist regret [9] of TS in BO have both been analyzed, and TS has been shown to perform effectively in BO problems such as high-dimensional BO [40]. The theoretical analysis in this work adopts techniques used in the works of [9, 40].
- Our algorithm is also related to multi-fidelity BO [12, 25, 42, 56, 62, 63], which has the option to query low-fidelity functions. This is analogous to our algorithm allowing the target agent to use the information from the other agents for query selection, where the similarity between an agent and the target agent can be interpreted as a measure of fidelity.
- Moreover, our algorithm bears similarity to parallel/distributed BO algorithms [10, 13, 14], especially those based on TS [17, 26]. However, there are fundamental differences: for example, they usually optimize a single objective function, whereas we need to consider possibly heterogeneous objective functions from different agents. On the other hand, BO involving multiple agents with possibly different objective functions has been studied from the perspective of game theory in the works of [11, 48].
- As discussed in Section 3.2, some works on meta-learning for BO [16, 55], which study how information from other related BO tasks can be used to accelerate the current BO task, can be adapted to the FBO setting. However, these works neither provide theoretical convergence guarantees nor tackle the issues of avoiding the transmission of raw data and achieving efficient communication. Moreover, their adapted variants for FBO have been shown to be outperformed by our FTS algorithm in various major aspects including communication efficiency, computational efficiency, and practical performance (Section 5.2).

Funding

- This research/project is supported by A*STAR under its RIE2020 Advanced Manufacturing and Engineering (AME) Industry Alignment Fund – Pre-Positioning (IAF-PP) (Award A19E4a0101)

Study subjects and analysis

datasets: 3

5.2 Real-world Experiments. For real-world experiments, we use 3 datasets generated in federated settings that naturally contain heterogeneous agents [51]. Firstly, we use a landmine detection dataset in which the landmine fields are located in two different types of terrain [58]

participants: 38

Activity Recognition Using Google Glasses. This dataset contains sensor measurements from Google Glass devices worn by 38 participants. Every agent attempts to use 57 features, which we have extracted from the corresponding participant's measurements, to predict whether the participant is eating or performing other activities

participants: 37

Every agent uses logistic regression (LR) for activity prediction and needs to tune 3 hyperparameters of LR: batch size ([20, 60]), L2 regularization parameter ([10⁻⁶, 1]), and learning rate ([0.01, 0.1]). We fix one of the participants as the target agent and all other N = 37 participants as the other agents, each of whom possesses t_n = 50 BO observations.

subjects: 30

Activity Recognition Using Mobile Phone Sensors. This dataset consists of mobile phone sensor measurements from 30 subjects performing 6 activities. Each agent attempts to tune the hyperparameters of a subject's activity prediction model whose input includes 561 features and whose output is one of the 6 activity classes

subjects: 29

The activity prediction model and tuned hyperparameters, as well as their ranges, are the same as those in the Google Glass experiment. We again fix one of the subjects as the target agent and all other N = 29 subjects as the other agents, with t_n = 50 observations each. For all experiments, we set P_N to be uniform: P_N[n] = 1/N for all n = 1, …, N, and p_t = 1 − 1/t² for all t ∈ ℤ⁺ \ {1}, with p_1 = p_2
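
The experimental settings above (uniform P_N over the other agents and the schedule p_t = 1 − 1/t²) can be written out directly. In this sketch, returning `"own"` versus an agent index stands in for "use the target agent's own posterior sample" versus "use parameters from a sampled other agent"; this encoding and the function names are illustrative, not the paper's implementation:

```python
import numpy as np

def p_schedule(t):
    # p_t = 1 - 1/t^2 for t >= 2, with p_1 set equal to p_2 (= 3/4),
    # matching the experimental settings described above.
    return 1.0 - 1.0 / max(t, 2) ** 2

def fts_choice(t, N, rng):
    """With probability p_t use the target agent's own posterior sample;
    otherwise draw one of the N other agents uniformly (P_N[n] = 1/N)."""
    if rng.random() < p_schedule(t):
        return "own"
    return int(rng.integers(N))
```

As t grows, p_t → 1, so the target agent relies progressively less on the other (possibly heterogeneous) agents, which matches the robustness-to-heterogeneity discussion in the Results section.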

Reference

- D. Anguita, A. Ghio, L. Oneto, X. Parra, and J. L. Reyes-Ortiz. A public domain dataset for human activity recognition using smartphones. In Proc. ESANN, 2013.
- I. Bogunovic, J. Scarlett, S. Jegelka, and V. Cevher. Adversarially robust optimization with Gaussian processes. In Proc. NeurIPS, pages 5760–5770, 2018.
- H. Chang, V. Shejwalkar, R. Shokri, and A. Houmansadr. Cronus: Robust and heterogeneous collaborative learning with black-box knowledge transfer. arXiv:1912.11279, 2019.
- O. Chapelle and L. Li. An empirical evaluation of Thompson sampling. In Proc. NeurIPS, pages 2249–2257, 2011.
- J. Chen, N. Cao, B. K. H. Low, R. Ouyang, C. K.-Y. Tan, and P. Jaillet. Parallel Gaussian process regression with low-rank covariance matrix approximations. In Proc. UAI, pages 152–161, 2013.
- J. Chen, B. K. H. Low, P. Jaillet, and Y. Yao. Gaussian process decentralized data fusion and active sensing for spatiotemporal traffic modeling and prediction in mobility-on-demand systems. IEEE Trans. Autom. Sci. Eng., 12:901–921, 2015.
- J. Chen, B. K. H. Low, and C. K.-Y. Tan. Gaussian process-based decentralized data fusion and active sensing for mobility-on-demand system. In Proc. RSS, 2013.
- J. Chen, B. K. H. Low, C. K.-Y. Tan, A. Oran, P. Jaillet, J. M. Dolan, and G. S. Sukhatme. Decentralized data fusion and active sensing with mobile sensors for modeling and predicting spatiotemporal traffic phenomena. In Proc. UAI, pages 163–173, 2012.
- S. R. Chowdhury and A. Gopalan. On kernelized multi-armed bandits. In Proc. ICML, pages 844–853, 2017.
- E. Contal, D. Buffoni, A. Robicquet, and N. Vayatis. Parallel Gaussian process optimization with upper confidence bound and pure exploration. In Proc. ECML/PKDD, pages 225–240, 2013.
- Z. Dai, Y. Chen, K. H. Low, P. Jaillet, and T.-H. Ho. R2-B2: Recursive reasoning-based Bayesian optimization for no-regret learning in games. In Proc. ICML, 2020.
- Z. Dai, H. Yu, B. K. H. Low, and P. Jaillet. Bayesian optimization meets Bayesian optimal stopping. In Proc. ICML, pages 1496–1506, 2019.
- E. A. Daxberger and B. K. H. Low. Distributed batch Gaussian process optimization. In Proc. ICML, pages 951–960, 2017.
- T. Desautels, A. Krause, and J. W. Burdick. Parallelizing exploration-exploitation tradeoffs in Gaussian process bandit optimization. Journal of Machine Learning Research, 15:3873–3923, 2014.
- C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Proc. TCC, pages 265–284, 2006.
- M. Feurer, B. Letham, and E. Bakshy. Scalable meta-learning for Bayesian optimization using ranking-weighted Gaussian process ensembles. In Proc. ICML Workshop on Automatic Machine Learning, 2018.
- J. M. Hernández-Lobato, J. Requeima, E. O. Pyzer-Knapp, and A. Aspuru-Guzik. Parallel and distributed Thompson sampling for large-scale accelerated exploration of chemical space. In Proc. ICML, 2017.
- Q. M. Hoang, T. N. Hoang, and B. K. H. Low. A generalized stochastic variational Bayesian hyperparameter learning framework for sparse spectrum Gaussian process regression. In Proc. AAAI, pages 2007–2014, 2017.
- Q. M. Hoang, T. N. Hoang, B. K. H. Low, and C. Kingsford. Collective model fusion for multiple black-box experts. In Proc. ICML, pages 2742–2750, 2019.
- T. N. Hoang, Q. M. Hoang, and B. K. H. Low. A unifying framework of anytime sparse Gaussian process regression models with stochastic variational inference for big data. In Proc. ICML, pages 569–578, 2015.
- T. N. Hoang, Q. M. Hoang, and B. K. H. Low. A distributed variational inference framework for unifying parallel sparse Gaussian process regression models. In Proc. ICML, pages 382–391, 2016.
- T. N. Hoang, Q. M. Hoang, and B. K. H. Low. Decentralized high-dimensional Bayesian optimization with factor graphs. In Proc. AAAI, pages 3231–3238, 2018.
- T. N. Hoang, Q. M. Hoang, B. K. H. Low, and J. P. How. Collective online learning of Gaussian processes in massive multi-agent systems. In Proc. AAAI, pages 7850–7857, 2019.
- P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings, et al. Advances and open problems in federated learning. arXiv:1912.04977, 2019.
- K. Kandasamy, G. Dasarathy, J. B. Oliva, J. Schneider, and B. Póczos. Gaussian process bandit optimisation with multi-fidelity evaluations. In Proc. NeurIPS, pages 992–1000, 2016.
- K. Kandasamy, A. Krishnamurthy, J. Schneider, and B. Póczos. Parallelised Bayesian optimisation via Thompson sampling. In Proc. AISTATS, pages 133–142, 2018.
- D. Kharkovskii, Z. Dai, and B. K. H. Low. Private outsourced Bayesian optimization. In Proc. ICML, 2020.
- D. Kharkovskii, C. K. Ling, and B. K. H. Low. Nonmyopic Gaussian process optimization with macro-actions. In Proc. AISTATS, pages 4593–4604, 2020.
- M. Kusner, J. Gardner, R. Garnett, and K. Weinberger. Differentially private Bayesian optimization. In Proc. ICML, pages 918–927, 2015.
- Q. Li, Z. Wen, and B. He. Federated learning systems: Vision, hype and reality for data privacy and protection. arXiv:1907.09693, 2019.
- Q. Li, Z. Wen, and B. He. Practical federated gradient boosting decision trees. In Proc. AAAI, pages 4642–4649, 2020.
- Q. Li, Z. Wu, Z. Wen, and B. He. Privacy-preserving gradient boosting decision trees. In Proc. AAAI, pages 784–791, 2020.
- T. Li, A. K. Sahu, A. Talwalkar, and V. Smith. Federated learning: Challenges, methods, and future directions. arXiv:1908.07873, 2019.
- T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith. Federated optimization in heterogeneous networks. arXiv:1812.06127, 2018.
- X. Li, K. Huang, W. Yang, S. Wang, and Z. Zhang. On the convergence of FedAvg on non-iid data. In Proc. ICLR, 2020.
- C. K. Ling, K. H. Low, and P. Jaillet. Gaussian process planning with Lipschitz continuous reward functions: Towards unifying Bayesian optimization, active learning, and beyond. In Proc. AAAI, pages 1860–1866, 2016.
- B. K. H. Low, N. Xu, J. Chen, K. K. Lim, and E. B. Özgül. Generalized online sparse Gaussian processes with application to persistent mobile robot localization. In Proc. ECML/PKDD Nectar Track, pages 499–503, 2014.
- B. K. H. Low, J. Yu, J. Chen, and P. Jaillet. Parallel Gaussian process regression for big data: Low-rank representation meets Markov approximation. In Proc. AAAI, pages 2821–2827, 2015.
- H. B. McMahan, E. Moore, D. Ramage, S. Hampson, et al. Communication-efficient learning of deep networks from decentralized data. In Proc. AISTATS, 2017.
- M. Mutny and A. Krause. Efficient high dimensional Bayesian optimization with additivity and quadrature Fourier features. In Proc. NeurIPS, pages 9005–9016, 2018.
- R. Ouyang and B. K. H. Low. Gaussian process decentralized data fusion meets transfer learning in large-scale distributed cooperative perception. In Proc. AAAI, pages 3876–3883, 2018.
- M. Poloczek, J. Wang, and P. Frazier. Multi-information source optimization. In Proc. NeurIPS, pages 4288–4298, 2017.
- A. Rahimi and B. Recht. Random features for large-scale kernel machines. In Proc. NeurIPS, pages 1177–1184, 2008.
- S. A. Rahman, C. Merck, Y. Huang, and S. Kleinberg. Unintrusive eating recognition using Google glass. In Proc. PervasiveHealth, pages 108–111, 2015.
- C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
- D. Russo and B. Van Roy. Learning to optimize via posterior sampling. Mathematics of Operations Research, 39(4):1221–1243, 2014.
- D. Russo, B. Van Roy, A. Kazerouni, I. Osband, and Z. Wen. A tutorial on Thompson sampling. arXiv:1707.02038, 2017.
- P. G. Sessa, I. Bogunovic, M. Kamgarpour, and A. Krause. No-regret learning in unknown games with correlated payoffs. In Proc. NeurIPS, 2019.
- B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. de Freitas. Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1):148–175, 2016.
- R. H. L. Sim, Y. Zhang, M. C. Chan, and B. K. H. Low. Collaborative machine learning with incentive-aware model rewards. In Proc. ICML, 2020.
- V. Smith, C.-K. Chiang, M. Sanjabi, and A. S. Talwalkar. Federated multi-task learning. In Proc. NeurIPS, pages 4424–4434, 2017.
- N. Srinivas, A. Krause, S. M. Kakade, and M. Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. In Proc. ICML, pages 1015–1022, 2010.
- T. Teng, J. Chen, Y. Zhang, and B. K. H. Low. Scalable variational Bayesian kernel selection for sparse Gaussian process regression. In Proc. AAAI, pages 5997–6004, 2020.
- W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4):285–294, 1933.
- M. Wistuba, N. Schilling, and L. Schmidt-Thieme. Scalable Gaussian process-based transfer surrogates for hyperparameter optimization. Machine Learning, 107(1):43–78, 2018.
- J. Wu, S. Toscano-Palmerin, P. I. Frazier, and A. G. Wilson. Practical multi-fidelity Bayesian optimization for hyperparameter tuning. In Proc. UAI, pages 788–798, 2020.
- N. Xu, B. K. H. Low, J. Chen, K. K. Lim, and E. B. Özgül. GP-Localize: Persistent mobile robot localization using online sparse Gaussian process observation model. In Proc. AAAI, pages 2585–2592, 2014.
- Y. Xue, X. Liao, L. Carin, and B. Krishnapuram. Multi-task learning for classification with dirichlet process priors. Journal of Machine Learning Research, 8(Jan):35–63, 2007.
- H. Yu, Y. Chen, Z. Dai, B. K. H. Low, and P. Jaillet. Implicit posterior variational inference for deep Gaussian processes. In Proc. NeurIPS, pages 14475–14486, 2019.
- H. Yu, T. N. Hoang, B. K. H. Low, and P. Jaillet. Stochastic variational inference for Bayesian sparse Gaussian process regression. In Proc. IJCNN, 2019.
- S. Yu, F. Farooq, A. Van Esbroeck, G. Fung, V. Anand, and B. Krishnapuram. Predicting readmission risk with institution-specific prediction models. Artificial Intelligence in Medicine, 65(2):89–96, 2015.
- Y. Zhang, Z. Dai, and B. K. H. Low. Bayesian optimization with binary auxiliary information. In Proc. UAI, pages 1222–1232, 2020.
- Y. Zhang, T. N. Hoang, B. K. H. Low, and M. Kankanhalli. Information-based multi-fidelity Bayesian optimization. In Proc. NeurIPS Workshop on Bayesian Optimization, 2017.
