
Federated Bayesian Optimization via Thompson Sampling

NeurIPS 2020

Cited by: 8

Abstract

Bayesian optimization (BO) is a prominent approach to optimizing expensive-to-evaluate black-box functions. The massive computational capability of edge devices such as mobile phones, coupled with privacy concerns, has led to a surging interest in federated learning (FL) which focuses on collaborative training of deep neural networks (DNNs)…

Introduction
  • Bayesian optimization (BO) has recently become a prominent approach to optimizing expensive-to-evaluate black-box functions with no access to gradients, such as in hyperparameter tuning of deep neural networks (DNNs) [49].
  • Some common ML tasks such as hyperparameter tuning of DNNs lack access to gradients and require zeroth-order/black-box optimization, and a recent survey [24] has pointed out that hyperparameter optimization of DNNs in the FL setting is one of the promising research directions for FL; a minimal sketch of such black-box optimization via Thompson sampling follows below.
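To ground this, here is a minimal, self-contained sketch of BO with Thompson sampling (TS), the acquisition strategy this paper builds on. It is an illustration under stated assumptions, not the paper's implementation: the RBF kernel and its lengthscale, the discretized 1-D domain, the toy objective, and the helper names (rbf, gp_posterior, objective) are all assumptions for demonstration.

```python
# Minimal sketch of Bayesian optimization with Thompson sampling (TS),
# assuming a 1-D black-box objective and an RBF-kernel Gaussian process.
# All names and hyperparameters here are illustrative, not from the paper.
import numpy as np

def rbf(a, b, lengthscale=0.2):
    # Squared-exponential kernel k(a, b) = exp(-||a - b||^2 / (2 l^2)).
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    # Standard GP posterior mean and covariance at the query points.
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_query)
    Kss = rbf(x_query, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, Ks)
    return Ks.T @ alpha, Kss - v.T @ v

def objective(x):
    # Hypothetical expensive black-box function (stands in for, e.g.,
    # validation accuracy as a function of a hyperparameter).
    return np.sin(3 * x) + 0.5 * np.cos(7 * x)

rng = np.random.default_rng(0)
candidates = np.linspace(0.0, 1.0, 200)   # discretized input domain
x_obs = rng.uniform(0, 1, size=2)         # initial random queries
y_obs = objective(x_obs)

for t in range(20):
    mean, cov = gp_posterior(x_obs, y_obs, candidates)
    # TS: draw one function sample from the posterior and maximize it.
    f_sample = rng.multivariate_normal(mean, cov + 1e-8 * np.eye(len(candidates)))
    x_next = candidates[np.argmax(f_sample)]
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))

print("best observed:", x_obs[np.argmax(y_obs)], y_obs.max())
```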
Highlights
  • Bayesian optimization (BO) has recently become a prominent approach to optimizing expensive-to-evaluate black-box functions with no access to gradients, such as in hyperparameter tuning of deep neural networks (DNNs) [49]
  • Since we allow the presence of heterogeneous agents, we do not aim to show that federated Thompson sampling (FTS) achieves a faster convergence than standard Thompson sampling (TS) and instead prove a convergence guarantee that is robust against heterogeneous agents
  • This is consistent with most works proving the convergence of federated learning (FL) algorithms [34, 35] and makes the theoretical results more applicable in general since the presence of heterogeneous agents is a major and inevitable challenge of FL and federated BO (FBO)
  • This paper introduces the first algorithm for the FBO setting called FTS which addresses some key challenges in FBO in a principled manner
  • We theoretically show its convergence guarantee which is robust against heterogeneous agents, and empirically demonstrate its communication efficiency, computational efficiency, and practical effectiveness using three real-world experiments
  • Other than the random Fourier features (RFF) approximation used in this work (see the sketch below), other approximation techniques for Gaussian process (GP) regression may be used to derive the parameters to be exchanged between agents, which is worth exploring in future work
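The highlight above names the RFF approximation as the mechanism that lets agents exchange parameters instead of raw data. The sketch below illustrates the general idea under toy assumptions (it is not the paper's code): an RBF-kernel GP is approximated with M random Fourier features [43], so a posterior sample reduces to a finite weight vector that an agent could transmit. The prior w ~ N(0, I), the noise level, and the synthetic data are all illustrative.

```python
# Rough illustration (not the paper's code) of how random Fourier features
# (RFF) [43] turn a GP posterior sample into a finite weight vector that an
# agent could transmit instead of raw data. Prior, noise, and data are toys.
import numpy as np

rng = np.random.default_rng(1)
M = 100            # number of random features (illustrative)
lengthscale = 0.2  # RBF kernel lengthscale (illustrative)
noise = 1e-2       # observation noise variance (illustrative)

# RFF for the RBF kernel: phi(x) = sqrt(2/M) * cos(s*x + b),
# with s ~ N(0, 1/lengthscale^2) and b ~ Uniform[0, 2*pi).
s = rng.normal(0.0, 1.0 / lengthscale, size=M)
b = rng.uniform(0.0, 2.0 * np.pi, size=M)

def phi(x):
    return np.sqrt(2.0 / M) * np.cos(np.outer(x, s) + b)

# An agent's local observations (synthetic stand-in).
x_n = rng.uniform(0.0, 1.0, size=30)
y_n = np.sin(3.0 * x_n) + 0.1 * rng.normal(size=30)

# With f(x) ~ phi(x) @ w and prior w ~ N(0, I), Bayesian linear regression
# gives w | data ~ N(mu, Sigma) with Sigma = (Phi^T Phi / noise + I)^{-1}
# and mu = Sigma @ Phi^T @ y / noise.
Phi = phi(x_n)                                         # shape (30, M)
Sigma = np.linalg.inv(Phi.T @ Phi / noise + np.eye(M))
mu = Sigma @ Phi.T @ y_n / noise
w_sample = rng.multivariate_normal(mu, Sigma)          # vector an agent sends

# A receiving agent can evaluate the sampled function from w alone:
x_query = np.linspace(0.0, 1.0, 5)
print(phi(x_query) @ w_sample)
```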
Results
  • Since the authors allow the presence of heterogeneous agents, they do not aim to show that FTS achieves faster convergence than standard TS; instead, they prove a convergence guarantee that is robust against heterogeneous agents.
  • With probability of at least 1 − δ, the cumulative regret incurred by FTS is bounded as stated in the paper's main theorem, with the bound expressed in terms of a quantity ψ_t defined therein.
Conclusion
  • Using three real-world experiments, the authors demonstrate the effectiveness of FTS in terms of communication efficiency, computational efficiency, and practical performance.
  • In Appendix D.2.1, the authors evaluate the performance of FTS in the most general setting where the other agents are performing optimization tasks such that they may collect more observations between different rounds.
  • The authors theoretically show its convergence guarantee which is robust against heterogeneous agents, and empirically demonstrate its communication efficiency, computational efficiency, and practical effectiveness using three real-world experiments.
  • The authors will consider incentivizing collaboration in FBO [50] and generalizing FBO to nonmyopic BO [28, 36] and high-dimensional BO [22] settings.
Related work
  • Since its recent introduction in [39], FL has gained tremendous attention mainly due to its prominent practical relevance in the collaborative training of ML models such as DNNs [39] or decision tree-based models [31, 32]. Meanwhile, efforts have also been made to derive theoretical convergence guarantees for FL algorithms [34, 35]. Refer to recent surveys [24, 30, 33] for more comprehensive reviews of FL.
  • TS [54] has been known as a highly effective practical technique for multi-armed bandit problems [4, 47]. The Bayesian regret [46] and frequentist regret [9] of TS in BO have both been analyzed, and TS has been shown to perform effectively in BO problems such as high-dimensional BO [40]. The theoretical analysis in this work adopts techniques used in the works of [9, 40].
  • Our algorithm is also related to multi-fidelity BO [12, 25, 42, 56, 62, 63], which has the option to query low-fidelity functions. This is analogous to our algorithm allowing the target agent to use the information from the other agents for query selection, and the similarity between an agent and the target agent can be interpreted as a measure of fidelity. Moreover, our algorithm bears similarity to parallel/distributed BO algorithms [10, 13, 14], especially those based on TS [17, 26]. However, there are fundamental differences: for example, they usually optimize a single objective function, whereas we need to consider possibly heterogeneous objective functions from different agents. On the other hand, BO involving multiple agents with possibly different objective functions has been studied from the perspective of game theory by the works of [11, 48].
  • As discussed in Section 3.2, some works on meta-learning for BO [16, 55], which study how information from other related BO tasks is used to accelerate the current BO task, can be adapted to the FBO setting. However, these works neither provide a theoretical convergence guarantee nor tackle the issues of avoiding the transmission of raw data and achieving efficient communication. Moreover, their adapted variants for FBO have been shown to be outperformed by our FTS algorithm in various major aspects, including communication efficiency, computational efficiency, and practical performance (Section 5.2).
Funding
  • This research/project is supported by A*STAR under its RIE2020 Advanced Manufacturing and Engineering (AME) Industry Alignment Fund – Pre-Positioning (IAF-PP) (Award A19E4a0101)
Study subjects and analysis
datasets: 3
5.2 Real-world Experiments. For real-world experiments, we use 3 datasets generated in federated settings that naturally contain heterogeneous agents [51]. Firstly, we use a landmine detection dataset in which the landmine fields are located in two different types of terrain [58]

participants: 38
Activity Recognition Using Google Glasses. This dataset contains sensor measurements from Google glasses worn by 38 participants. Every agent attempts to use 57 features, which we have extracted from the corresponding participant’s measurements, to predict whether the participant is eating or performing other activities

participants: 37
Every agent uses logistic regression (LR) for activity prediction and needs to tune 3 hyperparameters of LR: batch size ([20, 60]), L2 regularization parameter ([10^-6, 1]), and learning rate ([0.01, 0.1]). We fix one of the participants as the target agent and all other N = 37 participants as the other agents, each of whom possesses t_n = 50 BO observations.

subjects: 30
Activity Recognition Using Mobile Phone Sensors. This dataset consists of mobile phone sensor measurements from 30 subjects performing 6 activities. Each agent attempts to tune the hyperparameters of a subject’s activity prediction model whose input includes 561 features and output is one of the 6 activity classes

subjects: 29
The activity prediction model and tuned hyperparameters, as well as their ranges, are the same as those in the Google Glasses experiment. We again fix one of the subjects as the target agent and all other N = 29 subjects as the other agents with t_n = 50 observations each. For all experiments, we set P_N to be uniform: P_N[n] = 1/N for all n = 1, . . . , N, and p_t = 1 − 1/t² for all t ∈ Z⁺ \ {1}, with p_1 = p_2 (see the selection-rule sketch below)
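As a hedged illustration of how these reported settings enter query selection, the sketch below combines uniform P_N with the p_t schedule: with probability p_t the target agent performs standard TS on its own GP posterior, and otherwise it draws an agent n ~ P_N and maximizes the function rebuilt from that agent's transmitted RFF weight vector. The helpers sample_own_posterior, phi, and omegas are hypothetical stand-ins, not the paper's API.

```python
# Hedged sketch of FTS-style query selection under the settings reported
# above: uniform P_N over the N other agents, p_t = 1 - 1/t^2 for t >= 2,
# and p_1 = p_2. Helper arguments are hypothetical stand-ins.
import numpy as np

def choose_query(t, candidates, sample_own_posterior, phi, omegas, rng):
    """Return the next query point for iteration t = 1, 2, ..."""
    p_t = 1.0 - 1.0 / max(t, 2) ** 2   # p_1 = p_2 = 3/4
    if rng.random() < p_t:
        # With probability p_t: standard TS on the target agent's own GP,
        # i.e. maximize one sample drawn from its posterior.
        f_sample = sample_own_posterior(candidates)
    else:
        # Otherwise: pick an agent n ~ P_N (uniform here) and maximize the
        # function rebuilt from its transmitted RFF weight vector omega_n.
        n = rng.integers(len(omegas))
        f_sample = phi(candidates) @ omegas[n]
    return candidates[np.argmax(f_sample)]

# Toy usage with dummy stand-ins (purely illustrative):
rng = np.random.default_rng(0)
cands = np.linspace(0.0, 1.0, 100)
phi = lambda x: np.sqrt(2.0 / 8) * np.cos(np.outer(x, np.arange(1, 9)))
omegas = [rng.normal(size=8) for _ in range(29)]   # N = 29 other agents
own_sample = lambda x: np.sin(3 * x) + 0.1 * rng.normal(size=len(x))
print(choose_query(3, cands, own_sample, phi, omegas, rng))
```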

References
[1] D. Anguita, A. Ghio, L. Oneto, X. Parra, and J. L. Reyes-Ortiz. A public domain dataset for human activity recognition using smartphones. In Proc. ESANN, 2013.
[2] I. Bogunovic, J. Scarlett, S. Jegelka, and V. Cevher. Adversarially robust optimization with Gaussian processes. In Proc. NeurIPS, pages 5760–5770, 2018.
[3] H. Chang, V. Shejwalkar, R. Shokri, and A. Houmansadr. Cronus: Robust and heterogeneous collaborative learning with black-box knowledge transfer. arXiv:1912.11279, 2019.
[4] O. Chapelle and L. Li. An empirical evaluation of Thompson sampling. In Proc. NeurIPS, pages 2249–2257, 2011.
[5] J. Chen, N. Cao, B. K. H. Low, R. Ouyang, C. K.-Y. Tan, and P. Jaillet. Parallel Gaussian process regression with low-rank covariance matrix approximations. In Proc. UAI, pages 152–161, 2013.
[6] J. Chen, B. K. H. Low, P. Jaillet, and Y. Yao. Gaussian process decentralized data fusion and active sensing for spatiotemporal traffic modeling and prediction in mobility-on-demand systems. IEEE Trans. Autom. Sci. Eng., 12:901–921, 2015.
[7] J. Chen, B. K. H. Low, and C. K.-Y. Tan. Gaussian process-based decentralized data fusion and active sensing for mobility-on-demand system. In Proc. RSS, 2013.
[8] J. Chen, B. K. H. Low, C. K.-Y. Tan, A. Oran, P. Jaillet, J. M. Dolan, and G. S. Sukhatme. Decentralized data fusion and active sensing with mobile sensors for modeling and predicting spatiotemporal traffic phenomena. In Proc. UAI, pages 163–173, 2012.
[9] S. R. Chowdhury and A. Gopalan. On kernelized multi-armed bandits. In Proc. ICML, pages 844–853, 2017.
[10] E. Contal, D. Buffoni, A. Robicquet, and N. Vayatis. Parallel Gaussian process optimization with upper confidence bound and pure exploration. In Proc. ECML/PKDD, pages 225–240, 2013.
[11] Z. Dai, Y. Chen, K. H. Low, P. Jaillet, and T.-H. Ho. R2-B2: Recursive reasoning-based Bayesian optimization for no-regret learning in games. In Proc. ICML, 2020.
[12] Z. Dai, H. Yu, B. K. H. Low, and P. Jaillet. Bayesian optimization meets Bayesian optimal stopping. In Proc. ICML, pages 1496–1506, 2019.
[13] E. A. Daxberger and B. K. H. Low. Distributed batch Gaussian process optimization. In Proc. ICML, pages 951–960, 2017.
[14] T. Desautels, A. Krause, and J. W. Burdick. Parallelizing exploration-exploitation tradeoffs in Gaussian process bandit optimization. Journal of Machine Learning Research, 15:3873–3923, 2014.
[15] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Proc. TCC, pages 265–284, 2006.
[16] M. Feurer, B. Letham, and E. Bakshy. Scalable meta-learning for Bayesian optimization using ranking-weighted Gaussian process ensembles. In Proc. ICML Workshop on Automatic Machine Learning, 2018.
[17] J. M. Hernández-Lobato, J. Requeima, E. O. Pyzer-Knapp, and A. Aspuru-Guzik. Parallel and distributed Thompson sampling for large-scale accelerated exploration of chemical space. In Proc. ICML, 2017.
[18] Q. M. Hoang, T. N. Hoang, and B. K. H. Low. A generalized stochastic variational Bayesian hyperparameter learning framework for sparse spectrum Gaussian process regression. In Proc. AAAI, pages 2007–2014, 2017.
[19] Q. M. Hoang, T. N. Hoang, B. K. H. Low, and C. Kingsford. Collective model fusion for multiple black-box experts. In Proc. ICML, pages 2742–2750, 2019.
[20] T. N. Hoang, Q. M. Hoang, and B. K. H. Low. A unifying framework of anytime sparse Gaussian process regression models with stochastic variational inference for big data. In Proc. ICML, pages 569–578, 2015.
[21] T. N. Hoang, Q. M. Hoang, and B. K. H. Low. A distributed variational inference framework for unifying parallel sparse Gaussian process regression models. In Proc. ICML, pages 382–391, 2016.
[22] T. N. Hoang, Q. M. Hoang, and B. K. H. Low. Decentralized high-dimensional Bayesian optimization with factor graphs. In Proc. AAAI, pages 3231–3238, 2018.
[23] T. N. Hoang, Q. M. Hoang, B. K. H. Low, and J. P. How. Collective online learning of Gaussian processes in massive multi-agent systems. In Proc. AAAI, pages 7850–7857, 2019.
[24] P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings, et al. Advances and open problems in federated learning. arXiv:1912.04977, 2019.
[25] K. Kandasamy, G. Dasarathy, J. B. Oliva, J. Schneider, and B. Póczos. Gaussian process bandit optimisation with multi-fidelity evaluations. In Proc. NeurIPS, pages 992–1000, 2016.
[26] K. Kandasamy, A. Krishnamurthy, J. Schneider, and B. Póczos. Parallelised Bayesian optimisation via Thompson sampling. In Proc. AISTATS, pages 133–142, 2018.
[27] D. Kharkovskii, Z. Dai, and B. K. H. Low. Private outsourced Bayesian optimization. In Proc. ICML, 2020.
[28] D. Kharkovskii, C. K. Ling, and B. K. H. Low. Nonmyopic Gaussian process optimization with macro-actions. In Proc. AISTATS, pages 4593–4604, 2020.
[29] M. Kusner, J. Gardner, R. Garnett, and K. Weinberger. Differentially private Bayesian optimization. In Proc. ICML, pages 918–927, 2015.
[30] Q. Li, Z. Wen, and B. He. Federated learning systems: Vision, hype and reality for data privacy and protection. arXiv:1907.09693, 2019.
[31] Q. Li, Z. Wen, and B. He. Practical federated gradient boosting decision trees. In Proc. AAAI, pages 4642–4649, 2020.
[32] Q. Li, Z. Wu, Z. Wen, and B. He. Privacy-preserving gradient boosting decision trees. In Proc. AAAI, pages 784–791, 2020.
[33] T. Li, A. K. Sahu, A. Talwalkar, and V. Smith. Federated learning: Challenges, methods, and future directions. arXiv:1908.07873, 2019.
[34] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith. Federated optimization in heterogeneous networks. arXiv:1812.06127, 2018.
[35] X. Li, K. Huang, W. Yang, S. Wang, and Z. Zhang. On the convergence of FedAvg on non-iid data. In Proc. ICLR, 2020.
[36] C. K. Ling, K. H. Low, and P. Jaillet. Gaussian process planning with Lipschitz continuous reward functions: Towards unifying Bayesian optimization, active learning, and beyond. In Proc. AAAI, pages 1860–1866, 2016.
[37] B. K. H. Low, N. Xu, J. Chen, K. K. Lim, and E. B. Özgül. Generalized online sparse Gaussian processes with application to persistent mobile robot localization. In Proc. ECML/PKDD Nectar Track, pages 499–503, 2014.
[38] B. K. H. Low, J. Yu, J. Chen, and P. Jaillet. Parallel Gaussian process regression for big data: Low-rank representation meets Markov approximation. In Proc. AAAI, pages 2821–2827, 2015.
[39] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, et al. Communication-efficient learning of deep networks from decentralized data. In Proc. AISTATS, 2017.
[40] M. Mutny and A. Krause. Efficient high dimensional Bayesian optimization with additivity and quadrature Fourier features. In Proc. NeurIPS, pages 9005–9016, 2018.
[41] R. Ouyang and B. K. H. Low. Gaussian process decentralized data fusion meets transfer learning in large-scale distributed cooperative perception. In Proc. AAAI, pages 3876–3883, 2018.
[42] M. Poloczek, J. Wang, and P. Frazier. Multi-information source optimization. In Proc. NeurIPS, pages 4288–4298, 2017.
[43] A. Rahimi and B. Recht. Random features for large-scale kernel machines. In Proc. NeurIPS, pages 1177–1184, 2008.
[44] S. A. Rahman, C. Merck, Y. Huang, and S. Kleinberg. Unintrusive eating recognition using Google Glass. In Proc. PervasiveHealth, pages 108–111, 2015.
[45] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
[46] D. Russo and B. Van Roy. Learning to optimize via posterior sampling. Mathematics of Operations Research, 39(4):1221–1243, 2014.
[47] D. Russo, B. Van Roy, A. Kazerouni, I. Osband, and Z. Wen. A tutorial on Thompson sampling. arXiv:1707.02038, 2017.
[48] P. G. Sessa, I. Bogunovic, M. Kamgarpour, and A. Krause. No-regret learning in unknown games with correlated payoffs. In Proc. NeurIPS, 2019.
[49] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. de Freitas. Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1):148–175, 2016.
[50] R. H. L. Sim, Y. Zhang, M. C. Chan, and B. K. H. Low. Collaborative machine learning with incentive-aware model rewards. In Proc. ICML, 2020.
[51] V. Smith, C.-K. Chiang, M. Sanjabi, and A. S. Talwalkar. Federated multi-task learning. In Proc. NeurIPS, pages 4424–4434, 2017.
[52] N. Srinivas, A. Krause, S. M. Kakade, and M. Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. In Proc. ICML, pages 1015–1022, 2010.
[53] T. Teng, J. Chen, Y. Zhang, and B. K. H. Low. Scalable variational Bayesian kernel selection for sparse Gaussian process regression. In Proc. AAAI, pages 5997–6004, 2020.
[54] W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4):285–294, 1933.
[55] M. Wistuba, N. Schilling, and L. Schmidt-Thieme. Scalable Gaussian process-based transfer surrogates for hyperparameter optimization. Machine Learning, 107(1):43–78, 2018.
[56] J. Wu, S. Toscano-Palmerin, P. I. Frazier, and A. G. Wilson. Practical multi-fidelity Bayesian optimization for hyperparameter tuning. In Proc. UAI, pages 788–798, 2020.
[57] N. Xu, B. K. H. Low, J. Chen, K. K. Lim, and E. B. Özgül. GP-Localize: Persistent mobile robot localization using online sparse Gaussian process observation model. In Proc. AAAI, pages 2585–2592, 2014.
[58] Y. Xue, X. Liao, L. Carin, and B. Krishnapuram. Multi-task learning for classification with Dirichlet process priors. Journal of Machine Learning Research, 8(Jan):35–63, 2007.
[59] H. Yu, Y. Chen, Z. Dai, B. K. H. Low, and P. Jaillet. Implicit posterior variational inference for deep Gaussian processes. In Proc. NeurIPS, pages 14475–14486, 2019.
[60] H. Yu, T. N. Hoang, B. K. H. Low, and P. Jaillet. Stochastic variational inference for Bayesian sparse Gaussian process regression. In Proc. IJCNN, 2019.
[61] S. Yu, F. Farooq, A. Van Esbroeck, G. Fung, V. Anand, and B. Krishnapuram. Predicting readmission risk with institution-specific prediction models. Artificial Intelligence in Medicine, 65(2):89–96, 2015.
[62] Y. Zhang, Z. Dai, and B. K. H. Low. Bayesian optimization with binary auxiliary information. In Proc. UAI, pages 1222–1232, 2020.
[63] Y. Zhang, T. N. Hoang, B. K. H. Low, and M. Kankanhalli. Information-based multi-fidelity Bayesian optimization. In Proc. NeurIPS Workshop on Bayesian Optimization, 2017.