# Locally Private Hypothesis Selection

Sivakanth Gopi
Aleksandar Nikolov
Huanyu Zhang

COLT, pp. 1785-1816, 2020.


Abstract:

We initiate the study of hypothesis selection under local differential privacy. Given samples from an unknown probability distribution $p$ and a set of $k$ probability distributions $\mathcal{Q}$, we aim to output, under the constraints of $\varepsilon$-local differential privacy, a distribution from $\mathcal{Q}$ whose total variation distance to $p$ is comparable to that of the best distribution in $\mathcal{Q}$.

Introduction
• Perhaps the most fundamental question in statistics is that of simple hypothesis testing.
• Suppose M is a non-interactive ε-LDP protocol that solves the k-wise simple hypothesis testing problem with probability at least 1/3 when given n samples from some distribution p ∈ Q, where Q = {q1, ..., qk}.
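The binary testing primitive underlying such protocols can be illustrated with a Scheffé-style test under randomized response. The following is a hypothetical sketch (the function name and the Bernoulli usage example are ours, not from the paper): each user reports an ε-LDP randomized-response bit indicating whether their sample lies in the Scheffé set A = {x : q1(x) > q2(x)}, the server debiases the aggregate to estimate p(A), and outputs the hypothesis whose mass on A is closer to that estimate.

```python
import numpy as np

def scheffe_ldp_test(samples, in_scheffe_set, q1_mass, q2_mass, eps, rng=None):
    """Decide between two known hypotheses q1, q2 given samples from an
    unknown p, under eps-local differential privacy (illustrative sketch).

    in_scheffe_set(x): indicator of the Scheffe set A = {x : q1(x) > q2(x)}.
    q1_mass, q2_mass: the known masses q1(A) and q2(A).
    Returns 1 or 2, the index of the selected hypothesis.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Each user's true bit: does their sample land in A?
    bits = np.array([1.0 if in_scheffe_set(x) else 0.0 for x in samples])
    # eps-LDP randomized response: report the true bit w.p. e^eps / (1 + e^eps).
    keep = np.exp(eps) / (1.0 + np.exp(eps))
    flips = rng.random(len(bits)) >= keep
    reports = np.where(flips, 1.0 - bits, bits)
    # Debias the noisy mean to get an unbiased estimate of p(A).
    p_hat = (reports.mean() - (1.0 - keep)) / (2.0 * keep - 1.0)
    # Output the hypothesis whose mass on A is closer to the estimate.
    return 1 if abs(p_hat - q1_mass) <= abs(p_hat - q2_mass) else 2
```

For example, with q1 = Bernoulli(0.9), q2 = Bernoulli(0.1), so A = {1}, a few thousand samples drawn from q1 yield output 1 with high probability. Extending this primitive naively to k hypotheses by testing all pairs is exactly what incurs the sample-complexity blow-up that the lower bound addresses.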
Highlights
• Perhaps the most fundamental question in statistics is that of simple hypothesis testing
• We first show that the constraint of local differential privacy incurs an exponential increase in cost: any algorithm for this problem requires at least Ω(k) samples
• The data may not have been generated according to any distribution from the set of known distributions – instead, the goal is to just select a distribution from the set which is competitive with the best possible. This problem is the core object of our study, and we denote it as hypothesis selection
• The parallelism model we study here was introduced by Valiant [Val75], for parallel comparison-based problems with non-adversarial comparators
• We describe the adversarial comparator setting of [AJOS14, AFJ+18], as well as their reduction to this model for the hypothesis selection problem
• We cannot reuse the same set of samples for all comparisons: doing so would violate the privacy constraint, and would give rise to algorithms contradicting our main lower bound for locally private hypothesis selection (Theorem 1.2)
Results
• The authors cannot reuse the same set of samples for all comparisons: doing so would violate the privacy constraint, and would give rise to algorithms contradicting the main lower bound for locally private hypothesis selection (Theorem 1.2).
• There exists a 1-round algorithm which, in the special case k = 2, achieves a (3 + γ)-agnostic factor for locally private hypothesis selection with probability 1 − β, where γ > 0 is an arbitrarily small constant.
• There exists a 2-round algorithm which achieves an (81 + γ)-agnostic factor for locally private hypothesis selection with high probability, where γ > 0 is an arbitrarily small constant.
• The authors describe the main result in this setting, a family of algorithms for approximate maximum selection parameterized by t, which is the allowed number of rounds.
• There exists an O-round algorithm which, with probability 9/10, achieves a 3-approximation in the problem of parallel approximate maximum selection with adversarial comparators.
• If the fraction of such elements is high, the authors can sample a small number of items such that at least one is a 1-approximation to x∗; running the round-robin algorithm on this set then guarantees a 3-approximation to the maximum.
• There exists a t-round algorithm which achieves a (27 + γ)-agnostic factor for locally private hypothesis selection with probability 9/10, where γ > 0 is an arbitrarily small constant.
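The round-robin primitive behind these results can be sketched as follows (a hypothetical illustration, not the authors' exact algorithm): every pair of candidates gets one independent comparator call, where a comparator is guaranteed correct only when the two values differ by more than 1, and the candidate with the most wins is output. In the private setting, each comparator call would consume a fresh batch of users, consistent with the no-reuse caveat above. The round-robin winner is within additive 2 of the set's maximum, so running it on a set containing a 1-approximation to x∗ yields the 3-approximation mentioned above.

```python
import itertools
import random

def adversarial_comparator(vi, vj, rng):
    """Returns True if the first item 'wins'. The answer is reliable only
    when |vi - vj| > 1; otherwise it is arbitrary (modeled here as random)."""
    if abs(vi - vj) > 1:
        return vi > vj
    return rng.random() < 0.5

def round_robin_max(values, rng=None):
    """Round-robin tournament: one independent comparator call per pair
    (in the LDP setting each call would use a fresh batch of samples).
    Returns the index of the item with the most wins."""
    rng = random.Random(0) if rng is None else rng
    wins = [0] * len(values)
    for i, j in itertools.combinations(range(len(values)), 2):
        if adversarial_comparator(values[i], values[j], rng):
            wins[i] += 1
        else:
            wins[j] += 1
    return max(range(len(values)), key=lambda i: wins[i])
```

Since the round robin makes one call per pair, it uses Θ(k²) comparisons; the t-round algorithms trade rounds for query count, and the lower bounds in the Conclusion show that such trade-offs are necessary.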
Conclusion
• For any τ > 1, any 2-round algorithm which achieves a τ-approximation in the problem of parallel approximate maximum selection with non-adaptive adversarial comparators requires Ω(k³) queries.
• For any τ > 1, any t-round algorithm which achieves a τ-approximation in the problem of parallel approximate maximum selection with non-adaptive adversarial comparators requires k
Related work
• As mentioned before, our work builds on a long line of investigation on hypothesis selection. This style of approach was pioneered by Yatracos [Yat85], and refined in subsequent work by Devroye and Lugosi [DL96, DL97, DL01]. After this, additional considerations have been taken into account, such as computation, approximation factor, robustness, and more [MS08, DDS12, DK14, SOAJ14, AJOS14, DKK+16, AFJ+18, BKM19, BKSW19]. Most relevant is the recent work of Bun, Kamath, Steinke, and Wu [BKSW19], which studies hypothesis selection under central differential privacy. Our results are for the stronger constraint of local differential privacy.

Versions of our problem have been studied under both central and local differential privacy. In the local model, the most pertinent result is that of Duchi, Jordan, and Wainwright [DJW13, DJW17], showing a lower bound on the sample complexity for simple hypothesis testing between two known distributions. This matches folklore upper bounds for the same problem. However, the straightforward way of extending said protocol to k-wise simple hypothesis testing would incur a cost of O(k²) samples. Other works on hypothesis testing under local privacy include [GR18, She18, ACFT19, ACT19, JMNR19]. In the central model, some of the early work was done by the Statistics community [VS09, USF13]. More recent work can roughly be divided into two lines – one attempts to provide private analogues of classical statistical tests [WLK15, GLRV16, KR17, KSF17, CBRG18, SGHG+19, CKS+19], while the other focuses more on achieving minimax sample complexities for testing problems [CDK17, ASZ18, ADR18, AKSZ18, CKM+19b, ADKR19, AJM19]. While most of these focus on composite hypothesis testing, we highlight [CKM+19a] which studies simple hypothesis testing. Work of Awan and Slavkovic [AS18] gives a universally optimal test for binomial data; however, Brenner and Nissim [BN14] give an impossibility result for distributions with domain larger than 2.
Reference
• Noga Alon and Yossi Azar. The average complexity of deterministic and randomized parallel comparison-sorting algorithms. SIAM Journal on Computing, 17(6):1178–1192, 1988.
• Noga Alon and Yossi Azar. Sorting, approximate sorting, and searching in rounds. SIAM Journal on Discrete Mathematics, 1(3):269–280, 1988.
• Noga Alon, Yossi Azar, and Uzi Vishkin. Tight complexity bounds for parallel comparison sorting. In Proceedings of the 27th Annual IEEE Symposium on Foundations of Computer Science, FOCS ’86, pages 502–510, Washington, DC, USA, 1986. IEEE Computer Society.
• Jayadev Acharya, Clement L. Canonne, Cody Freitag, and Himanshu Tyagi. Test without trust: Optimal locally private distribution testing. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, AISTATS ’19, pages 2067–2076. JMLR, Inc., 2019.
• Jayadev Acharya, Clement L. Canonne, and Himanshu Tyagi. Inference under information constraints: Lower bounds from chi-square contraction. In Proceedings of the 32nd Annual Conference on Learning Theory, COLT ’19, pages 1–15, 2019.
• Maryam Aliakbarpour, Ilias Diakonikolas, Daniel M. Kane, and Ronitt Rubinfeld. Private testing of distributions via sample permutations. In Advances in Neural Information Processing Systems 32, NeurIPS ’19, pages 10877–10888. Curran Associates, Inc., 2019.
• Maryam Aliakbarpour, Ilias Diakonikolas, and Ronitt Rubinfeld. Differentially private identity and closeness testing of discrete distributions. In Proceedings of the 35th International Conference on Machine Learning, ICML ’18, pages 169–178. JMLR, Inc., 2018.
• [AFHN09] Miklos Ajtai, Vitaly Feldman, Avinatan Hassidim, and Jelani Nelson. Sorting and selection with imprecise comparisons. In Proceedings of the 36th International Colloquium on Automata, Languages, and Programming, ICALP ’09, pages 37–48, 2009.
• Jayadev Acharya, Moein Falahatgar, Ashkan Jafarpour, Alon Orlitsky, and Ananda Theertha Suresh. Maximum selection and sorting with adversarial comparators. Journal of Machine Learning Research, 19(1):2427–2457, 2018.
• [AJM19] Kareem Amin, Matthew Joseph, and Jieming Mao. Pan-private uniformity testing. arXiv preprint arXiv:1911.01452, 2019.
• Jayadev Acharya, Ashkan Jafarpour, Alon Orlitsky, and Ananda Theertha Suresh. Sorting with adversarial comparators and application to density estimation. In Proceedings of the 2014 IEEE International Symposium on Information Theory, ISIT ’14, pages 1682–1686, Washington, DC, USA, 2014. IEEE Computer Society.
• Miklos Ajtai, Janos Komlos, and Endre Szemeredi. An O(n log n) sorting network. In Proceedings of the 15th Annual ACM Symposium on the Theory of Computing, STOC ’83, pages 1–9, New York, NY, USA, 1983. ACM.
• Jayadev Acharya, Gautam Kamath, Ziteng Sun, and Huanyu Zhang. Inspectre: Privately estimating the unseen. In Proceedings of the 35th International Conference on Machine Learning, ICML ’18, pages 30–39. JMLR, Inc., 2018.
• Noga Alon. Expanders, sorting in rounds and superconcentrators of limited depth. In Proceedings of the 17th Annual ACM Symposium on the Theory of Computing, STOC ’85, pages 98–102, New York, NY, USA, 1985. ACM.
• Yossi Azar and Nicholas Pippenger. Parallel selection. Discrete Applied Mathematics, 27(1-2):49–58, 1990.
• Jordan Awan and Aleksandra Slavkovic. Differentially private uniformly most powerful tests for binomial data. In Advances in Neural Information Processing Systems 31, NeurIPS ’18, pages 4208–4218. Curran Associates, Inc., 2018.
• Jayadev Acharya, Ziteng Sun, and Huanyu Zhang. Differentially private testing of identity and closeness of discrete distributions. In Advances in Neural Information Processing Systems 31, NeurIPS ’18, pages 6878–6891. Curran Associates, Inc., 2018.
• Yossi Azar and Uzi Vishkin. Tight comparison bounds on the complexity of parallel sorting. SIAM Journal on Computing, 16(3):458–464, 1987.
• Bela Bollobas and Graham Brightwell. Parallel selection with high probability. SIAM Journal on Discrete Mathematics, 3(1):21–31, 1990.
• Mark Braverman, Ankit Garg, Tengyu Ma, Huy L. Nguyen, and David P. Woodruff. Communication lower bounds for statistical estimation problems via a distributed data processing inequality. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, Cambridge, MA, USA, June 18-21, 2016, pages 1011–1020, 2016.
• Olivier Bousquet, Daniel M. Kane, and Shay Moran. The optimal approximation factor in density estimation. In Proceedings of the 32nd Annual Conference on Learning Theory, COLT ’19, pages 318–341, 2019.
• [BKSW19] Mark Bun, Gautam Kamath, Thomas Steinke, and Zhiwei Steven Wu. Private hypothesis selection. In Advances in Neural Information Processing Systems 32, NeurIPS ’19, pages 156–167. Curran Associates, Inc., 2019.
• Mark Braverman, Jieming Mao, and Yuval Peres. Sorted top-k in rounds. In Proceedings of the 32nd Annual Conference on Learning Theory, COLT ’19, pages 342–382, 2019.
• Mark Braverman, Jieming Mao, and S. Matthew Weinberg. Parallel algorithms for select and partition with noisy comparisons. In Proceedings of the 48th Annual ACM Symposium on the Theory of Computing, STOC ’16, pages 851–862, New York, NY, USA, 2016. ACM.
• Hai Brenner and Kobbi Nissim. Impossibility of differentially private universally optimal mechanisms. SIAM Journal on Computing, 43(5):1513–1540, 2014.
• Bela Bollobas and Andrew Thomason. Parallel sorting. Discrete Applied Mathematics, 6(1):1–11, 1983.
• Zachary Campbell, Andrew Bray, Anna Ritz, and Adam Groce. Differentially private ANOVA testing. In Proceedings of the 2018 International Conference on Data Intelligence and Security, ICDIS ’18, pages 281–285, Washington, DC, USA, 2018. IEEE Computer Society.
• Bryan Cai, Constantinos Daskalakis, and Gautam Kamath. Priv’it: Private and sample efficient identity testing. In Proceedings of the 34th International Conference on Machine Learning, ICML ’17, pages 635–644. JMLR, Inc., 2017.
• [CKM+19a] Clement L. Canonne, Gautam Kamath, Audra McMillan, Adam Smith, and Jonathan Ullman. The structure of optimal private tests for simple hypotheses. In Proceedings of the 51st Annual ACM Symposium on the Theory of Computing, STOC ’19, New York, NY, USA, 2019. ACM.
• [CKM+19b] Clement L. Canonne, Gautam Kamath, Audra McMillan, Jonathan Ullman, and Lydia Zakynthinou. Private identity testing for high-dimensional distributions. arXiv preprint arXiv:1905.11947, 2019.
• Simon Couch, Zeki Kazan, Kaiyan Shi, Andrew Bray, and Adam Groce. Differentially private nonparametric hypothesis testing. In Proceedings of the 2019 ACM Conference on Computer and Communications Security, CCS ’19, New York, NY, USA, 2019. ACM.
• T-H Hubert Chan, Elaine Shi, and Dawn Song. Private and continual release of statistics. ACM Transactions on Information and System Security (TISSEC), 14(3):1–24, 2011.
• Constantinos Daskalakis, Ilias Diakonikolas, and Rocco A. Servedio. Learning Poisson binomial distributions. In Proceedings of the 44th Annual ACM Symposium on the Theory of Computing, STOC ’12, pages 709–728, New York, NY, USA, 2012. ACM.
• Amit Daniely and Vitaly Feldman. Locally private learning without interaction requires separation. In Advances in Neural Information Processing Systems 32, NeurIPS ’19, pages 14975–14986. Curran Associates, Inc., 2019.
• Differential Privacy Team, Apple. Learning with privacy at scale. https://machinelearning.apple.com/docs/learning-with-privacy-at-scale/appledifferentialprivacysystem.pdf, December 2017.
• John C. Duchi, Michael I. Jordan, and Martin J. Wainwright. Local privacy and statistical minimax rates. In Proceedings of the 54th Annual IEEE Symposium on Foundations of Computer Science, FOCS ’13, pages 429–438, Washington, DC, USA, 2013. IEEE Computer Society.
• John C. Duchi, Michael I. Jordan, and Martin J. Wainwright. Minimax optimal procedures for locally private estimation. Journal of the American Statistical Association, 2017.
• Constantinos Daskalakis and Gautam Kamath. Faster and sample near-optimal algorithms for proper learning mixtures of Gaussians. In Proceedings of the 27th Annual Conference on Learning Theory, COLT ’14, pages 1183–1213, 2014.
• Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Ankur Moitra, and Alistair Stewart. Robust estimators in high dimensions without the computational intractability. In Proceedings of the 57th Annual IEEE Symposium on Foundations of Computer Science, FOCS ’16, pages 655–664, Washington, DC, USA, 2016. IEEE Computer Society.
• Bolin Ding, Janardhan Kulkarni, and Sergey Yekhanin. Collecting telemetry data privately. In Advances in Neural Information Processing Systems 30, NIPS ’17, pages 3571–3580. Curran Associates, Inc., 2017.
• Luc Devroye and Gabor Lugosi. A universally acceptable smoothing factor for kernel density estimation. The Annals of Statistics, 24(6):2499–2512, 1996.
• Luc Devroye and Gabor Lugosi. Nonasymptotic universal smoothing factors, kernel complexity and Yatracos classes. The Annals of Statistics, 25(6):2626–2637, 1997.
• Luc Devroye and Gabor Lugosi. Combinatorial methods in density estimation. Springer, 2001.
• [DMNS06] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Proceedings of the 3rd Conference on Theory of Cryptography, TCC ’06, pages 265–284, Berlin, Heidelberg, 2006. Springer.
• [DMR18] Luc Devroye, Abbas Mehrabian, and Tommy Reddad. The total variation distance between high-dimensional Gaussians. arXiv preprint arXiv:1810.08693, 2018.
• Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Machine Learning, 9(3–4):211–407, 2014.
• John Duchi and Ryan Rogers. Lower bounds for locally private estimation via communication complexity. In Proceedings of the 32nd Annual Conference on Learning Theory, COLT ’19, pages 1161–1191, 2019.
• Alexandre Evfimievski, Johannes Gehrke, and Ramakrishnan Srikant. Limiting privacy breaches in privacy preserving data mining. In Proceedings of the 22nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS ’03, pages 211–222, New York, NY, USA, 2003. ACM.
• Ulfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM Conference on Computer and Communications Security, CCS ’14, pages 1054–1067, New York, NY, USA, 2014. ACM.
• Marco Gaboardi, Hyun-Woo Lim, Ryan M. Rogers, and Salil P. Vadhan. Differentially private chi-squared hypothesis testing: Goodness of fit and independence testing. In Proceedings of the 33rd International Conference on Machine Learning, ICML ’16, pages 1395–1403. JMLR, Inc., 2016.
• Marco Gaboardi and Ryan Rogers. Local private hypothesis testing: Chi-square tests. In Proceedings of the 35th International Conference on Machine Learning, ICML ’18, pages 1626–1635. JMLR, Inc., 2018.
• Roland Haggkvist and Pavol Hell. Parallel sorting with constant time for comparisons. SIAM Journal on Computing, 10(3):465–472, 1981.
• Matthew Joseph, Jieming Mao, Seth Neel, and Aaron Roth. The role of interactivity in local differential privacy. In Proceedings of the 60th Annual IEEE Symposium on Foundations of Computer Science, FOCS ’19, pages 94–105, Washington, DC, USA, 2019. IEEE Computer Society.
• Matthew Joseph, Jieming Mao, and Aaron Roth. Exponential separations in local differential privacy through communication complexity. In Proceedings of the 31st Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’20, pages 515–527, Philadelphia, PA, USA, 2020. SIAM.
• Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. What can we learn privately? SIAM Journal on Computing, 40(3):793–826, 2011.
• Gautam Kamath, Jerry Li, Vikrant Singhal, and Jonathan Ullman. Privately learning high-dimensional distributions. In Proceedings of the 32nd Annual Conference on Learning Theory, COLT ’19, pages 1853–1902, 2019.
• Daniel Kifer and Ryan M. Rogers. A new class of private chi-square tests. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS ’17, pages 991–1000. JMLR, Inc., 2017.
• Clyde P. Kruskal. Searching, merging, and sorting in parallel computation. IEEE Transactions on Computers, C-32(10):942–946, 1983.
• Kazuya Kakizaki, Jun Sakuma, and Kazuto Fukuchi. Differentially private chi-squared test by unit circle mechanism. In Proceedings of the 34th International Conference on Machine Learning, ICML ’17, pages 1761–1770. JMLR, Inc., 2017.
• Tom Leighton. Tight bounds on the complexity of parallel sorting. In Proceedings of the 16th Annual ACM Symposium on the Theory of Computing, STOC ’84, pages 71–80, New York, NY, USA, 1984. ACM.
• Satyaki Mahalanabis and Daniel Stefankovic. Density estimation in linear time. In Proceedings of the 21st Annual Conference on Learning Theory, COLT ’08, pages 503–512, 2008.
• Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, FOCS ’07, pages 94–103, Washington, DC, USA, 2007. IEEE Computer Society.
• Jerzy Neyman and Egon Sharpe Pearson. IX. On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 231(694–706):289–337, 1933.
• Nicholas Pippenger. Sorting and selecting in rounds. SIAM Journal on Computing, 16(6):1032–1038, 1987.
• [SGHG+19] Marika Swanberg, Ira Globus-Harris, Iris Griffith, Anna Ritz, Adam Groce, and Andrew Bray. Improved differentially private analysis of variance. Proceedings on Privacy Enhancing Technologies, 2019(3), 2019.
• Or Sheffet. Locally private hypothesis testing. In Proceedings of the 35th International Conference on Machine Learning, ICML ’18, pages 4605–4614. JMLR, Inc., 2018.
• Ananda Theertha Suresh, Alon Orlitsky, Jayadev Acharya, and Ashkan Jafarpour. Near-optimal-sample estimators for spherical Gaussian mixtures. In Advances in Neural Information Processing Systems 27, NIPS ’14, pages 1395–1403. Curran Associates, Inc., 2014.
• Jonathan Ullman. Tight lower bounds for locally differentially private selection. arXiv preprint arXiv:1802.02638, 2018.
• Caroline Uhler, Aleksandra Slavkovic, and Stephen E. Fienberg. Privacy-preserving data sharing for genome-wide association studies. The Journal of Privacy and Confidentiality, 5(1):137–166, 2013.
• Leslie G. Valiant. Parallelism in comparison problems. SIAM Journal on Computing, 4(3):348–355, 1975.
• Duy Vu and Aleksandra Slavkovic. Differential privacy for clinical trial data: Preliminary evaluations. In 2009 IEEE International Conference on Data Mining Workshops, ICDMW ’09, pages 138–143. IEEE, 2009.
• Stanley L. Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63–69, 1965.
• [WLK15] Yue Wang, Jaewoo Lee, and Daniel Kifer. Revisiting differentially private hypothesis tests for categorical data. arXiv preprint arXiv:1511.03376, 2015.
• Yannis G. Yatracos. Rates of convergence of minimum distance estimators and Kolmogorov’s entropy. The Annals of Statistics, 13(2):768–774, 1985.