We propose the weighted averaging method for smoothly bounding user contributions in differential privacy.
Smoothly Bounding User Contributions in Differential Privacy
NeurIPS 2020 (2020)
A differentially private algorithm guarantees that the input of a single user won’t significantly change the output distribution of the algorithm. When a user contributes more data points, more information can be collected to improve the algorithm’s performance. But at the same time, more noise might need to be added to the algorithm in o…
- The notion of Differential Privacy, introduced by [DMNS06], aims to capture the requirement that the output of an algorithm should not reveal much about the information provided by a single user.
- In many applications of differential privacy, a single user might contribute more than one data point.
- While the standard definition of differential privacy can still capture such settings by defining a row as the collection of all data points belonging to the same user, an important and useful nuance is lost in this translation.
- Most importantly, when a user contributes many data points, the algorithm designer must balance the value of the information contained in these data points against the additional noise that must be added to the output to make it private with respect to this user.
- Privacy is a fundamental concern in machine learning
- Respecting the privacy of users is a requirement of any real system, and differential privacy allows one to formalize this requirement.
- In this paper we provide algorithms with improved trade-offs between utility and differential privacy.
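The idea in the highlights above can be made concrete with a small sketch. The per-point weighting rule min(1, h / k), the name `dp_weighted_mean`, and the rough Laplace noise calibration below are illustrative assumptions, not the authors' exact estimator: the point is only that a heavy user is smoothly down-weighted rather than having samples discarded.

```python
import random

def dp_weighted_mean(user_data, h, eps, value_range=1.0, rng=None):
    """Illustrative weighted-averaging mean with bounded user influence.

    user_data: one list of points (each in [0, value_range]) per user.
    Each point of a user holding k points gets weight min(1, h / k),
    so a user's total weight is at most h: light contributors count
    fully, heavy contributors are smoothly down-weighted.
    """
    rng = rng or random.Random()
    weights, values = [], []
    for points in user_data:
        w = min(1.0, h / len(points))
        weights.extend([w] * len(points))
        values.extend(points)
    total_w = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total_w
    # One user shifts the weighted sum by at most h * value_range, so
    # we (roughly) take sensitivity h * value_range / total_w and add
    # Laplace noise of scale sensitivity / eps, sampled here as the
    # difference of two exponentials.
    scale = h * value_range / (total_w * eps)
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return mean + noise
```

With h = 1 every user counts once regardless of how many points they hold; a larger h trades more noise for more data, which is exactly the tension described above.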
- The authors perform an empirical evaluation of the algorithm and compare it with the sample-limiting algorithm for linear regression in the label-privacy case.
- In Appendix E, the authors provide experimental results on logistic regression using the ERM algorithm of Section 4.
- Datasets: The authors evaluated all methods on two publicly available datasets containing real-world data, as well as on synthetic datasets with ground truth generated with standard open-source libraries.
- Synthetic data: The authors generated regression problem instances with sklearn’s make_regression (n ∈ [600, 3000] samples, d = 10 features, bias=0.0 and noise=20).
- To model user contributions, the authors used a Zipf distribution for the number of rows per user.
- Results on the synthetic dataset: The authors compare the methods for different numbers of samples n and different values of the parameter α of the Zipf distribution.
- Lower α values correspond to more uneven distributions.
- The results for α = 1.5, ε = 1 are plotted in Figure 1.
- The authors fix ε = 1 and n = 3000 samples and analyze the effect of the parameter α in Figure 2.
- The authors' method always performs better, and it is comparatively much better for low α.
- The authors propose the weighted averaging method for smoothly bounding user contribution in differential privacy
- The authors apply this method to estimating the mean and quantiles, empirical risk minimization, and linear regression.
- Privacy requirements may negatively affect utility, and it is known that differential privacy potentially disparately impacts certain users [BPS19].
- Such considerations are beyond the scope of the paper, and the authors refer to the emerging literature on responsible machine learning for addressing them [KR19].
- Table 1: Average squared errors for our method, sample limit with best threshold (h∗), and using all data (hmax)
- Differential privacy was introduced in the seminal work of [DMNS06]. For a detailed survey on differential privacy, see [DR14].
Differentially private linear regression and its general form, empirical risk minimization, have been well studied [CM08, CMS11, KST12, JKT12, TS13, SCS13, DJW13, JT14, BST14, Ull15, TTZ15, STU17, WLK+17, WYX17, ZZMW17, Wan18, She19a, She19b, INS+19, BFTT19, WX19, FKT20]. In particular, [WX19] studies label privacy, which is similar to the setting we have in Section 5. These results address the case where each user has only one data point.
Motivated by federated learning, [AKMV19] initiates the study of bounding user contributions in differential privacy. [TAM19, PSY+19] study how to adaptively bound user contributions in differentially private stochastic gradient descent for federated learning. For a detailed survey on federated learning, see [KMA+19]. More broadly, our setting of each user having multiple data points can be considered a special case of personalized/heterogeneous differential privacy [JYC15, SCS15, AGK17], and it is closely related to group privacy, introduced in [Dwo06].
Study subjects and analysis
To model user contributions we used the Zipf (power law) distribution for the number of rows per user (user contributions are often heavy-tailed [AH02]). Real-world datasets: We also used two UCI Machine Learning datasets: drugs [GKMZ18] (n = 3107, d = 8, m = 502 users with min 1 and max 63 samples) and news [MT18] (n = 3452, d = 10, m = 297 users with min 1 and max 878 samples).
Our method obtains a lower squared error, even orders of magnitude lower (notice the y-axis is in log scale). We now fix ε = 1 and n = 3000 samples and analyze the effect of the parameter α in Figure 2. Recall that α controls the inequality in the distribution of the users’ contributions.
- [AGK17] Mohammad Alaggan, Sébastien Gambs, and Anne-Marie Kermarrec. Heterogeneous differential privacy. Journal of Privacy and Confidentiality, 7(2), Jan. 2017.
- [AH02] Lada A Adamic and Bernardo A Huberman. Zipf’s law and the internet. Glottometrics, 3(1):143–150, 2002.
- [Ait34] AC Aitken. On least-squares and linear combinations of observations. Proceedings of the Royal Society of Edinburgh, 55:42–48, 1934.
- [AKMV19] Kareem Amin, Alex Kulesza, Andres Munoz, and Sergei Vassilvitskii. Bounding user contributions: A bias-variance trade-off in differential privacy. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 263–271, June 2019.
- [BFTT19] Raef Bassily, Vitaly Feldman, Kunal Talwar, and Abhradeep Guha Thakurta. Private stochastic convex optimization with optimal rates. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019), pages 11279–11288, 2019.
- [BPS19] Eugene Bagdasaryan, Omid Poursaeed, and Vitaly Shmatikov. Differential privacy has disparate impact on model accuracy. In Advances in Neural Information Processing Systems 32, pages 15479–15488, 2019.
- [BST14] Raef Bassily, Adam Smith, and Abhradeep Thakurta. Private empirical risk minimization: Efficient algorithms and tight error bounds. In Proceedings of the 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, FOCS ’14, pages 464–473, 2014.
- [CH11] Kamalika Chaudhuri and Daniel Hsu. Sample complexity bounds for differentially private learning. In Proceedings of the 24th Annual Conference on Learning Theory, volume 19 of Proceedings of Machine Learning Research, pages 155–186, June 2011.
- [CM08] Kamalika Chaudhuri and Claire Monteleoni. Privacy-preserving logistic regression. In Proceedings of the 21st International Conference on Neural Information Processing Systems, NIPS’08, pages 289–296, 2008.
- [CMS11] Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. Differentially private empirical risk minimization. J. Mach. Learn. Res., 12:1069–1109, July 2011.
- [DJW13] J. C. Duchi, M. I. Jordan, and M. J. Wainwright. Local privacy and statistical minimax rates. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pages 429–438, 2013.
- [DMNS06] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Proceedings of the Third Conference on Theory of Cryptography, TCC’06, page 265–284, 2006.
- [DR14] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci., 9(3–4):211–407, August 2014.
- [Dwo06] Cynthia Dwork. Differential privacy. In Michele Bugliesi, Bart Preneel, Vladimiro Sassone, and Ingo Wegener, editors, Automata, Languages and Programming, pages 1–12, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.
- [FKT20] Vitaly Feldman, Tomer Koren, and Kunal Talwar. Private stochastic convex optimization: Optimal rates in linear time. In STOC, 2020.
- [Gau26] Carl Friedrich Gauss. Theoria Combinationis Observationum Erroribus Minimis Obnoxiae, Parts 1, 2 and suppl. Werke 4, 1–108. 1821, 1823, 1826.
- [GKMZ18] Felix Gräßer, Surya Kallumadi, Hagen Malberg, and Sebastian Zaunseder. Aspect-based sentiment analysis of drug reviews applying cross-domain and cross-data learning. In Proceedings of the 2018 International Conference on Digital Health, pages 121–125, 2018.
- [INS+19] R. Iyengar, J. P. Near, D. Song, O. Thakkar, A. Thakurta, and L. Wang. Towards practical differentially private convex optimization. In 2019 IEEE Symposium on Security and Privacy (SP), pages 299–316, 2019.
- [JKT12] Prateek Jain, Pravesh Kothari, and Abhradeep Thakurta. Differentially private online learning. In Proceedings of the 25th Annual Conference on Learning Theory, volume 23 of Proceedings of Machine Learning Research, pages 24.1–24.34, 2012. PMLR.
- [JT14] Prateek Jain and Abhradeep Guha Thakurta. (Near) dimension independent risk bounds for differentially private learning. In Proceedings of the 31st International Conference on Machine Learning, volume 32 of Proceedings of Machine Learning Research, pages 476–484, 2014. PMLR.
- [JYC15] Z. Jorgensen, T. Yu, and G. Cormode. Conservative or liberal? Personalized differential privacy. In 2015 IEEE 31st International Conference on Data Engineering, pages 1023–1034, 2015.
- [KMA+19] Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Keith Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, Rafael G. L. D’Oliveira, Salim El Rouayheb, David Evans, Josh Gardner, Zachary Garrett, Adrià Gascón, Badih Ghazi, Phillip B. Gibbons, Marco Gruteser, Zaïd Harchaoui, Chaoyang He, Lie He, Zhouyuan Huo, Ben Hutchinson, Justin Hsu, Martin Jaggi, Tara Javidi, Gauri Joshi, Mikhail Khodak, Jakub Konecný, Aleksandra Korolova, Farinaz Koushanfar, Sanmi Koyejo, Tancrède Lepoint, Yang Liu, Prateek Mittal, Mehryar Mohri, Richard Nock, Ayfer Özgür, Rasmus Pagh, Mariana Raykova, Hang Qi, Daniel Ramage, Ramesh Raskar, Dawn Song, Weikang Song, Sebastian U. Stich, Ziteng Sun, Ananda Theertha Suresh, Florian Tramèr, Praneeth Vepakomma, Jianyu Wang, Li Xiong, Zheng Xu, Qiang Yang, Felix X. Yu, Han Yu, and Sen Zhao. Advances and open problems in federated learning. CoRR, abs/1912.04977, 2019.
- [KR19] Michael Kearns and Aaron Roth. The Ethical Algorithm: The Science of Socially Aware Algorithm Design. Oxford University Press, 2019.
- [KST12] Daniel Kifer, Adam Smith, and Abhradeep Thakurta. Private convex empirical risk minimization and high-dimensional regression. In Proceedings of the 25th Annual Conference on Learning Theory, volume 23 of Proceedings of Machine Learning Research, pages 25.1–25.40, 2012. PMLR.
- [MT07] Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, FOCS ’07, pages 94–103, 2007.
- [MT18] Nuno Moniz and Luis Torgo. Multi-source social feedback of online news feeds. CoRR, 2018.
- [PSY+19] Venkatadheeraj Pichapati, Ananda Theertha Suresh, Felix X. Yu, Sashank J. Reddi, and Sanjiv Kumar. AdaCliP: Adaptive clipping for private SGD. CoRR, abs/1908.07643, 2019.
- [SCS13] S. Song, K. Chaudhuri, and A. D. Sarwate. Stochastic gradient descent with differentially private updates. In 2013 IEEE Global Conference on Signal and Information Processing, pages 245–248, 2013.
- [SCS15] Shuang Song, Kamalika Chaudhuri, and Anand Sarwate. Learning from data with heterogeneous noise using SGD. Volume 38 of Proceedings of Machine Learning Research, pages 894–902, 2015. PMLR.
- [She19a] Or Sheffet. Differentially private ordinary least squares. Journal of Privacy and Confidentiality, 9(1), Mar. 2019.
- [She19b] Or Sheffet. Old techniques in differentially private linear regression. In Proceedings of the 30th International Conference on Algorithmic Learning Theory, volume 98 of Proceedings of Machine Learning Research, pages 789–827, 2019. PMLR.
- [Smi11] Adam Smith. Privacy-preserving statistical estimation with optimal convergence rates. In Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing, STOC ’11, pages 813–822, 2011.
- [SSSS09] Shai Shalev-Shwartz, Ohad Shamir, Nathan Srebro, and Karthik Sridharan. Stochastic convex optimization. In COLT 2009 - The 22nd Conference on Learning Theory, Montreal, Quebec, Canada, June 18-21, 2009, 2009.
- [STU17] A. Smith, A. Thakurta, and J. Upadhyay. Is interaction necessary for distributed private learning? In 2017 IEEE Symposium on Security and Privacy (SP), pages 58–77, 2017.
- [TAM19] Om Thakkar, Galen Andrew, and H. Brendan McMahan. Differentially private learning with adaptive clipping. CoRR, abs/1905.03871, 2019.
- [TS13] Abhradeep Guha Thakurta and Adam Smith. Differentially private feature selection via stability arguments, and the robustness of the lasso. In Proceedings of the 26th Annual Conference on Learning Theory, volume 30 of Proceedings of Machine Learning Research, pages 819–850, 2013. PMLR.
- [TTZ15] Kunal Talwar, Abhradeep Thakurta, and Li Zhang. Nearly-optimal private lasso. In Proceedings of the 28th International Conference on Neural Information Processing Systems, NIPS’15, pages 3025–3033, 2015.
- [Ull15] Jonathan Ullman. Private multiplicative weights beyond linear queries. In Proceedings of the 34th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS ’15, pages 303–312, 2015.
- [Wan18] Yu-Xiang Wang. Revisiting differentially private linear regression: optimal and adaptive prediction & estimation in unbounded domain. In Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, pages 93–103, 2018.
- [WLK+17] Xi Wu, Fengan Li, Arun Kumar, Kamalika Chaudhuri, Somesh Jha, and Jeffrey Naughton. Bolt-on differential privacy for scalable stochastic gradient descent-based analytics. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD ’17, pages 1307–1322, 2017.
- [WX19] Di Wang and Jinhui Xu. On sparse linear regression in the local differential privacy model. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 6628–6637, 2019. PMLR.
- [WYX17] Di Wang, Minwei Ye, and Jinhui Xu. Differentially private empirical risk minimization revisited: Faster and more general. In Advances in Neural Information Processing Systems 30, pages 2722–2731, 2017.
- [ZZMW17] Jiaqi Zhang, Kai Zheng, Wenlong Mou, and Liwei Wang. Efficient private ERM for smooth objectives. In Carles Sierra, editor, Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017, pages 3922–3928. ijcai.org, 2017.