Adversarially Robust Streaming Algorithms via Differential Privacy

NeurIPS 2020.

Keywords:
data analysis, data stream, adversarial robustness, streaming algorithm

Abstract:

A streaming algorithm is said to be adversarially robust if its accuracy guarantees are maintained even when the data stream is chosen maliciously, by an adaptive adversary. We establish a connection between adversarial robustness of streaming algorithms and the notion of differential privacy. This connection allows us to design new adv...


Introduction
  • The field of streaming algorithms was formalized by Alon, Matias, and Szegedy [3], and has generated a large body of work that intersects many other fields in computer science such as theory, databases, networking, and natural language processing.
  • The authors establish a connection between adversarial robustness of streaming algorithms and differential privacy, a model to provably guarantee privacy protection when analyzing data.
  • For many problems of interest, even in the general turnstile model, this technique allows them to obtain adversarially robust streaming algorithms with sublinear space.
Highlights
  • Streaming algorithms can be queried many times throughout the execution
  • The vast majority of the work on streaming algorithms is focused on the oblivious setting
  • Over the last few years, differential privacy has proven to be an important algorithmic notion and has found use in many other fields, such as machine learning, mechanism design, secure computation, probability theory, and secure storage [35, 17, 26, 41, 5, 39, 40, 33, 6]. In particular, the authors' results rely on a connection between differential privacy and generalization, first discovered by Dwork et al. [17] in the context of adaptive data analysis
  • Let A(r, ai) denote the estimate returned by the oblivious streaming algorithm A after the ith update, when it is executed with the random string r and receives the stream ai
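For reference, the notion of differential privacy that these highlights rely on can be stated as follows. This is the standard textbook definition, not a quotation from the paper; the symbols M, S, S', T, ε and δ are the usual ones and are introduced here only for illustration.

```latex
% A randomized algorithm M is (\varepsilon, \delta)-differentially private if,
% for every pair of databases S, S' that differ in a single element
% and for every set of outcomes T,
\[
  \Pr[\,M(S) \in T\,] \;\le\; e^{\varepsilon} \cdot \Pr[\,M(S') \in T\,] + \delta .
\]
```

In the robust-streaming setting, the "database" appears to be the collection of random strings r used by the independent copies of the oblivious algorithm A, so privacy limits how much the published estimates reveal about any single copy's randomness.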
Results
  • Fix any function g and let A be an oblivious streaming algorithm for g that for any α, δ > 0 uses space L(α, δ) and guarantees accuracy α with success probability 1 − δ for streams of length m.
  • A composition theorem bounds the privacy guarantees of an algorithm that accesses its input database through several differentially private mechanisms.
  • The authors use the sparse vector technique [19] to identify the time steps in which the responses of the k copies of A need to be aggregated; the aggregation itself is done with a differentially private algorithm for approximating the median of the responses.
  • Let A be an oblivious streaming algorithm for a functionality g, that guarantees that with probability at least
  • With probability at least 1 − δ all the estimates returned by RobustSketch before it halts are accurate to within multiplicative error of (1 ± α), even when the stream is chosen by an adaptive adversary, provided that k=Ω
  • Let A(r, ai) denote the estimate returned by the oblivious streaming algorithm A after the ith update, when it is executed with the random string r and receives the stream ai.
  • Case (b): if the algorithm outputs an estimate in Step 3d, the estimate is computed by algorithm PrivateMed, which is executed on the database.
  • Let A be an oblivious streaming algorithm for a functionality g, that uses space and guarantees accuracy α/10 with success probability
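The overall structure described in these results (k independent copies of an oblivious algorithm, a sparse-vector test that decides when to refresh the output, and a private median for the refresh) can be sketched in Python. This is a toy illustration under loud assumptions, not the paper's RobustSketch: the names `robust_sketch`, `private_median` and `NoisyCounter` are hypothetical, the estimator is a trivial stand-in, the median is computed with a plain exponential mechanism over a fixed grid rather than the paper's PrivateMed, and the values of k, ε and the refresh budget are arbitrary rather than the ones required by the analysis.

```python
import math
import random


def laplace(rng, scale):
    # The difference of two i.i.d. exponentials is Laplace-distributed.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_median(values, grid, eps, rng):
    """Exponential mechanism over a fixed grid (a stand-in for PrivateMed).

    Score of a candidate c is -|#{v <= c} - k/2|, which has sensitivity 1.
    """
    k = len(values)
    scores = [-abs(sum(v <= c for v in values) - k / 2) for c in grid]
    top = max(scores)
    weights = [math.exp(eps * (s - top) / 2) for s in scores]
    return rng.choices(grid, weights=weights, k=1)[0]


class NoisyCounter:
    """Toy oblivious estimator of the stream length.

    Its 'random string' is a fixed multiplicative bias in [0.95, 1.05].
    """

    def __init__(self, rng):
        self.bias = rng.uniform(0.95, 1.05)
        self.n = 0

    def update(self, item):
        self.n += 1

    def estimate(self):
        return self.bias * self.n


def robust_sketch(stream, make_estimator, k, alpha, eps, budget, grid, seed=0):
    rng = random.Random(seed)
    copies = [make_estimator(rng) for _ in range(k)]
    noisy_threshold = k / 2 + laplace(rng, 2 / eps)
    current, refreshes, outputs = 0.0, 0, []
    for item in stream:
        for c in copies:
            c.update(item)
        estimates = [c.estimate() for c in copies]
        # Sparse-vector query: how many copies disagree with the published output?
        far = sum(abs(e - current) > (alpha / 2) * e for e in estimates)
        if far + laplace(rng, 4 / eps) >= noisy_threshold:
            if refreshes == budget:
                break  # refresh budget exhausted: halt
            current = private_median(estimates, grid, eps, rng)
            noisy_threshold = k / 2 + laplace(rng, 2 / eps)
            refreshes += 1
        outputs.append(current)
    return outputs, refreshes


alpha = 0.2
grid = [(1 + alpha / 4) ** j for j in range(130)]  # geometric grid of candidate outputs
outputs, refreshes = robust_sketch(
    range(200), NoisyCounter, k=50, alpha=alpha, eps=1.0, budget=100, grid=grid
)
print(len(outputs), refreshes, outputs[-1])
```

The point of the design is that the published output changes only at refresh steps, so the adversary learns about the copies' randomness only through a bounded number of differentially private releases.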
Conclusion
  • There is an adversarially robust F2 estimation algorithm for τ-bounded deletion streams of length m that guarantees α accuracy with probability at least
  • The F2 estimation algorithm of [8] for τ-bounded deletion streams uses space
  • There is an adversarially robust F2 estimation algorithm for insertion-only streams of length m that guarantees α accuracy with probability at least
  • For interesting regimes of parameters, the authors' algorithm also outperforms the current state-of-the-art constructions for the insertion-only model (strictly speaking, the results for the insertion-only model are incomparable with [8])
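To make the F2 estimation problem mentioned above concrete: F2 is the sum of the squared item frequencies of the stream. The Python sketch below shows a classic oblivious estimator in the style of Alon, Matias, and Szegedy [3] (random ±1 signs, squared counters, median of means). It is not the robust algorithm of the paper and not the algorithm of [8]; the class name `AMSSketch`, the helper `exact_f2`, and the parameter defaults are assumptions made for illustration, and items are assumed to be non-negative integers.

```python
import random
import statistics

MERSENNE = (1 << 61) - 1  # prime modulus for the polynomial hash


def exact_f2(updates):
    """Exact F2 (sum of squared frequencies) of a list of (item, delta) updates."""
    freq = {}
    for item, delta in updates:
        freq[item] = freq.get(item, 0) + delta
    return sum(f * f for f in freq.values())


class AMSSketch:
    """Oblivious F2 estimator: median over groups of the mean squared counter."""

    def __init__(self, groups=9, per_group=64, seed=0):
        rng = random.Random(seed)
        self.groups, self.per_group = groups, per_group
        total = groups * per_group
        # One random degree-3 polynomial per counter (4-wise independent hash).
        self.polys = [[rng.randrange(MERSENNE) for _ in range(4)] for _ in range(total)]
        self.counters = [0] * total

    def _sign(self, coeffs, item):
        h = 0
        for c in coeffs:  # Horner evaluation modulo the prime
            h = (h * item + c) % MERSENNE
        # Low bit as a sign; the bias from the odd modulus is negligible here.
        return 1 if h & 1 else -1

    def update(self, item, delta=1):
        # Turnstile update: delta may be negative (a deletion).
        for j, coeffs in enumerate(self.polys):
            self.counters[j] += delta * self._sign(coeffs, item)

    def estimate(self):
        squares = [c * c for c in self.counters]
        means = [
            statistics.fmean(squares[g * self.per_group:(g + 1) * self.per_group])
            for g in range(self.groups)
        ]
        return statistics.median(means)


updates = [(i % 20, 1) for i in range(400)] + [(3, -5)]
sketch = AMSSketch()
for item, delta in updates:
    sketch.update(item, delta)
print("estimate:", sketch.estimate(), "exact:", exact_f2(updates))
```

The accuracy guarantee of such a sketch holds for a stream fixed in advance. An adaptive adversary that sees the estimates can correlate later updates with the sketch's random signs, which is the failure mode the robust constructions are designed to rule out.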
Reference
  • K. J. Ahn, S. Guha, and A. McGregor. Analyzing graph structure via linear measurements. In Y. Rabani, editor, Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, Kyoto, Japan, January 17-19, 2012, pages 459–467. SIAM, 2012.
  • K. J. Ahn, S. Guha, and A. McGregor. Graph sketches: sparsification, spanners, and subgraphs. In M. Benedikt, M. Krötzsch, and M. Lenzerini, editors, Proceedings of the 31st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2012, Scottsdale, AZ, USA, May 20-24, 2012, pages 5–14. ACM, 2012.
  • N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci., 58(1):137–147, 1999.
  • Z. Bar-Yossef, T. S. Jayram, R. Kumar, D. Sivakumar, and L. Trevisan. Counting distinct elements in a data stream. In J. D. P. Rolim and S. Vadhan, editors, Randomization and Approximation Techniques in Computer Science, pages 1–10, Berlin, Heidelberg, 2002. Springer Berlin Heidelberg.
  • R. Bassily, K. Nissim, A. D. Smith, T. Steinke, U. Stemmer, and J. Ullman. Algorithmic stability for adaptive data analysis. In D. Wichs and Y. Mansour, editors, Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, Cambridge, MA, USA, June 18-21, 2016, pages 1046–1059. ACM, 2016.
  • A. Beimel, I. Haitner, N. Makriyannis, and E. Omri. Tighter bounds on multi-party coin flipping via augmented weak martingales and differentially private sampling. In M. Thorup, editor, 59th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2018, Paris, France, October 7-9, 2018, pages 838–849. IEEE Computer Society, 2018.
  • A. Beimel, K. Nissim, and U. Stemmer. Private learning and sanitization: Pure vs. approximate differential privacy. In APPROX-RANDOM, volume 8096 of Lecture Notes in Computer Science, pages 363–378.
  • O. Ben-Eliezer, R. Jayaram, D. P. Woodruff, and E. Yogev. A framework for adversarially robust streaming algorithms. CoRR, abs/2003.14265, 2020.
  • O. Ben-Eliezer and E. Yogev. The adversarial robustness of sampling. CoRR, abs/1906.11327, 2019.
  • J. Błasiok, J. Ding, and J. Nelson. Continuous monitoring of ℓp norms in data streams. In K. Jansen, J. D. P. Rolim, D. Williamson, and S. S. Vempala, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2017, August 16-18, 2017, Berkeley, CA, USA, volume 81 of LIPIcs, pages 32:1–32:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017.
  • M. Bun, C. Dwork, G. N. Rothblum, and T. Steinke. Composable and versatile privacy via truncated CDP. In STOC, pages 74–86, 2018.
  • M. Bun, K. Nissim, U. Stemmer, and S. P. Vadhan. Differentially private release and learning of threshold functions. In FOCS, pages 634–649, 2015.
  • M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. In Proceedings of the 29th International Colloquium on Automata, Languages and Programming, ICALP '02, pages 693–703, Berlin, Heidelberg, 2002. Springer-Verlag.
  • G. Cormode and S. Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms, 55(1):58 – 75, 2005.
  • G. Cormode and S. Muthukrishnan. What's hot and what's not: Tracking most frequent items dynamically. ACM Trans. Database Syst., 30(1):249–278, Mar. 2005.
  • M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. SIAM Journal on Computing, 31(6):1794–1813, 2002.
  • C. Dwork, V. Feldman, M. Hardt, T. Pitassi, O. Reingold, and A. L. Roth. Preserving statistical validity in adaptive data analysis. In STOC, pages 117–126. ACM, 2015.
  • C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, volume 3876 of Lecture Notes in Computer Science, pages 265–284.
  • C. Dwork, M. Naor, O. Reingold, G. N. Rothblum, and S. P. Vadhan. On the complexity of differentially private data release: efficient algorithms and hardness results. In M. Mitzenmacher, editor, STOC, pages 381–390. ACM, 2009.
  • C. Dwork, G. N. Rothblum, and S. P. Vadhan. Boosting and differential privacy. In FOCS, pages 51–60. IEEE Computer Society, 2010.
  • M. Elkin and J. Zhang. Efficient algorithms for constructing (1+epsilon, beta)-spanners in the distributed and streaming models. Distributed Comput., 18(5):375–385, 2006.
  • P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 31(2):182 – 209, 1985.
  • A. C. Gilbert, B. Hemenway, A. Rudra, M. J. Strauss, and M. Wootters. Recovering simple signals. In 2012 Information Theory and Applications Workshop, pages 382–391, 2012.
  • A. C. Gilbert, B. Hemenway, M. J. Strauss, D. P. Woodruff, and M. Wootters. Reusable low-error compressive sampling schemes through privacy. In 2012 IEEE Statistical Signal Processing Workshop (SSP), pages 536–539, 2012.
  • M. Hardt and G. N. Rothblum. A multiplicative weights mechanism for privacy-preserving data analysis. In FOCS, pages 61–70. IEEE Computer Society, 2010.
  • M. Hardt and J. Ullman. Preventing false discovery in interactive data analysis is hard. In FOCS, pages 454–463, 2014.
  • M. Hardt and D. P. Woodruff. How robust are linear sketches to adaptive inputs? In STOC, pages 121–130. ACM, June 1-4 2013.
  • P. Indyk and D. Woodruff. Optimal approximations of the frequency moments of data streams. In Proceedings of the Thirty-Seventh Annual ACM Symposium on Theory of Computing, STOC '05, pages 202–208, New York, NY, USA, 2005. Association for Computing Machinery.
  • R. Jayaram and D. P. Woodruff. Data streams with bounded deletions. In J. V. den Bussche and M. Arenas, editors, Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, Houston, TX, USA, June 10-15, 2018, pages 341–354. ACM, 2018.
  • D. M. Kane, J. Nelson, and D. P. Woodruff. On the exact space complexity of sketching and streaming small norms. In M. Charikar, editor, Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, Austin, Texas, USA, January 17-19, 2010, pages 1161–1178. SIAM, 2010.
  • D. M. Kane, J. Nelson, and D. P. Woodruff. An optimal algorithm for the distinct elements problem. In Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS '10, pages 41–52, New York, NY, USA, 2010. Association for Computing Machinery.
  • H. Kaplan, K. Ligett, Y. Mansour, M. Naor, and U. Stemmer. Privately learning thresholds: Closing the exponential gap. CoRR, abs/1911.10137, 2019.
  • G. Kellaris, G. Kollios, K. Nissim, and A. O’Neill. Accessing data while preserving privacy. CoRR, abs/1706.01552, 2017.
  • G. S. Manku and R. Motwani. Approximate frequency counts over data streams. In Proceedings of 28th International Conference on Very Large Data Bases, VLDB 2002, Hong Kong, August 20-23, 2002, pages 346–357. Morgan Kaufmann, 2002.
  • F. McSherry and K. Talwar. Mechanism design via differential privacy. In FOCS, pages 94–103. IEEE Computer Society, 2007.
  • I. Mironov, M. Naor, and G. Segev. Sketching in adversarial environments. SIAM J. Comput., 40(6):1845–1870, 2011.
  • S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Computer Science, 1(2):117–236, 2005.
  • J. Nelson. Sketching and streaming algorithms. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 2011.
  • K. Nissim, A. D. Smith, T. Steinke, U. Stemmer, and J. Ullman. The limits of post-selection generalization. In S. Bengio, H. M. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montreal, Canada, pages 6402–6411, 2018.
  • K. Nissim and U. Stemmer. Concentration bounds for high sensitivity functions through differential privacy. J. Priv. Confidentiality, 9(1), 2019.
  • T. Steinke and J. Ullman. Interactive fingerprinting codes and the hardness of preventing false discovery. In COLT, pages 1588–1628, 2015.