# On Adaptive Distance Estimation

NIPS 2020

Abstract

We provide a static data structure for distance estimation which supports *adaptive* queries. Concretely, given a dataset $X = \{x_i\}_{i = 1}^n$ of $n$ points in $\mathbb{R}^d$ and $0 < p \leq 2$, we construct a randomized data structure with low memory consumption and query time which, when later given any query point $q \in \mathbb{R}^d$ ...

Introduction

- Much research attention has been directed towards understanding the performance of machine learning algorithms in adaptive or adversarial environments.
- The authors' method is simple and likely applicable to other domains. They describe a generic approach for transforming randomized Monte Carlo data structures that do not support adaptive queries into ones that do, and show that, for the problem at hand, it applies to standard nonadaptive solutions to $\ell_p$ norm estimation with negligible overhead in query time and a factor-$d$ overhead in memory.
- The authors provide a new data structure for adaptive distance estimation (ADE) for $\ell_p$ norms ($0 < p \leq 2$) with memory consumption $\tilde{O}((n + d)d/\varepsilon^2)$, slightly more than the $O(nd)$ required to store $X$ in memory explicitly, but with query time only $\tilde{O}(\varepsilon^{-2}(n + d))$, as opposed to the $O(nd)$ query time of the trivial algorithm.
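For reference, the trivial algorithm mentioned above can be written in a few lines of Python. This is only an illustrative baseline, not code from the paper; the function names `lp_distance` and `trivial_distance_estimation` are hypothetical. It reads all $n \cdot d$ stored coordinates on every query, which is where the $O(nd)$ query time comes from.

```python
def lp_distance(a, b, p):
    """Exact l_p (quasi-)norm of a - b, intended for 0 < p <= 2."""
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)


def trivial_distance_estimation(X, q, p):
    """Baseline: store X explicitly and scan every coordinate per query.

    Memory is O(n * d) and each query costs O(n * d) arithmetic operations.
    The answers are exact, so adaptively chosen queries cannot break it.
    """
    return [lp_distance(x, q, p) for x in X]


# Tiny usage example with n = 3 points in dimension d = 2.
X = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
q = [0.0, 0.0]
euclidean = trivial_distance_estimation(X, q, 2)
manhattan = trivial_distance_estimation(X, q, 1)
```

Because this baseline is deterministic, it is automatically correct under adaptive queries; the point of the data structure summarized here is to keep that robustness while reducing the per-query cost.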

Highlights

- In recent years, much research attention has been directed towards understanding the performance of machine learning algorithms in adaptive or adversarial environments.
- Our method is simple and likely applicable to other domains. We describe a generic approach for transforming randomized Monte Carlo data structures that do not support adaptive queries into ones that do, and show that, for the problem at hand, it applies to standard nonadaptive solutions to $\ell_p$ norm estimation with negligible overhead in query time and a factor-$d$ overhead in memory.
- We study the problem of designing efficient data structures for distance estimation, a basic primitive in algorithms for nonparametric estimation and exploratory data analysis, in the adaptive setting where the sequence of queries made to the data structure may be adversarially chosen.
- We studied the problem of adaptive distance estimation, where one is required to estimate the distance between a sequence of possibly adversarially chosen query points and the points in a dataset.
- The only previous result with comparable guarantees is an algorithm for the Euclidean case which returns only one near neighbor [Kle97] and does not estimate all distances.
- Starting with the influential work of [Bre96, Bre01], ensemble methods have been a mainstay in practical machine learning techniques.

Results

- Pre-processing time for the data structure can be improved by using fast algorithms for rectangular matrix multiplication (see Section 4 for further discussion).
- For the specific application of approximate nearest neighbor, the works of [Kle97, KOR00] provide non-trivial data structures supporting adaptive queries; a comparison with the authors' results is given in Subsection 1.2.
- The work of [Kle97] presents another algorithm with memory and query/pre-processing times similar to those of the ADE data structure, though only for Euclidean space.
- While both of these works provide algorithms with runtimes sublinear in n, they are for finding the approximate single nearest neighbor (“1-NN”) and do not provide distance estimates to all points in the same query time.
- For any $0 < \delta < 1$ and any $0 < p \leq 2$, there is a data structure for the ADE problem in $\ell_p$ space that succeeds on any query with probability at least $1 - \delta$, even in a sequence of adaptively chosen queries.
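- Algorithm 2, when given as input any query point $q \in \mathbb{R}^d$, $D = \{\Pi_j, \{\Pi_j x_i\}_{i=1}^n\}_{j=1}^l$ where $\{\Pi_j\}_{j=1}^l$ are $(\varepsilon, p)$-representative, $\varepsilon$ and $\delta$, outputs distance estimates $\{d_i\}_{i=1}^n$ satisfying $(1 - \varepsilon)\|q - x_i\|_p \leq d_i \leq (1 + \varepsilon)\|q - x_i\|_p$ for all $i \in [n]$, with probability at least $1 - \delta$.
- The query time follows from the time required to compute $\Pi_{j_k} q$ for $k \in [r]$ with $r = O(\log n/\delta)$, the $n$ median computations in Algorithm 2, and the setting of $m$.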
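The query procedure described above (project the query with $r$ sampled sketch matrices, then take $n$ medians) can be illustrated for the Euclidean case $p = 2$. This is a minimal sketch under assumptions that are not stated in this summary: the matrices $\Pi_j$ are taken to be scaled Gaussian matrices, the $r$ indices are sampled with replacement, and the values of `m`, `l` and `r` are illustrative rather than the ones required by the analysis. The function names `build_structure`, `matvec` and `query` are hypothetical.

```python
import math
import random
import statistics


def matvec(Pi, v):
    """Multiply an m x d matrix (list of rows) by a length-d vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in Pi]


def build_structure(X, m, l, rng):
    """Preprocessing (assumed form): draw l independent m x d Gaussian
    matrices Pi_j and store each one together with the sketches Pi_j x_i."""
    d = len(X[0])
    structure = []
    for _ in range(l):
        Pi = [[rng.gauss(0.0, 1.0) / math.sqrt(m) for _ in range(d)]
              for _ in range(m)]
        structure.append((Pi, [matvec(Pi, x) for x in X]))
    return structure


def query(structure, q, r, rng):
    """Query (assumed form): sample r stored sketches with replacement,
    estimate every distance under each one, and return per-point medians."""
    n = len(structure[0][1])
    per_point = [[] for _ in range(n)]
    for _ in range(r):
        Pi, sketches = structure[rng.randrange(len(structure))]
        pq = matvec(Pi, q)  # Pi_{j_k} q
        for i, px in enumerate(sketches):
            # ||Pi q - Pi x_i||_2 approximates ||q - x_i||_2.
            per_point[i].append(math.dist(pq, px))
    return [statistics.median(vals) for vals in per_point]


# Illustrative parameters only; they are not the values from the analysis.
rng = random.Random(0)
X = [[rng.uniform(-1.0, 1.0) for _ in range(20)] for _ in range(5)]
q = [rng.uniform(-1.0, 1.0) for _ in range(20)]
D = build_structure(X, m=200, l=10, rng=rng)
estimates = query(D, q, r=5, rng=rng)
```

The random choice of which stored sketches to use is made at query time, so an adversary who has seen earlier answers does not know in advance which matrices will answer the next query.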

Conclusion

- The proof of Theorem 4.1 follows by using Algorithm 1 to construct the adaptive data structure, D, and Algorithm 2 to answer any query, q.
- The only previous result with comparable guarantees is an algorithm for the Euclidean case which only returns one near neighbor [Kle97] and does not estimate all distances.
- Are there other machine learning tasks for which such trade-offs can be quantified?

Related work

As previously discussed, there has been growing interest in understanding the risks posed by deploying algorithms in potentially adversarial settings [BCM+17, HMPW16, GSS15, YHZL19, LCLS17, PMG16]. In addition, the problem of preserving statistical validity in exploratory data analysis has been well explored [DFH+15a, BNS+16, DFH+15b, DFH+15c, DSSU17], where the goal is to maintain coherence with an unknown distribution from which one obtains data samples. There has also been previous work studying linear sketches in adversarial scenarios quite different from those appearing here [MNS11, GHR+12, GHS+12].

Specifically on data structures, deterministic data structures automatically provide correctness guarantees for adaptive queries, though we are unaware of any non-trivial deterministic solutions for ADE. For the specific application of approximate nearest neighbor, the works of [Kle97, KOR00] provide non-trivial data structures supporting adaptive queries; a comparison with our results is given in Subsection 1.2. In the context of streaming algorithms (i.e., sublinear memory), the very recent work of Ben-Eliezer et al. [BEJWY20] considers streaming algorithms with both adaptive queries and updates. One key difference is that they consider the insertion-only model of streaming, which does not allow one to model computing some function of the difference of two vectors (e.g., the norm of $q - x_i$).
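The role of vector differences can be made concrete with a small check of linearity. The following is an illustration only, not code from the paper: it assumes a linear sketch given by a random Gaussian matrix `Pi`, and the helper name `sketch` is hypothetical. Because the sketch is linear, the sketch of $q - x_i$ equals the difference of the two sketches, which is what lets a data structure store the sketch of each $x_i$ ahead of time and sketch only $q$ at query time. Forming that difference involves subtraction, that is, negative updates.

```python
import random


def sketch(Pi, v):
    """Apply a linear sketch: multiply the m x d matrix Pi by the vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in Pi]


rng = random.Random(1)
d, m = 6, 3
Pi = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
q = [rng.uniform(-1.0, 1.0) for _ in range(d)]
x = [rng.uniform(-1.0, 1.0) for _ in range(d)]

# Sketch of the difference, computed directly from q - x.
direct = sketch(Pi, [a - b for a, b in zip(q, x)])

# Difference of the sketches, computed from Pi q and a stored Pi x.
from_stored = [a - b for a, b in zip(sketch(Pi, q), sketch(Pi, x))]

# The two agree up to floating-point rounding.
max_gap = max(abs(a - b) for a, b in zip(direct, from_stored))
```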

Reference

- [Ahl17] Thomas Dybdahl Ahle. Optimal Las Vegas locality sensitive data structures. In Chris Umans, editor, 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, October 15-17, 2017, pages 938–949. IEEE Computer Society, 2017.
- [AK17] Noga Alon and Bo’az Klartag. Optimal compression of approximate inner products and dimension reduction. In Proceedings of the 58th IEEE Annual Symposium on Foundations of Computer Science (FOCS), pages 639–650, 2017.
- [Alt92] N. S. Altman. An introduction to kernel and nearest-neighbor nonparametric regression. Amer. Statist., 46(3):175–185, 1992.
- [AMS97] Christopher G. Atkeson, Andrew W. Moore, and Stefan Schaal. Locally weighted learning. Artif. Intell. Rev., 11(1-5):11–73, 1997.
- [AMS99] Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci., 58(1):137–147, 1999.
- [BCM+17] Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Srndic, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. CoRR, abs/1708.06131, 2017.
- [BEJWY20] Omri Ben-Eliezer, Rajesh Jayaram, David P. Woodruff, and Eylon Yogev. A framework for adversarially robust streaming algorithms. In Proceedings of the 39th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), 2020.
- [BLM13] Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration inequalities. Oxford University Press, Oxford, 2013. A nonasymptotic theory of independence, With a foreword by Michel Ledoux.
- [BM01] Ella Bingham and Heikki Mannila. Random projection in dimensionality reduction: applications to image and text data. In Doheon Lee, Mario Schkolnick, Foster J. Provost, and Ramakrishnan Srikant, editors, Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, San Francisco, CA, USA, August 26-29, 2001, pages 245–250. ACM, 2001.
- [BNS+16] Raef Bassily, Kobbi Nissim, Adam D. Smith, Thomas Steinke, Uri Stemmer, and Jonathan Ullman. Algorithmic stability for adaptive data analysis. In Daniel Wichs and Yishay Mansour, editors, Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, Cambridge, MA, USA, June 18-21, 2016, pages 1046–1059. ACM, 2016.
- [Bre96] Leo Breiman. Bagging predictors. Mach. Learn., 24(2):123–140, 1996.
- [Bre01] Leo Breiman. Random forests. Mach. Learn., 45(1):5–32, 2001.
- [CBK09] Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: A survey. ACM Comput. Surv., 41(3):15:1–15:58, 2009.
- [CCF04] Moses Charikar, Kevin C. Chen, and Martin Farach-Colton. Finding frequent items in data streams. Theor. Comput. Sci., 312(1):3–15, 2004.
- [Cla88] Kenneth L. Clarkson. A randomized algorithm for closest-point queries. SIAM J. Comput., 17(4):830–847, 1988.
- [CMS76] J. M. Chambers, C. L. Mallows, and B. W. Stuck. A method for simulating stable random variables. J. Amer. Statist. Assoc., 71(354):340–344, 1976.
- [DFH+15a] Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Aaron Roth. Generalization in adaptive data analysis and holdout reuse. In Proceedings of the 28th Annual Conference on Advances in Neural Information Processing Systems (NIPS), pages 2350–2358, 2015.
- [DFH+15b] Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Aaron Roth. The reusable holdout: preserving validity in adaptive data analysis. Science, 349(6248):636–638, 2015.
- [DFH+15c] Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Aaron Leon Roth. Preserving statistical validity in adaptive data analysis. In Proceedings of the 47th Annual ACM on Symposium on Theory of Computing (STOC), pages 117–126, 2015.
- [Die00] Thomas G. Dietterich. Ensemble methods in machine learning. In Josef Kittler and Fabio Roli, editors, Multiple Classifier Systems, First International Workshop, MCS 2000, Cagliari, Italy, June 21-23, 2000, Proceedings, volume 1857 of Lecture Notes in Computer Science, pages 1–15. Springer, 2000.
- [DIIM04] Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Jack Snoeyink and Jean-Daniel Boissonnat, editors, Proceedings of the 20th ACM Symposium on Computational Geometry, Brooklyn, New York, USA, June 8-11, 2004, pages 253–262. ACM, 2004.
- [DSSU17] Cynthia Dwork, Adam Smith, Thomas Steinke, and Jonathan Ullman. Exposed! A survey of attacks on private data. Annual Review of Statistics and Its Application, 4(1):61–84, 2017.
- [GHR+12] Anna C. Gilbert, Brett Hemenway, Atri Rudra, Martin J. Strauss, and Mary Wootters. Recovering simple signals. In 2012 Information Theory and Applications Workshop, ITA 2012, San Diego, CA, USA, February 5-10, 2012, pages 382–391. IEEE, 2012.
- [GHS+12] Anna C. Gilbert, Brett Hemenway, Martin J. Strauss, David P. Woodruff, and Mary Wootters. Reusable low-error compressive sampling schemes through privacy. In IEEE Statistical Signal Processing Workshop, SSP 2012, Ann Arbor, MI, USA, August 5-8, 2012, pages 536–539. IEEE, 2012.
- [GSS15] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In Yoshua Bengio and Yann LeCun, editors, 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.
- [GU18] Francois Le Gall and Florent Urrutia. Improved rectangular matrix multiplication using powers of the Coppersmith-Winograd tensor. In Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1029–1046, 2018.
- [HMPW16] Moritz Hardt, Nimrod Megiddo, Christos H. Papadimitriou, and Mary Wootters. Strategic classification. In Madhu Sudan, editor, Proceedings of the 2016 ACM Conference on Innovations in Theoretical Computer Science, Cambridge, MA, USA, January 14-16, 2016, pages 111–122. ACM, 2016.
- [HSS08] Thomas Hofmann, Bernhard Schölkopf, and Alexander J. Smola. Kernel methods in machine learning. Ann. Statist., 36(3):1171–1220, 2008.
- [HW13] Moritz Hardt and David P. Woodruff. How robust are linear sketches to adaptive inputs? In Dan Boneh, Tim Roughgarden, and Joan Feigenbaum, editors, Symposium on Theory of Computing Conference, STOC’13, Palo Alto, CA, USA, June 1-4, 2013, pages 121–130. ACM, 2013.
- [IM98] Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Jeffrey Scott Vitter, editor, Proceedings of the Thirtieth Annual ACM Symposium on the Theory of Computing, Dallas, Texas, USA, May 23-26, 1998, pages 604–613. ACM, 1998.
- [Ind06] Piotr Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. J. ACM, 53(3):307–323, 2006.
- [IW18] Piotr Indyk and Tal Wagner. Approximate nearest neighbors in limited space. In Proceedings of the Conference On Learning Theory (COLT), pages 2012–2036, 2018.
- [JL84] William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. In Conference in modern analysis and probability (New Haven, Conn., 1982), volume 26 of Contemp. Math., pages 189–206. Amer. Math. Soc., Providence, RI, 1984.
- [Kle97] Jon M. Kleinberg. Two algorithms for nearest-neighbor search in high dimensions. In Frank Thomson Leighton and Peter W. Shor, editors, Proceedings of the Twenty-Ninth Annual ACM Symposium on the Theory of Computing, El Paso, Texas, USA, May 4-6, 1997, pages 599–608. ACM, 1997.
- [KNPW11] Daniel M. Kane, Jelani Nelson, Ely Porat, and David P. Woodruff. Fast moment estimation in data streams in optimal space. In Proceedings of the 43rd ACM Symposium on Theory of Computing (STOC), pages 745–754, 2011.
- [KNW10] Daniel M. Kane, Jelani Nelson, and David P. Woodruff. On the exact space complexity of sketching and streaming small norms. In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1161–1178, 2010.
- [KOR00] Eyal Kushilevitz, Rafail Ostrovsky, and Yuval Rabani. Efficient search for approximate nearest neighbor in high dimensional spaces. SIAM J. Comput., 30(2):457–474, 2000.
- [LCLS17] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, 2017.
- [LM19] Gábor Lugosi and Shahar Mendelson. Near-optimal mean estimators with respect to general norms. Probab. Theory Related Fields, 175(3-4):957–973, 2019.
- [LT11] Michel Ledoux and Michel Talagrand. Probability in Banach spaces. Classics in Mathematics. Springer-Verlag, Berlin, 2011. Isoperimetry and processes, Reprint of the 1991 edition.
- [Mei93] S. Meiser. Point location in arrangements of hyperplanes. Inform. and Comput., 106(2):286–303, 1993.
- [MNS11] Ilya Mironov, Moni Naor, and Gil Segev. Sketching in adversarial environments. SIAM J. Comput., 40(6):1845–1870, 2011.
- [MZ18] Shahar Mendelson and Nikita Zhivotovskiy. Robust covariance estimation under L4-L2 norm equivalence. arXiv preprint arXiv:1809.10462, 2018.
- [Nol18] J. P. Nolan. Stable Distributions - Models for Heavy Tailed Data. Birkhauser, Boston, 2018. In progress, Chapter 1 online at http://fs2.american.edu/jpnolan/www/stable/stable.html.
- [Pag18] Rasmus Pagh. CoveringLSH: Locality-sensitive hashing without false negatives. ACM Trans. Algorithms, 14(3):29:1–29:17, 2018.
- [PMG16] Nicolas Papernot, Patrick D. McDaniel, and Ian J. Goodfellow. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. CoRR, abs/1605.07277, 2016.
- [PMG+17] Nicolas Papernot, Patrick D. McDaniel, Ian J. Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Ramesh Karri, Ozgur Sinanoglu, Ahmad-Reza Sadeghi, and Xun Yi, editors, Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, AsiaCCS 2017, Abu Dhabi, United Arab Emirates, April 2-6, 2017, pages 506–519. ACM, 2017.
- [SDI08] Gregory Shakhnarovich, Trevor Darrell, and Piotr Indyk. Nearest-neighbor methods in learning and vision. IEEE Trans. Neural Networks, 19(2):377, 2008.
- [Sim96] Jeffrey S. Simonoff. Smoothing methods in statistics. Springer Series in Statistics. Springer-Verlag, New York, 1996.
- [SW17] Piotr Sankowski and Piotr Wygocki. Approximate nearest neighbors search without false negatives for $\ell_2$ for c > log log n. In Yoshio Okamoto and Takeshi Tokuyama, editors, 28th International Symposium on Algorithms and Computation, ISAAC 2017, December 9-12, 2017, Phuket, Thailand, volume 92 of LIPIcs, pages 63:1–63:12. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017.
- [TZ12] Mikkel Thorup and Yin Zhang. Tabulation-based 5-independent hashing with applications to linear probing and second moment estimation. SIAM J. Comput., 41(2):293–331, 2012.
- [Ver18] Roman Vershynin. High-dimensional probability, volume 47 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2018. An introduction with applications in data science, With a foreword by Sara van de Geer.
- [Wei19] Alexander Wei. Optimal Las Vegas approximate near neighbors in $\ell_p$. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1794–1813, 2019.
- [WJ95] M. P. Wand and M. C. Jones. Kernel smoothing, volume 60 of Monographs on Statistics and Applied Probability. Chapman and Hall, Ltd., London, 1995.
- [YHZL19] Xiaoyong Yuan, Pan He, Qile Zhu, and Xiaolin Li. Adversarial examples: Attacks and defenses for deep learning. IEEE Trans. Neural Networks Learn. Syst., 30(9):2805–2824, 2019.
- [Zol86] V. M. Zolotarev. One-dimensional stable distributions, volume 65 of Translations of Mathematical Monographs. American Mathematical Society, Providence, RI, 1986. Translated from the Russian by H. H. McFaden, Translation edited by Ben Silver.
- $B_2(0, 1 + \varepsilon/2, d)$. From the fact that the sets $B_2(x, \varepsilon/2, d)$ and $B_2(y, \varepsilon/2, d)$ are disjoint for distinct $x, y \in T$, we have $\mathrm{Vol}(T_\varepsilon) = |T| \, \mathrm{Vol}(B_2(0, \varepsilon/2, d)) \leq \mathrm{Vol}(B_2(0, 1 + \varepsilon/2, d))$. By dividing both sides and using the fact that $\mathrm{Vol}(B_2(0, l, d)) = l^d \, \mathrm{Vol}(B_2(0, 1, d))$, we get:
- 1. Through a similar manipulation, we get Z − Z ≤ 1 and this concludes the proof of the lemma.
