Brief Announcement: Efficient Distributed Algorithms for the K-Nearest Neighbors Problem

Reza Fathi, Anisur Rahaman Molla, Gopal Pandurangan

arXiv (2020)

Abstract

The K-nearest neighbors (K-NN) problem is a basic problem in machine learning with numerous applications. Given a (training) set of n labeled data points and a query point q, the goal is to assign a label to q based on the labels of the K points nearest to q. We study this problem in the k-machine model, a model for distributed large-scale data. In this model, the n points are distributed (in a balanced fashion) among the k machines, and the goal is to compute the answer for a query point given to one of the machines using a small number of communication rounds. Our main result is a randomized algorithm in the k-machine model that runs in O(log K) communication rounds with high probability, regardless of the number of machines k and the number of points n. Its message complexity is also small: only O(k log K) messages. Our bounds are essentially the best possible for comparison-based algorithms. We also implement our algorithm and show that it performs well in practice.
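The abstract does not spell out the algorithm itself. As a rough illustration only, the Python sketch below simulates the k-machine setting in a single process and uses a generic randomized pivot-based selection (in the spirit of the "randomized selection" keyword) to find the K nearest points, followed by a majority vote over their labels. The function name `distributed_knn`, the coordinator role, Euclidean distance, and majority voting are all assumptions made for this example. It is not the authors' algorithm and is not claimed to achieve the O(log K) round bound; each loop iteration merely stands in for a constant number of communication rounds (report counts, broadcast a pivot, report ranks).

```python
import bisect
import math
import random
from collections import Counter


def distributed_knn(machines, q, K, seed=0):
    """Hypothetical sketch: K-NN label for q, where machines[i] is a list of (point, label).

    Simulates k machines plus a coordinator in one process (an assumption, not the paper's
    protocol). Returns (label, iterations), where each iteration models O(1) rounds.
    """
    rng = random.Random(seed)
    # Local work, no communication: each machine sorts its points by distance to q.
    # Keys (distance, machine, index) are unique, so distance ties cannot stall selection.
    cand = []
    for i, pts in enumerate(machines):
        keys = sorted((math.dist(p, q), i, j) for j, (p, _) in enumerate(pts))
        cand.append(keys[:K])  # only a machine's K closest points can be globally K-nearest
    need = min(K, sum(len(c) for c in cand))  # how many nearest points remain to be chosen
    if need == 0:
        raise ValueError("need at least one point and K >= 1")
    chosen, iterations = [], 0
    while need > 0:
        iterations += 1
        sizes = [len(c) for c in cand]  # each machine reports its candidate count
        # Coordinator draws a pivot uniformly at random from all remaining candidates.
        r = rng.randrange(sum(sizes))
        m = 0
        while r >= sizes[m]:
            r -= sizes[m]
            m += 1
        pivot = cand[m][r]
        # The pivot is broadcast; each machine reports how many candidates are <= pivot.
        cuts = [bisect.bisect_right(c, pivot) for c in cand]
        total = sum(cuts)
        if total <= need:
            # All candidates <= pivot are among the nearest still needed: accept them.
            for i, c in enumerate(cand):
                chosen.extend(c[:cuts[i]])
                cand[i] = c[cuts[i]:]
            need -= total
        else:
            # More than `need` candidates are <= pivot, so the pivot and everything
            # above it cannot be selected: each machine keeps only keys < pivot.
            for i, c in enumerate(cand):
                cand[i] = c[:bisect.bisect_left(c, pivot)]
    # Labels of the selected points go to the coordinator for a majority vote
    # (ties go to the label whose first vote comes from the closest point).
    votes = Counter(machines[i][j][1] for _, i, j in sorted(chosen))
    return votes.most_common(1)[0][0], iterations
```

For example, with three machines holding points labeled `"a"` near the origin and points labeled `"b"` near (5, 5), `distributed_knn(machines, (0.0, 0.0), 3)` returns label `"a"`. Each iteration removes at least the pivot from the candidate pool, so the loop always terminates; pruning each machine to its K closest points first keeps the pool at most k·K in size.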

Keywords

K-Nearest Neighbors, Randomized selection, k-Machine Model, Distributed Algorithm, Round complexity, Message complexity