Brief Announcement: Efficient Distributed Algorithms for the K-Nearest Neighbors Problem

arxiv(2020)

引用 4|浏览27
暂无评分
摘要
The K-nearest neighbors is a basic problem in machine learning with numerous applications. In this problem, given a (training) set of n data points with labels and a query point q, we want to assign a label to q based on the labels of the K-nearest points to the query. We study this problem in the k-machine model,(1) a model for distributed large-scale data. In this model, we assume that the n points are distributed (in a balanced fashion) among the k machines and the goal is to compute an answer given a query point to a machine using a small number of communication rounds. Our main result is a randomized algorithm in the k-machine model that runs in O(log K) communication rounds with high success probability (regardless of the number of machines k and the number of points n). The message complexity of the algorithm is small taking only O(k log K) messages. Our bounds are essentially the best possible for comparison-based algorithms. We also implemented our algorithm and show that it performs well in practice.
更多
查看译文
关键词
K-Nearest Neighbors, Randomized selection, k-Machine Model, Distributed Algorithm, Round complexity, Message complexity
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要