Fast and Accurate k-means For Large Datasets

NIPS, pp. 2375-2383, 2011.

It can use the entire amount of main memory to read from the stream, writing the results of computing their 3k log k means to disk; when the stream is exhausted, this file is itself treated as a stream, and the process repeats until an iteration produces a file that fits entirely into main memory.
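
The multi-pass procedure described in the summary above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: a Python list stands in for the file on disk, plain Lloyd iterations stand in for whatever clustering subroutine is actually used on each memory-sized block, the logarithm is taken to be natural, and the names `summarize`, `reduce_pass`, `divide_and_conquer`, and `memory_points` are hypothetical.

```python
import math
import random


def summarize(block, m, iters=10, seed=0):
    """Reduce a block of (point, weight) pairs to at most m weighted centers.

    Assumption: plain weighted Lloyd iterations are used as a stand-in
    for the clustering subroutine; the source does not specify one here.
    """
    if len(block) <= m:
        return list(block)
    dim = len(block[0][0])
    centers = [p for p, _ in random.Random(seed).sample(block, m)]
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(m)]
        mass = [0.0] * m
        for p, w in block:
            # Assign each point to its nearest current center.
            j = min(range(m), key=lambda c: math.dist(p, centers[c]))
            mass[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        # Move each non-empty center to the weighted mean of its points.
        centers = [
            tuple(s / mass[j] for s in sums[j]) if mass[j] > 0 else centers[j]
            for j in range(m)
        ]
    # Empty centers are dropped, so the total weight is preserved.
    return [(centers[j], mass[j]) for j in range(m) if mass[j] > 0]


def reduce_pass(stream, m, memory_points):
    """One sequential pass: fill memory, summarize, append to the 'file'."""
    out, block = [], []  # `out` stands in for the file written to disk
    for item in stream:
        block.append(item)
        if len(block) == memory_points:
            out.extend(summarize(block, m))
            block = []
    if block:
        out.extend(summarize(block, m))
    return out


def divide_and_conquer(stream, k, memory_points):
    """Repeat passes until the summary fits in memory."""
    m = max(k, math.ceil(3 * k * math.log(k)))  # 3k log k means per block
    if memory_points <= m:
        raise ValueError("memory must hold more points than each summary")
    current = reduce_pass(((tuple(p), 1.0) for p in stream), m, memory_points)
    while len(current) > memory_points:
        # The previous output file is now read back as a stream.
        current = reduce_pass(iter(current), m, memory_points)
    return current
```

Each full block of `memory_points` entries shrinks to at most `m` weighted centers, so every pass strictly reduces the file size and the loop terminates. The final in-memory summary could then be clustered down to k centers with any weighted k-means routine.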

Abstract:

Clustering is a popular problem with many applications. We consider the k-means problem in the situation where the data is too large to be stored in main memory and must be accessed sequentially, such as from a disk, and where we must use as little memory as possible. Our algorithm is based on recent theoretical results, with significant ...

Best Paper
Best Paper of NeurIPS, 2011