Fast and Accurate k-means For Large Datasets
NIPS, pp. 2375-2383, 2011.
It can use all of the available memory to read from the stream, computing 3k log k means for each memory-sized portion and writing the results to disk; when the stream is exhausted, this file is treated as a new stream, and the process repeats until an iteration produces a file that fits entirely into main memory.
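The chunk, summarize, and repeat control flow in the excerpt can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the paper's implementation. The disk file is simulated with an in-memory list, memory is counted in points, the log in 3k log k is taken as natural, and the per-chunk reducer `summarize_chunk` is a hypothetical helper that uses k-means++-style D² sampling. That sampling step is swapped in for illustration and is not the paper's per-chunk method; the names `one_pass` and `reduce_until_fits` are likewise hypothetical.

```python
import math
import random
from typing import Iterable, List, Tuple

Point = Tuple[float, ...]
Weighted = Tuple[Point, float]  # (point, weight)


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def summarize_chunk(chunk: List[Weighted], m: int, rng: random.Random) -> List[Weighted]:
    """Hypothetical reducer: choose up to m centers by D^2 sampling (k-means++
    style, NOT the paper's method) and give each center the total weight of
    the chunk points closest to it."""
    if len(chunk) <= m:
        return list(chunk)
    centers = [rng.choices(chunk, weights=[w for _, w in chunk])[0][0]]
    while len(centers) < m:
        d2 = [w * min(sq_dist(p, c) for c in centers) for p, w in chunk]
        if sum(d2) == 0:  # every point already coincides with a center
            break
        centers.append(rng.choices(chunk, weights=d2)[0][0])
    weights = [0.0] * len(centers)
    for p, w in chunk:
        nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        weights[nearest] += w
    return list(zip(centers, weights))


def one_pass(stream: Iterable[Weighted], memory: int, m: int,
             rng: random.Random) -> List[Weighted]:
    """Read memory-sized chunks sequentially and append each chunk's summary
    to the output 'file' (a list standing in for disk)."""
    out: List[Weighted] = []
    chunk: List[Weighted] = []
    for item in stream:
        chunk.append(item)
        if len(chunk) == memory:
            out.extend(summarize_chunk(chunk, m, rng))
            chunk = []
    if chunk:  # final partial chunk
        out.extend(summarize_chunk(chunk, m, rng))
    return out


def reduce_until_fits(stream: Iterable[Weighted], memory: int, k: int,
                      seed: int = 0) -> List[Weighted]:
    """Repeat passes, treating each output as the next input stream, until
    the weighted summary fits in `memory` points."""
    m = max(k, math.ceil(3 * k * math.log(k)))  # "3k log k", natural log assumed
    if m >= memory:
        raise ValueError("memory must exceed 3k log k points")
    rng = random.Random(seed)
    data = one_pass(stream, memory, m, rng)
    while len(data) > memory:
        data = one_pass(data, memory, m, rng)
    return data
```

Each pass shrinks every full memory-sized chunk to at most m weighted points, so while the data exceeds `memory` it strictly shrinks and the loop terminates. Total weight is preserved, so a final in-memory clustering of the returned weighted points still accounts for every original point.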
Clustering is a popular problem with many applications. We consider the k-means problem in the situation where the data is too large to be stored in main memory and must be accessed sequentially, such as from a disk, and where we must use as little memory as possible. Our algorithm is based on recent theoretical results, with significant ...
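To make the sequential-access setting concrete, here is a small Python sketch (not from the paper) that evaluates the standard k-means objective in a single sequential pass. The objective is the sum, over all points, of the squared Euclidean distance to the nearest center. The function name `kmeans_cost` and the tuple-of-floats point representation are assumptions for illustration. Memory use is limited to the k centers and one running total, however long the stream is.

```python
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans_cost(stream: Iterable[Point], centers: Sequence[Point]) -> float:
    """Sum of squared Euclidean distances from each streamed point to its
    nearest center; the stream is read once and never stored."""
    total = 0.0
    for p in stream:
        total += min(sum((x - c) ** 2 for x, c in zip(p, ctr)) for ctr in centers)
    return total
```

For example, `kmeans_cost(iter([(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]), [(0.0, 0.0), (10.0, 10.0)])` returns `1.0`. Any iterator works as the stream, such as a generator that parses points from a file one line at a time.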
Best Paper of NeurIPS, 2011