Fast density-based clustering through dataset partition using graphics processing units

Inf. Sci. (2015)

Abstract
Graphics processing units (GPUs) have been used to speed up many conventional data mining algorithms. DBSCAN, a popular clustering algorithm that is widely used in practice, has been extended to run on GPUs. However, existing GPU-based DBSCAN extensions still have two drawbacks: the distances from all objects must be repeatedly computed to find each object's neighbors, and both the objects and the intermediate clustering results are stored in the GPU's costly off-chip memory. This paper proposes CudaSCAN, a novel algorithm that improves the efficiency of DBSCAN by making better use of the GPU. CudaSCAN consists of three phases: (1) partitioning the entire dataset into sub-regions whose size is an integer multiple of the GPU's on-chip shared memory size; (2) clustering locally within the sub-regions in parallel; and (3) merging the local clustering results. CudaSCAN allows sub-regions to overlap so that local clustering in each sub-region is independent and can run in parallel; this in turn allows objects and/or intermediate results to be stored in on-chip shared memory, whose access cost is a few hundred times lower than that of off-chip global memory. The same independence also allows the merging phase to be parallelized. This paper proves the correctness of CudaSCAN, and extensive experiments show that CudaSCAN outperforms CUDA-DClust, a previous GPU-based DBSCAN extension, by up to 163.6 times.
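The partition / local-clustering / merge structure described above can be illustrated with a small CPU sketch in Python. It is not CudaSCAN and does not model GPU shared memory; the strip-shaped partition along the x-axis, the 2·eps halo width, and the helper names `make_strips`, `local_cluster`, `merge` and `partitioned_dbscan` are assumptions chosen for illustration. The halo is sized so that every point within eps of a strip has its entire eps-neighbourhood inside that strip's halo, which is what lets each strip be clustered without consulting any other strip.

```python
import math


def make_strips(points, width):
    """Phase 1 (assumed scheme): cut the x-axis into vertical strips of `width`."""
    xs = [x for x, _ in points]
    n = max(1, math.ceil((max(xs) - min(xs)) / width))
    cuts = [min(xs) + k * width for k in range(1, n)]
    bounds = [-math.inf] + cuts + [math.inf]
    # Strip k owns the half-open interval [bounds[k], bounds[k + 1]).
    return [(bounds[k], bounds[k + 1]) for k in range(n)]


def local_cluster(points, lo, hi, eps, min_pts):
    """Phase 2: cluster one strip using only its own points plus a 2*eps halo."""
    members = [i for i, (x, _) in enumerate(points) if lo - 2 * eps <= x <= hi + 2 * eps]
    # Points within eps of the strip have their whole eps-neighbourhood in `members`,
    # so their neighbour lists and core flags are exact without looking elsewhere.
    near = [i for i in members if lo - eps <= points[i][0] <= hi + eps]
    neigh = {i: [j for j in members if math.dist(points[i], points[j]) <= eps] for i in near}
    core = {i for i in near if len(neigh[i]) >= min_pts}
    edges, border, owned_core = [], {}, set()
    for p in near:
        if not lo <= points[p][0] < hi:
            continue  # halo points are handled by the strip that owns them
        core_nbrs = [q for q in neigh[p] if q in core]
        if p in core:
            owned_core.add(p)
            edges.extend((p, q) for q in core_nbrs)  # core-core links, possibly cross-strip
        elif core_nbrs:
            border[p] = core_nbrs[0]  # border point: attach to the first core neighbour found
    return owned_core, edges, border


def merge(n_points, results):
    """Phase 3: union the core-core links from all strips into global clusters."""
    parent = list(range(n_points))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    cores, border = set(), {}
    for owned_core, edges, b in results:
        cores |= owned_core
        border.update(b)
        for p, q in edges:
            parent[find(p)] = find(q)
    labels, cluster_id = [-1] * n_points, {}  # -1 marks noise
    for i in sorted(cores):
        labels[i] = cluster_id.setdefault(find(i), len(cluster_id))
    for p, q in border.items():
        labels[p] = labels[q]
    return labels


def partitioned_dbscan(points, eps, min_pts, width):
    strips = make_strips(points, width)
    # Strips share no state, so these calls could run in parallel.
    results = [local_cluster(points, lo, hi, eps, min_pts) for lo, hi in strips]
    return merge(len(points), results)
```

For example, `partitioned_dbscan([(0, 0), (0.5, 0), (1, 0), (1.5, 0), (2, 0), (10, 0), (10.5, 0), (11, 0), (5, 5)], eps=0.6, min_pts=2, width=1.0)` returns `[0, 0, 0, 0, 0, 1, 1, 1, -1]`: the first chain spans three strips and is reunited only in the merge phase, while the isolated point is labelled noise. Because each strip emits every core-core link whose owned endpoint lies inside it, the union-find merge recovers the same core clusters as a single unpartitioned run; border points reachable from two clusters are attached to whichever core neighbour is found first, as in ordinary DBSCAN.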
Keywords
density-based clustering, graphics processing unit, massively parallel algorithm, on-chip shared memory, dataset partition