Design and implementation of K-means parallel algorithm based on Hadoop

Jiyang Jia, Hui Xie, Tao Xu

PROCEEDINGS OF 2021 2ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND INFORMATION SYSTEMS (ICAIIS '21)(2021)

Abstract
To address the low efficiency and instability of K-means clustering in big data environments, a parallel K-means algorithm based on Hadoop is proposed. The number of clusters is first determined with the elbow method, the initial cluster centers are then selected based on the ideas of density and proximity, and the algorithm is parallelized with the MapReduce framework of the Hadoop ecosystem. Experiments show that the algorithm improves the efficiency and convergence of K-means clustering on massive data.
Keywords
K-means, Hadoop, MapReduce framework, Nearest neighbor degree
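
The abstract describes K-means parallelized with MapReduce but does not give the implementation. The following is a minimal single-process sketch of how one K-means iteration is commonly split into map and reduce steps; it is not the authors' code. The function names (`kmeans_map`, `kmeans_reduce`, `kmeans_iteration`, `kmeans`, `sse`) are hypothetical, a dictionary stands in for Hadoop's shuffle phase, and the initial centers are simply passed in, since the density-and-proximity selection is not specified in the abstract. The `sse` helper computes the quantity that the elbow method typically plots against the number of clusters.

```python
from collections import defaultdict
from math import dist  # requires Python 3.8+


def kmeans_map(point, centers):
    """Map step: emit (index of nearest center, (partial sum, count))."""
    nearest = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
    return nearest, (point, 1)  # a single point is a partial sum of count 1


def kmeans_reduce(values):
    """Reduce step: merge (partial sum, count) pairs into a new center."""
    dim = len(values[0][0])
    total = [0.0] * dim
    count = 0
    for partial_sum, n in values:
        for d in range(dim):
            total[d] += partial_sum[d]
        count += n
    return tuple(t / count for t in total)


def kmeans_iteration(points, centers):
    """One map -> shuffle -> reduce round; returns the updated centers."""
    groups = defaultdict(list)  # assumption: stands in for Hadoop's shuffle
    for p in points:
        key, value = kmeans_map(p, centers)
        groups[key].append(value)
    # A center that received no points keeps its previous position.
    return [
        kmeans_reduce(groups[i]) if i in groups else tuple(centers[i])
        for i in range(len(centers))
    ]


def kmeans(points, centers, max_iter=100, tol=1e-9):
    """Repeat iterations until no center moves by more than tol."""
    for _ in range(max_iter):
        new_centers = kmeans_iteration(points, centers)
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(dist(p, c) for c in centers) ** 2 for p in points)
```

In an actual Hadoop job, each iteration would run as a separate MapReduce job with the current centers distributed to every mapper; emitting (partial sum, count) pairs rather than raw points is what allows a combiner to pre-aggregate on each node before the shuffle.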