An efficient sampling-based visualization technique for big data clustering with crisp partitions

Distributed and Parallel Databases (2021)

Abstract
Assessing cluster tendency is an emerging need in big data cluster analysis. Cluster tendency is the evaluation of a dataset to determine the number of clusters it contains. Many visualization techniques have been developed to detect cluster tendency. Existing techniques such as Visual Assessment of Tendency (VAT), spectral-based VAT (SpecVAT), and improved VAT (iVAT) have been considerably successful at assessing cluster tendency for small datasets. bigVAT is another, more recently developed method that estimates the cluster tendency of big data. It is well suited to deriving the clustering tendency of big data in visual form. However, exploring the data clusters themselves remains intractable for large volumes of data objects. The proposed work addresses this clustering problem of bigVAT by deriving sampling-based crisp partitions. The crisp partitions accurately predict the cluster labels of data objects. The performance efficiency of the proposed work is demonstrated on big synthetic and big real-life datasets.
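To make the idea of a visual cluster-tendency assessment concrete, the following is a minimal sketch of the classic VAT reordering step that the abstract builds on. It is not the paper's sampling-based method or bigVAT, and the function names (`pairwise_euclidean`, `vat_order`, `reorder`) are hypothetical, chosen only for illustration. The sketch assumes a small, symmetric dissimilarity matrix held in memory as nested lists.

```python
import math


def pairwise_euclidean(points):
    """Build a full dissimilarity matrix from a list of coordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]


def vat_order(D):
    """Return a VAT-style ordering of the objects described by matrix D.

    The ordering follows Prim's minimum-spanning-tree traversal: start at
    an object involved in the largest dissimilarity, then repeatedly append
    the unvisited object closest to any already-visited object.
    """
    n = len(D)
    # Start from the row that contains the largest dissimilarity.
    first = max(range(n), key=lambda i: max(D[i]))
    order = [first]
    remaining = set(range(n)) - {first}
    # min_dist[j] is the smallest dissimilarity from j to any ordered object.
    min_dist = list(D[first])
    while remaining:
        # Ties are broken by index so the result is deterministic.
        nxt = min(remaining, key=lambda j: (min_dist[j], j))
        order.append(nxt)
        remaining.remove(nxt)
        min_dist = [min(a, b) for a, b in zip(min_dist, D[nxt])]
    return order


def reorder(D, order):
    """Permute rows and columns of D so similar objects sit side by side."""
    return [[D[i][j] for j in order] for i in order]
```

Displaying the reordered matrix as a grayscale image typically shows dark blocks along the diagonal, one per apparent cluster. Because the full matrix grows quadratically with the number of objects, this direct form is only practical for small datasets, which is the limitation that sampling-based approaches aim to work around.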
Keywords
Cluster tendency, Visualization techniques, Data clustering, Crisp partitions, Sampling