An Optimized Graph-Based Clustering For Multi-Database Mining

2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI), 2020

Abstract

Multinational corporations maintain multiple databases distributed across their branches, which store millions of transactions per day. For business applications, identifying disjoint clusters of homogeneous databases helps reveal the patterns that customers have in common and can increase profits by targeting potential clients. In this paper, we present an effective approach to search for the optimal clustering of multiple transactional databases in a weighted undirected similarity graph. To assess the clustering quality, we use dual gradient descent to minimize a constrained quasi-convex loss function whose parameters determine the edges needed to form the optimal database clusters in the graph. As a result, the global minimum is guaranteed to be found in finite time, and faster than with existing non-convex objectives, which generate all possible candidate classifications to find the ideal clustering. Moreover, our algorithm does not require the number of clusters to be specified a priori, and it uses a disjoint-set forest data structure to keep track of the clusters as they are updated. We performed extensive experiments on public data samples and compared our algorithm with one of the best previous algorithms for clustering multiple databases. Our experimental study shows that our algorithm outperforms the previous algorithm in terms of accuracy and running time.
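The role of the disjoint-set forest can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the loss function or the edge-selection rule, so the sketch assumes a fixed similarity threshold stands in for the learned parameters. The names `DisjointSet`, `cluster_databases`, and `threshold`, as well as the similarity values, are hypothetical.

```python
from typing import Dict, List


class DisjointSet:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))  # each element starts as its own root
        self.size = [1] * n

    def find(self, x: int) -> int:
        # Locate the root, then point every node on the path directly at it.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster_databases(similarity: List[List[float]], threshold: float) -> List[List[int]]:
    """Group databases connected by edges with similarity >= threshold.

    ASSUMPTION: a fixed threshold replaces the parameters that the paper
    learns by minimizing its loss function.
    """
    n = len(similarity)
    forest = DisjointSet(n)
    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i][j] >= threshold:
                forest.union(i, j)
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(forest.find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical similarity matrix for four databases (illustrative values only).
SIMILARITY = [
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
]

if __name__ == "__main__":
    print(cluster_databases(SIMILARITY, threshold=0.5))
```

The number of clusters is never passed in: it falls out of which edges survive the threshold, which mirrors the abstract's point that the cluster count need not be specified a priori.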

Keywords

multi-database mining, frequent item-sets, graph clustering, dual gradient descent, convex optimization
