A Gradient-Based Clustering For Multi-Database Mining

IEEE ACCESS（2021）

引用 3|浏览6

暂无评分

摘要

Multinational corporations have multiple databases distributed throughout their branches, which store millions of transactions per day. For business applications, identifying disjoint clusters of similar and relevant databases contributes to learning the common buying patterns among customers and also increases the profits by targeting potential clients in the future. This process is called clustering, which is an important unsupervised technique for big data mining. In this article, we present an effective approach to search for the optimal clustering of multiple transaction databases in a weighted undirected similarity graph. To assess the clustering quality, we use dual gradient descent to minimize a constrained quasi-convex loss function whose parameters will determine the edges needed to form the optimal database clusters in the graph. Therefore, finding the global minimum is guaranteed in a finite and short time compared with the existing non-convex objectives where all possible candidate clusterings are generated to find the ideal clustering. Moreover, our algorithm does not require specifying the number of clusters a priori and uses a disjoint-set forest data structure to maintain and keep track of the clusters as they are updated. Through a series of experiments on public data samples and precomputed similarity matrices, we show that our algorithm is more accurate and faster in practice than the existing clustering algorithms for multi-database mining.

查看译文

关键词

Databases, Itemsets, Clustering algorithms, Data models, Prototypes, Computer science, Computational modeling, Multi-database mining, graph clustering, dual gradient descent, quasi-convex optimization, similarity measure

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要