Clustering Based On Kolmogorov-Smirnov Statistic With Application To Bank Card Transaction Data

JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES C-APPLIED STATISTICS (2021)

Abstract
Rapid developments in third-party online payment platforms now make it possible to record massive bank card transaction data. Clustering on such transaction data is of great importance for the analysis of merchant behaviours. However, traditional methods based on generated features inevitably lead to much loss of information. To make better use of bank card transaction data, this study investigates the possibility of using the empirical cumulative distribution of transaction amounts. As the distance between two merchants can be measured using the two-sample Kolmogorov-Smirnov test statistic, we propose the Kolmogorov-Smirnov K-means clustering approach based on this distance measure. An approximation step is conducted to ensure the feasibility of the proposed method even for large-scale transaction data, and the associated theoretical properties are investigated. Both simulations and an empirical study demonstrate that our method outperforms feature-based methods and is computationally efficient for large-scale data sets.
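The distance and clustering loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `ks_statistic` and `ks_kmeans` are hypothetical, representing each cluster centre by the pooled transaction amounts of its members is an assumption, and the approximation step for large-scale data is omitted. The two-sample Kolmogorov-Smirnov statistic used as the distance is the largest absolute gap between the two empirical cumulative distribution functions.

```python
import numpy as np


def ks_statistic(x, y):
    """Two-sample KS statistic: max over t of |F_x(t) - F_y(t)|."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    # The supremum is attained at an observed point, so checking those suffices.
    grid = np.concatenate([x, y])
    fx = np.searchsorted(x, grid, side="right") / x.size
    fy = np.searchsorted(y, grid, side="right") / y.size
    return float(np.max(np.abs(fx - fy)))


def ks_kmeans(samples, k, n_iter=20, seed=0):
    """K-means-style clustering of samples (one array of amounts per merchant).

    Assumption for illustration: a cluster centre is the pooled sample of its
    members, so its ECDF is the size-weighted average of the member ECDFs.
    """
    rng = np.random.default_rng(seed)
    start = rng.choice(len(samples), size=k, replace=False)
    centroids = [np.asarray(samples[i], dtype=float) for i in start]
    labels = np.full(len(samples), -1)
    for _ in range(n_iter):
        # Assignment step: nearest centre under the KS distance.
        dists = np.array(
            [[ks_statistic(s, c) for c in centroids] for s in samples]
        )
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: pool the members; an empty cluster keeps its old centre.
        for j in range(k):
            members = [
                np.asarray(samples[i], dtype=float)
                for i in np.flatnonzero(labels == j)
            ]
            if members:
                centroids[j] = np.concatenate(members)
    return labels
```

Each distance evaluation here sorts both samples in full, which is why some form of approximation, as the abstract notes, matters once merchants have very many transactions.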
Keywords
empirical cumulative distribution function, Kolmogorov–Smirnov test, K-means clustering, sampling