Clustering of Longitudinal Trajectories Using Correlation-Based Distances

Joaquim F. Pinto da Costa, Fábio Ferreira, Martina Mascarello, Rita Gaio

SN Comput. Sci. (2021)

Abstract

The defining feature of a longitudinal data set is that individuals are measured repeatedly over time, giving rise to (a vector of) observations that tend to be intercorrelated. In longitudinal studies with a large number of subjects, it is often of interest to cluster the longitudinal trajectories and to define a much smaller number of mean trajectories. Several methods have been developed to extend cluster analysis to longitudinal data. First, we introduce a novel non-parametric methodology for clustering longitudinal data. The correlations between the observations within individual trajectories are taken into account through pre-defined correlation matrices whose parameters are estimated from the data. An original Mahalanobis-type distance based on this correlation matrix is defined, and a longitudinal K-Means algorithm is then applied. Regarding the computation of the clustering, we introduce a very useful result that allows us to use the well-known kml or kml3d algorithm (Genolini et al J Stat Softw 65(4):1–34, 2015), thus avoiding the need for a new computer program. In fact, we show that our method with the new Mahalanobis-type distance coincides with the application of the longitudinal K-Means algorithm (kml), using the Euclidean distance, to certain transformed trajectories. This property simplifies the process for general users. Second, in circumstances where the relative behavior of the trajectories matters more than their absolute values, we propose the use of profiles before entering the algorithm. The methodology is tested on simulated data with different time behaviors and also on real data. The results are compared with those obtained by applying the K-Means algorithm directly to the original data and to the profiled data. In general, the new methodology produces better results than the straightforward application of the longitudinal K-Means algorithm (kml) to the raw data.
A comparison with a parametric model, lcmm (Proust-Lima et al in J Stat Softw 78(2), 2017), is also presented.
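
The central computational claim above is that K-Means with the Mahalanobis-type distance reduces to ordinary Euclidean K-Means (kml) on transformed trajectories. The Python/NumPy sketch below shows one way such an equivalence can arise. It assumes a single fixed, known correlation matrix R shared by all subjects (AR(1) or compound symmetry, following the keywords), equally spaced time points and no missing values. Under those assumptions, factor R = L L^T by Cholesky and map each trajectory x to L^{-1} x. The paper estimates the correlation parameters from the data, and its exact transformation may differ. All function names here are hypothetical. The plain Lloyd's K-Means with farthest-first initialisation is only a stand-in for kml, not its actual algorithm.

```python
import numpy as np


def ar1_corr(t: int, rho: float) -> np.ndarray:
    """AR(1) correlation matrix: R[i, j] = rho ** |i - j| (assumed structure)."""
    idx = np.arange(t)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def cs_corr(t: int, rho: float) -> np.ndarray:
    """Compound-symmetry correlation matrix: 1 on the diagonal, rho elsewhere."""
    return (1.0 - rho) * np.eye(t) + rho * np.ones((t, t))


def mahalanobis_type(x: np.ndarray, y: np.ndarray, corr: np.ndarray) -> float:
    """sqrt((x - y)^T R^{-1} (x - y)) for two trajectories observed at the same times."""
    diff = x - y
    return float(np.sqrt(diff @ np.linalg.solve(corr, diff)))


def whiten(trajectories: np.ndarray, corr: np.ndarray) -> np.ndarray:
    """Map each row x to L^{-1} x, where R = L L^T (Cholesky).

    Euclidean distances between the transformed rows equal the
    Mahalanobis-type distances between the original rows.
    """
    lower = np.linalg.cholesky(corr)
    # Solve L Z^T = X^T instead of forming L^{-1} explicitly.
    return np.linalg.solve(lower, trajectories.T).T


def kmeans(data: np.ndarray, k: int, n_iter: int = 100):
    """Plain Lloyd's K-Means with Euclidean distance (a stand-in for kml)."""
    # Farthest-first initialisation: deterministic, and spreads the start centres out.
    centers = [data[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(data - c, axis=1) for c in centers], axis=0)
        centers.append(data[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old centre if a cluster becomes empty.
        new_centers = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Usage on synthetic data: two hypothetical groups of 10 subjects, 6 time points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(10, 6)),
               rng.normal(10.0, 1.0, size=(10, 6))])
R = ar1_corr(6, rho=0.6)
Z = whiten(X, R)
labels, centers_z = kmeans(Z, k=2)
# Map the centres back to the original scale: mean(L^{-1} x) = L^{-1} mean(x).
centers_x = centers_z @ np.linalg.cholesky(R).T
```

Because x ↦ L^{-1}x is linear, the mean of the transformed trajectories in a cluster is the transform of their original mean. For a fixed R, both the assignment step (distances) and the update step (centroids) of K-Means therefore agree in the two spaces. This is why any Euclidean K-Means implementation can be reused on the transformed data.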

Keywords

Auto-regressive model, Compound symmetry, K-Means, Longitudinal clustering, Mahalanobis distance