Abstract 11946: Comparison of Unsupervised Learning Approaches Applied to Electronic Health Record Traits in Heart Failure

Nosheen Reza, William P. Bone, Pankhuri Singhal, Yifan Yang, Anurag Verma, Ashwin C. Murthy, Srinivas Denduluri, Srinath Adusumalli, Marylyn Ritchie, Thomas P. Cappola

Circulation (2021)

Abstract

Introduction: Unsupervised machine learning (UML) applied to high-dimensional data has been used to discover cardiovascular disease subtypes; however, the reproducibility of subtypes identified by different algorithms has not been explored. We compared the ability of several promising UML and clustering algorithms to identify heart failure (HF) subtypes using high-dimensional electronic health record (EHR) data.

Methods: Using the Penn Medicine EHR, we identified all patients with more than 2 instances of an ICD-10-CM HF diagnosis. We extracted 1272 EHR-based features (vital signs, demographics, echocardiographic measurements, laboratory values, comorbidities) from the time of HF diagnosis and restricted the cohort based on data completeness (n=8569). Based on their prior success in simulation studies, we selected the following methods to identify HF subtypes: Similarity Network Fusion (SNF), Locally Linear Embedding (LLE), Modified LLE, Uniform Manifold Approximation and Projection (UMAP), and Principal Component Analysis (PCA), followed by clustering with several algorithms, including K-means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Spectral Clustering. Numbers of clusters (K) from 2 to 12 were evaluated. Clustering performance was assessed by silhouette score and visual separation.

Results: Model visualizations are shown in the Figure. The highest silhouette score achieved by each model varied widely, from 0.02 to 0.62; the optimal number of clusters ranged from 2 to 4 across models. Normalization and standardization of continuous data did not significantly alter silhouette scores or the optimal number of clusters.

Conclusions: HF subtypes identified by applying UML to EHR data may vary substantially depending on the algorithms used. Benchmarking strategies to evaluate the reproducibility of UML in the EHR are needed to ensure valid HF patient stratification and phenotypic refinement.
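To make the evaluation loop concrete, here is a minimal, standard-library-only Python sketch of the kind of sweep described in Methods: for each K from 2 to 12, run K-means and score the partition by its mean silhouette coefficient, s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from point i to the rest of its own cluster and b(i) is the smallest mean distance to any other cluster; then repeat on z-score standardized data. This is an illustration under stated assumptions, not the authors' pipeline: the dimensionality-reduction step (SNF, LLE, UMAP, PCA) is omitted, farthest-first initialization stands in for the usual k-means++ restarts, the data are synthetic Gaussian blobs rather than EHR features, and all function names (`euclidean`, `standardize`, `kmeans`, `silhouette`, `sweep_k`) are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def standardize(points):
    """Z-score each feature column; a zero-variance column is only centered."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(p, means, sds)] for p in points]


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-first initialization (not k-means++)."""
    rng = random.Random(seed)
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: euclidean(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        return float("nan")  # undefined for a single cluster
    scores = []
    for i, p in enumerate(points):
        # Sum and count of distances from point i to each cluster (excluding i).
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j, q in enumerate(points):
            if i != j:
                sums[labels[j]] += euclidean(p, q)
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            scores.append(0.0)
            continue
        a = sums[own] / counts[own]
        b = min(sums[c] / counts[c] for c in clusters if c != own)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def sweep_k(points, k_values):
    """Score K-means for each k; return (k with highest silhouette, {k: score})."""
    scores = {}
    for k in k_values:
        s = silhouette(points, kmeans(points, k))
        if not math.isnan(s):
            scores[k] = s
    return max(scores, key=scores.get), scores


# Synthetic stand-in for patient features: three well-separated 2-D Gaussian blobs.
rng = random.Random(42)
centers = [(0.0, 0.0), (20.0, 0.0), (0.0, 20.0)]
data = [[cx + rng.gauss(0, 1), cy + rng.gauss(0, 1)]
        for cx, cy in centers for _ in range(30)]

for name, X in [("raw", data), ("standardized", standardize(data))]:
    best, scores = sweep_k(X, range(2, 13))
    print(f"{name}: best k = {best}, silhouette = {scores[best]:.2f}")
```

On clearly separated synthetic blobs, raw and standardized inputs should agree on the best K, which mirrors (but does not reproduce) the finding that standardization did not change the optimal cluster number. In practice, scikit-learn's `KMeans`, `silhouette_score`, and `StandardScaler` cover the same steps; DBSCAN or spectral clustering could be swapped in as alternative label generators, although DBSCAN infers its own number of clusters instead of taking K.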