Comparison of machine learning methods for the classification of cardiovascular disease

Rachael Hagan, Charles J. Gillan, Fiona Mallett

Informatics in Medicine Unlocked (2021)

Abstract

Background: Researchers are devoting significant effort to using machine learning algorithms, a subset of the wider field of artificial intelligence, to detect disease in a single patient. There is extensive research on the application of machine learning methods in health care and, more specifically, in cardiovascular disease. We have chosen to focus this initial investigation on cardiac disease in order to examine the methods in as much detail as possible.

Methods: In this paper we explore the uncertainty that arises when applying machine learning methods, namely Support Vector Machines (SVM), Multi-Layer Perceptron (MLP) neural networks and ensemble methods, to the classification of cardiovascular disease. Our work uses two public datasets with significantly different characteristics in order to assess potential differences in the uncertainty of the methods. The cardiac arrhythmia dataset from the University of California, Irvine (UCI) Machine Learning Repository has almost three hundred specific physiological data points per patient, gathered from analysis of electrocardiogram (ECG) signals on several hundred patients, although the distribution of cases is severely imbalanced. Contrast this with a dataset reporting on cardiovascular disease from the Kaggle collection, which contains nearly seventy thousand patient records. However, this Kaggle dataset reports only a small number of parameters per patient record: values such as serum cholesterol level, diastolic and systolic blood pressure, relative blood glucose levels and presence or absence of angina.

Results: Models built for the UCI dataset have an order of magnitude more dimensions, or equivalently far more input nodes in the case of neural network models, than the models developed for the Kaggle dataset. On the other hand, the Kaggle dataset has an order of magnitude more records for training and validation than the UCI dataset. Our results compare and contrast the uncertainty in models built using support vector machines, multilayer perceptron neural networks and decision trees for these two datasets. The work suggests that it will be instructive to extend our analysis to datasets of other patho-physiologies.

Keywords

Supervised learning, Classification, Heart disease, SVM, Decision trees, Neural network, Ensemble learning, Uncertainty quantification
