Machine learning from real data: A mental health registry case study

Elisabetta Gentili, Giorgia Franchini, Riccardo Zese, Marco Alberti, Maria Ferrara, Ilaria Domenicano, Luigi Grassi

Computer Methods and Programs in Biomedicine Update (2024)

Abstract

Imbalanced datasets can impair the learning performance of many Machine Learning techniques. Nevertheless, many real-world datasets, especially in the healthcare field, are inherently imbalanced. For instance, in the medical domain, the classes representing a specific disease are typically the minority of the total cases. This challenge justifies the substantial research effort spent in the past decades to tackle data imbalance at the data and algorithm levels. In this paper, we describe the strategies we used to deal with an imbalanced classification task on data extracted from a database generated from the Electronic Health Records of the Mental Health Service of the Ferrara Province, Italy. In particular, we applied balancing techniques to the original data, such as random undersampling and oversampling, and Synthetic Minority Oversampling Technique for Nominal and Continuous (SMOTE-NC). In order to assess the effectiveness of the balancing techniques on the classification task at hand, we applied different Machine Learning algorithms. We employed cost-sensitive learning as well and compared its results with those of the balancing methods. Furthermore, a feature selection analysis was conducted to investigate the relevance of each feature. Results show that balancing can help find the best setting to accomplish classification tasks. Since real-world imbalanced datasets are increasingly becoming the core of scientific research, further studies are needed to improve already existing techniques.

Keywords

Healthcare, Machine learning, Imbalanced dataset, Mental health
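
The abstract names random undersampling, random oversampling, and cost-sensitive learning as ways to handle class imbalance. The following is a minimal sketch of those three ideas using only the Python standard library. It is not the authors' pipeline: the function names, the toy rows, and the "balanced" weighting formula (total samples divided by the number of classes times the class count) are assumptions chosen for illustration. SMOTE-NC is not reproduced here; the imbalanced-learn library offers an implementation as `SMOTENC`.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        out_rows.extend(rng.sample(pool, target))
        out_labels.extend([cls] * target)
    return out_rows, out_labels


def balanced_class_weights(labels):
    """Per-class misclassification weights, inversely proportional to
    class frequency (one common cost-sensitive heuristic)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


# Hypothetical toy data: 2 minority rows (label 1) and 8 majority rows (label 0).
rows = [(i, "a" if i % 2 else "b") for i in range(10)]
labels = [1, 1] + [0] * 8

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)
weights = balanced_class_weights(labels)
```

Resampling changes the training set itself, whereas the class weights leave the data untouched and would instead be passed to a learner that supports weighted losses. In practice, either approach should be applied to the training split only, so that evaluation still reflects the original class distribution.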
