School dropout prediction and feature importance exploration in Malawi using household panel data: machine learning approach

Journal of Computational Social Science (2022)

Abstract
Designing early warning systems through machine learning (ML) models to identify students at risk of dropout can improve targeting mechanisms and lead to efficient social policy interventions in education. School dropout is the culmination of various factors that drive children to leave school, and timely policy responses are essential to address these underlying factors and improve children's school retention over time. However, applying ML approaches to school dropout prediction is challenging, especially in low-income countries, where data collection and management systems are more prone to financial and technical constraints. For this reason, this study proposes using already collected household panel data to predict the probability of school dropout and to explore feature importance for primary school children in Malawi through ML models. A rich set of variables is drawn from the household data and used to build Random Forest (RF), least absolute shrinkage and selection operator (LASSO), Ridge and multilayer neural network (MNN) models. The study further explores how performance metrics differ when the training samples' weights, which represent sampling frequency in the survey design, are embedded into the cost function of these ML models, and discusses the implications of using household data in computational social science. LASSO and MNN models trained with sample weights stand out because of their higher recall rates of 80.6% and 78.8%, respectively. Compared with the baseline model trained with sample weights, the recall gain is roughly 56 percentage points for LASSO and 54 percentage points for MNN. Comparing LASSO and MNN trained with and without sample weights also shows that training with sample weights increases the recall rate by roughly 11 percentage points for LASSO and 12 percentage points for MNN.
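A minimal sketch of how survey sample weights can enter a LASSO cost function, assuming a logistic loss for the binary dropout outcome: each child's loss term is multiplied by its weight w_i, giving the objective (1/Σw_i) Σ w_i ℓ_i(β) + λ‖β‖₁. The code minimises this with proximal gradient descent in plain Python. The function names, step size, penalty and toy data are hypothetical illustrations, not the study's implementation; in practice a library estimator would be used (for example, scikit-learn's `LogisticRegression.fit` accepts a `sample_weight` argument).

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(v, t):
    # Proximal operator of t * |v|, i.e. the L1 (LASSO) shrinkage step.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_weighted_lasso_logistic(X, y, w, lam=0.01, lr=0.1, n_iter=2000):
    """Proximal gradient descent on sum_i w_i * logloss_i / sum_i w_i + lam * ||beta||_1.

    X: list of feature rows, y: 0/1 dropout labels, w: per-sample survey weights.
    The intercept b0 is not penalised. (Hypothetical helper for illustration.)
    """
    p = len(X[0])
    total_w = sum(w)
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad0, grad = 0.0, [0.0] * p
        for xi, yi, wi in zip(X, y, w):
            # Residual of child i, scaled by its sampling weight.
            r = wi * (sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) - yi) / total_w
            grad0 += r
            for j in range(p):
                grad[j] += r * xi[j]
        b0 -= lr * grad0
        beta = [soft_threshold(beta[j] - lr * grad[j], lr * lam) for j in range(p)]
    return b0, beta


def predict(b0, beta, X, threshold=0.5):
    # 1 = predicted dropout.
    return [int(sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) >= threshold) for xi in X]


def recall(y_true, y_pred):
    # Unweighted recall: share of actual dropouts (y = 1) that the model flags.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = sum(y_true)
    return tp / positives if positives else 0.0


# Toy usage with made-up data (not from the study).
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
b0, beta = fit_weighted_lasso_logistic(X, y, w=[1.0, 1.0, 1.0, 1.0])
print(recall(y, predict(b0, beta, X)))
```

Setting every weight to 1 recovers the unweighted model, so fitting once with all-ones weights and once with survey weights mirrors the with/without-weights comparison in the abstract. Because the loss is normalised by Σw_i, giving a child weight 2 is equivalent to duplicating that observation, which is what a frequency weight is meant to express.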
Lastly, the paper provides a comprehensive and unified way to interpret the models, using SHapley Additive exPlanations (SHAP), a game-theoretic method, to quantify feature importance. The results show that socio-economic characteristics of children, such as working in household farming and the father's education level, are among the most important features contributing to the probability of school dropout in the ML models. This study argues that the weighted sample structure of household data, together with its wide range of variables explored through SHAP-based feature importance, can enrich the literature and yield valuable results for harnessing data science for society.
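SHAP attributes a prediction to features using Shapley values from cooperative game theory: feature j receives its average marginal contribution over all coalitions S of the other features, weighted by |S|!(p - |S| - 1)!/p!. The `shap` library estimates these efficiently with tree-specific and kernel-based methods; the brute-force sketch below instead computes them exactly for a tiny model, assuming that features outside a coalition are replaced by a single baseline value. The model, its coefficients, the feature names (`works_on_farm`, `father_edu_years`, `age`) and the baseline are hypothetical and only echo the kinds of features the abstract mentions.

```python
import math
from itertools import combinations


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at instance x.

    Features outside a coalition are set to their baseline value, so the
    values satisfy sum(phi) == f(x) - f(baseline) (efficiency property).
    Cost grows as 2^p, so this is only practical for a handful of features.
    """
    p = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(p)]
        return f(z)

    phi = [0.0] * p
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            # Shapley weight |S|! (p - |S| - 1)! / p!
            weight = math.factorial(size) * math.factorial(p - size - 1) / math.factorial(p)
            for s in combinations(others, size):
                s = set(s)
                phi[j] += weight * (value(s | {j}) - value(s))
    return phi


def toy_dropout_prob(z):
    # Hypothetical model with made-up coefficients, NOT estimates from the study.
    works_on_farm, father_edu_years, age = z
    logit = -3.0 + 1.2 * works_on_farm - 0.15 * father_edu_years + 0.2 * age
    return 1.0 / (1.0 + math.exp(-logit))


child = [1.0, 2.0, 13.0]      # hypothetical child: works on the farm, father has 2 years of schooling
reference = [0.3, 6.0, 10.0]  # hypothetical "average child" baseline
phi = shapley_values(toy_dropout_prob, child, reference)
for name, contribution in zip(["works_on_farm", "father_edu_years", "age"], phi):
    print(f"{name}: {contribution:+.3f}")
```

Because the values sum to f(x) - f(baseline), they split one child's predicted dropout probability into additive per-feature contributions; averaging their absolute values over many children gives a global importance ranking, which is how SHAP feature-importance bar plots are commonly built.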
Keywords
Machine learning, Feature importance, School dropout prediction, Sample weights, Educational data mining