ReMAHA-CatBoost: Addressing Imbalanced Data in Traffic Accident Prediction Tasks

Guolian Li, Yadong Wu, Yulong Bai, Weihan Zhang

Applied Sciences-Basel (2023)

Abstract

Featured Application: ReMAHA-CatBoost is a machine learning model for predicting traffic accident severity. It consists of two parts, ReMAHA (relief-F-based genetic algorithm with over-sampling algorithm for weighted Mahalanobis distance) and CatBoost, and targets the classification of imbalanced data. Key features and highlights: (1) ReMAHA over-sampling: ReMAHA uses the Relief-F algorithm for feature selection and combines it with a new over-sampling technique to improve prediction accuracy for minority classes; (2) Feature engineering: the model uses feature engineering to determine the significance of different attributes, which supports more precise predictions of accident severity; and (3) CatBoost integration: ReMAHA is paired with CatBoost, a gradient-boosting algorithm, to improve predictive performance by mitigating issues such as overfitting and prediction bias. This paper explains how over-sampling algorithms work in machine learning tasks on imbalanced datasets, and specifically how the low accuracy caused by imbalanced data can be addressed at the data level. The experimental results presented in this paper show that ReMAHA-CatBoost outperforms several other over-sampling algorithms and models, especially on the US-Accidents traffic accident dataset, which has an extreme class imbalance ratio of 91.40. This improves the precision of traffic accident severity prediction.

Abstract: Using historical traffic accident information to predict accidents has long been an active research area in transportation. However, predicting only whether an accident occurs does not give the relevant authorities comprehensive information. Further classification of predicted accidents by severity is therefore necessary to better identify and prevent potential hazards and the escalation of accidents. Because accidents of different severity levels occur at very different rates, data imbalance becomes a critical issue. To address the challenge of predicting extremely imbalanced traffic accident events, this paper introduces a predictive framework named ReMAHA-CatBoost. To evaluate its effectiveness, we conducted experiments on the US-Accidents traffic accident dataset, in which the imbalance ratio between class labels reaches 91.40. The experimental results show that the proposed model achieves exceptional predictive performance on imbalanced traffic accident data.
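To make two terms from the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that the imbalance ratio is the size of the largest class divided by the size of the smallest, and it uses one possible form of a feature-weighted Mahalanobis distance in which each feature difference is scaled by a weight before the usual quadratic form is applied. The function names, the weight vector, and the toy labels are all hypothetical; the abstract does not specify how ReMAHA defines or learns its weights.

```python
from collections import Counter

import numpy as np


def imbalance_ratio(labels):
    """Largest class count divided by smallest class count (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def weighted_mahalanobis(x, y, cov, weights):
    """Feature-weighted Mahalanobis distance (assumed form, not taken from the paper).

    d(x, y) = sqrt(z^T S^+ z), where z = weights * (x - y) and S^+ is the
    pseudo-inverse of the covariance matrix S.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    z = np.asarray(weights, dtype=float) * diff
    cov_inv = np.linalg.pinv(np.asarray(cov, dtype=float))
    # Clamp at zero to guard against tiny negative values from round-off.
    return float(np.sqrt(max(z @ cov_inv @ z, 0.0)))


if __name__ == "__main__":
    # Toy labels chosen so the ratio matches the figure quoted in the abstract.
    toy_labels = [0] * 914 + [1] * 10
    print("imbalance ratio:", imbalance_ratio(toy_labels))

    # With identity covariance and unit weights this reduces to Euclidean distance.
    print("distance:", weighted_mahalanobis([0, 0], [3, 4], np.eye(2), [1, 1]))
```

In a sketch like this, setting a feature's weight to zero removes it from the distance entirely, which is one way a feature-relevance score such as the output of Relief-F could influence which neighbors an over-sampler considers close.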

Keywords

relief-F, imbalanced data, CatBoost, traffic accident, class imbalance
