Machine learning for predicting halogen radical reactivity toward aqueous organic chemicals

Youheng Liang, Xiaoliu Huangfu, Ruixing Huang, Zhenpeng Han, Sisi Wu, Jingrui Wang, Xinlong Long, Jun Ma, Qiang He

Journal of Hazardous Materials (2024)

Abstract
Rapid advances in machine learning (ML) offer fast, accurate, and widely applicable methods for predicting the reactivity of organic pollutants toward free radicals. In this study, the rate constants (logk) for reactions of four halogen radicals with organic chemicals were predicted using Morgan fingerprints (MF) and Mordred descriptors (MD) combined with a series of ML models. The results showed that accurate prediction for each dataset depended on an effective pairing of descriptors and algorithm. To alleviate the challenge of limited sample size, we introduced a data combination strategy that merged the different datasets into a unified dataset, which improved prediction accuracy and mitigated overfitting. Light Gradient Boosting Machine (LightGBM) with MF and Random Forest (RF) with MD, both trained on the unified dataset, were selected as the optimal models. SHapley Additive exPlanations (SHAP) analysis showed that the MF-LightGBM model captured the influence of electron-withdrawing and electron-donating groups, while autocorrelation, walk count, and information content descriptors were identified as key features in the MD-RF model. pH was also found to make an important contribution. Applicability domain analysis further indicated that the developed models can reliably predict logk for a broad range of query compounds. Finally, a practical web application for calculating logk was built.
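The abstract does not state which applicability domain (AD) method was used. A common choice in QSAR work is the leverage approach behind the Williams plot, and the sketch below assumes that method: the leverage of a query compound is h = x^T (X^T X)^-1 x, and it is flagged as outside the domain when h exceeds the warning leverage h* = 3p'/n, where p' is the number of descriptors plus one (intercept) and n the number of training compounds. The function name `leverage_ad` and the toy one-descriptor data are hypothetical, not taken from the paper.

```python
import numpy as np


def leverage_ad(X_train, X_query):
    """Hypothetical leverage-based applicability domain check (assumed method).

    Returns the leverages of the query compounds, the warning leverage h*,
    and a boolean mask that is True where a query lies inside the domain.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    # Prepend an intercept column so p' = number of descriptors + 1.
    X_tr = np.column_stack([np.ones(len(X_train)), X_train])
    X_q = np.column_stack([np.ones(len(X_query)), X_query])
    n, p_prime = X_tr.shape
    # Pseudo-inverse tolerates collinear or redundant descriptor columns.
    xtx_inv = np.linalg.pinv(X_tr.T @ X_tr)
    # Row-wise quadratic form x_i^T (X^T X)^-1 x_i for every query row.
    h = np.einsum("ij,jk,ik->i", X_q, xtx_inv, X_q)
    h_star = 3.0 * p_prime / n
    return h, h_star, h <= h_star


# Toy example: four training compounds described by a single descriptor.
h, h_star, inside = leverage_ad([[0.0], [1.0], [2.0], [3.0]], [[1.5], [10.0]])
print(h, h_star, inside)
```

A full Williams plot would also check standardized residuals (commonly |r| ≤ 3) for compounds with measured logk. With thousands of sparse Morgan fingerprint bits and few compounds, X^T X is singular, so for fingerprints a similarity- or distance-based domain is often used instead of leverage.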