Optimized modelling of countrywide soil organic carbon levels via an interpretable decision tree

Smart Agricultural Technology (2023)

Abstract
There are relatively few studies that explicitly evaluate the performance of machine learning algorithms (MLAs), such as decision trees, under varying conditions like data splitting strategies and feature selection methods in digital soil mapping (DSM). Because more powerful black-box models such as Random Forest (RF) are available, simpler models like the Classification and Regression Tree (CART) are less often applied, despite being more intelligible than such black-box models. We demonstrate a simple yet relevant way to improve the performance of a CART model for DSM while retaining its intelligibility, interpretability and intuitiveness. Soil organic carbon (SOC) levels across the Czech Republic are predicted at 30 m × 30 m resolution using selected covariates coupled with the respective CART models. For this work, 440 topsoil samples (0-20 cm) from the Czech Republic were retrieved from the LUCAS soil database. Across the distinct CART models, three data splitting strategies (random, SPlit and conditioned Latin hypercube sampling, cLHS) and seven feature selection methods were varied. Model results were compared using accuracy metrics, including the root mean square error (RMSE). One satisfactory SOC model, with its data split via SPlit, achieved a validation RMSE of 17.30 g/kg and a coefficient of determination (R²) of 0.52. cLHS proved robust for splitting the model data. Feature selection methods including Stepwise Regression (SWR), Recursive Feature Elimination (RFE) and the Genetic Algorithm (GA) were considered computationally efficient for identifying relevant covariates. In general, the study demonstrates the relevance and effectiveness of varying data splitting strategies and feature selection methods for improving SOC modelling via a decision tree (CART).
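To make the modelling and validation steps concrete, here is a minimal, stdlib-only Python sketch under stated assumptions. It grows a CART-style regression tree by greedy binary splits that minimise the within-node sum of squared errors, uses a plain random train/validation split, and scores the held-out predictions with RMSE and R². This is not the authors' code. The function names (`build_tree`, `predict`, `rmse`, `r_squared`), the stopping parameters `max_depth` and `min_leaf`, and the synthetic covariates are all hypothetical illustrations. No pruning is applied, and neither SPlit, cLHS nor any of the feature selection methods is implemented.

```python
import math
import random


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Coefficient of determination, computed as 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def sse(values):
    """Sum of squared deviations from the mean: the node impurity for regression."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def build_tree(X, y, depth=0, max_depth=3, min_leaf=5):
    """Grow a regression tree with greedy binary splits that minimise child SSE.

    No cost-complexity pruning is applied; max_depth and min_leaf act as
    simple stopping rules (hypothetical defaults).
    """
    leaf = {"value": sum(y) / len(y)}
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return leaf
    best = None  # (total child SSE, feature index, threshold)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            total = sse(left) + sse(right)
            if best is None or total < best[0]:
                best = (total, j, t)
    # Stop if no admissible split exists or no split reduces the impurity.
    if best is None or best[0] >= sse(y):
        return leaf
    _, j, t = best
    left_idx = [i for i, row in enumerate(X) if row[j] <= t]
    right_idx = [i for i, row in enumerate(X) if row[j] > t]

    def child(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, min_leaf)

    return {"feature": j, "threshold": t,
            "left": child(left_idx), "right": child(right_idx)}


def predict(tree, row):
    """Follow the if/else split rules from the root down to a leaf mean."""
    while "feature" in tree:
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["value"]


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stand-in data: two hypothetical covariates and an SOC-like
    # response with a step effect plus noise. Not LUCAS data.
    X = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(200)]
    y = [10 + (30 if x0 > 0.5 else 0) + 5 * x1 + rng.gauss(0, 2) for x0, x1 in X]

    # Plain random 80/20 split; SPlit or cLHS would replace this step.
    idx = list(range(len(X)))
    rng.shuffle(idx)
    cut = int(0.8 * len(idx))
    train, valid = idx[:cut], idx[cut:]

    tree = build_tree([X[i] for i in train], [y[i] for i in train])
    preds = [predict(tree, X[i]) for i in valid]
    obs = [y[i] for i in valid]
    print(f"validation RMSE = {rmse(obs, preds):.2f}, R2 = {r_squared(obs, preds):.2f}")
```

The study's variations would plug in at two places in this sketch. Replacing the shuffled-index split with SPlit or cLHS changes how the data are partitioned. Restricting `X` to a covariate subset chosen by SWR, RFE or GA changes which features the tree sees. A real workflow would more likely rely on an established CART implementation with pruning, such as R's `rpart` or scikit-learn's `DecisionTreeRegressor`. The appeal the abstract points to is that the fitted tree is just a short chain of readable threshold rules, as `predict` makes explicit.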
Keywords
Intelligible models,Model parsimony,Czech Republic,Generalization,Digital soil mapping (DSM)