Model averaging of machine learning algorithms for digital soil mapping: A minimum variance framework

Geoderma (2023)

Abstract
In digital soil mapping, machine learning (ML) algorithms are currently the most popular methods for the spatial prediction of soil properties. The rapid development of easy-to-use software implementations for a wide range of ML algorithms has encouraged comparison studies, with the goal of ranking the algorithms by performance and identifying the best ones. However, no firm conclusion can be drawn about which ML algorithm is best in general, which suggests that combining a set of them could be a better approach. Numerous methods have been proposed to do so, most of them relying on a linear weighting of the individual algorithms. Yet there are almost as many methods for linearly weighting ML algorithms as there are ML algorithms, which leaves the problem unsolved. Moreover, these weighting methods are mostly used out of the box, without proper attention to their underlying hypotheses. In this paper, we propose to address this issue by setting the problem in a more formal framework. Starting from classical hypotheses, we show how the benefit of averaging various ML algorithms can be estimated from their joint performances. Turning then to the most commonly used linear weighting schemes, we recall that, as long as the performance metrics are based on mean square errors, the best averaging method is by construction the best linear (unbiased) predictor. Using a more general Bayesian framework, we also show that accounting for conditional biases when weighting ML algorithms is a key issue for obtaining improved predictions, and we propose explicit formulas for that purpose. Finally, these theoretical results are illustrated and discussed using a soil data set collected over an arid and semi-arid region in Iran, where clay content, calcium carbonate equivalent, soil organic carbon and electrical conductivity were measured in topsoil samples.
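The minimum variance idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes the textbook form of the minimum variance unbiased combination, in which weights that sum to one are proportional to the inverse error covariance matrix applied to a vector of ones. This is not necessarily the exact formulation used in the paper, and it ignores the conditional bias correction the abstract refers to. The function name `minimum_variance_weights`, the three synthetic predictors and all noise levels are hypothetical and chosen for illustration only.

```python
import numpy as np


def minimum_variance_weights(errors):
    """Weights that sum to one and minimize the variance of the combined error.

    errors: array of shape (n_samples, n_models) holding validation errors
    (prediction minus observation) for each model.
    """
    errors = np.asarray(errors, dtype=float)
    # Sample covariance of the model errors (models are in columns).
    cov = np.atleast_2d(np.cov(errors, rowvar=False))
    ones = np.ones(cov.shape[0])
    # Solve cov @ raw = 1 instead of inverting cov explicitly.
    raw = np.linalg.solve(cov, ones)
    return raw / raw.sum()


# Synthetic illustration (hypothetical data, not the Iranian soil data set).
rng = np.random.default_rng(0)
n = 500
truth = rng.normal(size=n)
shared = rng.normal(scale=0.3, size=n)  # error component shared by two models

preds = np.column_stack([
    truth + shared + rng.normal(scale=0.5, size=n),  # model A
    truth + shared + rng.normal(scale=0.8, size=n),  # model B
    truth + rng.normal(scale=1.0, size=n),           # model C
])

errors = preds - truth[:, None]
weights = minimum_variance_weights(errors)
combined = preds @ weights

print("weights:", weights)
print("error variance per model:", np.var(errors, axis=0, ddof=1))
print("error variance combined: ", np.var(combined - truth, ddof=1))
```

Because the weights are estimated from the same errors used to evaluate them, the combined error variance here cannot exceed that of the best single model. In practice the weights would be estimated on held-out or cross-validated errors, and negative weights can appear when model errors are strongly correlated.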
Keywords
digital soil mapping, model averaging, machine learning, algorithms