Effect Of Data Set Size On Geochemical Quantification Accuracy With Laser-Induced Breakdown Spectroscopy

SPECTROCHIMICA ACTA PART B-ATOMIC SPECTROSCOPY (2021)



Abstract

Laser-induced breakdown spectroscopy (LIBS) data acquired from 2959 geochemical standards allow the effects of training set size on LIBS accuracy in geochemical analyses to be evaluated. In addition, LIBS prediction accuracies are quantified for 65 elements based on a typical benchtop instrument. Analyses used two equivalent randomly selected subsets of the full data set to compare prediction accuracies of partial least squares (PLS) models using 75, 50, 25, 10, 5, 2.5, 1, and 0.5% of the total data set for training and the remainder for testing. The number of components, a measure of complexity, in the PLS models was shown to increase with the size of the training set. Based on root mean square errors on unseen test data, our results show that the larger the training set, the lower the prediction error (and thus the better the prediction accuracy) on unseen data. Calibration (training set) size was shown to have a first-order effect on prediction accuracy relative to spectral resolution and detector sensitivity. Different methods of assessing model accuracy using root mean square error (RMSE) are compared, including the error of the calibration (RMSE-C), the error of cross-validation (RMSE-CV), and the error of prediction (RMSE-P). Use of RMSE-C is inappropriate because the samples being predicted are those on which the model was trained. In data sets that are sufficiently large, use of test data (RMSE-P) provides the best measure of prediction accuracy, while RMSE-CV is useful only to provide an estimate of subsequent model performance. Increasing the number of cross-validation folds for our large data set yields surprisingly comparable RMSE-CV values for models with five or more (up to 100) folds, but this result is likely not applicable to smaller data sets and needs further evaluation.
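The three error measures compared in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' pipeline: the data are synthetic Gaussian "spectra" rather than LIBS measurements, the PLS model is a small single-response (PLS1) NIPALS implementation written with NumPy, the number of components is held fixed instead of being tuned per training set, and the names `pls1_fit`, `pls1_predict`, `rmse_cv`, and `evaluate` are hypothetical helpers.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model (NIPALS); returns (x_mean, y_mean, coef)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                 # weight vector: covariance of X with y
        w /= np.linalg.norm(w)
        t = Xk @ w                    # scores for this component
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        qk = yk @ t / tt              # y loading
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - qk * t              # deflate y
        W.append(w); P.append(p); q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector in X space
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def rmse_cv(X, y, n_components, n_folds, rng):
    """RMSE-CV: every training sample is predicted by a model that did not see it."""
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    held_out = np.empty(len(y))
    for fold in folds:
        train = np.setdiff1d(np.arange(len(y)), fold)
        model = pls1_fit(X[train], y[train], n_components)
        held_out[fold] = pls1_predict(model, X[fold])
    return rmse(y, held_out)

def evaluate(fractions=(0.5, 0.1, 0.05), n_samples=400, n_channels=30,
             n_components=5, seed=0):
    """Compare RMSE-C, RMSE-CV, and RMSE-P for several training fractions."""
    rng = np.random.default_rng(seed)
    # Synthetic stand-in for spectra and concentrations (assumption, not LIBS data).
    X = rng.normal(size=(n_samples, n_channels))
    beta = rng.normal(size=n_channels)
    y = X @ beta + rng.normal(scale=0.5, size=n_samples)
    results = {}
    for frac in fractions:
        order = rng.permutation(n_samples)
        n_train = int(round(frac * n_samples))
        train, test = order[:n_train], order[n_train:]   # remainder is the test set
        model = pls1_fit(X[train], y[train], n_components)
        results[frac] = {
            "RMSE-C": rmse(y[train], pls1_predict(model, X[train])),   # same samples as training
            "RMSE-CV": rmse_cv(X[train], y[train], n_components, 5, rng),
            "RMSE-P": rmse(y[test], pls1_predict(model, X[test])),     # unseen samples
        }
    return results

if __name__ == "__main__":
    for frac, scores in evaluate().items():
        print(frac, {name: round(value, 3) for name, value in scores.items()})
```

In this toy setting, RMSE-C is computed on the same samples used for fitting, which is why the abstract treats it as an inappropriate accuracy measure, while RMSE-P is computed on samples the model never saw. The fractions and fold count here are arbitrary choices for the sketch and do not reproduce the study's numbers.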


Keywords

LIBS, PLS, Geostandards, Quantification, Accuracy
