Fig 1.
Dominant soil orders in the United States and locations of soil sample collection (triangles) included in the national CO3 model.
Table 1.
Geographic and taxonomic diversity of soils contributing to the KSSL national CO3 model and the Cornell regional model.
Fig 2.
Flow chart of data subdivision and multiple model development starting with the full national dataset including Histosols.
Blue boxes represent spectral and CO3 datasets, and red boxes the PLS models derived from them. Through preliminary principal components analysis (PCA) of the raw spectra using 10 principal components, 119 and 0 samples were excluded as redundant spectra from the KSSL national dataset and the smaller Cornell dataset, respectively. The remaining spectra in the national dataset were then divided, half for calibration and half for internal validation, using the Kennard-Stone algorithm to ensure equal distribution in the final PCA space. The smaller Cornell dataset was tested with leave-one-out cross validation (LOOCV). All chemometric models were developed using partial least squares (PLS) regression, with the redundant spectra identified in the PCA above excluded.
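The calibration/validation split described above can be illustrated with a minimal sketch of the Kennard-Stone selection rule. It assumes Euclidean distance on PCA scores. The function name `kennard_stone` and the six two-dimensional score points are hypothetical and are not taken from the study, whose software and settings are not stated in this caption.

```python
import math


def kennard_stone(points, n_select):
    """Return indices of n_select points chosen by the Kennard-Stone rule.

    points: sequence of equal-length coordinate tuples (e.g. PCA scores).
    """
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(points)")

    # Full pairwise Euclidean distance matrix (fine for a small sketch).
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Seed the selection with the two mutually most distant points.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist[ij[0]][ij[1]],
    )
    selected = [i0, j0]
    remaining = set(range(n)) - {i0, j0}

    # Distance from each unselected point to its nearest selected point.
    min_d = {k: min(dist[k][i0], dist[k][j0]) for k in remaining}

    while len(selected) < n_select:
        # Pick the point that is farthest from everything selected so far.
        nxt = max(sorted(remaining), key=lambda k: min_d[k])
        selected.append(nxt)
        remaining.remove(nxt)
        del min_d[nxt]
        for k in remaining:
            min_d[k] = min(min_d[k], dist[k][nxt])
    return selected


# Hypothetical PCA scores for six spectra (illustrative values only).
scores = [(0.0, 0.0), (10.0, 0.0), (5.0, 5.0), (5.0, 0.1), (0.2, 0.1), (9.8, 0.2)]
calibration = kennard_stone(scores, n_select=len(scores) // 2)
validation = [i for i in range(len(scores)) if i not in calibration]
```

In this toy example the three calibration points span the extremes of the score space, and the points left for validation each lie close to a selected point or between them. This is how the algorithm yields calibration and validation sets that cover the same region.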
Fig 3.
MIR-modeled CaCO3 equivalent as percent of soil weight versus manometrically measured CaCO3 contents.
A) National model based on 1268 samples from the KSSL archive, including all seven soil orders from the contiguous United States (Table 1) in which carbonates are likely to be found. This MIR model was developed by dividing the dataset between calibration and test samples; sample sizes for the calibration and validation sets are denoted by nCal and nVal, respectively. B) A second, independent MIR prediction model based exclusively on the Cornell dataset of 209 samples collected from 5 sites in New York and Iowa and including Alfisols, Inceptisols, and Mollisols. Because of the smaller total sample size (nCV), this calibration was performed using leave-one-out cross-validation.
Table 2.
Evaluating the robustness of carbonate models in reciprocal analyses.
Fig 4.
Predicted CO3 values using the Cornell dataset MIR calibration compared to KSSL manometrically measured CO3 contents for the same national database samples.
Shown is the model performance for Alfisols and Inceptisols, which are the main soil orders contributing to the Cornell calibration model, and also for Histosols, which are not represented in the Cornell calibration dataset at all. Statistics on these fits, as well as on all other soil orders present in the national dataset, are given in Table 2.
Table 3.
Matrix of root mean square error of prediction (RMSEP) and bias for 10 CO3 prediction models all developed from the Kellogg Soil Survey Laboratory (KSSL) national dataset (Table 1) and subsets thereof.
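For readers unfamiliar with the two statistics reported in this table, the following minimal sketch shows their conventional definitions. The sign convention for bias (predicted minus measured) is an assumption, since the caption does not state it, and the sample values are hypothetical.

```python
import math


def rmsep(predicted, measured):
    """Root mean square error of prediction over paired values."""
    errors = [p - m for p, m in zip(predicted, measured)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def bias(predicted, measured):
    """Mean signed error; assumed convention is predicted minus measured."""
    errors = [p - m for p, m in zip(predicted, measured)]
    return sum(errors) / len(errors)


# Hypothetical carbonate values in g kg-1 (illustrative only).
measured = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 420.0]

model_rmsep = rmsep(predicted, measured)
model_bias = bias(predicted, measured)
```

RMSEP combines random scatter and systematic offset into one figure, while bias isolates the systematic part, so reporting both indicates whether a model's error is mostly noise or mostly a consistent over- or under-prediction.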
Table 4.
Repeatability (within-batch standard deviation (sd)), reproducibility (among-batch sd), and accuracy (mean values) comparing manometric and MIR carbonate measurements done repeatedly on the same Kellogg Soil Survey Laboratory (KSSL) soil standards.
Fig 5.
Comparison of the root mean square error of prediction (RMSEP) with the standard deviation (SD) of replicate measurements of the same soil sample within batches (96-well plates) for the KSSL national dataset without Histosols.
Each reported MIR-predicted value is an average of four replicate samples, individually loaded and analyzed, providing a robust estimate of the within-batch MIR repeatability. Repeatability was calculated as the square root of the average variance in the predicted values within each set of four replicates, divided by the square root of 4. Placement in bins was based on the manometric measurements. The numbers of soil samples associated with bins 0 to 100, 100 to 200, 200 to 300, and 300 to 700 g kg-1 were 300, 214, 127, and 48, respectively.
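The repeatability calculation described above can be sketched as follows. This is an illustration under stated assumptions: the variance is taken as the sample variance (n - 1 denominator), which the caption does not specify, and the function name and replicate values are hypothetical.

```python
import math
import statistics


def repeatability_of_mean(replicate_sets):
    """Pooled within-set SD divided by sqrt(number of replicates per set).

    replicate_sets: list of equal-length lists of predicted values,
    one inner list per soil sample.
    """
    n_reps = len(replicate_sets[0])
    if any(len(reps) != n_reps for reps in replicate_sets):
        raise ValueError("every sample needs the same number of replicates")

    # Sample variance (n - 1 denominator) within each set; an assumption.
    variances = [statistics.variance(reps) for reps in replicate_sets]

    # Square root of the average variance across all sets in the bin.
    pooled_sd = math.sqrt(sum(variances) / len(variances))

    # Standard deviation of the reported mean of n_reps replicates.
    return pooled_sd / math.sqrt(n_reps)


# Hypothetical predictions (g kg-1) for two samples, four replicates each.
bin_replicates = [
    [100.0, 102.0, 98.0, 100.0],
    [200.0, 204.0, 196.0, 200.0],
]
bin_repeatability = repeatability_of_mean(bin_replicates)
```

Dividing by the square root of 4 converts the spread of individual replicates into the expected spread of the four-replicate mean, which is the quantity actually reported for each sample.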
Fig 6.
Representative spectra drawn from all soil orders in the KSSL national CO3 dataset except Histosols and indicating the locations of the individual bands evaluated in Table 5.
All spectra shown represent soil samples with CO3 contents of 400–550 g kg-1. Contiguous bands have been given contrasting shading colors for visual clarity. Numbers at the top indicate the order in Table 5, sorted by RMSEP of predictive models based on each band individually.
Table 5.
MIR model results using the KSSL national dataset with Histosols excluded (n = 1101 soil samples from six soil orders) and utilizing spectral intervals corresponding to specifically identified absorbance peaks.
Fig 7.
Comparison of three sets of whole model regression coefficients for CO3 prediction for the national dataset.
A) PLS model using only the peak centered at 1796 cm-1 and 1st derivative preprocessing (Table 5). Histosols were excluded. The model used 12 loading vectors. B) PLS model with Histosols excluded and using 2nd derivative preprocessing (Table 3). The model used 11 loading vectors. C) PLS model based on all soil orders and using 1st derivative + multiplicative scatter correction preprocessing (Fig 2 and Table 3). The model used 13 loading vectors. For all three models, the full complement of final regression coefficients and all loading vectors are available as Excel files in Supplemental Data online.