Carbonate determination in soils by mid-IR spectroscopy with regional and continental scale models

doi:10.1371/journal.pone.0210235

Fig 1.

Dominant soil orders in the United States and locations of soil sample collection (triangles) included in the national CO₃ model.

Table 1.

Geographic and taxonomic diversity of soils contributing to the KSSL national CO₃ model and the Cornell regional model.

Fig 2.

Flow chart of data subdivision and multiple model development starting with the full national dataset including Histosols.

Blue boxes represent spectral and CO₃ datasets, and red boxes the PLS models derived from them. Through preliminary principal components analysis (PCA) of the raw spectra using 10 principal components, 119 and 0 samples were excluded as redundant spectra from the KSSL national dataset and the smaller Cornell dataset, respectively. The remaining spectra in the national dataset were then divided, half for calibration and half for internal validation, using the Kennard-Stone algorithm to ensure equal distribution in the final PCA space. The smaller Cornell dataset was tested with leave-one-out cross-validation (LOOCV). All chemometric models were developed using partial least squares (PLS) regression, with the redundant spectra identified in the PCA above excluded.
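
The Kennard-Stone split described above can be illustrated with a minimal sketch. It assumes Euclidean distance in the score space and uses made-up two-dimensional scores; the function name `kennard_stone` and the toy data are illustrative only and are not taken from the study, which worked with 10 principal components.

```python
import math


def kennard_stone(points, n_select):
    """Return indices of n_select points picked by the Kennard-Stone rule."""
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(points)")
    # Pairwise Euclidean distances in the (hypothetical) PCA score space.
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Seed the selection with the two mutually most distant samples.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    # For every unselected sample, track its distance to the nearest selected one.
    nearest = {
        k: min(dist[k][first], dist[k][second])
        for k in range(n)
        if k not in (first, second)
    }
    while len(selected) < n_select:
        # Add the sample that is farthest from everything chosen so far.
        chosen = max(sorted(nearest), key=nearest.get)
        selected.append(chosen)
        del nearest[chosen]
        for k in nearest:
            nearest[k] = min(nearest[k], dist[k][chosen])
    return selected


# Toy 2-D scores (made-up values, not data from the study).
scores = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.1), (10.0, 10.0), (5.0, 5.0), (9.0, 9.0)]
calibration = kennard_stone(scores, n_select=len(scores) // 2)
validation = [i for i in range(len(scores)) if i not in calibration]
```

Here half of the samples go to calibration and the remainder to validation, mirroring the half-and-half division in the caption; because each new sample is the one farthest from those already chosen, the calibration set spans the score space rather than clustering in its densest region.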

Fig 3.

MIR-modeled CaCO₃ equivalent as percent of soil weight versus manometrically measured CaCO₃ contents.

A) National model based on 1268 samples from the KSSL archive, including all seven soil orders from the contiguous United States (Table 1) in which carbonates are likely to be found. This MIR model was developed by dividing the dataset between calibration and validation samples; sample sizes for the two sets are denoted by n_Cal and n_Val, respectively. B) A second, independent MIR prediction model based exclusively on the Cornell dataset of 209 samples collected from 5 sites in New York and Iowa and including Alfisols, Inceptisols, and Mollisols. Due to the smaller total sample size (n_CV), this calibration was performed using leave-one-out cross-validation.
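
For readers unfamiliar with leave-one-out cross-validation, the sketch below shows the resampling loop only. As a loudly stated simplification, it swaps the PLS regression used in the study for a one-variable least-squares line, and the helper names `fit_line` and `loocv_rmse` as well as the numbers are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares line; a stand-in for the PLS model in the study."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return slope, mean_y - slope * mean_x


def loocv_rmse(x, y):
    """Hold out each sample once, refit on the rest, and pool the squared errors."""
    squared_errors = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        prediction = slope * x[i] + intercept
        squared_errors.append((prediction - y[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up predictor and response values, not data from the study.
signal = [1.0, 2.0, 3.0, 4.0, 5.0]
carbonate = [3.1, 4.9, 7.2, 8.8, 11.1]
cv_error = loocv_rmse(signal, carbonate)
```

Every sample is predicted by a model that never saw it, which is why the approach suits a dataset too small to set aside a separate validation half.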

Table 2.

Evaluating the robustness of carbonate models in reciprocal analyses.

Fig 4.

Predicted CO₃ values using the Cornell dataset MIR calibration compared to KSSL manometrically measured CO₃ contents for the same national database samples.

Shown is the model performance for Alfisols and Inceptisols, which are the main soil orders contributing to the Cornell calibration model, and also for Histosols, which are not represented in the Cornell calibration dataset at all. Statistics on these fits, as well as on all other soil orders present in the national dataset, are given in Table 2.

Table 3.

Matrix of root mean square error of prediction (RMSEP) and bias for 10 CO₃ prediction models, all developed from the Kellogg Soil Survey Laboratory (KSSL) national dataset (Table 1) and subsets thereof.
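
The two statistics in this table can be computed as in the following sketch. It uses the common definitions (RMSEP as the root of the mean squared prediction error, bias as the mean of predicted minus measured); the sign convention for bias is an assumption, since the caption does not state it, and the values are made up.

```python
import math


def rmsep(measured, predicted):
    """Root mean square error of prediction over a validation set."""
    errors = [p - m for m, p in zip(measured, predicted)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def bias(measured, predicted):
    """Mean signed error; positive means over-prediction (assumed convention)."""
    errors = [p - m for m, p in zip(measured, predicted)]
    return sum(errors) / len(errors)


# Made-up carbonate contents in g kg^-1, not data from the study.
measured = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 320.0]
model_rmsep = rmsep(measured, predicted)
model_bias = bias(measured, predicted)
```

RMSEP summarizes the overall size of the errors, while bias shows whether a model applied outside its calibration set tends to over- or under-predict.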

Table 4.

Repeatability (within-batch standard deviation, sd), reproducibility (among-batch sd), and accuracy (mean values) comparing manometric and MIR carbonate measurements made repeatedly on the same Kellogg Soil Survey Laboratory (KSSL) soil standards.

Fig 5.

Comparison of the root mean square error of prediction (RMSEP) with the standard deviation (SD) of replicate measurements of the same soil sample within batches (96-well plates) for the KSSL national dataset without Histosols.

Each reported MIR-predicted value is an average of four replicate samples, individually loaded and analyzed, providing a robust estimate of the within-batch MIR repeatability. Repeatability was calculated as the square root of the average variance in the predicted values among each set of four replicates, divided by the square root of 4. Placement in bins was based on the manometric measurements. The numbers of soil samples associated with bins 0 to 100, 100 to 200, 200 to 300, and 300 to 700 g kg^-1 were 300, 214, 127, and 48, respectively.
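
The repeatability calculation described above can be sketched as follows. The sketch assumes the sample variance (n - 1 denominator) within each set of replicates, which the caption does not specify; the function name and the replicate values are hypothetical.

```python
import math
import statistics


def within_batch_repeatability(replicate_sets):
    """sqrt(mean within-set variance) / sqrt(number of replicates per set)."""
    n_reps = len(replicate_sets[0])
    if any(len(reps) != n_reps for reps in replicate_sets):
        raise ValueError("every sample needs the same number of replicates")
    # Sample variance of each set of replicate predictions (assumed n - 1 form).
    variances = [statistics.variance(reps) for reps in replicate_sets]
    pooled_sd = math.sqrt(statistics.fmean(variances))
    # Dividing by sqrt(n_reps) gives the SD of the reported replicate mean.
    return pooled_sd / math.sqrt(n_reps)


# Made-up MIR predictions in g kg^-1, four replicates per soil sample.
replicates = [
    [10.0, 12.0, 14.0, 16.0],
    [20.0, 20.0, 24.0, 24.0],
]
repeatability = within_batch_repeatability(replicates)
```

Dividing by the square root of 4 converts the spread of individual replicates into the uncertainty of the four-replicate mean, which is the value actually reported for each sample.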

Fig 6.

Representative spectra drawn from all soil orders in the KSSL national CO₃ dataset except Histosols and indicating the locations of the individual bands evaluated in Table 5.

All spectra shown represent soil samples with CO₃ contents of 400–550 g kg^-1. Contiguous bands have been given contrasting shading colors for visual clarity. Numbers at the top indicate the order in Table 5, sorted by RMSEP of predictive models based on each band individually.

Table 5.

MIR model results using the KSSL national dataset with Histosols excluded (n = 1101 soil samples from six soil orders) and utilizing spectral intervals corresponding to specifically identified absorbance peaks.

Fig 7.

Comparison of three sets of whole model regression coefficients for CO₃ prediction for the national dataset.

A) PLS model using only the peak centered at 1796 cm^-1 and 1st derivative preprocessing (Table 5). Histosols were excluded. The model used 12 loading vectors. B) PLS model with Histosols excluded and using 2nd derivative preprocessing (Table 3). The model used 11 loading vectors. C) PLS model based on all soil orders and using 1st derivative + multiplicative scattering correction preprocessing (Fig 2 & Table 3). The model used 13 loading vectors. For all three models, the full complement of final regression coefficients and all loading vectors are available as Excel files in Supplemental Data online.
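
The two preprocessing steps named in panel C can be illustrated with a minimal sketch. It makes several assumptions that the caption does not confirm: the derivative is a plain finite difference (a smoothing derivative filter may have been used instead), the MSC reference is the mean spectrum, and MSC is applied before the derivative. All names and values are illustrative.

```python
from statistics import fmean


def first_derivative(spectrum):
    """Finite-difference first derivative between adjacent spectral points."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum."""
    n_points = len(spectra[0])
    reference = [fmean(s[j] for s in spectra) for j in range(n_points)]
    ref_mean = fmean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        s_mean = fmean(s)
        # Regress the spectrum on the reference: s ~ intercept + slope * reference.
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(reference, s)) / ref_ss
        intercept = s_mean - slope * ref_mean
        # Remove the additive offset and multiplicative scaling.
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


# Made-up absorbance values: one shape with different offsets and gains.
base = [1.0, 2.0, 4.0, 3.0]
raw = [base, [2.0 * v + 1.0 for v in base], [0.5 * v - 0.2 for v in base]]
preprocessed = [first_derivative(s) for s in msc(raw)]
```

In this toy case the three spectra differ only by offset and gain, so after correction they collapse onto the same curve; real spectra differ in shape as well, and only the scatter-related part of the variation is removed.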
