Carbonate determination in soils by mid-IR spectroscopy with regional and continental scale models

  • Jonathan P. Comstock ,

    Roles Formal analysis, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing

    jpc8@cornell.edu

    Affiliations Horticulture Section, School of Integrative Plant Science, Cornell University, Ithaca, New York, United States of America, Kellogg Soil Survey Laboratory, Natural Resources Conservation Service, Lincoln, Nebraska, United States of America

  • Sonam R. Sherpa,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing

    Affiliations Kellogg Soil Survey Laboratory, Natural Resources Conservation Service, Lincoln, Nebraska, United States of America, Soil and Crop Sciences Section, School of Integrative Plant Sciences, Cornell University, Ithaca, New York, United States of America

  • Richard Ferguson,

    Roles Conceptualization, Data curation, Investigation, Methodology, Project administration, Resources, Writing – original draft, Writing – review & editing

    Affiliation Kellogg Soil Survey Laboratory, Natural Resources Conservation Service, Lincoln, Nebraska, United States of America

  • Scarlett Bailey,

    Roles Data curation, Methodology, Visualization

    Affiliation Kellogg Soil Survey Laboratory, Natural Resources Conservation Service, Lincoln, Nebraska, United States of America

  • Jeffrey P. Beem-Miller,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Writing – original draft

    Affiliation Horticulture Section, School of Integrative Plant Science, Cornell University, Ithaca, New York, United States of America

  • Feng Lin,

    Roles Formal analysis, Software

    Affiliation Jiangsu Key Laboratory of Low Carbon Agriculture and GHGs Mitigation, Nanjing Agricultural University, Nanjing, China

  • Johannes Lehmann,

    Roles Conceptualization, Investigation, Writing – original draft

    Affiliations Soil and Crop Sciences Section, School of Integrative Plant Sciences, Cornell University, Ithaca, New York, United States of America, Atkinson Center for a Sustainable Future, Cornell University, Ithaca, New York, United States of America

  • David W. Wolfe

    Roles Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Writing – original draft

    Affiliations Horticulture Section, School of Integrative Plant Science, Cornell University, Ithaca, New York, United States of America, Atkinson Center for a Sustainable Future, Cornell University, Ithaca, New York, United States of America

Abstract

A Partial Least Squares (PLS) carbonate (CO3) prediction model was developed for soils throughout the contiguous United States using mid-infrared (MIR) spectroscopy. Excellent performance was achieved over an extensive geographic and chemical diversity of soils. A single model for all soil types performed very well with a root mean square error of prediction (RMSEP) of 12.6 g kg-1 and was further improved if Histosols were excluded (RMSEP 11.1 g kg-1). Exclusion of Histosols was particularly beneficial for accurate prediction of CO3 values when the national model was applied to an independent regional dataset. Little advantage was found in further narrowing the taxonomic breadth of the calibration dataset, but higher precision was obtained by running models for a restricted range of CO3. A model calibrated using only the independent regional dataset was unable to accurately predict CO3 content for the more chemically diverse national dataset. Ten absorbance peaks enabling CO3 prediction by MIR spectroscopy were identified and evaluated for individual and combined predictive power. A single-band model derived from an absorbance peak centered at 1796 cm-1 yielded the lowest RMSEP (13.5 g kg-1) of all single-band models for carbonate prediction. This predictive power is attributed to the strength and sharpness of the peak, and to an apparently minimal overlap with confounding co-occurring spectral features of other soil components. Drawing from the 10 identified bands, multiple combinations of 3 or 4 peaks were able to predict CO3 content as well as the full-spectrum national models. Soil CO3 is an excellent example of a soil parameter that can be predicted with great effectiveness and generality, and MIR models could replace direct laboratory measurement as a lower cost, high quality alternative.

Introduction

In some arid and semi-arid regions, the soil inorganic carbon (SIC) in carbonates can be the dominant form of soil carbon. In moderately humid regions, SIC often complicates analytical determination of soil organic carbon (SOC), especially at greater depths [1]. While both SOC and SIC contribute important fractions of total soil carbon (TC) under various climatic regimes, SOC rather than SIC is of interest in relation to soil health, fertility, and carbon sequestration. SOC is considered the more dynamic component of TC, exchanging with atmospheric greenhouse gases in a manner highly sensitive to land use and management practices, but ongoing studies suggest that SIC can also be manipulated to be a net source or sink of atmospheric CO2 under some circumstances [2–5]. The need for low-cost, high-throughput measurement approaches for both SOC and SIC has grown tremendously in the last two decades to support studies of soil carbon dynamics requiring large amounts of data, real-time data-informed management practices, and inexpensive and rapid verification of soil carbon stocks to facilitate sequestration efforts and carbon markets [3,6–8].

A wide variety of direct methods have been developed for measuring CO3 contents in soils [9]. These include dry combustion at two temperatures for selectively removing first SOC and subsequently SIC [10] and methods based on acidifying soil samples and evaluating the release of CO2 from carbonates either gravimetrically [11], by titration, or manometrically [12–14]. All of these laboratory methods require considerable time investments in sample preparation and measurement, and are consequently relatively low-throughput and high-cost approaches. More recently, interest has grown in spectroscopic methods of soil carbonate measurement using both visible and near- / mid-infrared spectral regions (VNIR and MIR, respectively) [6,15–17]. These approaches have shown great potential for high throughput, low cost per sample after initial investment in equipment, and the potential to evaluate numerous soil properties simultaneously. Nonetheless, questions remain regarding the limitations and generality of the spectroscopic prediction models.

Carbonate content has been one of the most tractable soil properties for MIR analysis [6,18,19], but different forms of carbonate are rarely distinguished in these studies. The most common form is calcite (CaCO3), but many soils also have substantial amounts of dolomite (CaMg(CO3)2), and several rarer forms occur in restricted cases. This is a potential source of calibration error due both to spectral differences among CO3 minerals and to differences in molecular weight per CO3 ion. The success in modeling CO3 with MIR chemometric models is due to the presence of several well-defined absorption peaks, and because soil carbonates, when present, can reach very high percentages of total soil mass with a correspondingly dominant influence on spectral properties. The numerous strong spectral features and large amounts of carbonate in some soils, however, can seriously confound the interpretation of other soil parameters. Strong carbonate peaks associated with fundamental vibrational states are present in the MIR region at 700, 880, and 1450 cm-1 [20]. Several additional bands, such as that at 3000–2900 cm-1, are due to overtones, and bands at 2600–2500 cm-1 and 1830–1760 cm-1 are due to combinations of fundamental vibrations [20–22].

Earlier evaluations assessing MIR measurement of soil CO3 were promising but often either very limited in scope or of more qualitative accuracy [23]. McCarty et al. [24], however, showed excellent MIR CO3 predictions (RMSEP = 10 g kg-1, bias 2.5 g kg-1) for a set of Alfisols and Mollisols from the central United States. A French national model achieved slightly less precision (RMSEP = 23 g kg-1) [15]. Our study seeks to clarify the trade-offs between high performance and generality across datasets with extensive chemical and geographic variability. Specifically, we will test:

  1. Hypothesis 1) CO3 spectral characteristics are sufficiently strong and distinctive to allow a single predictive model to accurately predict CO3 content for all common soil types found in the contiguous United States provided the variation in soil types is well represented in a robust calibration dataset.
  2. Hypothesis 2) Models derived from localized data or data with restricted soil diversity may achieve apparently lower error terms in internal validation, but lack generality outside the narrowly defined calibration limits.
  3. Hypothesis 3) When calibrated against a sufficiently robust and extensive dataset, a broad inclusive model can match the accuracy and precision of other standard techniques for carbonate assessment despite extensive heterogeneity of soil background.

To address these questions, we report on a collaborative effort between the Natural Resources Conservation Service (NRCS) Kellogg Soil Survey Laboratory (KSSL) in Lincoln, Nebraska, and researchers at Cornell University in Ithaca, NY. The KSSL is the repository of a large soil archive with collections from the 1950s to the present. It currently houses approximately 245,000 soil samples and the archive is steadily growing. Approximately 55,000 samples from this archive have been scanned for MIR spectra as well as other measurements, and the total fraction scanned is steadily increasing. This spectral library and the associated information on soil properties allowed us to test these hypotheses on a national dataset with broad geographic distribution across the contiguous United States and representatives of the most important soil orders displayed on the map. Andisols, Spodosols, and Ultisols were not represented.

Materials and methods

Soil sample collection and preparation

The national dataset used in this study includes 1268 samples containing CO3 from the KSSL archive and accessed through the Laboratory Information and Management System (LIMS database). These samples provide a broad survey of both geographic and chemical soil diversity (Table 1, Fig 1) across the contiguous United States and Puerto Rico. A second soil collection of 209 samples came from independent sampling by the Cornell team at three sites in New York State (Inceptisols and Alfisols) and two in Iowa (Mollisols) (Table 1).

Fig 1. Dominant soil orders in the United States and locations of soil sample collection (triangles) included in the national CO3 model.

https://doi.org/10.1371/journal.pone.0210235.g001

Table 1. Geographic and taxonomic diversity of soils contributing to KSSL national CO3 model and the Cornell regional model.

https://doi.org/10.1371/journal.pone.0210235.t001

The individual states contributing the most samples to the national dataset were North Dakota, Texas, Wyoming, New Mexico, South Dakota, and Kansas, with 403, 167, 116, 103, 98, and 60 samples, respectively. 18 samples came from Puerto Rico. New York and Iowa, the states sampled for development of the Cornell localized model, contributed only 25 and 1 sample, respectively. The Cornell dataset had Alfisols and Inceptisols sampled in New York and Mollisols from Iowa.

Samples for the national dataset were initially prepared at KSSL using method 1B1b2d1 [25] to achieve a < 2mm fraction of fine earth which was further ground to 80 mesh in a Fritsch Planetary mill with Syalon grinding vessel and balls. Samples very high in organic matter, such as Histosols, could not be ground in the planetary mill and were fine-ground using a cross-beater mill (Fritsch Pulverisette 16 mill with an 80 mesh screen). Samples at Cornell were prepared in similar fashion and finished with grinding in a Retsch MM200 ball mill with stainless steel grinding vessels and balls. The high levels of grinding needed to achieve sample homogeneity for the small subsamples in MIR measurement also have consequences for spectral characteristics [26]. Preliminary tests were run on common samples to ensure that equivalent spectra would be produced after sample prep at either location.

Laboratory CO3 measurement

Results throughout this report are given as CaCO3 equivalents because the manometric method is calibrated using reagent grade CaCO3 standards [13]. The measurement is based on the CO2 released from carbonate reacting with 3M HCl, and is expected to be equally effective at measuring soil CO3 carbon released from the full range of carbonate soil minerals, including common forms such as calcite, magnesite, and dolomite. Actual weights and percentages of total soil mass would differ from the CaCO3 equivalents depending on associated cations and crystal structures. Spectra of the common forms of soil carbonate can differ slightly, but generally show the same principal peaks in the MIR region [27,28].
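The relation between a CaCO3 equivalent and the mass of another carbonate mineral can be illustrated with a short calculation. The sketch below is not part of the published protocol; it assumes standard atomic masses rounded to three decimals and pure end-member minerals, and the function names are hypothetical.

```python
# Standard atomic masses in g/mol, rounded (assumed values for illustration).
ATOMIC_MASS = {"Ca": 40.078, "Mg": 24.305, "C": 12.011, "O": 15.999}

# Mass of one mole of the CO3 group.
CO3 = ATOMIC_MASS["C"] + 3 * ATOMIC_MASS["O"]

# Mineral mass per mole of CO3 (dolomite holds two CO3 per formula unit).
MASS_PER_CO3 = {
    "calcite": ATOMIC_MASS["Ca"] + CO3,                                  # CaCO3
    "magnesite": ATOMIC_MASS["Mg"] + CO3,                                # MgCO3
    "dolomite": (ATOMIC_MASS["Ca"] + ATOMIC_MASS["Mg"] + 2 * CO3) / 2,   # CaMg(CO3)2
}


def mineral_mass_from_caco3_eq(caco3_eq_g_per_kg, mineral):
    """Mass of `mineral` (g per kg soil) that releases the same CO2 as the
    given CaCO3 equivalent, assuming all carbonate is that one mineral."""
    return caco3_eq_g_per_kg * MASS_PER_CO3[mineral] / MASS_PER_CO3["calcite"]


def inorganic_carbon_from_caco3_eq(caco3_eq_g_per_kg):
    """Inorganic carbon (g C per kg soil) corresponding to a CaCO3 equivalent."""
    return caco3_eq_g_per_kg * ATOMIC_MASS["C"] / MASS_PER_CO3["calcite"]


if __name__ == "__main__":
    for name in MASS_PER_CO3:
        print(name, round(mineral_mass_from_caco3_eq(100.0, name), 1))
    print("inorganic C", round(inorganic_carbon_from_caco3_eq(100.0), 1))
```

Under these assumptions, a sample reported at 100 g kg-1 CaCO3 equivalent would contain roughly 8% less mineral mass if the carbonate were entirely dolomite, while the inorganic carbon content is the same regardless of the mineral form.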

CaCO3 equivalents were determined at both locations by pressure calcimeter method treating the <2mm soil fraction with 3M HCl in a closed vial. At KSSL this was method 4E1a1a1, pg 370 [25]. The method employed at Cornell University was similar in concept following the protocol of Sherrod et al. [13].

MIR spectroscopy and CO3 prediction model development

All soil samples were analyzed by Diffuse Reflectance Infrared Fourier Transform (DRIFT)-MIR spectroscopy. Undiluted soil samples were laid out in aluminum 96 well microplates with four replicate wells of each soil sample resulting in four replicate spectra for each sample. Soil samples were scanned in a Bruker Vertex 70 FT-IR Spectrometer with HTS-XT (Bruker Optik GmbH, Germany). The HTS-XT is an external microplate module. Spectra were scanned from 600–4000 cm−1 with a resolution of 4 cm−1 and 32 scans/sample-well. Final spectra were expressed in absorbance units [log(1/Reflectance)]. Empty microplate wells with anodized aluminum bottoms were used as background.
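The conversion from reflectance to absorbance units can be sketched as follows. The text gives the transform as log(1/Reflectance); the base-10 logarithm used here is an assumption based on common spectroscopic convention, and the function name is hypothetical.

```python
import math


def reflectance_to_absorbance(reflectance):
    """Convert relative reflectance values (0 < R <= 1, measured against the
    background) to apparent absorbance, A = log10(1 / R).

    Base 10 is assumed; the source only states log(1/Reflectance).
    """
    if any(r <= 0 for r in reflectance):
        raise ValueError("reflectance values must be positive")
    return [math.log10(1.0 / r) for r in reflectance]


if __name__ == "__main__":
    print(reflectance_to_absorbance([1.0, 0.5, 0.1]))
```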

Influence of taxonomic coverage on model performance

The national dataset was used to compare the performance of PLS models produced using the full dataset with an array of subsets focused on single soil orders or excluding various combinations of soil orders. The flow of spectral subset selection resulting in these various models is shown in Fig 2. The blue boxes both above and below the full national dataset in the flowchart are all subsets of various kinds. The national dataset with only Histosols excluded is recognized separately because of its importance in analyses of specific CO3 peaks.

Fig 2. Flow chart of data subdivision and multiple model development starting with the full national dataset including Histosols.

Blue boxes represent spectral and CO3 datasets, and red boxes the PLS models derived from them. Through preliminary principal components analysis (PCA) of the raw spectra using 10 principal components, 119 and 0 samples were excluded as redundant spectra from the KSSL national dataset and the smaller Cornell dataset, respectively. The remaining spectra in the national dataset were then divided, half for calibration and half for internal validation, using the Kennard-Stone algorithm to ensure equal distribution in the final PCA space. The smaller Cornell dataset was tested with leave-one-out cross validation (LOOCV). All chemometric models were developed using partial least squares regression (PLS), with the redundant spectra identified in the PCA above excluded.

https://doi.org/10.1371/journal.pone.0210235.g002
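The Kennard-Stone division mentioned in the Fig 2 caption can be illustrated with a minimal sketch. This is a generic textbook form of the algorithm, not the implementation used in the study: it assumes each sample is represented by a tuple of PCA scores, uses Euclidean distance, and returns the indices chosen for calibration (the remaining samples would form the validation set). The function name is hypothetical.

```python
import math


def kennard_stone(points, n_select):
    """Return indices of `n_select` points chosen by the Kennard-Stone rule.

    points   : list of equal-length tuples (e.g. PCA scores per sample)
    n_select : number of calibration samples to pick (>= 2)
    """
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(points)")

    # Full pairwise Euclidean distance matrix (fine for a small sketch).
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Seed with the two mutually most distant samples.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist[ij[0]][ij[1]],
    )
    selected = [i0, j0]
    remaining = set(range(n)) - {i0, j0}

    # min_dist[k] is the distance from k to its nearest selected sample.
    min_dist = {k: min(dist[k][i0], dist[k][j0]) for k in remaining}

    while len(selected) < n_select:
        # Add the sample that is farthest from everything selected so far.
        nxt = max(sorted(remaining), key=lambda k: min_dist[k])
        selected.append(nxt)
        remaining.remove(nxt)
        for k in remaining:
            min_dist[k] = min(min_dist[k], dist[k][nxt])
    return selected


if __name__ == "__main__":
    scores = [(0.0, 0.0), (10.0, 0.0), (5.0, 0.0), (1.0, 0.0), (9.0, 0.0)]
    calibration = kennard_stone(scores, 3)
    validation = [i for i in range(len(scores)) if i not in calibration]
    print(calibration, validation)
```

Because each new calibration sample is the one farthest from those already chosen, the calibration set spreads evenly over the score space, which is the property the caption refers to.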

PLS chemometric models were developed in part using the “optimization” function of the OPUS-QUANT2 software package (Bruker Optik GmbH, Germany). This automated routine considered 11 options for spectral preprocessing together with a predefined array of bands dividing up the spectrum from 4000 to 600 cm-1. This optimization routine did not employ a full factorial approach but started out testing each spectral pre-processing option using the whole spectrum and also leaving out various selected bands to determine which made important contributions. Spectral band choices were then refined during an iterative process, still including all preprocessing options in each iteration. Results of all tests conducted were then ranked based on root mean square error of the prediction (RMSEP) or the root mean square error of cross validation (RMSECV) as appropriate to determine the best combination of spectral preprocessing and spectral regions. Preprocessing options evaluated included: original spectra without data pre-processing, constant offset elimination, straight line subtraction, vector normalization, min max normalization, multiplicative scatter correction, first derivative, second derivative, first derivative with straight line subtraction, first derivative with vector normalization, and first derivative with multiplicative scatter correction. Prediction accuracy of selected MIR models was evaluated by the coefficient of determination (R2), the RMSEP or RMSECV, bias, standard error of prediction (SEP) and the residual prediction deviation (RPD) [29]. While most models were calibrated using a calibration and test set division during optimization and evaluated by RMSEP, the Cornell model and three single soil order models derived from the national dataset (Histosols, Aridisols and Entisols) were developed with the cross-validation leave one out technique due to reduced sample size and were evaluated with RMSECV.
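The validation statistics listed above can be computed from paired measured and predicted values as sketched below. Exact definitions differ somewhat among software packages; the forms used here (bias as mean of predicted minus measured, SEP as the bias-corrected standard deviation of residuals, RPD as the standard deviation of the reference values divided by SEP) are common conventions and are assumptions, not necessarily the formulas applied by OPUS-QUANT2. The function name is hypothetical.

```python
import math
from statistics import mean, stdev


def validation_metrics(measured, predicted):
    """Return R2, RMSEP, bias, SEP and RPD for paired reference/predicted values."""
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    residuals = [p - m for m, p in zip(measured, predicted)]
    n = len(residuals)

    bias = mean(residuals)                                   # systematic offset
    sse = sum(r * r for r in residuals)                      # sum of squared errors
    rmsep = math.sqrt(sse / n)                               # total prediction error
    sep = math.sqrt(sum((r - bias) ** 2 for r in residuals) / (n - 1))

    ref_mean = mean(measured)
    ss_tot = sum((m - ref_mean) ** 2 for m in measured)
    r2 = 1.0 - sse / ss_tot                                  # coefficient of determination

    rpd = stdev(measured) / sep                              # residual prediction deviation
    return {"R2": r2, "RMSEP": rmsep, "bias": bias, "SEP": sep, "RPD": rpd}


if __name__ == "__main__":
    measured = [0.0, 100.0, 200.0, 300.0]     # illustrative values, g kg-1
    predicted = [10.0, 90.0, 210.0, 290.0]
    for name, value in validation_metrics(measured, predicted).items():
        print(f"{name}: {value:.3f}")
```

The same function applies to cross-validation output, in which case the error term would be reported as RMSECV rather than RMSEP.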

Samples with CO3 contents above 70% were excluded because of reduced accuracy and insufficient sample size in that extreme range. This may have been related in part to spectral distortions of the undiluted samples as CO3 approaches the total composition of the sample [22]. Also excluded due to unresolved outlier status were samples with extremely high levels of soluble salts (e.g., salt playas). No data were excluded from the Cornell dataset and only a limited number of well-defined cases were excluded from the national model dataset.

Reproducibility and accuracy of CO3 measurements

Laboratory standard soils included in KSSL measurement protocols permitted a comparison of reproducibility and accuracy between manometric and MIR techniques. KSSL laboratory standards 104 and 146 were included (routine quality assurance) during each run of the CO3 manometric assay, and 15 samples (60 wells in a 96 well plate) were also scanned by MIR for CO3 prediction using the KSSL (without Histosols) model. Another carbonate-containing KSSL laboratory standard, 101, is loaded on every 96 well plate during MIR analysis and has thus been scanned thousands of times. Standard 101 was subjected to 14 reps in a manometric analysis for comparison. Data are compared for mean values and standard deviations.

Spectral regions associated with CO3 prediction by MIR

To evaluate the contribution of specific hypothesized carbonate peaks, a similar software-driven optimization process was employed, substituting a defined array of carbonate-associated spectral bands for the default bands dividing up the entire spectrum. Sixteen samples were chosen with CO3 contents between 300 and 450 g kg-1, equally representing all soil orders in the KSSL national dataset except Histosols. Examination of these spectra resulted in identification of 10 spectral bands consistently associated with high CO3. A set of ten prediction models was developed using the KSSL national dataset, each limited to one of the listed spectral bands in turn. All models were constrained to use first derivative preprocessing, a maximum of 15 PLS loading vectors, and the same division between test and calibration samples. This array of preliminary optimization results provided an assessment of the performance of individual carbonate peaks and of the peak combinations that were most effective. In all cases, whether using default spectral regions or the pre-assessed carbonate peaks, the best optimization results were run again with a full graphical and statistical analysis.
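Restricting a spectrum to a single band before modeling can be sketched as below. The example uses the 1842–1763 cm-1 interval reported in the Results and a plain finite-difference derivative; commercial packages typically combine the derivative with smoothing, so this is an illustrative assumption rather than the preprocessing actually applied. Function names and the synthetic spectrum are hypothetical.

```python
def first_derivative(wavenumbers, absorbance):
    """Finite-difference first derivative dA/dw between adjacent points
    (no smoothing). The result has one value fewer than the input."""
    return [
        (absorbance[i + 1] - absorbance[i]) / (wavenumbers[i + 1] - wavenumbers[i])
        for i in range(len(absorbance) - 1)
    ]


def select_band(wavenumbers, values, high, low):
    """Keep only points whose wavenumber lies within [low, high] cm-1."""
    kept = [(w, v) for w, v in zip(wavenumbers, values) if low <= w <= high]
    return [w for w, _ in kept], [v for _, v in kept]


if __name__ == "__main__":
    # Synthetic spectrum on a 4 cm-1 grid from 1900 down to 1704 cm-1.
    wn = [1900 - 4 * i for i in range(50)]
    ab = [0.001 * w for w in wn]

    deriv = first_derivative(wn, ab)
    # Pair each derivative value with the left-hand wavenumber of its interval.
    band_wn, band_deriv = select_band(wn[:-1], deriv, high=1842, low=1763)
    print(len(band_wn), band_wn[0], band_wn[-1])
```

The band-limited derivative values would then serve as the predictor matrix for a single-band PLS model; repeating the selection for each of the ten bands gives the inputs for the ten single-band models.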

Results & discussion

Performance of national and regional scale CO3 prediction models

Models were derived at both national scale and a more regional scale to assess the effect of geographic and taxonomic coverage on the relative performance of CO3 prediction models. The CO3 concentration of samples ranged from 0.00 to 494.6 g kg-1 with a mean and standard deviation of 1.05 and 3.18 g kg-1, respectively. Additionally, the organic C concentration of samples ranged from 1.5 to 662.0 g kg-1 with a mean and standard deviation of 14.3 and 11.4 g kg-1, respectively (data not shown). Excellent results were achieved generating MIR predictive models for both the KSSL national dataset and the more restricted Cornell dataset (Fig 3a and 3b, respectively). R2 values of MIR predicted versus manometrically measured values were 0.988 and 0.993 for the KSSL model and Cornell model, respectively, with negligible bias in either case. RMSEP and root mean square error of cross validation (RMSECV) were 12.7 and 7.7 g kg-1 for the KSSL and Cornell models, respectively. Lower error in the Cornell model is likely associated with the lower chemical and spatial diversity of the dataset. While a tendency for lower error estimates from the leave-one-out cross-validation method used with the smaller dataset has been reported [17], the effect of this choice was very small for this dataset. Repeating the analysis using ten randomly chosen divisions of the Cornell samples between calibration and validation subsets resulted in an average RMSEP of 0.792 g kg-1 (range 0.639 to 0.891 g kg-1), essentially identical on average to the RMSECV value. RPD values of 9.0 and 11.8, respectively, suggest that both MIR models are of sufficient quality for ‘any purpose’ and not just qualitative assessment [29].

Fig 3. MIR modeled CaCO3 equivalent as percent of soil weight versus manometrically measured CaCO3 contents.

A) National model based on 1268 samples from the KSSL archive including all seven soil orders from the contiguous United States (Table 1) in which carbonates are likely to be found. This MIR model was developed by dividing the dataset between calibration and test samples. Sample sizes for the calibration and validation sets are denoted by nCal and nVal, respectively; and B) a second, independent MIR prediction model based exclusively on the Cornell dataset of 209 samples collected from 5 sites in New York and Iowa and including Alfisols, Inceptisols, and Mollisols. Due to the smaller total sample size (nCV), this calibration was performed using the leave-one-out cross-validation technique.

https://doi.org/10.1371/journal.pone.0210235.g003

These two models, calibrated on completely independent datasets, were further tested for generality of prediction accuracy outside the original modeling datasets by evaluating the national model’s ability to accurately predict values for the independent Cornell dataset of manometrically determined CO3 and, conversely, the Cornell model’s ability to predict values for the much more diverse national dataset. The KSSL national model calibrated with all soil orders predicted CO3 contents for the independent 209 samples of the Cornell dataset (Table 2) with an RMSEP of 15.4 g kg-1 despite the fact that relatively few of the samples in the national dataset came from Iowa (4) or New York (18). This was a favorable result, but somewhat worse than the internal validation statistics (12.7 g kg-1) of the KSSL national model.

Table 2. Evaluating the robustness of carbonate models in reciprocal analyses.

https://doi.org/10.1371/journal.pone.0210235.t002

Influence of taxonomic coverage on model performance

When applied to the independent KSSL national dataset, the model derived from Cornell data provided moderate accuracy for Alfisols and Inceptisols from other parts of the continent with RMSEP of 14.6 and 21.4 g kg-1, respectively. Good performance was also seen for Vertisols with an RMSEP of 13.9 g kg-1 (Fig 4, Table 2) despite their absence from the calibration dataset. However, CO3 predictions were poor for both Aridisols and Mollisols, and were very noisy with severe underestimates for Histosols (Fig 4). Expanding the spectral region used while modeling with the Cornell dataset did not improve generality and actually worsened prediction accuracy.

Fig 4. Predicted CO3 values using the Cornell dataset MIR calibration compared to KSSL manometrically measured CO3 contents for the same national database samples.

Shown is the model performance for Alfisols and Inceptisols, which are the main soil orders contributing to the Cornell calibration model, and also for Histosols which are not represented in the Cornell calibration dataset at all. Statistics on these fits as well as all other soil orders present in the national dataset are given in Table 2.

https://doi.org/10.1371/journal.pone.0210235.g004

An evaluation of the full KSSL national model’s (Fig 3A) internal validation data broken down by soil order is given in Table 3 along with additional calibration models for nine subset combinations of soil orders from the KSSL national dataset. In all these models, the KSSL dataset was divided into the same division of calibration and test datasets.

Table 3. Matrix of root mean square error of prediction (RMSEP) and bias for 10 CO3 prediction models all developed from the Kellogg Soil Survey Laboratory (KSSL) national dataset (Table 1) and subsets thereof.

https://doi.org/10.1371/journal.pone.0210235.t003

Histosols, with an RMSEP of 21.1 g kg-1, were the most poorly predicted soil order (Table 3), followed by Entisols and Aridisols with the next highest RMSEPs at 18.3 and 16.6 g kg-1, respectively. Excluding Histosols from model calibration resulted in an optimization using 2nd derivative preprocessing, more selective portions of the available spectrum (2982–2640, 2301–1620 and 1281–939 cm-1), and a Quant2-recommended PLS model using 11 factors. Prediction precision improved substantially for all other soil orders as well as for the independent Cornell dataset, but this model, like the Cornell model, was unable to make useful predictions for Histosols (Fig 4, Tables 2 and 3). This is not an important model limitation given that Histosols with measurable carbonate content are extremely rare. Further excluding Entisols and Aridisols from model calibration allowed for modest additional improvement in precision for the remaining orders of the KSSL dataset, but less dramatically so than for Histosol exclusion, and it did not improve predictions of the independent Cornell dataset (Table 3).

Histosols represent an extreme case in which the dominant spectral background is from organic matter rather than mineral soil. Given their limited frequency and highly distinctive nature, this is not a substantive limitation to application of the restricted form of the KSSL national model. Nonetheless, inclusion of Histosols results in only modest loss of overall accuracy and may be desirable in some contexts.

Further explorations of precision for more restricted models, including calibrations based on individual soil orders, showed, as would be expected, lower RMSEP for the validation test set as the diversity of samples in the model decreased (Table 3). However, these gains were rather small, and the calibration based on all orders except Histosols actually made the best predictions of the independent Cornell dataset. More substantial improvements in accuracy were achieved by calibrating for a reduced range of CO3 values (Table 3), and this is recommended when high-accuracy is needed for evaluating soils at low carbonate content.

Reproducibility and accuracy of CO3 measurement

Manometric assay and MIR predictions for these two KSSL laboratory CO3 standards were in almost perfect agreement (Table 4), and precision of individual measures was similar, possibly actually better for MIR, but the data do not include among-batch sources of error for MIR.

Table 4. Repeatability (within batch standard deviation (sd)), reproducibility (among batch sd) and accuracy (mean values) comparing manometric and MIR carbonate measurements done repeatedly on the same Kellogg Soil Survey Laboratory (KSSL) soil standards.

https://doi.org/10.1371/journal.pone.0210235.t004

Of interest in this comparison is that, for standard 101, the mean carbonate values from manometric and MIR measurement differ significantly from each other (p<0.001), although the difference is not large. Since the difference is much larger than can be explained by measurement repeatability for either assay, this suggests a spectral background effect resulting in a small but consistent overestimate of carbonate by the MIR analysis. Such a discrepancy was not seen for standards 104 or 146, but it is within the bounds of expected deviations based on the RMSEP of the KSSL national model for samples with this CO3 content (Fig 5).

Fig 5. Comparison of the root mean square error of prediction (RMSEP) with the standard deviation (SD) of replicate measurements of the same soil sample within batches (96 well plates) for the KSSL national dataset without Histosols.

Each reported MIR predicted value is an average of four replicate samples individually loaded and analyzed providing a robust estimate of the within-batch MIR repeatability. Repeatability was calculated as the square root of average variance in the predicted values among each set of four reps divided by square root of 4. Placement in bins was based on the manometric measurements. The number of soil samples associated with bins 0 to 100, 100 to 200, 200 to 300 and 300–700 g kg-1 were 300, 214, 127, and 48, respectively.

https://doi.org/10.1371/journal.pone.0210235.g005

The behavior of KSSL standard 101 is consistent with the total dataset from the KSSL national model (Fig 5). All data from the KSSL model without Histosols was divided among four CO3 range-based bins to allow a comparison of the reproducibility of MIR predictions for a given sample and the overall accuracy with which individual manometric measurements were captured in MIR carbonate predictions. Since each reported MIR prediction going into the model is actually an average of four reps, an expected repeatability of the measurement can be estimated from the variance. Values for within batch repeatability of MIR predictions are calculated as the square root of the average variance across all sets of four reps (independently loaded wells within a given batch) divided by the square root of four to give an expected sd for averages of four reps, the standard for reported values. Fig 5 clearly indicates that repeatability of spectral measurements of particular soil samples is only a small contributor to the overall error represented by the RMSE. This is true at all levels of CO3 content, but more dramatically so at low levels of CO3.
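The repeatability calculation described above can be written out as a short sketch: take the sample variance within each set of replicate wells, average those variances across soil samples, take the square root, and divide by the square root of the number of replicates. The function name and the illustrative numbers are hypothetical; equal replicate counts per sample are assumed.

```python
import math
from statistics import mean, variance


def within_batch_repeatability(replicate_predictions):
    """Expected standard deviation of the reported (replicate-averaged) value.

    replicate_predictions : list of lists, one inner list per soil sample,
                            holding the MIR predictions of its replicate wells.
    """
    n_reps = len(replicate_predictions[0])
    if any(len(reps) != n_reps for reps in replicate_predictions):
        raise ValueError("all samples must have the same number of replicates")

    # Average of the within-sample variances across all soil samples.
    avg_variance = mean([variance(reps) for reps in replicate_predictions])

    # SD of a single well, scaled to the SD of a mean of n_reps wells.
    return math.sqrt(avg_variance) / math.sqrt(n_reps)


if __name__ == "__main__":
    # Illustrative predictions (g kg-1) for two samples, four wells each.
    wells = [[10.0, 12.0, 10.0, 12.0], [20.0, 20.0, 20.0, 20.0]]
    print(round(within_batch_repeatability(wells), 4))
```

Applying the function separately to the samples falling in each CO3 bin would give bin-wise repeatability values comparable to the bin-wise RMSEP.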

Spectral regions associated with CO3 prediction by MIR

To further explore the contributions of specific spectral bands to overall model performance, 10 spectral bands consistently associated with high CO3 were identified, and both single-band and multiband prediction models were developed (Table 5, Fig 6). Several of the listed regions have a second peak in the same interval. In some cases these may be due to variation in harmonic contributions to particular peaks, to mixed crystal structures for carbonates present in the samples, or simply to closely associated separate spectral features. The crest for the most prominent peak in each spectral region is given in column 3. These bands included the expected peaks at 700, 880, and 1450 cm-1 [20] (699, 887, and an expected double peak in undiluted samples at 1636 and 1471 cm-1 in Table 5) associated with major vibrational states, a combination band at 1830–1800 cm-1 (1796 cm-1 in Table 5), well-discussed overtone bands at 2600–2500 cm-1 (2514 cm-1 in Table 5), 3000–2900 cm-1 [20–22] (2876 cm-1 in Table 5) and 3984–3937 cm-1 [30] (3938 cm-1 in Table 5), and three additional bands with consistent peaks centered at 2138, 1866 and 814 cm-1 (Table 5, Fig 6).

Fig 6. Representative spectra drawn from all soil orders in the KSSL national CO3 dataset except Histosols and indicating the locations of the individual bands evaluated in Table 5.

All spectra shown represent soil samples with CO3 contents of 400–550 g kg-1. Contiguous bands have been given contrasting shading colors for visual clarity. Numbers at the top indicate the order in Table 5, sorted by RMSEP of predictive models based on each band individually.

https://doi.org/10.1371/journal.pone.0210235.g006

Table 5. MIR model results using the KSSL national dataset with Histosols excluded (n = 1101 soil samples from six soil orders) and utilizing spectral intervals corresponding to specifically identified absorbance peaks.

https://doi.org/10.1371/journal.pone.0210235.t005

All 10 single-band models were significant, but the band from 1842–1763 cm-1 stands out with an RMSEP of 13.5 g kg-1, only 2.4 g kg-1 greater than the optimized national model without Histosols (Table 3). This is attributable to the strength and sharpness of this peak, which make it clearly discernible against various background changes, and to its greater consistency of peak centering and shape. Multiband optimization searches using all or selected combinations of the 10 bands in Table 5 achieved RMSEP only 0.3 g kg-1 higher than full spectrum models (data not shown). The best multiband models always contained the peak centered at 1796 cm-1, but did not need all peaks and achieved similar results with a variety of ancillary band combinations. The band from 2424–2680 cm-1 has previously been put forward as holding the best peak for carbonate determination due to a relative lack of other soil components absorbing in that region [17,20,31]. It was the second most informative CO3 peak in this study, but notably less effective than the one centered at 1796 cm-1. The 3rd and 4th most informative bands (Table 5) overlap with absorbance by aliphatic compounds and carbohydrates, respectively [32].

A calibration of the KSSL national model (without Histosols) was performed using first derivative preprocessing to match the calibration analyses of individual peaks reported in Table 5. It used the full MIR spectrum available in this study (4000–600 cm-1), and had an RMSEP of 12.0 g kg-1. Spectral peaks with specific chemometric attributions are expected to show up most clearly in the first factor (loading vector) of a PLS model, but with first derivative preprocessing, the peak shapes are slightly altered [33,34]. Where a positive peak center may have been present in the raw absorbance data, the first derivative of the spectrum will have a value of 0 at this same wavenumber, with negative values dropping down to the left (higher wavenumbers) and positive values rising to the right, reflecting the positive and negative slopes on contrasting sides of the raw absorbance maximum. The peaks ranked as the top three regarding their RMSEP (Table 5) fulfilled this expectation very well. The second and third ranked peaks from Table 5 exhibited features close to those expected in the first loading vector. These observations are consistent with the findings of previous studies, and with which peaks overlap extensively with strong absorbance by other common soil components. The expected peak at 1450 cm-1 for calcite splits into two rounded domes in neat samples [20]. The resulting region from 1336–1758 cm-1 overlaps with absorbance bands for a wide array of organic compounds [26,31]. Similarly, the expected peaks at 700 and 880 cm-1 overlap with absorbance by soil organic matter, quartz and clay minerals [26].
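The effect of first derivative preprocessing on a peak can be shown with a short sketch. A synthetic Gaussian stands in for a real absorbance peak; the peak width, the 2 cm-1 grid spacing, and the central-difference derivative are assumptions chosen for illustration and are not the preprocessing settings used in this study.

```python
import math


def gaussian_peak(x, center, width):
    """Synthetic absorbance peak with unit height at its center."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def central_difference(xs, ys):
    """First derivative dy/dx at interior points by central differences."""
    return [
        (ys[i + 1] - ys[i - 1]) / (xs[i + 1] - xs[i - 1])
        for i in range(1, len(xs) - 1)
    ]


center = 1796.0  # cm-1, peak position taken from Table 5
wavenumbers = [1760.0 + 2.0 * i for i in range(37)]  # 1760 to 1832 cm-1
absorbance = [gaussian_peak(w, center, 10.0) for w in wavenumbers]

deriv = central_difference(wavenumbers, absorbance)
interior = wavenumbers[1:-1]  # wavenumbers at which deriv is defined

at_center = deriv[interior.index(center)]         # zero at the peak maximum
below = deriv[interior.index(center - 10.0)]      # lower wavenumber side
above = deriv[interior.index(center + 10.0)]      # higher wavenumber side
print(at_center, below > 0, above < 0)
```

With the derivative taken with respect to wavenumber, the value is zero at the peak center, positive on the lower wavenumber side and negative on the higher wavenumber side, which is the pattern looked for in the first loading vector.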

The PLSR coefficients for three of the most important CO3 prediction models discussed above (peak centered at 1796 cm-1, Histosols excluded, and full model with all soils) are presented in Fig 7, so that the relative importance of spectral regions may be assessed. Optimization procedures resulted in the use of contrasting spectral regions for the three models, yet all emphasized the peak at 1796 cm-1 to a greater extent than any other region. The model based on all soil orders utilized the entire available spectrum, while optimization after exclusion of just the Histosols resulted in a substantial reduction of spectral regions utilized. The peak centered at 1796 cm-1 appears to be the predominant spectral region associated with CO3 prediction by MIR, as inclusion of additional spectral regions provided minimal gains in prediction accuracy ranging from 0.8 to 2.5 g kg-1.

Fig 7. Comparison of three sets of whole model regression coefficients for CO3 prediction for the national dataset.

A) PLS model using only the peak centered at 1796 cm-1 and 1st derivative preprocessing (Table 5). Histosols were excluded. The model used 12 loading vectors. B) PLS model with Histosols excluded and using 2nd derivative preprocessing (Table 3). The model used 11 loading vectors. C) PLS model based on all soil orders and using 1st derivative + multiplicative scattering correction preprocessing (Fig 2 & Table 3). The model had 13 loading vectors. For all three models, the full complement of final regression coefficients and all loading vectors are available as Excel files in Supplemental Data online.

https://doi.org/10.1371/journal.pone.0210235.g007

Advantages of MIR vs manometric assay

The MIR technique is an attractive replacement for the manometric method in several important aspects. First, MIR has a much lower propensity for human error and/or cryptic equipment failure during the procedure. At the KSSL, as much as 10% of manometric measurements are invalidated and repeated as a result of quality control procedures, and other authors have noted the difficulty of maintaining accuracy during the procedure [35]. In contrast, only a small fraction of 1% of MIR spectra ever need to be invalidated and re-collected. For the last three years, although the manometric method is still employed at the KSSL, values are always compared with predictions from MIR spectra as a powerful QC procedure. Discrepancies greater than ±2 times the RMSEP of the MIR model are tagged for re-analysis by the manometric method. The KSSL lab currently uses the national MIR model for this QA/QC of data generated by the manometric method.
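The QC rule described above amounts to a simple threshold check, sketched below. The function name, the sample values, and the RMSEP of 12.0 g kg-1 are illustrative assumptions; this is not the KSSL laboratory software.

```python
def flag_for_reanalysis(manometric, mir_predicted, rmsep, factor=2.0):
    """Return indices of samples whose manometric and MIR values
    differ by more than factor * RMSEP (all values in g kg-1)."""
    threshold = factor * rmsep
    return [
        i
        for i, (m, p) in enumerate(zip(manometric, mir_predicted))
        if abs(m - p) > threshold
    ]


# Hypothetical paired CO3 values for four samples (g kg-1)
manometric = [5.0, 120.0, 300.0, 48.0]
mir_predicted = [7.0, 118.0, 260.0, 75.0]

# With an assumed RMSEP of 12.0 g kg-1, the threshold is 24.0 g kg-1
flagged = flag_for_reanalysis(manometric, mir_predicted, rmsep=12.0)
print(flagged)  # [2, 3]
```

Because the threshold is expressed in absolute units, a fixed multiple of RMSEP is a proportionally looser check for high-carbonate samples than for low-carbonate ones.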

The MIR spectral approach also has lower cost once the initial investment in spectrometers has been made. At the KSSL, processing, archiving and entering the soil into the database are in themselves a fairly costly procedure. The additional cost of performing a manometric CO3 measurement versus MIR spectral analysis is then $3.60 versus $1.80 per sample, respectively. The lower cost for the MIR analysis is a result of higher throughput, reduced labor, and the absence of chemical costs. Further, the same spectra can be used to predict numerous other soil properties, and if the costs of multiple lab bench assays were considered, the savings would become even more dramatic.

While spectral windows exist where CO3 peaks suffer little overlap with other key soil parameters, the reverse may be less true and the many strong carbonate peaks are likely to interfere with many other measurement goals. The effectiveness with which carbonates can be predicted, however, may provide opportunities for MIR model trees [36] and stratification.

This study demonstrates the value of a large, diverse, well-studied and well-curated soil collection and database in generating high-performance MIR models. The KSSL soil archive and associated characterization database provides unique opportunities to derive high-quality MIR models for a wide range of soil properties. Access to the spectral library is freely available upon request.

Conclusions

After investment in the instrumentation, spectroscopic data can be generated with low cost per sample and high throughput. They have the potential advantage of providing information about numerous soil properties at once if a set of high-quality models are available for interpretation. While benchtop analytic methods at their best may have somewhat higher precision than full-range MIR prediction models, specialized models with comparable precision can be developed where needed. The low frequency of spectral measurement error compared to chemical methods can also result in greater overall accuracy unless very extensive QA/QC procedures are in place.

Even with Histosols included, CO3 MIR models encompassed geographically and chemically diverse soils with minimal bias and excellent precision, suitable for most analytical needs. This validates Hypothesis 1 and has been fully demonstrated here for soils of the continental United States. The localized Cornell model was very limited outside its calibration dataset and error increased even with soils of the same order from other regions of the continent. In contrast, the national KSSL models proved to be robust. This confirms Hypothesis 2, and suggests broader, global models will be quite possible, but maintaining optimal precision and accuracy will require adding new soil orders into the calibration dataset.

In a strict sense, Hypothesis 3 was rejected for this dataset. The MIR prediction models were strongly affected by spectral background, and this uncertainty was the primary contribution to the RMSEP values, which were larger than repeatability measures for either MIR or manometric assay. In terms of percent error, this was particularly true for samples with low carbonate contents. Nonetheless, very good predictions were possible and, in terms of the full scale of the range of CO3 contents, the differences between manometric and MIR uncertainty were small. For many contexts, the MIR models are suitable as primary analytical assays.

The peak at 1796 cm-1 exhibited the least evidence of confounding overlap with other soil properties and gave excellent results even as a single peak prediction model. The peak at 2514 cm-1 was also quite good, but with an RMSEP nearly twice that of the peak at 1796 cm-1.

Acknowledgments

We wish to thank the several growers who gave us access to their fields to collect samples for this work.

References

  1. Lal R, Kimble JM, Eswaran H, Stewart BA. Global Climate Change and Pedogenic Carbonates. Boca Raton, Florida: CRC Press LLC; 2000.
  2. Sanderman J. Can management induced changes in the carbonate system drive soil carbon sequestration? A review with particular focus on Australia. Agric Ecosyst Environ. Elsevier B.V.; 2012;155: 70–77.
  3. Hartemink AE, McSweeney K. Soil Carbon [Internet]. 2014. https://doi.org/10.1007/978-3-319-04084-4
  4. Singh BP, Chan KY, Cowie AL. Soil Health and Climate Change.
  5. Bughio MA, Wang P, Meng F, Qing C, Kuzyakov Y, Wang X, et al. Neoformation of pedogenic carbonates by irrigation and fertilization and their contribution to carbon sequestration in soil. Geoderma. Elsevier B.V.; 2016;262: 12–19.
  6. Bellon-Maurel V, McBratney A. Near-infrared (NIR) and mid-infrared (MIR) spectroscopic techniques for assessing the amount of carbon stock in soils—Critical review and research perspectives. Soil Biol Biochem. Elsevier Ltd; 2011;43: 1398–1410.
  7. McCarty GW, Reeves JB. Comparison of Near Infrared and Mid Infrared Diffuse Reflectance Spectroscopy for Field-Scale Measurement of Soil Fertility Parameters. Soil Sci. 2006;171: 94–102.
  8. Singh K, Murphy B, Marchant B. Towards cost-effective estimation of soil carbon stocks at the field scale. Soil Res. 2013; 672–684. http://dx.doi.org/10.1071/SR12119
  9. Loeppert R, Suarez D. Carbonate and Gypsum. In: Sparks D, Page A, Helmke P, et al., editors. Methods of soil analysis Part 3—chemical methods. Soil Science Society of America Inc.; 1996. pp. 437–474.
  10. Rabenhorst MC. Determination of Organic and Carbonate Carbon in Calcareous Soils Using Dry Combustion. Soil Sci Soc Am J. 1988;52: 965.
  11. Allison L, Moodie C. Carbonate. In: Black C, editor. Methods of Soil Analysis Part 2. American Society of Agronomy; 1965. pp. 1346–1366.
  12. Horvath B, Opara-Nadi O, Beese F. A Simple Method for Measuring the Carbonate Content of Soils. Soil Sci Soc Am J. 2005;69: 1066–1068.
  13. Sherrod LA, Dunn G, Peterson GA, Kolberg RL. Inorganic Carbon Analysis by Modified Pressure-Calcimeter Method. Soil Sci Soc Am J. 2002;66: 299.
  14. Chatterjee A, Lal R, Wielopolski L, Martin MZ, Ebinger MH. Evaluation of Different Soil Carbon Determination Methods. CRC Crit Rev Plant Sci. 2009;28: 164–178.
  15. Grinand C, Barthès BG, Brunet D, Kouakoua E, Arrouays D, Jolivet C, et al. Prediction of soil organic and inorganic carbon contents at a national scale (France) using mid-infrared reflectance spectroscopy (MIRS). Eur J Soil Sci. 2012;63: 141–151.
  16. Brown DJ, Shepherd KD, Walsh MG, Dewayne Mays M, Reinsch TG. Global soil characterization with VNIR diffuse reflectance spectroscopy. Geoderma. 2006;132: 273–290.
  17. Soriano-Disla JM, Janik LJ, Viscarra Rossel RA, MacDonald LM, McLaughlin MJ. The Performance of Visible, Near-, and Mid-Infrared Reflectance Spectroscopy for Prediction of Soil Physical, Chemical, and Biological Properties. Appl Spectrosc Rev. 2014;49: 139–186.
  18. Viscarra Rossel RA, Walvoort DJJ, McBratney AB, Janik LJ, Skjemstad JO. Visible, near infrared, mid infrared or combined diffuse reflectance spectroscopy for simultaneous assessment of various soil properties. Geoderma. 2006;131: 59–75.
  19. Reeves JB. Near- versus mid-infrared diffuse reflectance spectroscopy for soil analysis emphasizing carbon and laboratory versus on-site analysis: Where are we and what needs to be done? Geoderma. Elsevier B.V.; 2010;158: 3–14.
  20. Nguyen T, Janik L, Raupach M. Diffuse reflectance infrared Fourier transform (DRIFT) spectroscopy in soil studies. Aust J Soil Res. 1991;29: 49.
  21. Gillet P, McMillan P, Schott J, Badro J, Grzechnik A. Thermodynamic properties and isotopic fractionation of calcite from vibrational spectroscopy of 18O-substituted calcite. Geochim Cosmochim Acta. 1996;60: 3471–3485.
  22. Reeves JB, Smith DB. The potential of mid- and near-infrared diffuse reflectance spectroscopy for determining major- and trace-element concentrations in soils from a geochemical survey of North America. Appl Geochemistry. Elsevier Ltd; 2009;24: 1472–1481.
  23. Bellon-Maurel V, Fernandez-Ahumada E, Palagos B, Roger JM, McBratney A. Critical review of chemometric indicators commonly used for assessing the quality of the prediction of soil attributes by NIR spectroscopy. TrAC—Trends Anal Chem. 2010;29: 1073–1081.
  24. McCarty GW, Reeves JB, Reeves VB, Follett RF, Kimble JM. Mid-Infrared and Near-Infrared Diffuse Reflectance Spectroscopy for Soil Carbon Measurement. Soil Sci Soc Am J. 2002;66: 640.
  25. Staff SS (KSSL). Kellogg Soil Survey Laboratory Methods Manual. report 42. Burt R, editor. USDA; 2014.
  26. Stumpe B, Weihermüller L, Marschner B. Sample preparation and selection for qualitative and quantitative analyses of soil organic carbon with mid-infrared reflectance spectroscopy. Eur J Soil Sci. 2011;62: 849–862.
  27. van der Marel H, Beutelspacher H. Atlas of Infrared Spectroscopy of Clay Minerals and their Admixtures. New York: Elsevier Scientific Publishing Company; 1976.
  28. White W. The carbonate minerals. In: Farmer V, editor. The Infra-Red Spectra of Minerals. London: Mineralogical Society; 1974. p. 227-.
  29. Williams PC. Implementation of Near-infrared Technology. In: Williams P, Norris KH, editors. Near-infrared technology in the agricultural and food industries. 2nd ed. St. Paul: American Association of Cereal Chemists, Inc; 2001. pp. 145–169.
  30. Ben-Dor E, Banin A. Near-infrared reflectance analysis of carbonate concentration in soils. Appl Spectrosc. 1990;44: 1064–1069.
  31. Bornemann L, Welp G, Amelung W. Particulate Organic Matter at the Field Scale: Rapid Acquisition Using Mid-Infrared Spectroscopy. Soil Sci Soc Am J. 2010;74: 1147.
  32. Calderón FJ, Reeves JB, Collins HP, Paul EA. Chemical Differences in Soil Organic Matter Fractions Determined by Diffuse-Reflectance Mid-Infrared Spectroscopy. Soil Sci Soc Am J. 2011;75: 568.
  33. Haaland DM, Thomas EV. Partial least-squares methods for spectral analyses. 1. Relation to other quantitative calibration methods and the extraction of qualitative information. Anal Chem. 1988;60: 1193–1202.
  34. Janik L, Skjemstad J, Raven M. Characterization and analysis of soils using mid-infrared partial least-squares. 1. Correlations with XRF-determined major-element composition. Aust J Soil Res. 1995;33: 621.
  35. Stetson SJ, Osborne SL. Further Modification of Pressure-Calcimeter Method for Soil Inorganic Carbon Analysis. Commun Soil Sci Plant Anal. 2015;46: 2162–2167.
  36. Viscarra Rossel R, Webster R. Predicting soil properties from the Australian soil visible-near infrared spectroscopic database. Eur J Soil Sci. 2012;63: 848–860.