Desert soil clay content estimation using reflectance spectroscopy preprocessed by fractional derivative

Effective pretreatment of spectral reflectance is vital to model accuracy in soil parameter estimation. However, the classic integer derivative has some disadvantages, including spectral information loss and the introduction of high-frequency noise. In this paper, the fractional order derivative algorithm was applied to the pretreatment and partial least squares regression (PLSR) was used to assess the clay content of desert soils. Overall, 103 soil samples were collected from the Ebinur Lake basin in the Xinjiang Uighur Autonomous Region of China, and used as data sets for calibration and validation. Following laboratory measurements of spectral reflectance and clay content, the raw spectral reflectance and absorbance data were treated using the fractional derivative order from the 0.0 to the 2.0 order (order interval: 0.2). The ratio of performance to deviation (RPD), determinant coefficients of calibration (Rc2), root mean square errors of calibration (RMSEC), determinant coefficients of prediction (Rp2), and root mean square errors of prediction (RMSEP) were applied to assess the performance of predicting models. The results showed that models built on the fractional derivative order performed better than when using the classic integer derivative. Comparison of the predictive effects of 22 models for estimating clay content, calibrated by PLSR, showed that those models based on the fractional derivative 1.8 order of spectral reflectance (Rc2 = 0.907, RMSEC = 0.425%, Rp2 = 0.916, RMSEP = 0.364%, and RPD = 2.484 ≥ 2.000) and absorbance (Rc2 = 0.888, RMSEC = 0.446%, Rp2 = 0.918, RMSEP = 0.383% and RPD = 2.511 ≥ 2.000) were most effective. Furthermore, they performed well in quantitative estimations of the clay content of soils in the study area.


Introduction
Direct measurements of various physical and chemical properties of soil are more accurate than estimations via remote sensing methods; however, they often require intensive field investigations that can be restricted by limited funds and labor [1]. Remote sensing is considered a promising alternative approach to conventional methods for estimating soil properties a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 because of its high efficiency, low cost, and its large-scale, non-destructive, rapid data acquisition [1,2]. In particular, its characteristics of high spectral resolution, convenience and controlled condition are well suited to laboratory analysis of soil spectral reflectance. Traditionally, the measurement of clay content in soil is complicated and it requires more chemical reagents and caution, especially for salt-affected soils [3]. Therefore, based on the different spectral responses in the VIS-NIR (visible and near-infrared) bands to soil particle size, spectral analysis technology could be used as an alternative to ensure the accurate estimation of the clay content in soil.
Many studies have been conducted on the spectral response features and quantitative prediction of clay content [4][5][6][7]. For example, Ben-Dor and Banin [8] considered that clay content was correlated strongly with the clay minerals in soil, and that the principal characteristic bands were related to the lattice hydroxyl groups of layered silicates. Stenberg et al. [9] reviewed the application of VIS-NIR spectroscopy in soil science, and their results showed that the characteristic bands cover the absorption spectra of the clay content (1400 nm), hydroxyl groups (1900 nm) and clay minerals (2200 nm). Using VIS-NIR spectroscopy and pretreatment by Savitzky-Golay (SG) smoothing, first derivative with SG smoothing, and other mathematical methods, the prediction performances of models based on multivariate adaptive regression splines were improved [10]. In order to obtain better accuracy in estimations of clay and soil organic matter (SOM) contents, Nawar et al. [11] applied the first-and second-derivative and another seven algorithms to pretreat the reflectance data.
The pretreatment of spectral reflectance is efficient in terms of improving the accuracy of spectral estimation models. In previous research, spectral reflectance has been transformed often by some commonly used functions, e.g., absorbance and the corresponding integer derivative algorithms [10,12,13]. To some degree, the spectral derivative can eliminate the background influence of the environment and highlight certain spectral features [14]. However, because the quantity of information is considerable, the pretreatment of spectral reflectance by a general integer order derivative might influence the detection of crucial information and, to some extent, cause loss of spectral information [15]. Fractional calculus is a theoretical branch of mathematics that generalizes the classic integer derivative into an arbitrary (non-integer) order, which has broadened the concept of the classic integer derivative [14,16]. Because of its improved accuracy and higher efficiency, it has been used widely in system control and diagnosis, digital filtering, signal and image processing, and other related fields [15,[17][18][19]. Of particular relevance, the fractional derivative has been applied to the pretreatment of the spectral data of saline soil [20], which has demonstrated its validity in detecting spectral information from reflectance data of soil from arid regions.
Compared with free iron, clay content is a more reliable indicator of the age and weathering degree of soil at the various stages of development [21]. Soil salinization and desertification are the most common but serious environmental problems in the Ebinur Lake basin of Northwest China [22]. Therefore, the calibration of a rapid and accurate model for the quantitative estimation of local soil clay content is crucial. Given this context and motivated by previous research, the objective of this study was to use laboratory-derived spectral reflectance data pretreated by the fractional derivative, in combination with known soil clay content to establish an acceptably accurate and stable model for soil parameter estimation.
The study area has an arid desert climate with mean annual precipitation, potential evapotranspiration, and temperature of 102 mm, 2492 mm, and 7.2˚C, respectively [23,24]. The Alataw Pass is a famous entrance for the northwest wind in the Ebinur Lake region. On average, winds with speeds >8 m s -1 occur on 164 days per year, reaching a maximum of 185 days per year [25,26]. The main geomorphic types are stone desert, gravel desert, salt desert, and swamp. The soil types are mainly Mollic Solonchaks, Gypsic Regosols, and Stagnic Solonetz [22,27]. Soil erosion by wind is a common phenomenon within this region because of the extreme weather and particular texture of the soil.
In order to obtain representative soil samples, 103 sampling sites (30 × 30 m) were established, with consideration of the typical landforms, landscape types, and soil textures of the study area. Within each site, soil samples were collected at five evenly distributed points and then mixed thoroughly to obtain a representative sample. Overall, 103 soil samples were collected at depths of 0-10 cm from the study area during May 18-29, 2015 (Fig 1).

Laboratory analysis
All 103 soil samples were air-dried, crushed, and then passed through a 2.0 mm sieve and the resulting fine earth (<2.0 mm) was retained for further analysis. The potassium dichromate method was applied for the measurement of SOM content [28]. The concentrations of K + and Na + were determined using the flame photometry method, and those of Ca 2+ and Mg 2+ were  determined using the EDTA complexometric titration method [28]. The soil electrical conductivity (EC) was determined using a WTW inoLab1 Multi 3420 Set B multiparameter measuring instrument (Wissenschaftlich-Technische Werkstätten GmbH, Germany) with extracts of soil and distilled water in a ratio of 1:5. Soil clay content was determined using a particle analyzer and imaging system (Bluewave S3500, Microtrac Inc., Largo, FL, USA) at room temperature.

VIS-NIR spectroscopy and spectral processing
Spectral measurement. For controlled irradiance conditions, the measurements of spectral reflectance for all soil samples were conducted in a dark laboratory. The reflectance spectra were measured using an ASD FieldSpec1 3 portable spectrometer (Analytical Spectral Devices Inc., Boulder, CO, USA) with a spectral range of 350-2500 nm. The sampling intervals of this spectrometer are 1.4 nm (350-1000 nm) and 2.0 nm (1000-2500 nm), while the resampling interval is 1.0 nm [5,20]. Circular containers with a diameter of 12.0 cm and a depth of 1.8 cm were used to store the soil samples (1.5 cm is considered optically infinitely thick for soil). To avoid contamination during the measurements, these containers had been painted black in advance [29]. Notably, each sample had the same flat measuring surface [30]. Scanning was performed using a fiber optic sensor with an 8˚zenith angle, which was placed 10.0 cm above the samples [20]. For lighting, a halogen lamp (50 W) was placed 50.0 cm from each sample at a zenith angle of 30˚ [1,11]. For each measurement, 20 spectral curves were gathered from the central area of the sample, and the final reflectance was yielded by averaging these 20 representative spectra. To ensure accuracy, the spectrometer was calibrated using a Spectra-lon1 panel with 100% reflectance prior to each measurement.
Spectral preprocessing. The measured reflectance data were translated from binary to ASCII and exported using ViewSpecPro™ software version 6.0. Marginal wavebands with low signal-to-noise ratios (350-400 and 2401-2500 nm) were omitted in order to eliminate the noise at the edges of each spectrum [31]. Smoothing was conducted with the SG algorithm using a window size of 5 and polynomial order of 2 using OriginPro1 version 9.0.0 [32]. The processed spectra constituted the final data for later analysis (S1 File). The processed spectral reflectance data of all the soil samples are illustrated in Fig 2. Generally, absorbance spectra are employed in spectral analysis because unlike inversion (1/R), root mean square ( ffiffiffi ffi R p ), logarithm (lg R), and other forms, they have has practical spectral meaning [33]. For better modeling results and improved nonlinear relations, the previously pretreated spectral reflectance was transformed into absorbance.

Fractional derivative
Fractional calculus is a theoretical branch of mathematics that generalizes the classic integer derivative into an arbitrary (non-integer) order. Detailed descriptions of this algorithm have been given by Schmitt [14] and by Zhang et al. [20]. In general, a fractional derivative has multiple forms, e.g., Grümwald-Letnikov (G-L), Riemann-Liouville, and Capotu [34,35]. In order to reduce computation complexity, the G-L definition was applied to the relevant calculations [36]. The specific formula for G-L on the section is as follows: where α and h are considered the order and step length, respectively, and the Gamma function is as defined in Eq (2): The actual spectral resolution of the instrument in this research was 1 nm; thus, setting h = 1 means Eq (1) can be written as: Notably, when α = 1 or 2, Eq (3) is identical to the common first-and second-derivative equations. The 0.0 order stands for data that are not processed by the algorithm [20,37,38]. Thus, according to Eq (3), the 0.0 to the 2.0 order fractional derivatives of spectral reflectance and its absorbance (order interval: 0.2) were calculated under the Java programming integrated development platform Eclipse.

Estimation model and prediction accuracy
Selection of calibration and validation set. For choosing the calibration and validation data set, the Concentration Gradient, Kennard-Stone (K-S), Sample Set Partitioning Based on Joint x-y Distances (SPXy), and other algorithms have been used widely [11][12][13]. The K-S algorithm is based on spectral distances, i.e., the spectral distance between two samples is calculated as in Eq (4). In spectral analysis, x p (i) and x q (i) are the responses at the ith wavelength for samples p and q: The SPXy algorithm is a modification of the K-S algorithm that can accommodate multidimensional variable space and two intersample distances [39][40][41]. In this algorithm, the sample distances are determined based on the independent variable (sp) and dependent variable (p) space for the parameter under consideration, and n is the number of samples. As above, y means the actual clay content in this research. Therefore, the distance d p (p,q) can be computed as: By assigning the same weight to the distributions of the samples in the sp and p spaces, the distances d sp (p,q) and d p (p,q) are both divided by the maximum values in the data set. Thus, the normalized d(p,q) can be calculated as follows: In this research, the calibration and validation data sets were selected by the SPXy algorithm, and they comprised 52 and 51 samples, respectively.
Modeling method and accuracy test. Partial least squares regression (PLSR) has been proven a robust and reliable approach in spectral quantitative research, primarily because of its advantages regarding dimension reduction and the synthesis and solving of collinearity problems among independent variables [20,42]. Here, to take full advantage of spectral reflectance data, all wavelengths in the 401-2400nm range were applied in building up the models using PLSR.
The performance of clay content prediction models is often assessed by five performance indices: the ratio of performance to deviation (RPD), determinant coefficients of calibration (R 2 c ), root mean square errors of calibration (RMSEC), determinant coefficients of prediction (R 2 p ), and root mean square errors of prediction (RMSEP): where M i is the measured value and P i is the predicted value, " M is the mean of the measured values; " P is the mean of the predicted values, SD is the standard deviation of the measured values, and n is the number of samples.
Optimal models are represented by high values of R 2 c , R 2 p , and RPD but low values of RMSEC and RMSEP. Generally, the RPD can be divided into three grades: Class A (RPD ! 2.000) has good predictive performance; Class B (1.400 < RPD < 2.000) indicates a possibility of distinguishing between high and low levels of clay content poorly, and Class C (RPD 1.400) has no predictive ability [13,43].
The entire calculation of this step was conducted using MATLAB 1 software version R2012a (MathWorks, Inc., Natick, MA, USA).

Statistical analysis of soil data
The descriptive statistical characteristics of the 103 soil samples are presented in Table 1. The clay content of all samples was low with mean and maximum values of 1.288% and 4.543%, respectively. Furthermore, the standard deviation was 0.961% and the coefficient of variation was 74.557%, indicating intermediate variability. The SOM content had a wider range, varying from 0.684 to 78.387 g kg -1 with a mean value of 21.429 g kg -1 . There were significant correlations between the clay and SOM contents (r = 0.307), as well as clay content and EC (r = 0.314) at the 0.05 significance level.

Soil spectrum
To investigate the relationship between clay content and spectral reflectance, soil samples with different clay contents were selected for curve plotting (Fig 3). The three spectral curves had similar shapes, variation tendencies and characteristic peaks. Thus, it was not complicated to distinguish the spectrum with the lowest, average, and highest clay contents in the range of 401-2400 nm, despite the spectral curves of the three soil samples having some overlapping sections (e.g., 550-600 and 1870-2000 nm). Because of moisture from different sources, there were three significant absorption features located around 1400, 1900, and 2200 nm [44,45]. It was obvious that all three spectra had the highest reflectance near 2150 nm. The curves could be distinguished approximately from 400 to 600 nm and from 1900 to 2000 nm. The diagram shows that clay contents of 0.000% and 4.543% corresponded to the lowest and highest reflectance, respectively, with the average spectrum approximately mid-way in between. The relationship intuitively reflected the correlation of clay content and corresponding spectral reflectance, which laid the foundation for this research.

Performance of PLSR models for quantitative estimation of clay content
Model calibration using all wavelengths can exploit all the spectral information of spectral reflectance. Furthermore, derivative pretreatment can effectively eliminate the impact of background noise on the target spectrum and enhance the spectral characteristics of the analyte [10,46]. In order to benefit from PLSR, all raw spectral reflectance and corresponding absorbance data, pretreated by the fractional derivative, were applied in the process of model calibration. Using an order interval set to 0.2, PLSR was used to build 22 inversion models. In this research, the performances of the estimating models were affected significantly by the various derivative orders (Tables 2 and 3).
For spectral reflectance, in the range from the 0.0 to the 1.0 order, the trend of model preference was not obvious: the highest values of R 2 c , R 2 p , and RPD were only 0.459, 0.551, and 1.196, respectively, for the 1.0 order, while the RMSEC and RMSEP achieved their optimal status (0.848% and 0.770%, respectively) at the 0.8 order. The five parameters did not reach maximum or minimum values for the same order within the specified range. However, the indices did show a slight improvement with the increase from the 1.0 to the 1.6 order. When the order reached 1.8, the performance of the model showed significant improvement with the highest values of R 2 c (0.907), R 2 p (0.916), and RPD (2.484 ! 2.000), while the RMSEC (0.425%) and RMSEP (0.364%) were the lowest of all the 11 models. This proved to be a critical point. As the order increased to 2.0, despite higher values of RPD (!2.000), the performance of each subsequent model was lower than the previous one (Figs 4 and 5). The absorbance models built using PLSR had similar variation trends to the spectral reflectance models. The optimal accuracy parameters did not appear at the same order. Considering the case of the range from the 0.0 to the 1.2 order, the pair of R 2 c and RMSEC, and the group of R 2 p , RMSEP, and RPD reached their optimum status at the 0.8 order and 1.2 order, respectively. For orders >1.6, the stabilities and accuracies of these models were perfected. However, despite the highest value of RPD (2.511), the model based on the 1.8 order did not possess the optimal values of R 2 c and RMSEC, which were 0.903 and 0.379% at the 1.6 order, respectively. For absorbance, RPD exceeded 2.000 for two models (Figs 6 and 7). After repeated siftings to determine good predictive performance and stability, the model based on the 1.8 order was selected as the optimum inversion model of absorbance.
For clay content, the results using the validation data set with the 1.8 order were the best among the 22 models with the values of R 2 p = 0.916, RMSEP = 0.364% and RPD = 2.484 and .543% clay content, 24.172 g kg -1 SOM, 68.547 g kg -1 Na + , and 6.044 g kg -1 Ca 2+ ; spectral curve (c) denotes the soil sample with 0.000% clay content, 25.340 g kg -1 SOM, 3.088 g kg -1 Na + , and 2.808 g kg -1 Ca 2+ .
https://doi.org/10.1371/journal.pone.0184836.g003 R 2 p = 0.918, RMSEP = 0.383% and RPD = 2.511, for spectral reflectance and the absorbance model, respectively (Figs 5 and 7). The calibration accuracies of these two models were slightly lower than with validation data set, but they remained within a reasonable range with values of R 2 c and RMSEC of 0.888-0.907, and 0.425%-0.446%, respectively. The slopes for the spectral reflectance and absorbance models with the 1.8 order using the validation data set were well distributed along the 1:1 line, indicative of good validations, while values over or under the 1:1 line indicated inaccurate estimation of the clay content in the soils of the Ebinur Lake basin (Figs 4 and 6). The results verified that the model based on spectral reflectance, pretreated using the fractional derivative, could be used to predict the clay content of soils.

Discussion
Effective pretreatment of spectral data could enhance the features of spectral reflectance, and minimize the irrelevant and useless information of the spectra [20,47]. Therefore, the performance of models for soil parameter estimation could be improved to some extent. The classic integer derivatives have exact physical meanings and the first and second derivatives represent the slope and curvature of the spectral curves, respectively. Normally, the order interval is 1.0,  introduction of high-frequency noise [14]. Compared with the integer derivative, the fractional derivative has a narrower order interval, which could reveal greater information of spectral reflectance, because the order is extended to non-integers, which could add detailed curves among the integer derivative spectral curves. Although the explicit spectral meaning of the fractional derivative has not been clarified yet, the non-local and genetic characteristics of the fractional derivate have been recognized widely. It is suggested that the fractional derivative between the 0.0 and the 2.0 order could be identified as the sensitivity to the slope and curvature of spectral curves [20]. Currently, the discrete algorithm of the fractional derivative only applies to spectral reflectance data obtained from ASD portable spectrometers with equal intervals. Raw spectral reflectance and the first and second derivatives are three approaches commonly used in model calibration. In terms of spectral reflectance, the model using non-pretreated data (0.0 order), performed the poorest among the 11 models, with the lowest values of R 2 c , R 2 p , and RPD and the highest values of RMSEC and RMSEP (0.417, 0.254%, 1.033, 0.927%, and 0.869, respectively). For spectral reflectance pretreated by the first derivative, the performance of the corresponding model improved slightly; however, it remained inadequate for the estimation of clay content (RPD = 1.196 < 1.400). When the order was set as 2.0, the model had good prediction ability, with values of R 2 c = 0.905, RMSEC = 0.445%, R 2 p = 0.880, RMSEP = 0.388%, and RPD = 2.103 ! 2.000. When the order was extended to include non-integers, eight additional models were built based on the fractional order. Considering the five accuracy indices for these models, they did not increase or decrease directly, but rather they varied irregularly. The model based on the 1.6 order had limited predictive ability with a value of RPD = 1.458 ! 1.400. It is noted that the prediction ability of the model based on the 1.8 order improved with optimal values of accuracy indices (i.e., R 2 c = 0.907, RMSEC = 0.425%, R 2 p = 0.916, RMSEP = 0.364%, and RPD = 2.484 ! 2.000), which exceeded the 2.0 order model. Although the 2.0 order model has good predictive performance (RPD = 2.103 ! 2.000), the precision parameters of the model based on the 1.8 order had improved further to some extent. Instead of adding complexity, it was vital to obtain further modeling results and to enhance the quantitative predicting ability of the models.
Among the 11 models, 10 had better performance than the 0.0 order model and 5 performed better than the 1.0 order model. Nevertheless, only one model built on the fractional order (the 1.8 order model) was superior to the 2.0 order model. Furthermore, the variation of absorbance models showed similar trends.
In this research, the SPXy algorithm was applied to select the calibration and validation data sets. This approach is based on the distance between the independent variable and dependent variable space for the parameter under consideration [39]. Commonly, previous research has used the Concentration Gradient and K-S algorithms that consider the concentration or corresponding spectroscopy of the samples. However, the SPXy algorithm combines both these aspects and it can accommodate multidimensional variable space, e.g., the clay content and reflectance data in our study. Consequently, it was considered reasonable that these inversion models might have various calibration and validation data sets.
In previous research, clay content has been estimated quantitatively using ultraviolet-visible, VIS-NIR, and mid-infrared reflectance spectroscopy [4,6,[48][49][50]. For spectral reflectance, multiple pretreating methods have been used, e.g., SG smoothing and the first derivate and second derivatives. Based on these approaches, many predicting models have been established. For example, Rossel et al. [1] applied the VIS spectral range (400-700 nm) to predict soil texture and soil organic carbon contents. Bilgili et al. [10] discovered that clay was strongly correlated with SOM, and they developed an optimized model for estimating local clay content that had good performance (R 2 = 0.83, RMSE = 4.03 g kg -1 ). Using PLSR with first derivative reflectance data, Nawar et al. [11] achieved values of R 2 p , RMSEP, and RPD of 0.65, 8.79%, and 1.67, respectively, when predicting the clay content in the soil of El-Tina Plain in Egypt.
The coefficients of all wavebands and the constant term of two optimal models established in this study are illustrated in Fig 8. Results obtained in the current study were in accord with previous research, and they indicated that the relatively larger absolute values of the coefficients were located within the range 670-850 nm [51]. The use of the fractional derivative in this study allowed greater exploration of the spectral information than previous approaches; it reduced information loss, and revealed the details of the variation trends of the 5 accuracy indices based on the spectral reflectance and absorbance models of 11 orders.
In reality, only limited quantitative information can be acquired using remote sensing techniques [48]. Generally, soil spectral features are affected by variations in the SOM, EC, iron oxide, and soil texture and moisture content. The SOM content in the Ebinur Lake basin is low (near 2%). With SOM content of 2% as a boundary, that is, when SOM content exceeded 2%, the SOM played a principal role in masking out the spectral features, while the SOM content was less than 2%, it became less effective [46,52,53]. In the study, the soil clay content was divided into five groups: 0%-1% (n = 46), 1%-2% (n = 40), 2%-3% (n = 8), 3%-4% (n = 7), and >4% (n = 2). Hence, it was obvious that the clay texture was not dominant within the study area, which meant that corresponding characteristic bands were difficult to detect. In addition, the correlation between the clay content and EC was significant (r = 0.314). In the arid ecology, salt concentrations in soils is generally high. Soluble salts in soil could bind fine particles and further form hard salt crust, which could fix the clay of soil [54,55]. It might influence the accuracy with clay content estimation to some degree. Furthermore, there was certain difficulty in the calibration of the retrieval model using the spectral reflectance data. The introduction of the fractional derivative algorithm generates a narrower order interval, which can reduce the loss of spectral information to some extent, extract additional spectral information, and determine the optimal prediction model. In this study, the model based on the fractional derivative 1.8 order was established as optimal.

Conclusions
In this research, the fractional derivative algorithm was used for the pretreatment of spectral reflectance. Based on this, 22 spectral models for the estimation of clay content in the desert soils of the Ebinur Lake basin were calibrated using PLSR, and the accuracy indices of the various models were compared. It was found that the values of R 2 c , R 2 p , RMSEC, RMSEP and RPD of the models did not increase or decrease. They were irregular and they reached optimal Predicting soil clay content by reflectance spectroscopy values at a fractional order. The two best models were selected: one. calibrated based on the 1.8 order of spectral reflectance (R 2 c = 0.907, RMSEC = 0.425%, R 2 p = 0.916, RMSEP = 0.364%, and RPD = 2.484 ! 2.000), and the other based on the 1.8 order of absorbance (R 2 c = 0.888, RMSEC = 0.446%, R 2 p = 0.918, RMSEP = 0.383%, and RPD = 2.511 ! 2.000). The Ebinur Lake basin is representative of an area with severe salinization. For a model designed to predict the different clay contents of soils, the salt contents might have a certain impact on model accuracy. Therefore, the next step in future research is to distinguish the features of salt, salt ions, SOM and soil texture from spectral reflectance curves to improve estimation accuracy.
Supporting information S1 File. The smoothed VIS-NIR spectra of 103 samples. Bands range from 401 to 2400 nm. (XLSX)