Improving Estimations of Spatial Distribution of Soil Respiration Using the Bayesian Maximum Entropy Algorithm and Soil Temperature as Auxiliary Data

Soil respiration inherently shows strong spatial variability. It is difficult to obtain an accurate characterization of soil respiration with an insufficient number of monitoring points. However, it is expensive and cumbersome to deploy many sensors. To solve this problem, we proposed employing the Bayesian Maximum Entropy (BME) algorithm, using soil temperature as auxiliary information, to study the spatial distribution of soil respiration. The BME algorithm used the soft data (auxiliary information) effectively to improve the estimation accuracy of the spatiotemporal distribution of soil respiration. Based on the functional relationship between soil temperature and soil respiration, the BME algorithm satisfactorily integrated soil temperature data into said spatial distribution. As a means of comparison, we also applied the Ordinary Kriging (OK) and Co-Kriging (Co-OK) methods. The results indicated that the root mean squared errors (RMSEs) and absolute values of bias for both Day 1 and Day 2 were the lowest for the BME method, thus demonstrating its higher estimation accuracy. Further, we compared the performance of the BME algorithm coupled with auxiliary information, namely soil temperature data, and the OK method without auxiliary information in the same study area for 9, 21, and 37 sampled points. The results showed that the RMSEs for the BME algorithm (0.972 and 1.193) were less than those for the OK method (1.146 and 1.539) when the number of sampled points was 9 and 37, respectively. This indicates that the former method using auxiliary information could reduce the required number of sampling points for studying spatial distribution of soil respiration. Thus, the BME algorithm, coupled with soil temperature data, can not only improve the accuracy of soil respiration spatial interpolation but can also reduce the number of sampling points.

Introduction compared the results of the BME algorithm to those of kriging and proved that the former could improve the regional interpolation effect. Akita et al. [25] developed a moving-window BME method to improve the estimation accuracy of regional air pollutant distribution.
Studies in the field of soil respiration have indicated that soil surface temperature significantly affects soil respiration, representing an exponential relationship [26][27][28]. Soil temperature can be more easily obtained by advanced technologies (such as wireless sensor networks) than soil respiration. Thus, in this paper, we take advantage of the ability of the BME algorithm to use soft data and employ soil temperature as the soft data. In doing so, we confirm the following three hypotheses: (1) The BME method is more accurate at estimating the spatial distribution of soil CO 2 efflux than OK and Co-OK methods, (2) data on soil temperature, used as auxiliary information, provide improved estimates of the spatial distribution of soil CO 2 efflux on small scale, (3) and this auxiliary information can help reduce the number of sampling points while studying the spatial distribution of soil CO 2 efflux.

Study area
The experimental site was located in the city of Lin'an in the northwest of Jincheng County, Zhejiang Province, China (119°43'15.24"-119°43'26.97"E and 30°15'21.60"-30°15'33.27"N). The entire study area is open grassland bounded by a lake in the east; sparse woods in the south, west, and northwest; and open grassland in the northeast. The average altitude is 50 m, and the highest altitude is 170 m. The area has an average annual frost-free period of 237 days and receives an average annual rainfall of 1613.9 mm over a total of about 158 days. The average annual temperature is 16.4°C, with 1847.3 h of annual sunshine. Roughly speaking, this area is warm and humid, featuring a subtropical monsoon climate, and it has sufficient illumination, abundant rainfall, and four distinct seasons.
The area belongs to Zhejiang A & F University, and we can do research freely. Other researchers also can easily get permission to do research in the area. Warning signs had been set during the testing process to insure there was no any danger. The research didn't cause irreversible damage to the soil and there isn't any protected species in the study area.

Data sources
The experimental area covers an area of 35 m × 35 m, which was divided into grids of 5 × 5m. Seven Lr100GE-6400 [29], developed by GreenOrbs Laboratory, were used to measure the CO 2 efflux within each grid. The Lr100GE-6400 is an open-box CO 2 efflux measuring instrument. One day before measuring the CO 2 efflux, a PVC soil collar was pressed into the soil to a depth of about 5 cm at the center of the surface of each soil core. In order to ensure simultaneous readings, we took 1 minute to warm up the 7 instruments, and 3 minutes to conduct the measurements, with all measurements being completed within the 90 minutes between 13:00 to 14:30 on September 30 (Day 1) and October 7 (Day 2), 2014. We chose the stable data as the experimental data from all the observed data. On Day 1, we conducted measurements from east to west (horizontal direction), while on Day 2, we did so from south to north (vertical direction). The summary statistics of the CO 2 efflux data have been provided in Table 1. Higher density temperature data (35 × 35) were measured manually by 30 corrected soil TP-101 thermometers, logged when they achieved equilibrium at 1 to 2 min inserted into the soil at a depth of about 5 cm-10cm. All rounds of sampling were completed in 90 min. The summary statistics of the soil temperature data appear in Table 2.

Data preprocessing
The application of soft data (soil temperature) is important for the BME algorithm, to integrate uncertain information into the estimation. Suitable and high quality soft data can improve the performance of the algorithm. Soft data can be integrated into expert knowledge, experimental conclusions, and so on, with the common probability-type of soft data [30,31]. Probability soft data can be approximately normally distributed or Student t-distributed to express the measurement error or physical interpretation [32,33]. Interval soft data denote physical meanings with upper and lower bounds. After reviewing other similar studies, we assumed that soil respiration and soil temperature share a functional relationship of the Arrhenius type [26,27], as shown in Eq 1. We used the measured data to calculate the fitting parameters in Eq 1. The soft data are given by the fitting results at regular intervals of the Student's t-distribution, and the formula used to calculate the prediction interval is given by Eq 2 [34]: Here,R S is the estimated soil CO 2 efflux relative to the soil temperature T i , with parameters a and b. R SÁinterval is the prediction interval corresponding to each estimated soil CO 2 efflux R sÁi : t nÀ2;0:025 is the critical value of the Student's t-distribution with (n-2) degrees of freedom and a confidence level of 95%. S TÁP is the standard deviation (SD) of the soil CO 2 efflux estimation error. T is the average soil temperature. S TT is the sum of the square of the deviations. The numerical values of parameters in Eqs 1 and 2 are shown in Table 3. Fig 1 shows the relationship between soil CO 2 efflux and soil temperature, the probability distribution, and prediction interval of the soft data.

Comparison between methods
This paper compared the results of the BME method and two Kriging methods (OK and Co-OK). The spatial estimates from the three methods were evaluated by several validation methods.
Kriging. Kriging is an unbiased linear estimation method used to characterize a physical attribute's spatial variation and generate attribute estimates at un-sampled locations. OK, the simplest and most widely used kind of Kriging, calculates the weights (relative contributions) of attribute samples surrounding each estimation point by means of the geostatistical variogram, and the unknown attribute values are then estimated as the linear combination of the weighted samples, subject to the condition that the sum of the weights is equal to 1, see Eq 3: Here, Z Ã (V 0 ) represents the value of the estimated point V 0 , V i represents the value of the ith point among n points around V 0 , and λ i denote the weight coefficients.
Co-OK follows the same principle as OK. However, the former considers more than one variable, and in addition to considering the spatial relationship of the main variable itself, it considers the relationships between the main variable and all other variable types to enable better predictions. Adding more information about relevant variables while estimating the main variable can improve the estimated effects. As we consider the spatial variability of soil C flux over time, adding the closely related variable of soil temperature can compensate for the insufficiency of CO 2 efflux sampling and improve the accuracy of the estimation, as shown in Eq 4:  Here, Z Ã 1 is the estimate of the main variable Z 1 at point V 0 . λ 1i is the weight of the main variable Z 1 , and λ 2j is the weight of the auxiliary variable Z 2 (the second characteristic).
The Kriging method studies the spatial relationships from point to point, which are usually used to express spatial variability with an experimental variogram. Variograms generally include self-variograms and cross-variograms, and they are used to determine the spatial autocorrelations of the variable's properties, as shown as Eq 5 [35]: whereΥðhÞ is the experimental semivariance at a separation distance h, Z(X i ) is the property value of the variable at the i-th point, and N(h) is the number of pairs of points separated by the distance h.
In this paper, we added soil temperature properties as the auxiliary information. Cross-variograms can show the relationship between two variables, as seen in Eq 6 [36]: g ZY ðhÞ is the experience cross-variogram at separation distance h, Z (X i ) is the main property value at the i-th point, Y(X i ) is the secondary property value at the i-th point, and N(h) is the number of pairs of points separated by distance h. Based on the coefficient of determination (R 2 ) and squared residuals, we chose the Gaussian and Spherical models as the optimal variogram model, as depicted by Eq 7 and Eq 8: where γ (h) is the semivariance, C 0 indicates the nugget, C represents the structural variability, C 0 + C represents the sill variance, and a denotes the correlation length range in geostatistics. Bayesian Maximum Entropy. BME is a spatiotemporal analysis and mapping method that combines information theory with Bayesian statistics [30]. Compared with classical geostatistics (kriging), BME can consider general-prior and site-specific knowledge using a certain error and uncertainty in soft (uncertain) data, in addition to hard (exact) data, to improve the accuracy of spatiotemporal analysis. The meaning of soft data is very flexible; it may denote sampled data, historical data, rough measurement data, expert knowledge, and/or model fitting data. The manner in which the BME method uses comprehensive information and employs the probability method to express uncertainty to the extent possible allows it to present more realistic results in the spatiotemporal analysis of nature attributes.
The BME process can be divided into three stages: prior, meta-prior, and posterior (Fig 2). The prior stage mainly uses general knowledge G, to calculate the prior joint probability density function f G (χ map ) using the Shannon information measure. χ map is composed using hard data χ hard , soft data χ soft , and unknown values at estimation point χ k (the point to be estimated). The process involved in obtaining the expected information is shown as Eq 9, where g α (χ map ) contains the general statistical information about χ map , such as the mean and covariance. The Shannon information measure is used to maximize the entropy under the relevant constraints of g α (χ map ). Eq 10 shows the function of the Lagrange multipliers method (LMM) for maximizing the expected information by introducing the Lagrange multiplier μ α and the expectation of g α (χ map ).
At the meta-prior stage, specific knowledge will be added into the calculation, including the measured hard data χ hard and various forms of soft data χ soft . The soft data set f S (χ soft ) denoted probability in this paper. At the posterior stage, using Bayes' theorem, the posterior probability density function f K (x k |χ data ) of the estimation points is calculated, taking into account the conditions of the specific knowledge, shown as Eqs 11 and 12: After computing the posterior probability density function f K (x k |χ data ), we can obtain the attribute values at the estimation points by way of the maximum posterior probability or maximum expectation, shown as Eqs 13 and 14: where x Ã k denotes the estimated values of x k . We choose Eq 9 as the final estimation method in this paper.
Methods of validation. In order to evaluate the performance of the three methods (OK, Co-OK, and BME), about 10% of the collected data (which were selected to avoid continuous sampling points or points located on the outside, namely, a total of five points) were employed as data for cross-validation. The three statistical indicators of the root mean squared error (RMSE), correlation coefficient (CR), and average deviation (mean bias) were used to quantify the accuracy of the estimation results, as shown in Eqs 15, 16 and 17: Bias ¼ We applied GS+ 9.0 spatial analysis software to calculate the semivariance and for the OK and Co-OK methods [37]. The BMElib library (BMEGUI3.0 software) was used for the BME method [31], and Matlab 8.4 was employed for basic processing and mapping.

Results
Comparison of results of the OK, Co-OK, and BME methods the mean values of soil respiration in the study area was 3.476 and 5.81 μmg/m 2 s with the varition range of 2.698 and 3.019 μmg/m 2 s respectively, and the values of the coefficient of variation were observed 17.332 and 8.805 respectively, which proved that it was significant to consider the spatial variability of soil respiration in the study area. As shown in the Table 4, the parameters of the variogram models for the auto-variograms of soil respiration and the cross- variograms of the soil respiration and temperature in the study area during the abservation periods. There was some different in the range values of soil respiration in the observing days but the similar range of the cross-varigram of soil respiration and temperature, shown in Fig 3, and the soil respiration presented strong spatial dependence with the C 0 /(C 0 + C) ratio<0. 25. The spatial distribution of soil respiration in the study area was estimated using the three methods previously described. Fig 4 shows results of the mapping method for the samples collected over Day 1 and Day 2. In general, the three methods reflect the variations in soil respiration and its range in the study area, but certain difference in the local. The range of the spatial estimates, however, shows some differences among the three methods (Table 5). According to the BME method, the range on Day 1 and Day 2 was 3.543 and 5.038, respectively, which exceeded the ranges obtained using the Co-OK method (2.220 and 3.130, respectively) and the OK method (2.170 and 3.040, respectively). In fact, the results from the BME method showed an even greater range for both days than the corresponding values for the measured data (2.698 and 3.014, respectively). The estimations obtained using the Co-OK and OK methods fall in the same range as the measured values. Note that the values estimated using the Co-OK method are slightly larger than those obtained using the OK method. At the same time, the standard deviation of estimated value using the three methods also follow that the range of BME method is greater than Co-OK method than the OK method. In probability terms, it's possible for the range of estimated value of soil respiration beyond the sampling data using only sampling data to estimate the value of soil respiration in the entire study area, so the BME method integrating more information and presenting the spatial variation of soil respiration in a probabilistic manner is more in line with real-world changes. Table 6 presents the validation results for the three methods. CR for Day 1 and Day 2 using the BME method (0.793 and 0.697, respectively) was significantly higher than the corresponding values obtained using the Co-OK and OK methods. The correlation between the estimation results from the BME method and the actual measurements is higher. Moreover, the RMSEs and absolute values of bias for both Day 1 and Day 2 were the lowest for the BME method, indicating that it can provide estimates of soil respiration in the study area with higher precision than the traditional Co-OK and OK methods.  Effect of soil temperature as auxiliary information on the spatial estimation of soil respiration In this study, we used only soil respiration samples for the OK method, while soil temperature data provided additional inputs to the BME and Co-OK methods. Table 6 shows that the RMSEs (using soil temperature data) obtained using the BME and Co-OK methods are 0.727 and 0.911, respectively, on Day 1, and 0.409 and 0.790, respectively, on Day 2, indicating that the inclusion of soil temperature as auxiliary information in the BME and Co-OK methods improves their overall performance compared to the OK method (its corresponding RMSE values being 0.979 and 2.042, respectively). The verification of the existing correlation indicates that the BME and Co-OK methods significantly outperform the OK method; thus, soil temperature data can effectively improve the reliability of the spatial pattern of soil respiration. However, the results for the deviation of validation (bias; Table 6) indicate only a small difference in results between the Co-OK and OK methods (-0.309 and -0.296, respectively, on Day 1), while the value for the BME method is -0.035, which is significantly better than those obtained using the other two methods. Thus, the auxiliary effect of soil temperature on the estimates would vary depending on the manner of its utilization. It is notable that the estimations obtained using the BME and Co-OK methods with soil temperature data are different. The results of the index regression model for both days of the measured data show that soil temperature can explain 60.6% and 57.1% of soil respiration respectively, as shown in Table 3. After estimating the spatial variation of soil respiration in the study area using the BME and Co-OK methods, the generated soil respiration estimates are exponentially related to the soil temperature values, see Fig 5. According to the results, in the BME method, soil temperature can explain 57.6% and 47.8% of soil respiration on Day 1 and Day 2, respectively. For the Co-OK method, the corresponding values are 47.8% and 37%, using the same soil temperature as auxiliary information. Thus, the utilization degree and degree of influence of the auxiliary information differ between the BME and Co-OK methods. Thus, Table 5 clearly indicates that, statistically, the BME method outperforms the Co-OK method in terms of estimating the spatial variation in soil respiration when soil temperature serves as auxiliary data.

Effect of soil temperature as auxiliary information on soil respiration sampling points
According to the analysis presented above, soil temperature can serve as auxiliary information to improve the estimation accuracy of the spatial distribution of soil respiration. In the field, soil temperature data can be obtained by setting up a wide network of wireless sensor networks, Spatial Distribution of Soil Respiration Using BME but these data are cumbersome to measure over multiple points over a large area and over a long period. However, the collection of high-density soil temperature data to help estimate soil respiration has certain practical significance. Accordingly, we reduced the number of sampling points for soil respiration in our study, and as noted in previous sections, using soil temperature data, we then applied the BME method to estimate the spatial distribution of soil respiration in the study area. The results were compared with estimates from the OK method, which could not use the temperature data. In order to ensure an even distribution of measuring points throughout the study area, we chose the measuring points across four quadrants (Fig 6), including points on the horizontal and vertical lines (axes) dividing the quadrants. The number of measuring points was increased from 1 to 3, from 3 to 6, and from 6 to 9 in each quadrant and by 1 on each axis, ultimately resulting in the 9, 21, 37, and 49 measuring points scheme adopted by this study. Fig 7 shows the spatial distribution of soil respiration estimated by the BME and OK methods when the number of measuring points was 9, 21, 37, and 49. The BME results continue to show more detailed information even when the number of sampling points are reduced. Conversely, the results from the OK method, which cannot account for temperature data, depend strongly on the number of measuring points, and thus, it is difficult to obtain detailed information with a reduced number of measuring points. Table 7 shows the validation results in terms of the RMSE and CR when the number of sampling points is 9, 21, and 37. For Day 1, the RMSEs for the BME method are 0.972, 0.838, and 0.673, respectively, while those for the OK method (without using soil temperature as auxiliary information) are 2.759, 1.246, and 1.146, respectively. This proves that the BME method using soil temperature as auxiliary information provides more accurate results, which are not significantly affected by the reduced number of soil respiration samples (that are difficult to obtain). The values of CR using the BME method with soil temperature as auxiliary information are 0.778, 0.906, and 0.951 when the number of sampling points is 9, 21, and 37, respectively. Again, the results of the BME method are preferable to those of the OK method (the corresponding values being 0.289, 0.544, and 0.738, respectively). Moreover, the differences in CR using the BME method for the three above-mentioned sampling strategies are smaller (Fig 8). We see similar patterns for Day 2 also. Furthermore, when the number of sampling points is 9, the RMSEs using the BME method are 0.972 and 1.193 for Day 1 and Day 2, which continue to be superior to the RMSEs obtained using the OK method when the number of sampling points is 37 (1.146 and 1.539, respectively). This means that the accuracy of the BME method, coupled with soil temperature as auxiliary information, does not suffer when the number of soil samples is reduced, and the effect of reducing the number of sampling points on the estimation results of the study area as a whole is small.

Accuracy of soil respiration spatial interpolation based on Bayesian Maximum Entropy is high
Christakos proposed the BME method in the early 1990s demonstrated both its theoretical and practical significance. Christakos also demonstrated that Kriging was a particular situation of BME under the limited assumption of transcendental information and effective data [32][33][34][35][36]38]. By making the best use of auxiliary information, the BME method could effectively deal with real-world physical situations characterized by relatively high uncertainty. The BME method has been applied to the area of edaphology [39][40][41] and the results have demonstrated less error and higher accuracy compared with traditional interpolation. Spatial estimation values of higher accuracy and more detailed local differences have been achieved in its application to environmental science [42][43][44][45][46][47]. The BME method fittingly integrates into the soft data's logarithm spatial analysis and improves the classification of results in data mining [48][49][50]. The BME method has also obtained better effects than traditional methods in applications to the  ecological field [24,51,52].Traditional geostatistics mainly relies on sampled data measurements to obtain estimates for the whole study area without bias. The BME method, however, blends more soft data and expert knowledge with the sampled data, making full use of this information to enhance the estimation accuracy of the study object. For example, the spatial distribution of soil respiration can be studied using auxiliary information from the existing literature; Teixeira et al. [6] used soil bulk density as auxiliary information to improve estimates of spatial distribution of soil C density in sugarcane fields. This article considers the exponential relationship between soil temperature and C flux, by applying the BME approach to soil respiration and using soil temperature data as auxiliary information. BME thus effectively improves the accuracy of soil respiration spatial interpolation.
Bayesian maximum entropy method coupled with soil temperature data (as soft data) can reduce the number of sampling points Currently, the most widely used instruments in soil respiration monitoring are Li-8100 and Li-6400, which are so expensive that it is difficult to buy multiple instruments for simultaneous monitoring. The static air box method, although inexpensive, requires more manpower, and it is difficult to achieve simultaneous monitoring. Soil respiration typically exhibits strong spatial heterogeneity [19,[53][54][55][56]. Therefore, an insufficient number of monitoring points cannot characterize soil respiration for an entire area. Soil temperature, however, is relatively easy to measure, and these measurements can compensate for the lack of adequate monitoring points. Thus, devising a method that requires fewer monitoring points to achieve more comprehensive measurements of soil respiration is of great significance. The biggest advantage of the BME method is that it can integrate soft data (prior knowledge, expertise, etc.) into hard data (measured data). Many scholars [32,[57][58][59] have concluded that, compared to conventional kriging, the BME method offers the advantages of higher accuracy and less error. Regarding the use of auxiliary information, the Co-OK method only uses weights to simply increase the influence of auxiliary information. The BME method, in contrast, is based on information entropy theory and using the LMM, it integrates the auxiliary information into the estimation of the natural attributes of interest. Then, using the Bayesian approach, it calculates the final estimate, thus making the whole process more systematic. .During the integration of the auxiliary information, the BME method combines the measured data (the hard data X hard = {X 1 ,X 2 Á Á ÁX h }) and the auxiliary information (soft data X soft = {X h+1 ,X h +2 Á Á ÁX h+s }) into available information sets X data = {X hard , X soft }, namely, the elements are expanded from h to q = h + s. Generally, the available soft data are very large; in this study too, the extent of soil temperature data significantly exceeded the measured soil respiration data. Notably, the expandation in the amount of information increases the uncertainty of information. Using its base of information entropy and the ability of the Bayesian approach to deal with information uncertainty, the BME method effectively combines the auxiliary information and measured data in the same space estimation [25,52,60]. Christakos also inferred the sum of partial derivatives of product μ q g q (X k , X hard , X soft ) between Lagrange parameter μ q and statistical matrix equation g q (X k , X hard , X soft ) with respect to the estimation point X k ; in other words, the BME method entails the simultaneous inclusion of massive amounts of soft data and hard data into the calculation [38]. Similarly, in this study, high-density soil temperature data were integrated into the space estimates alongside soil respiration values. Meanwhile, in the absence of adequate amounts of hard (measured) data region, the integration of soft data compensates for this absence of hard data to some extent. Christakos et al. [32] also noted that the BME method uses less measured data to obtain better estimates than the simple kriging method, which uses the space of all measured data. In agreement with this result, this study showed that the BME method, when coupled with soil temperature data and measured data for only nine measurement points, had better effects than the OK method for 37 measurement points as the verification results showed in Table 6. Thus, the BME method, coupled with soil temperature as auxiliary information, can reduce the number of monitoring points in soil respiration studies by a considerable extent.

Conclusions
This paper coupled the BME method with auxiliary information of soil temperature, to study the spatial distribution of soil respiration. The results indicated that the BME method is superior to the Co-OK and OK methods. Moreover, such application can reduce the number of soil respiration sampling points, thus indicating its significance in the area of soil respiration monitoring, the equipment for which is typically expensive enough to restrict researchers to sparse single-point monitoring.
This study also has some limitations. We chose soil temperature as auxiliary information. However, the spatial heterogeneity of soil respiration is affected by many other factors, such as soil C storage, soil bulk density, and root and microbial biomass. These factors can be taken into consideration in future studies of the performance of the BME method in interpolating soil respiration distributions while using data on multiple factors as auxiliary information.