Spatial Prediction and Optimized Sampling Design for Sodium Concentration in Groundwater

Sodium is an integral part of water, and its excessive amount in drinking water causes high blood pressure and hypertension. In the present paper, spatial distribution of sodium concentration in drinking water is modeled and optimized sampling designs for selecting sampling locations is calculated for three divisions in Punjab, Pakistan. Universal kriging and Bayesian universal kriging are used to predict the sodium concentrations. Spatial simulated annealing is used to generate optimized sampling designs. Different estimation methods (i.e., maximum likelihood, restricted maximum likelihood, ordinary least squares, and weighted least squares) are used to estimate the parameters of the variogram model (i.e, exponential, Gaussian, spherical and cubic). It is concluded that Bayesian universal kriging fits better than universal kriging. It is also observed that the universal kriging predictor provides minimum mean universal kriging variance for both adding and deleting locations during sampling design.


Introduction
Sodium is a mineral which is always present in drinking water through natural occurrences. The human body needs sodium to maintain the blood pressure and to control fluid levels. If it exceeds a threshold value (i.e., 200 mg/l), then it may change the taste of water. Moreover, it also creates severe medical problems for those who have high blood pressure. In most of the countries the level of sodium in water is less than 20 mg/l, however, in some countries it exceeds 250 mg/l. About 1.1 billion people in the whole world are unable to access safe drinking water, and it causes a lot of deaths. Gadgil [1] provides some guidelines and highlights the presence of required parameters in drinking water. Ferreccio et al. [2] evaluates the concentration of arsenic in drinking water which was about 860 mg/l during 1958 to 1970 in northern cities of Chile, however, later on it has been reduced to 40mg/l. They cross-examine two types of patients, smokers and non-smokers, and conclude that consumption of arsenic through water is the major cause of lung cancer. Gundogdu and Guney [3] use various kriging techniques to study the water levels by using spherical, tetraspherical, pentaspherical, exponential, Gaussian, rational quadratic, hole effect, K-bessel, J-bessel and stable variogram models. They conclude that universal kriging is better than ordinary kriging for interpolating water levels. Neuman et al. [4] described and implemented Bayesian model averaging, and maximum likelihood version of Bayesian model averaging which does not require any prior knowledge about parameters. It updates posterior probabilities as well as model parameters on the basis of new data, and is consistent with modern statistical methods of hydrologic model calibration. Mehrjardi et al. [5] show that co-kriging and kriging methods are better than inverse distance weighting techniques for predicting the spatial distribution of some characteristics of groundwater quality. Nas [6] concludes that ordinary kriging provides accurate patterns of groundwater quality parameters in Konya, Turkey. Sarukkalige [7] also uses kriging techniques for the analysis of the quality of groundwater in Western Australia. Andrade and Stigter [8] model the spatiotemporal variation of arsenic concentration in groundwater by using geostatistical and multivariate methods. They show that the concentration of arsenic has strong correlation with rice culture and that indicator kriging provides appropriate maps of arsenic concentrations.
Dhar and Datta [9] developed methodology based on inverse distance weighting method to reduce redundancy in monitoring network. Siri et al. [10] establish a sampling scheme for generating samples simultaneously. They prove that GPS units and a pseudo-sampling frame are more effective than old sampling methods. Brus and Heuvelink [11] validate that universal kriging performs better than ordinary kriging in terms of smaller mean universal kriging variance (MUKV) for spatial sampling design. Optimal spatial sampling design is a core issue in environmental studies when exploring low cost and greater efficiency samples. The sampling scheme can be optimized through prior knowledge and sampling constraints. Van Groenigen and Stein [12] conclude that Spatial Simulated Annealing (SSA) is better than classical methods for optimizing sampling designs. SSA is useful for those studies that have several sampling constraints. Zhu and Stein [13] study spatial sampling design for the estimation of covariance parameters by the maximum likelihood method.
In present paper, spatial behavior of sodium in drinking water is determined and optimized sampling patterns by both, the optimal deletion and subsequent addition of locations is generated for three divisions of Punjab, Pakistan. Several variogram models are used for modeling the spatial dependence existing in the data. Maximum likelihood (ML), restricted maximum likelihood (REML), ordinary least square (OLS), and weighted least square (WLS) are used to estimate the parameters of the variogram. Universal kriging and Bayesian kriging with varying trend are used for the prediction of sodium concentration at unobserved locations. Moreover, spatial simulated annealing optimized sampling patterns are generated.

Study Area
Three divisions of Punjab (Bahawalpur, Dera Ghazi Khan and Multan) are investigated for study purpose. The Bahawalpur division is a second-order administrative division that includes the districts Bahawalnagar, Bahawalpur and Rahimyar Khan. It is located at an elevation of 92 meters above sea level. Its latitude is 28°30 0 0" North and longitude is 71°30 0 0" East in Degrees, Minutes and Seconds (DMS) or 28.5 and 71.5 in decimal degrees. The Dera Ghazi Khan division has four districts (Dera Ghazi Khan, Layyah, Muzaffargarh and Rajanpur). Geographical coordinates of Dera Ghazi Khan are 30°3 0 22" North, 70°38 0 4" East. Multan division has also four districts (Khanewal, Lodhran, Multan and Vehari). Multan district is spread over an area of 3721 square km. Its latitude is 30°12 0 54" North and longitude is 71°35 0 27" East.
The Pakistan Council of Research in Water Resources (PCRWR) collected and analyzed two samples from each union council (sub-sub-sub district). The water samples were collected from easily and frequently available sources such as, hand pump, tab-water, tube-well, and water supply because it depends on inhabitants of the region. According to the PCRWR survey The samples of water were collected in 500 ml polystyrene bottles from the two selected sites of each union council according to a standardized method. The details about selection of samples laboratory analysis is provided in [14]. Various properties of the selected samples were analyzed according to the 2540C APHA (1992) standard.

Spatial Interpolation Methods
Kriging is extensively used in spatial analysis and its main objective is to predict the value at an unobserved location by calculating the weighted average of samples at observed locations, see Matheron [15].
Universal kriging. Ordinary kriging assumes that the mean is constant in the study region, however, in universal kriging the mean is a function of the coordinates (trend). The trend can be modeled through linear or polynomial functions. Let Y = (Y 1 ,. . .. . .,Y n ) T be a vector of response variables measured at observed locations x 1 , x 2 , . . .,x n , and let its distribution be as follows: where μ = (μ(x 1 ), μ(x 2 ), . . .,μ(x n )) T , and Here β k , k = 1, 2, . . ., p, are unknown regression coefficients and the f k (x) are known functions of the spatial coordinates x to < 1 ; p is the number of functions that are used to model the trend component. The covariance matrix S Y is defined as where R α is a correlation matrix generated from one of the well known positive definite correlation function models, i.e. Gaussian, exponential or spheric. S Y depends on a vector-valued parameter θ = (σ 2 = total sill, α = range, τ 2 = relative nugget). Equivalently one may define also the matrix of semivariogram values where 1 is the nxn-matrix of ones. The covariance or semivariogram parameters θ are estimated using estimation methods like ML, restricted-ML or empirical semivariogram estimation and weighted-least-squares-fitting. The universal kriging predictor at an unsampled location x 0 is a linear predictorŶ ðx 0 Þ ¼ P n i¼1 l i Y i . Minimizing the mean squared error of prediction subject to the condition of unbiasedness results in the following system of simultaneous equations for the weighting vector λ = (λ 1 , λ 2 , . . .λ n ) T : where l k , k = 1, 2, . . ., p are the Lagrange multipliers for the conditions of unbiasedness, γ ij are the semivariogram values between locations x i and x j , i, j = 0, 1, 2,. . ., n and f i k ¼ f k ðx i Þ, i = 0, 1, 2,. . ., n, k = 1, 2, . . ., p. The weights vector λ and Lagrange multiplier l are calculated by solving the above system of equations and are utilized also for calculating the universal kriging variance: Bayesian Kriging. Bayesian kriging makes use of the Bayes theorem which involves the likelihood function l (θ;Y) and the prior distribution π (θ) of the respective parameters θ = (total sill = σ 2 ,range = α,relative nugget = τ 2 ,trend = β), see Omre [16]. The posterior distribution of the parameter vector θ can be expressed as: All model parameters are considered uncertain, and the prior distribution of parameters is given as follows: pðb; s 2 ; t 2 ; aÞ ¼ pðbj; s 2 ; t 2 ; aÞpðs 2 jt 2 ; aÞpðt 2 ; aÞ; whereas noninformative prior for β is used i.e π(β|, σ 2 , τ 2 , α) / 1. Now, using the above prior distribution of parameters, the posterior distribution is obtained as ½b; s 2 ; t 2 ; ajY ¼ ½bjs 2 ; t 2 ; a; Y½s 2 jt 2 ; a; Y½t 2 ; ajY; ð8Þ where [β|σ 2 , τ 2 , α, Y] is Gaussian with mean the generalized least squares estimate of β and covariance matrix the corresponding generalized least squares covariance matrix. The posterior density for τ 2 and α is given as where F is the design matrix of regression functions, R τ 2 ,α is the correlation matrix between the data and withb the generalized least squares estimate of β. For the case that a noninformative prior π(σ 2 |τ 2 , α) / 1/σ 2 is used, the posterior of σ 2 is given as scaled inverse chi-square distribution: which is equivalent to To predict values at unsampled locations the predictive distribution is used. Proposed by Diggle and Lophaven [17] and described in Diggle and Ribeiro [18] the predictive distribution for the unobserved signal process Y 0 = Y(x 0 ) at location x 0 is specified as: Diggle and Ribeiro [18] use a discrete prior for the covariance parameters. This way the posterior becomes also discrete and parameters can be simulated easily from the posterior by means of multinomial sampling. Sampling from the predictive distribution is performed by first sampling a parameter vector from the discrete posterior and then with the parameter vector fixed sampling from [Y 0 |σ 2 , α, τ 2 , Y], which is a Gaussian distribution with mean, the universal kriging predictor, and variance, the universal kriging variance. If no discrete prior π(τ 2 , α) is used but a continuous one then one has to use MCMC methods to sample from the posterior π(τ 2 , α|Y).

Spatial Sampling Design
Spatial sampling is a process in which a number of samples are used to evaluate the content of a larger geographical region. Every point in a sample exhibits information about the variable of interest at an unsampled spatial location. The sampling process has many advantages over the complete enumeration; for example, low cost, greater speed and higher scope, see Cochran [19]. The major objective of spatial sampling is to get the desired results with high precision at low cost. This can be achieved by allocating samples to locations based on minimizing an objective function, i.e., the mean universal kriging variance, see Wang et al. [20]. Spatial Simulated Annealing. Here, SSA is used for the optimization of the sampling design, see Van Groenigen and Stein [12]. In SSA locations are candidate measurements that are either removed or added iteratively and are optimized by minimizing the Mean Universal Kriging Variance (MUKV). The MUKV is an average of all kriging variances over a fine square grid of locations in the study region i.e.
where s 2 i is the variance of universal kriging. If the trend is constant, the algorithm uses the ordinary kriging variance. As the trend varies, it uses the universal kriging variance. The SSAalgorithm used here is just standard simulated annealing algorithm as described i.e. in Kirkpatrick [21] but adapted to the spatial context by following Brus and Heuvelink [11]. In SSA two different design tasks can be accomplished by: i.) Adding n new sample locations to an existing monitoring-network of m locations. ii.) Selecting n < m samples from an existing network of m locations.
SSA can be described as; i): The algorithm is initialized by randomly selecting n design-locations from the spatial region and by calculating the MUKV based on these random locations and maybe other available m given locations. Let's denote this MUKV as MUKV 0 . As the next step these n previously selected locations are randomly perturbed. Again the MUKV is calculated for n new locations i.e MUKV 1 . If MUKV 1 is smaller than MUKV 0 then it can be assumed that there is an improvement with last design and is stored for memorizing. If Δ MUKV = MUKV 1 −MUKV 0 > 0 then the previous design is accepted only with a certain probability i.e.
T is called temperature and at the beginning of the algorithm can be large. The larger T the higher is the probability that a worsening design will be accepted. The algorithm now starts to iterate: Every current design is randomly perturbed as described above; its MUKV is calculated and compared to the MUKV of the so far best design. Improvements are always accepted and stored and worsenings are accepted with the above probability, where MUKV 0 takes over the role of the MUKV of the so far best design. Actually, the probable acceptance of worsenings prevents the algorithm from being trapped in local minima of the MUKV. Further accuracy can be achieved by accounting uncertainty of variogram parameters and kriging predictors.

Cross Validation
Cross validation statistics are used to compare the performance of different fitted models. In the present study leave-one-out cross-validation is used for selecting appropriate variogram models, parameter estimation methods and kriging predictors. The Root Mean Squared Prediction Error(RMSPE) is used as performance measure: In the above statisticŶ ðx i Þ are the cross-validated, predicted values from a specific model and Y(x i ) are the observed values of the data. Finally, the method with minimum RMSPE is used for predicting the response variable at unobserved locations.

Exploratory Spatial Data Analysis
Exploratory spatial data analysis is performed on the sodium concentration data by using the geoR package of Ribeiro and Diggle [22] and R software Team [23]. This analysis is carried out to explore the spatial auto-correlation and the assumption of normality which is necessary for most of the kriging methods. It is observed that the sodium concentration data violate the assumption of normality. The Box-Cox transformation with parameter λ = −0.028 is used to normalize the data, see Fig 2. The transformed data are used for modeling the spatial distribution of sodium concentrations and for selecting optimal sampling designs.

Results for Universal Kriging
From exploratory analysis it is observed that there exists spatial dependence between sodium concentrations and coordinates, see left panel of Fig 1. Universal kriging with linear trend can take into account this spatial dependence. Since modeling the variogram is essential for prediction by kriging, various variogram models are fitted through different estimation methods (results of RMSPE given in Table 1). Table 1 shows that REML provides minimum RMSPE for covariance (see column 2-5 of Table 1). REML and universal kriging provide minimum RMSPE as REML has the advantage of removing the trend during variogram estimation. From the RMSPE presented in Table 1 it can be concluded that universal kriging with a spherical

Results for Bayesian Kriging
Bayesian kriging with linear trend is used with spherical and exponential variogram models.
The prior distributions are specified as follows: for the mean parameter β a uniform prior is used, i.e p(β) / 1; for the range parameter α a uniform prior is used, too, i.e p(α) / 1; the prior for σ 2 (total sill) is scaled inverse chi-square with 367 degrees of freedom; and for the relative nugget parameter τ 2 a uniform prior is used, too. The exponential and spherical covariance models are investigated. Leave-one-out cross-validation is used to estimate the RMSPE for each model. The RMSPE presented in Table 3 shows that the spherical model fits well (minimum RMSPE) for Bayesian kriging with linear trend. For predicting the sodium concentration at unsampled locations the posterior predictive distribution is used to draw contour maps.

Comparison of Spatial Interpolation Methods
Root mean squared errors of prediction of the spatial interpolation methods are given in Table 4. The RMSPE of Bayesian universal kriging is less than the RMSPE of universal kriging. It can be conclude that Bayesian universal kriging performs better than universal kriging. This is due to the use of additional information, data information as well as prior information, and more statistical modeling flexibility of Bayesian universal kriging.

Optimized Sampling Design
One of the objectives of the present paper is to obtain an optimum allocation of sampling locations to minimize cost and prediction error. Since the kriging prediction variance depends on both the sample size, the variance and covariance of the sample. Optimum allocation can be obtained only by considering all of these factors. Different variogram models are fitted as it is a pre-requisite for prediction by kriging. The variogram model providing minimum RMSPE is considered as appropriate for prediction and sampling design. For selecting the optimal sampling design optimal deletion and subsequent addition of locations are performed. The diameter was fixed by k-means i.e k = 6 and temperature was fixed by following Brus and Heuvelink [11]. Table 5 shows the MUKV using the observed 370 data locations for both, ordinary and universal kriging, and a spherical variogram model which comes out to be the appropriate model.

Deleting Locations using Spatial Simulated Annealing
To minimize sampling cost one possible option is to reduce the number of sampling locations. Spatial simulated annealing (SSA) is used to optimize the effect of deleting sampling locations upon MUKV. The MUKV is calculated using ordinary and universal kriging. The MUKV increases in both kriging techniques as the number of data locations goes down, see Table 6 and Fig 5. Subsequently deleting 20 and 40 locations from the 370 locations. When deleting 50 locations from the 370 locations the MUKV using ordinary kriging is 24560 and the MUKV for universal kriging is 24480, see Tables 5 and 6. According to the MUKV criterion it can be inferred that universal kriging theoretically seems to perform better than ordinary kriging. The sampling patterns for deleting locations using ordinary kriging and universal kriging are

Adding Locations using Spatial Simulated Annealing
Like before the objective here is to find the sampling pattern by adding locations that has minimum MUKV. If the number of locations to be added are increased in the SSA-algorithm then the MUKV decreases for both, ordinary and universal kriging, see Fig 8 and Table 7. However, the mean universal kriging variance is smaller than the ordinary kriging one. The sampling

Conclusion
The present paper is focused on two objectives: (i) to model the spatial distribution of sodium concentration, and (ii) to generate optimized spatial sampling designs for three divisions of Punjab, Pakistan. Universal kriging and Bayesian kriging with varying linear trend are used for modelling the spatial distribution of sodium concentrations in drinking water. To take account of the spatial dependence in the response variable Gaussian, exponential, spherical and cubic Spatial Prediction and Optimized Sampling Design variogram models are used. It is concluded that the spherical variogram model provides smaller mean squared error of prediction than other variogram models. Since Bayesian universal kriging has the advantage of utilizing prior information about model parameters and is statistically more flexible, it performs better than universal kriging. A comparison of the used kriging methods can also be based on credible intervals. The 95% credible intervals are estimated from the simulated values of sodium concentration. Plots of credible intervals as presented in Fig 11 show that Bayesian universal kriging has shorter intervals than universal kriging. In Bayesian universal kriging most of the actual data values lie within the 95% predictive intervals. Thus, Bayesian universal kriging gives more reliable predictions than universal kriging. This is also due to the fact that Bayesian universal kriging does take into account the uncertainty of the covariance model. Universal kriging does not; once the covariance model is estimated it is fixed and its uncertainty is discarded during prediction. Prediction maps of sodium concentrations are generated based on Bayesian universal and universal kriging; locations having sodium concentration above the threshold value of 200mg are identified. Prediction maps show that sodium concentrations in drinking water are increasing to the southern side of the study area. Spatial simulated annealing is used to generate optimized spatial designs for adding and deleting locations. For optimization the MUKV is used as objective function subject to ordinary kriging or universal kriging. When deleting locations the MUKV increases; however, the increase is not much because redundant locations are deleted. If locations are added the MUKV decreases; the designs themselves have a space-filling character, when adding locations by means of SSA. Spatial simulated annealing optimize the patterns, which are generated by deleting and adding locations. Time and cost may be saved by deleting the redundant locations, and the variation can be minimized by adding locations that was unfilled. Recently, Junez-Ferreira and Herrera [24] used Kalman filter to sequentially optimize space-time monitoring points. Their suggested method can minimize the prediction error in better way as compared with simulated annealing algorithm. For further improvement in optimizing sampling design in present paper the method suggested in [24] can be used.