A Comparison of the Spatial Linear Model to Nearest Neighbor (k-NN) Methods for Forestry Applications

Forest surveys provide critical information for many diverse interests. Data are often collected from samples, and from these samples, maps of resources and estimates of aerial totals or averages are required. In this paper, two approaches for mapping and estimating totals; the spatial linear model (SLM) and k-NN (k-Nearest Neighbor) are compared, theoretically, through simulations, and as applied to real forestry data. While both methods have desirable properties, a review shows that the SLM has prediction optimality properties, and can be quite robust. Simulations of artificial populations and resamplings of real forestry data show that the SLM has smaller empirical root-mean-squared prediction errors (RMSPE) for a wide variety of data types, with generally less bias and better interval coverage than k-NN. These patterns held for both point predictions and for population totals or averages, with the SLM reducing RMSPE from 9% to 67% over some popular k-NN methods, with SLM also more robust to spatially imbalanced sampling. Estimating prediction standard errors remains a problem for k-NN predictors, despite recent attempts using model-based methods. Our conclusions are that the SLM should generally be used rather than k-NN if the goal is accurate mapping or estimation of population totals or averages.


Introduction
Forest surveys provide critical information for many interests: quantifying carbon sequestration, making sound management decisions, designing processing plants, guiding decisions among conflicting land uses, and quantifying wildlife habitats, just to name a few. To meet national and international negotiations and reporting requirements, forest management plans require local inventory data on vegetation, site productivity, biomass, carbon and other resources. The data must be intensive enough to include structural variables relevant to biomass and carbon projections and extensive enough to cover hundreds to thousands of acres, but can not be too expensive to collect. Thus, data are often collected from samples. From these samples, maps of resources and estimates of aerial totals or averages are required. One approach to mapping and estimating totals of biomass and productivity data is a spatial linear model (SLM), which includes ordinary kriging and universal kriging. This approach was initially developed for a similar goal: to predict geographic values or totals for mining resources. However, another approach, k-NN (k-Nearest Neighbor), has been recently developed and has gained widespread use. The overall goal of this paper is to compare SLM to k-NN theoretically, through simulations, and as applied to real forestry data.
The k-NN method finds observed samples that are ''close'' to an unobserved location based on covariates, and then either imputes the ''closest'' one directly as a prediction (k~1), or forms a weighted average as a prediction (kw1). Widespread availability of remotely-sensed data as covariates allows extending ground information to large areas using k-NN. One of the reasons k-NN is popular is that, when k~1, predictions are within the bounds of biological reality because they were observed in the samples [1][2][3]. Also, the logical relationships among response variables will be maintained, so k-NN is a multivariate method that retains the variable relationships seen in the data, particularly when k~1 [1,[4][5][6]. When variables are predicted separately, the dependence structure among the response variables is generally lost [7]. The multivariate aspect of the k-NN may be necessary for inventory applications where information on multiple stand attributes is required for stand management decisions or further modeling [5,8]. Because k-NN methods reuse existing samples, they are distribution-free [2,9,10]. Non-parametric k-NN imputation methods may provide better matches to listings of tree species for complex stands with multiple species and a wide variety of tree sizes, which tend to have multi-modal distributions [8]. Nonparametric methods were found to effectively describe local conditions and variability [11,12]. One way to make k-NN local is to select a combination of neighbors from the neighborhood where the average of the covariates is closest to the target record covariates [12,13]. Localization can also be achieved by using spatial coordinates as covariates or by restricting the selection of neighbors to a circular area around the target unit [14]. The yaImpute R package [15] facilitated the comparison of different k-NN approaches and its wide use.
There are some recognized problems with k-NN. It tends to be highly biased at the edge of the data cloud because prediction sites will likely be paired with a more central sample value due to the asymmetric neighborhood [16,17]. Extremely small values and extremely high values will be over-and underestimated, respectively, if the sample data do not cover the whole range of variability [18,19]. Bias can also be a problem in the interior of the data cloud if the covariates are non-uniformly distributed [20]. In k-NN methods, the match is found using the covariates that are available for every site. As the number of samples increases, there will be a higher chance of getting an exact match in the covariates. However, this only guarantees an exact match for the response variables if there is perfect (or very strong) correlations with the covariates. As sample size increases, there is no guarantee that the mean of the response variables will approach the true mean and hence k-NN methods are not statistically consistent [3]. The k-NN methods lack a good measure of uncertainty, and often the global root-mean squared error from cross-validation is used for pointwise standard errors [20]. However, that global root-mean squared error may not be a good measure of accuracy when response variables have heteroscedastic variances around the covariates; it was recommended that graphical tools be used to evaluate issues of bias, homoscedasticity, influential observations, outliers, and extrapolations [21]. An approach that uses a model of the covariates space variogram to quantify prediction uncertainty was proposed, but is computationally demanding [22]. Relevant accuracy statistics for assessing the quality of predictions of categorical variables are still lacking [7]. However, there are recent proposals for a variance estimator that incorporates spatial correlation [23], model-based estimators of the uncertainty [24], and design-based approaches to derive the statistical properties of the k-NN predictions [17]. If kw1, then k-NN loses many of its purported benefits as some estimates may not be within the realm of real values [3]. If there are several response variables or ''rare'' polygons (stands), a good match will be very difficult to find [3].
Geostatistical methods were also devised for prediction, both for single sites or block averages. For a history, see [25]. Combining the notion of classical geostatistics with a linear model (e.g., regression) yields the spatial linear model (SLM) (called universal kriging in the geostatistical literature). In comparison to k-NN, SLM predictions were designed to minimize the root-meansquared error. If the data-generating process is multivariate normal, then the SLM predictor is equivalent to the conditional expectation, which is optimal [26](pgs. 108-110). Moreover, if the generating mechanism is not multivariate normal, the SLM predictions are still optimal among the class of linear predictors, called best linear unbiased prediction (BLUP) [26](pgs. 151-155). The BLUP was extended to finite populations of spatial data [27][28][29]. Although distribution-free, BLUP requires specification of a covariance model; however predictions are generally robust to mis-specification of the covariance model [30,31]. Classically, geostatistical methods estimated the covariance model by binning data into distance classes and using a least-squares fit [32]; however, restricted maximum likelihood (REML) estimation [33,34] removes the arbitrary nature of binning and fitting and is an unbiased estimating equation approach to the SLM [35,36]. Finally, SLM predictors are linear, creating weighted averages of data, and we can appeal to a spatially correlated version of the central limit theorem that predictors will be asymptotically normally distributed, e.g. [37], allowing for inference based on a standard normal distribution (e.g., for prediction intervals). More detailed and mathematically-oriented arguments on the stability of the geostatistical method can be found in [38] and [26](pgs. 289-299). There are many extensions of the SLM when predictions cannot be assumed approximately normally-distributed, e.g. [39].
The SLM is not without some recognized problems. Most of these are related to the problem of estimating the spatial covariance function. In a sense, the data are used twice; once to estimate the covariance parameters and secondly used for BLUP. This has been termed empirical best linear unbiased prediction (EBLUP) [40]. Forming multivariable models requires estimating many covariance parameters, and any gain in precision is often minimal [41], with gains often 10% or less [42]. However, see [43,44] for forestry applications. Some claims against the SLM are in error. Some forestry data are obtained from polygons that do not have a unique position in space and are irregular lattice data (aggregates of spatial data). Some authors, e.g. [6], indicate that the SLM is not appropriate for these data, but that is not correct. Polygon centroids can be used to compute distance, as shown with the real forestry data below. We note, however, that this may be a problem for very irregularly shaped polygons.
It turns out that predictors for both SLM and k-NN are linear, and the above review indicates that SLM should be optimal. However, much of that theory is based on an assumed spatial stochastic model, does not take into account the estimation of the covariance parameters, nor does it say how much better SLM might be. k-NN has fewer assumptions, and perhaps it is easier to implement and faster to compute, and a small loss in efficiency is compensated by ease of use and computational speed (see [45] and references therein). We build on a few previous comparisons [46,47], where they compare only for point predictions, they do not assess the validity of prediction standard errors, and they use variogram estimators on residuals, which is known to be biased for covariance parameters [48](pgs. 257-258). One problem in making the comparison is that k-NN is an algorithm that makes no assumption about how spatial data were created; it only assumes a fixed surface. The SLM, on the other hand, is based on the idea that data are a realization of a spatial stochastic model. By conditioning on realizations under a SLM, and hence adopting a fixed surface perspective, k-NN and SLM can be evaluated in a common framework. Foresters and forest managers are interested in the global performance of predictive or imputation methods across a management region, among management regions, and across time. Supporting that practice, the comparison of prediction or imputation methods using global computations of average bias and RMSPE (described in Section ''Performance Measures'') is deeply rooted in the forestry literature [3,10,14,21].
Our goal is to compare k-NN to the SLM using simulations of artificial populations and resampling real forestry data. In the Methods section, we make explicit the prediction goals and k-NN and SLM methods and models, and we describe the simulation methods and real forestry data. The outcomes of using k-NN and SLM on these data are presented in the Results section. We discuss the results and offer some conclusions in the Discussion section.

Methods
We will use the following notation for spatial data. Let the population of response values be partitioned into those that are observed y o~f y i ; i~1,:::,ng and those that are unobserved y u~f y i ; i~nz1,:::,nzmg, and y~(y . Let the index set for the observed data be O~1, . . . ,n and for the unobserved data be U~nz1, . . . ,nzm. We consider two main goals: 1) point prediction of y j for j[U, and 2) block prediction of the total or average T~P nzm i~1 b i y i~b 0 y, where b i are the weights that define the block objective; e.g., if fb i~1 ; i~1, . . . ,(nzm)g then T is a population total, and if fb i~1 =(nzm); i~1, . . . ,(nzm)g then T is a population average. Note that prediction goals for small areas can also be defined using zeros as weights in b, but we do not pursue that here. For all response values, there are covariates contained in a design matrix X which has nzm rows and p columns, where the first n rows, X o , correspond to y o and the next m rows, X u , correspond to y u . The spatial coordinates are contained in the matrix S which has nzm rows and q columns, where the first n rows correspond to y o and the next m rows correspond to y u . In S, q is generally two. For example, the first column could be longitude and the second column latitude, or some planar transformation of them. A coordinate such as height might also be included, but we do not consider it here. To meet our two objective above, we define the linear predictor, where j[U. A linear block predictor iŝ As will be seen, both k-NN and SLM are linear predictors. Note that [23] and [24] attempt model-based estimators for a version (2) that includes predicted values of y i ; i[O rather than simply summing the observed values. The variance of such predictors lack any notion of a finite population that shrinks the variance as the sampling fraction increases; i.e., we want a variance estimator such that var(T T)~0 if the whole population is observed, so we do not pursue their formulation any further. Both the SLM and k-NN use distance in various ways so a general definition is given here. Let A be a matrix with coordinates in the columns and the ith row denoted as a i . A general distance formula between the ith and jth rows of A is, where W is a weighting matrix.

Review of k-NN
Let x 0 i and x 0 j be the ith and jth rows of X, respectively. Then a ''variable-space distance'' between the ith and jth sites can be computed as d(i,j; X,W). Several types of distances are possible. For example, if W is the identity matrix, this is raw distance; if W is diagonal with the inverse of the empirical variance for each of the columns in X as the diagonal elements, then this is normalized distance; if W is the inverse of the empirical covariance matrix among the columns in X, then this is Mahalanobis distance; and if W~C L 2 C 0 , where C is the matrix of canonical vectors from canonical correlation analysis between X and y and L is the canonical correlation matrix, then this is the ''most similar neighbor'' (MSN) distance [1]. All of these and others have been implemented in the yaImpute package [15] in R [49]. The k-NN method chooses weights l j based on a distance matrix. Let D be a distance matrix with i,jth element d(i,j; X,W), which can be partitioned as Let d j be the jth column of D, j[U, contained in D o,u ; i.e., d j~f D½i,j; i~1,2, . . . ,n; j[Ug. If i is the index for min (d j ), then for a first-order nearest neighbor, l i,j~1 in (1) and all other l ',j~0 ; '=i. This essentially assigns the value of y i toŷ y j for the ith site that is closest to the jth site in variable-space. Let N k j be the index set of the k nearest sites (smallest values) in d j . Then l i,j~1 =k; i[N k j takes the average of fy i g from the k nearest neighbors in variable-space. Another option is to weight inversely Cross-validation is the method most often used to compute prediction standard errors [20]. Cross-validation makes predictions for sites that already have values, where each sample is removed one at a time, and the rest of the sample is used to predict the one that was removed. The idea is to use in-sample averaged squared errors between the predicted and observed values to serve as a global estimator of squared errors when out-of-sample. Let the k-NN prediction standard error be estimated as, whereŷ y i is the cross-validation prediction of y i for i~1, . . . ,n sample values. Assuming prediction errors are normally-distributed, 90% prediction intervals are formed asŷ y j +1:645ĵ j for j~nz1, . . . ,nzm out-of-sample values. Note that these are constant for all j; we examine a spatially explicit model-based approach [23] in Section ''A geostatistical approach for estimating the variance of k-NN predictors.'' For the standard error of estimating a total, we borrow from the idea of classical sampling theory, e.g. [50], whereĵ j replaces the standard error,

Review of SLM
Assume only the linear model where X is a matrix of fixed covariates, b is a vector of parameters, and E is a random vector with var(E)~V for some unknown spatial multivariate distribution. Note the contrast to k-NN; (6) is a spatial stochastic model that allows optimization with respect to bias and squared error, which we now review. Let V be partitioned as and let v j be the jth column of V, j[U, contained in V o,u ; i.e., v j~f V½i,j; i~1,2, . . . ,n; j[Ug. Consider squared-error loss, and the linear predictor l j in (1), where l 0 j X o~x 0 j and X 0 contains at least a column of ones. The predictor that minimizes squarederror loss, known as the best linear unbiased predictor (BLUP), y y j~l where [26] (pgs. 151-155). Notice that b is unknown; the only assumption is the linear model and a known spatial covariance matrix.
Assume the same linear model in (6), except this time the linear predictor is (2). Let T~b 0 y, where b is a vector of all ones, and with prediction variance of where F~X [27][28][29]. The finite population correction factor is not obvious in (11). However, as b u gets shorter in length, (11) goes to zero. If V~s 2 I, then (11) simplifies to var(T T)~(nzm) 2 s 2 (1{f )=n where f~n=(nzm) is the sampling fraction; this is the classical formula in simple random sampling without replacement for finite populations, e.g. [50](pg. 16).
For equations (9) and (11) V is unknown and must be estimated. In spatial models, V is modeled through spatial information; in geostatistics this is spatial distance. Consider the exponential autocovariance model, where G(d i,j : r) is a general autocorrelation function, d i,j~d (i,j; S,I) as defined in (3), V½i,j; h is the i,jth element of V, h~(d 2 ,r,s 2 ) with d 2 as the partial sill, r as the range parameter, and s 2 as the nugget effect (which may absorb spatial autocorrelation at very fine scales within minimum sampling distances). We will fit models using G(d i,j )~exp ({d i,j =r); for many other models see [51](pgs. 80-96). The larger r, the more autocorrelation between sites for a given distance. The parameters s 2 and d 2 are variance components, with d 2 controlling the autocorrelated component and s 2 controlling the uncorrelated component. For all models in this article, we use (12), relying on the fact that inferences are generally robust to mis-specification of the model. We estimate the covariance parameters using restricted maximum likelihood (REML) [33,34], where , and c is a constant that does not depend on h. Equation (13) is an unbiased estimating equation [35,36] and minimizing it for s 2 and r provides their REML estimates. Using the estimated covariance parameters from REML in equations (8)(9)(10)(11) provides the EBLUP predictors and standard errors.
The significance of using (13) is three-fold: 1) normality is not required to use (13) because it is an unbiased estimating equation, 2) there is no need to de-trend because estimation of b is essentially embedded in (13), and 3) there is no need to compute empirical variograms by binning residuals. Residuals from detrending are biased [48](pgs. 257-258) and binning is arbitrary. Thus, (13) provides an automatic way to estimate a spatial covariance matrix in very general conditions.

A Geostatistical Approach for Estimating the Variance of k-NN Predictors
An iterated variogram estimator for the variance of a k-NN predictor has been proposed [23]. Suppose that we start with iteration q~0 and (4) as an estimator for a constant prediction standard error,n n ½q i~ĵ j. Form standardized residuals as whereŷ y i is the in-sample cross-validation prediction value using some k-NN method. Then compute an empirical semivariogram, where where t 0~0 , 0vt 1 vt 2 ,v, . . . ,vt L , and h ' is some function of A semivariogram, such as the equivalent to (12), is fit to (15), typically by minimizing a weighted-least-squares criteria; e.g. [32], Let the spatial autocorrelation be c Then a local estimator of variance is Now go back to (14) with updated n's and iterate until convergence. For a convergence criteria, we used After convergence, a local estimator of prediction variance is where the iteration superscript ½q is suppressed after convergence. The prediction variance estimator (18) can be seen as an attempt to form a local (in covariate space) version of (9) without having to estimate mean effects due to ''nearness'' in covariate space. Note that this estimator will not work for k~1, and it is only sensible when the mean of nearest neighbors is computed, as compared to the distance-weighted version. However, the above method could be adopted for distance-weighted. The estimator (18) will be examined using simulations. For the simulations, we used 10 equal-interval variogram distance-bins (t ' {t '{1 was equal for all '; L~10) between 0 and the maximum distance in the data set. Equation (17) was minimized using the optim() function in R [49] with the Nelder-Mead simplex method [52], obtaining a starting value from a 10610610 search grid for the parameters of (16). A maximum of 30 iterations was allowed. A more formal geostatistical analysis of the k-NN predictors is as follows. The k-NN predictor can be written as (1). Under the SLM, the root-mean-squared prediction error (RMSPE) is After taking expectations, Note that k-NN methods make the sensible constraint that l j 0 1~1 when using the mean of the nearest neighbors or distance-weighting. If X o is a single column of ones, then the bias term ½( l j 0 X o {x j 0 ) b 2 above disappears. However, this is not true in general. In contrast, under the BLUP, the bias term disappears due to further constraints on l j that guarantee unbiasedness for any X o and x j 0 . Hence, the RMSPE for k-NN will be greater than BLUP for two reasons: it is not optimized for minimizing the error variance l j there is a bias-squared component. An estimate of the RMSPE of the k-NN predictor under the SLM model is obtained by replacing b withb b. Note that which equals Note that is an estimator of the RMSPE of a k-NN predictor using parameters estimated under a SLM; i.e., covariance parameters can be estimated using (13) and b can be estimated using We will analyze some k-NN estimators using (19) in simulations below.

Simulation of Artificial Data
We created spatially-patterned and cross-correlated X -variables. All data sets were repeatedly simulated on a 20620 regular grid evenly spaced between 21 and 1 on both coordinate axes and eight covariates: where z 1 is a 400|1 vector of values (on the 20620 regular grid) containing zero-mean spatially-autocorrelated random variables from some geostatistical model with partial sill d 2 1 and range parameter r 1 , and E 1 is a 400|1 vector containing zero-mean independent random variables with variance s 2 1 . We also let z 1 be independent of E 1 . Next, we set up an autoregressive-like recursion, where where z g ,g~2,3, . . . ,8 contains zero-mean spatially-autocorrelated random variables from some geostatistical model with partial sill d 2 g and range parameter r g , and E g contains zero-mean independent random variables with variance s 2 g , where again z g is independent of E g . Note that w g{1 is a parameter that creates cross-correlation between variables by regressing w g on w g{1 . This set-up ensures cross-correlation among the W -variables and spatial autocorrelation within each W -variable. Now let where all the elements of m g are constant, equal to m g . Finally, create the response variable as, where X~½x 1 Dx 2 D . . . Dx 8 , b is a vector of parameters, and z y contains zero-mean spatially-autocorrelated random variables from some geostatistical model with partial sill d 2 y and range parameter r y , and E y contains zero-mean independent random variables with variance s 2 y , where again z y is independent of E y . For simulations, we let all z g and E g be normally distributed. For autocorrelation of z g , we used the spherical model, where d j,j 0~d(j,j 0 ; S,I) We simulated three types of data using these models. In all cases w g = (NA, 0.5, 0.5, 0.5, 0, 0.5, 0.5, 0.5). Note that for each simulated data set the covariates X 1 {X 4 were cross-correlated through w 2 {w 4 , but w 5~0 broke any further cross-correlation to the group X 5 {X 8 , and then X 5 {X 8 as a group were crosscorrelated. , and s 2 y~: 05. Then, y was simulated from Poi(exp(y t )) where Poi( : ) is a Poisson distribution. Note that y t wandered well below zero over significant spatial patches creating zeroinflated count data. Variances and regression coefficients are smaller here to keep numbers reasonable when exponentiating. 3. For the third simulation method, let y t denote the simulated data, with all of the same parameters as for the second simulation. Then, y was simulated as y j~I (y t,j w0), creating binary data of zeros and ones.
Note that the group X 1 {X 4 had smaller ranges and variances than group X 5 {X 8 . The response variable was related to both groups through b, but the coefficients were zero for X 4 and X 5 . We made predictions for both k-NN and SLM methods using only X 1 , X 2 , X 4 , X 5 , X 7 , X 8 . Thus, several model mis-specifications were made: inclusion of covariates with no effect (X 4 and X 5 ), exclusion of covariates with real effects (X 3 and X 6 ), and in the case of SLM, generating data from a spherical autocovariance model but fitting the data with an exponential autocovariance model.
For each of the three simulation methods listed above, 2000 data sets were simulated (with 400 simulated values per data set). For each simulation 100 locations were sampled randomly, and from each sample the other 300 were predicted, along with the overall total.

Forest Data
Data for this study were obtained from the Forest Inventory and Analysis (FIA) databases for Oregon. The FIA databases are part of the national inventory of forests for the United States [53,54]. The Food Security Act of 1985 protects the confidentiality of the Forest Inventory and Analysis (FIA) plot locations, the integrity of the FIA sample, and the privacy of landowners who allow FIA field crews on their land. For these reasons, the actual plot locations are not available publicly. Persons needing exact plot locations to examine or reproduce our results should contact the Pacific Northwest Forest Inventory and Analysis Program at http://www.fs.fed.us/pnw/fia/. Alternatively, FIA produces and maintains a set of public databases with perturbed plot coordinates that can be downloaded and used by anyone. The website for accessing these data is http://www.fia.fs.fed.us/tools-data/.
A tessellation of hexagons, each approximately 2400 hectares in size, is superimposed across the nation, with one field plot randomly located within each hexagon. Approximately the same number of plots is measured each year, each plot has the same probability of selection, and in the western U.S. plots are remeasured every ten years. Each field plot is composed of four subplots. Forested areas that are distinguished by structure, management history, or forest type are mapped as unique polygons (also called condition-classes) on the plot and correspond to stands of at least 0.4047 hectare in size. For our study area there were 1886 forested FIA plots measured between 2001 and 2006. Biomass (DRYBIOT), maximum potential mean annual increment (PMAI), elevation, and primary species identifier (i.e., Douglas-fir (Pseudotsuga menziesii), ponderosa pine (Pinus ponderosa) or western hemlock (Tsuga heterophylla)) were obtained from the FIA annual database.
For climate data, we used monthly temperature and precipitation normal data for the period 1971-2000 produced by the Parameter-elevation Regressions on Independent Slopes Model (PRISM), which provided an 800 m grid that produced differences between measured plot elevation and overlaid PRISM grid elevation up to 350 mm in the mountainous areas of Oregon. To account for changes in climate due to these elevation differences we utilized a process similar to [55] where we created a scale-free interpolation process using a 90 m digital elevation model and PRISM temperature and elevation gradients of the larger 800 m grid. The result is a 90 m monthly climate grid. Like [55] we used this procedure for extracting temperature (T) only and used a simple distance weighting method for precipitation (P). Climate Moisture Index (CMI), a measure of precipitation in excess of evapotranspiration, was used to quantify moisture availability. The raw values of PMAI and DRYBIOT, along with residuals after fitting a linear model, are shown in Figure 1; covariates (X -variables) in the linear model were temperature, precipitation, Climate Moisture Index (CMI), WH (an indicator variable for shade tolerance based on Western Hemlock trees) and elevation.
For real forest data, we used a census of 1886 known FIA values for PMAI and DRYBIOT. These were subsampled in three different experiments. For each resampling experiment listed below, the data sets were subsampled 500 times. For each subsample, a sample of 386 values was chosen randomly without replacement, and from each sample the remaining 1500 locations were predicted, along with an estimate of the total for all 1886 values. The covariates used were temperature, precipitation, CMI, WH, and elevation. The 500 resamplings were performed for each of the following: 1. PMAI: this variable reflects the maximum potential forest productivity at a particular site. It indicates the average annual productivity of wood volume (m 3 /ha/year) that would be realized over time. 2. DRYBIOT: this variable is the total above-ground oven-dry biomass of live trees larger than 2.5 cm in diameter. 3. PMAI with unbalanced spatial sampling. Here, we preferentially sampled geographically, as often happens in real applications due to access or other issues. We divided up the study area into four parts, using x~100 and y~100 as the dividing lines. We created samples by sampling 200 values without replacement for yw100 and xv100, and then randomly sampling 186 values without replacement from the rest of area. One such sample is shown in Figure 2. From each unbalanced spatial sample, the remaining 1500 locations were predicted.

Performance Measures
Over the six preceding simulation/resampling experiments, the SLM is compared to k-NN using three measures of predictive performance, both for individual site predictions, and totals over all sites. Note that these are simple summaries of predictive performance, and make no assumptions about how the data were generated; they are global summary statistics to help evaluate and compare methods. Let y jDr be the true, known value for the jth location and the rth simulation or resampling. Note that if we use y jDr (s) to denote the spatial location s associated with y jDr , then fy jDr (s)[Ug will vary over s for each simulation/resampling because observed samples were randomized each time. We indicate this dependence on r as jDr. A prediction of y jDr , using (1), is denotedŷ y jDr for each j and r. For each r, there is a true total T r , and an estimate, using (2), denotedT T r for the rth simulation/ resampling. The three performance measures are: N RMSPE: the root-mean-squared prediction error measures how close the estimates are to the true values.
for point-wise predictions, where R = number simulations (1500) of artificial data or resamplings (500) of real forestry data for point prediction, and m = number of point predictions (300 for artificial data and 1500 for real forestry data) per replication r, and for total prediction. A smaller value of RMSPE indicates predictors are closer to true values.
N SRB: signed relative bias. Absolute bias is meaningless, so it is expressed as a fraction of the variability. It is well-known that MSPE = bias 2 +variance, and we use MSPE as RMSPE squared from above. The sign of bias is also informative, so signed relative bias as a fraction of variability is computed as,  for point-wise predictions, and for total predictions, sign(t k ) is the sign (positive or negative) of t k , and k~y for a point-wise performance measure or k~T for a total performance measure. A smaller absolute value of SRB has smaller bias, and a negative sign indicates under-prediction and a positive sign indicates over-prediction. where b se se(ŷ y jDr ) is the estimated standard error ofŷ y jDr , taken from (4) for k-NN methods, and from the square root of (9) for the SLM (EBLUP), with covariance parameters estimated from (13). PIC90 y should be near 0.90 if prediction intervals are properly estimated. It is also possible to compute PIC95 y by replacing 1.645 with 1.96 in the formula above, and PIC95 y should be near 0.95. For total predictions, where b se se(T T r ) is the estimated standard error ofT T r , taken from (5) for k-NN methods, and from the square root of (11) for the SLM (EBLUP), with covariance parameters estimated from (13).

Prediction Methods
Seven prediction methods were examined; five different k-NN methods, multiple regression (a special case of a SLM that assumes independence), and a SLM: N SLM: a spatial linear model using the same covariates as all k-NN methods as main effects only, with an exponential autocovariance model estimated by REML, and using prediction and variance equations as described in the Review of SLM section.
N LM: multiple regression like SLM but assuming all random errors are independent.

Results
The performance measures for the first set of Gaussian simulated data are presented in Table 1. Note that this table is based on 2000 simulations with 300 predictions per simulation, so 2000 total values were estimated and 600,000 points. As expected, the SLM had the lowest RMSPE, for both point and total predictions. Not only was it lowest, it was dramatically lower than any other predictor. The data were simulated with a high amount of autocorrelation, so this demonstrates how much better SLM can be in that case. When compared to MAH5 and MSN1 (the two commonly-used k-NN methods), SLM reduced RMSPE by 52.6 and 64.1% for the point estimates and 43.1 and 66.8% for the total estimates. SLM was also noticeably better than LM (linear model assuming independence), with reduced RMSPE of 34.8 and 31.8% for point and block prediction, respectively. Among the k-NN methods, MSN5 was best for both points and totals, but still not as good as LM. All methods were essentially unbiased for both points and totals. For all point estimates, prediction interval coverage was near 0.90, as they should be. For total estimates, it appears the MAH1 is a bit too high, and perhaps MAH5 a bit too low.
The performance measures for the second set of Poisson simulated data are presented in Table 1. The SLM again had the lowest RMSPE, for both point and total prediction. SLM reduced RMSPE by 31.3 and 14.8% for point estimates when compared to MAH5 and MSN1, and SLM reduced RMSPE by 23.6 and 23.5% for total estimates. All of the methods appear to be unbiased for point prediction, with generally valid confidence interval coverage. There appears to be some bias among the k-NN methods for predicting the total, and some prediction intervals fall below 0.85 for the k-NN methods. Also, the 0.86 prediction interval coverage for the SLM was a bit low, and this simulation was its poorest performance on that measure.
For predicting a binary variable, we replaced the RMSPE with percent correctly classified (PCC) for point prediction. Only the k-NN methods with k = 1 truly predicted values that were 0 or 1, so for all other methods predictions were rounded to 0 or 1. The performance measures for the binary simulated data are listed in Table 1. In fact, the k-NN with k = 1 performed most poorly, with the SLM again best. SLM increased PCC by 12.5 and 10.3% over MAH5 and MSN1, respectively. Point prediction appears unbiased for all methods. Prediction interval coverage is poor for the methods. A total of binary variables is rarely of interest in forestry applications compared to estimating proportions. For this simulation, we used the block mean, which is the estimated proportion for binary data, instead of a total. For block prediction, SLM decreased the RMSPE by 22.9 and 24.4% over MAH5 and MSN1, respectively. There may be some bias for MAH1 and bstNN. Prediction interval coverage is a little low for MAH5 and bstNN.
The performance measures for resampling real PMAI forestry data are presented in Table 2 in rows marked with PM. For point prediction, SLM reduced RMSPE by 9.0 and 34.4% over MAH5 and MSN1, respectively. Point prediction appears unbiased for all methods. Prediction interval coverage is quite good for all methods. For predicting a total, there appears to be some bias for k-NN methods using Mahalanobis distance, and prediction intervals are too large for MSN1 and too short for MAH5. The SLM reduces the RMSPE for predicting a total by 21.8 and 25.9% over MAH5 and MSN1, respectively.
The performance measures for resampling real DRYBIOT forestry data are presented in Table 2 in rows marked with DB. For point prediction, the bestNN approached SLM for the smallest RMSPE. The MAH5 method also did quite well, but MSN1 was very poor. For predicting a total, SLM again has the lowest RMSPE. There appears to be some bias for the bestNN method. All prediction intervals are within + 5% of 90%. Table 2, in rows marked UN, presents the performance measures for resampling real PMAI forestry data with spatially unbalanced sampling, as shown in Figure 2. For point prediction, this creates substantially more bias than the k-NN methods in Table 2. SLM remains relatively unbiased, again with the smallest RMSPE and valid prediction intervals. For predicting a total, there are large biases for k-NN methods and prediction intervals are far from the nominal 90%. The large bias cause the RMSPE for SLM to be much lower than any k-NN methods.
Most methods showed little bias globally, with generally valid prediction intervals. Yet, the SLM, and geostatistics in general, aims to make prediction intervals that vary in space, while the cross-validation approach used for k-NN is constant in space. We re-ran simulation 1 using the iterated variogram (IterVar) variance estimator of [23] in (18), testing its global and point-wise efficacy, compared to the SLM predictor and interval, and compared to the k-NN predictor under the SLM model (19), which we label kNNGeo. A scatter-plot of a single simulation, with 300 predictions for the unsampled locations, is shown in Figure 3, which plots Dŷ y i {y i D on the x-axis and c s:e s:e:(ŷ y i ) based on (9), (18), and (19) on the y-axis. We computed Kendall's rank correlation between the true absolute error Dŷ y i {y i D and the estimated prediction standard error, c s:e s:e:(ŷ y i ) for each method. These correlations were computed for 1000 simulations, and then all correlations were plotted as violin plots for each method, which is shown in Figure 4. Figure 4 shows that, indeed, the individual prediction intervals for the SLM are generally related to the actual errors. In contrast, Figure 4 shows that the IterVar method has no relationship between the prediction intervals and the actual absolute errors. Also, [23] claim that the algorithm is expected to rapidly converge. In our implementation, it converged only 57.4% of the time. It diverged before 30 iterations about 2% of the time. Globally, the IterVar 90% prediction interval has 88.8% coverage. The kNNGeo method of Figure 4 showed the strongest correlation between actual absolute errors and prediction intervals, largely due to the fact that it correctly estimated a dominant component of the error, which was the bias-squared. Using the RMSPE of the kNNGeo method for a 90% prediction interval had 94.2% coverage.

Discussion
This article set out to compare k-NN to the SLM for forestry mapping, and for the estimation of totals or averages of forest resources. In the introduction, we laid out arguments that favor In the Data column, S1 indicates data from the first simulation method, S2 indicates data from the second simulation method (count data), and S3 indicates data from the third simulation method (binary data), as described in Section ''Simulation of Artificial Data.'' Each data set used 100 samples per simulation, indicated by P in the P/ T column, and summaries were based on 300 predictions per resampling, which were then averaged over the 2000 simulations. There was one total estimate per simulation, which were summarized over the 2000 simulations, and indicated by T in the P/T column. Different prediction methods form the rest of the columns and are described in Section ''Prediction Methods.'' Performance measures form the rows and are described in Section ''Performance Measures;'' however, note that percent correctly classified "PCC" replace RMSPE for point predictions of the binary (S3) simulated data. doi:10.1371/journal.pone.0059129.t001 using k-NN, and arguments that favor using a SLM, along with disadvantages for both. Our simulations of artificial data and resamplings of real data are not exhaustive; however, for the criteria that we chose (RMSPE, signed relative bias, and prediction interval coverage), the results presented in the previous section clearly favor SLM in general. To summarize, we simulated data under conditions that should severely test the SLM method. Because k-NN is primarily used in forestry, we included various k-NN methods in the simulations. In all cases, even with misspecified covariance models, mis-specified linear models (including nonsignificant covariates and ignoring significant ones), zeroinflated count data, binary data, and skewed real forestry data, the SLM performed better than k-NN, and generally provided valid inference with little bias, and prediction intervals that contained the true values the correct proportion of time. From a single simulation, it also appears that the SLM is more robust to unbalanced spatial sampling. These results generally verify the claim in the introduction that EBLUP used to estimate the SLM is fairly robust in a variety of ways. The SLM has an additional benefit from its model-based assumptions; it allows point-wise inference, with globally valid prediction intervals that vary at each point.
Our results can be compared to previous literature cited in the Introduction, such as [46], where our SLM is mathematically equivalent to their universal kriging (UK); however, parameter estimation likely differed in the studies ( [46] do not specify if they used the REML option when they fit variograms using the GSTAT package [56]). In [46], the SLM performed well compared to another k-NN method called gradient nearest neighbor (GNN), but not as consistently better as our results. Our results can also be compared to [47], where our SLM is In the Data column, PM indicates the PMAI data set, DB indicates the DRYBIOT data set, and UN indicates the PMAI data set with unbalanced sampling, as described in Section ''Forest Data.'' Each data set used 386 samples per resampling, and for point predictions, indicated by P in the P/T column, summaries were based on 1500 predictions per resampling, which were then averaged over the 500 resamples. There was one total estimate per resample, which were summarized over the 500 resamples, and indicated by T in the P/T column. Different prediction methods form the rest of the columns and are described in Section ''Prediction Methods.'' Performance measures form the rows and are described in Section ''Performance Measures.'' doi:10.1371/journal.pone.0059129.t002 mathematically equivalent to their kriging-with-external-drift (KED). An interesting hybrid method that uses MSN with kriging on the residuals is compared to the SLM based on RMSPE and bias [47]. They find that the MSN-kriging hybrid is slightly better than the SLM, with both better than MSN alone. However, we note that [47] do not give a standard error estimator of point-wise predictions for the MSN-kriging hybrid. Also [47] estimate the SLM by first fitting a linear model assuming independence, and then computing and fitting a semivariogram on residuals. The use of REML for the SLM, as we have described it, estimates the fixed effects assuming correlated residuals, and is expected to be more efficient.
Note that it may seem surprising that k-NN was mostly unbiased for these simulations. Clarification is required because the Introduction claimed that k-NN is biased [16][17][18][19]. These authors equate bias with the fact that k-NN underestimates large values and overestimates small values; in geostatistics, this characteristic is called smoothing [51](pg. 158). Smoothing is a desirable characteristic under squared-error loss, which the SLM minimizes, so it is also a property of the SLM [57](pg. 189). Because the SLM is BLUP, it is unbiased for point-wise predictions; however the predictions are not unbiased for nonlinear functionals of the spatial population, such as quantiles. For example, following [58], but for finite populations, F (y; fY i g)~(nzm) {1 P nzm i I(Y i vy) is a spatial cumulative distribution function (SCDF). Then its inverse, F {1 (a; fY i g): arg min½y[R; F (y; fY i g)wa defines the a quan-tile, which is nonlinear in fY i g. Predictors that can handle such nonlinearity have been proposed [59], and by matching variances in the predictions to those in the data, predictions will no longer underestimate high values or overestimate low values. However, it should be noted that these predictors will sacrifice the pointwise MSPE as optimized for BLUP; for an example where the prediction-variance-constrained MSPE is twice that of the ''smooth'' SLM predictor, see [59]. This illustrates that, in general, no set of predictors will be optimal for all purposes.
More generally, it is possible, though computationally expensive, to obtain multiple sets of predictions, where the predicted data are simulated from conditional distribution properties of the population. The multiple prediction sets can be averaged to obtain predictions that satisfy BLUP, or quantiles can be computed across the sets. Multiple sets of predictions also allow the propogation of uncertainty if prediction sets are used as inputs to other models. In fact, k-NN is closely related to multiple imputation methods [60][61][62][63], which sample from existing data to impute for missing data; in that sense they are like using k~1 multiple times to give multiple possible ''realizations.'' Again, there are equivalent ideas in geostatistics, generally termed ''conditional simulation,'' e.g., [64] and [51](pgs. 452-453). We do not pursue a comparison here but suggest it for further research. Given the above discussion, we emphasize that our goal was point-wise unbiased prediction while minimizing the MSPE, which is what the SLM achieves in a single map, and compare that to k-NN on the same basis.
A model-based variance for k-NN predictors remains problematic. Cross-validation works from a global standpoint. Attempts at making variance local, such as the iterated variogram approach [23], did not work well for one simulated data set as shown by Figure 4. There was no correlation between estimated prediction standard errors and actual absolute errors, so cross-validation was just as good, and the iterated variogram approach had convergence problems and does not work with k~1. More testing of this method, and possible improvements are warranted. The kNNGeo method was correlated to actual absolute errors. However global prediction intervals for kNNGeo were too conservative with 94.2% coverage for 90% prediction intervals because the bias component is not stochastic, but is treated as such if included in prediction intervals. Thus, the SLM is the only viable choice that we examined for making valid uncertainty maps along with predictions.
Finally, we stress that the SLM was presented here as a ''black box'' method. As we used it, there were no decisions involved; after choosing a covariance model like the exponential, use all covariates that are available, estimate the covariance parameters with REML, and plug the resulting covariance matrix into the prediction equations. This allowed us to make predictions for thousands of simulations. Such a ''black box'' method is certainly possible when many predictions are needed by personnel with little statistical training. However, when data have been collected at great expense, a careful analysis is better. In that case, exploratory data analysis, understanding of covariate relationships, finding and explaining outliers, model selection and diagnostics, and finally prediction, all can enhance prediction and understanding for both the SLM and k-NN, and we recommend that over a black box approach. For example, a Bayesian approach can account for the fact that covariance parameters are estimated and should correct for the plug-in aspect of the EBLUP, e.g. [43,65], with available software [66]. Also, when remotely sensed data are involved, data sets can be massive in size. In that case, other methods can be used, e.g. [67,68]. Moreover, there is no single correct analysis for forestry data sets; they can be modeled in various ways to achieve different desired goals.