The authors have the following interests. The work has been supported by the Cooperative Research Centre for Spatial Information. All authors are affiliated with the CRC for Spatial Information. There are no patents, products in development or marketed products to declare. This does not alter their adherence to all the PLOS ONE policies on sharing data and materials, as detailed online in the guide for authors.

Conceived and designed the experiments: SYK JM KM. Performed the experiments: SYK. Analyzed the data: SYK. Wrote the paper: SYK.

Discretization of a geographical region is quite common in spatial analysis. There have been few studies into the impact of different geographical scales on the outcome of spatial models for different spatial patterns. This study aims to investigate the impact of spatial scales and spatial smoothing on the outcomes of modelling spatial point-based data. Given a spatial point-based dataset (such as occurrence of a disease), we study the geographical variation of residual disease risk using regular grid cells. The individual disease risk is modelled using a logistic model with the inclusion of spatially unstructured and/or spatially structured random effects. Three spatial smoothness priors for the spatially structured component are employed in modelling, namely an intrinsic Gaussian Markov random field, a second-order random walk on a lattice, and a Gaussian field with Matérn correlation function. We investigate how changes in grid cell size affect model outcomes under different spatial structures and different smoothness priors for the spatial component. A realistic example (the Humberside data) is analyzed and a simulation study is described. Bayesian computation is carried out using an integrated nested Laplace approximation. The results suggest that the performance and predictive capacity of the spatial models improve as the grid cell size decreases for certain spatial structures. It also appears that different spatial smoothness priors should be applied for different patterns of point data.

Spatial data are available in various forms: at point level, grid level or area level. In epidemiological studies, area level data are usually used because they are the most readily available. Some phenomena are naturally expressed as area level data, such as contextual variables in social epidemiology. In addition, disease incidence is often aggregated to administrative districts in order to protect patient confidentiality. For convenience, the aggregated data are then used to study small-scale geographical variation. Consequences of this practice include loss of individual information and potential ecological fallacy.

In contrast, point level disease data contain desirable individual information and, in some instances, precise domicile addresses, alleviating the issue of ecological bias. However, they are often difficult to access due to confidentiality issues. Even if they are available, patients' residential locations have to be protected and cannot be published, which restricts the types of analyses that can be carried out on point level disease datasets. Another limitation is that the study of small-scale geographical variation is not practicable with individual level disease data alone. As a compromise, we use point level disease data in this study but employ a grid level modelling approach to study the geographical variation of residual disease risk using regular grid cells. As a result, the issues of patient confidentiality and ecological bias are both addressed in this study.

We model the individual disease risk using a logistic model with the inclusion of spatially unstructured and/or spatially structured random effects. Geographical variation of residual disease risk is modelled using a spatial component that allows for heterogeneity of the random effects and borrows strength from neighboring grid cells. The grid cells are far smaller than typical administrative districts and therefore allow for better specification and identification of spatial random effects. Many ecological responses of interest do not recognize areas or borders defined for administrative purposes, and thus a finer geographical scale of study is often more appropriate for ecological studies.
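One way to write such a model down is sketched here. The symbols ($y_i$, $p_i$, $c(i)$, $\alpha$, $u$, $v$) are notation assumed for illustration and may differ from the notation used for the models later in the paper.

```latex
y_i \mid p_i \sim \mathrm{Bernoulli}(p_i), \qquad
\operatorname{logit}(p_i) = \log\frac{p_i}{1 - p_i} = \alpha + u_{c(i)} + v_{c(i)}
```

Here $y_i$ indicates whether individual $i$ is a case, $c(i)$ is the grid cell containing that individual, $\alpha$ is an intercept, and $u$ and $v$ stand for the spatially unstructured and spatially structured grid-cell effects respectively. Dropping $v$ leaves a model with unstructured effects only.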

Despite being less common than studying geographical variation using area level data, the grid level modelling approach has rapidly increased in popularity in recent years.

One of the challenges in grid level modelling is the specification of an appropriate spatial scale for a specific spatial dataset. At present, little is known about the impact of different spatial scales on the outcome of spatial models for different spatial patterns. Without repeating the analyses at multiple scales, it is difficult to know whether the findings at various scales are consistent. According to

Given this identified challenge, the study has two main aims: (i) to investigate the impact of changes in spatial scale on model outcome for a set of spatial structures; (ii) to evaluate the performance of various Bayesian spatial smoothness priors for spatial dependence, namely an intrinsic Gaussian Markov random field (IGMRF), a second-order random walk (RW2D) on a lattice, and a Gaussian field with Matérn correlation function. Bayesian inference is carried out using integrated nested Laplace approximation (INLA) throughout the study.

We designed a simulation study and utilized a case study to fulfill the aims. The simulated datasets consist of point data with various spatial structures including inhomogeneous point patterns, patterns with local repulsion, patterns with local clustering, and patterns with local clustering in the presence of a larger-scale inhomogeneity. The case study involves the analysis of the Humberside data on childhood leukaemia and lymphoma. This dataset portrays a sparse spatial pattern with potential spatial clustering.
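As a rough illustration of how an inhomogeneous point pattern on the unit square might be generated, the sketch below uses thinning of a homogeneous Poisson process in base R. The function name `simulate_inhomogeneous`, the intensity function and its parameters are assumptions made for this example; they are not the settings used for the simulated datasets in this study.

```r
# Simulate an inhomogeneous Poisson point pattern on the unit square by thinning.
# The intensity function and its parameters are illustrative assumptions.
simulate_inhomogeneous <- function(lambda_fun, lambda_max, seed = 1) {
  set.seed(seed)
  # Step 1: homogeneous Poisson process with the dominating intensity lambda_max
  n <- rpois(1, lambda_max)
  x <- runif(n)
  y <- runif(n)
  # Step 2: keep each point with probability lambda(x, y) / lambda_max
  keep <- runif(n) < lambda_fun(x, y) / lambda_max
  data.frame(x = x[keep], y = y[keep])
}

# Hypothetical intensity that increases towards the top-right corner;
# its maximum on the unit square is 200, attained at (1, 1)
lambda_fun <- function(x, y) 200 * exp(x + y - 2)
pts <- simulate_inhomogeneous(lambda_fun, lambda_max = 200)
```

Patterns with local clustering or repulsion need different generating mechanisms (for example, parent-offspring or inhibition processes), which this sketch does not cover.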

Let

Spatial variation in the individual risk is modelled using different components including

The IID model considers a fixed intercept and unstructured random effects

The IID model defines

where

In the IGMRF model corresponding to (1), the spatially unstructured component,

An IGMRF for

where

The RW2D model corresponding to (1) employs a different formulation for the spatially structured effect. Here,

The RW2D model is defined on a regular grid (see Rue and Held

The precision

The MATERN2D model corresponding to (1) is considered. The spatially structured effect

where

In light of the computational cost of Markov chain Monte Carlo (MCMC) methods for spatial inference, we adopt the integrated nested Laplace approximation (INLA) approach proposed by

where

The models considered in this study are regarded as latent Gaussian models by assigning

The vector

In order to estimate (2) and (3), nested approximations are constructed, and numerical integration is used to integrate out

where

Computation in this study is performed in R using the R-INLA package, which calls the inla program. Two steps are taken to run the models. First, the linear predictor of a model is specified using a formula object in R. The specified model is then run by calling the inla( ) function. We choose strategy = "laplace" to apply a Laplace approximation in (4) to estimate the marginals of the components of the latent field. The output of the inla( ) function includes various statistics such as marginal likelihood, deviance information criterion, effective number of parameters, predictive measures such as logarithmic score

As an illustration, the call in R-INLA to fit the IGMRF model is

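library(INLA)

# Name the list elements so that the formula can find them in data
data = list(y = y, j = j, region.iid = region.iid, region.struct = region.struct)

# No intercept; iid effect on j, plus unstructured and structured grid-cell effects
formula = y ~ -1 + f(j, model = "iid") +
  f(region.iid, model = "iid",
    hyper = list(theta = list(prior = "loggamma", param = c(1, 0.01)))) +
  f(region.struct, model = "besag", graph = "nb5×5.graph",
    hyper = list(theta = list(prior = "loggamma", param = c(1, 0.01))))

# Fit with the Laplace strategy and request DIC and CPO
result = inla(formula, family = "binomial", Ntrials = 1, data = data, verbose = TRUE,
  control.compute = list(dic = TRUE, cpo = TRUE), control.inla = list(strategy = "laplace"))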
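To get a feel for what the loggamma prior with param = c(1, 0.01) in the call above implies, the base R sketch below summarizes a Gamma distribution with shape 1 and rate 0.01 on the precision. It assumes that the two parameters are read as shape and rate, which is our understanding of the R-INLA convention. The variable names are illustrative.

```r
# Gamma(shape = 1, rate = 0.01) prior on a precision parameter
shape <- 1
rate <- 0.01
prior_mean <- shape / rate  # prior mean of the precision
prior_quantiles <- qgamma(c(0.025, 0.5, 0.975), shape = shape, rate = rate)

# Implied standard deviation of the random effect, sd = 1 / sqrt(precision);
# rev() puts the quantiles back in increasing order
sd_quantiles <- rev(1 / sqrt(prior_quantiles))

print(round(prior_quantiles, 2))
print(round(sd_quantiles, 3))
```

A prior of this kind is fairly diffuse on the precision scale, so repeating such a summary for other parameter choices is a simple way to check sensitivity to the prior.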

We conducted a simulation study to investigate the impact of spatial scales and spatial smoothing on modelling outcomes. As discussed earlier, a spatial pattern may be present at a given aggregation level and may vanish at other scales. Therefore, using a range of spatial scales, the purpose of this simulation study was to investigate the performance of the models when dealing with different spatial structures of point-based data. We simulated spatial point-based data from various classical point-process models on the unit square. As guided by Illian et al.

In dataset

Dataset

To generate the clustered cases in dataset

The cases in dataset

In dataset

The cases in dataset

Datasets

Various spatial patterns are considered, including inhomogeneous point patterns, patterns with local repulsion, patterns with local clustering, and patterns with local clustering in the presence of a larger-scale inhomogeneity.

We considered a realistic example (

To evaluate the impact of modelling the random effects at different spatial scales, we considered the partitions at the grid level by discretizing the study region using grids 5×5, 10×10, 15×15, 20×20, 25×25, 30×30, 35×35, 40×40, 45×45, and 50×50. The grid 5×5 resulted in 25 grid cells over the unit square, the grid 10×10 resulted in 100 grid cells over the unit square, and so on. So, the grid 5×5 had the largest grid cell size whereas the grid 50×50 had the smallest grid cell size. The cell2nb function in the spdep R package

In terms of prior specification, the precision parameters of the unstructured random effect and spatial effect,

For the purpose of model comparison, DIC was used to select the most parsimonious model after penalizing for model complexity. We note that DIC has been criticized

The logarithmic score (LS) for each model was also computed

where

In this section, we describe the results of fitting the models to the six simulated datasets and the realistic example. The DIC values for fitting the four models on the six simulated datasets at various spatial scales are presented in

The RW2D model fitted at small grid cell sizes appears to be a reasonable choice for dataset

The most appropriate spatial scale for fitting dataset

The RW2D model fitted at small grid cell sizes appears to be a reasonable choice for dataset

The most appropriate spatial scale for fitting dataset

The IID, IGMRF and MATERN2D models perform quite similarly, and fairly consistently, at all spatial scales from the largest to the smallest grid cell size. The RW2D model has larger DIC and LS than the other models at the first four spatial scales, but its performance gradually improves as the grid cell size decreases. Therefore, for point data that are sparse and inhomogeneously distributed across the space such as dataset

Based on the results obtained for dataset

In dataset

Dataset

As suggested by the results, some models behave differently at different scales, e.g., working relatively well at certain scales but not others. This could be because the smoothness priors operate differently at various scales, owing to the influence of the neighboring grid cells. For illustration, we present the estimated precision parameters for the MATERN2D model at various spatial scales in

Spatial scale | Precision for | | Precision for | |
| Mean | Std dev | Mean | Std dev |
5×5 | 0.455 | 0.092 | 172.474 | 44.838 |
10×10 | 0.217 | 0.043 | 23.700 | 12.438 |
15×15 | 46.034 | 4.236 | 0.084 | 0.007 |
20×20 | 49.972 | 5.169 | 0.065 | 0.006 |
25×25 | 40.582 | 3.022 | 0.041 | 0.003 |
30×30 | 41.473 | 3.413 | 0.051 | 0.004 |

Datasets

In order to understand the discretization of the Humberside dataset better, we provide the summaries of the number of events included in the grid cells for all the non-zero cell counts at the different spatial scales (

Spatial scale | Case: Min | Case: Mean | Case: Max | Control: Min | Control: Mean | Control: Max |
5×5 | 1 | 6.20 | 33 | 1 | 11.75 | 83 |
10×10 | 1 | 4.77 | 22 | 1 | 6.13 | 44 |
15×15 | 1 | 2.82 | 10 | 1 | 4.15 | 29 |
20×20 | 1 | 2.95 | 11 | 1 | 3.62 | 17 |
25×25 | 1 | 2.39 | 8 | 1 | 3.36 | 14 |
30×30 | 1 | 1.88 | 4 | 1 | 2.77 | 10 |
35×35 | 1 | 2.00 | 6 | 1 | 2.71 | 12 |
40×40 | 1 | 1.77 | 5 | 1 | 2.31 | 11 |
45×45 | 1 | 1.68 | 4 | 1 | 2.14 | 8 |
50×50 | 1 | 1.55 | 4 | 1 | 2.10 | 9 |
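Summaries of non-zero cell counts like those tabulated above could be produced along the following lines. This is a minimal base R sketch that assumes the coordinates have been rescaled to the unit square. The helper names `cell_index` and `summarise_nonzero` are hypothetical, and the coordinates are randomly generated toy values, not the Humberside data.

```r
# Assign points on the unit square to cells of a k x k grid and summarise
# the non-zero cell counts (minimum, mean, maximum).
cell_index <- function(x, y, k) {
  # Column and row of each point; pmin keeps points at coordinate 1 in the last cell
  col <- pmin(floor(x * k) + 1, k)
  row <- pmin(floor(y * k) + 1, k)
  (row - 1) * k + col
}

summarise_nonzero <- function(x, y, k) {
  counts <- tabulate(cell_index(x, y, k), nbins = k * k)
  nz <- counts[counts > 0]
  c(min = min(nz), mean = mean(nz), max = max(nz))
}

# Toy coordinates (not the Humberside data)
set.seed(42)
x <- runif(200)
y <- runif(200)
res <- t(sapply(c(5, 10, 20), function(k) summarise_nonzero(x, y, k)))
rownames(res) <- c("5x5", "10x10", "20x20")
print(round(res, 2))
```

Applying the same function separately to case and control coordinates would give the two halves of such a table.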

Based on the results obtained from modelling the Humberside dataset, the IID, IGMRF and MATERN2D models produce similar DIC and LS at the various spatial scales (

The dataset should be fitted at the grid 30×30 using the MATERN2D prior for spatial smoothing.

Spatial patterns | Recommended spatial smoothness priors and spatial scales | Sensitivity of the models to changes in spatial scale |
Sparse inhomogeneous point pattern (dataset | The RW2D model at small grid cell sizes. | The IID, IGMRF and MATERN2D models perform consistently at all spatial scales; the RW2D model is sensitive to changes in grid cell size. |
Sparse point pattern with local repulsion and mild inhomogeneity (dataset | The RW2D model; spatial scales have little impact on model outcomes. | All four models perform rather consistently at all spatial scales. |
Sparse inhomogeneous point pattern with large clusters (dataset | The RW2D model at grids 15×15 and above. | The IGMRF model performs consistently at all spatial scales; the IID, RW2D and MATERN2D models are sensitive to changes in grid cell size. |
Sparse inhomogeneous point pattern with small clusters (dataset | The MATERN2D model at the grid 20×20 or the IGMRF model at the grid 25×25. | The RW2D model performs consistently at all spatial scales; the IID, IGMRF and MATERN2D models are sensitive to changes in grid cell size. |
Dense inhomogeneous point pattern (datasets | The IID, IGMRF and MATERN2D models; spatial scales have little impact on model outcomes. | All four models perform rather consistently at all spatial scales. |
Sparse point pattern with clusters (the Humberside dataset) | The MATERN2D model at the grid 30×30. | The IID, IGMRF and RW2D models perform rather consistently at all spatial scales; the MATERN2D model is sensitive to changes in grid cell size. |

We evaluated the performance of a range of spatial smoothness priors (an intrinsic Gaussian Markov random field (IGMRF), a second-order random walk on a lattice (RW2D), and a Gaussian field with Matérn correlation function (MATERN2D)) and spatial scales for various spatial structures using the deviance information criterion (DIC) and the logarithmic score (LS). The simulated datasets consist of points distributed across the space in various spatial patterns. The Humberside data are a real example in which the data points are spatially sparse and exhibit a cluster. The results in this study suggest that different spatial smoothness priors and spatial scales may be appropriate for different patterns of spatial point-based data.
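For readers who wish to reproduce the predictive comparison, the sketch below shows one common way of obtaining a logarithmic score from conditional predictive ordinates (CPO), taking the score as the negative mean of the log CPO values so that smaller is better. The CPO vectors here are made-up numbers for illustration. In R-INLA, our understanding is that the fitted object stores these values in `result$cpo$cpo` when `cpo = TRUE` is requested.

```r
# Logarithmic score from conditional predictive ordinates (CPO):
# LS = -mean(log(CPO_i)); smaller values indicate better predictive performance.
log_score <- function(cpo) {
  stopifnot(all(cpo > 0 & cpo <= 1))
  -mean(log(cpo))
}

# Hypothetical CPO values for two competing models on the same five observations
cpo_model_a <- c(0.80, 0.65, 0.90, 0.70, 0.85)
cpo_model_b <- c(0.60, 0.55, 0.75, 0.50, 0.70)
ls_a <- log_score(cpo_model_a)
ls_b <- log_score(cpo_model_b)
print(c(model_a = ls_a, model_b = ls_b))
```

In this toy comparison the first model assigns higher predictive probability to every observation and therefore has the smaller score.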

We note that for spatially sparse and inhomogeneously distributed point pattern (dataset

When the point patterns with local repulsion and mild inhomogeneity (dataset

The sparse inhomogeneous point pattern with a number of small clusters across the space (dataset

The realistic example studied here (the Humberside dataset) further confirms that, for a sparse point pattern with potential spatial clustering, the spatial scale and spatial smoothness prior have to be chosen carefully. The model fit (as guided by the DIC) and the predictive performance (as guided by the LS) differ across spatial scales. The best modelling approach for this dataset is the MATERN2D model at the grid 30×30. This complements the results for dataset

The various spatial smoothness priors considered in this study have been shown to suit different spatial structures. It is possible to choose an appropriate prior based on the spatial structure, but a range of priors should generally be considered. As suggested by our study, the RW2D prior is a reasonable choice for spatial smoothing when spatially sparse point patterns are involved, regardless of whether the points are homogeneously or inhomogeneously distributed across the space. The RW2D prior imposes spatial smoothing by taking into account the first and second-order neighbors; it is essentially a second-order IGMRF on a lattice and is quite flexible due to its invariance to the addition of a linear trend. Because of the second-order neighbors, it imposes a higher level of spatial smoothing than the IGMRF prior. Sparse data need more spatial smoothing than dense data, so the RW2D prior works well in this context. Our study also shows that the IGMRF prior, which considers only first-order neighbors, is suitable for spatial smoothing in spatially dense and inhomogeneous point patterns. If spatially dense and homogeneous point patterns are considered, the spatially structured component may be omitted, leaving only the unstructured component with the IID prior. The MATERN2D prior appears to be well-suited to capturing the spatial effect in spatially clustered point patterns, but it is very sensitive to changes in spatial scale. This could be due to its smoothness parameter, which gives the model great flexibility in modelling clustered point data that require a relatively high level of spatial smoothing.
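The difference in neighborhood reach discussed above can be made concrete with a small base R sketch. It builds the structure matrix of a first-order IGMRF on a 5×5 lattice and counts, for each cell, the cells within one and within two first-order steps. This only illustrates the neighborhoods involved; it is not the rw2d implementation in R-INLA, and the function name `igmrf_structure` is hypothetical.

```r
# Structure matrix of a first-order IGMRF on a k x k lattice:
# Q = D - W, with W the rook (first-order) adjacency matrix and D its row sums.
igmrf_structure <- function(k) {
  n <- k * k
  W <- matrix(0, n, n)
  for (row_i in 1:k) {
    for (col_j in 1:k) {
      i <- (row_i - 1) * k + col_j
      if (col_j < k) { W[i, i + 1] <- 1; W[i + 1, i] <- 1 }  # right neighbour
      if (row_i < k) { W[i, i + k] <- 1; W[i + k, i] <- 1 }  # neighbour in next row
    }
  }
  diag(rowSums(W)) - W
}

Q <- igmrf_structure(5)

# Rows of Q sum to zero, so adding a constant to all cells leaves the prior unchanged
print(all(rowSums(Q) == 0))

# Non-zero pattern of Q %*% Q marks cells reachable within two first-order steps
reach2 <- (Q %*% Q) != 0
n_first <- rowSums(Q != 0) - 1    # first-order neighbours of each cell
n_second <- rowSums(reach2) - 1   # cells within two steps of each cell
print(c(first = n_first[13], within_two = n_second[13]))  # centre cell
```

For the centre cell the wider neighborhood contains three times as many cells as the first-order one, which is consistent with the stronger smoothing attributed to the RW2D prior.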

In conclusion, we note that it is crucial to repeat the spatial analyses at multiple spatial scales when modelling inhomogeneously distributed point patterns as the model fit and predictive performance of the models appear to vary at different spatial scales.

An acknowledged limitation of the study is that we simulated only one scenario for each point process structure of interest. We are therefore cautious about the generality of the conclusions drawn above. In future work, several simulation scenarios for a continuum of point process models with varying spatial structures could be studied in order to reach more general conclusions. In this study, we consider grid cells with equal sizes as it was argued by

Given the different results observed and different inferences made at the different spatial scales, it is crucial to repeat the analysis at different scales, as the data may contain useful information at more than one scale. It is also important to take into account the spatial scale that is of interest in a particular problem, i.e., the scale at which decisions or inferences will be made in practice. Often, disease management and policy making for subpopulations require modelling at a coarser scale than that required for understanding individual influences or associations. The choice of spatial scale is typically influenced by geo-political considerations; for instance, administrative districts are often used to describe and understand geographical variation of a disease, with the aim of assisting public health decision making. Similarly, the identification of population-based clusters may differ from that of local clusters, with different interpretations and implications for decisions and actions.

The authors would like to thank Nicole White and the two reviewers of this paper for their helpful comments.