S-maup: Statistical test to measure the sensitivity to the modifiable areal unit problem

This work presents a nonparametric statistical test, S-maup, to measure the sensitivity of a spatially intensive variable to the effects of the Modifiable Areal Unit Problem (MAUP). To the best of our knowledge, S-maup is the first statistic of its type and focuses on determining how much the distribution of the variable, at its highest level of spatial disaggregation, will change when it is spatially aggregated. Through a computational experiment, we obtain the basis for the design of the statistical test under the null hypothesis of non-sensitivity to MAUP. We performed an exhaustive simulation study for approaching the empirical distribution of the statistical test, obtaining its critical values, and computing its power and size. The results indicate that, in general, both the statistical size and power improve with increasing sample size. Finally, for illustrative purposes, an empirical application is made using the Mincer equation in South Africa, where starting from 206 municipalities, the S-maup statistic is used to find the maximum level of spatial aggregation that avoids the negative consequences of the MAUP.


Introduction
Although spatial data are increasingly disaggregated, many socioeconomic studies require some level of aggregation (e.g., neighborhoods, municipalities, states, districts, countries). Spatial aggregation is useful for calculating rates and indexes, minimizing the influence of outliers, or preserving confidentiality [1,2]. Spatial aggregation is also useful for creating meaningful units for analysis [3,4], reducing computational complexity [5], controlling for spurious spatial autocorrelation [6,7], comparing results at different scales [8,9], and merging different datasets to a comparable resolution [10].
However, spatial aggregation triggers a problem known as the Modifiable Areal Unit Problem (MAUP). The MAUP, introduced in the literature by [11] and [12], refers to the sensitivity of statistical results to changes in the spatial units of analysis. The MAUP has two dimensions: the scale effect and the zoning effect. The scale effect refers to changes in the size of the spatial units, which implies a change in the number of spatial units, e.g., doing the analysis at the state a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 or county level. The zoning effect refers to changes in the shape of the spatial units preserving the number of units, e.g., aggregating USA counties into 50 states is merely one of the many ways in which one can aggregate counties into 50 spatial units.
Although the literature on MAUP is extensive, to the best of our knowledge, there is no statistical tool that allows a practitioner to easily determine the level of sensitivity of a spatially intensive variable to the MAUP (in Geographic Information Science, spatially intensive variables, such as rates, densities and proportions, are averaged when areas are aggregated [13]). Hence, in this paper, we present S-maup, a nonparametric statistical test to measure the sensitivity of a spatially intensive variable to the MAUP. Instead of looking at a specific measure of central tendency or dispersion or at the coefficient associated with the variable in a specific regression, S-maup focuses on determining how much the distribution of the variable, at its highest level of spatial disaggregation, will change when it is aggregated into a given number of regions. For its calculation S-maup requires the number of areas, the ρ parameter, that measures the degree of spatial correlation of the variable, and the number of regions in which the areas will be aggregated. Under the null hypothesis of non-sensitivity to MAUP, S-maup would be useful for determining the maximum level of aggregation that we can apply to a given variable before it loses its distributional characteristics. S-maup could also be used to determine whether the results obtained at a given scale (e.g., counties) hold for another scale (e.g., states).
The rest of this article is structured as follows. We begin with a literature review concerning the primary research surrounding the MAUP. We then explore the effects of the MAUP through a computational experiment. Next, we propose a test statistic, S-maup, and its empirical distribution under the null hypothesis of non-sensitivity to MAUP. Next, we establish the power and size of the statistic under various levels of spatial autocorrelation and number of areas. We then present a simple example of the use of the S-maup statistic. Last, we conclude and suggest avenues for further investigation.

Literature review
The effects of aggregating spatial data have been a subject of study since the early 1930s and have been referred to by different names, such as aggregation effects [14], scale problem [3], ecological fallacy [15], and Modifiable Areal Unit Problem, MAUP, [12]. If one delves into the details, it can be argued that these previous concepts are different. However, these concepts possess as a common factor a concern regarding the undesired effects that result from working with aggregate data. Hereinafter, we will refer to this problem as MAUP.
It is well known that the impact of the MAUP on the mean can be considered negligible [17][18][19][20]. However, the MAUP has a large impact on the variance, which decreases when the variable exhibit high values of spatial autocorrelation [21]. With respect to the statistical association, such as the covariance and correlation coefficient, [22], [12] and [17] found that the sensitivity to MAUP increases as the level of spatial aggregation increases (scale effect), i.e., the correlation between variables X and Y will exhibit a wider variation if, for example, USA counties are aggregated into 50 spatial units than if they were aggregated into 1,000 spatial units.
Although there is no solution to the MAUP because it is inherent to the use of spatial data, some authors have proposed different alternatives to minimize its effects: the formulation of scale-robust statistics [40], the design of optimal aggregations that minimize the loss of information [4,9,16,41,42], the use of a set of auxiliary or grouping variables together with variables at the individual level [43,44], and the measurement of rates of change through the concept of a fractal dimension [24].
Most studies above required extensive computational experiments. Table 1 summarizes the main characteristics of those experiments, including the covered dimensions (scale or zoning), studied statistics (mean, variance, correlation, regression coefficients, etc.), type of data (real or simulated), studied variables (income, rates, random, etc.), and size of the experiment in terms of the number of areas and regions (herein, we will refer to area as the smallest spatial unit of observation and region as the spatial units that result from aggregating the areas into contiguous spatial units). From this table, we can highlight the dominance of the use of real data over simulated data and the evident increase in the size of the experiment as the computational capacity increases over the years. As expected, the two driver parameters in these experiments are the number of areas and the number of regions. Although it has been considered in a few experiment [6,12,21], the level of spatial autocorrelation of the variables/attributes being aggregated plays an important role in the level of sensitivity of the variable to the MAUP. Finally, the mean is significantly highlighted by being the more common grouping operator, i.e., if areas i and j, with attribute values X i and X j , are merged into a region, the attribute value for the resulting region is calculated as the mean of X i and X j , which indicates that all of the experiments use spatially intensive variables.
Based on the available literature, a practitioner can anticipate high(low) variation of its results when the aggregation level is high(low) and the level of spatial autocorrelation of its variable is low(high). However, there is no tool in the literature that allows the assignment of a specific number and statistical significance to that variation. The closest the research can get to that number would require a computational experiment involving the calculation of the results for a large number of random aggregations of the areas into a predefined number of regions. This paper constitutes the very first attempt to formulate a nonparametric statistical test to easily measure the sensitivity of a spatially intensive variable to the MAUP.

MAUP effects
In this section, we design a computational experiment to identify the key elements that should be included in the construction of the statistical test. Following previous experiments in the literature on the MAUP effects (e.g., [20] and [31]), we consider the two main parameters involved in the exploration of scale and zoning effects: number of areas (N) and number of regions (k). As in [12] and [21], we also take into account different levels of spatial autocorrelation, ρ. Fig 1 summarizes the steps followed to generate an instance of the computational experiment: (1) y ρ = 0.9 is a random variable generated by a Spatial Autoregressive (SAR) process with autoregressive parameter ρ = 0.9 and rook contiguity matrix. (2) The areas are randomly aggregated into k spatially contiguous regions using a seed-based region growing algorithm proposed by [45]. The attribute value for each region is calculated as the mean value of the attribute values of the areas assigned to the region. This random aggregation is repeated r = 30 times, so that we generate 30 different ways to aggregate N areas into k regions.
Although there exist many options regarding the definition of neighbors, in this paper we decided to use the rook definition [46], which is the most basic and the most popular specification in the field of regional science [47]. Also, in the field of spatial econometrics, although it is a controversial issue, it is recommended the use of an exogenous weights matrix such as the first-order spatial contiguity matrix [48][49][50]. The rook matrix has been also used in four of the most important computational experiments on the effects of the MAUP [20,30,31,33]. However, according to [51], in grouping through averages, the variance of the aggregated variable decreases faster as the average number of neighbors increases. Therefore, we highlight that the S-maup is designed for cases in which the average number of neighbors is close to 4 (see [52], for a topological characterization of different specifications of spatial contiguity matrices).
As is common in the literature on the MAUP effects, our experiment considers different levels of spatial autocorrelation, ρ, ranging between -0.9 and 0.9. Each instance of ρ = 0.9, y ρ = 0.9 , was generated from a spatial autoregressive data generating process, SAR, of the type y = ρWy + �. Once we generate a y ρ = 0.9 , we use it to generate instances with other values of ρ (e.g., y ρ = 0 , y ρ = −0.9 , or y ρ = 0.5 ) by spatially redistributing its values. For example, an instance of y ρ = 0.0 is generated from y ρ = 0.9 according to the following procedure: (1) Generate a SAR process y ρ = 0.9 . (2) Generate a SAR process x ρ = 0.0 . (3) Perform a spatial redistribution of the values of y following the spatial pattern of x; i.e., the area with the highest value of x takes the highest value of y; the area with the second highest value of x takes the second highest value of y; and so forth. (4) Estimate the ρ value from the redistributed variable and keep it if and only if jr À rj � 0:1; otherwise, repeat steps (2) to (4). With this novel process we guarantee that no aggregation effect attributed to a change in ρ comes from using different vectors of values. Having clarified the process that we follow at each instance and our strategy for generating the y ρ values, we present the parameters used in the computational experiment: We implemented the experiment in Python [53]. For the spatial aggregations, we use the Python library ClusterPy [54]. We ran the experiment in the supercomputer APOLO, at the  test to compare the mean of each original variable, μ o , with the mean of each aggregated variable, μ ag . We report in Table 2 the proportion of aggregated variables for which the two-sample t-test rejected the null hypothesis of μ o = μ ag at the 5% level of significance. From this result, we can conclude that there is not a MAUP effect on the mean, which is consistent with those results found by [17,18] and [20]. Each box plot in Fig 4 summarizes the i = 50 values of RCV calculated for each value of ρ and k. Unlike the case seen with the mean, the effect of variance is considerably greater. The box plots show that the effect of MAUP on variance decreases for two reasons: an increase in the level of spatial autocorrelation, ρ; and (2) an increase in the number of regions, k. These effects on variance are consistent with those found by [21].
To verify the MAUP effect on variance, we use the Levene test for equality between the variance of the original variable,

S-maup statistical test
Findings such as the effect of MAUP on variance and how MAUP fades as ρ and k increase are useful to find the functional form of our statistical test, S-maup, for measuring the level of sensitivity of a spatially distributed variable to the MAUP. We designed the test such that S-maup takes values close to zero when the variable is not sensitive to the MAUP and values close to one when the variable is highly sensitive to the MAUP. Furthermore, S-maup will be a univariate statistic applicable to spatially expansive variables whose aggregated values result from the average of the individual values.

S-maup
To find the functional form of S-maup, it is necessary design an expression that describes the distribution of the effects of MAUP on the variance (RCV ). To summarize those effects, we took the median of each Box Plot in Fig 4. Fig 7 shows an example of those summarized effects.
According to Fig 7, the mathematical expression of our test should take values close to one when the variable under evaluation has high negative spatial autocorrelation, ρ and is aggregated into a small number of regions, k. Conversely, the expression should take values close to zero when the variable under evaluation has high positive spatial autocorrelation, ρ and is aggregated into a large number of regions, k. Our expression should also be able to reproduce the way in which, for a given k, the MAUP effects decreases as ρ increases. Note that such a decrease is not the same for all values of k: when k is large, the effects of MAUP are low even for highly negative values of ρ; therefore, for a high k, the reduction of the MAUP effects, as ρ increases, are almost imperceptible. Thus, as k increases, our expression should modify the speed and moment at which the MAUP fades along ρ. Taking into account these different conditions, we started the construction of our S-maup statistic using an inverted logistic function [55], which is defined by Eq (5).
where L determines the maximum value of the curve; η determines the moment at which the curve begins to decline; and τ indicates the speed at which the curve declines. If we endogenize those three parameters, we should be able to approximate any line of the type shown in Fig 7. This is what we are going to develop in the rest of this subsection until we obtain an expression of M in which parameters L, η and τ depend on ρ, k and N.
Starting with the parameter L, Fig 7 shows that the maximum value of each logistic curve depends on the level of aggregation k. This aggregation can be defined in relative terms as y ¼ k N . Therefore, the lower the level of aggregation (i.e., as θ approaches 1), the lower should be L. When plotting each median RCV against θ, it depicts an inverted "S" that could also be modeled as an inverse logistic function with the expression presented in Eq (6), whose linear form is given by Eq (7).
where b and m are the parameters of the inverse logistic function. To estimate those parameters, we used a robust linear regression model that minimizes the influence of outliers. The parameter associated with θ is significant, and the adjusted R-squared = 86.7% . Fig 8(a) shows the robust regression over the linearized logistic function.
Returning to the logistic curves in Fig 7, both the moment at which the curves begin to decrease, η, and the speed of decreasing, τ, depend on k. Therefore, both parameters can be estimated as function of y ¼ k N . For this, we adjusted an inverse logistic function for each curve of the type presented in Fig 7. For each curve, the values of η and τ were calibrated using the optimized module of Scipy Python Library [56]. With this process, we obtained a value for η and τ for each value of θ. Then, we use a linearized power function, Eq (8), and a linear  S-maup: Statistical test to measure the sensitivity to the modifiable areal unit problem  function, Eq (9), to express η and τ as a function of θ.
ZðyÞ ¼ py a ð8Þ The parameter associated with θ was significant in both estimations and the adjusted R 2 , with 91.7% and 84.5% respectively . Fig 8b and 8c present the estimations.
Mðr; yÞ ¼ The results of the estimation of the parameters in the robust linear regression model for the logistic function of L are as follows: m = 7.031 and b = −2.188. Considering that the model is estimated with the linearized logistic function, these results were transformed by natural logarithm. For the power function of η, the results are p = 0.516 and a = 1.287, because of the linearization of the power function, we applied the natural logarithm to the parameter p. Finally, the results of the linear function of τ are as follows: β 0 = 5.319 and β 1 = −5.532. Replacing in the equations produces the following: Thus, the expression of the S-maup statistic is the following: Mðr; yÞ ¼ 1 1þe À 2:188þ7:031y 1 þ ½0:516y 1:287 �e ½5:319À 5:532y�r Recall that S-maup statistic (M) is designed in such a way that for a bigger (smaller) sensitivity of a variable to the MAUP, the larger (smaller) is the value of M. This characteristic allows us to define a non-parametric unilateral statistical test, which is stated below: Where the statistic for the test is given by Eq (14), and therefore, H 0 will be rejected if the statistic value belongs to the rejection region (RR) defined in Eq (15).
M α;ρ,N is the critical value given a significance level α, a level of spatial autocorrelation (ρ), and a number of areas (N). We implemented a Monte Carlo simulation to find the empirical distribution of the S-maup under the null hypothesis previously stated. The empirical distribution allows us to obtain the critical values as well as the pseudo-value p to determine the proof significance.

Critical values and p-value
To calculate the critical values, we performed an exhaustive simulation study based on nonparametric statistic methodology. Recall that H 0 means no sensitivity of a variable to MAUP, which is equivalent to stating that, for a given k, the variance of the aggregated variable is statistically equal to the variance of the original variable. For building the empirical distribution under H 0 , we set a value for N and ρ and generated an SAR process with parameters (N, ρ). Then, we randomly selected an integer value k such that 0.1N < k < N, thus yielding 30 random aggregations of the variable into k regions. Next, we applied the Levene test for equality of variances between the original variable and each one of the 30 aggregated variables. The SAR(N, ρ) variable was kept if and only if the Levene test was not rejected in all 30 cases. If there was at least one rejection, then we chose, at random, a new k and repeated the previous steps. This procedure was repeated until we obtained 1,000 instances for each pair (N, ρ). We then calculated the S-maup statistic for those instances using Eq (14) and generated the empirical distribution of the statistics under H 0 . The critical values were obtained by calculating the 90%, 95%, 99% percentiles for the empirical distribution. Table 3 presents the table of critical  values. This Table implied the generation of 54,000 instances.
Following the percentile approach utilized by [57], we can calculate a pseudo-p-value for a given value of the S-maup test (M), using the Eq (16): where C = 1 if M r;N j > M, C = 0 otherwise. The vector M r;N j comes from the simulations performed to produce Table 3. Since those vectors are extremely computationally intensive to produce (in some instances requiring months of supercomputer computation for completion), they will be publicly available at http://www.___.edu, as well as the Python script to run the Smaup statistic. Table 4 presents some examples of the S-maup statistic for different values of N and k. Note that when the variable y i presents characteristics against the null hypothesis (H 0 ), then the M value of the S-maup should be greater than the critical value at some significance level α, and therefore, the pseudo-value p of the test must be smaller than the significance level. If H 0 is rejected, it can be concluded that the variable y i is sensitive to the MAUP, and therefore, a MAUP effect exists when aggregating y i in k regions.
Note that when the spatial autocorrelation is highly positive (e.g., ρ = 0.801), the variable allows high levels of aggregation. The results also confirm that low levels of spatial aggregation do not lead to the undesirable effects of MAUP.

Power and size
The power is a natural way of evaluating the test performance. It is defined as the probability of rejecting the null hypothesis, given that the null hypothesis is false. In other words, it is the probability of not committing a type II error (β); thus, the power is (1 − β). In our context, the power means the probability that sufficient statistical evidence exists in the sample to affirm that the variable y i is affected by the MAUP, when in fact, the variable y i is affected by the dimensions of the MAUP. Hence, it is expected that the power of the test is close, or equal, to 1.
Since H 1 implies that the variance of the original variable is different from the variance of the aggregate variable, we implemented the following simulation experiment to measure the power of our statistical test: For each tuple (N, ρ), with N 2 {100, 400, 900} and ρ 2 {±0.9, ±0.7, ±0.5, ±0.3,0}. Given a tuple (N, ρ) we generate an SAR process and perform 30 random spatial aggregations of the N areas into k regions such that k is selected at random as an integer value such that 0.1N < k < N. The SAR process is kept if and only if the Leven test between the original variable and each one of the 30 aggregated variables is rejected. We repeat this process until we generate 1,000 valid instances for each tuple (N, ρ). Each entry in Table 5 reports the proportion of 1,000 instances for which our test rejects H 0 . Because most values are close to 1, we can argue that our S-maup is highly effective in identifying variables that are sensitive to the MAUP effect.
Test size is also a way of evaluating the test performance. Test size is defined as the probability of rejecting the null hypothesis given that the null hypothesis is true. In other words, it is the probability of committing a type I error (α). In our context, test size means the probability that sufficient statistical evidence exists in the sample to affirm that the variable y i is affected by the MAUP, when in fact the variable y i is not. Hence, it is expected that the proportion of instances for which our test commits type I error is close the theoretical significance level (α).
The empirical test size is calculated following a similar procedure implemented to calculate the power, but in this case, the tuple (N, ρ) is selected if and only if the Levene test is not rejected in all 30 cases. Table 6 reports the size of our test, which show the best performance in scenarios of positive spatial autocorrelation.

An illustrative application of the S-maup test
In this section, we present an empirical illustration within the context of a Mincer wage equation [58] that explains the salary based on schooling and experience. Eq (17) presents the most where LNW is the natural logarithm of income (hourly wage), YRSCHOOL years of schooling, EXP years of potential labor market experience (calculated as the age in years minus years of education plus 6), and ε is a mean zero residual. It is important to clarify that this example is merely illustrative. We use this equation because its simplicity allows us to present a simple application of our test. We use the 2011 census data from South Africa retrieved from the Integrated Public Use Microdata Series, International (IPUMS-International), at the Minnesota Population Center [59]. The data include 688,310 individuals who were working at the time of the survey. We aggregate the individual data into 206 municipalities using the weighted average of individual incomes, years of schooling, and the potential work experience.
The 206 municipalities are our basic unit of analysis (i.e., our disaggregated variable). Other administrative units in South Africa include 52 districts and 9 provinces. Table 7 shows some descriptive statistics of our variables at the three administrative levels. Note how the standard deviation of the three variables narrows as the level of aggregation increases. The spatial distribution of the variables is presented in Fig 9.  Table 8 presents the estimation at the municipal level. The coefficients of education and experience are significant and exhibit the expected signs. What would be the maximum level of spatial aggregation for which these results hold? Note that here we are asking about the minimum value for k that preserves the distributional characteristics of the variables; we are not aiming to evaluate a specific regionalization for a given value of k. We can use our S-maup statistic to answer this question by identifying the minimum value of k for which our test fails to reject the null hypothesis of no influence of the MAUP. In Table 9, we present the results of our test for different levels of spatial aggregations. For this, our test requires the level of spatial autocorrelation of each variable (ρ) and the value of y ¼ k N . Note that at k = 135, the S-maup indicates that the variable LNW is affected by the MAUP. This finding may imply that the results obtained at municipal level (k = 206) may hold until an aggregation level of k = 136 that is the aggregation level at which all the variables involved in the regression do not lose their distributional characteristics. Another conclusion from these results is that the results obtained at the municipal level do not hold at district or province levels. Fig 10 compares the coefficients obtained at the municipal level (black and dashed vertical lines) with the distribution of the coefficients obtained by estimating the Mincer equation on 1,000 random spatial aggregations of the k = 206 municipalities into k = 136 regions. Fig 10a, corresponding to years of education, shows that 100% of the coefficients estimated with k = 136 fall into the 95% confidence intervals. Fig 10b, corresponding to years of experience, shows that 98.8% of the coefficients estimated with k = 136 fall into the 95% confidence intervals. Finally, Fig 10c, corresponding to the squared years of experience, shows that 98.7% of the coefficients estimated with k = 136 fall into the 95% confidence intervals.
Next, we estimated the Mincer model for k = 52 and compared it with the estimation for k = 206 municipalities. As we did previously, we made 1,000 random aggregations and

Conclusions
This paper introduced the first statistic of its kind for measuring the level of sensitivity of a spatially expansive variable to the MAUP. The statistic requires as input parameters the level of aggregation y ¼ k N and the level of spatial autocorrelation of the variable ρ. The results indicate that, in general, both the statistical size and power improve with increasing sample size. We also provide the table of critical values and a procedure to calculate the pseudo-p value of the test. The empirical application shows the usefulness of the test for identifying the maximum level of aggregation at which the original variable preserves its distributional characteristics. Additionally, it can be useful to test whether two aggregation levels are comparable.
We recognize that the main properties of the S-maup were obtained from an empirical simulation procedure, and they rely more heavily on hard experimental computation than theoretical methods. However, the complexity of the question addressed in this paper may explain why this is the first attempt to answer it even though the MAUP has been in the literature since the late 1970s. We hope that this first attempt motivates other researchers to contribute other approaches to answer the same question.
Further research could focus on the formulation of a S-maup test that includes the average number of neighbors as an additional parameter and makes a robust version of the statistic that allows for any specification of the weights matrix. Also, the formulation of a variogrambased S-maup version could be of great interest for researchers in the field of geostatistics (we acknowledge one of the anonymous referees for this suggestion).