## Figures

## Abstract

Spatial autocorrelation plays an important role in geographical analysis; however, there is still room for improvement of this method. The formula for Moran’s index is complicated, and several basic problems remain to be solved. Therefore, I will reconstruct its mathematical framework using mathematical derivation based on linear algebra and present four simple approaches to calculating Moran’s index. Moran’s scatterplot will be ameliorated, and new test methods will be proposed. The relationship between the global Moran’s index and Geary’s coefficient will be discussed from two different vantage points: spatial population and spatial sample. The sphere of applications for both Moran’s index and Geary’s coefficient will be clarified and defined. One of theoretical findings is that Moran’s index is a characteristic parameter of spatial weight matrices, so the selection of weight functions is very significant for autocorrelation analysis of geographical systems. A case study of 29 Chinese cities in 2000 will be employed to validate the innovatory models and methods. This work is a methodological study, which will simplify the process of autocorrelation analysis. The results of this study will lay the foundation for the scaling analysis of spatial autocorrelation.

**Citation: **Chen Y (2013) New Approaches for Calculating Moran’s Index of Spatial Autocorrelation. PLoS ONE 8(7):
e68336.
https://doi.org/10.1371/journal.pone.0068336

**Editor: **Guy J.-P. Schumann, NASA Jet Propulsion Laboratory, United States of America

**Received: **January 4, 2013; **Accepted: **May 28, 2013; **Published: ** July 12, 2013

**Copyright: ** © 2013 Yanguang Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Funding: **This research was sponsored by the National Natural Science Foundation of China (grant 41171129; http://www.nsfc.gov.cn/Portal0/default166.htm). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests: ** The author has declared that no competing interests exist.

## Introduction

The theory of spatial autocorrelation has been a key element of geographical analysis for more than twenty years. A number of measurements of spatial correlation were proposed so that we can investigate the spatial process of geographical evolution from differing points of view [1]–[14]. Although there are various correlation measurements, two are commonly used. One is Moran’s index [15], and the other, is Geary’s coefficient [16]. The former is a generalization of Pearson’s correlation coefficient, and the latter is analogous to the Durbin-Watson statistic of regression analysis. In theory, Moran’s index is somewhat equivalent to Geary’s coefficient and they can be substituted for one another. However, in practice, Moran’s index cannot be replaced by Geary’s coefficient and *vice versa* due to a subtle difference of statistical treatment. Compared with Geary’s coefficient, Moran’s index is more significant to spatial analysis.

Today, the concepts and methods of spatial autocorrelation have been applied to many fields, which have resulted in a number of interesting findings [17]–[26]. However, despite its long history, many basic and important questions remain to be answered. For example, we still don’t know how to determine the spatial contiguity matrix objectively [21]. The relationships between Moran’s index and Geary’s coefficient are still unclear. In fact, spatial autocorrelation is a special case of the spatial correlation function. This correlation in geographic systems is often associated with the scaling process [27]–[30]. However, so far, spatial autocorrelation has not been linked to scaling laws. It is necessary to find new ways of understanding and implementing spatial autocorrelation analysis in order to solve many theoretical and methodological problems in this field. Before this aim can be accomplished, a simple and general framework must be constructed for this theory.

In this paper, a new way is proposed to express and estimate Moran’s index. Geary’s coefficient can be re-expressed in a new form along with other related measurements. By doing so, we can further develop the analytical process of spatial autocorrelation. The remaining sections of the paper are organized as follows. In Section 2, I will reconstruct the spatial weight matrix and Moran’s index, and improve Moran’s scatterplot in terms of the mathematical processes in this study. In Section 3, the association of Moran’s index with Geary’s coefficient is discussed. The scopes of application of the two parameters are defined for geographical analysis. In Section 4, I will introduce four approaches to calculating Moran’s index and a simple approach to computing Geary’s coefficient. The principal cities of China, including national capital and provincial capitals, are taken as a case to show how to use the methods advanced in this article. Partial processes and results of computing are show in Files S1 and S2. Finally, the paper is concluded with a brief summary. There are several innovative aspects of this study. First, the mathematical expressions are regularized, and new methods of computing spatial autocorrelation measurements are proposed. Second, the implication of Moran’s index as a scaling parameter is reviewed. Third, the similarities and differences between Moran’s index and Geary’s coefficient are clarified. Especially, the threshold values of the two parameters indicating no spatial autocorrelation are corrected for spatial samples.

## Results

### Reconstructing Moran’s Index and Spatial Weights Matrix

The first basic measurement of spatial autocorrelation is Moran’s index, which came about as a result of Pearson’s correlation coefficient in general statistics [15]. Generalizing Pearson’s cross-correlation coefficient of two samples to the autocorrelation coefficient of one sample, and then generalizing the 1-dimensional autocorrelation coefficient from time series to the 2-dimensional autocorrelation coefficient about spatial distribution by substituting the weighting function for the lag parameter, we can obtain the formula of Moran’s index. Correspondingly, we can derive Geary’s coefficient from the idea of spatial autocorrelation by analogy with Watson-Durbin’s statistic [16]. It will be demonstrated that Moran’s index is based on populations, while Geary’s coefficient is based on samples. Moran’s index is in fact a standardized spatial auto-covariance, which can be simply reinterpreted with linear algebra. Suppose there are *n* elements (e.g., cities) in a system (e.g., a network of cities) which can be measured by a variable (e.g., city size), *x*. A vector can be defined in the equation below:(1)where *x _{i}* is a size measurement of the

*i*th element (

*i*= 1,2,…,

*n*). The mean of

*x*is given in the following equation:

_{i}(2)The centralized variable can be calculated by(3)where *μ* represents the average value of a vector consisting of *n* algebraic/numeric quantities. The population variance is as below:(4)where σ is the population standard deviation (PSD). The result of scaling transform of the centralized variable forms a standardized vector as follows(5)which is termed z-score in statistics. It can be shown that the norm of z, i.e., the length of the vector, , exactly equals the dimension of the system, i.e., the number of elements in the system, n. Thus we have

(6)Based on the equations prepared above, Moran’s index can be reconstructed in a simple way. Suppose there is an *n*-by-*n* unitary spatial weights matrix (USWM) such as(7)

The three properties of the matrix are as follows: (1) Symmetry, i.e., *w _{ij}* =

*w*; (2) Zero diagonal elements, namely, |

_{ji}*w*| = 0, which implies that the entries in the diagonal are all 0; (3) Normalization condition, that is(8)

_{ii}Then Moran’s index can be expressed in the quadratic form:(9)which is simple and more convenient than the conventional expression of Moran’s index for the mathematical transform in this context. Expanding equation (9) yields the original formula of Moran’s index and provides an autocorrelation coefficient defined in 2-dimensional space(10)in which *v _{ij}* denotes the elements of a spatial contiguity matrix,

*V*, which will be defined and discussed in

*Materials and Methods*. Equation (10) is the common mathematical form of Moran’s index.

The theoretical eigen equation of Moran’s index can be derived from the abovementioned definitions. Equation (9) multiplied left by *z* on both sides of the equal sign yields(11)where(12)can be termed the Ideal Spatial Weights Matrix (ISWM) in a theoretical sense. In equation (11), * z* is the eigenvector (characteristic vector) of *M** and Moran’s index is the corresponding maximum eigenvalue (characteristic root) in terms of the absolute value. According to equation (6), normalizing * z* leads to . It can be proved that the diagonal of *M** provides the Local Indicators of Spatial Association (LISA), that is, the local Moran’s index defined by Anselin [31]. The entries of *M**’s diagonal can be generally expressed as(13)where *I _{i}* refers to the local Moran’s index. Accordingly,

*I*denotes the global Moran’s index. In fact, due to

*w*=

_{ij}*w*, for arbitrary

_{ji}*n*, equation (12) can be expanded as follows

Comparing equation (14) with equation (13) shows that the elements in the diagonal of *M*^{*} give the local Moran’s index. The trace of *M*^{*} is just equal to the global Moran’s index.

### Actual Spatial Weights Matrix

The practical spatial weights matrix is different from the ISWM. In practice of spatial analysis, the matrix *zz*^{T} in equation (11) can be replaced by *z*^{T}*z* = *n*. It can be shown that *n* is the maximum eigenvalue of the matrix *zz*^{T}, and *z* is the corresponding eigenvector. Considering equation (6), for arbitrary *n*, we have(15)

Expanding equation (15) yields(16)

For example, for *n* = 2, the extended form of equation (15) is(17)

This illustrates that *n* is one of the eigenvalues of the matrix *zz*^{T} corresponding to the eigenvector *z*. Further, it can be shown that *n* is the maximum eigenvalue of *zz*^{T}. For a square matrix, the trace of *zz*^{T} is(18)where T_{r} refers to “finding the trace (of *zz*^{T})”. If *λ*_{1} = *λ*_{max} = *n* as given, then

(19)For arbitrary *n*, the extended form of *zz*^{T} is below(20)

According to the Cayley-Hamilton theorem, the eigenvalues of any *n*-by-*n* matrix are identical to the roots of a polynomial equation. For example, for *n* = 2, the characteristic polynomial of the matrix *zz*^{T} is(21)where *E* denotes the identity/unit matrix. Thus

(22)The conclusion can be drawn that the maximum eigenvalue of matrix *zz*^{T} is its dimension.

Substituting the maximum eigenvalue *n* for the corresponding matrix *zz*^{T} in equation (11) will provide a new mathematical relationship. In fact, the precondition that equation (10) comes into existence is as below(23)

In other words, from equation (23) it follows equation (10). Apparently, Moran’s index is the maximum eigenvalue of *nW*, and *y* is the corresponding eigenvector, which can be normalized as . Equation (23) divided by the standard error *σ* on both sides yields(24)

This leads to the following scaling relationship:(25)where

(26)This is the Real Spatial Weights Matrix (RSWM) in the sense of application or practice. What is referred to as the “spatial weights matrix” in the literature is just RSWM rather than *W* or ISWM. The trace of the matrix *nW* is the eigenvalue with the minimum absolute value, i.e. T_{r}(*nW*) = 0. Normalizing the eigenvector yields(27)

If we employ mathematical software such as Matlab and Mathcad to calculate the eigenveactor of *zz*^{T}*W* or *nW*, the result will be *z*° instead of *z*. Comparing equation (25) with equation (11) shows(28)

This suggests that when the eigenvector *z* is multiplied by *W* on the left side, it will remain the eigenvector of *zz*^{T}. Thus we have(29)in which 0 refers to the zero/null vector. However, equation (29) cannot occur unconditionally. In empirical analysis, the null vector should be replaced by a residual vector. An approximation relation is as follows(30)where the arrow “→” denotes “infinitely approach to” or “be theoretically equal to”. Empirically, there are always errors between *M = **z*^{T}*z**W* and *M* = **z**z*^{T}*W*, which will lead to a new approach to testing the spatial autocorrelation analysis. If the spatial autocorrelation is very strong, *M**z* will be a very close approximation to *M***z*; otherwise, the two vectors will be significantly different.

The concept of invariance in the process of mathematical transform is very important for geographical modeling. Equation (11) and Equation (25) are two eigenequations that suggest some invariance in the mathematical transform. The invariance in a transform suggests some invariance behind change or some robustness in the process of spatio-temporal evolution of a system. A characteristic equation denotes a scaling relationship of matrices. The invariance in change or transform implies some kind of symmetry, which indicates some law of conservation. In geography, symmetry is an essential criterion for model building, method choice, and parameter estimation [32]. Moran’s index is a quantity of invariance in mathematical transform, so it is a very basic and significant parameter for spatial analysis.

### Improved Version of Moran’s Scatterplot

The analytical process of spatial autocorrelation can be developed using the mathematical expressions proposed above. In order to find new approaches to evaluating Moran’s index and improve the Moran scatterplot, two vectors based on spatial weights matrix are defined as below(31)(32)where the relationship between *z* and *f*^{*} suggests the theoretical autocorrelation trend, i.e., the regression line, and the dataset of *z* and *f*, denotes the actual autocorrelation pattern, i.e. the points on the scatter diagram. The residuals of spatial autocorrelation can be defined as(33)where ef refers to the errors of the spatial autocorrelation. The squared sum of the residuals Sf is

(34)The value of *e _{f}* fluctuates around 0; therefore, the

*S*value approaches zero. A standard error can be defined in the form(35)in which

_{f}*s*refers to the standard error between the variables

_{f}*f*and

*f*

^{*}.

The error sum of square can be equivalently expressed in another form. From equation (11) it follows the observed values of *z* as below:(36)

Correspondingly, the predicted values of *z* can be given by(37)

Thus another residual vector is as below:(38)where *e _{z}* denotes the residual vector of

*z*and

*z*

^{*}. Obviously, the two residual vectors,

*e*and

_{z}*e*, are equivalent to one another. The squared sum of the residuals

_{f}*e*is

_{z}(39)Accordingly, another standard error can be defined as follows(40)in which *s _{z}* refers to the standard error between the variables

*z*and

*z*

^{*}. This implies that the two standard errors,

*s*and

_{f}*s*, are equivalent to each other.

_{z}After evaluating Moran’s index, it can be tested in two ways. First, the series of residuals should satisfy the normal distribution, which shows a bell-shaped curve or histogram. If this distribution does not occur, the spatial weights matrix must be adjusted or the weight function must be changed. Second, the standard error between *f* and *f*^{*} should be less than 0.15, that is, *s _{f}*<0.15, which is an empirical value. The standard errors and the histograms of residuals make two test approaches of spatial autocorrelation.

Moran’s scatterplot can be amended and developed using the above equations. The Moran scatterplot proposed by Anselin [31] plays an important part in spatial autocorrelation analysis. If *y* represents the *x*-axis, and *nWy* represents the *y*-axis, a conventional Moran scatterplot can be created. The scatterplot can be improved as follows. First, a trend line can be added to the plot. Second, the variables can be standardized, and *x* or *y* can be replaced by *z*. Based on equations (31) and (32), the Moran scatterplot can be bettered so that it will illustrate spatial autocorrelation more efficiently. The plot of *f*^{*} vs. *z* shows a set of ordered data points, which make a straight line, while the plot of *f* vs. z displays a set of randomly scattered data points. Superimposing the two plots onto each other yields an improved scatter diagram for Moran’s *I*. In the revised plot, the coordinates (*z _{i}*,

*f*

_{i}^{*}) represent the ideal locations that form a trend line, while (

*z*,

_{i}*f*) represent the actual locations of data points that are irregularly scattered. The slopes of the trend lines of (

_{i}*z*,

_{i}*f*

_{i}^{*}) and (

*z*,

_{i}*f*) indicate Moran’s index. Thus, the geometric meaning of the Moran scatterplot becomes clear. Moreover, an inverse Moran scatterplot can be defined based on equations (36) and (37). In the inverse scatterplot, the abscissa (

_{i}*x*-axis) is represented by

*f*

^{*}, and the ordinate (

*y*-axis) is represented by

*z*and

*z*

^{*}. The coordinates (

*f*

_{i}^{*},

*z*) indicates the ideal locations which form a straight line, while (

_{i}*f*

_{i}^{*},

*z*

_{i}^{*}) denotes the actual locations of data points which are irregularly scattered. A dual relationship can be found between the Moran scatterplot and the inverse Moran scatterplot.

## Discussion

### Revision of Moran’s Index and Geary’s Coefficient

Moran’s index is only one of the many spatial autocorrelation measurements in geographical analysis. Another important measurement is Geary’s coefficient [16]. In theory, Moran’s index can be associated with Geary’s coefficient. However, the former cannot be directly converted into the latter because that the bases of definitions of the two spatial statistics are different. Geary’s coefficient is defined in the equation below:(41)in which is the mean of *x*, and *v _{ij}* is the spatial contiguity measurement (See

*Materials and Methods*). Equation (41) can be rewritten as(42)where

*s*refers to a sample standard deviation (SSD), and

*Z*, to the corresponding standardized vector, that is

This differs from the definition of Moran’s index, which, as indicated above, is based on PSD. The traditional Moran’s index is used to describe spatial population, while traditional Geary’s coefficient is for describing a spatial sample.

The base of spatial analysis can be divided into two cases: spatial population and spatial sample. Based on spatial sample, Moran’s index can be revised as(43)which suggests that there is a linear scaling relationship between the population-based Moran’s index and the sample-based Moran’s index. On the other hand, based on a spatial population, Geary’s coefficient can be redefined as

(44)Rearranging equation (44) yields(45)

Defining a parameter such as(46)we have(47)which reflects the relation between Moran’s index and the Geary’s coefficient based on spatial population. Since *I* ranges from −1 to 1, *C** will have a value between 0 and 2. The threshold value of Geary’s coefficient is *C _{t}*

^{*}= 1, where the subscript “t” refers to “threshold”; therefore, the threshold value of Moran’s index is

*I*= 0, indicating no spatial autocorrelation. In light of equation (47), if

_{t}*C*

^{*}<1, then

*I*>0, and then there will be a positive spatial autocorrelation (PSAC); If

*C*

^{*}>1, then

*I*<0, then there will be a negative spatial autocorrelation (NSAC).

Similarly, for a spatial sample, the mathematical relationship between the revised Moran’s index and Geary’s coefficient can be derived as follows:(48)where

(49)This suggests that, for a spatial sample, the threshold value of Geary’s coefficient is *C _{t}* = (

*n*−1)/

*n*, which implies no autocorrelation in a spatial process. If

*C*<(

_{t}*n*−1)/

*n*, then

*I*

**>0, there will be a PSAC; If**

^{*}*C*>(

_{t}*n*−1)/

*n*, then

*I*

**<0, and there will be a NSAC.**

^{*}### Thresholds of Moran’s Index and Geary’s Coefficient

A long-standing problem about the threshold values of Moran’s index and Geary’s coefficient, which suggest no spatial autocorrelation, should be solved. For a spatial population, the critical value of Moran’s index is *I*_{t} = 0, and the threshold of Geary’s coefficient is *C*_{t}^{*} = 1. These are indisputable. However, for a spatial sample, the consensus has not yet been reached so far. In many previous works, the threshold value of Moran’s index was regarded as *I*_{t}^{*} = 1/(1-*n*) [10]. If this were true, then, according to equation (43), the threshold of Moran’s index for population would be *I*_{t} = −*n*/(*n*−1)^{2}; According to equation (47), the threshold of Geary’s coefficient for population would be *C*_{t}^{*} = [(*n*−1)^{2}+*n*]/(*n*−1)^{2}; According to equation (44) or equation (48), the threshold of Geary’s coefficient for sample would be *C*_{t} = [(*n*−1)^{2}+*n*]/[*n*(*n*−1)]. However, based on Pearson’s correlation coefficient, Moran’s index will indicate null autocorrelation if and only if its value equals zero. In other words, for populations, the threshold of Moran’s index must be *I*_{t} = 0 rather than other values. Then, according to equation (43), we have *I*_{t}^{*} = 0 for samples; according to equation (47), we have *C*_{t}^{*} = 1 for populations; according to equation (44) or equation (48), we have *C*_{t} = (*n*−1)/*n* for samples. A comparison between the new values and the traditional results of autocorrelation thresholds can be drawn as follows (Table 1).

## Materials and Methods

### Four Approaches to Moran’s Index

The traditional method of evaluating Moran’s index is so complicated that it is difficult for learners to make spatial analyses using the autocorrelation measurement. Based on the new framework of spatial autocorrelation expressed through linear algebra, especially, equations (11), (25), (31), and (32), four simple approaches to computing Moran’s index are proposed as follows. The first is a three-step calculation method, the second is the matrix scaling method, the third is the linear regression method, and the fourth is the standard deviation method. The four approaches are theoretically equivalent to one another. However, in practice, each method has its own advantages and disadvantages (Table 2).

#### (1) Three-step calculation method.

This is the basic method, which can be readily mastered by the beginners of spatial autocorrelation analysis. Based on the standardized vector *z* and the spatial weights matrix *W*, the three steps of calculating Moran’s index are as follows. Step 1: standardize the variable. In other words, convert the initial vector in equation (1) into the standardized vector in equation (5). According to the original definition of Moran’s index [15], [33], the PSD rather than the SSD should be used to standardize the data. Step 2: calculate the USWM. The weights matrix is defined in equations (7) and (8). Step 3: compute Moran’s index. According to equation (9), the USWM is first left multiplied by the transposition of *z*, and then the product of *z*^{T} and *W* is right multiplied by *z*. The final product of the continued multiplication is the value of Moran’s index (see Files S1 for details).

#### (2) Matrix scaling method.

It can also be termed “characteristic value method”. The key is to find the maximum eigenvalue of the matrix *M*^{*} = *zz*^{T}*W* or *M* = *nW* by using equation (11) or equation (25). In fact, if *M*^{*} is obtained through equation (12), the local Moran’s index can also be calculated. The values of the principal diagonal elements are just the local Moran’s indexes. The trace of *M*^{*} is actually the global Moran’s index which can be determined by the equation below:(50)where T_{r} denotes finding the sum of the numbers in the principal diagonal of a matrix.

#### (3) Regression analysis method.

Linear regression can be employed to solve equations (31) and (32) and evaluate Moran’s index. Suppose that *z* acts as an independent variable (i.e., argument), and *f*^{*} = *M*^{*}*z* or *f* = *Mz* as the corresponding dependent variable (response variable). A regression analysis can be conducted by letting the constant equal zero. The regression coefficient (slope) gives the value of Moran’s index.

#### (4) Standard deviation method.

It can be proved that the PSD of *f*^{*} is just the absolute value of Moran’s index. In terms of equation (31), the average value of *f*^{*} is zero since the mean of *z* is zero. Considering equation (6), the population variation of *f*^{*} is(51)

Thus the value of Moran’s index can be given by(52)

This suggests an alternative approach to estimating Moran’s index.

The first method is the most important one in this framework of spatial autocorrelation analysis. The second, third, and fourth ones are in fact derived from the first method. The process of data preparation, parameter estimation, and analysis of results based on Moran’s index can be illustrated with a flow chart (Figure 1). As an alternative measurement of Moran’s index, Geary’s coefficient can be utilized to make a spatial analysis. A new approach comprising five steps to computing Geary’s coefficient has been found by analogy with the first approach to evaluating Moran’s index. However, compared with determining Moran’s index, calculating Geary’s coefficient is complex to some extent. The first step is to standardize data using the formula *z*^{Δ} = (*x*-*μ*)/*s*, where *s* denotes SSD. The second step is to the compute the squares of difference using the formula *Z _{ij}* = (

*z*

_{i}^{Δ}-

*z*

_{j}^{Δ})

^{2}. The results compose a matrix

*Z*= [(

*z*

_{i}^{Δ}-

*z*

_{j}^{Δ})

^{2}] = [

*Z*]. The third step is to transform the spatial contiguity matrix (

_{ij}*V*) into the spatial weights matrix (

*W*=

*V*/

*W*

_{0}). The fourth step is to calculate the sum of the products of the algebraic quantities of

*W*and the numeric quantities of

*Z*using the following formula

*S*= ∑∑

*w*. The fifth step is to evaluate Geary’s coefficient using the formula

_{ij}Z_{ij}*C*=

*S*/2.

### Empirical Analysis

The improved framework of spatial autocorrelation can be applied to China’s cities to make an example of methodological practice. For simplicity, only the capital cities of the 31 provinces, autonomous regions, and municipalities directly under the Central Government of China are considered in this case study (Figure 2). The urban population is employed as a size measurement, while the distances by train between any two cities is used as a spatial contiguity measurement. The census data of the urban population in 2000 are available from the Chinese website (http://pdfdown.edu.cnki.net), and the railroad distance matrix can be found in many Chinese road atlases (see File S2 for details). Because the two cities of Haikou and Lhasa are not connected to the network of Chinese cities by railway in 2000, 29 cities are actually included in the dataset. In other words, the size of the spatial sample is *n* = 29 (Table 2).

In order to construct a spatial weights matrix, a spatial contiguity matrix must be created by using a weight function [21], [34]. For *n* elements in a geographic system, a spatial contiguity matrix, *V*, can be expressed as(53)in which *v _{ij}* is a measure used to compare and judge the degree of nearness or the contiguous relationships between location

*i*and location

*j*(

*i*,

*j*= 1,2,…,

*n*). No matter what the entry

*v*equals, it will be converted into zero (for

_{ii}*i*=

*j*,

*v*≡0). Thus a spatial weights matrix,

_{ii}*W*, can be given by(54)where

The value *v _{ii}*≡0 results in the value

*w*≡0. It is clear that

_{ii}*W*is mathematically equivalent to

*W*

_{*}. In the literature,

*W*

_{*}is often used to represent the spatial weights matrix. However, I employ

*W*to make an empirical analysis of China’s cities because the models and methods presented in

*Results*are based on

*W*instead of

*W*

_{*}. Compared with

*W*

_{*},

*W*make the mathematical process of spatial autocorrelation become simple and graceful.

Three types of functions can be used as a spatial weight function: inverse power function, negative exponential function, and staircase functions [21]. In this case, two functions will be adopted. The first is the inverse power function, which indicates action at a distance or global correlation in geography. Generally, the inverse powerfunction is in the form below:(55)where *r _{ij}* refers to the distance between location

*i*and location

*j*, and

*b*denotes the distance decay coefficient (generally,

*b*= 1). This kind of weight function comes from the impedance function of the gravity model [7]. Cliff and Ord [4], [35] used this function to construct the spatial congruity matrix. The second function is the negative exponential function indicative of quasi-global correlation or even quasi-local correlation [27]. The exponential function can be expressed as(56)where denotes the average distance between any two locations, and it can be defined as the arithmetic mean of all the numbers in a distance matrix. The exponential function can be derived from the entropy- maximizing model proposed by Wilson [36].

The four approaches discussed above can be employed to estimate Moran’s index of the 29 Chinese cities. The process of the basic computation is described below. Using equations (1)–(5), we can standardize the vector of the urban population sizes of the 29 cities *x* and yield *z*. Then, applying equations (55) or (56) to the matrix of railway distances between the 29 cities yields a spatial contiguity matrix, indicated by equation (53). Let the entries of diagonals, *v _{ii}*, equal zero (

*v*= 0). By referring to the method in which equation (53) transforms into equation (54), we can convert the spatial contiguity matrices (

_{ii}*V*) into the spatial weight matrices (

*W*) based on railway distance. Finally, equation (9), or (11), or (31) is utilized to calculate Moran’s index. The local Moran index, LISA, can be determined by equation (12) or (13). The diagonal elements of

*M*

^{*}=

*zz*

^{T}

*W*are simply the value of LISA. The main results are displayed in Table 3.

It has been demonstrated that the thresholds of Moran’s index for both populations and samples are zero (*Discussion*). Based on the inverse power function, equation (55), the value of Moran’s index is *I*≈−0.0315<0, which implies a weak negative autocorrelation; Based on the exponential function, equation (56), the index value is *I*≈−0.0410<0, which also suggests a weak negative autocorrelation between the capital cities. These results are consistent with the conclusion drawn from the error sums of square. Using equations (34) and (35), we can compute the squared sum of the residuals between *f* and *f*^{*} and the corresponding standard errors. Based on the power function, the error sum of square is around *S _{f}* = 1.2570, the standard error is about

*s*= 0.2082; Based on the exponential function, we have

_{f}*S*= 0.1878 as well as

_{f}*s*= 0.0805. These results show that, for the city sizes of China in 2000, the spatial autocorrelation based on the exponential function (confidence level is greater than 99%) is more significant than the correlation based on the power function (confidence level is greater than 59%). This suggests that China’s cities are locally spatial autocorrelation rather than global spatial autocorrelation.

_{f}Moran’s scatterplots can be employed to make an analysis of spatial autocorrelation. It is easy to create Moran’s scatterplots and the inverse Moran’s scatterplots by means of the data in Table 3. Using *z* to represent *x*-axis, and *f* and *f*^{*} to represent *y*-axis, we can draw the revised Moran’s scatterplots, which are displayed in Figure 3. In the plot, the coordinates of observed values (*z*, *f*) yield the scatter points, while the coordinates of expected values (*z*, *f*^{*}) give the trend line. Accordingly, using *f*^{*} to represent *x*-axis, and *z* and *z*^{*} to represent *y*-axis, we can generate the inverse Moran’s scatterplots, which are shown in Figure 4. In this plot, the coordinates of expected values (*f*^{*}, *z*^{*}) give the scatter points, while the coordinates of observed values (*f*^{*}, *z*) yield the trend line. Obviously, the inverse Moran’s plot is the mirror image of a Moran’s plot. In other words, a Moran’s scatterplot and its inverse scatterplot are reciprocal to one another. The reciprocal of the slope of the trend line in an inverse scatterplot equals the value of Moran’s index.

[a. Based on inverse power function; b. Based on negative exponential function].

[a. Based on inverse power function; b. Based on negative exponential function].

By employing the values of LISA and the revised or inverse Moran scatterplots, we can find the patterns of the spatial autocorrelation of China’s cities in 2000. According to Moran’s scatterplots, spatial autocorrelation falls into four types: the high-high correlation (H-H type: e.g. Tianjin, Shenyang, Harbin) in the first quadrant, the low-high correlation (L-H type: e.g. Hangzhou, Hefei, Nanchang) in the second quadrant, the low-low correlation (L-L type: e.g. Lanzhou, Xining, Yinchuan) in the third quadrant, and the high-low correlation (H-L type: e.g. Shanghai, Guangdong, Chongqing) in the fourth quadrant. The cluster result based on the power law is similar to that based on the exponential law; however, Beijing is an exception. In terms of the power function, Beijing belongs to the high-high type, but it falls into the high-low type in terms of the exponential function (Table 3). This indicates that, as a whole, spatial weight functions have no significantly different influence on the autocorrelation pattern of China’s cities. Going a step further, we can find the most prominent Chinese cities through the process of spatial autocorrelation. Several evidences show that the city of Shanghai seems to be an exceptional case in Moran’s scatterplots. First, the standardized sizes of two cities, Shanghai and Beijing, are greater than 2, the value of double standard deviation. Second, the maximum value of LISA belongs to Shanghai. Third, if we remove Shanghai from the sample, the trendline in Figure 3(a) will become nearly flat. These calculations show that the most prominent city of China in the spatial autocorrelation process is Shanghai followed by Beijing, Tianjin, and Hangzhou. This conclusion lends further support to the spatial correlation analysis of China’s cities [28].

If we examine the error frequency distributions, we can determine the difference between the effect of the inverse power function and that of the negative exponential function. By the results displayed in Table 3, the residual values of spatial autocorrelation can be calculated using equation (33). Then, the bar graphs of frequency distributions based on the power function, equation (55), and the exponential function, equation (56), can be illustrated as follows (Figure 5). The graphs are expected to be bell-shaped histograms. However, both of the histograms fall short of expectations, but the second one, shown in Figure 5(b), is closer to the bell curve than the first one, shown in Figure 5(a). Based on the power function, the squared sum of the errors between the real frequency distribution and the theoretical normal distribution is about 0.1963, while based on the exponential function, the corresponding error sum of square is about 0.1487. This seems to suggest that the negative exponential function is more appropriate for the spatial autocorrelation analysis of China’s cities than the inverse power function as a spatial weight function. As a matter of fact, the dataset consisting of only 29 elements is not large enough to form an unambiguous normal curve or bell histogram. The above-stated analytical process of normal histograms is just to demonstrate an application of a method, but the conclusion is for reference only.

(Note: These graphs are created using standardized error series. The filled bars represent the actual distributions based on observed values, while the unfilled bars with double frames represent the normal distributions based on the expected values, which form bell-shaped histograms.) [a. Based on inverse power function; b. Based on negative exponential function].

In practice, the spatial population and the spatial sample are relative. Whether a dataset is treated as sample or population depends on the aim of spatial analysis. For the above example, the 29 cities in China can be regarded as a spatial population or a sample. If we investigate the capital cities only, the set of cities can be thought of as a population; but if we examine all the cities of China by means of this subset of cities, the 29 cities will make up a sample. Applying the approaches to evaluating Moran’s index and Geary’s coefficient to the dataset of the 29 China’s cities yields a series of results of Moran’s index and Geary’s coefficient, which are tabulated below (Table 4). Clearly, the values of the different autocorrelation parameters based on different weight functions led to the same conclusion: weak negative spatial autocorrelation. This also suggests that if the size of dataset *n* is large enough, the parameter values based on SSD are not significantly different from those based on PSD.

### Conclusions

The significance of this work is that it provides a new approach to and a new way of understanding spatial autocorrelation analysis. In particular, based on the reformative expression of Moran’s index, the spatial autocorrelation can be associated with scaling analysis. By mathematical transform, the relationship between Moran’s index and the geographical scaling process can be revealed, and the results will be reported in an upcoming article. Now, three conclusions can be drawn from this study. **First, the spatial autocorrelation analysis can be simplified by means of matrix calculus**. Using equations of matrices, we can calculate Moran’s index or Geary’s coefficient by several steps with ease. If the global Moran’s index is evaluated, then the local Moran’s index can be incidentally obtained. What is more, based on the matrix expression, Moran’s scatterplot can be improved and the inverse Moran’s scatterplot can be put forward. **Second, the scopes of application of Moran’s index and Geary’s coefficient are different**. Moran’s index is based on spatial population, while Geary’s coefficient is based on spatial sampling results. If we plan to apply Moran’s index to spatial samples, the formula of Moran’s index should be revised; if we plan to apply Geary’s coefficient to spatial population, the formula of Geary’s coefficient should also be revised. **Third, the error frequency distribution can be employed to choose a spatial weight function**. One academic contribution of this paper to geographical analysis is the error formula of spatial correlation. More than one type of weight functions has been used to make spatial autocorrelation analysis; however, using the one resulting in an error frequency distribution which is more similar to Gaussian distribution is more advisable.

## Supporting Information

### File S1.

A simple approach to calculating Moran’s index using MS Excel.

https://doi.org/10.1371/journal.pone.0068336.s001

(DOCX)

### File S2.

Data sets and partial processes of computing Moran’s index and Geary’s coefficient.

https://doi.org/10.1371/journal.pone.0068336.s002

(XLSX)

## Acknowledgments

I would like to thank my student, Ms. Ruoxuan Xiong, for assistance in partial mathematical derivation of Moran’s index. I am also grateful to two anonymous reviewers whose interesting comments were very helpful in improving the paper’s quality.

## Author Contributions

Conceived and designed the experiments: YC. Performed the experiments: YC. Analyzed the data: YC. Contributed reagents/materials/analysis tools: YC. Wrote the paper: YC. Derived new formulae: YC. Designed the procedures of parameter computation using these formulae: YC.

## References

- 1. Anselin L (1995) Local indicators of spatial association–LISA. Geographical Analysis 27(2): 93–115.
- 2. Bivand RS, Müller W, Reder M (2009) Power calculations for global and local Moran’s I. Computational Statistics and Data Analysis. 53: 2859–2872.
- 3.
Cliff AD, Ord JK (1969) The problem of spatial autocorrelation. In: Scott AJ, editor. London Papers in Regional Science 1, Studies in Regional Science London: Poin. 25–55.
- 4.
Cliff AD, Ord JK (1981) Spatial Processes: Models and Applications London: Pion.
- 5. Getis A (2009) Spatial weights matrices. Geographical Analysis 41 (4): 404–410.
- 6.
Griffith DA (2003) Spatial Autocorrelation and Spatial Filtering: Gaining Understanding Through Theory and Scientific Visualization New York: Springer-Verlag.
- 7.
Haggett P, Cliff AD, Frey A (1977) Locational Analysis in Human Geography London: Edward Arnold.
- 8. Haining RP (2009) Spatial autocorrelation and the quantitative revolution. Geographical Analysis 41 (4): 364–374.
- 9.
Li H, Calder CA, Cressie N (2007) Beyond Moran’s
*I*: Testing for spatial dependence based on the spatial autoregressive model. Geographical Analysis 39(4): 357–375. - 10.
Odland J (1988)
*Spatial Autocorrelation*London: SAGE Publications. - 11. Sokal RR, Oden NL (1978) Spatial autocorrelation in biology. 1. Methodology. Biological Journal of the Linnean Society 10(2): 199–228.
- 12.
Sokal RR, Thomson JD (1987) Applications of spatial autocorrelation in ecology. In: Legendre P, Legendre L, editors.
*Developments in Numerical Ecology, NATO ASI Series, Vol. G14*Berlin: Springer-Verlag. 431–466. - 13.
Tiefelsdorf M (2002) The saddle point approximation of Moran’s
*I*and local Moran’s*I*reference distributions and their numerical evaluation Geographical Analysis. 34(3): 187–206._{i} - 14. Weeks JR, Getis A, Hill AG, Gadalla MS, Rashed T (2004) The fertility transition in Eqypt: Intraurban patterns in Cairo. Annals of the Association of American Geographers 94(1): 74–93.
- 15. Moran PAP (1948) The interpretation of statistical maps. Journal of the Royal Statistical Society, Series B 37(2): 243–251.
- 16. Geary RC (1954) The contiguity ratio and statistical mapping. The Incorporated Statistician 5(3): 115–145.
- 17. Beck J, Sieber A (2010) Is the spatial distribution of mankind’s most basic economic traits determined by climate and soil alone? PLoS ONE 5(5): e10416.
- 18. Benedetti-Cecchi L, Iken K, Konar B, Cruz-Motta J, Knowlton A, et al. (2010) Spatial relationships between polychaete assemblages and environmental variables over broad geographical scales. PLoS ONE 5(9): e12946.
- 19. Bonnot N, Gaillard J-M, Coulon A, Galan M, Cosson J-F, et al. (2010) No difference between the sexes in fine-scale spatial genetic structure of roe deer. PLoS ONE 5(12): e14436.
- 20. Braun A, Auerswald K, Geist J (2012) Drivers and spatio-temporal extent of hyporheic patch variation: Implications for sampling. PLoS ONE 7(7): e42046.
- 21. Chen YG (2012) On the four types of weight functions for spatial contiguity matrix. Letters in Spatial and Resource Sciences 5(2): 65–72.
- 22. Deblauwe V, Kennel P, Couteron P (2012) Testing pairwise association between spatially autocorrelated variables: A new approach using surrogate lattice data. PLoS ONE 7(11): e48766.
- 23. Impoinvil DE, Solomon T, Schluter WW, Rayamajhi A, Bichha RP, et al. (2011) The spatial heterogeneity between Japanese encephalitis incidence distribution and environmental variables in Nepal. PLoS ONE 6(7): e22192.
- 24. Kumar C, Singh PK, Rai RK (2012) Under-five mortality in high focus states in India: A district level geospatial analysis. PLoS ONE 7(5): e37515.
- 25. Mateo-Tomás P, Olea PP (2010) Anticipating knowledge to inform species management: predicting spatially explicit habitat suitability of a colonial vulture spreading its range. PLoS ONE 5(8): e12374.
- 26. Stark JH, Sharma R, Ostroff S, Cummings DAT, Ermentrout B, et al. (2012) Local spatial and temporal processes of influenza in Pennsylvania, USA: 2003–2009. PLoS ONE 7(3): e34245.
- 27.
Chen YG (2008) A wave-spectrum analysis of urban population density: entropy, fractal, and spatial localization. Discrete Dynamics in Nature and Society Vol.2008, Article ID 728420, 22 pages.
- 28. Chen YG (2009) Urban gravity model based on cross-correlation function and Fourier analyses of spatio-temporal process. Chaos, Solitons & Fractals 41(2): 603–614.
- 29. Chen YG (2011) Modeling fractal structure of city-size distributions using correlation functions. PLoS ONE 6(9): e24791.
- 30. Chen YG, Jiang SG (2010) Modeling fractal structure of systems of cities using spatial correlation function. International Journal of Artificial Life Research 1(1): 12–34.
- 31.
Anselin L (1996) The Moran scatterplot as an ESDA tool to assess local instability in spatial association. In: Fischer M, Scholten HJ, Unwin D, editors. Spatial Analytical Perspectives on GIS London: Taylor & Francis. 111–125.
- 32. Chen YG (2013) Fractal analytical approach of urban form based on spatial correlation function. Chaos, Solitons & Fractals 49(1): 47–60.
- 33. Moran PAP (1950) Notes on continuous stochastic phenomena. Biometrika 37(1–2): 17–33.
- 34. Getis A, Ord JK (1992) The analysis of spatial association by use of distance statistic. Geographical Analysis 24(3): 189–206.
- 35.
Cliff AD, Ord JK (1973) Spatial Autocorrelation London: Pion.
- 36.
Wilson AG (1970) Entropy in Urban and Regional Modelling London: Pion Press.