New Approaches for Calculating Moran’s Index of Spatial Autocorrelation

Spatial autocorrelation plays an important role in geographical analysis; however, there is still room for improvement of this method. The formula for Moran’s index is complicated, and several basic problems remain to be solved. Therefore, I will reconstruct its mathematical framework using mathematical derivation based on linear algebra and present four simple approaches to calculating Moran’s index. Moran’s scatterplot will be ameliorated, and new test methods will be proposed. The relationship between the global Moran’s index and Geary’s coefficient will be discussed from two different vantage points: spatial population and spatial sample. The sphere of applications for both Moran’s index and Geary’s coefficient will be clarified and defined. One of theoretical findings is that Moran’s index is a characteristic parameter of spatial weight matrices, so the selection of weight functions is very significant for autocorrelation analysis of geographical systems. A case study of 29 Chinese cities in 2000 will be employed to validate the innovatory models and methods. This work is a methodological study, which will simplify the process of autocorrelation analysis. The results of this study will lay the foundation for the scaling analysis of spatial autocorrelation.


Introduction
The theory of spatial autocorrelation has been a key element of geographical analysis for more than twenty years. A number of measurements of spatial correlation were proposed so that we can investigate the spatial process of geographical evolution from differing points of view [1][2][3][4][5][6][7][8][9][10][11][12][13][14]. Although there are various correlation measurements, two are commonly used. One is Moran's index [15], and the other, is Geary's coefficient [16]. The former is a generalization of Pearson's correlation coefficient, and the latter is analogous to the Durbin-Watson statistic of regression analysis. In theory, Moran's index is somewhat equivalent to Geary's coefficient and they can be substituted for one another. However, in practice, Moran's index cannot be replaced by Geary's coefficient and vice versa due to a subtle difference of statistical treatment. Compared with Geary's coefficient, Moran's index is more significant to spatial analysis.
Today, the concepts and methods of spatial autocorrelation have been applied to many fields, which have resulted in a number of interesting findings [17][18][19][20][21][22][23][24][25][26]. However, despite its long history, many basic and important questions remain to be answered. For example, we still don't know how to determine the spatial contiguity matrix objectively [21]. The relationships between Moran's index and Geary's coefficient are still unclear. In fact, spatial autocorrelation is a special case of the spatial correlation function. This correlation in geographic systems is often associated with the scaling process [27][28][29][30]. However, so far, spatial autocorrelation has not been linked to scaling laws. It is necessary to find new ways of understanding and implementing spatial autocorrelation analysis in order to solve many theoretical and methodological problems in this field. Before this aim can be accomplished, a simple and general framework must be constructed for this theory.
In this paper, a new way is proposed to express and estimate Moran's index. Geary's coefficient can be re-expressed in a new form along with other related measurements. By doing so, we can further develop the analytical process of spatial autocorrelation. The remaining sections of the paper are organized as follows. In Section 2, I will reconstruct the spatial weight matrix and Moran's index, and improve Moran's scatterplot in terms of the mathematical processes in this study. In Section 3, the association of Moran's index with Geary's coefficient is discussed. The scopes of application of the two parameters are defined for geographical analysis. In Section 4, I will introduce four approaches to calculating Moran's index and a simple approach to computing Geary's coefficient. The principal cities of China, including national capital and provincial capitals, are taken as a case to show how to use the methods advanced in this article. Partial processes and results of computing are show in Files S1 and S2. Finally, the paper is concluded with a brief summary. There are several innovative aspects of this study. First, the mathematical expressions are regularized, and new methods of computing spatial autocorrelation measurements are proposed. Second, the implication of Moran's index as a scaling parameter is reviewed. Third, the similarities and differences between Moran's index and Geary's coefficient are clarified. Especially, the threshold values of the two parameters indicating no spatial autocorrelation are corrected for spatial samples.

Reconstructing Moran's Index and Spatial Weights Matrix
The first basic measurement of spatial autocorrelation is Moran's index, which came about as a result of Pearson's correlation coefficient in general statistics [15]. Generalizing Pearson's cross-correlation coefficient of two samples to the autocorrelation coefficient of one sample, and then generalizing the 1-dimensional autocorrelation coefficient from time series to the 2-dimensional autocorrelation coefficient about spatial distribution by substituting the weighting function for the lag parameter, we can obtain the formula of Moran's index. Correspondingly, we can derive Geary's coefficient from the idea of spatial autocorrelation by analogy with Watson-Durbin's statistic [16]. It will be demonstrated that Moran's index is based on populations, while Geary's coefficient is based on samples. Moran's index is in fact a standardized spatial auto-covariance, which can be simply reinterpreted with linear algebra. Suppose there are n elements (e.g., cities) in a system (e.g., a network of cities) which can be measured by a variable (e.g., city size), x. A vector can be defined in the equation below: where x i is a size measurement of the ith element (i = 1,2,…,n). The mean of x i is given in the following equation: The centralized variable can be calculated by where m represents the average value of a vector consisting of n algebraic/numeric quantities. The population variance is as below: where s is the population standard deviation (PSD). The result of scaling transform of the centralized variable forms a standardized vector as follows which is termed z-score in statistics. It can be shown that the norm of z, i.e., the length of the vector, z k k, exactly equals the dimension of the system, i.e., the number of elements in the system, n. Thus we have Based on the equations prepared above, Moran's index can be reconstructed in a simple way. Suppose there is an n-by-n unitary spatial weights matrix (USWM) such as The three properties of the matrix are as follows: (1) Symmetry, i.e., w ij = w ji ; (2) Zero diagonal elements, namely, |w ii | = 0, which implies that the entries in the diagonal are all 0; (3) Normalization condition, that is Then Moran's index can be expressed in the quadratic form: which is simple and more convenient than the conventional expression of Moran's index for the mathematical transform in this context. Expanding equation (9) yields the original formula of Moran's index and provides an autocorrelation coefficient defined in 2-dimensional space in which v ij denotes the elements of a spatial contiguity matrix, V, which will be defined and discussed in Materials and Methods. Equation (10) is the common mathematical form of Moran's index.
The theoretical eigen equation of Moran's index can be derived from the abovementioned definitions. Equation (9) multiplied left by z on both sides of the equal sign yields where can be termed the Ideal Spatial Weights Matrix (ISWM) in a theoretical sense. In equation (11), z is the eigenvector (characteristic vector) of M* and Moran's index is the corresponding maximum eigenvalue (characteristic root) in terms of the absolute value. According to equation (6), normalizing z leads to z= ffiffi ffi n p . It can be proved that the diagonal of M* provides the Local Indicators of Spatial Association (LISA), that is, the local Moran's index defined by Anselin [31]. The entries of M*'s diagonal can be generally expressed as where I i refers to the local Moran's index. Accordingly, I denotes the global Moran's index. In fact, due to w ij = w ji , for arbitrary n, equation (12) can be expanded as follows According to the Cayley-Hamilton theorem, the eigenvalues of any n-by-n matrix are identical to the roots of a polynomial equation. For example, for n = 2, the characteristic polynomial of the matrix zz T is where E denotes the identity/unit matrix. Thus The conclusion can be drawn that the maximum eigenvalue of matrix zz T is its dimension. Substituting the maximum eigenvalue n for the corresponding matrix zz T in equation (11) will provide a new mathematical relationship. In fact, the precondition that equation (10) comes into existence is as below In other words, from equation (23) it follows equation (10). Apparently, Moran's index is the maximum eigenvalue of nW, and y is the corresponding eigenvector, which can be normalized as z= ffiffi ffi n p . Equation (23) divided by the standard error s on both sides yields nW y s~I This leads to the following scaling relationship: where This is the Real Spatial Weights Matrix (RSWM) in the sense of application or practice. What is referred to as the ''spatial weights matrix'' in the literature is just RSWM rather than W or ISWM. The trace of the matrix nW is the eigenvalue with the minimum absolute value, i.e. T r (nW) = 0. Normalizing the eigenvector yields If we employ mathematical software such as Matlab and Mathcad to calculate the eigenveactor of zz T W or nW, the result will be zu instead of z. Comparing equation (25) with equation (11) shows This suggests that when the eigenvector z is multiplied by W on the left side, it will remain the eigenvector of zz T . Thus we have in which 0 refers to the zero/null vector. However, equation (29) cannot occur unconditionally. In empirical analysis, the null vector should be replaced by a residual vector. An approximation relation is as follows where the arrow ''R'' denotes ''infinitely approach to'' or ''be theoretically equal to''. Empirically, there are always errors between M = z T zW and M* = zz T W, which will lead to a new approach to testing the spatial autocorrelation analysis. If the spatial autocorrelation is very strong, Mz will be a very close approximation to M*z; otherwise, the two vectors will be significantly different. The concept of invariance in the process of mathematical transform is very important for geographical modeling. Equation (11) and Equation (25) are two eigenequations that suggest some invariance in the mathematical transform. The invariance in a transform suggests some invariance behind change or some robustness in the process of spatio-temporal evolution of a system. A characteristic equation denotes a scaling relationship of matrices. The invariance in change or transform implies some kind of symmetry, which indicates some law of conservation. In geography, symmetry is an essential criterion for model building, method choice, and parameter estimation [32]. Moran's index is a quantity of invariance in mathematical transform, so it is a very basic and significant parameter for spatial analysis.

Improved Version of Moran's Scatterplot
The analytical process of spatial autocorrelation can be developed using the mathematical expressions proposed above. In order to find new approaches to evaluating Moran's index and improve the Moran scatterplot, two vectors based on spatial weights matrix are defined as below where the relationship between z and f * suggests the theoretical autocorrelation trend, i.e., the regression line, and the dataset of z and f, denotes the actual autocorrelation pattern, i.e. the points on the scatter diagram. The residuals of spatial autocorrelation can be defined as where ef refers to the errors of the spatial autocorrelation. The squared sum of the residuals Sf is The value of e f fluctuates around 0; therefore, the S f value approaches zero. A standard error can be defined in the form in which s f refers to the standard error between the variables f and f * . The error sum of square can be equivalently expressed in another form. From equation (11) it follows the observed values of z as below: Correspondingly, the predicted values of z can be given by Thus another residual vector is as below: where e z denotes the residual vector of z and z * . Obviously, the two residual vectors, e z and e f , are equivalent to one another. The squared sum of the residuals e z is Accordingly, another standard error can be defined as follows in which s z refers to the standard error between the variables z and z * . This implies that the two standard errors, s f and s z , are equivalent to each other. After evaluating Moran's index, it can be tested in two ways. First, the series of residuals should satisfy the normal distribution, which shows a bell-shaped curve or histogram. If this distribution does not occur, the spatial weights matrix must be adjusted or the weight function must be changed. Second, the standard error between f and f * should be less than 0.15, that is, s f ,0.15, which is an empirical value. The standard errors and the histograms of residuals make two test approaches of spatial autocorrelation.
Moran's scatterplot can be amended and developed using the above equations. The Moran scatterplot proposed by Anselin [31] plays an important part in spatial autocorrelation analysis. If y represents the x-axis, and nWy represents the y-axis, a conventional Moran scatterplot can be created. The scatterplot can be improved as follows. First, a trend line can be added to the plot. Second, the variables can be standardized, and x or y can be replaced by z. Based on equations (31) and (32), the Moran scatterplot can be bettered so that it will illustrate spatial autocorrelation more efficiently. The plot of f * vs. z shows a set of ordered data points, which make a straight line, while the plot of f vs. z displays a set of randomly scattered data points. Superimposing the two plots onto each other yields an improved scatter diagram for Moran's I. In the revised plot, the coordinates (z i , f i * ) represent the ideal locations that form a trend line, while (z i , f i ) represent the actual locations of data points that are irregularly scattered. The slopes of the trend lines of (z

Revision of Moran's Index and Geary's Coefficient
Moran's index is only one of the many spatial autocorrelation measurements in geographical analysis. Another important measurement is Geary's coefficient [16]. In theory, Moran's index can be associated with Geary's coefficient. However, the former cannot be directly converted into the latter because that the bases of definitions of the two spatial statistics are different. Geary's coefficient is defined in the equation below: in which x x~m is the mean of x, and v ij is the spatial contiguity measurement (See Materials and Methods). Equation (41) can be rewritten as where s refers to a sample standard deviation (SSD), and Z, to the corresponding standardized vector, that is This differs from the definition of Moran's index, which, as indicated above, is based on PSD. The traditional Moran's index is used to describe spatial population, while traditional Geary's coefficient is for describing a spatial sample.
The base of spatial analysis can be divided into two cases: spatial population and spatial sample. Based on spatial sample, Moran's index can be revised as which suggests that there is a linear scaling relationship between the population-based Moran's index and the sample-based Moran's index. On the other hand, based on a spatial population, Geary's coefficient can be redefined as Rearranging equation (44) yields Defining a parameter such as we have which reflects the relation between Moran's index and the Geary's coefficient based on spatial population. Since I ranges from 21 to 1, C* will have a value between 0 and 2. The threshold value of Geary's coefficient is C t * = 1, where the subscript ''t'' refers to ''threshold''; therefore, the threshold value of Moran's index is I t = 0, indicating no spatial autocorrelation. In light of equation (47), if C * ,1, then I .0, and then there will be a positive spatial autocorrelation (PSAC); If C * .1, then I ,0, then there will be a negative spatial autocorrelation (NSAC).
Similarly, for a spatial sample, the mathematical relationship between the revised Moran's index and Geary's coefficient can be derived as follows: where This suggests that, for a spatial sample, the threshold value of Geary's coefficient is C t = (n21)/n, which implies no autocorrelation in a spatial process. If C t ,(n21)/n, then I * .0, there will be a PSAC; If C t .(n21)/n, then I * ,0, and there will be a NSAC.

Thresholds of Moran's Index and Geary's Coefficient
A long-standing problem about the threshold values of Moran's index and Geary's coefficient, which suggest no spatial autocorrelation, should be solved. For a spatial population, the critical value of Moran's index is I t = 0, and the threshold of Geary's coefficient is C t * = 1. These are indisputable. However, for a spatial sample, the consensus has not yet been reached so far. In many previous works, the threshold value of Moran's index was regarded as I t * = 1/(1-n) [10]. If this were true, then, according to equation (43), the threshold of Moran's index for population would be I t = 2n/(n21) 2 ; According to equation (47) (Table 1).

Four Approaches to Moran's Index
The traditional method of evaluating Moran's index is so complicated that it is difficult for learners to make spatial analyses using the autocorrelation measurement. Based on the new framework of spatial autocorrelation expressed through linear algebra, especially, equations (11), (25), (31), and (32), four simple approaches to computing Moran's index are proposed as follows. The first is a three-step calculation method, the second is the matrix scaling method, the third is the linear regression method, and the fourth is the standard deviation method. The four approaches are theoretically equivalent to one another. However, in practice, each method has its own advantages and disadvantages (Table 2).
(1) Three-step calculation method. This is the basic method, which can be readily mastered by the beginners of spatial autocorrelation analysis. Based on the standardized vector z and the spatial weights matrix W, the three steps of calculating Moran's index are as follows.
Step 1: standardize the variable. In other words, convert the initial vector in equation (1) into the standardized vector in equation (5). According to the original definition of Moran's index [15,33], the PSD rather than the SSD should be used to standardize the data. Step 2: calculate the USWM. The weights matrix is defined in equations (7) and (8).
Step 3: compute Moran's index. According to equation (9), the USWM is first left multiplied by the transposition of z, and then the product of z T and W is right multiplied by z. The final product of the continued multiplication is the value of Moran's index (see File S1 for details).
(2) Matrix scaling method. It can also be termed ''characteristic value method''. The key is to find the maximum eigenvalue of the matrix M * = zz T W or M = nW by using equation (11) or equation (25). In fact, if M * is obtained through equation (12), the local Moran's index can also be calculated. The values of the principal diagonal elements are just the local Moran's indexes. The trace of M * is actually the global Moran's index which can be determined by the equation below: where T r denotes finding the sum of the numbers in the principal diagonal of a matrix.
(3) Regression analysis method. Linear regression can be employed to solve equations (31) and (32) and evaluate Moran's index. Suppose that z acts as an independent variable (i.e., argument), and f * = M * z or f = Mz as the corresponding dependent variable (response variable). A regression analysis can be conducted by letting the constant equal zero. The regression coefficient (slope) gives the value of Moran's index.
(4) Standard deviation method. It can be proved that the PSD of f * is just the absolute value of Moran's index. In terms of equation (31), the average value of f * is zero since the mean of z is zero. Considering equation (6), the population variation of f * is Thus the value of Moran's index can be given by This suggests an alternative approach to estimating Moran's index.
The first method is the most important one in this framework of spatial autocorrelation analysis. The second, third, and fourth ones are in fact derived from the first method. The process of data preparation, parameter estimation, and analysis of results based on Moran's index can be illustrated with a flow chart ( Figure 1). As an alternative measurement of Moran's index, Geary's coefficient can be utilized to make a spatial analysis. A new approach comprising five steps to computing Geary's coefficient has been found by analogy with the first approach to evaluating Moran's index. However, compared with determining Moran's index, calculating Geary's coefficient is complex to some extent. The first step is to standardize data using the formula z D = (x-m)/s, where s denotes SSD. The second step is to the compute the squares of difference using the formula Z ij = (z i Dz j D ) 2 . The results compose a matrix The third step is to transform the spatial contiguity matrix (V) into the spatial weights matrix (W = V/W 0 ). The fourth step is to calculate the sum of the products of the  algebraic quantities of W and the numeric quantities of Z using the following formula S = ggw ij Z ij . The fifth step is to evaluate Geary's coefficient using the formula C = S/2.

Empirical Analysis
The improved framework of spatial autocorrelation can be applied to China's cities to make an example of methodological practice. For simplicity, only the capital cities of the 31 provinces, autonomous regions, and municipalities directly under the Central Government of China are considered in this case study (Figure 2). The urban population is employed as a size measurement, while the distances by train between any two cities is used as a spatial contiguity measurement. The census data of the urban population in 2000 are available from the Chinese website (http://pdfdown. edu.cnki.net), and the railroad distance matrix can be found in many Chinese road atlases (see File S2 for details). Because the two cities of Haikou and Lhasa are not connected to the network of Chinese cities by railway in 2000, 29 cities are actually included in the dataset. In other words, the size of the spatial sample is n = 29 (Table 2).
In order to construct a spatial weights matrix, a spatial contiguity matrix must be created by using a weight function [21,34]. For n elements in a geographic system, a spatial contiguity matrix, V, can be expressed as in which v ij is a measure used to compare and judge the degree of nearness or the contiguous relationships between location i and location j (i, j = 1,2,…,n). No matter what the entry v ii equals, it will be converted into zero (for i = j, v ii ;0). Thus a spatial weights matrix, W, can be given by The value v ii ;0 results in the value w ii ;0. It is clear that W is mathematically equivalent to W * . In the literature, W * is often used to represent the spatial weights matrix. However, I employ W to make an empirical analysis of China's cities because the models and methods presented in Results are based on W instead of W * . Compared with W * , W make the mathematical process of spatial autocorrelation become simple and graceful.
Three types of functions can be used as a spatial weight function: inverse power function, negative exponential function, and staircase functions [21]. In this case, two functions will be adopted. The first is the inverse power function, which indicates action at a distance or global correlation in geography. Generally, the inverse powerfunction is in the form below: where r ij refers to the distance between location i and location j, and b denotes the distance decay coefficient (generally, b = 1). This kind of weight function comes from the impedance function of the gravity model [7]. Cliff and Ord [4,35] used this function to construct the spatial congruity matrix. The second function is the negative exponential function indicative of quasi-global correlation or even quasi-local correlation [27]. The exponential function can be expressed as where r r denotes the average distance between any two locations, and it can be defined as the arithmetic mean of all the numbers in Table 3. Classification of spatial autocorrelation based on population size of the principal cities in China (2000). a distance matrix. The exponential function can be derived from the entropy-maximizing model proposed by Wilson [36]. The four approaches discussed above can be employed to estimate Moran's index of the 29 Chinese cities. The process of the basic computation is described below. Using equations (1)-(5), we can standardize the vector of the urban population sizes of the 29 cities x and yield z. Then, applying equations (55) or (56) to the matrix of railway distances between the 29 cities yields a spatial contiguity matrix, indicated by equation (53). Let the entries of diagonals, v ii , equal zero (v ii = 0). By referring to the method in which equation (53) transforms into equation (54), we can convert the spatial contiguity matrices (V) into the spatial weight matrices (W) based on railway distance. Finally, equation (9), or (11), or (31) is utilized to calculate Moran's index. The local Moran index, LISA, can be determined by equation (12) or (13). The diagonal elements of M * = zz T W are simply the value of LISA. The main results are displayed in Table 3.
It has been demonstrated that the thresholds of Moran's index for both populations and samples are zero (Discussion). Based on the inverse power function, equation (55), the value of Moran's index is I<20.0315,0, which implies a weak negative autocorrelation; Based on the exponential function, equation (56), the index value is I<20.0410,0, which also suggests a weak negative autocorrelation between the capital cities. These results are consistent with the conclusion drawn from the error sums of square. Using equations (34) and (35), we can compute the squared sum of the residuals between f and f * and the corresponding standard errors. Based on the power function, the error sum of square is around S f = 1.2570, the standard error is about s f = 0.2082; Based on the exponential function, we have S f = 0.1878 as well as s f = 0.0805. These results show that, for the city sizes of China in 2000, the spatial autocorrelation based on the exponential function (confidence level is greater than 99%) is more significant than the correlation based on the power function (confidence level is greater than 59%). This suggests that China's cities are locally spatial autocorrelation rather than global spatial autocorrelation.
Moran's scatterplots can be employed to make an analysis of spatial autocorrelation. It is easy to create Moran's scatterplots and the inverse Moran's scatterplots by means of the data in Table 3. Using z to represent x-axis, and f and f * to represent y-axis, we can draw the revised Moran's scatterplots, which are displayed in Figure 3. In the plot, the coordinates of observed values (z, f) yield the scatter points, while the coordinates of expected values (z, f * ) give the trend line. Accordingly, using f * to represent x-axis, and z and z * to represent y-axis, we can generate the inverse Moran's scatterplots, which are shown in Figure 4. In this plot, the coordinates of expected values (f * , z * ) give the scatter points, while the coordinates of observed values (f * , z) yield the trend line. Obviously, the inverse Moran's plot is the mirror image of a Moran's plot. In other words, a Moran's scatterplot and its inverse scatterplot are reciprocal to one another. The reciprocal of the slope of the trend line in an inverse scatterplot equals the value of Moran's index.
By employing the values of LISA and the revised or inverse Moran scatterplots, we can find the patterns of the spatial autocorrelation of China's cities in 2000. According to Moran's scatterplots, spatial autocorrelation falls into four types: the high-high correlation (H-H type: e.g. Tianjin, Shenyang, Harbin) in the first quadrant, the low-high correlation (L-H type: e.g. Hangzhou, Hefei, Nanchang) in the second quadrant, the low-low correlation (L-L type: e.g. Lanzhou, Xining, Yinchuan) in the third quadrant, and the high-low correlation (H-L type: e.g. Shanghai, Guangdong, Chongqing) in the fourth quadrant. The cluster result based on the power law is similar to that based on the exponential law; however, Beijing is an exception. In terms of the power function, Beijing belongs to the high-high type, but it falls into the high-low type in terms of the exponential function (Table 3). This indicates that, as a whole, spatial weight functions have no significantly different influence on the autocorrelation pattern of China's cities. Going a step further, we can find the most prominent Chinese cities through the process of spatial autocorrelation. Several evidences show that the city of Shanghai seems to be an exceptional case in Moran's scatterplots. First, the standardized sizes of two cities, Shanghai and Beijing, are greater than 2, the value of double standard deviation. Second, the maximum value of LISA belongs to Shanghai. Third, if we remove Shanghai from the sample, the trendline in Figure 3(a) will become nearly flat. These calculations show that the most prominent city of China in the spatial autocorrelation process is Shanghai followed by Beijing, Tianjin, and Hangzhou. This conclusion lends further support to the spatial correlation analysis of China's cities [28].
If we examine the error frequency distributions, we can determine the difference between the effect of the inverse power function and that of the negative exponential function. By the results displayed in Table 3, the residual values of spatial autocorrelation can be calculated using equation (33). Then, the bar graphs of frequency distributions based on the power function, equation (55), and the exponential function, equation (56), can be illustrated as follows ( Figure 5). The graphs are expected to be bellshaped histograms. However, both of the histograms fall short of expectations, but the second one, shown in Figure 5(b), is closer to the bell curve than the first one, shown in Figure 5(a). Based on the power function, the squared sum of the errors between the real frequency distribution and the theoretical normal distribution is about 0.1963, while based on the exponential function, the corresponding error sum of square is about 0.1487. This seems to suggest that the negative exponential function is more appropriate for the spatial autocorrelation analysis of China's cities than the inverse power function as a spatial weight function. As a matter of fact, the dataset consisting of only 29 elements is not large enough to form an unambiguous normal curve or bell histogram. The above-stated analytical process of normal histograms is just to demonstrate an application of a method, but the conclusion is for reference only.
In practice, the spatial population and the spatial sample are relative. Whether a dataset is treated as sample or population depends on the aim of spatial analysis. For the above example, the 29 cities in China can be regarded as a spatial population or a sample. If we investigate the capital cities only, the set of cities can be thought of as a population; but if we examine all the cities of China by means of this subset of cities, the 29 cities will make up a sample. Applying the approaches to evaluating Moran's index and Geary's coefficient to the dataset of the 29 China's cities yields a series of results of Moran's index and Geary's coefficient, which are tabulated below (Table 4). Clearly, the values of the different autocorrelation parameters based on different weight functions led to the same conclusion: weak negative spatial autocorrelation. This also suggests that if the size of dataset n is large enough, the parameter values based on SSD are not significantly different from those based on PSD.

Conclusions
The significance of this work is that it provides a new approach to and a new way of understanding spatial autocorrelation analysis. In particular, based on the reformative expression of Moran's index, the spatial autocorrelation can be associated with scaling analysis. By mathematical transform, the relationship between Moran's index and the geographical scaling process can be revealed, and the results will be reported in an upcoming article. Now, three conclusions can be drawn from this study. First, the spatial autocorrelation analysis can be simplified by means of matrix calculus. Using equations of matrices, we can calculate Moran's index or Geary's coefficient by several steps with ease. If the global Moran's index is evaluated, then the local Moran's index can be incidentally obtained. What is more, based on the matrix expression, Moran's scatterplot can be improved and the inverse Moran's scatterplot can be put forward. Second, the scopes of application of Moran's index and Geary's coefficient are different. Moran's index is based on spatial population, while Geary's coefficient is based on spatial sampling results. If we plan to apply Moran's index to spatial samples, the formula of Moran's index should be revised; if we plan to apply Geary's coefficient to spatial population, the formula of Geary's coefficient should also be revised. Third, the error frequency distribution can be employed to choose a spatial weight function. One academic contribution of this paper to geographical analysis is the error formula of spatial correlation. More than one type of weight functions has been used to make spatial autocorrelation analysis; however, using the one resulting in an error frequency distribution which is more similar to Gaussian distribution is more advisable.

Supporting Information
File S1 A simple approach to calculating Moran's index using MS Excel.