Reconstruction and normalization of LISA for spatial analysis

Yanguang Chen

doi:10.1371/journal.pone.0303456

Abstract

The local indicators of spatial association (LISA) are important measures for spatial autocorrelation analysis. However, there is an inadvertent fault in the mathematical derivation of LISA in the literature, so that the local Moran and Geary indicators do not satisfy the second basic requirement for LISA: the sum of the local indicators is proportional to a global indicator. This paper aims at reconstructing the calculation formulae of the local Moran indexes and Geary coefficients through mathematical derivation and empirical evidence. Two sets of LISAs were clarified by new mathematical reasoning. One set of LISAs is based on non-normalized weights and non-centralized variable (MI1 and GC1), and the other set is based on row-normalized weights and standardized variable (MI2 and GC2). The results show that the first set of LISAs satisfies the above-mentioned second requirement, but the second set does not. Then, a third set of LISAs was proposed, which can be treated as canonical forms (MI3 and GC3). This set of LISAs satisfies the second requirement. The observational data of city population and traffic mileage in the Beijing-Tianjin-Hebei region of China were employed to verify the theoretical results. This study helps to clarify the misunderstandings about LISAs in the field of geospatial analysis.

Citation: Chen Y (2024) Reconstruction and normalization of LISA for spatial analysis. PLoS ONE 19(5): e0303456. https://doi.org/10.1371/journal.pone.0303456

Editor: Yuxia Wang, East China Normal University, CHINA

Received: October 27, 2023; Accepted: April 25, 2024; Published: May 22, 2024

Copyright: © 2024 Yanguang Chen. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The data underlying the results presented in the study are available from the supporting information files.

Funding: The project is funded by the National Natural Science Foundation of China (42171192). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

1 Introduction

Geography has two core concepts on location effect: difference and dependence. The former is related to a classical topic of geography, while the latter is related to spatial correlation analysis. The concept of spatial difference is also termed regional differences, which came from areal differentiation [1–3]. The traditional concept of difference seems to be in contradiction with the pursuit of general laws, so geography embarks on the road of "exceptionalism" [4]. After the quantitative revolution (1953–1976), geography began to attach importance to spatial organization and correlation, which indicates spatial dependence. Spatial interaction models and spatial autocorrelation analysis are the main approaches to research spatial correlation processes [5, 6]. Spatial autocorrelation is originally a biological statistic concept, which is mainly used to evaluate whether the spatial sampling results meet the traditional statistical requirements [7–9]. When geographers introduced spatial autocorrelation measure into geospatial analysis, they found that there are few spatial uncorrelated phenomena. In this context, the spatial autocorrelation analysis method was developed [10–12]. The early spatial autocorrelation analysis was only at the global level, rarely involving the local level, so it provided limited geospatial information. In other words, the initial spatial autocorrelation focuses on spatial dependence rather than spatial difference. After the theoretical revolution in the later period of the quantitative revolution was frustrated, the traditional regional trend of thought of geography returned quietly, and the concept of regional difference was again valued by geographers with a new expression of spatial heterogeneity [13]. Tobler proposed the first law of geography based on spatial dependence [14], and Harvey proposed that spatial heterogeneity be the second law of geography [15]. The study of spatial heterogeneity naturally involves spatial locality. 
According to Fotheringham [16–18], there are three trends in the development of quantitative geography: localization, computation and visualization. In this sense, local spatial autocorrelation analysis came into being [13, 19–22]. Therefore, spatial difference (heterogeneity) and spatial correlation (dependency) have reached the same goal through different routes [13, 23].

Local spatial autocorrelation analysis is developed on the basis of global spatial autocorrelation analysis. The Local Indicators of Spatial Association (LISA) proposed by Anselin [19] play an important role in the local correlation analysis of geographical research. LISA include local Moran indexes and local Geary coefficients. These spatial statistics, together with the G index proposed by Getis and Ord [21] and the Moran scatterplot proposed by Anselin [13], have become systematic tools for local autocorrelation analysis. However, even the wisest are not always free from error. Anselin’s outstanding paper contains some important issues that need to be addressed. The main problems are as follows. First, there is an unintentional mistake in the mathematical reasoning, which results from a skipped step in a mathematical transformation. This mistake leads readers to misunderstand the relationship between the globally normalized spatial weight matrix and the row-normalized spatial weight matrix. Second, the row-normalized spatial weight matrix violates the distance axiom. A spatial weight matrix is based on a distance matrix or a generalized distance matrix, which must conform to the distance axiom. Otherwise, the calculated global or local Moran’s index may appear abnormal. Third, the basic difference between Moran’s index and Geary’s coefficient was omitted. Moran’s index is based on a spatial population, while Geary’s coefficient is based on a spatial sample. Different definitions lead to different application directions. However, in the definitions of LISA, the local Geary’s coefficient is based on the spatial population rather than the spatial sample. This is not consistent with the original aim of defining Geary’s coefficient.

The above issues cause a series of consequences. First, the two sets of LISA values are not equivalent to each other. For example, the ratios of the LISA values based on the non-normalized spatial weight matrix to the LISA values based on the normalized spatial weight matrix are not constants. This is a serious logical problem. As we know, if two measures are equivalent to one another, the ratio of the two measures is a constant. For example, the ratio of Student’s t statistic to Pearson’s part correlation coefficient is a constant, which equals the square root of the ratio of residuals mean square deviation to total sum of squares. Second, the calculated values of Moran’s index and Geary’s coefficient sometimes exceed reasonable upper and lower limits. Moran’s index bears at least two sets of boundary values. One is the absolute boundary values, that is, -1 and 1, which depend on the mathematical structure of Moran’s index formula and can be proved by the conditional extremum principle of quadratic forms. The other is the relative boundary values, which are determined by the maximum and minimum eigenvalues of the normalized spatial weight matrix [24–26]. Exceeding the boundary values of a spatial statistic is another logical problem. One of the key reasons is that the symmetric spatial contiguity matrix is replaced by an asymmetric row-normalized spatial weight matrix in the process of mathematical deduction. What is more, Anselin’s LISA lack clear boundary values and critical values. Spatial statistics are measures that may be used to describe or to infer. Whatever the goal is, a good measure should have a clear critical value or clear boundary values. For example, the boundary values of the Pearson correlation coefficient are -1 and 1, and the critical value is 0. The purpose of this paper is to develop the spatial measures based on LISA. The remaining parts are organized as follows.
In Section 2, Anselin’s mathematical reasoning process is sorted out and his unintentional mistakes are corrected. Based on the mathematical derivation, the local Moran index and local Geary coefficient are normalized. In addition, the strict mathematical relationship between Moran’s indexes and Geary’s coefficients is derived. In Section 3, the observational data of the system of cities in the Beijing-Tianjin-Hebei region of China are employed to test the improved results. In Sections 4 and 5, the related questions are discussed, and finally, the discussion is concluded by summarizing the main points of this study.

2 Theoretical results

2.1 Local spatial autocorrelation measurements

2.1.1 The first formula of local Moran index.

One of the bases of spatial analysis is the spatial proximity matrix, which can be measured by a spatial distance matrix. A spatial distance matrix or spatial proximity matrix can be transformed into a spatial contiguity matrix by means of a spatial weight function such as a negative power law or a step function [27, 28]. A spatial contiguity matrix can be treated as a non-normalized spatial weight matrix. Suppose that there are n elements in a geographical region, and the size of the ith element is measured by x_i (i = 1, 2,…,n). The size variable x is not standardized and the spatial contiguity matrix V = [v_ij] is not transformed into the globally normalized spatial weight matrix W = [w_ij]. Note that the so-called global normalization refers to the normalization of a matrix or vector by the sum of its elements. So, global normalization can also be termed sum-normalization or sum-based normalization. Correspondingly, row-normalization is a type of local normalization which can also be called row-based normalization. Using the symbol systems defined in this context, we can extract two sets of local spatial autocorrelation statistics (Table 1). The first local Moran index formula defined by Anselin [19] is as follows

$$I_i^{*} = (x_i - \bar{x}) \sum_{j=1}^{n} v_{ij} (x_j - \bar{x}) = y_i \sum_{j=1}^{n} v_{ij} y_j, \tag{1}$$

where y_i = x_i − x̄ and y_j = x_j − x̄ denote centralized size variables, and x̄ refers to the mean value. In Eq (1), i≠j, otherwise v_ij = 0. The centralized variables can be transformed into standardized variables by means of the z-score formula. Based on the population standard deviation, the standardized variables can be expressed as

$$z_i = \frac{x_i - \bar{x}}{\sigma} = \frac{y_i}{\sigma},$$

where z denotes the standardized variable, and σ refers to the population standard deviation. The sum of Eq (1) is

$$\sum_{i=1}^{n} I_i^{*} = \sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij} y_i y_j, \tag{2}$$

which is essentially the sum of spatially weighted outer products of centralized variables. The spatial weight coefficient is not normalized by sum. The sum of the elements in the spatial contiguity matrix is

$$V_0 = \sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij}. \tag{3}$$

Dividing Eq (2) by V₀ yields spatial precision weighted auto-covariance as follows (4)

Furthermore, the spatial weighted covariance can be divided by the population variance of the size variable, which is called the second moment in literature [19], that is (5)

The result is global Moran’s index, I = Cov/σ². It can be expanded as (6) where w_ij is the element of the globally normalized weight matrix W. According to Anselin [19], Eq (6) can be expressed as (7)

The relationship between the sum of Anselin’s first local Moran’s indexes and the global Moran’s index is obtained as below (8)

The proportionality coefficient in Eq (8) is (9) which represents the general expression of the ratio of the sum of local Moran’s indexes to the global Moran’s index. Please note that Eqs (8) and (9) are derived from the relations based on non-normalized spatial weight matrix. They cannot be directly applied to the mathematical processes based on row-normalized spatial weight matrix. According to Anselin [19], Eq (3) can be replaced by a vector indicating the sum of rows of the spatial contiguity matrix as below (10)

Correspondingly, spatial contiguity matrix can be normalized by row. Anselin called it row-standardized spatial weights matrix [19]. In this way, Eq (4) becomes a locally weighted spatial auto-covariance, that is (11)

The summation of Eq (11) is (12)

Based on Eqs (11) and (12), it is impossible to obtain the global spatial weighted auto-covariance, and it is impossible to derive the simple summation relationship between local Moran index and global Moran index. If so, the reasoning from Eq (4) to Eq (9) will be invalid.

Table 1. Three sets of LISAs researched in this paper based on Anselin’s work.

https://doi.org/10.1371/journal.pone.0303456.t001

It can be seen that the local-global relationship based on Anselin’s first local Moran index formula suggests a globally normalized weight matrix with symmetry. The first local Moran index formula of Anselin [19] is correct: it satisfies the two requirements defined by Anselin [19]. The shortcoming is that it is not standardized. A good measure should have a clear critical value (reference value) or a pair of explicit boundary values. However, the local Moran index calculated by Eq (1) has neither boundary values nor a clear threshold value.
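
The local-global relationship described in this subsection can be checked numerically. The following is a minimal sketch, not the author's code: the five sizes and coordinates are hypothetical, Euclidean distance stands in for the road mileage used later in the paper, and the coefficient γ = V₀σ² is the form implied by the text around Eqs (8) and (9).

```python
import numpy as np

# Hypothetical data (not from the paper): sizes and coordinates of 5 places.
x = np.array([12.0, 7.0, 3.0, 9.0, 5.0])
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [2.0, 3.0]])
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)

# Spatial contiguity matrix V: v_ij = 1/d_ij for i != j, v_ii = 0.
n = len(x)
off = ~np.eye(n, dtype=bool)
V = np.zeros_like(d)
V[off] = 1.0 / d[off]

y = x - x.mean()            # centralized variable
sigma2 = np.mean(y ** 2)    # population variance (denominator n)
V0 = V.sum()                # sum of all elements of V

local_I1 = y * (V @ y)      # first local Moran index: y_i * sum_j v_ij y_j

W = V / V0                  # globally (sum-) normalized weight matrix
z = y / np.sqrt(sigma2)     # population-standardized variable
global_I = z @ W @ z        # global Moran's index

gamma = V0 * sigma2         # assumed proportionality coefficient
print(np.isclose(local_I1.sum(), gamma * global_I))
```

Under these assumptions the sum of the first set of local indexes equals γ times the global index, which is the second LISA requirement discussed in the text.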

2.1.2 The second formula of local Moran index.

Suppose that the variables are standardized and the spatial contiguity matrix is transformed into a spatial weight matrix which is normalized by row. In this way, V₀ is replaced by V_i in Eq (4). Thus, the revised Eq (4) divided by the population variance yields the second local Moran’s index formula of Anselin [19], I_i** = Cov_i/σ², that is (13) where w_ij* denotes the elements in the row-normalized spatial weight matrix, V^*. Apparently, Eq (13) is based on Eqs (10) and (11). Thus, in terms of Eq (10), the sum of the spatial weight matrix is (14)

The variance of a standardized variable is 1, namely, σ² = 1. For a matrix normalized by row, the sum is V₀* = n, thus we have (15) Substituting Eq (15) into Eq (8) seems to yield the following relation (16) which is one of the relations given by Anselin [19]. Note that the symbols have been slightly changed. That is, V₀ is replaced by V₀*, and I_i* is replaced by I_i**. The newly added asterisk indicates the inherent difference between the two sets of local Moran’s indexes. On the surface, there is no problem at all in the mathematical derivation process. However, Anselin [19] inadvertently made a mistake in the above reasoning process (S1 File). Looking at Eq (14) alone, we may think that there is no problem. However, by summing Eq (13), it is impossible to extract an independent Eq (14), and this is exactly the problem. In fact, Anselin [19] unintentionally replaced a mathematical concept by directly applying the derived results based on non-normalized weight matrices to the relationship formula based on row-normalized spatial weight matrices. Regardless of whether the spatial contiguity matrix is symmetric or not, the non-normalized spatial weight matrix and the row-normalized spatial weight matrix are not isomorphic to each other. However, the non-normalized spatial weight matrix is isomorphic to the sum-based normalized spatial weight matrix.

Problems in mathematical deduction can be revealed through logical analysis, and can also be reflected through empirical analysis. Let us check the problem from another angle. The relation between the second set of local Moran’s indexes of Anselin [19] and the global Moran’s index can be derived from Eq (13). The summation of the local Moran’s indexes based on Eq (13) is (17)

By variable standardization, the population variance becomes 1, i.e., σ² = 1. However, the row sum of the spatial contiguity matrix, V_i, is not a constant. It can neither be eliminated nor converted to a constant. Therefore, there is no constant proportionality relation between the second set of local Moran’s indexes and the global Moran’s index. If and only if Eq (6) is introduced into Eq (17) can the proportional relationship similar to Eq (8) be derived. Based on Eq (6), Eq (17) can be re-expressed as (18)

Unfortunately, we cannot prove the following relation: (19)

This lends further support to the judgment that Eq (16) does not hold. However, the proportional relationship given in Eqs (17) and (18) can be easily verified by the observational data. Another approach is to examine the ratios of the two sets of local Moran indexes. If the ratios are constant, the two definitions are equivalent to one another; otherwise they are not. In fact, dividing the values in the first set of local Moran indexes by the corresponding values in the second set yields (20) which, obviously, is a variable that changes with V_i rather than a constant.

It can be seen that the ratios of the two sets of local Moran’s indexes are not constant, so they are not equivalent to each other. This suggests that the second set of local Moran indexes cannot satisfy the second requirement of Anselin [19], which states, “The sum of the local indicators is proportional to a global indicator”. The reason for the fault is that Anselin [19] inadvertently replaced a concept in this mathematical derivation. Concretely speaking, the globally normalized symmetric weight matrix W becomes the locally normalized asymmetric weight matrix V^*. This substitution violates the law of identity of concepts and the principle of logical consistency in mathematical reasoning.
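
The non-constant ratio between the two sets of local Moran indexes can be illustrated with a short computation. This is a sketch under stated assumptions: the data are hypothetical, `V_row` is a name chosen here for the row-normalized matrix V^*, and the second local index is taken as z_i Σ_j w_ij* z_j, which is how the text describes Eq (13).

```python
import numpy as np

# Hypothetical data (not from the paper): sizes and coordinates of 5 places.
x = np.array([12.0, 7.0, 3.0, 9.0, 5.0])
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [2.0, 3.0]])
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)

n = len(x)
off = ~np.eye(n, dtype=bool)
V = np.zeros_like(d)
V[off] = 1.0 / d[off]           # contiguity: inverse distance, zero diagonal

y = x - x.mean()                # centralized variable
sigma2 = np.mean(y ** 2)        # population variance
z = y / np.sqrt(sigma2)         # population-standardized variable

Vi = V.sum(axis=1)              # row sums of the contiguity matrix
V_row = V / Vi[:, None]         # row-normalized weights (each row sums to 1)

local_I1 = y * (V @ y)          # first set: non-normalized V, centralized y
local_I2 = z * (V_row @ z)      # second set: row-normalized V*, standardized z

# Implied element-wise ratio I_i* / I_i** = sigma^2 * V_i.
implied_ratio = sigma2 * Vi
print(np.allclose(local_I1, implied_ratio * local_I2))
print(implied_ratio)            # varies from place to place
```

Because the row sums V_i differ across places, the implied ratio is a vector of different values rather than a single constant, which is the point made about Eq (20).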

2.1.3 The formula of local Geary coefficient.

The global Geary coefficient is complementary to the global Moran index: the former is oriented to spatial sample analysis, and the latter is based on spatial statistical population. Similar to the treatment of local Moran index, two local Geary statistics were defined by Anselin [19]. It is assumed that the variables are not standardized and the spatial contiguity matrix is not transformed into a global normalized spatial weight matrix. Anselin [19] defined the first local Geary’s coefficient as (21) in which the divisor 2 is ignored. Suppose that the variable is standardized, and the spatial contiguity matrix is transformed into a row normalized spatial weight matrix. Anselin [19] defines the second local Geary coefficient as (22)

Summation of Eq (21) divided by the population variance σ² is (23) where C refers to the global Geary coefficient. It can be expressed as (24) in which z* refers to the standardized size variable based on the sample standard deviation s, i.e., z_i* = (x_i − x̄)/s = y_i/s.

Here s denotes sample standard deviation, that is, s = σ(n/(n-1))^1/2. In addition, the proportional coefficient between the sum of the first local Geary coefficient divided by the population variance and the global Geary coefficient is as below (25)

Therefore, the relationship between the sum of the first local Geary coefficients and the global Geary coefficients is (26)

This formula is correct, and it satisfies the two requirements given by Anselin [19]. However, it is neither direct nor standard. Dividing the summation of Eq (21) by both the population variance σ² and the sum of the spatial weight matrix V₀ yields the relationship between the local Geary’s coefficients and the global Geary’s coefficient, that is (27)

This is the corrected expression of the relationship between local Geary coefficient and global Geary’s coefficient, differing from that given by Anselin [19]. The reason is that derivation of this relationship is based on the global normalization of spatial weight matrix. However, due to the fact that divisor 2 is ignored in Eq (21), when n is sufficiently large in Eq (27), the sum of local Geary’s coefficients does not equal the global Geary’s coefficient. Based on the row-normalized weight matrix, the sum of local Geary’s coefficients is (28)

The constant proportional relationship between the local Geary coefficient and the global Geary coefficient cannot be derived in terms of Eq (28). Anselin [19] held that, according to Eq (25), for the weight matrix normalized by row, V₀ = n, so that γ_c = 2n²/(n-1), which is correct. Then he gave the following relation (29)

This is wrong and cannot be strictly derived by mathematical methods, nor can it be verified by observational data. Based on the row-normalized weight matrix, the correct result is (30) in which γ_c* represents the proportionality coefficient. The coefficient can be expressed as (31) which is not a constant. It cannot be proved that Eq (29) is equivalent to Eq (30). Moreover, starting from Eqs (21) and (22), the proportional relationship between the two sets of local Geary coefficients is (32)

This is obviously not a constant, but a variable that changes with the sum of the rows of the spatial proximity matrix. This shows that the two sets of local Geary coefficients are not equivalent to each other, and the ratio of the corresponding values of the two sets of local Geary coefficients is equal to the ratio of the values of the two sets of local Moran’s indices. In short, the second set of local Geary statistic does not satisfy the second requirement given by Anselin [19].
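
The sum relationship for the first set of local Geary coefficients can be verified in the same way. The sketch below uses hypothetical data and assumes that Eq (21) has the form Σ_j v_ij (x_i − x_j)² (without the divisor 2) and that γ_c = 2nV₀/(n−1), the form consistent with the value 2n²/(n−1) quoted in the text for V₀ = n.

```python
import numpy as np

# Hypothetical data (not from the paper): sizes and coordinates of 5 places.
x = np.array([12.0, 7.0, 3.0, 9.0, 5.0])
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [2.0, 3.0]])
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)

n = len(x)
off = ~np.eye(n, dtype=bool)
V = np.zeros_like(d)
V[off] = 1.0 / d[off]                 # contiguity: inverse distance, zero diagonal
V0 = V.sum()

y = x - x.mean()
sigma2 = np.mean(y ** 2)              # population variance (denominator n)
s2 = np.sum(y ** 2) / (n - 1)         # sample variance (denominator n - 1)

sq_diff = (x[:, None] - x[None, :]) ** 2
local_G1 = (V * sq_diff).sum(axis=1)  # assumed form of the first local Geary statistic

# Classical global Geary's C.
C = (n - 1) * (V * sq_diff).sum() / (2.0 * V0 * np.sum(y ** 2))

# Same coefficient from sum-normalized weights and sample-standardized z*.
W = V / V0
z_star = y / np.sqrt(s2)
C_alt = 0.5 * (W * (z_star[:, None] - z_star[None, :]) ** 2).sum()

gamma_c = 2.0 * n * V0 / (n - 1)      # assumed proportionality coefficient
print(np.isclose(local_G1.sum() / sigma2, gamma_c * C))
print(np.isclose(C, C_alt))
```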

2.2 Revised and normalized results

2.2.1 Adjustment of symbol system and clarification of concept.

Concept is the cornerstone of logic. If and only if a concept is clear, there will be no mistakes in reasoning. The premise of mathematical reasoning is the symbolization of concepts. Confusion of symbols can easily lead to mistakes in reasoning. The main reason for the inconsistency between the two sets of LISA proposed by Anselin [19] is the unintentional concept substitution caused by the symbol mixing of spatial measure matrixes. At present, there are several problems about spatial autocorrelation in geographical literature.

Firstly, the symbols of the spatial weight matrix need to be improved. The symbols of the spatial contiguity matrix (SCM), say, [1/d_ij], and those of the spatial weight matrix (SWM), say, [v_ij/∑∑v_ij], where v_ij = 1/d_ij, are confused with each other. The two matrixes are regarded as equivalent and are both represented by the same symbol [w_ij]. In fact, the spatial distance matrix can be transformed into a spatial contiguity matrix according to a certain distance decay function, and the weight matrix can be obtained by normalizing the spatial contiguity matrix [29]. Although the final result is the same in the case of symbol confusion, the form of expression causes many unnecessary misunderstandings for beginners. This paper distinguishes the symbols as follows: SCM is represented by V, and its elements are represented by v_ij; SWM is represented by W, and its elements are expressed as w_ij. Thus we have SCM, V = [v_ij], and SWM, W = [w_ij] = [v_ij/∑∑v_ij].

Secondly, the definitions of spatial matrixes need to be explained. When the spatial contiguity matrix (SCM) is transformed into the spatial weight matrix (SWM), global normalization and local normalization by row are confused. Anselin [19], the original founder of the local Moran index, adopted the method of row normalization (he termed the process “row-standardization”). The sum of the SWM elements is thus equal to n. However, this method leads to two results: (1) The symmetry of the spatial distance matrix is broken. A spatial weight matrix comes from a spatial distance matrix or a generalized spatial distance matrix. One of the important properties of a distance measure is symmetry: d_ij = d_ji holds for all i and j [30]. This is one of the four principles of the distance axioms (positivity, specification, symmetry, and triangle inequality). (2) The absolute value of the calculated local Moran index may sometimes exceed 1. Moran’s index is an autocorrelation coefficient whose value should fall between -1 and 1 in theory. As for the special boundary values of Moran’s index determined by the maximum and minimum eigenvalues of the spatial weight matrix, they should be discussed in another work.
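
The difference between the two normalizations can be seen directly on a small matrix. The sketch below is illustrative only: the coordinates are hypothetical, Euclidean distance is an assumption made here, and `V_row` is a name chosen for the row-normalized matrix.

```python
import numpy as np

# Hypothetical coordinates (not from the paper) for 5 places.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [2.0, 3.0]])
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)  # symmetric distances

n = len(pts)
off = ~np.eye(n, dtype=bool)
V = np.zeros_like(d)
V[off] = 1.0 / d[off]                        # SCM: v_ij = 1/d_ij, zero diagonal

W = V / V.sum()                              # global (sum-based) normalization
V_row = V / V.sum(axis=1, keepdims=True)     # local normalization by row

print(np.isclose(W.sum(), 1.0), np.allclose(W, W.T))              # sum 1, symmetric
print(np.isclose(V_row.sum(), n), np.allclose(V_row, V_row.T))    # sum n, not symmetric
```

For this example the sum-normalized matrix keeps the symmetry inherited from the distances, while the row-normalized matrix sums to n and loses that symmetry because the row sums differ.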

Thirdly, the meanings and symbols of the two types of variance are different. The population variance is often confused with the sample variance in spatial statistics. Moran’s index is defined based on the population variance, and Geary’s coefficient is defined based on the sample variance [29]. According to Fisher’s symbol system in statistics, the population variance is expressed as σ², and the denominator in its formula is n; the sample variance is expressed as s², and the denominator in its formula is n-1 [31]. The relationship between them is σ² = (n-1)s²/n.

Fourth, the difference in numbering between rows and columns needs to be noted. There is sometimes confusion between row summation and column summation. The sum based on a row vector is expressed as summation over j, and the sum based on a column vector is expressed as summation over i. Based on a globally normalized weight matrix, the difference is only formal and has nothing to do with the results. However, based on a row-normalized weight matrix, the results of row summation differ from the results of column summation.

Fifth, the methods of value transformation need to be particularly clarified. The concepts of normalization and standardization are always confused in literature. Generalized standardization includes normalization. However, both standardization and normalization have different definition methods and corresponding calculation formulas. The transformation formula of variables should be determined according to different research objectives (S2 File).

In order to make it easy for readers to understand, it is necessary to distinguish symbols, and then clarify the concept of variable transformation. There are three principles for adopting symbols in this paper. First, the principle of consensus. Priority is given to the conventional expression in the field of mathematical statistics. For example, the population standard deviation is expressed as σ, and the sample standard deviation is expressed as s [31]. Second, the principle of direction. For example, the spatial weight matrix is represented by W because “W” is the capital form of the initial of “weight”. Third, the principle of distinction. For example, the spatial contiguity matrix is represented by V so as to distinguish it from the spatial weight matrix W, and this distinction facilitates mathematical reasoning. Among the above three principles, the distinction principle is the most important (Table 2). In the spatial autocorrelation literature, centralized variables (for example, in defining the local Moran’s index), standardized variables (for example, in simplifying the calculation of the global Moran’s index) and globally normalized variables (for example, in simplifying the calculation of the Getis-Ord index) are used, respectively (Table 3). In the literature, when the spatial weight matrix is normalized by row, the concept of row standardization is adopted, but the calculation formula is not given [19]. This can easily lead to misunderstandings for beginners of spatial autocorrelation analysis.
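
The variable transformations named above can be written out in a few lines. This is a minimal sketch with a hypothetical size vector; the symbols y, z and z* follow the paper, while `p` is a name introduced here for the sum-normalized variable and is not the paper's notation.

```python
import numpy as np

# Hypothetical size variable (not from the paper).
x = np.array([12.0, 7.0, 3.0, 9.0, 5.0])
n = len(x)

y = x - x.mean()            # centralization: mean becomes 0
sigma = x.std(ddof=0)       # population standard deviation (denominator n)
s = x.std(ddof=1)           # sample standard deviation (denominator n - 1)

z = y / sigma               # population-based standardization
z_star = y / s              # sample-based standardization
p = x / x.sum()             # global (sum-based) normalization

print(np.isclose(sigma ** 2, (n - 1) * s ** 2 / n))   # sigma^2 = (n-1) s^2 / n
print(np.isclose(np.mean(z ** 2), 1.0))               # population variance of z is 1
print(np.isclose(z_star.var(ddof=1), 1.0))            # sample variance of z* is 1
print(np.isclose(p.sum(), 1.0))                       # normalized values sum to 1
```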

Table 2. Comparison between Anselin’s symbol system and the symbol system in this paper.

https://doi.org/10.1371/journal.pone.0303456.t002

Table 3. Value transformation methods, calculation formulas, and properties of converted variables.

https://doi.org/10.1371/journal.pone.0303456.t003

2.2.2 Definition of normalized local Moran’s index.

Moran’s index is defined on the basis of population standard deviation rather than sample standard deviation. Accordingly, local Moran’s index should also be defined through population standard deviation. In light of Eq (7), canonical local Moran’s index can be defined as (33)

Further, according to Eq (7), the relation between global Moran’s index and the sum of local Moran’s indexes is (34)

According to Eq (33), the relation between Anselin’s first set of local Moran indexes and the local Moran’s indexes formula improved in this paper is (35)

Thus, for the globally normalized spatial weight matrix W and the standardized variable based on population standard deviation z, we have σ² = 1, V₀ = 1. Thus, Eq (9) should be replaced by (36)

This suggests that, according to the second basic requirement for LISA from Anselin [19], the sum of normalized local Moran’s index equals the global Moran’s index.
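
A brief numerical check of the normalized definition may help. The sketch assumes that the canonical local Moran’s index has the form I_i = z_i Σ_j w_ij z_j with the sum-normalized matrix W, which is how the text describes Eq (33); the data are hypothetical.

```python
import numpy as np

# Hypothetical data (not from the paper): sizes and coordinates of 5 places.
x = np.array([12.0, 7.0, 3.0, 9.0, 5.0])
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [2.0, 3.0]])
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)

n = len(x)
off = ~np.eye(n, dtype=bool)
V = np.zeros_like(d)
V[off] = 1.0 / d[off]           # contiguity: inverse distance, zero diagonal
V0 = V.sum()
W = V / V0                      # sum-normalized weights, so the sum of W is 1

y = x - x.mean()
sigma2 = np.mean(y ** 2)        # population variance
z = y / np.sqrt(sigma2)         # population-standardized variable

local_I = z * (W @ z)           # assumed canonical local Moran's index
global_I = z @ W @ z            # global Moran's index

local_I1 = y * (V @ y)          # Anselin's first local Moran index
print(np.isclose(local_I.sum(), global_I))            # proportionality coefficient is 1
print(np.allclose(local_I, local_I1 / (V0 * sigma2))) # element-wise link to the first set
```

With these definitions the local values sum exactly to the global index, and each one is Anselin’s first local index divided by the constant V₀σ².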

2.2.3 Definition of normalized local Geary’s coefficient.

Geary’s coefficient is defined on the basis of sample standard deviation rather than population standard deviation. Accordingly, local Geary’s coefficient should also be defined through sample standard deviation. The generalized Geary’s coefficient is another case [29]. In terms of Eq (26), global Geary’s coefficient can be expressed as (37) where s² = nσ²/(n-1) reflects the relationship between sample variance s² and population variance σ². Thus local Geary’s coefficient can be defined as (38)

Summing Eq (38) yields global Geary’s coefficient, that is, Eq (24). According to Eq (37), the relation between Anselin’s first set of Geary’s coefficient and the local Geary’s coefficient formula improved in this paper is (39)

Thus, for the globally normalized spatial weight matrix W and the standardized vector based on sample standard deviation z*, we have s² = 1, V₀ = 1. Thus, according to Eq (26), the relation between proportionality coefficients is (40)

Moran’s index and Geary’s coefficient reflect the same problem from different angles. It can be proved that the relationship between global Moran’s I and global Geary’s C is as follows (41) where z denotes the standardized vector based on the population standard deviation, z² = diag(zz^T) refers to a vector composed of the squares of the elements in z, and o^T = [1 1 … 1] is a ones vector in which all the elements are 1. The symbol “T” indicates transposition, and the function "diag" represents taking the diagonal elements of a matrix to form a vector. If the mean of the global Moran’s index is treated as I₀ = 1/(1-n), the mean of the global Geary’s coefficient, C₀, can be estimated by (42)

Further, the relationship between local Moran’s indexes and local Geary’s coefficient can be derived. From Eq (38) it follows (43)

Changing the form of Eq (43) yields (44)

This means that there is a strict numerical conversion relationship between local Moran’s indexes and local Geary’s coefficient, although they describe the same problem from different angles. It can be seen that Eq (41) can be obtained by summing Eq (44).
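
The conversion between the two families of statistics can be checked numerically. The sketch below rests on assumptions: the data are hypothetical, the local Geary’s coefficient is taken as C_i = (1/2) Σ_j w_ij (z_i* − z_j*)², and the identities tested are obtained here by expanding the squared difference for a symmetric W, so their arrangement may differ from the paper’s Eqs (41) and (44).

```python
import numpy as np

# Hypothetical data (not from the paper): sizes and coordinates of 5 places.
x = np.array([12.0, 7.0, 3.0, 9.0, 5.0])
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [2.0, 3.0]])
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)

n = len(x)
off = ~np.eye(n, dtype=bool)
V = np.zeros_like(d)
V[off] = 1.0 / d[off]
W = V / V.sum()                       # symmetric, sum-normalized weights

y = x - x.mean()
z = y / x.std(ddof=0)                 # population-standardized variable
z_star = y / x.std(ddof=1)            # sample-standardized variable

local_I = z * (W @ z)                 # local Moran's indexes (assumed form)
local_C = 0.5 * (W * (z_star[:, None] - z_star[None, :]) ** 2).sum(axis=1)

k = (n - 1) / n
row_sum = W.sum(axis=1)

# Local identity from expanding (z_i - z_j)^2.
local_rhs = 0.5 * k * (row_sum * z ** 2 + W @ z ** 2) - k * local_I
print(np.allclose(local_C, local_rhs))

# Global identity: C = ((n-1)/n) * (o^T W z^2 - I), using the symmetry of W.
global_I = local_I.sum()
global_C = local_C.sum()
print(np.isclose(global_C, k * (np.ones(n) @ W @ z ** 2 - global_I)))
```

Summing the local identity over i reproduces the global one, in line with the remark that the global relationship can be obtained by summing the local relationship.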

In the new framework for LISA, the spatial weight matrix is normalized by its sum, which is a type of global normalization. Mathematics relies heavily on form, and the same mathematical method often has very different effects when expressed in different forms. For spatial autocorrelation, using a globally normalized spatial weight matrix instead of a non-normalized one has at least four advantages. First, it makes the global Moran’s index I and the local Moran’s indexes I_i convenient to calculate and makes the relationship between the two explicit [29]. Second, it yields a standardized Moran’s scatterplot in which the slope of the trend line is exactly equal to the global Moran’s index [32]. Third, the structure of the parameters of spatial autoregressive models can be clearly revealed using the spatial autocorrelation coefficients. Fourth, it makes the values of the local Moran’s index and the local Geary’s coefficient more intuitive. The fourth advantage is the most relevant to this work.

Many basic measures and models of spatial statistical analysis are rooted in conventional statistics and were created by analogy with time series analysis methods. The common measures and models of time series analysis, such as autocorrelation coefficients and autoregressive models, are also rooted in traditional statistical theories. The development of statistics took place in the wider context of the Victorian culture of measurement [31]. For simplicity’s sake, numerous measurement results are usually condensed into an index [33], and an index is often treated as a characteristic measurement [6, 34]. A good index has a pair of clear boundary values, a clear critical value, or both. Based on the standardized variable and the globally normalized spatial weight matrix, the values of the local Moran’s indexes fall between -1 and 1, with a critical value of 0; the values of the local Geary’s coefficients fall between 0 and 2, with a critical value of 1.

3 Empirical analysis

3.1 Study area and data

The results of mathematical deduction ultimately need to be verified through mathematical reasoning and empirical analysis. After all, the success of sciences rests with their great emphasis on the role of quantifiable data and their interplay with models [35]. Taking the cities in the Beijing-Tianjin-Hebei (BTH) region as an example, we can make a concise computational case study. This is a demonstrative case, not an explanatory one: the example is used to verify the reasoning results rather than to study the spatial structure and characteristics of the BTH urban system. The study area includes Beijing, Tianjin, and the main cities of Hebei Province, and it is also termed the Jing-Jin-Ji (JJJ) region in the literature [36]. The cities are all of prefecture level and above, and the number of cities is n = 13. The size measurement is the city population from the fifth census in 2000 and the sixth census in 2010. Town population is not taken into account. At present, urban population has several definitions: regional total population, municipal population, city population, and urban population consisting of city population and town population. This case uses the city population, which better reflects the characteristics of city size. City population size can also be reflected by the nighttime light area on maps [32, 36]. The population size was processed by centralization (y), population-based standardization (z), and sample-based standardization (z*) (Table 4). As for the spatial weight matrix, the basic data are derived from the traffic mileage between cities (Table 5). The spatial weight function adopts a special negative power law, the inverse proportion function, which is the intersection of the power law and the hyperbolic function. Thus, the spatial contiguity is defined as

$$v_{ij} = \begin{cases} 1/d_{ij}, & i \neq j \\ 0, & i = j \end{cases} \tag{45}$$

where d_ij denotes the distance by road between city i and city j. On this basis, the traffic mileage matrix (U) can be transformed into a spatial contiguity matrix (V), which can in turn be converted into the globally normalized weight matrix (W) and the row-normalized weight matrix (W*).
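The conversion from U to V, W and W* can be sketched in a few lines. This is a minimal illustration, assuming the inverse-distance contiguity described above with a zero diagonal; the function names and the four-place distance matrix are hypothetical and are not the Table 5 data.

```python
import numpy as np

def contiguity_from_distance(d):
    """Inverse-distance contiguity: v_ij = 1/d_ij for i != j, 0 on the diagonal."""
    d = np.asarray(d, dtype=float)
    v = np.zeros_like(d)
    off_diagonal = ~np.eye(d.shape[0], dtype=bool)
    v[off_diagonal] = 1.0 / d[off_diagonal]
    return v

def normalize_global(v):
    """Global normalization: divide every element by the grand sum V0."""
    return v / v.sum()

def normalize_rows(v):
    """Row normalization: divide every element by its own row sum."""
    return v / v.sum(axis=1, keepdims=True)

# Hypothetical road distances (km) between four places, for illustration only.
U = np.array([[0, 120, 280, 300],
              [120, 0, 200, 310],
              [280, 200, 0, 150],
              [300, 310, 150, 0]], dtype=float)

V = contiguity_from_distance(U)
W = normalize_global(V)       # all elements sum to 1, symmetry is kept
W_star = normalize_rows(V)    # each row sums to 1, symmetry is lost

print("W symmetric:     ", np.allclose(W, W.T))
print("W* symmetric:    ", np.allclose(W_star, W_star.T))
```

The two normalizations differ in one respect that matters for the later argument: dividing by a single constant preserves the symmetry of V, whereas dividing each row by its own sum generally does not.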

Table 4. Beijing-Tianjin-Hebei city population and its centralization and standardization results.

https://doi.org/10.1371/journal.pone.0303456.t004

Table 5. Spatial distance matrix (d_ij) of Beijing-Tianjin-Hebei cities based on traffic mileage.

https://doi.org/10.1371/journal.pone.0303456.t005

3.2 Calculation results

For the data of two years and two statistics, i.e., the local Moran index and the local Geary coefficient, three sets of calculation results are given. The calculation process is simple, and readers can repeat the author’s calculations using Microsoft Excel (see S1 and S2 Datasets). For the local spatial statistics defined by Anselin [19], the first set of local Moran indexes is denoted MI1 and the second set MI2; the first set of local Geary coefficients is denoted GC1 and the second set GC2. Accordingly, the modified local Moran index and Geary coefficient are denoted MI3 and GC3, respectively (Fig 1). The results are as follows. First, the ratio of MI1 to MI2 is not a constant, and neither is the ratio of GC1 to GC2. This shows that the two sets of local Moran indexes and the two sets of local Geary coefficients of Anselin [19] are not equivalent to one another. Second, the ratio of MI1 to MI3 is a constant, and so is the ratio of GC1 to GC3. This shows that the first set of local Moran indexes of Anselin [19] is equivalent to the modified local Moran index in this paper, and that the first set of local Geary coefficients of Anselin [19] is equivalent to the modified local Geary coefficient of this paper (Tables 6 and 7). The reason is that the first set of local Moran indexes and local Geary coefficients defined by Anselin [19] is based on the symmetric spatial contiguity matrix, and the modified statistics in this paper are based on the globally normalized spatial weight matrix, which is also symmetric. By contrast, the second set of local Moran indexes and local Geary coefficients defined by Anselin [19] is based on the locally (row) normalized spatial weight matrix, in which the symmetry is broken.
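The ratio checks described above can be reproduced on synthetic data. The sketch below is not the author’s spreadsheet: the formulas for the six statistics are assumed forms, chosen to be consistent with the relations stated in this section (centralized variable with non-normalized weights for MI1 and GC1, standardized variable with row-normalized weights for MI2 and GC2, standardized variable with globally normalized weights for MI3 and GC3), and the data are random.

```python
import numpy as np

def pairwise_sq_diff(a):
    """Matrix of squared differences (a_i - a_j)^2."""
    return (a[:, None] - a[None, :]) ** 2

def lisa_sets(x, v):
    """Three sets of local Moran indexes and local Geary coefficients (assumed forms)."""
    y = x - x.mean()                         # centralized variable
    z = y / x.std()                          # population-based standardization
    z_s = y / x.std(ddof=1)                  # sample-based standardization
    w = v / v.sum()                          # globally normalized weights
    w_row = v / v.sum(axis=1, keepdims=True) # row-normalized weights
    return {
        "MI1": y * (v @ y),
        "MI2": z * (w_row @ z),
        "MI3": z * (w @ z),
        "GC1": (v * pairwise_sq_diff(x)).sum(axis=1),
        "GC2": (w_row * pairwise_sq_diff(z)).sum(axis=1),
        "GC3": 0.5 * (w * pairwise_sq_diff(z_s)).sum(axis=1),
    }

def global_moran(x, v):
    y = x - x.mean()
    return x.size * (y @ v @ y) / (v.sum() * (y @ y))

def global_geary(x, v):
    y = x - x.mean()
    return (x.size - 1) * (v * pairwise_sq_diff(x)).sum() / (2 * v.sum() * (y @ y))

# Synthetic example: six random locations and sizes (hypothetical data).
rng = np.random.default_rng(0)
pts = rng.uniform(0, 100, size=(6, 2))
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
v = np.divide(1.0, d, out=np.zeros_like(d), where=d > 0)  # inverse distance
x = rng.uniform(10, 1000, size=6)

s = lisa_sets(x, v)
n = x.size
gamma = x.var() * v.sum()            # sigma^2 * V0
gamma_c = 2 * n * v.sum() / (n - 1)  # 2 n V0 / (n - 1)

print("MI1 = gamma * MI3:            ", np.allclose(s["MI1"], gamma * s["MI3"]))
print("GC1 = gamma_c*sigma^2 * GC3:  ", np.allclose(s["GC1"], gamma_c * x.var() * s["GC3"]))
print("sum(MI3) = I:                 ", np.isclose(s["MI3"].sum(), global_moran(x, v)))
print("sum(GC3) = C:                 ", np.isclose(s["GC3"].sum(), global_geary(x, v)))
print("MI1/MI2 per element:          ", s["MI1"] / s["MI2"])
```

Under these assumed forms the elementwise ratio MI1/MI2 equals σ² times the row sum of V, so it varies from one element to the next whenever the row sums differ, which is the non-equivalence reported in Tables 6 and 7.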

Fig 1. A schematic flowchart of the conversion relationship from Moran’s index to different types of LISAs.

(Note: Moran’s index is taken as an example in this figure. By analogy, we can know the conversion process of the Geary’s coefficient. In fact, using Eqs (42) and (44), we can achieve the numerical conversion between Moran’s index and Geary’s coefficient readily).

https://doi.org/10.1371/journal.pone.0303456.g001

Table 6. Comparison of three sets of local Moran index values in two years.

https://doi.org/10.1371/journal.pone.0303456.t006

Table 7. Comparison of three sets of local Geary coefficient values in two years.

https://doi.org/10.1371/journal.pone.0303456.t007

Using the calculation results, we can verify two key equations. The relationship between the sum of the first set of local Moran indexes and the global Moran index satisfies Eq (8), and the relationship between the sum of the first set of local Geary coefficients and the global Geary coefficient satisfies Eq (26). However, the relationship between the sum of the second set of local Moran indexes and the global Moran index does not satisfy Eq (16), and the relationship between the sum of the second set of local Geary coefficients and the global Geary coefficient does not satisfy Eq (27). The sum of the elements of the spatial contiguity matrix is V₀ = 0.6671. In 2000, the population variance of city population in the Beijing-Tianjin-Hebei region is σ² = 65835.5974, thus γ = σ²V₀ = 43916.8725, the global Moran index is I = -0.1191, and the sum of the first set of local Moran indexes is ∑I_i^* = -5229.3702 = γI = 43916.8725×(-0.1191). On the other hand, n = 13, γ_c = 2nV₀/(n-1) = 1.4453, and the global Geary coefficient is C = 1.1377, so the sum of the first set of local Geary coefficients is ∑C_i^* = 108253.8824 = γ_cσ²C = 1.4453×65835.5974×1.1377. However, the sum of the second set of local Moran indexes is ∑I_i^** = -1.4299, while nI = 13×(-0.1191) = -1.5480. The two values are not equal (-1.4299 ≠ -1.5480). The sum of the second set of local Geary coefficients is ∑C_i^** = 30.4883, while 2n²C/(n-1) = 28.1667×1.1377 = 32.0446. The two values are not equal (30.4883 ≠ 32.0446). These results indicate that, with the conventional formulae for the second set of LISAs, Anselin’s [19] second basic requirement cannot be met. The sum of the third set of local Moran indexes is equal to the global Moran index, and the ratio of each first-set local Moran index to the corresponding third-set local Moran index is γ = σ²V₀ = 43916.8725, which is a constant. Likewise, the sum of the third set of local Geary coefficients equals the global Geary coefficient, and the ratio of each first-set local Geary coefficient to the corresponding third-set local Geary coefficient is γ_cσ² = 1.4453×65835.5974 = 95153.2237, which is also a constant (Tables 6 and 7). This suggests that, based on the improved formulae, Anselin’s [19] second basic requirement can be met.

The calculation result of one year may be regarded as an isolated case, so we also examine the situation in 2010. Based on the sixth census data, the population variance of Beijing-Tianjin-Hebei city population is σ² = 184856.6464, thus γ = σ²V₀ = 123312.1000, the global Moran index is I = -0.1124, and the sum of the first set of local Moran indexes is ∑I_i^* = -13856.5039 = γI = 123312.1000×(-0.1124). On the other hand, γ_c = 1.4453, and the global Geary coefficient is C = 1.1329, so the sum of the first set of local Geary coefficients is ∑C_i^* = 302682.5671 = γ_cσ²C = 1.4453×184856.6464×1.1329. However, the sum of the second set of local Moran indexes is ∑I_i^** = -1.3523, while nI = 13×(-0.1124) = -1.4608 (Fig 2(A)). The two numbers are not equal (-1.3523 ≠ -1.4608). The sum of the second set of local Geary coefficients is ∑C_i^** = 30.3506, while 2n²C/(n-1) = 28.1667×1.1329 = 31.9099. The two numbers are not equal (30.3506 ≠ 31.9099). These results once again indicate that Anselin’s [19] second basic requirement cannot be satisfied with the common formulae. The sum of the third set of local Moran indexes is equal to the global Moran index, and the ratio of each first-set local Moran index to the corresponding third-set local Moran index is γ = σ²V₀ = 123312.1000, which is a constant (Fig 2(B)). Likewise, the sum of the third set of local Geary coefficients equals the global Geary coefficient, and the ratio of each first-set local Geary coefficient to the corresponding third-set local Geary coefficient is γ_cσ² = 1.4453×184856.6464 = 267176.2168, which is also a constant (Tables 6 and 7). This suggests that, based on the new formulae, Anselin’s [19] second basic requirement can be satisfied once again. The calculation results of the two years thus fully support the previous theoretical inferences and related judgments.
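The arithmetic for both years can be rechecked directly from the figures quoted in the text. The sketch below uses only the rounded values reported above (V₀, σ², I, C and the two second-set sums); because those inputs are rounded to four decimals, the recomputed products agree with the reported sums only to about three or four significant digits. The dictionary layout and function name are illustrative choices.

```python
n = 13
V0 = 0.6671  # sum of the elements of the spatial contiguity matrix, as reported

# Values quoted in the text for each census year.
reported = {
    2000: dict(var=65835.5974, I=-0.1191, C=1.1377,
               sum_MI1=-5229.3702, sum_GC1=108253.8824,
               sum_MI2=-1.4299, sum_GC2=30.4883),
    2010: dict(var=184856.6464, I=-0.1124, C=1.1329,
               sum_MI1=-13856.5039, sum_GC1=302682.5671,
               sum_MI2=-1.3523, sum_GC2=30.3506),
}

def check(year):
    r = reported[year]
    gamma = r["var"] * V0                # gamma = sigma^2 * V0
    gamma_c = 2 * n * V0 / (n - 1)       # gamma_c = 2 n V0 / (n - 1)
    return {
        "gamma": gamma,
        "gamma_c": gamma_c,
        # First set: sums should match gamma*I and gamma_c*sigma^2*C.
        "rel_err_MI1": gamma * r["I"] / r["sum_MI1"] - 1,
        "rel_err_GC1": gamma_c * r["var"] * r["C"] / r["sum_GC1"] - 1,
        # Second set: gaps between the actual sums and the claimed n*I, 2n^2*C/(n-1).
        "gap_MI2": r["sum_MI2"] - n * r["I"],
        "gap_GC2": r["sum_GC2"] - 2 * n**2 / (n - 1) * r["C"],
    }

for year in (2000, 2010):
    print(year, check(year))
```

For the first set the relative errors are at the level of input rounding, whereas for the second set the gaps are several percent of the claimed values, far larger than rounding could explain.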

Fig 2. The relationships between three sets of local Moran’s indexes of BTH cities in 2010.

(a) MI2 vs MI1 (high correlation). (b) 2MI3 vs MI1 (perfect fit) (Note: The second set of local Moran’s indexes (MI2) is highly correlated with the first set of local Moran’s indexes (MI1), but the two sets are not equivalent. The third set of local Moran’s indexes (MI3) is equivalent to the first set of local Moran’s indexes (MI1). The coefficient is 1/γ = 1/123312.1000 = 0.000008110. MI2 does not satisfy the second requirement for LISAs given by Anselin [19]).

https://doi.org/10.1371/journal.pone.0303456.g002

4 Questions and discussion

The re-expressed local Moran indexes and local Geary coefficients in this work are derived from Anselin’s correct definitions and relationships, without substantial innovation. The contribution of this study lies in three aspects. First, it clarifies a series of logical misunderstandings about local spatial autocorrelation statistics and gives the correct expressions. Second, it normalizes the local spatial autocorrelation statistics, and the canonical results are more convenient to apply. Third, it clarifies a number of fundamental concepts related to spatial autocorrelation that have long been confused in the literature. Following the tradition of statistics, important concepts and their symbols have been distinguished. In particular, the study emphasizes the distance axiom hidden behind the spatial weight matrix. If the spatial contiguity matrix is normalized by row, the locally normalized spatial weight matrix bears a different mathematical structure from both the non-normalized spatial weight matrix and the spatial weight matrix globally normalized by sum. Applying results derived from models based on the non-normalized spatial weight matrix to relation formulae based on the row-normalized spatial weight matrix yields wrong mathematical expressions. Generally speaking, the spatial contiguity matrix is symmetric. Therefore, the non-normalized spatial weight matrix and the globally normalized spatial weight matrix are symmetric as well. Substituting an asymmetric spatial weight matrix for a symmetric one leads to two wrong relations: first, that the sum of the local Moran indexes based on the standardized variable and the locally normalized weight matrix is equal to n times the global Moran index; second, that the sum of the local Geary coefficients based on the standardized variable and the locally normalized weight matrix is equal to 2n²/(n-1) times the global Geary coefficient. In fact, the two relations can never be derived from Anselin’s original assumptions.
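The first wrong relation can be made explicit with a short worked comparison. The notation V_i for the i-th row sum of the contiguity matrix is introduced here for illustration, and the second-set local Moran index is assumed to take the form z_i ∑_j (v_ij/V_i) z_j, with z the population-standardized variable.

```latex
% Row sums and grand sum of the contiguity matrix
V_i = \sum_{j} v_{ij}, \qquad V_0 = \sum_{i} V_i .

% Sum of the second set of local Moran indexes (row-normalized weights)
\sum_{i} I_i^{**} = \sum_{i} \frac{1}{V_i} \sum_{j} v_{ij} z_i z_j .

% n times the global Moran index (globally normalized weights)
nI = \frac{n}{V_0} \sum_{i} \sum_{j} v_{ij} z_i z_j
   = \sum_{i} \frac{n}{V_0} \sum_{j} v_{ij} z_i z_j .
```

The two sums agree term by term when every row sum V_i equals V₀/n, that is, when all rows of V carry the same total weight. For a distance-based contiguity matrix the row sums generally differ, so the two quantities generally differ as well, which is consistent with the mismatches reported in Section 3.2.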

The errors caused by the wrong relations are not large in many cases, but they have a far-reaching impact on geographical analysis. Concretely speaking, these incorrect relationships lead to a series of problems (Table 8): (1) The relationship between the two definitions of the local Moran index is broken (they are not equivalent to each other). The first set of LISAs is based on the symmetric spatial adjacency matrix, and the second set is based on the asymmetric spatial weight matrix normalized by row. As a result, the ratio of the values of the two sets of parameters is not a constant. (2) When defining the local spatial autocorrelation index, only the relationship between one element and the other elements is considered, and the pairwise correlation between all elements is ignored. That is, for the local index of the ith geographical element, only the relationships between element i and element j are taken into account, while the relationships between element j and element k are neglected (i, j, k = 1,2,3,…,n). In this case, the wholeness of a geographical system is overlooked in the local spatial analysis. (3) The absolute value of the local Moran index may exceed 1, which decouples it from the concept of a correlation coefficient. Moran’s index was proposed by analogy with the Pearson correlation, and its values fall between -1 and 1. (4) The parameters lack clear boundary values and critical values. The boundary values of the Moran index are -1 and 1; the critical value is 0 in theory and 1/(1-n) empirically. The boundary values of the Geary coefficient are 0 and 2, and the critical value is theoretically 1. In addition, Anselin [19] used the population standard deviation in place of the sample standard deviation when defining the local Geary coefficient. Logically this poses no problem, but historically it does: the result violates the original intention of the definition of the Geary coefficient. In spatial analysis, it is sometimes difficult to distinguish between spatial samples and spatial populations. Moran’s index, which is derived from the Pearson correlation coefficient, as indicated above, is a statistic based on the population standard deviation. Geary’s coefficient is defined by analogy with the Durbin-Watson statistic, based on the sample standard deviation, in order to make up for the deficiency of Moran’s index. To define the local Geary coefficient, we should respect the original meaning of the definition of the Geary coefficient, so that the local Geary coefficient can be effectively associated with the global Geary coefficient. The existing literature shows that some readers have noticed Anselin’s mistakes. Some scholars adopt a compromise approach. For example, they use the globally normalized spatial weight matrix instead of the row-normalized spatial weight matrix, but multiply the corrected local Moran index formula by n; the author found this kind of treatment in some teaching courseware. This ensures that the sum of the local Moran indexes is equal to n times the global Moran index.
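The effect of the choice between population and sample standard deviation on the local Geary coefficient can be illustrated numerically. The sketch below assumes the globally normalized form (1/2)∑_j w_ij (z_i − z_j)² for the local Geary coefficient and uses a small hypothetical data set; the function name is illustrative.

```python
import numpy as np

def geary_sums(x, v):
    """Global Geary C and the sum of local Geary values under two standardizations."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    n = x.size
    w = v / v.sum()                              # globally normalized weights
    d2 = (x[:, None] - x[None, :]) ** 2          # squared differences (x_i - x_j)^2
    ss = ((x - x.mean()) ** 2).sum()
    C = (n - 1) * (v * d2).sum() / (2 * v.sum() * ss)
    sum_pop = 0.5 * (w * d2).sum() / x.var()         # population variance (ddof=0)
    sum_samp = 0.5 * (w * d2).sum() / x.var(ddof=1)  # sample variance (ddof=1)
    return C, sum_pop, sum_samp

# Hypothetical example: five places on a line with inverse-distance contiguity.
pos = np.array([0.0, 1.0, 3.0, 6.0, 10.0])
d = np.abs(pos[:, None] - pos[None, :])
v = np.divide(1.0, d, out=np.zeros_like(d), where=d > 0)
x = np.array([3.0, 1.0, 4.0, 1.0, 5.0])

C, sum_pop, sum_samp = geary_sums(x, v)
n = x.size
print("sample std:     sum equals C          ->", np.isclose(sum_samp, C))
print("population std: sum equals n/(n-1)*C  ->", np.isclose(sum_pop, n / (n - 1) * C))
```

With sample standardization the local values add up to the global Geary coefficient; with population standardization the sum is inflated by the factor n/(n-1), so the link to the global coefficient holds only up to that factor.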

Table 8. Functions and problems of Anselin’s LISA and the improved effect of this paper.

https://doi.org/10.1371/journal.pone.0303456.t008

Anselin is a well-known and outstanding scholar in the field of geographical spatial analysis. Because of the far-reaching influence of his work, its logical errors have caused confusion in application and interpretation. Science respects logic and facts, not authority; only pseudoscience starts from authoritative judgment. To solve the above problems, this paper takes the following steps in the mathematical deduction. First, it returns to the essence of the spatial distance matrix behind the spatial weight matrix and respects the basic distance axiom. The global spatial weight matrix is obtained by global normalization of the spatial contiguity matrix, and it is used to replace Anselin’s row-normalized weight matrix. In this way, the connotation of the concept is unified and the logic is consistent, which avoids reasoning mistakes. Second, it starts from the original idea of Moran’s index and Geary’s coefficient. In defining the normalized local Moran’s index, the population standard deviation is used to standardize the size variable; in defining the normalized local Geary’s coefficient, the sample standard deviation is used. Third, it starts from the original intention of Anselin [19]. Anselin gives two sets of local Moran’s indexes and local Geary’s coefficients, but they are inconsistent with each other. Examining the reasoning process shows that the error stems from an unintentional replacement of concepts. According to the notation and simplification principle of this paper, we transform Anselin’s second set of local Moran index and local Geary coefficient formulae. Comparing the two sets of results reveals the problems and clarifies the similarities and differences between the two sets of formulae (Tables 8 and 9).

Table 9. Comparison between normalized LISA and the equivalent transformation results of Anselin’s second set of LISA definitions.

https://doi.org/10.1371/journal.pone.0303456.t009

Finally, it is appropriate to briefly discuss the definition of the spatial weight matrix. Spatial autocorrelation analysis depends on the spatial contiguity matrix, which has multiple definitions. In fact, the definition of spatial contiguity involves different spatial effects. Spatial effects of geographical processes fall into two categories: action at a distance and local action [37]. Local action can be expressed mathematically with a step function and numerically with a nominal variable. In spatial autocorrelation analysis, the spatial contiguity matrix based on local action is mainly applicable to relationships between regions. The spatial contiguity of regions can be defined in three ways: Rook’s contiguity, Bishop’s contiguity, and Queen’s contiguity [38]. Rook’s contiguity plus Bishop’s contiguity yields Queen’s contiguity. Rook’s contiguity corresponds to the von Neumann neighborhood, while Queen’s contiguity corresponds to the Moore neighborhood [39]. Action at a distance can be reflected by a certain distance measure, including Euclidean distance, travel time, transportation mileage, and so on. When converting distances into a spatial contiguity matrix, a spatial contiguity function needs to be adopted. Common spatial contiguity functions include the absolute step function, the relative step function, the exponential function, and the inverse distance function (a type of hyperbolic function) [6, 12, 27]. A distance-based spatial contiguity matrix is suitable for networks of locations such as urban systems. In this case, based on the step function, spatial contiguity is represented by a nominal variable (a dummy variable in discrete format); based on other functions, it is represented by a metric variable (a continuous variable). Although the function expressions are different, the logic behind them is consistent. Mathematics is the pinnacle of logic. In mathematics, the most basic function is the exponential function. Various forms of simple functions can be reduced to the exponential function. The step function is an extreme form of an exponential function, and a moving average on the step function can yield an inverse distance function [40]. So, using different functions to define spatial contiguity matrices will definitely affect the calculation results, but it has no impact on the mathematical reasoning and the logical relationships behind it. Row normalization of the weight matrix affects the mathematical reasoning because it changes the logic behind the spatial weight matrix, and that logic is regulated by the distance axiom. Scientific research typically involves three worlds: the real world, the mathematical world, and the computational world [41]. The process of mathematical transformation and derivation belongs to the mathematical world, while the selection of the form of the spatial weight matrix belongs to the computational world. The key is to choose the appropriate definition of the spatial contiguity matrix for different geographic systems and situations [27]. One obvious drawback of this study is the lack of empirical analysis based on different types of spatial weight matrices. Therefore, the influence of the types and structure of spatial contiguity matrices on the theoretical modelling and computational results of spatial autocorrelation remains empirically untested here.
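The families of distance-based contiguity functions named above can be sketched as follows. This is a hedged illustration only: the threshold r, the scale parameter r0, the function names and the four-place distance matrix are all hypothetical, and the step function shown is a plain distance threshold rather than the specific absolute or relative variants defined in the cited literature.

```python
import numpy as np

def step_contiguity(d, r):
    """Step function: 1 if two distinct places lie within distance r, else 0."""
    v = (d <= r).astype(float)
    np.fill_diagonal(v, 0.0)   # a place is not its own neighbour
    return v

def exponential_contiguity(d, r0):
    """Exponential decay: v_ij = exp(-d_ij / r0) for i != j."""
    v = np.exp(-d / r0)
    np.fill_diagonal(v, 0.0)
    return v

def inverse_distance_contiguity(d):
    """Inverse distance: v_ij = 1 / d_ij for i != j."""
    return np.divide(1.0, d, out=np.zeros_like(d), where=d > 0)

# Hypothetical road distances (km) between four places.
d = np.array([[0, 120, 280, 300],
              [120, 0, 200, 310],
              [280, 200, 0, 150],
              [300, 310, 150, 0]], dtype=float)

for name, v in [("step (r = 200)", step_contiguity(d, 200.0)),
                ("exponential (r0 = 150)", exponential_contiguity(d, 150.0)),
                ("inverse distance", inverse_distance_contiguity(d))]:
    print(name, "symmetric:", np.allclose(v, v.T), "V0 =", round(v.sum(), 4))
```

Each function returns a symmetric matrix with a zero diagonal when the distance matrix is symmetric, so global normalization by V₀ keeps the symmetry regardless of which function is chosen; the numerical values of V₀ and of the resulting indexes will of course differ.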

5 Conclusions

The global spatial autocorrelation coefficients reflect the sum of correlations between any two geographical elements in a region, while the local spatial autocorrelation indexes reflect the sum of correlations between one geographical element and all other geographical elements. The sum of the parts is proportional to the whole. The first set of local Moran indexes and Geary coefficients defined by Anselin [19] is effective and consistent with the idea of the global Moran index and Geary coefficient. However, the second set of local Moran indexes and local Geary coefficients defined by him is not equivalent to the first set. The non-normalized spatial weight matrix is isomorphic to the sum-based normalized spatial weight matrix, but not to the row-based normalized spatial weight matrix. Results derived from the non-normalized spatial weight matrix cannot be directly applied to mathematical relations based on the row-normalized spatial weight matrix. The key issue is that Anselin [19] directly applied the results derived from the non-normalized spatial weight matrix to the relationship formula based on the row-normalized spatial weight matrix. This paper is devoted to correcting the unintentional mistakes in his reasoning process and gives a third set of definitions of the local Moran indexes and local Geary coefficients in canonical form. The newly defined local Moran index and local Geary coefficient are simple and concise. The improved expressions are consistent with the original intention of Anselin [19] and with the statistical essence of the global Moran index and global Geary coefficient.

Local spatial autocorrelation analysis is a methodology developed on the basis of global spatial autocorrelation analysis. The progress of science has no end. The main points of this paper are summarized as follows. Firstly, the LISA defined in the literature is of great significance for the analysis of local spatial autocorrelation, but it also has some faults. The first set of LISAs is based on centralized variables and the non-normalized spatial contiguity matrix, and it lacks clear boundary values and a critical value. The second set of LISAs is based on standardized variables and the row-normalized spatial weight matrix, which ignores the global relationship behind the local analysis. One result is that the two sets of indexes are not equivalent to one another. In addition, the population standard deviation is adopted when defining the second set of local Geary coefficients, which violates the original intention of the Geary coefficient. All the indexes lack clear boundary values and a critical value, and they are decoupled from the correlation coefficient. One consequence is that the analysis process is complex; the other is that the conclusions drawn from the two sets of indexes are often inconsistent with each other. Secondly, the LISA expressions are reconstructed using the sum-normalized spatial weight matrix and size variables standardized by z-score, which eliminates the defects of Anselin’s LISA definition and yields canonical spatial autocorrelation measurements. The sum-based globally normalized spatial weight matrix replaces the row-based locally normalized spatial weight matrix. The population standard deviation is used to standardize the variables when defining the local Moran indexes, and the sample standard deviation is used when defining the local Geary coefficients. The LISA problem of Anselin [19] can thus be solved effectively, and the results are simpler and more concise. The results given in this paper are equivalent to those given by Anselin’s first set of formulae, i.e., the first sets of local Moran indexes and local Geary coefficients, but they are not linearly proportional to the results of the second set of formulae, namely the second sets of local Moran indexes and local Geary coefficients.

Supporting information

S1 File. Anselin’s derivation and expressions for LISA.

This is a microcosm of Anselin’s paper on LISA. The key parts of Anselin’s mathematical reasoning are extracted, and the main errors in the reasoning process are revealed. This file uses Anselin’s original symbol system. Through this file, readers can more easily grasp the essence of the problem.

https://doi.org/10.1371/journal.pone.0303456.s001

(DOCX)

S2 File. Value transformation methods and formulae.

This file shows common concepts and methods of value transformation and the corresponding formulae for variable standardization. It clarifies some confusion and inappropriate expressions regarding variable standardization in the literature.

https://doi.org/10.1371/journal.pone.0303456.s002

(DOCX)

S1 Dataset. Spatial data sets and calculation results of local spatial autocorrelation indexes for 2000.

This file includes the dataset of spatial distances and city population in 2000, global Moran’s indexes and Geary’s coefficients, three sets of local Moran’s index, and three sets of local Geary’s coefficients. The original data and calculation process are displayed for readers.

https://doi.org/10.1371/journal.pone.0303456.s003

(XLSX)

S2 Dataset. Spatial data sets and calculation results of local spatial autocorrelation indexes for 2010.

This file includes the dataset of spatial distances and city population in 2010, global Moran’s indexes and Geary’s coefficients, three sets of local Moran’s index, and three sets of local Geary’s coefficients. All the results are tabulated for comparison and references.

https://doi.org/10.1371/journal.pone.0303456.s004

(XLSX)

Acknowledgments

My student, Dr. Yuqing Long, has extracted spatial distance matrix data from the Beijing Tianjin Hebei urban network map for me, and I would like to express my gratitude. I would like to thank the anonymous reviewer and Dr. Yuxia Wang whose interesting and constructive comments were very helpful in improving the quality of this paper. The academic editor, Dr. Yuxia Wang, put in tremendous effort to invite reviewers for this paper, and I am particularly grateful for it.

References

1. Hartshorne R. Perspective on the Nature of Geography. Chicago: Rand McNally & Company; 1959.
2. Hu ZL, Chen YG, Liu T. Three laws of the changes in economic geography. Economic Geography. 2018; 38(10): 1–4 [In Chinese].
3. Martin GJ. All Possible Worlds: A History of Geographical Ideas (4th Revised Edition). New York, NY: Oxford University Press; 2005.
4. Schaefer FK. Exceptionalism in geography: a methodological examination. Annals of the Association of American Geographers. 1953; 43: 226–249.
5. Griffith DA. Spatial Autocorrelation and Spatial Filtering: Gaining Understanding Through Theory and Scientific Visualization. Berlin: Springer; 2003.
6. Haggett P, Cliff AD, Frey A. Locational Analysis in Human Geography. London: Edward Arnold Ltd.; 1977.
7. Geary RC. The contiguity ratio and statistical mapping. The Incorporated Statistician. 1954; 5: 115–145.
8. Moran PAP. The interpretation of statistical maps. Journal of the Royal Statistical Society, Series B. 1948; 37(2): 243–251.
9. Moran PAP. Notes on continuous stochastic phenomena. Biometrika. 1950; 37: 17–33. pmid:15420245
10. Cliff AD, Ord JK. Spatial Autocorrelation. London: Pion Limited; 1973.
11. Cliff AD, Ord JK. Spatial Processes: Models and Applications. London: Pion Limited; 1981.
12. Odland J. Spatial Autocorrelation. London: SAGE Publications; 1988.
13. Anselin L. The Moran scatterplot as an ESDA tool to assess local instability in spatial association. In: Fischer M, Scholten HJ, Unwin D (eds.). Spatial Analytical Perspectives on GIS. London: Taylor & Francis; 1996. pp.111–125.
14. Tobler W. A computer movie simulating urban growth in the Detroit region. Economic Geography. 1970; 46(2): 234–240.
15. Tobler W. On the first law of geography: A reply. Annals of the Association of American Geographers. 2004; 94(2): 304–310.
16. Fotheringham AS. Trends in quantitative methods I: Stressing the Local. Progress in Human Geography. 1997; 21: 88–96.
17. Fotheringham AS. Trends in quantitative methods II: Stressing the computational. Progress in Human Geography. 1998; 22: 283–292.
18. Fotheringham AS. Trends in quantitative methods III: Stressing the visual. Progress in Human Geography. 1999; 23(4): 597–606.
19. Anselin L. Local indicators of spatial association—LISA. Geographical Analysis. 1995; 27(2): 93–115.
20. Getis A, Aldstadt J. Constructing the spatial weights matrix using a local statistic. Geographical Analysis. 2004; 36 (2): 90–104.
21. Getis A, Ord JK. An analysis of spatial association by use of distance statistic. Geographical Analysis. 1992; 24(3):189–206.
22. Ord JK, Getis A. Local spatial autocorrelation statistics: Distributional issues and an application. Geographical Analysis. 1995; 27(4): 286–306.
23. Goodchild MF. GIScience, geography, form, and process. Annals of the Association of American Geographers. 2004; 94(4): 709–714.
24. de Jong P, Sprenger C, van Veen F. On extreme values of Moran’s I and Geary’s C. Geographical Analysis. 1984; 16(1): 985–999.
25. Tiefelsdorf M, Boots B. The exact distribution of Moran’s I. Environment and Planning A. 1995; 27(6): 985–999.
26. Xu F. Improving spatial autocorrelation statistics based on Moran’s index and spectral graph theory. Urban Development Studies. 2021; 28(12): 94–103 [In Chinese].
27. Chen YG. On the four types of weight functions for spatial contiguity matrix. Letters in Spatial and Resource Sciences. 2012; 5(2): 65–72.
28. Getis A. Spatial weights matrices. Geographical Analysis. 2009; 41(4): 404–410.
29. Chen YG. New approaches for calculating Moran’s index of spatial autocorrelation. PLoS ONE. 2013; 8(7): e68336. pmid:23874592
30. Chen YG. Spatial autocorrelation approaches to testing residuals from least squares regression. PLoS ONE. 2016; 11(1): e0146865. pmid:26800271
31. Magnello E, van Loon B. Introducing Statistics: A Graphic Guide. London: Icon Books; 2009.
32. Chen YG. Spatial autocorrelation equation based on Moran’s index. Scientific Reports. 2023; 13: 19296. pmid:37935705
33. Wheelan C. Naked Statistics: Stripping the Dread from the Data. New York and London: W. W. Norton & Company; 2013.
34. Taylor PJ. Quantitative Methods in Geography. Prospect Heights, Illinois: Waveland Press; 1983.
35. Louf R, Barthelemy M. Scaling: lost in the smog. Environment and Planning B: Planning and Design. 2014; 41: 767–769.
36. Long YQ, Chen YG. Multi-scaling allometric analysis of the Beijing-Tianjin-Hebei urban system based on nighttime light data. Progress in Geography. 2019; 38(1): 88–100 [In Chinese].
37. Chen YG, Li YJ, Feng S, Man XM, Long YQ. Gravitational scaling analysis on spatial diffusion of COVID-19 in Hubei province, China. PLoS ONE. 2021; 16(6): e0252889. pmid:34115791
38. Widip CA, Utomo WH, Yulianto SJP. Identification of spatial patterns of food insecurity regions using Moran’s I (Case study: Boyolali regency). International Journal of Computer Applications. 2013; 72(2): 54–62.
39. Batty M, Couclelis H, Eichen M. Urban systems as cellular automata. Environment and Planning B: Planning and Design. 1997; 24: 159–164.
40. Chen YG. Power-law distributions based on exponential distributions: Latent scaling, spurious Zipf’s law, and fractal rabbits. Fractals. 2015; 23(2): 1550009.
41. Casti JL. Would-Be Worlds: How Simulation Is Changing the Frontiers of Science. New York: John Wiley and Sons; 1996.

