Reconstruction and normalization of LISA for spatial analysis

The local indicators of spatial association (LISA) are important measures for spatial autocorrelation analysis. However, there is an inadvertent fault in the mathematical derivation of LISA in the literature, so that the local Moran and Geary indicators do not satisfy the second basic requirement for LISA: the sum of the local indicators is proportional to a global indicator. This paper aims to reconstruct the calculation formulae of the local Moran indexes and Geary coefficients through mathematical derivation and empirical evidence. Two sets of LISAs are clarified by new mathematical reasoning. One set of LISAs is based on non-normalized weights and non-centralized variable (MI1 and GC1), and the other set is based on row normalized weights and standardized variable (MI2 and GC2). The results show that the first set of LISAs satisfies the above-mentioned second requirement, but the second set does not. A third set of LISAs is then proposed, which can be treated as canonical forms (MI3 and GC3). This set of LISAs satisfies the second requirement. The observational data of city population and traffic mileage in the Beijing-Tianjin-Hebei region of China are employed to verify the theoretical results. This study helps to clarify the misunderstandings about LISAs in the field of geospatial analysis.


Introduction
Geography has two core concepts: difference and dependence. The former is related to a classical topic of geography, while the latter is related to spatial correlation analysis. The concept of spatial difference is also termed regional difference, which came from areal differentiation (Hartshorne, 1959; Hu et al, 2018; Martin, 2005). The traditional concept of difference seems to be in contradiction with the pursuit of general laws, so geography embarked on the road of "exceptionalism" (Schaefer, 1953). After the quantitative revolution, geography began to attach importance to spatial correlation, which indicates spatial dependence. Gravity models, spatial interaction models, and spatial autocorrelation analysis are the main approaches to researching spatial correlation processes (Griffith, 2003; Haggett et al, 1977). Spatial autocorrelation was originally a concept of biological statistics, mainly used to evaluate whether spatial sampling results meet the traditional statistical requirements (Moran, 1948; Moran, 1950; Geary, 1954). When geographers introduced spatial autocorrelation measures into geospatial analysis, they found that there are few spatially uncorrelated phenomena. In this context, the spatial autocorrelation analysis method was developed (Cliff and Ord, 1973; Cliff and Ord, 1981; Odland, 1988). The early spatial autocorrelation analysis was only at the global level, rarely involving the local level, so it provided limited geospatial information. In other words, the initial spatial autocorrelation analysis focused on spatial dependence rather than spatial difference. After the theoretical revolution in the later period of the quantitative revolution was frustrated, the traditional regional school of thought in geography returned quietly, and the concept of regional difference was again valued by geographers under the new expression of spatial heterogeneity (Anselin, 1996). Tobler (1970) proposed the first law of geography based on spatial dependence, and Harvey proposed that spatial heterogeneity be the second law of geography (Tobler, 2004). The study of spatial heterogeneity naturally involves spatial locality. According to Fotheringham (1997, 1998, 1999), there are three trends in the development of quantitative geography: localization, computation and visualization. In this context, local spatial autocorrelation analysis came into being (Anselin, 1995; Anselin, 1996; Getis and Aldstadt, 2004; Getis and Ord, 1992; Ord and Getis, 1995). Therefore, spatial difference (heterogeneity) and spatial correlation (dependency) have reached the same goal through different routes (Anselin, 1996; Goodchild, 2004).
Local spatial autocorrelation analysis is developed on the basis of global spatial autocorrelation analysis. The Local Indicators of Spatial Association (LISA) proposed by Anselin (1995) play an important role in the local correlation analysis of geographical research. LISA includes local Moran indexes and local Geary coefficients. These spatial statistics, together with the G index proposed by Getis and Ord (1992) and the Moran scatterplot proposed by Anselin (1996), have become systematic tools for local autocorrelation analysis. However, even the wisest are not always free from error.
Anselin's outstanding paper contains some significant issues that need to be addressed. The main problems are as follows. First, there is an unintentional mistake in the mathematical reasoning, which resulted from skipping a step in a mathematical transformation. (Chen, 2017). For example, the boundary values of the Pearson correlation coefficient are -1 and 1, and the critical value is 0. The purpose of this paper is to develop the spatial measures based on LISA. The remaining parts are organized as follows. In Section 2, Anselin's mathematical reasoning process is sorted out and his unintentional mistakes are corrected. Based on the mathematical derivation, the local Moran index and local Geary coefficient are normalized. In addition, the strict mathematical relationship between Moran's indexes and Geary's coefficients is derived. In Section 3, the observational data of the system of cities in the Beijing-Tianjin-Hebei region of China are employed to test the improved results. In Sections 4 and 5, the related questions are discussed, and finally, the discussion is concluded by summarizing the main points of this study.

The first formula of local Moran index
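One of the bases of spatial analysis is the spatial proximity matrix, which can be measured by a spatial distance matrix. A spatial distance matrix or spatial proximity matrix can be transformed into a spatial contiguity matrix by means of a spatial weight function such as a negative power law or a step function (Chen, 2012; Getis, 2009). Based on the spatial contiguity matrix $V=[v_{ij}]$, the first local Moran index of Anselin (1995) can be expressed as

$$I_i^{*} = (x_i-\bar{x})\sum_{j=1}^{n} v_{ij}(x_j-\bar{x}) = y_i\sum_{j=1}^{n} v_{ij}y_j, \tag{1}$$

where $y_i = x_i-\bar{x}$ and $y_j = x_j-\bar{x}$ denote centralized size variables, and $\bar{x}$ refers to the mean value.
The centralized variables can be transformed into standardized variables by means of the z-score formula. Based on the population standard deviation $\sigma$, the standardized variables can be expressed as $z_i = y_i/\sigma = (x_i-\bar{x})/\sigma$, where $z$ denotes the standardized variable. The sum of equation (1) is

$$\sum_{i} I_i^{*} = \sum_{i}\sum_{j} v_{ij}y_iy_j, \tag{2}$$

which is essentially the sum of squares of spatial weighted deviations. The sum of the elements in the spatial contiguity matrix is

$$V_0 = \sum_{i}\sum_{j} v_{ij}. \tag{3}$$

Dividing equation (1) by $V_0$ yields the spatial weighted auto-covariance as follows

$$\mathrm{Cov}_i = \frac{I_i^{*}}{V_0} = \frac{1}{V_0}\,y_i\sum_{j} v_{ij}y_j. \tag{4}$$

Furthermore, the spatial weighted covariance can be divided by the population variance of the size variable, which is called the second moment by Anselin (1995), that is

$$\sigma^2 = \frac{1}{n}\sum_{i}(x_i-\bar{x})^2 = \frac{1}{n}\sum_{i} y_i^2. \tag{5}$$

The result is the global Moran's index, $I=\mathrm{Cov}/\sigma^2$, where $\mathrm{Cov}=\sum_i \mathrm{Cov}_i$. It can be expanded as

$$I = \frac{1}{V_0\sigma^2}\sum_{i}\sum_{j} v_{ij}y_iy_j = \sum_{i}\sum_{j} w_{ij}z_iz_j, \tag{6}$$

where $w_{ij}=v_{ij}/V_0$ is the element of the global normalized weight matrix $W$. According to Anselin (1995), equation (6) can be expressed as

$$I = \frac{1}{V_0\sigma^2}\sum_{i} I_i^{*}. \tag{7}$$

The relationship between the sum of Anselin's first local Moran index and the global Moran index is obtained as below

$$\sum_{i} I_i^{*} = \sigma^2 V_0 I = \gamma I. \tag{8}$$

The proportionality coefficient in equation (8) is

$$\gamma = \sigma^2 V_0. \tag{9}$$

Equation (3) can be replaced by a vector indicating the sums of the rows of the spatial contiguity matrix as below

$$V_i = \sum_{j} v_{ij}. \tag{10}$$

The spatial contiguity matrix can be normalized by row. Anselin (1995) called the result the row-standardized spatial weights matrix. In this way, equation (4) becomes a locally weighted spatial auto-covariance,

$$\mathrm{Cov}_i^{*} = \frac{1}{V_i}\,y_i\sum_{j} v_{ij}y_j. \tag{11}$$

The summation of equation (11) is

$$\sum_{i}\mathrm{Cov}_i^{*} = \sum_{i}\frac{1}{V_i}\sum_{j} v_{ij}y_iy_j. \tag{12}$$

In this case, it is impossible to obtain the global spatial weighted auto-covariance, and it is impossible to derive the simple summation relationship between the local Moran index and the global Moran index. If so, the reasoning from equation (4) to equation (9) will be invalid.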
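The proportionality between the sum of the first local Moran indexes and the global Moran index can be checked numerically. The following is a minimal sketch, assuming that the first local index takes the form $I_i^* = y_i\sum_j v_{ij}y_j$ and that $\gamma = \sigma^2 V_0$; the three-place data set and the function names are hypothetical and are not the BTH data used in this paper.

```python
# Toy check of the first local Moran index against the global Moran index.
# The data are hypothetical; function names are illustrative only.

def first_local_moran(x, v):
    """Return I_i* = y_i * sum_j v_ij * y_j, with y the centralized variable."""
    n = len(x)
    mean = sum(x) / n
    y = [xi - mean for xi in x]
    return [y[i] * sum(v[i][j] * y[j] for j in range(n)) for i in range(n)]

def global_moran(x, v):
    """Return (I, sigma2, V0), where I = sum_ij v_ij y_i y_j / (V0 * sigma2)."""
    n = len(x)
    mean = sum(x) / n
    y = [xi - mean for xi in x]
    sigma2 = sum(yi * yi for yi in y) / n      # population variance
    v0 = sum(sum(row) for row in v)            # sum of all contiguity values
    cross = sum(v[i][j] * y[i] * y[j] for i in range(n) for j in range(n))
    return cross / (v0 * sigma2), sigma2, v0

x = [1.0, 2.0, 6.0]                            # size variable
v = [[0.0, 1.0, 0.5],                          # symmetric contiguity matrix
     [1.0, 0.0, 0.25],
     [0.5, 0.25, 0.0]]

local = first_local_moran(x, v)
moran_i, sigma2, v0 = global_moran(x, v)
gamma = sigma2 * v0                            # proportionality coefficient

print("local indexes:", local)
print("sum of local indexes:", sum(local), "gamma * I:", gamma * moran_i)
```

In this toy case the two printed totals agree, which is what the second requirement for LISA demands of the first set; the local values themselves are not bounded by -1 and 1.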

The second formula of local Moran index
Suppose that the variables are standardized and the spatial contiguity matrix is transformed into a spatial weight matrix normalized by row. In this way, $V_0$ is replaced by $V_i$ in equation (4). Thus, the revised equation (4) divided by the population variance yields the second local Moran's index formula of Anselin (1995),

$$I_i^{**} = \frac{1}{V_i\sigma^2}\,y_i\sum_{j} v_{ij}y_j = z_i\sum_{j} w_{ij}^{*}z_j, \tag{13}$$

where $w_{ij}^{*}=v_{ij}/V_i$ denotes the elements in the row normalized spatial weight matrix, $V^{*}$. Thus, the sum of the spatial weight matrix is

$$V_0^{*} = \sum_{i}\sum_{j} w_{ij}^{*}. \tag{14}$$

The variance of the standardized variable is 1, namely, $\sigma^2=1$. For the matrix normalized by row, the sum is

$$V_0^{*} = \sum_{i}\frac{1}{V_i}\sum_{j} v_{ij} = n. \tag{15}$$

Substituting equation (15) into equation (8) seems to yield the following relation

$$\sum_{i} I_i^{**} = nI. \tag{16}$$

On the surface, there is no problem at all. The two asterisks indicate the inherent difference between the two sets of local Moran's indexes. However, Anselin (1995) inadvertently made a mistake in the above reasoning process.
Problems of mathematical deduction can be revealed through logical analysis, and can also be reflected through empirical analysis. Let us check the problem from another angle. The relation between the second set of local Moran's indexes of Anselin (1995) and the global Moran's index can be derived from equation (13). The summation of the local Moran's indexes based on equation (13) is

$$\sum_{i} I_i^{**} = \frac{1}{\sigma^2}\sum_{i}\frac{1}{V_i}\sum_{j} v_{ij}y_iy_j. \tag{17}$$

By variable standardization, the population standard deviation becomes 1 unit, i.e., $\sigma^2=1$. However, the row sum of the spatial contiguity matrix, $V_i$, is not a constant. It can neither be eliminated nor converted to a constant. Therefore, there is no constant proportionality relation between the second set of local Moran's indexes and the global Moran's index. Only if equation (6) is introduced into equation (17) can a proportional relationship similar to equation (8) be derived. Based on equation (6), equation (17) can be re-expressed as equation (18). Unfortunately, we cannot prove that the right-hand side of equation (17) equals $nI$ (equation (19)). This lends further support to the judgment that equation (16) does not hold. However, the proportional relationship given in equation (18) can be easily verified by the observational data.
The ratio of the first set of local Moran's indexes to the corresponding second set is

$$\frac{I_i^{*}}{I_i^{**}} = \sigma^2 V_i, \tag{20}$$

which, obviously, is a variable that changes with $V_i$ rather than a constant.
It can be seen that the ratios of the two sets of local Moran's indexes are not constant, so the two sets are not equivalent to each other. This suggests that the second set of local Moran indexes cannot satisfy the second requirement of Anselin (1995), which says, "The sum of the local indicators is proportional to a global indicator". The reason for the fault is that Anselin (1995) inadvertently replaced a concept in this mathematical derivation. Concretely speaking, the global normalized symmetric weight matrix $W$ becomes the local normalized asymmetric weight matrix $V^{*}$. This violates the law of identity of concepts and the principle of logical consistency in mathematical reasoning.
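The non-equivalence of the two sets of local Moran's indexes can be illustrated with a small numerical experiment. The sketch below assumes the forms $I_i^* = y_i\sum_j v_{ij}y_j$ and $I_i^{**} = z_i\sum_j (v_{ij}/V_i)z_j$; the data and variable names are hypothetical and serve only as an illustration.

```python
# Hypothetical toy data (not the BTH data): three places, symmetric contiguity.
x = [1.0, 2.0, 6.0]
v = [[0.0, 1.0, 0.5],
     [1.0, 0.0, 0.25],
     [0.5, 0.25, 0.0]]

n = len(x)
mean = sum(x) / n
y = [xi - mean for xi in x]                    # centralized variable
sigma2 = sum(yi * yi for yi in y) / n          # population variance
sigma = sigma2 ** 0.5
z = [yi / sigma for yi in y]                   # standardized variable
row_sum = [sum(row) for row in v]              # V_i, the row sums
v0 = sum(row_sum)                              # V_0, the total sum

# First set: raw contiguity values and centralized variable.
mi1 = [y[i] * sum(v[i][j] * y[j] for j in range(n)) for i in range(n)]
# Second set: row-normalized weights and standardized variable.
mi2 = [z[i] * sum(v[i][j] / row_sum[i] * z[j] for j in range(n))
       for i in range(n)]

global_i = sum(mi1) / (v0 * sigma2)            # global Moran's index
ratios = [a / b for a, b in zip(mi1, mi2)]     # compare with sigma2 * V_i

print("ratios MI1/MI2:", ratios)
print("sigma2 * V_i:  ", [sigma2 * vi for vi in row_sum])
print("sum(MI2) =", sum(mi2), "versus n * I =", n * global_i)
```

For these toy values the ratios follow $\sigma^2 V_i$ and differ from place to place, the sum of the second set differs from $nI$, and one of the second-set values exceeds 1 in absolute value.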

The formula of local Geary coefficient
The global Geary coefficient is complementary to the global Moran index: the former is oriented to sample analysis, and the latter is based on the statistical population. Similar to the treatment of the local Moran index, two local Geary statistics were defined by Anselin (1995). It is assumed that the variables are not standardized and the spatial contiguity matrix is not transformed into a global normalized spatial weight matrix. Anselin (1995) defined the first local Geary coefficient as

$$C_i^{*} = \sum_{j} v_{ij}(x_i-x_j)^2. \tag{21}$$

Suppose that the variable is standardized, and the spatial contiguity matrix is transformed into a row normalized spatial weight matrix. Anselin (1995) defines the second local Geary coefficient as

$$C_i^{**} = \sum_{j} w_{ij}^{*}(z_i-z_j)^2. \tag{22}$$

Summation of equation (21) divided by the population variance $\sigma^2$ is

$$\frac{1}{\sigma^2}\sum_{i} C_i^{*} = \frac{1}{\sigma^2}\sum_{i}\sum_{j} v_{ij}(x_i-x_j)^2 = \gamma_c C, \tag{23}$$

where $C$ refers to the global Geary coefficient. It can be expressed as

$$C = \frac{(n-1)\sum_{i}\sum_{j} v_{ij}(x_i-x_j)^2}{2V_0\sum_{i}(x_i-\bar{x})^2}. \tag{24}$$

In addition, the proportional coefficient between the sum of the first local Geary coefficient divided by the population variance and the global Geary coefficient is as below

$$\gamma_c = \frac{2nV_0}{n-1}. \tag{25}$$

The standardized size variable based on the sample standard deviation $s$ is used here, i.e., $z_i^{*}=(x_i-\bar{x})/s$. Therefore, the relationship between the sum of the first local Geary coefficients and the global Geary coefficient is

$$\sum_{i} C_i^{*} = \gamma_c\sigma^2 C = 2V_0 s^2 C. \tag{26}$$

This formula is correct, and it satisfies the two requirements given by Anselin (1995). However, it is neither direct nor standard. Dividing the summation of equation (21) by both the population variance $\sigma^2$ and the sum of the spatial weight matrix $V_0$ gives the relationship between the local Geary coefficient and the global Geary coefficient, that is

$$\frac{1}{V_0\sigma^2}\sum_{i} C_i^{*} = \frac{2n}{n-1}C. \tag{27}$$

This is different from the relationship between the local Geary coefficient and the global Geary coefficient given by Anselin (1995). The reason is that the derivation of this relationship is based on the global normalization of the spatial weight matrix. Based on the row-normalized weight matrix, the sum of local Geary's coefficients is

$$\sum_{i} C_i^{**} = \sum_{i}\frac{1}{V_i}\sum_{j} v_{ij}(z_i-z_j)^2. \tag{28}$$

The constant proportional relationship between the local Geary coefficient and the global Geary coefficient cannot be derived in terms of equation (28). Anselin (1995) believes that, according to equation (25), for the weight matrix normalized by row, $V_0=n$, so that $\gamma_c=2n^2/(n-1)$, which is right. Then he gave the following relation

$$\sum_{i} C_i^{**} = \gamma_c^{*} C, \tag{29}$$

in which $\gamma_c^{*}$ represents the proportionality coefficient. The coefficient, given by equation (30), is not a constant. It cannot be proved that equation (29) is equivalent to equation (30).
Moreover, starting from equations (21) and (22), the proportional relationship between the two sets of local Geary coefficients is

$$\frac{C_i^{*}}{C_i^{**}} = \sigma^2 V_i. \tag{31}$$

This is obviously not a constant, but a variable that changes with the sum of the rows of the spatial proximity matrix. This shows that the two sets of local Geary coefficients are not equivalent to each other, and the ratio of the corresponding values of the two sets of local Geary coefficients is equal to the ratio of the values of the two sets of local Moran's indices. In short, the second set of local Geary statistics does not satisfy the second requirement given by Anselin (1995).
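The same kind of check can be made for the two sets of local Geary coefficients. The sketch below assumes $C_i^* = \sum_j v_{ij}(x_i-x_j)^2$, $C_i^{**} = \sum_j (v_{ij}/V_i)(z_i-z_j)^2$, and $\gamma_c = 2nV_0/(n-1)$, which is consistent with the value $\gamma_c = 2n^2/(n-1)$ quoted for $V_0=n$; the data are hypothetical.

```python
# Hypothetical toy data (not the BTH data): three places, symmetric contiguity.
x = [1.0, 2.0, 6.0]
v = [[0.0, 1.0, 0.5],
     [1.0, 0.0, 0.25],
     [0.5, 0.25, 0.0]]

n = len(x)
mean = sum(x) / n
y = [xi - mean for xi in x]                    # centralized variable
ss = sum(yi * yi for yi in y)                  # sum of squared deviations
sigma2 = ss / n                                # population variance
z = [yi / sigma2 ** 0.5 for yi in y]           # standardized variable
row_sum = [sum(row) for row in v]              # V_i
v0 = sum(row_sum)                              # V_0

# First set: raw contiguity values and the original variable.
gc1 = [sum(v[i][j] * (x[i] - x[j]) ** 2 for j in range(n)) for i in range(n)]
# Second set: row-normalized weights and standardized variable.
gc2 = [sum(v[i][j] / row_sum[i] * (z[i] - z[j]) ** 2 for j in range(n))
       for i in range(n)]

global_c = (n - 1) * sum(gc1) / (2 * v0 * ss)  # global Geary coefficient
gamma_c = 2 * n * v0 / (n - 1)                 # assumed proportionality factor

print("sum(GC1) =", sum(gc1), "gamma_c*sigma2*C =", gamma_c * sigma2 * global_c)
print("sum(GC2) =", sum(gc2), "2n^2*C/(n-1) =", 2 * n ** 2 * global_c / (n - 1))
print("ratios GC1/GC2:", [a / b for a, b in zip(gc1, gc2)])
```

With these toy values the first set reproduces $\gamma_c\sigma^2 C$ exactly, while the sum of the second set does not equal $2n^2C/(n-1)$, and the ratios between the two sets again follow $\sigma^2 V_i$.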

Adjustment of symbol system and clarification of concept
Concepts are the cornerstone of logic. Only if the concepts are clear can mistakes in reasoning be avoided. The premise of mathematical reasoning is the symbolization of concepts. Confusion of symbols can easily lead to mistakes in reasoning. The main reason for the inconsistency between the two sets of LISA proposed by Anselin (1995) is the unintentional concept substitution caused by mixing the symbols of spatial measure matrixes. At present, there are several problems about spatial autocorrelation in the geographical literature. Secondly, after the spatial contiguity matrix (SCM) is transformed into the spatial weight matrix (SWM), global normalization and local normalization by row are confused. Anselin (1995), the founder of the local Moran index, adopted the method of row normalization (he termed the process "row-standardization"). The sum of the SWM elements is thus equal to n. However, this method leads to two results:
(1) The symmetry of the spatial distance matrix is broken. The spatial weight matrix comes from a spatial distance matrix or a generalized spatial distance matrix. One of the important properties of a distance measure is symmetry: dij=dji holds for all i and j (Chen, 2016). This is one of the four distance axioms (positivity, specification, symmetry, and triangle inequality).
(2) The absolute value of the calculated local Moran index may sometimes exceed 1. The Moran index is an autocorrelation coefficient whose value should fall between -1 and 1 in theory.
Thirdly, the population variance is confused with the sample variance. Moran's index is defined based on the population variance, and Geary's coefficient is defined based on the sample variance (Chen, 2013). The population variance is expressed as $\sigma^2$, and the denominator in its formula is n; the sample variance is expressed as $s^2$, and the denominator in its formula is n-1. The relationship between them is $\sigma^2=(n-1)s^2/n$.
Fourth, row summation and column summation are confused. The sum based on a row vector is expressed as summation over j, and the sum based on a column vector is expressed as summation over i. Based on the global normalized weight matrix, the difference is only formal and has nothing to do with the results. However, based on the row-normalized weight matrix, the results of row summation differ from the results of column summation.
Fifth, the concepts of normalization and standardization are confused. Generalized standardization includes normalization. However, standardization and normalization have different definitions and corresponding calculation formulas. The conversion formula of variables should be determined according to the research objectives. To make it easy for readers to understand, I first distinguish the symbols, and then clarify the concept of variable transformation. There are three principles for adopting symbols in this paper. First, the principle of consensus. Priority is given to the conventional expressions in the field of mathematics. For example, the population standard deviation is expressed as $\sigma$, and the sample standard deviation is expressed as $s$. Second, the principle of direction. For example, the spatial weight matrix is represented by W because "W" is the capital form of the initial of "weight". Third, the principle of distinction. For example, the spatial contiguity matrix is represented by V, so as to distinguish it from the spatial weight matrix W, and this distinction facilitates mathematical reasoning.
Among the above three principles, the distinction principle is the most important (Table 2). In the spatial autocorrelation literature, centralized variables (such as in defining the local Moran's index), standardized variables (such as in simplifying the calculation of the global Moran index) and global normalized variables (such as in simplifying the calculation of the Getis-Ord index) are used, respectively (Table 3). In the literature, when the spatial weight matrix is normalized by row, the concept of row standardization is adopted, but the calculation formula is not given (Anselin, 1995). This can easily lead to misunderstandings for beginners of spatial autocorrelation analysis.
(Table fragment) …: the values range from 0 to 1. Global normalization: the values come between 0 and 1, and the sum of the values equals 1.
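The variable transformations distinguished above can be written out in a few lines. The following is a minimal sketch using Python's standard `statistics` module; the data are hypothetical, and the global normalization shown here (division by the total, so that the values sum to 1) is an assumption based on the description of global normalized variables.

```python
import statistics

x = [1.0, 2.0, 6.0]                            # hypothetical size variable
n = len(x)
mean = sum(x) / n

sigma2 = statistics.pvariance(x)               # population variance, denominator n
s2 = statistics.variance(x)                    # sample variance, denominator n - 1

y = [xi - mean for xi in x]                    # centralization
z = [yi / sigma2 ** 0.5 for yi in y]           # population-based standardization
z_star = [yi / s2 ** 0.5 for yi in y]          # sample-based standardization
g = [xi / sum(x) for xi in x]                  # global normalization (assumed form)

print("sigma^2 =", sigma2, "and (n-1)*s^2/n =", (n - 1) * s2 / n)
print("sum of globally normalized values:", sum(g))
```

The first printed line illustrates the relationship $\sigma^2=(n-1)s^2/n$; the choice between `z` and `z_star` is what separates the Moran-type measures from the Geary-type measures in this paper.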

Definition of normalized local Moran's index
Moran's index is defined on the basis of the population standard deviation rather than the sample standard deviation. Accordingly, the local Moran's index should also be defined through the population standard deviation. In light of equation (7), the normalized local Moran's index can be defined as

$$I_i = \frac{I_i^{*}}{V_0\sigma^2} = z_i\sum_{j} w_{ij}z_j.$$

Thus, for the global normalized spatial weight matrix $W$ and the standardized variable $z$ based on the population standard deviation, we have $\sigma^2=1$, $V_0=1$. Thus, equation (9) should be replaced by

$$\gamma = \sigma^2 V_0 = 1.$$

This suggests that, following the idea of Anselin (1995), the sum of the normalized local Moran's indexes equals the global Moran's index.
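A short numerical illustration of the normalized local Moran's index follows. It is a sketch under the assumption that the normalized index is $I_i = z_i\sum_j w_{ij}z_j$ with $w_{ij}=v_{ij}/V_0$; the data are hypothetical.

```python
# Hypothetical toy data (not the BTH data): three places, symmetric contiguity.
x = [1.0, 2.0, 6.0]
v = [[0.0, 1.0, 0.5],
     [1.0, 0.0, 0.25],
     [0.5, 0.25, 0.0]]

n = len(x)
mean = sum(x) / n
y = [xi - mean for xi in x]                    # centralized variable
sigma2 = sum(yi * yi for yi in y) / n          # population variance
z = [yi / sigma2 ** 0.5 for yi in y]           # population-based z-scores
v0 = sum(sum(row) for row in v)                # V_0
w = [[vij / v0 for vij in row] for row in v]   # global normalized weights

mi1 = [y[i] * sum(v[i][j] * y[j] for j in range(n)) for i in range(n)]
mi3 = [z[i] * sum(w[i][j] * z[j] for j in range(n)) for i in range(n)]

# Global Moran's index computed directly from the contiguity matrix.
global_i = sum(v[i][j] * y[i] * y[j]
               for i in range(n) for j in range(n)) / (v0 * sigma2)

print("sum(MI3) =", sum(mi3), "global I =", global_i)
print("ratios MI1/MI3:", [a / b for a, b in zip(mi1, mi3)])
```

Here the sum of the normalized local indexes reproduces the global index, and the ratio between the first set and the normalized set is the same constant, $\sigma^2 V_0$, for every place.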

Definition of normalized local Geary's coefficient
Geary's coefficient is defined on the basis of the sample standard deviation rather than the population standard deviation. Accordingly, the local Geary's coefficient should also be defined through the sample standard deviation. In terms of equation (26), the global Geary's coefficient can be expressed as

$$C = \frac{1}{2V_0 s^2}\sum_{i}\sum_{j} v_{ij}(x_i-x_j)^2 = \frac{1}{2}\sum_{i}\sum_{j} w_{ij}(z_i^{*}-z_j^{*})^2, \tag{37}$$

where $s^2=n\sigma^2/(n-1)$ reflects the relationship between the sample variance $s^2$ and the population variance $\sigma^2$.
Thus the local Geary's coefficient can be defined as

$$C_i = \frac{1}{2}\sum_{j} w_{ij}(z_i^{*}-z_j^{*})^2. \tag{38}$$

Summing equation (38) yields the global Geary's coefficient, that is, equation (24). According to equation (37), the relation between Anselin's first set of local Geary's coefficients and the local Geary's coefficient formula improved in this paper is

$$C_i^{*} = 2V_0 s^2 C_i = \gamma_c\sigma^2 C_i.$$

In the matrix expressions, $o^T=[1\ 1\ \dots\ 1]$ is a row vector in which the elements are all 1, and the symbol "T" indicates transposition. Changing the form of equation (43) yields equation (44). This means that there is a strict numerical conversion relationship between the local Moran's indexes and the local Geary's coefficients, although they describe the same problem from different angles. It can be seen that equation (41) can be obtained by summing equation (44).
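The normalized local Geary's coefficient can be checked in the same way as the normalized local Moran's index. The sketch below assumes the form $C_i = \frac{1}{2}\sum_j w_{ij}(z_i^*-z_j^*)^2$, with $z^*$ based on the sample standard deviation and $w_{ij}=v_{ij}/V_0$; the data are hypothetical.

```python
# Hypothetical toy data (not the BTH data): three places, symmetric contiguity.
x = [1.0, 2.0, 6.0]
v = [[0.0, 1.0, 0.5],
     [1.0, 0.0, 0.25],
     [0.5, 0.25, 0.0]]

n = len(x)
mean = sum(x) / n
y = [xi - mean for xi in x]                    # centralized variable
ss = sum(yi * yi for yi in y)                  # sum of squared deviations
sigma2 = ss / n                                # population variance
s2 = ss / (n - 1)                              # sample variance
z_star = [yi / s2 ** 0.5 for yi in y]          # sample-based z-scores
v0 = sum(sum(row) for row in v)                # V_0
w = [[vij / v0 for vij in row] for row in v]   # global normalized weights

gc1 = [sum(v[i][j] * (x[i] - x[j]) ** 2 for j in range(n)) for i in range(n)]
gc3 = [0.5 * sum(w[i][j] * (z_star[i] - z_star[j]) ** 2 for j in range(n))
       for i in range(n)]

global_c = (n - 1) * sum(gc1) / (2 * v0 * ss)  # global Geary coefficient
gamma_c = 2 * n * v0 / (n - 1)                 # assumed proportionality factor

print("sum(GC3) =", sum(gc3), "global C =", global_c)
print("ratios GC1/GC3:", [a / b for a, b in zip(gc1, gc3)],
      "gamma_c * sigma2 =", gamma_c * sigma2)
```

In this toy case the normalized local coefficients sum to the global Geary coefficient, and their ratio to the first set is the constant $\gamma_c\sigma^2 = 2V_0s^2$.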

Study area and data
Taking cities in Beijing, Tianjin and Hebei (BTH) region as an example, a concise calculation case is given in this section.This is a demonstrative case, not an explanatory case.In other words, this example is used to verify the reasoning results rather than to study the spatial structure and characteristics of BTH urban systems.The study area includes Beijing city, Tianjin city, and the main cities of Hebei Province (Figure 1).The study region is also termed Jing-Jin-Ji (JJJ) region in literature.The cities are all of prefecture level and above, and the number of cities is n = 13.The size measurement is the city population of the fifth census in 2000 and the sixth census in 2010.
Town population is not taken into account. At present, urban population has several definitions: regional total population, municipal population, city population, and urban population consisting of city population and town population. This case uses the city population, which can better reflect the characteristics of city size. The population size was processed by centralization ($y$), population-based standardization ($z$) and sample-based standardization ($z^{*}$) (Table 4). As for the spatial weight matrix, the basic data are derived from the traffic mileage between cities (Table 5). The spatial weight function adopts a special negative power law, the inverse proportion function, which is actually the intersection of the power law and the hyperbolic function. Thus, the spatial contiguity is defined as

$$v_{ij} = \begin{cases} 1/d_{ij}, & i \neq j \\ 0, & i = j \end{cases}$$

where $d_{ij}$ denotes the distance by road between city i and city j. On this basis, the traffic mileage matrix (U) can be transformed into a spatial contiguity matrix (V), which can be changed to the global normalized weight matrix (W) and the row normalized weight matrix ($W^{*}$).
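The chain from a distance matrix to the two kinds of weight matrix can be sketched as follows. The distances are hypothetical rather than the BTH traffic mileages, the function names are illustrative, and the zero diagonal of the contiguity matrix is an assumption.

```python
def contiguity_from_distance(d):
    """Inverse-distance contiguity: v_ij = 1/d_ij for i != j, and 0 for i == j."""
    n = len(d)
    return [[0.0 if i == j else 1.0 / d[i][j] for j in range(n)]
            for i in range(n)]

def global_normalize(v):
    """Divide every element by the total sum V_0, so that all weights sum to 1."""
    total = sum(sum(row) for row in v)
    return [[vij / total for vij in row] for row in v]

def row_normalize(v):
    """Divide every element by its row sum V_i, so that each row sums to 1."""
    return [[vij / sum(row) for vij in row] for row in v]

# Hypothetical road distances between three cities.
u = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 4.0],
     [2.0, 4.0, 0.0]]

v = contiguity_from_distance(u)
w = global_normalize(v)
w_star = row_normalize(v)

print("W symmetric:", all(w[i][j] == w[j][i] for i in range(3) for j in range(3)))
print("W* entries (0,1) and (1,0):", w_star[0][1], w_star[1][0])
```

Global normalization keeps the symmetry inherited from the distance matrix, whereas row normalization breaks it, which is the property the theoretical sections rely on.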

Calculation results
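For the data of two years and two statistics, i.e., the local Moran index and the local Geary coefficient, the calculation results are tabulated (Table 6, Table 7).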
If the calculation result of one year is an isolated case, we might as well take a look at the situation in 2010 (Table 6, Table 7). It can be seen that the calculation results of the two years fully support the previous theoretical conclusions and related judgments. The errors based on the wrong relations are not too significant in many cases, but the results have a far-reaching impact on geographical analysis. Concretely speaking, these incorrect relationships lead to a series of problems (Table 8):
(1) The relationship between the definitions of the two local Moran indexes is broken (they are not equivalent to each other). The first set of LISA is based on a symmetric spatial adjacency matrix, and the second set is based on an asymmetric spatial weight matrix normalized by row. As a result, the ratio of the values of the two sets of parameters is not a constant.
(2) When defining the local spatial autocorrelation index, only the relationship between one element and the other elements is considered, while the pairwise correlation between all elements is ignored. For the local index of the ith geographical element, only the relationships between element i and element j are taken into account; the relationships between element j and element k are neglected (i, j, k=1,2,3,…,n). In this case, the wholeness of a geographical system is overlooked in the local spatial analysis.
Comparing the two sets of results, we can see the problems and thus understand the similarities and differences between the two sets of formulae (Table 8, Table 9).
Footnote 1: I found this kind of treatment in some teaching courseware.

It can be seen that the local-global relationship based on Anselin's first local Moran index formula rests on a global normalized weight matrix with symmetry. The first local Moran index formula of Anselin (1995) is correct: it satisfies the two requirements defined by Anselin (1995). Its shortcoming is that it is not standardized. A good measure should have a clear critical value (reference value) or a pair of explicit boundary values. However, the local Moran index calculated by equation (1) has neither boundary values nor a clear threshold value.
Another angle of view is to examine the ratios of the two sets of local Moran indexes. If the ratios are constant, the two definitions are equivalent to one another; otherwise they are not. In fact, the values in the first set of local Moran indexes divided by the corresponding values in the second set give $\sigma^2 V_i$, which varies with the row sum $V_i$ of the spatial contiguity matrix.

Firstly, the symbols of the spatial contiguity matrix (SCM) and the spatial weight matrix (SWM) are confused with each other. The two matrixes are regarded as equivalent and are both represented by the same symbol [wij]. In fact, the spatial distance matrix can be transformed into a spatial contiguity matrix according to a certain distance decay function, and the weight matrix can be obtained by normalizing the spatial contiguity matrix (Chen, 2013; Chen, 2015). Although the final result is the same in the case of symbol confusion, the form causes many unnecessary misunderstandings for beginners. This paper distinguishes the symbols as follows: the SCM is represented by V, and its elements are represented by vij; the SWM is represented by W, and its elements are expressed as wij. Thus we have the SCM, V=[vij], and the SWM, W=[wij].
Thus, for the global normalized spatial weight matrix $W$ and the standardized variable $z^{*}$ based on the sample standard deviation, we have $s^2=1$, $V_0=1$. Thus, according to equation (26) … Moran's index and Geary's coefficient reflect the same problem from different angles of view. It can be proved that a strict relationship holds between the global Moran's I and the global Geary's C.

Figure 1 Main cities in Beijing, Tianjin, and Hebei region, China

… values are not equal to one another (-1.4299 ≠ -1.5480). The sum of the second set of local Geary coefficients is $\sum C_i^{**}=30.4883$, and $2n^2C/(n-1)=28.1667\times 1.1377=32.0446$. The two values are not equal to one another (30.4883 ≠ 32.0446). The sum of the third set of local Moran indexes is equal to the global Moran index, and the ratio of the first set of local Moran indexes to the corresponding third set of local Moran indexes is $\gamma=\sigma^2V_0=43916.8725$, which is a constant; the sum of the third set of local Geary coefficients equals the global Geary coefficient, and the ratio of the first set of local Geary coefficients to the corresponding third set of local Geary coefficients is $\gamma_c\sigma^2=1.4453\times 65835.5974=95153.2237$, which is a constant (Table 6, Table 7).
Based on the sixth census data (2010), the population variance of the Beijing-Tianjin-Hebei city population is $\sigma^2=184856.6464$, thus $\gamma=\sigma^2V_0=123312.1000$, the global Moran index is $I=-0.1124$, and the sum of the first set of local Moran indexes is $\sum I_i^{*}=-13856.5039=\gamma I=123312.1000\times(-0.1124)$. On the other hand, $\gamma_c=1.4453$, and the global Geary coefficient is $C=1.1329$, so the sum of the first set of local Geary coefficients is $\sum C_i^{*}=302682.5671=\gamma_c\sigma^2C=1.4453\times 184856.6464\times 1.1329$. However, the sum of the second set of local Moran indices is $\sum I_i^{**}=-1.3523$, while $nI=13\times(-0.1124)=-1.4608$ (Figure 2(a)). The two numbers are not equal to each other (-1.3523 ≠ -1.4608). The sum of the second set of local Geary coefficients is $\sum C_i^{**}=30.3506$, and $2n^2C/(n-1)=28.1667\times 1.1329=31.9099$. The two numbers are not equal to each other (30.3506 ≠ 31.9099). The sum of the third set of local Moran indexes is equal to the global Moran index, and the ratio of the first set of local Moran indexes to the corresponding numbers in the third set of local Moran indexes is $\gamma=\sigma^2V_0=123312.1000$ (Figure 2(b)); the sum of the third set of local Geary coefficients equals the global Geary coefficient, and the ratio of the first set of local Geary coefficients to the corresponding third set of local Geary coefficients is $\gamma_c\sigma^2=1.4453\times 184856.6464=267176.2168$, which is a constant (Table 6, Table 7).
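Some of the reported coefficients can be cross-checked with simple arithmetic. The sketch below uses only the figures quoted in the text and assumes $\gamma=\sigma^2V_0$ and $\gamma_c=2nV_0/(n-1)$; since the contiguity matrix is built from the same traffic mileage in both years, the implied $V_0$ should agree between 2000 and 2010.

```python
# Figures quoted in the text for the 13 BTH cities.
n = 13
sigma2 = {2000: 65835.5974, 2010: 184856.6464}   # population variances
gamma = {2000: 43916.8725, 2010: 123312.1000}    # reported gamma = sigma^2 * V0

# Implied sum of the contiguity matrix for each year.
v0 = {year: gamma[year] / sigma2[year] for year in sigma2}

gamma_c = 2 * n * v0[2010] / (n - 1)             # assumed form 2*n*V0/(n-1)
row_coeff = 2 * n ** 2 / (n - 1)                 # the same factor when V0 = n

print("implied V0:", v0)
print("gamma_c:", round(gamma_c, 4), "2n^2/(n-1):", round(row_coeff, 4))
```

The implied $V_0$ is practically identical for the two years, and the two rounded coefficients match the values 1.4453 and 28.1667 used in the comparisons above.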

Figure 2
Figure 2 The relationships between three sets of local Moran's indexes of BTH cities in 2010 (Note: The second set of local Moran's indexes (MI2) is highly correlated with the first set (MI1), but the two are not equivalent. The third set of local Moran's indexes (MI3) is equivalent to the first set (MI1). The coefficient 1/γ = 1/123312.1000 = 0.000008110. MI2 does not satisfy the second requirement for LISAs given by Anselin (1995).)
(3) The absolute value of the local Moran index may exceed 1, which decouples it from the concept of a correlation coefficient. Moran's index was proposed by analogy with the Pearson correlation coefficient, and its values fall between -1 and 1.
(4) The parameters lack clear boundary values and critical values. The boundary values of Moran's index are -1 and 1. Its critical value is 0 in theory and 1/(1-n) in practice. The boundary values of the Geary coefficient are 0 and 2, and its critical value is theoretically 1.
In addition, Anselin (1995) used the population standard deviation in place of the sample standard deviation when defining the local Geary coefficient. Logically this poses no problem, but historically it does: the result departs from the original intention behind the definition of the Geary coefficient. Moran's index, which is derived from the Pearson correlation coefficient, as indicated above, is a statistic based on the population standard deviation. Geary's coefficient is defined by analogy with the Durbin-Watson statistic, based on the sample standard deviation, in order to make up for the deficiency of Moran's index. To define the local Geary coefficient, we should respect the original meaning of the definition of the Geary coefficient, so that the local Geary coefficient can be effectively associated with the global Geary coefficient.
From the existing literature, it is clear that some readers have noticed Anselin's mistakes. Some scholars adopt a compromise approach. For example, they use the globally normalized spatial weight matrix instead of the row-normalized (locally normalized) spatial weight matrix, but place a factor of n in front of the corrected local Moran index calculation formula 1. This ensures that the sum of the local Moran indexes is equal to n times the global Moran index.
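The compromise described above can be checked numerically. The following is a minimal sketch, not the authors' code: the four-unit chain contiguity matrix `V` and the size values `x` are made-up illustrations, and the local index is assumed to take the form z_i Σ_j w_ij z_j with globally normalized weights and z-scores based on the population standard deviation.

```python
import numpy as np

# Hypothetical contiguity matrix for four units arranged in a chain 1-2-3-4.
V = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
# Hypothetical size variable (made-up values, e.g. city population).
x = np.array([2.0, 4.0, 7.0, 11.0])
n = x.size

# Global normalization: divide by the sum of all entries so the weights sum to 1.
W = V / V.sum()

# Standardize with the population standard deviation (ddof=0).
z = (x - x.mean()) / x.std(ddof=0)

# Local Moran indexes and the global Moran index from the same W and z.
local_I = z * (W @ z)
global_I = z @ W @ z

# Compromise variant: a factor n placed in front of each local index.
local_I_times_n = n * local_I

print(local_I.sum(), global_I)             # the two values agree
print(local_I_times_n.sum(), n * global_I)  # n times the global index
```

Without the factor n, the local indexes sum to the global index itself; with it, they sum to n times the global index, which is the property the compromise approach relies on.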
Anselin is a well-known, outstanding scholar in the field of geographical spatial analysis. Owing to the far-reaching influence of Anselin's work, its logical errors have caused confusion in application and interpretation. Science respects logic and facts, not authority; only pseudoscience starts from authoritative judgment. To solve the above problems, this paper takes the following steps in the mathematical deduction.
First, return to the essence of the spatial distance matrix behind the spatial weight matrix, and respect the basic distance axiom. The global spatial weight matrix is obtained by global normalization of the spatial contiguity matrix. The globally normalized spatial weight matrix is used to replace Anselin's row-normalized weight matrix. In this way, the meaning of the concept stays the same throughout and the logic is consistent, which avoids mistakes in reasoning.
Second, start from the original idea of the Moran index and the Geary coefficient. The normalized local Moran index is defined with the size variable standardized by the population standard deviation; the normalized local Geary coefficient is defined with the size variable standardized by the sample standard deviation.
Third, start from the original intention of Anselin. Anselin (1995) gives two sets of local Moran indexes and local Geary coefficients, and the two sets should not be inconsistent with each other. By examining the reasoning process, we found that the error stems from an unintentional substitution of concepts. Following the notation system and simplification principle of this paper, we transform Anselin's second set of local Moran index and local Geary coefficient formulae.
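The second step can be illustrated for the Geary coefficient. This is a hedged sketch under stated assumptions: the local coefficient is taken to be (1/2) Σ_j w_ij (z_i - z_j)², one decomposition consistent with the classical global Geary formula and not necessarily the exact expression used in this paper. The function names, the chain contiguity matrix and the data values are hypothetical.

```python
import numpy as np

def local_geary(x, contiguity):
    """Local Geary values from globally normalized weights and
    z-scores based on the sample standard deviation (ddof=1)."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(contiguity, dtype=float)
    w = v / v.sum()                              # global normalization
    z = (x - x.mean()) / x.std(ddof=1)           # sample standard deviation
    diff2 = (z[:, None] - z[None, :]) ** 2       # squared pairwise differences
    return 0.5 * (w * diff2).sum(axis=1)

def global_geary_classic(x, contiguity):
    """Classical global Geary coefficient from raw data and contiguity."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(contiguity, dtype=float)
    n = x.size
    num = (n - 1) * (v * (x[:, None] - x[None, :]) ** 2).sum()
    den = 2.0 * v.sum() * ((x - x.mean()) ** 2).sum()
    return num / den

# Hypothetical chain of four units and made-up size values.
V = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([2.0, 4.0, 7.0, 11.0])

local_C = local_geary(x, V)
global_C = global_geary_classic(x, V)
print(local_C.sum(), global_C)  # the two values agree
```

Under these assumptions the local values sum exactly to the classical global coefficient, which is only possible because the sample standard deviation supplies the factor n-1 in the global formula.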
For comparison, Anselin's definitions are transformed and re-expressed with new symbols. However, the new expressions are completely equivalent to Anselin's original expressions.
5 Conclusions
The global spatial autocorrelation coefficients reflect the sum of the correlations between any two geographical elements in a region, while the local spatial autocorrelation indexes reflect the sum of the correlations between one geographical element and all other geographical elements. The sum of the parts is proportional to the whole. The first set of local Moran indexes and Geary coefficients defined in the literature is effective and consistent with the idea of the global Moran index and Geary coefficient. However, the second set of local Moran indexes and local Geary coefficients defined by Anselin is not equivalent to the first set. This paper is devoted to correcting the mistakes in the reasoning process and gives a third set of definitions of the local Moran index and local Geary coefficient in canonical form. The new local Moran index and local Geary coefficient are simple and concise. The new expressions are consistent with the original intention of Anselin and with the statistical essence of the global Moran index and global Geary coefficient. The main points of this paper are summarized as follows.
Firstly, the LISA defined in the literature is of great significance to the analysis of local spatial autocorrelation, but it also has some faults. The first set of LISA is based on centralized variables and a non-normalized spatial contiguity matrix, and lacks clear boundary values and critical values. The second set of LISA is based on standardized variables and a row-normalized spatial weight matrix, which ignores the global relationship behind the local analysis. One result is that the two sets of indexes are not equivalent to one another. In addition, the population standard deviation is adopted when defining the second set of local Geary coefficients, which violates the original intention of the Geary coefficient. All the indexes lack clear boundary values and critical values, and they are decoupled from the correlation coefficient. One consequence is that the analysis process is complex; another is that the conclusions drawn from the two sets of indexes are often inconsistent with each other.
Secondly, the LISA expression is reconstructed by using the globally normalized spatial weight matrix and size variables standardized by z-score, which eliminates the defects of Anselin's LISA definition. By doing so, we obtain canonical spatial autocorrelation measurements. The globally normalized spatial weight matrix is used to replace the row-normalized (locally normalized) spatial weight matrix. The population standard deviation is used to standardize the variables when defining the local Moran indexes, and the sample standard deviation is used to standardize the variables when defining the local Geary coefficients. The problem in Anselin's LISA can thus be solved effectively, and the results are simpler and more concise. The results given in this paper are equivalent to those given by Anselin's first set of formulas, i.e. the first set of local Moran indexes and local Geary coefficients, but they are not linearly proportional to the results of the second set of formulas, namely the second set of local Moran indexes and local Geary coefficients.
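The equivalence and non-equivalence claims can be illustrated with a small numerical sketch. The assumptions are as follows: MI1 is taken as d_i Σ_j v_ij d_j with centered deviations d and the raw contiguity matrix, MI2 uses row-normalized weights with z-scores, and MI3 uses globally normalized weights with z-scores. The chain matrix and data values are made up, and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical chain of four units and made-up size values.
V = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([2.0, 4.0, 7.0, 11.0])

d = x - x.mean()                    # centered variable
z = d / x.std(ddof=0)               # z-scores, population standard deviation

W_row = V / V.sum(axis=1, keepdims=True)   # row normalization
W_glob = V / V.sum()                       # global normalization

mi1 = d * (V @ d)          # first set: raw contiguity, centered variable
mi2 = z * (W_row @ z)      # second set: row-normalized weights
mi3 = z * (W_glob @ z)     # third set: globally normalized weights

ratio_13 = mi1 / mi3       # same value for every unit
ratio_23 = mi2 / mi3       # changes with the number of neighbors

print(ratio_13)
print(ratio_23)
```

In this toy case the ratio MI1/MI3 is the same for every unit (the sum of the contiguity entries times the population variance), whereas MI2/MI3 equals the contiguity total divided by each unit's row sum and therefore varies between end units and interior units.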
This mistake leads readers to misunderstand the relationship between the globally normalized spatial weight matrix and the row-normalized spatial weight matrix. Second, the row-normalized spatial weight matrix violates the distance axiom. A spatial weight matrix is based on a distance matrix or generalized distance matrix, which must conform to the distance axiom. Otherwise, the calculated global or local Moran's index may be abnormal. Third, the basic difference between Moran's index and Geary's coefficient was overlooked. Moran's index is based on the spatial population, while Geary's coefficient is based on a spatial sample. Different definitions lead to different directions of application. However, in the definitions of LISA, the local Geary's coefficient is based on the spatial population rather than a spatial sample. This is not consistent with the original aim of defining Geary's coefficient.
The above issues cause a series of consequences. First, the two sets of LISA values are not equivalent to each other. For example, the ratios of the LISA values based on the non-normalized spatial weight matrix to the LISA values based on the normalized spatial weight matrix are not constant. This is a serious logical problem. As we know, if two measures are equivalent to one another, the ratio of the two measures is constant. For example, the ratio of Student's t statistic to Pearson's part correlation coefficient is constant, and equals the square root of the ratio of the total sum of squares to the residual mean square. Second, sometimes the calculated values of Moran's index and Geary's coefficient exceed reasonable upper and lower limits. Moran's index bears at least two sets of boundary values. One is absolute boundary

This is wrong and cannot be strictly derived by mathematical methods, nor can it be verified by observational data. Based on the row-normalized weight matrix, the correct result is

Table 3 Variable conversion methods, calculation formulas, and properties of converted variables
transposition. If the mean of the global Moran's index is treated as I0 = 1/(1-n), the mean of global