Selection of the Maximum Spatial Cluster Size of the Spatial Scan Statistic by Using the Maximum Clustering Set-Proportion Statistic

Spatial scan statistics are widely used in various fields. The performance of these statistics is influenced by parameters, such as maximum spatial cluster size, and can be improved by parameter selection using performance measures. Current performance measures are based on the presence of clusters and are thus inapplicable to data sets without known clusters. In this work, we propose a novel overall performance measure called maximum clustering set–proportion (MCS-P), which is based on the likelihood of the union of detected clusters and the applied dataset. MCS-P was compared with existing performance measures in a simulation study to select the maximum spatial cluster size. Results of other performance measures, such as sensitivity and misclassification, suggest that the spatial scan statistic achieves accurate results in most scenarios with the maximum spatial cluster sizes selected using MCS-P. Given that previously known clusters are not required in the proposed strategy, selection of the optimal maximum cluster size with MCS-P can improve the performance of the scan statistic in applications without identified clusters.


Introduction
The spatial scan statistic, introduced by Kulldorff [1], detects the presence and locations of geographic clusters in spatial data sets. The free software SaTScan [2] allows users to apply this statistic in many fields, and a list of studies that have used the spatial scan statistic is posted on the official SaTScan website [3].
The maximum spatial cluster size is the only parameter that users must select to apply the commonly used circular spatial scan statistic in SaTScan. It is the maximum size that the scanning window can reach, expressed either as a spatial distance or as a percentage of the total population at risk [4]. Ribeiro and Costa [5] investigated the performance of spatial scan statistics, including secondary clusters, with different maximum spatial cluster sizes and suggested that three performance measures are sensitive to this parameter. Although simulation data sets can support the choice of a maximum spatial cluster size for a specific cluster model, identifying the cluster model that underlies a complex real data set is difficult. Therefore, guidelines for selecting the maximum spatial cluster size for real data remain unclear.
Kulldorff [6] reported that a window of up to 50% of the population at risk can generally reduce the detection of negative clusters. Other researchers have selected lower values for practical reasons, such as data availability [7], location discontinuity [8], a specific interest in small clusters [9], a search for small clusters with high relative risk (RR) [10], the low infectivity of a specific pathogen [11,12], exploratory analysis of irregularly shaped clusters [8,13], or limited resources for intervention [5]. As such, a simple fixed choice of the maximum spatial cluster size may not be appropriate. Because the relationship between the maximum spatial cluster size and performance varies across data sets, the performance of the spatial scan statistic must be ranked over different parameter values within each application. This requires a performance measure that is generally applicable across applications.
Numerous performance measures are commonly used in simulation studies; however, few of them can easily be applied to real data [14] because they rely on the presence of known artificial clusters. Classifying every detected cluster as true or false is usually impractical; for example, disease surveillance programs usually have limited resources for such verification. Moreover, different performance measures represent different aspects of performance [15], so outcomes from multiple measures can conflict when ranking different implementations of the spatial scan statistic. Performance measures can be combined using specific formulas, but the weights are inevitably arbitrary. If the overall performance, rather than a specific aspect of performance, is of interest, then an overall performance measure that does not rely on known artificial clusters would be less arbitrary. In simulation studies, performance measures are commonly aggregated over many data sets generated from the same underlying model, because aggregation can detect slight differences among spatial scan statistics with different parameters. However, such replicate data sets do not exist in reality. Although simulated data sets can be generated from historical data by using clustering models, this approach is difficult, especially when no such historical data exist [16]. A performance measure for a single data set is therefore preferable to one based on a batch of data sets generated from the same model.
In summary, an overall performance measure based on the applied data set, rather than on the known presence of true clusters, could be used to select the optimal spatial parameters and improve the performance of spatial scan statistics in applications. However, to the best of our knowledge, no such measure has been developed.
This study proposes a novel overall performance measure, the maximum clustering set-proportion (MCS-P), which is based on the likelihood function of all significant clusters in the applied data set. The full definition of MCS-P is provided in the next section. Because MCS-P does not require the presence of known clusters, it is applicable to data sets without known clusters. Section 3 describes a simulation study that compares MCS-P with existing performance measures for selecting the maximum spatial cluster size, Section 4 applies MCS-P to measles case data from Henan, China, and Section 5 provides the discussion and conclusions.

Spatial scan statistic
Spatial scan statistics identify the maximum likelihood cluster, a set Z of spatial units in the study area G for which the null hypothesis is rejected. Let p and q be the probabilities that an event occurs inside and outside a zone, respectively. Under the null hypothesis, p = q; in current applications, we usually focus on detecting zones where p > q.
Although spatial scan statistics vary in the shape of the scanning window and the probability model, most of them use the logarithm of the likelihood ratio (LLR) as the test statistic to identify maximum likelihood clusters [17–22]. Maximum likelihood estimation is applied to determine the most clustered sub-region Z, and the detected cluster $\hat{Z}$ is the maximum likelihood estimator of Z. Let C and $c_z$ be the observed numbers of events in G and z, respectively, and let N and $n_z$ be the expected numbers of events in G and z under the null hypothesis; hence, N = C. Let L(z) be the likelihood under the alternative hypothesis that z is a cluster and $L_0$ the likelihood under the null hypothesis. The LLR is then
$$\mathrm{LLR}(z) = \log \frac{L(z)}{L_0} = \log L(z) - \log L_0,$$
where $L_0$ is a constant for a given G. Consequently, the collection z of spatial units that maximizes LLR(z) also maximizes L(z).
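The section leaves the probability model open. As an illustration only, the sketch below assumes the discrete Poisson model of Kulldorff [1], for which a zone with $c_z > n_z$ has $\mathrm{LLR}(z) = c_z \log(c_z/n_z) + (C - c_z)\log\big((C - c_z)/(C - n_z)\big)$ and all other zones have LLR 0. The function name `poisson_llr` is hypothetical.

```python
import math


def poisson_llr(c_z: float, n_z: float, C: float) -> float:
    """LLR of zone z under the (assumed) discrete Poisson model.

    c_z: observed events in z
    n_z: expected events in z under the null hypothesis
    C:   total observed events in G (equal to the total expected, N = C)
    """
    # One-sided test: only zones with more events than expected (p > q) count.
    if c_z <= n_z:
        return 0.0
    inside = c_z * math.log(c_z / n_z)
    c_out, n_out = C - c_z, C - n_z
    # Convention 0 * log(0) = 0 when every event falls inside z.
    outside = c_out * math.log(c_out / n_out) if c_out > 0 else 0.0
    return inside + outside


# A zone with twice its expected count: 20 observed vs 10 expected out of 100.
print(round(poisson_llr(20, 10, 100), 4))  # prints 4.4403
```

Because $L_0$ is fixed for a given G, ranking zones by this value is the same as ranking them by L(z).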
A scanning window with a pre-defined shape and maximum spatial size is used to find the solution $\hat{Z} = \{Z \mid \mathrm{LLR}(Z) \ge \mathrm{LLR}(Z')\ \forall Z' \subseteq G\}$. On each possible focus in G, the size η of the window varies from zero up to the maximum spatial cluster size η(Z), generating the set of potential clusters $P = \{z \mid \eta(z) \le \eta(Z)\}$. The potential cluster in P that maximizes the likelihood is the estimator of Z and is called the most likely cluster (MLC). In addition to the MLC, secondary clusters with high likelihood values are also considered.
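The construction of P for a circular window can be sketched as follows. Assumptions: each unit is represented by a centroid with planar coordinates, the cap is a fraction of the total population at risk, distance ties are broken by unit index, and a window stops growing as soon as the next unit would exceed the cap. SaTScan's own implementation details may differ, and `circular_windows` is a hypothetical name.

```python
import math
from typing import FrozenSet, List, Sequence, Set, Tuple


def circular_windows(coords: Sequence[Tuple[float, float]],
                     pop: Sequence[float],
                     max_frac: float) -> Set[FrozenSet[int]]:
    """Potential clusters P for a circular window (illustrative sketch).

    Each unit's centroid is used as a focus; the window grows through the
    nearest units and stops before its population exceeds max_frac of the
    total population at risk (the maximum spatial cluster size).
    """
    cap = max_frac * sum(pop)
    zones: Set[FrozenSet[int]] = set()
    for xi, yi in coords:
        # Units ordered by distance from the current focus.
        order = sorted(range(len(coords)),
                       key=lambda j: math.hypot(coords[j][0] - xi,
                                                coords[j][1] - yi))
        members: List[int] = []
        covered = 0.0
        for j in order:
            if covered + pop[j] > cap:
                break
            covered += pop[j]
            members.append(j)
            zones.add(frozenset(members))
    return zones


# Four equally populated units on a line; windows may hold up to 50% of people.
P = circular_windows([(0, 0), (1, 0), (2, 0), (3, 0)], [1, 1, 1, 1], 0.5)
print(sorted(sorted(z) for z in P))
# prints [[0], [0, 1], [1], [1, 2], [2], [2, 3], [3]]
```

Raising `max_frac` adds larger windows to P without removing any, which is why the maximum spatial cluster size acts as a parameter of P.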
The exact distribution of the test statistic under the null hypothesis is unknown; thus, Monte Carlo simulation is used to obtain the critical value. The LLRs of all potential clusters are compared with this critical value to determine whether they are statistically significant.
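The Monte Carlo step can be sketched as follows, again assuming the Poisson model: the C observed events are redistributed among units in proportion to their expected counts (a multinomial draw that conditions on C), the maximum LLR over all potential clusters is recorded for each replicate, and the observed maximum is ranked among the replicates. The function names, the seed, and the replicate counts are illustrative assumptions, not the SaTScan implementation.

```python
import math
import random
from collections import Counter
from typing import FrozenSet, List, Sequence, Tuple


def poisson_llr(c: float, n: float, C: float) -> float:
    # Assumed Poisson LLR; zero unless the zone has more events than expected.
    if c <= n:
        return 0.0
    out = (C - c) * math.log((C - c) / (C - n)) if C > c else 0.0
    return c * math.log(c / n) + out


def max_llr(cases: Sequence[int], expected: Sequence[float],
            zones: Sequence[FrozenSet[int]]) -> float:
    """Largest LLR over all potential clusters (the scan test statistic)."""
    C = sum(cases)
    return max(poisson_llr(sum(cases[u] for u in z),
                           sum(expected[u] for u in z), C) for z in zones)


def monte_carlo_test(cases: Sequence[int], expected: Sequence[float],
                     zones: Sequence[FrozenSet[int]], reps: int = 999,
                     alpha: float = 0.05, seed: int = 1
                     ) -> Tuple[float, float, float]:
    """Return (observed max LLR, Monte Carlo p-value, critical value at alpha)."""
    rng = random.Random(seed)
    units = range(len(cases))
    C = sum(cases)
    observed = max_llr(cases, expected, zones)
    sims: List[float] = []
    for _ in range(reps):
        # Under H0, the C events fall into units in proportion to expected counts.
        draw = Counter(rng.choices(units, weights=expected, k=C))
        sims.append(max_llr([draw[u] for u in units], expected, zones))
    p_value = (1 + sum(s >= observed for s in sims)) / (reps + 1)
    sims.sort(reverse=True)
    critical = sims[max(1, round(alpha * (reps + 1))) - 1]
    return observed, p_value, critical


zones = [frozenset({u}) for u in range(5)] + [frozenset({0, 1}), frozenset({3, 4})]
obs, p, crit = monte_carlo_test([60, 10, 10, 10, 10], [20.0] * 5, zones, reps=99)
print(round(obs, 2), p, obs > crit)
```

The critical value depends on the set of zones searched, so changing the maximum spatial cluster size also changes the threshold a cluster must exceed.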

Performance measures
Although the capacity to detect the presence of clusters has been widely studied [18,22–27], the performance, or so-called spatial accuracy, of the detected clusters should also be considered [14,16,20]. In most studies, measures of two complementary aspects of performance are used in pairs [21,28–30]: one measures the capacity to correctly identify spatial units inside the true clusters, and the other measures the capacity to correctly identify spatial units outside the true clusters. Measures accounting for both aspects are used to assess overall performance [5,31]. Three commonly used performance measures are sensitivity, positive predictive value (PPV), and misclassification. Previous studies calculated performance measures from the number of spatial units; however, population-based measures can provide more robust estimates [32]. Read et al. [14] classified all spatial units in a study region into four types to evaluate the performance of the spatial scan statistic:
1. Units inside both the true and detected cluster(s)
2. Units inside the true cluster(s) but outside the detected cluster(s)
3. Units inside the detected cluster(s) but outside the true cluster(s)
4. Units outside both the true and detected cluster(s)
Let a, b, c, and d be the populations of the four types of spatial units, respectively. The three common performance measures are described as follows. Sensitivity is the proportion of the population in the true cluster(s) that is correctly identified as belonging to cluster(s):
$$\text{Sensitivity} = \frac{a}{a+b}.$$
This measure reflects the capacity to detect the true cluster(s).
PPV, which is commonly used together with sensitivity, is the proportion of the population in the detected cluster(s) that actually belongs to the true cluster(s):
$$\text{PPV} = \frac{a}{a+c}.$$
This measure indicates the capacity to correctly identify spatial units outside the true cluster(s).
Misclassification is the proportion of the total population that is incorrectly classified. It accounts for the population of spatial units inside the true cluster(s) but outside the detected cluster(s), as well as the population of detected spatial units outside the true cluster(s):
$$\text{Misclassification} = \frac{b+c}{a+b+c+d}.$$
A misclassification of zero means that all spatial units are correctly identified.
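The three population-based measures follow directly from the four unit types. The sketch below assumes that units are identified by arbitrary keys and that populations are stored in a dictionary; the function name `performance_measures` is hypothetical. Sensitivity and PPV are returned as NaN when their denominators are zero, for example when no cluster is detected.

```python
from typing import Dict, Set, Tuple


def performance_measures(pop: Dict[str, float], true_units: Set[str],
                         detected: Set[str]) -> Tuple[float, float, float]:
    """Population-based sensitivity, PPV, and misclassification."""
    a = sum(pop[u] for u in true_units & detected)   # inside both
    b = sum(pop[u] for u in true_units - detected)   # true but not detected
    c = sum(pop[u] for u in detected - true_units)   # detected but not true
    d = sum(p for u, p in pop.items()
            if u not in true_units and u not in detected)  # outside both
    sensitivity = a / (a + b) if a + b > 0 else float("nan")
    # PPV is undefined when nothing is detected.
    ppv = a / (a + c) if a + c > 0 else float("nan")
    misclassification = (b + c) / (a + b + c + d)
    return sensitivity, ppv, misclassification


pop = {"A": 100.0, "B": 200.0, "C": 300.0, "D": 400.0}
print(tuple(round(x, 4) for x in performance_measures(pop, {"A", "B"}, {"B", "C"})))
# prints (0.6667, 0.4, 0.4)
```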
These performance measures are based on the given presence of true clusters and are not applicable for real data sets with unknown clusters. In this study, we propose a novel overall performance measure by using the applied data set.
Novel overall performance measure based on applied data sets
Let the MLC obtained with η(Z) = i be $\hat{Z}_{i1}$ and the jth significant cluster be $\hat{Z}_{ij}$; the maximum spatial cluster size η(Z) is then a parameter of the collection of potential clusters $P = \{z \mid \eta(z) \le \eta(Z)\}$, denoted $P_i$ when η(Z) = i. The local optimum for η(Z) = i, $\hat{Z}_{i1} = \{Z \mid \mathrm{LLR}(Z) \ge \mathrm{LLR}(Z')\ \forall Z' \in P_i\}$, may differ from the global optimum $\hat{Z} = \{Z \mid \mathrm{LLR}(Z) \ge \mathrm{LLR}(Z')\ \forall Z' \subseteq G\}$. Therefore, $\mathrm{LLR}(\hat{Z}_{i1})$ may be smaller than $\mathrm{LLR}(\hat{Z})$. When only the MLC is found or of interest, the optimal η(Z) can be selected by ranking the LLRs of the MLCs obtained with different η(Z) values: the optimal η(Z) = i maximizes $\mathrm{LLR}(\hat{Z}_{i1})$.
When secondary clusters are of interest, comparing the LLRs of corresponding clusters, such as the MLCs, may be insufficient for ranking the performance obtained with different η(Z) values. First, pairing corresponding clusters across η(Z) values is complicated. Second, comparisons of different cluster pairs may give inconsistent results. For instance, $\mathrm{LLR}(\hat{Z}_{i_2 1})$ can be smaller than $\mathrm{LLR}(\hat{Z}_{i_1 1})$, whereas $\mathrm{LLR}(\hat{Z}_{i_2 2})$ can be larger than $\mathrm{LLR}(\hat{Z}_{i_1 2})$. When all significant clusters are of interest, they divide all spatial units into two sets: a clustering set, in which events are likely to cluster, and the remaining spatial units, in which events are not likely to cluster. Therefore, the union of all significant clusters, rather than the individual clusters, can be used as the clustering set when ranking the performance of multiple clusters. Let the union of the significant clusters found with η(Z) = i be
$$Z_{i0} = \bigcup_{j} \hat{Z}_{ij}.$$
Clustering sets obtained with different η(Z) values can then be used to rank the performance of multiple clusters in the same way as the MLC. With the η(Z) that maximizes the likelihood under the alternative hypothesis, the events in the clustering set are the least likely to cluster by chance. Moreover, the likelihood function is maximized when the LLR is maximized. Therefore, comparing the LLRs of the clustering sets can rank the performance of multiple clusters obtained with different η(Z) values.
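A minimal sketch of this ranking step, assuming the Poisson LLR and that the significant clusters for each maximum size have already been found (here supplied by hand): it forms $Z_{i0}$ as the union of the significant clusters for each size i and picks the size whose clustering set has the largest LLR. The helper names and the toy counts are hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Sequence, Tuple


def poisson_llr(c: float, n: float, C: float) -> float:
    # Assumed Poisson LLR; zero unless the zone has more events than expected.
    if c <= n:
        return 0.0
    out = (C - c) * math.log((C - c) / (C - n)) if C > c else 0.0
    return c * math.log(c / n) + out


def zone_llr(zone: FrozenSet[int], cases: Sequence[int],
             expected: Sequence[float]) -> float:
    return poisson_llr(sum(cases[u] for u in zone),
                       sum(expected[u] for u in zone), sum(cases))


def clustering_sets(clusters_by_size: Dict[int, List[FrozenSet[int]]],
                    cases: Sequence[int], expected: Sequence[float]
                    ) -> Dict[int, Tuple[FrozenSet[int], float]]:
    """Map each maximum size i to (Z_i0, LLR(Z_i0))."""
    result: Dict[int, Tuple[FrozenSet[int], float]] = {}
    for i, clusters in clusters_by_size.items():
        z_i0 = frozenset().union(*clusters)  # empty set if nothing is significant
        result[i] = (z_i0, zone_llr(z_i0, cases, expected))
    return result


# Hypothetical significant clusters found with maximum sizes of 1%, 2%, and 3%.
cases, expected = [20, 18, 4, 14, 2, 2], [10.0] * 6
found = {1: [frozenset({0})],
         2: [frozenset({0, 1})],
         3: [frozenset({0, 1}), frozenset({3})]}
sets = clustering_sets(found, cases, expected)
best = max(sets, key=lambda i: sets[i][1])
print(best, sorted(sets[best][0]))  # prints 3 [0, 1, 3]
```

Working with the union sidesteps the pairing problem: each maximum size yields exactly one number to compare.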
$\mathrm{LLR}(Z_{i0})$ is the logarithm of the ratio between the likelihood of the clustering set obtained with η(Z) = i and the likelihood under the null hypothesis. It therefore also measures the dissimilarity between $Z_{i0}$ as a clustering set and the null hypothesis. When η(Z) = i maximizes $\mathrm{LLR}(Z_{i0})$, $L(Z_{i0})$ is also maximized; that is, the events in $Z_{i0}$ are more likely to cluster than those in the clustering sets found with other η(Z) values.
Although $\mathrm{LLR}(Z_{i0})$ can be used to rank the performance of the spatial scan statistic with different η(Z) values, its range may differ across data sets because of the spatial distribution of events. For instance, a non-clustering spatial unit surrounded by clusters may be included in the scanning window, and non-clustering spatial units with relatively high RR near a cluster are more likely to be included in $Z_{i0}$ than those far from clusters. This causes the range and the optimal value of $\mathrm{LLR}(Z_{i0})$ to vary, even among data sets generated with the same model. In addition, most existing performance measures are proportions ranging from 0 to 1. Adjusting for the effect of the spatial distribution of the data set would therefore make the measure comparable with existing performance measures. For this adjustment, an approximate maximum LLR in G is used. Because the spatial scan statistic is used to detect clustering spatial units with p > q, the union of spatial units with RR higher than 1 is selected as the most clustering set (MCS), $Z_{\mathrm{MCS}}$, and its LLR is taken as the approximate maximum LLR in G.
We then adjust $\mathrm{LLR}(Z_{i0})$ by $\mathrm{LLR}(Z_{\mathrm{MCS}})$, so that the performance measure MCS-P is
$$\text{MCS-P} = \frac{\mathrm{LLR}(Z_{i0})}{\mathrm{LLR}(Z_{\mathrm{MCS}})}.$$
MCS-P is the ratio between the LLR of the clustering set obtained with η(Z) = i and the approximate maximum LLR in G. Because the LLR describes the relative support for the alternative hypothesis against the null hypothesis, MCS-P expresses how close the support for the clustering set is to the maximum support obtainable from the data set. With this adjustment, MCS-P generally ranges from 0 to 1, similar to other performance measures. However, because the denominator $\mathrm{LLR}(Z_{\mathrm{MCS}})$ is only an approximate maximum, the LLR of the clustering set may exceed that of the MCS in extreme cases. Although no such case was found in the present study, MCS-P should be regarded as an approximate relative performance measure.
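The computation of MCS-P can be sketched as below. As assumptions, the RR of a single unit is taken as its observed-to-expected ratio, so $Z_{\mathrm{MCS}}$ is the set of units with more observed than expected events, and the Poisson LLR is used for both numerator and denominator. The names `zone_llr` and `mcs_p` are hypothetical.

```python
import math
from typing import FrozenSet, Sequence


def poisson_llr(c: float, n: float, C: float) -> float:
    # Assumed Poisson LLR; zero unless the zone has more events than expected.
    if c <= n:
        return 0.0
    out = (C - c) * math.log((C - c) / (C - n)) if C > c else 0.0
    return c * math.log(c / n) + out


def zone_llr(zone: FrozenSet[int], cases: Sequence[int],
             expected: Sequence[float]) -> float:
    return poisson_llr(sum(cases[u] for u in zone),
                       sum(expected[u] for u in zone), sum(cases))


def mcs_p(clustering_set: FrozenSet[int], cases: Sequence[int],
          expected: Sequence[float]) -> float:
    """MCS-P = LLR(Z_i0) / LLR(Z_MCS), sketched under the stated assumptions."""
    # Z_MCS: units whose observed count exceeds the expected count (RR > 1).
    z_mcs = frozenset(u for u in range(len(cases)) if cases[u] > expected[u])
    denom = zone_llr(z_mcs, cases, expected)
    if denom == 0.0:
        return float("nan")  # no elevated unit: MCS-P is undefined
    return zone_llr(clustering_set, cases, expected) / denom


cases, expected = [30, 5, 20, 5], [15.0] * 4
print(round(mcs_p(frozenset({0}), cases, expected), 2))  # prints 0.59

# Extreme case: adding a unit with RR exactly 1 pushes MCS-P above 1.
cases2, expected2 = [20, 18, 4, 10, 4, 4], [10.0] * 6
print(mcs_p(frozenset({0, 1, 3}), cases2, expected2) > 1)  # prints True
```

The second call shows, on toy numbers, how the approximate denominator can be exceeded, which is why MCS-P is described as an approximate relative measure.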

Simulation Study
Simulation data
Simulated benchmark data sets based on a real data set of breast cancer mortality [33] were used in this study. The study region consists of 245 counties in the Northeastern USA [23], and the population at risk is the female population from the 1990 census, totaling 29,535,210 individuals.
Fifty scenarios were built from 50 different circular cluster models, combining two total numbers of simulated cases (600 or 6000), five cluster sizes (1, 2, 4, 8, or 16 counties), and five spatial distribution patterns: one cluster located in a rural, mixed, or urban area; two clusters located in rural and urban areas; or three clusters, one in each of the three area types. These benchmark data sets are available on the SaTScan website [34] and are commonly used to evaluate different clustering tests [23] or spatial scan statistics with scanning windows of different shapes and parameters [24,32,35]. Details of the cluster models are given in Table 1.
In the rest of the paper, scenarios are labeled in the form "total case number-cluster location-cluster size." For instance, 600-two-1 refers to the data sets in the scenario with 600 total simulated cases and two clusters located in urban and rural areas, each covering only one county.
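The factorial design and the labeling convention can be made concrete by enumerating the scenario labels. The labels "rural", "two", and "three" appear in the text; "mixed" and "urban" are assumed by analogy, and the constant names are hypothetical.

```python
from itertools import product

TOTAL_CASES = (600, 6000)
# "rural", "two", and "three" appear in the text; "mixed" and "urban" are assumed.
PATTERNS = ("rural", "mixed", "urban", "two", "three")
CLUSTER_SIZES = (1, 2, 4, 8, 16)  # counties per cluster

scenarios = [f"{n}-{pattern}-{size}"
             for n, pattern, size in product(TOTAL_CASES, PATTERNS, CLUSTER_SIZES)]
print(len(scenarios), scenarios[0], scenarios[-1])  # prints 50 600-rural-1 6000-three-16
```

The 2 × 5 × 5 combinations account for the 50 scenarios.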

Spatial scan parameters
For each data set, the maximum spatial cluster size was increased from 1% to 50% of the total population at risk in increments of 1%, giving 50 different values. Only clusters with P-values less than 0.05 were considered significant. When no significant cluster was detected, the detected population was set to zero. A previous study showed that including secondary clusters that overlap with more likely clusters does not improve performance [5]; therefore, following the default reporting criterion for secondary clusters, only secondary clusters that do not overlap any more likely cluster were reported. For each maximum spatial cluster size, the performance of the circular scan statistic was evaluated with MCS-P and three existing performance measures, namely, sensitivity, PPV, and misclassification.
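The secondary-cluster reporting rule can be sketched as follows, interpreting it as requiring that a reported secondary cluster share no spatial unit with any more likely reported cluster (an assumption about the exact criterion). `Candidate` and `report_clusters` are hypothetical names, and in practice the candidate list and P-values would come from the scan and its Monte Carlo test.

```python
from typing import FrozenSet, List, NamedTuple, Sequence


class Candidate(NamedTuple):
    units: FrozenSet[str]
    llr: float
    p_value: float


def report_clusters(candidates: Sequence[Candidate],
                    alpha: float = 0.05) -> List[Candidate]:
    """Report the MLC plus significant secondary clusters that share no
    spatial unit with any more likely reported cluster (assumed criterion)."""
    reported: List[Candidate] = []
    for cand in sorted(candidates, key=lambda c: c.llr, reverse=True):
        if cand.p_value >= alpha:
            continue  # only P < 0.05 counts as significant
        if all(cand.units.isdisjoint(r.units) for r in reported):
            reported.append(cand)
    return reported


cands = [
    Candidate(frozenset({"u1", "u2"}), 10.0, 0.001),
    Candidate(frozenset({"u2", "u3"}), 8.0, 0.002),   # overlaps the MLC
    Candidate(frozenset({"u4"}), 6.0, 0.010),
    Candidate(frozenset({"u5"}), 3.0, 0.300),         # not significant
]
print([sorted(c.units) for c in report_clusters(cands)])
# prints [['u1', 'u2'], ['u4']]
```

The union of the returned clusters is the clustering set used by MCS-P; an empty result corresponds to a detected population of zero.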

Agreements of MCS-P with other performance measures in different scenarios
To validate MCS-P for different cluster models, we defined, for each data set, the values of each performance measure that differ by less than 0.01 (1%) from its optimal value as close to the optimal result [5]. For each data set, the 50 maximum spatial cluster sizes were then classified into four types according to whether MCS-P and another performance measure are each close to their optimal results. For types 1 and 4, MCS-P provided results similar to those of the other performance measure. The agreement of MCS-P with each existing performance measure was reported for each cluster model. Agreement is the proportion of maximum spatial cluster sizes for which MCS-P and another performance measure give the same judgment on whether the size is close to the optimal value. An agreement of 100% indicates that MCS-P and the other measure make the same judgment for all 50 maximum spatial cluster sizes.
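A minimal sketch of the agreement calculation, assuming the four types are the combinations of "close to optimal" and "not close to optimal" for the two measures, with types 1 and 4 being the two agreeing combinations. The optimum is the maximum for MCS-P, sensitivity, and PPV and the minimum for misclassification. Function names and values are hypothetical.

```python
from typing import List, Sequence


def near_optimal(values: Sequence[float], higher_is_better: bool = True,
                 tol: float = 0.01) -> List[bool]:
    """Flag values that differ from the best value by less than tol."""
    best = max(values) if higher_is_better else min(values)
    return [abs(v - best) < tol for v in values]


def agreement(mcs_p: Sequence[float], other: Sequence[float],
              other_higher_is_better: bool = True, tol: float = 0.01) -> float:
    """Percentage of maximum sizes on which MCS-P and another measure agree
    about being close to the optimal result."""
    a = near_optimal(mcs_p, True, tol)
    b = near_optimal(other, other_higher_is_better, tol)
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Made-up values for four maximum sizes.
mcsp = [0.50, 0.60, 0.605, 0.40]
sens = [0.70, 0.90, 0.895, 0.90]
misc = [0.20, 0.05, 0.052, 0.30]
print(agreement(mcsp, sens), agreement(mcsp, misc, other_higher_is_better=False))
# prints 75.0 100.0
```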
The average agreements of MCS-P with sensitivity, PPV, and misclassification are 86.6257%, 66.2698%, and 81.3829%, respectively. Most MCS-P results are similar to those of sensitivity and misclassification. Agreement with PPV is lower but still above 66%. Generally, when the values of MCS-P are close to the optimal results, the other performance measures also achieve values close to their optimal results. MCS-P agrees more closely with sensitivity than with misclassification or PPV. Despite the arbitrary cut-off for deciding whether the values of the performance measures are close to their optimal results, the results show high agreement between MCS-P and the other performance measures.
The agreements of MCS-P in different scenarios are shown in Table 2. In 45 scenarios, the agreements of MCS-P with the other performance measures are high and similar to the average agreements. In five scenarios, namely, 600-two-1, 600-two-2, 6000-two-1, 600-three-1, and 6000-three-1, MCS-P shows low agreement with the other performance measures.
For the capacity to detect true clusters, the results of MCS-P are generally similar to those of sensitivity in all scenarios except the five mentioned above. For the capacity to correctly identify spatial units outside the clusters, the agreement of MCS-P with PPV varies among scenarios. In 600-rural-1, whose cluster has the highest RR and the smallest population, MCS-P agrees closely with PPV, whereas for large clusters with low RR, the agreement of MCS-P with PPV decreases. As an overall performance measure, MCS-P gives results similar to those of misclassification in most scenarios. That is, MCS-P agrees closely with PPV and misclassification in scenarios containing clusters with high RR and small populations, and it agrees closely with sensitivity in most scenarios.
Agreement is lower in these five scenarios because each contains multiple clusters generated by different cluster models: one cluster has a very high RR and a small population in a rural area, whereas the other cluster(s) have a low RR and a large population in urban (and mixed) areas, as shown in Table 1. Hence, the clusters are highly heterogeneous. Because it is based on the likelihood of all clustering zones, MCS-P treats the clustering zones of different clusters as one homogeneous clustering set. Therefore, when multiple highly heterogeneous clusters are present, MCS-P is more likely to reach larger values when only the cluster with high RR and a small population is included than when all the clusters are included. As cluster size increases, the clusters become less heterogeneous, and the agreement of MCS-P with the other performance measures increases. Further details are provided in the comparison of the average MCS-P and the other measures for different cluster models.

Comparison of average MCS-P and the other performance measures for each maximum spatial cluster size
In each scenario, the maximum spatial cluster sizes near the optimal value were selected with MCS-P. The values of the other performance measures at the selected sizes were compared with their values at other maximum spatial cluster sizes to determine whether this selection improves the performance of the spatial scan statistic. The mean values of the performance measures over replicates in each scenario were reported for each maximum spatial cluster size to show the relationship between MCS-P and the other performance measures in detail.
Generally, at the selected maximum spatial cluster sizes in most scenarios, high values of MCS-P correspond to high values of sensitivity and PPV and low values of misclassification. The values of sensitivity, PPV, and misclassification at the maximum spatial cluster sizes selected using MCS-P suggest that the spatial scan statistic achieves accurate results for most cluster models.
The comparison of the averaged MCS-P and the other performance measures in different scenarios shows their detailed relationships. In most scenarios, the average MCS-P is positively related to the average sensitivity and PPV but negatively related to the average misclassification. Consistent with the agreement results, the five scenarios containing highly heterogeneous clusters, namely, 600-two-1, 600-two-2, 6000-two-1, 600-three-1, and 6000-three-1, show different relationships between MCS-P and the other performance measures. The relationship between the average MCS-P and the other performance measures is presented for several typical scenarios. The summary of 6000-three-8 (Table 3) shows this relationship. The optimal result of each measure is marked in boldface, and values that differ by less than 0.01 (1%) from the optimal results are underlined. The underlined values of MCS-P mostly coincide with the underlined values of sensitivity and misclassification, which implies that the spatial scan statistic with the maximum spatial cluster sizes selected by MCS-P achieves values close to the optimal results of these measures. Therefore, selecting the maximum spatial cluster size with MCS-P can yield accurate results according to the other performance measures and thus improve the performance of the spatial scan statistic. Detailed relationships between MCS-P and the other performance measures are presented in Fig 1: the other performance measures approach their optimal results as MCS-P increases. Similar relationships are found in the 45 scenarios with high agreement, whereas the five heterogeneous scenarios show an irregular relationship, consistent with their low agreement.
Because MCS-P is conditioned on the likelihood of all significant clusters treated as one homogeneous clustering set, it cannot correctly measure performance when multiple highly heterogeneous clusters are present. This limitation is most evident in 600-two-1, as shown in Table 4. The cluster in the rural area has a high RR of 192.89 and a very small population of 2675, whereas the cluster in the urban area has a large population of 786178 but a low RR of 2.73. In the 600-two-1 scenario, including only the former cluster yields higher MCS-P values at maximum spatial cluster sizes of 1% and 2% of the population at risk (Table 5). When part or all of the latter cluster is included with a large maximum spatial cluster size of over 3%, MCS-P decreases sharply.
This limitation disappears as cluster sizes increase, because the heterogeneity of the clusters is reduced. If very small clusters with great heterogeneity are reported, then MCS-P may not be an appropriate performance measure.
The results also reveal details of the relationships between MCS-P and the other performance measures. The results for 6000-two-16 illustrate this feature clearly (Table 5 and Fig 2). The relationships between MCS-P and the three measures can be divided into two stages, with the cut-off point at the first value close to the optimal result of MCS-P, where MCS-P = 0.2701. In the first stage, before MCS-P reaches values close to its optimal result, sensitivity, PPV, and misclassification all indicate improved performance as MCS-P increases; hence, MCS-P works well with the three other performance measures. After MCS-P reaches values close to its optimal result, the other performance measures begin to fluctuate slightly around their optimal results. During this stage, MCS-P still increases. The fluctuation of the existing performance measures around the optimal value of MCS-P arises because marginally expanding or reducing a cluster does not significantly alter the LLR [4]. Thus, in such cases, MCS-P increases slightly when a new area with relatively low RR and a population that is not very large is included, whereas the other measures decrease slightly because this small area should not be included. By contrast, including areas with ordinary RR and a large population can decrease MCS-P. This trend can be seen in 600-rural-1: in Table 6, sensitivity remains equal to 1 once the true cluster is fully included, but large maximum spatial cluster sizes that include areas outside the true cluster lead to a steep decrease in MCS-P. Therefore, maximum spatial cluster sizes with MCS-P values close to the optimal result can be selected to improve the performance of the statistic.

Measles Incidence Data in Henan, China
We applied MCS-P to county-level measles case data from Henan province, China, for May 2009, extracted from the disease reporting system of the China CDC. A total of 1,371 measles cases among a population of 91,669,661 were reported, corresponding to an annualized incidence rate of 17.6 per 100,000. As in the simulation study, the data were analyzed with 50 maximum spatial cluster sizes. MCS-P was used for evaluation, and the result with the maximum MCS-P value was selected and compared with the result obtained using the default maximum spatial cluster size of 50% of the population at risk.
The maximum MCS-P was achieved when the maximum spatial cluster size was set to 2% of the total population. With a maximum spatial cluster size of 2% (Z2), 649 cases among 14,369,140 individuals in 21 counties were detected, whereas with 50% (Z50), 886 cases among 28,859,679 individuals in 41 counties were detected. The relative risks of Z2 and Z50 were 3.0200 and 2.0527, respectively, and both were located in the same parts of the study region. The clusters differed at their edges in the southwest and southeast (Fig 3). To provide additional details, we compared the counties in Z2 and Z50. Twenty of the 21 counties in Z2 were also in Z50; the 22 counties that differed between Z2 and Z50 (one in Z2 only and 21 in Z50 only) are shown in Table 7.
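The reported figures can be checked with simple arithmetic. Annualizing the May count by 365/31 (an assumption about how the annual rate was derived) gives 17.6 per 100,000, and dividing the observed cases inside each cluster by the cases expected from its population share gives 3.0200 and 2.0527, matching the relative risks reported for Z2 and Z50. The names below are hypothetical.

```python
TOTAL_CASES = 1371
TOTAL_POP = 91_669_661

# Assumption: the May count is annualized by 365/31 days.
annual_rate = TOTAL_CASES / TOTAL_POP * 100_000 * 365 / 31


def observed_to_expected(cases_in: int, pop_in: int) -> float:
    """Observed cases in a cluster divided by cases expected from its population share."""
    expected_in = TOTAL_CASES * pop_in / TOTAL_POP
    return cases_in / expected_in


rr_z2 = observed_to_expected(649, 14_369_140)
rr_z50 = observed_to_expected(886, 28_859_679)
print(f"{annual_rate:.1f} {rr_z2:.4f} {rr_z50:.4f}")  # prints 17.6 3.0200 2.0527
```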
Twelve of these 22 counties had incidence rates lower than the average. Because the scan statistic was used to search for significantly high-risk spatial units, these counties were probably incorrectly identified as clustering counties. As shown in Fig 3, these counties are located at the edges of clusters 1 and 2 in Z50, close to high-risk counties; they were therefore included in Z50 because they lie near multiple high-risk areas. For instance, counties 411224 and 411330 reported no cases but were still misclassified into Z50. In Z2, using MCS-P to select the maximum spatial cluster size excluded the counties with low incidence rates. Counties in Z50 with RR higher than 1 that were not significant on their own were also excluded from Z2. These counties were surrounded by low-risk counties and were not significant as single potential clusters. In particular, 411723 had a higher RR (1.8574) than its surrounding counties and formed a potential cluster on its own, but it was not significant, which suggests that its high RR may have occurred by chance. These counties were excluded from Z2 when MCS-P was used to select the maximum spatial cluster size.
Although it was not significant in the analysis with Z50, county 411623 was detected as a significant cluster in Z2 because the critical value of the LLR depends on the maximum spatial cluster size. At a significance level of 0.05, the critical values for maximum spatial cluster sizes of 2% and 50% were 6.0221 and 7.5736, respectively. Therefore, 411623 (LLR = 6.0830) was significant in Z2 but not in Z50. Thus, with the maximum spatial cluster size selected using MCS-P, the scan statistic was more sensitive to small clusters whose test statistics are close to the critical value.
Moreover, 411426 was the only high-risk county contiguous to significant clusters that was missed in Z2. Three contiguous counties in the eastern region, namely, 411426, 411425, and 411402, were detected as a significant cluster in Z50. In Z2, however, the scanning window at a maximum spatial cluster size of 2% is too small to cover all three high-risk counties in the eastern region. As the county with the lowest RR of the three, and thus the least likely to cluster, 411426 was not significant.
In summary, using MCS-P to select the maximum spatial cluster size excluded counties with low incidence rates and counties with non-significant high rates, which were incorrectly included in large clusters when the default parameter was used. In addition, it improved the capacity to identify small clusters with relatively high incidence rates, which can be missed because of the larger critical value of the test statistic under the default maximum spatial cluster size. Although a smaller maximum spatial cluster size may exclude part of a clustering area, this affects only the least likely clustering part of a cluster.

Discussion
Spatial scan statistics are widely used in different fields to identify unusual clusters of events in a study region. The maximum spatial cluster size is the only user-selected parameter of the circular scan statistic, and it affects the statistic's performance. Consistent with a previous study [5], the present simulation study showed that the optimal maximum spatial cluster size varies among scenarios. Because the cluster model underlying a practical data set is usually unknown, selecting a proper maximum spatial cluster size for each data set can improve the performance of the statistic. However, existing performance measures are inapplicable in most practical applications without known clusters. The proposed MCS-P performance measure (Section 2.3) addresses this limitation. MCS-P is based on the likelihood of the reported clusters and the approximate maximum likelihood from the data set at hand; therefore, it can be calculated without knowledge of the presence of clusters. The simulation study also showed that MCS-P gives results similar to those of other performance measures, namely, sensitivity, PPV, and misclassification, in most situations. Although MCS-P is not applicable to data with multiple highly heterogeneous clusters, it can be used in most fields where the clusters of interest have similar characteristics. For instance, in epidemiological studies, outbreaks of the same disease usually share similar patterns because of the same transmission route, pathogen, and population at risk. Adapting MCS-P to multiple highly heterogeneous clusters is a direction for our future work.
The results of the simulation study are conditional on the simulated data sets. Although the cluster models vary in terms of cluster number, location, RR level, and cluster size, this study has several limitations. For example, in the benchmark data, the generated clusters are so far apart that no detected cluster can cover parts of different clusters, even when the maximum spatial cluster size is set to 50% of the population at risk. This explains why the results do not differ among large maximum spatial cluster sizes. In practice, however, clusters may be located close to each other, as in the measles case data. A detected cluster may then combine non-contiguous high-risk areas and incorrectly include the non-clustering areas around them. To address this problem, selecting parameters with MCS-P could be combined with alternative scanning window shapes, which will be investigated in future work.
Two aspects of performance are considered when evaluating the spatial scan statistic: the capacity to detect areas inside the true clusters and the capacity to exclude areas outside them. Measures of the first aspect may rank maximum spatial cluster sizes in the opposite order to measures of the second. Such opposing rankings are common in simulation studies. Although the measures depend on the data sets, measures that account for only one aspect of performance, such as sensitivity and PPV, tend to favor the largest or the smallest maximum spatial cluster size. A large scanning window, which covers more of the study region, is more likely to contain the true clusters, whereas a small scanning window is less likely to contain areas outside them. Therefore, when the focus is on sensitivity, large maximum spatial cluster sizes should be selected. By contrast, smaller maximum spatial cluster sizes should be selected when the priority is to exclude areas outside the clusters. Large maximum spatial cluster sizes should also be tried first to avoid missing clusters. In most cases, however, overall performance is of greater interest. Because the maximum spatial cluster size preferred for overall performance varies among cluster models, the spatial scan statistic should be applied with several maximum spatial cluster sizes, and overall performance measures, such as misclassification and MCS-P, should be used to select among them. For data sets without known true clusters, MCS-P may be the only one of these measures that can be used. However, the reported clusters should still be checked before MCS-P is applied: as the simulation study showed, MCS-P may not be an appropriate overall performance measure when the reported clusters are highly heterogeneous.
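The opposing behavior of single-aspect measures follows from their definitions. The sketch below computes sensitivity, PPV, and misclassification from the set of regions in the true clusters and the set of regions in the detected clusters, optionally weighting each region (for example, by population at risk). The function name `performance`, the toy region sets, and the weighting option are assumptions of this sketch; the exact definitions used in the simulation study may weight regions differently.

```python
from typing import Dict, Optional, Set


def performance(true_regions: Set[int], detected: Set[int], n_regions: int,
                weight: Optional[Dict[int, float]] = None) -> Dict[str, Optional[float]]:
    """Sensitivity, PPV and misclassification for one detection result.

    Regions are indexed 0..n_regions-1; each region counts with weight 1
    unless a weight mapping (e.g. population at risk) is supplied.
    """
    def w(regions: Set[int]) -> float:
        return sum(1.0 if weight is None else weight[r] for r in regions)

    hit = w(true_regions & detected)  # true-cluster regions that were detected
    total = w(set(range(n_regions)))
    return {
        "sensitivity": hit / w(true_regions) if true_regions else None,
        "ppv": hit / w(detected) if detected else None,
        # false positives plus false negatives, as a share of the study region
        "misclassification": w(true_regions ^ detected) / total,
    }


# Hypothetical example: one true cluster in a 10-region study area.
true_cluster = {1, 2, 3}
small_window = {2, 3}              # misses region 1, no false positives
large_window = {0, 1, 2, 3, 4, 5}  # covers the cluster plus three extra regions
print(performance(true_cluster, small_window, 10))
print(performance(true_cluster, large_window, 10))
```

In this toy case the small window has the higher PPV and the large window the higher sensitivity, while misclassification combines both kinds of error into one overall figure.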
Although selecting the maximum spatial cluster size with MCS-P can improve the performance of the statistic, this approach takes more time than simply using the default setting. For instance, selecting among 50 candidate maximum spatial cluster sizes requires running the scan statistic and calculating MCS-P 50 times, which can take up to 50 times the original computation time. The computation time grows further as the data set becomes more complex. Therefore, choosing an appropriate number of candidate maximum spatial cluster sizes is important.
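The cost argument can be made concrete with a selection loop. The sketch below assumes a hypothetical `score(size)` callable that runs the scan statistic with a maximum cluster size of `size` percent of the population at risk and returns an overall measure such as MCS-P; candidates of 1% to 50% are likewise an assumption. `grid_search` evaluates every candidate (50 runs). `coarse_to_fine` is not the paper's procedure but one possible way to reduce the number of candidates: it scans every 5% and then refines around the best coarse value, which is only reliable if the score is roughly unimodal in the maximum size.

```python
from typing import Callable, Dict, Iterable, Tuple


def grid_search(score: Callable[[int], float],
                candidates: Iterable[int]) -> Tuple[int, float, int]:
    """Run the scan once per candidate maximum size and keep the best.

    Returns (best size, its score, number of scan runs)."""
    results: Dict[int, float] = {size: score(size) for size in candidates}
    best = max(results, key=results.get)  # the first size wins ties
    return best, results[best], len(results)


def coarse_to_fine(score: Callable[[int], float], lo: int = 1, hi: int = 50,
                   step: int = 5) -> Tuple[int, float, int]:
    """Coarse grid every `step` percent, then every size around the coarse
    optimum. A multimodal score can make this miss the global optimum."""
    cache: Dict[int, float] = {}

    def evaluate(size: int) -> float:
        if size not in cache:
            cache[size] = score(size)  # one full scan-statistic run
        return cache[size]

    coarse_best = max(range(lo + step - 1, hi + 1, step), key=evaluate)
    fine = range(max(lo, coarse_best - step + 1),
                 min(hi, coarse_best + step - 1) + 1)
    best = max(fine, key=evaluate)
    return best, cache[best], len(cache)


def toy_score(size: int) -> float:
    # Hypothetical stand-in for "scan with this maximum size, then compute
    # MCS-P on the significant clusters"; it peaks at 17%.
    return float(-(size - 17) ** 2)


print(grid_search(toy_score, range(1, 51)))  # (17, 0.0, 50): 50 scan runs
print(coarse_to_fine(toy_score))             # (17, 0.0, 18): 18 scan runs
```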
MCS-P is calculated from significant clusters and therefore depends on the significance threshold. For example, the spatial scan statistic with parameter setting A may yield a higher MCS-P than setting B at a significance threshold of 0.05 but a lower one at 0.01. Therefore, MCS-P values should be compared only at the same significance threshold.
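Because MCS-P is computed from the significant clusters only, its input changes with the threshold. The sketch below, a minimal illustration under stated assumptions, collects the regions of clusters with p-value at or below a shared `alpha` and scores every parameter setting from that union. The `Cluster` type, the cluster sets, and the use of `len` as a stand-in for MCS-P are hypothetical; the toy values only show how a ranking between two settings can reverse when the threshold changes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Sequence


@dataclass(frozen=True)
class Cluster:
    regions: FrozenSet[int]
    p_value: float


def significant_union(clusters: Sequence[Cluster], alpha: float) -> FrozenSet[int]:
    """Union of regions in clusters whose p-value is at or below `alpha`."""
    union: FrozenSet[int] = frozenset()
    for cluster in clusters:
        if cluster.p_value <= alpha:
            union |= cluster.regions
    return union


def compare_settings(results: Dict[str, Sequence[Cluster]], alpha: float,
                     measure: Callable[[FrozenSet[int]], float]) -> Dict[str, float]:
    """Score every parameter setting at one shared threshold so the values
    are comparable."""
    return {name: measure(significant_union(cl, alpha))
            for name, cl in results.items()}


# Hypothetical detection results for two parameter settings.
setting_a = [Cluster(frozenset({1, 2, 3}), 0.001), Cluster(frozenset({7, 8}), 0.03)]
setting_b = [Cluster(frozenset({1, 2, 3, 4}), 0.004)]
runs = {"A": setting_a, "B": setting_b}
print(compare_settings(runs, 0.05, len))  # {'A': 5, 'B': 4}
print(compare_settings(runs, 0.01, len))  # {'A': 3, 'B': 4}
```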
In conclusion, the results of MCS-P in the simulation study are similar to those of three existing performance measures, namely, sensitivity, PPV, and misclassification, in most situations, except those with highly heterogeneous clusters. Because MCS-P can be calculated without knowledge of the true clusters, it is applicable to data sets in which the true clusters are unknown. Selecting the maximum spatial cluster size with MCS-P helps achieve accurate results. Comparison of the average MCS-P with the other performance measures indicates that selecting a maximum spatial cluster size close to the MCS-P optimum is a key step toward satisfactory performance of the statistic.

Supporting Information
S1 Appendix. Evaluations of different maximum spatial cluster sizes by different performance measures for different cluster models.