Assessing current temporal and space-time anomalies of disease incidence

Approaches that characterize epidemiologic patterns of disease incidence early and accurately in a temporal and spatial series are becoming increasingly important. Cluster tests are generally designed for retrospective detection of epidemiologic anomalies in a temporal or space-time series. Timely identification of anomalies of disease or poisoning incidence during ongoing surveillance or an outbreak requires the use of sensitive statistical methods that recognize an incidence pattern at the time of occurrence. This report describes two novel analytical methods that focus on detecting anomalies of incidence at the time of occurrence in a temporal and space-time series. The first method describes the paucity of incidence at the time of occurrence in ongoing surveillance and is designed to evaluate whether a decline in incidence occurs on the single current day or during the most recent few days. The second method provides an overall assessment of current clustering or paucity of incidence in a space-time series, allowing for several space regions. We illustrate the application of these methods using a subsample of a temporal series of data on the 2015 dengue outbreak in Taiwan, the largest there since World War II, and demonstrate that they are useful for efficiently monitoring incoming data for current clustering and paucity of incidence in a temporal and space-time series. In light of the recent global emergence and resurgence of Zika, dengue, and chikungunya infection, these approaches for detecting current anomalies of incidence in the ongoing surveillance of disease are particularly needed.


Introduction
Approaches that characterize epidemiologic patterns of disease incidence early and accurately in a temporal and spatial series are becoming increasingly important. Statistical analysis for detecting temporal and space-time anomalies (clusters and paucity) of health-related events is often required for various epidemiologic and biomedical applications. Cluster tests are generally designed for retrospective detection of epidemiologic anomalies in a temporal or space-time series. Timely identification of anomalies of disease or poisoning incidence during ongoing surveillance or an outbreak requires the use of sensitive statistical methods that recognize an incidence pattern at the time of occurrence. In practice, however, clusters or outbreaks are usually detected when the specialists or professionals of health agencies notice an unusually high frequency of health-related events. Perceptions of clustering or decline in incidence are often intuitive and are not supported by statistical analysis. What is needed is to determine whether a cluster or decline in incidence occurs to an extent greater than would be expected by chance variation. Hryhorczuk et al. discussed the importance of enhancing early detection and suggested using the scan test to detect temporal clustering of poisoning cases in the data reported daily to poison control centers [1]. The scan test employs a moving window of predetermined length w and finds the maximum number of cases revealed through the window as it slides over the entire region [2]. The scan test is structured to detect the largest cluster of incidence, and its test statistic is the maximum number of events occurring in a window. The authors demonstrated that the scan test can be retrospectively applied in the daily surveillance of poisoning clusters in an analysis of the temporal clustering of carbon monoxide poisonings. In contrast to detecting historical clusters, Grimson et al. proposed a statistical method that is sensitive to clusters in incidence at the time of occurrence [3]. Their method, based on a binomial distribution, is designed to detect current clusters of incidence with a duration of one or more days during an ongoing daily data collection and monitoring process. The authors applied their method to the daily carbon monoxide poisoning incidence data to which the scan test had been applied in the study by Hryhorczuk et al., and showed that their test for current clusters of incidence has more power than the scan test.
The purpose of this report is to illustrate the use of novel statistical approaches that focus on detecting anomalies of incidence at the time of occurrence in a temporal series and in a space-time series. In contrast with the existing method for current clustering [3], one approach is designed to detect paucity of health-related events on the single current day of occurrence or on the most recent few days in a temporal series. When several space regions are involved, we suggest a global form of the test for detecting current clusters or paucity of incidence in a space-time series, depending on the alternative hypothesis. The global test is similar in construction to the Ederer-Myers-Mantel (EMM) test for space-time clustering based on the maximum frequency in a unit of time [4], the V test for space-time vacuity based on the minimum frequency in a unit of time [5], and the scan test for space-time clustering [6]. The global test for incidence anomalies provides an overall assessment, allowing for several space regions in the setting, and has more power for detecting anomalies than tests applied to the overall temporal incidence combined across multiple space regions.
Testing for excessive aggregations of disease incidence that occurred during a single current unit of time (e.g., day, week) or most recent few consecutive units of time is used to signal the occurrence of an excess of incidence in the current time period, permitting the immediate response and application of early intervention. In contrast, the detection of unusually sparse incidence of disease at the time of occurrence characterizes the current disease activity and epidemiologic transmission in an opposite way. A small p-value of the test indicates that a decline in disease incidence is occurring within the current time period, allowing for immediate assessment of an intervention strategy and decisions regarding prevention programs in the ongoing daily monitoring process.
We illustrate these methods using a subsample of a temporal series of data on dengue incidence in 2015 from the Taiwan Centers for Disease Control and demonstrate that they are useful for efficiently monitoring incoming data for clustering and paucity of incidence in a temporal and space-time series. With the recent global emergence and resurgence of epidemic arboviruses such as Zika, dengue, and chikungunya, statistical methods for detecting current anomalies in the ongoing surveillance of disease are particularly needed. Most of the analytic methods for spatial and temporal analysis proposed in the statistical and epidemiological literature are retrospective in nature, particularly those for spatial analysis. Several prospective analytic methods for early detection of emerging disease outbreaks were developed recently, including those used in a temporal series [7][8][9][10][11] and those used in a space-time series [12]. The prospective methods for a temporal series are designed to identify disease outbreaks over a broad geographical area (e.g., a country) and are useful when relatively few cases are observed in any one jurisdiction. They usually require knowledge or assumptions of the probability distributions that underlie the data and may need exploratory studies or preliminary analysis for the estimation of model parameters. In contrast, the analytic methods that we propose here require only mild assumptions under the null hypothesis of randomization and are structured to assess whether the incidence during the current few days progresses at the same rate, at a higher rate, or at a lower rate within a surveillance period. The spatial scan test for time periodic geographical disease surveillance by Kulldorff (2001) is designed to detect geographical disease clusters that remain during the last time period for which data are available [12]. By comparison, the global form of the proposed test for detecting current clusters or paucity of incidence when several space regions are involved is purely temporal in nature and does not involve the detection of geographical disease clusters.

Methods
Suppose that K health-related events have occurred during T days. Consider the frequency of health-related events that occurred within the most recent w days in comparison with the frequency of health-related events that occurred in the T-w previous days. What interests us is to determine whether the observation of x events that occurred on the current day or during the most recent w days is rare compared with the occurrence of K-x events during the T-w previous days. Assuming that X is the random variable that represents the number of events occurring within the most recent w days and that each of the K events independently and with equal probability occurs on one of the T days, the test for current paucity of incidence, denoted by Pau, is based on the random variable X, which has a binomial distribution with K trials and success probability w/T. The exact p-value formula for Pau under the null hypothesis of random allocations of K health-related events over the T days is expressed as follows:

$$P(X \le x) = \sum_{k=0}^{x} \binom{K}{k} \left(\frac{w}{T}\right)^{k} \left(1 - \frac{w}{T}\right)^{K-k}, \tag{1}$$

where x represents the observed number of events within the most recent w days. The test Pau assesses the significance of the small frequency at the time of occurrence on the basis of randomization and is used to measure an empirical paucity of incidence within the most recent w days. A small p-value of Expression (1) indicates that the observed x events occurring within the most recent w days, compared with the frequency of health-related events occurring during the T-w previous days, is significantly sparse; that is, a decline in incidence in the current w-day period occurs within the T-day surveillance period. Expression (1) gives an exact p-value of the test Pau for current paucity of incidence on the single current day for w = 1 and on the most recent 3 days for w = 3. The exact p-value formula of the test for current clusters by Grimson et al. [3], denoted by Clu, is as follows:

$$P(X \ge x) = \sum_{k=x}^{K} \binom{K}{k} \left(\frac{w}{T}\right)^{k} \left(1 - \frac{w}{T}\right)^{K-k}, \tag{2}$$

where x represents the observed number of events within the most recent w days.
Expression (2) measures an empirical cluster of incidence at the time of occurrence. A small p-value of Expression (2) indicates that the excessive aggregation of x events within the most recent w days cannot be explained by chance alone. Expression (2) gives an exact p-value of the test Clu for current clustering of incidence on the single current day for w = 1 and on the most recent 3 days for w = 3. The tests Pau and Clu are based on the same binomial random variable but characterize opposite aspects of an observed incidence pattern at the time of occurrence. They are structured to detect current paucity and current clustering of incidence, respectively, in a temporal series. When many space regions are involved, we suggest using the global form of the tests to detect current clustering or paucity of incidence in a space-time series. Letting X_i be the random variable that represents the number of events that occurred within the most recent w days in the i-th space unit and letting E(X_i) and Var(X_i) denote the expected value and variance of X_i, respectively, the test M is defined as

$$M = \frac{\sum_{i=1}^{R} \left[ X_i - E(X_i) \right]}{\sqrt{\sum_{i=1}^{R} \mathrm{Var}(X_i)}}. \tag{3}$$

If the total number of space units, R, is large and the involved space units are relatively homogeneous in size, then the global test, M, approximately follows a standard normal distribution under the null hypothesis of random arrangements of health-related events in each of the space units. The test M provides an overall assessment of current clustering or paucity of incidence in a space-time series, depending on the alternative hypothesis of M > 0 or M < 0. A large positive value of the test statistic M indicates that an excess of disease incidence occurs within the current time period for several geographically described populations (e.g., hospitals, towns, or counties). In contrast, a negative value of M that is large in magnitude indicates that a decline in incidence occurs within the current time period for several geographically described populations.
Letting Max_i be the maximum frequency in a unit of time in the i-th space-time unit, the EMM test [13] is expressed as

$$\mathrm{EMM} = \frac{\sum_{i} \left[ \mathrm{Max}_i - E(\mathrm{Max}_i) \right]}{\sqrt{\sum_{i} \mathrm{Var}(\mathrm{Max}_i)}}.$$

Replacing Max with Min in the above expression gives the test V for space-time vacuity, which is based on the minimum frequency in a unit of time [5].
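The tests described in this section can be sketched in a few lines of Python. This is a minimal illustration and not the authors' software. It assumes that X follows a binomial distribution with K trials and success probability w/T, that Pau is the lower tail P(X <= x), that Clu is the upper tail P(X >= x), and that M is the sum over space units of X_i - E(X_i) divided by the square root of the summed variances. The function names (binom_tail, pau_pvalue, clu_pvalue, global_m, normal_cdf) are hypothetical.

```python
from math import comb, erf, sqrt


def binom_tail(K, w, T, lo, hi):
    """P(lo <= X <= hi) for X ~ Binomial(K, w/T), with w and T integer days.

    Exact integer arithmetic is used for the numerator so that large K
    does not cause floating-point underflow or overflow.
    """
    numer = sum(comb(K, k) * w**k * (T - w) ** (K - k) for k in range(lo, hi + 1))
    return numer / T**K


def pau_pvalue(x, K, w, T):
    """Current paucity: probability of x or fewer events in the last w days."""
    return binom_tail(K, w, T, 0, x)


def clu_pvalue(x, K, w, T):
    """Current clustering: probability of x or more events in the last w days."""
    return binom_tail(K, w, T, x, K)


def global_m(counts, w, T):
    """Global statistic over space units (assumed form, see lead-in).

    counts is a list of (x_i, K_i) pairs: events in the last w days and
    total events in the T-day period for each space unit.
    """
    p = w / T
    deviation = sum(x - K * p for x, K in counts)
    variance = sum(K * p * (1 - p) for _, K in counts)
    return deviation / sqrt(variance)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# Hypothetical counts for three space units, for illustration only.
example_counts = [(12, 30), (9, 25), (15, 40)]
m = global_m(example_counts, w=3, T=10)
print("M =", m)
print("p-value for clustering (M > 0):", 1.0 - normal_cdf(m))
print("p-value for paucity (M < 0):", normal_cdf(m))
```

The one-sided p-value for M is taken from the upper tail when testing for clustering and from the lower tail when testing for paucity, matching the two alternative hypotheses described above.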

Applications of the tests to dengue outbreak data
Dengue fever is the most common arbovirus infection in humans with virus transmission occurring in more than 100 countries in tropical regions. It is estimated that 390 million dengue infections occur annually, of which 50-100 million cases have apparent clinical manifestations [14][15][16]. The data on dengue incidence from Taiwan provide an opportunity to illustrate the applications of these methods for detecting current temporal and space-time anomalies of incidence. Dengue fever is a notifiable communicable disease in Taiwan

Testing for current temporal clustering and paucity of incidence
We illustrate the use of the tests Clu and Pau for current clustering and paucity of dengue incidence, respectively, in ongoing surveillance using the daily data in Fig 1 as follows. With a day as the unit of time, w = 3, and T = 10, the number of cases reported during the most recent 3 days is compared with the number of cases that occurred in the previous 7 days. On August 10, the 80 cases that occurred during the most recent 3 days (August 8-10) are compared with the 166 cases that occurred during the 7 previous days (August 1-7). Expression (2) is then evaluated in the same way on each subsequent day. A small p-value of 6.18 × 10^-3 on August 13 indicates that an excess of dengue cases for current 3-day clusters is identified within the 10-day surveillance period at a nominal significance level of 0.05; that is, an excess of dengue cases has occurred in the most recent 3-day period. Low p-values on August 13, 14, and 15 reflect the high incidence on those days in comparison with the incidence that occurred during the previous few days, indicating that the dengue incidence worsens daily, as shown in Fig 1. Next, we illustrate the use of Expression (1) to evaluate the evidence of a decline in dengue incidence in the daily monitoring process. On September 22, the 323 cases that occurred during the most recent 3 days (September 20-22) are compared with the 796 cases that occurred during the 7 previous days (September 13-19). The exact p-value for current 3-day paucity of incidence on September 22 is not significant. On September 23, a small p-value of 4.45 × 10^-3 for current 3-day paucity of incidence is obtained, indicating that a decline in dengue incidence during the current 3-day period has occurred within the 10-day surveillance period.
Low dengue incidence on September 23-26, compared with the incidence that occurred during the previous few days, results in very small p-values of Expression (1), indicating that the dengue incidence declines daily, as shown in Fig 1.

Testing for current space-time clustering and paucity of incidence
We selected the 11 districts in Tainan with the highest dengue rates to illustrate the testing for current clustering and paucity of incidence in a space-time series, using the test M shown in Expression (3). The rates, which were the numbers of dengue cases per 100,000 persons, ranged from 0 to 4,497 among the 37 districts in Tainan. For the purpose of illustration, we analyzed a temporal series of dengue incidence data over these 11 districts on August 1-15 and September 13-26, 2015, which are the same time periods as those in the analysis above using Clu and Pau. A day is the unit of time, w = 3, T = 10, and R, the total number of space units, is 11. We obtained M = 1.29 on August 10, which gives a p-value of 0.098 and does not reject the null hypothesis of randomization at a 0.05 nominal significance level. The test statistics and p-values of M for current 3-day clusters in the 10-day surveillance period were also obtained for each day on August 11-15. Small p-values (< 0.05) for current 3-day clusters are identified on August 11 and August 13-15, indicating a significant departure from the expected frequencies and that the dengue incidence worsens on each of these days over these 11 districts.
Next, we illustrate the use of the M test to evaluate the evidence of a space-time decline in dengue incidence within the ongoing daily surveillance. In the same settings of w = 3, T = 10, and R = 11, M is applied to the data on September 22. We obtain M = 1.11, which gives a p-value of 0.886, and do not reject the null hypothesis of randomization at a 0.05 nominal significance level. No decline in dengue incidence is detected over these 11 districts on September 22. The p-values for current 3-day paucity of incidence in the 10-day surveillance period were also obtained for September 23-26. These p-values are very small and show that a space-time decline in dengue incidence is detected on September 23-26. We note that the size of the decline increases daily over the 4-day period. The global test M is more powerful for detecting temporal anomalies in incidence than Pau and Clu applied to the overall incidence combined over the 11 districts, because the global test uses the temporal incidence information for each of the 11 districts [4][5][6].
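The counts quoted in this section can be checked with a short calculation. The sketch below assumes the exact binomial tails described in the Methods, with success probability w/T = 3/10, and a standard normal reference distribution for M. The helper names are hypothetical, and only figures stated in the text are used: 80 of 246 cases on August 10, 323 of 1,119 cases on September 22, and M = 1.29 on August 10.

```python
from math import comb, erf, sqrt


def binom_tail(K, w, T, lo, hi):
    """Exact P(lo <= X <= hi) for X ~ Binomial(K, w/T), using integer arithmetic."""
    numer = sum(comb(K, k) * w**k * (T - w) ** (K - k) for k in range(lo, hi + 1))
    return numer / T**K


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


w, T = 3, 10

# August 10: 80 cases in the last 3 days, 166 in the 7 previous days.
K_aug10 = 80 + 166
aug10_clu = binom_tail(K_aug10, w, T, 80, K_aug10)  # upper tail, test Clu

# September 22: 323 cases in the last 3 days, 796 in the 7 previous days.
K_sep22 = 323 + 796
sep22_pau = binom_tail(K_sep22, w, T, 0, 323)  # lower tail, test Pau

# Global test on August 10: one-sided p-value for clustering (M > 0).
aug10_m_p = 1.0 - normal_cdf(1.29)

print("Clu p-value, August 10:", aug10_clu)
print("Pau p-value, September 22:", sep22_pau)
print("M p-value, August 10:", aug10_m_p)
```

Under these assumptions neither binomial p-value falls below 0.05, which agrees with the text, and the normal upper tail at M = 1.29 is about 0.0985, in line with the reported value of 0.098.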

Discussion
Cluster tests are generally designed for retrospective detection of epidemiologic anomalies over a temporal or space-time series. Existing tests for clustering that focus on detecting the times of large or "peak" incidence in a temporal or space-time series, including the EMM test [4,13], scan test [2,6], and Maxima test [17,18], are all based on the maximum frequency in a unit of time. We and others have developed tests that focus on the times of paucity of events, including the empty cells (or empty columns) test [17,18], run of empty cells tests [19], and Minima and V tests that are based on the minimum frequency in a unit of time [5,20]. In this report, we illustrated the use of novel statistical approaches to testing of significance for the small or large frequencies at the time of occurrence on the basis of random allocations of distinct epidemiologic events into consecutive days, which underlies a binomial distribution. The first approach uses a test that describes the paucity of disease incidence at the time of occurrence in an ongoing surveillance and is designed to evaluate whether a decline in incidence occurs on the single current day of occurrence or during the most recent few days, which is the test Pau. In contrast, the existing test, Clu, represents a direct measure of clustering as many cases accumulating at the time of occurrence and is used to assess the evidence of whether an excess of disease incidence has occurred on the single current day or during the most recent few days [3]. The second approach uses the global form of the tests in a temporal series to test for current clustering or paucity of incidence in a space-time series, depending on the alternative hypothesis of M > 0 or M < 0, which is the M test.
Pau and Clu, which are structured to characterize current disease activity and epidemiologic transmission in opposite ways, are useful for determining whether a decline or a cluster in incidence, respectively, has occurred on the single current day or during the most recent few days in the ongoing daily monitoring process. Statistical approaches that are sensitive to current paucity or clustering of incidence in a temporal and space-time series, as presented here, provide early and accurate recognition of clusters and declining incidence, which is required for the application and assessment of early intervention strategies and for effective disease prevention and control [21]. For instance, health authorities may expand (or change) an intervention strategy as soon as a decline in incidence is (or is not) detected after the use of certain insecticide sprays in a given region. When an excess of disease incidence is identified at a time point, intervention can be initiated immediately. Climate factors, such as sudden cold spells or heat waves, may quickly affect disease activity and epidemiologic transmission. The use of the tests Pau, Clu, and M allows health authorities or investigators to statistically evaluate the association between local climate variables and disease incidence. Diseases for which activity and transmission are affected by environmental or climatic factors are particularly modifiable by intervention.
The use of daily testing in an ongoing surveillance process raises the issue of multiple comparisons. Procedures such as Bonferroni and False Discovery Rate corrections have been developed for controlling false positives by using a smaller nominal significance level (< 0.05) for rejecting the null hypothesis [22]. However, these adjustments also reduce the overall statistical power and may miss true positives. More importantly, Mantel and others emphasize that the purpose of applying the tests in a monitoring process is "signaling" rather than hypothesis testing [1,3,13,23-25]. Therefore, the use of multiple comparison procedures is not recommended here. In the applications to dengue incidence above, we set T = 10 (surveillance period) and w = 3 (current period). The choice of the appropriate size of unit scale in time for testing anomalies of incidence depends on the disease, the frequency and duration of an outbreak, and perhaps other considerations. We suggest using various sets of values of w and T for further characterizing and comparing the temporal patterns of incidence at the time of occurrence, particularly when knowledge of the disease etiology is lacking. In the application for daily ongoing surveillance of infectious diseases such as dengue or Zika, one may consider small values for w (e.g., 1 or 3) and T (e.g., 7 or 10). In interpreting the outcomes of the analysis, we must emphasize that the results are based on the specific sizes of unit scales in time or space chosen.
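The suggestion to compare several settings of w and T can be sketched as a small loop over a daily series. This is an illustrative sketch only: the daily counts below are hypothetical and are not taken from the dengue data, the function names are assumptions, and the p-values are the exact binomial tails for Clu and Pau with success probability w/T. In keeping with the signaling purpose described above, no multiple comparison adjustment is applied.

```python
from math import comb


def binom_tail(K, w, T, lo, hi):
    """Exact P(lo <= X <= hi) for X ~ Binomial(K, w/T), using integer arithmetic."""
    numer = sum(comb(K, k) * w**k * (T - w) ** (K - k) for k in range(lo, hi + 1))
    return numer / T**K


def current_anomaly_pvalues(daily_counts, w, T):
    """Return (Clu p-value, Pau p-value) for the most recent w of the last T days."""
    if not 0 < w < T:
        raise ValueError("w must satisfy 0 < w < T")
    if len(daily_counts) < T:
        raise ValueError("the series must cover at least T days")
    period = daily_counts[-T:]   # T-day surveillance period ending today
    K = sum(period)              # total events in the surveillance period
    x = sum(period[-w:])         # events in the current w-day period
    return binom_tail(K, w, T, x, K), binom_tail(K, w, T, 0, x)


# Hypothetical daily case counts, most recent day last.
daily_counts = [12, 15, 11, 14, 13, 16, 12, 30, 34, 41]

results = {}
for T in (7, 10):
    for w in (1, 3):
        results[(w, T)] = current_anomaly_pvalues(daily_counts, w, T)

for (w, T), (clu_p, pau_p) in sorted(results.items()):
    print(f"w={w}, T={T}: Clu p={clu_p:.3g}, Pau p={pau_p:.3g}")
```

Running the same series through several (w, T) pairs makes it easier to see whether a signal depends on one particular choice of time scale, which is the caveat raised at the end of the paragraph above.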
Recently, Kulldorff (2001) stressed that p-values should be used as an indicator concerning the evidence for true clustering and the amount of effort for the investigation should be dependent on this evidence rather than maintaining a strict cut-off for the p-value to determine detected clusters to be investigated or not [12]. In addition to using multiple comparison adjustments, understanding the correlation between the proposed tests on the two consecutive days will be very helpful for controlling false signal rates. This correlation structure is complex and warrants future research.
The recent global emergence of Zika virus infection and its severe forms, Guillain-Barré syndrome and microcephaly, which have been associated with the Zika virus in French Polynesia and Brazil, suggests that Zika has become a very serious global public health problem [26]. Active disease surveillance is designed to monitor disease activity and epidemiologic transmission. Health authorities must be able to accurately determine whether a decline or cluster in incidence is happening at the time of disease occurrence [21,27]. Statistical methods that efficiently monitor incoming data for clustering and paucity of incidence in a temporal and space-time series, as presented here, are increasingly needed in light of the recent global emergence of Zika and dengue infection.