A smoothing and bootstrap-based framework for early outbreak detection

Lengyang Wang; Yingcun Xia; Ee Hui Goh; Mark Chen

doi:10.1371/journal.pone.0345088

Abstract

Timely detection of infectious disease outbreaks is critical for effective public health response. The effective reproduction number (R_t) is a key metric that captures transmission dynamics and signals the potential onset of outbreaks when it rises above 1. However, day-of-the-week and public holiday effects, along with random fluctuations in reported cases, can distort R_t estimates and reduce their usefulness for real-time surveillance. In this study, we present an R_t-based outbreak detection framework that integrates calendar-aware smoothing with bootstrap inference to quantify the uncertainty of smoothed R_t estimates. Using daily COVID-19 case data from Singapore, we evaluate several smoothing approaches—including a working-day moving average (MAH) that adjusts for public holidays—and compare the performance of the proposed method with established outbreak detection algorithms such as Early Aberration Reporting System (EARS), Bayesian-based detection methods (EpiEstim) and logistic regression–based approaches. In our framework, calendar-aware smoothing is not a generic pre-processing choice but a necessary, model-agnostic step that produces R_t inputs with reduced calendar artefacts. This makes subsequent inference and testing on R_t both more stable and more interpretable. Our results show that smoothing, particularly with MAH, improves the stability of R_t estimates and enables more reliable outbreak detection. The proposed method consistently demonstrates superior timeliness across observed and simulated outbreaks, while maintaining desired false positive rates. Simulation studies further confirm its robustness under varying sample sizes and case volumes, highlighting advantages over other methods. In conclusion, the proposed method offers a simple, interpretable, and theoretically grounded framework for early outbreak detection. 
Its consistent performance across real and simulated data suggests it may be broadly applicable to other infectious diseases with similar transmission dynamics.

Citation: Wang L, Xia Y, Goh EH, Chen M (2026) A smoothing and bootstrap-based framework for early outbreak detection. PLoS One 21(3): e0345088. https://doi.org/10.1371/journal.pone.0345088

Editor: Kristan Alexander Schneider, The University of New Mexico, UNITED STATES OF AMERICA

Received: August 11, 2025; Accepted: March 2, 2026; Published: March 23, 2026

Copyright: © 2026 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The datasets and code used and/or analysed during the current study are available at https://github.com/lengyang1995/A-Smoothing-and-Bootstrap-Based-Framework-for-Outbreak-Detection.

Funding: The authors did not receive specific funding for this work. XY is partially supported by the National Natural Science Foundation of China (72033002) and the SUSTech–NUS Joint Research Program.

Competing interests: The authors have declared that no competing interests exist.

1 Introduction

Surveillance of infectious diseases aims to provide insights into the progression and impact of epidemics, which is essential for planning effective mitigation strategies for the population [1]. An early warning model for infectious diseases is a crucial tool for the timely monitoring, prevention, and control of disease outbreaks [2], and hence enables rapid implementation of control measures to minimize their impact on public health [3]. In the context of an evolving epidemic with a significant number of infected individuals, the effective reproduction number (R_t) is a crucial metric grounded in the principles of transmission dynamics. It accounts for the size of the remaining susceptible population and encapsulates various factors affecting the epidemic’s trajectory [4]. For monitoring COVID-19 outbreaks, daily reported cases can be used to estimate R_t. An outbreak continues and case numbers increase when R_t > 1, whereas it subsides when R_t < 1 [5,6]. Theoretically, when R_t switches from below 1 to above 1, whether due to changes in the virus, population-level immunity, or interactions within an at-risk population, we should anticipate that an outbreak is imminent. However, there are several challenges in reliably estimating R_t from available data sources [7].

One key artefact in healthcare data used for surveillance is the presence of day-of-the-week and public holiday effects [8,9]. For instance, many healthcare facilities, particularly outpatient and primary care clinics, are typically closed on Sundays and public holidays. This closure results in a sharp decline in reported cases on those days, followed by a surge on the first working day after, leading to misleading troughs and spikes that do not accurately reflect the trajectory of an epidemic. If these systematic day-of-the-week and public holiday effects are inadequately accounted for, they can obscure genuine increases and also cause artefactual post-weekend and post-holiday increases in case notifications. Both can adversely affect the estimation of R_t and render it less useful as a tool to support timely decision-making. Methods to adjust for day-of-the-week and public holiday effects have been evaluated [8], but not specifically for adjusting and monitoring estimates of R_t.

However, even after accounting for day-of-the-week and public holiday effects, we would still face the challenge of determining when R_t has truly switched to a value above 1. Counts of cases exhibit stochastic fluctuations around a mean that reflects some baseline incidence. Just on the basis of sampling variation, we could observe a relative increase in cases and hence R_t > 1 because illness episodes presenting at sentinel clinics or those selected for testing by chance included more infections than in a preceding period. We must thus determine when R_t is significantly above 1 before R_t can be used for predicting when an outbreak is imminent.

In this work, we first focused on evaluating calendar-aware smoothing techniques [10] that account for day-of-the-week and public holiday effects, to assess how well these methods work with case notification data for COVID-19 from Singapore, and which are most appropriate when estimating R_t. Building on these results, we integrated these smoothing approaches into an R_t-based outbreak detection framework designed to provide robust and timely signals despite random fluctuations in reported cases. The framework incorporates calendar-aware smoothing as a core component, ensuring that R_t estimation and bootstrap inference are based on data adjusted for calendar-related artefacts. This integration strikes a balance between interpretability and simplicity, yielding stable outbreak signals with low false-positive rates (fpr), features that support more responsive and efficient public health action. We then demonstrate the validity of these methods for detecting a series of COVID-19 outbreaks from case notification data in Singapore.

2 Materials and methods

2.1 Data

In this study, we analysed daily reported COVID-19 cases in Singapore from 2021/04/01 to 2023/02/13, when COVID-19 diagnoses were routinely notified to the Ministry of Health. This period encompassed significant outbreaks driven by variants such as Delta, BA.1/2, BA.4/5, and XBB, which were used to assess the effectiveness of various data smoothing and outbreak detection techniques [11]. By integrating smoothing methods with the estimation of R_t and bootstrap techniques, we compared our proposed outbreak detection method against existing approaches during key outbreak periods. Furthermore, we conducted a simulation study focusing on the BA.4/5 and XBB subvariants to explore how the total number of daily observed cases and the sample size of individuals tested to ascertain the variant could affect outbreak detection.

The replacement of previously dominant variants by emerging ones was monitored using whole genome sequencing (WGS), the gold standard for identifying and classifying SARS-CoV-2 variants. WGS enables both the detection of known lineages and the surveillance of new ones, making it essential for tracking viral evolution and informing public health strategies [12].

2.2 Methodological framework

2.2.1 Calendar-aware smoothing method.

Calendar-related anomalies introduce systematic trough–spike patterns in reported case counts. Because R_t can be expressed as a nonlinear function (e.g., a ratio) of short-term weighted sums of past incidence [13], these irregularities can be amplified by R_t estimators and obscure early outbreak signals. To address this, our framework includes calendar-aware smoothing as a required Step 1, producing incidence series whose short-horizon aggregates remain comparable across weeks regardless of weekend/holiday placement. Subsequent R_t estimation and bootstrap testing are then applied to these calendar-stabilized series.

A 7-day moving average (MA) is the simplest calendar-aware smoothing approach to remove the day-of-the-week effect. For example, the daily cases after MA smoothing are calculated as:
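One way to write this, assuming a trailing window that ends on day t (an assumption made here; a centred window would only shift the index range), with Y_t the observed count and Y_t’ the smoothed count:

```latex
Y_t' = \frac{1}{7} \sum_{i=0}^{6} Y_{t-i}
```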

However, a drawback of this method is its inability to account for public holiday effects. The working-day moving average method, which treats both public holidays and weekends as non-working days and the remaining days as working days (denoted as MAH), as described in [8], addresses this limitation. Let T represent the number of working days within a given 7-day block. In this method, the calculation involves:

Multiplying the sum of cases on working days by 5/T.
Multiplying the sum of cases on non-working days by 2/(7 − T).

This adjustment mirrors the typical operations of most primary care clinics, which operate five days a week during non-public holiday periods. The overall smoothed cases are calculated by dividing the sum of these adjusted totals by 7. For consecutive 7-day blocks without public holidays, this approach yields results identical to the MA method. However, when public holidays are included, the MAH method marginally increases the weight of working days while reducing that of non-working days, to make the data for that period more comparable to a typical 7-day block without public holidays.
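The two smoothers can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the trailing 7-day window, the scaling factors 5/T and 2/(7 − T) (chosen so that a holiday-free week reproduces MA), and the helper names `is_working_day`, `ma_smooth` and `mah_smooth` are all assumptions made for the example.

```python
from datetime import date, timedelta


def is_working_day(d, holidays):
    """Monday-Friday dates that are not public holidays count as working days."""
    return d.weekday() < 5 and d not in holidays


def ma_smooth(cases, t):
    """Plain 7-day moving average over the trailing window ending at index t."""
    return sum(cases[t - 6 : t + 1]) / 7.0


def mah_smooth(cases, dates, holidays, t):
    """Working-day moving average over the trailing 7-day block ending at t.

    Assumed weights: working-day sum * 5/T, non-working-day sum * 2/(7 - T),
    where T is the number of working days in the block.
    """
    block = range(t - 6, t + 1)
    working = [i for i in block if is_working_day(dates[i], holidays)]
    non_working = [i for i in block if i not in working]
    T = len(working)
    if T == 0:
        raise ValueError("block contains no working days")
    work_sum = sum(cases[i] for i in working)
    rest_sum = sum(cases[i] for i in non_working)
    return (work_sum * 5.0 / T + rest_sum * 2.0 / (7 - T)) / 7.0


# Hypothetical week starting Monday 2022-01-03 (counts are made up).
dates = [date(2022, 1, 3) + timedelta(days=i) for i in range(7)]
normal_week = [100, 100, 100, 100, 100, 20, 20]
holiday_week = [100, 100, 20, 100, 100, 20, 20]  # Wednesday behaves like a weekend
holidays = {date(2022, 1, 5)}

ma_value = ma_smooth(normal_week, 6)
mah_no_holiday = mah_smooth(normal_week, dates, set(), 6)
mah_holiday = mah_smooth(holiday_week, dates, holidays, 6)
```

In this made-up example the holiday week is rescaled to the same level as the ordinary week, whereas a plain moving average of `holiday_week` would be pulled down by the mid-week trough.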

[8] also described an extended working day moving average method that utilizes historical data to calculate specific scaling factors for each day of the week. While this approach can provide more accurate estimates, its reliance on historical data limits its applicability to short data periods. As a result, we have opted not to include this method in our analysis.

2.2.2 Estimation of R_t based on observed data.

Prior to presenting our proposed method for outbreak detection, we outline a technique based on observed data with a fixed generation time (time between the infection of a primary case and one of its secondary cases) of H days for estimating R_t [14]. Assuming a fixed generation time of H = 3 [15], R_t is estimated as the ratio of the sum of new reported cases over 3 consecutive days to the sum of new reported cases over the 3 days preceding that period. For instance, let’s illustrate this with some hypothetical data in Table 1:

Table 1. New Reported Cases Over Days.

https://doi.org/10.1371/journal.pone.0345088.t001

For instance, let’s illustrate this with some hypothetical data:
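As a small sketch of the ratio described above, the following uses six made-up daily counts (these are not the values of Table 1). The function name `estimate_rt` and its handling of an empty denominator are assumptions for illustration.

```python
def estimate_rt(cases, t, H=3):
    """Ratio of cases in the H days ending at index t to cases in the H days before that."""
    if t < 2 * H - 1:
        raise ValueError("need at least 2*H days of data up to index t")
    recent = sum(cases[t - H + 1 : t + 1])
    previous = sum(cases[t - 2 * H + 1 : t - H + 1])
    if previous == 0:
        return float("nan")  # ratio undefined when the earlier window has no cases
    return recent / previous


# Hypothetical counts for days 1..6 (illustrative only).
hypothetical_cases = [10, 12, 14, 18, 20, 22]
rt_day6 = estimate_rt(hypothetical_cases, t=5, H=3)  # (18 + 20 + 22) / (10 + 12 + 14)
```

Here the later three days total 60 cases and the earlier three total 36, so the estimate is above 1, which by itself does not yet establish that the increase is statistically significant.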

2.2.3 Bootstrap method for drawing statistical inference.

We use a bootstrap method [16–18] to draw statistical inference for R_t, since accurate estimation of the asymptotic distribution of R_t is challenging. In this analysis, we considered two scenarios:

Scenario (a) involved examining the total observed infections notified to health authorities. This included both cases diagnosed by PCR-based assays and Rapid Antigen Tests, which do not distinguish between variants.
Scenario (b) involved a stratified analysis for a new variant versus the incumbent variant, including only a subset of total cases tested by PCR-based assays with differentiation between variants using SGTF (and potentially WGS, although this introduces a lag due to turnaround time).

Let Y_t represent the daily COVID-19 cases and assume . We assume the data-generating process (DGP) belongs to a parametric class , such that:

Denote as the estimator of , which we specify as an ARIMA model to estimate the conditional mean . Let denote the d-th order differenced series. Then we have:

in which B is the lag operator, ε_t is the error term, and the orders p, d, q are those of the model with the smallest AIC. The final inference and forecasts are mapped back to the original scale by applying the inverse differencing transformation.
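For reference, the standard ARIMA(p, d, q) form that matches this description can be written as below. The symbols W_t, φ_i, θ_j and ε_t are generic textbook notation assumed here and may differ from the authors' own.

```latex
W_t = (1 - B)^d \, Y_t, \qquad
\Bigl(1 - \sum_{i=1}^{p} \phi_i B^i\Bigr) W_t
  = \Bigl(1 + \sum_{j=1}^{q} \theta_j B^j\Bigr) \varepsilon_t
```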

For scenario (a), with observed COVID-19 cases , define bootstrap samples by:

(1)

The advantage of this design is that it preserves and mimics the dependence structure of the underlying observations, making it well-suited for analysing time series data [16].

For scenario (b), let V_t denote the number of infections due to a variant of concern (VOC) among S_t people tested, with the corresponding proportion p_t = V_t / S_t. The estimated total daily cases of the VOC are p_t Y_t. Similarly, we assume:

where serves a role analogous to , representing a time series model (e.g., ARIMA) that estimates the conditional mean of V_t based on its historical values.

Then, with observed , , and , define bootstrap samples by:

(2)

Note that can be either the original observed cases or the adjusted values after accounting for day-of-week and public holiday effects, as described in Section 2.2.1. However, based on our results, we strongly recommend applying calendar-aware smoothing when calendar anomalies are present.

Next, at time t, we compute the effective reproduction number R_t as described in Section 2.2.2. To draw statistical inference for R_t, we perform bootstrapping on the time series from to , using either (1) or (2), and repeat this B times.

For scenario (a), denote the bootstrap effective reproduction number at time t by:

and for scenario (b), denote:

2.2.4 Outbreak detection algorithm summary.

Combining the previous methods, our proposed approach for outbreak detection can be described in the following steps.

We illustrate the procedure using Scenario (a). Given observed cases , we aim to test the hypothesis:

Step 1: Calendar-aware smoothing. Mitigate the day-of-the-week and public holiday effects in the observed cases using the methods described in Section 2.2.1. Denote the smoothed case series as Y_t’. This step is essential to the framework, as all subsequent R_t estimation and inference are performed on the calendar-adjusted series Y_t’ to ensure that short-term fluctuations are not confounded by calendar effects.

Step 2: Compute R_t. At time t, compute the effective reproduction number R_t from the smoothed case series, assuming a constant generation interval H, as described in Section 2.2.2.

Step 3: Bootstrap under H₀. Generate bootstrap samples under the null hypothesis H₀ using the approach in Section 2.2.3. From time to , compute the bootstrap effective reproduction number at time t as:

Step 4: Hypothesis testing. Calculate the p-value at time t as:

where is the indicator function. We reject the null hypothesis H₀ if for all , where:

The parameters K, H, and α are selected to ensure that the fpr over the entire monitoring period remains below a prespecified level. Further details on parameter calibration are provided in the following section.

This four-step procedure combines calendar-aware smoothing and bootstrap inference within an R_t-based structure, producing epidemiologically interpretable and statistically supported outbreak signals.
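Steps 2 to 4 can be sketched as follows, under several loudly stated assumptions. The null hypothesis is read as R_t ≤ 1 against R_t > 1. The ARIMA-based bootstrap is replaced by a much simpler stand-in that draws Poisson counts with a constant mean, so that the true ratio equals 1. The p-value is taken as the share of bootstrap ratios at least as large as the observed one, and a signal is raised after K consecutive days with p < α. The function names are hypothetical, and the input is expected to be an already smoothed series from Step 1.

```python
import numpy as np


def rt_ratio(series, t, H):
    """Step 2: ratio of the last H days (ending at t) to the preceding H days."""
    recent = series[t - H + 1 : t + 1].sum()
    previous = series[t - 2 * H + 1 : t - H + 1].sum()
    return recent / previous if previous > 0 else np.nan


def bootstrap_pvalue(series, t, H=3, B=500, rng=None):
    """Steps 3-4: one-sided p-value for the observed ratio under a flat-mean null.

    Assumption: Poisson draws with a constant mean stand in for the
    model-based bootstrap samples described in the text.
    """
    rng = np.random.default_rng() if rng is None else rng
    observed = rt_ratio(series, t, H)
    null_mean = series[t - 2 * H + 1 : t + 1].mean()
    if np.isnan(observed) or null_mean <= 0:
        return np.nan
    sims = rng.poisson(null_mean, size=(B, 2 * H))
    previous = sims[:, :H].sum(axis=1)
    recent = sims[:, H:].sum(axis=1)
    valid = previous > 0
    if not valid.any():
        return np.nan
    return float(np.mean(recent[valid] / previous[valid] >= observed))


def detect(series, H=3, K=2, alpha=0.05, B=500, seed=0):
    """Return the first index with K consecutive p-values below alpha, else None."""
    rng = np.random.default_rng(seed)
    series = np.asarray(series, dtype=float)
    run = 0
    for t in range(2 * H - 1, len(series)):
        p = bootstrap_pvalue(series, t, H, B, rng)
        run = run + 1 if p < alpha else 0  # NaN compares False and resets the run
        if run >= K:
            return t
    return None


# Made-up smoothed series: 20 flat days, then 30% daily growth.
flat = [50.0] * 20
growing = flat + [50.0 * 1.3**k for k in range(1, 11)]
signal_index = detect(growing)
```

Requiring K consecutive rejections trades a short delay for fewer isolated false alarms, which is the role the text assigns to K when calibrating the fpr.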

2.3 Statistical comparison of different detection methods

Daily reported COVID-19 cases in Singapore from 2021/04/01 to 2023/02/13 were used to compare smoothing and outbreak detection methods. The smoothing approaches used in the proposed outbreak detection framework were:

No smoothing (NS)
7-day moving average (MA)
Working-day moving average (MAH)

We also compared our proposed outbreak detection method, referred to as PM, with the following:

Early Aberration Reporting System (EARS) [19]: We implemented all three EARS methods (C1–C3). In EARS C1, an outbreak signal is detected if the current observed value exceeds the mean of the previous w reporting periods plus the product of a critical value (corresponding to significance level α) and the standard deviation. C2 differs by shifting the w-day baseline window to the left with a gap of g = 3 days. While C1 and C2 are analytically equivalent in the absence of outbreaks, C2 is more sensitive to continued outbreaks. C3 modifies C2 by using a partial sum of positive daily deviations over the current and previous two days [20]. For analysis, we use the abbreviation EC.
Bayesian-based detection [13]: The EpiEstim method, introduced by [13], provides a framework for estimating the time-varying effective reproduction number (R_t) during an epidemic using daily incidence data and the serial interval distribution. It relies on a renewal equation approach, assuming that new cases arise from past infections weighted by the serial interval. The method adopts a Bayesian formulation, allowing for real-time estimation of R_t with uncertainty quantified via posterior distributions. We define an outbreak signal at time t if the lower bound of the estimated R_t exceeds 1. For analysis, this method is referred to using the abbreviation EPI.
Logistic regression-based detection [21,22]: These methods estimate R_t of a new variant of concern (VOC) relative to the incumbent variants using logistic regression. Suppose the VOC variant v is a specific value of the sequence S_t, and T_t is the sampling time. Then, the logistic regression model is:

where represents the growth rate difference between the VOC and the incumbent variants at time t. While R_t can be derived using generation time distributions, these can vary across regions and time periods. Thus, we treat as the outbreak signal and define an outbreak as occurring at time T_i if with statistical significance at level α. This method is abbreviated as Logi in our analysis.
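The EARS C1 and C2 rules from the list above can be sketched as a single threshold check. This is an illustrative reading, not the implementation used in the study: the baseline length w = 7, the use of a one-sided normal quantile as the critical value, and the interpretation of the gap g as the number of days skipped between the baseline window and the current day are assumptions.

```python
from statistics import NormalDist, mean, stdev


def ears_signal(cases, t, w=7, gap=0, alpha=0.05):
    """Flag day t if its count exceeds baseline mean + z * baseline sd.

    gap=0 gives a C1-style baseline (the w days immediately before t);
    gap=3 gives a C2-style baseline shifted back by three days.
    """
    start = t - gap - w
    if start < 0:
        raise ValueError("not enough history for the requested baseline")
    baseline = cases[start : t - gap]
    z = NormalDist().inv_cdf(1 - alpha)  # critical value for significance level alpha
    threshold = mean(baseline) + z * stdev(baseline)
    return cases[t] > threshold


# Made-up daily counts with a jump on the last day.
counts = [10, 12, 11, 9, 10, 11, 10, 10, 10, 10, 30]
c1_flag = ears_signal(counts, t=10, gap=0)
c2_flag = ears_signal(counts, t=10, gap=3)
```

Because the C2 baseline excludes the most recent days, early outbreak counts do not inflate its mean and standard deviation, which is why the text describes C2 as more sensitive to continued outbreaks.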

Other commonly used methods for outbreak detection are the Farrington algorithm [23] and CUSUM [24]. However, both may be suboptimal for COVID-19 surveillance. The Farrington algorithm is designed primarily for diseases with clear seasonal and periodic patterns and typically relies on over a year of historical baseline data. In contrast, CUSUM requires careful parameter tuning for each outbreak, as different outbreak magnitudes necessitate different threshold settings. A single set of parameters cannot accommodate all scenarios, and multiple parameter combinations may yield similar false positive rates. This complicates fair comparisons of detection timeliness, as performance may be heavily influenced by the chosen parameter combination.

Suppose the detection period of an epidemic starts at t₁ and ends at t₂. We computed the following performance metrics for comparison:

Timeliness = ,
False positive rate (fpr) = .
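To make the two metrics concrete, the sketch below uses one common pair of definitions. Both are assumptions made for illustration and are not taken from the formulas of this paper: timeliness is 1 minus the detection delay as a fraction of the detection period (0 if no signal is raised), and the fpr is the share of days in a non-epidemic period on which a signal is raised.

```python
def timeliness(signal_days, t1, t2):
    """Assumed definition: 1 - (first signal day - t1) / (t2 - t1); 0 if no signal in [t1, t2]."""
    in_window = [d for d in signal_days if t1 <= d <= t2]
    if not in_window:
        return 0.0
    if t2 == t1:
        return 1.0
    return 1.0 - (min(in_window) - t1) / (t2 - t1)


def false_positive_rate(signal_days, t1, t2):
    """Assumed definition: fraction of days in the non-epidemic period [t1, t2] with a signal."""
    n_days = t2 - t1 + 1
    hits = len({d for d in signal_days if t1 <= d <= t2})
    return hits / n_days


# Hypothetical day indices.
score = timeliness([14, 15], t1=10, t2=20)          # first signal 4 days into a 10-day span
fpr = false_positive_rate([3, 40], t1=0, t2=99)     # 2 signal days out of 100
```

Under these assumed definitions a higher timeliness score means earlier detection, with 1 corresponding to a signal on the first day of the detection period.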

For timeliness, the testing period extended from the onset of a new epidemic until its peak. As shown in Fig 1, for the Delta variant, we identified three major outbreaks:

An outbreak arising at Changi Airport [25],
An outbreak arising at Jurong Fishery Port [26],
A widespread outbreak following the relaxation of COVID-19 restrictions [27].

Fig 1. Panels A and B show outbreaks caused by different variants.

https://doi.org/10.1371/journal.pone.0345088.g001

For other variants, major outbreaks were identified for several Omicron subvariants, with specific dates detailed in the S1 Supplementary Materials. For the false positive rate, the testing periods were defined as the intervals between the peak of a previous epidemic and the onset of a new one. During these non-epidemic periods, any detected outbreak signal was considered a false positive.

2.3.1 Simulation analysis 1.

Determining an appropriate sample size for testing is critical to the effectiveness of surveillance systems, as it directly influences data analysis and the system’s ability to achieve public health objectives [28], while being financially and logistically feasible. To address this, we evaluated how varying the total daily observations and sample sizes of tested individuals affects the timeliness of different methods. Our simulation study focused on the BA.4/5 and XBB variant waves, using available data for analysis [11,29]. It is important to note that this analysis was designed to assess the sensitivity of each method to changes in observation scale and testing sample size, rather than to directly compare timeliness across different methods.

For the BA.4/5 variant wave, the logistic regression method treated both BA.2 and BA.2.10 as the incumbent strains. In this case, binary logistic regression was applied, with BA.2 and BA.2.10 combined as the reference group. In contrast, during the XBB wave, the incumbent strains included the BA.2.75, BA.4/5, and BA.2. Given the presence of four variants, multinomial logistic regression was employed. Both BA.2.75 and BA.4/5 were considered as potential reference groups, and the results from the logistic regression method were averaged to account for both scenarios.

Owing to methodological and data structure limitations, we were unable to assess the false positive rate for the logistic regression method. We therefore fixed its significance level at 5%. For the other methods, we used the same parameter settings as in the previous analysis, which compared the performance of each method using observed COVID-19 data from Singapore.

In the simulation:

Let represent the simulated daily COVID-19 cases, modeled by , where is an ARIMA model fitted to the observed cases during the BA.4/5 or XBB outbreaks.
Let represent the adjusted daily COVID-19 observations, where . This adjustment examines the impact of total daily observations.
Denote the daily sample size of tested individuals as , where . This adjustment examines the impact of the sample size of tested individuals.

Parameter a was varied to simulate different overall case volumes—representing scenarios from low-incidence to high-incidence surveillance periods—to examine how the magnitude of observed data influences model stability and detection performance. Similarly, parameter b was adjusted to evaluate how the proportion of tested individuals affects the sensitivity of each method under different levels of surveillance intensity. The above analysis allowed us to assess how these variations affected the performance of each method during the BA.4/5 and XBB variant waves. The steps for the simulation study, from iterations 1 to N, are outlined in Fig 2.
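The roles of a and b can be illustrated with a short sketch. It is not the study's simulation code: Poisson draws around a supplied mean stand in for the fitted ARIMA model, the adjusted observations are taken as a times the simulated counts, the tested sample as b times the adjusted counts, and variant counts are drawn binomially from an assumed VOC share. All names are hypothetical.

```python
import numpy as np


def simulate_scenario(fitted_mean, voc_share, a=0.5, b=0.1, seed=0):
    """Simulate one iteration: scaled daily observations, tested sample, VOC counts."""
    rng = np.random.default_rng(seed)
    fitted_mean = np.asarray(fitted_mean, dtype=float)
    voc_share = np.asarray(voc_share, dtype=float)
    simulated = rng.poisson(fitted_mean)             # stand-in for model-based simulation
    adjusted = np.floor(a * simulated).astype(int)   # a scales total daily observations
    tested = np.floor(b * adjusted).astype(int)      # b sets the daily tested sample size
    voc = rng.binomial(tested, voc_share)            # VOC cases among those tested
    return adjusted, tested, voc


# Made-up inputs: 30 days of rising mean incidence and a rising VOC share.
mean_cases = np.linspace(2000, 6000, 30)
share = np.linspace(0.05, 0.60, 30)
adjusted, tested, voc = simulate_scenario(mean_cases, share, a=0.25, b=0.05)
```

Lower values of a and b shrink the counts that feed each detector, which is the mechanism by which the simulation probes sensitivity to case volume and testing intensity.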

Fig 2. Steps of each method for simulation study 1.

https://doi.org/10.1371/journal.pone.0345088.g002

2.3.2 Simulation analysis 2.

We carried out an additional simulation to evaluate the performance of the proposed outbreak detection methods (with MAH smoothing and no smoothing), all EARS methods, the EpiEstim method and the logistic regression method. The comparison focused on their timeliness while ensuring that all methods operated at the same fpr. In the simulation, we used three models to compare the performance of the various methods. Models 1 and 2 are based solely on observed total case counts (excluding the logistic regression method), while Model 3 is designed to assess performance in stratified analyses (including all methods). Please refer to the S1 Supplementary Materials for further details.

3 Results

3.1 Results for smoothing method

As an illustration, Fig 3 compares various smoothing methods for the BA.1/2 and XBB waves. It is evident that the original data (NS) exhibited significant fluctuations due to the lack of adjustment of day-of-the-week and public holiday effects, with R_t oscillating around 1, making it difficult to detect outbreaks. In contrast, both the MA and MAH methods showed a smoother adjustment of cases, and R_t remained more stable, with the interference from calendar effects largely mitigated.

Fig 3. Panels A and B show adjusted cases for the BA.1/2 and XBB outbreaks under different smoothing methods.

Panels C and D show the estimated R_t for the BA.1/2 and XBB outbreaks under different smoothing methods.

https://doi.org/10.1371/journal.pone.0345088.g003

3.2 Comparison between outbreak detection methods for observed data

First, we selected the parameters for each method by controlling the fpr within a reasonable range. Considering that the serial interval for COVID-19 in Singapore varies between 2–4 days [30], we used constant generation times H of 2, 3, and 4 (as described in Section 2.2.2) and varied K from 1 to 4 (as detailed in Section 2.2.4) to evaluate the fpr of the proposed methods. The parameters for all methods were adjusted to ensure that the fpr remained at or below 0.01, 0.02, and 0.03.

Fig 4 illustrates the timeliness of various outbreak detection methods under . The solid navy-blue horizontal line indicates the median timeliness for each method, while the dashed line represents the average. Among the proposed methods (PM), PM_NS (K = 3) showed the poorest performance, as expected, due to the absence of smoothing on the input data. In contrast, PM_MA (K = 1) and PM_MAH (K = 1) yielded comparable results, with PM_MAH offering a slight edge. All EARS variants with the ‘NS’ suffix—based on unsmoothed data—performed worse than PM methods utilizing MA or MAH smoothing, particularly during the Delta 1 outbreak. During the BA.1/2 outbreak, however, all methods achieved similar timeliness scores. This likely reflects the relatively stable case counts between early December 2021 and mid-January 2022, a period of delayed case growth possibly influenced by Singapore’s phased reopening, including the introduction of vaccinated travel lanes, easing of event restrictions, and intensified contact tracing for BA.1 and BA.2 cases [31]. Consequently, all methods exhibited timeliness scores around 0.6 during this period. For more detailed results across different fpr thresholds, refer to the S1 Supplementary Materials, which shows consistent patterns. Notably, achieving required substantially smaller α values for EARS C2 and C3, which severely compromised their performance. In contrast, PM_MAH, PM_MA, and EPIS continued to perform well under these stricter thresholds.

Fig 4. Comparing different methods in terms of timeliness, the EARS/EpiEstim methods that end with ’NS’ are based on the original data, while the EARS/EpiEstim methods that end with ’S’ are based on data smoothed using the MAH method.

The solid navy-blue horizontal line represents the median timeliness for each method, whereas the dashed navy-blue line indicates the average timeliness.

https://doi.org/10.1371/journal.pone.0345088.g004

Although EARS methods do not rely on smoothing, we also applied them to MAH-smoothed data (suffix ‘S’). These smoothed variants exhibited moderate improvements in timeliness, and in several outbreaks achieved performance comparable to PM_MAH and PM_MA.

The EpiEstim method outperformed the EARS methods when applied to un-smoothed data but remained slightly less effective than both PM_MAH and PM_MA under and . Although EpiEstim does not require smoothed input, we also assessed its performance using MAH-smoothed incidence. This smoothing notably improved timeliness during the Delta 1 outbreak but had limited effect on other outbreak waves.

Finally, the logistic regression method could not be directly evaluated in this comparison due to a key limitation: it requires differentiation between an emerging VOC and incumbent variants based on SGTF or WGS data. This requirement restricts its applicability to observed case counts alone.

3.3 Results for simulation analyses

3.3.1 Results for simulation analysis 1.

The simulation results for the BA.4/5 and XBB variants are presented separately in Fig 5, using the same parameter settings as in Fig 4 across all methods. This simulation analysis specifically evaluates the impact of sample size, expressed here as the daily fraction of individuals tested to determine the variant (parameter b, from 5% to 50%), and of the total number of daily COVID-19 observations (parameter a, as a multiple of the observed number of cases, from 0.25 to 1.0). Because the methods operate under different fpr, the analysis is not intended to compare their timeliness directly.

Fig 5. Panels A and B examine how the sample size of daily tested individuals (parameter b) and the total number of daily COVID-19 observations (parameter a) affect the performance of the proposed methods with MAH smoothing (PM_MAH), as well as the logistic regression method.

In the logistic regression methods, those ending with ’NS’ are based on the original data, while those ending with ’S’ use data smoothed with MAH. Panels C and D evaluate the effect of total daily COVID-19 observations (parameter a) on the performance of the EARS methods. Similarly, EARS methods ending with ’NS’ use the original data, while those ending with ’S’ apply MAH smoothing. It should be noted that this analysis was not intended to compare timeliness across methods, but rather to evaluate how performance responds to changes in observation scale and sample size.

https://doi.org/10.1371/journal.pone.0345088.g005

In Fig 5 A and B, the findings indicate that increasing the fraction of daily tested individuals enhances timeliness for both the logistic regression method and the proposed method incorporating MAH smoothing. This result aligns with their theoretical properties, which will be discussed further in Section 4. However, the timeliness of the proposed method plateaus beyond a certain threshold, as it can only generate signals starting from or later.

Fig 5 C and D demonstrate the influence of the total number of daily COVID-19 observations on the performance of all EARS methods, which are based on the counts of all cases (i.e., without requiring testing to determine the variant). For these methods, timeliness remains largely unaffected as the total number of observed cases increases. This behavior is attributed to the structure of their test statistics and will be explained in more detail in Section 4.

3.3.2 Results for simulation analysis 2.

We began by adjusting the parameters for all methods across various fpr levels, except for the EARS C3 method with smoothing, as it could not achieve an fpr of 0.03 or lower. Once these adjustments were made, we evaluated the timeliness performance across all models.

As shown in Fig 6, for Model 1, where the DGP excluded day-of-the-week and public holiday effects, smoothing did not improve the timeliness of either the proposed methods or the EARS and EpiEstim approaches. This was expected, as the DGP lacked calendar-related anomalies. Nonetheless, the proposed method consistently outperformed all other methods by a small margin in terms of timeliness, regardless of whether smoothing was applied.

Fig 6. Each panel corresponds to a specific model and compares the timeliness of all methods while maintaining the false positive rate at the same level.

For EARS methods, those ending with ’NS’ are based on the original data, while those ending with ’S’ use data smoothed with MAH. Models 1 and 2 rely solely on observed total cases (excluding the logistic regression model from the comparison); the DGP in Model 1 excluded day-of-the-week and public holiday effects, whereas the DGP in Model 2 included them. Model 3 is designed to assess performance in stratified analyses (and hence includes the logistic regression model), with day-of-the-week and public holiday effects simulated as well. Further details are provided in the S1 Supplementary Materials.

https://doi.org/10.1371/journal.pone.0345088.g006

For Model 2, which included calendar effects and random noise, MAH smoothing moderately improved timeliness for both the proposed and EARS methods. In contrast, it had little impact on EpiEstim, likely because EpiEstim already applies temporal smoothing via its sliding time window. Overall, the proposed method with MAH smoothing consistently achieves the best timeliness, followed by EpiEstim without smoothing.

For Model 3—which introduced varying day-of-the-week and public holiday effects across different periods compared to Model 2—the results were mixed. While smoothing slightly improved the timeliness of EARS C1 at lower fpr levels, it had no effect on EARS C2, and the overall performance of the EARS methods remained limited under this more complex DGP. Similarly, the logistic regression method performed poorly, likely due to the modest growth advantage of the VOC over the non-VOC variant, which reduced its ability to detect differential growth. Smoothing also provided little benefit to this method, as it relies on relative growth rates that are minimally affected by calendar-driven fluctuations. EpiEstim’s performance remained largely unchanged with smoothing, consistent with findings in Model 2. In contrast, the proposed method consistently benefited from smoothing, with MAH substantially enhancing its timeliness. Notably, the MAH-smoothed version (PM_MAH) outperformed all other methods across all scenarios, followed closely by EpiEstim without smoothing (EPINS).

In conclusion, the simulation study highlights that MAH smoothing offers substantial benefits for both the proposed method and the EARS methods when calendar anomalies are present in the DGP, although the magnitude of these benefits varies across scenarios. Overall, the proposed method with MAH smoothing consistently emerged as the best performer, followed by the EpiEstim method—an expected outcome given the more sophisticated design of these two approaches.

4 Discussion

In this work, we confirmed the importance of smoothing methods [16] to adjust for the anomalies caused by day-of-the-week and public holiday effects in COVID-19 data reported in Singapore from 2021/04/01 to 2023/02/13. Employing these calendar-aware smoothing techniques goes beyond improving the visual representation of epidemic curves: it also significantly stabilises effective reproduction number estimates. By integrating this adjustment directly into the outbreak detection framework, rather than treating it as a separate preprocessing step, we ensured that the estimation of R_t and subsequent inference were performed on calendar-stable inputs, reducing artefacts and improving interpretability.

While the effective reproduction number has long been a fundamental concept in explaining why epidemics occur, we showed that these stabilized estimates also allow it to be an effective indicator of when an outbreak is in progress. When used with smoothed case surveillance data, the proposed algorithm based on the reproduction number outperformed several other statistical methods for flagging the start of COVID-19 epidemics in Singapore. To support a more robust comparison under a variety of scenarios, we then applied bootstrap methods [16–18] to draw statistical inference for the effective reproduction number, and compared the resulting detector with other methods, with and without smoothing. This provided a rigorous framework for assessing the timeliness and reliability of outbreak detection, and yielded insights on how smoothing of day-of-the-week and public holiday effects, as well as observed incidences and sample sizes, affect the performance of different methods.

Buckingham-Jeffery et al. [8] highlighted the importance of addressing anomalies caused by day-of-the-week and public holidays to appropriately smooth healthcare data for infectious disease surveillance. We took their work one step further by showing how calendar-aware smoothing techniques significantly improve detection timeliness, both in established methods like EARS and in our proposed method. Conceptually, calendar anomalies from the day-of-the-week are a source of “periodic noise” in surveillance data. Consequently, algorithms that neglect adjustment for these will require thresholds to manage false positive signals that also compromise timely outbreak detection. However, smoothing algorithms that account for such periodic effects by aggregating data over several days may also result in reduced timeliness. In this paper, we showed that, both in real data and in a series of simulated scenarios, smoothing does improve the performance of most algorithms, as evidenced in the context of COVID-19 epidemics. Additional adjustment for public holidays slightly outperformed smoothing for day-of-the-week on its own. Public holidays are relatively rare, and the degree of improvement hence depends on when the holidays occur relative to critical time points in an epidemic. We recommend routinely adjusting for both effects, since our results show that the combined MAH algorithm best mitigates anomalies from calendar effects.
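To illustrate why aggregating over a full week removes day-of-the-week "periodic noise", the sketch below applies a plain trailing 7-day moving average to a synthetic series with flat underlying incidence. It is not the MAH method evaluated in this paper (there is no public holiday handling), and the weekday multipliers are invented purely for illustration.

```python
def trailing_moving_average(counts, window=7):
    """Trailing moving average; the first window - 1 days have no value."""
    return [
        sum(counts[t - window + 1 : t + 1]) / window
        for t in range(window - 1, len(counts))
    ]

# Hypothetical weekday reporting multipliers (Mon..Sun) that average to 1.
weekday_effect = [1.2, 1.1, 1.0, 1.0, 1.1, 0.9, 0.7]
baseline = 100.0  # constant underlying incidence, i.e. no outbreak

# Four weeks of reported counts distorted only by the weekday pattern.
raw = [baseline * weekday_effect[day % 7] for day in range(28)]
smoothed = trailing_moving_average(raw)
```

The raw series swings between 70 and 120 although incidence is flat, whereas every 7-day window contains each weekday exactly once, so the smoothed series stays at the baseline. The price is that each smoothed value summarises the preceding week, which is the timeliness trade-off noted above.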

Having established the relevance of the smoothing algorithm, we tested the timeliness of our proposed method based on reproduction number as an early warning of impending COVID-19 epidemics. Results using both observed and simulated data for several established techniques [19,21,32] were also explored for comparison. Overall, the family of EARS methods exhibited varying performance across different scenarios. Within the EARS family, no single method consistently outperformed the others. In contrast, our proposed method (incorporating calendar-aware smoothing) demonstrated modest enhancements in timeliness compared to all EARS and EpiEstim methods. Taken together, integrating statistical and epidemiological components within a unified structure positions our framework as a practical extension of existing threshold- and regression-based detection systems. It achieves a balance between interpretability, sensitivity, and operational simplicity, making it broadly applicable to real-time surveillance settings.

To gain additional insight into the applicability of the methods to different scenarios, we also used simulation to explore whether the sample size of tested individuals and the number of incident observations affected their performance. The EARS methods were minimally influenced by the number of incident cases in surveillance data, with the small fluctuations in timeliness for different case numbers in Fig 5 due to the resampling procedures rather than the number of simulated cases. A close examination of the EARS methods shows that their test statistics are standardized and compared against a standard normal distribution, which explains this insensitivity.

To illustrate why this is so, let us set the baseline window size to w for EARS-C1. Assuming the case count is Y_t, under the null hypothesis of “no outbreak”, the expected value of Y_t is approximated by the mean of the previous w observed case counts:

$$\hat{\mu}_t = \frac{1}{w} \sum_{i=1}^{w} Y_{t-i}.$$

Additionally, the variance of Y_t is approximated by the sample variance of the previous w counts:

$$\hat{\sigma}_t^2 = \frac{1}{w} \sum_{i=1}^{w} \left( Y_{t-i} - \hat{\mu}_t \right)^2,$$

which follows the distribution:

$$\hat{\sigma}_t^2 \sim \frac{\sigma^2}{w} \chi^2_w,$$

where σ² denotes the variance of Y_t under the null hypothesis and χ²_w is a chi-squared random variable with w degrees of freedom.

Since the mean and variance are approximated by the previous w counts, under the null hypothesis of no outbreak, the following statistic was defined by EARS-C1:

$$C_1(t) = \frac{Y_t - \hat{\mu}_t}{\hat{\sigma}_t},$$

and an alarm is raised if C₁(t) > z_α, where z_α is the upper α critical value from the standard normal distribution. This statistic C₁(t) is hence invariant to the overall scale of the case counts.
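The insensitivity to case volume can be checked numerically. The sketch below computes a C1-type statistic (current count minus the mean of the previous w counts, divided by their standard deviation) for a hypothetical low-incidence series and for the same series multiplied by 100. The counts are invented, and the use of the n − 1 sample standard deviation from Python's `statistics` module is an assumption; the conclusion does not depend on that normalisation.

```python
from statistics import mean, stdev

def ears_c1(counts, w=7):
    """C1-type statistic for the last count, with the previous w counts as baseline."""
    baseline = counts[-w - 1 : -1]
    return (counts[-1] - mean(baseline)) / stdev(baseline)

low = [10, 12, 9, 11, 10, 13, 11, 15]  # hypothetical low-incidence series
high = [100 * y for y in low]           # same relative pattern, 100 times the cases

c1_low = ears_c1(low)
c1_high = ears_c1(high)
```

Both series return the same statistic, because scaling the counts multiplies the numerator and the denominator by the same factor. A fixed critical value therefore triggers at the same relative change whether the baseline is about 10 or about 1,000 cases per day.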

Our exposition of the statistics underlying the EARS method highlights a key issue: for the same relative change, outbreak signals based on data with larger case counts should generally be more reliable than those with smaller case counts. However, this difference in reliability is not reflected in the EARS methods. Although EARS allows for adjusting the fpr by modifying the α level, it fails to account for scenarios where there is insufficient training data to properly calibrate the fpr. These results are consistent with those shown in Fig 5, where all EARS methods have the same timeliness results regardless of the total number of daily COVID-19 observations.

Compared to the EARS methods, EpiEstim offers several advantages for outbreak detection. As a model-based approach, EpiEstim estimates R_t, capturing changes in transmission dynamics rather than relying solely on deviations from recent case counts. This allows for earlier detection of outbreaks, particularly when R_t exceeds 1, even before large increases in case numbers are observed. EpiEstim is also less sensitive to random fluctuations and day-of-week effects in the reported data, especially when combined with smoothing techniques (sliding window). Furthermore, it remains informative during low-incidence periods and provides uncertainty estimates through credible intervals for R_t, enhancing interpretability and supporting risk-based decision-making. In contrast, EARS relies on empirical thresholds and is more vulnerable to noise and delayed signals during the early stages of an outbreak.
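The link between R_t and growth can be made concrete with the renewal-equation ratio that underlies estimators such as EpiEstim [13]: new cases divided by past cases weighted by the serial interval distribution. The sketch below is a simplified point estimate, not EpiEstim's Bayesian procedure and not necessarily the estimator defined in Section 2.2.2; the function name and the serial interval weights are hypothetical.

```python
def renewal_rt(incidence, si_weights):
    """Naive R_t: new cases divided by serial-interval-weighted past cases."""
    k = len(si_weights)
    estimates = []
    for t in range(k, len(incidence)):
        # Infection pressure from cases reported 1..k days earlier.
        pressure = sum(si_weights[s] * incidence[t - 1 - s] for s in range(k))
        estimates.append(incidence[t] / pressure if pressure > 0 else float("nan"))
    return estimates

si_weights = [0.2, 0.4, 0.3, 0.1]  # hypothetical serial interval pmf for lags of 1-4 days
flat = [50.0] * 12                                   # stable incidence
growing = [50.0 * 1.1 ** day for day in range(12)]   # 10% daily growth

rt_flat = renewal_rt(flat, si_weights)
rt_growing = renewal_rt(growing, si_weights)
```

With flat incidence the ratio stays at 1, while sustained growth pushes it above 1 from the first day it can be computed, even though the absolute counts are still small. The first k days are lost because the weighted sum needs a full history, which is the early-observation limitation discussed later in this section.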

A key limitation of the EARS method becomes apparent when applied to variant-specific data, a drawback also shared by EpiEstim. Both methods rely solely on observed case counts and do not account for the underlying sample size used to determine the variant. As a result, they generate identical signals regardless of whether the estimates are based on a sample of 50 or 500 individuals, potentially leading to misleading conclusions when sample sizes vary substantially. In contrast, the logistic regression method showed improving timeliness with increasing sample size of tested individuals, which is in line with its theoretical properties: the variance of the estimate (the inverse of the Fisher information) decreases as the sample size increases.
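The effect of sample size on precision can be seen from the binomial standard error of an estimated variant proportion. This is a simplified illustration rather than the logistic regression model itself, and the proportion of 0.2 is an arbitrary assumption.

```python
import math

def proportion_se(p_hat, n):
    """Binomial standard error of a proportion estimated from n tested samples."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n)

se_50 = proportion_se(0.2, 50)    # variant share estimated from 50 tested individuals
se_500 = proportion_se(0.2, 500)  # same share estimated from 500 tested individuals
ratio = se_50 / se_500
```

Because the standard error scales with the inverse square root of n, a tenfold increase in tested individuals shrinks it by a factor of the square root of 10. A method that only sees the resulting counts cannot reflect this difference in precision.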

One obvious limitation of the proposed outbreak detection method is its inability to detect outbreaks right at the onset of epidemics, as it necessitates the exclusion of early observations when calculating R_t. This limitation is shared by the EARS approach, which also requires some baseline data for effective detection. In contrast, the logistic regression method’s design allows it to function independently of prior data, enabling it to identify outbreaks at the onset of an epidemic when the new VOC has a strong growth rate relative to the incumbents. However, the method based on logistic regression requires some form of laboratory testing to differentiate the new VOC from the incumbent variants against which it is being compared. In particular, if WGS is used to discriminate variants, this introduces an additional lag of several days between when samples are collected and when the data can be analysed.

It is also theoretically possible for the logistic regression method to result in a “false positive” of predicting an epidemic by a new VOC that is shrinking in incidence but is still growing relative to incumbent variants undergoing even more rapid decline. These complexities underscore the importance of selecting an appropriate method based on the specific epidemiological context, technical feasibility, and data constraints.
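A small numerical illustration of this scenario, using invented starting counts and decline rates: both variants shrink, yet the share of the new VOC rises every day because the incumbent shrinks faster.

```python
def simulate(initial, daily_factor, days):
    """Deterministic geometric growth or decline of daily cases."""
    return [initial * daily_factor ** d for d in range(days)]

days = 14
voc = simulate(20.0, 0.95, days)         # new VOC: shrinking 5% per day
incumbent = simulate(200.0, 0.80, days)  # incumbent: shrinking 20% per day

total = [v + i for v, i in zip(voc, incumbent)]
voc_share = [v / t for v, t in zip(voc, total)]
```

A detector that tracks only the relative growth of the VOC would see a steadily increasing share here, although total incidence is falling throughout.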

Finally, all methods will likely be susceptible to false positive signals when encountering exponential growth of imported cases of a variant to which the local population has herd immunity. This remains an untested scenario in the case of COVID-19 but could occur should a more effective vaccine be developed. In addition, the proposed method was validated here only on case notification data. However, the method can potentially be applied to data from sentinel testing, which is available for COVID-19 in Singapore. This could be a focus for future extensions of this method, where comparisons would also need to be made against methods based on viral load indices from wastewater testing [33].

5 Conclusion

In this study, we addressed the significant impact of day-of-the-week and public holiday effects in syndromic data, highlighting the importance of these adjustments through analyses of COVID-19 data and comprehensive simulation studies. By integrating smoothing techniques with bootstrap resampling methods, we developed a quantitative framework for outbreak detection that performed consistently across both real and simulated outbreaks when compared with EARS and EpiEstim algorithms. The framework is also simple in terms of parameter choices, with the only key setting being the assumed serial interval, which can be informed by external literature. These advantages build confidence in its applicability to future events. In addition, given its theoretical foundation in the principle of the reproduction number, it may also work with other infections with similar transmission dynamics, such as influenza, and this should be an area for future work.

Supporting information

S1 Text. Additional analyses in this study.

https://doi.org/10.1371/journal.pone.0345088.s001

(PDF)

References

1. Elliot AJ, Harcourt SE, Hughes HE, Loveridge P, Morbey RA, Smith S, et al. The COVID-19 pandemic: a new challenge for syndromic surveillance. Epidemiol Infect. 2020;148:e122. pmid:32614283
2. Hu W-H, Sun H-M, Wei Y-Y, Hao Y-T. Global infectious disease early warning models: An updated review and lessons from the COVID-19 pandemic. Infect Dis Model. 2024;10(2):410–22. pmid:39816751
3. Wagner M, Tsui F, Cooper G, Espino JU, Harkema H, Levander J, et al. Probabilistic, Decision-theoretic Disease Surveillance and Control. Online J Public Health Inform. 2011;3(3):ojphi.v3i3.3798. pmid:23569617
4. Dietz K. The estimation of the basic reproduction number for infectious diseases. Stat Methods Med Res. 1993;2(1):23–41. pmid:8261248
5. Fauci A, Lane H, Redfield R. Covid-19 — Navigating the Uncharted. New England Journal of Medicine. 2020;382(2).
6. Hethcote H. The mathematics of infectious diseases. SIAM Review. 2000;42:599–653.
7. Gostic KM, McGough L, Baskerville EB, Abbott S, Joshi K, Tedijanto C, et al. Practical considerations for measuring the effective reproductive number, Rt. PLoS Comput Biol. 2020;16(12):e1008409. pmid:33301457
8. Buckingham-Jeffery E, Morbey R, House T, Elliot AJ, Harcourt S, Smith GE. Correcting for day of the week and public holiday effects: improving a national daily syndromic surveillance service for detecting public health threats. BMC Public Health. 2017;17(1):477. pmid:28525991
9. Mathew A, Fyyaz SA, Carter PR, Potluri R. The enigma of the weekend effect. J Thorac Dis. 2018;10(1):102–5. pmid:29600032
10. Khan S. Handbook of Biosurveillance, M.M. Wagner, A.W. Moore, R.M. Aryel (Eds.). Elsevier Inc. ISBN13: 978–0-12-369378-5. Journal of Biomedical Informatics. 2007 08;40:380–1.
11. Tan HY, Khamis N, Goh A, Mah TKL, Yeo B, Ngan JYG. Singapore’s COVID‐19 Genomic Surveillance Programme: Strategies and Insights From a Pandemic Year. Influenza and Other Respiratory Viruses. 2024;18(12).
12. Ntagereka PB, Oyola SO, Baenyi SP, Rono GK, Birindwa AB, Shukuru DW, et al. Whole-genome sequencing of SARS-CoV-2 reveals diverse mutations in circulating Alpha and Delta variants during the first, second, and third waves of COVID-19 in South Kivu, east of the Democratic Republic of the Congo. Int J Infect Dis. 2022;122:136–43. pmid:35598737
13. Cori A, Ferguson NM, Fraser C, Cauchemez S. A new framework and software to estimate time-varying reproduction numbers during epidemics. Am J Epidemiol. 2013;178(9):1505–12. pmid:24043437
14. Alessandro A, Tommi A. Effective Reproduction Number Estimation from Data Series. Publications Office of the European Union; 2020. Available from: https://api.semanticscholar.org/CorpusID:226697132
15. Xu X, Wu Y, Kummer AG, Zhao Y, Hu Z, Wang Y, et al. Assessing changes in incubation period, serial interval, and generation time of SARS-CoV-2 variants of concern: a systematic review and meta-analysis. BMC Med. 2023;21(1):374. pmid:37775772
16. Kreiss J-P, Lahiri SN. Bootstrap Methods for Time Series. Handbook of Statistics. Elsevier. 2012. p. 3–26. https://doi.org/10.1016/b978-0-444-53858-1.00001-6
17. Wang L, Kong E, Xia Y. Bootstrap Tests for High-Dimensional White-Noise. Journal of Business & Economic Statistics. 2022;41(1):241–54.
18. Wang L, Zhang M. Statistical modeling of Dengue transmission dynamics with environmental factors. Computational Statistics & Data Analysis. 2025;203:108080.
19. Fricker R, Hegler B, Dunfee D. Comparing syndromic surveillance detection methods: EARS’ versus a CUSUM-based methodology. Statistics in Medicine. 2008;27:3407–29.
20. Craig AT, Leong RNF, Donoghoe MW, Muscatello D, Mojica VJC, Octavo CJM. Comparison of statistical methods for the early detection of disease outbreaks in small population settings. IJID Reg. 2023;8:157–63. pmid:37694222
21. Campbell F, Archer B, Laurenson-Schafer H, Jinnai Y, Konings F, Batra N. Increased transmissibility and global spread of SARS-CoV-2 variants of concern as at June 2021. Eurosurveillance. 2021;26:2100509.
22. Earnest R, Uddin R, Matluk N, Renzette N, Turbett SE, Siddle KJ, et al. Comparative transmissibility of SARS-CoV-2 variants Delta and Alpha in New England, USA. Cell Rep Med. 2022;3(4):100583. pmid:35480627
23. Farrington CP, Andrews NJ, Beale AD, Catchpole MA. A Statistical Algorithm for the Early Detection of Outbreaks of Infectious Disease. Journal of the Royal Statistical Society Series A (Statistics in Society). 1996;159(3):547.
24. Page ES. Continuous inspection schemes. Biometrika. 1954;41(1–2):100–15.
25. Linette L. Singapore’s largest active Covid-19 cluster: What went wrong at Changi Airport? The Straits Times. Available from: https://www.straitstimes.com/singapore/health/singapores-largest-active-covid-19-cluster-what-went-wrong-at-changi-airport
26. Lim Min Z. KTV and Jurong Fishery Port Covid-19 clusters linked: Ong Ye Kung. The Straits Times. Available from: https://www.straitstimes.com/singapore/ktv-and-jurong-fishery-port-covid-19-clusters-linked-ong-ye-kung#::text=Home-,KTV
27. Yen Nee L. Singapore to start relaxing Covid restrictions Aug.10 as vaccination rate rises. CNBC. Available from: https://www.cnbc.com/2021/08/06/singapore-to-relax-covid-measures-as-vaccination-rate-rises.html
28. Bruckers L, Faes C, Pieters Z, Sumalinab B, Dias JG, Marrone G. Sample size guidance for surveillance data. ECDC. 2023.
29. Goh AXC, Chae S-R, Chiew CJ, Tang N, Pang D, Lin C, et al. Characteristics of the omicron XBB subvariant wave in Singapore. Lancet. 2023;401(10384):1261–2. pmid:37061259
30. Zeng K, Santhya S, Soong A, Malhotra N, Pushparajah D, Thoon K, et al. Serial intervals and incubation periods of SARS-CoV-2 Omicron and Delta variants, Singapore. Emerging Infectious Diseases. 2023;29.
31. Yong C. Work-related events for up to 1,000 people, with no food and drinks, allowed from Jan 3. The Straits Times. Available from: https://www.straitstimes.com/singapore/work-related-events-of-up-to-1000-people-with-no-food-and-drinks-allowed-from-jan-3
32. Rogerson P, Yamada I. Approaches to syndromic surveillance when data consist of small regional counts. MMWR Morbidity and Mortality Weekly Report. 2004;53(Suppl):79–85.
33. Jin S, Tay M, Ng LC, Wong JCC, Cook AR. Combining wastewater surveillance and case data in estimating the time-varying effective reproduction number. Sci Total Environ. 2024;928:172469. pmid:38621542
