A history-dependent approach for accurate initial condition estimation in epidemic models

  • Dongju Lim,

    Contributed equally to this work with: Dongju Lim, Kyeong Tae Ko

    Roles Formal analysis, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliations Department of Mathematical Sciences, KAIST, Daejeon, Republic of Korea, Biomedical Mathematics Group, Pioneer Research Center for Mathematical and Computational Sciences, Institute for Basic Science, Daejeon, Republic of Korea

  • Kyeong Tae Ko,

    Contributed equally to this work with: Dongju Lim, Kyeong Tae Ko

    Roles Formal analysis, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Statistics, Kyungpook National University, Daegu, Republic of Korea

  • Hyukpyo Hong,

    Roles Formal analysis, Investigation, Writing – review & editing

    Affiliation Department of Mathematics, University of Wisconsin–Madison, Madison, Wisconsin, United States of America

  • Hyojung Lee,

    Roles Funding acquisition, Investigation, Writing – review & editing

    Affiliation Department of Statistics, Kyungpook National University, Daegu, Republic of Korea

  • Boseung Choi,

    Roles Data curation, Funding acquisition, Investigation, Writing – review & editing

    Affiliations Biomedical Mathematics Group, Pioneer Research Center for Mathematical and Computational Sciences, Institute for Basic Science, Daejeon, Republic of Korea, Department of Big Data Science, Korea University, Sejong, Republic of Korea, College of Public Health, The Ohio State University, Ohio, United States of America

  • Won Chang,

    Roles Funding acquisition, Investigation, Writing – review & editing

    Affiliations Institute for Data Innovation in Science, Seoul National University, Seoul, Republic of Korea, Department of Statistics, Seoul National University, Seoul, Republic of Korea

  • Sunhwa Choi,

    Roles Conceptualization, Funding acquisition, Investigation, Project administration, Validation, Writing – original draft, Writing – review & editing

    jaekkim@kaist.ac.kr (J.K.K.), shchoi@nims.re.kr (S.C.)

    Affiliation Innovation Center for Industrial Mathematics, National Institute for Mathematical Sciences, Seongnam, Republic of Korea

  • Jae Kyoung Kim

    Roles Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Validation, Writing – original draft, Writing – review & editing

    jaekkim@kaist.ac.kr (J.K.K.), shchoi@nims.re.kr (S.C.)

    Affiliations Department of Mathematical Sciences, KAIST, Daejeon, Republic of Korea, Biomedical Mathematics Group, Pioneer Research Center for Mathematical and Computational Sciences, Institute for Basic Science, Daejeon, Republic of Korea, Department of Medicine, College of Medicine, Korea University, Seoul, Republic of Korea

Abstract

Mathematical modeling is a powerful tool for understanding and predicting complex dynamical systems, ranging from gene regulatory networks to population-level dynamics. However, model predictions are highly sensitive to initial conditions, which are often unknown. In infectious disease models, for instance, the initial number of exposed individuals (E) at the time the model simulation starts is frequently unknown. This initial condition has often been estimated using an unrealistic, history-independent assumption for simplicity: the chance that an exposed individual becomes infectious is the same regardless of the timing of their exposure (i.e., exposure history). Here, we show that this history-independent method can yield serious bias in the estimation of the initial condition. To address this, we developed a history-dependent initial condition estimation method derived from a master equation expressing the time-varying likelihood of becoming infectious during a latent period. Our method consistently outperformed the history-independent method across various scenarios, including those with measurement errors and abrupt shifts in epidemics, for example, due to vaccination. In particular, our method reduced estimation error by 55% compared to the previous method in real-world COVID-19 data from Seoul, Republic of Korea, which includes likely infection dates, allowing us to obtain the true initial condition. This advancement of initial condition estimation enhances the precision of epidemic modeling, ultimately supporting more effective public health policies. We also provide a user-friendly package, Hist-D, to facilitate the use of this history-dependent initial condition estimation method.

Author summary

Accurately predicting infectious disease spread requires knowing the initial number of individuals in the exposed compartment at the start of the simulation (E(t0)), but this number is usually unknown. A common method to estimate E(t0) assumes that the chance of an exposed individual becoming infectious is the same, regardless of when they were exposed. However, this unrealistic assumption can lead to serious errors in the estimation of E(t0). To solve this problem, we developed a method that considers exposure timing. Our method successfully estimated E(t0) even with measurement errors or sudden changes in outbreak conditions. In particular, our approach accurately estimated E(t0) for COVID-19 data from Seoul that includes likely infection dates, which allowed us to obtain the true initial condition. This advance in initial condition estimation will help improve epidemic predictions and public health strategies. Our method can also be applied to estimate initial conditions in systems where timing or history matters, such as protein maturation or cell degradation pathways. To facilitate the broad adoption of our method, we have also developed and released Hist-D, a user-friendly software package.

Introduction

Epidemic dynamics have been successfully explained by mathematical models such as the Susceptible–Exposed–Infectious–Removed (SEIR) model [1–3]. These models predict the exposed or infectious population over time, allowing forecasts of disease spread and the formulation of appropriate public health policies [4–6]. However, predictions from mathematical models, particularly those based on ordinary differential equations (ODEs), are highly sensitive to the initial condition used, such as the initial number of exposed (E) and infectious (I) individuals. Variations in these initial conditions lead to differences in the simulated epidemic dynamics [7], ultimately affecting the subsequent estimation of epidemiological parameters such as the reproduction number.

Despite the importance of accurate initial conditions for the predictive power of the model, the initial values for some compartments of the model are usually unknown. In particular, the initial condition of the exposed compartment (E) is generally unknown, as determining how many people are actually exposed requires extensive contact tracing, whose complexity increases exponentially with the number of contacts [8]. As a result, previous studies have often subjectively determined the initial condition [9,10]. Some studies minimized this subjectivity by treating initial conditions as free parameters and estimating them [11,12], or by using various potential values for the initial conditions and selecting the one whose subsequent simulation best fits the data [13,14]. However, this approach is computationally intensive. An alternative approach estimates the initial condition of E using the known daily incidence of becoming infectious [15]. This approach is consistent with the fundamental assumption of a standard SEIR model: the daily number of new infectious people is the product of (i) the population in the exposed compartment and (ii) the rate of progression to the infectious stage, which is the reciprocal of the mean latent period. Under this assumption, the initial condition of E can be estimated by multiplying the daily incidence of becoming infectious by the average latent period [15]. However, this method does not account for the different timing of exposure among people in compartment E, instead assuming the same likelihood of transitioning to the infectious stage for all individuals regardless of when they were exposed (i.e., exposure history). Thus, we refer to this method as the History-Independent estimation (Hist-I) throughout this study.

Relaxing the unrealistic history-independent assumption of a standard SEIR model requires two components: (i) a model reflecting the changing likelihood of transitioning to the infectious stage, and (ii) accurate initial conditions for such a model. The first component has been extensively studied through approaches such as the method of stages [16], linear chain tricks [17,18], and delay differential equation (DDE) models [19]. Applying such models, particularly DDE-based models, has enabled more accurate estimation of epidemic parameters by incorporating individual exposure history [19]. However, despite this advantage, a method for accurately estimating the initial conditions of these models, particularly the number of exposed individuals, has been lacking.

Here, we developed a history-dependent method for estimating the initial condition of E, Hist-D, that considers the exposure history. Specifically, we estimated the initial condition of E by finding the solution of the formula expressing the time-varying likelihood of becoming infectious during a latent period. When applying this approach to simulation data mimicking the latent period of COVID-19, Hist-D outperformed Hist-I under various conditions, including scenarios without measurement errors, with measurement errors, and with abrupt changes in the epidemic phase. Furthermore, when we applied Hist-D to real-world COVID-19 data from Seoul, South Korea, the error in initial condition estimation was reduced by 55% compared to Hist-I. As our approach provides a more accurate estimation of the initial condition of E, it will lead to a more precise understanding of epidemic dynamics, ultimately enabling more effective public health policies. To facilitate the application of Hist-D, we developed a user-friendly package.

Results

The history-independent method is inaccurate when the latent period is non-exponential

Mathematical models, even with identical parameter values, can yield different simulation results depending on their initial conditions. Consequently, using different initial conditions to fit the same model to identical data can also yield different estimates of key parameters, such as the transmission rate or the reproduction number in the SEIR model (Fig 1a inset). For example, if the initial condition for the exposed population is reduced to 25% of the original value, this leads to a 33.9% relative error in estimating the reproduction number under the parameter condition mimicking the COVID-19 dynamics, highlighting the importance of setting accurate initial conditions (S1a Fig). How the bias in initial conditions propagates to the estimation of the reproduction number is illustrated in S2 Text and S1b Fig.

Fig 1. Estimating initial conditions for the SEIR model.

(a) Schematic of the SEIR mathematical model, including the susceptible (S), exposed (E), infectious (I), and removed (R) individuals, which effectively explains epidemic dynamics. Fitting the SEIR model to the observed time series (white dots) from t0 onward enables the estimation of crucial epidemic parameters such as the reproduction number. This estimation strongly depends on the initial condition at t0 (e.g., E(t0)), the starting point of the model simulation (see Supplementary Information for more details). (b) The initial condition of E (E(t0), in red) can be determined by summing the daily change of E from the beginning of the disease (time 0) up to t0 (red arrow). However, this requires data on the daily incidence of exposure and the daily incidence of becoming infectious before t0, which are often unknown. This highlights the need for a method to estimate the initial condition of E using only the available daily data on infectious individuals from time t0 onward (green arrow). (c) To address this limitation, previous studies estimated the initial condition of E (E(t0), in red) by multiplying the daily incidence of becoming infectious at t0 by the mean latent period (green arrow). (d) However, while this History-Independent estimation (Hist-I) method provides an accurate estimation (red dots) if the latent period follows the exponential distribution (left), it becomes less reliable for the gamma distribution (right) observed in many infectious diseases, whereby an individual is more likely to transition from exposed to infectious the longer their time since exposure.

https://doi.org/10.1371/journal.pcbi.1013438.g001

To calculate this initial condition accurately, it is first necessary to precisely determine the beginning time of the disease (time = 0) and estimate how the epidemic dynamics have changed from that point up to the start of the SEIR model simulation (time = t0) (Fig 1b). For example, to determine the initial condition for the exposed population (E), it is essential to track how E has changed from the beginning of the disease to the start of the simulation. Tracking these daily changes requires knowing how many people became exposed each day (the daily incidence of exposure) and how many transitioned to the infectious stage, leaving the E compartment (the daily incidence of becoming infectious). This information can be derived from exposure history (the date of exposure) and the timing of infectiousness among exposed individuals. However, collecting such data, particularly examining the exposure history of each exposed individual, becomes increasingly challenging as the disease progresses because it requires labor-intensive contact tracing. Consequently, the only data available for determining the initial condition is typically the daily incidence of becoming infectious after data collection starts (Fig 1b) [20], making it challenging to set accurate initial conditions.

To overcome these limitations and set the initial conditions using the available data, the assumption underlying the standard SEIR model can be used (Fig 1c; see S2 Text for more details). Specifically, the daily incidence of becoming infectious is the product of the exposed population (E) and the rate of becoming infectious, which is the inverse of the average latent period. This assumption naturally leads to the equation E(t0) = (average latent period) × (daily incidence of becoming infectious at t0); that is, the initial condition of E can be estimated by multiplying the average latent period by the daily incidence of becoming infectious at the initial time point t0 [15].
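The Hist-I calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `hist_i_estimate` and the incidence value of 100 are hypothetical, and the gamma shape and scale are the values used in the simulation study of this article.

```python
def hist_i_estimate(daily_infectious_at_t0, mean_latent_period):
    """History-independent estimate of E(t0).

    Assumes the standard SEIR relation: daily incidence of becoming
    infectious = E / (mean latent period).
    """
    return mean_latent_period * daily_infectious_at_t0


# Mean of a gamma distribution is shape * scale (4.06 and 1.35 in this article).
mean_latent = 4.06 * 1.35

# Hypothetical observation: 100 people became infectious on day t0.
estimate = hist_i_estimate(100.0, mean_latent)
print(round(mean_latent, 2), round(estimate, 1))
```

Because the estimate scales a single day's observation, any noise in that one value is passed directly into the estimate of E(t0).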

This method, Hist-I, follows the core assumption of the standard SEIR model that does not consider the time when individuals are exposed (i.e., exposure history), assuming everyone experiences the same chance of becoming infectious regardless of their exposure history (Fig 1d). This history-independent (or memoryless) assumption is well-suited for the scenario where the latent period follows an exponential distribution. Conversely, when the latent period follows a non-exponential distribution (e.g., a gamma distribution), the memoryless property is lost, leading to inaccuracies in the Hist-I method (Fig 1d). However, most infectious diseases exhibit a gamma-distributed latent period [21–27] (Fig 1d), meaning that the longer the time since an individual was exposed, the more likely they will transition to becoming infectious (i.e., the history-independent assumption does not hold in reality). This highlights the need for a new method that accounts for this variability in the chance of becoming infectious, depending on the individual’s exposure history.
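The loss of the memoryless property can be made concrete by computing the hazard, i.e., the instantaneous rate of becoming infectious for someone who is still latent after a given number of days. The sketch below uses only the Python standard library and a simple midpoint-rule integral; the function names are hypothetical and the gamma parameters (shape 4.06, scale 1.35) are the ones used in the simulation study of this article.

```python
import math


def gamma_pdf(t, shape, scale):
    """Density of a gamma-distributed latent period at time t (days)."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t / scale) / (math.gamma(shape) * scale ** shape)


def gamma_cdf(t, shape, scale, steps=2000):
    """P(latent period <= t), by midpoint-rule integration of the density."""
    if t <= 0:
        return 0.0
    h = t / steps
    return h * sum(gamma_pdf((i + 0.5) * h, shape, scale) for i in range(steps))


def hazard(t, shape, scale):
    """Rate of becoming infectious at time t, given still latent at time t."""
    return gamma_pdf(t, shape, scale) / (1.0 - gamma_cdf(t, shape, scale))


mean_latent = 4.06 * 1.35
days = (2, 5, 8)
# Shape 1 reduces the gamma distribution to an exponential with the same mean.
exponential_hazard = [hazard(t, 1.0, mean_latent) for t in days]
gamma_hazard = [hazard(t, 4.06, 1.35) for t in days]
print(exponential_hazard)  # constant in time: memoryless
print(gamma_hazard)        # increases with time since exposure
```

For the exponential case the hazard stays at 1 / (mean latent period) for every day since exposure, which is exactly the rate Hist-I assumes; for the gamma case it grows with time since exposure.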

A framework for estimating the initial condition in a history-dependent manner

To determine the initial conditions in a history-dependent manner (Fig 2a), we first utilized an equation that represents the relationship between the number of individuals newly exposed each day (the daily incidence of exposure in Fig 2b) and the number of individuals leaving compartment E and becoming infectious each day (the daily incidence of becoming infectious in Fig 2b), which is given by the data (Fig 2b (i)). This equation uses convolution to express the fact that, after being exposed, each individual becomes infectious and leaves compartment E after a latent period that follows a specific probability distribution (e.g., a gamma distribution) (Fig 2b (i)). In this way, it directly accounts for different exposure histories among exposed individuals.

Fig 2. Schematic figure for deriving the loss function to estimate the initial condition.

(a) To address the limitation of the history-independent method (left), we developed a novel history-dependent method (right). (b) (i) We established the connection between the known data (the daily incidence of becoming infectious) and the unknown daily incidence of exposure by treating the former as the convolution of the latter with the probability density function of the latent period. (ii) By discretizing this relationship and (iii) assuming that the daily incidence of exposure remains constant before t0, we can express the known daily incidence of becoming infectious on a given day as a linear combination of the unknown pre-t0 incidence of exposure and the unknown incidences of exposure after t0, with known coefficients. One set of coefficients represents the probability of an individual having a latent period of exactly a given number of days, while the other represents the probability of the latent period being longer than or equal to a given number of days. Both can be obtained by integrating the convolution of the latent period density and the characteristic function supported on [0,1] (see Methods for more details). (c) Extending the linear combination expression to the whole data set, we can construct a matrix that describes the relationship between known data and unknown parameters. (d) We utilized this matrix equation, which the parameters must satisfy, to establish the data loss function, then sought to minimize this data loss by finding optimal values for the unknown parameters. However, as the number of unknown parameters exceeds the number of equations by one, the parameters cannot be determined solely from the data loss. This leads us to incorporate a regularization loss for the parameters, which aims to smooth the parameters by minimizing their second-order derivatives. Consequently, by finding the parameters that minimize the total loss function, which includes both the data loss and the regularization loss, we can estimate the daily incidence of exposure. By summing up the differences between the daily incidence of exposure and the daily incidence of becoming infectious, we finally obtain the initial condition of E.

https://doi.org/10.1371/journal.pcbi.1013438.g002

After discretizing this equation (Fig 2b (ii)) and assuming that individuals were being exposed at a constant rate before time t0 (Fig 2b (iii)), we were able to express the given data (the daily incidence of becoming infectious on each day after t0) as a linear combination of the constant pre-t0 incidence of exposure and the daily incidences of exposure after time t0 (see Methods for more details). For the data on a given day, the coefficient of the pre-t0 incidence of exposure in this linear combination (Fig 2b (iii)) represents the probability that the latent period is longer than or equal to the number of days elapsed since t0. This expresses that individuals exposed before t0 must go through a latent period at least that long to become infectious on that day. Conversely, the coefficient of the incidence of exposure on a later day (Fig 2b (iii)) represents the probability that the latent period equals the number of days between that exposure day and the day of becoming infectious. This reflects that an individual exposed on that day must go through a latent period of exactly that length to become infectious on the day in question.
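The two kinds of coefficients described above can be illustrated with a simplified discretization. The sketch below is an assumption-laden stand-in, not the article's derivation: it takes "a latent period of k days" to mean a latent period in the interval (k-1, k], computed from differences of the gamma cumulative distribution, whereas the article obtains the coefficients by integrating a convolution with a characteristic function on [0,1] (see Methods). All function names are hypothetical.

```python
import math


def gamma_pdf(t, shape, scale):
    """Density of a gamma-distributed latent period at time t (days)."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t / scale) / (math.gamma(shape) * scale ** shape)


def gamma_cdf(t, shape, scale, steps=4000):
    """P(latent period <= t), by midpoint-rule integration of the density."""
    if t <= 0:
        return 0.0
    h = t / steps
    return h * sum(gamma_pdf((i + 0.5) * h, shape, scale) for i in range(steps))


def latent_day_probabilities(shape, scale, max_days):
    """Return (exact, at_least) lists for k = 1..max_days.

    exact[k-1]    ~ P(latent period falls in day k), assumed as F(k) - F(k-1)
    at_least[k-1] ~ P(latent period lasts at least k days) = 1 - F(k-1)
    """
    cdf = [gamma_cdf(k, shape, scale) for k in range(max_days + 1)]
    exact = [cdf[k] - cdf[k - 1] for k in range(1, max_days + 1)]
    at_least = [1.0 - cdf[k - 1] for k in range(1, max_days + 1)]
    return exact, at_least


exact, at_least = latent_day_probabilities(4.06, 1.35, max_days=10)
for k, (p, q) in enumerate(zip(exact, at_least), start=1):
    print(k, round(p, 3), round(q, 3))
```

The "at least k days" coefficients are running tail sums of the "exactly k days" coefficients, which is why a constant stream of pre-t0 exposures collapses into a single unknown.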

We can write these relationships for all given data points and, by combining them, express the equations in the form of a matrix (Fig 2c). By finding the daily incidences of exposure that satisfy this matrix equation, we can estimate E(t0). To achieve this, we created a data loss function that becomes minimal when both sides of the matrix equation are equal (Fig 2d), then sought to minimize this data loss by finding optimal values for the unknown parameters (the pre-t0 incidence of exposure and the daily incidences of exposure after t0). However, as the number of parameters to be estimated is one more than the number of data points, there are infinitely many parameter combinations that minimize the data loss. To identify a single parameter combination that is close to the true values among these infinitely many combinations, we added a regularization loss term (Fig 2d). This regularization term penalizes the second derivative of the daily incidence of exposure, ensuring that the estimated incidence does not exhibit abrupt changes. Consequently, by finding the parameter combination that minimizes the loss function, which includes both the data loss and the regularization loss, we can estimate the daily incidence of exposure. Finally, by summing the resulting changes in the number of exposed people, we estimated the initial condition of E, E(t0) (Fig 2d).
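A minimal sketch of such a total loss is given below. The indexing is an assumption made for illustration (data on days t0+1 to t0+n, one unknown for the constant pre-t0 exposure, and n unknowns for the exposures on days t0+1 to t0+n), as are the function name `total_loss`, the weight `lam`, and the toy latent period distribution; the article's exact formulation is in its Methods section.

```python
def total_loss(params, data, exact, at_least, lam):
    """Data loss plus lam times a second-difference smoothness penalty.

    params   : [x_pre, x_1, ..., x_n], daily incidences of exposure (unknowns)
    data     : [y_1, ..., y_n], daily incidences of becoming infectious
    exact    : exact[d] = P(latent period is d days), d = 0..n-1
    at_least : at_least[k] = P(latent period is at least k days), k = 0..n
    """
    x_pre, x_after = params[0], params[1:]
    data_loss = 0.0
    for k in range(1, len(data) + 1):
        # Contribution of everyone exposed at the constant pre-t0 rate.
        predicted = at_least[k] * x_pre
        # Contribution of people exposed on day t0 + j (latent period k - j days).
        for j in range(1, k + 1):
            predicted += exact[k - j] * x_after[j - 1]
        data_loss += (data[k - 1] - predicted) ** 2
    # Discrete second derivative of the exposure curve.
    reg_loss = sum(
        (params[i - 1] - 2.0 * params[i] + params[i + 1]) ** 2
        for i in range(1, len(params) - 1)
    )
    return data_loss + lam * reg_loss


# Toy latent period: 1 or 2 days, each with probability 0.5.
exact = [0.0, 0.5, 0.5]
at_least = [1.0, 1.0, 0.5, 0.0]
data = [10.0, 10.0, 10.0]

smooth = total_loss([10.0, 10.0, 10.0, 10.0], data, exact, at_least, lam=2.0)
jumpy = total_loss([10.0, 10.0, 10.0, 13.0], data, exact, at_least, lam=2.0)
print(smooth, jumpy)
```

In this toy case both parameter vectors reproduce the data exactly, which illustrates why the data loss alone cannot pin down the parameters; only the smoothness penalty separates the flat exposure curve from the one with a jump on the last day.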

The new history-dependent method outperforms the history-independent method

We evaluated whether our new history-dependent method, unlike Hist-I, can provide accurate estimates of the initial condition of E when the latent period follows the gamma distribution. To do this, we simulated an SEIR model whose latent period follows the gamma distribution with shape 4.06 and scale 1.35 [24] (Fig 3a). We then extracted the value of E and the number of people transitioning from E to I (the daily incidence of becoming infectious) at each time point (see Methods for more details).

Fig 3. Hist-D outperforms Hist-I, regardless of the phase transition of epidemic dynamics and noise.

(a) The trajectory of E and the daily incidence of becoming infectious were simulated through the SEIR model whose latent period follows the gamma distribution with shape 4.06 and scale 1.35 (see Methods for more details). (b) The simulated daily incidence of becoming infectious was then used to estimate E(t0) and to compare History-Independent estimation (Hist-I) and History-Dependent estimation (Hist-D). Hist-I uses data from only a single day, t0, while Hist-D uses data from consecutive days after t0 spanning twice the mean latent period. (c) The graph comparing the true E(t0) (light gray bars) and the estimated E(t0). (d) The scatter plot displaying the error across different levels of true E(t0). Estimation from Hist-D (green squares) has a much lower error compared to Hist-I (red triangles). (e) The graph showing the root mean squared error (RMSE) (bars) and the mean absolute percentage error (MAPE) (line) of Hist-I and Hist-D. When Hist-D was used, RMSE and MAPE were reduced by 86% and 85%, respectively, compared to Hist-I. (f) To better reflect the real-world situation with observation noise in the given data, we applied multiplicative noise drawn from a uniform distribution to the simulated data used in (c-e) and compared the accuracy of Hist-I and Hist-D. (g) The scatter plots displaying the estimation error at a noise level of 0.1. The error of both Hist-I and Hist-D increased proportionally to the level of true E(t0), and this was especially pronounced in Hist-I (top). In addition, compared to the zero-noise case (i.e., the case in (c-e)), the error increment of Hist-D was lower than that of Hist-I (bottom). (h) The graph showing the RMSE (bars) and MAPE (line) of Hist-I and Hist-D across the different noise levels (0 to 0.3 in 0.1 intervals). Hist-D achieved a lower RMSE and MAPE than Hist-I across all noise levels. (i) To mimic a transition in epidemic dynamics, we abruptly changed the transmission rate at a single time point (top) and simulated data (middle), which were then used to investigate the accuracy of Hist-I and Hist-D. (j) The scatter plot showing the error of Hist-I and Hist-D when the transmission rate was doubled. Hist-D outperformed Hist-I. (k) The graph showing the RMSE (bars) and MAPE (line) of Hist-I and Hist-D across different fold changes in the transmission rate (1/3, 1/2, 1, 2, 3). Hist-D consistently outperformed Hist-I across all fold changes. In particular, when the transmission rate was reduced to 1/3, the absolute increase in RMSE and MAPE for Hist-D was 22% and 19% that of Hist-I, respectively, demonstrating the robustness of Hist-D to sudden changes in the transmission rate.

https://doi.org/10.1371/journal.pcbi.1013438.g003

With this data, we estimated the initial condition using the history-independent method (Hist-I) and the history-dependent method (Hist-D). First, the Hist-I method estimates E(t0) by multiplying the mean latent period (i.e., 4.06 × 1.35 = 5.48) by the daily incidence of becoming infectious at t0 (Fig 3b). That is, each estimate of E(t0) is obtained by multiplying 5.48 by the daily incidence at the corresponding t0. The second method, History-Dependent estimation (Hist-D), estimates the value that minimizes the loss function (Fig 2d) using the daily incidence of becoming infectious for 2 × mean latent periods ≈ 10 days after t0 (Fig 3b). Thus, each estimate of E(t0) uses only the data from the roughly 10 days following the corresponding t0. Note that while 2 × mean latent periods are used in this study, the length of data can be adjusted by users. Both Hist-I and Hist-D assume that data only exist after the time point t0, which is the start of the SEIR model simulation.

Using these methods (Hist-I and Hist-D), we estimated E(t0) at a range of time points and compared the estimates with their true values (Fig 3c). Hist-D was much more accurate than Hist-I (Fig 3d), reducing the root mean squared error (RMSE) and mean absolute percentage error (MAPE) by 86% and 85%, respectively (Fig 3e). Similar improvements were also observed during the earlier phase of epidemic growth (S3 Fig). This superiority of Hist-D persisted under various parameter conditions (see S1 Table) and even after modifications were made to the Hist-I method by summing up the future daily incidence of becoming infectious, as done in a previous study [28] (see S3 Text and S2 Fig). Consequently, we focused our analysis on the Hist-I method rather than its modified version.
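For reference, the two accuracy metrics used in this comparison can be computed as in the following sketch. The function names and the toy numbers are hypothetical, and the error is assumed here to be the estimate minus the true value.

```python
import math


def rmse(true_values, estimates):
    """Root mean squared error between estimates and true values."""
    n = len(true_values)
    return math.sqrt(sum((e - t) ** 2 for t, e in zip(true_values, estimates)) / n)


def mape(true_values, estimates):
    """Mean absolute percentage error, in percent (true values must be nonzero)."""
    n = len(true_values)
    return 100.0 * sum(abs((e - t) / t) for t, e in zip(true_values, estimates)) / n


# Toy example: two true E(t0) values and two estimates.
true_e = [100.0, 200.0]
estimated_e = [110.0, 180.0]
print(rmse(true_e, estimated_e), mape(true_e, estimated_e))
```

RMSE is dominated by time points where E(t0) is large, while MAPE weights every time point by its relative error, so reporting both guards against a method that is accurate only at small or only at large values of E(t0).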

Hist-D demonstrated superior accuracy compared to the Hist-I method under ideal conditions without measurement errors. However, real-world situations differ from simulations, as measurement errors are always present. To simulate a scenario with observation errors, we introduced multiplicative noise to the given data (the daily incidence of becoming infectious) and used this data to estimate E(t0) with Hist-I and Hist-D (Fig 3f). When the noise level was 0.1 (Fig 3f), due to the effect of the multiplicative noise, the error increased as E(t0) grew larger for both methods (Fig 3g, top). However, Hist-D still maintained smaller errors compared to Hist-I (Fig 3g, top). Furthermore, Hist-D demonstrated greater resilience to increasing noise levels compared to Hist-I, exhibiting a smaller error amplification as the noise intensified from 0 to 0.3 (Fig 3g, bottom). This higher accuracy persisted across all tested noise levels, ranging from 0 to 0.3 in 0.1 intervals (Fig 3h). These results suggest that Hist-D is preferable for real-world applications with measurement errors in observed data.
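One way to generate such noisy data is sketched below. The exact noise form is an assumption: each observation is multiplied by (1 + ε), with ε drawn uniformly from [-noise level, +noise level]; the article's precise definition is given in its Fig 3f and Methods. The function name and the example series are hypothetical.

```python
import random


def add_multiplicative_noise(series, noise_level, seed=0):
    """Multiply each observation by (1 + eps), eps ~ Uniform(-noise_level, noise_level).

    The uniform interval is an assumed form chosen for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    return [y * (1.0 + rng.uniform(-noise_level, noise_level)) for y in series]


clean = [50.0, 80.0, 120.0]  # hypothetical daily incidences of becoming infectious
noisy = add_multiplicative_noise(clean, noise_level=0.1)
unchanged = add_multiplicative_noise(clean, noise_level=0.0)
print(noisy)
```

Because the noise is multiplicative, its absolute size grows with the observation itself, which matches the pattern described above of errors increasing with E(t0).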

Beyond measurement error, real-world epidemics present additional complexities such as sudden changes in the epidemic phase due to social distancing, vaccination, or large-scale outbreaks of COVID-19. To reflect these changes in the simulation, we regenerated simulation data by changing the transmission rate at a specific point (i.e., when E reaches its peak) and used this new simulation data to compare the accuracy of Hist-I and Hist-D (Fig 3i). When we doubled the transmission rate and investigated the errors for the 20 time points before and after the change point, both Hist-I and Hist-D showed increased errors around the second peak (time = 150 – 160 in Fig 3i; the rightmost part of the graph in Fig 3j). However, the Hist-D method produced smaller errors than the Hist-I method (Fig 3j). This superior performance of Hist-D persisted across various changes in the transmission rate (1/3, 1/2, 2, and 3-fold), consistently achieving smaller RMSE and MAPE compared to Hist-I (Fig 3k). In particular, when the transmission rate was reduced to 1/3, the absolute increase in RMSE and MAPE for Hist-D was less than half that of Hist-I (Fig 3k), demonstrating the robustness of Hist-D to sudden changes in the transmission rate. Taken together, these results highlight the promising potential of Hist-D for estimating E(t0) in dynamic, real-world scenarios.

Hist-D outperforms Hist-I for real-world COVID-19 data

The results from the simulation data demonstrated the strong potential of Hist-D for accurately estimating the initial condition of E in real-world scenarios. To test this, we applied Hist-I and Hist-D to COVID-19 data from Seoul, Republic of Korea, spanning August 13th to November 25th, 2020. This data included the contact dates and symptom onset dates for people in Seoul, allowing us to empirically derive the number of people moving from S to E and from E to I each day, as well as the distribution of the incubation period (i.e., the time between the contact date and the symptom onset date) (see Methods for more details) (Fig 4a). With this information, we calculated the daily change in E and accumulated these changes starting from the date of the first recorded case of international transmission in Korea to compute the daily E(t0) for 2020. Then, this real E(t0) was compared with the E(t0) estimated by applying Hist-I and Hist-D to the daily incidence data and the empirical distribution of the incubation period (Fig 4b). In particular, Hist-D utilized 8 days of data, approximately twice the mean incubation period, as in the case of the simulation study (Fig 3b). For the last few days, when 8 days of data were unavailable, Hist-D utilized data from t0 to the last available date.

Fig 4. Hist-D provides more accurate estimates of the initial condition of E compared to Hist-I for real COVID-19 data in Seoul, Republic of Korea.

(a) We compared Hist-I and Hist-D in estimating the initial conditions of E for COVID-19 data in Seoul, Republic of Korea, from August 13 to November 25, 2020. From this data, the daily incidence of transitioning from E to I and the distribution of the incubation period (light blue histogram) were extracted (see Methods for more details) and then used to estimate the initial condition of E with Hist-I and Hist-D. (b) The graph comparing the true E(t0) (light gray bars) and the estimated E(t0) (Hist-I: red triangles, Hist-D: green squares). While both methods capture the long-term trend, Hist-I exhibits more pronounced fluctuations. (c) The scatter plot comparing the true and estimated E(t0). Estimation from Hist-D is closer to perfect estimation (i.e., the black diagonal line, where the estimate equals the true value) than Hist-I. (d) The scatter plot displaying the error across different levels of true E(t0). The error of Hist-I increased proportionally to the true E(t0), while such a pattern was not observed in Hist-D. (e) The graph showing the RMSE (bars) and the mean absolute percentage error (MAPE) (line) of Hist-I and Hist-D. Hist-D achieved a 55% lower RMSE (8.44) and a 55% lower MAPE (18.9%) than Hist-I (RMSE: 18.76, MAPE: 42.2%), demonstrating the superior performance of Hist-D in real-world epidemic data. (f) 95% credible interval and empirical coverage of our estimated values. The upper and lower horizontal lines of each box represent the upper and lower bounds of the credible interval, corresponding to the 97.5% and 2.5% quantiles, respectively. 91.3% of true values were included in the 95% credible interval of Hist-D.

https://doi.org/10.1371/journal.pcbi.1013438.g004

While both methods effectively captured the long-term trend of E(t0), Hist-D exhibited less fluctuation compared to Hist-I (Fig 4b). Notably, Hist-D provided more accurate estimates than Hist-I during abrupt changes in E(t0), such as near Oct 31 (Fig 4b), consistent with Fig 3k. As a result, Hist-D consistently demonstrated higher accuracy than Hist-I (Fig 4c, 4d), whose error increased as the magnitude of E(t0) grew (Fig 4d). In particular, Hist-D reduced the RMSE by 55% compared to Hist-I (Fig 4e). Similarly, it decreased the MAPE by 55% (Fig 4e). These results indicate that in highly volatile real-world scenarios, Hist-D provides more accurate and reliable estimates of initial conditions than the Hist-I method.

Despite the promising results, Hist-D did not achieve perfect estimations. Therefore, we checked whether the true values fell within the 95% credible interval when using Hist-D (Fig 4f; see Methods for more details). We found that 91.3% of the true values were included within the credible interval for the Hist-D method (Fig 4f). Taken together, Hist-D demonstrates robust capabilities in precisely determining the initial condition of E, which is likely to result in a more accurate estimation of epidemic dynamics.

Discussion

While accurate initial conditions are crucial for the SEIR model, the initial condition value of the exposed population (E) is often unknown. Thus, the initial condition of E has often been estimated with the Hist-I method. However, Hist-I does not consider the timing of exposure of the individuals in the exposed compartment (i.e., exposure history). As a result, this method yields biased estimation (Fig 1d). To resolve this problem, in this study, we developed a new history-dependent method, Hist-D (Figs 2 and 3a-3b). For the simulated data, Hist-D estimated the initial condition of E much more accurately than Hist-I (Fig 3c-3e), even with measurement errors (Fig 3f-3h) or sudden changes in epidemic phases (Fig 3i-3k). Importantly, Hist-D successfully estimated the initial condition of E in real-world COVID-19 data from Seoul, Korea, reducing estimation error by 55% compared to Hist-I (Fig 4). These findings demonstrate that Hist-D can more accurately estimate the unknown initial conditions in the SEIR model using relatively accessible data.

Although this study focused on the SEIR model, Hist-D can be applied to any compartmental model where the transition time between two compartments is known and inflow data for the downstream compartment are available. Thus, Hist-D can be used when the daily incidence of becoming infectious and the latent period distribution are known (Fig 3), or when the daily incidence of symptom onset and the incubation period distribution are known (Fig 4). This flexibility allows Hist-D to be applied to other infectious disease models, such as SEIR-Vaccinated (SEIRV) [29,30], SEI-Quarantined-R (SEIQR) [31,32], or SE-Presymptomatic-IR (SEPIR) [33,34]. For more complex models [16] with additional substages in the exposed (E) or infectious (I) compartments, the same approach used in Hist-D can be adapted by modifying the left and right sides of the equations derived in this study (see Fig 2b and the ‘Derivation of the total loss function used in Hist-D’ section of the Methods for more details). These modifications can also be easily implemented in our Hist-D package by altering a single function (see [7] of the S1 Text for further guidance). Therefore, Hist-D offers a flexible framework that can be readily extended, and applying it to a broader class of infectious disease models represents a promising direction for future research.

The epidemic dynamics, in particular, the transition from exposure to infectiousness, is inherently history-dependent (i.e., its likelihood varies over time since the exposure). However, this has been overlooked in previous studies, which employed a simple ODE model that assumes a constant chance of becoming infectious. While this history-independent representation simplifies the inference of crucial epidemiological parameters such as reproduction number, our previous work revealed that it introduces significant bias [19]. Thus, we address this bias by utilizing a model that describes the history-dependent dynamics [19]. Nonetheless, the advantage of using history-dependent models relies heavily on accurate initial conditions (S4 Text and S4 Fig), as these values significantly affect the model predictions. Previous methods for determining initial conditions were based on a history-independent assumption, misaligning with the dynamics in history-dependent models and resulting in a considerable bias in initial conditions (Fig 3) and subsequent estimation of the reproduction number (S4 Fig). We addressed this here by developing a history-dependent method for estimating the initial condition (S4 Fig). This, combined with history-dependent models, provides the first framework that completely describes the history-dependent dynamics of infectious disease.

Hist-D employs a master equation (Fig 2b (i)), which represents the daily infectious population as a convolution of daily exposed individuals and the latent period distribution. In this study, we modified this master equation to derive the total loss function (Fig 2b-2d). In contrast, previous studies have applied this master equation without direct modification [35–37]. For example, Abbott et al. utilized a similar master equation to develop an algorithm that can estimate the sometimes-unknown daily infectious population from typically available daily confirmed cases [37]. This suggests that the applicability of our approach could be extended by combining it with the approach of Abbott et al. Specifically, our framework currently requires data on the daily incidence of becoming infectious, which are sometimes unavailable. In such cases, the daily infectious cases could be estimated from the daily confirmed cases, which are typically easier to obtain, by using the approach of Abbott et al.

Beyond infectious disease studies, other biological systems have also been studied using mathematical models incorporating delay [38–46]. These models simplify complex biological processes involving many intermediate stages by representing them as a single pathway with a time delay. For example, the complex maturation process of proteins has been replaced with a single protein production process with a time delay [41,43], and the complex degradation pathway of damaged cells has been replaced with a single degradation process with delay [46]. This approach is similar to the SEIR model used in this study, where the detailed process from exposure to infectiousness has been simplified to a single process with delay (i.e., the latent period). Considering this, Hist-D can be generalized to other models incorporating delay. For instance, when modeling the levels of immature and mature proteins, Hist-D could estimate the initial condition of the immature proteins, which is often difficult to measure experimentally, by using the known data of mature proteins.

Despite the novelty of Hist-D, several limitations should be noted. First, our methods are derived from ODE-based infectious disease models, though stochastic compartmental models and network-based models are also used to better capture transmission uncertainty and detailed processes of disease progression, respectively [47,48]. Whether Hist-D can be extended to these models remains an open question, and exploring this would be a promising direction for future research. In addition, Hist-D assumes a constant daily incidence of exposure before the initial time point (see equation (8) in the Methods section). Relaxing this assumption to accommodate scenarios such as superspreading events [49] or exponential growth (or decay) represents another important direction for future work.

Another limitation is that our method has been primarily validated using COVID-19 data. In addition to real data from Seoul, Korea, we used simulation data that mimics the latent period of COVID-19 with a Gamma distribution. However, the latent periods of other infectious diseases may not follow the Gamma distribution. Nonetheless, even in such cases, Hist-D can be readily adapted by simply adjusting the latent period distribution, as our derivation of the loss function does not depend on any specific distributional assumption. Therefore, we hypothesize that Hist-D can still estimate the initial condition of E with reasonably high accuracy across various latent period distributions.

Lastly, the credible intervals for Hist-D were relatively wide, indicating a high degree of uncertainty in the estimates. As such uncertainty can hinder the precise determination of initial conditions, future work should focus on reducing it. Additionally, the empirical coverage of the 95% credible intervals (91.3%) was below the nominal level of 95%. This discrepancy may arise from a mismatch between the real-world data-generating process and the model assumptions underlying the Bayesian approach, such as the assumption of constant exposure before the initial time point (equation (8)). Reducing this model deficiency through advanced statistical techniques [50] could improve empirical coverage and enhance the reliability of the estimates.

Method

Derivation of the total loss function used in Hist-D

We established the total loss function to estimate the initial conditions of exposed individuals. The loss function starts from the master equation that characterizes the history-dependent rate of becoming infectious by incorporating a non-exponentially distributed latent period. While this can be modeled through the method of stages [16], which introduces multiple substages in the exposed compartment, that method requires specifying the number of substages by fitting an Erlang distribution to the empirical distribution of the latent period, which may increase the computational cost of the optimization process. More importantly, the method of stages constrains the latent period to an Erlang distribution. To avoid this, we adopted an alternative approach, following a previous study [19], which allowed us to incorporate an arbitrary latent period distribution. In particular, the instantaneous rate of individuals becoming infectious is a convolution of the instantaneous rate of individuals exposed and the probability density function of the latent period, as follows (Fig 2b (i)):

(1)

where represents the instantaneous rate of the number of individuals becoming infectious at time , denotes the instantaneous rate of the number of individuals exposed at time , and is the probability density function of the latent period. Here, we modified this equation to explicitly account for the effect of the initial condition of E on the daily infectious individuals. We first integrated the equation (1) to express the number of daily infectious individuals ().

(2)
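To make the convolution concrete, the following minimal Python sketch implements a simplified daily analogue of the master equation. It is not the paper's equations (1)–(2): the function name, the toy inputs, and the use of a daily probability mass function for the latent period are all assumptions made for illustration.

```python
def daily_infectious_incidence(exposed, latent_pmf):
    """Discrete convolution: infectious[k] = sum_d exposed[k - d] * latent_pmf[d].

    exposed[k]    -- hypothetical number of people exposed on day k
    latent_pmf[d] -- assumed probability that the latent period lasts d days
    """
    result = []
    for k in range(len(exposed)):
        total = 0.0
        for d, prob in enumerate(latent_pmf):
            if k - d >= 0:  # exposures before day 0 are unknown and dropped here
                total += exposed[k - d] * prob
        result.append(total)
    return result


# Toy inputs (assumed): 10 people exposed on day 0 only,
# latent period of 1, 2, or 3 days with probabilities 0.2, 0.5, 0.3.
exposed = [10.0, 0.0, 0.0, 0.0, 0.0]
latent_pmf = [0.0, 0.2, 0.5, 0.3]
infectious = daily_infectious_incidence(exposed, latent_pmf)
print(infectious)
```

In this toy case the 10 exposed individuals become infectious over days 1 to 3 in proportions 0.2, 0.5, and 0.3. The exposures that the sketch drops (those before day 0) are exactly the contribution that an initial condition of E has to account for.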

Then, we discretized the marginal number of individuals exposed in terms of the number of daily incidence of exposure, as follows (Fig 2b (ii)):

(3)

This can be rewritten as follows:

(4)

where denotes the characteristic function supported on the interval . By plugging in equation (4) to the equation (2), we derived the new equation:

(5)

By changing the variable from to by , we obtain

(6)

where * symbol denotes the convolution. Considering that for , we finally obtain

(7)

However, this equation includes infinitely many unknown parameters ( where is an integer smaller than or equal to ), making parameter estimation challenging. To overcome this, we approximated equation (7) by assuming that individuals were exposed at a constant rate, , before time (Fig 2b (iii)):

(8)

Here, we found that this constant rate is closely related to the . Specifically, we can express as a function of :

(9)

This equation arises from the fact that the people exposed at time can remain in the exposed (E) compartment at time only if their latent period () is greater than . By applying Fubini’s theorem to this equation, we can further simplify the equation as follows:

(10)

where is the mean of the latent period. This equation suggests that the constant rate is proportional to the :

(11)
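The step from equation (9) to equation (11) can be sketched as follows. The notation is assumed for illustration and may differ from the paper's symbols: c is the constant exposure rate before the initial time t_0, τ is the latent period with density f and mean μ, and E(t_0) is the exposed population at t_0.

```latex
% Assumed notation: c = constant exposure rate before t_0,
% \tau = latent period with density f and mean \mu.
\begin{align}
E(t_0) &= \int_{-\infty}^{t_0} c \, P(\tau > t_0 - s) \, ds
        = c \int_{0}^{\infty} P(\tau > u) \, du \\
       &= c \int_{0}^{\infty} \int_{u}^{\infty} f(x) \, dx \, du
        = c \int_{0}^{\infty} \left( \int_{0}^{x} du \right) f(x) \, dx
        = c \int_{0}^{\infty} x \, f(x) \, dx = c \, \mu ,
\end{align}
% hence c = E(t_0) / \mu.
```

The exchange of the order of integration in the second line is where Fubini's theorem enters. As a worked example, a Gamma latent period with shape 4.06 and scale 1.35 has mean 4.06 × 1.35 ≈ 5.48 days, so a hypothetical E(t_0) of 100 would correspond to a constant rate of about 18.2 exposures per day.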

Plugging in equations (8) and (11) to equation (7), we can derive the final equation:

(12)

This final equation holds for every given data (), and this system of equations can be written in a matrix form (Fig 2c):

(13)

where and . From this matrix equation, we established the data loss function which is minimal when the left and right sides of the equation (13) are similar.

(14)

We aimed to find the unknown parameters () by minimizing the data loss. However, as the number of unknown parameters () exceeds the number of equations (), the parameters cannot be determined from the data loss alone. We therefore added a regularization loss for the parameters. For this regularization, we employed the second-order derivative of the daily incidence of exposure, because one day is typically insufficient for a drastic increase or decrease in the daily change of the exposed population. As a result, we derived the final total loss function.

(15)

Finally, we calculated the initial condition of E () from the estimated parameters () and available information () by using the following formula:

Parameter estimation from the loss function

To find the value of minimizing the total loss function (i.e., equation (15)), we utilized the Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method. This algorithm is a gradient-based quasi-Newton approach designed for solving large-scale optimization problems. The optimization process incorporates bound constraints to ensure biological plausibility. These constraints guarantee that the estimated values remain non-negative and do not exceed the maximum population size S, preserving the physical meaning of the parameters.
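The structure of this optimization can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the matrix A, the vector b, the weight lam, and the bounds are hypothetical toy values, and projected gradient descent is used here as a simple stand-in for the bounded L-BFGS routine. The loss combines a squared data residual with a penalty on squared second differences, mirroring the description of the data loss and the regularization loss.

```python
def total_loss(x, A, b, lam):
    """Data loss ||A x - b||^2 plus lam times the squared second differences of x."""
    data = sum((sum(a * xi for a, xi in zip(row, x)) - bi) ** 2
               for row, bi in zip(A, b))
    smooth = sum((x[k - 1] - 2 * x[k] + x[k + 1]) ** 2
                 for k in range(1, len(x) - 1))
    return data + lam * smooth


def gradient(x, A, b, lam):
    """Analytic gradient of total_loss with respect to x."""
    grad = [0.0] * len(x)
    for row, bi in zip(A, b):
        resid = sum(a * xi for a, xi in zip(row, x)) - bi
        for j, a in enumerate(row):
            grad[j] += 2 * resid * a
    for k in range(1, len(x) - 1):
        s = x[k - 1] - 2 * x[k] + x[k + 1]
        grad[k - 1] += 2 * lam * s
        grad[k] -= 4 * lam * s
        grad[k + 1] += 2 * lam * s
    return grad


def minimize_box(x0, A, b, lam, lower, upper, step=0.1, iters=1000):
    """Projected gradient descent: take a gradient step, then clip to the bounds."""
    x = list(x0)
    for _ in range(iters):
        g = gradient(x, A, b, lam)
        x = [min(max(xi - step * gi, lower), upper) for xi, gi in zip(x, g)]
    return x


# Hypothetical toy problem: 4 unknowns but only 3 equations.
A = [[0.5, 0.5, 0.0, 0.0],
     [0.0, 0.5, 0.5, 0.0],
     [0.0, 0.0, 0.5, 0.5]]
x_true = [10.0, 12.0, 14.0, 16.0]
b = [sum(a * xi for a, xi in zip(row, x_true)) for row in A]
lam = 0.1
x_hat = minimize_box([0.0] * 4, A, b, lam, lower=0.0, upper=1000.0)
final_loss = total_loss(x_hat, A, b, lam)
print(x_hat, final_loss)
```

With fewer equations than unknowns, the data loss alone does not pin down a unique solution in this toy problem; the second-difference penalty selects the smooth one, which is the role the regularization plays in the text. The fixed step size is chosen for this toy matrix only and would need tuning, or a proper quasi-Newton solver, for real data.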

Construction and simulation of the SEIR model with Gamma-distributed latent and infectious periods

We compared the accuracy of Hist-I and Hist-D by using simulation data from the SEIR model mimicking the latent period of COVID-19. For this, we constructed the SEIR model following the previous study by Hong and Eom et al. [19]:

(16)

where is the transmission rate and is the number of the entire population (i.e., ). and indicate the history-dependent rates of transition from compartment E to I and I to R, respectively. These rates were calculated as follows:

(17)

where and represent the probability density functions of the distribution of the latent period and infectious period, respectively, and and represent the probability density functions of sojourn times of individuals initially in compartments E and I, respectively [19].

We set initial conditions as to mimic the initial stage of COVID-19 in Seoul, where the whole population of Seoul is susceptible except for one infectious individual. Additionally, we assumed that the latent period follows the Gamma distribution with shape 4.06 and scale 1.35, based on a previous study that fitted a Gamma distribution to observed data on the COVID-19 latent period [24]. The infectious period was assumed to follow the Gamma distribution with shape 30 and scale 0.2, as estimated in a previous study using the same COVID-19 data employed in this study [19]. Simulating this model using Heun’s method, we obtained daily numbers of and , which were then converted to the daily incidence of exposure () data and daily incidence of becoming infectious () data as follows:

(18)
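For readers unfamiliar with Heun's method, the sketch below shows the scheme on a deliberately simplified model. It uses the classical SEIR system with constant transition rates, not the history-dependent model in equations (16) and (17); the population size of 1,000 is hypothetical, and the rates are taken as the reciprocals of the Gamma means quoted above (4.06 × 1.35 ≈ 5.481 days and 30 × 0.2 = 6 days) purely for illustration.

```python
def heun_step(f, t, y, h):
    """One Heun (explicit trapezoidal) step for y' = f(t, y), with y a list."""
    k1 = f(t, y)
    predictor = [yi + h * ki for yi, ki in zip(y, k1)]  # Euler predictor
    k2 = f(t + h, predictor)
    # Corrector: average the slopes at the start and at the predicted end point.
    return [yi + 0.5 * h * (a + b) for yi, a, b in zip(y, k1, k2)]


def seir_rhs(beta, sigma, gamma, n):
    """Right-hand side of a classical SEIR model with constant rates (assumed)."""
    def rhs(t, y):
        s, e, i, r = y
        new_exposed = beta * s * i / n
        return [-new_exposed,
                new_exposed - sigma * e,
                sigma * e - gamma * i,
                gamma * i]
    return rhs


def simulate(rhs, y0, h, steps):
    t, y = 0.0, list(y0)
    for _ in range(steps):
        y = heun_step(rhs, t, y, h)
        t += h
    return y


n = 1000.0  # hypothetical population size
rhs = seir_rhs(beta=0.4, sigma=1 / 5.481, gamma=1 / 6.0, n=n)
final_state = simulate(rhs, [n - 1.0, 0.0, 1.0, 0.0], h=0.1, steps=1000)
print(final_state)
```

Because the four derivatives sum to zero, the scheme conserves the total population up to rounding error, which is a convenient sanity check on any implementation.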

These daily exposed and infectious people datasets were employed to compare the performance of Hist-I and Hist-D (Fig 3c-3e). Then, to further consider possible measurement errors in the real world, we applied multiplicative noise () to the given data (i.e., daily incidence of becoming infectious, ) (Fig 3f) and compared the estimation accuracy of Hist-I and Hist-D (Fig 3g). Additionally, we incrementally increased the noise level () by 0.1 to assess whether Hist-D maintained its superiority under higher noise conditions (Fig 3h). To ensure the reliability and stability of our findings, this process was iterated 10 times, and the average RMSE and MAPE were reported (Fig 3h). Lastly, we simulated a sudden shift in epidemic dynamics by adjusting during the simulation (Fig 3i), changing its initial value of 0.4 by multiplying it by a factor of 1/3, 1/2, 1, 2, or 3 at .
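The noise experiment and the two error metrics can be sketched in a few lines of Python. The exact noise distribution is not recoverable from the text here, so the sketch assumes a Gaussian multiplicative factor 1 + ε with ε drawn from a normal distribution with standard deviation sigma; the function names and toy data are likewise hypothetical.

```python
import math
import random


def rmse(true, est):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(true, est)) / len(true))


def mape(true, est):
    """Mean absolute percentage error in percent; true values must be non-zero."""
    return 100.0 * sum(abs((t - e) / t) for t, e in zip(true, est)) / len(true)


def add_multiplicative_noise(data, sigma, rng):
    """Multiply each value by (1 + eps), eps ~ Normal(0, sigma); clip at zero.

    The Gaussian form of the noise is an assumption made for this sketch.
    """
    return [max(0.0, x * (1.0 + rng.gauss(0.0, sigma))) for x in data]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
true_incidence = [10.0, 20.0, 40.0, 80.0]  # toy data

# Average the metrics over 10 noisy replicates, as described in the text.
rmse_runs, mape_runs = [], []
for _ in range(10):
    noisy = add_multiplicative_noise(true_incidence, sigma=0.1, rng=rng)
    rmse_runs.append(rmse(true_incidence, noisy))
    mape_runs.append(mape(true_incidence, noisy))
mean_rmse = sum(rmse_runs) / len(rmse_runs)
mean_mape = sum(mape_runs) / len(mape_runs)
print(mean_rmse, mean_mape)
```

In the study the metrics compare true and estimated values of E rather than clean and noisy incidence; the sketch only shows how the noise is applied and how the two metrics are averaged across replicates.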

Data collection and preprocessing

For the real-world data analysis, we utilized contact tracing data of COVID-19 in Seoul, Republic of Korea from January 20th, 2020 to November 25th, 2020. From this data, the period from August 13th, 2020 to November 25th, 2020 was chosen as the testing set for Hist-D and Hist-I as it includes both the increasing phase and decreasing phase of exposed individuals. The dataset contains individual contact dates, symptom onset dates, and confirmation dates of COVID-19 cases. While confirmation dates were complete, only 35% and 63% of contact dates and symptom onset dates were available, respectively, leading us to use 21% of the total data that had complete information on contact dates, symptom onset dates, and confirmation dates.

From these data, we extracted the number of daily incidence of exposure () and daily incidence of becoming infectious (), by assuming that individuals become “exposed” at their contact dates and “infectious” at their symptom onset dates. Then, we calculated the population in the E compartment by setting on January 20th, 2020, the date of the first officially confirmed COVID-19 case in Seoul [19], and cumulatively summing the difference of and . The resulting values were used as true values of . Additionally, the distribution of the incubation period was obtained empirically, by calculating the time difference between the date of symptom onset and the contact date for each case: the probability of an incubation period of 5 days is the ratio of cases whose difference between symptom onset date and contact date is 5 days.
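The two preprocessing steps described above can be sketched as follows. The function names and toy records are hypothetical, and the starting value of zero for the E compartment is an assumption of this sketch.

```python
from collections import Counter


def exposed_trajectory(daily_exposed, daily_infectious, initial_exposed=0):
    """E on each day: starting value plus cumulative (exposed - infectious) incidence."""
    trajectory, current = [], initial_exposed
    for e, i in zip(daily_exposed, daily_infectious):
        current += e - i
        trajectory.append(current)
    return trajectory


def empirical_incubation_pmf(contact_days, onset_days):
    """Share of cases for each observed delay between contact and symptom onset."""
    delays = [onset - contact for contact, onset in zip(contact_days, onset_days)]
    counts = Counter(delays)
    return {d: counts[d] / len(delays) for d in sorted(counts)}


# Toy daily incidence (assumed values).
daily_exposed = [3, 2, 0, 1]
daily_infectious = [0, 1, 2, 1]
e_curve = exposed_trajectory(daily_exposed, daily_infectious)

# Toy case records: day index of contact and of symptom onset for four cases.
pmf = empirical_incubation_pmf([0, 0, 1, 3], [5, 4, 6, 8])
print(e_curve, pmf)
```

Here three of the four toy cases have a delay of 5 days, so the empirical probability of a 5-day incubation period is 0.75, matching the ratio-based definition in the text.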

These data are protected and are not available due to data privacy laws. The Korea Public Institutional Review Board Designated by Ministry of Health and Welfare waived the need for ethical approval for the collection and analysis of the real-world data since the data was anonymized and none of the individuals were identifiable (reference number: P01-202404-01-016).

Uncertainty quantification using the Markov chain Monte Carlo (MCMC) method

We quantified the parametric and prediction uncertainties of Hist-D using Bayesian inference. We set the likelihood of given observed daily incidence of becoming infectious, , as follows:

(19)

where and are known constants introduced in equation (13). The prior distributions of parameters were assumed as follows:

(20)

where denotes the lognormal distribution with mean and variance and denotes the point estimates of , obtained by utilizing Hist-D. We used and .

We used an MCMC method to sample the parameters () from their posterior distribution defined by the likelihood in equation (19) and the priors in equation (20). To be more specific, we performed 100,000 iterations of sampling from the posterior distribution by using a Hamiltonian Monte Carlo (HMC) algorithm with the No-U-Turn Sampler (NUTS). We generated the posterior samples of by cumulatively summing the difference between the posterior samples of the daily incidence of exposure and the given data: . Then, the credible interval for Hist-D was calculated as the range between the 0.025 and 0.975 quantiles of the posterior samples of .
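Once posterior samples are available, the credible interval and the empirical coverage reported in Fig 4f reduce to simple quantile computations. The sketch below assumes the samples are already given as plain lists; the function names, the linear-interpolation quantile rule, and the toy samples are choices made for illustration and are not taken from the Hist-D package.

```python
def quantile(sorted_samples, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_samples) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_samples) - 1)
    return sorted_samples[lo] + (pos - lo) * (sorted_samples[hi] - sorted_samples[lo])


def credible_interval(samples, level=0.95):
    """Equal-tailed interval, e.g. the 0.025 and 0.975 quantiles for level=0.95."""
    s = sorted(samples)
    tail = (1.0 - level) / 2.0
    return quantile(s, tail), quantile(s, 1.0 - tail)


def empirical_coverage(true_values, sample_sets, level=0.95):
    """Fraction of time points whose true value lies inside the credible interval."""
    hits = 0
    for truth, samples in zip(true_values, sample_sets):
        low, high = credible_interval(samples, level)
        hits += low <= truth <= high
    return hits / len(true_values)


# Toy posterior samples (assumed): the integers 0..100 at each of two time points.
samples = [float(v) for v in range(101)]
low, high = credible_interval(samples)
coverage = empirical_coverage([50.0, 99.0], [samples, samples])
print(low, high, coverage)
```

In this toy example one of the two true values falls inside its interval, giving a coverage of 0.5; the same computation applied to the real posterior samples yields the 91.3% reported for Hist-D.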

Quantification and statistical analysis

In this study, the functions used to estimate the initial condition and simulate all scenarios were developed by the authors in R (version 4.3.2), Stan (version 2.32.2), and RStan (version 2.32.6).

Supporting information

S1 File. Hist-D.

Computational package for Hist-D.

https://doi.org/10.1371/journal.pcbi.1013438.s001

(ZIP)

S1 Text. Computational package for Hist-D.

https://doi.org/10.1371/journal.pcbi.1013438.s002

(DOCX)

S2 Text. Estimation of the reproduction number heavily depends on the initial conditions.

https://doi.org/10.1371/journal.pcbi.1013438.s003

(DOCX)

S3 Text. Hist-D is more accurate than Hist-I, even after Hist-I was modified.

https://doi.org/10.1371/journal.pcbi.1013438.s004

(DOCX)

S4 Text. Hist-D enhances the accuracy of reproduction number estimation.

https://doi.org/10.1371/journal.pcbi.1013438.s005

(DOCX)

S1 Fig. Estimation of reproduction number is heavily dependent on the initial condition of E.

(a) The plot displaying the fold error between the estimated reproduction number () and the true reproduction number () across various bias levels in . was initially , and it changed to the . When (red curve), led to a 37.9% relative error and led to a 19.4% relative error. When (blue curve), led to a 42.5% relative error and led to a 34.1% relative error. (b) The plot showing the estimated time-varying reproduction number () under varying levels of bias in the initial condition of E: and , where is the original initial condition value. The influence of bias in continues noticeably well after the time point where the initial condition was estimated, and decreases over time, leading to convergence in the estimated reproduction numbers.

https://doi.org/10.1371/journal.pcbi.1013438.s006

(EPS)

S2 Fig. The superiority of Hist-D was preserved even after the modification of Hist-I as done in Rauch et al.

(a) The graph comparing the true E (light gray-colored bars) and the estimated E. The modified Hist-I underestimates the true E. (b) The scatter plot displaying the error between the estimated and true E across different levels of the true E. Estimation from Hist-D (green squares) shows smaller errors compared to the modified Hist-I (red triangles). (c) The bar plot showing the root mean squared error (RMSE; bars) and mean absolute percentage error (MAPE; line) of the modified Hist-I and Hist-D. RMSE and MAPE were reduced by 80% and 53%, respectively, when Hist-D was utilized, compared to the modified Hist-I.

https://doi.org/10.1371/journal.pcbi.1013438.s007

(EPS)

S3 Fig. Hist-D outperforms both Hist-I and the modified Hist-I even in the early phase of the epidemic dynamics.

(a) The graph comparing the true (light gray-colored bars) and the estimated () for time . (b) The scatter plot displaying the error between estimated () and true across different levels of true . Estimation from Hist-D (green squares) shows smaller errors compared to both Hist-I (red triangles) and the modified Hist-I (blue circles). (c) The bar plot showing the root mean squared error (RMSE; bars) and mean absolute percentage error (MAPE; line) of Hist-I, the modified Hist-I, and Hist-D. Hist-D reduced both RMSE and MAPE by 81% compared to Hist-I, whereas the modified Hist-I reduced both by 47%.

https://doi.org/10.1371/journal.pcbi.1013438.s008

(EPS)

S4 Fig. Hist-D enhances the accuracy of reproduction number estimation.

Boxplots of the posterior samples of the reproduction number () obtained from IONISE with initial conditions estimated by Hist-D (green) and Hist-I (red). IONISE combined with Hist-D accurately estimated the reproduction number, while using Hist-I introduced considerable bias. Here, the posterior samples were normalized by the true value () employed for generating the simulation data depicted in Fig 3c.

https://doi.org/10.1371/journal.pcbi.1013438.s009

(EPS)

S1 Table. Hist-D is more accurate than Hist-I under various parameter conditions.

The table shows the reduction in RMSE (former) and MAPE (latter) achieved by Hist-D relative to Hist-I under the same conditions as Fig 3c–3e, but with varied latent and infectious period parameters.

https://doi.org/10.1371/journal.pcbi.1013438.s010

(DOCX)

References

  1. Tang B, Wang X, Li Q, Bragazzi NL, Tang S, Xiao Y, et al. Estimation of the Transmission Risk of the 2019-nCoV and Its Implication for Public Health Interventions. J Clin Med. 2020;9(2):462. pmid:32046137
  2. He S, Peng Y, Sun K. SEIR modeling of the COVID-19 and its dynamics. Nonlinear Dyn. 2020;101(3):1667–80. pmid:32836803
  3. Hao X, Cheng S, Wu D, Wu T, Lin X, Wang C. Reconstruction of the full transmission dynamics of COVID-19 in Wuhan. Nature. 2020;584(7821):420–4. pmid:32674112
  4. Radulescu A, Williams C, Cavanagh K. Management strategies in a SEIR-type model of COVID 19 community spread. Sci Rep. 2020;10(1):21256.
  5. Hong H, Noh JY, Lee H, Choi S, Choi B, Kim JK, et al. Modeling incorporating the severity-reducing long-term immunity: higher viral transmission paradoxically reduces severe COVID-19 during endemic transition. Immune Netw. 2022;22(3):e23. pmid:35799710
  6. Lopez L, Rodo X. A modified SEIR model to predict the COVID-19 outbreak in Spain and Italy: simulating control scenarios and multi-scale epidemics. Results Phys. 2021;21:103746.
  7. Carcione JM, Santos JE, Bagaini C, Ba J. A simulation of a COVID-19 epidemic based on a deterministic SEIR model. Front Public Health. 2020;8:230. pmid:32574303
  8. Wong V, Cooney D, Bar-Yam Y. Beyond Contact Tracing: Community-Based Early Detection for Ebola Response. PLoS Curr. 2016;8. pmid:27486552
  9. Viana J, van Dorp CH, Nunes A, Gomes MC, van Boven M, Kretzschmar ME, et al. Controlling the pandemic during the SARS-CoV-2 vaccination rollout. Nat Commun. 2021;12(1):3674. pmid:34135335
  10. Li X, Ghadami A, Drake JM, Rohani P, Epureanu BI. Mathematical model of the feedback between global supply chain disruption and COVID-19 dynamics. Sci Rep. 2021;11(1):15450. pmid:34326384
  11. Goldberg EE, Lin Q, Romero-Severson EO, Ke R. Swift and extensive Omicron outbreak in China after sudden exit from “zero-COVID” policy. Nat Commun. 2023;14(1):3888. pmid:37393346
  12. Drake JM, Handel A, Marty É, O’Dea EB, O’Sullivan T, Righi G, et al. A data-driven semi-parametric model of SARS-CoV-2 transmission in the United States. PLoS Comput Biol. 2023;19(11):e1011610. pmid:37939201
  13. Cooper I, Mondal A, Antonopoulos CG. A SIR model assumption for the spread of COVID-19 in different communities. Chaos Solitons Fractals. 2020;139:110057. pmid:32834610
  14. Gozzi N, Chinazzi M, Dean NE, Longini IM Jr, Halloran ME, Perra N, et al. Estimating the impact of COVID-19 vaccine inequities: a modeling study. Nat Commun. 2023;14(1):3272. pmid:37277329
  15. Girardi P, Gaetan C. An SEIR model with time-varying coefficients for analyzing the SARS-CoV-2 epidemic. Risk Anal. 2023;43(1):144–55.
  16. Wearing HJ, Rohani P, Keeling MJ. Appropriate models for the management of infectious diseases. PLoS Med. 2005;2(7):e174. pmid:16013892
  17. Hurtado PJ, Kirosingh AS. Generalizations of the “Linear Chain Trick”: incorporating more flexible dwell time distributions into mean field ODE models. J Math Biol. 2019;79(5):1831–83. pmid:31410551
  18. Hurtado PJ, Richards C. Building mean field ODE models using the generalized linear chain trick & Markov chain theory. J Biol Dyn. 2021;15(sup1):S248–72.
  19. Hong H, Eom E, Lee H, Choi S, Choi B, Kim JK. Overcoming bias in estimating epidemiological parameters with realistic history-dependent disease spread dynamics. Nat Commun. 2024;15(1):8734. pmid:39384847
  20. De Salazar PM, Lu F, Hay JA, Gómez-Barroso D, Fernández-Navarro P, Martínez EV, et al. Near real-time surveillance of the SARS-CoV-2 epidemic with incomplete data. PLoS Comput Biol. 2022;18(3):e1009964. pmid:35358171
  21. Nishiura H, Inaba H. Estimation of the incubation period of influenza A (H1N1-2009) among imported cases: addressing censoring using outbreak data at the origin of importation. J Theor Biol. 2011;272(1):123–30. pmid:21168422
  22. Saito MM, Hirotsu N, Hamada H, Takei M, Honda K, Baba T, et al. Reconstructing the household transmission of influenza in the suburbs of Tokyo based on clinical cases. Theor Biol Med Model. 2021;18(1):7. pmid:33568160
  23. Miura F, van Ewijk CE, Backer JA, Xiridou M, Franz E, Op de Coul E. Estimated incubation period for monkeypox cases confirmed in the Netherlands, May 2022. Euro Surveill. 2022;27(24).
  24. Xin H, Li Y, Wu P, Li Z, Lau EHY, Qin Y. Estimating the latent period of coronavirus disease 2019 (COVID-19). Clin Infect Dis. 2022;74(9):1678–81.
  25. Huang S, Li J, Dai C, Tie Z, Xu J, Xiong X, et al. Incubation period of coronavirus disease 2019: new implications for intervention and control. Int J Environ Health Res. 2022;32(8):1707–15.
  26. Men K, Li Y, Wang X, Zhang G, Hu J, Gao Y, et al. Estimate the incubation period of coronavirus 2019 (COVID-19). Comput Biol Med. 2023;158:106794. pmid:37044045
  27. Li Y, Jiang X, Qiu Y, Gao F, Xin H, Li D, et al. Latent and incubation periods of Delta, BA.1, and BA.2 variant cases and associated factors: a cross-sectional study in China. BMC Infect Dis. 2024;24(1):294. pmid:38448822
  28. Rauch W, Schenk H, Rauch N, Harders M, Oberacher H, Insam H, et al. Estimating actual SARS-CoV-2 infections from secondary data. Sci Rep. 2024;14(1):6732. pmid:38509181
  29. Ringa N, Bauch CT. Dynamics and control of foot-and-mouth disease in endemic countries: a pair approximation model. J Theor Biol. 2014;357:150–9. pmid:24853274
  30. Meng X, Cai Z, Si S, Duan D. Analysis of epidemic vaccination strategies on heterogeneous networks: Based on SEIRV model and evolutionary game. Appl Math Comput. 2021;403:126172. pmid:33758440
  31. Prabakaran R, Jemimah S, Rawat P, Sharma D, Gromiha MM. A novel hybrid SEIQR model incorporating the effect of quarantine and lockdown regulations for COVID-19. Sci Rep. 2021;11(1):24073. pmid:34912038
  32. Tiwari S, Vyasarayani CP, Chatterjee A. Data suggest COVID-19 affected numbers greatly exceeded detected numbers, in four European countries, as per a delayed SEIQR model. Sci Rep. 2021;11(1):8106. pmid:33854165
  33. Hilton J, Riley H, Pellis L, Aziza R, Brand SPC, K Kombe I, et al. A computational framework for modelling infectious disease policy based on age and household structure with applications to the COVID-19 pandemic. PLoS Comput Biol. 2022;18(9):e1010390. pmid:36067212
  34. Thompson RN, Gilligan CA, Cunniffe NJ. Detecting presymptomatic infection is necessary to forecast major epidemics in the earliest stages of infectious disease outbreaks. PLoS Comput Biol. 2016;12(4):e1004836.
  35. Gostic KM, McGough L, Baskerville EB, Abbott S, Joshi K, Tedijanto C, et al. Practical considerations for measuring the effective reproductive number, Rt. PLoS Comput Biol. 2020;16(12):e1008409. pmid:33301457
  36. Cori A, Ferguson NM, Fraser C, Cauchemez S. A new framework and software to estimate time-varying reproduction numbers during epidemics. Am J Epidemiol. 2013;178(9):1505–12.
  37. Abbott S, Hellewell J, Thompson RN, Sherratt K, Gibbs HP, Bosse NI, et al. Estimating the time-varying reproduction number of SARS-CoV-2 using national and subnational case counts. Wellcome Open Research. 2020;5(112):112.
  38. Josić K, López JM, Ott W, Shiau L, Bennett MR. Stochastic delay accelerates signaling in gene networks. PLoS Comput Biol. 2011;7(11):e1002264. pmid:22102802
  39. Hong H, Cortez MJ, Cheng YY, Kim HJ, Choi B, Josic K, et al. Inferring delays in partially observed gene regulation processes. Bioinformatics. 2023;39(11).
  40. Cortez MJ, Hong H, Choi B, Kim JK, Josić K. Hierarchical Bayesian models of transcriptional and translational regulation processes with delays. Bioinformatics. 2021;38(1):187–95. pmid:34450624
  41. Kim DW, Hong H, Kim JK. Systematic inference identifies a major source of heterogeneity in cell signaling dynamics: The rate-limiting step number. Sci Adv. 2022;8(11):eabl4598. pmid:35302852
  42. Szavits-Nossan J, Grima R. Uncovering the effect of RNA polymerase steric interactions on gene expression noise: Analytical distributions of nascent and mature RNA numbers. Phys Rev E. 2023;108(3–1):034405.
  43. Jo H, Hong H, Hwang HJ, Chang W, Kim JK. Density physics-informed neural networks reveal sources of cell heterogeneity in signal transduction. Patterns (N Y). 2023;5(2):100899. pmid:38370126
  44. Song YM, Campbell S, Shiau L, Kim JK, Ott W. Noisy Delay Denoises Biochemical Oscillators. Phys Rev Lett. 2024;132(7):078402. pmid:38427894
  45. Choi B, Cheng Y-Y, Cinar S, Ott W, Bennett MR, Josić K, et al. Bayesian inference of distributed time delay in transcriptional and translational regulation. Bioinformatics. 2020;36(2):586–93. pmid:31347688
  46. Byun JH, Roh Y, Yoon I-S, Kim KS, Jung IH. Fractional transit compartment model for describing drug delayed response to tumors using Mittag-Leffler distribution on age-structured PKPD model. PLoS One. 2022;17(11):e0276654. pmid:36331932
  47. Calleri F, Nastasi G, Romano V. Continuous-time stochastic processes for the spread of COVID-19 disease simulated via a Monte Carlo approach and comparison with deterministic models. J Math Biol. 2021;83(4):34. pmid:34522994
  48. Stehlé J, Voirin N, Barrat A, Cattuto C, Colizza V, Isella L, et al. Simulation of an SEIR infectious disease model on the dynamic contact network of conference attendees. BMC Med. 2011;9:87. pmid:21771290
  49. Nielsen BF, Sneppen K, Simonsen L. The counterintuitive implications of superspreading diseases. Nat Commun. 2023;14(1):6954. pmid:37907452
  50. Shafer G, Vovk V. A tutorial on conformal prediction. Journal of Machine Learning Research. 2008;9(3).