Figures
Abstract
Background
The COVID-19 epidemic has shown that efficient prediction models are required, and the well-known SI, SIR, and SEIR models are not always capable of capturing the real dynamics. Modified models with novel structures could help identify unknown mechanisms of COVID-19 spread.
Objective
Our objective is to provide additional insights into the COVID-19 spread mechanisms based on different models’ parameterization which was performed using evolutionary algorithms and the first-wave data.
Methods
Data from the Our World in Data COVID-19 database was analysed, and several models—SI, SIR, SEIR, SEIUR, and Bass diffusion—and their variations were considered for the first wave of the COVID-19 pandemic. The models’ parameters were tuned with differential evolution optimization method L-SHADE to find the best fit. The algorithm for the automatic identification of the first wave was developed, and the differential evolution was applied to model parameterization. The reproduction rates (R0) for the first wave were calculated for 61 countries based on the best fits.
Results
The performed experiments showed that the Bass diffusion model-based modification could be superior compared to SI, SIR, SEIR and SEIUR due to the component responsible for spread from an external factor, which is not directly dependent on contact with infected individuals. The developed modified models containing this component were shown to perform better when fitting to the first-wave cumulative infections curve. In particular, the modified SEIR model was better fitted to the real-world data than the classical SEIR in 43 cases out of 61, based on Mann–Whitney U tests; the Bass diffusion model was better than SI for 57 countries. This showed the limitation of the classical models and indicated ways to improve them.
Citation: Stanovov V, Grabljevec S, Akhmedova S, Semenkin E, Stojanović R, Rozman Č, et al. (2022) Identification of COVID-19 spread mechanisms based on first-wave data, simulation models, and evolutionary algorithms. PLoS ONE 17(12): e0279427. https://doi.org/10.1371/journal.pone.0279427
Editor: Sebastián Gonçalves, Universidade Federal do Rio Grande do Sul, BRAZIL
Received: April 26, 2022; Accepted: December 7, 2022; Published: December 28, 2022
Copyright: © 2022 Stanovov et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and within the Github repository: https://github.com/VladimirStanovov/OWID-COVID-19-Analysis.
Funding: This work was supported by the Ministry of Science and Higher Education of the Russian Federation within limits of state contract № FEFE-2020-0013 and by the Slovenian Research Agency (ARRS) (programs No.: UNI-MB-0586-P5-0018, No.: BI-RU/19-20-034, No. BI-ME/18-20-009). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Abbreviations: BD, Bass diffusion model; DE, differential evolution; DRC, Democratic Republic of the Congo; GDP, gross domestic product; eSEIR, exponential susceptible-exposed-infected-recovered; eSEIUR, exponential susceptible-exposed-infected-unreported-recovered; eSI, exponential susceptible-infected; eSIR, exponential susceptible-infected-recovered; SEIR, susceptible-exposed-infected-recovered; SEIUR, susceptible-exposed-infected-unreported-recovered; SI, susceptible-infected; SIR, susceptible-infected-recovered; UK, United Kingdom
Introduction
The COVID-19 pandemic [1] has demonstrated the need for efficient infection-spread prediction models that are capable of capturing the dynamics of the process in different populations. Some of the well-known models often used for epidemiological dynamics are the susceptible–infected (SI) [2], susceptible–infected–recovered (SIR) [3], and the susceptible–exposed–infected–recovered (SEIR) models [4, 5]. These models are often described with several differential equations, controlled by population-based and infection-based parameters.
The mentioned models, especially the SEIR model, are often used in epidemiological studies due to their flexibility, simplicity, and transparency. Despite the popularity and simplicity of the basic models and their ability to give reasonable predictions, there are numerous modifications to SI, SIR and SEIR that allow for the consideration of additional factors that influence the dynamics of infection spread [6, 7]. The SEIR model is often modified, for example, at the SEIRU model [8] in addition to the reported infected cases, the unreported infected cases are also considered. In addition, this study introduces additional latency periods into the dynamics of the COVID-19 pandemic. In [9], a similar modification was applied—i.e., the reported and unreported infected cases were considered; however, following an early study on COVID-19 in [1], a metapopulation model was applied, which also considered human mobility networks. Obviously, the consideration of asymptomatic cases requires the estimation of their ratio, as well as their possible influence on the number of new infections [10, 11]. In [12], the extended SEIR model was applied, where detectable, undetectable, and isolated cases were considered, and an estimation of their number and influence were provided.
Based on these studies, it is clear that the proposed model modifications often introduce additional states describing real processes within the population; however, some of these models are often complicated and difficult to tune. The Bass diffusion model (BD) [13] is rarely used in epidemiological studies [14, 15], yet it has gained more recent interest given its simplicity and ability to build S-shaped curves. In [15], the BD model was also identified as one of most suitable for real data fitting. In this study, however, the discrete form of the BD model was applied, while in the present study, the differential equations form was considered, which is typically used for modelling. The original BD model is most often used to show the spread of specific products in the market, but it could also be adapted to describing the spread of some infections. One of the features of interest in the BD model is that spreading may begin even if there are no product buyers/infected people at the outset of the study period.
The focus of the present study is the analysis of different types of models that could be applied to the description of real infection dynamics observed in different countries around the world for COVID-19 spread. The goal of our study was to compare the efficiency of different epidemic models in describing the real dynamics, and not to forecast the future spread. In particular, the mentioned models are fitted for each country included in the study, and conclusions are made about the efficiency of the models in these settings. One should note that the parameters in the presented models were fixed, which might be argued due to the facts, that countermeasures were forced in most countries, however, at least the first part of the first wave took majority of us by surprise. By fixed model parameters, one could extract possible novel epidemic spread mechanisms, otherwise, it would be possible to consider time varying infectivity factors which, however, might lead to model overfitting and losing the important information about the process. The modifications inspired by the Bass diffusion model were incorporated into the SIR, SEIR, and SEIUR models, and an additional set of experiments was performed. Based on the results, the importance of the modification is discussed, as well as its role in explaining the ongoing observed epidemiological reality.
The rest of the paper is organised as follows. First, the classical models are described, and their similarities and differences are highlighted. After this, the modified models are proposed, followed by the presentation of the data preparation method. Next, the modified models are described and the results of the experiments are given. Finally, the conclusions are provided.
Methods
Basic epidemiological models
To approximate the real data, containing the number of infected for every country, five basic models were considered—the SI, SIR, SEIR, SEIUR, and BD models.
The simplest SI model is described by a pair of differential equations:
where β is the parameter controlling the flow from susceptible to infected; S and I are the number of susceptible and infected people, respectively; and N is the maximum number of susceptible. Note that if the number of infected is initially set to zero, there will be no dynamics, as the flow rate depends on the product of two states. Fig 1 shows the cumulative number of infected with β = 0.1, initial number of susceptible S0 = 100000, and initial infected I0 = 1, exercising S-shaped growth.
The SIR model can be described with the following equations:
where β and γ are the model parameters, and S, I and R are the number of susceptible, infected, and recovered, respectively. The γ parameter controls the exponential flow rate from infected to recovered. Alternatively,
can be seen as the number of days, D, required to get from the infected to recovered state.
A sample response of the SIR model is shown in Fig 2, with S0 = 100000, I0 = 1, β = 0.1, and γ = 0.01.
The main difference between SI and SIR is that in the SI model, the number of infected never decreases, while in the SIR model, the infected transition to the recovered state after certain period of time.
The SEIR [16] model equations are shown below:
where β, σ, and γ are the model parameters, and S, E, I, and R are the number of susceptible, exposed infected, and recovered. The value
can be seen as the number of days, Z, required to move from the exposed to infected states [17]. An example response of the SEIR model is shown in Fig 3, with S0 = 100000, I0 = 1, β = 1, σ = 0.01, and γ = 0.0005.
In addition to the SEIR model, its extended version, the SEIUR model [8, 18, 19], was considered, which also included unreported cases:
where β, σ, γ, and λ are the model parameters, and S, E, I, U, and R are the number of susceptible, exposed, infected (reported), unreported infected, and recovered. The two additional parameters of this model are the λ, responsible for the ratio of reported infected, and μ, responsible for the influence ratio of the unreported infected. A sample response of the SEIUR model is shown in Fig 4.
The original Bass diffusion model [13] can be described as follows:
The p and q are the model parameters, the p value is the coefficient of innovation, and q is the coefficient of imitation. The F(T) is defined as the cumulative of f(T), which is the purchase likelihood function. The initial condition F(0) = 0 could be declared as 0 due to the innovation part of the model. Here the analogy is used, in particular the number of purchases is analogous to the number of infected, and the number of potential customers (determined by 1−F[T]) is analogous to the number of susceptible. Unlike in the SI model, here it is possible to become infected without interaction with the infected—e.g., being infected by external sources, such as animals. The graphical representation of this equation in PowerSim software system [20] is shown in Fig 5, and Table 1 contains the parameters.
An alternative way of describing the same model is as follows:
where Nt is the current total number of purchases (infected) and M is the total number of purchases (total susceptible). These could also be interpreted as current number of infected and total number of susceptible. The graphical representation of this equation in PowerSim is shown in Fig 6, and Table 2 contains the parameters.
An example response of the BD model is shown in Fig 7. Here, as well as for all the other models, the cumulative number of infected is shown.
All presented models were further tested against the real data over a set of countries.
Modified models.
In addition to the models described above, the modified models were applied, with the modification inspired by the Bass diffusion model and supported by the experimental results. The exact reasons for such modifications are described in the following sections.
The proposed modification incorporates the exponential outflow from susceptible to infected with θ parameter. This parameter is responsible for the rate of progression from susceptible to infectious without the interaction between infectious and susceptible. The modified SI model, also referred to as exponential SI (eSI), can be described with the following equations:
where θ is responsible for the external factor, which influences the infections dynamics. An example response of the modified SI is shown in Fig 8.
The modified SI model has two parameters, which play the same roles as in the BD model. In fact, these two models are identical: one of the parameters controls the exponential outflow from the susceptible, while the other controls interactions between the susceptible and infected. With the same parameter values, the BD and modified SI demonstrate the same dynamics. Going forward in this paper, eSI is referred to as the BD model.
The modified SIR (exponential SIR, eSIR) model is described as follows:
An example response of the original and modified SIR models is shown in Fig 9.
For the modified SIR model, the following parameters were used: β = 0.1, γ = 0.01, and θ = 1∙10−4; for the original SIR, β = 0.06 and γ = 0.01 were used. These parameters were chosen so that the two curves would better match each other. As one may observe, the SIR model begins with a lower rate and stabilises faster, whereas the initial rate of the modified SIR is higher and takes longer to stabilise at the end. This slower approach to the limit value was also observed in the real data, which is one of the desired model properties.
The SEIR model was changed to include the θ parameter. The modified SEIR (exponential SEIR, eSEIR) model equations are shown below:
where θ plays the same role as in the modified SIR model. A sample response of the modified eSEIR model is shown in Fig 10.
For the modified SEIR model, the following parameters were used: β = 1, σ = 0.006, γ = 0.0005, and θ = 0.001; for the original SEIR, β = 1, σ = 0.01, and γ = 0.0005 were used. Again, these parameters were chosen so that the two curves would better match each other. In both the modified SIR and SEIR models, if the θ parameter is set to zero, the models turn into their unmodified versions. As in the case of the SIR model, the SEIR model begins slowly and takes the limit value sooner, while the modified SEIR model begins its growth earlier and at a greater rate, but it takes more time to achieve the limit value. This property of reaching the limit value slower was also observed in the real data.
The SEIUR model was also modified to include the θ parameter, and the equations are given below:
A comparison of the SEIUR and its modified version (exponential SEIUR, eSEIUR) is shown in Fig 11.
In Fig 11, the following parameters were used for the SEIUR model: β = 2, σ = 0.025, γ = 0.07, θ = 0, λ = 0.1, and μ = 0.3. For the eSEIUR, the following parameters were set: β = 2, σ = 0.025, γ = 0.07, θ = 0.001 λ = 0.1, and μ = 0.3. The next section contains a description of the data preparation.
Data preparation.
The dataset considered in this study was taken from the GitHub repository using the Our World in Data database containing the COVID-19 spread dynamics in every country and region [21]. The dataset contained the number of infected persons daily since the beginning of observations, as well as the cumulative number of infected. In addition, some other data is provided for each country, such as population, GDP, etc.
In this study, the cumulative number of infected was considered—i.e., the models presented above were used to fit to the actual cumulative infection curves. The dataset contained information about several waves of infection for many countries, however for the purpose of our study only the first wave—the initial spread of the infection—was considered [22]. The data from the first wave were considered to extract the basic characteristics of the process, which would be more difficult to extract in second and consecutive waves. Due to different start times of the first wave in different countries, it was necessary to cut several first measurements. To identify the starting points, Stmin, the following heuristic rule was used: if for τ = 3 time steps before t, the exponentially smoothed (with parameter ε = 0.2) total number of infected was increasing by at least 1 case, then Stmin = t.
The algorithm for detecting the end of the first wave is more complex and considers several steps. First, the number of infected per day was considered, and according to the observations, the first wave ended at the point where the number of infected per day dropped closer to zero after a relatively high peak (see Fig 12).
Fig 12 shows that it should be possible to identify the end of the first wave by determining the position of the first minimum after the first maximum of daily infections. However, it could be challenging to do so, as there were many oscillations in the number of infections per day. As the provided dataset contained already smoothed number of infected per day, this data was used in the experiments. To identify the minimum, the estimation of the derivative of the number of infected per day was calculated with finite differences:
where t = 1…T−1. The approximate derivative was further processed using an exponential smoothing approach:
where t = 1…T−1, ε = 0.02. The resulting values were analysed to detect the first wave following these three rules:
should be positive for at least wp∙T steps, after this
should be negative for at least wp∙T, and then
should switch from negative to positive value.
The highlighted area in Fig 12 shows the results of automatic detection, and Fig 13 shows the smoothed number of infections per day and the smoothed derivative.
For the case in Figs 12 and 13, the wp parameter was set to 0.07, and the detected end of the first wave and beginning of the second was identified at day 145.
The two main parameters controlling the proposed heuristic for determining the first wave are τ and wp. To demonstrate the influence of these values, Fig 14 shows the possible scenarios of determining the first wave with varied parameter values.
The goal was to analyse the first wave of the infection, but because several countries were still far from the inflection point, they were filtered out. In addition, countries with populations of less than 1 million people were not considered. Several countries were also removed because either very few cases were reported relative to the total population, or they were special cases, like China, where a very small fraction of new infections were reported after a long period of time. Other countries that were excluded due to peculiarities in the data were Denmark, Brazil, Uganda, Vietnam, Poland, and Northern Macedonia. For these countries, the first wave did not follow the S-shaped curve, and the first wave was not clearly distinguished from the second. This limitation was made due to possible biased dynamics for very small countries, such as islands or cities, which are significantly affected by their location, tourism, and other factors.
Differential evolution.
There are several known attempts to apply computational intelligence methods, such as neural networks, for prediction of COVID-19 epidemic trends [23]. In our study, to find the parameter values for the presented models, the differential evolution (DE) algorithm was applied [24]. In particular, a well-known adaptive version, L-SHADE [25], is implemented and modified to find the parameter values of the presented models.
The DE algorithm performed the search for optimal solutions in RD, where D is the dimensionality. DE begins with initialising a set of vectors: xi,j = Random(Lj, Uj), where i = 1…NP, j = 1…D, Lj, and Uj are the lower and upper boundaries, and NP is the population size. The fitness values f(xi)—that is, the goal function values—were calculated for every individual.
After the initialisation step, the DE proceeded with the main algorithm loop, containing mutation, crossover, and selection steps. The mutation in DE is the main search operator, which combines the coordinates of several points to produce new ones. The mutation strategy used in L-SHADE is called current-to-pbest/1 and generates new mutant vector, v, using the scaling factor, F, as follows:
where r1 and r2 are the randomly chosen indexes from [1, NP], pbest is one of the p% fittest solutions, and i is the index of current vector. Indexes i, r1, r2 and pbest are different from each other. The r2 index is chosen from either the population or the external archive, composed of individuals replaced during selection.
The mutation step was followed by crossover, which combined the target vector xi and the mutant vector v to the produce trial vector u in the following way:
where j = 1…D, Cr is the crossover rate parameter and jrand is the randomly chosen integer from [1, D] needed to ensure that at least one component is taken from the mutant vector.
Once the trial vector was generated, its fitness f(u) was calculated and compared to the fitness of the target vector in the selection procedure:
where G is the current generation number—i.e., selection forms the new generation of individuals. The replaced parent is placed into the archive, and if the archive size reaches NP, then the random solution from the archive is replaced.
The parameter adaptation is one of the features of L-SHADE method. For every selection and mutation the F and Cr parameters are sampled from Cauchy and normal distributions with scale parameter equal to 0.1: F = randc(MF,k, 0.1), Cr = randn(MCr,k, 0.1). The algorithm maintains a set of H memory cells each containing a pair of (MF,k, MCr,k) values. One of the memory cells is updated at the end of each generation using weighted Lehmer mean of stored successful F and Cr values in SF and SCr, with weights being the fitness improvementsΔf = |f(uj)−f(xj)| in the selection step, stored in SΔf:
where
, S is either SF or SCr. The memory cells are updated as follows:
where t is the current iteration number, iterating from 1 to H. The L-SHADE algorithm also reduces the population size in the following way:
where NPmin = 4, NFE and NFEmax are the current and total number of function evaluations available, g is the current generation number. At the end of each generation the worst solutions are removed, if the population size changes.
In addition to the parameter adaptation, for the goals of current study the L-SHADE algorithm was modified by a relatively simple restart heuristic. If the best found solution does not change for 0.25NFEmax function evaluations, then the whole population is randomized, except for the best solution. The idea of the modification was to allow the algorithm to escape local optima if it gets stuck. Note that the resulting L-SHADE-R method restarts with a smaller population size, depending on the used computational resource.
Results
To compare the ability of the presented models to describe the real-world dynamics, the models were tuned to fit the available data regarding the first wave of the COVID-19 pandemic. To choose the appropriate parameter values for the SI, BD, SIR, SEIR, and SEIUR models, as well as the modified versions, the differential evolution algorithm was applied. The DE represents a universal zero-order continuous optimiser, capable of finding the global optimum even in the case of rough goal function terrain, as demonstrated in many benchmarks and real-world applications. All models were implemented in Python 3.7 with Odeint, which is a part of the SciPy library [26]. To perform experiments, the parallel implementation with OpenMPI was utilized, and the search for parameter values of each model on each country data was performed on a cluster of 12 computers with 8 cores each, connected via network.
In addition to the model parameters mentioned above, one additional parameter, α, was considered, controlling the amount of susceptibility at the end of the first wave with respect to the total population: S = αP, where P is the total population of a country.
In the first series of experiments, the goal function, used as the fitness function in L-SHADE-R, was calculated as the relative error between the real data and the curve predicted by the model over all data points:
where N is the total number of cases in the first T days, c is the country index, CIc,t is the cumulative number of infected at day t in country c, and MIt is the cumulative number of infections, all as predicted by the model at day t.
Preliminary, we have used several standard measures for fitness function such as Mean Absolute Percent Error (MAPE), Mean Absolute Error (MAE), Mean Squared Error (MSE). However, the proposed relative error measure REc was more stable in case of extreme values e.g. when the number of infected is close to 0.
The L-SHADE-R algorithm was executed with the following parameters:
- Population size: 500 individuals,
- Computational resource: 125000 evaluations,
- Memory size H = 5, initial MF,k =0.2, MCr,k = 0.8,
- Ratio of best for mutation pbest = 0.2,
- Initial population generated within the following boundaries:
- α∈[0,0.1];
- β∈[0,8];
;
;
- θ∈[0,0.1];
- λ∈[0.0001,0.5];
- μ∈[0,8].
The initial conditions of the models, i.e. state variables, were set according to the available data. First value of the Infected, that were recorded in official statistics, were used for initialization of the Infected state. Other states, such as exposed, recovered and unreported, were initialized to zero.
The ranges for σ and γ were chosen according to [27], where transmission from exposed to infected and from infected to recovered were determined between 3 and 14 days. However, the lower boundary for these parameters was expanded to 1/42, i.e. 42 days, to test if this gives additional performance gains. For β the search range was set from 0 to 8 according to preliminary experiments in literature, for example in [28] the β parameter was set to from 1.5 to 3. Larger range was used to allow the search algorithm discover better solutions, so that post-processing would show the most preferable range to which parameter values fall into.
Different models had different dimensionalities, as the number of parameters varied. For SI there were two variables, for BD there were three variables, for SIR and SEIR there were three and four variables, and for eSIR and eSEIR there were four and five variables, respectively. Finally, SEIUR and eSEIUR had six and seven variables. The best parameter values for every model and every country were saved at the end of each algorithm run, and there were 25 independent runs performed for every model and every country. Table 3 contains the average relative error rates and their standard deviations for all countries and models, as well as the number of cases when a particular model was the best compared to others. In addition to standard settings, an experiment with extended range for eSEIUR, denoted with eSEIURrange was performed, with the following settings:
- α∈[0,0.5];
- β∈[0,12];
;
;
- θ∈[0,0.5];
- λ∈[0.0001,0.5];
- μ∈[0,8].
The idea behind an additional experiment with eSEIURrange was to check if the chosen search ranges are sufficient for fitting.
The results in Table 3 show that the modified models were better in most cases; for example, the eSEIR model was the best for 8 countries out of 61. Comparing the SI and Bass diffusion models, the second was the best among all other models for 7 countries out of 61, while the SI model was not the best for country. Between the SIR and eSIR models, the modified model and appears to be the best model for 9 countries. The models with unreported cases did not show best results for any of the countries; however, the difference in relative error was not large. Some of the values in Table 3 are relatively large, especially for SI and BD model, where standard deviation of errors could reach 100%. This is due to the stochastic nature of the L-SHADE-R algorithm, which may not find a good fit in all 25 runs. It can be fixed by restarting the method or running it until it gets a reasonable error value. Large mean and standard deviations are usually due to a single outlier in error values. We leave these values in the Table 3 for the reader to be aware, that there could be some unsuccessful optimization attempts due to randomness of the approach. However, at the end the results of the best fit are considered.
Fig 15 provides the visualization of the relative error values of the basic and modified models of all countries. Here the error values of the SEIR model are chosen as baseline, i.e. the error values of all models are divided by error value of SEIR of each particular country. Also, the countries are sorted by eSEIR relative error values to improve visibility.
Considering the results in Fig 15, it can be concluded that the SI model is not sufficient for describing the dynamics of the infection spread, however, the BD model is much better, and is capable of outperforming other models, including SEIR in many cases. The eSEIR model performs much better than SEIR in up to 40 countries on average, and SEIUR has similar performance to standard SEIR. As for eSEIUR and eSEIURrange models, the former has better performance in many cases, but the difference is not large.
Table 4 contains the aggregated pairwise comparison results of the standard and modified models across all countries. The comparison was performed using non-parametric Mann–Whitney statistical U tests with normal approximation and tie-breaking, as well as a significance level p = 0.01. For every country, the comparison was performed using the results of 25 runs, and the following three outcomes were considered: the modified model was better, equal, or worse than the original. The numbers of better, equal, and worse cases were summed together.
The results in Table 4 prove that the modified models were better in most cases. In particular, the BD model was better than SI for 57 countries out of 61. The modified SIR model was worse than original only for one country. The modification also significantly improved the SEIR model, where the original model was statistically equal only for 13 cases out of 61, for all other cases the modified model was better. Comparing eSEIUR and eSEIURrange, it can be seen that expanding the search range for parameter values results in equal or worse error values, because bigger range results in larger search space, hence lower overall performance of optimization algorithm. This is important for the numerical experiments, which is highly dependent on the parameters’ boundaries.
To estimate whether the ranges of the parameters’ search are appropriate, the boxplots were drawn for the β parameter for all considered models. The β parameter was chosen, as it is present in all models, and has significant influence, as it is responsible for new infections caused by contacts between infected and susceptible. Fig 16 shows box plots for all models, the parameter values were taken from all 25 runs on all countries.
As can be seen from Fig 16, for simple models like SI and SIR, large β values are not required, so the range from 0 to 8 is appropriate. As for SEIR and eSEIR models, they also stay within this range, and even largest values rarely reach 5. The SEIUR and eSEIUR models, on the other hand, reach the maximum value, and the average value of β may be larger than 2. As for the experiment with larger range, it can be seen that although some parameter reach value 8 or even more, most of them stay below 4.
Table 5 contains the calculated basic reproduction rates, R0, for every country for the SIR and eSIR models. For the SI and BD models, the calculation of R0 was not possible, as the recovery of the infected was not considered in these models. Due to the complexity of the SEIR and SEIUR models, the R0 numbers were not estimated for in these cases. The values in Table 5 are calculated based on the found parameter values using the best out of 25 independent runs as R0 = β/γ for the SIR and eSIR model.
The results in Table 5 show that the calculated R0 values for SIR model was 1.647, and for eSIR model– 2.367. For some countries, the values of R0 were far from realistic values; however, most of them were within the expected range. According to some of the initial estimations in [29], the R0 for COVID-19 was estimated by a meta-analysis as 3.42. It should be noted that the R0 values for the modified model are on average larger than for the original SIR. This is because part of the dynamics in the modified model is described by added exponential term which influences the change of the ratio between β and γ which determine the value of R0. From epidemiological point of view this means that if there is an additional source of infection except contacts between susceptible and infected, intensity of the infection spread would be higher due to the fact, that each susceptible person would have more chances to be infected in certain period of time. Although R0 in Table 5 is not always larger for eSIR then for SIR, this is clearly the observed trend.
Fig 17 provides an example of the fitted curves on the real data for Saudi Arabia.
In Fig 17, the basic models, such as SI, SIR, and SEIR are shown to be incapable of capturing the real dynamics, while even the simple BD was enough to achieve an acceptable fit to the real data. The modified models, which incorporated the exponential outflow, were better than their non-modified versions. The results in Fig 17 were obtained with the parameter values, shown in Table 6.
In Table 6 it can be seen that θ parameter is significantly smaller than the β parameter. This is due to the fact that the total population is considered as a normalization constant.
Table 7 contains the start date, end date, initial and final numbers of infected, and population size for every country used in the computations.
Table 7 is provided to assure the repeatability of similar simulation experiments on the same dataset with different models.
The values in Tables 6 and 7, combined with the description of all models can be used to reproduce the performed experiments.
Discussion
The task was to perform parameterization of classical models to obtain parameters for COVID-19 to improve the knowledge about modelling such systems. At the beginning, aside from the classical models like SI, SIR, SEIR, and SEIUR, the Bass diffusion model was added, which is rarely used in the field of epidemiology [15]. Surprisingly, the Bass diffusion model provided fits as good as the more advanced models, such as SEIR. This led to the idea of applying additional exponential outflow from the susceptible to modify the known models. The main idea behind this additional exponential outflow modification was that the transitions from susceptible to infected or exposed would also occur without direct interactions between the susceptible and infected, with the infection caused by some external factor. With the identification of best models, which incorporated the exponential outflow from the Bass diffusion model, it was possible to improve the forecasting of the first wave of the epidemics. The possible transfer to use in analysing the second, third and other waves is a topic for further research.
The presented results show that the modified models had better fit to real-world data, although including an additional parameter in the model made it more complex for the optimization method to solve. This means that the models that have additional exponential outflow from susceptible to infected or exposed are more suitable to describe the real dynamics of infections. This would suggest that a significant part of the dynamics is not due to the contact between the susceptible and infected [30], but because of some external factors [31]. One explanation could be that the virus is spreading with aerosol/food or via some unknown mechanism, independent of the contacts between the susceptible and infected. This external source could also be from the unreported immigration of people from infected areas and corresponding accumulation of the virus in aerosol. As in the present study, only the first wave was considered, and it is possible to suggest that even before the pandemic breakout, the number of virus carriers had significantly accumulated, with these being asymptomatic cases, transitioning later with the exponential flow mechanism.
These are only a few hypothetical reasons that could explain this behaviour. In any case, the models imply the presence of such mechanisms, but this should be the topic of further research in the field of epidemiology. If our observations and proposed modifications are confirmed by other independent studies, then the SIR, SEIR, and SEIUR models should be modified as suggested. Moreover, explanations should be provided to outline the related dynamics. The developed models provided guidelines to explain the apparent mechanism, which was not considered in the classical models.
According to [32], “The ability of SARS-CoV-2 to remain viable longer on surfaces taken together with its higher virulence in establishing an infection makes it very likely that this coronavirus uses other modes of transmission in addition to respiratory droplets”. This could be one explanation of the observed higher efficiency of the modified models, although some studies, such as [33], suggest that this cannot be the main source of transmission, because the risk of infection after 3 days (72 hours) is minor.
Another explanation could be that the aerosol particles containing respiratory viruses are significantly accumulated in human environments, which are in large extent buildings, and as such this represent independent source of infection, not directly connected to person-to-person transmission, which is indicated in [34–36].
One should be aware, that the spread of infectious disease inevitably involves spatially connected dynamics, which could be modelled in more details. Several studies pointed out that COVID-19 epidemics, being a human-to-human driven disease, evolves locally [37, 38]. The infected persons spread the disease to close susceptible persons, not to any suspected person in the entire country. One should note that this is the limitation of the presented models, which might be used on an abstract, aggregate level.
In this study, the parameterization of the first wave was performed using four classical models and four modified models. The experiments showed that the modification based on the Bass diffusion model allowed for the better fitting of curves, which was also indicated in [15]. By the examination of the equations and the comparison of the modified and unmodified models, one could anticipate that an additional mechanism of COVID-19 spread is present that is not considered in classical models.
References
- 1. Li R, Pei S, Chen B, Song Y, Zhang T, Yang W, et al. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2). Science 2020;368:489–493. pmid:32179701
- 2. Hethcote H. The mathematics of infectious diseases. SIAM Rev 2000;42: 599–653. https://doi.org/10.1137/S0036144500371907
- 3. Kermack WO, McKendrick AG. A contribution to the mathematical theory of epidemics. Proceedings Royal Soc. London (A); 1927;115:700–721. https://doi.org/10.1098/rspa.1927.0118
- 4. Hethcote HW, Tudor DW. Integral equation models for endemic infectious diseases. J Math Biology 1980;9;37–4. pmid:7365328
- 5.
Keeling MJ, Rohan P. Modelling infectious diseases. Princeton University Press; 2008. ISBN 9780691116174. https://doi.org/10.2307/j.ctvcm4gk0
- 6.
Diekmann O, Heesterbeek H, Britton T. Mathematical tools for understanding infectious disease dynamics. Princeton series in Theoretical and computational biology. Princeton, NJ: Princeton University Press; 2013. https://doi.org/10.1002/AJHB.22436
- 7. Yin F, Shao X, Ji M, Wu J. Quantifying the influence of delay in opinion transmission of COVID-19 information propagation: modeling study. Journal of Medical Internet Research 2021;23. pmid:33529153
- 8. Liu Z, Magal P, Seydi O, Webb G. A COVID-19 epidemic model with latency period. Infectious Disease Modelling 2020;5:323–337. ISSN 2468-0427. pmid:32346664
- 9. Agapiou S, Anastasiou A, Baxevani A, et al. Modeling the first wave of Covid-19 pandemic in the Republic of Cyprus. Sci Rep 2021;11:7342. pmid:33795723
- 10. Yanes-Lane M., Winters N., Fregonese F., Bastos M., Perlman-Arrow S., Campbell J., et al. (2020). Proportion of asymptomatic infection among COVID-19 positive persons and their transmission potential: A systematic review and meta-analysis. PLOS ONE, 15(11), e0241536. pmid:33141862
- 11. Wu C., Xu C., Mao F., Xu X., & Zhang C. (2022). The impact of invisible-spreaders on COVID-19 transmission and work resumption. PLOS ONE, 17(1), e0252994. pmid:35020722
- 12. Xi Huo, Jing Chen, Shigui Ruan. Estimating asymptomatic, undetected and total cases for the COVID-19 outbreak in Wuhan: a mathematical modeling study. BMC Infectious Diseases;2021;21:476. pmid:34034662
- 13. Bass FM. A new product growth for model consumer durables. Management Science 1969;15(5):215–227. https://doi.org/10.1287/mnsc.15.5.215
- 14. Škraba A, Vavtar B, Stanovov V, Semenkin E, Stojanović R. Parametrization of bass diffusion model on COVID-19 first wave data. IOP Conference Series: Materials Science and Engineering 2021;1047(1):012084. https://doi.org/10.1088/1757-899x/1047/1/012084
- 15. Eryarsoy E, Delen D, Davazdahemami B, Topuz K. A novel diffusion-based model for estimating cases, and fatalities in epidemics: the case of COVID-19, Journal of Business Research 2021;124:163–178. ISBN 9780691155395. pmid:33281248
- 16. Li MY, Muldowney J. Global stability for the SEIR model in epidemiology. Mathematical Biosciences 1995;125(2):155–164. pmid:7881192
- 17. Lauer S, Grantz K, Bi Q, Jones FK, Zheng Q, Meredith H, et al. The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: estimation and application. Annals of Internal Medicine 2020. pmid:32150748
- 18. Zhao H, Lu X, Deng Y, Tang Y, Lu J. COVID-19: asymptomatic carrier transmission is an underestimated problem. Epidemiology and Infection 2020;148:e116:1–3. pmid:32525469
- 19. Moghadas SM, Fitzpatrick MC, Sah P, Pandey A, Shoukat A, Singer BH, et al. The implications of silent transmission for the control of COVID-19 outbreaks. Proceedings of the National Academy of Sciences Jul 2020;117(30):17513–17515. pmid:32632012
- 20.
Powersim Studio | Powersim Software. Powersim.com. (2021). Retrieved 8 July 2021, from https://powersim.com/.
- 21.
Data on COVID-19 (coronavirus) by Our World in Data. URL: https://github.com/owid/covid-19-data/tree/master/public/data
- 22. Feng S., Feng Z., Ling C., Chang C., & Feng Z. (2021). Prediction of the COVID-19 epidemic trends based on SEIR and AI models. PLOS ONE, 16(1), e0245101. pmid:33417605
- 23. Ghaffarzadegan N, Rahmandad H. (2020), Simulation-based estimation of the early spread of COVID-19 in Iran: actual versus confirmed cases. Syst Dyn Rev 2020;36:101–129. pmid:32834468
- 24. Storn R, Price K. Differential evolution: a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization 1997;11:341–359. https://doi.org/10.1023/A:1008202821328
- 25.
Tanabe R., & Fukunaga A.S. (2014). Improving the search performance of SHADE using linear population size reduction. 2014 IEEE Congress on Evolutionary Computation (CEC), 1658–1665. https://doi.org/10.1109/CEC.2014.6900380
- 26. Virtanen P, Gommers R, Oliphant TE, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 2020;17:261–272. pmid:32015543
- 27. Chu J. A statistical analysis of the novel coronavirus (COVID-19) in Italy and Spain. PLoS ONE, 2021, 16, https://doi.org/10.1371/journal.pone.0249037
- 28. Schmitt F.G. An algorithm for the direct estimation of the parameters of the SIR epidemic model from the I(t) dynamics. European Physical Journal plus, 2022, 137, pmid:34961835
- 29. Yu C, Wang Z, Xu Y, Hu M, Chen K, Qin G. Assessment of basic reproductive number for COVID-19 at global level. Medicine 2021; 100. https://doi.org/10.1097/MD.0000000000025837
- 30. Jayaweera M., Perera H.R., Gunawardana B., & Manatunge J. (2020). Transmission of COVID-19 virus by droplets and aerosols: A critical review on the unresolved dichotomy. Environmental Research, 188, 109819–109819, pmid:32569870
- 31. Lewis D. (2021). COVID-19 rarely spreads through surfaces. So why are we still deep cleaning? Nature, 590 7844, 26–28, pmid:33514939
- 32. Galbadage T., Peterson B.M., & Gunasekera R.S. (2020). Does COVID-19 Spread Through Droplets Alone? Frontiers in Public Health, 8, pmid:32391310
- 33.
Centers for Disease Control and Prevention (CDC). "SARS-CoV-2 and surface (fomite) transmission for indoor community environments." CDC COVID-19 Science Briefs [Internet. Atlanta, GA: Centers for Disease Control and Prevention (US), 2020. https://www.cdc.gov/coronavirus/2019-ncov/more/science-and-research/surface-transmission.html
- 34. Azuma K., Yanagi U., Kagi N., Kim H., Ogata M., & Hayashi M. (2020). Environmental factors involved in SARS-CoV-2 transmission: effect and role of indoor environmental quality in the strategy for COVID-19 infection control. Environmental Health and Preventive Medicine, 25, pmid:33143660
- 35. Anderson E.L., Turnham P., Griffin J.R., & Clarke C.C. (2020). Consideration of the Aerosol Transmission for COVID‐19 and Public Health. Risk Analysis, 40, 902–907 pmid:32356927
- 36. Ehsanifar M. (2021). Airborne aerosols particles and COVID-19 transition. Environmental Research, 200, 111752–111752, pmid:34302822
- 37. Arenas A., Cota W., Gómez-Gardeñes J., Gómez S., Granell C., Matamalas J.T., et al. Modeling the Spatiotemporal Epidemic Spreading of COVID-19 and the Impact of Mobility and Social Distancing Interventions. Physical Review X, 10, 041055. (2020): 041055, https://doi.org/10.1103/physrevx.10.041055
- 38. Cardoso, Ben-Hur Francisco, and Sebastián Gonçalves. Universal scaling law for human-to-human transmission diseases. EPL (Europhysics Letters) 133.5 (2021): 58001, https://doi.org/10.1209/0295-5075/133/58001