Transmission matrix parameter estimation of COVID-19 evolution with age compartments using ensemble-based data assimilation

Santiago Rosa; Manuel A. Pulido; Juan J. Ruiz; Tadeo J. Cocucci

doi:10.1371/journal.pone.0318426

Abstract

The COVID-19 pandemic, with its multiple outbreaks, has posed significant challenges for governments worldwide. Much of the epidemiological modeling relied on pre-pandemic contact information of the population to model virus transmission between population age groups. However, these interactions changed drastically because of government health measures, referred to as non-pharmaceutical interventions. These interventions, ranging from social distancing to complete lockdowns, aimed to reduce transmission of the virus. This work proposes to account for the impact of non-pharmaceutical measures on social interactions among age groups by estimating the time dependence of these interactions in real time from epidemiological data. This is achieved with a time-dependent transmission matrix of the disease between population age groups. This transmission matrix is estimated using an ensemble-based data assimilation system applied to a meta-population model and time series of age-dependent accumulated cases and deaths. We conducted a set of idealized twin experiments to explore the performance of different ways of parametrizing social interactions through the transmission matrix of the meta-population model. These experiments show that, in an age-compartmental model, not all the independent parameters of the transmission matrix can be unequivocally estimated, i.e., they are not all identifiable. Nevertheless, the time-dependent transmission matrix can be estimated under certain parameterizations. When assimilating observations of age-dependent accumulated cases and deaths in Argentina, these estimated parameters increase forecast accuracy within age-group compartments compared to a single-compartmental model. Furthermore, they give reliable estimates of the effective reproduction number.
Age-dependent data assimilation and forecasting of virus transmission are crucial for accurate prediction and diagnosis of healthcare demand.

Citation: Rosa S, Pulido MA, Ruiz JJ, Cocucci TJ (2025) Transmission matrix parameter estimation of COVID-19 evolution with age compartments using ensemble-based data assimilation. PLoS ONE 20(4): e0318426. https://doi.org/10.1371/journal.pone.0318426

Editor: Matthew Chin Heng Chua, National University of Singapore, SINGAPORE

Received: July 17, 2023; Accepted: January 14, 2025; Published: April 28, 2025

Copyright: © 2025 Rosa et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The raw data used in this work can be found at https://sisa.msal.gov.ar/datos/descargas/covid-19/files/Covid19Casos.zip. A curated version is available at http://covid19.unne.edu.ar/obs_arg.csv. The codes developed for the work are available at https://gitlab.com/pulidom/covid/.

Funding: This study received funding from the Ministry of Science, Technology and Innovation of Argentina (PICT2021-I130 Dr. Manuel A. Pulido, Dr. Juan J. Ruiz), the General Secretariat for Science and Technology, Universidad Nacional del Nordeste (CORR 01 COVID FEDERAL EX-2020-38902538, Santiago Rosa, Dr. Manuel A. Pulido, Dr. Juan J. Ruiz), and the National Scientific and Technical Research Council (PICT2020-SERIEA-I-A Dr. Juan J. Ruiz). The sole responsibility for the content of this publication lies with the authors. The funders had no role in study design and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

1 Introduction

Governments worldwide faced several challenging decisions as the SARS-CoV-2 virus spread in early 2020. Several non-pharmaceutical interventions, from social distancing measures for high-risk populations to general lockdowns, were implemented to slow the propagation of COVID-19, at the expense of a decline in productivity. While lockdowns can significantly slow epidemic propagation by flattening the active cases curve, they also negatively affect education and social activities. Moreover, COVID-19 outbreaks themselves impact the economy, as evidenced by strictly enforced sick leaves. Therefore, decision-makers must carefully evaluate the trade-off between socio-economic well-being and public health. Real-time decision-making tools are required to monitor the pandemic and to predict the evolution of the disease at different scales, from neighborhoods and cities to states and entire nations. Epidemiological predictions can help prevent health system overload, allowing governments to implement timely non-pharmaceutical interventions and avoid healthcare collapse. Research on monitoring and modeling COVID-19 spread (e.g. [1]) had a strong political impact worldwide. However, the disparate evolution of COVID-19 across countries made clear that continuous monitoring of local spread was required to adopt timely distancing measures.

The propagation of COVID-19 has been modeled using epidemiological compartmental models, such as Susceptible-Exposed-Infected-Recovered (SEIR) models. Compartmental models may represent the exponential growth of the initial phase of an outbreak well. However, virus propagation is subject to the complexity of human interactions and to individually varying viral loads [2], which makes it challenging for compartmental models to describe such dynamics accurately. Even the most advanced meta-population models (e.g. GLEAM [3]) and agent-based models [4] represent the transmission dynamics of the virus only crudely, owing to the inherent difficulty of modeling interactions between individuals. Furthermore, social life, and hence human interactions, changed significantly throughout the pandemic.

The data accumulated during the epidemic were rather limited and prone to errors because detection policies changed with time, reported cases were delayed over weekends, and hospital discharge dates were missing, among other factors. In addition to these sources of data uncertainty, a significant number of cases were not detected: many individuals experienced mild or no noticeable symptoms and were therefore not reported and, on a smaller scale, tests gave false negatives [5]. Given the incomplete and noisy nature of the data and the inherent difficulty of representing complex underlying processes with models, combining models and data becomes appealing. Real-time model-data fusion techniques, such as sequential inference and data assimilation, aim to combine very diverse sources of information while accounting for their uncertainties.

There are several ways to combine data with models during the evolution of an outbreak. In the context of Bayesian inference, some works use Markov chain Monte Carlo methods [6–9]. Other works instead use data assimilation techniques for epidemiological modeling, which are computationally cheaper at the cost of assuming Gaussianity. Shaman et al. [10,11] use an ensemble-based data assimilation framework to model influenza propagation. The state evolution of an epidemiological SIRS (Susceptible-Infectious-Recovered-Susceptible) model is combined with direct and indirect data from the epidemic (e.g. the level of web activity related to the illness). At the same time, the parameters of the system are learned online as the observations become available. These works use a variant of the ensemble Kalman filter (EnKF). Owing to the need to monitor the spread of COVID-19 and the abundance of worldwide data, several works have used these data assimilation techniques to estimate the spread of the SARS-CoV-2 virus. Li et al. [12] use the iterated filter-ensemble adjustment Kalman filter to assimilate COVID-19 data within China using a meta-population model and mobility data. They estimate the fraction of undocumented (asymptomatic) infections together with the transmission rate of the undocumented infections, and find this undocumented fraction to be 86%. Engbert et al. [13] use an EnKF for regional transmission modeling. They estimate time-independent parameters by maximizing the likelihood of a stochastic SEIR model to capture the dynamics of the pandemic at regional levels. Evensen et al. [14] apply an ensemble Kalman smoother to a meta-population model and estimate the evolution of epidemiological parameters over a long time period. The technique captures the abrupt changes in the reproduction number found in several countries following the implementation of lockdown measures.

There is a strong dependence between the severity of COVID-19 symptoms and age. Infections among children and young people often result in asymptomatic cases. On the other hand, individuals aged over 60 tend to develop the most severe symptoms, often requiring hospitalization during the course of the illness. Transmission has also been associated with age [15–17]: while children under 10 years old appear to have low susceptibility to infection, people over 60 are highly susceptible. Hence, a technique for real-time monitoring must use age-disaggregated data for an effective response to epidemics [18,19], including accurate prediction of hospital bed and ventilator availability. Moreover, identifying age-dependent patterns in virus transmission is essential for policymaking regarding non-pharmaceutical interventions, such as deciding when to open or close schools [14].

Estimating the number of contacts between individuals in a particular population is challenging. It can be achieved with statistically representative population surveys. Arregui et al. [20] use surveys from eight countries [21] to extrapolate known contact matrices to other countries. Klepac et al. [22] use data collected by a smartphone application in the UK to infer social interactions. The data contain the contact history of each user labeled by age group, from which an empirical statistical contact matrix of the population is estimated. This matrix was then used in an agent-based model (ABM) to simulate an influenza-like outbreak, contributing to the BBC documentary Contagion. These works use a fixed contact matrix to study the evolution of epidemics and do not estimate time-varying contact rates.

Previous works assume a time-independent transmission matrix; however, the response to the evolution of COVID-19 showed that social distancing measures changed with time and independently across age groups. Motivated by this limitation, this work explores alternatives to a fixed transmission matrix by proposing and estimating time-varying transmission matrix parameterizations. We aim to estimate changes over time in the transmission matrix, including variations in mobility within specific age groups. To achieve this, we couple a meta-population SEIRHD model with a stochastic EnKF to assimilate age-structured cumulative cases and deaths. Alongside the transmission parameters, we also estimate other crucial quantities, such as the effective reproduction number and the fraction of detected cases and deaths, using the information contained in the age-structured data on the spread of the virus.

The outline of this article is as follows:

In Section 2 we show our model and introduce the data assimilation framework.
In Section 3 we give details of the real-world data utilized, present the general experimental details and show the different contact matrix parameterizations used.
In Section 4 we present and discuss the results; each subsection corresponds to a different experiment, including synthetic and real-world data experiments.
In Section 5 we draw the conclusions of our investigation.

2 Technique details

2.1 Compartmental epidemiological model

In this work, the evolution of COVID-19 is modeled for the entire population of a region, which is assumed to be isolated. The model is an extension of a basic SEIR model [23], applied to a closed population (i.e. no births, deaths, immigration or emigration) divided into n age groups. The model classifies the individuals of the population into the following mutually exclusive categories: susceptible, exposed (but not infectious), infected, mild symptoms, severe symptoms, critical symptoms, recovered, and dead. These categories are defined for each age group j = 1, ..., n.

The flow between epidemiological categories of the model is shown in Fig 1. Infected individuals in age group j can interact with susceptible individuals in age group k, with a transmission rate given by the corresponding element of the transmission matrix. Susceptible individuals exposed to the disease move to the exposed compartment. Individuals in this compartment do not transmit the virus. After a mean incubation time, exposed individuals move to the infected compartment. At this stage, individuals can spread the virus to susceptible persons during a given infectious period. Subsequently, individuals move to the mild, severe or critical compartment, each with a given probability. The severe compartment comprises individuals who require hospitalization and who, after a given time, recover from the disease and move to the recovered compartment. The critical compartment represents individuals with severe cases who require hospitalization and who, after a given time, die and move to the dead compartment. The mild compartment consists of individuals who present mild symptoms and require no hospitalization; after a given time, they move to the recovered compartment. The model does not incorporate transitions between the mild, severe and critical compartments, because these compartments do not represent the progression of the illness but rather the worst state that an individual reaches. If an individual is misclassified in one of these compartments, the data assimilation system makes the required adjustments to correct it, as explained in the next section. After a given period, individuals in the recovered compartment become susceptible again, given that SARS-CoV-2 immunity diminishes substantially after 5-7 months [24]. The compartments are designed to characterize the dynamics of COVID-19 infection. Individuals cannot transmit the virus during the initial incubation phase and become infectious afterward. They are also expected to isolate once symptoms appear (or once they test positive).
Therefore, individuals in the mild, severe or critical compartments are assumed to be isolated and do not spread the disease; only individuals in the infected compartment do.


Fig 1. Diagram of the compartmental model.

An individual moves to the next compartment after a period τ_X, which depends on the compartment X.

https://doi.org/10.1371/journal.pone.0318426.g001

The model parameters are the transmission matrix elements λ_jk (the number of contacts that a person in group j has with persons in group k in a period of time Δt, multiplied by the probability that such a contact results in an infection), the average time an individual stays in each epidemiological state, and the fractions of infections that move to the severe and critical compartments. The sum of all compartments of each age group is constrained to equal the total population of that age group.

The resulting model equations are

(1)

Table 1 summarizes the variables and the parameters, and Table 2 shows the numerical values of all the fixed parameters. Two of these values are taken from [25] and [26], respectively, while two further parameters are both set to 15 days following [14]. The fractions of hospitalizations in Table 2 were estimated from the early stages of the pandemic using the available data. Eqs (1) are integrated with the Euler method using a time step of 1 h.
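Because the equations in (1) are not reproduced here, the following Python sketch shows one possible implementation of an age-structured model with the compartments and flows described above, integrated with the forward Euler method and a 1 h time step. It is an illustration under stated assumptions, not the authors' code: the compartment ordering, the parameter names (`lam`, `tau`, `f_h`, `f_c`) and, in particular, the force-of-infection term `S_j * sum_k lam[j, k] * I_k / N_k` are our own choices.

```python
import numpy as np

# Hypothetical row ordering of the (8, n) state array; one column per age group.
S, E, I, M, H, C, R, D = range(8)


def rhs(x, lam, N, tau, f_h, f_c):
    """Time derivative of an age-structured SEIR-type model (illustrative form).

    x        : (8, n) state array
    lam      : (n, n) transmission matrix; lam[j, k] is read here as the daily
               contacts of a person in group j with group k times the
               probability of infection (an assumption of this sketch)
    N        : (n,) population of each age group
    tau      : dict of mean residence times in days, keys 'E','I','M','H','C','R'
    f_h, f_c : (n,) fractions of infected going to severe / critical
    """
    s, e, i, m, h, c, r, _ = x
    new_exposed = s * (lam @ (i / N))      # exposures in each group j
    leave_i = i / tau["I"]                 # flow out of the infected compartment
    dx = np.empty_like(x)
    dx[S] = -new_exposed + r / tau["R"]    # waning immunity returns R to S
    dx[E] = new_exposed - e / tau["E"]
    dx[I] = e / tau["E"] - leave_i
    dx[M] = (1.0 - f_h - f_c) * leave_i - m / tau["M"]
    dx[H] = f_h * leave_i - h / tau["H"]   # severe cases recover
    dx[C] = f_c * leave_i - c / tau["C"]   # critical cases die
    dx[R] = m / tau["M"] + h / tau["H"] - r / tau["R"]
    dx[D] = c / tau["C"]
    return dx


def integrate(x0, lam, N, tau, f_h, f_c, days, dt=1.0 / 24.0):
    """Forward Euler with step dt (in days, default 1 h); returns daily states."""
    steps = int(round(1.0 / dt))
    x = np.array(x0, dtype=float)
    daily = [x.copy()]
    for _ in range(days):
        for _ in range(steps):
            x = x + dt * rhs(x, lam, N, tau, f_h, f_c)
        daily.append(x.copy())
    return np.stack(daily)                 # shape (days + 1, 8, n)
```

In this form every outflow of one compartment is an inflow of another, so the total population of each age group is conserved, which mirrors the population constraint of the model.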


Table 1. Model variables and parameters.

https://doi.org/10.1371/journal.pone.0318426.t001


Table 2. Numeric values of the model parameters. All time scales (τ’s) are expressed in days.

https://doi.org/10.1371/journal.pone.0318426.t002

The most important parameters controlling the spread of a disease in a meta-population model are the elements of the transmission matrix. In a population divided into age compartments, these elements represent the interaction between the infected and susceptible age groups; hence they are the main driver of the disease evolution. One of the central objectives of this work is to parameterize the transmission matrix and estimate it from observed data, to obtain a better representation of the propagation between age groups. In Eq (1), the elements of the transmission matrix are not independent [20]: the total number of contacts that individuals in group j have with those in group k has to equal the total number of contacts that individuals in group k have with those in group j:

(2)
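As a small illustration of how the constraint (2) reduces the number of free parameters, the sketch below completes a transmission matrix from its diagonal and upper triangle. It assumes that (2) takes the form N_j λ_jk = N_k λ_kj, i.e. that λ_jk is a per-person contact rate as defined above and N_j is the population of group j; the symbol N_j and the function names are our notation, not necessarily the paper's.

```python
import numpy as np


def fill_reciprocal(upper, N):
    """Complete an n x n transmission matrix from its diagonal and upper triangle.

    Assumes the reciprocity constraint N[j] * lam[j, k] == N[k] * lam[k, j]:
    total contacts of group j with group k equal those of k with j.
    Entries below the diagonal of `upper` are ignored.
    """
    lam = np.triu(np.asarray(upper, dtype=float))
    N = np.asarray(N, dtype=float)
    n = lam.shape[0]
    for j in range(n):
        for k in range(j + 1, n):
            lam[k, j] = N[j] * lam[j, k] / N[k]
    return lam


def n_free_parameters(n):
    """Independent entries of a reciprocal n x n matrix: n(n+1)/2 instead of n**2."""
    return n * (n + 1) // 2
```

For three age groups this leaves six free entries instead of nine, which matches the count used for the full transmission matrix in Section 4.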

The most relevant parameter in epidemiological modeling is the basic reproduction number R₀. It represents the mean number of new infections caused by one infected person in a totally susceptible population. In compartmental models, R₀ may be estimated by linearizing the dynamics of the infected subsystem, which is the part of the model that governs the production of new infections when all individuals are susceptible (in an SEIR model, for example, this subsystem consists of the S, E and I compartments). The resulting Jacobian matrix is known as the next generation matrix [27], whose spectral radius corresponds to the basic reproduction number R₀. If the linearization of the infected subsystem is conducted at time t, the spectral radius of the resulting matrix is known as the effective reproduction number R_t. It represents the number of secondary cases that an infected individual produces at time t, assuming the remaining non-infected or recovered population is susceptible. A review of the topic can be found in [28].
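The next-generation construction can be illustrated numerically. The sketch below follows the standard F V⁻¹ recipe, taking the exposed and infected compartments of every age group as the infected subsystem and assuming that new exposures in group j occur at rate S_j Σ_k λ_jk I_k / N_k (our assumption, since the paper's equations are not reproduced here). It returns the spectral radius of F V⁻¹: with S = N this gives the basic reproduction number, and with the current susceptible population it gives the effective reproduction number.

```python
import numpy as np


def reproduction_number(lam, S, N, tau_E, tau_I):
    """Spectral radius of the next generation matrix F V^{-1} (illustrative).

    Infected subsystem assumed to be (E_1..E_n, I_1..I_n), with
      dE_j/dt = S_j * sum_k lam[j, k] * I_k / N_k - E_j / tau_E
      dI_j/dt = E_j / tau_E - I_j / tau_I
    Passing S = N gives R0; passing the current S gives R_t.
    """
    lam = np.asarray(lam, dtype=float)
    S = np.asarray(S, dtype=float)
    N = np.asarray(N, dtype=float)
    n = lam.shape[0]
    Z, Id = np.zeros((n, n)), np.eye(n)
    A = S[:, None] * lam / N[None, :]           # new exposures in j caused by I_k
    F = np.block([[Z, A], [Z, Z]])              # appearance of new infections
    V = np.block([[Id / tau_E, Z],              # transitions out of E ...
                  [-Id / tau_E, Id / tau_I]])   # ... into I, and out of I
    K = F @ np.linalg.inv(V)
    return float(np.max(np.abs(np.linalg.eigvals(K))))
```

In this form the exposed stage only delays transmission, so the result does not depend on `tau_E`; it scales linearly with `tau_I` and with the susceptible fraction.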

2.2 State-parameter estimation with ensemble-based data assimilation

The evolution of epidemiological variables can be modeled as a partially observed time-evolving process, i.e. a hidden Markov model [29]. Within this framework, the evolution of the state of the system can be written as

$$x_k = M(x_{k-1}) + \eta_k \qquad (3)$$

where x_k is the state of the system at time k, M(·) is the dynamical model and η_k is the model error, which represents simplifications and numerical approximations in the model. The model error is assumed to be a realization of a normal distribution with mean 0 and a given covariance matrix. In our case, the dynamical model M(·) is given by (1). The second set of equations forming the hidden Markov model corresponds to the observational map, i.e. the relationship between the state variables and the quantities that can be measured, such as the daily number of new infections. The observations y_k are related to the state x_k by the observation operator H, which maps the space of state variables to the observational space:

$$y_k = H(x_k) + \epsilon_k \qquad (4)$$

where ϵ_k is the observation error, assumed to be a realization of N(0, Ř), and Ř is the observation error covariance matrix (assumed known). It represents the uncertainty in the observed values, such as that produced by limitations in testing strategies, delays in the transmission of reports, etc. The observation covariance matrix is assumed diagonal:

$$\check{R} = \mathrm{diag}\left(\sigma_1^2, \ldots, \sigma_m^2\right) \qquad (5)$$

where σ_l² is the variance of the l-th of the m observed variables. These variances are defined in Section 3. The observation covariance could be estimated with expectation-maximization [30]; here, however, we keep it fixed.

If the hidden Markov model (3) and (4) satisfies the conditions of detailed balance, the inference may be greatly simplified. Here we assume a general framework and use filtering theory for the inference.

In filtering theory, the estimation problem consists of obtaining the conditional probability density function (pdf) of x_k given the current and past observations Y_k = {y_1, ..., y_k}, denoted by p(x_k | Y_k) (also known as the filtering or analysis distribution). The prediction pdf, p(x_k | Y_{k-1}), is obtained by performing a forecast step

$$p(x_k \mid Y_{k-1}) = \int p(x_k \mid x_{k-1})\, p(x_{k-1} \mid Y_{k-1})\, dx_{k-1} \qquad (6)$$

then, using Bayes' theorem, the posterior density conditioned on the set of observations is obtained,

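$$p(x_k \mid Y_k) = \frac{p(y_k \mid x_k)\, p(x_k \mid Y_{k-1})}{\int p(y_k \mid x_k)\, p(x_k \mid Y_{k-1})\, dx_k} \qquad (7)$$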

Eqs. (6) and (7) can be solved sequentially every time new observations y_k become available, but they involve integrals over the entire state space, which are usually computationally intractable. However, with a sample-based representation of the distributions, the forecast step can be approximated by a Monte Carlo approach that simply evolves every sample point forward with the model M(·). In this work we employ the EnKF, a Monte Carlo extension of the Kalman filter to nonlinear models [31]. The analysis distribution is represented by an ensemble of possible states, and the resulting analysis state members are of the form

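$$x^{a(i)}_k = x^{f(i)}_k + K_k\left(y_k + \epsilon^{(i)}_k - H\big(x^{f(i)}_k\big)\right) \qquad (8)$$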

where the superscript i denotes the i-th ensemble member. Each forecast state member is obtained by evolving the model (1) from the previous analysis member, x^{f(i)}_k = M(x^{a(i)}_{k-1}), and is then corrected with an innovation term based on the difference between the observation and the forecast state. In Eq (8), the observation vector is perturbed with Gaussian noise ϵ^{(i)}_k, drawn independently for each member from N(0, Ř). This is required for the sample covariance of the analysis members to match the expected analysis covariance [32]. The matrix K_k, referred to as the Kalman gain, weights the innovation term and is the ratio between the forecast uncertainty and the total uncertainty, namely

$$K_k = P^f_k \check{H}^\top \left(\check{H} P^f_k \check{H}^\top + \check{R}\right)^{-1} \qquad (9)$$

where P^f_k is the forecast error covariance matrix and Ȟ is the tangent linear observation operator, defined by

$$\check{H} = \left.\frac{\partial H}{\partial x}\right|_{x = \bar{x}^f_k} \qquad (10)$$

In an ensemble Kalman filter, P^f_k is estimated from the ensemble of forecast state vectors at time k:

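$$P^f_k = \frac{1}{N_e - 1}\sum_{i=1}^{N_e}\left(x^{f(i)}_k - \bar{x}^f_k\right)\left(x^{f(i)}_k - \bar{x}^f_k\right)^\top, \qquad \bar{x}^f_k = \frac{1}{N_e}\sum_{i=1}^{N_e} x^{f(i)}_k \qquad (11)$$

where N_e is the number of ensemble members.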

The analysis mean state, x̄^a_k (the average of the analysis ensemble members), provides a point estimate of the state of the system.
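The analysis step of Eqs (8), (9) and (11) can be condensed into a few lines of NumPy. The following sketch of a stochastic (perturbed-observation) EnKF analysis is illustrative rather than the authors' implementation; in particular, instead of the tangent linear operator Ȟ of (10), it estimates P^f_k Ȟ^T and Ȟ P^f_k Ȟ^T directly from the ensemble mapped through the observation operator, a common variant in ensemble filters. The function and argument names are our own.

```python
import numpy as np


def enkf_analysis(Xf, y, obs_op, R, rng):
    """Stochastic (perturbed-observation) EnKF analysis step.

    Xf     : (d, Ne) forecast ensemble, one member per column
    y      : (m,) observation vector
    obs_op : callable mapping a state (d,) to observation space (m,)
    R      : (m, m) observation error covariance
    rng    : numpy.random.Generator used for the observation perturbations
    """
    Ne = Xf.shape[1]
    HX = np.column_stack([obs_op(Xf[:, i]) for i in range(Ne)])
    Xp = Xf - Xf.mean(axis=1, keepdims=True)       # state anomalies
    HXp = HX - HX.mean(axis=1, keepdims=True)      # observation-space anomalies
    PHt = Xp @ HXp.T / (Ne - 1)                    # ensemble estimate of P^f H^T
    HPHt = HXp @ HXp.T / (Ne - 1)                  # ensemble estimate of H P^f H^T
    K = np.linalg.solve(HPHt + R, PHt.T).T         # Kalman gain, as in Eq (9)
    # Independent Gaussian perturbation of the observations for each member
    eps = rng.multivariate_normal(np.zeros(len(y)), R, size=Ne).T
    return Xf + K @ (y[:, None] + eps - HX)        # analysis members, as in Eq (8)
```

Solving the linear system instead of forming the inverse of the innovation covariance is the usual numerically safer choice; for a linear `obs_op` the ensemble products coincide with P^f_k Ȟ^T and Ȟ P^f_k Ȟ^T computed from (11).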

2.3 Observation operator

Observations are assumed to be cumulative cases and deaths, both disaggregated by age. The map from state to observation space assumes that cases are only partially documented. This is represented by a time-dependent parameter γ_j in the observation operator H, which accounts for the under-detection of cases; in other words, we assume an under-detection bias in some observed variables. This parameter depends on the age group because symptom severity tends to increase with age, so that the proportion of undocumented cases is larger among children. In age group j, the cumulative observed cases and observed deaths are related to the state variables at time k by

(12)

where , and

(13)

To avoid cluttering the notation, we omit the time index k in this equation; note, however, that γ is a time-varying parameter.

For parameter estimation with the ensemble-based data assimilation technique, the model parameters to be estimated and the state variables of all compartments are concatenated into an augmented state vector x. The model parameters are then estimated in the same way as the state variables, using the EnKF. This parameter estimation methodology is known as state augmentation; a review of parameter estimation with various data assimilation methods based on this approach can be found in [33]. The fractions of detected cases γ_j are also estimated in this way. The prior density at the initial time is assumed Gaussian for both model variables and parameters, which is coherent with the Gaussian assumption of the Kalman filter. Although the parameters are not governed by the model equations, they can be estimated in the same way as the model variables, through their correlations with the observed variables. Therefore, parameter estimation depends crucially on an accurate quantification of the augmented forecast covariance matrix (11).
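State augmentation amounts to stacking states and parameters into a single vector, so that the EnKF update treats them alike. A minimal sketch with hypothetical helper names, assuming the eight compartments of each age group are stored as the rows of an (8, n) array, is:

```python
import numpy as np

N_COMPARTMENTS = 8  # compartments per age group, stored as rows of an (8, n) array


def pack(states, params):
    """Concatenate model states (8, n) and parameters (p,) into one augmented vector."""
    return np.concatenate([np.ravel(states), np.ravel(params)])


def unpack(x, n):
    """Split an augmented vector into states (8, n) and parameters (p,)."""
    k = N_COMPARTMENTS * n
    return x[:k].reshape(N_COMPARTMENTS, n), x[k:]
```

With n = 3 and six parameters (three for the parameterized transmission matrix and three detection fractions) this yields a 30-dimensional vector, and with the nine parameters of the full matrix a 33-dimensional one, as counted in Section 4.1.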

While the model dynamics drive the evolution of the state variables and increase their ensemble spread, persistence is assumed for the time evolution of the parameters. Because of this, an inflation method is required to prevent the parameter ensemble spread from collapsing over successive assimilation cycles (e.g. Ruiz et al. 2013 [33]).

We conducted preliminary experiments to evaluate the use of multiplicative inflation in the EnKF framework. Although we used two independent inflation factors, one for the parameters and one for the state variables [34], we were not able to find a suitable pair of inflation factors: the combinations we tried resulted in either filter divergence or poor estimation performance. Consequently, we opted for the stochastic approach originally proposed in [35]. The parameters of each ensemble member evolve as an independent auto-regressive process, or correlated random walk, with correlation ρ and standard deviation σ, that is,

(14)

where ξ is a random vector drawn from a Gaussian distribution with zero mean and identity covariance matrix. The random perturbation is added before the analysis step and is applied only to the parameters; no inflation is applied to the model state variables.

During the analysis update, the EnKF can produce non-physical values of some model parameters in some ensemble members (e.g. negative transmission matrix elements). This is a consequence of the Gaussian forecast error assumption of the EnKF. To avoid this complication, we set a lower limit of 0 for all the estimated parameters in each ensemble member.

To summarize our estimation method, the EnKF methodology is represented concisely in Algorithm 1.

Algorithm 1: Stochastic ensemble Kalman Filter
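Since the steps of Algorithm 1 are not reproduced here, the sketch below strings together the ingredients described in this section in one possible implementation: forecast of each augmented member, a correlated random walk applied to the parameters only, a perturbed-observation analysis, and clipping of negative parameters. It is a hedged illustration, not the authors' code: the AR(1)-increment form of the random walk is our reading of (14), whose exact expression is given in the paper, and the function and argument names are hypothetical.

```python
import numpy as np


def stochastic_enkf(Xa0, model, obs_op, observations, R, n_par, rho, sigma, rng):
    """Cycle a stochastic EnKF with state augmentation (illustrative sketch).

    Xa0          : (d, Ne) initial ensemble; the last n_par rows are parameters
    model        : advances one augmented member (d,) over one assimilation
                   cycle, keeping the parameters unchanged (persistence)
    obs_op       : maps an augmented member (d,) to observation space (m,)
    observations : sequence of (m,) observation vectors, one per cycle
    R            : (m, m) observation error covariance
    rho, sigma   : correlation and standard deviation of the random walk
    """
    Xa = np.array(Xa0, dtype=float)
    d, Ne = Xa.shape
    p = slice(d - n_par, d)                 # rows holding the parameters
    eta = np.zeros((n_par, Ne))             # random-walk increments
    means = []
    for y in observations:
        # 1. Forecast: evolve every ensemble member with the model.
        Xf = np.column_stack([model(Xa[:, i]) for i in range(Ne)])
        # 2. Correlated random walk (assumed AR(1) increments) on parameters only.
        xi = rng.standard_normal(eta.shape)
        eta = rho * eta + sigma * np.sqrt(1.0 - rho**2) * xi
        Xf[p] += eta
        # 3. Analysis with perturbed observations; covariances from the ensemble.
        HX = np.column_stack([obs_op(Xf[:, i]) for i in range(Ne)])
        Xp = Xf - Xf.mean(axis=1, keepdims=True)
        HXp = HX - HX.mean(axis=1, keepdims=True)
        S = HXp @ HXp.T / (Ne - 1) + R      # innovation covariance
        K = np.linalg.solve(S, HXp @ Xp.T / (Ne - 1)).T
        eps = rng.multivariate_normal(np.zeros(len(y)), R, size=Ne).T
        Xa = Xf + K @ (np.asarray(y)[:, None] + eps - HX)
        # 4. Enforce non-negative parameters in each member.
        Xa[p] = np.maximum(Xa[p], 0.0)
        means.append(Xa.mean(axis=1))
    return Xa, np.array(means)
```

For example, with a toy model `lambda z: np.array([z[0] * (1 + z[1]), z[1]])`, whose growth rate `z[1]` plays the role of an augmented parameter, and `obs_op = lambda z: z[:1]`, the function returns the final ensemble and the analysis mean at every cycle.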

3 Experimental details

3.1 Transmission matrix parameterizations

For an n × n transmission matrix there are n(n+1)/2 independent parameters to be estimated instead of n², because of the restriction (2). In our case we use three age groups, so the resulting transmission matrix is

(15)

where parameters depend on time.

As shown by the experiments in Section 4, the parameters of (15) are not all identifiable when only the accumulated infection cases in each group are available, without information on which age group is responsible for the new exposed individuals.

To overcome this limitation, we propose a parameterization for the transmission matrix with fewer parameters:

(16)

from now on, we call this matrix the parameterized transmission matrix.

This parameterization is a particular case of (15) in which each upper off-diagonal parameter λ_ij is defined as a function of the diagonal elements of row i and column j (λ_ii and λ_jj), and the lower off-diagonal parameters are defined by the constraint (2). The parameter α controls the relative importance of inter-age-group and intra-age-group infections, with lower values giving more weight to the latter.

3.2 Data

We use three age groups: [0, 30), [30, 65) and 65 years or older. This division aims to represent age groups with different activities: the activities of children and young people are mainly school and university, adults form the working age group, and the senior population is assumed to be mainly retired. At the same time, these groups roughly represent different health profiles: the senior population is the most likely to develop severe symptoms, while the first age group is more likely to have minor symptoms. The total population is assumed to be 44.8 million, divided among the three age groups (with 4.8 × 10⁶ people in the senior group) according to the approximate number of people within each of these age groups in Argentina (taken from the 2010 population census).

Synthetic observations.

In a data assimilation context, a “twin” experiment is an idealized simulation in which a known epidemiological model with a set of “true” parameters is used to generate synthetic observations: the model state evolution is computed first, and random noise is then added in observation space to simulate the observation error (4). Here, we generate cumulative cases and deaths for each age group using (1) and add Gaussian noise to them to produce the synthetic observations. The EnKF is then used to reconstruct the evolution of the system and the parameters from the synthetic observations, so that the estimated parameters can be assessed against the “true” parameters (i.e., those used to generate the synthetic observations). The objective of the twin experiments is to evaluate the data assimilation-based parameter estimation in a context in which the true parameters are known and estimation errors can be computed accurately. We use “true” to refer to the state variables and parameters used in the model to generate the synthetic observations.

The model used to simulate the observations uses a “true” transmission matrix which has the form (15).

The “true” parameters are defined as

(17)

The decrease in the transmission matrix parameters at t = 80 d mimics the effect of a lockdown, and the increase at time 140 d represents a relaxation to normal conditions but with some sanitary measures (e.g. social distancing, mandatory use of masks in public spaces, etc). These conditions result in a double outbreak situation as observed in Argentina (and several other countries) in the first year of the pandemic.

This time-dependent “true” transmission matrix must be estimated by the assimilation in the twin experiment. Note that the relative changes in the parameters differ among age groups (i.e. they are not proportional). We deliberately chose a transmission matrix that cannot be fully represented by the parameterization (16), so that the model used in the estimation is not perfect (some structural uncertainty is introduced by the parameterization). Another motivation was to represent the different levels of mobility observed in different age groups.

The true values of the fraction of detected cases, γ_j, are taken to be 0.15, 0.2 and 0.3 for the young, adult and senior age groups, respectively. For reference, a detection fraction of 0.16 was estimated in [12] for a well-mixed population in the early stage of the pandemic. Intuitively, we expect a higher fraction of symptomatic cases in the elderly age group, as it is the most vulnerable population. The fractions of deaths of the three age groups are assumed to be 0.002, 0.05 and 0.1, respectively. They are based on the values reported in [36], weighted by the population of the corresponding age groups in Argentina.
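The population weighting mentioned above can be sketched as a weighted average of finer age-band values within each model age group. The function name and the numbers used to exercise it are hypothetical and only illustrate the arithmetic; they are not the values of [36].

```python
import numpy as np


def weighted_group_fraction(values, populations, group_of_band, n_groups):
    """Population-weighted average of age-band fractions within each model group.

    values[b]        : fraction (e.g. of deaths) for fine age band b
    populations[b]   : population of band b
    group_of_band[b] : index of the model age group containing band b
    """
    values = np.asarray(values, dtype=float)
    populations = np.asarray(populations, dtype=float)
    groups = np.asarray(group_of_band)
    num = np.bincount(groups, weights=values * populations, minlength=n_groups)
    den = np.bincount(groups, weights=populations, minlength=n_groups)
    return num / den
```

For instance, two bands with fractions 0.1 and 0.3 and populations 1 and 3 that fall in the same model group combine to (0.1 × 1 + 0.3 × 3) / 4 = 0.25.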

Cumulative infected cases and deaths disaggregated by age group are assumed to be observed daily during the whole period. For the observational error, the standard deviation of the accumulated cases is set as a function of the observed cumulative cases of each age group j, with an upper limit. We assume that deaths are well documented, which is reflected in the standard deviation prescribed for the death observational error. With this definition of the observational error, all the observations eventually reach the upper-limit standard deviation. These observation standard deviations are used both to generate the synthetic observations and as the observation variances in the ensemble Kalman filter (9).
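A twin-experiment observation generator following this description might look like the sketch below. Because the exact expressions for the standard deviations are not reproduced here, the error model, a standard deviation proportional to the value with a cap (`rel` and `cap`, both hypothetical), is our assumption, chosen only to mimic the upper limit mentioned above. The returned standard deviations can be squared to form the diagonal of the observation covariance (5).

```python
import numpy as np


def synthetic_observations(truth, rel, cap, rng):
    """Perturb 'true' cumulative values to build twin-experiment observations.

    Hypothetical error model: std = min(rel * truth, cap), proportional to the
    value until it saturates at an upper limit.
    Returns the noisy observations and the standard deviations used.
    """
    truth = np.asarray(truth, dtype=float)
    std = np.minimum(rel * truth, cap)
    return truth + std * rng.standard_normal(truth.shape), std
```

Using the same `std` values to build `np.diag(std**2)` keeps the filter's observation covariance consistent with the noise actually added, as in the twin experiments described above.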

Real world observations.

For the real-world experiments, we use epidemiological data from Argentina collected by the National Health Surveillance System (SNVS, for its acronym in Spanish). The SNVS dataset is openly available (http://datos.salud.gob.ar/dataset/covid-19-casos-registrados-en-la-republica-argentina) and consists of all the tests reported by the public and private health sectors. The available information for each case includes, among other data, the date of the test, the province of residence, the age, whether the person required hospitalization (with or without respiratory support) and whether the person died during the infection. The first case of SARS-CoV-2 in Argentina was reported on March 3, 2020. Just 16 days later, on March 19, 2020, a nationwide lockdown was established. The curated time series used in this work are available at http://covid19.unne.edu.ar/obs_arg.csv.

4 Results

We present our results in the following order:

In Subsection 4.1, we evaluate the model and the data assimilation framework with synthetic observations (Section 3.2.1).
In Subsection 4.2, we apply the methodology to COVID-19 data from Argentina (Section 3.2.2).
In Subsection 4.3, we conduct forecasts with the data from Argentina to examine the performance of the meta-population model coupled with the EnKF.

4.1 Experiments with synthetic observations

The EnKF estimates all the variables of the system together with the parameters of the transmission matrix, which are appended to the state vector. The state vector has dimension 24 (eight variables in each of the three age groups), of which 21 are independent because of the 3 constraints given by the last equation in (1). With the parameterized transmission matrix, six parameters are estimated: three from the parameterized transmission matrix and three fractions of detected cases, namely the parameter vector is , so the augmented state vector has dimension 30. With the full transmission matrix (15), nine parameters are estimated: six from the matrix and three fractions of detected cases, i.e. θ = (λ₁₁, λ₂₂, λ₃₃, λ₁₂, λ₁₃, λ₂₃, γ₁, γ₂, γ₃), so the augmented state vector has dimension 33.
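A minimal sketch of state augmentation with a stochastic (perturbed-observation) EnKF analysis step, in the spirit of [32], follows. It assumes generic array shapes: 24 model variables and six parameters per ensemble member, and a linear observation operator `H` that selects the observed cumulative variables. The observed indices, the random stand-in ensemble and the helper names are hypothetical placeholders, not the ordering of Eq (1). The point it illustrates is that the parameters have no direct observations and are corrected only through their sampled covariance with the observed variables.

```python
import numpy as np

N_GROUPS, N_VARS, N_PARAMS = 3, 8, 6   # parameterized case: 3 matrix + 3 detection parameters

def augment(states, params):
    """Stack states (n_ens, 24) and parameters (n_ens, 6) into one (n_ens, 30) array."""
    return np.hstack([states, params])

def enkf_analysis(X, y, R, H, rng):
    """Stochastic EnKF update of the augmented ensemble X (n_ens, n) given
    observations y (m,), observation error covariance R (m, m) and a linear
    observation operator H (m, n)."""
    y = np.asarray(y, dtype=float)
    n_ens = X.shape[0]
    HX = X @ H.T                                  # predicted observations (n_ens, m)
    dX = X - X.mean(axis=0)                       # state and parameter anomalies
    dY = HX - HX.mean(axis=0)                     # anomalies in observation space
    Pxy = dX.T @ dY / (n_ens - 1)                 # cross-covariance (n, m)
    Pyy = dY.T @ dY / (n_ens - 1) + R             # innovation covariance (m, m)
    K = np.linalg.solve(Pyy, Pxy.T).T             # Kalman gain Pxy @ inv(Pyy)
    # Each member assimilates its own perturbed copy of the observations
    Y = y + rng.multivariate_normal(np.zeros(y.size), R, size=n_ens)
    return X + (Y - HX) @ K.T

# Usage with random numbers standing in for a forecast ensemble
rng = np.random.default_rng(1)
n_ens = 50
X = augment(rng.normal(size=(n_ens, N_GROUPS * N_VARS)), rng.normal(size=(n_ens, N_PARAMS)))
obs_idx = [6, 7, 14, 15, 22, 23]                  # hypothetical positions of observed variables
H = np.zeros((len(obs_idx), X.shape[1]))
H[np.arange(len(obs_idx)), obs_idx] = 1.0
Xa = enkf_analysis(X, np.zeros(len(obs_idx)), 0.1 * np.eye(len(obs_idx)), H, rng)
print(X.shape, Xa.shape)
```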

As mentioned, parameter estimation with the EnKF requires inflating the parameter spread [33]. The filter converges when the correlated random walk (14) is used with high values of ρ and with σ in the range [0.001, 0.2]. In preliminary experiments, we determined the additive inflation parameters that minimize the root-mean-square error (RMSE, i.e. ), obtaining σ = 0.05 and ρ = 0.999. The same random walk parameters are used in the real-data experiments (Section 4.2).
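A minimal sketch of such additive parameter inflation follows, assuming one possible form of a correlated random walk: each member's perturbation follows a first-order autoregressive process with correlation `rho` and stationary standard deviation `sigma`, and is added to the analysis parameters before the next forecast. The defaults match the values quoted above, but the update rule and the function name are assumptions. With ρ = 0.999, consecutive perturbations are almost identical, so each member is nudged in a consistent direction over many cycles rather than by independent noise.

```python
import numpy as np

def correlated_random_walk(params, eta, rng, rho=0.999, sigma=0.05, lower=0.0):
    """Hypothetical additive inflation for a parameter ensemble.

    params: (n_ens, n_par) parameters after the analysis.
    eta:    (n_ens, n_par) perturbations used in the previous cycle.
    The new perturbation is an AR(1) update of eta whose stationary standard
    deviation is sigma; perturbed parameters are kept >= lower."""
    noise = rng.standard_normal(eta.shape)
    eta_new = rho * eta + np.sqrt(1.0 - rho**2) * sigma * noise
    return np.maximum(params + eta_new, lower), eta_new

rng = np.random.default_rng(2)
params = np.full((5000, 3), 0.3)                 # hypothetical analysis ensemble
eta = 0.05 * rng.standard_normal(params.shape)   # start at the stationary spread
for _ in range(100):
    eta_prev = eta
    params_new, eta = correlated_random_walk(params, eta, rng)
print(eta.std(), np.corrcoef(eta_prev.ravel(), eta.ravel())[0, 1])
```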

Fig 2 shows the estimated parameters for the synthetic observation experiment (Section 3.2.1) with the full transmission matrix (15). The estimated transmission matrix parameters lag behind the true parameters at the abrupt changes caused by the lockdown (both at its beginning and at its end). The estimates start to adjust a few days after each change and converge to a new value 20-30 days later. This happens because, in ensemble-based assimilation systems, parameters are estimated through their correlations with observed variables, and these state-parameter correlations take several assimilation cycles to adapt to abrupt changes. The delay can be reduced by increasing the amount of inflation, at the expense of a larger spread in the ensemble of estimated parameters and state variables. Apart from this delay, the amplitude of the abrupt changes is well estimated.


Fig 2. Estimated diagonal values (left panels) and off-diagonal parameters (right panels) of the full transmission matrix are shown for the synthetic observation experiments using different initial conditions in the parameters (color lines, IC1, IC2 and IC3).

True parameter values are shown with black lines. Shading around the curves indicates the parameter spread.

https://doi.org/10.1371/journal.pone.0318426.g002

To examine the identifiability of the estimated parameters and their sensitivity to initial conditions, Fig 2 shows three independent experiments, denoted IC1, IC2 and IC3, that start from different initial mean parameters (transmission matrix and fractions of detected cases) at t = 0. Part of the time variability of the true parameters is captured. However, the three experiments converge to different parameter values, and none of them estimates the true parameters precisely (Fig 2).

The reason for this lack of identifiability is that an increase in the rate of cases in, say, age group 1 can be attributed by the assimilation system either to a change in the parameter or to a change in λ₁₁ and . Both scenarios result in the same infection rates, so the observations do not provide enough information to identify the actual scenario. In the experiment, the green curves in Fig 2 (IC2) show a larger underestimation at the end of the lockdown than the orange and blue curves (IC1 and IC3). This underestimation is compensated by an overestimation of λ₁₂, leading to an evolution of the number of cases that is consistent with the observations.
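The argument can be made concrete with a toy linear map. Assume, for illustration only, that the per-capita infection rate of group i is F_i = s_i Σ_j Λ_ij x_j, where s_i and x_j are the susceptible and infectious fractions and Λ is the symmetric full transmission matrix with θ = (λ₁₁, λ₂₂, λ₃₃, λ₁₂, λ₁₃, λ₂₃). At a given time, three infection rates constrain six parameters, so the map θ → F has a three-dimensional null space: moving θ along it changes the matrix but not the infection rates. The fractions and parameter values below, and the helper names, are hypothetical.

```python
import numpy as np

def force_matrix(x, s):
    """Matrix A with F = A @ theta for theta = (l11, l22, l33, l12, l13, l23),
    where F_i = s_i * sum_j L_ij * x_j and L is symmetric."""
    (x1, x2, x3), (s1, s2, s3) = x, s
    return np.array([
        [s1 * x1, 0.0, 0.0, s1 * x2, s1 * x3, 0.0],
        [0.0, s2 * x2, 0.0, s2 * x1, 0.0, s2 * x3],
        [0.0, 0.0, s3 * x3, 0.0, s3 * x1, s3 * x2],
    ])

def null_space(A, rtol=1e-12):
    """Orthonormal basis (columns) of the null space of A, from the SVD."""
    _, sv, vt = np.linalg.svd(A)
    rank = int(np.sum(sv > rtol * sv.max()))
    return vt[rank:].T

x = np.array([3e-4, 5e-4, 2e-4])          # hypothetical infectious fractions
s = np.array([0.95, 0.93, 0.97])          # hypothetical susceptible fractions
theta1 = np.array([0.5, 0.4, 0.2, 0.15, 0.05, 0.08])
A = force_matrix(x, s)
theta2 = theta1 + 0.01 * null_space(A)[:, 0]  # a different symmetric matrix...
print(A @ theta1, A @ theta2)                 # ...with the same infection rates
```

In this toy map, stacking the rows of a later time at which infections keep the same age profile (a rescaled `x`) leaves the null space three-dimensional, so more observations of that kind do not resolve the ambiguity.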

Fig 3 shows the estimated daily cases (left panels) and deaths (right panels) of the young (upper panels), adult (middle panels), and senior (lower panels) age groups, using the full transmission matrix (15). Although the assimilated observations are cumulative cases and deaths, we show daily cases (the new cases produced in one day) because differences are more visible. The term “daily cases” is taken from [12]. The three experiments with different initial conditions in the parameters give similar results (their curves are indistinguishable in Fig 3). In all three experiments, the EnKF keeps track of the observed cases and deaths in every age group, even though the transmission matrix parameters are not identifiable. The ensemble spread in the senior age group is relatively larger because its population is almost five times smaller than that of the other age groups, while all the age groups share the same upper limit of the observation error, so the relative error of the estimate is higher.


Fig 3. Estimated daily cases (left panels) and daily deaths (right panels) of the young (upper panels), adult (middle panels), and senior (lower panels) age groups for the full transmission matrix experiment with synthetic observations and different initial conditions in the parameters (color lines, IC1, IC2 and IC3).

The observations are shown with red dots and black lines represent the true variable values (they are almost indistinguishable). Shaded areas around colored curves indicate the corresponding variable spread.

https://doi.org/10.1371/journal.pone.0318426.g003

Given that the transmission matrix parameters are not identifiable with the matrix form (15), we conducted a second set of estimation experiments using the proposed parameterization (16) and the same set of synthetic observations. We took α = 0.25 in (16), which represents the degree of inter-group contagion. Fig 4 shows the estimated parameters of the parameterized transmission matrix: the left panels show the diagonal values, and the right panels show the upper off-diagonal values.
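As a hypothetical illustration of how a single mixing parameter α can reduce a symmetric 3 × 3 transmission matrix to three free parameters, the sketch below takes the diagonal rates λᵢ as the estimated parameters and sets each off-diagonal entry to α√(λᵢλⱼ). This geometric-mean coupling and the function name are assumptions and not necessarily the form of (16).

```python
import numpy as np

def parameterized_matrix(lam, alpha=0.25):
    """Hypothetical three-parameter transmission matrix.

    lam holds the diagonal (within-group) rates; each off-diagonal entry is
    alpha * sqrt(lam_i * lam_j), so the matrix is symmetric and non-negative
    whenever lam >= 0, and only len(lam) numbers need to be estimated."""
    lam = np.asarray(lam, dtype=float)
    L = alpha * np.sqrt(np.outer(lam, lam))
    np.fill_diagonal(L, lam)
    return L

print(parameterized_matrix([0.4, 0.5, 0.1]))
```

Whatever its exact form, a fixed coupling structure of this kind cannot reproduce an arbitrary true matrix; in exchange, only three transmission parameters enter the augmented state.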


Fig 4. Estimated diagonal values of the parameterized transmission matrix (left panels) and off-diagonal values (right panels) using synthetic observations for different initial conditions (colored lines, IC1, IC2 and IC3), and the true parameter values (black lines).

Shaded areas around colored curves indicate the parameter ensemble.

https://doi.org/10.1371/journal.pone.0318426.g004

The three experiments converge to the same estimated parameter values, independently of the initial condition. The true values of the parameters cannot be estimated precisely because this parameterization is not able to fit the structure of the prescribed true transmission matrix (17). The parameter estimates in Fig 4 also show a delay in the representation of the sudden parameter changes found at the beginning and at the end of the lockdown period, as found in Fig 2.

Fig 5 shows the effective reproduction number computed with the next generation matrix for the experiments with the parameterized transmission matrix (left panel) and with the full transmission matrix (right panel). The true effective reproduction number is accurately estimated with both matrix forms (apart from the delay after parameter changes), even though the true transmission matrix cannot be reproduced by the parameterized transmission matrix. This result suggests that the parameterized transmission matrix is flexible enough to capture the system's effective reproduction number and its temporal evolution, while its low dimensionality ensures that its parameters are identifiable from the available observations.
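For intuition on how an effective reproduction number follows from a transmission matrix, the sketch below uses a deliberately simplified next-generation matrix: one infectious compartment per age group with a hypothetical mean duration `infectious_days`, and new infections in group i caused by group j proportional to Λᵢⱼ Sᵢ/Nⱼ. The meta-population model has more compartments per group, so its next generation matrix differs; only the final step, taking the spectral radius, carries over unchanged. Group sizes and matrix values are hypothetical.

```python
import numpy as np

def effective_reproduction_number(L, S, N, infectious_days):
    """Spectral radius of a simplified next-generation matrix.

    K[i, j] = infectious_days * L[i, j] * S[i] / N[j] is the expected number of
    new infections in group i produced by one infectious person of group j
    over an (assumed) single infectious period of mean length infectious_days."""
    L, S, N = (np.asarray(a, dtype=float) for a in (L, S, N))
    K = infectious_days * L * np.outer(S, 1.0 / N)
    return float(np.max(np.abs(np.linalg.eigvals(K))))

N = np.array([1.0e6, 2.0e6, 4.0e5])             # hypothetical group sizes
L = 0.1 * np.ones((3, 3))                       # hypothetical uniform mixing
print(effective_reproduction_number(L, N, N, 5.0))        # fully susceptible
print(effective_reproduction_number(L, 0.5 * N, N, 5.0))  # half susceptible
```

With uniform mixing the result is 1.5 for a fully susceptible population (up to rounding), and halving the susceptible population halves it to 0.75.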


Fig 5. Estimated effective reproduction number using the parameterized transmission matrix (left panel) and the full transmission matrix (right panel).

The two panels correspond to different assimilation experiments. Colored curves represent estimates with different initial conditions (IC1, IC2 and IC3), and black curves represent the true values. Shaded areas around colored curves indicate the spread.

https://doi.org/10.1371/journal.pone.0318426.g005

Fig 6 (right panels) shows the estimated fraction of detected cases of each age group. We expect these parameters to be correlated with the observed accumulated cases and deaths, so the system should be able to constrain them. Indeed, their true values are accurately estimated by the assimilation system, regardless of the initial condition. The spurious peaks estimated in the parameterized transmission matrix at the lockdown transitions also appear in these parameters around day 80 and, much less intensely, around day 170.


Fig 6. Estimated fraction of deaths for each age group (left panels) and estimated fraction of detected cases for each age group (right panels).

Left and right panels correspond to different assimilation experiments (either the fractions of detected cases or the fractions of deaths are estimated). Estimates with different initial conditions are shown with colored curves (IC1, IC2 and IC3), and black curves indicate the true parameter values. Shading around colored curves represents the parameter spread.

https://doi.org/10.1371/journal.pone.0318426.g006

In the previous experiment, we estimated a parameterized transmission matrix and the fractions of detected cases γ₁, γ₂ and γ₃. The cases, deaths, and transmission parameters can also be estimated together with the fractions of deaths instead of the fractions of detected cases. To illustrate this, we fix γ₁, γ₂ and γ₃ to their true values and perform three experiments that estimate the transmission matrix and the fractions of deaths. The estimated transmission matrix parameters are similar to those shown in Fig 4. The estimated fractions of deaths are shown in the left panels of Fig 6. In all experiments, they converge to the true values, and the sudden changes in the estimates again appear at the times when the true transmission matrix parameters change. In the upper left panel of Fig 6 (fraction of deaths of the young age group), the shading is cut at zero because this is the lower bound imposed on the parameters of each ensemble member.
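The effect of such a bound on a Gaussian ensemble can be seen in a small sketch. The ensemble spread used below is hypothetical; when it is comparable to the distance from zero, clipping each member at zero piles a sizable fraction of members exactly on the bound and raises the ensemble mean, so the clipped ensemble is no longer Gaussian. This illustrates how the bound conflicts with the Gaussian assumption of the EnKF.

```python
import numpy as np

def clip_ensemble(params, lower=0.0):
    """Enforce a lower bound member by member (no mass is redistributed)."""
    return np.maximum(params, lower)

rng = np.random.default_rng(3)
# Hypothetical Gaussian ensemble for the young-group fraction of deaths,
# centred at its true value 0.002 with a spread comparable to the mean
ens = rng.normal(0.002, 0.003, size=10_000)
clipped = clip_ensemble(ens)
print(ens.mean(), clipped.mean(), np.mean(clipped == 0.0))
```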

4.2 Experiments with the Argentinian COVID-19 data

An experiment was conducted with the same assimilation system as in the previous section but using the real COVID-19 data from Argentina (Section 3.2.2). Unlike in the twin experiments, the observations in this case may be biased and the observation error covariance is unknown. Indeed, the observed cases are very noisy. One source of noise is that testing and reporting decrease on weekends and increase on Mondays and Tuesdays because of delayed reports, resulting in a spurious weekly cycle in the observations.

In the real-data experiments, we estimate the time-dependent fractions of deaths and the parameterized transmission matrix (16) with α = 0.25. The parameter vector in this case is . The estimation covers a period of almost 1.5 years, during which the SARS-CoV-2 virus underwent multiple mutations that changed the severity of the symptoms, treatments in the healthcare system advanced, and vaccination began. This makes the use of time-dependent parameters crucial.

Fig 7 shows the daily cases (left panels) and daily deaths (right panels) of the young (top panels), adult (middle panels), and senior (bottom panels) age groups. The filter keeps track of the observations of each age group. As in the twin experiments, we use three sets of initial conditions, and they yield the same estimates of cases and deaths (only the IC3 estimates are visible in Fig 7). The high-frequency cycle in the estimated daily cases and deaths corresponds to the spurious weekly cycle mentioned above. If required, this effect can be mitigated by increasing the observational error of the cases, at the expense of larger uncertainty in the estimates.
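A scalar Kalman update, used here as a simplified stand-in for the ensemble update, makes this trade-off explicit. The numbers are hypothetical: a forecast variance of 100 and two observation error variances. Tripling the observation error variance halves the gain, so the weekly cycle present in the innovations is passed on to the analysis with half the amplitude, while the analysis variance grows from 50 to 75.

```python
import math

def kalman_gain(prior_var, obs_var):
    """Scalar Kalman gain: fraction of the innovation added to the forecast."""
    return prior_var / (prior_var + obs_var)

def analysis_var(prior_var, obs_var):
    """Variance left after the update: (1 - K) * prior_var."""
    return (1.0 - kalman_gain(prior_var, obs_var)) * prior_var

# A spurious weekly reporting cycle of amplitude 50 cases in the innovations
weekly = [50.0 * math.sin(2.0 * math.pi * day / 7.0) for day in range(14)]
for obs_var in (100.0, 300.0):
    K = kalman_gain(100.0, obs_var)
    cycle_in_analysis = max(abs(K * v) for v in weekly)
    print(obs_var, K, cycle_in_analysis, analysis_var(100.0, obs_var))
```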


Fig 7. Estimated daily cases (left-side panels) and deaths (right-side panels) using the parameterized transmission matrix in the Argentinian data experiments for the young (upper panels), adult (middle panels) and senior age groups (bottom panels).

Observations are in red dots. Shades around the curves represent the estimated variable uncertainty. The lines of the three estimations, IC1, IC2 and IC3, are indistinguishable.

https://doi.org/10.1371/journal.pone.0318426.g007

The estimated deaths in the young age group have a large spread because of the relatively small number of observed cases in this group compared to the other age groups, and the correspondingly larger relative errors. Although the estimated accumulated number of deaths is always positive, the daily change in the number of deaths is sometimes negative for some ensemble members of the young compartment. This non-physical behavior is a consequence of the analysis updates, which may reduce the estimated accumulated number of deaths to better fit the observed values.
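The mechanism can be reproduced with a hypothetical analysed trajectory of one ensemble member. Daily values are obtained by differencing the cumulative series, so an update that lowers the cumulative count on one day produces a negative daily value, even though every cumulative value stays positive. The helper name and the numbers are illustrative assumptions.

```python
import numpy as np

def daily_from_cumulative(cum):
    """Daily increments of cumulative series along the last axis (day 0 is
    measured from zero). Works for one series or an ensemble (n_ens, T)."""
    return np.diff(np.asarray(cum, dtype=float), axis=-1, prepend=0.0)

# Hypothetical analysed cumulative deaths of one member: on day 3 the update
# lowers the cumulative value to move it towards the observation
member = np.array([3.0, 4.0, 4.0, 3.6, 5.0])
print(daily_from_cumulative(member))
```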

Fig 8 shows the three independent parameters λᵢᵢ, i = 1, 2, 3 of the parameterized transmission matrix (left panels) and the upper off-diagonal parameters (right panels). All initial conditions yield the same parameter estimates. The parameters λ₁₁ and λ₂₂ predominate, given that most cases occur in the first two age groups; consequently, the interaction parameter between young people and adults, λ₁₂, is higher than λ₁₃ and λ₂₃. The filter takes a couple of months to estimate the parameters, a longer spin-up time than in the synthetic experiments, so the estimates during the first two months depend on the chosen initial mean parameters. This behavior is consistent with the results of Sauer et al. [37], who showed that an optimal set of parameters of a SEIR model can be estimated using the complete time series of cumulative cases, but that identifiability problems occur when only the cumulative cases of the pre-peak interval are used. That work also compares Markov chain Monte Carlo and data assimilation methods for inference.


Fig 8. Estimated diagonal values (left panels) and upper off-diagonal values (right panels) of the parameterized transmission matrix for different initial conditions (color lines IC1, IC2, IC3).

Shading around colored lines represents the parameter spread.

https://doi.org/10.1371/journal.pone.0318426.g008

Fig 9 shows the estimated effective reproduction number. The peak at the start of the pandemic corresponds to the unrestricted spread of the virus until the lockdown starts on March 27, 2020. Owing to the inertia of the system, the filter takes some days to fully estimate the reproduction number up to the start of the lockdown, after which the estimate starts to decrease. The periods in which the estimated effective reproduction number exceeds one correspond to increasing cases, up to the peaks. The initial condition IC1 (IC2), with a large (small) initial reproduction number, reaches a higher (lower) peak on the lockdown date than the others and places the start of the lockdown earlier (later) than observed. The initial condition IC3 correctly estimates the time of the lockdown start. After this spin-up period, all estimates converge to the same values, independently of the initial conditions.


Fig 9. Daily cases (upper panel) estimated with different initial conditions and estimated reproduction number (lower panel) using COVID-19 data of Argentina.

Shades around colored curves represent the corresponding variable spread. Vertical grey lines mark periods in which the estimated effective reproduction number exceeds one, and the vertical red line marks the start of the lockdown.

https://doi.org/10.1371/journal.pone.0318426.g009

Fig 10 shows the estimated fractions of deaths of the young (top plot), adult (middle plot) and senior (bottom plot) age groups. The estimates are slightly higher than the reference values of 0.002, 0.05, and 0.1 [36] for the young, adult, and senior age groups, respectively. The fraction of deaths of the senior age group is large during the first six months, about 0.175, but then decreases to well below 0.1. This may be caused by improvements in the healthcare system, such as more hospital beds and mechanical ventilators and the early detection of low blood oxygen levels.


Fig 10. Estimated fraction of deaths for the young (upper), adult (middle) and senior (lower panel) age groups for different initial conditions (color lines IC1, IC2, IC3). Shades around colored curves represent the corresponding variable spread.

https://doi.org/10.1371/journal.pone.0318426.g010

4.3 Forecasts

To evaluate the potential use of the estimated parameters for decision making, we assessed the performance of forecasts that use the parameterized transmission matrix estimated from the Argentinian COVID-19 data. After the last observations are assimilated, individual constant (c), linear (l) and quadratic (q) fits are performed on the last 15 days of the estimated parameterized transmission matrix values to obtain the parameter tendencies. These tendencies are then projected 30 days forward, starting from the last value of the analysis state (current day). The extrapolated parameters are assumed to have a standard deviation of , where d is the lead time, so that the parameter uncertainty increases with lead time. Finally, 30-day forecasts are conducted with the free evolution of the model, using the projected parameterized transmission matrix and starting from the current (last) analysis state. The forecasts are compared with the analysis daily cases obtained by assimilating the entire set of observations, which serve as the reference. Fig 11 shows selected forecasts up to a 20-day lead time, started on different dates of the pandemic (every 30 days) and evolved using the linear extrapolation of the estimated parameterized transmission matrix. Some forecasts are accurate, but others diverge from the actual evolution when the tendency of the pandemic changes. The orange shading indicates the forecast uncertainty given by the forecast ensemble members.
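The tendency step can be sketched as follows for a single transmission parameter. The sketch assumes that the fitted polynomial supplies only the change relative to the last fitted day, which is added to the last analysis value, so the constant fit reduces to persistence; it omits the lead-time-dependent perturbations of the extrapolated parameters. The 15-day window and 30-day horizon follow the text, while the function name and the synthetic history are hypothetical.

```python
import numpy as np

def extrapolate_parameter(history, degree, window=15, horizon=30):
    """Project a parameter `horizon` days ahead from a polynomial fit
    (degree 0: constant, 1: linear, 2: quadratic) to its last `window`
    analysis values, starting from the last analysis value."""
    y = np.asarray(history, dtype=float)[-window:]
    t = np.arange(y.size, dtype=float)           # day index inside the window
    coeffs = np.polyfit(t, y, degree)
    lead = np.arange(1, horizon + 1, dtype=float)
    # Tendency: change of the fitted curve relative to the last fitted day
    trend = np.polyval(coeffs, t[-1] + lead) - np.polyval(coeffs, t[-1])
    return y[-1] + trend

history = 0.1 + 0.01 * np.arange(20)             # hypothetical analysis values
for degree in (0, 1, 2):
    print(degree, extrapolate_parameter(history, degree)[[0, -1]])
```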


Fig 11. Selected 20-day forecasts (orange curves) conducted at different stages of the pandemic in Argentina using the linear regression.

Forecast start dates are separated by 30 days (to avoid clutter). Observed cumulative cases are shown as daily new cases (red dots), and the EnKF analysis of cases with multiplicative inflation is shown in blue. Shaded areas around the curves represent the spread of the ensemble members.

https://doi.org/10.1371/journal.pone.0318426.g011

To assess the accuracy of the different forecasts, a total of 400 forecasts are conducted, each started on a different day and with a maximum lead time of 30 days. The forecasts cover the period from June 2020 to the end of August 2021, which features two peaks of the pandemic and therefore a wide variety of pandemic behaviors. From June 2020 onward, a large number of cases is detected every day, so mean-field dynamics can safely be assumed in the data assimilation framework. To examine the impact of the interactions among age groups on the forecasts, we repeat the forecast experiments with a well-mixed SEIR model without age groups, obtained by setting n = 1 in Eq (1). The initial condition of the well-mixed forecast is the sum over the age groups of the meta-population model state.

Fig 12 shows the relative RMSE as a function of the lead time. The forecasts behave similarly in all age groups, and the meta-population model forecasts clearly outperform the well-mixed ones in every age group. In the meta-population model, the forecast that extrapolates the linear regression of the parameters is the most accurate, whereas in the well-mixed model the constant-fit forecast outperforms the others. In both models, the quadratic-fit forecast is the least accurate.
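One way to compute such a score is sketched below. Forecasts issued on successive days are arranged in an array with one row per start date and one column per lead time, the reference analysis is aligned in the same way, and the RMSE at each lead time is divided by the mean reference value at that lead. The normalization and the helper names are assumptions, since other definitions of a relative RMSE are possible, and the reference series and forecast errors in the usage lines are synthetic.

```python
import numpy as np

def lead_aligned(series, n_starts, n_leads):
    """Row k holds series[k + 1], ..., series[k + n_leads]: the reference
    values that verify a forecast started on day k."""
    series = np.asarray(series, dtype=float)
    if n_starts + n_leads > series.size:
        raise ValueError("series too short for the requested starts and leads")
    return np.stack([series[k + 1:k + 1 + n_leads] for k in range(n_starts)])

def relative_rmse_by_lead(forecasts, reference):
    """RMSE over start dates at each lead time, divided by the mean reference."""
    forecasts = np.asarray(forecasts, dtype=float)
    reference = np.asarray(reference, dtype=float)
    rmse = np.sqrt(np.mean((forecasts - reference) ** 2, axis=0))
    return rmse / np.mean(reference, axis=0)

ref = lead_aligned(100.0 + np.arange(60), n_starts=20, n_leads=30)  # synthetic daily cases
fc = ref * (1.0 + 0.01 * np.arange(1, 31))       # errors growing by 1% per day of lead
print(relative_rmse_by_lead(fc, ref)[[0, 14, 29]])
```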


Fig 12. Relative average root mean squared error of the quadratic (orange), linear (blue), and constant (red) forecasts for the young (upper panel), adult (middle panel) and senior people (lower panel) using the parameterized transmission matrix and the well-mixed model (labeled ‘wm’).

The different color lines correspond to the quadratic, linear and constant transmission matrix tendencies (denoted q, l, c in the labels, respectively).

https://doi.org/10.1371/journal.pone.0318426.g012

5 Conclusions

In this work, we used the ensemble Kalman filter applied to a meta-population compartmental model to monitor epidemiological parameters of the SARS-CoV-2 virus and to make predictions. We sequentially calibrated the model parameters using a state augmentation approach. Crucially, in contrast to recent works that use a constant transmission matrix, we proposed a time-dependent parameterization of an age-dependent transmission matrix that is identifiable by the assimilation system when observations of detected cases and deaths disaggregated by age are available. This approach allows the detection of nontrivial parameter variations and interactions between age groups that would otherwise not be captured. Additionally, the approach recovers other important epidemiological parameters, such as the mortality, the fraction of undocumented cases, and the effective reproduction number, the latter diagnosed using the next generation matrix. The assimilation technique is a valuable tool for monitoring and predicting current and future infectious diseases. The proposed technique was assessed using synthetic data and real data from Argentina.

While three age groups were considered, the technique is readily adaptable to more age groups with narrower age ranges for a more detailed analysis. Attempting to estimate the full transmission matrix from observations of cases and deaths disaggregated by age group alone results in non-identifiable matrix elements. To address this issue, we introduced a parameterization of the transmission matrix (16). This parameterization contains a single inter-group transmission parameter, α, which can be estimated by conducting forecasts on a validation dataset (the past evolution of the pandemic up to the ‘current’ pandemic day) and minimizing the relative root mean squared error as a function of α at an a priori defined forecast lead time.

The use of the meta-population model disaggregated by age groups led to a significant improvement in the accuracy of daily case forecasts for lead times of up to 30 days, compared with a well-mixed model that neglects the interactions among different age groups. This highlights the critical importance of disaggregating epidemiological information in both data and models. Age-dependent forecasts are particularly relevant for the use of epidemiological modeling in governmental pandemic decision-making.

We assessed three regression functions for the transmission matrix parameters, which are then extrapolated in time to conduct the forecasts. Up to 15-day lead times, there is practically no difference in forecast accuracy among the three regression functions (constant, linear and quadratic), but for longer lead times the linear and constant regression functions result in the most accurate forecasts.

The technique could be readily implemented in different cities or states, provided that the population of each age group and epidemic observations disaggregated by age are available. The framework could be significantly improved by including hospitalizations as an observed variable. If reliable data on hospital admissions and discharges were available (which was not the case in Argentina), relevant quantities could be estimated, such as average hospitalization times and hospital bed occupancy, as well as parameters like the fraction of hospitalizations and the fraction of cases requiring intensive care.

Within the EnKF framework, we assume that errors follow a Gaussian distribution, which may not be suitable for certain model parameters. For this reason, some model parameters need to be constrained to remain within their physically meaningful range. Specifically, the parameters of the parameterized transmission matrix, the fractions of detected cases, and the fractions of deaths are forced to be non-negative to prevent a non-physical evolution of the model. However, this in turn conflicts with the Gaussian assumption, particularly when the parameter spread reaches the boundaries of the meaningful range, as happens for the fraction of deaths of the young age group. To overcome this limitation and account for the non-Gaussian density of near-zero parameters, a non-parametric data assimilation framework, such as the mapping particle filter [38], can be applied. Additionally, the variables are assumed to evolve smoothly, a condition that is met when dealing with a relatively large number of individuals (country-level observations). For city-level populations, the behavior of the age-structured meta-population model within the EnKF framework may not be robust. To increase the granularity of age groups and contacts, epidemiological agent-based models (ABMs) can be employed. Cocucci et al. (2022) [39] combined an EnKF with an ABM and mean-field data to infer the evolution of the COVID-19 pandemic in the city of Buenos Aires, Argentina, and Schneider et al. (2022) [40] employed a complex agent-based network model to assimilate synthetic data at the individual level.

References

1. Ferguson NM, Laydon D, Nedjati-Gilani G, Imai N, Ainslie K, Baguelin M, et al. Report Imperial College 9: impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand. https://doi.org/10.25561/77482
2. Großmann G, Backenköhler M, Wolf V. Heterogeneity matters: contact structure and individual variation shape epidemic dynamics. PLoS One 2021;16(7):e0250050. pmid:34283842
3. Balcan D, Colizza V, Gonçalves B, Hu H, Ramasco JJ, Vespignani A. Multiscale mobility networks and the spatial spreading of infectious diseases. Proc Natl Acad Sci U S A 2009;106(51):21484–9. pmid:20018697
4. Kerr CC, Stuart RM, Mistry D, Abeysuriya RG, Rosenfeld K, Hart GR, et al. Covasim: an agent-based model of COVID-19 dynamics and interventions. PLoS Comput Biol 2021;17(7):e1009149. pmid:34310589
5. Lau H, Khosrawipour T, Kocbach P, Ichii H, Bania J, Khosrawipour V. Evaluating the massive underreporting and undertesting of COVID-19 cases in multiple global epicenters. Pulmonology 2021;27(2):110–5. pmid:32540223
6. Batista AA, da Silva SH. An epidemiological compartmental model with automated parameter estimation and forecasting of the spread of COVID-19 with analysis of data from Germany and Brazil. Front Appl Math Stat. 2022;8.
7. Bertozzi AL, Franco E, Mohler G, Short MB, Sledge D. The challenges of modeling and forecasting the spread of COVID-19. Proc Natl Acad Sci U S A 2020;117(29):16732–8. pmid:32616574
8. Fang Y, Nie Y, Penny M. Transmission dynamics of the COVID-19 outbreak and effectiveness of government interventions: a data-driven analysis. J Med Virol 2020;92(6):645–59. pmid:32141624
9. Dehning J, Zierenberg J, Spitzner FP, Wibral M, Neto JP, Wilczek M, et al. Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions. Science. 2020;369(6500):eabb9789. https://doi.org/10.1126/science.abb9789 pmid:32414780
10. Shaman J, Karspeck A. Forecasting seasonal outbreaks of influenza. Proc Natl Acad Sci U S A 2012;109(50):20425–30. pmid:23184969
11. Shaman J, Karspeck A, Yang W, Tamerius J, Lipsitch M. Real-time influenza forecasts during the 2012–2013 season. Nat Commun. 2013;4:2837. pmid:24302074
12. Li R, Pei S, Chen B, Song Y, Zhang T, Yang W, et al. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2). Science 2020;368(6490):489–93. pmid:32179701
13. Engbert R, Rabe MM, Kliegl R, Reich S. Sequential data assimilation of the stochastic SEIR epidemic model for regional COVID-19 dynamics. Bull Math Biol 2020;83(1):1. pmid:33289877
14. Evensen G, Amezcua J, Bocquet M, Carrassi A, Farchi A, Fowler A, et al. An international initiative of predicting the SARS-CoV-2 pandemic using ensemble data assimilation. FoDS 2021;3(3):413.
15. Richard A, Wisniak A, Perez-Saez J, Garrison-Desany H, Petrovic D, Piumatti G, et al. Seroprevalence of anti-SARS-CoV-2 IgG antibodies, risk factors for infection and associated symptoms in Geneva, Switzerland: a population-based study. Scand J Public Health 2022;50(1):124–35. pmid:34664529
16. Dattner I, Goldberg Y, Katriel G, Yaari R, Gal N, Miron Y, et al. The role of children in the spread of COVID-19: Using household data from Bnei Brak, Israel, to estimate the relative susceptibility and infectivity of children. PLoS Comput Biol 2021;17(2):e1008559. pmid:33571188
17. Davies NG, Klepac P, Liu Y, Prem K, Jit M, CMMID COVID-19 working group, et al. Age-dependent effects in the transmission and control of COVID-19 epidemics. Nat Med 2020;26(8):1205–11. pmid:32546824
18. Diaz T, Strong KL, Cao B, Guthold R, Moran AC, Moller A-B, et al. A call for standardised age-disaggregated health data. Lancet Healthy Longev 2021;2(7):e436–43. pmid:34240065
19. Heidari S, Ahumada C, Kurbanova Z, GENDRO Gender, Evidence and Health Network. Towards the real-time inclusion of sex- and age-disaggregated data in pandemic responses. BMJ Glob Health 2020;5(10):e003848. pmid:33028702
20. Arregui S, Aleta A, Sanz J, Moreno Y. Projecting social contact matrices to different demographic structures. PLoS Comput Biol 2018;14(12):e1006638. pmid:30532206
21. Mossong J, Hens N, Jit M, Beutels P, Auranen K, Mikolajczyk R, et al. Social contacts and mixing patterns relevant to the spread of infectious diseases. PLoS Med 2008;5(3):e74. pmid:18366252
22. Klepac P, Kissler S, Gog J. Contagion! The BBC Four Pandemic - The model behind the documentary. Epidemics. 2018;24:49–59. https://doi.org/10.1016/j.epidem.2018.03.003 pmid:29576516
23. Blackwood J, Childs L. An introduction to compartmental modeling for the budding infectious disease modeler. LiB 2018;5(1):195–221.
24. Ripperger TJ, Uhrlaub JL, Watanabe M, Wong R, Castaneda Y, Pizzato HA, et al. Orthogonal SARS-CoV-2 serological assays enable surveillance of low-prevalence communities and reveal durable humoral immunity. Immunity. 2020;53(5):925–933.e4. pmid:33129373
25. Guan W-J, Ni Z-Y, Hu Y, Liang W-H, Ou C-Q, He J-X, et al. Clinical characteristics of coronavirus disease 2019 in China. N Engl J Med 2020;382(18):1708–20. pmid:32109013
26. Byrne AW, McEvoy D, Collins AB, Hunt K, Casey M, Barber A, et al. Inferred duration of infectious period of SARS-CoV-2: rapid scoping review and analysis of available evidence for asymptomatic and symptomatic COVID-19 cases. BMJ Open 2020;10(8):e039856. pmid:32759252
27. Diekmann O, Heesterbeek JA, Metz JA. On the definition and the computation of the basic reproduction ratio R0 in models for infectious diseases in heterogeneous populations. J Math Biol 1990;28(4):365–82. pmid:2117040
28. Heffernan JM, Smith RJ, Wahl LM. Perspectives on the basic reproductive ratio. J R Soc Interface 2005;2(4):281–93. pmid:16849186
29. Cappé O, Moulines E, Rydén T. Inference in hidden Markov models. New York: Springer; 2005.
30. Pulido M, Tandeo P, Bocquet M, Carrassi A, Lucini M. Stochastic parameterization identification using ensemble Kalman filtering combined with maximum likelihood methods. Tellus A Dyn Meteorol Oceanogr 2018;70(1):1442099.
31. Kalman RE. A new approach to linear filtering and prediction problems. J Basic Eng 1960;82(1):35–45.
32. Burgers G, van Leeuwen PJ, Evensen G. Analysis scheme in the ensemble Kalman filter. Mon Wea Rev 1998;126(6):1719–24.
33. Ruiz JJ, Pulido M, Miyoshi T. Estimating model parameters with ensemble-based data assimilation: a review. J Meteorol Soc Jpn 2013;91(2):79–99.
34. Ruiz JJ, Pulido M, Miyoshi T. Estimating model parameters with ensemble-based data assimilation: parameter covariance treatment. J Meteorol Soc Jpn 2013;91(4):453–69.
35. Liu J, West M. Combined parameter and state estimation in simulation-based filtering. New York: Springer. 2001. p. 197–223.
36. Coronavirus Age, Sex, Demographics (COVID-19)-Worldometer. 2022. [cited 2022 Feb 17]. https://www.worldometers.info/coronavirus/coronavirus-age-sex-demographics
37. Sauer T, Berry T, Ebeigbe D, Norton MM, Whalen AJ, Schiff SJ. Identifiability of infection model parameters early in an epidemic. SIAM J Control Optim. 2022;60(2):S27–48. pmid:36338855
38. Pulido M, van Leeuwen PJ. Sequential Monte Carlo with kernel embedded mappings: the mapping particle filter. J Comput Phys. 2019;396:400–15.
39. Cocucci TJ, Pulido M, Aparicio JP, Ruíz J, Simoy MI, Rosa S. Inference in epidemiological agent-based models using ensemble-based data assimilation. PLoS One 2022;17(3):e0264892. pmid:35245337
40. Schneider T, Dunbar ORA, Wu J, Böttcher L, Burov D, Garbuno-Inigo A, et al. Epidemic management and control through risk-dependent individual contact interventions. PLoS Comput Biol 2022;18(6):e1010171. pmid:35737648

