
Estimating the time-varying effective reproduction number via Cycle Threshold-based Transformer

  • Xin-Yu Zhang,

    Roles Data curation, Formal analysis, Investigation, Methodology, Visualization, Writing – original draft, Writing – review & editing

    Affiliations College of Computer Science, Sichuan University, Chengdu, China, Engineering Research Center of Machine Learning and Industry Intelligence, Ministry of Education, Sichuan University, Chengdu, China

  • Lan-Lan Yu,

    Roles Data curation, Methodology, Writing – review & editing

    Affiliations College of Computer Science, Sichuan University, Chengdu, China, Engineering Research Center of Machine Learning and Industry Intelligence, Ministry of Education, Sichuan University, Chengdu, China

  • Wei-Yi Wang,

    Roles Data curation, Visualization, Writing – review & editing

    Affiliations College of Computer Science, Sichuan University, Chengdu, China, Engineering Research Center of Machine Learning and Industry Intelligence, Ministry of Education, Sichuan University, Chengdu, China

  • Gui-Quan Sun ,

    Roles Conceptualization, Formal analysis, Investigation, Writing – review & editing

    gquansun@126.com (G-QS); quanhuiliu@scu.edu.cn (Q-HL)

    Affiliations Department of Mathematics, North University of China, Taiyuan, China, Complex Systems Research Center, Shanxi University, Taiyuan, China

  • Jian-Cheng Lv,

    Roles Formal analysis, Investigation, Writing – review & editing

    Affiliations College of Computer Science, Sichuan University, Chengdu, China, Engineering Research Center of Machine Learning and Industry Intelligence, Ministry of Education, Sichuan University, Chengdu, China

  • Tao Zhou,

    Roles Formal analysis, Investigation, Writing – review & editing

    Affiliation Big Data Research Center, University of Electronic Science and Technology of China, Chengdu, China

  • Quan-Hui Liu

    Roles Conceptualization, Formal analysis, Investigation, Writing – original draft, Writing – review & editing

    gquansun@126.com (G-QS); quanhuiliu@scu.edu.cn (Q-HL)

    Affiliations College of Computer Science, Sichuan University, Chengdu, China, Engineering Research Center of Machine Learning and Industry Intelligence, Ministry of Education, Sichuan University, Chengdu, China

Abstract

Monitoring the spread of infectious diseases is essential for designing and adjusting interventions in a timely manner, preventing epidemic outbreaks and safeguarding public health. Governments have generally adopted incidence-based statistical methods to estimate the time-varying effective reproduction number Rt and evaluate the transmission ability of epidemics. However, these methods exhibit biases arising from the reported incidence data and assume a generation interval distribution, which is not available at the early stage of an epidemic. Recent studies showed that the viral loads of infected populations, characterized by cycle threshold (Ct) values, evolve throughout the course of an epidemic and provide a possibility to infer the epidemic trajectory. In this work, we propose the Cycle Threshold-based Transformer (Ct-Transformer) to estimate Rt. We find that the supervised learning of Ct-Transformer outperforms the traditional incidence-based statistical and Ct-based Rt estimation methods and, more importantly, that Ct-Transformer is robust to detection resources. Further, we apply the proposed model to self-supervised pre-training tasks and obtain excellent fine-tuned performance, comparable to that of the supervised Ct-Transformer, as verified on both synthetic and real-world datasets. We demonstrate that the Ct-based deep learning method can improve real-time estimates of Rt and can be adapted more easily to track newly emerged epidemics.

Author summary

The time-varying effective reproduction number Rt is an important indicator for tracking epidemic spread. The best-known approach to estimating Rt is the incidence-based statistical method, which is constrained by its assumptions and by the available data. Recent studies show that the time-varying distribution of cycle threshold (Ct) values of the sampled infected population provides a possibility to infer the epidemic trajectory. Here, we propose the Cycle Threshold-based Transformer (Ct-Transformer), a deep neural network-based method to estimate Rt. Results on both synthetic and real-world datasets demonstrate that the Ct-Transformer surpasses the traditional incidence-based and the existing Ct-based estimation methods. More importantly, the proposed self-supervised learning of Ct-Transformer estimates Rt accurately for newly emerged infectious diseases. Our study suggests that Ct-based deep learning methods can be employed to improve the tracking of infectious disease spread, especially for newly emerged epidemics.

Introduction

The emergence of epidemics, including the Coronavirus Disease 2019 (COVID-19) pandemic [1], presents a serious risk to public health and even human lives. Real-time tracking of an emerging infectious disease is essential to inform better control policies and avoid a large number of infections. Governments have generally used estimates of the time-varying effective reproduction number Rt to monitor the transmission of an epidemic, as it provides timely insights into the transmission risk and the assessment of implemented interventions [16]. Specifically, the time-varying effective reproduction number Rt can be defined in two ways: the instantaneous reproduction number and the case reproduction number [7]. The case reproduction number, which is used in this work, is defined as the average number of secondary cases generated by an infectious individual infected at time t [8]. Rt < 1 suggests that the epidemic is waning and that its transmission can be considered under control [9].

Most existing studies [1, 10–17] have adopted statistical methods to estimate Rt based on time series of daily symptomatic cases, hospitalizations, and death numbers. Wallinga and Teunis [10] propose a likelihood-based method (WT method) to estimate Rt based on the daily number of reported cases and the generation interval distribution. The estimates produced by this method may show substantial variation over short periods when the data aggregation time step is small. As an improvement, Cori et al. [13] develop EpiEstim, a generalized R toolkit that estimates Rt based on Bayesian inference. EpiEstim provides a precise posterior distribution of Rt and has been employed to monitor several recent outbreaks [17–20]. Gressani et al. [14] propose EpiLPS, which smooths the epidemic curve and yields accurate estimates of Rt. Liu et al. [15] propose rtestim, a discrete spline-based approach that produces locally adaptive estimates of Rt. Parag et al. [16] construct a recursive inference algorithm and develop EpiFilter. This method unifies the WT method and EpiEstim and largely resolves their edge-effect issues. These methods typically require an assumed generation interval distribution (i.e., the distribution of the time interval between the infection of an infector and that of her/his infectees), which is hard to acquire at the early stage of an emergent epidemic [7, 21]. Further, the aforementioned time series data are sensitive to detection resources, latent periods, and reporting delays [22–24].

Recent studies [25, 26] demonstrate that the viral load distribution of infected populations can be used to infer the trajectory of epidemics. They show that fast-growing epidemic populations are dominated by new infections with higher viral loads, while shrinking epidemic populations contain more older infections with lower viral loads. The viral load of an infected individual can be represented by the cycle threshold (Ct) value, obtained from the real-time quantitative reverse-transcription polymerase chain reaction (RT-qPCR) assay, where a lower Ct value indicates a higher viral load [27, 28]. Several works [25, 29–33] have attempted to estimate Rt with statistical methods by monitoring the time-varying distribution of Ct values in infected populations. Hay et al. [25] use Ct values from cross-sectional samples to estimate Rt based on Bayesian inference. They also develop the R toolkit ViroSolver to infer epidemic dynamics, including estimates of Rt. However, this method makes specific assumptions, such as the epidemic trajectory following a Gaussian process. In a follow-up study, Hay et al. [29] explore the epidemic dynamics of COVID-19 variants of concern (VOCs) and expand the capabilities of ViroSolver to estimate Rt in a two-strain epidemic. Liu et al. [34] compare EpiEstim [13] and ViroSolver [25] and conclude that ViroSolver provides more accurate estimates of Rt, while EpiEstim requires adjustment of the generation interval distribution for better performance. Besides, Lin et al. [31] apply a log-linear regression model to estimate Rt based on the mean and skewness of Ct values in infected populations. They show that Ct-based Rt estimation methods are less sensitive to detection resources [28]. The above Ct-based Rt estimation methods adopt statistical techniques, such as Bayesian and regression models. Although effective, these methods are constrained by their reliance on assumptions and by their limited capability to extract temporal features from the time-varying distribution of Ct values and to model non-linear relationships.

With the increasing availability of data and its powerful learning ability, deep learning, as a nonlinear modeling approach, is widely used in image recognition [35], natural language processing [36], time series forecasting [37], and beyond. The core of deep learning is to build models with multiple hidden layers, train them on large-scale data, and thereby obtain rich, representative feature information [38]; held-out samples are then used to test the model and assess its performance, such as prediction accuracy. Early notable deep learning models include Recurrent Neural Networks (RNNs) [39] and Long Short-Term Memory (LSTM) networks [39]. The architecture of RNNs is designed to extract temporal features from sequence data, while LSTM, as an advanced form of RNN, is built to capture long-term dependencies in sequence data. During the COVID-19 period, deep learning models have been applied in the field of epidemiology [40–43]. Davahli et al. [40] develop two types of Graph Neural Network (GNN) models [44], which are designed to handle graph-structured data, to estimate Rt using COVID-19 cases from 22 January 2020 to 26 November 2020 in the United States. Gatto et al. [41] utilize the LSTM architecture to predict the daily trend of the time-varying reproduction number Rt. They find that neural network-based models accurately predict Rt in the regions and autonomous provinces of Italy when sufficient epidemiological data are available. Although the above neural network architectures are effective in time series prediction tasks, they still suffer from vanishing gradients and struggle to capture long-range dependencies. A significant advancement in modern machine learning came when Vaswani et al. [45] proposed the Transformer, an encoder-decoder neural network architecture with a novel self-attention mechanism. The encoder processes the input sequence and the decoder generates the output sequence; both consist of multiple layers and communicate via the attention mechanism. The self-attention mechanism allows the model to weigh the importance of different parts of the input sequence and enables it to capture long-range dependencies efficiently. Both its computational efficiency and its ability to handle long-range dependencies have made the Transformer architecture a mainstream choice among deep learning models.

The main aim of this study is to estimate the case reproduction number from the time-varying distribution of Ct values using deep learning models. To better capture long-range dependencies and extract temporal features, we propose the Cycle Threshold-based Transformer (i.e., Ct-Transformer) with canonical components including a patching layer, a gated recurrent unit network, and a multi-head attention layer. Detailed descriptions of these components in the architecture of Ct-Transformer are presented in Materials and methods. Notably, both supervised and self-supervised learning of Ct-Transformer are developed. The supervised Ct-Transformer is trained on a large number of labeled datasets, while the self-supervised one can be pre-trained on datasets without labels and then fine-tuned on a small portion of labeled data; both can ultimately be used to estimate Rt. Evaluating the Ct-Transformer on synthetic datasets, we find that the supervised Ct-Transformer performs best among all Ct-based deep learning and statistical methods. In detail, the supervised Ct-Transformer achieves a 31.1% reduction in MAE and a 27.0% reduction in RMSE compared with the other Ct-based deep learning methods. We also find that the proposed Ct-Transformer is robust to both limited and time-varying detection resources: at a 25% detection rate, it achieves similar or better performance than the traditional incidence-based Rt estimation method at a 100% detection rate. Lastly, we pre-train the self-supervised Ct-Transformer on synthetic datasets and then fine-tune the pre-trained model on the Hong Kong COVID-19 data, obtaining a 24% reduction in MAE compared with the result reported in [31].

In this study, we propose the Cycle Threshold-based Transformer (Ct-Transformer), a deep learning-based method to estimate Rt. The self-supervised Ct-Transformer can be used to estimate Rt for the same disease in other regions or countries by fine-tuning on a small portion of real labeled data. The proposed Ct-Transformer is evaluated on both synthetic and real datasets and proves to be effective and robust to detection resources. Ablation experiments also show the necessity of each canonical component in the design of the Ct-Transformer architecture. In summary, this study offers public health authorities a deep learning method for estimating Rt without the time delays inherent in traditional incidence-based methods and with few prior-knowledge requirements. Additionally, it can easily be adapted to estimate Rt for newly emerging diseases by adjusting the model with a small portion of real data.

Materials and methods

The supervised learning of Ct-Transformer

We consider the following problem: given a time series of Ct variables with length L, (x1, x2, …, xL), where each xt is a vector with dimension M (i.e., xt ∈ ℝ^M), we estimate the corresponding time series of Rt with length L, (R1, R2, …, RL). The dimension M refers to the number of Ct variables, including the mean, skewness, and distribution of Ct values. A detailed exploration of the dimension M is given in S1 Supplementary Methods (3. Ct Variables in the Synthetic Datasets). The workflow of Ct-Transformer and an overview of the architecture of its supervised learning are shown in Fig 1A and 1B, respectively.

Fig 1. The architecture of Ct-Transformer.

(A) The Ct-Transformer accepts a time series of Ct variables and outputs the corresponding time series of estimated Rt. The agent-based SEIR transmission model is used to produce the synthetic datasets. (B) The Ct-Transformer segments the time series of Ct variables into patches, which are then projected into a high-dimensional space and combined according to their respective weights. The model extracts temporal features from the high-dimensional hidden representations and builds non-linear relationships with Rt. (C) Masked self-supervised learning of the Ct-Transformer, where patches are randomly selected and set to zero. The Ct-Transformer is then tasked with reconstructing these masked patches.

https://doi.org/10.1371/journal.pcbi.1012694.g001

Patching layer.

The Patching layer segments the input into N non-overlapping patches and outputs a sequence of patches (p1, p2, …, pN), where N is the number of patches, calculated as N = ⌈L/P⌉. P represents the length of each patch, and the dimension of the i-th patch is C = M×P, with 1 ≤ i ≤ N. Note that we append repeated instances of the last value to the end of the time series of Ct variables before patching. The patching layer shortens the length of the input from L to L/P, which enables the Ct-Transformer to deal with long time series and also improves computational efficiency. The “Prediction Head” at the end of Ct-Transformer flattens the series back to its original length.
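The patch construction described above can be sketched in a few lines. This is an illustrative NumPy sketch, not the authors' implementation; the function name is hypothetical, but the repeat-last-value padding and the flattened patch dimension C = M×P follow the description in the text:

```python
import numpy as np

def make_patches(x, patch_len):
    """Split a (L, M) series of Ct variables into non-overlapping patches.

    The series is padded by repeating its last row so its length becomes
    a multiple of patch_len, mirroring the padding described in the text.
    Returns an array of shape (N, patch_len * M): each patch is flattened
    to dimension C = M * P, and N = ceil(L / P).
    """
    L, M = x.shape
    n_patches = int(np.ceil(L / patch_len))
    pad = n_patches * patch_len - L
    if pad > 0:
        x = np.vstack([x, np.repeat(x[-1:], pad, axis=0)])
    return x.reshape(n_patches, patch_len * M)
```

For example, a series with L=5 and M=2 split with P=3 yields N=2 patches of dimension 6, the second padded with the repeated last observation.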

Ct variable selection layer.

Inspired by the Variable Selection Network [46] and the Temporal Covariate Interpreter [47] designed in neural network models, we introduce the Ct Variable Selection (CVS) layer in the architecture of Ct-Transformer to identify the important variables by calculating variable weights. The CVS layer first transforms each variable of the output from the patching layer into a high-dimensional space. Let ξ_i^(j) ∈ ℝ^{d_model} denote the representation of the j-th variable in the i-th patch, where d_model represents the dimension of the representation. The representations of all variables in the i-th patch are flattened and expressed as Ξ_i ∈ ℝ^{M·d_model}. Further, a linear transformation layer is employed to obtain the weights of all variables in the i-th patch:

v_i = Softmax(ELU(W1 Ξ_i + b1)),     (1)

where v_i ∈ ℝ^M represents the weights of all variables in the i-th patch, ELU and Softmax(⋅) respectively represent the Exponential Linear Unit activation function [48] and the Softmax operation [49], and W1 and b1 respectively represent the learnable weights and biases. Finally, each representation is operated on by a linear transformation and then weighted by its weight. Thus, the output of the CVS layer is expressed as follows:

p̃_i = Σ_{j=1}^{M} v_i^(j) ⋅ (W2 ξ_i^(j) + b2),     (2)

where W2 and b2 respectively represent the learnable weights and biases.
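The variable-weighting step of the CVS layer can be illustrated with a short sketch. This is an assumption-laden illustration, not the authors' code: the names `cvs_weights`, `W1`, and `b1` mirror the description of Eq (1) (Softmax over ELU-activated logits, one weight per Ct variable), and the weights are guaranteed to be non-negative and sum to one:

```python
import numpy as np

def elu(x):
    """Exponential Linear Unit activation."""
    return np.where(x > 0, x, np.exp(x) - 1.0)

def softmax(x):
    """Numerically stable Softmax over a vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def cvs_weights(flat_repr, W1, b1):
    """Variable-selection weights for one patch (sketch of Eq (1)).

    flat_repr: flattened variable representations of one patch,
    shape (M * d_model,). W1 (M, M * d_model) and b1 (M,) project it
    to one logit per variable; Softmax over ELU-activated logits
    yields the M variable weights.
    """
    return softmax(elu(W1 @ flat_repr + b1))
```

The output of Eq (2) would then be a weighted sum of the linearly transformed per-variable representations using these weights.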

Temporal Features Extraction layer.

The Temporal Features Extraction (TFE) layer designed in Ct-Transformer incorporates a gated recurrent unit (GRU) network [50] to extract the temporal features from the output of the CVS layer. The GRU network can process and remember information from long time series data more effectively than the traditional RNNs [39]. This network employs two critical gates, i.e., reset gate and update gate, which regulate the information flow of the hidden state. The hidden states of all time steps from the GRU are passed to the latter residual connection. The residual connection integrated with a Gated Linear Unit (GLU) [51] further mitigates the problem of vanishing gradient and bolsters the process of training.
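The gating logic of a single GRU step can be illustrated as follows. This is a minimal NumPy sketch of a standard GRU cell [50], written without bias terms for brevity; the authors use a full GRU network, so this is illustrative rather than their implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, params):
    """One GRU time step: the update gate z and reset gate r regulate
    how much of the previous hidden state h is kept versus overwritten
    by the candidate state h_tilde (biases omitted for brevity)."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(Wz @ x + Uz @ h)              # update gate
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))  # candidate hidden state
    return (1 - z) * h + z * h_tilde          # new hidden state
```

Iterating `gru_step` over the CVS-layer outputs yields the sequence of hidden states that the TFE layer passes on to the residual connection.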

Gated Residual Network Layer.

The Gated Residual Network (GRN) layer, implemented in the architecture of the Ct-Transformer, is mainly used for non-linear processing. This layer first applies two linear transformation layers with ELU activation to the output of the TFE layer, which results in a hidden representation η. A Gated Linear Unit is then used to control the flow of information, which is expressed as follows:

GLU(η) = σ(W3 η + b3) ⊙ (W4 η + b4),     (3)

where W3, W4 and b3, b4 respectively represent the learnable weights and biases of the linear layers, and σ(⋅) and ⊙ respectively represent the Sigmoid activation function [52] and the Hadamard (element-wise) product [51]. To further enhance the model’s generalization, LayerNorm [53] is applied to produce the final output of the GRN layer.
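The gating operation of Eq (3) is compact enough to sketch directly. The weight names mirror the equation's description (two linear maps, one passed through a Sigmoid gate); this is an illustrative sketch, not the authors' implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def glu(x, W3, b3, W4, b4):
    """Gated Linear Unit sketch: sigma(W3 x + b3) elementwise-times
    (W4 x + b4). The sigmoid branch acts as a soft gate in [0, 1] on
    the linear branch."""
    return sigmoid(W3 @ x + b3) * (W4 @ x + b4)
```

When the gate saturates near one, the GLU passes the linear branch through unchanged; near zero, it suppresses the signal, which is how the layer controls information flow.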

Multi-head Self-attention layer.

We develop the Multi-head Self-attention (MSA) layer to learn dependencies between time steps based on the self-attention mechanism [45]. Inspired by the modification of the self-attention mechanism in the TFT, the MSA layer adopts a V matrix shared across heads and employs an additive aggregation of all heads, which is calculated as:

MSA(Q, K, V) = H̃ W_H,     (4)

where

H̃ = (1/h_m) Σ_{h=1}^{h_m} A(Q W_Q^(h), K W_K^(h)) V W_V,     (5)

therein, Q, K, and V respectively represent the Query, Key, and Value matrices, and d_attn is calculated as d_model/h_m. h_m represents the number of heads, and W_H is used for the linear transformation of the output. A(⋅) represents the scaled dot-product attention [45], calculated as A(Q, K) = Softmax(QK^⊤/√d_attn). W_Q^(h) and W_K^(h) are the head-specific weights for the Queries and Keys, while W_V are the head-shared weights for the Value.

From Eq (5), each head is capable of learning a different attention pattern, all of which act on the shared V matrix. This process can be viewed as an ensemble of attention patterns and enhances the representational capacity. Besides the multi-head attention, a feedforward network with two linear layers is employed. Both the multi-head attention and the feedforward network are followed by a residual connection with a GLU.
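The shared-V, additively aggregated attention described above can be illustrated as follows. This NumPy sketch assumes a single (unbatched) sequence and averages head outputs before the final linear map, following the TFT-style interpretable attention described in the text; it is an illustration, not the authors' implementation:

```python
import numpy as np

def scaled_dot_attention(Q, K, V):
    """Scaled dot-product attention: Softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)
    return A @ V

def shared_value_msa(X, WQ_heads, WK_heads, WV, WO):
    """Multi-head attention with one V projection shared across heads.

    Each head gets its own query/key projections (WQ_heads, WK_heads),
    all heads attend over the same V = X @ WV, and the head outputs are
    averaged (additive aggregation) before the output projection WO.
    """
    V = X @ WV
    heads = [scaled_dot_attention(X @ WQ, X @ WK, V)
             for WQ, WK in zip(WQ_heads, WK_heads)]
    return np.mean(heads, axis=0) @ WO
```

Because every head attends over the same values, the averaged attention pattern remains directly interpretable as a single set of temporal importances.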

Loss function.

The Ct-Transformer is optimized by minimizing the sum of quantile losses Lq [54] over a set of quantile points Q and all time steps T:

L = Σ_{q∈Q} Σ_{t=1}^{T} Lq(y_t, ŷ_t^(q), q),     (6)

where

Lq(y_t, ŷ_t^(q), q) = q⋅max(0, y_t − ŷ_t^(q)) + (1 − q)⋅max(0, ŷ_t^(q) − y_t),     (7)

therein, y_t and ŷ_t^(q) respectively represent the ground truth of Rt and the q-th quantile estimate of Rt produced by the model. Q represents the set of quantile points and is set to Q = {0.025, 0.5, 0.975} to obtain the 95% confidence intervals in this paper.
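A scalar version of the pinball (quantile) loss can be written in one line; this is an illustrative sketch of the standard definition, which penalizes under- and over-prediction asymmetrically depending on the target quantile q:

```python
def quantile_loss(y_true, y_pred, q):
    """Pinball loss for one quantile q:
    q * (y - y_hat) when the model under-predicts (y >= y_hat),
    (1 - q) * (y_hat - y) when it over-predicts."""
    diff = y_true - y_pred
    return max(q * diff, (q - 1) * diff)
```

For q = 0.975 an under-prediction of 1.0 costs 0.975 while an over-prediction of the same size costs only 0.025, which pushes that output toward the upper bound of the interval; q = 0.5 reduces to half the absolute error.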

The self-supervised learning of Ct-Transformer

Training a supervised learning method requires a sufficiently large labeled dataset, which is not available when a new infectious disease emerges. As an improvement, we also propose the self-supervised learning of Ct-Transformer, which requires less labeled data for representation learning. This approach is particularly suited to newly emergent epidemics or to the spread of the same infectious disease in different countries with distinct contact patterns. As shown in Fig 1C, both learning modes of Ct-Transformer share the same architecture but differ in their training processes. Specifically, the “Prediction Head” in the supervised Ct-Transformer is replaced by the “Pretrain Head” in the self-supervised one.

Self-supervised training consists of two steps: (1) Masked self-supervised pre-training on a relatively large dataset without labels. In this initial step, portions of the input are randomly removed, and the model is trained with the quantile loss to reconstruct the omitted parts [55]. This step is crucial for learning high-level abstract representations. (2) Supervised fine-tuning of the pre-trained model on a labeled portion of the target dataset. Upon completion of these two steps, the fine-tuned model is capable of estimating Rt on the target dataset. There are usually two fine-tuning strategies [56]: (a) linear probing (Lin. Prob) and (b) end-to-end fine-tuning (End2End). In (a), we train only the “Prediction Head” while freezing the rest of the network. In (b), we first update the “Prediction Head” via linear probing and then fine-tune the entire network.
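Step (1) can be illustrated with a simple patch-masking routine. This sketch follows the description above (randomly selected patches set to zero, with the reconstruction loss computed only on the masked positions); the function name and mask-ratio parameter are illustrative assumptions, and details of the authors' masking scheme may differ:

```python
import numpy as np

def mask_patches(patches, mask_ratio, rng):
    """Randomly zero out a fraction of patches for masked pre-training.

    patches: array of shape (N, C), one row per patch.
    Returns a masked copy and a boolean mask marking the zeroed
    patches; a reconstruction loss would be evaluated on those only.
    """
    n = patches.shape[0]
    n_mask = int(round(mask_ratio * n))
    idx = rng.choice(n, size=n_mask, replace=False)
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    out = patches.copy()
    out[mask] = 0.0
    return out, mask
```

During pre-training, the model receives `out` and is scored on how well it reconstructs the original rows at positions where `mask` is True.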

SEIR transmission model and the Ct value model

We adopt the agent-based stochastic Susceptible-Exposed-Infectious-Removed (SEIR) transmission model [57] and the Ct value model [25] to produce the synthetic datasets. The SEIR transmission model classifies the population into four states: susceptible (S), exposed (E), infectious (I), and removed (R). The susceptible state represents an individual who is not infected but can be infected; the exposed state represents an individual who is infected but not yet infectious; and the infectious state represents an individual who is infected and can infect susceptible individuals. Lastly, the removed state represents an individual who has recovered from the infectious state and cannot be infected again. The flow between these states is presented in Fig 1A.
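The S → E → I → R flow can be sketched with a simplified, well-mixed stochastic update. Note this is only a stand-in for intuition: the paper's model is agent-based on contact networks, whereas this sketch is population-level, and the rate parameters `beta`, `sigma`, and `gamma` (transmission, activation, and removal) are illustrative names, not the paper's parameterization:

```python
import numpy as np

def seir_step(S, E, I, R, beta, sigma, gamma, N, rng):
    """One day of a stochastic, well-mixed SEIR model.

    Each compartment transition is drawn binomially with a per-capita
    daily probability 1 - exp(-rate); returns the updated compartment
    counts and the number of new exposures that day.
    """
    new_E = rng.binomial(S, 1.0 - np.exp(-beta * I / N))  # S -> E
    new_I = rng.binomial(E, 1.0 - np.exp(-sigma))         # E -> I
    new_R = rng.binomial(I, 1.0 - np.exp(-gamma))         # I -> R
    return S - new_E, E + new_E - new_I, I + new_I - new_R, R + new_R, new_E
```

Because each individual moves through exactly one transition per step, the population size S + E + I + R is conserved across updates.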

We employ the Ct value model reported in [25] to simulate the evolution of the viral load (characterized by the Ct value) within a host after infection. Specifically, we denote the observed Ct value of an infected individual a days after his/her infection as Ct(a); it is described in S1 Supplementary Methods (1. Parameters of the Agent-based SEIR Transmission Model and Ct Value Model). As shown in S1 Fig, the trajectories of Ct values capture the variation resulting from both swabbing variability and individual-level differences in viral kinetics.

The parameters used to simulate both the SEIR model and the Ct value model are listed in S1 Table in S1 Supplementary Methods (1. Parameters of the Agent-based SEIR Transmission Model and Ct Value Model).

Data

Synthetic datasets.

The synthetic datasets are generated within the same framework but with different transmission parameters. For each simulation, the data are produced in the following three steps: 1) On each day of the simulation, we advance the SEIR transmission model by one time step (i.e., one day) and produce the newly infected individuals at day t; 2) A subset of individuals is sampled on each day of the simulation, determined by the detection scenarios described in the subsection Impact of Detection Rates on Performance in Results; 3) For each individual i sampled on a given day, a Ct value Ct(at) is generated according to the Ct value model, where at represents the elapsed time since the infection of individual i, as described in S1 Supplementary Methods (1. Parameters of the Agent-based SEIR Transmission Model and Ct Value Model). Note that, at step 1), the micro-transmission chains, including the infection time of each infected individual and his/her infectees, are recorded and used to calculate the time-varying effective reproduction number Rt. The details of this calculation are presented in S1 Supplementary Methods (2. Calculation of Rt based on Micro-transmission Chains).

We consider multiple scenarios by modeling outbreaks on two distinct types of contact networks: the Erdős-Rényi (ER) network [58] and the Scale-Free (SF) network [59]. The ER network is a random graph in which edges between nodes are formed independently and randomly, and it is used as a theoretical baseline for studying the spread of disease on networks. The degree of a node, i.e., the number of neighbors or edges connected to it, approaches a Poisson distribution in the ER network; this distribution is relatively narrow, so most nodes in the ER network have similar degrees. Thus, the ER network is used to characterize a homogeneous contact pattern in the population. In contrast, the SF network is designed to characterize the degree distribution of real networks, which usually follows a power law. This means that a small number of nodes in the SF network have a very high number of connections, while most other nodes have relatively few. We set the average degree of both types of contact networks to 10 and simulate outbreaks with different values of R0 (the average number of secondary cases generated by a typical infectious individual over the entire infectious period in a fully susceptible population) to represent epidemics with different transmission abilities. As shown in Table 1, the synthetic datasets are classified into the ER and SF datasets. The Ct-Transformer is trained and evaluated on the training and validation sets of the ER (SF) dataset with R0 equal to 1.5, 2.0, 2.5, and 3.0. The performance of the model is then tested on the separate testing set of the ER (SF) dataset, where R0 equals 1.2, 1.8, 2.2, 2.8, and 3.4. The numbers of simulations used to train, validate, and test the Ct-Transformer are 4800, 960, and 480, respectively. Detailed combinations of Ct variables are shown in S1 Supplementary Methods (3. Ct Variables in the Synthetic Datasets).

Table 1. Detailed partitioning of the synthetic datasets, including the ER dataset and SF dataset.

https://doi.org/10.1371/journal.pcbi.1012694.t001

Real-world dataset.

In addition to the synthetic datasets, we also evaluate the performance of the Ct-Transformer on a real-world dataset (i.e., the Hong Kong COVID-19 dataset). The Hong Kong COVID-19 dataset covers two consecutive waves of the spread of COVID-19 in Hong Kong: the third wave, spanning July to August 2020, and the fourth wave, from November 2020 to March 2021. During these two periods, 8646 COVID-19 cases were detected through both clinical diagnosis and public health surveillance. Meanwhile, the first available records of Ct values (derived from RT-qPCR tests targeting the E gene) for the detected COVID-19 cases were collected. We align the dates of the first available Ct value record of all detected cases and compute the mean and skewness of Ct values at each date during the third and fourth waves of the spread of COVID-19 in Hong Kong. Besides, the estimated time series of Rt reported in [31] is included in this real-world dataset. The section Real-world Data Verification provides a comprehensive description of the temporal changes in the mean and skewness of Ct values, as well as the associated Rt. To test the performance of Ct-Transformer on this real-world dataset, we adjust the input format of the Ct-Transformer to the mean and skewness of Ct values at each time step. We adopt the same method reported in [31] and divide the real-world dataset into training, validation, and testing sets, as follows:

  1. Training set, from Jul 6th 2020 to Sept 17th 2020.
  2. Validation set, from Sept 18th 2020 to Nov 15th 2020.
  3. Testing set, from Nov 16th 2020 to Mar 23rd 2021.

Experimental settings

Data preprocessing.

We utilize the Min-Max normalization technique, as detailed in [60], to standardize the non-uniform input data. This ensures that no single variable disproportionately influences the model’s performance. For instance, the average Ct value of infected individuals lies between 16 and 40, whereas the probability distributions span from zero to one. To address this, each input variable is normalized according to the following procedure:

x̃_k = (x_k − x_k^min) / (x_k^max − x_k^min),     (8)

where x_k^min and x_k^max respectively represent the minimum and maximum values of the k-th input variable in the training set. Thus, Min-Max normalization transforms a value x_k of the k-th variable into x̃_k in the range of zero to one. Besides, we apply a logarithmic transformation [61] to the value of Rt to make the distribution of the label smoother and improve the training efficiency of Ct-Transformer.
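The normalization in Eq (8) is a one-line transformation (illustrative sketch; in practice the minimum and maximum are computed on the training set only and reused for validation and testing):

```python
def min_max_normalize(x, x_min, x_max):
    """Map a value of the k-th variable into [0, 1] using the
    training-set minimum x_min and maximum x_max (Eq (8) sketch)."""
    return (x - x_min) / (x_max - x_min)
```

For example, a mean Ct of 28 with a training-set range of [16, 40] maps to 0.5, placing it on the same scale as the probability-valued inputs.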

Table 2. Hyperparameters, corresponding tuning spaces, and the best hyper-parameter settings for the Ct-Transformer on the ER dataset and SF dataset.

https://doi.org/10.1371/journal.pcbi.1012694.t002

Evaluation metrics.

Mean absolute error (MAE), root mean square error (RMSE), and the coefficient of determination (R2) are adopted as the evaluation metrics of model performance. Lower values of MAE and RMSE indicate better model performance. In contrast, R2, which varies between zero and one, evaluates the model’s goodness of fit: a value of R2 approaching one indicates a superior fit to the data.
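The three metrics follow their standard definitions and can be computed as in the sketch below (illustrative; libraries such as scikit-learn provide equivalent implementations):

```python
import numpy as np

def evaluate(y_true, y_pred):
    """MAE, RMSE and R^2 for a sequence of Rt estimates.

    MAE  = mean(|y - y_hat|)
    RMSE = sqrt(mean((y - y_hat)^2))
    R^2  = 1 - SS_res / SS_tot
    """
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    mae = np.abs(err).mean()
    rmse = np.sqrt((err ** 2).mean())
    ss_res = (err ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return mae, rmse, 1.0 - ss_res / ss_tot
```

A perfect estimate yields MAE = RMSE = 0 and R2 = 1; predicting the mean of the ground truth yields R2 = 0.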

Baseline methods.

We conduct a comparative analysis of the proposed Ct-Transformer against two types of baseline methods: incidence-based and Ct-based methods. The incidence-based method is EpiEstim [13], while the Ct-based methods are further categorized into statistical and deep learning methods according to their implementation strategies. Specifically, the Ct-based statistical methods include ViroSolver [25] and Regression [31]. As Ct-based deep learning methods, we implement several prominent deep neural network architectures, namely the MLP [41], Transformer [45], and TFT [46], with the same input as the Ct-Transformer for comparison.

Experimental environment.

Experiments are implemented in Python 3.10.12 with PyTorch 2.0.1. Each deep learning model is trained and tested on a workstation featuring an Intel(R) Core i5-13600KF CPU (@5.3 GHz), 32GB of RAM, and an NVIDIA RTX 4070Ti GPU with 12GB of memory. All deep learning models are trained for up to 30 epochs with an early stopping strategy [62].
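A minimal sketch of the early stopping strategy cited above; the patience value and the toy validation curve are illustrative assumptions, not values reported in the paper:

```python
def train_with_early_stopping(train_step, val_loss_fn, max_epochs=30, patience=3):
    """Stop when validation loss fails to improve for `patience` consecutive epochs."""
    best_loss, best_epoch = float("inf"), -1
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_step(epoch)                 # one pass over the training data
        loss = val_loss_fn(epoch)         # evaluate on the validation set
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0  # here one would also checkpoint the model
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch, best_loss

# Toy validation curve that bottoms out at epoch 3 and then rises.
curve = [1.0, 0.6, 0.4, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7, 0.8]
best_epoch, best = train_with_early_stopping(lambda e: None, lambda e: curve[e])
print(best_epoch, best)  # stops after three non-improving epochs, keeping epoch 3
```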

Hyperparameter selection.

A grid search approach [63] is employed to determine the optimal hyperparameters for the deep learning models. The specific tuning spaces and the best values for each hyperparameter of the Ct-Transformer are presented in Table 2. Meanwhile, the explored tuning spaces and the best hyperparameter values for all deep learning models are provided in S5 Table in the S1 Supplementary Methods (4. Hyperparameters of Deep Learning Methods). The hyperparameters yielding the minimum validation loss for the Ct-Transformer are used in subsequent experiments to estimate Rt.
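Grid search exhaustively evaluates every combination in the tuning space and keeps the one with the lowest validation loss. A sketch with an illustrative tuning space and a stand-in objective (the real spaces are those in Table 2 and S5 Table, and the real objective is training the model and measuring validation loss):

```python
from itertools import product

# Illustrative tuning space; not the actual spaces from Table 2.
space = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "hidden_size": [64, 128],
    "num_heads": [2, 4],
}

def validation_loss(cfg):
    # Stand-in for training a model with `cfg` and returning its validation loss.
    return abs(cfg["learning_rate"] - 1e-3) + abs(cfg["hidden_size"] - 128) / 1000

best_cfg, best_loss = None, float("inf")
for values in product(*space.values()):        # every point on the grid
    cfg = dict(zip(space.keys(), values))
    loss = validation_loss(cfg)
    if loss < best_loss:
        best_cfg, best_loss = cfg, loss

print(best_cfg)
```

The cost grows multiplicatively with the number of hyperparameters (here 3 x 2 x 2 = 12 runs), which is why the tuning spaces in Table 2 are kept small.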

Results

Relationship between Ct values and Rt

As shown in Fig 2A, we find that the ratio of the number of newly infected individuals to the number of currently infected individuals remains high during the early stage of the epidemic. This ratio then decreases sharply as the epidemic reaches its peak, and finally remains low towards the end of the epidemic. This trend implies that during the early stage of the epidemic, random sampling is more likely to detect individuals who have been recently infected and exhibit lower Ct values. Conversely, during the declining stage of the epidemic, the sampled infections present higher Ct values as they approach recovery. This is reflected in the average of Ct values in Fig 2C.

Fig 2. The Ct values reflect epidemic dynamics throughout their outbreaks.

(A) The number of currently infected individuals (light histogram) and the number of newly infected individuals (dark histogram) in a stochastic simulation with R0=3.0. (B) The average of Ct values for the infected population versus time. Each line shows the average of 1500 simulations. (C) The distribution of Ct values (violin plots) in the infected population, which are randomly selected on the detected days (t=10, 30, 50, and 70) during the outbreak described in (A). The median, along with the first and third quartiles of the distribution, are indicated by purple lines, while the red dots represent the average of Ct values on these detected days. (D) Rt varies as the epidemic progresses, with lines corresponding to those of the same color in panel (B). Each outbreak is simulated on the ER contact network.

https://doi.org/10.1371/journal.pcbi.1012694.g002

The temporal evolution of Rt and the average of Ct values are respectively presented in Fig 2B and 2D. We find that Rt remains relatively stable during the early stage of the epidemic and then undergoes a rapid decline as the epidemic reaches its peak. This trend is mirrored by the evolution of the average of Ct values, which initially remains steady but then increases sharply as the epidemic spreads. Meanwhile, a lower average of Ct values at the onset of the epidemic is found in simulated epidemics with a higher R0. These results imply that the time-varying distribution of Ct values may serve as an observable proxy to infer the transmission dynamics of an epidemic [25].

Supervised learning results

Performance on the ER dataset.

We first evaluate the performance of the supervised Ct-Transformer on the ER dataset. The estimated Rt for two stochastic simulations with different R0 (i.e., R0=1.8 and R0=3.4) are presented in Fig 3. Overall, the supervised Ct-Transformer (blue line) captures the temporal evolution of Rt more accurately than both the incidence-based and the Ct-based methods. The supervised Ct-Transformer shows enhanced precision in estimating the inflection point (i.e., the time when Rt reaches one) compared with the ViroSolver and TFT. Specifically, the ViroSolver estimates the inflection point significantly earlier than the Ground Truth (black line) in the simulation with R0=1.8. Conversely, it estimates the inflection point with a large delay in the simulation with R0=3.4. Meanwhile, the performance of the EpiEstim (dotted yellow line) exhibits bias at the beginning of both simulations. Further, the 95% confidence intervals of Rt estimation for different stochastic simulations are shown in S2 Fig in S1 Supplementary Methods (5. Confidence Intervals of Rt Estimation). We find that the 95% confidence intervals given by the Ct-Transformer can accurately capture the scales of the temporal evolution of Rt.

Fig 3. The performance of the supervised Ct-Transformer on two stochastic simulations on the ER network.

Panels (A) and (B) display the Rt estimations by the Ct-Transformer and other alternative models for simulations where R0 is set to 1.8 and 3.4, respectively. Note that all models begin estimating Rt when there are fifteen cases in the simulated population. The ground truth values are derived from the micro-transmission chain of the two stochastic simulations, based on the definition of the case reproduction number. The estimation results of EpiEstim (dotted yellow line), ViroSolver (dotted green line), and TFT (orange line) are presented as the representative alternative methods for incidence-based, Ct-based statistical, and Ct-based deep learning approaches, respectively.

https://doi.org/10.1371/journal.pcbi.1012694.g003

The quantitative results of the model performance are summarized in Table 3. In terms of the Average, the supervised Ct-Transformer outperforms all the baseline methods in estimating Rt, despite a slight decline in performance when R0 has the lowest value of 1.2. Compared with the best results (i.e., EpiEstim) achieved by the incidence-based and Ct-based statistical methods, the supervised Ct-Transformer attains a 41% decrease in MAE and a 39.1% reduction in RMSE. Further, in comparison with the optimal results (i.e., TFT) among the other Ct-based deep learning methods, the supervised Ct-Transformer continues to exhibit superior accuracy, achieving a 31.1% reduction in MAE, a 27.0% reduction in RMSE, and a 4.6% increase in R2.

Table 3. Results of the supervised Ct-Transformer and baseline methods on the ER dataset.

We respectively show the average result of simulations with each R0 ∈ {1.2, 1.8, 2.2, 2.8, 3.4} in the testing set. The Average represents the average of these above results. For each R0 and the Average, the best results are in bold and the second best are underlined.

https://doi.org/10.1371/journal.pcbi.1012694.t003

Impact of detection rates on performance.

The testing policy, which is associated with the availability of testing resources, can be summarized as the detection rate [24], which changes over the course of the epidemic. Meanwhile, variations in the detection rate lead to inaccurate time series data, which in turn biases standard Rt estimation methods such as the EpiEstim [13]. To demonstrate the performance of the proposed Ct-Transformer, we implement five detection scenarios similar to [31], listed as follows:

  1. Full Detection: a fixed detection rate of 100%. This scenario represents the ideal situation in which all infected individuals are detected;
  2. Scenario 1: a fixed detection rate of 25%. This scenario represents the situation of stable detection;
  3. Scenario 2: a fixed detection rate of 10%. This scenario represents the situation of stable but limited detection;
  4. Scenario 3: a detection rate that increases from 15% to 60% as the epidemic spreads. This scenario represents the situation of expansion in detection;
  5. Scenario 4: a fixed detection rate of 25% but 5% in the early stage of the epidemic (i.e., the early stage is defined as the first 20 days since the start of the outbreak). This scenario represents the situation of limited detection resources at the early stage of the epidemic.
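The five scenarios can be expressed as per-day detection rates. In the sketch below, the rates are those stated above; the linear shape of Scenario 3's ramp and the 100-day horizon are assumptions for illustration (the text only specifies the start and end rates):

```python
def detection_rate(scenario, day, early_stage_days=20, total_days=100):
    """Per-day detection rate for each of the five scenarios."""
    if scenario == "full":
        return 1.00                      # Full Detection: all infections detected
    if scenario == "s1":
        return 0.25                      # stable detection
    if scenario == "s2":
        return 0.10                      # stable but limited detection
    if scenario == "s3":                 # expansion: 15% -> 60% (linear ramp assumed)
        return 0.15 + (0.60 - 0.15) * min(day, total_days) / total_days
    if scenario == "s4":                 # 5% during the first 20 days, then 25%
        return 0.05 if day < early_stage_days else 0.25
    raise ValueError(f"unknown scenario: {scenario}")

print([round(detection_rate("s3", d), 3) for d in (0, 50, 100)])
print(detection_rate("s4", 10), detection_rate("s4", 30))
```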

We present the full comparisons between the Ct-Transformer and EpiEstim on the ER dataset under different detection scenarios, as shown in Table 4. We find that the supervised Ct-Transformer consistently outperforms the EpiEstim across all detection scenarios, except the Full Detection scenario when R0=1.2. Meanwhile, compared to the Full Detection scenario, the increased percentage in MAE is minimal for the Ct-Transformer (see S3(A) Fig for Scenario 1 to Scenario 4). These results also hold when we compare the performance of the Ct-Transformer and EpiEstim on the SF dataset (see S6 Table and S3(B) Fig in S1 Supplementary Methods (6. Further Exploration of Detection Rate on Performance)). These results demonstrate that the proposed Ct-Transformer is robust to time-varying detection rates and limited detection resources.

Table 4. Results of the supervised Ct-Transformer and the EpiEstim on the ER dataset under different detection scenarios.

We show the average results of simulations with R0 ∈ {1.2, 2.2, 3.4} in the testing set. For each detection scenario, the best results are shown in bold between the Ct-Transformer and the EpiEstim.

https://doi.org/10.1371/journal.pcbi.1012694.t004

Self-supervised learning results

In this section, we present the results of the self-supervised learning of the Ct-Transformer on both the synthetic and real-world datasets. Notably, 30% of the patches are masked with zeros at step (1) during the training process of self-supervised learning, which helps the model better learn representations from the distribution of Ct values. The performance of the Ct-Transformer with other rates of masked patches is reported in S7 Table in S1 Supplementary Methods (7. Rate of Masked Patches for Self-supervised Learning).
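The patch-masking step can be sketched as follows; the patch shape is illustrative, and the 30% mask rate is the one stated above:

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_patches(patches, mask_rate=0.30):
    """Zero out a random `mask_rate` fraction of patches; the self-supervised
    objective is to reconstruct the masked patches from the visible ones.
    `patches` has shape (num_patches, patch_len)."""
    num_patches = patches.shape[0]
    num_masked = int(round(mask_rate * num_patches))
    idx = rng.choice(num_patches, size=num_masked, replace=False)
    masked = patches.copy()
    masked[idx] = 0.0                 # masked patches are replaced with zeros
    return masked, idx

patches = rng.normal(size=(10, 8))    # e.g. 10 patches of length 8
masked, idx = mask_patches(patches)
print(len(idx))                       # 3 of the 10 patches are zeroed
```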

Table 5. Results of the Ct-Transformer (with End2End, Lin. Prob, and Sup.), other Ct-based supervised and incidence-based methods on the SF dataset.

We show the average results of simulations with R0 ∈ {1.2, 1.8, 2.2, 2.8, 3.4} in the testing set. The Average represents the average of the above results. For each R0 and the Average, the best results are in bold and the second best are underlined.

https://doi.org/10.1371/journal.pcbi.1012694.t005

Performance on the SF dataset.

We first obtain a pre-trained model by self-supervised training of the Ct-Transformer on the ER dataset. Then, we implement both the Lin. Prob and End2End supervised training on part of the SF dataset. The performance of the Ct-Transformer (with Lin. Prob, End2End, and supervised training from scratch), other Ct-based supervised methods, and the incidence-based methods on the SF dataset is summarized in Table 5.
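The difference between the two fine-tuning strategies is which parameters are updated: linear probing (Lin. Prob) freezes the pre-trained backbone and retrains only the prediction head, while End2End fine-tunes every parameter. A sketch in PyTorch, with a stand-in backbone and head whose sizes are illustrative (the real architecture is the Ct-Transformer described in Methods):

```python
import torch
from torch import nn

# Stand-in modules for the pre-trained backbone and the 'Prediction Head'.
backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 32))
head = nn.Linear(32, 1)   # maps the learned representation to Rt

def configure(mode):
    """Lin. Prob: freeze the backbone, train only the head.
    End2End: fine-tune every parameter."""
    for p in backbone.parameters():
        p.requires_grad = (mode == "end2end")
    trainable = [p for m in (backbone, head)
                 for p in m.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=1e-3)

opt_lin = configure("lin_prob")
n_lin = sum(p.numel() for g in opt_lin.param_groups for p in g["params"])
opt_e2e = configure("end2end")
n_e2e = sum(p.numel() for g in opt_e2e.param_groups for p in g["params"])
print(n_lin, n_e2e)   # the linear probe updates far fewer parameters
```

Because the probe trains only the head, it needs fewer labeled samples and epochs, which is what makes the pre-trained model cheap to adapt to a new dataset.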

In terms of the Average, the Lin. Prob performs better than the Ct-based statistical methods (i.e., ViroSolver and Regression) and two Ct-based deep learning methods (i.e., Transformer and MLP), with the reduction in MAE ranging from 17.2% to 46.8%. Meanwhile, the End2End performs slightly better than the Ct-Transformer trained from scratch and achieves the best result. Compared with the incidence-based method, the End2End exhibits a 42.6% decrease in MAE and a 33.1% reduction in RMSE. The performance of the Ct-Transformer, across all learning approaches, decreases slightly when R0 is lower on the SF dataset, which is consistent with the observations on the ER dataset.

Real-world data verification.

In this part, we present the results of the self-supervised Ct-Transformer with end-to-end fine-tuning on the real-world dataset. As shown in Fig 4A, the average of Ct values over time displays a distinctly different trend from both the skewness of Ct values and the Rt trend, as illustrated in Fig 4B. This behavior mirrors the phenomena observed in the synthetic dataset, as reported in Fig 2B and 2D. Specifically, the Ct-Transformer undergoes self-supervised training on the ER dataset. Then, this pre-trained model is fine-tuned end-to-end in a supervised manner using the Hong Kong COVID-19 outbreak dataset. Distinct from its application on the synthetic datasets, the input for the Ct-Transformer on this real-world dataset is replaced with the average and skewness of Ct values.

Fig 4. Estimation results in the real-world dataset.

(A) Daily average (green line) and skewness (orange line) of Ct values from July 6th 2020 to March 23rd 2021. The dotted horizontal line represents a skewness value of zero. (B) The orange line and shaded area indicate the average and 95% confidence intervals for the results of the incidence-based method, while the blue line and shaded area indicate the average and 95% confidence intervals for the results of the regression method. The data mentioned above are provided by [31]. The green line and shaded area indicate the average and 95% confidence intervals for the results of the self-supervised Ct-Transformer. The grey bars represent the number of infections by sampling date. The overall period is divided into training, validation, and testing periods (left, middle, and right columns).

https://doi.org/10.1371/journal.pcbi.1012694.g004

As shown in Fig 4B, this real-world dataset is partitioned according to the method described in [31]. On the testing set, the pre-trained Ct-Transformer with End2End fine-tuning captures the temporal evolution of Rt accurately, especially before Feb. 23rd 2021. In contrast, as reported in [31], the regression method, designed to estimate Rt using the average and skewness of Ct values, attains an MAE of 0.25. In this context, the proposed self-supervised Ct-Transformer achieves a smaller MAE of 0.19, a 24% reduction.

Leveraging the pre-trained Ct-Transformer by self-supervised learning on the synthetic datasets, we can selectively retrain the ‘Prediction Head’ or the entire model over fewer epochs to accurately estimate Rt across diverse datasets, including the SF dataset and the Hong Kong COVID-19 dataset. This approach enhances the model’s generalization capability and concurrently diminishes both computational demands and data requirements.

Ablation experiments

To quantify the benefit of each layer in the architecture of the Ct-Transformer, we perform the following ablation experiments and measure the percentage increase in loss compared to the original architecture:

  1. No patch: We set the length of patch in the Patching layer to one.
  2. No CVS: We replace the CVS layer with a linear layer, which performs a linear transformation on the input.
  3. No TFE: We directly remove the TFE layer.
  4. No GRN: We directly remove the GRN layer.
  5. No MSA: We set the head-shared weights for the V matrix in Eq (5) to be head-specific, so that each head is calculated as: $$\mathrm{head}_h = \mathrm{Attention}\left(Q W_h^{Q},\ K W_h^{K},\ V W_h^{V}\right) \tag{9}$$ where the value weights $W_h^{V}$ are no longer shared across heads.
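The fifth ablation can be illustrated numerically: with head-shared weights every head applies the same value projection, whereas the ablated variant gives each head its own. A sketch with illustrative dimensions (the real projections are the learned weights of the MSA layer):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention for one head."""
    d_k = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d_k)) @ v

rng = np.random.default_rng(0)
T, d, heads = 6, 8, 2                        # sequence length, model dim, heads
x = rng.normal(size=(T, d))
Wq = rng.normal(size=(heads, d, d))
Wk = rng.normal(size=(heads, d, d))
W_v_shared = rng.normal(size=(d, d))         # one V projection shared by all heads
W_v_heads = rng.normal(size=(heads, d, d))   # per-head V projections ("No MSA" ablation)

shared = np.stack([attention(x @ Wq[h], x @ Wk[h], x @ W_v_shared)
                   for h in range(heads)])
specific = np.stack([attention(x @ Wq[h], x @ Wk[h], x @ W_v_heads[h])
                     for h in range(heads)])
print(shared.shape, specific.shape)          # both (heads, T, d)
```

With a shared V, the heads differ only in their attention patterns, which keeps the per-head outputs directly comparable; per-head V weights remove that constraint.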

We train the ablated models using the same hyperparameter settings as in the Data section. The average percentage increase in loss compared to the original architecture of the Ct-Transformer is calculated as follows:

$$\Delta_{\alpha} = \frac{1}{n_R}\sum_{r \in R}\frac{E_r^{\alpha} - E_r}{E_r} \times 100\% \tag{10}$$

where $\Delta_{\alpha}$ represents the average percentage increase in loss (MAE, RMSE, or R2) of the ablated model of type α (No patch, No CVS, No TFE, No GRN, or No MSA). $E_r^{\alpha}$ and $E_r$ respectively represent the average loss of the ablated model of type α and of the original architecture of the Ct-Transformer in the simulated epidemics with R0 equal to r, where r ∈ R = {1.2, 1.8, 2.2, 2.8, 3.4}, and $n_R$ represents the number of R0 values in the set R. The average percentage increase in loss (MAE, RMSE, and R2) for each ablation type is shown in Fig 5.
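Eq (10) is a plain average of relative increases over the R0 values; a sketch with illustrative losses (not the paper's numbers):

```python
def avg_percentage_increase(ablated, original):
    """Average percentage increase in loss over all R0 values (Eq 10).
    `ablated` and `original` map each R0 to the corresponding average loss."""
    R = sorted(original)
    return 100.0 * sum((ablated[r] - original[r]) / original[r] for r in R) / len(R)

# Illustrative per-R0 MAE losses for the original and one ablated model.
original = {1.2: 0.20, 1.8: 0.15, 2.2: 0.12, 2.8: 0.10, 3.4: 0.09}
ablated  = {1.2: 0.26, 1.8: 0.18, 2.2: 0.15, 2.8: 0.12, 3.4: 0.11}
print(round(avg_percentage_increase(ablated, original), 1))  # 23.4 (percent)
```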

Fig 5. The average percentage increase in loss for each ablation type.

(A), (B), and (C) respectively represent the MAE, RMSE, and R2 loss. The gray and yellow bars respectively represent the ablation results on the ER and SF datasets. The numbers on the red dotted line represent the average result of both datasets. The error bars represent the standard deviation among the increases in loss for each R0 in the testing set.

https://doi.org/10.1371/journal.pcbi.1012694.g005

Patching.

As shown in Fig 5A, removing the Patching layer results in a 15.9% increase in MAE when averaged over the results on both the ER and SF datasets. In addition to improving computational efficiency, the Patching layer crucially emphasizes the differences among time steps, which aids in extracting the temporal features. Further analysis of this phenomenon, together with the exploration of patch lengths, is detailed in S1 Supplementary Methods (8. Analysis of the Patching Layer).

Temporal Features Extraction.

From Fig 5, the TFE layer emerges as the most important component in the architecture of the Ct-Transformer. Specifically, the removal of the TFE layer results in significant changes in MAE, RMSE, and R2 of 54.7%, 49.8%, and -3.0%, respectively. These results highlight the crucial role of extracting temporal features from the time-varying distribution of Ct values for accurate estimates of Rt.

Nonlinear fitting and dependency learning.

The GRN and MSA layers have a large impact on the performance of the Ct-Transformer on the SF dataset. Specifically, the removal of the GRN and MSA layers results in a 20.7% and a 30.2% increase in MAE on the SF dataset, respectively, while these increases are only 4.8% and 2.7% on the ER dataset, as shown in Fig 5A. These differences reflect the importance of non-linear fitting and dependency learning under heterogeneous contact patterns.

In summary, each layer in the Ct-Transformer improves the performance of estimating Rt. The removal of the TFE layer results in the most significant percentage increase in loss, while removing the CVS layer leads to the least percentage increase in loss. The effects of removing the GRN and MSA layers differ across the two synthetic datasets.

Discussion

Real-time estimates of Rt are crucial for understanding epidemic dynamics and timely adjustment of interventions to mitigate and prevent the spread of emergent infectious diseases. In this work, we propose the Ct-Transformer, an attention-based deep neural network architecture for estimating the time-varying effective reproduction number Rt based on the time-varying distribution of Ct values. We find the Ct-Transformer outperforms both the incidence-based and existing Ct-based methods in estimating Rt. This work highlights the effectiveness of the Ct-based deep learning method in improving real-time estimates of Rt and timely tracking of epidemic dynamics.

Some studies [25, 29–33] have developed statistical methods to estimate Rt based on the distribution of Ct values in the infected population. The ViroSolver [25] employs Bayesian inference to estimate Rt from the distribution of Ct values collected at single or multiple time points in the population. This method leverages the idea of refining prior knowledge with new evidence and continuously re-estimating the probability distributions of the results, which significantly enhances interpretability. However, this method requires prior knowledge of the transmission dynamics, such as the prior distribution of the epidemic seed time, the assumption of constant exponential growth of infection incidence, etc. In contrast, the proposed deep learning model, Ct-Transformer, can estimate Rt accurately without these assumptions. Specifically, the supervised Ct-Transformer is trained using simulated outbreaks with four values of R0, yet it can estimate Rt accurately in simulated outbreaks across a range of R0. This capability is attributed to the layers designed in the architecture of the Ct-Transformer, which extract the temporal features from the time-varying distribution of Ct values and establish the complex non-linear relationship with Rt in a purely data-driven manner. Further, we explore and compare the impact of different combinations of Ct variables on the model performance to fully leverage the information from the evolution of Ct values, as detailed in S1 Supplementary Methods (3. Ct Variables in the Synthetic Datasets). The distribution of Ct values with an interval of 4 better reflects the shape of the distribution, and the average can serve as the viral load level of the infected population. Thus, the combination of these two variables enables the Ct-Transformer to capture the evolution of Ct values more effectively than the ViroSolver and the regression model [31], which rely on a single type of Ct variable as input.

To demonstrate the robustness of the proposed Ct-Transformer, we have evaluated the performance of the supervised Ct-Transformer under different detection scenarios. We find that the proposed model is capable of accurately estimating Rt under both low and time-varying detection rates. In contrast, the incidence-based methods, such as the EpiEstim [13], exhibit fluctuations and instability during the early stages of epidemics. These sensitivity analyses show that the proposed Ct-Transformer is robust to the availability of detection resources, which can be explained by the input of the Ct-Transformer and the advantages offered by deep learning methods. Such robustness is especially valuable in epidemic areas with limited detection resources or during the early stage of a newly emergent epidemic.

To further enhance the adaptability of the proposed model, we have developed the self-supervised learning of the Ct-Transformer, which can be pre-trained on a synthetic dataset without labels and then undergo supervised training on a small target dataset. Our findings on the synthetic dataset indicate that the self-supervised Ct-Transformer with the End2End fine-tuning strategy achieves performance comparable to that of the model trained from scratch in a supervised manner. Further, this approach yields superior performance on the Hong Kong COVID-19 dataset, evidenced by a lower MAE loss and narrower confidence intervals compared to the regression model proposed in [31]. These results demonstrate that the pre-trained Ct-Transformer can be adopted to estimate Rt across diverse epidemic dynamics through fine-tuning strategies.

All layers in the architecture of the Ct-Transformer contribute to the accuracy of estimating Rt, as shown by the results of the ablation experiments. The Ct-Transformer outperforms both the Transformer [45] and MLP [41], which is primarily attributed to the crucial role of extracting temporal features through a GRU network in the TFE layer. Additionally, this model benefits from an innovative patching mechanism and exhibits superior performance compared with the TFT [46]. The Ct-Transformer's ability to accurately estimate Rt in stochastic simulations with heterogeneous contact networks is further enhanced by the incorporation of the gated unit in the GRN layer and the self-attention mechanisms in the MSA layer.

There are several limitations in our work. Firstly, the Ct-Transformer needs prior knowledge about the evolution of Ct values in a newly emergent epidemic, which may be uncertain despite the reduction in training data requirements afforded by self-supervised learning. Additionally, the trajectories of Ct values can vary significantly across different populations. In particular, populations with higher vaccination rates [64] or younger age [65] exhibit higher average Ct values during the epidemic outbreak. This intrinsic relationship between Ct values and Rt requires recalibration when applying the Ct-Transformer to a new outbreak, or to the same infectious disease in different countries with distinct contact patterns. Regarding the architecture of the model, there is potential for further improving the Ct-Transformer by developing an adaptive patching strategy based on the input, which may enhance performance beyond the current fixed patch length. Lastly, there are two definitions of the time-varying effective reproduction number: the instantaneous reproduction number and the case reproduction number. The former focuses on short time windows and is very sensitive to immediate changes in transmission, while the latter reflects the actual transmission rather than the potential. Although the proposed Ct-Transformer performs well in estimating the case reproduction number, whether it will be effective in estimating the instantaneous reproduction number is left for future investigation.

Conclusion

In this paper, we propose a novel attention-based deep neural network architecture, the Ct-Transformer, and a self-supervised learning approach for estimating the time-varying effective reproduction number Rt based on the time-varying distribution of Ct values. The supervised Ct-Transformer outperforms the standard incidence-based and Ct-based statistical approaches as well as other Ct-based deep learning methods. These results indicate the effectiveness of the designed architecture of the Ct-Transformer in extracting temporal features and processing nonlinear relationships. The pre-trained Ct-Transformer is evaluated on both the synthetic and real-world datasets and attains performance comparable to the supervised model. We demonstrate that the Ct-based deep learning model can improve real-time estimates of Rt, especially in monitoring newly emerged infectious diseases. Further, the architecture of the Ct-Transformer can provide insights for designing models for other tasks, including time series forecasting.

Supporting information

S1 Supplementary Methods.

1. Parameters of the Agent-based SEIR Transmission Model and Ct Value Model. 2. Calculation of Rt based on Micro-transmission Chains. 3. Ct Variables in the Synthetic Datasets. 4. Hyper-parameters of Deep Learning Methods. 5. Confidence Intervals of Rt Estimation. 6. Further Exploration of Detection Rate on Model Performance. 7. Rate of Masked Patches for Self-supervised Learning. 8. Analysis of the Patching Layer.

https://doi.org/10.1371/journal.pcbi.1012694.s001

(ZIP)

S1 Fig. The trajectories of Ct values.

The gray lines represent the trajectories of Ct values for 50 randomly selected infected individuals, while the black line represents the trajectory of one infected individual.

https://doi.org/10.1371/journal.pcbi.1012694.s002

(EPS)

S2 Fig. The 95% confidence intervals of Rt estimation.

The pink line and shaded area respectively represent the average of Rt and the 95% confidence intervals.

https://doi.org/10.1371/journal.pcbi.1012694.s003

(EPS)

S3 Fig. The increased percentage in MAE relative to the Full Detection scenario in stochastic simulations on both ER and SF networks with R0=2.2.

The greyish-blue and yellow bars respectively represent the MAE loss of the Ct-Transformer and EpiEstim in different detection scenarios. The numbers displayed on the bars indicate the increased percentage in MAE relative to the Full Detection scenario.

https://doi.org/10.1371/journal.pcbi.1012694.s004

(EPS)

S4 Fig. Attention maps of the supervised Ct-Transformer with or without patching.

(A) Attention map in stochastic simulation on the ER network with patching. (B) Attention map in stochastic simulation on the SF network with patching. (C) Attention map in stochastic simulation on the ER network without patching. (D) Attention map in stochastic simulation on the SF network without patching. All stochastic simulations with R0=1.2.

https://doi.org/10.1371/journal.pcbi.1012694.s005

(EPS)

S5 Fig. The MAE loss with varying patch lengths P = [2, 4, 6, 8, 10, 12, 14, 16, 18] on the ER and SF datasets.

https://doi.org/10.1371/journal.pcbi.1012694.s006

(EPS)

S1 Table. Parameters of the agent-based SEIR transmission model and Ct value model.

https://doi.org/10.1371/journal.pcbi.1012694.s007

(PDF)

S2 Table. Intervals d and the set D in the distribution of Ct values.

https://doi.org/10.1371/journal.pcbi.1012694.s008

(PDF)

S3 Table. The sensitivity results of intervals d on ER and SF datasets.

The Average means the average of simulations with R0 ∈ {1.2, 1.8, 2.2, 2.8, 3.4} in the testing set. For each R0 and the Average, the best results are in bold and the runners-up are presented as underlined.

https://doi.org/10.1371/journal.pcbi.1012694.s009

(PDF)

S4 Table. The sensitivity results of Ct variables (Var) on ER and SF datasets.

The Average means the average of the simulations with R0 ∈ {1.2, 1.8, 2.2, 2.8, 3.4} in the testing set. For each R0 and the Average, the best results are in bold and the runners-up are presented as underlined.

https://doi.org/10.1371/journal.pcbi.1012694.s010

(PDF)

S5 Table. Hyperparameters, tuning spaces, and the best hyperparameter settings for deep learning methods (Ct-Transformer, TFT, Transformer, and MLP) on the ER dataset and SF dataset.

https://doi.org/10.1371/journal.pcbi.1012694.s011

(PDF)

S6 Table. Results of the supervised Ct-Transformer and the EpiEstim method on the SF dataset under different detection scenarios.

For each detection scenario, the better one is presented as in bold.

https://doi.org/10.1371/journal.pcbi.1012694.s012

(PDF)

S7 Table. The sensitivity results of mask rates on the SF dataset.

The Average means the average of the simulations with R0 ∈ {1.2, 1.8, 2.2, 2.8, 3.4}. For each R0 and the Average, the best one is in bold and the runners-up is presented as underlined.

https://doi.org/10.1371/journal.pcbi.1012694.s013

(PDF)

References

  1. Flaxman S, Mishra S, Gandy A, Unwin HJT, Mellan TA, Coupland H, et al. Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe. Nature. 2020;584(7820):257–261. pmid:32512579
  2. Li Y, Campbell H, Kulkarni D, Harpur A, Nundy M, Wang X, et al. The temporal association of introducing and lifting non-pharmaceutical interventions with the time-varying reproduction number (R) of SARS-CoV-2: a modelling study across 131 countries. The Lancet Infectious Diseases. 2021;21(2):193–202. pmid:33729915
  3. Brauner JM, Mindermann S, Sharma M, Johnston D, Salvatier J, Gavenčiak T, et al. Inferring the effectiveness of government interventions against COVID-19. Science. 2021;371(6531):eabd9338. pmid:33323424
  4. Liu QH, Ajelli M, Aleta A, Merler S, Moreno Y, Vespignani A. Measurability of the epidemic reproduction number in data-driven contact networks. Proceedings of the National Academy of Sciences. 2018;115(50):12680–12685. pmid:30463945
  5. Liu QH, Bento AI, Yang K, Zhang H, Yang X, Merler S, et al. The COVID-19 outbreak in Sichuan, China: epidemiology and impact of interventions. PLoS Computational Biology. 2020;16(12):e1008467. pmid:33370263
  6. Chen D, Zhou T. Evaluating the effect of Chinese control measures on COVID-19 via temporal reproduction number estimation. PLoS One. 2021;16(2):e0246715. pmid:33571273
  7. Gostic KM, McGough L, Baskerville EB, Abbott S, Joshi K, Tedijanto C, et al. Practical considerations for measuring the effective reproductive number, Rt. PLoS Computational Biology. 2020;16(12):e1008409. pmid:33301457
  8. Anderson RM, May RM. Infectious diseases of humans: dynamics and control. Oxford University Press; 1991.
  9. Nishiura H, Chowell G. The effective reproduction number as a prelude to statistical estimation of time-dependent epidemic trends. Mathematical and Statistical Estimation Approaches in Epidemiology. 2009; p. 103–121.
  10. Wallinga J, Teunis P. Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures. American Journal of Epidemiology. 2004;160(6):509–516. pmid:15353409
  11. Fraser C. Estimating individual and household reproduction numbers in an emerging epidemic. PLoS One. 2007;2(8):e758. pmid:17712406
  12. Bettencourt LM, Ribeiro RM. Real time Bayesian estimation of the epidemic potential of emerging infectious diseases. PLoS One. 2008;3(5):e2185. pmid:18478118
  13. Cori A, Ferguson NM, Fraser C, Cauchemez S. A new framework and software to estimate time-varying reproduction numbers during epidemics. American Journal of Epidemiology. 2013;178(9):1505–1512. pmid:24043437
  14. Gressani O, Wallinga J, Althaus CL, Hens N, Faes C. EpiLPS: A fast and flexible Bayesian tool for estimation of the time-varying reproduction number. PLoS Computational Biology. 2022;18(10):e1010618. pmid:36215319
  15. Liu J, Cai Z, Gustafson P, McDonald DJ. RtEstim: Effective reproduction number estimation with trend filtering. medRxiv. 2023; p. 2023–12.
  16. Parag KV. Improved estimation of time-varying reproduction numbers at low case incidence and between epidemic waves. PLoS Computational Biology. 2021;17(9):e1009347. pmid:34492011
  17. Abbott S, Hellewell J, Thompson RN, Sherratt K, Gibbs HP, Bosse NI, et al. Estimating the time-varying reproduction number of SARS-CoV-2 using national and subnational case counts. Wellcome Open Research. 2020;5(112):112.
  18. Ferguson NM, Cucunubá ZM, Dorigatti I, Nedjati-Gilani GL, Donnelly CA, Basáñez MG, et al. Countering the Zika epidemic in Latin America. Science. 2016;353(6297):353–354. pmid:27417493
  19. Kirsch TD, Moseson H, Massaquoi M, Nyenswah TG, Goodermote R, Rodriguez-Barraquer I, et al. Impact of interventions and the incidence of Ebola virus disease in Liberia—implications for future epidemics. Health Policy and Planning. 2017;32(2):205–214. pmid:28207062
  20. Nouvellet P, Cori A, Garske T, Blake IM, Dorigatti I, Hinsley W, et al. A simple approach to measure transmissibility and forecast incidence. Epidemics. 2018;22:29–35. pmid:28351674
  21. Wallinga J, Lipsitch M. How generation intervals shape the relationship between growth rates and reproductive numbers. Proceedings of the Royal Society B: Biological Sciences. 2007;274(1609):599–604. pmid:17476782
  22. Jombart T, Van Zandvoort K, Russell TW, Jarvis CI, Gimma A, Abbott S, et al. Inferring the number of COVID-19 cases from recently reported deaths. Wellcome Open Research. 2020;5. pmid:32518842
  23. Lipsitch M, Swerdlow DL, Finelli L. Defining the epidemiology of Covid-19—studies needed. New England Journal of Medicine. 2020;382(13):1194–1196. pmid:32074416
  24. Pitzer VE, Chitwood M, Havumaki J, Menzies NA, Perniciaro S, Warren JL, et al. The impact of changes in diagnostic testing practices on estimates of COVID-19 transmission in the United States. American Journal of Epidemiology. 2021;190(9):1908–1917. pmid:33831148
  25. 25. Hay JA, Kennedy-Shaffer L, Kanjilal S, Lennon NJ, Gabriel SB, Lipsitch M, et al. Estimating epidemiologic dynamics from cross-sectional viral load distributions. Science. 2021;373(6552):eabh0635. pmid:34083451
  26. 26. El Zein S, Chehab O, Kanj A, Akrawe S, Alkassis S, Mishra T, et al. SARS-CoV-2 infection: Initial viral load (iVL) predicts severity of illness/outcome, and declining trend of iVL in hospitalized patients corresponds with slowing of the pandemic. PLoS One. 2021;16(9):e0255981. pmid:34529675
  27. 27. Sala E, Shah IS, Manissero D, Juanola-Falgarona M, Quirke AM, Rao SN. Systematic Review on the Correlation Between SARS-CoV-2 Real-Time PCR Cycle Threshold Values and Epidemiological Trends. Infectious Diseases and Therapy. 2023;12(3):749–775. pmid:36811776
  28. 28. Péré H, Podglajen I, Wack M, Flamarion E, Mirault T, Goudot G, et al. Nasal swab sampling for SARS-CoV-2: a convenient alternative in times of nasopharyngeal swab shortage. Journal of clinical microbiology. 2020;58(6):10–1128. pmid:32295896
  29. 29. Hay JA, Kennedy-Shaffer L, Mina MJ. Viral loads observed under competing strain dynamics. MedRxiv. 2021; p. 2021–07.
  30. 30. Andriamandimby SF, Brook CE, Razanajatovo N, Randriambolamanantsoa TH, Rakotondramanga JM, Rasambainarivo F, et al. Cross-sectional cycle threshold values reflect epidemic dynamics of COVID-19 in Madagascar. Epidemics. 2022;38:100533. pmid:34896895
  31. 31. Lin Y, Yang B, Cobey S, Lau EH, Adam DC, Wong JY, et al. Incorporating temporal distribution of population-level viral load enables real-time estimation of COVID-19 transmission. Nature Communications. 2022;13(1):1155. pmid:35241662
  32. 32. Alizon S, Selinger C, Sofonea MT, Haim-Boukobza S, Giannoli JM, Ninove L, et al. Epidemiological and clinical insights from SARS-CoV-2 RT-PCR crossing threshold values, France, January to November 2020. Eurosurveillance. 2022;27(6):2100406. pmid:35144725
  33. 33. Phillips MC, Quintero D, Wald-Dickler N, Holtom P, Butler-Wu SM. SARS-CoV-2 cycle threshold (Ct) values predict future COVID-19 cases. Journal of Clinical Virology. 2022;150:105153. pmid:35472751
  34. 34. Liu C, Xu L, Bai Y, Xu X, Lau EHY, Cowling BJ, et al. Local Surveillance of the COVID-19 Outbreak. Frontiers in Physics. 2022;10.
  35. 35. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:201011929. 2020;.
  36. 36. Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;.
  37. 37. Lim B, Zohren S. Time-series forecasting with deep learning: a survey. Philosophical Transactions of the Royal Society A. 2021;379(2194):20200209. pmid:33583273
  38. 38. Bengio Y, Courville A, Vincent P. Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence. 2013;35(8):1798–1828. pmid:23787338
  39. 39. Graves A, Graves A. Long short-term memory. Supervised sequence labelling with recurrent neural networks. 2012; p. 37–45.
  40. 40. Davahli MR, Fiok K, Karwowski W, Aljuaid AM, Taiar R. Predicting the dynamics of the COVID-19 pandemic in the United States using graph theory-based neural networks. International journal of environmental research and public health. 2021;18(7):3834. pmid:33917544
  41. 41. Gatto A, Aloisi V, Accarino G, Immorlano F, Chiarelli M, Aloisio G. An artificial neural network-based approach for predicting the COVID-19 daily effective reproduction number Rt in Italy. AI. 2022;3(1):146–163.
  42. 42. Cinaglia P, Cannataro M. Forecasting COVID-19 epidemic trends by combining a neural network with Rt estimation. Entropy. 2022;24(7):929. pmid:35885152
  43. 43. Song P, Xiao Y. Estimating time-varying reproduction number by deep learning techniques. J Appl Anal Comput. 2022;12(3):1077–1089.
  44. 44. Zhou J, Cui G, Hu S, Zhang Z, Yang C, Liu Z, et al. Graph neural networks: A review of methods and applications. AI open. 2020;1:57–81.
  45. 45. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Advances in neural information processing systems. 2017;30.
  46. 46. Lim B, Arık SÖ, Loeff N, Pfister T. Temporal fusion transformers for interpretable multi-horizon time series forecasting. International Journal of Forecasting. 2021;37(4):1748–1764.
  47. 47. Zhou B, Yang G, Shi Z, Ma S. Interpretable temporal attention network for COVID-19 forecasting. Applied Soft Computing. 2022;120:108691. pmid:35281183
  48. 48. Clevert DA, Unterthiner T, Hochreiter S. Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:151107289. 2015.
  49. 49. Rumelhart DE, Hinton GE, Williams RJ, et al.. Learning internal representations by error propagation; 1985.
  50. 50. Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:14061078. 2014;.
  51. 51. Dauphin YN, Fan A, Auli M, Grangier D. Language modeling with gated convolutional networks. In: International conference on machine learning. PMLR; 2017. p. 933–941.
  52. 52. Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. nature. 1986;323(6088):533–536.
  53. 53. Ba JL, Kiros JR, Hinton GE. Layer normalization. arXiv preprint arXiv:160706450. 2016.
  54. 54. Wen R, Torkkola K, Narayanaswamy B, Madeka D. A multi-horizon quantile recurrent forecaster. arXiv preprint arXiv:171111053. 2017.
  55. 55. Zerveas G, Jayaraman S, Patel D, Bhamidipaty A, Eickhoff C. A transformer-based framework for multivariate time series representation learning. In: Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining; 2021. p. 2114–2124.
  56. 56. Nie Y, Nguyen NH, Sinthong P, Kalagnanam J. A time series is worth 64 words: Long-term forecasting with transformers. arXiv preprint arXiv:221114730. 2022.
  57. 57. Perez L, Dragicevic S. An agent-based approach for modeling dynamics of contagious disease spread. International journal of health geographics. 2009;8(1):1–17. pmid:19656403
  58. 58. ERDdS P, R&wi A. On random graphs I. Publ math debrecen. 1959;6(290-297):18.
  59. 59. Barabási AL, Albert R. Emergence of scaling in random networks. science. 1999;286(5439):509–512. pmid:10521342
  60. 60. Al Shalabi L, Shaaban Z, Kasasbeh B. Data mining: A preprocessing engine. Journal of Computer Science. 2006;2(9):735–739.
  61. 61. Leydesdorff L, Bensman S. Classification and powerlaws: The logarithmic transformation. Journal of the American Society for Information Science and Technology. 2006;57(11):1470–1486.
  62. 62. Prechelt L. Early stopping-but when? In: Neural Networks: Tricks of the trade. Springer; 2002. p. 55–69.
  63. 63. Bergstra J, Bengio Y. Random search for hyper-parameter optimization. Journal of machine learning research. 2012;13(2).
  64. 64. Levine-Tiefenbrun M, Yelin I, Katz R, Herzel E, Golan Z, Schreiber L, et al. Initial report of decreased SARS-CoV-2 viral load after inoculation with the BNT162b2 vaccine. Nature medicine. 2021;27(5):790–792. pmid:33782619
  65. 65. Jones TC, Biele G, Mühlemann B, Veith T, Schneider J, Beheim-Schwarzbach J, et al. Estimating infectiousness throughout SARS-CoV-2 infection course. Science. 2021;373(6551):eabi5273. pmid:34035154