Estimating the time-varying effective reproduction number via Cycle Threshold-based Transformer

doi:10.1371/journal.pcbi.1012694

Fig 1.

The architecture of Ct-Transformer.

(A) The Ct-Transformer accepts a time series of Ct variables and outputs the corresponding time series of estimated R_t. The agent-based SEIR transmission method is used to produce the synthetic datasets. (B) The Ct-Transformer segments the time series of Ct variables into patches, which are then projected into a high-dimensional space based on their respective weights. The model extracts temporal features from the high-dimensional hidden representations and builds non-linear relationships with R_t. (C) Masked self-supervised learning of the Ct-Transformer: randomly selected patches are set to zero, and the Ct-Transformer is then tasked with reconstructing these masked patches.
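
The patching in (B) and the masking in (C) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the non-overlapping patch layout, the patch length of 7, and the 40% mask ratio are all assumptions chosen for illustration, and the toy Ct series is synthetic.

```python
import random

def make_patches(series, patch_len):
    """Split a 1-D series into non-overlapping patches, dropping any remainder.

    Non-overlapping patches are an assumption; the caption does not say
    whether the patches overlap.
    """
    n = len(series) // patch_len
    return [series[i * patch_len:(i + 1) * patch_len] for i in range(n)]

def mask_patches(patches, mask_ratio, rng):
    """Set a random subset of patches to zero.

    Returns the masked copy and the sorted indices of the masked patches,
    which a reconstruction objective would use as its targets.
    """
    n_mask = int(round(mask_ratio * len(patches)))
    masked_idx = sorted(rng.sample(range(len(patches)), n_mask))
    masked = [
        [0.0] * len(p) if i in masked_idx else list(p)
        for i, p in enumerate(patches)
    ]
    return masked, masked_idx

# Toy daily mean Ct values (synthetic, for illustration only).
ct_series = [30.0 - 0.1 * t for t in range(35)]
patches = make_patches(ct_series, patch_len=7)
masked, idx = mask_patches(patches, mask_ratio=0.4, rng=random.Random(0))
```

In a self-supervised setup of this kind, the model would see `masked` as input and be trained to reproduce the original values of `patches` at the positions listed in `idx`.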

Table 1.

Detailed partitioning of the synthetic datasets, including the ER dataset and SF dataset.

Table 2.

Hyperparameters, corresponding tuning spaces, and the best hyperparameter settings for the Ct-Transformer on the ER dataset and SF dataset.

Fig 2.

Ct values reflect epidemic dynamics throughout an outbreak.

(A) The number of currently infected individuals (light histogram) and the number of newly infected individuals (dark histogram) in a stochastic simulation with R₀=3.0. (B) The average Ct value of the infected population over time. Each line shows the average of 1500 simulations. (C) The distribution of Ct values (violin plots) in the infected population, randomly sampled on the detection days (t=10, 30, 50, and 70) during the outbreak described in (A). The median and the first and third quartiles of the distribution are indicated by purple lines, while the red dots represent the average of Ct values (the median quartiles) on these detection days. (D) R_t over the course of each epidemic, with lines corresponding to those of the same color in panel (B). Each outbreak is simulated on the ER contact network.

Fig 3.

The performance of the supervised Ct-Transformer on two stochastic simulations on the ER network.

Panels (A) and (B) display the R_t estimates from the Ct-Transformer and alternative models for simulations where R₀ is set to 1.8 and 3.4, respectively. Note that all models begin estimating R_t once there are fifteen cases in the simulated population. The ground truth values are derived from the micro-transmission chain of the two stochastic simulations, based on the definition of the case reproduction number. The estimation results of EpiEstim (dotted yellow line), ViroSolver (dotted green line), and TFT (orange line) are presented as the representative alternative methods for incidence-based, Ct-based statistical, and Ct-based deep learning approaches, respectively.
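
As an illustration of how a ground truth can be read off a transmission chain, the sketch below computes a case reproduction number as the mean number of secondary infections caused by the individuals infected on each day. This is a minimal sketch under that standard definition, not the authors' code: the function name, the dictionary-based data layout, and the four-person toy chain are assumptions.

```python
from collections import defaultdict

def case_reproduction_number(infection_day, infector_of):
    """Mean number of secondary cases per individual, grouped by infection day.

    infection_day: dict mapping person -> day on which they were infected.
    infector_of:   dict mapping person -> who infected them (None for seeds).
    Both structures are hypothetical stand-ins for a simulated transmission chain.
    """
    # Count how many people each individual went on to infect.
    secondary = defaultdict(int)
    for person, source in infector_of.items():
        if source is not None:
            secondary[source] += 1

    # Group those counts by the day the infector was infected.
    counts_by_day = defaultdict(list)
    for person, day in infection_day.items():
        counts_by_day[day].append(secondary[person])

    return {day: sum(c) / len(c) for day, c in sorted(counts_by_day.items())}

# Toy chain: person 0 is the seed and infects 1 and 2; person 1 infects 3.
infection_day = {0: 0, 1: 2, 2: 2, 3: 4}
infector_of = {0: None, 1: 0, 2: 0, 3: 1}
rt = case_reproduction_number(infection_day, infector_of)
```

Values for the most recent days are biased downward in any finite chain, because individuals infected late have not yet had time to infect others.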

Table 3.

Results of the supervised Ct-Transformer and baseline methods on the ER dataset.

We show the average result of the simulations for each R₀ ∈ {1.2, 1.8, 2.2, 2.8, 3.4} in the testing set. The Average represents the average of the above results. For each R₀ and the Average, the best results are in bold and the second best are underlined.

Table 4.

Results of the supervised Ct-Transformer and EpiEstim on the ER dataset under different detection scenarios.

We show the average results of simulations with R₀ ∈ {1.2, 2.2, 3.4} in the testing set. For each detection scenario, the better of the Ct-Transformer and EpiEstim results is shown in bold.

Table 5.

Results of the Ct-Transformer (with End2End, Lin. Prob, and Sup.) and other Ct-based supervised and incidence-based methods on the SF dataset.

We show the average results of simulations with R₀ ∈ {1.2, 1.8, 2.2, 2.8, 3.4} in the testing set. The Average represents the average of the above results. For each R₀ and the Average, the best results are in bold and the second best are underlined.

Fig 4.

Estimation results on the real-world dataset.

(A) Daily average (green line) and skewness (orange line) of Ct values from July 6th, 2020 to March 23rd, 2021. The dotted horizontal line marks a skewness of zero. (B) The orange line and shaded area indicate the average and 95% confidence intervals for the results of the incidence-based method, while the blue line and shaded area indicate the average and 95% confidence intervals for the results of the regression method. The data mentioned above are provided by [31]. The green line and shaded area indicate the average and 95% confidence intervals for the results of the self-supervised Ct-Transformer. The grey bars represent the number of infections by sampling date. The overall period is divided into training, validation, and testing periods (left, middle, and right columns).
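
The two daily summaries plotted in (A) can be computed as in the sketch below. It is an illustrative sketch only: the record layout, the function names, and the toy Ct values are assumptions, and the skewness shown is the population (Fisher-Pearson) coefficient, which may differ from the estimator used for the figure.

```python
from collections import defaultdict
from statistics import fmean

def skewness(values):
    """Population (Fisher-Pearson) skewness: third central moment / variance^1.5."""
    n = len(values)
    mu = fmean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    # A constant sample has zero variance; report zero skewness in that case.
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0

def daily_ct_summary(records):
    """Map each sampling date to (mean Ct, skewness of Ct).

    records: iterable of (date_string, ct_value) pairs (hypothetical layout).
    """
    by_day = defaultdict(list)
    for day, ct in records:
        by_day[day].append(ct)
    return {day: (fmean(cts), skewness(cts)) for day, cts in sorted(by_day.items())}

# Toy records (synthetic values, not taken from the real-world dataset).
records = [
    ("2020-07-06", 20.0), ("2020-07-06", 25.0), ("2020-07-06", 30.0),
    ("2020-07-07", 18.0), ("2020-07-07", 19.0), ("2020-07-07", 20.0),
    ("2020-07-07", 35.0),
]
summary = daily_ct_summary(records)
```

A skewness above zero means the day's Ct distribution has a longer tail toward high Ct values, and a skewness below zero means a longer tail toward low Ct values, which is why the figure marks the zero line.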

Fig 5.

The average percentage increase in loss for each ablation type.

(A), (B), and (C) respectively represent the MAE, RMSE, and R² loss. The gray and yellow bars respectively represent the ablation results on the ER and SF datasets. The numbers on the red dotted line represent the average result of both datasets. The error bars represent the standard deviation among the increases in loss for each R₀ in the testing set.
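
The quantities behind the bars and error bars can be sketched as follows. This is a hedged illustration, not the authors' procedure: the loss values are hypothetical, the helper name is an assumption, and the sample standard deviation is one possible choice for the error bars.

```python
from statistics import mean, stdev

def percent_increase(baseline_loss, ablated_loss):
    """Percentage increase of the ablated model's loss over the full model's loss."""
    return 100.0 * (ablated_loss - baseline_loss) / baseline_loss

# Hypothetical MAE values per R0 (illustrative numbers, not results from the paper).
full_model = {1.2: 0.10, 2.2: 0.20, 3.4: 0.25}
ablated = {1.2: 0.12, 2.2: 0.25, 3.4: 0.30}

increases = [percent_increase(full_model[r0], ablated[r0]) for r0 in full_model]
avg_increase = mean(increases)   # height of one bar
spread = stdev(increases)        # error bar: sample standard deviation across R0
```

For a score such as R², where higher is better, a sign convention is needed before it can be treated as a loss; the caption does not state which convention is used, so the sketch is limited to error-type metrics such as MAE and RMSE.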
