An explainable covariate compartmental model for predicting the spatio-temporal patterns of dengue in Sri Lanka

doi:10.1371/journal.pcbi.1013540

Fig 1.

Aggregated reported cases for the 16 study districts across Sri Lanka, 2011–2019.

There was a significant outbreak in 2017, with smaller outbreaks occurring in 2012, 2014, 2016 and 2019 [42]. Colombo and Gampaha generally have higher case counts than other districts, partly because of their larger populations. Districts in order from top to bottom: Colombo, Gampaha, Kalutara, Kandy, Polonnaruwa-Matale (P-M), Badulla-Nuwara Eliya (B-N), Galle, Ampara-Monaragala-Hambantota (A-M-H), Matara, Jaffna, Kilinochchi-Mannar-Vavuniya-Mullaitivu (K-M-V-M), Batticaloa, Trincomalee-Anuradhapura (T-A), Kurunegala-Puttalam (K-P), Ratnapura, Kegalle. Districts with a limited number of cases were aggregated with neighboring districts.


Table 1.

List of covariates.


Fig 2.

Structure of hybrid compartment model (A) and lag configuration selection (B).

(A) At each time step t, the covariates (household income, proportion of the population aged over 60, mean temperature, precipitation, mean NDVI), the newly infected cases, and the hidden state h_t from the previous time step are encoded by the LSTM model. The LSTM output is then passed into a linear layer with a sigmoid activation function and amplified by a scaling coefficient to approximate the time-varying force of infection λ_t. The compartment model, in which S, E, I and R represent the susceptible, exposed, infected and recovered compartments, respectively, is driven by λ_t. (B) We perform lag configuration selection on the validation set by shifting a climate factor by different lags (no lag, 1-3 weeks, 4-6 weeks, 7-9 weeks, 10-12 weeks) and recording the best-performing and worst-performing lag configurations. We then repeat the same selection for all the climate covariates. Finally, we combine the best-performing lag configurations of all the climate covariates to select the best lag model.
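
The chain described in panel (A), from an encoder output through a sigmoid-activated linear layer and a scaling coefficient to a force of infection that drives the S, E, I, R compartments, can be pictured with the minimal sketch below. It is an illustration under stated assumptions, not the authors' implementation: the LSTM is replaced by a plain list of floats, the function names `force_of_infection` and `seir_step`, the rates `sigma` and `gamma`, the closed population, and the forward-Euler update with new exposures equal to λ_t · S are all hypothetical choices made for this example.

```python
import math


def force_of_infection(encoder_output, weights, bias, scale):
    """Linear layer + sigmoid, multiplied by a scaling coefficient.

    `encoder_output` is a stand-in for the LSTM output (assumption: a plain
    list of floats); `weights`, `bias` and `scale` are hypothetical parameters.
    """
    z = sum(w * x for w, x in zip(weights, encoder_output)) + bias
    return scale / (1.0 + math.exp(-z))  # bounded in (0, scale)


def seir_step(state, lam, sigma, gamma):
    """One forward-Euler step (dt = 1) of an SEIR model driven by `lam`.

    Assumptions: closed population, new exposures = lam * S,
    sigma = rate E -> I, gamma = rate I -> R.
    """
    s, e, i, r = state
    new_exposed = lam * s
    new_infected = sigma * e
    new_recovered = gamma * i
    next_state = (
        s - new_exposed,
        e + new_exposed - new_infected,
        i + new_infected - new_recovered,
        r + new_recovered,
    )
    # The E -> I flow plays the role of the newly infected cases.
    return next_state, new_infected


lam_t = force_of_infection([0.0, 0.0], [1.0, 1.0], bias=0.0, scale=0.2)
state, new_cases = seir_step((1000.0, 10.0, 5.0, 0.0), lam_t, sigma=0.5, gamma=0.25)
```

Because the sigmoid bounds its output between 0 and 1, the scaling coefficient sets the upper limit of λ_t in this sketch, and the update only moves individuals between compartments, so the total population is conserved.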


Fig 3.

The vertical axis shows MAE values for the model configurations described in Table 2, calculated as the average result across all districts. Models I through IV are evaluated as part of the model selection process, while Model V additionally accounts for the strain shift and the immunity reduction observed in 2017 with the introduction of DENV-2 and is validated on 2019 prospective data.


Table 2.

Models evaluated during model selection and validation. Model I includes only socioeconomic covariates. Models II to V include socioeconomic factors and meteorological factors that exhibit lag effects, i.e., mean temperature, precipitation, and mean NDVI. Model V compares the full set excluding corrected covariates with the introduction of the shifting strain.
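
The lag effects mentioned for the meteorological covariates amount to pairing each week's cases with a covariate value from some weeks earlier, and then choosing the lag by validation error. The sketch below illustrates that idea only. The helper names `shift_by_lag` and `select_lag`, the use of single-week lags rather than lag ranges, and the `validation_error` callable (standing in for training and scoring a model) are hypothetical simplifications, not the procedure used in the paper.

```python
def shift_by_lag(series, lag):
    """Delay a covariate series by `lag` steps.

    The first `lag` positions have no earlier value and are filled with None.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    n = len(series)
    lag = min(lag, n)
    return [None] * lag + list(series[: n - lag])


def select_lag(lags, validation_error):
    """Return (best_lag, worst_lag) according to a validation-error callable.

    `validation_error(lag)` is a hypothetical function that would fit a model
    with the covariate shifted by `lag` and return its validation error.
    """
    scored = [(validation_error(lag), lag) for lag in lags]
    best = min(scored)[1]
    worst = max(scored)[1]
    return best, worst


weekly_precipitation = [12.0, 30.5, 8.2, 0.0, 44.1]
lagged = shift_by_lag(weekly_precipitation, 2)

# Toy error function, used only to make the example runnable.
best, worst = select_lag([0, 2, 5, 8, 11], lambda lag: abs(lag - 5))
```

Repeating `select_lag` once per climate covariate and combining the winning lags would give one candidate "best lag" configuration in the sense used here.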


Fig 4.

Fitted case curves for the whole dataset across all districts, for model comparison.

The Y-axis shows the average number of newly reported cases across all districts. The red dotted vertical line marks the division between the training and validation periods. The shaded areas represent the conformal intervals for model V, model I and LSTM.


Fig 5.

Covariate influence on the force of infection for the selected hybrid model (model V).


Fig 6.

The directional impact of covariates on the cases from the selected hybrid model (V).

Red color indicates a positive covariate influence and blue color indicates a negative covariate influence. The bars are the total sum of positive (red) and negative (blue) covariate influences on the outcome from a 1% change in the covariate.


Table 3.

Comparison of Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE) for the LSTM and the hybrid model.
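
For reference, the three error metrics named in this table follow their standard definitions, which the short sketch below computes on made-up numbers. The sample values are purely illustrative and are not taken from the paper, and skipping weeks with zero observed cases in the MAPE is one common convention assumed here, not necessarily the one the authors used.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Assumption: observations equal to zero are skipped, because the
    percentage error is undefined for them.
    """
    terms = [abs((t - p) / t) for t, p in zip(y_true, y_pred) if t != 0]
    if not terms:
        raise ValueError("MAPE is undefined when all observations are zero")
    return 100.0 * sum(terms) / len(terms)


# Illustrative weekly case counts (not data from the study).
observed = [100, 200, 400]
predicted = [110, 180, 400]
scores = (mae(observed, predicted), rmse(observed, predicted), mape(observed, predicted))
```

MAE and RMSE are in units of cases, with RMSE penalizing large misses more heavily, while MAPE is scale-free and therefore easier to compare across districts of different size.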


Fig 7.

Prospective validation of the hybrid model (SEIR-LSTM) and the pure machine learning model (LSTM) on average weekly case data for all districts in 2019 (A) and per-district aggregated mean absolute error (MAE) of the SEIR-LSTM model (B).

In (A) the X-axis represents the time in weeks and the Y-axis represents the newly infected cases. The blue shaded area represents the prediction interval of the LSTM model. The MAE over all time steps is 491 for the hybrid model and 527 for the LSTM. The light red shaded area represents the confidence interval of the hybrid model (see uncertainty quantification for details). The choropleth map (B) shows the MAE of the hybrid model forecasts for each geographical district, where the X- and Y-axes correspond to longitude and latitude. Figures equivalent to (A) for each district can be found in Fig C and Fig E in S1 Text, S3 and S5 Figs.
