Forecasting influenza-like illness dynamics for military populations using neural networks and social media

doi:10.1371/journal.pone.0188941

Table 1.

The ICD-9 codes used to describe ILI symptoms (NOS = Not otherwise specified).

Fig 1.

Location-specific ILI dynamics.

Weekly ILI proportions between 2011 and 2014 for six example geolocations.

Fig 2.

Tweet distribution across geolocations.

The number of tweets collected within a 25-mile radius of military installations for 31 geolocations.
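A radius filter of this kind can be sketched with the haversine great-circle distance. This is only an illustration: the caption does not say how the 25-mile radius was applied, and the function names, the Earth-radius constant, and the coordinates in the usage note are assumptions, not details from the paper.

```python
import math

# Mean Earth radius in miles (assumed constant for this sketch).
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(tweet_latlon, base_latlon, radius_miles=25.0):
    """True if a geotagged tweet lies within radius_miles of an installation."""
    return haversine_miles(*tweet_latlon, *base_latlon) <= radius_miles


def count_tweets_near(tweets, base_latlon, radius_miles=25.0):
    """Count tweets (an iterable of (lat, lon) pairs) inside the radius."""
    return sum(within_radius(t, base_latlon, radius_miles) for t in tweets)
```

For example, `count_tweets_near([(0.3, 0.0), (0.5, 0.0)], (0.0, 0.0))` counts only the first hypothetical point, since 0.5 degrees of latitude is more than 25 miles.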

Fig 3.

Diagram of a two-branch neural network model for ILI dynamics prediction.

The model combines historical ILI estimates, encoded by one LSTM layer (left), with social media predictors, encoded by another LSTM layer (right), to forecast ILI dynamics weeks in advance (ILI + SM).

Table 2.

ILI nowcasting results (current week) for six geolocations estimated using cross-validation over four years (2011–2014).

Models: AdaBoost, SVM with a linear kernel, and LSTM. Metrics: Pearson correlation (CORR), RMSPE (%), MAPE (%), and RMSE. The highest performance results within each datatype are highlighted in bold.
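For reference, the four metrics can be sketched under their standard textbook definitions. The caption does not give the formulas, so the exact variants used in the paper are an assumption here, and the function names are illustrative only.

```python
import math


def pearson(y_true, y_pred):
    """Pearson correlation (CORR) between true and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true, y_pred):
    """Mean absolute percentage error (%); undefined if any true value is 0."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


def rmspe(y_true, y_pred):
    """Root mean squared percentage error (%); undefined if any true value is 0."""
    n = len(y_true)
    return 100.0 * math.sqrt(
        sum(((t - p) / t) ** 2 for t, p in zip(y_true, y_pred)) / n
    )
```

Higher is better for CORR, and lower is better for the three error metrics. The percentage-based metrics divide by the true value, so weeks with near-zero ILI proportions can dominate them.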

Fig 4.

ILI nowcasting results for six geolocations from various social media signals estimated using cross-validation over four years (2011–2014).

Predictive models: AdaBoost, SVM, and LSTM. Predictive features: ILI, Network, Tweets, and Embeddings. Evaluation metric: root mean squared error (RMSE).

Fig 5.

ILI nowcasting results for six geolocations obtained using the SMOnly model trained on nine types of social media signals.

We contrast Pearson correlation and RMSE for six geolocations. Labeled locations mark the best social media signals (embeddings, unigrams, hashtags, and mentions), which lie to the right of the dotted vertical line, and the worst (stylistic and topics).

Fig 6.

True vs. predicted ILI proportions (real-time current week estimates) as a function of time (2011–2014) for six geolocations.

We plot true ILI proportions (True), predictions from social media (tweet and network) features (SM), and predictions from ILI historical data (ILI), all obtained using the LSTM model.

Table 3.

ILI forecasting (one week) results for six geolocations.

Models: AdaBoost, SVM with a linear kernel, and LSTM. Metrics: RMSPE (%), MAPE (%), and RMSE (% ILI). The best performing models within each data type are highlighted in bold.

Table 4.

ILI predictions for 31 geolocations estimated using Pearson correlation for nowcasting (this week) and forecasting (one and two weeks).

Neural network models are trained on ILI data only (ILI), social media data only (SM), or both ILI and SM data (ILI + SM). Locations are sorted in descending order by the amount of Twitter data available (the first column, shown in millions). Locations with min and max correlations are underlined.

Table 5.

ILI prediction results for 31 geolocations estimated using RMSPE for nowcasting (this week) and forecasting (one and two weeks in advance).

Locations with min and max RMSPE scores are underlined.

Table 6.

ILI prediction results for 31 geolocations estimated using MAPE for nowcasting (this week) and forecasting (one and two weeks).

Locations with min and max MAPE scores are underlined.

Fig 7.

True vs. predicted ILI dynamics one week in advance as a function of time in 2014 for 31 geolocations.

We plot true ILI values (True) and one-week forecasts obtained using social media features only (SM), ILI historical data (ILI), and combined ILI + SM data (ILISM).

Fig 8.

True vs. predicted ILI dynamics two weeks in advance as a function of time in 2014 for 31 geolocations.

We plot true ILI values (True) and two-week forecasts obtained using social media features only (SM), ILI historical data (ILI), and combined ILI + SM data (ILISM).

Fig 9.

Model performance measured as Pearson correlation between true and predicted ILI dynamics as a function of the number of tweets per location.

Predictions are made for the current week and for one and two weeks in advance using the SMOnly and ILI + SM models. Outlier locations are marked with labels. Trends are shown as dotted lines. International locations are shown as triangles.

Fig 10.

Model performance measured as RMSPE as a function of the number of tweets per location.

Predictions are made for the current week and for one and two weeks in advance using the SMOnly and ILI + SM models. Outlier locations are marked with labels. Trends are shown as dotted lines.
