Table 1.
General description of Dengue activity for the Twitter dataset in Brazil (Kind et al., 2016).
Fig 1.
Country-wide time series for web-based data and Dengue cases.
Scatterplots and linear regression lines for all web data analyzed (A): tweets (r = 0.87, p<0.001), Google Trends (r = 0.92, p<0.001), and Wikipedia (r = 0.71, p<0.01). Dengue case time series and their association with Dengue web data: Twitter data (B), Google Trends interest (C), and Wikipedia access logs (D). Each point in panel A represents data aggregated per week from September, 2012 through October, 2016.
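As an illustration of how a correlation coefficient like those reported for panel A could be computed from weekly aggregates, the following is a minimal sketch. The function name `pearson_r` and the weekly values are hypothetical and are not taken from the study data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of the centered values.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical weekly totals (illustrative values only, not study data).
weekly_tweets = [120, 150, 310, 560, 480, 200]
weekly_cases = [900, 1100, 2600, 5200, 4100, 1500]
r = pearson_r(weekly_tweets, weekly_cases)
```

Each pair of values corresponds to one week, matching the per-week aggregation described for panel A.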
Fig 2.
Tweets are a useful tool for estimating Dengue activity at country level.
(A) The non-linear effect of tweets on Dengue. (B) The non-linear effect of weeks on Dengue. (C) Time series of observed Dengue cases (black line), in-sample model estimates of Dengue cases (red line), and out-of-sample model estimates of Dengue cases (blue line), with their 95% confidence intervals (dashed red and blue lines), over 209 weeks.
Table 2.
Tweets are a useful tool for estimating Dengue activity at country level.
Comparison between the selected model and other models built from combinations of the variables tweets, Dengue cases, and temporal structures, using AIC, explained deviance, and mean relative error as indicators of estimation capacity.
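For readers unfamiliar with the three indicators, the sketch below uses their standard textbook definitions. The caption does not state the exact formulas used in the study, so these definitions (and all function names) are assumptions for illustration only.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood

def deviance_explained(null_deviance, residual_deviance):
    """Fraction of the null deviance accounted for by the model (0 to 1)."""
    return 1.0 - residual_deviance / null_deviance

def mean_relative_error(observed, estimated):
    """Mean of |observed - estimated| / |observed|.

    Assumption: weeks with zero observed cases are skipped to avoid
    division by zero; the study may have handled them differently.
    """
    pairs = [(o, e) for o, e in zip(observed, estimated) if o != 0]
    if not pairs:
        raise ValueError("no non-zero observations")
    return sum(abs(o - e) / abs(o) for o, e in pairs) / len(pairs)
```

A model with a lower AIC, a higher explained deviance, and a lower mean relative error would rank as the better estimator under these definitions.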
Fig 3.
Capacity of tweets to predict Dengue up to 8 weeks in advance. The selected model (Table 2) was fitted with different time lags between tweets and Dengue cases. The lines show the Dengue cases estimated by the model from tweets posted 1 to 8 weeks in advance.
Table 3.
Dengue estimation capacity of tweets up to 8 weeks in advance. The selected model (Table 2) was fitted with different time lags between tweets and Dengue cases. The labels (Tw-1, Tw-2,…Tw-n) indicate how many weeks before the week of Dengue prediction the tweets were taken.
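The lag labels can be read as a simple shift between the two weekly series. The sketch below shows one way to pair the tweets of week t - n with the Dengue cases of week t. The function name `lagged_pairs` is hypothetical, and the study's actual data preparation is not described in the caption.

```python
def lagged_pairs(tweets, cases, lag):
    """Pair tweets from `lag` weeks earlier with the cases of each week.

    Returns a list of (tweets[t - lag], cases[t]) for every week t that
    has a tweet count available `lag` weeks before it.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if len(tweets) != len(cases):
        raise ValueError("series must have the same length")
    return [(tweets[t - lag], cases[t]) for t in range(lag, len(cases))]

# Tw-1: tweets from one week before the predicted week (illustrative values).
pairs_tw1 = lagged_pairs([1, 2, 3, 4], [10, 20, 30, 40], lag=1)
```

Each additional week of lag drops one pair from the start of the series, so longer lags leave slightly fewer observations for fitting.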
Fig 4.
A tweet signal can be obtained at city level.
Spatial distribution of evaluated cities in Brazil and their intensity of Dengue-related tweeting activity (A), and their incidence of Dengue cases (B). The data were aggregated from September, 2012 to October, 2016 and presented as cases or activity per 100,000 inhabitants.
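The per-100,000 normalization used for both maps is a plain rate calculation. A minimal sketch follows, in which the function name `rate_per_100k` and the example numbers are hypothetical.

```python
def rate_per_100k(count, population):
    """Events (Dengue cases or tweets) per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population

# Illustrative only: 250 events in a city of 500,000 inhabitants.
example_rate = rate_per_100k(250, 500_000)
```

Normalizing by population lets cities of very different sizes be compared on the same scale.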
Fig 5.
Heat map indicating the goodness-of-fit of the tweets model at city level.
The deviance explained by the prediction model is shown for each city. Cities with too little data to be analyzed by the model are represented with grey circles. Cities with higher values are mostly clustered in the southeastern region of the country.
Table 4.
Dengue estimation capacity by tweets.
Frequency distribution of cities according to the deviance explained by the model fitted to each city.
Fig 6.
Dengue estimation by tweets model at city level.
Time series of observed Dengue cases (black lines), tweet data (blue bars), model-fitted Dengue cases (red lines), and 95% confidence intervals (red dashed lines) at city level. (A) Belo Horizonte (r = 0.93, r2 = 90.3); (B) Fortaleza (r = 0.41, r2 = 90.0); (C) Manaus (r = 0.78, r2 = 83.5); (D) Porto Alegre (r = 0.71, r2 = 76.1); (E) Rio de Janeiro (r = 0.80, r2 = 82.6); and (F) São Paulo (r = 0.47, r2 = 89.0).
Table 5.
Goodness-of-fit of the tweets model is influenced by disease incidence, access to computers, internet and social factors.
Cities were divided into two groups: high, for cities where the deviance explained by the model's Dengue estimate was equal to or higher than 60%, and low otherwise.
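The grouping rule can be written out directly. In the sketch below, the 60% threshold comes from the caption, while the function name `fit_group`, the city labels, and the deviance values are hypothetical.

```python
from collections import Counter

def fit_group(deviance_explained_pct, threshold=60.0):
    """Return 'high' if explained deviance (in %) is at or above the threshold."""
    return "high" if deviance_explained_pct >= threshold else "low"

# Illustrative values only, not results from the study.
city_deviance = {"city_a": 82.5, "city_b": 60.0, "city_c": 41.2}
groups = {city: fit_group(value) for city, value in city_deviance.items()}
group_counts = Counter(groups.values())
```

Note that a city at exactly 60% falls in the high group, because the rule is "equal to or higher than".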
Fig 7.
Correlation between candidate explanatory factors and the goodness-of-fit (explained deviance) of the final model. The variables are: population (Pop), gross domestic product per capita (GDP), mean human development index (IDHM), human development index for income (IDHMI), human development index for longevity (IDHML), human development index for education (IDHME), share of households with a personal computer (PC), share of households with internet access (Net), total Dengue cases (DenTot), Dengue incidence in cases per 100,000 inhabitants (DenInc), total Dengue cases on a logarithmic scale (DenLog), total tweets (TwTot), tweet activity per 100,000 inhabitants (TwInc), total tweets on a logarithmic scale (TwLog), and deviance explained by the model (Model). Total tweets and Dengue cases were calculated as the sum of occurrences from September, 2012 to October, 2016.