Public reaction to Chikungunya outbreaks in Italy—Insights from an extensive novel data streams-based structural equation modeling analysis

The recent outbreak of Chikungunya virus in Italy represents a serious public health concern, attracting media coverage and generating public interest in terms of Internet searches and social media interactions. Here, we sought to assess Chikungunya-related digital behavior and the interplay between epidemiological figures and traffic on novel data streams. Reaction to the recent outbreak was analyzed in terms of Google Trends, Google News (GN) and Twitter traffic, Wikipedia visits and edits, and PubMed articles, using structural equation modelling. A total of 233,678 page-views and 150 edits on the Italian Wikipedia page, 3,702 tweets, 149 scholarly articles, and 3,073 news articles were retrieved. The relationship between overall Chikungunya cases, as well as autochthonous cases, and tweet production was found to be fully mediated by Chikungunya-related web searches. However, in the allochthonous/imported cases model, tweet production was not found to be significantly mediated by epidemiological figures, although web searches still significantly mediated tweet production. Inconsistent relationships were detected in mediation models involving Wikipedia usage as a mediator variable. Similarly, the effect of news consumption on tweet production was suppressed by Wikipedia usage. A further inconsistent mediation was found for the effect of Wikipedia usage on tweet production, with web searches as a mediator variable. When adjusting for the Internet penetration index, similar findings were obtained, with the important exception that in the adjusted model the relationship between GN and Twitter was found to be partially mediated by Wikipedia usage. Furthermore, unlike in the unadjusted model, the link between Wikipedia usage and PubMed/MEDLINE was fully mediated by GN. In conclusion, a significant public reaction to the current Chikungunya outbreak was documented. Health authorities should be aware of this, recognizing the role of new technologies in collecting public concerns and replying to them, disseminating awareness, and avoiding misleading information.


Materials and methods
We analyzed Internet data from several novel data streams, namely Google Trends (GT), Wikipedia, Twitter, PubMed/MEDLINE, and Google News (GN). Table 1 briefly summarizes all novel data streams used in the current study, together with their units of measurement and ranges of values. All data are unbounded count variables, with the exception of GT and GN data, which are provided rescaled to the range 0-100.
GT is a free online tool for tracking Internet search activity. In the current investigation, GT was used to assess public interest in Chikungunya-related issues. For this purpose, GT was mined from its inception (last search carried out on October 19, 2017). Searches on GT can be performed using either the "search term" or the "search topic" option. The first option retrieves exactly the keyword(s) entered by the user, while the second results in a broader search, in which GT captures all web searches containing the entered keyword(s) or related pertinent terms.
GT web queries are reported not as absolute, raw figures but as normalized figures (relative search volumes, or RSVs). To make comparisons possible, each data point is divided by the total number of searches performed in the given region and time range, and the result is then rescaled from 0 to 100 based on the topic's proportion with respect to all searches on all topics.
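The two-step normalization just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: GT does not expose raw query counts, so the counts below are hypothetical, and we assume the 0-100 rescaling maps the highest share in the selected region and time range to 100 (GT describes a value of 100 as the peak popularity of a term). The function name is likewise hypothetical.

```python
def relative_search_volume(query_counts, total_counts):
    """Return GT-style relative search volumes (0-100), one per time point.

    query_counts: hypothetical raw number of searches for the topic per period
    total_counts: hypothetical total number of searches in the same region/period
    """
    if len(query_counts) != len(total_counts):
        raise ValueError("series must have the same length")
    # Step 1: share of all searches that concern the topic in each period
    shares = [q / t for q, t in zip(query_counts, total_counts)]
    peak = max(shares)
    if peak == 0:
        return [0] * len(shares)
    # Step 2: rescale so that the peak share becomes 100 (GT reports integers)
    return [round(100 * s / peak) for s in shares]


# Hypothetical yearly figures, not real GT data
queries = [50, 10, 400]
totals = [1_000_000, 1_000_000, 2_000_000]
print(relative_search_volume(queries, totals))  # [25, 5, 100]
```

Because the rescaling depends on the peak within the chosen window, the same period can receive different values in different queries, which is one reason RSVs are treated as relative rather than absolute measures.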
In our analysis, we used the second search option. In particular, we searched for "Chikungunya (Topic)" and limited the search to Italy. For further details concerning GT, the reader is referred to the review by Nuti et al. [22].
Wikipedia is a free online encyclopedia launched in 2001. It is one of the most visited websites worldwide and is often consulted for health-related information. We examined the number and timing of edits to the Italian Wikipedia entry for "Chikungunya" between 2004 and 2017, as well as page visits between July 2015 and October 2017, using the page's revision history and the Wikimedia Foundation's Pageviews Analysis tool [23], respectively. The chronological changes of the Wikipedia page were assessed on October 19, 2017.
Twitter is a social media and news platform where users post and interact with messages called "tweets". A Twitter search for "Chikungunya" in Italy was performed to compare the number and timing of tweets with Chikungunya outbreaks between 2006 and 2017. The search was performed on October 19, 2017; results were identified manually and classified by number of tweets per year.
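The Wikipedia page-view counts described above can also be retrieved programmatically. The sketch below assumes the public Wikimedia REST Pageviews API (per-article endpoint), which serves the same kind of data as the Pageviews Analysis tool but is not necessarily the route used in this study; the helper names, the User-Agent string, and the sample payload are hypothetical. It builds the request URL for the Italian "Chikungunya" entry and averages daily views, the quantity summarized in the Results.

```python
import json
import urllib.request
from datetime import date

API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(article, start, end, project="it.wikipedia"):
    """Build a daily per-article pageviews URL (all access methods, human users)."""
    return (f"{API}/{project}/all-access/user/{article}/daily/"
            f"{start:%Y%m%d}/{end:%Y%m%d}")


def fetch_pageviews(url):
    """Download the JSON payload; Wikimedia asks clients to send a User-Agent."""
    req = urllib.request.Request(url, headers={"User-Agent": "pageviews-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def mean_daily_views(payload):
    """Average the 'views' field over all daily items in the payload."""
    items = payload.get("items", [])
    if not items:
        return 0.0
    return sum(item["views"] for item in items) / len(items)


url = pageviews_url("Chikungunya", date(2015, 7, 1), date(2017, 10, 19))
print(url)
# A hypothetical, truncated payload with the same shape as an API response
sample = {"items": [{"timestamp": "2017090100", "views": 3000},
                    {"timestamp": "2017090200", "views": 4000}]}
print(mean_daily_views(sample))  # 3500.0
```

Splitting the returned items by date before averaging would give per-period figures such as the July 2015 to August 2017 and September to October 2017 averages.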
PubMed is an online database of scholarly peer-reviewed articles that includes MEDLINE, a large bibliographic database covering almost all medical fields and disciplines. A PubMed/MEDLINE search was performed on October 19, 2017 for all Chikungunya-related peer-reviewed articles written in Italy and/or with at least one Italian scholar as co-author.
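A search of this kind can be scripted through NCBI's E-utilities. The query string below is an assumption, since the exact search strategy is not reported: it combines the term Chikungunya with an Italy affiliation filter (`[ad]`) and restricts publication dates one year at a time, so that per-year counts like those in Fig 1E could be collected. An affiliation filter only approximates the "written in Italy and/or with an Italian co-author" criterion; function names are hypothetical.

```python
import json
import time
import urllib.parse
import urllib.request

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(year, term="chikungunya AND italy[ad]"):
    """Build an ESearch URL counting PubMed records for one publication year."""
    params = {
        "db": "pubmed",
        "term": term,          # hypothetical search strategy
        "datetype": "pdat",    # filter on publication date
        "mindate": str(year),
        "maxdate": str(year),
        "rettype": "count",
        "retmode": "json",
    }
    return f"{ESEARCH}?{urllib.parse.urlencode(params)}"


def parse_count(payload):
    """Extract the hit count from an ESearch JSON response."""
    return int(payload["esearchresult"]["count"])


def yearly_counts(years):
    """Query PubMed once per year (network access required)."""
    counts = {}
    for year in years:
        with urllib.request.urlopen(esearch_url(year)) as resp:
            counts[year] = parse_count(json.load(resp))
        time.sleep(0.4)  # stay under NCBI's limit of 3 requests/s without an API key
    return counts
```

For example, `yearly_counts(range(2004, 2018))` would return a year-to-count mapping covering the study window.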
GN is a free news aggregator provided and operated by Google, selecting articles from thousands of news websites. It was first launched in 2002 as a beta version and was later released officially.
Correlational analyses and multivariate regression models were performed between each of the novel data streams described above and the number of Chikungunya infection cases.
We chose the partial least squares path modeling (PLS-PM) approach to structural equation modeling (SEM) because, as a component-based estimation method, it allows complex cause-effect models with latent variables to be estimated.
According to MacKinnon and collaborators [24], a suppressor is "a variable which increases the predictive validity of another variable (or set of variables) by its inclusion in a regression equation". Rucker and colleagues [25] defined a suppressor variable "as one that undermines the total effect by its omission, meaning accounting for it in a regression equation enhances the predictive utility of the other variables in the equation". We used PLS-PM to estimate the cause-effect relationships between the different online sources and the number of confirmed autochthonous cases, notified autochthonous cases, and allochthonous/imported cases of Chikungunya infection during the recent 2017 outbreak.
PLS-PM models were estimated both without and with adjustment for the Internet penetration index, to account for the possibility that part of the increase in searches on the topic reflects growth in the number of Internet users rather than increasing interest in the topic. Because this could directly affect the results (during the study period, the proportion of Internet users roughly doubled, from 33.2% to 66.0%), both models are presented here. Data on the Internet penetration index were obtained from the Italian National Institute of Statistics (ISTAT).
All statistical analyses were carried out using the commercial software XLSTAT Premium (version 19.7, Addinsoft, France).
Results with p-values below 0.05 were considered statistically significant.

Results
A corpus of 233,678 pageviews and 150 edits on the Italian Wikipedia page, 3,702 tweets written in Italian, 149 scholarly peer-reviewed articles from Italy or by Italian scholars, and 3,073 news articles written in Italy and/or in the Italian language was retrieved and analyzed. The correlational analyses between the different novel data streams and the number of notified or confirmed cases of Chikungunya infection showed several significant temporal correlations (Table 2); notably, the RSVs on GT correlated significantly with notified cases (p = 0.0008) as well as with confirmed cases (p = 0.0030). There was an initial burst of web searches for Chikungunya in 2006 and 2007, a second, smaller one in 2014, and a very large peak in 2017 (Fig 1A). There was also a significant correlation between notified cases and tweets (p = 0.0171). With 3,702 Chikungunya-related tweets shared in the past 12 years, Twitter activity showed a small spike in 2014 and a very large one in 2017 (Fig 1D).
While the other online data sources did not correlate significantly with the number of cases, they showed spikes in activity similar to those seen with GT and Twitter. Over 3,000 news articles have been published and aggregated on GN since 2008. Small bursts in traffic were observed in 2011 and 2014, and large ones in 2008 and 2017 (Fig 1B).
The Italian "Chikungunya" Wikipedia page was created in 2006; it has undergone 150 modifications by users and has been viewed 244,358 times. Most edits were made during its year of creation and the two subsequent years. The page then saw progressively fewer modifications, except for small bursts of edits in 2011 and 2014. In 2017, however, the number of edits spiked back up to levels close to those at the page's inception (Fig 1C). Between July 2015 and August 2017, there was a daily average of 69 pageviews. Between September and October 2017, however, there was a very large burst of page traffic, with an average of 3,862 daily visits.
A PubMed/MEDLINE search for academic works written in Italy and/or in Italian yielded 149 peer-reviewed articles published between 2004 and 2017. Major spikes in the number of publications occurred in 2008 and 2017, with an above-average number of articles published between 2010 and 2014 (Fig 1E).
Correlations between novel data streams and allochthonous/imported cases of Chikungunya were not statistically significant (data not shown).
Concerning the PLS-SEM approach, global R² was 0.469 and 0.663 for the notified cases (unadjusted and adjusted models, respectively), 0.453 and 0.658 for the confirmed cases, and 0.371 and 0.666 for the imported cases (in the same order), indicating a satisfactory fit of the computed models. Notably, the fitting parameter (global R²) was higher for the adjusted models, which take the Internet penetration index into account. Further details are reported in Table 3.
Regarding the unadjusted PLS-SEM model, the relationship between autochthonous Chikungunya cases, either notified or confirmed, and tweet production was found to be fully mediated by Chikungunya-related web searches as captured by GT (path coefficient between autochthonous confirmed cases and GT 0.590, p<0.05, and between GT and Twitter 0.959, p<0.01, Fig 2A; path coefficient between autochthonous notified cases and GT 0.662, p<0.05, and between GT and Twitter 0.907, p<0.01, Fig 3A). However, in the allochthonous/imported cases model, imported cases were not significantly associated with web searches, so tweet production was not significantly mediated by the epidemiological figures, whereas web searches as captured by GT remained a significant predictor of tweet production (path coefficient between imported cases and GT -0.128, p>0.05, and between GT and Twitter 0.987, p<0.01, Fig 4A).
Taking into account the Internet penetration index (adjusted model), the relationship between autochthonous Chikungunya cases, either notified or confirmed, and tweet production remained fully mediated by Chikungunya-related web searches captured by GT (path coefficient between autochthonous confirmed cases and GT 0.308, p<0.05, and between GT and Twitter 0.806, p<0.01, Fig 2B; path coefficient between autochthonous notified cases and GT 0.327, p<0.01, and between GT and Twitter 0.787, p<0.05, Fig 3B). In the allochthonous/imported cases model adjusted for the Internet penetration index, too, tweet production was still not significantly mediated by the epidemiological figures, whereas web searches captured by GT remained a significant predictor of tweet production (path coefficient between imported cases and GT -0.056, p>0.05, and between GT and Twitter 0.894, p<0.001, Fig 4B).
In the unadjusted model, inconsistent relationships were detected in mediation models involving Wikipedia usage as a mediator variable. The direct effect of epidemiological cases on tweet production was suppressed by Wikipedia editing (path coefficient between epidemiological confirmed cases and Wikipedia 0.375, p>0.05, and between Wikipedia and Twitter -0.567, p<0.05, Fig 2A; path coefficient between epidemiological notified cases and Wikipedia 0.348, p>0.05, and between Wikipedia and Twitter -0.571, p<0.01, Fig 3A; path coefficient between allochthonous/imported cases and Wikipedia -0.067, p>0.05, and between Wikipedia and Twitter -0.556, p<0.05, Fig 4A). Evidence of this suppressor effect was obtained by computing the regression coefficient of epidemiological cases as a predictor of tweet production (total-effect coefficients c of 0.108 and 0.053 for the notified and confirmed cases models, respectively, in both cases smaller than the corresponding direct-effect path coefficients c' of 0.118 and 0.056).
These findings related to the suppressor effect of Wikipedia held in the adjusted models (path coefficient between epidemiological confirmed cases and Wikipedia 0.098, p>0.05, and between Wikipedia and Twitter -0.634, p<0.05, Fig 2B; path coefficient between epidemiological notified cases and Wikipedia 0.077, p>0.05, and between Wikipedia and Twitter -0.620, p<0.05, Fig 3B; path coefficient between allochthonous/imported cases and Wikipedia -0.026, p>0.05, and between Wikipedia and Twitter -1.412, p<0.01, Fig 4B).
Similarly, in the unadjusted model, the direct effect of scientific interest (assessed using a bibliometric index as a proxy) on tweet production was suppressed by Wikipedia usage (path coefficient between PubMed/MEDLINE and Wikipedia -0.044, p>0.05, for both the notified and confirmed cases models; Figs 2A and 3A). The regression coefficient c was 0.004, smaller than the path coefficients c' of 0.038 and 0.045 for the notified and confirmed cases models, respectively, thus confirming the suppressor effect. These findings remained valid when the Internet penetration index was incorporated in the model (adjusted model): the path coefficient between PubMed/MEDLINE and Wikipedia was -0.079 and -0.074 for confirmed and notified cases of Chikungunya, respectively (Figs 2B and 3B). A similar trend was obtained for allochthonous/imported cases (path coefficient between PubMed/MEDLINE and Wikipedia 0.005, p>0.05, and -0.053, p>0.05, for the unadjusted and adjusted models, respectively; Fig 4A and 4B).
Likewise, in the unadjusted model, the direct effect of news consumption (as assessed by GN) on tweet production was suppressed by Wikipedia usage (path coefficient between GN and Wikipedia 0.483, p>0.05, for the notified cases model and 0.489, p>0.05, for the confirmed cases model; Figs 2A and 3A). Once more, the regression coefficient c (0.046) was smaller than the path coefficients c' (0.287 and 0.280 in the notified and confirmed cases models, respectively). However, this finding could not be replicated in the adjusted model: the path coefficient between GN and Wikipedia was 0.982, p<0.001, and 0.988, p<0.001, for the notified and confirmed cases models, respectively (Figs 2B and 3B). A similar discrepancy between the unadjusted and adjusted models was detected for allochthonous/imported cases of Chikungunya: path coefficient between GN and Wikipedia 0.594, p>0.05, and 1.027, p<0.001, respectively (Fig 4A and 4B).
In the unadjusted model, a further inconsistent mediation was found for the effect of Wikipedia usage on tweet production, with web searches (as captured by GT) as a mediator variable. The path coefficients between Wikipedia and GT and between GT and Twitter were 0.107, p>0.05, and 0.907, p<0.01, respectively, for the notified cases model, and 0.108, p>0.05, and 0.959, p<0.01, for the confirmed cases model (Figs 2A and 3A). As noted above, the effect of Wikipedia usage on tweet production was significantly negative. Similar findings were obtained in the adjusted model: the path coefficients between Wikipedia and GT and between GT and Twitter were 0.227, p>0.05, and 0.806, p<0.01, for the confirmed cases model, and 0.279, p>0.05, and 0.787, p<0.05, for the notified cases model. A similar statistical pattern was found for the allochthonous/imported cases of Chikungunya: the path coefficients were 0.392, p>0.05, and 0.987, p<0.001, in the unadjusted model, and 0.263, p>0.05, and 0.894, p<0.001, in the adjusted model (Fig 4A and 4B).
All path coefficients with their standard errors, T-statistics, p-value, the computed bootstrapped values and standard errors, critical ratio, lower and upper bound values are reported in Tables 4 to 9 (even-numbered tables for unadjusted and odd-numbered for adjusted models).

Discussion
The surveillance of disease outbreaks and their correlation with web searches has been addressed on multiple occasions in the medical literature [26,27]. Recently, an outbreak of Chikungunya was recorded in the Lazio region (central western Italy). This ongoing outbreak, which started in August 2017, has raised public awareness, as reflected by the high peaks of web-related activity shown in our study. Similarly, a burst of web searches recorded by GT mirrored the previous Chikungunya outbreak in Italy ten years ago, and a modest peak of GT searches corresponded to the Chikungunya cases reported in 2014 from the Caribbean and Central America [28]. Interestingly, the current outbreak has been met with greater public interest: the rise in cases has been accompanied by a steeper increase in Google searches than that seen in 2007, which remains significant after adjusting for the growth of Internet users during the study period.
Chikungunya-related reports in GN showed news peaks in 2007 and 2017, corresponding to the Chikungunya outbreaks in Italy. Several smaller peaks were recorded during the periods of allochthonous cases mentioned earlier, supporting the view that local cases have a greater impact on web search activity. The role of GN during outbreaks was recently shown for the Zika virus outbreak, with an increase in Zika-related GN searches reflecting public worries and concerns [29].
Infoveillance can also draw on other web-based sources. Wikipedia is a prominent online source of health information that has been shown to play an integral role in increasing public knowledge about emerging diseases and outbreaks of infectious agents [30,31]. In the present study, Italian public interest was clearly reflected by the large volume of Wikipedia page visits during the Chikungunya outbreak. Similar results were reported following the 2015 outbreak of Zika virus in Central and South America [29]. Moreover, high levels of public health concern were documented by increased Wikipedia page visits around the announcement of the H1N1 vaccine outbreak in 2012 [32]. Furthermore, the high number of edits to the Italian "Chikungunya" Wikipedia page reported herein also reflects the significant interest of volunteer editors in the matter.
Twitter is another fundamental social media data stream, widely used by the public to share information, including on health-related issues. In our study, the increase in public awareness was reflected by increased Twitter activity. Because Twitter was launched only in July 2006, it is unsurprising that the 2007 Chikungunya outbreak in Italy attracted virtually no attention on the platform, in contrast with the large spike of activity during the current outbreak. This can be attributed to the very small number of users (around 702,000) during its first years, which was followed by exponential growth in users from the beginning of 2009, reaching over 300 million monthly active users in 2017 [33,34]. Within the past decade, a small spike in Chikungunya-related tweets was documented in 2014, coinciding with the small outbreak of allochthonous cases of Chikungunya [28].
From a scientific standpoint, PubMed/MEDLINE is one of the leading resources for published medical papers. Among scholarly articles on Chikungunya published since 2004, the first spike occurred approximately one year after the 2007 outbreak and dropped immediately afterward. The one-year delay between the outbreak and the publication peak might be attributed to the time it takes for manuscripts to be peer-reviewed, processed, and published. Similarly, major articles describing the 2014 Ebola outbreak appeared half a year after the WHO officially announced the outbreak in March of that year [35,36].
Subsequently, smaller peaks of published research were recorded between the 2007 and 2017 outbreaks, coinciding with Chikungunya infections in travelers returning to Italy. Prior to the current outbreak, the largest number of allochthonous cases of Chikungunya in Italy was recorded in 2014-2015, thus contributing to the peak of publications in 2016 [28]. In June 2016, Guzzetta et al. [37] discussed the potential risk of Chikungunya and dengue outbreaks in northern Italy using a mathematical model built on mosquito abundance data. The authors estimated the potential of imported human cases of Chikungunya or dengue to generate autochthonous cases in Italy in the absence of control interventions. The timing of the current outbreak is consistent with the findings of that study. Further investigations at the end of the current outbreak would provide better data for comparing the 2007 and 2017 outbreaks in terms of PubMed/MEDLINE publications, taking into account the unavoidable delay between outbreaks and publication.
Addressing the interaction between the novel data streams using structural equation modelling, we found that cases, whether notified or confirmed, positively affected GT search volume. We also found that both positively affected the number of tweets. In other words, users tended to search for "Chikungunya" as a topic on Google before posting about it on Twitter. On the other hand, editing of the Wikipedia page was found to negatively affect (suppress) the volume of tweets. Imported cases of Chikungunya had no significant effect on GT; nevertheless, the effect of Wikipedia editing on Twitter posts was again negative.
Another interesting finding from the adjusted models was that, for all Chikungunya case models, edits to the Wikipedia page and posts on Twitter were positively affected by PubMed/MEDLINE publications, with GN articles acting as a mediator. In other words, as more articles on Chikungunya were published in PubMed/MEDLINE, more articles were released into public news streams as captured by GN; as a result, more tweets were posted and the Chikungunya-related Wikipedia page was edited to reflect these updates and news. To the best of our knowledge, no previous study has addressed this topic using structural equations with such an interplay of novel data streams. In the extant literature, only Rodgers et al. [38] have used media sources as variables in statistical analyses to predict public health behaviors, showing that including media variables in the segmentation process has predictive value. Another study noted similar increases in page views and edits of Wikipedia articles related to topics featured on TV or in news outlets [39].
These findings could help health authorities take advantage of web streams to provide credible news and to address the population's concerns during outbreaks. The use of online content to detect public interest and disease outbreaks has been shown to be quicker than traditional public health surveillance, offering prospects for revolutionizing future surveillance methodologies [40].
Our study has several limitations. First, the precise algorithms used by GT, GN, Wikipedia, Twitter, and PubMed/MEDLINE are not publicly available, which makes processing and manipulating the data difficult, as already reported. Additionally, GT and GN data are provided as relative, normalized figures rather than as absolute, raw counts; therefore, our results strictly depend on the chosen time window and geographic location. However, we compared GT, GN, Wikipedia, Twitter, and PubMed/MEDLINE volumes with each other, showing that the behavior is consistent and reproducible across different web platforms, and this consistency remained significant after adjusting for the Internet penetration index.
In conclusion, exploiting novel online data streams provides public health professionals with tools to detect disease outbreak patterns earlier, and thus to implement more effective policies and measures.
However, despite great progress in the use of the Internet as a major source of health-related information, many challenging issues remain, including the future implications of such novel data streams as an effective tool for preventing infectious disease outbreaks. Therefore, public health authorities nationwide, particularly Italian health authorities, should take advantage of our findings in dealing with the current Chikungunya outbreak in Italy. Moreover, health authorities should be aware of the public's reactions to current events, recognizing online resources as tools for collecting public concerns and replying to them, raising awareness, and countering misleading information.