Engagement with Health Agencies on Twitter

Objective: To investigate factors associated with engagement of U.S. Federal Health Agencies via Twitter. Our specific goals are to study factors related to a) numbers of retweets, b) time between the agency tweet and first retweet and c) time between the agency tweet and last retweet.
Methods: We collect 164,104 tweets from 25 Federal Health Agencies and their 130 accounts. We use negative binomial hurdle regression models and Cox proportional hazards models to explore the influence of 26 factors on agency engagement. Account features include network centrality, tweet count, numbers of friends, followers, and favorites. Tweet features include age, the use of hashtags, user-mentions, URLs, sentiment measured using SentiStrength, and tweet content represented by fifteen semantic groups.
Results: A third of the tweets (53,556) had zero retweets. Less than 1% (613) had more than 100 retweets (mean = 284). The hurdle analysis shows that hashtags, URLs and user-mentions are positively associated with retweets; sentiment has no association with retweets; and tweet count has a negative association with retweets. Almost all semantic groups, except for geographic areas, occupations and organizations, are positively associated with retweeting. The survival analyses indicate that engagement is positively associated with tweet age and the follower count.
Conclusions: Some of the factors associated with higher levels of Twitter engagement cannot be changed by the agencies, but others can be modified (e.g., use of hashtags, URLs). Our findings provide the background for future controlled experiments to increase public health engagement via Twitter.


Introduction
Government agencies are increasingly interested in using social media to distribute information at the national, state and local levels. U.S. federal agencies, for example, routinely use a variety of social media sites including Twitter, Facebook, YouTube, Flickr, and Instagram to enhance communication [1]. In addition to distributing information, government agencies are increasingly interested in interacting with the populations they serve. For example, new guidelines entitled "Digital Governmental Strategy" outline specific steps for governmental agencies to make digital information more "customer centric" [2]. This bidirectional form of communication can be defined as engagement: interactions designed to promote some common goal [3].
To date, no study has systematically explored factors associated with the levels of health agency engagement on social media. Our objective is to address this gap by using retweeting as a measure of engagement. Specifically, we address the following three questions with respect to Twitter messages posted by U.S. Federal Health Agencies and the responses to them. First, which features are associated with the level of response in the form of retweets? Second, which features are associated with the interval between an agency's tweet and its first retweet? Third, which features are associated with the interval between an agency's tweet and the last retweet it generates? We address these questions by analyzing a nearly comprehensive set of tweets posted by the 130 Twitter accounts of 25 Federal Health Agencies. We explore associations between factors and the level of retweeting using hurdle models. We explore the temporal factors related to our second and third questions using survival models. The factors we examine include standard features, such as the number of friends and followers, as well as less studied features relating to the semantic content of a tweet.

Background and Significance
The U.S. government uses several social media services, and Twitter is one of the most commonly used. Recent estimates indicate that approximately 18% of online adults use Twitter [4], and over 500 million users around the globe [5] generate over 500 million tweets per day [6]. Given the widespread use of Twitter and the fact that people are increasingly using it to share their experiences with illness and treatments as well as other health concerns [7], Twitter provides a potentially valuable stream of health-related information. Several studies have used Twitter to discover adverse drug reactions [8,9] and to perform surveillance for disease activity [10,11] and health beliefs [12,13]. Twitter has also been used to investigate general health behavior [14,15]. However, few studies have focused on how health agencies use Twitter. The studies that do exist describe activity consistent with distributing information, with little attention paid to engagement [16]. One of the few studies on engagement via Twitter focuses on levels of engagement: low (have followers), medium (promote retweeting) and high (have offline interactions) [3]. In contrast to previous studies, our goal is to determine the factors associated with engagement of federal agencies with the "Twitter Public". One caveat is that while we focus on public engagement, an agency may be equally or even more interested in information dissemination alone.
We study factors related to engagement in terms of retweeting activity. A retweet is an acknowledgment that the original tweet has been read and also that it is viewed as sufficiently interesting to merit a re-post. The followers of the retweeting account now have access to the original tweet. Retweets are in some sense analogous to citations of an article. A second aspect of engagement relates to the time period over which retweeting occurs. A tweet with a longer retweeting time span than another is one where engagement occurs over a longer period of time. Thus, Twitter engagement for a federal agency is maximized when all of its tweets generate the highest possible number of retweets, with retweets starting almost immediately after the tweet is posted and continuing indefinitely. While in practice these conditions are never achieved, it is clear that some tweets generate stronger responses than others. Our overarching goal is to determine whether there are features that relate to higher levels of retweeting and longer lifespans of tweets, in order to offer insight into ways to gauge and strengthen Twitter engagement for health agencies.

Data Collection
Agencies & Handles. We selected health agencies through the HHS Social website, which maintains a list of all official HHS-affiliated accounts across various social media platforms [17]. We identified all agencies with Twitter accounts (also known as handles).
Tweets & Retweets. The Twitter REST API v1.1 [18] was used to collect all tweets from a handle's timeline as of late November 2012 (data collection was done between 11/20/2012-11/21/2012). Using this method, a maximum of 3200 tweets from a handle's timeline can be retrieved. These timelines extended from a few months (e.g., around 3 months for CDCSTD) to several years (e.g., around 3 years for NIGMS). On average the timeline was around 2 years for all handles. We could collect all posted tweets for 112 handles; 18 handles had more than 3200 tweets at the time of data collection so the data for these handles was censored. The average timeline for these handles also spanned around 2 years. Handles such as CDCSTD, womenshealth and CDCNPIN had posted over 9000 tweets by the time of the data collection. For such handles the most recent 3200 tweets were collected. For each agency tweet, we recorded its unique identifier and raw retweet count among other tweet-based data and metadata as described below.
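The 3200-tweet retrieval cap determines which handles have censored timelines. The following is a minimal sketch of that bookkeeping, not the authors' code. It assumes each handle's lifetime tweet total is already known; the function name and the second handle in the example are hypothetical, while the CDCSTD total is the figure reported later in the paper.

```python
# Maximum number of timeline tweets retrievable per handle, as stated in the text.
TIMELINE_CAP = 3200

def split_by_censoring(total_tweets_by_handle, cap=TIMELINE_CAP):
    """Return (complete, censored) lists of handles, each sorted by name.

    A handle is censored when it has posted more tweets than the cap,
    so only its most recent `cap` tweets could be collected.
    """
    complete, censored = [], []
    for handle, total in sorted(total_tweets_by_handle.items()):
        if total > cap:
            censored.append(handle)
        else:
            complete.append(handle)
    return complete, censored

# "example_small_handle" and its count are made up for illustration.
totals = {"CDCSTD": 12151, "example_small_handle": 840}
complete, censored = split_by_censoring(totals)
print("complete:", complete)
print("censored:", censored)
```

A handle with exactly 3200 tweets is treated as complete, matching the paper's wording that censoring applies to handles with more than 3200 tweets.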

Tweet Features
First we decided which features we would use to represent each tweet. We included those examined commonly in Twitter-based studies as well as those that have not yet been considered. Table 1 lists 11 features we considered under 2 broad categories: handle-level features that are the same for all tweets issued by a handle (e.g., numbers of followers and friends) and tweet-specific features such as sentiment.
We also divided the features into two logical groups. Group 1 has features that cannot be changed or easily manipulated by an account holder. We include tweet age in this group as it represents a natural phenomenon. The account holder has control over Group 2 features.
Group 1 features include the number of followers, friends and favorites. If user Y is a follower of user X, then Y receives all of X's tweets automatically, and X is regarded as a friend of Y. Relevant to us is that a tweet is displayed on the timelines of all of its handle's followers, so these are the users most likely to retweet the post. The feature favorite is the number of users favoring a particular handle. Twitter forms a network through the follower and friend relationships between users. From this network, we calculate a betweenness-centrality score, which shows the extent to which a node acts as an intermediary on the shortest paths between other nodes in the network; it indicates the importance of a particular node in the network structure.
We analyzed sentiment using a state-of-the-art lexicon-based sentiment classifier, SentiStrength [19,20]. SentiStrength has been widely applied for sentiment analysis of tweets [21] and has been shown to outperform other lexical classifiers [22]. SentiStrength scores each tweet for positive and negative sentiment on a scale of ±1 (neutral) to ±5 (extreme).
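To make the betweenness-centrality score concrete, the sketch below computes unnormalized betweenness on a small directed follower graph using Brandes' algorithm. This is an illustration under stated assumptions: the paper does not say which software or normalization was used, and the node names are hypothetical.

```python
from collections import deque

def betweenness_centrality(graph):
    """Unnormalized betweenness for a directed, unweighted graph.

    `graph` maps each node to a list of the nodes it points to.
    Brandes' algorithm: one breadth-first search per source node,
    then dependencies are accumulated in reverse BFS order.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    score = {v: 0.0 for v in nodes}
    for s in nodes:
        stack = []
        preds = {v: [] for v in nodes}
        sigma = {v: 0 for v in nodes}   # number of shortest s -> v paths
        dist = {v: -1 for v in nodes}   # -1 means not yet reached
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in nodes}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Hypothetical chain: agency_a -> agency_b -> agency_c.
chain = {"agency_a": ["agency_b"], "agency_b": ["agency_c"], "agency_c": []}
print(betweenness_centrality(chain))
```

In the chain, only the middle node lies on a shortest path between two other nodes, so it is the only one with a nonzero score.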
One aspect of tweet analysis that is often overlooked in Twitter studies is the content of the tweets. The exception is in the few studies focused on specific domains (e.g., manual coding of 1,000 concussion-related tweets along 9 broad themes [23]). Content is important as some subjects may attract a broader audience than others. In order to analyze tweet content, we design a fully automated method for content analysis. Manual analysis is not feasible as it limits the number of tweets that can be content coded. We use the National Library of Medicine's Medical Text Indexer (MTI) [24,25] to assign Medical Subject Headings (MeSH) [26,27] recommendations for each tweet. MTI is commonly used for recommending MeSH terms to biomedical literature based on the titles and abstracts. It has been shown to be useful in other domains such as clinical text [25]. The terms recommended for each tweet are mapped into semantic types [28], which in turn are assigned to semantic groups [29,30]. Note that a particular tweet can be assigned to multiple semantic groups.
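The two-step mapping from MeSH terms to semantic types to semantic groups can be sketched as a pair of lookups. The tables below are small illustrative excerpts written for this example; they assume the MTI recommendations are already available as a list of MeSH terms, and a real pipeline would load the full assignments from the cited resources [28-30].

```python
# Illustrative excerpts only, not the complete mapping used in the study.
MESH_TO_SEMANTIC_TYPES = {
    "Influenza, Human": ["Disease or Syndrome"],
    "Aspirin": ["Organic Chemical", "Pharmacologic Substance"],
    "United States": ["Geographic Area"],
}
SEMANTIC_TYPE_TO_GROUP = {
    "Disease or Syndrome": "Disorders",
    "Organic Chemical": "Chemicals & Drugs",
    "Pharmacologic Substance": "Chemicals & Drugs",
    "Geographic Area": "Geographic Areas",
}

def semantic_groups(mesh_terms):
    """Return the sorted semantic groups covered by a tweet's MeSH terms.

    A tweet can fall into several groups; unknown terms are skipped.
    """
    groups = set()
    for term in mesh_terms:
        for semantic_type in MESH_TO_SEMANTIC_TYPES.get(term, []):
            group = SEMANTIC_TYPE_TO_GROUP.get(semantic_type)
            if group is not None:
                groups.add(group)
    return sorted(groups)

# A tweet whose recommended terms span two groups.
print(semantic_groups(["Influenza, Human", "United States"]))
```

Using a set means two semantic types from the same group (as with the aspirin entry) count once, which matches the per-tweet group indicators used in the regression.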

Choice of Models
The number of retweets per tweet in our dataset is highly skewed with many zeros. This type of data distribution where the variance is much greater than the mean is described as overdispersed [31] with zero-inflation [32]. Typically models such as Poisson or negative binomial regression are used to model count data. However the zero-inflation of the retweet count necessitates the use of two-part count data models such as the hurdle regression model [33][34][35].
Hurdle models have two separate components: a zero-portion to model the inflation of zero counts in the data and a count-portion to model the non-zero counts of the data. The zero-portion determines the binary outcome of whether a count is zero (no retweets) or not using a binomial probability model. The count portion of the model determines the conditional distribution of the non-zero count of the data using a zero-truncated negative binomial or Poisson model.
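The two components can be written as a single probability mass function. The sketch below assumes the common mean/dispersion parameterization of the negative binomial (mean `mu`, dispersion `theta`); the parameter values in the example are illustrative and are not estimates from the paper.

```python
import math

def nb_pmf(k, mu, theta):
    """Negative binomial probability of count k (mean mu, dispersion theta)."""
    log_p = (math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
             + theta * math.log(theta / (theta + mu))
             + k * math.log(mu / (theta + mu)))
    return math.exp(log_p)

def hurdle_nb_pmf(k, p_retweeted, mu, theta):
    """Hurdle negative binomial probability of k retweets.

    Zero portion: a tweet crosses the hurdle (gets at least one retweet)
    with probability p_retweeted.
    Count portion: positive counts follow a zero-truncated negative binomial.
    """
    if k == 0:
        return 1.0 - p_retweeted
    return p_retweeted * nb_pmf(k, mu, theta) / (1.0 - nb_pmf(0, mu, theta))

# Illustrative values: about two thirds of tweets retweeted at least once.
for k in range(4):
    print(k, round(hurdle_nb_pmf(k, 0.67, 8.0, 0.5), 4))
```

Because the zero probability is set only by the binary component, a feature can raise the chance of any retweet while lowering the expected count among retweeted tweets, which is the pattern the two-part results allow for.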
We formally compare different count data regression models, namely Poisson (P), negative binomial (NB), hurdle Poisson (HP) and hurdle negative binomial (HNB), using standard goodness-of-fit measures [36,37]. The hurdle negative binomial model has a better fit compared to the other models. Our comparison of full and nested models, such as hurdle negative binomial and negative binomial, using the likelihood ratio test (LRT) corroborates the other goodness-of-fit measures in implying that the former model fits our data best.
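As a sketch of how such a comparison is commonly carried out, the helpers below compute the Akaike information criterion (AIC) and the likelihood ratio statistic from fitted log-likelihoods. AIC is an assumption here, since the text does not name the goodness-of-fit measures used, and the log-likelihood values and parameter counts are made up for illustration.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values indicate a better fit."""
    return 2 * n_params - 2 * log_likelihood

def lrt_statistic(loglik_nested, loglik_full):
    """Likelihood ratio statistic for a nested model against the full model."""
    return 2 * (loglik_full - loglik_nested)

# Hypothetical fitted values: (log-likelihood, number of parameters).
fits = {"NB": (-120.0, 28), "HNB": (-100.0, 55)}
for name, (loglik, k) in fits.items():
    print(name, aic(loglik, k))
print("LRT:", lrt_statistic(fits["NB"][0], fits["HNB"][0]))
```

The statistic is typically compared against a chi-square distribution whose degrees of freedom equal the difference in the number of parameters between the two models.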
In addition, we use methods from survival analysis [38,39] to model the temporal aspects of retweeting. Typically in survival analysis we build models to analyze "time to events" such as the death of an organism or the failure of a machine [40]. In our case, we estimate two survival models. For the first model, the "event" is the appearance of the first retweet. For the second model, the "event" is the last retweet of a tweet, so the time to event is the length of time that the tweet is in "circulation". Similar to previous Twitter research [41], we use the Cox proportional hazards regression model [42] to predict how the different handle and tweet-based features influence the time to the first and last retweets.
Table 2 lists the various agencies (including their expanded names), the number of handles for each and a few examples of handles.
In raw numbers we note that while the CDC posted the most tweets (37,136), it also has the highest raw number of tweets that are not retweeted (11,063). In contrast, the Office of the Secretary (OS), a close second in the number of total tweets (36,587), has the highest number of retweeted tweets (28,561) and also the highest number of retweets (376,158). Each tweet from OS gets approximately 10 retweets. The agency with the most retweets per retweeted tweet is NIH/NIMH, with about 18 retweets per tweet. It also leads the agencies with 82% of its tweets retweeted at least once. Interestingly, this agency has fewer than 1000 tweets.
Table 4
88.46% of the retweeted tweets get their first retweet on the day of the tweet (referred to as day zero in our discussion). 60.6% of the retweeted tweets get their last retweet on day zero. Very few tweets receive their first retweet after 100 days. Similarly, very few tweets get their last retweet after day 500.
We also study the power-law characteristics of different aspects of retweeting. With the exception of time to first retweet (power exponent = 1.87), we find retweets/tweet (exponent = 2.56), retweets/retweeter (exponent = 2.35) and time to last retweet (exponent = 2.33) have exponents in the range expected for power law distributions (between 2 and 3, with few exceptions). Concerning retweets/retweeter, we note that a few Twitter users retweet extensively (more than 500 times) while the majority of them retweet sparingly. Figure 1 shows these plots.
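A power-law exponent can be estimated from a sample by maximum likelihood. The sketch below uses the continuous-data estimator with a fixed lower cutoff `x_min`; this is an assumption made for illustration, since the text does not describe how the exponents were fitted.

```python
import math

def power_law_exponent(values, x_min=1.0):
    """Maximum-likelihood exponent for p(x) proportional to x**(-alpha), x >= x_min.

    Uses the continuous-data estimator:
        alpha = 1 + n / sum(ln(x_i / x_min))
    Values below x_min are ignored.
    """
    tail = [v for v in values if v >= x_min]
    log_sum = sum(math.log(v / x_min) for v in tail)
    if not tail or log_sum == 0.0:
        raise ValueError("need at least one value strictly above x_min")
    return 1.0 + len(tail) / log_sum

# Synthetic check: evenly spaced quantiles of a power law with exponent 2.5.
true_alpha, n = 2.5, 2000
sample = [(1 - (i + 0.5) / n) ** (-1 / (true_alpha - 1)) for i in range(n)]
print(power_law_exponent(sample))
```

For integer counts such as retweets per tweet, this estimator is only an approximation, and the choice of `x_min` can change the estimate noticeably.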
Concerning agencies, we find that 117 of the 130 HHS handles retweet each other's tweets. The top retweeting agencies are womenshealth with 2500 retweets followed by the NIH/NCI with 1662 retweets. MedicareGov, NCITechTransfer, NEHEP, NIAIDFunding and NIOSHManuf have the lowest retweet counts with 1 retweet each. Apart from these HHS handles, OrleansCo-Health, the Twitter handle of Orleans County Health Department (New York), has the highest retweeting activity with 3154 retweets.
Figure 2 shows a scatter plot of followers versus friends. We find that CDCemergency has the highest number of followers (1,432,424) but very few friends (393). On the other hand, GoHealthyPeople has many friends (7,688) but few followers (34,913). NIAIDCareers (1008: 729) and distressline (1701: 1203) have relatively balanced numbers of followers and friends in comparison to the overall ratio of followers to friends for the different handles (49832: 405).
The top ranking handles in tweet count include CDCSTD (12151). NIHforHealth, CDCgov and HHSGov have the highest betweenness-centrality values of 987.2, 851.51 and 717.54 respectively. Betweenness-centrality does not apply to NIHforFunding and nlm_newsroom as these are nodes with zero in- or out-degrees.
A large majority (75%) of tweets in our dataset contain URLs. Around 57% contain hashtags while 38% contain user-mentions. Table 5 shows the distribution of tweets across sentiment scales. We find that in general slightly more tweets are classified as negative (the percentage of moderate to extreme negative is 32.2%, while for positive this percentage is 28.3%). Table 6 shows the 15 semantic groups with examples of component semantic types and their prevalence in our dataset. "Concepts & Ideas" (41.68% of tweets) is the most prevalent group, followed by "Disorders" and "Living Beings" (around 36% for each). "Genes & Molecular Sequences" is least frequent (0.69%). Health agencies more often discuss concepts and ideas or disorders than amino acid and carbohydrate sequences.
We also compared the tweets posted by the health agencies with news in traditional media. The influence of traditional news sources on social media has been studied [43][44][45] but not in health. Google Health News is an aggregator that has been shown to be useful in infectious disease monitoring [46]. Gathering news Our results with health agency tweets are consistent with previous studies finding topics discussed in Twitter to be considerably different from traditional news sources [43].

Hurdle Model Analysis of Tweets
Results from the hurdle model are given in Table 7. But first, an important assumption in multiple regression analysis is that the variables used in the statistical models are independent of each other, i.e., multicollinearity should not exist among them. We use the variance inflation factor (VIF) to check for the presence of multicollinearity in our experiments. VIF scores for all independent variables in our regression analysis were within the range of zero to 5, indicating no multicollinearity issues.
For the zero portion of the hurdle model (modeling whether a retweet occurs or not), increases in the number of favorites and followers are positively associated with retweets, as is tweet age. Tweet count, however, is negatively associated with retweets. Hashtags, URLs and user-mentions are positively associated with retweets. Both positive and negative sentiments are associated with a lower probability of retweeting. Almost all semantic groups, except for geographic areas, occupations and organizations, are positively associated with retweeting.
For the count portion of the hurdle model (modeling the number of retweets), the results are similar to those of the zero portion with a few exceptions: friend count, which was insignificant in the zero portion, is negatively associated with the number of retweets. Hashtags and URLs are negatively associated with the number of retweets. Also, some semantic groups are negatively associated with retweet counts but positively associated with whether or not a retweet occurred, specifically anatomy, devices, genes & molecular sequences and procedures.
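The VIF for a predictor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on all the others. The sketch below covers only the simplest case of two predictors, where R^2 reduces to the squared Pearson correlation. It is meant to illustrate the threshold check, not to reproduce the full multi-variable computation, and the data are made up.

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when a model has exactly two of them."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Made-up predictor columns: the first pair is uncorrelated, the second is not.
print(vif_two_predictors([1, 2, 3, 4], [1, -1, -1, 1]))
print(vif_two_predictors([1, 2, 3, 4], [1, 2, 3, 5]))
```

A VIF near 1 indicates a predictor that is nearly uncorrelated with the others, while values above the cutoff of 5 used in the text would flag a multicollinearity concern.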

Cox Models of Tweets
We estimated two Cox proportional hazards models. First, we modeled time to first retweet, and the results are presented in Table 8. In this case, shorter time periods are preferred. Time to retweet is shorter for handles that have more favorites and followers. It is also shorter for tweets with longer tweet age and the presence of hashtags. Time to retweet is longer for increases in friend count, user-mentions, and positive sentiment. Most of the semantic groups are not associated with time to first retweet.
Second, we modeled the time to the last retweet, and the results are presented in Table 9. In this case, longer time periods are preferred. Longer time to the last retweet is associated with the handle's follower count, the presence of a URL in the tweet, and positive sentiment. Handles with more favorites, higher tweet count, and increased betweenness-centrality, as well as tweets with user-mentions, hashtags and negative sentiment, have shorter times to last retweet.
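To show what a Cox model estimates, the sketch below evaluates the partial log-likelihood for a single covariate and converts a coefficient to a hazard ratio. It is a simplified illustration with made-up data: it uses the Breslow form for tied times, and treating tweets without an observed event as censored is an assumption, since the paper does not state how such tweets were handled.

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Cox partial log-likelihood for one covariate (Breslow form for ties).

    times:  time to event or censoring for each tweet
    events: 1 if the event (e.g., first retweet) was observed, 0 if censored
    x:      covariate value for each tweet (e.g., 1 if it has a hashtag)
    """
    loglik = 0.0
    for i, (t_i, observed) in enumerate(zip(times, events)):
        if not observed:
            continue  # censored tweets contribute only through risk sets
        # Risk set: tweets still waiting for the event at time t_i.
        risk = sum(math.exp(beta * x[j])
                   for j, t_j in enumerate(times) if t_j >= t_i)
        loglik += beta * x[i] - math.log(risk)
    return loglik

def hazard_ratio(beta):
    """Multiplicative change in the hazard per unit increase in the covariate."""
    return math.exp(beta)

# Made-up data: three tweets, the last one censored.
times, events, has_hashtag = [1, 2, 3], [1, 1, 0], [1, 0, 1]
print(cox_partial_loglik(0.0, times, events, has_hashtag))
print(hazard_ratio(0.0))
```

A hazard ratio above 1 means the event tends to occur sooner. That is desirable for time to first retweet, but for time to last retweet it corresponds to a shorter lifespan, which is why the preferred direction differs between the two models.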

Discussion
Our results show that although multiple federal health agencies are using Twitter, there is a great deal of difference between levels of Twitter use and also retweets. For public health agencies, we found that a tiny minority of tweets gets more than 100 retweets; a two-thirds majority of tweets get on average 8 retweets. We also found that a handle's follower count and favorite count have strong positive relationships with retweeting behavior. While these features are not easy for agencies to improve, they are easy metrics to follow. In contrast, we found that having more friends on Twitter was negatively associated with the number of times a tweet is retweeted.
Early adoption of Twitter by an agency is associated with our measures of engagement. As a handle ages, the chances for engagement overall seem to improve. This is consistent with findings in the general Twitter domain [47]. Agencies cannot change this, but it does suggest that health agencies considering a Twitter account should start one rather than delay.
Agencies generating more tweets than others do not necessarily have more retweets. In fact, we found that tweet count, the number of tweets posted overall, is negatively associated with retweets. This is consistent with anecdotal evidence from the web [48,49]. This suggests that an agency might consider only tweeting posts that it regards as important so as to not 'dilute' the public's attention. However, this observation must be balanced against the fact that information dissemination on a topic may be an organization's main goal and not necessarily public response. In that case regular or even frequent postings related to a message may be appropriate.
Health agencies can augment their tweets by adding hashtags, URLs, or user-mentions, and this may increase the likelihood that users will find the information in the tweet more useful and thus retweet it. Indeed, we found that the addition of hashtags, URLs, or user-mentions increased the likelihood that a given tweet would be retweeted. However, the inclusion of hashtags and URLs is also associated with decreased numbers of retweets, and user-mentions are associated with shorter times to last retweet. Thus, agencies may be able to increase the chance of a retweet by using these conventions, but they might not increase the longevity of tweets. Our user-mentions results are in slight contrast to previous research, which found these to have (marginally significant) negative associations with retweeting [47]. But our results for hashtags and URLs are generally consistent with previous results [47,50].
Our observations regarding hashtags, user-mentions and URLs are also interesting because of differences in their prevalence between our dataset and Twitter data in general. The agency tweets in this paper use more URLs than found in the general domain, 75% vs. 19% [51] and 21% [47]. We speculate that this abundance of URLs for tweets from health agencies may be because in health communications references to sources and supporting materials are necessary. This is supported by another study on the use of Twitter by local health departments where the authors found 74% of tweets contain URLs [52]. Hashtags and user-mentions are also more prevalent in our dataset appearing in 57% and 38% of agency tweets respectively, while in the general domain hashtags were found in only 16% and user-mentions in only 20% of tweets [52].
Betweenness-centrality is positively related to the number of retweets and negatively related to the time to last retweet. While betweenness-centrality has been used extensively in social media research in various domains ranging from health to politics [53][54][55][56], in most cases it is used as a metric of influence in a retweet or a reply network. To the best of our knowledge, researchers have not explored the direct association of betweenness-centrality scores with retweeting activity. We speculate that since we calculated betweenness-centrality based on the follower-following network among agencies, an agency with high betweenness-centrality, i.e., one following many other federal agencies, may not have any major effect on the rate or lifespan of retweets.
Much work has been done on mining sentiment from Twitter, and it has previously been demonstrated that the presence of sentiment of one kind or the other is associated with higher rates of retweeting [57][58][59]. In contrast, we found that sentiment in tweets from government agencies, either positive or negative, is not associated with retweeting. It should also be noted that agency tweets are predominantly neutral (70%).
Semantic groups have not been studied in the context of retweet rates. We found that posts about activities and behaviors, chemicals and drugs, disorders, living beings, objects, phenomena and physiology are positively associated with engagement. In contrast, posts about organizations, occupations, genes & sequences and geographic areas tend to lower engagement. But it may also be that the intent behind such posts is less to engage and more to just inform.

Limitations
Our study has a few limitations. First, it is based on observational data; i.e., we did not run formal experiments. Thus, although we can describe associations, we cannot establish causality. For example, while we find that the number of followers is associated with retweeting, we cannot ensure, due to the descriptive nature of the study, that increasing the number of followers will lead to an increase in retweets. Second, although we captured the majority of tweets from federal agencies, we could only collect a maximum of 3200 for each handle, so for a few of these agency handles (18/130) our data was censored. Nevertheless, we still had a large corpus of tweets over a long period of time. Third, the intent behind some tweets may simply be to inform and not necessarily to engage via retweeting. We do not know an organization's motivations for tweeting or for posting specific tweets, or its targeted audience. Furthermore, some agencies may have more information that naturally draws the public. Thus, these results do not represent a "report card" on these agencies. Fourth, our definition of engagement is limited to examining retweeting and its features. Fifth, although we considered various important and typically used tweet-based features in our statistical analysis, there may be other key features. For example, while time or day of the week may have significant effects on tweeting or retweeting behavior [60,61] and hence engagement, these features were considered outside the scope of our study. We also did not examine the features of the retweet. For example, a retweet may agree with or contradict the message in the source tweet. Finally, we limit our analysis to Twitter, and there are other social network platforms that federal agencies are using.

Conclusions
We present the first comprehensive analyses of Twitter engagement by public health agencies. The level of Twitter activity varies greatly by health agency: some health accounts are very active and others much less so. However, it seems to be the content of the tweets (e.g., about activities and behaviors, disorders) and not the number of tweets alone that is associated with a higher level of engagement (number of retweets). Furthermore, although some of the factors associated with more engagement cannot be changed by the agency (e.g., the length of time they have been active on Twitter), several factors associated with higher retweets can be controlled (e.g., use of hashtags, URLs). Our results provide a framework for future experiments designed to improve the public's engagement with health agencies via Twitter.

Supporting Information
Data S1. List of Twitter handles of 130 HHS health agencies used in this paper.