The Why We Retweet scale

Background Twitter offers a platform for rapid diffusion of information and of its users' attitudes and behaviors. Insights about information propagation via retweets (the message forwarding function) offer observable explanations of the ways in which modern human interactions are organized in the form of online networks and contextualized in the form of public health, policy decisions, disaster management, and civic participation. This study conceptualized and validated the Why We Retweet Scale to contextualize retweeting behavior. Methods Twitter users were identified using clustering algorithms that consider a user's position in their network and were invited to complete an online survey. Participants (N = 1433) responded to 19 questions about why they retweet. Exploratory Factor Analysis (EFA) was conducted on a scale development sample (70% of the original sample), which informed the Confirmatory Factor Analysis (CFA) on a scale testing sample (30% of the original sample). Varimax rotation was used to obtain a rotated factor solution, which resulted in interpretable factors. Demographic differences among scale factors were analyzed using one-way ANOVA or independent samples t-tests. Results The final model (χ²(21) = 28, RMSEA = .03 [90% CI, 0.00–0.06], CFI = .99, TLI = 0.99) represented a parsimonious solution with 4 factors, measured by 2–3 items each, creating a final scale consisting of 9 items. Factor labels and definitions were: (1) Show approval, “Show support to the tweeter”; (2) Argue, “To argue against a tweet that I disagree with”; (3) Gain attention, “Add followers or gain attention”; and (4) Entertain, “Create humor/amusement”. Demographic differences were also reported. Conclusions The Why We Retweet Scale offers a useful conceptualization and assessment of motivations for retweeting.
In the future, communication strategists might consider the factors associated with information propagation when designing campaign messages to maximize message reach and engagement on Twitter.

Introduction retweeted the message and the user who originally posted the message. To create a diverse stratified sample, this information was used to construct the social network structure of users, where connections between users were defined by retweets of messages from one user to another. From this retweet network, clusters were identified by conducting a modularity analysis, which locates clusters within a network by grouping nodes (i.e., Twitter users) that have more connections (i.e., retweets) with others within a group than with those outside of the group. Within each cluster, Opinion Leaders were chosen as those who had been retweeted the most, and Followers were identified as those who had retweeted others the most. Random users were independently found by Twitter's API get-user-status function, which returns users who have recently posted a tweet, from which a sample was randomly selected. The goal of this procedure was to include a variety of Twitter users based on their positions in the Twitter network; 24.2% of the participants were categorized as Opinion Leaders, 39.6% as Followers, and 36.2% as Random users in the retweet network. From January to December 2016, Twitter users identified in the above networks were sent private messages inviting them to participate in a survey on health behaviors and reasons for retweeting, among other survey items. After consenting to participate, each participant was directed to the online survey. All participants were over 18 years of age, resided in the United States, were able to complete an online survey in English, and received a $20 gift card for completing the survey. The University of Southern California Institutional Review Board approved all study procedures. All analyses adhered to the terms and conditions, terms of use, and privacy policies of Twitter.
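The role assignment described above can be illustrated with a minimal sketch. It assumes that a cluster label per user is already available from a modularity analysis; the toy edge list, the `cluster_of` mapping, and the function names are hypothetical, and only the single top user per role is returned, whereas the study selected groups of users. The `modularity` function shows the quantity that a modularity analysis tries to maximize, treating retweet ties as undirected.

```python
from collections import Counter, defaultdict


def modularity(edges, cluster_of):
    """Newman modularity Q of a partition, treating retweet ties as undirected."""
    m = len(edges)
    degree = Counter()
    within = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if cluster_of[u] == cluster_of[v]:
            within[cluster_of[u]] += 1
    degree_sum = Counter()
    for node, d in degree.items():
        degree_sum[cluster_of[node]] += d
    # Q = sum over clusters of (share of ties inside) - (expected share by chance)
    return sum(within[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


def pick_roles(edges, cluster_of):
    """Per cluster: most-retweeted user (opinion leader), most-retweeting user (follower)."""
    retweeted = defaultdict(Counter)   # cluster -> author -> times retweeted
    retweeting = defaultdict(Counter)  # cluster -> user -> retweets made
    for retweeter, author in edges:
        retweeted[cluster_of[author]][author] += 1
        retweeting[cluster_of[retweeter]][retweeter] += 1
    roles = {}
    for cluster in set(cluster_of.values()):
        leader = retweeted[cluster].most_common(1)
        follower = retweeting[cluster].most_common(1)
        roles[cluster] = {
            "opinion_leader": leader[0][0] if leader else None,
            "follower": follower[0][0] if follower else None,
        }
    return roles


# Hypothetical toy data: each edge is (retweeter, original author).
edges = [("b", "a"), ("c", "a"), ("b", "c"),
         ("e", "d"), ("f", "d"), ("e", "f"), ("b", "d")]
cluster_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

print(modularity(edges, cluster_of))
print(pick_roles(edges, cluster_of))
```

Counting all retweets per user, rather than only retweets inside the user's own cluster, is a simplifying choice of this sketch; the source does not specify which was used.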

Scale items
Participants responded to 19 questions that were developed to understand why people retweet. Response options were provided on a scale of 1-5, with "Never" coded 1 and "Very often" coded 4; "Prefer not to answer" was coded as missing (Table 1). These items were based on boyd et al. [18].

Demographic measures
Participants were asked to indicate their gender (male, female), age (years), race (White, Black or African American, American Indian or Alaska Native, Asian/Pacific Islander, or Other), ethnicity (Hispanic/Non-Hispanic), income (less than $10,000; $10,000 to $14,999; up to $200,000 or more in increments of $10,000 per year), and level of education (less than high school; some high school, no diploma; GED; high school graduate-diploma; some college but no degree; associate degree-occupational/vocational; associate degree-academic program; bachelor's degree (e.g., BA, AB, BS); master's degree (e.g., MA, MS, MEng, MEd, MSW); professional school degree (e.g., MD, DDS, DVM, JD); doctorate degree (e.g., PhD, EdD)) (S1 Table). Those who did not wish to answer could select 'Prefer not to answer' for all of the above questions except gender and age.

Procedure
All statistical analyses were performed using Stata version 14.2. Responses indicating 'Prefer not to answer' were marked as missing. Complete cases were randomly drawn to populate the scale development sample (70% of total sample, N = 1003) and the scale validation sample (30% of total sample, N = 430). The analytic sample for Exploratory Factor Analysis (EFA) was n = 824 after listwise deletion to handle nonresponse or missing items, while the analytic sample for Confirmatory Factor Analysis (CFA) was n = 366. First, EFA was performed on the scale development sample to determine the optimal number of factors that could account for the observed variation in responses. Factor correlations less than 0.3 implied that the solution remained orthogonal [23]. We centered all scale items on their means, and the Kaiser-Meyer-Olkin measure of sampling adequacy was used to determine how well the correlation between pairs of variables was explained by other variables in the analysis [24]. The Bartlett test of sphericity was used to test the null hypothesis that the observed correlation matrix was an identity matrix, corresponding to no correlation between scale items. The EFA used principal components analysis. Factors with eigenvalues greater than one were extracted. Items with factor loadings greater than 0.7 were retained as indicators of their respective factors. Next, to validate the scale, confirmatory factor analysis was conducted on the validation sample. The criteria for model fit were a CFI greater than 0.9 and an RMSEA less than 0.05 [25]. The maximum-likelihood estimation procedure was employed as a global test of the model [26]. Each subscale's internal reliability was assessed using Cronbach's alpha and/or the Spearman-Brown coefficient (in the case of two-item factors) [27]. Items that loaded on each factor were summed to create a factor score. Construct validity, in particular convergent validity, was assessed based on the average variance extracted (AVE) for each factor.
For factors with AVE less than 0.5, a composite reliability higher than 0.6 was considered adequate for establishing convergent validity [28]. Additionally, squared inter-factor correlations for each factor were compared with the corresponding square root of the AVE to establish discriminant validity [29]. Lastly, independent samples t-tests and ANOVA tests were used on the complete analytic sample (N = 1190) to analyze demographic differences on each subscale.
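The sample split and the listwise deletion described in this procedure can be sketched as follows. This is an illustration under stated assumptions, not the authors' Stata code: the seed, the function names, and the representation of a missing answer as `None` are all hypothetical.

```python
import random


def split_sample(ids, dev_fraction=0.7, seed=2016):
    """Randomly split respondent ids into development and validation samples."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # hypothetical fixed seed for repeatability
    cut = round(dev_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def listwise_complete(responses, items):
    """Keep only respondents with a non-missing (not None) answer on every item."""
    return {rid: answers for rid, answers in responses.items()
            if all(answers.get(item) is not None for item in items)}


dev_ids, val_ids = split_sample(range(1433))
print(len(dev_ids), len(val_ids))  # 1003 430

# Hypothetical two-item example: respondent 1 skipped q2 and is dropped.
responses = {1: {"q1": 3, "q2": None}, 2: {"q1": 2, "q2": 4}}
print(listwise_complete(responses, ["q1", "q2"]))
```

With 1433 respondents, a 70% cut gives the 1003 and 430 cases reported above; the smaller analytic samples (824 and 366) then follow from dropping respondents with any missing item.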

Internal consistency and exploratory factor analysis
Bartlett's test of sphericity yielded a large value (7835.64), and the associated significance probability (p = 0.001) indicated that the observed correlation matrix differed significantly from an identity matrix. Additionally, the Kaiser-Meyer-Olkin value was 0.87, which justified further analysis. Initial item analysis was performed on all 19 items in the training sample. It was determined that the solution remained orthogonal. Varimax rotation was performed on all 19 items in the same training sample. Principal components analysis produced a four-factor solution with eigenvalues greater than 1, based on Kaiser's criterion (cumulative variance explained = 55%). Items with factor loadings greater than or equal to 0.7 were retained for interpretation. Table 2 reports the resulting factor loadings of the 19 items.
The rotated factor solution resulted in four interpretable factors. Factor labels and items were: Factor 1 Show approval, "To show my support for the tweeter" (explained 24% of the variance); Factor 2 Argue, "To argue against a tweet that I disagree with" (explained 22% of the variance); Factor 3 Gain attention, "Add followers or gain attention" (explained 14% of the variance); Factor 4 Entertain, "To entertain" (explained 14% of the variance).
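For reference, the standard form of Bartlett's sphericity statistic and the share of variance attributed to a principal component are given below. The source does not state which formula Stata applied, so this is the usual textbook form, with n the number of complete cases (824 in the development sample), p the number of items (19), R the item correlation matrix, and λ_j its j-th eigenvalue.

```latex
\chi^2 = -\left[(n - 1) - \frac{2p + 5}{6}\right] \ln\lvert R \rvert ,
\qquad
df = \frac{p(p-1)}{2} = \frac{19 \cdot 18}{2} = 171 ,
\qquad
\text{variance share of component } j = \frac{\lambda_j}{p} .
```

Under Kaiser's criterion a component is retained when λ_j > 1, that is, when it accounts for more variance than a single standardized item.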

Confirmatory factor analysis
To confirm findings from the EFA, a CFA model was fit using 9 items and 4 factors, with each item allowed to load, and be freely estimated, only on its hypothesized factor. The final model (χ²(21) = 28, RMSEA = .03 [90% CI, 0.00-0.06], CFI = .99, TLI = 0.99) represented a parsimonious solution with 4 factors, measured by 2-3 items each, creating a final scale consisting of 9 items [30]. This solution offered a good fit without any adjustments, such as covarying parameters or allowing variables to load on additional factors, to achieve the final model. Individual item loadings were high for all items on their respective factors (range = 0.70-0.93; see Table 3).
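The reported degrees of freedom and RMSEA can be checked with a short calculation. The sketch below assumes a simple-structure model with freely correlated factors and uses the common RMSEA formula with N - 1 in the denominator; the source does not say which variant Stata applied, and the function names are hypothetical. The CFA analytic sample size (n = 366) is taken from the Procedure section.

```python
import math


def cfa_degrees_of_freedom(n_items, n_factors):
    """df for a CFA where each item loads on one factor and factors correlate freely."""
    moments = n_items * (n_items + 1) // 2          # unique variances and covariances
    free = (n_items                                  # loadings (factor variances fixed to 1)
            + n_items                                # residual variances
            + n_factors * (n_factors - 1) // 2)      # factor covariances
    return moments - free


def rmsea(chi2, df, n):
    """Root mean square error of approximation, (N - 1) variant."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


print(cfa_degrees_of_freedom(9, 4))   # 21
print(round(rmsea(28, 21, 366), 2))   # 0.03
```

Both values agree with the reported χ²(21) = 28 and RMSEA = .03, which falls under the 0.05 cutoff stated in the Procedure section.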

Factor inter-correlations and internal consistency
Pearson's product-moment correlations were assessed for each pair of subfactor scores. All factors were significantly correlated (p < .05), with correlations ranging from 0.19 to 0.45. Internal consistency was also acceptable for each factor, as measured by Cronbach's alpha (range = 0.60 to 0.84; see Table 3). We examined the Spearman-Brown coefficient for all two-item factors to predict their reliability for a 3-item test (see Table 3). As noted in Table 3, the Spearman-Brown coefficient for Factor 2 (Argue) is 0.69, which is lower than, but approaches, an acceptable coefficient of 0.70.
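The two reliability coefficients used in this section can be computed as in the sketch below. It uses the standard formulas for Cronbach's alpha and the Spearman-Brown prophecy; the item scores are hypothetical, and the final line is only an illustration that projecting a reliability of 0.6 from two items to three (length factor 1.5) gives roughly 0.69. The source does not state the inputs used for its reported coefficient.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """item_scores: one list per item, each holding one score per respondent."""
    k = len(item_scores)
    totals = [sum(row) for row in zip(*item_scores)]      # summed factor score
    item_var = sum(variance(scores) for scores in item_scores)
    return k / (k - 1) * (1 - item_var / variance(totals))


def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by length_factor."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Hypothetical responses of four people to a two-item factor.
items = [[1, 2, 3, 4], [2, 2, 4, 4]]
print(round(cronbach_alpha(items), 3))        # 0.941

# Projecting a two-item reliability of 0.6 to a three-item test.
print(round(spearman_brown(0.6, 1.5), 2))     # 0.69
```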

Construct validity
The scale's construct validity was assessed in terms of convergent and discriminant validity. Convergent validity was assessed based on the average variance extracted (AVE) for each factor. AVE values were higher than 0.5 for all factors except one, Argue (AVE = 0.42; Table 3). However, the composite reliability (CR) of this factor was 0.59, indicating that it approached an acceptable level of convergent validity (see Table 4). Discriminant validity was determined to be sufficiently high for the scale, given that the square roots of the AVE values were higher than the squared inter-factor correlations (see Table 4).
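The convergent and discriminant validity checks can be sketched from standardized loadings. The formulas below are the usual ones (AVE as the mean squared loading, composite reliability from the summed loadings, and the Fornell-Larcker comparison of the square root of AVE with the absolute inter-factor correlation); the exact tabulation used in the paper may differ. The loadings are hypothetical values chosen so that AVE lands near 0.42, and they are not the paper's estimates.

```python
import math


def ave(loadings):
    """Average variance extracted: mean of squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    squared_sum = sum(loadings) ** 2
    error = sum(1 - l * l for l in loadings)
    return squared_sum / (squared_sum + error)


def discriminant_ok(ave_a, ave_b, corr_ab):
    """Fornell-Larcker check: sqrt(AVE) of both factors exceeds |correlation|."""
    return min(math.sqrt(ave_a), math.sqrt(ave_b)) > abs(corr_ab)


hypothetical_loadings = [0.648, 0.648]   # illustrative two-item factor
print(round(ave(hypothetical_loadings), 2))                    # 0.42
print(round(composite_reliability(hypothetical_loadings), 2))  # 0.59
print(discriminant_ok(0.42, 0.60, 0.45))                       # True
```

For a two-item factor with equal loadings, an AVE of about 0.42 implies a composite reliability of about 0.59, which is the pairing reported for Argue.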

Relationship between Why We Retweet Scale with demographic characteristics
In terms of demographic differences (Figs 1-5), those who retweeted to Show approval (t(924) = -2.05, p = 0.04) and Gain attention (t(924) = -2.62, p = 0.001) were more likely to be men than women. However, those who retweeted to Argue were more likely to be women than men (t(924) = 2.14, p = 0.03). Those who retweeted to Argue (F = 4.99, p = 0.001) or Entertain (F = 3.11, p = 0.01) were more likely to be African American than other races. Those who retweeted to Gain attention were likely to be less educated (t(902) = 2.58, p = 0.01) and to be earning a lower annual income of less than $34,999 per year (t(764) = 2.42, p = 0.01). Those who retweeted to Entertain were more likely to be younger (less than or equal to 20 years of age).
Table note: Colored cells indicate factor loadings ≥ 0.7 for the corresponding items (Column 1). Cronbach's alpha values indicate reliability coefficients for each factor in the Confirmatory Factor Analysis. The Spearman-Brown coefficient is reported for two-item factors. Average variance extracted indicates the extent to which each factor explains the variance of its indicators.
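The group comparisons in this section rest on two standard statistics, sketched below for factor scores held in plain lists. The pooled-variance form of the t-test is assumed because the reported degrees of freedom equal the combined group size minus two; the data and function names are hypothetical, and p-values are omitted because they require a distribution function outside the standard library.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Independent samples t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


def one_way_f(groups):
    """One-way ANOVA F statistic; returns (F, (df_between, df_within))."""
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = len(groups) - 1, n - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), (df_between, df_within)


# Hypothetical factor scores for two groups.
group_a, group_b = [1, 2, 3], [2, 3, 4]
t, df = pooled_t(group_a, group_b)
f, f_df = one_way_f([group_a, group_b])
print(round(t, 3), df)     # -1.225 4
print(round(f, 3), f_df)   # 1.5 (1, 4)
```

With exactly two groups the F statistic equals the square of the pooled t statistic, which is a quick consistency check between the two tests.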

Discussion
The present study conceptualized and validated the Why We Retweet Scale, offering insights into the nature and dimensionality of the motivations for retweeting, and provided an empirical investigation of boyd et al.'s exploratory, qualitative study [18]. While boyd et al. reported ten different motivations to retweet, the present study suggested that, among a sample of Twitter users, retweeting is driven by four factors: Show approval, Argue, Gain attention, and Entertain. Prior research has suggested that self-efficacy in information sharing, attachment motivation, and critical mass explain retweeting motivations [20], which broadly contextualizes our findings in the realm of social cognitive theory. Similarly, our findings largely align with those of Gruber (2017), wherein showing approval, arguing, and gaining attention are predominantly interpersonal factors driving retweeting behavior. Factors driving retweeting behaviors could be extrinsic (i.e., show approval, entertain) or intrinsic (i.e., argue, gain attention). Different motivations for retweeting could be instrumental in assessing or inferring reasons for user involvement in different topics or issues. This is especially so for specific demographic groups. Determining why people retweet could enable communication strategists to contextualize and gauge messages to the public online. Specifically, communication strategists could reach certain groups with targeted messages that elicit a response (e.g., sending provocative messages to women, who tend to engage/retweet through argument). Earlier research has suggested that retweeting is typically a measure of the viral reach of information, such that the messages that receive the most retweets are considered to be the most influential [31]. This view, however, limits the understanding of this increasingly ubiquitous communication practice.
The communicative meaning and valence of a tweet may change depending on what motivates the user to retweet, which should be an area of future research. Additionally, while this study showed the reliability of the Why We Retweet Scale, it could not demonstrate its validity in relation to previously reported reasons for retweeting. Future research should examine how the Why We Retweet Scale relates to existing measures of motivations to retweet, including measures of attention-seeking. Future research should also examine whether these factors predict the actual content of retweets. Twitter recently introduced two changes that could make retweeting more powerful than before. One change pertains to its algorithmic timeline, which exposes users to trending topics at the top of their feed and thereby accelerates the diffusion of popular tweets [32]. The other change is a thread feature, which allows users to string together tweets to serialize information [33]. Retweet references to these threads have the potential to engage a large audience with a longer story or thought, or to offer in-depth commentary on an event or topic. These new features create opportunities for in-depth discussions about emerging topics. As a result, Twitter is likely to evolve as a communicative platform that encourages more nuanced exchanges. In light of these changes and the present study's findings, it is critical to examine the underlying motivations for sharing information related to health, natural disasters, public policies, and governance.

Limitations
This sample comprises Twitter users with public profiles, which limits generalizability to those with private accounts. The sampling strategy (network clustering based on users' tobacco-related terms) and the sample size of this study also limit the generalizability of the findings but are improvements over previous work [18]. The reliability of the Argue factor is lower than desired (Cronbach's alpha = 0.6), which may be due to the number of items [34]. The convergent validity of this factor was also lower than desired (AVE = 0.42); however, the Spearman-Brown coefficient approached an acceptable level. Replication, invariance testing (e.g., temporal, cultural), and other ongoing construct validity evaluation need to be considered in future research to better understand retweeting motivations.

Conclusion
By developing the Why We Retweet Scale, this study provides a number of exploratory insights into the practice of online information dissemination. Instead of using counts of retweets as a reference for tweet virality or user engagement, this scale points to the user context, which lends meaningful interpretation to messages. For example, a policy decision maker would benefit from knowing whether the general public is retweeting about a proposed policy to express support for the policy or to pursue the goal of building a network of like-minded individuals. Taken together, this scale informs communication strategists about factors associated with information propagation when designing campaign messages to maximize message reach and engagement on Twitter.
Supporting information S1