Testing Propositions Derived from Twitter Studies: Generalization and Replication in Computational Social Science

Replication is an essential requirement for scientific discovery. The current study aims to generalize and replicate 10 propositions made in previous Twitter studies using a representative dataset. Our findings suggest that 6 of the 10 propositions could not be replicated, owing to variations in data collection, differences in the analytic strategies employed, and inconsistent measurements. The study's contributions are twofold. First, it systematically summarized and assessed some important claims in the field, which can inform future studies. Second, it proposed a feasible approach to generating a random sample of Twitter users and their associated ego networks, which might serve as a solution for answering social-scientific questions at the individual level without accessing the complete data archive.

(3) For the number of followees, the KS-D metric for ego_batch1 and ego_batch2 is 0.0085 (p = 0.8053), for ego_batch1 and ego_batch3 is 0.0104 (p = 0.5774), while for ego_batch2 and ego_batch3 is 0.0149 (p = 0.1574). All comparisons suggest that there are no significant differences across batches and attributes, indicating a very high level of internal validity of our sampling approach.
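The batch comparisons above rest on the two-sample Kolmogorov-Smirnov statistic. The following is a minimal sketch of how such a comparison could be computed, not the authors' actual procedure (the software they used is not stated here). The function name `ks_two_sample` and the sample values are hypothetical, and the p-value uses the standard asymptotic Kolmogorov series, which is only approximate for discrete data with ties such as followee counts.

```python
import math
from bisect import bisect_right


def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov test: returns (D, asymptotic p-value)."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    # D is the largest vertical gap between the two empirical CDFs,
    # checked at every observed value.
    d = max(
        abs(bisect_right(a, x) / n - bisect_right(b, x) / m)
        for x in set(a) | set(b)
    )
    # Scale D by the effective sample size before applying the
    # asymptotic Kolmogorov distribution.
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.3:
        # The alternating series converges slowly here, and the survival
        # function is within about 1e-5 of 1 in this range.
        return d, 1.0
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return d, min(1.0, max(0.0, p))


# Made-up followee counts for two hypothetical batches.
batch1 = [10, 25, 40, 55, 300]
batch2 = [12, 20, 45, 60, 280]
d, p = ks_two_sample(batch1, batch2)
print(f"KS-D = {d:.4f}, p = {p:.4f}")
```

A small D with a large p-value, as reported for the ego batches, means the two empirical distributions are statistically indistinguishable at the chosen significance level.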
Second, we compared the distributions for alter batches.
(3) For the number of followees, the KS-D metric for alter_batch1 and alter_batch2 is 0.032 (p < 0.001), for alter_batch1 and alter_batch3 is 0.0352 (p < 0.001), while for alter_batch2 and alter_batch3 is 0.0547 (p < 0.001). All comparisons suggest that the distributions differ significantly across batches of alters, indicating that the alters induced from representative egos are not themselves representative. It also shows that the commonly used BFS sampling approach could not generate a representative sample.
1) The 20/80 rule was tested using the number of statuses in ego profiles. We acknowledge that there might be many fake accounts in our sample. However, this is unlikely to affect the distribution much. We could assume that users who posted only a few tweets are fake accounts. As Fig.1
2) A retweet could be identified by whether the API returned a retweeted user ID. We did not count unofficial retweets (e.g., "RT: @username") in our study, because they may introduce additional noise. We also emphasize that an @ mention could be a byproduct of a retweet or a reply. We explicitly distinguished the induced @ (produced by replying to others or retweeting) from the @ in original tweets. Both the official RT and @ are provided by the Twitter Timeline API.

Analysis details
Similarly, a reply could be identified by whether the API returned a reply-to user ID. Original tweets are statuses that are neither replies nor retweets. In our ego tweets sample (4,702,258 tweets produced by 17,244 users), the proportion of replies is 24.1% and the proportion of retweets is 22.4%; the proportion of original tweets is therefore 53.5% (1 - 24.1% - 22.4%).
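The classification rule described above (retweet, reply, otherwise original) can be sketched as follows. This is an illustration under stated assumptions: the dictionary keys `retweeted_user_id` and `in_reply_to_user_id` are placeholder field names standing in for whatever the API response provides, and checking the retweet field first is our choice rather than something the text specifies.

```python
from collections import Counter


def classify(tweet):
    """Label one tweet as 'retweet', 'reply', or 'original'."""
    # Hypothetical field names; a missing or None value means "not returned".
    if tweet.get("retweeted_user_id") is not None:
        return "retweet"
    if tweet.get("in_reply_to_user_id") is not None:
        return "reply"
    return "original"


def proportions(tweets):
    """Tweet-level share of each category."""
    counts = Counter(classify(t) for t in tweets)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no tweets to classify")
    return {k: counts[k] / total for k in ("reply", "retweet", "original")}


# Four made-up tweets: one reply, one retweet, two originals.
sample = [
    {"in_reply_to_user_id": 11},
    {"retweeted_user_id": 22},
    {},
    {"in_reply_to_user_id": None, "retweeted_user_id": None},
]
print(proportions(sample))

# The residual share reported in the text: 1 - 24.1% - 22.4%.
print(round(1 - 0.241 - 0.224, 3))
```

Because the three categories are mutually exclusive and exhaustive, the original-tweet share is always the remainder after replies and retweets are subtracted.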
The proportions were calculated at the tweet level. We should note that the Twitter timeline API has a limit of 3,200 tweets per user. According to ego profiles, the maximum number of tweets posted by any of our sampled egos is 1,082,000. However, we do not think it will influence our results in general, because only 2.5% (873)
Fig. 3C was obtained from the ego timeline. For those egos who posted nothing, friend count was set to 0.
5) Ego profiles contain sufficient information to examine the distributions of the number of followers and followees per ego. We need ego-alter relationships to calculate the number of reciprocal ties per ego. Because fake accounts may influence the degree distributions, we also deleted those users who had not posted anything. Fig. S2 shows that the results are similar.
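To make the role of the ego-alter relationships concrete, here is a minimal sketch of counting followers, followees, and reciprocal ties for one ego. The edge-list representation, where `(source, target)` means that `source` follows `target`, and the function name `degree_counts` are assumptions made for illustration only.

```python
def degree_counts(ego, edges):
    """Count followers, followees, and reciprocal ties of `ego`.

    `edges` is an iterable of (source, target) pairs, read as
    "source follows target" (an assumed representation).
    """
    edges = list(edges)  # allow generators, since we scan twice
    followees = {t for s, t in edges if s == ego}
    followers = {s for s, t in edges if t == ego}
    return {
        "followers": len(followers),
        "followees": len(followees),
        # A tie is reciprocal when the same alter appears in both sets.
        "reciprocal": len(followers & followees),
    }


# Made-up ego network: "a" is mutual, "b" is only followed, "c" only follows.
ties = [("ego", "a"), ("a", "ego"), ("ego", "b"), ("c", "ego")]
print(degree_counts("ego", ties))
```

Follower and followee counts can be read directly from a profile, whereas the reciprocal count needs the identities of the alters on both sides, which is why the relationship data are required.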

Figure B | Degree distribution in the follower-followee network excluding users with zero posts.
6) We calculated the local clustering coefficient for each ego using the ego-alter and alter-alter relationships. In other words, we calculated this coefficient in each 1.5 ego network separately (N = 6,415 active egos). The average clustering coefficient is the mean of the 6,415 local clustering coefficients. Please note that the 1.5 ego network contains only the full triangles of the ego node. Thus, calculating the alters' clustering coefficients is meaningless. The mutual graph is the 1.5 ego network excluding non-reciprocal ties and the associated nodes.
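The per-ego calculation described in item 6 can be sketched as below. This is a simplified illustration, not the authors' code: it treats ties as undirected (direction is collapsed), assigns a coefficient of 0 to egos with fewer than two alters as a convention, and uses hypothetical function names.

```python
def local_clustering(alters, alter_ties):
    """Local clustering coefficient of an ego in its 1.5 ego network.

    `alters` are the ego's neighbours; `alter_ties` are (u, v) pairs
    among alters. Ties are treated as undirected (an assumption).
    """
    alters = set(alters)
    k = len(alters)
    if k < 2:
        return 0.0  # convention: no pair of alters, so no possible triangle
    # Keep each alter-alter tie once, ignoring direction and self-loops.
    ties = {
        frozenset(pair)
        for pair in alter_ties
        if len(set(pair)) == 2 and set(pair) <= alters
    }
    # Share of alter pairs that are themselves connected.
    return 2 * len(ties) / (k * (k - 1))


def average_clustering(ego_networks):
    """Mean of the local coefficients over (alters, alter_ties) pairs."""
    values = [local_clustering(a, t) for a, t in ego_networks]
    return sum(values) / len(values)


# Made-up example: three alters, one connected pair out of three possible.
print(local_clustering({"a", "b", "c"}, [("a", "b"), ("b", "a")]))
```

Each connected pair of alters closes one triangle through the ego, so the coefficient is the fraction of possible ego-centred triangles that are actually closed. Restricting the input to reciprocal ties would give the corresponding value for the mutual graph.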
7) Fig. 5A shows that the estimated limit is around 87, which is much smaller than the 100-200 estimated in Gonçalves et al. [1]. That earlier study collected data from users who were active in 2009. Active users in different periods may behave quite differently in social interactions. Fig. S3 shows that this cohort effect indeed exists.
Users registered before 2009 have a higher limit than their counterparts. This demonstrates that previous studies may reflect the behavior of only a sub-population of Twitter users, whereas our study reflects the average.
Second, we used generalized linear models because the dependent variables are a binary response (retweetability) and count data (retweet count). Therefore, the link function for retweetability is the logit (for the binomial distribution), while the link function for the count is the logarithm (for the Poisson distribution). In both models, we included only a random intercept. The fixed effects are interpreted analogously to those in logistic regression and Poisson regression, respectively.
9) Overall, according to the Z-scores in Table 1, post-level variables are more powerful in predicting retweeting behavior. Another observation is that retweetability is much more predictable than frequency using our variables.
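The two random-intercept models described above can be written out as follows. This is a sketch of the general model form only. The notation is ours, and we assume that posts i are nested within users j, which the text suggests but does not state explicitly; x_ij collects the predictors.

```latex
\begin{align*}
\text{Retweetability:}\quad
  & y_{ij} \mid u_j \sim \mathrm{Bernoulli}(p_{ij}),
  & \log\frac{p_{ij}}{1 - p_{ij}} &= \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j, \\
\text{Retweet count:}\quad
  & c_{ij} \mid v_j \sim \mathrm{Poisson}(\mu_{ij}),
  & \log \mu_{ij} &= \gamma_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\gamma} + v_j, \\
  & u_j \sim \mathcal{N}(0, \sigma_u^2),
  & v_j &\sim \mathcal{N}(0, \sigma_v^2).
\end{align*}
```

Under this form, the exponentiated coefficient exp(β_k) is an odds ratio and exp(γ_k) is a rate ratio, each conditional on the group-specific intercept, which is the sense in which the fixed effects read like those of ordinary logistic and Poisson regression.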
10) The exposure hypothesis focuses on the probability of egos retweeting their alters' tweets. Therefore, we selected the users who have retweeted at least once (7,226 egos). The official RT, rather than hashtags or URLs, was used to identify retweets.