Table 1.
Number of tweets for the period May 2012 to Nov 2013.
Fig 1.
Formation of cascades in the Twitter follower network.
At time t, node 1 posts a tweet. Nodes 2 and 4 post at times t2 and t4, both between t and t′ = t + D. Node 5, which follows node 2, posts at some time t5 between t′ and t″ = t′ + D. The resulting cascade is therefore C(1, t, D) = {(1, t), (2, t2), (4, t4), (5, t5)}.
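The construction in this caption can be sketched in a few lines of Python. This is a minimal illustration of one reading of the caption, not the authors' implementation: it assumes that followers reached at hop k must post inside the k-th window of length D after the root tweet. The function name `build_cascade` and the `followers` and `posts` dictionaries are hypothetical.

```python
def build_cascade(root, t, D, followers, posts):
    """Collect (user, time) pairs for the cascade C(root, t, D).

    followers: dict mapping a user to the users who follow them (assumed input).
    posts:     dict mapping a user to a list of their posting times (assumed input).
    Assumption: hop-k followers must post in the window (t + (k-1)*D, t + k*D].
    """
    cascade = [(root, t)]
    seen = {root}
    frontier = [root]
    window_start = t
    while frontier:
        window_end = window_start + D
        next_frontier = []
        for user in frontier:
            for follower in followers.get(user, ()):
                if follower in seen:
                    continue
                # Earliest post by this follower inside the current window, if any.
                in_window = [s for s in posts.get(follower, ())
                             if window_start < s <= window_end]
                if in_window:
                    seen.add(follower)
                    cascade.append((follower, min(in_window)))
                    next_frontier.append(follower)
        frontier = next_frontier
        window_start = window_end
    return cascade


# Toy data mirroring the caption: 2, 3 and 4 follow 1; 5 follows 2.
followers = {1: [2, 3, 4], 2: [5]}
posts = {2: [3], 3: [25], 4: [7], 5: [14]}
example = build_cascade(root=1, t=0, D=10, followers=followers, posts=posts)
print(example)
```

In the toy data, node 3 posts only after the first window has closed, so it is left out of the cascade, as in the figure.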
Fig 2.
Follower cascade size as a function of tweeting rate for ten follower cascades in Mexico (produced between June 27, 2012 and September 7, 2012), with synthetic traffic.
As the tweet rate of the users increases, we observe a sudden transition from a regime of very low user participation to a higher-activity regime.
Fig 3.
Node and shell removal heuristics for CSSP (Brazil).
Here, we see the size of the largest remaining sub-cascade, in number of tweets (normalized by the original size), as a function of the number of nodes remaining in the cascade graph (normalized by the original number of nodes). This cascade occurred in June 2013, and its original size was 15,791 tweets.
Fig 4.
Node and shell removal heuristics for CSSP (Venezuela).
Here, we see the size of the largest remaining sub-cascade, in number of tweets (normalized by the original size), as a function of the number of nodes remaining in the cascade graph (normalized by the original number of nodes). This cascade occurred in April 2013, and its original size was 226,179 tweets.
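The quantity plotted in the two node-removal figures can be computed as sketched below. This is only an illustration under stated assumptions: the cascade graph is treated as undirected, a "sub-cascade" is taken to be a connected component, and the removal order is supplied by the caller (the captions do not specify the heuristics themselves). The names `largest_component_tweets` and `removal_curve` are hypothetical.

```python
def largest_component_tweets(edges, tweets, removed):
    """Tweet count of the largest connected component once `removed` nodes are gone."""
    adj = {}
    for u, v in edges:
        if u in removed or v in removed:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, best = set(), 0
    for start in tweets:
        if start in removed or start in seen:
            continue
        # Depth-first traversal summing tweets over one component.
        seen.add(start)
        stack, total = [start], 0
        while stack:
            node = stack.pop()
            total += tweets[node]
            for nbr in adj.get(node, ()):
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        best = max(best, total)
    return best


def removal_curve(edges, tweets, order):
    """Return (fraction of nodes remaining, normalized largest sub-cascade size) pairs."""
    n_nodes = len(tweets)
    n_tweets = sum(tweets.values())
    removed = set()
    curve = [(1.0, largest_component_tweets(edges, tweets, removed) / n_tweets)]
    for node in order:
        removed.add(node)
        remaining = (n_nodes - len(removed)) / n_nodes
        size = largest_component_tweets(edges, tweets, removed) / n_tweets
        curve.append((remaining, size))
    return curve


# Toy star-shaped cascade: node 0 is the hub, with per-node tweet counts.
edges = [(0, 1), (0, 2), (0, 3)]
tweets = {0: 4, 1: 2, 2: 1, 3: 1}
curve = removal_curve(edges, tweets, order=[0])
print(curve)
```

Removing the hub of the toy star leaves three isolated nodes, so the largest remaining sub-cascade shrinks to a quarter of the original tweets while three quarters of the nodes remain.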
Fig 5.
The (normalized) maximum cascade size vs. the fraction of users selected for some of the largest cascades in different countries. Data are: blue (Mexico, Δ = 1 hour); light blue (Brazil, Δ = 4 hours); and orange (Venezuela, Δ = 4 hours).
Fig 6.
The (normalized) maximum number of unique users vs. the fraction of users selected for some of the largest cascades in different countries. Data are: blue (Mexico, Δ = 1 hour); light blue (Brazil, Δ = 4 hours); and orange (Venezuela, Δ = 4 hours).
Fig 7.
Descriptive statistics of selected features (Brazil) for the MRT and F models.
Each name in the first column combines a structural feature (cascade size, duration, or slope, where slope is the incremental increase in size per day) with the statistical operation applied to it (e.g., median, average).
Fig 8.
Variables selected by LASSO in the cascade model for Brazil, for a training period of November 2012 through May 2013.
Fig 9.
Performance of the predictive models.
We show the performance of the three models in terms of accuracy, Brier score, and area under the ROC curve. The cascade model has the best performance across the different countries.
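For readers unfamiliar with the three metrics, the sketch below computes them with their standard definitions for binary labels and predicted probabilities. It is not the authors' evaluation code; the 0.5 decision threshold used for accuracy and the toy labels and probabilities are assumptions made for illustration.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of cases where the thresholded prediction matches the label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative (ties count half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = protest day, 0 = no protest (made-up values).
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.6]
print(accuracy(y_true, y_prob), brier_score(y_true, y_prob), roc_auc(y_true, y_prob))
```

Accuracy depends on the chosen threshold, whereas the Brier score and the area under the ROC curve are computed from the probabilities directly, which is one reason to report all three.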
Fig 10.
ROC curves for the volume-based model.
We show the ROC curves for Mexico, Brazil, and Venezuela. Training period November 1, 2012 to November 9, 2013; test period November 10, 2013 to November 30, 2013.
Fig 11.
ROC curves for the baseline model.
We show the ROC curves for Mexico, Brazil, and Venezuela. Training period November 1, 2012 to November 9, 2013; test period November 10, 2013 to November 30, 2013.
Fig 12.
ROC curves for different countries.
ROC curves for Mexico, Brazil and Venezuela for the cascade model. Training period November 1, 2012 to November 9, 2013; test period November 10, 2013 to November 30, 2013.
Fig 13.
ROC curves for different models for Brazil, with a training period of November 2012 through May 2013 and a testing period of June 1-30, 2013.
Fig 14.
Cascade properties as predictors of protest.
Cascade size, number of users, and number of cascades for Follower and MRT cascades in Brazil for the period November 2012 to June 2013.