Forecasting Social Unrest Using Activity Cascades

doi:10.1371/journal.pone.0128879

Table 1.

Number of tweets for the period May 2012 to Nov 2013.

Fig 1.

Formation of cascades in the Twitter follower network.

At time t, node 1 posts a tweet. Nodes 2 and 4, which follow node 1, post at times t₂ and t₄ between t and t′ = t + D. Node 5, which follows node 2, posts at some time t₅ between t′ and t″ = t′ + D. Therefore, C(1, t, D) = {(1, t), (2, t₂), (4, t₄), (5, t₅)}.
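The cascade definition in this caption can be made concrete with a short sketch. It assumes, as one reading of the caption, that a follower v joins the cascade when it posts within D after the post of a cascade member it follows, and that each user joins once, at the earliest such time. `follower_cascade`, `followers`, and `tweets` are hypothetical names, not the authors' code.

```python
import bisect
import heapq


def follower_cascade(root, t0, D, followers, tweets):
    """Sketch of the follower cascade C(root, t0, D).

    followers: dict mapping a user to the users who follow that user.
    tweets:    dict mapping a user to a sorted list of post times.
    Assumption: follower v joins when it posts in (t_u, t_u + D] after a
    member u it follows; each user joins once, at its earliest such time.
    """
    joined = {root: t0}
    heap = [(t0, root)]  # process members in order of their post time
    while heap:
        t_u, u = heapq.heappop(heap)
        if joined[u] < t_u:
            continue  # stale entry: u already joined at an earlier time
        for v in followers.get(u, ()):
            times = tweets.get(v, [])
            i = bisect.bisect_right(times, t_u)  # first post strictly after t_u
            if i < len(times) and times[i] <= t_u + D:
                t_v = times[i]
                if v not in joined or t_v < joined[v]:
                    joined[v] = t_v
                    heapq.heappush(heap, (t_v, v))
    return sorted(joined.items(), key=lambda pair: pair[1])


# Toy example mirroring the caption: 2, 3 and 4 follow 1; 5 follows 2.
followers = {1: [2, 3, 4], 2: [5]}
tweets = {2: [3.0], 3: [25.0], 4: [7.0], 5: [12.0]}
print(follower_cascade(1, 0.0, 10.0, followers, tweets))
```

With D = 10, node 3 posts too late to join, while node 5 joins at t₅ = 12, inside the second window [10, 20], matching the caption's example.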

Fig 2.

Follower cascade size as a function of tweeting rate for ten follower cascades in Mexico (produced between June 27, 2012 and September 7, 2012), with synthetic traffic.

As the tweet rate of the users increases, we observe a sudden transition from a regime of very low user participation to a higher-activity regime.
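The transition can be explored with synthetic traffic in a toy setting. This sketch is not the paper's setup: it assumes a random follower graph in which every user has `k` followers, Poisson posting at a common `rate`, and an earliest-post cascade rule as one reading of Fig 1. All names and parameter values (`n`, `k`, `D`, `horizon`, `seed`) are hypothetical, so where the jump appears here says nothing about the Mexico cascades.

```python
import bisect
import heapq
import random


def synthetic_network(n, k, rng):
    """Hypothetical follower graph: each user gets k distinct random followers."""
    return {u: rng.sample([v for v in range(n) if v != u], k) for u in range(n)}


def poisson_times(rate, horizon, rng):
    """Post times of a Poisson process with the given rate on [0, horizon]."""
    times, t = [], 0.0
    if rate <= 0:
        return times
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return times
        times.append(t)


def cascade_size(root, D, followers, tweets):
    """Number of users in the follower cascade seeded by root at time 0."""
    joined = {root: 0.0}
    heap = [(0.0, root)]
    while heap:
        t_u, u = heapq.heappop(heap)
        if joined[u] < t_u:
            continue  # stale heap entry
        for v in followers[u]:
            times = tweets[v]
            i = bisect.bisect_right(times, t_u)  # first post after t_u
            if (i < len(times) and times[i] <= t_u + D
                    and times[i] < joined.get(v, float("inf"))):
                joined[v] = times[i]
                heapq.heappush(heap, (times[i], v))
    return len(joined)


def sweep(rates, n=200, k=5, D=1.0, horizon=20.0, seed=0):
    """Cascade size from user 0 for each synthetic posting rate."""
    rng = random.Random(seed)
    followers = synthetic_network(n, k, rng)
    sizes = {}
    for rate in rates:
        tweets = {u: poisson_times(rate, horizon, rng) for u in range(n)}
        sizes[rate] = cascade_size(0, D, followers, tweets)
    return sizes


print(sweep([0.0, 0.05, 0.1, 0.2, 0.5, 1.0, 5.0]))
```

At low rates followers rarely post inside the window D, so cascades stay near size 1; once each follower is likely to post within D, the cascade spreads over most of the graph.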

Fig 3.

Node and shell removal heuristics for CSSP (Brazil).

Here, we see the largest remaining sub-cascade size, in number of tweets (normalized by the original size), as a function of the number of remaining nodes in the cascade graph (normalized by the original number of nodes). This cascade occurred in June 2013, and its original size is 15,791 tweets.
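A node-removal curve of this kind can be sketched as follows. The caption does not say which node-level rule was used, so this assumes adaptive highest-degree removal: repeatedly delete the remaining user with the most remaining neighbours in the undirected cascade graph, then record the fraction of nodes left and the largest connected sub-cascade, weighted by tweets and normalized by the original size. `node_removal_curve` and its inputs are hypothetical names.

```python
from collections import defaultdict


def largest_component_weight(adj, weight, alive):
    """Total tweet weight of the heaviest connected component among alive nodes."""
    seen, best = set(), 0
    for s in alive:
        if s in seen:
            continue
        seen.add(s)
        stack, total = [s], 0
        while stack:
            u = stack.pop()
            total += weight[u]
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, total)
    return best


def node_removal_curve(edges, weight):
    """List of (fraction of nodes left, normalized largest sub-cascade size)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    alive = set(weight)
    n = len(alive)
    original = largest_component_weight(adj, weight, alive)
    curve = [(1.0, 1.0)]
    while alive:
        # adaptive highest-degree removal; ties go to the smallest node id
        target = max(sorted(alive), key=lambda u: len(adj[u] & alive))
        alive.remove(target)
        curve.append((len(alive) / n,
                      largest_component_weight(adj, weight, alive) / original))
    return curve


# Star-shaped toy cascade: hub 0 linked to users 1-4, one tweet each.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(node_removal_curve(star, {u: 1 for u in range(5)})[:2])
```

Removing the hub alone leaves 80% of the nodes but shatters the cascade into single-tweet pieces, which is the kind of steep drop such curves are meant to reveal.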

Fig 4.

Node and shell removal heuristics for CSSP (Venezuela).

Here, we see the largest remaining sub-cascade size, in number of tweets (normalized by the original size), as a function of the number of remaining nodes in the cascade graph (normalized by the original number of nodes). This cascade occurred in April 2013, and its original size is 226,179 tweets.
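Shell removal can be sketched similarly, assuming that "shells" are k-shells from a k-core decomposition and that whole shells are deleted from the innermost (highest k) outward. This is an assumption about the heuristic, not a description of the authors' code; `core_numbers` and `shell_removal_curve` are illustrative names.

```python
def core_numbers(adj):
    """k-shell index of every node, found by repeatedly peeling low-degree nodes."""
    deg = {u: len(adj[u]) for u in adj}
    remaining, core, k = set(adj), {}, 0
    while remaining:
        k = max(k, min(deg[u] for u in remaining))
        stack = [u for u in remaining if deg[u] <= k]
        while stack:
            u = stack.pop()
            if u not in remaining:
                continue  # already peeled via another neighbour
            remaining.remove(u)
            core[u] = k
            for v in adj[u]:
                if v in remaining:
                    deg[v] -= 1
                    if deg[v] <= k:
                        stack.append(v)
    return core


def largest_component_weight(adj, weight, alive):
    """Total tweet weight of the heaviest connected component among alive nodes."""
    seen, best = set(), 0
    for s in alive:
        if s in seen:
            continue
        seen.add(s)
        stack, total = [s], 0
        while stack:
            u = stack.pop()
            total += weight[u]
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, total)
    return best


def shell_removal_curve(edges, weight):
    """Remove whole k-shells, innermost first, tracking the largest sub-cascade."""
    adj = {u: set() for u in weight}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    core = core_numbers(adj)
    alive = set(weight)
    n = len(alive)
    original = largest_component_weight(adj, weight, alive)
    curve = [(1.0, 1.0)]
    for k in sorted(set(core.values()), reverse=True):
        alive -= {u for u in alive if core[u] == k}  # drop the whole k-shell
        curve.append((len(alive) / n,
                      largest_component_weight(adj, weight, alive) / original))
    return curve


# Toy cascade: triangle 0-1-2 (the 2-core) plus user 3 hanging off user 0.
edges = [(0, 1), (1, 2), (2, 0), (0, 3)]
print(shell_removal_curve(edges, {u: 1 for u in range(4)}))
```

Because an entire shell is removed per step, this curve has far fewer points than node-by-node removal.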

Fig 5.

Greedy heuristic for CSFP.

The (normalized) maximum cascade size vs. the fraction of users selected for some of the largest cascades in different countries. Data are: blue (Mexico, Δ = 1 hour); light blue (Brazil, Δ = 4 hours); and orange (Venezuela, Δ = 4 hours).

Fig 6.

Greedy heuristic for CSFP.

The (normalized) maximum number of unique users vs. the fraction of users selected for some of the largest cascades in different countries. Data are: blue (Mexico, Δ = 1 hour); light blue (Brazil, Δ = 4 hours); and orange (Venezuela, Δ = 4 hours).
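Figs 5 and 6 plot greedy selection curves. The captions do not define the CSFP objective, so this sketch assumes it is the size of the largest connected sub-cascade induced by the selected users, and at each step adds the user that increases that size the most. Weighting users by their tweet counts gives a curve like Fig 5; giving every user weight 1 counts unique users, as in Fig 6. `greedy_selection` and `largest_component` are hypothetical names.

```python
def largest_component(adj, weight, chosen):
    """Heaviest connected component of the subgraph induced by chosen users."""
    seen, best = set(), 0
    for s in chosen:
        if s in seen:
            continue
        seen.add(s)
        stack, total = [s], 0
        while stack:
            u = stack.pop()
            total += weight[u]
            for v in adj[u]:
                if v in chosen and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, total)
    return best


def greedy_selection(edges, weight, budget):
    """Greedily pick users; return pick order and (fraction picked, normalized size)."""
    adj = {u: set() for u in weight}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    full = largest_component(adj, weight, set(weight))
    chosen, order, curve = set(), [], []
    for _ in range(min(budget, len(weight))):
        candidates = sorted(set(weight) - chosen)  # ties go to the smallest id
        best = max(candidates,
                   key=lambda u: largest_component(adj, weight, chosen | {u}))
        chosen.add(best)
        order.append(best)
        curve.append((len(chosen) / len(weight),
                      largest_component(adj, weight, chosen) / full))
    return order, curve


path = [(0, 1), (1, 2)]
print(greedy_selection(path, {0: 1, 1: 5, 2: 3}, 3))  # weighted by tweets
print(greedy_selection(path, {0: 1, 1: 1, 2: 1}, 3))  # unique users
```

Each step re-evaluates every remaining candidate, so this sketch is quadratic in the number of users per step and is only meant for small graphs.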

Fig 7.

Descriptive statistics of selected features (Brazil) for the MRT and F models.

The names in the first column combine the name of the structural feature (i.e., cascade size, duration, or slope, which is the incremental increase in size per day) with the statistical operation applied to it (e.g., median or average).
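The feature naming (structural feature plus statistic) can be illustrated with a short sketch. It assumes each cascade is a list of tweet times in days, takes size as the tweet count and duration as the time from first to last tweet, and, as a hypothetical stand-in for the paper's slope, uses tweets per day with lifetimes shorter than a day treated as one day.

```python
from statistics import mean, median


def cascade_features(cascades):
    """Median and mean of size, duration and slope over a set of cascades.

    cascades: list of cascades, each a list of tweet times in days.
    Hypothetical slope: tweets per day, treating lifetimes under a day as one day.
    """
    columns = {"size": [], "duration": [], "slope": []}
    for times in cascades:
        size = len(times)
        duration = max(times) - min(times)
        columns["size"].append(size)
        columns["duration"].append(duration)
        columns["slope"].append(size / max(duration, 1.0))
    features = {}
    for name, values in columns.items():
        features[f"{name}_median"] = median(values)
        features[f"{name}_mean"] = mean(values)
    return features


print(cascade_features([[0, 1, 2, 4], [0, 0.5], [3]]))
```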

Fig 8.

LASSO Variables.

Variables selected by LASSO in the cascade model for Brazil, for a training period of November 2012 through May 2013.
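LASSO selects variables by driving some coefficients exactly to zero. This sketch shows that mechanism with a squared-error LASSO fitted by cyclic coordinate descent in plain Python. The caption does not state whether a squared-error or logistic LASSO was used, or with what penalty, so `lasso_cd`, `alpha`, and the toy data are assumptions. Columns are centered but not rescaled here.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values inside [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_cd(X, y, alpha, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + alpha * ||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    # center y and every column so no intercept term is needed
    y_mean = sum(y) / n
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual y - Xb, starting from b = 0
    z = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0:
                continue  # constant column carries no information
            # correlation of feature j with the residual that excludes feature j
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * b[j]) for i in range(n)) / n
            new = soft_threshold(rho, alpha) / z[j]
            if new != b[j]:
                delta = new - b[j]
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                b[j] = new
    return b


names = ["size_median", "noise"]  # hypothetical feature names
X = [[0, 1], [1, -1], [2, -1], [3, 1]]
y = [0, 2, 4, 6]  # depends only on the first column
coef = lasso_cd(X, y, alpha=0.1)
print([name for name, c in zip(names, coef) if c != 0.0])
```

The irrelevant column's coefficient is held at exactly zero by the soft threshold, so only the first feature is "selected".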

Fig 9.

Performance of the predictive models.

We show the performance of the three models in terms of accuracy, Brier score, and area under the ROC curve. The cascade model performs best across the different countries.
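Accuracy and the Brier score can be computed directly from predicted protest probabilities. A minimal sketch, assuming binary labels (1 = protest) and a hypothetical 0.5 decision threshold for accuracy. The Brier score is the mean squared error of the probabilities, so lower is better.

```python
def accuracy(labels, probs, threshold=0.5):
    """Fraction of days whose thresholded prediction matches the label."""
    hits = sum((p >= threshold) == (y == 1) for y, p in zip(labels, probs))
    return hits / len(labels)


def brier_score(labels, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.6]
print(accuracy(labels, probs), brier_score(labels, probs))
```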

Fig 10.

ROC curves for the volume-based model.

We show the ROC curves for Mexico, Brazil, and Venezuela. Training period November 1, 2012 to November 9, 2013; test period November 10, 2013 to November 30, 2013.
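An ROC curve traces the true positive rate against the false positive rate as the decision threshold sweeps over a model's scores, and its area can be approximated with the trapezoidal rule. This sketch assumes binary labels with at least one positive and one negative; `roc_curve` and `trapezoid_auc` are illustrative names, not a particular library's API.

```python
def roc_curve(labels, scores):
    """(FPR, TPR) points from the strictest threshold to the loosest."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # samples tied at this score cross the threshold together
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear curve through the given points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


pts = roc_curve([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6])
print(pts, trapezoid_auc(pts))
```

An area of 0.5 corresponds to a diagonal curve (no better than chance), and 1.0 to a perfect ranking of protest days above non-protest days.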

Fig 11.

ROC curves for the baseline model.

We show the ROC curves for Mexico, Brazil, and Venezuela. Training period November 1, 2012 to November 9, 2013; test period November 10, 2013 to November 30, 2013.

Fig 12.

ROC curves for different countries.

ROC curves for Mexico, Brazil and Venezuela for the cascade model. Training period November 1, 2012 to November 9, 2013; test period November 10, 2013 to November 30, 2013.

Fig 13.

ROC curve for Brazil.

ROC curves of the different models for Brazil, for a training period of November 2012 through May 2013 and a testing period of June 1–30, 2013.
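The evaluation here uses a chronological split: models are fitted on an earlier window and tested on a later, non-overlapping one. A minimal sketch, assuming records are `(day, features, label)` tuples; the helper name and the toy records are hypothetical.

```python
from datetime import date


def chronological_split(records, train_start, train_end, test_start, test_end):
    """Split (day, features, label) records into train and test windows by date."""
    if not train_end < test_start:
        raise ValueError("training window must end before the test window starts")
    train = [r for r in records if train_start <= r[0] <= train_end]
    test = [r for r in records if test_start <= r[0] <= test_end]
    return train, test


records = [
    (date(2012, 11, 5), [0.1], 0),
    (date(2013, 5, 31), [0.4], 1),
    (date(2013, 6, 17), [0.9], 1),
    (date(2013, 7, 2), [0.2], 0),  # outside both windows
]
train, test = chronological_split(records, date(2012, 11, 1), date(2013, 5, 31),
                                  date(2013, 6, 1), date(2013, 6, 30))
print(len(train), len(test))
```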

Fig 14.

Cascade properties as predictors of protest.

Cascade size, number of users, and number of cascades for Follower and MRT cascades in Brazil for the period November 2012 to June 2013.
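The three properties shown here can be aggregated per day roughly as follows. The sketch assumes each cascade is a list of `(user, day)` tweets and, as a hypothetical convention, assigns a cascade to the day of its first tweet.

```python
from collections import defaultdict
from datetime import date


def daily_cascade_properties(cascades):
    """Per start day: number of cascades, total size in tweets, unique users."""
    stats = defaultdict(lambda: {"cascades": 0, "size": 0, "users": set()})
    for tweets in cascades:
        day = min(d for _, d in tweets)  # assumption: count toward the first day
        s = stats[day]
        s["cascades"] += 1
        s["size"] += len(tweets)
        s["users"].update(u for u, _ in tweets)
    return {day: {"cascades": s["cascades"], "size": s["size"],
                  "users": len(s["users"])}
            for day, s in sorted(stats.items())}


d1, d2 = date(2013, 6, 17), date(2013, 6, 18)
cascades = [
    [("a", d1), ("b", d1)],
    [("a", d1), ("c", d2)],  # starts on d1, continues into d2
    [("d", d2)],
]
print(daily_cascade_properties(cascades))
```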
