Hurricanes and hashtags: Characterizing online collective attention for natural disasters

We study collective attention paid towards hurricanes through the lens of n-grams on Twitter, a social media platform with global reach. Using hurricane name mentions as a proxy for awareness, we find that the exogenous temporal dynamics are remarkably similar across storms, but that overall collective attention varies widely even among storms causing comparable deaths and damage. We construct ‘hurricane attention maps’ and observe that hurricanes causing deaths on (or economic damage to) the continental United States generate substantially more attention in English-language tweets than those that do not. We find that a hurricane’s Saffir-Simpson wind scale category assignment is strongly associated with the amount of attention it receives. Higher category storms receive higher proportional increases of attention per proportional increase in number of deaths or dollars of damage than lower category storms. The most damaging and deadly storms of the 2010s, Hurricanes Harvey and Maria, generated the most attention and were remembered the longest, respectively. On average, a category 5 storm receives 4.6 times more attention than a category 1 storm causing the same number of deaths and economic damage.

are all associated with the 2017 storm Hurricane Harvey. Some hashtags are even associated with multiple storms, such as the relief-focused "#hr4hr" or the general "#hurricane". Using our n-gram usage rate dataset, it is not possible to correctly attribute the attention share of general hashtags to a particular storm. However, concerns remain that our single hashtag could be too restrictive and miss a large amount of attention.
To confirm the validity of our chosen hashtag, we constructed a more comprehensive measure of hurricane attention. First, we searched a 0.1% subsample of tweets for those containing the 2-gram "hurricane *" for each hurricane in the study period, within the 100-day period after each storm's formation date. Within these matching tweets, we counted every hashtag. The majority of these co-occurring hashtags are not specific to the storm, such as "#news". We also removed hashtags matching the pattern "#hurricane*" to avoid biasing this alternate measurement, which is meant to confirm our initial choice. To help identify closely related hashtags, we plotted the usage rate of up to the top 20 most frequently co-occurring hashtags. Hashtags with attention spikes around the storm's dates of activity and which were related to the storm in question were added to a list of relevant hashtags. This list will miss hashtags with such low usage rates that they do not appear in our 0.1% sample, but these hashtags should not add considerably to our aggregate measures of attention.
For each storm we compute two measurements of attention:
• Summed Related Hashtags Usage Rate: the sum of the usage rates of all associated hashtags for 28 days after hurricane formation.
• Maximum Related Hashtag Usage Rate: the sum of the usage rate of the most used associated hashtag for 28 days.
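The two measurements above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function name, the input layout (a dictionary of daily usage rates per hashtag, with day 0 assumed to be the formation date), and the toy hashtags and numbers are all assumptions made for the example.

```python
import numpy as np

def related_hashtag_attention(usage_rates, window_days=28):
    """Return (summed, maximum, top_tag) for one storm.

    usage_rates: dict mapping hashtag -> sequence of daily usage rates,
    where index 0 is assumed to be the storm's formation date
    (hypothetical input layout, not from the paper).
    """
    # Sum each hashtag's usage rate over the first `window_days` days.
    totals = {
        tag: float(np.sum(np.asarray(rates, dtype=float)[:window_days]))
        for tag, rates in usage_rates.items()
    }
    summed = sum(totals.values())          # Summed Related Hashtags Usage Rate
    top_tag = max(totals, key=totals.get)  # most used associated hashtag
    maximum = totals[top_tag]              # Maximum Related Hashtag Usage Rate
    return summed, maximum, top_tag

# Toy data with made-up hashtags and rates, for illustration only.
example = {
    "#tag_a": [1e-4, 3e-4, 2e-4] + [1e-5] * 30,
    "#tag_b": [0.0, 5e-5, 4e-5] + [2e-6] * 30,
}
summed, maximum, top_tag = related_hashtag_attention(example)
```

By construction the maximum measure never exceeds the summed measure, so comparing the two indicates how much attention a single-hashtag proxy leaves out.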
The result is shown in S1 Fig. The most obvious outlier, Hurricane Bill from 2009, was a curiosity in the meteorological community as a powerful Category 4 storm, but caused relatively little destruction, remaining at sea. Clearly, the measures are highly correlated. We also calculated Spearman's rho to find the rank-order correlation between our Integrated Usage Rate, from the main study, and the Summed Related Hashtags Usage Rate. We found ρ = 0.92, confirming this strong association between different measures. This increases our confidence that our Integrated

Summary Tables for Regressions
Provided for the reader here are tables of summary statistics of the estimated parameters in the regression models in subsections Attention and Impact Regressions by Category and Regression Models for Impacts, Impact Interactions and Hurricane Category.
Examining the top 2-grams matching the pattern "hurricane *" in S3 Fig, we can get a sense of which storms dominate the season, and how much attention is allocated to each at a given time. For English tweets, the first major spike of the 2017 hurricane season surrounds Hurricane Harvey, though attention also spikes for Hurricane Katrina, in reference to the 2005 storm that affected a nearby region of the Gulf Coast. As attention begins to decay for Hurricane Harvey, a spike in usage for the 2-gram "hurricane relief" is observed, though it reaches only $f = 3 \times 10^{-5}$. Next, attention turns to Hurricane Irma, which reaches the highest 2-gram usage rate of any hurricane in our dataset. Finally, one week after attention for Irma begins to decay, attention spikes for Hurricane Maria, though at a level noticeably lower than for Harvey or Irma.
We notice that during storm events the usage rate of the 2-grams matching "hurricane *" for a given storm is often between one fifth and one half of the usage rate of the 1-gram "hurricane", meaning that in English tweets during active storms the name of the storm follows the word "hurricane" at least about one in every five times.
In Spanish tweets the usage rate of "huracán harvey" only reaches a maximum of around $f \sim 10^{-4}$, while "huracán irma" receives much more relative attention. "huracán maría" receives about as much attention as Harvey, and also occupies a space similar to "hurricane maria" in English, around $f \sim 10^{-4}$. We can see that the usage rate of "hurricane maria" captures a similar share of the total attention for the 1-gram "hurricane" as "huracán maría" captures. Additionally, Hurricane Harvey's 2-gram usage rate is lower in Spanish than in English, while Hurricane Katrina is talked about considerably in English but does not rise above the 50000th most used 2-gram in Spanish. As always, usage rates are case-insensitive.

Table. The unnormalized values associated with radar plots in Results. Storms are colored by the maximum hurricane category from red as Category 5 to yellow as Category 1. As in the radar plots, storms are ordered by damage.

Bi-exponential Decays
To quantify the characteristic time scales of attention given to storms, we examined usage rates by fitting the bi-exponential model introduced by Candia et al. [33]. Not all storms receive enough attention, but 50 of 75 in the Atlantic basin recorded at least 6 days of consecutive 2-gram usage within the year of the hurricane, and these storms had both their hashtag and 2-gram usage rates fit with the bi-exponential model of Candia et al. The model assumes two populations, $u$ and $v$, which become interested in a given event. Population $u$, comparable to the general population, starts with a peak interest and loses attention as

$$\frac{du}{dt} = -(p + r)u.$$

During every unit of time, $pu$ attention is lost from the system and $ru$ is transferred to population $v$. The dynamics of population $v$ are as follows:

$$\frac{dv}{dt} = ru - qv,$$

so attention decays from $v$ with rate $q$, but increases proportionally to the total attention of population $u$. With initial conditions $u(0) = N$ and $v(0) = 0$, the final bi-exponential model is

$$S(t) = u(t) + v(t) = \frac{N}{p + r - q}\left[(p - q)e^{-(p + r)t} + r e^{-qt}\right],$$

and we present the half-lives associated with this model as

$$\tau_1 = \frac{\ln 2}{p + r} \quad \text{and} \quad \tau_2 = \frac{\ln 2}{q},$$

which correspond to the rates of decay from the two populations $u$ and $v$. The distributions of $\tau_1$ and $\tau_2$ for both hashtag usage rates and 2-gram usage rates are shown in S4 Fig. The mean half-life for population $u$, the population with faster attention decay, is $\bar{\tau}_1 = 1.3$ days for hashtags and $\bar{\tau}_1 = 1.1$ days for 2-grams. The decays for population $v$ were not unimodal: some storms regained attention long after their initial impact, deviating from the model, receiving poor fits, and producing very large values of $\tau_2$. Median values of $\tau_2$ were approximately 24 days. All summary statistics are reported in S9 and S10 Tables. We speculate that for this model the population $u$ is largely people affected by the storm, while population $v$ is largely people writing about the storms or sharing information about the storm response, e.g., reporters and non-profit professionals. Further work could look to confirm who is behind the tweets.
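As a sanity check on the model described above, the following sketch compares the closed-form solution of the two rate equations with a direct numerical integration. It assumes the initial conditions u(0) = N and v(0) = 0; the function names and the parameter values are illustrative choices for this example, not fitted values from the study.

```python
import numpy as np

def biexp(t, N, p, r, q):
    """Closed-form total attention S(t) = u(t) + v(t), assuming u(0)=N, v(0)=0."""
    return N / (p + r - q) * ((p - q) * np.exp(-(p + r) * t) + r * np.exp(-q * t))

def half_lives(p, r, q):
    """Half-lives tau_1 = ln 2 / (p + r) and tau_2 = ln 2 / q."""
    return np.log(2) / (p + r), np.log(2) / q

def integrate(N, p, r, q, t_end, dt=1e-3):
    """Forward-Euler integration of du/dt = -(p+r)u and dv/dt = ru - qv."""
    u, v = N, 0.0
    for _ in range(int(round(t_end / dt))):
        # Both updates use the values of u and v from the previous step.
        u, v = u + dt * (-(p + r) * u), v + dt * (r * u - q * v)
    return u + v

# Illustrative (not fitted) rates, in units of 1/day.
N, p, r, q = 1.0, 0.5, 0.05, 0.03
tau1, tau2 = half_lives(p, r, q)
numeric = integrate(N, p, r, q, t_end=10.0)
analytic = biexp(10.0, N, p, r, q)
```

With these rates the fast half-life is a little over one day and the slow half-life is a few weeks, the same orders of magnitude as the summary values reported in the text.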
The model we use makes an assumption that users tweeting with the hashtag do so within a role of one of two groups, where one group's attention is dependent on the attention of the other group. Other models, such as the one used by García-Gavilanes et al. [garcia2016dynamics] to study page views on Wikipedia, fit attention decays to a three-phase exponential model. Their model makes no explicit assumptions about subgroups of users and instead fits three sequential but separate exponential processes. This phased approach is useful for quantifying decay time series with dynamics that cannot be adequately described by a simple exponential, but assumes the three phases have unrelated decay rates. In Candia's model a smooth change in observed decay rate arises from the transfer of attention between two groups with different rates of attention loss. Further work can investigate whether this assumption of different groups is justified, but the model remains useful in our primary goal of summarizing the observed rates of change.
While we only found it necessary to use a bi-exponential model to adequately capture the decay dynamics, in general n-exponential decay models will assume a minimum of n decay rates, if there are no interaction terms. However, we are unable to observe n-grams with very low usage rates, so it is quite likely that a third regime exists with a decay rate operating at the year scale for historical storms. If this were the case a tri-exponential model would be appropriate, though unfortunately we would be unable to accurately fit all its parameters with our current data resolution.
The fitting procedure was to first find the maximum value of the usage rate for each storm, before fitting the above model to the decay of log usage rate after this maximum.
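The fitting procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses SciPy's `curve_fit`, drops days with zero observed usage before taking logs, and reparameterizes p = q + delta with delta > 0 so that the logarithm is always defined during optimization (which restricts the fit to p > q). The synthetic series, starting values, and bounds are made up for the example.

```python
import numpy as np
from scipy.optimize import curve_fit

def log_biexp(t, log_N, delta, r, q):
    """Log of S(t) for the bi-exponential model, with p = q + delta (assumption)."""
    p = q + delta
    s = (p - q) * np.exp(-(p + r) * t) + r * np.exp(-q * t)
    return log_N + np.log(s / (p + r - q))

def fit_decay(rates):
    """Fit the model to the log usage rate from the peak onward."""
    rates = np.asarray(rates, dtype=float)
    peak = int(np.argmax(rates))          # step 1: find the maximum usage rate
    decay = rates[peak:]                  # step 2: keep only the decay after it
    t = np.arange(decay.size, dtype=float)
    keep = decay > 0                      # log is undefined for unobserved days
    p0 = [np.log(decay[0]), 0.5, 0.1, 0.05]
    bounds = ([-np.inf, 1e-6, 1e-6, 1e-6], [np.inf, 10.0, 10.0, 10.0])
    popt, _ = curve_fit(log_biexp, t[keep], np.log(decay[keep]), p0=p0, bounds=bounds)
    log_N, delta, r, q = popt
    p = q + delta
    return {"N": float(np.exp(log_N)), "p": float(p), "r": float(r), "q": float(q),
            "tau1": float(np.log(2) / (p + r)), "tau2": float(np.log(2) / q)}

# Synthetic, noise-free series: a short ramp-up followed by a bi-exponential decay.
days = np.arange(60, dtype=float)
true_decay = 1e-4 / 0.52 * (0.47 * np.exp(-0.55 * days) + 0.05 * np.exp(-0.03 * days))
series = np.concatenate([[1e-6, 1e-5], true_decay])
fit = fit_decay(series)
```

Fitting in log space keeps the long, low-amplitude tail from being swamped by the peak, which matters because the slow half-life is determined almost entirely by that tail.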
The resulting fits are shown in S7 and S8 Figs. The fits generally appear sensible, but there are sometimes issues for noisy time series, where the rate parameter r becomes very small, corresponding to a very long half-life, and the early decay is misfit. This occurs in the time series for Hurricane Florence. The distributions of mean squared error (MSE) are shown in S6 Fig. Looking at the decay half-lives in S9 Table, we can see that most hurricane hashtags lose half their volume on the order of 1 or 2 days. The storms with relatively more attention on Twitter, Harvey, Irma, Matthew, and Sandy, all initially decay quickly, with a half-life on the order of a few days, but then have much longer decays associated with τ 2 , on the order of a few weeks. There are some aberrations where the bi-exponential model does a poor job of explaining the data, such as for Hurricane Joaquin, where a fight between Governor Bobby Jindal and the Obama administration over the size of a recovery package spurred news stories and attention long after the initial activity associated with the storm itself. This leads to increases in hashtag usage rate, and thus negative half-lives. The longest half-life is associated with Hurricane Maria, whose τ 2 was approximately twice as long as that of the next largest hurricane. The extended crisis in Puerto Rico caused by Maria may be a reason for this exceedingly long lifetime, even though the initial attention received by the hashtag was less than that of storms of comparable strength.

S10 Table. Fitted half-lives τ 1 and τ 2 for all storms with at least 10 days of observed 2-gram usage.

S4 Fig.
Bi-exponential hurricane decay half-lives: Distributions of fitted half-lives for the populations u and v. The mean half-lives are τ 1 = 1.3 days and τ 2 = 156 days for hashtags, and τ 1 = 1.1 days and τ 2 = 241 days for 2-grams. For τ 2 the median half-lives are also of interest, since we suspect the longest half-lives are due to poor fits. The median is τ 2 = 23 days for hashtags and τ 2 = 24 days for 2-grams.
We also fit a simple exponential model, $S(t) = N e^{-pt}$. For high-attention storms for which we have more than a week of data, this model is unable to capture decays occurring on different time scales, and thus fits poorly. For smaller storms, whose attention quickly decays below the resolution of our data set, the simple exponential model is adequate and perhaps more appropriate. The distribution of half-lives for hashtags and 2-grams is shown in S5 Fig. For storms for which we have data covering an extended decay, however, the bi-exponential model is more appropriate.
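For comparison, a single-exponential fit can be sketched as a straight-line regression on the log usage rate. The text does not say how the simple model was fit, so the log-linear least-squares approach here is an assumption that mirrors the peak-onward, log-space procedure used for the bi-exponential model; the function name and the synthetic series are made up for the example.

```python
import numpy as np

def fit_simple_exponential(rates):
    """Fit S(t) = N * exp(-p * t) to the decay after the peak usage rate."""
    rates = np.asarray(rates, dtype=float)
    peak = int(np.argmax(rates))
    decay = rates[peak:]
    t = np.arange(decay.size, dtype=float)
    keep = decay > 0                      # skip days below the data resolution
    # log S(t) = log N - p * t, so a degree-1 polynomial fit gives both parameters.
    slope, intercept = np.polyfit(t[keep], np.log(decay[keep]), 1)
    p = -slope
    return {"N": float(np.exp(intercept)), "p": float(p), "tau": float(np.log(2) / p)}

# Synthetic, noise-free decay with a made-up rate of 0.13 per day.
series = 2e-5 * np.exp(-0.13 * np.arange(20))
simple_fit = fit_simple_exponential(series)
```

When a series actually contains both a fast and a slow decay, a single fitted rate lands somewhere between the two, which is one way to read the intermediate half-lives obtained for storms with longer records.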

Hurricane Attention Maps
The remaining Hurricane Attention Maps and time series from 2009 to 2018 are presented for the reader's perusal. Only storms reaching at least Category 2 are shown, and the 2013 and 2014 seasons are omitted. Earlier storms in our dataset mostly did not make landfall, and thus appear to receive relatively little attention. The scale of attention on the maps is held constant between years.

S5 Fig.
Simple exponential hurricane decay half-lives: Distributions of fitted half-lives for a single population. The median half-lives are τ = 5.3 days for hashtags and τ = 5.2 days for 2-grams. The simple exponential model fails to explain the break in attention decay for larger storms, which receive more attention. The bimodal distribution of half-lives for 2-grams suggests that there are two categories of storms: those with larger half-lives have more data, and thus the longer decay increases the fitted half-life. Meanwhile, smaller storms receive so little attention that we do not measure any after a week or so, leading to a much smaller half-life, which corresponds to τ 1 in our bi-exponential fit.
The bi-exponential model has the lowest average MSE, followed by the simple exponential decay. The power law decay fails to capture the dynamics of attention decay when the fit is compared to the data visually, which is reflected in its higher average MSE.
Here p would be interpreted as the rate of decay from population 1, r would be the transfer rate from population 1 to population 2, and q would be the rate of decay from population 2. Population 1 might be thought of as bystanders with a shorter attention span, while population 2 consists of those living with the ramifications or working on the recovery, who lose attention more slowly. Reported on the graph are the half-lives associated with fitting this model for both the hashtag usage rate and 2-gram usage rate, $\tau_1 = \frac{\ln 2}{p + r}$ and τ 2 = ln 2