We study collective attention paid towards hurricanes through the lens of n-grams on Twitter, a social media platform with global reach. Using hurricane name mentions as a proxy for awareness, we find that the exogenous temporal dynamics are remarkably similar across storms, but that overall collective attention varies widely even among storms causing comparable deaths and damage. We construct ‘hurricane attention maps’ and observe that hurricanes causing deaths on (or economic damage to) the continental United States generate substantially more attention in English language tweets than those that do not. We find that a hurricane’s Saffir-Simpson wind scale category assignment is strongly associated with the amount of attention it receives. Higher category storms receive larger proportional increases in attention per proportional increase in deaths or dollars of damage than lower category storms do. The most damaging and deadly storms of the 2010s, Hurricanes Harvey and Maria, generated the most attention and were remembered the longest, respectively. On average, a category 5 storm receives 4.6 times more attention than a category 1 storm causing the same number of deaths and economic damage.
Citation: Arnold MV, Dewhurst DR, Alshaabi T, Minot JR, Adams JL, Danforth CM, et al. (2021) Hurricanes and hashtags: Characterizing online collective attention for natural disasters. PLoS ONE 16(5): e0251762. https://doi.org/10.1371/journal.pone.0251762
Editor: Yong-Yeol Ahn, Indiana University, UNITED STATES
Received: February 26, 2021; Accepted: April 30, 2021; Published: May 26, 2021
Copyright: © 2021 Arnold et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data files are available on a public repository, hosted at github.com/mvarnold/Hurricane-Data-Repo.
Funding: This work was supported by gifts from the Massachusetts Mutual Life Insurance Company and Google. Additionally, Massachusetts Mutual Life Insurance Company provided support in the form of salaries for author DRD, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.
Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: Author DRD’s commercial affiliation with paid employment. Additionally, we have received funding from commercial sources: Massachusetts Mutual Life Insurance Company and Google. This does not alter our adherence to PLOS ONE policies on sharing data and materials.
The collective understanding and memory of historic events shape the common world views of societies. In a narrative economy, attention is a finite resource generating intense competition [1–9]. As commerce and communication shift to online platforms, so too has the narrative economy moved to the digital realm. In 2018, over $100 billion was spent on internet advertising in the United States, nearly overtaking the $110 billion spent on traditional media advertising; together, these amount to about 1% of US GDP. Today, social media both facilitates and records an extraordinary share of the world’s public communication [11, 12]. For computational social scientists, the migration of parts of the narrative economy to the web continues to present an immense opportunity, as the discipline becomes data-rich [13, 14].
Academics have become interested in how narratives spread around newsworthy events on social media platforms such as Twitter, as political fights for influence and narrative control are increasingly fought by actors ranging from activists and police departments to state censors suppressing discourse internally and state-supported troll factories spreading divisive narratives internationally [16–21]. In 2019, Twitter reported over 145 million daily active users.
Quantifying the spread of narratives and the total attention they command is a daunting task. Recent work has made progress in tracking the spread of quoted and modified phrases through the news cycle, and others have worked to identify actant-relationships and compile contextual story graphs from social media posts [3, 23]. In comparison, quantifying the attention directed towards a topic, person, or event is a somewhat easier task. Rather than identifying actors and what they act on, as the study of narrative attention requires, we can simply count mentions of an entity. Since increasing raw attention, or the number of mentions, is often the zeroth-order goal of public relations campaigns, quantifying the volume of attention, irrespective of the sentiment or narrative within which the attention is embedded, seems a natural first step [24–26].
Studies of attention have typically focused on its time dynamics, as measured by the number of mentions in a given corpus, explaining either the temporal decay of interest or the heavy-tailed allocation of attention across a spectrum of topics through some preferential attachment mechanism [27–34]. Another group of studies has worked to classify attention time series from social media as either exogenous or endogenous to the system, to model the functional form of collective attention decay, or to determine whether spreading crosses a critical threshold [35–38]. While these studies have typically focused on scientific works, patents, or cultural products such as movies, the rise of large social media datasets has enabled the investigation of a wider range of topics in online public discourse.
A broad spectrum of collective attention studies have used Twitter as a data source. Researchers have used Twitter data to indicate the likely outcome of elections by quantifying the collective attention directed toward political parties. Others have investigated the relationship between the dynamics of collective attention and event credibility, finding that “sustained, intermittent bursts of attention were found to be associated with lower levels of perceived credibility”.
In this study we examine the collective attention focused on hurricanes using Twitter, which allows us to capture more natural speech intended for human readers, as opposed to search terms. Twitter data has been used to measure shifts in collective attention surrounding exogenous events like earthquakes, either by looking for jumps in the Jensen-Shannon divergence between tweet rate distributions on different days, or by building real-time earthquake detectors using keyword-based methods [42, 43].
Here, we use collective attention in a narrower sense. Instead of looking for anomalous tweet rates, we study n-gram usage rates for hashtags and 2-grams associated with individual events. Specifically, we examine the usage rates of hashtags and 2-grams matching the case-insensitive patterns “#hurricane*” and “hurricane *”, respectively. Natural disasters provide an ideal case study, since they are generally unexpected and so produce the signature of an exogenous event. However, the volume of attention given to any particular hurricane varies across several orders of magnitude, as does the severity of the storm in terms of lives lost and damage caused.
Prior efforts have examined the attention received by disasters by type and location, as measured by the time devoted to them in American television network news coverage, and found striking discrepancies: for example, to have the same estimated probability of news coverage as a disaster in Europe, a disaster in Africa would need to cause 45 times as many deaths. The same study found that to receive coverage equivalent to a deadly volcano, a flood would need to cause 674 times as many deaths, a drought 2,395 times as many, and a famine 38,920 times as many.
Strong hurricanes are more likely to capture attention than weak hurricanes, and hurricanes impacting the continental United States capture much more attention than those failing to make landfall. To what degree does attention shrink when hurricanes make landfall outside of the continental US? The 2017 hurricane season is a particularly stark example: among comparably powerful storms reaching category 4 or higher, those projected to make landfall over the continental United States were talked about nearly an order of magnitude more than Hurricane Maria, which impacted Puerto Rico, and two orders of magnitude more than Hurricane Jose, which never made landfall.
Given how unbalanced the attention received by different hurricanes can be, we must ask: Do government or humanitarian relief resources get dispersed with greater generosity for storms that capture public attention, or are these organizations insulated from popular attention? For the 2017 hurricane season, more money was spent more quickly to aid the victims of Hurricanes Harvey and Irma than the victims of Hurricane Maria, contributing to the significantly higher death toll and adverse public health outcomes in Puerto Rico. While the attention and policies of government agencies are not usually dictated by Twitter, public attention certainly has some effect on the focus of agencies and the allocation of government resources, and recently more attention has been focused on understanding the discourse on social media before, during, and after natural disasters [46–51].
We structure our paper as follows. In Results, we examine the spatial associations between hurricanes and the attention they receive, and we compute and compare measures of total attention, maximum daily attention, and non-parametric measures of the rate of attention decay for the most damaging hurricanes of the past decade. We present conclusions in Concluding Remarks. Finally, we outline our methods and data sources, covering the collection of n-gram usage rate data from English tweets as well as data sources for hurricane locations and impacts.
Materials and methods
n-gram usage rates
We query the daily usage rates of hashtags referencing hurricanes from a corpus of 1-gram (words or other single word-like constructs) usage rate time series, computed from approximately 10% of all posts (“tweets”) from 2009 to 2019 collected from Twitter’s “decahose”. We define the usage rate of a particular 1-gram τ on day t as its count, $c_\tau(t)$, divided by the count of all 1-grams occurring on that day: $f_\tau(t) = c_\tau(t) / \sum_{\tau'} c_{\tau'}(t)$. The usage rates are based only on 1-grams observed in tweets classified as English by FastText, a language classification tool [53, 54]. We focus on English tweets to study attention to North Atlantic storms primarily because English is the most common language on Twitter. Additionally, US government agencies such as NOAA and FEMA compile estimates of hurricane impacts inside the US, a complementary dataset that we discuss below.
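As a minimal sketch of this definition, the Python function below computes $f_\tau$ for every 1-gram in one day of tweets. The lowercasing and whitespace tokenizer are simplifying assumptions for illustration, not the parsing rules of the actual n-gram pipeline, and `daily_usage_rates` is a hypothetical name.

```python
from collections import Counter


def daily_usage_rates(tweets):
    """Return {1-gram: usage rate} for one day's tweets.

    Assumption: 1-grams are lowercased, whitespace-separated tokens.
    The real pipeline uses more elaborate parsing rules.
    """
    counts = Counter(tok for text in tweets for tok in text.lower().split())
    total = sum(counts.values())  # count of all 1-grams on this day
    return {gram: c / total for gram, c in counts.items()}


rates = daily_usage_rates(["#HurricaneIrma is coming", "stay safe #hurricaneirma"])
print(rates["#hurricaneirma"])  # 2 of the day's 6 1-grams
```

Because every count is divided by the same daily total, the rates for one day sum to 1, which is what makes them comparable across days with very different tweet volumes.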
Our usage rate data set includes separate usage rates for 1-grams in “organic” tweets, tweets that are originally authored, as well as usage rates of 1-grams in all tweets (including retweets and quote tweets). More details about the parsing of the Twitter n-gram data set are available in .
For the purpose of studying attention, our usage rates are derived from the corpus with all tweets, including retweeted text, to better reflect not only the number of people tagging a storm, but also the number of people who decide the information contained therein was worth sharing.
We studied the usage rate of 1-grams exactly matching the form “#hurricane*”, where * represents a storm’s name. We also measured the usage rate of 2-grams matching the pattern “hurricane *” for each storm name. All string matching is case-insensitive. This choice is deliberately narrow, so that more broadly used hashtags do not inflate our measurement of the attention associated with each storm. A broader measure is discussed in S1 File, where we show that the two measures are strongly associated.
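A minimal sketch of this deliberately narrow matching rule, using Python's `re` module: a token counts as a hashtag mention only if it is exactly `#hurricane<name>`, and a 2-gram counts only if it is exactly `hurricane <name>`, ignoring case. The function names and token list are hypothetical; the measurements in this study come from precomputed n-gram usage rates rather than raw token scans.

```python
import re


def storm_patterns(name):
    """Exact, case-insensitive patterns for one storm name."""
    escaped = re.escape(name)
    hashtag = re.compile(rf"#hurricane{escaped}", re.IGNORECASE)
    bigram = re.compile(rf"hurricane {escaped}", re.IGNORECASE)
    return hashtag, bigram


def count_storm_mentions(tokens, name):
    """Count exact hashtag matches and matching 2-grams in a token list."""
    hashtag, bigram = storm_patterns(name)
    n_hashtag = sum(1 for tok in tokens if hashtag.fullmatch(tok))
    bigrams = (f"{a} {b}" for a, b in zip(tokens, tokens[1:]))
    n_bigram = sum(1 for gram in bigrams if bigram.fullmatch(gram))
    return n_hashtag, n_bigram


tokens = "Hurricane Maria hits #HurricaneMaria #hurricanemariarelief hurricane maria".split()
print(count_storm_mentions(tokens, "Maria"))  # (1, 2): the longer hashtag is excluded
```

Using `fullmatch` rather than a prefix search is what keeps broader hashtags such as `#hurricanemariarelief` out of the count.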
For the ten years in which the HURDAT2 dataset overlaps with our Twitter dataset, 75 storms reached at least category 1 in the North Atlantic Basin. Within our 10% sample of tweets, we count, over all storms, a total of 1,824,842 hashtag usages within a year of each storm, and 3,643,411 instances of the matching 2-gram.
Deaths, damages, and locations
To augment our usage rate data set, we downloaded data associated with all hurricanes in the North Atlantic basin from 2008 to 2019 from Wikipedia. The Wikipedia data include the damage estimates (US$) and deaths caused by each storm, as well as the dates of activity and areas affected. For the spatial component of this work, we also used the HURDAT2 data set, which contains the positions and various meteorological attributes of all North Atlantic hurricanes from 1900 to 2018. For the time range overlapping with the Twitter-derived data set, HURDAT2 has 3-hour resolution.
We note that we collected all data while complying with the terms and conditions of the respective websites.
Hurricane attention maps
In Fig 1, we show hurricane positions together with time series of the usage rate of hashtags of the form #hurricane*.
Hurricane attention map and usage rate time series for 1-grams matching the case-insensitive pattern “#hurricane*”, for all four hurricanes reaching at least category 4 in the 2017 hurricane season. Markers along each hurricane trajectory indicate the National Oceanic and Atmospheric Administration (NOAA) reported position every day at noon UTC. In panel A, the smoothed hashtag usage rate is wrapped in an envelope around each hurricane trajectory, showing the spatial dependence of attention on Twitter. In the lower two plots, panels B and C, we show the usage rates for hashtags and 2-grams matching hurricane* in English language tweets on linear and logarithmic scales. Usage rates within all tweets are indicated with a solid line, while usage rates in ‘organic’ tweets (tweets that are not retweets) are represented by a dashed line. The day of maximum attention on Twitter is marked with a star or a diamond for hashtags or 2-grams, respectively. Generally, hurricanes making landfall on the continental United States received greater attention than those not making landfall. At their maxima, the hashtag usage rates for Hurricanes Harvey and Irma were approximately an order of magnitude larger than the maximum hashtag usage rate for Hurricane Maria, and two orders of magnitude larger than that for Hurricane Jose.
We plot the same hashtag usage rate time series below on both linear and logarithmic axes, as well as 2-gram usage rates. For clarity, we only include hurricanes reaching at least category 4.
The hurricane map tracks are meant to show the spatial dependence of attention given to hurricanes, while giving enough visual cues to connect locations along the path to the time the attention was observed. We generated the map shown in Fig 1 by filling in the polygon defined by the endpoints of line segments centered at the hurricane position at each time, oriented along the vector normal to the hurricane’s current velocity, and with length proportional to the smoothed usage rate of the related hashtag. Maps for additional years are provided in S9–S15 Figs in S1 File.
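The envelope construction can be sketched as follows, assuming positions are treated as planar (x, y) coordinates (for example, longitude and latitude without a map projection) and that the velocity is estimated by finite differences between neighboring track points. Both choices, the `scale` factor, and the function name are assumptions for illustration; the resulting vertex list could be passed to a polygon-filling routine such as Matplotlib's `fill`.

```python
import math


def attention_envelope(track, attention, scale=1.0):
    """Vertices of a polygon wrapping a storm track.

    track: list of (x, y) positions; attention: smoothed usage rate at each
    position. At each point a segment of length scale * attention, centered
    on the position, is laid along the unit normal to the local velocity.
    """
    n = len(track)
    left, right = [], []
    for i, ((x, y), a) in enumerate(zip(track, attention)):
        # Central difference for the velocity, one-sided at the track ends.
        x0, y0 = track[max(i - 1, 0)]
        x1, y1 = track[min(i + 1, n - 1)]
        vx, vy = x1 - x0, y1 - y0
        speed = math.hypot(vx, vy) or 1.0  # stationary storm: zero-width envelope
        nx, ny = -vy / speed, vx / speed  # velocity rotated by 90 degrees
        half = 0.5 * scale * a
        left.append((x + half * nx, y + half * ny))
        right.append((x - half * nx, y - half * ny))
    # Walk out along one side and back along the other to close the polygon.
    return left + right[::-1]


poly = attention_envelope([(0, 0), (1, 0), (2, 0)], [2, 2, 2])
print(poly)
```

For a storm moving due east, the normal points north and south, so the envelope is a band of constant width around the track; wider bands mark days with more attention.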
Our hashtag usage rate is at the day scale, while HURDAT2 has 3-hour resolution, so the wrapped attention volume is smoothed with a moving average with a window size of one day to avoid discontinuous jumps. This method obscures any sub-day resolution on the map, which could be related to the daily fluctuation of tweet volume as well as to varying interest in the hurricanes. While we lose some granularity by using daily usage rates, decays in attention are spread out over days and weeks for smaller storms, and months for larger storms. Daily resolution is sufficient to capture these longer decays, which are our primary interest.
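A minimal sketch of this smoothing step: each daily usage rate is first repeated at the eight 3-hour track timestamps of its day, which produces a step function with jumps at midnight, and a moving average whose window spans one day (eight samples) then removes those jumps. The centered window, truncated at the series ends, is an assumption; the text does not specify how the window is aligned.

```python
def daily_to_3h(daily_rates, steps_per_day=8):
    """Repeat each daily usage rate at every 3-hour step of its day."""
    return [r for r in daily_rates for _ in range(steps_per_day)]


def moving_average(values, window=8):
    """Centered moving average, truncated where the window leaves the series."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i - half + window)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# A jump from 0 to 8 at midnight becomes a ramp across the day boundary.
smoothed = moving_average(daily_to_3h([0.0, 8.0]))
print(smoothed[6:11])  # [2.0, 3.0, 4.0, 5.0, 6.0]
```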
Examining the map, we see the minimal attention paid to Hurricane Harvey as it traveled across the Caribbean Sea and made landfall in Mexico. Only after it crossed the Gulf of Mexico did the hashtag register on our instrument, and only when Harvey was about to make landfall over Texas did the hashtag usage rate approach its maximum, approximately 3 of every 10,000 1-grams in English tweets. The devastation wrought by Harvey appears to have primed hurricane-related conversation, as the next hurricane, Irma, was talked about long before it made landfall. While Irma was talked about at a usage rate similar to Harvey’s as it impacted Puerto Rico, Hispaniola, and Cuba, its usage rate spiked as it made landfall in the Florida Keys.
Compared with the previous two storms, Hurricane Maria generated substantially less hashtag usage. Its attention peaked as it made landfall over Puerto Rico as a category 4 storm, at less than a fifth of the attention received by the hurricanes making landfall on the US. Part of the reason may be that the affected area is Spanish speaking, while our hashtag usage measurement only counts occurrences in English tweets. We find that usage rates of the 2-gram “huracán maria” in Spanish tweets were also lower than those for “huracán irma”, but comparable to those for “huracán harvey.” See S2 and S3 Figs in S1 File to compare the top hurricane-related 2-gram time series for the 2017 hurricane season in English and Spanish.
Another potential contributing factor for the low volume of Hurricane Maria tweets is that Puerto Rico’s electric grid was destroyed and 95% of cell towers were down in the aftermath of the storm, making it impossible for those directly affected to communicate about the storm. Unfortunately, due to Twitter’s usage norms in this time period, we do not have locations for the vast majority of tweets. The number of people affected by the storms could also help explain the different levels of attention, as Hurricanes Harvey and Irma both affected 19 million people, while Maria affected about 4 million.
Hurricane attention comparison
To compare the variation in attention received by different storms, we combined measurements of the hashtag usage rate with deaths and damages caused by each storm from 2009 to 2019.
In Fig 2, we show radar plots (radial, categorical charts) comparing six measurements of impact and attention for each of the eight most damaging hurricanes in the study period. S8 Table in S1 File shows the raw measured values for the most damaging hurricanes in this period.
For each plot, starting at the top position and rotating clockwise, the measures are: the sum of the hashtag usage rate, the number of days to reach 90% and 99% of the total attention received during that season, the total cost in dollars attributed to damage caused by the hurricane (in its year), the number of deaths attributed to the hurricane, and the maximum usage rate of the hashtag during the year of interest. All measurements are normalized to the maximum value achieved by any hurricane. Hurricane Harvey was the most talked about hurricane, as well as the most damaging. Hurricane Irma was the most talked about on any single day. Hurricane Maria caused the most deaths and had the longest attention half-life of all measured hurricanes. Raw values for this figure are shown in S8 Table in S1 File. The hashtag usage rate spark lines above each radar plot are shown on a log scale and normalized to show the common decay shape; they cannot be compared to evaluate relative volume.
Included measurements are:
- Max Usage Rate—peak attention on any single day
- Integrated Usage Rate—total attention over the entire hurricane season
- Quantile 0.9: Q0.9—days to 90% attention
- Quantile 0.99: Q0.99—days to 99% attention
- Damage—total damage caused by the storm in US dollars
- Deaths—total deaths associated with the storm (both direct and indirect)
The relative magnitude of each quantity is shown as a fraction of the maximum value for any storm in the study. The quantile values are non-parametric measurements of the attention time scale—comparable to half-lives but without the assumption of an exponential decay. Some storms receive significant interest months after they pass, usually related to the recovery efforts. Spark lines above each plot show the attention time series for the year after each storm, as measured by the log usage rate, but do not convey relative scale.
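The four attention measures above can be computed from a daily usage rate series in a few lines. In this sketch, day 0 is the day the storm began and $Q_q$ is taken to be the first day on which the cumulative usage rate reaches a fraction q of the integrated usage rate; that indexing convention and the function name are assumptions rather than details given in the text.

```python
from itertools import accumulate


def attention_metrics(f, quantiles=(0.9, 0.99)):
    """Max, integrated usage rate, and quantile days for a daily series f.

    f[t] is the usage rate t days after the storm began.
    """
    total = sum(f)
    metrics = {"max": max(f), "integrated": total}
    for q in quantiles:
        # First day on which cumulative attention reaches a share q of the total.
        metrics[f"Q{q}"] = next(
            t for t, cum in enumerate(accumulate(f)) if cum >= q * total
        )
    return metrics


print(attention_metrics([6.0, 2.0, 1.5, 0.5]))
```

Because the quantiles only use the cumulative share of attention, they make no assumption about the functional form of the decay, which is the sense in which they are non-parametric.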
The three most damaging storms, Hurricanes Harvey, Maria, and Irma, all destroyed tens of billions of dollars of property. Storms in Fig 2 are ordered by damage, with the least damaging being Hurricane Irene in 2011, which still destroyed an estimated $14 billion in property.
The deadliest North Atlantic hurricane in the past decade was Hurricane Maria, which killed over 3000 people over the course of the extended disaster. The next deadliest storms were Hurricanes Matthew, Sandy, Irma, and Harvey, each killing at least 100 people. Among the storms shown in Fig 2, Hurricanes Florence and Irene were the least deadly, causing 58 and 57 deaths, respectively.
The highest hashtag usage rate on a single day was associated with Hurricane Irma, reaching $\max f_\tau = 4.6 \times 10^{-4}$, or 4.6 of every 10,000 1-grams, as the storm made landfall over the Florida Keys. Other storms reached comparable single-day usage rates, such as Hurricanes Harvey and Matthew, with $\max f_\tau = 3.5 \times 10^{-4}$ and $\max f_\tau = 2.6 \times 10^{-4}$, respectively. Among the eight most damaging storms, the hashtag associated with Hurricane Maria had the lowest maximum usage rate: “#hurricanemaria” appeared only five times for every 100,000 1-grams as Maria made landfall in Puerto Rico.
The highest integrated hashtag usage rate was associated with Hurricane Harvey, followed by Hurricanes Irma, Matthew, and Florence; the integrated usage rate for “#hurricaneharvey” was $I = 2.3 \times 10^{-3}$. Hashtags associated with Hurricanes Sandy and Irene had the lowest total attention, with $I = 3.7 \times 10^{-4}$ and $I = 2.0 \times 10^{-4}$, respectively.
Due to the extended crisis in the aftermath of Hurricane Maria, its hashtag continued to be used at relatively high volumes even a year after the storm had passed, leading to a much larger $Q_{0.9}$ of 175 days [60, 61]. Typical values of $Q_{0.9}$ were around 1–4 days, with more prolonged and damaging storms, like Harvey in 2017, taking 15 days to reach 90% of total attention. By comparison, no other storm took longer than 100 days to reach this benchmark. We chose a longer-term attention timescale benchmark, $Q_{0.99}$, to describe how long it takes until nearly all storm-focused attention has passed. The hashtag associated with Hurricane Maria has the largest value for this measurement as well, with $Q_{0.99}$ of 363 days, which should be interpreted as attention not dying away within a year, since we truncate the time series after one year. Hurricanes Michael, Sandy, and Harvey also have triple-digit values of $Q_{0.99}$, as they continued to be talked about, albeit at much lower levels than at their peak. Other storms quickly lose attention, such as Hurricane Irene, which took only 12 days to reach 99% of total attention.
We observed variation in the overall radar plot shape. More recent storms have been more damaging and deadly, and they show larger values of total attention and longer attention timescales. A number of storms, like Sandy, Michael, and Matthew, have relatively high values for both maximum usage rate and the number of days to reach 99% of total attention. While there is significant variation in the magnitude of these measurements, the essential exogenous shape of the hashtag usage rate time series, f, is consistent. We fit a bi-exponential decay model to further quantify how quickly attention decreases, and present the fitted half-lives in S9 and S10 Tables in S1 File.
Attention and impact regressions by category
We next explore the associations between damage, deaths, and attention given to hurricanes. In Fig 3, we show the scaling relationship between attention and impacts for each storm category on the Saffir-Simpson wind scale. The scale assigns a hurricane a category from one to five based on its sustained wind speed. Importantly, this category is often the descriptor used by meteorologists to communicate the severity of a storm to the public. For the regression, we assign each storm the maximum category it reached. Each sub-panel plots the integrated usage rate, $I = \sum_t f_\tau(t)$ for hashtag or 2-gram τ, where t runs over the 365 days after the storm began, against a measure of storm impact. We use I as a measure of the total attention given to a storm during its respective hurricane season; it can be compared across years because usage rates are already normalized by the total volume of conversation on Twitter. Color represents the maximum category the storm reached, and the smaller subplots are breakout panels for each category. We include Spearman’s ρ, a non-parametric measure of rank correlation, in each panel.
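For reference, Spearman’s ρ is the Pearson correlation between the ranks of the two variables. Because many storms share the value zero for deaths, damages, or attention, ties matter; the sketch below gives tied values the average of their ranks, the same convention used by `scipy.stats.spearmanr`. It is an illustrative standard-library implementation, not the code used to produce Fig 3.

```python
def average_ranks(values):
    """1-based ranks in which tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Zero deaths for two storms produce a tie in x.
print(spearman_rho([0, 0, 3, 10], [1e-8, 2e-6, 5e-5, 4e-4]))
```

Since only ranks enter the calculation, ρ is unaffected by the log transforms and offsets applied elsewhere, which makes it a convenient assumption-light summary alongside the regressions.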
There is a clear positive association between the total attention represented by hashtags and the impacts of these storms. We report Spearman’s rank correlation, $\rho_s$, in the top left corner of each plot. While some categories show little evidence of a positive association, for the entire dataset $\rho_s \approx 0.54$. We perform a Bayesian linear regression between $\log I$ and log impacts for each storm category, and show the mean model along with the 1σ credible interval around it. We use hybrid axes, with logarithmic scaling for most horizontal and vertical values and linear scaling near zero, in order to show storms that caused zero deaths or damages, as well as storms for which we measured a hashtag usage rate of zero. Changes in axis scaling occur at the blue dashed lines. Generally, more powerful storms received more attention, higher category storms received more attention even when causing minimal damage, and high category storms had a higher regression slope. These results suggest that for powerful storms, a given increase in impact was associated with a larger increase in attention. While for category one storms a 10-fold increase in deaths is associated with a four-fold increase in attention, for category five hurricanes the same 10-fold increase in deaths is associated with a 25-fold increase in attention.
We perform linear regressions on storms in each category separately, a choice that models the attention received by different category storms as separate processes. In Regression Models for Impacts, Impact Interactions and Hurricane Category, we instead model attention as a single process and account for each hurricane’s maximum category rating with an explicit indicator variable.
Model choice and fitting procedure.
For each category and each impact, we model total attention as $\log_{10} I \sim a_0 + a_1 X_{\mathrm{impact}}$ (1), where $X_{\mathrm{impact}}$ is either $\log_{10}$ deaths or $\log_{10}$ damages caused by each storm. We use a logarithmic model both to capture the scaling relationships between impacts and attention and to characterize the relative changes in attention associated with storm impacts. To avoid divergent logarithms where observed values are zero, we offset I by $10^{-8}$, and we offset damages by 10,000 US dollars and deaths by 0.1 before taking logarithms.
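The offset-and-log transformation applied before fitting Eq (1) can be sketched as below. The constant and function names are hypothetical; the offset values are the ones stated in the text. Under this transform, a storm with no recorded attention, deaths, or damage maps to $(\log_{10} I, X_{\mathrm{deaths}}, X_{\mathrm{damages}}) = (-8, -1, 4)$.

```python
import math

# Offsets from the text, added before taking base-10 logarithms.
ATTENTION_OFFSET = 1e-8  # added to the integrated usage rate I
DEATHS_OFFSET = 0.1      # added to the death count
DAMAGES_OFFSET = 1e4     # added to damages in US dollars


def log_transform(integrated_rate, deaths, damages_usd):
    """Return (log10 I, X_deaths, X_damages) with offsets applied."""
    return (
        math.log10(integrated_rate + ATTENTION_OFFSET),
        math.log10(deaths + DEATHS_OFFSET),
        math.log10(damages_usd + DAMAGES_OFFSET),
    )


print(log_transform(0.0, 0, 0))         # storm with no attention, deaths, or damage
print(log_transform(2.3e-3, 100, 1e9))  # illustrative inputs, not a real storm record
```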
We set a zero-centered normal prior on the slope of the regression model, $a_1 \sim \mathrm{normal}(0, 1)$. We set a normal prior on the intercept with mean $-8$, the value of $\log_{10} I$ for a storm with $I = 0$ once the offset is added. We did not have strong beliefs about the likely precision of $a_0$, since it was not a priori clear how much attention would be paid to hurricanes with very little associated monetary damage or few deaths. We thus set a weak hyper-prior on the precision of $a_0$, $\tau \sim \mathrm{gamma}(3, 1)$, so that the intercept of the regression is distributed as $a_0 \sim \mathrm{normal}(-8, \tau^{-1})$.
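To see what these priors imply, one can draw from them directly. The sketch below uses Python’s `random` module and assumes that the second argument of normal(−8, τ⁻¹) is a variance, since τ is described as a precision, so the standard deviation of $a_0$ given τ is $\tau^{-1/2}$. Because gamma(3, 1) has shape 3 and a second parameter of 1, it is the same distribution whether that parameter is read as a rate or a scale. `sample_priors` is a hypothetical helper, not part of the authors’ code.

```python
import random
from statistics import fmean, pvariance


def sample_priors(n_draws, seed=0):
    """Draw (tau, a0, a1) from the priors on the per-category regression.

    Assumption: normal(-8, 1/tau) is parameterized by its variance,
    so the standard deviation of a0 given tau is tau ** -0.5.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        tau = rng.gammavariate(3.0, 1.0)   # shape 3, scale 1 (equivalently rate 1)
        a0 = rng.gauss(-8.0, tau ** -0.5)  # intercept
        a1 = rng.gauss(0.0, 1.0)           # slope
        draws.append((tau, a0, a1))
    return draws


draws = sample_priors(20_000)
a0s = [a0 for _, a0, _ in draws]
print(fmean(a0s), pvariance(a0s))  # near -8 and near E[1/tau] = 0.5
```

Under this reading, the marginal prior variance of $a_0$ is $E[\tau^{-1}] = 1/(3-1) = 0.5$, which a prior-predictive check like this one makes easy to confirm.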
We estimated the regression coefficients by sampling with the No-U-Turn Sampler (NUTS), using 8 chains with 2000 draws each after 1000 burn-in steps. Our models converged: the Gelman-Rubin statistic, $\hat{R}$, never exceeded 1.004 for any of the 12 models fit.
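For intuition about this convergence check, the classic Gelman-Rubin statistic compares between-chain variance with within-chain variance; values near 1 indicate that the chains agree. The sketch below implements that original, non-split form with the standard library. Modern samplers and diagnostic packages (for example ArviZ’s `rhat`) typically report a split, rank-normalized refinement, so this is not necessarily the exact variant behind the reported value of 1.004.

```python
import random
from statistics import mean, variance


def gelman_rubin(chains):
    """Classic (non-split) potential scale reduction factor, R-hat.

    chains: list of equal-length lists of draws for one parameter.
    """
    n = len(chains[0])                                       # draws per chain
    w = mean(variance(chain) for chain in chains)            # within-chain variance
    b_over_n = variance([mean(chain) for chain in chains])   # between-chain variance / n
    pooled = (n - 1) / n * w + b_over_n                      # pooled variance estimate
    return (pooled / w) ** 0.5


rng = random.Random(1)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(8)]
stuck = mixed[:7] + [[x + 3.0 for x in mixed[7]]]  # one chain stuck in another mode
print(round(gelman_rubin(mixed), 3), round(gelman_rubin(stuck), 3))
```

With 8 chains of 2000 draws, as in the text, well-mixed chains give a value very close to 1, while a single chain offset by a few standard deviations pushes R-hat well above common thresholds such as 1.01.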
Model posteriors and discussion.
In Fig 3, we show the fitted regressions for each category. The impact and attention variables vary over many orders of magnitude but also include zero values, corresponding to storms that caused no deaths or damage, or whose name hashtag had zero usage during the year the storm was active. It is not surprising that tropical storms appear to receive less attention via our hashtag usage rate measurement: they never officially become hurricanes, so many tropical storm hashtags have an integrated usage rate of $I = 0$.
To display all data, we use symmetric log axes: logarithmic for large values and linear for small values. We indicate the switch point from linear to log space axis as blue dotted lines. This choice of axes causes the linear regressions on the log transformed data to appear curved for small values.
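The symmetric log axis can be sketched as a simple transform: linear within a threshold of zero and logarithmic beyond it, joined continuously at the threshold. This is a simplified variant written for illustration (the threshold value and function name are assumptions); in Matplotlib a comparable axis is obtained with `ax.set_xscale("symlog", linthresh=...)`, although its transform is parameterized slightly differently.

```python
import math


def symlog(x, linthresh=1e-8):
    """Symmetric log transform: linear for |x| <= linthresh, log10 beyond.

    Scaled so both pieces equal 1 at x = linthresh, which keeps the curve
    continuous. linthresh = 1e-8 mirrors the attention offset (an assumption).
    """
    if abs(x) <= linthresh:
        return x / linthresh
    return math.copysign(1.0 + math.log10(abs(x) / linthresh), x)


for value in (0.0, 5e-9, 1e-8, 1e-6, 2.3e-3):
    print(value, symlog(value))
```

Because the transform is linear near zero, a line that is straight in log-log coordinates bends as it enters the linear region, which is why the regressions appear curved for small values.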
In each of the small subplots of Fig 3, we show the 1σ credible interval for the model as a band around the mean regression model. The credible interval is noticeably wider for category five storms, which is reasonable given that only six storms reached this category. Generally, the mean regression lines are ordered such that higher category storms receive more attention than lower category storms. The slopes of the regressions are also higher for higher category storms. However, to better understand the models, we need to compare the model parameters individually.
In Fig 4, we provide posterior distributions for model parameters, which show that, as expected, more intense storms receive more attention per unit of log impact than weaker storms. For category five storms, we find a mean regression coefficient of $a_{\mathrm{deaths}} = 1.35 \pm 0.39$, using the format $\mu \pm \sigma$, where μ is the mean and σ is the standard deviation, while for category one storms we find a mean regression coefficient of $a_{\mathrm{deaths}} = 0.61 \pm 0.18$. For a table of mean parameter values, see S1 Table in S1 File.
For the model $\log_{10} I \sim a_0 + a_1 X_i$, where $X_i$ is either the log number of deaths (A and C) or the log damages in dollars associated with the storm (B and D), and $\log_{10} I$ is the log integrated hashtag usage rate. The trend in regression coefficients for the association between log attention and log deaths suggests that higher category storms receive more attention per unit impact, while the trend in intercepts shows increasing baseline attention for a hypothetical minimally disruptive storm causing exactly one dollar in damages or one death. For regression coefficients relating log attention to log damages, category 4 and 5 storms receive more attention per unit increase in log damages than lower category storms. However, these coefficients are smaller in magnitude because damages vary across 7 orders of magnitude, whereas deaths vary over 4. There is larger uncertainty in the category 5 intercept, as only 6 storms of this intensity formed in the Atlantic basin between 2009 and 2019. At the right of each plot, we show the coefficients for the model fit to all hurricanes (blue violin), excluding tropical storms. Above each category, we show the mean of the posterior distribution for each parameter. For a table of mean parameter values, see S1 Table in S1 File.
Looking at associations between log damages and log attention, we find $a_{\mathrm{damages}} = 0.46 \pm 0.07$ for category 5 storms, while for category one storms we find $a_{\mathrm{damages}} = 0.17 \pm 0.05$.
To interpret a regression coefficient, $a_{\mathrm{impact}}$, as the proportional increase in attention per proportional increase in impact, we raise 10 to the power of the coefficient, since both variables are $\log_{10}$-transformed. Thus, our model shows that a 10-fold increase in deaths for a category 5 storm is associated with a 22-fold increase in attention, while for a category 1 storm the same 10-fold increase in deaths is associated with a 4-fold increase in attention.
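The conversion from coefficient to fold change follows directly from the base-10 log model. Ignoring the small offsets (a good approximation when the impact is much larger than them), multiplying an impact X by 10 shifts $\log_{10} X$ by exactly one unit, so the intercept cancels and only the slope remains:

$$
\frac{I(10X)}{I(X)} = \frac{10^{\,a_0 + a_1 \log_{10}(10X)}}{10^{\,a_0 + a_1 \log_{10} X}} = 10^{\,a_1},
\qquad
10^{1.35} \approx 22.4 \ (\text{category 5}),
\qquad
10^{0.61} \approx 4.1 \ (\text{category 1}).
$$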
The intercepts, \(a_0\), for higher category storms tend to be larger, meaning that for a theoretical minimally disruptive storm causing exactly $1 of damages or one death, a powerful storm would be talked about more, as shown in Fig 4. We believe this trend could continue for category 5 storms, but we have observed only n = 6 such storms for the duration of our attention dataset. We interpret the intercepts as indications of how much attention low-impact storms receive on average.
Fig 4 also shows a regression model fit to all hurricanes, relating log deaths to log attention. We find that a 10-fold increase in deaths is associated with a 14-fold increase in attention, since the mean value of . For damages, coefficients tend to be lower than those for deaths: . We interpret this coefficient as a 10-fold increase in damage being associated with no more than a 2-fold increase in attention.
Regression models for impacts, impact interactions and hurricane category
To better understand the scaling of attention with hurricane impacts, we fit a number of models on the log transformed data. We applied the same offsets as in the previous section to avoid non-finite log transformed data. We exclude tropical storms, since their attention is not captured in the same way by our string matching for hurricanes.
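A minimal sketch of this preprocessing, assuming each storm is a record with hypothetical keys and using hypothetical additive offsets of 1 (the actual offsets are those defined in the previous section and are not restated here). Tropical storms are dropped before the log transform:

```python
import math

# Hypothetical offsets for illustration; the paper reuses the offsets defined
# in its previous section, which are not restated here.
DEATH_OFFSET = 1.0
DAMAGE_OFFSET = 1.0


def log_impacts(storms, death_offset=DEATH_OFFSET, damage_offset=DAMAGE_OFFSET):
    """Return (name, X_death, X_damage) tuples for hurricanes only.

    Each storm is a dict with hypothetical keys 'name', 'category'
    (0 for a tropical storm, 1-5 for hurricanes), 'deaths' and 'damage_usd'.
    """
    rows = []
    for storm in storms:
        if storm["category"] < 1:
            continue  # tropical storms are excluded from these regressions
        # The offset keeps log10 finite for storms with zero deaths or damage.
        x_death = math.log10(storm["deaths"] + death_offset)
        x_damage = math.log10(storm["damage_usd"] + damage_offset)
        rows.append((storm["name"], x_death, x_damage))
    return rows
```

With offsets of 1, a hurricane with zero deaths maps to \(X_{\text{death}} = 0\); a different offset choice shifts what the intercept represents.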
We fit the regression model

\[ \log_{10} I \sim a_0 + a_{\text{death}} X_{\text{death}} + a_{\text{damage}} X_{\text{damage}}, \qquad (2) \]

where both predictors \(X\) are log impacts, which we refer to as regression 1. The regression coefficients can be interpreted as the increase in log attention received for every unit increase in log impact. Likewise, the intercept can be interpreted as the expected log attention for a minimally damaging storm causing one death and $1 of damage. This model differs from those in the previous section by including both log impacts in a single model, while not including the interaction term that later models add.
We set priors for the model as shown in S2 Table in S1 File. We chose the intercept prior, \(a_0 \sim \text{normal}(-8, 3)\), to be centered around −8, approximately the lowest log usage rate captured in our data, as we expect storms causing 1 death and $1 worth of damage to be talked about relatively little, but we wish to allow a wide range of uncertainty spanning a few orders of magnitude. We chose the priors for the regression coefficients, \(a_{\text{death}} \sim \text{normal}(0, 1)\) and \(a_{\text{damage}} \sim \text{normal}(0, 1)\), to be weakly informative and centered around zero, so as not to bias towards any association. We sampled the coefficients' posterior distributions using NUTS, with 8 chains of 2000 draws each after 500 steps of burn-in [63]. We found the model converged, with the maximum value of .
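For concreteness, the target density that a sampler such as NUTS explores for regression 1 can be written down from the stated priors. The sketch below is plain Python rather than any particular probabilistic programming library, and the normal likelihood with a fixed noise scale `sigma` is an assumption for illustration, since the noise model and its prior are not stated in this section:

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of a normal distribution with mean mu and std sigma."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def log_prior(a0, a_death, a_damage):
    """Priors stated for regression 1."""
    return (normal_logpdf(a0, -8.0, 3.0)
            + normal_logpdf(a_death, 0.0, 1.0)
            + normal_logpdf(a_damage, 0.0, 1.0))


def log_posterior(params, data, sigma=1.0):
    """Unnormalized log posterior for regression 1.

    `data` holds (x_death, x_damage, log10_I) triples. The normal likelihood
    and its fixed scale `sigma` are assumptions, not the paper's stated model.
    """
    a0, a_death, a_damage = params
    total = log_prior(a0, a_death, a_damage)
    for x_death, x_damage, y in data:
        mu = a0 + a_death * x_death + a_damage * x_damage
        total += normal_logpdf(y, mu, sigma)
    return total
```

A gradient-based sampler would also need the gradient of this function; the sketch only shows which terms enter the density.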
We show the posterior distributions of model parameters for regression 1 in Panel A of Fig 5; they show positive scaling of attention, as measured by the log hashtag usage rate, with both deaths and damages. We interpret the mean value of the regression constant, \(a_0 = -7.57 \pm 0.5\), as the expected log hashtag usage rate for a minimally destructive storm, i.e., in English tweets, the hashtag usage rate would integrate to \(10^{-7.57}\) over the season. We provide summary statistics in S3 Table in S1 File.
Plots A–C show posterior distributions for regression 1, plots D–G show distributions for regression 2, which adds an interaction term, and plots H–O show distributions for regression 3, which includes indicator variables for hurricane categories two through five. The addition of the interaction term, \(a_{d,D}\), increases the posterior variance of \(a_{\text{death}}\) and reduces its mean from \(a_{\text{death}} = 0.49\) in regression 1 to \(a_{\text{death}} = 0.05\) in regression 2 and \(a_{\text{death}} = 0.12\) in regression 3, suggesting that while the number of deaths is associated with increased attention, the attention response is primed by destruction. Additionally, the hurricane category indicator variables in regression 3 show the progressive increase in attention given to higher category storms compared to category 1 hurricanes.
At first glance, this level of attention seems remarkably low: if it occurred all in a single day, it would amount to fewer than 3 usages for every 100 million 1-grams. The most devastating storms can have integrated usage rates of \(I = 2.3 \times 10^{-3}\), about five orders of magnitude more attention than our regression constant. However, the least impactful storms affect relatively few people, while the most destructive storms significantly disrupt the lives of tens of millions, so differences of this scale in total hashtag usage rate are not unreasonable. See S8 Table in S1 File for measured values corresponding to each storm.
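The magnitudes above follow directly from the posterior mean intercept \(a_0 = -7.57\) and the integrated usage rate \(I = 2.3 \times 10^{-3}\) of the most devastating storms; a quick arithmetic check:

```python
import math

A0 = -7.57          # posterior mean intercept, regression 1
I_MAX = 2.3e-3      # integrated usage rate of the most devastating storms

i_min = 10 ** A0                          # minimal storm, linear scale
uses_per_100_million = i_min * 1e8        # if all usage fell on a single day
gap = math.log10(I_MAX) - A0              # orders of magnitude between the two

print(f"I = {i_min:.2e}, i.e. {uses_per_100_million:.1f} uses per 100 million 1-grams")
print(f"gap = {gap:.2f} orders of magnitude")
```

This gives roughly 2.7 usages per 100 million 1-grams and a gap of about 4.9 orders of magnitude.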
We find \(a_{\text{death}} \simeq 0.49\) and \(a_{\text{damage}} \simeq 0.24\). Because \(10^{0.24} \simeq 1.7\) and \(10^{0.49} \simeq 3.1\), in linear space a 10-fold increase in damages is associated with a 1.7-fold increase in hashtag usage rate, while a 10-fold increase in deaths is associated with a 3-fold increase.
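As a sketch of how the regression 1 posterior means combine, assuming the predictors are simply \(\log_{10}\) of deaths and of damage in dollars (ignoring the offsets), the predicted log usage rate for a hypothetical storm, and the effect of scaling both impacts together, can be computed as:

```python
A0, A_DEATH, A_DAMAGE = -7.57, 0.49, 0.24   # regression 1 posterior means


def predicted_log_attention(x_death, x_damage, a0=A0, a_death=A_DEATH, a_damage=A_DAMAGE):
    """Posterior-mean prediction of log10 I under regression 1."""
    return a0 + a_death * x_death + a_damage * x_damage


# Hypothetical storm: 10 deaths (X_death = 1) and $1 billion damage (X_damage = 9).
base = predicted_log_attention(1.0, 9.0)
# Scaling both deaths and damage by 10 adds 1 to each log predictor.
both_tenfold = predicted_log_attention(2.0, 10.0)

print(f"log10 I = {base:.2f}")
print(f"10x deaths and damage -> {10 ** (both_tenfold - base):.1f}x attention")
```

Since the model is additive in the logs, the joint multiplier is the product of the separate ones, \(10^{0.49 + 0.24} \approx 5.4\).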
Prior distributions for the intercept and main effect coefficients are unchanged from regression 1. We set the prior distribution for the interaction coefficient to \(a_{d,D} \sim \text{normal}(0, 1)\), a standard weakly informative prior for regression coefficients. All priors are shown in S4 Table in S1 File. We used the same fitting procedure as above, and found the model converged with a maximum value of
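The equation for regression 2 does not appear in this section. Assuming it extends regression 1 with a single product term weighted by the interaction coefficient \(a_{d,D}\), as the surrounding description suggests, a consistent form and the resulting death effect would be:

\[
\log_{10} I \sim a_0 + a_{\text{death}} X_{\text{death}} + a_{\text{damage}} X_{\text{damage}} + a_{d,D}\, X_{\text{death}} X_{\text{damage}},
\qquad
\frac{\partial \log_{10} I}{\partial X_{\text{death}}} = a_{\text{death}} + a_{d,D}\, X_{\text{damage}}.
\]

Under this form, \(a_{\text{death}}\) alone describes the death effect only where \(X_{\text{damage}} = 0\), i.e., for a storm causing $1 of damage; as damage grows, the interaction term adds to the attention gained per 10-fold increase in deaths.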
Here, the intercept is largely the same as in the simplest regression model. Interpreting \(a_{\text{death}}\) as the conditional relationship between log usage rate and log deaths when total damage is $1, \(a_{\text{death}} = 0.05\) implies that a 10-fold increase in deaths is associated with a 1.12-fold increase in hashtag usage rate, though zero lies within its posterior uncertainty. Similarly, \(a_{\text{damage}} = 0.22\) implies that a 10-fold increase in damage is associated with a 1.6-fold increase in hashtag usage rate. Finally, the interaction coefficient \(a_{d,D}\) is small but positive: a unit increase in the product \(X_{\text{death}} X_{\text{damage}}\) is associated with a 1.14-fold increase in hashtag usage rate. Notably, including the interaction term substantially reduces the regression coefficient associated with deaths, while the coefficient associated with damage is largely unchanged. We provide summary statistics in S5 Table in S1 File.
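To see how the interaction term reshapes the death effect, the sketch below assumes regression 2 adds a single product term \(a_{d,D} X_{\text{death}} X_{\text{damage}}\) to regression 1, with base-10 log predictors and the offsets ignored. It uses the reported \(a_{\text{death}} = 0.05\) and an approximate \(a_{d,D} = \log_{10} 1.14\) back-derived from the 1.14-fold figure (not the fitted posterior mean):

```python
import math

A_DEATH = 0.05             # reported regression 2 coefficient for log deaths
A_DD = math.log10(1.14)    # approximate, back-derived from the 1.14-fold figure


def tenfold_deaths_multiplier(damage_usd, a_death=A_DEATH, a_dd=A_DD):
    """Multiplier on I when deaths rise 10-fold, holding damage fixed.

    In the assumed model, raising X_death by 1 shifts log10 I by
    a_death + a_dd * X_damage, where X_damage = log10(damage_usd).
    """
    x_damage = math.log10(damage_usd)
    return 10 ** (a_death + a_dd * x_damage)


for damage in (1, 10**6, 10**9):
    print(f"${damage:,} damage: {tenfold_deaths_multiplier(damage):.2f}x")
```

Under these assumed values, the same 10-fold rise in deaths carries a larger attention multiplier for a billion-dollar storm than for a storm causing $1 of damage, which is the pattern interpreted in the following paragraph.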
This provides evidence that storms causing both many deaths and large damages are associated with higher volumes of attention, while a storm causing many deaths but relatively little damage will attract much less attention from Twitter users. One possible explanation is that attention is driven primarily by those directly affected, while Twitter users are not evenly distributed throughout the population: wealthy people are over-represented among Twitter users, and thus hurricanes that affect capital-poor regions also affect few Twitter users. A second explanation is that we performed this regression on data from English language tweets, so the attention paid to storms affecting Spanish speaking regions is underestimated.
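To better understand the effect of hurricane category on attention, we performed a regression including this categorical variable, modeled as

\[ \log_{10} I \sim a_0 + a_{\text{death}} X_{\text{death}} + a_{\text{damage}} X_{\text{damage}} + a_{d,D}\, X_{\text{death}} X_{\text{damage}} + \sum_{j=2}^{5} a_j C_j, \qquad (4) \]

where \(C_j\) indicates a category \(j\) hurricane and the index \(j\) runs from 2 to 5. We did not include a variable for category 1 hurricanes to avoid multicollinearity with the intercept. Fitting procedures were identical to above, and we found the model converged with the max value of .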
We kept the priors for all parameters carried over from the previous models, and set the coefficients for the category indicator variables to a weakly informative prior, . A Table of priors is shown in S6 Table in S1 File. Because hurricane categories are now included, the intercept \(a_0\) is the expected \(\log_{10} I\) for a category one hurricane that causes one death and $1 of damage. Its value is similar to those in the other regression models. Effect sizes for \(a_{\text{damage}}\) and \(a_{d,D}\) are slightly reduced in magnitude compared with the preceding regression.
As measured by the integrated hashtag usage rate, compared to a category 1 storm causing the same deaths and damages, hurricanes in:
- category 2 receive 1.14 times more attention,
- category 3 receive 1.5 times more attention,
- category 4 receive 5.6 times more attention,
- and category 5 receive 4.6 times more attention.
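Because category 1 is the reference level in regression 3, the indicator terms are the only part of the prediction that differs between two storms with identical deaths and damages, so each multiplier above equals \(10^{a_j}\) for the corresponding indicator coefficient. The sketch below back-derives approximate coefficients as \(\log_{10}\) of the reported multipliers (these are not the fitted posterior means) and uses hypothetical placeholder values for the remaining parameters, which cancel in the comparison:

```python
import math

# Multipliers over a category 1 storm, as listed in the text.
REPORTED_MULTIPLIERS = {2: 1.14, 3: 1.5, 4: 5.6, 5: 4.6}

# Approximate indicator coefficients a_j = log10(multiplier); NOT the fitted
# posterior means, just the values implied by the rounded multipliers.
CATEGORY_COEFS = {j: math.log10(m) for j, m in REPORTED_MULTIPLIERS.items()}


def category_indicators(category):
    """Dummy-code categories 2..5; category 1 is the reference (all zeros)."""
    return {j: 1.0 if category == j else 0.0 for j in range(2, 6)}


def log_attention(category, x_death, x_damage, params, cat_coefs=CATEGORY_COEFS):
    """Regression 3 linear predictor for log10 I."""
    value = (params["a0"]
             + params["a_death"] * x_death
             + params["a_damage"] * x_damage
             + params["a_dD"] * x_death * x_damage)
    for j, indicator in category_indicators(category).items():
        value += cat_coefs[j] * indicator
    return value


# Placeholder values (a_death = 0.12 as reported; others hypothetical).
PARAMS = {"a0": -7.5, "a_death": 0.12, "a_damage": 0.2, "a_dD": 0.05}

for cat in range(2, 6):
    ratio = 10 ** (log_attention(cat, 1.0, 9.0, PARAMS)
                   - log_attention(1, 1.0, 9.0, PARAMS))
    print(f"category {cat} vs 1: {ratio:.2f}x")
```

Omitting a category 1 indicator is what makes these ratios well defined: with an indicator for every category plus an intercept, the design matrix would be collinear.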
For each model presented in the paper. Subtable A refers to the regressions by category, while subtable B refers to the later sequential regression models. Each impact variable is presented as the expected increase in attention associated with a 10-fold increase in the variable of interest. For categorical variables, we report the expected multiplier for the given hurricane category over a category 1 storm. The means of the fitted posterior regression parameters are provided in the Appendix in S3, S5 and S7 Tables in S1 File.
We have explored the attention given to hurricanes as measured by hashtag and 2-gram usage rates. We quantify the relative volume of attention time series for major storms. We find evidence not only that more powerful storms (those with a higher maximum category rating) are talked about more than weaker storms, but also that they are talked about more when they inflict the same amount of damage or take the same number of lives. Further, different attention scaling relationships exist for different category storms. For the most intense (category 5) storms, we estimate that a 10-fold increase in deaths is associated with a 22-fold increase in attention, while for weaker storms the same proportional increase in deaths would lead to only a four-fold increase in attention on average.
How people outside of the government agencies and non-governmental organizations (NGOs) tasked with responding to natural disasters perceive the importance of disasters has real-world consequences [64, 65]. We hypothesize that monetary donations to NGOs that assist with hurricane disaster relief efforts are strongly associated with the amount of attention attracted by the hurricane. If this is true, it could be advantageous for NGOs to prospect for financial contributions within the narrow time window when collective attention is focused most strongly on a storm [66]. It is also possible that the speed and scale of governmental relief programs are influenced by popular attention paid to storms, and previous work has shown that relief has been inequitable in the past. Future work could compare the quantities of non-profit and governmental assistance with attention volume.
While Twitter users are certainly not representative of the world, or even of English speakers, measuring the text they generate comes closer to measuring the population at large than do published books or edited newspaper columns [67–71]. The digital signatures left behind by our collective online presence offer rich data for observational studies of everyday language with unprecedented time resolution. Of course, many tweets referencing hurricanes are authored by journalists or news organizations, and future efforts could attempt to disentangle the various motivations contributing to the overall usage rate of hashtags and other n-grams.
Another limitation of our work, particularly relevant to any geospatial findings, is that we only consider tweets classified as English. While the density of English speakers closely mirrors the population density for much of the United States, we observe much lower usage rates for the English language hashtags and 2-grams over predominantly Spanish speaking areas. While different populations may use different n-grams to reference the same storm, for the purposes of our study we have focused only on the English-speaking population of Twitter.
Future work could consider how to better quantify the total fraction of conversation on Twitter focused on a storm or event of interest. Our current method counts only individual n-grams; we believe these counts act as a proxy for total attention, but they almost certainly underestimate the total fraction of text devoted to discussing a topic. Hashtag co-occurrence network-based methods could help to identify the most prominent hashtags associated with a given storm, or any event of interest, and to classify tweets as relevant. Examining how properties of this network change over time, such as the integrated usage rate of all significant hashtags within one degree, could give a less biased view of the total attention surrounding the hurricane than our current method. Other dynamics of hurricanes could be explored in this way, perhaps by encoding Jensen-Shannon divergence shifts between hashtags as a node attribute [72], or more simply by tracking how the most frequently used hashtags in this ego network change in rank over time as different phases of the storm occur. With better data coverage for infrequently used hashtags, the effect of new storms on the attention paid to historical storms could be studied using a measure similar to view flow [73]. Authors of previous works studying the effectiveness of NGO hashtag usage following natural disasters could exploit these network-based methods [74].
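As a hedged illustration of the network-based idea, the sketch below builds hashtag co-occurrence counts from tweets (each given as a list of hashtags), extracts the one-degree ego network of a seed hashtag, and sums the counts of the seed and its neighbors into a broader usage-rate proxy. The edge-weight threshold `min_weight`, the denominator `total_ngrams`, and the choice to simply sum counts (which counts a tweet once per matching hashtag) are assumptions, not a method from this paper:

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tweets):
    """Count unordered hashtag pairs appearing in the same tweet."""
    pairs = Counter()
    for tags in tweets:
        unique = sorted({t.lower() for t in tags})
        for a, b in combinations(unique, 2):
            pairs[(a, b)] += 1
    return pairs


def ego_network(seed, pairs, min_weight=2):
    """Hashtags within one degree of `seed`, keeping edges with weight >= min_weight."""
    seed = seed.lower()
    neighbors = {}
    for (a, b), w in pairs.items():
        if w < min_weight:
            continue
        if a == seed:
            neighbors[b] = w
        elif b == seed:
            neighbors[a] = w
    return neighbors


def ego_usage_rate(seed, neighbors, counts, total_ngrams):
    """Summed usage rate of the seed hashtag and its one-degree neighbors."""
    tags = {seed.lower(), *neighbors}
    return sum(counts.get(t, 0) for t in tags) / total_ngrams
```

Tracking `ego_network` day by day would show how the set and rank of associated hashtags shift as a storm progresses.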
The authors are grateful for the computing resources provided by the Vermont Advanced Computing Core.
- 1. Shiller RJ. Narrative Economics. American Economic Review. 2017;107(4):967–1004.
- 2. Shiller RJ. Narrative Economics: How Stories Go Viral and Drive Major Economic Events. Princeton University Press; 2019. Available from: https://books.google.com/books?id=Y62SDwAAQBAJ.
- 3. Leskovec J, Backstrom L, Kleinberg J. Meme-tracking and the dynamics of the news cycle. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining; 2009. p. 497–506.
- 4. Tufekci Z. “Not this one” social movements, the attention economy, and microcelebrity networked activism. American Behavioral Scientist. 2013;57(7):848–870.
- 5. Franck G. The economy of attention. Journal of Sociology. 2019;55(1):8–19.
- 6. Humphreys A, Kozinets RV. The Construction of Value in Attention Economies. ACR North American Advances. 2009;.
- 7. Citton Y. The ecology of attention. John Wiley & Sons; 2017.
- 8. Franck G. Scientific Communication–A Vanity Fair? Science. 1999;286(5437):53–55.
- 9. Nowak A, Kacprzyk-Murawska M, Serwotka E. Social psychology and the narrative economy. In: Non-Equilibrium Social Science and Policy. Springer, Cham; 2017. p. 45–58.
- 10. IAB Internet Advertising Revenue Report; 2018. Available from: https://www.iab.com/insights/iab-internet-advertising-revenue-report-2018-full-year-results/.
- 11. Newman N. The rise of social media and its impact on mainstream journalism. Working Paper. 2009;.
- 12. Perrin A. Social media usage. Pew Research Center. 2015; p. 52–68.
- 13. Michel JB, Shen YK, Aiden AP, Veres A, Gray MK, The Google Books Team, et al. Quantitative Analysis of Culture Using Millions of Digitized Books. Science. 2011;331(6014):176–182. pmid:21163965
- 14. Pechenick EA, Danforth CM, Dodds PS. Characterizing the Google Books corpus: Strong limits to inferences of socio-cultural and linguistic evolution. PLoS ONE. 2015;10:e0137041.
- 15. Gallagher RJ, Reagan AJ, Danforth CM, Dodds PS. Divergent discourse between protests and counter-protests: #BlackLivesMatter and #AllLivesMatter. PLOS ONE. 2018;13(4):e0195644. pmid:29668754
- 16. NATO StratCom Centre of Excellence. Internet Trolling as a Hybrid Warfare Tool: The Case of Latvia; 2015.
- 17. Colleoni E, Rozza A, Arvidsson A. Echo Chamber or Public Sphere? Predicting Political Orientation and Measuring Political Homophily in Twitter Using Big Data. Journal of Communication. 2014;64(2):317–332.
- 18. Gruzd A, Roy J. Investigating Political Polarization on Twitter: A Canadian Perspective. Policy & Internet. 2014;6(1):28–45.
- 19. Barberá P, Rivero G. Understanding the Political Representativeness of Twitter Users. Social Science Computer Review. 2015;33(6):712–729.
- 20. Broniatowski DA, Jamison AM, Qi S, AlKulaib L, Chen T, Benton A, et al. Weaponized Health Communication: Twitter Bots and Russian Trolls Amplify the Vaccine Debate. American Journal of Public Health. 2018;108(10):1378–1384. pmid:30138075
- 21. Subrahmanian VS, Azaria A, Durst S, Kagan V, Galstyan A, Lerman K, et al. The DARPA Twitter Bot Challenge. Computer. 2016;49(6):38–46.
- 22. Salinas S. Twitter stock plunges as company blames ad targeting problems for earnings miss. CNBC. 2019;.
- 23. Shahbazi B. StoryMiner: An Automated and Scalable Framework for Story Analysis and Detection from Social Media. UCLA; 2019.
- 24. Dodds PS, Minot JR, Arnold MV, Alshaabi T, Adams JL, Dewhurst DR, et al. Fame and Ultrafame: Measuring and comparing daily levels of ‘being talked about’ for United States’ presidents, their rivals, God, countries, and K-pop. arXiv.org. 2019;.
- 25. Alshaabi T, Minot JR, Arnold MV, Adams JL, Dewhurst DR, Reagan AJ, et al. How the world’s collective attention is being paid to a pandemic: COVID-19 related 1-gram time series for 24 languages on Twitter. arXiv preprint arXiv:2003.12614. 2020;.
- 26. Alshaabi T, Adams JL, Arnold MV, Minot JR, Dewhurst DR, Reagan AJ, et al. Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter. arXiv preprint arXiv:2007.12988. 2020;.
- 27. Dorogovtsev SN, Mendes JFF. Evolution of networks with aging of sites. Physical Review E. 2000;62(2):1842–1845.
- 28. Golosovsky M, Solomon S. Stochastic Dynamical Model of a Growing Citation Network Based on a Self-Exciting Point Process. Physical Review Letters. 2012;109(9):098701.
- 29. Valverde S, Solé RV, Bedau MA, Packard N. Topology and evolution of technology innovation networks. Physical Review E. 2007;76(5):056118.
- 30. Higham KW, Governale M, Jaffe AB, Zülicke U. Unraveling the dynamics of growth, aging and inflation for citations to scientific articles from specific research fields. Journal of Informetrics. 2017;11(4):1190–1200.
- 31. Higham KW, Governale M, Jaffe AB, Zülicke U. Fame and obsolescence: Disentangling growth and aging dynamics of patent citations. Physical Review E. 2017;95(4):042309.
- 32. Wang D, Song C, Barabási AL. Quantifying Long-Term Scientific Impact. Science. 2013;342(6154):127–132.
- 33. Candia C, Jara-Figueroa C, Rodriguez-Sickert C, Barabási AL, Hidalgo CA. The universal decay of collective memory and attention. Nature Human Behaviour. 2019;3(1):82–91.
- 34. Lorenz-Spreen P, Mønsted BM, Hövel P, Lehmann S. Accelerating dynamics of collective attention. Nature Communications. 2019;10(1):1–9.
- 35. Crane R, Sornette D. Robust dynamic classes revealed by measuring the response function of a social system. Proceedings of the National Academy of Sciences. 2008;105(41):15649–15653.
- 36. Lehmann J, Gonçalves B, Ramasco JJ, Cattuto C. Dynamical classes of collective attention in Twitter. New York, New York, USA: ACM; 2012.
- 37. Wu F, Huberman BA. Novelty and collective attention. Proceedings of the National Academy of Sciences. 2007;104(45):17599–17601.
- 38. Kiley D. Characterizing the Shapes of Collective Attention using Social Media. The University of Vermont; 2014.
- 39. Ladle RJ, Correia RA, Do Y, Joo GJ, Malhado AC, Proulx R, et al. Conservation culturomics. Frontiers in Ecology and the Environment. 2016;14(5):269–275.
- 40. Eom YH, Puliga M, Smailović J, Mozetič I, Caldarelli G. Twitter-based analysis of the dynamics of collective attention to political parties. PLOS ONE. 2015;10(7):e0131184.
- 41. Mitra T, Wright G, Gilbert E. Credibility and the dynamics of collective attention. Proceedings of the ACM on Human-Computer Interaction. 2017;1(CSCW):1–17.
- 42. Sasahara K, Hirata Y, Toyoda M, Kitsuregawa M, Aihara K. Quantifying Collective Attention from Tweet Stream. PLOS ONE. 2013;8(4):e61823.
- 43. Sakaki T, Okazaki M, Matsuo Y. Earthquake Shakes Twitter Users: Real-Time Event Detection by Social Sensors. In: Proceedings of the 19th International Conference on World Wide Web. WWW’10. New York, NY, USA: Association for Computing Machinery; 2010. p. 851–860. Available from: https://doi.org/10.1145/1772690.1772777.
- 44. Eisensee T, Strömberg D. News droughts, news floods, and US disaster relief. The Quarterly Journal of Economics. 2007;.
- 45. Willison CE, Singer PM, Creary MS, Greer SL. Quantifying inequities in US federal response to hurricane disaster in Texas and Florida compared with Puerto Rico. BMJ Global Health. 2019;4(1):e001191.
- 46. Allen DE, McAleer M. President Trump tweets supreme leader Kim Jong-Un on nuclear weapons: A comparison with climate change. Sustainability. 2018;10(7):2310.
- 47. Niles MT, Emery BF, Reagan AJ, Dodds PS, Danforth CM. Social media usage patterns during natural hazards. PLOS ONE. 2019;14(2):1–16.
- 48. Cody EM, Stephens JC, Bagrow JP, Dodds PS, Danforth CM. Transitions in climate and energy discourse between Hurricanes Katrina and Sandy. Journal of Environmental Studies and Sciences. 2017;7(1):87–101.
- 49. Ahmed MA, Sadri AM, Pradhananga P. Social Media Communication Patterns of Construction Industry in Major Disasters. Pre-print. 2020;.
- 50. Martín Y, Li Z, Cutter SL. Leveraging Twitter to gauge evacuation compliance: Spatiotemporal analysis of Hurricane Matthew. PLOS ONE. 2017;12(7):1–22.
- 51. Wang Z, Lam NSN, Obradovich N, Ye X. Are vulnerable communities digitally left behind in social responses to natural disasters? An evidence from Hurricane Sandy with Twitter data. Applied Geography. 2019;108:1–8. https://doi.org/10.1016/j.apgeog.2019.05.001
- 52. Li Q, Shah S, Thomas M, Anderson K, Liu X, Nourbakhsh A, et al. How Much Data Do You Need? Twitter Decahose Data Analysis; 2016.
- 53. Joulin A, Grave E, Bojanowski P, Mikolov T. Bag of Tricks for Efficient Text Classification. arXiv.org. 2016;.
- 54. Alshaabi T, Dewhurst DR, Minot JR, Arnold MV, Adams JL, Danforth CM, et al. The growing echo chamber of social media: Measuring temporal and social contagion dynamics for over 150 languages on Twitter for 2009–2020. arXiv preprint arXiv:2003.03667. 2020;.
- 55. Wikipedia contributors. 2010 Atlantic hurricane season—Wikipedia, The Free Encyclopedia; 2019. Available from: https://en.wikipedia.org/w/index.php?title=2010_Atlantic_hurricane_season&oldid=895399672.
- 56. Weinkle J, Landsea C, Collins D, Musulin R, Crompton RP, Klotzbach PJ, et al. Normalized hurricane damage in the continental United States 1900–2017. Nature Sustainability. 2018;1(12):808–813.
- 57. Scott M. Hurricane Maria’s devastation of Puerto Rico; 2020. Available from: https://www.climate.gov/news-features/understanding-climate/hurricane-marias-devastation-puerto-rico.
- 58. US Census Bureau. Hurricanes; 2019. Available from: https://www.census.gov/topics/preparedness/events/hurricanes.2017.html.
- 59. Liu WY, Wang BW, Yu JX, Li F, Wang SX, Hong WX. Visualization classification method of multi-dimensional data based on radar chart mapping. In: 2008 International Conference on Machine Learning and Cybernetics. vol. 2; 2008. p. 857–862.
- 60. Román MO, Stokes EC, Shrestha R, Wang Z, Schultz L, Carlo EAS, et al. Satellite-based assessment of electricity restoration efforts in Puerto Rico after Hurricane Maria. PLOS ONE. 2019;14(6). pmid:31251791
- 61. Zorrilla CD. The View from Puerto Rico—Hurricane Maria and Its Aftermath. New England Journal of Medicine. 2017;377(19):1801–1803.
- 62. Taylor HT, Ward B, Willis M, Zaleski W. The Saffir-Simpson Hurricane Wind Scale. National Oceanic and Atmospheric Administration: Washington DC, USA. 2010;.
- 63. Hoffman MD, Gelman A. The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research. 2014;15(1):1593–1623.
- 64. Miller LM. Collective disaster responses to Katrina and Rita: Exploring therapeutic community, social capital and social control. Southern Rural Sociology. 2007;22(2):45–63.
- 65. Burnside R, Miller DS, Rivera JD. The impact of information and risk perception on the hurricane evacuation decision-making of greater New Orleans residents. Sociological Spectrum. 2007;27(6):727–740.
- 66. Halloran M. Analysis Finds Disaster Relief Support Swift But Short, Recurring Donors Crucial | Classy; 2018. Available from: https://www.classy.org/blog/analysis-disaster-relief-support-recurring-donors-crucial.
- 67. Mislove A, Lehmann S, Ahn YY, Onnela JP. Understanding the Demographics of Twitter Users. ICWSM, aaai.org. 2011;.
- 68. Java A, Song X, Finin T, Tseng B. Why we twitter. In: the 9th WebKDD and 1st SNA-KDD 2007 workshop. New York, New York, USA: ACM Press; 2007. p. 56–65.
- 69. Housley W, Procter R, Edwards A, Burnap P, Williams M, Sloan L, et al. Big and broad social data and the sociological imagination: A collaborative response. Big Data & Society. 2014;1(2):205395171454513.
- 70. Sloan L, Morgan J, Burnap P, Williams M. Who Tweets? Deriving the Demographic Characteristics of Age, Occupation and Social Class from Twitter User Meta-Data. PLOS ONE. 2015;10(3):e0115545.
- 71. Wojcik S, Hughes A. How Twitter Users Compare to the General Public; 2019. Available from: https://www.pewresearch.org/internet/2019/04/24/sizing-up-twitter-users.
- 72. Dodds PS, Minot JR, Arnold MV, Alshaabi T, Adams JL, Dewhurst DR, et al. Allotaxonometry and rank-turbulence divergence: A universal instrument for comparing complex systems. arXiv preprint arXiv:2002.09770. 2020;.
- 73. Garcia-Gavilanes R, Mollgaard A, Tsvetkova M, Yasseri T. The memory remains: Understanding collective memory in the digital age. Science Advances. 2017;3(4):e1602368.
- 74. Wukich C, Steinberg A. Nonprofit and Public Sector Participation in Self-Organizing Information Networks: Twitter Hashtag and Trending Topic Use During Disasters. Risk, Hazards & Crisis in Public Policy. 2013;4(2):83–109.