Performance of Social Network Sensors during Hurricane Sandy

  • Yury Kryvasheyeu,

    Affiliation: National Information and Communications Technology Australia, Melbourne, Victoria, Australia

  • Haohui Chen,

    Affiliation: National Information and Communications Technology Australia, Melbourne, Victoria, Australia

  • Esteban Moro,

    Affiliations: Department of Mathematics & GISC, Universidad Carlos III de Madrid, Leganés, Spain, Instituto de Ingeniería del Conocimiento, Universidad Autónoma de Madrid, Madrid, Spain

  • Pascal Van Hentenryck,

    Affiliations: National Information and Communications Technology Australia, Melbourne, Victoria, Australia, Research School of Computer Science, Australian National University, Canberra, Australian Capital Territory, Australia

  • Manuel Cebrian

    manuel.cebrian@nicta.com.au

    Affiliations: National Information and Communications Technology Australia, Melbourne, Victoria, Australia, Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, United States of America

Abstract

Information flow during catastrophic events is a critical aspect of disaster management. Modern communication platforms, in particular online social networks, provide an opportunity to study such flow and derive early-warning sensors, thus improving emergency preparedness and response. Performance of the social networks sensor method, based on topological and behavioral properties derived from the “friendship paradox”, is studied here for over 50 million Twitter messages posted before, during, and after Hurricane Sandy. We find that differences in users’ network centrality effectively translate into moderate awareness advantage (up to 26 hours); and that geo-location of users within or outside of the hurricane-affected area plays a significant role in determining the scale of such an advantage. Emotional response appears to be universal regardless of the position in the network topology, and displays characteristic, easily detectable patterns, opening a possibility to implement a simple “sentiment sensing” technique that can detect and locate disasters.

Introduction

Natural, man-made and technological disasters present a constant threat to society [1]. The increased frequency, intensity and impact of such events are often attributed to the effects of climate change [2–4]. Consensus is growing that the likelihood and potential damage of natural disasters in the future will rise, and there is a need to adequately prepare for their consequences [5–8]. An integral part of such preparation efforts is an understanding of the information flow during disasters in order to derive early-warning sensors, track public awareness, gather emergency and relief information and predict human behavior, such as escape panic [9–11]. Conveniently, online social media, like Twitter and Facebook, have matured into prominent communication platforms and provide an unprecedented opportunity to record and analyze vast amounts of information [12]. The potential of these networks is already leveraged during natural disasters [13–17], with applications in situation awareness [18,19], event detection [20,21], search/locating persons [22,23] or helping through crowdsourcing initiatives [24,25].

One particular network phenomenon, the “friendship paradox”, may increase the efficiency of monitoring disaster-related information. The paradox was studied in a seminal work by Feld [26] and is known colloquially as “your friends have more friends than you do”. Feld showed that a node in a social network has, on average, fewer links than its friends have on average. This occurs because well-connected nodes are included multiple times in a set of “friends-of-friends”, therefore boosting the corresponding average. The original paradox and its strong form (formulated for median rather than mean averaging) were observed experimentally in the context of online social networks [27,28] and networks of co-authorships and citations [29]. Although the initial focus of Feld’s study was on psychological implications of the paradox (e.g. potential perception of inadequate social inclusion), his finding inspired a simple technique of forming a sample group with network centrality above what random sampling allows. This can be achieved without any global knowledge of network topology, just by using friends of randomly selected people instead of those people themselves. Because centrality often correlates with other attributes, such as activity, popularity, health or income [27,29], friends are exposed earlier to a contagion [30,31] or to information that propagates through the network [32].
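The sampling idea can be illustrated on a toy network. The following is a minimal sketch, not the procedure used in this study: the graph, the helper names (`mean_degree`, `mean_friend_degree`, `sample_sensor_group`) and the rule of picking one not-yet-chosen friend per control user are all assumptions made for illustration.

```python
import random
from statistics import mean

def mean_degree(friends, group):
    """Average number of links over the users in `group`."""
    return mean(len(friends[u]) for u in group)

def mean_friend_degree(friends):
    """Average, over all users, of the mean degree of each user's friends."""
    return mean(mean(len(friends[f]) for f in friends[u]) for u in friends)

def sample_sensor_group(friends, control, rng):
    """For each control user, pick one random friend that was not picked before."""
    chosen, sensor = set(), []
    for user in control:
        candidates = [f for f in friends[user] if f not in chosen]
        if candidates:
            pick = rng.choice(candidates)
            chosen.add(pick)
            sensor.append(pick)
    return sensor

# Hypothetical toy network: two connected hubs (0 and 1), each with five leaves.
friends = {0: [1, 2, 3, 4, 5, 6], 1: [0, 7, 8, 9, 10, 11]}
for leaf in range(2, 7):
    friends[leaf] = [0]
for leaf in range(7, 12):
    friends[leaf] = [1]

rng = random.Random(42)
control = rng.sample(sorted(friends), 4)          # random control group
sensor = sample_sensor_group(friends, control, rng)  # friends of the control group

print("mean degree, all users:      ", mean_degree(friends, friends))
print("mean degree of friends:      ", mean_friend_degree(friends))
print("mean degree, control/sensor: ",
      mean_degree(friends, control), mean_degree(friends, sensor))
```

Only the local friend lists of the sampled users are consulted, which is what makes the technique usable without global knowledge of the topology.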

In a disaster, the ability to implement efficient and early detection of emergency information is extremely valuable, and a sensor technique based on the “friendship paradox” is attractive for that purpose. Existing experimental validations of the sensor method [30,32] confirm its applicability for endogenous processes, when the spread of contagion or exchange of information occurs only between the nodes of a network. In contrast to such a spread of infection or social network memes, information about disasters is carried simultaneously by many other external channels. Considering this complex interplay of exogenous and endogenous propagation modes, and factoring in the speed, scale and strong geographical nature of phenomena such as hurricanes, it is not immediately obvious whether the sensor method would perform reliably in a disaster. To address this central question, we study the performance of the sensor method during Hurricane Sandy to establish whether there is an early awareness advantage, what its magnitude is, and how the geographical location of users affects it. Finally, while there is evidence that centrality correlates with measurable attributes [27–29], it is unclear if there is an underlying correlation with personality or behavior traits. To study this, we employ sentiment analysis to explore differences between random control groups and corresponding sensor groups in terms of the timeliness and magnitude of their emotional response.

Data and Methods

Context of the research: Hurricane Sandy and its digital traces on Twitter

The disaster event at the center of our case study is Hurricane Sandy, the largest hurricane of the 2012 season and one of the costliest disasters in the history of the United States. Sandy was a late season hurricane that formed on October 22, 2012 at 12:00 UTC about 500 km south-southwest of Kingston, Jamaica. It made its first landfall in Jamaica at 19:00 UTC on October 24 as a Category 1 hurricane, then as a Category 3 hurricane in Cuba at 05:30 UTC on October 25, subsequently weakening to Category 1 as it moved through the Bahamas. It continued to grow in size and, while moving northeast along the United States coast, re-intensified to maximum wind speeds of 85 knots at 12:00 UTC on October 29, about 350 km southeast of Atlantic City. Later that day the hurricane weakened to a post-tropical storm and made landfall at 23:30 UTC on October 29 near Brigantine in New Jersey. At the time of landfall the wind reached 70 knots and the storm surge was as high as 3.85 meters, with prevalent levels between 0.8 and 2.8 meters along the coast of New Jersey and New York. The storm surge was responsible for most of the damage to houses, totaling up to 650,000 destroyed or damaged buildings. Over 8.5 million people were affected by power losses that lasted for weeks in some areas. According to the National Hurricane Center report [33], Sandy caused 147 direct casualties along its path and brought damage in excess of $50 billion for the United States.

Raw datasets

Hurricane Sandy attracted extensive media coverage, both in traditional broadcasting media and online. In our work we look at digital traces of the hurricane on Twitter. The raw data collected for analysis comprise two principal sets of Twitter messages. The first one consists of messages with the hashtag “#sandy” posted between October 15 and November 12, a period that precedes the formation of the hurricane and extends beyond its landfall in the continental United States. The data include the text of messages and a range of additional information, such as message identifiers, user identifiers, followee counts, re-tweet statuses, self-reported or automatically detected locations, timestamps, and sentiment scores. The second dataset has a similar structure and is collected within the same timeframe; however, instead of a hashtag, it includes all messages that contain one or more instances of specific keywords, deemed to be relevant to the event and its consequences (“sandy”, “hurricane”, “storm”, “superstorm”, “flooding”, “blackout”, “gas”, “power”, “weather”, “climate”, etc.). The full list of keywords used to build up the dataset is provided in S1 Table. Both datasets were obtained through the analytics company Topsy Labs [34], and relationship graphs (list of followees for each user) were reconstructed using the Twitter API. In total there are 52.55 million messages from 13.75 million unique users available for analysis.

Location data and geocoding

Raw data is filtered to include only those messages that contain location information. Since only a minor fraction of the messages (about 1.2% for the hashtag dataset and 1.5% for the keywords dataset) are geo-tagged by Twitter, we attempt to extract additional information from incomplete self-reported data in user profiles. Such data includes self-identification of a country, state, province, city or town, or any arbitrary combination of those items. We analyze only profile data, instead of searching for location-specific text within messages, to avoid ambiguity of dealing with the context of such in-text location mentions (hypothetical, past or future travels, messages about other people, abstract mentions of various places, etc.). After cross-checking partial location information against coordinates of all major administrative regions and cities worldwide, 46% of messages (or 43% of users) were encoded with location data. Precision of geocoding varies between the exact latitude and longitude of a user, as recorded by Twitter, and the coordinates of the center of an administrative unit that returned a match to a self-reported location. The rate of location detection compares favorably with other studies, e.g. 6.6% detection rate in Mislove et al. [35], where Google Maps API interpretation of self-reported location strings was used to obtain coordinates.

We further filter for users from the United States and Canada, to reduce variations in time zones and languages of tweets. Hurricane path and extent data, shown in Fig. 1, are utilized to distinguish users based on whether they were directly affected by Sandy. After filtering for geocoded messages the hashtag dataset includes 3.65 million messages from 1.24 million users (25.6% of them directly affected by the hurricane) and the keyword dataset includes 24.15 million messages from 5.98 million users (14.1% in the affected region).

Fig 1. The track and approximate extent of Hurricane Sandy, combined with the heatmap of geolocated tweets density.

The path of the hurricane from the moment of its formation until dissipation is accompanied by the approximate extent of the hurricane force winds. Three threshold levels distinguished by shading correspond to the hurricane forces between Categories 1 and 3 (34, 50 and 64 knots respectively). An outer extent of the Category 1 winds is outlined in red and serves as a border of the area directly affected by the hurricane.

http://dx.doi.org/10.1371/journal.pone.0117288.g001

Relevance filtering

The last filter that we impose on the raw data is content analysis to ensure that a message is relevant to Hurricane Sandy. Our study of the awareness advantage relies on the time of the first hurricane-related tweet for each user, which must be determined as correctly as possible. Potential issues may arise equally from data incompleteness or excess. Incomplete data is a problem for the hashtag-based dataset, because hashtags are not used systematically in every message and some (or even all) relevant messages may be overlooked. In our case, the hashtag dataset includes 3.65 million messages, but the same users within the same period of time are represented by 11.07 million messages in the extended dataset. Although some additional messages are not related to the hurricane, many are, and must be included in the analysis. To avoid “false positives” (messages with no relevance to Hurricane Sandy) we implement simple filtering described below.

The evolution of Hurricane Sandy provides a convenient frame of reference in order to check the relevance. Since the hurricane was first classified as such and officially assigned its name on October 22, any keyword with a significant level of activity before that date should be filtered out to avoid inclusion of unrelated information. S1 and S2 Figs. summarize the histograms of messages matching specific keywords and demonstrate that the majority of them suffer from the irrelevance noise, i.e. have a significant level of activity prior to October 22. Regrettably, certain keywords of interest (“storm”, “power” and “gas”) are contaminated by irrelevant messages simply due to their general nature and/or multiple meanings: for instance “storm” is mentioned in messages about small scale local weather events and “power” is used not just in the context of post-hurricane power outages, but also in the context of politics against a backdrop of the presidential election campaign. To eliminate noise from the datasets, only messages with a word “sandy”, either in a hashtag or keyword form, were included in the analysis. The effect of filtering is demonstrated in Fig. 2, which compares histograms for messages without filtering, moderate filtering (words include “sandy”, “storm”, “hurricane”, “huracán”, “superstorm” and “frankenstorm”), and strict filtering (“sandy”). The results indicate that only the strict filtering succeeds in suppressing noise messages prior to the formation of the hurricane. Relevance filtering brings the total volume of data down to 4.51 million messages from 1.39 million unique users.
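The two filtering levels can be sketched as simple keyword matchers. This is an illustrative sketch only: the word lists are taken from the text above, but whole-word, case-insensitive matching and the helper name `make_filter` are assumptions, since the exact matching rules are not specified here.

```python
import re

# Word lists as given in the text; the matching rule itself is an assumption.
MODERATE_WORDS = ["sandy", "storm", "hurricane", "huracán", "superstorm", "frankenstorm"]
STRICT_WORDS = ["sandy"]

def make_filter(words):
    """Return a predicate that is True if a message contains any of `words`.

    Matching is case-insensitive and on whole words, so "#Sandy" matches
    "sandy" (the "#" is not a word character) but "sandstorm" does not.
    """
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    return lambda text: bool(pattern.search(text))

is_moderate = make_filter(MODERATE_WORDS)
is_strict = make_filter(STRICT_WORDS)

# Hypothetical example messages.
messages = [
    "Stay safe everyone #Sandy",
    "Big storm tonight, bring an umbrella",
    "Who holds the power after the debate?",
]
for text in messages:
    print(f"strict={is_strict(text)!s:5} moderate={is_moderate(text)!s:5} {text}")
```

A purely lexical filter of this kind cannot tell the hurricane from other uses of the word “sandy”; the histogram check against the October 22 naming date is what indicates how much such noise remains.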

Fig 2. Effect of different levels of content filtering.

Histograms show the number of messages over time (horizontal axis represents an offset in hours with respect to the hurricane landing time at 00:00 UTC on October 30 2012) for three different levels of filtering. The filtering is implemented to discard messages that have hurricane-related keywords, but were generated prior to October 22 2012, when Hurricane Sandy was officially named (approximately at -200 hours on the time axis). Strong filtering avoids this early noise of irrelevant messages that may skew the estimate of an entry time. The most reliable form of such filtering is achieved by including only those messages that have “sandy” as part of a text or as a hashtag.

http://dx.doi.org/10.1371/journal.pone.0117288.g002

Results

Lead-time in awareness

Arguably the most important aspect of the information flow during a disaster is awareness. Given the limitations of our dataset and the lack of retrospective studies into the link between disaster awareness and patterns of online activity, we assume that the time at which a person becomes aware of the hurricane and the time at which they first tweet about it are close to each other. To evaluate performance of the sensor method, we focus on the entry time t, defined as the time a user first appears in our dataset by posting a message relevant to Hurricane Sandy. Following the conventional terminology, we call an original random sample a control group, and a group formed from their friends a sensor group. Let us define the lead-time as the difference between the average entry times of the sensor group and its corresponding control group: $\Delta t = \langle t \rangle_{\mathrm{sensor}} - \langle t \rangle_{\mathrm{control}}$, with negative lead-times indicating awareness advantage. We estimate lead-times of sensor over control groups across the range of sample sizes from 500 to 100,000 users. Control groups are formed by random selection from the pool of users with known geographical location. Sensor groups are formed from the friends (followees) of users in their corresponding control groups; the two groups are of the same size and without user duplication. In all the analysis reported below and in the horizontal axes of figures, times are given as offsets in hours with respect to 00:00 UTC on October 30, which is approximately the time of Sandy’s landfall on the continental United States.
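The entry time and lead-time definitions translate directly into a few lines of code. The sketch below is illustrative: the function names, the `(user, time)` tuple layout and the toy data are assumptions, and it presumes that every group member has at least one relevant message.

```python
from statistics import mean

def entry_times(messages):
    """Map each user to the time of their first relevant message.

    `messages` is an iterable of (user, t) pairs, with t in hours relative
    to the landfall reference time (negative values are before landfall).
    """
    first = {}
    for user, t in messages:
        first[user] = min(t, first.get(user, t))
    return first

def lead_time(entry, control, sensor):
    """Mean entry time of the sensor group minus that of the control group.

    A negative value means the sensor group entered the dataset earlier.
    """
    return mean(entry[u] for u in sensor) - mean(entry[u] for u in control)

# Hypothetical toy data: (user, hours relative to landfall).
messages = [("a", -30.0), ("a", -5.0), ("b", 10.0), ("c", -50.0), ("d", -20.0)]
entry = entry_times(messages)
print(lead_time(entry, control=["a", "b"], sensor=["c", "d"]))  # -25.0
```

In this toy case the control group enters on average at -10 hours and the sensor group at -35 hours, giving a lead-time of -25 hours.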

To start our assessment of the sensor method we look at basic indicators, such as the entry time, the total number of messages, and the counts of friends and followers for each user. For the sensor method to work, a relationship must exist between users’ entry times and their topological (node degree) or behavioral (activity) characteristics. We observe that users with early entry times are characterized by an increased level of activity, as seen in Fig. 3A. Early entrants also have higher network centrality expressed by their in-degree (number of followers) and out-degree (number of friends or followees), shown in Fig. 3B. These direct relationships between entry times and other characteristics are especially pronounced in the pre-landing stage of the hurricane, weakening in the post-landing stage.

Fig 3. Average activity (A) and mean number of friends and followers (B) for users as a function of their entry time.

Analysis shows that users who appear in the dataset early demonstrate a higher level of activity and are characterized by higher counts of friends and followers (i.e. they occupy a more central network position). These features are especially pronounced in the pre-landing stage of the hurricane history (landing time at 00:00 UTC on October 30 2012 is taken as a reference zero point). The fact that both activity and centrality correlate with entry time (awareness) suggests that the “friendship paradox” holds and sensor groups have an advantage of awareness lead-time, the magnitude of which is to be established.

http://dx.doi.org/10.1371/journal.pone.0117288.g003

An example of a distribution of messages over time is shown in Fig. 4 for a random control sample and its corresponding sensor group. The inset presents a histogram of tweets, with negligible level of initial activity that builds up and peaks in the landfall day, slowly falling off afterwards. The pattern is largely the same for control and sensor groups, except for the absolute level of activity, with the sensor group being more active. The cumulative distribution function of entry times shows that the sensor group curve is shifted to the left, which confirms earlier entry times. The size of the sample in this particular example is 5,000 users, but all sample sizes considered in the study exhibit a similar left shift in the cumulative distribution and an elevated level of activity. The magnitude of the shift and the scale of difference in the activity level both decrease when the size of a sample increases.

Fig 4. Typical cumulative distribution functions of messages posted by a random control group and its corresponding sensor group.

The figure shows a distribution of messages over time (horizontal axis represents an offset in hours with respect to the hurricane landing time at 00:00 UTC on October 30 2012), discretized on a daily basis with diurnal oscillations (and discrete steps in cumulative representation) smoothed out by Gaussian kernel density estimation with an 8-hour bandwidth. The inset shows simple daily count histograms, which peak on the landing day, with the sensor group activity at a significantly higher level. The left shift of the sensor group’s cumulative distribution confirms that it consistently leads in terms of the entry time (awareness).

http://dx.doi.org/10.1371/journal.pone.0117288.g004

Preliminary findings discussed above reveal the link between awareness and centrality, which results in early awareness of sensor groups. It is important to estimate the magnitude of the lead-time and the influence of other factors, in particular the size of the sample and geographical location of users. Lead-time as a function of sample size is presented in Fig. 5 (sampling without control over location is given in panel A in a solid black line). In the range of sample sizes considered, the lead-time varies between -11 and -5 hours, with sensor groups consistently showing earlier average entry time. An increase in the size of the sample shortens the lead-time and reduces its variance, as previously reported in other studies [32], which is explained by the asymptotic convergence of control and sensor group properties to those of a whole population. Results for key metrics, including tweeting activity levels, lead-times and entry times, are summarized in S2 Table.

Fig 5. Lead-time magnitude for control and sensor groups of various combinations of geographical origin (A) and comparison against a null model with shuffled timestamps (B).

The lead-time magnitude and its variance (A) both decrease with an increasing sample size. Geography plays an important role in determining the scale of the awareness advantage. Longest lead-time is achieved through the combination of network centrality and geographical relevance in the case of “control out—sensor in” combination. The geographical factor outweighs the network effect, as illustrated by the positive lead-times (or rather lag times in this case) for the “control in—sensor out” combination. Relative under-performance of the actual data against a randomized null model (B) may be caused by the exogenous, rather than endogenous, mode of information spread and the correlation between centrality and tweeting frequency.

http://dx.doi.org/10.1371/journal.pone.0117288.g005

We repeat the analysis with direct control over the location of users. Although it is difficult to accurately identify the area directly affected by the hurricane (because of the multitude of its effects including winds, storm surge, rain, snow, gas and electricity outages), the path and extent of hurricane force winds provide a good approximation. The track and wind radii data obtained from National Hurricane Center [36] are used to outline the affected area, which is shown in Fig. 1. The border of this area is used for selective sampling of individuals directly hit by Hurricane Sandy. Such geographically selective sampling is possible in four combinations: both control and sensor groups are within the affected area; control groups in and sensor groups out; control groups out and sensor groups in; and finally both groups out of the hurricane-affected area.

Key statistics for these combinations are summarized in Fig. 5 and S3–S6 Tables. It can be seen that geography strongly affects awareness. All four combinations of the geographical origin for control and sensor groups change the lead-time magnitude compared to the sampling without regard to the location (the solid black line in Fig. 5A). The change is moderate if both groups are geographically similar (both either inside or outside of the disaster zone), with affected pairs giving slightly longer and unaffected pairs slightly shorter lead-times (see labeled orange and blue trends in Fig. 5A). The change is strong for mixed combinations, to the extent that the lead-time reverses its sign (and indicates lagging) when the control group is within and the sensor group is outside of the affected area (green line in Fig. 5A). The longest lead-times arise for the combination of two factors: geographical relevance and central position in the network topology. This case is illustrated by the purple trend in Fig. 5A for control groups formed outside (random position in the topology and low geographical relevance) and sensor groups within the disaster area (high geographical relevance and central position in the network topology). It could be argued that the direct relevance of the event influences one’s behavior in seeking, transmitting and generating information more than one’s position as a central node of a social network. A similar explanation has been proposed for other digital traces of Hurricane Sandy, i.e. photographs posted on Flickr [37], where the number of pictures peaks close to the landing time and suggests that the observed severity of the disaster may motivate people to document it with higher intensity. Finally, it is noteworthy that the entry times for the sensor groups located inside the affected area are actually negative and correspond to the pre-landing phase of the hurricane, see S3 and S5 Tables.

In summary, our experiments show that the sensor method results in the awareness advantage on a scale between 3 and 26 hours, depending on the sample size and geographical origin of the groups.

To evaluate the statistical significance of the lead-times obtained above, we compare them to the null model where the timestamps of all messages in our dataset are randomly shuffled. Such a null model aims to preserve the correlation between centrality and normal tweeting frequency, serving as an upper limit on the performance of the sensor method assuming that every user tweets about the disaster shortly after becoming aware of it. Unfortunately, our data does not contain the full tweeting history of users on topics unrelated to the hurricane, making it difficult to establish the underlying normal tweeting frequency. This may introduce a certain bias into the shuffled null model, illustrated by our preliminary results in Fig. 3A, where people who first post a message about the hurricane after its landfall tweet considerably less than those who have pre-landing entry times. We cannot assume with any certainty that this stark difference in the activity level is a direct result of the difference in users’ regular tweeting frequency; therefore we treat our null model as an overestimated upper bound on achievable lead-times. The comparison is presented in Fig. 5B, with the null model lead-times exceeding those in the actual data. This suggests that the spread of the Sandy-related information on Twitter is not purely viral and endogenous, as in that case the actual lead-times would outperform the null model [32]. Nonetheless, lead-times in sensors are significant, reaching 60%–80% of an upper bound predicted by the null model. Future development of more complex null models, better suited for exogenous processes, may be required to adequately test experimental results.
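A shuffled-timestamp null model of this kind can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: timestamps are permuted across all messages while each message keeps its author, so every user retains their message count; the function names, the number of shuffles and the toy data are hypothetical.

```python
import random
from statistics import mean

def entry_times(messages):
    """First message time per user; `messages` holds (user, t) pairs."""
    first = {}
    for user, t in messages:
        first[user] = min(t, first.get(user, t))
    return first

def lead_time(messages, control, sensor):
    """Mean sensor entry time minus mean control entry time."""
    entry = entry_times(messages)
    return mean(entry[u] for u in sensor) - mean(entry[u] for u in control)

def shuffle_timestamps(messages, rng):
    """Permute timestamps across messages, keeping each user's message count."""
    users = [u for u, _ in messages]
    times = [t for _, t in messages]
    rng.shuffle(times)
    return list(zip(users, times))

def null_lead_time(messages, control, sensor, rng, runs=200):
    """Average lead-time over `runs` independent shuffles (runs is arbitrary)."""
    return mean(
        lead_time(shuffle_timestamps(messages, rng), control, sensor)
        for _ in range(runs)
    )

# Hypothetical toy data: one very active user and one user with a single message.
messages = [("hub", float(t)) for t in range(20, 70)] + [("leaf", 35.0)]
rng = random.Random(7)

actual = lead_time(messages, control=["leaf"], sensor=["hub"])
null = null_lead_time(messages, control=["leaf"], sensor=["hub"], rng=rng)
print("actual lead-time:", actual)
print("null-model lead-time:", null)
print("actual / null:", actual / null)
```

Because a user with many messages is likely to receive at least one early timestamp after shuffling, the null model yields a negative lead-time from activity differences alone, which is why it serves as a reference for the measured values.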

Sentiment study

We demonstrated above that the sensor method is generally successful in selecting users with high centrality, activity and awareness. An important remaining question is whether they also differ in their emotional response. To study this, we employ several sentiment analysis techniques. Primarily, we use the sentiment scores generated by a proprietary algorithm from a data provider, the analytics company Topsy Labs [34]. During sentiment analysis of a message each word is matched against a dictionary of keywords and assigned a weight that reflects its emotional impact. Total weights are calculated, normalized by the word count and returned as either a relative sentiment (average of all scores taking into account their sign) or an absolute sentiment (average of absolute values). In our analysis we use the relative sentiment, because it is indicative both of the strength and polarity of sentiment. We also use discrete scores (1, 0 or -1) to distinguish positive, neutral and negative messages and to monitor their fractions in the stream of messages posted.
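The general scheme of dictionary-based scoring can be sketched in a few lines. The Topsy algorithm is proprietary, so the following is only an illustration of the description above: the word weights, the tokenizer and the function names are hypothetical and do not reproduce the provider's dictionary or rules.

```python
import re

# Hypothetical word weights; the real dictionary is not public.
WEIGHTS = {"safe": 1.0, "thanks": 1.0, "scared": -2.0, "flooded": -1.0}

def tokenize(text):
    """Lower-case the text and split it into alphabetic words (an assumption)."""
    return re.findall(r"[a-z']+", text.lower())

def relative_sentiment(text, weights=WEIGHTS):
    """Signed word weights summed and normalized by the word count."""
    words = tokenize(text)
    if not words:
        return 0.0
    return sum(weights.get(w, 0.0) for w in words) / len(words)

def absolute_sentiment(text, weights=WEIGHTS):
    """Absolute word weights summed and normalized by the word count."""
    words = tokenize(text)
    if not words:
        return 0.0
    return sum(abs(weights.get(w, 0.0)) for w in words) / len(words)

def discrete_score(text, weights=WEIGHTS):
    """Map the relative sentiment to 1 (positive), 0 (neutral) or -1 (negative)."""
    score = relative_sentiment(text, weights)
    return (score > 0) - (score < 0)

text = "Scared but safe"
print(relative_sentiment(text), absolute_sentiment(text), discrete_score(text))
```

For the example message, the negative and positive words partly cancel in the relative score but add up in the absolute one, which is why the relative form carries polarity as well as strength.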

Since the exact details of the sentiment detection algorithm (i.e. the dictionaries of emotion words and their respective weights) were not published by Topsy and thus cannot be fully reproduced, we verify the analysis with two additional techniques freely available for academic research. The first one is Linguistic Inquiry and Word Count (LIWC) [38], a general-purpose text analysis library widely used for detection and classification of emotions in texts. On the most basic level, LIWC provides frequencies of occurrence of positive and negative emotional markers in texts. To combine these two measures into a single metric of sentiment polarity we follow Taboada et al. [39] and use the difference between the positive score and the scaled (by the factor of 1.5) negative score, a procedure that compensates for a statistical prevalence of positive emotions. The second tool is SentiStrength by Thelwall et al. [40], developed specifically for the sentiment classification in short online messages characterized by the frequent use of non-standard spelling, slang, abbreviations and emoticons. Our comparison shows that all three techniques produce highly consistent temporal sentiment trends, shown in Fig. 6, that differ only in the scaling factor and, in the case of SentiStrength, a moderate vertical offset (an upward shift of approximately 10% of the peak value is applied to bring the trend in line with LIWC and Topsy). We conclude that all of these techniques are equally adequate for the study, and the behavior detected is robust regardless of the specific measurement tool applied.

Fig 6. Temporal evolution of sentiment measured by three different techniques (Topsy, LIWC and SentiStrength).

Comparison of the proprietary Topsy algorithm against freely available alternatives shows that sentiment trends over time (horizontal axis represents an offset in hours with respect to the hurricane landing time at 00:00 UTC on October 30 2012) are consistent between different tools. Positive (posemo) and negative (negemo) emotion scores provided by the Linguistic Inquiry and Word Count library were combined into a single polarity measure as score = posemo − 1.5·negemo, and the result of SentiStrength was shifted upwards by approximately 10% of its peak value.

http://dx.doi.org/10.1371/journal.pone.0117288.g006

The temporal evolution of sentiment is tracked as follows: we discretize time into non-overlapping bins of equal duration and take an average of relative sentiment scores for messages posted during each time step. To suppress the noise, we use basic smoothing by a three-point running average, when a value in a time-series is averaged with its nearest neighbors. Typical hourly sentiment trends are shown in Fig. 7A-D for the control and sensor groups drawn from various combinations of affected or unaffected areas. The trends are quite noisy, making it necessary to analyze sentiment behavior over large samples, in this particular instance of 100,000 users. We look at the entire corpus of messages, both the primary hashtag dataset and the additional keywords dataset, which gives more data to analyze and reduces the noise in the average trends. Inclusion of all available data also allows monitoring sentiment well in advance of the hurricane formation. We observe a noticeable diurnal oscillation pattern in sentiment, previously reported in the analysis of daily and seasonal variations in online activity [41]. Discretizing time by days produces smoother curves of a lower temporal resolution, as the ones shown in Fig. 7E-H.
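The binning and smoothing steps can be sketched as below. This is a minimal illustration with assumed details: bins without messages are simply omitted, the two end points of the series are averaged with their single neighbor, and the function names and toy values are hypothetical.

```python
import math
from collections import defaultdict

def binned_mean_sentiment(messages, bin_hours):
    """Average sentiment per non-overlapping time bin.

    `messages` is an iterable of (t, score) pairs with t in hours relative
    to landfall. Returns (bin_start, mean_score) pairs sorted by time.
    Bins that contain no messages are omitted (an assumption).
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for t, score in messages:
        b = math.floor(t / bin_hours)
        sums[b] += score
        counts[b] += 1
    return [(b * bin_hours, sums[b] / counts[b]) for b in sorted(sums)]

def three_point_average(values):
    """Average each value with its nearest neighbors.

    End points have only one neighbor and use a two-point average,
    which is one possible convention among several.
    """
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - 1): i + 2]
        smoothed.append(sum(window) / len(window))
    return smoothed

# Hypothetical toy data: (hours relative to landfall, relative sentiment).
messages = [(-1.5, 0.25), (-1.2, 0.75), (0.3, -0.5)]
print(binned_mean_sentiment(messages, bin_hours=1))
print(three_point_average([0.0, 3.0, 6.0, 3.0]))
```

Choosing `bin_hours=1` corresponds to the hourly trends and `bin_hours=24` to the daily ones; larger bins trade temporal resolution for lower noise.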

Fig 7. Hourly (A-D) and daily (E-H) sentiment trends for control and sensor groups of different geography, as identified in the panel titles.

The horizontal axis represents an offset in hours with respect to the hurricane landing time at 00:00 UTC on October 30 2012. Sentiment trends exhibit a high level of random noise when aggregated on the basis of short time steps (A-D). If averaged over a longer period (E-H), an overall positive baseline level appears, which temporarily drops into negativity in the lead up to and the aftermath of the Hurricane Sandy landing (approximately from -100 to +100 hours). There is no discernible difference, i.e. detectable horizontal shift, in the temporal evolution of sentiment between control and sensor groups, suggesting that the emotional response is situational and universal. Geographical relevance causes a shift in magnitude, with the groups affected by the hurricane demonstrating more negative average sentiment (panels B and C, or F and G).

http://dx.doi.org/10.1371/journal.pone.0117288.g007

Sentiment behavior exhibits certain general features. First, there is little difference in the dynamics of evolution between the sensor and control groups. Maxima and minima of the control and sensor trends are normally positioned at the same points in time; see, for instance, Fig. 7A and D (similar geography of control and sensor groups). It appears that the awareness advantage of the sensor group does not affect emotional response during the disaster itself. While on average sensor users start to monitor the event earlier, sentiment trends suggest that the emotional content of messages is situational, reflects current events, and is universal regardless of network centrality. Second, being directly affected by the disaster has a detectable effect on the strength of sentiment. Sample groups formed outside of the disaster zone, regardless of whether they are control or sensor groups, consistently show more positive levels of sentiment, as illustrated by the vertical shift of the corresponding sentiment trends in Fig. 7B and C (or Fig. 7F and G).

The composition of the message stream evolves with time, as illustrated in Fig. 8. The fractions of positive and negative messages are normally relatively stable, oscillating daily around a certain level. During the disaster phase, the number of negative messages grows at the expense of the positive ones and results in a distinctly negative overall sentiment, which lasts approximately from -100 to +100 hours. This shift potentially offers a simple and universal monitoring technique: checking for negative overall sentiment in messages from a randomly selected user sample, with the share of negatives growing at the expense of positives, may suffice to detect both the occurrence and location of an emergency or disaster. More broadly, the same concept may be applicable to any prominent topic reflected in online interactions. For instance, the period of negative sentiment around -300 hours is due to the October 17 presidential debates about the hotly disputed topic of energy policy. The sharp drop in sentiment at +220 hours is due to the weather-related tweets discussing the November northeastern storm and the associated snowfall at a time when people still suffered severe consequences from Sandy, including power outages. Notably, this sharp drop is more pronounced in groups drawn from the disaster-affected region.
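The monitoring idea in this paragraph can be illustrated with a short sketch. It is only a hedged example under stated assumptions: the function names are hypothetical, messages are assumed to carry a signed sentiment score (positive, negative, or zero for neutral), and the detection rule (flag a time bin when the negative share exceeds the positive share) is one simple reading of the text rather than a tested detector.

```python
def composition(scores):
    """Return (fraction_positive, fraction_negative) for a list of scores.

    Neutral messages (score == 0) count toward the total volume only.
    An empty list yields (0.0, 0.0).
    """
    total = len(scores)
    if total == 0:
        return 0.0, 0.0
    positive = sum(1 for s in scores if s > 0)
    negative = sum(1 for s in scores if s < 0)
    return positive / total, negative / total


def negative_periods(binned_scores):
    """Return the sorted time bins in which negatives outnumber positives.

    binned_scores: dict mapping a time-bin index to the list of sentiment
    scores of messages posted in that bin (hypothetical layout).
    """
    flagged = []
    for k in sorted(binned_scores):
        frac_pos, frac_neg = composition(binned_scores[k])
        if frac_neg > frac_pos:
            flagged.append(k)
    return flagged
```

In practice a threshold on the duration or depth of the negative period would likely be needed to separate disasters from short-lived events such as the debates mentioned above; that choice is left open here.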

Fig 8. Daily trends in the composition of the message stream.

We monitor the fraction of positive (solid green line for control and dashed green for sensor groups) and negative messages (solid red for control and dashed red for sensor groups) in the total volume of all tweets. Time is taken as an offset in hours with respect to the hurricane landing time at 00:00 UTC on October 30 2012. During the most severe stage of the hurricane, in anticipation of and after its landing, the composition undergoes a transition from predominantly positive to predominantly negative.

http://dx.doi.org/10.1371/journal.pone.0117288.g008

As an illustration of the sentiment sensing technique, we apply it to the United States in the period between October 21 and November 7. All messages are aggregated hourly on a regular spatial grid, and the average sentiment is calculated to obtain a spatio-temporal evolution of the density and sentiment of tweets (see S1 Video online). Two snapshots for October 25 and October 29 are presented in Fig. 9. The top panel shows a distribution of messages on October 25, when activity is low and sentiment is mostly neutral or positive, with the exception of the Miami area. At this stage, Sandy has just passed Jamaica and Cuba, and Florida is directly under threat, which contributes to the “negativity” cluster in Miami. Closer to the landing time (bottom panel) activity increases significantly and sentiment in the affected area is overwhelmingly negative. Interestingly, regions unaffected by the hurricane still show a rather neutral reaction, except for the major urban centers.
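A minimal sketch of this kind of gridded aggregation is given below. The cell size, the `(lat, lon, hour, score)` record layout, and the function name are assumptions for illustration; the text does not specify the grid resolution or data format used for S1 Video.

```python
import math
from collections import defaultdict


def grid_sentiment(tweets, cell_deg=1.0):
    """Aggregate tweets on a regular latitude/longitude grid, per hour.

    tweets: iterable of (lat, lon, hour, score) tuples (hypothetical layout),
    where hour is an integer hour index and score a signed sentiment value.
    Returns {(row, col, hour): (message_count, mean_score)}, with
    row = floor(lat / cell_deg) and col = floor(lon / cell_deg).
    """
    acc = defaultdict(lambda: [0, 0.0])  # running [count, score sum] per cell
    for lat, lon, hour, score in tweets:
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg), hour)
        acc[key][0] += 1
        acc[key][1] += score
    return {key: (n, total / n) for key, (n, total) in acc.items()}
```

The message count per cell corresponds to the density shown in the maps, and the sign of the mean score to the color.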

Fig 9. Comparison of density and sentiment for messages posted during an hour-long period at 17:00 EDT October 25 (top) and 20:00 EDT October 29 (bottom).

The polarity of sentiment is highlighted by green (positive average sentiment) or red (negative average sentiment). The density of messages is represented by the transparency of the color, where more solid shading indicates higher activity. At the early stage (top panel), prevalent sentiment is either neutral or positive, and the interest in the hurricane is comparatively low, except for the Miami area. Close to the landing time (bottom) the activity increases and the sentiment in the area affected by the hurricane is overwhelmingly negative. Unaffected regions remain neutral or positive.

http://dx.doi.org/10.1371/journal.pone.0117288.g009

Discussion

In this empirical study we found that the method based on the “friendship paradox” is generally successful in forming sensor groups with an awareness advantage over the randomly selected control groups. The magnitude of the lead-time varies with the size of a sample, showing an advantage of up to 11 hours in small samples of 500 users. This advantage shortens to 5 hours when the size increases to 100,000 users. Lead-time can change significantly when geographical restrictions are imposed on the formation of control and sensor groups, especially if one of them is from the disaster-affected area and the other one is not. Maximum advantage detected in our study was about 26 hours and resulted from the combination of high network centrality and geographical relevance.
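For readers who want to reproduce a lead-time estimate, the sketch below shows one plausible calculation. It assumes that a user's entry time is the time of their first message on the topic and that the lead-time is the mean entry time of the control group minus that of the sensor group; both the definition and the function names are assumptions for illustration and should be checked against the methods of the study.

```python
def mean_entry_time(entry_hours):
    """Mean entry time of a group, in hours relative to the landing time.

    entry_hours: non-empty list with one entry time per user (assumed here
    to be the time of the user's first message on the topic).
    """
    return sum(entry_hours) / len(entry_hours)


def lead_time(control_entry_hours, sensor_entry_hours):
    """Lead-time in hours; positive when the sensor group enters earlier."""
    return mean_entry_time(control_entry_hours) - mean_entry_time(sensor_entry_hours)
```

For example, control entry times of -10 and -20 hours against sensor entry times of -30 and -22 hours give a lead-time of 11 hours under this definition.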

Additional study of sentiment revealed that the emotional response was universal and followed a similar temporal evolution pattern in both control and sensor groups. The stream of messages changed its composition during the active phase of the disaster, and the increased fraction of negative messages pushed the average sentiment into negativity. Similar behavior was observed on a shorter time scale during the observation period and was linked to other prominent events (the presidential debates and the northeastern snowstorm). Features demonstrated by sentiment are promising in terms of developing a universal sensing technique that does not require any preconditioning in the form of specific keywords to monitor.

Our study presents a first empirical investigation of the sensor method in a network where information propagates in a mixed mode, both endogenously and exogenously, and where factors like a relatively short time scale and the strong geographical nature of a disaster affect the performance of the method. The lead-times we obtained may be sufficient for individuals to improve their own preparedness for a threat like a hurricane (warn others, stockpile water, food, medicines, fuel, batteries, protect properties, etc.), but are unlikely to give authorities enough time to adjust their global large-scale response. Nonetheless, the importance of the efficient pathway for the propagation of emergency information, provided by sensor groups, should not be underestimated. Early exposure to witness reports is a factor in compliance with authorities, because behavior in disasters is often collective [42] and evacuation decisions depend heavily on the perception of risk and peer influence [43]. There is also an inherent resistance towards recognizing potential dangers [44] and online interactions may either facilitate better response to threats or create undesired consequences, like panic [45,46]. Management of the information flow must therefore be included in communications recommendations of emergency planning guidelines [47].

It should be understood that the specific nature of a threat would be reflected in the performance of the sensor method. Since hurricanes, compared to other disasters, evolve slowly and are characterized by exceptional predictability using modern atmospheric simulation techniques, the value of the advantage on the scale of hours may be questionable as an early warning. But this might not be the case in other scenarios without extended warning time and predictability: for instance earthquakes [17–21], terrorist attacks, technological catastrophes, forest fires and flash floods. In events like these, a social network may effectively serve as a primary source of information from eyewitnesses [48], as well as a medium for its distribution.

The sentiment sensing technique proposed here is an attractive alternative to the existing methods of disaster detection on Twitter [20,21], which are based on the monitoring of message frequency and text mining for event-specific keywords. Our approach is indifferent to the nature of a disaster and does not require any filtering of a message stream to extract event-related tweets. However, additional study is needed to evaluate how well such a technique performs in an unfiltered stream of information.

One important limitation to the reliability of quantitative estimates of the lead-time via digital traces on social media lies in the assumption of a direct correlation between awareness and online activity on the topic. Such correlation needs additional experimental validation by traditional sociological methods. Regional and demographic differences are likely to exist in online behavior, based on the communication platform adoption rate and patterns of use across groups from different regions, age, socio-economic status or cultural heritage. However, rigorous validation of these assumptions is impossible on the basis of the data that we used and is therefore beyond the scope of our research.

Finally, the scope of this study is confined to a single social network platform—Twitter. Although numerous other studies have confirmed that the “friendship paradox” holds true in a variety of settings, it is imperative in the future to investigate the scale of the awareness advantage achievable not only for other disaster scenarios, but on other social platforms as well.

Overall, our findings confirm the potential of the sensor method for efficient early detection of emergency information and offer a new sentiment sensing technique for detection and localization of disasters.

Supporting Information

S1 Fig. Histograms of occurrence for messages with specific keywords.

The figure shows the probability density functions for the occurrence of a keyword on its own in orange, and in combination with “sandy” in blue. The messages that occur before October 22 2012 (approximately at -200 hours on horizontal axis) are likely to be irrelevant to Hurricane Sandy and should be filtered out. Histograms are arranged in the approximate order of relevance.

doi:10.1371/journal.pone.0117288.s001

(TIFF)

S2 Fig. Histograms of occurrence for messages with specific keywords, continued from S1 Fig.

This figure continues the sequence from S1 Fig., with decreasing level of keyword relevance. Note the frequent incidence of use before October 22 2012.

doi:10.1371/journal.pone.0117288.s002

(TIFF)

S1 Table. List of keywords in the extended dataset accompanied by their corresponding message counts.

There is an overlap in counts caused by the matches of a message to multiple keywords (e.g. a tweet with more than one keyword, like “widespread power losses after #sandy”; or a word “sandyaid” matching simultaneously to “sandyaid”, “sandy” and “sand”). However, the dataset itself contains only unique messages without duplication.

doi:10.1371/journal.pone.0117288.s003

(DOC)

S2 Table. Average activities NC and NS (messages per user), entry times 〈tC〉 and 〈tS〉, and lead-times Δt (in hours) for control and sensor groups formed without location restrictions.

doi:10.1371/journal.pone.0117288.s004

(DOC)

S3 Table. Average activities NC and NS (messages per user), entry times 〈tC〉 and 〈tS〉, and lead-times Δt (in hours), with both control and sensor groups formed from users affected by the hurricane: “Control In—Sensor In” sampling.

doi:10.1371/journal.pone.0117288.s005

(DOC)

S4 Table. Average activities NC and NS (messages per user), entry times 〈tC〉 and 〈tS〉, and lead-times Δt (in hours) for control groups affected and sensors unaffected by the hurricane: “Control In—Sensor Out” sampling.

doi:10.1371/journal.pone.0117288.s006

(DOC)

S5 Table. Average activities NC and NS (messages per user), entry times 〈tC〉 and 〈tS〉, and lead-times Δt (in hours) for control groups unaffected and sensors affected by the hurricane: “Control Out—Sensor In” sampling.

doi:10.1371/journal.pone.0117288.s007

(DOC)

S6 Table. Average activities NC and NS (messages per user), entry times 〈tC〉 and 〈tS〉, and lead-times Δt (in hours) for both groups unaffected: “Control Out—Sensor Out” sampling.

doi:10.1371/journal.pone.0117288.s008

(DOC)

S1 Video. Spatio-temporal evolution of local density and average sentiment of tweets during Hurricane Sandy.

doi:10.1371/journal.pone.0117288.s009

(MP4)

Author Contributions

Conceived and designed the experiments: YK HC EM PVH MC. Performed the experiments: YK HC EM PVH MC. Analyzed the data: YK HC EM PVH MC. Contributed reagents/materials/analysis tools: YK HC EM PVH MC. Wrote the paper: YK HC EM PVH MC.

References

  1. Nelson B (2013) Natural disasters: A calculated risk. Nature 495: 271–273. pmid:23495390 doi: 10.1038/nj7440-271a
  2. Tollefson J (2012) Hurricane Sandy spins up climate discussion. Nature News. doi: 10.1111/cob.12003. pmid:25586252
  3. Schiermeier Q (2006) Insurers' disaster files suggest climate is culprit. Nature 441: 674–675. pmid:16760941 doi: 10.1038/441674a
  4. Van Aalst MK (2006) The impacts of climate change on the risk of natural disasters. Disasters 30: 5–18. pmid:16512858 doi: 10.1111/j.1467-9523.2006.00303.x
  5. Press F, Hamilton RM (1999) Mitigating natural disasters. Science 284: 1927–1927. pmid:10400533 doi: 10.1126/science.284.5422.1927
  6. Kennedy D (2002) Science, terrorism, and natural disasters. Science 295: 405–405. pmid:11799206 doi: 10.1126/science.295.5554.405
  7. Guikema SD (2009) Infrastructure design issues in disaster-prone regions. Science 323: 1302–1303. doi: 10.1126/science.1169057. pmid:19265011
  8. Diffenbaugh NS, Field CB (2013) Changes in ecologically critical terrestrial climate conditions. Science 341: 486–492. doi: 10.1126/science.1237123. pmid:23908225
  9. Cutter SL, Finch C (2008) Temporal and spatial changes in social vulnerability to natural hazards. Proceedings of the National Academy of Sciences 105: 2301–2306. doi: 10.1073/pnas.0710375105. pmid:18268336
  10. Helbing D (2013) Globally networked risks and how to respond. Nature 497: 51–59. doi: 10.1038/nature12047. pmid:23636396
  11. Vespignani A (2009) Predicting the behavior of techno-social systems. Science 325: 425–428. doi: 10.1126/science.1171990. pmid:19628859
  12. Lazer D, Pentland AS, Adamic L, Aral S, Barabasi AL, et al. (2009) Life in the network: the coming age of computational social science. Science 323: 721–723. doi: 10.1126/science.1167742. pmid:19197046
  13. Watts D, Cebrian M, Elliot M (2013) Dynamics of social media. Public Response to Alerts and Warnings Using Social Media: Report of a Workshop on Current Knowledge and Research Gaps (The National Academies Press, Washington, DC).
  14. Bagrow JP, Wang D, Barabasi A-L (2011) Collective response of human populations to large-scale emergencies. PLoS ONE 6: e17680. doi: 10.1371/journal.pone.0017680. pmid:21479206
  15. Wang D, Lin Y-R, Bagrow JP (2012) Social Networks in Emergency Response. Encyclopedia of Social Network Analysis and Mining.
  16. Hughes AL, Palen L (2009) Twitter adoption and use in mass convergence and emergency events. International Journal of Emergency Management 6: 248–260. doi: 10.1504/ijem.2009.031564
  17. Li J, Rao H (2010) Twitter as a rapid response news service: An exploration in the context of the 2008 China Earthquake. The Electronic Journal of Information Systems in Developing Countries 42: 1–22.
  18. Caragea C, McNeese N, Jaiswal A, Traylor G, Kim H-W, et al. (2011) Classifying text messages for the Haiti earthquake. Proceedings of the 8th International Conference on Information Systems for Crisis Response and Management (ISCRAM2011).
  19. Guy M, Earle P, Ostrum C, Gruchalla K, Horvath S (2010) Integration and dissemination of citizen reported and seismically derived earthquake information via social network technologies. Advances in intelligent data analysis IX: Springer. pp. 42–53.
  20. Earle PS, Bowden DC, Guy M (2012) Twitter earthquake detection: earthquake monitoring in a social world. Annals of Geophysics 54: 708–715.
  21. Sakaki T, Okazaki M, Matsuo Y (2010) Earthquake shakes Twitter users: real-time event detection by social sensors. Proceedings of the 19th international conference on World wide web: 851–860.
  22. Pickard G, Pan W, Rahwan I, Cebrian M, Crane R, et al. (2011) Time-critical social mobilization. Science 334: 509–512. doi: 10.1126/science.1205869. pmid:22034432
  23. Rahwan I, Dsouza S, Rutherford A, Naroditskiy V, McInerney J, et al. (2013) Global Manhunt Pushes the Limits of Social Mobilization. Computer 46: 68–75. doi: 10.1109/mc.2012.295
  24. Cooper S, Khatib F, Treuille A, Barbero J, Lee J, et al. (2010) Predicting protein structures with a multiplayer online game. Nature 466: 756–760. doi: 10.1038/nature09304. pmid:20686574
  25. Von Ahn L, Maurer B, McMillen C, Abraham D, Blum M (2008) recaptcha: Human-based character recognition via web security measures. Science 321: 1465–1468. doi: 10.1126/science.1160379. pmid:18703711
  26. Feld SL (1991) Why Your Friends Have More Friends Than You Do. American Journal of Sociology 96: 1464–1477. doi: 10.1086/229693
  27. Hodas NO, Kooti F, Lerman K (2013) Friendship Paradox Redux: Your Friends Are More Interesting Than You. Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media (ICWSM). pp. 225–233.
  28. Kooti F, Hodas NO, Lerman K (2014) Network Weirdness: Exploring the Origins of Network Paradoxes. Proceedings of the Eighth International AAAI Conference on Weblogs and Social Media (ICWSM). pp. 266–274.
  29. Eom Y-H, Jo H-H (2014) Generalized friendship paradox in complex networks. arXiv preprint (arXiv:14011458).
  30. Christakis NA, Fowler JH (2010) Social Network Sensors for Early Detection of Contagious Outbreaks. PLoS ONE 5: e12948. doi: 10.1371/journal.pone.0012948. pmid:20856792
  31. Sun L, Axhausen KW, Lee D-H, Cebrian M (2014) Efficient detection of contagious outbreaks in massive metropolitan encounter networks. Scientific Reports 4.
  32. Garcia-Herranz M, Moro E, Cebrian M, Christakis NA, Fowler JH (2014) Using Friends as Sensors to Detect Global-Scale Contagious Outbreaks. PLoS ONE 9: e92413. doi: 10.1371/journal.pone.0092413. pmid:24718030
  33. Blake ES, Kimberlain TB, Berg RJ, Cangialosi JP, Beven II JL (2013) Tropical Cyclone Report. Hurricane Sandy (AL182012) 22–29 October 2012. National Hurricane Center.
  34. Topsy Labs (2012) Topsy website. Available: http://www.topsylabs.com. Accessed 2012 Nov 20.
  35. Mislove A, Lehmann S, Ahn Y-Y, Onnela J-P, Rosenquist JN (2011) Understanding the Demographics of Twitter Users. ICWSM.
  36. National Hurricane Center (2013) NHC GIS Archive—Tropical Cyclone Best Track (AL182012). Available: http://www.nhc.noaa.gov/gis/archive_besttrack_results.php?id=al18&year=2012&name=Hurricane SANDY. Accessed 2013 Feb 12.
  37. Preis T, Moat HS, Bishop SR, Treleaven P, Stanley HE (2013) Quantifying the Digital Traces of Hurricane Sandy on Flickr. Scientific Reports 3: 3141. doi: 10.1038/srep03141. pmid:24189490
  38. Pennebaker J, Francis M, Booth R (2001) Linguistic inquiry and word count. LIWC2001: Mahwah, NJ: Erlbaum Publishers.
  39. Taboada M, Brooke J, Tofiloski M, Voll K, Stede M (2011) Lexicon-based methods for sentiment analysis. Computational linguistics 37: 267–307. doi: 10.1162/coli_a_00049
  40. Thelwall M, Buckley K, Paltoglou G, Cai D, Kappas A (2010) Sentiment strength detection in short informal text. Journal of the American Society for Information Science and Technology 61: 2544–2558. doi: 10.1002/asi.21416
  41. Golder SA, Macy MW (2011) Diurnal and seasonal mood vary with work, sleep, and daylength across diverse cultures. Science 333: 1878–1881. doi: 10.1126/science.1202775. pmid:21960633
  42. Riad JK, Norris FH, Ruback RB (1999) Predicting Evacuation in Two Major Disasters: Risk Perception, Social Influence, and Access to Resources. Journal of Applied Social Psychology 29: 918–934. doi: 10.1111/j.1559-1816.1999.tb00132.x
  43. Baker EJ (1991) Hurricane evacuation behavior. International Journal of Mass Emergencies and Disasters 9: 287–310.
  44. Quarantelli EL (1980) Evacuation Behavior and Problems: Findings and Implications from the Research Literature. Ohio State University Disaster Research Center. pmid:25121255
  45. Helbing D, Farkas I, Vicsek T (2000) Simulating dynamical features of escape panic. Nature 407: 487–490. pmid:11028994 doi: 10.1038/35035023
  46. Helbing D, Mukerji P (2012) Crowd disasters as systemic failures: analysis of the Love Parade disaster. EPJ Data Science 1: 1–40. doi: 10.1186/1687-4153-2012-1. pmid:22376768
  47. Perry RW, Lindell MK (2003) Preparedness for emergency response: guidelines for the emergency planning process. Disasters 27: 336–350. pmid:14725091 doi: 10.1111/j.0361-3666.2003.00237.x
  48. Gao L, Song C, Gao Z, Barabasi A-L, Bagrow JP, et al. (2014) Quantifying Information Flow During Emergencies. Scientific Reports 4: 3997. doi: 10.1038/srep03997. pmid:24499738