Cross-comparative analysis of evacuation behavior after earthquakes using mobile phone data

Despite the importance of predicting evacuation mobility dynamics after large scale disasters for effective first response and disaster relief, our general understanding of evacuation behavior remains limited because of the lack of empirical evidence on the evacuation movement of individuals across multiple disaster instances. Here we investigate the GPS trajectories of a total of more than 1 million anonymized mobile phone users whose positions were tracked for a period of 2 months before and after four of the major earthquakes that occurred in Japan. Through a cross comparative analysis between the four disaster instances, we find that in contrast to the assumed complexity of evacuation decision making mechanisms in crisis situations, an individual’s evacuation probability is strongly dependent on the seismic intensity that they experience. In fact, we show that the evacuation probabilities in all earthquakes collapse into a similar pattern, with a critical threshold at around seismic intensity 5.5. This indicates that despite the diversity in the earthquakes profiles and urban characteristics, evacuation behavior is similarly dependent on seismic intensity. Moreover, we found that probability density functions of the distances that individuals evacuate are not dependent on seismic intensities that individuals experience. These insights from empirical analysis on evacuation from multiple earthquake instances using large scale mobility data contributes to a deeper understanding of how people react to earthquakes, and can potentially assist decision makers to simulate and predict the number of evacuees in urban areas with little computational time and cost. This can be achieved by utilizing only the information on population density distribution and seismic intensity distribution, which can be observed instantaneously after the shock.


Introduction
Severe earthquakes such as the Kobe earthquake (1995), the Tohoku earthquake (2011) and more recently the Kumamoto earthquake (2016) caused mass evacuation activities owing to PLOS  considerable damage to buildings and urban infrastructure [1][2][3][4]. Human mobility prediction in disaster scenarios is crucial for various recovery efforts, including the planning of locations and capacities of evacuation shelters, and the allocation of various emergency supplies. Many of the conventional methods use urban infrastructure failure data to predict the number of evacuees in upcoming disasters. For example, the Tokyo Metropolitan Government uses a model that utilizes variables such as building collapse rate, lifeline damage rate, and inundation area data to estimate the number of evacuees [5]. However, it is difficult to use this model shortly after an earthquake because typically, several days are required for government organizations to inspect and gather information about the status of lifelines and infrastructure. In fact, after the Kumamoto earthquake (2016, magnitude 7.3), authorities failed to obtain an accurate estimate of the number of evacuees due to difficulties in data collection. This caused delays in rescue and inefficient distribution of emergency supplies [6]. Traditionally, transportation surveys were used to understand the city-scale human mobility [7]. In recent years, large scale datasets collected from mobile phones and smartphones are beginning to be utilized to understand the behavior of individuals at a low cost [8][9][10][11][12][13]. These data are used in applications in various fields such as traffic management [14][15][16], monitoring pandemic spreading [17], tourist mobility analysis [18] and the prediction of population distributions and dynamics [19][20][21][22]. Many works have applied this new data source for applications in disaster management [23]. Studies have shown the effect of weather patterns on human mobility [24], and its predictability using socio-economic factors [25]. In terms of understanding human mobility during larger scale disasters, a recent study analyzed call detail records to investigate the predictability of evacuation destinations of individuals after the Haiti earthquake and showed that most evacuation destinations were places that individuals have visited earlier (e.g., the home of a relative or a friend) [26]. Also, population decline was observed in various areas affected by the Tohoku Earthquake using mobile phone data [27]. Other works have used Twitter data to observe the perturbation in the radius of gyration of affected individuals during disasters [28], and to understand the tweeting behaviors of affected individuals and the transition of sentiments after large scale disasters [29,30]. More recent studies conducted after the Nepal earthquake and Kumamoto earthquake showed that evacuation behavior could be monitored after an earthquake by using mobile phone location data (e.g. GPS, call detail records) obtained from the evacuees [31][32][33]. Although these works provide valuable insights into the human mobility patterns during a individual disaster case studies, they fail to provide general insights that could be applicable across different disasters. Moreover, results obtained from mobile phone data are usually delivered to decision makers several days or even weeks after the initial shocks because the use of real-time mobile phone location data is highly sensitive against privacy issues [34]. This motivates us to perform a cross-comparative analysis of human evacuation behavior after various disaster cases, to obtain a general understanding of evacuation behavior after earthquakes which can be used in planning evacuation strategies for future disasters.
Several works have performed a cross-comparative analysis across different disasters. However, to the best of our knowledge, none of the works provide analyses of detailed evacuation behavior which can be utilized by emergency management practitioners. The analysis of calling behavior after various types of disasters has shown that increase in communication after emergencies are both spatially and temporally localized, but information about emergencies spreads globally [35]. Here in this study, we focus more on the physical movements of individuals rather than communication patterns. Similar to [28], the radius of gyration of individuals across different disaster cases using Twitter Geo-tag data has been studied [36]. Although using the radius of gyration measures the perturbation of human mobility due to disasters, it does not provide direct measures of evacuation movements. In this study, we provide detailed analysis of evacuation rates and evacuation distances of individuals after multiple earthquakes using large scale mobile phone location datasets from Japan. More specifically, we analyze the mobility of individuals who were affected by the Tohoku earthquake (2011 March) [37], Kumamoto earthquake (2016 April) [38], Nagano earthquake (2014 November) [39], and Tottori earthquake (2016 October) [40].

Mobile phone location data
Yahoo Japan Corporation (https://about.yahoo.co.jp/info/en/) provides a variety of disaster notifications to users through its disaster alert app. The disaster alert app continues to acquire real-time location information of individuals in order to transmit only geographically relevant disaster notifications for individuals. The users have accepted to provide their location data when installing the application. The data is anonymized so that individuals cannot be specified, and personal information such as gender, age and occupation are unknown. Each location data is stored as a GPS record in Yahoo Japan's internal server. Each record consists of a user's unique ID (random character string), latitude, longitude, date and time. The acquisition frequency of the GPS data changes according to the movement speed of the user. If it is determined that the user is staying in a certain place for a long time, data is acquired at a relatively low frequency, and if it is determined that the user is moving, the data is acquired more frequently. By reducing the number of times data is acquired using this algorithm, it is possible to reduce the burden on the user's smartphone battery. On average, about 40 points are observed per day per user, therefore we can observe the main staying places of each individual. Currently the number of users who have installed the disaster alert app in Japan nationwide are about 1 million (Kumamoto, at 2016 April), and it is currently increasing due to the rise in disaster risk awareness. Approximately, this is 1% sample rate of the whole population which is equivalent to the national census, which is performed once in 10 years for the purpose of understanding traffic behavior. S1 Fig shows the actual GPS information for one day plotted on a white map around Kumamoto City. Each fine red dot corresponds to one GPS data. By comparing the plot with an actual map of the area, we can observe that the city center and roads that connect the surrounding cities can be traced from the data, showing its high spatial density. Table 1 shows the statistics of the disaster and also the data that were used for the experiments. We used 712,904 users' data for which an average of 19 points were observed per day for the Kumamoto earthquake. For Nagano and Tottori earthquakes, the number of users were 10244 and 24103, respectively with similar observation frequencies. We also use the GPS data set (called Konzatsu-Tokei data), provided by Zenrin Data Com (https://www.zenrindatacom.net/en/index.html) (ZDC) to anlyze the evacuation mobility during the Tohoku earthquake. "Konzatsu-Tokei (R)" Data refers to location data collected from mobile phones using the AUTO-GPS function under the users' consent, through the "docomo map navi" service provided by NTT DOCOMO, INC. The data is processed collectively and statistically in order to ananymize and to conceal private information. GPS data (latitude, longitude) of smartphones are collected approximately every 5 minutes at most and does not include other personal information such as gender and age. To analyze the evacuation behaviors of people at the time of the Tohoku earthquake, we used the data from February 1, 2011 to March 31, 2011, and the total number of IDs in the Tohoku region (Aomori, Iwate, Akita, Miyagi, and Yamagata prefectures) was 157,225. To analyze the evacuation behavior due to the effects of earthquakes and to separate the effects of other disaster types, we excluded areas that were hit by the tsunami (coastal areas) shown in blue color in Fig 1 and cities in Fukushima, which were heavily affected by the nuclear power plant accident.

Seismic intensity data
"Seismic intensity" is used as an index to measure the strength of the external force of earthquakes, varying from 1 (very small) to 7 (extreme shock), with 0.1 increments. Mashiki City experienced the largest seismic intensity of 6.7 during the Kumamoto earthquake. Seismic intensity is obtained from the acceleration caused by the earthquake, and each local government unit is given a seismic intensity value. The seismic intensity data is published in the "Jishin/Kazan Geppou (Monthly magazine of earthquakes and volcanoes)" published by the Japan Meteorological Agency (https://www.data.jma.go.jp/svd/eqev/data/gaikyo/ (in Japanese)). Note that seismic intensity is different from "magnitude", which is only observed at the epicenter. We used the data from 2016 April and October, 2014 November, and 2011 March.

Home location estimation and evacuation
It is well known that human trajectories show a high degree of temporal and spatial regularity, each individual having a significant probability to return to a few highly frequented locations, including his/her home location [11]. Due to this characteristic, it has been shown that home locations of individuals can be detected with high accuracy by clustering the individual's stay point locations over night [12]. The home location of each individual was detected by applying mean-shift clustering to the nighttime staypoints (observed between 8PM and 6AM), weighted by the duration of stays in each location [41,42]. Mean shift clustering was implemented using the scikit-learn package on Python (http://scikit-learn.org/stable/modules/generated/sklearn. cluster.MeanShift.html). An individual was detected to be evacuated if the individual is estimated to be staying more than r (meters) away from his or her estimated home location. Evacuation rate on a given day in a given city is calculated by dividing the number of evacuated individuals by the total number of users that were estimated to be living in that city. For each disaster d, we calculate the evacuation rate p d (z) at a given seismic intensity z by the following equation.
where S d (z) is the set of LGUs that experienced a seismic intensity of z in disaster d, M i is the total number of users living in LGU i, and M � i is the number of evacuated people from LGU i, which we observe from trajectories of mobile phone location data.

Accuracy of mobile phone location data
The mobile phone data has a sample rate of about 1%, which is about the same as the sample rate of the personal trip survey conducted by the Ministry of Land, Infrastructure and Transport, so it is suggested that this sample rate is enough for grasping the entire urban flow. Moreover, studies have shown that the distribution of the entire population can be accurately reproduced from the call detail record data obtained from mobile phones [19]. To show that our data is also sufficient to represent the whole population, we examine how accurate the actual population distribution can be estimated from the nighttime GPS data. As verification data, we use the national census data, which contains the residential population data for every 1,000 meter grid. We separate Kumamoto Prefecture into 1,000 meter grid, and we estimate the population in each grid mesh by multiplying the number of IDs in the grid mesh from the GPS dataset by the inverse of the sample rate. The correlation coefficient between the estimated population and the actual population is 0.853, showing a high accuracy of population estimation (S2 Fig). Fig 2 shows the spatial distribution of seismic intensity (A) and evacuation rate (B) after the Kumamoto earthquake. Both data are aggregated into local government unit (LGU) areas. The evacuation rate in a given LGU is calculated by dividing the estimated number of individuals who are staying more than 200 meters away from their estimated home location after the earthquake by the total number of individuals who are estimated to be living in the given LGU. We can observe that even though seismic intensity gradually attenuates as the distance from the epicenter increases, evacuation rate abruptly decreases from approximately 40% to less than 10% around areas with a seismic intensity of 5.0. We can infer that human evacuation activity has a critical 'tipping point' with respect to seismic intensity. The underlying reasons for this behavior are not trivial. However, it can be assumed that evacuation activities increase sharply owing to the collapse of buildings and critical social infrastructure such as electricity and gas lines, which occur at similar seismic intensities. We model this nonlinear characteristic of evacuation activities by further observing the evacuation activities for four large earthquakes in Japan.

Fragility curve for evacuation behavior
To explore the general properties of the evacuation activities after large earthquakes, we analyzed each individual's movement after the initial shocks of the four earthquakes. Out of these earthquakes, the Tohoku earthquake caused multiple types of hazards including a tsunami and a nuclear power plant accident. In this study, we focus on areas that were affected only by the earthquake to understand the evacuation behavior caused by large earthquakes. Therefore, the LGUs in the Fukushima prefecture and the LGUs along the coastal line in the Tohoku region are beyond the scope of this study [43]. For other earthquakes, we calculated the evacuation rate of all LGUs that experienced a seismic intensity of larger than 4.0. Fig 3A shows the evacuation rates during the four earthquakes according to their seismic intensities. The four different colors and plots correspond to the different earthquakes. To model the sudden increase of evacuation rates with respect to seismic intensities, we use the fragility curve model.
The fragility curve is a model used in the field of structural mechanics, mainly for modelling the collapse rate of a building with respect to seismic intensities of earthquakes and the inundation height of tsunami [44]. The external force of the disaster is taken on the horizontal axis and the vertical axis is the collapse rate. It was found that the inundation depth of the tsunami and the collapse rate of the building follow a cumulative log normal distribution function [44]. Similarly, it was also found that the seismic intensity of the earthquake and the collapse rate of the building can also be approximated with a cumulative log normal distribution function [45]. The common feature of both discoveries is that the collapse rate of buildings is close to 0 up to a certain external force but rapidly increases from a certain inflection point. The cumulative log normal distribution is a function that can model this sudden increase. The cumulative log normal distribution function is a function shown in equation, and there are three parameters (μ, σ, a). μ and σ dictate the slope of the function and the position of the inflection point, while a determines the maximum value of evacuation rates. We model the evacuation rate of humans p(z) given seismic intensity z as the following fragility function, and estimate parametersm;ŝ;â using the maximum likelihood estimation approach [45] given the observations from mobile phone data.
The estimated parameters of the fragility curve using the four disaster cases are μ = 1.73, σ = 0.075, a = 0.63, and the correlation coefficient between the model and the data was R = 0.916, as shown in Fig 3B. From these estimations, we can infer that the seismic intensity at which individuals start to evacuate is approximately 5.2 and evacuation rate increases sharply between intensities of 5.5 and 6.0. At a seismic intensity of approximately 6.5, evacuation rate plateaus at 63%. This plateau explains the strength of the infrastructure in Japan because it implies that regardless of seismic intensity, a relatively large fraction (approximately 34%) of the individuals do not have to evacuate from their homes. We test the robustness of our results by performing a leave-one-out test similar to cross validation. More specifically, we estimate the parameters of the fragility curve using data obtained from three earthquakes, and then test the fit for the left out earthquake by measuring the correlation coefficient R and mean average percentage error (MAPE) of evacuation rates. Table 2 shows the robustness testing results. The first row means that the parameters of the fragility curve were estimated using data from disasters except Tohoku earthquake (i.e. Kumamoto, Tottori, Nagano earthquakes) asm ¼ 1:79 andŝ ¼ 0:122, and were used to predict the evacuation rates after the Tohoku earthquake, which showed high accuracy (R = 0.879, MAPE = 5.35%). For all disasters, prediction showed high correlation and small MAPE, all below 10%. Moreover, the estimated parameters were similar for all testing cases, showing the high stability and robustness of the estimated fragility curve. This test shows that by using the fragility curve fitted by data from past disasters, we are able to predict the evacuation rates in future disasters with high precision.
Understanding the spatial and temporal characteristics of evacuation behavior is important for creating effective evacuation plans. We investigated the evacuation distance, which is shown in Fig 4. The long tail distribution of the evacuation distance follows a power law P(d) = αd −γ , similar to cases in non-disaster cases [11]. We can observe the evacuation distance to be

Conclusion
Using large scale mobility data collected from over 1 million mobile phones of users affected by earthquakes, we carried out a cross-comparative analysis on the evacuation mobility patterns of individuals. Through a cross comparison across four large scale earthquakes in Japan, we found that although city characteristics vary, the evacuation rates can be approximated well with a single fragility curve with respect to seismic intensities, with a high correlation of R = 0.916. The robustness of the fragility curve was checked by performing a cross-validation test on the four disasters. Moreover, it was found that the distribution of evacuation distances did not depend on the seismic intensity that the individual experienced, which extends the past findings to a general setting. Our data-driven analysis of evacuation behavior after earthquakes based on seismic intensity could improve the manner in which disaster managers prepare and respond to disasters. Since seismic intensity data can be obtained instantly after the shock, practitioners can use that information to predict the approximate number of evacuees instantaneously after the earthquake to develop evacuation shelter location plans and significantly improve the quality of disaster response. Our future works include analyzing the fragility curves of evacuation rates in different areas of the world apart from Japan to investigate the factors that characterize robustness of cities against earthquakes.