Modeling migration patterns in the USA under sea level rise

Sea level rise in the United States will lead to large scale migration in the future. We propose a framework to examine future climate migration patterns using models of human migration. Our framework requires that we distinguish between historical versus climate driven migration and recognizes how the impacts of climate change can extend beyond the affected area. We apply our framework to simulate how migration, driven by sea level rise, differs from baseline migration patterns. Specifically, we couple a sea level rise model with a data-driven model of human migration and future population projections, creating a generalized joint model of climate driven migration that can be used to simulate population distributions under potential future sea level rise scenarios. The results of our case study suggest that the effects of sea level rise are pervasive, expanding beyond coastal areas via increased migration, and disproportionately affecting some areas of the United States.


Population projections
The population projections in this paper are reproduced from the methodology described in Hauer et al. 2016 [1]. This methodology models the growth (or decline) of housing units, $h_i^t$, for each census block group in the US based on historical data from 1940-2010, then projects the number of housing units per census block group out to 2100. The population per housing unit, $d_i$, and the group quarters population, $g_i$, of each block group are assumed to stay constant at their 2010 values. The total population of a block group $i$ at time $t$ is then $p_i^t = h_i^t \cdot d_i + g_i$. Our estimates of affected county level populations in 2100 under both the medium and high SLR scenarios are similar to those reported in the Supplementary Information of Hauer et al. 2016. These estimates can be downloaded from the accompanying code repository.
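The projection rule can be illustrated with a minimal sketch. The function name, the dictionary of projected housing units, and all numeric values are hypothetical and chosen only to show the arithmetic; they are not taken from Hauer et al. [1] or from the accompanying repository.

```python
def project_population(housing_units, persons_per_unit, group_quarters):
    """Return p_i^t = h_i^t * d_i + g_i for a single block group."""
    return housing_units * persons_per_unit + group_quarters


# Hypothetical block group: d_i and g_i are held at their 2010 values,
# while the housing unit count h_i^t changes with the projection year.
d_i = 2.5   # persons per housing unit in 2010 (made-up value)
g_i = 40.0  # group quarters population in 2010 (made-up value)
projected_units = {2010: 400.0, 2050: 480.0, 2100: 520.0}  # made-up h_i^t

population_by_year = {
    year: project_population(h, d_i, g_i)
    for year, h in projected_units.items()
}
```

Because $d_i$ and $g_i$ are fixed, any change in projected population under this rule comes entirely from the housing unit trajectory.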

Implementation and results using the Extended Radiation model
We implement our proposed framework using the extended radiation model [3] as the human migration model instead of the ANN model proposed in the main text. The extended radiation model is a rederivation of the original radiation model [4] under a survival analysis framework, with an additional parameter that controls the influence of the scale of the region and the degree of heterogeneity in the distribution of destination locations. Other human mobility models, such as the gravity model [5,6,7,8] and Schneider's intervening opportunities model [9], provide alternative ways to estimate the number of people that migrate between different locations. The extended radiation model is given in Equation 1, where the variables are as follows: $P_{ij}$ is the probability that a migrant who leaves zone $i$ will travel to zone $j$, $T_{ij}$ is the number of migrants that travel from zone $i$ to zone $j$, $m_i$ is the population of zone $i$, $\beta$ is a parameter of the extended radiation model that controls the influence of the scale of the region on trips, $d_{ij}$ is the distance between zones $i$ and $j$, and $s_{ij}$ is the population in the circle centered at $i$ with radius $d_{ij}$ (which does not include $m_i$ or $m_j$).
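As a rough illustration of how these variables combine, the sketch below assumes the closed form of the extended radiation model published in [3], $P_{ij} = \frac{[(m_i + m_j + s_{ij})^\beta - (m_i + s_{ij})^\beta](m_i^\beta + 1)}{[(m_i + s_{ij})^\beta + 1][(m_i + m_j + s_{ij})^\beta + 1]}$. The function names, the renormalisation of each row so that it sums to $\alpha m_i$, and the toy populations and distances are all assumptions of this sketch rather than details confirmed by the text.

```python
def intervening_population(i, j, populations, dist):
    """s_ij: population within radius d_ij of zone i, excluding zones i and j."""
    radius = dist[i][j]
    return sum(
        pop
        for k, pop in enumerate(populations)
        if k not in (i, j) and dist[i][k] <= radius
    )


def extended_radiation_probability(m_i, m_j, s_ij, beta):
    """Assumed closed form of P_ij from the extended radiation model [3]."""
    a = (m_i + m_j + s_ij) ** beta
    b = (m_i + s_ij) ** beta
    return (a - b) * (m_i ** beta + 1.0) / ((b + 1.0) * (a + 1.0))


def migration_row(i, populations, dist, beta, alpha):
    """Flows T_ij out of zone i: alpha * m_i migrants shared out by P_ij."""
    probs = [
        0.0 if j == i else extended_radiation_probability(
            populations[i],
            populations[j],
            intervening_population(i, j, populations, dist),
            beta,
        )
        for j in range(len(populations))
    ]
    total = sum(probs)  # assumption: renormalise so the row sums to alpha * m_i
    return [alpha * populations[i] * p / total for p in probs]


# Toy example: three hypothetical zones (made-up populations, distances in km).
populations = [1000.0, 500.0, 2000.0]
dist = [
    [0.0, 10.0, 50.0],
    [10.0, 0.0, 45.0],
    [50.0, 45.0, 0.0],
]
row = migration_row(0, populations, dist, beta=0.33, alpha=0.03)
```

Here `row[j]` is the modeled number of migrants from zone 0 to zone `j`, and the self-flow `row[0]` is zero by construction.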
As in the main text, we use two "versions" of the extended radiation model to model migrations from flooded areas and unflooded areas separately. We calculate $\beta_C$ and $\beta_S$, the parameters for the models $MIGRATION_C$ and $MIGRATION_S$ respectively, by maximizing the Common Part of Commuters (CPC) metric [8] over the same sets of migration data that were used to train the previous ANN model. Here $\beta_C = 0.13$ and $\beta_S = 0.33$. We compare the fit of the ANN and extended radiation models in Section 3. Figures 1 and 2 show the results from using these extended radiation models and are in the same format as the results figures from the main text. The general patterns match those from the main text (which uses an ANN model): higher concentrations of migrants move to urban areas, and counties immediately inland from coastal counties have higher percentages of incoming migrants by population.
Figure 2 shows the severity of indirect effects decreasing with distance from the East Coast of the US, while counties adjacent to the West Coast do not experience similar effects. This result is a combination of the larger displaced populations along the East Coast and the relatively higher population density of counties in the Eastern US. The extended radiation model describes the probability of migration between two counties as decaying as a function of the intervening opportunities between them. There are fewer intervening opportunities between counties in the less dense Western US, meaning longer migrations will be more probable there, while shorter migrations will be more likely in the denser Eastern US. Therefore, affected migrants leaving large population centers along the East Coast, such as Miami-Dade in Florida, will diffuse across the available opportunities at a rate roughly proportional to the distance from their origin, giving the observed pattern.
The extended radiation model predicts larger indirect effects at the higher values of $d$ than the ANN models do, i.e. larger numbers of people are indirectly affected by SLR. This is a direct consequence of the diffusion pattern described in the previous paragraph: more population mass is dispersed over larger numbers of rural counties in the Southeast US, so the entire populations of these counties are counted as affected under higher values of $d$.

Migration model training and validation
Our implementation of the Joint Model relies on ANN models of human migration to estimate migrations from affected and unaffected counties. We fit these models using historical county-to-county migration data from 2004-2014 from the IRS [10]. In this section, we evaluate the performance of our ANN models against the Extended Radiation [3], Radiation [4], and two types of Gravity models [7,8]. We have three sets of counties: all counties, which consists of every county from all years of migration data; affected counties, which consists of the 7 counties most heavily impacted by Hurricanes Katrina and Rita in 2005 (discussed in the main text); and unaffected counties, which consists of the set difference all counties \ affected counties. We evaluate each model's average cross-validated performance in three tasks: estimating migrations from all counties to all counties, from unaffected counties to unaffected counties, and from affected counties to unaffected counties.
In all of these cross-validation experiments we split on origin counties, i.e. we select a set of training counties and use all migration observations originating in that set of counties (different rows of $T$) to fit our models, then test on the migration observations originating in the remaining counties. In the affected counties to unaffected counties estimation task we use leave-one-out cross-validation, as we only have 7 origin counties, while in the other two tasks we use 5-fold cross-validation. In each cross-validation fold we fit a model for every year of training data and average the results. This fitting procedure involves training the ANN models using the parameters and architecture described in [2], estimating the single parameter, $\beta$, of the extended radiation or gravity models, and estimating the production function coefficient $\alpha$. We estimate the $\beta$ parameters by maximizing the CPC metric (described below) between the training migration data and the modeled migration data. Finally, we estimate the $\alpha$ parameter by calculating the slope of the best fit line through all $(m_i, \sum_{j=1}^{n} T_{ij})$ points, i.e. computing the average fraction of the population of a county that migrates away in a given year. All parameter fitting is done only within a fold's training data, and the averages and standard deviations of the $\alpha$ and best $\beta$ parameters for each task are reported in Table 1.
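The origin-based split and the estimate of $\alpha$ can be sketched as follows. This is a simplified illustration rather than the authors' code: the function names and toy data are hypothetical, the fold assignment by shuffling is one possible choice, and forcing the best fit line through the origin is an assumption of this sketch.

```python
import random


def origin_folds(origins, n_folds, seed=0):
    """Partition origin indices into n_folds disjoint test sets."""
    shuffled = list(origins)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[k::n_folds] for k in range(n_folds)]


def fit_alpha(populations, out_migrants):
    """Least-squares slope of a line through the origin for (m_i, sum_j T_ij).

    Dropping the intercept is an assumption of this sketch.
    """
    num = sum(m * t for m, t in zip(populations, out_migrants))
    den = sum(m * m for m in populations)
    return num / den


# Toy data: 4 hypothetical counties, T[i][j] = migrants from i to j (made up).
populations = [1000.0, 2000.0, 1500.0, 500.0]
T = [
    [0, 10, 15, 5],
    [20, 0, 30, 10],
    [15, 20, 0, 10],
    [5, 5, 5, 0],
]

alphas = []
for test_origins in origin_folds(range(len(T)), n_folds=2):
    train_origins = [i for i in range(len(T)) if i not in test_origins]
    # Fit only on rows of T whose origin county is in the training set.
    alpha = fit_alpha(
        [populations[i] for i in train_origins],
        [sum(T[i]) for i in train_origins],
    )
    alphas.append(alpha)
```

Setting `n_folds` equal to the number of origin counties gives the leave-one-out scheme used when only 7 origins are available.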
To measure model performance we record the Common Part of Commuters (CPC) [7], Common Part of Commuters distance variant ($CPC_d$) [8], mean absolute error (MAE), and coefficient of determination ($r^2$) between the ground truth migrations, $T$, and the model estimated migrations, $\hat{T}$. These metrics are calculated on the origin-destination migration matrices, hence we refer to them as "metrics on full matrix" in Table 1.
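Common Part of Commuters (CPC). This metric directly compares numbers of travelers between the predicted and ground truth matrices. It will be 0 when the two matrices have no entries in common, and 1 when they are identical.

$$CPC(T, \hat{T}) = \frac{2 \sum_{i,j=1}^{n} \min(T_{ij}, \hat{T}_{ij})}{\sum_{i,j=1}^{n} T_{ij} + \sum_{i,j=1}^{n} \hat{T}_{ij}}$$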
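A minimal sketch of the CPC computation, assuming the standard form in which twice the shared flow is divided by the total flow in both matrices. The function name and the convention of returning 0 for two empty matrices are choices of this sketch.

```python
def cpc(T, T_hat):
    """Common Part of Commuters between two equally shaped flow matrices."""
    overlap = sum(
        min(t, p)
        for row_t, row_p in zip(T, T_hat)
        for t, p in zip(row_t, row_p)
    )
    total = sum(map(sum, T)) + sum(map(sum, T_hat))
    # Convention of this sketch: two empty matrices score 0.
    return 2.0 * overlap / total if total > 0 else 0.0


# Toy matrices with made-up flows.
T_true = [[0, 4], [2, 0]]
T_pred = [[0, 2], [2, 0]]
score = cpc(T_true, T_pred)
```

In this toy case the shared flow is 4 out of a combined total of 10, so the two matrices agree on most but not all of the migrants.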
Common Part of Commuters Distance Variant ($CPC_d$). This metric measures how well a predicted migration matrix recreates trips at the same distances as the ground truth data. In this definition, $N$ is a histogram where a bin $N_k$ contains the number of migrants that travel between $2k - 2$ and $2k$ kilometers. It will be 0 when the two matrices do not have any migrations at the same distance, and 1 when all migrations fall within the same distance bins.
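The distance variant can be sketched by binning flows into 2 km bands and comparing the two histograms. The normalisation used here, twice the shared histogram mass over the total mass of both histograms, mirrors the CPC metric and is an assumption of this sketch, as are the function names and the handling of distances that fall exactly on a bin edge.

```python
def distance_histogram(T, dist, bin_width_km=2.0):
    """N_k: migrants whose trip length lies in the k-th distance band (k >= 1)."""
    hist = {}
    for i, row in enumerate(T):
        for j, flow in enumerate(row):
            if flow > 0:
                # Band k covers [2k - 2, 2k) km; edge handling is an assumption.
                k = int(dist[i][j] // bin_width_km) + 1
                hist[k] = hist.get(k, 0.0) + flow
    return hist


def cpc_d(T, T_hat, dist):
    """Overlap of the trip-distance histograms of T and T_hat."""
    n, n_hat = distance_histogram(T, dist), distance_histogram(T_hat, dist)
    overlap = sum(min(n.get(k, 0.0), n_hat.get(k, 0.0)) for k in set(n) | set(n_hat))
    total = sum(n.values()) + sum(n_hat.values())
    return 2.0 * overlap / total if total > 0 else 0.0


# Toy example: zones 1 and 2 are both between 2 and 4 km from zone 0 (made up).
dist = [
    [0.0, 3.0, 3.5],
    [3.0, 0.0, 1.0],
    [3.5, 1.0, 0.0],
]
T_true = [[0, 5, 0], [0, 0, 0], [0, 0, 0]]
T_pred = [[0, 0, 5], [0, 0, 0], [0, 0, 0]]
score = cpc_d(T_true, T_pred, dist)
```

The toy prediction sends every migrant to the wrong destination, yet all trips fall in the same distance band, so the distance variant scores it as a perfect match while the entry-wise CPC would be 0.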
Mean absolute error (MAE). This is a standard error measure: the average absolute difference between the predicted and ground truth values. Here, smaller values represent smaller errors in terms of number of migrants.

$$MAE(T, \hat{T}) = \frac{1}{n} \sum_{i,j=1}^{n} |T_{ij} - \hat{T}_{ij}|$$

Coefficient of determination ($r^2$). This score measures the goodness of fit between a set of predictions and the ground truth values. It ranges from 1, for a perfect fit, to arbitrarily negative values as the fit becomes worse, and is 0 when every prediction equals the mean of the ground truth values.
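Both scores can be sketched in a few lines. The MAE below averages over all compared matrix entries, which is an assumption about the normaliser, and $r^2$ uses the usual definition of one minus the ratio of residual to total sum of squares. Function names and toy values are hypothetical.

```python
def flatten(matrix):
    """Return all entries of a nested-list matrix as one flat list."""
    return [value for row in matrix for value in row]


def mean_absolute_error(T, T_hat):
    """Average absolute difference over all compared entries."""
    truth, pred = flatten(T), flatten(T_hat)
    return sum(abs(t - p) for t, p in zip(truth, pred)) / len(truth)


def r_squared(T, T_hat):
    """1 - SS_res / SS_tot; undefined if the ground truth is constant."""
    truth, pred = flatten(T), flatten(T_hat)
    mean = sum(truth) / len(truth)
    ss_res = sum((t - p) ** 2 for t, p in zip(truth, pred))
    ss_tot = sum((t - mean) ** 2 for t in truth)
    return 1.0 - ss_res / ss_tot


# Toy matrices with made-up flows.
T_true = [[0, 4], [2, 0]]
T_mean = [[1.5, 1.5], [1.5, 1.5]]  # predicts the ground truth mean everywhere
T_bad = [[4, 0], [0, 2]]           # puts the flows in the wrong cells

r2_mean = r_squared(T_true, T_mean)
r2_bad = r_squared(T_true, T_bad)
```

Predicting the mean everywhere gives a score of 0, and a prediction that places flows in the wrong cells gives a negative score, matching the range described above.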
We also measure how well each model predicts the aggregate number of incoming migrants per county. We calculate the MAE and $r^2$ between $T_{:i} = \sum_{j=1}^{n} T_{ji}$ and $\hat{T}_{:i} = \sum_{j=1}^{n} \hat{T}_{ji}$ and call them "metrics on incoming migrants vector" in Table 1.
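The incoming migrants vector is simply the vector of column sums of the migration matrix. The sketch below, with hypothetical names and made-up flows, shows how a prediction can be wrong on individual origin-destination pairs while still matching the aggregate inflow of every county.

```python
def incoming_migrants(T):
    """T_{:i} = sum_j T_ji, i.e. the column sums of the migration matrix."""
    return [sum(column) for column in zip(*T)]


def vector_mae(v, v_hat):
    """Mean absolute error between two equally long vectors."""
    return sum(abs(a - b) for a, b in zip(v, v_hat)) / len(v)


# Toy matrices with made-up flows: entries differ, column sums agree.
T_true = [
    [0, 2, 1],
    [4, 0, 3],
    [5, 6, 0],
]
T_pred = [
    [0, 1, 2],
    [5, 0, 2],
    [4, 7, 0],
]

inflow_true = incoming_migrants(T_true)
inflow_pred = incoming_migrants(T_pred)
aggregate_error = vector_mae(inflow_true, inflow_pred)
```

This is why the full-matrix and incoming-vector metrics are reported separately: errors on pairwise flows can cancel once they are summed per destination.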
The results of these experiments are shown in Table 1. Here, the ANN model outperforms the other models in the large unaffected counties to unaffected counties and all counties to all counties tasks. In these two tasks the ANN model has a large amount of training data to exploit, and performs best in the matrix MAE, matrix $r^2$, and incoming migrants MAE metrics, and second best in the remaining metrics. The gravity model with power law decay performs best in terms of CPC and $CPC_d$ on both tasks, but fails to appropriately model the pairwise flows, with a matrix $r^2$ score near 0. Similarly, the extended radiation model performs slightly better than the ANN model in terms of incoming migrants $r^2$ (and is indeed the only traditional migration model with a positive matrix $r^2$ metric), but performs significantly worse than the ANN model in all other dimensions.
In the affected counties to unaffected counties task, the best performing model is not obvious. Here, the matrix MAE and $r^2$ values are identical to the incoming migrants MAE and $r^2$ values because of the leave-one-out cross-validation method: the test set in each split consists of all possible migrations originating from a single county. We observe that the extended radiation model performs best in terms of CPC and $r^2$ scores, but with widely varying best parameter values between cross-validation splits (shown by the large standard deviation of the best $\beta$ value). The ANN model performs best in terms of $CPC_d$ and MAE, but poorly on average considering its average $r^2$ score near 0. In the testing splits where the ANN model performs poorly, it has overfit to the longer distance migrations observed in 6 out of the 7 affected counties and is unable to capture the shorter distance migrations from the held-out county. As the extended radiation model captures the general behavior of human migration (with a single parameter), it does not overfit in this case. The best $\beta$ parameters in the affected counties to unaffected counties task are significantly different from those in the other two tasks, showing how the structure of migration may be different under extreme flooding events.

Effects of modeling climate migrants separately
In the main text we argue that persons living in areas that will be affected by SLR will be exposed to increased climatic pressures and will be forced to migrate elsewhere. Furthermore, we train an ANN model to separately model these migrations based on historic migration patterns from counties that were especially affected by Hurricanes Katrina and Rita. Here, we show the effect that separately modeling these migrations has on our results by simulating the same set of conditions with a single migration model for both types of migrants, under both the ANN and Extended Radiation migration models. Figure 3 shows the difference in the number of incoming migrants per county between the results using separate migration models and the results using a single migration model for the 1.8m SLR scenario. Notably, in both sets of results, more migrations to coastal areas are predicted when climate driven migrations are not modeled separately. One explanation for this pattern is that the probability of migration for climate driven migrations does not decay as strongly with distance as in standard migrations (e.g. the calibrated scale parameter in the Extended Radiation model is lower for climate migrants, which decreases the impact of intervening opportunities). Displaced migrants therefore "see" distant population centers as being as attractive a destination as nearby coastal destinations (which will usually be more populated than surrounding areas).
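The effect of a lower scale parameter can be illustrated numerically. The sketch below assumes the closed form of the extended radiation probability published in [3] and compares a nearby destination (no intervening population) with a distant one (large intervening population) under the two $\beta$ values reported above. The populations and intervening population sizes are hypothetical.

```python
def extended_radiation_probability(m_i, m_j, s_ij, beta):
    """Assumed closed form of P_ij from the extended radiation model [3]."""
    a = (m_i + m_j + s_ij) ** beta
    b = (m_i + s_ij) ** beta
    return (a - b) * (m_i ** beta + 1.0) / ((b + 1.0) * (a + 1.0))


def near_far_ratio(beta, m_i=1000.0, m_j=1000.0, s_near=0.0, s_far=100000.0):
    """How strongly an equally sized nearby destination is preferred (made-up sizes)."""
    near = extended_radiation_probability(m_i, m_j, s_near, beta)
    far = extended_radiation_probability(m_i, m_j, s_far, beta)
    return near / far


ratio_climate = near_far_ratio(beta=0.13)   # beta_C, fitted for climate migrants
ratio_standard = near_far_ratio(beta=0.33)  # beta_S, fitted for standard migrants
```

Under these assumptions the preference for the nearby destination is weaker for the smaller $\beta$, which is consistent with climate migrants being less deterred by intervening opportunities.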
One discrepancy between the two sets of results is that, in the Extended Radiation model results, modeling affected and unaffected migration behavior separately leads to more predicted migrations to urban centers, while the corresponding ANN results show fewer predicted migrations to urban centers. This could also be due to the different emphasis the two models place on distance when predicting migration probabilities.