Detecting behavioural changes in human movement to inform the spatial scale of interventions against COVID-19

On March 23 2020, the UK enacted an intensive, nationwide lockdown to mitigate transmission of COVID-19. As restrictions began to ease, more localized interventions were used to target resurgences in transmission. Understanding the spatial scale of networks of human interaction, and how these networks change over time, is critical to targeting interventions at the most at-risk areas without unnecessarily restricting areas at low risk of resurgence. We use detailed human mobility data aggregated from Facebook users to determine how the spatially-explicit network of movements changed before and during the lockdown period, in response to the easing of restrictions, and to the introduction of locally-targeted interventions. We also apply community detection techniques to the weighted, directed network of movements to identify geographically-explicit movement communities and measure the evolution of these community structures through time. We found that the mobility network became more sparse and the number of mobility communities decreased under the national lockdown, a change that disproportionately affected long distance connections central to the mobility network. We also found that the community structure of areas in which locally-targeted interventions were implemented following epidemic resurgence did not show reorganization of community structure but did show small decreases in indicators of travel outside of local areas. We propose that communities detected using Facebook or other mobility data be used to assess the impact of spatially-targeted restrictions and may inform policymakers about the spatial extent of human movement patterns in the UK. These data are available in near real-time, allowing quantification of changes in the distribution of the population across the UK, as well as changes in travel patterns to inform our understanding of the impact of geographically-targeted interventions.

Introduction
Fine-scale geographic monitoring of large populations can potentially increase the accuracy and responsiveness of epidemiological modelling, outbreak response, and intervention planning in response to public health emergencies like the COVID-19 pandemic [1][2][3][4][5][6]. Population and mobility datasets collected from the movement of individuals' mobile phones provide empirical, near-real time metrics of population movement between different geographic regions [6][7][8]. The COVID-19 pandemic response has coincided with the availability of new data sources for measuring human movement, aggregated from mobile devices by network providers and popular applications including Google Maps, Apple Maps, Citymapper, and Facebook [7,9,10].
The spread of COVID-19 through travel networks has been demonstrated in China, where connectivity to Wuhan was shown to predict the timing of arrival of COVID-19 cases in each region [11,12]. The volume of mobility has also been shown to correlate with transmission of COVID-19, where movement can be used as a proxy for the degree of social distancing [13]. Travel and movement behavior during epidemics may also change in response to imposed interventions and perceived risk, and due to seasonal activities such as vacations [11,12]. During the COVID-19 pandemic, mobility data has been used to assess adherence to movement restrictions [13,14] and the impact of movement restrictions on the transmission dynamics of COVID-19 [15][16][17], and to demonstrate differential adherence to movement restrictions among socioeconomic groups [18][19][20][21].
In this analysis, we use movement and population data provided by Facebook from March 10 to November 1 2020, which records approximately 15 million daily locations of 4.8 million users [7]. We also used population, age, ethnicity, and Index of Multiple Deprivation (IMD) data from UK national statistics agencies to understand the population of users recorded in the movement and population data [22][23][24][25]. We identify changes in travel behavior in response to initially stringent movement restrictions (March to May 2020) and subsequent easing of restrictions, paired with a policy of spatially-targeted interventions in response to local resurgences (May to October 2020). Using network analytic methods to understand the structure of interconnected communities in the movement network, we trace the evolution of geographic communities through time, comparing them to intervention measures implemented in response to local resurgences.

Facebook data
Data provided by the Facebook Data for Good partner program [7] uses aggregated and anonymized user data to record user locations in grid cells (S1 Fig). This data is generated from the population of Facebook users with location services actively enabled and is released approximately 48 hours after data collection.
We used movement data, describing users' modal location in map cells in sequential 8 hour periods. This means that, in each period, a user is assigned to the cell in which they record the largest number of locations. For this individual, the beginning and end points of a network edge are defined as their location in sequential periods. This individual's movement is then aggregated with others to form a weighted Origin-Destination matrix of movement between grid cells. Edges with fewer than 10 travellers were removed by Facebook prior to data sharing to preserve privacy (S2 Fig). Any cell that did not record any between-cell travels with greater than 10 travellers in a given time window was omitted from the dataset, regardless of whether that cell recorded an internal number of users greater than 10. In our network analysis, we constructed a weighted, directed network where nodes were cells, and edge weights were the number of users observed travelling between cells.
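The construction described above can be illustrated with a minimal sketch. The raw user-level locations are not available outside Facebook, so the input structure (`user_periods`), the function names, and the handling of ties in the modal location are all assumptions made for illustration only.

```python
from collections import Counter

def modal_cell(locations):
    """Return the cell in which a user recorded the most locations in one period."""
    return Counter(locations).most_common(1)[0][0]

def build_od_matrix(user_periods, min_travellers=10):
    """Aggregate per-user movements into a censored origin-destination matrix.

    user_periods: {user_id: [cells seen in period 0, cells seen in period 1, ...]}
                  (hypothetical layout; every period is assumed to be non-empty).
    Returns {(origin_cell, destination_cell): travellers}, keeping only edges
    with at least `min_travellers` travellers, mirroring the privacy threshold.
    """
    flows = Counter()
    for periods in user_periods.values():
        modes = [modal_cell(p) for p in periods]
        # An edge links a user's modal cell in one period to the next period.
        for origin, destination in zip(modes, modes[1:]):
            flows[(origin, destination)] += 1
    return {edge: n for edge, n in flows.items() if n >= min_travellers}
```

Edges from a cell to itself are counted in the same way as between-cell edges here; whether they are retained depends only on the threshold.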
We define movement outside of local areas as movement from one cell to any other cell in the network. The movement data also records connections from one cell to the same cell in sequential periods. This type of movement may indicate movement within a local area (smaller than the area of a given cell), or completely stationary individuals. Because of the spatial resolution of the data, it is not possible to quantify the volume of within-cell movement that corresponds to each of these behaviours. We therefore use the volume of movement outside of the cell as a measurement of overall movement behaviour, making the assumption that travellers leaving a cell will primarily return to the same cell in sequential periods.

Bing maps tile system
Movement data is referenced to the Bing Maps Tile System, a standard geospatial reference used primarily for serving web maps [26]. The system is divided into 23 zoom levels ranging from global level 1 (map scale: 1:295,829,355.45) to detailed level 23 (map scale: 1:70.53). Each Bing Maps cell is identified by a "quadkey", a unique identifier encoding the zoom level and tile coordinates of an individual cell. In this analysis, all mobility and census datasets were referenced to Bing Maps cells. The movement dataset was referenced to cells at zoom level 12 (approximately 4.8 to 6.2 km on a side in the UK, measured at 60.77° and 50.59° latitude respectively). The ground resolution of cells varies with latitude, with cells at higher latitudes covering a smaller ground area than those at lower latitudes. This variation is caused by the distortion inherent in the Web Mercator projection (EPSG:3857) used by the Bing Maps Tile System.
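As an illustration of the tile system, the following sketch converts a latitude and longitude to a quadkey and estimates the ground width of a tile, following the formulas published in the Bing Maps Tile System documentation. The function names are our own, and the width calculation assumes a spherical Earth of radius 6,378,137 m. Under that assumption, the 4.8 and 6.2 figures quoted above correspond to the width in kilometres of a level 12 tile at the two latitudes.

```python
import math

EARTH_RADIUS_M = 6378137.0      # sphere radius used by Web Mercator
MAX_LATITUDE = 85.05112878      # latitude limit of the projection

def latlon_to_tile(lat, lon, level):
    """Convert WGS84 coordinates to tile (x, y) indices at a zoom level."""
    lat = max(min(lat, MAX_LATITUDE), -MAX_LATITUDE)
    sin_lat = math.sin(math.radians(lat))
    n = 2 ** level                                   # tiles per axis
    x = (lon + 180.0) / 360.0
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    tile_x = min(max(int(x * n), 0), n - 1)
    tile_y = min(max(int(y * n), 0), n - 1)
    return tile_x, tile_y

def tile_to_quadkey(tile_x, tile_y, level):
    """Interleave the bits of the tile indices into a base-4 quadkey string."""
    digits = []
    for i in range(level, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)

def tile_width_km(lat, level):
    """Approximate east-west ground width of one tile at a given latitude."""
    circumference = 2 * math.pi * EARTH_RADIUS_M
    return math.cos(math.radians(lat)) * circumference / 2 ** level / 1000.0
```

The length of a quadkey equals its zoom level, so every cell in the movement dataset is identified by a 12-character key.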

Demographic information
We compared the age, population, ethnicity, and IMD of each cell to the population of users recorded in the movement data to identify relationships between the percentage of Facebook users and these demographic factors [27][28][29][30][31][32]. We extracted these variables from national statistics agencies (Office for National Statistics, Northern Ireland Statistics and Research Agency, Scottish Government, and Welsh Government) and aggregated them to grid cells. Census variables were referenced to different statistical units by country. In Northern Ireland, census variables were referenced to Super Output Areas (SOAs), in England and Wales, Lower Super Output Areas (LSOAs), and in Scotland, Data Zones (DZs). Detailed population data was also collected from national statistics agencies, providing a measure of population for Small Areas (Northern Ireland), Output Areas (OA; England and Wales), and Data Zones (Scotland) [33]. Census variables referenced to different national statistical areas were aggregated to align with mobility datasets at zoom level 12. First, we combined the 2011 population weighted centroids of each OA (or equivalent) from the UK Census with 2019 mid-year population estimates in each UK country. We then assigned each OA centroid to the cell it falls within. We then joined 2011-derived census variables (Age, Ethnicity and IMD) to the OA centroids and computed an average of each census variable for each cell, weighted by the OA population estimates. For IMD data (recorded as ranks) we ranked the weighted average values to create a rank of cells by their population weighted IMD. OAs are much more granular than cells and therefore nested within them in the majority of cases, minimising the risk of the cells detrimentally intersecting OAs during the spatial assignment of demographic variables.
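The population-weighted aggregation described above can be sketched as follows. The input layout (one tuple per OA centroid, already assigned to a cell) and the function names are assumptions for illustration. The direction of the ranking (1 = smallest weighted value) is also an assumption, as the convention used is not stated.

```python
from collections import defaultdict

def weighted_cell_means(output_areas):
    """Population-weighted mean of a census variable for each cell.

    output_areas: iterable of (cell_id, population, value) tuples, one per
                  OA centroid (hypothetical layout).
    """
    totals = defaultdict(lambda: [0.0, 0.0])   # cell -> [sum(pop * value), sum(pop)]
    for cell, population, value in output_areas:
        totals[cell][0] += population * value
        totals[cell][1] += population
    return {cell: wv / w for cell, (wv, w) in totals.items() if w > 0}

def rank_cells(cell_values):
    """Rank cells by their weighted value (assumed convention: 1 = smallest)."""
    ordered = sorted(cell_values, key=cell_values.get)
    return {cell: rank for rank, cell in enumerate(ordered, start=1)}
```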
To assess the correlation between census variables and the proportion of Facebook users in each cell, we computed the Pearson correlation coefficient and two-sided p-values between the proportion of Facebook users in a cell and each census variable.
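For reference, the Pearson coefficient and the test statistic underlying its two-sided p-value can be written out in a few lines. This is a minimal sketch using only the standard library, not the code used in the analysis. Converting the statistic to a p-value requires the Student t distribution with n - 2 degrees of freedom, which a statistics package (for example `scipy.stats.pearsonr`) provides directly.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```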

Temporal aggregation
Both Facebook movement and population data are recorded in 8 hour intervals. These data display strong and consistent intraday and intraweek patterns. To isolate changes in daily mobility, data collected in 8 hour periods were aggregated to daily periods by taking the sum of the observed number of travellers along an edge for all periods within a day.
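A minimal sketch of this aggregation step is given below. The record layout is hypothetical: each record is assumed to carry the start time of its 8 hour period, and periods are assigned to the calendar day on which they start.

```python
from collections import Counter
from datetime import datetime

def aggregate_daily(records):
    """Sum travellers along each edge over all 8 hour periods within a day.

    records: iterable of (period_start, origin, destination, travellers),
             where period_start is a datetime (hypothetical layout).
    Returns {(date, origin, destination): travellers}.
    """
    daily = Counter()
    for period_start, origin, destination, travellers in records:
        daily[(period_start.date(), origin, destination)] += travellers
    return dict(daily)
```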

Baseline population estimates
To obtain an accurate measurement of the number of users in the Facebook movement and population datasets relative to census population estimates, we used baseline measurements of travel from one cell to the same cell in sequential time periods computed during the 45 days prior to the creation of the data collection, from January 29th to March 9th 2020. This baseline population recorded the median number of users in a cell for each daily time window in the reference period.

Edge betweenness centrality
We computed the edge betweenness centrality of the movement network, a measure of which edges are most responsible for maintaining connection between different parts of the network. Betweenness centrality was calculated for the weighted, directed network. Comparing this measure of centrality provides information about changes in network topology, the volume of travel along the network, and which connections might be preferred targets for interventions.
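To make the measure concrete, the sketch below computes unnormalised edge betweenness with Brandes' algorithm on a weighted, directed network. How traveller counts are converted into path lengths is not specified above; the sketch assumes a cost of 1/travellers, so that busier edges are "shorter". That choice, like the function name and input layout, is an assumption for illustration only.

```python
import heapq
from collections import defaultdict

def edge_betweenness(edges):
    """Unnormalised edge betweenness of a weighted, directed network.

    edges: {(origin, destination): travellers}, with travellers > 0.
    Node identifiers must be mutually comparable (e.g. quadkey strings).
    """
    adjacency = defaultdict(list)
    nodes = set()
    for (u, v), travellers in edges.items():
        adjacency[u].append((v, 1.0 / travellers))   # assumed cost function
        nodes.update((u, v))
    score = {edge: 0.0 for edge in edges}
    for source in nodes:
        # Dijkstra from `source`, counting shortest paths (sigma).
        dist, sigma, preds = {source: 0.0}, {source: 1.0}, {source: []}
        visited, order = set(), []
        heap = [(0.0, source)]
        while heap:
            d, v = heapq.heappop(heap)
            if v in visited:
                continue
            visited.add(v)
            order.append(v)
            for w, cost in adjacency[v]:
                nd = d + cost
                if w not in dist or nd < dist[w]:
                    dist[w], sigma[w], preds[w] = nd, sigma[v], [v]
                    heapq.heappush(heap, (nd, w))
                elif nd == dist[w]:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest node back to the source.
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[(v, w)] += share
                delta[v] += share
    return score
```

In a three-cell chain where a weak direct edge is bypassed by two strong edges, the strong edges carry all shortest paths and the direct edge scores zero.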

Community detection
Community detection methods are algorithms for identifying groups of meaningfully connected vertices in a network. Many methods exist, with various tradeoffs on computational performance, resolution, or other characteristics [34][35][36][37]. Different community detection methods produce different results because of differences in the network characteristics that they use to define communities. To understand the robustness of the communities detected in this study, we employed two different algorithms, InfoMap and Leiden. InfoMap is a community detection algorithm based on the Map equation which records the movements of a random walker along the network. The algorithm tests network partitions, attempting to identify the partition that minimizes the description length required to describe the walker's path [38]. In other words, the algorithm compresses the description of movements in parts of the network where the walker spends a large amount of time, thereby forming communities from these strongly connected sections of the network. The Leiden algorithm maximizes the modularity of different node partitions. To do this, the algorithm assigns network nodes to a partition, assesses the modularity of the partitioned groups, aggregates the nodes in a given partition, and repeats this process until there is no further improvement in the modularity of the partitions. This process results in a partition in which communities possess stronger connections to members of their own community than to other network nodes [39,40]. There is a penalty for both algorithms if they identify a number of communities larger than the minimum number required to optimize their respective objective functions (InfoMap: minimum description length, Leiden: maximum modularity).
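The quantity maximized by modularity-based methods can be illustrated with a short sketch. The function below evaluates the standard modularity of a given partition for a weighted, directed network, Q = sum over communities of [L_c / m - (K_c_out * K_c_in) / m^2], where m is the total edge weight, L_c the weight of edges internal to community c, and K_c_out and K_c_in the total outgoing and incoming weight of its nodes. It does not implement the Leiden search itself, and the exact quality function and resolution settings used in the analysis are not specified here, so this should be read as an assumption-laden illustration.

```python
from collections import defaultdict

def directed_modularity(edges, membership):
    """Modularity of a partition of a weighted, directed network.

    edges:      {(u, v): weight}
    membership: {node: community_label}
    """
    m = sum(edges.values())
    k_out = defaultdict(float)      # outgoing weight per community
    k_in = defaultdict(float)       # incoming weight per community
    internal = defaultdict(float)   # weight of edges inside each community
    for (u, v), weight in edges.items():
        k_out[membership[u]] += weight
        k_in[membership[v]] += weight
        if membership[u] == membership[v]:
            internal[membership[u]] += weight
    communities = set(membership.values())
    return sum(internal[c] / m - k_out[c] * k_in[c] / (m * m) for c in communities)
```

Two disconnected pairs of mutually linked cells score 0.5 when each pair forms its own community, and 0 when all four cells are placed in a single community.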
We compared the effect of the different community detection algorithms, and found that they aligned hierarchically, where the Leiden algorithm identified geographically larger communities. If the communities detected by one method are largely a superset of the communities detected by another, with shared boundaries between the defined communities, this likely represents a differing hierarchical structure rather than a different interpretation of community structure. We assessed the agreement between community detection methods to understand the stability of detected communities by comparing the proportion of nodes in each community detected using InfoMap with all communities determined using Leiden, and vice versa (S9 Fig). This comparison allows for the computation of the proportion of shared nodes between both algorithms. The maximum and mean overlap of communities in each algorithm helped to identify the agreement between each method of community detection. In general we found that Leiden detected larger communities, for which the InfoMap communities were (for the most part) sub-communities.
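The overlap comparison can be sketched as below. The partition layout and function names are hypothetical. When one partition nests inside the other, every community of the finer partition has a maximum overlap of 1 with the coarser partition, while the reverse comparison yields smaller values.

```python
def overlap_matrix(partition_a, partition_b):
    """Proportion of each community in A that falls in each community of B.

    partition_a, partition_b: {community_label: set_of_cells}
    """
    return {
        a: {b: len(cells_a & cells_b) / len(cells_a)
            for b, cells_b in partition_b.items()}
        for a, cells_a in partition_a.items()
    }

def max_overlap(partition_a, partition_b):
    """Largest share of each A community contained in a single B community."""
    matrix = overlap_matrix(partition_a, partition_b)
    return {a: max(row.values()) for a, row in matrix.items()}

def mean_max_overlap(partition_a, partition_b):
    """Mean of the per-community maximum overlaps."""
    values = list(max_overlap(partition_a, partition_b).values())
    return sum(values) / len(values)
```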

Community label inheritance
The community detection methods used in this study identified communities each day. To track the evolution of communities over the study period, we employed a heuristic approach, assigning the label of a given community identified in a certain time step to the community with the highest number of shared nodes in the following time step [41]. When multiple communities in a certain time step "claim" the same community in the following time step, the community closest in size to the community in the following time step "wins" the right to pass on its label.
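One possible reading of this heuristic is sketched below. Details that the description leaves open are filled with assumptions: labels are integers, ties in the number of shared nodes go to the first candidate, and communities that no earlier community claims receive a new label.

```python
def inherit_labels(previous, current, next_label):
    """Pass community labels from one time step to the next.

    previous:   {label: set_of_cells} at time step t (integer labels assumed)
    current:    list of sets of cells detected at time step t + 1
    next_label: first unused integer label
    Returns ({label: set_of_cells} for t + 1, updated next_label).
    """
    claims = {}   # index of current community -> labels claiming it
    for label, cells in previous.items():
        shared = [len(cells & candidate) for candidate in current]
        if shared and max(shared) > 0:
            target = shared.index(max(shared))   # first candidate wins ties
            claims.setdefault(target, []).append(label)
    labelled = {}
    for index, cells in enumerate(current):
        if index in claims:
            # The claimant closest in size to the new community wins.
            winner = min(claims[index],
                         key=lambda l: abs(len(previous[l]) - len(cells)))
        else:
            winner = next_label
            next_label += 1
        labelled[winner] = cells
    return labelled, next_label
```

Because each earlier community claims exactly one successor, no label can be passed to two communities in the same time step.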

COVID-19 data
We used confirmed COVID-19 cases from the UK Pillar 1 and Pillar 2 testing schemes available for England [42]. Pillar 1 is predominantly hospital-based tests including patients and health care workers. Pillar 2 is symptomatic community testing on demand, and represents the bulk of the testing in the UK. Data on the number of confirmed SARS-CoV-2 positive tests by specimen date were available at the Lower Tier Local Authority (LTLA) level [43].
To compare confirmed COVID-19 cases to movement indicators, we measured the total proportion of travellers leaving a grid cell in monthly periods for all cells in England. Cells were then assigned to LTLAs by their maximum areal overlap.
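The two steps above can be sketched as follows, assuming that the overlap areas between cells and LTLAs have already been computed with a GIS tool. The input layouts and function names are hypothetical. The proportion leaving is taken here as between-cell travellers divided by all travellers originating in the cell, which is our reading of the indicator.

```python
def proportion_leaving(flows):
    """Share of travellers originating in each cell who end in another cell.

    flows: {(origin, destination): travellers}, summed over the period of interest.
    """
    total, leaving = {}, {}
    for (origin, destination), travellers in flows.items():
        total[origin] = total.get(origin, 0) + travellers
        if origin != destination:
            leaving[origin] = leaving.get(origin, 0) + travellers
    return {cell: leaving.get(cell, 0) / n for cell, n in total.items() if n > 0}

def assign_cells_to_ltla(overlap_areas):
    """Assign each cell to the LTLA with which it shares the largest area.

    overlap_areas: {(cell, ltla): overlapping_area}
    """
    best = {}
    for (cell, ltla), area in overlap_areas.items():
        if cell not in best or area > best[cell][1]:
            best[cell] = (ltla, area)
    return {cell: ltla for cell, (ltla, _) in best.items()}
```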

Movement patterns observed
To understand the representativeness of Facebook data to the general population, we explored the size of the population of Facebook users included in the movement dataset, and compared this population to 2019 UK census population estimates. Facebook recorded an average of 4.5 million users per 8 hour period, ranging from 5.8 million on March 29th between 4pm and midnight, to 3.7 million on August 9th between midnight and 8am (Fig 1A).
The percentage of Facebook users per grid cell was comparable in the four nations of the UK (Fig 1B). The population of Facebook users was highly correlated (R = 0.97, p < 2.2e-16) with census population estimates for all cells (Fig 1C), although there is variance in the proportion of Facebook users to census population across cells (Fig 1D). We compared the proportion of Facebook users to identify whether the distribution of Facebook users was biased in relation to a specific demographic variable, finding no strong associations between the percentage of Facebook users and the average age, percent minority ethnic, population density, or IMD of each cell (S3 and S5 Figs).
Using COVID-19 case data at LTLA level in England, we identified a consistent association between the proportion of users travelling outside of grid cells and the number of cases in LTLAs per month during the study period (Figs 2 and S6). While the strength of this association varies at different stages of the pandemic, it supports previous research demonstrating the relationship between increased rates of movement, which we measure as the percentage of travel outside of local areas, and increased COVID-19 incidence.

Network structure
To quantify how the structure of the overall network changed through time, we computed the edge betweenness centrality of connections between cells for the weighted movement networks, a measure of the relative importance of a given connection in the network (Fig 3A and 3B). Overall, the network experienced a reduction in travel following the announcement of movement restrictions in the first national lockdown, introduced on March 23, 2020 (Fig 3C). Comparing the period preceding the first national lockdown (March 10 to March 22, 2020) with the period immediately following (March 23 to April 4, 2020), we identified which edges remained in the same betweenness quantile during the intervention period (Fig 3D). We observed a disproportionate reduction in edges which were highly central before the introduction of national restrictions, reflecting the reduction of travel along long distance connections (Fig 3E and 3F), which tend to be highly central in the network, and an increased fragmentation of the network.
While we continued to observe a weekly trend of increased between-tile movements during weekdays, the variance of weekly between-tile connections decreased during the period of national interventions (S7 Fig). Overall, the network was reduced by 10,690 edges, with only 46 new edges observed in the post-intervention period. All of these newly created edges had a distance of less than 47 km. The network also became more disconnected, with the number of components increasing from 23 (largest component size: 3394) to 37 (largest component size: 2579).
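Component counts of this kind can be obtained with a simple union-find pass over the edge list, as in the sketch below. Whether the reported components are weakly or strongly connected is not stated; the sketch assumes weakly connected components, ignoring edge direction.

```python
def component_sizes(edges):
    """Sizes of weakly connected components, largest first.

    edges: iterable of (origin, destination) pairs; direction is ignored.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]   # path halving
            node = parent[node]
        return node

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
    sizes = {}
    for node in list(parent):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)
```

The number of components is the length of the returned list, and the largest component size is its first element.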
We also measured the distance travelled per user in the weighted, directed network, observing a decrease in the overall distance travelled by users during the period of national interventions and a subsequent increase in this distance throughout the summer. As the overall travel in the network decreased, we observed a sharper decrease in the volume of long distance connections, with most long distance connections absent from the movement network during national restrictions (Fig 3F). This decrease reflects both the decreasing volume of long distance travel, and the increased effect of censoring during periods of lower travel volumes.

Community detection
We identified geographically-explicit "communities of interaction" in the network of user movements using the InfoMap and Leiden algorithms (S8, S9 and S10 Figs).
We observed a decrease in the number of identified communities and a corresponding decrease in their area and population size following the introduction of nationwide intervention measures on March 23rd, 2020 (Figs 4, S11 and S12). Overall, these communities were smaller than LTLAs and Travel to Work Areas (TTWAs) in terms of their size and population, although in cities like London and Manchester, communities intersected multiple TTWAs and LTLAs (S13 Fig). The cell-level network also became more sparse as cells were censored from the dataset due to the lower number of edges connecting cells. Restrictions were eased incrementally between May and July 2020, during which time we observed an increase in the volume of between-cell movements and an increase in the geographic area and connections between communities.
We found that the most persistent communities existed in some large population centres (S14 Fig). This reflects both the smaller influence of censoring on higher population cells as well as the continued existence of movement networks, though reduced, around population centres. Persistent communities were identified in Manchester, Newcastle, Glasgow, and Edinburgh, but not in London, which regularly split into more communities on weekends (S15 Fig). We did not find evidence that community stability is associated with population density (S16 Fig). We found strong stability of communities calculated for networks aggregated to daily, weekly, and monthly intervals, with a Normalised Mutual Information, a measure of similarity between network partitions, between 0.89 and 0.92 for all partitions (S17 Fig). By transferring community labels between time windows, we constructed a network of communities in which each community is a node, connected to other communities in a directed network weighted by the number of users travelling between community pairs (Fig 4). In this network, the degree of all nodes decreased after the implementation of nationwide interventions and the overall reduction of between-cell edges (S18 Fig). Nonetheless, we did not observe a significant reorganisation in the hierarchy of connections between communities.
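For readers unfamiliar with Normalised Mutual Information, the sketch below computes it for two partitions of the same cells. Several normalisations exist; this version divides the mutual information by the arithmetic mean of the two partition entropies, which is an assumption, as the variant used in the analysis is not stated.

```python
import math
from collections import Counter

def normalised_mutual_information(labels_a, labels_b):
    """NMI between two partitions, compared over the cells they share.

    labels_a, labels_b: {cell: community_label}
    Returns a value between 0 (independent) and 1 (identical up to relabelling).
    """
    cells = [c for c in labels_a if c in labels_b]
    n = len(cells)
    count_a = Counter(labels_a[c] for c in cells)
    count_b = Counter(labels_b[c] for c in cells)
    joint = Counter((labels_a[c], labels_b[c]) for c in cells)
    mutual = sum(
        (n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))
        for (a, b), n_ab in joint.items()
    )
    entropy_a = -sum((k / n) * math.log(k / n) for k in count_a.values())
    entropy_b = -sum((k / n) * math.log(k / n) for k in count_b.values())
    if entropy_a + entropy_b == 0:
        return 1.0   # both partitions place every cell in one community
    return 2.0 * mutual / (entropy_a + entropy_b)
```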

Local lockdown extents
After the period of national interventions, the UK introduced local area interventions at differing levels of stringency. The first such intervention was implemented in Leicester on June 30, 2020 in response to a local resurgence. To understand the impact on the mobility of users, we assessed changes in the volume of travel and network topology before and after introduction.
We measured the connection between cells overlapping areas of local interventions in four areas: Leicester, Manchester, the North West, and the North East (S19, S20 and S21 Figs) to assess the impact on volume of travel and the isolation of intervention areas from the broader UK movement network. We found that, while movement indicators did decrease, particularly in Leicester, the response was smaller than during the first national lockdown (Fig 5A). The introduction of local intervention measures in Leicester was followed by a reduction in cases in the area, something that was not observed for the other local interventions (Figs 5B, S19, S20, and S21).
Motivated by the need to identify communities associated with epidemic resurgences and responses to reactive interventions, we compared the extent and date of local interventions with the spatial extent and temporal persistence of network communities (Fig 5C). We found that network communities remained relatively stable after the introduction of intervention measures in all areas, with some peripheral changes to movement communities in Manchester, the North West, and North East (S19, S20 and S21 Figs).
From July 2020 onwards, the geographic extent of local area interventions was in closer agreement with movement communities, particularly in Manchester (S19 Fig). Some movement communities, however, spanned multiple Local Authorities. Additionally, movement communities evolve over time, and have the potential to shift following local area interventions, requiring an understanding of real-time patterns of movement to monitor the appropriateness of a given measure.

Discussion
This study used a large, anonymized movement dataset to quantify changes in the UK movement network and assessed how geographic communities were affected by interventions. Using movement communities, we can identify strongly connected areas that can be used to inform the spatial scale of geographically-targeted interventions to respond to resurgences of COVID-19. These communities can also be used to monitor adherence to public health measures in near-real time, as changes in community structure indicate a reorganization of the patterns of travel for specific populations. We found that overall, these communities were smaller in population and area than LTLAs and TTWAs, providing a finer-grained understanding of mobility connections between areas. We also explored the structure of the UK travel network through the pandemic, identifying variations in the central connections between population centres and changing travel patterns in response to the introduction of public health measures.
Gridded mobility datasets such as those used in this study provide granular, near real-time information about the movement patterns of a large sample of the UK population. While these datasets could usefully inform epidemic responses [18][44][45][46], there remain questions about the generalizability of the movement recorded in these datasets to the movements of the overall population [47,48]. The privacy preserving structure of the Facebook movement dataset means that low frequency connections are not recorded, precise locations are replaced by grid cell references, and data is provided in uniform-area grid cells which vary in population size. Ideally, multiple mobility datasets should be analysed to improve the interpretation of changes in mobility indicators.
In response to the nationwide lockdown introduced March 23rd, the UK mobility network changed drastically. The movement network became more sparse, with reductions in travel volume and distance. These changes disproportionately affected long distance and highly central edges, important connections which integrated geographically distant areas in the broader UK travel network.
We identified geographically distinct communities with strong interconnections that are relevant to policy responses focused on limiting transmission in response to geographically-limited disease resurgences. These communities delimit the boundaries of areas with strong internal movement connections and therefore provide empirical boundaries that can inform policy responses or form the basis for spatially-explicit transmission models. Community structure has been shown to influence the progression of modelled disease transmission in simulated and real-world networks [49][50][51]. Variations in community structure can affect or limit mixing between people or between populations, giving rise to epidemics with multiple peaks or geographically-localised resurgences [52]. Identifying communities is therefore important for understanding and projecting epidemic dynamics.
There is a demonstrated relationship between human mobility and transmission of COVID-19 relating both to the spatial introduction of disease, where connectivity is correlated with the time of first introduction [15], and rates of movement in specific locations, which are correlated with disease transmission [53]. In the movement dataset used in this study, we identified groups of network nodes which have strong movement connections to each other, meaning that they share a relatively high number of travellers with other members of their community. Similar delimiting of movement communities has been applied to define "functional zones" in the European Union [54], where these zones have been recommended as a way to define the extent of locally targeted interventions when responding to spatially-limited resurgences of COVID-19.
In response to resurgences in a particular area, determining the geographic extent of reactive interventions should be driven by areas at risk of increased transmission, which may not intersect with administrative boundaries. The geographic areas identified here could be used to delimit the extent of areas with strong connections to a particular resurgence or new outbreak and to define the extent of interventions such as surge testing or social distancing measures. We found that while communities tended to stabilise around settlements, there was disagreement between the extent of these communities and the boundaries at which local area interventions have been introduced in the UK thus far. These communities provide valuable information in near-real time about the extent of typical patterns of travel, their temporal variations, and "catchment areas" of movement around a given area.
There are several caveats to the methods of community detection used in this study, as the extent of communities could be influenced by the level of aggregation of the Facebook mobility data, and cells were assigned to a single community each day. While we conducted a sensitivity analysis using two methods for identifying communities, there are a wide variety of community detection algorithms which emphasize different aspects of network structure. Questions also remain about the general reliability of community detection methods, which have been developed on well understood network structures, when applied to real-world networks [35]. The effect of local area interventions on travel depends on the specifics of each intervention and their stringency. Additionally, interventions occur at multiple spatial scales, and across overlapping time periods. For example, in the UK, national interventions coincide with local interventions, and each may contribute differently to changes in movement behaviour.

Conclusion
Data-driven approaches using mobility data can help to quantify patterns of travel and inform geographically-targeted public health interventions.
Supporting figure caption (demographic maps): In each case, a white cell means the data were missing from the Facebook mobility data and so are not displayed here. In most cases this is due to censoring of low numbers, except for the small discontinuity around Swindon, mentioned in the Main Text. a) IMD rank. Each country has a different colour because the measure of IMD is different in each country. In each case, the darker shade is higher IMD. b) Population density per cell (log scale). c) Percentage of the population self-identifying as any other ethnicity than "Any white background".
Supporting figure caption (community stability maps): a) The number of community labels that each cell has had (i.e. number of communities that the cell has ever been in) during the study period. The darkest shade indicates that a cell was always in the same community. b) The number of community labels for a given cell as a proportion of the number of days that cell was present in the dataset. This was calculated as the (number of unique community labels/number of days a cell was present). c) Stable communities, marked as those which had the same community label for the entire study period. Base map data from Natural Earth.