Territorial differences in the spread of COVID-19 in European regions and US counties

This article explores the territorial differences in the onset and spread of COVID-19 and the excess mortality associated with the pandemic, with a focus on European regions and US counties. Both in Europe and in the US, the pandemic arrived earlier and recorded higher Rt values in urban regions than in intermediate and rural ones. A similar gap is also found in the data on excess mortality. In the weeks during the first phase of the pandemic, urban regions in EU countries experienced excess mortality of up to 68 pp more than rural ones. We show that, during the initial days of the pandemic, territorial differences in Rt by the degree of urbanisation can be largely explained by the level of internal, inbound and outbound mobility. The differences in the spread of COVID-19 by rural-urban typology and the role of mobility are less clear during the second wave. This could be linked to the fact that the infection is widespread across territories, to changes in mobility patterns during the summer period as well as to the different containment measures which reverse the link between mobility and Rt.


Introduction
The COVID-19 pandemic is creating severe social and economic consequences, with some places experiencing disproportionately high levels of mortality and economic losses. Urban regions, and particularly large cities, were severely affected by the spread of the pandemic in its early stages. Public discussion on the territorial impact of the pandemic requires a greater understanding of the way the pandemic is affecting regions that are diversely vulnerable and will require different recovery plans. Analyses of the role of population density and city size on the virus spread have led to mixed results [1][2][3]. While these analyses primarily look at the population scale as a whole, other analyses have examined disparities within the urban environment, looking in particular at the intensity of social contacts related to urban organisation and life that would make some places more prone to infection in the first phase. In particular, some of the factors considered relevant for virus transmission are the connectivity of cities as hubs of national and international transport systems [4][5][6], and the structure of industry and the concentration of essential jobs in certain areas [7,8]. In addition, it has been documented that COVID-19 [...] tend to be in close contact and with multi-generational family members living together [9,10]. Our paper aims to gain a deeper understanding of the links between COVID-19, urban-rural typologies, territorial conditions, and mobility, which is critical for designing effective public health policy responses. We first explore the heterogeneity of COVID-19 patterns in its onset, spread, and associated excess mortality by comparing the results by the level of urbanisation of European regions and counties in the US. For the EU we use the Eurostat NUTS3 rural-urban typologies and for the US we use the Rural-Urban Continuum Codes reduced to 3 classes.
The classification in the EU and the US according to rural-urban typologies follows harmonised criteria of population density and size of the urban centres. On the basis of the share of the rural population, regions at Territorial Level 3 (i.e. NUTS3 in the EU and counties in the US) are classified as predominantly rural regions, intermediate regions and predominantly urban regions. These classifications are routinely used by National Statistical Offices, the OECD and the European Commission to publish territorial statistics. The results of our comparison of the spread of COVID-19 across regions show that the pandemic started earlier in urban regions than in intermediate and rural ones. Urban regions had the highest Rt values in both Europe and the US during the first wave, whereas rural counties were more affected than urban counties in the second wave. Analysis of excess mortality, calculated using Eurostat statistics and obtained from the difference between reported fatalities and a baseline model based on historical data between 2011 and 2019, also shows a large gap by urbanisation level during the first wave, with a median excess mortality of up to 73% for urban regions, 18% for intermediate regions, and 11% for rural regions. In a second phase, we empirically examine the impact of mobility on virus spread. We model population mobility in European regions through a harmonised mobility index derived from mobile phone data. For our purpose of comparison by rural-urban typologies these data are unique because they provide not only the relative temporal variation of mobility within each region with respect to a reference date but also information about absolute differences across regions. Due to the lack of similar data for the US, our analysis of the effect of mobility on Rt is limited to the EU. We examine the geographical distribution of mobility changes through regression models for the weeks in the first and second virus waves.
Our results show that, on the one hand, higher mobility explains most of the variation in weekly Rt values during the first wave, with internal, inbound, and outbound mobility positively affecting Rt. The effect of per capita internal mobility, in particular, is more pronounced than that of the degree of urbanisation, and remains significant even when population and population density are taken into account. On the other hand, the same regression models replicated for the second wave show a negative role of mobility in the local spread of the virus, as well as a higher prevalence of the infection in rural regions compared to large cities. The paper is organised as follows. The data and methods section describes the data and methods used in the analyses. In the results and discussion section, we present how the COVID-19 pandemic spread in rural, intermediate, and urban regions during the first and the second wave. The conclusions are outlined in the final section.

Data and methods
In this section we present the data sources and the methods we used to assess the spread of the pandemic. [...] (The Rural-Urban Continuum Codes used for US counties follow a similar classification scheme that distinguishes metropolitan counties by the population size of their metropolitan area and non-metropolitan counties by the degree of urbanisation and their proximity to a metropolitan area; see https://www.ers.usda.gov/data-products/rural-urban-continuum-codes.aspx.) In this analysis, the variable grouping the 2013 rural-urban codes has been reclassified into 3 categories: urban (codes 1-2), intermediate (codes 3-4), and rural counties (codes 5-9). We calculated the reproductive number (Rt) as an indicator to assess how fast the virus spread across different types of geographical areas. We estimated the excess mortality to monitor in quantitative terms the evolution and impacts of the COVID-19 pandemic. We used fully anonymised and regionally aggregated mobility data to get insights into the different regional mobility patterns. Finally, we fitted a linear regression model to assess the relationship between mobility and Rt during the first and the second wave.
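The three-class grouping of US counties described above can be illustrated with a minimal Python sketch. The function name `reclassify_rucc` is hypothetical, and the cut-offs assume that the intermediate class covers codes 3-4, as the code ranges in the text suggest.

```python
def reclassify_rucc(code: int) -> str:
    """Map a 2013 Rural-Urban Continuum Code (1-9) to one of three classes.

    Assumed grouping: 1-2 urban, 3-4 intermediate, 5-9 rural.
    """
    if not 1 <= code <= 9:
        raise ValueError(f"RUCC code must be between 1 and 9, got {code}")
    if code <= 2:
        return "urban"
    if code <= 4:
        return "intermediate"
    return "rural"


# Classify every valid code once to inspect the mapping.
classes = {code: reclassify_rucc(code) for code in range(1, 10)}
```

Rejecting out-of-range codes rather than defaulting them to a class avoids silently mislabelling counties with missing or malformed codes.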

Rt
Rt is the main real-time indicator used to assess the evolution of the pandemic, design containment measures and monitor their effectiveness. During the pandemic several governments and administrations have established systems for the automatic triggering of restriction measures based on weekly monitoring of regional Rt values. Technically, Rt gives a measure of the number of new infections caused by infected individuals at time t in a partially susceptible population. Values above one indicate that the number of cases will increase, while with values below one the epidemic will die out. A time-dependent reproduction number, Rt, was calculated for each day and region with the R package 'R0' [11]. For the calculation we followed a likelihood-based estimation procedure that derives the probability of infection from the analysis of the epidemic curve of the observed cases using sliding temporal windows [12]. This estimation procedure relies on a parameter for the time between the infection and the manifestation of the symptoms, which in our case was obtained from data reported during the early phases of the pandemic in China [13]. The data on confirmed COVID-19 cases at regional level were obtained through the 'COVID19' R package [14] and updated until the end of 2020. To analyse territorial differences at a descriptive level, the daily Rt values were averaged by consecutive weeks and across regions classified according to their rural-urban typology.
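The descriptive aggregation step (daily Rt to weekly values per rural-urban class) can be sketched as follows. This is an illustration in Python rather than the R code used for the paper. The record layout, the function name `weekly_rt_by_typology` and the use of `day // 7` to define consecutive weeks are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def weekly_rt_by_typology(records):
    """Aggregate daily Rt values to weekly means per rural-urban class.

    records: iterable of (region, typology, day_index, rt), where day_index
    counts days from a common origin (hypothetical layout).
    Returns a dict {(typology, week_index): mean Rt}.
    """
    # Step 1: average the daily values within each region and week.
    per_region_week = defaultdict(list)
    typology_of = {}
    for region, typology, day, rt in records:
        per_region_week[(region, day // 7)].append(rt)
        typology_of[region] = typology

    # Step 2: average the regional weekly means within each typology.
    per_typology_week = defaultdict(list)
    for (region, week), values in per_region_week.items():
        per_typology_week[(typology_of[region], week)].append(mean(values))

    return {key: mean(values) for key, values in per_typology_week.items()}
```

Averaging within regions first gives each region equal weight in its class, regardless of how many daily values it reports. Replacing `mean` with `median` in the second step would give a median across regions instead.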

Excess mortality
The baseline for mortality was calculated with Generalised Additive Models fitted independently for each region. In the models we included a seasonal component to account for the increase in mortality during the winter months linked to influenza outbreaks, and a linear time trend to account for long-term changes in mortality due to demographic dynamics. The excess mortality was measured as the difference between the reported data in 2020 and the estimated baseline for all occurrences exceeding the lower or upper 95% confidence intervals of the estimated baseline. The weekly mortality data were obtained from Eurostat (demo_r_mweek3) and covered 900 regions in 26 EU Member States and the UK, with time series which, depending on the Member State, start from 2001 or 2015 and run until the end of 2020.
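The excess computation itself, given an already fitted baseline, can be sketched in Python as below. The baseline model (a GAM in the paper) is not reproduced here; the function name `excess_mortality` is hypothetical, and expressing the excess as a percentage of the baseline is an assumption about how the percentages reported in this article are defined.

```python
def excess_mortality(observed, baseline, lower, upper):
    """Excess deaths for one region and week.

    observed: reported deaths; baseline: expected deaths from the model;
    lower, upper: bounds of the 95% confidence interval of the baseline.
    Returns (absolute excess, excess as % of baseline). Weeks whose observed
    value lies inside the interval count as zero excess.
    Assumes baseline > 0.
    """
    if lower <= observed <= upper:
        return 0.0, 0.0
    excess = float(observed - baseline)
    return excess, 100.0 * excess / baseline


# Example: 173 deaths against a baseline of 100 (interval 90-110).
absolute, percent = excess_mortality(173, 100, 90, 110)
```

Note that the rule is symmetric: weeks falling below the lower bound yield a negative excess.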

Mobility
In this study we used fully anonymised and aggregated mobility data shared with the European Commission (EC) by European Mobile Network Operators (MNOs). These mobility data comply with the 'Guidelines on the use of location data and contact tracing tools in the context of the COVID-19 outbreak' by the European Data Protection Board [15]. The mobility data were in the form of Origin-Destination Matrices (ODMs) [16,17] and they provided valuable insights into mobility patterns across geographical areas. The data have been used to derive mobility insights and build tools to inform better targeted containment measures, in a Mobility Visualisation Platform, available to the Member States [18]. Given the high variation in the spatial and temporal aggregation across countries and operators, the original ODMs were harmonised at a standardised spatial and temporal granularity into the derived Mobility Indicators [19]. We further aggregated the Mobility Indicators at weekly intervals, and we normalised them to enable a better cross-country comparison. The normalisation was performed by comparing the number of movements for each NUTS3 area and each type of movement (internal, inbound, outbound) with the average mobility levels between February 10 and March 8, 2020. The reason for this normalisation was to capture the relative decrease or increase of mobility compared to pre-lockdown levels. In addition to normalised mobility, we also estimated the per capita internal mobility by dividing the number of movements recorded in the mobility data for a NUTS3 region by the population size reported by Eurostat as of 1 January 2018. The number of movements recorded by each Mobile Network Operator depends on its methodology and its penetration rate in each country.
Thus, to enable cross-country comparison, we normalised the per capita internal mobility by assigning, for each country, the value of one to the NUTS3 region with the highest per capita mobility over the reference time period February to December 2020, and the value of zero to the NUTS3 region with the lowest per capita mobility over the same time period. The limitations of our proposed indicators are the following. First, we assume that the penetration rate of each MNO in each country is the same across rural, intermediate and urban areas and that it remains stable across the time period that we analyse. Second, we assume that the population of the NUTS3 areas remained stable during the period that we analyse.
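The two normalisation steps can be illustrated with a short Python sketch. Both function names are hypothetical. Taking the ratio to the baseline mean is an assumption about how the comparison with pre-lockdown levels is computed, and a linear rescaling between the country minimum and maximum is assumed for the per capita indicator.

```python
from statistics import mean


def normalise_to_baseline(weekly_movements, baseline_weeks):
    """Express weekly movement counts relative to a pre-lockdown baseline.

    weekly_movements: dict {week_label: number of movements} for one region
    and one movement type (internal, inbound or outbound).
    baseline_weeks: labels of the weeks forming the reference period.
    A value of 1.0 means mobility equal to the baseline average.
    """
    reference = mean(weekly_movements[week] for week in baseline_weeks)
    return {week: count / reference for week, count in weekly_movements.items()}


def rescale_per_capita_within_country(per_capita):
    """Min-max rescale {region: per capita mobility} for ONE country to [0, 1]."""
    low, high = min(per_capita.values()), max(per_capita.values())
    if high == low:
        # No variation within the country: every region gets the same score.
        return {region: 0.0 for region in per_capita}
    return {region: (value - low) / (high - low)
            for region, value in per_capita.items()}
```

Rescaling within each country removes differences in operator methodology and penetration rate between countries, at the cost of making scores comparable only in relative terms.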

Regression
To support our intuition about the territorial heterogeneity in the spread of COVID-19 during the first and second waves, we examine the effect of different mobility patterns through OLS regression models. The models have the Rt values recorded in each European region as the dependent variable, the rural-urban typology of the region and the internal, inbound, outbound and per capita internal mobility as main independent variables, and the logs of the population and population density of the region as control variables. We run two sets of models: one for the 28 days following the onset of the pandemic in each region, to capture effects during the first wave, and one for the weeks after August 2020, for the second wave. All specifications include country fixed effects to account for differences in virus transmission resulting from invariant country characteristics. The fitting of the regression models was constrained by the necessity of having regional data on COVID-19 cases for the calculation of Rt and mobility indicators for the same periods. Data on population were obtained from Eurostat (demo_r_pjangrp3 and demo_r_d3dens). Overall, the regressions are based on around 3500 observations in 654 regions for the first wave, and 10500 observations in 551 regions for the second wave.
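The role of the country fixed effects can be illustrated with a simplified Python sketch for a single regressor. It is not the specification estimated in this article, which includes several regressors and was fitted in R. The sketch relies on the fact that an OLS slope with one dummy per country equals the slope obtained after subtracting country means from both variables (the within transformation). The function name `within_slope` is hypothetical.

```python
from collections import defaultdict


def within_slope(rows):
    """OLS slope of y on x with country fixed effects, single regressor.

    rows: iterable of (country, x, y), e.g. x = lagged mobility, y = Rt.
    Country means are removed from x and y before computing the slope.
    """
    groups = defaultdict(list)
    for country, x, y in rows:
        groups[country].append((x, y))

    sxy = sxx = 0.0
    for observations in groups.values():
        x_bar = sum(x for x, _ in observations) / len(observations)
        y_bar = sum(y for _, y in observations) / len(observations)
        for x, y in observations:
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2

    if sxx == 0:
        raise ValueError("x has no within-country variation")
    return sxy / sxx
```

Because only deviations from country means enter the estimate, any country-level characteristic that is constant over the sample cannot drive the coefficient.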

Results and discussion
The COVID-19 pandemic started earlier in urban regions
[...] Fig 2 shows the median Rt values across regions and counties by days since the start of the pandemic (upper Figure), and weeks since the start of the second wave of the pandemic (lower Figure). Looking first at the upper figure, we observe that urban regions in Europe and the US recorded higher Rt values than those found in intermediate and rural regions at the start of the pandemic. This indicates that the disease spread faster in urban regions and that containment was more difficult in more densely populated areas. Approximately 56 days after the start of the pandemic, we find a general decline in Rt and a reduction in the differences in Rt between the three groups of regions. At the start of the pandemic, the rural-urban divide in Rt values is more pronounced in the US counties. However, even in this case, the disparity in the pandemic spread by level of urbanisation narrowed among the three regional groups, with the Rt index close to 1 at the end of the first wave. The lower part of Fig 2 shows the median Rt values across regions and counties in the weeks following the summer period, when the pandemic began to spread in a second wave of infections. In the European regions, we observe an initially higher Rt in the urban regions and increasing and [...]
The increase in weekly mortality compared to past trends is used as an indirect measure to monitor the evolution of COVID-19. This indicator has the downside of including fatalities not necessarily linked to COVID-19, such as those caused by the saturation of hospital capacity, but has the advantage of being less influenced by the underestimation of the real infection rate due to asymptomatic cases or differences in testing strategies over time and regions [20]. The bars in Fig 3 show the weekly total excess mortality calculated from Eurostat statistics for most EU countries and the UK. The excess mortality is obtained from the difference between the reported fatalities and a modelled baseline estimated from historical data until 2019.
The number of weekly fatalities attributable to COVID-19 peaked at the beginning of April, with about 41 400 deaths in excess compared to the baseline. (This peak represents 21 600 more cases than the excess mortality recorded in the same countries during the second week of January 2017, corresponding to a particularly severe year for the seasonal flu.) The lines in the figure show the median excess mortality in the NUTS3 regions classified according to their degree of urbanisation. At the peak of the pandemic in the third week of April, the median excess mortality in urban regions reached its maximum of 73%, which was 58 pp higher than in intermediate regions and 68 pp higher than in rural regions in the same week. In the second wave of the pandemic, the disparities among regions appear less pronounced. There is also a reversal in the trend of excess mortality, with rural and intermediate regions having higher rates, 38% and 32% respectively, than urban regions, with an excess mortality rate of 26%.

Mobility is higher in urban regions
One possible explanation for the higher Rt and excess mortality in urban regions is that in cities the infection can spread more rapidly given the higher population density, larger use of public transportation and higher number of social interactions. The intensity of social interaction is reflected in mobility indicators which can be calculated from mobile phone data. In fact, the relation between intensity of social contacts, mobility and infection is the basis of the mobility restrictions that most governments have put in place to contain the pandemic. We analyse the patterns of mobility within, from and toward European regions with anonymised and aggregated mobility indicators derived from mobile phone data as described in the Data and methods section. [...] measures, the level of per capita mobility was higher in urban regions than in intermediate and rural ones. During the second wave, the per capita mobility is almost equal across all areas, indicating a substantial reduction of mobility in urban and intermediate regions at the beginning of summer and an increase in rural regions. This shift in mobility patterns is exemplified in Fig 5, which shows the weekly relative changes in mobility for each Italian region (rows) with respect to the levels recorded during the last week of February. In this case, regions are sorted on the basis of their proximity to the sea or mountains to better appreciate the mobility linked to domestic tourism. In May, after the lifting of lockdown, all Italian regions recorded an increase of mobility to the levels of February. However, during summer, mobility in coastal and mountain regions increased to higher values than at the beginning of the year. The highest increase was recorded in the second week of August in the renowned region of Olbia in Sardinia (+373%). The high mobility from urban to coastal and mountainous areas could have contributed to spreading the disease from cities to intermediate and rural areas.
With the re-opening of schools in September, the level of mobility started to increase again uniformly across all regions. Table 1 shows the results of regressions on the first wave of infection considering Rt values in the 28 days after the start of the pandemic. Table 2 presents results for the second wave on the Rt values in the weeks after August. The results of the regressions show a significant relationship between the effective reproduction number, Rt, and the levels of urbanisation (Column 1 in Table 1). During the first wave of the pandemic, Rt values are lower in rural and intermediate regions than in the urban regions used as reference. Urban regions are therefore the most affected in the first weeks of the pandemic in terms of number of cases due to their high population density and large concentration of social interactions, as well as their high local and global connectivity (Balcan et al., 2009). In Columns 2-4 we include the mobility controls separately, i.e. internal, inbound and outbound mobility, given the correlation between these measures within countries. We use the three-week lagged value of each mobility variable in the regressions to account for the delay between the mobility-driven infection and the positive case confirmation and to mitigate a potential reverse causality problem between the two variables. A sensitivity check for alternative lag periods is shown in Fig 6. We selected a lag of 3 weeks, which maximises the positive coefficient of mobility during the second wave. Positive lags produce, as expected, negative coefficients, since mobility is reacting to restriction measures rather than driving the infection. In all specifications, each mobility indicator is positively correlated with Rt values, indicating that higher mobility is associated with higher transmission. The coefficient on delayed mobility ranges from 1.53 to 1.82, depending on the specification.
Mobility is also analysed using a per capita mobility indicator (Column 5), which captures the daily movements per capita in a NUTS3 region. The positive and significant coefficient of the per capita mobility confirms a pattern of Rt that increases as the internal mobility measured on the total population increases. The demographic controls of the (log) total population and density, presented in Columns 6 and 7, also exert a positive effect on Rt in the first wave. Finally, in Columns 8 and 9, we simultaneously estimate the effect of the per capita internal mobility, the level of urbanisation of the regions and the population density and size. The main result of the estimates is that the increase in internal mobility is positively and significantly associated with the number of cases, with a stable coefficient across the different specifications. The coefficient of internal mobility remains significant and positive even when we include the other control variables. Internal mobility appears to be a critical determinant of the rate of COVID-19 cases during the first wave, positively influencing the spread of the virus, possibly through increased social interactions. These results confirm that a large part of the territorial characteristics influencing the higher epidemiological risk at the onset of the pandemic in urban regions can be explained by the role of mobility. Table 2 examines the relationship between Rt and different mobility patterns in the European regions in a similar way to Table 1 but with data for the second wave (from August). The estimates show an inversion of sign from the first wave, with a positive association between the virus spread and the rural and intermediate regions compared to large cities. These results may reflect a behavioural response as well as more severe containment measures in the most severely affected areas.
In the second wave, different mobility patterns are associated with lower Rt values, presenting a weaker relationship than in the first wave, as presented in Columns 2-6. The results thus indicate that the relationship between mobility and regional virus transmission has changed over time and that shifts in mobility were used to control the pandemic. However, these changes were not sufficient to prevent a second wave of infection in most of the regions analysed. The demographic variables have a significant and negative effect on the virus spread. The models that simultaneously estimate the effect of internal mobility per capita and different regional characteristics show a negative relationship between this mobility pattern and virus transmission, as well as a higher prevalence of the infection in rural regions compared to large cities during the second wave. Fig 6 shows the regression coefficients with internal mobility shifted by 0 to 3 weeks before and after the Rt reference week. This study is not aimed at a causality analysis between the two variables; however, we quantify the different time-lag effects to detect their potential influence on transmission, which is useful for disease monitoring policies. During the first wave the relation between mobility and Rt is positive and peaks during the same week (week 0). During the second wave, the relation decreases steadily towards negative values. The fact that the relation during the first wave becomes clearer towards the reference week indicates that mobility is having an effect on Rt. In contrast, during the second wave there is an inversion in the relationship, and mobility, rather than influencing Rt, seems to react to changes in Rt by moving in the opposite direction. Intuitively, this is in line with the consideration that during the advanced stages of the pandemic, mobility is highly conditioned by restriction measures and closures that are put in place in response to increases in Rt.
(A specification linking the disease values to mobility may suffer from reverse causality. To mitigate this potential problem, we use a three-week lagged value of each mobility variable in the regressions.)
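The lagging of the mobility variable can be illustrated with a minimal Python sketch, assuming one weekly series per region with Rt and mobility aligned by week index. The function names `lag_series` and `pair_with_lag` are hypothetical.

```python
def lag_series(values, lag_weeks):
    """Shift a weekly series so that position t holds the value from week
    t - lag_weeks; positions without an earlier value are set to None."""
    if lag_weeks < 0:
        raise ValueError("lag_weeks must be non-negative")
    return [values[t - lag_weeks] if t - lag_weeks >= 0 else None
            for t in range(len(values))]


def pair_with_lag(rt, mobility, lag_weeks=3):
    """Pair each weekly Rt value with the mobility observed lag_weeks earlier,
    dropping the initial weeks for which no lagged mobility exists."""
    lagged = lag_series(mobility, lag_weeks)
    return [(r, m) for r, m in zip(rt, lagged) if m is not None]


# Rt in week 3 is paired with mobility in week 0, and so on.
pairs = pair_with_lag([1.1, 1.2, 1.3, 1.4, 1.5], [10, 20, 30, 40, 50])
```

Repeating the pairing for different values of `lag_weeks` gives the kind of sensitivity check over alternative lags that the text describes.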

Conclusion
In this article we analysed the territorial differences in the onset and spread of COVID-19 and the associated excess mortality across the European NUTS3 regions and US counties during the first and second COVID-19 waves. During the first wave, the COVID-19 pandemic arrived earlier, recorded higher Rt values and had a higher impact in terms of excess mortality in urban regions compared to the intermediate and the rural ones. In the first wave, mobility influenced the spread of COVID-19, since the higher mobility of urban regions largely explains the differences between the three groups of regions. The fact that these effects are more difficult to recognise in later stages of the pandemic can be tentatively explained by the wide spread of the infection, the implementation of restriction measures, often applied on a territorial basis, which invert the link between mobility and Rt, and the more complex mobility patterns experienced during the summer period. Our findings are in line with previous studies identifying the role of mobility in virus spread in the early stages of the pandemic. To our knowledge, our research is unique in providing a broad geographic coverage and a high level of geographical detail, and in examining the role of regional mobility in the spread of COVID-19 through a unique data set derived from mobile phone data. In terms of policy implications, our research contributes to a better understanding of the territorial characteristics of the spread of COVID-19, which is critical for designing effective public health policy responses, often decided at regional level.

Supporting information
S1 File. Data and R code used for the regressions presented in Tables 1 and 2. The ZIP file contains the two data files "first_wave.csv" and "second_wave.csv", the R code "Script.R" and a read me file with the data description "data_ReadMe.txt". Please note that due to the commercial sensitivity of the mobility data, we have added a skewed non-negative random noise to the mobility columns, therefore the statistical results of the models that include mobility data are not replicable. (ZIP)