Quasi-experimental evaluation of national border closures on COVID-19 transmission

With over 200 pandemic threats emerging every year, the efficacy of closing national borders to control the transmission of disease in the first months of a pandemic remains a critically important question. Previous studies offer conflicting evidence for the potential effects of these closures on COVID-19 transmission and no study has yet empirically evaluated the global impact of border closures using quasi-experimental methods and real-world data. We triangulate results from interrupted time-series analysis, meta-regression, coarsened exact matching, and an extensive series of robustness checks to evaluate the effect of 166 countries’ national border closures on the global transmission of COVID-19. Total border closures banning non-essential travel from all countries and (to a lesser extent) targeted border closures banning travel from specific countries had some effect on temporarily slowing COVID-19 transmission in those countries that implemented them. In contrast to these country-level impacts, the global sum of targeted border closures implemented by February 5, 2020 was not sufficient to slow global COVID-19 transmission, but the sum of total border closures implemented by March 19, 2020 did achieve this effect. Country-level results were highly heterogeneous, with early implementation and border closures so broadly targeted that they resemble total border closures improving the likelihood of slowing the pandemic’s spread. Governments that can make productive use of extra preparation time and cannot feasibly implement less restrictive alternatives might consider enacting border closures. However, given their moderate and uncertain impacts and their significant harms, border closures are unlikely to be the best policy response for most countries and should only be deployed in rare circumstances and with great caution. All countries would benefit from global mechanisms to coordinate national decisions on border closures during pandemics.


Border Closure Coding
For the border closure coding, we hand-coded matrices of border closures represented as dichotomous variables by date of implementation and country targeted for 179 countries using data from the Oxford COVID-19 Government Response Tracker, with corrections, updates and additions detailed in S1 Data 11 . This matrix was then used to code categorical indicators of the daily travel restriction status between all country-pairs: 1) no border closurethe country has not implemented any closure of land, air, or sea border in response to the COVID-19 pandemic; 2) targeted border closurethe country restricts non-essential entry of foreign nationals from one or more specified countries; 3) total border closurethe country restricts all non-essential entry of foreign nationals; or 4) reopeningthe country had previously implemented a total border closure and has re-opened borders to at least one country.
Visa suspensions and closure of land borders were coded uniquely as de facto border closures and analyzed as targeted border closures in quantitative analyses. Eleven jurisdictions (Aruba, Barbados, Bermuda, Cabo Verde, Greenland, Guam, Hong Kong, Kosovo, Macao, and Puerto Rico, and Solomon Islands) were excluded due to low data availability in either outcome or covariate data and to limit analysis to countries. Two countries, North Korea and Turkmenistan, reported no cases and were not included in analyses. China was also exempted from primary quantitative analyses because, as the first country to report cases of COVID-19 and as the initial epicenter of the pandemic, national border closures could not have controlled domestic transmission in this study period. The final dataset contains border closure data for 179 countries, and full outcome and intervention data for 166 countries.

Covariates
A full list of control variables hypothesized to have been associated with the effectiveness of border closures is available in Table A. All covariates were specified a priori and obtained from publicly available databases and were selected to cover categories of country characteristics, economic factors, gender parity, health indicators, policy control indicators, and border closure details. Where data was not available for the current year, we used data from the most recent year available. Further information is made available in a metadata repository alongside our open-access dataset on Scholars Portal Dataverse with download dates, updated data sources, and variable format and description (S1 Data).

Quantitative Methods
The proportion of the world's population targeted by border closures was calculated as the sum of the population of targeted countries divided by the population of all countries included in the study. A measure excluding own population was also calculated to account for the limited impact of border closures for countries with large populations. The proportion of global cases targeted by border closures was calculated as the sum of cumulative incidence for every country targeted up to the day being evaluated divided by the global cumulative incidence. The decision to limit our study to the first 22 weeks of the COVID-19 pandemic was driven by considerations related to our quasi-experimental methods as well as global border closure trends. Our analysis plan (S3 Data) was developed once our team determined that new border closures had begun to stall in May 2020, as observed in Fig 2. Because interrupted time series analysis can be performed on a balanced dataset with a minimum of 12 data points 12 , adding a lag time of five days to the final country-specific border closure on May 27, 2020 led us to the 22-week mark. All calculations were conducted in Stata release 15 and maps created using Tableau version 2020.2 unless otherwise stated 13,14 .

Interrupted Time-Series Analysis
Country-level intervention points were evaluated for all targeted and total border closures that met the following criteria: 1) a minimum of seven days of data exists prior to and after the intervention point, 2) for multiple intervention time series, a minimum of seven days has passed since the last intervention point, and 3) for multiple sequential targeted border closures, the second (or third) intervention represents an increase of at least 20% of the world's population being targeted by the new border closures. As robustness checks, Dickey-Fuller tests were run for every time series, and none exceeded the 5% critical value of the t-distribution for a unit root (S2 Data). Results correcting for serial autocorrelation were calculated using Cumby-Huizinga general tests for autocorrelation in time series analysis (S4 Data) 15 .
We additionally considered alternate models aligning by the date of total border closures for both the global average and population-weighted global average of Rt. Stratified analyses of high-income and low-and middle-income countries (as classified by the World Bank), were conducted using pooled data with new intervention points calculated for each group exceeding the 20% global population threshold for both targeted and total restrictions, and the same analytical methods as global analyses 16 .
Falsification tests were run to evaluate whether results were being driven by chance or analytical choices. A false intervention date was created at the midpoint between the first date 90% of the world's population was living in countries implementing border closures and the final date in the study period, and for each country that implemented only a total border closure, country-specific false intervention points were created at the midpoint between the date of total border closure and the final date in the study. In consideration of the visually apparent peak in global Rt occurring between February 18-21, 2020, whose extreme values could skew findings, a robustness check was run dropping four values that exceed an Rt of 4.0. Finally, a robustness check restricting the length of time after the global total border closure intervention to match the period of time after global targeted border closure intervention (45 days) was conducted, with the global ITS results remaining unchanged (Tables G-H).

Meta-regression
The proportion of the global population and cases targeted by each restriction was calculated for the first targeted restriction for each country, and countries were categorized into three groups for the timing of restrictions relative to incidence within each country and relative to border closures implemented by other countries. Each country's first restriction, first targeted restriction, and first total restriction was categorized as happening either prior to 50, between 50 and 500, or after 500 cumulative domestic COVID-19 cases. Each country's first border closure, first targeted border closure, and first total border closure was also categorized as occurring within the first, second, or third tercile (third) of border closures globally. Every factor listed in Table A was first run against the sum of positive, negative, and null effects for targeted, total, and all border closures for both unlagged and five-day lagged outcomes and evaluated for significance at the 95% level (Table T). Full model regressions of every covariate found to have been independently associated with changes in Rt were then conducted. Robustness checks with unlagged outcomes produced similar results, but also identified higher GDP as being associated with beneficial total border closures and higher GDP per capita associated with less beneficial closures (Table W).

Fixed-Effects Time-Series Regression
Fixed effects regression was used to inform a robustness check for CEM analyses through identifying country-level indicators which significantly modified the association between border closures and the transmission of COVID-19. The primary evaluations used a five-day lag in Rt and after surveying published literature, a list of country characteristics was organized into a theoretical framework, including countrylevel indicators for the economy, gender, health, and domestic containment and closure policies (Table A). Within the fixed-effects regression model, the index represents the individual country ( = 1, … ,166), is the day of the study period between Jan 1 to June 8, 2020 ( = 1, … ,160), and is the vector of independent variables evaluated for association (Equation 1). For the primary panel analysis, each of the twenty-one country-level indicators = [ , . . . , ] were independently interacted with the percentage of global population targeted (excluding own population).
A fixed-effects regression model mitigates for potential confounding by controlling for country-level observables, while the model assumptions also accounted for the unobservable country-level factors. 17 The additional use of interaction effects examined the intra-country association between the vector of independent variables and Rt outcomes as related to the intra-country variation observed in changes in the percentage of global population targeted (excluding own population). Nineteen country-level factors were identified at a significance level of 90% (Table AH). Scenario analyses using the percentage of global cases targeted (excluding own cases) as the intervention variable and Rt values lagged by zero, 10 and 15 days were also conducted (Table AI).

Coarsened Exact Matching
Coarsened exact matching (CEM) was used to match and reduce the multivariate imbalance within countries with no border closure (control group) and countries with a border closure (treatment group). 18 An adjusted regression analysis using maximum likelihood estimation (MLE) was then fitted on the dataset with improved balance and used in predicting the size of the treatment effect averaged over all treated countries in our sample. 18 The four border interventions of interest are: i) targeted border closures; ii) proportion of global population targeted by a targeted border closure; iii) proportion of global cases targeted by a targeted border closure; and iv) total border closures, with instances of border de-escalation censored from all CEM analyses. A diagrammatic representation of the coding of treatment status by type of border closure studied is provided in Fig B in S1 Text. To evaluate targeted border closures, the treatment group was comprised of all countries that had ever implemented a targeted border closure and the control group included countries that had never implemented a targeted border closure. In one model, data after a country escalated to a total border closure were censored for both treatment and control groups, while in another model, total border closure data were left intact and controlled for in MLE.
To evaluate total border closures, two approaches to assigning treatment status were used. The primary model assigned all countries that had ever implemented a total border closure to the treatment group and all countries that had never implemented a total border closure to the control group. A conservative model was also constructed, assigning countries that had only ever implemented a total border closure to the treatment group and all countries that had either never implemented a border closure, only implemented a targeted border closure, or escalated from a targeted border closure to a total border closure to the control group. The same two techniques outlined above for the targeted border analyses was used for both models in evaluating total border closures: censoring data during period of total border closure for both treatment and control groups or controlling for any instance of total border closures in MLE.
Due to the trade-off between matching on more information (a higher number of variables) and a reduction in the dataset size, varying degrees of coarsening were used in the matching process. Countries were matched on a subset of the independent variables identified to have significant interaction effects in the fixed-effects regression, with variations in the degree of coarsening. All country-day data points without a matched pair and a zero value for the CEM weight was excluded from further analysis.
The adjusted regression analysis used MLE to determine the average treatment effect on the treated in the newly balanced dataset. and are dichotomous variables representing treatment status, with indicating a country having implemented a targeted border closure (Eq 2) and indicating a country having implemented a total border closure (Eq 3). The coefficient quantifies the average treatment effect across all countries having either implemented a targeted border closure or a total border closure.
The primary analysis (Table 2) moderately coarsened variables with higher causal plausibility, and a robustness check minimally coarsened lower priority variable (Table Z). A robustness check on the primary model verified the improved balance after matching (Table AJ). Additional robustness checks using higher causal plausibility included greater coarsening (Table X) and minimal coarsening (Table Y). Furthermore, a robustness check to address concerns of confounding from changes in domestic containment and closure policies on the effects of border closures was conducted using moderate coarsening (Table AE). A robustness check was conducted using variables from the highest R-squared model with moderate coarsening (Table AK).
Further robustness checks coarsened by first limiting to the countries matched on in the primary analysis moderately coarsening on the closure policy variables and higher plausibility variables (Table AF), and then a follow-up robustness check matching only on closure policy variables (Table AG). More robustness checks included limiting by the initially matched countries and restricting the time period pre/post global intervention dates by forty-five days (Table AA) or by sixty days (Table AB). The length of forty-five days prior to the global targeted border closure is around the time when the first estimated case of COVID-19 was recorded, and about forty-five days later is the global total border closure date. The same period of 60 days was used in the total border closure analysis. Lastly, we conducted scenario analyses using a subset of the matched dataset and segregated by higher-or lower-tier covariate distributions, using the mean as the baseline for targeted border closures (Table AC) and for total border closures (Table AD). Regardless of the approach employed in selecting the number of variables used in the matching process and the level of coarsening, all results were highly consistent.

Limitations
There were a number of challenges to conducting a rapid evaluation of national border closures in the midst of the ongoing COVID-19 pandemic. Although there is no perfect source of information to quantify COVID-19 incidence, we strived to maximize comparability between countries, legitimacy of data sourcing, and completeness of information. The OxCGRT Data Repository was deemed to best satisfy these objectives, however, issues relating to testing, data reporting, and data comparability remain. Reported cases of COVID-19 are highly dependent on the degree of testing being done to detect transmission, and the accuracy and timeliness of data reporting.
Some challenges to coding the border closure matrix include data validation, reconciling unspecified countries by targeted closures, and missing country data. Firstly, due to the vast size of the OxCGRT dataset, we purposively verified the correctness of coding and cited news sources for 1) countries with large populations; 2) countries with early border closures; and 3) randomly selected countries for a final layer of accountability. Secondly, four countries targeted countries with greater than a stated number of cases but did not explicitly specify the countries to be targeted. We determined the list of countries targeted for these countries based on John Hopkins University case counts. 4 Lastly, seven countries with populations over 1 million that were missing from the OxCGRT dataset were added for analysis (Armenia, Equatorial Guinea, Guinea-Bissau, Latvia, North Macedonia, Timor-Leste, and Togo).                                Table AB. CEM robustness check censoring for 60 days only, before and after the two intervention dates, using higher causal plausibility variables, moderately coarsened. Maximum likelihood randomeffects estimation for targeted border closures. Country-days are matched on moderately coarsened higher causal plausibility variables, with a reduced time period of 60 days, before and after the global intervention dates for targeted (February 5, 2020) and total border closures (March 19, 2020 Table AG. CEM robustness check using containment and closure variables and higher causal plausibility variables, moderately coarsened. Maximum likelihood random-effects estimation for targeted border closures. Country-days are matched on moderately coarsened containment and closure policy changes and the higher causal plausibility variables, restricting only to the countries that were matched upon in the primary analysis. Variables being matched on include the GHSI score, logged GDP, logged passengers flown, health expenditures, closures on workplaces and public transit, stay at home orders, and restrictions on public gatherings. Countries 44 44 110 110 *** p < 0.01, ** p < 0.05, * p < 0.10; standard errors in parentheses Table AH. Fixed-effects regression analyses. Results are presented for all factors found to produce statistically significant fixed-effects interactions between the extent of border closures and their effect on five-day lagged Rt for the Population Model (based on the proportion of population targeted) and Case Model (based on the proportion of cases targeted). The number of countries, regression coefficient, and direction of effect are shown below.     Countries 83 85 122 144 *** p < 0.01, ** p < 0.05, * p < 0.10; standard errors in parentheses