A causal inference approach for estimating effects of non-pharmaceutical interventions during Covid-19 pandemic

In response to the outbreak of the coronavirus disease 2019 (Covid-19), governments worldwide have introduced multiple restriction policies, known as non-pharmaceutical interventions (NPIs). However, the relative impact of control measures and the long-term causal contribution of each NPI are still a topic of debate. We present a method to rigorously study the effectiveness of interventions on the rate of the time-varying reproduction number Rt and on human mobility, considered here as a proxy measure of policy adherence and social distancing. We frame our model using a causal inference approach to quantify the impact of five governmental interventions introduced until June 2020 to control the outbreak in 113 countries: confinement, school closure, mask wearing, cultural closure, and work restrictions. Our results indicate that mobility changes are more accurately predicted when compared to reproduction number. All NPIs, except for mask wearing, significantly affected human mobility trends. From these, schools and cultural closure mandates showed the largest effect on social distancing. We also found that closing schools, issuing face mask usage, and work-from-home mandates also caused a persistent reduction on Rt after their initiation, which was not observed with the other social distancing measures. Our results are robust and consistent across different model specifications and can shed more light on the impact of individual NPIs.


Introduction
The coronavirus disease 2019 (Covid-19) pandemic has caused an enormous impact on the economy and on global public health. As of January 1 st 2022, the disease had over 290 million cases and more than 5 million deaths recorded in over 200 countries and territories [1]. In response to the state of emergency declared by the World Health Organization (WHO) in January 2020, governments worldwide have introduced multiple restriction policies, known as non-pharmaceutical interventions (NPI), to mitigate the spread of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Today, two years from the first outbreak registered in Wuhan China, the relative impact of control measures and the long-term causal contribution of each NPI are still a topic of debate. In June 2020, Flaxman and colleagues [2] pioneered the challenge of estimating the effectiveness of major interventions on the transmission of SARS-CoV-2. On the basis of mortality data and a Bayesian hierarchical model, they concluded that lockdowns had been effective in most European countries that were studied. Since then, due to the increased widespread adoption of restrictions, several studies have attempted to disentangle the effect of individual NPIs in several countries [3][4][5][6][7][8][9][10] or in the US alone [11][12][13][14][15][16][17]. Overall, most previous mathematical modeling suggest that public health interventions were associated with a reduction of Covid-19 incidence. Nonetheless, the conclusions regarding the effect of each specific intervention are not unequivocal. For example, [4,5,7,8,18] point out that school closures had a significant effect on the transmission of new infections, while a review conducted by [19] suggested that closing schools did not contribute to controlling the pandemic in countries like China, Hong Kong, and Singapore. In fact, existing evidence for the impact of policies is not consistent in the literature, as NPI effectiveness may vary across regions depending on the local context [16]. However, even with heterogeneous effects across countries, we found that estimating an overall effectiveness of interventions was possible due to generalization power encountered in cross-country data, which also contained a large enough sample size to detect a causal effect.
In this study, we provide a comprehensive analysis of the employment of five NPIsconfinement, school closure, mask wearing, cultural closure, and work restrictions-in 113 countries during the first 5 months of 2020. We observed that during the first-wave period there was a great deal of consistency in the set of restriction measures imposed throughout the world, which was substantially reduced in the subsequent waves. Thus, we focus on the initial outbreak period, when the long-term consequences of the virus were still poorly understood, vaccines were still not available, and policy-makers were not certain which control measures would be effective. With this approach, we hope to suggest a method to infer the efficiency of restriction measures and to inform future urgent preparedness response plans in the time-critical phase of a pandemic.
We estimate the impact of individual NPIs on social distancing and Covid-19 spread using causal analysis methodology, taking into account confounding factors such as concurrent NPIs, Covid-19 morbidity measures, and country-level socio-economic and demographics factors. We tested two different variables as outcomes in the causal effect estimation. First, we analysed how NPIs influence the mobility trends across different categories of places such as residential and retail/recreation areas. In the second approach, we evaluated the effect on the growth rate of the reproduction number R t , i.e., the rate by which the pandemic spreads. To the best of our knowledge, this is the first study to quantify NPI effectiveness using a causal inference framework in such a wide geography coverage, while accounting for confounding biases and performing sensitivity analyses to assess the robustness of our findings.

Data collection and pre-processing
Non-pharmaceutical interventions. We extracted the data on the restriction policies in 113 countries using the Worldwide Non-pharmaceutical Interventions Tracker for Covid-19 (WNTRAC) [20], a comprehensive database consisting of over 8, 000 Covid-19-related NPIs implemented worldwide. WNTRAC was updated periodically until October 5 th 2021, and we use the latest release for the analyses described hereby, which covers the period until the day before. For our experiments, we selected a subset of the five most well-defined and frequently imposed NPIs in the data: confinement, entertainment/cultural sector closure, work restrictions, mask-wearing, and school closure. Table 1 shows each NPI with its respective description.
Mobility trends. To infer people's dynamic behavioral response to restriction policies, we obtained Google mobility data [21]. These reports include the per-day change in movement across different categories of places compared to a baseline day before the pandemic outbreak. Similar to [17], we chose the change in duration of time spent in residential areas as a primary metric to measure social distancing and policy adherence. We additionally considered the effect of NPIs on the changes of movement in retail and recreation areas, generally seen as nonessential visits. For each category, we smoothed weekly patterns by using the seven-day rolling averaged mobility. When values were missing, we performed a linear interpolation. We used country-level information for all countries in our analysis, except for the US, which contained state-level data for both NPI and mobility data.
Socio-economical and health indicators. We used development indicators compiled from officially recognized international sources to account for heterogeneity in terms of socioeconomic and health factors within individual countries. From the World Bank's World Development Indicators (WDI) [22], we selected a subset of variables, including access to electricity, outdoor air pollution, and forest area. We also included country-level health information, ranging from life expectancy at birth, smoking rate, and prevalence of undernourishment. As we have indicators per year, we take the most recent metrics available per country. Similar covariates were used in previous works [3,4].
We also included variables describing population distribution, age-structure, and human development index from Our World in Data (OWID) [23]. To avoid producing highly biased estimates with missing values imputation, we only analysed countries with data available for at least 70% of the WDI and OWID variables and imputed the remaining missing features with the mean. We characterize each country with a cluster ID obtained through DBSCAN clustering [24] (scikit-learn, version 1.0.1) performed on the vector of socio-economical and health features (see S1 Table in S1 Appendix with the final set of features, S1 Fig in Appendix E1 of S1 Appendix for a detailed explanation on the clustering method and S2 Fig in Appendix E2 of S1 Appendix for analysis on the robustness of missing values imputation).
Covid-19-related variables. Our model used daily cumulative confirmed cases and deaths from the WHO reports [25]. Before estimating R t , we smoothed the number of new cases over time with a local polynomial regression using a window size of 14 days and a polynomial degree of 1 to minimize the impact on the edges. The 14 day span (7 days forward and backward) was used to ensure that an equal number of weekdays was used for smoothing and to account for a time lag between exposure, testing, and documenting case reports. We used the R package EpiEstim developed by Cori and colleagues [26] and extended by Thompson et al.

NPI Description
School closure Closing of any school category: kindergartens/daycares, primary/secondary schools or universities.
Cultural closure Closing of any cultural establishment: bars, restaurants, night clubs, museums, theaters, cinema, libraries, festivities, parks and public gardens, gyms and pools, or churches.

Confinement
Physical distancing and universal lockdown measures. Considered as imposed when it is mandatory for the entire population.
Work-from-home order Work restrictions and mandatory work-from-home orders for all nonessential workers.

Mask wearing
Facial coverings or mask wearing. Considered as imposed if it was mandatory or recommended in public spaces. https://doi.org/10.1371/journal.pone.0265289.t001 PLOS ONE [27]. We chose a gamma distributed serial interval (mean, 3.96 days [SD, 4.75 days]; this interval was constant across periods and derived from previous epidemiological surveys on Covid-19 [28]), to estimate R t and its 95% credible interval on each day via a 7-day moving average. EpiEstim was chosen due to its statistically robust analytical estimates and extensive use in the disease epidemiology literature. Because estimating the serial interval distribution may not be possible in the early phase of an outbreak, or may be associated with significant uncertainty as countries still had to establish their documentation practices, we exclude the period before 100 cumulative cases of each country from our dataset. Therefore, the window of analysis for each country starts on the day the cumulative cases exceeded 100 and ends on June 1 st 2020. When concatenating all data sources, we found an intersection of 113 countries that were used in our final analysis. Countries like China, Iran and part of central African countries were excluded due to missing mobility data (see geography selection in Appendix E3 of S1 Appendix).

Statistical analysis
Outcome prediction. To investigate whether the observed variables (Covid-19 morbidity measures, socio-economical factors and status of NPIs) contained the predictive power for the outcomes of interest, we built prediction models to estimate R t and the change in mobility. We split the dataset into two non-overlapping sets: 70% of samples used for training and 30% for validation. We used the gradient boosted trees algorithm with the XGBoost [29] package (version 1.2.0; https://xgboost.ai), with hyperparameters tuned using 5-fold cross validation on the train dataset, via a randomized grid search. The TreeExplainer from the SHapley Additive exPlanations (SHAP) [30] package was fit on the validation set to estimate the association and contribution of each feature to the XGBoost model.
We evaluated the accuracy of the predictions using the mean-squared error (MSE). To compare between the performance of different models at different scales, we normalized the estimates and the observed values in the test set before computing the MSE. The 95% confidence intervals (CI) were obtained with 100 bootstrapping iterations.
Potential outcome framework. We formulated causal effects in terms of the potential outcome framework [31].
Each country i at each point in time t is characterized by a feature vector X i,t , consisting of dynamic variables-the cumulative and new number of cases/deaths per million-and a static variable-a socioeconomic cluster ID, which is country-specific and constant over time. In addition, we included information regarding the status of the restrictions. This is a feature vector containing a set of binary features corresponding to whether other NPIs are in place at that time point. We denote Y ðaÞ i;t as the potential outcome in a given country i on day t for binary treatment a.
We were interested in two possible outcomes: (i) M ðaÞ i;t , the difference in the country-level mobility defined as in [32] and (ii) R ðaÞ i;t , the change in SARS-CoV-2 transmission represented by a ratio of the reproduction number R t , defined as in [9]. We evaluated the outcomes with respect to a time-lag w (in days), as follows: In this study, we considered the effect on the outcomes 2 weeks after the NPIs were enacted ± 1 week (i.e., w = 7, 14 and 21). Our goal was to estimate the Average Treatment Effect on the Treated (ATT), defined as where ATT M and ATT R denote the causal effects of human mobility trends and R t , respectively. Thus, a null effect of NPIs on mobility would be equivalent to ATT M = 0. Similarly, a null effect on R t corresponds to ATT R = 1.
To identify such a causal effect, we make several assumptions: (i) conditional exchangeability (ii) Stable Unit Treatment Value Assumption (SUTVA) and (iii) positivity. Identifying the ATT requires a weaker version of the assumptions above, as we are only estimating potential outcomes for the treated part of the population. A detailed discussion about the assumptions can be found in Appendix E5 of S1 Appendix.
Study design. A crucial component for estimating causal effect from observational studies is the ability to conceptualize and to emulate randomized experiments. To this end, we designed a cohort study exclusively separating samples (in this case, days) from the treatment and control groups. For each NPI k 2 K, we defined an event as the day NPI k was enacted in a certain country, given that this intervention was not in action the day before. We denote this day t � k . We defined the treatment group for NPI k as the set of t � k in the 113 countries during their respective period of analysis (Fig 1). The control group includes all days in the cohort except t � k � w 8k where w is the number of days before and after the event, and it is also the same time-lag period defined in Eqs 1 and 2. The control cohort represents a period where none of the five NPIs were enacted (an event did not occur), but they might still be in place during that period. Unlike the treatment groups, which are NPI-specific, the control groups are the same for all NPIs used as treatment.
Covariate balancing. We used Adversarial Balancing (AdvBal) [33] to estimate the mean potential outcome of the control group. In brief, AdvBal borrows principles from generative adversarial networks to assign weight to each sample in some source data, such that the resulting weighted data becomes similar to a given target data. Unlike the original paper, we defined only the treated group as the target population, and not the entire population, i.e., we fixed the weights of the treated samples and adjusted the weights of the control in order to balance the groups. The implementation of AdvBal and the resulting balancing evaluation of the models were done with the Python package causallib [34] (version 0.7.1; https://github.com/IBM/ causallib). The final effects and the associated 95% confidence intervals were computed with 1, 000 bootstrap samples. Sensitivity analysis. We performed sensitivity analyses to test our conclusions under different scenarios. First, we reran our framework with a different model to estimate treatment effect from observational data. We used inverse probability weighting (IPW) [35], a longstanding popular method that overcomes confounding by weighting samples by the inverse of their probability of being assigned to their treatment, conditioned on their covariates. These probability parameters were estimated with a logistic regression model. Once confounding was adjusted and the weights were computed, we estimated the causal effect on the treated sample as a weighted average of the outcome.
Second, we performed a complementary analysis to test whether the estimated effects were consistent when using an alternative study design. In this approach, the treatment groups are the same as in the original study (Fig 1). On the other hand, the control group consists of events from the remaining NPIs (apart from the NPI used as treatment). For example, if we were to estimate the effects of NPI k 2 K, the treated group would contain only the events for this specific NPI, t � k , whereas the control group would be the events of all other NPIs K -k , as long as these events do not overlap with t � k . The entire workflow of our study is simplified in Fig 2.

NPI employment statistics
As a preliminary step, we examined the frequency at which each NPI was imposed in the 113 countries (Fig 3). Since governments introduced and lifted lockdowns and social distancing measures consistently throughout time, we observed a continuous growth in the number of confinement events (Fig 3A). This behavior was the opposite for mask wearing events, which had a swift and sustained increase until October 2020, and then stayed constant for the rest of the period. By the end of March 2020, over 60% of all documented NPI events until October 2021 had already happened. We then analysed the distribution of NPIs across different countries before and after June 1 st 2020 (S6 Fig in Appendix E9 of S1 Appendix). We discovered that in the first wave, most countries imposed a large and diverse set of NPIs, which was greatly reduced in the following waves. For example, European countries like Spain and Italy introduced very similar government policies during their first waves, but adopted different strategies in the subsequent period: while Italy imposed similar preventive measures with less frequency, Spain placed more emphasis on lockdowns. Overall, throughout 2021, countries had to choose the appropriate NPIs that best fit their socio-economic circumstances and gradually lifted restrictions to avoid negative impacts in the economy [36].
Because a large number of restriction measures were initiated simultaneously in multiple countries within a short period of time, we analysed the events' coincidence per country until end of May 2020. The co-occurrence matrix in Fig 3B shows that the NPIs are co-linear in this period, i.e., they frequently co-occurred. For example, out of 57 confinement events, 12 of them occurred within a period of two weeks (seven days earlier or later) of a work restriction event in the same country. When high co-linearity exists, individual effect estimates are more challenging, as it is harder to identify what in fact drives the change in SARS-CoV-2 transmission. To overcome this problem and to control for the influence of individual NPIs on the Fig 1. Schematic presentation of the study design. Vertical lines represent days. We denote day t � k (in red) as the day when NPI k was enacted, considering that this NPI was not active the day before (i.e., event date). The treatment group for NPI k consists of t � k across all countries. The control group are the remaining days (in blue) outside the time interval t � k � w, where w is a lag period. https://doi.org/10.1371/journal.pone.0265289.g001 estimated effects, we opted to add the status of NPIs as confounders to our causal model (recall section Potential outcome framework).

Predictors of mobility change and Covid-19 R t
The prediction performance on the validation dataset of both outcomes and their feature contribution plots are summarized in Fig 4 (see S3 Table in Appendix E4 of S1 Appendix for technical specifications of the XGBoost algorithms). To evaluate the performance of the XGBoost models, we compared their accuracy to the one of a model based on permutation tests [37].

PLOS ONE
This model generates a null distribution by calculating the normalized MSE under the null hypothesis, where in each bootstrap replicate the features are kept the same but the outcomes undergo different permutations. We found that for all scenarios considered in this study (different time lags and different outcome variables), the features we utilized showed enough statistical power to allow a good prediction (Fig 4C). Associations between the outcome variables

PLOS ONE
and baseline features can be visualized using SHAP "beeswarm" plots, that show the top-10 contributing features for the outcomes prediction using a time-lag of 7 days (Fig 4A and 4B). The strongest predictor of R t was its own value 7 days before, implying a clear positive association between R t measures within a one-week period (Fig 4A). Such strong a association was not observed with other important features, such as confirmed cases and deaths. Changes in mobility within several categories also impacted the R t prediction. In particular, an increase in mobility in workplaces and retail contributed to a higher SARS-CoV-2 transmission. The feature importance analysis of residential mobility prediction revealed that mobility categories are highly correlated: as the time spent at home increases, the movement outside residence decreases (Fig 4B).

Estimated effect of interventions on policy adherence and SARS-CoV-2 transmission
We derived causal effect estimates using balancing weights that minimized the confounding biases. The absolute standard mean difference (ASMD) was used as a measure to compare the distribution of observed baseline covariates between treated and untreated groups. The AdvBal algorithm could significantly decrease differences between covariate distributions, bringing the ASMD to less than 0.25 for all covariates (S3 Fig in Appendix E6 of S1 Appendix). This value is considered a reasonable cutoff for acceptable standardized biases, indicating that the effect estimates are reliable and robust to confounding [38].
We report the estimated causal effects d ATT M and d ATT R without balancing weights (unadjusted) and with balancing weights generated by the AdvBal algorithm and IPW (Fig 5). The estimated effects were compared with different models to show how the results change under different scenarios, and to strengthen our analysis by providing strong corroboration In both plots, the opacity of the markers represents the ability of the balancing weights method to balance the treatment groups: the smaller the ASMD is, the more opaque the markers are. Fully opaque markers indicate an ASMD <0.1, half-transparent markers indicate 0.1 � ASMD � 0.25 and most transparent ones represent ASMD > 0.25. Apart from mask wearing mandates, all NPIs caused a significant increase in time spent at home. Of those, school and cultural closures were the most effective. Closing schools, issuing face mask usage, and work-fromhome mandates also caused a persistent reduction in R t after their initiation, which was not observed with the other social distancing measures. Code used for generating figure is available at https://github.com/barakm-ki/symptomsdynamics-of-COVID-19-infection/blob/master. for the correction of bias. All estimates are presented with 95% confidence intervals and were obtained after assessing whether the balancing weights approaches were able to reduce biases.
Our results suggest that the full extent of NPIs, except mask wearing, significantly affected human mobility change as early as 7 days after their initiation. However, changes in mobility varied between these 4 restrictions. School and cultural closures caused a quick and sustained increase in the time spent inside the home, whereas confinement and work-from-home orders had a slower and plateauing effect over time. Estimates from the AdvBal algorithm suggest that, following the introduction of NPIs, the time spent at home 14 days later was estimated to increase by 9.06% [95% CI: 6.86%, 11.10%], 9.22% [95% CI: 6.90%, 11.66%], 13.29% [95%CI: 10.57%, 15.95%] and 11.38% [95%CI: 8.83%, 13.80%], in response to confinement, work restrictions, school and cultural closure, respectively. Since people already spent a lot of time in their residence before the pandemic, movement changes in residential areas were likely to be smaller compared to outside locations, such as recreation and retail areas ( Results of a complementary analysis based on an alternative study design are described in Appendix E8. We found that under a stricter cohort design, where we compare the effect of NPIs with respect to the other NPIs, school closure had the greatest impact on increasing mobility in residences and in reducing R t 14 days after its initiation (S5 Fig in S1 Appendix).

Discussion
We constructed a dataset that combines rich information about countries and their reaction to the urgent need to control the pandemic spread. The data include information on social-economics and health benefits, NPIs, and mobility data from more than 100 countries. We showed that the data have predictive power, and that the prediction of changes in mobility after imposing NPIs is more accurate than the prediction of the reproduction number. We employed a causal inference approach to quantify the effect of NPIs on the rate of R t (i.e., transmission of SARS-CoV-2) and on the change of human mobility, which is considered a proxy measure of population adherence and social distancing. The purpose of this study was to help infer the efficiency of interventions in the early months of a pandemic, when a number of control measures had already been imposed by multiple countries in the absence of vaccines.
The most common use of causal inference seeks to estimate the average treatment effect (ATE). Such an analysis would answer questions such as "what would have happened if every country applied the NPI?". However, in this case an analysis of this type proved to be unfeasible, since it was not possible to balance the confounding differences between the treated and untreated countries. We therefore opted to measuring the impact of NPIs in the countries that chose to impose the NPIs, which is known as the average effect on treated (ATT). This answers the question of "what was the effect of applying the NPIs in the countries that applied it?". This approach allowed covariate balancing, which provided more reliable estimates of the effect of NPIs.
Our findings showed that mask wearing did not significantly impact mobility patterns in the first wave. Although a number of countries favored face mask usage early in their outbreaks [39], the main reason people changed their behavior was social distancing policies. We found that issuing confinement, work-from-home orders, or school/cultural closure mandates resulted in high levels of policy compliance even one week after their initiation, as measured by changes in movement in residential areas. This result was consistent with recent findings [32,40].
The estimated effect on R t showed that not all NPIs significantly contributed to a decrease in SARS-CoV-2 transmission. In particular, school closure achieved a sustained decline on the rate of R t , similar to what was found in observational studies of the first wave [4,5,[7][8][9] and second European wave [41]. Because infected children can experience mild or no symptoms more frequently than older individuals [42] and tend to have more social contact than adults [43], it is expected that closing schools would considerably contribute to reduce the transmission. In contrast, on its own, closing cultural establishments does not seem to have an effect on the reproduction number, nor do work restrictions in the first 14 days after imposing the NPIs. Notably, we did not find substantial differences in the results when performing sensitivity analysis.
Our study extends previous first-wave estimate studies [2,4,5,[7][8][9][10] in a number of ways. First, we used the potential outcome framework to infer NPI effects. In its simplest form, our causal model made use of standard causal inference methods to correct observed biases and obtain valid effects with more transparent confidence intervals. Second, we addressed the issue of concurring NPIs by using their status as covariates in the causal model. By ensuring that all NPI-related covariates are well balanced between treatment groups, we enhanced the power to detect independent NPI effects. Third, we account for heterogeneity of countries by including a social economical cluster indicator in the dataset. Fourth, we conducted complementary analyses under alternative scenarios to test our conclusions not only with a different cohort study design, but also with another balancing weights generation approach.
We acknowledge several limitations in our analyses. Even with data from multiple countries that had diverse sets of interventions in place, inferring NPI effects still remained a challenging task. First, the R t estimation was based on epidemiological parameters that are only known with uncertainty, due to many mild or asymptomatic cases that make it difficult to model the timing for the onset of symptoms and serial interval distributions. On top of that, R t also relies on the data of confirmed cases, which were generally unreliable in the early days of the pandemic due to lack of testing availability and not-established documentation practices. To account for this, we began our analysis at each country's 100 th case. Secondly, the data are retrospective and observational, meaning that unobserved factors could confound the results. Third, we were unable to assess the effect of lifting interventions. Since the NPI events in WNTRAC are automatically extracted from Wikipedia articles, which report the introduction of NPIs more frequently than their relaxation, the number of lifting events documented in the database did not have enough statistical power for causal inference. Yet, we believe we set the ground for a thorough analysis of NPIs and we were able to draw conclusions regarding the effect different NPIs had on the pandemic spread. Future work can assess the causal effect of the post-vaccine newly defined NPIs where health certificate notions were introduced in somewhat similar ways across different countries.

Code availability
The source code for the causal inference evaluation of NPIs is available in a public GitHub repository at https://github.com/IBM/causallib. Please refer to the README file in the repository for further instructions on using the code. Requests for the code used to generate the results and the plots should be directed to the corresponding author.