Pandemic catch-22: The role of mobility restrictions and institutional inequalities in halting the spread of COVID-19

Countries across the world responded to the COVID-19 pandemic with what might well be the largest set of state-led mobility and activity restrictions in human history. But how effective were these measures across countries? In contrast to multiple recent studies that document an association between such restrictions and the control of the contagion, we use an instrumental variable approach to estimate the causal effect of these restrictions on mobility and on the growth rate of confirmed cases and deaths during the first wave of the pandemic. Using the level of stringency in the rest of the world to predict the level of stringency of the restriction measures in a country, we show that while stricter contemporaneous measures affected mobility, stringency seven to fourteen days earlier mattered most for containing the contagion. Heterogeneity analysis, by various institutional inequalities, reveals that even though the restrictions reduced mobility more in relatively less-developed countries, the causal effect of a reduction in mobility was higher in more developed countries. We propose several explanations. Our results highlight the need to complement mobility and activity restrictions with other health and information measures, especially in less-developed countries, to combat the COVID-19 pandemic effectively.


Introduction
By December 31, 2020, the COVID-19 pandemic had infected over 90 million people and claimed almost 2 million lives. Countries across the world have responded with what might well be the set of biggest state-led mobility and activity restrictions in the history of humankind. The hope is to contain the contagion and reduce the congestion in health-care utilization. But besides being controversial and costly, such measures may not always be successful in containing the spread and can, sometimes, worsen the situation (see, among others, [1][2][3][4][5][6][7][8][9][10]). The trade-offs are bigger for developing countries. In the absence of proper social security support, they face a catch-22 situation where strict mobility and activity restrictions, especially if ineffective, will unnecessarily increase the economic cost through lost livelihoods and, perhaps, even compromise the immunity of the vulnerable population. The natural question that then follows is whether such measures have been effective in controlling the COVID-19 contagion.
If yes, what institutional factors contribute to their effectiveness? Multiple recent studies report a negative association between such restrictions and the contagion [11][12][13][14][15][16][17]. However, much of this work either simulates counterfactual scenarios or documents associations between the restrictions and the contagion. With studies suggesting a steep economic cost of such restrictions, designing optimal mitigation policy for COVID-19 and future pandemics requires understanding whether, when, where, and how much these restrictions causally affect the contagion [18]. A few studies attempt to identify causal effects of the restrictions using a difference-in-differences (DiD) design, comparing regions with high and low levels of restrictions [12,16,17]. But the restrictions were almost always a response to the disease situation in the region. Areas with worse contagion or more watchful populations might have enacted stringent restrictions relatively early. Since these factors could have also affected the evolution of the disease scenario, the assumption of parallel trends underlying the DiD methodology is unlikely to hold.
We propose an instrumental variable approach to estimate the causal effect that the level of stringency of the restrictions had on human mobility and the growth rate of the contagion. In deciding whether to impose restrictions, national and local governments took into account not only the prevailing disease situation in the country (the factor that confounds DiD estimates) but also what they expected would happen in the future in the presence and absence of such restrictions. Lacking perfect foresight, they made predictions based on their observations of the condition in the rest of the world [19]. Governments that witnessed a rapid increase in the number of COVID-19 cases and subsequent mobility and activity restrictions in the rest of the world in the days following the first confirmed case in their own country, sprung into action swiftly and imposed stricter restrictions. It is important to note that enacting strict policy measures does not necessarily translate to enforcement or compliance, but rather an acknowledgment of the need for stricter restrictions, and possibly an intent. Building on this insight, we use the day-to-day changes in the stringency of the restrictions in the rest of the world to instrument how stringent a country's internal mobility and activity restrictions were.
We conduct our analysis combining high-frequency measures of mobility data from Google's daily mobility reports, country-date-level information on the stringency of restrictions in response to the pandemic from Oxford's Coronavirus Government Response Tracker (OxCGRT), and daily data on people tested, confirmed cases, and deaths attributed to COVID-19 from Our World In Data and the Johns Hopkins Center for Systems Science and Engineering (CSSE). Using the instrumental variable technique, we estimate large causal effects of stricter restrictions on mobility and the weekly growth rate of recorded cases and deaths attributed to COVID-19. In comparison, we find that more stringent restrictions have weak marginal effects on the growth rate of tests conducted. Consistent with the current scientific understanding that an infected human can infect another human up to 14 days since being infected, we see that the level of stringency of the restrictions in the previous two weeks matters more than the contemporaneous level or level of restrictions 3 weeks in the past (see Qiu et al. (2020) [15] and the studies they cite for a discussion of the incubation and the infection period). We also document considerable differences between the correlation and causal estimates that raise concerns over the use of the association estimates from previous studies to evaluate the costs and benefits of the restrictions.
Next, we show that the effectiveness of the restrictions varies significantly across countries with their institutional capacities. In particular, more stringent measures help more in richer, more educated, more democratic, and less corrupt countries with older, healthier populations and more effective governments. Finally, we draw attention to the observation that announcing restrictions does not necessarily imply a reduction in mobility; it depends on the level of compliance and enactment of the policies. The estimated reduced-form effects of stringency on the growth rate of cases and deaths incorporate the differential compliance across countries. Utilizing a recursive mixed-process model, we show that even though the stricter restrictions had a larger negative marginal effect on mobility in relatively less-developed countries, they were more effective in containing the contagion in more developed countries. However, while this result is stark when the outcome variable is the cases to tests ratio, it is less so for the deaths to tests ratio. Consistent with the institutional heterogeneity results, these results indicate that imposing mobility restrictions is not enough to contain the contagion in developing countries, and the benefits reaped from high stringency are lower relative to developed nations. The restrictions should be effectively complemented with other policy measures, such as raising awareness about best practices when these restrictions are imposed, and health and economic assistance for those affected [20,21].
The findings have important policy implications. COVID-19 is not the first and will not be the last epidemic to afflict humanity. Better future preparedness requires a better understanding of when and how to act in times of such crises. Understanding the effectiveness of mobility and activity restrictions in containing contagions will not only help us optimize our current response to COVID-19 but also prepare us better to face future disease outbreaks. The institutional heterogeneity analysis suggests that increasing stringency alone might not be enough, especially in developing countries where labor market conditions, lacking health infrastructure, and constraints on implementation infrastructure might limit the effectiveness of these restrictions. Since the economic downturn can negatively affect the health and welfare outcomes in poorer countries more than in rich countries where the transition into work from home is relatively easier, this raises serious concerns about the cost-effectiveness of stringent mobility and activity restrictions in the absence of complementary policies. The results call for a country-specific policy response suited to the institutional capacity and socio-economic circumstances of the country.

Data and summary statistics
For our analyses, we collate and link country-level daily data from the following sources.

Google community mobility reports.
To facilitate better monitoring of, and compliance with, nationwide and local lockdown decisions and social-distancing requirements aimed at reducing the transmission of COVID-19, Google has publicly released daily aggregated data on changes in mobility across six high-level location categories in 131 countries. The mobility measures reflect how busy these places are. The six location categories are groceries and pharmacies, retail and recreation sites, parks, transit stations, workplaces, and residences. We source the mobility data from these reports for February 15 to July 30, 2020. The data reflect daily percentage changes relative to a baseline. The baseline is the median value of mobility for the corresponding day of the week during the 5-week period from January 3, 2020, to February 6, 2020. These measures of changes in mobility across the six location categories serve as our first set of outcome variables.
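The baseline described above can be illustrated with a minimal sketch. It assumes raw daily visit counts are available, which Google does not publish; the function names and the visit counts are hypothetical and serve only to show how a percentage change relative to a weekday-specific median is computed.

```python
from datetime import date, timedelta
from statistics import median


def weekday_baseline(visits, start=date(2020, 1, 3), end=date(2020, 2, 6)):
    """Median visits for each weekday (0 = Monday) within the baseline window."""
    by_weekday = {}
    for day, count in visits.items():
        if start <= day <= end:
            by_weekday.setdefault(day.weekday(), []).append(count)
    return {wd: median(values) for wd, values in by_weekday.items()}


def percent_change(visits, baseline):
    """Percentage change of each day's visits relative to its weekday baseline."""
    return {
        day: 100.0 * (count - baseline[day.weekday()]) / baseline[day.weekday()]
        for day, count in visits.items()
        if day.weekday() in baseline
    }


# Hypothetical data: 100 visits on each of the 35 baseline days,
# then 60 visits on Monday, March 30, 2020.
visits = {date(2020, 1, 3) + timedelta(days=i): 100 for i in range(35)}
visits[date(2020, 3, 30)] = 60

changes = percent_change(visits, weekday_baseline(visits))
print(changes[date(2020, 3, 30)])
```

Comparing each day with the median for the same weekday keeps ordinary weekly rhythms, such as quieter workplaces on weekends, from being read as a response to restrictions.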
While a reasonable measure of the extent of compliance to the restrictions, or successful enactment of the policies, the data comes with certain caveats. The reports are generated using a technology similar to the real-time anonymized Google Maps traffic data, and as such are reflective of only those users who have their location history setting turned on in their (cellphone, tablet, etc.) Google account [22]. Therefore, while the data is impressive, it is not representative of the population at large. Another important aspect to note is that while the residential category shows the relative change in daily time spent at home, the other measures reflect respective daily relative changes in the number of individual visits. So, the residential category carries a different unit of measurement than the other categories and thus should be interpreted as such.

Oxford COVID-19 Government Response Tracker (OxCGRT).
OxCGRT provides a comprehensive and systematic country-level daily stringency index, constructed based on common policy responses implemented by governments to combat COVID-19 [23]. Stringency is measured as a composite score, equally weighted and normalized between 0 and 100 for each country (with 100 being the strictest response), using eight ordinal indicators of containment, movement restriction, and closure policies, and a ninth indicator measuring the coordinated presence of public awareness campaigns on the pandemic. The containment indicators include school closures, workplace closure, cancellation of public events, restrictions on gatherings, public transport closure, stay-at-home requirements, internal movement restrictions, and international travel controls.
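A simplified sketch of such an equally weighted composite is given below. It rescales each ordinal indicator to 0-100 by its maximum level and averages the nine sub-scores. The indicator names and maximum levels are illustrative assumptions that should be checked against the OxCGRT codebook, and the sketch ignores any further adjustments the official index makes, so it is not the tracker's exact formula.

```python
# Assumed maximum ordinal level for each indicator (illustrative only;
# verify against the OxCGRT codebook before relying on these values).
MAX_LEVEL = {
    "school_closing": 3,
    "workplace_closing": 3,
    "cancel_public_events": 2,
    "restrictions_on_gatherings": 4,
    "close_public_transport": 2,
    "stay_at_home": 3,
    "internal_movement": 2,
    "international_travel": 4,
    "public_info_campaigns": 2,
}


def stringency_index(levels):
    """Equally weighted mean of the indicators, each rescaled to 0-100."""
    sub_scores = [100.0 * levels[name] / top for name, top in MAX_LEVEL.items()]
    return sum(sub_scores) / len(sub_scores)


# Hypothetical country-day: schools fully closed, nothing else in place.
example = {name: 0 for name in MAX_LEVEL}
example["school_closing"] = 3
print(round(stringency_index(example), 2))
```

Because every indicator carries the same weight, closing all schools moves this simplified index by exactly as much as the strictest setting of any other single indicator.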
Since the stringency index further tracks how quickly governments implemented or rolled out their policy measures, we use the index as our primary independent variable of interest. As contemporaneous stringency measures would affect mobility but their effect on the growth of the pandemic would be observable only days later, we also use lagged values of the index in our analyses. While the index provides a numerical score for the strictness of the policies enacted, it does not reflect the compliance with, or effectiveness of, the stringency put in place. The case for compliance is more relevant when exploring the legally binding nature of the policies. For example, Katafuchi et al. (2020) [24] show that even without the declaration of a state of emergency in Japan, people partially suppressed their mobility, although, as expected, mobility was suppressed more with the state of emergency in place. This paper aims to provide a more comprehensive overview of mobility changes, and their subsequent role in containing the contagion, with or without legally binding policies. Hence, while a higher score in the index reflects a willingness for greater stringency, it does not necessarily mean that a country's response was better than that of countries with a lower score.
It is also important to note that while some countries enacted rigid mobility and activity restrictions, other countries adopted more flexible measures. Further, these levels of flexibility/rigidity have changed within a country over time. OxCGRT integrates these fluctuations into their stringency index by categorizing each of the nine indicators into ordinal levels by the rigidity of the restriction. For example, school closures are categorized into "0-no measures; 1-recommend closing or all schools open with alterations resulting in significant differences compared to non-Covid-19 operations; 2-require closing (only some levels or categories, for eg. just high school, or just public schools); 3-require closing all levels" [25]. The final stringency index is then a composite weighted index where higher values reflect the levels of rigidity of the restrictions. Please refer to [25] for details on the index's construction. S1-S9 Figs in S1 Appendix provide event graphs of the stringency index by country over time for all 127 countries in our sample. Values above 50 can be interpreted as the country undertaking relatively stricter measures.

COVID-19 outbreak data.
We source COVID-19 country-specific daily data on confirmed cases per million and deaths attributed to COVID-19 per million from the Johns Hopkins Center for Systems Science and Engineering (CSSE) COVID-19 data repository [26]. We combine this with tests per million population data collated by Our World In Data (OWID). Since it takes some time for delayed reporting to be reflected in the dataset, we use 7-day moving averages of the outbreak variables and restrict our focus to events between February 15, 2020, and July 30, 2020. Our conclusions remain the same if we do not use moving averages; results are available upon request. We construct daily growth rates of 7-day moving averages for the outbreak variables (tests, cases, cases to tests ratio, deaths, and deaths to tests ratio) and use them as our second set of outcomes. We explain the rationale behind using ratios in Section 2.2. OWID collects testing data from country-specific official government reports, and these data are available for only 85 countries. We limit our analysis to the 127 countries (80 for tests) that have mobility, stringency, cases, and deaths data available. The countries are listed in S1 Table in S1 Appendix.
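The construction of these outcome variables can be sketched as follows. The sketch assumes a trailing 7-day window and a simple proportional day-over-day growth rate; the text does not state whether the window is trailing or centered, or whether growth is measured in proportional or log terms, so both choices are assumptions, as are the function names and the numbers.

```python
def moving_average(series, window=7):
    """Trailing moving average; the first window-1 entries are None."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[i + 1 - window:i + 1]) / window)
    return out


def growth_rate(series):
    """Proportional day-over-day change; None where it is undefined."""
    out = [None]
    for prev, cur in zip(series, series[1:]):
        if prev is None or cur is None or prev == 0:
            out.append(None)
        else:
            out.append((cur - prev) / prev)
    return out


def ratio(numerator, denominator):
    """Element-wise ratio of two smoothed series (e.g. cases to tests)."""
    return [
        None if n is None or d is None or d == 0 else n / d
        for n, d in zip(numerator, denominator)
    ]


# Hypothetical daily counts for one country.
cases = [10] * 7 + [17] * 7
tests = [200] * 14

cases_ma = moving_average(cases)
tests_ma = moving_average(tests)
cases_growth = growth_rate(cases_ma)
positivity_growth = growth_rate(ratio(cases_ma, tests_ma))
print(cases_ma[6], cases_ma[7], cases_growth[7])
```

Smoothing before taking growth rates dampens the spikes that batch or delayed reporting would otherwise create in the daily series.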
Several studies and media outlets have reported that due to country-specific differences in testing rates, data aggregation, and reporting quality, the number of cases and deaths are under-reported [27][28][29]. Testing data, when available, has a strong selection bias with many countries screening and testing only those people who presented symptoms. The extent of this selection bias might be systematically related to country-specific characteristics. While we control for country and time fixed effects in our empirical specifications, it will not account for systematic changes in selection bias over time across countries. Therefore, this study, like all studies utilizing the CSSE and the OWID data, should be interpreted with a healthy dose of skepticism. We further elaborate on such data limitations in Section 2.2 and what we do to best circumvent the constraints.

Heterogeneity variables.
To investigate the role of institutional heterogeneity in the impact of the restrictions across developing and developed countries, we link our data with various pre-COVID-19 country-specific demographic, health, and governance factors that may aid or hinder the stringency effect on people's mobility and the spread of the disease. Along the demography dimension, we examine heterogeneity by population density, education, poverty headcount, economic inequality (Gini index), the share of the population aged 65 years or above, and air pollution per capita (measured by PM2.5, the concentration of suspended particulate matter in the air with a diameter of 2.5 micrometers or less). We also examine heterogeneity by available hospital beds per 100 thousand population (a proxy for available healthcare infrastructure), the share of the population with hand-washing facilities on premises (a proxy for the availability of tools to combat the growth in transmissions), and the death rate from cardiovascular diseases (CVD) (a proxy for the share of the immune-compromised population who face higher risks from COVID-19).
Finally, we examine heterogeneity along countries' governance indicators using the Economist Intelligence Unit (EIU) democracy index, government effectiveness from the Worldwide Governance Indicators (WGI) [30], and the corruption perception index (CPI) developed by Transparency International (TI), where larger values represent cleaner countries. The vast majority of data from the demography and health dimensions are sourced from the World Development Indicators (WDI), United Nations Population Division, or the Global Burden of Disease Collaboration Network. S2 Table in S1 Appendix provides details of the sources for each of the variables used, and Table 1 below presents the summary statistics.
The stringency index appears to be skewed to the left with a mean value of 64 below the median of 71, meaning there is a relatively long tail of days with lower stringency scores. All mobility measures, excluding residential mobility, show a percentage decrease in the visits with the decrease being greatest at about 38 percent at transit stations, followed closely by mobility around retail and recreation sites. On the other hand, the percentage change in time spent at home increases by about 13 percent. Segregating the measures by developing vis-à-vis developed countries, reported in S3 Table in S1 Appendix, shows that even though mean stringency is relatively similar in both cohorts, mobility changes were mostly greater in developed vis-à-vis developing countries (except in mobility around parks, and slightly for grocery and pharmacy). This could be an initial indication of either overall lower compliance to mobility restrictions in developing countries, or greater self-regulation in developed countries (for example, while transit stations see a decrease of 34 percent in developing countries, developed countries see a 40 percent decrease). Mean cumulative 7-day moving average daily growth rates of tests, cases, and deaths are 5, 7, and 7 percent, respectively, while that of the ratios are smaller, with cases to tests ratio at 1 percent and deaths to tests ratio at 2 percent. Finally, while the mean statistic of the variables provides a snapshot of the overall sample, developing countries are, on average, significantly less educated, poorer, younger, more polluted, lack adequate health infrastructure, face greater corruption, and have poorer levels of democracy and government effectiveness (see S3 Table in S1 Appendix).

Data limitations
In the absence of a unified framework for testing and reporting for COVID-19 infections, the available data suffers from a multitude of problems. These include the absence of any count of COVID-positive people who are not diagnosed (including asymptomatic cases), varying assay specificity and sensitivity leading to false negatives or false positives, differences in testing, comorbidities, imperfect reporting, the release of incorrect data, and delays in reporting, to name a few. Angelopoulos et al. (2020) [31] discuss how the problems could bias estimation in either direction depending on their relative magnitude, and Millimet and Parmeter (2019) [32] provide a discussion of cases when data is skewed in one direction due to one-sided measurement errors. While many of these data issues cannot be resolved, we interpret our estimates with caution and perform various robustness checks to minimize the bias in the comparisons we make. One glaring problem is the misreporting of the number of (per capita) confirmed cases. To account for delays in reporting, we use 7-day moving averages of COVID-related measures. However, the number of confirmed cases depends on the number of tests conducted, which itself varies across time within each country. This variation across time within each country will not be absorbed by the country or time fixed effects that we include in our specifications. In general, however, countries have increased testing over time, albeit at differential rates that could be correlated with country income level or other country characteristics. Using the growth rate of cases would, therefore, bias our results. Instead, we use the ratio of cases to tests. If the rates of infection in the untested population are similar to the rates of infection in the tested population, the ratio of cases to tests is a better reflection of the prevailing disease situation.
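The argument for the ratio can be made explicit with a short worked decomposition. The symbols are introduced here for illustration and are not the paper's notation: $C_t$ denotes confirmed cases, $T_t$ tests conducted, and $p_t$ the share of tests that return positive on day $t$.

```latex
C_t = p_t \, T_t
\quad\Longrightarrow\quad
\Delta \ln C_t = \Delta \ln p_t + \Delta \ln T_t ,
\qquad
\Delta \ln\!\left( \frac{C_t}{T_t} \right) = \Delta \ln p_t .
```

As a hypothetical example, if a country doubles its tests from 1,000 to 2,000 while the positive share stays at 5 percent, confirmed cases rise from 50 to 100. The growth rate of cases is then 100 percent even though the underlying disease situation is unchanged, whereas the growth rate of the cases to tests ratio is zero. Under the stated assumption that infection rates are similar in the tested and untested populations, the ratio removes the part of case growth that merely reflects expanded testing.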
Similar measurement errors plague the information about deaths due to COVID-19. For surveillance purposes, WHO [33] defines a COVID-19 death "as a death resulting from a clinically compatible illness in a probable or confirmed COVID-19 case, unless there is a clear alternative cause of death that cannot be related to COVID-19 disease." Since COVID deaths are ascertained in relation to probable or confirmed COVID-19 cases, where the latter depends on the number of tests within a country, data about deaths due to COVID from different countries suffer from different levels of measurement errors. For example, even within Europe, countries like Belgium have a more comprehensive approach to reporting deaths due to COVID than the United Kingdom, which does not count non-hospital fatalities [34]. But if the rates of recovery from infections are similar in the tested and the untested population, the infection fatality ratio ($\mathrm{IFR} = \frac{\text{deaths}}{\text{infected}}$) and the case fatality ratio ($\mathrm{CFR} = \frac{\text{deaths}}{\text{cases}}$), reported by many countries, can be relatively good measures of COVID-related mortality in the population. However, as WHO [35] points out, the accuracy of these measures relies on two assumptions. First, that the likelihood of detection of confirmed cases and deaths due to COVID is consistent over time; and second, that all detected cases are resolved (either recovered or died). Given testing and reporting limitations, often both these assumptions might be violated.
Other studies have explored statistical methods to correct for the bias, each with their own caveats (see for example, [36][37][38][39]). However, there is no agreement on whether these statistical measures yield better estimates and whether one measure is superior to the others. Instead, we construct a deaths to tests ratio as a measure of the disease situation. We prefer the deaths to tests ratio over the deaths to infected and deaths to cases ratios because the latter measures have time-varying measurement error in both the numerator and the denominator. Without any information on the relative degree of measurement error in the numerator and the denominator, it would be difficult for us to sign the bias. While the information on tests conducted is not free of measurement error, unless deliberately misreported by the reporting country, the error should be relatively small. So, while the deaths to tests ratio is not free of error, it is, arguably, the most comparable measure across countries. The estimated effects, albeit differing in magnitude, remain qualitatively similar when we use other measures: deaths per 100,000, infection fatality ratio, or case fatality ratio. In addition, since countries varied in their approaches to track the spread of the disease, this measure allows us to check the robustness of our results by limiting our sample to countries with relatively more reliable infection rate data.
There are three possible ways to limit the sample to countries that provide reliable test data. First, since most countries lack the resources to test a randomly selected sample of the population, Angelopoulos et al. (2020) [31] discuss how contact tracing can be a powerful tool that allows otherwise intractable biases to be controlled. Contact tracing extends testing to a much larger section of the target population, specifically a larger portion of mild and asymptomatic cases that are otherwise left out of the testing pool. The authors show that, assuming non-response rates to contact tracing are identical for asymptomatic and symptomatic cases, asymptotically unbiased estimates can be obtained. Therefore, we could limit our sample to the 59 countries that conducted comprehensive contact tracing and tests (a group that also has a good split of developed and developing countries, as indicated in S1 Table in S1 Appendix). S4 Table in S1 Appendix reports the mean of the 7-day moving average growth rates by no, limited, and full contact tracing. As expected, the growth rate means fall with increasing contact tracing. But the means fall only slightly when compared to the full sample, except for the cases to tests ratio.
A second approach could be to limit the sample by the country's testing policy. Data from countries with an open public testing policy might be more representative than data from countries with only limited testing. But it would still suffer from selection bias, as people might self-select into testing. OxCGRT categorizes testing policy into four groups: (1) no testing policy, (2) limited testing of those who both have symptoms and meet specific criteria (e.g., key workers, admitted to hospital, came into contact with a known case, returned from overseas), (3) symptom-based testing, and (4) open public testing. S4 Table in S1 Appendix once again shows the falling growth rate means with better testing policy. However, only 34 countries conducted open public testing within the timeframe of our study, resulting in a much smaller sample of 2,866 (compared to 10,539 for the full sample). Further, since only developed countries with adequate resources were able to adopt this testing approach, restricting the sample by the country's testing policy would defeat the purpose of this study.
A third approach could be restricting the sample according to the type of testing data reported. A vast majority of countries report only the total number of tests conducted, double-counting follow-up or repeat tests for the same person [40]. This double-counting would, in most cases, exert an upward bias on the estimated effects. Thus, limiting the sample to only the countries that report the number of people tested may be a viable approach. Austin and Kachalia (2020) [40] posit that these countries may also be reporting higher-quality data relative to others. However, only 21 countries reported the number of people tested, and S4 Table in S1 Appendix shows that the growth rate means are fairly similar between the group that reported the number of people tested and the group that reported the total number of tests.
Among the three approaches, we believe restricting the sample by comprehensive contact tracing would work best in minimizing measurement error in testing. Further, since testing strategies have changed over time for some countries, limiting the sample to days when contact tracing was done as a consistent testing strategy, will also be a good robustness check for our results. First, we conduct our analysis using the full sample. Then, we check the robustness of the findings using the restricted sample of countries (and days) that conducted comprehensive contact tracing as a consistent testing strategy.
One other data concern is that policy stringency measures from OxCGRT capture only the restrictions imposed, and not how they are enforced or the behavior of the citizens. Our estimates, therefore, are net of enforcement, compliance, and mitigating self-disciplining behavior of the citizens. One of the nine indicators used to construct the stringency index is a measure of the coordinated presence of public awareness campaigns about the virus. Any change in the citizens' self-disciplining due to public awareness campaigns should be captured by this component of the stringency measure. Therefore, the stringency effect not only reflects citizens' response to policy measures put in place but also any self-disciplining effect. The available data do not permit us to disentangle the two.

Empirical strategy
Investigating the causal impact of the level of stringency on the mobility indicators and COVID-19 outbreak growth rate variables presents a few empirical challenges. While we do not provide a theoretical model to formally break down the causal mechanism, Keppo et al. (2020) [41] extend the epidemiological SIR model to a "behavioral SIR model," which is a good resource for anyone looking for a theoretical construct. First, governments around the world enacted these measures in response to the disease situation in their countries. Therefore, ordinary least squares (OLS) estimates of the associations between the stringency of the policy measures and the outbreak growth rates could be driven by reverse causality: countries with worse disease situations had to enact more stringent measures to control the contagion. Similarly, even without the announced restrictions, countries with a higher proportion of circumspect population might see a decrease in both mobility and disease spread. The governments in these countries might have responded to the expectation this could have placed on the government to support their citizens. Country or time fixed effects will not be able to account for the changes in expectations people have from their government or actions of the government in response to these expectations across time. There is also considerable variation in how well and how soon governments might have reacted to the disease environment in the country. That is, the extent of reverse causality also varies by country. Further, as discussed in Section 2.2, the outcome measures likely suffer from non-classical measurement errors. For example, less educated countries might be less stringent and might also have larger measurement errors in recording cases and deaths. All these factors will bias the OLS estimates.
To address these concerns, we opt for an instrumental variable (IV) approach. We use the level of stringency in countries other than country c on date t, to predict the level of stringency of the restrictions in the country c on date t. The rationale is that governments, in deciding the level of stringency of the restriction, looked not only at the disease condition in their own country but also what they expected would happen if they did not impose stricter measures. Since there was no way for them to predict the counterfactual scenario, they looked at the situation of other countries. In particular, they observed the actions other countries in the world were taking. If a country observed that other countries around the world were imposing strict restrictions, it could have been also inclined to enact stricter restrictions regardless of the disease situation at home. Since a country did not observe the private signal of other countries about how bad they expected the situation to become, the country used the observable decision of other countries to inform its own decision. This is at least likely to be true in the initial stages of the COVID-19 pandemic (see [19]), the timeframe in concern for this study. However, lockdowns and other such extreme restrictions are not sustainable for long, especially in developing countries, with countries soon facing pressure to gradually open up while balancing various health and socio-economic concerns [42]. So, the level of stringency of the policy measures in a country c at time t must be correlated with the stringency of the policy measures in the rest of the world, satisfying the relevance requirement for the IV.
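The logic of the IV approach can be illustrated with a small simulation. This is a stylized sketch on synthetic data, not the paper's specification: it omits fixed effects, lags, and controls, and all variable names and parameter values are assumptions. An unobserved factor `u` (for instance, the severity of the domestic outbreak) raises both stringency `x` and case growth `y`, while the instrument `z` (stringency elsewhere) shifts `x` without affecting `y` directly.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)


random.seed(0)
n = 20000
true_effect = -0.5  # assumed causal effect of stringency on case growth

z = [random.gauss(0, 1) for _ in range(n)]  # instrument: stringency elsewhere
u = [random.gauss(0, 1) for _ in range(n)]  # unobserved domestic severity
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]  # own stringency
y = [true_effect * xi + 2 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

ols = cov(x, y) / cov(x, x)  # biased: also picks up the effect of u
iv = cov(z, y) / cov(z, x)   # simple IV (Wald) estimator using z

print(f"OLS: {ols:.2f}  IV: {iv:.2f}  true effect: {true_effect}")
```

In this synthetic setting the OLS slope is pushed toward a positive value because stricter measures coincide with worse outbreaks, while the IV estimate stays close to the assumed negative effect. With one instrument and one endogenous regressor, the ratio of covariances is equivalent to two-stage least squares.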
While the day-to-day variations in the extent of government-imposed restrictions in the rest of the world might influence a country's propensity to impose mobility and activity restrictions, they should not, at least in the short run, significantly affect the level of activity and the growth rate of confirmed cases or deaths in the country. One possible exception arises if, given a lead time before travel bans were implemented, asymptomatic individuals traveled to avoid anticipated restrictions in their home country and thereby seeded an outbreak in another country, thus directly driving cases and deaths there. While this possibility is indeed difficult to rule out using available data, we try to minimize it by comparing several different geographical definitions of our instrument. We use the World Bank's classification of world regions and sub-regions for this exercise. In the preferred specification, we construct the instrument as the average level of stringency at time t in the countries in our sample, excluding all countries in the same sub-region s as country c, to minimize the effects of spillovers of infections across borders. Other definitions we explore are (1) World stringency minus country stringency, (2) Region stringency minus country stringency, (3) Sub-region stringency minus country stringency, and (4) World stringency minus region stringency. The results using these alternative instruments are reported in the S1 Appendix.
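As a rough illustration of how the preferred instrument is constructed, the minimal Python sketch below computes, for each country and date, the average stringency among sample countries outside that country's sub-region. The function name, the dictionary-based data layout, the country and sub-region labels, and the numbers are all hypothetical; none of them comes from the study's data or code.

```python
from collections import defaultdict


def leave_subregion_out_mean(stringency, subregion):
    """Average stringency on each date over countries outside a country's sub-region.

    stringency: {(country, date): stringency index value}
    subregion:  {country: sub-region label}  (hypothetical labels)
    Returns {(country, date): instrument value}.
    """
    # Running sums and counts per date, overall and per (date, sub-region).
    total, count = defaultdict(float), defaultdict(int)
    sub_total, sub_count = defaultdict(float), defaultdict(int)
    for (country, date), value in stringency.items():
        key = (date, subregion[country])
        total[date] += value
        count[date] += 1
        sub_total[key] += value
        sub_count[key] += 1

    instrument = {}
    for (country, date) in stringency:
        key = (date, subregion[country])
        n_outside = count[date] - sub_count[key]
        if n_outside > 0:  # undefined when no country lies outside the sub-region
            instrument[(country, date)] = (total[date] - sub_total[key]) / n_outside
    return instrument


# Made-up toy data: four countries on one date, three sub-regions.
stringency = {
    ("A", "2020-03-20"): 60.0,
    ("B", "2020-03-20"): 80.0,
    ("C", "2020-03-20"): 40.0,
    ("D", "2020-03-20"): 20.0,
}
subregion = {"A": "S1", "B": "S1", "C": "S2", "D": "S3"}
instrument = leave_subregion_out_mean(stringency, subregion)
```

Countries A and B share a sub-region, so they receive the same instrument value on a given date. This mirrors the point that the preferred instrument does not vary within a sub-region at time t.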
Another scenario in which the instrument would not be valid is if an individual's behavior is affected not only by the news she receives in her home country but also, to a dominant degree, by the news she receives from around the world; that is, if her behavior is affected directly by stringency policies in other countries. However, as the event study presented in Fig 1 below shows, this has not been the case.
In Fig 1, Day = 0 represents the day of national lockdown, while the red line to the left of Day = 0 represents the average number of days in advance that the national lockdown was announced or recommended. As can be seen, mobility responds sharply to changing stringency in the home country. The change in mobility is most stark after the date on which the national lockdown was announced or recommended, remaining fairly constant prior to that, and reaches its peak almost concurrently with the day of national lockdown. However, because countries implemented activity restrictions at different times, Day = 0 corresponds to different calendar dates across countries. So one can argue that the change in mobility picks up both the response to the home country's action and the response to other countries' activity-restricting policies introduced before the home country's implementation. To check for this possibility, we plot separate event study graphs for each country and see that the change in mobility is most stark after the date on which the lockdown was announced or recommended in the home country, and is not driven by lockdowns in other countries. This suggests that individuals' behavior was predominantly affected by the news they received in their home country. Therefore, the exclusion restriction is likely to hold.
The first stage of our preferred 2SLS specification is as follows:

$$\text{Stringency}_{c,t} = \alpha^{1} + \beta^{1}\,\text{Stringency}_{w-s,t} + \theta^{1}_{c} + \delta^{1}_{t-i} + \varepsilon^{1}_{c,t} \qquad (1)$$

where $\text{Stringency}_{c,t}$ is the level of stringency of the measures at time t in country c. $\text{Stringency}_{w-s,t}$ measures the average level of stringency at time t in countries in our sample excluding all countries in the sub-region s of country c. We also tried instruments based on ranking countries by GDP per capita and categorizing them into 10 quantile groups. Two instruments were then constructed as follows: (i) Quantile group minus country average, and (ii) Other quantile groups minus country quantile group average. These instruments yield results similar to those from our preferred specification; the results are available upon request. Excluding the sub-region also provides more overall variation in the instrument, even though under this specification it does not vary within a sub-region (at time t). Note that even with the other definitions of the instrument, where it varies within sub-regions, our results remain consistent in significance and direction, as can be seen in the S1 Appendix. $\theta^{1}_{c}$ controls for time-invariant unobservables and differences across countries, capturing factors like differential measurement errors in outcome variables, levels of health and health infrastructure, the times at which the first case was detected in different countries, and so on. $\delta^{1}_{t-i}$ controls for effects that are associated with the number of days since the first confirmed case in the country, where i is the day of the first confirmed case. We believe $\delta^{1}_{t-i}$ does a better job than calendar-date fixed effects at capturing the time-varying unobservable factors that might affect stringency across countries. This is because how the disease spreads within a country depends on when the first confirmed case was detected. For example, since the first confirmed case in China was much earlier than in the United States of America, there is no reason why both countries would have a similar level of unobservable factors affecting $\text{Stringency}_{c,t}$ on February 15, 2020.
Both fixed effects, $\theta^{1}_{c}$ and $\delta^{1}_{t-i}$, thus attempt to control for any correlation of unobservables with other countries' stringency and with own-country cases, deaths, and mobility. We then use the predicted values of $\text{Stringency}_{c,t}$ in:

$$Y_{c,t} = \alpha^{2} + \beta^{2}\,\widehat{\text{Stringency}}_{c,t} + \theta^{2}_{c} + \delta^{2}_{t-i} + \varepsilon^{2}_{c,t} \qquad (2)$$

where $Y_{c,t}$ is any of the mobility or outbreak growth rate outcomes for country c at time t. In addition to contemporaneous stringency, we use the stringency index lagged by 7, 14, 21, and 28 days to account for the possibility that the impact of a change in stringency on the number of confirmed cases and deaths might show up after a lag. We cluster the standard errors at the level of the country. Note that our instrument does not correct for non-classical measurement errors. However, in the preferred specification, excluding all other countries in the sub-region can minimize the chances of the measurement error in the instrument being correlated with measurement error in the endogenous variable. In the case of skewed or one-sided measurement error in the dependent variable, as is likely in our case given the under-reporting of cases and deaths, Millimet and Parmeter (2019) [32] propose using stochastic frontier analysis (SFA) or nonlinear least squares (NLLS) estimation to correct for such bias. Similar approaches have been used by Hofler and List (2004) [43] and Kumbhakar et al. (2012) [44] to correct for systematic over- or under-bidding in auctions, and by Anthopolos and Becker (2010) [45] to correct for undercounting in infant mortality data. As a robustness check, we repeat our analyses using SFA and NLLS estimators and find our results to be qualitatively similar to the ones reported using 2SLS. SFA and NLLS results are available upon request.
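The two-stage logic described above can be illustrated with a small, self-contained Python sketch. It is not the estimation code behind the reported results: it omits the country and days-since-first-case fixed effects, the lags, and the clustered standard errors. The four-observation dataset is made up, and the unobserved confounder u is constructed to be exactly uncorrelated with the instrument z so that the arithmetic is easy to follow. All names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """OLS slope from regressing y on x with an intercept: cov(x, y) / var(x)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


def two_stage_least_squares(z, x, y):
    """Just-identified 2SLS: one instrument z, one endogenous regressor x."""
    # First stage: fitted values of x from a regression on the instrument.
    pi = slope(z, x)
    intercept = mean(x) - pi * mean(z)
    x_hat = [intercept + pi * zi for zi in z]
    # Second stage: regress the outcome on the first-stage fitted values.
    return slope(x_hat, y)


# Made-up data. z plays the role of rest-of-world stringency, u is an
# unobserved confounder (e.g. the severity of the local outbreak).
z = [-1.0, -1.0, 1.0, 1.0]
u = [-1.0, 1.0, -1.0, 1.0]
true_effect = -0.5  # assumed causal effect of own stringency on the outcome

x = [zi + ui for zi, ui in zip(z, u)]                    # own stringency
y = [true_effect * xi + 2.0 * ui for xi, ui in zip(x, u)]  # outcome

ols_estimate = slope(x, y)                      # contaminated by u
iv_estimate = two_stage_least_squares(z, x, y)  # uses only variation from z
```

In this toy example the OLS slope is 0.5, which has the wrong sign because worse outbreaks raise both stringency and the outcome, whereas the two-stage estimate recovers the assumed effect of -0.5. Standard errors taken from a manually run second stage are not valid, so a dedicated IV routine would be needed for inference.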
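A decrease in mobility and activity, due to the stricter restrictions, may not necessarily decrease the growth of deaths due to the virus. For example, if those infected transmit the virus to others in and around their living quarters, infections and deaths may not decrease even if mobility does. To better understand the effectiveness of decreasing mobility and activity on the contagion, we estimate a three-stage recursive conditional mixed-process (CMP) model from Roodman (2011) [46]. The process is akin to a three-stage least squares methodology and, like the 2SLS estimator, assumes $\text{Stringency}_{w-s,t}$ to be exogenous. To understand the intuition behind the process, note that

$$\frac{\partial Y_{c,t} / \partial\,\text{Stringency}_{c,t}}{\partial\,\text{Mobility}_{c,t} / \partial\,\text{Stringency}_{c,t}} = \frac{\partial Y_{c,t}}{\partial\,\text{Mobility}_{c,t}}$$

That is, the ratio of the causal IV estimate of the impact of the stringency index on the growth rates to the impact of the stringency index on mobility is an estimate of how mobility affected the growth rates of cases or deaths in different countries. The system of equations is as follows:

$$\text{Stringency}_{c,t} = \alpha^{1} + \beta^{1}\,\text{Stringency}_{w-s,t} + \theta^{1}_{c} + \delta^{1}_{t-i} + \varepsilon^{1}_{c,t} \qquad (3)$$

$$\text{Mobility}_{c,t} = \alpha^{2} + \beta^{2}\,\text{Stringency}_{c,t} + \theta^{2}_{c} + \delta^{2}_{t-i} + \varepsilon^{2}_{c,t} \qquad (4)$$

$$Y_{c,t} = \alpha^{3} + \beta^{3}\,\text{Mobility}_{c,t} + \theta^{3}_{c} + \delta^{3}_{t-i} + \varepsilon^{3}_{c,t} \qquad (5)$$

This allows us to compare how changes in mobility, due to stringent policy measures, affected the growth rates of the cases to tests or deaths to tests ratios in different countries. We use mobility at public transport transit stations for this analysis.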
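The ratio intuition can be checked numerically with a small Python sketch. This is a simple Wald-ratio illustration on made-up data and not the CMP estimator itself: it has no fixed effects and no joint estimation of correlated errors. The coefficients (-2 for stringency on mobility, 0.3 for mobility on growth) and all variable names are assumptions chosen for the example.

```python
def centered_cross_product(a, b):
    """Sum of products of deviations from the mean (proportional to the covariance)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b))


def iv_slope(z, x, y):
    """Just-identified IV estimate of the effect of x on y using instrument z."""
    return centered_cross_product(z, y) / centered_cross_product(z, x)


# Made-up data: z is the instrument, u an unobserved confounder that is
# uncorrelated with z by construction.
z = [-1.0, -1.0, 1.0, 1.0]
u = [-1.0, 1.0, -1.0, 1.0]

stringency = [zi + ui for zi, ui in zip(z, u)]
mobility = [-2.0 * s + ui for s, ui in zip(stringency, u)]  # assumed effect: -2
growth = [0.3 * m + ui for m, ui in zip(mobility, u)]       # assumed effect: 0.3

effect_on_mobility = iv_slope(z, stringency, mobility)
effect_on_growth = iv_slope(z, stringency, growth)

# Ratio of the two IV estimates: implied effect of mobility on growth.
implied_mobility_effect = effect_on_growth / effect_on_mobility
```

Here the IV effect of stringency on mobility is -2 and on growth is -0.6, so their ratio returns the assumed mobility effect of 0.3 even though the confounder enters every equation.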

Results
The mobility and activity restrictions enacted by countries around the world aimed at containing the contagion by limiting human-to-human contact. However, it is not obvious whether these restrictions actually limited mobility and activity; it depended on people's will and ability to observe these restrictions and their government's ability to enforce them. For example, multiple factors including, but not limited to, the level of education, trust in the government, and ability to maintain basic consumption expenditure without working, affect the extent to which citizens of a country might observe the restrictions. In Table 2, we begin by examining the impact of these restrictions on mobility. The dependent variables in columns (1) to (6) are the percentage changes in mobility in areas of the country as compared to the median value for the corresponding day of the week, during the 5 weeks of January 3, 2020, to February 6, 2020. The first five panels of the table present the association between these dependent variables and the stringency of the restrictions in the country at distinct points in time.
The estimated coefficient for Stringency Index (Lag 0) reports the association between mobility and contemporaneous restrictions. Similarly, coefficients for Stringency Index (Lag 7), Stringency Index (Lag 14), Stringency Index (Lag 21), and Stringency Index (Lag 28) report the association of the mobility measures with the stringency of the restrictions seven, fourteen, twenty-one, and twenty-eight days ago, respectively. All specifications include country fixed effects and fixed effects for the number of days since the first case, and we cluster the standard errors at the country level.
Two observations stand out. First, the restrictions had the intended impact: countries with stricter restrictions observed a larger reduction in mobility in public areas and an increase in time spent in residential areas. This is consistent with the associations between restrictions and mobility that [12,14,16,17] report. Second, as expected, contemporaneous restrictions matter more than past restrictions. The magnitude of the association with the mobility measures falls as the lag of the stringency index increases.
The next five panels of the table present the results from the instrumental variable (IV) approach. As we discuss in Section 2, we use the level of stringency of the restrictions in countries in the rest of the world to predict the level of stringency in a country. The rationale, once again, is that countries, while deciding on the level of stringency, responded not only to the disease situation in the home country but also to how it was expected to evolve. To predict how the situation would have evolved and what the optimal level of stringency might have been, every country looked at the rest of the countries in the world. Therefore, while the level of stringency in the rest of the world affected the stringency of the restrictions in a country, it did not affect the mobility and the disease situation in the country directly. That is, the exclusion restriction is likely to be satisfied. We use several definitions of the instrumental variable, all of which yield similar results. We present the results from using alternative instruments in S5 and S6 Tables in S1 Appendix. In what follows, we present results from our preferred IV specification, where we use the stringency in the countries outside the sub-region to which the country belongs. Excluding countries from the same sub-region minimizes the chances of the stringency in other countries affecting the mobility or spread of the disease in the country through pathways other than the country's own restriction stringency. The first-stage F-statistics are provided under each estimation and are well above the conventionally accepted threshold of 10 (for the case of a single endogenous regressor; see [47]), indicating that the instrument is relevant and not weak. Compared to the association results in the first five panels, the IV causal estimates are slightly larger in magnitude. But the two broad observations remain unchanged: countries with stricter restrictions observed a larger reduction in mobility, and contemporaneous restrictions matter more than past restrictions.
Table 2. Impact of stricter restrictions on mobility.
Table 3. Impact of stricter restrictions on 7-day moving average growth rates.

Next, in Table 3, we examine the impact of the level of stringency of the restrictions on the 7-day moving average growth rates of the numbers of tests conducted, confirmed cases, cases to tests ratio, deaths attributed to COVID-19, and deaths to tests ratio across time in different countries. The first five panels present the associations for comparison, but the discussion from here on focuses on the IV results. First, in contrast to Table 2, where the contemporaneous restrictions had the largest impact on mobility, the stringency of the measures seven and fourteen days earlier has a larger impact on the growth rate of confirmed cases and deaths attributed to COVID-19. Given the current scientific understanding that the virus has an incubation and infection period of up to fourteen days, this is expected. Second, even if we focus only on the effect of stringency seven or fourteen days prior, there appears to be a much smaller effect on the number of tests. There is no reason why the number of tests conducted would have been largely affected by a decrease in mobility, given that the testing infrastructure of a country is controlled for by the country fixed effects. It is possible that, with reduced mobility, events such as accidents that require medical attention decrease, reducing the pressure on the health infrastructure, which could then be devoted to COVID-19 testing. However, that would have led to an increase in testing, which is not what we observe. By contrast, the more stringent the measures, the lower the growth in the number of confirmed cases and deaths attributed to COVID-19. The impact on cases and deaths suggests that stricter restrictions achieved their goal of containing the contagion. The impacts on the growth rates of the two ratio variables are, as expected, smaller in magnitude but follow the same trends.
Since we believe them to be better indicators, we report results with the growth rates of cases to tests and deaths to tests from here on. As discussed in Section 2.2, in S7 Table in S1 Appendix we show that the results are robust to restricting the sample according to the testing approaches followed by different countries. The coefficients, when the sample is limited to countries and dates with full contact tracing as a consistent testing strategy, are similar to those of the full sample and follow the same trend.
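To make the outcome variables in Table 3 more concrete, the Python sketch below smooths a daily series with a trailing 7-day moving average and then takes a day-over-day growth rate. The exact growth-rate formula belongs to Section 2.2 and is not restated here, so the log-difference definition used below is an assumption, as are the function names and the made-up series.

```python
import math


def moving_average(series, window=7):
    """Trailing moving average; entries without a full window are None."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[i + 1 - window : i + 1]) / window)
    return out


def growth_rate(series):
    """Day-over-day log growth (an assumed definition); None where undefined."""
    out = [None]
    for prev, cur in zip(series, series[1:]):
        if prev is None or cur is None or prev <= 0 or cur <= 0:
            out.append(None)
        else:
            out.append(math.log(cur / prev))
    return out


# Made-up daily confirmed cases with a one-day spike on the eighth day.
daily_cases = [10, 10, 10, 10, 10, 10, 10, 17, 10, 10]
smoothed = moving_average(daily_cases)
growth = growth_rate(smoothed)
```

The spike of 17 cases raises the smoothed series from 10 to 11 and keeps it there while the spike stays inside the window. This shows how averaging dampens day-to-day reporting noise before growth is computed.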

But were restrictions equally effective across developing and developed countries, and adequate to contain the contagion? Countries with differing institutional capacities are likely to respond differently in not only adopting stringency measures [48] but also in their subsequent role in curbing mobility and in containing the contagion. Heterogeneity analysis by demography, the status of the health infrastructure, and governance indicators will help us understand the mechanisms and the role of other institutional and cultural factors. To find out, we split the sample of countries at the median for a range of characteristics and repeat the analysis. Another approach would be to include all these different dimensions of heterogeneity in one regression. However, setting aside the multicollinearity concerns that would arise from such an approach, we are not interested in individual heterogeneity coefficients along these dimensions but rather what they suggest collectively.
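The median split can be sketched in a few lines of Python. The characteristic, its values, and the handling of ties (countries exactly at the median are placed in the "below" group) are assumptions made for illustration and are not taken from the study.

```python
from statistics import median


def split_at_median(characteristic):
    """Split countries into (below, above) the sample median of a characteristic.

    Countries exactly at the median go to the 'below' group, which is an
    arbitrary choice made for this sketch.
    """
    cutoff = median(characteristic.values())
    below = sorted(c for c, v in characteristic.items() if v <= cutoff)
    above = sorted(c for c, v in characteristic.items() if v > cutoff)
    return below, above


# Made-up GDP per capita values for four hypothetical countries.
gdp_per_capita = {"A": 1200.0, "B": 45000.0, "C": 8000.0, "D": 30000.0}
below_median, above_median = split_at_median(gdp_per_capita)
```

The same IV specification would then be estimated separately on each group, once per characteristic, and the two sets of coefficients compared.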
We present the heterogeneity in the impact of stringency on mobility in Tables 4 and 5, along with the corresponding coefficient plot in Fig 2 presented below. The first and last three columns in each panel of the tables report the impact of imposing stricter restrictions on mobility in countries below and above the median along the different dimensions. Comparing column (1) with column (4), column (2) with column (5), and column (3) with column (6), stricter restrictions had a larger marginal effect in limiting mobility in densely populated, less educated, poorer, more unequal, more polluted countries with younger but unhealthier populations and worse health infrastructure. From their description, and as affirmed by the segregated summary statistics presented in S3 Table in S1 Appendix, these characteristics describe the relatively less-developed countries in the sample. The restrictions also worked better in more democratic countries, with better government effectiveness and lower perceived levels of corruption. Finally, it should be noted from Fig 2 that not all pairs of point estimates are statistically significantly different from each other; rather, the purpose of the exercise is to point towards general trends in the coefficients across the different mobility measures.
However, this stronger effect of stringency on mobility does not imply that the relatively less-developed countries contained the contagion better. First, it is important to note that upon announcement of the lockdowns, many less-developed countries witnessed large migration of urban migrant workers back to their homes in rural areas before the lockdown came into effect, which is relevant for the timeframe explored in this study (see, for example, [49] for the case of India). With limited mobility (or mobility not captured in the Google data) in their rural homes, this can contribute to the stronger stringency effect on mobility for less-developed countries, while not translating to better contagion containment. Second, it may be that people in more developed countries were already socially distancing even in the absence of these restrictions [50]. From S3 Table in S1 Appendix we know that compliance with mobility restrictions was potentially lower in less-developed countries. This is supported by several other studies. For example, Ali et al. (2020) [51] report that compliance with lockdown measures for most people in Bangladesh was conditional on proper relief distribution by the government (the lack of which, due to weak institutional capacity, led to the effective end of lockdown). Choudhury et al. (2020) [52] also show that food security policies played a crucial role in ensuring lockdown compliance in India. Thus, weaker self-imposed distancing among citizens of less-developed countries may explain the larger marginal effects of stringency on mobility in these countries. Think of this as being similar to an early point on a diminishing marginal returns curve, which yields a larger marginal effect but not an overall greater reduction in mobility.

Table 4. Heterogenous impact of stricter restrictions on mobility 1.
Similarly, it is also possible that countries with healthier populations and adequate health infrastructure handled the infections better, even if the restrictions were not stringent or if the populations were lax about observing them (examples include Sweden, Norway, and Germany). We see this in Tables 6 and 7, and the corresponding coefficient plot in Fig 3 presented below. As opposed to the results in Tables 4 and 5, stricter measures contained the contagion better in richer, more educated, more equal, less-polluted countries with older but healthier populations and better health infrastructure. From the description, these appear to be the more developed countries in the sample. These results are partly in contrast with the association results from [14], which finds that the correlation between stricter pandemic policies and lower future mortality growth was more pronounced in countries with a greater proportion of the elderly population, higher density, a greater proportion of employees in vulnerable occupations, greater democratic freedom, more international travel, and further distance from the equator. The differences in our findings highlight the need to distinguish causal effects of these restrictions from associations. Not surprisingly, the restrictions also worked better in more democratic countries, with better government effectiveness and lower perceived levels of corruption. The results from Tables 4-7 taken together suggest that even though stricter restrictions worked better at limiting mobility in relatively less-developed countries, this did not translate into better control of the contagion. Once again, restricting the sample to contain only countries and days with full contact tracing as a consistent testing strategy does not change the results. The results are available upon request.

As explained in Section 2, a decrease in mobility and activity, due to the stricter restrictions, does not necessarily reduce the growth of deaths due to the virus. To understand the effectiveness of decreasing mobility and activity on the contagion, we next report our results from the three-stage recursive conditional mixed-process (CMP) model described in Eqs (3) to (5). We use mobility at public transport transit stations for the analysis. Using alternative measures of mobility, except for mobility around parks, produces similar results. The results are presented in Table 8, with the corresponding coefficient plot in Fig 4 presented below. From the table, comparing the coefficients in column (1) with column (3), column (2) with column (4), column (5) with column (7), and column (6) with column (8), it is clear that the decrease in mobility had a larger effect in more developed countries that are more equal, have less poverty, are more educated, and are less polluted, with better health infrastructure and governance. For example, a unit decrease in mobility in countries with more than the median score for the Corruption Perception Index (cleaner countries) causes a 0.0014 unit decrease in the growth of the confirmed cases to tests ratio. The corresponding figure for countries with less than the median score for the Corruption Perception Index is insignificant and close to zero. With relatively few exceptions, the results suggest that developed countries benefited more than developing countries from a reduction in mobility in containing the growth rate of the cases to tests ratio. This result is consistent with Barnett-Howell and Mobarak (2020) [53], who also report much lower estimated benefits of social distancing and social suppression in low-income countries.
However, as Fig 4 visually shows, we do not find such stark differences across the heterogeneity dimensions in the case of the deaths to tests ratio, where education, democracy, and government effectiveness seem to play a dominant role. The results remain consistent when restricting the sample to only countries and dates with full contact tracing as a consistent testing strategy, and are reported in S8 Table in S1 Appendix. The heterogeneity results shed some light on the possible reasons. Given that the population, on average, in relatively less-developed countries is more immunocompromised, fewer people might have been able to fight off the infections. There is a growing amount of scientific evidence that points towards people with better immune systems being able to fight SARS-CoV-2 infection better (see, for example, [54]). People who can fight off the viral infection are possibly diagnosed less often, owing to a shorter incubation period. Stringency measures are unable to counter immunodeficiency. This is further aggravated by the fact that stringent mobility measures lower the spread of the disease at the cost of people's economic opportunities. With higher poverty rates in developing countries, poor people may place greater value on their livelihoods relative to the risk of contracting the infection. The reduction in economic activity due to the restrictions could directly affect the daily consumption of poorer people, further compromising their immune systems. Similarly, hand-wash products are not at the top of the shopping list of poor people, especially during times of economic hardship. The lack of access to adequate hand-washing facilities might also hinder their ability to combat the virus, even in the presence of greater stringency.

[Fig 3. Coefficient plot for Tables 6 and 7 with 90% confidence intervals. https://doi.org/10.1371/journal.pone.0253348.g003]
The idea of imposing mobility restrictions is to flatten the curve and thereby lower the disease burden on the health infrastructure. However, most less-developed countries have a limited number of hospital beds and ventilators. If these are already overwhelmed and inaccessible, flattening the curve is only marginally useful compared to countries with better and accessible health infrastructure, and the effect of stringency measures would be, accordingly, much lower. Furthermore, the higher population density in less-developed countries could mean a higher rate of human-to-human contact and transmission even with lower mobility than in richer countries. Finally, another reason could be that the poor in the less-developed countries lack knowledge of the best practices to follow when a person who has tested positive is isolated either at home or at the hospital. Poorer government effectiveness and more corruption also mean sluggish enforcement of recommended best practices. Whatever the reason(s), one clear inference from the final set of results is that mobility measures alone were not and will not be sufficient to contain the contagion in developing countries. What is worse is that, on top of the relatively weaker performance of a decrease in mobility in controlling the spread, the economic cost of these restrictions is also higher in these countries. With weaker institutions, weaker social security support, and reliance on daily wages for consumption, restrictions on economic activity mean that poorer countries face a catch-22 much worse than that of richer countries. Finding a solution could be difficult without external support to implement complementary measures.

Table 8. Heterogenous impact of transit station mobility on growth rates: Recursive mixed-process model.

Conclusion
Some have claimed that governments across the world have responded slowly and insufficiently to the COVID-19 pandemic [55]. Others have highlighted the real threats of stricter restrictions [56]. It is, therefore, imperative to understand how effective the restrictions implemented by countries across the world have been. In contrast to earlier evaluations of these restrictions, which document a strong negative association between the stringency of the restrictions and the spread of the disease, we use an instrumental variable approach to estimate the causal effect of the restrictions.
We find that while the restrictions implemented affected mobility and the spread of the disease, there was considerable heterogeneity across countries. While stricter measures reduce mobility more in less-developed countries, they do not contain the contagion as effectively as they do in developed countries. Thus, it would seem less-developed countries with weaker institutions have less to gain from stricter mobility restrictions. This could result from the lower levels of awareness, poorer health conditions and practices, and worse economic conditions in these countries. The results highlight the need to complement restriction policies with awareness, economic, and health assistance schemes.
It is, however, unclear what these complementary policies could be. From direct monetary help to only partial shutdowns, there is a range of policies to choose from. Future research should investigate the effectiveness of these alternative complementary policies in increasing the effectiveness of the mobility and activity restrictions in developing nations.
Supporting information
S1 Appendix. Appendix to "Pandemic catch-22: The role of mobility restrictions and institutional inequalities in halting the spread of COVID-19". (PDF)