Explaining the effective reproduction number of COVID-19 through mobility and enterprise statistics: Evidence from the first wave in Japan

This study uses mobility statistics combined with business census data for the eight Japanese prefectures with the highest coronavirus disease-2019 (COVID-19) infection rates to study the effect of mobility reductions on the effective reproduction number (i.e., the average number of secondary cases caused by one infected person). Mobility statistics are a relatively new data source created by compiling smartphone location data; they can be effectively used for understanding pandemics if integrated with epidemiological findings and other economic data sets. Based on data for the first wave of infections in Japan, we found that reductions targeting the hospitality industry were slightly more effective than restrictions on general business activities. Specifically, we found that to hold back the pandemic (that is, to reduce the effective reproduction number to one or less for all days), a 20%–35% reduction in weekly mobility is required, depending on the region. A lesser goal, an effective reproduction number of one or less on 80% of days, can be achieved with a 6%–30% reduction in weekly mobility. These results ignore other potential causes of spread; for a fuller picture, more careful observations, expanded data sets, and advanced statistical modeling are needed.


Introduction
Many countries have suffered from the COVID-19 pandemic and have experienced severe economic impacts due to the restrictions on socio-economic activities. GDP losses have been significant (e.g., -7.9% in Japan during the second quarter of 2020 [1]), and unemployment numbers are increasing. Several countries have managed to restart socio-economic activities to near pre-pandemic levels, but most have suffered a second wave of the pandemic.
The conditions of the first, second, and higher-round waves can differ because individual or organizational countermeasures (e.g., masks, hand washing, antiseptic solutions, and partitioning) have advanced. However, analyzing the infection risks and the degree of lockdown/voluntary restriction of socio-economic activities in the first wave, the only currently available data, is meaningful for creating better activity restriction policies. Traffic flows or mobility habits are a representative measure of lockdown restriction levels. Therefore, many previous studies have focused on the relationship between infection levels and traffic flows to demonstrate the effects of lockdowns/voluntary restrictions. For example, focusing on traffic data obtained from 1200 automatic traffic sensors in Italy, Cartenì et al. [2] constructed a multiple regression model to explain the number of daily new positive cases of COVID-19. In their regression model, the explanatory variables for particulate matter pollutants, number of tests per day, travel time decay from the outbreak, and temperature are statistically significant, but the mobility habits 21 days before onset are shown to be the most influential. Similarly, rail-based transport accessibility [3] and traffic volumes on expressways [4] can also explain the number of infections, and a real-time car-parking data set has been used to identify the spatiotemporal exposure risk at a local spatial scale (identifying the crowds around a parking lot) [5].
Person-based mobility statistics, which have recently become available through smartphone devices, are a powerful tool for understanding regional overviews of socio-economic activities. For example, Google Mobility Report [6] provides population statistics for retail and recreation, grocery and pharmacy, parks, transit stations, workplaces, and residential areas all over the world. Engle et al. [7] used GPS location data for 94,116 observations in 3,142 U.S. counties from 2/24/2020 to 3/25/2020 and found that a rise in the local infection rate from 0% to 0.003% is associated with a 2.31% reduction in mobility. These smart-device-based mobility statistics have also been used to understand the effect of control measures in China [8] and to estimate the number of COVID-19 infections in New York [9].
As for the Japanese case, Yabe et al. [10] used data from 200,000 anonymized mobile phone users in Tokyo and concluded that by April 15 (one week into the state of emergency), human mobility had decreased by approximately 50%. Similarly, Arimura et al. [11] analyzed the city of Sapporo to understand the effect of the emergency declarations from the local and national governments.
In this way, many researchers have focused on mobility data sets in order to better understand the number of infections and to derive effective countermeasures. However, there is still a lack of linkages among epidemiological findings and mobility and economic data. Our approach also utilizes mobility data (hourly populations on a 500 m grid across Japan), considering its powerful ability to capture the actual rate at which people stayed at home during the pandemic crisis, but at the same time uses epidemiological findings (e.g., the incubation period) and economic data sets (e.g., employees in the hospitality sector) as well. Improvements remain to be made regarding models, integration of other data sources, and updated information. Because the pandemic is an ongoing crisis, our goal here is to quickly begin trying combinations of currently available data sets and simple statistical methods and to share the resulting information.
For this purpose, this study performs two major exploratory data analyses using such data sets. First, we investigate whether our mobility measures are correlated with infection risk. To measure people's potential contacts especially in a specific area where businesses are located, a simple measure is used combining recent business census data at a 500 m grid scale with mobility statistics. Then, this measure is compared with the effective reproduction number (how many people are infected by one infected person, denoted by R(T)) in each region. The second focus of data analysis is the degree to which we need to restrict our (daily) travel to reduce the pandemic crisis.
Compared to previous research with similar aims [2,3,8,9], the data set is constructed at a very detailed spatial scale, and substantial effort is put into data processing to provide evidence that reinforces/complements the results in the literature. The effective reproduction number estimated in this study is also helpful for interpreting the results because it is a direct indicator of an increasing or decreasing trend in infections. Understanding the relationship between mobility levels and the effective reproduction number can help policymakers make informed policy choices to reduce local infections.

Data set and approach
In this study, three different types of statistics are utilized: the number of infected people [12], mobility statistics [13], and business census data from Japan's Ministry of Internal Affairs and Communications (MIC) [14]. The number of infected people is recorded on the date that the infections are confirmed by Japan's Ministry of Health, Labor and Welfare (MHLW). Here, we focused on the eight prefectures where the number of infections exceeded 500 by May 31, 2020. We then prepared the associated data sets for these eight prefectures. Fig 1 illustrates the time series for the number of infected people in the eight target prefectures. Tokyo had the most infected people, followed by Osaka, the second largest city. A sharp rise in infections can be seen from the end of March. It is assumed that people relaxed their voluntary restrictions during the holidays before this large wave arrived. April 7 is the day the emergency statement was issued by the Japanese Government, after which the first wave of the pandemic gradually abated.
The effective reproduction number (or instantaneous effective reproduction number, given its time-dependent nature) can be calculated from these data and the serial interval distribution (the time between successive infections from one person to another). The infection process is regarded as a stochastic phenomenon because of differences in infectivity among people and observation errors, so a probability distribution needs to be assumed. To estimate the effective reproduction number, we employed the model provided by Cori et al. [15], which assumes a gamma distribution for the reproduction number, because the researchers validated the model by checking the consistency of the estimates for five historical outbreaks. We adopted the serial interval distribution of Nishiura et al. [16], with a mean of 4.8 days and a standard deviation of 2.3 days. These values are relatively consistent with the report by Kramer et al. [8], which estimated a mean of 4.8 days and a standard deviation of 3.3 days.
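The estimation step described above can be sketched as follows. This is a minimal illustration in the spirit of the Cori et al. approach, not the authors' implementation. It assumes that the serial interval is discretized by evaluating a gamma density (mean 4.8, SD 2.3) at whole days and renormalizing, and that the prior for R is a gamma distribution with shape 1 and scale 5. The function names, the 20-day truncation, and the prior parameters are all illustrative choices.

```python
import math

def discretized_serial_interval(mean=4.8, sd=2.3, max_days=20):
    """Approximate daily weights w_1..w_max_days from a gamma density.

    Assumption: the density is evaluated at integer days and renormalized,
    which is a simplification of a proper discretization.
    """
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean

    def pdf(x):
        log_p = ((shape - 1) * math.log(x) - x / scale
                 - math.lgamma(shape) - shape * math.log(scale))
        return math.exp(log_p)

    weights = [pdf(k) for k in range(1, max_days + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def estimate_r(incidence, weights, window=7, prior_shape=1.0, prior_scale=5.0):
    """Posterior mean of R(T) using cases from day T-window+1 to day T."""
    # Total infectiousness on day s: sum over k of I(s-k) * w_k.
    infectiousness = []
    for s in range(len(incidence)):
        lam = sum(incidence[s - k] * weights[k - 1]
                  for k in range(1, min(s, len(weights)) + 1))
        infectiousness.append(lam)

    estimates = {}
    for t in range(window, len(incidence)):
        days = range(t - window + 1, t + 1)
        cases = sum(incidence[s] for s in days)
        pressure = sum(infectiousness[s] for s in days)
        # Gamma posterior: shape = a + cases, rate = 1/b + pressure.
        estimates[t] = (prior_shape + cases) / (1.0 / prior_scale + pressure)
    return estimates

weights = discretized_serial_interval()
r_by_day = estimate_r([10] * 30, weights)
```

With constant daily incidence, the estimate settles close to 1, as expected for an epidemic that is neither growing nor shrinking.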
The mobility statistics are based on data from NTT Docomo, one of the largest carriers in Japan, which holds 37.4% of all mobile phone contracts in the country [17]. Populations of people aged 15 to 79 are counted with the following rule for each target "hourly duration" (i.e., one day is divided into 24 time slots, such as 15:00-16:00): if a person stays within the grid for 15 minutes during a target hour, then 1/4 is added to the population. The rule is applied for every 15 minutes of stay, so that, based on the duration of each person's stay in a grid, either 1/4, 1/2, 3/4, or 1 is added to the population. The mobility data are not raw data, but are magnified based on the share of the carrier, NTT Docomo, in each region. People under age 15 or aged 80 and over, as well as people who do not have a smartphone, are not counted. Therefore, the estimated population tends to be lower than the actual population, but the majority of the population is explained by this data set [18]. In Fig 2b, 12,128,864 people were counted in April. Reductions are seen in inflows from other prefectures and abroad as well as in inflows from residential areas to the center of the prefecture.
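The quarter-hour counting rule can be illustrated with a short sketch. It assumes that the magnification step is a simple division by the carrier's regional share; the actual expansion procedure is not described in the text. The function name and inputs are hypothetical.

```python
def hourly_population(stay_minutes, carrier_share):
    """Estimate the population of one grid cell for one hourly slot.

    stay_minutes: minutes each observed person spent in the cell during the hour.
    carrier_share: carrier's share of contracts in the region (assumed scaling).
    """
    observed = 0.0
    for minutes in stay_minutes:
        # Each full 15 minutes of stay adds 1/4, up to 1 for the whole hour.
        quarters = min(int(minutes // 15), 4)
        observed += quarters / 4
    # Assumption: magnify the observed count by the inverse of the share.
    return observed / carrier_share

# Five observed people staying 15, 30, 60, 10, and 45 minutes.
estimate = hourly_population([15, 30, 60, 10, 45], carrier_share=0.5)
```

A stay shorter than 15 minutes contributes nothing under this rule, which is one reason the counted population can fall below the actual one.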
The last statistic introduced is the 2016 economic census for businesses conducted by the MIC [14], which is the most recent data set available. The statistics include the number of employees in over 100 business sectors and are aggregated on a 500 m grid. By using the mobility statistics and business census, we employ the following criterion as a measure of potential contacts in the business and commercial districts. This criterion should have a strong (negative) correlation with the level of stay-home activity. The measure of (daily) potential contacts (PC) in the business and commercial districts is defined as the sum of the weighted average of the hourly population as follows:
$$PC(T) = \sum_{t \in H_T} \frac{\sum_{s \in S} W(s)\, N(s,t)}{\sum_{s \in S} W(s)} \qquad (1)$$
where W(s) is the weight at grid s, N(s, t) is the mobility statistic at grid s and time t, $H_T$ is the set of time slots in T, and S is the set of grids in a target area. In the analysis, we investigate two cases: W(s) = TEmp(s), the total employees in all business sectors, and W(s) = HEmp(s), the total employees in all hospitality sectors (wholesale and retail, hotel and restaurant, living-related and personal services, amusement, education, and medical and healthcare sectors). In either case, if people are crowded in business and commercial areas, where the number of employees is large, the value of the measure increases. This measure cannot capture the actual contacts (e.g., distance, meeting duration, mask use, and other countermeasures), but it is expected to explain the potential contacts, considering that the literature has shown that a decrease in trips leads to a statistical decrease in infections.
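The verbal definition above (the sum over hourly slots of the employee-weighted average population) can be sketched as follows. The function name and the list-based data layout are illustrative assumptions, not the authors' code.

```python
def potential_contacts(weights, hourly_population):
    """Daily potential contacts for one area.

    weights: W(s) for each grid cell s (e.g., hospitality employees).
    hourly_population: one list per hourly slot t, holding N(s, t) per cell.
    """
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one grid cell needs a positive weight")
    pc = 0.0
    for slot in hourly_population:
        # Weighted average population across cells for this hour.
        weighted = sum(w * n for w, n in zip(weights, slot))
        pc += weighted / total_weight
    return pc

# Two cells with 1 and 3 employees, observed over two hourly slots.
pc_value = potential_contacts([1, 3], [[10, 20], [30, 40]])
```

Cells with more employees dominate the average, so crowding in business districts raises the measure more than the same crowding in residential cells.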
The daily effective reproduction number R(T) is estimated over a weekly sliding window before T (i.e., R(T) represents the number from day T-6 to T). PC(T) is also estimated as a weekly average (from T-6 to T), but a 5-day lag is used when comparing it with R(T). That is, the incubation time is set to approximately 5 days based on previous studies, in which the median incubation time was reported to be 5.1 days in China [19] and 5.8 days in Japan [20]. For simplicity, we denote the weekly average of PC(T) as $\overline{PC}(T)$. The periods for estimating R(T) and $\overline{PC}(T)$ can be changed, but a good correlation is seen between R(T) and $\overline{PC}(T-5)$ in the later analysis.
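The weekly averaging and the 5-day lag can be sketched as below. The sketch assumes that days are indexed by consecutive integers and that daily PC values are stored in a list; both function names are hypothetical.

```python
def weekly_average(daily, day):
    """Mean of daily[day-6 .. day]; None if the window is incomplete."""
    if day < 6 or day >= len(daily):
        return None
    return sum(daily[day - 6: day + 1]) / 7

def lagged_pairs(r_by_day, daily_pc, lag=5):
    """Pair R(T) with the weekly average of PC ending on day T - lag."""
    pairs = []
    for day in sorted(r_by_day):
        average = weekly_average(daily_pc, day - lag)
        if average is not None:
            pairs.append((average, r_by_day[day]))
    return pairs

daily_pc = list(range(20))  # toy daily PC values for days 0..19
pairs = lagged_pairs({5: 2.0, 11: 1.2, 12: 0.9}, daily_pc)
```

Days whose lagged window would reach before the start of the PC series are dropped rather than padded.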
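Based on the estimates of $\overline{PC}(T)$ and R(T), the mobility restriction levels required to reduce the pandemic can be calculated. That is, the threshold value of $\overline{PC}(T-5)$ that can achieve $R(T) \le 1$ is regarded as a minimum activity level. (If $\overline{PC}(T-5)$ is smaller than the threshold, all the observed R(T) are smaller than or equal to 1.) The threshold $PC'(T)$, relative to a reference level $\widehat{PC}(T)$, is then expressed as
$$\frac{PC'(T)}{\widehat{PC}(T)} = \frac{\sum_{t \in H_T} \sum_{s \in S} W(s)\, N'(s,t)}{\sum_{t \in H_T} \sum_{s \in S} W(s)\, \hat{N}(s,t)} \qquad (2)$$
where N'(s,t) is the required population at grid s at time t, which is one of the solutions to achieve the threshold value, and $\hat{N}(s,t)$ is the population underlying the reference level. From Eq (2), we can understand that one of the solutions to achieve the target $PC'(T)$ is to set the relative population $N'(s,t)/\hat{N}(s,t)$ equal to $PC'(T)/\widehat{PC}(T)$ for all $s \in S$ and $t \in H_T$.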
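The uniform-scaling solution mentioned above can be checked in one step. The check assumes that PC is the employee-weighted measure defined earlier, and it writes $\rho$ for the common ratio $PC'(T)/\widehat{PC}(T)$, a symbol introduced here only for illustration.

```latex
% Substitute N'(s,t) = \rho \hat{N}(s,t) for every grid s and time slot t.
\begin{align*}
\sum_{t \in H_T} \frac{\sum_{s \in S} W(s)\, \rho\, \hat{N}(s,t)}{\sum_{s \in S} W(s)}
  &= \rho \sum_{t \in H_T} \frac{\sum_{s \in S} W(s)\, \hat{N}(s,t)}{\sum_{s \in S} W(s)} \\
  &= \rho\, \widehat{PC}(T) = PC'(T).
\end{align*}
```

Because the measure is linear in the population, any other allocation with the same weighted total also meets the target; uniform scaling is simply the easiest one to state.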

Results and discussion
We adopt the Pearson correlation coefficients to investigate the relationships between $\overline{PC}(T)$ and R(T). To use the Pearson correlation coefficients, both variables must satisfy normality or linearity conditions. Therefore, we applied the Kolmogorov-Smirnov test to $\overline{PC}(T)$ and R(T), respectively. This test examines the hypothesis that the empirical distribution (observations) and a theoretical distribution are statistically identical. A detailed explanation is given, for example, by Massey [21]. As a result, we set the target period to the days from April 1 to May 6 (from about one week before the emergency declaration in the severely affected region to the end of the long holiday), which is regarded as the duration of the first severe wave. Unfortunately, the $\overline{PC}(T)$ values for two prefectures (Chiba and Saitama) did not pass the normality test (the null hypothesis of normality is rejected at the 5% significance level), but the other data sets are regarded as normally distributed. The results for Chiba and Saitama are analyzed in the same way as those for the other prefectures in the later analysis, for reference purposes only. Fig 3 shows the Pearson correlation coefficients between R(T) and $\overline{PC}(T-5)$ for two cases of employment (total and hospitality sector). From this figure, the correlation coefficients are generally better when employees in the hospitality sector are selected as the weight of $\overline{PC}(T)$. This suggests that infections tend to occur in the hospitality sector, but more rigorous investigations are necessary, for example using a more detailed sector classification.
The number of infections was small during the first few weeks, and the R(T) values were not stable during this period. In the figure for Hokkaido, the target days after April 1 are highlighted for reference. In many prefectures, the potential contacts decreased considerably in April and the beginning of May, but gradually started to return to pre-pandemic conditions at the end of May.
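The correlation computed in this section can be sketched as follows, assuming the inputs are equal-length lists of lagged weekly PC values and the matching R(T) values. The function name and the sample numbers are illustrative only and are not taken from the study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariance / math.sqrt(var_x * var_y)

# Made-up lagged PC values and R values for a handful of days.
pc_lagged = [120.0, 110.0, 95.0, 90.0, 80.0]
r_values = [1.6, 1.3, 1.1, 0.9, 0.7]
coefficient = pearson(pc_lagged, r_values)
```

A coefficient near 1 would indicate that days with higher lagged potential contacts tend to have a higher R(T).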
The state of emergency declared by the central government ended on May 16 in Hokkaido and on May 31 in all the other prefectures, but people gradually restarted their activities, probably because the atmosphere of emergency was alleviated after many of the other prefectures ended emergency actions on May 6.
Based on the estimates of $\overline{PC}(T-5)$ and R(T) described in Fig 4a-4h, the mobility restriction levels required to reduce the pandemic can be calculated from Eq (2). That is, we identify the threshold value of $\overline{PC}(T-5)$ that achieves $R(T) \le 1$. Our study adopts the average $\overline{PC}(T)$ in February 2020 as the $\widehat{PC}(T)$ in Eq (2). Fig 5 shows the estimated population (mobility) restriction levels ($N'(s,t)/\hat{N}(s,t)$ from t-6 to t) needed to ensure $R(T) \le 1$ in each prefecture. Because potential contacts are the sum of the average population weighted by employees in the hospitality sector, $PC'(T)$ can be achieved more easily if the population is reduced more in places where the hospitality sector is agglomerated.
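One way to read the threshold rule is sketched below. It assumes that the threshold is the smallest observed lagged PC value at which R(T) exceeded 1, so that every observation strictly below it has R(T) of at most 1, and that the required reduction is measured against a baseline such as the February average. The function names and numbers are hypothetical.

```python
def strict_threshold(pairs):
    """Smallest lagged PC value observed together with R > 1.

    pairs: (lagged weekly PC, R) tuples. Returns None if R never exceeds 1.
    """
    exceeding = [pc for pc, r in pairs if r > 1.0]
    return min(exceeding) if exceeding else None

def required_reduction(threshold, baseline_pc):
    """Relative reduction from the baseline needed to reach the threshold."""
    return 1.0 - threshold / baseline_pc

# Made-up observations: (lagged weekly PC, R).
observations = [(100.0, 1.5), (80.0, 1.1), (70.0, 0.9), (60.0, 0.8)]
threshold = strict_threshold(observations)
reduction = required_reduction(threshold, baseline_pc=100.0)
```

In this toy case the threshold is 80, which corresponds to a reduction of about 20% from the baseline of 100.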
From the figure, Tokyo requires the largest reduction (35%), followed by Osaka. This result is natural because these prefectures are the largest in Japan, and the population in the hospitality sector tends to be large. As shown in Fig 4a-4h, the scale of potential contacts in these prefectures is more than twice the scale in the other prefectures. Hokkaido, Kanagawa, Hyogo, and Fukuoka also require high restriction levels on visits to the hospitality sector. Among these prefectures, Hyogo is less populated, and its index of $\overline{PC}(T)$ is low. Its population characteristics are generally reflected in the low value of R(T) in Hyogo, but a large restriction on visits to the hospitality sector is still required to guarantee $R(T) \le 1$. Another index, such as $R(T) \le 1$ with an 80% chance, may be appropriate to capture the relationships between $\overline{PC}(T)$ and an average low value of R(T). Saitama and Chiba may be classified into a third group, where the required restrictions are not so strict. However, these two prefectures do not pass the normality test, and the relationship between mobility and the number of infections in these prefectures is not yet clear.
Based on the discussion above, in order to better understand the case of Hyogo, another index, "$R(T) \le 1$ for 80% of days", is introduced to capture the relationships between $\overline{PC}(T-5)$ and R(T) in Fig 6. This provides a slightly different view from Fig 5. The required reduction level becomes much lower, especially in Kanagawa, Osaka, and Hyogo, where generally low R(T) values are observed. In these prefectures, a large number of infections were observed from an early stage, and the introduction of countermeasures (e.g., the number of people wearing masks and social distancing at local spots within a 500 m grid) might have been preventative.
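The 80% index can be sketched in the same style. The text does not give its exact definition, so the sketch adopts one possible reading: the threshold is the largest observed lagged PC level such that at least 80% of the observations at or below it have R(T) of at most 1. The function name and data are hypothetical.

```python
def reliability_threshold(pairs, reliability=0.8):
    """Largest PC level where the share of days with R <= 1 meets the target.

    pairs: (lagged weekly PC, R) tuples. Returns None if no level qualifies.
    """
    ordered = sorted(pairs)
    best = None
    days_ok = 0
    for index, (pc, r) in enumerate(ordered):
        if r <= 1.0:
            days_ok += 1
        # Evaluate only after all observations sharing this PC level are counted.
        last_of_level = index + 1 == len(ordered) or ordered[index + 1][0] != pc
        if last_of_level and days_ok / (index + 1) >= reliability:
            best = pc
    return best

observations = [(50, 0.7), (60, 0.8), (70, 1.2), (80, 0.9),
                (90, 0.95), (100, 1.4), (110, 1.6)]
relaxed = reliability_threshold(observations, reliability=0.8)
strict = reliability_threshold(observations, reliability=1.0)
```

In this toy data the relaxed threshold (90) is well above the strict one (60), mirroring how the required reduction falls once occasional days with R(T) above 1 are tolerated.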
On the other hand, Hokkaido, Tokyo, and Fukuoka still require a large reduction in the number of visits to the hospitality sector. Among these prefectures, Hokkaido and Fukuoka have a large agglomeration of hospitality sector businesses, especially restaurants and night spots for visitors, which possibly affected the infection rates.
The above result is based on a case in which an 80% reliability level is arbitrarily determined, but more discussion is needed to determine a reliability level that we can accept. A better approach would be to investigate the relationships between the mobility reduction levels at different reliability levels for $R(T) \le 1$ and to estimate the economic impacts of mobility reductions as well. However, this is beyond the scope of this study.
There are several studies that explain the change in total cases with a decrease in mobility. A quantitative comparison of these studies with our result is difficult because the data items differ. For example, [9] found that a 10% reduction in trips leads to a 0.27 log point drop in per capita COVID-19 prevalence. A 0.27 log point fall in COVID-19 represents five fewer cases per 1000 inhabitants, from a sample mean of 17 per 1000. Roughly speaking, a 10% reduction in trips reduces cases by 29.4% (= 5/17). We interpret this to mean that mobility reduction leads to an effective reduction in new incidences. Their approach is favorable because the number of essential workers, who cannot avoid visiting areas with large numbers of infections, is used as an instrumental variable. Considering that our results indicate the reduction level needs to be around 20.3%-35.4% to reduce new incidences, we cannot clearly say that a lower percentage of mobility reduction would have a large potential to reduce new infections. We agree, at least, that the effects of mobility reduction have a statistical relationship with the number of infections, but they vary depending on population density, especially in the hospitality sector.
Another necessary discussion point lies in the difference between the first wave treated in this study and the second or higher-round waves. Considering that the countermeasures have advanced and the temperature has changed, further analysis is needed to know how much restriction on mobility is needed to achieve $R(T) \le 1$. Continuous monitoring is necessary to understand when we will establish a new life with COVID-19.

Conclusions
This study utilized mobility statistics and a business frame census, analyzed on a fine spatial scale, to capture the effective reproduction number of COVID-19, which is an important indicator in epidemiology. The weighted average of population density is estimated as a measure of congestion by using employees in the hospitality sector/total business sector. The study examines the correlation between these measures and the effective reproduction number in eight Japanese prefectures, where the incidence is large. One of the major conclusions in this study is that the measure of potential contacts in the hospitality sector (weighted average of population in the hospitality sector) has a fair correlation with the effective reproduction number, but the difference between hospitality and total business sectors is slight.
From this measure, the necessary population reduction level to hold back the pandemic can be derived. Our analysis indicated that reductions of 0.20 (Hyogo) to 0.35 (Tokyo) are required to achieve $R(T) \le 1$ for all days, depending on the conditions of the prefectures, but reductions of 0.06 (Hyogo) to 0.30 (Tokyo) are enough to achieve $R(T) \le 1$ for 80% of days. Because of the regional variety in values and the high sensitivity to the required reliability to achieve $R(T) \le 1$, these relationships should be carefully checked in each prefecture to determine mobility restriction policies. An analysis of the relationships between mobility reduction and economic impacts would also assist in this kind of policy making.
However, there are many limitations in the current study. Our numerical results should be both updated with newer data sets and remodeled to include more explanatory variables with regard to natural and social conditions. Our analysis focused only on population density and business sector locations, whereas the attributes of the population are unknown. For example, age and job type largely affect the type of activity in the visited area. In addition to mobility and personal attributes, the number of incidences should be analyzed on a more detailed spatial scale. The current study uses the total number on a prefecture scale, which has a less significant relationship with mobility information in specific areas, especially when the number of incidences is small. Most of our infection data use the date of symptom onset, but some of the data lack this information. Delays in reporting infections also affect the results of our study, and appropriate corrections are required. Our statistical model also has much room for improvement. Approaches used in previous studies, such as the spatiotemporal regression model adopted by Jiang et al. [5], would enhance the quality of the analysis.
Moreover, for additional future studies on Japanese conditions, a comparative study between the first and second or higher round waves will be important to identify the progress of countermeasures and the effects of temperatures. A similar analysis in other countries would also help to understand what level of mobility restrictions and local countermeasures would contribute to a low infection risk.