On mobility trends analysis of COVID–19 dissemination in Mexico City

This work presents a tool for forecasting the spread of the new coronavirus in Mexico City, which is based on a mathematical model with a metapopulation structure that uses Bayesian statistics and is inspired by a data-driven approach. The daily mobility of people in Mexico City is mathematically represented by an origin-destination matrix using the open mobility data from Google and the Transportation Mexican Survey. This matrix is incorporated in a compartmental model. We calibrate the model against borough-level incidence data collected between 27 February 2020 and 27 October 2020, while using Bayesian inference to estimate critical epidemiological characteristics associated with the coronavirus spread. Given that working with metapopulation models leads to rather high computational time consumption, and parameter estimation of these models may lead to high memory RAM consumption, we do a clustering analysis that is based on mobility trends to work on these clusters of borough separately instead of taken all of the boroughs together at once. This clustering analysis can be implemented in smaller or larger scales in different parts of the world. In addition, this clustering analysis is divided into the phases that the government of Mexico City has set up to restrict individual movement in the city. We also calculate the reproductive number in Mexico City using the next generation operator method and the inferred model parameters obtaining that this threshold is in the interval (1.2713, 1.3054). Our analysis of mobility trends can be helpful when making public health decisions.


Introduction
The coronavirus disease 2019 (COVID-19) is caused by a novel coronavirus. The coronaviruses are a family of viruses that cause infection in humans and animals. The diseases that are by a coronavirus are zoonotic [1]. In particular, the coronaviruses that affect humans (HCoV) can produce clinical symptoms, such as the Severe Acute Respiratory Syndrome (SARS) viruses and Middle East Respiratory Syndrome (MERS-CoV) [2]. COVID-19 was first identified amid an outbreak of respiratory illness cases in Wuhan City, Hubei Province, China. This disease was initially reported to the WHO on 31 December 2019. On 11 March 2020, the a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 WHO declared COVID-19 to be a global pandemic [3]. From the beginning of the epidemic to 21 January 2021, more than 97,890,676 cases and 2,094,459 deaths have been reported globally.
The first case of COVID-19 in South America was registered in Brazil on 26 February 2020. The first death from this infection in this region was announced in Argentina on 7 March 2020. The virus then arrived in Mexico, where by 21 January 2021 there have been almost 1,688,944 confirmed cases and 144,371 deaths.
To date, many researchers around the world have focused on understanding the transmission dynamics of COVID-19 disease using mathematical and statistical models and methods, see for example [4][5][6][7][8][9][10][11][12]. In this work, we will focus on those models that incorporate information on human movement. The relationship between human mobility and the transmission of coronavirus disease in the United States has been studied in [13,14]. Metapopulation models are not only among the simplest spatial models but they are also the most applicable to modelling many human diseases [15]. The metapopulation concept is to subdivide the entire population into distinct sub-populations, each of which has independent dynamics together with limited interaction between the sub-populations. This approach has been used to great effect within the ecological literature [16] and it has recently been used to model the spread of COVID-19; see, for example, [17][18][19][20][21].
In this work, we calibrate the metapopulation model proposed by Li et al. in [21], similar to [22], using incidence data reported in [23]. Consequently, we first describe the mathematical model that we have used; then, we compute the number of trips that are produced and attracted in each borough of Mexico City using data about these trips in 2017 [24], which then we combine with the rates of reduction or increase in mobility during the pandemic reported by Google [25] and the government of Mexico City [26]. Later, by using Bayesian inference, we solve the associated inverse problem to predict the dynamics of the spread of cases, similar to the following references [27][28][29][30][31][32][33]. Our conclusions are presented in the last section.

Computation of the mobility matrices
To incorporate mobility in the transmission model, the produced and attracted trips in the boroughs of Mexico City are considered (see Fig 1 and Table 1). Mexico City is the capital of Mexico. It has around 9 million inhabitants and a floating population of over 22 million, who are composed of daily commuters and international visitors. Mexico City is among the top 10 most crowded cities in the world [34]. It also has a large number of corporate headquarters and a large transport network, which is composed of 20 different modes of transport.
Mobility between the zones in Table 1 is represented in a two-dimensional arrangement, which is known as the origin-destination matrix (O-D matrix) M = {M ij }, i, j = 1, . . ., 16, where M ij represents the number of trips from zone i to zone j. Origin-destination matrices are usually obtained every 10 years from surveys. In Mexico City, the last survey was carried out in 2017 [24]. The information available identifies, among other things: if the trip was made on a weekday or if it was made during the weekend, the transport mode used, the purpose and the time. It is important to notice that, due to the complexity of mobility in Mexico City, the O-D matrix does not have to be symmetric nor the sum of the row i has to be equal to the sum of the column i. This can be explained because the O-D matrix captures chains of trips that may begin one day and end within the next few days. For instance, trips of people who leave their home to go to zone i to work and then go to zone j to see a movie before returning home, or people that work in zone i for few consecutive days (see [35]). In this paper, we consider all of the trips and they are identified by the mode of transport that was used in the area of interest (see Table 2).
There are several methodologies to update the O-D matrix in the literature. Most of them combine known information with current data observed, such as the number of trips in some segments of the transit network [36]. There are also some approaches that project the trips to/ from each zone based on the projected economic growth in those areas [37]. Nevertheless, given the pandemic situation that we are experiencing today and that we have current available data about the increase or decrease in mobility for some modes of transport, transit stations and parking lots, we consider the 2017 O-D matrix as a reference matrix and we update it to a scenario in 2020 using the daily mobility reports provided by Google [25] and the government of Mexico City [26]. According to [24], Tables 3 to 6 represent the number of trips between  Table 3. Mean number of trips from zones 1-16 to zones 1-9 during a week day. 1  2  3  4  5  6  7  8  9   1  477560  21108  7564  120676  6475  25026  1557  0  24149   2  21111  886428  11010  48039  32230  228272  49305  17446  108741   3  7336  10462  301805  9770  5397  10867  3491  722  100180   4  120735  50796  9699  1677356  23427  52676  2602  590  27846   5  6486  32393  5358  22936  361938  181660  1563  145  19190   6  25244  221750  10308  56688  177845  2766471  9326  10395  64521   7  1685  50941  4148  2576  1389  8008  260789  441  83675   8  0  16022  722  590  145  10872  336  218175  3901   9  24180  108129  102892  28836  18012  61990  80171  5577  1034673   10  1730  50969  1994  7283  8243  134119  2179  22801  11070   11  11034  237367  10663  17958  14370  82757  53911  17203  95685   12  2148  100785  2637  6062  7586  46856  7030  31284  these zones; for instance, the mean number of trips whose origin is Coyoacán (id = 2) and destination is Iztapalapa (id = 6) during a week day is 228,272 (see Table 3) and the mean number of trips whose origin is Tláhuac (id = 10) and destination is Cuauhtémoc (id = 14) is 21,881 (see Table 6) during a day on the weekend. In Tables 3-6, the last row represents the total number of trips attracted by each zone and the last column in Tables 4 and 6 represents the total number of trips generated by each zone. This way we can see in Table 4 that during a week day 849,911 trips are attracted to zone 10 and 540,274 trips are generated from zone 7.

Id
To compute the new number of trips made using the subway, RTP/M1, trolleybus, light rail or suburban, we used the corresponding rates given by the government of Mexico City. To compute the number of trips made using collective/micro or buses, we used the rates of transit stations given by Google and for personal transportation we used the workplaces rate. To compute the number of trips made by bicycle, we used the rates for Ecobici and for metrobus/mexibus we used an average of both the rates of metrobus and mexibus in Mexico City. The other transport modes remain the same as in 2017. Fig 2 shows the variations in the mobility indices from 27 February to 31 November, for each mode of transport that was modified.
The population of each borough is also considered in our mathematical model. According to [38], these populations in 2020 are given in Table 7.
Note that both, the mobility rates from Google and the government of Mexico City and the reference O-D matrix (the one from 2017) are given per day. This way, the population N i could be considered as constant; but only per day. In this work, a phenomenon whose duration is of the order of months is studied, so that for the entire period of estimation N i is considered as a variable.
Furthermore, although in all boroughs the largest number of trips are carried out within the same borough, the distribution of trips to/from the other boroughs does not follow the same pattern in all cases. For example, the Coyoacán borough attracts the highest number of trips from the Tlalpan borough and the least amount of trips from Cuajimalpa; meanwhile, the borough of Iztapalapa attracts the highest number of trips from the Cuauhtémoc borough and the least amount of trips from La Magdalena Contreras. This phenomenon can be explained by the prevailing economic activity in each borough and the transportation connectivity with the others. In order to obtain an stable algorithm for modeling correctly the migration of people, we use the Fratar method to balance all the origin-destination matrices [39].

Clusters
In this section, we describe the clustering analysis that we implemented on Mexico City based on mobility data. This analysis is presented not only to try to find some possible socioeconomic relations between some boroughs, but because there exist computational challenges which can be avoided creating clusters. Firstly, in order to solve model (1) for the whole Mexico City, the program uses around 50GB in RAM during the compilation process using the Stan package. Secondly, the computation time for estimating the parameters is around 3 days. We have used a computer with Ubuntu 20.04, 64 GB in RAM and 12 cores. Therefore, we propose in this section how it could be avoided both of these computational challenges. Solving model (1) for each cluster would reduce the amount of RAM memory used and the computation time. Moreover, this strategy could be implemented simultaneously using the twalk package for example, since the t-walk package only uses one core for execution program. During the pandemic, the Mexican government has scheduled four phases depending of level of contagion risk. These phases corresponding to the following periods: phase 1: from 27 February February to 22 March; phase 2: from 23 March to 19 April; phase 3: from 20 April to 28 June; phase 4: from 29 June to 27 October. The mobility network was analysed using the community detection module Louvain inside the igraph R package [40]. For more details about the igraph R package, see [41]. Thus, using the Louvain community detection algorithm, we are able to identify that Mexico City's network has a modular structure, with three communities, as shown in Figs 3 and 4. From Figs 3 and 4 we observe that the communities in the first and fourth phases of the pandemic are the same, and the second and third of the pandemic are the same. The community 1 of the first phase of the pandemic is composed of boroughs 1, 4, 14, 15 and 16; community 2 is composed of boroughs 2, 3, 7, 8, 9, 11, 12 and 13; and, community 3 is composed of boroughs 5, 6 and 10. The community 1 of the second phase of the pandemic is composed of boroughs 1, 4, 14, 15 and 16; community 2 is composed of boroughs 2, 5, 6, 8, 10, 11, and 12; and, community 3 is composed of boroughs 3, 7, 9 and 13.

Mathematical model
As we mention in the introduction, the transmission model incorporates information on human movement within the following Susceptible, Exposed, Infected, Recovered (SEIR) metapopulation structure [21]: where S i , E i , A i , I i and N i are the susceptible, exposed, undocumented infected, documented infected and the total population in borough i at time t, respectively, and N 0 i denotes the fixed population in borough i given by Table 7. Spatial coupling within the model is represented by the daily number of people traveling from city j to city i (M ij ) an a multiplicative scale factor θ, reflecting the under-reporting of human movement. It is also assumed that documented infected individuals (I i ) do not move between boroughs, although these individuals can move between boroughs during the latency period. The total population N i in each borough is reset each new day as the sum of N 0 i and the inflow term θ∑ j M ij , minus the outflow term θ∑ j M ji . We note that distinction between daytime and nighttime in the transmission model 1 is implemented in [35]. A complete description of the parameters involved in the model (1), the respective range of values proposed in [21] and their measurement units can be found in Table 8. We have set a minimum value for the denominator N j − I j or N i − I i as equal to 10 3 in order to avoid instabilities.

Parameter estimation
For parameter estimation, we use the daily reported dataset [23]. We use Bayesian inference to solve the inverse problem associated to the system of Ordinary Differential Equations (ODEs) given on (1), similarly to [33]. Some references using this method of parameter estimation can be found in [42][43][44][45][46][47][48][49][50][51][52][53]. Let Problem (2), defines a mapping F(θ) = x from parameters θ to state variables x, where F : R m þ ! ðL 2 ð½0; T�Þ n ; where R þ denotes the non-negative real numbers. We assume that F has a Fréchet derivative. Usually, not all states of the system can actually be directed measured, i.e., the data consists of measurements of some state variables at a discrete set of points t 1 , . . ., t k , e.g. in epidemiology, these data consist of number of cases of confirmed infected people. This defines a linear observation mapping from state variables to data C : where s � n is the number of observed variables and k is the number of sample points. Let us define F : R m ! R s�k as F(θ) = C(F(θ)), called the forward problem. Thus, the inverse problem is formulated as a standard optimization problem min θ2R m kFðθÞ À y obs k such that x = F(θ) holds, with y obs is the observable data which has error measurements of size η. Problem (3) may be solved using numerical tools to deal with a non-linear least-squares problem [54][55][56][57][58]. In this work, we implement Bayesian inference to solve the inverse problem given on (3). From the Bayesian perspective, all of the state variables x and parameters θ are considered as random variables and the data y obs is fixed. For the random variables x and θ, the joint probability distribution density of the data x and the parameters θ, denoted by π(θ, x), is given by π(θ, x) = π(x|θ)π(θ), where π(x|θ)π(θ) is the conditional probability distribution, which is also called the likelihood function, and π(x|θ) is the prior distribution, which involves the prior information of parameters θ. Given x = y obs , the conditional probability distribution π(θ|y obs ), which is called the posterior distribution of θ, is given by the Bayes' theorem: pðθjy obs Þ / pðy obs jθÞpðθÞ; ð4Þ If an additive noise is assumed where η is the noise due to discretisation, the model error and the measurement error. If the noise probability distribution π H (η) is known, θ and η are independent, then pðy obs jθÞ ¼ p H ðy obs À FðθÞÞ: All of the available information regarding the unknown parameter θ is codified into a prior distribution π(θ), which specifies our belief in a parameter before observing the data. All of the available information regarding how we obtained the measured data is codified into the likelihood distribution π(y obs |θ). This likelihood can be seen as an objective or cost function because it punishes deviations of the model from the data. To solve the associated inverse problem (4), one may use the maximum a posterior (MAP) We used the dataset in the zone i as y obs ¼ ðS i ;Ẽ i ;Ã i ;Ĩ i Þ, which correspond to the susceptible, exposed, documented infected and undocumented infected in the zone i, respectively. A Poisson distribution with respect to the time is typically used to account for the discrete nature of these counts. However, the variance of each component of the dataset y obs is larger than its mean, which indicates that there is over-dispersion of the data. Thus, a more appropriate likelihood distribution is to use the Negative Binomial (NB) because it has an additional parameter that allows the variance to exceed the mean [50,51,59]. The NB is a mixture of Poisson and Gamma distributions, where the rate parameter of the Poisson distribution itself follows a Gamma distribution [59,60]. We note that although there are different mathematical expressions for the NB depending on the author or source, they are equivalent. Because of this multiple representation of the NB in the literature, one must ensure to use the NB distribution accordingly to the source. Here, we have used the following expression for the NB distribution where μ is the mean of the random variable y � NBðyjm; �Þ and ϕ is the over-dispersion parameter; that is, We recall that the Poisson distribution has mean and variance equal to μ, so μ 2 /ϕ > 0 is the additional variance of the NB with respect to the Poisson distribution. The inverse of the parameter ϕ, controls the over-dispersion. Thus, it is important to select its support adequately for parameter estimation. In addition, there are alternative forms of the NB distribution. We have used the first option neg_bin of the NB distribution of Stan [61]. We acknowledge that some scientists have had success with the second alternative representation of the NB distribution [47]. We assume independent NB distributed noise η (i.e., all dependency in the data is codified into the contact tracing model). In other words, the positive definite noise covariance matrix η is assumed to be diagonal. Therefore, using the Bayes formula, the likelihood is pðθjðĨ i Þ / pðĨ i jθÞpðθÞ; where i denotes the borough index. As mentioned earlier, we approximate the likelihood probability distribution corresponding to diagnosed cases with a NB distributioñ where the index j denotes the number of days, i the number of the boroughs, and ϕ i are the parameters corresponding to the over-dispersion parameter of the NB distribution (5) respect to each borough. For independent observations, the likelihood distribution π(y|θ) is given by the product of the individual probability densities of the observations where the mean μ of the NB distribution NBðI i ðyÞ; � i Þ, is given by the solution I i (t) of the model (1) at time t = t j . For the prior distribution, we select the LogNormal distribution for β s and β a parameters, Gamma distributions for α and γ parameters and Uniform distributions for the other parameters to estimate: ρ, θ and initial conditions ðS i 0 ; E i 0 ; A i 0 ; I i 0 Þ. The hyperparameters and their support corresponding to all the distributions of the parameters to estimate are given on the table's range Table 8.
The posterior distribution π(θ|y obs ) given by (4) does not have an analytical closed form since the likelihood function, which depends on the solution of the non-linear model given on (1), does not have an explicit solution. Then, we explore the posterior distribution using two methods: first, the Stan Statistics package [61] within its version the Automatic Differentiation Variational Inference (ADVI) method,; and second, the general purpose Markov Chain Monte Carlo Metropolis-Hasting (MCMC-MH) algorithm t-walk [62]. Both algorithms generate samples from the posterior distribution π(θ|y obs ) that can then be used to estimate marginal posterior densities, mean, credible intervals, percentiles, variances, and others. We the reader refer to [63] for a more complex description of the MCMC-MH algorithms. Fig 5 shows the credible intervals of parameters of model (1) within 95% Highest-Posterior Density (HPD) using the ADVI-Fullrank method of Stan package [61]. Table 9 shows the posterior mean and quantiles of all the estimated parameters of model (1) using the ADVI-Fullrank method of Stan package. Table 10 shows the posterior mean and quantiles of all overdispersion parameters ϕ i of the Negative Binomial distribution (6).  (1) within 95% Highest-Posterior Density (HPD) using the t-walk Package [62]. Note that the result obtained with the t-walk package are preliminary because we only performed 60,000 iterations, with 30,000 of them as burn-in and it was obtained without balancing the Origen-Destination matrices. We performed this limited quantity of iterations because the computational time consumption is significantly large for each 1,000 of iterations. However, we will perform more iterations in the near future. Using both packages, we did a fit for the first 245 days of the pandemic in Mexico City, starting 27 February, and we have performed predictions from 245-275 days, corresponding to 28 October to 27 November and compared with the true cases in this last period 28 October to 27 November. We assumed that the mobility from 28 October to 27 November is the same as from 28 September to 27 October, i.e., we assumed the same mobility cluster for the projection period. We set up a minimum borough fraction equal to 0.6 to limit the borough to fall below their population size. Our future work will analyse the identifiability of the parameters of model (1), as suggested in [49,64,65]. Specifically, the ρ parameter because it is multiplied by the period of incubation of the disease, α. Thus, estimating both parameters simultaneously may lead to non-identifiability difficulty. we  Table 9. Parameter estimation of β s , β a , ρ, α, γ, θ and initial conditions of the model (1 have uploaded all the codes and source data used in this paper to the following Github link for a detailed review.

The basic reproduction number estimation
The basic reproduction number, which is commonly denoted by R 0 , is the average number of secondary infections generated by a single infective during the curse of the infection in a whole susceptible population. We calculate the reproductive number R 0 in Mexico City using the inferred parameters. Define X = (E, A, I) and using the next generation operator method [66] on the system (1), the Jacobian matrices F and V of system (1) are given by The disease free equilibrium (DFE) of system (1) is X 0 = (0, 0, 0, N, 0) T , we then have Therefore, the next-generation matrix is K = FV −1 , from where R e is computed as the leading eigenvalue of matrix K; that is, Table 8 shows the range of values for the parameters involved in the expression (8) obtained using the Stan package [61]. With those values, we obtain a %95 credible interval for R 0 2 ð1:0224; 1:2376Þ.

Discussion
In this work, we analyse a networked dynamic metapopulation model of the coronavirus dissemination in Mexico City using ODEs and Bayesian statistics. We present an explanation of how to estimate the mobility per day between the boroughs that compound the Mexico City, both on a weekday and on weekends; combining available information from the origin-destination survey carried out in 2017 with the current mobility indices that Google and the government of Mexico City report, depending on the mode of transport used to make each trip (e.g., bus, subway, car, etc.) and then we use the Fratar method to balance the daily origindestination matrices. We also present a clustering analysis of the boroughs which compound Mexico City based on mobility data from Google and the Transportation Mexican Survey. From Figs 3 and 4, we can identify three different clusters during the each phase of the pandemic. We point out that the same cluster analysis done for Mexico City, could be implemented for a broader area, the metropolitan area named Valle de Mexico, which is rather important for the whole country. We consider that this clustering analysis which is based on individual movement may be crucial to efficiently model a human pandemic on the same scale as presented here, or at a smaller scale.
From Fig 5, the transmission rate of symptomatic was 0.19 within 95% Credible Interval (CI) [0.06, 0.42], and the transmission rate of asymptomatic was 0.27 within 95% CI [0.14, 0.40], which is in concordance with the value estimated of 0.25 in [67], in that study the transmission rate was not separated in symptomatics and asyntomatics. The fraction of undocumented infections, ρ, was 0.027 within 95% CI [0.02, 0.04]. The estimated latency period, 1/α, is 5.96 days within 95% CI [3.60, 8.93] days, which is in concordance with the value used of 5.99, 6.0, 5.1 and 5.0 in [50,52,68,69], and the estimated recovery period, 1/ γ, is 4.86 days within 95% CI [3.15, 9.33] days, which is lower in comparison with the ones used of 5.97 and 10.81 (asymptomatic and symptomatic class, respectively) in [69], 10.0 in [52], 7.0 in [50] and 5.0, 10.8 and 14.0 (reported infectious, symptomatic and asymptomatic class, respectively) in [68]. The inter-borough scale factor θ was 0.77 within 95% CI [0.61, 0.89], this value indicates that the mean number of trips made by a person is between three and four in one day, which makes sense with complete trips to get out of home, do some activities (e.g., work, shopping, or services), and return home. The results of the inferred parameters of model (1) and the population size of the boroughs (e.g., Iztapalapa and Gustavo A. Madero) help to explain the fast dispersion of COVID-19 and indicate the challenge of finding strategies to contain it. We have compared the parameter values inferred with respect to those used for Mexico City. As mentioned in Section, we will analyse the identifiability of the parameters of model (1) (i.e., the ρ parameter) because this parameter is multiplied by the period of incubation of the disease, α. Thus, estimating both parameters simultaneously may lead to a non-identifiability difficulty. We may observe this non-identifiability in Figs 5 and 10; that is, different combinations of the model parameters lead to the same "energy" value of the system 1. In particular, we can observe that a different combination of the estimated parameters values obtained with the methods ADVI-Fullrank and ADVI-Meanfield give very similar fitted curves for diagnosed cases. We also observe that the recovery period time is more in accordance with the values used in [50,52,68,69] but the latency period is lower than the ones used there. We note that the parameters, β s , β a of the model (1) are considered as global; that is, they are assumed the same for all the boroughs of Mexico City and all the transportation modes. In the near future, we will explore a more robust model that will consist of local parameters of transmission β s , β a , instead of globally (i.e., a pair of transmission rates β s , β a for each borough). Furthermore, we will consider those transmission rates, β s , β a , to be dependent on time as in [52,70]. In addition, we will consider the interstate and international mobility from/to Mexico City. We will take into account imported cases from the other 31 states of Mexico. We will also consider the cases imported from overseas by airplane passengers, and will do a global and local sensitivity analysis of model (1). Finally, we will investigate a spatio-temporal model based on a diffusion partial differential equation model combined with individual movement trends.