Predicting COVID-19 cases in Belo Horizonte—Brazil taking into account mobility and vaccination issues

Eder Dias; Alexandre M. A. Diniz; Giovanna R. Souto; Henrique L. Guerra; Humberto Torres Marques-Neto; Simon Malinowski; Silvio Jamil F. Guimarães

doi:10.1371/journal.pone.0269515

Abstract

The pandemic caused millions of deaths around the world and forced governments to take drastic measures to reduce the spread of Coronavirus. Understanding the impact of social distancing measures on urban mobility and the number of COVID-19 cases allows governments to change public policies according to the evolution of the pandemic and plan ahead. Given the increasing rates of vaccination worldwide, immunization data may also represent an important predictor of COVID-19 cases. This study investigates the impact of urban mobility and vaccination upon COVID-19 cases in Belo Horizonte, Brazil using Prophet and ARIMA models to predict future outcomes. The developed models generated projections fairly close to real numbers, and some inferences were drawn through experimentation. Brazil became the epicenter of the COVID-19 epidemic shortly after the first case was officially registered on February 25th, 2020. In response, several municipalities adopted lockdown (total or partial) measures to minimize the risk of new infections. Here, we propose prediction models which take into account mobility and vaccination data to predict new COVID-19 cases.

Citation: Dias E, Diniz AMA, Souto GR, Guerra HL, Marques-Neto HT, Malinowski S, et al. (2024) Predicting COVID-19 cases in Belo Horizonte—Brazil taking into account mobility and vaccination issues. PLoS ONE 19(2): e0269515. https://doi.org/10.1371/journal.pone.0269515

Editor: Ming Tang, East China Normal University, CHINA

Received: December 17, 2021; Accepted: April 20, 2022; Published: February 23, 2024

Copyright: © 2024 Dias et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The data underlying the results presented in the study are publicly available from https://prefeitura.pbh.gov.br/saude/coronavirus.

Funding: This study was supported by the Pontifícia Universidade Católica de Minas Gerais [2020/27196] and by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) in the form of a grants to SJFG [PQ 310075/2019-0] and AMAD [303733/2019-5]. This study was also supported by the Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG) in the forms of grants to SJFG [PPM 00006-18] and HTM-N [00253-18]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

The outbreak of Coronavirus disease (COVID-19) caused by the SARS-CoV-2 led to serious health, economic, and social challenges worldwide. In terms of number of infections, in December 2021, 273 million cases and 5,3 million deaths have been officially reported worldwide [1]. Public health and social measures were implemented across the globe to suppress SARS-CoV-2 transmission and reduce mortality and morbidity from COVID-19, such as physical distancing measures that affect human mobility. The first case of COVID-19 confirmed in Brazil was on 25 February 2020 when a man from São Paulo tested positive for the virus. The first case in Belo Horizonte, Minas Gerais, Brazil was confirmed on March 21 2020, and the disease spread to the first quarter of 2021 in every federative unit of Brazil. Belo Horizonte is the sixth-largest city in Brazil, with a population of around 2.52 million commanding a metropolitan area with over 6 milion people, which is part of an important economic triangle in the Brazilian Southeast region. By June 16, 2021, 224,976 infections and 5,487 deaths had been notified in Belo Horizonte. Since the first reported case, public health policies and social distancing measures have been adopted to limit the number of infections and deaths. Social distancing [2] may be seen as is a primary and, in many cases, a necessary measure in combating virus dissemination.

Therefore, the impact of human mobility upon the dissemination of COVID-19 cases needs to be evaluated since it may help in stopping the virus dissemination. In this direction, several works were developed in order to study the impact of mobility [3–5]. Additionally, the development of indicators to estimate future cases of COVID-19 in urban centers could be used to suppress virus transmission such as SARS-CoV-2.

The development of indicators to estimate future cases of COVID-19 could help health authorities to make decisions about mobility restrictions and strategies of social distancing, that could delay the spread of the virus. Therefore, a good indicator of future cases could help achieving the best course of action and a better allocation of public heath resources. In the present study, by using mobility graph analysis it was possible to detect an increase in COVID-19 cases with increasing social mobility. However, changes that affect the behavior of the virus in the environment, such as mutations that make the virus more transmissible, vaccination that optimizes the immune response of the host making it less susceptible to infections, or periods of increased transmissibility that result in herd immunity are factors that may limit the development of indicators of future case based on mobility data.

In this context, the present study aims to evaluate the evolution of COVID-19 cases and deaths in Belo Horizonte, and the influence of local public health policy, such as the closing of establishments and commercial activities, upon the dissemination of COVID-19. Based on public health policy and human mobility indicators, we seek to estimate future COVID-19 cases. In terms of Brazil, other works have studied the behaviour of the spreading of COVID-19 cases between cities through mobile phone data [6], which may be seen as a mobility strategy as well.

The main contributions of this work can be summarized as follows: (i) study of the impact of mobility for spreading the COVID-19 infections; (ii) study of the effect of vaccination on transmission of SARS-CoV-2; (iii) proposition of three models based on time-series analysis for predicting new COVID-19 infections; and (iv) inclusion of the exogenous mobility variable, which is predicted by using Prophet framework, into the new cases prediction model.

Materials and methods

Ethics statement

This study was approved by the Research Ethics Committee from Pontifícia Universidade Católica de Minas Gerais, Brazil (4.359.790). Individual consent was not obtained because the data were analyzed anonymously.

COVID-19 cases data

Belo Horizonte’s COVID-19 reported cases were gathered between March 20, 2020 and December 31, 2021. Data were retrieved from the online Epidemiological Bulletin of Belo Horizonte, available at: https://prefeitura.pbh.gov.br/saude/coronavirus, published by Belo Horizonte’s Municipal Health Secretariat, which brings daily updates on the official number of COVID-19 cases, transmission rates, occupation of ICU beds devoted to COVID-19 patients, ongoing vaccination rates, among other information. The dataset was constructed by manually extracting these daily data from the epidemiological bulletin. Data were subjected to three different analyses: one SARIMA model, using COVID-19 cases as the reference; and two ARIMA models with exogenous variables, one using Google Mobility data and another one combining mobility and vaccination variables.

ARIMA using Google Mobility data time-series was split into two sections: a training set and a test set. For the training set, we used daily data ranging between Apr 20, 2020, to Feb 28, 2021, while the testing set contemplated the period Mar 1, 2021, to Mar 28, 2021. See trajectories in Fig 1.

Download:

Fig 1. Evolution of COVID-19 infection cases in Belo Horizonte.

Source: Belo Horizonte’s Municipal Health (2021).

https://doi.org/10.1371/journal.pone.0269515.g001

On the other hand, the time-series database used in the ARIMA model using vaccination coverage data used a different timespan. The time series was divided into two sets. The training set ranged from Mar 2, 2021, to Dec 3, 2021, while the testing set ranged from Dec 4, 2021, to Jul 31, 2021.

Google mobility reports data

Mobility, in conjunction with vaccination, are important factors to understand the epidemic spreading. Moreover, the use of masks and social vulnerability are other important issues for explaining the incidence of cases in some regions. Here, we are interested in global factors rather than local ones. Thus, the mobility represents a very important issue to this study. For that, there are several possibilities to study its behaviour, among them is the use of Call Detail Records (CDR) collected by mobile phone carriers. However this strategy is controversy, due to privacy problem, and it depends on the mobile operators. In several cities worldwide, the public authorities have used this strategy, however these data are not pubicly available. The other potential source is the use of Google mobility reports data.

Google Community Mobility Reports provides information on the impact of social distancing policies implemented by local authorities upon urban mobility to address the COVID-19 pandemic. They show charts with displacement trends over time by region and across different categories of locations, depicting how visits to places have been changing across different societies. Mobility trends are measured around six types of locations:

Grocery & pharmacy—places like grocery markets, food warehouses, farmers markets, specialty food shops, drug stores, and pharmacies;
Parks—places like local parks, national parks, public beaches, marinas, dog parks, plazas, and public gardens;
Transit stations—places like public transport hubs such as subway, bus, and train stations.
Retail & recreation—places like restaurants, cafes, shopping centers, theme parks, museums, libraries, and movie theaters;
Residential—places of residence; and
Workplaces—places of work.

Google mobility data use the same world-class anonymization technology deployed daily by its products to keep users’ activities private and secure. This includes differentiated privacy, which adds artificial noise to datasets, enabling us to generate insights while avoiding identifying anyone.

These datasets show how visits and length of stay at different places changed compared to a baseline, represented by the median mobility value for any given corresponding day of the week, during the 5-week period: Jan 3 to Feb 6, 2020. The datasets show trends over several months and are generated based on data from users who have opted-in to Location History for their Google Account, so the data represent a sample of Google users. For further details, please refer to: https://www.google.com/covid19/mobility/.

Residence and workplace mobility data were not used in the study as the Coronavirus is more transmissive in public places where the circulation of people is intense. In addition, the Google Mobility data use 2019 as baseline to measure the percentage increase or decrease in mobility. As workers from many companies in Belo Horizonte are still in home-office, one witnessed an emigration of several workers away from the city towards the other municipalities of Minas Gerais State and even to other Brazilian states, a fact that could interfere with model predictions. Moreover, Belo Horizonte is a very important regional center which receives, every day, thousands of persons from its surounding areas. Thus, the number of persons circulating daily in Belo Horizonte is much higher than its population, thus the inclusion of this kind of mobility is recommended, and as the city of Belo Horizonte cannot be treated as a closed system, given the intense circulation therein not only of local dwellers but also inhabitants of nearby municipalities.

This study uses Google Mobility Reports data for Belo Horizonte across all the above-mentioned location categories for the time period Apr 20, 2020, to Set 17, 2021. See trajectories in Fig 2.

Download:

Fig 2. Google mobility reports data for Belo Horizonte.

Source: Google Community Mobility Reports (2021).

https://doi.org/10.1371/journal.pone.0269515.g002

Vaccination data

Public health care in Brazil is hierarchically organized and the municipalities are held responsible, among other duties, for the provision of primary health care and immunization campaigns. In the case of the Coronavirus pandemics, COVID-19 vaccines have been purchased by the federal government in large batches and redistributed to the 5,568 municipalities according to their demands.

At the municipal level, COVID-19 vaccine doses have been administered based on a scale of priorities, which initially benefited the health professionals, followed the elderly and those with comorbidities. Vaccination was initiated in Belo Horizonte on Jan 16, 2021, and the local health authorities have been keeping track of immunized and unimmunized populations.

The vaccination data used in this study were provided by Belo Horizonte’s Health Secretariat, and contemplates the cumulative number of doses administered between Jan 25, 2021 and Dec 03, 2021. Four vaccines were used in Belo Horizonte during this period: CoronaVac (Sinovac/Butantan); AstraZeneca (Oxford/Fiocruz); Pfizer (Pfizer/BioNTech) and Janssen (Johnson & Johnson). Complete immunization for these vaccines required at the time the application of two doses, with the following inter-dose intervals: CoronaVac vaccine has an interval of 14 to 28 days; AstraZeneca vaccine has an interval of up to 12 weeks; Pfizer vaccine has an interval of up to 8 weeks; and Janssen is a single-shot vaccine.

Fig 3 shows the accumulated number of first and second doses applied per day. Note that the curves show a consistent growth trend, however, the oscillations therein result from the spasmodic way with which the vaccines were made available, either through importation or through domestic manufacture.

Download:

Fig 3. Evolution of first and second shots of COVID-19 vaccination in Belo Horizonte, Brazil (January 25th to December 03rd 2021).

Source: Belo Horizonte’s Municipal Health Secretariat (2021).

https://doi.org/10.1371/journal.pone.0269515.g003

Mobility and vaccination are not the sole factors that determinig disease transmission. Other non-pharmaceutical interventions should also be considered in the study of the epidemic spread, such as social distancing, the use of masks, large-scale virus testing, and the degree of dissemination of new variants. However, to the best of our knowledge, no single study has been able to associate all these factors while attempting to predict epidemic spread.

A study estimated the minimum vaccination rate required to avoid the spread of the COVID-19 pandemic, which is 50.91 doses per 100 people [7]. Additionally, a mathematical model suggests that with a mask usage rate of 50% and a 50% effective vaccine, one needs 55–94% vaccination coverage to avoid the exacerbation of the pandemic [8]. In Belo Horizonte, up to Jun 29, 2021, only 20.8% of the target population were vaccinated with a second dose. Therefore, we considered the impact of urban mobility and vaccination to predict future COVID-19 outcomes, although recognizing that other factors may also play a role in the spread of the disease, whose data are unavailable.

Architecture of the proposed model

In our proposed model, we use the Prophet method (discussed further on) to predict urban mobility, based on social distancing public policy. For this, an exogenous variable (lockdown) was created and received the value 1, when it is within a period of restriction of circulation decreed by the city hall, and receives the value 0, when it falls in the reopening period.

The lockdown dates were Mar 18, 2020; Jun 29, 2020; Jan 11, 2021; and Mar 05, 2021. The reopening dates were May 25, 2020; Jun 08, 2020; Aug 06, 2020; Feb 01, 2021; and Apr 22, 2021. These dates were used to define periods in which the lockdown variable is assigned 0 or 1. After performing the mobility prediction in Prophet, the actual data on the number of cases is joined with the predicted data to be applied as an exogenous variable in the SARIMA model (discussed in the following).

Fig 4 shows the diagram of our proposed model. It is possible to visualize where the data will be applied and in which prediction model it will be used. The SARIMA model receives the training data of the number of cases with the predicted time series in Prophet, and performs the search for the best parameters (p,d,q) according to the smallest AIC value. We also used the auto-arima function in Python to run SARIMA models and the Akaike information criterion (AIC) as reference to identify the best fit model structures spanning out of the stepwise procedure. After finding the best SARIMA model. This proposed model is applied both to vaccination and mobility, however to predict vaccination the lockdown measure was not applied.

Download:

Fig 4. Architectural diagram of the proposed model.

https://doi.org/10.1371/journal.pone.0269515.g004

Once the best SARIMA models are fitted, they are used to evaluate the relationship between COVID-19 cases and several mobility measures and vaccination data. Model selection was based on the lowest MAE, RMSE and the higher R2. In calculating these metrics values, the prediction results are obtained from the SARIMA model which is built from the training data and the actual data is the data test. The base line of SARIMA modeling will be the SARIMA model with the best results. The best SARIMA models will perform each Google Mobility category as the external variables for the COVID-19 dataset. Similar to the SARIMA process, the best fit combination model is chosen by the lowest RMSE [9].

Facebook’s prophet algorithm

Prophet is an open-source library developed by Facebook, designed for making forecasts for time series datasets, using the forecasting tools available in Python. The Prophet Algorithm is a model for predicting time-series data. It is based on a decomposition of the time series into three main components: the trend, the seasonality and the holidays. The equation of the Prophet model is the following: where g_t represents the trend, s_t the seasonality and h_t the effect of holidays. g_t is modeled by a piece-wise linear function, s_t by a Fourier series of period s (the period of the seasonality in the time series), and h_t by an indicator function multiplied by a Gaussian prior. We use this last component to indicate the days of lockdown during the considered period of time. Prophet is robust event in presence of missing data and trend changes, and usually handles outliers well. Please refer to https://github.com/facebook/prophet for additional information about Prophet [10].

We use Prophet in this work incorporating a Brazilian holiday regressor as a way to minimize the impact of days presenting sudden changes in data, in order to achieve better fitting models.

ARIMA model

The ARIMA model is a mixture of auto-regressive (AR) and moving average (MA) models, in which both current and historical residual series values in the present time series are linearly expressed [11, 12]. The ARIMA model is an extended version or ARMA, able to deal with non-stationary time series, by differencing the time series until it gets stationnary. The ARIMA model is usually referred to as ARIMA (p, d, q), in which:

q is the non-seasonal moving average order,
d is the order of regular differentiation,
d is the number of differentiation.

The equation of an ARIMA model is the following: where the coefficients α and θ are the parameters of the model and ϵ are the error terms.

However, ARIMA does not support seasonal data, or a time series with a repeating cycle. This is not the case with Belo Horizonte’s COVID-19 datagiven the way in which they are gathered, processed and made available by the municipality of Belo Horizonte in weekly bulletins.

Therefore, the Seasonal Autoregressive Integrated Moving Average (SARIMA) or Seasonal ARIMA, is the most appropriate technique. It is an extension of ARIMA that explicitly supports univariate time series data with a seasonal component. This extension considers a new ARIMA model in addition to the classical one. This new model is combined (multiplicatively) to the classical ARIMA model. There are hence three new components: an autoregression (AR), a differencing (I) and moving average (MA) component for the seasonal part of the time series, as well as an additional parameter for the period of the seasonality:

P is the seasonal order for autoregressive,
Q is the seasonal order for moving average,
D is the order of seasonal differentiation and,
s in the subscription shows the seasonal period.

In the present analysis, for instance, the occurrence of COVID-19 cases varies over the weekly period, thus s = 7.

The ARIMA family modeling procedure consists of three iterative steps: identification, estimation, and diagnostic checking. Identification is the process of determining seasonal and non-seasonal orders using the autocorrelation functions (ACF) and partial autocorrelation functions (PACF) of the transformed data [13]. The ACF is a statistical tool that measures whether earlier values in the series have some relation to later values. PACF captures the amount of correlation between a variable and a lag of the said variable that is not explained by correlation at all low-order lags.Fig 5 shows the ACF and PACF plots of the time series of the number of COVID-19 cases. Analyzing the ACF and PACF graphs we can obtain candidate values for moving average order (p) and autoregressive (q).

Download:

Fig 5. ACF and PACF graphs for COVID-19 cases in Belo Horizonte, Brazil.

https://doi.org/10.1371/journal.pone.0269515.g005

Parameters in the auto SARIMA models are estimated with the conditional least squares (CLS) method [14] after the identification step. The adequacy of the established model for the series is verified by employing white noise tests [15] to check whether the residuals are independent and normally distributed. It is possible that several SARIMA models may be identified, and the selection of an optimum model is necessary. Such selection of models is usually based on the Akaike Information Criterion (AIC) [16]. The p and q values generated from the ACF and PACF analysis will be the limit combination in making the SARIMA model. Then, by using the COVID-19 dataset, we combine SARIMA’s parameters (p,d,q,P,D,Q) to find the best-fit parameters. Akaike’s Information Criterion (AIC) method is used in selecting the best model by looking at the smallest AIC value using only the training dataset.

We used auto-arima function from pmdarima library in Python which fits the best ARIMA model to a univariate time series according to the Akaike information criterion (AIC), which is an estimator of prediction error and thereby relative quality of statistical models for a given set of data [17]. The function performs a stepwise search over possible model and seasonal orders within the constraints provided, and selects the parameters that minimize the given metric.

Results

Table 1 brings summary statistics for all variables used in the analyses. Notice that between Mar 30, 2021 and Jun 14, 2021, Belo Horizonte displayed on average 551 new COVID-19 cases daily, reaching 4,545 new cases at the height of the pandemic. Mobility trends measured during the same period display different patterns. While mobility around grocery stores and pharmacies have reached an average of 16% above the benchmark, mobility around retail and recreation facilities, as well as around transit stations have significantly decreased compared to the benchmark.

Download:

Table 1. Summary statistics.

https://doi.org/10.1371/journal.pone.0269515.t001

Prophet model results

Before predicting COVID-19 infection cases, it is necessary to predict mobility trends in order to provide exogenous information to be input in the infection cases model. This prediction used Prophet framework and Fig 6 brings the results of this procedure.

Download:

Fig 6. Google mobility prediction data for Belo Horizonte.

. Source: Google Community Mobility Reports (2021).

https://doi.org/10.1371/journal.pone.0269515.g006

SARIMA model results

SARIMA models are fitted to the COVID-19 diseases from Apr 20, 2020 to Dec 31, 20211. Table 2 presents the results of the estimations using the SARIMA processes for the COVID-19 case’s time series. The selection of the best model is performed according to the principle of AIC. Statistical results indicated SARIMA (0,1,2)(1,0,1)[7] as the best fit model, which is applied to forecast future cases.

Download:

Table 2. COVID-19 auto ARIMA models.

https://doi.org/10.1371/journal.pone.0269515.t002

Stationary data, along with ACF and PACF, are considered over time. The ACF and PACF correlogram revealed that COVID-19 cases had seasonality impact [18]. The ACF plot shows that sharp significant peak (greater correlation) occurs at lags of 7. This pattern strongly supports the existence of seasonality in the time-series. This is due to the non-recording of data on weekends by the Belo Horizonte city hall. Moreover, time diagram shows that the data are distributed in a horizontal way, the ACF and PACF values decline fairly fast close to zero (Fig 5).

Because the PACF shows a gradual downward trend and the ACF shows sharp cutoffs along with the lags, good candidates for MA, are lag values that have a sharp cutoff (lag = 1, lag = 2). However, it cannot be ruled out that there may be a good model with RA, as there are some sharp cutoffs along with the lags in the ACF.

Fig 6 shows the projections of new cases of COVID-19 with a relative confidence interval of 95% for a four weeks period. The difference between predicted (1,685) and real cases of COVID-19 (2,111) amounts to a 20.18% overestimation, when using exclusively the evolution of COVID-19 cases as reference.

SARIMAX models results using Google mobility trends data as predictors

Table 3 brings the results of SARIMAX models using the Google Mobility Reports data as external variables. Python’s auto-ARIMA from pmdarima algorithms identified the best fit models for each class of mobility data, including the movement around recreation, pharmacy, parks and transit station locales. We have also generated a multivariate ARIMA using all mobility measures in order to evaluate the combined effects of intraurban movement upon COVID-19 cases. The cumulative confirmed COVID-19 cases in 28 days turned out to be 27,666.

Download:

Table 3. ARIMA models results.

https://doi.org/10.1371/journal.pone.0269515.t003

Curiously, the multivariate model is the one presenting the highest AIC value, frustrating the expectations of a better performance. As Google Mobility Reports data are based on intraurban travel behavior of Google Maps users, mobility data are not independently generated, as one individual may carry his/her cell phone in a shopping trip along which he/she passes by pharmacies, parks, transit stations and recreation facilities (Table 3). Thus, multicollinearity among these variables is a serious issue, compromising the multivariate model’s performance. On the other hand, single external factors ARIMA models performed much better, with mobility around grocery/pharmacy and transit station displaying the lowest AIC value. Model results are in fine tune with the reality of Belo Horizonte. Pharmacies were not closed during the lockdown period, on the contrary, they functioned as a thermometer to know if the population was getting sick. Interestingly, when the number of cases reached peaks, that of pharmacy mobility also increased. On the other hand, mobility around transit stations is also an important predictor of COVID-19 cases, as despite the social distancing measures decreed by Belo Horizonte mayor’s office, several economic sectors remained active. Therefore, commuting journeys between home and place of work, captured by Google mobility data, significantly contributed to the spread of Coronavirus. In Fig 7, the prediction of infection cases taking into account different Google mobility analysis are showed.

Download:

Fig 7. Prediction of COVID-19 infection cases taking into account mobility issues.

https://doi.org/10.1371/journal.pone.0269515.g007

Another way of examining model performance is through the metrics MAE, MSE, RMSE and R². MAE (Mean absolute error) represents the difference between the original and predicted values extracted by averaged the absolute difference over the data set. It measures the average of the residuals in the dataset. MAE is less sensitive to outliers compared to RMSE. MSE (Mean Squared Error) measures the average of the squares of the errors. RMSE (Root Mean Squared Error) is the error rate by the square root of MSE. It measures the standard deviation of residuals. RMSE represents the estimated white noise standard deviation in ARIMA analysis, and can only be compared between models whose errors are measured in the same units. The lower the MSE and RMSE, the better the model. R² (Coefficient of determination) represents the coefficient of how well the values fit compared to the original values. The value from 0 to 1 interpreted as percentages. The higher the value, the better the model. is [19]. From the measures illustrated in Table 3, we can say that MSE, RMSE and R² present consistent behaviour, confirming the validity of our results.

SARIMAX model results using vaccination as predictor

Table 4 shows the prediction of COVID-19 cases using vaccination data and the forecast of COVID-19 applying a mix of independent variables (4 types of mobility + vaccination). Notice that the model with all variables had the smallest number of predicted cases compared to the model that exclusively incorporated vaccination data. We can argue that when vaccination advances, there is a higher correlation between both vaccination and mobility data, which can explain the greater approximation between predicted and real cases.

Download:

Table 4. ARIMA model results taking into account vaccination as predictor.

https://doi.org/10.1371/journal.pone.0269515.t004

Fig 8 shows the comparison between the models applying the variables as shown in Table 4. The multivariate model presents seasonal patterns analogous to the vaccination one, although the waves have greater amplitudes, which can be explained by the increase of mobility as vaccination progresses.

Download:

Fig 8. Prediction of COVID-19 infection cases taking into vaccination issues.

https://doi.org/10.1371/journal.pone.0269515.g008

In Tables 5–9, we illustrated the statistical methods obtained by the proposed models when different strategies for computing the Google mobility data are employed.

Download:

Table 5. SARIMA Model recreation results.

https://doi.org/10.1371/journal.pone.0269515.t005

Download:

Table 6. SARIMA Model pharmacy results.

https://doi.org/10.1371/journal.pone.0269515.t006

Download:

Table 7. SARIMA Model park results.

https://doi.org/10.1371/journal.pone.0269515.t007

Download:

Table 8. SARIMA Model transit station results.

https://doi.org/10.1371/journal.pone.0269515.t008

Download:

Table 9. SARIMA Model multivariable results.

https://doi.org/10.1371/journal.pone.0269515.t009

Conclusion and future work

By analyzing the mobility graph and transposing it with the graph of the number of real cases, it is possible to infer that there was an increase in the number of cases with increasing mobility levels. The projection of the number of cases with each Google mobility variable has similar seasonal characteristics, the difference being the number of cases. In addition, the grocery/pharmacy application had the smallest error in the model (absolute error of 0.835%) about the accumulated cases: 27,666 real against 27,435 predicted cases.

We can highlight that using the model with the combination of mobility and vaccination, even with a larger projection, the error metrics were smaller compared to the mobility model. Besides, we managed to find a projection value very close to the real value, 5,904 real cases accumulated, against 5,770 of foreseen cases. Moreover, the model using vaccination as a dependent variable turned out to be weaker, with 9,611 predicted cases. Still, this model recognizes well the seasonality pattern of the series. A hypothesis for this result is that when the vaccination progresses, the number of cases decreases, and consequently, the population resume its normal life by increasing the urban mobility. Thus, thanks to the high vaccination rates, even after mobility has been with higher mobility, the impact of new cases is smaller.

In addition, the mobility variable starts to lose importance since even after mobility has been fully reestablished in Belo Horizonte, the number of cases continued to fall. Finally, the proposed model can be replicated, as long as city level COVID-19 and Google Mobility Reports datasets are available.

In terms of limitations, in this study, we have studied the statistical associations between the flow of people in Belo Horizonte, by using urban mobility from Google Mobility reports data, and the occurrence of cases in different periods, considering public interventions to limit activities in the city in conjunction with the vaccination progresses. Although these results already contribute to a better understanding of these relationships, in future studies we will try to identify and understand the mechanisms involved in these associations in order to establish and test the effectiveness of interventions.

This work was restricted to the application of the Prophet and ARIMA methods, however, there are other ways of performing time-series predictions, such as the LSTM (Long short-term memory), artificial recurrent neural network architecture used in the field of deep learning, but that could also be applied to time series. Another method that could be applied is the SIR model, which is specifically aimed at predicting communicable diseases, applying three ordinary differential equations for susceptible, infectious, and recovered numbers of people.

Supporting information

S1 File.

https://doi.org/10.1371/journal.pone.0269515.s001

(ZIP)

Acknowledgments

The authors sincerely thank the Prefeitura Municipal de Belo Horizonte for providing the data used in this analysis. Moreover, we thank all PUC Minas’ Health Observatory and IMScience members for the fruitful discussions and contributions.

References

1. Hopkins J. Coronavirus COVID-19 Global Cases by Johns Hopkins CSSE; 2021. https://gisanddata.maps.arcgis.com/apps/dashboards/bda7594740fd40299423467b48e9ecf6.
2. Kissler S, Tedijanto C, Lipsitch M, Grad YH. Social distancing strategies for curbing the COVID-19 epidemic. medRxiv. 2020. https://doi.org/10.1101/2020.03.22.20041079
- View Article
- Google Scholar
3. Kraemer MUG, Yang CH, Gutierrez B, Wu CH, Klein B, Pigott DM, et al. The effect of human mobility and control measures on the COVID-19 epidemic in China. Science. 2020;368(6490):493–497. pmid:32213647
4. Chagas ETC, Barros PH, Cardoso-Pereira I, Ponte IV, Ximenes P, Figueiredo F, et al. Effects of population mobility on the COVID-19 spread in Brazil. PLOS ONE. 2021;16(12):1–27. pmid:34874978
5. Chinazzi M, Davis JT, Ajelli M, Gioannini C, Litvinova M, Merler S, et al. The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak. Science. 2020;368(6489):395–400. pmid:32144116
6. Candido DS, Claro IM, de Jesus JG, Souza WM, Moreira FRR, Dellicour S, et al. Evolution and epidemic spread of SARS-CoV-2 in Brazil. Science. 2020;369(6508):1255–1260. pmid:32703910
7. Chen YT. The Effect of Vaccination Rates on the Infection of COVID-19 under the Vaccination Rate below the Herd Immunity Threshold. International Journal of Environmental Research and Public Health. 2021;18(14):7491. pmid:34299942
8. Shen M, Zu J, Fairley CK, Pagán JA, An L, Du Z, et al. Projected COVID-19 epidemic in the United States in the context of the effectiveness of a potential vaccine and implications for social distancing and face mask use. Vaccine. 2021;39(16):2295–2302. pmid:33771391
9. Aji BS, Indwiarti, Rohmawati AA. Forecasting Number of COVID-19 Cases in Indonesia with ARIMA and ARIMAX Models. In: 2021 9th International Conference on Information and Communication Technology (ICoICT); 2021. p. 71–75.
10. Taylor SJ, Letham B. Forecasting at scale. PeerJ Preprints. 2017;5:e3190v2. https://doi.org/10.7287/peerj.preprints.3190v2
- View Article
- Google Scholar
11. Roy S, Bhunia GS, Shit PK. Spatial prediction of COVID-19 epidemic using ARIMA techniques in India. Modeling Earth Systems and Environment. 2021;7(2):1385–1391. pmid:32838022
12. Zhang X, Zhang T, Young AA, Li X. Applications and comparisons of four time series models in epidemiological surveillance data. PLOS ONE. 2014;9(2):e88075–e88075. pmid:24505382
13. Grahn T. A Conditional Least Squares Approach to Bilinear Time Series Estimation. Journal of Time Series Analysis. 1995;16(5):509–529.
- View Article
- Google Scholar
14. Ho SL, Xie M, Goh TN. A comparative study of neural network and Box-Jenkins ARIMA modeling in time series prediction. Computers & Industrial Engineering. 2002;42(2):371–375.
- View Article
- Google Scholar
15. Galbraith JW, Zinde-Walsh V. On the distributions of Augmented DickeyâFuller statistics in processes with moving average components. Journal of Econometrics. 1999;93(1):25–47.
- View Article
- Google Scholar
16. Koehler AB, Murphree ES. A Comparison of the Akaike and Schwarz Criteria for Selecting Model Order. Journal of the Royal Statistical Society Series C (Applied Statistics). 1988;37(2):187–195.
- View Article
- Google Scholar
17. Aho K, Derryberry D, Peterson T. Model selection for ecologists: the worldviews of AIC and BIC. Ecology. 2014;95(3):631–636. pmid:24804445
18. Benvenuto D, Giovanetti M, Vassallo L, Angeletti S, Ciccozzi M. Application of the ARIMA model on the COVID-2019 epidemic dataset. Data in Brief. 2020;29:105340. pmid:32181302
19. Kaliappan J, Srinivasan K, Mian Qaisar S, Sundararajan K, Chang CY, C S. Performance Evaluation of Regression Models for the Prediction of the COVID-19 Reproduction Rate. Frontiers in Public Health. 2021;9:1319. pmid:34595149

[ref1] 1. Hopkins J. Coronavirus COVID-19 Global Cases by Johns Hopkins CSSE; 2021. https://gisanddata.maps.arcgis.com/apps/dashboards/bda7594740fd40299423467b48e9ecf6.

[ref2] 2. Kissler S, Tedijanto C, Lipsitch M, Grad YH. Social distancing strategies for curbing the COVID-19 epidemic. medRxiv. 2020. https://doi.org/10.1101/2020.03.22.20041079
View Article
Google Scholar

[3] View Article

[4] Google Scholar

[ref3] 3. Kraemer MUG, Yang CH, Gutierrez B, Wu CH, Klein B, Pigott DM, et al. The effect of human mobility and control measures on the COVID-19 epidemic in China. Science. 2020;368(6490):493–497. pmid:32213647
View Article
PubMed/NCBI
Google Scholar

[6] View Article

[7] PubMed/NCBI

[8] Google Scholar

[ref4] 4. Chagas ETC, Barros PH, Cardoso-Pereira I, Ponte IV, Ximenes P, Figueiredo F, et al. Effects of population mobility on the COVID-19 spread in Brazil. PLOS ONE. 2021;16(12):1–27. pmid:34874978
View Article
PubMed/NCBI
Google Scholar

[10] View Article

[11] PubMed/NCBI

[12] Google Scholar

[ref5] 5. Chinazzi M, Davis JT, Ajelli M, Gioannini C, Litvinova M, Merler S, et al. The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak. Science. 2020;368(6489):395–400. pmid:32144116
View Article
PubMed/NCBI
Google Scholar

[14] View Article

[15] PubMed/NCBI

[16] Google Scholar

[ref6] 6. Candido DS, Claro IM, de Jesus JG, Souza WM, Moreira FRR, Dellicour S, et al. Evolution and epidemic spread of SARS-CoV-2 in Brazil. Science. 2020;369(6508):1255–1260. pmid:32703910
View Article
PubMed/NCBI
Google Scholar

[18] View Article

[19] PubMed/NCBI

[20] Google Scholar

[ref7] 7. Chen YT. The Effect of Vaccination Rates on the Infection of COVID-19 under the Vaccination Rate below the Herd Immunity Threshold. International Journal of Environmental Research and Public Health. 2021;18(14):7491. pmid:34299942
View Article
PubMed/NCBI
Google Scholar

[22] View Article

[23] PubMed/NCBI

[24] Google Scholar

[ref8] 8. Shen M, Zu J, Fairley CK, Pagán JA, An L, Du Z, et al. Projected COVID-19 epidemic in the United States in the context of the effectiveness of a potential vaccine and implications for social distancing and face mask use. Vaccine. 2021;39(16):2295–2302. pmid:33771391
View Article
PubMed/NCBI
Google Scholar

[26] View Article

[27] PubMed/NCBI

[28] Google Scholar

[ref9] 9. Aji BS, Indwiarti, Rohmawati AA. Forecasting Number of COVID-19 Cases in Indonesia with ARIMA and ARIMAX Models. In: 2021 9th International Conference on Information and Communication Technology (ICoICT); 2021. p. 71–75.

[ref10] 10. Taylor SJ, Letham B. Forecasting at scale. PeerJ Preprints. 2017;5:e3190v2. https://doi.org/10.7287/peerj.preprints.3190v2
View Article
Google Scholar

[31] View Article

[32] Google Scholar

[ref11] 11. Roy S, Bhunia GS, Shit PK. Spatial prediction of COVID-19 epidemic using ARIMA techniques in India. Modeling Earth Systems and Environment. 2021;7(2):1385–1391. pmid:32838022
View Article
PubMed/NCBI
Google Scholar

[34] View Article

[35] PubMed/NCBI

[36] Google Scholar

[ref12] 12. Zhang X, Zhang T, Young AA, Li X. Applications and comparisons of four time series models in epidemiological surveillance data. PLOS ONE. 2014;9(2):e88075–e88075. pmid:24505382
View Article
PubMed/NCBI
Google Scholar

[38] View Article

[39] PubMed/NCBI

[40] Google Scholar

[ref13] 13. Grahn T. A Conditional Least Squares Approach to Bilinear Time Series Estimation. Journal of Time Series Analysis. 1995;16(5):509–529.
View Article
Google Scholar

[42] View Article

[43] Google Scholar

[ref14] 14. Ho SL, Xie M, Goh TN. A comparative study of neural network and Box-Jenkins ARIMA modeling in time series prediction. Computers & Industrial Engineering. 2002;42(2):371–375.
View Article
Google Scholar

[45] View Article

[46] Google Scholar

[ref15] 15. Galbraith JW, Zinde-Walsh V. On the distributions of Augmented DickeyâFuller statistics in processes with moving average components. Journal of Econometrics. 1999;93(1):25–47.
View Article
Google Scholar

[48] View Article

[49] Google Scholar

[ref16] 16. Koehler AB, Murphree ES. A Comparison of the Akaike and Schwarz Criteria for Selecting Model Order. Journal of the Royal Statistical Society Series C (Applied Statistics). 1988;37(2):187–195.
View Article
Google Scholar

[51] View Article

[52] Google Scholar

[ref17] 17. Aho K, Derryberry D, Peterson T. Model selection for ecologists: the worldviews of AIC and BIC. Ecology. 2014;95(3):631–636. pmid:24804445
View Article
PubMed/NCBI
Google Scholar

[54] View Article

[55] PubMed/NCBI

[56] Google Scholar

[ref18] 18. Benvenuto D, Giovanetti M, Vassallo L, Angeletti S, Ciccozzi M. Application of the ARIMA model on the COVID-2019 epidemic dataset. Data in Brief. 2020;29:105340. pmid:32181302
View Article
PubMed/NCBI
Google Scholar

[58] View Article

[59] PubMed/NCBI

[60] Google Scholar

[ref19] 19. Kaliappan J, Srinivasan K, Mian Qaisar S, Sundararajan K, Chang CY, C S. Performance Evaluation of Regression Models for the Prediction of the COVID-19 Reproduction Rate. Frontiers in Public Health. 2021;9:1319. pmid:34595149
View Article
PubMed/NCBI
Google Scholar

[62] View Article

[63] PubMed/NCBI

[64] Google Scholar

Figures

Abstract

Introduction

Materials and methods

Ethics statement

COVID-19 cases data

Google mobility reports data

Vaccination data

Architecture of the proposed model

Facebook’s prophet algorithm

ARIMA model

Results

Prophet model results

SARIMA model results

SARIMAX models results using Google mobility trends data as predictors

SARIMAX model results using vaccination as predictor

Conclusion and future work

Supporting information

S1 File.

Acknowledgments

References