Assessment of the outbreak risk, mapping and infestation behavior of COVID-19: Application of the autoregressive and moving average (ARMA) and polynomial models

Infectious disease outbreaks pose a significant threat to human health worldwide. The outbreak of pandemic coronavirus disease 2019 (COVID-19) has caused a global health emergency. Identification of regions with a high risk of COVID-19 outbreak is a major priority of governmental organizations and epidemiologists worldwide. The aims of the present study were to analyze the risk factors of coronavirus outbreak and to identify areas with a high risk of human infection with the virus in Fars Province, Iran. A geographic information system (GIS)-based machine learning algorithm (MLA), the support vector machine (SVM), was used to assess the outbreak risk of COVID-19 in Fars Province, Iran. The daily observations of infected cases were tested in the third-degree polynomial and the autoregressive and moving average (ARMA) models to examine the patterns of virus infestation in the province and in Iran. The results of disease outbreak in Iran were compared with the data for Iran and the world. Sixteen effective factors, including minimum temperature of coldest month (MTCM), maximum temperature of warmest month (MTWM), precipitation in wettest month (PWM), precipitation of driest month (PDM), distance from roads, distance from mosques, distance from hospitals, distance from fuel stations, human footprint, density of cities, distance from bus stations, distance from banks, distance from bakeries, distance from attraction sites, distance from automated teller machines (ATMs), and density of villages, were selected for spatial modelling. The predictive ability of the SVM model was assessed using the receiver operating characteristic area under the curve (ROC-AUC) validation technique. The validation outcome reveals that SVM achieved AUC values of 0.786 (March 20), 0.799 (March 29), and 86.6% (April 10), a good prediction of change detection. The average growth rate (GR) for active cases in Fars over a period of 41 days was 1.26, whilst it was 1.13 in the country and the world. The results of the third-degree polynomial and ARMA models revealed an increasing trend for GR with evidence of a turning point, demonstrating that extensive quarantines have been effective. The general trends of virus infestation in Iran and Fars Province were similar, although explosive growth of infected cases is expected in the country. The results of this study might assist in better planning of COVID-19 prevention and control, and gaining this sort of predictive capability would have wide-ranging benefits.

A dataset of active cases of COVID-19 in Fars was prepared to analyse the relationships between the locations of active cases and the effective factors that may be useful for predicting outbreak risk. The data utilized in this research were collected on April 10, 2020 from the Iranian Ministry of Health and Medical Education (IMHME). Choosing the appropriate effective factors to predict the risk of pandemic spread is vital, and for COVID-19 distribution the selection of effective factors is quite a challenging task. Ongoing research on the pandemic has revealed that local and community-wide transmission [...]
All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted April 30, 2020. https://doi.org/10.1101/2020.04.28.20083998 doi: medRxiv preprint
[...] purposes in various fields [29]. It was first proposed by Hoerl and Kennard [30] and exploits the L2 norm of regularization to lessen model complexity and control overfitting. Ridge regression was also developed to avoid the excessive instability and collinearity problems caused by the least-squares estimator [31]. The 'caret' package (https://cran.r-project.org/web/packages/caret/caret.pdf) of R 3.5.3 was utilized for assessing variable importance using ridge regression.
[...] [32], which is utilized for classification as well as regression problems [33][34]. SVM has a high efficacy in classifying both linearly separable and inseparable data classes [35]. It utilizes an optimal hyperplane to distinguish linearly separable data, whereas kernel functions are employed to transform inseparable data into a higher-dimensional space so that they can be easily categorized [36]. Assume a calibration dataset (s_m, t_m), where m = 1, 2, 3, …, x; s_m refers to the sixteen independent factors; t_m takes the values 0 and 1, which represent the risk and non-risk classes; and x represents the total number of calibration data. The algorithm tries to obtain an optimal hyperplane for classifying the aforementioned classes by utilizing the distance between them, which can be formulated as follows [37]: [...] where p denotes the norm of the normal to the hyperplane and a refers to a constant. When Lagrangian [...] for creating decision boundaries, and the kernel function is expressed as follows [32]: [...] where K(z_a, z_b) refers to the kernel function and υ represents its parameter.
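For orientation, the textbook hard-margin SVM problem that matches the symbol definitions above (p for the hyperplane normal, a for the constant, (s_m, t_m) for the calibration data) can be sketched as follows. This is an assumed standard form rather than the authors' exact equations: it presumes the class labels are recoded from {0, 1} to {-1, +1}, and it presumes a radial basis function (RBF) kernel, which the text does not name explicitly.

```latex
% Assumed textbook hard-margin SVM (labels recoded so that t_m \in \{-1, +1\})
\min_{p,\,a} \; \frac{1}{2}\lVert p \rVert^{2}
\quad \text{subject to} \quad
t_m \left( p \cdot s_m + a \right) \ge 1, \qquad m = 1, \dots, x

% Assumed RBF kernel with parameter \upsilon
K(z_a, z_b) = \exp\!\left( -\upsilon \, \lVert z_a - z_b \rVert^{2} \right)
```

Under these assumptions, a larger υ makes the kernel more local, so the decision boundary follows individual calibration points more closely.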

Analysis of growth rate for active and death cases of COVID-19
In this study, the growth rate (GR) of active and death cases around the world, in Iran, and in Fars Province was evaluated using data acquired from WHO and IMHME between February 26, 2020 and April 10, 2020 for active cases and from March 3, 2020 to April 10, 2020 for death cases.
The cross-checking of the calibrated model using untouched testing data is vital for determining the scientific robustness of the prediction [33]. In this research, we utilized ROC-AUC [...] were 1.0-0.9, 0.9-0.8, 0.8-0.7, 0.7-0.6 and 0.6-0.5, respectively [39].
The behavior of the infection cases variable was captured by a third-degree polynomial (cubic) specification as follows: [...] where [...] represents the total infected cases on day t and t denotes the day. In the literature, this form of specification has been applied by Aik et al. [40] to examine Salmonellosis incidence in Singapore. We also used an ARMA model to [...] of order (p, q) can be written as [41]: [...] where x is the dependent variable and ε is the white-noise stochastic error term. In the applied model, x is the total infected cases and t is the day, counted from the first day of the epidemiological trend of COVID-19.
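In their standard textbook forms, a cubic trend and an ARMA(p, q) process can be written as below. This is a sketch consistent with the definitions above, not necessarily the authors' exact notation: y_t (total infected cases on day t), the coefficients β_0 to β_3, the error term u_t, the constant c, and the AR and MA coefficients φ_i and θ_j are assumed symbols, since only x, t and ε are defined in the text.

```latex
% Cubic trend; y_t, \beta_i and u_t are assumed symbols
y_t = \beta_0 + \beta_1 t + \beta_2 t^{2} + \beta_3 t^{3} + u_t

% ARMA(p, q); c, \varphi_i and \theta_j are assumed symbols
x_t = c + \sum_{i=1}^{p} \varphi_i \, x_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
```

With p = 2 and q = 1 the second expression reduces to two lagged values of x and one lagged error term.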

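The ROC-AUC validation mentioned above can be illustrated with a minimal sketch. It computes the area under the ROC curve through the pairwise-ranking identity (the probability that a randomly chosen positive location scores higher than a randomly chosen negative one). The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise ranking of positives and negatives.

    labels: 1 for a location with an observed case, 0 otherwise.
    scores: predicted risk for each location (higher means riskier).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    # Count positive/negative pairs that are ranked correctly; ties count one half.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data, not taken from the study.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The double loop is quadratic in the number of samples, which is acceptable for a sketch; a rank-based implementation would be preferable for large validation sets.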
The analysis of variable importance using ridge regression revealed that distance from bus stations, distance from hospitals, and distance from bakeries have the highest significance, whereas distance from ATMs, distance from attraction sites, distance from fuel stations, distance from mosques, distance from roads, MTCM, density of cities and density of villages [...]
(Figure legend, partially recovered) [...] distance from hospitals; bakery: distance from bakeries; atm: distance from ATMs; attraction: distance from attraction sites; fuel: distance from fuel stations; mosque: distance from [...]
The COVID-19 outbreak risk map generated using SVM displays that the risk of SARS-CoV-2 [...] Fars Province, which is likely to experience a higher risk of COVID-19 outbreak (Fig 5, a-b). Khorrambid, Rostam, Larestan and Kazeron of Fars Province have the highest risk of being the epicentre of the SARS-CoV-2 outbreak. Apart from these, counties like Eghlid and Fasa also lie in the high-risk zone.
Our results displayed that the highest number of active cases in the world, Iran, and Fars Province was [...] Also, the outcome stated that the GR average of active cases in the world, Iran, and Fars Province [...] March 28 to April 4, and April 5 to April 8, the GR of death cases was equal to zero.

Although the deaths on March 31, April 3, April 7, and April 10 were 3, 2, 4, and 1, respectively, the daily growth rate is zero. Also, the average of the GR in Fars Province during [...] cumulative rate of active cases, whereas the highest rate was observed in Qom, Semnan, [...] outbreak of COVID-19 was recorded. [...] and Fars Province it is related to March 14 and March 31, respectively. Table 1 shows that the age class > 50 years old lies in the highest class of death rate. So, this age class of above 50 years is highly sensitive to COVID-19.
When tested with active case locations on March 29, 2020, the model achieved an increased [...] map (Fig 10 and Table 3). Also, change detection on April 10, 2020 shows that the accuracy of the built models increased to 86.6% (AUC=0.868) (Fig 11 and Table 4).
[...] Iran. The first one is a third-degree polynomial model that is presented in Fig 12. Another quantitative model is an ARMA model presented in Table 5. Fig 12 shows [...] The infection cases are increasing over the selected horizon.
The first derivative of the estimated model, which turns it into a second-degree polynomial, [...] noting that a turning point means that after passing the peak the series is expected to show a decreasing trend. On the 38th day of infection, Fars Province accounts for around 2.84% of the total Iranian cases while its population share is more than 6% (Statistical Center of Iran, [...]) [...] taken by the provincial government may be considered more effective than those taken in [...] outbreak. It is worth noting that the comparison of the specified models is more appropriate [...]
The ARMA time series models for the infection variables of Fars Province and Iran are presented in Table 5. These models may show the generating process of the variables over the time horizon. It is worth noting that, in order to have more comparable models, a 38-day time horizon was selected. This is the period for which data are available, starting on 19th of February for Iran and one week later for Fars Province. As shown in Table 5, both series are generated by an ARMA(2, 1) process. However, the absolute values of the AR terms for Fars Province are lower than those of Iran, indicating a slower increasing trend for Fars Province compared to that of Iran. However, regarding the values for AR roots, the [...] that COVID-19 spread tends to reveal slightly decreasing spread. In addition, [...] Heteroscedasticity (ARCH) effects were found to be insignificant in both models, indicating that the infection cases tend to show insignificant fluctuations. This is a fact that is not easily captured in the trends shown in Fig 12.
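The turning-point reasoning for the cubic model can be sketched in a few lines: differentiating a third-degree polynomial gives a second-degree polynomial, and its real roots are the candidate turning points. The function names and the coefficients below are hypothetical illustrations, not the estimates reported in the paper.

```python
import math
from typing import List, Sequence


def cubic_turning_points(coeffs: Sequence[float]) -> List[float]:
    """Real roots of the derivative of y(t) = b0 + b1*t + b2*t**2 + b3*t**3.

    The derivative is the quadratic b1 + 2*b2*t + 3*b3*t**2; its real roots
    are the days on which the fitted curve changes direction.
    """
    _, b1, b2, b3 = coeffs
    a, b, c = 3.0 * b3, 2.0 * b2, b1
    if a == 0.0:
        # Degenerate case: the derivative is linear (or constant).
        return [] if b == 0.0 else [-c / b]
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return []  # no real turning point: the curve is monotonic
    root = math.sqrt(disc)
    return sorted({(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)})


def is_peak(coeffs: Sequence[float], t: float) -> bool:
    """A turning point is a peak when the second derivative is negative there."""
    _, _, b2, b3 = coeffs
    return 2.0 * b2 + 6.0 * b3 * t < 0.0


# Hypothetical coefficients: y = 24t + 3t^2 - t^3, so y' = -3(t - 4)(t + 2).
coeffs = (0.0, 24.0, 3.0, -1.0)
points = cubic_turning_points(coeffs)
peaks = [t for t in points if is_peak(coeffs, t)]
print(points, peaks)
```

Only turning points that fall inside the observed time horizon and have a negative second derivative correspond to a peak followed by a decreasing trend.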
Generally speaking, the diagnostic statistics indicate that the estimated models are acceptable, since the Q-statistics reveal that the residuals are not [...] inside the unit circle, indicating that the ARMA process is (covariance) stationary and invertible.
There is a great necessity for new robust scientific outcomes that could aid in containing [...] and malaria. The ability to classify inseparable data classes is the greatest benefit of the SVM model [...] [51] demonstrated that SVM also yields excellent precision in predictive modelling when a large dataset is utilized. The algorithm has a very low probability of overfitting and is not [...] whereas other proximity factors such as distance from ATMs, distance from attraction sites, [...] high influence in the transmission of SARS-CoV-2. In addition, the study conducted [...] disclosed that an increase in temperature will not reduce SARS-CoV-2 cases, although it has also been revealed that an increase in temperature and absolute humidity could decrease the deaths of patients affected by [...]. A third-degree polynomial and ARMA models [...]
[...] Watanabe, K. Machine learning methods reveal the temporal pattern of dengue incidence [...]
[...] CoV-2) epidemic and associated events around the world: how 21st century GIS technologies [...]
[...] Ariyoshi, K. Population density, water supply, and the risk of dengue fever in Vietnam: [...]