Spatial and temporal dynamics of leptospirosis in South Brazil: A forecasting and nonlinear regression analysis

Although leptospirosis is endemic in most Brazilian regions, South Brazil shows the highest morbidity and mortality rates in the country. The present study aimed to analyze the spatial and temporal dynamics of leptospirosis cases in South Brazil to identify the temporal trends and high-risk areas for transmission and to propose a model to predict the disease incidence. An ecological study of leptospirosis cases in the 497 municipalities of the state of Rio Grande do Sul, Brazil, was conducted from 2007 to 2019. The spatial distribution of disease incidence in southern Rio Grande do Sul municipalities was evaluated, and a high incidence of the disease was identified using the hotspot density technique. The trend of leptospirosis over the study period was evaluated by time series analyses using a generalized additive model and a seasonal autoregressive integrated moving average (SARIMA) model to predict its future incidence. The highest incidence was recorded in the Centro Oriental Rio Grandense and Metropolitana de Porto Alegre mesoregions, which were also identified as clusters with a high incidence and high risk of contagion. The analysis of the incidence time series identified peaks in the years 2011, 2014, and 2019. The SARIMA model predicted a decline in incidence in the first half of 2020, followed by an increase in the second half. Thus, the developed model proved to be adequate for predicting leptospirosis incidence and can be used as a tool for epidemiological analyses and healthcare services. Temporal and spatial clustering of leptospirosis cases highlights the demand for intersectorial surveillance and community control policies, with a focus on reducing the disparity among municipalities in Brazil.


Funding: The author(s) received no specific funding for this work.
Data availability: All data are fully available without restriction.

Introduction
The febrile illness leptospirosis is one of the most widespread emerging zoonoses worldwide (Adler and Moctezuma, 2010). It is readily transmitted among individuals living in conditions of social and environmental vulnerability (Clazer et al., 2015; Gonçalves et al., 2016; Buffon, 2018). In Brazil, it therefore constitutes a relevant public health problem, with social, health and economic impacts, high hospital costs, lost working days and high lethality (Brasil, 2019; Martins and Spink, 2020). The etiological agent of the infection is Leptospira sp., which can survive in varied environments for prolonged periods. Pathogenic species affect humans and wild, domestic and synanthropic animals, which can become carriers and contribute to the spread of the microorganism in nature (Benacer et al., 2016).
In Brazil, leptospirosis is endemic and present in all regions of the country. Incidence is high: between 2007 and 2016 there was an annual average of 3,926 confirmed cases and a lethality of 8.9%, with the highest numbers of cases registered in the Southeast and South regions (Brasil, 2018; Marteli et al., 2020; Galan et al., 2021).
Leptospirosis is easily overlooked and relatively little is known about it because few studies address the disease, which is why it is considered a neglected disease. Reliable data on its incidence and prevalence in different areas are therefore still scarce (World Health Organization, 2003). We consider that analyzing real data on the incidence of confirmed leptospirosis cases, collected from the information systems of official disease surveillance in Brazil, can help to elucidate the difficulties in control and to understand the dynamics of this endemic disease in the country. Such observations can help health managers design public health actions for prevention and health promotion (Martinez and Silva, 2011; Souza, Uberti and Tassinari, 2020).
Surveillance data on transmissible diseases are useful to managers for monitoring daily incidence trends, and statistical analysis extends their usefulness. Such analysis can reveal gaps in the notification system and identify the peak periods of disease incidence within a season. In addition, predictive analytics makes it possible to estimate the future burden of the disease and to anticipate an epidemic (Promprou, Jaroensutasinee and Jaroensutasinee, 2006; Lal et al., 2012; Ho and Ting, 2015; Kapagunta and Chetty, 2021).
Thus, the objective of this paper was to analyze the spatial and temporal distribution of the leptospirosis incidence rate between 2007 and 2019 in Rio Grande do Sul, Brazil, and to predict the expected incidence of the disease in a subsequent period. Based on the results, we expect to contribute to the development of more effective strategies for leptospirosis control in a region of high incidence such as southern Brazil.

Methods
An ecological study was carried out based on data collected in the routine surveillance of leptospirosis in the state of Rio Grande do Sul between 2007 and 2019. The dynamics of the disease in the state were analyzed through spatial and temporal statistical analyses, and an incidence prediction model for subsequent years was built.

Study location
Rio Grande do Sul is the southernmost state in Brazil and is divided into 497 municipalities. It had an estimated population of 11,422,973 inhabitants in 2020, an area of 281,730.149 km², and a population density of 39.79 inhabitants/km².

Statistical analysis
An ecological study combining time series and spatial analysis was developed, which makes it possible to identify geographical and temporal patterns of the disease in Rio Grande do Sul. Using leptospirosis data from 13 years of observation, incidence rates (per 100,000 inhabitants) and lethality (%) were calculated for each year of interest using the population projections of the Instituto Brasileiro de Geografia e Estatística (IBGE, 2021).
These indicators were mapped annually for all municipalities in Rio Grande do Sul in order to verify the existence of clusters and thereby analyze the disease risk in different regions of the state. In addition, incidence rates were analyzed using temporal regression models and forecast using SARIMA modeling.
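The two indicators can be illustrated with a minimal Python sketch. The function names and the municipality record are hypothetical and serve only to make the arithmetic explicit; the study's own calculations were not published as code.

```python
def incidence_per_100k(cases: int, population: int) -> float:
    """Incidence rate: confirmed cases per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000


def lethality_percent(deaths: int, cases: int) -> float:
    """Lethality: deaths as a percentage of confirmed cases."""
    if cases == 0:
        return 0.0  # assumption: with no confirmed cases, lethality is reported as 0
    return deaths / cases * 100


# Hypothetical municipality-year record (illustrative numbers only)
rate = incidence_per_100k(cases=12, population=300_000)
lethality = lethality_percent(deaths=1, cases=12)
print(f"{rate:.2f} cases per 100,000 inhabitants; lethality {lethality:.1f}%")
```

Applied to the state totals reported in the Results (238 deaths among 4,760 confirmed cases), the second function gives the 5% lethality stated there.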

Spatial clusters
Hotspot analysis evaluates the distribution of events in a given area. It makes it possible to visually identify where the event is concentrated and to indicate areas where the phenomenon occurs frequently. This analysis was performed to detect spatial clusters with a concentration of high leptospirosis incidences, called hotspots, between 2010 and 2019, using the Getis-Ord Gi statistic (Anselin, 1995).
Hotspots are observed as central points where the analyzed event is most intense in the study area. For each central point, a circular distance is considered and all points within that radius of influence are scored, which smooths the event (Marteli et al., 2020; Chaiblich et al., 2017). To visualize the hotspot analysis and the areas with the highest leptospirosis incidence, heat maps were constructed (Qgis, 2021).
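As an illustration of the scoring within a radius of influence, the sketch below computes the Getis-Ord Gi* z-score with binary fixed-distance weights in plain Python. It is not the QGIS procedure used in the study: the weighting scheme, the function name and the 5 x 5 grid of rates are assumptions chosen only to show how a local concentration of high values produces a high score.

```python
import math


def getis_ord_gi_star(values, coords, radius):
    """Return the Gi* z-score of every location.

    Assumed weights: w_ij = 1 when location j lies within `radius` of
    location i (i itself included, which is what the star denotes), else 0.
    """
    n = len(values)
    mean = sum(values) / n
    # Global standard deviation S used in the Gi* denominator
    s = math.sqrt(max(0.0, sum(v * v for v in values) / n - mean * mean))
    scores = []
    for xi, yi in coords:
        weights = [
            1.0 if math.hypot(xi - xj, yi - yj) <= radius else 0.0
            for xj, yj in coords
        ]
        w_sum = sum(weights)
        w_sq_sum = sum(w * w for w in weights)
        local_sum = sum(w * v for w, v in zip(weights, values))
        numerator = local_sum - mean * w_sum
        denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        # A zero denominator (constant values or all-inclusive radius) carries no signal
        scores.append(numerator / denominator if denominator > 0 else 0.0)
    return scores


# Hypothetical 5 x 5 grid of municipality centroids with one corner of high rates
coords = [(x, y) for x in range(5) for y in range(5)]
hot_cells = {(0, 0), (0, 1), (1, 0), (1, 1)}
rates = [10.0 if c in hot_cells else 1.0 for c in coords]

z_scores = getis_ord_gi_star(rates, coords, radius=1.0)
hotspots = [c for c, z in zip(coords, z_scores) if z > 1.96]
print(hotspots)
```

Locations whose z-score exceeds 1.96 are flagged here as hotspots at the conventional 5% level; a heat map displays the same idea as a continuous surface.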

Spatial statistical analyses were performed using the geographic information system Quantum GIS (QGIS), version 3.4.7.

Generalized additive models (GAM)
Generalized additive models (GAM) are an extension of the generalized linear model, as proposed by Hastie and Tibshirani (1987) and Hastie and Tibshirani (1990). They offer an alternative for modeling nonlinear relationships that do not have a defined shape (Conceição et al., 2001). They are based on non-parametric functions, called smoothing curves, in which the shape of the association is defined by the data (Hastie and Tibshirani, 1990; Conceição et al., 2001). GAMs provide a framework for generalizing a general linear model by allowing nonlinear functions of the variables to be combined additively (Ravindra et al., 2019).
In a GAM, the linear terms of the predictor are replaced by smooth functions of the explanatory variables. With this replacement, it is not necessary to assume a linear relationship between g(µi) and the explanatory variables, nor to know the form of the relationship in advance; it can be estimated from the data (Conceição et al., 2001). The estimated function, also called a smoothed curve, is essentially an average of the Yi values in the vicinity of a given xi value. It describes the shape of the relationship and can reveal possible nonlinearities, since it does not have the rigid structure of a parametric function (Conceição et al., 2001).
To select the best GAM, we followed Baquero et al. (2018), starting from a Poisson likelihood and cubic splines with 5 knots on all predictors. We kept the Poisson likelihood (equivalent models were also trained with negative binomial and Gaussian likelihoods) and tuned the type of spline (shrinkage cubic or cubic) and the upper limit on the degrees of freedom (df) associated with the spline (k = df-1 = {3,4,5,6,7,8}) (Baquero et al., 2018).
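The idea of a smoothed curve as a local average can be sketched with a simple running-mean smoother. This is only an illustration of the principle: the splines fitted by GAM software are penalized regression splines, not this estimator, and the function name, bandwidth and rates below are hypothetical.

```python
def local_mean_smoother(x, y, bandwidth):
    """Estimate the curve at each x0 as the mean of y over points with |x - x0| <= bandwidth."""
    fitted = []
    for x0 in x:
        # The point itself is always inside its own window, so the list is never empty
        neighbours = [yj for xj, yj in zip(x, y) if abs(xj - x0) <= bandwidth]
        fitted.append(sum(neighbours) / len(neighbours))
    return fitted


# Hypothetical yearly incidence rates (illustrative numbers only)
years = list(range(2007, 2020))
rates = [3.1, 3.4, 3.0, 3.8, 5.2, 4.1, 3.9, 5.6, 4.4, 3.7, 3.5, 4.0, 5.9]

smoothed = local_mean_smoother(years, rates, bandwidth=1)
for year, value in zip(years, smoothed):
    print(year, round(value, 2))
```

A wider bandwidth gives a smoother and flatter curve, which plays a role loosely comparable to lowering the degrees of freedom allowed for a spline.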
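The general formula of the GAM with Poisson likelihood was:

$$y_i \sim \mathrm{Poisson}(\mu_i), \qquad \log(\mu_i) = Z_i = \beta_0 + \sum_{j} s_j(x_{ij}, k)$$

where $y_i$ is observation $i$, $Z_i$ is the linear predictor for observation $i$, $\beta_0$ is the intercept, $s_j$ is the spline for predictor $j$ and $k$ is the number of knots (Baquero et al., 2018).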
GAMs allow a wide range of distributions for the response variable, as well as link functions to measure the effects of the predictor variables on the response, as reported by McCullagh and Nelder (1984) and Hastie and Tibshirani (1990). In this study, GAMs were fitted to assess the form of the association between leptospirosis occurrence and time. Crude rates for Rio Grande do Sul were modeled. The response variable was the number of reported and confirmed cases, assumed to follow a Poisson distribution. The linear predictor consisted of the notification year (2007-2019) with a smoothing function (spline), and the offset term was the natural logarithm of the population exposed in each notification year. The analyses were performed using R software and the "mgcv" and "tidyverse" packages.
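Read literally, the description above corresponds to a Poisson model with a log link, one smooth term for the year and a population offset. The following is one way to write it; the subscript $t$ for the notification year and the symbol $\mathrm{pop}_t$ are our notation, not the authors'.

```latex
y_t \sim \mathrm{Poisson}(\mu_t), \qquad
\log(\mu_t) = \beta_0 + s(\mathrm{year}_t) + \log(\mathrm{pop}_t)
\quad\Longleftrightarrow\quad
\log\!\left(\frac{\mu_t}{\mathrm{pop}_t}\right) = \beta_0 + s(\mathrm{year}_t)
```

The second form shows why the offset is used: with its coefficient fixed at 1, the smooth term describes the logarithm of the incidence rate rather than the raw case count.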

SARIMA modelling
Time series analysis was used to describe the behavior of leptospirosis between 2007 and 2019 and to forecast the disease incidence in 2020. The components of a time series analysis are the evaluation of trend, seasonality, cyclical variation, association and random variation. Assessing them makes it possible to understand the underlying processes over time and thus provides important information for planning public health policies for the disease (Antunes, 2015).
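In SARIMA(p,d,q)(P,D,Q) notation, d and D count the regular and seasonal differences applied to make the series stationary, and the autocorrelation function (ACF) is one of the usual diagnostics for choosing the remaining orders. The sketch below shows both operations in plain Python on a hypothetical monthly series; the analysis in this study was carried out in R, and the function names and data here are assumptions for illustration. Formal stationarity tests such as the ADF test are not implemented.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag]; lag=12 is a seasonal difference for monthly data."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def autocorrelation(series, max_lag):
    """Sample autocorrelation function for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centred = [v - mean for v in series]
    variance = sum(c * c for c in centred)
    if variance == 0:
        return [1.0] + [0.0] * max_lag  # constant series: no correlation structure
    return [
        sum(centred[t] * centred[t - k] for t in range(k, n)) / variance
        for k in range(max_lag + 1)
    ]


# Hypothetical monthly incidence: linear trend plus a repeating 12-month pattern
pattern = [9, 8, 7, 5, 4, 3, 3, 3, 4, 5, 6, 8]
series = [0.5 * t + pattern[t % 12] for t in range(48)]

seasonal_diff = difference(series, lag=12)  # removes the repeating pattern
stationary = difference(seasonal_diff)      # removes what is left of the trend

print(seasonal_diff[:3], stationary[:3])
print([round(r, 2) for r in autocorrelation(series, 12)])
```

In this artificial series, one seasonal difference leaves only the constant yearly increase and a further regular difference reduces it to zero; real surveillance data retain random variation after differencing, which is what the autoregressive and moving average terms then model.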

Results
From 2010 to 2019, 4,760 cases and 238 deaths from human leptospirosis were confirmed in Rio Grande do Sul. The average incidence rate was 4.06 cases per 100,000 inhabitants and the lethality rate was 5%. The spatial distribution of leptospirosis incidence and lethality rates is shown in Figure 1. High leptospirosis incidence rates were observed in several regions of the state, particularly in the Metropolitana, Vales and Norte macroregions. In addition, 220 municipalities did not confirm any case in the entire period, while deaths from leptospirosis were recorded in 59 municipalities, with the highest number of deaths recorded in the capital, Porto Alegre (60 deaths, LR 8.9%).
[...] April (P<0.05) (Figure 3B). After differencing the time series, the stationarity required for SARIMA modeling was confirmed using the ADF test, in addition to the ACF and PACF (Figure 3B).

Discussion
[...] (Barcellos et al., 2003; Galan et al., 2021), which demonstrates the maintenance of incidence clusters in the state over the last decades.
In Brazil, leptospirosis is an endemic disease distributed across all regions, with a higher prevalence in the South and Southeast regions (Marteli et al., 2020; Galan et al., 2021). In urban areas, the risk of leptospirosis transmission is greatest in areas with poor sanitation infrastructure and precarious housing that are prone to natural flooding caused by rain (Chaiblich et al., 2017). The metropolitan macroregion of Porto Alegre, which has constituted a cluster of cases in the last decade, concentrates 38.2% of the total population of the state (Rio Grande do Sul, 2020) and is a low-altitude area where cities often experience periodic flooding, which can facilitate disease transmission in the form of outbreaks (Schneider et al., 2015).
In southern Brazil, leptospirosis incidence in rural areas is twice as high as in urban areas (Galan et al., 2021). About 50% of the municipalities in Rio Grande do Sul are considered at risk for the disease, most of which are critical areas for leptospirosis (Schneider et al., 2015). Rural cases of leptospirosis in the state are reported to be concentrated in the central region, which produces tobacco, and in the south, which produces rice (Schneider et al., 2015), with a higher incidence among males (Galan et al., 2021).
The literature reports a higher incidence of the disease in regions where rural workers are at risk, which draws attention to the danger of infection in places where synanthropic animals are present and where there is exposure to production animals that may be infected. Even with worker health surveillance efforts, it remains very difficult to get rural workers to accept the use of personal protective equipment.
Thus, preventive strategies are increasingly necessary and must be intensified through educational campaigns for this sector.
In the present study, leptospirosis seasonality was observed, with a higher incidence in the hottest and wettest months. Warnasekara et al. (2021), using SARIMA time series analysis, observed a higher leptospirosis incidence in wetlands in Sri Lanka, which also had higher rainfall and more rainy days per month. They also attribute the difficulty Leptospira has in surviving for long periods to low temperatures, high altitude and high solar radiation. In Rio de Janeiro, Guimarães et al. (2014) observed leptospirosis seasonality from 2007 to 2012, with a higher concentration of cases in the summer and an increasing trend in the number of cases in 2008-2010 due to increased rainfall. Through time series analysis, they suggested that there is a time lag between the precipitation peak and the appearance of symptoms, with the number of cases beginning to increase about a month after the rainy season. This information is crucial for planning preventive or assistance actions for exposed communities.
Understanding temporal dynamics and predicting infectious disease outbreaks using time series, particularly with SARIMA and GAM models, is the goal of several researchers (Martinez and Silva, 2011; Warnasekara et al., 2021). GAMs are often used to analyze associations that do not follow a linear pattern, which is usually the case for the temporal evolution of infectious diseases.
Thus, in this study, the GAM, which includes non-linear effects of time on leptospirosis incidence, allowed a better visualization of the temporal dynamics of the disease in Rio Grande do Sul, based on the identification of temporal trends and significant occurrence peaks in southern Brazil, mainly in 2011, 2014 and 2019. For 2020, the SARIMA model predicted a fall in incidence in the first half of the year, followed by an increase in the second half.
The estimation of future cases based on predictive models such as SARIMA (1,2,1) (1,0,1), which offer an acceptable fit, can be highlighted as one of the tools available to face the challenges of surveillance of leptospirosis and other infectious diseases in Brazil. The autoregressive and moving average parameters showed that the number of leptospirosis cases in future periods can be estimated from the number of cases that occurred in previous months. The proximity between the observed incidence in 2019 (5.9 cases / 100,000 inhabitants) and the SARIMA prediction (3.9 cases / 100,000 inhabitants; 95% CI = 1.9-6.7), whose confidence interval contains the observed value, showed that the model can be used to predict leptospirosis cases in the state. Thus, the prediction of cases as an instrument for planning control actions at the municipal level, associated with other determining variables and preventive measures, can highlight priorities in small areas of the urban territory and indicate the adoption of integrated strategies (Martinez and Silva, 2011; Gabriel et al., 2019).
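The comparison between the observed and predicted values can be made explicit with a few lines of Python. The numbers are those reported in the paragraph above; the helper name is hypothetical, and this single-point check is only an illustration, not the validation procedure used in the study.

```python
def within_interval(observed: float, lower: float, upper: float) -> bool:
    """True when the observed value lies inside the prediction interval."""
    return lower <= observed <= upper


# Values reported in the text, in cases per 100,000 inhabitants
observed = 5.9
predicted = 3.9
ci_lower, ci_upper = 1.9, 6.7

absolute_error = abs(observed - predicted)
relative_error = absolute_error / observed * 100
covered = within_interval(observed, ci_lower, ci_upper)

print(f"absolute error: {absolute_error:.1f} cases per 100,000")
print(f"relative error: {relative_error:.1f}%")
print(f"observed value inside the 95% CI: {covered}")
```

A fuller assessment would repeat this check over many held-out periods and summarize the errors, for example with the mean absolute error.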
Using secondary data from surveillance systems entails numerous limitations. The cases in our data were reported by health professionals, health services, and the public, so case reporting was subject to misclassification bias. The municipal epidemiological surveillance service must therefore be attentive, and health services must be sensitive to the notification of this disease, with professionals trained to identify and notify suspected cases in order to ensure early diagnosis and treatment, and active in epidemiological investigation and health education for the population. The limitations of the study were related, among others, to this misclassification bias, especially considering that in almost half of the state's municipalities (220; 44.2%) no leptospirosis cases were reported in any of the 13 evaluated years. However, even with possible notification biases, the analysis of these data is extremely valuable for health agencies, as it is a way to analyze the behavior of various health problems and thus direct efforts and resources to anticipate risk situations and make surveillance and control more effective.
This study evaluated the spatial and temporal dynamics of leptospirosis in Rio Grande do Sul and showed the regions with the highest incidence and the years with the highest infection risk, allowing a satisfactory prediction of disease incidence. The models used offer an approach intended to help understand the occurrence of the disease in an endemic area in southern Brazil. In addition, the prediction model can be made dynamic by including current data, and more complex predictive models can be developed that take into account climatic, environmental and socioeconomic variables, among others, for a more accurate prediction. The model can be applied over other timeframes or at smaller geographic levels to predict leptospirosis incidence. Thus, we consider that the proposed leptospirosis incidence prediction serves as a tool for surveillance services, considering the growing need to clarify the dynamics of this neglected disease in Brazil.