Hierarchical Bayesian Spatio–Temporal Analysis of Climatic and Socio–Economic Determinants of Rocky Mountain Spotted Fever

This study aims to examine the spatio-temporal dynamics of Rocky Mountain spotted fever (RMSF) prevalence in four contiguous states of Midwestern United States, and to determine the impact of environmental and socio–economic factors associated with this disease. Bayesian hierarchical models were used to quantify space and time only trends and spatio–temporal interaction effect in the case reports submitted to the state health departments in the region. Various socio–economic, environmental and climatic covariates screened a priori in a bivariate procedure were added to a main–effects Bayesian model in progressive steps to evaluate important drivers of RMSF space-time patterns in the region. Our results show a steady increase in RMSF incidence over the study period to newer geographic areas, and the posterior probabilities of county-specific trends indicate clustering of high risk counties in the central and southern parts of the study region. At the spatial scale of a county, the prevalence levels of RMSF is influenced by poverty status, average relative humidity, and average land surface temperature (>35°C) in the region, and the relevance of these factors in the context of climate–change impacts on tick–borne diseases are discussed.


Introduction
Rocky Mountain spotted fever (RMSF) is a life-threatening fulminant tick-borne infection caused by the obligate intercellular bacterium Rickettsia rickettsii (Family: Rickettisiaceae, Order: Rickettsiales) [1]. This disease has been reported from most of the lower 48 states in the United States, with onsets typically occurring during the tick season (April through September)

Study area
The spatial extent considered in this study included the contiguous states of Kansas, Missouri, Oklahoma and Arkansas in central United States, which have recorded RMSF cases since the first identification of the disease, and has some of the highest number of incidences in the United States. Climate in the study area is transitional from east to west, and as well as south to north, with the southeastern region receiving progressively more rainfall than the west.

Data
2.2.1 Ethics statement. The RMSF data used in this study was notifiable by individual state health departments to the Centers for Disease Control and Prevention (CDC), and were aggregated annually to their respective administrative units. No individually-identifiable information was collected for this study. The use of RMSF data was approved by the Internal Review Board at Kansas State University's Office of Research Compliance (IRB #7465) and the study was considered exempt from the requirement for full review by the Missouri Department of Health and Senior Services (MDHSS) Institutional Review Board based on 45 CFR 46.101(b)(4).
2.2.2 Epidemiological data. Annual county-level RMSF cases between years 2005 and 2014 were obtained from the state health departments of Kansas, Missouri, Arkansas and Oklahoma. Cases are classified into 'confirmed', 'probable', and 'suspected' categories per CDC guidelines. Case classification for RMSF changed once during the study period in 2008 and included the 'suspected' criteria for diagnosis. For the purposes of this study all three categories were considered to indicate positive RMSF diagnosis. In the case of Missouri, cases were predominantly aggregated at county-level; however, they also included three additional public health administrative units that were subsequently used in the disease mapping and Bayesian spatio-temporal analysis. Population data for these jurisdictions were provided by the MDHSS, and for all other jurisdictions an average value of county population recorded for years 2000 and 2010 by the US Census Bureau was used.
2.2.3 Covariate data. Covariate data for this study was collected from three thematic groups; viz., physical environment, climate, and socio-economic status. For environmental data, the percentage land occupied by different cover types in each county was estimated in a geographical information systems (GIS) environment from the publicly available 2006 National Land Cover Dataset [29] (Table 1). Among climate variables, county averages of annual mean temperature; the maximum normalized vegetation index (NDVI); minimum  (Table 3). Identical census attribute information and geographic boundary files for counties were also obtained from the NHGIS. From the tables, 20 housing and 23 population related variables were extracted for each county by spatial query and joined to the census shapefiles using the common GIS codes.

Statistical analysis
2.3.1 Covariate selection. Candidate explanatory variables to be included in the Bayesian hierarchical models were screened a priori in order to avoid model fitting issues. Several frequentist bivariate regression models were used to evaluate each variable independently and only variables that were significant at a liberal p 0.2 were kept for further analysis. A bivariate regression takes the form, where Y ij is RMSF relative risk, β 0 the intercept coefficient, and β k the coefficient for the explanatory variable vk ij (k = 1,. . ., n) and (i = 1, . . ., 105 county; and j = year 2005, . . ., 2014). Care was taken not to remove candidate variables that were deemed clinically relevant [33]. Among screened variables the presence of multicollinearity was tested by estimating the variance inflation factor (VIF) and all variables with a (VIF ! 10) were considered to indicate multicollinearity, in which case, one of the variables was dropped at a time until multicollinearity was absent. Non-linearity among independent variables was evaluated, and significant variables with nonlinearity were categorized using cutoffs based on scatter-plots.
2.3.2. Bayesian model specification. The observed number of RMSF cases in the study was notated as Y ij among N ij individuals at risk in the population of county i, diagnosed with RMSF in year j. Y ij was assumed to follow a Poisson approximation, where E ij is the expected number of the population at risk for RMSF and θ ij is the relative risk. Although a Binomial approximation could have been used in this context since n is known, relatively only few RMSF infections occur randomly over time with different age groups, whose members vary throughout the study period due to aging; therefore, Poisson distribution was chosen over Binomial since the latter assumes constant population of individuals. And, since RMSF prevalence is disproportionate among different age groups [4], standardized rates were calculated assuming five age classes l, (< 5, 5-19, 20-45, 46-65 and > 65). Mapping crude rates can be non-informative or misleading when population in some areal units are small, resulting in large posterior estimates, which in turn render it difficult to distinguish chance variability from genuine differences. The expected number of RMSF cases was therefore calculated by A logit link function in an extended generalized linear model (GLM) structure was used that incorporated stochastic spatial and temporal functions and as well as different covariate effects. We first fitted a parametric spatio-temporal model with various random terms to act as surrogate indicators of unobserved risk factors that vary over time, space or both. In a second model, the parametric terms were replaced with nonparametric terms to assess any model improvements. In a third step, we extended the second model with different covariates to explain spatio-temporal patterns.
The parametric model was notated as following, Where, α (intercept term) represents the mean prevalence of RMSF in all counties in all years, and u i and v i are random terms accounting for spatially structured variation in RMSF prevalence and unstructured heterogeneity, respectively. No interaction was assumed to exist between u i and v i , and these terms were assigned u i~C AR, and v i $ Normal ð0; s 2 v Þ priors. Spatial dependence in u i was applied by assuming a conditional autoregressive model (CAR)(γ) with a Gaussian distribution, which implies that each u i is conditional on the neighbor u j with variance ðs 2 i Þ dependent on the number of neighboring counties n i of county i, i.e., The γ j term measured the purely temporal and a linear time trend in the data, which assumed no temporal structure a priori. An independent mean-zero normal prior with unknown variance s 2 g was used for γ. In the same model, in order to account for spatio-temporal interaction effects in RMSF prevalence, a spatially structured C ij term modeled as an intrinsic Gaussian Markov random field (IGMRF) [34] was included, with a joint prior density for ψ = (ψ 1 ,. . .,ψ I ) T written as A major limitation with the above described model is the assumption that there is a linear time trend in each region. Therefore, in a second model the linearity assumption was dropped and a nonparametric term, β j was included, whose prior density can be written as The β term quantified the overall time trend for the region. In addition, a nonparametric Type-IV interaction prior [25], [35] was assigned to the C ij term, which is notated as For the extended models with covariate terms, different covariates were included to the second model in several steps, starting with a model that included all covariates screened in the bivariate procedure that retained significance at a liberal (p 0.2) value followed by the removal of one variable at each step. Individual covariates were retained in the model unless their removal resulted in the increase of Deviance Information Criterion (DIC) value by 5 units or more. The removed covariates did not re-enter the model and all covariates were assigned noninformative priors, Normal ð0; s 2 v Þ: The spatiotemporal interaction term, C ij in the covariate model measured the trend after accounting for purely spatial (α + u i + v i ) and purely temporal effects (β ij ) in addition to the covariate effects, and reveals the variation in RMSF incidence trends across counties over time. The differential trend (the difference between overall trend and local trend) of a county i for a given year j can also be estimated from C ij , with C ij > 0 indicating steeper than overall trend and C ij < 0 indicating a less steeper the overall trend. Counties where C ij = 0 are areas where the trends are equal [36].
Model selection criteria in this study used two criteria, the DIC value and the logarithmic score (LS). The former is the tool of Bayesian model choice for selecting the most parsimonious model after penalizing for model complexity [37], however DIC can be problematic in models that consider many random effects [38]. Thus, in addition to DIC the LS of each model was computed to assess predictive quality of models [39], [40], which is represented by LS = −log(π ij ), where π ij = pr(Y ij = y ij | y -ij ) denotes the cross-validated predictive probability. A smaller LS indicates better predictive quality of a model.
All model posterior parameters were estimated using a Bayesian framework implemented using R-INLA software [41], and the median estimates from the posterior distribution and their corresponding uncertainty measures [95% Credible Intervals (CrI)] were recorded.

Results
From January 1, 2005 to December 31, 2014, there were a total of (n = 11,062) confirmed, probable and suspected cases of RMSF reported to the health departments of the four states included in the study. Missouri recorded the most number of cases (n = 3,766) followed by Arkansas (n = 3,271) and Oklahoma (n = 3,271), and Kansas recorded the least number of cases (n = 754) among the four states during this period. Throughout the study period, and in all four states males were diagnosed at a higher rate than females, with a male female ratio averaging 2.2:1 for all states. Also, the disease was more common among the 46-64 year old age group followed by the !65 year old age group. A plot of case numbers in the four states during the study period is illustrated in Fig 1 Of all the variables evaluated in the bivariate screening procedure in this study, five retained significance at a liberal p 0.2 level (Table 4) Bayesian hierarchical models, the posterior Bayes median estimates and their corresponding 95% CrI are present in Table 5. From Table 6, we see that Model-1 has the highest deviance information criterion (DIC) value pointing to the presence of additional space-time structure to the data beyond just purely spatial and temporal main effects. The addition of nonparametric terms to the fixed effect spatio-temporal model markedly reduced the DIC values, and the subsequent addition of covariate terms further improved the spatio-temporal model performance. 'Income in the past 12 months below poverty level' (henceforth, poverty-status), 'average relative humidity' and 'average daytime land surface temperature ! 35°C' were retained in the final, best fitting Bayesian hierarchical covariate model. This showed that the covariate terms are competing to explain the space-time structure in the dataset that are not captured by the main effects alone. Even though the covariate model is not the most parsimonious it was the best among all the models considered, and all interpretations were made based on this model alone.
The posterior Bayes estimates of the final covariate model indicated that all variables except the 'number of houses built in 2005 or before', and 'percent developed-medium intensity area' were significantly positively related to RMSF levels in the four states. Based on the magnitude of posterior median estimates, we observe that the socio-economic variable 'povertylevel' was most important, followed by almost identical influence of 'average relative humidity' and 'average daytime land surface temperature'(! 35°C).  The posterior median and uncertainty levels (95% CrI) of overall time trend, β j is depicted in Fig 2, showing a clearly increasing trend for the region with a slight decrease in the later part of the study. Plots of annual crude rate estimates for RMSF prevalence for individual years over the study period is depicted in Fig 3, while Fig 4 illustrates maps of Bayes smoothed annual RMSF estimated relative risks for the study period after adjusting for the random terms and covariate effects. The estimates are based on the Bayesian geostatistical covariate model with socio-economic and climatic predictors and correspond to the median of the posterior predictive distribution. The counties of high risk were identified by mapping county-specific differential trends (posterior median of C ij ) with values greater than 0 (Fig 5) after accounting for D is the expected deviance, p D is the deviance derived from the expected values of parameters, DIC is the deviance information criterion, and LS is the logarithmic score.

Discussion
The smoothed risk maps of relative risk produced in this study show all areas in the region that are affected by RMSF year after year, and are superior to maps of crude rate estimates, largely owing to the ability of Bayesian models to efficiently borrow information from neighboring areal units, and in tackling the low-number problem commonly associated with epidemiological data in some areal units versus others. A primary concern in this study however was to identify the presence of any overall trend for the region, and to isolate high risk counties where disease trend could be increasing. We therefore specified appropriate random effect terms in a Bayesian framework to identify an overall time trend, and any space-time interaction effect, Spatial Analysis of Rocky Mountain Spotted Fever followed by the addition of different covariates that would explain additional variability in the data due to space-time effect. Although a descriptive plot of the reported numbers of cases to the health departments (see Fig 1) show a declining trend for Missouri, and a more or less stable trend for Kansas, this study has identified a steadily increasing overall time trend for RMSF for all four contiguous states combined during the study period, indicating that RMSF has in fact increased year by year in the overall region, which are not visible when simply the crude numbers are plotted. Maps of counties with positive differential trends (C ij > 0) represent higher risk for RMSF compared to other counties in the region, and can be seen to have increased in area over the years and expanded outwards from a central focus in 2006 bordering all four states towards all the directions in subsequent years. These findings have implications for prevention and management strategies of RMSF. The notable abrupt increase in the number of cases during the 2011-2012 period in Arkansas and Oklahoma could be attributed to a number of factors, including changes to reporting practices employed by the state health departments in their respective states that may have temporarily increased interest in case reporting, and as well as changes to diagnostic methods adopted by laboratories during this period, which may have improved the accuracy levels by which different tick-borne diseases are identified. It is evident through the state health departments of Arkansas and Oklahoma that they made a strong push for improving submission rates for tick-borne disease cases through awareness and education campaigns during the 2010-2011 period. Also, increasingly more diagnostic laboratories in the region are adopting molecular methods based on in vitro amplification procedures for rapidly detecting multiple tick-borne diseases, reasons that could have played a role in the noticed surge. The presence of environmental or climatic linkages behind this surge is also conceivable, but they could not be adequately evaluated in the present study due to data limitations. However, such evaluation of any associations, particularly with climatic patterns is worthy of further consideration. The winter and spring months of the year 2012 was unusually warm and humid in the Midwestern U.S [42], [43]. Even though a prolonged low pressure system over the Midwest lead to a drought in the summer of 2012, conditions earlier in the tick season may have favored a higher population growth and the proliferation of pathogens.
Poverty status of individuals in the four states was a strong predictor of RMSF in this study. In an earlier study we identified poverty status to be a significant predictor of another tickborne disease, human monocytic ehrlichiosis in the same region as well [28]. Such associations of tick-borne diseases with socio-economic conditions are rarely evaluated in North America; however, lower socio-economic conditions were stronger predictors of tick-borne encephalitis in Europe compared to climatic and environmental predictors [23], [44]. This recurring association is perhaps indicative of higher exposure to tick pathogens for low-income individuals, who may work outdoors, and lower literacy levels and less awareness towards preventing tickborne illnesses could be contributing factors.
There is notable interest in understanding the implications of ongoing climate-change on incidences of vector-borne diseases worldwide [10], [14], [45]. Climatic conditions affect arthropod phenology, population dynamics and their distribution over space, and the recent changes noted in shifting spatial range of some tick vectors [46], extended activity period due to relatively warmer conditions during winter [47], and changes to questing behavior all have been suspected to be influenced by climate-change [48], [49]. Whilst a strong emphasis is placed on researching the effect of climate change on disease-spreading arthropod vectors, the detection and then ascribing climate-change to increased disease incidences is however problematic since there are many confounding players involved in such systems. Even though ongoing climate change is conceivably an influential driver in vector-borne disease systems, its relative importance over social, economic and demographic factors is debated [23]. This study further shows that a mixture of factors are important in the spatio-temporal patterns or the eco-epidemiology of tick-borne diseases, including such factors as poverty status and they need to be accounted for in modeling climate-change implications on vector-borne diseases.
The two climatic factors associated with RMSF prevalence in this study, relative humidity and daytime land surface temperature are important climate-change indices, and are intensely monitored by climate-change researchers [50]. Their identification in the present study as important drivers of RMSF at a county-scale is therefore significant. Optimal humidity conditions are vital for tick survival and it is an important delimiter to their spatial distribution [51], and relative humidity can often be seen associated with the survival and abundance of ticks in the literature, with higher humidity conditions often favoring the long-term survival of some ticks species' life stages through dry seasons [48] among other reasons. The higher daytime land surface temperature identified in this study is likely an indication of the average temperature conditions in some counties that are unfavorable for Dermacentor ticks versus others. Potential temperature effects on ticks and tick-pathogen interactions have been noted [52], and studies aiming to understand any physiological effects on Dermacentor ticks and R. rickettsii interaction in their host are worthy of consideration.
Some limitations of the study need to be mentioned. Even though R. rickettsii is identified as the causative agent for RMSF, multiple Rickettsia species are vectored by ticks-many known to infect humans, which lead to similar clinical symptoms to that of RMSF [53]. These other Rickettsia spp. also cross-react in diagnostic tests that are currently performed for confirming RMSF status, leading to poor test specificity [54]. The extent to which cases were wrongly identified as RMSF in the present dataset cannot be reliably verified, although a majority of the cases are likely to be R. rickettsii causing RMSF. We expect that the use of PCR-based diagnostic methods versus the current serology-based diagnosis will alleviate the misdiagnosis in the future and a clearer picture of the space-time dynamics of RMSF will become evident.

Conclusions
Results of this study show that Rocky Mountain spotted fever incidence in the central Midwestern states of Kansas, Missouri, Oklahoma and Arkansas have increased over the past decade, with high risk counties for this disease lying in the central and southern portions of the region. At the scale of a county, the spatial and spatio-temporal covariates of poverty level and average relative humidity are positively influential for incidence levels, while average day time land surface temperatures above 35°C is a limiting factor. Future epidemiological studies on this disease and other tick-borne diseases will benefit by considering socio-economic status of individuals, particularly poverty status. The identification of climate-change indices as important drivers of RMSF in this study is significant in the context of climate-change impacts on infectious diseases.