A Bayesian spatio-temporal model for forecasting the prevalence of antibodies to Borrelia burgdorferi, causative agent of Lyme disease, in domestic dogs within the contiguous United States

This paper models the prevalence of antibodies to Borrelia burgdorferi in domestic dogs in the United States using climate, geographic, and societal factors. We then use this model to forecast the prevalence of antibodies to B. burgdorferi in dogs for 2016. The data available for this study consists of 11,937,925 B. burgdorferi serologic test results collected at the county level within the 48 contiguous United States from 2011-2015. Using the serologic data, a baseline B. burgdorferi antibody prevalence map was constructed through the use of spatial smoothing techniques after temporal aggregation; i.e., head-banging and Kriging. In addition, several covariates purported to be associated with B. burgdorferi prevalence were collected on the same spatio-temporal granularity, and include forestation, elevation, water coverage, temperature, relative humidity, precipitation, population density, and median household income. A Bayesian spatio-temporal conditional autoregressive (CAR) model was used to analyze these data, for the purposes of identifying significant risk factors and for constructing disease forecasts. The fidelity of the forecasting technique was assessed using historical data, and a Lyme disease forecast for dogs in 2016 was constructed. The correlation between the county level model and baseline B. burgdorferi antibody prevalence estimates from 2011 to 2015 is 0.894, illustrating that the Bayesian spatio-temporal CAR model provides a good fit to these data. The fidelity of the forecasting technique was assessed in the usual fashion; i.e., the 2011-2014 data was used to forecast the 2015 county level prevalence, with comparisons between observed and predicted being made. The weighted (to acknowledge sample size) correlation between 2015 county level observed prevalence and 2015 forecasted prevalence is 0.978. A forecast for the prevalence of B. burgdorferi antibodies in domestic dogs in 2016 is also provided. The forecast presented from this model can be used to alert veterinarians in areas likely to see above average B. burgdorferi antibody prevalence in dogs in the upcoming year. In addition, because dogs and humans can be exposed to ticks in similar habitats, these data may ultimately prove useful in predicting areas where human Lyme disease risk may emerge.

Introduction seropositivity against B.burgdorferi is established using the late phase C6 antigen that is indicative of disseminated disease [25][26][27][28][29]. Specifically in dogs, seroconversion to B. burgdorferi antigen C6 can occur as early as 3-4 weeks post-infection when bacterial burden is high [29]. Multiplex assays suggest antibodies to OspC, OspF, and C6 are characteristic of the intermediate stage of infection (at least 3-4 weeks post-infection), while antibodies to OspF and C6, in the absence of antibodies to OspC, are suggestive of late infection stage [29]. Data from pointof-care tests for C6 seropositivity have been used to develop distribution maps of B. burgdorferi seroprevalence in domestic dogs throughout the United States [30,31].
In the United States, B. burgdorferi is transmitted by Ixodes scapularis on the East coast and Midwest and Ixodes pacificus on the West coast, while a variety of wildlife species (e.g., mice, squirrels, shrews) serve as reservoirs [1]. Although not infected with B. burgdorferi, deer and migratory birds play an important role in maintaining and transporting tick vectors [32,33]. The risk of Lyme disease exposure is therefore related to abundance of a suitable reservoir and exposure to infected ticks. Overall, 96% of human Lyme disease cases reported by the CDC are from 14 states (Connecticut, Delaware, Maine, Massachusetts, Maryland, Minnesota, New Hampshire, New Jersey, New York, Pennsylvania, Rhode Island, Vermont, Virginia, and Wisconsin) [34]. Prevalence of exposure in dogs is similarly high in these states, although the range is expanded beyond the core areas of human cases within these states and into neighboring states such as Ohio, West Virginia and North Dakota [6]. While dogs show no age-specific epidemiological risk profile, children 5-9 years of age are at highest risk of Lyme disease, followed by adults aged 45-59 years [35]. This bimodal distribution of human risk is attributed to increased contact with the environment and exposure to infected ticks. Recent assessment of vector distribution identified a 45% increase in Ixodes spp. range over the last 18 years [36], with I. scapularis firmly established in twice as many counties in the North-Central and Northeastern US as previously believed. Importantly, the Northeastern and North-Central range of I. scapularis appears to be merging in the Ohio River Valley, forming a contiguous range of potential vector establishment.
The dynamic change in Ixodes spp. ranges suggests ongoing monitoring of these medically important vectors, while important, will be challenging and economically unfeasible on an annual basis. Understanding and forecasting spatial and temporal patterns of risk of exposure to B. burgdorferi is thus critical for targeting use of vaccines, preventives measures, and educational campaigns to best protect humans and veterinary species. Building on previously reported risk assessment and surveillance tools for vector-borne disease [37,38], we now report on a novel predictive model of canine B. burgdorferi exposure using a Bayesian approach to factor assessment and forecasting. As previously described, Bayesian modeling offers a number of advantages over classical approaches [39,40]. The probabilistic likelihoodbased methods are highly flexible and are able to adapt to data availability constraints. These methods are also capable of assessing predictive significance of various covariate factors. The use of data augmentation Markov chain Monte Carlo (MCMC) methods to sample from a posterior distribution provides the opportunity to treat missing data, such as absence of serologic data from certain counties, as latent (missing) variables, even in large populations [41,42]. Finally, a forecast of future seroprevalence, conditional on the past history of data, are easily constructed. The Bayesian methods capably quantify uncertainty both in terms of the potential stochasticity of the disease process and the model parameter estimates.
In what follows, eight factors previously purported to influence canine B. burgdorferi seroprevalence will be examined: annual precipitation, annual relative humidity, annual temperature, elevation, percentage forest coverage, percentage surface water coverage, population density, and median household income [43]. After assessment, a predictive model using the significant factors is developed, and annual B. burgdorferi seroprevalence forecasts are constructed. A comparison of actual versus predicted 2015 prevalence is conducted. Intended uses of annual B. burgdorferi seroprevalence forecasts for veterinary medicine include: 1) provision of an evidence-based tool used to encourage the year-round use of tick preventive to reduce exposure, and 2) promotion of annual use of diagnostic testing in areas where the disease is emerging. Finally, based on previous work using canine Lyme disease prevalence to assess human disease risk [17,44], we highlight the use of canine disease forecast maps to inform risk for human disease in an effort to reduce the burden of human disease on the US healthcare system.

Data structure
The observed data consist of test results from 11,937,925 B. burgdorferi ELISAs performed from 2011 to 2015 in the contiguous United States. The test detects antibodies produced in response to the C6 peptide found on the outer membrane protein, VlsE, of B. burgdorferi during infection [45]. Detection of antibodies does not indicate an active infection, as they will persist over time, even after the infection is resolved. Nor is this a quantitative test, so the time since infection is not known. The seroprevalence reported is a representation of the prevalence of dogs that have been exposed to B. burgdorferi, not the prevalence of clinically ill dogs. It is also important to note that the C6 ELISA does not detect antibodies to current vaccines [46], so vaccinated dogs will not test positive unless the vaccine has failed and the dogs experienced exposure resulting in disseminated infection. The data were provided by a commercial diagnostic laboratory available to veterinary clinics within the study area, IDEXX Laboratories, Inc., and included the county in which the testing clinic resides and the month in which the test was conducted [47]. The submitted samples represent a population of dogs provided veterinary care. No information was available on demographic details of the individuals tested, such as age, sex or breed of dog, nor the travel or testing history of the dog, or the reason why the tests were conducted. Overall, there were 759,103 positive tests, for an empirical prevalence of 0.0635. For the purposes of fitting the model, the data were aggregated by county and by year. Table 1 shows the eight considered factors thought to be associated with B. burgdorferi prevalence, along with the time period for which data on each factor is available and the level of geographic aggregation (county, state, etc.). For further discussion, including the source of each factor, see [43,48].
An empirical estimate of the prevalence within each county was constructed in the usual fashion after aggregating the serologic data over the available five year time span; i.e., these  Forecasting seroprevalence of Borrelia burgdorferi, causative agent of Lyme disease, in dogs county, South Carolina, only 13 test results were reported, of which 3 were positive, translating to an empirical prevalence estimate of 23%. Obviously, it is inappropriate to believe that the true prevalence is anywhere near as high as the empirical estimate tends to suggest in Colleton county, and that this effect would have been mitigated if more serologic data were available in this region. In the absence of additional serologic data, weighted head-banging provides a robust prevalence estimate at a given spatial location by combining over information that is available in near by areas, thus, for the most part circumventing the small sample size issue. This smoothing procedure used 45 triples and weighted the prevalence of each county proportional to the number of tests from that county. For further details on weighted head-banging and its uses within the context of disease mapping see [48]. Second, in order to render a spatially smooth and complete map Kriging was implemented to interpolate the prevalences of counties not reporting data. Kriging was implemented in ArcGIS using the default parameter settings [49]. Fig 3 provides the baseline B. burgdorferi antibody prevalence map resulting from this process, and should be thought of as a baseline for B. burgdorferi prevalence in domestic dogs within the contiguous United States. Tincubation conditions and hat is, this figure depicts the general spatial trend of B. burgdorferi prevalence in domestic dogs, and will serve as a device which allows one to assess the performance of Bayesian spatio-temporal model developed in the subsequent section, with respect to capturing these trends.

Statistical model
This section describes the statistical model and the techniques used to fit it. The purpose of the model is twofold: to identify environmental and societal risk factors which are significantly associated with the prevalence of B. burgdorferi antibodies in dogs and to predict future trends in the prevalence of B. burgdorferi antibodies in dogs.
Let Y s (t) and n s (t) denote the number of positive and total tests conducted in county s during year t, respectively, for counties s 2 {1, . . ., S} and years t 2 {1, . . ., T}. It is important to note that these values are the raw counts available in the data set; i.e., they are not the post smoothed values resulting from the construction of the baseline map. The available serologic data exhibits both positive spatial and temporal correlation; that is, prevalences in adjacent counties or in successive years tend to have similar values. Thus, to account for these effects this analysis considers using a spatio-temporal model to analyze these data; for additional information about spatial and spatio-temporal models, see [50][51][52][53][54][55][56]. In particular, this paper uses a Bayesian hierarchical model, which models spatio-temporal dependence through the use of random effects. The details of the distribution of these random effects are described in Forecasting seroprevalence of Borrelia burgdorferi, causative agent of Lyme disease, in dogs totality below. In the considered model, the number of positive tests were assumed to follow a Poisson distribution, which is a common choice for modeling count data [52][53][54][55]. Under this specification, the number of tests which are serologically positive for B. burgdorferi antibodies in county s during year t (i.e., Y s (t)) is assumed to be distributed as Y s ðtÞjn s ðtÞ; p s ðtÞ $ Poisson n s ðtÞp s ðtÞ f g; ð1Þ where log(Á) denotes the natural logarithm, 0 denotes the transpose of a matrix (or vector), * reads "has the distributional type," and j reads "given." Thus, Eq (1) reads "Y s (t) given n s (t) and p s (t) follows a Poisson distribution with mean n s (t)p s (t)." Furthermore, X s (t) = (X s,1 (t), . . ., X s,p (t)) 0 is a vector of covariate information for county s at time t, β = (β 0 , . . ., β p ) 0 is the corresponding vector of regression coefficients, and p s (t) is interpreted as the prevalence of B. burgdorferi antibodies in dogs residing in county s at time t. The random effects, ξ s (t), are used to account for the spatio-temporal dependence, so that the positive test counts (i.e., Y s (t)'s) are conditionally independent of each other given the total number of tests, the factor information, and the random effects. Note, this does not imply that the Y s (t)'s are independent across varying space s or time t, only that they are conditionally independent once the spatio-temporal correlation is accounted for through the random effects and the other covariate information. Eq (2) specifies the relationship between the prevalence p s (t) and the covariate information X s (t) and the random effect ξ s (t). This specification is standard for Poisson regression models [52][53][54][55]. In general, different spatio-temporal models specify different structures for the {ξ s (t)}. By far, one of the most popular models for areal data is the conditional autoregresive (CAR) model, and it is the one used here [50]. To further expound on how the CAR model is used in this analysis, it is noted that both spatial and temporal correlation is accounted for through the following multivariate autoregressive model where ξ t = (ξ 1 (t), . . ., ξ S (t)) 0 and ϕ t = (ϕ 1 (t), . . ., ϕ S (t)) 0 are random vectors. Eq (5) specifies that ϕ t are independent and identically distributed random vectors whose distribution follows a CAR model [50,51]. From Eq (4) it is apparent how the model accounts for temporal correlation. That is, in the multivariate autoregressive model depicted above, time-dependence is modeled through a temporal autoregressive model of order one (AR(1)), which is a staple among time series analysis [57]. Eq (4) relates year t to year t − 1. The parameter φ is the temporal correlation between consecutive years and lies within (−1,1). This ensures a causal and stationary solution to the time series model [57].
To examine the treatment of spatial correlation, let ϕ = (ϕ 1 , . . ., ϕ S ) 0 follow a CAR model, where for the ease of exposition the dependence on t has been suppressed, and it is noted that the subscripts correspond to the S spatial locations. There are several different kinds of CAR models. Usually, CAR models are defined by assigning a univariate distribution for each ϕ s , whose mean and variance depends on the spatial relationship between location s and the other locations. This paper uses the CAR model proposed in [51], which specifies the conditional distribution for each ϕ s to be Here, ϕ −s = (ϕ 1 , . . ., ϕ s−1 , ϕ s+1 , . . ., ϕ S ) 0 is an S − 1 dimensional vector that includes every ϕ s 0 except ϕ s . N(μ, σ 2 ) denotes a normal distribution with mean μ and variance σ 2 . The S × S matrix W is defined by W = {w s,s 0 }, where w s,s 0 = 1 if the s 0 th and sth counties are adjacent and w s,s 0 = 0 otherwise. The parameter τ 2 in Eq (6) scales the variance structure. Moreover, the conditional variance of ϕ s , given ϕ −s , is inversely proportional to the number of counties bordering it. That is, the ϕ s for counties with more neighbors have a smaller overall variance. This agrees with intuition; if county s has many neighbors, the model has more information to use in estimating ϕ s , and thus the variance of ϕ s should be smaller. In Eq (6), ρ 2 [0, 1] controls the correlation between bordering counties. The conditional mean of ϕ s , given ϕ −s , is the average of the ϕ s 0 of the neighboring counties weighted by ρ. Thus, as ρ increases so does the degree of dependence between ϕ s and the ϕ s 0 of the neighboring counties.
Using Eq (6), it can be shown that the joint distribution of ϕ is given by the following multivariate normal distribution where W is the adjacency matrix described above and D is an S × S diagonal matrix whose sth diagonal element is the number of neighboring counties for county s. The model is fit using Bayesian Markov Chain Monte Carlo (MCMC) techniques, with posterior samples being used to derive point estimates of the parameters (β, φ, ρ, and τ 2 ). Thus, to complete the Bayesian model formulation, the following prior distributions are specified b k $ Nð0; 1000Þ; for k ¼ 0; . . . ; p; ð7Þ φ $ UniformðÀ 1; 1Þ; ð8Þ r $ Uniformð0; 1Þ; ð9Þ t À 2 $ Gammað0:5; 0:05Þ: In particular, a diffuse prior distribution is placed on β k , for k = 0, . . ., p. This specification permits the prior to exert little, if any, influence on the posterior distribution, thus allowing the data to primarily drive the analysis and subsequently the conclusions. Uninformative (flat) prior distributions are assigned for φ and ρ, for the same reasons. Here "uninformative" means that all possible values of the parameter have equal probability under the prior distribution. The prior for τ −2 is chosen as a conjugate prior. Here "conjugate" means that the posterior and prior distributions are from the same distributional family, which simplifies computation. A posterior sampling algorithm was developed, in the usual fashion, to sample all model parameters and random effects from the posterior distributions. This MCMC sampling algorithm uses a combination of Gibbs and Metropolis-Hastings steps. To complete model fitting, Y s (t) for counties not reporting test results were treated as latent variables, and were sampled along with the model parameters. The posterior sampling algorithm was conducted using code written in R and C++. For more information about Bayesian models and MCMC methods, see [39].

Model assessment
This analysis first considered a full model consisting of all 8 climate, geographic, and societal factors. After fitting the full model, credible intervals were computed. Credible intervals are the Bayesian counterpart to frequentist confidence intervals. Table 2 provides the point estimates (i.e., the median of the posterior samples) of the regression coefficeints along with 98.75% highest posterior density (HPD) credible intervals for each of the 8 coefficients. These intervals reflect a 90% familywise error rate using a standard Bonferroni correction for multiple comparison. For more information about Bayesian credible and HPD intervals, see [39]. Table 2 indicates that annual relative humidity, annual precipitation, and median household income are not significant predictors of B. burgdorferi seroprevalence because their credible intervals contain 0. Removing combinations of these predictors results in 7 potential reduced models. Each of these potential models was fit, and the only model to have all significant predictors was the model which excluded all three of the predictors listed above. This model was selected as the reduced model, and the results for this model are summarized in Table 3. The posterior medians of the remaining model parameters are φ = 0.9404, ρ = 0.9997, and τ 2 = 0.5958, and these point estimates validate the claim of strong positive spatial and temporal dependence.
Most of the predictors in the reduced model behave intuitively. The coefficients for percent forest and water coverage are positive, and the coefficient for population density is negative, as one might expect. Note that the coefficient of elevation is positive and the coefficient of annual temperature is negative, which may appear to be counterintuitive, for further discussion see the Discussion section. In order to assess how well the Bayesian spatio-temporal model explains the data, the correlation between the baseline and model estimated prevalences was computed. In particular, a baseline estimate for each county was extracted from Fig 3, and is denoted asp s , for s = 1, . . ., S. A model based estimate for each county was then constructed by averaging over the 5 yearly estimates available from the fitted (reduced) model; i.e., the model Forecasting seroprevalence of Borrelia burgdorferi, causative agent of Lyme disease, in dogs estimate for the sth county is given byp s ¼ 5 À 1 P 5 t¼1ps ðtÞ, wherep s ðtÞ is the prevalence estimate resulting from the fitted (reduced) model for the sth county during the tth year. LetP andP denote the collection of baseline and model based estimates, respectively, after removing counties not reporting data. The correlation between these two sets was 0.894 indicating that the Bayesian spatio-temporal model provides a good fit to these data.

Forecasting
Under the Bayesian spatio-temporal model, forecasting B. burgdorferi seroprevalence in domestic dogs is tantamount to forecasting the factor levels and the spatio-temporal random effects. In this section, the methods used to forecast these variables are elucidated. First, since the primary goal of this work is to provide for a one year ahead forecast, it is reaonable to assume that certain risk factors are static; i.e., the current years value can be used as the forecasted value since expected changes are negligible. These variables include, forestation, water coverage, and elevation. Thus, the risk factors that need to be forecasted for each county are annual temperature and population density.
To forecast annual temperature, historical temperature records were collected from 1895 to 2015 for each county and modeled as an AR(1) model. The AR(1) model for an annual temperature series {F t } (previously denoted by {X s,1 (t)} for county s and time t) adheres to the following difference equation where {ω t } is zero mean white noise. Standard statistical software packages (e.g., R and SAS) can be used to easily fit AR(1) models. Letd andĝ denote estimates of δ and γ, respectively, and using these estimates a prediction of the annual temperature at year t + 1 from temperatures from year 1 to year t is obtained asF tþ1 ¼d þĝF t : In the proposed forecasting method,F tþ1 is used as next year's annual temperature value. For more information, see [40].
Forecasting the county population density for next year requires the county areas and their recent population counts. The US Census provides reliable county population counts for 2010 and estimated state populations for the years of 1969-2014. A simple linear regression model, with time being the independent variable, was fitted to this state level population data. From this model the county population can be forecasted by first predicting the state population and then partitioning this value into the counties within the state at a proportion that agrees with 2010 Census. The forecasted population density is then obtained by dividing the county population by the county's area.
To forecast the spatial and temporal random effects a year in advance, Eq (4) is used. In particular, since the ϕ t 's are independent and identically distributed over various years, given values of τ 2 and ρ (available from the posterior samples), ϕ t+1 is generated randomly from the multivariate normal distribution N(0, τ 2 (D − ρ W) −1 ). Then ξ t+1 is set to ξ t+1 = φξ t + ϕ t+1 . This process is repeated for each pair of ρ and τ 2 available from the posterior sample, thus yielding a sample of the next year's random effects, for further details see [39,40].
In order to assess the fidelity of the proposed forecasting methods, the 2015 test and factor data were removed and the reduced model was fit to the data collected during the years of 2011-2014. The methods described above were then implemented to forecast the 2015 prevalence of B. burgdorferi antibodies in domestic dogs. Figs 4 and 5 present the observed and forecasted B. burgdorferi seroprevalences, respectively, for 2015. Further, Fig 6 quantifies the localized predictive capability of the proposed approach by depicting the squared difference between the observed and forecasted B. burgdorferi seroprevalences for counties reporting more than 25 test results during 2015. From this figure, one will note that the proposed approach provides an accurate regional forecast throughout the contiguous United States. Note, counties reporting fewer than 25 tests were excluded for the small sample size issues discussed previously. To provide a global assessment of the predictive capacity of the proposed forecasting technique, the weighted correlation (with county weights being set to be the number of tests reported; i.e., n s (t)) between the observed and forecasted prevalence estimates was computed, after removing counties not reporting data, with a value of 0.978 being obtained. Thus, this finding tends to suggest that the proposed approach can be used to accurately forecast future trends in B. burgdorferi seroprevalence within the contiguous United States. Here the weighted correlation between two sets, say A ¼ fa s g S s¼1 and B ¼ fb s g S s¼1 , is defined as CorrðA; BÞ ¼ P S s¼1 w s ða s À " aÞðb s À " bÞ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi P S s¼1 w s ða s À " aÞ 2 P S s¼1 w s ðb s À " bÞ 2 q ð11Þ where w s denotes the weight assigned to the sth observation and Note, the purpose of a weighted correlation is identical to that of the usual correlation, with the exception that it accounts for unequal sample sizes through the weights. Fig 7 presents the 2016 forecast of B. burgdorferi prevalence within the contiguous United States, after smoothing (Kriging with default parameters were used in the software ArcGIS).

Discussion
The objective of this study was to evaluate the space-time patterns and the environmental and socioeconomic drivers of canine B. burgdorferi prevalence in the contiguous U.S. using a Bayesian hierarchical modeling approach. Our data were based on serologic testing results for B. burgdorferi C6 antigen between 2011-2015. Bayesian hierarchical models have advantages over frequentist approaches for analyzing infectious disease datasets with inherent space-time dependency [37,[58][59][60], such as clusters of cases that may be linked due to specific environmental drivers. This is particularly relevant for vector-borne disease as the spatial distribution of most vectors is largely determined by environmental and climatologic conditions and the Forecasting seroprevalence of Borrelia burgdorferi, causative agent of Lyme disease, in dogs presence of suitable reservoir hosts [1,[61][62][63]. As such, we included relevant ecological covariates, where available, in our model to help explain variability in our aggregated dataset and to strengthen inferences from our Bayesian spatio-temporal model. The majority of the predictors in our reduced model of covariates are biologically relevant. The coefficients for percent forest and water coverage are positive, and the coefficient for population density is negative, supporting the established role of vector and vertebrate host ecology in disease transmission [64]. Tick vectors rely on specific environmental and microhabitat factors for survival while off the host [65]. The ecology of B. burgdorferi is complex and involves numerous vertebrate species that may serve different roles such as reservoirs, dilution hosts, and hosts for the ticks (i.e., white-tailed deer), and all of these are impacted by habitat and anthropogenic changes [66]. Interestingly, the coefficient of elevation is positive and the coefficient of annual temperature is negative, which is counterintuitive to our understanding of tick ecology and consensus opinion that ticks in the Eastern United States rarely inhabit high elevations (i.e,. Appalachian Mountains) [61][62][63]. This apparent paradox is likely due to the fact that Lyme disease is more prevalent in the Northeastern United States [34], which compared with the rest of the country, has a relatively low annual temperature and relatively high elevation. It is important to remember that the proposed model relates the mean of the response to the predictors in a linear fashion and is fitted jointly within the range of the observed data. That is, the relationship between the predictors and the prevalence of Lyme disease is only valid for predictor levels within the range of the observed predictor values, with validity declining at the margins. Further, extrapolations to factor levels not in the range of the considered data set will likely lead to contradictions. Thus the positive coefficient of elevation does not imply that Lyme disease prevalence will continue to increase at extreme elevations, nor does the negative coefficient of annual temperature imply that Lyme disease seroprevalence will continue to increase at extremely low temperatures.
Similar to reported incidence of human Lyme disease [67], we observe higher 2011-2015 aggregated B. burgdorferi seroprevalence in dogs from Connecticut, Delaware, Maine, Massachusetts, Maryland, Minnesota, New Hampshire, New Jersey, New York, Pennsylvania, Rhode Island, Vermont, Virginia, and Wisconsin. However, in contrast, based on data from dogs, we observed an expansion of this endemic range to include Northern California, Southeastern Oregon, Southwestern Idaho, Eastern Colorado and Northern New Mexico (Fig 1). Perhaps most striking is the recognized expansion of seropositive dogs on the northern border of the contiguous U.S. along the Canadian border, including North Dakota, and the border of Northern Montana and Idaho. The westward expansion of canine B. burgdorferi seroprevalence from Minnesota into North Dakota mirrors recent reports that Lyme disease is poised to be a Forecasting seroprevalence of Borrelia burgdorferi, causative agent of Lyme disease, in dogs significant human public health concern in North Dakota [68]. Further, this observation supports and extends recent concern over northward expansion of B. burgdorferi infected ticks into Canada from the Northeastern and Midwestern United States [69,70].
From five years of historical diagnostic tests, our data show that a Bayesian model can capably quantify B. burgdorferi seroprevalence, which ultimately will support qualitative decisionmaking and surveillance in disease management and response. When comparing actual to forecasted B. burgdorferi seroprevalence in 2015, a weighted correlation of 0.978 was achieved, demonstrating significant predictive skill. Finally, using our predictive forecast model we report forecasted 2016 canine B. burgdorferi seroprevalence. Of note is the apparent convergence of B. burgdorferi infection of dogs from the Northeastern and Mid-Central United States in the Great Lakes region, encompassing Indiana, Ohio, Illinois, Kentucky and Michigan. This observation is supported by recent reports of encroaching I. scapularis into this region [71], and suggests annual testing of dogs in these states, as well as North Dakota and bordering Canadian provinces is strongly warranted. As an adjunct to annual testing, year-round use of acaracides in dogs can reduce tick infestation, thereby reducing the potential for not only B. burgdorferi transmission, but also several other important canine tick-borne pathogens [72,73]. Finally, vaccination of dogs against Lyme disease to prevent B. burgdorferi infection has been shown to be effective in controlled experimental infection studies [74][75][76][77] and in protection against natural infection in dogs living in endemic areas [78,79]. In endemic areas, dogs that are vaccine protocol-compliant are significantly less likely to become infected with B. burgdorferi [80]. Vaccine studies have concluded that emphasis should be placed on vaccinating dogs at risk for Lyme disease before they are exposed to infected ticks [79]. As such, we suggest the forecasted areas with an increased likelihood of B. burgdorferi transmission outside of the established endemic areas should provide veterinary practitioners with evidence-based options for recommendation of Lyme disease vaccines to clients and protection of dogs against emerging disease.
Limitations to this study include selection bias of the canine population. As mentioned above, samples were submitted for testing by veterinary clinics, and therefore represent dogs under the care of a veterinarian. This suggests that the data are a conservative estimate of the prevalence in domestic dogs because dogs at the highest risk of tick exposure would be those that receive no veterinary care, those from lower socioeconomic families, or are owned by clients who decline these additional tests during wellness visits. Additionally, a lack of knowledge about the distribution of tick-borne pathogens may limit the testing that veterinarians request; however, these tests are often run during routine heartworm testing so this latter issue is likely a minimal concern. Additional bias could be introduced through variations by region in the use of the diagnostic tests (i.e. whether it is used predominately for wellness visits or for cases in which the disease is suspected prior to testing) or variations by region in the use of reference laboratory services and products. It is also recognized that travel history is not controlled for and will account for some of the cases outside endemic regions. Despite these limitations, these data are acquired on a monthly basis [6], and thus provide a robust and timely source of information about the dynamic change of canine B. burgdorferi seroprevalence across the contiguous US, and holds promise for longitudinal studies to better understand the dynamic nature of vector-borne disease over time. There is also evidence that other tick vectors are involved in sustaining the enzootic cycle of B. burgdorferi ss, such as Ixodes affinis, [81] and so it is important when interpreting these results to consider the possibility of other vectors of B. burgdorferi. However, I. affinis is believed to be uncommonly found on dogs and does not feed on humans so its impact is considered to be minimal.
Though the proposed technique could be used to construct long-term forecasts, caution should be taken. In particular, our approach makes use of forecasted values of the significant factors, with some factors being assumed to be static throughout time (e.g., forestation and surface water coverage). This assumption is reasonable in the short-term, but would obviously be problematic over a much larger time span; e.g., twenty to fifty years. Moreover, in general, when forecasting future trends one should be cautious of long-term forecasts, due to possible violations of assumed model forms not apparent in the available data; e.g., changes in population density that may be spatially variable throughout time. Thus, we promote the use of our approach to provide only short-term forecasts of spatial trends in B. burgdorferi seroprevalence.
Similar to domestic dogs, spatial risk models for human Lyme disease are needed to address the rise in human Lyme disease incidence. Ideally, spatial risk models for human Lyme disease would be based on vector abundance inclusive of pathogen burden [82,83]. In comparison to other vector-borne diseases, Lyme disease risk assessment is facilitated by the observation that B. burgdorferi prevalence in Ixodes spp. from Lyme disease endemic regions is relatively high [82,84,85]. Canine B. burgdorferi seroprevalence has been cited as one potential tool for Lyme disease risk assessment in humans [44,86]. As noted by Mead et al. [44], canine B. burgdorferi seroprevalence greater than 5% at the county level was associated with human risk of Lyme disease [44], while less than 1% canine seroprevalence was associated with little to no risk for human cases of Lyme disease [86]. We further this suggestion that the use of canine B. burgdorferi seropositivity has merit as a risk assessment tool in both endemic and non-endemic regions. In particular, areas where B. burgdorferi seroprevalence is greater than 1%, but increases over time, may be an area to focus tick-prevention messages as these areas may be at an increased risk for human Lyme disease. As such, we believe that with further research into the relationship between human and canine Lyme disease, canine B. burgdorferi seropositivity has the potential to function as an early warning system for geographic expansion and emerging infection risk in humans [17], and support targeted vector surveillance efforts to monitor Ixodes spp. B. burgdorferi infection in a cost-effective manner. As the increasing incidence of Lyme disease continues to put pressure on the United States healthcare system [7], the use of canine B. burgdorferi seroprevalence to ultimately forecast spatial and temporal patterns of risk of human Lyme disease is a promising tool for targeting public health educational campaigns and resources for vector surveillance to best protect humans and veterinary species from disease.