Uncertainty in Model Predictions of Vibrio vulnificus Response to Climate Variability and Change: A Chesapeake Bay Case Study

The effect that climate change and variability will have on waterborne bacteria is a topic of increasing concern for coastal ecosystems, including the Chesapeake Bay. Surface water temperature trends in the Bay indicate a warming pattern of roughly 0.3–0.4°C per decade over the past 30 years. It is unclear what impact future warming will have on pathogens currently found in the Bay, including Vibrio spp. Using historical environmental data, combined with three different statistical models of Vibrio vulnificus probability, we explore the relationship between environmental change and predicted Vibrio vulnificus presence in the upper Chesapeake Bay. We find that the predicted response of V. vulnificus probability to high temperatures in the Bay differs systematically between models of differing structure. As existing publicly available datasets are inadequate to determine which model structure is most appropriate, the impact of climatic change on the probability of V. vulnificus presence in the Chesapeake Bay remains uncertain. This result points to the challenge of characterizing climate sensitivity of ecological systems in which data are sparse and only statistical models of ecological sensitivity exist.


Introduction
Vibrio spp. bacteria are a threat in many coastal aquatic ecosystems around the world [1][2][3][4]. In the Chesapeake Bay, the number of annual human Vibrio cases of infection has nearly doubled in the past decade [5,6]. Furthermore, Vibrio spp. is frequently detected in shellfish harvested for human consumption during the warm summer months [7]. In general, this seasonality correlates with peak incidence of Vibrio disease caused by Vibrio spp. bacteria in many coastal regions [8][9][10]. The probability of finding various Vibrio spp. in the Bay varies spatially and seasonally, and researchers have modeled these probability patterns as a statistical function of surface water temperature and salinity [11][12][13][14][15][16]. These temperature and salinity-based Vibrio models have demonstrated skill for available datasets in the Bay and structurally similar statistical models have been applied to predictions of V. cholerae, V. vulnificus, and V. parahaemolyticus in other regions [1,4,17,18]. The environmental range of V. vulnificus can vary by region, but in general the bacteria are found in waters with salinity between 5 and 25 (practical salinity units) and temperature above 15uC [12,[19][20][21].
Recent studies show that surface water temperatures in the Chesapeake Bay have warmed by 0.3-0.4uC per decade over the past 30 years [22,23]. This trend has resulted in an expansion of the warm season period during which water temperatures are high enough to support V. vulnificus growth: the onset of spring time temperatures (.15uC) has advanced by nearly three weeks [22]. Salinity patterns are also sensitive to climate change, as changes in springtime flow of the Susquehanna River -the primary freshwater input to the Bay -can influence salinity throughout the Bay over the V. vulnificus growth season. The consensus of climate models is that there will be a rise in winter and spring precipitation in the northern portion of the watershed [24,25] implying an increase in January to May Susquehanna River steam flow. A study by Gibson and Najjar (2000;[26]) showed that an increase in the January-May Susquehanna stream flow could potentially decrease winter and springtime salinity values by 7% in the upper Chesapeake Bay.
Even though there is considerable uncertainty in the magnitude of projected warming and freshening of the Chesapeake Bay [27], it is valuable to understand how a temperature and salinity sensitive pathogen like V. vulnificus might respond to observed and projected trends in these environmental parameters. Here we examine three statistical models of V. vulnificus probability of presence that demonstrate skill in predicting V. vulnificus probability of presence in Chesapeake Bay. All three models use water surface temperature and salinity as the only predictors, but they differ in their structure and/or in the data used for training and evaluation. One model is the generalized linear model (GLM) of Jacobs et al. (2010; [12]) trained on data collected in the Chesapeake Bay in 2007 and 2008, the second model is also a GLM but trained on a 2011 and 2012 data set, while the third model is generalized additive model (GAM) also trained on the 2011-2012 data. The latter two models are from Urquhart et al. (2014;[15]), where surface temperature and salinity were used to model both the probability of presence and concentration Vibrio spp. in the upper Chesapeake Bay. Here we focus just on probability models to enable comparison with Jacobs et al. (2010; [12]), who considered only a probability model. Furthermore, we can evaluate the effect that these differences in structure and training data have on modeled estimates of V. vulnificus probability under current climate conditions, which is relevant for pathogen risk assessment and early warning, and consider the implications of these differences for projected V. vulnificus risk under climate change.

Methods
The Chesapeake Bay Estuary, adjacent to the Maryland, Delaware, and Virginia coastline, covers an area of approximately 11,500 km 2 and is characterized by a sharp north-to-south salinity gradient. Salinity ranges from 0-6 in the northern Bay to 18-30 near the mouth of the Bay. Surface water temperatures follow a seasonal cycle, ranging from local wintertime temperatures of 20.5uC to summertime temperatures of 31uC [28]. The Susquehanna River, the largest and northernmost tributary, accounts for roughly 45% of the yearly freshwater inflow into the Bay. This paper focuses on the upper portion of Chesapeake Bay (Fig. 1). The upper region of the Bay was selected to avoid model predictions outside of the original training data salinity range (salinity .14).
The climatological analysis presented here used historical environmental data collected by the Chesapeake Bay Data Program [29]. Bi-monthly surface water temperature, salinity, and chlorophyll a data were obtained for 16 main stem and tributary monitoring stations ( Fig. 1) collected from 1985 through 2013. For salinity, the absolute difference between observed salinity and the V. vulnificus optimal salinity value of 11.5 [12] was calculated, and use of deviation from this was used as an explanatory covariate. The 16 monitoring stations were selected based on their geographic location serving as a representation of the upper Chesapeake Bay. In situ data were used to delineate three different salinity zones: upper-upper Bay (hereafter: ''upper region''), middle-upper Bay (hereafter: ''mid region''), and lowerupper Bay (hereafter: ''lower region''). These stations cover the upper main-stem Bay as well as tributary locations, with six stations in the upper region, five stations in the mid region, and five stations in the lower region. Observational data were averaged at monthly intervals for each zone resulting in 337 data records for the upper region and 342 data records for both the mid and lower regions.
These salinity and temperature data were applied to the three statistical V. vulnificus probability models available for Chesapeake Bay: n is the regression coefficient for the independent covariates, Temp is daily surface temperature, and |SalnOpt| is the absolute distance from optimal salinity of 11.5], which was trained using 235 V. vulnificus samples collected during the months of July and October of 2007, and April, July, and October of 2008 and were analyzed by the NOAA Chesapeake Bay Office. We included a GAM in addition to the two structurally identical GLMs because GAM models allow a more flexible regression modeling of the transformed response that combine the predictor variables in a nonparametric manner [31]. The NOAA_GLM and JHU_GLM models have the same structure but vary in the training data, whereas the JHU_GLM and JHU_GAM models use the same training data but vary in the structure of model. All models were implemented in logistic form using a ''logit'' link function for an optimal prediction point and were trained using observational bacteria data transformed to binary presence/ absence. Probability of V. vulnificus presence was calculated using p = e z /(1+e z ). Diagnostics for each model were performed using Akaike's Information Criterion (AIC) and accuracy (ACC) in an out-of-bag (OOB) cross validation [32]. ACC is defined as ACC = (TP+TN)/(P+N) where TP is true positive, TN is true negative, P is the number of presence instances, and N is the number of absence instances.
To explore sensitivity of the V. vulnificus models to temperature and salinity, we used a range of surface water temperature (0-40uC) and surface salinity (0-13) values as independent model input. Here the range of model input extends past the range of the in situ temperature (8-31uC) observations, and is constrained to the range of the surface salinity (0-14) observations that were used in the training of the two JHU models. Additionally, historical temperature and salinity data were tested as model input, enabling identification of V. vulnificus climatology and seasonal trends. To further assess the geographic distribution of the predicted V. vulnificus probability for each method, geospatially-interpolated satellite-derived surface temperature and surface salinity [33,34] were used to map spatially complete estimates of probability throughout the upper Bay. Interpolated satellite estimates were developed using monthly, level-2 Moderate Resolution Imaging Spectroradiometer (MODIS) surface water temperature (MOD 28) and ocean color (Rrs 412-678) products.
All statistical computations were carried out in the R Statistical Environment version 2.14, using the 'mgcv' and 'stats' packages, on an Intel Xeon W3580 Processor, 3.33 GHz machine with 12 GB RAM. Computation time for all statistical models was less than one minute.

Results and Discussion
For model evaluation, goodness of fit and predictive skill for the JHU models were determined using AIC and ACC indices. AIC results indicated that the JHU GAM (145.9) offered better model fit than the JHU GLM (160.4), but performance differences between models were small relative to measurement uncertainty. NOAA GLM model fit using the NOAA training dataset yielded an AIC of 164.3 [12]). A direct comparison of model fit of could not be calculated due to lack of access to NOAA GLM training data. We stress that the difference in training data between the NOAA and JHU models is the primary reason for differences between NOAA_GLM and JHU_GLM, as the models are structurally identical. To predict bacterial presence, selection of an optimal prediction point was required. Rather than setting a prediction point at 0.5 arbitrarily, the prediction point was based on three performance indices: true positive rate, true negative rate, and ACC, yielding an optimal threshold of 0.4 for V. vulnificus. To determine the prediction skill of each model, ACC was calculated using the JHU validation dataset (ACC: 0.47, for NOAA GLM, 0.59 for JHU GLM, and 0.60 for JHU GAM). The AIC and ACC values indicated that the JHU models performed significantly better than a null model that only included seasonality as a predictor. Figure 2 shows the relationship between temperature, salinity, and the mean estimate of predicted V. vulnificus probability for each of the tested models, with likelihood levels plotted as contour curves. NOAA GLM (Fig. 2a) exhibits a sharp increase in V. vulnificus probability with increasing temperatures along the axis of optimal salinity (11.5). Similarly, JHU GLM (Fig. 2b) exhibits a steady increase in probability with higher temperatures, though the rate of change with temperature is less steep than NOAA GLM. In contrast to the GLMs, JHU GAM (Fig. 2c) shows a probability maximum dependent on temperature, indicating a temperature optimum V. vulnificus growth above which probability gradually declines. Figure 2d offers an alternative view of predicted V. vulnificus probability with temperature, at optimal salinity, including temperature observations during in situ bacteria collection. Furthermore, the wide range of observed temperatures confirms that the declining GAM probability above optimal temperature is a valid model response and not an issue of limited observations at high temperature. These differences in model response also have implications for retrospective or near real-time estimation of risk of V. vulnificus presence. Using a 27-year in situ record of temperature and salinity in the upper Chesapeake Bay, we estimated V. vulnificus monthly probability of presence according to each statistical model. Fig. 3 shows the climatology of surface water temperature and mean estimate model predictions in each region of the upper Bay for March through November. A southward increase in predicted probabilities for all statistical methods during summer months suggests that distance from optimal salinity plays a role in the spatial distribution of V. vulnificus presence. Predicted probabilities are likely lower in the upper region due to decreased salinity and larger deviation from optimal salinity. Seasonal patterns in all regions indicate that NOAA_GLM and JHU_GLM predict highest probabilities during the warmest summertime months. JHU_GAM exhibits a bimodal seasonal pattern with peaks in early and late summer across all regions. These JHU_GAM results are consistent with the temperature dependency shown in Fig. 2c.
The difference in model sensitivity to temperature has implications for characterizing interannual variability in risk. Fig. 3d-f show mean predicted V. vulnificus probability for the upper, middle, and lower portions of the study area plotted against annual peak monthly SST for the available historical record. In all three subregions, NOAA_GLM predicts that peak probabilities were highest in warmer years, while JHU_GAM predicts the opposite and JHU_GLM falls in between. We emphasize that these are the mean probability estimates for each model, and that there may not be statistically significant differences between model predictions in any given year. Nevertheless, mean estimates are commonly used to communicate risk and to project trends, so the fact that two comparably high performing models -NOAA_GLM and JHU_GAM -yield opposite mean estimates of the relationship between warm summers and V. vulnificus probability is relevant.
The differences in these model response surfaces also have clear implications for projections of V. vulnificus probability under climate change. As a simple demonstration, we consider the consensus prediction of warming and freshening of the Bay (dashed line in Figure 2 a-c). NOAA_GLM projects steady or increasing probabilities: freshening moves conditions away from the salinity optimum but this effect is offset by increases in predicted probability with rising water temperatures. The JHU_GLM shows a similar pattern but with lower sensitivity to changing environmental conditions. In contrast, warming only increases predicted probability of V. vulnificus presence in JHU_GAM for relatively cool temperatures, representative of spring or fall conditions. Peak summertime temperatures are already above the temperature optimum in this model, so further warming results in a predicted decline in peak summertime V. vulnificus probability.
While we cannot presently determine which sensitivity pattern is correct-the JHU_GLM and NOAA_GLM increase with higher temperatures or the JHU_GAM decline under warmest conditions-the JHU_GAM behavior might indicate that present-day summertime water temperatures are already above the optimal temperature for V. vulnificus growth in Chesapeake Bay. Alternatively, the result might be understood in the context of previous studies that have shown Vibrio dependence on zooplankton due to attachment and/or Vibrio's chitinoclastic activity [35,36]. Unfortunately we do not have adequate co-located measurements of zooplankton and V. vulnificus to include zooplankton in a predictive model. However, we do find that the climatology of Chesapeake Bay Program in situ chlorophyll a concentrations, which generally correlate with zooplankton presence, exhibits a bimodal seasonal pattern with a slight lead over the JHU GAM predicted V. vulnificus peaks (Fig. 4).
To examine the geographic extent of each methods' predicted V. vulnificus probability, monthly interpolated satellite surface water temperature and surface salinity products were used to create spatially complete probability hind-casts for 2012 in the upper Bay (Fig. 5). Consistent with results shown in Fig. 3, these maps show highest predicted probability towards the south of the analysis region, where salinity values are closest to optimum. NOAA_GLM and JHU_GLM both show the most widespread zones of high probability in the warmest summer months, while JHU_GAM predicts higher probabilities at the beginning and end of the warm season. Interesting spatial structures are also apparent in these maps. For example, NOAA_GLM predicts slightly elevated V. vulnificus probabilities in the eastern waters of the Chesapeake Bay during warmer months, while JHU_GAM predicts high probability zones in the western Bay during months with lower overall probability (Fig. 5). These patterns likely reflect the Bay's two-layer physical circulation scheme in which we see fresher surface waters along the western shore and saltier waters along the eastern shore of the Bay. The predictions of statistical V. vulnificus probability models compared in this study clearly differ in the implied relationships between the structure of this circulation and the location of high V. vulnificus risk areas.

Conclusions
In summary, this study presents a comparison of three statistical ecological habit models for estimating the probability of V. vulnificus presence in the upper Chesapeake Bay. We examined individual model sensitivity to climatic variability and change within the upper Bay by assessing model response to a range of temperature and salinity values. We find that the three models differ systematically in the predicted response of V. vulnificus probability to high temperatures in the upper Chesapeake Bay.
These results indicate that more data are required to constrain estimates of climate sensitivity of V. vulnificus in Chesapeake Bay: statistical models are limited by the paucity of publicly available data from V. vulnificus collections and co-located measurements of ecologically relevant variables, and process-based models would require further research on the V. vulnificus life cycle in the Bay. Our results also caution against predicting or projecting climatebased changes in V. vulnificus exposure risk on the basis of the mean predictions of existing statistical models, as skillful and statistically indistinguishable models differ systematically in predicted V. vulnificus sensitivity to rising surface water temperature, even within the range of environmental conditions under which the models were trained.
The challenges facing V. vulnificus modeling in Chesapeake Bay are not unique. Indeed, predictive capabilities for climate sensitivity of many pathogens are limited to statistical models based on scarce data. Other recent studies [37][38] emphasize that the inadequacy of available data hamper climate change projections for a diversity of waterborne pathogen systems in many regions. In the case of V. vulnificus in Chesapeake Bay we have a specific example of closely related modeling efforts that suggest systematically different impacts of climate change due to differences in model structure-i.e., the difference between JHU_GLM and JHU_GAM-and training data-i.e., the difference between JHU_GLM and NOAA_GLM. These kinds of structural comparisons of statistical models, however, are not always performed in studies of climate sensitivity in ecological systems. The results of this study suggest that such model comparisons can be quite important when evaluating uncertainty in climate-based predictions and projections.