Systematic review of predictive models of microbial water quality at freshwater recreational beaches

Abstract

Monitoring of fecal indicator bacteria at recreational waters is an important public health measure to minimize water-borne disease; however, traditional culture methods for quantifying bacteria can take 18–24 hours to obtain a result. To support real-time notifications of water quality, models using environmental variables have been created to predict indicator bacteria levels on the day of sampling. We conducted a systematic review of predictive models of fecal indicator bacteria at freshwater recreational sites in temperate climates to identify and describe the existing approaches, trends, and their performance to inform beach water management policies. We implemented a comprehensive search strategy, including five databases and grey literature, screened abstracts for relevance, and extracted data using structured forms. Data were descriptively summarized. A total of 53 relevant studies were identified. Most studies (n = 44, 83%) were conducted in the United States and evaluated water quality using E. coli as fecal indicator bacteria (n = 46, 87%). Studies were primarily conducted in lakes (n = 40, 75%) compared to rivers (n = 13, 25%). The most commonly reported predictive model-building method was multiple linear regression (n = 37, 70%). Frequently used predictors in best-fitting models included rainfall (n = 39, 74%), turbidity (n = 31, 58%), wave height (n = 24, 45%), and wind speed and direction (n = 25, 47%, and n = 23, 43%, respectively). Of the 19 (36%) studies that measured accuracy, predictive models averaged 81.0% accuracy, and all but one were more accurate than traditional methods. Limitations identified by risk-of-bias assessment included not validating models (n = 21, 40%), limited reporting of whether modelling assumptions were met (n = 40, 75%), and lack of reporting on handling of missing data (n = 37, 70%). Additional research is warranted on the utility and accuracy of more advanced predictive modelling methods, such as Bayesian networks and artificial neural networks, which were investigated in comparatively fewer studies, and on creating risk-of-bias tools for non-medical predictive modelling.

Introduction

Between 2000 and 2014, 140 outbreaks were reported in 35 states and a territory in the United States (U.S.) in untreated recreational water sources, leading to 4958 cases of waterborne disease, with 84% of the outbreaks associated with a lake, pond, or reservoir [1]. However, when accounting for non-outbreak linked cases, underreporting, and missing state data, the estimate for total water-borne illness from recreational surface waters in the U.S. is around 90 million cases annually, costing $2.2–$3.7 billion USD in healthcare services [2]. Routine monitoring for water-borne pathogens is infeasible at recreational beaches; therefore, fecal indicator bacteria (FIB) are sampled as a marker of potential pathogen concentrations and risk of infection to bathers. Many pathogens spread via recreational water use can cause recreational water illness, including enteric viruses (e.g. norovirus, adenovirus) and bacterial and protozoal pathogens (e.g. Campylobacter, Salmonella, Cryptosporidium) [3, 4]. E. coli is often used as the indicator for the presence of these pathogens in freshwater beaches [5]. Enterococcus is occasionally used as an indicator in addition to or in place of E. coli, most commonly in marine waters [6–8]. E. coli is often a preferred indicator in freshwater sources due to its strong association with the risk of gastrointestinal illness in bathers [5, 9].

Decisions on whether to close or post beaches as potentially unsafe for swimming due to water quality concerns are made by public health officials or other beach managers. Traditionally, these decisions are based on evaluating whether FIB levels in beach waters exceed health-action threshold values. This approach has been termed the “persistence model” of beach management, because it typically relies on culture-based laboratory assessments of FIB counts which require 18–24 hours to obtain a result, leading beach managers to make water quality decisions using the previous day’s measurements. More modern genetic techniques, such as qPCR, can achieve results in 3–4 hours, but are costly for beach management and laboratories to run daily [10]. Some beach managers have moved to forecasting FIB levels using predictive models. These models typically use environmental inputs such as temperature, precipitation, and turbidity to predict FIB levels at beaches on a given day, which can then be validated and assessed with the subsequent FIB lab results [11, 12]. A wide variety of predictive modelling methods have been used at recreational beaches, including multiple linear regression [13, 14], artificial neural networks [15], and Bayesian networks [16]. These models use local weather and environmental data, collected from various sources, that are associated with FIB concentrations in the water [6, 17].

Given the variety of predictive modelling approaches and applications published to date, there is a need to identify and describe existing approaches, trends, and their accuracy to inform beach water management policies. The purpose of this systematic review was to identify and summarize the modelling methods used, where they have been applied, and their performance in correctly predicting beach water quality to support management decisions (e.g., posting a beach as unsuitable for swimming due to poor water quality). The review was conducted as part of a larger study to examine environmental influences on freshwater beach quality in Canada. Therefore, we have focused the scope on models developed for freshwater, recreational sites located in a temperate climate. To our knowledge, no systematic review exists on predictive models of fecal indicator bacteria at freshwater recreational sites in temperate climates.

Methods

Review question and eligibility criteria

The protocol for this review was created in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Protocol 2015 checklist [18]. The remainder of this review was written using the PRISMA 2020 statement [19]; a PRISMA checklist is located in S1 Table. A review protocol was developed a priori following Cochrane Collaboration review guidelines (see S1 Protocol) [20]. However, the protocol was not registered with any databases. The research questions were: 1) what types of predictive models were created for predicting FIB concentrations based on environmental variables for freshwater beach management decisions? 2) which predictors were included in these models? 3) how accurate are the models in determining if recreational water quality exceeds guideline recommendations?

Our eligibility criteria followed the PECO approach: Population, Exposure, Comparison, and Outcome. Our population of interest included freshwater beaches in temperate climates that are used for recreational purposes. Therefore, we excluded models focusing on coastal and estuarine waters, and waters not used for recreation (e.g., drinking water sources). Our exposure of interest included environmental data that can be collected in real time to support beach water monitoring, such as weather parameters and water conditions. We included models that compared accuracy to their original dataset, to persistence models, and that used other validation methods (e.g., bootstrapping). Our outcome of interest was FIB levels. Models predicting algal blooms were excluded. We included publications reporting on the development and/or evaluation of predictive models, reported in journal articles, conference proceedings, theses and dissertations, and government reports. Reviews and commentary articles were excluded.

Search strategy

We designed a comprehensive search strategy in collaboration with a research librarian. The following databases were used to search for relevant articles: Medline via OVID, SciTech Premium, Scopus, Web of Science, and ProQuest Dissertations & Theses Global. The search terms used in each database are provided in S2 Table. As an example of the search terms used, the search in Scopus was:

  1. (Escherichia coli OR enterococc* OR fecal indicator bacteria) AND (regression analysis OR predict* OR nowcast* OR forecast* OR model*) AND ("fresh water" OR recreational water OR beach* OR lake OR river) AND (weather OR monitor* OR rain* OR environmental).

All articles published until the search date, December 15, 2020, were included with no publication date restrictions. A grey literature search was also conducted and involved searching nine targeted government websites from December 10–14, 2020. A list of websites searched is available in the S3 Table. To ensure all relevant publications were captured, reference lists of relevant articles were hand-searched for additional potentially relevant articles.

Relevance screening

Citations identified by the searches were stored in a Mendeley database (Elsevier, Amsterdam, Netherlands), deduplicated, and then uploaded into DistillerSR (Evidence Partners, Ottawa, Canada). All articles were screened independently by two reviewers (CH and JS) at two levels: title and abstract screening (Level 1) and full article screening (Level 2).

Level 1 screening involved the question:

Is this reference potentially relevant to our review? (Yes/No).

Level 2 involved three screening questions:

Is this article about microbial water quality? (e.g., measuring E. coli, Enterococcus).

Is this article about freshwater, recreational beaches in a temperate region?

Does this article report on a predictive model for beach water quality using environmental data? (Yes/No for all).

Beaches were defined as any site intended for primary water contact activities (e.g., swimming, wading, water sports) to capture all recreational water sites. All screening forms were created prior to any screening and pre-tested by two reviewers screening 50 articles and discussing discrepancies. Pre-testing of Level 1 screening resulted in a kappa score of 0.76, after which the reviewers discussed their conflicts and agreed to proceed with independent reviewing after improving clarity on how to apply the eligibility criteria. Questions for level 2 were discussed prior to screening and tested on five articles by both reviewers to ensure consistent interpretation and clarity of the questions.
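As an illustration only, an agreement statistic such as the kappa score reported above could be computed along the lines of the following Python sketch. It assumes Cohen's kappa for two reviewers (the specific kappa variant is not stated in this section); the function name `cohens_kappa` and the ten screening decisions are hypothetical and are not the review's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical Level 1 decisions for ten citations (not the review's data).
reviewer_1 = ["Yes", "Yes", "No", "No", "Yes", "No", "No", "Yes", "No", "No"]
reviewer_2 = ["Yes", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 2))
```

In this made-up example the reviewers agree on 8 of 10 citations, but because some agreement is expected by chance, kappa is lower than the raw 80% agreement.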

Data characterization and extraction

Articles passing the screening process were obtained as full-texts and data were extracted using a pre-specified and pre-tested form. Data were extracted by CH into a form in DistillerSR, which can be found in S5 Table. The form included 20 questions that collected information such as location details of beaches, length of study, type of predictive model, variables explored in making the model, performance metrics of the model, and risk-of-bias. Data extraction results were independently validated by JS.

Risk-of-bias assessment and data analysis

Risk of bias of each relevant article was assessed using the CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS) [21]. We adapted the checklist from human health predictive models to environmental modelling. We considered “participants” to be beach days, and questions relating to human health were removed (e.g., details of treatments, blinding outcomes). Of 21 CHARMS questions, 10 were included in the data extraction form. Questions included sources of data, blinding predictors from outcomes, number and handling of missing data, predictor selection method, predictor transformations, and model validation methods and performance measures. Due to a priori knowledge that many studies collect data from government sources, predictor measurement methods were not included. CHARMS does not score studies based on bias; therefore, we did not determine an overall risk-of-bias score or rating for each study. Data from DistillerSR were downloaded in Excel (Microsoft, Redmond, United States) for analysis, which consisted of descriptive summary tabulations. Data visualizations were also created in Excel. While we report on performance metrics, we do not draw conclusions on validity nor compare models to each other. Meta-analysis was not deemed appropriate for this review given that predictive modelling approaches and performance metrics varied widely across studies.

Results

Of 1710 unique citations identified in the search, 53 relevant studies were identified and included in the review (Fig 1). A descriptive summary of the model types, variables, and performances from each relevant study is presented in Table 1.

Table 1. Summary characteristics of models extracted from 53 relevant articles that created predictive models of FIB using environmental variables.

https://doi.org/10.1371/journal.pone.0256785.t001

Studies were published from 2000 to 2021 (median of 2013). S6 Table summarizes study characteristics, including number of years of model building and publication type (Figs 2 and 3). While the maximum number of swimming seasons included in model building was 12 seasons [33], 19 (36%) of the studies used only one swimming season of data for model creation. Around half (26 studies, 49%) used two seasons or fewer. However, the number of seasons used in model building does not include seasons that were used solely for model validation in the 21 studies (40%) that used temporal validation.

Fig 2. Frequency of the number of swimming seasons used in building models.

https://doi.org/10.1371/journal.pone.0256785.g002

Five countries were represented in the included studies: U.S. (44 publications), Germany (4), Canada (2), New Zealand (2), and France (1). Additionally, the studies mostly focused on the Great Lakes, in particular Lake Michigan (20 studies) and Lake Erie (14) (Fig 4). Lake Ontario and Lake Superior were investigated in two studies each. No studies included Lake Huron. Overall, 40 studies (75%) modelled lakes and 13 studies (25%) modelled rivers. Fig 5 shows the frequency of the number of beaches in each study.

Fig 5. Frequency of the number of beaches, or sampling sites if beaches not provided.

https://doi.org/10.1371/journal.pone.0256785.g005

Table 2 summarizes modelling methods employed in the studies. The most commonly used model-building method was multiple linear regression, which was used in 37 studies (70%), while univariate linear regression was used in three (6%). Logistic regression, using a dichotomous outcome variable representing whether recreational waters met thresholds for safe use by bathers, was explored in five studies (9%). Additionally, tree regression or random forests were utilized in six studies (11%). Decision trees were created in three studies (6%). Beginning in 2012, more computationally advanced models were introduced, including Bayesian networks (five studies, 9%), artificial neural networks (three, 6%), and deterministic or hydrodynamic models (four, 8%). Several studies applied multiple modelling methods to compare their efficacy, including multiple linear regression, artificial neural networks, hydrodynamic models, Bayesian networks, and stacking of multiple models together.
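To make the most common approach concrete, the sketch below fits a multiple linear regression of log10-transformed E. coli on two predictors by solving the normal equations, then converts the prediction into a swim/no-swim call. It is a minimal illustration and not the method of any reviewed study: the predictors, the noise-free synthetic data, the coefficients, and the threshold of 235 CFU/100 mL are all assumptions chosen for the example.

```python
import math


def fit_ols(rows, targets):
    """Fit y = b0 + b1*x1 + ... + bk*xk by solving (X'X) b = X'y."""
    design = [[1.0] + list(row) for row in rows]  # prepend intercept column
    k = len(design[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(d[i] * d[j] for d in design) for j in range(k)]
        + [sum(d[i] * y for d, y in zip(design, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def predict(coefs, row):
    """Predicted log10 FIB concentration for one beach day."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))


# Synthetic beach days: (log10 turbidity, 24-hour rainfall in mm). Hypothetical.
rows = [(0.5, 0.0), (1.0, 5.0), (1.5, 0.0), (2.0, 20.0), (0.8, 10.0), (1.2, 2.0)]
# Noise-free targets generated from assumed coefficients, so the fit is exact.
targets = [1.0 + 0.8 * turb + 0.02 * rain for turb, rain in rows]

coefs = fit_ols(rows, targets)

THRESHOLD_CFU = 235.0  # assumed health-action threshold, for illustration only
predicted_log10 = predict(coefs, (2.0, 30.0))
exceeds = predicted_log10 > math.log10(THRESHOLD_CFU)
```

Real monitoring data are noisy, so a fitted model would not recover coefficients exactly as it does here; the point is only to show how a regression on log-transformed counts feeds a threshold-based posting decision.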

Table 2. Modelling techniques for creating the predictive models present in 53 relevant studies.

https://doi.org/10.1371/journal.pone.0256785.t002

E. coli was the most commonly investigated FIB (n = 46 studies, 87%), while 11 (21%) modelled Enterococcus, one (2%) modelled total fecal coliforms, and one (2%) included models for Salmonella and Campylobacter. Of these, 34 (64%) studies log-transformed the concentration of the FIB of interest.

The predictor variables examined and included in final models are presented in S7 Table and Fig 6. The variables used in most studies’ final models were turbidity, wind direction, wave height, and wind speed. Time variables were important in creating models, as seen with the regular inclusion of day of year, sampling time, and month/sub-season variables in final models. Forty-five (85%) studies assessed rainfall variables, including amount of rainfall in the previous <24, 24, 48, or 72 or more hours, the length of time since the last rainfall, or intensity of the last rainfall. Three commonly transformed variables were log10(turbidity), log10(discharge), and weighted rainfall. Most studies obtained these environmental variables from government sources such as US Geological Survey river gauges and National Weather Service airport weather stations.
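The transformations named above can be sketched in a few lines of Python. This is a hedged illustration: the +1 offset used to avoid log10(0), the 3:2:1 weighting of the three antecedent days, and the function names are assumptions made for the example, since individual studies defined their weighted rainfall variables in different ways.

```python
import math


def log10_transform(value, offset=1.0):
    """log10 of a non-negative measurement; the offset (assumed) avoids log10(0)."""
    if value < 0:
        raise ValueError("measurement must be non-negative")
    return math.log10(value + offset)


def weighted_rainfall(rain_0_24h, rain_24_48h, rain_48_72h, weights=(3.0, 2.0, 1.0)):
    """Weighted mean of daily rainfall totals, emphasizing the most recent day.

    The default weights are hypothetical, chosen only to show the idea.
    """
    w1, w2, w3 = weights
    total = w1 * rain_0_24h + w2 * rain_24_48h + w3 * rain_48_72h
    return total / (w1 + w2 + w3)


# Example beach day: turbidity of 99 NTU; 12 mm of rain yesterday, none the
# day before, and 6 mm three days ago (all values hypothetical).
turbidity_feature = log10_transform(99.0)
rain_feature = weighted_rainfall(12.0, 0.0, 6.0)
```

With these inputs the rainfall feature is (3 × 12 + 2 × 0 + 1 × 6) / 6 = 7 mm, so yesterday's rain dominates the value even though rain also fell three days earlier.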

Fig 6. Frequency of environmental variables explored in studies and frequency of variables included in final models.

https://doi.org/10.1371/journal.pone.0256785.g006

Accuracy of predictive models was measured in 19 studies. The average accuracy across these studies was 81% (S8 Table). Of these studies, 13 compared their accuracy to pre-existing persistence models at those locations; in all but one of these studies, all or most of the models were more accurate than the persistence models.
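The comparison against a persistence model can be illustrated with a short Python sketch that scores two sets of exceedance calls against the same day-of-sampling results. Here the persistence model simply carries the previous day's observed exceedance forward, as described in the Introduction; the observed series and the model's calls are hypothetical values, and the function name `classification_metrics` is an assumption made for the example.

```python
def classification_metrics(predicted, observed):
    """Accuracy, sensitivity and specificity for paired boolean exceedance calls."""
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)          # correctly posted
    tn = sum(1 for p, o in pairs if not p and not o)  # correctly left open
    fp = sum(1 for p, o in pairs if p and not o)      # posted unnecessarily
    fn = sum(1 for p, o in pairs if not p and o)      # missed exceedance
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


# Hypothetical daily exceedance results (True = FIB above threshold).
observed = [False, False, True, False, False, True, True, False]

# Persistence model: yesterday's result is used as today's call.
persistence_calls = observed[:-1]
todays_results = observed[1:]

# Hypothetical calls from a predictive model for the same seven days.
model_calls = [False, True, False, False, False, True, False]

persistence = classification_metrics(persistence_calls, todays_results)
model = classification_metrics(model_calls, todays_results)
```

In this toy series the persistence model is right on 3 of 7 days and the predictive model on 6 of 7, which shows why studies report sensitivity and specificity alongside accuracy: the persistence model's errors are split between missed exceedances and unnecessary postings.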

Risk-of-bias characteristics of each individual study are presented as S9 Table, while summary data are presented in Table 3. We found that one study adjusted predictor weights to address overfitting (regularization of data) and only three studies (6%) compared predictors’ calibration distributions to validation distributions. Additionally, little information was provided on the handling of missing data, with only 17 (32%) studies reporting any method of dealing with missing FIB concentrations or predictor values. Modelling assumptions, such as normality, were rarely fully addressed, with only 12 (23%) studies affirming they met all model assumptions.

Table 3. Risk of bias checklist summary for 53 relevant studies.

https://doi.org/10.1371/journal.pone.0256785.t003

Predictor measurements were mostly collected from governmental sources (37 studies, 70%) or directly by the authors (28 studies, 53%) deploying their own instruments or water sampling. The most common predictor transformations were categorization (20 studies, 38%), logarithmic transformation (18 studies, 34%), and weighting of rainfall over several days (11 studies, 21%); some studies used other transformations, such as polynomial [64] or trigonometric transformations [34]. Twenty-seven studies (51%) reported they used no pre-screening criteria for selecting variables that were evaluated in multivariable modelling. To select predictors in final models, 13 studies (25%) used model fit characteristics of predicted values compared to actual values of FIB concentrations in many or all possible models. A full model approach using all variables was used in 10 studies (19%). Other techniques included backwards elimination, Akaike’s Information Criterion, and forward selection. Seven (13%) studies created models using the Virtual Beach software tool.
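One way a trigonometric transformation can be applied to beach predictors is to resolve wind speed and direction into components relative to the shoreline, so that a circular variable (direction in degrees) becomes two linear ones. The sketch below is an assumption made for illustration and is not taken from the cited study; the function name, the angle convention, and the example values are hypothetical.

```python
import math


def wind_components(speed, wind_from_deg, onshore_from_deg):
    """Split wind into onshore and alongshore components.

    wind_from_deg: compass bearing the wind blows from.
    onshore_from_deg: bearing from which a directly onshore wind would blow
    (a property of the beach's orientation, assumed known).
    """
    angle = math.radians(wind_from_deg - onshore_from_deg)
    onshore = speed * math.cos(angle)      # positive when blowing onto the beach
    alongshore = speed * math.sin(angle)   # sign gives direction along the shore
    return onshore, alongshore


# Hypothetical beach facing east: a directly onshore wind blows from 90 degrees.
onshore_a, alongshore_a = wind_components(5.0, 90.0, 90.0)   # fully onshore
onshore_b, alongshore_b = wind_components(5.0, 180.0, 90.0)  # parallel to shore
onshore_c, alongshore_c = wind_components(5.0, 270.0, 90.0)  # fully offshore
```

Encoding direction this way avoids treating 359 degrees and 1 degree as far apart, which a raw direction variable in a linear model would do.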

Discussion

This review compiles results of the literature reporting on predictive models of FIB at fresh, recreational waters using environmental predictors. It provides novel insight on key variables of interest, modelling techniques, and considerations of modelling for those looking to create predictive models at other waters. Our review is the first to provide a systematic approach to reviewing the literature in this area. It focuses exclusively on fresh, recreational waters, and further explores the role of various environmental predictors, which is novel to the literature of this type of modelling. de Brauwere et al. reviewed regression and hydrodynamic models predicting FIB in all surface waters in 2014, and provided an in-depth summary of important processes for hydrodynamic models [72]. We similarly found that most relevant studies in this area were conducted in the U.S., despite wider search parameters. Additionally, this review reports on the validation techniques and amount of data used during model building and validation of reviewed studies.

As the geology, pollution sources, and climate of beaches differ geographically, building beach-specific models is important for accuracy [13, 65, 72]. Even in the same region, different bodies of water behave differently. For example, Hatfield [43] created an effective model for FIB in Lake Erie, but a similar model for a nearby artificial lake performed poorly. However, geographically similar beaches within a specific region may be able to be modelled similarly to help reduce resources required to build models [54]. Different beach models may require different modelling approaches and environmental variables, so it is important to explore these elements in new contexts before generalizing models to other beaches.

Predictive modelling has the ability to overcome several issues in recreational water monitoring. Firstly, it addresses the reliance on persistence models, where the accuracy of posting beaches as suitable or unsuitable for swimming and other water activities depends on FIB concentrations remaining consistent across the 24-hour lab-response time. It also does not require the large resource and capacity investment of upgrading to qPCR for rapid testing, as most beach managers collect FIB data and government weather and water stations are already set up at or near many recreational waterways, resulting in less investment to collect data to develop and implement models. However, these techniques can still be integrated together. The city of Chicago has adopted a hybrid model for determining beach water quality [73]. The five beaches (out of 20) that produce 56% of poor water quality days are tested with qPCR every day, with the others placed into clusters, with one beach per cluster tested with qPCR and the rest predicted with models. This hybrid approach identifies poor water quality days three times more accurately than the previous predictive models alone. The rapid testing ensures accuracy, while the predictive models reduce costs and may provide a solution to the shortcomings of both methods.

The efficacy of predictive models depends on the quality and accuracy of information put into them. Thirty-seven studies collected at least some of their environmental data from governmental sources, which are likely to be reliable in quality. While they might reflect slightly different weather conditions from beaches, due to being located elsewhere, such small changes are not likely to be a limitation in modelling. Rainfall is an important environmental factor as it washes microbial contamination from urban surfaces and agricultural sources into larger bodies of water, and increases sewer and river discharge [35, 47]. As a result, elevated E. coli levels are often associated with extreme rainfall events [69]. A wide range of timeframes for antecedent rainfall were explored, from a few hours prior to sampling to several days before. For easier interpretation, this review categorized these times as <24 hours, 24 hours, 48 hours, and 72 or more hours. Of the studies that explored times across this range, the most commonly used time in final models was 72+ hours [48, 61, 64]. Some studies also evaluated weighted rainfall variables that emphasized more recent rainfall across a 3-day period. Regardless, when explored in a study, every rainfall variable was included in at least one final model more than 50% of the time, indicating the value of examining and comparing a variety of ways of expressing rainfall.

After rainfall, turbidity was the most frequently included variable in at least one final model. Its importance relates to the association of bacteria with sediments and particulate suspended solids [74]. As UV radiation can kill E. coli, higher turbidity can protect the bacteria by absorbing or scattering solar radiation [75]. The importance of sand-associated FIB was shown at a beach in Lake Huron, where erosion of sand was the main source of E. coli from the foreshore to surface water, mediated by wave height [76]. Larger waves may also be responsible for washing bird fecal matter from the beach into the water [54]. Wind direction and speed are important explanatory variables as they are associated with driving FIB from sediments or point sources towards the beach [77, 78]. Winds, waves, and turbidity are often correlated parameters, as winds and waves churn sediments, which increases turbidity [43, 78].

While explored less often, temporal variables were consistently included in final models, 100% of the time for day of year, day of week, and time of sampling, and 75% of the time for sub-season/month. FIB may accumulate in water bodies over the summer and, on average, increase over time during the bathing season [34]. Depending on characteristics, FIB concentrations may increase as the day progresses [66] or decrease [65] due to solar inactivation. This result is also dependent on enumeration method, as Telech et al. found that time of day was an important predictor of Enterococcus cell counts, but not qPCR results [65]. Pollution sources, such as waterfowl, other bathers, and discharge into the body of water were similarly explored less often but were nonetheless important considerations.

Numerous modelling techniques and predictor selection methods were utilized in the studies included in this review. Multiple linear regression methods were the most popular and were shown to produce accurate predictions. However, other methods may produce more accurate predictions. Comparing models built at different locations with different variables and rates of FIB exceedances would not yield accurate comparisons; however, four studies included in this review compared modelling techniques using the same data and were thus able to compare techniques. The best performing models in these four studies were artificial neural networks [50], Bayesian networks [23], a gradient boosting machine (a tree-based ensemble method) [30], and a model stacking algorithm that combines two or more models into one prediction [67]. All outperformed regression methods such as ordinary, partial, and sparse partial least squares methods for multiple linear regression, and were more consistent across years and locations. Further research is warranted on these approaches and their utility for implementation in routine beach water quality monitoring.

Predictor selection was also varied, but no comparisons of methods were conducted. However, seven studies (13%) used the Virtual Beach tool, created by the U.S. Environmental Protection Agency, which is intended to aid researchers and beach managers in creating predictive models [79]. The tool allows users to upload data, explore relationships among variables, transform variables, use different regression-based modelling techniques (including a recent addition of a gradient boosting machine), and evaluate models based on several model fit characteristics. The tool is free and designed to be user-friendly to support implementation of modelling at more beaches. While a gradient boosting machine was added, it still relies on regression techniques. Models created by the tool outperformed persistence models in some studies [27] but not others [37].

A few key limitations in the literature were found in the risk-of-bias assessment. For instance, 22 studies validated their models by refitting the model to the original dataset used to build it, without internal validation (bootstrapping or cross-validation), which increases the risk of overfitting [21]. Furthermore, only 13 studies (25%) specified whether or not modelling assumptions were met, which could impact model accuracy and reliability. Lastly, 37 studies (70%) did not provide any information about how missing data were dealt with, which raises additional concerns about reliability of the models. The risk of bias checklist, CHARMS, required several modifications for this review compared to its intended context of human health outcomes. A checklist intended for systematic reviews of non-health related predictive models would benefit future reviews and improve reporting of risk of bias information when creating predictive models in this research area.
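The difference between evaluating a model on the data that built it and using internal validation can be shown with a small Python sketch. It compares the apparent (resubstitution) error of a one-predictor linear regression with its leave-one-out cross-validation error. The data, variable names, and the choice of leave-one-out rather than bootstrapping or k-fold splits are assumptions made for illustration.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def apparent_rmse(xs, ys):
    """Error when the model is scored on the same data used to fit it."""
    b0, b1 = fit_line(xs, ys)
    return rmse([y - (b0 + b1 * x) for x, y in zip(xs, ys)])


def loocv_rmse(xs, ys):
    """Leave-one-out cross-validation: each day is predicted by a model
    fitted without that day."""
    errors = []
    for i in range(len(xs)):
        b0, b1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append(ys[i] - (b0 + b1 * xs[i]))
    return rmse(errors)


# Hypothetical data: log10 turbidity (x) and log10 E. coli (y) on eight days.
log_turbidity = [0.2, 0.5, 0.9, 1.1, 1.4, 1.8, 2.0, 2.3]
log_ecoli = [1.3, 1.4, 2.0, 1.7, 2.4, 2.3, 2.9, 2.8]

apparent = apparent_rmse(log_turbidity, log_ecoli)
cross_validated = loocv_rmse(log_turbidity, log_ecoli)
optimism = cross_validated - apparent
```

For least-squares regression the cross-validated error is never smaller than the apparent error, so the gap between the two gives a rough sense of how optimistic a resubstitution-only evaluation is.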

The goal of predictive models is to produce more accurate results than persistence models, which use the previous day’s FIB measurement for current-day decisions. Most models included in this review outperformed persistence models to varying degrees, in terms of sensitivity, specificity, and/or accuracy, supporting the use of predictive models in management decisions [27, 35, 64, 70, 80]. However, even if models are used for management decisions, routine water sampling for FIB should still be conducted to ensure models remain valid, and are updated and refined as appropriate, across seasons. To ensure models are up to date, the U.S. Geological Survey suggests that beach managers update their predictive models before every new bathing season [27, 70], which may not always occur in practice [81].

Once an accurate model is created, its use by beach management or the public to make decisions regarding recreational activities requires a user-friendly interface. The U.S. Geological Survey Great Lakes NowCast provides real-time estimates of beach water quality along Lake Erie and Lake Ontario to the public [81]. It was built from the Ohio NowCast system, and several studies in this review were used in developing this tool [35, 36, 38]. The predictive models created for the Cuyahoga River were also added into the Ohio NowCast [27, 28]. The website allows users to examine current and past conditions, and also explains factors in the model. The Philly Rivercast [82] provides nowcasts for the Schuylkill River, and its development was outlined by Maimone et al. [49]. These platforms are used by beach managers and the public, which allows authorities to make real-time water quality decisions easily, and the public to learn about beach postings prior to arrival and decide whether to swim or engage in other recreational activities at the beach. Additionally, as seen with the Great Lakes NowCast, these platforms can be modified and scaled to include new beaches as appropriate.

This review had some limitations. First, while grey literature was included, only selected government websites were searched. Therefore, we could have missed some relevant studies. However, our search verification strategy helped to mitigate this potential bias. Second, our review was geographically limited to fresh, recreational waters in temperate regions, excluding models created for marine, tropical, and subtropical waters. Predictive models in those settings may have different environmental predictors and performance.

Conclusions

This review is the first to systematically examine literature on predictive models for FIB levels in fresh, recreational waters. The review reports on 53 relevant articles extracted from five databases. We have highlighted commonly explored and frequently used environmental variables and modelling techniques that can inform future predictive modelling projects and options for beach managers. Rainfall, turbidity, wind, and wave height were most commonly incorporated into final models, and most models used linear regression. Evidence supports use of real-time models of FIB levels as an indicator of water quality rather than or in addition to using persistence models. At locations with consistent monitoring of FIB, predictive models can improve the effectiveness and response times of risk communication with beachgoers about recreational water quality risks, which can help to potentially reduce water-borne illness. A risk of bias checklist was adapted for this review and identified common limitations in the literature. Future research may benefit from a risk of bias checklist intended for non-medical predictive models. This review provides insight for researchers and beach managers interested in creating their own predictive models in terms of key variables, modelling approaches, and bias-reduction techniques to consider. More research should be conducted to evaluate the effectiveness and utility of more advanced predictive modelling approaches such as artificial neural networks, Bayesian approaches, and other machine learning methods.

Supporting information

S1 Table. PRISMA checklist of systematic review components and where each can be found in the review.

https://doi.org/10.1371/journal.pone.0256785.s001

(PDF)

S2 Table. Search terms used in each database.

https://doi.org/10.1371/journal.pone.0256785.s002

(PDF)

S3 Table. Grey literature search of government websites and their URLs.

Searched December 10–14, 2020.

https://doi.org/10.1371/journal.pone.0256785.s003

(PDF)

S4 Table. Eligibility criteria to define microbes of interest, geography, predictors of interest, and types of publications.

https://doi.org/10.1371/journal.pone.0256785.s004

(PDF)

S5 Table. Data extraction form, including primary outcomes and risk of bias questions.

https://doi.org/10.1371/journal.pone.0256785.s005

(PDF)

S6 Table. Descriptive summary of number of swimming seasons used for model building, number of beaches investigated, FIB of interest, geography of beaches, and type of publication of the 53 relevant studies.

https://doi.org/10.1371/journal.pone.0256785.s006

(PDF)

S7 Table. Frequency of variables explored in studies and used in a final model for predicting microbial water quality.

https://doi.org/10.1371/journal.pone.0256785.s007

(PDF)

S8 Table. Average accuracy of models that assessed accuracy and whether or not they performed better than persistence models.

https://doi.org/10.1371/journal.pone.0256785.s008

(PDF)

S9 Table. Risk-of-bias characteristics of 53 articles reporting on predictive models of fecal indicator bacteria using environmental predictors, excluding characteristics found in Table 1 of main text.

https://doi.org/10.1371/journal.pone.0256785.s009

(PDF)

S1 Protocol. Protocol for systematic literature review.

https://doi.org/10.1371/journal.pone.0256785.s010

(PDF)

Acknowledgments

We would like to thank Cecile Farnum, a research librarian at Ryerson University, for assistance with the search strategy.

References

  1. Graciaa DS, Cope JR, Roberts VA, Cikesh BL, Kahler AM, Vigar M, et al. Outbreaks associated with untreated recreational water—United States, 2000–2014. MMWR Morbidity and Mortality Weekly Report. 2018 Jun;67:701–6. pmid:29953425
  2. DeFlorio-Barker S, Wing C, Jones RM, Dorevitch S. Estimate of incidence and cost of recreational waterborne illness on United States surface waters. Environmental Health. 2018 Dec 9;17.
  3. Soller J, Bartrand T, Ravenscroft J, Molina M, Whelan G, Schoen M, et al. Estimated human health risks from recreational exposures to stormwater runoff containing animal faecal material. Environmental Modelling and Software. 2015;72:21–32.
  4. Soller JA, Bartrand T, Ashbolt NJ, Ravenscroft J, Wade TJ. Estimating the primary etiologic agents in recreational freshwaters impacted by human sources of faecal contamination. Water Research. 2010;44:4736–47. pmid:20728915
  5. Health Canada. Guidelines for Canadian Recreational Water Quality–Third Edition- Part II: Guideline Technical Documentation. 2012.
  6. Jones RM, Liu L, Dorevitch S. Hydrometeorological variables predict fecal indicator bacteria densities in freshwater: Data-driven methods for variable selection. Environmental Monitoring and Assessment. 2013 Mar 27;185:2355–66. pmid:22736208
  7. World Health Organization. Guidelines for Safe Recreational Water Environments Volume 1: Coastal and Fresh Waters. 2003.
  8. Government of Australia National Health and Research Council. Guidelines for Managing Risks in Recreational Water. 2008.
  9. Marion JW, Lee J, Lemeshow S, Buckley TJ. Association of gastrointestinal illness and recreational water exposure at an inland U.S. beach. Water Research. 2010;44:4796–804. pmid:20723965
  10. Shrestha A, Dorevitch S. Slow adoption of rapid testing: Beach monitoring and notification using qPCR. Journal of Microbiological Methods. 2020 Jul;174:105947. pmid:32442655
  11. Francy DS, Brady AMG, Cicale JR, Dalby HD, Stelzer EA. Nowcasting methods for determining microbiological water quality at recreational beaches and drinking-water source waters. Journal of Microbiological Methods. 2020;175. pmid:32522491
  12. Mälzer H-J, aus der Beek T, Müller S, Gebhardt J. Comparison of different model approaches for a hygiene early warning system at the lower Ruhr River, Germany. International Journal of Hygiene and Environmental Health. 2016;219:671–80. pmid:26163780
  13. Shively DA, Nevers MB, Breitenbach C, Phanikumar MS, Przybyla-Kelly K, Spoljaric AM, et al. Prototypic automated continuous recreational water quality monitoring of nine Chicago beaches. Journal of Environmental Management. 2016 Jan 15;166:285–93. pmid:26517277
  14. Madani M, Seth R. Evaluating multiple predictive models for beach management at a freshwater beach in the Great Lakes region. Journal of Environmental Quality. 2020;49:896–908. pmid:33016491
  15. Zhang J, Qiu H, Li X, Niu J, Nevers MB, Hu X, et al. Real-Time Nowcasting of Microbiological Water Quality at Recreational Beaches: A Wavelet and Artificial Neural Network-Based Hybrid Modeling Approach. Environmental Science and Technology. 2018 Aug 7;52:8446–55. pmid:29957996
  16. Mellios NK, Moe SJ, Laspidou C. Using Bayesian hierarchical modelling to capture cyanobacteria dynamics in Northern European lakes. Water Research. 2020 Nov;186:116356. pmid:32889364
  17. Nevers MB, Whitman RL. Efficacy of monitoring and empirical predictive modeling at improving public health protection at Chicago beaches. Water Research. 2011;45:1659–68. pmid:21195447
  18. Shamseer L, Moher D, Clarke M, Ghersi D, Liberati A, Petticrew M, et al. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015: Elaboration and explanation. BMJ. 2015 Jan 2;349. pmid:25555855
  19. Page MJ, Moher D, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ. 2021;372.
  20. Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane Handbook for Systematic Reviews of Interventions version 6.1. 2020. www.training.cochrane.org/handbook.
  21. Moons KGM, de Groot JAH, Bouwmeester W, Vergouwe Y, Mallett S, Altman DG, et al. Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies: The CHARMS Checklist. PLoS Medicine. 2014;11. pmid:25314315
  22. Anderson KW. Rainfall and Turbidity as Simple Alternative Predictors of Beach Water Quality in Chicago. ProQuest Dissertations and Theses. [Ann Arbor]: University of Illinois at Chicago; 2019.
  23. Avila R, Horn B, Moriarty E, Hodson R, Moltchanova E. Evaluating statistical model performance in water quality prediction. Journal of Environmental Management. 2018;206:910–9. pmid:29207304
  24. Bachmann-Machnik A, Dittmer U, Schönfeld A. Using Precipitation and Combined Sewer Overflow Data for Predicting Hygienic Contaminations in Bathing Waters–A Data Analysis. In: Green Energy and Technology. Springer Verlag; 2019. p. 654–9.
  25. Brady AMG, Bushon RN, Plona MB. Predicting Recreational Water Quality Using Turbidity in the Cuyahoga River, Cuyahoga Valley National Park, Ohio, 2004–7.
  26. Brady AMG, Plona MB. Relations Between Environmental and Water-Quality Variables and Escherichia coli in the Cuyahoga River With Emphasis on Turbidity as a Predictor of Recreational Water Quality, Cuyahoga Valley National Park, Ohio, 2008.
  27. Brady AMG, Plona MB. Towards Automating Measurements and Predictions of Escherichia coli Concentrations in the Cuyahoga River, Cuyahoga Valley National Park, Ohio, 2012–14. 2015.
  28. Brady AM, Plona MB. USGS Report Series SIR 2012–5074: Development and Implementation of a Regression Model for Predicting Recreational Water Quality in the Cuyahoga River, Cuyahoga Valley National Park, Ohio 2009–11. 2012.
  29. Brooks WR, Fienen MN, Corsi SR. Partial least squares for efficient models of fecal indicator bacteria on Great Lakes beaches. Journal of Environmental Management. 2013 Jan 5;114:470–5. pmid:23186726
  30. Brooks W, Corsi S, Fienen M, Carvin R. Predicting recreational water quality advisories: A comparison of statistical methods. 2016 Feb 1;76:81–94.
  31. Corsi SR, Borchardt MA, Carvin RB, Burch TR, Spencer SK, Lutz MA, et al. Human and Bovine Viruses and Bacteria at Three Great Lakes Beaches: Environmental Variable Associations and Health Risk. Environmental Science & Technology. 2016 Jan;50:987–95.
  32. Cyterski M, Zhang S, White E, Molina M, Wolfe K, Parmar R, et al. Temporal synchronization analysis for improving regression modeling of fecal indicator bacteria levels. Water, Air, and Soil Pollution. 2012 Sep 4;223:4841–51.
  33. Dada AC, Hamilton DP. Predictive Models for Determination of E. coli Concentrations at Inland Recreational Beaches. Water, Air, and Soil Pollution. 2016 Sep;227.
  34. Francy DS, Gifford AM, Darner RA. Escherichia coli at Ohio Bathing Beaches Distribution, Sources, Wastewater Indicators, and Predictive Modeling Water-Resources Investigations Report 02 4285. 2003.
  35. Francy DS, Brady AMG, Carvin RB, Corsi SR, Fuller LM, Harrison JH, et al. USGS Scientific Investigations Report 2013–5166: Developing and Implementing Predictive Models for Estimating Recreational Water Quality at Great Lakes Beaches. 2013.
  36. Francy DS, Darner RA. Nowcasting Beach Advisories at Ohio Lake Erie Beaches. 2007.
  37. Francy DS, Stelzer EA, Duris JW, Brady AMG, Harrison JH, Johnson HE, et al. Predictive models for Escherichia coli concentrations at inland lake beaches and relationship of model variables to pathogen detection. Applied and Environmental Microbiology. 2013 Mar 1;79:1676–88. pmid:23291550
  38. Francy DS, Bertke EE, Darner RA. Testing and Refining the Ohio Nowcast at Two Lake Erie Beaches-2008. 2009.
  39. Francy DS, Darner RA, Bertke EE. Models for Predicting Recreational Water Quality at Lake Erie Beaches, SIR 2006–5192.
  40. Francy DS, Darner RA. Forecasting Bacteria Levels at Bathing Beaches in Ohio. 2003.
  41. Frick W. Bacteria, Beaches, and Swimmable Waters: Introducing Virtual Beach. 2006.
  42. Frick WE, Ge Z, Zepp RG. Nowcasting and forecasting concentrations of biological contaminants at beaches: A feasibility and case study. Environmental Science and Technology. 2008 Jul 1;42:4818–24. pmid:18678011
  43. Hatfield NLC. Tracking diverse sources of recreational beach contamination by fatty acid methyl ester (FAME) analysis and monitoring beaches for public safety. ProQuest Dissertations and Theses. [Ann Arbor]: The University of Toledo; 2000.
  44. He C, Post Y, Dony J, Edge T, Patel M, Rochfort Q. A physical descriptive model for predicting bacteria level variation at a dynamic beach. Journal of Water and Health. 2016 Aug 1;14:617–29. pmid:27441857
  45. Heberger MG, Durant JL, Oriel KA, Kirshen PH, Minardi L. Combining real-time bacteria models and uncertainty analysis for establishing health advisories for recreational waters. Journal of Water Resources Planning and Management-ASCE. 2008 Jan 1;134:73–82.
  46. Herrig I, Seis W, Fischer H, Regnery J, Manz W, Reifferscheid G, et al. Prediction of fecal indicator organism concentrations in rivers: the shifting role of environmental factors under varying flow conditions. Environmental Sciences Europe. 2019 Dec 1;31. pmid:33747698
  47. Hong Y, Soulignac F, Roguet A, Li C, Lemaire BJ, Martins RS, et al. Impact of Escherichia coli from stormwater drainage on recreational water quality: an integrated monitoring and modelling of urban catchment, pipes and lake. Environmental Science and Pollution Research. 2021;28:2245–2259. pmid:32876821
  48. Jones RM, Liu L, Dorevitch S. Hydrometeorological variables predict fecal indicator bacteria densities in freshwater: data-driven methods for variable selection. Environmental Monitoring and Assessment. 2013 Mar 27;185:2355–66. pmid:22736208
  49. Maimone M, Crockett CS, Cesanek WE. PhillyRiverCast: A real-time bacteria forecasting model and web application for the Schuylkill River. Journal of Water Resources Planning and Management-ASCE. 2007;133:542–9.
  50. Mälzer H-J, aus der Beek T, Müller S, Gebhardt J. Comparison of different model approaches for a hygiene early warning system at the lower Ruhr River, Germany. International Journal of Hygiene and Environmental Health. 2016;219:671–80. pmid:26163780
  51. Marion JW. Protecting Public Health at Inland Ohio Beaches: Development of Recreational Water Quality Indicators Predictive of Microbial and Microcystin Exposure. ProQuest Dissertations and Theses. [Ann Arbor]: The Ohio State University; 2011.
  52. Molina M, Cyterski M, Whelan G, Zepp R. Comparing Data Input Requirements of Statistical vs. Process-based Watershed Models Applied for Prediction of Fecal Indicator and Pathogen Levels in Recreational Beaches. United States Environmental Protection Agency. 2014.
  53. Motamarri S, Boccelli DL. Development of a neural-based forecasting tool to classify recreational water quality using fecal indicator organisms. Water Research. 2012 Sep 15;46:4508–20. pmid:22743163
  54. Nevers MB, Shively DA, Kleinheinz GT, McDermott CM, Schuster W, Chomeau V, et al. Geographic relatedness and predictability of Escherichia coli along a peninsular beach complex of Lake Michigan. Journal of Environmental Quality. 2009 Jan;38:2357–64. pmid:19875791
  55. Nevers MB, Whitman RL. Nowcast modeling of Escherichia coli concentrations at multiple urban beaches of southern Lake Michigan. Water Research. 2005 Dec 1;39:5250–60. pmid:16310242
  56. Nevers MB, Whitman RL, Frick WE, Ge Z. Interaction and influence of two creeks on Escherichia coli concentrations of nearby beaches: Exploration of predictability and mechanisms. Journal of Environmental Quality. 2007;36:1338–45. pmid:17636296
  57. Nevers MB, Whitman RL. Coastal strategies to predict Escherichia coli concentrations for beaches along a 35 km stretch of southern Lake Michigan. Environmental Science and Technology. 2008 Jun 15;42:4454–60. pmid:18605570
  58. Olyphant GA. Statistical basis for predicting the need for bacterially induced beach closures: Emergence of a paradigm? Water Research. 2005 Dec;39:4953–60. pmid:16290180
  59. Olyphant GA, Whitman RL. Elements of a Predictive Model for Determining Beach Closures on a Real Time Basis: The Case of 63rd Street Beach Chicago. Environmental Monitoring and Assessment. 2004 Nov;98:175–90. pmid:15473535
  60. Parkhurst DF, Brenner KP, Dufour AP, Wymer LJ. Indicator bacteria at five swimming beaches-analysis using random forests. Water Research. 2005;39:1354–60. pmid:15862335
  61. Rossi A, Wolde BT, Lee LH, Wu M. Prediction of recreational water safety using Escherichia coli as an indicator: case study of the Passaic and Pompton rivers, New Jersey. Science of the Total Environment. 2020 Apr 20;714. pmid:32018971
  62. Safaie A, Wendzel A, Ge Z, Nevers MB, Whitman RL, Corsi SR, et al. Comparative Evaluation of Statistical and Mechanistic Models of Escherichia coli at Beaches in Southern Lake Michigan. Environmental Science & Technology. 2016 Mar 1;50:2442–9. pmid:26825142
  63. Seis W, Zamzow M, Caradot N, Rouault P. On the implementation of reliable early warning systems at European bathing waters using multivariate Bayesian regression modelling. Water Research. 2018 Oct 15;143:301–12. pmid:29986240
  64. Simmer RA. Source determination and predictive model development for Escherichia coli concentrations at F.W. Kent Park Lake, Oxford, Iowa. ProQuest Dissertations and Theses. [Ann Arbor]: The University of Iowa; 2016.
  65. Telech JW, Brenner KP, Haugland R, Sams E, Dufour AP, Wymer L, et al. Modeling Enterococcus densities measured by quantitative polymerase chain reaction and Water Research. 2009 Nov;43:4947–55. pmid:19651425
  66. Uejio CK, Peters TW, Patz JA. Inland lake indicator bacteria: long-term impervious surface and weather influences and a predictive Bayesian model. Lake and Reservoir Management. 2012;28:232–244.
  67. Wang L, Zhu Z, Sassoubre L, Yu G, Liao C, Hu Q, et al. Improving the robustness of beach water quality modeling using an ensemble machine learning approach. Science of the Total Environment. 2020. pmid:33131841
  68. Wendzel A. Constraining mechanistic models of indicator bacteria at recreational beaches in Lake Michigan using easily-measurable environmental variables. ProQuest Dissertations and Theses. [Ann Arbor]: Michigan State University; 2014.
  69. Whitman RL, Nevers MB. Summer E. coli patterns and responses along 23 Chicago beaches. Environmental Science and Technology. 2008;42:9217–24. pmid:19174895
  70. Zimmerman TM. Modeling to Predict Escherichia coli at Presque Isle Beach 2, City of Erie, Erie County, Pennsylvania. 2008.
  71. Zimmerman TM. Monitoring and Modeling to Predict Escherichia coli at Presque Isle Beach 2, City of Erie, Erie County, Pennsylvania. 2006.
  72. De Brauwere A, Ouattara NK, Servais P. Modeling Fecal Indicator Bacteria Concentrations in Natural Surface Waters: A Review. Critical Reviews in Environmental Science and Technology. 2014 Nov 2;44:2380–453.
  73. Clear Water | City of Chicago. https://chicago.github.io/clear-water/
  74. Walters E, Graml M, Behle C, Müller E, Horn H. Influence of Particle Association and Suspended Solids on UV Inactivation of Fecal Indicator Bacteria in an Urban River. Water, Air, and Soil Pollution. 2014 Jan;225:1–9.
  75. Nelson KL, Boehm AB, Davies-Colley RJ, Dodd MC, Kohn T, Linden KG, et al. Sunlight-mediated inactivation of health-relevant microorganisms in water: a review of mechanisms and modeling approaches. Environmental Science: Processes and Impacts. 2018;20:1089–122.
  76. Vogel LJ, O’Carroll DM, Edge TA, Robinson CE. Release of Escherichia coli from Foreshore Sand and Pore Water during Intensified Wave Conditions at a Recreational Beach. Environmental Science and Technology. 2016 Jun 7;50:5676–84. pmid:27120087
  77. Madani M, Seth R. Evaluating multiple predictive models for beach management at a freshwater beach in the Great Lakes region. Journal of Environmental Quality. 2020 Jul 22;49:896–908. pmid:33016491
  78. Francy DS, Struffolino P, Brady AMG, Dwyer DF. A Spatial, Multivariable Approach for Identifying Proximate Sources of Escherichia coli to Maumee Bay, Lake Erie, Ohio Open-File Report 2005–1386. 2005.
  79. Virtual Beach (VB) | Environmental Modeling Community of Practice | US EPA. https://www.epa.gov/ceam/virtual-beach-vb.
  80. He C, Post Y, Dony J, Edge T, Patel M, Rochfort Q. A physical descriptive model for predicting bacteria level variation at a dynamic beach. Journal of Water and Health. 2016 Aug 1;14:617–29. pmid:27441857
  81. Great Lakes NowCast Status. https://pa.water.usgs.gov/apps/nowcast/
  82. Philly RiverCast—Home. https://www.phillyrivercast.org/