Global microbial water quality data and predictive analytics: Key to health and meeting SDG 6

Microbial water quality is integral to water security and is directly linked to human health, food safety, and ecosystem services. However, pathogen data, and even faecal indicator data (e.g., E. coli), are sparse and scattered, and their availability across water bodies (e.g., groundwater) and socio-economic contexts (e.g., low- and middle-income countries) is inequitable. There is an urgent need to assess and collate microbial data across the world to evaluate the global state of ambient water quality, water treatment, and health risk, as time is running out to meet Sustainable Development Goal (SDG) 6 by 2030. The overall goal of this paper is to illustrate the need for, and advocate building, a robust and useful worldwide microbial water quality database and consortium that will help achieve SDG 6. We summarize available data and existing databases on microbial water quality, discuss methods for producing new data on microbial water quality, and identify models and analytical tools that use microbial data to support decision making. This review identified global datasets (7 databases), and regional datasets for Africa (3 databases


Introduction
In March 2023, the world gathered to review the successes and continued needs for implementing the objectives of the International Decade for Action (2018-2028): Water for Sustainable Development. Greater attention was given to pathogen pollution and to disease monitoring through wastewater surveillance. Poor microbial water quality affects the entire water cycle [...] knowledge on microbial water quality for low- and middle-income countries. The available data are mainly on faecal indicator bacteria (FIB), which are not adequate to address health concerns associated with viruses, protozoa, or helminths [4,5]. Moreover, microbial water quality analyses are currently hindered by the limitations of older culture-based approaches for FIB: results are delayed by 24 hours, the faecal source (e.g., human or animal) cannot be identified, and environmental sources of FIB interfere with interpretation. Yet powerful molecular methods, including microbial source tracking and direct pathogen monitoring, are now at our fingertips and can improve our understanding of microbial water quality around the world [4].
The SDGs themselves have inspired countries to collect and share data to track progress on ambient water quality, as well as on the safety of drinking water through the WHO/UNICEF Joint Monitoring Programme (JMP). Interestingly, the COVID-19 pandemic increased laboratory capacity across the globe (https://arcg.is/1aummW) and demonstrated that directly monitoring sewage and polluted waters for viruses is feasible and valuable in supporting the pandemic response. Pathogen and source tracking data are critical for evidence-based decisions on wastewater treatment, water body protection, and restoration of polluted waters to control pathogens, and will provide long-lasting approaches to protect health, water quality, and the environment.
The overall goal of this paper is to illustrate the need for, and advocate building, a robust and useful worldwide microbial water quality database and consortium that will help achieve SDG 6. We 1) summarize available data and existing databases on microbial water quality, supported by the United Nations, governments, and private entities; 2) discuss the use of innovative molecular methods for microbial water quality; and 3) identify and highlight water quality models and analytical tools that can use microbial data to support water quality assessment and decision making.
In this paper, we refer to microbiological water quality for different water bodies. Our main focus is the quality of ambient water (surface water and groundwater), as this links closely to SDG indicator 6.3.2 on the proportion of water bodies with good ambient water quality. However, to better understand ambient water quality, wastewater (as a source) and irrigation, recreational, and drinking water (as routes to health impacts) are also relevant. These data can be used with molecular methods, models, and tools in assessments that span from the sources of pathogens to exposures, and can inspire the use of relevant tools for addressing the One Water concept [6], a water management approach adopted from integrated water resources management, as described by the Water Research Foundation and the US Water Alliance, among others.

Available water quality databases
Table 1 presents a summary of microbial water quality databases around the world. The table is not exhaustive but illustrates the scale and fragmentation of microbial water quality data globally. Databases range in scale from global to local and mostly cover surface waters, although some cover groundwater and drinking water. Many databases are open, and some of the data can be viewed online or downloaded as reports or in machine-readable formats.
There are two global databases that are open and most accessible for downloading microbial water quality data for ambient waters: 1) the Global Freshwater Quality Database (GEMStat) by the United Nations Environment Programme (https://gemstat.org/about/dataavailability/). GEMStat provides data on the state of inland water quality worldwide. At the time of writing, the database contains more than 15 million entries from more than 80 countries. However, most of these entries are for chemical and physical parameters; microbiological parameters account for only 6874 entries, fewer than 0.05% of the total. The GEMS [...] However, it is currently unclear how many countries have microbial data in their database and how many records are available (Box 2 is an example of one such database). Ambient water databases should include both surface water and groundwater; such data could be linked to both drinking and recreational designated uses. Microbial water quality data on wastewater are found primarily in the published literature or in facility records on FIB discharge limits. However, one example of a global database with virus concentration data is Wastewater-SPHERE, a global data and use-case repository of wastewater surveillance for SARS-CoV-2 (https://sphere.waterpathogens.org/), currently containing almost 200,000 wastewater sample records from 21 countries (see Box 3).
There are several regional datasets. North America, Europe, and Australia appear to dominate in the number of databases for monitoring the microbial quality of recreational surface waters, drinking water, and groundwater. However, many of these databases are local, site-specific, or region-specific. Data exist in Asia and Africa, but most are closed and not accessible. The situation is similar in South America, where only one database, a closed one in Brazil, was found. Nearly all microbial data in these databases are restricted to indicator bacteria such as total coliforms, thermotolerant (faecal) coliforms, E. coli, enterococci, streptococci, and H2S-producing bacteria. The CEDEN database (California) includes some data on Giardia, Cryptosporidium, Salmonella, and human-associated microbial source tracking (MST) markers, and the GEMStat and SURVAL (France) databases include some Salmonella data; otherwise, pathogen data are absent.

Box 2. Example of a National Database
New Zealand makes all of its microbial water quality data available on a single website covering groundwater, lake water, river water, and recreational water. The data are up to date, easy to download, and available as Excel spreadsheets.

Spatially and temporally, the available microbial indicator data are vast, collectively covering millions of monitoring sites across the globe and in some cases dating back to the 1970s. However, the data are fragmented across local, regional, and national databases, and it is not possible to easily visualize them either spatially or temporally. The GEMStat database, although limited in the number of microbial records, does allow users to see on a global map which countries have microbial water quality data, and then to select regions and drill down to visualize datasets over time by water body.
• The GEMStat database could serve as a global repository for all microbial datasets for ambient waters, which could then be linked to wastewater and drinking water data (both infrastructure and monitoring information). If data "keepers" around the world were willing to submit their data and appropriate metadata to this centralized repository in the required formats, a very valuable dataset would become available for decisions on protecting watersheds and large river basins for a variety of ecosystem services.
Several databases represent "big data". Generally, these are generated from large monitoring programs that evaluate compliance with drinking water or recreational water standards. The WHO/UNICEF JMP household water quality testing surveys, and even the SDG and United Nations Economic Commission for Europe (UNECE) datasets, are based on large amounts of data but provide only summary statistics (% compliance in different categories). Such datasets are also communicated through published reports in non-reusable formats (PDF). The large recreational water quality databases from Europe and the United Kingdom (https://water.europa.eu/freshwater; https://environment.data.gov.uk/bwq/profiles/) allow access to the data by site but do not allow users to query the data. It is understandable that these databases, created for bathing water compliance monitoring, are condensed into a format that supports evaluation of regulatory compliance, but this also means a significant loss of information and reusability. Microbial water quality is often highly variable, and condensed data do not allow evaluation of its true spatial and temporal variability or of the underlying processes. So, even though the original (source) data exist and contain much more information, they are often omitted from public access.
• We strongly suggest that microbial databases provide access to all source data for their records, allowing users to validate, query and observe their geo-temporal variability.
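To illustrate why percentage-compliance summaries lose information, the sketch below uses two hypothetical sites with invented E. coli counts (CFU/100 mL) and an assumed threshold of 500 CFU/100 mL; the numbers are not drawn from any database in Table 1. Both sites report the same compliance rate, but only the source records reveal how different their extremes are.

```python
from statistics import median

# Hypothetical E. coli results (CFU/100 mL) for two sites; values are invented.
site_a = [40, 60, 80, 90, 120, 150, 200, 250, 600, 700]
site_b = [10, 20, 20, 30, 30, 40, 50, 60, 9000, 25000]

THRESHOLD = 500  # assumed threshold for illustration, not a specific standard


def percent_compliant(samples, threshold=THRESHOLD):
    """The kind of summary statistic published in condensed compliance datasets."""
    return 100 * sum(1 for s in samples if s <= threshold) / len(samples)


def source_statistics(samples):
    """Statistics that require access to the individual (source) records."""
    return {"median": median(samples), "max": max(samples)}


for name, data in (("A", site_a), ("B", site_b)):
    print(name, percent_compliant(data), source_statistics(data))
```

Both sites are 80% compliant, yet the maximum at site B (25,000) is more than 35 times that at site A (700), which a compliance summary alone cannot show.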
Lastly, to ensure the quality of a global microbial dataset, it is important that its metadata comply with, or even cross-reference, other databases and standardized resources. For example, when capturing the regional distribution of data, one could use existing administrative division standards such as ISO 3166-2 (https://www.iso.org/iso-3166-country-codes.html#2012_iso3166-2) or GADM (https://gadm.org/). Similarly, progress towards drinking water goals could be characterized following the JMP classification. The same practice could prove particularly useful when referencing the standard microbial methods employed, such as the Standard Methods for the Examination of Water and Wastewater (https://www.standardmethods.org/), or the recently developed Environmental Microbiology Minimum Information (EMMI) guidelines for qPCR and dPCR [7].
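As a minimal sketch of cross-referencing a standard, the record below stores its location as an ISO 3166-2 subdivision code and a citation of the analytical method. The class and field names are hypothetical, not taken from any existing database, and the regular expression only checks the code's format (alpha-2 country code, hyphen, one to three alphanumerics), not whether the subdivision actually exists.

```python
import re
from dataclasses import dataclass

# Format check only: ISO 3166-1 alpha-2 country code, hyphen, then one to three
# alphanumeric characters (e.g. "US-CA"). It does not confirm the code exists.
ISO_3166_2 = re.compile(r"^[A-Z]{2}-[A-Z0-9]{1,3}$")


@dataclass
class SampleMetadata:
    """Hypothetical metadata record for one microbial water quality result."""
    site_id: str
    subdivision: str       # ISO 3166-2 subdivision code
    water_body: str        # e.g. "river", "groundwater"
    target: str            # e.g. "E. coli"
    method_reference: str  # citation of the analytical method or guideline
    unit: str              # e.g. "MPN/100 mL"

    def problems(self):
        """Return a list of human-readable metadata issues (empty if none)."""
        issues = []
        if not ISO_3166_2.match(self.subdivision):
            issues.append(f"subdivision {self.subdivision!r} is not in ISO 3166-2 format")
        if not self.method_reference.strip():
            issues.append("method_reference is empty")
        return issues


good = SampleMetadata("S1", "US-CA", "river", "E. coli",
                      "Standard Methods for the Examination of Water and Wastewater",
                      "MPN/100 mL")
bad = SampleMetadata("S2", "California", "river", "E. coli", "", "MPN/100 mL")
print(good.problems())  # []
print(bad.problems())
```

In a real repository the code would also be looked up against the published ISO 3166-2 list or GADM identifiers rather than only pattern-matched.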
Although ensuring high-quality data is of the utmost importance, enforcing data quality standards may take many years, and this may be a barrier to researchers publishing their data. At the start of a global dataset, it seems most practical to include data provided by reputable sources (e.g., accredited laboratories, government agencies) and published datasets from researchers, where one can assume that peer review has ensured some degree of data quality. Some journals, such as Environmental Science & Technology, already require that molecular environmental data published in the journal report the elements outlined in the EMMI guidelines prior to publication. However, there are also situations where an urgent public health response is required. The COVID-19 pandemic showed that many laboratories worldwide can provide useful data in the absence of standard methods, with the goal of informing decisions in real time. In such cases, efforts to compile global datasets could focus on standardizing the data as they become available. Data standards such as the Public Health Environmental Surveillance Open Data Model (https://github.com/Big-Life-Lab/PHES-ODM), and initiatives focused on global data harmonisation such as W-SPHERE, could prove valuable in identifying data and metadata gaps in published data, while ensuring compliance with the FAIR principles [8] and overall data quality.
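Identifying metadata gaps against a data dictionary can be sketched as below. The required fields and types are hypothetical and are not taken from PHES-ODM or W-SPHERE; the point is only that a shared dictionary makes gaps in submitted records mechanically detectable and reportable as a simple completeness indicator.

```python
# Hypothetical minimal data dictionary: required fields and accepted types.
# Field names are illustrative; they are NOT taken from PHES-ODM or W-SPHERE.
REQUIRED_FIELDS = {
    "site_id": str,
    "sample_date": str,      # ISO 8601 date, e.g. "2023-03-22"
    "target": str,
    "value": (int, float),
    "unit": str,
    "lab_accredited": bool,
}


def metadata_gaps(record):
    """List missing fields and fields whose values have an unexpected type."""
    missing = [f for f in REQUIRED_FIELDS if record.get(f) is None]
    wrong_type = [f for f, expected in REQUIRED_FIELDS.items()
                  if record.get(f) is not None and not isinstance(record[f], expected)]
    return {"missing": missing, "wrong_type": wrong_type}


def completeness(records):
    """Share of records with no gaps: a simple, reportable data-quality indicator."""
    if not records:
        return 0.0
    complete = sum(1 for r in records if not any(metadata_gaps(r).values()))
    return complete / len(records)


r1 = {"site_id": "S1", "sample_date": "2023-03-22", "target": "E. coli",
      "value": 120.0, "unit": "MPN/100 mL", "lab_accredited": True}
r2 = {"site_id": "S2", "sample_date": "2023-03-22", "target": "E. coli",
      "value": "12", "lab_accredited": False}
print(metadata_gaps(r2))       # {'missing': ['unit'], 'wrong_type': ['value']}
print(completeness([r1, r2]))  # 0.5
```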

Use of innovative molecular methods
While most of the datasets in Table 1 contain data on FIB, innovative molecular methods and instruments such as quantitative and digital PCR and next-generation sequencing are opening up opportunities to monitor more specific targets, such as pathogens and markers for specific faecal pollution sources (see Box 4). The COVID-19 pandemic boosted the use of molecular tools such as PCR in water laboratories around the globe [9,10]. National and local wastewater monitoring programs were developed in many countries and continue to provide valuable, unbiased information about trends in SARS-CoV-2 infections and early warning of new variants of concern. For example, the COVIDPoops19 dashboard lists 166 dashboards of SARS-CoV-2 in wastewater, covering 288 universities, 72 countries, and 4,107 sites (https://arcg.is/1aummW). Although wastewater monitoring occurs primarily in high-income countries, many low- and middle-income countries have also implemented these methods [11]. Some countries and sites have even started to include targets beyond SARS-CoV-2, including poliovirus, influenza A, respiratory syncytial virus (RSV), and mpox (monkeypox) virus.
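Trends from sewersheds of different sizes are easier to compare when concentrations are converted to daily loads per person served; this is one common normalisation in wastewater surveillance, though programs also use others (e.g., faecal-strength markers). The sketch assumes a flow-proportional sample, and all site values are hypothetical.

```python
def normalized_load(conc_gc_per_l, flow_l_per_day, population, per=100_000):
    """Daily load in gene copies per day per `per` people served.

    load = concentration (gc/L) x flow (L/day) / population x per
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return conc_gc_per_l * flow_l_per_day / population * per


# Hypothetical sites: same concentration, different flows and populations.
large = normalized_load(5e4, 2e7, 100_000)
small = normalized_load(5e4, 1e6, 2_000)
print(f"{large:.2e} vs {small:.2e} gc/day per 100,000 people")
```

Although the measured concentrations are identical, the smaller site carries more flow per person (500 versus 200 L per person per day), so its normalised load is 2.5 times higher.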
Box 4. Use of Microbial Source Tracking Genetic Targets for Water Quality [13]
The impact of nucleic acid-based methods, such as PCR analysis and sequencing, on faecal pollution analysis in health-related water quality research was assessed through a rigorous literature analysis. A wide range of application areas and study designs was identified since the first application more than 30 years ago (>1,000 publications). This comprehensive meta-analysis establishes the scientific status quo of the field, including trend analyses and literature statistics, outlines the identified application areas, and discusses the benefits and challenges of nucleic acid-based analysis in water. Given the consistency of methods and assessment types, this emerging science can be regarded as a new discipline: genetic faecal pollution diagnostics (GFPD) in health-related microbial water quality analysis. Without any doubt, GFPD has already revolutionised faecal pollution detection and microbial source tracking, its current core applications. The widespread application of GFPD in studies on the impacts of pathogen pollution on ambient waters means that such data could readily be included in a global database.
The presence of these global wastewater monitoring programs in laboratories with new molecular capabilities demonstrates the global capacity to fill some of the gaps in existing microbial water quality datasets. The large number of wastewater programs using molecular targets shows that similar monitoring is possible for other matrices, such as surface waters, and for other targets. For wastewater surveillance, and indeed for the monitoring of pathogens in ambient waters in general, this is just the beginning. A global consortium supporting capacity development and data sharing would catalyse this evolution. One positive lesson from the global wastewater surveillance efforts is the value of standardized formats and data dictionaries.
One way global wastewater monitoring laboratories can support ambient microbial water quality monitoring is through new molecular tools such as microbial source tracking. This approach uses specific molecular targets to identify the sources of faecal pollution, which makes monitoring data more useful for designing measures to mitigate unsafe waters. Molecular microbial source tracking assays have moved from the academic realm into environmental monitoring applications, for instance for bathing waters and transboundary waters (https://ijc.org/en/hpab/great-lakes-water-quality-centennialstudy-phase-i-report) [12]. The great value of these molecular methods is that all of these targets can be assayed on one platform (PCR), so once the method is established for one target, it can be expanded to others.
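On a qPCR platform, each target is quantified through a standard curve relating the quantification cycle (Cq) to log10 copies; the slope also gives the amplification efficiency, E = 10^(-1/slope) - 1, which is commonly reported alongside results. The sketch below fits such a curve by ordinary least squares to an invented tenfold dilution series; the Cq values are hypothetical and not from any assay cited here.

```python
import math


def fit_standard_curve(copies, cq):
    """Ordinary least-squares fit of Cq = slope * log10(copies) + intercept."""
    x = [math.log10(c) for c in copies]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(cq) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, cq))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def efficiency(slope):
    """Amplification efficiency; 1.0 means perfect doubling every cycle."""
    return 10 ** (-1 / slope) - 1


def copies_from_cq(cq, slope, intercept):
    """Invert the standard curve to estimate copies per reaction."""
    return 10 ** ((cq - intercept) / slope)


# Hypothetical tenfold dilution series of a standard and its Cq values.
standards = [1e1, 1e2, 1e3, 1e4, 1e5, 1e6]
cq_values = [34.9, 31.5, 28.2, 24.8, 21.5, 18.2]

slope, intercept = fit_standard_curve(standards, cq_values)
print(f"slope={slope:.2f}, efficiency={100 * efficiency(slope):.0f}%")
print(f"unknown at Cq 27.0: {copies_from_cq(27.0, slope, intercept):.0f} copies")
```

Because the same fit-and-invert logic applies to every target, adding a new marker mainly requires a new standard and primer/probe set rather than new instrumentation, which is the point made above.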
• We strongly support the use of new molecular methods for source tracking and targeted pathogen monitoring; resources should focus on extending the laboratory infrastructure developed during the pandemic for SARS-CoV-2 wastewater monitoring to more targets and water bodies. This is particularly valuable in low- and middle-income countries and in underserved areas of high-income countries, where waterborne diseases are most prominent.

Models and analytical tools that support water quality assessment and decision making
As mentioned above, existing datasets are temporally and spatially disjointed, and data are inequitably distributed between developed and developing regions. Models can be highly effective in supporting microbial water quality assessment and management by filling some of these gaps, allowing inference at locations and times where data are lacking. Models can also be run with scenarios to improve our understanding of potential future events, such as climate change or the implementation of interventions. Models represent reality using mathematical equations and key assumptions. Some are more predictive (spatially, temporally, or in response to an event) and others more inferential (identifying risk factors or sources of microbial pollution). Models can also be classified as empirical or process-based. Empirical models can be useful for explaining variability in data, but they are only relevant where data exist, and using them to extrapolate to future scenarios can be difficult because they may not account for all relevant influencing variables. Process-based models, on the other hand, can be useful in data-scarce regions because they can be applied directly to analyse future scenarios or the effectiveness of interventions. However, the uncertainties associated with these models should be evaluated and quantified through sensitivity analyses and stochastic approaches, and followed up with data collection to verify or falsify the model predictions.
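A stochastic approach can be sketched by Monte Carlo sampling of uncertain inputs to a simple process-based relation. Here the process model is assumed to be a fully mixed mass balance below a wastewater discharge, and the lognormal effluent concentration, uniform river flow, and all parameter values are illustrative assumptions rather than calibrated values.

```python
import math
import random
import statistics


def mixed_concentration(c_effluent, q_effluent, c_upstream, q_upstream):
    """Fully mixed concentration just below a discharge (mass balance).

    Concentrations share one unit (e.g. organisms/L); flows share another (e.g. m3/s).
    """
    return (c_effluent * q_effluent + c_upstream * q_upstream) / (q_effluent + q_upstream)


def monte_carlo(n=10_000, seed=1):
    """Sample uncertain inputs and return the distribution of mixed concentrations.

    All distributions and parameter values below are illustrative assumptions.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n):
        c_eff = rng.lognormvariate(math.log(1e4), 1.0)  # effluent, organisms/L
        q_river = rng.uniform(5.0, 20.0)                 # upstream river flow, m3/s
        results.append(mixed_concentration(c_eff, 0.5, 10.0, q_river))
    return results


samples = monte_carlo()
cuts = statistics.quantiles(samples, n=20)  # cut points at 5 %, 10 %, ..., 95 %
p5, p50, p95 = cuts[0], cuts[9], cuts[18]
print(f"5th: {p5:.0f}  median: {p50:.0f}  95th: {p95:.0f} organisms/L")
```

Reporting the spread between the 5th and 95th percentiles, rather than a single value, is what allows later monitoring data to verify or falsify the prediction.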
Important criteria when developing a new model or selecting an existing one for microbial water quality are the spatial and temporal scale and resolution, the water body, the microorganism (and its sources), and an understanding of the relevant environmental processes [14]. The spatial extent and resolution determine the area covered and the level of detail of the model (i.e., the size of the area and how finely modelled values are produced within it). Relevant water bodies may include rivers or smaller streams, lakes, reservoirs or ponds, recreational waters, urban floodwater, or groundwater in different aquifers. The time period and temporal resolution (i.e., the spacing between time points: hourly, daily, monthly, etc.) also matter and should be chosen to suit the question; for example, fine temporal resolution may be needed to understand or predict the impact of extreme hydroclimatic events.
Models have been developed for different microbial groups. While E. coli is frequently used as an indicator of faecal contamination, any microorganism or microbial target can be modelled, including the new molecular targets mentioned above (Section 3). Modelling the different pathogen types, which differ in their health risks as well as in their transport and fate characteristics, supports an understanding of human wastewater contributions and health risks [5]. Cryptosporidium, an important waterborne pathogen, is relatively easy to model in the environment because it does not reproduce in water. Other pathogens have more complex life cycles and environmental interactions that need to be considered, such as the helminth Schistosoma spp., which requires a freshwater snail host to develop into the life cycle stage that is infectious to humans [15]. Antimicrobial-resistant bacteria (ARB) and antimicrobial resistance genes (ARGs) are also more complex to model, because resistance can develop in the environment at low antibiotic concentrations and the genes can be transferred from one bacterium to another [16]. Finally, the occurrence of toxin-producing harmful algal blooms (HABs) caused by algae such as cyanobacteria or dinoflagellates is strongly related to nutrient concentrations and composition [17].
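Because an organism such as Cryptosporidium does not reproduce in water, its concentration during transport can often be approximated by first-order die-off, C(t) = C0 exp(-kt), with no growth term; organisms that regrow or depend on intermediate hosts need additional terms. The rate constant below is a hypothetical value chosen for illustration, not a measured one.

```python
import math


def concentration_after(c0, k_per_day, travel_time_days):
    """First-order die-off with no growth: C(t) = C0 * exp(-k * t)."""
    return c0 * math.exp(-k_per_day * travel_time_days)


def t90_days(k_per_day):
    """Time for a 90 % (1-log10) reduction under first-order die-off."""
    return math.log(10) / k_per_day


k = 0.1  # hypothetical die-off rate constant (per day), not a measured value
print(round(t90_days(k), 1))                          # 23.0
print(round(concentration_after(100.0, k, 5.0), 1))   # 60.7
```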
Several statistical and process-based models of microbiological water quality are available. Table 2 summarizes them; while the list is not exhaustive, it illustrates both the opportunities and the limitations of the models. Many of the models can be used by planners and decision makers as tools for purposes such as early warning and future projections.
One way to make models more accessible to decision makers is to develop online analytical tools with intuitive user interfaces that allow users to run the models without advanced programming experience. The idea of using online analytical tools to help manage microbial risks from waterborne pathogens in recreational waters was articulated more than a decade ago [18], and many analytical tools now exist to visualize microbial water quality and manage the risks associated with using surface waters for recreation [19] or as sources for drinking water treatment plants [20][21][22][23][24]. Other analytical tools are geared more towards sanitation services or wastewater and faecal sludge treatment systems [25][26][27][28].
An example of an analytical tool that provides model outputs at a global level is the Knowledge-to-Practice Pathogen Flow and Mapping tool (https://tools.waterpathogens.org/maps) [29,30]. This tool is based on the Global Waterborne Pathogen (GloWPa) model [31,32], a process-based model that simulates pathogen concentrations in rivers worldwide (see Fig 1 for an example of spatially continuous model outputs for Cryptosporidium). The online Pathogen Flow and Mapping Tool [29,30] allows regional planners and government authorities, such as ministries of the environment, to run scenarios that help them determine where to prioritize investments in sanitation.
Table 3 summarizes analytical tools that use existing data and/or predictive microbiology models to inform decision making related to water quality, with respect to safe drinking water, recreational water quality, agricultural water quality, and the safety of sanitation, wastewater, and faecal sludge treatment systems. Many of these tools, such as QMRAspot, CC-QMRA, and QMRAcatch, support the management of microbial risks from drinking water; some allow users to model risks from multiple pathways, and some allow users to calibrate source-specific risks with their own microbial water quality data and to establish health-based pathogen removal targets for drinking water [20-22, 28, 33].
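The arithmetic behind a health-based pathogen removal target can be sketched with a standard QMRA chain: an exponential dose-response, P = 1 - exp(-r·dose), compounded over 365 daily exposures and solved back for the tolerable treated-water concentration. This is not the internal method of any tool in Table 3. The source concentration (1 oocyst/L), consumption (1 L/day), tolerable annual infection risk (10^-4, a benchmark used in some jurisdictions), and the Cryptosporidium parameter r = 0.004191 are all assumed illustrative inputs; `expm1`/`log1p` keep the very small probabilities numerically accurate.

```python
import math


def annual_infection_risk(c_treated, volume_l_per_day, r):
    """Exponential dose-response, then compounded over 365 daily exposures."""
    dose = c_treated * volume_l_per_day
    p_daily = -math.expm1(-r * dose)                # 1 - exp(-r * dose)
    return -math.expm1(365 * math.log1p(-p_daily))  # 1 - (1 - p_daily) ** 365


def required_log_removal(c_source, volume_l_per_day, r, tolerable_annual=1e-4):
    """Log10 reduction needed so the annual infection risk equals the target.

    Returns (log removal value, tolerable treated-water concentration).
    """
    p_daily = -math.expm1(math.log1p(-tolerable_annual) / 365)
    tolerable_dose = -math.log1p(-p_daily) / r
    c_tolerable = tolerable_dose / volume_l_per_day
    return math.log10(c_source / c_tolerable), c_tolerable


# Illustrative inputs only: 1 oocyst/L in source water, 1 L/day consumption,
# and an assumed exponential dose-response parameter r for Cryptosporidium.
lrv, c_tol = required_log_removal(c_source=1.0, volume_l_per_day=1.0, r=0.004191)
print(f"required removal: {lrv:.1f} log10")  # required removal: 4.2 log10
```

Calibrating `c_source` with site-specific monitoring data is what turns this generic calculation into the source-specific targets described above.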
Some analytical tools use models to evaluate the performance of wastewater treatment plants (Table 3). For example, the GWPP Treatment Plant Sketcher Tool allows users to create a custom process flow diagram, or "sketch", of a wastewater treatment system, specifying its configuration, design, and operation. It then uses that information to predict how effectively the overall sketched wastewater or faecal sludge treatment system removes and reduces pathogens (viruses, bacteria, protozoa, and helminths). The tool also lets users visualize the proportion of pathogens that end up in the liquid effluent versus the sludge or biosolids, supporting the design of new wastewater or faecal sludge treatment plants and upgrades to existing ones.
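The general idea of tracking pathogens through a treatment train can be sketched as units in series whose log10 reductions add up, with each unit's removed organisms split between sludge and inactivation. This is a hypothetical illustration, not the Sketcher's algorithm; the units, log reductions, and sludge fractions are all invented.

```python
# Hypothetical treatment train: (unit, log10 reduction, fraction of removed
# organisms that settle into sludge rather than being inactivated).
# All values are invented for illustration and are not GWPP defaults.
TRAIN = [
    ("primary sedimentation", 0.5, 0.9),
    ("activated sludge", 1.5, 0.7),
    ("disinfection", 2.0, 0.0),
]


def partition(influent, train):
    """Route organisms through units in series; log reductions are additive."""
    liquid, sludge, inactivated = float(influent), 0.0, 0.0
    for _unit, lrv, to_sludge in train:
        removed = liquid * (1 - 10 ** -lrv)
        sludge += removed * to_sludge
        inactivated += removed * (1 - to_sludge)
        liquid -= removed
    return {"effluent": liquid, "sludge": sludge, "inactivated": inactivated}


result = partition(1e6, TRAIN)
total_lrv = sum(lrv for _unit, lrv, _f in TRAIN)
print(f"total LRV {total_lrv}: effluent {result['effluent']:.0f}, "
      f"sludge {result['sludge']:.0f}")
```

Keeping the sludge fraction explicit matters because organisms "removed" from the liquid stream by settling are not destroyed and still require sludge treatment.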
Other analytical tools, such as the HAB data viewer, SWIM:AI, and Vibrio mapviewer (Table 3), use near-real-time satellite data on environmental variables, such as rainfall or sea surface temperature, to predict the emergence of HABs or pathogens (e.g., Vibrio spp.), either to prompt sampling to confirm the situation or to warn people not to swim. These tools are usually underpinned by a statistical model.
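One common form of such a statistical model is a logistic regression that converts environmental covariates into a probability of exceeding a water quality threshold, which is then mapped to an action. The tools in Table 3 do not necessarily use this form; the coefficients and action thresholds below are invented for illustration, as if fitted to past monitoring data.

```python
import math

# Hypothetical coefficients, as if fitted to past monitoring data; they are
# invented here and not taken from any of the tools in Table 3.
COEFFS = {"intercept": -6.0, "rain_mm_48h": 0.08, "water_temp_c": 0.15}


def exceedance_probability(rain_mm_48h, water_temp_c, coeffs=COEFFS):
    """Logistic model: P(exceedance) = 1 / (1 + exp(-z))."""
    z = (coeffs["intercept"]
         + coeffs["rain_mm_48h"] * rain_mm_48h
         + coeffs["water_temp_c"] * water_temp_c)
    return 1 / (1 + math.exp(-z))


def advisory(p, sample_at=0.3, warn_at=0.6):
    """Map a probability to an action; both thresholds are assumptions."""
    if p >= warn_at:
        return "issue swim advisory"
    if p >= sample_at:
        return "collect confirmation sample"
    return "no action"


for rain in (0, 30, 50):
    p = exceedance_probability(rain, water_temp_c=20)
    print(rain, round(p, 2), advisory(p))
```

The two-threshold design mirrors the two uses described above: a moderate probability prompts confirmation sampling, while a high one triggers a warning.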
Although several models and tools exist, most are recent and very few are truly global, so many gaps and opportunities remain for addressing microbial and pathogen water quality worldwide. For example, while some models predict the microbial quality of groundwater [38,39], none can be reliably applied at a global scale. Likewise, most models address indicator microorganisms rather than the more relevant pathogens. Although modelling frameworks developed for indicators may be (and in some cases have been) applied to pathogens and microbial source tracking markers, a lack of concurrence between indicator and pathogen data has been noted in the literature with respect to pathogen persistence [56] and removal by treatment [57], particularly for protozoa. Scenario analyses of future changes or of the impact of interventions are rare, though essential for water quality assessments and projections. Using models, planners and decision makers can be much better informed about microbial water quality problems and can mitigate public health impacts, particularly where data are scarce.
The water sector can learn from the long-standing precedent in the food sector of using predictive analytical tools to forecast microbial risks based on knowledge about foodborne pathogens [58][59][60][61]. Curated risk assessment knowledge repositories that provide tools and models in machine-readable formats [62] are needed to transfer and reuse existing knowledge about pathogens more effectively. Given that the use of analytical tools to assess microbial water quality is less advanced than in the food safety sector, there is a need to develop appropriate repositories that could serve the dual purposes of food and water.
Currently, only a few websites serve as repositories for online analytical tools that support microbial water quality assessments. The Global Water Pathogen Project (GWPP) has created a repository of knowledge, data, and tools about waterborne pathogens (https://www.waterpathogens.org/). This project, established by Michigan State University and UNESCO in 2015, includes an open access online book [5], a data portal, and web-based analytical tools for predicting pathogen emissions and persistence in the environment and in sanitation systems.

PLOS WATER

Watershare (https://www.watershare.eu/), an international network of utilities, research organizations, and service providers, is another example of an online hub, with links to several analytical tools that help users assess water quality and water treatment. Depending on the needs in a specific context, it may be necessary to use more than one tool in combination. For example, Okaali et al. (2022) [29] found that using several analytical tools (SaniPath, HyCristal, and PFMT) together allowed them to identify health hazard hotspots and high-risk exposure pathways to faecal pathogens from sources including food products and drainage systems. Thus, there is a need to continue supporting online knowledge hubs and repositories that provide tools and models in machine-readable formats and allow practitioners to access knowledge developed through scientific research.
Many of the summarised models and online analytical decision support tools are open access and can be used to fill geospatial and temporal gaps where microbial water quality data are not available. New tools will continue to be developed as needs are identified. When developing new models and analytical tools to manage water quality, the following should be considered [70]:
1. define the purpose of the tool;
2. establish the modelling approach to be used;
3. confirm the availability of data;
4. work with stakeholders to determine how the tool will be applied.
The gap between knowledge and decision making can be filled by harnessing state-of-the-art data science and web technology. Such approaches include:
a) data fusion across diverse data sources (such as water quality data, clinical data, and SDG progress reports) to provide a more holistic context for data analysis;
b) AI-powered insights tailored for decision support (e.g., early-warning systems for waterborne pathogen outbreaks, automated estimation of pathogen trends and risk classification, and machine-generated reports that translate pathogen data into actions for policy makers);
c) novel visualizations that show the geo-temporal distribution of water pathogens and their association with monitored community behaviour and demographics;
d) connection of existing models and tools to explore new, integrated approaches to decision support;
e) adoption of web accessibility standards for online tools and datasets, to support users on low-bandwidth connections worldwide.
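Data fusion (item a) starts with joining sources on shared keys. The sketch below joins hypothetical weekly pathogen measurements with hypothetical clinical case counts on (region, week) and also reports records that exist in only one source, which is itself useful for locating coverage gaps. Region codes, weeks, and values are invented.

```python
# Hypothetical weekly records; region codes and values are invented.
# water: (region, ISO week, log10 gene copies/L); clinical: (region, ISO week, cases)
water = [("R1", "2023-W10", 4.1), ("R1", "2023-W11", 4.8), ("R2", "2023-W10", 2.0)]
clinical = [("R1", "2023-W10", 12), ("R1", "2023-W11", 30), ("R3", "2023-W10", 5)]


def fuse(water, clinical):
    """Join two sources on (region, week) and report keys present in only one."""
    w = {(region, week): value for region, week, value in water}
    c = {(region, week): cases for region, week, cases in clinical}
    joined = {key: (w[key], c[key]) for key in sorted(w.keys() & c.keys())}
    only_water = sorted(w.keys() - c.keys())
    only_clinical = sorted(c.keys() - w.keys())
    return joined, only_water, only_clinical


joined, only_water, only_clinical = fuse(water, clinical)
for (region, week), (log_gc, cases) in joined.items():
    print(region, week, log_gc, cases)
print("water data without clinical data:", only_water)
print("clinical data without water data:", only_clinical)
```

Using standard keys, such as ISO 3166-2 codes for regions and ISO week dates, is what makes this kind of join possible across independently maintained datasets.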
• We recommend using models and analytical tools to supplement water quality monitoring efforts worldwide, particularly given the sparse databases and inequitable coverage. Analytical tools should be collated in tool repositories to foster more widespread support for risk assessment and management.
• We also suggest that training programs be initiated and funded for using these water quality models and analytical tools, supporting collaboration among the scientific community, watershed alliances, and utilities.

Conclusions: Approaches to improve microbial water quality data worldwide
There are great challenges on the road towards meeting the goals of SDG 6, but we also believe there are great opportunities for a digital transformation of the microbial water quality sector to support SDG 6. The time has come to invest in a global data initiative to accelerate the assessment and control of water pathogen pollution. Data and information are needed to speed up the slow progress towards the SDG 6 targets and to address microbial water quality to improve health, food safety, ecosystem services, and the economic vitality of communities. First, we have shown that existing databases are fragmented, most lack adequate spatial and temporal coverage, and they mostly contain faecal indicator data, which are inadequate for providing information on pathogen risks. Nevertheless, it is encouraging to see the evolution of large datasets, which generally originate from water quality directives. Global or supranational datasets (e.g., EU WISE, WHO/UNICEF JMP) make use of these data, with global objectives such as the SDGs as a driving force. Despite limited resources and capacity, low- and middle-income countries (LMICs) also generate data that feed these global data platforms and inform their governments about national WASH priorities. The GEMStat database is a promising platform for a global microbial water quality data repository, and we recommend that it be further used to house microbial datasets generated by government agencies and researchers across the globe.
Second, we have highlighted that powerful new molecular methods, together with the laboratory capacity built up during the COVID-19 pandemic, mean that there is now greater global capacity to generate high-quality datasets, including the pathogen data that are currently lacking.
Third, we have identified models and analytical tools that can help fill gaps in data availability and support decision making and water quality assessments at various scales. Even with limited data, the use of water quality models and analytical tools will advance the understanding of risks and of strategies for management. Ideally, databases, models, and tools should be developed together with the contributors and users of the data.
Together, databases, models, and tools provide the first steps towards a global knowledge repository on pathogen pollution. With appropriate metadata, these data will be actionable, improving microbial water quality assessment and understanding and ultimately informing management decisions.
As the COVID-19 pandemic has shown, microbial data can be rapidly collected, processed, and shared using FAIR principles [8]. There is also ample scope for new digital tools that provide access to data with an eye toward equity, especially for low- and middle-income countries. As discussed, data in many existing databases are presented in non-actionable formats. However, moving toward state-of-the-art data science, along with the improvement of databases presented in Section 4, will make data exchange much easier and more accessible.
We provide the following recommendations for consideration.
1. Form an international microbiological water quality consortium through the International Water Association (IWA) to a) establish metadata and data quality criteria for microbial data (learning from the GEMStat and New Zealand government databases); and b) develop mechanisms for data submission to a global database (e.g., governments submit data directly; researchers submit data as a requirement for publication in IWA journals).
2. Advocate that the Global Freshwater Quality Database (GEMStat) of the United Nations Environment Program take the lead in compiling global microbial data and expand its current dataset, starting with the data from the surface water databases listed in Table 1, using open data sharing principles.
3. Develop a standard for researchers to submit data to the central global repository, starting with IWA journals. Making submission of microbial data a requirement for publication will contribute to a FAIR microbial water quality data repository.
4. Continue to invest in the laboratory capacity built during the pandemic to evaluate microbial water quality using genetic faecal pollution diagnostics and targeted pathogen monitoring in addition to faecal indicators.
5. Improve and utilize models to fill the gaps in observational datasets. Their outputs should also be incorporated in the global database. Additionally, whenever a new model or analytical tool is developed, it should be added to a global repository of models and tools that is closely linked to the global database.
6. Expand and co-develop training programs with stakeholders in low- and middle-income countries to build capacity in water quality monitoring and modelling.
Training programs need to be developed collaboratively to address current inequalities (data colonialism) [71] and to ensure that indigenous knowledge is used to improve the development and use of models and tools. Ideally, scientists and practitioners from low- and middle-income countries should lead sample collection and analysis in their own countries, as well as the related publications. Training programs should be co-developed with stakeholders, and training course development should become standard practice in the projects in which the models and tools are developed.

Table 1. (Continued)
a Methods used for these datasets are based on culture, except where this is unclear (see footnote d).
b Data are quantitative.
c Data are presence/absence.
d It is unclear whether these data are culture-based or quantified.
https://doi.org/10.1371/journal.pwat.0000166.t001
site allows for data download, visualisation and data upload. 2) The Water Quality Portal is a data portal assembled by the United States Geological Survey (USGS), the Environmental Protection Agency (EPA), and over 400 state, federal, tribal, and local agencies in the United States (https://www.waterqualitydata.us). The Water Quality Portal holds microbial indicator data from surface water and groundwater, mostly from across the United States. The database also contains some data from other countries and is set up to compile additional global data.