Water-quality monitoring in rivers often focuses on the concentrations of sediments and nutrients, constituents that can smother biota and cause eutrophication. However, the physical and economic constraints of manual sampling prohibit data collection at the frequency required to adequately capture the variation in concentrations through time. Here, we developed models to predict total suspended solids (TSS) and oxidized nitrogen (NOx) concentrations based on high-frequency time series of turbidity, conductivity and river level data from in situ sensors in rivers flowing into the Great Barrier Reef lagoon. We fit generalized-linear mixed-effects models with continuous first-order autoregressive correlation structures to water-quality data collected by manual sampling at two freshwater sites and one estuarine site and used the fitted models to predict TSS and NOx from the in situ sensor data. These models described the temporal autocorrelation in the data and handled observations collected at irregular frequencies, characteristics typical of water-quality monitoring data. Turbidity proved a useful and generalizable surrogate of TSS, with high predictive ability in the estuarine and fresh water sites. Turbidity, conductivity and river level served as combined surrogates of NOx. However, the relationship between NOx and the covariates was more complex than that between TSS and turbidity, and consequently the ability to predict NOx was lower and less generalizable across sites than for TSS. Furthermore, prediction intervals tended to increase during events, for both TSS and NOx models, highlighting the need to include measures of uncertainty routinely in water-quality reporting. Our study also highlights that surrogate-based models used to predict sediments and nutrients need to better incorporate temporal components if variance estimates are to be unbiased and model inference meaningful. 
The transferability of models across sites, and potentially regions, will become increasingly important as organizations move to automated sensing for water-quality monitoring throughout catchments.
Citation: Leigh C, Kandanaarachchi S, McGree JM, Hyndman RJ, Alsibai O, Mengersen K, et al. (2019) Predicting sediment and nutrient concentrations from high-frequency water-quality data. PLoS ONE 14(8): e0215503. https://doi.org/10.1371/journal.pone.0215503
Editor: Weili Duan, University of the Chinese Academy of Sciences, CHINA
Received: April 2, 2019; Accepted: August 19, 2019; Published: August 30, 2019
Copyright: © 2019 Leigh et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting Information files.
Funding: Funding for this project was provided by the Queensland Department of Environment and Science (DES; https://www.des.qld.gov.au/) and the Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS; https://acems.org.au/home; Grant CE140100049).
Competing interests: The authors have declared that no competing interests exist.
Measuring the concentrations of sediments and nutrients in rivers, and understanding how they change through time, is a major focus of water-quality monitoring given the potential detrimental effects these constituents have on aquatic ecosystems. Such knowledge can help inform the effective management of our land, waterways and oceans, including World Heritage Areas such as the Great Barrier Reef in the Australian tropics [1,2,3]. In regions dominated by highly seasonal, event-driven climates, such as those in the tropics, high-magnitude wet-season flows can transport large quantities of sediments and nutrients from the land downstream in relatively short time frames. The rapidity of change in sediment and nutrient concentrations during high-flow events poses challenges for water-quality monitoring based on discrete manual sampling of water followed by laboratory measurement of concentrations, which is time consuming, costly and typically temporally sparse. Relatively low sampling frequency increases the chances of missing water-quality events, but high flows may preclude the safety conditions required for manual sampling, and sample collection at the frequency required to capture change in concentrations may not always be physically or economically practical. The spatial sparsity of measurements from manual sampling is also problematic. For example, the Great Barrier Reef lagoon stretches over 3000 km of coastline, but the data currently used to validate estimates of sediments and nutrients flowing to the lagoon are collected from just 43 sites. This lack of data limits knowledge and understanding of sediment and nutrient concentrations in both space and time.
In situ sensors have the potential to complement or circumvent the need for manual sampling and laboratory analysis, whilst also providing monitoring data at the frequencies required to capture the full range of water-quality conditions occurring in rivers (e.g. every 15–60 mins). However, in situ sensors currently used to measure sediments and/or nutrients (e.g. nitrate) have drawbacks related to biofouling and drift, excessive power requirements, high costs, and/or low accuracy and precision. An alternative is to use in situ sensors at sites of interest to measure time series of water-quality variables, such as turbidity, conductivity, and river level (i.e. height), that have the potential to act individually or in combination as surrogates for sediments and nutrients [7,8]. However, an in-depth understanding of the relationship between sediment and nutrient dynamics and other water-quality variables is needed before the latter can be used as surrogate measures.
Turbidity is a visual property of water indicative of its clarity (or lack thereof) due to suspended particles of abiotic and biotic origin that absorb and scatter light. As a result, turbidity tends to increase during high-flow events in rivers, when waters often contain high concentrations of particles (e.g. sediments and nutrients from runoff-derived soil erosion), which makes it a popular surrogate for total suspended solids (TSS). Turbidity can also increase when water residence times increase during low flows, due to the resultant concentration of suspended particles, or when high concentrations of microalgae reduce water clarity. Both turbidity and conductivity of water can change rapidly during flow events. Conductivity reflects the ability of water to pass an electric current as determined by the concentration of ions, which can include nitrate and nitrite (oxidized nitrogen; NOx = nitrite + nitrate). As such, new inputs of fresh water will typically decrease conductivity in rivers as waters rapidly dilute and water levels rise. In contrast, conductivity tends to increase during low-flow periods and when water levels decline. Turbidity and conductivity, together with river level, thus have the potential to act as a combined proxy for nutrients such as NOx (e.g. [7,10]). Furthermore, the relationships described above between water level, turbidity and conductivity, and the potential of these variables to act as individual or combined proxies of river flow, TSS and NOx, are grounded in the vast body of work on flow-based relationships with water-quality constituents, which can be found even where those relationships are complex, strongly dependent on rare but intense precipitation events, or involve hysteresis [4,11,12].
A wide range of modelling techniques has been used to describe the relationship between sediment and nutrient concentrations and other water-quality constituents. For example, Artificial Neural Networks have been used to predict nitrate from multiple measures including other nutrient concentrations, and standard major axis regression has been used to quantify the relationship between turbidity and TSS. One of the most common approaches is to use linear regression to predict TSS from turbidity [4,7,8,11,15,16,17], nitrogen species from turbidity and conductivity [4,10] and phosphorus species from turbidity [4,7,8,11,16,18]. However, these regression models typically fail to account for the temporal autocorrelation (i.e. serial correlation) inherent in water-quality time series and/or the heteroscedasticity in the data. The presence of such features violates the underlying model assumptions (e.g. independent and identically distributed residuals), which can lead to biased variance estimates, inflated statistical significance of predictor variables and thus incorrect inference. Models that account for temporal autocorrelation and/or heteroscedasticity through the incorporation of random effects and/or specific variance-covariance structures can produce more accurate and precise predictions when temporal autocorrelation exists in the data (e.g. [12,19]). Despite these advantages, mixed-effects models that account for temporal autocorrelation are rarely used to predict concentrations of sediments and nutrients from water-quality time series.
Our key objective was to predict TSS and NOx from high-frequency water-quality data using models that accounted explicitly for temporal autocorrelation and heteroscedasticity. We used turbidity (NTU), conductivity (μS/cm) and river level (m) data collected using in situ sensors in rivers flowing into the Great Barrier Reef lagoon (Fig 1), along with water-quality data measured in the laboratory, as surrogate covariates. We aimed to assess whether relationships between TSS or NOx and the water-quality surrogates differed (i) among sites and (ii) between estuarine and fresh waters. Surrogate approaches are often site-specific and as such suffer from lack of transferability. Thus, we further aimed to assess (iii) whether a single mixed-effects model fit to the water-quality surrogates could be used to predict TSS or NOx over multiple locations, and when using data collected by in situ sensors. By investigating the predictive ability of the models, our findings will provide a basis to determine the most effective water-quality surrogates for TSS and NOx, along with the potential generalizability of the models across locations in the study area.
Study sites (closed circles), rivers and catchment boundaries within north tropical Queensland, Australia (left panel), the Wet Tropics (MR; middle panel) and Mackay Whitsunday regions (PR and SC; right panel). Closed triangles show the major towns of Cairns, Townsville and Mackay.
Materials and methods
Study region and sites
Our three study sites are located in rivers that flow into the Great Barrier Reef lagoon along the northeast coast of tropical Australia in Queensland (Fig 1). We chose these sites because they had comprehensive water-quality datasets available containing both laboratory-measured sediment and nutrient concentrations, as well as high frequency in situ water-quality data from multiple sensors. Two of the sites (Sandy Creek and Pioneer River) lie within the Mackay Whitsunday region and the third (Mulgrave River) lies within the Wet Tropics region. These two regions are characterized by seasonal climate, with higher rainfall and air temperatures in the ‘wet’ season and lower rainfall and air temperatures in the ‘dry’ season. Although there is inter-annual seasonal variation in climate and river flow in both regions, the wet season typically occurs from December to April in the Mackay Whitsunday region, and from November to April in the more northern Wet Tropics region [20,21,22]. The wet season is typically associated with tropical cyclones, monsoonal rainfall and associated event-flows in rivers, and the dry season with low to zero surface flow.
Pioneer River (21.1441° S, 149.0753° E) rises in the forested uplands of the Great Dividing Range in north Queensland. Many of its upper reaches lie within National or State Parks, whilst land use in the mid and lower reaches is dominated by sugarcane farming. Sandy Creek (21.2831° S, 149.0228° E) is a low-lying coastal-plain stream south of the Pioneer River, where the dominant land use is also sugarcane farming. The Mulgrave River (17.2075° S, 145.9264° E) in the Wet Tropics World Heritage Area rises, like the Pioneer River, in forested National Park uplands of the Great Dividing Range and flows through mostly cleared alluvial floodplains in its lower reaches. The Pioneer River and Sandy Creek sites (PR and SC) are in freshwater reaches, and the Mulgrave River site (MR) is in an estuarine reach. The monitored catchment area of each site is 1466 km² (PR), 326 km² (SC) and 789 km² (MR).
Laboratory and in situ sensor data
The Queensland Department of Environment and Science (DES) has installed an in situ automated water-quality sensor (YSI EXO2 Sonde fitted with an EXO Turbidity Smart Sensor 599101–01 and EXO Conductivity & Temperature Smart Sensor 599870) at each of the three study sites. Sensors are housed in a flow cell in water-quality monitoring stations on riverbanks; water is pumped at regular intervals from the river to the flow cell, every hour or hour and a half depending on the site and variable being measured, and sometimes more frequently during event flows. The sensors measure and record turbidity (NTU) and electrical conductivity at 25 °C (conductivity; μS/cm). Pressure-induction sensors record river level (i.e. height in meters from the riverbed to the water surface; level, m) every 10 minutes. Time-matched observations of level for the occasional out-of-step turbidity or conductivity measurement are provided via linear interpolation of the ten-minute level data. Sensors are equipped with wipers to minimize biofouling, and all equipment is checked and sensors are calibrated every six weeks following manufacturer guidelines. Prior to analysis, all the turbidity, conductivity and river level data were quality controlled and assured following standard procedures as per the laboratory data (see below), which for the sensor data included detection and removal of technical anomalies following a previously described framework.
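The time-matching of river level to an out-of-step turbidity or conductivity reading can be illustrated with a minimal sketch of linear interpolation. The function name, the use of minutes as the time unit, the example values, and the choice to return `None` rather than extrapolate outside the recorded range are all assumptions made for illustration; they are not taken from the monitoring program's processing code.

```python
from bisect import bisect_left

def interpolate_level(level_times, level_values, query_time):
    """Linearly interpolate river level (m) at query_time.

    level_times must be sorted ascending (here: minutes since an origin).
    Returns None when query_time lies outside the recorded range,
    rather than extrapolating (an illustrative choice).
    """
    if not level_times or query_time < level_times[0] or query_time > level_times[-1]:
        return None
    i = bisect_left(level_times, query_time)
    if level_times[i] == query_time:
        return level_values[i]  # exact match, no interpolation needed
    t0, t1 = level_times[i - 1], level_times[i]
    y0, y1 = level_values[i - 1], level_values[i]
    weight = (query_time - t0) / (t1 - t0)
    return y0 + weight * (y1 - y0)

# Hypothetical ten-minute level record: minutes since start, level in metres.
times = [0, 10, 20, 30]
levels = [1.00, 1.20, 1.10, 1.30]

# A turbidity reading taken at minute 15 falls between two level records.
matched_level = interpolate_level(times, levels, 15)
```

Here the reading at minute 15 lies halfway between the records at minutes 10 and 20, so the matched level is the midpoint of those two values.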
DES manually collects grab-samples of water approximately monthly, and more frequently during event flows in the wet season when safety permits, from each site for laboratory analysis of water quality, as part of its Great Barrier Reef Catchment Loads Monitoring Program. The laboratory data therefore contain unequally spaced observations of water quality through time. The goal of the program is to track long-term trends in the quality of water entering the Great Barrier Reef lagoon from adjacent catchments, as part of the Paddock to Reef program. Collection, storage, transport and laboratory analysis of grab-samples are conducted under strict quality control and assurance procedures [27,28,29,30]. Samples are analyzed in the National Association of Testing Authorities accredited Science Division Chemistry Centre laboratories (Dutton Park, Queensland) for turbidity, conductivity and concentrations of TSS (mg/L) and NOx (mg/L) following standard methods. DES also records river level on-site on most occasions when grab-samples are collected.
Turbidity, conductivity, TSS and NOx data measured in the laboratory were available from January 2016 to June 2017 at SC and MR and from January 2016 to October 2017 at PR (Table 1, Figs 2–4). Turbidity, conductivity and level data measured and recorded in situ by automated sensors were available from March 2017 to March 2018 at all three sites (Table 1, Figs 2–4). As such, the data captured an entire water-year for these event-driven systems. This included the wet-season high-flow period associated with approximately 5–10 high-turbidity and low-conductivity events, depending on the site, and the dry-season low-flow period (Figs 2–4). Inputs of saline water from groundwater or tidal influence will increase the conductivity of surface waters and confound the relationship between conductivity and NOx. For this reason, all conductivity observations under tidal influence at MR (i.e. those greater than the maximum conductivity observed across the two freshwater sites over the same time span, which was 1100 μS/cm) were removed from the laboratory and sensor data prior to analysis. The ranges of the laboratory-measured turbidity, conductivity and level data at each site were comparable to those of the respective in situ sensor data, and the laboratory observations exhibited similar patterns through time to the sensor observations (Figs 2–4, Table 1), supporting the use of the laboratory data to build models for subsequent prediction using the sensor data.
Laboratory-measured (open circles) and in situ sensor-measured turbidity (NTU) at Mulgrave River (MR; purple points), Pioneer River (PR, blue points) and Sandy Creek (SC; light green points).
Laboratory-measured (open circles) and in situ sensor-measured conductivity (μS/cm) at Mulgrave River (MR; purple points), Pioneer River (PR, blue points) and Sandy Creek (SC; light green points).
We fit generalized-linear mixed-effects models with a continuous first-order autoregressive correlation (AR(1)) structure to the laboratory TSS or NOx (i.e. the response) and surrogate water-quality variables (i.e. the covariates). The models are of the form:

$$y = X\beta + \varepsilon,$$

where $y$ is an $n$-dimensional vector of TSS or NOx collected at times $t_1, \ldots, t_n$; $n$ is the number of observations; $X$ is an $n \times p$ design matrix of $p$ covariates (including the intercept) collected at times $t_1, \ldots, t_n$; $\beta$ is a $p$-dimensional vector of estimated regression coefficients; and $\varepsilon$ is an $n$-dimensional vector of zero-mean, normally distributed errors with covariance matrix $\sigma^2 \Lambda$. The covariance is defined by a continuous AR(1) structure, such that $\Lambda_{ij} = \phi^{|t_i - t_j|}$, where $\phi$ is the parameter in the AR(1) process, which can range between 0 and 1 and defines how the autocorrelation declines with time. The continuous AR(1) structure accounts for both the temporal correlation and unequal temporal spacing present in the laboratory time-series data.
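To illustrate how a continuous AR(1) structure copes with unequal spacing, the sketch below builds a correlation matrix in which the correlation between two observations is ϕ raised to the power of the absolute time gap between them, which is the usual form of this structure and is assumed here. The function name, the time unit (days), the sampling times and the fixed value of ϕ are hypothetical. In a fitted model ϕ is estimated from the data, not supplied.

```python
def car1_correlation(times, phi):
    """Return Lambda with Lambda[i][j] = phi ** |t_i - t_j| (continuous AR(1))."""
    if not 0.0 < phi < 1.0:
        raise ValueError("phi is assumed to lie strictly between 0 and 1")
    return [[phi ** abs(ti - tj) for tj in times] for ti in times]

# Hypothetical sampling times in days: short gaps during an event,
# then a roughly monthly gap, as is typical of grab sampling.
sample_days = [0.0, 1.0, 3.5, 30.0]
Lambda = car1_correlation(sample_days, phi=0.87)

# Correlation decays smoothly with the size of the gap, so no regular
# sampling grid is required.
one_day_apart = Lambda[0][1]      # gap of 1 day
thirty_days_apart = Lambda[0][3]  # gap of 30 days, much weaker correlation
```

Because the exponent is the actual elapsed time and not a count of sampling steps, closely spaced event samples are treated as strongly correlated and widely spaced routine samples as nearly independent.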
We selected potential covariates for the TSS and NOx models based on plausible mechanisms that could cause changes in TSS or NOx, evidence from the literature, exploratory data analysis, and the availability of covariates within the laboratory dataset (Fig 5, Step 1). Turbidity, conductivity, TSS, NOx and river level were all log10-transformed prior to analyses to meet model assumptions. We chose the log10-transform because it is commonly used and has the benefit of being easy to interpret on the transformed scale. Predictions from the models were back-transformed with bias correction for graphical visualization and assessment of accuracy and precision with respect to the laboratory TSS and NOx concentrations.
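Back-transforming a prediction made on the log10 scale by simply raising 10 to that power estimates the median, not the mean, of the original-scale response, which is why a bias correction is applied. The sketch below uses the standard log-normal mean correction as an assumption; the specific correction used in the study is not reproduced here and may differ (for example, a smearing-type estimator). The function name and numerical values are hypothetical.

```python
import math

def back_transform_log10(pred_log10, resid_var_log10):
    """Back-transform a log10-scale prediction to the original scale.

    Assumes log10(Y) is normal with mean pred_log10 and variance
    resid_var_log10, so that
        E[Y] = 10**mu * exp(0.5 * ln(10)**2 * sigma**2).
    This log-normal correction is an illustrative assumption.
    """
    naive = 10.0 ** pred_log10
    correction = math.exp(0.5 * (math.log(10.0) ** 2) * resid_var_log10)
    return naive * correction

# Hypothetical values: a predicted log10(TSS) of 2.0 (i.e. 100 mg/L on the
# naive scale) and a residual variance of 0.04 on the log10 scale.
naive_tss = 10.0 ** 2.0
corrected_tss = back_transform_log10(2.0, 0.04)
```

The corrected value always exceeds the naive one when the residual variance is positive, and the two coincide when the variance is zero.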
LevelQ is a categorical variable with two levels based on first, second or third quartiles of the data (Q1, Q2 or Q3). Turbidity, conductivity and level covariates were all log10-transformed prior to analysis.
Exploratory analyses indicated that including data from all three sites in a single TSS model was appropriate; there was a strong and similar positive relationship between turbidity and TSS at all sites (S1 Fig), reflecting the physical properties of these variables and the processes underlying water-quality dynamics in rivers [34,35]. The suite of covariates we selected for the TSS model included: turbidity measured in the laboratory; T15, a categorical variable representing turbidity < 15 NTU (‘below’) or ≥ 15 NTU (‘above’); site (MR, PR, SC); and all of their interactions (Fig 5, Step 1). We also included site as a grouping variable in the temporal correlation structure, to account for within-site correlation (Fig 5, Step 2). We included T15 because the intercept for the relationship with TSS appeared to change below 15 NTU, particularly at freshwater sites PR and SC (S1 Fig). In addition, 15 NTU is the water-quality guideline value for turbidity in freshwater streams and rivers in northern Australia.
We used a two-step model-selection process to identify the final TSS model. First, we implemented a backwards-stepwise model-selection procedure to identify the subset of covariates that had the most support in the data (Fig 5, Step 4a). Parameters were estimated using maximum likelihood and models were compared using the Akaike Information Criterion (AIC). Next, we assessed the predictive performance of the model using a 5-fold cross-validation (cv) procedure designed specifically for temporally correlated data (Fig 5, Step 6). Validation data from each site were created by dividing the complete time series into five blocks of chronologically ordered observations. Maximum likelihood can produce biased estimates of ϕ and so the final model was then iteratively refit without the validation data (i.e. each of the five blocks), using restricted maximum likelihood (REML) for parameter estimation. We then used the observations from the validation set and the associated cross-validation predictions to calculate the root mean-square error (cvRMSE) and the 95% prediction coverage value (cvPC). An r-squared statistic (cvR2) was also generated based on the squared Spearman rank correlation between the observations and the cross-validated predictions. We chose this approach because Spearman rank correlation is suitable for time series and models with autoregressive correlation structures.
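The blocking and the three summary statistics described above can be sketched as follows. This is a simplified illustration, not the study's code: the function names, the rule for placing block boundaries, the average-rank treatment of ties in the Spearman correlation, and the example numbers are all assumptions. Model refitting within each fold is omitted.

```python
import math

def chronological_blocks(n_obs, n_blocks=5):
    """Split indices 0..n_obs-1 into contiguous, time-ordered blocks."""
    edges = [k * n_obs // n_blocks for k in range(n_blocks + 1)]
    return [list(range(edges[k], edges[k + 1])) for k in range(n_blocks)]

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cv_metrics(observed, predicted, lower, upper):
    """Return (cvRMSE, cvPC in percent, cvR2 as squared Spearman correlation)."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    covered = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    rho = pearson(average_ranks(observed), average_ranks(predicted))
    return rmse, 100.0 * covered / n, rho ** 2

# Hypothetical held-out observations, predictions and 95% interval bounds.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
lower = [5.0, 21.0, 20.0, 30.0]
upper = [15.0, 25.0, 40.0, 50.0]
cv_rmse, cv_pc, cv_r2 = cv_metrics(observed, predicted, lower, upper)
blocks = chronological_blocks(12)  # five contiguous validation blocks
```

Keeping each validation block contiguous in time means the held-out observations are not interleaved with their strongly correlated neighbours in the training data, which is the motivation for blocking in the presence of autocorrelation.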
Relationships between NOx and the potential covariates conductivity, turbidity (both measured in the laboratory) and river level (measured on-site at the time of water sampling) were more complex and site-specific than the relationship between turbidity and TSS (S2–S4 Figs). Therefore, we first fit NOx models for each site separately (Fig 5, Step 1). The models included covariates for conductivity, turbidity, level and all of their interactions. We also included a binary grouping variable in the temporal correlation structure based on river level (Fig 5, Step 2) because concentrations of NOx can vary more considerably during high flows than during more stable flow periods. We did not know a priori what the most suitable cut-off would be for the level-based AR(1) structure and so we tested three options for each site: (i) less than the first quartile (Q1), (ii) less than the median (Q2), and (iii) less than the third quartile (Q3; Fig 5, Step 3). We then implemented the two-stage model-selection procedure (Fig 5, Steps 4a,b), using backwards stepwise regression and cross-validation for NOx in the same general way as for TSS, except that models were fit separately to each site and three AR(1) structures were tested for each site. This produced nine models (3 sites × 3 AR(1) structures), which we then refit using REML to calculate a cvRMSE for each. For each site, the model with the lowest cvRMSE was deemed the best model, reducing the nine models to three. Although each of these three remaining models had the greatest predictive ability for the relevant site, our aim was to develop a single model that could be applied across all sites. Therefore, we composited all of the covariates from the three ‘best’ models to create a final model for NOx (Fig 5, Step 5). We refit this final model using the data from each site separately, using cross-validation to generate the cvRMSE, cvPC and cvR2 and evaluate the predictive ability (Fig 5, Step 6).
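The binary, level-based grouping variable can be sketched as below: each observation is labelled according to whether river level falls below a chosen quartile cut-off. The function name, the labels, the example levels and the quantile method (`"inclusive"`) are assumptions for illustration; the study does not state which quantile definition was used.

```python
import statistics

def level_groups(levels, cutoff="Q2"):
    """Label each river-level observation as 'below' or 'above' a quartile.

    cutoff is one of 'Q1', 'Q2' (median) or 'Q3'. Returns the labels and
    the numeric cut point. The quantile method is an illustrative choice.
    """
    q1, q2, q3 = statistics.quantiles(levels, n=4, method="inclusive")
    cut = {"Q1": q1, "Q2": q2, "Q3": q3}[cutoff]
    labels = ["below" if value < cut else "above" for value in levels]
    return labels, cut

# Hypothetical river levels (m) at one site, including a high-flow event.
levels_m = [0.5, 0.6, 0.8, 1.0, 1.5, 2.5, 4.0]

# The three candidate grouping structures tested for each site.
candidates = {name: level_groups(levels_m, name) for name in ("Q1", "Q2", "Q3")}
median_labels, median_cut = candidates["Q2"]
```

Each candidate grouping would then define which observations share a correlation structure when the model is refit, and the cut-off giving the lowest cvRMSE would be retained for that site.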
Prediction using data from in situ sensors.
After identifying the final models for TSS and NOx, we fit those models using covariates based on the data from the in situ sensors (i.e. turbidity, conductivity and/or level, as per the final TSS or NOx model structure), and used them to make predictions and associated estimates of uncertainty of TSS and NOx, respectively (Fig 5, Step 7). There was limited overlap in the timespans of the laboratory and in situ sensor data (Figs 2–4), and of course no sensor-measured observations of TSS or NOx on which to forecast future concentrations. Therefore, we generated the predictions and associated estimates of uncertainty using an infinite-horizon forecast, which assumes that forecasting is being made well into the future so that any temporal autocorrelations in the data are irrelevant.
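The reasoning behind the infinite-horizon forecast can be made explicit with a short derivation. This is a sketch based on standard generalized least squares results for a model with continuous AR(1) errors, treating $\sigma^2$ and $\phi$ as known; the symbols $x_0$, $t_0$ and $\lambda_0$ are introduced here for illustration, and whether the study's intervals include the coefficient-uncertainty term is not stated in the text.

```latex
% Prediction at a new time t_0 with covariate vector x_0, where
% \lambda_0 has elements \phi^{|t_0 - t_i|}, i = 1, \ldots, n:
\hat{y}_0 = x_0^{\top}\hat{\beta}
          + \lambda_0^{\top}\Lambda^{-1}\,(y - X\hat{\beta}).

% As t_0 moves far from every observation time, 0 < \phi < 1 implies
% \phi^{|t_0 - t_i|} \to 0, so \lambda_0 \to 0 and
\hat{y}_0 \;\to\; x_0^{\top}\hat{\beta},
\qquad
\operatorname{Var}(y_0 - \hat{y}_0) \;\to\;
\sigma^2\left[\,1 + x_0^{\top}\,(X^{\top}\Lambda^{-1}X)^{-1}\,x_0\right].

% An approximate 95% prediction interval on the log10 scale is then
\hat{y}_0 \;\pm\; 1.96\,
\sqrt{\sigma^2\left[\,1 + x_0^{\top}\,(X^{\top}\Lambda^{-1}X)^{-1}\,x_0\right]}.
```

In other words, once the forecast horizon is long relative to the decay of the autocorrelation, past residuals carry no information about the new observation, and the prediction depends only on the covariates and the estimated coefficients.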
We used a leave-one-out cross-validation (LOOCV) procedure to assess whether the final models for TSS and NOx fit to the sensor-measured surrogate covariates could accurately and precisely predict the response (Fig 5, Step 7). This took a single observation from the sensor-measured covariate(s) as the validation dataset and the laboratory data minus a single observation (time-matched with the sensor validation observation) as the training dataset, which was fit using the final TSS or NOx model. A prediction was then made using the validation dataset, and the procedure was repeated for each observation. The laboratory and sensor observations were rarely synoptic, and so we used only those sensor measurements collected within one hour of a laboratory measurement (for MR, PR and SC respectively: n = 11, 49 and 28 for TSS; n = 11, 23, and 30 for NOx) as validation observations. Given the limited size of these data subsets, the LOOCV procedure was a more suitable method than the 5-fold cross-validation procedure we used on the much larger, complete datasets of laboratory observations. We then calculated the cvRMSE, cvPC and cvR2 using the ‘time-matched’ laboratory concentrations of TSS or NOx and the associated LOOCV predicted concentrations.
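The time-matching step that defines the validation observations can be sketched as follows. The one-hour tolerance comes from the text; the function name, the rule of taking the nearest sensor reading when several fall within the window, and the timestamps are assumptions made for illustration.

```python
from datetime import datetime, timedelta

def match_within(lab_times, sensor_times, tolerance=timedelta(hours=1)):
    """For each laboratory timestamp, return the index of the nearest sensor
    timestamp within the tolerance, or None if no reading is close enough."""
    matches = []
    for lab_t in lab_times:
        best_index, best_gap = None, None
        for j, sensor_t in enumerate(sensor_times):
            gap = abs(sensor_t - lab_t)
            if gap <= tolerance and (best_gap is None or gap < best_gap):
                best_index, best_gap = j, gap
        matches.append(best_index)
    return matches

# Hypothetical grab-sample and sensor timestamps.
lab_times = [datetime(2017, 3, 10, 9, 0), datetime(2017, 3, 12, 14, 0)]
sensor_times = [
    datetime(2017, 3, 10, 8, 30),   # 30 min before the first grab sample
    datetime(2017, 3, 10, 9, 40),   # 40 min after the first grab sample
    datetime(2017, 3, 12, 16, 0),   # 2 h after the second grab sample
]
matches = match_within(lab_times, sensor_times)

# Only matched pairs would enter the leave-one-out procedure.
validation_pairs = [(i, j) for i, j in enumerate(matches) if j is not None]
```

For each retained pair, the matching laboratory observation would be dropped from the training data, the model refit, and a prediction made from the sensor covariates at the matched time.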
TSS model based on laboratory data
The final TSS model, which minimized the AIC, explained over 90% of the variation in TSS (cvR2) and had excellent 95% prediction coverage (cvPC = 97.7%; Table 2; Fig 6) based on the 5-fold cross-validation. Although the linear relationship was strong, the model tended to under-predict at high TSS values, where data were relatively sparse (Fig 6). More specifically, at observed TSS concentrations greater than c. 100 mg/L, the model marginally over-predicted at MR, and under-predicted at both PR and SC (Fig 6).
TSS (mg/L; back-transformed with bias correction). Data from each site shown in purple (Mulgrave River, MR), blue (Pioneer River, PR) and light green (Sandy Creek, SC). Black lines show the 1:1 relationships between observations and predictions.
The relatively high correlation parameter value (ϕ = 0.87, with 95% confidence interval of 0.83–0.91) indicated that there was significant temporal autocorrelation in the data captured by the AR(1) model. The model included laboratory-based covariates for turbidity, site and T15, as well as interactions between site and turbidity and between T15 and turbidity (Table 3). TSS had a statistically significant (p < 0.05) and positive relationship with turbidity across all three sites, with TSS increasing more rapidly per unit rise in turbidity at MR than at the freshwater sites PR and SC, and when turbidity was ≥ 15 NTU as opposed to < 15 NTU.
NOx model based on laboratory data
NOx models with a grouping structure based on the median value of river level (i.e. Q2) had the best predictive ability at all of the sites according to the cvRMSE (S1 Table). The combination of covariates from those models comprised turbidity, conductivity, level, and interactions between turbidity and level and between conductivity and level, which were all included in the final NOx model (Fig 7) along with median river level (Q2, as relevant to each site) in the grouping structure. Correlation parameters for this model indicated that temporal autocorrelation in the data was captured by the AR(1) structure (MR: ϕ = 0.86, with a confidence interval (CI) of 0.75–0.93; PR: ϕ = 0.86, CI = 0.72–0.94; SC: ϕ = 0.87, CI = 0.73–0.94).
NOx (mg/L; back-transformed with bias correction). Data from each site shown in purple (Mulgrave River, MR), blue (Pioneer River, PR) and light green (Sandy Creek, SC). Black lines show the 1:1 relationships between observations and predictions.
The three site-specific models had poor predictive ability, explaining only 6–22% of the variation in NOx (Table 2). At observed NOx concentrations greater than c. 0.1 mg/L, the predictions had almost no relationship with the observations at any of the sites (Fig 7). In addition, the statistical significance and direction of the covariates’ effects in the model differed among sites (Table 4). For example, the relationship between conductivity and level was significant (p < 0.01) and positive at SC, but non-significant at MR and PR. The relationship between turbidity and level was significant (p < 0.01) and negative at PR, but non-significant at MR and SC. However, such differences may be expected given that the final model contained covariates and interactions that were not significant at every site (S1 Table).
TSS predictions from sensor data
The predictive accuracy of the final TSS model that included covariates from in situ sensors was high. The cvR2 and cvRMSE (86.5% and 25.1 mg/L; Table 5, Fig 8) from the LOOCV showed that the accuracy of the model was similar to that of the model fit to laboratory data (cvR2 = 90.4% and cvRMSE = 29.4 mg/L; Table 2). Although the 95% prediction coverage decreased from 97.7% to 88.6%, this is still reasonable given the relatively small sample size. Furthermore, all TSS predictions made from the sensor-measured turbidity covariate fell within the ranges of TSS measured in the laboratory (Table 1), except for some of the ‘future’ predictions at MR (Fig 9) when sensor-measured turbidity in late 2017 and early 2018 was high relative to that measured in the laboratory between January 2016 and June 2017 (Fig 2). The higher TSS values predicted at MR during the latter half of 2017 are therefore reasonable. TSS at SC was under-predicted at higher concentrations (> c. 100 mg/L; Fig 8), which was similar to our findings from the final TSS model fit to laboratory data (Fig 6). Finally, prediction intervals for all sites tended to be wider during peak events than at other times, which is to be expected when dealing with a log-normal response (Fig 9).
Mulgrave River (MR, purple), Pioneer River (PR, blue) and Sandy Creek (SC, light green). Gray shading shows upper and lower boundaries of the 95% prediction interval, and the inner lines the predicted TSS concentrations through time. Gaps indicate periods of missing data in the sensor time series. Closed circles show the laboratory-measured TSS concentrations within the same period.
NOx predictions from sensor data
Accuracy and precision of the NOx predictions produced by the model fit to the sensor-based covariates exceeded that of the final model fitted to all of the laboratory data (Tables 2 and 5, Fig 10). The cvRMSE values were substantially smaller (MR: 0.05 vs 0.10; PR: 0.10 vs 0.16; SC: 0.11 vs 0.29) and the cvR2 values higher (MR: 56.9 vs 19.5%; PR: 6.6 vs 6.2%; SC: 71.1 vs 21.6%). In addition, the 95% prediction intervals were more reliable in the model fit to sensor data and captured the true values 100% of the time, with prediction intervals tending to be wider during events than non-events (Fig 11). At SC, there was a close relationship between the predictions and the observations at concentrations < c. 0.15 mg/L, but values greater than those tended to be under-predicted (Fig 10). There was also consistent over-prediction at MR and no relationship between predicted and observed NOx at PR (Fig 10).
Mulgrave River (MR, purple), Pioneer River (PR, blue) and Sandy Creek (SC, light green). Gray shading shows upper and lower boundaries of the 95% prediction interval, and the inner lines the predicted NOx concentrations through time. Gaps indicate periods of missing data in the sensor time series. Closed circles show the laboratory-measured NOx concentrations within the same period.
The range of NOx values predicted from the sensor covariates using all of the sensor-measured observations at both PR and SC fell within the range of NOx values measured in the laboratory. Although the maximum value predicted at MR (30.6 mg/L) was much greater than that measured in the laboratory (0.48 mg/L; Fig 11, Table 1), the elevated NOx values (>10 mg/L) predicted at MR were associated with peak values of the sensor-based covariates in late 2017 and early 2018 (turbidity, conductivity and/or level, depending on the site; Figs 2–4). As with TSS predictions, these high-concentration NOx predictions were in the ‘future’ (i.e. occurring after the last available laboratory observation); thus, the future NOx concentrations could conceivably have exceeded the concentrations observed in 2016 and early 2017.
Surrogate potential and model generalizability
The transferability of models across sites, and potentially regions, will become increasingly important as organizations move to automated sensing for water-quality monitoring throughout catchments. We found a consistent, positive relationship between TSS and turbidity in both estuarine and fresh waters across study locations in different catchments separated by up to c. 700 km. In addition, the final TSS model had high predictive ability across all sites, indicating that a single mixed-effects model could be used to predict sediment concentrations from high-frequency, in situ sensor data over multiple locations in the study region. Whilst other studies have shown that turbidity is a useful surrogate of sediments in rivers, particularly when models account for temporal correlation in the data (e.g. ), our findings additionally suggest that a model based on sensor-measured turbidity has strong potential to be generalizable across locations, at least for the studied Great Barrier Reef catchments.
The complex relationships between NOx and its potential surrogates made the development of a generalizable model more difficult than it was for TSS. The predictive ability of the composite final NOx model fit to the turbidity, conductivity, and river level covariates at each site was substantially lower than that of the TSS model. Mismatch in the timing of laboratory and sensor observations was also a potential source of variability and bias in the leave-one-out cross-validations for the NOx model, particularly during events when rapid changes in the covariates’ values can occur. However, such mismatch would also have affected the leave-one-out cross-validations for the TSS model, which maintained good predictive ability regardless. Furthermore, the NOx model performed differently depending on the site to which it was applied, with a poor relationship between the observed and predicted concentrations at PR in particular, and a tendency to under-predict at SC for NOx concentrations above c. 0.1 mg/L. Consequently, we would not recommend that the NOx model developed herein be used as a generalized model across the study region.
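To make the timing-mismatch issue concrete, the following sketch pairs each laboratory sample with the nearest sensor reading and discards pairs separated by more than a chosen tolerance. It is an illustration in Python only; the source does not describe how observations were paired, and the function name match_nearest and the max_gap tolerance are hypothetical.

```python
from bisect import bisect_left

def match_nearest(lab_times, sensor_times, max_gap):
    """For each lab sample time, return the index of the nearest sensor
    reading, or None if that reading is more than max_gap away.

    sensor_times must be sorted in ascending order. Times are numbers in
    any consistent unit (for example, minutes since the start of the record).
    """
    matches = []
    for t in lab_times:
        i = bisect_left(sensor_times, t)
        # Candidates: the readings just before and just after the sample time
        candidates = [j for j in (i - 1, i) if 0 <= j < len(sensor_times)]
        if not candidates:
            matches.append(None)
            continue
        best = min(candidates, key=lambda j: abs(sensor_times[j] - t))
        gap = abs(sensor_times[best] - t)
        matches.append(best if gap <= max_gap else None)
    return matches
```

A tighter tolerance reduces the mismatch during events, when covariates change rapidly, at the cost of discarding more laboratory samples.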
The lack of generalizability of the NOx model likely relates to the complexity of dissolved nutrient dynamics in rivers, which are influenced by multiple and interacting factors including physical, chemical and biological processes [34,35]. For example, different timings and applications of fertilizers to agricultural land, different spatial configurations and types of soil and agricultural land (e.g. livestock grazing versus sugarcane cropland), and variation in the uptake of nutrients by phytoplankton, may all differentially influence dissolved nutrient concentrations among sites and through time [44,45,46]. Inclusion of additional covariates such as seasonal variation in fertilizer application, flow-weighted land use, soil characteristics and/or time since the last rainfall event may improve model fits and resultant predictions if more site-specific NOx models are desired. We also acknowledge that our findings are based on one year of data alone; inclusion of successive years of data, as they become available, may more comprehensively describe the temporal trends in water quality and the relationships between variables, which may in turn improve our ability to predict NOx across sites using surrogate measures. The NOx model findings also highlight that development of reliable, low-cost nitrate sensors will remain an important management goal for the study region, and likely elsewhere, in the absence of suitable surrogate measures that can predict NOx across multiple sites with high accuracy and precision (e.g. dissolved oxygen).
The estimates of uncertainty associated with each prediction also provide important information for managers and scientists. Our results showed that the prediction intervals reliably captured the true TSS and NOx values for the final models fit to in situ sensor data, regardless of the predictive accuracy of the models. Prediction intervals were wider during events when the predicted TSS and NOx concentrations increased rapidly, corresponding with sudden new inputs of fresh water, and this has modelling implications. For instance, if such models were used to predict sediment and nutrient concentrations during events in the study region, end-users would need to be aware that the uncertainty around those predictions may be quite high, especially at the upper end of the prediction interval. Furthermore, if the predicted concentrations were then used to estimate high-frequency sediment and nutrient loads, as most water-quality monitoring programs transitioning to automated in situ sensors would desire, the associated estimates of uncertainty could be propagated through the model and accounted for in load estimates. Knowledge of the magnitude of prediction uncertainty is important because it provides managers with information about where and when they can be most or least confident in model predictions in order to prioritize future sampling efforts and management actions effectively [50,51]. We therefore recommend that measures of uncertainty be included routinely in water-quality reporting.
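One way such propagation might be done is by simulation, sketched below in Python. This is not the authors' method. It assumes that each time step has a predicted concentration and a predictive standard deviation on the log10 scale, that a discharge series is available (a hypothetical input; the sensors in this study recorded river level), and that errors are independent between time steps. The last assumption ignores the temporal autocorrelation emphasized in this study and would tend to understate the uncertainty in the total load. The function name simulate_load_kg and all parameter names are illustrative.

```python
import random

def simulate_load_kg(log10_mean, log10_sd, discharge_m3s, dt_s,
                     n_sims=2000, seed=1):
    """Monte Carlo 95% interval and median for a total load, in kg.

    log10_mean, log10_sd: predicted concentration (mg/L) and its predictive
    standard deviation at each time step, both on the log10 scale.
    discharge_m3s: discharge at each time step (m^3/s).
    dt_s: length of each time step (s).
    Errors are drawn independently per step (a simplifying assumption).
    """
    rng = random.Random(seed)
    loads = []
    for _ in range(n_sims):
        total_g = 0.0
        for mu, sd, q in zip(log10_mean, log10_sd, discharge_m3s):
            conc = 10 ** rng.gauss(mu, sd)  # mg/L, equivalent to g/m^3
            total_g += conc * q * dt_s      # g/m^3 * m^3/s * s = g
        loads.append(total_g / 1000.0)      # g -> kg
    loads.sort()
    lower = loads[int(0.025 * (n_sims - 1))]
    median = loads[int(0.5 * (n_sims - 1))]
    upper = loads[int(0.975 * (n_sims - 1))]
    return lower, median, upper
```

Because the back-transformation from the log10 scale is nonlinear, the simulated load distribution is right-skewed, which is consistent with the greater uncertainty at the upper end of the prediction interval noted above.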
Future directions and concluding remarks
Significant investments are being made to change management practices and reduce the quantities of sediments and nutrients entering rivers and, eventually, the Great Barrier Reef lagoon. However, measuring the downstream impacts of these investments is challenging because current water-quality monitoring relies on a relatively small number of sites at or near river mouths. This makes pinpointing where the largest sources of sediments and nutrients are within a catchment difficult. The ability to predict TSS and NOx using data from relatively low-cost in situ sensors will allow networks of sensors to be deployed throughout catchments as technologies advance, creating numerous benefits for management. Firstly, the number of water-quality monitoring sites would increase significantly. Secondly, as the amount of data increases, the opportunity to develop near-real time statistical models for TSS and NOx increases, which could then be used to create dynamic predictive maps of sediment and nutrient concentrations throughout entire catchments. This would provide managers with greater situational awareness of where and when water-quality targets are being breached and would allow prioritization of land management actions in space and time to further reduce land-based impacts on the Great Barrier Reef lagoon.
Our study highlights that models fit to in situ water-quality data can be used to generate accurate predictions of TSS at both freshwater and estuarine sites. As the number of monitoring locations increases, spatial statistical models for stream networks could be used to generate predictions, with estimates of uncertainty, across entire catchments. These methods could also be extended into both space and time (i.e. spatio-temporal models), the need for which still clearly exists [32,54]. Such efforts, in combination with the methods developed herein, could revolutionize the way water quality is monitored and managed.
S1 File. Data and code.
Water-quality data files used to fit and predict from the TSS and NOx models using the R script herein.
S1 Table. Site-based NOx models.
The best NOx model for each site based on the cvRMSE (i.e. lowest value per site), following backwards stepwise selection of the covariates in each model using the Akaike Information Criterion.
S1 Fig. Turbidity and TSS.
The relationship between laboratory-determined turbidity (NTU) and total suspended solids (TSS, mg/L), log10-transformed, at Mulgrave River (MR; left plot), Pioneer River (PR, middle plot) and Sandy Creek (SC, right plot), with turbidity values < 15 NTU represented by closed triangles and those ≥ 15 NTU by closed circles, with each set in the freshwater sites PR and SC enclosed by ellipses.
S2 Fig. Turbidity and NOx.
The relationship between laboratory-determined turbidity (NTU) and oxidized nitrogen (NOx, mg/L), log10-transformed, at Mulgrave River (MR; left plot), Pioneer River (PR, middle plot) and Sandy Creek (SC, right plot).
S3 Fig. Conductivity and NOx.
The relationship between laboratory-determined conductivity (μS/cm) and oxidized nitrogen (NOx, mg/L), log10-transformed, at Mulgrave River (MR; left plot), Pioneer River (PR, middle plot) and Sandy Creek (SC, right plot).
S4 Fig. River level and NOx.
The relationship between river level (m) recorded on-site at the time of water sample collection and oxidized nitrogen (NOx, mg/L), log10-transformed, at Mulgrave River (MR; left plot), Pioneer River (PR, middle plot) and Sandy Creek (SC, right plot).
S5 Fig. TSS-model residuals.
Residuals from the final total suspended solids (TSS, mg/L, log10-transformed) model plotted in chronological order of the observations at Mulgrave River (MR; left plot), Pioneer River (PR, middle plot) and Sandy Creek (SC, right plot).
S6 Fig. TSS-model diagnostic plots.
Diagnostic plots for the final total suspended solids (TSS, mg/L, log10-transformed) model. Upper row: fitted values vs residuals. Middle row: boxplots of residuals. Lower row: QQ-plot. Left column: by site (MR, Mulgrave River; PR, Pioneer River; SC, Sandy Creek). Right column: by T15 (Above: turbidity ≥ 15 NTU; Below: turbidity < 15 NTU).
S7 Fig. NOx-model residuals.
Residuals from the final oxidized nitrogen (NOx, mg/L, log10-transformed) models plotted in chronological order of the observations at Mulgrave River (MR; upper plot), Pioneer River (PR, middle plot) and Sandy Creek (SC, lower plot).
S8 Fig. NOx-model diagnostic plots.
Diagnostic plots for the final oxidized nitrogen (NOx, mg/L, log10-transformed) model for each site (Mulgrave River, MR; purple points; Pioneer River, PR; blue points; Sandy Creek, SC; light green points). Upper row: fitted values versus residuals. Middle row: boxplots of residuals. Lower row: QQ-plots.
The authors acknowledge the Queensland Department of Environment and Science, Great Barrier Reef Catchment Loads Monitoring Program for the data and the staff from Water Quality and Investigations for their input. We thank Grace Heron for producing the map in Fig 1. Laboratory and in situ sensor data together with the R code used in this paper are provided in the Supporting Information.
- 1. Brodie JE, Kroon FJ, Schaffelke B, Wolanski EC, Lewis SE, Devlin MJ, et al. Terrestrial pollutant runoff to the Great Barrier Reef: an update of issues, priorities and management responses. Mar Pollut Bull. 2012;65: 81–100. pmid:22257553
- 2. Leigh C, Burford M, Connolly R, Olley J, Saeck E, Sheldon F, et al. Science to support management of receiving waters in an event-driven ecosystem: from land to river to sea. Water. 2013;5: 780–97.
- 3. Humanes A, Ricardo GF, Willis BL, Fabricius KE, Negri AP. Cumulative effects of suspended sediments, organic nutrients and temperature stress on early life history stages of the coral Acropora tenuis. Sci Reports. 2017;7: 44101.
- 4. O’Brien KR, Weber TR, Leigh C, Burford MA. Sediment and nutrient budgets are inherently dynamic: evidence from a long-term study of two subtropical reservoirs. Hydrol Earth Syst Sci. 2016;20: 4881–94.
- 5. Wallace R, Huggins R, King O, Gardiner R, Thomson B, Orr DN, et al. Total suspended solids, nutrient and pesticide loads (2014–2015) for rivers that discharge to the Great Barrier Reef–Great Barrier Reef Catchment Loads Monitoring Program. Department of Science, Information Technology and Innovation: Brisbane, Australia. 2016.
- 6. Pellerin BA, Stauffer BA, Young DA, Sullivan DJ, Bricker SB, Walbridge MR, et al. Emerging tools for continuous nutrient monitoring networks: Sensors advancing science and water resources protection. J Am Water Resour Assoc. 2016;52: 993–1008.
- 7. Horsburgh JS, Jones AS, Stevens DK, Tarboton DG, Mesner NO. A sensor network for high frequency estimation of water quality constituent fluxes using surrogates. Environ Model Soft. 2010;25: 1031–44.
- 8. Jones AS, Stevens DK, Horsburgh JS, Mesner NO. Surrogate measures for providing high frequency estimates of total suspended solids and total phosphorus concentrations. J Am Water Resour Assoc. 2011;47: 239–53.
- 9. Godsey SE, Kirchner JW, Clow DW. Concentration-discharge relationships reflect chemostatic characteristics of US catchments. Hydrol Processes. 2009;23: 1844–1864.
- 10. Kim J, Furumai H. Improved calibration of a rainfall‐pollutant‐runoff model using turbidity and electrical conductivity as surrogate parameters for total nitrogen. Water Environ J. 2013;27: 79–85.
- 11. Ruzycki EM, Axler RP, Host GE, Henneck JR, Will NR. Estimating sediment and nutrient loads in four western lake superior streams. J Am Water Resour Assoc. 2014;50: 1138–54.
- 12. Slaets JI, Schmitter P, Hilger T, Lamers M, Piepho HP, Vien TD, et al. A turbidity-based method to continuously monitor sediment, carbon and nitrogen flows in mountainous watersheds. J Hydrol. 2014;513: 45–57.
- 13. Diamantopoulou MJ, Papamichail DM, Antonopoulos VZ. The use of a neural network technique for the prediction of water quality parameters. Oper Res. 2005;5: 115–25.
- 14. West AO, Scott JT. Black disk visibility, turbidity, and total suspended solids in rivers: A comparative evaluation. Limnology and Oceanography: Methods. 2016;14: 658–67.
- 15. Skarbøvik E, Roseth R. Use of sensor data for turbidity, pH and conductivity as an alternative to conventional water quality monitoring in four Norwegian case studies. Acta Agr Scan B-S P. 2015;65: 63–73.
- 16. Stutter M, Dawson JJ, Glendell M, Napier F, Potts JM, Sample J, et al. Evaluating the use of in-situ turbidity measurements to quantify fluvial sediment and phosphorus concentrations and fluxes in agricultural streams. Sci Tot Environ. 2017;607: 391–402.
- 17. Steffy LY, Shank MK. Considerations for using turbidity as a surrogate for suspended sediment in small, ungaged streams: Time‐series selection, streamflow estimation, and regional transferability. River Research and Applications. 2018;34: 1304–14.
- 18. Viviano G, Salerno F, Manfredi EC, Polesello S, Valsecchi S, Tartari G. Surrogate measures for providing high frequency estimates of total phosphorus concentrations in urban watersheds. Water Res. 2014;64: 265–77.
- 19. Lessels JS, Bishop TF. Estimating water quality using linear mixed models with stream discharge and turbidity. J Hydrol. 2013;498: 13–22.
- 20. Brodie J. Mackay Whitsunday Region: State of the Waterways. ACTFR Technical Report No. 02/03. Australian Centre for Tropical Freshwater Research, James Cook University: Townsville, Australia. 2004.
- 21. Wang YG, Kuhnert P, Henderson B. Load estimation with uncertainties from opportunistic sampling data–a semiparametric approach. J Hydrol. 2011;396: 148–57.
- 22. McInnes K, Abbs D, Bhend J, Chiew F, Church J, Ekström M, et al. Wet Tropics Cluster Report, Climate Change in Australia. Projections for Australia’s Natural Resource Management Regions. Canberra: CSIRO and Bureau of Meteorology. 2015.
- 23. Rayner TS, Pusey BJ, Pearson RG. Seasonal flooding, instream habitat structure and fish assemblages in the Mulgrave River, north-east Queensland: towards a new conceptual framework for understanding fish-habitat dynamics in small tropical rivers. Mar Freshw Res. 2008;59: 97–116.
- 24. Leigh C, Alsibai O, Hyndman RJ, Kandanaarachchi S, King OC, McGree JM, et al. A framework for automated anomaly detection in high frequency water-quality data from in situ sensors. Sci Tot Environ. 2019;664: 885–898.
- 25. Huggins R, Wallace R, Orr DN, Thomson B, Smith A, Taylor C, et al. Total suspended solids, nutrient and pesticide loads (2015–2016) for rivers that discharge to the Great Barrier Reef–Great Barrier Reef Catchment Loads Monitoring Program. Department of Environment and Science: Brisbane, Australia. 2017.
- 26. Carroll C, Waters D, Vardy S, Silburn DM, Attard S, Thorburn PJ, et al. A paddock to reef monitoring and modelling framework for the Great Barrier Reef: paddock and catchment component. Mar Pollut Bull. 2012;65: 136–49. pmid:22277580
- 27. Standards Australia. AS/NZS 5667.1:1998, Water quality: sampling—guidance on the design of sampling programs, sampling techniques and the preservation and handling of samples, Standards Australia: Homebush, Australia. 1998.
- 28. Standards Australia. AS/NZS 5667.10:1998, Water quality: sampling—guidance on sampling of waste waters, Standards Australia: Homebush, Australia. 1998.
- 29. DES (Department of Environment and Science). Monitoring and sampling manual: Environmental protection (water) policy. Department of Environment and Science: Brisbane, Australia. 2018.
- 30. APHA-AWWA-WEF (American Public Health Association, American Water Works Association and Water Environment Federation). Standard methods for the examination of water and wastewater. American Public Health Association (APHA): Washington, DC, USA. 2005.
- 31. Pinheiro J, Bates D. Mixed-effects models in S and S-PLUS. Springer Science & Business Media; 2004.
- 32. Isaak DJ, Wenger SJ, Peterson EE, Ver Hoef JM, Nagel DE, Luce CH, et al. The NorWeST summer stream temperature model and scenarios for the western US: A crowd‐sourced database and new geospatial tools foster a user community and predict broad climate warming of rivers and streams. Water Resour Res. 2017;53: 9181–205.
- 33. Ferguson RI. River loads underestimated by rating curves. Water Resour Res. 1986;22: 74–76.
- 34. Wetzel R. Limnology: lake and river ecosystems. Academic Press. 2001.
- 35. Boulton A, Brock M, Robson B, Ryder D, Chambers J, Davis J. Australian freshwater ecology: processes and management. John Wiley & Sons. 2014.
- 36. ANZECC/ARMCANZ. Australian and New Zealand Guidelines for Fresh and Marine Water Quality. Volume 1, Section 3.5—Sediment Quality Guidelines. Australian and New Zealand Environment and Conservation Council, and Agriculture and Resource Management Council of Australia and New Zealand: Canberra, Australia. 2000.
- 37. Akaike H. A new look at the statistical model identification. In Selected Papers of Hirotugu Akaike. 1974 (pp. 215–222). Springer, New York, NY.
- 38. Roberts DR, Bahn V, Ciuti S, Boyce MS, Elith J, Guillera‐Arroita G, et al. Cross‐validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography. 2017;40: 913–29.
- 39. Cheang WK, Reinsel GC. Bias reduction of autoregressive estimates in time series regression model through restricted maximum likelihood. J Am Stat Assoc. 2000;95: 1173–84.
- 40. Duncan JM, Welty C, Kemper JT, Groffman PM, Band LE. Dynamics of nitrate concentration‐discharge patterns in an urban watershed. Water Resour Res. 2017;53: 7349–65.
- 41. Hyndman RJ, Athanasopoulos G. Forecasting: principles and practice. OTexts; 2018.
- 42. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2018. URL https://www.R-project.org/.
- 43. Pinheiro J, Bates D, DebRoy S, Sarkar D, R Core Team. nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1–131. 2017. URL: https://CRAN.R-project.org/package=nlme.
- 44. Hunter HM, Walton RS. Land-use effects on fluxes of suspended sediment, nitrogen and phosphorus from a river catchment of the Great Barrier Reef, Australia. J Hydrol. 2008;356: 131–46.
- 45. Bainbridge ZT, Brodie JE, Faithful JW, Sydes DA, Lewis SE. Identifying the land-based sources of suspended sediments, nutrients and pesticides discharged to the Great Barrier Reef from the Tully–Murray Basin, Queensland, Australia. Mar Freshw Res. 2009;60: 1081–90.
- 46. Franklin HM, Garzon-Garcia A, Burton J, Moody PW, De Hayr RW, Burford MA. A novel bioassay to assess phytoplankton responses to soil-derived particulate nutrients. Sci Tot Environ. 2018;636: 1470–1479.
- 47. Peterson EE, Sheldon F, Darnell R, Bunn SE, Harch BD. A comparison of spatially explicit landscape representation methods and their relationship to stream condition. Freshw Biol. 2011;56: 590–610.
- 48. Gladish DW, Kuhnert PM, Pagendam DE, Wikle CK, Bartley R, Searle RD, et al. Spatio-temporal assimilation of modelled catchment loads with monitoring data in the Great Barrier Reef. Annals Appl Stat. 2016;10: 1590–618.
- 49. Peterson EE, Urquhart NS. Predicting water quality impaired stream segments using landscape-scale data and a regional geostatistical model: a case study in Maryland. Environ Monit Assess. 2006;121: 615–38. pmid:16967209
- 50. Kuhnert PM, Pagendam DE, Bartley R, Gladish DW, Lewis SE, Bainbridge ZT. Making management decisions in the face of uncertainty: a case study using the Burdekin catchment in the Great Barrier Reef. Mar Freshw Res. 2018;69: 1187–200.
- 51. Yoccoz NG, Nichols JD, Boulinier T. Monitoring of biological diversity in space and time. Trends in Ecology & Evolution. 2001;16: 446–53.
- 52. Peterson EE, Ver Hoef JM. A mixed‐model moving‐average approach to geostatistical modeling in stream networks. Ecology. 2010;91: 644–651. pmid:20426324
- 53. Cressie N, Wikle CK. Statistics for spatio-temporal data. John Wiley & Sons. 2011.
- 54. Peterson EE, Ver Hoef JM, Isaak DJ, Falke JA, Fortin MJ, Jordan CE, et al. Modelling dendritic ecological networks in space: an integrated network perspective. Ecology Letters. 2013;16: 707–19. pmid:23458322