Hybrid forward-selection method-based water-quality estimation via combining Landsat TM, ETM+, and OLI/TIRS images and ancillary environmental data

A simple approach to enable water-management agencies employing free data to create a single set of water quality predictive equations with satisfactory accuracy is proposed. Multiple regression-derived equations based on surface reflectance, band ratios, and environmental factors as predictor variables for concentrations of Total Suspended Solids (TSS) and Total Nitrogen (TN) were derived using a hybrid forward-selection method that considers both p-value and Variance Inflation Factor (VIF) in the forward-selection process. Landsat TM, ETM+, and OLI/TIRS images were jointly utilized with environmental factors, such as wind speed and water surface temperature, to derive the single set of equations. Through splitting data into calibration and validation groups, the coefficients of determination are 0.73 for TSS calibration and 0.70 for TSS validation, respectively. The coefficients of determination for TN calibration and validation are 0.64 and 0.37, respectively. Among all chosen predictor variables, ratio of reflectance of visible red (Band 3 for Landsat TM and ETM+, or Band 4 for Landsat OLI/TIRS) to visible blue (Band 1 for Landsat TM and ETM+, or Band 2 for Landsat OLI/TIRS) has a strong influence on the predictive power for TSS retrieval. Environmental factors including wind speed, remote sensing-derived water surface temperature, and time difference (in days) between the image acquisition and water sampling were found to be important in water-quality quantity estimation. The hybrid forward-selection method consistently yielded higher validation accuracy than that of the conventional forward-selection approach.


Introduction
Continuous monitoring of water quality is essential for the health and welfare of the people and ecosystems reliant upon them. Urbanization, agriculture, and other anthropogenic factors can alter water quality [1], and waiting to remediate until a change is clearly visible can be much more costly than early prevention. Despite this, the cost of adequate temporal and spatial physical measurements can potentially be prohibitive [2]. For example, the United States Geological Survey (USGS) regularly monitors water quality in Lady Bird Lake in Austin, Texas, USA; however, the frequency is only approximately twice per year at a single point near the outlet over the past decade [3]. Additionally, in situ measurements from year to year do not occur in the same months. As a result, it is difficult to distinguish whether a change in the water quality measured at a point is truly a long-term change or the result of a seasonal difference or recent event (e.g., a large precipitation event) [4]. Additionally, it is impossible to evaluate the spatial variation in water quality from single-point measurements.
In recent decades, remote sensing has provided an alternative method for monitoring water quality in a spatially synoptic manner at a lower cost compared with extensive in situ measurement. Each water-column constituent exhibits a specific spectral response that can be observed by satellite-and aircraft-mounted remote sensors [5]. Suspended sediment usually exhibits strong backscattering of incident light [5], where the actual color depends on the terrestrial origin [6]. CDOM generally exhibits an exponential reduction in absorption with increasing wavelength; CDOM spectral absorption curves typically entail strong absorption features in the ultraviolet to blue wavelength region (280-400 nm), with dramatic decreases to near zero in the red and near infrared portions of the spectrum [7]. Chlorophyll a (e.g., in algae-laden waters) entails strong absorption in the blue and red portions of the spectrum, as well as a reflectance maximum around 550 nm (i.e., a green peak) [8]".
For a particular wavelength, λ, the spectral radiance from the water observed vertically, known as the upwelling radiance, L u , is given by where L w is the radiance reflected/backscattered by the water column, in-water constituents, and the bottom if the water column is optically shallow; L s is the skylight radiance; and O is the ratio of radiance directly reflected by the water surface to L s [9]. Note that the radiance observed by a satellite is composed of L u , plus atmospheric interference; therefore, it requires atmospheric correction (discussed below). L w , L s , and O are influenced by a variety of factors. If the water column is sufficiently deep, bottom reflectance may be ignored, and L w can be assumed to be a measure of the effects of water-column constituents alone. Atmospheric conditions (e.g., clear, cloudy, overcast) affect both O and L s , whereas O can be further affected by wind speed in the form of surface ripples (e.g., temporary sun glint) [9]. Wind speed has also been found to have some influence on water clarity [10]. Because of their higher capability to penetrate the water column, visible bands have conventionally been used to estimate water quality [5]. In addition, infrared bands have also shown significance in determining water-quality quantities in some studies [11,12]. However, only near infrared wavelengths were used in these studies. Thermal infrared bands have not extensively been used in water-quality estimation.
Site-specific predictive models can be created to relate a number of band radiance measurements or derived reflectance values [5] to the water-quality quantity of interest by fitting the model to in situ water-quality measurements. Multiple regression analysis and artificial neural networks (ANNs) constitute two methods that are frequently used to generate such predictive models [5,12,13,14].
In academia, satellite remote-sensing images have been increasingly available for waterquality determination. However, the popularity of this approach has not been extended to decision making by management agencies in general [15]. According to Schaeffer et al. [15], the reasons for this phenomenon include cost, product accuracy, data continuity, and programmatic support.
Cost is always a major constraint, as many water-management agencies have limited budgets [15]. Even though there are many free remote-sensing data sets available, such as the multispectral satellite images available from the Landsat program (e.g., Landsat Thematic Mapper (TM), Enhanced Thematic Mapper Plus (ETM+), and Operational Land Imager (OLI)/ Thermal Infrared Sensor (TIRS)) [16], MODIS [17], SeaWiFS [18], etc., terrestrial pond/lake applications are predominately limited to moderate spatial-resolution images from the Landsat program due to its relatively finer spatial resolution. Another aspect of the cost constraint is the cost to collect field water-sampling data, as the creation of empirical predictive models necessitates in situ water-quality data. Sometimes, due to cost, logistical, and other constraints, a water-management agency can only resort to free water-quality data, such as those made available by the USGS. The downside, as noted above, is that spatio-temporal sampling density/data availability may be low. This drawback seriously limits the ability of a water-management agency to utilize free Landsat program data, for example, as the basis of a water-quality monitoring program since the satellite images and corresponding in situ measurements must be acquired in a temporally proximal manner [19]. Furthermore, water-quality variables of interest may not even be measured, given the complexity or cost of the measuring techniques needed, making regular/automatic sampling difficult.
As a result of these issues, water-management agencies that resort to using only free data resources often have access to a limited number of useable satellite images for water-quality monitoring. Such a scenario often leads to the use of a single predictive model to determine water-quality information from satellite images. Nevertheless, many studies divide their analyses by season [20,21] due to systemic seasonal differences in factors such as concentrations of color-producing substances (including phytoplankton), atmospheric disturbances [21], and solar zenith angle [22]. Some studies have shown that the predictive power of equations created without distinguishing by season is lower than it would otherwise be [23,24].
Since the derived predictive equation is seasonally affected by the environment, a few studies have incorporated the influencing factors into predictive equation generation. One example is with the estimation of chlorophyll-a concentration. It is known that phytoplankton growth is statistically significantly dependent on water temperature [25,26]. Incorporating water temperature (derived from the satellite remote-sensor thermal band) in development of predictive equations has proven to be helpful in determining chlorophyll-a concentration [27]. However, this approach has not been investigated extensively. In this study, we consider additional environmental factors based on energy fluxes between a waterbody and the atmosphere. We posit that including these environmental factors in predictive equations not only increases prediction accuracy, but also facilitates the usage of a single set of predictive equations throughout different seasons. The direct benefit is that one can pool all observation data in creating equations, thus resulting in higher predictive power.
Programmatic support is also important to water-management agencies, according to Schaeffer et al. [15]. In most cases, local universities should be sufficient in providing support to water-management agencies. However, we posit that the methodology adopted for generating predictive models should entail model construction in a step-wise manner, such that most people with basic training could implement itwithout much difficulty. For this reason, in choosing a methodology to be implemented by water-management agencies, simple and well-understood methods such as multiple regression should be weighed against more complex methods, such as ANNs.
Product accuracy is another major concern expressed by the water-management agencies [15]. Even though water-management agencies could utilize predictive models from peerreviewed journals, such models may not yield high-accuracy estimates in a given application. Multiple regression analysis has been employed in many studies for its ease of application. However, for applications using this method, overfitting from multicollinearity can be a serious concern. Multicollinearity means that some of the explanatory variables in the multiple regression model are dependent on one another. The direct result of multicollinearity is that the standard error of coefficients of explanatory variables is inflated, which means that coefficients of the derived model are not reliable. Unfortunately, many past studies neither discuss the issue of multicollinearity, nor provide results of validation of the derived regression models [4,5,11,19,28,29,30,31]. A common way to identify multicollinearity of a model is through the use of indicators such as Akaike's Information Criteria [32], Mallow's Cp [33], PRESS [34], etc. However, such indicators apply to the whole model so all possible subsets of explanatory variables must be examined, and this approach becomes difficult when the number of variables increases [35], even with modern computing power.
Other popular methods to identify multicollinearity include the deployment of principal component analysis (PCA) or structural equation modeling (SEM) [35]. PCA creates orthogonal principal components, which are linear combination of variables, and a regression model can be created based on the orthogonal components in order to eliminate multicollinearity completely. However, some studies show that this methodology can result in a loss of explanatory power. Additionally, the main limitation of the PCA approach is that physical interpretations of the principal components are required. On the other hand, SEM accepts the existence of collinearity among explanatory variables and hypothesizes that a model exists among variables. Then all possible combinations of causal links among variables are tested against the hypothesized model. Since SEM is not an exploratory technique, SEM is prone to inferential errors made during development and selection of the hypothetical models [35].
We use the variation inflation factor (VIF) in step-wise variable selection, which is based on p-value, to minimize multicollinearity. Unlike other indicators described above, VIF is calculated for each predictor variable. VIF has been used in the field of remote sensing on a limited basis to check multicollinearity of results [36,37]. Dubovyk et al. [38] used VIF to choose variables to enter into a logistic regression model. VIF has not previously been incorporated along with established variable-selection methods (e.g., forward step-wise selection) to derive predictive equations for water-quality quantities. Details regarding VIF computation and the methodology to include VIF in equation derivation is discussed below in the Methodology section.
Although the Landsat program entails a few limitations, such as the inflexible satellite overpass schedule and the relatively lower sensitivity of sensors prior to Landsat 8, the Landsat program constitutes a truly ideal free data-source candidate for water-management agencies, given the characteristics of the various Landsat sensors, as well its long-term data continuity. The Landsat program has maintained the longest uninterrupted satellite observation record of Earth from its beginning in 1970s, employing several sensors over time including MSS, TM, ETM+, and OLI/TIRS (Landsat 8), with improving sensor sensitivities. Only a few water-quality studies have taken advantage of combining TM, ETM+, and OLI/TIRS datasets [39,40,41] even though these sensors have been shown to be compatible, as shown in Table 1 [41,42,43]. Note Table 1 shows only comparable bands among Landsat TM, ETM+, and OLI/TIRS sensors.
Because band numbering is different in OLI/TIRS, in this study, band numbers will be based on TM/ETM+. For example, if Band 3 is noted, it means Band 3 for TM and ETM+, but Band 4 for OLI/TIRS.
Based on the gaps in the research literature illustrated above, the objectives of this study were: 1. Incorporate environmental factors (such as temperature, wind speed, etc.) into a single set of predictive equations for remote-sensing water-quality measure estimation; and 2. Increase model predictive power for a limnological water-quality quantity-estimation application by considering the effect of multicollinearity in established model-creation methodologies such as forward step-wise selection.
The goal of this study is to address all four concerns of utilizing satellite data in decision making by water-management agencies-i.e., cost, product accuracy, data continuity, and programmatic support. This study provides water-management agencies with a simple, easy-tofollow methodology for utilizing free observation data (from Landsat program, USGS, etc.) in order to address cost and programmatic-support issues for water-quality monitoring. The Landsat program guarantees long-term data continuity. The proposed methodology provides a single set of predictive equations; accuracy is maintained because all available data are consolidated for the creation of a single model. Also, consideration of multicollinearity increases the likelihood for acceptable estimation accuracy of the derived model in future water-quality quantity retrieval applications.

Study area
The population of City of Austin, Texas, USA has increased dramatically in recent decades, from 346,000 in 1980 to 968,000 in 2018 [44]. With significant population growth comes an increase in impervious area, higher runoff and lower water quality in local water bodies. Lady Bird Lake (formerly Town Lake), situated near the city center, provides an opportunity to remotely monitor water quality in an urban watershed (Fig 1). The lake, formed by damming the Colorado River, is maintained at an approximately constant level by the pass-through Longhorn Dam [45]. The surface area is~1.74 square kilometers with a capacity of 9,051,000 cubic meters. The mean depth is 6 meters, with a maximum depth over 11.7 meters [46].
The USGS maintains a number of water-quality sampling stations on Lady Bird Lake, but only four of them, EC, DC, CC and AC (Fig 1), monitor the water-quality constituents of interest in this study within the time frame of available satellite images (i.e., 1983-2015) [3]. Table 2 provides basic information for these four sampling stations, including summary statistics for these water-quality quantities of interest-total suspended solids (TSS) and total nitrogen Hybrid forward-selection method-based water-quality estimation with ancillary environmental data (TN)-derived from water-quality samples collected at a depth of 1 m. Secchi disc transparency, a pseudo-measure of turbidity, was measured in four locations when the samples of Table 2 were taken (Table 3). Secchi disc depths were much shallower than the average bottom depth of the lake (6 m); thus, bottom reflection is not observable from above the air-water interface for these cases. Therefore, contribution of bottom reflectance to the water-leaving radiance (Eq 1) can be ignored.  Hybrid forward-selection method-based water-quality estimation with ancillary environmental data

Selection of satellite images
Selection of Landsat TM, ETM+, and OLI + TIRS images [47] was based on several criteria. Images selected were cloud-free and were acquired within seven days of in situ water-quality measurements in Lady Bird Lake [12,20]. In order to minimize the effects of spatio-temporally-close rainfall events, only images that occurred when daily precipitation depths observed between the dates of the selected images and their associated water-sampling dates were less than 1.25 cm (0.5 inch) were selected (Table 4). This threshold rainfall depth is chosen based on the initial abstraction rainfall depth for a watershed with a runoff curve number of 80, since most of the urbanized area around Lady Bird Lake is residential [48]. Residential districts with small lot sizes (1/4 to 1/8 acre) have a curve number ranging from 61 to 92, depending on the soil hydrologic group [49]. Rainfall depth below this threshold is considered to generate insignificant runoff, and thus should have no marked effect on water quality in the lake.

Atmospheric correction 2.3.1. Image Pre-processing.
Surface reflectance values corrected for path radiance were derived using Fast Line-of-sight Atmospheric Analysis of Spectral Hypercube (FLAASH 1 ) radiative transfer model [50,51]. Remote-sensing reflectance from spectrally dark targets such as Lady Bird Lake is usually much lower than that from the surrounding urban areas [52]. With FLAASH, significant errors can occur when strong albedo contrasts exist among the materials in the scene [51]. To minimize this potential problem, a land mask was created and applied in order to exclude all surrounding land regions [53], leaving just the aquatic areas (i.e., Lady Bird Lake) for subsequent atmospheric-correction processing.

Determination of FLAASH parameter values.
Two of the parameters required by FLAASH are visibility and choice of atmospheric model. Visibility obtained from historical airport records [54] likely caused FLAASH to over-compensate in its correction of atmospheric effects and yield negative reflectance values probably because the highest reported visibility is limited at 6 miles (9.6 km) [55] and visibility higher than that is not discernable from airport records. Therefore, the 2-band (K-T) aerosol retrieval method [51] with "urban" setting was used to estimate visibility. Ideally, selection of an atmospheric model is based on one of the following options, presented in order from most preferred to least preferred: known standard column water vapor amount, expected surface air temperature, or tabulated seasonal-latitude combinations [51]. Although there are atmospheric water-content products available [56], they do not cover all dates of interest in this research. Surface temperatures have been continuously recorded and archived by Camp Mabry Austin City Airport and Austin Bergstrom International Airport every hour over the past 30 years [54]. Therefore, atmospheric models were selected based on the surface air temperature at the time when each satellite image was acquired ( Table 5). The initially-selected December 20, 2001 image was excluded from subsequent processing because it yielded negative reflectance values after FLAASH atmospheric correction.

Atmospheric correction for thermal bands
FLAASH should not be applied to thermal bands [51]; therefore, another atmospheric-correction method was applied to thermal bands. In particular, the single-band atmospheric-correction method described by Barsi et al. [57] was used. The methodology calculates atmospheric transmission and path radiance using MODTRAN [51], based on the atmospheric profiles generated by National Centers for Environmental Prediction (NCEP). Eq 2 provides the relationship between top-of-atmosphere radiance (L TOA ), the target radiance of kinetic temperature T (L T ), the path (upwelling) radiance (L a ), and the sky (downwelling) radiance (L d ): In Eq 2, atmospheric transmission τ, path radiance L a , and sky radiance L d were obtained from the on-line calculator based on the atmospheric correction method of Barsi et al. [57]. Since water is a near-perfect blackbody, emissivity (ε) was set as 1 in this study according to Haydon [58]. Emissivity and transmission are unitless, whereas radiance values are in units of W/m 2 ÁsrÁμm.
The atmospheric profiles are only available after January 2000. For satellite images acquired prior to that, atmospheric profiles from "surrogate dates" in 2000 were used in this study. The surrogate date has nearly identical daily precipitation, temperature, and wind speed as the satellite image date. By choosing a surrogate date in such a manner, the atmospheric condition of the actual satellite image date and the surrogate date are expected to be similar. If more than two surrogate dates were found based on the above criteria for one satellite image, the one that is temporally closest to the date in the year in which a given the satellite image was acquired was chosen. Table 6 provides the list of the satellite image dates, the corresponding surrogate dates, and daily meteorological parameters for both of them.
2.3.4. Determining surface temperature from landsat thermal bands. Target temperature (i.e., water surface temperature) was derived after atmospheric correction according to equations provided in the Landsat Data User Manual [43]. For Landsat ETM+, the low-gain channel was used because it has a wider dynamic range and is not easily saturated [59]. For Landsat TIRS, only band 10 was used because data from band 11 have been contaminated by a stray-light effect, and a remedy has not yet been found [60]. Bands 10 and 11 here are band numbering from Landsat TIRS.
2.3.5. Post-processing for atmospherically-corrected surface reflectance. Surface reflectance values at the water-quality stations were extracted from the FLAASH-corrected satellite images. Pixels located at the exact coordinates of the respective water-quality sampling stations are not necessarily the ideal pixels for which reflectance values should be extracted for analysis. The USGS water-quality stations are all positioned very close to the shore or land-related objects (such as bridges); thus, the pixel located at the exact coordinates of a given water-quality sampling station may contain land and/or land-related objects. Hybrid forward-selection method-based water-quality estimation with ancillary environmental data To minimize potential deleterious effects of such mixed pixels, the search range was expanded to 90 m (i.e., a search neighborhood comprised of 3 × 3 image pixels, centered on the pixel located at the station coordinates). The pixel within this zone with the lowest value in band 5 was chosen as the representative pixel, as it is the pixel to most likely contain only water [61]. If two pixels had the same band 5 values, the pixel closest to the coordinates of water-quality sampling location was selected.

Multiple regression analysis
Multiple regression equations were derived to predict constituent concentrations (TSS and TN, i.e. the dependent variables) from the predictor variables, such as band reflectance. The procedure for selection of predictor variables is delineated below.
The spectral bands and associated band ratios were all chosen as candidates for independent variables. Band ratios were included as independent variables in the regression analysis [12] because they are less apt to be influenced by lighting conditions [62].
Radiance data from the thermal bands (band 6 of Landsat TM and ETM+, and band 10 of Landsat TIRS) were converted to water surface temperature. As discussed earlier, water temperature has been found to be related to phytoplankton concentration [23,24], and thus, related to water quality [63]. However, in this study, most of the satellite image dates differ by several days compared with the closest corresponding actual water-quality sampling date; thus, the water surface temperature derived from the satellite images does not represent the actual water temperature at the time of water sampling.
Eq 3 considers the net energy fluxes between a waterbody and the atmosphere [64]: where NET is the net energy flux, SWR net indicates the net short-wave radiation energy flux (Eq 4), LWR net indicates the net long-wave radiation flux (Eqs 5 and 6), LHF is the latent heat flux (Eq 7), and SHF is the sensible heat flux (Eq 8). These terms are calculated by the following equations [64]: LWR net % εsT s 4 0:39 À 0:05e a C % 1:61 1 À SWR down SWR cs þ 0:0019n ð6Þ where a is the surface albedo (usually very low for water so SWR net % SWR down ), ε is the surface emissivity, σ is the Stefan-Bolzman constant, T s is the water surface temperature, T a is the air temperature, e a is the surface vapor pressure, C is the cloud cover index (Eq 6), SWR cs is the clear-sky short wave radiation, n is the noon solar altitude, ρ is the density of air, L e is the latent heat of evaporation, C e is the turbulent exchange coefficient for latent heat, U is the wind speed, Q s and Q a are saturation specific humidity at the surface and at near-surface atmosphere, respectively, and C h is the turbulent exchange coefficient for sensible heat.
Some of the variables in Eqs 4 to 8 are known or can be reasonably assumed as constants (a, ε, σ, ρ, L e , C e , and C h [65]). The surface vapor pressure, e a , is dependent on water surface temperature [66]. Q s and Q a are both dependent on temperature as well [67]. The air temperature and noon solar altitude (T a , and n respectively) can be obtained from the historical observation record. The water surface temperature T s is obtained from thermal band data. That leaves only one variable unknown, which is the clear-sky short wave radiation SWR cs . Calculating SWR cs involves a complex procedure [68] so it is difficult to associate it with distinct environmental factor(s); thus, we did not consider it in evaluating heat flux in this study.
Assuming that the temperature change between the image date and the water-sampling date directly corresponds with the cumulative heat flux between the dates, the following variables are needed in order to account for the temperature change between the image-acquisition date and the water-sampling date [54]: 1. Time offset (in days) between the image date and the water-quality sampling date (positive offset means that the image date is later than the sampling date); 2. Water surface temperature (in K) derived from the thermal band; 3. Air temperature (in K): both instantaneous temperature at the time of satellite image acquisition, and daily mean air temperature between the image date and the water-quality sampling date are considered; 4. Wind speed (in m/s): both instantaneous wind speed at the time of satellite image acquisition and the daily mean wind speed between the image date and the water-quality sampling date are considered; and 5. Noon solar altitude (in degrees): the mean noon solar altitude between the image date and the water-quality sampling date.
Instantaneous temperature and wind speed were interpolated from the hourly historical data [54]. And further considering Eqs 4 to 8, the full list of variables considered in the multiple regression process is provided in Table 7. A look-up table between variable abbreviations and variable descriptions is provided as Table 8. As described above, in this study, the band number is based on band-numbering scheme for TM and ETM+.
Even though SWR cs and associated LWR net are not considered in selection of variables, LHF (latent heat flux) and SHF (sensible heat flux) already sum to 2/3 of the upwelling energy budget [69]. Further considering that a few environmental factors (e.g. T s − T a ) also play a role in long-wave radiation LWR net , the portion of the upwelling energy flux explained by the environmental factors should be higher than 2/3. Selection of predictor variables is based on a hybrid forward selection that considers the variation inflation factor (VIF). In conventional forward selection, variables are added to the regression one at a time, starting with no predictor variables being selected. The p-value threshold includes a predictor in the regression equation if its p-value is below a "probability to enter," and includes a predictor that will most improve the fit first (i.e., "forward"). A default value of 0.25 in JMP [70] was used for "probability to enter". Hybrid forward-selection method-based water-quality estimation with ancillary environmental data In addition to p-value, the variation inflation factor (VIF) was used to minimize multicollinearity of the model. Multicollinearity occurs when a predictor variable is a linear combination of other predictor variables in the model. The direct consequence of multicollinearity is that the error variance is inflated, which may result in low prediction power if the overfitted model is used with a new set of data. VIF is calculated as: where R 2 j is the multiple coefficient of determination between the j-th predictor variable of interest and the rest of the predictor variables. The rule of thumb to avoid serious multicollinearity is that all chosen predictor variables should have VIF less than 10 [71]. Unlike other criteria such as Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), and Mallow's Cp, VIF is generated for each predictor variable. Also, VIF has a suggested absolute criterion, whereas other criteria (AIC, BIC, Cp, etc.) provide only relative comparison between models.
We use an approach that treats the p-value and VIF equally while adding variables in forward selection. When a variable is added according to the p-value (i.e. conventional rules of forward selection), VIFs of all included variables (including the one that is just added) are also checked. If VIFs are all below the threshold of 10 (or any user-defined value), the newly-added variable is allowed, and the next variable is chosen according to the rule of forward selection. However, if any VIF is found to be larger than the threshold for any of the variables, the most recently-added variable is deleted and the selection procedure stops. Coefficients of variables, p-values, and VIF are dynamically recalculated when any variable is deleted from the model. The procedure is illustrated in Fig 2. The derived multiple linear equations were then validated by splitting all data into the calibration and validation groups. 80% of the data were used for calibration and the remaining 20% were used for validation because the minimum number of observation samples used in calibration should be approximately twenty [5]. The calibration and validation processes were repeated fifteen times for each of the water-quality constituent using randomly selected calibration and validation data groups. In each of the fifteen runs, the same calibration and validation data groups were used by both the hybrid and conventional forward-selection processes, so their performance can be correctly compared.

Results
The statistics of coefficients of determination from all fifteen calibration and validation runs were provided in Tables 9 and 10. The results showed good prediction accuracy for future TSS data, but less than satisfactory validation accuracy for TN [72]. Compared to conventional Hybrid forward-selection method-based water-quality estimation with ancillary environmental data forward selection, validation runs have higher accuracy for both TSS and TN with the utilization of the hybrid forward-selection process. More discussions based on Tables 9 and 10 will be provided later in this paper.
After showing the proposed procedure can provide adequate calibration accuracy and improved validation accuracy compared to the conventional approach, a set of "best" predictive equations using all available data was created and provided in Table 11 for use in the subsequent discussions and field applications. The results in Table 11 include the predictor variables, importance of the predictor variable, associated regression coefficients and standard error, 95% confidence intervals for the regression coefficients, p-values, and VIF values for each of the response variables (TSS and TN). The importance values ("Imp. of Var.") are calculated by dividing the change in R 2 (coefficient of determination) when the variable of interest  Hybrid forward-selection method-based water-quality estimation with ancillary environmental data is dropped from the model by the overall R 2 when the variable of interest is included [73]. The sum of importance values of all variables does not equal to 1 since the importance is relative. The resulting best multiple regression-based models are provided in Eqs 10 and 11: TN ¼ ð4:357 À 0:0533 Â W mean À 0:0124 Â T s þ 4:497 Â B1 À 0:0493 Â B4 B1 À 0:0126 Â D off Plots of the observed versus predicted concentrations for the best predictive equations (based on all available data) of TSS and TN calculated from Eqs 10 and 11 are plotted in Figs 3 and 4 respectively. The residual error (defined as the predicted value minus the observed value) and 1:1 line is added to both figures.

Discussion
The multiple linear equations derived from the regression analysis indicate that weatherrelated variables play an important role in predicting water-quality measures. In fact, many weather variables bear more importance than the multispectral variables do. The relative importance of each variable is provided in Table 11. If all the weather variables are removed from Table 11, the predictive variables related to Landsat bands alone provide only coefficients of determination, R 2 , of 0.53 and 0.26 for TSS and TN respectively.
Given the statistics of the fifteen calibration and validation runs, prediction of TSS concentration is accurate, with the hybrid process providing improved accuracy. Even though the calibration accuracy for TN is satisfactory, the validation accuracy is not, and the standard deviation of validation R 2 is relatively high. However, we observed that the derived equations can still predict the relative magnitude of TN concentrations. Therefore, we suggest using the TN equation to evaluate the trend of change in water quality only. The VIF-based hybrid forward-selection process showed better performance than that of the conventional forward selection process. In some of the runs, the hybrid process and the conventional process arrived at the same equations, but the hybrid process successfully prevented overfitting in other runs, thus resulting in higher validation accuracy. Due to overfitting, calibration accuracy from the conventional process is higher than that of the hybrid process, with the cost of lower validation accuracy.
Kloiber et al. [12] found that both B1 and the ratio B3/B1 can be used to predict the Secchi disk transparency, which is closely related to TSS. From Kloiber et al. [12], the regression model containing B3/B1 and B1 predicted Secchi disk transparency with R 2 of 0.75. We also found B3/B1 as the dominant important variable in determining TSS concentrations, but did not find B1 as one of the significant prediction variables. Kloiber et al. [12] accrued a slightly higher R 2 than our study possibly because Kloiber et al. limited their in situ data collection to ±1 day from the corresponding satellite image acquisitions. In the current study, the predictive equation that includes B3/B1 alone has a R 2 of 0.53 for TSS because our available data only allows in situ samples to be ±7 days from satellite image acquisitions. Considering weather variables successfully boosted R 2 to 0.68, such that it was comparable with that of Kloiber et al. [12] (i.e., 0.75). For TSS, we found the instantaneous wind speed, W, to be an important prediction variable. Since the instantaneous wind speed is chosen, instead of the daily mean wind speed between the image date and the water-quality sampling date (W mean ), it indicates that the instantaneous effect of wind (such as the surface ripple effect) is more important to TSS determination than the long-term heat-exchange effect. Even though the difference between the water surface temperature and the daily mean air temperature between the image date and the water-quality sampling date is selected as one of the prediction variables, it is of little importance in the model. It was chosen because the default forward-selection method has a lenient inclusion criterion (p = 0.25).
Dewidar and Khedr [11] determined that the band ratio B2/B1 is important in determining the TN concentration in brackish lagoons. However, the correlation between B2/B1 and TN was low in Dewindar and Khedr [11], with a correlation coefficient of 0.298. B2/B1 was also chosen by this study as one of the predictor variables, but B2/B1 still bears little predictive power as shown in Table 11. In contrast, the daily mean wind speed between the image date and the water-quality sampling date (W mean ) and water surface temperature (T s ) were determined to be the two most important predictor variables for TN prediction.
The high importance of water surface temperature T s fortifies the hypothesis that water temperature is related to the growth of microorganisms. The high importance of the daily mean wind speed between the image date and the water-quality sampling date (W mean ) and date difference (D off ) indicate that temperature change due to accumulated heat flux between Hybrid forward-selection method-based water-quality estimation with ancillary environmental data the image date and sampling date is important. Referring to Eqs 7 and 8, the mechanism involved should be the latent heat flux because latent heat flux (Eq 7) and sensible heat flux (Eq 8) are the only two components in the heat flux budget that involve wind speed. Latent heat flux is a main component of heat exchange between water and the atmosphere, and sensible heat plays a much lesser role [74]. Even though the circumstantial evidence based on Eqs 7 and 8 points to the conclusion noted above, this still needs to be validated by direct evidence from future field experiments.

Field application
To demonstrate the utility of water-quality monitoring by satellites via our proposed method, water-quality measures from Lady Bird Lake on May 14, 2014 were estimated using Eqs 10 and 11, respectively. This date was chosen because storms occurred on the day previous to and in the morning of the satellite overpass (prior to the overpass) with a cumulative rainfall depth of 27 mm, likely making it easier to discern the effect of urban stormwater runoff to the lake. Figs 5 and 6 give the respective predicted spatial distribution of TSS and TN concentrations.
The water quality in the northwestern part of the lake is generally better than that in the southeastern area, which is expected as a result of urban runoff. Lady Bird Lake has three major tributaries in the metropolitan Austin area: Barton Creek, Shoal Creek, and Waller Creek. The confluence points of the three streams are indicated in Figs 5 and 6. Barton Creek includes an extensive green belt around its riparian zone, and strict development regulations are in force because it is located within the Edwards Aquifer recharge zone [75]. As a result, there is no marked change in TSS and TN at the confluence point of Barton Creek, relative to proximal areas of the lake. However, the confluence points of Shoal Creek and Waller Creek show significant increase in TSS and TN. This illustrates the effect of conservation efforts spent on each watershed on water quality. The influence of Shoal Creek is more visible in Figs 5 and 6 than that of Waller Creek because Shoal Creek has a larger drainage area [76]. Such details in spatial distribution can only be achieved via satellite-derived water-quality predictions and can serve as the precursor examination for more detailed water-quality Hybrid forward-selection method-based water-quality estimation with ancillary environmental data examinations. This field application also showed that the equation to predict TN concentration can estimate the trend of change in TN concentration with sufficient confidence, even though the predicted absolute concentration values might not be accurate.

Conclusions
Multiple regression-derived equations using reflectance bands, band ratios, and environmental factors as predictor variables for concentrations of TSS and TN respectively, were derived using a hybrid forward-selection method that considers both VIF and p-value in the forwardselection process. Landsat TM, ETM+, and OLI/TIRS (Landsat 8) images were all used to derive the single set of equations. The coefficients of determination of the best-fitting resultant equations are 0.68 for TSS and 0.62 for TN based on all available observation data. Through repeated data splitting into the calibration and validation groups, the hybrid method delivered a calibration accuracy (in R 2 ) of 0.73 and 0.64 for TSS and TN, respectively, and validation accuracy of 0.70 and 0.37 for TSS and TN, respectively. The hybrid forward-selection process consistently showed better validation accuracy compared to that of the conventional forwardselection process. Validation results show good accuracy for TSS prediction. However, the mean and standard deviation for coefficients of determination of TN validations tends to be unsatisfactory. Therefore, the predictive equation for TN is recommended for trend evaluation only, as indicated by the field application.
Among all chosen predictor variables, B3/B1 has the strongest influence on the predictive power for TSS retrieval. The band ratio of B3/B1 was also selected by Kloiber et al. [12] in predicting Secchi disc transparency, indicating a correlation between Secchi disc transparency and TSS. Other reflectance bands and band ratios, such as B1, B2/B1, and B4/B1 are also influential in estimating TN concentrations, but they are not dominant factors.
Environmental factors, such as wind speed and water surface temperature, were crucial in determination of water quality in this study. Inclusion of environmental factors allows usage of a single set of predictive equations across the seasons, as such predictive equations are innately adapted to the environmental changes for different seasons. The predictive equation will also likely to be more accurate because the pooling of all observation data.
The instantaneous wind speed, W, bears considerable importance in TSS determination, which is explained by wind-generated surface ripple effects. Water surface temperature T s (derived from satellite remote-sensor thermal band image data) is important in determination of TN concentrations, as the growth of microorganisms in water is correlated with water nutrient concentrations.
The time offset between the satellite image-acquisition date and water-sampling date must be accounted for in water nutrient concentration (i.e., TN) retrieval. The heat flux budget between air and the water surface was considered, and components in the budget equations were included in the forward-selection procedure. In additional to the predictor variables identified above, the daily mean wind speed between the image-acquisition date and watersampling date (W mean ) was identified as the most important predictor variables for TN determinations. The time difference (in days) between the image-acquisition date and water-sampling date (D off ) was also chosen for TN determination. According to the heat flux budget equations, the inclusion of W mean , T s , and D off indicates the dominance of latent heat flux in the determination of TN.
The results showed that: 1. Environmental factors can constitute important ancillary variables in water quality estimation based on satellite remote-sensor images; 2. By including environmental factors, it is feasible to pool all observation data to create a single set of predictive equations, and use it to estimate water quality for all seasons; 3. A single set of predictive equations can be determined to retrieve year-round water-quality quantities (e.g., TSS and TN) with satisfactory accuracy from Landsat TM, ETM+, and OLI/TIRS imagery on the same lacustrine water body; 4. Population change does not drastically change the applicability (i.e. the relationship between spectral bands and water-quality constituents) of regression-derived equations for water quality prediction. The derived predictive equations are applicable for data across 30 + years (1983 to 2015) even though population in the metropolitan area almost tripled (from 374,000 in 1983 to 901,000 in 2015) over the same period of time [44]; 5. Including VIF as part of the forward-selection process comprises a simple yet reliable methodology for choosing predictor variables for TSS concentrations; 6. Prediction of water nutrient concentrations (e.g., TN) yields low accuracy using the methodology depicted in this study, but the predictive equations are still valuable in evaluating the trends of spatial and temporal changes of nutrient concentrations; and 7. The methodology depicted in this study, including the utilization of a hybrid forward-selection process and consideration of environmental factors, showed marked improvements compared to the conventional methods, and it is simple enough to be followed by government agencies by addressing the issues of cost, product accuracy, data continuity, and programmatic support.
In the future, the hybrid forward-selection method can be further refined to require a stricter criterion for the inclusion of predictor variables. The default p = 0.25 may have allowed inclusion of a few predictor variables that were not significant in the final selection of variables.
In addition, inclusion of ancillary environmental factors involving long-term averaging, such as average wind speed (W mean ), into the regression models demonstrated that it is possible to satisfactorily estimate water-quality quantities, even when a large temporal offset between satellite image-acquisition and in situ water sampling exists. Currently, the recommended longest temporal window between remote-sensor image-acquisition and water-sampling date is approximately seven days [20]. Since these environmental factors are part of the heat flux equations, including environmental factors in predictive equations means an active compensation in estimation error due to the temporal offset in collecting image and watersample data. This hypothesis needs further testing as part of future research efforts.