Skip to main content
Browse Subject Areas

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Quantifying unpredictability: A multiple-model approach based on satellite imagery data from Mediterranean ponds

  • Lluis Franch-Gras ,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Institut Cavanilles de Biodiversitat i Biologia Evolutiva, Universitat de València, Valencia, Spain

  • Eduardo Moisés García-Roger,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Software, Validation, Visualization, Writing – review & editing

    Affiliation Institut Cavanilles de Biodiversitat i Biologia Evolutiva, Universitat de València, Valencia, Spain

  • Belen Franch,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Software, Validation, Visualization, Writing – review & editing

    Affiliations NASA Goddard Space Flight Center, Greenbelt, MD, United States of America, Department of Geographical Sciences, University of Maryland, College Park, MD, United States of America

  • María José Carmona,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Project administration, Resources, Supervision, Validation, Visualization, Writing – review & editing

    Affiliation Institut Cavanilles de Biodiversitat i Biologia Evolutiva, Universitat de València, Valencia, Spain

  • Manuel Serra

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Project administration, Supervision, Validation, Visualization, Writing – review & editing

    Affiliation Institut Cavanilles de Biodiversitat i Biologia Evolutiva, Universitat de València, Valencia, Spain


Fluctuations in environmental parameters are increasingly being recognized as essential features of any habitat. The quantification of whether environmental fluctuations are prevalently predictable or unpredictable is remarkably relevant to understanding the evolutionary responses of organisms. However, when characterizing the relevant features of natural habitats, ecologists typically face two problems: (1) gathering long-term data and (2) handling the hard-won data. This paper takes advantage of the free access to long-term recordings of remote sensing data (27 years, Landsat TM/ETM+) to assess a set of environmental models for estimating environmental predictability. The case study included 20 Mediterranean saline ponds and lakes, and the focal variable was the water-surface area. This study first aimed to produce a method for accurately estimating the water-surface area from satellite images. Saline ponds can develop salt-crusted areas that make it difficult to distinguish between soil and water. This challenge was addressed using a novel pipeline that combines band ratio water indices and the short near-infrared band as a salt filter. The study then extracted the predictable and unpredictable components of variation in the water-surface area. Two different approaches, each showing variations in the parameters, were used to obtain the stochastic variation around a regular pattern with the objective of dissecting the effect of assumptions on predictability estimations. The first approach, which is based on Colwell’s predictability metrics, transforms the focal variable into a nominal one. The resulting discrete categories define the relevant variations in the water-surface area. In the second approach, we introduced General Additive Model (GAM) fitting as a new metric for quantifying predictability. Both approaches produced a wide range of predictability for the studied ponds. Some model assumptions–which are considered very different a priori–had minor effects, whereas others produced predictability estimations that showed some degree of divergence. We hypothesize that these diverging estimations of predictability reflect the effect of fluctuations on different types of organisms. The fluctuation analysis described in this manuscript is applicable to a wide variety of systems, including both aquatic and non-aquatic systems, and will be valuable for quantifying and characterizing predictability, which is essential within the expected global increase in the unpredictability of environmental fluctuations. We advocate that a priori information for organisms of interest should be used to select the most suitable metrics for estimating predictability, and we provide some guidelines for this approach.

2. Introduction

Fluctuations in environmental parameters and their potentially associated unpredictability are increasingly being recognized as essential features of any habitat [1] because they are expected to influence the performance of the inhabiting organisms and affect upper levels of ecological organization [2]. Because human activity often increases environmental fluctuations, their analysis provides an applied interest [35]. Indeed, ascertaining how organisms respond to environmental fluctuations is fundamental in biology [6,7].

From the point of view of an organism, predictability is related to the organism’s ability to anticipate and adjust to a future environmental condition and thus involves a time scale [8,9]. For instance, annual fluctuations in temperature might be experienced very differently by short-living invertebrates compared with long-living vertebrates [10]. Any habitat shows some constancy and some variations in its features, and the focal habitat feature can consistently be decomposed into three components: constancy, predictable–periodic–fluctuations, and unpredictable fluctuations. The relative importance of these components is expected to produce diverging adaptive responses in organisms [11]. Thus, to quantify whether environmental fluctuations are prevalently predictable or unpredictable is highly relevant for understanding evolutionary responses [2] and for testing ecological and evolutionary hypotheses.

The characterization of fluctuations is a complex problem that requires a robust methodology because it cannot be based on mean values but rather on variances. Moreover, this characterization needs predefined assumptions regarding the relevant time-scales that can be associated with predictability, which cannot always be defined in a straightforward manner. Furthermore, one of the most important constraints is the need of long time-series data. Thus, (1) a methodology for the acquisition of long-enough time series is required, and, (2) an appropriate metric then has to be assumed and assessed to characterize the focal habitats.

Remote sensing technology has been highly developed in the last century [12], but its use in several research areas has not reached yet its full potential, as is the case in ecology [13]. Several studies have shown that this technology can offer solutions to a wide variety of problems in nearly all fields of environmental research [1416]. Long time series are costly to obtain and are consequently scarce [17]. In this regard, the information gathered by several satellites in recent years can provide long time-series data, such as the Landsat satellite series, which ranges from 1972 to the present and has moderate spatial and intermediate temporal resolutions [18]. Since 2008, these scenes are freely available from the United States Geological Survey (USGS), which allows the scientific community to gather a great deal of information free of cost [19].

Several indices have been developed to estimate the degree of predictability from time-series data. Basically, these indices decompose the time series based on periodic and stochastic variation [20] and associate these with predictable and unpredictable fluctuations, respectively. For nominal data, such as the presence/absence of water, Colwell [21] proposed a predictability index based on information theory [22] that has been mainly used in streams and rivers [23]. In contrast, indices based on spectral analysis, such as the Fourier transform and asymmetric eigenvector maps (AEM) [24, 25], have been used for continuous data [26]. However, these continuous-metric methods are very sensitive to gaps in the time series; thus, Colwell’s method might be preferable despite the drawback of discretizing a quantitative variable.

In this study, a group of Mediterranean shallow ponds and lakes were considered the case study, and the water-surface area (hereafter A; [27]) was considered as focal habitat feature. A is considered an ecologically relevant factor in lentic water bodies [28] and is correlated with the average lake depth, and both of these measurements determine the habitat size for aquatic organisms. The habitat size affects the ‘colonization vs. extinction’ balance and the habitat heterogeneity [29]. Fluctuations in A have multiple effects on the organisms above and below the waterline, including both aquatic flora (e.g., macrophytes [30]) and fauna (e.g., fish [31] and zooplankton [32]). These fluctuations can strongly affect the habitat conditions through variations in the physicochemical parameters (e.g., temperature, light, nutrient or solute concentration [33,34]). The solute concentration is particularly relevant in saline ponds and lakes, which are a significant, geographically widespread part of the world’s inland aquatic ecosystems [35, 36]. However, despite their importance, remote sensing studies in saline water bodies are scarce and have mostly been based on large lakes [3739]. Overall, band ratios are typically used when detecting water bodies (e.g., [40, 41]), but are rarely used for saline ponds. In contrast, infrared bands have been previously used for the assessment of A in saline ponds, but these are not accompanied by band ratio indices [42, 43].

Water bodies in the Mediterranean region are a good case study because they are expected to cover wide ranges of predictabilty. Some of these water bodies are almost permanent, whereas others are characterized by strong seasonality and temporal unpredictability at several time scales [44]. Both confined and unconfined organisms in non-permanent ponds are expected to undergo adaptation and to be strongly reliant on patterns of pond inundation (i.e., A variation). Waterfowls exhibit migratory patterns and many short-lived animals undergo lifecycles at an annual or shorter time scale. Accordingly, ecological unpredictability in these systems can be conceived as departures from an average seasonal variation, i.e., the main source of environmental unpredictability is the inter-annual variation of the within-year variation.

In this study, we used time-series remote sensing data to assess a set of models for the estimation of environmental predictability. Our study is divided in two parts. The first part of this manuscript proposes a procedure to estimate A and assesses this procedure using a set of Mediterranean saline water bodies that can develop salt crusts by evaporation. In the second part, a set of metrics for estimating stochastic variations in A are elaborated and compared to evaluate their effects on the assessment of environmental predictability. Specifically, in the first part, A was quantified using 27 years of Landsat 5/7 scenes. A salt crust is formed in some of the evaluated ponds when water is evaporated, and this salt crust can make it difficult to distinguish between soil and water. This challenge was addressed through the development of a novel approach that combines the sequential use of band ratio water indices and the short near-infrared band as a salt filter. In this section of the manuscript, a hypothesis regarding the reliability of estimating A from satellite scenes underlies our work. The accuracy of these estimations was assessed through the inspection of aerial (non-satellite) images and qualitative field data. In the second part, the predictable and unpredictable components of the variation in A over the 27-year time series for each pond were extracted. Two different approaches, which showed differences in their parameters, were used to obtain the stochastic variation around a regular pattern with the objective of dissecting the effect of assumptions on predictability estimations. The first approach, which is based on Colwell’s predictability metrics, transforms the focal variable into a nominal one. The resulting discrete categories define which variations in the water-surface area are considered relevant. For the second approach, we developed a novel approach that parallels the method developed by Colwell but is based on regression models. The similarity and divergence of the different predictability indices were analysed to determine both their capability to embrace a wide predictability range and their sensitivity to the initial assumptions. When the indices produce diverging results, we propose that this divergence is related to the fact that the unpredictability of a fluctuation pattern varies depending on the organism of interest.

3. Materials and methods

3.1. Study area

The study region is located in the eastern region of the Iberian Peninsula (Fig 1), an endorheic area of approximately 800 km2 with more than 100 ponds, and 20 of these ponds were included in this study (Table 1). The ponds included are shallow (depth of approximately 1 m) and brackish (salinity: 6.5–40 g/L) and showed different mean areas (0.00013–1.19 km2). Precipitation is the main water inflow, but some ponds are also connected to groundwater [45].

Fig 1. Location of the study region (38°55.4’ to 38°41.803’N and 1°47.32’ to 1°24.26’W) and Landsat 5 scene showing the area where the ponds (highlighted in blue) included in this study are located.

Table 1. Studied ponds sorted by decreasing mean water-surface area ()a.

The climate in the study area is semiarid, with a mean average annual rainfall of 343 mm and a mean temperature of 14°C (local weather station). June to September is considered the dry period, when temperatures might exceed 40°C and rainfall is scarce (typically <8 mm in July). Most precipitation occurs as heavy rains in spring (April-May) and autumn (October-December) [45].

Episodic, in situ observations over the last decade have shown that some ponds can frequently dry out, develop a thick salt crust, and fill up again (Fig 2). In contrast, other ponds, were observed to be flooded for years, including through several dry seasons.

Fig 2. Aerial images of the ponds PET (above) and CAM (below), illustrating the variations in the water-surface area (green-black) and salt crust (white).

3.2 Satellite scene data

In this study, we used a total of 432 images over the path 199 and row 33 scene acquired by the Landsat Thematic Mapper (Landsat 5) and Enhanced Thematic Mapper Plus (Landsat 7) satellites. Among these images, 314 were provided by the European Space Agency (ESA) as the L1T product, which means that they are radiometrically and geometrically corrected [46]. The remaining 118 scenes were downloaded from the United States Geological Survey (USGS) Climate Data Record (CDR) product. This product consists of surface reflectance images, which were atmospherically corrected using the Landsat Ecosystem Disturbance Adaptive Processing System (LEDAPS) algorithm [47] and an accurate cloud mask developed in a previous study [48]. The mentioned algorithm has been implemented, tested, validated and widely distributed [49]. According to a previous study [50], the LEDAPS application to Landsat has demonstrated a performance comparable to that of the MODIS algorithm for aerosol retrieval over land [49,51] and for the surface reflectance product [52,53]. To maintain consistency between the two datasets (ESA and USGS), the ESA images were processed with the LEDAPS algorithm [54]. The resulting set of scenes cover the period 1984–2011, with a spatial resolution of 30 m and a revisit time of 16 days for each satellite [18].

3.3 Estimation of the water-surface area (A)

Water presence in a pixel was inferred from a two-condition assessment (2cA). The first condition of 2cA was addressed to differentiate areas potentially covered by water from soil. The Modified Normalized Difference Water Index (MNDWI [40]) was calculated using the reflectance from Landsat’s band 2 (Green, 0.52–0.60 μm) and Landsat’s band 5 (middle infrared band, MIR, 1.55–1.75 μm) as follows: (1)

As described previously [40], the pixels with positive MNDWI values were selected as potential water-covered areas.

The potentially water-covered pixels were evaluated in terms of a second condition. The salt crust left by water evaporation yielded false–positive pixels of water-covered areas. To exclude them, a second filter (salt filter) was included using a previously developed approach [42] based on the condition that band 4 (near-infrared) reflectances lower than 0.4 were classified as non-salt. However, although these authors applied the conditions to land pixels, we applied the threshold to the scenes after the first condition was assessed (Fig 3). To evaluate the adequacy of the salt filter, we compared the automatically processed results obtained after using only the first or after using both filters (2cA) through visual interpretations of aerial photographs (years 2006 and 2009; four bands; spatial resolution: 0.25–0.5 m) and qualitative field observations (presence/absence of water); S1 Table provides information for the two latter sources of validation. Satellite data analyses were performed using ENVI/IDL (Exelis Visual Information Solutions, Boulder, CO, USA).

Fig 3. Water presence in a pixel: A two-condition assessment (2cA).

The first condition differentiates potentially water-covered pixels from soil, and the second condition differentiates salt-covered from water-covered pixels.

3.4 Quantification of environmental predictability

For each of the ponds, the degree of predictability in the variation of A in the time series was estimated. Seven models with different assumptions were evaluated. Note that the term ‘model’ in this manuscript refers to a different approach to quantitatively implement a priori concepts to calculate the predictability indices. Model assumptions have implications on which variation of A is regarded as predictable. Our evaluation consisted of identifying similarities and divergences between model outputs, and determining whether the divergences are related to the capability of different organisms to predict environmental fluctuations. Five models were developed in the present study according to Colwell’s approach, and discrete categories for the range of A values in the time series were established. Two additional models, which constitute novel contributions of this study, followed a continuous approach, and therefore, raw observations of A were used. The different years in the time series were treated as replicates; thus, the seven models were applied to the within-year variation. The mean water surface area () for each pond was computed by averaging each month’s mean A (excluding cloud-covered observations). The hydroperiod was estimated as the annual average of the fraction of observations with A> 0 in each month (excluding cloud-covered observations).

3.4.1 Discrete models based on Colwell’s approach.

Colwell’s predictability (P) index [21,22] consists of the summation of two metrics: constancy (which measures the degree in which a pond remains in the same state) and contingency (which measures the repeatability of the variation pattern). Colwell’s approach applies to nominal variables (values = states), and therefore A (a continuous variable) was transformed into a nominal variable. The five discrete models tested are described in Table 2.

Table 2. Models for predictability estimation.

Statistic: parameter of the data distribution used to define the states; scaling: proportion between consecutive range windows; si: i-th state (discrete); raw data: water-surface area (A) after image processing.

The rationale for the number of states and thresholds was straightforward in some models, such as COL_wd, in which two states (presence/absence of water) were considered. The number of states in the remaining discrete models was limited to three to (1) avoid excessive proliferation of states and (2) accumulate sufficient data for each state. The thresholds between states in COL_ANa and COL_ANw were defined under the assumption that most organisms have a rather broad range of tolerance around the mean value of A, at which they are expected to be best adapted (being vulnerable only to extreme values); nonetheless, it also accounts for other organisms that are opportunistic for extremely high or low A values. In other words, they focus on separating extreme values from regular values. Consequently, in these models, the thresholds were ± 70% , which shows a fairly wide intermediate range. Finally, in COL_MAXlin and COL_MAXg, most organisms would have a narrow tolerance range, confined to one of the proposed partitions of the maximum value. In these models, the thresholds were linear and geometric divisions of the maximum value, respectively, based on the assumption that the geometric scaling of the variation of A around large values has a lower effect than the same variation around small values.

In Colwell’s approach, time is also required to be a nominal variable; thus, in our models, time steps were matched to calendar months. In the months with more than one observation of A, the A value was obtained by averaging A observations with the lowest cloud cover. For those observations where partial cloud cover of the pond was found, the observation was only considered if the assignation of a state was not affected by the cloud covered area of the pond. Otherwise, the observation was excluded.

3.4.2 Continuous models: GAM predictability.

This family of models is based on the dispersion of observations in the time series around a typical intra-annual curve of the normalized A (hereon A’, normalization: ). Because a continuous approach was used, any observation with cloud cover was excluded. The typical curve was fitted to A’ (dependent variable) in relation to the day of the year. To avoid assuming an a priori, general, very constrained shape for the curve (e.g., sine function), we performed General Additive Models (GAM) with cubic splines as the smoothing function for identifying trends in the time-series data [55] using the gam function (‘MGCV’ package, [56]) in R statistical software v.2.12.1 [57].

Predictability was estimated based on the departure of the observed values from the values fitted by the GAM regression. The determination coefficient of the regression model regards constancy as a lack of determination (prediction) from the independent variable (here, time) and was thus not used. Hence, an index of predictability (PGAM) was developed based on the dispersion of the data with respect to the fitted model: (2) where SDres is the standard deviation of the residuals of the fitted model. Our predictability index is inversely related to the coefficient of variation produced by the regression, which is defined as the mean value of the independent variable divided by SDres. Note that the normalization of the independent variable (A’) causes its mean values to approach 1, particularly when the values with A = 0 were not excluded.

3.4.3 Relationships among predictability estimations.

Pearson’s correlation coefficients were computed to evaluate the relationships among the predictability estimates from each model using (1) the 20 studied ponds and (2) 18 ponds (excluding the two most ephemeral ponds; more than 90% of time series A = 0; HTU and HYE) because these were expected to pose problems in regards to continuous metrics of predictability (see [20]). To facilitate interpretation of the results, a standard Unweighted Pair Group Method with Arithmetic Mean (UPGMA) cluster analysis was performed with the ‘Pvclust’ package in R [58] using a dissimilarity matrix based on the chord distance () of the predictability values of the 20 ponds, where r is Pearson’s correlation coefficient [59], and the data were resampled 10,000 times.

4. Results

4.1 Estimation of water-surface area (A): Assessment

To assess the accuracy of the A estimations, the estimations were compared with (1) inspections of aerial images, and (2) qualitative field data (presence/absence of water in the pond). Additionally, A values that were visually estimated from raw scenes were compared to A estimations (i.e., produced by automatic processing after applying the 2cA), and both sets of estimations yielded consistent results. As a quantitative assessment, a total of 33 estimations of A and estimations of water-covered pixels based on aerial scenes taken close in time (maximum time for matching values corresponding to the two estimations was 18 days) were compared. Both estimations were found to be strongly correlated (R2 = 0.98, P-value<0.001; right panel in S1 Fig; S1 Table), and no statistically significant differences were found between them (paired t-test; t = 1.3, d.f. = 31, P-value = 0.2). Fig 4 shows instances of this comparison, stressing the role of applying the filter for salt crust involved in 2cA. Moreover, an estimation of A from satellite scenes based on only the first condition of the 2cA (i.e., without applying the salt filter) and a comparison to estimations of water-covered pixels from aerial scenes are shown in the left panel of S1 Fig. This quantitative assessment was complemented by a qualitative analysis in which A estimations were compared with direct qualitative field observations (presence/absence of water in the pond; n = 52) from previous studies [60,61] and Montero-Pau (personal communication) obtained close in time (within the same month, S1 Table). The presence/absence of water observed in the field matched in all cases (n = 52) with the satellite observations.

Fig 4. Raw and processed data for four ponds.

First column: aerial scenes. Second column: Landsat 5 satellite raw scenes. Third column: results after applying the two-condition assessment to each pixel in each satellite scene. Brown pixels do not accomplish the first condition (MNDWI<0). White pixels do accomplish the first condition but not the second one (MNDWI>0 and TM4>0.4). Blue pixels do accomplish both conditions (MNDWI>0 and TM4<0.4, i.e., the pixel is considered a water-covered pixel). Fourth column: from black (0%) to red (100%), proportion of scenes for each pixel that were estimated as water-covered.

The comparison between automatically processed estimations of A with and without the salt filter (i.e., the second condition in the 2cA) showed that this filter causes a 7.5% (overall mean) reduction in A. For some of the ponds, no pixels were excluded by this condition (six ponds), but in some others the reduction can achieve up to 100% of the pixels (S2 Fig). Therefore, the second condition has different effect on A depending on the pond that is evaluated.

4.2 Estimation of the water-surface area (A): Historical data record

From the 27 years of monitoring, 8,640 estimations of A (20 ponds x 432 raw scenes) were obtained. The maximum estimated size of the ponds ranged from 5 to 2,513 pixels. The automatic application of the 2cA to raw satellite scenes yielded 4,036 cloud-free A data points (average per pond = 201; range 92–257). The complete time series of A for each pond are provided in S2 Table. For the smallest pond, A ranged from 0 to 4,500 m2, and for the largest pond, the range of this value ranged from 17,000 to 2,263,000 m2. In the time series of A, the Pearson’s correlation coefficient (r) between ponds ranged from -0.31 to 0.86. The annual hydroperiod ranged from 0.07 to 1.00 (Table 1). Three ponds (PET, SAL and SLD) achieved the maximum hydroperiod because these ponds never completely dried out in our data record.

4.3. Quantification of environmental predictability

Fig 5 illustrates how two models, one discrete (COL_ANa) and one continuous (GAM_a), estimate the predictability of the variability of A in the time series for two ponds, Salobralejo (SAL) and La Campana (CAM), which were selected due to their very different predictability estimations. Despite the reduction of information caused by the discrete classification or the regression model, the indices captured the different fluctuation patterns observed in the time series for each of the ponds.

Fig 5. Instances outlining the data analysis procedure for computing predictability estimates for two ponds (SAL and CAM).

Two models are compared: COL_ANa (discrete) and GAM_a (continuous). (a.1,2): Twenty-seven-year time series of the water-surface area (A) obtained by Landsat 5/7 scenes after applying the 2cA. (b.1,2): Counts for the two-way table (time steps vs. pond state) for the COL_ANa model; constancy, contingency and predictability indices are shown. (c.1,2): Scatter plot of the normalized water-surface area () and day of the year (DOY) from the complete time series; regression curves based on the GAM (solid line), 95% CIs (dashed lines), and predictability indices are shown.

The predictability estimates for each pond and model are shown in Table 3. Maximum predictability (i.e., 1) was achieved for three ponds (PET, SAL and SLD) with the COL_wd model because these ponds never completely dried out in our data record and the COL_wd model only considers the presence/absence of water. The coefficient of variation for the predictability estimations under each model ranged from 0.32 to 0.72, values that can be associated with the discrimination power of each model.

Table 3. Predictability estimates and statistics for each pond and model combination.

The relationships among models are shown in Fig 6. Two clusters of models were identified using 0.90 as the distance threshold. Cluster A includes the COL_wd, GAM_a, COL_ANa, GAM_w and COL_ANw models (bootstrap = 85%). This cluster includes discrete and continuous models. Moreover, the correlation between predictability estimates with discrete vs. continuous models ranged from 0.49 to 0.88. Thus, continuous and discrete approaches can produce similar results depending on other assumptions. The comparison of the two hemi-matrixes in Fig 6 showed that the correlation between GAM_a and the other models in Cluster A increased when excluding the two most ephemeral ponds (lowest correlation: 0.27 vs. 0.72).

Fig 6. Relationships among the predictabilities estimated with the models.

The matrix shows the correlations using Pearson’s coefficients obtained after excluding the two most ephemeral ponds (upper hemi-matrix, n = 18) and without excluding them (lower hemi-matrix, n = 20). The correlations are color-coded (blue-red scale, from negative to positive). A dendrogram obtained through a UPGMA cluster analysis based on the chord distance between predictability estimations (lower hemi-matrix) is shown. The two clusters of models were differentiated using 0.90 as the distance threshold, (A, and B; enclosed by dashed rectangles).

Cluster B includes COL_MAXg and COL_MAXlin (bootstrap = 99%), the two discrete models in which the states were defined with respect to the maximum observed A. Therefore, according to our results, linear and geometric scaling appears to be secondary for predictability estimations. Interestingly, some of the models in cluster A were negatively correlated with the models in cluster B. This diverging result is associated with the predictability index estimations in the ponds that frequently maintained some water cover (A>0), as shown in Fig 7. This figure depicts the performance of the models in clusters A and B, which was assessed by choosing a representative model from each cluster (COL_ANa and COL_MAXlin) and showing their predictability estimations against the proportion of observations in state 1 (the state that includes A = 0) for each pond.

Fig 7. Relation between a model from each cluster (COL_ANa and COL_MAXlin) and the proportion of observations in state 1 (the state that includes A = 0).

The solid line indicates the quadratic least-square fitting, and R2 is the determination coefficient associated with the corresponding fitting.

5. Discussion

The characterization of environmental fluctuations is a central issue in ecology and environmental science that requires long-term time series [20]. Time series analysis often implies discarding a high volume of observations; in our case, 27 years of observations were reduced by 53% due to cloud filtering. Not surprisingly, obtaining long-term time series data is regarded as costly, and these datasets are consequently scarce [17]. In this context, remote sensing has been proven to be a timely, reliable, global and cost-efficient tool for analysing a given environmental variable over a long time series.

The general interest of our work relies first on the importance of evaluating long-term patterns of the water-surface area in ponds located in arid regions, where salt crust is a confounding feature in satellite images. These habitats are far from rare, and as dynamic island-like areas, their fluctuating patterns can be related to their biota (see below). The conditions for water detection applied in this study combine (1) the robustness of band ratios (MNDWI index>0 [40]) and (2) a refinement needed for saline ponds (near-infrared band reflectance<0.4, based on [42]). The consistency between the satellite-based measurements and measurements based on more direct observations was found to be notable. Our study shows that satellite scenes, after convenient processing, provide sufficient resolution for the detection of variations in A for saline ponds ranging from 0.00013 to 1.19 km2. We have shown that the pipeline established here allows quantification of the water cover in small ponds, which are of particular importance in ecological studies of arid regions.

Second, the general interest of our study relies on a conceptual analysis of the notion of predictability when it is quantitatively implemented using models. Based on our interest in practical applications, we performed our analysis using an actual system composed of time series for the water cover of 20 ponds over 27 years, which allowed us to identify divergences between predictability estimates. We also aimed to state that these differences among indices are related to how organisms perceive fluctuations in their environment. Thus, the choice of the predictability index should take into account the organism of interest.

One challenge is to assess the effect of the measurement scale (quantitative vs. qualitative) on the focal variable. Many studies need to be based on a nominal scale or just prefer this approach, while others use a quantitative scale. If quantitative data are available, a quantitative analysis appears a priori to retain more information than a qualitative one. Fourier analysis and AEMs [24,25] are alternative approaches for fluctuation analyses that address quantitative data but are unable to address aperiodic observations. Unfortunately, this case is not uncommon [62], and the continuous (GAM) models developed in this study address this problem. However, in our study, the cluster analysis merges predictability estimates from discrete (nominal) and continuous (quantitative) models. A direct inspection of correlation values among predictability estimates confirmed that the cluster analysis does not force the aggrupation of some discrete and some continuous models. Therefore, these approaches can yield similar predictability results, and other assumptions have a higher impact on predictability estimates. Nevertheless, in contrast with Fourier analysis, both approaches (Colwell’s approach and the GAMs) ignore the correlation between consecutive observations because the days of the year in different years are merged, which makes them share an underlying assumption that might account for the similar estimations found in this study. Further studies should develop a time-series analysis procedure that takes into account the correlation between consecutive observations and is able to address rather aperiodic observations. In the meantime Colwell approach, which involves the discretization of a continuous variable, appears to be reliable compared with the regression methods explored in this study and has the advantage of keeping the analysis simple [20, 63, 64]. However, the discretization criteria could critically affect the predictability estimations, and that is needed to be closely looked by researchers intending to apply Colwell’s approach (see below).

Thus, a second challenge is to determine the type of variation in data that is relevant for estimating unpredictability. For instance, extreme values could be much more–or much less–important than would be accounted for in a linear approach, and unpredictability estimations could be inflated by considering meaningless values. This problem translates into the scale transformation (e.g., normalization) applied to the focal variable, which cannot be thoroughly analysed without considering the organism for which unpredictability is evaluated. In previous stream and river studies, the presence of many zeros in the time series caused the non-zero observations to produce an extreme residual [20]. Here, we explored different models that consider or not consider zeros in several ways, and we obtained similar results in the majority of cases. However, one of the continuous models (GAM_a) differed from the rest of the models in terms of classifying the two most ephemeral ponds (HTU and HYE; A = 0 in more than 90% of observations), which likely caused the low correlation between GAM_a and some of the models in cluster A. Most of the models assigned a high degree of predictability to these two ponds, whereas GAM_a assigned them a low degree of predictability. Predictability estimation by GAM_a is based on the coefficient of variation of A and tended to be high for these two ponds (thus yielding low predictability) because their mean A is very low. This effect on the coefficient of variation has been reported in statistical studies [65, 66]. In other words, GAM_a–and more generally, the consideration of zeros to estimate the mean A in a continuous approach–could inflate the estimation of environmental unpredictability. This effect is supported by direct inspection of the data. In fact, HTU and HYE ponds rather than unpredictable systems appear to be ephemeral ponds, as they are dry most of the time (i.e., predictably dry).

As has been stressed for many years [8,9], predictability depends on the point of view of the organism of interest, and confusion is possible between what human researchers and what other organisms can predict. Therefore, no predictability metric can pretend to serve as an absolute quantification of the degree of predictability of a specific environment, but instead, time scales of periodic variations and other biological factors must be taken into account when referring to any specific predictability estimation [10]. Models designed under different biological assumptions are expected to produce different predictability results, which is the case when comparing models between each of the two identified clusters. The organism’s features that affect its capability to predict fluctuations are potentially numerous and poorly known; thus, the proposal of general rules for selecting an appropriate predictability metric is inherently difficult. In contrast, performing a selection based on the biological information available for the case of interest is advisable. For instance, an inspection of the model assumptions suggests that predictability in the models in cluster A is the one perceived by a generalist, eurioic, small-sized organism, whereas the opposite is true for the models in cluster B. Hence, in model COL_wd (cluster A), the presence of water–either with high A or low A–would be sufficient to make the environment suitable. Consistently, in COL_ANa and COL_ANw, a wide variation around the mean A value does not result in a state change. Thus, taking into account the case of animals, the scaling of the models in cluster A could fit the case of aquatic invertebrates (such as cladocerans, rotifers, copepods and insect larvae, i.e., organisms sized less than ca. 1 cm) that can achieve high population sizes in small volumes and tolerate a broad variation in A. This a priori assessment was confirmed after dissecting the outputs from the models. As shown in Fig 7, the models in both cluster A and cluster B did not confound ponds that are frequently dry with unpredictable ponds. Both ephemeral ponds and nearly permanent ponds achieved high predictability estimations. Thus, (a) frequent maintenance of some water and (b) frequently dry or with a low water cover can be regarded as predictable for these animals. In contrast to cluster A models, the models in cluster B assign a lower predictability to the environment of ponds that frequently maintain water. This finding can be interpreted as that the models in cluster B are sensitive to random variations in the amount of water when water occurs. This sensitivity is welcomed if the organism of interest is affected by the extension of the water cover, not being sufficient condition the presence of some water. This is likely the case for large animals that typically need a large exploitation area and are close to the top of the trophic chains (e.g., fish). Our ponds offer few opportunities for these animals because, according to our analysis, the studied ponds do not fit in such environments. Consistently, the animals typically found in our study area are mostly restricted to small invertebrates [60,67] that are able to display diapause stages. To the best of our knowledge, no large animals have been reported in our study system, with the exception of migratory birds [68]. For waterfowls, the seasonal home range would be the whole pond system rather than a single pond because they can move easily from one pond to another. Therefore, if the fluctuation of A is poorly coupled among ponds–as found in our system–waterfowls could exploit the ponds with high A, making the system more predictable for them. In other words, single-pond fluctuation is fine-grained for waterfowls.

6. Conclusions

When characterizing fluctuations in a natural system, ecologists face the following two problems: the need to gather long-term data and the handling of these valuable data that are tough to acquire. We have shown that remote sensing data have become more accessible, which opens an opportunity for ecologists to obtain the long time series needed to calculate predictability metrics. Our analysis at this step is restricted to a specific variable (A) and type of habitat, but this type is important in many geographic regions. Once this long-term data series is acquired, a model that produces a predictability metric has to be developed. We thus performed a conceptual analysis in which we propose several models based on how the variation of the focal variable can be relevant to different organisms with different strategies and life histories. Interestingly, in addition to Colwell’s approach, we have introduced GAM fitting as an alternative approach for measuring predictability; thus, the effect of a continuous vs. a discrete approach could be studied. Our methodology extracted meaningful information regarding the degree of predictability of a set of Mediterranean ponds. Interestingly, some model assumptions have been shown to exert minor effects (as shown by the correlation data and the cluster analysis). In contrast, other model assumptions caused divergences in predictability estimations, and we propose that this divergence can be associated with the differences in how the organism of interest perceives fluctuations in its environment. The methodology described in this manuscript is applicable to a wide variety of study systems and will be valuable for quantifying and characterizing predictability, which is essential considering the predicted scenario of upcoming global increases in the unpredictability of environmental fluctuations [3,5]. Interestingly, the satellite Sentinel 2A, which was recently launched in 2015, and satellite Sentinel 2B, which will be lunched in the near future, will work with Landsat 8 to provide images every three to five days. This improvement in the temporal resolution of remote sensing data will enable us to improve the area estimations and better study the variability.

Supporting information

S1 Fig. Relationships among the water-surface area (A) obtained through a direct inspection of aerial scenes and the A values from raw satellite scenes after applying only the first condition (left panel, Y axis) and after applying the 2cA assessment (right panel, Y axis).

The dots are the values for 19 out of 20 ponds recorded in the years 2006 and 2009. The maximum time for matching values corresponding to the two estimations was 18 days. Linear least squares fitting is represented as a continuous line, and the corresponding equation and determination coefficient are shown in the upper right corner of each panel. The dashed line represents the ideal situation in which a perfect match exists between the aerial and satellite estimations of A. n = 33; the pair 0–0 was found in 18 cases in the left panel and 22 cases in the left panel.


S2 Fig. Bars: Percentage of satellite scenes in which salt-covered pixels (TM4< 0.4) were detected after retaining potentially water-covered pixels (MNDWI> 0) at each pond.

The means and standard deviations (shown between parentheses) of the percentage of reduction are shown above.


S1 Table. Data used for validation of satellite water-surface area estimation (from aerial scenes) and presence/absence of water (from field observations).


S2 Table. Number of pixels assigned as water–covered pixels and to cloud-covered pixels in each pond (columns) and satellite scene (rows).



We thank Eric F. Vermote for his assistance in Landsat processing issues and helpful comments. Satellite data were provided by the European Space Agency and United States Geological Survey. Aerial images were obtained from the Spanish Instituto de Información Geográfica.


  1. 1. Williams PD, Hastings A. Paradoxical persistence through mixed-system dynamics: towards a unified perspective of reversal behaviours in evolutionary ecology. Proc R Soc B Biol Sci. 2011;278: 1281–1290. pmid:21270032
  2. 2. Simons AM. Modes of response to environmental change and the elusive empirical evidence for bet hedging. Proceedings Biol Sci /The R Soc. 2011;278: 1601–9. pmid:21411456
  3. 3. Pimm SL, Russell GJ, Gittleman JL, Brooks TM. The Future of Biodiversity. Science. 1995;269: 347–350. pmid:17841251
  4. 4. Root TL, Price JT, Hall KR, Schneider SH, Rosenzweig C, Pounds JA. Fingerprints of global warming on wild animals and plants. Nature. 2003;421: 57–60. pmid:12511952
  5. 5. IPCC. Summary for policymakers. The Physical Science Basis Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change. 2007. p. 996. 10.1038/nrc3183
  6. 6. Chevin L-M, Lande R, Mace GM. Adaptation, plasticity, and extinction in a changing environment: towards a predictive theory. PLoS Biol. 2010;8: e1000357. pmid:20463950
  7. 7. Chown S, Hoffmann A, Kristensen T, Angilletta M, Stenseth N, Pertoldi C. Adapting to climate change: a perspective from evolutionary physiology. Clim Res. 2010;43: 3–15.
  8. 8. MacArthur R, Levins R. Competition, habitat selection, and character displacement in a patchy environment. Proc Natl Acad Sci USA. 1964;51: 1207–1210. pmid:14215645
  9. 9. Levins R. Evolution in Changing Environments: Some Theoretical Explorations. 1st ed. Princeton: Princeton University Press; 1968.
  10. 10. Southwood T. Habitat, the templet for ecological strategies? J Anim Ecol. 1977;46: 337–365.
  11. 11. Crozier LG, Hendry A. P, Lawson PW, Quinn TP, Mantua NJ, Battin J, et al. Potential responses to climate change in organisms with complex life histories: evolution and plasticity in Pacific salmon. Evol Appl. 2008;1: 252–270. pmid:25567630
  12. 12. Schowengerdt RA. Preface to the Third Edition. Remote Sensing. 3rd ed. Elsevier; 2007. pp. 37–38.
  13. 13. Pettorelli N, Laurance WF, O’Brien TG, Wegmann M, Nagendra H, Turner W. Satellite remote sensing for applied ecologists: opportunities and challenges. Milner-Gulland EJ, editor. J Appl Ecol. 2014;51: 839–848.
  14. 14. Colwell R, Ulaby F, Simonett D. Manual of remote sensing. Volume 2. Interpretation and applications. 2nd ed. Falls Church: American Society of Photogrammetry; 1983.
  15. 15. Campbell J. Introduction to remote sensing. 3rd ed. Guilford Press; 2002.
  16. 16. Roach JK, Griffith B, Verbyla D. Landscape influences on climate-related lake shrinkage at high latitudes. Glob Chang Biol. 2013;19: 2276–84. pmid:23536378
  17. 17. May R, McLean A. Theoretical ecology: Principles and applictions. 1st ed. Oxford University Press. Oxford: Oxford University Press; 2007.
  18. 18. Frazier PS, Page KJ. Water Body Detection and Delineation with Landsat TM Data. Photogramm Eng Remote Sens; 2000;66: 1461–1467.
  19. 19. Asner GP. Tropical forest carbon assessment: integrating satellite and airborne mapping approaches. Environmental Research Letters. 2009. p. 34009.
  20. 20. Sabo JL, Post DM. Quantifying periodic, stochastic, and catastrophic environmental variation. Ecol Monogr. 2008;78: 19–40.
  21. 21. Colwell RK. Predictability, constancy, and contingency of periodic phenomena. Ecology. 1974;55: 1148–1153.
  22. 22. Stearns S. On measuring fluctuating environments: predictability, constancy, and contingency. Ecology. 1981;62: 185–199.
  23. 23. Gallart F, Prat N, García-Roger EM, Latron J, Rieradevall M, Llorens P, et al. A novel approach to analysing the regimes of temporary streams in relation to their controls on the composition and structure of aquatic biota. Hydrol Earth Syst Sci. 2012;16: 3165–3182.
  24. 24. Blanchet FG, Legendre P, Borcard D. Modelling directional spatial processes in ecological data. Ecol Modell. 2008;215: 325–336.
  25. 25. Legendre P, Gauthier O. Statistical methods for temporal and space-time analysis of community composition data. Proc R Soc B Biol Sci. 2014;281: 20132728–20132728. pmid:24430848
  26. 26. Grossman G, Sabo J. Incorporating Environmental Variation into Models of Community Stability: Examples from Stream Fish. Am Fish Soc Symp. 2010;73: 407–426.
  27. 27. Wetzel RG, Likens G. Limnological Analysis. 3rd ed. Springer Science & Business Media; 2013.
  28. 28. MacArthur RH, Wilson EO. The theory of island biogeography. 1st ed. Princeton: Princeton University Press; 1967.
  29. 29. Reche I, Pulido-Villena E, Morales-Baquero R, Casamayor EO. Does ecosystem size determine aquatic bacterial richness? Ecology. 2005;86: 1715–1722.
  30. 30. Turner MA, Huebert DB, Findlay DL, Hendzel LL, Jansen WA, Bodaly R, et al. Divergent impacts of experimental lake-level drawdown on planktonic and benthic plant communities in a boreal forest lake. Can J Fish Aquat Sci. 2005;62: 991–1003.
  31. 31. Cott PA, Sibley PK, Somers WM, Lilly MR, Gordon AM. A review of water level fluctuations on aquatic biota with an emphasis on fishes in ice-covered lakes. Journal of the American Water Resources Association. 2008. pp. 343–359.
  32. 32. Mageed AAA, Heikal MT. Factors affecting seasonal patterns in epilimnion zooplankton community in one of the largest man-made lakes in Africa (Lake Nasser, Egypt). Limnologica. 2006;36: 91–97.
  33. 33. Wurtsbaugh WA. Food-web modification by an invertebrate predator in the Great Salt Lake (USA). Oecologia. 1992;89: 168–175. pmid:28312870
  34. 34. White MS, Xenopoulos MA, Hogsden K, Metcalfe RA, Dillon PJ. Natural lake level fluctuation and associated concordance with water quality and aquatic communities within small lakes of the Laurentian Great Lakes region. Hydrobiologia. 2008;613: 21–31.
  35. 35. Williams WD. Inland salt lakes: An introduction. Hydrobiologia. 1981;81–82: 1–14.
  36. 36. Hammer UT. Saline Lake Ecosystems of the World. 1st ed. Springer Science & Business Media. Dordrecht: Dr W. Junk Publishers; 1986.
  37. 37. Bryant RG, Rainey MP. Investigation of flood inundation on playas within the Zone of Chotts, using a time-series of AVHRR. Remote Sens Environ. 2002;82: 360–375.
  38. 38. Scuderi L a., Laudadio CK, Fawcett PJ. Monitoring playa lake inundation in the western United States: Modern analogues to late-Holocene lake level change. Quat Res. University of Washington; 2010;73: 48–58.
  39. 39. Adams KD, Sada DW. Surface water hydrology and geomorphic characterization of a playa lake system: Implications for monitoring the effects of climate change. J Hydrol. 2014;510: 92–102.
  40. 40. Xu H. Modification of normalised difference water index (NDWI) to enhance open water features in remotely sensed imagery. International Journal of Remote Sensing. 2006. pp. 3025–3033.
  41. 41. Chao Rodríguez Y, el Anjoumi A, Domínguez Gómez JA, Rodríguez Pérez D, Rico E. Using Landsat image time series to study a small water body in Northern Spain. Environ Monit Assess. 2014;186: 3511–22. pmid:24452860
  42. 42. Ormeci C, Ekercin S. An assessment of water reserve changes in Salt Lake, Turkey, through multi-temporal landsat imagery and real-time ground surveys. Hydrol Process. 2007;21: 1424–1435.
  43. 43. Groeneveld DP, Barz DD. Dixie Valley, Nevada playa bathymetry constructed from Landsat TM data. J Hydrol. 2014;512: 435–441.
  44. 44. Blondel J, Aronson J. Biology and wildlife of the Mediterranean region. 1st ed. Theil E, editor. Oxford: Oxford University Press; 2000.
  45. 45. Gómez-Alday JJ, Carrey R, Valiente N, Otero N, Soler A, Ayora C, et al. Denitrification in a hypersaline lake-aquifer system (Pétrola Basin, Central Spain): the role of recent organic matter and Cretaceous organic rich sediments. Sci Total Environ. Elsevier B.V.; 2014;497–498: 594–606. pmid:25169874
  46. 46. Northrop A. IDEAS–LANDSAT Products Description Document. 2014.
  47. 47. Masek J, Vermote E, Saleous N, Wolfe R, Hall F, Gao F, et al. LEDAPS Calibration, Reflectance, Atmospheric Correction Preprocessing Code, Version 2. Model Prod. 2013; 10.3334/ORNLDAAC/1146
  48. 48. Zhu Z, Woodcock CE. Object-based cloud and cloud shadow detection in Landsat imagery. Remote Sens Environ. Elsevier Inc.; 2012;118: 83–94.
  49. 49. Masek J.G., Vermote E.F., Saleous N., Wolfe R., Hall F.G., Huemmrich F., Gao F., Kutler J., and Lim T.K., A Landsat surface reflectance data set for North America, 1990–2000, Geoscience and Remote Sensing Letters. 2006. 3: 68–72.
  50. 50. Vermote E., Justice C., Claverie M., & Franch B. Preliminary analysis of the performance of the Landsat 8/OLI land surface reflectance product. Remote Sensing of Environment. 2016. 185: 46–56.
  51. 51. Remer L. A., Kaufman Y. J., Tanré D., Mattoo S., Chu D. A., Martins J. V., Li R.-R., Ichoku C., Levy R. C., Kleidman R. G., Eck T. F., Vermote E., and Holben B. N. The MODIS Aerosol Algorithm, Products, and Validation. J. Atmos. Sci. 2005. 62: 947–973.
  52. 52. Claverie M., Vermote E. F., Franch B., and . Evaluation of the Landsat-5 TM and Landsat-7 ETM+ surface reflectance products. Remote Sens. Environ.2015. 169:390–403.
  53. 53. Franch B., Vermote E. F., and Claverie M. Intercomparison of Landsat albedo retrieval techniques and evaluation against in situ measurements across the US SURFRAD network. Remote Sens. Environ. 2014. 152:627–637.
  54. 54. Vermote EF, Saleous N, Justice CO, Kaufman YJ, Privette JL, Remer L, et al. Atmospheric correction of visible to middle-infrared EOS-MODIS data over land surfaces: Background, operational algorithm and validation. J Geophys Res Atmos. 1997;102: 17131–17141.
  55. 55. Hastie TJ, Tibshirani R. Generalized additive models. 1st ed. Statistical Science. Boca Raton: CRC press; 1990.
  56. 56. Wood SN. Modelling and smoothing parameter estimation with multiple quadratic penalties. J R Stat Soc Ser B (Statistical Methodol. 2000;62: 413–428.
  57. 57. R Development Core Team R. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna: R Foundation for Statistical Computing; 2009.
  58. 58. Suzuki R, Shimodaira H. Pvclust: an R package for assessing the uncertainty in hierarchical clustering. Bioinformatics. 2006;22: 1540–2. pmid:16595560
  59. 59. Greenacre M, Primicerio R. Multivariate analysis of ecological data. 1st ed. Bilbao: Rubes Editorial; 2013.
  60. 60. Lapesa S. Efecto de la depredación por invertebrados sobre poblaciones simpátricas de rotíferos. University of Valencia. 2004.
  61. 61. García-Roger EM. Análisis demográfico de bancos de huevos diapáusicos de rotíferos. University of Valencia. 2006.
  62. 62. Horton NJ, Kleinman KP. Much Ado About Nothing. Am Stat. 2007;61: 79–90. pmid:17401454
  63. 63. Kennard MJ, Mackay SJ, Pusey BJ, Olden JD, Marsh N. Quantifying uncertainty in estimation of hydrologic metrics for ecohydrological studies. River Res Appl. 2009;22: 137–156.
  64. 64. Jiang M, Felzer BS, Sahagian D. Characterizing predictability of precipitation means and extremes over the conterminous United States, 1949–2010. J Clim. 2016;29: 2621–2633.
  65. 65. Gulhar M, Kibria G, Albatineh AN, Ahmed NU. A comparison of some confidence intervals for estimating the population coefficient of variation: A simulation study. SORT Stat Oper Res Trans. 2012;36: 45–68.
  66. 66. Fay DS. A biologist’s guide to statistical thinking and analysis. The C. elegans Research Community, editor. WormBook. WormBook; 2013. 10.1895/wormbook.1.159.1
  67. 67. Cirujano S, Montes C, Garcia L. Los humedales de la provincia de Albacete: una panorámica general. Al-Basit Rev Estud albacetenses. 1988;24: 77–95.
  68. 68. López Donate JA, Montesinos Ibáñez JG, López Cano JA, Martínez Núñez JC. Estudio descriptivo del sector endorreico-salino de Pétrola, Corral Rubio y La Higuera (Albacete). Jornadas sobre el Medio Nat Albacet. 2004; 357–370.