Seasonality modulates the predictive skills of diatom based salinity transfer functions

  • Alejandra Goldenberg Vilar,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Writing – original draft, Writing – review & editing

    Affiliation Palaeoecology, Department of Physical Geography, Faculty of Geosciences, Utrecht University, Willem C. van Unnikgebouw, Utrecht, The Netherlands

  • Timme Donders,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Supervision, Validation, Writing – original draft, Writing – review & editing

    Affiliation Palaeoecology, Department of Physical Geography, Faculty of Geosciences, Utrecht University, Willem C. van Unnikgebouw, Utrecht, The Netherlands

  • Aleksandra Cvetkoska,

    Roles Writing – review & editing

    Affiliation Palaeoecology, Department of Physical Geography, Faculty of Geosciences, Utrecht University, Willem C. van Unnikgebouw, Utrecht, The Netherlands

  • Friederike Wagner-Cremer

    Roles Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Writing – original draft, Writing – review & editing

    Affiliation Palaeoecology, Department of Physical Geography, Faculty of Geosciences, Utrecht University, Willem C. van Unnikgebouw, Utrecht, The Netherlands

Abstract

The value of diatoms as bioindicators in contemporary and palaeolimnological studies through transfer function development has increased in recent decades. While such models represent a tremendous advance in (palaeo)ecology, they carry important sources of uncertainty that are often ignored. In the present study we tackle two of the most important sources of uncertainty in the development of diatom salinity inference models: the effect of secondary variables associated with seasonality, and the comparison of conventional cross-validation methods with validation based on independent datasets. Samples (diatoms and environmental variables) were taken in spring, summer and autumn in the freshwater and brackish ditches of the province of North Holland in 1993. Different locations in the same province were sampled again in 2008–2010 to validate the models. We found that the abundance of the dominant species changed significantly between the seasons, leading to inconsistent estimates of species optima and tolerances. A model covering intra-annual variability (all seasons combined) provides averages of species optima and tolerances and reduces the effect of secondary variables related to seasonality, thus providing the strongest relationship between salinity and diatom species. In addition, the ¨all-season¨ model reduces the edge effects usually found in all unimodal-based calibration methods. While all four models seem to perform relatively well under cross-validation, validation with an independent dataset emphasizes the importance of using models covering intra-annual variability to perform realistic reconstructions.

Introduction

Diatom assemblage changes provide an excellent basis for inferring environmental changes from seasonal to decadal or centennial scales, given their sensitivity to a broad variety of habitat parameters. Available transfer functions cover the nutrient status of freshwater bodies, temperature, and salinity dynamics [1–5]. Diatom-based models to infer salinity or tidal height have recently become an increasing focus in transfer function development as a potential tool in sea level reconstruction efforts [6–8]. These models require extremely high predictive precision given the often subtle changes they need to quantify. However, the application of transfer functions to model a single variable (e.g. salinity) is problematic under multiple response triggers [9,10]. The effect of secondary gradients on model performance, and importantly on the resulting reconstructions, has so far received surprisingly little attention [11]. Quantitative reconstructions are a form of space-for-time substitution [12], and as such require that the co-variation between the modelled variable and potentially confounding underlying ecological factors is constant in both time and space. In most situations this is difficult to test, and it is nonetheless a highly unrealistic assumption. This inherent problem of quantitative reconstructions is especially evident in relation to seasonality (referring to the biological and chemical changes occurring in continental waters across the seasons in temperate climates). In diatom populations, seasonal succession is related to changes in nutrient concentrations, light [13], thermal stratification and predator–prey relationships [14,15]. In addition, diatom communities follow distinct seasonal succession patterns caused by changes in life-history traits [16,17] and nutrient stoichiometry [18].
Despite the high likelihood that a modern training set selected for building a transfer function will be influenced by seasonality effects, the strength of such dependences is not yet known. Only a limited number of diatom-based studies have used contemporary data to facilitate the interpretation of sediment cores in terms of intra-annual diatom distribution [19]. The few available studies that have analyzed the effect of seasonality on diatom-based transfer functions generally focused on nutrient variables [19–21]. However, the effect of many other environmental variables, including salinity, can be obscured by seasonality. The potential effect of seasonality is therefore extremely important, given that the intra-annual variability of salinity needs to be separated from the long-term changes [7].

A second limitation in the development of transfer functions is the lack of independent data sets to test the reliability of the reconstructions [22]. Usually the predictive ability of the transfer functions is only assessed by cross-validation methods. If the observations in the calibration set are not independent, because of autocorrelation or other types of pseudo-replication, performance statistics based on cross-validation will be over-optimistic [23]. Therefore, the ideal way of finding unbiased transfer-function performances is the use of an independent test set [11].

In the present study, we tackle two important sources of uncertainty in diatom-based transfer function development, following the recommendations outlined in Juggins [11]. We (1) evaluate the effect of seasonality when modeling a single variable, e.g. salinity, regarding the “true” ecological meaning of statistically significant models, and (2) compare model performance between conventional cross-validation methods and validation with an independent dataset. Both approaches aim to detect the effect of confounding environmental factors on the primary variable of interest.

Materials & methods

Study area and datasets

The study was conducted in the province of North Holland, the Netherlands. The samples were collected in modified wetlands that have been reclaimed for agriculture representing a system of shallow brackish and freshwater ditches draining water from low-lying areas. Water tables in the three areas are kept within strict limits and the banks are bordered by reed belts dominated by Phragmites australis. Most of the diatom species and macrophytes found in these modified wetlands are also found in mesotrophic and eutrophic European lakes. Detailed description of the study area is provided in [24].

The dataset for the development of the diatom inference models is referred to as the ¨training set¨ and consists of a total of 96 samples from 32 locations sampled in spring (March), summer (June) and autumn (September) 1993. The samples were collected randomly in the province of North Holland, spanning a salinity gradient of 200–9000 mg/l chloride. As a result, salinity represents the most important gradient in the dataset. Water samples were collected monthly at the same locations as the diatom samples and measured for surface water oxygen, pH, conductivity, chloride, sulphates, transparency (Secchi depth), total nitrogen, total phosphorus and chlorophyll-a (results are summarized in Table 1). In this dataset, 400 diatom valves were identified per sample.

Table 1. Descriptive statistics of the environmental variables measured in the training and validation datasets.

Temp = temperature; O2 = dissolved oxygen; tN = total nitrogen; tP = total phosphorus; Cl = chloride; Secchi = Secchi depth; Cond. = conductivity; Chl a = chlorophyll a.

The dataset used for testing the performance of the models (henceforth referred to as “validation dataset”) consists of 90 samples collected in drainage ditches within the framework of a long-term monitoring program in North-Holland carried out by the local water authority during spring 2008–2010. In this dataset, 200–300 diatom valves were identified per sample, according to the standard for routine monitoring in The Netherlands. Fig 1 shows the sampling locations for both datasets. In the validation dataset, water samples were collected monthly. The determination of chloride, nutrients and other environmental variables followed the standardized national protocols accredited by the Dutch Standards Institute (NEN-EN 13946). We used chloride average values of the spring months (i.e. March through May), assuming that these values reflect local conditions better than single measurements. A summary of the environmental variables measured in the validation data set can be found also in Table 1.

Fig 1. Sampling locations of the training and validation dataset.

Sampling locations of the training dataset collected in 1993 in spring, summer and autumn (locations marked with stars). Sampling locations of the validation dataset collected in spring 2008–2010 (marked with circles) in the province of North Holland (The Netherlands).

In both datasets, diatoms were sampled from reed stems (Phragmites australis), which is the most abundant and common emergent macrophyte in the drainage ditches. Reed is the recommended substratum for sampling periphyton in the Netherlands [25] in order to avoid differences caused by substratum heterogeneity. Attached diatoms were prepared using H2O2 digestion and mounted on microscope slides with Permount Mounting Medium (Fisher Scientific, Pittsburgh). Taxonomic identification was based on the volumes of Krammer and Hoffmann [26,27] following standard protocols (NEN-EN 14407). Species percentage data were square-root transformed to reduce the weight of dominant species prior to statistical analysis.

Data analysis

The effect of seasonality on diatom community structure.

We performed two different tests to quantify the effect of seasonality on diatom community structure: 1) permutational multivariate analysis of variance (PERMANOVA) [28] to reflect changes in community composition (species identity and abundances) and 2) analysis of multivariate dispersions [29] that reflects changes in community heterogeneity or beta diversity.

PERMANOVA analyzes the variance of multivariate data explained by a set of explanatory factors on the basis of any dissimilarity measure of choice, thereby allowing for a wide range of empirical data distributions. The null hypothesis tested by PERMANOVA is that, under the assumption of exchangeability of the sample units among the groups, “the centroids of the groups are equivalent for all groups” [30]. The test of multivariate dispersions explicitly examines the null hypothesis that “the average within-group dispersion (measured by the average distance to group centroid) is equivalent among the groups”. This test is equivalent to the popular Levene’s test in univariate ANOVA, but applied to the study of species assemblages by using dissimilarity indices [31].

The statistical significance of each multivariate variance component was tested using 9999 permutations of residuals under a reduced model [28], with an a priori chosen significance level of α = 0.05. All multivariate analyses were done on the basis of a Bray–Curtis dissimilarity matrix calculated from square-root transformed relative abundance data. To visualize multivariate patterns in assemblages across the three seasons, non-metric multi-dimensional scaling (NMDS) was used as an ordination method.
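As an illustration of the test described above, the following is a minimal Python sketch of a one-factor PERMANOVA on a Bray–Curtis matrix computed from square-root transformed relative abundances. It is not the analysis used in the study (which was run in R with vegan): the function names and the synthetic counts are hypothetical, and group labels are simply shuffled, which is an assumption that stands in for the permutation of residuals under a reduced model.

```python
import numpy as np

def bray_curtis(x):
    """Pairwise Bray-Curtis dissimilarity between the rows (samples) of x."""
    num = np.abs(x[:, None, :] - x[None, :, :]).sum(axis=2)
    den = (x[:, None, :] + x[None, :, :]).sum(axis=2)
    return num / den

def pseudo_f(d, groups):
    """One-way PERMANOVA pseudo-F statistic from a dissimilarity matrix d."""
    n = len(groups)
    labels = np.unique(groups)
    d2 = d ** 2
    ss_total = d2[np.triu_indices(n, k=1)].sum() / n
    ss_within = 0.0
    for g in labels:
        idx = np.where(groups == g)[0]
        sub = d2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_among = ss_total - ss_within
    a = len(labels)
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def permanova(d, groups, n_perm=9999, seed=0):
    """Permutation p-value obtained by shuffling group labels (simplification)."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    f_obs = pseudo_f(d, groups)
    hits = sum(pseudo_f(d, rng.permutation(groups)) >= f_obs
               for _ in range(n_perm))
    return f_obs, (hits + 1) / (n_perm + 1)

# Synthetic counts: two "seasons" with opposite dominance patterns.
demo_rng = np.random.default_rng(1)
demo_counts = demo_rng.poisson(lam=[[20, 5, 1]] * 6 + [[1, 5, 20]] * 6).astype(float)
demo_rel = demo_counts / demo_counts.sum(axis=1, keepdims=True)
demo_d = bray_curtis(np.sqrt(demo_rel))          # square-root transform first
demo_groups = np.array(["spring"] * 6 + ["autumn"] * 6)
demo_f, demo_p = permanova(demo_d, demo_groups, n_perm=999)
```

With real data, the rows of the abundance matrix would be the samples and the group labels the sampling seasons.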

To determine which individual species contributed most to the differences between the seasons we used the similarity percentage method (SIMPER), which measures the percentage contribution of each species to the average dissimilarity between two groups [7,32]. One-way ANOVA was performed to test whether environmental variables differed between spring, summer and autumn. These analyses were performed in R with the vegan package [33].
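The idea behind SIMPER can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the vegan implementation: the function name `simper` and the toy abundance matrix are hypothetical, and each species' contribution is taken as its share of the Bray–Curtis dissimilarity averaged over all between-group sample pairs.

```python
import numpy as np

def simper(x, groups, a, b):
    """Average per-species contribution to Bray-Curtis dissimilarity
    between groups a and b, plus the same values as percentages."""
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    xa, xb = x[groups == a], x[groups == b]
    diff = np.abs(xa[:, None, :] - xb[None, :, :])         # (na, nb, species)
    total = (xa[:, None, :] + xb[None, :, :]).sum(axis=2)  # (na, nb)
    contrib = (diff / total[:, :, None]).mean(axis=(0, 1))
    return contrib, 100.0 * contrib / contrib.sum()

# Toy data: species 0 and 1 swap dominance between seasons, species 2 is constant.
toy_x = [[10, 0, 5], [8, 2, 5], [0, 10, 5], [2, 8, 5]]
toy_groups = ["spring", "spring", "autumn", "autumn"]
toy_contrib, toy_pct = simper(toy_x, toy_groups, "spring", "autumn")
```

In this toy case the constant species contributes nothing, and the two species that swap dominance share the dissimilarity equally.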

The development of diatom based salinity transfer functions.

The weighted optima of each species along the salinity gradient (chloride) were determined by averaging all values for each variable from the sites where the taxon occurred, weighted by its abundance at each site. The taxon’s tolerance along each gradient was then calculated as an abundance-weighted standard deviation of the environmental variable [34].
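The two definitions above translate directly into code. The sketch below is a hypothetical Python helper (the name `wa_optima_tolerances` and the toy values are assumptions) that computes abundance-weighted means and abundance-weighted standard deviations for every taxon, with sites in rows and taxa in columns.

```python
import numpy as np

def wa_optima_tolerances(y, env):
    """Abundance-weighted optimum and tolerance of each taxon.

    y   : (sites x taxa) abundance matrix
    env : environmental value (e.g. chloride) at each site
    """
    y = np.asarray(y, dtype=float)
    env = np.asarray(env, dtype=float)
    weight = y.sum(axis=0)                       # total abundance per taxon
    optima = (y * env[:, None]).sum(axis=0) / weight
    spread = (y * (env[:, None] - optima[None, :]) ** 2).sum(axis=0) / weight
    return optima, np.sqrt(spread)

# Toy example: chloride (mg/l) at three sites and two taxa.
toy_env = [200.0, 1000.0, 5000.0]
toy_y = [[10, 0], [10, 0], [0, 5]]
toy_optima, toy_tolerances = wa_optima_tolerances(toy_y, toy_env)
```

A taxon that occurs equally at the 200 and 1000 mg/l sites gets an optimum of 600 mg/l and a tolerance of 400 mg/l, while a taxon found at a single site has a tolerance of zero.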

We measured the explanatory strength of salinity as a predictor of diatom assemblage composition by calculating the ratio of the eigenvalue of the first (constrained) CCA axis (λ1), with salinity as the single explanatory variable, to the eigenvalue of the first unconstrained axis (λ2). A value of λ1/λ2 greater than 1 indicates that the variable of interest represents an important ecological gradient in the training set and meets the criterion for a “useful calibration” [11]. To assess the potential confounding effect of other explanatory variables, we performed hierarchical partitioning ordination with the full suite of environmental variables [11]. The analyses were performed in Canoco 5.0 [35].

The WA-PLS regression [36] with “leave-one-out” cross-validation was used to develop statistical prediction models. This method combines the features of weighted averaging (WA) and partial least squares (PLS) and uses the residual correlation structure in the data to improve the fit between the biological data and environmental data in the training set [37]. The predictive abilities of the transfer functions were assessed by examining the relationship between the observed and diatom-inferred values, as well as between the observed and leave-one-out estimated values of the variable of interest in the training set (r² apparent and r² leave-one-out), and by evaluating the root mean square error of prediction (RMSEP) [38]. We worked with the smallest number of ‘useful’ components, i.e. those giving a reduction of 5% or more in the cross-validated RMSE (compared with the RMSE for the one-component model) [39].
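To make the cross-validation step concrete, here is a rough Python sketch of leave-one-out RMSEP. It deliberately swaps in plain weighted averaging with inverse deshrinking instead of WA-PLS, so it illustrates the validation logic only and is not the model fitted in C2. All function names and the synthetic Gaussian response data are assumptions.

```python
import numpy as np

def wa_raw(y, optima):
    """Initial WA estimate: abundance-weighted mean of the known taxon optima."""
    known = ~np.isnan(optima)
    return (y[:, known] * optima[known]).sum(axis=1) / y[:, known].sum(axis=1)

def wa_fit(y, env):
    """Fit simple WA with inverse deshrinking; returns (optima, intercept, slope)."""
    present = y.sum(axis=0) > 0                  # taxa absent from the subset get NaN
    optima = np.full(y.shape[1], np.nan)
    optima[present] = ((y[:, present] * env[:, None]).sum(axis=0)
                       / y[:, present].sum(axis=0))
    slope, intercept = np.polyfit(wa_raw(y, optima), env, 1)
    return optima, intercept, slope

def wa_predict(y, model):
    optima, intercept, slope = model
    return intercept + slope * wa_raw(y, optima)

def loo_rmsep(y, env):
    """Refit the model n times, each time predicting the left-out sample."""
    n = len(env)
    pred = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        pred[i] = wa_predict(y[i:i + 1], wa_fit(y[keep], env[keep]))[0]
    return np.sqrt(np.mean((pred - env) ** 2)), pred

# Synthetic training set: 12 taxa with Gaussian responses along one gradient.
sim_env = np.linspace(0.0, 10.0, 30)
sim_optima = np.linspace(-1.0, 11.0, 12)
sim_y = 100.0 * np.exp(-(sim_env[:, None] - sim_optima[None, :]) ** 2 / (2 * 1.5 ** 2))
sim_rmsep, sim_pred = loo_rmsep(sim_y, sim_env)
```

The same loop structure applies to any calibration method: only `wa_fit` and `wa_predict` would need to be replaced.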

We further evaluated the ability of the models to predict salinity through an independent validation: we used the data from the training set to develop the transfer functions and tested their accuracy using the validation dataset. These analyses were performed using C2 version 1.7 software [40].

Results

A total of 408 species were identified in the training set of which 179 species covered 98% of all observations and therefore only these were used to develop the models. In the validation dataset 253 species were identified of which 145 already covered 98% of the data set and were used for the reconstructions.
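One plausible way to arrive at such a subset is to rank taxa by total abundance and keep the smallest set whose cumulative share reaches 98%. The Python sketch below rests on that assumption (the exact criterion used in the study is not specified), and the function name `taxa_covering` and the toy counts are hypothetical.

```python
import numpy as np

def taxa_covering(y, share=0.98):
    """Column indices of the most abundant taxa that together reach `share`
    of the total abundance in y (sites x taxa)."""
    totals = np.asarray(y, dtype=float).sum(axis=0)
    order = np.argsort(totals)[::-1]             # most abundant first
    cumulative = np.cumsum(totals[order]) / totals.sum()
    n_keep = int(np.searchsorted(cumulative, share)) + 1
    return np.sort(order[:n_keep])

# Toy example: five taxa with total counts 4, 50, 1, 30 and 15.
toy_counts = np.array([[4, 50, 1, 30, 15]])
toy_kept = taxa_covering(toy_counts, share=0.98)
```

Here the rarest taxon (1% of all counts) is dropped and the other four are retained.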

The PERMANOVA analysis revealed that diatom community composition is affected by seasonality in terms of overall differences in species abundances: spring-summer p = 0.001; spring-autumn p < 0.001; summer-autumn p = 0.01 (Table 2 and Fig 2). The species that contributed the most to the differences between the seasons are common species present in all three sampling seasons, as revealed by the SIMPER method (Table 3). In contrast, the analysis of multivariate dispersions showed that seasonality does not affect beta diversity or community heterogeneity (Table 2). In summary, these analyses indicate that the seasonal differences in diatom community structure are due to changes in species abundances and not in species identities. The optima and tolerances of the species that contribute the most to community dissimilarity are shown in Fig 3. Either the spring or the summer model tends to produce the highest optima, while the autumn model leads to the lowest optima. The ¨all-season¨ model produces average optima values in comparison with the individual season models. Most of the species in Fig 3 have a Hill's N2 value higher than 5 (a rule of thumb for selecting reliable optima estimates) [41]. Only two species have Hill's N2 values lower than 5, and in neither case in the ¨all-season¨ model: Mastogloia pumila (3.82, 3.51 and 3.66 in the spring, summer and autumn models, respectively) and Diatoma moniliformis (3.09, 1.66 and 3.33 in the spring, summer and autumn models, respectively).
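For readers unfamiliar with Hill's N2, it is the effective number of occurrences of a taxon, i.e. the inverse of the sum of squared proportions of its abundance across samples. The following Python sketch uses a hypothetical function name and toy data, and assumes sites in rows and taxa in columns.

```python
import numpy as np

def hill_n2(y):
    """Hill's N2 (effective number of occurrences) for each taxon.

    Taxa with zero total abundance would give NaN and should be removed first.
    """
    y = np.asarray(y, dtype=float)
    p = y / y.sum(axis=0)                        # share of each taxon per site
    return 1.0 / (p ** 2).sum(axis=0)

# Toy data: taxon 0 is evenly spread over four sites, taxon 1 occurs at one site.
toy_abund = [[5, 0], [5, 0], [5, 9], [5, 0]]
toy_n2 = hill_n2(toy_abund)
```

An evenly distributed taxon has an N2 equal to its number of occurrences (4 here), whereas a taxon concentrated in a single sample has an N2 of 1, which is why low values flag unreliable optima.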

Fig 2. Nonmetric multidimensional scaling ordination diagram of sites based on Bray–Curtis similarity in diatom composition in spring, summer and autumn.

Average within group dissimilarity spring: 0.77 (grey squares); summer: 0.83 (open circles); autumn: 0.81 (black circles). The results of the statistical tests of community structure are visible in the graph: the groups differed in their relative position (centroids–PERMANOVA test) and not in the dispersion of the sampling points. (Stress = 0.25).

Fig 3. Optima (dots) and tolerances (error bars) of the species that contributed the most to community dissimilarity in the four models: Spring, summer, autumn and all (the three seasons combined).

Table 2. P values of PERMANOVA and test of multivariate dispersions comparing the three different datasets (spring, summer and autumn).

Significance level of α = 0.05.

Table 3. Species that contribute the most to community dissimilarity (based on Bray-Curtis measure) between the three sampling seasons (spring, summer and autumn).

Average dissimilarity (Av. dissim), contribution percentage (contrib %) and cumulative percentage are shown.

The environmental variables NO3, NH4, TN, TP, Cl and Chl a were log-transformed to approximate normality, reducing the skewness of their distributions and hence variance heterogeneity.

When we used whole-year averages of the environmental variables instead of averages for the summer months, none of the variables was sufficient to explain diatom composition in any of the individual seasonal datasets. Therefore, environmental variables were calculated per season and the CCAs were produced for the corresponding seasonal diatom dataset. Salinity thereby exhibits the strongest explanatory power (λ1/λ2 > 1) for diatom community composition in the training dataset: λ1/λ2 (spring) = 1.16; λ1/λ2 (summer) = 1.29; λ1/λ2 (autumn) = 1.25 and λ1/λ2 (all) = 1.22 (Fig 4). Only two environmental variables showed statistically significant differences between the seasons according to one-way ANOVA: tN (F: 3.37; P = 0.038) and pH (F: 6.76; P = 0.002).

Fig 4. Variance partitioning of the training dataset (spring, summer, autumn and all combined).

The independent and shared components of variance explained by the measured environmental variables and the ratio λ1/λ2 as a measure of the explanatory strength of salinity is also shown.

Hierarchical partitioning revealed that, in all seasons sampled, salinity is the variable that explained most of the variation in diatom community composition. Of the total explained variation, salinity explained 25.8% in the spring model (unique: 9.5%), 27.8% in the summer model (unique: 9.7%), 25.2% in the autumn model (unique: 7.2%) and 35% in the ¨all-season¨ model (unique: 9.7%) (Fig 4). The similar and high percentages of total variation explained by chloride and sulphates indicate the correlation between these two variables. A relatively high, and expected, correlation is also found with conductivity (Table 4). Because chloride correlates with sulphates and conductivity to a different degree in each model, it was not possible to apply a rule of thumb to discard collinear variables, since the models would then no longer be comparable. Therefore, we decided to keep all measured environmental variables in all models to see the overall pattern of variance partitioning and to be able to compare the models.

Table 4. Correlations between salinity and the rest of environmental variables in the different models and datasets (spring, summer, autumn, all seasons and validation).

P values are shown for significant Pearson correlations. Temp = temperature; O2 = dissolved oxygen; tN = total nitrogen; tP = total phosphorus; Secchi = Secchi depth; Cond = conductivity; Chl a = chlorophyll a.

The WA-PLS technique revealed that the most parsimonious models for salinity inference are the one-component model for spring and autumn and the two-component model for summer and all seasons combined. The relationship between observed and predicted values of salinity was very strong, with no apparent outliers (Table 5; Fig 5). The distribution of the most common species along the salinity gradient is shown in Fig 6.

Fig 5. Diatom salinity inference models (WA-PLS) showing the relationship between observed and predicted salinity under leave-one-out cross validation for spring, summer, autumn and ¨all-seasons¨ combined.

Fig 6. Relative abundances (percentages) of the most common species along the salinity gradient in the complete dataset (all-seasons).

Table 5. Performance statistics for weighted averaging partial least squares (WA-PLS) based diatom salinity inference models for spring, summer, autumn and all season models.

The validation of the model using an independent dataset is shown in Fig 7. The models to some extent overestimate salinity values at the low end of the gradients and underestimate the values at the high end, which was evident in the residuals vs. observed salinity plot (S1 Fig). The edge effect is clearly improved by the ¨all-season¨ model, providing the most accurate reconstruction out of the four models (Fig 7).
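The kind of bias described here can be summarized with a few simple statistics on the independent dataset. The Python sketch below is illustrative only: the function name, the dictionary keys and the toy values are assumptions. A negative slope of the residuals (predicted minus observed) against the observed values indicates over-estimation at the low end and under-estimation at the high end of the gradient.

```python
import numpy as np

def validation_summary(observed, predicted):
    """RMSEP, mean bias and residual-vs-observed slope for a validation set."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = predicted - observed
    slope, _intercept = np.polyfit(observed, residuals, 1)
    return {
        "rmsep": float(np.sqrt(np.mean(residuals ** 2))),
        "mean_bias": float(residuals.mean()),
        "residual_slope": float(slope),
    }

# Toy predictions that are pulled towards the mean of the observations.
toy_obs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
toy_pred = 2.0 + 0.5 * (toy_obs - 2.0)
toy_summary = validation_summary(toy_obs, toy_pred)
```

In the toy case the mean bias is zero even though every sample except the central one is mis-estimated, which shows why the residual slope is worth reporting alongside the RMSEP.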

Fig 7. Salinity reconstructions using the validation dataset for spring, summer, autumn and ¨all-seasons¨ WA-PLS model.

Discussion

Our study shows that seasonality effects were responsible for important shifts in species dominance in our training set, as is evident from the highly significant PERMANOVA test results. Changes in species relative abundance during seasonal succession are usually advanced by the shift from early colonizers such as the adnate species Navicula perminuta [42] or poor competitors (Ctenophora pulchella, Gomphonema olivaceum) to species competing well under resource limitation in late-successional communities (Cocconeis placentula, Gomphonema parvulum) [43–45]. Intra-species competition can be expected to play an important role in the nutrient-limited situation commonly occurring during late summer in freshwater and brackish environments [46]. Our data showed that total nitrogen significantly decreased in summer and autumn compared to spring. This could reflect changes in nutrient stoichiometry which might be driving differences in seasonal species abundances. Besides competition under resource limitation, inter-specific interactions, such as grazing, are also important factors driving seasonal shifts in species abundances [14].

As a result of these changes in species abundances, the optima and tolerances of species differ depending on whether the samples are collected in spring, summer or autumn, even though salinity did not show statistically significant changes between the seasons (Fig 3). Optima estimated from summer samples tend to be higher and those from autumn samples tend to be lower, while those from spring samples were either higher or lower than the optima provided by the ¨all-season¨ model. Further investigation is needed to confirm whether this seasonality effect can be generalized for chloride and how seasonality affects the optima estimates for other environmental variables. In our dataset, such differences in the contribution to seasonal community dissimilarity were observed for Cocconeis placentula and Rhoicosphenia abbreviata, with optima ranging from 1300–2800 and 900–1600 mg/l chloride based on the autumn and spring models, respectively.

The effect of seasonality investigated here emphasizes a crucial aspect that negatively affects the accuracy of transfer functions: for almost every environmental variable measured, an important fraction of the explained variation is shared with other environmental variables [11]. In our study, even though salinity makes the highest contribution to the total percentage of explained variation, around half of it is shared with other environmental variables, such as sulphates, conductivity, Secchi depth, total phosphorus and temperature. As expected, high correlations between salt ions (chloride and sulphates) and between ions and conductivity were found in all training sets (spring, summer and autumn). The training data collected in summer and autumn are especially prone to the effect of secondary variables, because chloride concentration is significantly related to nutrient variables (total phosphorus), temperature and Secchi depth (Table 4). In summer, phosphorus release from the sediment increases due to sulphate-induced phosphorus mobilization [47], and this is reflected in the positive correlation between ions and total phosphorus. As a consequence of the increased nutrient concentrations in the water, visibility is also affected. Both the spring dataset and the dataset that combines the three seasons are less prone to the effect of secondary variables (e.g. nutrients) and showed lower correlations with potential nuisance variables. Moreover, the ¨all-season¨ model showed the highest percentage of variation explained by salinity (Fig 4). This model also reduces the non-linear distortions at the ends of the gradient (edge effects) that are an inherent problem of all unimodal-based calibration methods using weighted-average estimates [7,36,39,48] (Fig 5).
Although the weighted inverse deshrinking regression incorporated in WA-PLS reduces the edge effect, it has its own problems: it “pulls” the predicted values towards the mean of the calibration set, resulting in an inevitable bias with some over-estimation at low values and some under-estimation at high values, as is evident from S1 Fig [36,39,49]. In the ¨all-season¨ model this bias is reduced because the salinity signal is stronger (the total variation explained by salinity is much higher than that explained by the other variables) and also because the number of samples increased from 32 to 96 (which improved the estimation of species optima). The dataset size of the ¨all-season¨ model falls into the range of ≈ 100 samples suggested by Wilson (1996) [50] for the greatest improvement of the RMSEP. To verify whether the good performance of the ¨all-season¨ model resulted from the higher number of samples compared to the other models, we also built the all-season model by randomly selecting one sample out of the three seasons at each sampling location. The resulting model performance is the same: r² (leave-one-out) = 0.86; RMSEP = 0.19.
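The subsampling check described in the last sentences can be sketched as follows. This is a hypothetical Python helper, not the procedure actually used: it assumes the samples are stored with a location label per row and draws one row at random for each location, so that a reduced all-season training set has the same size as a single-season one.

```python
import numpy as np

def one_sample_per_location(locations, seed=0):
    """Row indices with exactly one randomly chosen sample per location."""
    rng = np.random.default_rng(seed)
    locations = np.asarray(locations)
    picks = [rng.choice(np.flatnonzero(locations == loc))
             for loc in np.unique(locations)]
    return np.sort(np.array(picks))

# 32 locations sampled in three seasons give 96 rows; keep one row per location.
toy_locations = np.repeat(np.arange(32), 3)
toy_rows = one_sample_per_location(toy_locations, seed=42)
```

The selected rows would then be used to refit the model and recompute the cross-validated statistics.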

When we tested the performance of the models using the completely independent validation dataset, we also found that the ¨all-season¨ model performed the best. In the reconstructions all models tend to underestimate the values at the high end of the gradient, and the underestimation is in increasing order from the summer, spring and autumn models (due to the decreasing trend in optima estimation). However, the underestimation of the high salinity values is the lowest in the ¨all-season¨ model. As a result, this model provides the most realistic reconstruction even though some edge effects remain (Fig 7).

The reasons for this offset are several:

  1. the effect of secondary variables in the training sets. The significant correlations between salinity and potential nuisance variables in summer and autumn (total phosphorus, Secchi depth and temperature) range between R2 values of 0.32 and 0.54 (Table 4). According to Juggins (2013) [11], for training sets that exhibit a correlation of 0.2 < R < 0.5 to a nuisance variable, a change in the latter can produce spurious fluctuations in the reconstructed variable that are typically greater than 10 units (for a simulated environmental variable ranging from 1 to 100). In addition, even a low correlation to a nuisance variable (R < 0.2) affects the reconstruction by at least 10 units. This corresponds to 0.1 of the gradient length studied in Juggins [11], which in this study is equivalent to a change of ≈880 mg/l chloride. In the independent validation we found a bias of less than 500 mg/l chloride in around 68%–77% of the observations in the four models, and a bias of 2000–4000 mg/l chloride in around 12% (autumn model) to 4.5% (¨all-season¨ model) of the observations.
  2. the ratio λ1/λ2 in the validation dataset is lower than in the models (λ1/λ2 (validation) = 0.88; S2 Fig). The significant correlation between salinity and nuisance variables in the validation dataset (total nitrogen and dissolved oxygen; Table 4) could be the reason for this lower ratio and could obscure the effect of chloride on the distribution of diatom communities. When transfer functions are used to perform palaeoenvironmental reconstructions, the strength of the relationship between the species and the modelled variable is not known. On the other hand, we do have ways of reducing the error of prediction by taking seasonality effects into account when developing diatom-based transfer functions.
  3. there are always some species mismatches between the transfer function and the reconstruction data, which lessen the accuracy of palaeoenvironmental reconstructions. In our study, indicators of high salinity in the models (Mastogloia pumila, Rhopalodia brebissonii, Navicula arenaria) were absent or very scarce in the validation dataset. This could be due to problems in the taxonomic identification (e.g. species of the genus Navicula), differences in the number of individuals identified between the model and the reconstruction datasets, or ecological factors not taken into account (dispersal, grazing, etc.). In addition, brackish species usually have very wide tolerances because they are well adapted to the fluctuating environmental conditions in shallow brackish habitats [51]. Examples are species of the genus Diatoma [52]. Other species, such as Rhoicosphenia abbreviata and Cocconeis placentula, were present along the whole salinity gradient. These are pollution-tolerant species [53] that do not respond to the salinity changes considered in this study and thus may also introduce noise in the reconstructed values. This could be the reason why the greatest bias in model performance occurs at the high end of the gradient. High salinity observations (>2000 mg/l chloride) represent only 12% of all the samples in the validation dataset, and euhalobe or polyhalobe diatoms may be underrepresented. With the general aim of collecting representative community data, not only the taxonomic identification but also the quality of the indicator may have a considerable impact on (palaeo)environmental reconstructions. It has been suggested that species with very wide tolerances to the environmental variable of interest may therefore be eliminated from transfer functions to increase model performance [54].
This is a very interesting field of research and deserves further investigation, so that appropriate recommendations can be made towards further standardization.

The results presented in this study highlight the risk of developing transfer functions based on environmental variables and/or diatom samples collected in individual seasons only. In palaeoecological studies, sediment core samples contain diatom assemblages that have already been subject to temporal and spatial integration. However, the development of a transfer function always requires a modern dataset. Despite the high number of studies stressing the potential effect of confounding environmental variables and the application of temporally dependent calibrations [11,55,56], the effect of seasonality on the accuracy of diatom-based transfer functions has so far received little attention. Environmental datasets based on samples taken during one season or calibrated against single measurements are very common [6,57,58], while in other cases the sampling time is not specified [3]. Potential inaccuracies and low predictive skills of models based on single-season data from sites with highly variable hydrology and water chemistry have already been proposed [56]. Examples include diatom calibration datasets that use environmental measurements of annual means that do not necessarily reflect the specific concentrations present when diatoms are growing [20]. Regarding the diatom data, it is common to find in the scientific literature samples that do not integrate seasonal variability, for example when diatom samples are collected from substrates such as macrophytes, stones, pebbles or other hard substrates [7,59,60]. In addition, some of the datasets publicly available for transfer function development (European Diatom Database) are composed of mixed epipelic, epiphytic or plankton samples, and have been used in several publications. Comparing the most accurate model (¨all-season¨) with the least accurate one (autumn) in our study, we found an underestimation of chloride concentration of about 42% and 70%, respectively.
The effect of seasonality is increasing the error and uncertainty in the model estimation, in addition to the error caused by nuisance variables. The results of our study, and specifically from the independent validation, demonstrate the importance of a comprehensive dataset that takes on board the differences in species abundances due to seasonality to improve the accuracy of diatom based transfer functions.
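A percentage underestimation of this kind can be expressed as a mean relative bias between inferred and observed chloride on the independent validation sites. The following Python sketch shows one possible way to compute such a figure. The formula, the function name `mean_percent_bias` and all data values are illustrative assumptions; they are not the calculation or the data of this study.

```python
def mean_percent_bias(observed, inferred):
    """Mean of (inferred - observed) / observed, in percent.

    Negative values indicate underestimation by the model.
    This formula is an assumption made for illustration only.
    """
    if len(observed) != len(inferred) or not observed:
        raise ValueError("observed and inferred must be non-empty and equal in length")
    ratios = [(p - o) / o for o, p in zip(observed, inferred)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical chloride concentrations (mg/L) at four validation sites.
observed_cl = [200.0, 500.0, 1000.0, 2000.0]
# Hypothetical model inferences for the same sites.
inferred_cl = [150.0, 300.0, 500.0, 1000.0]

bias = mean_percent_bias(observed_cl, inferred_cl)
print(f"Mean relative bias: {bias:.2f}%")
```

A negative result indicates that the model underestimates chloride on average across the validation sites.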

Conclusions

Assuming that diatom inference models are invariant in space and time has important consequences for the reliability of environmental reconstructions. The present study has shown that seasonality modulates the predictive skills of diatom transfer functions by driving changes in the abundance of the dominant species. The performance of a single-season model (spring, summer or autumn) is reasonably good in cross-validation. Nonetheless, validation of the models with an independent dataset demonstrates that single-season sampling leads to strong underestimation of salinity because of the confounding effect of other environmental variables such as temperature, Secchi depth, nutrients and dissolved oxygen. The strength of salinity as the main environmental variable explaining the distribution of diatom communities is enhanced in the "all-season" model, which combines all data, provides the highest percentage of variation explained by salinity, increases the accuracy of species optima and tolerance estimates, and reduces the non-linear distortions at the ends of the gradient.
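To illustrate why single-season sampling can give inconsistent optima and tolerances, the Python sketch below computes abundance-weighted optima and tolerances for one hypothetical species sampled at the same three sites in two seasons, and then for the pooled data. It assumes simple weighted averaging, which may differ from the model actually fitted in this study. The function name `wa_optimum_tolerance` and all abundance and salinity values are invented for illustration.

```python
import math


def wa_optimum_tolerance(abundances, salinities):
    """Abundance-weighted mean (optimum) and weighted standard
    deviation (tolerance) of a species along a salinity gradient."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("species must occur in at least one sample")
    optimum = sum(a * x for a, x in zip(abundances, salinities)) / total
    variance = sum(a * (x - optimum) ** 2 for a, x in zip(abundances, salinities)) / total
    return optimum, math.sqrt(variance)


# Hypothetical log10 chloride at three sites (same sites in each season).
log_cl = [2.0, 2.5, 3.0]

# Hypothetical relative abundances (%) of one species per season.
season_abundance = {
    "spring": [10.0, 30.0, 60.0],
    "autumn": [60.0, 30.0, 10.0],
}

results = {
    season: wa_optimum_tolerance(abund, log_cl)
    for season, abund in season_abundance.items()
}

# Pooled ("all-season") estimate: concatenate the samples of all seasons.
pooled_abund = [a for abund in season_abundance.values() for a in abund]
pooled_cl = log_cl * len(season_abundance)
results["all-season"] = wa_optimum_tolerance(pooled_abund, pooled_cl)

for name, (opt, tol) in results.items():
    print(f"{name}: optimum={opt:.3f}, tolerance={tol:.3f}")
```

In this toy case the two single-season optima differ by half a log unit, while the pooled estimate lies between them and has a wider tolerance, because it averages over the seasonal shift in abundance.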

One of the basic requirements for quantitative palaeoenvironmental reconstruction is the availability of a large, high-quality data set of modern species assemblages and contemporary environmental data. Given the strong effect of seasonality on the composition of diatom assemblages, both the development of transfer functions that capture the intra-annual variability of species abundances and independent model validation are strongly recommended.

Supporting information

S1 Fig. Residuals vs observed salinity with corresponding R² values in the four models: spring, summer, autumn and all-season.

The overestimation of salinity at the low end and the underestimation of salinity at the high end of the gradient are especially evident in the spring and autumn models. When all seasons are combined, these edge effects are less prominent and the model performs more accurately.
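The edge effect described in this caption is a general property of averaging-based calibration: inferred values are averages of species optima, which are themselves averages of observed values, so inferences are pulled towards the middle of the gradient. The Python sketch below reproduces this pattern with simple weighted averaging without a deshrinking step. The function names `wa_optima` and `wa_infer` and the toy data are illustrative assumptions and do not represent the models or data of this study.

```python
def wa_optima(abundance, salinity):
    """Weighted-averaging optimum per species.

    abundance[k][i] is the abundance of species k at site i.
    """
    return [
        sum(a * x for a, x in zip(row, salinity)) / sum(row)
        for row in abundance
    ]


def wa_infer(abundance, optima):
    """Infer salinity per site as the abundance-weighted mean of optima
    (no deshrinking applied)."""
    n_sites = len(abundance[0])
    inferred = []
    for i in range(n_sites):
        total = sum(row[i] for row in abundance)
        weighted = sum(row[i] * u for row, u in zip(abundance, optima))
        inferred.append(weighted / total)
    return inferred


# Hypothetical salinity gradient (arbitrary units) over five sites.
salinity = [1.0, 2.0, 3.0, 4.0, 5.0]

# Hypothetical abundances (%) of three species at the five sites.
abundance = [
    [60.0, 40.0, 10.0, 0.0, 0.0],    # low-salinity species
    [30.0, 40.0, 60.0, 40.0, 30.0],  # mid-gradient species
    [10.0, 20.0, 30.0, 60.0, 70.0],  # high-salinity species
]

optima = wa_optima(abundance, salinity)
inferred = wa_infer(abundance, optima)
residuals = [p - o for p, o in zip(inferred, salinity)]

for obs, res in zip(salinity, residuals):
    print(f"observed={obs:.1f}  residual={res:+.2f}")
```

The residuals are positive at the low end of the gradient and negative at the high end, which is the pattern the figure shows most clearly for the spring and autumn models.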


S2 Fig. Comparison of the percentage of unique and shared explained variation by salinity in the different datasets (spring, summer, autumn, all seasons and validation datasets).

The ratio of λ1/λ2 is also shown for each case.

Acknowledgments

Special thanks to Gert van Ee (Hoogheemraadschap Hollands Noorderkwartier) and Herman van Dam (Water en Natuur) for providing the data set used to build the models, and to Hoogheemraadschap Hollands Noorderkwartier for providing the ecological monitoring data (validation dataset). We also thank Richard Telford and one anonymous reviewer for their interesting comments and suggestions on the manuscript.

References

  1. Bennion H. A diatom-phosphorus transfer function for shallow, eutrophic ponds in southeast England. Nutrient Dynamics and Biological Structure in Shallow Freshwater and Brackish Lakes: Springer; 1994. p. 391–410.
  2. ter Braak CJ, van Dam H. Inferring pH from diatoms: a comparison of old and new calibration methods. Hydrobiologia 1989;178(3):209–223.
  3. Tibby J. Development of a diatom-based model for inferring total phosphorus in southeastern Australian water storages. J Paleolimnol 2004;31(1):23–36.
  4. Diatoms as a tool for reconstructing sedimentary environments in coastal wetlands; methodological aspects. Twelfth International Diatom Symposium: Springer; 1993.
  5. Reavie ED, Axler RP, Sgro GV, Danz NP, Kingston JC, Kireta AR, et al. Diatom-based weighted-averaging transfer functions for Great Lakes coastal water quality: relationships to watershed characteristics. J Great Lakes Res 2006;32(2):321–347.
  6. Szkornik K, Gehrels WR, Kirby JR. Salt-marsh diatom distributions in Ho Bugt (western Denmark) and the development of a transfer function for reconstructing Holocene sea-level changes. Mar Geol 2006;235(1):137–150.
  7. Wachnicka A, Gaiser E, Collins L, Frankovich T, Boyer J. Distribution of diatoms and development of diatom-based models for inferring salinity and nutrient concentrations in Florida Bay and adjacent coastal wetlands of south Florida (USA). Estuaries and Coasts 2010;33(5):1080–1098.
  8. Benito X, Trobajo R, Ibáñez C. Benthic diatoms in a Mediterranean delta: ecological indicators and a conductivity transfer function for paleoenvironmental studies. J Paleolimnol 2015;54(2–3):171–188.
  9. Brodersen KP, Anderson N. Distribution of chironomids (Diptera) in low arctic West Greenland lakes: trophic conditions, temperature and environmental reconstruction. Freshwat Biol 2002;47(6):1137–1157.
  10. Davidson TA, Sayer CD, Perrow M, Bramm M, Jeppesen E. The simultaneous inference of zooplanktivorous fish and macrophyte density from sub‐fossil cladoceran assemblages: a multivariate regression tree approach. Freshwat Biol 2010;55(3):546–564.
  11. Juggins S. Quantitative reconstructions in palaeolimnology: new paradigm or sick science? Quaternary Science Reviews 2013;64:20–32.
  12. Jackson ST, Williams JW. Modern analogs in Quaternary paleoecology: here today, gone yesterday, gone tomorrow? Annu Rev Earth Planet Sci 2004;32(1):495.
  13. Reynolds CS. The ecology of phytoplankton. Cambridge University Press; 2006.
  14. Benincà E, Huisman J, Heerkloss R, Jöhnk KD, Branco P, Van Nes EH, et al. Chaos in a long-term experiment with a plankton community. Nature 2008;451(7180):822–825. pmid:18273017
  15. Reynolds CS. Physical determinants of phytoplankton succession. Plankton ecology: Springer; 1989. p. 9–56.
  16. Litchman E, Klausmeier CA. Trait-based community ecology of phytoplankton. Annual Review of Ecology, Evolution, and Systematics 2008:615–639.
  17. Sommer U, Gliwicz ZM, Lampert W, Duncan A. The PEG-model of seasonal succession of planktonic events in fresh waters. Arch Hydrobiol 1986;106(4):433–471.
  18. Philippart CJ, Cadée GC. Was total primary production in the western Wadden Sea stimulated by nitrogen loading? Helgol Mar Res 2000;54(2–3):55–62.
  19. Sayer CD. Problems with the application of diatom‐total phosphorus transfer functions: examples from a shallow English lake. Freshwat Biol 2001;46(6):743–757.
  20. Bradshaw EG, Anderson NJ, Jensen JP, Jeppesen E. Phosphorus dynamics in Danish lakes and the implications for diatom ecology and palaeoecology. Freshwat Biol 2002;47(10):1963–1975.
  21. Bennion H, Johnes P, Ferrier R, Phillips G, Haworth E. A comparison of diatom phosphorus transfer functions and export coefficient models as tools for reconstructing lake nutrient histories. Freshwat Biol 2005;50(10):1651–1670.
  22. Leorri E, Fatela F, Cearreta A, Moreno J, Antunes C, Drago T. Assessing the performance of a foraminifera-based transfer function to estimate sea-level changes in northern Portugal. Quatern Res 2011;75(1):278–287.
  23. Telford R, Birks H. The secret assumption of transfer functions: problems with spatial autocorrelation in evaluating model performance. Quaternary Science Reviews 2005;24(20):2173–2179.
  24. Vilar AG, van Dam H, Vonk JA, van der Geest HG, Admiraal W. Typology of diatom communities in the Dutch delta: Recognizing patterns of environmental drivers in nutrient rich ditches. Ecol Ind 2014;45:561–569.
  25. Bijkerk R, Beers M. Handboek Hydrobiologie: biologisch onderzoek voor de ecologische beoordeling van Nederlandse zoete en brakke oppervlaktewateren. Rapport 2010–28: Stichting Toegepast Onderzoek Waterbeheer Amersfoort; 2010.
  26. Krammer K, Lange-Bertalot H, Krammer K, Lange-Bertalot H, Bate N, Podzorski A, et al. Bd. 2: Bacillariophyceae. Stuttgart [etc.]: Fischer Verlag [etc.]; 1986.
  27. Hofmann G, Werum M, Lange-Bertalot H. Diatomeen im Süßwasser-Benthos von Mitteleuropa: Bestimmungsflora Kieselalgen für die ökologische Praxis; über 700 der häufigsten Arten und ihre Ökologie. Gantner [ua]; 2011.
  28. Anderson MJ. A new method for non‐parametric multivariate analysis of variance. Austral Ecol 2001;26(1):32–46.
  29. Anderson MJ. Distance‐Based Tests for Homogeneity of Multivariate Dispersions. Biometrics 2006;62(1):245–253. pmid:16542252
  30. Anderson MJ, Walsh DC. PERMANOVA, ANOSIM, and the Mantel test in the face of heterogeneous dispersions: What null hypothesis are you testing? Ecol Monogr 2013;83(4):557–574.
  31. Gijbels I, Omelka M. Testing for homogeneity of multivariate dispersions using dissimilarity measures. Biometrics 2013;69(1):137–145. pmid:23002793
  32. Clarke K, Ainsworth M. A method of linking multivariate community structure to environmental variables. Marine Ecology-Progress Series 1993;92:205–205.
  33. Oksanen J, Blanchet FG, Kindt R, Legendre P, Minchin PR, O’hara R, et al. Package ‘vegan’. Community ecology package, version 2013;2(9).
  34. Fritz S, Juggins S, Battarbee R. Diatom assemblages and ionic characterization of lakes of the northern Great Plains, North America: a tool for reconstructing past salinity and climate fluctuations. Can J Fish Aquat Sci 1993;50(9):1844–1856.
  35. ter Braak CJ, Šmilauer P. Canoco reference manual and user's guide: software for ordination, version 5.0. Microcomputer power; 2012.
  36. ter Braak CJ, Juggins S. Weighted averaging partial least squares regression (WA-PLS): an improved method for reconstructing environmental variables from species assemblages. Hydrobiologia 1993;269(1):485–502.
  37. Birks H. Quantitative palaeoenvironmental reconstructions. Statistical modelling of quaternary science data. Technical guide 1995;5:161–254.
  38. Birks H, Line J, Juggins S, Stevenson A, Ter Braak C. Diatoms and pH reconstruction. Philosophical Transactions of the Royal Society B: Biological Sciences 1990;327(1240):263–278.
  39. Birks H. DG Frey and ES Deevey Review 1: Numerical tools in palaeolimnology–Progress, potentialities, and problems. J Paleolimnol 1998;20(4):307–332.
  40. Juggins S. C2 data analysis. University of Newcastle, England 2003.
  41. Telford R, Trachsel M. Significance Tests for Palaeoenvironmental Reconstructions. 2015.
  42. Majewska R, Convey P, De Stefano M. Summer Epiphytic Diatoms from Terra Nova Bay and Cape Evans (Ross Sea, Antarctica)-A Synthesis and Final Conclusions. PloS one 2016;11(4):e0153254. pmid:27078637
  43. McCormick PV, Stevenson RJ. Mechanisms of benthic algal succession in lotic environments. Ecology 1991:1835–1848.
  44. Soininen J, Eloranta P. Seasonal persistence and stability of diatom communities in rivers: are there habitat specific differences? Eur J Phycol 2004;39(2):153–160.
  46. Hillebrand H, Sommer U. Response of epilithic microphytobenthos of the Western Baltic Sea to in situ experiments with nutrient enrichment. Mar Ecol Prog Ser 1997;160:35–46.
  47. Smolders A, Lamers L, Lucassen E, Van der Velde G, Roelofs J. Internal eutrophication: how it works and what to do about it—a review. Chem Ecol 2006;22(2):93–111.
  48. Bradshaw EG, Anderson NJ. Validation of a diatom–phosphorus calibration set for Sweden. Freshwat Biol 2001;46(8):1035–1048.
  49. Lotter AF, Birks HJB, Hofmann W, Marchetto A. Modern diatom, cladocera, chironomid, and chrysophyte cyst assemblages as quantitative indicators for the reconstruction of past environmental conditions in the Alps. I. Climate. J Paleolimnol 1997;18(4):395–420.
  50. Wilson SE, Cumming BF, Smol JP. Assessing the reliability of salinity inference models from diatom assemblages: an examination of a 219-lake data set from western North America. Can J Fish Aquat Sci 1996;53(7):1580–1594.
  51. Woelfel J, Schoknecht A, Schaub I, Enke N, Schumann R, Karsten U. Growth and photosynthesis characteristics of three benthic diatoms from the brackish southern Baltic Sea in relation to varying environmental conditions. Phycologia 2014;53(6):639–651.
  52. Ryves D, McGowan S, Anderson N. Development and evaluation of a diatom‐conductivity model from lakes in West Greenland. Freshwat Biol 2002;47(5):995–1014.
  53. Saliba FM, Ghobara MM, Attard E, Ellul B. Primary Study of the Non-Marine Epilithic Diatom Communities of Malta and Gozo. Int J Curr Microbiol App Sci 2016;5(10):69–78.
  54. Racca JM, Gregory-Eaves I, Pienitz R, Prairie YT. Tailoring palaeolimnological diatom-based transfer functions. Can J Fish Aquat Sci 2004;61(12):2440–2454.
  55. Birks H. DG Frey and ES Deevey Review 1: Numerical tools in palaeolimnology–Progress, potentialities, and problems. J Paleolimnol 1998;20(4):307–332.
  56. Eloranta P, Soininen J. Ecological status of some Finnish rivers evaluated using benthic diatom communities. J Appl Phycol 2002;14(1):1–7.
  57. Roberts D, McMinn A. A weighted-averaging regression and calibration model for inferring lakewater salinity from fossil diatom assemblages in saline lakes of the Vestfold Hills: a new tool for interpreting Holocene lake histories in Antarctica. J Paleolimnol 1998;19(2):99–113.
  58. Wilson SE, Cumming BF, Smol JP. Assessing the reliability of salinity inference models from diatom assemblages: an examination of a 219-lake data set from western North America. Can J Fish Aquat Sci 1996;53(7):1580–1594.
  59. Philibert A, Gell P, Newall P, Chessman B, Bate N. Development of diatom-based tools for assessing stream water quality in south-eastern Australia: assessment of environmental transfer functions. Hydrobiologia 2006;572(1):103–114.
  60. Tibby J, Gell PA, Fluin J, Sluiter IR. Diatom–salinity relationships in wetlands: assessing the influence of salinity variability on the development of inference models. Hydrobiologia 2007;591(1):207–218.