Coupling Demographic and Genetic Variability from Archived Collections of European Anchovy (Engraulis encrasicolus)

It is well known that temporal fluctuations in small populations deeply influence evolutionary potential. Less well known is whether fluctuations can influence the evolutionary potentials of species with large census sizes. Here, we estimated genetic population parameters from as survey of polymorphic microsatellite DNA loci in archived otoliths from Adriatic European anchovy (Engraulis encrasicolus), a fish with large census sizes that supports numerous local fisheries. Stocks have fluctuated greatly over the past few decades, and the Adriatic fishery collapsed in 1987. Our results show a significant reduction of mean genetic parameters as a consequence of the population collapse. In addition, estimates of effective population size (Ne) are much smaller than those expected in a fishes with large population census sizes (Nc). Estimates of Ne indicate low effective population sizes, even before the population collapse. The ratio Ne/Ne ranged between 10−6 and 10−8, indicating a large discrepancy between the anchovy gene pool and population census size. Therefore, anchovy populations may be more vulnerable to fishery effort and environmental change than previously thought.


Introduction
Population dynamics are driven by a complex set of ecological and evolutionary variables that mold population demography on several spatial and temporal scales [1]. Unfortunately, a temporal dimension is not always considered in the assessment of population structure, because historical samples are not always available, or because of the high costs of retrospective laboratory analysis [reviewed in 2]. Among vertebrates, populations of bony fishes are most often examined for historical trends because of the availability of historical tissues [3], which are sometimes available from archeological excavations [3] and from archived fish scales and otoliths used for aging by fishery managers [4]. The extraction of DNA from these archived Ricerche) and were collected 1978-2000. After removal from individuals for age determination, otoliths were stored individually in plastic tubes at room temperature. We analyzed a time series of samples from two spawning grounds, the northern Adriatic (Chioggia) from 1978, 1987, 1994 and 2000, and the middle-southern Adriatic (Vieste) from 1985, 1987 and 1989 (Fig 1). Samples from these years were chosen to include the biomass fluctuations starting in the mid 1970s. This objective was easily fulfilled for samples from Chioggia, because of a more complete time series. Unfortunately, the Vieste time series was available for only 1985-1989. We used otoliths collected between May and September to include individuals collected only during the spawning period. These time series were compared with 73 contemporary fish collected in 2010 in surveys off Chioggia and Vieste. The number of specimens used for each sampling year and location appears in S1 Table. Ethics statement Ethical procedures were not required to manipulate specimens in this study. Tissues used in this study were recovered from previously collected otoliths and from Italian commercial catches.

Genomic DNA extraction
Genomic DNA was extracted from tissue residuals on otolith following the method described in [28]. Each set of otoliths was incubated at 55°C (up to 5 hours) in 500 μL of digestion buffer (100 mmolÁL -1 Tris-HCl, pH 8.0; 100 mmolÁL -1 NaCl; 1 mmolÁL -1 EDTA, pH 8.0; 0.5% SDS). EDTA and SDS concentrations in the digestion buffer were reduced according to [29]. Fin or caudal muscle tissues from contemporary samples were used for DNA extraction by standard phenol-chloroform procedures described in [30].

PCR amplification and genotyping
Samples were screened at 7 microsatellite loci that yielded short alleles less than 200 bp, a feature enhancing the ability to amplify degraded DNA from archived tissues. Primers for 6 of these loci were obtained from the European anchovy genome described by [31] (Ee10) and [32] (Ee2-508, Ee2-165b, Ee2-135, Ee2-407 and Ee2-91b). Primers for the seventh locus (Eja183) was described by [33] in the Japanese anchovy (Engraulis japonicus) genome (S2 Table). However, new primers were needed to obtain molecular sizes smaller than those produced by the original primers. Hence, primer sequences for these loci, except Ee2-91b and Ee2-135, were re-designed using the online program Primer3Plus (http://www.bioinformatics.nl/ cgi-bin/primer3plus/primer3plus.cgi/) [34] from their NCBI (National Center for Biotechnology Information) clone sequence references (S2 Table). Attempts to develop additional primers producing shorter alleles to increase the number of loci failed because sequence constraints did not allow the design of new primers.
PCR conditions were optimized for all loci using touchdown amplification. The PCR reaction mixture contained approximately 5-10 ng genomic DNA, 0.25 U Taq DNA polymerase (MyTaq, Bioline), 0.5 μmolÁL -1 of each primer, and 1× MyTaq (Bioline) Reaction buffer (15 mmolÁL -1 MgCl 2 , 1.25 mmolÁL -1 of each dNTP, plus stabilizers and enhancers) in a total volume of 10 μL. Each PCR plate included a blank control, including an extra control from each DNA extraction. PCR products were separated on a 2% (w/v) agarose gel and stained with GelRed™ (Biotium, Inc.) to check for size and PCR specificity. Subsequently, these products were run on a 5% denaturing polyacrylamide sequencing gel and visualized by a silver staining protocol [35]. Genotyping procedures were described in [6].

Accuracy of extraction and genotyping archived DNA
Archived DNA is particularly prone to genotyping errors due to i) its degraded nature increases the risks for allele dropout and null alleles and ii) potential contamination from exogenous DNA. To address these issues, a standardized protocol was used for DNA extraction and genotyping. First, archived DNAs were extracted in an ancient-DNA room equipped with sterilized laboratory utensils regularly sterilized and supplies dedicated to analyzing archived materials. Secondly, DNA extractions were performed in clusters of individuals from the same locality and year. These precautions helped to avoid potential contamination of DNA between individuals from different years and sampling sites. To further reduce sample contamination, no contemporary anchovy samples were processed while working on archived samples.
Genotyping errors occur primarily during PCR amplification by increasing the number of DNA copies from non-target individuals. Hence, PCRs were carried out in a separate ancient-DNA room, furnished with PCR workstations, set of pipettes, reagents and thermal cyclers used exclusively to amplify archived DNA. PCRs were carried out in single tubes and individuals were amplified in clusters belonging to the same sampling site and year. Additional attention was paid to the number of individuals amplified per PCR; less than 1/4 of the capacity of each thermal cycler was used to maximize the space between tubes. Thermal cyclers were sterilized by UV light and by cleaning with 10-20% bleach before PCR.
Contamination was tested by screening PCR products from two microsatellite loci, Ee2-91b and Ee2-165b. These loci show clear allelic patterns and were used to eliminate individuals that showing multiple alleles indicating contamination (S1 Table). Faint genotypes from uncontaminated individuals were initially entered as missing data. DNA from these individuals was amplified three times using a progressively stringent PCR protocol. This was accomplished by increasing the concentration of Taq Polymerase in each reaction (0.05 μl, 0.1 μl, 0.2 μl on a final volume of 10 μl) and by increasing the annealing temperature by 0.5°C per PCR.
Genotypes were not clearly detectable after these three trials were excluded from the dataset. A set of 10 individuals were randomly selected from each site and genotyped again as quality controls.
Finally, the probability of identity, P ID , and probability of siblings, P ID(sib) , were used to test whether contamination biased genotype quality of our dataset. P ID , the probability that two unrelated individuals in a dataset will have the same multilocus genotype by chance, and P ID (sib) , an analogous that takes inbreeding into account, were calculated with GenAlEx 6.502 [36]. Since we analyzed years and sampling locations independently, we assumed that DNA contamination would only be by individuals in the same sample. Therefore, cross-contaminated DNA should share more common genotypes than non-contaminated DNA. Thresholds for detecting contaminated samples were set at P ID > 0.001-0.0001 and P ID(sib) > 0.01, as proposed by [37].

Statistical treatment of data
We estimated genetic diversities from genotypic and allelic frequencies only after checking the quality of the genotypes (amplification success rate and genotype consistency) and the occurrence of null alleles and other genotyping errors (allele dropout and stutter peaks) using MICRO-CHECKER 2.2.1 [38]. Loci affected by null alleles were corrected following the Brookfield algorithm [39]. Mean number of alleles observed for each locus (N A ), allelic richness (R S ), observed (H o ) and expected (H e ) heterozygosities, and the inbreeding coefficient (F IS ) were estimated with FSTAT 2.9.3 [40]. Linkage disequilibrium between loci was tested with a Monte Carlo Markov Chain (MCMC) test executed by 1000 batches of 2000 iterations each, using Genepop 4.0.10 [41]. A sequential Bonferroni adjustment of P-values was applied to account for an increase in type-I error from multiple comparisons [42]. Outlier loci were detected by departures from the neutral expectations with fdist [43] implemented in Lositan [44] and a LnRH method developed for microsatellites [45] (S1 File).
Temporal variation in genetic diversity was quantified using estimates of expected (H e ) and observed (H o ) heterozygosity and the expected number of alleles (N A ). Since estimates of genetic diversity can be biased by differences in sample sizes, the software POPTOOLS 3.1.0 [46] (http://www.cse.csiro.au/poptools) was used to standardize and analyze temporal trends in these variables. Estimates were standardized by generating 1000 'real' samples (n = 24) with sampling without replacement for each year and location (Chioggia and Vieste). A total of 9000 "real" samples were generated. In addition, 9000 "randomized" samples were generated pooling together the samples from the Chioggia and Vieste time series. Statistical significances of temporal trends were estimated by calculating the slope (b) and the Pearson's correlation coefficient (r) of linear regressions of H e , H o , and N A against years. The r and b estimates of "real" parameters for each year in both the Chioggia and Vieste time series were compared with randomized values, using MCMC chains of 1000 iterations. We evaluated the significance of trends in H e , H o , and N A over the entire time-series and between consecutive years.
Effective population size (N e ) was estimated using a moment-based temporal method and a coalescent likelihood-based method. These methods are generally robust and are often used on historical time series [13]. Both temporal methods use changes in allele frequencies between two samples separated by a known number of generations. Since anchovies become mature at one year of age [24], one generation per year was assumed. Following [47], N e was also estimated over the whole timeframe of each time series to detect consistency with estimates obtained from intermediate temporal timeframes. NE-ESTIMATOR 2.01 [48] was used to estimate moment-based N e values. We implemented the following procedures: i) we used Plan II sampling in which individuals were removed after sampling, ii) we used the Pollak formula [49] for computing the standardized variance in allele frequency (F K ) to estimate N e values, iii) we estimated 95% confidence intervals from jackknifing, and iv) we used a threshold of 0.01 as the lowest allele frequency considered to estimate N e values, because allele frequencies were seldom lower than 0.01. The TM3 software [50] was used to obtain coalescent likelihood-based estimates of N e . TM3 uses a Bayesian approach based on coalescent theory and MCMC simulations to generate a posterior distribution of N e values. In addition, we used maximum N e values (Ne MAX) and the number of generations between consecutive samples (T) as priors in the TM3 simulations. To test for consistency between simulations, three independent runs were performed using Ne MAXes equal to 1000, 5000, and 10,000. Simulations were performed with 20,000 MCMC iterations and were conducted separately for Chioggia and Vieste samples.
Estimates of N e from the moment-based temporal method were used to provide temporal trends in N e /N c ratios from both time series. Estimates of N c were obtained from annual acoustic surveys (MEDIAS Project, http://www.medias-project.eu; Fig 2) [22,23] for stocks in the northern and middle-southern Adriatic Sea.
Datasets were tested for genetic population bottleneck signatures with a heterozygosityexcess based method implemented in BOTTLENECK 1.2 [51]. This test assumes that during a strong reduction in population size, allele numbers decline faster than H e . Since expected heterozygosity at mutation-drift equilibrium (H eq ) is calculated from the number of observed alleles, H e becomes larger than H eq, leading to heterozygosity excess (H exc ) [52]. Values of H exc were tested with Wilcoxon's signed rank tests [53], a powerful and robust test when the dataset contains few (< 20) polymorphic loci [51]. We used 1000 iterations and three mutational models: infinite allele model (IAM), stepwise mutation model (SMM), and a two phase mutation model (TPM) with 95% single-step mutations and 5% multistep mutations, as recommended in [51].
In addition, purported bottlenecks were verified by simulating bottlenecked populations using BOTTLESIM 2.6 [54]. Genotypic data for CH78 and VI85 were used independently as founder populations to simulate bottleneck events lasting 32 and 25 generations, respectively. Population declines were tested using 1000 iterations with the following criteria: dioecious organisms with random mating system, balanced sex ratio (1:1), two years of expected lifespan for the species, given its high natural mortality [19], one year at first reproduction [24], and 90% generational overlap. The moment-based N e estimates from NE-ESTIMATOR were used as analogues of population size. Simulated values of H e and N A were plotted and compared with observed H e and N A estimates to provide evidence of a population bottleneck.
Finally, we estimated the degree of geographic and temporal variability among samples first with pairwise values of θ ST [55], calculated with FSTAT and second with a Bayesian approach, implemented in STRUCTURE 2.3.2.1 [56,57]. The second approach estimates the number of populations (K) under Hardy-Weinberg expectations and linkage equilibrium. The most probable number was tested using priors ranging from K = 1 to K = 6, under an admixture model and with correlated allele frequencies. Ten independent runs were performed for each K using an MCMC of 500,000 iterations after a burn-in of 50,000 iterations.

Results
Missing genotypes accounted for 7.51% of the overall dataset. A pool of 90 individuals were randomly chosen (10 individuals per sampling year and site) to re-genotype 7 of the screened loci (a total of 630 control genotypes were produced). We were unable to PCR-amplify 2.3% of control genotypes (15 of 630), but the remaining 615 genotypes were consistent with their first molecular size detection. There was a lack of allele dropout and stuttering throughout the entire dataset. The occurrence of null alleles was detected in 16 of 63 tests, and after the Brookfield correction [39] 6 of 63 remained significant. Since the Brookfield correction improved our dataset, and no loci showed a systematic presence of null alleles, the entire corrected dataset was used for subsequent statistical analyses.
The results of the BOTTLENECK analysis using the IAM mutation model showed significant tests for heterozygosity excesses (H exc ) in every sample for both "one-tail" and "twotailed" Wilcoxon's tests (Table 3). In contrast, tests using the SMM and TPM mutation models Table 1. Temporal moment-based effective genetic size estimates. N G = number of generations between a pair of samples. N e = effective genetic size estimate. Lower and upper 95% CI represents a 95% Confidence Interval (CI) for N e estimated for each pair of samples. N e/ N c ratio = ratio between effective population size (estimated from temporal moment-based method) and census size (obtained from MEDIAS acoustic survey data; Fig 2). NA = not available data. were not significance ( Table 3). The Shift-Mode tests revealed normal L-shaped allele frequency distributions in all the samples. Simulated population bottlenecks indicated the likely occurrence of bottlenecks in both the Chioggia and Vieste time series, based on comparisons between simulated and observed expected heterozygosity (H e ) values. Similar values between simulated and observed H e were found for CH78, CH94, and CH00 in the Chioggia time series (Fig 4a), whereas H e values for CH87 and CH10 were outside the confidence interval (Fig 4a). In the Vieste time series, simulations showed similar simulated and observed H e values for VI85, VI87, and VI89, whereas VI10 was placed outside the confidence interval (Fig 4b). Observed values of N A were consistent with simulated N A for CH78 and CH87 in the Chioggia time series and for VI85 and VI87 in the Vieste time series.
Estimates of θ ST revealed 9 significant tests among 36 total pairwise comparisons (Table 4). These significant comparisons were largely between localities and years (i.e. significant tests were CH87-VI85, CH94-VI85, CH94-VI89, CH00-VI85, CH00-VI89, CH10-VI89) ( Table 4). Genetic variation varied little with time within each time series. In fact, in the Chioggia time   series only the CH94-CH00 comparison was statistically significant, whereas two tests (VI85-VI89 and VI85-VI10) were significant in the Vieste time series (Table 4). STRUCTURE analysis showed a lack of genetic structure between and within the Chioggia and Vieste time series. The greatest likelihood indicated a single population, K = 1 [LnP(K) = -10843.28] (S1 Fig). Outcomes from the neutrality tests are available in S1 File and in S5 and S6 Tables.

Discussion
The extent of genetic variability estimated from both time series revealed the effectiveness of using archived DNA for population genetics studies. Otolith collections represent valuable sources of material for historical comparisons of genetic variation among locations or temporal variation within locations. Major challenges with the use of archived tissues are the reliability of results in light of potential DNA contamination and the level of polymorphism of the molecular marker. In the first case, we took considerable precautions to avoid DNA contamination. Genotypes were consistent between initial genotyping and re-genotyping some numerous individuals, indicating genotype quality. The amount of missing data was similar with that in other studies in which archived DNA from otoliths was used [6,[58][59][60]. In addition, the comparison between polymorphisms detected in the oldest and newest samples confirmed no age-related failure in genotyping. A new set of PCR primers for seven microsatellite loci yielding products < 200 bp produced a high degree of amplification success, with amplification success was between 68.75% and 100%. The number of loci genotyped in this study is similar to the mean number of loci typed in other archived DNA studies [6,[58][59][60].

Loss of genetic variability as a consequence of demographic collapse
Temporal trends in genetic variables are concordant with abundance trends in the Chioggia and Vieste time series. However, the Chioggia time series showed greater changes in genetic variables than the Vieste time series. In the Chioggia time series, a significant reduction in H e appeared after the population collapse in 1987 and lasted until 1994. This observation was additionally confirmed by a decrease in N A from 1978 to 1994 and supports the idea of different responses of H e and N A to demographic declines [61]. After 2000, both H e and N A increased in the Chioggia time series in association with a growing population biomass in northern Adriatic Sea [22]. Values of H e also declined in the middle-southern Adriatic basin (Vieste), but this drop was not statistically significant. However, values of N A increased significantly after 1989. These results clearly show the influence of demographic fluctuations on genetic variability of Adriatic anchovies, which has also been observed in other marine populations [6,62]. These results add to a growing number of studies showing a large contrast between the genetic effective population size and census in marine species [63]. Even though a significant reduction in H e in northern Adriatic anchovies is consistent with a bottleneck in N e and with a recent 10 to 20-fold drop in fishery biomass [19,20,22], the results from the heterozygosity excess test did not support this scenario. This outcome may be due to two mechanisms. First, the loss of genetic diversity after demographic decline may not have been strong enough to produce genetic profiles leading to heterozygosity excess. This weak-effect hypothesis, however, is inconsistent with the temporal drop in H e in response to large declines in population abundance [64]. Second, the lack of significance may have been due to low statistical power of the test to detect heterozygosity excess with our dataset. Peery et al. [65] reviewed 703 populations belonging to 116 vertebrate species and concluded that even in species with strong population declines these tests often failed to detect bottlenecks [65]. The low statistical power of bottleneck tests results in many cases from little temporal variation in N e [65], and this is consistent with our observations of a relatively small range of N e among samples from Chioggia from 1978 to 2000, as well as that among samples from Vieste from 1985 to 1989.
The use of demographic simulations with BOTTLESIM showed that most of the observed H e values between 1978 and 2000 in Chioggia and between 1985 and 1989 in Vieste, may be consistent with the loss of genetic variability as a consequence of a bottleneck. Observed N A values between 1978 and 1987 in Chioggia and between 1985 and 1987 in Vieste were close to simulated values. These simulations also indicated a decrease in genetic diversity in the late 1980s that is consistent with bottlenecks in population size.
Reduced genetic variability as a consequence of a bottleneck, especially in the northern Adriatic Sea, is also indicated by significant deviations from Hardy-Weinberg expectations for Chioggia in 1978 and 1987, as a consequence of homozygote excess. These genetic signals support the hypothesis that a reduction in genetic variability began earlier than the decline in effective population size. Although unlikely in small pelagic fishes, other marine species characterized by high fecundities and a high variance in reproductive success (sweepstakes recruitment) often show signs of inbreeding [7,66,67].

Evolutionary potential and effective population sizes in anchovies
The MCMC methods used to evaluate temporal variability in N e in the Chioggia and the Vieste time series were characterized by good converges for this parameter, which gives some confidence in the results. We observed low levels of effective population sizes (N e ) of only 100s of fish in the pre-collapse and early post-collapse periods in both historical time series. These results indicate that the reductions in genetic diversity were largely driven by genetic drift [13] and that genetic diversity dropped before the population collapse in 1987. Estimates of N e from pre-collapse years were as low as those for 1987. Our global estimates of N e for Adriatic anchovy indicate values of N e between one hundred and a few thousand fish, values that have previously been reported in other small pelagics, such as European sardines [6,12,27], and in other marine organisms characterized by high fecundity, type III survivorship and sweepstakes recruitment [8,62,66]. These small effective population sizes challenge the idea that marine populations are invulnerable and inexhaustible resources by virtue of their huge census sizes [8,12].
Estimates of N e /N c in Adriatic anchovies was between 10 −6 and 10 −8 orders of magnitude and are smaller than those typical of many marine species (N e /N c ! 10 −5 ) [13]. These small ratios indicate that the large variance in reproductive success can lead to the demographic instability. Although the primary intent of this article was not to provide advice to management, it is nevertheless important to compare our temporal estimates of N e with conservation guidelines provided by the revised 50/500 rule [68]. The 50/500 rule recommends that populations be maintained at sizes larger than 50 individuals over the short term and at least 500 individuals over the long term to avoid inbreeding and the loss of genetic diversity, and at sizes larger than 500-1000 to indefinitely retain evolutionary potential [reviewed in 68]. In our results, anchovy pre-and immediately post-population collapse showed estimates of N e that come close to those recommended by 50/500 rules, suggesting that severe population declines place Adriatic anchovies at risk of losing evolutionary potential in the absence of stable connectivity among anchovy subpopulations in the Adriatic basin.

Temporal population genetic integrity
Despite fluctuations in genetic variables in response to severe population declines, anchovies in the Adriatic Sea generally showed little genetic heterogeneity both between northern and middle-southern samples and among temporal samples in these areas. We detected a lack of genetic differentiation and a single population comprising all the Chioggia and Vieste samples. The occurrence a single genetic stock is also confirmed by a microsatellite study of samples from throughout the Adriatic basin (Ruggeri et al. in preparation). The lack of temporal variability confirms that this genetic composition has been maintained over at least the past four decades. Similar evidence was found in other temporal studies of small pelagics [6,69] and other commercially exploited demersal fishes [58,70]. The increasing trends in H e , N A and N e over time indicates that both localities have recovered from a population bottleneck, but that the recovery appears to have started earlier in Vieste (from 1989) than in Chioggia (from 1994). The relative geographic genetic homogeneity of anchovy populations in the Adriatic (Ruggeri et al. in preparation) may indicate that migration between subpopulations is important for the recovery of genetic diversity in local populations. In this way, migration between the Chioggia and Vieste populations likely led to the recovery of most of the genetic diversity lost in the Chioggia population over about seven generations.

Conclusions
This study confirms the utility of archived DNA to address evolutionary and fishery-related problems. Our results indicate that, contrary to the common assumption that abundant species, such as anchovies, are not immune to the loss of genetic variability, because sweepstakes recruitment can lead to small effective population sizes. The effective population sizes of Adriatic anchovies are several orders of magnitude smaller than census population sizes. This large discrepancy indicates that anchovy populations are more vulnerable to fishery pressures and environmental change than previously thought, and these small effective sizes must be taken into account in the management of fishery harvests. Our results also indicate that temporal genetic data provides an assessment of the impact of demographic disturbances on the persistence of local populations.   Table. Locus-by-locus Probability of Identity (P ID ) and Probability of Identity of Siblings (P ID(sib) ) estimations [36].