Microsatellite Development and First Population Size Estimates for the Groundwater Isopod Proasellus walteri

Effective population size (N e) is one of the most important parameters in, ecology, evolutionary and conservation biology; however, few studies of N e in surface freshwater organisms have been published to date. Even fewer studies have been carried out in groundwater organisms, although their evolution has long been considered to be particularly constrained by small N e. In this study, we estimated the contemporary effective population size of the obligate groundwater isopod: Proaselluswalteri (Chappuis, 1948). To this end, a genomic library was enriched for microsatellite motifs and sequenced using 454 GS-FLX technology. A total of 54,593 reads were assembled in 10,346 contigs or singlets, of which 245 contained candidate microsatellite sequences with suitable priming sites. Ninety-six loci were tested for amplification, polymorphism and multiplexing properties, of which seven were finally selected for N e estimation. Linkage disequilibrium and approximate Bayesian computation methods revealed that N e in this small interstitial groundwater isopod could reach large sizes (> 585 individuals). Our results suggest that environmental conditions in groundwater, while often referred to as extreme, are not necessarily associated with small N e.


Introduction
Effective population size (N e ) is one of the most important parameters in, ecology, evolutionary and conservation biology (e.g. [1][2][3][4][5][6][7]). N e is classically defined as the number of breeding individuals in an idealized population (i.e. a panmictic population, with sex ratio of one and no overlapping generations) that has the same rate of change of allele frequencies or heterozygosity than the population under consideration [8]. Knowing N e allows evaluation of the effects of genetic drift and natural selection, as well as their consequences for population viability in terms of genetic diversity loss or inbreeding effects (e.g. [2,9,10]). Accurate estimation of N e has also become important from an analytical point of view since the formalization of the coalescent theory [11] and democratization of Bayesian inferences in population genetics, because informative priors significantly improve estimation and recovery of posterior densities (e.g. [12]). Last but not least, accessing N e provides access to genetic parameters such as migration, mutation or recombination rates.
Despite community agreement about the importance of N e , it remains one of the most difficult parameters to assess in natural populations. First, no simple and general relationships exist between N e and other demographic estimators such as total population size (population census size, N c [5]) or density [13]. Indeed, the N e /N c ratio does not only vary among taxa but is also likely to vary within species, especially in those showing high fecundity or high variation in density or reproductive success, such as arthropods and teleostean fishes [5,14]. Second, estimating N e from genetic data is also a challenging task because various approaches capture different components of effective population size [5,15,16]. Estimations of contemporary (current) N e , the focus of this study by contrast to long-term Ne inferred from sequence-based phylogenetic methods integrating N e over hundreds of generations, were initially based on temporal methods that analyzed several samples collected at different times from the same population [17][18][19][20]. These methods were time consuming and unmanageable for taxa displaying long generation times. However, recent analytical developments implemented single-sample approaches for estimating contemporary effective population size, using nuclear markers such as microsatellites and single nucleotide polymorphism [3,21,22]. As a consequence, estimates of contemporary N e , while still sparse, are becoming more frequent in the literature.
A survey of the current literature on freshwater metazoans revealed that most of our knowledge on N e arises from studies of actinopterygian fishes, with a strong bias towards economically important Salmoniformes. In particular, much effort was devoted to highlight the effect of environmental variables [23] and anthropogenic disturbances such as dams [24] or restocking practices [25,26] on N e . Actinopterygian fishes hence provided a much-needed opportunity to compare approaches and methods to estimate N e [7] or to infer the intraspecific variability of N e (e.g. [27]). Aside from actinopterygian taxa, information is available for lampreys [28], aquatic snails [29,30], mussels [31], calanoid copepods [32] and crayfishes, in the latter for long-term N e only [33,34]. Nonpermanent aquatic dwellers did not receive much attention either with only a few studies dedicated to frogs [35,36], newts [37], hydrophilid beetles [38], caddisflies [39] and damselflies [40]. Yet, most studies dealt with surface water taxa from streams, lakes, marshes, and springs, despite the fact that a large number of aquatic phylogenetic lineages are restricted to or occur predominantly in groundwater [41]. Rare exceptions concerned the estimation of long-term N e for groundwater crayfishes [33,42,43] and the estimation of N e as a product of an unknown mutation rate for diving beetles [44].
In this study, we estimated the contemporary effective population size of the obligate groundwater isopod Proasellus walteri (Chappuis, 1948). A genomic library was enriched for microsatellite motifs and sequenced using 454 GS-FLX technology. A total of 54,593 reads were assembled in 10,346 contigs or singlets, of which 245 contained candidate microsatellite sequences with suitable priming sites. Ninety-six loci were tested for amplification, polymorphism and multiplexing properties. Two genotyping multiplexes, for a total of seven loci were finally tested for cross-species transferability and used to infer contemporary effective population size in P. walteri.

Materials and Methods
Sampling and taxonomic identification P. walteri morphospecies is a small (body length: 3 mm), eyeless, and depigmented isopod that inhabits the interstices in the hyporheic zone of rivers and groundwater in unconsolidated sediments. Its geographic range encompasses two major river catchments, the Rhine and Rhône Rivers ( Figure 1). Proasellus is one of the most speciose genera in European groundwater, with approximately 120 species formally described in the literature. However, recent molecular studies by Morvan et al. [45] revealed that it contained a large number of unrecognized (morphologically cryptic) species. Morvan and coauthors distinguished three cryptic species within the P. walteri morphospecies: P. walteri_T058 is restricted to the Rhine River catchment, P. walteri_T059 to the Saône River catchment (a major tributary of the Rhône River) and P. walteri_T060 to the southern part of the Rhône River catchment (see Figure 1). Hereafter, the focal taxon in this study, P. walteri_T060, is referred to as P. walteri.
The microsatellite library was developed using P. walteri individuals from Sauzet site, six sites were used in amplification tests, five for polymorphism tests and two for N e estimations ( Figure 1; Table 1). Additionally, six species were collected to test for microsatellite loci cross-species transferability. Field sampling did not involve endangered or protected species and none of the sampling sites required specific permission.
Individuals were collected using a Bou-Rouch pump [46] at a depth of 60 cm below the riverbed. After water elutriation, samples were placed in 96% ethanol at ambient temperature for transport back to the laboratory, then at 4°C until sorting and morphological identification. Specimens were identified at the species level using original species diagnoses, which are mostly based on the morphology of male copulatory organs (second pleopod). Given the prevalence of cryptic species in the Proasellus genus [45] and the occurrence of a morphologically similar species complex Proasellus synaselloides (Henry, 1963) living in sympatry with P. walteri, molecular verifications were conducted for each individual after genomic DNA extraction (see below). Following Chapman et al. [47], we developed a multiplex PCR scheme using one genusspecific forward primer (16S-ProaF1: 5'-CCTATGAGTCGTTTAAATGGCCGCA-3' [48]) and two species-specific reverse primers (16S-Pwalt-R458: 5'-CTATCTATATATATATTTGCTTATATAGGG-3', 16S-Psyna-R217: 5'-TAAAGTTTTATAGGGTCTTATCGTCCA-3', for P. walteri and P. synaselloides complex, respectively) targeting the 16S mitochondrial DNA. Reverse primers were designed in such a way that PCR products were different in size (458 and 217 bp for P. walteri and P. synaselloides complex, respectively) and could be differentiated by electrophoresis on 1.5% agarose gels.
Molecular identifications of species selected for crossspecies transferability of microsatellites (Table 1) were performed by sequencing a fragment of the 16S using genusspecific primers (Forward: as above; reverse 16Sbr: 5′-CCGGTCTGAACTCAGATCACGT -3' [49]) and by comparing obtained sequences to available references 45. Standard PCRs were conducted with a final 15 µl volume containing 1.   Table 1

Development of microsatellite markers
Genomic DNA was extracted using the chloroform extraction method optimized for Proasellus by Calvignac et al. [48]. Briefly, DNA was extracted from whole organisms following standard digestion (15 µg proteinase-K in 200 µl TNES buffer, 0.05 M Tris, 0.1 M NaCl, 0.01 M EDTA, 0.5% SDS) and saltchloroform purification. DNA was resuspended in TE (TrisHCl, 10 mM, 1 mM EDTA) to a final volume of 16 µl. A microsatellite-enriched library was constructed by the Savannah River Ecology Laboratory according to Lance et al. [50] using a pooled DNA extract from 166 individuals of P. walteri collected at Sauzet site (Table 1). DNA was digested using RsaI, DNA fragments were ligated to SNX linkers, and dynabeads linked to di-, tri-and tetranucelotide microsatellite motives were used for enrichment. The enriched library was subsequently sequenced using a 454 FLX sequencer (454 Life Sciences/Roche Applied Biosystems, Nutley, NJ, USA). Raw DNA sequences were cleaned of remnant vectors and assembled in contigs with CAP3 [51]. Appropriate microsatellite motifs with suitable priming sites were subsequently identified using the program MSATCOMMANDER 1.0.4 [52]. Among these, 96 microsatellite loci were selected according to their type of motifs (di-and tetranucleotids), the length and purity of their repeat tract, and their size class. We favored loci with diand tetranuclelotid motifs with the purest stretch and the most numerous repeats. Loci were also chosen so that three size classes (100-200 bp, 200-300 bp, and 300-400 bp) were kept in similar proportions to facilitate multiplexing. They were tested for amplification using P. walteri individuals from six sites ( Table 1). For each site, DNA extracted from eight individuals was pooled because the limited amount of DNA recovered from single organisms (30 ng to 100 ng) would have been insufficient to carry out the 96 amplification tests. Next, polymorphism of amplifiable loci was evaluated using six nonpooled individuals from five sites (total number of individuals = 30, Table 1). PCR reactions were conducted using the Type-it TM Microsatellite PCR Kit (QIAGEN) in a 10-µl volume containing 5 µl of QIAGEN Multiplex PCR Master Mix, 3 µl of H 2 O, 1 µl of Primer Mix (2 µM), and 1 µl of DNA (20 ng µl -1 ). Cycling conditions were as follows: (i) one step of initial denaturation (5 min) at 95°C, (ii) 32 (amplification experiment) or 39 (polymorphism experiment) cycles with denaturation (30 s) at 95°C, annealing (90 s) at 57°C (amplification) or 50°C (polymorphism), extension (30 s) at 72°C, and (iii) one step of final extension (30 min) at 60°C. Amplification and polymorphism were tested on 1.5% or 3% agarose gels respectively.

Microsatellite genotyping
Despite resolute efforts at optimization, only four loci could be combined into two PCR mixes of two loci and the three remaining loci had to be amplified separately ( Table 2) . PCR products were diluted 50 times and pooled into two genotyping mixes: 1) genotyping mix A containing PCR mixes 1 and 2, and 2) genotyping mix B containing the three loci amplified separately (Table 2). These mixes were then beads-purified (AxyPrep Mag PCR Clean-up, Axygen, Union City, CA, U.S.A.) and genotyped by a service provider (Biofidal-Themis, Vaulx-en-Velin, France) on a 3730xl DNA Analyzer (Applied Biosystems).

Cross species transferability
Assessing microsatellite transferability for all Proasellus species was far beyond the scope of this paper. Instead, we selected one site for each of the two other cryptic species within P. walteri morphospecies, one site for P. strouhali, which is the sister species of P. walteri complex, one site for the more distantly related morphospecies P. valdensis and two sites for P. valdensis sister group, the P. synaselloides complex (Table 1 [45]). N e Estimates in Groundwater Proasellus walteri PLOS ONE | www.plosone.org

Estimation of effective population size
We estimated effective population size for two sampling sites (Table 1) using LDNe [22] and ONeSAMP [21]. LDNe uses a linkage disequilibrium method initially developed by Hill [58] and corrected for small sample size [59] to estimate N e from a single population sample, whereas ONeSAMP relies on approximate Bayesian computation (ABC). For LDNe, the lowest allele frequency used was 0.02. As no prior information was available on P. walteri census or effective population sizes, we used two as the lower value and 10,000 as the upper value for priors on N e in ONeSAMP. Three exact replicate runs per dataset were performed as cross validation tests on the online ONeSAMP. As neither LDNe nor ONeSAMP should be used with loci showing LD [21,22], calculations were made dropping one locus per pair in LD, to strictly comply with the conditions imposed by the methods.

Microsatellite characterization
A total of 54,593 reads were obtained using 454 sequencing from the enriched library of genomic DNA from P. walteri. Mean read length reached 242 bp for 13.10 6 bp of total sequence length. CAP3 assembled those reads in 4,225 contigs leaving 6,121 reads as singletons. Raw data were deposited in EBI ENA database (http://www.ebi.ac.uk/ena/) under the study accession number PRJEB4363. MSATCOMMANDER identified 4,518 sequences that contained di-, tri-or tetranucleotid microsatellite loci (File S1), but it was able to define amplification primers for only 245 microsatellite loci (File S2). Among these, 96 loci were retained according to selection criteria. A total of 32 loci produced reliable amplification (i.e. clear band at expected size) in at least 3 out of 6 sampling sites ( Table 1, Table S1) and 19 were found to be polymorphic (i.e. harbored multiple bands) on agarose gels. After fluorescent labeling, seven of these 19 loci were found to produce a clear signal without parasite peaks and at expected size, a strong signal to noise ratio and limited stutters. Thus, a total of 7 loci were considered to provide reliable genotyping (Table S1).
Gene diversity indices are reported in Table 3. Total number of alleles per locus was 11 in Pwalt_Di36, 15 in Pwalt_Di21 and Pwalt_Te30, 20 in Pwalt_Te39, 21 in Pwalt_Di31, 31 in Pwalt_Di12 and 42 for Pwalt_Te46 for a total 155 alleles. The number of alleles per locus per site ranged from five to 28 in 30 individuals genotyped at Sauzet and Bédarrides, and from seven to 30 in 90 individuals at Aouste and Livron. All loci had private alleles in at least one site and three loci had private alleles in all sites (Pwalt_Di12, Pwalt_Di21 and Pwalt_Te46). Importantly, the number of alleles and private alleles correspond to minimal values since homoplasy and point mutations in both flanking regions and microsatellite motives were not assessed. Observed and expected heterozygosities ranged from 0.300 to 0.933 and 0.362 to 0.918, respectively. Significant deviations from HWE were found in three loci out of seven and in two sites (Pwalt_Te39 in Livron and Aouste, Pwalt_Di12 in Aouste and Pwalt_Di21 also in Aouste). Tests Table 2. Seven microsatellite loci from P. walteri with repeat motifs, PCR primers, fluoro-label marking, size ranges in base pairs (bp), PCR annealing temperatures (T a ) and Genbank accession numbers.

Microsatellite transferability
Cross-species amplification success rates were highly variable among species (Table 4). The seven loci were found to amplify better in P. walteri's closest relatives as both P. walteri_T058 and P. walteri_T059 were successfully amplified for all loci in at least three out of five individuals and amplifications gave polymorphic products for all but one locus per species (Pwalt_D31 in P. walteri_T058). Cross amplification in P. strouhali failed for four loci and produced monophorphic fragments for the three remaining loci. In constrast, three loci were polymorphic in P. synaselloides complex and four in P. valdensis. Cross-amplifications were  also highly variable among loci. Products were obtained from all species for Pwalt_Di12 and Pwalt_Di21 loci (86 and 73% of individuals, respectively), whereas Pwalt_Di36 gave reliable products solely in the P. walteri complex.

Effective population size
N e point estimates and associated 95% confidence intervals are reported in Table 5. LDNe provided large estimates of N e in Livron (N e = 892) and Aouste, where a negative value indicated that N e was indistinguishable from infinity [60]. The lower confidence interval in Livron was 243 and reached 585 in Aouste. Upper confidence intervals reached infinity in both estimates. ONeSAMP results proved more challenging to analyze, as replicate runs did not seem to converge in Aouste, with point estimates ranging from 533 to 1844. Replicate runs were more stable in Livron where point estimates ranged from 142 to 186, with lowest and highest confidence intervals of 83 and 327, respectively. LDNe point estimates were hence always higher than those provided by ONeSAMP, and in Livron, the LDNe estimate was even far above the highest confidence interval provided by ONeSAMP (892 vs 327).

Discussion
We successfully identified 4,518 microsatellite loci in the non-model organism P. walteri. Of these, 245 contained flanking regions sufficiently long for further amplification, 96 were tested and seven proved to be useful for subsequent analyses in population genetics. This demonstrates that Table 5. Estimated mean effective population size (N e ) of P. walteri at two river sites inferred from 90 individuals using LDNe and ONeSAMP. despite tremendous advances made over recent years in the development of microsatellite markers, this process can still be a challenging endeavor. Library screening using high throughput sequencing facilities can now be performed even for non-model organisms such as P. walteri and can reveal a large number of microsatellite loci. However this first stage, while being crucial to start with a large number of candidates, is by no means a guarantee as the number of useful loci may dwindle dramatically along the development process. Our finding that sampling sites were characterized by highly polymorphic microsatellite loci does not support prevailing views that populations of subterranean organisms do not support high genetic diversity (e.g. [61,62]). While surprising at the site level, high diversity among sites, as is evidenced by numerous private alleles, was somewhat more expected. In a recent diversification study of the Aselloidea, Morvan et al. [45] suggested using a Bayesian relaxed clock model that the three cryptic species of the P. walteri complex diverged between six and 10 million years ago. Given this time frame, populations of P. walteri_T060 sampled in this study might have diverged a long time ago, thereby providing a reasonable explanation for both the high among-sites allelic diversity and the significant frequency of null alleles encountered at some sites and loci. Linkage disequilibrium between loci was found in one site only. Therefore, linkage processes affecting these loci, while highly significant were probably not caused by physical proximity on chromosomes. However, none of the many other processes resulting in linkage disequilibrium, such as bottlenecks, population substructuring or local selection regimes [63] could be dismissed here.
Cross-species amplification was successful and polymorphism was high among species of the P. walteri complex, but both were hard to predict from phylogenetic distances in more distantly related species. Indeed, P. strouhali is more closely related to P. walteri complex than any of the other three species considered in this study [45]. Yet, cross amplifications in P. strouhali failed or were uninformative for all loci, whereas three and four loci were polymorphic in P. synaselloides complex (i.e. two cryptic species) and P. valdensis, respectively. The expectation that cross-species amplification and polymorphism decrease with increasing genetic distance between the species from which the loci were isolated and the target species was validated in several vertebrate taxa [64,65]. It is however poorly documented in arthropods, and even much less so in freshwater taxa. Primmer et al. [64] suggested that the negative relationship between transferability and genetic distance was linear while Carreras-Carbonell et al. [65] argued it was logarithmic. Our results, if confirmed by additional cross-species experiments, are more compatible with the pattern proposed by Carreras-Carbonell et al. [65], with a high success rate between very closely related species (i.e. within P. walteri morphospecies) followed by a sharp decrease and large variance as phylogenetic distance increases. Whatever the relationship, success rates in cross species amplification and polymorphism reported in the present study suggest that the loci we characterized and developed might be of significant interest for future studies on the ecology of species within the P. walteri complex. These species are widespread in the hyporheic zones of two major catchments (the Rhine and Rhône rivers) that experienced contrasted Pleistocene climates, thereby providing a rare opportunity to test for the effect of past environmental conditions on population dynamics. Estimates in P. walteri pointed towards large N e , with a point estimate of 892 in Livron and a minimum of 585 individuals in Aouste. While similar to N e estimated in two species of calanoid copepods (N e = 672.7 and 1027.4 [32]), these estimates are at odds with the small sizes inferred in snails (N e ≤ 10 [29]; or N e ≤ 245 [30]), damselflies (N e ≤ 250 in most sites and for most methods [40]) or hydrophilid beetles (N e < 20 [38]). There was however a marked difference between the two sites investigated, as in Aouste, a very large N e seemed to prevent LDNe from providing a point estimate different from infinity, and resulted in the absence of convergence between ONeSAMP runs [5]. In contrast, ONeSAMP runs returned congruent values of N e in Livron and both approaches suggested a smaller N e than in Aouste. Such variations are consistent with current literature that often report N e estimates varying by several orders of magnitude in the same species (e.g. [30,36]). Furthermore, our observation empirically corroborates simulations suggesting that accurate and precise estimations can hardly be achieved for moderate or large N e (> c. 500 [60]), even when large amounts of genetic information are available. Our inferences rely on large sample sizes (>87 individuals per population) and high allelic richness (averages of 15.4 and 16.6 for Livron and Aouste, respectively), but on seven loci only. However, as stated by Luikart et al. [5], "precision for estimates of Ne can be improved by roughly the same amount by sampling more individuals or by sampling more microsatellite loci" [18,60,66,67]. We used approximately 3 to 4 time more individuals per population than in most studies (e.g. [25,29,30,36,39]). Moreover, allelic richness of our loci was high as compared to other studies of freshwater organisms (e.g. [30,35,39]). A high allelic richness should theoretically increase the precision of Ne estimates by LDNe as well as that of ONeSAMP summary statistics (e.g. for He [68]). LDNe estimates rely on the number of independent comparisons: (n i,j = (k i -1) * (k j -1), where k i and k j represent the number of alleles at loci i and j, respectively [10]). Altogether, this suggests that obtaining precise estimations for moderate to large N e probably requires fulfilling the following conditions: improving estimation methods, acquiring many more loci using techniques such as Rad-tag sequencing [69] and getting even larger sample sizes [5].
Importantly, our inference of large N e in an obligate groundwater organism such as P. walteri may not be as surprising as prevailing views may suggest. Indeed, Kane et al. [70] using allozymes heterozygosity, and a calculation based on the classical relation He=4N e µ/(4N e µ +1) where He is the expected heterozygosity and µ the mutation rate, also reported large N e for Gammarus minus amphipods (N e = 2x10 5 ) and Astyanax fasciatus teleosteans (N e = 9x10 4 ). For comparison, using the same formula with our data, He for the least and most polymorphic loci, and a mutation rate of 10 -4 for microsatellites, returned slightly lower N e ranges of 5x10 3 to 3x10 4 in Livron and 9x10 3 to 3x10 4 in Aouste. However, calculations based on this formula provide rough estimates that cannot be compared with LDNe or ONeSAMP estimates. Finding large N e in groundwater taxa suggests that environmental factors in groundwater, while often referred to as extreme, are not necessarily associated with small N e . However, our results desperately call for further assessments of N e in groundwater as well as in closely related surfacedwelling species. Such comparisons will make it possible to formally test another central question in groundwater ecology: is N e reduced in groundwater-dwellers when compared to their surface counterparts? To our knowledge, the best evidence in favor of decreased N e in groundwater comes from mitochondrial comparisons in crayfishes [33]. However, Buhay & Crandal [33] documented changes in long-term N e of crayfishes rather than changes in contemporary N e . Other evidences come from reduced heterozygosity values in groundwater [61,62,71], but this reduction may not be systematic [72] nor readily interpretable in terms of N e [73]. Importantly, to solve this issue, alternative molecular methods such as Single Nucleotide Polymorphism detection and calling using RAD-tag sequencing [69] seem very promising as they may provide thousands of loci when it proves difficult to develop more than tens loci in most, including ours, microsatellite studies. These methods will also make it possible to access other long-standing issues including spatial delineation of population and dispersal processes.