Landscape Genetics of Schistocephalus solidus Parasites in Threespine Stickleback (Gasterosteus aculeatus) from Alaska

The nature of gene flow in parasites with complex life cycles is poorly understood, particularly when intermediate and definitive hosts have contrasting movement potential. We examined whether the fine-scale population genetic structure of the diphyllobothriidean cestode Schistocephalus solidus reflects the habits of intermediate threespine stickleback hosts or those of its definitive hosts, semi-aquatic piscivorous birds, to better understand complex host-parasite interactions. Seventeen lakes in the Cook Inlet region of south-central Alaska were sampled, including ten in the Matanuska-Susitna Valley, five on the Kenai Peninsula, and two in the Bristol Bay drainage. We analyzed sequence variation across a 759 bp region of the mitochondrial DNA (mtDNA) cytochrome oxidase I region for 1,026 S. solidus individuals sampled from 2009-2012. We also analyzed allelic variation at 8 microsatellite loci for 1,243 individuals. Analysis of mtDNA haplotype and microsatellite genotype variation recovered evidence of significant population genetic structure within S. solidus. Host, location, and year were factors in structuring observed genetic variation. Pairwise measures revealed significant differentiation among lakes, including a pattern of isolation-by-distance. Bayesian analysis identified three distinct genotypic clusters in the study region, little admixture within hosts and lakes, and a shift in genotype frequencies over time. Evidence of fine-scale population structure in S. solidus indicates that movement of its vagile, definitive avian hosts has less influence on gene flow than expected based solely on movement potential. Observed patterns of genetic variation may reflect genetic drift, behaviors of definitive hosts that constrain dispersal, life history of intermediate hosts, and adaptive specificity of S. solidus to intermediate host genotype.


Introduction
Gene flow can be a key determinant of evolutionary potential, particularly for organisms engaging in interactions shaped by adaptation (i.e., Red Queen dynamics; [1][2][3][4][5][6][7]. Evolutionary respectively, and piscivorous birds from any one of over 40 species as definitive hosts [16,24,27]. The parasite is transmitted to sticklebacks through consumption of infected copepods containing procercoid larvae [16]. Once established in host fish, the parasite transforms into a plerocercoid larvae in the coelom of the stickleback, wherein almost all of the parasite's growth occurs [24,27]. Multiple infections are common, and the total mass of the parasites can equal or exceed the mass of the host fish [28,29]. Infected threespine sticklebacks are consumed by definitive hosts, in whose gut the parasites undergo sexual maturation and reproduction [16,24,30]. Reproduction is accomplished by either selfing or cross-fertilization, depending on the number of parasites infecting a definitive host [16,24]. Bird feces transmit eggs into water where they hatch into free-living coracidia larvae that are consumed by cyclopoid copepods [16,24,31]. Threespine sticklebacks likely have more constrained movement potential than definitive bird hosts. The stickleback hosts sampled in this study were all residents of freshwater lakes, which typically remain in the lake of origin throughout life, unless connecting freshwater drainages allow movement among proximate lakes [32]. Threespine sticklebacks exhibit territoriality and localized homing behaviors up to 200 m; significant genetic divergence also has been found between populations in neighboring lakes [32][33][34][35]. Though comparably little is known about landscape-level movement of definitive hosts, many (e.g., loons and grebes) migrate from Alaska to the Gulf of Mexico during the winter and exhibit little to no population genetic divergence [36][37][38][39]. Breeding territories of Common Loons (Gavia immer), which are dominant predators of threespine stickleback in southern Alaska, average around 70 ha and are frequently held by the same mating pair over several years [37,[40][41][42]. Adult loons without territories and loons that are unsuccessful in breeding may move to establish new territories in nearby lakes [37,40,41].

Specimen collection
Infected threespine sticklebacks were sampled in late May to early June in 2009-2012 from 17 lakes in south-central and south-west Alaska. Collections were specifically approved by annual Fish Resource Permits from the Alaska Department of Fish and Game and animal care protocols from Tulane University Institutional Animal Care and Use Committee (protocols 0304R-UT-C and 0304R2) and the University of Washington Institutional Animal Care and Use Committee (protocol 3142-01). Fish were captured from ten lakes in the Matanuska-Susitna Valley (MatSu) north of Cook Inlet, five lakes on the Kenai Peninsula (Kenai) east of Cook Inlet, and two lakes west of Cook Inlet in Bristol Bay drainages (BB; Table 1; Fig 1). All of the sampled fish exhibited a benthic phenotype. Samples were taken with 3 mm and 6 mm wiremesh minnow traps set near shore or with beach seines and tow nets. After euthanization with MS222, a ventral incision was made in each specimen prior to preservation in 95% ethanol. During necropsies all parasites were removed and preserved in 95% ethanol. Parasites were selected for analysis to maximize the number of hosts with large parasite loads ( 7 parasites) for within host and between lake comparisons (Table 1).

Mitochondrial DNA sequence data collection and analysis
Genomic DNA from S. solidus specimens was extracted using the DNeasy Blood & Tissue extraction kit following the manufacturer's standard protocol (Qiagen, Inc., Valencia, CA, USA). Polymerase chain reactions (PCRs) were performed to amplify 759 base pairs (bp) of the mitochondrial cytochrome oxidase 1 gene (CO1) region for 1,026 individuals (Table 1) using primers CYTW3F2 (CTAATTGGTGTGTGATCTGG TTTTG) and CYTW3R5 (GGAGTGGGAG CCCAACACAAG) [26]. PCRs were carried out using Mastercycler Pro thermocyclers  (Eppendorf AG, Hamburg) with conditions following Nishimura et al. [26]. PCR products were purified using ExoSAP-It (USB, Affymetrix, Cleveland, OH). Cycle-sequencing reactions were then carried out using BigDye (Applied Biosystems, Foster City, CA), 3.2 mM primers, 4 μL ddH 2 O, and purified PCR product. Sequencing reactions were purified using Sephadex (GE Healthcare Biosciences, Pittsburgh, PA) according to manufacturer protocols. An ABI 3730xl DNA sequencer (Applied Biosystems, Foster City, CA) was used to electrophorese cycle-sequencing products. Sequencher v4.9 and Genalex v6.5 [43] software (Gene Codes Corp., Ann Arbor, MI) were used to align and edit the resulting DNA sequence data. DNAsp v5.10.1 and Arlequin v3.5 were used to estimate the total number of haplotypes, haplotype diversity, pairwise differences among haplotypes, nucleotide diversity among haplotypes, and effective number of haplotypes for each lake, region, and year [44,45]. Effective number of haplotypes was calculated as the reciprocal of the sum of squared frequencies. The distribution of haplotype diversity and relationships among haplotypes were evaluated with Network by creating minimum spanning haplotype networks labeled according to lake, collection year, region, and genotype cluster assignment [46,47]. Using Arlequin v3.5 software [44], Analysis of Molecular Variance (AMOVA) was carried out to estimate the proportion of variance attributable to different hierarchical scales. AMOVAs based on 10,000 permutations were performed to assess patterns according to host and year, lake and year, lake and region, and genotype cluster. For AMOVAs examining variation among hosts, tests were limited to individuals from fish hosts with 7 parasites (totaling 565 individuals) to reduce possible bias from small sample sizes. Arlequin v3.5 [44] was also used to estimate global F ST values and pairwise F ST values between each lake and region.
To determine whether patterns of genetic differentiation reflect the movement potential or habits of hosts, pairwise routes of potential movement between all lakes were calculated using Google Earth. The most parsimonious pathways for travel were measured as straight line distances between lakes, as compared to river distances between lakes within and among watersheds. Pairwise F ST values were compared to the between-lake linear and riverine distance matrices with Mantel's tests to evaluate the strength of relationships and to detect signatures of isolation-by-distance. Mantel's tests were performed in Genalex v6.5 with 10,000 permutations [43].
Genetic diversity statistics, including expected (H e ) and observed (H O ) heterozygosity, average number of alleles (N alleles), and average rarefied allelic richness (R) were calculated for each lake, region, and year using Arlequin v3.5; Shannon's Index (I) also was calculated using MSA software [44,49]. Hardy-Weinberg Equilibrium (HWE), linkage disequilibrium (LD), and F IS values were calculated to test for equilibrium and to detect signatures of the Wahlund effect. HWE and F IS values were calculated in Arlequin v3.5. Tests for LD were performed in Genepop v4.3, and p-values were compared following a Bonferroni correction (p < 0.0018) [50]. To examine the hierarchical distribution of microsatellite variation, Arlequin v3.5 was used to conduct AMOVAs with 10,000 permutations run per analysis [44]. Samples were grouped by host and year, host and lake, lake and region, and genotype cluster. For AMOVAs examining variation among hosts, tests were limited to individuals from fish with 7 parasites (totaling 565 individuals) to reduce possible bias from small sample sizes. Pairwise F ST values were also calculated using Arlequin v3.5.
Structure v2.1 was used to assess patterns of genetic differentiation without a priori hypotheses of population structure [51]. We used a burn-in period of 10,000 iterations and 5 independent runs with 100,000 iterations where K (number of populations) was set iteratively from 1 to 17. Runs intended to evaluate the likelihood of structure corresponding to fish host were conducted with a data subset including parasites from hosts with 7 parasites where K was set from 1 and 6 iteratively. Analyses of structure among years and within clusters revealed in previous runs also were conducted where K was set from 1 to 4.
To determine whether patterns of genetic differentiation reflect the movement potential or habits of hosts, pairwise linearized F ST values were compared to between-lake distance matrices (as described above) with Mantel's tests to evaluate the strength of relationships and to detect signatures of isolation-by-distance. Mantel's tests were performed in Genalex 6.5 with 10,000 permutations [43].

MtDNA haplotype diversity and differentiation
Analysis of a 759 bp region of the mitochondrial CO1 gene recovered 248 unique haplotypes in 1,026 individuals. Twelve polymorphisms separated the most divergent haplotypes from the most common haplotype (H1), which was found in all lakes except Hall Lake, and in all regions, years, and genotype clusters. Excluding Hall and Pollard lakes, which were not considered due to small sample sizes, the greatest haplotype diversity was detected in Big Beaver Lake (Table 1), where 24 unique haplotypes were recovered in 30 individuals. Minimum pairwise haplotype diversity occurred in Falk Lake, whereas maximum pairwise haplotype diversity occurred in Cornelius Lake (Table 1). Nucleotide diversity ranged from 0.0012 in Falk Lake to 0.0064 in Cornelius Lake, and the effective number of haplotypes ranged from 2.7 in Falk Lake to 20.32 in Loberg Lake. Haplotype diversity as well as pairwise and nucleotide diversity were consistent across regions ( Table 1). The minimum effective number of haplotypes occurred in the Kenai region whereas the maximum was present in the MatSu region (Table 1). When partitioned by year, the highest haplotype diversity was recovered in 2009 and the lowest haplotype diversity was recovered in 2010. The effective number of haplotypes, and pairwise and nucleotide diversity were all highest in 2009 and lowest in 2010 ( Table 1).
The haplotype networks revealed a high degree of heterogeneity across lakes and years (S1 Fig). There was a high degree of heterogeneity when haplotypes were identified by lake (S1 Fig), and though there was some evidence of structure by year, with early years (2009 and 2010) clustering relative to later years (2011 and 2012), some heterogeneity was still evident (S1 Fig).
Despite the absence of obvious structure in haplotype networks, mtDNA differentiation was significant across multiple hierarchical levels ( Table 2). Significant global F ST values were recovered, ranging from 0.027 to 0.08. The lowest global F ST value was recovered when individuals were grouped by lake and region, whereas the highest global F ST values were recovered when individuals were grouped by host and year (Table 2).
Significant pairwise F ST values by lake ranged from 0.0138 (Walby Lake and Cheney Lake) to 0.142 (Lower Ohmer Lake and Falk Lake; Table 3). Pairwise F ST values by region were not significant except between MatSu and Kenai (0.0087; S1 Table). Mantel's tests for associations between geographic distance and genetic distance (F ST values) were not significant when distance was measured by stream length or straight line distances.

Microsatellite genetic diversity and differentiation
Microsatellite-based measures of genetic diversity were similar across lakes except for Hall Lake and Pollard Lake, which were outliers likely due to small sample sizes. Observed heterozygosity was consistent across all lakes (Table 1) with values ranging from 0.61 (Lower Ohmer Lake) to 0.72 (Scout Lake). Only a few instances of departures from HWE were detected in a subset of populations at select loci (S2 Table). Significant F IS values were present in ten lakes, ranging from 0.04 (Falk Lake) to 0.167 (Lower Ohmer Lake; S2 Table). All loci were found to be in LD except in 4 of 476 comparisons between loci within lakes (Walby Lake, Scso18 & SsCAB6; Walby Lake, Scso18 & Scso35; Rocky Lake, Scso18 & Scso24; Iliamna Lake, Scso35 & Scso9). Specimens from Engineer Lake exhibited the highest average number of alleles, and Scout Lake exhibited the lowest (Table 1). Shannon Index values ranged from 1.42 (Aleknagik Lake) to 1.81 (Cornelius Lake). Rarefied allelic richness ranged from 6.38 (Rocky Lake) to 8.60 (Cornelius Lake). Individuals from the MatSu region exhibited the highest average number of alleles, and individuals from the BB region exhibited the lowest average number of alleles. Shannon Index by region ranged from 1.62 (BB) to 1.8 (MatSu). Rarefied allelic richness ranged from 7.16 (BB) to 7.63 (Kenai). When partitioned by year, the greatest Shannon Index values and average number of alleles was present in 2012 and the smallest respective values occurred in 2010 (Table 1).
Microsatellite-based estimates of genetic differentiation were highly significant ( Table 2). All global F ST values, which ranged from 0.04 to 0.062, were highly significant ( Table 2). The lowest global F ST value was recovered when individuals were grouped by host and lake, while the highest global F ST value was recovered when individuals were grouped by host and year ( Table 2). Pairwise F ST values by lake ranged from 0.0018 to 0.142, with many comparisons being highly significant ( Table 3). The highest pairwise values were between Pollard and Iliamna (0.142), Pollard and Aleknagik (0.135), and Scout and Aleknagik (0.13; Table 3). Pairwise F ST values between regions were all significant (S1 Table). The highest pairwise value was between MatSu and Kenai (0.041), and the lowest was between BB and Kenai (0.022; S1 Table). Mantel's tests revealed evidence of isolation by distance. When all sites were considered, slightly stronger non-significant trends were recovered when distance was measured via stream length (R xy = 0.35, R 2 = 0.12, P = 0.072) than by straight line distances (R xy = 0.32, R 2 = 0.10, P = 0.086). When Hall and Pollard lakes were removed from the calculation to control for small sample sizes, highly significant relationships were detected, with distance based on stream length (R xy = 0.491, R 2 = 0.24, P = 0.007) being a stronger predictor than straight line distances (R xy = 0.445, R 2 = 0.198, P = 0.021; Fig 2).
The results of the Bayesian clustering analysis in STRUCTURE indicated a peak in posterior probabilities (-ln P(D)) at K = 2, and ΔK values also detected the coarsest level of genetic structure at K = 2 (Fig 1, S2 Fig). Most individuals from Aleknagik, Engineer, Hall, Lower Ohmer, Pollard, Seymour, and Wolf lakes were assigned to one cluster, and most individuals in Cheney, Falk, Willow, and Scout lakes were assigned to a second cluster (Fig 1, S2 Fig). Individuals from Big Beaver, Cornelius, Iliamna, Loberg, Rocky, and Walby lakes were assigned to both clusters. Most individuals in a host were assigned completely to one cluster or the other; only six fish hosted S. solidus from more than one cluster, and only a few hosts carried individuals with admixed genotypes (S2 Fig). The frequency of the two clusters varied over time; the second cluster became more frequent in the latter two years of the study (S2 Fig). At K = 5, individuals from Engineer Lake and some individuals from Walby Lake were assigned to a third cluster independent of most other samples (Fig 1, S2 Fig). When individuals from the two primary genetic clusters were run separately as subsets (i.e., cluster one and cluster two respectively), individuals from cluster one were assigned to two genetic clusters, and individuals in cluster two tended towards incomplete assignment when run as a subset, as was observed when all individuals were run together (S2 Fig).

Discussion
The evolutionary potential of a parasite can depend on gene flow, which frequently reflects host dispersal [52,53]. This study tested the hypotheses that the genetic structure of S. solidus reflects the habits of its intermediate host or, alternatively, those of its definitive hosts. We evaluated spatial and temporal patterns of genetic variation and found evidence of significant population genetic structure in S. solidus infecting threespine sticklebacks in lakes throughout south-central Alaska. The observed patterns of genetic differentiation and diversity showed that gene flow is constrained, and that variability over space and time likely reflects factors other than the movement of the most vagile host. We recovered significant mtDNA and microsatellite-based estimates of genetic differentiation, with genetic structure recovered in frequency-based and Bayesian clustering analysis (Tables 1 and 2; Figs 1 and 2). Even though levels of differentiation found in fish and parasites are frequently higher than in other systems, F ST and F ST values in S. solidus are comparably higher than levels of genetic differentiation in similar host-parasite systems [54][55][56][57]. For example, much lower levels of differentiation have been found in Ligula intestinalis, another geographically widespread cestode parasite, which has been attributed to transportation of eggs by vagile piscivorous birds [10,58].
Hierarchical signatures of genetic differentiation were attributable to differences among stickleback hosts within years and among lakes, indicating that parasite populations may be defined by intermediate host identity, host cohort, or lake of origin (Table 2). Though measures of hierarchical genetic differentiation likely captured other sources of differentiation due to samples being taken over time, pairwise measures of differentiation suggest that S. solidus is significantly differentiated among lakes (Table 3). Clustering analysis also suggests the presence of two populations across south-central Alaska that differ in frequency among lakes (Fig  1, S2 Fig). Lakes generally were assigned completely to one cluster or the other, though some (e.g. Big Beaver, Cornelius, Iliamna, Loberg, Rocky, and Walby) harbored both clusters (Fig 1). The highest numbers of effective mitochondrial haplotypes corresponded to lakes that harbored both clusters (Table 1). This indicates that the movement of its vagile, definitive hosts influences gene flow in S. solidus less than would be expected based on movement potential [4,9,10,20,59]. The observed patterns of genetic variation in S. solidus could be attributed to one or more possible mechanisms, including genetic drift, behaviors of definitive hosts constraining dispersal, feeding or movement of intermediate hosts, and adaptive specificity of S. solidus to intermediate host genotype.

Genetic variation
The high levels of genetic diversity observed in S. solidus (Tables 1, 2, and 3; Figs 1 and 2) likely reflects transmission across a complex life cycle. High levels of genetic diversity in S. solidus may arise because a single stickleback host can contain multiple parasite genotypes, as copepods containing cestodes are consumed individually and in great numbers by fish. Definitive avian host individuals may in turn consume many stickleback hosts [60]. Similar levels of genetic diversity have been observed in other host-parasite systems where intermediate hosts harbor multiple parasite genotypes and definitive hosts consume several intermediate hosts [11,61].
The observed hierarchical population structure in S. solidus also is likely an outcome of a complex life cycle. Observed patterns of genetic differentiation suggest that parasites within a stickleback intermediate host can be considered a subpopulation comprised of discrete or overlapping generational cohorts, and lakes a collection of multiple subpopulations [10]. Though subpopulations are well defined within intermediate hosts, subsequent mixing can occur within a lake because multiple infected fish may be consumed by a single definitive host. Further geographic nesting could reflect definitive hosts feeding and shedding parasites within a particular lake or region [10]; if definitive hosts feed on intermediate hosts in a single lake, the subsequent generation of parasites may be returned to the same or nearby lake if shedding occurs soon after consumption of infected stickleback.

Population connectivity
The observed patterns of genetic differentiation in S. solidus indicate that population connectivity is lower than what would be expected from the high dispersal potential of its definitive hosts. Semi-aquatic birds like Common Loons can readily carry S. solidus among lakes. Yet, 2.96% of mtDNA haplotype variation is attributable to differences among lakes (Table 2). Pairwise F ST values based on microsatellite allelic variation also indicate that S. solidus in nearby lakes exhibit significant levels of genetic differentiation (e.g. Engineer and Lower Ohmer, F ST = 0.11; Table 3). Additionally, we found that S. solidus exhibits signatures of isolation-by-distance, suggesting that there are stronger geographical barriers to gene flow in S. solidus than has been previously thought (Fig 2). The signature of isolation-by-distance was more pronounced when distance corresponded to riverine movement corridors, suggesting that watershed boundaries could be important in mediating gene flow among lakes. The movement of definitive avian hosts might be constrained if, for example, flight patterns correspond to drainage topography rather than straight-line pathways or if avian hosts exhibit site fidelity. It is also possible, however, that dispersal of definitive hosts does not coincide with periods of infection, which can be relatively brief (i.e., days to weeks). Reproduction and shedding of S. solidus also might occur prior to dispersal of definitive hosts. If so, then the observed pattern of IBD among lakes could be a reflection of small effective population sizes and genetic drift arising from stochastic synchronicity between infection, reproduction and dispersal of definitive hosts.
The strong signatures of temporal variation observed across the four-year study period appear to be tied to the habits of stickleback hosts. Among year differences in mtDNA haplotype frequencies ( Table 2) and evidence of replacement of one by another microsatellite genotypic cluster over the study period (S2 Fig) are consistent with the feeding ecology of stickleback hosts [53,62]. Sticklebacks typically cease accumulating parasites when their diet shifts away from copepods late in their first year after hatching, which could result in a two year turnover cycle of genotypic clusters (i.e., generational cohorts) in lakes corresponding to the lifespan of stickleback hosts [16,23,32,63]. It is also possible, however, that transmission cycles could be acting in conjunction with genetic drift and constrained or variable definitive host movement to promote spatial and temporal genetic differentiation in S. solidus.
The observed patterns of restricted gene flow may also be an outcome of intermediate host specificity within S. solidus. Nishimura et al. (2011) recovered phylogenetic evidence indicating that continental patterns of genetic differentiation in Schistocephalus parasites reflect specificity between threespine and ninespine stickleback intermediate hosts [26,64]. Evidence of finescale geographical differences in parasite responses to host immune function suggests that S. solidus may also exhibit adaptive specificity to evolutionary lineages of threespine stickleback [65,66]. If so, then the observed patterns of genetic variation might reflect differential infection of the two genetically distinct but morphologically identical clades of threespine stickleback that occur in south-central Alaska [62,65,67,68]. Though the low percentage of hosts (~2.5%) infected by parasites assigned to two genotypic clusters and evidence of high genetic diversity and genetic differentiation in S. solidus are consistent with this scenario, further study is warranted to determine the likelihood of adaptive host specificity in mediating gene flow in S. solidus [3,55,65,69]. Comparisons of host haplotype to parasite genotype could, for example, reveal whether patterns of genetic relatedness are correlated. Additional comparisons focusing on functional genes associated with immune response could also prove informative, especially with reference to further information on life history and demographic factors that may influence host-parasite interactions.