Population Structure and Adaptive Divergence in a High Gene Flow Marine Fish: The Small Yellow Croaker (Larimichthys polyactis)

The spatial distribution of genetic diversity has been long considered as a key component of policy development for management and conservation of marine fishes. However, unraveling the population genetic structure of migratory fish species is challenging due to high potential for gene flow. Despite the shallow population differentiation revealed by putatively neutral loci, the higher genetic differentiation with panels of putatively adaptive loci could provide greater resolution for stock identification. Here, patterns of population differentiation of small yellow croaker (Larimichthys polyactis) were investigated by genotyping 15 highly polymorphic microsatellites in 337 individuals of 15 geographic populations collected from both spawning and overwintering grounds. Outlier analyses indicated that the locus Lpol03 might be under directional selection, which showed a strong homology with Grid2 gene encoding the glutamate receptor δ2 protein (GluRδ2). Based on Lpol03, two distinct clusters were identified by both STRUCTURE and PCoA analyses, suggesting that there were two overwintering aggregations of L. polyactis. A novel migration pattern was suggested for L. polyactis, which was inconsistent with results of previous studies based on historical fishing yield statistics. These results provided new perspectives on the population genetic structure and migratory routes of L. polyactis, which could have significant implications for sustainable management and utilization of this important fishery resource.


Introduction
Marine fishes are generally expected to show much lower geographical differentiation than their freshwater counterparts, which is understandable in light of lacking geographical barriers migration since the capture data was somehow limited [23,24]. Meanwhile, traditional approaches of tagging experiments exhibited extremely low mark-recapture rates most likely due to the intensive fishing pressure [23,24]. It has been previously demonstrated that oceanographic features, such as eddies and fronts, representing shift in temperature, salinity or food availability may also prevent random mixing and diffusion of marine organisms [16,17,35]. Although previous studies have suggested that environment factors, such as the Yellow Sea Warm Current (YSWC) and the Tsushima warm current and Kuroshio, were correlated with the migration and distribution of L. polyactis, genetic evidence for correlations between sea currents and migration or distribution of L. polyactis has not yet been reported in literature. Recent genetic studies based on RAPD and AFLP [30][31][32] demonstrated that there were three populations of L. polyactis, while studies based on mitochondrial (mtDNA) [33,34] and microsatellite markers [4,36] showed no significant population differentiation, suggesting panmixia in L. polyactis. In contrast to most previous studies, Wang et al. (2013) [4] found two microsatellites under diversifying selection and illustrated that the adaptive differentiation of L. polyactis could add to the understanding of the population structure and thus serves as important information for establishing conservation strategies. In addition,it is emphasized that the accuracy of genetic mixed-stock analysis (MSA) using microsatellite markers for marine fish species is mainly limited by the relatively low level of genetic divergence [37]. Therefore, a comprehensive population genetic analysis of L. polyactis is still needed to assess patterns of population genetic divergence throughout its distribution and to resolve discrepant results.
In the present study, a total of 15 novel polymorphic microsatellite markers were used to assess the levels of genetic diversity and patterns of population genetic divergence of L. polyactis populations. Our sampling scheme covered nearly the entire distribution range of L. polyactis including both nearshore spawning and offshore overwintering grounds, representing an open system experiencing multidirectional migration between relatively isolated populations across several environmental gradients. Accordingly, this sampling scheme enhanced our capability to infer processes shaping population structure and migration routes of L. polyactis. Overall, results of the present study could provide useful insights into delineating fine population genetic structure and uncovering possible processes leading to adaptive divergence in L. polyactis, which should have important implications for the management and conservation of this fishery species.

Ethics statement
The field studies did not involve any endangered or protected species. L. polyactis is not protected by Chinese law or by any of the countries where and when sampling was performed. It is a commercially harvested species in China and the other Northeast Asian countries. The samples were collected by trawling and fishes were already dead when collected.

Sampling and DNA extraction
A total of 337 L. polyactis adult individuals were collected from 15 geographical locations ( Fig  1; Table 1), including 10 locations along the coast of China (Dandong, BLA, BLB, Qinhuangdao, Dongying, Weihai, Qingdao, Wenling, Xiapu and Yangtze River Estuary) and five locations from offshore areas of the Yellow Sea and the East China Sea (SYA, SYB, SYC, NEA, NEB). All the samples were collected from natural fishing grounds between March 2005 and September 2014 using direct fishing techniques. Caudal fin or muscle tissue samples were collected and preserved in 95% (v/v) ethanol at -80°C. Total genomic DNA was extracted using the standard phenol-chloroform protocol. Extracted DNA was checked using 1.5% agarose gel electrophoresis and then stored at -20°C.

Data analysis
Genetic diversity. For each population, genetic variation indices including polymorphism information content (PIC), observed heterozygosity (H O ) and expected heterozygosity (H E ) were calculated using the Excel Microsatellite Toolkit [40]. Inbreeding coefficient (F IS, a measure of the extent of nonrandom mating), and allelic richness (A R, an estimate of allelic diversity independent of sample size) were computed in FSTAT 2.9 [41]. Deviations from Hardy-Weinberg equilibrium (HWE) for each locus in each population and linkage disequilibrium (LD) between pairs of loci were tested in GENEPOP 4.3 using the Markov chain methods (10000 dememorizations, 1000 batches, and 10000 iterations per batch) [42,43]. The software MICRO-CHECKER 2.2.0 [44] was used to test for technical artefacts such as null alleles, stuttering and large allele dropout.
Outlier tests. FIDST2 [45] implemented in LOSITAN [46] was applied to identify potential outliers. This method calculates F ST for each locus and assumes that outlier loci under diversifying selection would show increased level of population differentiation compared to simulated distribution of loci based on observed level of population differentiation. Thus loci with unusually high F ST values were regarded as potentially being under directional selection. LOSITAN was run using 100 000 simulations with a confidence intervals of 99.5%, a false discovery rate (FDR) of 0.05, and assuming a stepwise mutation model. Moreover, it has been demonstrated that the results based on LOSITAN may be biased by the presence of strong hierarchical population structure leading to increased level of false positives [47]. Therefore, we further performed the test based on the method of Beaumont & Nichols [45] as described by Excoffier et al. [47], which was performed under a hierarchical island model with 100 000 simulated loci using Arlequin 3.5 [48].
Population structure and differentiation. Pairwise F ST values were estimated based on neutral microsatellite dataset and locus Lpol03 and significance was adjusted using the Benjamini-Yekutieli multiple testing correction [49].The global F ST was calculated by GENEPOP 4.3. In order to compare the patterns of genetic structure between neutral and possible selected loci, principal coordinate analysis (PCoA) based on gene frequency data was performed using GenAlEx 6.5 [50]. The software STRUCTURE 2.3 [51,52] was applied to cluster the samples on the basis of their microsatellite genotypes. We performed an independent assessment of the effects of both neutral and outlier loci on genetic structure. Ten replicates were run for all possible values of the maximum number of clusters (K) up to K = 16 using the admixture model, and for each run, 2,000,000 iterations were carried out after a burn-in period of 200,000 iterations. The Evanno's method [51] implemented in Structure Harvester website was used to detect the optimum number of genetically homogeneous groups (K) [53]. In order to test the partitioning of genetic variation within and among putative groupings of samples, a hierarchical analysis of molecular variance (AMOVA) was conducted based on neutral microsatellite dataset with 10000 permutations using Arlequin 3.5. Demographic history and isolation by geographical distance. Bottleneck software ver. 1.2.02 [54] was used to verify the existence of bottlenecks with 1,000 iterations under the infinite allele model (IAM), stepwise-mutation model (SMM) and two-phased model of mutation (TPM). Significance was tested using the Wilcoxon signed-rank test (Wilcoxon, 1945). Qualitative test of model shift was performed to calcuate the allele frequency distribution using Bottleneck 1.2.02 [54]. The effects of geographical isolation on the genetic structure were assessed by the pattern of isolation by distance (IBD). Pairwise geographic distance between two collections was estimated as the nearest marine distance using Google Maps. IBD regression analysis was performed online using the IBD web service [55] with 10,000 randomizations.

Genetic diversity
Summary statistics of 15 microsatellite loci in the 15 populations are shown in Table 1 & S1 Table. A total of 462 alleles were detected across all 15 loci, with the number of alleles per locus ranging from 15 for locus Lpol11 to 43 for locus Lpol15. Average A R per sample was highest in NEA (11.824) and lowest in DY (10.085). The average of expected heterozygosity ranged from 0.862 (DY) to 0.900 (NEA) and the average of observed heterozygosity varied from 0.870 (DY) to 0.924 (CJK), respectively. The average of PIC values ranged from 0.814 to 0.869 (Table 1), suggesting a high level of genetic diversity. For all samples and loci combinations, no linkage disequilibrium was detected in the 15 populations. However, seven cases of locus-population combination out of 240 showed significant departure from Hardy-Weinberg equilibrium (HWE) after sequential Bonferroni correction (P < 0.0005, S1 Table). The MICRO-CHECKER analysis revealed that the loci Lpol05, Lpol11, Lpol16 and Lpol17 may include null alleles in one to four populations. However, we used all loci in this study because no locus was affected by null alleles in all samples.

Outlier detection
Two loci (Lpol03 and Lpol04) were identified as outliers at the 99.5% confidence interval after FDR correction, with F ST values of 0.094 and 0.023, respectively (S1 Fig). Nevertheless, there were also two loci (Lpol03 and Lpol06) identified as putatively under positive selection at the significant level of 0.01 under hierarchical island model (S2 Fig). The consistent results of both outlier tests suggested Lpol03 was a candidate locus under directional selection. The result of NCBI blastn search against refseq_genomic database for the flanking region sequences of Lpol03 showed strong similarity to Grid2 gene (E-value = 0; Identity = 97%) which encodes the glutamate receptor δ2 protein (GluRδ2), an excitatory receptor for glutamate. The Lpol03 with a fragment of 454 bp was found to be located in the intron of Grid2 gene in a genomic scaffold of Larimichthys crocea.

Genetic differentiation and population structure
The global F ST among all populations based on neutral loci and the outlier locus Lpol03 were 0.0036 and 0.0915, respectively. Pairwise F ST values among samples based on neutral loci ranged from -0.0084 (between NEA and NEB) to 0.0184 (between SYC and CJK). Moreover, most F ST values between SYC and other sampling populations were statistically significant after Benjamini-Yekutieli correction ( Table 2). Approximately half the comparisons based on locus Lpol03 yielded highly significant F ST values after Benjamini-Yekutieli correction (ranging from 0.0772 to 0.2543, P < 0.00908). Pairwise F ST results based on the outlier locus Lpol03 exhibited an overall higher value than those based on neutral loci, suggesting the existence of genetically differentiated populations of L. polyactis under directional selection. PCoA plotting based on neutral loci indicated no clear population structure among populations and a genetic distinctiveness of sample SYC (Fig 2A). However, PCoA plotting for locus Lpol03 showed two clearly differentiated groups: DD,BLA,BLB,QHD,SYA,SYB,CJK,WL and XP clustered as one group, while QD,WH,DY, NEA, NEB and SYC grouped as another distinct cluster ( Fig 2B). Overall, the PCoA plotting analysis illustrated a pattern consistent with previous estimates of population structure derived from pairwise F ST based on the neutral loci and outlier locus Lpol03. In simulations of the Bayesian clustering method to evaluate genetic structure, the mean ΔK values suggested 3 clusters as the most likely population structure for the neutral dataset (S3 Fig), while the number of genetic groups best fitting for the outlier locus Lpol03 was 2 (S4 Fig). Using K = 3 in STRUCTURE, we found that the 15 populations did not show any distinct population genetic structure based on the neutral datasets. Conversely, strikingly different result was obtained based on Lpol03 locus, which showed that two distinct genetic clusters were identified (Fig 3). When the AMOVA was performed without considering hypothetical grouping of populations, no significant genetic variability was observed among populations. However, the hierarchical AMOVA conducted with populations grouped according to sampling time revealed low but significant genetic variance among groups (P = 0.000) respectively. Conversely, with populations grouped according to geographic origins, the AMOVA showed nonsignificant genetic differentiation among groups (P = 0.112 and 0.131) respectively (S2 Table).

Population demography and isolation by distance
Genetic bottleneck analysis showed no significant heterozygotes excess in all of the 15 populations under the two-phase model (TPM) and the stepwise mutation model (SMM), only the test based on the infinite alleles model (IAM) identified several excess of heterozygosity (S3 Table). Given that microsatellite mutations are thought to occur largely through a stepwise process, a combination of the SMM and IAM is expected to provide best estimate of equilibrium heterozygosity for the bottleneck analysis [56]. The absence of heterozygosity excess under both the SMM and the TPM models suggested that all the populations were at mutation-drift equilibrium. These results were also consistent with the normal L-shaped distribution of allele frequency, indicating no genetic bottleneck in the recent past. IBD analysis showed that there was no evidence of isolation by distance pattern (P > 0.05), potentially indicating the persistence of high-level gene flow.

Genetic diversity and population differentiation
In the present study, slight heterozygote deficiency in all populations was observed, which might be attributed to Wahlund effect resulting from mixing of fish from different breeding areas [57]. Deviation from HWE might be explained by decreases in population size as a result of commercial exploitation, in which the consequences are similar to those of the founder effect [58,59]. In addition to population-level processes, the null alleles frequently found at microsatellite loci, which also have been reported in many other fish species [60][61][62], could probably explain deviations from HWE. The high genetic diversity in microsatellite loci of L. polyactis might be attributed to high mutation rate and large population size [63]. The 15 highly polymorphic microsatellites provided much more genetic information than the RAPD, AFLP and mtDNA markers in previous studies on L. polyactis [30,31,34,64,65]. Moreover, as suggested by AMOVA and IBD regression analysis, no evidence of isolation by distance was observed, which further reflected high degree of genetic connectivity among different geographical populations of L. polyactis. However, it should be noted that differentiation between groups clustered by sampling time was supported by the AMOVA (S2 Table), even though the proportion of genetic variation is relatively low.
Combining neutral and selected loci to assess population structure It has been suggested that genomic regions showing high divergence may represent signatures of local adaptation or genomic islands of divergence recalcitrant to gene flow [66,67]. In the present study, no significant population structure was identified in L. polyactis based on neutral microsatellite markers, suggesting the existence of high gene flow, which was consistent with previous results based on microsatellite markers and mtDNA analysis [34,36]. Conversely, strikingly different results were obtained when using Lpol03 locus identified as candidate outlier potentially under positive selection, which showed evidence for predominant divergence among populations of L. polyactis. Significant population differentiation were also observed based on outlier microsatellite loci associated with temperature in Wang et al. (2013). No isolation by distance was detected among populations of L. polyactis, which was also in accordance with previous results using both neutral and outlier microsatellite loci in Wang et al. [4]. The patterns of population divergence observed in the present study were generallyconsistent with previous studies based on a different battery of microsatellites, reflecting the effects of both gene flow and local adaptation on population structure. The genomic sequence containing the outlier Lpol03 showed a strong homology with Grid2 gene encoding the glutamate receptor δ2 protein (GluRδ2), an excitatory receptor for glutamate, which is critically important for dendrite development and synapse formation during neuronal growth such as motor learning and neural wiring [68,69]. Mikami et al. found that the expression of zebrafish GluRδ2 is selective for cerebellum-like neural wiring [69]. In general, glutamate neurotransmitters are involved in a variety of stimulus-induced actions such as stress, aggressiveness, locomotion as well as exploring behavior, which were considered to be important for food accessibility and size-related sex determination for fish [70]. As most anadromous fishes have evolved strict homing behavior, significant variation in motor learning behavior may also be attributed to separate wild groups matched to different strict homing behaviors with spatial and temporal changes [71]. It has been demonstrated that false positives of outlier detection can result from recent population bottlenecks or geographical structure [72,73]. Neither genetic bottleneck nor isolation by distance was detected in L. polyactis, providing support to the validation of the outlier loci detected. Therefore, these results led us to conclude that a cryptic adaptive population structure might exist in populations of L. polyactis, hence illustrating that local selection pressures might override homogenizing effects of high level gene flow [14,74].

Evidence for isolation by marine environment in overwintering aggregations
In the present study, all the analyses based on the outlier locus Lpol03 indicated that two distinct genetically distinct groups existed among the 15 samples. Moreover, clear genetic differentiation were detected between the overwintering samples from the Yellow Sea (SYA, SYB) and those from the East China Sea (NEA, NEB), indicating that there were two genetically distinct overwintering aggregations, which was consistent with the previous findings based on capture fishery data [23]. During the winter monsoon from late October to early April, a period when L. polyactis aggregate in the overwintering grounds, a strong thermohaline front forms in the southeastern Yellow Sea (YSWC), which represents an area with a sharp cline in oceanographical conditions such as temperature and salinity and thus acts as physical barriers to exchange in the marine environment [75]. Likewise, environmental mechanisms, as contributing and interacting factors in determining population distribution of marine fish, have continually drawn attention in recent population dynamics studies of marine organisms [76]. For example, Li et al. [77] found that the wintering migration of Japanese anchovy (Engraulis japonicus) could be driven by spatially varying environmental factors such as water temperature and salinity, and both Yellow Sea Cold Water Mass (YSCWM) and YSWC were demonstrated to have a remarkable influence on the migration and distribution of Japanese anchovy. Moreover, genetic composition of individuals from the overwintering population of SYC were more affiliated with those from the East China Sea overwintering populations (NEA, NEB). Xu et al. (2009) [24] reported that the overwintering aggregations in the southern Yellow Sea are admixtures of relatively isolated spawning groups of L. polyactis, which is consistent with findings in the present study. Possible causes for this process may be attributed to the fact that the YSWC weakens in late spring when the spawning migration of L. polyactis begins [75,78], which therefore leads to a temporal exchange of individuals from the two isolated overwintering aggregations in the Yellow Sea and East China Sea.

Inference for novel migration routes of L. polyactis
Population structure analyses based on the outlier locus Lpol03 indicated that spawning individuals from WL and XP showed more similarity to the Yellow Sea overwintering population (SYA and SYB) rather than the East China Sea overwintering group (NEA and NEB). The result was consistent with previous results detected by diversifying microsatellite markers, which indicated that populations in the CECS (Central East China Sea) were significantly differentiated from those in the NECS (North East China Sea) [4]. In addition, no clear geographical pattern of population genetic differentiation was observed in the present study based on the outlier locus Lpol03, which is consistent with the population structure detected with loci under diversifying selection in Wang et al. [4]. Therefore, our results suggested that spawning groups of L. polyactis inhabiting in the two overwintering grounds were locally adapted to multiple migration routes. For the Yellow Sea overwintering aggregation: one spawning group migrated northward to the northern part of Bohai Sea and Yellow Sea; the other one migrated southward to the coastal waters of the Central East China Sea and the Yangtze River estuary. Furthermore, overwintering aggregations from the North East China Sea mainly migrated to coastal areas off Shandong Peninsula. Interestingly, individuals of both overwintering groups spawn in separate areas of the Bohai Sea (Fig 1). One important oceanographic factor influencing the migration and distribution of L. polyactis in the Bohai Sea could be the existence of Northern Yellow Sea Cold Water Mass (NYSCWM), which is a unique hydrographic phenomenon in the Yellow Sea [79]. The center of NYSCWM locates outside Bohai Strait, forming a stable water mass with low temperature and high salinity from May to October, which may affect the spawning and overwintering migration routes of L. polyactis between the two overwintering grounds in the Yellow Sea and the East China Sea and the coastal spawning areas [79].

Conclusions
Although high gene flow was indicated by the neutral microsatellites, two genetically differentiated overwintering populations of L. polyactis were highlighted by the outlier locus potentially under selection, one mainly in the East China Sea and the other in the Yellow Sea. Accordingly, a novel migration pattern of L. polyactis was also suggested, which was inconsistent with results of previous studies inferred by traditional approaches [23]. It should be noted that although potential migrants were identified based on the outlier data, suggesting the existence of gene flow between the two migratory groups of L. polyactis, there were also strong evidence that gene flow between the two migratory groups might be limited by local adaptation. Population structure based on the outlier locus Lpol03 reported in our study exhibited a high level of genetic differentiation, which was in concordance with previous results based on genetic markers as well as capture data. In total, the consistent results suggested that the outlier locus Lpol03 was a candidate loci with stronger discriminating power leading to increased accuracy for future MSA studies. Future research should aim to generate a comprehensive link between the spawning groups and overwintering aggregations of L. polyactis, and to use the gene-associated marker in combination with seascape variables, which would provide a powerful means for uncovering the processes leading to adaptive divergence. Overall, the present study provided new insights into the population genetic structure and migratory routine of L. polyactis, which would have significant implications for the sustainable management and utilization of this important marine fishery resource.