Genetic Variation and Breeding Signature in Mass Selection Lines of the Pacific Oyster (Crassostrea gigas) Assessed by SNP Markers

In breeding industries, a challenging problem is how to keep genetic diversity over generations. To investigate genetic variation and identify breeding signatures in mass selected lines of Pacific oyster (Crassostrea gigas), three sixth-generation selected lines and four wild populations were assessed using 103 single nucleotide polymorphism (SNP) markers. The genetic diversity data indicated that the selected lines exhibited a significant reduction in the observed heterozygosity and observed number of alleles per locus compared with the wild populations (P≤0.05), indicating the selected lines tended to lose genetic diversity contrasted with the wild populations. The unweighted pair-group method with arithmetic mean (UPGMA) analysis showed that the wild populations and selected lines were not separated into two groups. Using four outlier tests, a total of 17 loci were found under selection at two levels. The global outlier detection suggested that 4 common outlier loci were subject to selection using both the hierarchical island model and Bayesian likelihood approaches. At regional level, 3 SNPs were detected as outlier using at least two outlier tests and one outlier SNP (CgSNP309) was overlapped in the two wild-selected population comparisons. The candidate outlier SNPs provide valuable resources for future association studies in C. gigas.


Introduction
The Pacific oyster (Crassotrea gigas), naturally distributed around Japan, China and Korea, is now an important cultivated oyster species worldwide [1]. Many countries has started introducing C. gigas since 1940s, mainly because of its rapid growth rate, high disease resistance and strong environmental adaptability [2]. The C. gigas is one of the most popular oyster species in China, and its main places of production are Shandong and Liaoning provinces [3,4]. In 2012, China produced 3.94 million tons of oysters with C. gigas as one of the most dominant species [4,5]. However, the broodstock used today remains largely unselected and thus C. gigas has gained little from heredity improvement by selective breeding [6].
Several selective breeding projects have been launched in couple of countries and obtained encouraging results [7][8][9]. With the aim of improving the productivity traits of C. gigas, a breeding program selected for fast growth has also been initiated in China. The first generation of selection was carried out by mass selection on three cultured stocks from China, Japan and Korea in 2007 and the average improvement (increase in shell height) was 10% [4]. An increase in growth rate is detected over the successive six-generation selection, but little is known about the effects of strong directional selection on genetic level in the process of cultivation. The main focus of selected lines is on inbreeding and the decrease of genetic diversity [10,11]. Yu and Guo [11] detected rare alleles of the 4 selected strains decreased significantly compared with a wild population in Eastern oyster (Crassostrea virginica). Moreover, Cruz et al. [12] found selected strains through breeding programs tended to lose genetic diversity compared with wild populations in Pacific white shrimp (Litopenaeus vannamei). The reduction of genetic diversity in a population may possibly reduce disease resistance and decrease adaptability to environmental changes [13,14]. Therefore, it is crucial to survey the genetic diversity within or between selected lines and wild populations for successful hatchery management of C. gigas.
In recent years, the single nucleotide polymorphisms (SNPs) and simple sequence repeats (SSRs) developed from EST sequences have been extensively used in population genetic studies. Microsatellites have been chosen in many studies for their codominant inheritance and high allelic variability [15]. However, SNP genotyping results can be compared across different platforms and laboratories more easily than microsatellite data, therefore facilitating the integration and interpretation of genotyping data across different databases. Moreover, mutations resulting in some SNPs can be responsible for an adaptive phenotype or the direct target of selection and have been shown to be correlated with economic traits in many aquatic animals [16][17][18]. Therefore, SNP markers offer a valuable chance to explore the genetic basis of phenotypic variation during the selective breeding program in China.
In the present study, we used 103 SNP markers to evaluate the genetic diversity level, assess the genetic differentiation, and identify putative loci under selection in three sixth-generation selected lines and four wild populations of C. gigas. The main goal is to demonstrate the effects of cultivation on genetic diversity in the selective breeding program of C. gigas, and to identify candidate SNPs under selection through the use of outlier analysis methods.

Ethics Statement
The C. gigas does not belong to endangered species and the sample locations are not in protection.

Oyster Collections and DNA Extraction
Three selected lines and four wild populations of C. gigas were surveyed in our study. Collection details for the 7 C. gigas populations were shown in Fig 1 and [4]. In each line, 30-60 pairs of these individuals from the top end of the shell height distribution were used as broodstock oysters for the next generation [19]. In July 2012, 91 (50 ♀×41 ♂) and 107 (59 ♀×48 ♂) individuals from the fifth-generation selected lines of Japan and Korea (JS5 and KS5) were selected as parental oysters for the sixth-generation selected lines of Japan and Korea (JS6 and KS6), respectively. In July 2013, 85 (50 ♀×35 ♂) individuals in the fifth-generation selected lines of China (CS5) were selected as parental oysters for the sixth-generation selected lines of China (CS6). Fertilization was conducted at a ratio of 50 sperm per egg with 10 7 oocytes for each female [4,19]

Data Analysis
The observed number of alleles (Na), effect number of alleles (Ne), shannon's information index (I), observed heterozygosity (Ho), expected heterozygosity (He), minor allele frequency (MAF), inbreeding coefficients (Fis) and chi-square analysis of Hardy-Weinberg equilibrium (HWE) were evaluated by POPGENE 1.32 [23]. All comparable statistics (Na, Ne, I, Ho and He) were calculated between wild populations and selected lines using Mann-Whitney U-Test [24] implemented with SPSS 16.0.
The population genetic structure analysis was estimated by Structure 2.3.4 with a modelbased Bayesian procedure [29,30]. The analysis was conducted using a burn-in period of 10 5 iterations and a run length of 10 6 MCMC replications. The number of clusters (K) was set from 1 to 10 with three replicates. The most probable K value was evaluated by STRUCTURE HARVESTER [31,32].
To explore candidate SNPs under selection, Fst-based outlier tests were conducted using two island models including a finite island model (FDIST approach) and a hierarchical island model implemented in Arlequin 3.5.1.3 [25,33], and Bayesian likelihood approach implemented in BayeScan [34]. To detect additional evidence of selection, lnRH tests were also conducted in population pairwise comparisons.
The outlier analysis was implemented at two levels. (1) An overall analysis encompassed all populations under the hierarchical island models. Three groups were set (CS6+RS+DY+RC vs. JS6+MG vs. KS6) based on the Structure result. The analysis was conducted using 50000 coalescent simulations with 10 groups of 100 demes. (2) An analysis was conducted between selected lines and wild populations from which the selected lines originated (CS6 and RS; JS6 and MG) under a finite island-model. We conducted 50000 coalescent simulations with 100 demes to generate the joint distribution of Fst versus heterozygosity. Loci which were outside the 99% confidence intervals were treated as outliers potentially under selection.
The outlier tests at the two levels were also calculated with the BayeScan software. The analyses were estimated using a pilot run of 20, a burn-in of 50000 with 100000 iterations each, a sample size of 5000, a thinning interval of 10, and an FDR of 0.05. Loci with log 10 values of the posterior odds (PO) >0.5 and 2.0 were regarded as candidate SNPs under selection with substantial and decisive evidence [35].
The lnRH test was applied to estimate the ratio of gene diversity (heterozygosity) of all loci in the population pairwise comparisons as follows: where H pop1 and H pop2 denote expected heterozygosity for population 1 and population 2 [36].
The lnRH values were standardized for each population comparisons and therefore the standardized distributions had a mean of zero and a standard deviation of one [37]. Those SNPs fell out of the 95% interval were taken as candidate loci under selection. Sequence annotation was conducted by BLASTx software in NCBI database (http://www. ncbi.nlm.nih.gov/) and OysterBase (http://www.oysterdb.com), and the critical E value was set as 1.0×10 −6 . The NCBI ORF finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) was used to distinguish synonymous SNPs, non-synonymous SNPs or SNPs from untranslated regions (UTRs). The putative function of genes was identified by using the Gene Ontology (GO) annotation by mining the Swiss-Prot database (http://us.expasy.org/sprot/sprot-top.html) and OysterBase.

Genetic Diversity
Genetic variability indices for the four wild populations and three selected lines of C. gigas were shown in Table 2. For all the genetic diversity parameters, the selected lines had lower values (mean Na, Ne, I, Ho and He) than the wild populations, but no significant loss of Ne, I or He were detected (P > 0.05). The observed heterozygosities ranged from 0.2703 to 0.2939 with a mean of 0.2806 in the wild populations, and ranged from 0.2479 to 0.2733 with a mean of 0.2599 in the selected lines. There is a significant reduction of observed heterozygosities in the selected lines in contrast with the wild populations (P = 0.05). The observed number of alleles per locus varied from 1.9417 to 1.9709 in the selected lines, and varied from 1.9806 to 2.000 in the wild populations. There was a significant reduction of the observed number of alleles per locus in the selected lines compared with the wild populations (P < 0.05).
Information of the 103 SNPs evaluated from the three selected lines and four wild populations were summarized in S1 Table. All loci had two alleles in the 7 populations except that 19 loci had only one allele in one or more populations, including 3 loci (CgSNP252, CgSNP265 and CgSNP283) in CS6, 5 loci (CgSNP149, CgSNP176, CgSNP254, CgSNP265 and CgSNP305) in JS6, 6 loci (CgSNP14, CgSNP140, CgSNP176, CgSNP187, CgSNP252 and CgSNP319) in KS6, 1 locus (CgSNP283) in RS, 2 loci (CgSNP254 and CgSNP309) in RC and 2 loci (CgSNP176 and CgSNP283) in DY. A total of 83 of the 721 single-locus exact tests showed significant departures

Genetic Differentiation
Most pairwise Fst values were significant (P<0.001), except for that estimated between wild populations RS and RC. The lowest value was observed between RS and RC (Fst = 0.00689), however the highest value was detected between JS6 and KS6 (Fst = 0.11737) ( Table 3) The pairwise Nei's genetic distances among populations were also shown in Table 3. The lowest value was observed between RS and RC (0.0083), while the highest value was detected between KS6 and JS6 (0.0571). The UPGMA tree was constructed based on pairwise genetic distance (Fig 2). The UPGMA tree separated the 7 populations into two clusters. The KS6 originated from Korea belonged to one cluster while the other cluster included the other 6 C. gigas populations. Moreover, this cluster was further divided into 2 subgroups; the first subgroup contained JS6 and MG originated from Japan, and the second subgroup contained CS6, RS and DY originated from China, and RC from Korea.

Genetic Structure
The genetic structure test indicated that the most likely number of genetic groups was K = 3. The first group consisted of CS6, RS and DY originated from China, and RC from Korea. The second group consisted of JS6 and MG originated from Japan. Moreover, the selected line KS6 originated from Korea constituted the third genetic group (S1 Fig).

Outlier SNPs
Both the hierarchical island model and Bayesian likelihood approach revealed 4 loci (CgSNP140, CgSNP158, CgSNP176 and CgSNP209) under selection across all populations (Table 4; Fig 3). To detect specific footprint of artificial selection at regional level, we conducted pairwise tests between selected lines and wild populations from which the selected lines originated. The Fdist method revealed 5 candidate SNPs under selection, including 3 loci (CgSNP40, CgSNP225 and CgSNP236) for populations originated from China (CS6+RS) and 2 loci (CgSNP176 and CgSNP197) for populations originated from Japan (JS6+MG) ( Table 4). The same two SNPs (CgSNP 225 and CgSNP 176) were also detected by BayeScan. The lnRH   iron-sulfur protein 2 and Collagen alpha-5(VI) chain were involved in G-protein coupled receptor protein signaling pathway, lipid transport, oxidation-reduction process and lipoprotein metabolic process, however, the putative function of other five proteins (CGI_10028477, flotillin 2, CGI_10002462, UPF0451 protein C17 or f61-like protein and CGI_10010736) could not be identified using GO searches.

Genetic Diversity
Maintenance of genetic variation is known to be important for long-term survival of populations because the level of variation determines their adaptability to environmental changes [38,39]. Heterozygosity in many instances is considered as the primary parameter to reflect overall genetic variability of populations [40,41]. In our study, the selected lines showed a significant reduction in the observed heterozygosity contrasted with the wild populations. Similar results have been detected in cultivated Atlantic salmon (Salmo salar) and Pacific abalone (Haliotis discus hannai) populations [42,43]. Many researches have indicated loss of alleles is often more easily observed than decrease of heterozygosity in cultivated or selected populations [10,11,44]. Yu and Guo [11] detected the decrease in allele number was significant for selected lines NEH (by 36.8%) and CTS (by 35.1%) contrasted with the wild population of C. virginica. Wang et al. [45] found the cultured populations lost 9 of the 45 alleles (20%) compared with the wild population of bay scallop (Argopecten irradians). Compared with wild populations, 20% to 48% fewer alleles were detected in farmed salmon strains from Ireland and Norway [46]. In our study, the selected lines exhibited a significant reduction in observed number of alleles in contrast with the wild populations (P<0.05). Moreover, 0.98% (2/205) and 2.43% (5/206) fewer alleles were found in selected lines CS6 and JS6 than their wild progenitors (RS and MG), respectively. The CS6 stock seemed to be missing two rare alleles at 2 loci (CgSNP252 and CgSNP265) compared with RS. No individuals with the rare allele of the CgSNP265 found rarely in RS, were sampled from CS6. There was a possibility that this rare allele existed in CS6, but was not sampled in the population. As this rare allele frequency was 0.0521 in RS, the probability of completely missing this rare allele in the CS6 (48 samples) should be estimated as 0.9479 96 = 0.0059. Therefore, it is most likely that the CS6 has indeed lost this rare allele at the CgSNP265. Similarly, the JS6 may have indeed lost rare alleles at 4 loci (CgSNP149, CgSNP176, CgSNP254 and CgSNP305). Taken together, we can make a conclusion that the selected lines through breeding programs tended to lose genetic diversity compared with the wild populations. The decrease of genetic diversity may result from genetic drift, bottlenecks, and inbreeding caused by a reduced effective population size.
Significant deviations from HWE were detected both in selected lines and wild populations. In selected lines, heterozygote deficiency may be mainly because of a limited number of founders and artificial selection [47]. In the wild populations, heterozygote deficiency may be cause by Wahlund effect and natural selection. As heterozygotes with null alleles may be taken as homozygotes, null alleles may also account for the heterozygote deficiency.

Genetic Differentiation
The UPGMA analysis indicated that the selected lines and wild populations were not separated into two groups and even the wild populations and selected lines from the same origin country were culstered into the same subgroup except for KS6 and RC, suggesting there was no clear division between the wild populations and selected lines. The result is also supported by the Structure analysis.
Within the selected lines, moderate genetic differentiation (0.05<Fst<0.15) was detected. The UPGMA analysis also demonstrated the three selected lines fell into three groups. The significant genetic differentiation among the three selected lines may result from three different founder populations, thus providing the genetic basis for the establishment of selected lines for faster growth using three C. gigas stocks from China (Rushan), Japan (Miyagi Prefecture) and Korea (Pusan).
Within the wild populations, slight genetic differentiations (Fst<0.05) were detected among the three populations RS, DY and RC. The Nm values among these populations ranged from 7.98723 (RC and DY) to 36.03447 (RS and RC), which appeared to be sufficiently large to swamp potential for large genetic differences. Since oyster adults are sedentary, larval dispersal becomes a decisive factor affecting gene flow among wild populations. Therefore, the long pelagic larval phase (14-21 days) and high fecundity (10 to 50×10 6 oocytes per female) of the Pacific oyster may account for the relatively large gene flow detected among RS, DY and RC [48,49]. However, moderate genetic differentiations were found between population MG and other three populations RS, DY and RC, suggesting potential barriers to gene exchange may exist. Many studies showed a significant relationship between gene flow and geographic distance, suggesting that a pattern of isolation by distance (IBD) explained a great proportion of gene flow in Mactra chinensis and Crassostrea ariakensis [50,51]. The IBD pattern indicates long geographic distance can restrict migration among wild populations, which may explain the restricted gene flow between population MG and other three populations RS, DY and RC. Overall, the genetic relationship among the four wild populations could be visualized using the UPGMA tree, which indicated that RS and RC had the closest relationship with DY being the sister group, while MG was in another clade.

Outlier SNPs
In order to improve the productivity traits of C. gigas, successive generations of mass selection for fast growth were conducted in three C. gigas stocks from China, Japan and Korea since 2007. As there are similarities in controlled selective regimes, the selected lines may provide a valuable opportunity to study parallel evolution, which is taken as one of the most convincing manifestations of the effect of selection in driving adaptive change [52,53]. Our study provided support for the hypothesis of parallel evolution at the DNA level as one outlier SNP (CgSNP309) was overlapped in the two wild-selected population comparisons. The parallel evolution is also shown in two ways in transcription level, one by the same DNA sequence polymorphisms with direct changes in gene expression, and the other by the different DNA sequence polymorphisms with the same downstream pathways during the cultivation [54]. Flori et al. [55] detected that different DNA sequence polymorphisms but the same physiological pathways were influenced during the cultivation in dairy cattle. The similar results were also found in rainbow trout (Oncorhynchus mykiss) and brook charr (Salvelinus fontinalis) [56,57]. However, we can not conclude that parallel evolution at functional levels may account for the result in our study as only four outliers could be annotated with GO terms.
Four different tests (hierarchical island model, fdist island model, Bayesian likelihood approach and lnRH test) were used to detect outlier SNPs potentially under selection during the selective breeding program. These approaches rely on the rationale that selection will lead to increased genetic differentiation between populations and should exhibit a reduction or elevation in genetic variation contrasted with neutral genes. Actually, selection is not the only cause of variation changes at particular loci, reduced variation or increased differentiation can also arise from genetic drift, bottlenecks or founder events [58]. The effective population size for these selected lines is likely to have been less than 100, allowing for substantial random genetic drift over six generations.
The highly consistent results from the hierarchical island model and Bayesian likelihood approach for global outlier analyses strongly suggested that the identified outlier SNPs were subject to selection. Moreover, the same three SNPs (CgSNP140, CgSNP176 and CgSNP209) were also detected as outliers using selected lines and wild populations from which the selected lines originated (CS6, RS, JS6 and MG) (data not shown). At regional level, 3 SNPs (CgSNP225, CgSNP 236 and CgSNP 176) was detected as outlier using at least two analysis methods. Those outlier SNPs found in only one test should be considered with caution. The inconsistent results obtained by the three methods (fdist island model, Bayesian likelihood approach and lnRH test) were probably due to the different measures of variability and different assumptions [59]. Although the outlier analysis provides an encouraging result, association genetics and functional studies are ultimately required to confirm the outlier loci are involved in the artificial selection during the selective breeding program in China.
In this study, SNPs has been demonstrated to be a good marker of choice for monitoring the genetic variation in wild populations and selected lines of C. gigas. The genetic diversity analysis showed that the selected lines tended to lose genetic diversity contrasted with the wild populations during the successive six-generation selection. Moreover, the UPGMA results suggested there was no clear division between the wild populations and selected lines. In addition, a total of 17 loci were found under selection at two levels using four outlier tests. Further functional studies are needed to confirm the role of the candidate outlier SNPs during the domestication process in C. gigas.