Hierarchical genetic structure in an evolving species complex: Insights from genome wide ddRAD data in Sebastes mentella

The diverse biology and ecology of marine organisms may lead to complex patterns of intraspecific diversity for both neutral and adaptive genetic variation. Sebastes mentella is a livebearing species with a distinctive life history, and the existence of multiple ecotypes has been suspected to complicate its genetic population structure. Double digest restriction-site associated DNA (ddRAD) sequencing was used to investigate genetic population structure in S. mentella and to scan for evidence of selection. In total, 42,288 SNPs were detected in 277 fish, and 1,943 neutral and 97 tentatively adaptive loci were retained after stringent filtering. Unprecedented levels of genetic differentiation were found among the previously defined ‘shallow pelagic’, ‘deep pelagic’ and ‘demersal slope’ ecotypes, with overall mean FST = 0.05 and 0.24 in neutral and outlier SNPs, respectively. Bayesian computation estimated a concurrent and historical divergence among these three ecotypes, and evidence of local adaptation was found in the S. mentella genome. Overall, these findings imply that the depth-defined habitat divergence of S. mentella has led to reproductive isolation and possibly adaptive radiation among these ecotypes. Additional sub-structuring was detected within the ‘shallow’ and ‘deep’ pelagic ecotypes. Population assignment of individual fish showed more than 94% agreement between results based on SNP data and previously generated microsatellite data, but the SNP data gave a lower estimate of hybridization among the ecotypes than the microsatellite data. Using a machine-learning algorithm, we identified a panel of only 21 SNP loci that discriminates populations in mixed samples. This first SNP-based investigation clarifies the population structure of S. mentella and provides novel, high-resolution genomic tools for future investigations. The insights and tools provided here can readily be incorporated into the management of S. mentella and serve as a template for other exploited marine species exhibiting similarly complex life history traits.

In stage (run) 2, a total of five populations were included (Figure 2). Because scenario 1 was supported in run 1 (see Results), we added the 'slope' group to scenario 1 of run 1 and assessed three different scenarios in run 2. Here, scenario 1 proposes a first split between the 'deep' and 'shallow' groups, followed by a second split between the 'shallow' and 'slope' groups. In scenario 2, the second split occurs between the 'deep' and 'slope' groups. Scenario 3 proposes a concurrent split among the 'shallow', 'deep' and 'slope' groups.

Confidence in scenario choice and model checking
We assessed confidence in scenario choice by evaluating Type I and Type II error rates, following the method described in Cornuet et al. (2010). One thousand test data sets were simulated under each scenario. The posterior probability of the competing scenarios was estimated for each of these pseudo-observed data sets. Type I error was estimated as the proportion of data sets simulated under the best scenario for which another scenario received the highest posterior probability. Type II error was estimated as the proportion of data sets simulated under the other scenarios for which the best scenario nevertheless received the highest posterior probability.
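The two error rates defined above can be illustrated with a minimal sketch. It assumes that, for each pseudo-observed data set, we know the scenario it was simulated under and have a mapping from scenario label to estimated posterior probability. The function names and the data layout are hypothetical and are not taken from the software used in the study.

```python
def chosen_scenario(posteriors):
    """Return the scenario label with the highest posterior probability."""
    return max(posteriors, key=posteriors.get)


def error_rates(true_scenarios, posterior_list, best):
    """Estimate Type I and Type II error rates for scenario choice.

    true_scenarios: scenario label each pseudo-observed data set was simulated under
    posterior_list: one dict per data set, mapping scenario label -> posterior probability
    best: label of the scenario selected on the real data
    """
    chosen = [chosen_scenario(p) for p in posterior_list]

    # Data sets simulated under the best scenario vs. under any other scenario
    from_best = [c for t, c in zip(true_scenarios, chosen) if t == best]
    from_other = [c for t, c in zip(true_scenarios, chosen) if t != best]

    # Type I: simulated under the best scenario, but another scenario wins
    type1 = sum(c != best for c in from_best) / len(from_best)
    # Type II: simulated under another scenario, but the best scenario wins
    type2 = sum(c == best for c in from_other) / len(from_other)
    return type1, type2


# Toy illustration with made-up posterior probabilities (not study data)
true_labels = [1, 1, 1, 1, 2, 2, 3, 3]
posteriors = [
    {1: 0.8, 2: 0.1, 3: 0.1},
    {1: 0.6, 2: 0.3, 3: 0.1},
    {1: 0.3, 2: 0.6, 3: 0.1},  # simulated under 1, misassigned to 2
    {1: 0.7, 2: 0.2, 3: 0.1},
    {1: 0.2, 2: 0.7, 3: 0.1},
    {1: 0.5, 2: 0.4, 3: 0.1},  # simulated under 2, misassigned to 1
    {1: 0.1, 2: 0.2, 3: 0.7},
    {1: 0.2, 2: 0.2, 3: 0.6},
]
type1, type2 = error_rates(true_labels, posteriors, best=1)
```

In this toy example one of the four data sets simulated under scenario 1 is misassigned, and one of the four simulated under the other scenarios is wrongly attributed to scenario 1, so both rates are 0.25.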
For model checking, 1000 data sets were simulated for each scenario by drawing parameter values, with replacement, from those of the data sets used to compute the posterior distribution of the parameters. The similarity between simulated and real data was estimated using summary statistics different from those used for model choice, as suggested by Cornuet et al. (2010). For each summary statistic, the discrepancy between simulated and observed data was assessed.
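One common way to quantify such a discrepancy is to locate the observed value of each test statistic within its simulated distribution and flag statistics that fall in the tails. The sketch below illustrates that idea only. The two-sided threshold `alpha`, the function names and the toy values are assumptions made for illustration, not details reported in the study.

```python
def tail_probability(simulated, observed):
    """Proportion of simulated values that fall below the observed value."""
    return sum(s < observed for s in simulated) / len(simulated)


def count_deviating(simulated_by_stat, observed_by_stat, alpha=0.05):
    """Count summary statistics whose observed value lies in either tail.

    simulated_by_stat: dict mapping statistic name -> list of simulated values
    observed_by_stat:  dict mapping statistic name -> observed value
    alpha: assumed two-sided significance threshold (hypothetical choice)
    """
    deviating = 0
    for name, observed in observed_by_stat.items():
        p = tail_probability(simulated_by_stat[name], observed)
        if p < alpha / 2 or p > 1 - alpha / 2:
            deviating += 1
    return deviating


# Toy illustration: statistic "a" is well inside its simulated range,
# statistic "b" lies far outside it (values are made up)
simulated = {"a": list(range(100)), "b": list(range(100))}
observed = {"a": 50, "b": 150}
n_deviating = count_deviating(simulated, observed)
```

A scenario for which fewer statistics are flagged fits the observed data better by this criterion.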
We assessed the precision of parameter estimation by computing the relative median of the absolute error on 500 pseudo-observed data sets simulated under the best scenario. The relative median of the absolute error is the 50% quantile (over the 500 pseudo-observed data sets) of the absolute value of the difference between the median of the posterior distribution sample (in each data set) and the true value, divided by the true value (Cornuet et al., 2010).
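The definition above translates directly into a short computation. The following sketch assumes that, for a single parameter, each pseudo-observed data set provides a posterior sample and a known, positive true value. The function name and the toy numbers are hypothetical.

```python
from statistics import median


def relative_median_absolute_error(posterior_samples, true_values):
    """Relative median of the absolute error for one parameter.

    posterior_samples: one list of posterior draws per pseudo-observed data set
    true_values: the parameter value used to simulate each data set (assumed > 0)
    """
    relative_errors = []
    for sample, true_value in zip(posterior_samples, true_values):
        # Point estimate: median of the posterior sample for this data set
        estimate = median(sample)
        relative_errors.append(abs(estimate - true_value) / true_value)
    # 50% quantile of the relative absolute errors across data sets
    return median(relative_errors)


# Toy illustration with three pseudo-observed data sets (made-up values)
samples = [[9, 10, 11], [18, 20, 25], [28, 30, 33]]
truths = [10, 25, 24]
rmae = relative_median_absolute_error(samples, truths)
```

Here the per-data-set relative errors are 0, 0.2 and 0.25, so the result is 0.2. Values closer to zero indicate more precise estimates.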
Using the logistic method, the Type I error rates in rounds 1 and 2 were 0.07 (0.03 when only scenario 6 was excluded) and 0.03, respectively. The corresponding Type II error rates were 0.038 and 0.018.
Model adequacy was assessed for the scenarios by measuring the similarity between the real data set and data sets simulated with each considered scenario under the posterior distribution of parameter values. Similarity was assessed using all available test summary statistics. For the best scenarios, fewer observed summary statistics deviated significantly from their simulated distributions than for the other scenarios.
Parameter estimates were plausible and should be reliable given the small relative median of the absolute error, which ranged from 0.127 to 0.178 in the round 2 analysis. In the round 1 analysis, it ranged from 0.27 to 0.44.