Habitat Discontinuities Separate Genetically Divergent Populations of a Rocky Shore Marine Fish

Habitat fragmentation has been suggested to be responsible for major genetic differentiations in a range of marine organisms. In this study, we combined genetic data and environmental information to unravel the relative role of geography and habitat heterogeneity on patterns of genetic population structure of corkwing wrasse (Symphodus melops), a rocky shore species at the northern limit of its distribution range in Scandinavia. Our results revealed a major genetic break separating populations inhabiting the western and southern coasts of Norway. This genetic break coincides with the longest stretch of sand in the whole study area, suggesting habitat fragmentation as a major driver of genetic differentiation of this obligate rocky shore benthic fish in Scandinavia. The complex fjords systems extending along the western coast of Norway appeared responsible for further regional genetic structuring. Our findings indicate that habitat discontinuities may lead to significant genetic fragmentation over short geographical distances, even for marine species with a pelagic larval phase, as for this rocky shore fish.


Introduction
Connectivity between geographically separated populations plays a pivotal role in populations dynamics and genetic diversity [1]. The presence of physical barriers [2], environmental clines [3] and anthropogenic disturbances [4] may prevent connectivity, while the lack of suitable conditions to satisfy the biological requirements of the species [5][6] will shape patterns of genetic structure. While obvious in some terrestrial and riverine systems, boundaries to connectivity can be more inconspicuous in the marine realm. The relatively large population sizes and high dispersal potential of marine organisms commonly result in lower intraspecific genetic differentiation compared to freshwater species [7]. However, some species display sharp genetic discontinuities or breaks due to a variety of historical and contemporary processes acting as barriers to dispersal and gene flow, such as patchiness of suitable habitat or discontinuities in the oceanographic regimes [8][9][10].
Biological attributes of species, such as dispersal ability and reproductive mode, also play an important role on population demography, connectivity and the location of genetic discontinuities or breaks [11][12]. In species with very limited dispersal potential, genetic breaks may arise and persist for many generations even in the absence of physical barriers to gene flow, whereas their occurrence in high gene flow species will only occur when a barrier to gene flow is present [13][14]. Populations living at the limit of their distribution range may also exhibit strong patterns of genetic structure due to lower individual fitness in relation to severe selection regimes favoring locally adapted genotypes [15][16]. Hence, species with low dispersal capabilities living in the limit of their distribution ranges may display strong signatures of genetic differentiation and low levels of genetic diversity associated to habitat fragmentation [13,[17][18]. Integrating spatial information on habitat features with other ecological and genetic data can contribute to understand the mechanisms driving genetic variation among populations (see reviews [19][20][21][22]).
Corkwing wrasse (Symphodus melops) is a small rocky shore fish inhabiting coastal areas of the Northeast Atlantic and reaches its northern limit in Scandinavia. Previous population genetic studies on this species revealed a major genetic break across the North Sea, and ascribed the significant reduction in genetic diversity of the northern populations to the postglacial colonization of the Scandinavian Peninsula [2,23]. Significant phenotypic divergence in life-history traits between western and southern Norwegian populations [24] may indicate further isolation and subtle genetic divergence [25][26]. The current study aims at evaluate the relative role of geography and habitat heterogeneity on patterns of genetic population structure of corkwing wrasse, in an attempt to resolve underlying mechanisms behind cryptic genetic patterns in marine coastal species.

Material and Methods
The species and the study area Corkwing wrasse (Symphodus melops) is a small benthic fish inhabiting the first few meters of the rocky shorelines of the Northeast Atlantic from southern Portugal to Norway and the western part of the Mediterranean Sea [27]. Since the second half of 1980's, occurrence of corkwing wrasse in Scandinavia has increased greatly, in parallel with the water temperature increase registered in the area [2]. This species is now of commercial importance due to the high demand as cleaner fish by the salmon industry in northern Europe [28]. In contrast, its presence in the Mediterranean has declined and it is currently rare in the southern limit of its distribution range in Portugal [29]. Adult individuals show site tenacity [30], restricting species dispersal to the pelagic larval phase. Territorial males use seaweeds to build nests in rocky areas [31] and guard the benthic sticky eggs laid in the nest for 3-14 days (reviewed by Darwall et al. [32]). After hatching, pelagic larvae spend 3-4 weeks as part of the pelagic plankton prior to settlement (reviewed by Darwall et al. [32]).
The present study focuses on populations inhabiting the Norwegian coastline of North Europe. This area stands as the northern limit of distribution of many temperature species, including the corkwing wrasse, as well as the southern limit of some cold water species [33], with a trend towards warm water pelagic species being gradually more common in recent times [34]. The Norwegian coast is characterized by the predominance of rocky areas, except for an extensive sandy area along the coast of Jaeren and Lista covering 26 km between the southern tip of Norway and the southern limit of the deep western fjords (see Fig 1), and some very small sandy ("pocket") beaches in coves [35], unsuitable for reproduction in corkwing wrasse (see review by Darwall et al. [31][32]). The fragmented coastline with a large number of fjords scattered along the coast and the disparities in the physico-chemical properties of the waters makes the Norwegian coast a complex system with the ideal conditions for retention of planktonic organisms, isolation of fjord populations, existence of sharp genetic breaks and, ultimately, vicariance [36][37].

Sampling
The sampling design aimed at a geographically fine scale coverage of the southern and the western coast of Norway, to complement earlier investigations [2,23]. The sampling area extended from the Swedish east coast to slightly north of 63°N on the Norwegian west coast (cf. Fig 1). In total, we analysed 1437 fish collected at sixteen sampling localities, including eight localities from the western and eight localities from the southern Norwegian coast (details provided in Fig 1 and Table 1). Samples from southern localities were collected by a beach seine, while those from the west coast were collected by pots. Sampled fish were immediately transferred to a sea water tank and sacrificed in the most gentle and swift way by percussive stunning with a priest, in accordance with relevant legislation in Norway (Dyrevelferdsloven §12 (law of animal welfare: http://www.lovdata.no. Accessed 2016 July 30). Practices for sampling and handling of fish for this study were approved by the Norwegian Animal Research Authority and were performed by experienced personnel. The species is not protected by Norwegian law (it is a commercially harvested species in Norway), and no special permits were required either for research or commercially at sampled locations. Muscle tissue was taken from fresh or frozen specimens and stored in 96% ethanol prior to DNA extraction.

Genetic analysis
Total genomic DNA was extracted from ethanol-preserved flesh using either the Viogene Inc. extraction kit (Sunnyvale, CA) or the DNeasy kit (Qiagen, Hilden, Germany), re-suspending the DNA in TE buffer. Microsatellite polymorphism was screened at nine polymorphic markers and following the same PCR protocols as described earlier for the species [2,38]. DNA fragments were run with GeneScan Tm -600 Liz as size standard on an ABI 3130XL automated sequencer. As a guard against potential genotyping errors, all samples were run with the same size standard and on the same machine as those previously used by Knutsen et al. [2]. Capillary traces where scored independently by two trained people, and disagreements were re-analysed (with new PCR of individuals) in order to avoid misclassification of alleles and genotypes.

Statistical Analysis
Levels of genetic variation were characterized by counting observed alleles (A), allelic richness (A r ) and gene diversity within samples (H S ) and the total over all samples (H T ), based on Nei & Chesser [39], using FSTAT software package (version 2.9.3.2; [40]). Deviations from Hardy-Weinberg (HW) equilibrium were investigated by the exact probability test in GENEPOP (version 4.0; [41]). Here, and in subsequent situations of multiple tests, we adopted the false discovery rate (FDR) approach [42] when interpreting the significance of individual tests within a larger table. Non-random association of alleles at different loci (i.e, "linkage disequilibrium", LD) were tested in each sample separately by a G-test with 10 000 demorizations, 100 batches, and 1000 iterations per batch in GENEPOP (version 4.0; [41]). The presence of null alleles, stuttering errors or technical artifacts was investigated with MICROCHECKER (version 2.2.1; [43]).  [45] estimator θ applied to all samples, to pairs of sample localities and also within and among geographic regions. The statistical significance of observed genetic differentiation was estimated by 10 000 permutations in GENETIX (version 4.05; [46]). We adopted the FDR approach proposed by Benjamini & Yukutieli's [47] to correct for multiple tests in pairwise tables.

Genetic Structure
Spatial patterns of genetic population structure were investigated using several approaches. First, we adopted the Bayesian clustering analysis implemented in STRUCTURE (version 2.3.3; [48]). This analysis was performed assuming an admixture model, running seven replicates for each value of K between 1 and 10 and with Markov chain Monte Carlo (MCMC) resampling using 100 000 repetitions after a burn-in of 100 000. The most likely number of clusters, K, was estimated as the value which maximized the averaged log-likelihood, log Pr(X|K) and the ad hoc statistic ΔK [49]. Once K was determined, individuals were assigned to the respective clusters and plotted with DISTRUCT (version 1.1; [50]).
Second, in order to visualize spatial patterns of population structure and locate discontinuities among corkwing wrasse populations, we used GENELAND [51] in R [52]. The analysis was run with 10 replicates setting the number of groups K between 1 and 10. MCMC resampling was set at 100 000 repetitions, 100 thinning and 200 burn-in period. The best result was chosen based on the highest average posterior probability. The analysis was performed under both the correlated and uncorrelated frequency models. The latter model may be over simplistic and miss the subtle patterns of population differentiation detectable with the correlated model; however, it may perform better under isolation-by-distance and prevent overestimation of K [51,53].
Third, a neighbour-joining (NJ) tree, based on the modified Cavalli-Sforza's distance Da of Nei et al. [54], was constructed with the POPTREE software [55] to investigate the phylogenetic relationships between Norwegian populations of corkwing wrasse. The reliability of the tree topology was tested with 100 000 bootstrap replicates.

Isolation-by-Distance and Isolation-by-Environment
We examined the relative importance of geographic distance (isolation-by-distance, IBD) and habitat discontinuity (isolation-by-environment, IBE), on spatial patterns of genetic differentiation (linearized pairwise F ST estimates, F ST / (1-F ST )). Pairwise geographic distances between sample locations were calculated following the coastline (range of distances between 12 and 1259 km). Patterns of IBD were investigated comparing genetic and geographic distance matrices by Mantel tests with 10 000 permutations in IBDWS [56]. For the IBE analysis, data on habitat type was downloaded from the Norwegian Environmental Agency (http://kart. naturbase.no/) and imported in QGIS [57]. The IBE analysis considered the effects associated with the presence of large sandy areas (i.e., absence of a rocky substrate), which are unsuitable habitats for reproduction in this species. The analysis was performed by dividing the study area into a grid of cells of 500 x 500 m and counting the number of cells of unsuitable habitats between each pair of sample locations. The presence of unsuitable sandy areas was only considered when a stretch of sand covered at least two consecutive cells of the grid. Two consecutive cells of unsuitable habitat were given a score or distance of 1 while suitable cells were given a value of 0. As isolation increases with the size of the habitat discontinuity, consecutive cells of unsuitable sand were assigned the square value of their counts, in order to account for the supposedly increased difficulty for the rocky shore species of transgressing larger sandy areas. Thus, localities separated by an area covering three consecutive cells of sand were assigned a distance of 4, while four consecutive cells was given a distance of 9, and so on (range of estimates between 0 and 70). The relative contribution of geographic and environmental distance to genetic differentiation was determined by partial Mantel test with 10 000 permutation in IBDWS [56] controlling for the effects of one factor at a time. The additive effects of geographic and environmental distance to genetic differentiation was further investigated following the multiple matrix regression with randomization (MMRR) approach proposed by Wang [58] with 10 000 permutations in R [52]. This approach was suggested to be especially robust under low to moderate gene flow. In this analysis, matrices were first standardized by subtracting the mean and dividing by the standard deviation.

Summary statistics
Results were based on 1437 individuals genotyped at nine microsatellite markers with a successful coverage rate > 99% (only 54 missing genotypes, ranging between none and 10 individuals per locus). A total of 162 alleles were scored (S1 Table and  Overall, corkwing wrasse populations in this study were significantly structured at all loci (overall F ST = 0.064, P < 0.001, S1 Table). Pairwise F ST estimates averaged over loci ranged greatly among sample localities, from -0.002 to 0.150 (Table 2) with the largest differences found between western and southern Norwegian localities (overall F ST = 0.107, P < 0.001, Table 3). Within the western and southern regions, the southern (Skagerrak) appeared genetically more homogeneous (overall F ST = 0.003, P < 0.001) than did western samples (overall F ST = 0.005, P < 0.001). Genetic differentiation within the western region appears to be driven largely by the two most northern samples (SMO and VES: cf. Table 2).
Genetic variability varied greatly among sample localities in the southern and the western regions (see Table 1 and S1 Fig for details). Briefly, western Norwegian samples consistently  Deviations from Hardy-Weinberg (HW) expectations were significant in 11 of 144 (5.5%) cases. Four of them (2.8%) remained statistically significant at the 5% level also after FDR correction (Table 1: SMA11 in BKA, SMD112 and SMC8 in KRI and SMD112 in LIL), all due to a deficit of heterozygotes. MICROCHECKER suggested the presence of null alleles to explain the deficit of heterozygotes at locus SMC8 in KRI, but not in the other three cases. An examination of F IS estimates for each allele separately in each sample did not indicate any pattern in the departure from HW genotype proportions (data not shown). Omitting one of the three locus at a time (SMA11, SMD112, SMC8) overall F ST yielded similar results (i.e. F ST changed from 0.064 to 0.063, 0.066 and 0.069; respectively); therefore, we decided to proceed the downstream analyses keeping all loci.
Non-random association of alleles at different loci (LD) were found to be significant (at the 5% level) in 30 of 576 pairwise tests (5.2%) before correction for multiple tests. Significant outcomes appeared randomly distributed among samples and only one pair (SMB11-SMD112 in SMO) remained statistically significant after the FDR approach (at the 5% level). Therefore, our results are consistent with the loci being independent.

Pattern of genetic structure
The STRUCTURE analysis uncovered two distinct clusters (K = 2) of sample localities (S2 Fig). One cluster included the samples collected from the southern coast while the other cluster corresponded to the samples from the western coast. Looking at further clustering scenarios, STRUCTURE suggested genetic admixture at samples KRI and LIL for K = 3. At K = 4, western samples showed significant genetic admixture and suggested an isolation-by-distance pattern between two clusters. Analysis assuming further clustering, i.e. K > 4, did not resolve further grouping of individuals. Under the uncorrelated allele method, GENELAND supported the existence of distinct western and southern groups (cf. Fig 2). On the other hand, the less conservative correlated method suggested four groups, with the west coast being divided in three groups, comprising a) those from Boknafjord (BKA, BKB and BKC), b) those collected around Hardangerfjord (AUS, HAR and NOR), and c) the two most northerly samples (SMO and VES).
The NJ tree based on Da genetic distance corroborated the existence of two major clusters (100% bootstrap support) comprising populations inhabiting the west coast of Scandinavia and the Skagerrak coast (Fig 3). The analysis suggested further subtle structuring within each region. In the west coast, the topology of the tree resembled the three groups suggested under the correlated alleles model by GENELAND ; i.e. Boknafjord, Hardangerfjord and northern samples (SMO and VES), with relatively high bootstrap support (64-79%). In the Skagerrak, EGE clustered apart from the rest of the samples (94% bootstrap support); nevertheless, branch lengths separating Skagerrak samples were usually shorter than those separating western Scandinavian samples. correlation (R 2 = 0.428, P < 0.01) between geographical distance and genetic differentiation, but gave considerably higher score to IBE as an explanatory variable for the spatial patterns of genetic structure (R 2 = 0.938, P < 0.01). Genetic distance remained significantly correlated to both geographic distance (R 2 = 0.049, P < 0.01) and environmental distance (R 2 = 0.897, P < 0.01) when the influence of one of the other factor was controlled in partial Mantel tests (Table 4). Integrating genetic, geographic and environmental distance matrices under a multiple regression with randomization (MMRR) approach confirmed that environmental distance (IBE, ß E = 0.95, P < 0.01) was a much stronger predictor of the observed genetic patterns than was geographic distance (IBD, ß D = 0.05, P < 0.01) (Fig 4B-4D). The influence of the two factors (geographic and environment distance) was further investigated in order to understand regional patterns of genetic structure in each side of the genetic break, i.e. west and south Norway. In the southern coast, both partial Mantel tests and MMRR revealed that neither variable was correlated to genetic distance (Table 4). In the west coast, genetic differences showed significant but moderate correlation to geographic distance when controlling for environmental distances (R 2 = 0.162, P = 0.02), whereas no correlation between genetic and environmental matrices was detected when controlling for geographic distances (R 2 = 0.024, P = 0.25). The MMRR analysis confirmed that geographic distance (IBD, ß D = 0.14, P < 0.01) was a stronger predictor of the observed genetic patterns than environmental distance (IBD, ß E = 0.02, P < 0.01). Both factors together explained nearly 70% of the genetic variability in the west coast (Table 4).

Discussion
The current study performed on corkwing wrasse collected along the Norwegian coast revealed the existence of a major genetic discontinuity or "break" in this rocky shore fish species. The genetic break was located in the south-western part of the Scandinavian Peninsula, separating populations inhabiting the western and southern (i.e., Skagerrak) coast of the Scandinavian Peninsula (F ST = 0.107, P < 0.001, Table 3). This area is characterized by the presence of the longest stretch of sand (at Jaeren) along the Norwegian coast. Corkwing wrasse populations across the south-west Scandinavian genetic break displayed a marked difference in levels of genetic variability (both in number of alleles and heterozygosity), with higher levels in the west (cf. Table 1). These findings imply lack of gene flow across the break. It is possible that this marked genetic structuring of the species has its origin in processes, such as bottlenecks or founder effects, occurring during the (re)colonization of post-glacial Scandinavian waters, as discussed by Robalo et al. [23] and Knutsen et al. [2]. However, the fact that the genetic structure remains > 10 000 years later is a strong indication that whatever environmental feature(s) were responsible for generating it still operate to block gene flow in this species. Genetic differentiation across the Scandinavian Peninsula was of similar magnitude to that observed previously across the North Sea by Knutsen et al. [2] (cf. Table 3).
Our analyses suggested environmental distance in the form of long stretches of sandy areas to be a better predictor of current patterns of genetic population structure in Scandinavia than is geographical distance (Fig 4, Table 4). Long stretches of sandy area as a major obstacle to gene flow in corkwing wrasse is consistent with the reproductive mode of this obligate rocky shore species, e.g. nest building on rocky substrate [31][32] and low dispersal potential of adults [30]. Gene flow among western and Skagerrak populations could be possible during the few Table 4. Results of the partial Mantel tests and MMRR analysis comparing the contribution of geographical (IBD) and environmental distance (IBE) to genetic differentiation for the whole Scandinavia, and separately for southern (Skagerrak) and West coast samples. r = coefficient of correlation, P = P-values (significant values in bold). weeks that pelagic larvae travel as part of the pelagic plankton before settling to the bottom (reviewed by Darwall et al. [32]), but our results suggest no gene flow among populations across the putative south-west barrier (c.f. Fig 2 and S2 Fig). Selective forces may also lead to strong genetic differences, even in the presence of gene flow [59][60], but it seems highly unlikely that selection should operate on all investigated microsatellites to yield such an effect. Alternatively, western and southern populations could be adapted to different environmental conditions and therefore selected against, should they manage to cross the "barrier". Genetic breaks have usually been concordant with the location of strong historical oceanographic features and biogeographic breaks, reflecting the parallelism between the processes governing geographic and genealogical boundaries of the species [25,[61][62][63]. Habitat discontinuity in relation to changes in bathymetry has been addressed to explain genetic breaks in a wide variety of rocky shore species [6,[64][65]. However, examples of genetic breaks associated to the presence of sandy areas in rocky shore fish are scarce (but see [66][67]).

Partial Mantel MMRR
Regionally, corkwing wrasse revealed further patterns of genetic substructure (c.f. Table 2 and Fig 2 and S2 Fig). Western Scandinavian populations were genetically more differentiated than southern populations (Table 3). In the west coast, corkwing wrasse displayed a moderate isolation-by-distance pattern in genetic diversity (Table 4, Fig 4 and S2 Fig) with the presence of three major groups corresponding to samples around Boknafjord, Hardangerfjord and the two most northerly samples (SMO and VES) (Figs 2 and 3). Populations in the south were genetically fairly homogeneous (Tables 2 and 3) and the relatively strong coastal currents in the south [68] may favor gene flow among these southern localities. Results of the phylogenetric tree ( Fig  3) clustered EGE apart from the rest of the Skagerrak samples (94% bootstrap support), and it is interesting to note that the second largest sandy area in Norway around Lista separate this locality from the other southern ones (see map for details, Fig 1). In contrast to the single stretch of sand in Jaeren, sandy areas around Lista are interrupted by intermittent rocky areas which may facilitate some population connectivity. The presence of small fjords may be partly responsible for the subtle patterns of regional genetic population structure observed in corkwing wrasse as already reported for other coastal species in the Skagerrak [69][70].
Current findings have important implications for fisheries management and conservation, considering that corkwing wrasse is intensively exploited for use as a cleaner fish in the salmon industry [28,30]. A major concern lies in the fact that large numbers of individuals caught in the Skagerrak coast are translocated to salmon farms located in northern or western fjords, where temperature conditions are less suitable for the species [32]. Once salmons are harvested, net pens are emptied and wrasses released with no information of their fate, including putative hybridization with genetically different local stocks. Some of the concerns include the putative risk on the original genetic makeup of native stocks, differences in adaptive fitness or even extinction risk [71][72]. Hence, the strong genetic differences observed among western and Skagerrak populations suggest discontinuing present translocations among regions, and instead supplement salmon farms with local cleaner fish, as commonly recommended in stock enhancement programs [73]. colours denote inferred clusters (K = 2 to 5). The right graph shows ΔK for different numbers of genetic clusters, suggesting K = 2 as the most likely outcome. (PDF) S1 Table. Genetic variability at nine microsatellite loci. A = number of alleles, HT = gene diversity in the total material; FST = estimate of θ (Weir & Cockerham 1984). Numbers in bold indicate statistical significant tests at 1% level after the False Discovery Rate approach (Benjamini & Hochberg 1995). (PDF)