Genetic Structure of Bluefin Tuna in the Mediterranean Sea Correlates with Environmental Variables

Background Atlantic Bluefin Tuna (ABFT) shows complex demography and ecological variation in the Mediterranean Sea. Genetic surveys have detected significant, although weak, signals of population structuring; catch series analyses and tagging programs identified complex ABFT spatial dynamics and migration patterns. Here, we tested the hypothesis that the genetic structure of the ABFT in the Mediterranean is correlated with mean surface temperature and salinity. Methodology We used six samples collected from Western and Central Mediterranean integrated with a new sample collected from the recently identified easternmost reproductive area of Levantine Sea. To assess population structure in the Mediterranean we used a multidisciplinary framework combining classical population genetics, spatial and Bayesian clustering methods and a multivariate approach based on factor analysis. Conclusions FST analysis and Bayesian clustering methods detected several subpopulations in the Mediterranean, a result also supported by multivariate analyses. In addition, we identified significant correlations of genetic diversity with mean salinity and surface temperature values revealing that ABFT is genetically structured along two environmental gradients. These results suggest that a preference for some spawning habitat conditions could contribute to shape ABFT genetic structuring in the Mediterranean. However, further studies should be performed to assess to what extent ABFT spawning behaviour in the Mediterranean Sea can be affected by environmental variation.


Introduction
Assessing the correlation between landscape features with the genetic variation of populations might lead to identifying environmental factors that are involved in the adaptive divergence of populations [1]. This issues is crucial in the marine realm for reconciling management and conservation of fishery stocks [2].
The Mediterranean is a temperate sea with sharply different oceanographic conditions in the western and eastern parts [3]. In spite of such differences, the two parts extensively exchange water masses between them and with the Atlantic. Surface (0-150 m depth) Atlantic water masses with relatively low salinity (36.2 practical salinity unit, psu) enter the Mediterranean through Gibraltar Strait [4] and move eastward along the North African coast (i.e. the Algerian current) reaching the Levantine Sea. Because evaporation is more intense in the Eastern Mediterranean Sea, salinity increases eastwards with a maximum of 38.5 psu in the Levantine Sea [5]. In the intermediate layer (150-600 m depth), the high-salinity Levantine water masses (,39.1 psu) move westwards and outflow in the Atlantic. This circulation pattern generates anticyclonic gyres and steep gradients of temperature and salinity over short distances (e.g. the Almeria-Oran oceanographic front), which are almost permanent in the western basin and more variable in the eastern Mediterranean. In this complex and changing Mediterranean environment, the Atlantic Bluefin tuna (ABFT, Thunnus thynnus) shows a spatial population structure, which is stable over short and long time periods [6][7][8][9]. Despite its highly migratory behaviour and high potential of dispersal at the larval stages, the Mediterranean ABFT subpopulations display partially independent demographic dynamics [8]. A long-term correlation between ABFT abundance and surface temperature was revealed by time series catch analyses suggesting a potential strong influence of environmental factors on the ABFT migratory behaviour [10]. Population genetics offers powerful tools to identify connectivity and structure of marine populations, which might escape direct observation. The multivariate analysis of genetic data is particularly crucial when dealing with relatively weak genetic differences, as commonly detected in high-dispersal marine species.
Multidisciplinary seascape genetics (sensu [2]) addressed important issues in the spatial ecology of marine populations combining genetic and oceanographic data under ecological modelling [11]. The variation of seascape and environmental features may be correlated with patterns of genetic diversity in marine fish species [12], providing evidence for adaptive mechanisms [13], indeed genetic differentiation in several marine fish species [14,15] has already been shown to correlate with seawater salinity and temperature.
Here, we test the hypothesis that dispersal and reproduction of the ABFT populations in the Mediterranean correlate with the physical environmental heterogeneity. We do that by comparing measures of ABFT genetic structure at microsatellite loci with surface temperature and salinity variation. For this purpose, we numerically and geographically expanded the ABFT sampling [9] by adding a sample collected from the easternmost spawning area known in the Mediterranean [16]. We also improved the analytical framework used in the previous analyses of ABFT population structure in the Mediterranean [6][7][8] by using a modified version of the software STRUCTURE [17], in which the basic models are extended to incorporate information on the sampling location, necessary to properly infer population structure when genetic differences between subpopulations are small. Indeed, several simulations and empirical studies showed that STRUCTURE may not accurately infer the correct number of genetic clusters in a sample when population structure is weak [18,19]. The new method, developed by [17], groups individuals from the same sampling location, improving the performance of the analysis, but at the same time using this information only when clusters are correlated with their sampling location and allowing for the possibility that this information can be only partially, or even not at all, informative [17].
We also used the spatial Bayesian clustering models implemented in the software GENELAND [20,21] to comparatively test the different behaviours of these two software packages in modelling the ABFT genetic structure. These models explicitly account for the spatial location of sampled observations and include a priori spatial autocorrelation in the genetic data, assuming that proximate observations tend to be more similar than distant ones. This assumption is useful for exploring possible spatial patterns that may arise when population differentiation occurs by limited gene flow influenced by the occurrence of landscape barriers. The model naturally incorporates the spatial locations of host samples for its assumption of spatial autocorrelation. The analytical approach was completed by multivariate methods that have displayed great efficiency in extracting information from genetic markers [22][23][24] because of their independence from genetic model assumptions (i.e. the Hardy-Weinberg equilibrium) and their performance to summarize the genetic variation into a few synthetic variables [25].

Ethics Statement
Tissue samples of the Atlantic Bluefin tuna Thunnus thynnus used in this study were collected from Mediterranean individuals caught during scientific research programs under the permission of the Italian Ministry of Agricultural and Forestry Policies (locations:

ABFT Microsatellite Dataset
In this study we have reanalysed the genetic variation of six ABFT Mediterranean population samples (N = 256) previously genotyped at potentially neutral microsatellite loci by [8]. In addition, we have added to this data set a population sample collected from the Levantine Sea, off the coast of Cyprus (CYP, N = 60; Fig. 1). The CYP sample is the easternmost ABFT Mediterranean sample analysed so far for genetic variation and it was collected from an ABFT spawning area identified in the Eastern Mediterranean [16]. Therefore, the complete data set includes seven ABFT population samples (N = 316, Fig. 1) genotyped at seven microsatellite loci (Table S1, Text S1). Lab protocols and experimental conditions used for microsatellite PCR amplification and individual genotyping are reported in the SI and Dataset S1.

Population Genetic Analyses
Allelic richness was estimated using FSTAT version 2.9.3.2 [26], expected (He) and observed (Ho) heterozygosity per locus and per sample, and the corresponding exact test for Hardy-Weinberg Equilibrium (HWE) were calculated by ARLEQUIN v. 3.5 [27] after 1,000,000 steps of Markov chains and 100,000 dememorization steps (Table S2). ARLEQUIN v. 3.5 was also used to estimate pairwise F ST [28] using 10,100 permutations to obtain the null distribution of F ST under the hypothesis of panmixia. A global HWE test was performed by GENEPOP 4.0 [29] using 10,000 dememorization steps of Markov chains, 20 batches and 5,000 iterations per batch. Sequential Bonferroni correction was applied for multiple test adjustment [30] (a = 0.0125).
The software STRUCTURE [31,32] estimates Pr(X|K), the probability of the data given K genetic clusters of individuals (K = 1, 2…), by a Bayesian model-based algorithm under the HWE assumption. STRUCTURE also estimates allele frequencies in each cluster and the probability of membership of each individual to each cluster, by means of a Markov Chain Monte Carlo (MCMC) method assign genotypes to clusters minimizing the linkage disequilibrium of the clusters. The modified version STRUCTURE 2.3 developed by [17] was run allowing the use of sampling location information. This method is different from the 'Model with prior population information' present in the original STRUCTURE paper [31]. That model was designed to test for the presence of migrants belonging to a different location and is only useful for highly informative data, i.e. when there is strong evidence of population structure and sampling locations correspond almost exactly to the inferred clusters. An important class of Bayesian clustering models improves STRUCTURE by including information on individual geographic coordinates. We ran ten independent analyses (each with a different value of K, 1-10) using the admixture model with correlated allele frequencies [31,32]. Each run of analysis consisted in 1,000,000 MCMC with a burnin period of 500,000. The most likely number of clusters was inferred using both the standard method (plotting ln Pr (X|K) vs K and using the Bayes' rule [31]) and the DK statistic [33] based on a rate of change in the log probability of the data. The results were averaged over multiple runs using the CLUMMP software [34] and displayed using the DISTRUCT program [35].
The spatially explicit Bayesian clustering program GENE-LAND 3.2.4 [20] (an extension of program R 2.12.0 [36]) was used to further investigate genetic structure. GENELAND considers individual multi-locus genotype data searching for the best fit to HWE and linkage equilibrium. GENELAND also incorporates spatial data directly under the assumption that populations are spatially organized. This program implemented different models to describe population genetic variation; we tested the correlated allele frequency models, with or without spatial information. The correlated allele frequency model accounts for the situation where some allele frequencies reflect common ancestry of different populations. The primary distinguishing factor between the spatial model and the non-spatial model in GENELAND is the assumption of spatial correlation of genotypes. Any genetic boundaries found are assumed to separate K random mating subpopulations, thus subdividing the space in a way resembling the Voronoi-Poisson tessellation [20,37]. Ten MCMC iterations were performed, with K varying from 1 to 10, using 10,000,000 MCMC, a thinning interval of 100 generations, a maximum rate of Poisson process fixed to 316, and spatial coordinates uncertainty of 50 km. The allele frequencies prior was modelled assuming a Dirichlet distribution [38]. The assignment of individuals to subpopulations and the parameter inference were performed in a separate run as suggested by [20]. For this run, K was set to the inferred number of subpopulations and all other parameters were similar to those runs with variable K. The posterior probability of subpopulation membership was computed for each pixel of the spatial domain (2006400 pixels), using a burn-in of 500 iterations.

Environmental Data
Seawater salinity (S, psu) and surface temperature (t, uC) data from the sampled sites were obtained from SeaDataNet Climatologies Pan-European Infrastructure for Ocean and Marine Data Management (http://gher-diva.phys.ulg.ac.be/web-vis/), a Pan-European infrastructure for managing, indexing and providing access to ocean and marine data sets and data products, acquired via research cruises and other observational activities, in situ and remote sensing. Temperature data were averaged over the period

Multivariate Analysis
Two different ordination methods, the Correspondence Analysis (CA; [25,39]) and the Canonical/constrained Correspondence analysis (CCA; [40]), were used to further investigate the spatial pattern of genetic variability among tuna samples. CA analysis was performed using the R package ADE4 1.4 [41] and ADEGENET 2.7 [42]. The CA is an 'ordination in reduced space' method, and it can be used to analyse tables of allele counts (Text S1). This method optimizes the x 2 distances among observations and therefore it can give a stronger weight to a population possessing a rare allele. As a consequence, to minimize analysis artefacts, alleles present in single copy in only one population were removed [25].
The relationship of genetic diversity with environmental factors was analysed using Canonical/constrained Correspondence analysis (CCA; see Text S1 for the method description). The contribution of each variable was assessed through correlations between environmental variables (mean-S and mean-t) and the CCA axes. This function is based on [40] algorithm and implemented in the VEGAN package [43]. In the ordination plot of CCA analysis, the constraining variables are represented by arrows directed towards the maximum change of the variable across the diagram and their lengths are proportional to the rate of change in this direction.
Multilocus genotype data were finally analysed under a model of isolation by distance (IBD). A matrix of geographical distances was obtained considering the shortest sea-paths between each pair of sampling sites using Google Earth version 6.0.2 OOB; genetic distances between populations were expressed by the ratio F ST / (12F ST ) [44]. Two additional matrices were calculated describing salinity and temperature differences between sites. The correlation between distance matrices was tested by Mantel tests [45] using the VEGAN package for R [43], and 10,000 permutations. Moreover two different types of partial Mantel test [46] allowed us to estimate partial correlation coefficients, namely between genetic and geographic distances, holding the environmental effects constant, and between genetic and environmental distances, holding the effects of geography constant.

Bayesian Clustering Analysis
Genetic diversity in the CYP population was not significantly different from that estimated in the other Mediterranean ABFT samples (Table S2, Table S3). All pairwise F ST values were significantly greater than 0 except four (comparisons ADR-LIG, ALG-STY, CYP-STY and CYP-ALG), suggesting that different Mediterranean areas roughly correspond to distinct ABFT subpopulations. The LIG-SAR F ST became insignificant after the sequential Bonferroni correction ( Table 1).
The Bayesian analysis carried out using STRUCTURE 2.3 and the sampling location information as prior revealed the highest DK value for K = 3 (Evanno's method [33], based on a rate of change in the log probability of the data), while the standard method to detect population clusters and the Bayes' rule method [31] detected the highest probability for K = 1 (Fig. 2, a). The bar plots   for K = 2-4 revealed that K = 2 could be the most plausible results, while in those for K = 3-4 most individuals showed an apparent pattern of admixture of 2 or 3 gene pools representing a clear signal of cluster overestimation. With K = 2, almost all the individuals belonging to the ALG, STY, CYP and ALB samples showed a proportion of the membership coefficients higher than 0.7 that allowed to assign these individuals to the same cluster. Conversely individuals in the SAR sample showed lower membership coefficient (0.68) and it could be grouped with ALG, STY, CYP and ALB individuals only with low statistical confidence. Similarly it was not possible to assign members of the ADR sample to one cluster with confidence, although most individuals showed values of the membership coefficients equal or higher than 0.60 for the second cluster detected by STRUC-TURE. By contrast, the individuals of the LIG sample showed relatively low membership coefficients (0.48-0.52) and it was not possible to confidently assign them to either cluster. The spatially implicit model (with correlated allele frequencies) implemented in GENELAND did not reach convergence and this poor MCMC mixing could be due to departures from model assumptions (see GENELAND manual). On the contrary the spatially explicit model with correlated allele frequencies reached the convergence of MCMC and the most likely value of K identified was for K = 3. Individual assignments performed in GENELAND with K fixed to 3 identified distinct subpopulations splitting the southernmost samples (ALB, ALG, and CYP) and STY from the central (SAR) and northernmost samples (ADR, LIG) (Fig. 3). These findings mirrored STRUCTURE results corroborating the genetic subdivision of the southern samples and STY.

Multivariate Analysis
The Correspondence Analysis (CA) displayed a spatial pattern of differentiation largely consistent with those detected by the Bayesian clustering analysis. The CA separated the Central Northern Mediterranean ADR and LIG samples form the other samples, identified the group SAR-ALB and revealed the genetic similarity between STY and CYP (Fig. 4, Table S4).
A significant correlation between genetic distances and both geographical and mean-S distances, was revealed by the Mantel test (Table 2). On the contrary, insignificant correlations were obtained with mean-t distances. Nevertheless significant Mantel tests showed a negative correlation with both geographical and salinity distances (Table 2) and this result is not easy to explain. Therefore we further explored these correlations through partial Mantel test and Canonical/constrained Correspondence Analysis (CCA) analysis. We observed only an insignificant correlation between geographical and genetic distances (with environmental distances kept constant), while the same test carried out controlling for geographical distance produced significant results with both mean-S and mean-t distances (Table 2). Therefore, the environmental parameters appear to be reflected in the genetic population structure significantly more than geography.
The CCA analysis was carried out using mean-S and mean-t data as constraining factors and correlations at both axes were significant at the ANOVA 95%-significance test (CCA1 = 0.04, CCA2 = 0.05, Tables S4, S5). The CCA biplots clearly indicated the presence of a strong relationship of the mean-S with CYP and of the mean-t with both STY and CYP samples, showing a correlation of CYP sample with the highest values of salinity and temperature (Fig. 5).
The CCA analysis contributed to highlight that ABFT seems genetically structured along two environmental gradients: a longitudinal, West-to-East gradient associated to the variation of temperature and a latitudinal, North-to-South gradient associated to the variation of salinity. Moreover this analysis confirmed the clustering results obtained using GENELAND, separating the northern (ADR, LIG), from the central-southern samples (ALG, ALB, STY, CYP) and differentiating the SAR sample from all the others (Fig. 5).

Discussion
In this study we detected the presence of at least two ABFT genetically differentiated subpopulations in the Mediterranean and we showed that this genetic structure appears to be correlated with both seawater salinity and surface temperature variation. Previous work [6][7][8] has consistently shown signals of genetic structuring in the Mediterranean ABFT populations through the detection of significant F ST s. Nevertheless the level of genetic divergence detected by [8] (F ST s' range 0.011-0.021) was higher than in previous surveys [6,7] and it was also supported by evidence of significant genetic differentiation between historical population samples (F ST = 0.02) showing that such structuring of the Mediterranean ABFT is probably stable through time. These results, along with the evidence for a recent colonization of the Mediterranean (,20 Kya, [47]), suggested that factors other than genetic drift could produce these levels of genetic differentiation.
An important improvement of this study with respect to previous work focussing on the ABFT population genetic structure [6][7][8], is the inclusion for the first time of a sample from the eastern part of the Mediterranean (CYP, Levantine Sea), thus extending the analysis beyond the Adriatic [8] and Ionian [6,7] Seas. The Bayesian analysis carried out using the new version of STRUCTURE program ( [17]; Fig. 2b) provided an unforeseen picture clustering the CYP sample with the southern samples from the Western and Central Mediterranean (ALB, ALG and STY) that instead were separated from the northernmost samples of the same areas (ADR and LIG). This latitudinal, south-to-north substructuring (also observed in several other marine organisms [15,[48][49][50][51][52]) suggested a more complex pattern of genetic differentiation of ABFT in the Mediterranean than the previously detected longitudinal, west-to-east separation [6,7,9,53]. However the results from STRUCTURE analysis showed that population differences are indeed minor, and did not allow us to unequivocally choose between the existence of one or three genetic clusters using classical methods for the estimation of K. Nevertheless, a deeper inspection of barplots for K = 1-4 and of membership coefficients of each sample permitted to identify K = 2 as a reliable solution reconciling F ST s and Bayesian results. The GENELAND method identified the spatial explicit model with correlated allele frequencies as the most reliable after MCMC convergence analysis, for this reason it was considered to be the most suitable to describe the genetic structure of Mediterranean ABFT. Considering this model to describe the genetic diversity of samples, GENELAND detected for K = 3 the highest posterior probability and further inspection of the results allowed us to rule out the presence of 'ghost' clusters (Fig. 3). These three clusters, separating the northern samples from the southern, remind somewhat the pattern identified by STRUCTURE for K = 2, although GENELAND was able to refine this result clearly separating the SAR sample from the northern and southern samples (Fig. 3). Multivariate analysis corroborated these findings but did not rely on assumption about the underlying population model.
In the last ten years the improved methods of satellite tagging provided insights into tuna movements [54], confirming the transatlantic migration of this pelagic fish and highlighting only limited individual movements in this part of the Mediterranean Sea [55]. The integration of tagging [53][54][55] and genetic data [6,7,10] suggests the presence of some tuna sub-populations displaying different migratory behaviour. The intriguing hypothesis that at least three different ABFT sub-populations exist in the Mediterranean was formulated by [10]. Our results contributed to support this scenario by indicating that ABFT could be in fact a metapopulation characterized by complex sub-population dynamics and by partial reproductive isolation. The present work adds an important detail to the picture, namely the finding that genetic structuring correlates with environmental variables, providing further evidence that environmental conditions affect ABFT reproductive behaviour. The Mantel test revealed indeed a significant correlation of ABFT allele frequencies with salinity in the Mediterranean. We also showed that genetic distances correlate with both salinity and surface temperature when the effect of geographic distances is partialled out, whereas the correlation of genetic and geographical distances appears to be a statistical artefact, due to the correlation between geography and environmental variables. The correlation between genetic and environmental variation is a further piece of evidence about the influence of environmental factors on ABFT population dynamics. Indeed Massimo Sella at the beginning of the 20 th century [56,57] already proposed that salinity and temperature influence both water regime and tuna movements. More recently, several experimental and modelling studies suggested a direct involvement of environmental factors on ABFT spatial dynamics and migration pattern. ABFT spawning behaviour [58,59], called ''repeat homing'', is a process of spatial learning of water-mass conditions optimal for spawning (e.g. SST .20.5uC with a preference form 21.5 to 26.5uC in the Western Mediterranean; [60]). Because ABFT larval occurrences were associated with the confluence of inflowing Atlantic waters and saltier resident surface waters [61][62][63], it has been argued that ABFT spawners preferentially target as spawning grounds mesoscale hydrographical structures whose chemical-physical and productivity conditions favour egg buoyancy and hatching as well as larval retention and survival [60,[64][65][66]. In the Western and Central Mediterranean, the constancy of large-scale surface currents spatially portioned hydrographical structures that are more stable than in the Eastern Mediterranean [67]. Moreover, the spatio-temporal variability at the regional scale and the mesoscale circulation [67,68] form patches of highdensity larvae with very limited extension (from 10 to 13 nautical miles; [60,65,66]) favouring ABFT genetic structuring. As in other several fish, the variation of complex environmental and oceanographic conditions (among which SST and salinity might represent two functional and operative proxies of such variation) and of life history traits, as spawners' habitat preferences and larval phase features, can likely influence population connectivity in the Mediterranean ABFT [69]. Our findings suggested that spawning ground choice could affect ABFT genetic differentiation in the Mediterranean. The significant relationship detected between salinity, temperature and population divergence, represents only a starting point, because many other environmental factors could be involved and interact with each other producing this pattern of genetic differentiation. In particular, we revealed two genetic gradients significantly linked to the variation of chemical-physical conditions: a west-east gradient linked to salinity and a north-south gradient linked to surface temperature values, with strongest relationship of the CYP sample with areas of high salinity and temperature.
This survey showed clearly that, to refine the subtle but significant signal of genetic structure detected in highly migratory fish, it is essential to combine several biostatistical tools based on different assumptions [17,21,[70][71][72][73]. Nevertheless methodological differences can be fundamental in the identification of the signal of genetic differentiation. The integration of landscape ecology with population genetics can enable the detection of environmental factors that may promote or constrain the divergence detected, providing also a refinement of the genetic structure identified through methods of analysis completely independent from any kind of genetic model or assumptions [25].