Inter Individual Variations of the Fish Skin Microbiota: Host Genetics Basis of Mutualism?

The commensal microbiota of fish skin is suspected to provide a protection against opportunist infections. The skin of fish harbors a complex and diverse microbiota that closely interacts with the surrounding water microbial communities. Up to now there is no clear evidence as to whether the host regulates the recruitment of environmental bacteria to build a specific skin microbiota. To address this question, we detected Quantitative Trait Loci (QTL) associated with the abundance of specific skin microbiota bacterial strains in brook charr (Salvelinus fontinalis), combining 16S RNA tagged-amplicon 454 pyrosequencing with genetic linkage analysis. Skin microbiota analysis revealed high inter-individual variation among 86 F2 fish progeny based upon the relative abundance of bacterial operational taxonomic units (OTUs). Out of those OTUs, the pathogenic strain Flavobacterium psychrophilum and the non-pathogenic strain Methylobacterium rhodesianum explained the majority of inter-individual distances. Furthermore, a strong negative correlation was found between Flavobacterium and Methylobacterium, suggesting a mutually competitive relationship. Finally, after considering a total of 266 markers, genetic linkage analysis highlighted three major QTL associated with the abundance of Lysobacter, Rheinheimera and Methylobacterium. All these three genera are known for their beneficial antibacterial activity. Overall, our results provide evidence that host genotype may regulate the abundance of specific genera among their surface microbiota. They also indicate that Lysobacter, Rheinheimera and Methylobacterium are potentially important genera in providing protection against pathogens.


Introduction
The study of the beneficial effects of bacteria and their influence on host health is a growing field. Namely, research that has explored host microbiota variability in space and time suggests the presence of a host genetic component into the development of mutualism communities [1,2]. Similar conclusions were found in several studies on twins and on their core gut microbiota [3,4,5,6]. Furthermore, microbiome homeostasis seems to be the key to resistance against some diseases previously considered exclusively influenced by genetic factors [7]. To determine precisely which genes are involved in the recruitment of specific bacterial strains, some studies looked at gene expression in presence of symbiotic bacteria. Recent studies specifically targeted genes already known as being involved in innate or adaptive immunity (for example IgA) [8,9,10].
In fish, the influence of host genetic background on commensal bacterial community structure is poorly known. However, advances in the field of probiotic development indicate that endogenous bacteria are able to outcompete pathogens [11,12]. Results obtained in aquaculture settings are consistent with previous work showing that non-pathogenic bacteria associated with animal mucosa can contribute to the host health by providing protection against pathogen infections [13,14,15,16]. The genetic basis of the host immune response, especially at the major histocompatibility complex (MHC), toll-like receptor (TLR) and immunoglobulin loci, has been well studied [17,18,19,20]. However, aside from the immune response, the influence of the host on the development of bacterial symbiosis remains poorly understood.
Here, the main objective of the present work was to investigate the potential link between host genotype and skin microbiota composition. As a host model, we focused on brook charr (Salvelinus fontinalis), a salmonid that harbors a complex and dynamic skin microbiota [21,22]. The first specific objective sought to document the genetic variation present in the host population that might underpin the inter-individual variations in the skin microbiota. The second specific objective was to characterize the relationship (cooperation/competition) existing among the different bacterial Operational taxonomic units (OTU) within the skin mucus, which influences the overall structure of the skin microbial community. The third objective was to identify host genetic regions associated to the abundance of specific skin bacterial OTUs (i. e. quantitative trait loci (QTL) associated with

Ethics statement
All fish were reared and the experiment conducted strictly following guidelines required by the ''Comité de Protection des Animaux de l'Université Laval (CPAUL, http://www.vrr.ulaval. ca/deontologie/cpa/index.html?accueil). The CPAUL reviewed and approved all experimental procedures used in this study. Fish sampling The study population was generated using two different populations of brook charr. The first one (D) is a domestic population used in aquaculture for more than 100 years (Pisciculture de la Jacques Cartier -Cap-Santé, Québec, Canada). The other population (L), is an anadromous population from the Laval River near Forestville (North of the St. Lawrence River, Québec, Canada). Breeders from the L population were kept in captivity at the Station aquicole de l'ISMER (Québec, Canada, 48u319N, 68u289W) under natural photoperiod and temperature. Crosses between 10 sires of each population (L and D; F 0 generation) with 10 dams (L and D) were performed to generate 10 full-sib outbred hybrid families (L6D -F 1 generation). Subsequently, six F 1 individuals were bi-parentally crossed to obtain three F 2 families. The F 2 family selected for the present study demonstrated the lowest post-hatch mortality rate (,2%) and the greatest abundance. Fish were raised in the same tank and had fasted for 12 hours before sampling. Fish were measured (mean = 28.861.77 cm) and weighed (mean = 276.7659.48 g) to calculate Fulton index [23].
Mucus was sampled using a sterile swab on the same area for all the fish [24]. We choose to sample the skin between the adipose fin and the caudal fin because this area was undisturbed by fish handling. Samples were stored in a sterile micro-centrifuge tube containing lysis buffer (Tris 50 mM, EDTA 40 mM, sucrose 0.75 g) and immediately stored in 280uC until DNA extraction.

DNA extraction
DNA was extracted using a modified protocol of salt-extraction from Aljanabi & Martinez (1997). During the first lysis step, 22.6 mL of lysozyme (40 mg/mL) was added to the sample and incubated 45 minutes at 37uC. After this step, 22.6 mL of proteinase K (20 mg/mL) and 90 mL of SDS (10%) were added to the lysate and incubated at 55uC over night with agitation. The aqueous phase was transferred into a clean Eppendorf tube containing 600 mL of NaCl 6M, mixed and centrifuged 20 min at 14,800 rpm. The supernatant was transferred again into a clean Eppendorf tube containing 1 volume of cold isopropanol, mixed and stored 30 minutes at 220uC. The mixture was centrifuged 20 minutes at 14,800 rpm and the supernatant was thrown away. The pellet was washed with cold ethanol 70%, air-dried and finally resuspended in 25 mL of sterile MilliQ H 2 O. Subsequently, DNA integrity and quantity were controlled using a Nanodrop instrument (ND-1000, Nanodrop).

Microbial 16S pyrosequencing
Each DNA sample, the 16S gene was PCR amplified using Takara Ex taq premix (Fisher). All PCR reactions were performed in a reaction volume of 50 mL containing 25 mL of premix Taq, 1 mM of each primer and sterile MilliQ H 2 O to up to 50 mL. A general reverse primer (R519) combined with B primer (Roche) was used for amplifications in combination with one of 86 uniquely tagged forward primers (F63-targeted) combined with A primer (Roche) (for primer sequences see [25,26]. The mean length of the amplified fragment was 450 bp. This procedure facilitates the parallel sequencing of thousands of samples on the same run and to reassign each reads to their respective samples. PCR conditions were applied as follows: after a denaturing step of 30 s at 98uC, samples were processed through 30 cycles consisting of 10 sec at 98uC, 30 sec at 55uC, and 30 sec at 72uC. The final extension step was done at 72uC for 4 min 30 sec. Following the amplification step, samples were purified using AMPure Beads (Beckman Coulter Genomics). Samples were adjusted to 100 mL with EB (Qiagen), 63 mL of beads were added. Samples were mixed and incubated for 5 min at RT. Using a Magnetic Particle Concentrator (MPC), the beads were pelleted against the wall of the tube and supernatant was removed. The beads were washed twice with 500 mL of 70% ethanol and incubated for 30 sec each time. Supernatant was removed and beads were air dried for 5 min. Tubes were removed from the MPC and 24 mL of EB were added. Samples were vortexed in order to suspend the beads. Finally, using the MPC, the beads were pelleted against the wall once more and supernatant was transferred to a new clean tube. Samples were quantified with Nanodrop before the amplification step. Amplicons were then quantified with Quant-iT PicoGreen dsDNA Assay Kit and mixed equally before being sent to the Plateforme d'Analyses Biomoléculaires (Institut de Biologie Intégrative et des Systèmes, Université Laval) for pyrosequencing on a 454 GS-FLX DNA Sequencer with the Titanium Chemistry (Roche), according to manufacturer's procedure.

16S sequence analysis
All sequences are available on MG-RAST server (MG-RAST IDs: 4539915.3). The data were analyzed in two steps. First, CLC Genomics Workbench 3.1 (CLC Bio, Aarhus, Denmark CLC workbench BIO) was used to trim sequences for quality, recover and remove the primers' sequences and tags (minimum average quality score: 35 for a window of 50, number of differences to the primer sequence = 0, maximum number of differences to the barcode sequence = 0, number of ambiguous base calls = 0, maximum homopolymer length = 8). In a second step, preprocessing and analysis were performed using the MOTHUR software [27]. All datasets were checked for chimeras with the chimera slayer algorithm implemented in MOTHUR. Standardization of the different samples was done by using the zscore which calculates the normalized abundance as follows: normalized abundance = (abundance -mean) / standard deviation [3,28,29]. An alternative method for normalization was also tested, which consisted of subsampling equal number of reads from each sample [30]. This method greatly reduced the numbers of sequences (224 sequences per samples), but gave the same results as zscore ( Figure S2, Figure S3, Table S1). Therefore we preferred to keep zscore normalization to base our conclusions on a larger dataset. We used the Operational Taxonomic Unit-based method described by Costello et al. (2009) because it is not biased towards a predefined taxonomy. One index was retained to assess the quality of pyrosequencing: the sequence coverage index (Good's estimator). The sequence coverage index is a metric used to estimate the quality of the depth sequencing. All sequences were clustered into Operational Taxonomic Unit (OTU) using a 97% identity threshold and OTU were classified from phylum to genus using the program MOTHUR with the default settings. For all interesting OTUs (explaining the inter-individual variation or linked to a genetic region), we extracted all the sequences classified in that OTU and used the single best BLAST hit to identify the different species. Statistical differences in the abundance of each OTU were calculated with the software for metagenomic analysis (Metastats).
To visualize potential differences across host progenies in terms of mucus bacterial community structure, distances between those communities were computed using the Yue & Clayton measure of dissimilarity (Thetayc). This index developed by Yue and Clayton (2005) is a measure of dissimilarity between the structures of two communities, meaning that this calculator takes in account the abundances of each OTU. Then, Distances were represented using a dendrograms based on this index and statistical robustness of the dendrogram was determined by a Unifrac weighted test. This test allows determining whether any of the samples have a significantly different structure than the other samples. A random (Monte Carlo) permutation test was performed to test whether or not the distance between two communities was greater than expected by chance alone.
To visualize the distances between communities, we also applied a Principal Coordinate Analysis (PCoA) based on the Yue & Clayton measure of dissimilarity (Thetayc) using an eigenvectorbased approach to represent multidimensional data in as few dimensions as possible. This method allows determining which OTUs are responsible for shifting the samples along the PCoA axes by measuring the correlation of the relative abundance of each OTU clustered in genera with the axes of the PCoA dataset. Statistical differences in the abundance of each OTU was calculated to highlight which genera were the causative agents of the differentiation between groups obtained in the dendrogram based upon the TethaYC index [31]. All statistical analyses and graphics were carried out in the R environment (http://www.rproject.org). LGs were carrying QTL. The most probable position of the QTL was defined at the position giving the largest logarithm of odds (LOD) score indicating the QTL was fixed. (2) A two QTL model based upon the Haley-Knott regression was used to refine the QTL detection across the genome with a resolution of 5 cM and eventually to detect two QTL on a single LG linked to a particular trait. (3) The best model fitting our data was used to compute the percent variance explained (PVE expressed in %) by the QTL. The chromosome-wide and the genome-wide thresholds were calculated for each LG using 10,000 permutations. The 1.5 LOD confidence intervals were determined for all analyses following the Bayesian method implemented in the ''bayesint'' function in R/qtl. The bayesint function calculates an approximate interval (end points around the maximum LOD) for a given chromosome using the genome scan output.

Results
Mucus samples were obtained from 86 F2 fish progenies (44 males and 42 females) issued from the same family. A total of 87,940 high-quality, classifiable sequences were obtained with an average number of 10226540 per sample, which were subsequently classified in OTUs [1]. All of these sequences were successfully clustered into 9520 OTUs with 97% identity. A final trimming step was performed to focus on the most abundant OTU (which are represented by at least 10 sequences for the whole project) and result in a dataset containing 71,719 sequences (81% of the initial data set) clustered in 192 OTUs. The depth of Groups are defined with the Weighted Unifrac distance. The first and second groups are composed of closely related bacterial communities. The third group is an assemblage of dissimilar communities and is considered as an outgroup. The most differentiated groups are groups 1 and 3 (Unifrac Score: 0.710811, p,0.0010) followed by the distance between groups 2 and 3 (Unifrac Score: 0.685361, p,0.0010), and the distance between groups 1 and 2 (Unifrac Score: 0.401674, p,0.0010). doi:10.1371/journal.pone.0102649.g002 sequencing and the coverage were estimated using the Good's estimator index; the coverage found was always higher than 90% except for one sample (#127), which exhibited 76% of coverage (Table 1, Figure S1). The number of OTUs per sample was not equally distributed and the mean number was 156.5681.7 (ranging from 33 to 368). The alpha-diversity was estimated by calculating the non-parametric index of Shannon (npShannon) because of the non-equally distributed abundance of each OTU in the samples. The npShannon index ranged from 0.29 to 3.45, with a mean index of 1.4360.83.
Phylum, class, and OTU composition of all samples were represented in the Figure 1. Most of the OTUs belong to the Proteobacteria phylum (88.7%) and the Bacteroidetes phyla (9.7%).
Male and female brook charr harbored slightly different microbial communities. The Unifrac weighted distance between samples from males and females was statistically significant indicating that structures of bacterial communities were differentiated according to the sex of the host (Unifrac Score: 0.241695; pvalue ,0.001). However, there was not a sex difference in the two most abundant species, as this difference was based on 22 OTUs classified on 20 genera and 21 species, all of them being less abundant than 0.1% (Table 2).
In terms of community structure, clustering based on ThetaYC index clearly defined two groups of samples, whilst a third group was less well defined and composed of highly variable communities ( Figure 2). There was little variation among samples in the first and second group (group 1 and group 2). The third group (group 3) was an assemblage of dissimilar communities and is considered as an external group. The variation of the alpha-diversity visualized by the npShannon index ( Figure 3) indicated that alpha-diversity increases from group 1 to group 3. Furthermore, the bacterial communities within the third group were more diverse than groups 1 and 2. The same clustering was found with the normalization by subsampling ( Figure S2, Figure S3, table S1). The difference between each group was greater than the difference between males and females ( Figure 2). The most differentiated groups were groups 1 and 3 (Unifrac Score: 0.710811, p-value , 0.001) followed by the distance between groups 2 and 3 (Unifrac Score: 0.685361, p-value ,0.001), and the distance between groups 1 and 2 (Unifrac Score: 0.401674, p-value ,0.001). This was correlated with the numbers of OTU that were differentially abundant between those 3 groups (Figure 4). The divergence between groups 1 and 2 was based upon 54 OTU classified in 26 genera and 41 species (Table 3). The microbiota of the first group was dominated by M. rhodesianum, and host individuals from this group were tightly clustered. The second group was more diverse, dominated by Flavobacterium and had a lower abundance of M. rhodesianum. For groups 1 and 3, the difference was based upon 109 OTU classified in 45 genera and 70 species (Table 4). Finally, 26 OTU were differently abundant between group 2 and 3 and classified in 17 genera and 22 species (Table 5).
PCoA analysis displayed the same patterns of divergence between the three groups ( Figure 5). Each group found in the Unifrac analysis was highlighted in the PCoA. To understand which OTUs were responsible for the differentiation of the cluster on the PCoA axes 1 and 2, individual correlation coefficients were calculated. Three OTUs were highly correlated with the two axes (s.0.6); OTU 36, 50 and 149. Those OTU belonged to two species: Flavobacterium psychrophilum (OTU 36) and M. Rhodesianum (OTU 50 and 149). The negative correlation   Figure 6). In the F2 fish progeny, three significant QTLs (at the genomewide level) were identified on two linkage groups (LG 11 and LG 16). One major QTL per strain was detected respectively for Lysobacter, Rheinheimera and Methylobacterium counts (LOD score = 9.89, 7.46 and 3.48 respectively). For each of these traits, the total PVE (percent variance explained) of the QTL were estimated to 17.01%, 31.05% and 41.1% for Methylobacterium, Rheinheimera and Lysobacter respectively. The most probable positions of these QTL, their respective 95% CIs, the closest linked molecular markers (one per QTL) as well as additive and dominance effects are presented in Table 6.

Discussion
Inter-individual variations in host microbiota has been well documented (e.g. [1]. Such variations occur even in hosts with identical genetic background, as observed in monozygotic twins [3] indicating that both genetic and environmental conditions play a role in the modulation of host microbiota. The gold standard forward genetics technique to identify areas of the genome that relate to certain phenotypes is to make a cross between genetically divergent individuals [32]. In this study, we focused on the structural variation of the skin microbiota (i.e abundance of each bacterial genus) of an F2 intercross originally generated from two genetically contrasted strains of brook charr. To our knowledge, this is the first study that identified host genomic regions associated with the abundance of specific microbiota strains in a non-model vertebrate.
The deep taxonomic analysis of the fish skin microbiota indicates that two phyla dominate: Proteobacteria (88.7%), followed by Bacteroidetes (9.7%). Those two phyla are mainly represented by a single OTU each; 50 and 36 respectively. OTU 50 is classified as M. rhodesianum. OTU 36 is classified as F. psychrophilum, the causative agent of the cold water disease, a pervasive infectious disease in farmed fish [33,34]. This disease especially affects salmonids at early life-stages, and it is well documented that both stressful conditions, and injuries facilitate infection triggering [33,35]. According to the dendrogram (Figure 2), each individual's bacterial communities clustered into three groups. Similar clustering was also found with the second normalization method (subsampling of the same number of sequences for each samples). The same result was obtained with PCoA analysis and furthermore the Pearson correlation indicates that the genus Flavobacterium and Methylobacterium are negatively correlated. In the PCoA, two species (F. psychrophilum and M. rhodesianum) are correlated in opposite ways with the axis 1 which discriminate the three groups. We assume that two or more species which seem to be mutually exclusive are involved in a negative relationship [36]. All together those results indicate that the antagonistic relationship between those two dominant species is very likely playing a key role in shaping the structure of the individual's microbiota.
Because all fish progenies were reared in the same tank, and under the same environmental conditions since birth, environment had likely little influence on the phenotypic variation observed in this study. Yet, the individual's microbiota either clustered in one of two closely related groups, or formed divergent outliers. Sexspecific variations had little influence on individual clustering since both males and females were present in each defined group and there was no correlation between sex and groups. The difference between male and female is based on 22 OTUs which are less abundant than 0.1% of the community and are then classified as part of the rare biosphere. Previous studies observed similar individual variations of the microbiota in human or mouse [1,2,32].
Three OTUs classified into two species were responsible for the inter-individual differentiation; M. rhodesianum and F. psychrophilum. Interestingly, Flavobacterium and Methylobacterium, which are the most abundant genera, negatively co-occurred in the samples, indicating that the relationship between those two genera is based on competition. Furthermore, a strong negative correlation was found between the OTU 36 and 50, thus supporting a strong antagonistic relationship between the two species, F. psychrophilum and M. rhodesianum.
M. rhodesianum produces poly-b-hydroxybutyrate, a polymer of short-chain fatty acid, known to inhibit the growth of pathogens like enterobacteria and Vibrio sp. [37,38,39,40,41,42]. Taken together, this suggests that a reduction of M. rhodesianum   Evidently, a change in skin microbiota taxonomic structure favors opportunist bacteria and especially opportunistic pathogens like F. psychrophilum, Acinetobacter haemolyticus, Acinetobacter johnsonii and Acinetobacter junii, which have a higher abundance in group 2 compared with group 1 Such a disturbance in the microbiome homeostasis is called dysbiosis, and its occurrence enlightens the importance of the function of M. rhodesianum in controlling the balance between both other commensal bacteria and opportunistic pathogens. Furthermore, Flavobacterium psychrophilum was negatively correlated to both M. rhodesianum abundance and Fulton index. The Fulton index is a condition factor used as a proxy for fish health status. Therefore, it suggests that skin microbiota from the weakest host individuals were unable to prevent colonization by the opportunistic pathogen F. psychrophilum. Furthermore, skin microbiota taxonomic structure varies significantly across the F2 progeny. This result added to the fact that i the genetics of the host is the only variable in the experiment, and ii the finding of three QTLs associated to the abundance of bacterial genera with high PVE, suggest that host genotype influences the abundance of commensal strains e.g. Methylobacterium, which will regulate the abundance of F. psychrophilum.
We found three QTL associated with the abundance of three genera: Lysobacter, Rheinheimera and Methylobacterium, all of which were observed to provide antimicrobial compounds [39,40,41,42,43,44,45,46,47]. These finding strongly suggests that host genotype influences abundance of specific commensal strains, and possibly targets their recruitment. The most compelling evidence concerns the QTL associated with Methylobacterium (PVE = 17.01%): as previously described above, Methylobacterium is influential on the structure and the homeostasis of the microbiota. Its abundance is inversely correlated with those of the pathogen F. psychrophilum. The targeted recruitment of M. rhodesianum mediated by the host genotype is therefore a strategy to prevent pathogen growth via harnessing antagonistic relationship towards resources use [48]. To our knowledge, this is only the second study that identified QTL associated with microbial variation among individuals. A study on murine gut microbiota showed similar results, in which McKnite et al. (2012) found 5 QTL located on four chromosomes influencing the variation of Linkage analysis on genus abundance data strongly evidenced the influence of host genetics on the modulation and/or recruitment for those genera. Therefore, brook charr immunity involves both specific commensal strains recruitment and nuclear gene expression, as previously evidenced in [49], those are under the control of the host genotype.
The study of the genetic architecture underlying the regulation of bacterial abundance further highlights the coevolutionary pattern between host, commensals, and pathogens. However, identifying genes located in the genomic regions (QTL) linked to the abundance of microbial partners will be far more challenging as the genome of brook charr is currently not fully sequenced and poorly annotated compared to the mouse genome. The markers surrounding the QTL regions may define regions to be deeply sequenced in future work to identify potential underlying genes and their associated functions. Evidently, those markers will be invaluable to conduct highly innovative genetic breeding programs targeting disease resistant host strains via the recruitment of highly resilient microbiota. Black circles represent individuals belonging to the group 1, red circles represent individuals belonging to the group 2 and green circles represent individuals belonging to the group 3. Arrows represent genus, which are significantly correlated with the axis. The first and second axes represented 24.5% and 7.7% of the variation respectively. The R-squared between the original distance matrix and the distance between the points in 2D PCoA space was 0.87. doi:10.1371/journal.pone.0102649.g005  Table 6. Descriptive statistics, including the LOD score, the position, 95% CI, PVE (%), the associated P-value of each QTL linked to every phenotype related to bacteria counts trait (LOD, Log 10 of the odd ratio; 95% CI, 95% confidence interval; PVE, percent variance explained) [23].

Phenotype
Linkage Supporting Information Figure S1 Rarefaction curves for each group defined by the dendrogram analysis based upon ThetaYC index (see figure 2). Each group curve reaches a plateau thus indicating the depth of sequencing is sufficient. (EPS) Figure S2 Alpha diversity of each group normalized by subsampling of the same number of reads per sample. Each group was defined by the dendrogram analysis based upon ThetaYC index performed on the normalized dataset. Alphadiversity was calculated by the non-parametric index of Shannon. Statistical differences (represented by the asterisk) were calculated with a wilcoxon test with a correction for multiple testing (Holm, p-values ,1.10 24 ). (EPS) Figure S3 PCoA analysis of the microbiome for all 86 F 2 individuals. This PCoA was performed on a normalized dataset with the second method of normalization (subsampling of the same number of reads per sample). Black circles represent individuals belonging to the group 1, red circles represent individuals belonging to the group 2 and green circles represent individuals belonging to the group 3. Arrows represent genera, which are significantly correlated with the axis. The first and second axes represented 24.5% and 7.8% of the variation respectively. The Rsquared between the original distance matrix and the distance between the points in 2D PCoA space was 0.88.

(EPS)
Table S1 Differentiation between the three groups of OTU calculated with an analysis of similarity (ANOSIM, statistic R). The q-values represent p-values corrected with Bonferroni and considered as significant when p,0.05. Note: The analysis was performed on a dataset normalized by two methods: zscore and subsampling. Both analyses gave the same results, each group being significantly different from the two others. (DOC)