Selection and Phylogenetics of Salmonid MHC Class I: Wild Brown Trout (Salmo trutta) Differ from a Non-Native Introduced Strain

We tested how variation at a gene of adaptive importance, MHC class I (UBA), in a wild, endemic Salmo trutta population compared to that in both a previously studied non-native S. trutta population and a co-habiting Salmo salar population (a sister species). High allelic diversity is observed and allelic divergence is much higher than that noted previously for co-habiting S. salar. Recombination was found to be important to population-level divergence. The α1 and α2 domains of UBA demonstrate ancient lineages but novel lineages are also identified at both domains in this work. We also find examples of recombination between UBA and the non-classical locus, ULA. Evidence for strong diversifying selection was found at a discrete suite of S. trutta UBA amino acid sites. The pattern was found to contrast with that found in re-analysed UBA data from an artificially stocked S. trutta population.


Introduction
Genes of adaptive importance are of growing interest to conservation genetics [1]. Major Histocompatibility Complex loci are critical to immune function and highly polymorphic. MHC molecules are loaded with peptides (small fragments of proteins) and transport these to the cell surface. There, the peptide-MHC complex interacts with T cells and, if the peptide is identified as foreign, an immune response is initiated. Variation at MHC affects their ability to bind different types of peptide and is adaptive in helping to resist disease [2][3][4][5]. Populations which lose this variation [6,7] may be of conservation concern [1]. Recently, brown trout (Salmo trutta L.) have shown promise as a model species for MHC studies. MHC class I showed lower population differentiation than neutral markers across trout populations while variation at class I was maintained in populations isolated above waterfalls where it was lost at neutral markers [8]. Both of these phenomena are expected for a gene under balancing selection. Kin association based on sharing alleles at MHC class I has been demonstrated in the same trout [9]. These studies were based on a MH class I marker and, consequently, it is of clear interest to examine allelic diversity, sequence polymorphism and selection at class I itself in S. trutta.
Primates show more rapid turnover of alleles at MHC class I than at class II, with ancient trans-specific lineages observed in the latter. The difference in turnover rate arises from class II proteins binding a broader range of antigens than class I [37]. The opposite pattern is seen in salmonids [12], where it has been attributed to the lack of linkage between loci. However, the same pattern is seen in Xenopus laevis MHC class I and class II loci, which are linked [38]. One possibility is that salmonid class I alleles have broader binding capacity than class II. Non-conventional T-cell receptor-pMHC binding of "bulged" antigens has been identified in human MHC class I, where just a small number of MHC residues are involved in antigen presentation [39]. Hypothetically, this could be important at salmonid class I, and these alleles might be able to present a variety of antigens despite shifts in antigenic pressures. A prediction of this hypothesis is that the pattern of codon-level selection would highlight the importance of these key residues.
There is growing emphasis on adaptive loci in population genetics, and recent studies of S. trutta (employing a MH class I-linked marker) have revealed interesting biological phenomena [8,9]. Consequently, we seek to supplement these studies and help address key questions in conservation genetics by examining polymorphism at MH class I itself in a wild S. trutta population for the first time. Existing data for MH class I from S. trutta are from a limited sample of a non-native introduced strain in the Colorado River, USA [12], which will have been exposed to novel pathogens and may have experienced bottlenecking. How do patterns of allelic diversity, divergence and codon-level selection differ between the wild and artificial stock? A previous study examined MH class I in S. salar, which share the same Irish river and similar exposure to pathogens over time [36], and here we investigate how the native brown trout compare with them. We also expected that the new data from wild brown trout might reveal important phylogenetic novelties and help identify whether patterns of selection vary amongst salmonid species.

Ethics statement
Electrofishing and sampling were carried out under the Certificate of Authorisation for Purposes of the Fisheries Acts 1959-2003, issued to P. McGinnity by the Irish Minister for Communications, Marine and Natural Resources. There is no formal ethics committee in the Marine Institute, which was responsible for the capture and killing of the fish. However, the Institute, as an Irish Government agency, has over sixty years of experience working with salmonid fish, both cultured and wild, and has always taken the utmost care to handle and manage the animals it studies and works with as humanely as possible. Electrofishing was undertaken using standard battery-powered 12-volt Safari Research 550E backpack electrofishing equipment (supplied by GFT electrofishing equipment http://www.gft.ie/) for the capture of fish in small streams and rivers. The electrofishing equipment causes the fish to be displaced from its holding place in the stream into the flow, enabling a second person to capture the fish using a handnet. The fish were killed immediately after electrofishing by percussive stunning, with the blow delivered with sufficient force above or adjacent to the brain to render the fish immediately unconscious and so ensure humane killing.

Sampling & UBA sequencing
The Srahrevagh River, Co. Mayo, Ireland is a tributary of the Burrishoole River system, where salmonid populations have been extensively studied [8,9,36,40,41]. As part of these ongoing research efforts, a total of 107 S. trutta (1+ and older) were sampled from the Srahrevagh on 15th June 2004 by electrofishing. Portions of anterior head kidney from all fish were taken under sterile conditions and stored in RNAlater™ (Qiagen Ltd., West Sussex, UK) to prevent RNA degradation. These were transported to the laboratory on ice and stored at −20 °C. All individuals were screened for Sasa-UBA-3UTR, a microsatellite marker embedded in the 3′ untranslated region of the MHC class I locus [11].
We then selected 28 individuals for UBA sequencing. The relationship between the linked microsatellite marker and UBA in trout was of interest to a parallel study [41]. To this end, we excluded the small number of fish which were homozygous for the marker (15/107) from otherwise random sub-sampling and included one fish (RW_107) which had a rare marker allele (128). This one individual was not included in the codon by codon selection analysis described below. The exclusion of homozygotes for the marker could have introduced some potential for bias but marker genotypes proved to be unable to predict UBA genotypes [41]. Therefore, we concluded that any bias in our sub-sample for UBA was minor (See also Table S3).
Head kidney samples were homogenized in lysis buffer [4 M guanidinium thiocyanate, 25 mM sodium citrate (pH 7), 0.5% sarkosyl (N-lauroylsarcosine), 0.1 M β-mercaptoethanol], followed by phenol/chloroform extraction. Total RNA was precipitated in ethanol, washed, and dissolved in water. Extracted RNA quality was assessed on a 1% agarose gel and quantified. Working solutions (0.5 mg/ml) of RNA were generated for RT-PCR.
The ~600 bp cDNA products were purified on micro bio-spin chromatography columns with 600 µl of Sephacryl S-400 HR matrix. Ligation of the purified cDNA into pGEM®-T Easy Vector and cloning were done as per the manufacturer's instructions with a minor modification (samples were spun down and 850 µl of the supernatant removed to concentrate the bacterial cells). Some 40 µl of the suspension of bacterial cells was plated out onto LB/ampicillin/IPTG/X-Gal plates and incubated overnight at 37 °C. Single colonies were grown in LB broth with ampicillin overnight at 37 °C and shaken at 300 rpm. Plasmid DNA was isolated from single colonies using the QIAGEN® QIAprep spin miniprep kit (QIAGEN, Valencia, CA, USA). Both strands of five clones from each of the subsequent PCR amplifications were sequenced using the ABI Prism BigDye Terminator Cycle Sequencing Ready Reaction kit (Perkin-Elmer, Branchburg, USA) and the T7 and SP6 primers, and analysed on an ABI 377 automated sequencer (Applied Biosystems, Foster City, USA).

Basic descriptive statistics
DNASP v4 [43] was used for basic descriptive analysis. Ratios of non-synonymous (dN) to synonymous (dS) nucleotide substitutions were calculated. These help to identify the presence of diversifying selection, with ratios of dN/dS > 1 generally accepted as indicative of selection.
An allelic richness statistic was generated by calculating the number of UBA alleles found in 1,000 random samples of ten fish from our total sample of 28 fish, using in-house Python scripts, to provide for direct comparison with the allelic diversity found in the Colorado River brown trout (n = 10, NA = 10) [12].
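The resampling step described above can be sketched as follows. This is a minimal illustration and not the in-house script used in the study: the function name `allelic_richness`, the input format (one collection of allele names per fish) and the fixed random seed are all assumptions made for the example.

```python
import random
from statistics import mean


def allelic_richness(genotypes, sample_size=10, n_draws=1000, seed=0):
    """Mean number of distinct alleles across random subsamples of fish.

    genotypes: list with one entry per fish, each entry being the
    collection of alleles recovered from that fish (hypothetical format).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    counts = []
    for _ in range(n_draws):
        # Draw fish without replacement, as in a rarefaction-style subsample.
        fish = rng.sample(genotypes, sample_size)
        alleles = set()
        for genotype in fish:
            alleles.update(genotype)
        counts.append(len(alleles))
    return mean(counts)
```

Called on the 28 sequenced fish with `sample_size=10`, a function of this kind would return a value directly comparable with a count of alleles observed in ten fish.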
The program PERMUTE was used to conduct a permutation test for recombination (100,000 permutations in all cases), using the correlation between three measures of linkage disequilibrium (r², D′ and G4) and physical distance [44].
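The logic of such a permutation test can be illustrated with a short sketch for one of the three measures, r². It is not the PERMUTE implementation, whose details may differ. It assumes haplotypes coded as 0/1 at polymorphic biallelic sites, shuffles site positions to build the null distribution, and uses a one-sided test on the expectation that recombination makes linkage disequilibrium decline with distance. The names `pearson`, `r_squared` and `ld_distance_test` are hypothetical.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def r_squared(site_a, site_b):
    """LD measure r^2 between two polymorphic biallelic sites coded 0/1."""
    return pearson(site_a, site_b) ** 2


def ld_distance_test(columns, positions, n_perm=1000, seed=0):
    """Correlation of r^2 with distance, and a one-sided permutation p-value.

    columns: one 0/1 list per site (all polymorphic), one entry per haplotype.
    positions: physical position of each site.
    """
    rng = random.Random(seed)
    pairs = list(combinations(range(len(columns)), 2))
    ld = [r_squared(columns[i], columns[j]) for i, j in pairs]

    def corr(pos):
        return pearson(ld, [abs(pos[i] - pos[j]) for i, j in pairs])

    observed = corr(positions)
    shuffled = list(positions)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between sites and positions
        if corr(shuffled) <= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value above zero.
    return observed, (hits + 1) / (n_perm + 1)
```

A strongly negative observed correlation with a small p-value would be read as evidence for recombination over the region analysed.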

Codon by codon analysis of selection and recombination
OMEGAMAP was used for Bayesian co-estimation of selection (dN/dS, termed ω hereafter) and recombination (ρ) on MH class I alleles. To avoid violating its assumption of a single population, this analysis excluded sequences from fish (RW_11F, RW_37F, RW_61M, RW_90M, RW_107F; Table S1) which microsatellite analysis suggested may have arisen from an isolated upstream population [8]. An alignment consisting of those Satr-UBA alleles identified from the Srahrevagh River at their frequencies within the sample was constructed for use in OMEGAMAP [44]. The frequency data used in OMEGAMAP are presented in Table S1. Equilibrium codon frequencies were estimated from an alignment of salmonid UBA. Reversible-jump MCMC was run twice for each analysis with 250,000 iterations and a burn-in of 25,000 iterations. Details of priors are given in Table S2. The method was found to be robust to the use of alternative priors. Both runs were compared for convergence at several parameters and merged to obtain posterior distributions. A companion program, SUMMARIZE, was used for analysis of the OMEGAMAP output. Graphs of data were produced using R scripts provided with OMEGAMAP [45].

Comparative analysis of stocked Colorado River S. trutta using OMEGAMAP
The earliest records of brown trout in Colorado are from the years 1885 and 1886, when state and private hatcheries reported having "English" trout (imported from England). In 1890, the federal hatchery at Leadville began the propagation and distribution of "Von Behr" trout and "Loch Leven" trout. Thus, the ancestry of brown trout now occurring in the headwaters of the Colorado River probably represents a mixture of brown trout from Germany, England, and Scotland (Dr. Robert J. Behnke, Department of Fish, Wildlife, and Conservation Biology, Colorado State University, Fort Collins, Colorado, USA, Pers. Comm.). The UBA data from introduced S. trutta populations in the Colorado River (USA) [12] allowed the construction of "PAC"-type datasets for analysis in OMEGAMAP using a block model and Prior A, as above. This allowed further comparisons between codon-by-codon selective patterns in UBA in populations of different taxa, cohabiting and otherwise. Estimates of ω from the S. trutta PAC dataset from the Srahrevagh River were compared with those for the introduced S. trutta in the Colorado River. We were interested in the relative strength of selective pressures on a codon-by-codon basis in different populations. OMEGAMAP analyses of three subsamples of the Srahrevagh data of the same size as that of the Colorado River (n = 10) demonstrated that ω estimates were robust to differences in the size of the data set (data not shown). To test for pairwise differences in the posterior distribution of ω at each codon between the outputs for any two populations, A and B, the 95% Highest Posterior Density (HPD) interval (a Bayesian analogue of a confidence interval, output by OMEGAMAP for parameter estimates) for log(ωA/ωB) was calculated at each codon. The hypothesis that ωA = ωB was rejected when the 95% HPD interval did not include log(ωA/ωB) = 0.
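The per-codon comparison can be sketched as below. This is an illustration under stated assumptions rather than the procedure as actually run: it assumes equal-length lists of posterior draws of ω for one codon from each population, pairs the draws from the two independent analyses, and takes the HPD interval as the shortest interval containing 95% of the draws. The function names `hpd_interval` and `omega_differs` are hypothetical.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing a fraction `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of draws inside the interval
    # Slide a window of k consecutive sorted draws and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def omega_differs(omega_a, omega_b, mass=0.95):
    """Reject omega_A = omega_B when the HPD of log(omega_A/omega_B) excludes 0."""
    log_ratio = [math.log(a / b) for a, b in zip(omega_a, omega_b)]
    lo, hi = hpd_interval(log_ratio, mass)
    return not (lo <= 0.0 <= hi), (lo, hi)
```

Applying `omega_differs` codon by codon would flag the positions at which one population shows a credibly higher ω than the other.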
Descriptive statistics for DNA diversity were again calculated in DNASP.

Phylogenetic analysis
Neighbor-Net, as implemented in SPLITSTREE v4 [46], works similarly to Neighbor-Joining tree algorithms. Each taxon is initially represented by a single node, and neighbouring pairs of nodes are iteratively agglomerated into composite nodes. However, it differs in that neighbours are not amalgamated immediately; this only occurs when a node has been paired up a second time. The three linked nodes are then replaced with two linked nodes and the distance matrix is reduced. Reversing the amalgamation process produces the splits given in the Neighbor-Net, which form a circular collection of splits. Graphically, splits are represented by sets of parallel lines separating groups. PROTTEST v1.3 [47] was used to select the best-fit model of protein evolution for the overall Satr-UBA alignment and, independently, for the α1 and α2 domain alignments. Neighbor-Net networks were computed with edge weights estimated using ordinary least squares variance and a threshold of 10⁻⁶ in SPLITSTREE v4. The equal angle algorithm was employed. Maximum likelihood protein distance estimates under the appropriate PROTTEST model were used in generating networks. Bootstrap support was calculated with 1,000 replicates but is displayed only for the most significant splits for clarity of presentation. Networks were generated for Satr-UBA sequences as a whole and for separate α1 and α2 domain alignments, with appropriate S. salar and O. mykiss outgroups. The models used in each case consisted of the JTT matrix [48] with additional parameters for whole Satr-UBA ("+I" = 0.127; "+G" = 0.721), α1 ("+G" = 1.163) and α2 ("+G" = 0.386). Where identical α1 or α2 domain "alleles" occurred, a single node was presented. A neighbour-joining tree was also constructed in MEGA v3.1 for salmonid UBA amino acid sequences using a JTT matrix with gamma distributed rate variation (+G) of 0.721. Bootstrap support values (1,000 replicates) are presented.
SPLITSTREE can help identify recombination events as incongruities or loops in networks. Specific recombination events within Satr-UBA sequences in particular, and salmonid UBA in general, were analysed in parallel to the phylogenetic analysis using SPLITSTREE. Potential events were then examined using MAXCHI in RDP2 [49] and by visual inspection of the data using the sequence alignment explorer in MEGA v3.1 [42]. Sequences that have been heavily involved in recombination events, as has been observed in other salmonids [12], or that show evidence of intra-exon recombination, were noted.

Interspecific comparisons of selected codons in UBA in salmonids
CODEML [50] was used to analyse available UBA sequence data from S. trutta, S. salar and O. mykiss because it does not require that the analysis be carried out on a single population. We predicted that the pattern of selected codons should be conserved amongst these taxa due to the ancient nature of the polymorphism at the locus and possible similarities in the selective pressures over time. Further, any differences which do occur should follow the pattern expected from the established phylogenetic relationships, namely that S. trutta and S. salar should show a more similar pattern of selection than either does with O. mykiss.
The DNA Maximum Likelihood (DNAML) program version 3.5c [51], as implemented in BioEdit ver 7.0.1 [52], was used to construct maximum likelihood trees for each data set for use in CODEML. CODEML detects positive selection via likelihood-ratio tests between nested probabilistic models [M0 (null), M1a (neutral), M2a (selection), M7 (β) and M8 (β and ω)] of variable ω ratios between codons, where the simpler model differs from the more complex model by not allowing for ω > 1. Akaike Information Criterion (AIC) statistics were used to compare the relative fit of models.
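The two model-comparison statistics can be written out in a few lines. The sketch below is illustrative only: it assumes the log-likelihoods and parameter counts are read from the CODEML output, and that the nested pair being compared (for example M1a versus M2a, or M7 versus M8) differs by two free parameters, so that the test statistic is referred to a chi-square distribution with two degrees of freedom. The function names are hypothetical.

```python
import math


def lrt_df2(lnl_null, lnl_alt):
    """Likelihood-ratio test for nested models differing by two parameters.

    Returns the statistic 2 * (lnL_alt - lnL_null) and its p-value.
    For a chi-square distribution with 2 degrees of freedom the survival
    function has the closed form exp(-x / 2).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


def aic(lnl, n_params):
    """Akaike Information Criterion: 2k - 2 lnL (lower is better)."""
    return 2.0 * n_params - 2.0 * lnl
```

A small p-value from `lrt_df2` favours the model that allows ω > 1, and the model with the lowest `aic` value is preferred among those compared.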

SWISS-MODEL and SPDV DEEPVIEW modelling
The reference Satr-UBA allele, Satr-UBA*0101 was submitted to SWISS-MODEL. The model used was murine MHC I 2bvoA [53] with which Satr-UBA*0101 showed 57% similarity. The returned Protein Data Bank files were loaded into the supplied SPDV DEEPVIEW program for three-dimensional visualisation, graphical manipulations, and the plotting of codons under different selective pressures. SPDV DEEPVIEW was used to output files for the rendering software POV-RAY. This produces very high quality graphics of the protein.

Descriptive statistics for the Srahrevagh River population
Twenty-one alleles were identified in the Srahrevagh, all of which were novel. The alleles were named Satr-UBA*1101-3101 and GenBank accession numbers AM262749-69 have been allocated to them (Figure S2). Individual genotypes were typical of a single, diploid expressed Satr-UBA locus, except for one individual which presented three Satr-UBA alleles. These alleles did not cosegregate in other individuals as would be expected for a haplotype with tandemly duplicated class I loci. Eighteen of the twenty-eight individuals yielded only one Satr-UBA allele. This seemed a low level of heterozygosity given the level of allelic diversity observed and, although some form of underdominance may be occurring, it likely reflects preferential amplification of one or other allele given the use of two forward primers. Both problems were noted in previous work on salmonid MHC [12,54]. The Satr-UBA alleles were composed of sixteen α1 sequences (14 novel) and nineteen α2 sequences (15 novel). Nucleotide diversity (π) for the Srahrevagh River was 0.260 (see Table 1). Higher divergence and diversity were seen at α1 than α2, but the ratio dN/dS was somewhat higher in α2 (Table 1). Both values are around 0.5, much less than 1, implying that the total region is not under diversifying selection using this simple measure.
PERMUTE [44] found significant evidence for interdomain recombination over the gene as a whole but not for intradomain recombination when the α1 and α2 domains were considered separately (Table 1).

Codon by codon analysis of selection and recombination in the Srahrevagh River population
OMEGAMAP showed that codons for 20 amino acid positions were under significant positive selection in Satr-UBA from the Srahrevagh trout (Figures 1A, 1B, S7). Fourteen of these were in the α1 domain and six were in the α2 domain. Mean ω for the entire Satr-UBA region was 0.65 ± 0.062 (Table 2), ranging from 0.051 (Asp173) to 8.570 (Tyr113). Low background ω rates suggest that most of the UBA gene is under purifying selection (Table 2). Evidence for strong positive or diversifying selection was found at a discrete set of codons (Figure 1A, 1B). Despite more codons being under selection in α1, those under the strongest selection were found within the α2 domain. The ω estimate for Tyr113 was thirteen times the mean ω, and that for Lys156 was five times the mean.
A recombination hotspot was found between codons 91 and 92, where ρ was 3.093 (confidence limits 0.250-14.201), 14 times the mean value of 0.22 ± 0.019 (Table 1, Figure 1C). This marks the position of the large intron II between exons II and III, which code for the α1 and α2 domains, respectively.
No evidence of a correlation was found between ω and ρ co-estimates for positions (Pearson correlation = −0.091, P = 0.219), as can be seen in the lack of correspondence between plots of ω and ρ (Figures 1B, 1C). Outside of the important recombination hotspot, selection may be more important in generating new alleles, and the two factors do not necessarily act on the same codons.

Comparative analysis of stocked Colorado River S. trutta using OMEGAMAP
Re-analysis of the S. trutta UBA data from the Colorado River [12] in OMEGAMAP showed no evidence of significant levels of selection at any amino acid site (Figures 2A, 2B, S7). Mean ω was 0.48 ± 0.022 for S. trutta in the Colorado River, which is lower than that found in the Srahrevagh. Mean ρ was slightly higher for S. trutta in the Colorado River (0.29 ± 0.008). Curiously, there was no significant evidence for a high ρ estimate at the transition point between the α1 and α2 domains (Table 2).
Comparison of the results from both analyses showed that no codons had significantly higher ω estimates in the Colorado River than in the Srahrevagh population. In contrast, the Srahrevagh had significantly higher ω at codons for Arg62-Gly65 and Tyr113; and Ser12-Ala16; Arg62-Gly69; Val93-Asn96; and Tyr113 than those amino acid positions in the Colorado River. The mean ω1/ω2 value for the brown trout comparisons with the Colorado River was 1.28 ± 0.103 (SE). The total ω value was not greatly higher (~20%), suggesting that the differences in ω arise at discrete residues or selective foci in the PBR. Higher ω estimates were also found in three sub-samples of the Srahrevagh of n = 10 alleles (data not shown), indicating that the pattern is not an artefact of sample size differences.
DNA diversity statistics are presented in Table 3. Divergence levels are somewhat lower than those seen in the Srahrevagh (Table 1), mainly at the α2 domain. However, effective population size estimates (θ) are somewhat higher per sequence in the Colorado River stock than in the Srahrevagh.

Phylogenetics
SPLITSTREE networks of Satr-UBA alleles incorporating the twenty-one novel alleles described here and relevant salmonid UBA outgroups (Figure 4A) demonstrate some large loops, suggesting recombination and/or gene conversion events affecting the alleles connected by those loops. Eleven of 21 (52%) of the Satr-UBA alleles described here are recombinant alleles. Most of the loops can be explained by recombination at the intron between the exons coding for the α1 and α2 domains of Satr-UBA, as previously described for Atlantic salmon [12,36]. Well-supported clades suggestive of conventional radiation by point mutation were also observed (e.g. clades including Satr-UBA*1101 and Satr-UBA*2301; Figure 4A). A neighbour-joining (NJ) tree of the same data presented for comparison (Figure 4B) shows broad agreement with the SPLITSTREE network. However, alleles which are involved in loops in the network appear to be incorrectly grouped in the NJ tree, e.g. Onmy-UBA*4401 (AY278452), Onmy-UBA*4701 (AY278449) and Onmy-UBA*4601 (AY278450), indicating the utility of the SPLITSTREE networks for better interpretation of data affected by recombination. Recombinant alleles from the Srahrevagh which are combinations of α1 and α2 lineages that appear to be new to all salmonids are Satr-UBA*1201 and Satr-UBA*1801 (α1 LI/α2 LIII); Satr-UBA*2601 (α1 LII/α2 LIII); and Satr-UBA*2801 (α1 LII/α2 LII) (Figure 6). It is clear from these data that recombination is a major factor in population-level divergence in brown trout, as was found to be the case in S. salar [36].

Phylogenetics of a1 domain sequences
This study extends the number of ancient salmonid α1 lineages recorded in S. trutta. The α1 network is broadly tree-like but features a few loops, suggesting that intra-domain recombination between deeply diverged and ancient α1 lineages can also occur (Figure 5A). Two loops warrant additional discussion. The relationship of the α1 sequence of Satr-UBA*1301 to previously described α1 LIII sequences is marked by a loop in the network (Figure 5A). When Sasa-UBA*0301 was removed from the network (analysis not shown), Satr-UBA*1301 did not cluster with α1 LIII sequences. Closer examination shows that Satr-UBA*1301 and Sasa-UBA*0301 have been involved in separate intradomain recombination events which involve a sequence shared with the non-classical locus, ULA, which is unique to Salmo spp. (Figures 5B, S4A). The ULA locus is on the same linkage group as UBA in S. salar [16]. These data, together with the fact that none of the reported non-classical loci [16] appear to be related to this allele, suggest that Satr-UBA*1301 is the first representative of a well-supported and novel ninth α1 lineage at this locus in salmonids. The additional Satr-UBA data provided by these novel alleles also reveal additional sub-lineages within α1 LV, termed LVa, LVb and LVc (Figure 5A, 5C). Sub-lineages LVa and LVc are well-supported and feature characteristic sequence motifs shared amongst salmonids (Figure 5C, S4B). Sub-lineage LVb is poorly supported and appears to have been generated by multiple reticulations involving alleles from LVa and LVc.
The region downstream of the recombination break point in Satr-UBA*1301 is marked by an amino acid motif between residues Pro59 and Ile66 (PDYWERETQI) which appears to be unique amongst salmonid UBA (see Figure S3). This region contains two sites, Tyr59 (conserved) and Glu63 (variable), which form "Pocket A" of the peptide binding cleft with Tyr171, on α2. A BlastP search found the identical amino acid motif in a shark (Triakis scyllium) and the Pallid Atlantic Forest Rat (Delomys sublineatus), although the differences at the nucleotide level over the same region were 20% and 17%, respectively. Similarly, human HLA-B*4413 differs by a single amino acid from the trout amino acid sequence but is 23.3% different in its nucleotide sequence. The shark, rodent and human alleles differ from Satr-UBA*1301 by 41% (55% nt), 52% (70% nt) and 56% (71% nt), respectively, over the remainder of their amino acid sequences, suggesting some form of convergence in these MHC alleles in taxa separated by over 400 million years [55]. Separate examination of the phylogeny of α1 LI (which contains half of all the salmonid α1 alleles described in Figure S3) shows

Phylogenetics of a2 domain sequences
The phylogeny of α2 displays four distinct allelic lineages, three of which are already known in salmonids, while the fourth is novel and unique to brown trout (Figure 6). This study also extends the diversity of α2 lineages recorded in S. trutta. Divergence between α2 allelic lineages is far greater than that between α1 alleles. The distinct "majority" type α2 LI lineage (containing two thirds of all the α2 alleles described in Figure S5) and the two other highly diverged and ancient lineages, α2 LII and α2 LIII, have been maintained in all salmonids. S. trutta α2 sequences Satr-UBA*0501, Satr-UBA*1301, Satr-UBA*1401, Satr-UBA*1601 and Satr-UBA*2101 form a divergent, monophyletic and well-supported novel clade, designated LIV. The substitution of a hydrophobic valine or methionine residue at Gln95, otherwise conserved across diverse taxa, differentiates this clade from others. This residue is also conserved in non-classical salmonid loci such as UFA, UGA and UEA but not in the Salmo-specific ULA, where another hydrophobic residue, leucine, is found. The two positions adjacent to Gln95 are known to be important for peptide binding and were under significant diversifying selection in our OMEGAMAP and CODEML analyses. The α2 Satr-UBA*1301 sequence, part of this new lineage, has an interesting substitution of the positively charged histidine at Gln114. Gln114 is ordinarily conserved across diverse taxa (except the zebrafish Danio rerio, where it is replaced by negatively charged glutamic acid) and is important to CD8 and β2-microglobulin interactions [56]. This position borders the selection hotspot identified in this work at Tyr113.
Trans-specific polymorphism is pronounced in α2 LII (and to a lesser extent α2 LIII), where α2 alleles found in O. mykiss (e.g. Onmy-UBA*0202) and Satr-UBA*2801 and Satr-UBA*2901 have very similar amino acid sequences. Notably, the entire diversity of salmonid α2 is captured by S. trutta (Figure 6) and, indeed, all lineages described were identified in the Srahrevagh brown trout population. In contrast, while α2 LI is clearly very old and exhibits trans-specific polymorphism, there is more evidence of species-specific diversification, including the large number of S. salar sequences (Figure S9).

Discussion
The first MH class I Satr-UBA data described from wild, endemic S. trutta have revealed a high diversity of alleles within a single population, new allelic lineages in both α1 and α2 domains, strong selection at discrete codons in the locus and the importance of recombination to population-level divergence. These data permit new insights into the evolution of MH class I in salmonids, a locus of considerable importance in adapting to novel ecological challenges.

Figure 4. [...] The frequency of these in the network implies that recombination is an important factor in the evolution of Satr-UBA, predominantly between the α1 and α2 domains. Conversely, good bootstrap support for splits involving several closely related Satr-UBA alleles is suggestive of conventional radiation by point mutation. Roman numerals (α1/α2) indicate the lineages to which each Satr-UBA allele's α1 and α2 sequence belongs (see also Figures 5 and 6). B) Neighbour-joining tree rooted on the midpoint for salmonid UBA amino acid sequences with bootstrap support (1,000 replicates) shown for nodes with 50% support or greater. Nodes in A) and B) highlighted with an orange triangle illustrate how SPLITSTREE is better able to visualise sequences affected by recombination. doi:10.1371/journal.pone.0063035.g004

The identification of twenty-one novel alleles from twenty-eight individual fish demonstrated the high allelic diversity in the Srahrevagh S. trutta population. Allelic richness (10.2) was very similar to that in the Colorado River S. trutta, NA = 10 (and O. mykiss, NA = 10) [12], and the number of alleles was identical to that in S. salar (NA = 21) taken from four populations (including the Burrishoole) in the same area of Ireland [36]. However, the alleles in the wild S. trutta were more divergent (π = 0.260) than those in S. salar (π = 0.184) [36].
No MH class I allele was shared with the only S. trutta previously studied, from the Colorado River, although α1 and α2 sequences were shared. This mirrors the situation previously identified in S. salar populations [36] and highlights the role of recombination in driving rapid population-level divergence at this locus in both Salmo species. Contrary to the findings in S. salar, however, there was no clear evidence of an interplay of selection and recombination on the same sites. We have also identified clear examples of recombination occurring between lineages at both α1 and α2 and with a non-classical locus, ULA. What factors allow novel recombinant alleles to be functional and readily fixed at the population level? Recombinant alleles may be more divergent and easier to detect behaviourally [57], and thus favoured in sexual selection [29] or kin association (demonstrated to occur in the Srahrevagh S. trutta population) [9]. Recombinant alleles are also likely to result in proteins with a radically altered peptide binding region, which may give rise to a divergent allele advantage [58][59][60][61].

Figure 6. Phylogenetics of the α2 domain. A) Satr-UBA α2 sequences with novel sequences described in this work represented by square nodes. The number of plus signs after a sequence indicates the number of other Satr-UBA alleles which share this sequence in its entirety and which, therefore, are likely to have been involved in recombination. Known α2 lineages are indicated using roman numerals. Note that a novel α2 lineage, LIV, unique to S. trutta, which appears to have originated more recently from the α2 LI lineage, is well supported with the additional data described in this work. The shape of the overall tree is distinct from that of α1, with fewer well-supported lineages and with evidence of extensive radiation within the "majority" α2 LI lineage. doi:10.1371/journal.pone.0063035.g006
If this is true, the suite of MH class I alleles in S. trutta, which is more divergent than that in co-habiting S. salar, should confer a superior ability to detect pathogens, a possibility which could be addressed in pathogen challenge experiments. To extend this point further, in terms of adaptive variation, are populations with more divergent MHC alleles fitter?
However, there may not be an advantage to divergent alleles, as there is evidence for convergent evolution in MHC binding specificities [62][63][64][65], with human class I classified into as few as nine "supertypes", defined by overlapping peptide-binding motifs. In short, alleles which appear very different could be functionally similar. The advantage of divergent recombinant alleles to pathogen detection could also be negated by the fact that an important antigen processing gene (TAP1) is located on a separate chromosome from MH class I in salmonids [15], requiring that both proteins evolve to an 'average best fit' independently. In that case, the antigen processing genes may not be well adapted to the presentation of different types of peptides to novel divergent UBA alleles.
The extent of recombination in salmonid MH class I and this separation of the antigen processing genes imply that antigen presentation in salmonids is extraordinarily plastic. The discrete pattern of selection at class I may also be of note. The stocked S. trutta in the Colorado River have retained high allelic diversity but appear to lack variation (Figure 2) at two selective foci identified in the Srahrevagh (Figure 1B), Phe94-Asn96 and Tyr113 (and in CODEML analysis for both Salmo species), which occur at the base of the peptide-binding cleft. These would seem important to antigen binding (Figure 1A) and the relative lack of variation in the Colorado population is curious. Interestingly, the stocked O. mykiss in the Colorado River show a similar pattern of selected codons to the stocked S. trutta (Figures S6A, S6B, S6C).
Additionally, Gln155 is one of only two amino acid positions found to be under strong selection across all salmonid taxa. However, this amino acid position is conserved in human class I and is known to be critical to class I restricted T cell recognition [39]. Gln155 is important to a newly identified form of antigen presentation in HLA wherein longer peptides are bound in a conformation that bulges out of the peptide binding region (PBR) [39]. Direct interactions between the antigen and the T cell receptor dominate this form of binding; most MHC amino acids are not involved and the shape of the PBR is not likely to be a critical factor. We speculate here that this form of binding may be a feature of salmonid class I molecules. This would help explain how recombination between divergent α1 and α2 allelic lineages can freely occur. This hypothesis could be tested in future studies which identify the nature of antigens bound by different salmonid MH class I alleles.
Figure S1 Salmonid UBA structure. Relevant structure of the salmonid UBA gene (after [66]), based on the rainbow trout allele AF296362_Onmy-UBA*0501. Intron-exon organisation is shown with sizes for the relevant exons and introns in nucleotide base pairs given in parentheses. Note the large size of intron II between the exons coding for the α1 and α2 domains. (TIF)
Figure S2 Novel Satr-UBA alleles. Amino acid alignment of novel Satr-UBA alleles described in this work. Accession numbers are included in each allele name. Sequences from the α1 domain (top) and α2 domain (bottom) are displayed together with the respective lengths of each sequence. (TIF)
Figure S3 α1 sequence alignments. A) Representative salmonid UBA α1 domain amino acid sequence alignments capturing the diversity of variation within α1 lineages (roman numerals) and between lineages. We include Satr-UBA*1301 with α1 L III sequences.
Sites found to be under selection in OMEGAMAP fall into two categories: sites which are highly variable between lineages and sites which are highly variable both between and within particular lineages. (TIF)
Figure S4 Alignments highlighting recombination in α1 lineages. A) Nucleotide sequences for Satr-UBA*1301, Sasa-UBA*0301, Sasa-UBA*0801, Sasa-ULA*0102 and the reference sequence Satr-UBA*0101, an α1 L I sequence. Note that Satr-UBA*1301, Sasa-UBA*0301 and Sasa-ULA*0102 have very similar nt sequences between positions 1 and ~136, whereupon Sasa-UBA*0301 abruptly shows greater similarity to a typical α1 L III sequence, Sasa-UBA*0801. Satr-UBA*1301 sequence similarity to the ULA sequence persists slightly longer but thereafter large numbers of nt differences are observed. This pattern is typical of recombination or gene conversion events occurring within the α1 domain. B) Amino acid alignments of α1 L V lineages. Note the high degree of similarity between sequences from different species, indicating that trans-species polymorphism is extensive in α1 L V. Note that sequences in lineage L Vb are more similar to sequences of L Vc between aa positions 1-28 but more similar to L Va sequences in the remainder of the sequence. This pattern might be explained by an ancient recombination event (or events) between L Va and L Vc sequences giving rise to the poorly supported L Vb clade. Notably, when α1 L Vb sequences are removed from SPLITSTREE networks (data not shown), L Va and L Vc sequences appear as distinct α1 lineages, although sharing a more recent common ancestor than any other pair of lineages in the network. This suggests both that intradomain recombination between lineages is possible and that it is more feasible between more closely related lineages.
(TIF)
Figure S5 Alignment of representative salmonid UBA α2 domain amino acid sequences showing the diversity of variation within α2 lineages (roman numerals) and between lineages. Sites found to be under selection in OMEGAMAP fall into two categories: sites which are highly variable between lineages and sites which are highly variable both between and within particular lineages. A notable feature of α2 diversity is the extensive and diffuse polymorphism within α2 L I. In contrast, a remarkable degree of conservation is observed within other α2 lineages. This may point to differences in selective pressures in different α2 lineages. (TIF)
Figure S6 A) Model showing selected sites in the UBA protein for the introduced Colorado River S. trutta population (top) and the Colorado River O. mykiss population (bottom). For comparison, this information from the Srahrevagh River S. trutta population is also provided (inset, right, detail in Figure 1A). Clear differences in the distribution of selected sites in the peptide binding region can be seen. B, C) Comparative plots of ω for the Colorado River S. trutta (B) and O. mykiss (C) populations. The pattern observed in the O. mykiss population is remarkably flat outside distinct diversifying selection foci at Ser70 and between Asn149 and Ile163. Highest Posterior Density (HPD) 95% confidence intervals are seen in grey about the plot line and are tight about means in all cases, suggesting confidence in the ω estimates. (TIF)
Figure S7 Selected sites in UBA. Venn diagrams of sites under selection identified in independent OMEGAMAP analyses of the three individual populations labelled. Significance levels of selection on residues: p<0.001 (bold), p<0.01 (normal) and p<0.05 (italics). (TIF)
Figure S8 Phylogenetics of α1 Lineage I. Large loops are observed in the α1 L I network, particularly affecting Satr-UBA sequences, indicating recombination events.
Other parts of the network are more treelike, suggesting a stronger role for point mutation. Each salmonid species demonstrates some species-specific diversification but trans-species polymorphism is observed even within this most diverse of α1 lineages. (TIF)
Figure S9 Phylogenetics of α2 Lineage I. The α2 L I network is typified by stellate radiation, although incongruities may imply that gene conversion, recombination or convergence also occur. Trans-species polymorphism is observed although no sequences demonstrate a high degree of similarity. In other parts of the network, species-specific diversification is extensive, particularly for S. salar sequences. (TIF)