Diversity and selection of MHC class I genes in the vulnerable Chinese egret (Egretta eulophotes)

The genes of major histocompatibility complex (MHC) are important to vertebrate immune system. In this study, two new MHC class I genes, designated as Egeu-UAA and Egeu-UBA, were discovered in the vulnerable Chinese egret (Egretta eulophotes). Using a full length DNA and cDNA produced by PCR and RACE methods, these two MHC class I loci were characterized in the genome of the Chinese egret and were also found to be expressed in liver and blood. Both new genes showed the expected eight exons and were similar to two copies of the minimal essential MHC complex of chicken. In genetic diversity, 14 alleles (8 for UAA and 6 for UBA) in the MHC class I gene exon 3 were found in 60 individuals using locus-specific primers and showed little polymorphism. Only three potential amino acid residues were detected under positive selection in potential peptide-binding regions (PBRs) by Bayesian analysis. These new results provide the fundamental basis for further studies to elucidate the molecular mechanisms and significance of MHC molecular adaptation in vulnerable Chinese egret and other ardeids, finding that have not been previously reported.


Introduction
The major histocompatibility complex (MHC) is a gene complex that encodes for cell-surface proteins responsible for the recognition and presentation of foreign antigens to T-lymphocytes, and which triggers the adaptive branch of the immune system [1]. The MHC molecules can be divided into three classes, class I, class II and class III [2]. Class I and class II molecules have similar folding patterns, differ in their domain organization and are at best as similar in sequence to each other as to many other molecules. Class II molecules are involved in the adaptive immune response, but class I is recognised by T cells of the adaptive immune system and NK cells of the innate immune system. Some genes in the class III region encode molecules involved in lymphocyte interactions or with innate immunity. MHC class I molecules function to present peptides that arise from proteins expressed in the cytoplasm to contiguous organelles including the nucleus (often derived from what are referred to as intracellular pathogens). In contrast, MHC class II molecules present peptides that arise in intracellular vesicles and extracellular space ( polymorphism at the peptide-binding regions (PBRs) should increase the number of pathogens that can be recognized by an individual, resulting in balancing selection for disease resistance and genetic diversity [3,4]. The MHC is known to be a complicated gene group in vertebrates, both in terms of allelic diversity and gene number [5]. Numerous studies have suggested that selection for resistance and immunity from parasites and disease and possibly sexual selection are the major factors driving this polymorphism [6][7][8]. Thus, several studies have emphasized the potential of MHC genes as valuable molecular markers to assess the evolutionary and adaptive potential of endangered populations and species in relation to the menace of changed and emerging diseases [9][10][11][12].
Despite continuing investigation of avian MHC class I genes, and an increasing interest in their use as markers of adaptive genetic variation in evolutionary ecology research [13][14][15], many important gaps in our knowledge of MHC class I multigene family evolution in birds remain largely unexplored. Many studies have focused upon the isolation of cDNA transcripts or the variation within the polymorphic PBR exons, with the result that no full-length sequences have been described for non-galloanseriforms. However, the limited DNA sequence information, including differences in gene copy number, the presence of pseudogenes, and the inability to identify orthologous loci from sequence similarity in other unrelated species makes characterization of the MHC class I genes in more species of fundamental importance. For ornithological studies, locus-specific amplification is commonly hampered by the existence of highly conserved sequences surrounding the PBR, and locus assignment based upon PBR sequence similarity is often impeded by conflicting evolutionary signals due to recent duplication, recombination and/or concerted evolution [16]. The evolutionary forces acting upon each locus cannot be untangled from the larger issue of polymorphism and selective pressures averaged across all loci. Similarly, the relative contributions of classical versus nonclassical genes to the overall variation cannot be adequately evaluated. For these reasons, our current knowledge of avian MHC class I genes remains taxonomically restricted, which limits the potential of comparative methods and a broader understanding of the evolutionary processes acting upon these genes.
The Chinese egret (Ardeidae, Egretta eulophotes) is listed as a vulnerable species with an estimated global population of 2600-3400 individuals [17]. This egret is a migratory colonial waterbird, wintering in the south of Asia and breeding on offshore islands in Russia, North Korea, South Korea and China. The migratory colonial patterns of this egret may drive MHC polymorphisms as migratory birds would inevitably face exposure to a more diverse fauna of pathogens and parasites, and colonial birds would elevate transmission rates of pathogens [18]. Previous studies of this species have focused on mitochondria [19], microsatellite [20] and MHC class II genes [21][22][23]. However, up to the present, there are no studies on MHC class I genes in any heron or egret species, which is the first step towards understanding the role of pathogen-mediated selection and mate choice in the maintenance of MHC class I diversity. Specifically, this study aims to: (1) isolate full genomic and cDNA MHC class I sequences from the Chinese egret (Egretta eulophotes), (2) design locus-specific primers to survey the variation in PBR-encoding exons for all loci and (3) gain insights into the selection pressures acting on PBR and non-PBR regions within the third exon of MHC class I loci in the Chinese egret.

Ethics statement
This research and all procedures involving collection of animal tissue in the wild were approved by the Administration Center for Wildlife Conservation in Fujian Province (FJWCA-1208). The scientific license for access to the study site was issued by the Administration Department of Xiamen Egret Natural Reserve (XMENR-1005). Blood samples (~0.5mL) were collected from 60 nestlings (aged around 15 days) of E. eulophotes by puncturing the wing vein and using a syringe to withdraw the blood. The nestlings were immediately returned to the nest after staunching the wound with cotton. Collection of the blood samples was conducted during the morning, and visits to the breeding colony were restricted to a maximum of two hours per day.
Methods for DNA extraction, RNA extraction and reverse transcription Blood samples of 60 nestlings were collected from 35 nests at Riyu Islet (27˚01 0 N, 120˚25 0 E) in Fujian Province of China, where a total of 50 nests of the Chinese egret were found. The blood samples were frozen at −80˚C until DNA extraction. Total genomic DNA was isolated using the Universal Genomic DNA Extraction Kit Ver. 3.0 (TaKaRa), and total RNA was extracted using Trizol (Invitrogen, Switzerland) from the blood and liver of one individual Chinese egret that died at this Islet, respectively (S1 Fig). One μg of total RNA was then reverse transcribed into first-strand cDNA with PrimeScript RT Master Mix (TaKaRa), according to the manufacturer's instructions.

Isolation of MHC class I genes
A fragment from exon 2 to exon 4 of MHC class I genes in Chinese egret was first amplified with degenerate primers MHCI-ex2F: CGCTACAACCAGASCRRSG and MHCI-ex4R: GGGTA GAAGCCGTGAGCRC [24]. PCR reactions were carried out on a Biometra T gradient thermocycler in a final volume of 20 μL, which containing 1 × PCR buffer (50 mM KCl, 10 mM Tris-HCl, pH 8.3, 1.5 mM MgCl 2 ), 0.2 mM of each dNTP, 0.4 μM of each primer, 0.7 U of Taq polymerase (TaKaRa), 100 ng of genomic DNA. The conditions for PCR amplification were a denaturing step at 94˚C for 3 min, followed by 35 cycles at 94˚C for 30 s, 55˚C for 30 s, 72˚C for 2 min, and a final extension at 72˚C for 10 min. The 3' and 5' rapid amplification strategy of cDNA ends (RACE) [25] was used to isolate an entire coding sequence and the gene transcription was confirmed by using 3'-full RACE core set ver. 2.0 (TaKaRa) and 5'-full RACE Kit (TaKaRa). Based on the DNA sequence obtained from the fragment, gene specific outer/inner primers were designed in conserved exon 4 regions for 3' RACE (13sp1: CGTGGTCCGAGTGTCG 13sp2: GACCCTGACCTTGTCCTGC) and 5' RACE (15sp1: GGTACTTGTCCTTCTCCTCCG 15sp2: CCAGGTGTAGTAGGTGCCATC) respectively, using Oligo 6.0 (Molecular Biology Insights). PCR reactions followed the protocols and applications as specified in the kit. A 730bp and an 810bp segments were obtained from 3' RACE and 5' RACE, respectively. After removing their overlapping region, a 1380bp long intact gene fragment of MHC class I was obtained.
According to the complete cDNA sequence, two primers, 15utrF CAGAACTCTGCCCGGA GACGG and 13utrR AGCGTGCACAGGGAGCAGAAATCAG were designed to amplify the complete DNA sequence of the MHC class I gene. PCR reactions were carried out on a Biometra T gradient thermocycler in a final volume of 50 μL containing 1× LA PCR buffer II, 2.5 mM MgCl 2 , 0.4 mM dNTP, 0.5 μM of each primer, and 2 U TaKaRa, LA Taq. PCR conditions included an initial denaturation step at 95˚C for 2 min, 35 cycles of denaturation at 98˚C for 10 s, extension at 68˚C for 15 min, and a final extension at 72˚C for 10 min.
All PCR products were purified using the Agarose Gel DNA Purification Kit Ver. 2.0 (TaKaRa), then ligated into pMD 18-T vector (TaKaRa) and transformed into Escherichia coli DH5α. The genes of over 3000 bp were sequenced by using Primer Walking method from both sides respectively. In this Primer Walking method, after a shorter stretch (about 800bp) had been sequenced using M13F or M13R, a new primer was generated from the end of what had been already sequenced and the process was continued until the sequence was overlapped. Ten positive colonies of each band were selected to sequence bidirectionally on an automatic sequencer (ABI PRISM 3730; Invitrogen Biotechnology Co. Ltd.) using universal M13 sequencing primers and BigDye version 3 (Applied Biosystems).

Polymorphism of the MHC class I genes complex
To detect polymorphism within the MHC class I genes in wild populations of the Chinese egret, 60 nestlings on the Riyu Islet were genotyped by semi-nested PCR combined with single strand conformation polymorphism (SSCP). First, two forward locus-specific primers in intron 2 were designed and designated as UAAF and UBAF (UAAF: CATCCCCGTGGCAGTCGAGAand UBAF: AATCCCATCTCCTTGTCCCATCTTCC). These were combined with the reverse primer 13exR: CGTAGCTCACGTATTTCCTCAGCCAC to amplify the two loci, respectively. PCR was carried out in a 20 μL reaction mixture containing 1 μL genomic DNA, 0.7 U of Taq polymerase (TaKaRa), 1.5 mM MgCl 2 , 200 μM of each dNTP and 0.4 μM of each primer, for 25 cycles at 94˚C for 30 s, 60˚C for 30 s and 72˚C for 1min. Second, to obtain suitable length fragments for SSCP genotyping and scanning of the variation in exon 3 of each locus, a second round PCR was carried out using the primer 13exF: ACTGAAGGCCCAGGGCTGC and 13exR was used to amplify the entire exon 3 in each locus. The reaction conditions for the second round PCR were identical with those described for the first round. The PCR products from the first round were diluted 100-fold and used as the template. The amplicons were separated by SSCP analyses as described by Fain [26]. Five μL of each PCR product were mixed with an equal volume of 95% formamide, denatured for 5 min at 99˚C and immediately chilled on ice water for 10 min. The reaction mixture was loaded on an 8% nondenaturing polyacrylamide gel (PAGE) (37.5:1). Electrophoresis was carried out at 10˚C for 5 h at 8 W per gel followed by a sensitive silver-staining procedure, as described by Budowle [27]. Samples with equal banding patterns were rearranged and electrophoresed a second time on a nondenaturing PAGE on adjacent lanes to ensure the genotyping, and samples with unique genotypes were typed using two independent PCR reactions. All SSCP-bands were cut from at least two individuals of each genotype and incubated in 80 μL water for 3 h at 37˚C. One μL was used as template in a second PCR under the same conditions as described above. The products were screened in an additional SSCP analysis and every allele was directly sequenced in both directions from at least two individuals or two independent PCRs from one individual.

Exclusion of PCR errors and definition of alleles
For all PCR reactions, the amplification, cloning and sequencing was carried out twice and only sequences found in both experiments were included in the analyses to avoid the inclusion of PCR artifacts. Since recombination of cloned PCR products can introduce additional artifacts [28,29], direct sequencing of uncloned PCR products was used for agreement with the polymorphic sites. The bands excised from SSCP were directly sequenced in both directions. Only completely identical sequences found in two independent PCR events were defined as alleles. These alleles are referred to as "verified" because the same sequence is highly unlikely to arise twice from independent amplification errors. Throughout this study, the word "allele" is used to describe a 273 bp exon 3 sequence derived from SSCP genotyping.

Data analyses
The complete DNA and cDNA sequences for MHC class I genes, derived from the Chinese egret, and were aligned using the DNAMAN software package (Lynnon Biosoft, Quebec, Canada). Exons and introns were distinguished using the GenScan Program [30]. DNA fragments obtained from the 60 nestlings were edited in the BioEdit Sequence Alignment Editor [31]. Alignment gaps were treated as missing data in all analyses. Sequence variability statistics were calculated using the Mega 4.0 Program [32]. The proportion of synonymous (d S ) and non-synonymous (d N ) substitutions were calculated for both the 18 possible PBRs as defined by Bjorkman and Saper [33,34], and the remaining non-antigen binding sites (non-PBRs) for each of the two loci. The Z-Test in the Mega 4.0 was used to test for directed selection. Standard errors were obtained with 1,000 bootstrap replicates, including average rates of synonymous (d S ) and nonsynonymous (d N ) substitutions per site according to the Nei-Gojobori method with Jukes-Cantor correction for multiple substitutions [35]. Population allele frequencies and tests of deviation from Hardy-Weinberg equilibrium were calculated using GENEPOP 4.0 [36]. The programme CODEML in the PAML package, version 3.15 [37] was used to test for the presence of codon sites affected by positive selection and to identify those sites in exon 3 sequences of the Chinese egret, and the neutral model M7 and the positive selection model M8 were compared using maximum likelihood ratio tests (LRT).

Characterization of MHC class I genes
In this study, DNA sequences of two different MHC class I genes were isolated from one individual Chinese egret, and the sequences were highly similar over large stretches. They could be distinguished, however, by the divergent intron 2. The exon sizes were similar to those found in other birds (Fig 1, S2 Fig). According to Klein [38], these two genes were designated as Egeu-UAA and Egeu-UBA (GenBank Accession KY511591-KY511592). The result of RACE analyses indicated that each of the two genes comprised a total of 1080 bp encoding for 360 amino acid residues in the exon sequences. These encoded for eight of the expected exons including those encoding the signal peptide, the alpha 1 (α1), alpha 2 (α2) and alpha 3 (α3) domains, the transmembrane region, and the cytoplasmic tail (Fig 2). No frame shifts or premature stop codons were observed, but the canonical donor-acceptor splice sites (GT/AG) and polyadenylation recognition sequences (AATAAA or ATTAAA) were present in all sequences.

Sequence variation in exon 3
Based on the Egeu-UAA and Egeu-UBA sequences, two forward locus-specific primers were designed on intron 2 for each locus, respectively. These two sequences were then combined with the universal primers to amplify the exon 3, as described above. Single locus genotyping of MHC class I was carried out on 60 nestlings. The SSCP results showed one or two alleles per individual at each locus (S3 Fig, S1 Table), confirming the presence of two loci. Fourteen alleles in total were obtained from the two loci, including 8 from Egeu-UAA and 6 from Egeu-UBA (GenBank Accession KY511593-KY511606). Translation of the nucleotide sequences produced 14 unique amino acid sequences (Fig 3).

Frequency of allele and Hardy-Weinberg equilibrium
In this study, our sample size was appropriate for the analyses of allele frequency and Hardy-Weinberg Equilibrium because 60 samples were sufficient to represent the full genetic variation present in this islet population. According to the optimal sampling strategy of Marshall et al [41] and Sjögren et al [42], approximately 30 individuals from a finite population could be considered as a recommended size for collection in genetic studies. Our allele frequency results (Table 1) showed that alleles of each locus had an uneven distribution. At each locus, some alleles had a higher frequency than others, such as Egeu-UAA Ã 02, Egeu-UAA Ã 04, Egeu-UBA Ã 01 and Egeu-UBA Ã 04. The departure from Hardy-Weinberg equilibrium within each locus was statistically significant (P<0.001).

Tests of selection
When the ratio of non-synonymous to synonymous substitution (d N /d S ) was analyzed in predicted peptide-binding region (PBR) sites, and non-PBR sites for each gene separately [43], signs of positive selection were only found in the PBR sites of the two loci, with a d N /d S ratio of 1.23 and 1.30 respectively. The ratios were not significantly different than 1, indicating that strong selection was not acting on these two loci in the Chinese egret (Table 2). Furthermore, we tested if positive selection was acting on the exon 3 using CODEML. The LRT values comparing M7 and M8 models indicated that M8 fitted the data significantly better than M7 (P<0.001) in the Egeu-UAA locus. The M8 model identified positive section at the Egeu-UAA locus in which there were three identified codons, 9, 23, and 62 (>95% posterior probability) which corresponded to the potential peptide binding codons identified in human HLA-A2 (Fig 3). As shown in Table 3, the Egeu-UBA locus showed no evidence of positive selection for the LRT (2.1824), indicating that the M8 model did not fit better than M7 (P>0.1). Additionally, no amino acid sites in exon 3 were detected under strong positive selection (confidence interval level is greater than 95%).

Discussion
Using a PCR-based approach, two full-length MHC class I genes in the Chinese egret were isolated and characterized. These two genes were newly discovered in Ardeidae, a non-model avian species. Sequence features of antigen-presenting MHC class I genes were largely conserved in each gene, and no frameshift mutations, premature stop codons, or inappropriate splice signals were observed. Additionally, cDNA sequences isolated for each locus indicated that both of them were transcriptionally active in liver and blood. The MHC class I genomic  architecture and complexity in the Chinese egret conformed to previous data collected in the nonpasserine birds [44].
In this study, intron 2 of the MHC class I genes in the Chinese egret was found to be more divergent than other regions, making it a likely candidate for locus specific primers in single locus typing of MHC class I. Further, the SSCP results showed one or two alleles per individual on each locus, conforming that the loci were separate. The low number of loci in this egret would allow us to completely genotype each locus of the MHC class I in individual egrets. Moreover, the primers used in this study were designed for targeting highly conserved regions across class I genes, which will be useful for successfully cross-amplifying similar fragments in other Ardeid species [24,43,45].
Sequence features typical of antigen-presenting MHC class I genes in the Chinese egret were largely conserved in both loci. Eight residues in the α1 and α2 domains contacted main chain atoms of the bound ligand, and anchored the peptide termini in a sequence-independent manner (Fig 2, in dark grey shading) [33,34]. In classical MHC class I genes of human and mouse, no more than two amino acid replacements exist at these sites, with substitutions tending to be of a conserved nature [46,47]. The Egeu-UAA and Egeu-UBA in the Chinese egret showed no deviation from the consensus "YYRTKWYY" sequence in which arginine is substituted for tyrosine at alignment position 87 in non-mammalian vertebrates [48).
The "Major "class I genes with high sequence variation and high expression levels across tissue types have been described for some bird species such as the chicken, duck, red-billed gull and crested ibis [44,[48][49][50]. Among the classical I genes of the Chinese egret, genetic diversity in Egeu-UAA locus was higher than the other locus, and the number of alleles obtained from this locus by RT-PCR was also higher. Additionally, the d N /d S values (PBR site) suggested that Egeu-UAA locus was under positive selection. These findings suggest that Egeu-UAA represents a major class I gene in the Chinese egret. In contrast to the greater sequence variation of Egeu-UAA, Egeu-UBA showed extremely limited variation and no evidence of positive selection on exon 3, suggesting that this locus may represent a minor MHC class I gene. Moreover, CODEML analyses revealed only 3 residues with posterior probabilities of positive selection (P!0.95) in Egeu-UAA exon 3, which is a finding significantly difference with most previous research results [44,51]. This low number of positive selection sites was also found in recent studies of the greater prairie chickens (Tympanuchus cupido) [52] and several falcon species [53]. A possible explanation for the lack of both polymorphism and detectable balancing selection at the locus is that strong directional selection has acted on this locus and reduced variation at the PBR. Another alternative explanation for this phenomenon is that drift in this egret population has removed most evidence of selection at this locus. In the future, more studies on other populations of the Chinese egret are needed in order to demonstrate whether drift has erased traces of diversifying selection on the MHC class I loci in this population or strong directional selection has acted on this locus throughout the range of the species.