Giant Panda Genomic Data Provide Insight into the Birth-and-Death Process of Mammalian Major Histocompatibility Complex Class II Genes

To gain an understanding of the genomic structure and evolutionary history of the giant panda major histocompatibility complex (MHC) genes, we determined a 636,503-bp nucleotide sequence spanning the MHC class II region. Analysis revealed that the MHC class II region from this rare species contained 26 loci (17 predicted to be expressed), of which 10 are classical class II genes (1 DRA, 2 DRB, 2 DQA, 3 DQB, 1 DYB, 1 DPA, and 2 DPB) and 4 are non-classical class II genes (1 DOA, 1 DOB, 1 DMA, and 1 DMB). The presence of DYB, a gene specific to ruminants, prompted a comparison of the giant panda class II sequence with those of humans, cats, dogs, cattle, pigs, and mice. The results indicated that birth and death events within the DQ and DRB-DY regions led to major lineage differences, with absence of these regions in the cat and in humans and mice respectively. The phylogenetic trees constructed using all expressed alpha and beta genes from marsupials and placental mammals showed that: (1) because marsupials carry loci corresponding to DR, DP, DO and DM genes, those subregions most likely developed before the divergence of marsupials and placental mammals, approximately 150 million years ago (MYA); (2) conversely, the DQ and DY regions must have evolved later, but before the radiation of placental mammals (100 MYA). As a result, the typical genomic structure of MHC class II genes for the giant panda is similar to that of the other placental mammals and corresponds to BTNL2∼DR1∼DQ∼DR2∼DY∼DO_box∼DP∼COL11A2. Over the past 100 million years, there has been birth and death of mammalian DR, DQ, DY, and DP genes, an evolutionary process that has brought about the current species-specific genomic structure of the MHC class II region. Furthermore, facing certain similar pathogens, mammals have adopted intra-subregion (DR and DQ) and inter-subregion (between DQ and DP) convergent evolutionary strategies for their alpha and beta genes, respectively.


Introduction
The major histocompatibility complex (MHC) of vertebrates is a highly polymorphic region that controls immune responses through presentation of pathogen-derived peptides to T cells [1]. The MHC superfamily includes two major families, class I and class II. Class I molecules primarily present cytosol-derived peptides to cytotoxic CD8+ T cells, whereas class II molecules mostly present peptides derived from exogenous molecules to helper CD4+ T cells. These two classes of MHC molecules also differ in the structure of their antigen-binding (AB) domains. The AB domain of class I molecules is composed of two subdomains encoded by exons 2 and 3. The functional class II molecule is a heterodimer consisting of an alpha gene-encoded a chain and a beta gene-encoded b chain. Thus, the AB domain of the class II molecules is also a paired heteromeric structure, half of which is encoded by exon 2 of the alpha gene and half by exon 2 of the beta gene.
As the perceived primary role of the MHC genes has evolved from its original transplantation histocompatibility function [2] to ''the center of the immune universe'' [3], the MHC locus has become one of the most intensely studied genomic regions in vertebrates. The first of these regions to be completely sequenced, mouse MHC (H2) and human MHC (HLA) [4][5], revealed that the class I and II families are linked to a third region (class III) and contain unrelated genes, together constituting a single gene complex. The MHC class II region was further subdivided into DR, DQ, DP, DO, and DM subregions. The DR, DQ, and DP genes encode classical class II molecules responsible for presenting foreign peptides and triggering immune responses, whereas the DO and DM genes encode non-classical class II molecules that serve as molecular helpers promoting peptide binding to classical class II molecules [6]. Each subregion contains different numbers of alpha and beta genes. Furthermore, the number of expressed genes and pseudogenes also varies greatly among subregions.
The difference in the number of alpha and beta genes has been attributed to an evolutionary birth-and-death process in which new MHC genes have been created by repeated gene duplication, with some duplicate genes subsequently being deleted or becoming nonfunctional through the accumulation of deleterious mutations [7]. This evolutionary process has also contributed to the large difference in the number of MHC genes between species within a given mammalian order [8]. A comparison of class I and II sequences from major vertebrate groups demonstrated that the class I loci underwent a faster birth-and-death process than did class II loci, making it difficult to establish orthologous relationships of class I-a genes among different mammalian orders [9][10]. In contrast, the greater longevity of class II genes has made it relatively easy to identify orthologous class II loci in different orders of mammals [10][11]. As a result, and in view of the pivotal role of MHC genes in the immune system and the birth-and-death characteristics of class II genes, the MHC class II genomic regions for several mammals, including cat [12], dog [8], cattle [13], and pig [14], were sequenced to identify a complete and ordered set of MHC class II genes and resolve MHC evolutionary history from a comparative genomics perspective.
The giant panda is one of the world's most endangered species, currently consisting of a population of approximately 1600 individuals [15]. Historically, the range of giant pandas has included southern and eastern China and extended to northern Burma and northern Vietnam; but habitat loss has restricted the giant pandas to six isolated mountain ranges of China [16]. Considerable effort, involving both in situ and ex situ conservation strategies, has gone into protecting this rare species. At present, the captive giant panda population is approximately 240 individuals [17]. However, studies on artificially bred giant pandas have demonstrated reduced infectious disease resistance and reproductive success, problems that have severely hampered the development of the captive giant panda population [18] and created an incentive to explore more effective conservation strategies.
In addition to the crucial role of MHC in the immune response, increasing evidence suggests that MHC genes strongly influence the rate of reproductive success by controlling mate selection [19][20] and mother-fetus bio-compatibility [21][22][23]. Consequently, understanding genetic aspects of MHC genes should greatly aid in efforts to protect the giant panda. In this pioneering effort, we present the results of genomic sequencing of the giant panda. To resolve the number and order of the MHC class II genes, we have determined a 636,503-bp nucleotide sequence spanning the entire MHC class II region. We have then compared this sequence to that of MHC class II loci from other placental mammals to explore the evolutionary history of this important gene cluster.

Results and Discussion
Genomic structure and content A 636,503-bp nucleotide sequence spanning the giant panda MHC class II region was obtained by assembling contigs of 10 sequenced BAC clones. The giant panda MHC was designated Aime-MHC (Aime representing Ailuropoda melanoleuca), following the nomenclature system for the MHC of vertebrates [24]. A total of 26 genes were identified in Aime-MHC after repeat masking, GENESCAN prediction, and BLAST alignment. These were BTNL2, DRA, DRB1, DQA1, DQB1, DQA2, DQB3 y, DQB2, MAPRE1y, DRB2 y, DYB y, DOB, TAP2, PSMB8, TAP1, PSMB9, RPS15A y, ATPase y, DMB, DMA, BRD2, DOA, DPA y, DPB1 y, DPB2 y, and COL11A2 ( Figure 1). Without repeat masking, GENESCAN actually predicted 52 loci based on the original 636,503-bp sequence, but the predicted results indicated that 23 of these were derived from repeats, and 5 non-repeat genes had incorrect exon boundaries that were mixed with intron or repeat sequences. Thus, the final gene prediction was achieved based on repeat-masked sequences. The original gene prediction suggested four novel transcripts, but after alignment with the NCBI nucleotide collection (nr/nt) database, these sequences were all found to correspond to homologous fragments in the MHC regions of other mammals, confirming that these were local sequences specific to the MHC region. None of these homologous loci was predicted to be expressed.
Of these 26 genes, those in the DR, DQ, and DP loci are common to placental mammals and belong to classical MHC class II gene families responsible for antigen presentation. The DO and DM genes encode non-classical MHC class II antigens related to ''molecular helpers'' that promote peptide loading onto classical class II molecules. The pseudogenes DRB2 and DPA were probably induced by relatively recent inactivating mutational events because both retained complete MHC genes that were normal except for a mutation resulting in a stop codon in exon 2 (DRB2) and the deletion of a codon in exon 2 (DPA). This showed that dysfunction of an MHC gene could arise as a result of those being a high spot for mutations and deletions. Other pseudogenes were severely reduced in size, usually comprising 1-2 exon fragments, probably resulting from large-scale deletions and gene recombination. The giant panda MHC class II region presented a linked gene cluster containing DOB, TAP2, PSMB8, TAP1, PSMB9, DMB, DMA, BRD2, and DOA. This gene group is present in this identical order in other sequenced mammals, including humans [5], cats [12], dogs [8], cattle [13], pigs [14], and mice [4]. We designated this DOB,TAP2,PSMB8,-TAP1,PSMB9,DMB,DMA,BRD2,DOA gene cluster as the DO_box. Of these genes, PSMB encodes a proteasome subunit involved in cleaving peptides for presentation by class I antigen, the TAP gene product is a transporter for antigen processing, and BRD2 codes for a mitogen-activated nuclear kinase. Also in this cluster was BTNL2, which is a butyrophilin IIlike gene of the immunoglobulin superfamily, and RPS15A y and ATPase y, which are pseudogenes that once encoded ribosomal protein S15a and vacuolar ATPase.
There were two genes in this region of the giant panda genome, MAPRE1y and DYB y, that we would not have expected to find. Despite being a pseudogene, MAPRE1 y was detected in the Aime-MHC, but was absent in the MHC region of all the other mammals sequenced so far. The MAPRE1 (Microtubule-associated protein, RP/EB family, member 1) molecule is responsible for regulation of microtubule dynamic instability. In other mammals, the MAPRE1 gene is located in a different chromosome from that which harbors the MHC genes. For example, the MHC regions of human, dog, cow, pig, and mouse are located in chromosomes 6, 12, 23, 7, and 17, respectively, but their corresponding MAPRE1 genes are found on chromosomes 20, 1, 13, 17, and 2. This suggests that the giant panda's MAPRE1 y pseudogene has been transposed from another chromosome to the chromosome 9q region where the MHC is located [25]. The functional DY subregion normally contains a pair of DYA and DYB genes, which were thought to be nonclassical MHC class II loci [26] specific to ruminants [27]. The appearance of giant panda DYB y, albeit a pseudogene, suggests that the DY subregion is not strictly ruminant-specific, but was once shared by members of the Laurasiatheria superorder, including Ruminantia and Carnivora ( Figure 2).

Genomic comparisons
Dot-plot analyses of the class II regions of Aime-MHC, HLA, FLA, DLA, BoLA, SLA, and H2 showed that the mammalian class II MHC retained multiple syntenic conserved segments, with the Carnivora MHC fragments of Aime-MHC, FLA, and DLA presenting the highest genomic similarities ( Figure 3A). The selfsequence dot plot of the giant panda MHC class II indicated three large intra-MHC repeat segments corresponding to DQA1- DQA2, DQB1-DQB2, and DRB1-DRB2 pairs ( Figure 3B). Additionally, the intraspecific dot plot of the giant panda yielded some small homologous plots ( Figure 3B), which were probably giant panda-or MHC-region-specific repeats because repeat masking based on Carnivora common repetitive elements failed to reveal them before dot plotting.
The genomic organizations of the class II genes of Aime-MHC, HLA, FLA, DLA, BoLA, SLA, and H2 were compared using Multi-PipMaker and VISTA, both of which produced similar global genomic alignments. The VISTA plot ( Figure 4) revealed that, despite a deficiency in the DQ subregion of FLA the Carnivora's FLA and DLA class II MHC had large segments that were homologous to those of the Aime-MHC, as shown in pair-wise dot plots ( Figure 3). It also showed that the mammalian MHC class II region could be divided into six parts, including three conserved A, D, and F segments, and three highly variable B, C, and E segments. A, D, and F comprise BTNL2, the DO_box, and COL11A2, respectively, while B, C, and E correspond to DR1-DQ, DR2-DY, and DP subregions, respectively ( Figure 4). The major differences among Aime-MHC, HLA, FLA, DLA, BoLA, SLA, and H2 were in the genomic length and content of the highly variable regions B, C, and E (Table 1 and Figure 4), a variability that was attributed to differences in alpha and beta gene birth-anddeath process in DR, DQ, DY, and DP subregions ( Figure 4). The cat MHC region lost the entire DQ subregion but expanded the DR subregion, resulting in the longest Carnivora BTNL2-DR1 region (225,849 bp; Table 1). The human HLA has developed into a MHC complex (,702 kb long; Table 1) containing multiple suites of DR, DQ, and DP genes. It is also the only complex so far identified with two functional DPA and DPB genes ( Figure 4). Similarly, the cow BoLA region has evolved into several sets of DR, DQ, and DY genes and possesses a pair of functional DYA and DYB genes ( Figure 4). Both the expansion of DR and DQ subregions and the complement of DY genes in cow BoLA caused BoLA to become the longest mammalian MHC class II region of approximately 753,296 bp, after excluding the 17 cM gap (Table 1 and Figure 4).
A comparison across HLA, FLA, DLA, BoLA, SLA, and H2 showed that, although the Aime-MHC did not extend its DR subregion and retained only relic DP pseudogenes, its DQ subregion has expanded and four functional DQ loci survived the MHC gene death process ( Figure 4), resulting in a total length of 636,503 bp (Table 1). In contrast, the cat FLA completely lacks the DQ subregion and the dog DLA retains expression of only a pair of DQA and DQB loci ( Figure 4). The cow BoLA region was shown to possess the maximum number of expressed DQ members (five loci). The ongoing horse genome-sequencing project indicates that this herbivore might also have retained two or three pairs of functional DQ genes (NCBI Equus caballus Build 1.1). Collectively, we speculate that a high number of DQ loci could bear a relationship to the herbivorous habit. The identification of four functional DQ genes in the giant panda, which has evolved from a carnivorous habit to a diet of bamboo, is consistent with this interpretation. Thus, these genes were  probably beneficial in the dietary adaptation of giant pandas.. The most striking characteristic in the VISTA-based genomic comparisons is the absence of the entire DR2-DY region in human HLA and mouse H2 (Figure 4). This contrasts sharply with the MHC regions of the other six mammals studied, all of which contained the DR2-DY region; however, only in ruminant cattle were all of these genes functional (Figure 4). In a previous study, Zeng et al. [28] concluded that the mammalian MHC class II region had the genomic structure: BTNL2,DR,DQ,DOB, DM,DOA,DP,COL11A2. However, our comparative genomic study revealed that the DY subregion was not specific to ruminants, suggesting instead that the DY subregion might have been present in the genomic structure of ancestral mammalian MHC class II genes. The resolution of the ancestral MHC class II structure requires an estimate of the time at which DY genes arose in order to judge whether Euarchontoglires, such as humans and mice, lost the DY subregion, or Laurasiatheria, such as carnivores and herbivores, gained the DY genes after the separation of Euarchontoglires and Laurasiatheria.

Phylogenetic tree and divergence time estimation
The above results highlight possible traditional misunderstandings of the mammalian genomic structure of the MHC class II region. Moreover, a recent study reported a completed opossum MHC genome sequence [29], providing a suite of MHC genes from marsupials. As a result, there is a need to reconstruct phylogenetic trees to distinguish the relationships among alpha and beta genes in different taxa. In our phylogenetic analysis, we excluded all pseudogenes because of their distorted exon content and evolutionary rates, and collected as many expressed alpha and  beta genes as possible, including all those that were locus-defined and identified by genomic sequencing. The Bayesian-based and MP-based methods yielded phylogenetic trees with similar topology, so only the Bayesian phylogenetic trees are depicted.
To exclude effects of the highly variable exon 2, whose variation was driven by pathogen variation, we first built a phylogenetic tree using the sequences of exons 3-4. The resulting tree showed that the DM genes clustered at the highest level, consistent with the fact that the DM genes developed before the split between birds and mammals [30]. Hence, the DMA and DMB genes were used as outgroups in constructing alpha and beta gene trees. The phylogenetic tree based on exons 3-4 had a very similar topology to that of the phylogenetic tree constructed using the different taxa sampled ( Figure 5 and Although we adopted a relaxed time range for the split between marsupials and placental mammals that incorporated different results from fossil and genetic data [31][32][33][34], both the alpha and beta trees supported the interpretation that marsupials diverged from placental mammals ,150 MYA ( Figure 5), in good agreement with the fossil record (143-178 MYA; [31]). The radiation time of placental mammals remains controversial, but a crude estimation of .100 MYA has been suggested [32]. In this study, the placental mammals sampled were all from Boreoeutheria fauna, whose members diverged at a relatively definite date, approximately 94-100 MYA [33][34]. As a result, multiple sets of orthologous loci from the giant panda, human, cat, dog, cow, pig, and mouse used the time constraint of 94-100 MYA for the DOA, DQA, DRA, DOB, DRB, and DQB gene clusters ( Figure 5). MULTI-DIVTIME dated the divergence of these mammals to 96-97 They suggest that the DR, DP, DO, and DM regions developed prior to the divergence of marsupials and placental mammals from a common ancestor (150 MYA), whereas the ancestral MHC alpha and beta genes evolved into the DQ and DY loci approximately 130 MYA ( Figure 5); that is, before the intra-Boreoeutheria divergence (100 MYA), but after the marsupial-placental split (150 MYA). This indicates that the DY genes were lost in human and mouse rather than being gained in other mammals. Consequently, the Aime-MHC organization pattern, namely BTNL2,DR1,DQ,DR2,DY, DO_box,DP,COL11A2, may typify mammalian MHC class II regions. In contrast to previous studies [10][11], which grouped the DQB and DPB regions together, our phylogenetic tree yielded separate branches comprising DPB of mammals and DAB of marsupials, and then clustered these with the major clade containing DOB and DRB genes ( Figure 5).
To determine whether inclusion of the antigen presentation region (i.e., exon 2) would change the topology of the gene tree, we constructed another phylogenetic tree based on all exons. As expected, the new tree of beta genes grouped the DPB together with the DQB loci, but the topology of other genes was very similar to that of the exon 3-4-based tree ( Figure 6). One difference, however, was that the new alpha-gene tree displayed intra-clade rearrangements relative to the former alpha tree, especially in the DQA and DRA clusters ( Figure 6). In the former alpha tree, the intra-clade branches of DOA, DQA, and DRA clusters corresponded well to the taxonomic relationships of Euarchontoglires and Laurasiatheria ( Figure 5 and Figure 2). However, with the exception of the DOA genes, human alpha genes were reassigned to the Laurasiatheria clade in the newly built tree (Figure 6). Exon 2 of MHC class II alpha and beta genes (except for nonclassical DO and DM loci) is a highly variable domain, and its variation is driven by continual competition among pathogen variants and host-pathogen co-evolution. Accordingly, the contrasts between the two trees ( Figure 5 and Figure 6) indicate that mammals exposed to certain similar pathogens adopted intra-subregion (DR and DQ) and intersubregion (between DQ and DP) convergent evolutionary strategies for their alpha and beta genes, respectively. Furthermore, the true phylogenetic relationships of MHC class II genes should depend on the sequences of exons 3-4, which are not involved in antigen presentation.

Birth-and-death of MHC class II genes
The phylogenetic trees optimized by timescale provided a straightforward depiction of the MHC class II genes birth-anddeath process ( Figure 5). For example, the marsupials' DCA, DBA, and DBB genes and the placental mammals' DPA, DOA, DRB, and DQB-DYB genes diverged from a common marsupialplacental ancestor but orthologous DCA, DBA, and DBB placental and DPA, DOA, DRB, and DQB-DYB loci were not detected ( Figure 5; for a convenient description, we assigned the descriptors DGA and DGB to the ancestral placental genes of DBA and DBB, and allocated the DWA, DFA, DHB and DXB to Figure 6. Phylogenetic trees based on all exons of alpha and beta genes from marsupials and placental mammals. The Aime-DRB2 was included in the tree because of its critical position in the VISTA plot and the fact that its coding regions were relatively intact. The DMA and DMB genes were used as outgroups in constructing trees (data not shown). Black triangles represent predicted evolutionary branches that are different from those in Figure 5. doi:10.1371/journal.pone.0004147.g006 the ancestral opossum genes of DPA, DOA, DRB and DQB-DYB, respectively.). Thus, the placental mammals lost ancestral DCA, DGA (orthologous to DBA), and DGB (orthologous to DBB) genes whereas the opossum lost the DWA (orthologous to DPA), DFA (orthologous to DOA), DHB (orthologous to DRB), and ancestral DXB (orthologous to DQB-DYB) genes ( Figure 5). The ancestral DX genes spawned two offspring families (DQ and DY subregions) in placental mammals approximately 130 MYA ( Figure 5). In addition to birth and death of gene members, another process that molded MHC class II gene phylogeny is functional conversion. The DCA and DCB genes, which were paired in the opossum [29], went their separate ways in the mammals -DCA died off and DCB genes evolved into DOB genes ( Figure 5). The DAA and DAB genes, also thought to be paired in marsupials, developed into DRA and DPB, respectively, in placental mammals ( Figure 5). DB is the most active subregion in the opossum, but all DB genes ceased to exist in placental mammals ( Figure 5).
On the basis of the foregoing account, we created a schematic depiction of the mammalian MHC class II genes birth-and-death process (Figure 7). Incorporating the MHC genomic structure of the opossum and placental mammals, we propose a therian ancestral MHC organization with six alpha genes and five beta genes ( Figure 7A). These subsequently developed into DO, DR, DQ-DY, and DP in placental mammals ( Figure 7B-E) and DC, DA, DB, and DX in marsupials (Figure 7B'). Genes orthologous to DB loci were not detected in any of the Boreoeutherian animals sampled here, leading us to conclude that the ancestral DB alpha and beta genes had been lost in the Boreoeutherian ancestral MHC ( Figure 7B). The phylogenetic tree also revealed that the evolution of ancestral MHC class II genes into DQ and DY subregions predated the intra-Boreoeutherian divergence, thus marking the presence of a gene duplication event in the Boreoeutherian ancestral MHC ( Figure 7B). Because the giant panda, cat, dog, cow, and pig all possessed a DRB gene adjacent to the DY subregion (Figure 4), we included a duplicate DRB gene in the Laurasiatherian ancestral MHC ( Figure 7C). Of the three Carnivora representatives -giant panda, dog, and cat -only the panda retained the relic DYB pseudogene (Figure 4). From this we infer that the DY loci had become pseudogenes in the Carnivora-ancestral MHC ( Figure 7D). Considering that the giant panda possesses a relatively intact pseudo-DPA locus and exhibits a one-time duplication of the DPB gene, we conclude that the function of the panda's DP subregion was lost during the evolution of the panda instead of representing an event in the evolutionary history of the Carnivora-ancestral MHC. The MHC class II region of the giant panda ultimately evolved into its current species-specific genomic structure after losing the entire DYA locus, expanding the DQ subregion, and extinguishing the duplicate DRB gene ( Figure 7E).  A, B', B, C, D, E depict the ancestral MHC structure of Theria, opossum, Boreoeutheria, Laurasiatheria, Carnivora and giant panda, respectively. The opossum MHC structure was modified from Belov et al. [29]. * denotes genes that are pseudo in Euarchontoglires and ** shows those genes that are functional in ruminants. The black and gray rectangles represent respectively functional and pseudogenes in the opossum and giant panda. The colour boxes and arrows indicate ancestral MHC genes and their orientation. The dotted colour and blank symbols reflect that the ancestral genes become pseudo genes and absent ones, respectively. doi:10.1371/journal.pone.0004147.g007 The birth of new MHC class II genes is rooted in genome fragment duplications, which are probably triggered by long interspersed nuclear elements (LINEs) [35]. A statistical analysis of repetitive elements in the MHC class II regions of giant panda, cat, dog, cow, pig, human, and mouse showed that each species had a characteristic distribution of repetitive elements ( Table 2). The Aime-MHC, FLA and DLA of Carnivora contained the highest number of LINEs, whereas the HLA and H2 of Euarchontoglires had the most LTRs (Table 2). SINEs and LINEs were by far the most common repetitive elements of Cetartiodactyla BoLA and SLA, which had the highest content of SINEs among all MHC regions compared ( Table 2). A comparison of repeat content between the entire MHC class II region and the DR1-DQ-DR2-DY region revealed a clearly evident increase in LINE and LTR contents in the gene-rich DR1-DQ-DR2-DY region ( Table 2). In sharp contrast to LINEs and LTRs, SINEs and DNA transposon content were generally observed to decrease in the DR1-DQ-DR2-DY region, suggesting that both LINEs and LTRs are associated with the MHC class II gene fragment duplication process. In the HLA, the LINE content of various subregions was similar to the overall LINE content, but the frequency of LTRs was elevated in the gene-rich DR,DQ region ( Table 2), suggesting that, in certain species, LTRs may play a role in the duplication of MHC class II genes.

BAC clones for sequencing
Construction of a sequence-ready BAC contig spanning the MHC class II region was previously described [28]. This contig is approximately 650-kb in length and includes the entire classical class II region and a part of the extended class II regions. In this study, only the classical class II region containing alpha and beta genes was sequenced. In total, ten BAC clones (504E11, 354C3, 826G2, 237E1, 1071G6, 206G5, 900H8, 692B2, 1561B8, and 1262B6) were sequenced.

Preparation of BAC DNA and generation of shotgun libraries
A single BAC clone was used to inoculate 1 ml of LB containing 12.5 mg/mL chloramphenicol. After incubating at 37uC with shaking for 5-6 h, 1 mL of Copycontrol induction solution was added and the culture was incubated at 37uC with shaking for an additional 1-2 h. A 1-mL aliquot of the induced culture was transferred to 500 mL of LB containing 12.5 mg/mL chloramphenicol and then incubated overnight with shaking at 37uC.
Ultrapure BAC DNA was isolated from the overnight, induced culture using the Qiagen Large-Construct Kit. High-molecularweight BAC DNA was dissolved in 2 mL of TE buffer and stored at 220uC until use.
Shotgun libraries were constructed according to the protocol of Yuhki et al. [12], with modifications. Briefly, approximately 300 mL of ultrapure BAC DNA was sonicated to obtain sheared DNA fragments ranging from 1.5 to 3.5 kb in size. The randomly sheared DNA was end-repaired by incubating with 1.5 mL of Mung Bean Nuclease and then heated at 65uC for 5 min to inactivate the nuclease. The repaired DNA fragments were ligated into SmaI-digested pUC19 vector, and then used to transform competent DH5a cells. Plasmid DNA was extracted using the AxyPrep-96 Plasmid Purification Kit (Axygen) according to the manufacturer's protocol.

Sequencing and assembly of shotgun clones
Plasmid DNA from shotgun clones was sequenced in both directions with M13 forward and reverse primers to achieve 7.2-to 8.8-fold coverage. The sequencing reactions were carried out using dye-terminator and dye-primer chemistry on an ABI Prism 3730x1 DNA analyzer (Applied Biosystems) by the sequencing group of Beijing Genomics Institute, and on a LI-COR Long Read IR 2 4200 DNA automatic sequencer (Li-Cor, NEN) by our research group.
For the sequences obtained from the ABI DNA analyzer, base calling and quality assessment were performed using PHRED [36][37]. The nucleotide sequences were first examined for contaminating vector and E. coli DNA with PHRAP (http://www.phrap. org) and the trimmed sequences were assembled by CONSED [38][39]. For the sequences derived from the LI-COR IR 2 sequencer, base calling and quality control were carried out using E-seq (http://www.licor.com) and the sequence pre-processing and assembly were performed using the SeqMan module of the DNAStar software package. Sequence gaps were closed by primer walking or by sequencing PCR products that spanned small gaps [13]. The primers for closing gaps were designed from unique endsequences from each assembly after first analyzing for Carnivora repetitive elements by RepeatMasker (http://www.repeatmasker. org).

Sequence analysis and annotation
The finished genomic sequence of each BAC clone was analyzed by RepeatMasker (http://www.repeatmasker.org); GENESCAN was used to predict genes from the masked nucleotide sequence ( [40]; http://genes.mit.edu). The predicted gene sequences were aligned against NCBI reference mRNA sequences using BLAST programs [41] to identify the boundaries of exons (i.e., to prevent contamination of coding regions by introns) and to distinguish expressed genes from pseudogenes. The GENESCAN-predicted peptide sequences were aligned against non-redundant protein sequences by BLASTP [42] to search for amino-acid homology. If a novel transcript was predicted but failed to yield homologous fragments in either BLAST search, it was realigned against the NCBI nucleotide collection (nr/nt) database to exclude an origin from local region-specific sequences. The 10 BAC-derived contigs were assembled into a complete nucleotide sequence for the giant panda MHC class II region and reanalyzed for predicted genes by GENESCAN to produce the final annotated genomic sequence.

Construction of phylogenetic trees
To completely resolve the evolutionary history of giant panda MHC class II genes, we also included some expressed alpha and beta genes from marsupials during phylogenetic tree construction. The analysis of alpha and beta genes from diverse mammals, marsupials, and birds was facilitated by first aligning sequences by translated amino acids to locate the positions of gaps in the corresponding codon positions, as implemented in Mega 4.0 [48]. The aligned sequences were subjected to Bayesian phylogenetic analyses with MRBAYES 3.1.2 [49]. Analyses were run with one cold chain and three heated chains for 500,000 generations. Amino-acid analysis was based on the POISSON model and nucleotide analysis adopted a codon-based GTR+G+I model of substitution. Maximum parsimony (MP) analysis was also performed by PAUP 4.0beta [50] for comparison of trees. The MP tree was produced using heuristic searches with tree-bisectionreconnection (TBR) branch swapping; node support was tested by bootstrapping 100,000 replicates.