Reconstructing an Ancestral Mammalian Immune Supercomplex from a Marsupial Major Histocompatibility Complex

The first sequenced marsupial genome promises to reveal unparalleled insights into mammalian evolution. We have used the Monodelphis domestica (gray short-tailed opossum) sequence to construct the first map of a marsupial major histocompatibility complex (MHC). The MHC is the most gene-dense region of the mammalian genome and is critical to immunity and reproductive success. The marsupial MHC bridges the phylogenetic gap between the complex MHC of eutherian mammals and the minimal essential MHC of birds. Here we show that the opossum MHC is gene dense and complex, as in humans, but shares more organizational features with non-mammals. The Class I genes have amplified within the Class II region, resulting in a unique Class I/II region. We present a model of the organization of the MHC in ancestral mammals and its elaboration during mammalian evolution. The opossum genome, together with other extant genomes, reveals the existence of an ancestral “immune supercomplex” that contained genes of both types of natural killer receptors together with antigen processing genes and MHC genes.


Introduction
The major histocompatibility complex (MHC) is a multigene complex critical to vertebrate immunity. The MHC is the most gene-dense and polymorphic region of the mammalian genome and is associated with resistance to infectious diseases, autoimmunity, transplantation, and reproductive success [1]. Loci contained within the MHC have been historically grouped into three classes of genes called Class I, II, and III. The three classes of loci are distinguished based on both structure and function of their encoded proteins. Class I molecules can be divided into classical and non-classical molecules. Classical Class I (Class Ia) loci are ubiquitously expressed and encode receptors that typically bind and present endogenously synthesized peptides to antigen-specific CD8+ cytotoxic T cells. Non-classical Class I (Class Ib) loci encode molecules that often perform functions other than antigen presentation. Class Ib loci may be located outside the MHC, and tend to be non-polymorphic and not ubiquitously expressed. Classical Class II genes encode receptors that present exogenously derived peptides to CD4+ helper T cells, whereas non-classical Class II genes participate in antigen presentation pathways. The MHC also contains several genes encoding molecules that participate in the processing of peptides for presentation to the immune system. The Class III genes encode a variety of immune and non-immune system-related molecules, most of which are not involved in antigen presentation, and include cytokines and components of the complement system. The three classes of loci have been used to define regions within the MHC, i.e., the Class I, II, and III MHC regions.
Comparative analyses of MHC organization across distantly related species have revealed lineage-specific rearrangements within the region and changes in gene complexity. Detailed information on MHC organization is currently available for seven species of eutherian ("placental") mammals, two birds, five teleost fish, and sharks [2-5]. There are major differences in organization and complexity between eutherian and non-mammalian MHC regions, and reconstructing the evolutionary history of this region has been difficult. The highly complex eutherian MHC is ordered along the chromosome as the Class I-III-II regions. The eutherian MHC is large and gene dense; for example, the human MHC contains 264 genes and pseudogenes over the 3.6 Mb region. In non-mammals, the MHC generally contains fewer genes than are found in mammals, and the Class I and II regions are adjacent. Teleost fish are the exception: their Class I and II regions are unlinked. Of the MHC regions completely sequenced, the chicken MHC is the least complex, containing only 19 genes over 92 kb [6].
In eutherians, the Class I region contains a set of framework genes whose presence and order are conserved among species; and between this framework, the Class I genes have expanded and diversified [7]. These Class I region framework genes have not been reported in the MHC of non-mammals. In eutherians, the Class II region contains the antigen processing genes (TAP1, TAP2, PSMB8, and PSMB9), which process endogenously synthesized peptides for presentation on Class I molecules. In non-mammals, however, the antigen processing genes are found in the Class I region, and their proximity is thought to have influenced Class I gene evolution [8].
Analysis of MHC structure in mammals distantly related to eutherians (marsupials and monotremes) would bridge a 200 million y gap between eutherians and non-mammalian vertebrates [9], and lead to a new understanding of the evolutionary forces that shaped the complex eutherian MHC. The availability of the opossum genome sequence provides the first opportunity to bridge this phylogenetic gap and provide insight into the evolutionary history of the mammalian MHC. Here we report that the opossum MHC region is similar to the eutherian MHC in both size and gene complexity; however, it also contains organizational features more like those found in non-mammals, revealing a likely ancestral organization in mammals. This analysis is the deepest comparison of the MHC region within mammals undertaken to date.

Results/Discussion
The 3.6 Mb MHC region on human Chromosome 6 contains 140 genes between flanking markers MOG and COL11A2 [10]. We found that the opossum MHC region bounded by the same flanking markers spans 3.95 Mb and contains 114 genes, recognized by homology to known genes from other species and/or the presence of open reading frames (ORFs). Eighty-seven of these genes are shared with the human MHC (Figure 1). A list of putative opossum MHC gene transcripts and our opossum MHC genome browser containing annotation are located at http://bioinf.wehi.edu.au/opossum. The opossum MHC is located on Chromosome 2q near the centromere, oriented with MOG proximal (Figure 2) [11]. Physical mapping of 19 bacterial artificial chromosome (BAC) clones, corresponding to loci spaced along the entire scaffold, confirmed the accuracy of the assembly.
The opossum MHC is similar in size and gene content to the MHC of eutherian mammals. However, the organization of the opossum Class I, II, and III regions is different from that of eutherians and shows more similarity to the organization seen in birds and amphibians. Ostensibly, the main difference between the opossum MHC and that of eutherians is the position of the Class I genes (Figure 3). The opossum MHC has (1) a Class I/II region that contains interspersed Class I and II genes, (2) a "framework region" that is composed of only the framework genes in the opossum, but which also includes Class I genes in eutherians, (3) a Class III region with gene content and order highly conserved with eutherians, and (4) two extended regions that flank the MHC, corresponding to the eutherian extended Class I and II regions, and containing a very similar gene content and order.
In the opossum, Class I and II regions are adjacent and interspersed rather than being separated by Class III (Figure 3). This unique arrangement of Class I and Class II loci has not been described in any other mammalian species. The proximity of Class I and II genes in opossum as well as non-mammalian vertebrates implies that Class I and II genes were originally located close together in the mammalian ancestor (Figure 4). This conclusion is further supported by the presence of Class I pseudogenes in the human Class II region, and the presence of both functional and non-functional Class I genes in the rodent Class II region (Figure 3) [1].
In humans and mice, the MHC contains a region referred to as the Class I framework region due to the presence of a set of non-Class I or II genes, amongst which the Class I loci are interspersed [7]. The content and gene order of these framework genes are conserved between mice and humans. Remarkably, the opossum MHC contains a homologous cluster of framework genes (including MOG, PPP1R11, TRIM26, TRIM39, GNL1, POU5F1, and BAT1) next to the Class III region opposite the Class I/II region. These genes are in the same order as in the eutherian Class I framework region, but they lack the interspersed Class I loci (Figures 1 and 4). This implies that a block of Class I framework genes was established near the MHC locus prior to the translocation of the Class I genes to this region in eutherians. Framework genes have not been reported in the MHC of non-mammals, but it is likely that the association of the framework region genes is ancient and that the framework region moved into the MHC en masse, given that five framework region genes appear on the same scaffold as Class III genes in Xenopus tropicalis (Ensembl scaffold_547) (unpublished data).
In Figure 4, we present a model to explain how MHC organization evolved from a simple ancestral form to the complex forms seen today in therian mammals. We propose that in the MHC of a therian ancestor of marsupials and eutherians, Class I and Class II loci were located together at one end of the region, along with the antigen processing genes. A similar hypothesis was suggested previously based on studies of MHC organization in non-mammals [12]. Adjacent to the Class I and II regions was a gene-rich Class III region that already contained most of the genes present in human, mouse, and opossum MHC. The framework region, devoid of most or all Class I genes, assembled on the opposite side of Class III. The extended regions are present in the opossum as well as in eutherians, and therefore must have been present in the ancestral form. Studies have identified extended region genes in close proximity to MHC genes in teleost fish, despite the overall non-linkage of Class I, II, and III genes in these species [3,13,14]. The eutherian MHC Class I-III-II structure exemplified by rodents and primates evolved relatively recently. Class I genes must have relocated across the Class III region and interspersed between the framework genes after the divergence of marsupials and eutherians, but prior to the divergence of primates and rodents (~60 million y ago) [15]. This process gave rise to the eutherian Class I region. It is unclear how or why the Class I genes relocated, but Class I loci appear to "migrate" in different species along with their frequent expansions and contractions. Specifically, Fukami-Kobayashi et al. [16] have suggested that long interspersed nuclear element (LINE) sequences can trigger genome fragment duplications, producing pairs of duplicated genome fragments. Perhaps a series of duplicated genome fragments inserted themselves between framework genes in ancient eutherian mammals and have since been evolving via expansions and contractions in their new location.
[Figure caption: The size and complexity of the opossum MHC is similar to that of eutherian mammals, but the organization resembles that seen in non-mammals. The Class I and Class II regions are adjacent and somewhat interspersed. The antigen processing genes are closely linked to the Class I genes.]
The opossum MHC is unique in that Class I and II genes are interspersed and closely linked to antigen processing genes. The Class I expansion has occurred within the Class II region. The opossum MHC Class I/II region contains 11 putative Class I and ten Class II genes (predicted coding sequences available at http://bioinf.wehi.edu.au/opossum). Class II loci include the non-classical DMA and DMB genes, whose homologs are found in birds and eutherians. Three marsupial-specific classical Class II gene families are present: DA, DB [17], and a newly discovered family that we have designated DC (Figures 1 and 5).
Of the 11 Class I loci in the opossum MHC, only one, UA1, is known to have all the characteristics expected of a classical Class Ia locus, being both ubiquitously expressed and highly polymorphic. UA1 transcripts have been detected in all tissues tested by RT-PCR and account for all previously described Class Ia cDNAs [18] (Figure 6A). The level of UA1 polymorphism is also comparable to that of human HLA-A [19,20]. Unlike the Class Ia gene of X. laevis, the opossum Class Ia gene, UA1, does not appear to have allelic lineages.
Two of the Class I loci (UA2 and UH) appear to be pseudogenes, because they lack a predicted ORF and have not been found expressed in any of the tissues examined (data not shown). Two other loci (UF and UL) have predicted ORFs, but their transcription has not been detected in any tissue so far and their functionality remains unknown. The expressed Class I loci in the opossum Class I/II region are highly diverse, sharing as little as 49% nucleotide identity, and at most 83%, over exons 2, 3, and 4 among loci. A phylogenetic analysis of the Class I loci, including Class Ia and Ib loci from other species, is shown in Figure 7. Despite the sequence divergence of opossum Class I loci, they are phylogenetically related and probably evolved from common ancestral loci. This observation raises some questions about one of the current theories explaining the general absence of non-classical Class I genes within the MHC of non-mammals. It has been suggested that proximity of Class I genes to the antigen processing genes has constrained their divergence [12]. In eutherians, loss of this tight association by movement of the Class I genes away from the antigen processing genes may have resulted in increased plasticity that led to fluctuations in gene number and function and allowed Class Ib genes to reside in the MHC [12]. However, in the opossum, antigen processing genes have not constrained the diversification of the adjacent classical and non-classical Class I genes.
[Figure 3 caption: The color code is the same as in Figure 1. Orthologous genes are indicated by connecting lines. Dashed lines and question marks represent unknown or uncertain data for a given gene or portion of the genome. The total number of genes is reported for each region or sub-region (pseudogenes are not included). Map is not drawn to scale. Unless specified, the small boxes represent single genes. This map was generated using data from [2,47]. The asterisk in the figure indicates that the duplication of DMB concerns only the mouse. DOI: 10.1371/journal.pbio.0040046.g003]
It is unclear what selective advantage, if any, might have been gained by the separation of Class I from Class II or antigen processing genes in eutherians. Close linkage has been implicated in co-evolution of Class I genes and antigen processing genes [19,21-23]. Perhaps the Class I genes in M. domestica have evolved to be less constrained by their proximity to the antigen processing machinery, allowing them to duplicate and diversify in close linkage to the TAP and PSMB genes. Alternatively, co-evolution with the antigen processing machinery may have severely restricted Class I evolution in this marsupial, perhaps resulting in only a single locus performing the classical role.
The Class I loci UB and UC were previously assumed to be linked to the MHC due to their high levels of sequence similarity to UA1 [18], but surprisingly they are not found on the scaffold containing the MHC region (Figure 1) and have been localized to the telomere of Chromosome 2p, distant from the MHC near the 2q centromere (Figure 2). Localization of Class I genes outside the MHC implies that these genes may have a non-classical role. In eutherians, the Class I loci lying outside the MHC are among the most divergent from Class Ia genes [2]. However, in non-mammals, genes closely related to Class Ia have been found outside the MHC, and in sharks, Class Ib genes found outside the MHC share very high levels of similarity to the Class Ia genes [8]. These non-mammalian genes have been designated Class Ib without elucidation of their functional roles, based on levels of expression and polymorphism. Currently, we do not have information about polymorphism levels of UB and UC, but their relatively low expression levels [24] may indicate evolution towards non-classical Class Ib functions. UB and UC are both flanked by marsupial-specific retroelements of the CORE-SINE type [24], which would be consistent with the role of such elements in Class I gene mobility [16] and may explain the recent relocation of UB and UC outside of the MHC [24]. The high level of sequence similarity of UA, UB, and UC raises the possibility that Class Ia genes can maintain their function when unlinked to the MHC.
Comparisons between MHC sequences of distantly related mammals highlight the conservation of the most important regulatory sequences, namely the SXY DNA motifs. Transcription of most MHC Class I and II genes is largely regulated by the Class II transactivator (CIITA), which interacts with several transcription factors, particularly those that bind to this motif [25,26]. Conservation of promoter elements in opossum Class I genes has been reported previously [24]. Using computational methods, we were able to identify SXY motifs upstream of most opossum MHC Class I and II genes. Eight SXY motifs were identified within 273 base pairs (bp) of the coding start in the opossum Class II genes (Table 1). Overall, these motifs were found to be conserved between eutherians and the opossum (Figure 8A). Eight SXY motifs were also identified upstream of opossum Class I genes (Table 1). We were not able to identify the SXY motif in genes UK, UF, and UM. Furthermore, the S motifs in the promoters of genes UH, UI, and UL appear to be weak with respect to the eutherian pattern. This suggests that the opossum Class I SXY regions have diverged from their corresponding eutherian motifs (particularly in the X motif; Figure 8B) more than the Class II SXY regions have. This is not unexpected given that Class II genes (classical and non-classical) are typically co-expressed whereas the non-classical Class I genes tend to evolve novel functions.
Perhaps most significantly, our data also suggest an ancient relationship between the MHC and the natural killer complex (NKC), which contains C-type lectin natural killer (NK) cell receptor loci [27]. This relationship is drawn from the presence of two genes within the opossum MHC, MIC and OSCAR. Opossum MIC is the most distant homolog to the polymorphic human Class I genes MICA and MICB found to date (Figure 7). The MIC genes are Class I-related genes that encode ligands for NKG2D, a C-type lectin NK receptor [28]. MIC genes are not found within the MHC of rodents. Instead, rodents have closely related genes, known as MILL. In a phylogenetic analysis, the opossum MIC is basal to a clade containing human MICA/B and the mouse MILL1/2 genes (Figure 7). The function of rodent MILL genes is not yet known [29], but our results support a common evolutionary origin of MIC and MILL in eutherians. The presence of MIC in the opossum MHC, and its apparent absence in non-mammals, implies that MIC-like genes appeared before marsupials and eutherians diverged, and uniquely evolved into MILL in rodents.
[Figure caption fragment: … [17]. Here we report a new Class IIB gene family, DCB. The DMB genes were used as the outgroup. DAB was not found in scaffold_42 and a cDNA sequence was used for the analysis. Physical mapping localizes DAB BACs to the centromeric region of 2q, and it appears this gene was not sequenced or was unable to be assembled.]
The osteoclast-associated receptor (OSCAR) was first discovered as a receptor on mouse osteoclasts [30], but it has recently been shown to participate in antigen uptake and processing for Class II molecules in dendritic cells [31]. OSCAR (also known as polymeric immunoglobulin receptor 3) is located within the leukocyte receptor complex (LRC) of humans, chimps, mice, and rats [32]. The presence of an OSCAR homolog within the opossum MHC is surprising. Using Genscan, we confirmed that the opossum OSCAR homolog contains an intact ORF and a predicted promoter. Human and opossum OSCAR share 47% identity at the amino acid level and are reciprocal best hits in BLAST searches (opossum OSCAR against human RefSeq: best hit NP_573399.1 OSCAR isoform 4, e-value = 7e-66). Further, the presence of OSCAR in the opossum MHC suggests that involvement in antigen processing may be its original function.
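The reciprocal best-hit criterion applied to OSCAR can be illustrated with a minimal Python sketch. It assumes each BLAST search has already been reduced to (query, subject, e-value) tuples; the function names and all identifiers and e-values other than the reported 7e-66 are hypothetical, and this is not the authors' pipeline.

```python
from typing import Dict, List, Tuple

# One BLAST hit: (query_id, subject_id, e_value).
Hit = Tuple[str, str, float]


def best_hits(hits: List[Hit]) -> Dict[str, str]:
    """Map each query to the subject with the lowest e-value."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue in hits:
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward: List[Hit], reverse: List[Hit]) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where b is a's best hit and a is b's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


# Toy data: only the 7e-66 value comes from the text; the rest is invented.
opossum_vs_human = [
    ("opossum_OSCAR", "human_OSCAR", 7e-66),
    ("opossum_OSCAR", "human_other_receptor", 1e-20),
]
human_vs_opossum = [
    ("human_OSCAR", "opossum_OSCAR", 1e-60),
    ("human_other_receptor", "opossum_unrelated", 1e-30),
]

pairs = reciprocal_best_hits(opossum_vs_human, human_vs_opossum)
print(pairs)
```

A pair is kept only when each sequence is the other's top-scoring hit, which is a stricter criterion than a one-directional best hit.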
MHC Class I molecules are ligands for NK cell receptors, so these two gene families must co-evolve. Keeping up with the rapid evolution of the MHC loci in response to pathogenic pressures is thought to have resulted in the independent evolution of two vertebrate NK receptor families, the C-type lectin and Ig superfamily types. In humans, the C-type lectin NK receptors are found on Chromosome 12 within the NKC [27]. The second NK receptor family contains the killer cell Ig-like receptors (KIR) and is encoded in the LRC on human Chromosome 19 [27]. The recent discovery of C-type lectin NK receptor genes in avian MHC [6] supports an ancestral association of the MHC and the C-type lectin genes of the eutherian NKC. Just as birds provide an ancestral link between the MHC and NKC [33], the two aforementioned opossum genes, OSCAR and MIC, provide links between the MHC and LRC. OSCAR is in the MHC in opossum (Figure 1) but in the LRC in humans and rodents. MIC genes are in the MHC of opossums and humans, whereas the related MILL1/2 genes are in the LRC of rodents [34]. These observations support the existence of an ancestral genomic region in amniotes that probably contained MHC Class I loci and NK cell receptor genes of both the KIR and C-type lectin forms. This organization would have allowed both classes of NK receptors to co-evolve with their MHC ligands.
Recently, CD1 genes were linked to the MHC of chickens, and may have been part of the primordial MHC [35,36]. Although a clear evolutionary relationship is evident between eutherian MHC Class I genes and CD1, CD1 is not located within the eutherian MHC. The marsupial homolog of CD1 has been identified in the opossum genome (M. L. Baker, S. D. Melman, and R. D. Miller, unpublished data). It is located on a separate scaffold (scaffold_13) from that containing the MHC and maps to Chromosome 2p (Figure 2). CD1, like the NK receptors, probably moved out of the MHC after the separation of mammals and birds but prior to the separation of eutherians and marsupials.
Comparative analyses of the MHC region in opossum and other species support the idea that at one time in vertebrate evolution there was a single "immune supercomplex" of genes that contained MHC Class I and II, antigen processing genes (TAP and PSMB), CD1, and C-type and Ig-type NK receptor genes [37]. This complex is no longer found in any living species analyzed so far, but clues of its existence remain in extant genomes.

Materials and Methods
Sequence analysis and annotation. All results presented in this paper are based on the MHC-containing scaffold from the preliminary assembly of the Monodelphis domestica genome, MonDom2, released by the Broad Institute. The contig N50 length is 111 kb and the scaffold N50 length is 54 Mb (J. Chang, personal communication). MonDom2 is an interim assembly of unordered scaffolds. A final M. domestica assembly with ordered scaffolds that are anchored to chromosomes is in preparation. Similarity features were identified by aligning all of the human proteins from the extended MHC that are represented in the RefSeq collection (Release 11, http://www.ncbi.nlm.nih.gov/RefSeq) against the opossum genome using TBLASTN [38] and extracting best hits. Known opossum MHC Class I and II genes were located by aligning their transcripts with the opossum genome assembly using BLASTN [38]. To exclude alignments with shared domains in other genes, a heuristic approach that identified the shortest chain of BLAST HSPs (highest-scoring segment pairs) having the best protein coverage was implemented. A single scaffold, scaffold_42, was found to contain most of the genes expected in the MHC. Known MHC transcripts from other marsupial species, including tammar wallaby (Macropus eugenii) and brushtail possum (Trichosurus vulpecula), were aligned with opossum scaffold_42 using TBLASTN and best hits were extracted. Gene predictions were made by running GENSCAN [39] on scaffold_42. To visualize these four feature annotation tracks, an MHC genome browser, based on GBROWSE [40], was set up. All features were clustered spatially (based on sequence position) using a custom PYTHON (http://www.python.org) script and these cluster features were hand curated. If a known opossum gene was present in a cluster, that gene replaced the cluster in the curated annotation. Class I cDNA sequences, obtained using 5′ and 3′ RACE, were aligned with scaffold_42 using BLAT [41]. Class III genes were annotated by extracting sequence from the cluster neighborhood and running GENOMESCAN [42] together with the orthologous human protein. Class II and framework region genes were annotated using a combination of GENSCAN, alignment of orthologous proteins from multiple species with the predicted protein, and hand curation of the putative gene. The annotation of the extended regions is based, in general, on similarity features only.
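The spatial clustering step described above could look roughly like the following Python sketch. The original custom script is not available, so the merging rule (features join a cluster when they overlap or lie within `max_gap` bases of it), the `Feature` type, the track labels, and the coordinates are all assumptions made for illustration.

```python
from typing import List, NamedTuple, Tuple


class Feature(NamedTuple):
    start: int   # scaffold coordinate where the feature begins
    end: int     # scaffold coordinate where the feature ends
    source: str  # annotation track label (hypothetical names)


def cluster_features(features: List[Feature], max_gap: int = 0) -> List[List[Feature]]:
    """Group features that overlap or lie within max_gap bases of a cluster."""
    clusters: List[List[Feature]] = []
    cluster_end = -1
    for feat in sorted(features, key=lambda f: (f.start, f.end)):
        if clusters and feat.start <= cluster_end + max_gap:
            # Feature touches the current cluster: extend it.
            clusters[-1].append(feat)
            cluster_end = max(cluster_end, feat.end)
        else:
            # Feature is too far away: start a new cluster.
            clusters.append([feat])
            cluster_end = feat.end
    return clusters


def cluster_span(cluster: List[Feature]) -> Tuple[int, int]:
    """Return the (start, end) interval covered by a cluster."""
    return (min(f.start for f in cluster), max(f.end for f in cluster))


# Invented coordinates on a single scaffold.
features = [
    Feature(5000, 6000, "genscan"),
    Feature(100, 500, "genscan"),
    Feature(450, 900, "tblastn_human"),
    Feature(5900, 6100, "blastn_opossum"),
]
clusters = cluster_features(features, max_gap=100)
spans = [cluster_span(c) for c in clusters]
print(spans)
```

Each resulting cluster gathers the evidence from the different tracks at one locus, which is the unit a curator would then inspect by hand.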
Class I expression analysis. Analysis of Class I gene expression in different tissues was done using reverse transcription PCR (RT-PCR). Total RNA was extracted using Trizol (Invitrogen) following the manufacturer's recommended protocols. Thymus, brain, and spleen RNA were extracted from tissues taken from a 9-wk-old male M. domestica. All other tissues were from an adult (1-y-old) female. RNA was treated with TURBO DNA-free (Ambion, Austin, Texas, United States) to remove contaminating DNA. RT-PCR using total RNA samples from tissues shown in Figure 6 was performed using the GeneAmp RNA PCR Core Kit with oligo-dT priming following the manufacturer's recommended protocols (Applied Biosystems, Foster City, California, United States). The primers used for each Class I locus were designed to amplify exons 2 through 4, with the exception of UA1, which amplifies exons 2 to 3, and UJ, which amplifies exons 3
Phylogenetic analysis. Sequence alignments were made by first aligning amino acid translations to establish gaps corresponding to codon position. The MHC Class II trees were constructed by neighbor joining with pairwise deletion, the Jones-Taylor-Thornton (JTT) matrix model, and 100 bootstrap replicates using MEGA 3. Species included are nurse shark (Gici), zebrafish (Brre), chicken (Gaga), echidna (Taac), platypus (Oran), red-necked wallaby (Maru), gray short-tailed opossum (Modo), brushtail possum (Trvu), rabbit (Orcu), cat (Feca), mouse (Mumu), mole rat (Naeh), and human (Hosa). Class IIB GenBank references are as in [17,45]. Class IIA GenBank accession numbers are listed below in Accession Numbers.
The MHC Class I tree was constructed using Maximum Parsimony with 1,000 bootstrap replicates using the MEGA 3 program (http://www.megasoftware.net). The overall tree topology was reproduced using the Neighbor Joining and Minimum Evolution methods. M. domestica Class I loci were named in the following manner: UA1 was identified as the locus encoding the previously identified Class Ia transcripts (e.g., Modo3 included in this tree [18]), and UA2 is a locus with high nucleotide identity (94% over exons 2, 3, and 4) to UA1; UA1 and UA2 are the only two Class I loci within the MHC similar enough to be considered two members of the same family. UB and UC were previously described and named [24], and are not in the MHC (Figure 2). UE through UM are individual loci sharing 49% to 83% nucleotide identity over exons 2, 3, and 4 in a pairwise comparison. Species abbreviations are as in Figure 5 with the addition of rhesus macaque (Mamu), cottontop tamarin (Saoe), pig (Susc), cow (Bota), and rat (Rano).
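Pairwise nucleotide identities such as the 49% to 83% range quoted above are typically computed from aligned sequences. The sketch below shows one common convention, which is an assumption here because the text does not state how gaps were handled: columns containing a gap in either sequence are skipped and identity is reported over the remaining columns. The sequences are invented and far shorter than exons 2 to 4.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns (assumed convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Invented aligned fragments: 9 ungapped columns, 8 of them identical.
locus_1 = "ATGGCC-TAA"
locus_2 = "ATGACCGTAA"
identity = percent_identity(locus_1, locus_2)
print(round(identity, 1))
```

Other conventions, such as counting gaps as mismatches, would give lower values for the same alignment, so the convention should be stated alongside any reported identity.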