Immunoglobulin Genomics in the Guinea Pig (Cavia porcellus)

In science, the guinea pig is known as one of the gold standards for modeling human disease. It is especially important as a molecular and cellular biology model for studying the human immune system, as its immunological genes are more similar to human genes than are those of mice. The utility of the guinea pig as a model organism can be further enhanced by characterizing the genes that encode components of the immune system. Here, we report the genomic organization of the guinea pig immunoglobulin (Ig) heavy and light chain genes. The guinea pig IgH locus is located in genomic scaffolds 54 and 75 and spans approximately 6,480 kb. Within the two scaffolds we identified 507 VH segments (94 potentially functional genes and 413 pseudogenes), 41 DH segments, six JH segments, four constant region genes (μ, γ, ε, and α), and one reverse δ remnant fragment. The many VH pseudogenes found in the guinea pig likely constituted a donor pool for gene conversion during evolution. The Igκ locus maps to a 4,029 kb region of scaffolds 37 and 24 and is composed of 349 Vκ (111 potentially functional genes and 238 pseudogenes), three Jκ and one Cκ genes. The Igλ locus spans 1,642 kb in scaffold 4 and consists of 142 Vλ (58 potentially functional genes and 84 pseudogenes) and 11 Jλ-Cλ clusters. Phylogenetic analysis suggested that the large set of guinea pig germline VH gene segments forms a limited number of gene families. Therefore, this species may generate antibody diversity via a gene conversion-like mechanism associated with its pseudogene reserves.


Introduction
The guinea pig (Cavia porcellus), also called the cavy, is a species of rodent belonging to the family Caviidae. This animal has been used in scientific experimentation since the 17th century. During the 19th and early 20th centuries, the guinea pig was a popular experimental animal for studying prevalent bacterial diseases such as tuberculosis and diphtheria [1], resulting in the epithet "guinea pig" being used to describe a test subject. Guinea pigs are still used in research, primarily as models for human diseases, including juvenile diabetes, tuberculosis, scurvy, and pregnancy complications [2,3,4,5,6,7,8].
Immunoglobulins (Igs) are only expressed by jawed vertebrates [9,10,11] and are usually composed of two identical heavy (H) chains and two identical light (L) chains. Exceptions include shark IgNAR and camelid IgGs, which are composed only of heavy chains [9,10,11]. All mammalian Ig genes characterized to date are organized in a "translocon" configuration [12]. In the heavy chain locus, multiple variable (VH), diversity (DH), and joining (JH) gene segments are followed by the μ, δ, γ, ε, and α genes [13]. In the kappa-encoding locus, multiple joining (Jκ) gene segments are present within a cluster, followed by a single constant (Cκ) gene, whereas in the lambda-encoding locus, joining (Jλ) and constant (Cλ) genes occur as Jλ-Cλ blocks, which usually have multiple copies [14].
The term "guinea pig" is synonymous with scientific experimentation, but little is known about the Ig genes of this species. The recently released guinea pig genome data provide an opportunity to study them. Our study aimed to characterize the guinea pig IgH and IgL loci, in an effort to promote a better understanding of the immune system and of the evolutionary divergence of the Ig genes in placental mammals.

Guinea Pig Genome Sequence
Guinea pig (C. porcellus) genome data were obtained from the Ensembl database (http://www.ensembl.org). Genome sequencing and assembly were conducted by the Broad Institute (cavPor3, 6.79× coverage, Jul 2008). The high coverage increased the accuracy of the genome analysis results.

Identification of the Guinea Pig Ig Genes
Guinea pig Ig constant region genes were retrieved by comparing guinea pig and human Ig gene sequences (http://genome.ucsc.edu/). FUZZNUC, an online tool (http://mobyle.pasteur.fr/cgi-bin/portal.py?#forms::fuzznuc), was used to find the adjacent recombination signal sequences (RSSs) in order to identify variable, diversity, and joining gene segments. Five or more mismatched bases were allowed in order to cover all genes.
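The RSS search described above can be illustrated with a minimal Python sketch. This is not FUZZNUC itself and not the exact search we ran; it is a simplified stand-in that scans one strand for a heptamer-spacer-nonamer arrangement. The consensus motifs (CACAGTG and ACAAAAACC), the 12 bp and 23 bp spacer lengths, the function names, and the default mismatch threshold are assumptions taken from the general V(D)J recombination literature for illustration only.

```python
# Illustrative RSS scan (assumed consensus motifs; not the FUZZNUC implementation).
HEPTAMER = "CACAGTG"    # assumed consensus heptamer
NONAMER = "ACAAAAACC"   # assumed consensus nonamer


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_rss(seq, spacer_lengths=(12, 23), max_mismatches=5):
    """Return (start, spacer, mismatch_count) for each heptamer-spacer-nonamer hit.

    Only the given strand is scanned; RSSs in the nonamer-spacer-heptamer
    orientation would require scanning the reverse complement as well.
    """
    seq = seq.upper()
    hits = []
    for spacer in spacer_lengths:
        total = len(HEPTAMER) + spacer + len(NONAMER)
        for i in range(len(seq) - total + 1):
            hept = seq[i:i + len(HEPTAMER)]
            nona = seq[i + len(HEPTAMER) + spacer:i + total]
            mm = mismatches(hept, HEPTAMER) + mismatches(nona, NONAMER)
            if mm <= max_mismatches:
                hits.append((i, spacer, mm))
    return hits


if __name__ == "__main__":
    example = "GG" + HEPTAMER + "T" * 12 + NONAMER + "GG"
    print(find_rss(example, max_mismatches=0))
```

A permissive mismatch threshold recovers degenerate RSSs at the cost of more false positives, so candidate hits still need to be checked against the adjacent coding sequence.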

Total Ribonucleic Acid (RNA) Extraction and 3′ RACE
Total RNA was extracted from the spleen tissues of three male Hartley guinea pigs using TRIzol Reagent (Invitrogen, USA) and pooled in equal amounts. Animals were treated in accordance with the China Agricultural University regulations on the protection of animals used for experimental and other scientific purposes. The study was approved by the Animal Welfare Committee of China Agricultural University (approval number XK257).
Complementary deoxyribonucleic acid (cDNA) was synthesized from 1 μg of total RNA using 3′ RACE reverse transcription primers. The first polymerase chain reaction (PCR) amplification of each Ig heavy chain constant region gene (μ, γ, ε, and α) was performed using the corresponding sense primer and 3′ RACE antisense primer 1, and the second PCR amplification was conducted using the corresponding sense primer and 3′ RACE antisense primer 2. All primers are listed in Table S1.

Southern Blotting Analysis of Genomic DNA Sample
Guinea pig liver genomic DNA was isolated as previously described [15]. Gene family-specific probes for the guinea pig VH1, VH2 and VH3 families were labeled using a PCR digoxigenin probe synthesis kit (Roche, Germany) with primers designed from VH1-95, VH2-17 and VH3-157 (primer sequences are listed in Table S1). Ten micrograms of genomic DNA digested separately with BamHI, EcoRI, HindIII and XbaI (New England Biolabs, USA) were loaded into each well of a 0.8% agarose gel, electrophoresed for 12 h, and then transferred to a positively charged nylon membrane (Roche, Germany). Hybridization and detection were conducted following the manufacturer's instructions.

Sequence and Phylogenetic Analysis
Multiple sequence alignments were prepared and edited with the MegAlign software program [16] and the Clustal W and Clustal X algorithms [17,18] before being analyzed using BioEdit [19]. Comparative phylogenetic trees were constructed from the final nucleotide alignment using PHYLIP 3.67 [20] and TreeView [21]. The neighbor-joining algorithm was used for phylogenetic analysis, with bootstrap support from 1000 replicates. Sequences from other species used in our phylogenetic analyses and sequence alignments are presented in Figures S1, S2, and S3 and Table S2.

Dot Matrix Analysis
Pairwise dot matrix comparisons were made using DNAMAN software (window size = 30 bp, mismatch limit = 9 bp) to identify regions of nucleotide similarity between the sequences.
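The principle of a windowed dot matrix comparison can be sketched in Python as follows. This is a naive illustration and not the DNAMAN implementation; it assumes that "mismatch limit" means the maximum number of mismatched positions tolerated within a window, and the function name and return format are hypothetical.

```python
def dot_matrix(seq_a, seq_b, window=30, mismatch_limit=9):
    """Return (i, j) pairs whose windows differ at no more than mismatch_limit positions.

    i and j are 0-based window start positions in seq_a and seq_b.
    The scan is O(len_a * len_b * window), which is adequate for
    kilobase-scale switch regions but not for whole scaffolds.
    """
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    dots = []
    for i in range(len(seq_a) - window + 1):
        win_a = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            win_b = seq_b[j:j + window]
            mism = sum(x != y for x, y in zip(win_a, win_b))
            if mism <= mismatch_limit:
                dots.append((i, j))
    return dots


if __name__ == "__main__":
    repeat = "GAGCT" * 8  # a short pentameric tandem repeat
    dots = dot_matrix(repeat, repeat, window=30, mismatch_limit=0)
    print(len(dots), dots[:5])
```

Plotting the returned coordinates yields the familiar diagonal lines; tandem repeats such as those in switch regions appear as a dense lattice of parallel diagonals.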

Definition of the VH/VL Gene Families
In mammals, germline VH and VL genes are categorized into families according to their amino acid or nucleotide sequence similarity [22]. Sequences with greater than 75% similarity are generally considered to belong to the same family, those with less than 70% similarity are placed in different gene families, and those with between 70% and 75% similarity are inspected on a case-by-case basis [23]. We placed potentially functional VH and VL gene segments sharing more than 70% similarity into the same family.
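The threshold rule above can be expressed as a short Python sketch. It assumes that the two sequences have already been aligned to equal length and measures similarity as simple percent identity over ungapped columns, which is a simplification; the function names and the returned labels are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def family_call(identity, same_above=75.0, different_below=70.0):
    """Apply the general rule: >75% same family, <70% different, otherwise inspect."""
    if identity > same_above:
        return "same family"
    if identity < different_below:
        return "different families"
    return "inspect case by case"


if __name__ == "__main__":
    ident = percent_identity("ACGTACGTAC", "ACGTACGTAA")
    print(ident, family_call(ident))
```

Setting same_above=70.0 collapses the intermediate zone and reproduces the single 70% cutoff that we applied to the potentially functional guinea pig segments.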

Guinea Pig IgH Locus
Analysis of the genomic sequence revealed that the guinea pig IgH locus is located within genomic scaffolds 54 and 75 (Figure 1, Figure 2). The entire IgH locus spans approximately 6,480 kb of the two scaffolds (4,302 kb in scaffold 54 and 2,178 kb in scaffold 75). The total length is an estimate because of sequence gaps (Figure 1, Figure 2). Six JH segments, 507 VH segments, 41 DH segments, four constant region genes (μ, γ, ε and α) and a reverse δ trace (marked with an arrow pointing left) were identified within the two scaffolds. Locations of the annotated IgH genes on the guinea pig genome are listed in Table S3.

Analysis of the Guinea Pig Constant Region Genes
To acquire the cDNA sequences of the four heavy chain isotypes (μ, γ, ε, and α), 3′ RACE was performed on splenic RNA using specific primers. Using this strategy we successfully identified the genes encoding the constant region exons for IgM, IgG, IgE, and IgA (Figure S4). As predicted from the analysis of the guinea pig genomic Ig genes, the domain structure of the four isotypes was typical of that in other mammalian species (Figure S5).
Within the guinea pig genome, CH3 and parts of the CH4 segments of the μ gene were found to be missing because of gaps (Figure 1). Using the 3′ RACE method, the complete secreted IgM, including four exons, was successfully cloned (Figures S4 and S5).
Most mammals also express the δ gene, with the exception of the rabbit [24] and opossum [25]. The area predicted to contain an IgD C region, in particular the region 3′ of the IgM exons between IgM and IgG, as well as the whole cavPor3 assembly, was thoroughly searched in both orientations for coding sequences that might correspond to a putative IgD. These searches detected δ trace fragments that are homologous to mammalian IgD (Figure 3) and have the opposite transcriptional direction to the upstream μ gene. Based on sequence alignment, the three-fragment δ trace was found to correspond to the CH2 and CH3 domains (Figure 3).
Although two IgG isotypes (IgG1 and IgG2) were previously identified in domestic guinea pig serum [26], we identified only one IgG gene in the guinea pig genome, perhaps because of sequence gaps. Our sequence alignment further showed that the IgG gene recognized in the guinea pig genome shared the highest similarity with IgG1 (six amino acid differences) (Figure S6). Interestingly, we were able to clone only the IgG2 mRNA transcript by 3′ RACE. To address this discrepancy, we performed a Southern blotting experiment using the CH1 exon (which is highly similar between IgG1 and IgG2) as a probe, which showed that there is more than one γ gene in the guinea pig genome (Figure S7). Taken together, these data suggest that the guinea pig has two γ genes in its genome, one of which is preferentially expressed.
A phylogenetic tree constructed with the CH2 and CH3 exons of IgG from different mammalian species revealed that the guinea pig IgG genes clustered with rodent IgG genes (Figure 4). IgG2 and IgA also exhibit a hinge region, which is thought to have evolved by condensation of the CH2 exon in an ancestral isotype such as IgY in birds [25,27]. The hinge region of IgA is encoded by the 5′ end of the CH2 exon, as observed in other eutherian mammals [28,29,30] (Figures S4 and S5).
In mammals, especially in humans and mice, a pentameric tandem repeat sequence is found upstream of each heavy chain constant region gene and acts as a switch (S) region. S regions have previously been mapped and sequenced, and are involved in Ig class switch recombination. Such characteristic tandem repeats were also found upstream of the guinea pig Cμ, Cε and Cα genes. The three putative S regions span 2.2 kb to 3 kb and contain repeats (GAGCT and GGGCT) similar to those observed in humans and mice. However, the characteristic switch region sequence could not be identified for the γ gene, most likely because of sequence gaps. Dot matrix analysis of the guinea pig S regions revealed substantial nucleotide similarity with those of humans and mice (Figure 5).

Analysis of the Guinea Pig VH Gene Segments
A total of 507 VH segments were identified in scaffolds 54 and 75 (Figure 1, Figure 2). Ninety-four of these appeared to be potentially functional because they contained a leader exon (L), an uninterrupted open reading frame (ORF), a downstream RSS, and a V gene domain (framework regions and complementarity determining regions). The remaining 413 segments, which either contain in-frame stop codons or are partial sequences, were designated as pseudogenes (Figure 1, Figure 2 and Table S3). Given the gaps in the assembly, it is possible that as yet unidentified VH genes are also present in the guinea pig genome.
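The classification criteria above can be summarized in a minimal Python sketch. It assumes that the presence of a leader exon and an RSS, and whether the segment is partial, have already been determined elsewhere and are supplied as flags; only the in-frame stop codon check is computed. The function names and the binary outcome are illustrative assumptions, not the annotation pipeline used in this study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_inframe_stop(coding_seq, frame=0):
    """Return True if any complete codon in the given frame is a stop codon."""
    seq = coding_seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return any(codon in STOP_CODONS for codon in codons)


def classify_v_segment(coding_seq, has_leader, has_rss, is_partial=False):
    """Label a V segment as 'potentially functional' or 'pseudogene'.

    A segment is potentially functional only if it is complete, has a
    leader exon and a downstream RSS, and has no in-frame stop codon.
    """
    if is_partial or not has_leader or not has_rss:
        return "pseudogene"
    if has_inframe_stop(coding_seq):
        return "pseudogene"
    return "potentially functional"


if __name__ == "__main__":
    print(classify_v_segment("ATGGCCTGGGCC", has_leader=True, has_rss=True))
    print(classify_v_segment("ATGGCCTGAGCC", has_leader=True, has_rss=True))
```

Frameshifts relative to a reference V domain are not detected by this check and would need an alignment-based test.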
Phylogenetic analysis and multiple sequence alignments of all potentially functional guinea pig germline VH gene segments revealed three VH gene families (1, 2 and 3), comprising 22, 17 and 55 members, respectively (Figure 6, Figure 7). The largest family, VH3, could be further divided into two subfamilies (Figure 6, Figure 7). We also performed Southern blotting to verify that multiple VH genes of the different families are present in the guinea pig genome (Figure 8).
We used all potentially functional guinea pig VH sequences, together with VH sequences representing previously reported gene families from other mammalian species, to construct a neighbor-joining phylogenetic tree [31,32] (Figure 9). The phylogenetic tree indicated that the mammalian VH gene families fall into three clans [33]. The guinea pig VH families 1 and 2 belonged to clan II, and family 3 belonged to clan III.

Analysis of the Guinea Pig DH and JH Gene Segments
Approximately 504 kb downstream from the last VH segment (VH1-1), we identified 41 DH segments (DH1-DH41), which spanned a 660 kb region of DNA in scaffold 54 (Figure 1). Each DH segment was flanked by conserved RSS elements composed of heptamers and nonamers separated by 12 bp spacers, and was open in at least one of the alternative reading frames (Figure 10 and Figure S8), suggesting that the DH segments are potentially functional. The JH region contained six genes (designated JH1 to JH6) spanning approximately 2 kb. Each JH gene had an upstream RSS element with a 22-23 bp spacer, an ORF, and a downstream RNA donor splice site at the 3′ end, suggesting that the JH genes are potentially functional (Figure 11).

Guinea Pig Igκ Locus
The guinea pig Igκ locus is located in scaffolds 37 and 24 and spans a genomic region of approximately 4,029 kb (Figure 12, Figure 13 and Table S4). A total of 349 Vκ genes were identified. Further analysis revealed that 111 Vκ genes are potentially functional, given that they contain an L sequence, ORF, RSS and V domain. The remaining 238 segments contain either in-frame stop codons or frameshifts and were thus designated as pseudogenes. Based on sequence analysis, the 111 potentially functional Vκ genes were divided into six families (Vκ1-Vκ6), which contained 69, 15, 7, 17, 2, and 1 member(s), respectively (Figure 14). In addition, 222 Vκ genes were arranged in the same transcriptional orientation as the downstream Jκ and Cκ genes, and 61 Vκ segments had the reverse transcriptional orientation in scaffold 37. Downstream of the Vκ genes, three Jκ gene segments spanning 0.6 kb were identified. Furthermore, a single Cκ gene was identified approximately 4 kb downstream from the last Jκ.

Guinea Pig Igλ Locus
Guinea pig Igλ chain genes were identified in scaffold 4 and spanned approximately 1,642 kb (Figure 15, Table S5). Of the 142 germline Vλ genes identified, 58 segments were categorized as potentially functional genes, and the remaining 84 were classified as pseudogenes. Based on sequence similarity analysis, the potentially functional guinea pig Vλ genes were assigned to twelve families (Figure 16), comprising 3, 1, 13, 6, 6, 8, 1, 3, 2, 5, 3 and 7 member(s), respectively. In contrast to the Vκ genes, all the Vλ genes have the same transcriptional orientation as the downstream Jλ and Cλ regions. At the 3′ end of this scaffold, 11 Jλ segments and Cλ segments were organized in tandem and spanned 44 kb, although Cλ2 was structurally incomplete owing to the presence of gaps. The sequence alignments of the Jλ and Cλ gene segments are shown in Figure 17.

Discussion
Rodents are a ubiquitous group found worldwide, representing nearly half of all mammalian species, and evolved from a common ancestor shared with the lagomorphs approximately 62-100 million years ago [34,35]. The classification of the guinea pig within the family Caviidae and genus Cavia is somewhat controversial because the origin of this rodent is poorly known, with current classification data relying mostly on fossils or genetic relationships [36,37,38,39,40,41,42,43,44]. We therefore analyzed the Ig gene sequences of the guinea pig, not only to better understand the immune system of this species, but also to provide data for comparative studies of mammalian Ig genes.
The IgH locus of mammals is arranged in a "translocon" configuration [32,45,46,47,48,49,50,51,52]. In the present study, we characterized the guinea pig Ig genes based on recently released genomic data and our experimental results. The guinea pig IgH locus, in a VH(507)-DH(41)-JH(6)-Cμ-ψδ-Cγ2-Cε-Cα configuration spanning at least 6,480 kb in two scaffolds, may be the largest among the mammalian species studied so far.
On the basis of sequence analysis, we identified a single μ, γ, ε and α gene within the guinea pig genome. We also found three fragments of a δ gene in the opposite orientation downstream from the μ gene. Because of sequence gaps, it is not certain whether an additional functional δ gene also exists in this species. We also tried to confirm the sequence and orientation of the δ gene fragments by genomic PCR in order to rule out assembly error. The sequences of the δ gene fragments were verified. However, because two genomic fragments (approximately 2 kb and 6 kb) between the μ and δ genes could not be successfully amplified, the orientation of the δ gene fragments remains an open question.
Many placental mammals, such as human, cow, sheep, horse and dog, have a single functional δ gene. In the platypus IgH locus, a reverse δ pseudogene was previously observed in addition to a functional δ gene that consists of ten CH domains [53], while in the elephant genome, only one Cδ3 remnant fragment was identified [51]. In camelids, the Cδ3 exon appears to be highly mutated [54]. In the rabbit [24] and opossum (a marsupial) [25], the δ gene has clearly been shown to be missing from the genome, indicating that the δ gene might not be as essential as the μ gene in humoral immunity.
IgG is an important antibody molecule, which is believed to have initially evolved 600 million years ago [55]. The γ gene usually encodes three CH domains and one hinge domain. Different numbers of IgG subclasses have been reported in mammalian species, ranging from one in the rabbit [56], two in sheep [57], three in cattle [58], four in human and rat [49,59], five in mouse [60], six in pig [61] and seven in horse [62], up to nine in elephant [51]. In the guinea pig, two IgG subclasses have been identified, and they share about 73% amino acid similarity in the CH domains. It has been postulated that the different γ subclasses are derived from duplications of an ancestral γ gene in mammals [63]. In mouse and rat, multiple γ genes evolved from three ancestral γ genes by gene duplication [64,65], whereas the divergence of the human γ genes depended on one ancestral γ gene and duplicated Cγ-Cγ-Cε-Cα fragments [66].
In different mammalian species, the number of VH genes and the ratio of functional VH genes to VH pseudogenes vary significantly, even between closely related species such as mice and rats. For example, there are 60 pseudogenes and 44 functional VH genes in humans, whereas in cows, only 6 pseudogenes and 11 functional genes have been identified [49,67,68]. The guinea pig germline VH repertoire contains at least 507 VH gene segments, which is the largest number in mammals studied to date. This large number of germline VH genes may contribute greatly to guinea pig antibody diversity. Another notable feature is the number of guinea pig VH pseudogenes, which amount to 81% of the total VH genes (413 pseudogenes vs. 507 total genes). The high frequency of gene duplication in the variable region generated multiple VH copies, many of which became pseudogenes due to genomic drift [68]. In some species, VH pseudogenes are not truly nonfunctional; in rabbits, for example, pseudogenes are commonly used to generate immunoglobulin diversity by gene conversion [67].
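The pseudogene proportion quoted above, and the corresponding proportions for the two light chain loci, follow directly from the segment counts reported in the Results. The short Python sketch below recomputes them; the dictionary layout and function names are illustrative only, and the light chain percentages are derived here from the reported counts rather than stated elsewhere in the text.

```python
# Segment counts as reported for each V locus in the Results.
LOCI = {
    "VH": {"total": 507, "pseudogenes": 413},
    "Vkappa": {"total": 349, "pseudogenes": 238},
    "Vlambda": {"total": 142, "pseudogenes": 84},
}


def pseudogene_fraction(total, pseudogenes):
    """Fraction of V segments at a locus that are pseudogenes."""
    return pseudogenes / total


def summarize(loci=LOCI):
    """Return the pseudogene percentage per locus, rounded to one decimal place."""
    return {
        name: round(100 * pseudogene_fraction(**counts), 1)
        for name, counts in loci.items()
    }


if __name__ == "__main__":
    for locus, pct in summarize().items():
        print(f"{locus}: {pct}% pseudogenes")
```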
Guinea pig VH gene segments were divided into three gene families (families 1, 2 and 3), which were orthologous to human VH families 4, 2 and 3, respectively. The guinea pig thus has multiple VH gene families, as observed in dogs (three families) [69], humans (seven families) [70,71], horses (seven families) [32,72] and mice (sixteen families) [73], in contrast to the single family observed in sheep [57], rabbits [74], camels [75], swine [76] and cattle [72]. The guinea pig VH genes were also classified into different clans, as in other mammalian species [50,52,67,72,74,77,78,79]. Some reports have revealed that the human, mouse, cat and dog have representative VH genes from all three VH clans [67,77]. This is not the case in cattle, horses and sheep [50,52,72,78,79], which have lost much of their ancestral repertoire and whose VH genes belong only to clan II. Pigs and rabbits have only clan III genes [74,77,79]. The VH genes of the guinea pig are distributed in clans II and III, and the majority of VH genes (family 3) are most closely related to human VH3 (clan III), which has been proposed as the ancestral VH gene family [78,80]. These features are also found in swine and rabbits [74,76].
The precise evolutionary relationship among mammalian lineages has not yet been resolved [81]. Relevant studies estimate that marsupials and monotremes separated from the common ancestor of present-day placental mammals more than 130 million years ago, while the major radiation of the placental mammals occurred approximately 70-120 million years ago [82,83,84,85,86]. Certain V gene families that descended from common ancestral genes are orthologous between nonplacental and placental mammals. For example, the platypus (a monotreme), dog and human share the same ancestral gene families (VH3 and VH4), while the American short-tailed opossum (Monodelphis domestica), swine, rabbit and human share the ancestral VH3 family, and artiodactyl species share the ancestral VH4 family. With an older evolutionary origin among present-day mammals [86], the platypus has two ancestral VH gene families, while other mammals retain one or two ancestral VH gene families. This could be explained by inactivation or loss of VH gene members in these species during evolution [25]. For the new VH gene families in human or mouse, the divergence of VH genes probably occurred after speciation.
The ratio of functional Vκ to Vλ genes is variable within mammalian species, with the germline Vκ genes being more abundant than Vλ genes in humans (40 Vκ genes vs. 30 Vλ genes) and mice (Vκ genes vs. Vλ genes, over 95%) [87]. This is also the case for the guinea pig, in which Vκ germline genes are more numerous than Vλ genes (111 potentially functional Vκ genes vs. 58 potentially functional Vλ genes). It has been proposed that the preferential use of the light-chain isotypes at the protein level may be connected with the overall number of V gene segments [87]. It is therefore possible that the κ chain predominates over the λ chain at the protein level in guinea pigs. In addition, multiple pseudogenes exist in the guinea pig VH (413), Vκ (238), and Vλ (84) loci, which may contribute to Ig diversity in guinea pigs, as in other species [47,88,89].

Figure 17. The alignment of amino acid sequences of J and C genes from guinea pig IgL chains. A, Alignment of the deduced amino acid sequences of the three guinea pig Jκ gene segments. B, Alignment of the amino acid sequences of the Cκ proteins from human, mouse and rabbit. C, Alignment of the deduced amino acid sequences of the eleven guinea pig Jλ gene segments. D, Alignment of the amino acid sequences of the Cλ proteins from human, mouse and rabbit. Amino acid residues that are identical to the top counterpart in every panel are shown as dots; gaps and missing data are indicated by hyphens. doi:10.1371/journal.pone.0039298.g017
In conclusion, we have reported the characterization and annotation of genomic maps of the guinea pig Ig loci for the first time. This information, together with the characterization of the guinea pig Ig germline gene repertoire that is currently being undertaken, should lay the foundations for further studies into the differentiation and structure of mammalian Ig genes, including those of guinea pigs.