Analysis of Immunoglobulin Transcripts in the Ostrich Struthio camelus, a Primitive Avian Species

Previous studies on the immunoglobulin (Ig) genes in avian species are limited (mainly to galliformes and anseriformes) but have revealed several interesting features, including the absence of the IgD and Igκ encoding genes, inversion of the IgA encoding gene and the use of gene conversion as the primary mechanism to generate an antibody repertoire. To better understand the Ig genes and their evolutionary development in birds, we analyzed the Ig genes in the ostrich (Struthio camelus), which is one of the most primitive birds. Similar to the chicken and duck, the ostrich expressed only three IgH chain isotypes (IgM, IgA and IgY) and λ light chains. The IgM and IgY constant domains are similar to their counterparts described in other vertebrates. Although conventional IgM, IgA and IgY cDNAs were identified in the ostrich, we also detected a transcript encoding a short membrane-bound form of IgA (lacking the last two CH exons) that was undetectable at the protein level. No IgD or κ encoding genes were identified. The presence of a single leader peptide in the expressed heavy chain and light chain V regions indicates that gene conversion also plays a major role in the generation of antibody diversity in the ostrich. Because the ostrich is one of the most primitive living aves, this study suggests that the distinct features of the bird Ig genes appeared very early during the divergence of the avian species and are thus shared by most, if not all, avian species.


Introduction
The adaptive immune system of jawed vertebrates is characterized by the production of immunoglobulins (Igs) in response to antigens [1]. The B cell antigen receptor Ig is a heterodimeric protein that is usually composed of two identical heavy (H) chains and two identical light (L) chains. A disulfide bond formed by cysteine residues between the CL and CH1 domains covalently joins the L chain to the H chain, and the two V domains associate non-covalently to form the antigen-binding site [2].
The Ig classes (in mammals, IgM, IgA, IgD, IgG and IgE) are defined by the isotypes of the heavy chain constant genes (μ, α, δ, γ and ε). Additional Ig isotypes have been identified in lower jawed vertebrates, including birds, reptiles, amphibians, bony fish and cartilaginous fish [3]. IgM is structurally conserved throughout evolution and is expressed in all jawed vertebrates. IgD is as ancient as IgM and has been described in elasmobranchs (in which it was previously known as IgW), bony fish, amphibians, reptiles and mammals [4,5,6,7,8]. IgD is, however, absent in birds and several mammals such as rabbits, opossum and elephants [9,10,11,12]. Compared with IgM, IgD shows a high degree of structural plasticity because of variance in the copy number and number of CH-encoding exons as well as alternative RNA splicing [13]. In addition to these ancient Ig classes, some additional distinct Ig classes have been found in different vertebrates, such as IgY in lower tetrapods [7,9,14], IgNAR in cartilaginous fish [15], IgT/IgZ in the trout and zebrafish [16,17], IgX and IgF in amphibians [6,18], and IgO in the platypus [19].
The L chains contribute considerably to combinatorial antibody diversity by their association with H chains [20]. It is known that cartilaginous fish, teleost fish and amphibians express three IgL isotypes: κ, λ and σ [21,22,23]. A fourth IgL isotype, σ-cart, is only found in sharks [24]. Evolutionarily, fewer types of Ig light chains are present in mammals and reptiles, which express only λ and κ. The two light-chain loci differ significantly in their genomic organization. At the λ locus, multiple Vλ segments are followed by Jλ-Cλ repeats. By contrast, the κ chain-encoding locus contains only a single Cκ gene with a small cluster of Jκ and multiple Vκ genes located upstream [25,26,27]. Surprisingly, birds exclusively express λ light chains [28,29]. The chicken and zebra finch IgL loci include only one functional IGVL gene and one IGJL gene, but multiple IGVL pseudogenes are located upstream of this functional IGVL gene [30,31]. Light chain diversity is generated by intrachromosomal gene conversion using the upstream pseudo-Vλ gene segments as donor sequences [32].
The avian species described to date express only three immunoglobulin classes: IgM, IgA and IgY, which are encoded by Cμ, Cα and Cυ, respectively [33,34], and no IgD-encoding gene has been identified. The Cυ and Cα genes in the chicken and duck IgH loci are positioned in reverse orientation [14,35], which raises questions regarding the mechanism of class switch recombination in birds and the evolution of the IGHC gene locus. IgY is a monomeric antibody of low molecular weight found in amphibians, reptiles, and birds and is thought to be the ancestor of mammalian IgG and IgE [36]. In addition to the full-length IgY, ducks can also generate a truncated IgY termed IgY(ΔFc), which is expressed by the alternative transcriptional termination of the single υ gene [37,38].
Birds represent an enormously diverse group of vertebrates comprising nearly 9000 species. Our knowledge of the avian Ig genes is currently restricted to a few galliform (chicken, turkey, pheasant and quail) and anseriform birds (duck) [34]. According to phylogenetic studies, these two groups of birds diverged only recently (approximately 100 million years ago [MYA]) [39]. The ostrich (Struthio camelus) belongs to the ratitae order, which represents the most primitive living aves, i.e., birds that diverged from other avian lineages approximately as early as 140 MYA [40]. In the present study, we analyzed the Ig genes in this species to investigate whether it expresses other Ig isotypes in addition to the IgM, IgA, IgY and λ light chains. Our objective was to provide additional clues to understand the evolution of Ig genes in birds.

IgH classes expressed in the ostrich
To analyze the IgH classes expressed in the ostrich, we generated two Ig-specific mini-libraries using the total RNA isolated from the spleen and intestine. In total, 234 clones derived from the spleen library were analyzed. Most clones (199) were found to contain IgA cDNA, whereas only 16 IgM- and 5 IgY-containing clones were identified. The remaining 14 clones were shown to contain non-Ig sequences. It is surprising that the IgA clones comprised such a large portion of the library. This finding is likely the result of PCR bias in the construction of the library. From the intestine library (327 clones analyzed in total), the only Ig clones identified were IgA clones (319 clones). These data suggest that the ostrich also expresses IgM, IgY and IgA, similar to chickens and ducks. The presence of these encoding genes in the ostrich genome was subsequently confirmed by Southern blotting using Cμ-, Cα- and Cυ-specific full-length probes (Fig. 1A).
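The clone counts reported above can be cross-checked with a few lines of arithmetic. The following is only an illustrative sketch: the function name `clone_fractions` and the dictionary labels are hypothetical, and the 8 intestine clones that were not IgA are not characterized in the text, so they are grouped under a placeholder label.

```python
def clone_fractions(counts):
    """Return (total, {label: percentage}) for a dict of clone counts."""
    total = sum(counts.values())
    return total, {label: 100.0 * n / total for label, n in counts.items()}

# Counts taken from the text for the spleen mini-library.
spleen = {"IgA": 199, "IgM": 16, "IgY": 5, "non-Ig": 14}

# Intestine library: 319 IgA clones out of 327 analyzed; the remainder is
# not described in the text, so the label below is a placeholder.
intestine = {"IgA": 319, "not IgA (unspecified)": 327 - 319}

spleen_total, spleen_pct = clone_fractions(spleen)
intestine_total, intestine_pct = clone_fractions(intestine)

for name, total, pct in (("spleen", spleen_total, spleen_pct),
                         ("intestine", intestine_total, intestine_pct)):
    print(name, total)
    for label, value in pct.items():
        print(f"  {label}: {value:.1f}%")
```

The spleen categories sum to the 234 clones stated in the text, and IgA accounts for roughly 85% of the spleen clones and roughly 98% of the intestine clones.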
To investigate whether the ostrich expresses IgD, we designed several pairs of degenerate primers based on the conserved Cδ regions of other species. However, we were unable to amplify any putative IgD sequence regardless of whether cDNA or genomic DNA was used.

Analysis of the ostrich Cμ gene
Analysis of the obtained IgM heavy chain constant-region cDNA clones revealed only a unique sequence, which suggests the expression of a single μ gene. However, four bands were detected when the μCH4 sequence (containing no Hind III site) was used as a probe in the Southern detection of Hind III-digested genomic DNA (Fig. 1B), which indicates the presence of more than one μ gene in the ostrich genome.
The obtained ostrich secretory IgM heavy chain constant-region cDNA encodes 447 amino acids, in which 12 cysteines are positionally conserved compared with Cμ in other species (Fig. S1). All of the aligned Cμ sequences (secreted form) exhibit an identical three-amino-acid motif (TCY) at their carboxy termini (Fig. S1), which is where the cysteine is assumed to bind the J chain to form polymeric IgM [41]. The entire ostrich IgM constant region contains four potential N-linked glycosylation sites (N-X-S/T): N-46, N-127, N-199 and N-434. Only N-46 and N-434 are conserved among reptiles, birds and mammals [42,43]. The N-127 site is conserved in birds and reptiles. The N-199 site is found exclusively in birds (Fig. S1). Alignment of the ostrich IgM constant region with those of other species demonstrated that the Cμ3 and Cμ4 domains are less divergent than the Cμ1 and Cμ2 domains (Fig. S1). The ostrich IgM constant region shares an overall identity of 53.1% and 63.1% with the chicken and duck Cμ, respectively, at the protein level. The identity of the ostrich IgM is supported by a phylogenetic analysis (Fig. 2).
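Potential N-linked glycosylation sites of the N-X-S/T type can be located by a simple pattern scan. The sketch below is an illustration rather than the procedure used in this study: the function name `find_sequons` and the toy sequence are hypothetical, and the scan applies the common convention that X may be any residue except proline, which is an assumption beyond the N-X-S/T notation given above.

```python
import re

def find_sequons(protein):
    """Return 1-based positions of Asn residues in N-X-S/T motifs.

    X is any residue except proline (assumed convention). A lookahead is
    used so that overlapping motifs such as 'NNSS' are all reported.
    """
    return [m.start() + 1 for m in re.finditer(r"N(?=[^P][ST])", protein.upper())]

# Hypothetical toy sequence, not the ostrich IgM constant region.
toy = "MANGSAANPTLLNATNNSS"
print(find_sequons(toy))  # [3, 13, 16, 17]
```

In the toy sequence, the Asn at position 8 is skipped because it is followed by proline. Applied to a real constant-region sequence, the reported positions would depend on the numbering scheme used for the domain.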
Northern blotting to detect IgM gene expression further showed that the ostrich μ gene was primarily expressed in the spleen and large intestine and only weakly expressed in the liver and small intestine (Fig. 3), although RT-PCR showed IgM transcripts to be present in all tissues examined (Fig. S2).

Analysis of the ostrich α gene
Southern blotting with either the full-length Cα or the Cα3 exon as a probe suggested that a single α gene was present in the ostrich genome (Fig. 1). IgA is the principal antibody class in mucosal secretions and acts as an important first line of defense [44]. It is usually highly expressed in mucosal tissues but only weakly expressed in the spleen. However, most clones in our spleen-derived Ig-specific mini-library were found to be IgA. This finding could be the result of a PCR bias during the process of 3′ RACE. Indeed, our RT-PCR and Northern blotting data showed that the ostrich IgA was primarily expressed in the large and small intestines (Fig. S2, Fig. 3).
When comparing the ostrich IgA heavy chain constant region with those of other species, 10 conserved cysteines were observed. There are three N-linked glycosylation sites in Cα2, Cα3 and the canonical secretory tail: N-165, N-221 and N-419, all of which are conserved in birds (Fig. S3). The ostrich Cα gene shares 44% sequence identity with chicken and 66% with duck Cα.
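Percent identity values of this kind are computed from a pairwise alignment. The sketch below shows one common way to do so and is not necessarily the calculation used here: the function name `percent_identity` and the toy sequences are hypothetical, and the choice to exclude gapped columns from the denominator is an assumption, since other conventions (for example, dividing by the full alignment length) give different values.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gapped columns are excluded (assumed convention)
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Hypothetical toy alignment: 6 ungapped columns, 5 of them identical.
seq_a = "ACDE-FGH"
seq_b = "ACDQYFG-"
print(f"{percent_identity(seq_a, seq_b):.1f}%")  # 83.3%
```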
When performing 3′ RACE PCR using the spleen RNA and JH-derived primers, we observed an 850-bp band in addition to the major 1.6-kb products (transcripts containing all four Cα domains). Sequencing of this band showed that it encoded a short, membrane-bound IgA lacking the last two Cα domains (i.e., VDJ-Cα1-Cα2-TM) (Fig. S4). To further confirm the presence of this short transcript, we used primers derived from Cα1 to perform IgA-specific 3′ RACE. In addition to the full-length 1.4-kb IgA transcript, we again detected the short IgA transcript, which contained only the first two Cα domains (Fig. 4A). To determine whether the short IgA is only expressed in the spleen, we then performed RT-PCR using primers derived from the Cα1 and TM regions. The short form was detected in multiple tissues (Fig. 4B). Northern blotting with the first two Cα exons as a probe showed the short form to be mainly expressed in the intestine, albeit at a much lower level than the full-length form (Fig. 4C). To confirm that the short IgA transcript was derived from alternative splicing, we amplified and sequenced the exon-intron boundaries of Cα2-intron-Cα3 and Cα4-intron-TM, which clearly demonstrated that the short form was derived from splicing of the Cα2 exon onto the TM exon.
The presence of the short IgA transmembrane transcript raises a question as to whether the ostrich is able to express a secreted IgA form lacking the last two Cα domains (i.e., IgA(ΔFc), similar to IgY(ΔFc) in ducks), although we did not observe such transcripts in the RACE experiments. We thoroughly analyzed the intron sequence between Cα2 and Cα3 and did not find any potential transcriptional termination signal or polyadenylation signal sequence (i.e., AATAAA). A polyclonal rabbit antiserum against the ostrich Cα1 and Cα2 domains was used in Western blotting. Only the intact form of IgA (approximately 65 kDa under reducing conditions) was detected in the intestine membrane and cytoplasmic proteins (Fig. 5A). No short form of IgA could be identified at the protein level, probably because of an extremely low level of expression. The IgA in secretions of the large intestine appeared to be dimeric (approximately 350 kDa under non-reducing conditions), as under reducing conditions, the molecular weight of the IgA heavy chain (without light chains) is approximately 65 kDa (Fig. 5B).

Analysis of the ostrich υ gene
The full-length IgY heavy chain constant region cDNA (secreted form) was obtained by screening the spleen Ig-specific mini-library. A phylogenetic analysis indicated that it was the ostrich υ gene (Fig. 2). Similar to ostrich μ, we only obtained a single IgY heavy chain constant-region cDNA, although Southern blotting indicated that more than one υ gene was present in the ostrich genome (Fig. 1). Alignment of the ostrich IgY heavy chain constant region with those of other species revealed two cysteines in Cυ1, which suggests that these molecules can associate with light chains. Seven additional cysteines were distributed in Cυ2-Cυ4, all of which are conserved across all species examined (Fig. S5). Cυ contains two N-linked glycosylation sites: N-166 in Cυ2, which is conserved in birds and lizards, and N-265 in Cυ3, which is conserved in Xenopus and humans (Fig. S5). A domain-by-domain comparison of the Cυ regions indicated that Cυ1 displayed the lowest amino acid identity in birds (Fig. S5).
The expression pattern of the ostrich IgY transcript was examined using RT-PCR and Northern blotting, which suggested that the υ gene was primarily expressed in the spleen and large intestine (Fig. S2, Fig. 3).

Analysis of rearranged VDJ fragments
To analyze the expressed VDJ sequences, 5′ RACE was performed using primers derived from the μ, α and υ chain constant regions. The inferred amino acid sequences were aligned and showed relatively low sequence diversity. Amino acid sequence variability in the VH region was mostly confined to the CDRs, particularly CDR3 [45]. We sequenced 83 cDNA fragments, which yielded 54 unique CDR3 sequences (Fig. S6). CDR3 length varied considerably, from 9 to 24 residues, with an average of 14.33 ± 2.18 codons, which is longer than the CDR3 of Xenopus (8.6 codons) and mice (8.7 codons) [46]. Analysis of the FR4 sequences suggests that there are two distinct JH gene segments in the ostrich: JH1 and JH2, which differ by seven nucleotides but have only one amino-acid substitution (Fig. S7). Among the obtained VH clones, more than 10 contained leader peptide-encoding sequences that were identical in sequence (MGPRLPGFVLLLLLLAALPGLRA). It is highly likely that only a single VH gene segment was available for VDJ recombination events in the ostrich.

Analysis of the ostrich light chain genes
To analyze the light chain genes in the ostrich, we designed several pairs of degenerate primers for the λ and κ genes based on the conserved Cλ and Cκ sequences of other species. These primers were used in PCR amplifications with the spleen cDNA as templates. We were only able to amplify the λ gene in the ostrich, as a phylogenetic analysis clearly showed that the identified gene belonged to the λ lineage (Fig. 6). We further performed 5′ RACE amplifications based on the Cλ sequence that we obtained. In total, 57 clones were sequenced and shown to encode the full-length Vλ domain and the same leader peptide (MAWAPLLLAVLAHGSGSLV). Overall, the inferred Vλ region amino acid homology among these clones ranged from 87.2 to 99.2%. The average length of the CDR3 was 9.81 ± 2.38 codons, with a range of 4 to 14 codons. The tetrapod IGVL sequences generally have a fairly well-conserved DEAD (Asp-Glu-Ala-Asp) motif in the FR3 region [47]. However, as in the chicken IGVL sequence (DEAV), the last Asp residue of this motif is substituted by a Val in the ostrich. An analysis of these Vλ sequences revealed only a single Jλ in the ostrich.
We subsequently performed 3′ RACE using leader peptide-specific primers, which identified a single Cλ in the ostrich. The Cλ sequence shows 67.0%, 55.1% and 64.5% amino acid sequence homology to the chicken, lizard and human Cλ1, respectively. A protein sequence alignment of Cλ in amphibians, reptiles, birds and mammals revealed an identical pattern with regard to the cysteine distribution (Fig. S8).

Discussion
In the present study, three Ig isotypes (IgM, IgA and IgY, but not IgD or Igκ) were identified in the ostrich, which is a primitive avian species belonging to the order struthioniformes. Although Southern blotting indicated that there was more than one copy of the μ and υ genes in the ostrich genome, we were only able to obtain a single expressed μ sequence and a single expressed υ sequence at the cDNA level. We amplified and sequenced the intron between Cμ1 and Cμ2 and obtained only a single sequence. We also performed PCR on genomic DNA using two pairs of primers derived from the conserved Cυ4 sequences. All sequenced clones were shown to contain the same sequence. The additional μ and υ genes detected by Southern blotting were likely pseudogenized and have diverged. We also performed Southern blotting using probes for the IgD and Igκ constant regions of crocodiles and could not detect any bands (data not shown). It is likely that, as in chickens and ducks, the δ and κ genes are both absent in the ostrich; however, a definite conclusion cannot currently be reached. All of the expressed ostrich heavy chain and light chain V regions harbored the same signal peptide, which indicates that there is only one functional VH and one functional Vλ involved in V(D)J recombination in the ostrich. It is reasonable to assume that the ostrich also uses gene conversion as a major mechanism for generating antibody diversity. The ostrich Ig genes essentially exhibit the same distinct features that have been previously observed in chickens and ducks. This similarity suggests that the typical bird Ig system was likely already present in the common ancestor of carinatae and ratitae bird species and has remained unchanged over a long period of evolution (Fig. 7).
Reptiles are the closest relatives of the aves, and they are believed to have diverged approximately 250 MYA [48]. Recent studies have shown that reptiles, such as lizards and turtles, express IgD and κ light chains [7,27,49]; this finding suggests that the evolutionary loss of these two genes must have occurred in birds after their divergence from reptiles (Fig. 7). Another interesting issue regarding IgA evolution also arises when considering findings present in both reptiles and birds. The IgA-encoding gene in ducks and chickens shows a transcriptional orientation opposite to that of IgM and IgY [9,14]. We also recently showed that the IgA-encoding gene was absent in lizards and some other reptiles ([7] and our unpublished data), which suggests that the IgA gene in the lineages leading to reptiles and birds has undergone some gene rearrangements that either deleted or inverted this gene. These germ-line DNA rearrangements in the IgH locus might also account for the evolutionary loss of the IgD gene in birds. A future investigation on the Ig genes in more primitive living birds or reptiles may help to clarify this issue.
When analyzing the ostrich IgA transcripts, we identified a shorter membrane-bound IgA-encoding form with the last two Cα exons removed, as the Cα2 exon was directly spliced onto the TM exon. However, this short form of IgA could not be detected at the protein level, which suggests limited to no functional significance. Indeed, this short form of the IgA transcript was present at a very low level even at the mRNA level, and its presence may simply be the result of accidental RNA splicing caused by non-critical mutations around the splice sites.
In summary, we characterized three Ig heavy chain classes (IgM, IgA and IgY) and the λ light chain in the ostrich in this study. This study enriches the current knowledge of ratitae Igs, provides support for the continuous evolution of immunoglobulins in birds and highlights the importance of comparative studies in understanding the evolutionary history of the immune system.

Animals, RNA and DNA isolation
Ostriches (Struthio camelus) were purchased from a local Beijing farm. The animals were treated in accordance with the guidelines of China Agricultural University regarding the protection of animals used for experimental and other scientific purposes. The study was approved by the ethics committee of China Agricultural University (ID number 20110302). The total RNA was extracted from different tissues using the TRNzol kit (TianGen Biotech), following the manufacturer's instructions. Genomic DNA was extracted from the liver following routine protocols.

Southern blotting
Fourteen micrograms of liver genomic DNA digested with EcoR I, EcoR V, BamH I, Xba I, Pst I, Hind III, Dra I and Pvu II were fractionated in 0.9% agarose and transferred to Hybond N+ nylon membranes. Cμ-, Cα-, and Cυ-specific full-length as well as single-exon probes were labeled using a PCR digoxigenin probe synthesis kit (Roche, Germany). The primers used to amplify the full-length Cμ and Cμ4 exon probes were Cms

PCR and Northern blotting detection of ostrich Ig gene expression in different tissues

Western blotting
Membrane and cytosol proteins derived from different ostrich tissues were prepared using extraction kits (Beyotime, Beijing). Large-intestine secretions were diluted with 3% PBS. An oligopeptide encoded by the ostrich IgA CH1-CH2 exons was synthesized, modified and coupled with KLH, and then intravenously injected into New Zealand rabbits. A polyclonal rabbit antiserum against the ostrich IgA CH1-CH2 domains was obtained and purified (Cwbiotech, Beijing). Samples were thermally denatured, separated by 12% SDS-PAGE and transferred to nitrocellulose membranes (Millipore, USA). The blot was blocked in Tris-buffered saline (TBS) containing 5% skim milk (w/v) for 1 h. Rabbit pAb and HRP-conjugated goat anti-rabbit IgG secondary antibodies (Cwbiotech, Beijing) were diluted using TBS + 5% milk to 1:600 and 1:5000, respectively. The membranes were washed six times in TBS + 0.05% Tween 20 (TBST) between each step, and all incubations were performed at room temperature for 1 h. The bands were detected by incubation with Pierce ECL Plus western blotting substrate (Thermo Fisher Scientific, USA) following the manufacturer's instructions.

Figure S1. Sequence alignment of the ostrich IgM CH region with that of other species. Dots are used to denote identical amino acids, and dashes are used to adjust the sequence alignment. Canonical cysteines are shaded and conserved N-linked glycosylation sites across species are in red. The alignment was performed using ClustalW with some manual adjustments.