A Comprehensive Analysis of the Phylogeny, Genomic Organization and Expression of Immunoglobulin Light Chain Genes in Alligator sinensis, an Endangered Reptile Species

Crocodilians are evolutionarily distinct reptiles that are distantly related to lizards and are thought to be the closest living relatives of birds. Compared with birds and mammals, few studies have investigated the Ig light chain of crocodilians. Here, using an Alligator sinensis genomic bacterial artificial chromosome (BAC) library and available genome data, we characterized the genomic organization of the Alligator sinensis IgL gene loci. Like Anolis carolinensis, Alligator sinensis has two IgL isotypes, λ and κ. The Igλ locus contains 6 Cλ genes, each preceded by a Jλ gene, and 86 potentially functional Vλ genes upstream of (Jλ-Cλ)n. The Igκ locus contains a single Cκ gene, 6 Jκs and 62 potentially functional Vκs. All VL genes fall into a total of 31 families: 19 Vλ families and 12 Vκ families. An analysis of the chromosomal location of the light chain genes in mammals, birds, lizards and frogs further confirms that Alligator sinensis has two IgL isotypes, Igλ and Igκ. Analysis of the cloned Igλ and Igκ cDNA revealed biased usage of V families in the expressed Vλ and Vκ repertoires. The junctions of the recombined VJ contained N and P nucleotides in both expressed λ and κ sequences. Phylogenetic analysis of the V genes revealed V families shared by mammals, birds, reptiles and Xenopus, suggesting that these conserved V families are orthologous and have been retained during IgL evolution. Our data suggest that the Alligator sinensis IgL gene repertoire is highly diverse and complex, and they provide insight into immunoglobulin gene evolution in vertebrates.


Introduction
Immunoglobulin (Ig) is one of the most important primary effector molecules of the adaptive immune system in jawed vertebrates [1]. Each immunoglobulin is composed of heavy (H) chains paired with light (L) chains, which in mammals are of one of two types: λ or κ. Each L chain is typically linked covalently to an H chain by a disulfide bond between positionally conserved cysteine residues [2]. Exceptions include shark IgNAR and certain camelid IgGs, which are composed only of heavy chains [3,4]. The Ig light chain is encoded by the λ and κ loci, which differ significantly in their genomic organization. At the λ locus, multiple Vλ segments are followed by repeated Jλ-Cλ units. In contrast, at the κ locus a cluster of Vκ gene segments is followed by a cluster of Jκ gene segments and then by a single Cκ gene [5-7]. Lymphocytes generate specific immunoglobulins against diverse antigens through a somatic recombination process known as V(D)J recombination [8-10]. A pair of recombina[...] conducted to investigate Ig gene isotypes and their genomic organization in reptiles. To date, IgM-, IgD- and IgY-encoding genes have been identified in all Squamata species studied [45-47]. Anolis carolinensis has been shown to express two types of light chains, λ and κ [7,39,48], whereas snakes lack the Igκ isotype [45]. In the Testudines, IgM-, IgD-, IgY- and IgD2-encoding genes have been described, and two immunoglobulin domains of IgD2 are homologous to bird IgA domains, suggesting that they may originate from a common ancestral gene [49-51]. Crocodilians appeared during the Middle Triassic, approximately 240 million years ago (MYA). Although they resemble lizards in appearance, crocodilians are only distantly related to them and are thought to be the closest living relatives of birds; they therefore occupy an important position in evolution [52,53].
Phylogenetically, crocodilians link the other reptiles and birds, and analysis of their Ig genes may provide important clues to Ig evolution. In addition, although they live in unsanitary conditions, crocodilians rarely suffer bacterial or viral infections, which has been attributed to their strong immune systems [54,55]. Nevertheless, few studies have examined the crocodilian immune system. Recently, crocodilian IgH genes were identified; the results showed that there are multiple μ genes and that IgM subclasses can be expressed through class-switch recombination. The crocodilian α genes are the first IgA-encoding genes identified in reptiles, and their organization suggests that reptiles and birds share a common ancestral organization [56,57].
Crocodilians are the closest phylogenetic group to birds, and both belong to the group known as archosaurs. However, little is known about the crocodilian IgL locus. Although a previous study suggested that two distinct light chain types are present in alligators [48], the isotypes and the genomic organization of their encoding genes remain unknown [39]. In this study, we present the phylogeny, genomic organization and expression of the Alligator sinensis Igλ and Igκ genes, providing insight into the crocodilian immune system and the evolution of immunoglobulins in vertebrates.

Materials and Methods
Sample collection, DNA and RNA extraction
Blood samples of Alligator sinensis were collected at the Beijing Zoo. Genomic DNA was extracted from blood following a standard protocol. Total RNA was extracted from blood using a TRIzol kit (TIANGEN BIOTECH, Beijing) according to the manufacturer's instructions. The study was approved by the Animal Care and Use Committee of China Agricultural University.

BAC library
An Alligator sinensis genomic BAC library was constructed using a service provided by Bioestablish Biotechnology Co., Ltd. (Beijing, China) and was stored in our laboratory [56].

BAC screening and sequencing
Degenerate primers for the Igκ and Igλ genes were designed based on sequences from Gallus gallus and other related species. The identities of the PCR products were confirmed by BLAST searches against NCBI GenBank, and specific primers for the Igκ and Igλ genes were then designed from the determined sequences (S1 Table). BAC clones containing Igκ or Igλ genes were screened from the BAC library by PCR. Positive BAC clones were sequenced from both ends, and the end sequences were used to design primers for identifying overlapping BAC clones and extending the sequence in the next round of screening (S1 Table).
The positive BAC clones were then shotgun-sequenced on a next-generation sequencing platform and assembled by BGI (Beijing, China).
All PCR amplifications were performed with the proofreading Pyrobest DNA polymerase (TaKaRa, Dalian). PCR products were cloned into the pMD19-T vector (TaKaRa, Dalian) and sequenced.

Southern blotting
Genomic DNA was digested with restriction endonucleases, separated on a 0.9% agarose gel by electrophoresis for 6 h, and transferred to a positively charged nylon membrane (Roche, Germany) for hybridization. BglII, NcoI, HindIII and SphI were used to digest genomic DNA for Igλ analysis, and KpnI, NdeI and XbaI for Igκ analysis. Probes corresponding to the single Cλ and Cκ exons were labeled using a PCR digoxigenin probe synthesis kit (Roche, Germany). The primers used to amplify the Cλ and Cκ exon probes were as follows: LC-F, 5'-ACA GCC AAA GGC CTC TCC T-3'; LC-R, 5'-CGA TCT CTT CAG GGT CTT CTC-3'; KC-F, 5'-AAA GGG GGA AGA GCC ACC-3'; KC-R, 5'-TAC ACT CGG TCC TCT TGA-3'. Hybridization and detection were performed according to the manufacturer's instructions.

Sequence computations
DNA and protein sequence editing, alignments and comparisons were performed with MegAlign (DNASTAR). The Alligator sinensis whole-genome shotgun assembly in the Ensembl database (http://www.ensembl.org/index.html) was used to retrieve the genomic contigs containing the Alligator sinensis Igλ and Igκ sequences. IgBLAST (http://www.ncbi.nlm.nih.gov/igblast/) was used to predict Vλ and Vκ segments. Germline Vλ and Vκ gene segments were grouped into families using the IMGT numbering system [62]. The recombination signal sequences (RSSs) of the V and J gene segments were analyzed using the online program FUZZNUC (http://embossgui.sourceforge.net/demo/fuzznuc.html).
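The RSS search described above can be approximated in a few lines. The sketch below is not the FUZZNUC program itself; it is a minimal Python stand-in that assumes the mammalian consensus heptamer (CACAGTG) and nonamer (ACAAAAACC) and uses illustrative mismatch limits (at most 1 in the heptamer, 2 in the nonamer) that were not taken from this study. It reports heptamer-spacer-nonamer hits with 12- or 23-bp spacers on both strands.

```python
"""Minimal RSS scan in the spirit of a fuzznuc pattern search (illustrative only)."""

HEPTAMER = "CACAGTG"    # assumed mammalian consensus heptamer
NONAMER = "ACAAAAACC"   # assumed mammalian consensus nonamer
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N)."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_rss(seq: str, spacers=(12, 23), max_hept_mm=1, max_non_mm=2):
    """Return (strand, start, spacer, heptamer_mismatches, nonamer_mismatches).

    A hit is heptamer + spacer + nonamer read 5'->3' on the scanned strand.
    V-type RSSs read this way on the coding strand; J-type RSSs
    (nonamer-spacer-heptamer on the coding strand) are found on '-',
    where 'start' is counted along the reverse complement.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for start in range(len(s) - len(HEPTAMER) + 1):
            h_mm = hamming(s[start:start + len(HEPTAMER)], HEPTAMER)
            if h_mm > max_hept_mm:
                continue
            for spacer in spacers:
                n_start = start + len(HEPTAMER) + spacer
                nonamer = s[n_start:n_start + len(NONAMER)]
                if len(nonamer) < len(NONAMER):
                    continue  # nonamer would run off the end of the sequence
                n_mm = hamming(nonamer, NONAMER)
                if n_mm <= max_non_mm:
                    hits.append((strand, start, spacer, h_mm, n_mm))
    return hits


# Hypothetical test sequence: a perfect consensus RSS with a 12-bp spacer.
example = "GG" + HEPTAMER + "A" * 12 + NONAMER + "TT"
print(find_rss(example))  # [('+', 2, 12, 0, 0)]
```

Scanning the reverse complement rather than writing a second pattern keeps a single heptamer-spacer-nonamer definition for both V-type and J-type signals; real searches would tune the mismatch limits to the species.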

Genomic organization of IgL chain gene loci in Alligator sinensis
Guided by the IgL isotypes known in Anolis carolinensis, we first analyzed the genomic organization of the Alligator sinensis Igλ locus. We used an Alligator sinensis BAC genomic library, constructed from the peripheral blood leukocytes of one Alligator sinensis and stored in our laboratory. The library comprises 2.1 × 10⁵ clones with an average insert size of ~100 kb, representing ~9× genome coverage (Alligator sinensis genome size ~2.5 Gb). Using PCR-based screening and sequencing, we identified four Igλ-positive BAC clones (Y210O3, Y47P24, Y147P18 and Y127H24) (S1 Table). The resulting ~331-kb genomic sequence contains four λ chain C genes (Cλ1, Cλ2, Cλ3 and Cλ4), each preceded by a λ chain J gene (Jλ1, Jλ2, Jλ3 and Jλ4); these J-C pairs span approximately 12 kb of DNA. The sequence also contains 37 potentially functional Vλ genes, 32 Vλ pseudogenes and one ORF. In addition, a BLAST search of the available Alligator sinensis genome data (http://www.ncbi.nlm.nih.gov/) identified a genomic contig (AVPB01119656.1) containing three λ chain C genes: one is identical to Cλ4 in the ~331-kb sequence, and one appears to be a pseudogene because it contains an in-frame stop codon. Three J genes were also found in this contig. In total, there are six λ chain C genes (Cλ1, Cλ2, Cλ3, Cλ4, CCλ1 and Cλ5) and seven λ chain J genes (Jλ1, Jλ2, Jλ3, Jλ4, Jλ5, Jλ6 and Jλ7) (Fig 1 and S1 Fig). All Cλ genes share at least 84.1% amino acid sequence identity, and the identities between Cλ1 and Cλ2, Cλ1 and Cλ4, Cλ2 and Cλ3, Cλ2 and Cλ4, and Cλ3 and Cλ4 exceed 90.7%.
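The genome coverage of a BAC library is the number of clones multiplied by the mean insert size, divided by the genome size. As a sketch using the rounded values given above, the Python lines below compute the fold coverage and, assuming clones are sampled at random from the genome (the Clarke-Carbon approximation), the probability that a given locus is represented in at least one clone.

```python
clones = 2.1e5            # number of BAC clones
mean_insert_bp = 100e3    # average insert size (~100 kb)
genome_bp = 2.5e9         # Alligator sinensis genome size (~2.5 Gb)

# Fold coverage: total cloned sequence divided by genome size.
coverage = clones * mean_insert_bp / genome_bp

# Clarke-Carbon: probability that a given locus lies in at least one clone,
# assuming inserts are drawn independently and at random from the genome.
fraction_per_clone = mean_insert_bp / genome_bp
p_represented = 1 - (1 - fraction_per_clone) ** clones

print(f"coverage ≈ {coverage:.1f}x")  # coverage ≈ 8.4x
print(f"P(locus represented) ≈ {p_represented:.4f}")
```

With these rounded inputs the estimate comes to about 8.4-fold, so the ~9× figure should be read as approximate; the exact value depends on the precise clone count, insert size and genome size.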
Each Cλ gene is preceded by a single J gene segment flanked on its 5' side by a conserved RSS (nonamer and heptamer) with a 12-bp spacer, resembling the organization of the λ chain loci in mammals (S2A Fig). However, a single J segment (Jλ7) was found downstream of Cλ5 with no accompanying C gene (Fig 1A and S1 Fig), implying that the Alligator sinensis Igλ locus contains additional Cλ genes. A protein sequence alignment of the identified C genes with Cλ sequences from lizards, birds and mammals revealed an identical cysteine distribution pattern (S2B Fig). Genomic Southern blotting with the Cλ exon as a probe was performed to verify the number of λ chain C genes. Six bands of varying intensity were detected in BglII-, NcoI- and SphI-digested DNA, and more than six bands in HindIII-digested DNA, indicating that there are additional Cλ genes in the genome (Fig 2).
We performed a BLAST search against the Alligator sinensis whole-genome shotgun (WGS) assembly deposited in the Ensembl database. Seven genomic contigs (KE698600.1, AVPB01102472.1, KE698001.1, KE698031.1, KE697531.1, KE697626.1 and KE695978.1) were found to contain λ chain V gene segments (S2 Table). Each identified Vλ gene is flanked on its 3' side by a conserved RSS (heptamer and nonamer) with a 23-bp spacer, resembling the organization of Vλ genes in mammals. In summary, 86 potentially functional Vλ segments (Fig 1B and S1 Appendix), two ORFs and 67 Vλ pseudogenes were identified upstream of the (Jλ-Cλ)n segments (S1 Fig); the pseudogenes either contain in-frame stop codons or lack a leader peptide (S2 Appendix). Based on sequence identity (>75% within a family) and phylogenetic analysis, the potentially functional Vλ genes can be classified into at least 19 families (Fig 3; S3 and S4 Figs; S3 Appendix). Additional Vλ segments may remain unidentified, given the gaps in the contigs and the incomplete genomic data.
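The family rule used here (members of a family share more than 75% nucleotide identity) can be sketched as single-linkage clustering on pairwise identities. This is a minimal Python illustration, not the authors' pipeline: it assumes the sequences are already aligned to equal length, ignores gap columns when computing identity, uses single linkage as an assumed grouping rule, and runs on short hypothetical sequences. In the study itself, identity was combined with phylogenetic analysis.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def group_families(aligned: dict, threshold: float = 75.0) -> list:
    """Single-linkage grouping: link two genes whose identity is at least `threshold`."""
    parent = {name: name for name in aligned}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(aligned, 2):
        if percent_identity(aligned[a], aligned[b]) >= threshold:
            parent[find(a)] = find(b)

    families = {}
    for name in aligned:
        families.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in families.values())


# Hypothetical, pre-aligned toy segments (not real Alligator sinensis sequences).
toy = {
    "V1": "ACGTACGTAC",
    "V2": "ACGTACGTAA",
    "V3": "TTTTACGTGG",
}
print(group_families(toy))  # [['V1', 'V2'], ['V3']]
```

Single linkage can chain distant genes together through intermediates, which is one reason identity thresholds are usually checked against a phylogenetic tree.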
A similar approach was used to identify Cκ in the Alligator sinensis genome. Using PCR-based screening and sequencing, we obtained five Igκ-positive BAC clones (Y146M9, Y65C14, Y77E6, Y329F14 and Y146B4) (S1 Table). The resulting ~484-kb genomic sequence contains a single copy of the Alligator sinensis Cκ gene, which is homologous to the Cκ genes of several mammalian species, six Jκ gene segments and 66 Vκ gene segments, including 29 Vκ pseudogenes. We performed a BLAST search against the Alligator sinensis WGS assembly deposited in the Ensembl database. Seventeen DNA contigs (AVPB01013186.1 [...] Table). At least 62 potentially functional Vκ gene segments (Fig 1B and S4 Appendix), 56 Vκ pseudogenes that either contain in-frame stop codons or lack a leader peptide (S5 Appendix), and 4 partial Vκ genes were identified in the Alligator sinensis genomic sequence (S5 Fig).
Southern blotting was used to confirm that Cκ is a single-copy gene. We designed a pair of degenerate primers for the Cκ gene based on the conserved Cκ sequences of Alligator sinensis. Only a single band was observed in KpnI-, NdeI- and XbaI-digested genomic DNA, supporting the presence of a single Cκ copy in the genome (Fig 2). Upstream of the Cκ gene, six functional Jκ gene segments (Jκ1-Jκ6) were identified, each flanked at its 5' end by an RSS with a 23-bp spacer (S6A Fig). An amino acid sequence alignment of Alligator sinensis Cκ with that of other species indicated homology to the Igκ chains of several vertebrates, including Homo sapiens, Mus musculus, Didelphimorphia, Ornithorhynchus, Anolis carolinensis, X. laevis and X. tropicalis. Almost all Vκ genes are flanked at the 3' end by an RSS with a 12-bp spacer, conforming to the 12/23 rule (S5 Appendix). All Vκ genes share the transcriptional orientation of (Jκ)n-Cκ, with the exception of pseudogene Vκ46. The 62 potentially functional Vκ genes fall into 12 families based on phylogenetic analysis and the criterion that members of one family share at least 75% nucleotide identity (Fig 4; S7 and S8 Figs; S6 Appendix). Because of gaps in the contigs and the incomplete genomic data, additional Vκ genes in the Alligator sinensis genome may not have been found.
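The 12/23 rule invoked above states that recombination joins an RSS with a 12-bp spacer to one with a 23-bp spacer. A minimal Python sketch, using the spacer lengths reported in this section (Vλ 23 bp and Jλ 12 bp; Vκ 12 bp and Jκ 23 bp), shows that V-to-J joining is permitted in both loci while V-to-V joining is not. Real spacers can deviate from these lengths by a base or so, which this sketch ignores.

```python
def can_recombine(spacer_a: int, spacer_b: int) -> bool:
    """12/23 rule: a 12-bp-spacer RSS pairs only with a 23-bp-spacer RSS."""
    return {spacer_a, spacer_b} == {12, 23}


# RSS spacer lengths reported for each Alligator sinensis light chain locus.
RSS_SPACERS = {
    "Igλ": {"V": 23, "J": 12},
    "Igκ": {"V": 12, "J": 23},
}

for locus, spacer in RSS_SPACERS.items():
    v_to_j = can_recombine(spacer["V"], spacer["J"])
    v_to_v = can_recombine(spacer["V"], spacer["V"])
    print(f"{locus}: V-to-J {v_to_j}, V-to-V {v_to_v}")
# Igλ: V-to-J True, V-to-V False
# Igκ: V-to-J True, V-to-V False
```

The spacer assignments are mirror images in the two loci, yet both satisfy the rule for V-to-J joining.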

Phylogenetic analysis of the Alligator sinensis Ig light chain gene segments
Using the amino acid sequences of IGLV- and IGLC-encoded genes from different jawed vertebrates, we constructed V and C phylogenetic trees, respectively, using protein sequences without CDR3. Both the C-domain and V-domain trees support three major groups of IgL genes in jawed vertebrates: κ, λ and σ (including σ-cart). Alligator sinensis κ and λ clearly fall into their respective groups, suggesting that Alligator sinensis has only two IgL isotypes, κ and λ (Figs 5 and 6; S9, S10, S11 and S12 Figs). The ρ gene of X. tropicalis, teleost L1 and L3, and cartilaginous fish type III/NS4 fall within the κ group, which also includes the κ genes of crocodilians, lizards and mammals. Teleost L2, cartilaginous fish type II/NS3 and X. tropicalis type III all belong to the λ group, which includes the λ genes of crocodilians, lizards, birds and mammals. The σ genes are found only in cartilaginous fish, teleosts and amphibians. Together with the shared synteny of the κ and λ loci, the phylogenetic analysis provides strong evidence that Alligator sinensis expresses two IgL isotypes, κ and λ.
The phylogenetic analysis also reveals relationships between Alligator sinensis V families and those of other species. Alligator sinensis families Vκ10 and Vκ11 cluster with Anolis carolinensis Vκ, and family Vκ7 clusters with X. laevis ρ. A similar analysis was performed for Vλ. As shown in Fig 6, Alligator sinensis families Vλ9 and Vλ19 cluster with X. laevis type III V5 and Mus musculus families 1 and 3; families Vλ1-Vλ8 are related to Anolis carolinensis Vλ1 and Vλ3, Gallus gallus and Anas platyrhynchos Vλ, and X. laevis type III V4; and family Vλ11 clusters with X. laevis type III V6. These clusters suggest that the corresponding V genes are orthologous across species within each IgL isotype. The remaining Vλ families showed no clear relationship to the V genes of other jawed vertebrates, suggesting that the Alligator sinensis Vλ repertoire is particularly diverse.
Fig 4. Phylogenetic tree analysis of the 62 Alligator sinensis Vκ genes. A phylogenetic tree of the nucleotide sequences of the Alligator sinensis Vκ segments was constructed. The 12 Vκ gene families are labeled with numbers on the right. Phylogenetic trees were constructed using MrBayes 3.1.2 [58] and viewed in TREEVIEW [59]. doi:10.1371/journal.pone.0147704.g004
Analysis of Immunoglobulin Light Chain in Alligator sinensis

Syntenic analysis of Igλ and Igκ chain loci in tetrapods
To determine whether the identified genes belong to the λ lineage, we analyzed their chromosomal location relative to flanking genes in the available tetrapod genomic data containing Igλ loci. GNAZ (guanine nucleotide-binding protein, αz subunit) and RTDR1 (rhabdoid tumor deletion region gene 1), and MRPL40 (mitochondrial ribosomal protein L40) and HIRA (histone cell cycle regulation defective homologue A), which lie on either side of the λ locus in Homo sapiens, were selected as markers. An available genomic contig (NW_005841940) containing the Alligator sinensis Igλ locus was used for the analysis. Three arrangements were observed, with the λ genes sharing the same transcriptional orientation in each. First, the Igλ locus is flanked downstream by MRPL40 and HIRA and upstream by GNAZ and RTDR1, as in Homo sapiens and X. tropicalis. Second, in the opposite arrangement, the Igλ locus is flanked downstream by GNAZ and RTDR1 and upstream by MRPL40 and HIRA, as in Gallus gallus; this may have arisen through intrachromosomal gene conversion. Third, the Igλ locus is flanked only upstream, by MRPL40 and HIRA, as in Mus musculus and Anolis carolinensis. In Mus musculus, GNAZ and RTDR1 are located on chromosome 10, which does not carry IGL. In Anolis carolinensis, GNAZ was found in contig NW_003341094.1, but this contig contains no IGL gene, and RTDR1 was not identified. Thus, in Mus musculus a chromosomal rearrangement has separated GNAZ and RTDR1 from the Igλ locus onto another chromosome, whereas in Anolis carolinensis the position of GNAZ could not be confirmed because of limited genomic data.
The Alligator sinensis Igλ locus is likewise flanked upstream by GNAZ and RTDR1 (Fig 7), whereas MRPL40 and HIRA are located on another Alligator sinensis genomic contig (NW_005841997.1) that contains no identifiable IGL gene. Because the genome assembly is preliminary, we could not confirm whether these two contigs are adjacent. The results suggest that the chromosomal position of the Igλ locus in Alligator sinensis is syntenic to that in Homo sapiens and X. tropicalis. In the other species, the flanking genes of the Igλ locus have changed in different ways, including possible intrachromosomal gene conversion (e.g., Gallus gallus), chromosomal rearrangement (e.g., Mus musculus), and changes that cannot be confirmed because of limited genomic data (e.g., Anolis carolinensis). All taxa studied share the same flanking genes on one or both sides of the Igλ locus. These data provide strong evidence that the identified genes derive from the same ancestral gene as the λ genes of other tetrapods and the type III light chain gene of X. tropicalis. The chromosomal position of the Igλ locus in X. tropicalis may represent the oldest arrangement in tetrapods.
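The flanking arrangements described above amount to asking which marker pair lies on each side of the locus. The Python sketch below is illustrative only: the gene orders are hypothetical, abbreviated 5'-to-3' lists that mirror the situations in the text (intervening genes omitted), and because the text does not give the order of genes within each marker pair, the check uses sets.

```python
MARKER_PAIRS = {
    "GNAZ/RTDR1": {"GNAZ", "RTDR1"},
    "MRPL40/HIRA": {"MRPL40", "HIRA"},
}


def flanking_markers(gene_order, locus="IGL"):
    """Return (upstream, downstream) marker-pair labels for a 5'->3' gene list.

    A pair is counted for a side only if both of its genes lie on that side.
    """
    i = gene_order.index(locus)
    upstream, downstream = set(gene_order[:i]), set(gene_order[i + 1:])

    def pairs_on(side):
        return tuple(sorted(label for label, genes in MARKER_PAIRS.items() if genes <= side))

    return pairs_on(upstream), pairs_on(downstream)


# Hypothetical, abbreviated gene orders mirroring the arrangements in the text.
situations = {
    "first (e.g. Homo sapiens, X. tropicalis)": ["GNAZ", "RTDR1", "IGL", "MRPL40", "HIRA"],
    "second (e.g. Gallus gallus)": ["MRPL40", "HIRA", "IGL", "GNAZ", "RTDR1"],
    "third (e.g. Mus musculus, Anolis carolinensis)": ["MRPL40", "HIRA", "IGL"],
    "Alligator sinensis contig": ["GNAZ", "RTDR1", "IGL"],
}
for name, order in situations.items():
    print(name, flanking_markers(order))
```

Returning empty tuples for a side with no complete marker pair makes partial contigs, such as the Alligator sinensis case, show up explicitly rather than being forced into one of the three arrangements.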
Fig 5. Phylogenetic analysis of the IgL chain C genes in jawed vertebrates. The phylogenetic tree was constructed using C domains. The scale bar represents genetic distance (number of nucleotide changes at the given scale). The credibility value for each node is shown. Phylogenetic trees were constructed using MrBayes 3.1.2 [58] and viewed in TREEVIEW [59].
Fig 6. Phylogenetic analysis of the IgL chain V genes in jawed vertebrates. The phylogenetic tree was constructed using V domains. The scale bar represents genetic distance (number of nucleotide changes at the given scale). The credibility value for each node is shown. Phylogenetic trees were constructed using MrBayes 3.1.2 [58] and viewed in TREEVIEW [59]. doi:10.1371/journal.pone.0147704.g006
Similarly, to determine whether the identified Alligator sinensis κ genes belong to the κ lineage, we performed a syntenic analysis using the data available for tetrapods, including Homo sapiens, Mus musculus, Gallus gallus, Anolis carolinensis and X. tropicalis. Using the available long genomic contig NW_005843366.1, which contains the Alligator sinensis Igκ locus, we compared the chromosomal location of this locus relative to the flanking genes of the κ locus in other species. In all analyzed species except Gallus gallus, the Igκ locus is flanked on the 5' side by the genes encoding RPIA (ribose-5-phosphate isomerase A) and EIF2AK3 (eukaryotic translation initiation factor 2-α kinase 3) (Fig 8), indicating that the Alligator sinensis Igκ locus is syntenic to those of Homo sapiens, Mus musculus, Anolis carolinensis and X. tropicalis. We also searched for relevant genes upstream of the Igκ locus in the analyzed species and found gene families located far from the Igκ locus, including SLC (solute carrier family 4, sodium borate transporter) and RP (ribosomal protein).
In the analyzed species, one or both of these gene families are located on the same chromosome as the Igκ locus, except in Alligator sinensis and X. tropicalis, for which complete genomic sequences are lacking. As for the Igλ locus, we found evidence of intrachromosomal gene conversion, as in Homo sapiens, Anolis carolinensis and Gallus gallus, and of chromosomal rearrangement leading to gene loss, as in Mus musculus. The conserved gene order near the Igκ locus suggests that the Igκ genes of Homo sapiens, Mus musculus, Anolis carolinensis and Alligator sinensis and the ρ gene of X. tropicalis were inherited from a common ancestor. In Gallus gallus, however, no light chain gene was found near RPIA and EIF2AK3; instead, SUCLG1 (succinate-CoA ligase, GDP-forming, α subunit) is located on the 5' side of RPIA and EIF2AK3. In Homo sapiens and Mus musculus, SUCLG1 lies downstream on the same chromosome, far from RPIA and EIF2AK3 (~4.0 Mb and ~2.4 Mb, respectively), suggesting an intrachromosomal gene conversion similar to that proposed for Igλ in Gallus gallus, during which the Gallus gallus Igκ locus was lost. In Anolis carolinensis, SUCLG1 is located on chromosome 5 rather than on chromosome 6, which carries the Igκ locus. In X. tropicalis, EIF2AK3 could not be identified with confidence, and we could not identify SUCLG1 in Alligator sinensis.

IgL loci functionality and V-J junction diversity in Alligator sinensis
Using 5' RACE, we cloned and sequenced 402 amplified cDNA fragments from the blood of the same Alligator sinensis individual used to construct the genomic BAC library, obtaining 181 clones with unique V-J junctions. These sequences differed slightly from the corresponding genome sequence. Among the 181 clones, 56 contained Cλ1, 44 contained Cλ2, 32 contained Cλ3, 3 contained Cλ4, and 37 contained a Cλ that differed slightly from the identified Cλ4 gene and shared at least 97.3% sequence identity with it, suggesting an allelic variant of Cλ4. In addition, two new Cλ genes, distinct from but sharing at least 92.7% sequence identity with Cλ1, Cλ2, Cλ3, Cλ4 and Cλ5, were found in clones LV6-51 and LV61. The C regions of the remaining clones were chimeric: clones LV2-11 and LV6-91 are Cλ2-Cλ1 chimeras; clones LV2-8 and LV-14 are Cλ3-Cλ2 and Cλ4-Cλ1 chimeras, respectively; and clones LV11, LV2-38 and LV56 are Cλ3-Cλ1 chimeras (S7 Appendix). All chimeras most likely represent PCR artifacts. The observed Cλ usage and the genomic organization of the Igλ locus suggest that additional Cλ genes exist in the Alligator sinensis Igλ locus. We could not amplify Jλ6-Cλ5, presumably because of its low expression level. As expected, Jλ1, Jλ2, Jλ3 and Jλ4 were co-expressed with their respective Cλ genes in most cases. In a few cases, however, Jλ segments were joined to non-cognate Cλ genes, such as Jλ2-Cλ1 in clone LV25, Jλ3-Cλ2 in clone LV109, Jλ1-Cλ3 in clone LV5-51 and Jλ4-Cλ3 in clone LV6-82; these were most likely generated by template jumping during PCR amplification.
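As a consistency check, the Cλ assignments listed above account for all 181 unique clones. The Python sketch below tabulates the reported counts; the category labels are ours, and the two clones carrying new Cλ genes and the seven chimeric clones are treated as separate categories.

```python
def usage_table(counts: dict):
    """Return the total and (label, count, percent) rows sorted by count, descending."""
    total = sum(counts.values())
    rows = [(label, n, round(100.0 * n / total, 1)) for label, n in counts.items()]
    return total, sorted(rows, key=lambda row: row[1], reverse=True)


# Counts as reported in the text; labels are descriptive only.
c_region_counts = {
    "Cλ1": 56,
    "Cλ2": 44,
    "Cλ3": 32,
    "Cλ4": 3,
    "Cλ4 variant": 37,
    "new Cλ (LV6-51, LV61)": 2,
    "chimeric C region": 7,
}

total, rows = usage_table(c_region_counts)
print(total)  # 181
for label, n, pct in rows:
    print(f"{label:<24}{n:>4}{pct:>7.1f}%")
```

Taken together, Cλ4 and its putative variant account for 40 of the 181 clones (about 22%), which places Cλ4 usage close to that of Cλ2.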
The two additional Cλ genes not found in the genome were co-expressed with Jλ1 and Jλ2 in clones LV6-51 and LV61, respectively. Their amino acid sequence identities with Cλ1 and Cλ2 were 97.6% and 98.8%, respectively, suggesting that they may be alleles of Cλ1 and Cλ2. All three clones containing Cλ4 and the 37 clones containing the putative Cλ4 variant were co-expressed with Jλ4, supporting the existence of a Cλ4 allele. We also analyzed the Jλ genes in the 7 Cλ chimeras: clones LV2-11 (C2+C1) and LV6-91 (C2+C1) included Jλ2, clone LV14 (C4+C1) included Jλ4, and clones LV2-8 (C3+C2), LV11 (C3+C1), LV2-38 (C3+C1) and LV56 (C3+C1) contained Jλ3. All of these products most likely represent PCR artifacts or arose by template jumping during PCR amplification. We did not find Jλ5, Jλ6, Jλ7 or any other Jλ among the 181 unique clones, presumably because of their low expression. No other Cλ genes were found in our study, although an isolated Jλ7 is present in the current genomic sequence. More Cλ genes may have been missed because of the incomplete genomic data for Alligator sinensis.
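Because each of Jλ1-Jλ4 lies immediately upstream of its own Cλ, a clone in which the J and C indices differ is a candidate PCR artifact. The Python sketch below flags such clones using the non-cognate pairs named in the text. The index-matching rule is an assumption that holds only for Jλ1-Jλ4 and Cλ1-Cλ4 (the expected partner of Cλ5, for example, is Jλ6), and the last entry is a hypothetical cognate clone added for contrast.

```python
import re


def segment_index(label: str) -> int:
    """Trailing number of a segment label, e.g. 'Jλ2' -> 2."""
    match = re.search(r"(\d+)$", label)
    if match is None:
        raise ValueError(f"no numeric index in {label!r}")
    return int(match.group(1))


def is_cognate(j_label: str, c_label: str) -> bool:
    """Assumes Jλn pairs with Cλn, which holds only for n = 1-4 in this locus."""
    return segment_index(j_label) == segment_index(c_label)


clones = {
    "LV25": ("Jλ2", "Cλ1"),
    "LV109": ("Jλ3", "Cλ2"),
    "LV5-51": ("Jλ1", "Cλ3"),
    "LV6-82": ("Jλ4", "Cλ3"),
    "hypothetical_cognate": ("Jλ4", "Cλ4"),
}
flagged = sorted(name for name, (j, c) in clones.items() if not is_cognate(j, c))
print(flagged)  # ['LV109', 'LV25', 'LV5-51', 'LV6-82']
```

A flag only marks a clone for inspection; deciding between template jumping and a genuine non-cognate join still requires the sequence evidence discussed above.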
Of the 181 cDNA clones described above, 115 had an identifiable V gene; these clones, representing 63 uniquely recombined V-J junctions (S8 Appendix), were analyzed and revealed biased Vλ usage (Fig 9). Vλ family 7 was used most frequently, accounting for about two-fifths of the expressed Vλ repertoire (45/115). Families 1, 6 and 9 were also frequently used (Fig 9), whereas Vλ segments from families 2, 8, 11, 12 and 17 were used less frequently. Vλ segments of the other families were not observed in the cDNA clones of Alligator sinensis. Of the 115 clones, 30% (35/115) contained N and P nucleotide insertions, generally of one or two nucleotides, with some exceptions: clone LV6-73 had seven N and P nucleotides in its junction; clones LV5-13 and LV5-34 had six and five, respectively; clones LV6-51 and LV6-8 had four; and clone LV2-8 had three. On average, the N + P addition length in these clones was 0.6 ± 1.2 nucleotides. More than 85% of the clones (98/115) showed exonuclease removal at the 3' end of Vλ. Trimming at the 5' end of Jλ was less frequent (67/115) and removed fewer nucleotides than at Vλ (3.1 ± 2.2 nucleotides at Vλ vs. 1.6 ± 1.8 at Jλ). The average CDR3 length in these λ clones was 10.6 ± 0.9 amino acids (S8 Appendix). These results demonstrate the considerable diversity of the expressed Vλ genes in Alligator sinensis.
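The percentages above follow directly from the reported counts over the 115 Vλ-identifiable clones. The Python sketch below recomputes them and adds a helper for mean ± SD summaries. The text does not say whether a sample or population SD was used, so sample SD is assumed, and the per-clone values in the last example are hypothetical.

```python
from statistics import mean, stdev


def percent(k: int, n: int) -> float:
    """k out of n as a percentage, rounded to one decimal place."""
    return round(100.0 * k / n, 1)


N_CLONES = 115  # clones with an identifiable Vλ gene
reported = {
    "Vλ family 7 usage": 45,
    "N and/or P additions": 35,
    "3' Vλ trimming": 98,
    "5' Jλ trimming": 67,
}
for label, k in reported.items():
    print(f"{label}: {k}/{N_CLONES} = {percent(k, N_CLONES)}%")


def mean_sd(values):
    """Mean and sample standard deviation (n - 1 denominator; an assumption)."""
    return mean(values), stdev(values)


# Hypothetical per-clone N + P counts, for illustration only.
m, s = mean_sd([0, 0, 1, 2])
print(f"{m:.2f} ± {s:.2f}")  # 0.75 ± 0.96
```

With many zero values, as in junctions that mostly lack additions, the SD exceeds the mean, which matches the shape of the reported 0.6 ± 1.2.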
Using 5' RACE, we cloned and sequenced 237 cDNA fragments from Alligator sinensis to analyze Jκ and Vκ usage in the expressed κ chain; one of these, KV-4, has a stop codon in the leader peptide. After removal of redundant clones, 124 clones with unique V-J junctions were analyzed. All six functional Jκ segments were used: 51 clones contained Jκ1, 22 contained Jκ2, 18 and 21 contained Jκ3 and Jκ4, respectively, 10 contained Jκ6, and Jκ5 was found only in clone KV-47. In addition, a Jκ not found in the genome occurred once, in clone KV2-67, suggesting either another Jκ in the genome or an allelic variant of a known Jκ. Jκ usage was thus biased, with Jκ1 used most frequently; Jκ5 and Jκ6 were used less often, Jκ5 least of all.
From these 124 clones, we selected the 91 with identifiable Vκ genes for analysis, which revealed a preferential Vκ usage pattern (Fig 10). Vκ families 1 and 5 predominated, accounting for 51% and 40% of the expressed Vκ repertoire, respectively. Vκ families 2, 3, 6 and 8 were used less frequently. Vκ genes of the other families were not observed in the cDNA clones of Alligator sinensis, probably because these families contain only one or two members and are expressed at low levels. The 91 clones represented 59 uniquely recombined V-J junctions (S9 Appendix). Thirty-three clones expressed functional Vκ genes not found in the genome, suggesting that additional Alligator sinensis Vκ genes remain unidentified because of gaps in the contigs and incomplete genomic data. Most of the uniquely recombined κ V-J junctions lack N and P nucleotide additions. Of the 59 uniquely rearranged clones, 10 showed putative N or P nucleotides, numbering 1 or 2 per junction, with an average of 0.16 ± 0.47 bp per clone. Exonuclease removal at the 3' end of Vκ and the 5' end of Jκ averaged 2.1 ± 1.5 and 1.3 ± 1.7 nucleotides, respectively. The average CDR3 length was 8.8 ± 0.5 amino acids, and 89% of the expressed κ V-J junctions might have been formed by microhomology (S9 Appendix).
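P nucleotides arise when the hairpin at a coding end is opened off-center, adding bases that are the reverse complement of the terminal bases of the untrimmed coding end; the remaining non-templated bases are N nucleotides. The Python sketch below measures the P-nucleotide run that follows a V coding end. It is an illustration under stated assumptions: it considers only the V side, requires the coding end to be untrimmed, and uses hypothetical sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def p_nucleotide_length(coding_end: str, added: str) -> int:
    """Length of the P-nucleotide run at the start of `added`.

    P nucleotides after an untrimmed coding end read as the reverse
    complement of its last bases: the k-th added base must complement
    the k-th base counted back from the end of the coding sequence.
    """
    k = 0
    while (k < min(len(coding_end), len(added))
           and added[k] == revcomp(coding_end[-(k + 1)])):
        k += 1
    return k


# Hypothetical V coding end and the non-germline bases that follow it.
v_end = "GGTAC"
inserted = "GTCCA"
p_len = p_nucleotide_length(v_end, inserted)
print(p_len, len(inserted) - p_len)  # prints: 2 3
```

Here the first two added bases (GT) mirror the coding end (...AC) and count as P nucleotides, leaving three candidate N nucleotides. When the coding end has been trimmed, a palindrome can no longer be distinguished from chance matches, which is why the sketch requires an untrimmed end.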

Discussion
Reptilia comprises Aves and the non-avian reptiles (Crocodylia, Testudines and Squamata) [63,64]. Immunoglobulin genes have been studied in non-avian reptiles of the Testudines [50,51] and Squamata [45,46,65-67]. Crocodilians are thought to be the closest living relatives of birds and are believed to have strong immune systems [52-55]. Recently, the crocodilian IgH genes were identified [56,57]. An interesting feature of the crocodilian IgH constant loci is the presence of several duplicated genes encoding five Ig classes [57]. In addition, an investigation of the crocodilian α genes suggested that reptiles and birds share a common ancestral organization [56,57]. To better understand the crocodilian immune system, to provide more complete data on crocodilian Igs, and to gain further insight into immunoglobulin evolution in mammals, birds and reptiles, we characterized the Alligator sinensis IgL gene repertoire using the genome sequence and a genomic BAC library.
Previous studies classified the IgL genes of jawed vertebrates into four isotypes: λ, κ, σ and σ-cart. To date, all four isotypes have been found together only in cartilaginous fishes, as type I (NS5), type II (NS3), type III (NS5) and σ [13]. Type III is clearly κ, type II is more similar to λ [15,16], type I is classified as σ-cart [13], and σ is orthologous to the σ isotype of amphibians [13]. Amphibians have three IgL isotypes (λ, κ and σ) [27][28][29][30][31], whereas most other tetrapods, including reptiles, have two (λ and κ) [5,7,32-36,44]. Birds and snakes have only the λ isotype [39,42,45]. The different IgL isotypes are located in different genomic regions, which also differ in their organization [13]. In the κ locus, multiple Jκ genes, whose number varies among species, form a cluster that is generally followed by a single Cκ gene [5]. Because the κ isotype is present in cartilaginous and bony fishes, where its phylogenetic relationships are clear, and in tetrapods other than birds and snakes, it is believed to be the oldest and most evolutionarily conserved isotype [13]. Unlike the Igκ locus, the λ locus often contains several Jλ-Cλ pairs, whose number also varies among species, located downstream of the V segments [34]. Previous studies found that the multiple Jλ-Cλ pairs were duplicated after speciation [7,31].
In this study, two IgL loci, λ and κ, were identified in another reptile, Alligator sinensis, using available genome data and sequencing of Alligator sinensis genomic BACs containing IgL genes. In addition, using the X. tropicalis Cσ as a query [31], we performed a BLAST search against the Alligator sinensis whole-genome shotgun assembly and identified no similar sequence (data not shown). These results are consistent with those for Anolis carolinensis and indicate that these reptiles have only the λ and κ isotypes. We mapped the genomic organization of the Alligator sinensis Igλ and Igκ loci (Fig 1; S1 and S5 Figs). As in other species, each Cλ gene is preceded by a single Jλ gene segment (Fig 1A and S1 Fig), whereas a single Cκ gene follows a cluster of Jκ gene segments (Fig 1B and S5 Fig). Analysis of the recombination signal sequences (RSSs) flanking the IGLV and IGLJ genes showed that they conform to the heptamer-12-bp spacer-nonamer and nonamer-23-bp spacer-heptamer structure, a rule that applies to IGLV and IGLJ genes in all species. Thus, the genomic organization of Igλ in Alligator sinensis is similar to that in X. tropicalis, lizards, birds and mammals, whereas that of Igκ is similar to that in X. tropicalis, lizards and mammals (the κ locus has been lost in birds). We found six Cλ genes and seven Jλ genes in the genomic DNA sequence; Cλ5 and Jλ5-Jλ7 were not detected among the expressed sequences, possibly because of their low expression levels. Jλ and Cλ genes are generally arranged as pairs in the genome. In our study, however, an isolated Jλ7 was located at the 3' end of the Igλ locus without a corresponding downstream Cλ gene. This suggests that additional Cλ genes may be present in the Alligator sinensis Igλ locus, a possibility supported by the Southern blotting results.
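The heptamer-spacer-nonamer structure described above lends itself to a simple sequence scan. The sketch below is illustrative only. It assumes the widely used consensus heptamer CACAGTG and nonamer ACAAAAACC and arbitrary mismatch tolerances (at most 1 mismatch in the heptamer and 2 in the nonamer). These thresholds and all function names are assumptions, not the criteria used in this study. An RSS lying 5' of a J segment (nonamer-spacer-heptamer on the coding strand) appears in the same consensus orientation on the reverse-complement strand.

```python
HEPTAMER = "CACAGTG"   # consensus heptamer (assumed)
NONAMER = "ACAAAAACC"  # consensus nonamer (assumed)
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Mismatch count between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_rss(seq, spacers=(12, 23), max_hept_mm=1, max_non_mm=2):
    """Return (start, spacer_length) for heptamer-spacer-nonamer matches.

    Scans one strand only; call it on revcomp(seq) to find
    nonamer-spacer-heptamer signals located 5' of J segments.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(HEPTAMER) + 1):
        if hamming(seq[i:i + 7], HEPTAMER) > max_hept_mm:
            continue
        for sp in spacers:
            start = i + 7 + sp
            nonamer = seq[start:start + 9]
            if len(nonamer) == 9 and hamming(nonamer, NONAMER) <= max_non_mm:
                hits.append((i, sp))
    return hits


def obeys_12_23_rule(spacer_a: int, spacer_b: int) -> bool:
    """Efficient recombination pairs one 12-bp RSS with one 23-bp RSS."""
    return {spacer_a, spacer_b} == {12, 23}


# Hypothetical V-side example: a 12-bp-spacer RSS after a coding end.
v_side = "GATTACA" + HEPTAMER + "A" * 12 + NONAMER + "TTT"
print(find_rss(v_side))  # → [(7, 12)]
```

Pairing a V hit and a J hit with `obeys_12_23_rule` mirrors the 12/23 rule: a segment flanked by a 12-bp-spacer RSS recombines efficiently only with one flanked by a 23-bp-spacer RSS.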
Our study also identified multiple germline Vλ and Vκ genes in Alligator sinensis. A total of 155 Vλ and 118 Vκ gene segments were identified, including 69 Vλ and 56 Vκ pseudogenes, respectively. All Vλ and Vκ genes share the transcriptional orientation of the corresponding C gene and lie upstream of (Jλ-Cλ)n or (Jκ)n, respectively. Multiple functional V genes can increase antibody diversity and broaden antigen recognition and binding. The ratio of functional Vλ to Vκ genes varies considerably among species [5,32-36]. It has been proposed that the number of V gene segments may be connected to the preferential use of light chain isotypes at the protein level [68]. In Alligator sinensis, functional Vλ germline genes outnumber functional Vκ genes (86 vs. 62), so the λ isotype may be more abundant than the κ isotype in Alligator sinensis serum antibodies. In addition, the Vλ and Vκ loci contain a large number of pseudogenes. Whether these pseudogenes serve, as in birds, as donors of sequence for gene conversion of rearranged functional V genes [43] remains to be determined. These pseudogenes may be involved in generating Ig diversity, but overall the diversification of IgLs in Alligator sinensis appears similar to that in most tetrapods and different from that in Gallus gallus. The 148 potentially functional VL genes (86 Vλ and 62 Vκ) of Alligator sinensis are classified into 31 families: 19 Vλ families and 12 Vκ families (Figs 3 and 5; S3, S4, S7 and S8 Figs; S3 and S6 Appendices).
For comparison, 177 functional VL genes (Vλ and Vκ) are classified into 23 families in Mus musculus (http://www.imgt.org/IMGTrepertoire/), 148 functional VL genes into 23 families in Homo sapiens (http://www.imgt.org/IMGTrepertoire/), and 51 functional VL genes into 11 families in Anolis carolinensis [7], whereas Gallus gallus has only one functional Vλ gene (one family) [42]. The diversity of the IgL chain is generated by V-J recombination and somatic hypermutation, and it also depends on the germline VL repertoire, including the numbers of VL genes and VL families (families being defined by sequence similarity). Our results show that Alligator sinensis possesses at least 148 functional VL genes (possibly more) in 31 VL gene families. Although this is not the largest number of VL genes among the tetrapods compared here, it is the largest number of VL gene families. Phylogenetic analyses show that many Alligator sinensis Vλ gene families are orthologous to families in other species, whereas the remaining Vλ families are specific to Alligator sinensis. Alligator sinensis also possesses a large number (68) of DH gene segments and multiple μ genes in the IgH locus, suggesting that DH segments may contribute significantly to antibody diversity in crocodilians and that IgM subclasses can be expressed through class-switch recombination in the IgH locus [56]. Together, these results reveal a highly diverse Ig repertoire in Alligator sinensis, suggesting that crocodilians have a strong immune system.
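The paragraph above groups VL genes into families by sequence similarity but does not state the cutoff. A common convention, used for example by IMGT for subgroups, places genes that share at least 75% nucleotide identity in the same family. The sketch below applies that assumed 75% cutoff with single-linkage grouping to pre-aligned sequences. The names and toy sequences are hypothetical. Single linkage can chain moderately similar genes together, so a curated classification such as the one reported here may differ.

```python
from itertools import combinations


def fraction_identity(a: str, b: str) -> float:
    """Fraction of identical aligned positions, skipping gap columns ('-')."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def assign_families(aligned: dict, threshold: float = 0.75) -> list:
    """Single-linkage grouping of aligned V sequences at an identity cutoff."""
    names = list(aligned)
    parent = {name: name for name in names}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if fraction_identity(aligned[a], aligned[b]) >= threshold:
            parent[find(a)] = find(b)

    families = {}
    for name in names:
        families.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in families.values())


toy = {
    "V1": "AAAAAAAAAA",
    "V2": "AAAAAAAATT",  # 80% identical to V1
    "V3": "CCCCCCCCCC",
    "V4": "CCCCCCCCGG",  # 80% identical to V3
}
print(assign_families(toy))  # → [['V1', 'V2'], ['V3', 'V4']]
```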
We compared the IgL chains of two reptiles, Alligator sinensis and Anolis carolinensis. Alligator sinensis has more VL genes than Anolis carolinensis, both functional genes and pseudogenes. The analysis of expressed Vλ and Vκ sequences in Alligator sinensis showed that a large number of V genes are used in both λ and κ chains, suggesting that somatic V-J recombination contributes to antibody diversity in Alligator sinensis, as in Anolis carolinensis [7]. In addition, N or P nucleotide additions at V-J junctions occur more often in Alligator sinensis than in Anolis carolinensis, in which such additions are rare, suggesting that crocodilians have greater V-J junctional diversity than lizards.
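Distinguishing P from N nucleotides at a junction follows from how they arise. P nucleotides are short palindromic extensions equal to the reverse complement of the terminal bases of an untrimmed coding end, whereas N nucleotides are non-templated. The sketch below applies that rule to the bases found between the V and J germline ends. The function names and inputs are hypothetical. Because a random N nucleotide can match the palindrome by chance, such assignments are only putative, as they are in the junction tables.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading positions at which a and b agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def classify_insert(insert, v_end, j_start, v_trimmed=False, j_trimmed=False):
    """Split junctional bases into (P next to V, N, P next to J).

    v_end: last germline bases of V; j_start: first germline bases of J.
    P nucleotides are only possible on a coding end that was not trimmed.
    """
    insert = insert.upper()
    # Leading insert bases that mirror V's 3' end.
    p_v = 0 if v_trimmed else common_prefix_len(insert, revcomp(v_end))
    p_j = 0
    if not j_trimmed:
        # Trailing insert bases that mirror J's 5' end.
        p_j = common_prefix_len(insert[::-1], revcomp(j_start)[::-1])
    p_j = min(p_j, len(insert) - p_v)  # never count a base twice
    return p_v, len(insert) - p_v - p_j, p_j


print(classify_insert("CGGC", v_end="CAG", j_start="GTG"))  # → (1, 2, 1)
```

In this toy case, the leading C mirrors the terminal G of V, the trailing C mirrors the first G of J, and the middle GG is scored as N.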
We analyzed the conserved co-localization (synteny) of genes around the Igλ and Igκ loci in different species. First, we identified a syntenic relationship between the Igλ locus and two conserved gene clusters, the GNAZ-RTDR1 cluster and the MRPL40-HIRA cluster, in Alligator sinensis and other species, including Homo sapiens, Mus musculus, Gallus gallus, Anolis carolinensis and X. tropicalis (Fig 7). All species retained one or both gene clusters adjacent to the Igλ locus, although the two clusters are reversed in position in Gallus gallus and one cluster has been lost in Mus musculus, suggesting that the location of the Igλ locus is conserved in tetrapods, including crocodilians. The oldest arrangement was found in X. tropicalis and Homo sapiens and possibly in Alligator sinensis. We also found a syntenic relationship for the Igκ locus across species. The conserved genes RPIA and EIF2AK3 flank the 3' side of Igκ in all species except Gallus gallus (Fig 8), and two gene families, SCL and RPL, are located far upstream of the Igκ locus. These results suggest that intrachromosomal gene conversion likely occurred in Gallus gallus and in Homo sapiens or Anolis carolinensis during speciation, leading to the changes in the Gallus gallus Igλ and Igκ loci: the genes flanking Igλ were reversed or lost, and the positions of SCL and RPL were reversed in Homo sapiens and Anolis carolinensis. Thus, either Homo sapiens or Anolis carolinensis retains the oldest form of the Igκ locus.
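The flanking-gene comparisons in Figs 7 and 8 amount to asking whether the genes shared with a reference species keep the same order relative to the Ig locus, appear in reverse order, or are missing. A minimal sketch of that comparison follows. The gene identifiers are placeholders (g1, g2, ...), not the gene orders shown in the figures. The function also assumes that each gene occurs once per list and that both lists are read in the same chromosomal orientation; otherwise an "inverted" call for a whole region may only reflect assembly orientation.

```python
def compare_order(reference, query):
    """Classify the order of shared genes and list genes missing from query.

    Both inputs are gene lists ordered along the chromosome, with the Ig
    locus itself included as a marker. Returns (order_status, missing_genes).
    """
    query_set = set(query)
    reference_set = set(reference)
    shared_ref = [g for g in reference if g in query_set]
    shared_query = [g for g in query if g in reference_set]
    missing = [g for g in reference if g not in query_set]

    if len(shared_ref) < 2:
        status = "undetermined"  # too few shared genes to compare order
    elif shared_query == shared_ref:
        status = "conserved"
    elif shared_query == shared_ref[::-1]:
        status = "inverted"
    else:
        status = "rearranged"
    return status, missing


reference = ["g1", "g2", "IGL", "g3", "g4"]  # hypothetical reference order
print(compare_order(reference, ["g4", "g3", "IGL", "g2", "g1"]))  # → ('inverted', [])
print(compare_order(reference, ["g1", "g2", "IGL"]))              # → ('conserved', ['g3', 'g4'])
print(compare_order(reference, ["g3", "g4", "IGL", "g1", "g2"]))  # → ('rearranged', [])
```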
In the phylogenetic tree based on C domains, sequences clustered first by isotype and then by species (Fig 5; S9 and S10 Figs). The phylogenetic tree of V genes showed the same pattern (Fig 6; S11 and S12 Figs), suggesting that each IgL isotype is orthologous across species. Previous phylogenetic analyses showed that the σ gene is present only in cartilaginous fish, bony fish and amphibians and is absent in reptiles, birds and mammals [13,24,31,39], whereas the κ gene is present in all vertebrate groups examined except birds [13,39-41] and snakes [45]. In the phylogenetic analysis of IGLV genes, which included all 19 Vλ and 12 Vκ families of Alligator sinensis, families Vλ1-Vλ8 were related to Anolis carolinensis Vλ1 and Vλ3, Gallus gallus and Anas platyrhynchos Vλ, and X. laevis type III V4 (Fig 6; S11 and S12 Figs), suggesting that birds, reptiles and Salientia shared an ancestral locus during the evolution of the λ locus [7]. Alligator sinensis family Vλ11 clustered with X. laevis type III V6, families Vκ10 and Vκ11 clustered with Anolis carolinensis Vκ, and family Vκ7 clustered with X. laevis ρ (Fig 6; S11 and S12 Figs), indicating that reptiles and amphibians share some Vλ and Vκ families that descend from a common ancestor. Crocodilians possess more VL families than frogs, lizards and mammals, reflecting a more diverse V gene repertoire. Taken together, the results strongly suggest that the two IgL loci identified in Alligator sinensis belong to the κ and λ lineages.
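The V-gene tree in S12 Fig was built in MEGA 6.0 by neighbor-joining on p-distances with pairwise deletion. The distance step is simple enough to sketch. The p-distance is the proportion of compared sites that differ. Under pairwise deletion, a site is skipped only for a pair in which one of the two sequences has a gap or missing data there. The sketch below computes such a matrix for pre-aligned sequences. It is not MEGA's implementation, the choice of '-' and '?' as missing-data symbols is an assumption, and the neighbor-joining step itself is not reproduced.

```python
from itertools import combinations

MISSING = {"-", "?"}  # assumed gap/missing-data symbols


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, using pairwise deletion."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in MISSING or y in MISSING:
            continue  # pairwise deletion: skip this site for this pair only
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def distance_matrix(aligned: dict) -> dict:
    """Symmetric matrix of p-distances keyed by (name_a, name_b)."""
    matrix = {(name, name): 0.0 for name in aligned}
    for a, b in combinations(aligned, 2):
        d = p_distance(aligned[a], aligned[b])
        matrix[(a, b)] = matrix[(b, a)] = d
    return matrix


print(p_distance("ACGT-A", "ACTTAA"))  # → 0.2 (1 difference / 5 compared sites)
```

Under complete deletion, by contrast, a column containing a gap in any sequence would be removed for every pair, which shortens the comparable length for all distances.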
Our data support the view that the σ gene was lost in the ancestors of reptiles, birds and mammals after their divergence from amphibians [13,31], and that the κ gene was lost in birds after their divergence from other reptiles, as was the δ gene [39][40][41].
This study investigated the genomic organization of the Alligator sinensis IgL genes, whose organization and structure are similar to those of other jawed vertebrates. Analysis of the Alligator sinensis λ and κ loci revealed a diverse and complex IgL repertoire in crocodilians, and these data provide insight into the evolution of IgL genes in jawed vertebrates.
S8 Appendix. V-J junctions of the λ chain genes. The letter in the middle indicates N/P nucleotides. The column "N+P" gives the total length of the N and P nucleotides, and the column "CDR3" gives the number of codons. The column "Deletions in 3' end of Vλ" gives the number of nucleotides removed by exonuclease activity at the 3' end of Vλ, and the column "Deletions in 5' end of Jλ" gives the number of nucleotides removed by exonuclease activity at the 5' end of Jλ. Germline sequences of each Vλ gene segment are shown in bold above the cDNA clones, and the CDR3 is underlined. (DOCX)
S9 Appendix. V-J junctions of the κ chain genes. The letter in the middle indicates N/P nucleotides. The column "N+P" gives the total length of the N and P nucleotides, and the column "CDR3" gives the number of codons. The column "Deletions in 3' end of Vκ" gives the number of nucleotides removed by exonuclease activity at the 3' end of Vκ, and the column "Deletions in 5' end of Jκ" gives the number of nucleotides removed by exonuclease activity at the 5' end of Jκ. Germline sequences of each Vκ gene segment are shown in bold above the cDNA clones, and the CDR3 is underlined.
The phylogenetic tree was constructed using V domains. Each V subgroup is represented by one sequence per species, chosen at random among the functional genes. The scale bar represents genetic distance (number of nucleotide changes). The credibility value for each node is shown.
The phylogenetic tree was constructed using PHYLIP 3.695 [60] and viewed in TreeView [59]. (TIF)
S12 Fig. Phylogenetic analysis of the IgL chain V genes in jawed vertebrates. The phylogenetic tree was constructed from V domains by the neighbor-joining method, using p-distance and pairwise deletion in MEGA 6.0. (TIF) S1