Contribution of VH Replacement Products in Mouse Antibody Repertoire

VH replacement occurs through RAG-mediated recombination between the cryptic recombination signal sequence (cRSS) near the 3′ end of a rearranged VH gene and the 23-bp RSS from an upstream unrearranged VH gene. Due to the location of the cRSS, VH replacement leaves a short stretch of nucleotides from the previously rearranged VH gene at the newly formed V-D junction, which can be used as a marker to identify VH replacement products. To determine the contribution of VH replacement products to the mouse antibody repertoire, we developed a Java-based VH Replacement Footprint Analyzer (VHRFA) program and analyzed 17,179 mouse IgH gene sequences from the NCBI database to identify VH replacement products. The overall frequency of VH replacement products in these IgH genes is 5.29%, based on the identification of pentameric VH replacement footprints at their V-D junctions. The identified VH replacement products are distributed similarly among IgH genes using most families of VH genes, although the families themselves are used at different frequencies. The frequencies of VH replacement products are significantly elevated in IgH genes derived from several strains of autoimmune prone mice and in IgH genes encoding autoantibodies. Moreover, the VH replacement footprints identified in IgH genes from autoimmune prone mice or in IgH genes encoding autoantibodies preferentially encode positively charged amino acids. These results reveal a significant contribution of VH replacement products to the diversification of the antibody repertoire and, potentially, to the generation of autoantibodies in mice.


Introduction
The variable region exons of the immunoglobulin (Ig) genes are generated through sequential rearrangement of previously separated VH, DH (for heavy chain only), and JH gene segments, catalyzed by the recombination activating gene products (RAG1 and RAG2) [1][2][3][4][5]. The specific joining of VH, DH, and JH gene segments is directed by the recombination signal sequences (RSSs) [6,7]. Each RSS consists of a highly conserved heptamer and a nonamer, separated by a non-conserved spacer of either 12 bp or 23 bp [6][7][8][9]. Efficient recombination occurs between a gene segment flanked by a 12-bp RSS and one flanked by a 23-bp RSS [6,7]. After RAG-mediated cleavage, the resulting double-strand DNA breaks are repaired by the Non-Homologous End Joining (NHEJ) pathway [4,5]. The coding end hairpins are opened and re-joined to form the coding exon of the Ig gene, whereas the signal ends are ligated to form an excision circle and released from the chromosomal DNA [6,7].
Rearrangement of Ig heavy (IgH) chain genes starts with DH to JH recombination on one allele of the IgH locus in early progenitor (pro-) B cells, followed by recombination of a VH gene segment to the DJH joint in late pro-B cells [4,5]. If the rearrangement is non-functional, pro-B cells start to rearrange the second IgH allele [4,5]. Functionally rearranged IgH genes are expressed as μ heavy chains, which form pre-B cell receptors (pre-BCRs) with the non-rearranged components Vpre-B and λ5 [10][11][12][13][14][15]. Signaling from the pre-BCR stimulates pre-B cell proliferation and subsequent IgL gene rearrangement [14,15]. The IgL gene variable region exon is generated by a one-step rearrangement between a VL segment and a JL segment in small precursor (pre-) B cells [4,5,16]. Because the recombination process is random, two thirds of the V(D)J rearrangement products might be out of reading frame and cannot express functional Ig peptides. Even if the IgH gene rearrangements are productive, they might fail to pair with the surrogate or conventional light chains. B cells lacking functional pre-BCRs or B cell receptors (BCRs) cannot develop further along the B lineage pathway [14,17]. Moreover, functionally expressed BCRs may be self-reactive. In all these cases, early B lineage cells retain the ability to initiate secondary RAG-mediated recombination to alter the rearranged Ig genes, a process known as receptor editing [18][19][20].
Editing of rearranged IgL genes can occur through RAG-mediated secondary recombination between any upstream VL gene and a downstream JL gene [21][22][23][24][25][26]. The intervening DNA fragment containing the previously rearranged VLJL joint is deleted during the editing process [24][25][26]. As a default mechanism, pre-B cells with non-functional rearrangements on both Igκ alleles can initiate de novo rearrangements at the Igλ locus [26]. Accumulating studies indicate that non-functional or autoreactive IgH gene rearrangements can be edited through a VH replacement process [27][28][29][30][31][32][33]. VH replacement occurs through RAG-mediated recombination between a cryptic RSS (cRSS) embedded at the 3′ end of the rearranged VH gene and the 23-bp RSS of an upstream VH gene [31]. VH replacement was originally observed in murine pre-B cell leukemia cells, which generated functional IgH genes from non-functional IgH rearrangements [27,28]. The potential biological function of VH replacement in editing IgH genes encoding anti-DNA antibodies was demonstrated in a series of studies using engineered mouse models carrying knocked-in IgH V(D)J rearrangements encoding anti-DNA antibodies [29,34,35]. Later studies also provided evidence that VH replacement is employed to diversify the antibody repertoire in mice carrying knocked-in IgH genes encoding anti-NP antibodies [30,36] and to rescue B cells with non-functional IgH rearrangements on both alleles [32,33]. Despite these findings in engineered mice, evidence for ongoing VH replacement during B cell development in normal mice, and for a contribution of VH replacement products to the mouse antibody repertoire, was lacking for a long time [37,38].
Due to the location of the cRSS at the 3′ end of the VH germline gene, VH replacement renews almost the entire VH coding region but leaves a short stretch of nucleotides from the previously rearranged VH gene at the newly formed V-D junction [28,31]. These remnants can be used as VH replacement footprints to trace the occurrence of VH replacement and to identify potential VH replacement products by analyzing IgH gene sequences [31]. Our previous analysis of 412 human IgH gene sequences estimated that VH replacement products contribute about 5% of the primary B cell repertoire in humans [31]. A recent analysis of IgH genes generated in knock-in mice expressing IgH genes encoding anti-DNA antibodies showed that 7.5% of the newly generated IgH genes contain pentameric VH replacement footprints [39]. A similar frequency of VH replacement products was also found in IgH genes obtained from wild type B6 mice [39].
To explore the contribution of VH replacement products to the diversification of the mouse IgH repertoire, we developed a Java-based VH Replacement Footprint Analyzer (VHRFA) program and analyzed 17,179 mouse IgH gene sequences from the National Center for Biotechnology Information (NCBI) database to identify VH replacement products. The results reveal a significant contribution of VH replacement products to the murine IgH repertoire and an enrichment of VH replacement products in several strains of autoimmune prone mice.

The Mouse IgH Sequence Repertoire
To analyze a large number of IgH gene sequences and to identify potential VH replacement products, we developed a Java-based VH Replacement Footprint Analyzer (VHRFA) program. Using the VHRFA program, we analyzed 17,179 mouse IgH gene sequences from the NCBI database. First, the potential VH, DH, and JH germline gene usages were assigned with the IMGT/V-QUEST program, to which the VHRFA program submitted the sequences in batches (Table S1). Clonally identical sequences were then removed based on their IgH CDR3 region sequences. This left 11,309 unique IgH gene sequences, of which 10,159 have clearly identifiable VH, DH, and JH genes; 9,774 of these are productive and 373 are non-productive IgH rearrangements. In these IgH genes, different families of VH genes are used differentially (Fig. 1). About 65% of the functional IgH genes use the IGHV1/VHJ558 family of VH genes, and 911 (9.3%) use the IGHV5/VH7183 family. The other families of VH genes, including IGHV4/X-24, IGHV11/CP3, IGHV12/CH27, IGHV13/3609N, and IGHV15/VH15A, are used at much lower frequencies (Fig. 1A). Among the non-functional IgH rearrangements, the usages of most VH gene families are similar to those in functional IgH genes, but the usages of the IGHV5/VH7183 and IGHV3/36-60 gene families are increased (Fig. 1A). Among the DH genes, IGHD1-1 is the most frequently used, appearing in almost 39% of the IgH sequences (Fig. 1B). Among the JH genes, IGHJ2 is the most frequently used, appearing in 43% of the IgH genes (Fig. 1C). It should be noted that these 17,179 mouse IgH sequences were derived from about 861 published reports (Table S2), presumably from more than 861 experiments with different mice. This analysis therefore represents a comprehensive view of the IgH repertoire in the mouse IgH gene sequences currently available in the NCBI database.
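The clonal filtering and family-usage tally described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`id`, `vh_family`, `cdr3`), the toy sequences, and the rule "same CDR3 nucleotide sequence means same clone" are hypothetical stand-ins, not the criteria or data structures used by the VHRFA program.

```python
from collections import Counter

def unique_by_cdr3(records):
    """Keep the first record seen for each CDR3 nucleotide sequence."""
    seen = set()
    unique = []
    for rec in records:
        key = rec["cdr3"].upper()  # compare case-insensitively
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

def family_usage(records):
    """Return {family: fraction of records} for the VH family field."""
    counts = Counter(rec["vh_family"] for rec in records)
    total = sum(counts.values())
    return {family: n / total for family, n in counts.items()}

# Hypothetical toy records; real input would come from germline assignment output.
records = [
    {"id": "seq1", "vh_family": "IGHV1", "cdr3": "tgtgcaagagggaggacct"},
    {"id": "seq2", "vh_family": "IGHV1", "cdr3": "TGTGCAAGAGGGAGGACCT"},  # duplicate of seq1
    {"id": "seq3", "vh_family": "IGHV5", "cdr3": "tgtgcaagacatggtaac"},
    {"id": "seq4", "vh_family": "IGHV1", "cdr3": "tgtgcaagatcggactac"},
]

unique = unique_by_cdr3(records)
usage = family_usage(unique)
print(len(unique), usage)
```

Deduplicating before counting matters because repeated submissions of one clone would otherwise inflate the apparent usage of its VH family.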

Identification of VH Replacement Products
In the initial test, we used the VHRFA program to identify potential VH replacement products in 271 mouse IgH gene sequences described previously [40]. Among them, 252 unique IgH genes have clearly identifiable VH, DH, and JH germline genes. We then searched for VH replacement footprint motifs of 3, 4, 5, 6, or 7 nucleotides within the VH-DH junction (N1) regions of these IgH genes, because VH replacement can introduce footprints only in the N1 region. As an internal control, we searched for the same motifs in the DH-JH junction (N2) regions, where they are likely generated by random nucleotide addition. The frequencies of 3-, 4-, and 5-mer VH replacement footprint motifs in the N1 regions are significantly higher than those in the N2 regions (Table 1, top), suggesting that the presence of such motifs in the N1 region is not due to random nucleotide addition. Based on the identification of pentameric VH replacement footprints within the N1 regions, we estimate that the frequency of VH replacement products is 5.5% in these 252 mouse IgH gene sequences (Table 1, top). If we consider 4- or 3-mer VH replacement footprints in the N1 regions, the frequencies of VH replacement products in these 252 IgH genes are 21.2% or 38%, respectively (Table 1, top; the identified VH replacement products with 4-mer VH replacement footprints are shown in Table S5).
Further analysis of the 14 identified VH replacement products validated the assignment of VH replacement footprints by the VHRFA program (Table 2). Theoretically, VH replacement occurs when an upstream VH gene replaces a downstream rearranged VH gene. Among these 14 potential VH replacement products, 11 were likely generated through upstream VH genes replacing downstream VH genes; 3 did not follow this order (Table 2).
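The footprint search described in this section amounts to scanning a junction region for k-mers that also occur in the nucleotides following the cRSS at the 3′ end of VH germline genes. A minimal Python sketch of that idea is given below. It is not the VHRFA implementation: the function names, the gene labels `VH_A` to `VH_C`, and the tail and junction sequences are all hypothetical and chosen only to make the example small.

```python
def footprint_kmers(vh_tails, k):
    """Map every k-mer found in the VH 3' tails to the genes that carry it."""
    motifs = {}
    for gene, tail in vh_tails.items():
        tail = tail.lower()
        for i in range(len(tail) - k + 1):
            motifs.setdefault(tail[i:i + k], set()).add(gene)
    return motifs

def find_footprints(n_region, vh_tails, k=5):
    """Return (position, motif, candidate donor genes) for each k-mer hit."""
    motifs = footprint_kmers(vh_tails, k)
    n_region = n_region.lower()
    hits = []
    for i in range(len(n_region) - k + 1):
        kmer = n_region[i:i + k]
        if kmer in motifs:
            hits.append((i, kmer, sorted(motifs[kmer])))
    return hits

# Hypothetical nucleotides following the cRSS in three made-up germline genes.
VH_TAILS = {"VH_A": "caaga", "VH_B": "cgagaga", "VH_C": "tacag"}

n1 = "gggcgagacct"  # hypothetical VH-DH junction (N1)
n2 = "ccttac"       # hypothetical DH-JH junction (N2), used as the control

print("N1 5-mer hits:", find_footprints(n1, VH_TAILS, k=5))
print("N2 5-mer hits:", find_footprints(n2, VH_TAILS, k=5))
print("N2 3-mer hits:", find_footprints(n2, VH_TAILS, k=3))
```

In this toy input the N2 control has no 5-mer hit but does have a 3-mer hit, which illustrates why shorter motifs raise the estimated frequency and why the N2 region is a useful baseline.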

Contribution of VH Replacement Products to the Mouse IgH Repertoire
Next, we analyzed the 11,309 unique mouse IgH gene sequences from the NCBI database using the VHRFA program to search for VH replacement products. We performed separate analyses to identify VH replacement footprints of 3, 4, 5, 6, and 7 nucleotides in the VH-DH junction (N1) regions. As internal controls, we also searched for the same motifs in the DH-JH junction (N2) regions. The frequencies of identified VH replacement footprints of 3, 4, 5, 6, or 7 nucleotides in the N1 regions are significantly higher than those in the N2 regions (Table 1, bottom). These results indicate that the presence of these motifs in the N1 region is not due to random nucleotide addition. With a stringent setting that searches for pentameric VH replacement footprints in the N1 regions, 5.29% of the IgH genes contain such motifs and can be assigned as potential VH replacement products. If we consider VH replacement footprints of 4 or 3 nucleotides, 15.95% or 33.55% of the IgH genes, respectively, contain such motifs and can be assigned as potential VH replacement products (Table 1, bottom). These results reveal a significant contribution of VH replacement products to the diversification of the murine IgH repertoire.

Distribution of VH Replacement Products in IgH Genes Using Different Families of VH Genes
As shown above, different VH gene families are used at different frequencies in the 10,159 mouse IgH gene sequences. We next analyzed the distribution of the identified VH replacement products with 5-mer footprint motifs in IgH genes using different VH gene families. The frequency of VH replacement products in IgH genes using the VH2/Q52 genes is significantly higher than that in the overall mouse IgH sequences (Table 3). The frequencies of VH replacement products in IgH genes using the other VH gene families are quite similar. For example, although the IGHV1/VHJ558 and IGHV5/VH7183 families are used most frequently and the IGHV4/X-24, IGHV12/CH27, and IGHV14/SM7 families are used at very low frequencies, the frequencies of VH replacement products in IgH genes using these five families are similar (Table 3). These results indicate that although different families of VH genes are used differentially during primary V(D)J recombination, they are similarly targeted for secondary recombination during VH replacement. As an internal negative control, we analyzed the N1 regions of IgH genes using the DH-proximal VH5-2/7183.2 gene. Among the 56 functional IgH genes using the VH5-2/7183.2 gene, there are no pentameric VH replacement footprints in the N1 regions. This result supports the conclusion that the pentameric footprints found in the N1 regions of mouse IgH genes are contributed by VH replacement.

Enrichment of VH Replacement Products in IgH Genes Derived from Different Strains of Autoimmune Prone Mice and IgH Genes Encoding Autoantibodies
To explore the biological significance of VH replacement in mice, we analyzed the distribution of VH replacement products in IgH genes associated with different keywords in the NCBI database. Based on the identification of 5-mer VH replacement footprints within the N1 regions, the frequencies of VH replacement products in IgH genes derived from C57BL/6 and BALB/c mice are 3.17% and 5%, respectively (Fig. 2A and Table S6). These numbers may serve as the basal levels of VH replacement products in these mice. Compared with these strains, the frequencies of VH replacement products are highly elevated in IgH genes derived from different strains of autoimmune prone mice (Fig. 2A). In particular, the frequencies of VH replacement products are elevated in IgH genes derived from lupus prone NZB/NZW F1, NZM2410, MRL/lpr, and SLE1/SLE3 mice. In IgH genes derived from mice carrying the spontaneous Fas lpr mutation (MRL/MpJ-Lpr/Lpr), the frequency of VH replacement products is 15.38%. In IgH genes from the Sle1/Sle3 mice, the frequency of VH replacement products is 30%. These frequencies are significantly higher than those in the BALB/c or C57BL/6 mice (p < 0.05, two-tailed Chi-square test) (Fig. 2A). The elevated levels of VH replacement products in autoimmune prone mice suggest that VH replacement products contribute to the generation of autoantibodies. Indeed, further analyses of the IgH genes encoding different antibodies showed that the frequencies of VH replacement products are 12.1% in IgH genes encoding ANA antibodies and 9.34% in IgH genes encoding anti-DNA antibodies. These levels are significantly higher than those in the BALB/c or C57BL/6 mice.
As a negative control, the frequency of VH replacement products in IgH genes obtained from mice immunized with NP is 3.66%, which is similar to that in the C57BL/6 mice. Taken together, these results provide the first evidence that VH replacement products are highly enriched in IgH genes derived from different strains of autoimmune prone mice and in IgH genes encoding anti-DNA and ANA autoantibodies.
Using the VHRFA program, we also analyzed the frequencies of VH replacement products based on 4- or 3-mer VH replacement footprints in IgH genes derived from these disease subcategories. Extending the assignment of VH replacement products to include 4- and 3-mer VH replacement footprints clearly increases the frequencies of VH replacement products in IgH genes from all subcategories. When 4-mer VH replacement footprints are considered, the frequencies of VH replacement products in IgH genes derived from NZB/NZW, MRL/lpr, SLE1, and SLE1/SLE3 mice and in IgH genes encoding anti-DNA and ANA antibodies are significantly higher than that in the BALB/c mice (p < 0.05, two-tailed Chi-square test) (Fig. 2B). When 3-mer VH replacement footprints are considered, the frequencies of VH replacement products in IgH genes derived from NZB/NZW, NZM2410, MRL/lpr, SLE1, SLE1/SLE3, and NOD/NOR mice and in IgH genes encoding autoantibodies, anti-DNA antibodies, and ANA antibodies are significantly higher than that in the BALB/c mice (p < 0.05, two-tailed Chi-square test) (Fig. 2C). Taken together, these results show that VH replacement products are enriched in IgH genes derived from different strains of autoimmune prone mice and in IgH genes encoding autoantibodies.
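The strain comparisons above rest on a two-tailed Chi-square test on a 2x2 table (the Figure 2 legend specifies Yates' correction). As a worked illustration, the test can be written with the Python standard library alone, using the fact that the upper tail of a Chi-square distribution with one degree of freedom equals erfc(sqrt(x/2)). The counts in the example are hypothetical and are not taken from Table S6.

```python
import math

def chi_square_yates(a, b, c, d):
    """2x2 Chi-square test with Yates' continuity correction.

    Table layout: [[a, b], [c, d]], e.g. rows = two groups of IgH genes,
    columns = (with footprint, without footprint).
    Returns (chi2, p) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a nonzero total")
    # Yates' correction shrinks |ad - bc| by n/2, but never below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of the Chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Hypothetical counts: 12 of 78 sequences with a footprint in one strain,
# 50 of 1000 in the reference strain.
chi2, p = chi_square_yates(12, 66, 50, 950)
print(f"chi2 = {chi2:.2f}, p = {p:.4g}")
```

With small subcategories the continuity correction makes the test more conservative, which is relevant here because some strains contribute only a few dozen sequences.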

The Identified VH Replacement Footprints Preferentially Encode Charged Amino Acids
Our previous analysis of the identified VH replacement products in human IgH genes showed that the VH replacement footprints preferentially contribute charged amino acids to the IgH CDR3 regions [31]. Here, analysis of the identified VH replacement products from mouse IgH genes showed that 64% of the amino acids encoded by the identified VH replacement footprints are charged amino acids, including K, R, D, E, N, and Q. This frequency is significantly higher than the overall frequency of charged amino acids in the N1 regions (p < 0.0001) (Fig. 3A). Moreover, the frequencies of charged amino acids, including E, K, and R, encoded by the identified VH replacement footprints are significantly higher than those encoded by the N1 regions of non-VH replacement products (p < 0.0001) (Fig. 3B). The preferential contribution of charged amino acids by the VH replacement footprints seems to be predetermined by the sequences at the 3′ end of VH germline genes following the cRSS sites. The frequencies of charged amino acids encoded by the 3′ ends of VH germline genes, including K, R, D, E, N, and Q, are significantly higher than those encoded by the DH germline genes (p < 0.0001) (Fig. 3C). In non-functional IgH genes, the identified VH replacement footprints also preferentially encode charged amino acids, although the usages of different charged residues are slightly different from those in the functional VH replacement products (Fig. 3D). These results are consistent with previous findings that the VH replacement footprints identified in human or mouse VH replacement products preferentially encode charged residues [31,39].

Figure 2. Enrichment of VH replacement products in IgH genes derived from different strains of autoimmune prone mice and IgH genes encoding autoantibodies. The frequencies of VH replacement products in IgH genes derived from different strains of mice were analyzed using the VHRFA program based on the keyword linked to each IgH gene in the NCBI database. VH replacement products were assigned based on the identification of (A) 5-mer VH replacement footprints, (B) 4-mer VH replacement footprints, or (C) 3-mer VH replacement footprints within the VH-DH junctions (N1 regions). The frequencies of VH replacement products in different subcategories were compared with that in the BALB/c mice. n, number of IgH sequences in each subcategory. Statistical significance was determined using a two-tailed Chi-square test with Yates' correction. p < 0.05 (*) is considered significant and p < 0.0001 (**) is considered extremely significant. The detailed sequence analysis and the identified VH replacement products with 5-mer VH replacement footprints correlating with keywords are included in Table S6. doi:10.1371/journal.pone.0057877.g002

The 3-mer VH Replacement Footprints are Less Likely to Contribute Charged Amino Acids to the CDR3 Regions
VH replacement has been considered a receptor editing process that changes non-functional IgH rearrangements or IgH genes encoding autoantibodies [29,41]. The finding that the 5-mer VH replacement footprints preferentially encode charged amino acids, especially R and K residues, is in contrast to the original goal of VH replacement to eliminate autoreactive IgH genes, because charged residues within the IgH CDR3 might contribute to autoreactivity. Interestingly, when we analyzed the amino acids encoded by the identified 3-mer VH replacement footprints, the usages of charged residues, including R, K, and E, are significantly reduced; meanwhile, the usages of several neutral residues, including H, L, and Y, are significantly increased (Fig. 4A). These results show that shorter VH replacement footprints are less likely to encode charged residues.

VH Replacement Products have Longer CDR3 Lengths
During VH replacement, a short stretch of nucleotides from the previously rearranged VH gene is left within the newly generated VH-DJH junction [31]. Comparison of the IgH CDR3 lengths of the identified VH replacement products showed that the average CDR3 length of VH replacement products with 5-mer footprints is significantly longer than that of VH replacement products with 3-mer footprints, and that the average CDR3 length of VH replacement products with 3-mer footprints is in turn significantly longer than that of the total functional IgH genes in the NCBI database (p < 0.0001, unpaired t test) (Fig. 4B). These results indicate that elongation of the IgH CDR3 region is one of the intrinsic features of VH replacement.

Selection of VH Replacement Footprints Encoding Positively Charged Residues in Autoantibodies
The preferential contribution of charged amino acids by VH replacement footprints is likely predetermined by the 3′ end sequences of VH germline genes. Based on the 3′ end sequences of VH germline genes, VH replacement footprints can contribute almost equal numbers of positively and negatively charged residues (Fig. 5A). Indeed, in the identified VH replacement products from IgH genes derived from BALB/c or C57BL/6 mice, the frequencies of positively and negatively charged amino acids encoded by the VH replacement footprints are similar (Fig. 5A). However, in the identified VH replacement products in IgH genes from autoimmune prone mice, including MRL/lpr and Sle1/Sle3 mice, the frequencies of positively charged residues encoded by the VH replacement footprints are significantly higher than those in the control mice. Meanwhile, the frequencies of negatively charged residues encoded by the VH replacement footprints are significantly lower than those in the control mice (Fig. 5A). The frequencies of negatively charged residues encoded by the identified VH replacement footprints are also significantly lower in IgH genes derived from C57BL/6/lpr mice and in IgH genes encoding anti-DNA or ANA antibodies (Fig. 5A). Detailed analysis of the functional versus non-functional IgH genes derived from MRL/lpr mice showed that the frequencies of positively charged residues encoded by the identified VH replacement footprints are elevated in functional but not in non-functional IgH genes (Fig. 5B). These results indicate that the positively charged residues encoded by VH replacement footprints were positively selected in these autoimmune prone mice.
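The comparison of positively and negatively charged residues can be illustrated with a short tally. The sketch below assumes, for illustration only, that K and R count as positively charged and D and E as negatively charged; the residue strings are hypothetical and do not come from the analyzed sequences, and the paper's broader grouping of charged residues (which also lists N and Q) is not reproduced here.

```python
POSITIVE = set("KR")  # assumption: lysine and arginine
NEGATIVE = set("DE")  # assumption: aspartate and glutamate

def charge_fractions(residues):
    """Return (positive fraction, negative fraction) for a residue string."""
    residues = residues.upper()
    if not residues:
        raise ValueError("no residues given")
    pos = sum(r in POSITIVE for r in residues)
    neg = sum(r in NEGATIVE for r in residues)
    return pos / len(residues), neg / len(residues)

# Hypothetical pooled footprint-encoded residues for two groups of IgH genes.
control = "KDREGA"
autoimmune = "RRKGDA"

print("control:", charge_fractions(control))
print("autoimmune:", charge_fractions(autoimmune))
```

In the toy input the control group is balanced between positive and negative residues, while the second group is skewed toward positive residues, which is the pattern the section describes for autoimmune prone strains.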

The Identified VH Replacement Products are Mutated
The accumulation of VH replacement products in IgH genes derived from different strains of autoimmune prone mice and in IgH genes encoding different autoantibodies suggests that VH replacement products contribute to the generation of autoantibodies in mice. Analyses of the mutation status of the identified VH replacement products showed that the enriched VH replacement products in autoimmune prone mice and in IgH genes encoding anti-DNA or ANA autoantibodies are mutated (Fig. 5C), indicating that these VH replacement products are positively selected in these autoimmune prone mice.

Discussion
In the current report, we analyzed 17,179 mouse IgH gene sequences available from the NCBI database and provided a comprehensive view of the VH, DH, and JH gene usages of these mouse IgH genes. Based on the identification of pentameric VH replacement footprints in the N1 regions, we estimated that the frequency of VH replacement products in the 11,309 unique mouse IgH gene sequences with identifiable DH genes is 5.29%. This result indicates a significant contribution of VH replacement products to the diversification of the murine antibody repertoire and is consistent with the previously estimated frequencies of VH replacement products in human and mouse IgH genes [31,39]. It should be pointed out that this estimate is based on the identification of VH replacement footprints with a minimal length of 5 nucleotides. In comparison to human VH germline genes, many mouse VH germline genes have fewer nucleotides following the cRSS sites. Of the 150 functional mouse VH germline genes with cRSS sites, 60 have only 5 nucleotides following the cRSS site. If any exonuclease activity removes even one nucleotide from either the 3′ or the 5′ end of the VH replacement footprint during primary VH to DJH recombination or VH replacement recombination, respectively, the remaining footprint will have fewer than 5 nucleotides and cannot be identified in this analysis. Based on this consideration, assigning VH replacement footprints with 4 or 3 nucleotides might be a reasonable and accurate method to identify potential VH replacement products in mouse IgH genes. If we consider 4- or 3-mer VH replacement footprints in the N1 regions to assign VH replacement products, the frequencies of VH replacement products in the mouse IgH gene sequences are 15.95% or 33.55%, respectively.
Figure 3. [...] Table S6. (B) The frequencies of individual amino acids encoded by the identified VH replacement footprints or the N1 regions of non-VH replacement products were compared. n, amino acids encoded by the identified VH replacement footprints or the N1 regions of non-VH replacement products. (C) The frequencies of individual amino acids encoded by the 3′ ends of VH germline genes and DH regions were compared. n, amino acids encoded by the VH gene 3′ ends or DH regions. (D) Usages of different amino acids encoded by the identified VH replacement footprints in functional VH replacement products and non-functional VH replacement products. n, amino acids encoded by the identified VH replacement footprints. Statistical significance was determined using a two-tailed Chi-square test with Yates' correction. n, number of amino acid residues encoded by indicated sequences. p < 0.05 (*) is considered significant and p < 0.0001 (**) is considered extremely significant. doi:10.1371/journal.pone.0057877.g003

It has been shown previously that in mice carrying two non-functional alleles of IgH genes, VH replacement occurs efficiently to generate an almost normal number of B cells with a diversified repertoire [32,33]. All the functional IgH genes in these mice are generated through VH replacement. However, only about 20% of the IgH gene sequences contain potential VH replacement footprints (>3-mer); the other 80% have no identifiable VH replacement footprints [32,33]. This result indicates that most of the VH replacement footprints are deleted during VH replacement recombination. Thus, even when VH replacement footprints with a minimal length of 4 or 3 nucleotides are used, we may still underestimate the actual frequency of VH replacement products in the murine IgH repertoire. Theoretically, 66.7% of the IgH rearrangements generated during V(D)J recombination will be out of reading frame and cannot produce functional IgH proteins; about 44% of the pro-B cells undergoing V(D)J recombination should therefore carry non-functional rearrangements on both IgH alleles. If VH replacement can efficiently rescue these pro-B cells, at least 44% of the expressed IgH genes should be generated by VH replacement.
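The 44% figure follows from treating the two IgH alleles as independent attempts, each with a two-thirds chance of being out of frame. A small Python check with exact fractions makes the arithmetic explicit; the two-attempt, independent-allele model is a simplifying assumption used only for this illustration.

```python
from fractions import Fraction

p_out = Fraction(2, 3)  # chance that one V(D)J rearrangement is out of frame
p_in = 1 - p_out        # chance that it is in frame

first_allele = p_in            # productive on the first allele
second_allele = p_out * p_in   # first allele fails, second is productive
both_fail = p_out * p_out      # non-functional rearrangements on both alleles

# The three outcomes are exhaustive under this model.
assert first_allele + second_allele + both_fail == 1

print("both alleles non-functional:", both_fail, f"({float(both_fail):.1%})")
```

Under these assumptions the fraction of cells with two non-functional alleles is exactly 4/9, which is the source of the rounded 44% quoted in the text.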
We should also point out that this sequence analysis-based approach to identifying VH replacement footprints may produce false positive calls. Theoretically, there are no VH replacement footprints in the N2 regions. In some of the IgH sequences, however, we identified 3-, 4-, or 5-mer VH replacement footprint motifs in the N2 regions, although the frequencies of such motifs in the N2 regions are significantly lower than those in the N1 regions. The presence of such motifs in the N2 regions could be due to random nucleotide addition during V(D)J recombination. In this regard, a small fraction of the identified footprints might be false positives.
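To give a feel for how often a footprint-like motif could arise by chance, the sketch below enumerates every possible junction sequence of a given length and counts those containing at least one motif. It assumes that all four nucleotides are equally likely at each position, which is a simplification (untemplated additions are not uniform), and the motif lists are hypothetical.

```python
from itertools import product

def chance_hit_fraction(motifs, length):
    """Fraction of all 4**length sequences containing at least one motif."""
    motifs = [m.lower() for m in motifs]
    hits = 0
    total = 0
    for letters in product("acgt", repeat=length):
        seq = "".join(letters)
        total += 1
        if any(m in seq for m in motifs):
            hits += 1
    return hits / total

# Hypothetical motif sets of different lengths.
five_mers = ["caaga", "cgaga", "tacag"]
three_mers = ["caa", "aga", "tac"]

for length in (5, 6, 7, 8):
    print(length,
          round(chance_hit_fraction(five_mers, length), 5),
          round(chance_hit_fraction(three_mers, length), 5))
```

Longer junctions and shorter motifs both raise the chance of a spurious match, which is one reason the N2 region serves as an empirical control for the N1 frequencies.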
If we use 5-mer VH replacement footprints to assign VH replacement products, the frequencies of VH replacement products in IgH genes derived from BALB/c or C57BL/6 mice are about 5% or 3.2%, respectively, which may represent the basal levels of VH replacement products in these two strains of mice. Interestingly, the frequencies of VH replacement products are significantly elevated in IgH genes derived from different strains of autoimmune prone mice, including MRL/lpr and Sle1/Sle3 mice. It has been well demonstrated that these mice spontaneously produce anti-DNA antibodies or ANA and develop lupus-like symptoms [42][43][44][45][46][47][48][49]. Indeed, VH replacement products are significantly elevated in IgH genes encoding anti-DNA antibodies or ANA autoantibodies derived from mice with lupus glomerular nephritis. These results suggest a potential contribution of VH replacement products to the generation of autoantibodies. When we consider 4- or 3-mer VH replacement footprints to assign VH replacement products, the frequencies of VH replacement products are elevated in all the subcategories of IgH genes. Nevertheless, the frequencies of VH replacement products in IgH genes derived from different strains of autoimmune prone mice and in IgH genes encoding anti-DNA and ANA antibodies remain significantly higher than that in the BALB/c mice.
Due to the location of the cRSS, VH replacement leaves a short footprint that elongates the IgH CDR3 region [31,41]. Strikingly, the identified pentameric VH replacement footprints preferentially encode charged amino acids in the newly formed CDR3 regions. Such features are commonly found in VH replacement products identified from human and mouse IgH genes [31,39] and are highly conserved in all the jawed vertebrates [50]. IgH genes with long CDR3 regions and charged residues frequently encode autoantibodies or anti-viral antibodies [51]. Here, our results show that the frequencies of VH replacement products are significantly elevated in IgH genes encoding anti-DNA and ANA autoantibodies in mice. Theoretically, the VH replacement footprints can encode either positively or negatively charged residues. Analysis of the amino acids encoded by the identified VH replacement products from different strains of autoimmune prone mice and from IgH genes encoding autoantibodies showed that the frequencies of positively charged residues encoded by VH replacement footprints are significantly elevated, while the frequencies of negatively charged residues encoded by VH replacement footprints are significantly reduced. Previous studies have shown that positively charged residues such as Arg within the IgH CDR3 are critical for DNA binding [52][53][54]. These results suggest that the identified VH replacement products from autoimmune prone mice have been positively selected. This notion is also supported by the accumulated mutations in these identified VH replacement products.
V H replacement was originally recognized as a receptor editing process to change either non-functional IgH genes or IgH genes encoding autoreactive antibodies [20,55]. The enrichment of V H replacement products in IgH genes from different strains of autoimmune prone mice and in IgH genes encoding autoantibodies is a surprising finding of this study. Currently, it is not clear why V H replacement products accumulate in autoimmune prone mice. Like any recombination process, V H replacement is a random process that can generate non-functional IgH genes or IgH genes encoding autoreactive antibodies.
Previous studies have shown that V H replacement products generated through replacing the knocked-in anti-DNA IgH genes can produce high affinity anti-DNA antibodies during chronic graft-versus-host (cGVH) response [56]. Theoretically, after V H replacement recombination, the newly generated IgH genes should again be subjected to strict negative selection to eliminate B cells expressing autoreactive BCRs. The observed accumulation of V H replacement products in autoimmune prone mice could be due to defective negative selection processes in these mice. In autoimmune prone mice, the newly generated V H replacement products encoding autoreactive antibodies cannot be efficiently eliminated, but are rather positively selected and contribute to the generation of autoantibodies. To this end, the different strains of autoimmune prone mice will be excellent experimental models to dissect how V H replacement products are selected and enriched during early B cell development.
Our analyses of the amino acid residues encoded by the identified V H replacement footprints also uncovered an interesting finding: short V H replacement footprints, especially the 3-mer footprints, encode fewer charged residues. These results suggest that if the V H replacement footprints are trimmed down to 3-mers during either primary or secondary recombination, they will be less likely to contribute charged amino acids to the IgH CDR3 regions. Given that 33.55% of IgH genes contain 3-mer V H replacement footprints in their N1 regions, it is reasonable to conclude that the majority of these V H replacement products successfully edited the IgH genes without introducing extra charged residues into the newly formed CDR3 regions. The observed accumulation of V H replacement products, based on the identification of 5-mer footprints in the N1 regions, in IgH genes derived from autoimmune prone mice may represent failed V H replacement attempts due either to defects in negative selection or to defects in trimming down the V H replacement footprints during primary or secondary recombination. These findings raise several interesting questions that require further study.
In conclusion, analysis of a large number of mouse IgH gene sequences from the NCBI database provides a comprehensive view of the available mouse IgH repertoire and reveals a significant contribution of V H replacement products to the diversification of the mouse IgH repertoire. The identification of enriched V H replacement products in IgH genes derived from different strains of autoimmune prone mice and in IgH genes encoding autoantibodies indicates that abnormal regulation of V H replacement may contribute to the generation of autoreactive antibodies.

Mouse IgH Sequences
Entrez IDs of mouse IgH sequences were obtained from IgBLAST (http://www.ncbi.nlm.nih.gov/projects/igblast/) on May 07, 2011 and were used to download the GenBank records of the sequences from NCBI. A total of 17,179 mouse IgH gene sequences were retrieved at that time. The IDs of these IgH genes and their V H , D H , and J H gene assignments are included in Table S1. After assignment of the potential germline V H , D H , and J H genes, clonally redundant sequences were removed based on their identical CDR3 regions. The resulting 11,308 unique sequences were further analyzed. Clonally related sequences with mutations within their CDR3 regions still remain. The 17,179 mouse IgH sequences were derived from 861 published studies (Table S2). There were 1, 2, 4, 4, and 6 publications that contributed more than 500, 400-499, 300-399, 200-299, and 100-199 sequences, respectively; 127 publications contributed 11-99 sequences; and 717 publications contributed 10 or fewer sequences.
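The redundancy filter described above can be illustrated with a minimal Python sketch. It assumes that each sequence is represented as a dictionary with hypothetical `id` and `cdr3` fields and that the first record seen for each CDR3 nucleotide sequence is the one retained; neither detail is specified in the text, and the example sequences are invented.

```python
def drop_clonal_duplicates(records):
    """Keep the first record seen for each distinct CDR3 nucleotide sequence.

    Assumption: 'identical CDR3 regions' means an exact, case-insensitive
    match of the CDR3 nucleotide string.
    """
    seen = set()
    unique = []
    for rec in records:
        key = rec["cdr3"].upper()
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Invented example records (not real GenBank entries).
records = [
    {"id": "seq1", "cdr3": "GCAAGAGGGGACTAC"},
    {"id": "seq2", "cdr3": "gcaagaggggactac"},  # same CDR3 as seq1: removed
    {"id": "seq3", "cdr3": "GCAAGAGGGGATTAC"},  # differs by one base: kept
]
unique_records = drop_clonal_duplicates(records)
print([rec["id"] for rec in unique_records])
```

Because only exact matches are collapsed, a clonally related sequence carrying a single CDR3 mutation (seq3 here) is retained, which is consistent with the statement that such sequences still remain in the data set.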

The V H RFA Program
We developed a Java-based V H RFA program to incorporate assignments of the V H , D H , and J H germline gene segments using the V-QUEST program (http://www.imgt.org/IMGT_vquest), identification of V H replacement footprints with different lengths, analysis of amino acids encoded by the identified V H replacement footprints, calculation of the amino acid usage encoded by the identified V H replacement footprints, and correlation of the identified V H replacement products with different keywords and publications associated with the sequences in the NCBI database.

V H , D H , and J H Germline Gene Assignment
Mouse IgH sequences in the GenBank format were converted to FASTA format and submitted to IMGT/V-QUEST (http://www.imgt.org/IMGT_vquest/share/textes/) to assign the potential germline V H , D H , and J H genes, allowing 1 mutation at the 3′ end of V H genes and at the 5′ end of J H genes. All the IgH gene sequences were analyzed in batches of 50 sequences, and the results were downloaded to a local computer as Excel files. These processes were conducted using the V H RFA program.
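The batching step can be sketched as follows. This is an illustrative Python fragment, not the V H RFA code: the function names are hypothetical, sequences are assumed to be available as (identifier, nucleotide string) pairs, and the submission to IMGT/V-QUEST itself is not shown.

```python
def batches(items, size=50):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def to_fasta(records):
    """Format (identifier, sequence) pairs as FASTA text."""
    return "".join(f">{seq_id}\n{seq}\n" for seq_id, seq in records)


# Placeholder records standing in for the 17,179 retrieved sequences.
placeholder = [(f"seq{i}", "ACGT") for i in range(17179)]
all_batches = list(batches(placeholder, size=50))
print(len(all_batches), len(all_batches[-1]))
print(to_fasta(all_batches[0][:2]))
```

With 50 sequences per batch, 17,179 sequences fill 343 complete batches and leave a final batch of 29.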

Identification of V H Replacement Footprint
All the remaining steps were conducted on a local computer by the V H RFA program. First, a library file was generated that contains all the potential V H replacement footprints derived from functional V H germline reference genes from the IMGT database (Table S3). Basically, the 3′ end segments following the cRSS sites of functional mouse V H genes were sliced into groups of 3, 4, 5, 6, 7, 8, 9, 10, and 11 nucleotides in length (Table S4). The V H RFA program uses this library to search the V H -D H junction (N1) or D H -J H junction (N2, as negative control) regions of the IgH genes to identify matched footprint motifs. For each IgH gene, the V H RFA program starts by searching for the longest footprint motifs (11-mer) from the 5′ to the 3′ end of the DNA sequence and then searches for footprints one nucleotide shorter. An identified footprint is listed if it does not overlap with any previously identified footprint within this region. For example, the end result of a footprint analysis with a specified length of 5-mer includes all the footprints of 5, 6, 7, 8, 9, 10, and 11 nucleotides from the V H replacement footprint library. The end result was exported as a CSV file that contains the gene ID, functionality, V H , D H , and J H gene assignments, and V H replacement footprints in N1 (N1 signatures) or N2 (N2 signatures), together with other information from the original Excel file provided by the IMGT/V-QUEST program. The identified footprints are shown in parentheses within the N1 or N2 region sequences.
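The library construction and the longest-first, non-overlapping search can be sketched in Python as follows. This is an illustration under stated assumptions rather than the V H RFA source code: the function names are hypothetical, the library is assumed to contain every contiguous substring of each 3′ end segment (the text does not specify how the segments were sliced), and the example 3′ end segment and junction region are invented.

```python
def build_library(tails, min_len=3, max_len=11):
    """Map each length k to the set of k-mers found in the V H 3' end segments.

    Assumption: every contiguous substring of a segment is a candidate motif.
    """
    library = {k: set() for k in range(min_len, max_len + 1)}
    for tail in tails:
        tail = tail.upper()
        for k in library:
            for i in range(len(tail) - k + 1):
                library[k].add(tail[i:i + k])
    return library


def find_footprints(region, library, min_len=5, max_len=11):
    """Scan 5' to 3' for the longest motifs first, then progressively shorter
    ones, keeping a hit only if it does not overlap an earlier hit."""
    region = region.upper()
    hits = []
    for k in range(max_len, min_len - 1, -1):
        motifs = library.get(k, set())
        for start in range(len(region) - k + 1):
            end = start + k
            overlaps = any(start < e and s < end for s, e, _ in hits)
            if not overlaps and region[start:end] in motifs:
                hits.append((start, end, region[start:end]))
    return sorted(hits)


def mark(region, hits):
    """Return the region with each footprint wrapped in parentheses."""
    region = region.upper()
    out, pos = [], 0
    for start, end, motif in hits:
        out.append(region[pos:start])
        out.append(f"({motif})")
        pos = end
    out.append(region[pos:])
    return "".join(out)


# Invented 3' end segment and N1 region, for illustration only.
library = build_library(["TGTGCAAGAGA"])
n1_region = "GGCAAGATT"
n1_hits = find_footprints(n1_region, library, min_len=5)
print(n1_hits, mark(n1_region, n1_hits))
```

In this example the 6-mer is found before any 5-mer is tried, so the shorter motifs it contains are rejected as overlapping and the region is reported with a single footprint. Running the same search on the N2 region gives the negative control described in the text.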

Analysis of the Amino Acid Encoded by V H Replacement Footprints, Keyword and Publication Linked to Each Gene, and Mutation
After identification of the V H replacement footprints within the N1 regions, the V H RFA program further analyzed the amino acids encoded by the V H replacement footprints and the usage of different amino acids. Each result was exported as an individual Excel file.
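A minimal Python sketch of this step is given below. It is not the V H RFA implementation and rests on several assumptions that the text does not state: the junction sequence is supplied in frame (position 0 is the first base of a codon), every complete codon that overlaps the footprint is counted, and only Lys and Arg are treated as positively charged and Asp and Glu as negatively charged. The example sequence is invented.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

POSITIVE = set("KR")  # assumption: histidine is not counted as charged
NEGATIVE = set("DE")


def footprint_residues(coding_seq, start, end):
    """Translate every complete codon overlapping the footprint [start, end).

    Assumption: coding_seq is in frame, so codons begin at 0, 3, 6, ...
    """
    coding_seq = coding_seq.upper()
    first_codon = (start // 3) * 3
    residues = []
    for i in range(first_codon, end, 3):
        codon = coding_seq[i:i + 3]
        if len(codon) == 3:
            residues.append(CODON_TABLE.get(codon, "X"))
    return "".join(residues)


def charge_summary(residues):
    """Count positively and negatively charged residues."""
    counts = Counter(residues)
    return {
        "positive": sum(counts[aa] for aa in POSITIVE),
        "negative": sum(counts[aa] for aa in NEGATIVE),
        "total": len(residues),
    }


# Invented in-frame sequence: TGT GCA AGA GAT TAC; footprint CAAGA at 4..9.
example_seq = "TGTGCAAGAGATTAC"
example_residues = footprint_residues(example_seq, 4, 9)
print(example_residues, charge_summary(example_residues))
```

Counting every overlapping codon means that a residue is attributed to the footprint even when only one or two of its bases come from the footprint; a stricter rule would count only codons lying entirely inside it.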
The V H RFA program can also analyze the original GenBank file to correlate the keywords and publication information with each IgH gene sequence. Basically, the V H RFA program parses the source GenBank file for keywords in the KEYWORDS and FEATURES sections of each sequence entry and outputs the keyword list together with the sequence IDs, VDJ assignments, N1 footprints, and N2 footprints. Through this analysis, we can determine the distribution of V H replacement products in different diseases.
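The keyword extraction can be illustrated with a small Python sketch that reads the KEYWORDS field of GenBank flat-file records. It is a simplified stand-in for the V H RFA parser: the function name is hypothetical, the FEATURES section is not handled, and the example record (including its accession) is invented.

```python
def parse_keywords(genbank_text):
    """Return {accession: [keywords]} from GenBank flat-file text.

    Relies on the flat-file layout: field names start in column 1,
    continuation lines are indented, and each record ends with '//'.
    """
    result = {}
    accession, keyword_lines, in_keywords = None, [], False
    for line in genbank_text.splitlines():
        if line.startswith("ACCESSION"):
            parts = line.split()
            accession = parts[1] if len(parts) > 1 else None
            in_keywords = False
        elif line.startswith("KEYWORDS"):
            keyword_lines = [line[len("KEYWORDS"):].strip()]
            in_keywords = True
        elif in_keywords and line.startswith(" "):
            keyword_lines.append(line.strip())  # continuation of KEYWORDS
        elif line.startswith("//"):
            text = " ".join(keyword_lines).rstrip(".")
            result[accession] = [k.strip() for k in text.split(";") if k.strip()]
            accession, keyword_lines, in_keywords = None, [], False
        else:
            in_keywords = False
    return result


# Invented record for illustration only.
example_record = "\n".join([
    "LOCUS       X00000                   360 bp    mRNA    linear   ROD 01-JAN-2000",
    "ACCESSION   X00000",
    "KEYWORDS    anti-DNA antibody; immunoglobulin heavy chain;",
    "            autoantibody.",
    "SOURCE      Mus musculus",
    "//",
])
keywords_by_id = parse_keywords(example_record)
print(keywords_by_id)
```

The resulting dictionary can be joined to the footprint table by sequence ID, so that footprint frequencies can be compared between keyword groups such as anti-DNA antibodies and other sequences.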
For mutation analysis, the V H RFA program only calculated the mutation rate of IgH V H genes with >80% similarity to the assigned germline V H genes.
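The similarity filter can be sketched in Python as follows. This is an assumption-laden illustration, not the V H RFA calculation: the function name is hypothetical, the query and germline V H sequences are assumed to be already aligned to equal length, similarity is taken to be the fraction of identical compared positions, and gap or ambiguous positions are skipped.

```python
def mutation_rate(query, germline, min_identity=0.80):
    """Return mismatches per compared base, or None if identity <= min_identity.

    Assumption: both sequences are pre-aligned and of equal length;
    positions containing '-', '.', or 'N' in either sequence are ignored.
    """
    if len(query) != len(germline):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for q, g in zip(query.upper(), germline.upper()):
        if q in "-.N" or g in "-.N":
            continue
        compared += 1
        if q != g:
            mismatches += 1
    if compared == 0:
        return None
    identity = 1 - mismatches / compared
    if identity <= min_identity:
        return None  # too divergent: excluded from the mutation analysis
    return mismatches / compared


# Invented sequences: one mismatch in ten bases, then three in ten.
rate_kept = mutation_rate("ACGTACGTAT", "ACGTACGTAC")
rate_excluded = mutation_rate("TTTTACGTAC", "ACGTACGTAC")
print(rate_kept, rate_excluded)
```

Excluding sequences at or below the threshold guards against counting a wrong germline assignment as a high mutation load.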

Statistical Analysis
Statistical significance was determined using either the two-tailed Chi-square test with Yates' correction or the unpaired Student's t test. A difference was considered significant if the p value was <0.05.
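For a 2 × 2 table, the Chi-square test with Yates' correction can be computed as in the Python sketch below. The function name and the example counts are hypothetical and are not taken from the data of this study; the p value uses the fact that a Chi-square statistic with one degree of freedom is the square of a standard normal variable.

```python
import math


def yates_chi_square(a, b, c, d):
    """Chi-square with Yates' continuity correction for the table [[a, b], [c, d]].

    Returns (chi2, p), where p is the two-tailed p value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    chi2 = n * corrected ** 2 / (row1 * row2 * col1 * col2)
    p = math.erfc(math.sqrt(chi2 / 2))  # survival function of chi-square, df = 1
    return chi2, p


# Hypothetical counts: footprint-positive and footprint-negative sequences
# in two groups of 1,000 sequences each.
chi2, p = yates_chi_square(50, 950, 100, 900)
print(f"chi2 = {chi2:.3f}, p = {p:.2e}, significant = {p < 0.05}")
```

Rows correspond to the two groups being compared and columns to sequences with and without an identified footprint.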

Supporting Information
Table S1 Analyses of mouse IgH genes and identification of VH replacement products. (XLSX)