Comparison of Antibody Repertoires Produced by HIV-1 Infection, Other Chronic and Acute Infections, and Systemic Autoimmune Disease

Background
Antibodies (Abs) produced during HIV-1 infection rarely neutralize a broad range of viral isolates; only eight broadly-neutralizing (bNt) monoclonal (M)Abs have been isolated. Yet, to be effective, an HIV-1 vaccine may have to elicit the essential features of these MAbs. The V genes of all of these bNt MAbs are highly somatically mutated, and the VH genes of five of them encode a long (≥20 aa) third complementarity-determining region (CDR-H3). This led us to question whether long CDR-H3s and high levels of somatic mutation (SM) are a preferred feature of anti-HIV bNt MAbs, or if other adaptive immune responses elicit them in general.
Methodology and Principal Findings
We assembled a VH-gene sequence database from over 700 human MAbs of known antigen specificity isolated from chronic (viral) infections (ChI), acute (bacterial and viral) infections (AcI), and systemic autoimmune diseases (SAD), and compared their CDR-H3 length, number of SMs and germline VH-gene usage. We found that anti-HIV Abs, regardless of their neutralization breadth, tended to have long CDR-H3s and high numbers of SMs. However, these features were also common among Abs associated with other chronic viral infections. In contrast, Abs from acute viral infections (but not bacterial infections) tended to have relatively short CDR-H3s and a low number of SMs, whereas SAD Abs were generally intermediate in CDR-H3 length and number of SMs. Analysis of VH gene usage showed that ChI Abs also tended to favor distal germline VH-genes (particularly VH1-69), especially in Abs bearing long CDR-H3s.
Conclusions and Significance
The striking difference between the Abs produced during chronic vs. acute viral infection suggests that Abs bearing long CDR-H3s, high levels of SM and VH1-69 gene usage may be preferentially selected during persistent infection.


Introduction
A highly diverse repertoire of antibodies (Abs) is a prerequisite for the adaptive immune system to recognize a vast array of antigens (Ags) and distinguish self from non-self. Three processes contribute to the production of this diverse repertoire: (i) somatic recombination of germline V, D and J genes, (ii) addition and deletion of nucleotides at the V-D, D-J, and V-J junctions, and (iii) somatic hypermutation after Ag stimulation [1,2]. The third complementarity-determining region of the Ab heavy chain (CDR-H3) is encoded by the DH gene, parts of the VH and JH genes, and nucleotides added at the junctions between them; it is the most variable region of the Ab and is typically central to contact with cognate Ag [3].
A major goal for an HIV vaccine is to elicit Abs that neutralize a broad range of HIV-1 primary isolates. To this end, efforts have been made to identify broadly (b) neutralizing (Nt) monoclonal (M) Abs with this activity and to use them for epitope-targeted vaccine design [4]. The bNt MAbs identified so far are rare, and most of them bear unusually long CDR-H3s. Despite intensive effort, only eight bNt MAbs have been discovered (b12, 2F5, 4E10, 2G12, 447-52D, PG9/PG16, VRC01/02, and HJ16 [5,6,7,8,9]), five of which bear CDR-H3s of 20 aa or more, based on the IMGT numbering system. Consistent with this rarity, most HIV-1-infected individuals produce strong strain-specific Nt Ab responses against HIV-1 envelope (Env) soon after initial infection, yet they rarely develop broad neutralization [10,11], and then only after a year or more [12].
High levels of somatic mutation (SM) have been noted for all bNt MAbs, starting with Kunert et al. [13], and a number of authors have proposed a connection between the length of the CDR-H3 region and the broad neutralization of these MAbs [5,7,14,15,16,17,18,19,20,21]. Mutagenesis experiments and/or X-ray crystal structures of Fabs bound to protein or peptide Ag have implicated the CDR-H3s of b12 [15,22], 2F5 [16,23,24], 447-52D [17], and 4E10 [18,25] as being required for neutralization. This has been observed even in cases in which CDR-H3 appears to make minimal or no contact with the envelope protein Ag [16,23]. It has been speculated that in these cases CDR-H3 may contact other sites on HIV-1, such as the viral membrane [16,18,19,20,24,25,26]. Nevertheless, it is not clear whether long CDR-H3s are required, in general, for broad neutralization; certainly the exceptions, MAbs 2G12 and VRC01/02, disprove this as an absolute rule for broad Nt activity.
While it is generally acknowledged that high levels of SM are produced by T-cell-driven processes in germinal centers, the conditions under which Abs with long CDR-H3s appear in adaptive immune responses are less well understood; clarifying them could help explain the origin of bNt Abs during HIV infection. Notably, long CDR-H3s have been associated with anti-protein Abs [27] and anti-viral Abs [28]. Long CDR-H3s found among polyreactive natural Abs [29,30] and among autoreactive Abs produced by naïve B cells in SLE patients [31] do not carry SMs, whereas those found among memory B cells in healthy people do [32]. However, analyses directly associating different types of adaptive immune response with CDR-H3 length have not been reported. Our purpose was to identify the circumstances under which Abs bearing some of the features of bNt MAbs (viz., long CDR-H3s and high levels of SM) appear during adaptive immune responses in humans.
As an initial approach, we examined heavy chain variable (VH) genes expressed by individuals undergoing Ag-specific immune responses, by compiling a database of expressed human VH genes from MAbs of known Ag specificity. The MAbs were taken from individuals with chronic infections (ChI), acute infections (AcI) or following immunization (grouped with the AcI MAbs), and systemic autoimmune diseases (SAD). CDR-H3 length, level of SM, and VH-gene usage were compared among MAbs with specificity for self vs. non-self Ag, with specificity for protein vs. non-protein Ag, and/or from different conditions (ChI, SAD and AcI MAbs).
Both long CDR-H3s and SMs were strongly associated with protein Ag. Long CDR-H3s were at their highest frequency among ChI MAbs, and less so among SAD and AcI MAbs, whereas SMs were more prevalent in ChI and SAD anti-protein Abs and greatly reduced among anti-protein AcI Abs. Both ChI and AcI Abs tended to use distal VH genes; the use of VH1-69 was especially high among anti-HIV Abs and was associated with high levels of SM and long CDR-H3s. The picture emerging from this analysis is that Abs bearing high numbers of SMs, long CDR-H3s and the distal gene VH1-69 appear to be selected in chronic rather than acute viral infections. Thus different biological processes, and perhaps different B-cell subsets, such as marginal zone vs. conventional B2 B cells [33,34] (see Baumgarth [35] for review), could be involved in the earlier vs. later stages of viral infection, respectively.

Results
Table 1 summarizes the analysis of expressed VH genes for several MAb categories with regard to CDR-H3 length, number of SMs relative to the predicted germline VH gene, and the distance in the IgH locus between an Ab's germline VH gene and VH6-1, the VH gene closest to the DH region. As expected, the bNt HIV MAbs had the longest CDR-H3s, with an average length of 20.9 aa. Fig. 1 compares the CDR-H3 length distributions for the bNt HIV MAbs, the non-bNt HIV MAbs (i.e., all HIV MAbs except the bNt ones), and the remaining ChI MAbs (excluding all HIV MAbs). Both the bNt and the non-bNt HIV MAbs have long CDR-H3s; the difference between these two groups was only marginally significant (Table 1A; unadjusted p = 0.0501) and was not significant against the Bonferroni-corrected threshold of 0.0018. Thus, long CDR-H3s do not appear to be restricted to broadly neutralizing MAbs. In addition, the CDR-H3s of the anti-protein HIV MAbs as a whole (mean 17.8 aa) were not significantly longer than those of non-HIV ChI MAbs (16.5 aa).
Although our data set has only 34 non-HIV ChI MAbs, 12 of these have CDR-H3s of 19 aa or longer (Fig. 1), placing them in the upper quartile of the 427 Ag-specific MAbs. Thus, long CDR-H3s were associated with all types of ChI MAbs, including the anti-HIV MAbs. Fig. 2 shows the distribution of CDR-H3 length for anti-HIV MAbs, partitioned according to the region of Env bound; mean CDR-H3 length was longest for MAbs against the CD4i site (19.6 aa), intermediate for those against the V3 loop (18.5 aa) and CD4bs (18.3 aa), and shortest for the anti-gp41 MAbs (15.9 aa). These distributions were significantly different (χ2 test, p<0.05, PROC FREQ, SAS), indicating that, while the anti-HIV MAbs as a whole bear long CDR-H3s, epitope specificity also shapes CDR-H3 length.
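The heterogeneity test above was run with SAS PROC FREQ. As a rough illustration of this kind of test, here is a minimal stdlib-only Python sketch of a Pearson χ2 test of independence on a table of MAb counts (rows: epitope groups; columns: CDR-H3 length classes). The binning into length classes, the function names, and the counts in the example are hypothetical and are not the study's data. The p value comes from a series expansion of the incomplete gamma function, which is adequate for moderate χ2 values, rather than from a statistics library.

```python
import math


def chi2_sf(stat: float, dof: int) -> float:
    """Upper-tail probability of the chi-square distribution.

    Uses the series for the regularized lower incomplete gamma function,
    which is adequate for the moderate statistics seen in small tables.
    """
    if stat <= 0:
        return 1.0
    a, x = dof / 2.0, stat / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15 and n < 10_000:
        term *= x / (a + n)
        total += term
        n += 1
    lower = math.exp(-x + a * math.log(x) - math.lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def chi2_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Assumes every row and column total is > 0.
    Returns (statistic, degrees of freedom, p value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof, chi2_sf(stat, dof)


# Hypothetical counts (NOT the study's data): epitope group ->
# number of MAbs with short / medium / long CDR-H3s.
toy_counts = {
    "CD4i": [2, 6, 8],
    "V3 loop": [3, 10, 9],
    "CD4bs": [5, 15, 12],
    "gp41": [10, 12, 4],
}
stat, dof, p = chi2_independence(list(toy_counts.values()))
print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p:.4f}")
```

With four epitope groups and three length classes the test has (4 − 1)(3 − 1) = 6 degrees of freedom. A 2 × 2 table with identical rows gives χ2 = 0 and p = 1.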

CDR-H3 length of expressed VH genes in adaptive Ab responses
The observation that long CDR-H3s are associated with chronic viral infections in general led us to compare MAbs from such infections to those from other immune responses, namely SAD and AcI. Categorizing MAbs by type of immune response showed that the average CDR-H3 length of the ChI MAbs (17.6 aa; Table 1B and Fig. 3A) differed most from that of the AcI MAbs (14.7 aa), with the SAD MAbs being intermediate (15.1 aa). This trend, with the AcI MAbs being the most different from the ChI MAbs, persisted when the categories were further divided into Abs against protein vs. non-protein Ags (Fig. 3B and Table 1B). For example, among anti-protein MAbs, those from the ChI group had significantly longer CDR-H3s than did those from the AcI or SAD groups (p<0.0001). As virtually all of the ChI Abs (except bNt MAb 2G12) and all of the anti-protein AcI Abs are anti-viral, it is striking that such a large difference in CDR-H3 length exists between the Abs elicited by the two types of viral infection. In addition, anti-protein MAbs had significantly longer CDR-H3s than MAbs against non-protein Ags (p<0.0001, Table 1C), even with the Bonferroni correction, whereas the difference between self and non-self Abs was smaller (p<0.005, not significant with the Bonferroni correction). Thus autoimmune status (self vs. non-self) had a much smaller effect on CDR-H3 length than did protein vs. non-protein Ag.

SM of expressed VH genes in adaptive Ab responses
That long CDR-H3s were found mainly among anti-protein Abs suggests that Abs with long CDR-H3s are selected by protein Ag, and hence by T-cell-driven processes; such responses typically occur in germinal centers and involve SMs introduced by activation-induced cytidine deaminase (AID) [36]. Predicting that SMs should increase among Abs bearing long CDR-H3s, we compared the patterns of SM for the same categories of MAb as for CDR-H3 length in Table 1. For many comparisons, the patterns observed for SMs paralleled those observed for CDR-H3 length. For example, the eight bNt HIV MAbs had both the highest average CDR-H3 length and the highest average level of SM in VH (mean 53.3). Their SM levels were significantly higher than those of the non-bNt HIV MAbs (mean 27.3, p = 0.0024), although, given the small number of bNt MAbs, this difference was not significant after the Bonferroni correction for multiple tests. In comparing MAbs from the different types of immune response, the ChI MAbs (SM mean 27.3) again differed most from the AcI MAbs (mean 10.9), with the SAD MAbs being intermediate (mean 17.9). The SAD and AcI MAbs showed opposite patterns when partitioned according to protein vs. non-protein Ag: for SAD MAbs, the SM level among anti-protein MAbs was higher than that of MAbs against non-protein Ags, whereas for AcI MAbs, the SM level was lower for protein than for non-protein Ags. The average number of SMs for AcI MAbs against non-protein Ags, which were mainly against streptococcal capsular polysaccharide, was 15.5, intermediate between ChI and SAD non-protein MAbs. However, the level of SM was extremely low for AcI MAbs against protein Ags (mean 7.9), more than 40% of which were against rotavirus. This reduction in SMs among the anti-protein AcI MAbs was not restricted to anti-rotavirus MAbs: when those were excluded from the AcI category, the remaining anti-protein AcI MAbs still had a low number of SMs (mean 7.3).
In summary, there was a significant difference between Abs from chronic and acute viral infections, with the latter consistently having much shorter CDR-H3s and far fewer SMs. Little difference in SMs was observed between anti-protein Abs produced by chronic processes (ChI and SAD); both had long CDR-H3s and high levels of SM, and both involve persistent exposure to Ag.
To further analyze the relationship between CDR-H3 length and level of SM, the MAb dataset was divided into quartiles according to CDR-H3 length, with the shortest quartile (S) having lengths of 13 aa or less, the longest quartile (L) 19 aa or more, and the two middle quartiles combined into the M class (14 to 18 aa inclusive; Table 2). While the differences in SM levels observed in Table 1B among disease conditions and types of Ag specificity generally held across the corresponding quartiles in Table 2, some differences were observed. Table 2 shows little difference in the number of SMs among the anti-protein Abs for all disease categories, and little difference in SM levels between the medium- and long-CDR-H3 quartiles; however, the short-CDR-H3 quartile tended to have lower levels of SM than the two longer groups. Furthermore, the trend for MAbs against non-protein Ags was reversed: MAbs in the shortest CDR-H3 quartile tended to have the highest SM levels for both the SAD and AcI categories. Although long-CDR-H3 MAbs against non-protein Ags are uncommon, their SM level was lower than that of their short-CDR-H3 counterparts. This is consistent with the hypothesis that MAbs against non-protein Ags (even those with long CDR-H3s) may derive from different B-cell subsets and/or different immune processes than MAbs against protein Ags.
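The S/M/L partition described above is simple to express in code. The Python sketch below applies the class boundaries stated in the text (S ≤ 13 aa, M = 14 to 18 aa, L ≥ 19 aa) and averages SM counts within each class. The record format (pairs of CDR-H3 length and SM count) and the example values are hypothetical.

```python
from statistics import mean


def cdr_h3_class(length_aa: int) -> str:
    """Assign the Table 2 class: S (<= 13 aa), M (14-18 aa) or L (>= 19 aa)."""
    if length_aa <= 13:
        return "S"
    if length_aa <= 18:
        return "M"
    return "L"


def mean_sm_by_class(mabs):
    """Average SM count per CDR-H3 class.

    `mabs` is an iterable of (cdr_h3_length_aa, n_somatic_mutations) pairs;
    this record layout is a hypothetical simplification of the database.
    """
    groups = {}
    for length, sm in mabs:
        groups.setdefault(cdr_h3_class(length), []).append(sm)
    return {cls: mean(values) for cls, values in sorted(groups.items())}


# Hypothetical records, not taken from the study.
example = [(12, 5), (13, 9), (16, 20), (21, 30), (19, 26)]
print(mean_sm_by_class(example))
```

In the study the cut-offs were chosen from the observed quartiles of the 427-MAb dataset; the sketch hard-codes those published boundaries rather than recomputing quartiles.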

Germline VH-gene usage in adaptive Ab responses
Early studies of VH gene family usage in anti-HIV Abs reported that VH3 was over-utilized and VH4 was under-utilized compared to the naïve repertoire [37,38]. Given that our dataset includes Abs of known Ag specificity for several disease conditions, we compared the use of VH-gene families and of the specific gene VH1-69 among these conditions (Table 3 and Figure S1). Table 3 shows that there were significant differences among conditions when the proportions were tested for each gene family. The proportion of anti-HIV MAbs using family VH3 genes (29%) was lower than that of SAD MAbs (52%) or AcI MAbs (53%). Concomitantly, use of family VH1 increased for HIV MAbs (38%, compared with 23% for SAD MAbs and 22% for AcI MAbs), whereas the proportion of MAbs using family VH4 did not differ significantly among the major categories, ranging from 18 to 25%. Thus, HIV infection was associated with increased VH1 and decreased VH3 gene usage, a pattern that was not apparent for the other ChI Abs (perhaps because that sample is small) or for other conditions. We were particularly interested in usage patterns for the VH1-69 germline gene, as it is not commonly used in the naïve repertoire (e.g., Wardemann et al. 2003 [39]) but is characteristic of Ab repertoires in several disease states (see Discussion). As shown in Table 3 … Given this distinct difference, we directly compared the features of MAbs that use VH1-69 with those that do not (Table 4). Among ChI MAbs, those that used VH1-69 had significantly longer CDR-H3s (means of 20.1 vs. 17.0 aa, P<0.001), whereas their SM levels did not differ significantly. Very few SAD or AcI MAbs used VH1-69, so no statistical comparisons could be made within those groups.
Among MAbs that use VH genes other than VH1-69, the patterns among the disease categories of ChI, SAD and AcI were similar to those observed in the full data set: the CDR-H3s of ChI MAbs remained significantly longer than those of SAD and AcI MAbs, and the SM levels of all three groups differed. Thus, ChI Abs encoded by VH1-69 appear to have longer CDR-H3s, but not more SMs, than their counterparts that do not use this germline gene.
VH1-69 is a fairly distal gene, lying approximately 764 kb from VH6-1 [40]; only five of the approximately 40 functional VH genes are more distal. We therefore wondered whether the use of VH1-69 among Abs bearing long CDR-H3s in HIV infection could be part of a larger trend toward using distal genes in ChI. Table 1A shows that ChI MAbs use the most distal VH genes, consistent with this class having the longest CDR-H3s and the highest frequency of SMs. However, and inconsistent with the patterns for CDR-H3 length and SMs, anti-protein (anti-viral) AcI Abs had intermediate VH-gene distances, whereas SAD Abs had the most proximal ones. Importantly, the AcI MAbs, only three of which use VH1-69, used distal genes overall; the average distance for the 69 anti-protein (anti-viral) AcI MAbs is 450 kb, suggesting that distal genes besides VH1-69 may be selected in viral infections of all types. Table 2 further analyzes the relationship between CDR-H3 length and VH-gene distance, dividing the MAbs into short, medium and long CDR-H3 classes, and shows that this relationship is also not consistent across disease conditions: the trend between CDR-H3 length and VH-gene distance held for ChI and SAD but not for AcI Abs. The medium and long CDR-H3 classes of ChI and SAD MAbs used the most distal VH genes, whereas among anti-protein AcI MAbs the most distal genes were used by the short and medium classes.
Thus, within the viral infection groups, and following similar trends for SM, distal gene usage was related to CDR-H3 length for the ChI, but not the AcI, MAbs.
Figure 1. Arrows indicate the mean for each category. The distribution of CDR-H3 length for more than 4000 Abs compiled from the IMGT and Kabat databases [63] is included in Figs. 1-3 as a control comparison (blue line). The average CDR-H3 length of the 425 Ag-specific MAbs was 16.3 aa, which is higher than the 15.2-aa mean reported for 4751 expressed VH sequences compiled from the Kabat and IMGT databases by Zemlin et al. [63]; this may reflect the higher proportion of ChI Abs, both HIV and non-HIV, in our data set. doi:10.1371/journal.pone.0016857.g001

Discussion
The primary motivation for this analysis was to determine if other types of Ab share features with the bNt HIV MAbs, which might help explain the rarity of bNt MAbs in HIV-1 infection. We found that the bNt MAbs most closely resemble the other anti-HIV MAbs, and ChI MAbs as a group, in being enriched for long CDR-H3s and high numbers of SMs; this indicates that these features are not limited to broad neutralization, but appear to be common characteristics of the Abs involved in chronic viral infections. That all anti-HIV MAbs, including the bNt MAbs, share similar features indicates that unusual immunological processes, such as breaking of tolerance [19], are probably not responsible for the rarity of the bNt Abs during chronic HIV infection. Instead, processes involved in chronic viral infections in general may be at play in shaping the repertoire of Abs available for selection after viral persistence and/or multiple rounds of viral escape; such processes are probably linked to broadening of the Ab response beyond those involved in the response to initial infection [41]. Our results are consistent with the view that Abs having the features of the bNt MAbs are not rare, but arise as a result of chronic viral infection.
Strikingly, the Abs from acute viral infections (anti-protein AcI Abs) had significantly shorter CDR-H3s and lower numbers of SMs than did the ChI Abs. In addition, while both types of Ab tended to use distal VH genes, Abs from acute viral infections bearing long CDR-H3s tended not to use the most distal VH genes, nor did they use VH1-69 to the same extent as the ChI (and HIV) Abs. We speculate that, if the Abs involved in acute viral infections reflect those produced during the early phases of chronic viral infection, a shift in expressed VH-gene composition (i.e., CDR-H3 length and VH-gene usage) must occur over time, along with an increase in SMs. High SM levels, but neither long CDR-H3s nor distal VH-gene usage, were also found among SAD-related anti-protein MAbs. This lack of shared features between the SAD and bNt MAbs (or ChI MAbs in general) suggests that the bNt MAbs against HIV are probably not drawn from an initial pool of autoimmune B cells bearing long CDR-H3s, as previously hypothesized [19,20,42]; were that the case, the bNt MAbs would be expected to resemble the SAD MAbs rather than the ChI MAbs. Clearly, Abs having the features of the bNt MAbs are not rare, and are routinely produced during ChIs. Thus, it seems more likely that the rarity of the bNt HIV Abs results from the cryptic, flexible and/or transient nature of conserved epitopes on the neutralization-competent structure of HIV Env; such epitopes are not immunodominant on the virus, nor on the envelope "debris" shed by infected cells, and as such, multiple rounds of viral escape are likely required before the immune system can mount an effective Ab response against them.
This study is, to our knowledge, the first to explicitly compare gene family usage in MAbs from HIV infection with that in MAbs from other types of immune response. We observed a bias toward family VH1 and against family VH3 genes in the HIV and other ChI MAbs. The increased usage of family VH1 agrees with Scheid et al. [43], and removing the large number of Abs from Scheid et al. did not affect this conclusion (analysis not shown). A deficit in family VH3 usage associated with HIV infection has been reported in several studies [37,38,44,45,46]. This deficit is consistent with the suggestion that HIV-1 gp120 acts as a superantigen (superAg) that specifically deletes B cells bearing Abs encoded by family VH3 genes [47,48]. In addition, and in contrast to some previous findings [45], we did not observe over-utilization of the VH4 family, which remained mostly constant across all MAb categories.
We observed an overabundance of VH1-69 gene usage in the HIV MAbs and among ChI MAbs in general, compared with the naïve repertoire reported in other studies [39] and with the other MAb categories in our database. Our results extend the observations of Huang et al. [49], who noted that nine of twelve MAbs against the CD4i site of gp120 used VH1-69, and those of Gorny et al. [46], who showed that MAbs against all HIV Env epitopes except the V3 loop are enriched for VH1-69 usage. In addition, three studies have noted almost exclusive use of VH1-69 among cross-protective MAbs against influenza virus [50,51,52]. Both AcI and ChI MAbs tended to use distal VH genes, but only in the latter group were long CDR-H3s present in Abs encoded by distal VH genes, which tended to be VH1-69. Three mechanisms can produce long CDR-H3s: (i) longer VH, DH, or JH genes can be used preferentially, (ii) CDR-H3 can be lengthened by insertions induced by activation-induced cytidine deaminase [9,53], and (iii) secondary rearrangement (or receptor editing or revision [54,55]) can result in N and P additions at the N1 junction. Secondary VH-gene rearrangement necessarily involves the use of distal VH and JH genes, because once a VH gene has been somatically recombined with a DH gene (i.e., after primary V-D-J rearrangement), only genes more distal to the DH region are available for further joining. Only the ChI Abs have features consistent with a secondary-rearrangement model, in that the same Ab population both uses distal VH genes and has long CDR-H3s. Thus, viral infection appears to select for distal VH genes, but if secondary rearrangement is playing a role in lengthening CDR-H3, it appears to be doing so only for the ChI Abs.
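The constraint on secondary rearrangement described above can be stated compactly: once a VH gene has been joined to DH, only genes lying further from DH remain available. Here is a minimal Python sketch of that logic, assuming the locus is represented as a list ordered from DH-proximal to DH-distal. The short gene list is an illustrative subset, not a complete or exact map of the IGH locus. Choosing the nearest available gene in each round is likewise a simplifying assumption, not a claim about how replacement partners are chosen in vivo.

```python
# Illustrative subset of VH genes, ordered from closest to the DH region
# (VH6-1) outward; this is NOT a complete or exact map of the IGH locus.
LOCUS_ORDER = ["VH6-1", "VH1-2", "VH3-7", "VH3-23", "VH4-34", "VH1-69", "VH3-74"]


def replacement_candidates(locus_order, rearranged_gene):
    """VH genes still available after `rearranged_gene` has been joined to DH.

    Genes lying between the rearranged VH and DH are deleted by the primary
    rearrangement, so only more distal genes can take part in a secondary
    (replacement) rearrangement.
    """
    idx = locus_order.index(rearranged_gene)
    return locus_order[idx + 1:]


def successive_replacements(locus_order, first_gene, n_rounds):
    """Follow repeated replacements.

    Simplifying assumption: each round uses the nearest available distal gene.
    """
    path = [first_gene]
    for _ in range(n_rounds):
        candidates = replacement_candidates(locus_order, path[-1])
        if not candidates:
            break  # the most distal gene in the list cannot be replaced
        path.append(candidates[0])
    return path


print(replacement_candidates(LOCUS_ORDER, "VH3-23"))
print(successive_replacements(LOCUS_ORDER, "VH3-23", 10))
```

Because each round can only move outward, any history of repeated replacements ends on a more distal gene than it started with. This is why the secondary-rearrangement model predicts an enrichment of distal VH genes among the Abs it affects.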
Many of our conclusions should be interpreted with caution, given that they are based on a limited dataset that may be biased in several ways. For example, there was significant bias related to Ag specificity for several categories of MAb; many of the anti-HIV MAbs were against the gp120 CD4bs, and were obtained via phage-displayed Ab libraries; most of the SAD MAbs were against the non-protein Ags DNA and phospholipid/cardiolipin; whereas most of the AcI MAbs were against streptococcal capsular polysaccharide and rotavirus. Another potential bias is related to the limited number of SADs we studied, with the preponderance being SLE and anti-phospholipid syndrome. Since we are studying MAbs, it is important to realize that particular antibodies are often selected for further study based on characteristics such as strength of binding, isotype or epitope, and thus the data set is not random with respect to these parameters. This is one reason why we adopted a conservative approach, and emphasize those results that satisfy a Bonferroni-adjusted p value based on the total number of tests conducted in Table 1. A larger dataset, indexed by disease and clinical condition, would overcome many of these potential biases. One roadblock to such a compilation is that many researchers do not routinely submit expressed sequences to public databases; this will become especially critical as high-throughput methods are employed to survey large sets of disease-specific MAbs.
This analysis of 427 Ag-specific MAbs should directly inform vaccine research. For example, the finding that long CDR-H3s are associated with chronic, persistent Ag and with anti-protein Abs motivates several questions. Are Abs bearing long CDR-H3s present at the beginning of an immune response, or do they "evolve" over time? If they accumulate over time, are they directly selected from a pre-existing minor compartment within the naïve B-cell population, do they comprise a specially recruited B-cell subset, and/or do they evolve by secondary processes (e.g., VH-gene replacement, DNA insertion, or gene conversion)? All of the bNt MAbs against HIV are heavily mutated, and five of the eight have long CDR-H3s. This raises the possibility that long CDR-H3s are not required to bind conserved epitopes on the HIV-1 envelope, but arise instead through processes that come into play during long-term persistence of protein Ag and viral escape. If so, an effective HIV vaccine may produce bNt Abs via "normal" immunization processes, by enhancing the immunogenicity of Nt sites on Env. Under this scenario, it remains unknown whether Abs bearing the features of acute antiviral Abs (which we expect to resemble the Abs elicited by a traditional vaccine) can act as bNt Abs. That bNt Abs have not yet been elicited by vaccines designed to mimic the Env epitopes that mediate neutralization by the bNt MAbs should not be taken as evidence that they cannot be so produced. Our results indicate that HIV vaccine research should continue to follow "reverse vaccinology" approaches [56] that attempt to make the sites recognized by the bNt MAbs immunodominant [57,58]. Progress with this approach has recently been observed with an influenza vaccine that elicits broadly protective Abs [59]. Conversely, it is also possible, though not proven, that "chronic"-type Abs bearing the features of the bNt MAbs will be required for broad neutralization.
If this is the case, then research into the cellular and genetic origins of such Abs is required. Thus our second recommendation is for research efforts to be expanded in this area, with the goal of developing vaccination strategies that stimulate key features of these chronic processes, and in so doing, elicit bNt Abs.

Materials and Methods
Sequence database
Heavy chain sequences of expressed MAbs were retrieved from the IMGT/LIGM-DB online database (http://imgt.cines.fr/), from the literature, and through direct contact with researchers (see Table S1). Our goal was to collect VH sequences for all of the available human HIV MAbs. The Ag targets of the HIV MAbs included gp120, its CD4 binding site (CD4bs) and CD4-inducible site (CD4i), the gp120 V3 loop, gp41, Rev, Tat, p24, and p25. This MAb dataset was expanded to include human MAbs associated with other chronic infections (the ChI MAbs), including those against Epstein-Barr virus, hepatitis B and C viruses, herpes simplex virus and human cytomegalovirus. (Note that all of these MAbs are from viral infections.) For comparison, a similar group of MAbs from systemic autoimmune disease (SAD) was assembled, including MAbs from patients with systemic lupus erythematosus (SLE), anti-phospholipid syndrome, mixed connective tissue disease, rheumatoid arthritis, Sjögren's disease, and cold agglutinin disease, and MAbs against the Ags cardiolipin (serum dependent and serum independent), phospholipids, DNA, beta-2-glycoprotein, Sm ribonucleoproteins, myelin basic protein, myelin-associated glycoprotein, acetylcholine receptor, Ro/SSA and La/SSB. We concentrated on SAD MAbs based on the hypothesis that the bNt anti-HIV MAbs were derived from autoantibody/autoreactive precursors [19,20,42]. In addition, VH sequences were collected for MAbs associated with acute infections (Pseudomonas aeruginosa, rotavirus, Pneumococcus pneumoniae, Ebola virus, Neisseria meningitidis, hepatitis A virus) and from vaccinated individuals (Haemophilus influenzae type b conjugate vaccine, 23-valent pneumococcal polysaccharide vaccine, Streptococcus pneumoniae, tetanus toxoid, hepatitis B surface Ag); these are reported as the AcI MAbs.
For each MAb, we attempted to obtain information on its isotype, Ag specificity, the methods used to obtain it (e.g., phage display, B-cell sorting, etc.), clinical data on the source-subject, and the bibliographic reference and GenBank accession number for the original MAb sequence. This information was entered into an Excel database by hand.

Sequence analysis
Nucleotide sequences were analyzed using a recent version of JoinSolver (http://joinsolver.niams.nih.gov/index.htm; [60]). JoinSolver provides the closest-matched VH, DH and JH genes and determines the limits and length (in amino acids) of the CDR-H3 region. It also reports the contributions of P and N nucleotides at both the V-D and D-J junctions, and the number of SMs in the MAbs, defined as the number of base-pair substitutions relative to the predicted germline gene. These results, including the nucleotide sequence of the CDR-H3 region, were also entered into the Excel database. For a few HIV MAbs, only the CDR-H3 length, and not the expressed VH sequence, was available (e.g., Ditzel et al. [61]; see Table S1). Results from JoinSolver were compared with those produced by the V-QUEST and JunctionAnalysis algorithms of the IMGT system [62], which also analyzes VH sequences for gene usage and somatic mutations. In addition, the assignment of each MAb to predicted germline VH, DH and JH genes was confirmed visually. Results from IMGT and JoinSolver differed systematically. For example, the portion of CDR-H3 contributed by the germline DH gene was consistently estimated to be larger by IMGT V-QUEST, which can be explained by the fact that the standard V-QUEST parameters allow more mutations in the DH-gene core. However, differences between classes of MAbs were similar whether calculated using V-QUEST or JoinSolver, and these relative differences (e.g., the average CDR-H3 length of self vs. non-self MAbs) are the important parameters in our study.
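JoinSolver defines the SM count as the number of base-pair substitutions relative to the predicted germline gene. As a hedged illustration, assume the expressed and germline sequences have already been aligned to equal length; the germline assignment and alignment that JoinSolver or V-QUEST perform are not shown. Under that assumption, the count reduces to a mismatch tally. Skipping gap characters ('-' or '.') is our assumption, not a documented JoinSolver rule, and the function name is hypothetical.

```python
def count_substitutions(expressed: str, germline: str) -> int:
    """Count base substitutions between two pre-aligned nucleotide sequences.

    Assumes both sequences are already aligned to the same length; positions
    where either sequence has a gap character are skipped (an assumption).
    """
    if len(expressed) != len(germline):
        raise ValueError("sequences must be pre-aligned to equal length")
    gaps = {"-", "."}
    return sum(
        1
        for e, g in zip(expressed.upper(), germline.upper())
        if e not in gaps and g not in gaps and e != g
    )


print(count_substitutions("ACGTTC", "ACGTAC"))
```

A real pipeline would also need to decide how to treat insertions and deletions relative to germline, which this sketch deliberately leaves out.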
JoinSolver results were used to screen for clonal expansions, which were identified as Abs that used the same VH, DH and JH genes with similar patterns of N and P nucleotides. A single, randomly chosen MAb was retained to represent each clonally expanded set. Two recently reported bNt MAbs [7] (PG9 and PG16) are expansions of the same B-cell lineage, so we randomly selected one, PG16, for analysis; taking the same approach, we selected VRC01 from the two bNt MAbs reported by Wu et al. [9]. Thus, the final set that was statistically analyzed does not include all reported HIV MAbs, but only those representing independent clonal lineages (see below, and Tables S1 and S2).
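The clonal-collapse step described above amounts to grouping MAbs by their gene assignments and junction features, then keeping one random member per group. Here is a minimal Python sketch of that idea. It approximates "similar patterns of N and P nucleotides" by an exact match on a hypothetical `np_signature` string, which is stricter than the visual judgment described in the text. The record fields and example MAbs are hypothetical.

```python
import random
from collections import defaultdict


def collapse_clones(mabs, seed=0):
    """Keep one randomly chosen MAb per putative clonal set.

    Each MAb is a dict with keys 'name', 'vh', 'dh', 'jh' and 'np_signature'
    (a hypothetical string summarizing N/P additions at both junctions).
    Clonal sets are approximated by exact matches on the last four fields.
    """
    groups = defaultdict(list)
    for mab in mabs:
        key = (mab["vh"], mab["dh"], mab["jh"], mab["np_signature"])
        groups[key].append(mab)
    rng = random.Random(seed)  # fixed seed makes the random choice reproducible
    return [rng.choice(members) for members in groups.values()]


# Hypothetical records: mab1 and mab2 share a clonal signature; mab3 does not.
example_mabs = [
    {"name": "mab1", "vh": "VH1-69", "dh": "D3-3", "jh": "JH6", "np_signature": "N1:GGT|N2:CC"},
    {"name": "mab2", "vh": "VH1-69", "dh": "D3-3", "jh": "JH6", "np_signature": "N1:GGT|N2:CC"},
    {"name": "mab3", "vh": "VH3-23", "dh": "D2-2", "jh": "JH4", "np_signature": "N1:A|N2:TG"},
]
print(sorted(m["name"] for m in collapse_clones(example_mabs, seed=1)))
```

Seeding the random generator keeps the choice of representative reproducible between runs, which the original description does not address.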
In summary, the entire database consists of over 700 MAbs (Table S1), which underwent two screens to produce the final dataset (Table S2) for analysis of CDR-H3 length, SM and gene usage. In the first screen, each MAb had to have a specified Ag, and to be associated with a particular immune response. In the second screen, clones from the same clonal expansion were deleted from the dataset, resulting in a 427-MAb dataset comprising 227 ChI MAbs (including 193 HIV MAbs), 87 SAD MAbs and 113 AcI MAbs, which was exported to SAS (Rel. 8.2, 2001; SAS Institute Inc., Cary, NC) for statistical analysis. Of these 427 MAbs, 318 were identified as to IgM or IgG; 90% of these were IgG (Table S2).

Statistical analysis
For all MAb categories, PROC UNIVARIATE (SAS) was used to test the following distributions against the normal distribution: CDR-H3 length; total VH-gene mutations; the distance of the predicted VH gene from VH6-1, the VH gene most proximal to the DH region (VH-distance); and the distance of the predicted JH gene from JH6, the JH gene most distal to the DH region (JH-distance). Most of the distributions were non-normal, even after log transformation, so the non-parametric Kruskal-Wallis test was used to test for differences among sets of MAbs (PROC NPAR1WAY, SAS). To avoid zero values, the natural log of (3 × CDR-H3 length in aa + 0.1) was used in tests for differences in CDR-H3 length. All results from the non-parametric tests were compared with one-way ANOVA (PROC GLM, SAS), and in all cases the results were similar in terms of levels of significance. When more than two categories were compared (i.e., comparisons among ChI, AcI and SAD MAbs), Tukey a posteriori tests were used to determine which groups were statistically different, and these groups were denoted by different letters (PROC GLM, SAS). Table 1 presents the p values for the main statistical tests of this study. It reports 9 hypothesis tests for each of CDR-H3 length, number of SMs and VH-distance, for a total of 27 tests; therefore, to be conservative, all tests that passed a Bonferroni-corrected P value of 0.05/27 = 0.0018 are highlighted in bold. Given the many confounding factors in this database, these probability values should be interpreted as indicators of strong differences among categories rather than as strictly interpreted statistical tests (see Discussion). The distributions of CDR-H3 length for CD4bs, CD4i, V3 loop, and anti-gp41 MAbs presented in Figure 2 were tested for heterogeneity by χ2. JH-distance did not vary among MAb categories and is not reported.
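The tests above were run in SAS. Purely as an illustration, here is a stdlib-only Python sketch of three pieces of that procedure: the Kruskal-Wallis H test with the usual tie correction, the ln(3 × length + 0.1) transform, and the Bonferroni threshold 0.05/27. Kruskal-Wallis works on ranks and the transform is strictly increasing, so transforming CDR-H3 length cannot change the non-parametric result; the transform matters only for the parametric ANOVA comparison. Tukey tests are not shown. Function names and the example CDR-H3 lengths are hypothetical.

```python
import math

ALPHA = 0.05
N_TESTS = 27  # 9 tests x 3 variables reported in Table 1
BONFERRONI_THRESHOLD = ALPHA / N_TESTS  # ~0.00185, reported as 0.0018


def chi2_sf(stat, dof):
    """Chi-square upper-tail probability via the incomplete-gamma series."""
    if stat <= 0:
        return 1.0
    a, x = dof / 2.0, stat / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15 and n < 10_000:
        term *= x / (a + n)
        total += term
        n += 1
    lower = math.exp(-x + a * math.log(x) - math.lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def log_cdr_h3(length_aa):
    """Transform applied to CDR-H3 length: ln(3 * length + 0.1)."""
    return math.log(3 * length_aa + 0.1)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H and its chi-square (k - 1 dof) p value."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        h += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    tie_counts = {}
    for x in pooled:
        tie_counts[x] = tie_counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


def passes_bonferroni(p):
    return p < BONFERRONI_THRESHOLD


# Hypothetical CDR-H3 lengths (aa), not the study's data.
chi, sad, aci = [20, 18, 22, 17, 19], [15, 16, 14, 17, 13], [12, 14, 13, 15, 11]
h_raw, p_raw = kruskal_wallis(chi, sad, aci)
h_log, p_log = kruskal_wallis(*[[log_cdr_h3(x) for x in g] for g in (chi, sad, aci)])
print(f"H = {h_raw:.3f}, p = {p_raw:.4f}, Bonferroni-significant: {passes_bonferroni(p_raw)}")
```

For two groups [1, 2, 3] and [4, 5, 6] this gives H = 27/7 ≈ 3.857 and p ≈ 0.0495, which is significant at 0.05 but far from the Bonferroni threshold. This mirrors how results such as p = 0.0024 in the text are reported as significant only before correction.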
The difficulty of assigning germline DH genes to expressed Ab sequences, especially for highly mutated HIV MAbs, precluded a comprehensive analysis of DH-gene usage or of the number of P and N nucleotides.

Supporting Information
Figure S1. VH gene family usage in anti-protein and non-anti-protein MAbs for 3 disease conditions. MAbs utilizing the VH1 family were separated into those using VH1-69 and others. See Table 3 for sample sizes; there is only 1 ChI MAb that is not anti-protein. (TIF)