HIV-1 Envelope Subregion Length Variation during Disease Progression

The V3 loop of the HIV-1 Env protein is the primary determinant of viral coreceptor usage, whereas the V1V2 loop region is thought to influence coreceptor binding and participate in shielding of neutralization-sensitive regions of the Env glycoprotein gp120 from antibody responses. The functional properties and antigenicity of V1V2 are influenced by changes in amino acid sequence, sequence length and patterns of N-linked glycosylation. However, how these polymorphisms relate to HIV pathogenesis is not fully understood. We examined 5185 HIV-1 gp120 nucleotide sequence fragments and clinical data from 154 individuals (152 were infected with HIV-1 Subtype B). Sequences were aligned, translated, manually edited and separated into V1V2, C2, V3, C3, V4, C4 and V5 subregions. V1-V5 and subregion lengths were calculated, and potential N-linked glycosylation sites (PNLGS) counted. Loop lengths and PNLGS were examined as a function of time since infection, CD4 count, viral load, and calendar year in cross-sectional and longitudinal analyses. V1V2 length and PNLGS increased significantly through chronic infection before declining in late-stage infection. In cross-sectional analyses, V1V2 length also increased by calendar year between 1984 and 2004 in subjects with early and mid-stage illness. Our observations suggest that there is little selection for loop length at the time of transmission; following infection, HIV-1 adapts to host immune responses through increased V1V2 length and/or addition of carbohydrate moieties at N-linked glycosylation sites. V1V2 shortening during early and late-stage infection may reflect ineffective host immunity. Transmission from donors with chronic illness may have caused the modest increase in V1V2 length observed during the course of the pandemic.


Introduction
The gp120 portion of the HIV-1 envelope protein (Env) mediates attachment prior to fusion with the host cell membrane during target cell infection. gp120 has five hypervariable regions (V1-V5) bounded by cysteine residues and separated by four relatively "constant" regions (C1-C4) [1][2][3]. gp120 is notable for its sequence variation, which may arise through recombination and point mutation, as well as by insertion and deletion of one or more nucleotides. Insertion and deletion events (indels) occur throughout env but are maintained through positive selection particularly within the hypervariable loops, which may thereby acquire significant length variation [4]. The third hypervariable region is known to encode the primary determinants of coreceptor usage specificity [5][6][7], as well as epitopes recognized by humoral [8,9] and cellular [10,11] immune responses. V3 loop sequence variation has been extensively studied, and correlated with changes in host cell range, cytopathogenicity, and disease progression [12][13][14].
The V1V2 region in particular is characterized by a high degree of length polymorphism, sequence variation, and predicted N-linked glycosylation sites (PNLGS) [15][16][17][18][19][20], each of which may affect viral attachment, coreceptor usage and recognition by neutralizing antibodies [20,21]. Comparison of structural models of gp120 and gp120 bound to CD4 and a chemokine coreceptor have yielded considerable insight into the functional roles played by V1V2 and V3 during viral attachment [22,23]. In the unbound gp120 conformation, the V2 loop partially obscures V3 and other gp120 residues involved in coreceptor binding. Binding to CD4 induces conformational changes that expose the coreceptor binding site on gp120, including residues from V1V2, V3 and other regions [22,24].
Numerous studies have suggested that sequence variation in V1V2 influences host cell range and/or syncytium-inducing (SI) phenotype [25][26][27][28][29][30][31]. For example, Toohey demonstrated that recombinant chimeric clones with a V1V2 region from macrophage-tropic HIV-1 strains replicated efficiently in macrophages, whereas clones with the V1V2 region from lymphotropic strains did not [31]. However, not all studies have been concordant on the role of V1V2 in viral replication kinetics, cell range and transmission [15][16][17][18][19][32]. For example, Pastore showed that sequence changes in V1V2 could rescue otherwise lethal mutations in V3 associated with a change in coreceptor usage [33], and V2 polymorphisms have also been linked with restriction to CCR5 coreceptor usage [16]. In contrast, Wang et al found no relationship between SI phenotype and V1V2 sequence, length, distribution of PNLGS or charge [32].
The V1V2 region also appears to be an important determinant of sensitivity to neutralizing antibodies [34][35][36][37][38]. The V1V2 region evolves under positive natural selection in vivo [4][39][40][41], and an inverse relationship between V1-V4 length and neutralization susceptibility has been demonstrated in subtypes A [20], B [34][35][36][37][38] and C [42]. Tellingly, laboratory strains lacking V1V2 may still replicate efficiently in vitro, but appear to be especially sensitive to antibody neutralization [43,44]. Consistent with this observation, viral strains with shorter and less glycosylated V1-V4 regions have been reported to preferentially replicate in subjects newly infected with HIV-1 subtype C [45] (where presumably an effective neutralizing antibody response has not had time to emerge), and similar observations have been made concerning the V1V2 loop in individuals recently infected by HIV-1 subtype A [19]. However, we and others have not observed this effect in HIV-1 subtype B [19,46,47].
Despite these reports, the relationship between V1V2 region length polymorphism and disease progression remains unclear. In two small longitudinal studies, elongation of V1 and V2 was noted in long-term nonprogressors (LTNP), but not within individuals progressing rapidly to AIDS [15][16][17][18][19]. In a third study, no clear relationship between V1V2 length variation and disease progression was observed [48]. Lastly, some investigators postulate that V1V2 length changes positively correlate with the pace of disease progression [16,19], while others have suggested that V1V2 length increase may be a correlate of delayed progression to AIDS [18].
Thus, our understanding of the role of the V1V2 loop in influencing HIV pathogenesis remains incomplete and is challenged by several contradictory observations. To more fully characterize HIV envelope subregion variability and to clarify the associations between subregion length variation, glycosylation, and disease progression, we have comprehensively examined length and glycosylation of each gp120 subregion as a function of clinical parameters in a large collection of HIV-1 subtype B infected individuals.

Ethics statement
This study was performed using publicly available data from the Los Alamos database, and previously unpublished experimental data obtained at the University of Washington. Unpublished data were obtained and analyzed with written informed consent of study participants, and approval by the University of Washington Institutional Review Board.

Patient selection
We analyzed new and published HIV-1 envelope gene sequences and associated clinical data from all available subjects in the Seattle Primary Infection Cohort (PIC) [49], the Multicenter AIDS Cohort Study (MACS) [50], and from the Los Alamos National Laboratories HIV database (HIVDB) (http://www.hiv.lanl.gov/content/hiv-db/mainpage.html) not meeting pre-specified exclusion criteria. Subjects were excluded from this study if younger than 18 years of age or if there was any history of antiretroviral therapy prior to sampling as determined by patient report and clinical records (MACS, PIC) or as indicated in the methods section of published reports (HIVDB), unless otherwise noted. All subjects considered in the cross-sectional and longitudinal analyses were infected with HIV-1 subtype B, except for two subjects infected with HIV-1 subtype A who were included in longitudinal analyses, but were excluded from cross-sectional analyses. (Additional subtypes were considered in analyses of env subregion length change during transmission, presented in Text S1, Section 8). Clinical data retrieved included CD4 count, viral load, time since infection, and treatment history. Sequence data were only accepted if directly derived from plasma or PBMC without an intervening step involving viral propagation in vitro. In some cases, individual authors were consulted to resolve clinical or methodological ambiguities. Accession numbers for published sequences are provided in Table S1. Gene sequence data used in this study are available at http://mullinslab.microbiol.washington.edu/publications/curlin_2010/.

Subject groups
Viral gene sequence data were considered in both cross-sectional (Table 1) and longitudinal analyses (Table 2). The cross-sectional dataset included only plasma and PBMC sequences derived from individuals infected with subtype B (see results, and Table 1). Sequences were triaged by author, database identifier and associated clinical data to exclude duplicate entries. To assess the role of stage of illness on loop length variation, subjects were divided into four non-overlapping groups. Group Cx1 subjects were sampled within two months of the estimated time of infection. Group Cx2 subjects were sampled between two months and three years following infection. Group Cx3 subjects were sampled at times >3 years post infection. Group Cx4 comprised all individuals meeting 1993 CDC criteria for AIDS when sampling occurred (generally CD4 count <200/mm3), regardless of time since infection.
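The grouping rule above can be sketched in a few lines of Python. This is an illustration only: the function name, the use of months as the time unit, and the handling of samples falling exactly on the two-month and three-year boundaries are assumptions, since the text does not specify them. The one rule taken directly from the text is that meeting AIDS criteria at sampling takes precedence over time since infection.

```python
def cross_sectional_group(months_since_infection: float,
                          meets_aids_criteria: bool) -> str:
    """Assign a sample to one of the four non-overlapping groups.

    Boundary handling (<= 2 months, <= 36 months) is an assumption.
    """
    # AIDS at sampling defines the fourth group regardless of time since infection.
    if meets_aids_criteria:
        return "Cx4"
    if months_since_infection <= 2:
        return "Cx1"
    if months_since_infection <= 36:
        return "Cx2"
    return "Cx3"
```

For example, a subject sampled 12 months after infection without AIDS falls in the second group, while a subject sampled at any time who meets AIDS criteria falls in the fourth.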
The longitudinal dataset was derived from 20 subjects infected with subtype B and 2 individuals infected with subtype A, from the PIC cohort and from previous reports [18][51][52][53][54][55], in whom data were available from two or more timepoints (see results, and Table 2). All intra-individual longitudinal comparisons were made between sequences obtained from the same compartment (e.g., plasma vs. plasma). Individuals partitioned into group L1 (N = 15)

Author Summary
The HIV envelope gene (env) encodes viral surface proteins (Env) that are vital to the basic processes used by the virus to infect and cause disease in humans. Adaptations in env determine which cells the virus can infect, and permit the virus to avoid elimination by the immune system. Env is one of the most variable genes known, and it can change dramatically over time in a single individual. However, Env-host cell interactions are complex and incompletely understood, and changes in this viral protein during infection have not yet been systematically described. We examined a large number of env sequences from 154 individuals at various stages of HIV infection but who had never received antiretroviral treatment. We found that the env V1V2 region lengthens during chronic infection and becomes more heavily glycosylated. However, these changes partially reverse during late-stage illness, possibly in response to a weakening host immune system. V1V2 lengths are also increasing over time in the epidemic at large, possibly related to the epidemiology of HIV transmission within the subtype B epidemic. These results provide fundamental insights into the biology of HIV.

Nucleic acid isolation, cloning and sequencing
Sequences from the PIC and MACS cohorts (Tables 1 & 2) were obtained from plasma or PBMC by standard methods [56,57], using safeguards to prevent contamination and template resampling [58]. Briefly, PCR amplification was performed using Taq polymerase (Bioline) with primers ED3 and BH2 [59] (first round) followed by ED5 and DR7 (second round) [60]. PCR products were cloned into a TA TOPO vector (Invitrogen) and selected colonies sequenced under contract using Big Dye dye-terminator protocols. GenBank accession numbers pending submission.

Statistical analysis
For cross-sectional analyses, univariate and multivariate regressions were conducted assessing subregion lengths and number of glycosylation sites as a function of time since infection, stage of disease, CD4 count, and HIV viral load, adjusting for sample source (plasma vs. PBMC) and date of sampling (calendar year). In regression analyses, to allow direct comparisons of the effect of each variable on V1V2 length and/or glycosylation, we compared β values (i.e., regression coefficients scaled such that each variable is equivalent to having a mean value of 0 and a standard deviation of 1). Generalized estimating equations (GEE) were utilized to account for non-independence of data points [68][69][70], and an exchangeable correlation structure was assumed. This method adjusts for the correlation of multiple sequences nested within a sample as well as multiple samples per patient. As an additional means of verifying that analysis outcomes were not influenced by data linkage, regression analyses were performed on replicate data subsets reconstituted from the original data by random resampling, including analyses on 100 data subsets each obtained by using one randomly selected sequence from each individual (See Text S1, section S2). To ensure that results were not unduly influenced by outlying sequences with extremely short or long loop lengths, analyses were repeated after excluding sequences representing the shortest 5% and longest 5% of the V1V2 loops in the dataset. For the longitudinal dataset, multivariate linear regressions were conducted assessing V1V2 length and number of glycosylation sites as a function of time since infection within a person, and the mean rate of change per year was estimated. Statistical analyses were performed using SAS version 9.1 (SAS Institute, Cary, NC).
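The resampling check described above (one randomly selected sequence per individual, repeated 100 times) can be sketched as follows. This is a minimal illustration, not the SAS procedure used in the study: the function names and the record layout are hypothetical, and an ordinary least-squares slope is substituted for the regression models actually fitted.

```python
import random
from collections import defaultdict

def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x (a stand-in for the study's models)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def resampled_slopes(records, n_replicates=100, seed=0):
    """records: iterable of (subject_id, years_since_infection, v1v2_length).

    Each replicate keeps exactly one randomly chosen sequence per subject,
    so subjects with many sequences cannot dominate the fit.
    """
    by_subject = defaultdict(list)
    for subject, years, length in records:
        by_subject[subject].append((years, length))
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    slopes = []
    for _ in range(n_replicates):
        picks = [rng.choice(seqs) for seqs in by_subject.values()]
        xs, ys = zip(*picks)
        slopes.append(ols_slope(xs, ys))
    return slopes
```

If the distribution of slopes across replicates is centered near the estimate from the full dataset, the result is unlikely to be an artifact of individuals who contributed many sequences.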

Sequence data
We obtained 5185 partial length HIV-1 env gene sequences for cross-sectional and longitudinal analysis by the methods described above (Tables 1 & 2). All subtype A sequences and sequences derived from sites other than blood were excluded from cross-sectional analyses, but were considered as special cases under longitudinal analyses (sequence data available at: *webaddress pending acceptance*).
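For readers who want to reproduce the per-subregion measurements, the two quantities tabulated for each translated subregion (length and PNLGS count) can be sketched as below. The sketch assumes the conventional sequon definition, N-X-S/T with X being any residue other than proline; the exact rule used in this study is not stated in this section, and the function names are hypothetical.

```python
import re

# Conventional N-linked sequon: Asn, any residue except Pro, then Ser or Thr.
# A lookahead is used so that overlapping sequons (e.g. "NNTS") are each counted.
SEQUON = re.compile(r"(?=N[^P][ST])")

def ungap(aligned_aa: str) -> str:
    """Remove alignment gap characters and normalize case."""
    return aligned_aa.replace("-", "").upper()

def subregion_length(aligned_aa: str) -> int:
    """Length of a subregion in amino acids, ignoring alignment gaps."""
    return len(ungap(aligned_aa))

def count_pnlgs(aligned_aa: str) -> int:
    """Number of potential N-linked glycosylation sites in a subregion."""
    return len(SEQUON.findall(ungap(aligned_aa)))
```

Gaps are removed before matching so that a sequon interrupted by an alignment gap is still counted.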

Relationship between V1V2 loop length, sample features and clinical factors - univariate analyses. (And see Text S1, sections S1, S3, S6, and Figures S1, S2 and S12.) We examined V1V2 loop lengths as a function of year of sampling and specimen type (plasma vs. PBMC). In separate univariate GEE analyses, V1V2 length increased with calendar year of sampling (β = 1.62 increase in V1V2 length per year; p = 0.003, Figure 2, lower panel) and trended towards greater length in PBMC, though not significantly (β = 1.70 for PBMC compared to plasma; p = 0.11). We then examined individual subregion lengths as a function of time since infection, clinical stage, CD4 counts, and HIV plasma viral load. In separate GEE regression analyses, V1V2 length was significantly correlated with time since infection (β = 1.00 increase in V1V2 length per year; p<0.001, Figure 2, upper panel) and clinical stage, as subjects with stage 3 (β = 6.36; p<0.001) and stage 4 (β = 3.30; p = 0.02), but not stage 2 (β = 0.80; p = 0.4), had significantly longer V1V2 lengths compared to subjects with stage 1 infection (Figures 3 and S12). However, V1V2 length did not significantly correlate with either CD4 stratum (<200, 200-500 or >500 cells/µl) or plasma viral load.
Relationship between V1V2 loop length, sample features and clinical factors - multivariate analyses (Table 3). To further understand the interaction between significant variables, we next performed multivariate analyses of V1V2 length vs. time since infection, clinical stage, CD4 level, and HIV viral load after adjusting for calendar year and type of sample. This analysis was performed for all sequences in the dataset, as well as with plasma sequences and PBMC sequences considered separately (Table 3). Overall, V1V2 length was not significantly associated with time since infection, CD4 level, or HIV viral load. However, among sequences derived from plasma, V1V2 length was significantly associated with increased time since infection (β = 0.77 per year; p<0.001). Conversely, among the PBMC sequences, V1V2 length was associated with decreased CD4 counts (β = 8.13 for CD4 counts between 200 and 500 and β = 6.77 for CD4 counts less than 200 compared to >500), although the association with the lowest CD4 count group did not reach statistical significance (p = 0.09). Among subjects without AIDS (Stages 1 through 3), V1V2 length was associated with time since infection (β = 0.70 increase in V1V2 length per year; p<0.001), even after adjustment for calendar year and type of sample (data not shown). Overall, after adjusting for calendar year and sample type, V1V2 length remained significantly associated with clinical stage, as subjects with stage 3 (β = 6.25; p<0.001) and stage 4 (β = 3.54; p = 0.02), but not stage 2 (β = 0.09; p = 0.9), had significantly longer V1V2 lengths compared to subjects with stage 1 infection. However, V1V2 lengths in subjects with clinical stage 4 were significantly shorter than V1V2 lengths from subjects in stage 3 (p<0.001).
The findings of increased V1V2 length in stage 3 and 4 infection compared to stage 1 and 2 were similarly noted both among sequences derived from plasma as well as PBMC, although the plasma associations did not reach statistical significance in all cases. In order to assess the potential that the results regarding clinical and viral factors associated with V1V2 length could be driven by unusually short or long sequences, we repeated the above analyses excluding the shortest and longest 5% of V1V2 lengths. Since model coefficients and p-values were similar in this restricted analysis (Table S2), our findings do not appear to be unduly influenced by a small number of outlying small or large sequences (Also see Text S1, section S3 and Figure S6).
As an alternative means of accounting for the variable number of sequences contributed by study subjects, the data were subjected to a resampling analysis, in which each subject contributed a single randomly selected sequence. This process was repeated 100 times, resulting in 100 resampled datasets. These analyses confirmed that the observed relationships between V1V2 length and time since infection, and between V1V2 length and year of sampling, were not significantly biased by the inclusion of individuals with multiple sequences (See Text S1, section S2 and Figure S5).
Coreceptor usage, clinical factors and V1V2 loop length. (Also see Text S1, section S5 and Figures S10 and S11.) We next used four published genotypic methods to infer coreceptor usage based on V3 loop amino acid sequence [62][63][64][65]. In our dataset, 4476 V3 loop sequences were available for scoring, and were derived from 129 individuals. 121 V3 loops could not be scored by the PGRC method because the aligned sequences exceeded the length limit specified by the input format (40 characters). There was agreement in coreceptor usage assignment by all of the methods in 3644 of 4476 sequences (81.4%) and disagreement between one or more methods in the remaining 832 sequences. 1046 of 4476 sequences were scored as CXCR4-using or syncytium-inducing by one or more methods, and the remaining 3430 were uniformly scored as CCR5 or non-syncytium by all methods. 60 of 129 individuals had at least one X4-scoring V3 loop as determined by one or more of the prediction methods, while the remaining 69 had only CCR5-scoring sequences.
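The agreement tallies reported above are simple counts over per-sequence calls from each prediction method. A minimal sketch of that bookkeeping follows; the function name, the "R5"/"X4" labels, and the choice to ignore methods that could not score a sequence (represented here as None) are assumptions made for illustration, not a description of how unscored sequences were handled in the study.

```python
def summarize_calls(calls_per_sequence):
    """Tally agreement across prediction methods.

    calls_per_sequence: list of tuples, one call per method for each
    sequence, each call being "R5", "X4", or None (method could not score).
    """
    total = unanimous = any_x4 = 0
    for calls in calls_per_sequence:
        scored = [c for c in calls if c is not None]  # assumption: skip unscored
        if not scored:
            continue
        total += 1
        if len(set(scored)) == 1:
            unanimous += 1
        if "X4" in scored:
            any_x4 += 1
    return {
        "total": total,
        "unanimous": unanimous,
        "any_x4": any_x4,
        "all_r5": total - any_x4,
        "percent_unanimous": round(100 * unanimous / total, 1) if total else 0.0,
    }
```

Applied to the reported counts, 3644 unanimous calls out of 4476 sequences corresponds to 81.4%, and the sequences with at least one X4 call (1046) and those called CCR5 by every method (3430) sum to the 4476 sequences scored.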
We then considered inferred coreceptor usage as a function of time since infection, clinical stage, CD4 counts, HIV viral load, and V1V2 length, both overall and separately in plasma- and PBMC-derived viruses. Because the PSSM method provides a continuous numerical measure corresponding to the sequence position on a continuum of the evolutionary changes leading to X4 usage (the PSSM score), we examined PSSM score in relation to these variables. Overall, in separate GEE regression analyses, PSSM score was not related to time since infection (p = 0.9) or HIV viral load (p = 0.5). However, PSSM score was significantly increased (indicating greater CXCR4 usage) in those with stage 4 (β = 6.34, p = 0.0002) but not stage 2 or stage 3 infection (p = 0.8 each). Similarly, PSSM score was significantly increased in those with intermediate (200-500 cells/µl) and low (<200 cells/µl) CD4 counts (β = 1.52, p = 0.02 and β = 6.62; p<0.0001, respectively) compared to those with CD4 counts above 500 cells/µl. PSSM score was weakly associated with increased V1V2 length

Longitudinal analyses
In the longitudinal dataset, significant V1V2 length increases between first and second timepoints were noted in 10 of 22 subjects, a significant V1V2 length decrease over time occurred in one subject, and no significant V1V2 length changes over time were seen in the remaining 11 subjects. These findings appeared to vary by stage of infection (t-test p = 0.03). In the 15 patients from the L1 group (individuals not meeting AIDS criteria at any time prior to final sampling), the mean increase in V1V2 length per subject was 1.69 amino acids per year, and 9 subjects experienced significant V1V2 length increases over time (Figures 4 and 5). In contrast, among the seven subjects in the L2 group (individuals progressing to AIDS between first and final sample), V1V2 length decreased by an average of 0.10 amino acids per year, with only one having a significant trend of increasing length, while one individual showed a significant decrease in length (Figure 6). The distribution of V1V2 length change (increase or decrease) by group was therefore asymmetric (Fisher's exact test, p = 0.02), reflecting a trend of increasing length in asymptomatic individuals (group L1) and stable or decreasing length in individuals with AIDS (group L2) (Table 4). Three subjects in group L1 had extensive longitudinal sampling (Figure 5); in 1362 and Q23 [51], there was a period of V1V2 length stability of approximately 2 years, followed by increase through 4.5 years. V1V2 length increase over time was also seen in CC1. In the case of CC1, a pseudotyped virus was created using the gp120 coding region from the initial timepoint from this individual in an HIV-1 NL4-3 background, and cultured in vitro [54]. In contrast to the patterns observed in vivo, V1V2 length and number of glycosylation sites both declined rapidly over 20 generations in vitro (p<0.001).
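The group comparison above relies on Fisher's exact test for a 2x2 table. A minimal two-sided implementation is sketched below using only the Python standard library. The example counts in the usage note are illustrative: they are built from the numbers of subjects with significant increases quoted in the text (9 of 15 in L1, 1 of 7 in L2) and are not the partition in Table 4, which is not reproduced here, so the result should not be expected to match the reported p = 0.02.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
```

Calling `fisher_exact_two_sided(9, 6, 1, 6)` tests whether significant lengthening is distributed evenly between the two illustrative groups (rows: L1, L2; columns: significant increase, no significant increase).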

Discussion
We have systematically examined gp120 subregion length variation, and the relationship between length polymorphism, N-linked glycosylation sites, and clinical markers of disease progression. Although V1V2, V4 and V5 all displayed remarkable length heterogeneity, and V1V2, C3 and V4 were also quite variable with respect to glycosylation, the most significant associations between virological and clinical variables localized to the V1V2 region. We found that V1V2 length and glycosylation increased significantly over time during chronic infection, and then declined in late-stage illness. In regression analyses, time since infection was the most influential factor in determining V1V2 length. In addition, there was a modest but significant increase in V1V2 length over the period from 1984-2004. V5 loop length was highly variable, but tended to decrease slightly in length over the course of infection.
In SIV infection, the number of PNLGS in gp120 increases over time in vivo following inoculation of a cell-passaged strain [71]. In one earlier study in humans, Bunnik et al noted expansion in gp120 length followed by contraction over time in 4 of 5 individuals receiving antiretroviral therapy, and similar changes in glycosylation in 3 subjects [72]. Others have noted a relationship between early infection and reduced V1V2 length and glycosylation in subtypes C and A [19,45]. In contrast, a comparison of early and chronic HIV-1 subtype B sequences from the HIV sequence database failed to reveal any significant difference in V1V2 length [19], suggesting that these effects may be subtype-specific. Data on length/glycosylation changes during transmission have been conflicting. Derdeyn et al [45] demonstrated reduced length and glycosylation in V1-V4 following heterosexual transmission in HIV-1 subtype C. However, Frost et al failed to note similar findings in a study of eight subtype B homosexual transmission pairs [47], and in our examination of these and 10 additional subtype B infected homosexual transmission pairs, we found no consistent pattern of change in V1-V2 or V1-V4 length or glycosylation upon transmission [46].
Interpretation of the data presented here may be affected by several methodological factors. There is probably some variation in the accuracy of the reported time of infection for sequences obtained from previous reports. In some cases, sequences obtained from prior publications may have been obtained under conditions permitting template resampling [73], and a systematic error due to evolving laboratory methods could result in bias. Also, in our analyses, we have not formally corrected for multiple comparisons. Physiological factors are also likely to introduce some noise, particularly in cross-sectional analyses of parameters with respect to time since infection. The individuals included here represent a broad spectrum of clinical scenarios, diverse host immune response profiles and varying disease progression rates. Plasma sequences may receive contributions from both recently infected target cells and older reservoirs, and therefore imperfectly reflect selective pressures prevailing at the time of infection. Finally, length and glycosylation phenotypes are likely to be affected by chance events and unknown factors not considered in our analyses. Therefore, the effects we describe are influential rather than deterministic, and reflect important selective forces that can be discerned against a background of high inter-individual variation.
Despite these limitations, the analyses presented here and the work of others [40][45][46][47][72] provide the outlines of an overall pattern characterized by transmission of randomly selected V1V2 loop lengths from viruses present in the donor pool, a brief decline in loop size during the initial months immediately following infection, gradual selection for bulkier V1V2 loops during chronic infection, and finally, reversion to more compact loops during late stage illness. Structural studies [22,23], neutralization studies [20][34][35][36][37][38][42], and in vitro data on viruses lacking V1 and V2 [43,44] suggest that one major function of the V1V2 region may be to permit evasion from humoral immune responses in the host. Thus, the trends outlined above support the hypothesis that HIV populations may evolve to escape humoral selective pressure by increasing V1V2 loop size. According to this view, the newly infected, immunologically naïve host might be expected to harbor relatively short V1V2 loops that eventually lengthen in response to an effective humoral response at some fitness cost (Figure S9). Experimental evidence indicating that relaxation of antibody-mediated selective pressure during early infection is associated with shorter loops is provided by Derdeyn, who demonstrated significantly greater neutralization sensitivity among five recipients during early infection than in the corresponding donors [45].
Figure 5. V1V2 length vs. time in subjects Q23, CC1 and 1362. Panel A: Subject Q23, infected with HIV subtype A. Sequences were derived from PBMC (black circles), plasma (black diamonds) and DNA from cervical lymphocytes (green squares) as described by Poss et al [80]. Panel B: Subject CC1, infected with subtype A. Sequences were obtained from plasma (black diamonds) and tissue culture (red squares). Length changes in in vitro sequences occur over ~40 days and are represented along an expanded X-axis for clarity. doi:10.1371/journal.ppat.1001228.g005
The decline in V1V2 size observed in advanced disease probably reflects waning effectiveness of humoral immunity in hosts with late-stage illness and profound immune dysregulation (Figure 7). This decline is also congruent with previous findings of an inverse relationship between the rate of HIV genetic evolution and the rate of CD4 T cell decline in some individuals [74]. The dramatic reduction in V1V2 length associated with transfer to the in vitro environment [54] represents the extreme case of absent host immunity, where viruses without an unnecessarily bulky V1V2 loop achieve maximum replicative fitness. As would be expected, the patterns we observe are most pronounced in plasma sequences, which most directly reflect the selective forces present at the time of sampling. In contrast, a significant increase in V1V2 length over time was not seen in the PBMC compartment. These observations are consistent with the presence of archived genotypes from earlier times during the course of infection within the PBMC compartment. We also note that genotypes present in plasma may emanate from other cellular compartments in addition to PBMC, and may therefore reflect somewhat different evolutionary pressures. However, a considerably greater number of V1V2 sequences were derived from plasma, and sample size may also account for some of the differences observed between these compartments.
Our model may help to explain a failure to find any significant difference in V1V2 length in a comparison of early and chronic HIV-1 subtype B sequences (including sequences from late-stage individuals) [19]. When we reanalyzed the data presented by Chohan [19] after separating subjects with stable chronic illness from subjects with AIDS (Figure S13), we observed a pattern of lengthening over time, followed by decline in late-stage illness, as reported here (See Text S1, section S7). Similarly, we may explain discordant results obtained on V1V2 length variation during transmission of HIV-1 subtypes C and B. While a trend towards shorter loops in recipients was seen in subtype C [45] but not B [46,47], it is likely for methodological reasons that the subjects studied by Derdeyn were sampled at somewhat later times than those of Frost and Liu. Thus the sequences in the latter two studies would be expected to be a random sampling from the donor pool, while those of Derdeyn might reflect the expected shortening prior to the onset of an effective antibody response. Indeed, when we examine a much larger set of subtype A and C transmission pairs from East Africa with more precisely known sampling times obtained soon after transmission, it is difficult to appreciate any consistent pattern of V1V2 length change (See Text S1, section S8 and Figure S14). Thus there may be no need to infer separate mechanisms for different HIV-1 subtypes and modes of transmission.
In addition, we may also explain a trend of increasing V1V2 length by calendar year. If shorter and less glycosylated V1V2 were always selected during transmission, transmission from donors in early infection would maintain a constant V1V2 length within the epidemic, whereas if all new cases were acquired from chronically infected hosts, this increase of V1V2 length by calendar year could be dramatic. However, most studies suggest that about half of transmission events involve subjects in early infection [46,75,76], consistent with the moderate trend we observed. Alternatively, the temporal trends we have observed could represent a gradual adaptation by HIV-1 to the host environment at the population level, a hypothesis that has been proposed by several investigators with respect to mutational escape from HLA-restricted CTL epitopes [77][78][79].
Finally, our results imply that the polymorphisms seen in V1V2 reflect the ability of the host to mount a meaningful immunological response, rather than virologic features that dictate the course of illness. That is, we argue that V1V2 length change is a consequence of environmental selective pressure rather than a causative factor in disease progression.

Supporting Information
Text S1 Supporting analyses text - PDF document containing supplementary analyses and citations. Found at: doi:10.1371/journal.ppat.1001228.s001 (0.14 MB PDF)
Figure S1 V1V2 length vs. virologic and clinical parameters I.
Figure S9 Agreement between 4 bioinformatic methods used to assign probable coreceptor usage. There was complete agreement between all methods for ~80% of sequences examined, while in the remaining 20%, there was some disagreement in assignment between one or more scoring methods. Most sequences were predicted to be CCR5-tropic by all methods (white bar), while a modest number of sequences was predicted to be CXCR4-tropic by all methods. The remaining sequences were scored differently by various methods, as represented (colored bars). Found at: doi:10.1371/journal.ppat.1001228.s010 (2.84 MB TIF)
Figure S10 V1V2 sequence length vs. time since infection and PSSM score. Rising PSSM scores (color scale), depicted as warmer colors, indicate a greater likelihood of CXCR4 coreceptor usage; in this dataset, predicted X4 coreceptor usage occurs at a PSSM score of approximately -2. In these data, there is a pronounced preponderance of CCR5-using viruses, with a trend towards increasing prevalence of X4-tropic viruses during chronic infection. However, X4 and R5 viruses are distributed throughout all infection times, and cannot be easily distinguished on the basis of V1V2 length. Found at: doi:10.1371/journal.ppat.1001228.s011 (1.15 MB TIF)
Figure S11 V1V2 potential N-linked glycosylation sites vs. V1V2 length and PSSM score (color scale). There is a very marked dependence of glycosylation on length (β = 0.13 PNLGS/amino acid, R² = 0.52). X4 usage appears to be more commonly associated with V1V2 sequences bearing 4-7 PNLGS than with sequences with more than 7 sites (and see Figure S1, panel D). Found at: doi:10.1371/journal.ppat.1001228.s012 (1.10 MB TIF)
Figure S12 V1V2 and stage of illness. V1V2 length vs. time since infection for stage 1 (orange "+"), stage 2 (gray triangles), stage 3 (blue squares), and stage 4 (red diamonds). There is a slight decline in V1V2 length from stage 1 to stage 2, reflecting regression from transmitted viruses of essentially random lengths to shorter loop lengths during early infection prior to the onset of a meaningful immune response. This is followed by a strong trend towards lengthening during chronic infection (stage 3) and a weakening of this trend in late-stage illness (stage 4). Found at: doi:10.1371/journal.ppat.1001228.s013 (1.52 MB TIF)
Figure S13 Chohan data revisited: V1V2 sequence length for subjects in early infection (first bar), chronic infection and AIDS considered together (second bar), chronic stable infection only (third bar), and individuals with AIDS-defining clinical conditions (fourth bar). Length differences between "early", "chronic" and "AIDS" are statistically significant (p≤0.02). Thus, separation of sequences obtained during AIDS from sequences obtained during chronic stable infection reveals a trend of rising V1V2 length through chronic infection, followed by falling length in AIDS that is not otherwise apparent.