HLA-Associated Immune Pressure on Gag Protein in CRF01_AE-Infected Individuals and Its Association with Plasma Viral Load

Background The human leukocyte antigen (HLA)-restricted cytotoxic T-lymphocyte (CTL) immune response is one of the major factors determining the genetic diversity of human immunodeficiency virus (HIV). There are few population-based analyses of the amino acid variations associated with the host HLA type and their clinical relevance for the Asian population. Here, we identified HLA-associated polymorphisms in the HIV-1 CRF01_AE Gag protein in infected married couples, and examined the consequences of these HLA-selected mutations after transmission to HLA-unmatched recipients. Methodology/Principal Findings One hundred sixteen HIV-1-infected couples were recruited at a government hospital in northern Thailand. The 1.7-kb gag gene was amplified and directly sequenced. We identified 56 associations between amino acid variations in Gag and HLA alleles. Of those amino acid variations, 35 (62.5%) were located within or adjacent to regions reported to be HIV-specific CTL epitopes restricted by the relevant HLA. Interestingly, a significant number of HLA-associated amino acid variations appear to be unique to the CRF01_AE-infected Thai population. Variations in the capsid protein (p24) had the strongest associations with the viral load and CD4 cell count. The mutation and reversion rates after transmission to a host with a different HLA environment varied considerably. The p24 T242N variant escape from B57/58 CTL had a significant impact on the HIV-1 viral load of CRF01_AE-infected patients. Conclusions/Significance HLA-associated amino acid mutations and the CTL selection pressures on the p24 antigen appear to have the most significant impact on HIV replication in a CRF01_AE-infected Asian population. HLA-associated mutations with a low reversion rate accumulated as a footprint in this Thai population. 
The novel HLA-associated mutations identified in this study encourage us to acquire more extensive information about the viral dynamics of HLA-associated amino acid polymorphisms in a given population as effective CTL vaccine targets.


Introduction
Accumulating evidence indicates that cytotoxic T lymphocytes (CTLs) play a central role in controlling human immunodeficiency virus (HIV) replication in vivo, and a number of CTL-inducing vaccines have been developed [1,2]. All trials of CTL-inducing vaccines against HIV have been unsatisfactory including the most recent trial conducted in Thailand [3,4,5]. Genetic polymorphisms in the human leukocyte antigens (HLAs) are key factors contributing to the complexity of developing CTL-inducing vaccines [6,7]. HLA class I molecules play a critical role in defining the epitopes of CTLs, which probably influence their antiviral efficacy. The extraordinary capacity of this virus to generate genetic diversity is another important factor contributing to this complexity. To date, 13 prototype HIV clades and 43 circulating recombinant forms have been described worldwide and HIV diversity appears to be increasing as the infection spreads [8].
Once the virus infects a host, it rapidly evolves and evades the host cellular immune response. Viral adaptation to the HLA-restricted immune response and the selection of viral mutations associated with the loss of the antiviral immune response have been described in both acute and chronic HIV-1 infections at the individual level [9,10]. Recently, viral adaptations to HLA have also been reported at the population level [11,12]. There is therefore growing concern that HIV may evolve to reduce the availability of key CTL epitopes that are associated with the control of HIV infection at the population level, which in turn would greatly affect the clinical outcomes of HIV/AIDS. These associations are consequently becoming increasingly important for effective CTL-based vaccine strategies.
Several studies have attempted to define HLA-associated mutations in a given population using a large number of HIV genome sequences and to determine their influence on clinical outcomes [13,14]. These studies have identified HLA polymorphisms in the HIV-1 Gag protein and this association continues to be reinforced [15]. However, most information has been derived from studies of subtype-B-HIV-infected Caucasian and subtype-C-HIV-infected African populations, and very little information is available on the CRF01_AE virus, the predominant clade circulating in southeast Asia [16,17].
Therefore, in this study, we investigated the amino acid variations in the HIV-1 CRF01_AE Gag protein among HIV-1-infected people with known HLA alleles in Thailand, with the primary objective of identifying the amino acid mutations associated with the host HLA class I types and their influence on clinical outcomes. Moreover, because our cohort included dozens of discordant couples (viral transmission pairs), we took advantage of this point and further analyzed the fate of these HLA-selected mutations after transmission to HLA-unmatched recipients.

Ethical statement
This study was approved by the Ethics Committee of the Thai Ministry of Public Health and was conducted in accordance with the set guidelines for research. All patients provided their written informed consent for the collection of the samples and their subsequent analysis.

Population and samples
We recruited 116 chronically HIV-1-infected Thai couples (219 patients in total) at a government referral hospital in northern Thailand between 6 July 2000 and 15 October 2002. The cohort has been described in detail elsewhere [18]. We obtained two sequential blood samples from each patient, with an interval of 6-27 months (mean interval 19.75 months, mode 24 months) between the two collections. The majority of patients were naïve to antiretroviral therapy, except for 27 individuals who were receiving treatment with single or dual nucleoside reverse transcriptase inhibitors. However, no patient was receiving highly active antiretroviral therapy. The median (interquartile range, IQR) CD4+ cell count in our study population was 163 (23, 370) cells/µL, and the median (IQR) plasma viral load was 5.20 (4.54, 5.63) log10 RNA copies/mL. Peripheral blood mononuclear cells (PBMCs) were separated with a commercially available cell-separation tube (CPT Cell Preparation Tube with Sodium Citrate, BD, Franklin Lakes, NJ, USA) and used in this study.

HLA class I typing
Genomic DNA was extracted from patient PBMCs with the QIAamp DNA Mini Kit (Qiagen, Hilden, Germany), according to the manufacturer's instructions. HLA class I typing for the A and B loci was performed using a PCR microtiter plate hybridization method (WAKFlow HLA typing kit; Wakunaga Co. Ltd., Hiroshima, Japan), according to the manufacturer's instructions.
For statistical analysis, each HLA allele of each individual was assigned a two-digit designation.
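The two-digit grouping can be illustrated with a short Python sketch. The input spellings handled here (for example "B*5801" or "HLA-A*33:03") and the function name to_two_digit are assumptions made for illustration; the output format of the typing kit is not described in this paper.

```python
import re

def to_two_digit(allele: str) -> str:
    """Collapse an HLA allele name to its locus plus the first two digits.

    Hypothetical helper: accepts spellings such as 'B*5801', 'HLA-A*33:03'
    or 'HLA_B*57' and returns 'B*58', 'A*33' or 'B*57'.
    """
    match = re.match(r"^(?:HLA[-_])?([A-Z]+)\*(\d{2})", allele.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized HLA allele name: {allele!r}")
    locus, digits = match.groups()
    return f"{locus}*{digits}"
```

Under this grouping, B*5801 and any other B*58 subtype fall into the same category for the association tests.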

PCR amplification of HIV gag and sequencing
Genomic DNA was extracted from patient PBMCs as described above, and nested PCR was performed. First, the 9.1-kb nearly full-length HIV genome was amplified using Takara LA Taq DNA polymerase (Takara, Shiga, Japan) and the following primers, which bind to both the long terminal repeat (LTR) regions of the HIV genome: sense outer primer, MSF12b 5′-AAATCTCTAGCAGTGGCGCCCGAACAG-3′, and antisense outer primer, OFMR1 5′-TGAGGGATCTCTAGTTACCAGAGTC-3′. The PCR conditions were as follows: melting at 95°C for 5 min; 30 cycles each of 95°C for 10 s, 65°C for 30 s, and 68°C for 8 min; and a final extension at 68°C for 7 min. The 1.7-kb fragment containing the entire gag gene was amplified from the first round PCR product using Qiagen Taq DNA polymerase (Qiagen): sense inner primer, Gag-F1 5′-TCTCGACGCAGGACTCGGCTTGCT-3′, and antisense inner primer, Gag-R2 5′-CCTCCAATTCCCCCTATCATTTTTGG-3′. The thermocycling conditions for the second round of PCR were as follows: melting at 95°C for 2 min; 30 cycles each of 95°C for 30 s, 60°C for 30 s, and 68°C for 90 s; and a final extension at 68°C for 7 min. The PCR product was analyzed by gel electrophoresis. The appropriate PCR products were directly sequenced by Macrogen Inc., Korea.

Sequence analysis
The HIV nucleic acid sequences were analyzed to identify their subtypes, using the RIP 2.0 software (http://www.hiv.lanl.gov/content/sequence/RIP/RIP.html). All CRF01_AE sequences were submitted to GenBank (accession numbers GU458430-GU458799). Only the sequences that included the complete gag open reading frame were selected for sequence analysis. The sequences were aligned and translated using the MEGA 3.1 software [19]. A consensus sequence was created from the most abundant amino acid at each position in the cohort. HIV transmission between spouses was confirmed by constructing a neighbor-joining phylogenetic tree using the entire gag nucleotide sequences derived from the whole sample. If the viruses derived from a husband and wife clustered on the same branch, the couple's viruses were regarded as having a common ancestor, implying that the virus was transmitted between them. On this basis, we identified 68 such couples. Each member of the remaining couples was considered to be infected with a virus distinct from that infecting his/her spouse. The direction of transmission was determined by in-depth interviews conducted by field workers. The associations between the sequence polymorphisms and the HLA types were analyzed with Fisher's exact test with a 95% confidence interval (CI). This analysis used only the patients who were the source of the virus in each couple (index cases) and was limited to the HLA alleles shared by at least five subjects to ensure sufficient statistical power. Amino acids that were identical to the consensus sequence were considered to be ''dominant'' amino acids, and any difference from the consensus sequence was classified as ''non-dominant''. An amino acid position was declared an ''HLA-associated variable site'' if a significant HLA association was identified in the sequence at both times of sample collection. The HLA-associated variable site was mapped in relation to the best-defined CTL epitopes published in the Los Alamos HIV Databases [http://www.hiv.lanl.gov/content/sequence/HIV/mainpage.html, accessed Dec. 2009].
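The consensus and association steps described above can be sketched in Python. This is a minimal illustration and not the authors' pipeline: the function names, the two-sided definition of the p value (summing all tables no more probable than the observed one), and the layout of the 2x2 table are assumptions.

```python
from collections import Counter
from math import comb

def consensus(seqs):
    """Most abundant residue at each column of equal-length aligned sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)

def hla_association(seqs, carriers, position):
    """Cross-tabulate HLA carriage against non-dominant residues at one site."""
    dominant = consensus(seqs)[position]
    # Rows: HLA-positive / HLA-negative; columns: non-dominant / dominant.
    table = [[0, 0], [0, 0]]
    for seq, has_allele in zip(seqs, carriers):
        row = 0 if has_allele else 1
        col = 0 if seq[position] != dominant else 1
        table[row][col] += 1
    (a, b), (c, d) = table
    return table, fisher_exact_two_sided(a, b, c, d)
```

For a toy alignment in which three allele carriers all show a non-dominant residue and four non-carriers all show the dominant one, hla_association returns the table [[3, 0], [0, 4]] and p = 1/35. In the study, a site was retained only if the association was significant at both sampling times.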
To detect adaptive evolution in protein-coding sequences under natural selection in the population, the branch lengths and nucleotide substitution rate parameters were first estimated to approximate the analogous parameters of the codon model. The MG94 codon model, which estimates synonymous and non-synonymous rates independently for every amino acid position, was then fitted. Site-by-site rate variation was evaluated with the single likelihood ancestor counting (SLAC) and fixed-effects likelihood (FEL) methods. The adaptive evolution analysis was performed with the HyPhy 2.0 software [20,21].

Statistical analysis
All statistical analyses were performed with Excel 2007. Fisher's exact test with a 95% CI was used to detect HLA-associated dominant or non-dominant sites, and Spearman's correlation test was used to correlate the number of HLA-associated non-dominant sites with the viral load. We also used one-way ANOVA to test the differences in viral load among the T242X mutations with or without compensatory mutations in the HLA_B*57/*58-positive or -negative groups.
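For readers who wish to reproduce the rank correlation outside a spreadsheet, a minimal Python sketch follows. It is not the authors' Excel procedure; the function names are hypothetical, and tied values are assumed to share the mean of their rank positions.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)
```

Here x would hold the number of HLA-associated non-dominant sites per patient and y the corresponding log10 viral loads. The function is undefined when either vector is constant, because the rank variance is then zero.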

Results
Sequencing results, study population, and HLA allele frequencies
By subtype analysis, nine individuals were found to be infected with subtype B or a CRF01_AE/subtype B recombinant form of virus, so they were excluded from further analysis. Then, 370 CRF01_AE Gag sequences were determined in the 209 and 161 samples at the first and second time points, respectively, obtained from 219 individuals (116 couples). The numbers of CRF01_AE-infected individuals carrying specific HLA class I alleles are shown in Figure 1. The most frequent HLA_A allele was A*11, followed by A*02, A*24, and A*33. The allele B*15 was the most frequent HLA_B allele, followed by B*40, B*46, and B*13. Clearly, the HLA distribution in Thailand differs from those in North American and African countries. We also analyzed the linkage disequilibrium. Strong linkage was found between A*33 and B*58 (p = 1.39 × 10^-12), as previously reported elsewhere [22,23].

HLA-associated amino acid variations
To identify HLA-associated amino acid variations, we analyzed the Gag amino acid sequences in relation to the HLA types. Phylogenetic analysis identified 68 couples in which the CRF01_AE virus transmission between the spouses was confirmed (Figure S1). In the remaining couples, the spouses were considered to be infected with distinct viruses. To minimize the lineage effect that might result from sampling viruses from concordant couples, we included only one spouse from each couple in the analysis. After removing the contact cases from these 68 concordant couples, 144 first samples and 122 second samples were used for further analysis. We found 44 amino acid site variations (among the known 498 amino acid positions) in the Gag region. All these variations showed statistically significant associations with some of the HLA types (p < 0.05) and these are described below.
The number of HLA-associated amino acid variations did not necessarily correlate with the frequency of the allele. More than five amino acid variations were associated with B*58 and B*52, despite the relative infrequency of these alleles, whereas no variation was significantly associated with one of the most frequent alleles, B*15 (Figure 1B). Among the 56 HLA-associated amino acid variations, 49 (87.5%) were selected by non-dominant amino acids in the presence of the specific HLA type. The remaining seven (12.5%) variations were selected by dominant amino acids in the presence of a specific HLA type. Six amino acid variations caused by negative selection were located in p17, whereas only one was located in p24. Dominant amino acid selection was always associated with frequent HLA alleles: five variations were associated with A alleles (A*11, A*02, A*24, or A*33) and two were associated with B alleles (B*46 or B*27).
We also found that 35 (62.5%) HLA-associated amino acid variations were located within or adjacent to the best-defined HIV-specific CTL epitopes, restricted by the relevant HLA allele [24] (Table 1). Some HLA-associated amino acid variations were located at anchor positions of binding peptides: A*24-associated F79X, B*40-associated E93X, and B*58-associated V485X (Table 1). Odds ratios were widely variable, ranging from 2.60 to 90.0, with a median (IQR) of 7.87 (4.48, 13.3). The odds ratio was highest by far at B*58-associated T242.
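As a reminder of how an odds ratio is read from a 2x2 table of HLA carriage against amino acid variation, a minimal Python sketch follows. The counts in the usage note are invented for illustration and are not taken from Table 1, and the paper does not state how tables containing a zero cell were handled, so returning infinity in that case is an assumption.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio (a*d)/(b*c) for the 2x2 table [[a, b], [c, d]].

    Rows: HLA-positive / HLA-negative; columns: variant / no variant.
    Assumption: infinity is returned when b*c is zero but a*d is not.
    """
    if b * c == 0:
        if a * d == 0:
            raise ValueError("odds ratio is undefined for this table")
        return float("inf")
    return (a * d) / (b * c)
```

With hypothetical counts of 9 variant and 3 non-variant sequences among allele carriers, and 10 variant and 60 non-variant sequences among non-carriers, odds_ratio(9, 3, 10, 60) gives 18.0.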
The codon-based analysis revealed a large number of sites under significant selection in the Gag protein, mostly purifying selection; among the 498 Gag amino acid positions, 270 (54.2%) sites and 52 (10.4%) sites were identified by either the SLAC or the FEL method as purifying selection and positive selection sites, respectively (Table S1). Interestingly, 19 (36.5%) of the 52 positive selection sites were located at the sites of HLA-associated amino acid variations, whereas only 6 (2.2%) of the 270 purifying selection sites were located at the sites of HLA-associated amino acid variations (Table 1). This implies that HLA pressure is one of the major factors driving positive amino acid selection in the Gag protein.

Associations between numbers of HLA-associated amino acid variants and clinical outcomes
After defining the HLA-associated amino acid variation sites in CRF01_AE in the analysis described above, we counted the numbers of HLA-associated variations in autologous viral sequences for each patient, and plotted them on the X axis, and plotted the plasma viral loads and CD4+ cell counts on the Y axis. We found significant associations between the numbers of HLA-associated amino acid variations and the CD4+ cell counts or viral loads. Patients with a higher number of HLA-associated amino acid variants tended to have a higher plasma viral load and lower CD4+ cell counts (Figure 2A). We further analyzed these associations according to the subregions of Gag in which the variations occurred. Intriguingly, these correlations were mainly driven by the associations with variations in the p24 region (Figure 2B).

Amino acid variations in a recipient host with different HLA alleles
With in-depth interviews conducted by designated field workers, the index and contact cases were determined among the 65 concordant couples. Looking at the viral sequences in a pairwise manner, we noted that the frequencies of de novo HLA-associated mutations and reversions after viral transmission to contact cases with a distinct HLA profile varied considerably, depending on the amino acid positions involved. Mutations and reversions of each HLA-associated amino acid variant were studied whenever data for at least five couples were available. When the virus was transmitted to a contact case with a different HLA environment, as confirmed by sequencing, the rate of reversion or mutation for each HLA-associated amino acid variant was calculated and was plotted on a scatter graph (Figure 3). To avoid overestimation of the mutation or reversion rate, we counted only HLA-associated sites with p values of <0.01 with a 99% CI and with a denominator of more than one when we calculated their rates. In total, 30 HLA-associated amino acid variation sites were listed. For instance, at the S9X site restricted by B*13, which was selected by a non-dominant amino acid, the mutation rate was calculated as 5/11 (= 0.45), five S9T-B*13-positive contact cases divided by 11 S9S-B*13-negative index cases. Its reversion rate was calculated as 4/6 (= 0.67), four S9S-B*13-negative contact cases divided by six S9T-B*13-positive index cases (see supplementary data for details of the mutation and reversion sequence variations, Table S2). For dominant amino acid selection sites, the mutation rate and reversion rate were calculated in the opposite way. At the K76X site restricted by A*02, the mutation rate was calculated as 3/3 (= 1.0), three K76K-A*02-positive contact cases divided by three K76R-A*02-negative index cases. The reversion rate was calculated as 1/13 (= 0.077), one K76R-A*02-negative contact case divided by 13 K76K-A*02-positive index cases.
The average reversion rate was 0.42 and the average mutation rate was 0.33. There was a rough inverse relationship between the reversion and mutation rates. P255X (A*11) and I223X (B*13) scored reversion rates of 1.00 and both had low mutation rates. Conversely, F79X (A*24), K76X (A*02), and T242X (B*58) had mutation rates of 1.00 and the former two had low reversion rates. Interestingly, T242X (B*58) was outstanding in that both its mutation rate and reversion rate were very high. This indicates that the rate of accumulation of CTL escape mutations in a given population varies considerably among mutations and restricting HLA types.
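The rate calculations for the S9X (B*13) and K76X (A*02) examples can be reproduced with a few lines of Python. The helper name rate and its min_denominator parameter are hypothetical; the parameter encodes the stated rule that only denominators of more than one were counted.

```python
from fractions import Fraction

def rate(events, at_risk, min_denominator=2):
    """Fraction of transmission pairs in which the change was observed.

    Returns None when the denominator is not more than one, mirroring the
    exclusion rule described in the text (hypothetical helper).
    """
    if at_risk < min_denominator:
        return None
    return Fraction(events, at_risk)

# S9X restricted by B*13 (selected by a non-dominant amino acid)
s9x_mutation = rate(5, 11)   # S9T contact cases / S9S index cases
s9x_reversion = rate(4, 6)   # S9S contact cases / S9T index cases

# K76X restricted by A*02 (selected by the dominant amino acid)
k76x_mutation = rate(3, 3)   # K76K contact cases / K76R index cases
k76x_reversion = rate(1, 13) # K76R contact cases / K76K index cases

for name, value in [("S9X mutation", s9x_mutation),
                    ("S9X reversion", s9x_reversion),
                    ("K76X mutation", k76x_mutation),
                    ("K76X reversion", k76x_reversion)]:
    print(f"{name}: {value} = {float(value):.3f}")
```

Keeping the rates as exact fractions makes the small denominators visible, which matters when comparing sites such as K76X, where the mutation rate of 1.0 rests on only three index cases.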

T242N mutations
As described above, T242X had a high reversion rate. The vast majority of T242X mutations were T242N, known as an escape mutation from CTL (TSTLQEQIGW: TW10), restricted by the protective HLA alleles B*57 and B*5801 in the setting of clade B and C infections. This mutation emerges almost universally in B*57/*5801-positive subjects. Several studies have demonstrated that the T242N substitution affects viral replicative fitness in vitro and it is believed to contribute to the protective effect of these alleles against the progression of HIV disease [25,26]. Moreover, several mutations within the cyclophilin A binding loop, such as H219 and M228, have been shown to compensate to some extent for the reduced viral replicative capacity caused by T242N [27,28]. However, the roles of T242N and the compensatory mutations in CRF01_AE infections are unknown. HLA_B*5801 is known to present the same epitopes as B*57 [29], and our unpublished data indicate that the vast majority of B*58 alleles in Thailand are B*5801. There were no statistically significant differences in the plasma virus loads of the B*57/*58-positive and -negative populations in our cohort (data not shown). Five of the 23 B*57/*58-positive subjects did not carry T242X, and there was no statistically significant difference in their plasma viral loads or CD4 cell counts, i.e., in terms of the presence or absence of T242X in these patients (data not shown). We then stratified the B*57/*58-positive patients with T242X according to the presence/absence of the described compensatory mutations. Interestingly, we found that B*57/*58-positive patients with the compensatory mutations had significantly higher viral loads and lower CD4 cell counts than those without the compensatory mutations (Figure 3), indicating that the proposed mechanism of virus attenuation by the escape mutation and its restoration by the compensatory mutations at the B*57/*5801 TW10 epitope is applicable in the context of CRF01_AE infections.

T242N mutations and transmission
It was recently reported that the transmission of viruses with attenuating CTL escape mutations, particularly T242N from B*57-restricted CTL, is associated with better early clinical outcomes in HLA-unmatched recipients [30,31]. However, the long-term effects of the transmission of these viruses to HLA-unmatched recipients remain unknown. We summarized the amino acid variations around the TW10 epitope in B*57/*58-negative contact cases who had contracted the virus from B*57/*58-positive spouses and their clinical features (Table 2). Only two B*57/*58-negative spouses carried the T242N mutation at the time of sampling. Both had very high CD4 cell counts of >500 cells/µL and very low viral loads of less than 10^4 copies/mL, which is in distinct contrast to the remaining six B*57/*58-negative spouses who lacked T242N (median plasma viral load, 5.39 log copies/mL), and supports the results of the recent study by Chopera et al. [28]. However, because the T242N escape mutation is known to emerge within the first three months of infection in B*57-positive subjects [32], it is unlikely that these six contact cases had acquired the wild-type T242 virus; instead, the transmitted T242N probably reverted after its transmission to these recipients. These data suggest that the majority of the recipients from B*57/*58-positive donors do not receive the benefit conferred by the transmission of the attenuated virus after many years of infection, although we did not know the duration of the infection in each patient in the present study. We identified three other patients without the B*57/B*58 alleles who carried viruses with T242N, and they all had very low viral loads of less than 10^4 copies/mL (data not shown). We presume that they contracted the virus from B*57/B*58-positive patients, although we could not identify their index cases in our study population.
Taken together, these results imply that the transmission of CTL-selected attenuated viruses might confer a survival advantage on HLA-unmatched recipients, at least during the early stage of infection, and that this advantage is not limited to infection with a particular clade of virus. However, this effect may not be retained for an extended period of time.

Discussion
This is the first published study that systematically analyzes variations in the Gag sequence and their associations with HLA in HIV-1 CRF01_AE infections. We identified 56 amino acid variations at 44 amino acid positions, which were significantly associated with a particular HLA class I type. We found that a substantial number of HLA-associated amino acid variations appeared to be unique to this CRF01_AE-infected Thai population. However, despite these distinct variants, we confirmed that the capsid protein (p24) is probably the preferred target of CTLs in CRF01_AE infections. We also found that the reversion rate of these putative CTL escape mutations upon transmission to HLA-unmatched recipients varies considerably, suggesting that the rate of accumulation of CTL escape mutations in a given population differs substantially between mutations. Our data also suggest that the transmission of CTL-selected attenuated viruses is likely to confer a survival benefit on HLA-unmatched recipients, at least during the early days of the infection. These associations between HLA and viral sequences can be explained in several ways. The majority of associations are probably attributable to specific mechanisms by which the virus escapes from HLA-restricted, HIV-specific CTLs, such as loss of peptide binding, loss of T-cell receptor recognition, and/or changes in peptide processing and presentation [1]. In fact, we found that two thirds of the HLA-associated variations were located within known CTL epitopes or their flanking regions. However, most previously reported CTL epitopes were identified in studies of other clades, indicating that these CTLs are probably cross-clade CTLs. Because studies of CTL epitopes in CRF01_AE infections are limited, we believe that several other mutations may occur in clade-specific CTL epitopes that have not yet been reported. 
Our current work on CTL epitope mapping using overlapping CRF01_AE Gag peptides has identified several new CTL epitopes, and at least two amino acid mutations have been found within the newly identified CRF01_AE Gag CTL epitopes (data are in preparation). Linkage disequilibrium can also explain the associations between HLA and viral sequences. There is a strong association between HLA_A*33 and _B*58 [22,23]. The A*33-associated T242X and G248X mutations are widely known to be selected by HLA_B*57/*5801 [26], so these associations are likely to reflect the linkage disequilibrium with B*58. Some HLA-associated mutations may also be part of the structural and functional compensatory mechanism underlying the development of primary CTL escape mutations, which can arise at sites considerably remote from the relevant epitopes [30].
This study has relatively low statistical power compared with previous studies. However, one of its strong aspects is that we identified a substantial number of HLA-associated amino acid variations that appear to be unique to a CRF01_AE-infected Thai population. Many of these variations do not appear in the list of HLA-associated mutations identified in subtype-B infections in over 1,500 subjects [15], and this cannot be explained by the relatively low statistical power of the present study. For instance, mutations such as S9X, V280X, and P453X had convincingly strong associations with HLA_B*13, _B*46, and _B*55, respectively (p < 0.001; Table 1). However, none of these associations have been noted in subtype B, suggesting that there are a number of unidentified CTL pressures on Gag in the context of CRF01_AE infections in the Asian population. This indicates that an extensive search for CTL epitopes in various clades is warranted to facilitate the development of globally effective CTL vaccines.
Several publications have suggested that the Gag CTL response, as measured by interferon-γ ELISPOT, has the most profound effect on the clinical outcome in subtype B and subtype C infections. Responses to the capsid protein are also likely to be most crucial in the containment of viral replication in vivo [31]. In this study, we have shown that variations in the capsid protein (p24) had the strongest association with viral load and CD4 cell count, indicating that regardless of the HIV clade, the p24 capsid is the most preferred target of HIV-1-specific CTLs. Therefore, p24 may be one of the important targets for effective CTL-based vaccines.

Reversion rates and mutation rates
When the virus is transmitted into a new host with a different HLA environment, the virus, which had already adapted to the previous host, must adapt again to the new HLA environment by reversion of the previous mutations and/or the creation of new mutations. One of the most interesting results of the present study is the variability in the reversion and mutation rates for each HLA-associated variant. We found that these rates varied considerably, depending upon the amino acid position, and that the reversion rate tended to correlate inversely with the mutation rate. It is plausible that an amino acid change with a low reversion rate tends to accumulate in the population, and that the rate of accumulation is higher if the mutation rate is higher and especially if its HLA allele is dominant. In fact, we found that all the variations selected by dominant amino acids were associated with common HLA alleles and rarely changed from the consensus sequence, even after they were transmitted to HLA-unmatched recipients. These findings suggest that the viruses circulating within the population had already adapted to the common HLA alleles, resulting in best-fit sequences that could escape from those alleles. The results of our study will increase our understanding of the influence of immune pressure on HIV and on the future direction of virus evolution. Conversely, it is also plausible that an amino acid change with a high reversion rate, presumably with a functional or structural constraint at that position, is unlikely to accumulate rapidly in the population. It will be important for future CTL vaccine development to consider whether an escape mutant will accumulate or not. Therefore, further studies of this kind are required to provide valuable insights for future vaccine design.

T242N issues
This is the first report indicating that the p24 T242N escape mutation from the B*57/*58 CTL has a significant impact on the HIV-1 viral load in CRF01_AE infections and that the mutations H219 and M228 compensate for the crippling effect of T242N. This is a rather predictable result because the TW10 site is conserved throughout the HIV-1 clades, but it extends the observation to CRF01_AE. Two other dominant CTL epitopes within the capsid protein are restricted by B*57 (IW9: ISPRTLNAW; KF11: KAFSPEVIPMF). However, the founder virus of CRF01_AE has a well-described 'peptide-processing mutation' at IW9 that affects the epitope presentation on HLA class I molecules, and a well-described CTL escape mutation within KF11 [33]. In fact, 100% of the CRF01_AE sequences in our cohort carried these amino acid substitutions (data not shown; see the Methods for the GenBank accession numbers), indicating that these two CTL epitopes are less likely to be recognized in CRF01_AE infections. Several lines of evidence indicate that TW10 is the most important target determining the viral load set point: TW10 is the earliest target during primary infection [32]; and among all the described HLA-associated mutations, T242N is the earliest escape mutation that emerges during acute infections [13]. In light of these previously reported data and the nature of the CRF01_AE sequence, it would be interesting to determine whether TW10 is sufficient to protect B*57/*5801-positive subjects from disease progression. Unfortunately, we saw no clear protective effect of B*57/*5801 in the cohort reported here. However, as shown by the low median CD4+ T-cell count, our cohort was substantially advanced in terms of disease progression. Therefore, the accumulated compensatory mutations might have masked the true protective effect of B*57/*5801.
Supporting this explanation, we observed substantially lower plasma viral loads in the B*57/*5801-positive subjects with T242N but without the compensatory mutations (p = 0.012 by ANOVA; Figure 4).
We have also shown that the transmission of virus crippled by T242N might be associated with lower plasma viral loads in HLA-unmatched recipients, supporting a previous study of clade C infections [34]. However, the long-term effects of the transmission of virus attenuated by CTL escape mutations remain unknown and these contact cases should be examined longitudinally to determine whether virological escape accompanies the reversion of the T242N escape mutation.

Limitations
One of the limitations of this study is that we had no information regarding the timing of HIV transmission within the couples, although the rate of mutation is known to depend on the time from transmission [35]. However, perhaps because the duration of marriage in our discordant couples was quite long (median of seven years), we detected negligible amino acid changes between the first and second samplings. Therefore, we believe that most of the associations between amino acid mutations and HLA alleles observed in this study occurred in chronic infections. Another limitation is that this type of analysis depends heavily on statistical power. Therefore, it is difficult to identify HLA-associated variations if the allele frequencies and rates of mutation are low. Moreover, we did not use multiple testing corrections because of the small sample size, so we must acknowledge that there are likely to be a substantial number of false positive results. However, the high odds ratios of some of the novel HLA-associated mutations in the context of CRF01_AE infections in this Asian population strongly encourage us to obtain more information about CTL epitopes in different geographical regions where distinct HIV clades are circulating, to develop globally effective CTL-based vaccines.
Our contact cases were not incident cases. Inevitably, there is also concern that the estimated direction of viral transmission was not true. However, because the HIV epidemic in Thailand started with commercial sex workers transmitting to their male clients and then from husbands to their wives [16], and partly because our study was conducted in a hospital close to a rural community, our interviews clearly indicated which spouse displayed risk behavior for HIV infection in most couples.
In conclusion, our data suggest two points: (a) HLA-associated amino acid mutations and CTL selection pressure on the p24 antigen appear to have the most significant impact on HIV replication in this CRF01_AE infection in an Asian population; and (b) the rates of accumulation of CTL escape mutations at the population level differ substantially between escape mutations, because the reversion rates varied considerably among the HLA-associated mutations.

Supporting Information
Figure S1 The inserted box magnifies the phylogenetic tree to show how the couples were identified. Found at: doi:10.1371/journal.pone.0011179.s001 (0.48 MB TIF)