Is the tryptophan codon of gene vif the Achilles’ heel of HIV-1?

To evaluate the impact of hypermutation on HIV-1 dissemination at the population level, we studied 7072 sequences of the HIV-1 gene vif retrieved from a public databank. From this dataset, 854 sequences were selected because they had associated CD4+ T lymphocyte counts and viral loads, and these were used to assess the correlation between clinical parameters and hypermutation. We found that the frequency of stop codons at sites 5, 11 and 79 ranged from 2.8 × 10⁻⁴ to 4.2 × 10⁻⁴. In contrast, at codons 21, 38, 70, 89 and 174 the frequency of stop codons ranged from 1.4 × 10⁻³ to 2.5 × 10⁻³. We also found a correlation between clinical parameters and hypermutation: patients harboring proviruses with one or more stop codons at the tryptophan sites of the gene vif had higher CD4+ T lymphocyte counts and lower viral loads compared to the population. Our findings indicate that A3 activity potentially restrains HIV-1 replication, because individuals with hypermutated proviruses tend to have lower numbers of RNA copies. However, owing to the low frequency of hypermutated sequences observed in the databank (44 out of 7072), it is unlikely that A3 has a significant impact on curbing HIV-1 dissemination at the population level.

It has been proposed that hypermutation is not enough to repress HIV-1 infection within the host because proviruses with varying amounts of G→A mutations are commonly observed in the cells of infected individuals [10,26]. It is therefore reasonable that, when G→A mutations are ineffective in neutralizing viral genomes, A3G activity can actually increase HIV-1 diversification [9,27].
In this work, we used 7072 sequences to estimate the rate of stop codons at the tryptophan (TGG) codons of the vif gene. By assuming that A3 hypermutations are context-dependent and RT mutations are context-independent, this approach enables us to distinguish the impact of A3 hypermutation from that of RT activity in the vif gene.
We found that patients harboring proviruses with one or more stop codons at the tryptophan sites of the gene vif had higher CD4+ T lymphocyte counts and lower viral loads compared to the population.

Sequence processing
To evaluate the impact of hypermutation on HIV-1 dissemination at the population level, we studied 7072 sequences of the vif gene obtained from the public HIV databank at the Los Alamos National Laboratory (https://www.hiv.lanl.gov). From this dataset, 854 sequences that had associated CD4+ T lymphocyte counts were selected to assess the correlation between clinical parameters and hypermutation.

Sequence alignment and editing
The sequences were aligned using the ClustalX program [28]. In addition, the SE-AL program, version 2.0 (Department of Zoology, Oxford University; http://tree.bio.ed.ac.uk/software/seal/) was used to edit the alignment in order to keep all reading frames open.

Mutation rates and statistical analysis
To compute the stop codons induced by A3 proteins we used the same approach described by Cuevas et al. (2015). Briefly, the preferential targets of distinct A3 proteins while editing the minus strand of viral DNA are well characterized [27]. The A3G protein converts GG to AG primarily in the GGG triplet; thus, in this context tryptophan codons (TGG) can be converted to the stop codons TAG or TGA. On the other hand, A3F/D/H convert GG to GA in the target GGA; consequently, tryptophan codons (TGG) will be converted into TGA stop codons. Following this rationale, it is possible to estimate the amount of A3 mutations compared to the baseline mutations induced by the reverse transcriptase (RT). Since the vif gene of HIV-1 is essential to the life cycle of the virus and the protein has eight canonical tryptophan codons at sites 5, 11, 21, 38, 70, 79, 89 and 174, we counted the number of stop codons at each of these sites [7,19,24,25]. The Bayesian independent Welch test was used to correlate clinical data and mutation rates. All statistical tests were performed using JASP software v.0.11.1 (https://jasp-stats.org/). Boxplots were constructed using R software v.3.5.1 (www.r-project.org).
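The counting step described above can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: it assumes the sequences are already codon-aligned and in frame, so that codon position n starts at nucleotide 3(n - 1), and the function names (`codon_at`, `count_stops_at_w_sites`, `stop_frequencies`) are hypothetical.

```python
# Canonical tryptophan (W) codon positions in Vif, as listed in the text
# (1-based codon numbering).
W_SITES = (5, 11, 21, 38, 70, 79, 89, 174)
STOP_CODONS = {"TAG", "TGA", "TAA"}


def codon_at(seq, site):
    """Return the codon at a 1-based codon position of an in-frame sequence."""
    start = 3 * (site - 1)
    return seq[start:start + 3].upper()


def count_stops_at_w_sites(aligned_seqs, sites=W_SITES):
    """Count, for each tryptophan site, how many sequences carry a stop codon."""
    counts = {site: 0 for site in sites}
    for seq in aligned_seqs:
        for site in sites:
            if codon_at(seq, site) in STOP_CODONS:
                counts[site] += 1
    return counts


def stop_frequencies(aligned_seqs, sites=W_SITES):
    """Per-site stop-codon frequency: stop count divided by number of sequences."""
    n = len(aligned_seqs)
    counts = count_stops_at_w_sites(aligned_seqs, sites)
    return {site: counts[site] / n for site in sites}
```

Alignment gaps and frameshifts are ignored here on purpose; a real analysis would need to handle them before the codon positions can be trusted.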

Rates of stop codons
Estimates of mutation rates indicated that there are variable (e.g., 38 and 70) and more conserved (e.g., 5) sites in vif (Fig 1). The frequency of stop codons at sites 5, 11 and 79 ranged from 2.8 × 10⁻⁴ to 4.2 × 10⁻⁴, while at codons 21, 38, 70, 89 and 174 the frequency of stop codons ranged from 1.4 × 10⁻³ to 2.5 × 10⁻³. The differences in mutation rates induced by A3 (dark gray bars in Fig 1) and by RT (light gray bars in Fig 1) revealed that sites not targeted by A3 proteins (i.e., 5, 11 and 79) were those with lower mutation rates. Conversely, sites 38 and 70, which are targets of A3G (at these sites TGG is followed by G, giving TGGG), had higher rates (Fig 1). It is worth mentioning that at site 70 of Vif all patients presented the context TGGG, whereas at site 38 the context TGGG was present in 205 of patients while TGGT was present in 70%. The context TGGT can be mutated either by A3 activity, converting TGGT into TAGT, or by RT activity, converting TGGT into TGAT. In addition, at sites 89 and 174 the tryptophan codon TGG is followed by A; this context (TGGA) is targeted by the A3D/F/H proteins. The above results show that tryptophan sites 5, 11 and 79, which are not targeted by A3 proteins, are those with lower rates of stop codons. Conversely, sites targeted by A3 proteins have higher rates of stop codons.

Rates of other mutations
We also estimated the overall mutation rates considering the TGG context regardless of its position in the vif gene. These measurements are summarized in Fig 2, which shows that mutations from TGG (tryptophan codon) to the stop codons TGA or TAG are more frequent when the tryptophan codon is followed by G or A (TGGG or TGGA). Tryptophan codons that are targeted by A3 proteins (i.e., TGGG or TGGA) have rates of 1 × 10⁻³ to 3.1 × 10⁻³, while codons targeted by RT have rates of 2 × 10⁻⁴ to 6 × 10⁻⁴.
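A minimal sketch of this context-based pooling is shown below. It rests on a simplifying assumption that is ours, not the authors': every TGG followed by G is attributed to A3G, every TGG followed by A to A3D/F/H, and all other contexts to baseline RT error. The names `classify_context` and `pooled_stop_rates` and the input format are hypothetical.

```python
from collections import defaultdict


def classify_context(next_base):
    """Map the base following TGG to the activity expected to target it."""
    base = next_base.upper()
    if base == "G":
        return "A3G"      # TGGG context
    if base == "A":
        return "A3D/F/H"  # TGGA context
    return "RT"           # TGGT or TGGC: treated here as baseline RT error


def pooled_stop_rates(observations):
    """Pool stop-codon rates by context class, regardless of codon position.

    observations: iterable of (next_base, is_stop) pairs, one pair per
    sequence and tryptophan site.
    """
    stops = defaultdict(int)
    totals = defaultdict(int)
    for next_base, is_stop in observations:
        label = classify_context(next_base)
        totals[label] += 1
        stops[label] += int(is_stop)
    return {label: stops[label] / totals[label] for label in totals}
```

Treating TGGT as a pure RT context is a deliberate simplification: it ignores any contribution of A3 activity in that context, so the sketch can only approximate the comparison drawn in Fig 2.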

Stop codons and clinical status
To assess the correlation between clinical parameters and mutations we used 854 sequences that had associated CD4+ T lymphocyte counts and viral loads. We found eleven sequences having one or more stop codons at the tryptophan sites of the gene vif. Notably, these sequences presented lower viral loads (posterior probability = 0.097) and higher levels of CD4+ T lymphocytes (posterior probability = 0.071) compared with the overall values of the 854 sequences (Table 1). The median viral load in patients with hypermutation was 7,864.00 copies/mL (variance = 1.624E11) with a mean of 216,392.36 copies/mL, whereas in patients without hypermutation the median was 50,709.00 copies/mL (variance = 2.865E11) with a mean of 225,917.78 copies/mL. The median CD4+ T lymphocyte count for the group with hypermutation was 434.00 cells/mm³ (variance = 67968.82) with a mean of 485.72 cells/mm³, while for the samples without hypermutation the median CD4+ T cell count was 403.00 cells/mm³ (variance = 65505.90) with a mean of 434.61 cells/mm³ (Fig 3).
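The group summaries reported above (median, mean, variance) and an unequal-variance comparison can be sketched with the Python standard library. Note that this is the classical (frequentist) Welch t statistic with Welch-Satterthwaite degrees of freedom, swapped in for illustration; it is not the Bayesian Welch test run in JASP and does not produce posterior probabilities. The function names are hypothetical.

```python
import math
import statistics


def summarize(values):
    """Median, mean and sample variance for one group of measurements."""
    return {
        "n": len(values),
        "median": statistics.median(values),
        "mean": statistics.fmean(values),
        "variance": statistics.variance(values),
    }


def welch_t(a, b):
    """Classical Welch t statistic and Welch-Satterthwaite degrees of freedom.

    Frequentist stand-in only; the paper reports a Bayesian Welch test.
    """
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df
```

With group sizes as unbalanced as eleven hypermutated sequences against the remainder of the 854, the unequal-variance form is the natural choice, since it does not assume that both groups share a common variance.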

Discussion
The amount of A3G hypermutation varies considerably along the HIV-1 genome, and this gradient of G-to-A substitutions correlates with the time the minus strand remains as a single-stranded molecule during replication [26]. One consequence of the hypermutation gradient is that some genes are more affected than others. The vif gene has the lowest amount of A3G-associated mutation compared to other HIV-1 genes [27]. Vif also has tryptophan residues (W) at the specific positions 5, 11, 21, 38, 70, 89 and 174 that are involved in A3G and A3F binding. These codons will be targeted by A3 activity, and the TGG codon will be changed into a stop codon (e.g., TAG, TGA, TAA). Likewise, the TGG codon will be targeted by RT activity, which converts it into stop codons and also into other codons such as TTG, TGT, AGG, etc.
Vif has some conserved residues, notably in the motifs 14 [19]. This lower diversity in some residues has been related to the very strong purifying selection detected on this viral protein, thus indicating that vif is essential to the HIV life cycle [4,7,19,21,23]. While hypermutation induced by A3G activity is a natural barrier against retroviruses, it is not enough to restrain HIV-1 infection. Since HIV-1 infection is characterized by multiple strains forming a quasispecies, it is likely that hypermutated strains can benefit from circulating or even reservoir viruses in distinct tissues. It is also likely that A3G activity can actually increase HIV-1 diversification when G-to-A hypermutation is ineffective in neutralizing all viral genomes within a host [9].
Colson et al. [29] showed that in long-term non-progressor patients A3 activity is able to restrain HIV-1 replication by changing tryptophan (TGG) codons into stop codons (TAG/TGA), mainly in the gene vif. However, the effect of hypermutation on the spread of HIV-1 is not yet known. We studied this subject by using the tryptophan codons of Vif as a proxy to evaluate the potential of A3 activity to reduce the chances of HIV spread between individuals.

[Table 1. Column headers: Hypothesis, Bayes factor, Posterior probability]

Our analysis indicated a correlation between clinical parameters and hypermutation: patients harboring proviruses with one or more stop codons at the tryptophan sites of the gene vif had higher CD4+ T lymphocyte counts and lower viral loads compared to the population. Thus, our findings indicate that A3 activity potentially restrains HIV-1 replication, because individuals with hypermutated proviruses tend to have a lower number of RNA copies. However, owing to the low frequency of hypermutated sequences observed in the databank (44 out of 7072), it is unlikely that A3 has a significant impact on curbing HIV-1 dissemination at the population level. Our findings are in agreement with the observation that A3G hypermutation is more frequent among elite controllers [30]. It is also worth mentioning that CD4+ T lymphocyte counts are related to selective diversity and hypermutation [11,13,31].