Transmissions of simian viruses to humans have given rise to the different groups of HIV-1. We recently identified a functional motif (CLA), in the C-terminal domain of the integrase, that is essential for integration in HIV-1 group M. Here, we found that the motif is instead dispensable in group O isolates because of the presence, in the N-terminal domain of the HIV-1 O integrase, of a specific sequence, Q7G27P41H44, that we define as the NOG motif. Alterations of reverse transcription and of 3’ processing observed upon mutating the CLA motif of IN M are fully rescued to wt levels by inserting the sequence of the NOG motif in the N-terminal domain of the protein. These results indicate that the two motifs (CLA and NOG) functionally complement each other, and a working model accounting for these observations is proposed. The establishment of these two alternative motifs seems to be due to the different phylogenetic origin and history of these two groups. Indeed, the NOG motif is already present in the ancestor of group O (SIVgor) while it is absent from SIVcpzPtt, the ancestor of group M. The CLA motif, instead, seems to have emerged after SIVcpzPtt was transferred to humans, since no conservation is found at the same positions in these simian viruses. These results show the existence of two group-specific motifs in HIV-1 M and O integrases. In each group, only one of the motifs is functional, potentially leading the other motif to diverge from its original function and, in an evolutionary perspective, to assist other functions of the protein, further increasing HIV genetic diversity.
HIV-1 establishes a permanent infection by integrating its genome into that of the host cell. This key step is achieved by the viral integrase (IN), which is composed of three domains: N-terminal (NTD), catalytic core (CCD), and C-terminal (CTD). We recently reported, in the CTD, the existence of a functional motif (CLA) that is essential for integration in HIV-1 group M. Here we show that in HIV-1 group O, the function exerted by the integrase M CLA motif is instead ensured by another motif (NOG), located in the NTD of the protein. We show that these two motifs can functionally complement each other by acting on the same steps of the viral cycle, highlighting for the first time such an important functional duality between HIV-1 integrases. We also show that, while the NOG motif in HIV-1 O was likely inherited from its simian ancestor (SIVgor), the CLA motif seems to have emerged in the human host, probably as the simian virus adapted to its new host on its way to becoming HIV-1 M. Such a requirement does not seem to have existed for HIV-1 O, possibly because of the presence of the NOG motif.
Citation: Toccafondi E, Kanja M, Winter F, Lener D, Negroni M (2023) A snapshot on HIV-1 evolution through the identification of phylogenetic-specific properties of HIV-1 integrases M/O. PLoS Pathog 19(3): e1011207. https://doi.org/10.1371/journal.ppat.1011207
Editor: Welkin E. Johnson, Boston College, UNITED STATES
Received: October 26, 2022; Accepted: February 13, 2023; Published: March 30, 2023
Copyright: © 2023 Toccafondi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: This work was supported by the ANRS (grant ECTZ72120) to M.N. E.T. was recipient of a three-years doctoral fellowship from the ANRS (2nd call 2018) and then a one-year doctoral fellowship from Sidaction (FJC-12935). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Transmission of viruses from animals to humans is a major threat to human health, with the HIV-1 pandemic being a clear example. The four HIV-1 groups, in fact, each originated from an independent zoonotic transmission of simian viruses to humans. Group M and group N both derive from SIVcpzPtt [1,2], while groups O and P derive from SIVgor [3,4]. Although HIV-1 groups M and O share similar geographic and temporal origins [5–7], their epidemiological success has differed greatly. While group M is responsible for the AIDS pandemic, infecting around 39 million people worldwide, group O has had far lower epidemiological success, infecting around 30 thousand people, mostly in the west-central region of Africa [8,9]. The bases for this discrepancy are only partially known to date, although they constitute a central question for identifying the critical properties that allow cross-species transmission and diffusion. Their different zoonotic origin and the subsequent sequence diversification in the human host are responsible for the large intergroup genetic diversity between groups M and O, which can reach almost 50% in the env gene. Despite this, they have globally convergent phenotypes and, to date, only a few functional differences have been highlighted between their proteins and enzymes. Among those, the most marked one concerns the counteraction of the antiviral properties of the cellular protein tetherin, which is exerted by Vpu in HIV-1 M while it is partially carried out by Nef in the case of HIV-1 O [11–14].
HIV replication requires the integration of the reverse-transcribed genomic RNA into the genome of the infected cell. This key step is catalyzed by the integrase (IN), one of three viral enzymes. Integrases M and O share 84% sequence identity as well as the same domain organization and the same functions. IN consists of three domains connected by flexible linkers: the N-terminal domain (NTD), the catalytic core domain (CCD), and the C-terminal domain (CTD) [15–17]. Each of these domains is specialized in one or more functions. The NTD is important for the multimerization and stabilization of the active form of the integrase [18,19], which is a highly organized multimer formed by several dimers of dimers [20,21]. The CCD is involved in DNA binding and contains the amino acid triad responsible for the catalytic activity of the enzyme, but it is also the domain involved in protein dimerization and it mediates the interaction with LEDGF/p75, a fundamental host factor required for successful infection by HIV-1. Finally, the CTD is involved in binding viral RNA/DNA at different steps of the infectious cycle [24–27], and in the interaction with the viral reverse transcriptase [28,29].
It is in the CTD of the IN from group M that we previously identified a functional motif, constituted by four non-contiguous amino acids (positions 222, 240, 254, and 273). We will refer to the sequence N222K240N254K273 of integrases M (the one yielding the highest levels of integration in group M while also ensuring the highest levels of reverse transcription) as the "CLA (C-terminal lysine-amidic) motif" and to the same positions, irrespective of the amino acids they harbor, as the CLA positions. Despite its high conservation in vivo, the positions of the four residues could be permuted within the motif, in most cases, without affecting the efficiency of integration in cell culture. In fact, as long as at least two lysines are present within the motif and the remaining residues are amidic (N or Q), functionality is retained at wt levels in most of the possible combinations. The combination in which integration efficiency was most affected, dropping to 25% of the wt, is NQKK. Other steps of the viral cycle were also affected when this sequence was inserted in HIV-1 M IN: a drop in the amount of reverse transcription products and in the amount of 3’ processed ends was observed. We previously determined the structure of the CTD for this variant (NQKK), showing that the protein folded into a structure similar to that of the wt CTD (N222K240N254K273), but with a different distribution of charges at its surface, which could contribute to the observed impairment of functionality. Indeed, in the wt protein the CLA motif forms a positive surface that was proposed to bind to a negatively charged partner, whereas when the amino acid sequence NQKK is inserted at the CLA positions, the positive surface is lost.
Surprisingly, we found that this amino acid sequence (N222Q240K254K273) is highly conserved at the CLA positions of group O, raising the question of how a motif with such markedly reduced functionality compared to the motif of group M could have been selected in group O. Understanding whether a functional difference exists between the integrases of HIV-1 isolates M and O could help us understand the reasons behind their different success. Therefore, investigating the functionality of the CLA motif in the IN of HIV-1 O isolates constituted the starting point of this work.
The CLA motif is dispensable in isolates of group O
While, as mentioned above, the influence on integration of the amino acids that occupy the CLA positions has been well characterized for HIV-1 group M, their effect is unknown for group O isolates. To shed light on this aspect, we used two isolates from this group, BCF120 and RBF206 (named hereafter O120 and O206, respectively), that present in the CLA positions either the same sequence as the group O consensus (NQKK, isolate O120, Fig 1A) or a different one (KQKQ, isolate O206, which was chosen as an outlier). In both isolates we replaced the sequence in the CLA positions by NQNQ (O120/NQNQ and O206/NQNQ, Fig 1B and 1C), a sequence that was shown to abolish integration in isolates M. In sharp contrast to what was observed for group M, for both isolates the replacement of the original sequence in the CLA positions by NQNQ did not affect integration in either HEK293T or Jurkat cells (Fig 1B and 1C). The same replacement in isolate M AF286237 (referred to hereafter as "isolate M"), used as a control, led to undetectable levels of integration (Fig 1D). These observations indicate either that isolates O do not require the function exerted by the CLA motif or that this function is carried out by another region of the integrase or by another protein.
The NTD of isolates O complements the function of the CLA motif of isolates M
We first investigated whether another region of group O integrases exerts the same function as the CLA motif of integrases M. To this end, we replaced, in O206/NQNQ, five large regions with the homologous ones of isolate M and measured integration in HEK293T (Fig 2A). In isolate M, the replacement of the NKNK sequence, the CLA motif, by NQNQ was sufficient to abolish integration (Fig 1D), indicating that no other region complements the defect in the CLA motif in this group. Therefore, if the region that ensures the functions of the CLA motif in O206/NQNQ is replaced by the homologous region of isolate M, integration should no longer occur.
(A) Sequence conservation logo of the CLA motif positions in isolates of group O. (B-D) Top of each panel: schematic representation of IN tested for integration. Color code is white for isolates M and black for isolates O. When mutated with respect to the sequence of the wt, the amino acids of the CLA motif are shown in capital letters. Bottom of each panel: normalized levels of integration relative to the level of the wt IN. (B) n = 3 for HEK293T and n = 5 for Jurkat. (C) n = 5. (D) n = 4. Data are shown as the average ± SD. ****p ≤ 0.0001. ns, not significant (two-tailed, unpaired Student’s t-test).
Integrase is a pleiotropic protein. As such, if mutated, it can influence different steps of the infectious cycle, several of which can affect the generation of proviral DNA. Among these are reverse transcription and, when IN is still part of the Gag-Pol precursor, Pr160Gag-Pol proteolytic processing, a step required to obtain a mature infectious particle. The decrease in the number of proviruses generated with a mutated IN could therefore be due either to a defect in integration per se, or to a defect in the steps preceding integration. For these reasons, for each mutant generated in this work, we evaluated, besides the formation of proviral DNA, the efficiency of reverse transcription and that of Pr55Gag proteolytic processing. Furthermore, if fewer reverse transcription products (RTPs) are produced with a mutant, fewer proviral DNAs will be generated even if the mutant is not affected in the step of integration itself. For this reason, to measure the efficiency of integration per se, we expressed the levels of integration normalized by the amount of late reverse transcription products throughout the study (see Methods).
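The normalization described above can be illustrated with a minimal sketch. It assumes that integration is expressed as proviral DNA per late RTP, relative to the same ratio for the wt IN; the function name and the input values are hypothetical and are not taken from the Methods.

```python
def normalized_integration(proviral, late_rtp, wt_proviral, wt_late_rtp):
    """Integration per late RT product, relative to wt (wt = 1.0).

    Assumption: all four inputs are quantities measured in the same
    units for mutant and wt (e.g. qPCR copy numbers per cell).
    """
    if late_rtp <= 0 or wt_late_rtp <= 0 or wt_proviral <= 0:
        raise ValueError("RTP amounts and wt proviral DNA must be positive")
    mutant_ratio = proviral / late_rtp      # proviruses made per available RTP
    wt_ratio = wt_proviral / wt_late_rtp    # same ratio for the wt enzyme
    return mutant_ratio / wt_ratio


# Hypothetical mutant A: half the RTPs and half the proviruses of the wt.
# The integration step itself is unaffected.
mutant_a = normalized_integration(50, 50, 100, 100)

# Hypothetical mutant B: half the RTPs but only a quarter of the proviruses.
# Integration per RTP is halved.
mutant_b = normalized_integration(25, 50, 100, 100)
```

With this normalization, a mutant that only reduces reverse transcription scores as wt for integration, whereas a drop in the normalized value points to a defect in integration itself.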
The estimates of the efficiency of integration for the chimeras shown in Fig 2A clearly indicate that integration was abolished for two of them (chimeras O206/NTD-M/NQNQ and O206/CCD1-M/NQNQ, Fig 2B), corresponding to the chimeras where either the NTD or the N-terminal part of the CCD were replaced by the homologous regions of isolate M. The effect on these two mutants was specific for integration since proteolytic processing of Pr55Gag was unaffected with respect to wt IN O206 in all chimeras as well as in O206/NQNQ (S1A and S1B Fig) while reverse transcription was reduced to approximately 60% of wt IN O206, although in a comparable manner across the chimeras (Fig 2B).
(A) Schematic representation of the chimeras with the NQNQ sequence in the CLA motif positions and of IN O wt, as reference at the top of the drawing. Color code is black for isolates O and white for isolates M. (B) Normalized levels of integration (left graph) and amount of RTPs (right graph), relative to the wt IN, for the chimeras shown in panel A (n = 5 for O206 wt and O206/NQNQ, n = 3 for the remaining samples). (C) Schematic representation of the mutants used to discern whether the loss of functionality of the two chimeras shown in panel B is related to the functionality of the CLA motif. (D) Normalized levels of integration (left graph) and amount of RTPs (right graph), relative to the wt IN, for the chimeras shown in panel C (n = 7 for O206 wt and O206/NKNK, n = 3 for the remaining samples). Data are shown as the average ± SD. ****p ≤ 0.0001. **p ≤ 0.01. *p ≤ 0.05. ns, not significant (one-way ANOVA with Tukey’s multiple comparisons correction).
The inability of O206/NTD-M/NQNQ and O206/CCD1-M/NQNQ to produce proviral DNA could be due to the absence of the functionality provided by the equivalent of the CLA motif or to other defects such as, for example, protein misfolding. To ascertain whether the lack of integration was related to the absence of the region that ensures the function of the CLA motif, we replaced NQNQ (non-functional CLA motif) by NKNK (functional CLA motif), obtaining chimeras O206/NTD-M/NKNK and O206/CCD1-M/NKNK (Fig 2C). We also inserted the sequence NKNK in wt IN O206 (O206/NKNK) to verify that this insertion did not affect the functionality of the enzyme. As shown in Fig 2D, neither integration nor reverse transcription was affected in this mutant. Integration was fully restored for the chimera containing the NTD M, while it remained undetectable for O206/CCD1-M/NKNK (Fig 2D). Therefore, the defect of chimera O206/NTD-M/NQNQ appears to be related to the lack of the region that exerts the function of the CLA motif, whereas for O206/CCD1-M/NKNK the loss of integration was unrelated to the functionality ensured by the CLA motif (Figs 2D and S1B). These results indicate that the NTD of isolate O206 can complement the absence of a functional CLA motif. Furthermore, the high similarity between the NTD of O206 and the consensus sequence O (only one substitution, K46R; S2 Fig) suggests that this is likely the case for integrases of HIV-1 group O in general.
Identification and characterization of the N-terminal O group (NOG) motif
The consensus sequences of the NTDs M and O differ at 10 residues (Fig 3A). According to the scores of the BLOSUM62 matrix, the replacement of four of these residues (Q7, G27, P41, H44, highlighted by a star in Fig 3A and highly conserved in group O as shown in Fig 3B) introduces more drastic changes in the properties of the protein than the substitution of the other residues.
(A) Alignment of the amino acid sequences of the NTD of IN M (top row) and IN O (bottom row). Unchanged amino acids in IN O with respect to IN M are indicated by a dash. Positions differing in the two sequences are in bold. Residues whose replacement gives a BLOSUM62 matrix score difference ≤ 1 are highlighted by a star. (B) Sequence conservation logo for positions 7, 27, 41 and 44 of IN O. (C) Schematic representation of the mutant IN used to evaluate the function of the NOG motif. White for isolate M and black for isolate O120. When mutated with respect to the sequence of the wt, the amino acids of the NOG or of the CLA motifs are shown in capital letters. (D) Normalized levels of integration relative to the wt IN, for the chimeras shown in panel C (n = 3 for M/NTD-O/NQKK; n = 6 for all the remaining samples). (E) Amounts of RTPs, relative to the wt IN, for the chimeras shown in panel C (n = 3 for M/NTD-O/NQKK; n = 6 for all the remaining samples). (F) Schematic representation of IN M/QGPH. (G and H) Normalized levels of integration (panel G) and amount of RTPs (panel H), relative to the wt IN, for IN M/QGPH (n = 6 for all the samples). Data are shown as the average ± SD. ****p ≤ 0.0001. ***p ≤ 0.001. **p ≤ 0.01. ns, not significant (one-way ANOVA with Tukey’s multiple comparisons correction for panels D and E. Two-tailed, unpaired Student’s t-test for panels G and H).
To test whether the four residues Q7G27P41H44 of the NTD O are the ones that allow complementation of the functionality ensured by the CLA motif, we inserted them in the NTD of an IN HIV-1 M that harbors, in the CLA positions, the same sequence as the consensus of isolates O (IN M/QGPH/NQKK, Fig 3C). In this double mutant, integration efficiency recovered from the 25% observed for IN M/NQKK to 100% of wt IN M, both in HEK293T and Jurkat cells (Fig 3D). The same results were obtained by replacing the whole NTD M by the NTD O (IN M/NTD-O/NQKK in Fig 3D). For both cell types, the insertion of the QGPH sequence also restored reverse transcription to the levels observed for the wt enzyme (Fig 3E). The four amino acids Q7G27P41H44, located in the NTD of HIV-1 O IN, are therefore sufficient to complement the decreases in integration and in reverse transcription that were generated by the replacement of the sequence NKNK by NQKK in the CLA motif of IN M. We will refer to them hereafter as the NOG (for “N-terminal O group”) motif. Finally, no differences were observed in the efficiency of Pr55Gag processing for any of the constructs compared to the wt (S1C Fig).
Tracing the phylogenetic origins of the NOG and CLA motifs
To understand how these functional differences between IN from HIV-1 groups M and O could have emerged, we analyzed the NOG and the CLA positions in the simian viruses assumed to be the ancestors of these groups of HIV-1, SIVcpzPtt and SIVgor, respectively (Fig 4A). Concerning the NOG motif, the sequence QGPH is highly conserved in HIV-1 group O (see S3 Fig) and in SIVgor (Fig 4A), and it is also found in the isolate supposed to be the closest to the founder of HIV-1 O, SIVgor BQID2 (Fig 4A). These observations suggest that this motif was inherited by HIV-1 O from its ancestor and that it has remained unaltered ever since. The possibility of a cross-species transmission of the NOG motif from SIVgor to human viruses is further supported by the observation that the other group of HIV-1 that originated from SIVgor (HIV-1 group P) also carries the NOG motif in one of the two isolates identified so far (Fig 4A). The next question was then whether the NOG motif emerged in SIVgor or was inherited from its ancestor. Indeed, SIVgor originated from a cross-species transmission event of SIVcpzPtt, in which the NOG positions are occupied by the highly conserved sequence KNDQ (Fig 4A). The unrelatedness of the sequences found in the NOG positions in these two viruses supports the view that the NOG motif emerged and was fixed in SIVgor. The amino acid sequence found at the NOG positions of SIVcpzPtt appears instead to be conserved in HIV-1 groups M and N, in which the same sequence (KNDQ) is found (Fig 4A).
(A) Sequence conservation logos of the NOG (in grey) and CLA (in black) motifs are shown for HIV-1 groups (M, N, O, P) and their ancestor viruses (SIVgor and SIVcpzPtt). The numbers above each logo indicate the amino acid position in the IN. The number below each logo indicates the number of sequences that were aligned to obtain the logo. Each arrow represents a zoonotic transmission event. On or under each arrow are shown the name and the NOG and CLA sequences of the isolates phylogenetically most related to the HIV-1 group indicated by the arrow itself. (B) The amino acid sequence at the CLA positions is shown for each SIVcpzPtt isolate, whose names are indicated on the left. Amidic amino acids are shown in black; basic amino acids are shown in grey; any other amino acid is shown in white (with a black outline).
The CLA motif seems to have been established in HIV-1 M after SIVcpzPtt had been transmitted to humans. Indeed, in SIVcpzPtt no conservation is observed in the CLA positions, except for K273 (Fig 4A). Accordingly, neither of the two isolates of SIVcpzPtt considered to be the closest to HIV-1 M, SIVcpzPtt MB897 and SIVcpzPtt LB715 [2,32], carries the sequence NKNK (Fig 4A). Strikingly, when we looked at the conservation of the CTD region (200–280) of SIVcpzPtt we found that it is overall conserved (S4 Fig). In fact, together with the first three CLA positions (222, 240, 254), only a few other positions show a conservation level of around 50% (211, 212, 220, 255, 278, 279, 280) (S4 Fig). Furthermore, no amidic or basic amino acids are exclusively found in SIVcpzPtt IN CTD, in comparison to HIV-1 M IN CTD (S4 Fig), suggesting that no other positions could compensate for the CLA functionality in SIVcpzPtt IN CTD. Despite the lack of conservation at the SIVcpzPtt CLA positions, though, a preference for basic (K, R) and amidic (N, Q) amino acids is observable (Fig 4B). At the level of the individual isolates, two SIVcpzPtt (EK505 and US) have the NKNK sequence (Fig 4B). EK505 is the isolate most related to HIV-1 group N. In this group, as in HIV-1 M, the sequence NKNK is highly conserved, suggesting that transmission of this specific isolate, or of isolates closely related to it, could be responsible for the presence of the motif in HIV-1 group N. Irrespective of the path followed, it is worth underlining that the sequence NKNK was ultimately fixed in both groups of human viruses derived from SIVcpzPtt. Concerning the relationship between HIV-1 O and SIVgor for the amino acids present in the CLA positions, the simian virus carries the conserved sequence NTKK that, in HIV-1 O, presents the replacement of T240 by a Q (NQKK, Fig 4A). This replacement could reflect adaptation to the new host, but it could also have been inherited from an isolate of SIVgor that had a Q at position 240, as is the case for BQID2, which is furthermore assumed to be the closest to HIV-1 O (Fig 4A).
Having established that the sequence of the CLA motif (NKNK) of HIV-1 group M was probably not inherited from SIVcpzPtt, we wanted to understand whether a virus carrying the IN of SIVcpzPtt MB897, the isolate closest to HIV-1 M, but already endowed with the NKNK sequence in the CLA positions, could have been infectious in human cells. To address this issue, we first produced viral particles in which we replaced the RT and IN of HIV-1 M by those of isolate SIVcpzPtt MB897, which has the KKKK sequence in the CLA positions (Fig 5A). The goal was to verify that the chimeric nature of these viruses (Gag and protease from HIV-1 M; RT and IN from SIV) was not an obstacle for infection. The chimeric particles were fully processed by the protease (S5A Fig) and integration levels were twice those obtained with wt IN M (Fig 5B). Reverse transcription was also increased with respect to wt IN M (Fig 5B). In conclusion, the chimeric nature of the virus did not negatively affect its functionality, which was instead enhanced. These results are in line with other reports showing that SIVcpzPtt isolate MB897 efficiently infects human cells, with kinetics more similar to those of HIV-1 M than to those of other SIVcpzPtt [33,34]. We then replaced the KKKK sequence in the CLA positions of the SIVcpzPtt integrase by NKNK (Fig 5C). This change reduced integration to around 10% of wt IN SIVcpzPtt MB897 (Fig 5D) and, consequently, to around 20% of wt IN M. Reverse transcription was only slightly altered (Fig 5D) and Pr55Gag processing was not altered at all (S5B Fig). This result markedly differs from what had been observed for IN M, for which the two sequences (KKKK and NKNK) yielded comparable levels of integration. The replacement of the amino acids in the CLA positions by NQNQ, a condition that abolished integration in IN M, caused a drop of integration to undetectable levels, as well as a significant decrease in reverse transcription levels (Fig 5D). Pr55Gag processing levels, instead, were still unaltered (S5B Fig). Altogether, these results indicate that in SIVcpzPtt the sequence in the CLA positions is crucial in determining the levels of integration, as for IN M, but that, in sharp contrast with IN M, the sequence NKNK was poorly functional, at least in the background of strain MB897 of SIVcpzPtt.
SIVcpzPtt CLA motif is important for integration (A) Schematic representation of wt IN M and IN SIVcpzPtt MB897. (B) Normalized levels of integration (left graph) and amounts of RTPs (right graph) relative to isolate M for the IN shown in panel A (n = 6). (C) Schematic representation of IN SIVcpzPtt MB897 wt and the two mutants for the CLA motif positions tested for integration and reverse transcription in panel D. When mutated with respect to the sequence of the wt, the amino acids of the NOG or of the CLA motifs are shown in capital letters. (D) Normalized levels of integration (left graph) and amounts of RTPs (right graph) relative to SIVcpzPtt MB897 wt (n = 6 for MB897 wt; n = 5 for SIVcpzPtt MB897/NKNK; n = 3 for SIVcpzPtt MB897/NQNQ). Data are shown as the average ± SD. ****p ≤ 0.0001. **p ≤ 0.01. *p ≤ 0.05. (Two-tailed, unpaired Student’s t-test for panel B. One-way ANOVA with Tukey’s multiple comparisons correction for panel D).
Functional complementarity of the NOG and CLA motifs
We have shown that replacing the NKNK sequence in the CLA motif of IN M by the sequence NQKK decreases the amount of reverse transcription products two-fold (Fig 3E). To further characterize the steps of the infectious cycle that are affected by this replacement, we measured the efficiency of 3’ processing of this mutant, which was half that of the wt enzyme (Fig 6A and 6B). Interestingly, inserting the sequence of the NOG motif in this mutant (Fig 6A) rescued the defect, partially in HEK293T and totally in Jurkat cells, for which 3’ processing was comparable to that observed for wt IN M (Fig 6B). Combining this result with the reduction of the efficiency of reverse transcription leads to two conclusions. One is that these two defects are sufficient to explain the overall decrease in integration to 25% of that of wt IN (as a result of 50% efficiency of reverse transcription combined with 50% efficiency of 3’ processing) when the NQKK sequence replaces NKNK in the CLA motif. The second is that, since the insertion of the NOG motif restores each of the two steps affected when NQKK is present in the CLA motif, the two motifs (NOG and CLA) should exert the same function in the infectious cycle. If this is the case, it may be expected that the two motifs, when present in the same IN, should not have an additive effect and that the efficiency of integration would be the same as when only one of the motifs is present (i.e. equal to wt IN). Accordingly, when we inserted the NOG motif in wt IN M (IN M/QGPH, Fig 3F), no improvement was observed for integration or for reverse transcription, in either HEK293T or Jurkat cells (Fig 3G and 3H), or for Pr55Gag processing (S5C Fig).
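The arithmetic behind the first conclusion can be written out as a short sketch. It assumes that the two steps act independently, so that their relative efficiencies multiply; the function name is hypothetical and the values are the approximate efficiencies quoted in the text.

```python
def combined_efficiency(*step_efficiencies):
    """Overall efficiency of sequential steps, each relative to wt (wt = 1.0).

    Assumption: the steps are independent, so a product that passes one
    step has the same chance as any other of passing the next one.
    """
    overall = 1.0
    for efficiency in step_efficiencies:
        if not 0.0 <= efficiency:
            raise ValueError("efficiencies must be non-negative")
        overall *= efficiency
    return overall


# IN M/NQKK: about 50% reverse transcription and about 50% 3' processing.
nqkk = combined_efficiency(0.5, 0.5)   # 0.25, i.e. 25% of wt

# Both steps restored to wt levels, as reported for IN M/QGPH/NQKK in Jurkat cells.
rescued = combined_efficiency(1.0, 1.0)  # 1.0, i.e. wt level
```

Under this multiplicative assumption, no additional defect is needed to account for the 25% residual integration of the NQKK mutant.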
(A) Schematic representation of the IN used to evaluate the effect of the NOG motif on 3’ processing. When mutated with respect to the sequence of the wt, the amino acids of the NOG or of the CLA motifs are shown in capital letters. (B) Efficiency of 3’ processing, relative to the wt IN, for the mutants shown in panel A, in HEK293T and in Jurkat cells (n = 4 for all samples). (C) Outline of the EURT assay, with reverse transcription. The two types of RNA that are co-packaged in the viral particles are shown with their essential functional features. Ψ: packaging sequence, RBS: ribosome binding site, F-luc: firefly luciferase coding sequence, AAA(A): polyA sequence, R, U5 and U3: elements of HIV-1 LTRs, PBS: HIV-1 primer binding site. The SGP-RNA also has a poly-A tail, but it is not shown for clarity, not being relevant for this experimental setting. (D) Outline of the EURT assay, without reverse transcription. (E) Luciferase expression, with reverse transcription happening inside the capsid, relative to the wt IN, for the mutants shown in panel A, in HEK293T (n = 6) and in Jurkat cells (n = 3). (F) Luciferase expression, without reverse transcription happening inside the capsid, relative to the wt IN, for the mutants shown in panel A, in HEK293T (n = 3) and in Jurkat cells (n = 3). Data are shown as average ± SD. ****p ≤ 0.0001. ***p ≤ 0.001. **p ≤ 0.01. *p ≤ 0.05. ns, not significant (one-way ANOVA with Tukey’s multiple comparisons correction).
Integrase mutants and stability of the viral capsid
Finally, we looked for possible differences, among the various IN mutants used, in the process of capsid disassembly, since this could alter the levels of RTPs available for integration even if equal amounts of RTPs were measured in the cell. For instance, premature uncoating can lead to the dissociation of IN from the RTPs, while a capsid that remains closed prevents the RTPs from interacting with the genome of the infected cell. To this end, we used the EURT assay, in which the stability of the capsid is measured through the expression of a reporter gene carried by the VLP. The coding sequence is carried by an RNA (EUrep-RNA) that cannot be reverse transcribed but can be translated, leading to the synthesis of the firefly luciferase (Fig 6C). The experiment can also be carried out by co-packaging with the EUrep-RNA another RNA that can be reverse transcribed (in our case "SGP" RNA, Fig 6C). In this case the luciferase signal provided by heterozygous EUrep/SGP viruses reports on the stability of the capsid in the presence of reverse transcription, which is the condition relevant for the present study. If only EUrep-RNA is used (Fig 6D), the assay measures the stability of the capsid in the absence of reverse transcription.
To study the stability in the presence of reverse transcription, VLPs are produced by transfection of cells that express equimolar amounts of the two types of RNA. Since the packaging and the dimerization signals are the same in the two RNAs, the resulting viral population is expected to consist of 50% heterozygous EUrep/SGP virions, 25% EUrep/EUrep and 25% SGP/SGP homozygous virions. While SGP/SGP viruses will not give any luciferase signal, homozygous EUrep/EUrep virions will interfere with the signal provided by the heterozygous EUrep/SGP particles. For this reason, the experiment was also performed in the absence of reverse transcription, to evaluate the contribution of homozygous EUrep/EUrep virions to the results obtained in the presence of reverse transcription and to take this into account in the interpretation of the results.
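The expected 50/25/25 distribution follows from random pairing of the two RNAs, as the following minimal sketch shows. It assumes that each of the two packaged RNA copies is drawn independently and with equal packaging efficiency; the function name is hypothetical.

```python
from fractions import Fraction


def virion_fractions(eurep_share):
    """Expected genotype fractions for random co-packaging of two RNAs.

    eurep_share: fraction of packageable RNA that is EUrep-RNA
    (the remainder is SGP-RNA). Assumes independent, unbiased packaging.
    """
    p = Fraction(eurep_share)
    if not 0 <= p <= 1:
        raise ValueError("eurep_share must be between 0 and 1")
    q = 1 - p
    return {
        "EUrep/EUrep": p * p,    # homozygous, luciferase signal without RT
        "EUrep/SGP": 2 * p * q,  # heterozygous, signal in the presence of RT
        "SGP/SGP": q * q,        # homozygous, no luciferase signal
    }


# Equimolar expression of the two RNAs, as in the text.
equimolar = virion_fractions(Fraction(1, 2))
# equimolar == {"EUrep/EUrep": 1/4, "EUrep/SGP": 1/2, "SGP/SGP": 1/4}
```

Because a quarter of the virions are EUrep/EUrep under these assumptions, the control performed without reverse transcription is needed to estimate their share of the luciferase signal.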
The experiments were performed using wt IN M, IN M/NQKK and IN M/QGPH/NQKK (Fig 6A) in HEK293T and in Jurkat cells. In the presence of reverse transcription (Fig 6E), the replacement of NKNK by NQKK in the CLA motif led to a modest increase in luciferase expression, indicating that the NQKK mutant triggers a slight decrease in the stability of the capsid. The addition of the NOG motif had no effect in HEK293T cells (Fig 6E), while in Jurkat cells it markedly increased the stability of the capsid, which became even more stable than that observed with wt IN M (Fig 6E). In the absence of reverse transcription, instead, no change in the stability of the capsid was observed among the different mutants and cells tested (Fig 6F). Therefore, the specific changes in capsid stability observed in the presence of reverse transcription are due to the heterozygous virions and are thus related to the ongoing reverse transcription in the viral particles.
In this work, we document that integrases of HIV-1 groups M and O have developed two phylogenetic-group specific functional motifs that can cross-complement each other. One motif (CLA) is located in the CTD of the protein of HIV-1 group M, the other (NOG) in the NTD of isolates of HIV-1 group O. This observation highlights that, depending on the phylogenetic sequence considered, two different domains of the same HIV-1 protein carry out functions that can mutually complement each other during the infectious cycle.
We previously showed that, when at least two K are present among the four amino acids that constitute the CLA motif, the positions of the individual residues can be permuted without affecting integration in eight of the ten possible combinations. In the two other cases, integration was significantly reduced, with the most marked decrease (to around 25% of the wt IN M) observed with the sequence N222Q240K254K273. This phenotype was confirmed in this study in HEK293T cells and, for the first time, shown also in Jurkat cells (Figs 3E and 6B). Despite this, NQKK is highly conserved in HIV-1 O, raising the question of how it could have been selected. We find here that HIV-1 IN O has a motif in its NTD (NOG, Q7G27P41H44) that bypasses the need for the CLA motif, ultimately yielding levels of integration comparable to IN M. Indeed, when the NOG motif is inserted in an IN M where the CLA sequence has been mutated into NQKK, the levels of integration are brought back to those of wt IN M. This is achieved by compensating for the defects in the same specific steps (reverse transcription and 3’ processing) that were caused by the NQKK sequence, suggesting that the two motifs exert the same, or at least very similar, functions.
In Jurkat cells these effects were concomitant with an increase in the stability of the capsid. Discrepancies in the uncoating kinetics/pathways dependent on the cell type were previously reported [36–39]. They are probably due to cell-specific determinants of capsid stability, as suggested by the different effects observed, depending on the cell type, for host factors known to interact with the capsid [40–44]. Therefore, the different phenotype that we observed in HEK293T and Jurkat cells for capsid stability could be explained by a cell-specific effect on the uncoating step. Dismantling of the viral capsid is a central step in the control of infectivity. An implication of IN in ensuring the optimal stability of the viral core, by favoring the interaction between the capsid protein and cyclophilin A, had been previously described. Reverse transcription favors dismantling of the capsid core in vitro, and the generation of full-length RTPs has been proposed to be the main motor promoting its disassembly [47,48]. The longer permanence of the reverse transcription complex in the core that is observed in Jurkat cells might account for the higher increase in 3’ processing for this cell type (Fig 6B). Indeed, since 3’ processing occurs right after reverse transcription, when a longer time is allotted to this process before the capsid is dismantled, it could benefit from a confined environment that keeps a high concentration of the components of the reaction, as was previously observed for reverse transcription [49–51].
The results concerning the stability of the capsid rule out the possibility that our mutants could be impaired in the packaging of gRNA. Indeed, the CTD of IN binds the viral RNA, and altering this interaction results in dislocation of the gRNA outside the capsid, severely affecting reverse transcription [24,26]. The fact that the alteration of capsid stability that we observe is found exclusively for the viruses where reverse transcription was allowed (Fig 6B–6F) indicates that the dislocation of the gRNA is not at the origin of this result. This observation does not exclude the possibility that our results reflect an altered interaction of IN with the gRNA, but only that altering this interaction does not cause the defect discussed above.
How could two different domains have converged to ensure such similar (if not the same) functions? The simplest explanation is that they interact with the same molecule. We showed that the first three residues of the motif (N222K240N254) form a positively charged surface, which is absent in the case of the N222Q240K254 sequence. This surface was proposed to interact, possibly with the contribution of the additional K273, with a repetitive, negatively charged partner (Fig 7A), such as the backbone of DNA or RNA molecules. In HIV-1 O IN, the presence of the NOG motif is predicted to induce, particularly due to P41 and H44, the formation of an alternative positively charged surface, absent in HIV-1 M IN-NTD (Fig 7D and 7E), which could drive the interaction to involve preferentially the NTD (Fig 7F).
(A) CTD (220–274) and NTD (1–47) of HIV-1 M IN, PDB 6PUT , are shown in green, facing each other. This distribution could happen between two different IN being part of the same intasome. In this case, in the CTD, the CLA motif (in yellow) is forming a positively charged surface (in blue) that is interacting with a negatively charged partner (in red). In magenta are shown the amino acids occupying the NOG positions. (B) HIV-1 O NTD (1–47, in orange) with the NOG motif highlighted in magenta. The structure of HIV-1 O NTD was obtained with AlphaFold2. (C) Surface electrostatic potential of HIV-1 O NTD (panel B). The yellow circle shows the positive surface exposed by the NOG motif. (D) HIV-1 M NTD (1–47, in green) with the amino acids occupying the NOG positions highlighted in magenta. (E) Surface electrostatic potential of HIV-1 M NTD (panel D). The yellow dashes are showing the lack of the positive surface found in HIV-1 O NTD. (F) HIV-1 IN O CTD (220–274) and NTD (1–47), both obtained with AlphaFold2, are shown in orange, facing each other. Here, the positive surface (in blue) is formed thanks to the presence in the NTD of HIV-1 O IN of the NOG motif (in magenta), interacting with a negatively charged partner (in red). In yellow are shown the amino acids occupying the CLA positions in HIV-1 O IN CTD.
A possible scenario for the emergence of these two alternative motifs is that the NOG motif was established in the simian virus infecting gorillas (either by fixation of an inherited SIVcpzPtt sequence or by emergence and subsequent fixation of the QGPH sequence) where it is highly conserved, at least within the limits of an analysis carried out on only 8 sequences available for SIVgor. The emergence of the CLA motif, instead, appears to date after transmission of the virus to the human host, since no conservation is found in this region in SIVcpzPtt, although an overall trend for the presence of amidic and basic residues (as those composing the consensus sequence of the motif in HIV-1 M) is found. The fact that none of the combinations of these amino acids was selected in the simian virus, which instead occurred after transfer to humans, suggests that, in simian cells, the function exerted by this motif was either not required or not so important as it is in humans, allowing the co-existence of multiple functional sequences. Another possible explanation for the sequence diversity observed at SIVcpzPtt CLA positions is that these amino acids were involved in a dynamic co-evolution process with a rapidly evolving molecular interface of, for example, another protein. In both cases it is tempting to speculate that the emergence of the NKNK motif was part of the process of adaptation to the new host.
When we inserted the NKNK motif in the RT-IN coding region of the SIVcpzPtt MB897 strain and generated a chimeric HIV-1 carrying RT-IN of SIVcpzPtt, integration was around 10% of that observed with its wt sequence (KKKK) in Jurkat cells (Fig 5D). Altogether, these results support the view that, once the simian virus was transferred to humans, both sequences (RT and IN) underwent a stepwise adaptation process to the new host that finally generated the genetic context in which the NKNK sequence in the CLA positions became optimal. The genetic flexibility that we described for the CLA locus, with several permuted sequences retaining integration ability, could constitute what remains of the swarm of sequences generated by genetic drift and from which selection for the successful NKNK sequence occurred.
Dominant epistasis, relieving selective pressure from the CLA motif, would have allowed this region of IN O to develop, potentially, new accessory functions. Indeed, HIV-1 integrase is a multifunctional protein that, logically, acquired its diverse functions, and optimized those already acquired, progressively during evolution. Increasing evidence supports the notion that in multifunctional proteins, the initial steps toward the establishment of a new function are undertaken by genetic drift before selection for the new function is applied [52–54]. Intra-patient expanding HIV populations are characterized by extensive genetic drift, driven by neutral selection, thereby creating favorable conditions for the generation of new functionalities in its proteins. The presence in the CLA positions of SIVcpzPtt of the same type of amino acids that would later generate the NKNK motif in HIV-1 M, but still without selection for a specific sequence, could constitute a snapshot of such early phases of genetic drift in the process of generation of what would then become an essential motif for integration in HIV-1 M.
In conclusion, this work sheds light on crucial aspects of the emergence of two phylogenetic-group specific motifs of the integrase of HIV-1, from their simian ancestors across the barrier of the zoonotic transmission to humans. By deciphering how optimization of integration is achieved in these two cases, this work contributes to improving our understanding of the rules governing viral evolution and their role in zoonotic transmissions.
HEK293T and Jurkat cells were obtained from the American Type Culture Collection (ATCC). HEK293T cells were cultured in DMEM, while Jurkat cells were cultured in RPMI. Both media were supplemented with 10% fetal bovine serum and 1% PenStrep. HEK293T and Jurkat cells were cultured at 37°C in 5% CO2.
Viral strains and sequence alignments
The primary HIV-1 isolates used in this study were: isolate HXB2 (GenBank accession number: K03455.1), isolate A2 (GenBank accession number: AF286237) from group M, subtype A2, (named "isolate M" in this study) obtained from the NIH AIDS Research and Reference Reagent Program; isolate RBF206 (GenBank accession number: KU168298) and isolate BCF120 (GenBank accession number: KU168297) both from group O, kindly provided by J.C. Plantier (CHU Rouen, France). Isolates AF286237 and RBF206 were chosen because they were used in the work that originated the present study . Isolate BCF120 was chosen as the isolate O carrying the same sequences as the consensus ones in the two motifs considered in this work. The SIVcpzPtt isolate employed in this work is the MB897 (GenBank accession number: EF535994) and it was chosen being one of the two isolates which are the most phylogenetically related to group M.
For the creation of the conservation logos, by using WebLogo (http://weblogo.threeplusone.com) [57,58], we performed sequence alignments using the QIAGEN CLC Genomic Workbench 22 that employs the progressive alignments approach . All the sequences were obtained from the Los Alamos National Laboratory HIV database (https://www.hiv.lanl.gov/content/index).
Plasmids and molecular cloning
The plasmid p8.91-MB, previously described, was used as backbone for all cloning procedures. Therefore, all our constructs have the gag and the protease-coding sequences from HXB2 (HIV-1 group M). RT and IN coding sequences, instead, varied. In isolate M the RT and IN were from isolate A2, while in HIV-1 O isolates they were either from isolate O206 or O120. In the chimpanzee isolate the RT and IN were from SIVcpzPtt MB897. All IN mutant coding sequences were inserted between the BspEI and SalI restriction sites of p8.91-MB by Gibson assembly. The plasmid used to produce the genomic RNA of the VLPs, carrying the two reporter genes used to evaluate integration efficiency (nGFP and PUROR), is a modified version of the previously described pSRP, where the nuclear RFP was replaced by the nuclear GFP, giving the pSGP.
The pEUrep-RNA was kindly provided by Andrea Cimarelli. The plasmid codes for an mRNA containing the RNA packaging sequence (Ψ) and the cDNA of the firefly luciferase followed by a polyA signal.
Two plasmids, both previously described, were employed for the creation of standard curves in the quantitative PCR assays: the pJet-1LTR for the detection of late RTPs and the pGenuine2LTR for the evaluation of the 3’ processing efficiency.
Transfection and VLPs collection
To produce virus-like particles (VLPs), HEK293T cells were co-transfected with the plasmid coding for the vesicular stomatitis virus glycoprotein (VSV-G), the plasmid carrying the HIV-1 Gag-Pol gene (p8.91-MB with different IN) and the plasmid with the modified viral genome carrying the reporter genes used to follow the infection (pSGP). For the EURT assay, the pEUrep-RNA plasmid, coding for the EUrep-RNA, was either added to the mix or used in place of pSGP. All transfections were done by using 5 μg of total DNA and polyethyleneimine (PEI, Polyscience) following the manufacturer’s instructions. The medium was changed after 6 h, and VLPs were collected and filtered with a 0.45 μm filter after 48–72 hours. The amount of VLPs was estimated by quantifying p24 via ELISA (Fujirebio).
Western blot analyses
The same volume of VLPs was concentrated by centrifugation through a 20% sucrose cushion for 2 h at 20,000 g and at 4°C. Pellets were resuspended in 3x Laemmli buffer, and viral proteins were separated on a Criterion TGX Stain-Free 4–15% gradient gel (Bio-Rad) and then blotted on a PVDF membrane. To evaluate Pr55Gag proteolytic processing, polyproteins and mature capsid proteins were detected by probing the membrane with a mouse monoclonal anti-CA primary antibody (NIH AIDS Reagent Program) and a secondary anti-mouse HRP-conjugated antibody (Millipore). ECL reagent (Bio-Rad) was added to the membrane, and images were taken with a Bio-Rad Chemidoc Touch and analyzed with the Image Lab software (Bio-Rad). The Pr55Gag processing efficiency was expressed as the ratio of the mature CA signal to the total CA signal (unprocessed, partially processed and fully processed CA proteins).
Quantitative PCR for viral DNA and its forms
HEK293T or Jurkat cells were transduced by spinoculation with polybrene (Sigma-Aldrich) and an amount of VLPs corresponding to a nominal MOI of 1. Prior to infection, VLPs were incubated with Benzonase nuclease (Sigma-Aldrich) to remove non-internalized DNA. 24-hours post-transduction (hpt) cells were collected, and total DNA was extracted with DNeasy Blood & Tissue Kit (QIAGEN). All qPCR assays were designed with the Taqman hydrolysis probe technology using the IDT Primers and Probes design software (IDT), with dual quencher probes (one internal ZEN quencher and one 3’ Iowa Black FQ quencher). qPCRs were performed with the iTaq Universal Probes Supermix (Bio-Rad) on a CFX96 (Bio-Rad) thermal cycler according to the manufacturer’s protocols. Standard curves and analysis were conducted with the CFX Manager (Bio-Rad).
Late reverse transcription products were quantified with oligos amplifying the U5-Psi junction. This was normalized by the amount of genomic DNA that was quantified by amplifying an exon of the actin gene. Absolute quantification was performed by creating a standard curve with known quantities of pJet-1LTR for RTPs and the genome extracted from a known quantity of cells for actin quantification.
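The absolute quantification described above can be sketched as a standard-curve calculation. This is a minimal illustration and not the CFX Manager procedure: it assumes a log-linear relationship between Cq and input quantity, and all Cq values, dilution series and function names (`fit_standard_curve`, `quantity_from_cq`) are hypothetical.

```python
import math


def fit_standard_curve(quantities, cq_values):
    """Least-squares fit of Cq = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_cq(cq, slope, intercept):
    """Invert the standard curve to get the quantity for a sample Cq."""
    return 10 ** ((cq - intercept) / slope)


# Hypothetical plasmid standard (copies of pJet-1LTR) for late RTPs
plasmid_copies = [1e2, 1e3, 1e4, 1e5, 1e6]
plasmid_cq = [33.4, 30.1, 26.8, 23.5, 20.2]

# Hypothetical genomic standard (cell equivalents) for the actin assay
cell_equivalents = [1e2, 1e3, 1e4, 1e5]
actin_cq = [31.0, 27.7, 24.4, 21.1]

rtp_slope, rtp_intercept = fit_standard_curve(plasmid_copies, plasmid_cq)
act_slope, act_intercept = fit_standard_curve(cell_equivalents, actin_cq)

# Hypothetical sample: late RTP copies normalized by cell equivalents
sample_rtp = quantity_from_cq(26.8, rtp_slope, rtp_intercept)
sample_cells = quantity_from_cq(24.4, act_slope, act_intercept)
rtp_per_cell = sample_rtp / sample_cells
```

With these invented values the sample contains about 10,000 RTP copies in about 10,000 cell equivalents, that is, roughly one late RTP per cell.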
To evaluate the 3’ processing efficiency we first quantified the quantity of 2LTR circles (2LTRc) with oligos and probe annealing to the 2LTRc junction and then we evaluated the nature of this junction (perfect or imperfect, which are respectively the unprocessed and processed 3’ ends) with oligos and probes annealing specifically only to the perfect junction. The imperfect junction ratio was subsequently calculated as 1-perfect junction, where 1 represents the total amount of 2LTRc. For both 2LTRc and perfect junction quantification, the standard was prepared with pGenuine2LTR. All the oligos and probes used for the qPCR assays can be found in S1 Table.
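The imperfect junction calculation can be written out as follows. This is a minimal sketch with hypothetical copy numbers; the function name `imperfect_junction_fraction` is illustrative, and the input checks are an added assumption rather than part of the published method.

```python
def imperfect_junction_fraction(total_2ltrc: float, perfect_junction: float) -> float:
    """Fraction of 2LTR circles with an imperfect junction (processed 3' ends).

    Both inputs are absolute quantities read from the pGenuine2LTR standard
    curve; the total amount of 2LTRc is taken as 1.
    """
    if total_2ltrc <= 0:
        raise ValueError("total_2ltrc must be positive")
    if not 0 <= perfect_junction <= total_2ltrc:
        raise ValueError("perfect_junction must lie between 0 and total_2ltrc")
    return 1.0 - perfect_junction / total_2ltrc


# Hypothetical sample: 2000 copies of 2LTRc, of which 500 have a perfect junction
imperfect = imperfect_junction_fraction(2000, 500)
```

In this invented example, three quarters of the 2LTR circles derive from viral DNA whose 3’ ends were processed.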
Evaluation of integration
HEK293T or Jurkat cells were transduced by spinoculation with polybrene and an amount of VLPs corresponding to a nominal MOI of 0.01. 24-hpt, puromycin was added to HEK293T at a final concentration of 0.6 μg/ml and integration was measured by counting the puromycin-resistant clones 1-week post-transduction. As previously shown, this method is comparable to the classical Alu-gag quantitative PCR method . For Jurkat cells integration was measured by FACS 72-hpt by counting the percentage of cells expressing the nGFP. Therefore, in this analysis, we did not take into account the intensity of the signal but only the number of GFP positive cells. This time was chosen after having established that no signal would be detected using a catalytically inactive IN (D116A), to exclude the possibility that the signal of our constructions would derive from episomal forms of the viral DNA. Since integration depends on the availability of the RTPs and since reverse transcription is affected by the viral IN, in both HEK293T and Jurkat, results were normalized by the reverse transcription efficiency evaluated by qPCR (see above). Namely, the amount X1 of RTP was estimated for sample 1, for example. The number of puro resistant clones (P1) for HEK293T cells or the number of nGFP positive cells (F1) for Jurkat cells, was computed for the same sample. The normalized integration values were then computed as P1/X1 or F1/X1.
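The normalization P1/X1 (or F1/X1) can be illustrated with a short calculation. This is a minimal sketch with hypothetical readouts; expressing the result as a percentage of the wt value is an assumption made here for illustration, and the names `normalized_integration` and `percent_of_wt` are not from the original work.

```python
def normalized_integration(readout: float, rtp_amount: float) -> float:
    """Integration readout (puromycin-resistant clones or nGFP-positive cells)
    divided by the amount of late RTPs measured by qPCR for the same sample."""
    if rtp_amount <= 0:
        raise ValueError("rtp_amount must be positive")
    return readout / rtp_amount


def percent_of_wt(sample_value: float, wt_value: float) -> float:
    """Normalized integration of a sample relative to wt, in percent."""
    return 100.0 * sample_value / wt_value


# Hypothetical HEK293T data: (puromycin-resistant clones, RTP amount)
wt_clones, wt_rtp = 200, 2.0
mut_clones, mut_rtp = 30, 1.5

wt_norm = normalized_integration(wt_clones, wt_rtp)     # P1/X1 for wt
mut_norm = normalized_integration(mut_clones, mut_rtp)  # P1/X1 for the mutant
mut_percent = percent_of_wt(mut_norm, wt_norm)
```

Dividing by the RTP amount separates a defect in integration itself from a defect that only reflects reduced reverse transcription: in this invented example, the mutant yields 15% of the wt clones but 20% of the wt integration once its lower RTP level is taken into account.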
Assessment of the capsid stability
As described above VLPs employed in this assay contain either two RNAs, the EUrep-RNA and the SGP-RNA, or the EUrep-RNA alone. The corresponding quantity of VLPs of a nominal MOI of 0.5 was used to transduce either HEK293T or Jurkat cells. 8-hpt cell protein extract was obtained, and Luciferase assay was performed with the Luciferase Assay System (Promega). Luminescence (Relative Luminescence Units, RLU) was normalized for protein concentration measured by the Bradford assay and therefore expressed as RLU per mg of protein extract (RLU/mg).
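The RLU/mg normalization can be sketched as below. The volume of extract used per luciferase reaction is not stated in the text, so the parameter `extract_volume_ml` and all numerical values are assumptions introduced only for illustration.

```python
def rlu_per_mg(rlu: float, protein_mg_per_ml: float, extract_volume_ml: float) -> float:
    """Luminescence normalized by the mass of protein in the assayed extract."""
    protein_mg = protein_mg_per_ml * extract_volume_ml
    if protein_mg <= 0:
        raise ValueError("protein amount must be positive")
    return rlu / protein_mg


# Hypothetical reading: 50,000 RLU from 0.02 ml of extract at 2.0 mg/ml (Bradford)
signal = rlu_per_mg(50_000, 2.0, 0.02)
```

Here 0.04 mg of protein was assayed, giving about 1.25 × 10^6 RLU/mg.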
Structure and molecular modelling
The NTD and CTD structures of IN M shown in the manuscript belong to PDB 6PUT. The NTD and CTD structures of IN O were obtained from molecular modelling of isolate O120 made with AlphaFold2 [61,62] by Patrice Gouet. The structure is available upon request. Pictures used in the manuscript were obtained with PyMOL2.5.
All statistical tests were performed on at least three independent experiments (n is indicated in every figure legend) using Prism 9. ANOVA with Tukey’s multiple comparisons correction was used when more than three groups were compared. An unpaired t-test was used when two samples were directly compared. The numerical data used in all figures are included in S1 Data.
S1 Fig. Pr55Gag processing of IN tested in this work.
(A) Results for Pr55Gag processing for the constructions shown in Fig 2A. Pr55Gag processing is not affected in any of the constructions tested (n = 3). (B) Results for Pr55Gag processing for the constructions shown in Fig 2C. Pr55Gag processing is not affected in any of the constructions tested (n = 3). (C) Results for Pr55Gag processing for the constructions shown in Fig 3C. Pr55Gag processing is not affected in any of the constructions tested (n = 3). Data are shown as the average ± SD. ns, not significant (one-way ANOVA with Tukey’s multiple comparisons correction).
S2 Fig. HIV-1 O IN NTD O206 differs by only one substitution from the consensus sequence.
The consensus amino acid sequence of HIV-1 IN NTD O is shown above. Below, the sequence from the same region (1–46) is shown for isolate O206. Dashes indicate conserved positions. Substitutions are indicated by bold letters.
S3 Fig. Alignment of HIV-1 group O integrases.
The fifty sequences aligned were obtained from the Los Alamos National Laboratory HIV database (https://www.hiv.lanl.gov/content/index). Different colors indicate different conservation levels for each position. Alignment performed with QIAGEN CLC Genomics Workbench 22.
S4 Fig. Conservation in the CTD region of SIVcpzPtt IN.
Alignment of the CTD region (200–280) of SIVcpzPtt isolates. In the first line is shown the sequence from the same region of HIV-1 M isolate HXB2. Different colors indicate different conservation levels for each position. Alignment performed with QIAGEN CLC Genomics Workbench 22.
S5 Fig. Pr55Gag processing of IN tested in this work.
(A) Results for Pr55Gag processing for the constructions shown in Fig 5A. Pr55Gag processing is not affected in any of the constructions tested (n = 3). (B) Results for Pr55Gag processing for the constructions shown in Fig 5C. Pr55Gag processing is not affected in any of the constructions tested (n = 3). (C) Results for Pr55Gag processing for the constructions shown in Fig 3F. Pr55Gag processing is not affected in any of the constructions tested (n = 3). Data are shown as the average ± SD. ns, not significant (one-way ANOVA with Tukey’s multiple comparisons correction).
S1 Table. Oligos and probes used for quantitative PCR assay.
- 1. Gao F, Bailes E, Robertson DL, Chen Y, Rodenburg CM, Michael SF, et al. Origin of HIV-1 in the chimpanzee Pan troglodytes troglodytes. Nature. 1999;397: 436–441. pmid:9989410
- 2. Keele BF, van Heuverswyn F, Li Y, Bailes E, Takehisa J, Santiago ML, et al. Chimpanzee reservoirs of pandemic and nonpandemic HIV-1. Science. 2006;313: 523–526. pmid:16728595
- 3. D’Arc M, Ayouba A, Esteban A, Learn GH, Boué V, Liegeois F, et al. Origin of the HIV-1 group O epidemic in western lowland gorillas. Proc Natl Acad Sci U S A. 2015;112: E1343–E1352. pmid:25733890
- 4. Plantier JC, Leoz M, Dickerson JE, de Oliveira F, Cordonnier F, Lemée V, et al. A new human immunodeficiency virus derived from gorillas. Nat Med. 2009;15: 871–872. pmid:19648927
- 5. Korber B, Muldoon M, Theiler J, Gao F, Gupta R, Lapedes A, et al. Timing the Ancestor of the HIV-1 Pandemic Strains. Science. 2000;288: 1789–1796. pmid:10846155
- 6. Lemey P, Pybus OG, Rambaut A, Drummond AJ, Robertson DL, Roques P, et al. The molecular population genetics of HIV-1 group O. Genetics. 2004;167: 1059–1068. pmid:15280223
- 7. Leoz M, Feyertag F, Kfutwah A, Mauclère P, Lachenal G, Damond F, et al. The Two-Phase Emergence of Non Pandemic HIV-1 Group O in Cameroon. PLoS Pathog. 2015;11: 1–13. pmid:26241860
- 8. Mourez T, Simon F, Plantier JC. Non-M variants of human immunodeficiency virus type 1. Clin Microbiol Rev. 2013;26: 448–461. pmid:23824367
- 9. Peeters M, Gueye A, Mboup S, Bibollet-Ruche F, Ezaka E, Mulanga C, et al. Geographical distribution of HIV-1 group O viruses in Africa. AIDS. 1997;11: 493–498. pmid:9084797
- 10. Santoro MM, Perno CF. HIV-1 Genetic Variability and Clinical Implications. ISRN Microbiol. 2013;2013: 1–20. pmid:23844315
- 11. Bego MG, Cong L, Mack K, Kirchhoff F, Cohen ÉA. Differential Control of BST2 Restriction and Plasmacytoid Dendritic Cell Antiviral Response by Antagonists Encoded by HIV-1 Group M and O Strains. J Virol. 2016;90: 10236–10246. pmid:27581991
- 12. Kluge SF, Mack K, Iyer SS, Pujol FM, Heigele A, Learn GH, et al. Nef Proteins of Epidemic HIV-1 Group O Strains Antagonize Human Tetherin. Cell Host Microbe. 2014;16: 639–650. pmid:25525794
- 13. Mack K, Starz K, Sauter D, Langer S, Bibollet-Ruche F, Learn GH, et al. Efficient Vpu-Mediated Tetherin Antagonism by an HIV-1 Group O Strain. J Virol. 2017;91. pmid:28077643
- 14. Sauter D, Schindler M, Specht A, Landford WN, Münch J, Kim K-A, et al. Tetherin-Driven Adaptation of Vpu and Nef Function and the Evolution of Pandemic and Nonpandemic HIV-1 Strains. Cell Host Microbe. 2009;6: 409–421. pmid:19917496
- 15. Engelman A, Craigie R. Identification of conserved amino acid residues critical for human immunodeficiency virus type 1 integrase function in vitro. J Virol. 1992;66: 6361–6369. pmid:1404595
- 16. Engelman A, Bushman FD, Craigie R. Identification of discrete functional domains of HIV-1 integrase and their organization within an active multimeric complex. EMBO Journal. 1993;12: 3269–3275. pmid:8344264
- 17. van Gent DC, Vink C, Groeneger AA, Plasterk RH. Complementation between HIV integrase proteins mutated in different domains. EMBO J. 1993;12: 3261–3267. pmid:8344263
- 18. Eijkelenboom APAM, van den Ent FMI, Vos A, Doreleijers JF, Hård K, Tullius TD, et al. The solution structure of the amino-terminal HHCC domain of HIV-2 integrase: a three-helix bundle stabilized by zinc. Current Biology. 1997;7: 739–746. pmid:9368756
- 19. Zheng R, Jenkins TM, Craigie R. Zinc folds the N-terminal domain of HIV-1 integrase, promotes multimerization, and enhances catalytic activity. Proceedings of the National Academy of Sciences. 1996;93: 13659–13664. pmid:8942990
- 20. Passos DO, Li M, Yang R, Rebensburg SV, Ghirlando R, Jeon Y, et al. Cryo-EM structures and atomic model of the HIV-1 strand transfer complex intasome. Science. 2017;355: 89–92. pmid:28059769
- 21. Passos DO, Li M, Jóźwik IK, Zhao XZ, Santos-Martins D, Yang R, et al. Structural basis for strand-transfer inhibitor binding to HIV intasomes. Science. 2020;367: 810–814. pmid:32001521
- 22. Kulkosky J, Jones KS, Katz RA, Mack JP, Skalka AM. Residues critical for retroviral integrative recombination in a region that is highly conserved among retroviral/retrotransposon integrases and bacterial insertion sequence transposases. Mol Cell Biol. 1992;12: 2331–2338. pmid:1314954
- 23. Busschots K, Vercammen J, Emiliani S, Benarous R, Engelborghs Y, Christ F, et al. The Interaction of LEDGF/p75 with Integrase Is Lentivirus-specific and Promotes DNA Binding. Journal of Biological Chemistry. 2005;280: 17841–17847. pmid:15749713
- 24. Elliott J, Eschbach JE, Koneru PC, Li W, Puray-Chavez M, Townsend D, et al. Integrase-RNA interactions underscore the critical role of integrase in HIV-1 virion morphogenesis. Elife. 2020;9: 1–56. pmid:32960169
- 25. Engelman A, Hickman AB, Craigie R. The core and carboxyl-terminal domains of the integrase protein of human immunodeficiency virus type 1 each contribute to nonspecific DNA binding. J Virol. 1994;68: 5911–5917. pmid:8057470
- 26. Kessl JJ, Kutluay SB, Townsend D, Rebensburg S, Slaughter A, Larue RC, et al. HIV-1 Integrase Binds the Viral RNA Genome and Is Essential during Virion Morphogenesis. Cell. 2016;166: 1257–1268.e12. pmid:27565348
- 27. Rocchi C, Gouet P, Parissi V, Fiorini F. The C-Terminal Domain of HIV-1 Integrase: A Swiss Army Knife for the Virus? Viruses. 2022;14: 1397. pmid:35891378
- 28. Wilkinson TA, Januszyk K, Phillips ML, Tekeste SS, Zhang M, Miller JT, et al. Identifying and Characterizing a Functional HIV-1 Reverse Transcriptase-binding Site on Integrase. Journal of Biological Chemistry. 2009;284: 7931–7939. pmid:19150986
- 29. Zhu K, Dobard C, Chow SA. Requirement for Integrase during Reverse Transcription of Human Immunodeficiency Virus Type 1 and the Effect of Cysteine Mutations of Integrase on Its Interactions with Reverse Transcriptase. J Virol. 2004;78: 5045–5055. pmid:15113886
- 30. Kanja M, Cappy P, Levy N, Oladosu O, Schmidt S, Rossolillo P, et al. NKNK: a New Essential Motif in the C-Terminal Domain of HIV-1 Group M Integrases. J Virol. 2020;94: 1–23. pmid:32727879
- 31. Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A. 1992;89: 10915–10919. pmid:1438297
- 32. van Heuverswyn F, Li Y, Bailes E, Neel C, Lafay B, Keele BF, et al. Genetic diversity and phylogeographic clustering of SIVcpzPtt in wild chimpanzees in Cameroon. Virology. 2007;368: 155–171. pmid:17651775
- 33. Takehisa J, Kraus MH, Ayouba A, Bailes E, van Heuverswyn F, Decker JM, et al. Origin and Biology of Simian Immunodeficiency Virus in Wild-Living Western Gorillas. J Virol. 2009;83: 1635–1648. pmid:19073717
- 34. Sato K, Misawa N, Takeuchi JS, Kobayashi T, Izumi T, Aso H, et al. Experimental Adaptive Evolution of Simian Immunodeficiency Virus SIVcpz to Pandemic Human Immunodeficiency Virus Type 1 by Using a Humanized Mouse Model. J Virol. 2018;92: 1–21. pmid:29212937
- 35. da Silva Santos C, Tartour K, Cimarelli A. A Novel Entry/Uncoating Assay Reveals the Presence of at Least Two Species of Viral Capsids During Synchronized HIV-1 Infection. PLoS Pathog. 2016;12. pmid:27690375
- 36. Rasaiyaah J, Tan CP, Fletcher AJ, Price AJ, Blondeau C, Hilditch L, et al. HIV-1 evades innate immune recognition through specific cofactor recruitment. Nature. 2013;503: 402–405. pmid:24196705
- 37. Peng K, Muranyi W, Glass B, Laketa V, Yant SR, Tsai L, et al. Quantitative microscopy of functional HIV post-entry complexes reveals association of replication with the viral capsid. Elife. 2014;3: e04114. pmid:25517934
- 38. Hulme AE, Kelley Z, Foley D, Hope TJ. Complementary Assays Reveal a Low Level of CA Associated with Viral Complexes in the Nuclei of HIV-1-Infected Cells. J Virol. 2015;89: 5350–5361. pmid:25741002
- 39. Francis AC, Marin M, Shi J, Aiken C, Melikyan GB. Time-Resolved Imaging of Single HIV-1 Uncoating In Vitro and in Living Cells. PLoS Pathog. 2016;12: 1–28. pmid:27322072
- 40. Li Y, Kar AK, Sodroski J. Target Cell Type-Dependent Modulation of Human Immunodeficiency Virus Type 1 Capsid Disassembly by Cyclophilin A. J Virol. 2009;83: 10951–10962. pmid:19656870
- 41. Lee KE, Ambrose Z, Martin TD, Oztop I, Mulky A, Julias JG, et al. Flexible Use of Nuclear Import Pathways by HIV-1. Cell Host Microbe. 2010;7: 221–233. pmid:20227665
- 42. Setiawan LC, van Dort KA, Rits MAN, Kootstra NA. Mutations in CypA Binding Region of HIV-1 Capsid Affect Capsid Stability and Viral Replication in Primary Macrophages. AIDS Res Hum Retroviruses. 2016;32: 390–398. pmid:26414211
- 43. Kane M, Rebensburg SV, Takata MA, Zang TM, Yamashita M, Kvaratskhelia M, et al. Nuclear pore heterogeneity influences HIV-1 infection and the antiviral activity of MX2. Elife. 2018;7: 1–44. pmid:30084827
- 44. Bejarano DA, Peng K, Laketa V, Börner K, Jost KL, Lucic B, et al. HIV-1 nuclear import in macrophages is regulated by CPSF6-capsid interactions at the nuclear pore complex. Elife. 2019;8: 1–31. pmid:30672737
- 45. Toccafondi E, Lener D, Negroni M. HIV-1 Capsid Core: A Bullet to the Heart of the Target Cell. Front Microbiol. 2021;12: 1–17. pmid:33868211
- 46. Briones MS, Dobard CW, Chow SA. Role of Human Immunodeficiency Virus Type 1 Integrase in Uncoating of the Viral Core. J Virol. 2010;84: 5181–5190. pmid:20219923
- 47. Christensen DE, Ganser-Pornillos BK, Johnson JS, Pornillos O, Sundquist WI. Reconstitution and visualization of HIV-1 capsid-dependent replication and integration in vitro. Science. 2020;370. pmid:33033190
- 48. Rankovic S, Varadarajan J, Ramalho R, Aiken C, Rousso I. Reverse Transcription Mechanically Initiates HIV-1 Capsid Disassembly. J Virol. 2017;91: 1–14. pmid:28381579
- 49. Eschbach JE, Elliott JL, Li W, Zadrozny KK, Davis K, Mohammed SJ, et al. Capsid Lattice Destabilization Leads to Premature Loss of the Viral Genome and Integrase Enzyme during HIV-1 Infection. J Virol. 2020;95. pmid:33115869
- 50. Forshey BM, von Schwedler U, Sundquist WI, Aiken C. Formation of a Human Immunodeficiency Virus Type 1 Core of Optimal Stability Is Crucial for Viral Replication. J Virol. 2002;76: 5667–5677. pmid:11991995
- 51. Stremlau M, Perron M, Lee M, Li Y, Song B, Javanbakht H, et al. Specific recognition and accelerated uncoating of retroviral capsids by the TRIM5 restriction factor. Proc Natl Acad Sci U S A. 2006;103: 5514–5519. pmid:16540544
- 52. Aharoni A, Gaidukov L, Khersonsky O, Gould SMQ, Roodveldt C, Tawfik DS. The “evolvability” of promiscuous protein functions. Nat Genet. 2005;37: 73–76. pmid:15568024
- 53. Chothia C, Gough J, Vogel C, Teichmann SA. Evolution of the protein repertoire. Science. 2003;300: 1701–1703. pmid:12805536
- 54. O’Brien PJ, Herschlag D. Catalytic promiscuity and the evolution of new enzymatic activities. Chem Biol. 1999;6. pmid:10099128
- 55. Maldarelli F, Kearney M, Palmer S, Stephens R, Mican J, Polis MA, et al. HIV Populations Are Large and Accumulate High Genetic Diversity in a Nonlinear Fashion. J Virol. 2013;87: 10313–10323. pmid:23678164
- 56. Bloom JD, Romero PA, Lu Z, Arnold FH. Neutral genetic drift can alter promiscuous protein functions, potentially aiding functional evolution. Biol Direct. 2007;2: 17. pmid:17598905
- 57. Crooks GE, Hon G, Chandonia J-M, Brenner SE. WebLogo: A Sequence Logo Generator. Genome Res. 2004;14: 1188–1190. pmid:15173120
- 58. Schneider TD, Stephens RM. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 1990;18: 6097–6100. pmid:2172928
- 59. Feng D-F, Doolittle RF. Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J Mol Evol. 1987;25: 351–360. pmid:3118049
- 60. Naldini L, Blömer U, Gallay P, Ory D, Mulligan R, Gage FH, et al. In vivo gene delivery and stable transduction of nondividing cells by a lentiviral vector. Science. 1996;272: 263–267. pmid:8602510
- 61. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596: 583–589. pmid:34265844
- 62. Varadi M, Anyango S, Deshpande M, Nair S, Natassia C, Yordanova G, et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 2022;50: D439–D444. pmid:34791371