Molecular dynamic simulations to investigate the structural impact of known drug resistance mutations on HIV-1C Integrase-Dolutegravir binding

Resistance associated mutations (RAMs) threaten the long-term success of combination antiretroviral therapy (cART) outcomes for HIV-1 treatment. HIV-1 Integrase (IN) strand transfer inhibitors (INSTIs) have proven to be a viable option for highly specific HIV-1 therapy. The INSTI, Dolutegravir is recommended by the World Health Organization for use as first-line cART. This study aims to understand how RAMs affect the stability of IN, as well as the binding of the drug Dolutegravir to the catalytic pocket of the protein. A homology model of HIV-1 subtype C IN was successfully constructed and validated. The site directed mutator webserver was used to predict destabilizing and/or stabilizing effects of known RAMs while FoldX confirmed any changes in protein energy upon introduction of mutation. Also, interaction analysis was performed between neighbouring residues. Three mutations known to be associated with Raltegravir, Elvitegravir and Dolutegravir resistance were selected; E92Q, G140S and Y143R, for molecular dynamics simulations. The structural quality assessment indicated high reliability of the HIV-1C IN tetrameric structure, with more than 90% confidence in modelled regions. Change in free energy for the three mutants indicated different effects, while simulation analysis showed G140S to have the largest affect on protein stability and flexibility. This was further supported by weaker non-bonded pairwise interaction energy and binding free energy values between the drug DTG and E92Q, Y143R and G140S mutants suggesting reduced binding affinity, as indicated by interaction analysis in comparison to the WT. Our findings suggest the G140S mutant has the strongest effect on the HIV-1C IN protein structure and Dolutegravir binding. To the best of our knowledge, this is the first study that uses the consensus wild type HIV-1C IN sequence to build an accurate 3D model to understand the effect of three known mutations on DTG drug binding in a South Africa context.

Introduction HIV-1 subtype C (HIV-1C) accounts for nearly 50% of all global HIV-1 infections, while HIV-1 subtype B (HIV-1B) accounts for only approximately 12% [24]. However, a vast majority of research on HIV-1 infections focussed on the development of combination anti-retroviral therapy (cART) drugs for HIV-1B and studying the mechanisms of drug resistance in HIV-1B, with less information known about HIV-1C. As a result, all antiretroviral drugs have been developed in relation to HIV-1B. They have also been reported to be effective against a wide range of HIV-1 subtypes [25]. Other clinical studies have however revealed very poor cART treatment outcome when associated with HIV-1C infections [26][27][28]. Although HIV-1C has not been considered an effective predictor for therapy failure earlier, a recent trial indicated that HIV-1C has independent predictors for viral failure [28]. Recent studies also have identified subtype specific differences in DTG cross-resistance pattern in patients failing the firstgeneration RAL treatment [8,14].
Our work carries on from such previously reported molecular dynamics simulation findings to try and assess the molecular mechanisms of resistance in a subtype C IN protein and see how the reported known resistance mutations will affect binding affinity. Based on this insight, this study has been dedicated to employ in silico methods to understand whether the second-generation drug, DTG, will be able to retain efficacy against selected RAL and EVG known resistance mutations in an HIV-1C IN protein. In 2017, Cryogenic electron microscopy was used to solve the structure of HIV-1 strand transfer complex intasome for HIV-1 subtype B [29]. This provided us with a unique opportunity to model the structure of HIV-1 subtype C IN to interrogate the effect of known drug resistance associated mutations (RAMs) on the protein structure using molecular dynamic simulation studies. This is the first study that uses the consensus wild type subtype C IN sequence to build an accurate 3D model of HIV-1C IN to understand the effect of three known RAL, EVG and DTG mutations on DTG drug binding.

Generation of consensus HIV-1C Integrase sequence
To compare our sequences with the rest of the IN sequences from South Africa, we performed a search on the HIV Los Alamos National Library (LANL) database (https://www.hiv.lanl.gov/ components/sequence/HIVsearch.com). Our search inclusion criteria included all South African HIV-1 subtype C IN sequences and those identified from treatment naïve patients. We selected one sequence per patient and all problematic sequences were excluded from further analyses. Finally, the consensus sequence was generated using the database-derived HIV-1C ZA sequences (n = 314) and cohort sequences (n = 91) [8]. Nucleotide sequences were verified for stop codons, insertion and deletions using an online quality control program on the HIV-LANL database (https://www.hiv.lanl.gov/content/sequence/QC/index.htm). Multiple sequence alignments were done with MAFFT version 7, from which the consensus sequence was derived [30]. As part of quality control, each of the viral sequences were inferred on a phylogenetic tree in order to eliminate possible contamination.

Molecular modelling and quality assessment
The crystal structure of the HIV-1B intasome (PDBID: 5U1C) was used to generate a threedimensional tetrameric structure of HIV-1C IN using the consensus HIV-1C sequence that we generated. The SWISSMODEL webserver was used for model generation by first constructing a pairwise sequence-structure alignment between HIV-1C wild-type (WT) amino acid sequence and template 5U1C [31]. The quality of the resulting model was assessed using SWISSMODEL quality assessment scores, Root mean square deviation analysis compared to homologous template (PDBID: 5U1C) and with publicly available algorithms located at the SAVES webserver (https://servicesn.mbi.ucla.edu/SAVES/) namely; ERRAT, VERIFY3D and Ramachandran plot [32][33][34].

Structure preparation
The predicted 3D structure of HIV-1C IN was superimposed to 5U1C to extract proviral DNA, while the Magnesium (MG) ions and drug DTG were extracted and obtained from homologous template 3S3M (Prototype foamy virus) onto their specific positions within the predicted HIV-1C IN using PyMOL. The atomic coordinates of the wild-type (WT) structure of HIV-1C in complex with DNA, MG and DTG, were uploaded onto the CHARMM-GUI solution builder webserver to generate a series of input files for energy minimization of the molecule in an aqueous solvent environment [35,36]. 50,000 steps of energy minimization using the steepest descent minimization integrator was used to equilibrate the system of the solvated complex structure using CHARMM36 force field [37], and applying constraints to hydrogen bonds using the LINCS constraint algorithm. All this was performed with Gromacs software version 5.1 [38]. Thereafter, we predicted the stabilizing and/or destabilizing effect of mutations on the protein structure. For this purpose, the site directed mutator (SDM) webserver and the software FoldX was used to predict the change in Gibbs free energy after the introduction of the mutation [39,40]. We also calculated the loss or gain of polar interactions between neighbouring residues located adjacent to the mutation using PyMOL [36].

Molecular dynamic simulation
For simulation studies we only considered the two inner dimers of the protein structure, as the other two monomers were similar in sequence and structure. Three different mutant systems were prepared by introducing a specific mutation into the WT structure through the mutagenesis wizard in PyMOL and energy minimizing the structures using Gromacs applying the same parameters used to energy minimize the WT structure [38,41]. The WT and three mutant systems (E92Q, G140S and Y143R) were prepared by uploading the atomic coordinates of the Protein-DNA-MG-DTG complexes to the CHARMM-GUI interface [36]. These three mutant systems were selected for simulation studies because they represent three resistance pathways associated with RAL, EVG and possibly DTG resistance. Both E92Q and G140S mutations have been reported to reduce susceptibility of DTG~1.5-fold and up to 10-fold respectively [13,17]. The individual systems were built using the solution builder option in the input generator [35,36]. Each system was solvated in a rectangular TIP3 waterbox with 10Å distance between the edges of the box. The topology and coordinates for each system was generated using the CHARMM36 all-atom force field [37] and CHARMM general force field [37] for DTG. Each system was neutralized by adding counter ions to each of the systems. For the WT system, 157 potassium ions (K) and 81 chloride ions (Cl) were added, for the mutant Y143R system we added 156 K and 81 Cl ions, while for the mutant system G140S we added 157 K and 81 Cl ions and for the mutant system E92Q we added 156 K and 81 Cl ions. Each system was at a final concentration of 0.15M for simulation dynamics.
Gromacs version 5.1 was used for running all the simulations [38]. Each system underwent 50000 steps of steepest descents energy minimization to remove steric overlap. Afterwards, all the systems were subjected to a two-step equilibration phase namely NVT (constant number of particles, Volume and Temperature) and NPT (constant Number of particles, Pressure and Temperature). The NVT equilibration was run for 500 picoseconds (ps) to stabilize the temperature of the system and a short position restraint NPT was run for 500 ps to stabilize the pressure of the system by relaxing the system and keeping the protein restrained. The V-rescale temperature-coupling [38,39] method was used for the NVT ensemble, with constant coupling of 1 ps at 303.15K. For NPT, the Nose-Hoover pressure coupling [42][43][44] was turned on with constant coupling of 1ps at 303.15K under conditions of position restraints (h-bonds) selecting a random seed. Electrostatic forces were calculated for both NVT and NPT using Particle Mesh Ewald method [45]. All systems were subjected to a full 300 nanoseconds (ns) simulation under conditions of no restraints, an integration time step of 0.002 ps and an xtc collection interval of 5000 steps for 10 ps. Each simulation was repeated (duplicating each simulation separately) to validate reproducibility of results.
The analyses of the trajectory files were done using GROMACS utilities. The root mean square deviation (RMSD) was calculated using gmx rmsd and root mean square fluctuation (RMSF) analysis using gmx rmsf. The radius of gyration was calculated using gmx gyrate to determine if the system reached convergence over the 300 ns simulation. Pairwise distance analysis between the drug and MG was done using gmx pairdist tool. The total number of hydrogen bonds between the protein and drug was calculated using gmx hbond. The total pairwise non-bonded interaction energy (which is not a free energy or binding energy) between the protein and the drug DTG was calculated using gmx energy over 100 ns. The energy terms included were average short range Coulombic interactions and short range Lennard Jones interactions. The free energy of binding was calculated using Molecular mechanics Poisson-Boltzmann surface area (MM-PBSA) protocols implemented in the g_mmpbsa package [46]. The ΔG binding energy was calculated for 800 frames between the protein and the drug over 8 ns from 100-108 ns of the simulation trajectory with a sampling interval of 10 ps. Afterwards, we extracted structures every 50 ns over the last 200 ns of the equilibrated system to determine any structural changes and differences in the number of interactions between the protein and drug at different time intervals.

Principal component analysis
Principal component analysis (PCA) is a statistical technique that reduces the complexity of a data set in order to extract biologically relevant movements of protein domains from irrelevant localized motions of atoms. The technique is known for its ability to transform a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables, called principal components (PCs), while retaining those characteristics of the data set that contribute most to its variance [19]. For PCA analysis the translational and rotational movements were removed from the system using gmx covar from GROMACS to construct a covariance matrix. Next the eigenvectors and eigenvalues were calculated by diagonalizing the matrix. The eigenvectors that correspond to the largest eigenvalues are called "principal components", as they represent the largest-amplitude collective motions. We filtered the original trajectory and project out the part along the most important eigenvectors namely: vector 1 and 2 using gmx anaeig from GROMACS utilities. Furthermore, we visualized the sampled conformations in the subspace along the first two eigenvectors using gmx anaeig in a two-dimensional projection.

Sequence and structural analysis
The amino acid sequence of HIV-1C IN shared 93.4% sequence identity with the template 5U1C amino acid sequence (S1 The VERIFY 3D score was 80.1%, the ERRAT overall quality score was 90% and higher for all four chains (A, B, C and D) and the Ramachandran plot indicated more than 90% of residues fell within the most favoured regions of the plot suggesting the predicted structure is a reliable model. Stability predictions indicated 15 RAMs to be destabilizing and five to be stabilizing for the protein structure based on SDM delta-delta G free energy scores ( Table 1). The FoldX change in unfolded energy values indicated that the G140S was stabilizing, E92Q destabilizing and Y143R neutral based on comparison with the WT structure each having values of 162.89, 131.94, 146.47 and 151.83 Kcal/Mol, respectively. Interaction analysis showed ten mutations resulted in a loss of polar contacts; three resulted in an increase in polar contacts, while seven showed no change in the number of polar contacts with neighbouring residues (Table 1).

Molecular dynamic simulations
All the MD trajectory analysis considered the single chain A (monomer) of the IN protein in contact with the drug DTG and the DNA. Trajectory analysis of the RMSD of the backbone indicated that the WT system reached equilibrium after 100 ns as well as the E92Q, Y143R and G140S mutant systems (Fig 2A). Only G140S showed higher RMSD variation values compared to the WT, Y143R and E92Q systems (Fig 2A). RMSF analysis clearly showed higher flexibility for the G140S mutant system, with four highly flexible regions (residues 68-70, 142-146, 166-170 and 253-256) compared to the WT, E92Q and Y143R systems (Fig 2B). These flexible regions affect the 140's loop region that regulates drug binding. The Radius of gyration values indicated decreasing values for Y143R and E92Q compared to the WT and G140S mutant system (Fig 2C). Plotting the first two principal components provided insight into the collective movement of each protein atom. The 2D projections of the first and second principal components for the WT vs E92Q, WT vs Y143R and WT vs G140S systems are shown in S3A, S3B and S3C Fig Table 2). The major contributors to the total binding free energy in the WT was the van der Waals energy, electrostatic interaction energy and SASA energy, while the polar solvation had no contribution (because of its positive value) to the binding of the drug and similarly for the E92Q mutant (Table 2). For the Y143R system the van der Waals, polar solvattion and SASA energy were the major contributors to the binding free energy, while the electrostatic interaction energy had no contribution. Both the van der Waals and the electrostatic interaction energies contributed significantly to the total binding energy observed in the G140S mutant, while the polar solvation and SASA energy each had smaller contributions to the binding of the drug. This suggests that all the mutants considered may trigger conformational changes in the active site resulting in significantly weak binding of DTG to HIV-1C IN.

Interaction analysis
We performed interaction analysis for five snapshots (every 50 ns) of each of the simulation systems to determine which residues played a role in the binding of DTG to the protein in the WT and mutant protein structures. For the WT system, interactions were observed between the drug DTG and known active site residues D64, D116 and N148, MG ion and also to DNA nucleotides (Table 3). Similarly, interactions were observed between the drug DTG and known active site residue D64, Y143R, N148, MG ion and DNA nucleotides for the Y143R system (Table 3). Interestingly, no active site residue and MG ionic interactions were formed between DTG and the G140S mutant system, resulting in the dissociation of the drug from the binding pocket over time (S6D Fig). On the other hand, the E92Q system showed interactions with one of the active site residues (D116) but no MG ionic interactions (Table 3). S6A-S6D Fig in Supporting Information File shows the different interactions formed between the drug, MG ion and active site residues for snapshots taken at 100 ns for each simulation system.

Discussion
Previous studies by Chen et al. and Dewdney et al. [19,21] showed the structural impact of mutations Q148H/R and G140S/A on the flexibility of the HIV-1 IN as a mechanism for RAL resistance. Furthermore, Xue and team [46] found that the cross-resistance mutation E138K/ Q148K resulted in a reduction in the chelation ability of oxygen atoms in INSTIs to Mg 2+ in the active site of the mutated intasomes resulting in a reduced binding affinity of RAL and EVG to the protein. Another simulation study also revealed the binding mode of EVG and RAL to HIV-1 IN and the structural mechanism of drug resistant mutants (G140A and G149A) that affect the 140's loop region spanning residues 140-149 [23]. However, all of these studies only considered HIV-1B IN and protein models of low sequence identity. In this study, we selected three known mutations E92Q, G140S and Y143R associated with RAL, EVG and DTG resistance to investigate their effect on the protein structure of HIV-1C IN and DTG drug binding. The structural modelling of HIV-1C IN considered a homologous template of high sequence identity, and good overall target sequence coverage, compared to previous homology models that considered templates of low sequence identity. We could therefore accurately reconstruct HIV-1C using the close homolog HIV-1B crystal structure as template to infer accurate drug interactions. Further inspection of the overall structure confirmed accurate prediction of more than 90% of domains within the protein structure, compared to the template HIV-1B structure. The quality analysis provided support for the predicted model based on side chain conformations. Stability predictions showed contrasting results to Table 3. Summary of interaction analysis.

Structure Cluster Interactions
Hydrogen bonds Ionic interaction analysis, whereby amino acid substitutions that resulted in a gain of interactions was predicted to be destabilising. The FoldX changes in energy values were similar to interaction analysis for the three mutant structures under investigation. To fully comprehend the effects of individual mutations we opted to use molecular dynamic (MD) simulations to understand the effect of three known mutations on protein movement and drug interactions. MD analyses have shown to be successful in quantifying small changes in protein structures that can affect overall drug binding [47]. Analysis of the change in trajectory of the mutant systems compared to the wild type suggested less stability and higher fluctuation of the G140S mutant system compared to the WT system. We also confirmed the destabilizing effect of the G140S mutant using principal component analysis which suggested larger randomized concerted movement for the G140S mutant compared to the WT, E92Q and Y143R systems. These findings are contradictory to Chen et al. [19] who performed 150 ns simulation studies of the G140S HIV-1B IN mutant system with NAMD and discovered that the 140's loop of the single G140S mutant system displayed reduced movements using principal component analysis. Their results showed that the single G140S mutation did not adversely affect drug binding. In our case, the 140's loop region is stabilized by the G140S mutation and we assume that could reduce drug binding. This is supported by pairwise distance analysis confirming a larger distance between the MG ion and drug DTG for the G140S mutant system compared to the WT and Y143R. Furthermore, the total pairwise non-bonded interaction energy was significantly lower for the G140S mutant compared to the WT, suggesting weaker affinity of the drug DTG for HIV-1C IN in the presence of the mutant. Similarly, the binding free energy calculations also showed higher binding energy between the WT HIV-1C IN and DTG and reduced binding for the E92Q, Y143R and G140S mutant systems. These results are in stark contrast to the study of Chen et al. [19] that showed no difference in binding affinity of RAL to the WT and G140S single mutant. Interestingly, the binding free energy in our study for the WT and DTG (-29.65 ± 18.54) was comparable to that found in the Xue et al. [22] study (-30.95 ± 0.10), although having~1.3 Kcal/Mol energy difference. Further interaction analysis was performed to confirm the hypothesis that the G140S mutation could reduce drug binding by extracting structures at different snapshots of the simulation. Here, we found that the G140S mutation resulted in the drug moving further away from the binding pocket. We also observed weaker interactions for the E92Q mutation but stronger interactions for Y143R mutant based on the average number of hydrogen bonds and the total number of polar contacts between the protein and the drug. The model generated in this study can be used to tease out the effects of novel variants. A few limitations of this study are the use of RAL and EVG mutants and not considering novel RAL or DTG mutations and also simulating single instead of double mutations. However, we have yet to identify double mutants within the South African cohort of HIV-1C infected patients. Another limitation is the exclusion of entropy effects due to the lack of computational resources this might have led to under or overestimation of the binding free energy. However, our total pairwise interaction energies also correlate well with RAL binding energies observed in the Chen et al. [19] study with the WT showing higher pair interaction energy compared to the G140S/Q148H double mutant. Future work will include viral fitness assays to determine the effect of mutants E92Q, Y143R and G140S on the HIV-1C virus replication in the presence of DTG.