The Interaction of RNA Helicase DDX3 with HIV-1 Rev-CRM1-RanGTP Complex during the HIV Replication Cycle

Molecular traffic between the nucleus and the cytoplasm is regulated by the nuclear pore complex (NPC), which acts as a highly selective channel perforating the nuclear envelope in eukaryotic cells. The human immunodeficiency virus (HIV) exploits the nucleocytoplasmic pathway to export its RNA transcripts across the NPC to the cytoplasm. Despite extensive study on the HIV life cycle and the many drugs developed to target this cycle, no current drugs have been successful in targeting the critical process of viral nuclear export, even though HIV’s reliance on a single host protein, CRM1, to export its unspliced and partially spliced RNA transcripts makes it a tempting target. Due to recent findings implicating a DEAD-box helicase, DDX3, in HIV replication and a member of the export complex, it has become an appealing target for anti-HIV drug inhibition. In the present research, we have applied a hybrid computational protocol to analyze protein-protein interactions in the HIV mRNA export cycle. This method is based on molecular docking followed by molecular dynamics simulation and accompanied by approximate free energy calculation (MM/GBSA), computational alanine scanning, clustering, and evolutionary analysis. We highlight here some of the most likely binding modes and interfacial residues between DDX3 and CRM1 both in the absence and presence of RanGTP. This work shows that although DDX3 can bind to free CRM1, addition of RanGTP leads to more concentrated distribution of binding modes and stronger binding between CRM1 and RanGTP.


Introduction
The human immunodeficiency virus (HIV) is a well-known pandemic lentivirus responsible for millions of deaths annually worldwide, particularly in developing and third-world countries [1]. Drugs exist to target nearly every aspect of the viral replication cycle, but treatment aggressiveness is limited by the very potent and potentially dangerous side effects of many of the drugs used. Despite extensive study on the HIV-1 life cycle and the many drugs developed to target this cycle, no current drugs have successfully targeted the critical process of viral nuclear will elucidate the key binding site locations between DDX3 and CRM1 and whether there is cooperativity in the binding of these components in the formation of the export complex.
In the present research, the initial candidates for CRM1-DDX3 binding were determined by protein-protein docking. According to the CAPRI structure prediction contest, this method has experienced considerable progress in the past decade [22,23]. Yet, none of the automated docking servers is able to claim its best solution as the native complex in all test cases [24]. In order to overcome this issue, previous studies suggested using a combination of different docking solutions by consensus scoring [25,26] or post-processing using molecular dynamics (MD) simulation [27]. In this research, we use a combination of docking results obtained from three of the most successful docking servers followed by molecular dynamics equilibration. MD simulation results describing the stability and strength of binding replace the original docking scores for assessment of complex candidates. Computational alanine scanning [28] is used for analysis of key interfacial residues. Some other bioinformatics approaches including hot spot prediction and conservation analysis are also performed to examine their correlation with the simulation results.

Methods
Although docking approaches have been traditionally developed for identifying the binding of small molecules (mostly drugs) to large proteins, some recent docking approaches have proven their ability in predicting the binding between proteins. However, due to limitations in protein-protein docking algorithms, one cannot rely solely on binding modes resulting from docking. On the other hand, de novo reconstruction of large protein complexes based on pure MD simulation is beyond current computational resources. This calls for a hybrid approach that can leverage the capabilities of multiple methods and tools. A similar protocol, binding estimation after refinement (BEAR), was developed for rapid virtual screening of small ligands using docking, short MD, and scoring calculation using MM/PBSA and MM/GBSA [29]. This work will mainly involve a combination of bioinformatics, docking and MD. Also, free energy analysis will be applied for evaluation of binding strength. Fig. 2 outlines an overview of the protocol.

Protein Crystal Structures
Crystal structures of DDX3X (PDB ID: 2I4I), herein referred to as DDX3, and CRM1 bound to Snurportin-1 and RanGTP (PDB ID: 3NBZ) or bound to just Snurportin-1 (PDB ID: 3GB8) were obtained from the Protein Databank. All of Snuportin-1 was removed except the NES region from both CRM1 crystal structures. Note that 3NBZ contains a Snuportin-1 molecule with the NES region from HIV-1 Rev. The missing H atoms were added to the structures and they were minimized before docking.

Molecular Docking
To obtain a list of potential CRM1-DDX3 binding modes, a series of docking simulations were carried out by binding DDX3 with the two CRM1 complexes. Docking was carried out using three webserver tools: ClusPro2.0 [30], GRAMM-X [31], and FireDock [32]. These servers were selected as they were some of the most successful based on the CAPRI benchmarks [23]. Broadly, these docking tools use an algorithmic approach to explore all potential geometries of binding while treating the interacting proteins as rigid bodies. FireDock performs an additional refinement procedure by introducing side-chain flexibility to the rigid-body docked structures and performing a side-chain optimization. Subsequently, Monte Carlo energy minimization is performed and a final ranking is obtained based on a binding score. No interface constraints were used with GRAMM-X. Also, PatchDock [33] was used first to generate a primary list of docked structures, and then these structures were passed to FireDock for further refinement. The top 10 docked structures from each of these tools (30 bound structures for each CRM1 complex; 60 total structures) were gathered for further analysis.

Molecular Dynamics Simulation
Molecular dynamics (MD) simulation was performed using each docked structure to further refine the binding mode. MD models were built using NAMD 2.9 [34] and the CHARMM27 force field [35,36]. Protein manipulations, measurements, and water box addition were done with VMD1.9.1 and the included plugins [37].
This system was placed in a water box with the TIP3P water molecule, a periodic simulation cell with a 10 Å margin and Na + and Clcounter-ions at the concentration of 150mM. In all simulations, Particle Mesh Ewald [38] was used for electrostatic energy calculation. Total atom numbers varied from 200,000 to 230,000 for different cases.
In order to conserve computation resources the bonds between hydrogen and larger atoms were held at fixed length, and thus, a timestep of 2 fs was used. The default multiple timestepping method of NAMD was used [39], with 2 fs step for bonded force evaluation, 2 fs for nonbonded forces, and 4 fs for long-range electrostatics. Pressure was regulated at 1 atm using Langevin piston [40] with a period of 100 fs and damping timescale of 50 fs, and Langevin damping factor of 1 fs -1 . Three independent simulations, each 10 ns long at 310 K, as well as preliminary minimization were performed for each docked structure. The second half (5 ns) of the simulation time is considered as the production part.

Binding Analysis and Free Energy Calculation
Energetic and structural analysis was performed subsequent to the simulations. Specifically, root mean square deviation (RMSD), buried surface area (BSA), interaction energy between CRM1 and DDX3, and MM/GBSA were calculated.
Molecular Mechanics/Generalized Born Surface Area (MM/GBSA) is an approximate free energy calculation method that uses a combination of molecular mechanics energy and implicit solvation models. It is much more accurate than conventional docking scores yet more computationally expensive. However, it is faster and more convenient than rigorous free energy calculation methods such as umbrella sampling or alchemical approaches [41,42,43]. This method and its counterpart based on Poisson-Boltzmann approximation (MM/PBSA) have been successfully utilized to compare conformational stabilities and binding free energies in a variety of cases including nucleic acids structures [41], protein folding [44], protein-ligand complexes [45] and computational mutagenesis [46]. MM/GBSA is categorized as an end-point method for free energy calculation and is derived from direct calculation of various components of free energy including bonded, electrostatic and vdW energies, polar and non-polar desolvation free energies with addition of conformational entropy. Because we have similar ligands in different MM/GBSA calculations, dropping the entropic term can still give us reasonable results if we just focus on the relative free energies. Similarly, the BEAR protocol also excludes the entropic term from this calculation. This way we can also save a lot of computational resources. It must be noted that even inclusion of entropy portion for protein-protein complexes will not give reliable absolute free energy values [47,48]. Parameter set GB OBC II was used for calculation of GB term [49]. Surface tension coefficient was set at 0.005 in nonpolar solvation energy term. We chose the last 5ns from each simulation and uniformly picked 100 sampling points for the calculation of different MM/GBSA terms.
Computational alanine scanning (CAS) [50] was performed using the webserver Robetta [28] in order to ascertain interfacial residues critical in maintaining stability between CRM1 and DDX3. Like traditional experimental alanine scanning, Robetta CAS server mutates interfacial residues to alanine and the change in free energy of binding is calculated using a linear free energy function. Robetta reports a list of "hot spots" consisting of residues that would significantly destabilize the interface of two bound proteins if mutated to alanine with the associated change in free energy. CAS was performed for all 60 docked structures. Using CAS, a list of interfacial residues is compiled and is the basis of the predicted binding mode for CRM1 and DDX3.
Two additional bioinformatics tools were used for further validation, SPPIDER [51] and ConSurf [52]. SPPIDER is a webserver that can predict functional residues at protein-protein interaction sites based on a consensus-classifier. The two different crystal structures of CRM1 and DDX3 were submitted to the SPPIDER webserver and the list of putative interface residues was complied. The settings 'SPPIDER I' and a tradeoff of 0.3 were chosen. Another tool, Con-Surf, gives a measure of how evolutionarily conserved the positions of amino acids in a given protein are based on phylogenetic relations of homologous protein sequences. Again, the same crystal structures of CRM1 and DDX3 were submitted individually to the ConSurf webserver. The multiple sequence alignment was first generated from ConSurf using the UNIREF-90 protein database [53] and CSI-BLAST homolog search algorithm with 3 iterations and an E-value cutoff of 0.0001. A Bayesian calculating method with a Jones-Taylor-Thornton [54] evolutionary substitution model was used to calculate the conservation scores.

Clustering and Extended MD
Following MD simulation, it was noticed that there was redundancy within the 60 docked structures as a result of using multiple docking tools. Thus, clustering was performed to collapse the 60 docking structures to a set of non-redundant binding modes. A hierarchical RMSD-based clustering algorithm was adapted from ClusPro2.0. VMD was first used to perform the necessary RMSD calculations. For two given docked structures, the CRM1 structures were aligned. Then, the RMSD between the two DDX3 molecules was calculated. This procedure was performed pairwise for all 60 structures to generate a 60 x 60 RMSD matrix. Clusters are then generated by selecting all docked structures that are below a cutoff of 30 Å from the reference docked structure, designated the cluster center. Then, the clusters are sorted by the number of members in each cluster. After finding the largest cluster, all of its members will be removed from the list and the next largest cluster will be determined. This procedure is repeated for all subsequent clusters. After clustering, structures that contained a DDX3 bound to a location that sterically prohibited the binding of RanGTP (in the case of 3GB8) or RanBP1 (PDB ID: 4GMX; both 3GB8 and 3NBZ cases) were removed. For each cluster with more than 4 members, the most structurally and energetically stable docked complex was selected for extended MD simulation. In addition to these docked complexes, the most energetically and structurally stable complexes not found within the top clusters were also selected for extended simulation. This is so that any structurally stable complex not clustered well would still be included.
After selecting these top docked structures, a new set of 50 ns extended MD simulations was performed following 10000-step minimization. For each of these structures, DDX3 was moved 10 Å away from CRM1 surface. All simulation conditions are the same as the previous 10 ns simulations. Energetic and structural analysis was performed once again after these extended simulations consisting of calculating RMSD, interaction energy, MM/GBSA, and BSA. Additionally, CAS was performed again using Robetta.

Flexible Multi-Domain Docking
Among the methods used for protein-protein docking, few can handle more than two interacting molecules at a time. Moreover, most of these methods are restricted to symmetric homomeric complexes. HADDOCK [55] is one of the few tools able to handle simultaneous multibody docking. Moreover, in some circumstances one of the interacting proteins is composed of two separate domains connected by a highly flexible linker. In such cases we may observe global changes including large-scale domain motions like hinge and shear. The flexible multi-domain docking (FMD) feature of HADDOCK multi-body docking enables examining the highly flexible linker between domains while allowing them to independently dock on the other agent [56]. Our particular system can benefit from this utility since the two lobes of DDX3 are free to explore different conformations especially when DDX3 binds to RNA. FMD complemented our main protocol to ensure covering other possible DDX3 conformations and to let it open or close freely upon encountering the host (CRM1). WeNMR was used as the computational resource for running the HADDOCK jobs [57].
In the flexible multi-domain docking (FMD) method, the protein structure is divided into domains separated by flexible linkers between each pair. The location of the flexible hinge region can be determined using an elastic network model. Hingeprot was used to identify the separation point between the domains of the flexible structure [58]. HADDOCK requires a list of interaction restraints to start with. The knowledge about interactions should ideally come from unambiguous experimental studies. Presently, however, such experimental data are scarce for many biological complexes including the one we deal with. Hence, we must identify the potential interacting residues with the help of predictor servers. CPORT was used here to identify the potential interacting residues [59]. CPORT gathers the outcomes from other prediction servers and makes a collective list of the predicted interfacial residues outcomes. It must be taken into account that the list of residues for each protein is independent of other interacting proteins and is derived solely based on each protein structure. The list of suggested interfacial residues was then used to prepare the ambiguous interaction restraints as input to the HADDOCK server.
The flexible molecule, DDX3X in the present work, is divided into two separate pieces. Hence, the connectivity between the separated domains must be defined and maintained throughout the docking procedure. This can be implemented as unambiguous distance restrains between the C-and N-termini. The server returns the list of solutions sorted based on the highest scores, where a score is defined as the interaction energy penalized by violation of distance constraints. The candidates are also clustered according to their proximity to each other. From this list, only solutions that satisfy the distance constraint between the two domains will be considered feasible candidates. Also, possible overlap with other interacting molecules with the receptor must be considered to filter out undesirable candidates.

Molecular Docking of DDX3 to CRM1
The ultimate objective is to predict the binding mode of the CRM1-NES-RanGTP-DDX3 protein complex and elucidate residues strongly implicated in binding between DDX3 and the CRM1 export complex. Due to lack of detailed, experimental information regarding the binding mode of DDX3 to CRM1 export complex, molecular docking was used in order to obtain a sample of possible binding modes. With this information, it will then be possible to perform targeted, systematic MD simulations with DDX3 and the CRM1 export complex.
To assess whether DDX3 has a different binding mode with CRM1 as a function of RanGTP, we first docked DDX3 to two different forms of the CRM1 export complex. The first form contains CRM1 bound to an NES peptide as well as RanGTP while the second form is only bound to NES. Three separate servers (ClusPro2.0, GRAMM-X, FireDock) were used in order to avoid any biases in a given docking algorithm and therefore obtain a varied set of sample binding modes. The top 10 ranked docking structures were selected from each server. The key binding locations for DDX3 on CRM1 were, first, near the N-C terminal junction, second, near the NES peptide, third, distributed somewhere along the rim of CRM1, and fourth, on the back of CRM1 (i.e., the side opposite where Ran binds). Examining the top 10 results from Clu-sPro, DDX3 was docked 8 times in some position radiating from the center of the back of CRM1 and twice along the bottom rim of CRM1 from 3NBZ (Fig. 3). Five of the docked DDX3 molecules on the back of CRM1 are within close proximity to the NES peptide. Conversely, a different distribution of binding locations for DDX3 was observed when it was docked to CRM1 from 3GB8, which lacks the presence of RanGTP. Namely, there was an even distribution of DDX3 molecules docked all along the rim of CRM1. Interestingly, the top 10 GRAMM-X results using CRM1 from 3NBZ mimicked the same general DDX3 binding distributions as seen with ClusPro2.0, with only one DDX3 molecule far from the NES (S1 Fig.). Similarly, there was no discernible trend with the DDX3 binding distribution using CRM1 from 3GB8, as seen with ClusPro2.0. On the other hand, this behavior was not noticed with FireDock (S2 Fig.). In the 3NBZ case, there was an arbitrary distribution of DDX3 around CRM1, while the 3GB8 case had half of the DDX3 molecules bound close to the NES or on the opposite side of CRM1, at the bottom. With this, we have obtained a set of 30 protein complexes containing CRM1-NES-RanGTP-DDX3 and 30 protein complexes containing CRM1-NES-DDX3.

Molecular Dynamics Simulation of Bound CRM1-DDX3 complexes
Docking provides a distribution of potential binding modes but lacks information regarding the stability of the binding modes. Additionally, most docking tools do not allow of large conformational changes commonly found in protein-protein binding. To determine the specific binding residues accurately, further refinement was necessary to each of the docked protein complexes. Thus, all-atom molecular dynamics (MD) simulation was performed to minimize and equilibrate the protein complexes.
After simulation, several system attributes/quantities were calculated for each of the 60 complexes (S3 Fig.). First, the root-mean-square deviation (RMSD) of the protein complex was monitored to see if the system reached a steady value, indicating an equilibrium was reached (see S4 Fig. for some sample cases), and the RMSD was computed for the last 5 ns (i.e., the production part) of the simulation. Second, the interaction energy between DDX3 and CRM1 was determined to gauge the strength of binding. Next, the change in buried surface area (BSA) was measured to help define a potential binding event. Finally, the molecular mechanics/generalized-Born implicit solvent (MM/GBSA) was calculated for an estimation of the binding free energy. Recall that the entropic term was not included in our calculations and thus can only be used to compare free energies amongst the 60 docked structures.
The values for each of these four quantities are listed and ranked in Table 1 for all 60 docked complexes. Broadly, the docked complexes from ClusPro2.0 using 3NBZ had the lowest RMSD, strongest interaction energy, largest MM/GBSA and greatest BSA. This is the same for the 3GB8 case except GRAMM-X generally had lower RMSD and the highest BSA. The results from ClusPro2.0 indicated its docked structures were more stable with stronger binding than other servers may be due in part because of the large number of solutions involved in the clustering process. In other words, the outcome is the representative of large number of strong complexes. The average results from the servers comparing 3NBZ and 3GB8 are similar (S1 Table). However, ClusPro2.0 results could possibly be weighted more due to its clustering process. Moreover, ClusPro2.0 achieved top results from CAPRI. Considering this, when examining the 3NBZ and 3GB8 ClusPro2.0 results, the 3NBZ docked structures achieved much stronger binding. There did not seem to be any discernible correlation between the docking tool rankings and the rankings of the calculated quantities in Table 1. This can be expected because each server has their own ranking criteria that may not include these four calculated quantities seen in Table 1. Among the four quantities, there seemed to be some observable correspondence among the rankings of a given docked complex. Specifically, the rankings for interaction energy, MM/GBSA, and BSA seemed to trend with one another. For example, a docked structure that had a high rank for interaction energy, such as (3NBZ) ClusPro #6, Interaction of RNA Helicase DDX3 with HIV-1 Rev-CRM1-RanGTP Complex tended to also have a high rank for MM/GBSA and BSA. Conversely, a low rank for interaction energy correlated with a low rank for MM/GBSA and BSA. Naturally, complexes with lower RMSDs are structurally stable and do not exhibit large conformational changes after binding. The time evolution of RMSD for a complex with low RMSD generally reached a stable value during the equilibration phase of MD simulation (S4 Fig.). Moreover, each of the three simulations exhibited the very similar time evolution over the 10 ns. There were no complexes with proteins that became detached but there was a noticeable spread in RMSD values, allowing us to discern which docked structures were least stable.
Energetic analysis gives another diagnostic with which to ascertain strong and stable binding modes. MM/GBSA gives a measure of binding free energy. As seen in Table 1, there is little spread from the smallest value, 751 kcal/mol, to the largest, 870 kcal/mol. Because the entropic term is neglected due to a lack of computational resources, this term is only used as a relative measure among the 60 docked structures. The interaction energy between CRM1 and DDX3 as well as CRM1+RanGTP and DDX3 are listed separately in Table 1. There is a large range of interaction energy values, from 18 to over 1300 kcal/mol. Notice that there are several cases where the interaction energy is substantially stronger when the interaction between RanGTP are included, such as (3NBZ) FireDock #9 (from-18 to-464 kcal/mol). It is necessary to be cognizant of the fact that only a few residues need be involved in protein-protein binding to achieve a stable interaction. Thus, docked structures with lower interaction energies cannot be ruled out exclusively based on the criteria of interaction energy.

Evaluating Key Binding Residues with Computational Alanine Scanning, Interfacial Residue Prediction (SPPIDER) and Evolutionary Conservation (ConSurf)
The interacting residues between CRM1 and DDX3 were then determined using computational alanine scanning. The location of each binding residue for all 30 3NBZ docked structures Interaction of RNA Helicase DDX3 with HIV-1 Rev-CRM1-RanGTP Complex (Fig. 4) and 30 3GB8 docked structures (Fig. 5) were highlighted by color based on the frequency of appearance. The hottest binding regions for CRM1 from 3NBZ were located on the back of CRM1 and near the NES binding location. CRM1 from 3GB8 instead exhibited binding residues spread all around the structure, front and back, with a hot region below the NES binding location. There was no discernible bias in the binding residues of DDX3 when bound either to CRM1 from 3NBZ or 3GB8. SPPIDER and ConSurf were then applied to our system. Using SPPIDER, a list of predicted interaction residues was obtained for CRM1 (both from 3NBZ and 3GB8) and DDX3 (S5 Fig.). The predicted interaction sites for CRM1 from 3NBZ are located along the upper rim of CRM1, flanking both sides of the NES binding region, and at the lip protruding on the front face of the bottom of CRM1, underneath Ran (Panels A and B in S5 Fig.). The predicted binding locations on CRM1 from 3GB8 show the same regional localization as with the 3NBZ case but with a lower density of predicted sites (Panels C and D in S5 Fig.). The predicted interface sites on DDX3 occurred around the RNA and ATP binding sites, located in the upper lobule, Interaction of RNA Helicase DDX3 with HIV-1 Rev-CRM1-RanGTP Complex and on the lower lobule at its interfacial region with the upper lobule (Panels E and F in S5 Fig.).
Evolutionary conservation scores for each residue were mapped onto each amino acid in S6 Fig. The strongest region of conservation on CRM1 from both 3NBZ (Panels A and B in S6 Fig.) and 3GB8 (Panels C and D in S6 Fig.) occurred at the NES binding location and the associated binding location of RanGTP. This is expected as NES containing proteins and Ran are critical binding partners and give functionality to CRM1. Additional strongly conserved binding locations speckle CRM1 in both cases all over the protein. For DDX3, the hottest regions of conservation are the locations of RNA and ATP binding and the regions surrounding these two binding spots (Panels E and F in S6 Fig.).

Clustering and Extended MD Candidate Selection
To reiterate, the final goal is to predict the most probable CRM1-NES-DDX3 binding mode. At this current stage, we have 60 possible binding modes and it is necessary to systematically reduce the list of docking structures. While the goal of using multiple docking servers is to avoid any bias present in a given docking algorithm, the methods used by each tool is not so distinct such that there will be no overlap in the binding modes among the three servers. Indeed, we observed some redundant binding modes within the set of the docked structures. Thus, structural clustering was necessary to generate a minimal, non-redundant list of docked structures. Briefly, docked structures that had a DDX3 RMSD below a specific cutoff were clustered together. Then, the docked structure that had the best overall RMSD, interaction energy, MM/GBSA, and BSA from each cluster with at least 4 members was selected as representative of the cluster. Prior to this selection, docked structures that had any structural clashing with the binding location of other known CRM1-binding partners, namely RanGTP and RanBP1, were excluded. Recall that one type of the CRM1 binding complexes (3GB8) analyzed does not have RanGTP. As binding of RanGTP is a requirement for nuclear export, DDX3 cannot be docked to this location and any docked structure that has this structural clashing is removed. Additionally, RanBP1 is required for complex disassembly and its binding location on CRM1 must be open. So, any docked structures that have DDX3 positioned in a location that would sterically clash with RanBP1 were also removed from the clustering. After this clustering process, there were four clusters with at least 4 members. The best docked structures in each of these clusters were (3NBZ) GRAMM-X #8, (3NBZ) ClusPro #6, (3GB8) ClusPro #7, (3NBZ) ClusPro #7.
In addition to clustering, the docked structures with the strongest binding based on RMSD, interaction energy, MM/GBSA, and BSA from Table 1 were also selected for further analysis, namely extended MD simulation. These docked structures are (3NBZ) ClusPro #6, (3NBZ) ClusPro #7, (3NBZ) ClusPro #10, and (3NBZ) FireDock #2. Note that some of these structures overlap with the selected structures from clustering.
In total, there are 6 docked structures selected for further MD simulation. These binding modes are depicted in Fig. 6 (see S7 Fig. for individual complexes). Of note, DDX3 is located in close proximity to the NES peptide in all but one case (3GB8-ClusPro #7). Additionally, the list of warm and hot interfacial residues on CRM1 and DDX3 are listed in Table 2 for all 6 of these structures.

Extended MD Simulations
The 6 docked structures were selected for extended MD simulation (50 ns) in order to determine if DDX3 could bind CRM1 when placed a distance of 10 Å apart. Note that with the previous 10 ns MD simulations, DDX3 was docked with CRM1, making it unlikely for DDX3 to move far away from CRM1. These extended simulations test more rigorously the binding of DDX3 and the two CRM1 complexes.
Structural and energetic calculations were performed once again after simulation (Table 3). RMSD and interaction energies indicate that these structures were structurally stable (Fig. 7). Within 10 to 20 ns, all structures' RMSD plateaued and was maintained until the end of the 50 ns simulation. The interaction energy between CRM1-NES (+/-RanGTP) and DDX3 reached a stable value towards the last 25 ns for all 6 cases, with a few cases exhibiting large oscillation due to some weak binding between certain residues or structural rearrangement (Fig. 7C). Lastly, BSA for each structure was stable for all cases except (3NBZ) ClusPro #7, which increased throughout the 50 ns (Fig. 7D). Interestingly, the interaction energy changed by a non-trivial amount when including interaction between RanGTP and DDX3 (NES-DDX3 interaction was negligible) for three cases: (3NBZ) GRAMM-X #8 (372 kcal/mol), (3NBZ) ClusPro #7 (230 kcal/mol), and (3NBZ) ClusPro #10 (85 kcal/mol). This RanGTP-DDX3 interaction over the 50 ns simulation is shown for each case in Fig. 7B. Note that DDX3 binds on the back of CRM1 where there is an opening allowing for interaction with RanGTP for these three cases.
CAS was also performed after the 50 ns simulations for CRM1 and DDX3 (Table 2). In each case, the number of warm/hot residues decreased as compared to the previous 10 ns simulations. This is the result of placing DDX3 from a much greater distance from CRM1 as compared to its location after docking. Generally, residues that appeared in the extended simulations were also present in the 10 ns simulations. Notably, however, (3NBZ) GRAMM-X #8 only had one critical residue on CRM1. This is in line with the data from Table 3 indicating it had among the lowest energetic values and therefore weak binding. Again, its binding was substantially improved through an interaction between RanGTP and DDX3.
The strongest and most stable binding mode of the 6 docked structures based on Tables 2 and 3 is (3NBZ) ClusPro #7. The binding modes of (3NBZ) ClusPro #6 and #10 are very similar to #7, but the DDX3 in #7 is situated in a slightly different orientation such that it has a strong interaction with RanGTP. Data in Table 2 shows individual warm (and hot) interfacial residues according to CAS analysis. Also, highly conserved residues and the ones generally predicted as general hot spots on both sides are annotated in the table. The list of interfacial Interaction of RNA Helicase DDX3 with HIV-1 Rev-CRM1-RanGTP Complex residues are provided after both post-docking MD and extended MD (after detachment). It is natural that the second list shrinks notably from the original one.
Each salt bridge pair for all top 6 docked structures is listed in Table 4. Both (3NBZ) ClusPro #6 and #7 formed 17 salt bridges between CRM1 and DDX3 throughout the time evolution of the 50 ns simulation, but #7 also formed the most salt bridges between DDX3 and RanGTP with a total of 6. It shows the importance of DDX3-RanGTP binding in some of the most promising complex formation modes. The formed salt bridges are shown in Fig. 8 for (3NBZ) ClusPro #7 as the top candidate. Also, the bridge distances are plotted throughout the bold face indicates the hot residues (i.e., ΔG mut!Ala >2 kcal/mol) * Indicates the residues predicted by SPPIDER. † Indicates the highly conserved residues based on ConSurf data.
doi:10.1371/journal.pone.0112969.t002 Table 3. Calculated structural and energetic quantities for the top 6 docked structures with rankings after extended MD simulation.     Table 4 for specific salt-bridge pairs. doi:10.1371/journal.pone.0112969.g008 Interaction of RNA Helicase DDX3 with HIV-1 Rev-CRM1-RanGTP Complex binding occurs far from Rev binding cleft at CRM1 terminal domains (see S9 Fig.). This mode is quite similar to the top 3GB8 candidate obtained from our hybrid protocol. The index of hot interfacial residues based on CAS is listed in S2 Table. Although the FMD method offers a powerful framework for examining multi-domain proteins, obtaining promising results is not always an easy procedure. This may be more relevant in cases with missing residues in the linker region.

Discussion
Nucleocytoplasmic traffic across the nuclear envelop is regulated by the nuclear pore complex (NPC) [60]. The human immunodeficiency virus (HIV) exploits the nucleocytoplasmic pathway to export its RNA transcripts across the NPC to the cytoplasm. The HIV relies on a single host protein, CRM1, to export its unspliced and partially spliced RNA transcripts. Recent findings have implicated a DEAD-box helicase, DDX3, in HIV replication and a member of the export complex, making it an appealing target for anti-HIV drug inhibition. Computational modeling approaches have offered a powerful platform for predicting the quantitative biology of nucleocytoplasmic transport processes [61][62][63][64][65]. In the present work, we used a hybrid computational protocol consisting of preliminary docking, MD equilibration, approximate free energy calculation, clustering and computational alanine scanning accompanied by evolutionary analysis and protein-protein binding prediction to investigate the binding of DDX3 to CRM1 in the context of HIV-1 Rev-mediated nuclear export of partially spliced and unspliced HIV-1 RNA. While it has yet to be tested, our hybrid computational protocol holds promise as a general approach to discover binding modes for protein-protein complexes for which structural experimental data are missing. Future work will need to validate this protocol by using a range of protein-protein complexes whose binding modes have been previously determined.
We used multiple docking methods and gathered the obtained results in a single pool of candidates. We then post-processed and refined the results using MD simulations. While most docking tools use some criteria obtained from a single sample point, MD trajectories enable us to achieve multi-point sampling as well as conformational equilibration. MM/GBSA, structural stability, interaction energy and BSA were used to re-rank different complexes. Our results showed a substantial change in the rankings after MD equilibration. Then, the docked complexes were clustered based on the proximity of the ligands. The strongest members of the most populated clusters were selected as the representatives of the whole pool of candidates. Top-ranked complexes were again equilibrated after deliberate separation of ligand from the host to examine the binding reconstruction and stability. Included in our system are Rev-NES and RanGTP. Using three high performance docking approach (ClusPro2.0, FireDock and GRAMM-X), for each of the two types of host (w/ and w/o RanGTP from 3NBZ and 3GB8 respectively) the best 10 candidates were selected from each docking server and were merged together in a pool of 60. While DDX3 molecules spread all around CRM1 without RanGTP, addition of RanGTP lead to higher accumulation of ligands on the back side of CRM1 (opposite side of the RanGTP binding site). This attractive area is also in proximity of NES for DDX3, which would place DDX3 in a favorable position for interaction with Rev and HIV-1 RNA. After performing MD equilibration, new rankings were obtained based on approximate free energy and structural stability. According to the relative strength of all 60 candidates, ClusPro2.0 produced the strongest complexes in all cases, both with and without RanGTP. Considering all docking tools together, MM/GBSA values are not meaningfully different between the two cases (with and without RanGTP). However Clu-sPro2.0 candidates have higher MM/GBSA with RanGTP. Also, interaction energy, RMSD and BSA showed their best values among ClusPro2.0 results in the presence of RanGTP. Hence, the most favorable cases were achieved by ClusPro2.0 and in the presence of RanGTP. DDX3 binding sites without RanGTP seem to be distinct from the host with RanGTP. It could be that DDX3 can bind without RanGTP erroneously; nonetheless, this structure may not leave the nucleus. Instead, the correct binding spot for DDX3 (correct as in capable of allowing export) may be present once RanGTP has bound to CRM1.
Computational alanine scanning of interfaces in all binding candidates helped us determine the distribution of hot interfacial residues collected from the post-processed docking analysis. The CAS results corroborated the observation mentioned about the DDX3 binding mode distribution all around CRM1 in different cases. In addition, conservation analysis and potential hot residue prediction were performed to accompany the main results. However, comparing CAS hot residue distribution with the outcome of evolutionary analysis, we could not see a strong correlation and agreement between them. The only region that appeared to be hot in all three distributions is the region on the back side of CRM1 neighboring the NES binding cleft (Figs. 4 and 5). Indeed, any region closer to the cargo-binding site is essential for CRM1's role and must be the most conserved and active domain.
Clustering of the docking solutions helped us find the most attractive regions for accumulation of binding mode candidates. Together with the strongest binding modes, 6 top candidates were selected, wherein 5 had RanGTP in the host structure. Four DDX3 molecules were bound to CRM1 on the site opposite RanGTP, with some of these DDX3 molecules being able to interact with RanGTP through the CRM1 central hole. One DDX3 molecule sits on the outer rim close to NES binding cleft and the last one adheres to the terminal domains. After a 10Å separation for the 50ns MD simulation, the top 6 candidates could again establish stable, strong binding. For the best 2 in the top 6 ((3NBZ) ClusPro #7 and (3NBZ) GRAMMX #8), it was observed that interaction with RanGTP played an important role in stabilizing the binding. In agreement with previous research [13], DDX3 and CRM1 can bind stably in the absence of RanGTP, but the addition of RanGTP leads to more robust interactions. As previously mentioned, the most attractive binding area for DDX3 was the back side of CRM1, close to NES binding site. Of the docked structures that exhibited binding of DDX3 in this region, the strongest binding modes also included interaction between DDX3 and RanGTP. Some of these binding modes, specifically (3NBZ) ClusPro #7 and (3NBZ) GRAMMX #8, became the strongest modes because they exhibited salt bridge formation between DDX3 and RanGTP.
The order of events occurring during the formation and transport of HIV mRNA export complex is not fully understood. Our results suggest that the binding of DDX3 is stronger and more directed toward a specific region in the presence of RanGTP. Given that CRM1 does not leave the nucleus without bound RanGTP, these data could suggest that DDX3 binds to CRM1 in the nucleus and is kept bound throughout the export process to be utilized as an RNA helicase down the road.
On the other hand, DDX3 has only been shown to be involved in CRM1-mediated nuclear export during HIV-1 infection. HIV-1 Rev may form some transitory interaction with DDX3 and deposit it in the vicinity of CRM1. In other words, the interaction of DDX3 with CRM1 cannot rely on free diffusion, but if it can be held in the vicinity of CRM1 for sufficient time to sample different conformations, then stable binding with CRM1 can be achieved long enough to last throughout the transport process. In an alternative scenario, when RanGTP is present, the affinity of CRM1 toward Rev-NES is increased, and if DDX3 is bound to Rev, DDX3 can be re-localized within the vicinity of the NES-binding site of CRM1. This may explain why in the presence of RanGTP-DDX3 binding sites are so focused instead of spread all over CRM1.  Table. List of interfacial warm and hot residues (ΔG mut!Ala > 1kcal/mol) for top FMD binding mode obtained from computational alanine scanning. bold face indicates the hot residues (i.e., ΔG mut!Ala >2 kcal/mol). (DOCX)