New Binding Site Conformations of the Dengue Virus NS3 Protease Accessed by Molecular Dynamics Simulation

Dengue fever is caused by four distinct serotypes of the dengue virus (DENV1-4), and is estimated to affect over 500 million people every year. Presently, there are no vaccines or antiviral treatments for this disease. Among the possible targets to fight dengue fever is the viral NS3 protease (NS3PRO), which is in part responsible for viral processing and replication. It is now widely recognized that virtual screening campaigns should consider the flexibility of target protein by using multiple active conformational states. The flexibility of the DENV NS3PRO could explain the relatively low success of previous virtual screening studies. In this first work, we explore the DENV NS3PRO conformational states obtained from molecular dynamics (MD) simulations to take into account protease flexibility during the virtual screening/docking process. To do so, we built a full NS3PRO model by multiple template homology modeling. The model comprised the NS2B cofactor (essential to the NS3PRO activation), a glycine flexible link and the proteolytic domain. MD simulations had the purpose to sample, as closely as possible, the ligand binding site conformational landscape prior to inhibitor binding. The obtained conformational MD sample was clustered into four families that, together with principal component analysis of the trajectory, demonstrated protein flexibility. These results allowed the description of multiple binding modes for the Bz-Nle-Lys–Arg–Arg-H inhibitor, as verified by binding plots and pair interaction analysis. This study allowed us to tackle protein flexibility in our virtual screening campaign against the dengue virus NS3 protease.


Introduction
Dengue fever (DF) is an infectious disease caused by four distinct serotypes of Dengue virus (DENV1-4) transmitted by Aedes spp. Milder manifestations of the disease may include fever, rash, headaches, joint and muscle pain, fatigue and vomiting. Re-infection by different serotypes, however, may cause much more significant clinical conditions, like Dengue Hemorrhagic Fever (DHF) and Dengue Shock Syndrome (DSS) [1,2] which can cause death.
DF is estimated to affect over 500 million people every year [3] and has recently been ranked as the most common cause of febrile illness in travelers, surpassing malaria and gastrointestinal infections [4]. Together with the ongoing expansion of mosquito habitats [5] either due to the recent climate changes and to the urbanization of developing countries [6], this fact has drawn the attention of sanitary and health centers around the globe. Two autochthonous cases in Europe [7] and recent outbreaks in southern USA [8] have shown that dengue is no longer exclusively a problem for tropical developing countries. Despite its high incidence, severity and economic burden, there are currently no antiviral treatments nor vaccines for DF. The development of an efficient anti-DF vaccine faces the challenge to provide protection for all four serotypes at once [9], otherwise it may render immunized individuals more susceptible to DHF [10]. Regarding the design of antiviral drugs, viral proteases are often proposed as potential therapeutic targets due to their essential task of processing viral polyproteins into their functional unities [11]. Concerning Dengue virus (DENV) and its close relative West Nile virus (WNV), this role is assigned to the multi-domain nonstructural protein 3 (NS 3). NS3 is composed by a protease (NS3 PRO ) and a helicase (NS3 HEL ) domain, with the former being responsible for processing the polyprotein in specific sites ( Figure 1, adapted from Umareddy et al, 2007 [12]). The NS3 PRO domain (EC 3.4.21.91) belongs to the S7 family of serine proteases, and needs a cofactor, the hydrophilic loop from NS2B (NS2B CF ) in the case of DENV and WNV, to become fully active [13,14].
This protease has already been recognized as a valuable target for the design of new antiviral inhibitors against Dengue virus [15,16], and, therefore, designing potent inhibitors against DENV NS3 PRO is an active research line in the fight against DF [17][18][19][20][21][22]. The use of computational drug-design approaches would be useful here to improve the discovery of putative hits and to help obtain new leads [23][24][25][26][27][28][29][30][31]. However, previous virtual screening campaigns have fallen short in the identification of new inhibitors, since none was able to find small organic compounds in the submicromolar inhibitory range. It is now widely accepted that protein flexibility is an important factor to be taken into account to ensure the success of virtual screening campaigns [32]. The flexibility of DENV NS3 protease is evidenced in current crystallographic structures by the lack of atomic coordinates for many residues, the difficulty in resolving an inhibitor-enzyme structure, the variable positioning of the cofactor in respect to the protease, and those of several loops [33][34][35][36]. This scenario could explain the relatively low success of previous virtual screening attempts against this target. In fact, the flexibility of DENV NS2B/NS3 protease has already been proposed, but not proved, to explain the poor results of current drug design campaigns [21].
To address this issue, we performed molecular dynamics (MD) simulations towards an ensemble docking campaign. Ensemble docking is now widely recognized as an efficient strategy for incorporating protein flexibility in virtual screening campaigns [37][38][39], especially when the ensembles are extracted from molecular dynamics simulations [40,41]. MD simulations of DENV NS2B/NS3 PRO described in the present work intended to sample as closely as possible the ligand binding region landscape prior to inhibitor binding. The obtained conformational MD sample was therefore clustered into families diverging at least 4.5 Å apart. Thus, a limited protein conformational set, shown to contain structures sufficiently different at their binding sites ( Figure S1), was chosen to be employed in a subsequent ensemble docking simulation using focused chemical libraries.

Homology Modeling
We obtained most of our results before the availability of the 3D structure of the Dengue virus 3 NS2B/NS3 PRO in complex with its inhibitors [36]. However, as all of the accessible crystallographic structures, including the novel ones, presented missing residues, we needed to model their related segments. In addition to that, we considered three main factors during the construction of our 3D model. Firstly, it had to relate to the fulllength sequence of the expression construct we have (DENV 1 NS2B-Gly [4] -Ser-Gly [4] -NS3 PRO ). Secondly, it had to represent the catalytically active "closed" conformation. Lastly, it needed to be able to take into account the possible influence of N-and C-terminus domains in the NS3 protease active site conformational changes and/or accessibility. This led us to build a 3D model that included the hydrophilic NS2B cofactor region, a glycine flexible link, and the NS3 proteolytic domain in complex with the NDL inhibitor.
To obtain this DENV NS2B CF -Gly-NS3 PRO model with as much structural information as possible, we performed an alignment among all Dengue virus and West Nile virus protease 3D structures available in the protein database (PDB) at the time (Table 1). This allowed us to check the completeness of the PDB structures compared to the sequences deposited by the same authors. From this analysis, we were able to gather information to build three different models of our system: Two preliminary models based either The polyprotein is composed of three structural subunits: capsid (C), precursor of membrane protein (PrM) and envelope (E), as well as seven nonstructural (NS) subunits: NS1, NS2A, NS2B, NS3, NS4A, NS4B and NS5. The NS3 is a multifunctional protein composed of a helicase domain (NS3 HEL , see detailed box) and a protease (NS3 PRO , also observable in the detailed box) domain, which in turn needs the hydrophilic loop of NS2B (NS2B CF , marked in green) as a cofactor to be fully active. Red arrows indicate sites processed by the viral NS2B/NS3 protease. Based on Figure 1 from Umareddy et al., 2007 [12], for illustrative purposes only. For each model, we prepared the PDB templates by removing unused chains, ligands, water and ions. The cleaned structures were aligned using the Discovery Studio Visualizer 3.1 (Accelrys, Inc) based on the sequence alignment of their conserved residues. Prior to submission to the homology software MODELLER [42], the sequence alignment obtained previously was edited to ensure certain features would be found in the resulting model, i.e., the N-terminus region (including NS2B CF ) from WNV structures and C-terminus from DENV structures in the final model. For each set of templates (WNV only, DENV only or WNV/DENV), five homology models were generated, each with three loop refinements, giving us 45 models. The three lowest energy models (based on each set of templates) were chosen and verified by PROCHECK [43] to ensure their structural quality. We next visually compared these models and retained the one with features from both WNV and DENV structures, as it contained less disordered coils and loops.
Finally, we transferred the atomic coordinates of the NS2B cofactor and those of the NDL inhibitor (Bz-Nle-Lys-Arg-Arg-H) from the 2FP7 PDB template into the final NS3 PRO model as obtained above. After arranging the cofactor in place, we built the glycine linker with the homology module of the INSIGHT II package (Accelrys Inc). The inhibitor was deliberately left covalently unbound to simulate conformational changes in the active site prior the formation of the protease/substrate intermediate. Using the AutoPSF Generation Plugin provided with VMD [44], missing hydrogens were added according to their predicted protonation state at pH 7.0 and histidine residues were assigned to their δ-protonated state.

Molecular Dynamics Simulations
The final model of the complex between the NDL ligand and the NS2B CF -Gly-NS3 PRO target obtained above was submitted to MD simulations. The complex was firstly solvated with TIP3P WNV 3E90 NS2B CF -Gly-NS3 PRO + NKK [69] a. Instead of using the full hydrophilic loop from NS2B, a short sequence of 18 residues was used, only to provide solubility to the expressed complex.
b. The construct has a 10 residues deletion in the N-terminus of NS3 PRO , in order to facilitate crystallization.
NDL: Bz-Nle-Lys-Arg-Arg-H; BPTI: bovine pancreatic trypsin inhibitor; NKK: Naph-Lys-Lys-Arg-H explicit water molecules in a box with at least 15 Å apart from any point in the protein, ending up with a box of 80 x 100 x 85 Å. Eight Na + ions were added to ensure electrostatic neutrality. Missing parameters for the ligand were added using the CHARMM22 force field [45]. The NAMD software [46] was used to simulate the dynamic behavior of the complex using the parm.prm parameter file from CHARMM22 in periodic boundary conditions. Long-range electrostatics was evaluated using the particle-mesh Ewald approach [47]. Simulations were carried out in the NPT ensemble using Langevin dynamics and piston to fix temperatures (300 K) and pressure (1 atm). Hydrogen-heavy atom bonds were constrained to their equilibrium values with the SHAKE algorithm [48].
The system was first energy minimized (6400 conjugate gradient steps) and then equilibrated (500 ps) before recording the trajectories. All MD trajectory frames were recorded at 1 ps intervals, for a total of 30 ns simulation.

Clustering
Once the MD simulations were complete, all frames were aligned by taking into account only heavy atoms in the core region of the NS3 protease alone (residue range from 20 to 168). This was done in order to focus the protein conformational analysis on the binding site region only. The RMSDs were calculated by using the appropriate VMD plugin and the corresponding data were exported as an ASCII matrix by the VMD Multiplot module. For clustering, a previously developed in-house Tcl script was used with a cutoff of 4.5 Å. This script builds a 2D RMSD matrix for identifying conformational families within a given RMSD range.

Pocket detection and volume analysis
Pockets were detected using METAPOCKET [49], an algorithm using several reliable pocket detection tools to combine their results and improve sensibility. Changes in pocket volume and surface (pocket descriptors) were monitored with MD Pocket [50], a software capable of analyzing topological changes in cavities during MD simulations.

Analysis software
All molecular representations as well as several other analyses (such as distances between catalytic residues through the trajectory and secondary structure maintenance over time, for example) were performed by VMD. All graphics (RMSD, pair interactions, mobility plots, pocket volume evolution, etc) were plotted using GraphPad Prism 5 (GraphPad Software, Inc). Two-dimensional binding plots from docking results were generated with Discovery Studio Visualizer 3.1 (Accelrys, Inc).

Building the 3D model PDB Templates Sequence alignment.
To build a homology model based on the available homologous PDB structures, we first performed a manual sequence alignment analysis of these templates ( Figure 2). This alignment shows highly conserved residues between the DENV and WNV NS3 PRO proteins, with an overall identity of 40.6% (64.2% similarity). One block of 5 residues, numbered 59 to 63, seems to vary according to the DENV serotypes. A lack of structural information at C-and N-terminal regions was found, probably due to their high flexibility, and hence little information from those residues could be utilized for molecular modeling. NS3 PRO N-terminus data were more likely to be found in West Nile virus structures ( Figure 2, blue box), while those of Cterminus were better provided by full (with both protease and helicase domains) NS3 Dengue virus structures ( Figure 2, red box).

Templates structural alignment.
Regarding the PDB structures, while much has been said about different conformations related to distinct serotypes [35] and about the influence of ligands and co-factors [33], the protein core structure is well conserved even when compared between DENV and WNV ( Figure 3A). The RMSD values for each template main chain atoms compared to the reference protein 2FOM are given in table 2, showing that active site residues position is similar among them, irrespectively of whether the cofactor was in the open or closed conformation.
The situation is different for the C-terminal turn ( Figure 3A, residues 151 to 167, marked in shades of red, bottom left corner) that adopted several spatial conformations among This alignment allowed us to verify the completeness of each deposed structure, as well as to fill structural gaps, direct N-(blue box) and C-terminus (red box) regions and the cofactor itself. Green triangles indicate the catalytic triad (HIS 51 , ASP 75 and SER 135 ); sequences are numbered accordingly to the full constructs and shaded based on sequence similarities (black for identical, dark gray for strongly similar and soft gray for weakly similar residues). doi: 10.1371/journal.pone.0072402.g002 DENV structures, but not among WNV, indicating that this conformation may be stabilized by ligand binding.
Homology Modeling. Our strategy was initially to build a robust model of the NS3 PRO domain and next to add the NS2B CF , the ligand and the linker pieces to achieve the active conformation of the NS2B CF -Gly-NS3 PRO model, which was later used in the MD simulations. For that purpose, we first built two preliminary NS3 PRO models based exclusively either on the DENV or on the WNV structures. Besides belonging to different species, the main difference observed was the absence (DENV) or the presence (WNV) of ligands. Among the five models generated by MODELLER from each template set, those with the lowest energies were analyzed. Overall, the models did not differ much at the protein core, whereas, given the divergence on the amount of information between templates, N-and C-terminus were quite distinct. For the model based on the DENV structures, the C-terminal end was arranged more concisely in an alpha-helix shape, which was not observed in the model based on WNV NS3 PRO . On the other hand, the WNV-based model had its N-terminal end projected into a beta strand that contributes to one of the betasheets found in the protein (data not shown).
The superposition of these two 3D models based exclusively on the WNV or DENV NS3 PRO provided different features that were considered to build a third NS3 PRO model. In this last model, the protein core was modeled as the conserved consensus of all template structures, while the N-terminal region came from WNV proteases and the C-terminal end from the DENV proteases. After visually comparing the structures of the three models, we finally retained the third one, which contained less disordered coils and loops, to be used in the next building steps to obtain the final NS2B CF -Gly-NS3 PRO model.
After being verified by PROCHECK, the complete NS2B CF -Gly-NS3 PRO model was found to be of acceptable accuracy (over 96% residues in allowed regions) and was chosen as definitive to carry on further studies (File S1). Figure 3B shows the structural alignment of the finished homology model with the newly available ligand-bound DENV structure (PDB id: 3U1I) [36]. The alignment has a C α atom RMSD of 0.9 Å for 195 atoms, which gives consistency to our built model. Additionally, as pointed by Knehans et al. [24], the oxyanion hole was also present in our model ( Figure 3B, boxed detail), even though we did not used restraints during homology modeling. Those features, together with our model completeness, including both inhibitor and cofactor in the active conformations, as well as the gap-filling loop grafts directed by  . [36]. In (A), residues were colored based on their Qres factor, obtained after a STAMP structure alignment performed with the VMD Multiseq plugin [68]. Color ranges from dark blue (highly conserved positions) to red (not conserved at all). In (B), crystallographic structure was colored with gray (NS3 PRO ) and red (NS2B CF ); homology model was colored in lighter shades -white (NS3 PRO ) and pink (NS2B CF ). In addition to the highly conserved active site residues position, the oxyanion hole was also preserved in this model (boxed detail).

Protein model overall stability and flexibility
Molecular dynamics.
The timeline analysis of the secondary structure pieces during the 30 ns of MD simulations (Figure 4) showed that protein domains were stable during the trajectory, especially concerning extended conformations (in yellow). Turns and coils (teal and white, respectively), including the polyglycine linker (NS2B residues 95 to 104), constantly alternated between those two types of conformations.
After the MD simulation run, both the ligand and the cofactor remained closely bound to the NS3 PRO domain ( Figure S2). Each frame collected from the trajectory file was then aligned to the first one based on their heavy atoms. RMSD analysis shows a 4-stages curve, with an increasing value in the first 2 ns, after which the model stabilizes for the next 5 ns, followed by a new increase and final stabilization after 15 ns ( Figure 5A, NS3 PRO F). However, this stabilization is achieved soon after equilibration (before 1 ns of simulation), and with much lower RMSD values (average RMSD of 2.2 versus 5.97 Å), when we analyze the protein core alone (residues 20 to 168, Figure 5A, NS3 PRO C). As the secondary structure modifications during MD simulations could not account for such a high deviation when analyzing the whole model, we supposed that this was rather due to domain flexibility, which in turn could regulate the access to the binding site.
As it would be impossible to explore each frame separately in vHTS studies, we decided to use a two-dimensional plot clustering technique capable of investigating the structural relatedness of different parts of a MD trajectory ( Figure 5B). By doing so, 4 distinct conformational families were detected when analyzing the whole sample, but only two (and with a high degree of relationship) could be detected while analyzing the protein core alone. By choosing the most conserved spots in the two-dimensional RMSD plot, representatives of each conformational family (snapshots corresponding to frames 200, 400, 600 and 800, Figure 5C) were selected for conformational analysis of the binding site. As the last cluster (frames 800 to 1000) contained several conformations related to the previous one (frames 400 to 800), we picked up the frame 800, instead of 900, as an averaged representative of the last family of conformations.
Principal component analysis (PCA). To verify which motions could account for RMSD variations, we finally performed a principal component analysis (PCA) from the MD trajectory. Results for the first 5 eigenvectors confirm that the greatest motions were contained within the first two eigenvectors and they indeed came from the NS3 PRO Cterminal end ( Figure 6). This region has been described as highly flexible, as noticed by differences in the two crystallographic structures for the full (both proteolytic and helicase domains) NS3 construct [34,51,52]. However, this fact does not exclude the possible existence of small but significant motions that may be important for binding site plasticity ( Figure   6B). To investigate further this question, we also performed 2D ligand-receptor interaction plots.

Protein binding site and binding modes flexibilities
Binding pocket detection and evolution. The DENV protease has a very shallow pocket [33,53], a fact that led us to employ the METAPOCKET servers. This server utilizes several redundant already validated geometric algorithms to search for proteins cavities. Thus, where one algorithm may fail, another one can correct the results, providing accurate topological information. This chosen pocket detection strategy showed to be in accordance with experimental data, with almost 70% (15/22, Table 3) accuracy. Both S1 and S4 sub-pockets were fully detected, with less favorable results for sub-pockets S2 and S3 with 37.5 and 33.3% accuracy, respectively. It is noteworthy to say that this method was capable of detecting several residues not present in the experimental data. Those residues may constitute useful sub-pockets for designing new competitive molecules.
Following detection, the protease pocket coordinates ( Figure  7A) were subjected to the MD pocket algorithm to be monitored for volume changes. As in many proteases [54][55][56][57], the pocket volume presented a 'breathing' behavior, adopting a sinusoidal curve when plotted against time ( Figure 7B). This suggested that, even if the protein core remained stable during the same simulation, the binding site accessibility seemed to change, probably due to allosteric modulation provided by the more flexible domains.
Protein/ligand interactions evolution. Ligplots of the protein/ligand binding modes (Figure 8) showed changes in both the protein sub-pockets conformation and in ligand interactions, indicating that the binding site is definitely not static and that this plasticity could be explored for competitive structure based drug design.
Noteworthy was the fact that the residues from the S2 subpocket were represented only by THR 83 in the conformations 2-4, but this could not be depicted in the conformation 1. In this case, the protein relaxation promoted by the MD simulation may have exposed this residue so it could be detected by this method. The residues from the S3 pocket (MET 84 , LYS 85 and ILE 86 ) showed the same pattern, all absent in conformer 1 but with variable incidence in the other conformations. As for residue interactions, some were more conserved than others. In all conformations, residues GLY 133 and GLY 153 were involved with hydrogen bonds between their main chain atoms and the C-terminal oxygen from P1 and the backbone oxygen from P2, respectively. On the other hand, π-interactions seemed to need a more constrained conformation, as this was observed only with the aromatic residues TYR 161 (with both P1 and P3) and HIS 51 (with P2) of the first conformation. Hydrogen bonds coming from SER 135 , GLY 151 and THR 134 were less conserved, but also related to the C-terminal oxygen from P1. Interactions between P3 and S3 were observed only in conformations 2 and 4; side-chain interactions from P1 were observed only in conformation 1 and 4.
To investigate further the plasticity of the DENV NS2B CF NS3 PRO binding site and the possible consequences on ligand binding, we analyzed the protein/ligand interactions during the whole MD trajectory. For that purpose, we calculated the pair interaction energies between the ligand NDL and the protein ( Figure 9A, black line), the ligand and surrounding water ( Figure 9A, gray line) and for the ligand with each protein residue ( Figure 9B). Contact points between ligand and protein could be mapped by plotting the average total energy (potential plus kinetic) of each residue. Similar to results obtained from the binding plots, the main residues involved in the interactions with the ligand were from the sub-pockets S3 (residues MET 84 , LYS 85 and ILE 86 ), and all the residues from S1 and S4. The greatest interaction averages came from ASP 129 , followed by THR 134 and SER 135 from S1. However, the evaluation of pair interaction forces was more sensitive for detecting residues from the sub-pocket S2, as it was able to recognize residues ASP 75 , LYS 73 and VAL 72 , none of them found within any conformation inspected through the ligand-binding plots. When analyzing the ligand/protein and the ligand/water interactions together ( Figure 9A), we could see that while the energy of the former increases (thus, diminishing ligand-protein attraction), the latter seems to decrease proportionally (enhancing the ligand-water attraction). This is normally expected, as during MD simulations, the protein-ligand complex becomes more relaxed, allowing water to interact with the residues from the binding site. This results in decreased interactions between ligand and protein. It is interesting to note that, despite the attraction between ligand and protein being diminished at the beginning of the simulation, the complex achieves stabilization after 10 ns. In fact, one can say that during the last 5 ns of simulation, the protein-ligand interaction increases again, approaching the initial state.

Discussion
One of the limitations in using crystallographic structures in structure-based drug design is the fact that those structures are often related to a particular energy minima state, which is a single conformation among many possible ones. Although this does not represent a significant problem, it may undermine virtual high-throughput campaigns. To circumvent this issue, one may use a range of strategies for simulating protein flexibility, among which we can highlight the ensemble docking with structures derived from molecular dynamics simulations [38,40,41].
In the last few years, several DENV NS3 PRO automatic docking studies were published based on homology models [23][24][25][26][58][59][60]. In our work, besides investigating protein flexibility, we were careful to build our DENV NS3 PRO model in the predicted active conformational state (including the oxyanion hole), a concern also present in the study by Knehans et al. [24]. However, most of the other studies were based either on the retracted apoenzyme structure [26,59], or in the open holoenzyme conformation [23,25,58] and were based on a rigid protein system with a single conformation. The first homology model for the DENV NS2B/NS3 protease was done by Brinkworth et al. in 1999 [61]. However, as it was built based on the more distantly related hepatitis C virus NS3/ NS4A protease (PDB id 1JXP [62]), several structural differences (Cα RMSD of 11 Å) were observed when comparing it to the recently resolved DENV-3 NS2B/NS3 protease (PDB id 3U1I [35]). Also, this work did not take into account the ligand-binding induced fit mechanism, as it was yet to be proposed [33,63], and thus, ligand was absent from their model. Later, in 2007, Kee et al. [64] used this same model for automatic docking studies. In 2004 and 2005, respectively, Niyomrattanakit et al. [65] and Chanprapaph et al. [60] built homology models based on the now retracted apo DENV NS3 PRO structure. Similarly, Ganesh et al., in 2005 [59], and Tomlinson et al., in 2009 [26], used homology models based on the retracted structure for automatic docking studies. For studying the importance of domain motion between the NS3 PRO and NS3 HEL domains through MD simulations (1 ns), Rosales-León et al., in 2007 [66], built a full DENV NS3 model, but also based on this retracted structure. In 2010, two groups [29,67] independently published a similar approach for building active conformational models, where the cofactor and the ligand were extracted from the WNV NS2B CF /NS3 PRO structure (PDB id 2FP7) and fused to the DENV NS3 PRO . However, the oxyanion hole is not in the active conformation in the work by Wichapong and colleagues [67], and could not be accessed in the work by Frecer and Miertus [29] for further comparisons. Also, neither models presented linker regions nor grafted loops.
Regarding protein flexibility, Ekonomiuk et al. [31] also postulated that it would be important to access new conformations for vHTS in the case of the WNV protease. However, instead of performing a long MD simulation, they Table 3. Pocket residues as detected by METAPOCKET compared to those found experimentally.
Detected pockets were identified by METAPOCKET, as described in Methods. * Detected residues which may be important to ligand binding, but were not previously reported in literature. preferred a short run (1 ns) to look for a conformation able to better accommodate a benzene ring later used for fragment docking. Figure 5A shows the importance of performing a longer simulation to our purposes: even if a protein seems stable during shorter trajectories, they can surpass greater conformational changes after longer simulation times, revealing important motions that otherwise would not be detectable. These motions can be related to important conformational states found during protein equilibrium, which in turn could be subjected to vHTS. When comparing the one-and twodimensional RMSD graphs, we can observe that the variations in the former are related to conformational changes in the latter, which are maintained in a clustered family. This means that conformational changes observed in the one-dimensional RMSD plot last for some time, which is extremely desirable for finding new inhibitors that may bind to this protein state. Despite presenting higher conservation during the trajectory, the binding site also presents considerable plasticity that may be important in structure-based drug discovery. In the work by Ekonomiuk et al. [31], backbone changes as low as 0.8 Å between the crystallographic structure and their model extracted from MD simulations were enough to better accommodate three different fragments used in vHTS. In our case, the average binding site heavy atoms deviation was approximately 2 Å, reflecting changes both in pocket volume ( Figure 7B) and in ligand binding modes ( Figure 8).
In an already mentioned work, Wichapong et al. [67] performed 10 ns of MD simulations to examine ligand-protein interactions in three different models. Irrespective of the differences between this work and that of Wichapong and colleagues' (for example: the inhibitor was covalently bound to SER 135 , the absence of the linker region and grafted loops, and the position of the oxyanion hole), our analysis seems to be in accordance with theirs. However, as they did not perform timewise analysis related to different points in the trajectory, changes in the binding modes through the simulation could not be evaluated.
Despite the fact that almost every binding pocket residue found in the literature was also detected in our study, binding modes were sometimes quite distinct. With the recent publication of the ligand-bound structure of the DENV-3 NS3 protease [36], using the same inhibitor used in our study, we were able to compare the protease sub-pockets. While in the crystallographic data GLY 151 interacts with the P2 residue from the inhibitor, in our study it interacts promptly with the P1 residue. Other changes include GLY 153 , part of the S3 subpocket in the crystallographic structure, but conserved in S2 in all conformations from our model; and the residue TYR 161 , which was capable of making π-interactions with both P1 and P3 residues in our studies, while it is described as a S1 subpocket residue in the crystallographic structure. Also, LYS 85 from the cofactor NS2B was able to produce a charge-charge interaction with the P3 LYS from the inhibitor, a fact not observed in the work from Noble et al. [36]. Altogether, those changes in binding mode, either between our models and the crystallographic structure or among different conformations extracted from our MD simulations, indicate that the binding site indeed presents significant flexibility that could be explored in structure-based drug design.

Conclusions
This article describes the analysis of the Dengue virus NS3 protease flexibility through molecular dynamics simulations, as a part of an ensemble docking campaign for identifying potential new leads for drug design. As a result, we were able to clearly distinguish four different conformational states, which are being used in a subsequent virtual high-throughput ensemble docking campaign. By doing so, we hope to discover new promising molecules with therapeutic potential to treat this life-threatening disease. Figure S1.

Supporting Information
The binding site of each representative conformation identified by our clustering strategy. The NDL inhibitor is displayed as a stick model (C, O and N atoms in white, red, and blue, respectively). Residues participating in the binding site are displayed both as thin sticks and as transparent surfaces colored accordingly to the residue name. The plasticity of the binding site environment and in the binding modes is evidenced by the differences observed during the molecular dynamics simulations. (TIF) Figure S2. Snapshot from the last frame of the 30ns trajectory, showing that both the inhibitor and the cofactor remain closely bound to the NS3 protease. The NS2B CF is maintained in the "closed" conformation. NS3 PRO is depicted in gray, NS2B CF (and linker) in red. Active site side-chains are represented as green sticks, and the NDL inhibitor in orange. (TIF) File S1.
PDB file of the final model with all its components: NS3 PRO , NS2B CF , glycine linker and NDL inhibitor. For displaying it in a molecular viewing program (such as VMD), please copy the content of the. doc file, paste it into a new document in a simple text editor (MS Windows notepad, for instance), and save it with a. pdb file extension. (DOC)