F429 Regulation of Tunnels in Cytochrome P450 2B4: A Top Down Study of Multiple Molecular Dynamics Simulations

The root causes of the outcomes of the single-site mutation in enzymes remain by and large not well understood. This is the case of the F429H mutant of the cytochrome P450 (CYP) 2B4 enzyme where the substitution, on the proximal surface of the active site, of a conserved phenylalanine 429 residue with histidine seems to hamper the formation of the active species, Compound I (porphyrin cation radical-Fe(IV) = O, Cpd I) from the ferric hydroperoxo (Fe(III)OOH-, Cpd 0) precursor. Here we report a study based on extensive molecular dynamic (MD) simulations of 4 CYP-2B4 point mutations compared to the WT enzyme, having the goal of better clarifying the importance of the proximal Phe429 residue on CYP 2B4 catalytic properties. To consolidate the huge amount of data coming from five simulations and extract the most distinct structural features of the five species studied we made an extensive use of cluster analysis. The results show that all studied single polymorphisms of F429, with different side chain properties: i) drastically alter the reservoir of conformations accessible by the protein, perturbing global dynamics ii) expose the thiolate group of residue Cys436 to the solvent, altering the electronic properties of Cpd0 and iii) affect the various ingress and egress channels connecting the distal sites with the bulk environment, altering the reversibility of these channels. In particular, it was observed that the wild type enzyme exhibits unique structural features as compared to all mutant species in terms of weak interactions (hydrogen bonds) that generate a completely different dynamical behavior of the complete system. Albeit not conclusive, the current computational investigation sheds some light on the subtle and critical effects that proximal single-site mutations can exert on the functional mechanisms of human microsomal CYPs which should go rather far beyond local structure characterization.


Introduction
Cytochrome P450s (CYPs) form a ubiquitous enzyme family which is directly involved in the oxidation of a wide range of organic compounds including drugs, steroids, and vitamins. [1] The importance of CYPs in nature is witnessed by their presence in all forms of life, from archea to animalia, including mammals. For instance, in the human genome, there are 57 different genes involved in CYP expression. [2] Due to their activity and diversity, [3][4][5] CYPs have attracted massive studies designed to understand the factors of their activity.
The active site of CYP contains a heme group buried inside the protein matrix and linked to a conserved cysteine residue with a Fe-thiolate bond. [6,7] This implies that substrates in the bulk solution can interact with the CYPs active site if and only if they are capable of permeating the functional channels connecting the active site with the protein surface which varies between the isoforms. [8][9][10][11][12][13] In the absence of the substrate, the Fe (III) ion is hexacoordinated with a covalently bound water molecule occupying the sixth-position opposite to the cysteine-thiolate ligand, so-called resting ferric state shown in 1 in Fig 1.[ 14,15] The entrance of the substrate (RH in Fig 1) to the distal region of the active site causes the displacement of the water molecule bound to the heme ion, to generate 2. Furthermore, the entrance of the substrate is generally accompanied by the exclusion of all excess water molecules from the vicinity of the heme group, through exit pathways generally different from the substrate access channel. [16] Such a dehydration of the catalytic site is essential for starting the enzymatic cycle of cytochrome P450 proteins, [17] since this favors the reduction of the ferric species 2 by cyt P450 reductase (CPR, first Electron Transfer, ET) due to increase of the Fe (II) /Fe (III) redox equilibrium towards the  [18][19][20]. Subsequently, O2 binds to the heme iron (4), which then undergoes a second reduction (5) and a protonation to form the Fe (III) OOHintermediate 6, called Cpd 0. The latter undergoes in turn, a second protonation on the distal OH, resulting in heterolytic O-OH bond cleavage, and generation of the iron-oxo porphyrin-cation radical intermediate, Por+• Fe (IV) = O 7, so called Cpd I. [21][22][23][24][25][26] This Por+• Fe (IV) = O species, which is the ultimate oxidant of the enzyme, catalyzes the efficient mono-oxygenation of many different substrates, e.g., 8. [14,24,[27][28][29][30][31][32][33] Alternatively, the protonation of the proximal oxygen atom of the Cpd 0 species is responsible for the release of hydrogen peroxide (H 2 O 2 ), thereby uncoupling the oxygen consumption and substrate oxidation (see central arrow in Fig 1). [34][35][36] The overall coupled process may be then summarized as follows: A detailed understanding of the factors modulating the enormous catalytic power of human cytochrome P450s is therefore of crucial importance. This group of enzymes in not only involved in the efficient hydroxylation of important natural and synthetic compounds; [1,5,[37][38][39] they also catalyze the oxidation of environmental pollutants to which the human body may be exposed, making them more soluble for easier elimination. [40] In the course of analyzing the activity of drug metabolizing CYPs, many enzymes were found to be nonspecific and capable of hydroxylating non-native substrates. [1,[3][4][5]40,41] The fact that drug-metabolizing CYPs are not specific is due to their conformational plasticity: [11] these enzymes feature adaptive structures capable of accommodating different reaction partners while essentially maintaining the same fold in all CYPs. This intriguing aspect has been attributed to the presence of secondary structural elements having relatively high degrees of flexibility. [11,12] It has been shown that substrate and water pathways follow different routes and that water access to the active site is regulated by the interaction of residue R375 with the heme propionate groups CYP-3A4 [13,16], which is able to switch between different conformations, allowing a fine control over the presence of water inside the pocket.
Previously Davydov et. al. [42] had unequivocally showed that, starting from the same amounts of Cpd 0, the Phe429!His mutant of CYP 2B4 exhibits, in the presence of the butylated hydroxytoluene (BHT) substrate, only 4−5% product formation (e.g. hydroxymethyl BHT and 3-hydroxy-tert-butyl BHT) compared with the wild-type (WT, 2B4) form. In a recent communication, Quantum-Mechanical/ Molecular Mechanical (QM/MM) calculations showed an increased propensity of the F429H mutant enzyme to undergo homolysis of the O-OH bond of Cpd 0, rather than the heterolysis exhibited by the WT enzyme. The root cause of this behavior was shown to be the formation of a hydrogen bond (H-bond), connecting the His429 to the Cys436 in the mutant enzyme, which increased the propensity for bond homolysis. [43] The QM/MM calculations followed molecular dynamics (MD) simulations which established the existence of this H-bond between His429 and the thiolate of the cysteine ligand. The results were later confirmed by a spectroscopic investigation of a newly obtained CYP 2B4 F429H crystal structure. [44] In this contribution, we tried to push forward the phase space exploration of CYP-2B4 by extensive MD simulations characterizing the WT enzyme and 4 different mutations of the F429 residue, namely F429A, F429E, F429H and F429L for a total of 0.75 μs of sampling time. Target mutations were selected in order to provide a wide range of hydrophilic/hydrophobic character and side chain bulk. This large scale investigation of the global mechanical effects induced by mutations on CYP 2B4 mechanical properties allowed us to further elucidate the role of F429, by searching for shared, frequently visited and statistically relevant, conformational sub spaces and alteration of specific contacts in all the trajectories. Several unique features of the native enzyme as compared to the 4 mutants have been found, even when F429 is mutated into another bulky hydrophobic residue such as Leu. Moreover, the results show clearly that the functional impact of this point mutation are not limited to the modes of O−OH bond cleavage in the WT and F429H enzymes, [43] but are extended to the whole enzyme dynamics, altering inter domain contacts (salt bridges and hydrogen bonds) and access pathways to the active site. Note that F429H has already been the object of a MD investigation [43] and for this reason we have omitted in this study results that have been already discussed previously. Here, our intention was to (i) to test whether the mutation of Phe has "long range" effects in addition to the local disruption of the active site electronic properties which further explain the selective pressure that make this residue highly conserved and (ii) to further investigate which unique properties of the F429H mutant observed in the experiments were caused by the removal of Phe429 or by the presence of a histidine residue.
The manuscript is organized as follows: after a summary of the methods used to set up, run (section 2.1) and analyze the five trajectories (section 2.2) the whole conformational space sampled in the 750 ns is analyzed in section 3.1 mainly by cluster analysis; an accurate assessment of the frames used to cluster and on the parameters is given in the Supporting Information (section S3.1). Section 3.2 includes covariance and fluctuation analysis of the studied systems. Section 3.3 compares the five trajectories in finer detail using hydrogen bond and salt-bridge statistics and solvent distribution around critical groups (e.g. Cys436) and includes also a closer inspection of the representative structures selected by the collective analysis presented in section 3.1, including solvent pathways. All the results are then discussed together in section 4, while section 5 includes a few concluding remarks.

System setup
Initial coordinates were obtained by the 1SUO [9] crystal structure (at a resolutions of 1.9 Å, see Fig 2A and 2B). At first, the enzyme was modified in order to form the Cpd 0 species, replacing the inhibitor present in the crystal ctructure with a O-OH covalent moiety. [43] The 4 mutations (Ala, Glu, His and Leu) were generated using the Dunbrack rotamer library [45]. In Fig 2 (panels a, b), we report a pictorial view of the WT form using the original crystal (1SUO. pdb) structure, with various domain shown in different colours; Fig 2C shows the active site with the with the mutated residue 429,which is located at the proximal side of the CYP 2B4 active site, and in particular the region adjacent to the cysteine residue as the fifth ligand to heme iron atom. Missing hydrogen atoms were added by MolProbity [46] and protonation states counter-checked with H ++ code [47]. Simulations were carried out with GROMACS, version 4.6.6. [48] Afterward, the entire enzyme was placed at the center of a square box of size 10.76 nm, subsequently filled with SPC [49] water molecules at a density of 1000 kg/m 3 ; Clanions were added to ensure the overall electrical neutrality of each system. The GROMOS 54A7 force field [50] was adopted for modeling the enzymes. Electrostatic interactions were accounted for by means of the Particle Mesh Ewald method [51] using a cutoff of 1.5 nm for the real space and Van der Waals interactions. The LINCS algorithm [52] was used to constrain all bond lengths and angles. Relaxation of solvent molecules and Clanions was initially performed keeping solute atoms restrained to their initial positions with a force constant of 1000 kJ/(mol • nm 2 ), for 3.0 ns in a NPT ensemble at 1 bar using the Parrinello-Rahman barostat [53] and a coupling constant of 1.0 ps; temperature was increased in 50 K steps from 0 to 298.15 K using the velocity rescale method [54] and a coupling constant of 0.1 ps; the integration time step was increased progressively from 0.1 to 1.0 fs. Each system was carried back to 0 K and then heated again to 298.15 K in a NVT ensemble using the same stepwise fashion (for a total of 3.0 ns of further thermalization) without restrains on protein atoms. The five systems were then simulated for 150 ns each in a NVT ensemble with a time step of 2.0 fs (updating the neighbor list every 10 steps). Additional short (10 ns) simulation were also carried out for each system generating new random velocities. In total, 2 short simulations were carried out for each system. Thus, 3 simulations were carried out for each system or 18 simulations in total. The starting coordinates for these trajectories were selected observing the Root Mean Square Deviation graph (see Section 3.1 and S1 Fig).

Analysis of trajectories
Root mean square deviations (RMSD), root mean square fluctuations (RMSF) and principal components analysis (PCA) were carried out using standard GROMACS analysis tools applying usual definitions [54].  Radial distribution functions of species B with respect to species A were calculated as: whereis the particle density of species B at distance r around species A, and hρ B i local is the particle density of type B averaged over all spheres around particle A with radius r max .
2.2.1 Artificial trajectories. For cluster analysis and the search of solvent access pathways frames from the various simulations were selected using the following protocol: a trajectory was divided in equal chunks using a previously determined stride (see below); a frame was then selected randomly from each chunk, excluding those within a cutoff of neighbor chunks equal to 10% of the stride. Thus, a stride of 40 ps mean selecting frames randomly in each chunk, excluding those within 4 ps of neighbor chucks. Strides of 40 ps and 80 ps were selected following Shao et al [55]: in this study it was observed, using several different clustering methods, that frames separated by time spans up to 50 ps are usually grouped in the same cluster and thus a lower stride (i.e. a higher number of frames) does not improve the quality of the analysis. This procedure minimized the correlation between frames due collective motions in the 1-10 picoseconds time scale. After this random sampling the frames obtained were concatenated together to form an "artificial trajectory", always using the following order for systems: WT, F429A, F429E, F429H, F429L. For each system the 2 short simulation were concatenated after the 150 ns one. Note that concatenating the trajectories of the five systems was meant as a way to easily identify and comparing shared conformational basins in the different systems. In this study, whenever frames from multiple simulations have been concatenated and/or analyzed, the resulting trajectory is referred to as an "artificial" trajectory. In particular, 3 frame sets, were created by concatenating the frames from the various simulations. In the first one (hereafter labeled as frame set #1) an average stride of 80 ps was used; the number of frames sampled from each trajectory is given in Section 3.1. This first artificial trajectory was used in order to optimize the number of clusters and/or the cut-off in the pairwise distances in a series of calculation. A second one (hereafter labeled as frame set #2) was created using a stride of 40 ps; this artificial trajectory was used to repeat the clustering, select centroids and perform fluctuation and principal component analysis. A third trajectory (hereafter labeled as frame set #3) was also created, using a stride of 80 and only taking frames from the 150 ns simulations. To avoid the inclusion of configurations from far from equilibrium states in the 150 ns runs, the first part of these simulations was excluded from the generation of artificial trajectories (see Section 3.1). Note that the computational cost for clustering scales as N 2 , where N is the number of configurations used; on the other hand, a larger data set allows to find more representative centroids structures.
2.2.2 Clustering algorithms. Conformations were clustered using Cα and Cβ atoms (thus taking into account side chain orientation) employing a partitioning approach (k-medoids) and the so-called GROMOS method [56] and always using pairwise RMSD as a measure of distance between 2 data points (sampled structures). The k-medoids searches for k representative objects (the medoids) among the observations in a dataset (the MD frames here). After finding a set of k medoids (randomly select at the beginning), k clusters are constructed by assigning each observation to the nearest medoid (using the pairwise RMSD to measure distances between structures); then the cost of each cluster is calculated swapping every non-medoid point with its medoid and selecting the configuration with the minimum cost (calculated as decrease or increase of variance within each cluster) as a new medoid to build new clusters. The procedure is iterated until there is no change in the medoids. In the GROMOS method, [56] after a distance cut-off is set, the structure with the largest number of neighbors (i.e. configurations within the cutoff range) is taken as the centroid of the first cluster and it is eliminated by the pool with all its neighbors; the process is repeated until all structures have been assigned to a cluster or are singletons; the number of clusters is not fixed.
Independently of clustering, centroids of sub-trajectories were also associated to those frames which maximized the sum of the similarity scores between conformation pairs: Àrms ij s where s ij ¼ e Àrms ij is the similarity beween frames i and j, rms ij is the RMSD between i and j and σ is standard deviation of the RMSD matrix.

Clustering internal validation.
The k-medoids method naturally aims at minimizing the internal cluster variance thereby maximizing the variance between clusters. When the optimal number of clusters, n C , is unknown a "scan" is often performed changing the n C parameter and the results obtained are compared by plotting the within-cluster variance (actually the within cluster sum of squares, WSS) as a function of n C . The optimum n C value is such that adding more clusters does not yield significant variation of WSS; this is the so-called "elbow criterion". However, sometimes it may be difficult to detect such an "elbow" and the best values of n C . To further assesses the results of the k-medoids procedure we also applied 2 internal evaluation criteria, namely the Dunn index and the Davies-Bouldin (DBI) index [55]. The Dunn index is defined as: where Δ k is the distance between centroids in this work and δ(C i , C j ) is the maximum intracluster distance. When testing different pre-determined number of clusters (as in k-medoids) a maximum of DI corresponds to the most probable number of clusters. The Davies-Bouldin index is defined as: where N is the number of clusters, d i is the distance of elements from the centroid in cluster i and d ij represents the average inter-cluster distance; DBI measures the maximal value of within-cluster dispersion as compared to inter-cluster distances and has a minimum value when the number of clusters chosen yields the best combination of compact and well-separated clusters.

2.2.4
Analysis of water pathways. The analysis of the water aqueducts was performed by means of the CAVER 3.1 program [57], as done by Cojocaru et al. [8] Note that we define a "tunnel" is a single solvent path from the protein surface to the selected point existing in a given frame (if any); and a "pathway" as a group of homogeneous tunnels, obtained by a cluster analysis of the tunnels found along the trajectory. Such a cluster is not the same as the groups obtained by applying the k-medoids or GROMOS clustering algorithms. Single tunnels are ranked according to their cost, which takes into account the tunnel length and radius along its length and pathways are ranked according to their average throughput (which is simply e -cost and ranges from 0 to 1) on tunnels in a given cluster. We used spherical probes of either 0.7 or 1.3 Å and the coordinates of the Sγ atom in C436 or of the center of mass of the peroxide anion as starting points. Hydrogen bond connectivity along the sampling was analyzed as follows: 2 protein residues were considered to be joined by an H-bond in a given MD snapshot if the Donor-Acceptor distance was less than 3.5 Å and the Donor-Hydrogen-Acceptor angle was 30°. Unless otherwise specified an automatic cut-off to the lifetime equal to 50% of sampling (75 ns) was applied when discussing the existence of hydrogen bonds.
Data analysis was performed using standard GROMACS tools (RMSD matrices and GRO-MOS clustering), the Pycluster library (k-medoids) or in house written software. Graphs have been obtained with the Grace program and images have been created using the VMD [58] and Chimera [59] packages. Two additional short trajectories of 10 ns were started for each system selecting the starting coordinates at 60 and 130 ns and generating new random velocities at 298.15 K. The RMSD calculated as a function of time from the (10) trajectories is superimposed over the original trajectory in S2 Fig; it may be easily observed that each system continues to sample conformations comparable to those in the originating trajectory, confirming the simulations had reached an acceptable equilibrium. Artificial trajectories were created by selecting frames from 30 to 150 ns as well as all the configurations from the 10 ns simulations for each system (equivalent to 140 ns of sampling for each of the 5 species). Using a 80 ps stride yielded a total of 8750 frames while a 40 ps stride resulted in an artificial trajectory formed by 17500 frames; using only frames from the longer simulations from 30 ns onward and a 80 ps stride yielded an artificial trajectory formed by 7500 frames.

Conformational basins of CYP-2B4 and F429 mutations
Following the RMSD calculation, the conformational space sampled by the enzyme in all the simulations was explored my means of clustering methods. The first step involved the application of the k medoids method in order to find if and which of the mutated species shared conformational sub-spaces that allowed to group them together, that is to evaluate if there was an optimal number of clusters between 2 and five. Before the actual clustering was performed the distributions of RMSD distances obtained from frame sets #1, #2 and #3 were compared (including the third one, which did not included frames from the 10 ns simulations); the results are shown in S3 Fig; it is easily observable that the distribution of distances (and thus of conformations) obtained from the 3 artificial trajectories is equivalent. To determine the best n C value the k-medoids algorithm was applied to frame set #1; the analysis of the cluster-wise total variance (see S4 Fig, panel a) indicated 3 clusters as the best solution even if an observable gain is still observed with n C = 4. Calculation of the DI and DBI evaluation criteria (see Table 1) confirms n C = 3 as the best choice. S4 Fig (panel b) illustrates the partition of the studied systems in clusters and shows that the F429A, F429E and F429H mutants were assigned to the same cluster while WT, F429L formed distinct classes. Using n C = 2 yielded one class including bulky hydrophibic side chains (i.e. WT and F429L) and hydrophilic or small ones (F42A, F429H, F429E); with n C = 3 F429L and WT formed distinct classes and finally n C = 4 separated F429H from the Ala and Glu mutations.
On the basis of the k-medoids results we ran a scan over the cut-off distance using the GRO- It is clear that the Ala, Glu and, to a lesser extent, His mutations had a high degree of similarity, the corresponding configurations being located in the biggest cluster, and a relevant conformational freedom since the smaller clusters included mostly configuration from these systems. On the other hand, the Phe, and Leu side chains yielded isolated conformational subspaces with little or no overlap between trajectories. This may be further understood by inspecting S5A Fig, panel b) and same is observed between clusters #4 and #6; clusters #2 and #3 on the other hand showed almost no transitions. In summary, all mutants tend to generate trajectories whose conformational spaces differs from that of the wild type enzyme. F429A and F429E displayed an overlap of their conformational spaces while F429H and F429L sampled a distinct one. A comparison between available experimental crystal structures (i.e. 1SUO [43] and 4MGJ [44]) and the centroids of WT (found at 123,2 ns) and F429H (found at 60.68 ns) was made; the two conformation generated by MD simulations had a RMSD distance of 0.0073 nm while their distance from the corresponding experimental structure was 0.0023 (1SUO-WT) and 0.064 (4MGJ-F429H). The distance between centroids is greater than that with the experimental structures showing that the simulation were at least partially able to capture the diversity between the WT and the His mutation; note that the 4MGJ structure has some missing loops. A superimposition of these structures is shown in S6 Fig. A principal component analysis of the frame set #2 was performed in order to correlate how much the separation of enzymes in distinct classes could be related to high amplitude motions in the simulations. Fig 4 shows a projection of the clusters over the plane spanned by the eigenvectors (accounting for 75% of total variance). Indeed, conformations pertaining to the systems   studied formed conformational basins in this volume which matched almost perfectly the separations obtained by the clustering procedure: WT and F429L were separated by other mutations along the first eigenvector and between themselves along the second one. F429A and F429E were joined a continuous region and were connected (but distinct) to F429H along the third eigenvector.

Structural Fluctuations in clusters
Following the investigation of global structural changes induced by the 4 point mutations with respect to the active site we proceeded with investigating the possible role of single residues in generating these differences, by comparing the Root Mean Square Fluctuation (RMSF) Cα atoms. We have compared in Fig 5 the RMSF calculated obtained from the frames of clusters #1, #2 and #3 (13940 frames in total) trajectories (panel a) with the RMSF difference obtained by directly subtracting the RMSF yielded by the wild-type enzyme to those obtained from the 4 mutants trajectories (panel b) (using configurations from frame set #2). Three regions, centered around M137, K225 and G418 showed conserved high fluctuations in all clusters as well as in the original simulations. A number of relevant differences in the fluctuation patterns could be observed; for instance, the first cluster showed larger fluctuations around residues A92, S277, R343 and K433. In general, the region 430-435 near the C436 and the hem distal side features  high fluctuations in the Ala/Glu/His mutants as compared to the WT enzyme or F429L. Inspection of panel b shows that in general F429A and F429E showed the greatest variation as compared to WT while F429L produced the most similar fluctuations. It can also be seen that the fluctuation in A92 is mostly due to configurations from the F429E and F429H trajectories while that of S277 and R343 to F429A and F429E; peaks associated to E424 were observable in the WT and F429L trajectories (and thus in clusters #2 and #3).
Fluctuation patterns observed in the clusters have been mapped on their centroid structures in Fig 6. It is worth to observe that residue S277 is located at the hinge between helices H and I and that K433 is located in loop before helix L.

Water pathways regulation by H bonds
3.3.1 Proxymal side hydration. Fig 7 shows the radial distribution functions (g(r)) of water molecules around the Sγ atom of the catalytic residue Cys436. A sharp separation of trajectories in classes is observable: the wild-type enzyme shows the lowest penetration of H 2 O towards the S atom followed by F429H and F429L and then F429A and F429E; using a cut-off of 7 Å this corresponds to a number of water molecules in the neighbourhood of C436 equal to 4 (WT), 7 (F429A and F429E), or 5 (F429H and F429L) water molecules, respectively.
Interestingly, the Leu and His mutants display a similar pattern, apparently at odds with the results of the cluster analysis; however, the pca (Fig 4B) showed that the separation between F429H and F429L share is less as compared to WT, particularly along eigenvectors 2 and 3. This behavior can be explained by looking at representative structures from the 3 clusters. Representative structures were obtained from the three clusters by finding new centroids using the coordinates of the complete protein and the frames forming clusters #1, #2 and #3 (see Section 3.1). The centroid for cluster #1 was found at 36.68 ns in the F429E trajectory; the centroids of clusters #2 and #3 were found at 119.39 and 123.20 ns, in the F429L and WT trajectories, respectively. Inspection of these representative structures by means of the CAVER software showed that in cluster #1 a single partially open pathway (see Table 2) was observed whose bottleneck residues were A/E429, C436 and L437. The orientation of the tunnel with respect to the bottleneck residues in shown in Fig 8A. No open tunnels were found analysing the centroids of clusters #2 and #3, meaning that hydrophobic side chains such as Phe and Leu effectively blocked the access of water molecules to the thiolate. Since the radial distribution functions showed in Fig 7 are not fully coherent with the assignation of the Glu and His mutants to the same class and because the analyzed centroid was sampled from the F429E simulation we decided to analyse also the most representative structure of the F429H alone, which was sampled at 60.68 ns. The results were similar to those obtained for cluster #1 (see Table 2 and S7 Fig in the ESI for a comparison of tunnel radius vs tunnel length) with a slightly decreased throughput and a flat profile and small radius but without real bottlenecks.
The tunnel leading to C436 and the thiolate group are lined by residues in the range R434-T444. Looking at differences in the hydrogen bond network (see S1 Table) involving these residues may be useful to explain these results and to relate them to the conformation clustering part. Note that this analysis was carried out on all frames (starting from 30 ns, see section 3.1 and S1 Fig) on the original trajectories, including the short 10 ns simulations, and showed that WT and F429L featured a number of relevant differences in stable hdyrogen bonds as compared to the Ala, Glu and His mutations. For instance, WT lacked the heme-R434 interaction which was very stable in all systems while both WT and F429L lacked the S430-heme interaction. Further, I435-K433 was observed only in featured WT and F429L; L437 featured a different network in WT as compared to all other species. These differences correlate well with the absence of tunnels in both WT and F429L even if they were assigned to  Note that the R434-D propionate group and S430-A propionate group hydrogen bonds, present with varying stabilities in the mutant species but not in the WT, have been also observed in the recent experimental study of F429H [44].

Distal side hydration.
The radial distribution function of water molecules around the peroxide anion is shown in Fig 9: The g(r) show the same maxima and minima in positions but different heights and clearly a higher level of hydration as compared to the proximal site: at least 10 water molecules were always present within 7.0 Å. In this case the separation of trajectories in classes is different: relatively comparable radial distribution functions were obtained from the F429H and F429L trajectories and from the F429A and F429E trajectories, with the WT enzyme featuring a g(r) with intermediate properties as compared to F429A and F429H. To elucidate these results water access pathways have been analysed also for the distal site using CAVER; [57] in this more complex case we extended the analysis to the full artifcial trajectory of 17500 frames (in this case the probe radius was put to 1.3 Å). Each frame was analysed searching for all possible tunnels; then, an average linkage clustering was applied to group single tunnels in pathways. In total four major pathways were found. Table 3 shows the result of this analysis along the 17500 frames (equivalent to 700 ns of concatenated sampling) ns of the artificial trajectory, including the number of frames in which a tunnel cluster was observed and its average length and bottleneck radius (note that the total number of frames in which at least one solvent tunnel was less than the total). Performing the same calculation using 8750 or 15000 frames (i.e. with an average stride of 80 ps or with one or 40 ps using only the longer simulations from 30 ns onward) yielded comparable results (data not shown).
The partition of simulation in homogeneous groups based on the presence of a specific pathway in a given simulation can be observed in Fig 10, where the bottleneck radii have been plotted as a function of time (using 2 panels to avoid overcrowded graphs). Table 3 shows that four main pathways for transport of the molecules (including solvent) to the distal site were observed and the frequency and throughput are in good agreement with the g(r) plot of Fig 9. The first major very wide pathway was sampled, with different frequency, in all trajectories even if the relative frequency and average bottleneck radius observed in the WT configurations was much lower (see Fig 10). In addition, the bottleneck radius obtained in the F429H and F429L simulations (above 1.5 Å) was on average much higher than what observed in other cases. Pathway #2 was mostly a feature of F429H and F429E and to a lesser extent of F429A and F429L. Pathway #3 was observed most often in F429H and F429L. Pathway #4 was sampled mostly in the WT simulation and less frequently in F429H. The overall picture is similar to what observed about the hydration of the proximal site: the three trajectories forming cluster #1 show a varying number of common features between them. They also show feature the pathway with highest average throughput. There is also a degree of similarity between F429L and F429H which agrees with the results of clustering and PCA (see Figs 3 and 4). It is worth observing the distinct behavior of WT which shares just pathway #4 with another  Table 3. Pathways properties for frame set #2. The table shows the total number of frames in which a specific tunnel cluster was present, the average length and bottleneck radius and the throughput, i.e. the average of e -cost across all frames using the cheapest tunnel in each frame. Pathways are shown from highest to lowest average throughput; average values refers to total number of frames in which a pathways was existent. The last column summarizes the simulation(s) in which a given tunnel cluster was observable; systems for which the pathway was less important are shown in brackets. system. To understand the origin of access to different water pathways, an atomistic view of the relative positions of these residues is given in Fig 11 where the tunnels and the corresponding bottleneck lining residues are shown. Access of molecules to the active site of CYP 2B4 through pathway #1 and #2 is regulated by a gate formed by E301 and T302; however pathway #1 is also regulated by S210 (side chain), F206 (backbone) and I209 while pathway #2 by 305 and N479. Note that F202, F206 and S210 formed a networks of hydrogen bonds (see S1 Table) which was very stable in all mutant species but only episodically observed in the wild-type simulation. Very stable bonds between involving the critical residues E301 (with 308) and T302 (the latter with the peroxide anion) were also unique features of WT and it worth to highlight the critical role of these residues in the CYP-2B4 catalytic cycle, as explained by Usharani et al. [43]. The bottleneck of pathway #3 was formed by R98 and I114. Access via pathway #4 was regulated by R98, H369 and R434; again, it is worth to observe the different behavior showed by this residues in the WT simulation as compared to the mutants in general, even if this pathway was relevant also in the F429H trajectory. Note that, like pathways #1 and #2, pathways #3 and #4 featured some common bottleneck residues which had different hydrogen bonds character (see S1 Table) and were observed in different systems (e.g. #3 was sampled in the F429L trajectories and #4 in WT trajectories, respectively).

Discussion
The key features of the F429H mutant elucidated in the previous computational works [43] were: (a) the role of the Phe alanine in protecting the sulphur atom in C436 and the electrostatic environment around the thiolate group and (b) the interaction between T302 and the peroxide anion which favoured the attack on the meso position. However, the reaction model proposed for the heme-oxygenase behaviour of F429H was primarily based on the wise application of MD results, particularly of PCA, [60,61], in selecting candidate structures for higher level of calculations (e.g. QM/MM). Thus, the obvious direction for further investigation was to extend the simulation time and study additional mutations. To this end, in this study we performed new simulations of both the WT and F429H species extending the simulation time to 150 ns and added equally long simulations of three mutations: the bulky and hydrophobic Phe side chain was changed with Ala (small side chain), Glu (hydrophilic side chain) and Leu (bulky and hydrophobic side chain). This wide range of amino acid properties was studied to address the specific properties of either the WT enzyme or its F429H mutant with respect to other species and thus provide further insights in the function of the Phe residue (a highly conserved one) and of the specific alterations induced by the presence of the imidazole moiety in the mutation.

Mutation of F429 deeply affects global mechanical properties
As a first step (sections 3.1 and 3.2 and section S3.1 in the ESI) we investigated the amount by which, upon mutation of F429 into another amino-acid, the collective motions of the CYP-2B4 enzyme were affected. Indeed, the results show that the presence of the Phe residue is critical for modulating the structural flexibility of the enzyme since the WT trajectory was always separated from the other systems in clustering procedures and had the highest propensity to form a compact cluster when a cut-off (GROMOS and linkage methods) was used, i.e. no configurations of the WT enzyme were present in the smaller clusters. The results obtained in this first part provided both a tool of simplification (in analysing structural fluctuations) and a key of interpretation (in studying water pathways) of the subsequent results. This results was also confirmed by the principal component analysis of the artificial trajectory used in the clustering. In addition, we observed a different fluctuation profile of residues in the region 430-435 which are critical in the regulation of water pathways to both the distal and proximal sides. The results obtained in this first part can be summarized in the following way: mutation of F429 creates a perturbation in the inter domain weak interactions regulating enzyme fluctuations; this perturbation does create a sharp separation between the WT and mutant species which can be roughly ordered in function of the mutated residue hydrophobicity from Leu to His to Glu and Ala (where Ala is not hydrophobic by itself but does not create enough steric hindrance).
The Cys436 residue is exposed to solvent in all mutant species In section 3.3, we analysed the access pathways to both sides of heme group. As remarked in the introduction, controlled hydration/dehydration of the active site is critical for starting the enzyme critical activity and water clusters also play a fundamental role in the heme-oxygenase activity of F429H; in addition, direct interaction with water may provide the same function of the imidazole hydrogen in altering the enzyme function. Thus, any relevant alteration in the hydration properties of the distal and proximal sides of the heme group are likely to have a profound effect on the enzyme activity. The analysis of the proximal side yielded a clear and simple message: mutation of the Phe residue with a less hydrophobic species may open a short and direct tunnel from the protein surface to the proximal side. Such an opening was reduced for the His mutation and greater for Ala and Glu, a separation that is in very good agreement with the classification of systems performed in Section 3.1.
The F206-E301 pathway is highly affected in all mutant species The analysis of tunnels for the distal site showed the unique properties of the WT enzyme that displayed a distinct pattern of hydrogen bonds and correlated water pathways. It has been previously proposed [8,11] that this pathway and the CYP's conformational plasticity that closes and opens it is a key factor to explain the ability of this enzyme to host different substrates and create a proper reaction environment. In a communication by Shah et al. [62], it was shown that in the F297A mutant of CYP-2B4 a reorientation of phenylalanine residues along this pathway, in particular F202 and F203 was observed and this reorientation was at the origin of different poses obtained by docking drugs such as clopidogrel in the WT and mutant enzyme. It is important to observe than the F202 and F206 residues followed completely different dynamics in the WT simulation as compared to the mutant species. It is also worth to remark that in CYP-3A4, the F206 residue is replaced by R212 i.e. a completely different amino acid; nevertheless, the transport of solvent and drugs to active site is still regulated by this residue. [63] The solvent pathway via H369 is either closed (F29A and F429E) or altered (F429L and F429H) in mutants We want to highlight one last feature of the WT system i.e. the presence of a reversible pathway (#4) regulated (among others) by residue H369, absent in the Ala, Glu or His mutants. This aqueduct and its functioning have been studied previously [16], also in other cytochrome species and in the presence of reductase with CYP-3A4 by means of classical molecular dynamics simulations. [13] It has been found that in CYP-3A4 these residues can create a reversible gating for water, together with R375 in the 1TQM structure, [64] which is replaced by H369 in CYP 2B4. Residues H369, K433 and R434 (i.e. R440 in the 1TQM structure) are bottleneck lining residues in pathway #4 (mainly sampled in the WT trajectory). Please note that, K433 showed a rather different mechanical behaviour in conformational clusters (see Fig 5). The different dynamics of these residues in F429H and (more frequently) F429L gave raise to pathway #3.

Conclusions
Fifteen MD simulations, were carried out to study the conformational behavior of aqueous CYP 2B4 and the F429A, F429E, F429H and F429L single polymorphisms while hosting ferric hydroperoxide (Cpd 0) key intermediate. Three trajectories were generated for each system and sampled for either for 150 or for 10 ns, starting from equilibrated structures and assiging new velocities. In total, each system was simulated for 170 ns; analysis of RMSD prompted us to discard the firsy 30 ns of the longer trajectories yielding 140 of sampling available to data analysis for each system. Comparison of the classical trajectories shed some light on the subtle role played by the highly conserved Phe429 proximal residue in the CYP 2B4 biological machinery. The combined use of different clustering techniques, analysis of collective motion hydrogen bonds and solvent aqueducts allowed us to isolate the most relevant conformational reservoirs of the five systems, highlighting their shared and distinct features. The results showed that, the effect of the Phe429 mutations are propagated across the entire CYP 2B4 structure, altering both protein dynamics and functional access channels to the active site. These alterations follow a pattern that may be related to the mutated residue hydrophobicity (or lack thereof) as compared to the phenylalanine side chain. We conclude, then, that the proximity of Phe429 to the proximal side of CYP-2B4 has a twofold critical function: (i) it creates a closed environment for the thiolate group, protecting it from possible oxidation, protonation or other process by neighbour residues or solvent molecules (ii) it stabilizes the hydrogen bond network around the active site and, as a consequence, the proper functioning of the access and egress pathways to the heme distal site which are critical for allocating substrate and solvent molecules and thus for the enzyme catalytic activity. We are confident that the presence results might provide a useful starting point for future studies aimed to clarify, how the enzyme plasticity, one of its key features, is affected by single polymorphisms.  Table. Differences in Hydrogen bond networks in the WT, F429A, F429E, F429H and F429L trajectories. The system in which a specific bond was observed, the donor and acceptor atom and the relative stability (with respect to a total of 140 ns) are shown. "heme-A" or "heme-D" stand for heme-priopionate group A or D (see Fig 2). Interactions are shown if the lifetime is grater than 60% of used configurations (i.e. 91 ns of sampling) Atom names N/O without further specification indicate backbone atoms; N Ã and O Ã for Arg and Glu or Asp residues indicate either nitrogen or oxygen atom of the side chain; the sum of the interactions is shown. (DOCX)