Novel Non-Peptide Inhibitors against SmCL1 of Schistosoma mansoni: In Silico Elucidation, Implications and Evaluation via Knowledge Based Drug Discovery

Schistosomiasis is a major endemic disease known for excessive mortality and morbidity in developing countries. Because praziquantel is the only drug available for its treatment, the risk of drug resistance emphasizes the need to discover new drugs for this disease. Cathepsin SmCL1 is the critical target for drug design due to its essential role in the digestion of host proteins for growth and development of Schistosoma mansoni. Inhibiting the function of SmCL1 could control the wide spread of infections caused by S. mansoni in humans. With this objective, a homology modeling approach was used to obtain theoretical three-dimensional (3D) structure of SmCL1. In order to find the potential inhibitors of SmCL1, a plethora of in silico techniques were employed to screen non-peptide inhibitors against SmCL1 via structure-based drug discovery protocol. Receiver operating characteristic (ROC) curve analysis and molecular dynamics (MD) simulation were performed on the results of docked protein-ligand complexes to identify top ranking molecules against the modelled 3D structure of SmCL1. MD simulation results suggest the phytochemical Simalikalactone-D as a potential lead against SmCL1, whose pharmacophore model may be useful for future screening of potential drug molecules. To conclude, this is the first report to discuss the virtual screening of non-peptide inhibitors against SmCL1 of S. mansoni, with significant therapeutic potential. Results presented herein provide a valuable contribution to identify the significant leads and further derivatize them to suitable drug candidates for antischistosomal therapy.


Introduction
Schistosomiasis (bilharzia) is one of the major health problems of the world, caused by the trematode blood fluke of the genus Schistosoma.Report states that the disease infects broad range of humans (391-587 million) in 74 developing countries in tropical and subtropical regions [1].The causative agents of the schistosomiasis primarily include three species, namely Schistosoma mansoni, S. haematobium and S. japonicum [2].In addition, schistosomiasis burden is estimated to exceed 70 million disability-adjusted life-years (DALYS) [3].
Schistosomes undergo several morphological and physiological changes, perpetuating their life cycle between definitive-vertebrate and intermediate-snail hosts.The complex life cycle of schistosomes involves the infective aquatic stage (cercariae), which invade the host skin and transform into schistosomula [4].Schistosomula travel to the lungs via venous circulation in 4-6 days post penetration and then migrate to the hepatic portal circulation.At this site, the parasites mature and copulate to produce numerous eggs [5].Studies suggest that human schistosomiasis-associated morbidity results from the immunological reactions in response to the disposition of eggs in the liver and other sites [6].The complex developmental stages of schistosomes, thus, make it difficult to perform the in vivo experiments related to the drug action against these parasites in humans.
In the schistosome gut, cathepsin SmCL1 is located in the gastrodermal cells lining the cecum of the parasite [7].Here it plays a digestive role by degrading the host haemoglobin which is the main nutrient source for the adult schistosomes [8].Thus, the important role of cathepsin SmCL1 in the metabolism of the schistosome renders it to be a crucial target for novel anti-schistosome chemotherapy and immuno-prophylaxis [9,10].
Despite substantial efforts in the past, no effective vaccine has been developed against schistosomiasis.Treatment of schistosomiasis relies only on a single drug, praziquantel [11].However, the intensive use of praziquantel is an increasing concern as it may lead to the development of drug-resistant strains [12].Hence, it is prudent to identify anti-schistosomal drugs and new schistosomal protein targets for the control and treatment of this 'neglected tropical disease' [13,14].In a previous study, it was reported that treating S. mansoni infected mice with broad spectrum peptide-based cysteine protease inhibitors not only reduced worm burden but also inhibited worm fecundity [15].This shows that cysteine proteases are potential targets of anti-schistosomal drugs.This finding paves the way for the rescuing of more molecules against cathepsin SmCL1, a utility in prophylactic and therapeutic interventions.
Efforts have been made to identify new cathepsin SmCL1 inhibitors as an alternative to traditional therapy in drug-resistant organisms.Inhibitors such as peptidyl fluoromethyl ketones [15], peptidyl diazomethyl ketones [16], vinyl sulphones [17] and epoxysuccinyl derivatives [7] have been categorised as peptide-based inhibitors of SmCL1.To date, a lot of peptide-based inhibitors of cathepsin SmCL1 have been synthesised and evaluated as a potential cysteine protease targets.However, in vivo efficacy of peptide-based inhibitors has been limited due to various pharmacological constraints: solubility, stability and selectivity.Hence, the discovery and optimisation of non-peptide inhibitors is necessary to overcome these limitations for reliable and safer chemotherapeutic treatments [18].
In view of the above facts, SmCL1 was taken as a potential target for the present work.Since the three-dimensional (3D) structure for SmCL1 is yet unavailable, a theoretical 3D structure of SmCL1 was developed using reliable templates via homology modeling protocol.Computational approaches such as molecular docking, virtual screening and MD simulations were carried out to identify novel non-peptide inhibitors against SmCL1.It is expected that the nonpeptide phytochemical inhibitors can serve as an alternative to cope up with the limitation of in vivo efficacy of peptide inhibitors, and are likely to be developed as potential inhibitors against SmCL1.

Sequence analysis
The 319 amino acid (aa) long protein sequence of cathepsin SmCL1 of S. mansoni was retrieved from the universal protein resource (UniProt) database (ID: Q26534) in FASTA format.Based on reported literature, mature sequence of SmCL1 was 215aa long and started from the amino acid residue Ile105 [16].The physio-chemical properties of SmCL1, such as theoretical isoelectric point (pI), molecular weight, total number of positive (+R) and negative (-R) residues, extinction coefficient (EC) [19], instability index (II) [20], aliphatic index [21] and grand average hydropathy (GRAVY) [22] were evaluated by Expasy's ProtParam server [23].SOPMA [24] was employed for calculating the secondary structural features of the SmCL1 sequence used in this work.
The mature part of SmCL1 sequence was cleaved off and taken as input for performing BLAST against Protein Data Bank (PDB), using protein-protein BLAST (blastp) [25].Among the list of homologous sequences identified with the BLAST program, the hits which meet the following criteria were selected as template structure for homology modeling of SmCL1: (1) Evalue below 10 -5 ; (2) query coverage >95% and sequence identity >35% or query coverage >90% and sequence identity >40%; (3) PDB structure with a resolution < 2.5 Å; (4) PDB structure bound with ligand.The templates and the target sequence (mature SmCL1) were then aligned using the PSI-Coffee mode of T-Coffee (v.9.03) server [26] to give a clear description and relation between the target sequence and the templates considered in this experiment.

Homology modeling
The 3D structure of SmCL1 was constructed using the 'Python' based automodel routine of Modeller (V.9.12) [27].Two important inputs for Modeller are 'alignment files in PIR format' and the 'atom files'; atom file contains the coordinates of the proteins taken as templates.To obtain a PIR format alignment file, multiple sequence alignment (MSA) using ClustalW [28] was performed between the target sequence and the template sequences.The SmCL1 models were constructed based on different alignments, single and multiple.For each alignment, 100 theoretical protein models were constructed using Modeller.
To validate the above models and to pick a single and best model to proceed with the experiments, a series of model evaluation techniques were incorporated in this study.Starting with the inbuilt routine of Modeller, called normalized DOPE score, Z-score of each model was evaluated.The models with the least scores (Z -1) were selected for further validations.The servers like ProSA-Web [29,30], PROCHECK [31], ProQ [32], QMEAN [33] and locally installed TM-score [34] were used to validate the 'selected' models and finalize from them a single and best theoretical 3D model of SmCL1.

Ligand Library Preparation
Since the aim of this study was to identify novel non-peptide inhibitors and to perform comparative analysis of the action of 'phytochemicals' on cathepsin SmCL1 with respect to nonphytochemical drugs, we used two separate libraries of ligands for molecular docking.These libraries were prepared using 'ZINC Database' and 'Dr.Duke's Phytochemical and Ethanobotanical Database'.a. Ligand library of orally active Drug-like compounds from ZINC Database: As on March 30 th  2014, the ZINC database (http://zinc.docking.org/)[35] had about 35 million purchasable compounds, with an exception of few non-purchasable compounds too.Using the 'Combination Search' string of the database, the molecules of the interest were retrieved.While performing combination search, the following parameters were included: (1) Molecular weights 150-500; (2) Octanol-water partition coefficients (log P) 5; (3) Rotatable bonds 10; (4) Polar surface area 150 Å 2 ; (5) Hydrogen bond acceptors 10; (6) Hydrogen bond donors 10; (7) Net charge 5.These parameters are basically constituents of Lipinski 'rule of five' for a drug to be orally active.Thus, it was ensured that the molecules to be retrieved are 'drug-like' and 'purchasable', so that the wet lab researchers may proceed with the 'real' experiments.Applying the above filters, there were 200 molecules which have shown to inhibit the enzyme 'Cysteine Protease' of Eukaryotes.As stated earlier, studies suggest that non-peptide inhibitors could be better against the parasites [18], therefore a manual protocol was adopted in further filtering of non-peptide inhibitors from peptide ones.Later, only those molecules were retrieved which inhibit the cysteine protease of the parasites.This step was repeated to minimize the human error.Finally, 60 compounds were confirmed as non-peptide anti-parasitic inhibitors, and were included in this library for further docking procedures.
b. Ligand library of orally active Drug-like Phytochemicals: This library included the molecules from the Dr. Duke's Phytochemical and Ethanobotanical Database (http://ars-grin.gov/duke/).This database was searched and 545 molecules were retrieved on April 2 nd , 2014 showing the antimalarial, antileschmanial and antitrypanosomic activities.This search criterion was taken into consideration, because the important amino acid residues of SmCL1 of S. mansoni are closely related to the residues of cysteine proteases of Plasmodium falciparum, Leishmania donovani and Trypanosoma cruzi, which cause important parasitic diseases.The PASS server (http://www.pharmaexpert.ru/passonline/)[36], meant for prediction of activity spectra for substances, was used to screen the retrieved molecules for anti-parasitic activity.The molecules with Pa > 0.7 are likely to exhibit the activity in the experiment.Out of 545, only 167 molecules fall under this criterion and were subjected to further filtration.Applying the filter of drug-likeness (Lipinski 'rule of five') using the server Chemicalize.org(http://www.chemicalize.org/)[37], only 55 molecules were left in this library which were further taken for docking procedures.

Docking studies
All docking studies were performed using the standard AutoDock (v4.2) [38] suit incorporated in MGL tools (v1.5.6) [39], utilising the default protocols of this program.The target protein was prepared using standard protocol and saved into 'PDBQT' format using MGL tools.Similarly, before starting the docking protocol, individual ligands from the respective libraries were prepared using the python script 'prepare_ligand4.py'present in MGL tools and saved into 'PDBQT' format too.The input 'grid parameter' files were modified and the grid size was adjusted to 70 points in XYZ dimensions respectively, with default spacing of 0.375 Å, to cover the catalytic region of the target protein SmCL1.Rest all the docking parameters were set to default values.
Top pose, from all the 10 confirmations from AutoDock, was saved as a complex in default 'PDBQT' format.The molecules were visualized and written to PDB format using Chimera (v1.8.1) [40].Hydrogen bond interactions and their distances between the protein and ligands were visualized and measured using PyMOL software [41] (PyMOL Molecular Graphics System, version1.5.0.1, Schrodinger, LLC).The workflow used for the virtual screening is depicted in Fig 1.

ROC analysis of orally active-drug like compounds in two libraries
To determine the reliability of phytochemical as well as that of ZINC leads obtained, ROC curves [42] for actives and decoys from the two libraries were computed.The ligands which had interaction with important residues of SmCL1 were considered as actives and labeled as 1, while rest were referred to as decoys and labeled as 0. Regarding the ROC curves, it may be noted that a straight line indicates that the model gives no preference to true inhibitors over decoys or vice versa, and the bending more towards left shows a greater accuracy of the model.Simply speaking, the curve bowing more towards left is appreciable and indicates higher ratio of true positives to false positives.

Stability evaluation of the docked SmCL1 via MD simulations
In order to determine the stability of top docking pose of the best ligands from two libraries, MD simulations were performed using the GROningen MAchine for Chemical Simulations (GROMACS) v4.6.5 package [43].
One known limitation of GROMACS is to parameterize heteroatom groups from PDB files of protein-ligand complex.Hence, in the first step, PDB file of the protein-ligand complex was separated into PDB files of protein and ligand.
The system topology of separated PDB files was prepared individually in two different steps.The ligand topology was prepared using PDB file of ligand over PRODRG server [44].Protein topology was prepared using PDB of protein with 'pdb2gmx' using 'GROMOS96 43a1 force field' [45].Next, 'unit cell' was defined and system was filled with water.The complex of protein and ligand was confined into a cubic box maintaining a minimum of 10 Å between the box edges and its surface, while keeping it centered within the box.The resulting system was then solvated with simple point charge (SPC) 216 water model [46].At physiological pH, the system was found to have a net charge of -8.0.Therefore, counter ions (8Na+) were added to neutralize the system that replaced water molecules at positions of favourable electrostatic potential.
The solvated system was then minimized in 50,000 steps using the steepest descent method, to remove close van der waals contacts.After energy minimization, position restraint dynamics (equilibration run) was performed for 100 picoseconds (ps) (50,000 steps) in two consecutive steps, NVT (Number of particles, Volume and Temperature) equilibration and NPT (Number of particles, Pressure and Temperature) equilibration.To avoid unnecessary distortion of structures during simulations, NVT equilibration was performed, in which all heavy atoms of the protein along with counter ions were restrained to their starting positions and to make sure that water molecules settle (soak) around the structures.The next equilibration was run under NPT conditions to keep the pressure and temperature constant.Soon after the system was equilibrated, a 2 nanoseconds (ns) long production simulation (MD run) was piloted with a 2 femtoseconds (fs) time step at a pressure of 1 bar, and a temperature of 300 K, to confirm stability of the given system.The interaction energy and intermolecular hydrogen bonds of the system were calculated via the modules g_energy and g_hbond, respectively.The trajectories of simulations were plotted using Gnuplot (v4.6) (http:// sourceforge.net/projects/gnuplot).

Structure-based Pharmacophore model generation via Ligand Scout
The potential SmCL1-inhibitor complex was considered to generate structure-based pharmacophore model via ligand scout (v3.12) [47].An advanced alignment algorithm in the software allows to overlay and identify pharmacophoric points/features such as hydrogen bond acceptor (HBA), hydrogen bond donor (HBD), ring aromatic (RA), hydrophobic groups (HYP) and hydrogen bonding vectors on ligand to generate the pharmacophore model of potential inhibitor against SmCL1.

Results and Discussion
The physio-chemical and structural properties of SmCL1 are shown in Table 1.The calculated pI (isoelectric point) of SmCL1 is 5.06 (pI < 7), which suggests that the protein is acidic in nature.Extinction coefficient (EC) of the protein, which is used to determine the protein-protein and protein-ligand interaction in the medium, was elucidated as 66,265 M -1 cm -1 .The stability of the protein is determined by its instability index (II); with a value less than 40 indicates a stable protein.Instability index of SmCL1 was calculated as 25.03, which shows that it is a stable protein.The aliphatic index (AI), which is regarded as a positive factor for the increase of thermostability of globular proteins, was found to be 76.60.GRAVY value for a protein is an index of the solubility of the proteins in the solution.Its positive and negative values indicate the hydrophobic and hydrophilic nature of the protein in the solution, respectively.GRAVY indices of SmCL1 were found to be -0.267,indicating its hydrophilic nature, and thus showing better interaction with water.The secondary structure of the protein was predicted by the self-optimised prediction method with alignment (SOPMA), which correctly predicts 69.5% of amino acids for a three state description of the secondary structure in alpha helix, beta sheet and random coil.
Growth, maturation and fecundity of schistosome parasite rely on the ingestion of degraded haemoglobin and serum proteins by SmCL1 [48,49].This essential role of SmCL1 in the adult schistosome parasite makes it a suitable target for structure-based drug design.The atomic coordinates of SmCL1 were not available in PDB; therefore homology modeling protocol was necessary to develop a protein model.ModellerV.9.12 program was used to compute homology modeling based on the satisfaction of spatial restraints of the target sequence, which is aligned with the 3D structure of the templates [27].
For the accuracy of homology modeling, it is important to consider the templates whose similarity and identity is close to the target sequence.In this quest, template search and sequence alignments are the crucial steps before starting with the modeling protocol.Therefore, SmCL1 protein sequence was used as a query in a protein-protein BLAST (blastp) search against PDB.Among the best hits, only those 3D templates were retrieved, which satisfy the criteria (1)-( 4) (described in materials and methods).Six templates, namely 1EWP, 2P7U, 2XU3, 1S4V, 2FO5 and 1M6D (1M6D and 2FO5 had highest and least blast score, respectively), were finally considered for homology modeling.The detailed information about these templates is given in Table 2.
Average sequence homology of SmCL1 with six homologs considered was around 47%, ranging from 44% to 58%.The mature sequence of SmCL1 starting from isoleucine (Ile105) and the sequences of selected templates were subjected to multiple sequence alignment as presented in Generally, Modeller appears to function better when two or more templates are considered during homology modeling [50].Therefore, models in this experiment were built based on different alignments resulting from multiple templates using the PSI-Coffee mode of T-Coffee.Following Perez et al. [51], the combination of selected target sequence (SmCL1) and all possible combinations of two to three selected homolog proteins resulted in 41 different alignments.For each alignment, 100 models were constructed, totalling to 4100 models for SmCL1.
Model evaluation was started from the inbuilt feature of Modeller, normalized Z-DOPE feature, a statistical method which was used to start the evaluation of 4100 models.Here a Z-DOPE score of -1 supports it to be a 'reliable' model.The score indicates that 80% of models' Cα atoms are within 3.5 Å of their correct positions [52].Only 23 models out of 4100 had score -1, hence only these were taken for further analysis (Table 3).To select the final model of SmCL1, among the 23 listed models, additional validation tools (Verify3D, PROCHECK, ProSA, ProQ, QMEAN and TM-score) were employed.
In different validation tools, there were different rules and parameters to validate the model reliability and quality.For PROCHECK analysis, a model which has 90% of amino acids of protein sequence, found in the most favourable region, is considered to be a quality model.ProQ measures the quality of the model in two ways: LG score and MaxSub.LG scores >1.5, >2.5 and >4.0 correspond to fairly well, very good and extremely good models, respectively.Similarly, MaxSub scores >0.1, >0.5 and >0.8 represent fairly well, very good and extremely good models, respectively.The Z-score of ProSA indicates the overall model quality.The value of Z-score in ProSA analysis is displayed in a plot that contains the Z-score of all the experimentally determined proteins, present in PDB.QMEAN scoring function estimates the absolute quality of modelled protein, with the models having scoring range 0 to 1 (with higher scores) are ought to be more reliable models.QMEAN Z-score used to predict the reliability of the modelled structure as compared to the high-resolution structures solved by X-ray crystallography, with an average value of 0 is considered good for a reliable model.The calculated 'QMEAN Z-score' provides an estimation of the degree of nativeness of the structural features observed in a model and indicates whether the model is of comparable quality to experimental  structures.All the 23 models passed the quality check performed at Verify3D server [53], which derives a "3D-1D" profile based on the local environment of each residue of the protein.
Considering the above results, the model 12 (X_1M6D_2P7U_2XU3_49), listed in Table 3, was supposed to be the best model among 4100 models prepared via homology modeling using three different templates (1M6D, 2P7U and 2XU3).Model X_1M6D_2P7U_2XU3_49 with alignment score 97 has least RMSD with the templates, and was taken into consideration for further analysis.
Ramachandran plot (Fig 3), obtained from PROCHECK, was used to check the stereochemical quality of a given protein structure.This plot ensured the quality of the model 12 of SmCL1 with 90.3% residues in favourable region and 9.1% in additionally allowed region.Furthermore, there were only 0.5% residues in disallowed region and 0% of the residues in generously allowed region.Only Ser128 was found in the disallowed region of the plot.However, this residue can be ignored due to its far-off distance (19.4 Å) to the Cα atom of the catalytic Cys25.This suggests that Ser128 would interfere very little with the ligands binding in the active site region of SmCL1.
The ProSA analysis of the model 12 of SmCL1 was used to judge the overall quality of the protein on the basis of plot containing the Z-scores of the experimentally determined protein chains available in PDB. .There are four α-helices in the L domain of the modelled SmCL1 and six β-sheets in the R domain.Apart from these, three small α-helices are also seen at the surface of the model, which resembles the typical features of C1 papain-like fold [54].
To stabilize the binding region, the C-terminal of R domain and N-terminal of L domain bind to the L and R domains, respectively.SmCL1 model includes seven cysteine residues, which are common to papain family (Fig 6A).Among these, six cysteine residues (Cys22-Cys63, Cys56-Cys96 and Cys154-Cys203) are involved in the disulphide bonds.The seventh one, Cys25, corresponds to the active catalytic residue.Cys25 together with His161 and Asn182 form the catalytic triad (Fig 6C).Gln19 along with Trp184 is another important and conserved moiety of this family, whose side chain forms the 'oxyanion hole'; this is considered as an important feature of the enzymes with proteolytic activity.Residues Gly65 and Gly66 correspond to conserved glycine rich region, which is a typical feature of this family, ensuring the stability of the complex by forming number of hydrogen bonds with the substrate [55].SmCL1 also contains the tripeptide consensus sequence Arg-Gly-Asp (RGD) binding motif which aids in cellular targeting (Fig 2).It serves to anchor proteins that have no transmembrane spanning domain or do not possess a GPI moiety.The secondary structure of  The first value corresponds to the QMEAN global score, while the value between parenthesis refers to the QMEAN Z-score c RMSD values of SmCL1 model compared to the 3D structures of the templates used for the alignment (listed in same order as column 2) doi:10.1371/journal.pone.0123996.t003SmCL1 as predicted from PSIPRED server [56] supports the three dimensional structure of the protein (Fig 6B ).
The active site of papain-like protease SmCL1 is usually lined by four pockets: S1, S1', S2 and S3 [57] (Fig 7A).Among these, S1 and S2 are the least and most defined pockets, respectively.S2 pocket residues bestow major specificity to the P2 substrate residues.The major residues lining the four distinct binding pockets of cathepsin SmCL1 of S. mansoni are listed in Table 4.
Phylogenetic analysis clearly indicates that SmCL1 is related to cruzain from the parasitic protozoan T. cruzi, supporting that SmCL1 and cruzain are the members of a separate clade of C1 peptidases, termed as "cruzain lineage" [58,59].To supplement the results of phylogenetic analysis, the protein sequences of SmCL1 and cruzain (PDB ID: 1EWP_A) are compared in Fig 2 .It is found that the enzymes have not diverged at the catalytic triad residues and at the "ERFNAAQ/A" motif in the pro-region peptide.Analysis of the deduced primary structure of SmCL1 also revealed that it shares 45% sequence identity with cruzain (Table 2).Moreover, very little differences are found in the residues lining the binding pockets of SmCL1 and cruzain (Table 4).Residues highlighted in Table 4 are the ones which are not present in similar order.Other residues like Ala, Asp, Trp in S1'; Gly, Ser in S1; Leu in S2; Asp in S3 are present in similar order and lining these pockets.The electrostatic potential of the modelled SmCL1 is The availability of the theoretical 3D model of SmCL1 using homology modeling protocol leads to the docking procedure.As discussed earlier, this experiment aims to justify a couple of important facts; one among them is the importance of non-peptide inhibitors over the peptide ones and the other being the 'better' activity of phytochemicals against non-phytochemical inhibitors.Since, no elaborated studies have been reported on phytochemicals against S. mansoni, during library preparation all the phytochemicals which showed anti-parasitic property against T. cruzi, P. falciparum and L. donovani were considered.This consideration seems to be justified, as SmCL1 is closely related to the cysteine protease of the above mentioned parasites on the basis of the important conserved catalytic triad and S2 binding pocket residues (Fig 8 ).Keeping in view these facts, two different libraries were prepared; one had non-peptide non-phytochemicals inhibitors and the other included non-peptide phytochemicals.
Docking was performed for all the molecules in the two libraries.After docking, only few drug-like compounds were considered as 'best compounds/ligands' against SmCL1 on the basis of the following observations using PyMOL: (1) Adopted position and orientation of the ligand in the SmCL1 binding site, (2) Ligand covering/interacting with active site residues, (3) Ligand interacting with active site and S2 pocket residues and (4) Ligand interacting with S1 pocket residues.
The results of '14 best docks/leads' from ligand library comprising 60 orally active drug-like compounds from ZINC Database are listed in Table 5 and the results of '13 best docks/leads' from ligand library of 55 orally active drug-like phytochemicals are listed in Table 6.Tables 5 The docking results were evaluated by ROC curves, which describe the trade-off between "sensitivity" and "specificity"."Sensitivity" is the ability of the classifier to detect true positives, while "specificity" is the ability to avoid false positives.The area under an ROC curve indicates the quality of enrichment.The ROC value of a random classifier is 0.5, while that of an excellent classifier is greater than 0.9 [60].ROC curve analysis was performed using the 'binding energy' results of docked ligands to find out the relation between the true positives and true negatives from two different libraries.Actives and decoys were labelled 1 and 0, respectively; the actives were those ligands which were found interacting with important residues (active site and S2 pocket) of SmCL1 and otherwise were supposed to be decoys.The ROC score of the ZINC leads was 0.56, whereas the same of phytochemicals was 0.67.Thus, ROC curve analysis (  For thorough analysis of the activities of leads and to support that the phytochemicals could be better and potential drug against SmCL1, one best lead was picked from each library.Using PyMOL, the inter-molecular interactions of the docked ligands with SmCL1 were observed Table 4. Important residues lining the binding pockets of SmCL1 of Schistosoma mansoni and cruzain of Trypanosoma cruzi.
Gly23, Ser24, Asn64, Gly65, Gly66 Leu67, Pro68, Gly133, Leu159, Ala162, Val208 Asp60, Asp61 Cruzain Ala141, Ser142, Met145, Asp161, Trp184 Gly23, Ser24, Ser64, Gly65, Gly66 Leu67, Met68, Ala138, Leu160, Gly163, Glu208 and the bond distance (Å) was measured between the interacting residues to finalize the best lead from each library.Through careful observations, compound with ZINC ID 22001688 and phytochemical Simalikalactone-D (PubChem ID: CID 6711208) were found as the best lead compounds from their respective libraries.ZINC lead 22001688 was found interacting with Leu159 (S2 pocket residue) of SmCL1 through the hydrogen bond of length 2.94 Å (Figs 10A and 11A).Simalikalactone-D was found interacting with two important residues, Cys25 (active site residue) and Gly66 (S1 pocket residue) with hydrogen bond of lengths 3.02 Å and 3.17 Å, respectively (Figs 10B and 11B).Interactions of these lead compounds with SmCL1 were analysed using PyMol (   To confirm the stability of each lead compound, MD simulation was performed at 2000ps.The results were observed on the basis of plots obtained for interaction energy and hydrogen bond variation at various time intervals.Negative plot for interaction energy reflects the association of the complex and positive plot suggests the dissociation of the same complex.The interaction energy plot of ZINC lead 22001688 (Fig 13A) suggests it to be non-stable interaction, where the association and dissociation takes place until 1100ps, which later stabilizes reaching the time interval of 1200ps.In the case of Simalikalactone-D (Fig 13B ), the variation in interaction energy was found to be insignificant and was more stable from 600ps to 1400ps.Further, the hydrogen bond interactions of ZINC lead 22001688 reaches a maximum of three and remains one for most of the time (Fig 14A ); whereas, in the case of Simalikalactone-D (Fig 14B ), the hydrogen interaction number reaches five and for most of the time remains two.This illustration of greater hydrogen interaction number further confirms the greater stability of the phytochemical Simalikalactone-D.
In support of this experiment, when complete data was analyzed, the activity of non-peptide non-phytochemical and phytochemical inhibitors against cathepsin SmCL1 could be best illustrated by the bar graph (Fig 15).Non-peptide inhibitors, ZINC lead 22001688 and Simalikalactone-D, were found to show good results both for inhibition constant and interaction energies in comparison to E64, which is a reported peptide-based inhibitor against SmCL1.Further, if we perform comparative analysis of AutoDock results for four good leads from each of the two libraries, the results show that the phytochemical inhibitors have lower inhibition constant and more negative of interaction energy than non-phytochemical inhibitors,  which makes it quite clear that phytochemicals like Simalikalactone-D could be the better drug against the control of schistosomiasis.
Since the actual synthesis and testing of above screened inhibitors is beyond the resources available to us, an attempt was made to validate our computational hypothesis by linking it to reported experimental work.As mentioned earlier, peptide inhibitors such as peptidyl diazomethyl ketones [9], fluoromethyl ketones (FMK) [15] and vinyl sulphones (VS) [17] have shown to inhibit the activity of SmCL1.In order to compare the experimental activity of reported peptide inhibitors [9,15,17] with SmCL1 model, molecular docking approach was implemented.As evident from the AutoDock results (S3 Table ), the peptide inhibitors exhibit less negative interaction energy (-3.14 to -4.91 Kcal/mol), high inhibition constant (251.93 μM-4.77mM), and follow the similar trend of increase/decrease in Ki values as reported in the experimental work [9,15].It has been reported that Mu-F-nitroR-FMK is the only FMK inhibitor that has the least effect on haemoglobin degradation [15]; this result has also been verified by docking protocol giving rise to mM (higher) value of inhibition constant (S3 Table ).Cysteine protease inhibitors (CPIs) against S. mansoni infection have shown to prevent growth and differentiation of the parasite.Among the CPIs, the best schistosomicidal effect was observed with FMK followed by VS to selectively arrest parasite replication [61].Using docking protocol (S3 Table ), we have shown that FMK (Mu-Y-(O-methyl)-hF-FMK) exhibit more negative interaction energy and lower inhibition constant as compared to K11777 (VS), suggesting FMK as better peptide inhibitors against SmCL1.The unavailability of experimental information on non-peptide inhibitors against SmCL1 made it necessary to identify and select the reported non-peptide inhibitors (plant and nonplant) of cysteine protease closely related to SmCL1.As mentioned earlier, cruzain from the parasite T. cruzi is closely related to SmCL1; the two enzymes have not diverged at the catalytic triad and the binding pocket residues are also found conserved (Table 4).S2 Fig clearly shows the superimposed image of SmCL1 (cyan colour) on cruzain (grey colour) highlighting the conserved residues of active site and similarly conserved residues lining S1', S1, S2 and S3 pockets.Considering the above fact, cruzain non-peptide inhibitors were tested against SmCL1 via docking protocol to estimate the interaction energy and inhibition constant parameters.The parameters obtained were used to analyse the inhibition pattern by comparing it with the reported cruzain inhibition values [62][63][64][65].
As evident from the AutoDock results (S4 Table ), non-peptide phytochemical inhibitors exhibit more negative interaction energy (> -7.00 Kcal/mol) and lower inhibition constant (< 10μM) against SmCL1 as in case of cruzain with lower reported IC50 values [62].Further, AutoDock results for non-peptide non-phytochemical inhibitors (S4 Table ) showed that Nequimed, thiazolidinones and thiosemicarbazone inhibitors exhibit less negative interaction energy (< -7.00 Kcal/mol) and higher inhibition constant (> 10μM) against SmCL1, which is in consistence with the reported cruzain inhibition parameters [63][64][65].It was also found that non-peptide non-phytochemical and phytochemical inhibitors show more negative interaction energy and lower inhibition constant as compared to the values of peptide inhibitors against SmCL1.Further, it was interesting to observe that cruzain inactive non-peptide inhibitors act as decoys for SmCL1 too, as shown by the mM values of AutoDock inhibition constant (S4 Table ).
To analyse the stability of cruzain non-peptide inhibitors against SmCL1, MD simulation was performed at 2 ns.In view of the above results and discussion, we may suggest that phytochemicals are the better leads.Therefore, Simalikalactone-D could be a good fit for the pharmacophore model generation and validation.Pharmacophore features of SmCL1-Simalikalactone-D complex were identified using Ligand Scout v3.12 as shown in Figs 16 and 17.Based on the interaction points between the ligand and protein SmCL1, two hydrogen bond acceptor (HA), one hydrogen bond donor (HD) and two hydrophobic (HY) chemical features were identified in classifying the pharmacophore model resulting from SmCL1-Simalikalactone-D docked complex.Interfeature distance constraints (Fig 16) were calculated between different pharmacophoric features.The given inter-feature distances between HA and HA/HD, HA and HY, HY and HY, HY and HA/HD could be used to map new lead molecules whose chemical groups should be

Conclusions
An effort has been made to propose some probable leads against the SmCL1 whose pharmacophore features may be considered for further investigation in devising efficient drugs to retard the hazardous proliferation of S. mansoni.According to the available literature, SmCL1 could be the potential drug target in S. mansoni.The unavailability of 3D crystal structure of SmCL1 in PDB necessitated predicting the best possible theoretical model of this protein via homology modeling protocol.Virtual screening of potential drugs was carried out using the Autodock suite.The best possible library of phytochemicals and their leads were confirmed through ROC analysis and MD simulations.The scope of this investigation is to support cost-effective experimental screening of reported drugs against SmCL1.In addition to the efficacy of the proposed non-peptide leads, our work also suggested that phytochemicals could be better inhibitors against S. mansoni.Pharmacophore features of the proposed lead Simalikalactone-D should be given considerable importance and need to be studied for future screening of related lead compounds for developing noble drugs against SmCL1, thereby putting a check on neglected tropical disease 'schistosomiasis'.

Fig 1 .
Fig 1. Workflow diagram followed in the in silico virtual screening for inhibitors of SmCL1 of Schistosoma mansoni.doi:10.1371/journal.pone.0123996.g001 Fig 2. Alignment confirms the fact that important residues such as catalytic triad: Cys25, His161 and Asn182 (Fig 2), oxyanion hole: Gln19 (Fig 2) and S2 pocket residues (Fig 2) are conserved, and show low polymorphism in the surrounding area (shown in box in Fig 2) of the protein sequence of the selected templates and target protein SmCL1, thus promoting the suitability to use the selected templates for homology modeling of SmCL1.

Fig 2 .
Fig 2. Multiple sequence alignment of the protein sequence SmCL1 and other six templates selected.The boxes indicate the conserved cysteine, histidine and asparagine catalytic regions.The catalytic residues forming the triad: Cys25, His161 and Asn182 are highlighted dark green.Glutamine (Gln19) involved in the formation of oxyanion hole is highlighted purple.The residues highlighted orange are cysteine residues forming disulphide bonds.The residues highlighted with blue, yellow, pink and light green correspond to S2, S1, S1' and S3 pockets, respectively.RGD binding motif of SmCL1 is highlighted in black.doi:10.1371/journal.pone.0123996.g002 Fig 4 shows the Z-score plots of 2XU3 (A), 2P7U (B), model no. 12 of SmCL1 (C) and 1M6D (D), which are -7.24,-7.14, -5.34 and -5.81, respectively.Although the Z-score of model 12 is slightly lower than that of templates 2XU3 and 2P7U, it is in the same range of template 1M6D and found as a perfect fit within the structures in PDB.ProQ analysis suggests the model 12 as an extremely good model from the LG score 4.714 (i.e.>4.0), while MaxSub score 0.311 (i.e.>0.1) ranks it to be a fairly well model.QMEAN score of this model, 0.769 (Z-score -0.02), localized in dark zone, confirms the model X_1M6D_2P7U_2XU3_49 as a reliable one (Fig 5).SmCL1 is systematized into cysteine protease [EC.3.4.22.XX] group belonging to papain-superfamily CA, family C1A and utilizes important catalytic residues like glutamine (Gln19), cysteine (Cys25), histidine (His161) and asparagine (Asn182).Residues that constitute the different binding pockets remain strictly conserved in three catalytic regions (shown in the boxes) as presented in Fig 2. Modelled SmCL1 represents highly conserved framework structure of papain-like cysteine protease composed of two domains.The first one is largely α-helixrich (L) domain and the second one is predominantly β-sheet-rich (R) domain, separated by a groove containing the active site (Fig 6A)

Fig 4 .
Fig 4. ProSA analysis for the model structure of SmCL1 (C) the template structures, 2XU3 (A), 2P7U (B) and 1M6D (D) with Z-scores values of -5.34, -7.24, -7.14 and -5.81, respectively.doi:10.1371/journal.pone.0123996.g004 Fig 9A for ZINC compounds and Fig 9B for phytochemicals) suggests that library of phytochemicals has larger number of the actives than decoys, supporting the effectiveness of the phytochemicals as potential leads against SmCL1.

Fig 5 .Fig 6 .Fig 7 .
Fig 5. Absolute quality of SmCL1 model as assessed by QMEAN Z-score.Good models are generally located in the dark zone.The red marker indicates the positioning of SmCL1 model X_1M6D_ 2P7U_2XU3_49.doi:10.1371/journal.pone.0123996.g005

Fig 13 .
Fig 13.Interaction energy plot for 2 ns MD simulation of SmCL1 with ZINC lead 22001688 (A) and Simalikalactone-D (B).

Fig 15 .
Fig 15.Inhibition constant and binding energies of reference and screened compounds.The reference compound is E64 (peptide inhibitor) while rest are screened non-peptide compounds.doi:10.1371/journal.pone.0123996.g015 For thorough analysis of activities of FMK (Mu-Y-(O-methyl)-hF-FMK) and VS (K11777), MD simulation was performed at 2ns.The interaction energy plot of Mu-Y-(O-methyl)-hF-FMK (S1A Fig) suggests it to be non-stable interaction, where association and dissociation takes place until 750 ps, which later stabilizes reaching time interval of 1000 ps.In case of K11777 (S1C Fig), the variation in interaction energy was much higher suggesting it to be more non-stable interaction as compared to Mu-Y-(O-methyl)-hF-FMK; the association and dissociation in K11777 takes place until 750 ps, which later destabilises and dissociation continues till the end of MD run.Further, in the case of Mu-Y-(O-methyl)-hF-FMK, the hydrogen interaction number (S1B Fig) reaches two and for most of time remains one in number; whereas, in the case of K11777, the hydrogen interaction number (S1D Fig) reaches two from 250-750 ps and for much lesser time remains one in number.This illustration further confirms the greater stability of Mu-Y-(O-methyl)-hF-FMK over K11777.
The interaction energy plot (S3A Fig) of dihydrochalcone-compound 3 (best hit due to interaction with active site residue) suggests it to be a stable complex with lesser energy variation over the MD run.The hydrogen interactions (S3B Fig) for compound 3 reach a maximum of five and remain 2 or 3 for most of the time.Out of the non-peptide non-phytochemical inhibitors, Neq 42 and thiazolidinones (4p) energy plots (S3C and S3G Fig) suggest them as slight non-stable complexes, where the association takes place till 1750 ps, which later destabilises resulting in dissociation of the complex.The hydrogen interactions (S3D Fig) for Neq 42 reach a maximum of three and remain one for most of the time, whereas for thiazolidinones (4p) (S3H Fig) alternate between one and two over 2 ns.As discussed above, docking protocol identified that cruzain inactive non-peptide inhibitors (Neq 175 and thiazolodinones-4k) act as decoys for SmCL1 too; the same was also proved by MD simulation.The interaction energy (S3E and S3I Fig) and hydrogen interaction (S3F and S3J Fig) plots suggest that Neq 175 and thiazolidinones (4k) exhibit low energy values (-20 Kcal/mol) and null hydrogen interactions over 2 ns.

Fig 16 .
Fig 16.Pharmacophore features of Simalikalactone-D represented via Ligand Scout.The coloured spheres represent the pharmacophore features of the lead: yellow sphere for hydrophobic interaction; red and green spheres for hydrogen bond acceptor and donor, respectively.doi:10.1371/journal.pone.0123996.g016

Table 2 .
Identified homologs of SmCL1 based on the filter parameters discussed in the text.

Table 6 .
Best docks from the ligand library from Phytochemical Database.