AI-assisted discovery of potent FGFR1 inhibitors via virtual screening and in silico analysis

  • Ram Lal (Swagat) Shrestha,

    Contributed equally to this work with: Ram Lal (Swagat) Shrestha, Ashika Tamang, Sandeep Poudel Chhetri

    Roles Data curation, Investigation, Methodology, Resources, Software, Writing – original draft

    Affiliations Department of Chemistry, Amrit Campus, Tribhuvan University, Lainchaur, Kathmandu, Nepal, Kathmandu Valley College, Syuchatar Bridge, Kalanki, Kathmandu, Nepal, Institute of Natural Resources Innovation, Kalimati, Kathmandu, Nepal

  • Ashika Tamang,

    Contributed equally to this work with: Ram Lal (Swagat) Shrestha, Ashika Tamang, Sandeep Poudel Chhetri

    Roles Data curation, Investigation, Methodology, Writing – original draft

    Affiliations Department of Chemistry, Amrit Campus, Tribhuvan University, Lainchaur, Kathmandu, Nepal, Kathmandu Valley College, Syuchatar Bridge, Kalanki, Kathmandu, Nepal

  • Sandeep Poudel Chhetri,

    Contributed equally to this work with: Ram Lal (Swagat) Shrestha, Ashika Tamang, Sandeep Poudel Chhetri

    Roles Data curation, Investigation, Methodology, Writing – original draft

    Affiliations Kathmandu Valley College, Syuchatar Bridge, Kalanki, Kathmandu, Nepal, Central Department of Physics, Tribhuvan University, Kirtipur, Kathmandu, Nepal

  • Nirmal Parajuli,

    Roles Validation, Visualization

    Affiliations Department of Chemistry, Amrit Campus, Tribhuvan University, Lainchaur, Kathmandu, Nepal, Kathmandu Valley College, Syuchatar Bridge, Kalanki, Kathmandu, Nepal

  • Manila Poudel,

    Roles Investigation, Validation

    Affiliation Kathmandu Valley College, Syuchatar Bridge, Kalanki, Kathmandu, Nepal

  • Shiva M. C.,

    Roles Visualization

    Affiliations Department of Chemistry, Amrit Campus, Tribhuvan University, Lainchaur, Kathmandu, Nepal, Kathmandu Valley College, Syuchatar Bridge, Kalanki, Kathmandu, Nepal

  • Aakar Shrestha,

    Roles Visualization

    Affiliation Kathmandu Valley College, Syuchatar Bridge, Kalanki, Kathmandu, Nepal

  • Timila Shrestha,

    Roles Resources

    Affiliations Department of Chemistry, Amrit Campus, Tribhuvan University, Lainchaur, Kathmandu, Nepal, Kathmandu Valley College, Syuchatar Bridge, Kalanki, Kathmandu, Nepal

  • Samjhana Bharati,

    Roles Data curation, Resources

    Affiliations Department of Chemistry, Amrit Campus, Tribhuvan University, Lainchaur, Kathmandu, Nepal, Kathmandu Valley College, Syuchatar Bridge, Kalanki, Kathmandu, Nepal

  • Binita Maharjan,

    Roles Data curation, Resources, Software

    Affiliations Department of Chemistry, Amrit Campus, Tribhuvan University, Lainchaur, Kathmandu, Nepal, Kathmandu Valley College, Syuchatar Bridge, Kalanki, Kathmandu, Nepal

  • Bishnu P. Marasini,

    Roles Formal analysis, Supervision, Writing – review & editing

    subinadhikari2018@gmail.com (JAS); bishnu.marasini@gmail.com (BPM)

    ‡ BPM and JAS also contributed equally to this work.

    Affiliations Kathmandu Valley College, Syuchatar Bridge, Kalanki, Kathmandu, Nepal, Institute of Natural Resources Innovation, Kalimati, Kathmandu, Nepal, Nepal Health Research Council, Ministry of Health and Population, Ramshah Path, Kathmandu, Nepal

  • Jhashanath Adhikari Subin

    Roles Conceptualization, Formal analysis, Supervision, Writing – review & editing

    Affiliations Kathmandu Valley College, Syuchatar Bridge, Kalanki, Kathmandu, Nepal, Bioinformatics and Cheminformatics Division, Scientific Research and Training Nepal P. Ltd., Bhaktapur, Kathmandu, Nepal

Abstract

Fibroblast growth factor receptor 1 (FGFR1) is recognized as an oncogene that fosters tumor development, playing a vital role in cancer progression. This has established it as a promising target for cancer drug development. However, existing FGFR1 inhibitors are often limited by drug resistance and lack of specificity, emphasizing the need for more selective and potent alternatives. To address this challenge, the present study employed an AI-driven virtual screening approach, integrating molecular docking (MD) and molecular dynamics simulations (MDS) to discover novel FGFR1 inhibitors. A voting classifier integrating three machine learning classifiers was utilized to screen 10 million compounds from the eMolecules database, leading to 44 promising candidates with a prediction probability exceeding 80%. MD identified the compound with PubChem Compound Identifier (CID) 165426608 (−10.8 kcal/mol) as the highest-scoring ligand, while the compounds with CID 145940129 (−9.8 kcal/mol), CID 131910163 (−9.4 kcal/mol), CID 155915988 (−9.2 kcal/mol), and CID 132423733 (−9.1 kcal/mol) exhibited binding affinities comparable to or slightly lower than that of the native ligand (−10.4 kcal/mol). MDS further revealed that all these compounds, except CID 131910163, maintained structural stability over time. Thermodynamic stability assessment confirmed the spontaneity and feasibility of their complex formation reactions, with negative ΔGBFE values ranging from −21.87 to −12.76 kcal/mol. Decomposition of the binding free energy change further identified key stabilizing residues. The heatmaps and histograms of the interactions over the full 200 ns simulation period highlighted the prominent interaction profiles. Structural similarity analysis of the four MDS-stable compounds yielded Dice similarity scores of 0.200000 to 0.452830 with known FGFR1 inhibitors. Additionally, the pIC50 prediction using a voting regressor indicated promising pIC50 values (7.07 to 7.47), highlighting their potential as hit candidates for further structural optimization and therapeutic development. Further, this study underscores the efficiency of machine learning-based virtual screening and in silico analysis as a cost-effective and reliable strategy for accelerating hit discovery from large datasets, even with limited resources and time.

1. Introduction

Cancer represents a critical global health challenge, accounting for one in every six deaths worldwide. In 2022, around 20 million individuals were newly diagnosed with cancer, and approximately 9.7 million deaths were attributed to cancer-related diseases [1]. Despite significant advancements in different therapeutic approaches such as chemotherapy, hormone therapy, and immunotherapy, cancer mortality rates remain alarmingly high. This is mainly because of the complex genetic and phenotypic diversity of cancer, and the emergence of drug-resistant phenotypes [2].

The fibroblast growth factor receptor (FGFR) signaling axis plays a crucial role in transducing signals that govern various cellular processes, including proliferation, angiogenesis, differentiation, embryonic development, migration, organogenesis, and survival [3]. Members of the fibroblast growth factor receptor family, including FGFR1, frequently undergo genomic alterations such as mutations, amplifications, and gene fusions across various cancer types [4]. FGFR1, in particular, has been extensively studied and recognized as an oncogene that fosters tumor development, underscoring its critical role in cancer progression [5]. Overexpression of FGFR1 has been observed in cancers such as breast [6,7], lung [8], ovarian [9,10], bladder [11], prostate [12,13], and gastric cancers [14], among others. Consequently, targeting FGFR1 for cancer therapy has become an appealing therapeutic strategy [15]. To date, drugs such as Regorafenib, Nintedanib, Sorafenib, Lenvatinib, Erdafitinib, Pemigatinib, Infigratinib, and Futibatinib have received FDA approval for FGFR1 inhibition [16–18]. These inhibitors work by reducing FGFR1 activity, which is often overexpressed in certain cancers, thereby impeding tumor growth. However, the efficacy of these inhibitors is limited due to challenges like drug resistance and lack of specificity [19]. Therefore, the development of novel inhibitors with enhanced effectiveness and reduced side effects remains a significant challenge.

Traditional drug discovery methods rely heavily on in vivo experiments and in vitro screening, which are both costly and labor-intensive [20]. Preclinical drug discovery constitutes approximately one-third of the drug development expenses and usually takes nearly five and a half years [21,22]. The high failure rate during drug development further exacerbates the costs. As a result, methodologies that can reliably predict success at early stages are critically valuable. Computer-aided drug design (CADD) has emerged as a transformative approach in this domain [23]. By employing in silico techniques, CADD accelerates drug discovery and reduces the time required for identifying leads and introducing new drugs. These methods also enable the prediction of biological activity for chemical compounds against specific targets [24]. In this study, an Artificial Intelligence (AI)-driven virtual screening approach of millions of molecules was adopted to advance FGFR1-targeted drug discovery [25,26]. By leveraging AI’s ability to analyze vast datasets and predict drug efficacy in a relatively short time span, this study aims to streamline the drug development process and increase the likelihood of identifying alternate treatments [27]. Fig 1 outlines the detailed workflow adopted in this study.

2. Materials and methods

2.1. Data collection and curation

The Chemical European Molecular Biology Laboratory (ChEMBL) database was used to obtain the Simplified Molecular Input Line Entry System (SMILES) representations and half-maximal inhibitory concentration (IC50) values for 2,153 FGFR1 inhibitors [28]. The IC50 value represents the concentration of a compound required to inhibit a specific biological process or activity by 50%, and it serves as a preliminary guide for selecting efficient and biologically active molecules. After removing entries without IC50 values, retaining bioactivity data measured in nanomolar (nM), and removing duplicates, 1876 data points remained. The IC50 values were transformed into pIC50 values (the negative base-10 logarithm of the molar IC50) to standardize the data. Lipinski’s Rule of Five (RO5) was applied to assess drug-likeness and exclude compounds that violate it [29,30], resulting in 1523 data points for model training. Radar plots depicting the physicochemical properties of the filtered dataset are shown in Fig 2.

Fig 2. Physicochemical radar plots of the compounds in the dataset that (a) fulfill RO5 and (b) violate RO5.

https://doi.org/10.1371/journal.pone.0331837.g002
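The curation steps described above can be illustrated with a minimal sketch. It assumes IC50 values in nM and the commonly cited RO5 limits (molecular weight ≤ 500 Da, logP ≤ 5, at most 5 hydrogen-bond donors, at most 10 acceptors). The function names and the strict "no violations" policy are assumptions made for illustration, since the text does not state how many violations were tolerated.

```python
import math


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in molar)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 M, so -log10(x * 1e-9) = 9 - log10(x)
    return 9.0 - math.log10(ic50_nm)


def passes_ro5(mol_weight: float, logp: float,
               h_donors: int, h_acceptors: int) -> bool:
    """Strict Rule-of-Five check (assumption: no violations allowed)."""
    return (mol_weight <= 500.0 and logp <= 5.0
            and h_donors <= 5 and h_acceptors <= 10)


# Hypothetical compound: IC50 of 100 nM corresponds to pIC50 = 7.0
print(ic50_nm_to_pic50(100.0))
print(passes_ro5(mol_weight=450.0, logp=3.2, h_donors=2, h_acceptors=6))
```

With this conversion, the pIC50 threshold of 7.0 used later for the active/inactive split corresponds to an IC50 of 100 nM.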

2.2. Model building and database screening

Molecular fingerprints [31], encoded as numerical vectors or bit-strings, facilitate rapid similarity evaluations critical for virtual screening [31,32], structure-activity relation studies, and chemical space mapping [33]. Using the RDKit toolkit [34], fingerprints from SMILES entries were computed, and the dataset was classified into 813 active and 710 inactive compounds (1523 total) using a pIC50 threshold of 7.0, as a cut-off ranging from 5 to 7 has been recommended [26]. Based on the Morgan3 protocol, which employs 2048 bits as a circular fingerprint [35], machine learning models were constructed using Scikit-learn and XGBoost. The Morgan3 fingerprints (radius = 3) encode molecular features extending up to three bonds from each atom, allowing the capture of broader substructural patterns that may play a vital role in determining biological activity. Scikit-learn is a versatile machine-learning library that provides a diverse range of algorithms for classification, regression, clustering, and dimensionality reduction tasks [36]. XGBoost is an advanced library tailored for the fast and scalable execution of gradient-boosting algorithms [37]. Twenty classification models were trained, and the best-performing models were fine-tuned to create a voting classifier, amplifying accuracy and robustness in comparison with the individual models [38]. A similar approach was applied for building a voting regressor. The voting classifier was then employed to screen 10 million compounds from the eMolecules database [39,40]. Compounds with invalid SMILES, those violating the Rule of Five (RO5), and Pan-Assay Interference Compounds (PAINS) were excluded prior to screening.

The performance of the classification models was evaluated using accuracy, precision, sensitivity, specificity, and Area Under Curve (AUC) metrics, calculated based on the confusion (error) matrix [41]. Regressors were assessed according to mean absolute error (MAE), root-mean-squared error (RMSE), and R² scores [42]. Model training and database screening were carried out as a sequence of systematic steps to identify potential molecules.
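For readers less familiar with these metrics, the following sketch shows how they follow from confusion-matrix counts and from paired true/predicted values. The counts and pIC50 values are hypothetical and are not taken from the study. AUC is omitted because it requires ranked probability scores rather than a single confusion matrix.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, precision, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "specificity": tn / (tn + fp),   # true negative rate
    }


def regression_metrics(y_true: list, y_pred: list) -> dict:
    """MAE, RMSE, and R² (coefficient of determination) for paired values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical counts and pIC50 values, for illustration only
print(classification_metrics(tp=80, fp=10, tn=90, fn=20))
print(regression_metrics([6.0, 7.0, 8.0], [6.5, 7.0, 7.5]))
```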

2.3. Molecular docking calculations

The 44 potential inhibitors with prediction probabilities above 80% obtained from the screening of 10 million compounds from the eMolecules database were selected as hit ligands. The 3D structures of 35 compounds available in the PubChem database (https://pubchem.ncbi.nlm.nih.gov/) [43] were retrieved in SDF format, while the remaining 9 compounds were drawn using their SMILES strings. The molecular formulas were verified using the Avogadro program (v1.2.0) [44] after adding the hydrogen atoms. Energy minimization was carried out using the UFF force field with 5000 steps employing a conjugate gradient algorithm, ensuring energy convergence at 1.0 × 10⁻⁸ kcal/mol. This process was repeated until the global minimum was reached. The bond orders, including double bond positions, were examined, and steric hindrance or strain was removed. Finally, the optimized ligands were converted to PDBQT format with Gasteiger charges using AutoDock Tools [45].

The 3D crystal structure of FGFR1 (PDB ID: 4ZSA, DOI: https://doi.org/10.2210/pdb4ZSA/pdb) with X-ray crystallographic resolution of 2.00 Å was obtained from the RCSB database (https://www.rcsb.org/) [46]. Missing amino acid residues were repaired using the SWISS-MODEL server (https://swissmodel.expasy.org/) [47], where model_01 of template 5B7V.1.A (global model quality estimate: 0.88, qualitative model energy analysis with distance constraints: 0.83 ± 0.05) was selected due to its 100% sequence identity. The finalized protein structure was converted to PDBQT format with the addition of polar hydrogens and Kollman charges using AutoDock Tools. The apo form of the protein was then utilized as the target for computational analyses.

The molecular docking calculations of the ligands with the FGFR1 protein were done using AutoDock Vina [48], following the protocol outlined by Phunyal et al. [49] with slight modifications of parameters. A grid box size of (50, 50, 50) Å³, a grid center at (x: 5.060, y: −0.501, z: 16.013), an energy range of 4, and 20 binding modes were used, with an exhaustiveness of 64 (at which the results had converged). The five protein-ligand complexes with the top binding affinities were saved in PDB format and utilized for MDS. The binding interaction between the protein and ligand was visualized using the PyMOL [50] program and the protein-ligand interaction profiler (PLIP) [51] web server.
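The docking parameters listed above map directly onto the keys of an AutoDock Vina configuration file. The sketch below builds such a file as text. The file names passed in are hypothetical placeholders, and the authors' actual input files and any additional options are not known from the text.

```python
def build_vina_config(receptor: str, ligand: str, out: str) -> str:
    """Return AutoDock Vina config text using the grid settings reported above."""
    settings = [
        ("receptor", receptor),        # hypothetical file name
        ("ligand", ligand),            # hypothetical file name
        ("out", out),                  # hypothetical file name
        ("center_x", "5.060"),
        ("center_y", "-0.501"),
        ("center_z", "16.013"),
        ("size_x", "50"),
        ("size_y", "50"),
        ("size_z", "50"),
        ("exhaustiveness", "64"),
        ("num_modes", "20"),
        ("energy_range", "4"),
    ]
    return "\n".join(f"{key} = {value}" for key, value in settings) + "\n"


print(build_vina_config("fgfr1.pdbqt", "ligand.pdbqt", "ligand_out.pdbqt"))
```

A file written this way would typically be passed to the program as `vina --config <file>`.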

2.4. Molecular dynamics simulations

The GROMACS (version 2021.2) software [52] was used to simulate the protein-ligand complex, with the CHARMM36 force field [53] applied to the receptor, while the ligand parameters were derived from the SwissParam server [54]. The system was solvated using the TIP3P water model in a triclinic box with a 12 Å spacing at the sides, to avoid artifacts arising from interactions with periodic images of the simulation box. Neutralization was achieved by adding counter ions, followed by the inclusion of an isotonic NaCl solution (0.15 M). Equilibration was carried out in four steps of 1 ns each at 310 K and 1 bar, with the first two using the NVT ensemble and the last two using NPT. The V-rescale thermostat, which is a modified version of the Berendsen method, was used for the temperature coupling, while pressure coupling was applied through the isotropic Parrinello-Rahman approach. The final 200 ns production run, with a 2 fs step size, was performed without constraints on the protein-ligand complex. Additional parameter details for different system setups can be found in the literature [55–58]. The complex was centered and analyzed using GROMACS built-in modules to obtain snapshots and geometric parameters such as the root mean square deviation (RMSD) of the ligand and protein backbone, root mean square fluctuation (RMSF), radial pair distribution function (RPDF), radius of gyration (Rg), and solvent-accessible surface area (SASA).

The thermodynamic stability of the protein-ligand complexes was evaluated using the Molecular Mechanics Poisson-Boltzmann Surface Area (MMPBSA) binding free energy calculations [59], following the parameters as utilized by Shrestha et al. [60]. A 20 ns equilibrated segment from the 200 ns molecular dynamics trajectory was used for this purpose. Calculations were carried out using the MMPBSA module [61], which applies the Poisson–Boltzmann solvation model. Binding free energy was computed at 100 ps intervals to assess the stability, spontaneity, and feasibility of complex formation over time. The spontaneity and viability of the forward reaction were evaluated based on the sign of the free energy changes. To investigate the contribution of individual amino acid residues to binding free energy change, decomposition analysis was performed on the same 20 ns equilibrated segment using the g_mmpbsa tool [62], which allowed the decomposition of the total binding energy to identify key stabilizing and destabilizing residues. The associated gmx_MMPBSA_ana subprogram was used for data analysis and visualization.
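As a rough illustration of how a per-frame binding free energy is combined into a reported value, the sketch below uses the end-state relation ΔG_bind = G_complex − (G_receptor + G_ligand) for each frame and then takes the mean and sample standard deviation. The energies are hypothetical, and the function name is an assumption. The actual MMPBSA tools compute the individual terms (molecular mechanics, polar and non-polar solvation) internally.

```python
import statistics


def binding_free_energy(frames: list) -> tuple:
    """Mean and sample standard deviation of per-frame binding free energies.

    Each frame is a tuple (G_complex, G_receptor, G_ligand) in kcal/mol.
    """
    per_frame = [g_complex - (g_receptor + g_ligand)
                 for g_complex, g_receptor, g_ligand in frames]
    return statistics.mean(per_frame), statistics.stdev(per_frame)


# Hypothetical end-state energies for two frames (kcal/mol)
frames = [(-100.0, -60.0, -20.0), (-104.0, -61.0, -21.0)]
mean_dg, sd_dg = binding_free_energy(frames)
print(mean_dg, sd_dg)
# A negative mean indicates that complex formation is thermodynamically favorable
print("favorable" if mean_dg < 0 else "unfavorable")
```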

Additionally, to understand residue-level interaction dynamics over the entire course of the simulation, amino acid interaction heatmaps and histograms were generated using the full 200 ns trajectory (20,000 frames).

2.5. Similarity analysis

Based on the principle that structurally similar compounds often share chemical and biological properties, a similarity analysis between potential and known FGFR1 inhibitors [63] was conducted. The relevance of this analysis is closely tied to the nature of structure-activity relationships (SARs) that define biologically active molecules, serving as key factors for the success of ligand-based virtual screening, regardless of the methods employed [64]. Using Morgan2 fingerprints, the similarity maps based on the dice similarity metric were generated, highlighting structural features influencing biological activity [65,66]. Morgan2 fingerprints (radius = 2) encode molecular features extending up to two bonds, emphasizing localized structural variations. In similarity maps, Morgan2 is beneficial for highlighting key local substructures that impact activity, enhancing the interpretability of the visualization.
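The Dice metric itself is simple to state: for two fingerprints with on-bit sets A and B, the score is 2|A ∩ B| / (|A| + |B|). The sketch below applies this to plain Python sets of bit indices. The bit indices are hypothetical, and the binary (rather than count-based) form is an assumption, since the text does not say which variant was used.

```python
def dice_similarity(bits_a: set, bits_b: set) -> float:
    """Dice coefficient between two fingerprints given as sets of on-bit indices."""
    if not bits_a and not bits_b:
        return 0.0  # convention chosen here for two empty fingerprints
    return 2.0 * len(bits_a & bits_b) / (len(bits_a) + len(bits_b))


# Hypothetical on-bit indices for a candidate and a known inhibitor
candidate = {12, 87, 301, 1544}
known_inhibitor = {301, 1544, 1602, 2001}
print(dice_similarity(candidate, known_inhibitor))
```

A score of 1 means identical on-bit sets and 0 means no shared bits, so low to moderate scores against known inhibitors point to scaffolds that are structurally distinct from them.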

2.6. Computational resources

All the calculations, including plot generation, were executed on high-performance multiprocessor systems. The machine learning computations were conducted on a system featuring 96 cores, 256 GB of RAM, a 16 GB GPU accelerator, and running Ubuntu 20.04 LTS. Meanwhile, MD and MDS were performed on a system with 24 cores, a 24 GB GPU accelerator, and running Ubuntu 20.04 LTS. The analysis and the visualization of the data were done on a personal computer with Windows 11 operating system.

3. Results and discussion

3.1. Model evaluation and screening results

Twenty classification models were evaluated, and their performance metrics are summarized in Table 1.

Table 1. General performance of 20 different classification models.

https://doi.org/10.1371/journal.pone.0331837.t001

The Support Vector Classifier (SVC), ExtraTreesClassifier (ET), and Extreme Gradient Boosting Classifier (XGB) demonstrated superior accuracy and AUC scores, leading to their integration into a voting classifier with a soft voting mechanism. The soft voting mechanism is an ensemble learning technique that predicts the final class by averaging the probability estimates from multiple models and selecting the class with the highest mean probability [67]. For SVC, the parameters were set to C = ‘2.0’ and probability = ‘True’; for ET, n_estimators = ‘400’, criterion = ‘log_loss’, and max_features = ‘log2’; for XGB, n_estimators = ‘1000’, max_depth = ‘5’, and learning_rate = ‘0.04’; all other parameters were set to their default values.
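The soft voting rule described above can be written in a few lines. The sketch below averages class-probability vectors from several models and picks the class with the highest mean. The probabilities are hypothetical, and equal model weights are an assumption, since the text does not mention weighting.

```python
def soft_vote(model_probs: list) -> list:
    """Average class-probability vectors from several models (equal weights)."""
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    return [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]


# Hypothetical [P(inactive), P(active)] from three classifiers for one compound
probs = soft_vote([[0.10, 0.90], [0.30, 0.70], [0.15, 0.85]])
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted_class)
# The compound is retained as a hit if the mean P(active) exceeds 0.80
print(probs[1] > 0.80)
```

In this example the mean probability of the active class is about 0.817, so the compound would pass the 80% cut-off used for screening even though one model alone scored it at 0.70.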

Fig 3 presents confusion matrices for individual and voting classifiers, while Table 2 summarizes the five-fold cross-validation results.

Table 2. Five-fold cross-validation of individual classifiers and voting classifier using five parameters.

https://doi.org/10.1371/journal.pone.0331837.t002

Fig 3. Confusion matrix for (a) SVC; (b) ExtraTreesClassifier; (c) XGB Classifier; (d) Voting Classifier.

https://doi.org/10.1371/journal.pone.0331837.g003

ROC curves in Fig 4 illustrate the excellent discrimination ability of these models.

Fig 4. ROC curves of the individual models and the voting classifier.

https://doi.org/10.1371/journal.pone.0331837.g004

The AUC scores show that all the classifiers have excellent discrimination abilities between active and inactive compounds. Additionally, the voting classifier was tested on an external test set consisting of the FDA-approved selective inhibitors of FGFR1: Erdafitinib, Pemigatinib, Infigratinib, and Futibatinib. The model classified all these drugs as active with high prediction probabilities (> 90%), further supporting the reliability of the model. Based on these results, the voting classifier was used to screen the eMolecules database, identifying 44 compounds with prediction probabilities above 80% as potential active inhibitors of the FGFR1 protein.

3.2. Docking score comparison and interaction analysis

The molecular docking protocol was validated by docking the native ligand into the apo protein’s active site. The pose of the docked native ligand from the molecular docking was superimposed on its pose from the crystal structure, as shown in Fig 5, resulting in a heavy atom RMSD of 0.397 Å (< 2 Å) [68]. This validated the docking parameters, algorithm, and predicted poses, indicating that the adopted protocol can reproduce the experimentally observed binding mode.

Fig 5. Superimposed image of native ligand (blue) from the crystal structure with the docked native ligand (magenta) from molecular docking calculations (a heavy atom RMSD = 0.397 Å).

https://doi.org/10.1371/journal.pone.0331837.g005
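The redocking check above relies on the heavy-atom RMSD between two poses of the same ligand. A minimal sketch of that calculation follows. It assumes both poses list the same heavy atoms in the same order and share the receptor's coordinate frame, so no additional fitting is applied. The coordinates are hypothetical.

```python
import math


def heavy_atom_rmsd(pose_a: list, pose_b: list) -> float:
    """RMSD (same units as the input) between two matched lists of (x, y, z) atoms."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and contain the same atoms")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b))
    return math.sqrt(squared / len(pose_a))


# Hypothetical heavy-atom coordinates (Å): crystal pose vs. redocked pose
crystal = [(1.0, 2.0, 3.0), (2.5, 2.0, 3.0), (3.0, 3.5, 3.0)]
docked = [(x + 0.3, y, z + 0.4) for x, y, z in crystal]
rmsd = heavy_atom_rmsd(crystal, docked)
print(rmsd)
print(rmsd < 2.0)  # threshold used above to accept the docking protocol
```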

The binding affinities and poses of ligands interacting with the FGFR1 protein were obtained from molecular docking calculations. The 44 ligands obtained by the virtual screening of 10 million compounds were subjected to molecular docking calculations against the FGFR1 receptor to assess their potential for competitive inhibition. Ligand M34 exhibited the highest binding affinity (−10.8 kcal/mol), outperforming the native ligand (−10.4 kcal/mol). Additionally, four ligands, M29 (−9.8 kcal/mol), M26 (−9.4 kcal/mol), M32 (−9.2 kcal/mol), and M28 (−9.1 kcal/mol), demonstrated binding affinities comparable to or slightly lower than that of the native ligand. The binding affinity of 44 ligands is presented in Table 3 along with their molecular formula, PubChem chemical identifier (CID), and parent ID.

Table 3. Binding affinities for 44 ligands with the FGFR1 protein, along with that of the native ligand. (Bold-faced entries were chosen for MDS.)

https://doi.org/10.1371/journal.pone.0331837.t003

The mode of interactions was studied for the complexes of M26, M28, M29, M32, and M34 with the protein. The 3D representations of the complexes, as shown in Fig 6, demonstrated that all top five ligands were bound at the catalytic site of the protein, suggesting that they may act as competitive inhibitors.

Fig 6. 3D representations (left) of docked ligands at the catalytic pocket of the protein in complexes and the molecular structures (right) of the ligands.

https://doi.org/10.1371/journal.pone.0331837.g006

To better understand the details at the molecular level, the bonding interactions between the top five docked ligands and key amino acid residues, along with the distances, were studied (Table 4) and the interaction profiles are presented in Fig 7. The interaction analysis revealed several key hydrophobic interactions between the ligands and the protein’s amino acid residues, along with hydrogen bonds.

Table 4. Types of interactions between the top five docked ligands and key amino acid residues in the protein-ligand complexes.

https://doi.org/10.1371/journal.pone.0331837.t004

Fig 7. Molecular-level interaction profiles of top docked complexes along with that of the native complex.

https://doi.org/10.1371/journal.pone.0331837.g007

The molecular interaction analysis revealed that all ligand-protein complexes (M26-M34) were primarily stabilized by hydrophobic interactions, especially with residues Leu27, Val35, Val104, Leu173, and Phe185, which appeared frequently across the top complexes and the native complex, except M29-complex. Among these, residues Leu27, Leu173, and Phe185 were the most consistently conserved hydrophobic residues, underlining their importance in ligand anchoring. Notably, residue Asp184 (3–4 Å) contributed to hydrogen bonding in M26 and M32, while M26 further exhibited additional hydrogen bonds with Phe185 (3.93 Å) and Gly186 (4.08 Å), suggesting a stabilizing role of Asp184 that was not observed in the native complex.

In contrast, the native complex formed distinct hydrogen bonds with Glu105 (2.80 Å) and Ala107 (2.87, 2.98 Å), interactions not replicated by the ligand-bound complexes. Despite this, the native complex shared hydrophobic contacts with residues Leu27, Val104, Leu173, and Phe185, which were also involved in complexes of M26, M28, M32, and M34, indicating partial overlap in binding site occupation. M29-complex exhibited a unique interaction pattern, including hydrogen bonds (Lys57: 3.06 Å, Asp195: 2.90 Å), a π-stacking interaction with Phe32, and distinct hydrophobic residues, implying a different binding orientation. Additionally, M28 uniquely formed a halogen bond with residue Ile88 (3.51 Å) via a fluorine atom.

Overall, hydrophobic interactions emerged as the key driving force for ligand binding, with residues Leu27, Val104, Leu173, and Phe185 serving as conserved contributors across multiple complexes. In addition, hydrogen bond distances ranging from 2.90 to 4.03 Å indicated strong to moderate binding affinity, as shorter bond lengths generally correlate with stronger interaction [69]. While most protein-ligand complexes exhibited similar hydrophobic interaction patterns with the native complex, M29 displayed a distinct binding profile, which may influence its receptor modulation potential.

3.3. Adduct stability with time (spatial and energetic)

Understanding the spatial and energetic stability of the adduct is crucial for evaluating the inhibitory potential of ligands on the FGFR1 protein. To achieve this, MDS was performed for 200 ns. The structural and interaction stability of the protein-ligand complexes for the top five ligands was analyzed by examining various time-dependent parameters. The spontaneity and feasibility of the complex formation reactions for the top five ligands were determined in terms of changes in binding free energy. Both geometric and thermodynamic parameters are discussed in the following sections.

3.3.1. Structural stability assessment.

The binding of the ligand to the protein can induce structural alterations in both the protein and the ligand, which may affect the stability of the complex and hence the inhibition mechanism [70]. The stability of the top five ligand complexes was evaluated by analyzing various computational metrics from MDS trajectories, which provided insight into the structural integrity of the protein-ligand complexes. The metrics include the ligand pose (snapshots), the RMSD of the ligands and the protein backbone relative to the protein backbone, the RMSF of the α-carbon atoms, RPDF, Rg, and SASA, which are discussed next.

Dynamic insights into ligand behavior at the protein active site through MDS: Snapshots were taken at various time intervals during the MDS to examine the orientation and position of the docked ligands, providing insight into the stability of the complex’s geometry over time. Detailed images of the top four complexes at five distinct instances are presented in Fig 8.

Fig 8. Ligand’s positions and orientations within the protein’s active site across four different complexes at five distinct instances, as extracted from the MDS trajectories.

https://doi.org/10.1371/journal.pone.0331837.g008

Snapshots taken at 1, 50, 100, 150, and 200 ns showed that most ligands remained at the same location but with variations in orientation, except for a few cases. For the M28-complex, the ligand exhibited distinct rotational motion starting at 1 ns, accompanied by a slight upward position shift from 100 ns till the end due to translational movement. The protein backbone displayed some motion, particularly in the α-helix and loops (on the right side) from 50 ns onward, with subtle movement in the central β-sheet. In the case of M29-complex, the ligand underwent significant positional and orientational changes at the active site, with pronounced rotation at 50 ns and minimal rotational motion thereafter. From 100 ns onward, the ligand moved slightly upward, eventually returning to its position at 50 ns by 200 ns. The protein backbone showed the movement of the right α-helix (absent before but observed after 50 ns till the end), along with motions in the left β-sheet and central loops. In the M32-complex, the ligand depicted minimal delocalization and rotation until 100 ns, followed by rotation and a slight downward shift in the position along with the noticeable motion of the α-helix and loops (150 ns). In the M34-complex, the ligand remained stable with some rotation for the first 50 ns. After this period, it slightly shifted upward along with the protein backbone, maintaining minimal rotational and translational movement until the end of the simulation. The protein backbone showed notable dynamics, including the disappearance of the left α-helix after 1 ns and upward movement of the top lying α-helix (100 ns). The loops fluctuated throughout the simulation. The M26-complex was excluded from Fig 8 as the ligand showed displacement from the orthosteric site after 150 ns, suggesting weak binding or instability that compromised complex integrity. 
This indicated that the binding affinity of −9.4 kcal/mol, even though better than that observed for M32 and M28 complexes, was not sufficient to retain the pose and position at the active site. This implies that MD does not necessarily provide information about the stability of the complexes.

Periodic monitoring of adducts’ dynamical behavior provided valuable insights into molecular evolution, which could be linked to specific structural descriptors. Overall, the results demonstrated that the ligand’s pose and the protein backbone’s structural integrity were nearly preserved across the top four complexes (M28, M29, M32, and M34) with minimal structural changes and no major disruptions, suggesting the stability of the complexes. On the other hand, the M26-complex was unstable due to ligand displacement. These findings can be correlated with the RMSD and RPDF curves, which will be discussed next.

RMSD of ligand and protein backbone in the complex: The stability and dynamic behavior of the protein-ligand complexes were analyzed by examining the RMSD of the ligands and protein backbones. The RMSD for both the ligands and protein backbones with respect to the protein backbone was calculated from the MDS trajectories of various complexes and is displayed in Figs 9 and 10, respectively.

Fig 9. RMSD curves of top ligands relative to protein backbone; M26-complex (green), M28-complex (blue), M29-complex (red), M32-complex (maroon), and M34-complex (magenta).

https://doi.org/10.1371/journal.pone.0331837.g009

Fig 10. RMSD curves of protein backbone with respect to protein backbone in holo form along with the protein backbone in apo form (black); M26-complex (green), M28-complex (blue), M29-complex (red), M32-complex (maroon), and M34-complex (magenta).

https://doi.org/10.1371/journal.pone.0331837.g010

The RMSD of ligands relative to the protein backbone (Fig 9) provides insight into the extent to which the pose is conserved over time. The RMSD profiles of M28 (blue) and M29 (red) in their respective complexes exhibited smooth trajectories, with average RMSD values of 0.38 ± 0.11 nm and 0.56 ± 0.08 nm, respectively. A slight increase in fluctuation was observed for M29 after 155 ns, attributed to an orientation shift of the ligand beyond 150 ns, as illustrated in Fig 8. In the case of M34 (magenta), the RMSD trajectory was relatively flat after 95 ns, whereas M32 (maroon) showed a moderate RMSD curve throughout the simulation period with some fluctuation after 120 ns. The fluctuation before 95 ns in the M34-complex and after 120 ns in the M32-complex is consistent with the ligand’s orientation seen previously in the snapshots (Fig 8). The average RMSD for M34 and M32 was determined to be 0.63 ± 0.16 nm and 0.49 ± 0.16 nm, respectively. Conversely, ligand M26 (green) initially displayed a stable trajectory up to 170 ns, followed by a sharp increase in RMSD, reaching approximately 2.0 nm. This trend suggested an unstable nature of the complex, which aligns with the interpretation from the snapshots.

The backbones (Fig 10) of the M28 (blue) and M34 (magenta) complexes followed similar trajectories, with average RMSD values of 0.31 ± 0.07 nm and 0.34 ± 0.08 nm, respectively, closely matching that of the apo protein (black = 0.30 ± 0.05 nm). In contrast, the backbones of the M29 (red) and M32 (maroon) complexes exhibited slightly lower average RMSD values of 0.26 ± 0.04 nm and 0.29 ± 0.03 nm, respectively. The observed spikes in the RMSD plots corresponded to minor backbone adjustments, as depicted in the snapshots (Fig 8). These findings suggest that the protein backbone remained largely stable across the four complexes, indicating that ligand binding had minimal impact on its overall structure compared to the apo form. Since the M26-complex was unstable, its protein backbone was not discussed.

From the analysis of the RMSD profiles, it was found that all ligands except M26 remained bound at the protein’s active site until the end of the simulation, narrowing the selection of top candidates from five to four. Among the four, ligands M28 and M29 resulted in the most stable complexes, as reflected by their low and consistent RMSD. The M34-complex exhibited good stability, whereas the M32-complex demonstrated moderate stability. In contrast, the complex with M26 showed significant instability, with a sharp increase in RMSD after 170 ns, indicating a loss of stable binding, and therefore, protein backbone analysis was omitted. The protein maintained a sturdy geometry across the top four complexes, capable of holding the ligand at its catalytic site. Hence, the four ligands other than M26 could potentially inhibit the functioning of the FGFR1 protein.

Radial pair distribution function (RPDF): The radial pair distribution function (RPDF) describes how the distance between two entities varies over time. In its reduced form [g(r)], it represents the probability of finding the ligand’s center of mass at a distance r from the protein’s center of mass [71]. Fig 11 represents the RPDF plots for different protein-ligand complexes, which have been derived from the MDS trajectories.
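The idea of a distance distribution between two centers of mass can be sketched as follows. This is a simplified illustration, not the analysis used in the study: it bins per-frame center-of-mass distances into a probability histogram and omits the shell-volume normalization of a full g(r). The helper names, the bin width, and all numbers are hypothetical.

```python
import math
from collections import Counter

def center_of_mass(coords, masses):
    """Mass-weighted mean position of a set of atoms."""
    total = sum(masses)
    return tuple(
        sum(m * c[axis] for m, c in zip(masses, coords)) / total
        for axis in range(3)
    )

def com_distance_distribution(distances, bin_width=0.1):
    """Fraction of frames whose center-of-mass distance falls in each bin.

    Keys are lower bin edges in nm. The shell-volume normalization of a
    full g(r) is deliberately left out of this sketch.
    """
    counts = Counter(math.floor(d / bin_width) for d in distances)
    return {
        round(index * bin_width, 3): count / len(distances)
        for index, count in sorted(counts.items())
    }

# One hypothetical frame: distance between two centers of mass (nm).
ligand_com = center_of_mass([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [1.0, 1.0])
protein_com = center_of_mass([(1.0, 1.0, 0.0), (1.0, -1.0, 0.0)], [1.0, 1.0])
print(math.dist(ligand_com, protein_com))

# Hypothetical per-frame distances (nm); not data from the study.
distances = [0.92, 0.95, 0.97, 0.93, 1.22, 1.25, 0.96, 0.94]
print(com_distance_distribution(distances))
```

In such a histogram, a single populated bin corresponds to a localized ligand, while two well-separated populated bins correspond to occupation of two distinct positions.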

Fig 11. Radial pair distribution function between the center of mass of ligands ((a) M26 = green, M28 = blue, M32 = maroon; (b) M29 = red, M34 = magenta) and protein’s center of mass in various complexes retrieved from the MDS trajectories; a single sharp peak indicates the ligand’s localization and a double peak signifies occupation at two distinct positions during the MDS period.

https://doi.org/10.1371/journal.pone.0331837.g011

The RPDF plot revealed distinct binding behavior of the ligands relative to the protein’s center of mass. M32 (maroon), M28 (blue), and M34 (magenta) displayed two peaks at different distances, whereas M29 (red) exhibited a single peak. For M32, two peaks with a taller one at ca. 0.9 nm and a shorter one at ca. 1.2 nm were observed. Similarly, a tall peak at ca. 1.1 nm and a short peak at ca. 1.4 nm were observed for M28. The two different peaks indicated occupancy at two distinct locations, with the taller peak suggesting a preference for a shorter ligand-protein distance. On the other hand, M34 displayed two peaks of comparable height at ca. 1.0 nm and 1.2 nm, implying that it occupied two distinct locations within the protein’s active site for most of the simulation period. The presence of these peaks supported the minor variations in ligand position and orientation within the complex, as observed in Fig 8. In contrast, the M29 (red) complex exhibited a single peak at ca. 1.4 nm, indicating the localization of the ligand’s center of mass relative to the protein’s center of mass throughout the simulation. The occurrence of RPDF maxima at ca. 1.0 nm for the top four ligands indicates that the orthosteric site remained occupied throughout most of the simulation period. These results indicate that after binding to the orthosteric pocket of the receptor protein, the ligands remained largely localized within the site, possibly inhibiting the protein’s regular function. Thus, the RPDF analysis effectively evaluated ligand stability over time, reinforcing earlier conclusions drawn from structural snapshots.

For M26 (green), a single peak was observed at ca. 1.0 nm, but the presence of further smaller broad peaks afterward indicated the delocalization, which supported the instability of the ligand, as previously noted in snapshots and RMSD analysis. Since ligand M26 exhibited instability, further geometrical parameter analysis for this ligand was not conducted.

Fluctuation of α-carbon in the protein backbone of the complex: The root mean square fluctuation (RMSF) of the α-carbon atoms was calculated from the MDS trajectory to identify the flexible and rigid regions of the FGFR1 protein after ligand binding. The RMSF curves for the four ligand-protein complexes, along with that of the apo form (Fig 12), displayed similar profiles.

Fig 12. Root mean square fluctuation of α-carbon atoms in protein backbone of four complexes (in holo form) along with that in the apo form (black); M28-complex (blue), M29-complex (red), M32-complex (maroon), and M34-complex (magenta).

https://doi.org/10.1371/journal.pone.0331837.g012

The RMSF was below ca. 0.8 nm for all top four complexes, whereas it was ca. 1.0 nm for the apo form, indicating the stability of protein geometry [72]. Higher RMSF values were observed at the terminal ends and within three specific loop regions around residues 45, 130, and 200. The increased flexibility in these regions can be attributed to the absence of α-helix or β-sheet structures, which typically restrict molecular motion and reduce degrees of freedom [55]. Since these flexible regions do not play a significant role in ligand interactions or disruptions, their high RMSF does not indicate structural instability. For clarity in the plot, only residues ranging from 12 to 300 were considered, excluding the terminal ends. The RMSF profiles indicated that the α-carbon atom fluctuations exert minimal influence on the ligand’s binding affinity within the active site. Consequently, the stability of the adduct remained unaffected, which may potentially lead to the inhibition of protein activity.
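For readers less familiar with RMSF, the following minimal sketch shows how a per-atom fluctuation about the time-averaged position can be computed. It assumes the frames are already aligned; the function name `rmsf` and the tiny trajectory are hypothetical and serve only as an illustration.

```python
import math

def rmsf(trajectory):
    """Per-atom RMSF (nm) about each atom's time-averaged position.

    `trajectory` is a list of frames; each frame is a list of (x, y, z)
    tuples for the same atoms, assumed to be pre-aligned.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for atom in range(n_atoms):
        positions = [frame[atom] for frame in trajectory]
        mean = [sum(p[axis] for p in positions) / n_frames for axis in range(3)]
        mean_sq = sum(
            (p[axis] - mean[axis]) ** 2 for p in positions for axis in range(3)
        ) / n_frames
        result.append(math.sqrt(mean_sq))
    return result

# Hypothetical two-atom, two-frame trajectory (nm): the first atom is rigid,
# the second moves by 0.2 nm between frames.
trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)],
]
print([round(value, 3) for value in rmsf(trajectory)])
```

Atoms in loops and at chain termini typically give larger values in such a calculation than atoms held in α-helices or β-sheets.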

Gyradius (Rg) and solvent-accessible surface area (SASA): The gyradius (Rg), derived from the MDS trajectory, was used to evaluate the protein’s compactness and backbone conformational changes. It represents the root-mean-square distance of the macromolecule’s atoms from its center of mass, which is a crucial indicator of the stability of the protein-ligand complex [73]. The Rg plot for the protein backbone of the four ligand-protein complexes (Fig 13) revealed similar stability patterns across all systems, with Rg values ranging from ca. 2.00 to 2.15 nm, indicating no significant structural expansion or contraction during the simulation period. In contrast, the M29 (red = 2.05 ± 0.01 nm) and M32 (maroon = 2.06 ± 0.02 nm) complexes displayed some fluctuations, particularly between 60 ns and 145 ns, similar to those of the apo form (black = 2.06 ± 0.02 nm) and M34 (magenta = 2.07 ± 0.02 nm). A pronounced fluctuation in the case of the M34-complex can be correlated with minor changes in ligand orientation and α-helix positioning at 100 ns (Fig 8). Overall, the Rg of the top four complexes closely matched the apo form, suggesting no significant receptor expansion or shrinkage upon ligand binding. These findings indicate that the protein maintained structural integrity even after ligand binding, suggesting that the ligands may contribute to target protein inhibition.
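The radius of gyration is commonly computed as the mass-weighted root-mean-square distance of the atoms from their center of mass. The sketch below illustrates that definition for a single frame; the function name and the four-atom example are hypothetical, not the study's data.

```python
import math

def radius_of_gyration(coords, masses):
    """Mass-weighted RMS distance (nm) of atoms from their center of mass."""
    total = sum(masses)
    com = [
        sum(m * c[axis] for m, c in zip(masses, coords)) / total
        for axis in range(3)
    ]
    weighted = sum(
        m * sum((c[axis] - com[axis]) ** 2 for axis in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted / total)

# Hypothetical example: four equal-mass atoms, each 1 nm from the origin.
coords = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
masses = [12.0, 12.0, 12.0, 12.0]
print(radius_of_gyration(coords, masses))
```

Tracking this value frame by frame gives a curve like those discussed here: a flat curve means the protein neither expands nor compacts over the simulation.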

Fig 13. Radius of gyration curves for protein backbone in top four ligand complexes along with the apo form (black): M28-complex (blue), M29-complex (red), M32-complex (maroon), and M34-complex (magenta); extracted from MDS trajectories.

https://doi.org/10.1371/journal.pone.0331837.g013

Solvent-accessible surface area (SASA) is a crucial parameter for evaluating protein-solvent interactions, as it measures the exposure of protein residues to water molecules [74]. Changes in SASA can influence the protein’s structure, dynamics, and function [75]. SASA analysis was conducted to evaluate the effect of ligand binding on the conformational behavior of the FGFR1 protein over 200 ns MD simulations (Fig 14). The SASA ranged from 155 to 185 nm² for most complexes, showing similar trends to that of the apo form (black), with an exception for the M34 complex (magenta = 172.94 ± 4.48 nm²) which exhibited slightly higher SASA due to minor surface adjustments after 80 ns. The M28 (blue = 166.00 ± 3.32 nm²), M29 (red = 166.83 ± 2.59 nm²), and M32 (maroon = 168.59 ± 3.75 nm²) complexes displayed comparable average SASA to the apo form (165.47 ± 3.13 nm²), supporting the stable surface geometry. The minimal variations observed (below 5 nm²) suggest that ligand binding did not significantly alter the protein’s hydrophobic regions or shape, ensuring consistent solvent accessibility and reinforcing the structural stability of the complexes for most of the cases, as discussed previously.

Fig 14. Solvent-accessible surface area plots for protein backbone in top four ligand complexes along with the apo form (black): M28-complex (blue), M29-complex (red), M32-complex (maroon), and M34-complex (magenta); extracted from MDS trajectories.

https://doi.org/10.1371/journal.pone.0331837.g014

3.3.2. Thermodynamic stability assessment of the protein-ligand complexes.

The spontaneity and feasibility of complex formation reactions were evaluated by analyzing the binding free energy changes in the equilibrated segment of the MDS trajectory (20 ns, 200 frames) for the top four (MDS stable) ligand adducts, as outlined in Table 5. The table reflects the degree of spontaneity in the complex formation reactions from the discrete protein and ligand. A negative ΔGBFE (ΔGBFE < 0) signifies the spontaneity of the complex formation reaction, and a more negative value corresponds to higher stability [76].

Table 5. Binding free energy changes (kcal/mol) and their components for protein-ligand complexes extracted from the 20 ns equilibrated segment of the MDS trajectories.

https://doi.org/10.1371/journal.pone.0331837.t005

All top-ranked protein-ligand complexes exhibited negative ∆GBFE (ranging from −21.87 to −12.76 kcal/mol), affirming the spontaneity and feasibility of the complex formation reactions. Among them, the M34-complex demonstrated the highest thermodynamic stability, with the lowest ∆GBFE of −21.87 ± 3.98 kcal/mol. Analysis of the thermodynamic components revealed that the solvent contribution from the Poisson-Boltzmann model (pb) posed a significant destabilizing influence across all the complexes. Nevertheless, this adverse effect was effectively offset by substantial favorable (negative) contributions from electrostatic (el), van der Waals (vdW), and non-polar (np) interactions. These findings suggest that the top four ligands exhibit a natural propensity to associate with the FGFR1 receptor, forming energetically favorable and stable complexes throughout the simulation period. Notably, M34 stood out as the most stable adduct based on its overall energy profile.
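How the components combine can be illustrated with a short sketch. It assumes that the binding free energy of each frame is taken as the plain sum of the four components named in the text (vdW, el, pb, np), with no entropy term, and that the reported value is the mean ± standard deviation over frames. Both the summation scheme and every number below are assumptions made for illustration, not values from Table 5.

```python
from statistics import mean, stdev

def binding_free_energy(frame):
    """Sum of the four energy components (kcal/mol) for one frame."""
    return frame["vdW"] + frame["el"] + frame["pb"] + frame["np"]

# Hypothetical per-frame components (kcal/mol): pb is positive (unfavorable),
# the other three terms are negative (favorable).
frames = [
    {"vdW": -40.0, "el": -10.0, "pb": 35.0, "np": -4.0},
    {"vdW": -42.0, "el": -12.0, "pb": 38.0, "np": -4.0},
    {"vdW": -38.0, "el": -8.0, "pb": 32.0, "np": -4.0},
]

totals = [binding_free_energy(frame) for frame in frames]
print(totals)
print(mean(totals), stdev(totals))
```

In this toy case the polar solvation penalty is outweighed by the favorable terms, so the total stays negative, which is the pattern described for all four complexes.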

To further dissect the energetic contributions at the residue level, decomposition analysis was performed. M29 exhibited the most favorable binding energy (−13.57 ± 2.00 kcal/mol), primarily stabilized by hydrophobic residues such as Phe32, Gly33, and Val35. Minor destabilizing effects were noted from residues Asp184 and Lys57. Similarly, M28 showed strong binding affinity (−10.12 ± 1.98 kcal/mol), with stabilizing contributions from Asn111, Val35, and Leu173, while residues Lys57, Glu74, and Glu105 depicted unfavorable effects. The M34-complex displayed moderate binding energy (−9.87 ± 2.00 kcal/mol), supported by residues Val35, Val104, Asn171, Leu173, and Ala183, and opposed by interactions with Lys57 and Asp184. In contrast, M32 demonstrated the weakest binding (−6.57 ± 1.27 kcal/mol), with stabilizing hydrophobic contacts at residues Leu27, Val35, and Leu173, but significant destabilization from Asp184 and Lys57. Detailed residue-wise free energy contributions are presented in Supplementary Information S1 Table in S1 File. Corresponding bar plots and heatmaps highlighting the interactions between active site residues of FGFR and the respective ligands are shown in Supplementary Information S1-S5 Figs in S1 File, respectively. In these heatmaps, blue shades denote favorable (negative value) contributions, while red shades represent unfavorable (positive value) ones [62].

Overall, the results indicate that hydrophobic and polar residues, particularly Val35 and Leu173, played dominant roles in stabilizing the ligand-FGFR complexes. M29 emerged as the most promising ligand candidate based on per-residue energetic contributions, while M34 demonstrated the greatest thermodynamic stability based on ∆GBFE. Collectively, these findings suggest that all top ligands formed energetically stable complexes with FGFR1, with M34 and M29 showing particularly strong and favorable interactions. These results highlight their potential as promising FGFR1-targeted inhibitors, although further experimental validation is required to confirm their therapeutic efficacy.

3.4. Protein-ligand interaction heatmaps and histograms

To understand the residue-level binding dynamics, amino acid interaction heatmaps and histograms were generated for the 200 ns (20,000 frames) simulation period and provided in Supplementary Information S6-S13 Figs in S1 File. The analyses revealed that all four ligands complexed with the FGFR1 protein predominantly exhibited stable van der Waals (vdW) interactions, with sporadic hydrogen bonding observed in some cases.
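A contact histogram of this kind can be thought of as a count of the frames in which each residue touches the ligand. The sketch below shows that counting step only; the per-frame contact sets are hypothetical (residue names are borrowed from the text purely as labels), and the contact criterion itself is not shown.

```python
from collections import Counter

def contact_counts(frames):
    """Number of frames in which each residue is in contact with the ligand."""
    counts = Counter()
    for residues in frames:
        # A residue is counted at most once per frame.
        counts.update(set(residues))
    return counts

# Hypothetical contact sets for four frames; not data from the study.
frames = [
    {"Ser108", "Glu105"},
    {"Ser108"},
    {"Ser108", "Ala107"},
    {"Glu105"},
]

counts = contact_counts(frames)
print(counts.most_common())
```

Sorting the counts in descending order gives the ranking of "most frequently engaged" residues, while plotting presence or absence per frame gives the corresponding heatmap.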

In the M28-complex, the ligand maintained strong and consistent vdW interactions with residues Leu27, Glu105, Ala107, and Ser108. Among these, Ser108 and Glu105 accounted for the highest number of contacts, followed by Ala107 and Leu27, suggesting their central role in ligand stabilization. M29 showed stable vdW contacts with residues Gly33, Gln34, and Met58, while transient interactions were observed with residues Gly28 and Glu29. A hydrogen bond with residue Asn20 emerged after frame 13,600. Gly33 and Gln34 were the most frequently engaged residues in the M29-complex, with moderate contacts involving residues Met58, Glu29, Gly28, and Asn202. In the M32-complex, the ligand demonstrated stable vdW contacts with residues Ala107 and Ser108 after frame 11,300, along with moderate transient interactions involving residues Leu27 and Glu29. Residue Leu27 also formed intermittent hydrogen bonds, further contributing to ligand retention. Ser108 was the most frequently interacting residue in this complex, followed by Ala107 and Leu27. For the M34-complex, stable vdW interactions were primarily observed with residues Glu29, Leu27, Gly18, and Arg170. Occasional hydrogen bonding was also noted with residue Leu27. The corresponding histogram indicated Glu29 as the most engaged residue, followed by Leu27, with moderate contributions from residues Arg170, Glu105, and Gly28.

Collectively, these findings underscore the importance of stable vdW contacts, particularly with residues Glu29, Gly33, Glu105, and Ser108 in securing ligand binding and maintaining complex stability. Their frequent engagement across multiple complexes highlights their potential relevance as key anchoring sites in FGFR1-ligand interactions.

3.5. Similarity measure analysis

The chemical similarity of four top candidate compounds (as suggested by MDS) against known FGFR1 drugs was evaluated. Fig 15 shows how structural modifications influence similarities, providing insights into the potential effectiveness of new compounds using a two-color scheme to highlight conserved and divergent features [77]. Green regions represent structural elements shared with reference drugs, suggesting retention of critical pharmacophores needed for FGFR1 inhibition, while pink areas indicate novel modifications or distinct scaffolds [66].

Fig 15. Similarity analysis between known and potential inhibitors of FGFR1.

https://doi.org/10.1371/journal.pone.0331837.g015

The analysis revealed that M32 exhibits strong green overlap with infigratinib, implying it likely maintains a similar binding mode and could serve as a high-priority candidate for lead optimization. Compounds M28 and M29 displayed a mix of green and pink regions, indicating partial structural conservation with opportunities for hybrid scaffold development to fine-tune selectivity or potency. In contrast, M34 showed prominent pink regions, suggesting unique structural features that may either introduce novel binding mechanisms or require further validation to confirm target specificity. The overall results suggest that M32 is the most promising candidate due to its strong structural similarity to known FGFR1 inhibitors, while M28, M29, and M34 offer opportunities for optimization or novel scaffold exploration.

The Dice similarity index calculated between the known and potential inhibitors of FGFR1 is shown in Table 6.

Table 6. Dice similarity index between the known and potential FGFR1 inhibitors.

https://doi.org/10.1371/journal.pone.0331837.t006

The Dice similarity index revealed that M32 has the strongest structural resemblance to infigratinib (0.453283), making it the most promising candidate. M28 showed moderate similarity across all inhibitors, with the highest value for infigratinib (0.313043). M29 exhibited comparable similarity to erdafitinib (0.251656) and pemigatinib (0.294872), indicating shared structural features. Meanwhile, M34 had the lowest overall similarity, peaking with futibatinib (0.263158). Overall, these results suggest that all hit compounds are viable leads, with M32 standing out as the most promising, warranting further optimization to enhance potency.
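For reference, the Dice coefficient between two fingerprints A and B is 2|A ∩ B| / (|A| + |B|), where |A| is the number of bits set in A. The sketch below applies this formula to sets of on-bit indices. The fingerprint type used in the study is not restated here, so the bit sets are purely hypothetical.

```python
def dice_similarity(bits_a, bits_b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return 2 * len(a & b) / (len(a) + len(b))

# Hypothetical on-bit indices for a candidate and a reference drug.
candidate = {1, 4, 9, 16, 25}
reference = {4, 9, 10, 25, 36, 49, 64}

# Three shared bits out of 5 + 7 on-bits: 2 * 3 / 12 = 0.5
print(dice_similarity(candidate, reference))
```

The index ranges from 0 (no shared bits) to 1 (identical fingerprints), so values around 0.25 to 0.45 indicate partial rather than close structural overlap.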

3.6. Prediction of pIC50 values

In addition to classification models, a regression model was used to estimate the pIC50 values of the candidate compounds. Table 7 summarizes the performance metrics of the top 20 regression models.

Table 7. General performance of 20 different regression models.

https://doi.org/10.1371/journal.pone.0331837.t007

Light Gradient Boosting Machine (LGBM), Hist Gradient Boosting (HGB), and Gamma Regressor (GR) were combined into a voting regressor for superior performance. For LGBM, the parameters were set to n_estimators = 600 and learning_rate = 0.05; for HGB, max_iter = 400 and learning_rate = 0.05; for GR, max_iter = 500 and alpha = 0.02; all other parameters were set to their default values.
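A voting regressor, in the usual sense, combines its base models by averaging their predictions. The sketch below shows only that averaging step in plain Python; it does not train any model. Equal weights are an assumption (the text does not state the weights), and the three predicted values are hypothetical.

```python
def voting_prediction(predictions, weights=None):
    """Weighted mean of base-model predictions for one compound.

    `predictions` maps a model label to its predicted pIC50. With
    `weights=None` every model counts equally (an assumption here).
    """
    if weights is None:
        weights = {name: 1.0 for name in predictions}
    total = sum(weights[name] for name in predictions)
    return sum(predictions[name] * weights[name] for name in predictions) / total

# Hypothetical pIC50 predictions from the three base regressors.
base_predictions = {"LGBM": 7.0, "HGB": 7.5, "GR": 6.5}
print(voting_prediction(base_predictions))

# Hypothetical unequal weighting, for comparison.
print(voting_prediction(base_predictions, {"LGBM": 2.0, "HGB": 1.0, "GR": 1.0}))
```

Averaging tends to dampen the errors of any single model, which is the usual motivation for combining regressors with different inductive biases.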

The five-fold cross-validation of individual regression models and the voting regressor is shown in Table 8.

Table 8. Five-fold cross-validation of individual regressors and voting regressor.

https://doi.org/10.1371/journal.pone.0331837.t008

Experimental versus predicted pIC50 values are shown in Fig 16, while Table 9 compares these values for known and potential inhibitors.

Table 9. Predicted and experimental pIC50 values for known inhibitors and potential inhibitors of FGFR1.

https://doi.org/10.1371/journal.pone.0331837.t009

Fig 16. Experimental versus predicted pIC50 for training and testing set.

https://doi.org/10.1371/journal.pone.0331837.g016

The predicted pIC50 values of the hit candidate compounds (7.07–7.47), while lower than those of FDA-approved selective FGFR1 inhibitors, demonstrate promising lead-like potency. These values suggest that the compounds possess potential as FGFR1 inhibitors and could benefit from further structural optimization. Collectively, the findings support their candidacy for continued validation and development in FGFR1-targeted drug discovery.
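To put the predicted range in concentration terms, recall that pIC50 is the negative base-10 logarithm of the IC50 expressed in mol/L. The short conversion below is a sketch using that standard definition; the function name is hypothetical.

```python
def pic50_to_ic50_nm(pic50):
    """Convert pIC50 (= -log10 of IC50 in mol/L) to IC50 in nanomolar."""
    return 10 ** (9 - pic50)

# Ends of the predicted range quoted in the text.
for value in (7.07, 7.47):
    print(value, round(pic50_to_ic50_nm(value), 1))
```

Under this definition, the predicted range of 7.07 to 7.47 corresponds to IC50 values of roughly 85 nM down to roughly 34 nM.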

Overall, the study identified four hit drug candidates as FGFR1 inhibitors by screening a large dataset. The combination of machine learning-based virtual screening with an in silico method has been found to accelerate the preliminary drug discovery process, enabling efficient analysis of extensive datasets within a short period despite the resource constraints. This integration has not only improved accuracy but also ensured the reliable identification of high-potential drug candidates for FGFR1 inhibition, streamlining the selection for further studies.

4. Conclusion

The study demonstrates the effectiveness of combining AI-guided screening with molecular docking and dynamics simulations in identifying structurally stable and energetically favorable FGFR1 inhibitors. Four hit molecules were identified from a pool of 10 million molecules using this approach. Per-residue energy decomposition and long-term interaction histogram profiling offered valuable mechanistic insights into ligand-residue interactions, reinforcing the reliability of the identified candidates. These computational results offer a cost-effective and quick foundation for further investigation. Future work will focus on pharmacophore modelling, structural optimization of hit compounds, and their biological evaluation through in vitro and in vivo studies to support preclinical development.

Supporting information

S1 File. Supplementary data.

This file contains additional data supporting the findings of this study.

https://doi.org/10.1371/journal.pone.0331837.s001

(DOCX)

Acknowledgments

The author(s) hereby declare that Artificial Intelligence (AI) technology (ChatGPT) has been used during the preparation of the work to improve the readability and language of the manuscript. After using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the content of the published article.

References

  1. Bray F, Laversanne M, Sung H, Ferlay J, Siegel RL, Soerjomataram I, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2024;74(3):229–63. pmid:38572751
  2. Urbach D, Lupien M, Karagas MR, Moore JH. Cancer heterogeneity: origins and implications for genetic association studies. Trends Genet. 2012;28(11):538–43. pmid:22858414
  3. Ornitz DM, Itoh N. The fibroblast growth factor signaling pathway. Wiley Interdiscip Rev Dev Biol. 2015;4(3):215–66. pmid:25772309
  4. Perez-Garcia J, Muñoz-Couselo E, Soberino J, Racca F, Cortes J. Targeting FGFR pathway in breast cancer. Breast. 2018;37:126–33. pmid:29156384
  5. Liu Q, Huang J, Yan W, Liu Z, Liu S, Fang W. FGFR families: biological functions and therapeutic interventions in tumors. MedComm (2020). 2023;4(5):e367. pmid:37750089
  6. Turner N, Pearson A, Sharpe R, Lambros M, Geyer F, Lopez-Garcia MA, et al. FGFR1 amplification drives endocrine therapy resistance and is a therapeutic target in breast cancer. Cancer Res. 2010;70(5):2085–94. pmid:20179196
  7. Shi Y, Ma Z, Cheng Q, Wu Y, Parris AB, Kong L, et al. FGFR1 overexpression renders breast cancer cells resistant to metformin through activation of IRS1/ERK signaling. Biochim Biophys Acta Mol Cell Res. 2021;1868(1):118877. pmid:33007330
  8. Wang K, Ji W, Yu Y, Li Z, Niu X, Xia W, et al. FGFR1-ERK1/2-SOX2 axis promotes cell proliferation, epithelial-mesenchymal transition, and metastasis in FGFR1-amplified lung cancer. Oncogene. 2018;37(39):5340–54. pmid:29858603
  9. Gorringe KL, Jacobs S, Thompson ER, Sridhar A, Qiu W, Choong DYH, et al. High-resolution single nucleotide polymorphism array analysis of epithelial ovarian cancer reveals numerous microdeletions and amplifications. Clin Cancer Res. 2007;13(16):4731–9. pmid:17699850
  10. Lee Y-Y, Ryu J-Y, Cho Y-J, Choi J-Y, Choi J-J, Choi CH, et al. The anti-tumor effects of AZD4547 on ovarian cancer cells: differential responses based on c-Met and FGF19/FGFR4 expression. Cancer Cell Int. 2024;24(1):43. pmid:38273381
  11. Ross JS, Wang K, Al-Rohil RN, Nazeer T, Sheehan CE, Otto GA, et al. Advanced urothelial carcinoma: next-generation sequencing reveals diverse genomic alterations and targets of therapy. Mod Pathol. 2014;27(2):271–80. pmid:23887298
  12. Edwards J, Krishna NS, Witton CJ, Bartlett JMS. Gene amplifications associated with the development of hormone-resistant prostate cancer. Clin Cancer Res. 2003;9(14):5271–81. pmid:14614009
  13. Ko J, Meyer AN, Haas M, Donoghue DJ. Characterization of FGFR signaling in prostate cancer stem cells and inhibition via TKI treatment. Oncotarget. 2021;12(1):22–36. pmid:33456711
  14. Schäfer MH, Lingohr P, Sträßer A, Lehnen NC, Braun M, Perner S, et al. Fibroblast growth factor receptor 1 gene amplification in gastric adenocarcinoma. Hum Pathol. 2015;46(10):1488–95. pmid:26239623
  15. Katoh M. Therapeutics Targeting FGF Signaling Network in Human Diseases. Trends Pharmacol Sci. 2016;37(12):1081–96. pmid:27992319
  16. Zhang P, Yue L, Leng Q, Chang C, Gan C, Ye T, et al. Targeting FGFR for cancer therapy. J Hematol Oncol. 2024;17(1):39. pmid:38831455
  17. Touat M, Ileana E, Postel-Vinay S, André F, Soria J-C. Targeting FGFR signaling in cancer. Clin Cancer Res. 2015;21(12):2684–94. pmid:26078430
  18. Du S, Zhang Y, Xu J. Current progress in cancer treatment by targeting FGFR signaling. Cancer Biol Med. 2023;20:490–9.
  19. Facchinetti F, Hollebecque A, Braye F, Vasseur D, Pradat Y, Bahleda R, et al. Resistance to selective FGFR inhibitors in FGFR-driven urothelial cancer. Cancer Discov. 2023;13(9):1998–2011. pmid:37377403
  20. Hinkson IV, Madej B, Stahlberg EA. Accelerating therapeutics for opportunities in medicine: a paradigm shift in drug discovery. Front Pharmacol. 2020;11:770. pmid:32694991
  21. Paul SM, Mytelka DS, Dunwiddie CT, Persinger CC, Munos BH, Lindborg SR, et al. How to improve R&D productivity: the pharmaceutical industry’s grand challenge. Nat Rev Drug Discov. 2010;9(3):203–14. pmid:20168317
  22. Shah FA, Qadir H, Khan JZ, Faheem M. A review: from old drugs to new solutions: the role of repositioning in alzheimer’s disease treatment. Neuroscience. 2025;576:167–81. pmid:40164279
  23. Yu W, MacKerell AD. Computer-Aided Drug Design Methods. In: Sass P, editor. Antibiotics: Methods and Protocols. New York, NY: Springer; 2017. pp. 85–106.
  24. Sydow D, Morger A, Driller M, Volkamer A. TeachOpenCADD: a teaching platform for computer-aided drug design using open source packages and data. J Cheminform. 2019;11(1):29. pmid:30963287
  25. Chhetri SP, Bhandari VS, Maharjan R, Lamichhane TR. Identification of lead inhibitors for 3CLpro of SARS-CoV-2 target using machine learning based virtual screening, ADMET analysis, molecular docking and molecular dynamics simulations. RSC Adv. 2024;14(40):29683–92. pmid:39297030
  26. Salimi A, Lim JH, Jang JH, Lee JY. The use of machine learning modeling, virtual screening, molecular docking, and molecular dynamics simulations to identify potential VEGFR2 kinase inhibitors. Sci Rep. 2022;12(1):18825. pmid:36335233
  27. Schneider P, Walters WP, Plowright AT, Sieroka N, Listgarten J, Goodnow RA Jr, et al. Rethinking drug design in the artificial intelligence era. Nat Rev Drug Discov. 2020;19(5):353–64. pmid:31801986
  28. Zdrazil B, Felix E, Hunter F, Manners EJ, Blackshaw J, Corbett S, et al. The ChEMBL Database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods. Nucleic Acids Res. 2024;52(D1):D1180–92. pmid:37933841
  29. Lipinski CA, Lombardo F, Dominy BW, Feeney PJ. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Delivery Rev. 1997;23(1–3):3–25.
  30. Doak BC, Over B, Giordanetto F, Kihlberg J. Oral druggable space beyond the rule of 5: insights from drugs and clinical candidates. Chem Biol. 2014;21(9):1115–42. pmid:25237858
  31. Cereto-Massagué A, Ojeda MJ, Valls C, Mulero M, Garcia-Vallvé S, Pujadas G. Molecular fingerprint similarity search in virtual screening. Methods. 2015;71:58–63. pmid:25132639
  32. Willett P. Similarity-based virtual screening using 2D fingerprints. Drug Discov Today. 2006;11(23–24):1046–53. pmid:17129822
  33. Boldini D, Ballabio D, Consonni V, Todeschini R, Grisoni F, Sieber SA. Effectiveness of molecular fingerprints for exploring the chemical space of natural products. J Cheminform. 2024;16(1):35. pmid:38528548
  34. Landrum G. Rdkit: Open-source cheminformatics software. 2016. Available from: https://www.rdkit.org/
  35. Kwon S, Bae H, Jo J, Yoon S. Comprehensive ensemble in QSAR prediction for drug discovery. BMC Bioinformatics. 2019;20(1):521. pmid:31655545
  36. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res. 2011;12:2825–30.
  37. Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: Association for Computing Machinery; 2016. pp. 785–94.
  38. Ruta D, Gabrys B. Classifier selection for majority voting. Inf Fusion. 2005;6(1):63–81.
  39. Lima AN, Philot EA, Trossini GHG, Scott LPB, Maltarollo VG, Honorio KM. Use of machine learning approaches for novel drug discovery. Expert Opin Drug Discov. 2016;11(3):225–39. pmid:26814169
  40. eMolecules. [cited 4 Jan 2024]. Available from: https://search.emolecules.com/
  41. Luque A, Carrasco A, Martín A, de las Heras A. The impact of class imbalance in classification performance metrics based on the binary confusion matrix. Pattern Recogn. 2019;91:216–31.
  42. Chicco D, Warrens MJ, Jurman G. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Comput Sci. 2021;7:e623. pmid:34307865
  43. Kim S, Chen J, Cheng T, Gindulyte A, He J, He S, et al. PubChem 2023 update. Nucleic Acids Res. 2023;51(D1):D1373–80. pmid:36305812
  44. Hanwell MD, Curtis DE, Lonie DC, Vandermeersch T, Zurek E, Hutchison GR. Avogadro: an advanced semantic chemical editor, visualization, and analysis platform. J Cheminform. 2012;4(1):17. pmid:22889332
  45. Morris GM, Huey R, Lindstrom W, Sanner MF, Belew RK, Goodsell DS, et al. AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility. J Comput Chem. 2009;30(16):2785–91. pmid:19399780
  46. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, et al. The Protein Data Bank. Nucleic Acids Res. 2000;28(1):235–42. pmid:10592235
  47. Waterhouse A, Bertoni M, Bienert S, Studer G, Tauriello G, Gumienny R, et al. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 2018;46(W1):W296–303. pmid:29788355
  48. Trott O, Olson AJ. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J Comput Chem. 2010;31(2):455–61. pmid:19499576
  49. Phunyal A, Adhikari A, Adhikari Subin J. In silico exploration of potent flavonoids for dengue therapeutics. PLoS One. 2024;19(12):e0301747. pmid:39666626
  50. Yuan S, Chan HCS, Hu Z. Using PyMOL as a platform for computational drug design. WIREs Comput Mol Sci. 2017;7(2).
  51. Adasme MF, Linnemann KL, Bolz SN, Kaiser F, Salentin S, Haupt VJ, et al. PLIP 2021: expanding the scope of the protein-ligand interaction profiler to DNA and RNA. Nucleic Acids Res. 2021;49(W1):W530–4. pmid:33950214
  52. Abraham MJ, Murtola T, Schulz R, Páll S, Smith JC, Hess B, et al. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX. 2015;1–2:19–25.
  53. Huang J, MacKerell AD Jr. CHARMM36 all-atom additive protein force field: validation based on comparison to NMR data. J Comput Chem. 2013;34(25):2135–45. pmid:23832629
  54. Zoete V, Cuendet MA, Grosdidier A, Michielin O. SwissParam: a fast force field generation tool for small organic molecules. J Comput Chem. 2011;32(11):2359–68. pmid:21541964
  55. Sharma BP, Adhikari Subin J, Marasini BP, Adhikari R, Pandey SK, Sharma ML. Triazole based Schiff bases and their oxovanadium(IV) complexes: Synthesis, characterization, antibacterial assay, and computational assessments. Heliyon. 2023;9(4):e15239. pmid:37089299
  56. 56. Neupane P, Adhikari Subin J, Adhikari R. Assessment of iridoids and their similar structures as antineoplastic drugs by in silico approach. J Biomol Struct Dyn. 2024:1–16. pmid:38345021
  57. 57. Subin JA, Shrestha RLS. Computational Assessment of the Phytochemicals of Panax ginseng C.A. Meyer Against Dopamine Receptor D1 for Early Huntington’s Disease Prophylactics. Cell Biochem Biophys. 2024;82(4):3413–23. pmid:39046621
  58. 58. Lal Swagat Shrestha R, Marasini BP, Adhikari Subin J. Phytochemicals of Swertia chirayita Roxb. ex Fleming against malarial dihydroorotate dehydrogenase: an in silico study. Discov Mol. 2024;1(1).
  59. 59. Wang E, Sun H, Wang J, Wang Z, Liu H, Zhang JZH, et al. End-Point Binding Free Energy Calculation with MM/PBSA and MM/GBSA: Strategies and Applications in Drug Design. Chem Rev. 2019;119(16):9478–508. pmid:31244000
  60. 60. Lal Swagat Shrestha R, Maharjan B, Shrestha T, Prasad Marasini B, Adhikari Subin J. Geometrical and thermodynamic stability of govaniadine scaffold adducts with dopamine receptor D1. Results Chem. 2024;7:101363.
  61. 61. Valdés-Tresanco MS, Valdés-Tresanco ME, Valiente PA, Moreno E. gmx_MMPBSA: A New Tool to Perform End-State Free Energy Calculations with GROMACS. J Chem Theory Comput. 2021;17(10):6281–91. pmid:34586825
  62. 62. Kumari R, Kumar R, Open Source Drug Discovery Consortium, Lynn A. g_mmpbsa--a GROMACS tool for high-throughput MM-PBSA calculations. J Chem Inf Model. 2014;54(7):1951–62. pmid:24850022
  63. 63. Bender A, Glen RC. Molecular similarity: a key technique in molecular informatics. Org Biomol Chem. 2004;2(22):3204–18. pmid:15534697
  64. 64. Eckert H, Bajorath J. Molecular similarity analysis in virtual screening: foundations, limitations and novel approaches. Drug Discov Today. 2007;12(5–6):225–33. pmid:17331887
  65. 65. Bero SA, Muda AK, Choo YH, Muda NA, Pratama SF. Similarity measure for molecular structure: a brief review. J Phys: Conf Ser. 2017;892:012015.
  66. 66. Riniker S, Landrum GA. Similarity maps - a visualization strategy for molecular fingerprints and machine-learning methods. J Cheminform. 2013;5(1):43. pmid:24063533
  67. 67. Kumari S, Kumar D, Mittal M. An ensemble approach for classification and prediction of diabetes mellitus using soft voting classifier. Int J Cogn Comput Eng. 2021;2:40–6.
  68. 68. Ramírez D, Caballero J. Is it reliable to take the molecular docking top scoring position as the best solution without considering available structural data? Molecules. 2018;23(5):1038. pmid:29710787
  69. 69. Fargher HA, Sherbow TJ, Haley MM, Johnson DW, Pluth MD. C-H⋯S hydrogen bonding interactions. Chem Soc Rev. 2022;51(4):1454–69. pmid:35103265
  70. 70. Almoyad MAA, Wahab S, Ansari MN, Ahmad W, Hani U, Chandra S. Predictive insights into plant-based compounds as fibroblast growth factor receptor 1 inhibitors: a combined molecular docking and dynamics simulation study. J Biomol Struct Dyn. 2024;:1–10. pmid:38669200
  71. 71. Mandle RJ. Implementation of a cylindrical distribution function for the analysis of anisotropic molecular dynamics simulations. PLoS One. 2022;17(12):e0279679. pmid:36584026
  72. 72. Aljarba NH, Hasnain MS, Bin-Meferij MM, Alkahtani S. An in-silico investigation of potential natural polyphenols for the targeting of COVID main protease inhibitor. J King Saud Univ Sci. 2022;34(7):102214. pmid:35811756
  73. 73. Falsafi-Zadeh S, Karimi Z, Galehdari H. VMD DisRg: New User-Friendly Implement for calculation distance and radius of gyration in VMD program. Bioinformation. 2012;8(7):341–3. pmid:22553393
  74. 74. Das R, Bhattarai A, Karn R, Tamang B. Computational investigations of potential inhibitors of monkeypox virus envelope protein E8 through molecular docking and molecular dynamics simulations. Sci Rep. 2024;14(1):19585. pmid:39179615
  75. 75. Ghahremanian S, Rashidi MM, Raeisi K, Toghraie D. Molecular dynamics simulation approach for discovering potential inhibitors against SARS-CoV-2: a structural review. J Mol Liq. 2022;354:118901. pmid:35309259
  76. 76. Dhital S, Parajuli N, Poudel M, Shrestha T, Bharati S, Maharjan B, et al. Spatial and Energetic Stability Assessment of the Adducts of Phytocompounds of Piper longum L. with α-amylase by Computational Approach. Biointerface Res Appl Chem. 2024;14(6):126.
  77. 77. Riniker S, Landrum GA. Open-source platform to benchmark fingerprints for ligand-based virtual screening. J Cheminform. 2013;5(1):26. pmid:23721588