Figures
Abstract
Dehydrosqualene synthase is a validated antibacterial target involved in Staphylococcus aureus staphyloxanthin biosynthesis, a key factor in oxidative stress resistance and virulence. From a dataset of 299,845 natural products, we applied pharmacophore-based screening followed by QSAR-based screening, identifying 1,978 compounds with predicted Ki values of 100 nM or lower. Among them, 47 molecules showed remarkable predicted activity, with Ki values lower than 1.0 nanomolar. Notably, despite the wide chemical diversity of the library evaluated and the fact that the models were not trained with certain scaffolds of known inhibitors of staphyloxanthin biosynthesis, nine of these chemotypes were recovered among these top candidates, demonstrating the ability of the models to capture relevant structural features beyond the training data. Five representative compounds from this group showed high binding affinity and stable interactions with key CrtM residues in molecular dynamics simulations, reflecting the potential of the overall set as a source of promising leads for novel antivirulence agents targeting dehydrosqualene synthase in S. aureus.
Citation: Amorim J, Ortega S, Rodas R, Vélez L, Arteaga D, Carpio J (2026) Pharmacophore- and QSAR-guided discovery of natural product inhibitors targeting dehydrosqualene synthase in Staphylococcus aureus. PLoS One 21(4): e0345620. https://doi.org/10.1371/journal.pone.0345620
Editor: Peter Mbugua Njogu, University of Nairobi Faculty of Health Sciences, KENYA
Received: August 16, 2025; Accepted: March 8, 2026; Published: April 29, 2026
Copyright: © 2026 Amorim et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All data generated or analyzed during this study are included in this published article and its supplementary information files.
Funding: This research was supported by the Universidad Católica de Cuenca with grant number PIC5P23-26.
Competing interests: The authors declare no conflict of interest.
Introduction
Therapeutic strategies aimed at inhibiting bacterial virulence factors have become a major focus in antimicrobial research [1]. Staphylococcus aureus is among the bacterial species that produce numerous such factors [2]. This clinically significant pathogen is capable of causing a broad spectrum of human infections and can persist in diverse host environments, ranging from skin surfaces to abiotic medical devices and deep-seated tissues [3,4]. Its adaptability and persistence make it challenging to eradicate, highlighting its potential as a target for the development of novel therapeutic approaches [5].
One of the main virulence factors of S. aureus is staphyloxanthin (SXT), a carotenoid pigment that plays an essential role in its defense against oxidative stress [4]. Furthermore, although not part of classical quorum sensing, STX has been reported to confer cross-species protection by transferring from S. aureus to Pseudomonas aeruginosa, thereby enhancing the resistance of the latter to oxidative damage [6]. In murine models of type 2 diabetes, pigment-producing S. aureus strains exhibited enhanced immune evasion and significantly delayed wound healing, whereas STX-deficient mutants were more susceptible to host defenses and allowed faster tissue recovery [7].
The biosynthesis of SXT begins with the precursor molecule farnesyl diphosphate (FPP), which undergoes a series of enzymatic transformations [8]. A key enzyme in this pathway is dehydrosqualene synthase (CrtM), which catalyzes the head-to-head condensation of two FPP molecules to produce dehydrosqualene. This reaction initiates the formation of the triterpenoid backbone that serves as the structural precursor for STX. Following dehydrosqualene synthesis, the biosynthetic pathway proceeds through a cascade of reductive and oxidative enzymatic modifications, ultimately leading to the production of STX [9].
Numerous experimental studies have demonstrated that impaired STX production results in increased bacterial susceptibility to oxidative stress and a marked attenuation of pathogenic traits. Compounds that suppress STX production include several natural products including both microbial-derived metabolites [10] and phytochemicals [8,9]. Celastrol, a plant triterpenoid, reduces STX production by 50% at 0.36 µM (approximately 0.16 µg/mL) [11]. Flavone, at a concentration of 50 µg/mL, suppressed STX synthesis and increased the susceptibility of S. aureus to hydrogen peroxide by approximately 100-fold [12]. In contrast, thymol required a concentration of 100 µg/mL to achieve a 90% reduction in STX production [13].
CrtM has been experimentally validated as a molecular target through multiple inhibitors with known Ki values [14]. These quantitative data make CrtM particularly suitable for ligand-based drug discovery, in contrast to other targets in this pathway for which fewer validated ligands are available [15]. However, it is important to note that certain reported compounds reducing STX levels may operate through mechanisms beyond direct enzyme inhibition. In some cases, pigment suppression may arise from transcriptional regulation or from the inhibition of other enzymes within the biosynthetic pathway, rather than from direct interference with CrtM activity [5,8,16]. For example, the natural monoterpenoid myrtenol reduced STX levels by approximately 20% at 75 μg/mL by downregulating CrtM expression [17]. Similarly, in another study, carvacrol at the same concentration (75 µg/mL) inhibited 72% of STX production and its metabolic intermediates in methicillin-resistant S. aureus (MRSA) through downregulation of CrtM gene expression [18].
As the chemical space of confirmed CrtM inhibitors remains relatively limited, the structural diversity of natural products provides a valuable opportunity to discover candidates with improved potency and pharmacological properties, potentially more suitable for clinical development than those reported to date. A major limitation, however, lies in the uncertainty surrounding the molecular targets of many reported compounds, as numerous studies rely solely on SXT reduction as a phenotypic readout without experimentally confirming direct inhibition of the enzyme [8,10]. This challenge hinders the accurate selection of candidates with genuine CrtM inhibitory activity. In this context, in silico drug discovery provides structure- or mechanism-based strategies for the early identification and rational optimization of compounds [19,20]. In fact, a recent in silico study employed machine learning–based QSAR models to evaluate 1,261 natural compounds as potential CrtM inhibitors, ultimately finding several promising candidates despite the relatively limited number of molecules assessed [21].
In this context, the present study aimed to identify natural compounds with potential CrtM inhibitory activity by implementing an integrated in silico strategy. Using two sequential virtual screening strategies—pharmacophore- and QSAR-based—were integrated with molecular dynamics analyses to identify promising candidates from a library of 299,845 natural products. This approach led to the identification of structurally diverse molecules with the potential to inhibit CrtM at concentrations predicted to be comparable to those of the most potent inhibitors reported to date, and exhibiting overall acceptable pharmacokinetic and toxicity profiles for hit compounds. Overall, the results presented here provide a valuable basis for the development of novel anti-virulence agents targeting S. aureus.
Methodology
Selection of active compounds and generation of decoys
In the BindingDB database [22], 93 compounds were reported as CrtM inhibitors; among them, only 35 compounds had experimentally determined Ki values, ranging from 0.2 to 20,160 nM (corresponding to a pKᵢ range of approximately 9.70 to 4.70), and these were used for the QSAR model construction. A subset of the 22 most potent inhibitors was selected for pharmacophore model generation. Decoy compounds were generated using the DUD-E web server [23], with the SMILES of each active molecule provided as input.
In silico compounds preparation
A total of 299,845 natural product structures were retrieved from the COCONUT database (COlleCtion of Open Natural prodUcTs) [24] and prepared for computational analysis using RDKit, Pandas, and NumPy libraries to generate 3D structures and perform file editing. Protonation states were standardized to pH 7.4, and geometry optimization was conducted through energy minimization with the MMFF94 force field. The process employed the steepest descent algorithm followed by the conjugate gradient method with default parameters, as implemented in Open Babel v3.1.1 [25]. All the structures were then converted to MOL2 and SDF formats.
Pharmacophore modeling, validation, and screening
A ligand-based pharmacophore model was developed using a dataset composed of diverse CrtM inhibitors and their corresponding decoys. The model was constructed using the PSearch software [26], which employs a three-dimensional pharmacophore signature approach independent of molecular alignment. Unlike other methods, PSearch does not divide the dataset into training and test subsets; instead, pharmacophore hypotheses are derived from the active molecules and statistically evaluated using the F-score (F₀.₅). Within PSearch, decoy molecules are incorporated to assess the discriminative performance of each model between active and inactive compounds, while compound clustering reduces redundancy and mitigates potential overfitting. This approach ensures that the resulting pharmacophores capture activity-related structural features rather than general physicochemical properties [26]. In the modeling protocol, referred to as “Strategy 1” within the software, the 22 most potent active compounds—with experimentally determined activities up to 1000 nM—were selected from the previously described set of 35 molecules and compiled into a training set used for model generation. The remaining molecules were excluded to avoid introducing noise and biasing the model toward less promising chemical profiles. To enhance the chemical diversity of the active set, the co-crystallized reference ligand (compound 673) was included. This compound is reported in the Protein Data Bank under the entry PDB ID: 3ACX, which corresponds to the experimentally resolved crystal structure of the CrtM–ligand complex.
Validation of the pharmacophore model was carried out using a dataset of 270 compounds, consisting of active molecules and 248 decoys. Each compound was screened using the final pharmacophore hypothesis in the Pharmit web server [27]. Pharmit only returns compounds that satisfy all pharmacophoric constraints; these were assigned a binary score of 1 (match), whereas all non-retrieved compounds were assigned a value of 0 (non-match). These binary outputs were compared with the true activity labels (1 for actives and 0 for decoys) to evaluate the ability of the pharmacophore model to distinguish between the two classes.
Sensitivity was defined as the proportion of actives correctly identified by the model (true positives, TP, divided by TP + false negatives, FN), whereas specificity represented the proportion of decoys correctly rejected (true negatives, TN, divided by TN + false positives, FP). Overall accuracy was computed as (TP + TN)/N, where N is the total number of compounds in the validation set. The ROC curve and its corresponding area under the curve (AUC) were derived directly from the binary match scores.
Early enrichment was assessed by determining how many active compounds appeared in the top-ranked subset relative to the number expected by chance, following the standard definition of enrichment factor used in decoy-based virtual screening workflows [28]. The expected value was calculated using the expression (k/N) × A, where k is the number of compounds in the top-ranking fraction, N is the total number of screened molecules, and A is the total number of actives. The enrichment factor (EF) was obtained by dividing the number of actives retrieved in the top subset by this expected value.
Structural similarity analysis was carried out using SkelSpheres fingerprints to evaluate the chemical diversity within the set of 23 active compounds employing DataWarrior software [29]. Once the pharmacophore model meeting the statistical criterion (F₀.₅ ≥ 0.8) was obtained, it was used for virtual screening on the Pharmit server against the library of 299,845 natural products. A molecular weight filter of 300–500 Da was applied to select compounds within the drug-like range for further analysis.
Quantitative structure-activity relationship modeling
QSAR models were developed and validated using the QSARINS software package (version 2.2.4) [30], which applies a genetic algorithm (GA) to iteratively select the descriptors most closely associated with the observed activity. Final model construction was performed using multiple linear regression (MLR), and performance was thoroughly evaluated using statistical parameters to guarantee both robustness and predictive accuracy.
The data set of 35 active molecules were divided into a training set of 25 and a test set of eight, using the response-based split option in QSARINS to preserve a broader range of activity values. Two of the active molecules were excluded for being outside the applicability domain during QSAR model construction. The experimental Ki values of the actives, expressed in nanomolar concentrations, were converted to pKi values. Molecular descriptors were computed using the OCHEM online platform (version 4.3.156) [31], with compound structures provided in SDF format. The pool of descriptors considered for model development were generated with alvaDesc v.2.0.16 (5,666 descriptors), Dragon v.7 (5,270 descriptors), CDK 2.8 (256 descriptors), RDKit and MORDRED (1,826 descriptors). Additional descriptor sets were obtained from MOPAC2016 (35 descriptors), KrakenX (124 MOPAC 2016-derived descriptors), PyDescriptor (16,251 descriptors), MERA (529 descriptors), MERSY (42 descriptors), Inductive descriptors (54 descriptors), and Spectrophores (144 descriptors). Molecular structures were geometry-optimized using ULYSSES [32], a C++ library for semiempirical quantum chemical calculations, integrated within the OCHEM platform. The PM6 semiempirical method was used to generate low-energy conformations [33]. To refine the descriptor pool, features that were constant in more than 80% of the molecules or exhibited pairwise correlations above 95% were removed, yielding a final set of 1,783 descriptors. Variable selection was carried out using a genetic algorithm (GA) with a population size of 20, a maximum of 10,000 generations, and a mutation rate of 20%.
QSAR model validation
The reliability of QSAR models is determined not only by their ability to reproduce the experimental data but also on demonstrating robust predictive capacity through comprehensive validation protocols [34]. To verify the internal validity and robustness of the models, several cross-validation strategies were employed, including leave-one-out (Q²_LOO), leave-many-out (Q²_LMO), and the Concordance Correlation Coefficient in cross-validation (CCC_cv). The LMO procedure was repeated over 2,000 iterations, each time randomly excluding 30% of the dataset. In addition, Y-randomization tests were carried out to assess the likelihood of chance correlation; this involved scrambling 30% of the response values over 2,000 iterations and evaluating model performance under these randomized conditions. To test external predictive power, independent test set predictions were analyzed using Q²_ext-F1, Q²_ext-F2, Q²_ext-F3, and the external Concordance Correlation Coefficient (CCC_ext). The applicability domain (AD) was defined via the leverage method, and visualized using a Williams plot, allowing the detection of compounds that fall outside the reliable prediction space of the model. The warning leverage threshold was established based on the leverage formula applied to the training set h* = 3(p + 1)/n, where h* is the critical leverage value, p is the number of descriptors, and n is the number of molecules in the training set [35].
For the QSAR-driven virtual screening, we calculated the molecular descriptors included in the final model equation for each compound and used them to generate pKi predictions. To assess whether the selected compounds were within the applicability domain (AD) defined by the training set, the Insubria graph was used as a visual analyses tool [36].
Similarity analyses
Structural similarity of the molecules in the datasets was evaluated using DataWarrior software (version 060402) [29], employing the FragFp fingerprint algorithm. This approach enabled the identification of fragment-level similarities by encoding characteristic functional groups and substructural patterns. The algorithm is based on a curated set of 512 frequently occurring fragments in organic compounds, designed to reduce redundancy while preserving chemical relevance. The inclusion of wildcard atoms in many fragments enables FragFp to detect meaningful structural similarities even in the presence of small atomic differences, making it particularly effective for comparing diverse scaffolds such as those in our dataset [29]. In addition, similarity analysis based on SkelSpheres fingerprints was also performed to assess topological relationships among compounds. Unlike FragFp, SkelSpheres focuses on the two-dimensional connectivity of the molecular framework, providing a complementary measure of structural diversity within the dataset [29].
In silico toxicity and drug-likeness evaluation of the best-scoring CrtM candidates
The toxicity profile of the best-scoring CrtM inhibitors was evaluated using the pkCSM web server [37], which predicts pharmacokinetic and toxicity parameters based on graph-based molecular signatures derived from the compound structure. SMILES strings of the selected ligands were submitted to the pkCSM interface to estimate endpoints relevant to early drug safety assessment, including mutagenicity (Ames test), hepatotoxicity, skin sensitization, hERG I and II inhibition, and systemic toxicity parameters such as the maximum tolerated dose (MTD), oral rat acute toxicity (LD₅₀), and chronic oral toxicity (LOAEL). The resulting predictions were analyzed comparatively among the selected compounds to identify relative differences in toxicity risk.
The druglikeness, medicinal chemistry, and pharmacokinetic profiles of the top-ranked CrtM inhibitor candidates were evaluated using the SwissADME web tool [38]. Each compound was submitted in SMILES format to calculate physicochemical and ADME-related parameters. The assessment included druglikeness filters such as Lipinski, Ghose, Veber, Egan, and Muegge rules, along with the bioavailability score. Medicinal chemistry parameters included PAINS and Brenk structural alerts, leadlikeness, and synthetic accessibility. The consensus Log Po/w was used to estimate lipophilicity, while gastrointestinal (GI) absorption and blood–brain barrier (BBB) permeability predictions provided insights into pharmacokinetic behavior.
Molecular docking analysis
Following the QSAR-based virtual screening, molecular docking was carried out to determine the binding modes of the most promising ligands within the CrtM active site. The resulting complexes provided the initial coordinates for the subsequent molecular dynamic simulations. Docking analyses were performed using the Protein-Ligand ANT System v1.2 (PLANTS) [39]. For this purpose, the crystal structure of CrtM with the identifier PDBe ID: 3ACX [40] retrieved from the Protein Data Bank in Europe – Knowledge Base (PDBe-KB) [41], was selected as the target structure. Protonation states were assigned at pH 7.4 using PropKa v3.0 [42], with all other parameters set to default. The prepared structure was then processed with the SPORES v1.3 script under default settings and saved in MOL2 format. All docking runs were carried out within a 12.5 Å radius centered on the co-crystallized ligand, using coordinates (x = 55, y = 15, z = 55). An RMSD threshold of 2.0 Å was applied to ensure effective clustering, while other parameters remained at their default values.
Molecular dynamic simulations
All molecular dynamics (MD) simulations of the enzyme–ligand complexes were carried out using GROMACS v2024.4 [43], with the all-atom CHARMM36m force field [44]. The TIP3P water model was employed for all systems, placed within a periodic triclinic box maintaining a minimum distance of 1.0 nm from the solute to the box edge. To mimic physiological ionic strength, Na⁺ and Cl⁻ ions were added to achieve a 0.15 mol/L concentration. Energy minimization was performed using the steepest descent algorithm until the maximum force converged below 500 kJ/mol, eliminating steric clashes. System equilibration was conducted in two phases: first, a 1 ns NVT simulation gradually increasing the temperature to 310 K using the V-rescale thermostat; second, a 3 ns NPT simulation at 1 bar with the C-rescale barostat. Production MD runs were then performed for 500 ns under the Parrinello–Rahman barostat. Long-range electrostatic interactions were computed using the Particle Mesh Ewald (PME) method [45] and the hydrogen bond constraints were applied using the LINCS algorithm [46].
Binding free energy calculations and interaction analyses
Calculations were performed using the MD trajectories to assess the total binding free energy, and to further analyze the interactions influencing the total energy of the selected complexes. For total binding free energy calculations, the MMPBSA methodology [47] was employed, utilizing a single trajectory and performed with the gmx-MMPBSA software (version 1.6.0) [48]. From the MD results of all runs, 500 snapshots were extracted for total energy and amino acid energy decomposition analysis. All settings followed the recommended default values of the software.
The overall computational workflow employed in this study is summarized in Fig 1.
Sequential methodological pipeline applied for the discovery and prioritization of potential CrtM inhibitors from natural products. The workflow integrates ligand-based pharmacophore modeling, QSAR model generation, validation and virtual screening, structural similarity analysis, and in silico ADMET evaluation, followed by molecular docking and molecular dynamics simulations. Each numbered step (1–11) corresponds to a specific stage of the computational protocol, showing the progression from known active ligands to the final selection of five stable and high-affinity CrtM inhibitor candidates.
Results and discussion
Pharmacophore modeling
A ligand-based pharmacophore model was constructed to identify and prioritize the most promising candidates from the extensive natural product database [49]. To construct the pharmacophore model, 23 potent CrtM inhibitors from the dataset of 35 active molecules were selected to capture the essential chemical features underlying their inhibitory activity (S1 and S2 Tables). A structural diversity analysis using SkelSpheres fingerprints with an 85% similarity threshold revealed that these compounds were distributed into three main clusters containing ten, five, and four molecules, respectively, along with four singletons showing similarity values below the defined cutoff (Fig 2). Furthermore, within the largest cluster, several molecules display marked structural differences, as shown by the orange and red nodes and edges. This clustering pattern indicates that the active set shares coherent chemotypes with reproducible structural features, while still retaining enough peripheral diversity to prevent overfitting to a single scaffold. The pharmacophore model generated with PSearch from these inhibitors exhibited robust predictive performance across all statistical metrics, including a precision of 1.0 and a recall of 0.783 (S3 Table). This pharmacophore harbors five features, each with a 1.0 Å radius: an aromatic feature at coordinates (1.21, 0.14, 0.48) Å; two hydrogen bond acceptors at (−6.76, 1.75, −1.49) Å and (−4.08, −1.47, −3.02) Å; a negative ionizable feature at (−6.76, 1.75, −1.49) Å; and a hydrophobic feature at (1.21, 0.14, 0.48) Å (Fig 3).
SkelSpheres fingerprint analysis at an 85% similarity threshold grouped the 23 active inhibitors into three major clusters containing ten, five, and four molecules, respectively, and four singletons (grey nodes) with similarity values below the cutoff.
Magenta sphere represents an aromatic feature spatially overlapping with a hydrophobic feature. Yellow and red spheres correspond to two hydrogen bond acceptor features, with red sphere overlapping with the negative ionizable feature. Distances between features (Å) are given by numerical labels, measured between feature centroids.
The validation screening was performed in Pharmit. Although 23 active molecules were initially considered, one of them could not be included because it was not compatible with the decoy-generation procedure, resulting in a final validation set of 22 actives. Of these, the pharmacophore model successfully retrieved 20, while only 33 of the 248 decoys matched the pharmacophoric features. This corresponded to a sensitivity of 0.909 and a specificity of 0.867, with an overall accuracy of 0.870. The ROC analysis yielded an AUC of 0.888, reflecting strong discriminative power even with binary match scores. Early enrichment analysis further supported the effectiveness of the model: 20 actives appeared within the top 53 ranked compounds (~19.6% of the dataset), whereas only 4.3 would be expected by chance alone, resulting in an enrichment factor of 4.64. The statistical performance of the pharmacophore model is consistent with the ranges typically reported for ligand-based pharmacophore models employing actives–plus–decoys validation [50,51]. The AUC value of 0.888 is particularly noteworthy, as AUC values above 0.85 are generally interpreted as good evidence of discriminative power in decoy-based validation frameworks, as demonstrated in pharmacophore-guided virtual screening studies employing similar evaluation criteria [28]. Similarly, the enrichment factor (EF = 4.64) reflects meaningful early recognition performance, which is a key aspect of decoy-based validation in pharmacophore-guided virtual screening. Although the number of available active compounds was moderate the strong external validation performance, indicates that the model captures meaningful activity-related features and mitigates potential bias arising from dataset size.
Using this pharmacophore as input in the Pharmit web server, 299,845 natural products from the COCONUT database were screened, resulting in 14,168 candidate inhibitors selected for subsequent analyses (S4 Table).
QSAR modeling
To predict the Ki values of the previously identified molecules, a QSAR model was developed in QSARINS using the complete dataset of 35 inhibitors with experimentally determined Ki values. While other CrtM inhibitors have been reported in the literature, most are described through IC₅₀ or percent inhibition data rather than Ki. The latter was preferred because it represents a direct and assay-independent measure of binding affinity, ensuring a more consistent foundation for QSAR modeling [52]. The final dataset included 25 molecules for training and 8 for external validation, whereas two compounds were outside the applicability domain. The final model was constructed using three molecular descriptors to maintain a balanced descriptor-to-sample ratio and reduce the risk of overfitting [35]. These molecular descriptors were: AETA_eta_B, NDLIB22 and N_O_3A. The model equation is: Ŷ = 6.7117 + 95.8573 × AETA_eta_B – 0.199 × NDLIB22–0.9159 × N_O_3A. The AETA_eta_B descriptor (calculated by Mordred) is the averaged ETA branching index, which captures molecular branching within the ETA (Extended Topochemical Atom) framework, reflecting how atomic connectivity and topochemical contributions distribute across the structure [53,54]. Its positive coefficient suggests that, within the modeled chemical space, increased branching-related topochemical complexity is associated with enhance biological activity. NDLIB22 is a MERA descriptor [55] that quantifies the normalized spatial variance of atomic positions along a principal component of the geometry of the molecule. It reflects the extent of molecular elongation or dispersion along this axis, independent of overall size. Its negative coefficient suggests that greater spatial spread in this direction may reduce biological activity. N_O_3A is a topological atom-pair descriptor that counts the number of oxygen atoms located exactly three bonds from a nitrogen atom. It encodes topological proximity without inferring chemical functionality. Its negative coefficient suggests that such N–O arrangements are unfavorable for activity. This feature is easily observed in the less active molecules used to build the model.
Although all three descriptors included in the model are statistically significant (p < 0.001) (S5 Table), their relative contributions differ, as reflected in their standardized coefficients. The descriptors N_O_3A (–0.6237) and AETA_eta_B (+0.589) exhibited the highest absolute values, indicating that topological N–O relationships and ETA (topochemical) show the strongest influence on the predicted activity within this model. In contrast, NDLIB22 presents a smaller standardized coefficient (–0.2781), suggesting a weaker contribution to the predicted activity. However, this 3D descriptor complements the model by capturing geometric features not accounted for by the 2D descriptors.
The quality of the model was assessed using the coefficient of determination (R²). The obtained value of R² = 0.886 indicates that the model explains a substantial proportion of the variance in the experimental Kᵢ values of the compounds in the training set. The dispersion observed for a few compounds (Fig 4A) is within the expected range for QSAR models. Furthermore, the residuals vs. predicted plot shows residuals randomly scattered around zero, all within ±1, with similar scatter patterns between the training and prediction sets (Fig 4B), suggesting an absence of overfitting [35].
(A) Scatter plot of experimental vs predicted pKi by MLR and (B) Scatter plot of Predicted vs Residuals values.
In the internal validation, assessed through Q2 using leave-one-out cross-validation (Fig 5A), each compound in the training set was systematically excluded and predicted by a model built without it. The predicted values from LOO exhibit close agreement with the experimental data, with most points distributed near the identity line. This strong correlation indicates both the robustness of the model and its ability to generalize within the chemical space of the training set. To evaluate the reliability of individual predictions and identify potential outliers, a Williams plot was constructed by plotting standardized residuals against leverage values (Fig 5B). All compounds from both the training and test sets were found to fall below the warning leverage threshold (h* = 0.480), confirming that the QSAR model operated entirely within its applicability domain and was not influenced by outliers or leverage values. Additionally, the standardized residuals remained within ±2.5 standard deviation units, indicating the absence of extreme outliers [56].
(A) Scatter plot of experimental vs Q2-LOO (Leave-One-Out) validation. (B) Williams plot for selected model, with the vertical black line on the right indicating Hat ≤ h* = 0.480.
To further assess the robustness of the model and ensure that the observed correlations were not the result of chance, a Y-randomization test was performed using 2,000 iterations, with 30% of the response variable randomly scrambled in each iteration. The results show that R²_Yscr (yellow dots) and Q²_Yscr (red dots) values from the randomized models were markedly lower than the R² and Q² values of the original model (cyan and blue dots, respectively) (Fig 6), supporting the conclusion that the performance of the model is not due to chance correlations [36].
𝑅2 and 𝑅2 values from 2,000 iterations of Y-scrambled models (yellow and red dots) compared with the performance of the selected model (cyan and blue dots).
The predictive ability of the QSAR model was further assessed through external validation using an independent test set (S5 Table), in line with the requirement that predictivity be demonstrated on truly external compounds not used in model training [34]. The high coefficient of determination (R2 = 0.89) for the test set indicates that the model maintains a high level of predictive consistency when applied to external compounds. The predictive squared correlation coefficients across different formulations of the Q² metric: Q²-F1 (0.87), Q²-F2 (0.87), and Q²-F3 (0.89) all fall within acceptable thresholds, supporting the robustness of the model [57]. Additionally, the external concordance correlation coefficient (CCC = 0.93) demonstrates both precision and accuracy in the predictions [58,59]. Low values of RMSE (0.4802), MAE (0.43), and PRESS (1.84) reflect a good predictive performance, indicating limited error between observed and predicted activities, while the Δr² (0.11), calculated as the absolute difference between the training and test set R², suggests strong internal consistency and reduces concern about overfitting.
Altogether, the internal and external validation results, along with the Y-randomization test and applicability domain analysis, demonstrate that the developed QSAR model is robust, statistically reliable, and possesses strong predictive power within its defined chemical space.
QSAR-based screening
In the next step, QSAR-based screening of the 14,168 compounds previously filtered by the pharmacophore yielded a subset of 9,798 structures. The remaining molecules were excluded as they fell outside the applicability domain and the activity range defined by the active compounds in the training set. Notably, 1,978 molecules exhibited values equal or below 100 nM (pKi ≥ 7). Considering that, compounds with potencies in the 100 nM to 5 µM range are typically classified as hits in early screening stages [60], the predicted high affinities of these compounds provide a solid rationale for their prioritization in subsequent studies. Furthermore, 47 molecules were predicted to be highly active, with pKi values ranging from 9.471 to 9.004 (corresponding to predicted Ki values of 0.338 to 0.989 nM, respectively, S6 Table).
Analysis of structural similarity
Among the 47 compounds with predicted pKi values above 9.0, the similarity map (FragFp similarity > 80%) highlights two major clusters together with several structurally diverse candidates, underscoring the presence of multiple chemotypes within the selected set. The largest cluster in the similarity map (green nodes with blue background in Fig 7A) was excluded from further analysis, as the compounds contain ester functionalities that are generally susceptible to hydrolysis under physiological conditions [61]. Among the remaining compounds, the five highlighted as dark blue nodes were selected for further analysis based on their high predicted activity, as detailed in Table 1. In addition, their structural diversity ensured the representation of distinct chemotypes in this group of candidates rather than closely related analogues. Notably, comparison with the 93 reported CrtM inhibitors in the BindingDB database revealed FragFp similarity values below 80% for all five compounds, indicating that they possess distinct structural features not commonly found among known inhibitors (Fig 7B).
(A) Network of the 47 best-ranked compounds, showing similarity relationships based on the FragFp fingerprint with an 80% similarity cutoff. (B) Network of inhibitors generated with the same grouping strategy. In both panels, each node represents a molecule, and edges connect pairs with similarity ≥ 0.80. Dark blue nodes represent the best-ranked compounds. Node colors reflect neighbor similarity values, ranging from red (0.80) to green (0.95), with higher similarity values forming tighter clusters. Grey nodes correspond to molecules with similarity below the cutoff. The background color scale (red to blue) indicates the size of the neighboring cluster, measured as the number of molecules it contains.
The following descriptions annotated in the COCONUT database [24] summarize the chemical features and reported natural sources of the selected compounds. The compound CNP0260941.1 is a polyketide-derived medium-chain hydroxy acid, naturally occurring in Oldenlandia corymbosa and Thalictrum alpinum. Extracts from O. corymbosa have demonstrated broad-spectrum antibacterial activity, and members of the genus Thalictrum have also been associated with antimicrobial effects [62]. CNP0460023.0 is a sulfated polyphenolic metabolite isolated from the brown alga Pleurophycus gardneri. Although specific antibacterial studies on P. gardneri are limited, brown seaweeds are well recognized as prolific sources of antimicrobial metabolites [63]. The compound CNP0203663.1, a botryane-type sesquiterpenoid, features a distinctive tetralin core substituted with a hydroxylated pentanoic acid side chain. This scaffold has been reported in multiple plant including Artemisia californica and Citrus sudachi. While the genus Artemisia has demonstrated antibacterial activity [64], no studies specifically addressing A. californica were identified. On the other hand, C. sudachi peel extracts have shown inhibitory activity against Vibrio and other foodborne bacteria [65]. CNP0605304.0 belongs to the family of hydroxycinnamic acid derivatives, integrating a 3,4-dihydroxyphenyl moiety with a tartaric acid backbone. This compound has been identified in a broad range of plant species, such as Vitis longii, Callisia fragrans, and Echinacea purpurea—the latter two recognized for their antimicrobial activity [66,67]. Finally, CNP0420418.0 corresponds to Rhamnetin 3,3′-disulfate, a sulfated flavonol isolated from the climbing plant Argyreia mollis. Although antibacterial data for this species are scarce, antimicrobial activity has been demonstrated in other members of the genus Argyreia [68]. Overall, several of the natural sources associated with our top-ranked compounds have shown antimicrobial properties, yet none have been assessed for CrtM inhibition in S. aureus.
It is important to note that some natural products reported to suppress STX synthesis, such as flavones [8] anthraquinones and chalcones [10] were not included in the generation of the pharmacophore and QSAR models, as their activity has not been experimentally confirmed as a direct inhibition of CrtM. Nonetheless, in our analyses, their scaffolds appeared among some of the 47 top-ranked compounds identified in the virtual screenings. In particular, six molecules with a flavone scaffold (CNP0420418.0, CNP0309607.0, CNP0143044.0, CNP0209131.0, CNP0359090.0 and CNP0224661.0); one with an anthraquinone core (CNP0353630.0) and two with chalcone-like hydroxycinnamic backbone (CNP0605304.0 and CNP0572040.0) were identified among these candidates. These findings suggest that CrtM inhibition may underlie, at least in part, the activity of these reported compounds, highlighting the need for further studies to evaluate this potential mechanism of action.
In silico toxicity and drug-likeness evaluation of the best-scoring CrtM candidates
In silico toxicity predictions for the five best-scoring compounds indicated overall favorable safety profiles (S7 Table). None of them showed mutagenic potential in the Ames test, nor did they inhibit the hERG I or II cardiac ion channels, both of which are critical safety endpoints in drug discovery [69,70]. Additionally, no skin sensitization alerts were detected across the dataset.
Among these molecules, four (CNP0420418.0, CNP0460023.0, CNP0203663.1, and CNP0605304.0) were predicted to be non-hepatotoxic, while CNP0260941.1 showed a potential hepatotoxic liability. This compound should therefore be considered with caution in further optimization steps, as structural modifications may be necessary to mitigate this potential hepatotoxic risk.
Regarding systemic toxicity, the five compounds displayed variable tolerance profiles, with predicted maximum tolerated dose (MTD) values ranging from 0.18 to 1.39 log mg/kg/day and chronic oral LOAEL values spanning 1.84 to 4.59 log mg/kg_bw/day. CNP0605304.0 and CNP0460023.0 showed the highest LOAEL values, indicative of lower chronic toxicity and better tolerance under prolonged exposure. In contrast, CNP0260941.1 exhibited the lowest LOAEL and LD₅₀ values, pointing to a higher predicted systemic toxicity within the series.
The evaluation using SwissADME (S8 Table) indicated that the five compounds largely met the main criteria for oral drug-likeness. Most complied with the Lipinski and Ghose rules [71,72], showing suitable molecular size and lipophilicity, while minor deviations in hydrogen bonding or polarity (TPSA > 140 Ų) were observed in CNP0460023.0, CNP0605304.0, and CNP0420418.0. In contrast, CNP0260941.1 and CNP0203663.1 fully satisfied all filters and achieved the highest predicted oral bioavailability scores (0.56 and 0.85, respectively). Synthetic accessibility scores (3.1–4.7) indicated moderate feasibility for chemical synthesis. No relevant PAINS alerts were detected, although CNP0605304.0 contained a catechol moiety and CNP0460023.0/CNP0420418.0 showed sulfonic acid fragments flagged by Brenk filters [73]. Together, these findings highlight CNP0260941.1 and CNP0203663.1 as the most balanced candidates in terms of physicochemical properties and predicted pharmacokinetic behavior.
Molecular docking, molecular dynamics and energy binding analyses
Molecular docking was performed using the PLANTS software to determine the binding poses of the five top-ranked ligands within the CrtM active site. To validate the docking protocol, the co-crystallized ligand was redocked into the CrtM binding site, yielding a root mean square deviation (RMSD) of 2.33 Å relative to the experimental pose, with only deviations in the flexible side chains (Fig 8). Using the docking poses as starting conformations, the stability of the ligand–enzyme complexes was evaluated through the analysis of RMSD values under dynamic conditions, a widely used criterion for monitoring structural fluctuations over time [74]. Lower and stable RMSD values are generally indicative of strong and persistent binding, while higher or progressively increasing values may signal reduced stability or potential dissociation [75].
(A) Superposition of the experimental (tan) and redocked (grey) conformations of the co-crystallized ligand 673, showing close agreement between the two poses. (B) Side view of the 3ACX active site displaying both ligand conformations superimposed within the binding pocket.
The ligands CNP0260941.1, CNP0460023.0, and CNP0203663.1 maintained RMSD values below 0.3 nm for most of the simulation, indicating that they remained stably associated with the binding site. Among them, CNP0260941.1 showed particularly stable behavior: from 215 ns to the end of the trajectory its RMSD stayed near 0.1 nm, making it the most stable ligand–enzyme complex in the set. The co-crystallized ligand stabilized around 0.2 nm at the beginning of the run but displayed a sharp, transient increase to about 0.5 nm near 100 ns. It then regained stability, maintaining an RMSD close to 0.2 nm until the end of the simulation. Ligands CNP0420418.0 and CNP0605304.0 remained stable during the first 100 ns but later exhibited some fluctuations, indicating lower overall stability relative to the other compounds (Fig 9A).
(A) RMSD values and (B) ΔG Total.
The compound with the most favorable binding energy was CNP0203663.1, which showed an average value of –39 kcal/mol during the analysis time, consistent with its stability observed in the RMSD results. CNP0260941.1 and CNP0605304.0 displayed average values of approximately –28 kcal/mol, with similar energetic profiles throughout the simulation. CNP0420418.0 exhibited a binding energy of approximately –30 kcal/mol during the first 30 ns, although with marked fluctuations that coincided with variations in the RMSD values. Finally, CNP0460023.0 displayed a less favorably binding energy at the beginning of the run, with a mean value of approximately –16 kcal/mol. However, from 100 ns onward, the binding energy improved to around –30 kcal/mol, which may account for the reduction in the RMSD values observed, (Fig 9B).
The analysis of the energetic contributions of individual interacting residues in the five ligand–CrtM complexes consistently highlighted key interactions of all ligands with PHE22, VAL137, and LEU164. Notably, as show the Fig 10A, these residues are also recurrently involved in interactions with all co-crystallized ligands, as annotated in the PDBe-KB interaction data. In the case of CNP0260941.1, the heatmap shows interactions with eight amino acids (Fig 10B). Favorable interactions were observed with nearly all residues throughout the simulation, with the exception of TYR41, which consistently exhibited unfavorable contacts. In addition to the three amino acids—PHE22, VAL137, and LEU164—common to all the described ligands, ARG171 also contributed significantly to binding. Each of these residues exhibited interaction energies between −2 to −6 kcal/mol. Notably, ARG171 is involved in interactions with other co-crystallized ligands, appearing in 60% of the PDBe-KB complexes (Fig 10A). The CNP0460023.0 ligand interacts with 11 amino acids, each of which is also involved in interactions with at least 40% of the co-crystallized ligands. Most of these residues contributed to binding affinity throughout the majority of the analysis period, with the exception of VAL133 and, to a lesser extent, LEU141, which exhibited deleterious effects on complex stability (Fig 10C).
(A) Frequency of interactions between CrtM active-site residues and co-crystallized ligands, as reported in the Protein Data Bank in Europe – Knowledge Base (PDBe-KB). Per-residue energy decomposition of CrtM interactions with compounds (B) CNP0260941.1. and (C) CNP0460023.0.
CNP0203663.1 interacts with 12 residues (Fig 11A); however, unlike in the other cases, several of these residues—such as GLY138, LEU160, GLY161, and GLN165—are less commonly involved in crystallographic complexes, showing a frequency of about 20%. Furthermore, this ligand establishes contacts with VAL37, an amino acid not previously observed in interactions with co-crystallized ligands. Throughout the MD run, most of the ligand–residue contacts played a notable role in stabilizing the ligand–protein complex, which explains its high binding affinity, despite VAL133 and GLN165 exhibiting a detrimental effect on binding affinity. CNP0605304.0 exhibited 10 interactions, with contacts involving SER19, PHE22, ASN168, and TYR248 demonstrating particular significance due to their considerable energetic contributions (approximately –4 kcal/mol) sustained from approximately 50 ns until the end of the simulation). On the other hand, contacts with HIS18 and ASP172 identified in our analyses, have been reported at a relatively low frequency in CrtM complexes (approximately 20%). Notably, ASP172 exerted the most detrimental effect throughout most of the simulation (Fig 11B). Lastly, CNP0420418.0 formed the greatest number of contacts; with interactions contributing approximately –2 kcal/mol during the majority of the simulation. Additionally, VAL133 was identified as a key residue exhibiting unfavorable interactions that negatively affect the binding affinity (Fig 11C). Overall, these results from molecular dynamics simulations and binding free energy calculations, confirm the stability and affinity profile of all five potential CrtM inhibitors identified through pharmacophore- and QSAR-based screenings, further demonstrating the robustness of our protocols in reliably selecting promising drug candidates.
Conclusion
In this study, we report a set of 1,978 natural-origin compounds with predicted Ki values below 100 nM, which constitutes a broad list of compounds with potential inhibitory activity against CrtM. Among them, 47 top-ranked candidates displayed predicted Ki values below 1 nM, highlighting their potential as some of most promising inhibitors identified to date. Furthermore, the presence of molecular scaffolds previously associated with suppression of STX synthesis among some of the top-ranked compounds supports the validity of our screening protocols and suggests that they may be capturing biologically relevant features for targeting this enzyme. In fact, the favorable binding energies and stable molecular interactions of the five representative ligands with the CrtM binding site provide further support for this possibility. Importantly, the substantial number and structural diversity of the predicted candidates expand the chemical space of potential inhibitors, providing a valuable resource for experimental validation and for advancing the rational design of effective STX biosynthesis inhibitors.
Supporting information
S1 Table. CrtM inhibitors and their Ki values obtained from BindingDB.
https://doi.org/10.1371/journal.pone.0345620.s001
(XLSX)
S2 Table. Structures, Ki, and pKi values of CrtM inhibitors obtained from BindingDB.
https://doi.org/10.1371/journal.pone.0345620.s002
(XLSX)
S3 Table. Validation parameters for pharmacophoric modeling using Psearch software.
https://doi.org/10.1371/journal.pone.0345620.s003
(XLSX)
S4 Table. List of compounds filtered by pharmacophoric characteristics.
https://doi.org/10.1371/journal.pone.0345620.s004
(XLSX)
S6 Table. Predicted activity of compounds falling within the applicability domain and activity range.
https://doi.org/10.1371/journal.pone.0345620.s006
(XLSX)
S7 Table. In silico toxicity evaluation of the best-scoring CrtM candidates.
https://doi.org/10.1371/journal.pone.0345620.s007
(XLSX)
S8 Table. Drug-likeness, medicinal chemistry, and pharmacokinetic profiles of the top-scoring CrtM inhibitor candidates.
https://doi.org/10.1371/journal.pone.0345620.s008
(XLSX)
Acknowledgments
We would like to express our sincere gratitude to Prof Paola Gramatica (University of Insubria, Varese, Italy) for generously providing a free license for QSARINS v2.2.4, which was used in this study.
References
- 1. Dehbanipour R, Ghalavand Z. Anti-virulence therapeutic strategies against bacterial infections: recent advances. Germs. 2022;12(2):262–75. pmid:36504617
- 2. Liu C-I, Liu GY, Song Y, Yin F, Hensler ME, Jeng W-Y, et al. A cholesterol biosynthesis inhibitor blocks Staphylococcus aureus virulence. Science. 2008;319(5868):1391–4. pmid:18276850
- 3. Scherr TD, Heim CE, Morrison JM, Kielian T. Hiding in plain sight: interplay between staphylococcal biofilms and host immunity. Front Immunol. 2014;5:37. pmid:24550921
- 4. Cheung GYC, Bae JS, Otto M. Pathogenicity and virulence of Staphylococcus aureus. Virulence. 2021;12:547–69.
- 5. Chang J, Yue M, Zhao G, Tang B, Shi C. Advancements in function recognition of staphyloxanthin and targeted inhibitor development. Curr Opin Food Sci. 2025;63:101300.
- 6. Liu Y, McQuillen EA, Rana PSJB, Gloag ES, Parsek MR, Wozniak DJ. A bacterial pigment provides cross-species protection from H2O2- and neutrophil-mediated killing. Proc Natl Acad Sci U S A. 2024;121(2):e2312334121. pmid:38170744
- 7. Campbell AE, McCready-Vangi AR, Uberoi A, Murga-Garrido SM, Lovins VM, White EK, et al. Variable staphyloxanthin production by Staphylococcus aureus drives strain-dependent effects on diabetic wound-healing outcomes. Cell Rep. 2023;42(10):113281. pmid:37858460
- 8. Elmesseri RA, Saleh SE, Elsherif HM, Yahia IS, Aboshanab KM. Staphyloxanthin as a potential novel target for deciphering promising anti-Staphylococcus aureus agents. Antibiotics (Basel). 2022;11(3):298. pmid:35326762
- 9. Xue L, Chen YY, Yan Z, Lu W, Wan D, Zhu H. Staphyloxanthin: a potential target for antivirulence therapy. Infect Drug Resist. 2019;12:2151–60.
- 10. Sakai K, Koyama N, Fukuda T, Mori Y, Onaka H, Tomoda H. Search method for inhibitors of staphyloxanthin production by methicillin-resistant Staphylococcus aureus. Biol Pharm Bull. 2012;35(1):48–53. pmid:22223336
- 11. Yehia FA, Zahraa A, Yousef N, Askoura M. Celastrol mitigates staphyloxanthin biosynthesis and biofilm formation in Staphylococcus aureus via targeting key regulators of virulence; in vitro and in vivo approach. BMC Microbiol. 2022.
- 12. Lee J-H, Park J-H, Cho MH, Lee J. Flavone reduces the production of virulence factors, staphyloxanthin and α-hemolysin, in Staphylococcus aureus. Curr Microbiol. 2012;65(6):726–32. pmid:22965624
- 13. Valliammai A, Selvaraj A, Muthuramalingam P, Priya A, Ramesh M, Pandian SK. Staphyloxanthin inhibitory potential of thymol impairs antioxidant fitness, enhances neutrophil mediated killing and alters membrane fluidity of methicillin resistant Staphylococcus aureus. Biomed Pharmacother. 2021;141:111933. pmid:34328107
- 14. Song Y, Lin F-Y, Yin F, Hensler M, Rodrígues Poveda CA, Mukkamala D, et al. Phosphonosulfonates are potent, selective inhibitors of dehydrosqualene synthase and staphyloxanthin biosynthesis in Staphylococcus aureus. J Med Chem. 2009;52(4):976–88. pmid:19191557
- 15. Silver LL. Challenges of antibacterial discovery. Clin Microbiol Rev. 2011;24(1):71–109. pmid:21233508
- 16. Askoura M, Yousef N, Mansour B, Yehia FA-ZA. Antibiofilm and staphyloxanthin inhibitory potential of terbinafine against Staphylococcus aureus: in vitro and in vivo studies. Ann Clin Microbiol Antimicrob. 2022;21(1):21. pmid:35637481
- 17. Selvaraj A, Jayasree T, Valliammai A, Pandian SK. Myrtenol attenuates MRSA biofilm and virulence by suppressing sarA expression dynamism. Front Microbiol. 2019;10:2027. pmid:31551964
- 18. Selvaraj A, Valliammai A, Muthuramalingam P, Priya A, Suba M, Ramesh M, et al. Carvacrol targets SarA and CrtM of methicillin-resistant Staphylococcus aureus to mitigate biofilm formation and staphyloxanthin synthesis: an in vitro and in vivo approach. ACS Omega. 2020;5(48):31100–14. pmid:33324819
- 19. Swinney DC, Anthony J. How were new medicines discovered? Nat Rev Drug Discov. 2011;10:507–19.
- 20. Schenone M, Dančík V, Wagner BK, Clemons PA. Target identification and mechanism of action in chemical biology and drug discovery. Nat Chem Biol. 2013;9(4):232–40. pmid:23508189
- 21. Younes KM, Abouzied AS, Alafnan A, Huwaimel B, Khojali WMA, Alzahrani RM. Investigating the bispecific lead compounds against methicillin-resistant Staphylococcus aureus SarA and CrtM using machine learning and molecular dynamics approach. J Biomol Struct Dyn. 2025;43(7):3348–65. pmid:38147401
- 22. Liu T, Lin Y, Wen X, Jorissen RN, Gilson MK. BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Res. 2007;35(Database issue):D198–201. pmid:17145705
- 23. Mysinger MM, Carchia M, Irwin JJ, Shoichet BK. Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking. J Med Chem. 2012;55(14):6582–94. pmid:22716043
- 24. Sorokina M, Merseburger P, Rajan K, Yirik MA, Steinbeck C. COCONUT online: collection of open natural products database. J Cheminform. 2021;13(1):2. pmid:33423696
- 25. O’Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR. Open babel: an open chemical toolbox. J Cheminform. 2011;3:33. pmid:21982300
- 26. Kutlushina A, Khakimova A, Madzhidov T, Polishchuk P. Ligand-based pharmacophore modeling using novel 3D pharmacophore signatures. Molecules. 2018;23(12):3094. pmid:30486389
- 27. Sunseri J, Koes DR. Pharmit: interactive exploration of chemical space. Nucleic Acids Res. 2016;44(W1):W442–8. pmid:27095195
- 28. Asati V, Thakur SS, Upmanyu N, Bharti SK. Virtual screening, molecular docking, and DFT studies of some thiazolidine-2,4-diones as potential PIM-1 kinase inhibitors. Chemy Select. 2018;3:12186–94.
- 29. Sander T, Freyss J, von Korff M, Rufener C. DataWarrior: an open-source program for chemistry aware data visualization and analysis. J Chem Inf Model. 2015;55(2):460–73. pmid:25558886
- 30. Gramatica P, Chirico N, Papa E, Cassani S, Kovarich S. QSARINS: a new software for the development, analysis, and validation of QSAR MLR models. J Comput Chem. 2013;34:2121–32.
- 31. Sushko I, Novotarskyi S, Körner R, Pandey AK, Rupp M, Teetz W, et al. Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information. J Comput Aided Mol Des. 2011;25(6):533–54. pmid:21660515
- 32. Menezes F, Popowicz GM. ULYSSES: an efficient and easy to use semiempirical library for C+. J Chem Inf Model. 2022;62(16):3685–94. pmid:35930308
- 33. Řezáč J, Fanfrlík J, Salahub D, Hobza P. Semiempirical quantum chemical PM6 method augmented by dispersion and H-bonding correction terms reliably describes various types of noncovalent complexes. J Chem Theory Comput. 2009;5:1749–60.
- 34. Gramatica P. Origin of the OECD principles for QSAR validation and their role in changing the QSAR paradigm worldwide: an historical overview. J Chemometr. 2025;39(3).
- 35. Gramatica P. Principles of QSAR modeling. Int J Quant Struct-Property Relationships. 2020;5(3):61–97.
- 36. Gramatica P, Cassani S, Roy PP, Kovarich S, Yap CW, Papa E. QSAR modeling is not “push a button and find a correlation”: a case study of toxicity of (Benzo-)triazoles on Algae. Mol Inform. 2012;31(11–12):817–35. pmid:27476736
- 37. Pires DEV, Blundell TL, Ascher DB. pkCSM: predicting small-molecule pharmacokinetic and toxicity properties using graph-based signatures. J Med Chem. 2015;58(9):4066–72. pmid:25860834
- 38. Daina A, Michielin O, Zoete V. SwissADME: a free web tool to evaluate pharmacokinetics, drug-likeness and medicinal chemistry friendliness of small molecules. Sci Rep. 2017;7:42717. pmid:28256516
- 39. Korb O, Stützle T, Exner TE. Empirical scoring functions for advanced protein-ligand docking with PLANTS. J Chem Inf Model. 2009;49(1):84–96. pmid:19125657
- 40. Lin F-Y, Liu C-I, Liu Y-L, Zhang Y, Wang K, Jeng W-Y, et al. Mechanism of action and inhibition of dehydrosqualene synthase. Proc Natl Acad Sci U S A. 2010;107(50):21337–42. pmid:21098670
- 41. PDBe-KB consortium. PDBe-KB: a community-driven resource for structural and functional annotations. Nucleic Acids Res. 2020;48(D1):D344–53. pmid:31584092
- 42. Olsson MHM, Søndergaard CR, Rostkowski M, Jensen JH. PROPKA3: consistent treatment of internal and surface residues in empirical pKa predictions. J Chem Theory Comput. 2011;7(2):525–37. pmid:26596171
- 43. Van Der Spoel D, Lindahl E, Hess B, Groenhof G, Mark AE, Berendsen HJC. GROMACS: fast, flexible, and free. J Comput Chem. 2005;26(16):1701–18. pmid:16211538
- 44. Best RB, Zhu X, Shim J, Lopes PEM, Mittal J, Feig M, et al. Optimization of the additive CHARMM all-atom protein force field targeting improved sampling of the backbone φ, ψ and side-chain χ(1) and χ(2) dihedral angles. J Chem Theory Comput. 2012;8(9):3257–73. pmid:23341755
- 45. Ewald PP. Die berechnung optischer und elektrostatischer gitterpotentiale. Annalen der Physik. 1921;369(3):253–87.
- 46. Hess B, Bekker H, Berendsen HJC, Fraaije JGEM. LINCS: a linear constraint solver for molecular simulations. J Comput Chem. 1997;18(12):1463–72.
- 47. Kollman PA, Massova I, Reyes C, Kuhn B, Huo S, Chong L, et al. Calculating structures and free energies of complex molecules: combining molecular mechanics and continuum models. Acc Chem Res. 2000;33(12):889–97. pmid:11123888
- 48. Valdés-Tresanco MS, Valdés-Tresanco ME, Valiente PA, Moreno E. gmx_MMPBSA: a new tool to perform end-state free energy calculations with GROMACS. J Chem Theory Comput. 2021;17(10):6281–91. pmid:34586825
- 49. Hsu Y-C, Liu C-H, Wu Y-C, Lai S-J, Lin C-J, Tseng T-S. Combatting antibiotic-resistant staphylococcus aureus: discovery of TST1N-224, a potent inhibitor targeting response regulator VraRC, through pharmacophore-based screening and molecular characterizations. J Chem Inf Model. 2024;64(15):6132–46. pmid:39078379
- 50. Rathore A, Asati V, Mishra M, Das R, Kashaw V, Kashaw SK. Computational approaches for the design of novel dopamine D2 and serotonin 5-HT2A receptor dual antagonist towards schizophrenia. In Silico Pharmacol. 2022;10(1):7. pmid:35433192
- 51. Saha B, Das A, Jangid K, Kumar A, Kumar V, Jaitak V. Identification of coumarin derivatives targeting acetylcholinesterase for Alzheimer’s disease by field-based 3D-QSAR, pharmacophore model-based virtual screening, molecular docking, MM/GBSA, ADME and MD Simulation study. Curr Res Struct Biol. 2024;7:100124. pmid:38292820
- 52. Schneider M, Pons J-L, Bourguet W, Labesse G. Towards accurate high-throughput ligand affinity prediction by exploiting structural ensembles, docking metrics and ligand similarity. Bioinformatics. 2020;36(1):160–8. pmid:31350558
- 53. Roy K, Das RN. The “ETA” indices in QSAR/QSPR/QSTR research. Pharm Sci. 2016;978–1011.
- 54. Moriwaki H, Tian Y-S, Kawashita N, Takagi T. Mordred: a molecular descriptor calculator. J Cheminform. 2018;10(1):4. pmid:29411163
- 55. Grishina M, Potemkin VMG. MERA descriptors. 2019. Available from: https://docs.ochem.eu/x/NQJr.html
- 56. Li J, Gramatica P. The importance of molecular structures, endpoints’ values, and predictivity parameters in QSAR research: QSAR analysis of a series of estrogen receptor binders. Mol Divers. 2010;14(4):687–96. pmid:19921452
- 57. Gramatica P, Sangion A. A historical excursus on the statistical validation parameters for QSAR models: a clarification concerning metrics and terminology. J Chem Inf Model. 2016;56(6):1127–31. pmid:27218604
- 58. Chirico N, Gramatica P. Real external predictivity of QSAR models: how to evaluate it? Comparison of different validation criteria and proposal of using the concordance correlation coefficient. J Chem Inf Model. 2011;51(9):2320–35. pmid:21800825
- 59. Chirico N, Gramatica P. Real external predictivity of QSAR models. Part 2. New intercomparable thresholds for different validation criteria and the need for scatter plot inspection. J Chem Inf Model. 2012;52(8):2044–58. pmid:22721530
- 60. Hughes JP, Rees S, Kalindjian SB, Philpott KL. Principles of early drug discovery. Br J Pharmacol. 2011;162(6):1239–49. pmid:21091654
- 61. Lavis LD. Ester bonds in prodrugs. ACS Chem Biol. 2008;3(4):203–6. pmid:18422301
- 62. Al Bashera M, Moulick SP, Islam MDB, Jahan F, Uddin MDN, Rana GMM, et al. Antimicrobial and antioxidant properties of Oldenlandia corymbosa L. ethanolic extract: a comprehensive study with molecular docking and GC-MS analysis. Heliyon. 2025;11(5):e42901.
- 63. Chouh A, Nouadri T, Catarino MD, Silva AMS, Cardoso SM. Phlorotannins of the Brown Algae Sargassum vulgare from the Mediterranean Sea Coast. Antioxidants (Basel). 2022;11(6):1055. pmid:35739951
- 64. Umam K, Feng C-S, Yang G, Tu P-C, Lin C-Y, Yang M-T, et al. Phytochemistry, pharmacology and mode of action of the anti-bacterial artemisia plants. Bioengineering (Basel). 2023;10(6):633. pmid:37370564
- 65. Tomotake H, Koga T, Yamato M, Kassu A, Ota F. Antibacterial activity of citrus fruit juices against Vibrio species. J Nutr Sci Vitaminol (Tokyo). 2006;52(2):157–60. pmid:16802698
- 66. Dahan A, Yarmolinsky L, Budovsky A, Khalfin B, Ben-Shabat S. Phytochemicals of Callisia fragrans exhibit significant antimicrobial, pro-wound-healing, and anti-inflammatory properties. Phytomedicine Plus. 2025;5(2):100780.
- 67. Ahmadi F. Phytochemistry, mechanisms, and preclinical studies of echinacea extracts in modulating immune responses to bacterial and viral infections: a comprehensive review. Antibiotics (Basel). 2024;13(10):947. pmid:39452214
- 68. Subramanyam GK, Gaddam SA, Kotakadi VS, Palithya S, Penchalaneni J, Challagundla VN. Argyreia nervosa (Samudra pala) leaf extract mediated silver nanoparticles and evaluation of their antioxidant, antibacterial activity, in vitro anticancer and apoptotic studies in KB oral cancer cell lines. Artif Cells Nanomed Biotechnol. 2021;49: 635–50.
- 69. Danker T, Möller C. Early identification of hERG liability in drug discovery programs by automated patch clamp. Front Pharmacol. 2014;5:203. pmid:25228880
- 70. Zeiger E. The test that changed the world: the ames test and the regulation of chemicals. Mutat Res Genet Toxicol Environ Mutagen. 2019;841:43–8. pmid:31138410
- 71. Lipinski CA, Lombardo F, Dominy BW, Feeney PJ. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev. 2001;46(1–3):3–26. pmid:11259830
- 72. Ghose AK, Viswanadhan VN, Wendoloski JJ. A knowledge-based approach in designing combinatorial or medicinal chemistry libraries for drug discovery. 1. A qualitative and quantitative characterization of known drug databases. J Comb Chem. 1999;1(1):55–68. pmid:10746014
- 73. Brenk R, Schipani A, James D, Krasowski A, Gilbert IH, Frearson J, et al. Lessons learnt from assembling screening libraries for drug discovery for neglected diseases. ChemMedChem. 2008;3(3):435–44. pmid:18064617
- 74. Amorim JC, Carpio JM. Alpha-naphthoflavone as a novel scaffold for the design of potential inhibitors of the APH(3’)-IIIa nucleotide-binding site of Enterococcus faecalis. Microorganisms. 2023;11(9):2351. pmid:37764195
- 75. Kirchmair J, Markt P, Distinto S, Wolber G, Langer T. Evaluation of the performance of 3D virtual screening protocols: RMSD comparisons, enrichment assessments, and decoy selection--what can we learn from earlier mistakes? J Comput Aided Mol Des. 2008;22(3–4):213–28. pmid:18196462