M^propred: A machine learning (ML) driven Web-App for bioactivity prediction of SARS-CoV-2 main protease (M^pro) antagonists

Nadim Ferdous; Mahjerin Nasrin Reza; Mohammad Uzzal Hossain; Shahin Mahmud; Suhami Napis; Kamal Chowdhury; A. K. M. Mohiuddin

doi:10.1371/journal.pone.0287179

Abstract

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic emerged in 2019 and still requires treatments with fast clinical translatability. The frequent occurrence of mutations in the spike glycoprotein of SARS-CoV-2 has led to the consideration of alternative therapeutic targets to combat the ongoing pandemic. The main protease (M^pro) is one such attractive drug target because of its role in processing several viral polyproteins during replication. In the present study, we used a classification structure–activity relationship (CSAR) model to identify substructures that confer anti-M^pro activity among 758 non-redundant compounds. A set of 12 fingerprints was used to describe M^pro inhibitors, and the random forest approach was used to build prediction models from 100 distinct data splits. The modelability of the data set (MODI index) was found to be robust, with a value of 0.79, above the 0.65 threshold. The prediction performance, assessed by accuracy (89%), sensitivity (89%), specificity (73%) and Matthews correlation coefficient (79%), was also found to be statistically robust. An extensive analysis of the top significant descriptors revealed the importance of methyl side chains, aromatic rings and halogen groups for M^pro inhibition. Finally, the predictive model was made publicly accessible as a web-app named M^propred that allows users to predict the bioactivity of compounds against SARS-CoV-2 M^pro. We then used the app to screen CMNPD, a marine compound database, and the predicted bioactivities showed a significant correlation with the compounds' binding affinity to M^pro. Molecular dynamics (MD) simulation and molecular mechanics/Poisson–Boltzmann surface area (MM/PBSA) analysis indicated stable complexes with favorable binding free energies. Thus, the knowledge and web-app presented herein can be used to develop more effective and specific inhibitors against SARS-CoV-2 M^pro. 
The web-app can be accessed from https://share.streamlit.io/nadimfrds/mpropred/Mpropred_app.py.

Citation: Ferdous N, Reza MN, Hossain MU, Mahmud S, Napis S, Chowdhury K, et al. (2023) M^propred: A machine learning (ML) driven Web-App for bioactivity prediction of SARS-CoV-2 main protease (M^pro) antagonists. PLoS ONE 18(6): e0287179. https://doi.org/10.1371/journal.pone.0287179

Editor: Emilio Gallicchio, Brooklyn College of the City University of New York, UNITED STATES

Received: August 29, 2022; Accepted: May 31, 2023; Published: June 23, 2023

Copyright: © 2023 Ferdous et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

The COVID-19 pandemic, caused by SARS-CoV-2, is still having a disastrous impact on public health and the global economy [1,2]. The virus was first identified at the end of 2019 in Wuhan, China, and the outbreak was declared a pandemic in early March 2020 [3,4]. SARS-CoV-2 is a single-stranded RNA virus with a high mutation rate, a short replication cycle and high virion production [5–8]. The virus acquires substantial genetic variation as it spreads, enabling it to adapt quickly to selective pressures, particularly those imposed by the host immune system. Mutations accumulate over time, resulting in amino acid changes that make immune-targeted proteins less antigenic. This process, known as “antigenic drift”, is the gradual alteration of viral protein antigenicity caused by selective immunological pressure [9]. Antigenic drift allows viruses to continuously evade host immunity, leading to recurring outbreaks. In acute infectious diseases, antibody responses are mostly responsible for this pressure, resulting in the selection of escape mutants [9]. The spike protein, which carries many of the amino acid variations identified in SARS-CoV-2 variants, is the main target of neutralizing antibodies [10]. These antibodies are the only immune system component capable of providing sterilizing immunity, that is, preventing the virus from infecting host cells. The SARS-CoV-2 spike protein has evolved considerably faster than similar proteins in other known viruses that cause severe infectious diseases in humans [11]. In addition, SARS-CoV-2 proteins have accumulated a substantial number of amino acid changes outside recognized antibody targets. Because antibody responses are predominantly responsible for antigenic drift in acute viral infections, these changes may have given the virus a fitness advantage independent of antibody immunity [12,13].

Consequently, structural and functional research on SARS-CoV-2 infection has increasingly focused on the main protease (M^pro). M^pro cleaves the viral polyproteins into active fragments that are crucial for viral replication, transcription and translation [14]. The protease consists of three domains, and a loop (residues 185–200) connects domain II to domain III [15]. The ligand binding site is located between the first two domains, where the catalytic dyad formed by His41 and Cys145 is crucial for catalysis and ligand binding [15]. M^pro plays a central role in SARS-CoV-2 replication [15]. Moreover, it has no closely related human homologue protease, which makes M^pro a promising target for therapeutic development [15].

Characterizing compounds at both the structural and chemical level is crucial for understanding how physicochemical properties affect bioactivity. Computer-aided drug design (CADD) comprises a set of computational techniques that have proven useful in chemical biology and in understanding structure–activity relationships [16,17]. These tools decipher bioactivity using numerical representations of ligands known as chemical descriptors [18]. Molecular descriptor software can also be used to calculate the physicochemical properties of compounds. Quantitative structure–activity relationship (QSAR) modeling is a widely used computational technique for constructing prediction models. These models identify the important molecular fingerprints that govern bioactivities and properties [19]. QSAR models have been used successfully to model a variety of activities, including antioxidant [20], antibacterial [21,22], anticancer [23,24] and antiviral [25,26] activities.

In silico strategies such as QSAR, pharmacophore modeling, docking and molecular dynamics (MD) simulation have been used extensively to identify new inhibitors of M^pro. Isabela et al. combined an in-house machine learning technique, molecular docking, MM-PBSA calculations and metadynamics to find FDA-approved compounds that could potentially suppress the enzymatic activity of M^pro [27]. Nedra et al. developed a support vector machine (SVM) classification model and applied it to a dataset of two million commercially accessible drugs, identifying two hundred novel chemotypes as potentially active against the viral protease [28]. Mahesha et al. screened 1528 anti-HIV1 compounds using an integrative strategy that combined a machine learning predictive model, molecular docking and a deep learning model that considered the IC50 values of known inhibitors [29]. However, the models developed in these works are not available as web-apps that the scientific community can readily apply to other sets of compounds to identify more potent M^pro inhibitors.

In the present study, we collected a set of non-redundant compounds with known IC50 values against SARS-CoV-2 M^pro. The compounds were described with several fingerprint descriptors, and their IC50 values were binned into qualitative labels (active and inactive). The random forest (RF) algorithm was then used to build prediction models. We evaluated the sensitivity, specificity, accuracy and Matthews correlation coefficient of the resulting QSAR models for classifying compounds as active or inactive against SARS-CoV-2 M^pro. In addition, the key substructures critical for bioactivity were identified and described. A web-app based on the model was built and made publicly accessible. We also used the app to examine the correlation between the predicted bioactivity of compounds and their binding affinity to M^pro. This knowledge can be exploited to develop more potent and specific drugs against SARS-CoV-2 M^pro.

Materials and methods

A step-wise protocol was followed to build a web-app that predicts the bioactivity of compounds against SARS-CoV-2 M^pro. The workflow is shown in Fig 1.


Fig 1. Schematic workflow of building a web-app to predict bioactivities of compounds against the M^pro of SARS-CoV-2.

The upper part of the flowchart depicts the methodology for building the web-app. The lower part shows the molecular modeling protocol used to examine the correlation between the predicted bioactivity of compounds and their binding affinity.

https://doi.org/10.1371/journal.pone.0287179.g001

Dataset preparation and curation

A data set of antagonists of the SARS-CoV-2 main protease (M^pro), initially comprising 758 compounds, was compiled from an extensive literature review [30]. When multiple IC50 values were reported for the same compound, their mean was used. Because our study aims to develop a classification model of M^pro antagonists, compounds with IC50 < 0.5 μM were defined as active and compounds with IC50 > 10 μM as inactive. Compounds with intermediate bioactivity (IC50 between 1 and 10 μM; 284 compounds) were excluded from the study. The final curated set of 478 compounds was then analyzed.
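The curation rule above can be sketched in Python as follows. This is a minimal illustration, not the authors' script. It assumes that IC50 values are given in μM and that replicate values are averaged arithmetically on the IC50 scale. It also assumes that every compound falling between the two thresholds is treated as intermediate and dropped. The helper names (`pic50_from_um`, `curate`) are hypothetical. The conversion pIC50 = -log10(IC50 in mol/L) is the standard definition.

```python
import math
from statistics import mean

ACTIVE_MAX_UM = 0.5     # IC50 below this value (μM) -> "active"
INACTIVE_MIN_UM = 10.0  # IC50 above this value (μM) -> "inactive"


def pic50_from_um(ic50_um: float) -> float:
    """Convert an IC50 in micromolar to pIC50 = -log10(IC50 in mol/L)."""
    return -math.log10(ic50_um * 1e-6)


def curate(records):
    """Average replicate IC50 values and label each compound.

    records: iterable of (compound_id, ic50_um) pairs; a compound may repeat.
    Returns {compound_id: (mean_ic50_um, pic50, label)}. Compounds with
    intermediate activity are left out.
    """
    grouped = {}
    for cid, ic50 in records:
        grouped.setdefault(cid, []).append(ic50)

    curated = {}
    for cid, values in grouped.items():
        ic50 = mean(values)  # arithmetic mean of replicate IC50 values
        if ic50 < ACTIVE_MAX_UM:
            label = "active"
        elif ic50 > INACTIVE_MIN_UM:
            label = "inactive"
        else:
            continue  # intermediate bioactivity: excluded from modeling
        curated[cid] = (ic50, pic50_from_um(ic50), label)
    return curated
```

For example, `curate([("a", 0.2), ("a", 0.4), ("b", 5.0)])` keeps compound `a` as active (mean 0.3 μM) and drops `b` as intermediate.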

Calculation of molecular descriptors

The PaDEL-Descriptor software was used to compute the fingerprints of the data set [31]. Molecular descriptors are central to QSAR studies because they characterize the various properties of compounds and support the analysis of structural information. In the present study, 12 molecular fingerprints belonging to 9 classes were used to describe the structures: 2D AtomPairs, CDK (including the extended and graph-only versions), E-state, PubChem, Klekota–Roth, Substructure and MACCS.

Data filtering and balancing

Before selecting the fingerprint descriptor sets, variables that were constant or nearly constant were removed to reduce bias and complexity in the model. The standard deviation (SD) of each fingerprint descriptor was calculated, and only descriptors with SD > 0.1 were retained for further investigation. To avoid the tendency of imbalanced data toward overfitting, undersampling was performed by randomly selecting a subset of the active compounds from the starting set. The data were then divided into an internal set (80% of the data set) and an external set (the remaining 20%) to reduce the risk of obtaining a biased predictive model.
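A minimal sketch of the three preprocessing steps follows: the SD filter, random undersampling of the active (majority) class, and the 80/20 internal/external split. It assumes that fingerprints are stored as lists of 0/1 values and labels as the strings "active" and "inactive". Using the population standard deviation (`statistics.pstdev`) is an assumption, because the text does not say which estimator was used. All function names are hypothetical.

```python
import random
from statistics import pstdev


def filter_low_variance(X, sd_cutoff=0.1):
    """Indices of fingerprint columns whose (population) SD exceeds sd_cutoff."""
    n_cols = len(X[0])
    return [j for j in range(n_cols) if pstdev([row[j] for row in X]) > sd_cutoff]


def undersample(labels, majority="active", seed=0):
    """Randomly keep only as many majority-class compounds as there are
    minority-class compounds; returns the sorted indices that are kept."""
    rng = random.Random(seed)
    maj = [i for i, y in enumerate(labels) if y == majority]
    mino = [i for i, y in enumerate(labels) if y != majority]
    kept = rng.sample(maj, min(len(maj), len(mino))) + mino
    return sorted(kept)


def split_internal_external(indices, internal_frac=0.8, seed=0):
    """Shuffle indices and split them into internal (training) and external sets."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    cut = round(internal_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]
```

Fixing the random seed makes a given split reproducible. Repeating the split with different seeds gives the kind of multiple independent data splits the study reports.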

Multivariate analysis of model

The prediction performance of a CSAR model depends on both the learning algorithm and the compound descriptors. Given its success across various modeling tasks and its interpretability in many applications, we employed RF in this study.

Random forest is an ensemble classifier that generates a number of decision trees, each built from a randomly selected subset of training samples and variables. In each tree, classification begins at the root node. At every node, the value of a particular descriptor is used to split the data, so that compounds with different activities are sent predominantly to different branches [32,33]. The final classification is obtained by a majority vote over all trees [34,35]. The randomForest package of the R language was used to create the RF classifier. To predict M^pro inhibitor activity effectively, two RF parameters must be tuned: the number of trees in the forest (ntree) and the number of randomly selected candidate features (mtry). The mtry parameter was tuned using the tuneRF function of the randomForest package. The ntree parameter was tuned by 10-fold CV over the range ntree ∈ {100, 200, …, 1000} [36]. The built-in importance estimator of the RF model was also used to find informative descriptors that better explain the bioactivity of M^pro inhibitors.
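The authors used R's `randomForest` package with `tuneRF`, and the Python sketch below is not that implementation. To stay dependency-free, it builds a deliberately simplified forest of depth-1 trees (decision stumps) on binary fingerprints. It keeps the two ingredients described above: each tree is grown on a bootstrap sample using `mtry` randomly chosen candidate descriptors, and the forest predicts by majority vote. Because a stump has a single node, sampling features per tree is the same as sampling them per node. `ntree` is then selected by 10-fold CV accuracy over the stated grid. All names are hypothetical, and a real analysis would use full-depth trees.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """Depth-1 tree: pick the binary feature whose split misclassifies fewest samples.

    Returns (feature_index, label_if_bit_is_0, label_if_bit_is_1).
    """
    fallback = majority(y)
    best = None
    for f in features:
        left = [lab for row, lab in zip(X, y) if row[f] == 0]
        right = [lab for row, lab in zip(X, y) if row[f] == 1]
        pred_l = majority(left) if left else fallback
        pred_r = majority(right) if right else fallback
        errors = sum(lab != pred_l for lab in left) + sum(lab != pred_r for lab in right)
        if best is None or errors < best[0]:
            best = (errors, f, pred_l, pred_r)
    return best[1:]


def fit_forest(X, y, ntree, mtry, seed=0):
    """Grow ntree stumps, each on a bootstrap sample with mtry random candidate features."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    forest = []
    for _ in range(ntree):
        boot = [rng.randrange(n) for _ in range(n)]  # sample compounds with replacement
        feats = rng.sample(range(m), mtry)           # random subset of descriptors
        forest.append(fit_stump([X[i] for i in boot], [y[i] for i in boot], feats))
    return forest


def predict(forest, x):
    """Majority vote over all trees in the forest."""
    return majority([lab0 if x[f] == 0 else lab1 for f, lab0, lab1 in forest])


def cv_accuracy(X, y, ntree, mtry, k=10, seed=0):
    """k-fold cross-validated accuracy of a forest with the given ntree and mtry."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    correct = 0
    for i in range(k):
        fold = order[i::k]
        held_out = set(fold)
        train = [j for j in order if j not in held_out]
        forest = fit_forest([X[j] for j in train], [y[j] for j in train], ntree, mtry, seed)
        correct += sum(predict(forest, X[j]) == y[j] for j in fold)
    return correct / len(X)


def tune_ntree(X, y, mtry, grid=range(100, 1001, 100), k=10):
    """Return the ntree value from grid with the best k-fold CV accuracy."""
    return max(grid, key=lambda t: cv_accuracy(X, y, t, mtry, k))
```

The grid search cost grows with the grid size times k. With the full grid of ten ntree values and 10 folds, each candidate forest is refit 100 times in total.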

Modelability of data set

Modelability requires an underlying relationship between chemical structures and their bioactivities. Activity cliffs are pairs of structurally similar compounds with markedly different bioactivities, where one compound of the pair is highly active while the other has low bioactivity. Such cliffs are detrimental to machine learning algorithms that try to correlate structures with biological activity. Conversely, similar compounds with comparable bioactivities contribute favorably to the modelability of a data set. Golbraikh et al. developed the modelability index (MODI) to quantify this property [37]. It is calculated in two steps:

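Step 1: Each compound C_i is described by an m-dimensional descriptor vector x_i. For each pair of compounds C_i and C_j, the Euclidean distance d_ij is calculated. It is then normalized by the mean Euclidean distance \bar{d} over all N(N − 1)/2 pairs in a data set of N compounds:

$$d_{ij} = \sqrt{\sum_{k=1}^{m} \left(x_{ik} - x_{jk}\right)^{2}} \tag{1}$$

$$\bar{d} = \frac{2}{N(N-1)} \sum_{i<j} d_{ij} \tag{2}$$

$$D^{\mathrm{normalized}}_{ij} = \frac{d_{ij}}{\bar{d}} \tag{3}$$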

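Step 2: For each compound in the data set, it is determined whether its first nearest neighbor belongs to the same class as the compound or to a different class. The MODI is then calculated as:

$$\mathrm{MODI} = \frac{1}{N_C} \sum_{i=1}^{N_C} \frac{N_i^{\mathrm{same}}}{N_i^{\mathrm{total}}} \tag{4}$$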

where N_C denotes the number of classes and N_i^same denotes the number of compounds in the ith class whose first nearest neighbors also belong to the ith class. N_i^total denotes the total number of compounds in the ith class. A data set is deemed modelable if its MODI index exceeds the cutoff value of 0.65. Here, the MODI index was calculated using an R script previously used to assess the modelability of the HCVpred [38] server.
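MODI from Eq (4) can be sketched as below. The sketch assumes that descriptor vectors are lists of numbers and that Euclidean distance is used to find each compound's first nearest neighbor. Dividing all distances by a common constant, such as the mean distance, does not change which neighbor is nearest, so the normalization step is omitted here. This is an illustrative reimplementation, not the R script used for HCVpred, and `modi` is a hypothetical name.

```python
import math


def modi(X, labels):
    """Modelability index (Eq 4): average over classes of the fraction of
    compounds whose first nearest neighbour (Euclidean) has the same class."""
    n = len(X)
    same, total = {}, {}
    for i in range(n):
        # first nearest neighbour of compound i, excluding i itself
        nn = min((j for j in range(n) if j != i), key=lambda j: math.dist(X[i], X[j]))
        c = labels[i]
        total[c] = total.get(c, 0) + 1
        same[c] = same.get(c, 0) + (labels[nn] == c)
    return sum(same[c] / total[c] for c in total) / len(total)
```

A data set in which every compound's nearest neighbor shares its class scores 1.0. Under the criterion above, it would be considered modelable once the score exceeds 0.65.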

Model validation (Statistical approach)

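Several statistical measures were used to evaluate the model's performance: sensitivity (Sn), specificity (Sp), accuracy (Ac) and the Matthews correlation coefficient (MCC).

$$\mathrm{Sn} = \frac{TP}{TP + FN} \tag{5}$$

$$\mathrm{Sp} = \frac{TN}{TN + FP} \tag{6}$$

$$\mathrm{Ac} = \frac{TP + TN}{TP + TN + FP + FN} \tag{7}$$

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{8}$$

Here TP, FP, TN and FN denote true positives, false positives, true negatives and false negatives, respectively.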
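Eqs (5)–(8) translate directly into code. In this minimal sketch, the confusion-matrix counts in the usage line are hypothetical. Returning an MCC of 0 when a marginal total is zero is a common convention, not something the text specifies.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    ac = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # MCC undefined -> 0 by convention
    return {"Sn": sn, "Sp": sp, "Ac": ac, "MCC": mcc}


metrics = classification_metrics(tp=40, tn=30, fp=10, fn=5)  # hypothetical counts
```

Unlike accuracy, MCC uses all four cells of the confusion matrix. This makes it less flattering when one class dominates, which is one reason it is reported alongside Ac.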

Applicability domain analysis

The applicability domain (AD) defines the region of chemical space within which the model can be expected to make reliable predictions. It is based on the similarity of query compounds to the compounds used to build the model. Predictions are considered reliable only for compounds that fall inside the AD. In this study, the AD of the compounds from both the training and testing sets was analyzed using a PCA bounding box.
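A PCA bounding box can be sketched in three steps. First, PCA is fitted on the training set only. Second, both sets are projected onto the leading components. Third, a query compound is flagged as inside the AD if each of its scores lies within the training set's minimum and maximum on that component. This dependency-free sketch uses power iteration with deflation to obtain the principal axes and assumes two components. The number of components used in the study is not stated, and a production analysis would use a linear-algebra library. Function names are hypothetical.

```python
import math


def pca_fit(X, k=2, iters=500):
    """Mean vector and top-k principal axes of X (rows = compounds).

    Uses power iteration on the covariance matrix with deflation."""
    n, d = len(X), len(X[0])
    means = [sum(col) / n for col in zip(*X)]
    Xc = [[v - m for v, m in zip(row, means)] for row in X]
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)] for a in range(d)]
    axes = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(d)]  # arbitrary non-symmetric start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break  # no variance left in this direction
            v = [x / norm for x in w]
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        axes.append(v)
        # deflation: remove the found component before searching for the next one
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return means, axes


def pca_scores(X, means, axes):
    """Project rows of X onto the principal axes (centering with training means)."""
    return [[sum((v - m) * c for v, m, c in zip(row, means, axis)) for axis in axes] for row in X]


def inside_bounding_box(train_scores, query_scores):
    """True for each query whose scores lie within the training min/max on every axis."""
    k = len(train_scores[0])
    lo = [min(s[j] for s in train_scores) for j in range(k)]
    hi = [max(s[j] for s in train_scores) for j in range(k)]
    return [all(lo[j] <= q[j] <= hi[j] for j in range(k)) for q in query_scores]
```

Fitting PCA on the internal set alone, rather than on all compounds, keeps the external set from influencing the domain it is being checked against.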

Deployment of model as web-app

Finally, we deployed the developed RF model as a web-app to make it easily accessible to the research community. The web-app, named “M^propred”, was built with the Streamlit Python package (https://www.streamlit.io/) and deployed on the “Streamlit Share” cloud application platform. Its source code is maintained in a GitHub repository. The web-app accepts SMILES IDs and compound names in a text (.txt) file and returns the predicted pIC50 values of the compounds.

Correlation of predicted bioactivity with binding affinity (Molecular modeling and simulation)

We further tested the correlation between the predicted bioactivity of compounds and their binding affinity to M^pro, using our web-app together with an integrated molecular modeling and simulation approach. The comprehensive marine natural products database CMNPD was used for this purpose [39]. No previous study had tested compounds from this database against M^pro. We therefore downloaded all 31,492 available compounds, collected their SMILES IDs and submitted them to the M^propred web-app for bioactivity (pIC50) prediction. The 3D structures of the top compounds with high predicted pIC50 values were then generated using Open Babel and energy-minimized with the MMFF94 force field in preparation for molecular docking. The 3D structure of SARS-CoV-2 M^pro in complex with the inhibitor N3 (PDB ID: 6LU7) was used as the receptor. Molecular docking was run using AutoDock Vina with the same grid box parameters, covering the ligand binding residues, as in our previous work [40,41]. The exhaustiveness value was set to 100. The aim of this approach was to assess whether compounds with high predicted pIC50 values bind to the protease with high affinity.
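For reference, AutoDock Vina reads its settings from a plain-text configuration file passed with `--config`. The helper below writes such a file with the stated exhaustiveness of 100. The grid box center and size used in the study come from the authors' earlier work [40,41] and are not given here. The values and file names in the example call are therefore placeholders, and `vina_config` is a hypothetical helper.

```python
def vina_config(receptor, ligand, center, size, exhaustiveness=100, out=None):
    """Return the text of an AutoDock Vina configuration file.

    center and size are (x, y, z) tuples in Å describing the search box.
    """
    lines = [f"receptor = {receptor}", f"ligand = {ligand}"]
    for axis, value in zip("xyz", center):
        lines.append(f"center_{axis} = {value}")
    for axis, value in zip("xyz", size):
        lines.append(f"size_{axis} = {value}")
    lines.append(f"exhaustiveness = {exhaustiveness}")
    if out is not None:
        lines.append(f"out = {out}")
    return "\n".join(lines) + "\n"


# Placeholder box: the study's actual grid parameters are not reproduced here.
config_text = vina_config(
    "6LU7_receptor.pdbqt",  # hypothetical file names
    "ligand.pdbqt",
    center=(0.0, 0.0, 0.0),
    size=(20.0, 20.0, 20.0),
    out="ligand_out.pdbqt",
)
```

The returned text can be saved, for example as `conf.txt`, and passed to Vina with `vina --config conf.txt`. Generating one file per ligand keeps the box and exhaustiveness identical across the whole screen.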

The top 5 complexes with the strongest binding interactions with M^pro were subjected to MD simulation to examine their conformational changes. The GROningen MAchine for Chemical Simulations (GROMACS) version 5.1.2 was used to perform the MD simulations with the parameters from our previous studies [42,43]. The topologies of the protein and the ligands were generated using the ‘pdb2gmx’ tool and the PRODRG server, respectively [44]. The GROMOS96 54a7 force field was used, and the complexes were solvated in a cubic box with 1.0 nm of padding using the simple point charge (SPC) water model [45]. The net charges of the systems were neutralized using the ‘gmx genion’ tool of GROMACS. The steepest descent algorithm was used to minimize the energy of the complexes until the maximum force fell below 10.0 kJ/mol/nm, with a maximum of 50,000 steps. The systems were then equilibrated in the NVT and NPT ensembles for 100 picoseconds (ps) each at 300 K, with the NPT step at 1 atm. The V-rescale thermostat and the Parrinello–Rahman barostat were used. The final production run was performed for 100 nanoseconds (ns) at 300 K with a 2 fs time step on the HPC cluster of the National Institute of Biotechnology, Savar, Bangladesh. The simulations were accelerated using an NVIDIA GeForce RTX 3070 graphics processing unit (GPU). After the simulation, the stability of the complexes was evaluated from the root mean square deviation (RMSD), root mean square fluctuation (RMSF), radius of gyration (Rg), solvent accessible surface area (SASA) and number of hydrogen bonds. The GRACE software was used to plot the graphs.
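The stated run settings map onto a few GROMACS `.mdp` keywords. The number of integration steps follows from the duration divided by the time step: 100 ns / 2 fs = 50,000,000 steps and 100 ps / 2 fs = 50,000 steps. The sketch below renders only these stated values, so it is an incomplete, assumed fragment. Settings the text does not give (temperature-coupling groups, coupling time constants, cutoffs, constraints, output frequencies) are omitted. In addition, 1 atm is approximated as 1.0 bar, and applying pressure coupling during production is an assumption.

```python
DT_PS = 0.002  # 2 fs integration time step, expressed in ps as GROMACS expects


def nsteps_for(duration_ps, dt_ps=DT_PS):
    """Number of MD steps needed to cover duration_ps with time step dt_ps."""
    return round(duration_ps / dt_ps)


def render_mdp(params):
    """Format a dict of .mdp options as 'key = value' lines."""
    return "".join(f"{key:<12} = {value}\n" for key, value in params.items())


# Only the settings stated in the text; everything else is deliberately omitted.
EM = {"integrator": "steep", "emtol": 10.0, "nsteps": 50000}
NVT = {"integrator": "md", "dt": DT_PS, "nsteps": nsteps_for(100),
       "tcoupl": "V-rescale", "ref_t": 300}
NPT = {**NVT, "pcoupl": "Parrinello-Rahman", "ref_p": 1.0}  # 1 atm ~ 1.0 bar (assumed)
PRODUCTION = {**NPT, "nsteps": nsteps_for(100_000)}         # 100 ns = 100,000 ps
```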

After the final production run, we also calculated the binding free energies by MM/PBSA using the ‘g_mmpbsa’ package for GROMACS [46]. In this method, the binding free energy is calculated as:

$$\Delta G_{\mathrm{binding}} = G_{\mathrm{complex}} - \left(G_{\mathrm{protein}} + G_{\mathrm{ligand}}\right) \tag{9}$$

Here ΔG_binding is the binding free energy and G_complex is the total free energy of the protein-ligand complex. G_protein and G_ligand are the free energies of the unbound protein and the unbound ligand, respectively.
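Eq (9) can be applied to each trajectory snapshot and then averaged, as sketched below. The per-frame energies are hypothetical inputs in kJ/mol. Summarizing them as the mean ± sample standard deviation is an assumption, and g_mmpbsa's own analysis scripts may compute the reported statistics differently.

```python
from statistics import mean, stdev


def binding_free_energy(g_complex, g_protein, g_ligand):
    """Eq (9): ΔG_binding = G_complex - (G_protein + G_ligand)."""
    return g_complex - (g_protein + g_ligand)


def summarize_frames(frames):
    """frames: (G_complex, G_protein, G_ligand) tuples in kJ/mol, one per snapshot.

    Returns (mean ΔG_binding, sample standard deviation)."""
    dg = [binding_free_energy(*frame) for frame in frames]
    return mean(dg), stdev(dg)
```

Sampling snapshots at fixed intervals rather than using every frame reduces the correlation between consecutive energy estimates.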

Results

Chemical space analysis

The dataset used in this study, including the SMILES IDs of the 758 compounds and their references, is provided in S1 File. A major motivation for chemical space analysis is to explore the typical differences between active and inactive compounds. To investigate the general chemical space, we plotted the distribution of actives and inactives by molecular weight (MW) against the Ghose–Crippen–Viswanadhan octanol–water partition coefficient (ALogP). We then compared the actives and inactives using Lipinski’s rule-of-five (Ro5). Fig 2(A) depicts MW as a function of ALogP. Most of the compounds lie in the MW range of 250–600 Da and have ALogP values of 0–6. Fig 2B–2E shows the data and the statistical analysis of the Ro5 descriptors. Most of the compounds meet the Ro5 criteria (MW ≤ 500 Da, ALogP ≤ 5, nHBDon ≤ 5 and nHBAcc ≤ 10). Furthermore, the Mann–Whitney U test showed a significant difference between the active and inactive compounds (Table 1). The ALogP values of the inactive compounds were higher than those of the active ones. The nHBDon values of the active and inactive compounds were similar, whereas the nHBAcc values of the active compounds were lower than those of the inactive ones.
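The Ro5 screen can be expressed as a violation count. This minimal sketch assumes that the four descriptors (MW, ALogP, nHBDon, nHBAcc) have already been computed, for example with PaDEL-Descriptor. Allowing at most one violation is a common convention, and the text does not say which tolerance was applied.

```python
def ro5_violations(mw, alogp, n_hbd, n_hba):
    """Count Lipinski rule-of-five violations."""
    return sum([
        mw > 500,    # molecular weight above 500 Da
        alogp > 5,   # octanol-water partition coefficient above 5
        n_hbd > 5,   # more than 5 hydrogen bond donors
        n_hba > 10,  # more than 10 hydrogen bond acceptors
    ])


def passes_ro5(mw, alogp, n_hbd, n_hba, max_violations=1):
    """True if the compound has at most max_violations Ro5 violations."""
    return ro5_violations(mw, alogp, n_hbd, n_hba) <= max_violations
```

Setting `max_violations=0` gives the strict reading in which every criterion must be met.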


Fig 2.

Chemical space analysis (A), box plots of Lipinski’s rule-of-five descriptors (B-E) and applicability domain analysis (F) for the analyzed M^pro inhibitors.

https://doi.org/10.1371/journal.pone.0287179.g002


Table 1. Mann–Whitney U test results of various properties of the compounds.

https://doi.org/10.1371/journal.pone.0287179.t001
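The Mann–Whitney U statistic behind Table 1 can be sketched in pure Python. Both samples are pooled, ties receive average ranks, and U for the first sample is derived from its rank sum. The two-sided p-value here uses the normal approximation without tie or continuity correction. It is therefore only an approximation of what a statistics package would report for the data in Table 1. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p-value from the normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p
```

Because the test uses only ranks, it does not assume normally distributed descriptor values. This suits skewed properties such as MW or hydrogen bond counts.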

Furthermore, the AD of the built model was determined by PCA using the MACCS fingerprints as input, as shown in Fig 2(F). After balancing, the data set of 478 compounds was randomly divided into internal (80%) and external (20%) subsets. The internal set is used as the training set for building the predictive models, which are then used to predict the external set. The chemical space of the external set lies well inside the boundaries of the internal set, indicating that the AD of the CSAR model described herein is well defined.

QSAR modeling

To develop an interpretable QSAR model, we used fingerprints computed with the PaDEL-Descriptor software. S2–S13 Files contain the 12 molecular fingerprints computed for our dataset. The modelability index (MODI) measures how well active compounds are separated from inactive compounds in descriptor space. It thus indicates the likelihood of obtaining a predictive CSAR model. All fingerprint descriptors had MODI values greater than 0.65, indicating that the data set is suitable for developing a classification model. Table 2 lists these fingerprints together with their descriptions and MODI indices.


Table 2. List of molecular fingerprints employed in the current study for representing chemical structures of the M^pro inhibitor dataset along with their MODI indices.

https://doi.org/10.1371/journal.pone.0287179.t002

To distinguish between active and inactive M^pro inhibitors, we built the CSAR model using the RF algorithm. Table 3 shows the results of 100 independent runs for all fingerprint classes, including the internal validation test, 10-fold CV and the external validation test. In 10-fold CV, the best averaged values were obtained for the MACCS fingerprints (Ac of 84.69% and MCC of 0.691). As shown in Table 3, the external validation results for the MACCS, Klekota–Roth and 2D atom pairs fingerprint descriptors were also better than those of the other descriptors. Taking into account the results of both 10-fold CV and external validation, the MACCS fingerprint descriptors outperform the other fingerprint classes. Fig 3 shows the plot of experimental versus predicted pIC50 values for the model constructed with MACCS fingerprint descriptors.


Fig 3. Plot showing experimental versus predicted pIC50 values for model constructed with MACCS fingerprint descriptors.

https://doi.org/10.1371/journal.pone.0287179.g003


Table 3. Performance summary of CSAR models for predicting M^pro inhibitors of SARS-CoV-2.

https://doi.org/10.1371/journal.pone.0287179.t003

Interpretation of feature importance

The top-ranked MACCS fingerprints obtained from the RF model are listed in Table 4. They include fingerprints pertaining to different classes, such as aromatic, nitrogen-containing, oxygen-containing and halogen-containing substructures.


Table 4. List of the top-ranking MACCS fingerprints and their corresponding description.

https://doi.org/10.1371/journal.pone.0287179.t004

Model deployment as the M^propred web-app and assessment

To allow biologists and chemists without a computer science background to apply the prediction model in their research, we deployed it as a public web-app, M^propred, available at https://share.streamlit.io/nadimfrds/mpropred/Mpropred_app.py. A brief guide to using the M^propred web-app (Fig 4) is given below:

Step 1. A text file (.txt) should be created in which each line contains the SMILES ID of a compound followed by a space and the compound's name/ID (Fig 4A). SMILES IDs for small molecules can be obtained from databases such as DrugBank [54], PubChem [55] or ChemSpider [56]. Custom compounds can be drawn in the JSME structure editor [57] or ChemDraw [58] to generate SMILES notation for novel compounds.

Step 2. The above-mentioned URL should be entered into a web browser to open the homepage of the web-app (Fig 4B).

Step 3. The created text file should be uploaded to the web-app by clicking on the “Browse files” button (Fig 4C).

Step 4. The prediction is started by clicking on the “Predict!” button (Fig 4C).

Step 5. The results are shown in a box below the “Prediction results” heading (Fig 4D). Typically, the web-app requires only a few seconds to process the task. Users can also download the predicted results as a CSV file by clicking the “Download Predictions” button.
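The input format of Step 1 (one compound per line, with the SMILES followed by a space and a name) can be produced and checked with a small helper before uploading. This sketch assumes that the first whitespace-separated token is the SMILES, which holds because SMILES strings contain no spaces, and that everything after it is the name. The placeholder IDs given to unnamed lines are hypothetical and may differ from what M^propred itself does.

```python
def parse_input(text):
    """Parse 'SMILES name' lines into (smiles, name) pairs.

    The first whitespace-separated token is taken as the SMILES; the rest of
    the line is the name. Blank lines are skipped, and unnamed lines get a
    hypothetical placeholder ID based on their line number.
    """
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        parts = line.strip().split(maxsplit=1)
        if not parts:
            continue
        name = parts[1].strip() if len(parts) > 1 else f"compound_{lineno}"
        records.append((parts[0], name))
    return records


def format_input(records):
    """Inverse of parse_input: one 'SMILES name' line per compound."""
    return "".join(f"{smiles} {name}\n" for smiles, name in records)
```

Round-tripping a file through `parse_input` and `format_input` normalizes stray blank lines and missing names before upload.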


Fig 4.

https://doi.org/10.1371/journal.pone.0287179.g004

Binding affinity of CMNPD compounds with M^pro

Of the binding poses predicted by AutoDock Vina for each compound, the one with the lowest binding energy was selected. The molecular docking scores of the top 200 CMNPD compounds with M^pro ranged from -4.3 to -10 kcal/mol (S14 File), and the results for the top 5 compounds are presented in Table 5. The amino acid interactions of M^pro with the top 5 compounds were also identified. The lowest binding energy was found for CMNPD285. In its complex with M^pro, CMNPD16005 was stabilized by the highest number of hydrogen bonds (seven) and by four hydrophobic interactions. The second-highest number of hydrogen bonds (six) was formed in the CMNPD12721 complex, which was also stabilized by seven hydrophobic interactions. All 5 compounds interacted stably with the active site residues, including the catalytic dyad formed by His41 and Cys145 of M^pro. The detailed interaction profiles of the top 5 compounds and the N3 ligand with M^pro are shown in Fig 5.
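Selecting the lowest-energy pose can be automated from AutoDock Vina's output PDBQT. In that file, each pose (MODEL) carries a `REMARK VINA RESULT:` line whose first number is the affinity in kcal/mol. This is a minimal sketch, and `best_vina_pose` is a hypothetical helper.

```python
def best_vina_pose(pdbqt_text):
    """Return (model_number, affinity) for the lowest-energy pose in Vina output.

    Each pose in a Vina output PDBQT has a line such as
    'REMARK VINA RESULT:    -7.5      0.000      0.000'
    whose first number is the affinity in kcal/mol.
    """
    affinities = [
        float(line.split()[3])
        for line in pdbqt_text.splitlines()
        if line.startswith("REMARK VINA RESULT:")
    ]
    if not affinities:
        raise ValueError("no 'REMARK VINA RESULT' lines found")
    best = min(range(len(affinities)), key=affinities.__getitem__)
    return best + 1, affinities[best]
```

Vina normally lists poses in order of affinity, so this usually returns the first model. Taking the minimum explicitly guards against files that have been reordered or merged.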


Fig 5.

Two-dimensional (2D) representation of molecular docking analysis between the SARS-CoV-2 M^pro and (A) N3, (B) CMNPD285, (C) CMNPD20581, (D) CMNPD12721, (E) CMNPD16005, (F) CMNPD6083.

https://doi.org/10.1371/journal.pone.0287179.g005


Table 5. Predicted pIC50 and binding affinity score of top 5 compounds from CMNPD database against M^pro.

https://doi.org/10.1371/journal.pone.0287179.t005

Molecular dynamics (MD) simulation results

The RMSD of the backbone atoms of the protein-ligand complexes was analyzed to assess their stability. Fig 6(A) shows that the CMNPD16005 complex has the lowest RMSD of all complexes. Notably, the RMSD of the 6LU7-N3 complex is slightly higher than that of the CMNPD16005 complex, indicating the considerable stability of CMNPD16005. The RMSD of the CMNPD285 complex reaches ∼0.4 nm from 60 to 85 ns and then decreases to 0.3 nm after 85 ns. For the CMNPD12721 complex, the RMSD increases steadily after 60 ns and eventually decreases, suggesting that CMNPD12721 might change the conformation of the protein. After the control and CMNPD16005 complexes, the CMNPD6083 complex shows the most stable RMSD. The CMNPD20581 complex shows the highest RMSD and a higher degree of fluctuation throughout the simulation.


Fig 6.

The Root-mean-square deviation (A), Root-mean-square fluctuation (B), Radius of gyration (C), Solvent accessible surface area (D) and hydrogen bond (E) analysis of protein-ligand complexes from the molecular simulation of 100 ns at 300 K.

https://doi.org/10.1371/journal.pone.0287179.g006

Because RMSF reveals which regions of the receptor fluctuate during the simulation, the flexibility of every residue was determined to better understand how ligand binding affects receptor flexibility. Fig 6(B) shows that binding of CMNPD12721 makes M^pro more flexible in almost all regions than in any other complex. Overall, residues such as Glu47, Met49, Leu50, Tyr154, Ala194, Thr196, Arg222, Asn277 and Phe305 were flexible in both the control and the ligand-bound complexes.
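Per-atom RMSF is the standard deviation of each atom's position around its time-averaged position. This minimal sketch assumes that the frames have already been superimposed on a reference structure and that coordinates are in nm. Averaging the values per residue, as plotted in Fig 6(B), would be a further step. The function name is hypothetical.

```python
import math


def rmsf(frames):
    """Per-atom root mean square fluctuation.

    frames: list of frames, each a list of (x, y, z) coordinates for the same atoms,
    already fitted to a common reference.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for a in range(n_atoms):
        mean_pos = [sum(f[a][k] for f in frames) / n_frames for k in range(3)]
        msd = sum(
            sum((f[a][k] - mean_pos[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result
```

Without prior fitting, overall rotation and translation of the complex would be counted as fluctuation and inflate every residue's value.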

Rg reflects the compactness of protein-ligand complexes: the smaller its fluctuation over the simulation period, the more stable and compact the system. The Rg values of the 6LU7-N3 and CMNPD285 complexes remained nearly constant throughout the simulation (Fig 6C). In contrast, the Rg of the CMNPD20581 complex increased from 40 to 100 ns. This larger change in Rg might reflect partial unfolding or other distinct structural changes of the protein. The remaining complexes showed decreasing Rg values, indicating greater compactness and rigidity over the simulation period.
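For reference, the mass-weighted radius of gyration is Rg = sqrt(Σ m_i |r_i - r_cm|² / Σ m_i), where r_cm is the center of mass. Below is a minimal sketch for a single frame, with hypothetical coordinate and mass inputs.

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of one frame.

    coords: list of (x, y, z) positions; masses: matching list of atomic masses.
    """
    total = sum(masses)
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)]
    sq = sum(m * sum((c[k] - com[k]) ** 2 for k in range(3)) for m, c in zip(masses, coords))
    return math.sqrt(sq / total)
```

Evaluating this for every frame gives the Rg time series, whose drift or stability is what the comparison above is based on.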

A higher SASA value indicates expansion of the protein surface exposed to solvent, and a stable complex is expected to show little SASA fluctuation over time. The binding of a molecule can alter SASA and can have a significant impact on the receptor structure. The SASA values of all complexes, including the control, remained low during the simulation period, suggesting that these molecules could limit protein expansion (Fig 6D).

Because intermolecular hydrogen bonds contribute to conformational stability, the total number of hydrogen bonds in the protein-ligand complexes was determined. Over the simulation period, the largest number of hydrogen bonds was observed for the 6LU7-N3 complex and the smallest for the CMNPD20581 complex (Fig 6E). Compared with the CMNPD20581 complex, the remaining complexes maintained a considerable number of hydrogen bonds (ranging from 3 to 8).

Post simulation binding free energy results

Using the MM/PBSA method, the binding free energies were estimated from the last 20 ns of the MD trajectories at 100 ps intervals. The overall binding energies of all complexes were negative, indicating favorable binding (Table 6). The CMNPD16005 complex showed the lowest binding free energy (-296.193 ± 25.797 kJ/mol), indicating the most stable binding among the compounds. The other complexes also had low binding free energies, suggesting that they could serve as potential lead compounds. A comparison of the binding free energies of the complexes is illustrated in Fig 7(A), and the contributions of individual amino acid residues to binding are shown in Fig 7(B). Leu27, Met49, Cys145, Leu167, Pro168 and Thr190 contributed notably to the binding of the compounds to M^pro.


Fig 7.

Graphical illustration of the binding free energy (A) and per residue contribution plot of protein-ligand complexes (B).

https://doi.org/10.1371/journal.pone.0287179.g007

Table 6. Binding free energy calculations (MM/PBSA) for six protein-ligand complexes.

https://doi.org/10.1371/journal.pone.0287179.t006

Discussion

The COVID-19 pandemic has severely affected the health and daily lives of billions of people around the world over the last two years. In recent months, there has been a race against time to vaccinate as many people as possible; however, disparities in vaccine distribution between nations, together with newly emerging variants, pose an additional public health risk and make full immunization difficult to achieve [59]. Several vaccine formulations are now available and help build immunity [60–63]. Nonetheless, there is growing interest in developing new anti-COVID-19 drugs. In this context, M^pro, which is responsible for cleaving the viral polyproteins during replication, is an attractive drug target.

In the current study, we aimed to develop a classification model that distinguishes active from inactive compounds and to build a web-app that allows users to screen compounds for activity against M^pro. To build a robust QSAR model, we followed the Organisation for Economic Co-operation and Development (OECD) recommendations [64]. These guidelines comprise the following major points: (i) the data set should have a defined endpoint, (ii) the model should use an explicit learning algorithm, (iii) the model should have a defined applicability domain (AD), (iv) its robustness and predictivity should be appropriately measured, and (v) the important features of the QSAR model should be interpreted. We initially compiled a data set of 758 compounds from the literature and applied thresholds of <0.5 μM (active) and >10 μM (inactive) to build a classification model. After excluding the compounds with intermediate activity, we obtained a curated set of 478 compounds for detailed analysis. Lipinski's rule-of-five (Ro5) can be used to assess whether a compound is likely to have the physicochemical properties required of an orally active drug in humans. These criteria are based on the observation that most orally active drugs are relatively small, moderately lipophilic molecules, and are expressed in terms of MW (≤500 Da), ALogP (≤5), the number of hydrogen bond donors (≤5), and the number of hydrogen bond acceptors (≤10). We found that most of the compounds met the Ro5 criteria (Fig 2B–2E), and the Mann–Whitney U test showed a significant difference between the active and inactive compounds (Table 1). In addition, principal component analysis (PCA) of the chemical space showed that the external set lies well within the region occupied by the internal set, indicating that the AD of the developed CSAR model is well defined (Fig 2F).
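The curation and drug-likeness steps described above can be sketched in a few lines. The example assumes that the μM thresholds apply to IC50 values, treats values between 0.5 and 10 μM as intermediate (excluded), and counts Lipinski violations from precomputed MW, ALogP, donor and acceptor counts. The record layout, compound IDs and function names are hypothetical, and the descriptor values are made up for illustration.

```python
from typing import Dict, List, Optional


def activity_class(ic50_um: float, active_below: float = 0.5,
                   inactive_above: float = 10.0) -> Optional[str]:
    """Binary label from an IC50 in uM; intermediate values return None (excluded)."""
    if ic50_um < active_below:
        return "active"
    if ic50_um > inactive_above:
        return "inactive"
    return None


def ro5_violations(mw: float, alogp: float, hbd: int, hba: int) -> int:
    """Number of Lipinski rule-of-five criteria violated (MW, logP, HBD, HBA)."""
    return sum([mw > 500, alogp > 5, hbd > 5, hba > 10])


def curate(records: List[Dict]) -> List[Dict]:
    """Keep only compounds that fall clearly into the active or inactive class."""
    curated = []
    for rec in records:
        label = activity_class(rec["ic50_um"])
        if label is not None:
            curated.append({**rec, "label": label})
    return curated


if __name__ == "__main__":
    data = [  # hypothetical records, not taken from S1 File
        {"id": "cpd1", "ic50_um": 0.1, "mw": 420.5, "alogp": 3.1, "hbd": 2, "hba": 6},
        {"id": "cpd2", "ic50_um": 3.0, "mw": 510.2, "alogp": 4.2, "hbd": 3, "hba": 8},
        {"id": "cpd3", "ic50_um": 25.0, "mw": 610.7, "alogp": 5.6, "hbd": 4, "hba": 11},
    ]
    for rec in curate(data):
        print(rec["id"], rec["label"],
              ro5_violations(rec["mw"], rec["alogp"], rec["hbd"], rec["hba"]))
```

A common convention regards a compound as Ro5-compliant when it has at most one violation. The Mann–Whitney U comparison of active and inactive descriptor distributions would then be run with a statistics library, for example `scipy.stats.mannwhitneyu`.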

Furthermore, following the aforementioned guidelines, we used interpretable molecular fingerprints to build QSAR models and evaluated model performance for all 12 fingerprints. Before a predictive model is developed, activity cliffs in the data set need to be identified and addressed, which can be done using the data set modelability index (MODI). The data set had a MODI value above 0.65 for all 12 fingerprint descriptors, indicating that it is suitable for developing a classification model (Table 2). We then developed QSAR models using the random forest (RF) algorithm to discriminate active from inactive M^pro inhibitors. The best averaged values in 10-fold cross-validation (CV) were obtained with the MACCS fingerprint descriptors (Ac of 89%, Sn of 89%, Sp of 73%, and MCC of 0.79) (Table 3). The 2D atom pairs and Klekota–Roth fingerprints also performed well, ranking second and third, respectively, for both Ac and MCC: the 2D atom pairs fingerprints gave Ac and MCC values of 85.74% and 0.713, and the Klekota–Roth fingerprints gave 83.64% and 0.664 (Table 3). Based on the Ac and MCC values from both CV and external validation, the MACCS fingerprints were the best choice for model interpretation.
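The MODI check and the reported performance metrics follow simple formulas. For a binary data set, MODI averages, over the two classes, the fraction of compounds whose nearest neighbour belongs to the same class, so values close to 1 indicate few activity cliffs. The sketch below assumes binary fingerprints and uses Tanimoto distance to find nearest neighbours; the distance measure behind Table 2 is not stated here, so this choice is an assumption. The metric function computes Ac, Sn, Sp and MCC from confusion-matrix counts. Function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def tanimoto_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """1 - Tanimoto (Jaccard) similarity of two binary fingerprints."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def modi(fps: List[Sequence[int]], labels: List[str]) -> float:
    """Class-averaged fraction of compounds whose nearest neighbour shares their class."""
    same: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for i, fp in enumerate(fps):
        others = (j for j in range(len(fps)) if j != i)
        nearest = min(others, key=lambda j: tanimoto_distance(fp, fps[j]))
        total[labels[i]] = total.get(labels[i], 0) + 1
        if labels[nearest] == labels[i]:
            same[labels[i]] = same.get(labels[i], 0) + 1
    return sum(same.get(c, 0) / n for c, n in total.items()) / len(total)


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient."""
    ac = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Ac": ac, "Sn": sn, "Sp": sp, "MCC": mcc}
```

A data set would pass the 0.65 modelability threshold used above when `modi(...) > 0.65`. MCC ranges from -1 (total disagreement) through 0 (no better than random) to 1 (perfect prediction).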

Next, the important features of the selected descriptors were investigated to gain better insight into the mechanistic details driving M^pro inhibition. The top-ranked MACCS descriptors obtained from the RF model span several classes, including aromatic, nitrogen-containing, oxygen-containing and halogen-containing substructures (Table 4). A range of N-substituted isatin derivatives have been shown to inhibit M^pro, with the highest activity associated with derivatives bearing carboxamide groups at C-5 of the isatin core (IC50 = 0.045–17.8 μM) [65]. Several small oxygen-containing compounds have also been reported to inhibit M^pro and block viral replication [66,67]. Hengphasatporn et al. identified halogenated baicalein as a potent M^pro inhibitor and confirmed its inhibitory activity in an in vitro assay [68]. The addition of halogen groups has also been found to improve binding strength by an order of magnitude [69]. Abdallah et al. generated a pharmacophore model and found three acceptor features and one aromatic ring feature common to all the active hits, including the co-crystallized ligand [70]. Thus, the top-ranked MACCS descriptors are consistent with the structural properties of experimentally validated potent M^pro inhibitors.
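The text does not specify which importance measure was used to rank the MACCS keys in Table 4. As a model-agnostic illustration only, the sketch below computes permutation importance: the mean drop in accuracy when one fingerprint bit is shuffled across compounds while all other bits are kept. The `predict` callable stands for any trained classifier and, like the function names, is an assumption; scikit-learn users would more typically read `RandomForestClassifier.feature_importances_` or call `sklearn.inspection.permutation_importance`.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[int]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[List[Row]], List[int]],
    X: List[Row],
    y: Sequence[int],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean drop in accuracy when one feature column is shuffled at a time."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances
```

Bits whose shuffling causes the largest accuracy drop are those the model relies on most; mapping those bit indices back to the MACCS key definitions gives substructure-level interpretations of the kind discussed above.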

In a typical predictive model life cycle, once a model has been validated and its results published, its utility is essentially over: the model has fulfilled its goal of making predictions and offering useful insights into the underlying key characteristics. We believe that deploying the predictive model as a public web-app, which allows scientists and researchers, particularly in computational chemistry and biology, to use the model's predictive insights, significantly increases its value, benefits the scientific community, and greatly extends the model's life cycle. The web-app is available on the Streamlit Share platform (Fig 4). To test whether the pIC50 values predicted by the web-app correlate with binding affinity, we applied an integrated molecular modeling approach. All 31,492 available CMNPD compounds were submitted to the web-app to predict their pIC50 values, and the top five compounds with the highest binding affinity to M^pro had predicted pIC50 values ranging from 6.37 to 7 (Table 5). These compounds formed multiple hydrogen bonds and hydrophobic interactions, and all of them interacted stably with the catalytic dyad residues His41 and Cys145 (Fig 5).
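Because the web-app reports activity as pIC50, it helps to relate those values back to concentrations: pIC50 = -log10(IC50 [M]) = 6 - log10(IC50 [μM]). The predicted range of 6.37 to 7 therefore corresponds to IC50 values of roughly 0.43 to 0.10 μM, and the 0.5 μM activity threshold corresponds to a pIC50 of about 6.3. The sketch also includes a Pearson correlation as one possible way to compare predicted pIC50 values with docking scores; the correlation statistic used in the study is not stated here, so this choice and the function names are assumptions.

```python
import math
from typing import Sequence


def pic50_from_ic50_um(ic50_um: float) -> float:
    """pIC50 = -log10(IC50 in mol/L); with IC50 in uM this is 6 - log10(IC50)."""
    return 6.0 - math.log10(ic50_um)


def ic50_um_from_pic50(pic50: float) -> float:
    """Inverse conversion: IC50 in uM from a pIC50 value."""
    return 10.0 ** (6.0 - pic50)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # The predicted pIC50 range 6.37-7 corresponds to roughly 0.43-0.10 uM.
    print(round(ic50_um_from_pic50(6.37), 2), round(ic50_um_from_pic50(7.0), 2))
```

If binding affinity is expressed as a docking score in which more negative values mean stronger binding, agreement between the two measures would appear as a negative correlation coefficient.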

MD simulation results further confirmed the stability of these five compounds in complex with M^pro. The RMSD plot indicates that all five complexes are stable, with no unexpected rises in RMSD values across the simulation time (Fig 6A). The RMSF analysis showed that residue fluctuations in the complexes remained within an acceptable range (Fig 6B). The radius of gyration (Rg) values of the protein-ligand complexes were broadly similar, indicating comparable compactness (Fig 6C). The SASA values showed that the volume of the complexes did not increase substantially (Fig 6D). A considerable number of hydrogen bonds was observed in all of the complexes throughout the simulation, further supporting their conformational stability (Fig 6E). Furthermore, the binding free energies of all the complexes were estimated using the MM/PBSA method, and the results suggest that the compounds bind favorably to M^pro (Table 6 and Fig 7A). The per-residue interaction energy profile shows that the Leu27, Met49, Cys145, Leu167, Pro168, and Thr190 residues of M^pro played an important role in protein-ligand stability and contributed significantly to the binding of the compounds (Fig 7B). These compounds may therefore have the potential to interfere with and block the activity of SARS-CoV-2 M^pro.
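The RMSD and RMSF analyses summarised above reduce to two short formulas. The sketch below assumes each frame has already been least-squares fitted to the reference structure (the fitting step itself is not shown) and that coordinates are (x, y, z) tuples in nm; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def rmsd(frame: Sequence[Vec3], reference: Sequence[Vec3]) -> float:
    """Root-mean-square deviation between two pre-aligned coordinate sets."""
    if len(frame) != len(reference):
        raise ValueError("frame and reference must contain the same atoms")
    sq = sum(
        sum((a[k] - b[k]) ** 2 for k in range(3)) for a, b in zip(frame, reference)
    )
    return math.sqrt(sq / len(reference))


def rmsf(trajectory: Sequence[Sequence[Vec3]]) -> List[float]:
    """Per-atom root-mean-square fluctuation about the time-averaged position."""
    n_frames = len(trajectory)
    result = []
    for i in range(len(trajectory[0])):
        # Time-averaged position of atom i.
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        msf = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msf))
    return result
```

RMSD is computed per frame against the reference to give curves like those in Fig 6A, whereas RMSF is computed per atom (or per residue, after averaging) across all frames, as in Fig 6B.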

Thus, the web-app presented in the current study can be used in further research to gain insight into the anti-M^pro activity of various compounds. In addition, once the toxicity of the five marine-derived compounds has been evaluated in suitable assays, their inhibitory efficacy can be tested through in vitro laboratory validation.

Supporting information

S1 File. The SMILES ID and additional details of the 758 compounds.

https://doi.org/10.1371/journal.pone.0287179.s001

(XLSX)

S2 File. Computed AtomPairs2DCount fingerprint of our dataset.

https://doi.org/10.1371/journal.pone.0287179.s002

(CSV)

S3 File. Computed AtomPairs2D fingerprint of our dataset.

https://doi.org/10.1371/journal.pone.0287179.s003

(CSV)

S4 File. Computed CDK fingerprint of our dataset.

https://doi.org/10.1371/journal.pone.0287179.s004

(CSV)

S5 File. Computed CDK Extended fingerprint of our dataset.

https://doi.org/10.1371/journal.pone.0287179.s005

(CSV)

S6 File. Computed CDK Graph only fingerprint of our dataset.

https://doi.org/10.1371/journal.pone.0287179.s006

(CSV)

S7 File. Computed EState fingerprint of our dataset.

https://doi.org/10.1371/journal.pone.0287179.s007

(CSV)

S8 File. Computed KlekotaRothCount fingerprint of our dataset.

https://doi.org/10.1371/journal.pone.0287179.s008

(CSV)

S9 File. Computed KlekotaRoth fingerprint of our dataset.

https://doi.org/10.1371/journal.pone.0287179.s009

(CSV)

S10 File. Computed MACCS fingerprint of our dataset.

https://doi.org/10.1371/journal.pone.0287179.s010

(CSV)

S11 File. Computed PubChem fingerprint of our dataset.

https://doi.org/10.1371/journal.pone.0287179.s011

(CSV)

S12 File. Computed Substructure fingerprint of our dataset.

https://doi.org/10.1371/journal.pone.0287179.s012

(CSV)

S13 File. Computed SubstructureCount fingerprint of our dataset.

https://doi.org/10.1371/journal.pone.0287179.s013

(CSV)

S14 File. The docking score of all the CMNPD compounds with M^pro.

https://doi.org/10.1371/journal.pone.0287179.s014

(XLSX)

Acknowledgments

The author(s) acknowledge the Bioinformatics Division, National Institute of Biotechnology, Bangladesh, for their extended support of supercomputing system during this study.

References

1. Lam ME. United by the global COVID-19 pandemic: divided by our values and viral identities. Humanit Soc Sci Commun. 2021;8: 1–6.
2. Lambert H, Gupte J, Fletcher H, Hammond L, Lowe N, Pelling M, et al. COVID-19 as a global challenge: towards an inclusive and sustainable future. Lancet Planet Health. 2020;4: e312–e314. pmid:32702296
3. Roberts DL, Rossman JS, Jarić I. Dating first cases of COVID-19. PLOS Pathog. 2021;17: e1009620. pmid:34166465
4. Worobey M. Dissecting the early COVID-19 cases in Wuhan. Science. 2021;374: 1202–1204. pmid:34793199
5. Amicone M, Borges V, Alves MJ, Isidro J, Zé-Zé L, Duarte S, et al. Mutation rate of SARS-CoV-2 and emergence of mutators during experimental evolution. Evol Med Public Health. 2022;10: 142–155. pmid:35419205
6. Domingo E, Holland JJ. RNA virus mutations and fitness for survival. Annu Rev Microbiol. 1997;51: 151–178. pmid:9343347
7. Sanjuán R, Domingo-Calap P. Mechanisms of viral mutation. Cell Mol Life Sci. 2016;73: 4433–4448. pmid:27392606
8. Duffy S, Shackelton LA, Holmes EC. Rates of evolutionary change in viruses: patterns and determinants. Nat Rev Genet. 2008;9: 267–276. pmid:18319742
9. Yewdell JW. Antigenic drift: Understanding COVID-19. Immunity. 2021;54: 2681–2687. pmid:34910934
10. Harvey WT, Carabelli AM, Jackson B, Gupta RK, Thomson EC, Harrison EM, et al. SARS-CoV-2 variants, spike mutations and immune escape. Nat Rev Microbiol. 2021;19: 409–424. pmid:34075212
11. Martin DP, Weaver S, Tegally H, San JE, Shank SD, Wilkinson E, et al. The emergence and ongoing convergent evolution of the SARS-CoV-2 N501Y lineages. Cell. 2021;184: 5189–5200.e7. pmid:34537136
12. Huang KYA, Rijal P, Schimanski L, Powell TJ, Lin TY, McCauley JW, et al. Focused antibody response to influenza linked to antigenic drift. J Clin Invest. 2015;125: 2631–2645. pmid:26011643
13. Das SR, Hensley SE, Ince WL, Brooke CB, Subba A, Delboy MG, et al. Defining Influenza A Virus Hemagglutinin Antigenic Drift by Sequential Monoclonal Antibody Selection. Cell Host Microbe. 2013;13: 314–323. pmid:23498956
14. Tahir ul Qamar M, Alqahtani SM, Alamri MA, Chen LL. Structural basis of SARS-CoV-2 3CL pro and anti-COVID-19 drug discovery from medicinal plants. J Pharm Anal. 2020;10: 313–319. pmid:32296570
15. Pillaiyar T, Manickam M, Namasivayam V, Hayashi Y, Jung SH. An overview of severe acute respiratory syndrome-coronavirus (SARS-CoV) 3CL protease inhibitors: Peptidomimetics and small molecule chemotherapy. J Med Chem. 2016;59: 6595–6628. pmid:26878082
16. Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin M, et al. QSAR modeling: where have you been? Where are you going to? J Med Chem. 2014;57: 4977–5010. pmid:24351051
17. Prachayasittikul V, Worachartcheewan A, Shoombuatong W, Songtawee N, Simeon S, Prachayasittikul V, et al. Computer-Aided Drug Design of Bioactive Natural Products. Curr Top Med Chem. 2015;15: 1780–1800. pmid:25961523
18. Nantasenamat C, Isarankura-Na-Ayudhya C, Prachayasittikul V. Advances in computational methods to predict the biological activity of compounds. Expert Opin Drug Discov. 2010;5: 633–654. pmid:22823204
19. Dudek A, Arodz T, Galvez J. Computational methods in developing quantitative structure-activity relationships (QSAR): a review. Comb Chem High Throughput Screen. 2006;9: 213–228. pmid:16533155
20. Borsari C, Santarem N, Macedo S, Jiménez-Antón MD, Torrado JJ, Olías-Molero AI, et al. SAR Studies and Biological Characterization of a Chromen-4-one Derivative as an Anti-Trypanosoma brucei Agent. ACS Med Chem Lett. 2019;10: 528–533. pmid:30996791
21. Pir M, Agirbas H, Budak F, Ilter M. Synthesis, characterization, antimicrobial activity, and QSAR studies on substituted oxadiazaboroles. Med Chem Res. 2016;25: 1794–1812.
22. De Azevedo WF, Canduri F, Basso LA, Palma MS, Santos DS. Determining the structural basis for specificity of ligands using crystallographic screening. Cell Biochem Biophys. 2006;44: 405–411. pmid:16679527
23. Suvannang N, Preeyanon L, Malik AA, Schaduangrat N, Shoombuatong W, Worachartcheewan A, et al. Probing the origin of estrogen receptor alpha inhibition via large-scale QSAR study. RSC Adv. 2018;8: 11344–11356. pmid:35542807
24. Prachayasittikul V, Pingaew R, Worachartcheewan A, Nantasenamat C, Prachayasittikul S, Ruchirawat S, et al. Synthesis, anticancer activity and QSAR study of 1,4-naphthoquinone derivatives. Eur J Med Chem. 2014;84: 247–263. pmid:25019480
25. da Cunha E, Matos K, Ramalho T. QSAR and Docking Studies of HCV NS3 Serine Protease Inhibitors. Med Chem. 2013;9: 774–805. pmid:23140577
26. Worachartcheewan A, Prachayasittikul V, Toropova AP, Toropov AA, Nantasenamat C. Large-scale structure-activity relationship study of hepatitis C virus NS5B polymerase inhibition using SMILES-based descriptors. Mol Divers. 2015;19: 955–964. pmid:26164590
27. Gomes IdS, Santana CA, Marcolino LS, Lima LHFd, Melo-Minardi RCd, Dias RS, et al. Computational prediction of potential inhibitors for SARS-COV-2 main protease based on machine learning, docking, MM-PBSA calculations, and metadynamics. PLoS ONE. 2022;17: e0267471. pmid:35452494
28. Mekni N, Coronnello C, Langer T, De Rosa M, Perricone U. Support vector machine as a supervised learning for the prioritization of novel potential SARS-CoV-2 main protease inhibitors. Int J Mol Sci. 2021;22. pmid:34299333
29. Nand M, Maiti P, Joshi T, Chandra S, Pande V, Kuniyal JC, et al. Virtual screening of anti-HIV1 compounds against SARS-CoV-2: machine learning modeling, chemoinformatics and molecular dynamics simulation based analysis. 2020 [cited 2 Jun 2022]. pmid:33230180
30. Macip G, Garcia-Segura P, Mestres-Truyol J, Saldivar-Espinoza B, Pujadas G, Garcia-Vallvé S. A review of the current landscape of SARS-CoV-2 main protease inhibitors: Have we hit the bullseye yet? Int J Mol Sci. 2022;23: 259.
31. Yap CW. PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. J Comput Chem. 2011;32: 1466–1474. pmid:21425294
32. Wójcikowski M, Siedlecki P, Ballester PJ. Building Machine-Learning Scoring Functions for Structure-Based Prediction of Intermolecular Binding Affinity. Methods Mol Biol. 2019;2053: 1–12. pmid:31452095
33. Ballester PJ. Machine Learning for Molecular Modelling in Drug Design. Biomolecules. 2019;9: 216. pmid:31167503
34. Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and regression trees. 2017; 1–358.
35. Breiman L. Random Forests. Mach Learn. 2001;45: 5–32.
36. Liaw A, Wiener M. Classification and Regression by randomForest. 2002;2. Available: http://www.stat.berkeley.edu/.
37. Golbraikh A, Muratov E, Fourches D, Tropsha A. Data set modelability by QSAR. J Chem Inf Model. 2014;54: 1–4. pmid:24251851
38. Malik AA, Phanus-umporn C, Schaduangrat N, Shoombuatong W, Isarankura-Na-Ayudhya C, Nantasenamat C. HCVpred: A web server for predicting the bioactivity of hepatitis C virus NS5B inhibitors. J Comput Chem. 2020;41: 1820–1834. pmid:32449536
39. Lyu C, Chen T, Qiang B, Liu N, Wang H, Zhang L, et al. CMNPD: a comprehensive marine natural products database towards facilitating drug discovery from the ocean. Nucleic Acids Res. 2021;49: D509–D515. pmid:32986829
40. Trott O, Olson AJ. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization and multithreading. J Comput Chem. 2010;31: 455. pmid:19499576
41. Ferdous N, Reza MN, Islam MS, Hossain Emon MT, Mohiuddin AKM, Hossain MU. Newly designed analogues from SARS-CoV inhibitors mimicking the druggable properties against SARS-CoV-2 and its novel variants. RSC Adv. 2021;11: 31460–31476. pmid:35496863
42. Abraham MJ, Murtola T, Schulz R, Páll S, Smith JC, Hess B, et al. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX. 2015;1–2: 19–25.
43. Reza MN, Ferdous N, Emon MTH, Islam MS, Mohiuddin AKM, Hossain MU. Pathogenic genetic variants from highly connected cancer susceptibility genes confer the loss of structural stability. Sci Rep. 2021;11: 1–19. pmid:34584144
44. Schüttelkopf AW, van Aalten DM. PRODRG: a tool for high-throughput crystallography of protein-ligand complexes. Acta Crystallogr D Biol Crystallogr. 2004;60: 1355–1363. pmid:15272157
45. Schmid N, Eichenberger AP, Choutko A, Riniker S, Winger M, Mark AE, et al. Definition and testing of the GROMOS force-field versions 54A7 and 54B7. Eur Biophys J. 2011;40: 843–856. pmid:21533652
46. Kumari R, Kumar R, Open Source Drug Discovery Consortium, Lynn A. g_mmpbsa—A GROMACS Tool for High-Throughput MM-PBSA Calculations. J Chem Inf Model. 2014;54: 1951–1962. pmid:24850022
47. Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E. The Chemistry Development Kit (CDK): an open-source Java library for Chemo- and Bioinformatics. J Chem Inf Comput Sci. 2003;43: 493–500. pmid:12653513
48. Hall LH, Kier LB. Electrotopological State Indices for Atom Types: A Novel Combination of Electronic, Topological, and Valence State Information. J Chem Inf Comput Sci. 1995;35: 1039–1045.
49. Durant JL, Leland BA, Henry DR, Nourse JG. Reoptimization of MDL keys for use in drug discovery. J Chem Inf Comput Sci. 2002;42: 1273–1280. pmid:12444722
50. V1. PubChem Substructure Fingerprint. [cited 3 Jun 2022]. Available: http://pubchem.
51. Simeon S, Jongkon N. Construction of Quantitative Structure Activity Relationship (QSAR) Models to Predict Potency of Structurally Diversed Janus Kinase 2 Inhibitors. Molecules. 2019;24. pmid:31805692
52. Klekota J, Roth FP. Chemical substructures that enrich for biological activity. Bioinformatics. 2008;24: 2518–2525. pmid:18784118
53. Smith DH, Carhart RE, Venkataraghavan R. Atom Pairs as Molecular Features in Structure-Activity Studies: Definition and Applications. J Chem Inf Comput Sci. 1985;25: 64–73.
54. Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, et al. DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res. 2008;36: D901. pmid:18048412
55. Kim S, Thiessen PA, Bolton EE, Chen J, Fu G, Gindulyte A, et al. PubChem Substance and Compound databases. Nucleic Acids Res. 2016;44: D1202. pmid:26400175
56. Pence HE, Williams A. ChemSpider: An Online Chemical Information Resource. J Chem Educ. 2010;87: 1123–1124.
57. Bienfait B, Ertl P. JSME: a free molecule editor in JavaScript. J Cheminform. 2013;5. pmid:23694746
58. Cousins KR. Computer review of ChemDraw Ultra 12.0. J Am Chem Soc. 2011;133: 8388. pmid:21561109
59. Zhu N, Zhang D, Wang W, Li X, Yang B, Song J, et al. A Novel Coronavirus from Patients with Pneumonia in China, 2019. N Engl J Med. 2020;382: 727–733. pmid:31978945
60. Polack FP, Thomas SJ, Kitchin N, Absalon J, Gurtman A, Lockhart S, et al. Safety and Efficacy of the BNT162b2 mRNA Covid-19 Vaccine. N Engl J Med. 2020;383: 2603–2615. pmid:33301246
61. Voysey M, Clemens SAC, Madhi SA, Weckx LY, Folegatti PM, Aley PK, et al. Safety and efficacy of the ChAdOx1 nCoV-19 vaccine (AZD1222) against SARS-CoV-2: an interim analysis of four randomised controlled trials in Brazil, South Africa, and the UK. Lancet. 2021;397: 99–111. pmid:33306989
62. Zhang Y, Zeng G, Pan H, Li C, Hu Y, Chu K, et al. Safety, tolerability, and immunogenicity of an inactivated SARS-CoV-2 vaccine in healthy adults aged 18–59 years: a randomised, double-blind, placebo-controlled, phase 1/2 clinical trial. Lancet Infect Dis. 2021;21: 181–192. pmid:33217362
63. Baden LR, El Sahly HM, Essink B, Kotloff K, Frey S, Novak R, et al. Efficacy and Safety of the mRNA-1273 SARS-CoV-2 Vaccine. N Engl J Med. 2021;384: 403–416. pmid:33378609
64. Shoombuatong W, Schaduangrat N, Nantasenamat C. Towards understanding aromatase inhibitory activity via QSAR modeling. EXCLI J. 2018;17: 688–708. pmid:30190660
65. Elsaman T, Mohamed MS, Eltayib EM, Abdel-aziz HA, Abdalla AE, Munir MU, et al. Isatin derivatives as broad-spectrum antiviral agents: the current landscape. Med Chem Res. 2022;31: 244–273. pmid:35039740
66. Hattori S, Higashi-Kuwata N, Hayashi H, Allu SR, Raghavaiah J, Bulut H, et al. A small molecule compound with an indole moiety inhibits the main protease of SARS-CoV-2 and blocks virus replication. Nat Commun. 2021;12: 1–12. pmid:33510133
67. Weglarz-Tomczak E, Tomczak JM, Talma M, Burda-Grabowska M, Giurg M, Brul S. Identification of ebselen and its analogues as potent covalent inhibitors of papain-like protease from SARS-CoV-2. Sci Rep. 2021;11: 1–10. pmid:33574416
68. Hengphasatporn K, Wilasluck P, Deetanya P, Wangkanont K, Chavasiri W, Visitchanakun P, et al. Halogenated Baicalein as a Promising Antiviral Agent toward SARS-CoV-2 Main Protease. J Chem Inf Model. 2022;62: 1498–1509. pmid:35245424
69. Glaser J, Sedova A, Galanie S, Kneller DW, Davidson RB, Maradzike E, et al. Hit Expansion of a Noncovalent SARS-CoV-2 Main Protease Inhibitor. ACS Pharmacol Transl Sci. 2022;5: 255–265. pmid:35434531
70. Abdallah HM, El-Halawany AM, Sirwi A, El-Araby AM, Mohamed GA, Ibrahim SRM, et al. Repurposing of Some Natural Product Isolates as SARS-COV-2 Main Protease Inhibitors via In Vitro Cell Free and Cell-Based Antiviral Assessments and Molecular Modeling Approaches. Pharmaceuticals (Basel). 2021;14: 213. pmid:33806331

[ref1] 1. Lam ME. United by the global COVID-19 pandemic: divided by our values and viral identities. Humanit Soc Sci Commun 2021 81. 2021;8: 1–6.
View Article
Google Scholar

[2] View Article

[3] Google Scholar

[ref2] 2. Lambert H, Gupte J, Fletcher H, Hammond L, Lowe N, Pelling M, et al. COVID-19 as a global challenge: towards an inclusive and sustainable future. Lancet Planet Heal. 2020;4: e312–e314. pmid:32702296
View Article
PubMed/NCBI
Google Scholar

[5] View Article

[6] PubMed/NCBI

[7] Google Scholar

[ref3] 3. Roberts DL, Rossman JS, Jarić I. Dating first cases of COVID-19. PLOS Pathog. 2021;17: e1009620. pmid:34166465
View Article
PubMed/NCBI
Google Scholar

[9] View Article

[10] PubMed/NCBI

[11] Google Scholar

[ref4] 4. Worobey M. Dissecting the early COVID-19 cases in Wuhan. Science (80-). 2021;374: 1202–1204. pmid:34793199
View Article
PubMed/NCBI
Google Scholar

[13] View Article

[14] PubMed/NCBI

[15] Google Scholar

[ref5] 5. Amicone M, Borges V, Alves MJ, Isidro J, Zé-Zé L, Duarte S, et al. Mutation rate of SARS-CoV-2 and emergence of mutators during experimental evolution. Evol Med Public Heal. 2022;10: 142–155. pmid:35419205
View Article
PubMed/NCBI
Google Scholar

[17] View Article

[18] PubMed/NCBI

[19] Google Scholar

[ref6] 6. Domingo E, Holland JJ. RNA VIRUS MUTATIONS AND FITNESS FOR SURVIVAL. https://doi.org/101146/annurev.micro511151. 2003;51: 151–178. pmid:9343347
View Article
PubMed/NCBI
Google Scholar

[21] View Article

[22] PubMed/NCBI

[23] Google Scholar

[ref7] 7. Sanjuán R, Domingo-Calap P. Mechanisms of viral mutation. Cell Mol Life Sci 2016 7323. 2016;73: 4433–4448. pmid:27392606
View Article
PubMed/NCBI
Google Scholar

[25] View Article

[26] PubMed/NCBI

[27] Google Scholar

[ref8] 8. Duffy S, Shackelton LA, Holmes EC. Rates of evolutionary change in viruses: patterns and determinants. Nat Rev Genet 2008 94. 2008;9: 267–276. pmid:18319742
View Article
PubMed/NCBI
Google Scholar

[29] View Article

[30] PubMed/NCBI

[31] Google Scholar

[ref9] 9. Yewdell JW. Antigenic drift: Understanding COVID-19. Immunity. 2021;54: 2681–2687. pmid:34910934
View Article
PubMed/NCBI
Google Scholar

[33] View Article

[34] PubMed/NCBI

[35] Google Scholar

[ref10] 10. Harvey WT, Carabelli AM, Jackson B, Gupta RK, Thomson EC, Harrison EM, et al. SARS-CoV-2 variants, spike mutations and immune escape. Nat Rev Microbiol 2021 197. 2021;19: 409–424. pmid:34075212
View Article
PubMed/NCBI
Google Scholar

[37] View Article

[38] PubMed/NCBI

[39] Google Scholar

[ref11] 11. Martin DP, Weaver S, Tegally H, San JE, Shank SD, Wilkinson E, et al. The emergence and ongoing convergent evolution of the SARS-CoV-2 N501Y lineages. Cell. 2021;184: 5189–5200.e7. pmid:34537136
View Article
PubMed/NCBI
Google Scholar

[41] View Article

[42] PubMed/NCBI

[43] Google Scholar

[ref12] 12. Huang KYA, Rijal P, Schimanski L, Powell TJ, Lin TY, McCauley JW, et al. Focused antibody response to influenza linked to antigenic drift. J Clin Invest. 2015;125: 2631–2645. pmid:26011643
View Article
PubMed/NCBI
Google Scholar

[45] View Article

[46] PubMed/NCBI

[47] Google Scholar

[ref13] 13. Das SR, Hensley SE, Ince WL, Brooke CB, Subba A, Delboy MG, et al. Defining Influenza A Virus Hemagglutinin Antigenic Drift by Sequential Monoclonal Antibody Selection. Cell Host Microbe. 2013;13: 314–323. pmid:23498956
View Article
PubMed/NCBI
Google Scholar

[49] View Article

[50] PubMed/NCBI

[51] Google Scholar

[ref14] 14. Tahir ul Qamar M, Alqahtani SM, Alamri MA, Chen LL. Structural basis of SARS-CoV-2 3CL pro and anti-COVID-19 drug discovery from medicinal plants. J Pharm Anal. 2020;10: 313–319. pmid:32296570
View Article
PubMed/NCBI
Google Scholar

[53] View Article

[54] PubMed/NCBI

[55] Google Scholar

[ref15] 15. Pillaiyar T, Manickam M, Namasivayam V, Hayashi Y, Jung SH. An overview of severe acute respiratory syndrome-coronavirus (SARS-CoV) 3CL protease inhibitors: Peptidomimetics and small molecule chemotherapy. J Med Chem. 2016;59: 6595–6628. pmid:26878082
View Article
PubMed/NCBI
Google Scholar

[57] View Article

[58] PubMed/NCBI

[59] Google Scholar

[ref16] 16. Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin M, et al. QSAR modeling: where have you been? Where are you going to? J Med Chem. 2014;57: 4977–5010. pmid:24351051
View Article
PubMed/NCBI
Google Scholar

[61] View Article

[62] PubMed/NCBI

[63] Google Scholar

[ref17] 17. Prachayasittikul V, Worachartcheewan A, Shoombuatong W, Songtawee N, Simeon S, Prachayasittikul V, et al. Computer-Aided Drug Design of Bioactive Natural Products. Curr Top Med Chem. 2015;15: 1780–1800. pmid:25961523
View Article
PubMed/NCBI
Google Scholar

[65] View Article

[66] PubMed/NCBI

[67] Google Scholar

[ref18] 18. Nantasenamat C, Isarankura-Na-Ayudhya C, Prachayasittikul V. Advances in computational methods to predict the biological activity of compounds. Expert Opin Drug Discov. 2010;5: 633–654. pmid:22823204
View Article
PubMed/NCBI
Google Scholar

[69] View Article

[70] PubMed/NCBI

[71] Google Scholar

[ref19] 19. Dudek A, Arodz T, Galvez J. Computational methods in developing quantitative structure-activity relationships (QSAR): a review. Comb Chem High Throughput Screen. 2006;9: 213–228. pmid:16533155

[ref20] 20. Borsari C, Santarem N, Macedo S, Jiménez-Antón MD, Torrado JJ, Olías-Molero AI, et al. SAR Studies and Biological Characterization of a Chromen-4-one Derivative as an Anti-Trypanosoma brucei Agent. ACS Med Chem Lett. 2019;10: 528–533. pmid:30996791

[ref21] 21. Pir M, Agirbas H, Budak F, Ilter M. Synthesis, characterization, antimicrobial activity, and QSAR studies on substituted oxadiazaboroles. Med Chem Res. 2016;25: 1794–1812.

[ref22] 22. De Azevedo WF, Canduri F, Basso LA, Palma MS, Santos DS. Determining the structural basis for specificity of ligands using crystallographic screening. Cell Biochem Biophys. 2006;44: 405–411. pmid:16679527

[ref23] 23. Suvannang N, Preeyanon L, Malik AA, Schaduangrat N, Shoombuatong W, Worachartcheewan A, et al. Probing the origin of estrogen receptor alpha inhibition via large-scale QSAR study. RSC Adv. 2018;8: 11344–11356. pmid:35542807

[ref24] 24. Prachayasittikul V, Pingaew R, Worachartcheewan A, Nantasenamat C, Prachayasittikul S, Ruchirawat S, et al. Synthesis, anticancer activity and QSAR study of 1,4-naphthoquinone derivatives. Eur J Med Chem. 2014;84: 247–263. pmid:25019480

[ref25] 25. da Cunha E, Matos K, Ramalho T. QSAR and Docking Studies of HCV NS3 Serine Protease Inhibitors. Med Chem (Los Angeles). 2013;9: 774–805. pmid:23140577

[ref26] 26. Worachartcheewan A, Prachayasittikul V, Toropova AP, Toropov AA, Nantasenamat C. Large-scale structure-activity relationship study of hepatitis C virus NS5B polymerase inhibition using SMILES-based descriptors. Mol Divers. 2015;19: 955–964. pmid:26164590

[ref27] 27. Gomes IdS, Santana CA, Marcolino LS, Lima LHFd, Melo-Minardi RCd, Dias RS, et al. Computational prediction of potential inhibitors for SARS-CoV-2 main protease based on machine learning, docking, MM-PBSA calculations, and metadynamics. PLoS One. 2022;17: e0267471. pmid:35452494

[ref28] 28. Mekni N, Coronnello C, Langer T, De Rosa M, Perricone U. Support vector machine as a supervised learning for the prioritization of novel potential SARS-CoV-2 main protease inhibitors. Int J Mol Sci. 2021;22. pmid:34299333

[ref29] 29. Nand M, Maiti P, Joshi T, Chandra S, Pande V, Kuniyal JC, et al. Virtual screening of anti-HIV1 compounds against SARS-CoV-2: machine learning modeling, chemoinformatics and molecular dynamics simulation based analysis. 2020 [cited 2 Jun 2022]. pmid:33230180

[ref30] 30. Macip G, Garcia-Segura P, Mestres-Truyol J, Saldivar-Espinoza B, Pujadas G, Garcia-Vallvé S. A review of the current landscape of SARS-CoV-2 main protease inhibitors: Have we hit the bullseye yet? Int J Mol Sci. 2022;23: 259.

[ref31] 31. Yap CW. PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. J Comput Chem. 2011;32: 1466–1474. pmid:21425294

[ref32] 32. Wójcikowski M, Siedlecki P, Ballester PJ. Building Machine-Learning Scoring Functions for Structure-Based Prediction of Intermolecular Binding Affinity. Methods Mol Biol. 2019;2053: 1–12. pmid:31452095

[ref33] 33. Ballester PJ. Machine Learning for Molecular Modelling in Drug Design. Biomolecules. 2019;9: 216. pmid:31167503

[ref34] 34. Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and regression trees. Classif Regres Trees. 2017; 1–358.

[ref35] 35. Breiman L. Random Forests. Mach Learn. 2001;45: 5–32. :1010933404324.

[ref36] 36. Liaw A, Wiener M. Classification and Regression by randomForest. 2002;2. Available: http://www.stat.berkeley.edu/.

[ref37] 37. Golbraikh A, Muratov E, Fourches D, Tropsha A. Data set modelability by QSAR. J Chem Inf Model. 2014;54: 1–4. pmid:24251851

[ref38] 38. Malik AA, Phanus-umporn C, Schaduangrat N, Shoombuatong W, Isarankura-Na-Ayudhya C, Nantasenamat C. HCVpred: A web server for predicting the bioactivity of hepatitis C virus NS5B inhibitors. J Comput Chem. 2020;41: 1820–1834. pmid:32449536

[ref39] 39. Lyu C, Chen T, Qiang B, Liu N, Wang H, Zhang L, et al. CMNPD: a comprehensive marine natural products database towards facilitating drug discovery from the ocean. Nucleic Acids Res. 2021;49: D509–D515. pmid:32986829

[ref40] 40. Trott O, Olson AJ. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization and multithreading. J Comput Chem. 2010;31: 455. pmid:19499576

[ref41] 41. Ferdous N, Reza MN, Islam MS, Hossain Emon MT, Mohiuddin AKM, Hossain MU. Newly designed analogues from SARS-CoV inhibitors mimicking the druggable properties against SARS-CoV-2 and its novel variants. RSC Adv. 2021;11: 31460–31476. pmid:35496863

[ref42] 42. Abraham MJ, Murtola T, Schulz R, Páll S, Smith JC, Hess B, et al. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX. 2015;1–2: 19–25.

[ref43] 43. Reza MN, Ferdous N, Emon MTH, Islam MS, Mohiuddin AKM, Hossain MU. Pathogenic genetic variants from highly connected cancer susceptibility genes confer the loss of structural stability. Sci Rep. 2021;11: 1–19. pmid:34584144

[ref44] 44. Schüttelkopf AW, van Aalten DMF. PRODRG: a tool for high-throughput crystallography of protein-ligand complexes. Acta Crystallogr D Biol Crystallogr. 2004;60: 1355–1363. pmid:15272157

[ref45] 45. Schmid N, Eichenberger AP, Choutko A, Riniker S, Winger M, Mark AE, et al. Definition and testing of the GROMOS force-field versions 54A7 and 54B7. Eur Biophys J. 2011;40: 843–856. pmid:21533652

[ref46] 46. Kumari R, Kumar R, Consortium OSDD, Lynn A. g_mmpbsa—A GROMACS Tool for High-Throughput MM-PBSA Calculations. J Chem Inf Model. 2014;54: 1951–1962. pmid:24850022

[ref47] 47. Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E. The Chemistry Development Kit (CDK): an open-source Java library for Chemo- and Bioinformatics. J Chem Inf Comput Sci. 2003;43: 493–500. pmid:12653513

[ref48] 48. Hall LH, Kier LB. Electrotopological State Indices for Atom Types: A Novel Combination of Electronic, Topological, and Valence State Information. J Chem Inf Comput Sci. 1995;35: 1039–1045.

[ref49] 49. Durant JL, Leland BA, Henry DR, Nourse JG. Reoptimization of MDL keys for use in drug discovery. J Chem Inf Comput Sci. 2002;42: 1273–1280. pmid:12444722

[ref50] 50. V1. PubChem Substructure Fingerprint. [cited 3 Jun 2022]. Available: http://pubchem.

[ref51] 51. Simeon S, Jongkon N. Construction of Quantitative Structure Activity Relationship (QSAR) Models to Predict Potency of Structurally Diversed Janus Kinase 2 Inhibitors. Molecules. 2019;24. pmid:31805692

[ref52] 52. Klekota J, Roth FP. Chemical substructures that enrich for biological activity. Bioinformatics. 2008;24: 2518–2525. pmid:18784118

[ref53] 53. Smith DH, Carhart RE, Venkataraghavan R. Atom Pairs as Molecular Features in Structure-Activity Studies: Definition and Applications. J Chem Inf Comput Sci. 1985;25: 64–73.

[ref54] 54. Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, et al. DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res. 2008;36: D901. pmid:18048412

[ref55] 55. Kim S, Thiessen PA, Bolton EE, Chen J, Fu G, Gindulyte A, et al. PubChem Substance and Compound databases. Nucleic Acids Res. 2016;44: D1202. pmid:26400175

[ref56] 56. Pence HE, Williams A. ChemSpider: An Online Chemical Information Resource. J Chem Educ. 2010;87: 1123–1124.

[ref57] 57. Bienfait B, Ertl P. JSME: a free molecule editor in JavaScript. J Cheminform. 2013;5. pmid:23694746

[ref58] 58. Cousins KR. Computer review of ChemDraw Ultra 12.0. J Am Chem Soc. 2011;133: 8388. pmid:21561109

[ref59] 59. Zhu N, Zhang D, Wang W, Li X, Yang B, Song J, et al. A Novel Coronavirus from Patients with Pneumonia in China, 2019. N Engl J Med. 2020;382: 727–733. pmid:31978945

[ref60] 60. Polack FP, Thomas SJ, Kitchin N, Absalon J, Gurtman A, Lockhart S, et al. Safety and Efficacy of the BNT162b2 mRNA Covid-19 Vaccine. N Engl J Med. 2020;383: 2603–2615. pmid:33301246

[ref61] 61. Voysey M, Clemens SAC, Madhi SA, Weckx LY, Folegatti PM, Aley PK, et al. Safety and efficacy of the ChAdOx1 nCoV-19 vaccine (AZD1222) against SARS-CoV-2: an interim analysis of four randomised controlled trials in Brazil, South Africa, and the UK. Lancet. 2021;397: 99–111. pmid:33306989

[ref62] 62. Zhang Y, Zeng G, Pan H, Li C, Hu Y, Chu K, et al. Safety, tolerability, and immunogenicity of an inactivated SARS-CoV-2 vaccine in healthy adults aged 18–59 years: a randomised, double-blind, placebo-controlled, phase 1/2 clinical trial. Lancet Infect Dis. 2021;21: 181–192. pmid:33217362

[ref63] 63. Baden LR, El Sahly HM, Essink B, Kotloff K, Frey S, Novak R, et al. Efficacy and Safety of the mRNA-1273 SARS-CoV-2 Vaccine. N Engl J Med. 2021;384: 403–416. pmid:33378609

[ref64] 64. Shoombuatong W, Schaduangrat N, Nantasenamat C. Towards understanding aromatase inhibitory activity via QSAR modeling. EXCLI J. 2018;17: 688–708. pmid:30190660

[ref65] 65. Elsaman T, Mohamed MS, Eltayib EM, Abdel-aziz HA, Abdalla AE, Munir MU, et al. Isatin derivatives as broad-spectrum antiviral agents: the current landscape. Med Chem Res. 2022;31: 244–273. pmid:35039740

[ref66] 66. Hattori SI, Higashi-Kuwata N, Hayashi H, Allu SR, Raghavaiah J, Bulut H, et al. A small molecule compound with an indole moiety inhibits the main protease of SARS-CoV-2 and blocks virus replication. Nat Commun. 2021;12: 1–12. pmid:33510133

[ref67] 67. Weglarz-Tomczak E, Tomczak JM, Talma M, Burda-Grabowska M, Giurg M, Brul S. Identification of ebselen and its analogues as potent covalent inhibitors of papain-like protease from SARS-CoV-2. Sci Rep. 2021;11: 1–10. pmid:33574416

[ref68] 68. Hengphasatporn K, Wilasluck P, Deetanya P, Wangkanont K, Chavasiri W, Visitchanakun P, et al. Halogenated Baicalein as a Promising Antiviral Agent toward SARS-CoV-2 Main Protease. J Chem Inf Model. 2022;62: 1498–1509. pmid:35245424

[ref69] 69. Glaser J, Sedova A, Galanie S, Kneller DW, Davidson RB, Maradzike E, et al. Hit Expansion of a Noncovalent SARS-CoV-2 Main Protease Inhibitor. ACS Pharmacol Transl Sci. 2022;5: 255–265. pmid:35434531

[ref70] 70. Abdallah HM, El-Halawany AM, Sirwi A, El-Araby AM, Mohamed GA, Ibrahim SRM, et al. Repurposing of Some Natural Product Isolates as SARS-COV-2 Main Protease Inhibitors via In Vitro Cell Free and Cell-Based Antiviral Assessments and Molecular Modeling Approaches. Pharmaceuticals (Basel). 2021;14: 213. pmid:33806331

Supporting information

S1 File. The SMILES ID and additional details of the 758 compounds.

S2 File. Computed AtomPairs2DCount fingerprint of our dataset.

S3 File. Computed AtomPairs2D fingerprint of our dataset.

S4 File. Computed CDK fingerprint of our dataset.

S5 File. Computed CDK Extended fingerprint of our dataset.

S6 File. Computed CDK Graph only fingerprint of our dataset.

S7 File. Computed EState fingerprint of our dataset.

S8 File. Computed KlekotaRothCount fingerprint of our dataset.

S9 File. Computed KlekotaRoth fingerprint of our dataset.

S10 File. Computed MACCS fingerprint of our dataset.

S11 File. Computed PubChem fingerprint of our dataset.

S12 File. Computed Substructure fingerprint of our dataset.

S13 File. Computed SubstructureCount fingerprint of our dataset.

S14 File. The docking score of all the CMNPD compounds with Mpro.
