Abstract
This paper provides a comprehensive review of quantitative structure-property relationships (QSPR) applied to cancer drugs, with a focus on topological indices (TI) and data analysis techniques. Cancer is a serious and life-threatening disease for which no complete cure currently exists, so extensive research is ongoing to develop new therapeutic agents. Topological indices have become a significant tool in chemistry and medicine, particularly for investigating the molecular, pharmacological, and therapeutic properties of drugs. This article investigates the potential of temperature indices in analyzing the physicochemical properties of drugs used for cancer treatment. The approach employs QSPR modeling to establish correlations between the molecular structure of a compound and its physical and chemical properties. The analysis covers twelve cancer drugs: Aminopterin, Convolutamide A, Convolutamydine A, Daunorubicin, Minocycline, Podophyllotoxin, Caulibugulone E, Perfragilin A, Melatonin, Tambjamine K, Amathaspiramide E, and Aspidostomide E. The findings demonstrate that optimal regression models (fifty-eight in total) incorporating TI can effectively predict physicochemical properties such as Boiling Point (BP), Enthalpy (EN), Flash Point (FP), Molar Refractivity (MR), Polar Surface Area (PSA), Surface Tension (ST), Molecular Volume (MV), and Complexity (COM). This research suggests that temperature-based topological indices are promising tools for the development and optimization of cancer drugs, as demonstrated by statistically significant results (p < 0.05). In addition to linear regression, which performed best, two machine learning models, Support Vector Regression (SVR) and Random Forest, were used to compare predictive performance on the same physicochemical properties and to assess the advantages and disadvantages of each model.
Citation: Shi X, Kosari S, Ghods M, Kheirkhahan N (2025) Innovative approaches in QSPR modelling using topological indices for the development of cancer treatments. PLoS ONE 20(2): e0317507. https://doi.org/10.1371/journal.pone.0317507
Editor: Niravkumar Joshi, Federal University of ABC, BRAZIL
Received: July 4, 2024; Accepted: December 30, 2024; Published: February 21, 2025
Copyright: © 2025 Shi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data that support the findings of this study are openly available in ChemSpider at http://www.chemspider.com/AboutUs.aspx. Data are contained within the article.
Funding: This work was supported by the National Natural Science Foundation of China under grants 62332006 and 62172302, with Xiaolong Shi as the principal recipient.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
In the treatment of cancer, alkylating agents and metabolites are commonly employed. Although significant attention is devoted to the development and research of initial cancer therapies, the process of drug discovery, from identifying novel chemical compounds to obtaining regulatory approval, remains complex, costly, and time-intensive. Traditional approaches frequently encounter obstacles in compound synthesis and biological screening, leading the scientific community to explore more efficient methods for compound discovery. Chemical graph theory, an interdisciplinary field, is used to examine molecular structures and to relate them to activities, properties, and various phenomena. In this context, a molecular graph represents the structural formula of a chemical compound, with vertices corresponding to atoms and edges to chemical bonds. Chemical graph theory provides tools for analyzing chemical structures, including topological indices: real numbers that serve as descriptors of the structure and specific properties of a molecular graph [1, 2]. Numerous studies have applied topological indices in the analysis of molecular graphs and drug structures [3–8]. A fundamental approach to exploring the relationship between a substance’s physicochemical properties and its topological indices is through Quantitative Structure-Property Relationship (QSPR) models. These models use regression analysis to examine the correlations between physical and chemical properties and topological indices. Additionally, many studies in Quantitative Structure-Activity Relationship (QSAR) have applied topological indices to drug structures [9, 10]. In this article, various temperature-based indices are evaluated across several cancer drugs, enabling researchers to identify the associated physical and chemical properties.
Furthermore, in addition to linear regression, we employed Support Vector Regression (SVR) and Random Forest models to explore and assess the predictive capabilities of these methods in determining the physicochemical properties of cancer drugs. These models were applied to identify the most effective model for predicting the properties of the drugs. The results of our analysis help in selecting the best predictive model, which is crucial for improving drug design and optimizing the therapeutic effectiveness of cancer treatments [11, 12].
In this study, the drug’s structure is modeled as a graph G in which each vertex in V(G) represents an atom and each edge in E(G) represents a chemical bond between two atoms. The graphs considered are simple and connected. The degree of a vertex, defined as the number of edges incident to it, characterizes its connectivity [13].
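The graph model described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the function names and the four-atom example (the carbon skeleton of 2-methylpropane, with hydrogens suppressed) are assumptions introduced here.

```python
from collections import defaultdict


def build_graph(edges):
    """Return an adjacency map {vertex: set of neighbours} for a simple graph."""
    adjacency = defaultdict(set)
    for u, v in edges:
        if u == v:
            raise ValueError("a simple graph has no loops")
        adjacency[u].add(v)  # sets collapse repeated edges
        adjacency[v].add(u)
    return dict(adjacency)


def degrees(adjacency):
    """Degree of each vertex: the number of edges incident to it."""
    return {u: len(neighbours) for u, neighbours in adjacency.items()}


def is_connected(adjacency):
    """Depth-first search from an arbitrary vertex reaches every vertex."""
    start = next(iter(adjacency))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in adjacency[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adjacency)


# Hypothetical example (not one of the twelve drugs): atoms are numbered 0..3
# and each pair is a bond; atom 1 is the central carbon.
EXAMPLE_EDGES = [(0, 1), (1, 2), (1, 3)]
graph = build_graph(EXAMPLE_EDGES)
print(degrees(graph))
print(is_connected(graph))
```

Storing neighbours in sets keeps the graph simple by construction, which matches the restriction to simple, connected graphs stated in the text.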
2 Methodology and analysis
In this study, cancer drugs are modeled as simple graphs. To calculate the topological indices of these drug structures, we utilize techniques such as vertex partitioning, edge partitioning, and various computational methods. Our analysis is restricted to finite, simple, connected graphs. Let G denote a graph with a vertex set V and an edge set E. The degree d_u of a vertex u is defined as the number of vertices adjacent to u. Below is a list of the topological formulas used in this study.
Definition 2.1 Fajtlowicz defined the temperature of a vertex u of a connected graph G as follows [14]:
(1)
Definition 2.2 Product connectivity temperature index [15] is
(2)
Definition 2.3 Harmonic temperature index [3] is
(3)
Definition 2.4 Symmetric division temperature index [16] is
(4)
Definition 2.5 Modified third temperature index [17] is
(5)
Definition 2.6 Modified second temperature index [17] is
(6)
Definition 2.7 Second hyper temperature index [2] is
(7)
Definition 2.8 Sum connectivity temperature index [16] is
(8)
Definition 2.9 F-temperature index [16] is
(9)
Definition 2.10 Second temperature index [16] is
(10)
Definition 2.11 Reciprocal product connectivity temperature index [16] is
(11)
Definition 2.12 First hyper temperature index [2] is
(12)
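For orientation, the forms in which these quantities are usually stated in the temperature-index literature (for example in the papers cited as [16, 17]) are collected below. This is a hedged summary and should be checked against the cited sources rather than read as a restatement of Eqs (1)–(12). The notation is an assumption introduced here: n denotes |V(G)|, T_u abbreviates the temperature of vertex u, and every sum runs over the edges uv of G. The modified third temperature index is omitted because its form is not assumed here.

```latex
\begin{align*}
T_u    &= \frac{d_u}{n - d_u}, \qquad n = |V(G)| \\
PT(G)  &= \sum_{uv \in E(G)} \frac{1}{\sqrt{T_u T_v}} \\
HT(G)  &= \sum_{uv \in E(G)} \frac{2}{T_u + T_v} \\
SDT(G) &= \sum_{uv \in E(G)} \left( \frac{T_u}{T_v} + \frac{T_v}{T_u} \right) \\
{}^{m}T_2(G) &= \sum_{uv \in E(G)} \frac{1}{T_u T_v} \\
HT_2(G) &= \sum_{uv \in E(G)} \left( T_u T_v \right)^2 \\
ST(G)  &= \sum_{uv \in E(G)} \frac{1}{\sqrt{T_u + T_v}} \\
FT(G)  &= \sum_{uv \in E(G)} \left( T_u^2 + T_v^2 \right) \\
T_2(G) &= \sum_{uv \in E(G)} T_u T_v \\
RPT(G) &= \sum_{uv \in E(G)} \sqrt{T_u T_v} \\
HT_1(G) &= \sum_{uv \in E(G)} \left( T_u + T_v \right)^2
\end{align*}
```

Because a connected graph on n > 1 vertices has maximum degree at most n − 1, the denominator n − d_u is always positive, so T_u is well defined.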
A list of abbreviations used in the article is given in Table 1.
In recent years, scientists have increasingly utilized the QSPR/QSAR methodology to predict the physicochemical properties of chemical compounds through topological indices. This approach has been extensively applied in numerous studies to analyze a diverse array of drugs, including highly resistant anticancer agents, anti-COVID-19 drugs targeting the Omicron variant, breast cancer therapies, entropy tests involving benzene derivatives, nanotubes, Lyme disease treatments, and research on temperature indicators [18–22].
3 Mathematical computations of topological indices
This section presents the topological indices (TI) of cancer drugs and the QSPR modeling of their molecular structures.
3.1 Topological Index computation
Let A be a graph representing Aspidostomide E, where the edges are partitioned into distinct subsets based on specific criteria.
The study of the edges in A is shown in Table 2.
By applying Definitions 2.1 through 2.12, we obtain the following results:
Fig 1 shows the Chemical structure and Molecular graph of Aspidostomide E.
a) Chemical structure of Aspidostomide E. b) Molecular graph of Aspidostomide E. https://doi.org/10.6084/m9.figshare.26984881.v1.
Topological indices for the other drugs can be computed in the same way using Eqs (1) to (12) from Section 2. The indices are detailed in Tables 3 and 4, and Fig 2 illustrates the drugs. Additional information about these drugs can be accessed on ChemicalBook [23], and Table 5 summarizes their physical and chemical properties [15, 24].
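As a rough sketch of how such a computation might be automated, the Python below derives vertex temperatures from an edge list and sums several edge-based indices. The formulas coded here are the forms commonly given in the temperature-index literature and are an assumption, not a transcription of Eqs (1) to (12); the function names and the three-vertex path used as input are hypothetical, and the modified third temperature index is left out because its form is not assumed.

```python
import math
from collections import Counter


def vertex_temperatures(edges):
    """Assumed form T(u) = d_u / (n - d_u), with n the number of vertices."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    return {u: d / (n - d) for u, d in degree.items()}


def temperature_indices(edges):
    """Sum each assumed edge contribution over all edges of a simple graph."""
    t = vertex_temperatures(edges)
    pairs = [(t[u], t[v]) for u, v in edges]
    return {
        "PT": sum(1 / math.sqrt(a * b) for a, b in pairs),
        "HT": sum(2 / (a + b) for a, b in pairs),
        "SDT": sum(a / b + b / a for a, b in pairs),
        "mT2": sum(1 / (a * b) for a, b in pairs),
        "HT2": sum((a * b) ** 2 for a, b in pairs),
        "ST": sum(1 / math.sqrt(a + b) for a, b in pairs),
        "FT": sum(a * a + b * b for a, b in pairs),
        "T2": sum(a * b for a, b in pairs),
        "RPT": sum(math.sqrt(a * b) for a, b in pairs),
        "HT1": sum((a + b) ** 2 for a, b in pairs),
    }


# Hypothetical input: a path on three vertices, not one of the studied drugs.
PATH_EDGES = [(0, 1), (1, 2)]
for name, value in temperature_indices(PATH_EDGES).items():
    print(f"{name}: {value:.4f}")
```

The edge list is assumed to describe a simple connected graph with each bond listed once; grouping edges by the degree pair of their endpoints, as in an edge partition table, would give the same sums with fewer terms.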
3.2 Discussion and comparison of advanced machine learning models and linear models for QSPR analysis
The primary objective of this section is to conduct a QSPR analysis of various topological indices (TI) and examine their correlation with several physicochemical properties and activities of drugs. The drugs under investigation are Aminopterin, Convolutamide A, Convolutamydine A, Daunorubicin, Minocycline, Podophyllotoxin, Caulibugulone E, Perfragilin A, Melatonin, Tambjamine K, Amathaspiramide E, and Aspidostomide E. We assessed the effectiveness of these TI in predicting eight physicochemical properties: Boiling Point (BP), Enthalpy (EN), Flash Point (FP), Molar Refractivity (MR), Polar Surface Area (PSA), Surface Tension (ST), Molecular Volume (MV), and Complexity (COM), with values obtained from PubChem and ChemSpider. Table 6 displays the correlation coefficients (r) between these physicochemical properties and the degree-based topological indices. Tables 7–13 demonstrate that a linear QSPR model provides the best fit for predicting these properties. The values are normally distributed, and fifty-eight regression models were employed for data analysis. Notably, the PT(G), HT(G), mT3(G), T2(G), and SDT(G) indices exhibit high correlations with COM, with r-values of 0.913, 0.905, 0.908, 0.915, and 0.905, respectively. Additionally, the ST(G) index shows a strong positive correlation with MR, with r = 0.924. In contrast, the RPT(G) index does not show a significant correlation with any physicochemical property. The HT1(G) and T2(G) indices have a significant inverse correlation with MR and MV. The HT2(G) index is identified as the best predictor for BP, EN, MR, and MV, with each of which it is inversely correlated.
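A correlation coefficient of the kind reported in Table 6 can be computed as sketched below. This is a plain Pearson r written from its textbook definition; the function name and the sample numbers are placeholders invented for illustration and are not values from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Placeholder numbers only: one index value and one property value per drug.
index_values = [12.0, 25.5, 31.0, 48.2, 60.7]
property_values = [410.0, 432.0, 455.0, 470.0, 515.0]
print(round(pearson_r(index_values, property_values), 3))
```

A value of r close to +1 or −1 indicates a strong linear association, while the sign distinguishes the direct correlations from the inverse ones described in the text.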
Two machine learning models, SVR and Random Forest, were employed for the analysis alongside Linear Regression (a traditional model). The findings revealed the following key observations:
- The tuned SVR and Linear Regression models exhibited superior performance in predicting physicochemical properties, achieving correlation coefficients (r) above 0.9 for most properties. These results underscore the high predictive power of these techniques in QSPR analysis ([12]; Seber & Lee, 2003).
- The Random Forest model also showed acceptable performance. Although its accuracy was slightly lower than that of the tuned SVR and Linear Regression models, it provided valuable insights into the relationships between topological indices and drug properties [11].
- In contrast, the SVR model without parameter tuning demonstrated weaker performance, with lower correlation coefficients, highlighting the necessity of parameter optimization for achieving accurate predictions [12].
Fig 3 provides a graphical representation of the correlations between TI and physicochemical properties. Fig 4 illustrates the relationship between TI and the physical properties of the drugs studied.
3.3 QSPR analysis
Building upon the temperature indices defined in Section 2 and computed in Section 3.1, this section develops a linear regression model that relates the temperature indices to the physical and chemical properties of the drugs:
P = B + A [TI] (13)
Where:
- P: Represents the cancer drug property (dependent variable)
- B: Constant term (y-intercept)
- A: Regression coefficient
- TI: Topological index (independent variable)
Eq (13) represents the formulated linear regression model. In this equation, "P" denotes a specific property of a cancer drug that we aim to predict or analyze. "B" is the constant term, and "A" is the regression coefficient, which indicates the change in "P" associated with a unit increase in the topological index. The analysis was performed using SPSS software to develop linear models for eight properties across twelve cancer drugs. These models are based on the eleven topological indices computed earlier. The following section presents the linear models for each of the eight drug properties, using Eq (13) as the general framework.
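The single-predictor model with constant term B and regression coefficient A can be fitted by ordinary least squares, as in the sketch below. The study itself used SPSS; this pure-Python version, its function name, and its sample data are assumptions added only to make the estimation step concrete.

```python
def fit_linear_model(ti_values, property_values):
    """Ordinary least squares for P = B + A * TI; returns (B, A)."""
    n = len(ti_values)
    if n != len(property_values) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_ti = sum(ti_values) / n
    mean_p = sum(property_values) / n
    sxx = sum((x - mean_ti) ** 2 for x in ti_values)
    if sxx == 0:
        raise ValueError("the topological index must vary across drugs")
    sxy = sum((x - mean_ti) * (y - mean_p)
              for x, y in zip(ti_values, property_values))
    slope = sxy / sxx                    # A: change in P per unit of TI
    intercept = mean_p - slope * mean_ti  # B: constant term
    return intercept, slope


# Placeholder data, not measurements from the study.
ti = [10.0, 20.0, 30.0, 40.0]
prop = [405.0, 412.0, 420.0, 426.0]
b, a = fit_linear_model(ti, prop)
print(f"P = {b:.3f} + {a:.3f} [TI]")
```

A statistics package additionally reports the standard error, F-statistic, and p-value for each fit, which this sketch does not attempt.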
3.4 Linear regression models
In this section, the linear regression models for topological indices (TI) are discussed using Eq (13). Tables 7–13 present the parameters and QSPR models associated with these TI. The following linear models for temperature indices are derived based on Eq (13):
1. Product connectivity temperature index [PT (G)]
BP = 396.850+0.748 [PT (G)], EN = 64.216+0.106 [PT (G)], FP = 184.053+0.453 [PT (G)]
MR = 57.416+0.141 [PT (G)], PSA = 29.617+0.271 [PT (G)], ST = 42.638+0.085 [PT (G)]
MV = 166.264+0.333 [PT (G)], COM = 237.722+1.237 [PT (G)]
2. Harmonic temperature index [HT (G)]
BP = 398.336+0.788 [HT (G)], EN = 64.457+0.112 [HT (G)], FP = 185.297+0.476 [HT (G)]
MR = 57.178+0.150 [HT (G)], PSA = 30.878+0.282 [HT (G)], ST = 43.220+0.088 [HT (G)]
MV = 164.758+0.359 [HT (G)], COM = 243.005+1.295 [HT (G)]
3. Symmetric division temperature index [SDT (G)]
BP = 186.910+6.469 [SDT (G)], EN = 37.003+0.912 [SDT (G)], FP = 66.890+4.187 [SDT (G)]
MR = 26.265+1.091 [SDT (G)], PSA = -45232.412+2.279 [SDT (G)], ST = 10.415+0.792 [SDT (G)]
MV = 17.235+2.345 [SDT (G)], COM = 0.494+10.271 [SDT (G)]
4. Modified third temperature index [mT3 (G)]
BP = 395.950+1.589 [mT3 (G)], EN = 64.102+0.225 [mT3 (G)], FP = 183.747+0.961 [mT3 (G)]
MR = 56.854+0.301 [mT3 (G)], PSA = 29.724+0.572 [mT3 (G)], ST = 42.788+0.179 [mT3 (G)]
MV = 164.329+0.718 [mT3 (G)], COM = 238.144+2.618 [mT3 (G)]
5. Modified second temperature index [mT2 (G)]
BP = 462.043+0.040 [mT2 (G)], EN = 73.347+0.006 [mT2 (G)], FP = 224.106+0.024 [mT2 (G)]
MR = 69.634+0.008 [mT2 (G)], PSA = 50.813+0.015 [mT2 (G)], ST = 49.727+0.005 [mT2 (G)]
MV = 194.512+0.018 [mT2 (G)], COM = 333.617+0.070 [mT2 (G)]
6. Second hyper temperature index [HT2 (G)]
BP = 696.092-15380.93 [HT2 (G)], EN = 106.779-2199.264 [HT2 (G)]
MR = 120.232-3735.217 [HT2 (G)], MV = 323.901-10116.120 [HT2 (G)]
7. F-temperature index [FT (G)]
MR = 120.232-3735.217 [FT (G)], MV = 323.901-10116.120 [FT (G)]
8. First hyper temperature index [HT1 (G)]
MR = 120.232-3735.217 [HT1 (G)], MV = 323.901-10116.120 [HT1 (G)]
9. Second temperature index [T2 (G)]
MR = 120.232-3735.217 [T2 (G)], MV = 323.901-10116.120 [T2 (G)]
10. Sum connectivity temperature index [ST (G)]
BP = 327.804+4.642 [ST (G)], EN = 54.435+0.658 [ST (G)], FP = 149.323+2.682 [ST (G)]
MR = 42.390+0.911 [ST (G)], PSA = 11.502+1.565 [ST (G)], ST = 37.512+0.481 [ST (G)]
MV = 125.946+2.239 [ST (G)], COM = 127.395+7.726 [ST (G)]
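To show how the fitted equations are used, the sketch below evaluates the eight PT(G) models listed under item 1 for a given index value. The coefficients are copied from those equations; the function name and the input value 100.0 are hypothetical and do not correspond to any drug in the study.

```python
# (constant term B, regression coefficient A) for each property,
# taken from the PT(G) models listed under item 1.
PT_MODELS = {
    "BP": (396.850, 0.748),
    "EN": (64.216, 0.106),
    "FP": (184.053, 0.453),
    "MR": (57.416, 0.141),
    "PSA": (29.617, 0.271),
    "ST": (42.638, 0.085),
    "MV": (166.264, 0.333),
    "COM": (237.722, 1.237),
}


def predict_from_pt(pt_value):
    """Evaluate P = B + A * PT(G) for every property."""
    return {name: b + a * pt_value for name, (b, a) in PT_MODELS.items()}


# Hypothetical index value, chosen only to illustrate the arithmetic.
for name, value in predict_from_pt(100.0).items():
    print(f"{name}: {value:.3f}")
```

The same pattern applies to the other indices by swapping in their coefficient pairs.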
4 Machine learning models for predictive analysis
In this study, machine learning models were employed to predict the physicochemical properties of drugs used in the treatment of cancer. The primary goal was to assess the potential of these models in identifying complex and nonlinear relationships between molecular structures and physicochemical properties. The use of machine learning methods in drug analysis offers the advantage of uncovering hidden patterns within the data that traditional methods may fail to identify [12].
4.1. Rationale for using machine learning models
Machine learning models are particularly suitable for capturing intricate, nonlinear relationships in large datasets. This is crucial for predicting drug properties, as these relationships are not always straightforward or linear. In this study, machine learning models were used to model these complex patterns and predict key physicochemical properties of drugs. These properties are vital for drug design, as they influence the drug’s behavior, efficacy, and safety profile. Traditional statistical methods often fail to account for these complexities, making machine learning an ideal choice.
For this analysis, in addition to linear regression, two other machine learning methods were used, which are described below:
- Support Vector Regression (SVR): This model is well-known for its effectiveness in handling nonlinear data.
- Random Forest: A model based on an ensemble of decision trees, which aggregates the predictions of many trees to improve accuracy and reduce overfitting. Random Forest is particularly effective for regression tasks in complex datasets.
These models were employed to predict the following physicochemical properties of the drugs: BP, EN, FP, MR, PSA, ST, MV, COM.
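A comparison of the three model families could be set up along the following lines with scikit-learn. Everything specific here is an assumption: the source does not state the validation scheme, the SVR kernel and hyperparameters, or the number of trees, and the twelve data points are synthetic values generated for illustration. Leave-one-out validation is chosen only because it suits a sample of twelve compounds.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def compare_models(X, y, seed=0):
    """Leave-one-out RMSE for linear regression, SVR, and Random Forest."""
    models = {
        "linear": LinearRegression(),
        # SVR is sensitive to feature scale, so standardize first.
        # Kernel, C, and epsilon are illustrative guesses, not tuned values.
        "svr": make_pipeline(StandardScaler(),
                             SVR(kernel="rbf", C=100.0, epsilon=0.1)),
        "random_forest": RandomForestRegressor(n_estimators=100,
                                               random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        predicted = cross_val_predict(model, X, y, cv=LeaveOneOut())
        scores[name] = float(np.sqrt(np.mean((predicted - y) ** 2)))
    return scores


# Synthetic stand-in for twelve drugs: one index value and one property each.
rng = np.random.default_rng(0)
ti = rng.uniform(20.0, 200.0, size=12)
X = ti.reshape(-1, 1)
y = 400.0 + 0.75 * ti + rng.normal(0.0, 10.0, size=12)

errors = compare_models(X, y)
for name, rmse in errors.items():
    print(f"{name}: leave-one-out RMSE = {rmse:.2f}")
```

Because each prediction is made for a compound held out of training, the resulting errors reflect generalization rather than fit to the training data, which matters when the sample is this small.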
4.2. Comparison of prediction and analysis of models
Linear regression performed the best, effectively capturing the relationships between the molecular structure of the drugs and their physicochemical properties. The SVR model also captured complex patterns but showed weaker results compared to linear regression. Random Forest performed the least well among the models. Tables 14–17 illustrate the predictions of physicochemical properties using different models, and the evaluation results are presented in Table 17 below and Fig 5.
The linear regression model performed well in predicting most physical and chemical properties such as BP, EN, MR, and MV, with its predictions closely matching the actual values. Overall, the model is effective in modeling linear relationships.
The SVR model performed relatively well in predicting most physical and chemical properties, with predictions for BP, EN, MR, and MV being close to the actual values. Although the model showed reasonable accuracy for most properties, there were some discrepancies, especially for COM. Overall, the SVR model was effective in capturing complex, non-linear relationships in the data, but linear regression performed better in providing more accurate predictions.
The Random Forest model showed acceptable results in predicting the physical and chemical properties of the drugs, but compared to the Linear Regression and SVR models, its accuracy was lower in some predictions. For instance, for properties like BP and EN, there were notable discrepancies between the predicted values and the actual values, indicating lower precision in these cases. Therefore, it can be concluded that the Linear Regression and SVR models performed better in most cases, with their predictions being closer to the actual values.
As depicted in Fig 5, linear regression demonstrated the best performance overall. Random Forest excelled in predicting non-linear relationships in some cases but showed lower accuracy in others. The SVR model exhibited weak performance.
Based on the evaluation of machine learning models using the coefficient of determination (R²) for predicting the physicochemical properties of drugs, linear regression demonstrated the best performance, achieving the highest R² values for most properties such as BP (0.95), EN (0.91), and MV (0.93), indicating strong predictive accuracy. Random Forest provided valuable insights into complex, non-linear relationships, though its accuracy was slightly lower than that of linear regression. Finally, SVR performed poorly and provided less accurate results compared to the other two models. Therefore, linear regression can be considered the best model for predicting the physicochemical properties of drugs.
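For reference, the coefficient of determination used in this comparison can be computed as sketched below. The function follows the standard definition R² = 1 − SS_res / SS_tot; the function name and sample values are placeholders and not data from the study.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the actual values are constant")
    return 1.0 - ss_res / ss_tot


# Placeholder values for one property across five compounds.
actual_values = [410.0, 432.0, 455.0, 470.0, 515.0]
model_predictions = [415.0, 428.0, 450.0, 478.0, 509.0]
print(round(r_squared(actual_values, model_predictions), 3))
```

An R² of 1 means the predictions match the actual values exactly, and a value of 0 means the model does no better than predicting the mean.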
5 Conclusion
Table 6 and Fig 3 illustrate the correlation between the physical and chemical properties of anti-cancer drugs and the defined temperature indices.
- The Polar Surface Area is best predicted by the modified second temperature index, with a correlation coefficient (r) of 0.808.
- The Sum Connectivity temperature index is the most effective predictor for Boiling Point (r = 0.836) and Molecular Volume (r = 0.848). It also exhibits the highest significant correlations with Molar Refractivity (r = 0.924) and Complexity (r = 0.921).
- The Symmetric Division temperature index shows a positive correlation with Enthalpy of Vaporization (r = 0.854), Flash Point (r = 0.857), and Surface Tension (r = 0.808).
This analysis reveals a positive correlation between the physical and chemical properties of cancer drugs and the temperature indices. Tables 7–13 and 18–20 present regression models for the various physical and chemical properties. The results demonstrate that the correlation coefficients (r) exceed 0.6 and the p-values are below 0.05, indicating that these predictors are reliable for linear regression. The equations are formulated based on criteria such as minimum standard error (SE), maximum R-squared (R²), and maximum F-statistic. Consequently, it can be concluded that the models for all physical and chemical properties are statistically significant. This underscores the potential value of these topological indices in QSPR analysis for cancer drugs, as evidenced by the plotted regression lines. The findings and theoretical insights of this study can be applied to the production, development, and enhancement of more effective cancer drugs and new cancer therapies. Our findings reveal a clear trend in examining drug structures and their physical characteristics. Ultimately, this research contributes to the efficient design of new drugs and the development of preventive measures for the diseases in question. The principles of QSPR and topological indices offer valuable approaches for estimating properties related to specific diseases and drugs. Furthermore, when comparing the three methods, Linear Regression, despite its simplicity, consistently showed the best performance in predicting the physical and chemical properties of cancer drugs, outperforming both the SVR and Random Forest models. This emphasizes the effectiveness of Linear Regression in capturing the relationships within the data.
References
- 1. Ghorbani M, Hosseinzadeh MA. A new version of Zagreb indices. Filomat. 2012;26(1):93–100.
- 2. Kulli VR, Pal M, Samanta S, Pal A. Handbook of Research on Advanced Applications of Graph Theory in Modern Society. Hershey, USA: IGI Global; 2020.
- 3. Ghods M, Ramezani Tousi J. Computing Revan Polynomials and Revan Indices of Copper (I) Oxide and Copper (II) Oxide. Communications in Combinatorics, Cryptography & Computer Science. 2021;1(1):50–8.
- 4. Kosari S. On spectral radius and Zagreb Estrada index of graphs. Asian-European Journal of Mathematics. 2023;16(10):4167.
- 5. Kosari S, Dehgardi N, Khan A. Lower bound on the KG-Sombor index. Communications in Combinatorics and Optimization. 2023;8(4):751–7.
- 6. Ramezani Tousi J, Ghods M. Computing K Banhatti and K Hyper Banhatti Indices of Titania Nanotubes TiO2 [m, n]. Journal of Information and Optimization Sciences. 2023;44(2):207–16.
- 7. Ramezani Tousi J, Ghods M. Investigating Banhatti indices on the molecular graph and the line graph of Glass with M-polynomial approach. Proyecciones Journal of Mathematics. 2024;43(1):199–219.
- 8. Shi X, Kosari S, Hameed S, Shah AG, Ullah S. Application of connectivity index of cubic fuzzy graphs for identification of danger zones of tsunami threat. PLoS ONE. 2024;19(1):1–24. pmid:38289906
- 9. Havare ÖÇ. Quantitative structure analysis of some molecules in drugs used in the treatment of COVID-19 with topological indices. Polycyclic Aromatic Compounds. 2022;42(8):5249–60.
- 10. Huang L, Wang Y, Pattabiraman K, Danesh P, Siddiqui MK, Cancan M. Topological indices and QSPR modeling of new antiviral drugs for cancer treatment. Polycyclic Aromatic Compounds. 2023;43(9):8147–70.
- 11. Breiman L. Random forests. Machine Learning. 2001;45(1):5–32. https://doi.org/10.1023/A:1010933404324
- 12. Vapnik VN. The nature of statistical learning theory. Springer; 1995.
- 13. Ghani MU, Sultan F, Tag El Din ESM, Khan AR, Liu JB, Cancan M. A Paradigmatic Approach to Find the Valency-Based K-Banhatti and Redefined Zagreb Entropy for Niobium Oxide and a Metal–Organic Framework. Molecules. 2022;27(20):6975. pmid:36296567
- 14. Fajtlowicz S. On conjectures of Graffiti. Discrete Mathematics. 1988;72(1):113–8.
- 15. PubChem. PubChem: National Center for Biotechnology Information [Internet]. [cited 2024 Sep 11]. Available from: https://pubchem.ncbi.nlm.nih.gov/.
- 16. Kulli VR. Computation of Some Temperature Indices of HC5C7 [p, q] Nanotubes. Annals of Pure and Applied Mathematics. 2019;20(2):69–74.
- 17. Kulli VR. Inverse sum temperature index and multiplicative inverse sum temperature index of certain nanotubes. International Journal of Recent Scientific Research. 2021;12(01):40635–9.
- 18. Husin MN, Khan AR, Awan NUH, Campena FJH, Tchier F, Hussain S. Multicriteria decision making attributes and estimation of physicochemical properties of kidney cancer drugs via topological descriptors. PLoS ONE. 2024;19(5): e0302276. pmid:38713692
- 19. Jahanbani A, Khoeilar R, Cancan M. On the Temperature Indices of Molecular Structures of Some Networks. Journal of Mathematics. 2022;2022(1):1–7.
- 20. Kansal N, Garg P, Singh O. Temperature-based topological indices and QSPR Analysis of COVID-19 Drugs. Polycyclic Aromatic Compounds. 2023;43(5):4148–69.
- 21. Ramezani Tousi J, Ghods M. Some polynomials and degree-based topological indices of molecular graph and line graph of Titanium dioxide nanotubes. Journal of Information and Optimization Sciences. 2024;45(1):95–106.
- 22. Zhang Y, Khalid A, Siddiqui MK, Rehman H, Ishtiaq M, Cancan M. On analysis of temperature based topological indices of some Covid-19 drugs. Polycyclic Aromatic Compounds. 2023;43(4):3810–26.
- 23. ChemicalBook. ChemicalBook: Chemical Information [Internet]. [cited 2024 Sep 11]. Available from: https://www.chemicalbook.com/.
- 24. ChemSpider. Search and share chemistry [Internet]. 2021. Available from: http://www.chemspider.com/AboutUs.aspx.