Computational method for aromatase-related proteins using machine learning approach

  • Muthu Krishnan Selvaraj ,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    muthu@imtech.res.in (MKS); jasmeet23k@gmail.com (JK)

    Affiliation Data Center/Bioinformatics, MTCC, CSIR-Institute of Microbial Technology, Chandigarh, India

  • Jasmeet Kaur

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Biophysics, Postgraduate Institute of Medical Education and Research (PGIMER), Chandigarh, India

Abstract

Human aromatase is a microsomal cytochrome P450 enzyme that catalyzes the aromatization of androgens into estrogens during steroidogenesis. Third-generation aromatase inhibitors (AIs) have proven effective in breast cancer therapy; however, patients acquire resistance to current AIs. Thus, there is a need to predict aromatase-related proteins in order to develop efficacious AIs. A machine learning method was established to identify aromatase-related proteins and was evaluated using five-fold cross validation. In this study, SVM-based models were built using amino acid composition, dipeptide composition, a hybrid of the two, and evolutionary profiles in the form of position-specific scoring matrices (PSSM), with maximum accuracies of 87.42%, 84.05%, 85.12%, and 92.02%, respectively. The developed method predicts aromatase-related proteins from the primary sequence with high accuracy. Prediction score graphs were generated using the known dataset to check the performance of the method. Based on the approach described above, a webserver for predicting aromatase-related proteins from primary sequence data was developed and implemented at https://bioinfo.imtech.res.in/servers/muthu/aromatase/home.html. We hope that the developed method will be useful for aromatase protein related research.

Introduction

Cancer cases continue to rise globally despite advances in clinical therapy [1]. Breast cancer remains the most frequently diagnosed cancer in females, and metastasis remains the leading cause of death from this cancer [1]. Breast cancer incidence is greater in developed countries, while mortality is highest in developing countries [2]. About 30% of breast cancer patients develop recurring metastatic cancer despite recent advances in therapeutic regimens.

The biological actions of estrogen are mediated by the estrogen receptor (ER), and 70% of breast tumors express the ER and/or progesterone receptor (PR). Thus, estrogen deprivation has been considered an important treatment for estrogen-dependent (ER+) breast cancers. In post-menopausal women, estradiol is produced in extragonadal sites, so it stops functioning as a circulating hormone and acts locally as a paracrine or intracrine factor [3, 4]. These peripheral sites include the mesenchymal cells of adipose tissue, osteoblasts and chondrocytes of bone, and numerous sites in the brain; the estrogen produced there promotes breast cancer [5, 6].

For a long time, tamoxifen has been a reliable therapeutic measure for ER+ breast cancer in both pre- and post-menopausal women. However, over half of advanced ER+ breast cancers are intrinsically resistant to tamoxifen, and about 40% will acquire resistance during treatment. Aromatase inhibitors (AIs) are the next line of therapy for ER+ breast cancer in women and serve as first-line therapy for metastatic breast cancer [7]. AIs block the action of microsomal aromatase cytochrome P450 (P450arom), thus limiting estrogen biosynthesis and tumor progression [8]. Aromatase is the product of the CYP19A1 gene, which encodes a monomeric enzyme composed of a heme group and a single polypeptide chain of 503 amino acids [9, 10]. Aromatase is primarily expressed in the gonads and brain of humans, but it also occurs in the placenta and liver of the developing fetus, and in the muscle, adrenal cortex and adipose tissue of adults [9]. In the ovary, aromatase is produced in the granulosa cells, where it converts androgens (male hormones) into estrogens (female hormones); it is essential for the female reproductive cycle, the development of female secondary sexual characteristics and the maintenance of reproductive health [11].

AIs are currently an established treatment regimen for ER+ breast cancer patients, and the FDA has approved first-, second-, and third-generation AIs. The third-generation inhibitors, including letrozole, anastrozole and exemestane, are routine treatment for post-menopausal breast cancer patients [12]. Despite the therapeutic success of the third-generation AIs, acquired resistance develops, leading to tumor relapse [13]. Further, patients with prolonged clinical use of both steroidal and non-steroidal third-generation AIs have experienced side effects such as myalgia, arthralgia, hot flashes and night sweats [14]. Thus, there is an urgent need to develop novel aromatase inhibitors with improved effectiveness and fewer side effects.

Machine learning is emerging as a useful tool in biological science [15], and it can be used to uncover novel aromatase-related proteins and to investigate the structural and functional properties of these enzymes. At present, many computational methods based on machine learning have been proposed for analyzing clinical data and clinically important proteins or enzymes [16–19]. So far, however, there are no reports on investigating aromatase-related proteins with a support vector machine (SVM). Thus, identifying novel or unknown aromatase-related proteins using SVM is a timely need.

The support vector machine is a supervised machine learning method commonly used in bioinformatics applications such as predicting protein functions and their evolutionary correlations, analyzing DNA sequences, and classifying microarray data [20–22]. Due to its powerful prediction ability, it is used not just for protein studies but also in numerous clinical investigations such as gene expression profiling, cancer classification and biomarker discovery [23]. SVM is also being employed for the prediction of drug-target interactions, disease-associated genes and drug efficacy [24]. Its use in gene selection and in the classification of microRNA expression data has enabled researchers to analyze large datasets and has helped in understanding the relationship between genes and diseases [25].

SVM-based statistical predictors for sequence-based biological systems follow step-wise rules: constructing or selecting a dataset to train and test the predictor, encoding the biological sequences in an effective mathematical form, developing a robust algorithm to run the prediction, performing cross-validation to evaluate prediction accuracy, and deploying the algorithm on a publicly accessible, user-friendly web server [26]. Predicting protein function from protein structure, post-translational modification (PTM) sites and DNA-binding sites is one of the major tasks in bioinformatics, and it can assist in understanding disease mechanisms and/or identifying novel drug targets [23, 27, 28].

Therefore, we have made a concerted effort to develop a method for identifying aromatase-related proteins. We developed a method for recognizing enzymes that will aid in the identification of new or unknown aromatase-related proteins, using amino acid composition (AAC), dipeptide composition (DPC), hybrid and position-specific scoring matrix (PSSM) models.

Methods

Machine learning based support vector machine (SVM)

Amino acid composition (AAC), dipeptide composition (DPC), PSSM profile and hybrid approaches employing a machine learning based support vector machine (SVM) were used to construct the method. SVM-based prediction is often used to handle large amounts of data, and it has been demonstrated to perform well in a number of biological data processing applications such as classification, protein function prediction and type identification [29–31]. In this study, we used SVM to build the classifiers and five-fold cross validation to analyze their performance [32–34]. The performance of the generated models was assessed using the original and additional protein datasets. To eliminate outcome bias, all models were run with the same number of negative sequences. Based on the size of the aromatase dataset, negative sequences were picked at random from the UniProt database. The performance of the SVM models was tested using known positive and negative sequence data. A blank dataset was also utilized to test the generated models, which successfully recognized the data.

Generation of survival curves

Kaplan-Meier (KM) plotter is a web-based survival analysis tool that evaluates the correlation between the expression of all genes (mRNA, miRNA, protein) and survival in more than 30,000 samples from all tumor types. GEO, EGA, and TCGA are the sources for its databases, and the plotter provides meta-analysis based discovery and validation of survival biomarkers for cancer research [35]. The KM plotter tool (http://kmplot.com/analysis/) was used to determine the prognostic value of aromatase (CYP19A1) mRNA expression in various cancers using Pan-cancer RNA-seq data, by correlating expression with overall survival (OS) and relapse-free survival (RFS) [36] for a follow-up threshold of 240 months. For mRNA expression analysis, samples were split into high and low expression groups based on the median expression of aromatase. The median was chosen over the other available cut-offs (lower quartile, lower tertile, upper tertile and upper quartile) because it gives almost the same number of samples in both groups and hence less bias. The hazard ratio (HR), 95% confidence intervals and logrank p for all survival curves were provided by the KM plotter website, and a p value of < 0.05 was considered statistically significant.
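The median split described above can be illustrated with a minimal Python sketch. KM plotter performs this step internally; the function name, the sample values and the choice to place samples equal to the median in the low group are assumptions made only for illustration.

```python
from statistics import median

def split_by_median(expression):
    """Return (low, high) lists of sample indices split at the median expression.

    Assumption: samples exactly at the median are placed in the low group.
    """
    cutoff = median(expression)
    low = [i for i, value in enumerate(expression) if value <= cutoff]
    high = [i for i, value in enumerate(expression) if value > cutoff]
    return low, high

# Hypothetical CYP19A1 expression values for six samples.
expression = [5.0, 1.0, 9.0, 3.0, 7.0, 11.0]
low_group, high_group = split_by_median(expression)
print(len(low_group), len(high_group))
```

With an even number of distinct values, the two groups have the same size, which is the balance the median cut-off is meant to provide.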

Datasets for SVM

Aromatase data were taken from the UniProt/Swiss-Prot database [37]. A keyword search returned 9836 protein sequences, of which 257 were reviewed. We used only the reviewed sequences, retrieved on 10th May 2021, and removed all sequences annotated or labeled as "fragments," "isoforms," "potentials," "similarity," or "probables" to generate a high quality dataset; this removal helps to reduce the prediction error. To avoid redundancy and the incorporation of variants, the dataset was then processed with the CD-HIT tool, which removed sequences that were more than 90% identical to any other sequence in the dataset [38]. The final positive dataset contained 191 of the 257 aromatase sequences; details are provided in the S1 File. The negative dataset contained 191 non-aromatase sequences that were unrelated to aromatase and were picked at random. A UniProt/Swiss-Prot keyword search for "regulatory proteins" was used to select the negative sequence collection. A web server for predicting aromatase-related proteins from primary sequence data was developed and implemented at https://bioinfo.imtech.res.in/servers/muthu/aromatase/home.html.
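As a rough illustration of the dataset preparation, the following Python sketch filters entries by annotation terms and draws a balanced random negative set. It is not the authors' pipeline: the helper names, the matching of terms against a free-text description and the fixed seed are assumptions, and redundancy removal with CD-HIT is not shown.

```python
import random

# Annotation terms named in the text, matched here in singular, lower-case form.
EXCLUDED_TERMS = ("fragment", "isoform", "potential", "similarity", "probable")

def keep_record(description):
    """Return True if the (hypothetical) description has none of the excluded terms."""
    text = description.lower()
    return not any(term in text for term in EXCLUDED_TERMS)

def sample_negatives(candidates, n_positive, seed=0):
    """Pick as many negative sequences at random as there are positives."""
    if len(candidates) < n_positive:
        raise ValueError("not enough candidate negative sequences")
    return random.Random(seed).sample(candidates, n_positive)

# Hypothetical identifiers standing in for non-aromatase entries.
negatives = sample_negatives([f"NEG{i}" for i in range(1000)], n_positive=191)
print(len(negatives))
```

Sampling the same number of negatives as positives keeps the two classes balanced, which matches the 191 versus 191 design described above.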

Amino acid and dipeptide composition

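The amino acid composition of a protein refers to the percentage of each amino acid in the protein [21, 39]. SVM light requires the data to be encoded as vectors. The percentage of each of the 20 natural amino acids was calculated using the following equation:

\[ \text{Percentage of amino acid } i = \frac{\text{Total number of amino acid } i}{\text{Total number of amino acids in the protein}} \times 100 \tag{1} \]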
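A minimal Python sketch of the amino acid composition encoding is given here for illustration. The function name, the alphabetical residue order and the decision to ignore non-standard residues are assumptions, not details taken from the method.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 natural amino acids, one-letter codes

def amino_acid_composition(sequence):
    """Return a 20-dimensional vector with the percentage of each amino acid.

    Assumption: residues outside the 20 standard letters are ignored.
    """
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    total = len(residues)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [100.0 * residues.count(aa) / total for aa in AMINO_ACIDS]

vector = amino_acid_composition("AACL")
print(len(vector), vector[0])
```

Each protein, whatever its length, is mapped to a fixed-length vector, which is what allows sequences of different lengths to be passed to the same SVM.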

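In a similar manner, dipeptide composition was calculated as a vector with a constant length of 400 (20 x 20) dimensions [40]. To determine the fraction of each dipeptide, the following equation was used:

\[ \text{Fraction of dipeptide } i = \frac{\text{Total number of dipeptide } i}{\text{Total number of all possible dipeptides in the protein}} \tag{2} \]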
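The dipeptide encoding can be sketched in Python as follows. This is an illustrative version under stated assumptions: overlapping residue pairs are counted, pairs containing a non-standard residue are skipped, and the vector follows the alphabetical order of the 400 dipeptides.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs: AA, AC, AD, ..., YY.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]

def dipeptide_composition(sequence):
    """Return a 400-dimensional vector with the fraction of each dipeptide."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]  # overlapping window of two residues
        if pair in counts:   # skip pairs with non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no valid dipeptides")
    return [counts[dp] / total for dp in DIPEPTIDES]

dpc = dipeptide_composition("AAC")
print(len(dpc))
```

Unlike the 20-dimensional amino acid composition, this vector retains some information about the local order of residues.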

PSSM profile

The GPSR software was used to create the PSSM profile against the nr (non-redundant) BLAST database. We used the seq2pssm_imp, pssm_n2, pssm_comp, and col2svm programmes in the GPSR package to run PSI-BLAST searches against the nr database using different numbers of iterations with a cut-off e-value of 0.001, to normalize the PSSM profile and to produce the SVM light input format (i.e. a composition vector of 400 dimensions) [26]. Finally, the SVM models were created with various parameters and optimized, and the best model was employed in the prediction server. For normalization, the following formula was used: (3)
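One common way to turn a variable-length PSSM into a fixed 400-dimensional composition vector is sketched below in Python. This is an assumption about the general technique, not a reproduction of the GPSR programmes: the input matrix is assumed to be already normalized, its 20 columns are assumed to follow the alphabetical residue order, and dividing by the sequence length is an illustrative choice.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def pssm_composition(sequence, pssm):
    """Collapse an L x 20 PSSM into a 400-dimensional vector.

    Rows belonging to the same residue type are summed, giving a 20 x 20
    table that is flattened row by row and divided by the sequence length.
    """
    if len(sequence) != len(pssm):
        raise ValueError("PSSM must have one row per residue")
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    totals = [[0.0] * 20 for _ in range(20)]
    for residue, row in zip(sequence.upper(), pssm):
        if len(row) != 20:
            raise ValueError("each PSSM row must have 20 scores")
        if residue in index:  # non-standard residues are skipped
            target = totals[index[residue]]
            for column, score in enumerate(row):
                target[column] += score
    length = len(sequence)
    return [value / length for row in totals for value in row]

# Hypothetical two-residue example with constant scores per row.
profile = pssm_composition("AA", [[1.0] * 20, [3.0] * 20])
print(len(profile))
```

The resulting vector has the same length for every protein, so it can be written in the same input format as the composition-based features.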

Hybrid approach

In order to improve prediction accuracy, a hybrid technique was developed. A hybrid model is defined as the combination of two or more profiles. The hybrid models were developed using vectors of length 420, comprising 20 dimensions from AAC and 400 from DPC. The col_add function in the GPSR 1.0 package was used to merge the AAC and DPC profiles into a hybrid profile [41, 42].
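Conceptually, merging the two profiles amounts to concatenating the feature vectors. The short Python sketch below shows this idea; it is not the GPSR col_add implementation, and the function name is hypothetical.

```python
def hybrid_vector(aac, dpc):
    """Concatenate a 20-dimensional AAC vector and a 400-dimensional DPC vector."""
    if len(aac) != 20 or len(dpc) != 400:
        raise ValueError("expected 20 AAC values and 400 DPC values")
    return list(aac) + list(dpc)

# Placeholder vectors used only to show the resulting dimensionality.
hybrid = hybrid_vector([1.0] * 20, [0.0] * 400)
print(len(hybrid))
```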

Evaluation and performance

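A five-fold cross validation approach was used to evaluate performance. We started with an aromatase positive dataset and a non-aromatase negative dataset. The positive and negative datasets were randomly divided into five equal groups. To run SVM, four sets were used for training and the remaining set for testing. This process was performed five times, so that each subset was used for testing exactly once [22, 43]. This was done for all approaches, including amino acid, dipeptide, PSSM, and hybrid. The average of the test scores from all five sets was used to compute the final performance. The performance of the classifiers was assessed using sensitivity, specificity, accuracy, and the Matthews correlation coefficient (MCC). These measurements were calculated using the following standard formulas, where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively:

\[ \text{Sensitivity} = \frac{TP}{TP + FN} \times 100 \tag{4} \]

\[ \text{Specificity} = \frac{TN}{TN + FP} \times 100 \tag{5} \]

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100 \tag{6} \]

\[ \text{MCC} = \frac{(TP \times TN) - (FP \times FN)}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{7} \]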
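The evaluation procedure can be sketched in Python as follows. The fold generator and the metric function use the standard definitions of sensitivity, specificity, accuracy and MCC; the function names, the seed and the example counts are assumptions for illustration. The generator splits a single list of indices, so it would be applied to the positive and negative sets separately to mirror the per-class division described above.

```python
import math
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists so that each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal groups
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

def classification_metrics(tp, tn, fp, fn):
    """Return sensitivity, specificity and accuracy (in %) and the MCC."""
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return sensitivity, specificity, accuracy, mcc

# Hypothetical confusion counts for one test fold.
sn, sp, acc, mcc = classification_metrics(tp=50, tn=40, fp=10, fn=0)
print(sn, sp, acc, round(mcc, 3))
```

In a full run, the metrics from the five test folds would be averaged to give the final performance.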

Support vector machine (SVM)

Aromatase prediction was done with the SVM light programme, a very successful machine learning implementation. SVM light has been used in a variety of investigations, including plasminogen activator prediction, BacHbpred (bacterial hemoglobin prediction), Oxypred (oxygen-binding protein prediction), and VerHb (vertebrate hemoglobin protein prediction) [21, 26, 39–42]. SVM offers a range of kernel settings, including linear, polynomial, and radial basis function (RBF) kernels [44]. We optimized distinct parameters for each prediction approach in the prediction studies. In the method, aromatase was used as the positive example and non-aromatase as the negative example. In practice, we ran SVM light with (+)ve labels for positive sequences and (-)ve labels for negative sequences.
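For illustration, the Python sketch below writes one training example in the sparse text format read by SVM light, in which each line starts with the class label followed by index:value pairs with indices starting at 1. The helper name and the number of decimals are assumptions; zero-valued features are omitted, which the sparse format permits.

```python
def to_svmlight_line(label, features):
    """Format one example as '<label> <index>:<value> ...' with 1-based indices."""
    if label not in (1, -1):
        raise ValueError("label must be +1 (aromatase) or -1 (non-aromatase)")
    pairs = [f"{i}:{value:.4f}" for i, value in enumerate(features, start=1) if value != 0]
    return " ".join([f"{label:+d}", *pairs])

# Hypothetical three-feature example for a positive sequence.
line = to_svmlight_line(1, [50.0, 0.0, 25.0])
print(line)
```

A file of such lines, one per sequence, forms the training or test input for a given feature type.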

Webserver

The aromatase-related protein prediction webserver was developed using HTML and CGI-Perl scripts. The backend runs on an Apache server under the Linux operating system. The prediction webserver can be accessed freely at https://bioinfo.imtech.res.in/servers/muthu/aromatase/home.html. It is a support vector machine (SVM) based classification method for predicting aromatase-related proteins. Users can paste their sequences in FASTA format into the text box on the submit page. The server predicts each input sequence as an aromatase or non-aromatase protein based on the selected approach: amino acid composition (AAC), dipeptide composition (DPC), PSSM or hybrid (AAC+DPC).
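Input pasted in FASTA format has to be split into individual sequences before features are computed. The Python sketch below shows one simple way to do this; it is not the server's CGI-Perl code, and the function name and example records are hypothetical.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

example = ">seq1 hypothetical protein\nMVLE\nMLNP\n>seq2\nacdg\n"
parsed = parse_fasta(example)
print(parsed)
```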

Results

Effect of aromatase mRNA expression on cancer patient’s survival

KM plotter Pan-cancer RNA-seq was used to analyze the correlation between aromatase (CYP19A1) mRNA expression and survival in the available tumor types (Table 1). Higher aromatase mRNA expression correlated significantly with poorer OS in head-neck squamous cell carcinoma (Fig 1A, Table 1), kidney renal clear cell carcinoma (Fig 1B, Table 1) and kidney renal papillary cell carcinoma (Fig 1C, Table 1) patients. Higher aromatase mRNA expression also correlated significantly with poorer RFS in kidney renal papillary cell carcinoma patients (Fig 1D, Table 1). Further, higher aromatase mRNA expression was associated with significantly poorer OS in liver hepatocellular carcinoma (Fig 1E, Table 1) and stomach adenocarcinoma (Fig 1F, Table 1) patients. No significant correlation between aromatase mRNA expression and survival was seen for other types of tumors (Table 1).

Fig 1.

Effect of aromatase mRNA expression on OS in head-neck squamous cell carcinoma (A), kidney renal clear cell carcinoma (B) and kidney renal papillary cell carcinoma (C) patients. (D) Effect of aromatase mRNA expression on RFS in kidney renal papillary cell carcinoma. Effect of aromatase mRNA expression on OS in liver hepatocellular carcinoma (E) and stomach adenocarcinoma (F) patients.

https://doi.org/10.1371/journal.pone.0283567.g001

Table 1. Correlation of aromatase mRNA expression with overall (OS) and relapse-free survival (RFS) in various cancer patients.

https://doi.org/10.1371/journal.pone.0283567.t001

Amino acid composition analysis

The amino acid composition was computed for the aromatase sequences, and residue “L” was observed at a much higher frequency than the others (above 10%) (Fig 2A). As shown in Fig 2A, “F”, “P”, “S” and “V” are each present at more than 6%, whereas the residues “C” and “W” are present at less than 2%. When comparing the amino acid residue profiles of aromatase and non-aromatase proteins, some of the residue patterns are similar, but not all (Fig 2B). The developed models can use these differences to distinguish aromatase from the negative sequences.

Fig 2.

A) Amino acid distribution chart between aromatase and non-aromatase protein sequences. B) Sequence length profile of aromatase and non-aromatase proteins binding.

https://doi.org/10.1371/journal.pone.0283567.g002

Amino acid composition SVM modules

First, we used support vector machines (SVM) to develop models based on the amino acid composition of aromatase. SVM was trained on a variety of datasets using the SVM light implementation. A 20-dimensional amino acid composition vector was used to train the SVM classifiers. SVM kernels and parameters were adjusted to best discriminate between the positive and negative protein sequence datasets. The maximum accuracy (ACC) of aromatase prediction based on amino acid composition was 87.42%, with 100% sensitivity (SN), 74.84% specificity (SP) and a Matthews correlation coefficient (MCC) of 0.87 (Table 2, Fig 3).

Fig 3. The performance of accuracy (A), sensitivity (B), specificity (C) and MCC (D) based on the threshold value in all approaches.

https://doi.org/10.1371/journal.pone.0283567.g003

Table 2. The performance of SVM models using AAC, DPC, Hybrid and PSSM profiles on the original datasets.

https://doi.org/10.1371/journal.pone.0283567.t002

SVM modules using dipeptide composition

In general, SVM algorithms based on dipeptide composition are considered more effective than approaches based on single amino acid composition. SVM classifiers were therefore also constructed for dipeptide composition, which is represented by a 400-dimensional vector of dipeptide frequencies (20 x 20). During the adjustment of the kernel parameter and the trade-off parameter C, the best prediction performance was found with γ = 3 and C = 375. We developed models to distinguish aromatase from non-aromatase sequences based on these parameters. The SVM-based model achieved a maximum accuracy of 84.05%, with 99.84% sensitivity, 68.26% specificity and an MCC of 0.82, as shown in Table 2 and Fig 3.
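To make the role of the kernel parameter concrete, the Python sketch below evaluates a radial basis function kernel, K(x, z) = exp(-γ ||x - z||²). It assumes that γ in the text denotes the RBF width parameter; the function name and the example vectors are hypothetical. The trade-off parameter C is not part of the kernel itself: it controls how strongly training errors are penalized during optimization.

```python
import math

def rbf_kernel(x, z, gamma):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    if len(x) != len(z):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

# Identical vectors give 1.0; the value decays as the vectors move apart.
same = rbf_kernel([0.1, 0.2], [0.1, 0.2], gamma=3.0)
apart = rbf_kernel([1.0, 0.0], [0.0, 0.0], gamma=3.0)
print(same, apart)
```

A larger γ makes the kernel decay faster with distance, so each training sequence influences a smaller neighborhood of the feature space.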

Hybrid (AAC + DPC) SVM modules

The aromatase prediction problem was also addressed using a hybrid prediction approach that integrated amino acid composition (AAC) and dipeptide composition (DPC). The hybrid approach yielded 85.12% accuracy, 98.68% sensitivity, 71.55% specificity and an MCC of 0.83 (Table 2, Fig 3). Compared with the DPC model, the hybrid model showed slightly higher accuracy and specificity with slightly lower sensitivity; it did not exceed the accuracy of the AAC model.

PSSM profile based SVM modules

Aromatase prediction models based on position-specific scoring matrix (PSSM) profiles were also developed to improve the performance, and they achieved a maximum accuracy of 92.02% with 100% sensitivity, 84.05% specificity and an MCC of 0.92 (Table 2, Fig 3). In general, all models, including the simple AAC method, performed comparably well as measured by accuracy and MCC.

Prediction scoring graphs analysis

Prediction scoring graphs were also used to assess the performance of the SVM modules. A scoring graph shows the prediction score of each individual sequence tested and how the scores of sequences in the positive set are separated from the scores of sequences in the negative set by a threshold that can be used to categorize positive and negative predictions. However, not all positive or negative sequences are categorized correctly, which leads to false negative and false positive predictions. This analysis summarizes the prediction results to reflect this element of performance. In AAC, no positive sequences were predicted as negative, whereas one negative sequence was predicted as positive (Fig 4A). In DPC, no positive sequences were predicted as negative and no negative sequences were predicted as positive (Fig 4B). In hybrid, three positive sequences were predicted as negative while one negative sequence was predicted as positive (Fig 4C). In the PSSM system, one positive sequence was predicted as negative and one negative sequence was predicted as positive (Fig 4D). On the negative dataset, the false positive rate (FPR) was 0.005 for AAC and hybrid, and 0.010 for PSSM.
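The counting behind such a scoring graph can be sketched in Python as follows. The function names, the default threshold of 0 and the example scores are assumptions for illustration; the false positive rate is computed with the standard definition FP / (FP + TN).

```python
def confusion_counts(scores, labels, threshold=0.0):
    """Count TP, TN, FP, FN when scores at or above the threshold are called positive."""
    tp = tn = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp, tn, fp, fn

def false_positive_rate(fp, tn):
    """Fraction of negative sequences that were predicted as positive."""
    return fp / (fp + tn)

# Hypothetical scores for two positive (+1) and two negative (-1) sequences.
counts = confusion_counts([1.2, -0.3, 0.4, -1.1], [1, 1, -1, -1])
print(counts)
print(round(false_positive_rate(1, 190), 3))
```

For example, one false positive among 191 negative sequences corresponds to a rate of about 0.005.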

Fig 4. Prediction scores graphs: Prediction performance of the developed models on aromatase and non-aromatase proteins.

A) Amino acid composition based approach (AAC), B) Dipeptide composition based approach (DPC), C) Hybrid profile based approach (AAC+DPC) and D) PSSM profile based approach.

https://doi.org/10.1371/journal.pone.0283567.g004

BLAST data analysis

According to the results on the BLAST dataset, the developed methods perform well in identifying aromatase with all approaches. We randomly picked five sequences from our dataset (CP19A_HUMAN, CP2F1_HUMAN, CP4Z1_HUMAN, GCM1_HUMAN, and CP2A7_HUMAN), ran BLAST for each against the non-redundant (nr) database and collected 500 sequences (100 per query sequence). Overall, the proposed method correctly identified 97.4% of the BLAST dataset sequences on average across the four approaches. The individual AAC, DPC, hybrid and PSSM models correctly predicted 99.2%, 93.8%, 96.6% and 100% of the sequences, respectively (Table 3). Thus, the PSSM approach identified all of the BLAST sequences (Fig 5). This result indicates that our method outperforms the BLAST search in identifying aromatase-related proteins.

Fig 5. BLAST-Search Data analysis: Prediction performance of all models on the BLAST-Search data, A) AAC, B) DPC, C) Hybrid and D) PSSM.

https://doi.org/10.1371/journal.pone.0283567.g005

Table 3. The prediction performance of all models on the BLAST-Search data.

https://doi.org/10.1371/journal.pone.0283567.t003

Discussion

Computational biology has helped in understanding proteins from a new perspective, as algorithms can predict protein-protein interactions [45, 46] and identify novel drug targets in various pathologies [47, 48]. Algorithms performing systematic studies of cancer and protein databases [49, 50] have enhanced the accuracy of cancer patients’ survival predictions [51–54], provided an understanding of drug-induced side effects [55] and allowed the identification of novel biomarkers [56]. To our knowledge, there are no algorithms for the structural and functional characterization of aromatase or its polymorphisms. As aromatase is a critical target in breast cancer patients [57, 58], we established a reliable approach for detecting novel aromatase-related proteins, which will aid in developing novel AIs with improved efficacy.

Aromatase belongs to the cytochrome P450 family, whose members are heme-containing mono-oxygenases and highly flexible enzymes that allow easy substrate access and binding, and product release [59]. Unlike most P450s, which are not highly substrate selective, aromatase is set apart by its androgenic specificity. The structure of aromatase remained unknown for decades, and this hindered the explanation of its biochemical mechanism. Several laboratories purified aromatase from human placenta [60, 61] and from recombinant expression systems [62, 63]; however, attempts to crystallize aromatase remained unsuccessful. So far, only one crystal structure of a natural, full-length mammalian P450, human placental aromatase, is known [64]. Thus, finding aromatase-related proteins using in-vivo and in-vitro methods is difficult, and low-cost computational methods such as SVM can be a reliable approach to identify novel aromatase-related proteins.

Aromatase is the only vertebrate enzyme that catalyzes the aromatization of androgens into estrogens [64, 65]. It is a monomeric integral membrane protein of the endoplasmic reticulum [66, 67] and consists of a heme group and 503 amino acids. Aromatase has twelve α-helices and ten β-strands [64, 68], and its active site is a distal cavity of the heme-binding pocket, with the heme iron being the reaction center [68]. Aromatase in peripheral adipose tissues leads to estrogen biosynthesis in postmenopausal women, thus inducing breast tumors [69]. A small amount of estrogen can stimulate breast tumor formation, and aromatase protein is seen in epithelial as well as stromal breast cancer cells [70]. AIs are currently being used to treat breast cancer patients; however, the resistance and toxicity associated with AIs create the need for discovering novel AIs [71].

Survival analysis in various types of cancer patients using KM plotter showed that higher aromatase mRNA expression was associated with poorer overall survival (OS) in head-neck squamous cell carcinoma (Fig 1A), kidney renal clear cell carcinoma (Fig 1B), kidney renal papillary cell carcinoma (Fig 1C), liver hepatocellular carcinoma (Fig 1E) and stomach adenocarcinoma (Fig 1F) patients. Human fetal liver, kidney and intestine express significant levels of aromatase [72], but hepatic aromatase expression becomes untraceable in post-natal life [73]. Estrogens have been shown to promote the development and progression not only of breast cancer, but also of endometrial, prostate and colorectal cancer by increasing mitotic activity [74, 75]. The current survival analysis suggests a key role of aromatase as a tumor promoter, even in extragonadal tissues including head-neck, kidney, liver and stomach [76]. These results highlight the demand for a method to identify aromatase-related proteins for various types of endocrine-responsive tumors.

SVM is used in a variety of studies in basic science and medicine, including clinical data analysis, laboratory testing for the detection of disease and clinical trials of medicines [77–79]. In this study, we developed a very reliable method for predicting aromatase-related proteins based on a variety of protein features, namely the AAC, DPC, hybrid and PSSM approaches. The overall prediction accuracy for aromatase-related proteins was 87.42%, 84.05%, 85.12% and 92.02% for AAC, DPC, hybrid and PSSM, respectively. The results of the BLAST search data analysis and the prediction score graph analysis demonstrate that the established method is effective in identifying aromatase-related proteins. We expect that our method will find undiscovered aromatase-related proteins, which will aid researchers in cancer predictive studies and precision medicine. As this is the first webserver to detect aromatase-related proteins, we cannot compare the performance of our method with any other method.

Conclusion

So far, there is no web server or algorithm to predict or detect aromatase-related proteins. Thus, we developed a highly accurate method for identifying aromatase-related proteins using SVM with various sequence-based approaches (Fig 6). The method was developed using five-fold cross validation with amino acid composition (AAC), dipeptide composition (DPC), hybrid (AAC+DPC) and position-specific scoring matrix (PSSM) approaches. We tested known and unknown data with our developed models, and all models detected aromatase-related proteins accurately. In future studies, we would like to work on aromatase inhibitors using molecular docking, and we are also interested in using deep learning techniques [80–82]. We believe that this study will help researchers in finding new or undiscovered aromatase-related proteins.

Fig 6. Flow chart for developing SVM method to predict aromatase-related proteins.

https://doi.org/10.1371/journal.pone.0283567.g006

Acknowledgments

We are sincerely thankful to the Directors of CSIR-IMTECH and PGIMER (Chandigarh) for their support. A copy of the manuscript has been submitted to PTM, CSIR IMTECH, dated on 25.07.2022.

References

  1. Siegel RL, Miller KD, Fuchs HE, Jemal A. Cancer Statistics, 2021. CA: A Cancer Journal for Clinicians. 2021;71: 7–33. pmid:33433946
  2. Dhakal R, Noula M, Roupa Z, Yamasaki EN. A Scoping Review on the Status of Female Breast Cancer in Asia with a Special Focus on Nepal. Breast Cancer (Dove Med Press). 2022;14: 229–246. pmid:36052152
  3. Simpson E, Rubin G, Clyne C, Robertson K, O’Donnell L, Jones M, et al. The role of local estrogen biosynthesis in males and females. Trends Endocrinol Metab. 2000;11: 184–188. pmid:10856920
  4. Labrie F, Bélanger A, Cusan L, Gomez JL, Candas B. Marked decline in serum concentrations of adrenal C19 sex steroid precursors and conjugated androgen metabolites during aging. J Clin Endocrinol Metab. 1997;82: 2396–2402. pmid:9253307
  5. Russo J, Hasan Lareef M, Balogh G, Guo S, Russo IH. Estrogen and its metabolites are carcinogenic agents in human breast epithelial cells. J Steroid Biochem Mol Biol. 2003;87: 1–25. pmid:14630087
  6. Cui X, Schiff R, Arpino G, Osborne CK, Lee AV. Biology of progesterone receptor loss in breast cancer and its implications for endocrine therapy. J Clin Oncol. 2005;23: 7721–7735. pmid:16234531
  7. Van Asten K, Neven P, Lintermans A, Wildiers H, Paridaens R. Aromatase inhibitors in the breast cancer clinic: focus on exemestane. Endocr Relat Cancer. 2014;21: R31–49. pmid:24434719
  8. Santen RJ, Santner S, Davis B, Veldhuis J, Samojlik E, Ruby E. Aminoglutethimide inhibits extraglandular estrogen production in postmenopausal women with breast carcinoma. J Clin Endocrinol Metab. 1978;47: 1257–1265. pmid:263348
  9. Simpson ER, Mahendroo MS, Means GD, Kilgore MW, Hinshelwood MM, Graham-Lorence S, et al. Aromatase cytochrome P450, the enzyme responsible for estrogen biosynthesis. Endocr Rev. 1994;15: 342–355. pmid:8076586
  10. Chen SA, Besman MJ, Sparkes RS, Zollman S, Klisak I, Mohandas T, et al. Human aromatase: cDNA cloning, Southern blot analysis, and assignment of the gene to chromosome 15. DNA. 1988;7: 27–38. pmid:3390233
  11. Stocco C. Aromatase expression in the ovary: hormonal and molecular regulation. Steroids. 2008;73: 473–487. pmid:18321551
  12. Ratre P, Mishra K, Dubey A, Vyas A, Jain A, Thareja S. Aromatase Inhibitors for the Treatment of Breast Cancer: A Journey from the Scratch. Anticancer Agents Med Chem. 2020;20: 1994–2004. pmid:32593281
  13. Augusto TV, Correia-da-Silva G, Rodrigues CMP, Teixeira N, Amaral C. Acquired resistance to aromatase inhibitors: where we stand! Endocr Relat Cancer. 2018;25: R283–R301. pmid:29530940
  14. Din OS, Dodwell D, Wakefield RJ, Coleman RE. Aromatase inhibitor-induced arthralgia in early breast cancer: what do we know and how can we find out more? Breast Cancer Res Treat. 2010;120: 525–538. pmid:20157776
  15. Ahmad F, Mahmood A, Muhmood T. Machine learning-integrated omics for the risk and safety assessment of nanomaterials. Biomater Sci. 2021;9: 1598–1608. pmid:33443512
  16. Kalafi EY, Nor NAM, Taib NA, Ganggayah MD, Town C, Dhillon SK. Machine Learning and Deep Learning Approaches in Breast Cancer Survival Prediction Using Clinical Data. Folia Biol (Praha). 2019;65: 212–220. pmid:32362304
  17. Uddin S, Khan A, Hossain ME, Moni MA. Comparing different supervised machine learning algorithms for disease prediction. BMC Med Inform Decis Mak. 2019;19: 281. pmid:31864346
  18. Gorji F, Shafiekhani S, Namdar P, Abdollahzade S, Rafiei S. Machine learning-based COVID-19 diagnosis by demographic characteristics and clinical data. Adv Respir Med. 2022. pmid:35102543
  19. Tapani KT, Nevalainen P, Vanhatalo S, Stevenson NJ. Validating an SVM-based neonatal seizure detection algorithm for generalizability, non-inferiority and clinical efficacy. Comput Biol Med. 2022;145: 105399. pmid:35381454
  20. Peng Z-L, Yang J-Y, Chen X. An improved classification of G-protein-coupled receptors using sequence-derived features. BMC Bioinformatics. 2010;11: 420. pmid:20696050
  21. Muthukrishnan S, Puri M, Lefevre C. Support vector machine (SVM) based multiclass prediction with basic statistical analysis of plasminogen activators. BMC Res Notes. 2014;7: 63. pmid:24468032
  22. Muthu Krishnan S. Using Chou’s general PseAAC to analyze the evolutionary relationship of receptor associated proteins (RAP) with various folding patterns of protein domains. J Theor Biol. 2018;445: 62–74. pmid:29476832
  23. Sahu SS, Loaiza CD, Kaundal R. Plant-mSubP: a computational framework for the prediction of single- and multi-target protein subcellular localization using integrated machine-learning approaches. AoB Plants. 2020;12: plz068. pmid:32528639
  24. Vamathevan J, Clark D, Czodrowski P, Dunham I, Ferran E, Lee G, et al. Applications of machine learning in drug discovery and development. Nat Rev Drug Discov. 2019;18: 463–477. pmid:30976107
  25. 25. Alharbi F, Vakanski A. Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review. Bioengineering. 2023;10. pmid:36829667
  26. 26. Muthu Krishnan S. Classify vertebrate hemoglobin proteins by incorporating the evolutionary information into the general PseAAC with the hybrid approach. J Theor Biol. 2016;409: 27–37. pmid:27575465
  27. 27. Hendrix SG, Chang KY, Ryu Z, Xie Z-R. DeepDISE: DNA Binding Site Prediction Using a Deep Learning Method. Int J Mol Sci. 2021;22. pmid:34073705
  28. 28. Pugalenthi G, Nithya V, Chou K-C, Archunan G. Nglyc: A Random Forest Method for Prediction of N-Glycosylation Sites in Eukaryotic Protein Sequence. Protein Pept Lett. 2020;27: 178–186. pmid:31577193
  29. 29. Huang G, Zhang G, Yu Z. Computational prediction and analysis of histone H3k27me1-associated miRNAs. Biochim Biophys Acta Proteins Proteom. 2021;1869: 140539. pmid:32947024
  30. 30. Zhou L, Duan Q, Tian X, Xu H, Tang J, Peng L. LPI-HyADBS: a hybrid framework for lncRNA-protein interaction prediction integrating feature selection and classification. BMC Bioinformatics. 2021;22: 568. pmid:34836494
  31. 31. Zhang M, Su Q, Lu Y, Zhao M, Niu B. Application of Machine Learning Approaches for Protein-protein Interactions Prediction. Med Chem. 2017;13: 506–514. pmid:28530547
  32. 32. Shirafkan F, Gharaghani S, Rahimian K, Sajedi RH, Zahiri J. Correction to: Moonlighting protein prediction using physico-chemical and evolutional properties via machine learning methods. BMC Bioinformatics. 2021;22: 366. pmid:34243708
  33. 33. Park B, Im J, Tuvshinjargal N, Lee W, Han K. Sequence-based prediction of protein-binding sites in DNA: comparative study of two SVM models. Comput Methods Programs Biomed. 2014;117: 158–167. pmid:25113160
  34. 34. Suresh V, Parthasarathy S. SVM-PB-Pred: SVM based protein block prediction method using sequence profiles and secondary structures. Protein Pept Lett. 2014;21: 736–742. pmid:23855661
  35. 35. Lánczky A, Győrffy B. Web-Based Survival Analysis Tool Tailored for Medical Research (KMplot): Development and Implementation. J Med Internet Res. 2021;23: e27633. pmid:34309564
  36. 36. Nagy Á, Munkácsy G, Győrffy B. Pancancer survival analysis of cancer hallmark genes. Sci Rep. 2021;11: 6047. pmid:33723286
  37. 37. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 2021;49: D480–D489. pmid:33237286
  38. 38. Huang Y, Niu B, Gao Y, Fu L, Li W. CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics. 2010;26: 680–682. pmid:20053844
  39. 39. Krishnan SM. The evolutionary relationship of S15/NS1RNA binding domains with a similar protein domain pattern—A computational approach. Informatics in Medicine Unlocked. 2021;24: 100611.
  40. 40. Muthukrishnan S, Puri M. Harnessing the evolutionary information on oxygen binding proteins through Support Vector Machines based modules. BMC Res Notes. 2018;11: 290. pmid:29751818
  41. 41. Selvaraj M, Puri M, Dikshit KL, Lefevre C. BacHbpred: Support Vector Machine Methods for the Prediction of Bacterial Hemoglobin-Like Proteins. Adv Bioinformatics. 2016;2016: 8150784. pmid:27034664
  42. 42. Agrawal P, Kumar R, Usmani SS, Dhall A, Patiyal S, Sharma N, et al. GPSRdocker: A Docker-based Resource for Genomics, Proteomics and Systems biology. bioRxiv. 2019.
  43. 43. Zhang X, Liu S. RBPPred: predicting RNA-binding proteins from sequence using SVM. Bioinformatics. 2017;33: 854–862. pmid:27993780
  44. 44. Palagi L, Sciandrone M. On the convergence of a modified version of SVM light algorithm. Optimization Methods and Software. 2005;20: 317–334.
  45. 45. Di Paola L, Mei G, Di Venere A, Giuliani A. Exploring the stability of dimers through protein structure topology. Curr Protein Pept Sci. 2016;17: 30–36. pmid:26412792
  46. 46. Minicozzi V, Di Venere A, Nicolai E, Giuliani A, Caccuri AM, Di Paola L, et al. Non-symmetrical structural behavior of a symmetric protein: the case of homo-trimeric TRAF2 (tumor necrosis factor-receptor associated factor 2). J Biomol Struct Dyn. 2021;39: 319–329. pmid:31980009
  47. 47. Platania CBM, Di Paola L, Leggio GM, Romano GL, Drago F, Salomone S, et al. Molecular features of interaction between VEGFA and anti-angiogenic drugs used in retinal diseases: a computational approach. Front Pharmacol. 2015;6: 248. pmid:26578958
  48. 48. Di Paola L, Hadi-Alijanvand H, Song X, Hu G, Giuliani A. The Discovery of a Putative Allosteric Site in the SARS-CoV-2 Spike Protein Using an Integrated Structural/Dynamic Approach. J Proteome Res. 2020;19: 4576–4586. pmid:32551648
  49. 49. Mihaylov I, Kańduła M, Krachunov M, Vassilev D. A novel framework for horizontal and vertical data integration in cancer studies with application to survival time prediction models. Biol Direct. 2019;14: 22. pmid:31752974
  50. 50. Werner J, Géron A, Kerssemakers J, Matallana-Surget S. mPies: a novel metaproteomics tool for the creation of relevant protein databases and automatized protein annotation. Biol Direct. 2019;14: 21. pmid:31727118
  51. 51. Amelio I, Bertolo R, Bove P, Candi E, Chiocchi M, Cipriani C, et al. Cancer predictive studies. Biol Direct. 2020;15: 18. pmid:33054808
  52. 52. Han Y, Ye X, Wang C, Liu Y, Zhang S, Feng W, et al. Integration of molecular features with clinical information for predicting outcomes for neuroblastoma patients. Biol Direct. 2019;14: 16. pmid:31443736
  53. 53. Han Y, Ye X, Cheng J, Zhang S, Feng W, Han Z, et al. Integrative analysis based on survival associated co-expression gene modules for predicting Neuroblastoma patients’ survival time. Biol Direct. 2019;14: 4. pmid:30760313
  54. 54. Kim SY, Jeong H-H, Kim J, Moon J-H, Sohn K-A. Robust pathway-based multi-omics data integration using directed random walks for survival prediction in multiple cancer studies. Biol Direct. 2019;14: 8. pmid:31036036
  55. Chierici M, Francescatto M, Bussola N, Jurman G, Furlanello C. Predictability of drug-induced liver injury by machine learning. Biol Direct. 2020;15: 3. pmid:32054490
  56. Liu L, Wang G, Wang L, Yu C, Li M, Song S, et al. Computational identification and characterization of glioma candidate biomarkers through multi-omics integrative profiling. Biol Direct. 2020;15: 10. pmid:32539851
  57. Adhikari N, Amin SA, Saha A, Jha T. Combating breast cancer with non-steroidal aromatase inhibitors (NSAIs): Understanding the chemico-biological interactions through comparative SAR/QSAR study. Eur J Med Chem. 2017;137: 365–438. pmid:28622580
  58. Brueggemeier RW, Hackett JC, Diaz-Cruz ES. Aromatase Inhibitors in the Treatment of Breast Cancer. Endocr Rev. 2005;26: 331–345. pmid:15814851
  59. Cojocaru V, Winn PJ, Wade RC. The ins and outs of cytochrome P450s. Biochim Biophys Acta. 2007;1770: 390–401. pmid:16920266
  60. Nakajin S, Shinoda M, Hall PF. Purification to homogeneity of aromatase from human placenta. Biochem Biophys Res Commun. 1986;134: 704–710. pmid:3947346
  61. Kellis JT, Vickery LE. Purification and characterization of human placental aromatase cytochrome P-450. J Biol Chem. 1987;262: 4413–4420. pmid:3104339
  62. Amarneh B, Simpson ER. Expression of a recombinant derivative of human aromatase P450 in insect cells utilizing the baculovirus vector system. Mol Cell Endocrinol. 1995;109: R1–5. pmid:7664973
  63. Hong Y, Cho M, Yuan Y-C, Chen S. Molecular basis for the interaction of four different classes of substrates and inhibitors with human aromatase. Biochem Pharmacol. 2008;75: 1161–1169. pmid:18184606
  64. Ghosh D, Griswold J, Erman M, Pangborn W. Structural basis for androgen specificity and oestrogen synthesis in human aromatase. Nature. 2009;457: 219–223. pmid:19129847
  65. Schuster D, Laggner C, Steindl TM, Palusczak A, Hartmann RW, Langer T. Pharmacophore modeling and in silico screening for new P450 19 (aromatase) inhibitors. J Chem Inf Model. 2006;46: 1301–1311. pmid:16711749
  66. Shimozawa O, Sakaguchi M, Ogawa H, Harada N, Mihara K, Omura T. Core glycosylation of cytochrome P-450(arom). Evidence for localization of N terminus of microsomal cytochrome P-450 in the lumen. J Biol Chem. 1993;268: 21399–21402. pmid:8407981
  67. Amarneh B, Corbin CJ, Peterson JA, Simpson ER, Graham-Lorence S. Functional domains of human aromatase cytochrome P450 characterized by linear alignment and site-directed mutagenesis. Mol Endocrinol. 1993;7: 1617–1624. pmid:8145767
  68. Ghosh D, Griswold J, Erman M, Pangborn W. X-ray structure of human aromatase reveals an androgen-specific active site. J Steroid Biochem Mol Biol. 2010;118: 197–202. pmid:19808095
  69. Zhao H, Zhou L, Shangguan AJ, Bulun SE. Aromatase expression and regulation in breast and endometrial cancer. J Mol Endocrinol. 2016;57: R19–33. pmid:27067638
  70. Esteban JM, Warsi Z, Haniu M, Hall P, Shively JE, Chen S. Detection of intratumoral aromatase in breast carcinomas. An immunohistochemical study with clinicopathologic correlation. Am J Pathol. 1992;140: 337–343. pmid:1739127
  71. Chumsri S, Schech A, Chakkabat C, Sabnis G, Brodie A. Advances in mechanisms of resistance to aromatase inhibitors. Expert Rev Anticancer Ther. 2014;14: 381–393. pmid:24559291
  72. Price T, Aitken J, Simpson ER. Relative expression of aromatase cytochrome P450 in human fetal tissues as determined by competitive polymerase chain reaction amplification. J Clin Endocrinol Metab. 1992;74: 879–883. pmid:1548354
  73. Yamamoto T, Sakai CN, Yamaki J, Takamori K, Yoshiji S, Kitawaki J, et al. Estrogen biosynthesis in human liver: a comparison of aromatase activity for C-19 steroids in fetal liver, adult liver and hepatoma tissues of human subjects. Endocrinol Jpn. 1984;31: 277–81. pmid:6094164
  74. 74. Sasano H, Harada N. Intratumoral aromatase in human breast, endometrial, and ovarian malignancies. Endocr Rev. 1998;19: 593–607. pmid:9793759
  75. 75. Henderson BE, Ross R, Bernstein L. Estrogens as a cause of human cancer: the Richard and Hinda Rosenthal Foundation award lecture. Cancer Res. 1988;48: 246–253. pmid:2825969
  76. 76. Murakami K, Hata S, Miki Y, Sasano H. Aromatase in normal and diseased liver. Hormone Molecular Biology and Clinical Investigation. 2020;41: 20170081. pmid:29489455
  77. 77. Çubukçu HC, Topcu Dİ, Bayraktar N, Gülşen M, Sarı N, Arslan AH. Detection of COVID-19 by Machine Learning Using Routine Laboratory Tests. Am J Clin Pathol. 2022;157: 758–766. pmid:34791032
  78. 78. Abiodun TN, Okunbor D, Osamor VC. Remote Health Monitoring in Clinical Trial using Machine Learning Techniques: A Conceptual Framework. Health Technol (Berl). 2022;12: 359–364. pmid:35308032
  79. 79. Chen Y, Mao Q, Wang B, Duan P, Zhang B, Hong Z. Privacy-Preserving Multi-class Support Vector Machine Model on Medical Diagnosis. IEEE J Biomed Health Inform. 2022;PP. pmid:35259122
  80. 80. Ahmed AA, Abouzid M, Kaczmarek E. Deep Learning Approaches in Histopathology. Cancers. 2022;14. pmid:36358683
  81. 81. Zhao B-W, You Z-H, Hu L, Guo Z-H, Wang L, Chen Z-H, et al. A Novel Method to Predict Drug-Target Interactions Based on Large-Scale Graph Representation Learning. Cancers (Basel). 2021;13. pmid:33925568
  82. 82. Zhao B-W, Hu L, You Z-H, Wang L, Su X-R. HINGRL: predicting drug–disease associations with graph representation learning on heterogeneous information networks. Briefings in Bioinformatics. 2021;23. pmid:34891172