Core signaling pathways in ovarian cancer stem cell revealed by integrative analysis of multi-marker genomics data

Tumor recurrence occurs in more than 70% of ovarian cancer patients, and the majority eventually becomes refractory to treatments. Ovarian Cancer Stem Cells (OCSCs) are believed to be responsible for the tumor relapse and drug resistance. Therefore, eliminating ovarian CSCs is important to improve the prognosis of ovarian cancer patients. However, there is a lack of effective drugs to eliminate OCSCs because the core signaling pathways regulating OCSCs remain unclear. Also it is often hard for biologists to identify a few testable targets and infer driver signaling pathways regulating CSCs from a large number of differentially expression genes in an unbiased manner. In this study, we propose a straightforward and integrative analysis to identify potential core signaling pathways of OCSCs by integrating transcriptome data of OCSCs isolated based on two distinctive markers, ALDH and side population, with regulatory network (Transcription Factor (TF) and Target Interactome) and signaling pathways. We first identify the common activated TFs in two OCSC populations integrating the gene expression and TF-target Interactome; and then uncover up-stream signaling cascades regulating the activated TFs. In specific, 22 activated TFs are identified. Through literature search validation, 15 of them have been reported in association with cancer stem cells. Additionally, 10 TFs are found in the KEGG signaling pathways, and their up-stream signaling cascades are extracted, which also provide potential treatment targets. Moreover, 40 FDA approved drugs are identified to target on the up-stream signaling cascades, and 15 of them have been reported in literatures in cancer stem cell treatment. In conclusion, the proposed approach can uncover the activated up-stream signaling, activated TFs and up-regulated target genes that constitute the potential core signaling pathways of ovarian CSC. Also drugs and drug combinations targeting on the core signaling pathways might be able to eliminate OCSCs. The proposed approach can also be applied for identifying potential activated signaling pathways of other types of cancers.

Introduction Over 90% of ovarian cancers are epithelial in origin, and epithelial ovarian cancer, especially the most aggressive subtype high-grade serous ovarian cancer (HGSOC), accounts for the majority of ovarian cancer deaths [1,2]. Most tumors are initially responsive to the conventional chemotherapy, and the patients enter into clinical remission after initial treatment. However, tumor recurrence occurs in more than 70% of patients despite treatment, and the majority eventually becomes refractory to treatments [3]. Recent research evidences show that tumor is a mixture of heterogeneous populations of cells with different levels of malignity. A subpopulation of tumor cells characterized by the capacity of self-renewing, differentiation, and tumor-initiating are called cancer stem cells (CSCs) or tumor initiating cells (TICs) [4]. CSCs play important roles in tumor initiation, progression, metastasis, recurrence and drug resistance [4][5][6][7]. Thus, elimination of CSCs is important to overcome drug resistance to improve the prognosis of cancer patients. The knowledge about CSCs is limited, and one major challenge is that it is difficult to identify and isolate CSCs with few biomarkers because CSCs are heterogeneous and could exist a specific hierarchy [8][9][10]. Ovarian cancer stem cells (OCSCs) have been successfully identified and isolated based on their expression of distinctive cell surface markers CD44, CD117, MyD88, and CD133 [11,12], as well as the activity of ALDH [13]. These CSCs harbor enhanced tumorigenicity and chemoresistance [14]) and are thought to drive the universal recurrence of ovarian cancer, as well as responsible for the development of therapeutic resistance [15]. Though studies with these markers show evidence in support of OCSCs [16], there is still a lack of effective drugs to differentiate and eliminate them [17].
Though common signaling pathways, e.g., WNT, NOTCH, SHH, JAK/STAT, have been associated with all types of CSCs [18,19], the core signaling mechanism regulating Ovarian CSCs remain unclear. The differential gene expression analysis often fails to identify genes regulate CSCs because it is difficult to identify a few testable targets from a large number of differentially expressed genes mixed with many passenger genes. Additionally, important proteins regulating CSCs might be missed because either the fold change is small or the gene expression data is not available. Therefore, it is necessary to integrate multi-datasets with prior knowledge, e.g., regulatory network and signaling pathways, to increase the possibility of identifying the true CSC driver genes in the systems biology perspective. Thus, in this study, we propose an approach to identify potential core signaling pathways of OCSCs by integrating transcriptome data of OCSCs isolated based on two distinctive markers, ALDH and side population (Hoechst 33342 stain), with the prior knowledge of regulatory network and KEGG signaling pathways. Our hypothesis is that the integrative analysis of multi-genomics data sets of OCSCs with distinct markers could infer more accurate driver-signaling network regulating CSC to generate a small number of testable biomarkers and drugs. In specific, we first identify the common activated transcription factors (TFs) in two OCSC populations; and then construct the core signaling pathways by uncovering the up-stream signaling cascades of the activated TFs, which constitute the potential core signaling pathways of ovarian CSC. Drugs targeting on the upstream signaling cascades of activated TFs are selected as potential treatments to eliminate ovarian cancer stem cells. The details of methodology and datasets are introduced in Section 2; and analysis results are shown in Section 3, followed by the discussions and conclusion.

Transcriptome datasets of OCSCs
In this study, we manually searched the ovarian CSC gene expression datasets in NCBI GEO (Gene Expression Omnibus) using the aforementioned markers and "Ovarian" as the keywords, e.g., "CD44, Ovarian", and only two datasets were found, i.e., GSE33874 and GSE82304. The dataset GSE33874 is the gene expression profile of isolated side population (SP-Hoechst Blue High and Hoechst Red Low, Ovarian CSCs) and main population (MP) of fresh ascites obtained from women with high-grade advanced stage papillary serous ovarian adenocarcinoma [20]. Gene expression profiles (in triplicate) of SKOV3 human ovarian cancer cells of Aldefluor high (ALDH+, Ovarian CSCs) and Aldefluor low (ALDH-) populations were available in dataset GSE82304 [16]. The GEO2R was employed to get the fold change data of individual genes.

KEGG signaling pathways and regulatory network
To obtain the KEGG signaling pathways, the "Pathview" R package was employed to download KGMLs of humans pathways [21]. Then the "KEGGgraph" R package was used to extract nodes and edges of KEGG signaling pathways from KGMLs [22]. In total, 282 signaling pathways were collected from seven categories: metabolism, genetic information processing, environmental information processing, cellular processes, organismal systems, human diseases, and drug development. The TF-Target regulatory network was downloaded from the supplemental material of reference [23], which was derived from the TF binding site predictions for all target genes from TRANSFAC (v7.4) [24]. In summary, the TF-target regulatory network consists of 230 TFs, 12733 target genes, and 79100 TF-Target interactions.

Signaling pathway construction
First, the Fisher's exact test (using hypo-geometric distribution) [25] was used to identify the activated TFs by comparing the number of up-regulated targets vs. the number of all target genes, with the number of all the up-regulated genes vs. the number of all the genes tested. The p-value threshold, 0.05, was used to select the activated TFs. Second, all 282 signaling pathways from KEGG were examined, and all the signaling cascades from the starting nodes to the activated TFs were extracted, and then the top 3 signaling paths were kept to construct the signaling pathway regulating the given TFs. The python package, NetworkX, was used to screen all the 282 KEGG signaling pathways to extract signaling cascades starting from the beginning genes of individual signaling pathways to the given TFs. Then we score each signaling cascades using the average expression fold change of genes (on the signaling cascades and with fold change > 0). To control the size of up-stream signaling network of given TFs, the top 3 signaling cascades are kept. The up-regulated target genes (Fold change > = 2) in both datasets are linked to the given TFs.

Twenty-two activated TFs
In the two gene expression datasets, there are 1988 and 2528 up-regulated genes (fold change > = 2); and 883 and 2821 down-regulated genes (fold change < = 0.5) respectively. It is difficult for biologists to identify potential targets associated with Ovarian CSCs from such large number of differentially expressed genes. With the aim of discovering testable regulatory signaling networks that maintain Ovarian CSCs, we identify the activated TFs (whose target genes are up-regulated) in both datasets (CSCs isolated from ALDH + marker and side-population) using the Fisher's exact test by integrating the up-regulated genes (Fold_Change > = 2) and the TF-target interactome (gene regulation network) data. In total, 22 TFs are identified (see Table 1). As can be seen, some TFs will be missed using only gene expression fold change because either there is no gene expression data available (NA) or the fold change is small. We conducted the literature search to evaluate these TFs, and surprisingly 15 TFs have been reported to play important roles in cancer stem cell regulation (see Table 2). For example,

Up-stream signaling cascades regulating activated TFs
We further uncover the up-stream signaling cascades regulating these activated TFs using KEGG signaling pathways. Out of 22 activated TFs, 10 TFs (ELK1, NRF1, FOXO4, LEF1, MEF2A, FOXO3, NFATC2, SPI1, TEAD1 and E2F1) are found in KEGG signaling pathways with up-stream signaling cascades link to them (see Table 1 and red color nodes in Fig 1). As can be seen in Fig 1, many target genes (cyan color nodes) of transcription factors, FOXO4, LEF1, NFATC2, SPI1 and TEAD1, are up-regulated (fold_change > = 2) in both datasets. The yellow color nodes represent the starting proteins activating the signaling cascades in KEGG, e.g., the MAP kinases that are often activated by mitogenic and environmental stress, the EGF growth factor, and FLT3 that encodes a class III receptor tyrosine kinase that regulates hematopoiesis, and PPP3R2 that is related to the calcium signaling. These signaling cascades provide potential testable biomarkers of Ovarian CSCs for experimental design.

FDA approved drugs targeting on up-stream signaling of activated TFs
To investigate potential drugs that can perturb the Ovarian CSCs, we mapped the FDA approved drugs on the integrative signaling network (see Fig 2). The target information  have better effects to eliminate cancer stem cells. In summary, this analysis can provide testable hypothesis and potential drug candidates to eliminate OCSCs.

Discussion and conclusion
Most ovarian cancer tumors are initially responsive to the conventional chemotherapy. Whereas, more than 70% of patients will experience tumor recurrence, and the majority eventually becomes treatment resistant. Ovarian cancer stem cells (CSCs) are thought to drive the universal recurrence of ovarian cancer, as well as responsible for the development of therapeutic resistance. However, the core signaling pathways regulating Ovarian CSCs remain unclear, and there is still a lack of effective drugs and drug combinations to differentiate and eliminate them to improve cancer survival.
In this study, we propose to identify potential core signaling pathways of OCSCs in a datadriven manner by integrating transcriptome data of OCSCs isolated based on two distinctive cell surface markers, ALDH and side population, with prior knowledge, e.g., regulatory network and signaling pathways, to increase the possibility of identifying the true CSC driver Signaling pathways of ovarian cancer stem cell genes and signaling pathways. We identified 22 activated transcription factors, and 15 of them have been reported in the association with cancer stem cells. In addition, 10 transcription factors were found in the KEGG signaling pathways, and we extracted the up-stream signaling cascades regulating these transcription factors, which provide potential core signaling mechanism of ovarian CSC regulation. Moreover, we mapped the FDA approved drugs on these upstream signaling cascades. Forty FDA approved drugs were identified and 15 of these drugs have been reported in cancer stem cell treatment. Combinations of these drugs targeting on different up-stream signaling cascades might be effective to eliminate ovarian cancer stem cells.
The proposed approach can be helpful for discovering synergistic and effective drug combinations. It is well known that inhibiting a single target does not ensure the success of effective treatment due to the complicated interplay of multiple signaling pathways [4]. For example, the activation of Sonic Hedgehog (SHH) signaling and evolution through a mesenchymal phenotype have been uncovered as a novel mechanism of drug resistance to tyrosine kinase inhibitors (TKI) of EGF receptor (EGFR) in lung cancer [48], and play important roles in regulating hepatocellular carcinoma (HCC) [58]. Also, it was reported that the number of CSCs can be increased by MSCs [59], which could be produced by the activation of SHH signaling [60] in ovarian cancer. Thus drug combinations blocking the signaling interplay have high possibility to be synergistic and effective in cancer treatment. For example, Metformin (widely used as  anti-diabetic drug, also identified in this study) and MEK-inhibitors (Selumetinib/Pimasertib targeting on the RAF/RAS/MAPK signaling) were discovered to effectively inhibit the proliferation and metastasis of LKB1 positive Non-Small Cell Lung Cancer (NSCLC) cancer cells [61]. The synergism of the combination is the down-regulation of GLI1, which is the mediator of epithelia-to-mesenchymal transition (EMT) signaling, and can be affected by SHH signaling [61]. Moreover, the drug combination, Metformin and Erlotinib (EGFR inhibitor), is used in a phase II study for the treatment of stage IV NSCLC [62,63]. There are also some limitations of this integrative data analysis. First, the current TF-target interactome data might not be complete and accurate. In the future work, we will integrate new TF-target interaction data resources, e.g., TRUST (text mining) [64] and GTRD (Chip-Seq data) [65], to improve the quality and completeness [66]. Moreover, the tissue specific regulatory network data might be useful to further refine the TF-target interactome data [66,67]. Second, 12 activated TFs are still missed in the up-stream signaling network analysis. Additional signaling pathway database, e.g., BioGRID [68], STRING [69], Reactome [70], could be integrated to identify more up-stream signaling cascades of additional active TFs. Thirdly, other pharmacological data resources, e.g., LINCS (reverse gene signature based data) [71], can be integrated to identify more drugs or prioritize drugs to eliminate the Ovarian CSCs. Also, the drug repositioning [72][73][74] and drug combination prediction [75][76][77] are not trivial tasks. In the future work, we will integrate additional data resources to prioritize targets and drug combinations to block multiple TF and signaling interplays to eliminate ovarian CSCs.