The authors have declared that no competing interests exist.
Conceived and designed the experiments: SB MA CE LE. Analyzed the data: SB MA LE. Wrote the paper: SB MA RS CE LE. Assisted interpreting the results: RS.
Prostate cancer is currently the most frequently diagnosed malignancy in men and the second leading cause of cancer-related deaths in industrialized countries. Worldwide, an increase in prostate cancer incidence is expected due to increased life expectancy, aging of the population and improved diagnosis. Although the specific underlying mechanisms of prostate carcinogenesis remain unknown, prostate cancer is thought to result from a combination of genetic and environmental factors altering key cellular processes. To elucidate these complex interactions and to improve the understanding of prostate cancer progression and metastasis, large-scale gene expression studies can be analyzed with bioinformatics approaches to decipher the regulation of core processes.
In this study, a standardized quality control procedure and statistical analysis (
This study illustrates how a standardized bioinformatics evaluation of existing microarray data and subsequent pathway analysis can quickly and cost-effectively provide essential information about important molecular pathways and cellular processes involved in prostate cancer development and disease progression. The presented results may assist in biomarker profiling and the development of novel treatment approaches.
Prostate cancer is currently the most frequently diagnosed malignancy in men and the second leading cause of cancer-related morbidity and mortality in industrialized countries.
Although the specific underlying mechanisms of prostate carcinogenesis have not yet been unraveled, prostate cancer is thought to result from a combination of genetic and environmental factors, including several susceptibility genes for inherited prostate cancer, ethnicity and family history, as well as different dietary and lifestyle factors.
Due to the complex etiology of prostate cancer, treatment options depend on multiple factors, including a patient’s age and general health status, the prostate-specific antigen (PSA) level, as well as the tumor grade and status. One treatment option for localized prostate cancer is radical prostatectomy, the surgical removal of the prostate gland and nearby lymph nodes. However, it is estimated that 25–40% of men undergoing radical prostatectomy will have disease relapse, as detected by increasing serum levels of PSA.
Gene expression microarray technology has been the method of choice for monitoring the complex expression patterns between the numerous molecular players such as those involved in prostate cancer. Bioinformatics tools, including quality control (QC) and analysis of the generated data up to the biological pathway level, are required to identify key genes and cellular pathways involved in prostate cancer development and progression.
This study involves microarray data analysis using the open source language R
Starting from the publicly available EMBL repository ArrayExpress: 1) Relevant prostate cancer studies were selected and downloaded; 2) Quality control and data pre-processing steps were performed in the R environment. Microarrays with insufficient sample quality, hybridization quality, signal comparability or array correlation were excluded; 3) For each included study, statistical analysis was performed and pathway analysis was run with PathVisio to identify the biological processes involved; 4) Results were then integrated and compared to literature findings.
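Step 2 of this workflow excludes arrays with insufficient array correlation, among other criteria. The following is a minimal sketch of how such a check could look, not the procedure used in the study (which was run in R). The function names, the median-based rule and the cutoff of 0.8 are assumptions made for illustration, and the intensities are made-up toy values.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def flag_low_correlation(arrays, min_median_r=0.8):
    """Return the names of arrays whose median correlation with all other
    arrays is below min_median_r (hypothetical cutoff, not from the paper)."""
    flagged = []
    for name, values in arrays.items():
        others = [pearson(values, v) for k, v in arrays.items() if k != name]
        if median(others) < min_median_r:
            flagged.append(name)
    return flagged


# Toy log2 intensities for four arrays over six probes (made-up numbers).
toy_arrays = {
    "array_1": [5.1, 7.9, 6.2, 9.8, 4.0, 8.1],
    "array_2": [5.0, 8.1, 6.0, 9.9, 4.2, 8.0],
    "array_3": [5.3, 7.7, 6.4, 9.6, 3.9, 8.3],
    "array_4": [8.0, 4.1, 9.7, 5.2, 7.8, 6.1],  # deliberately discordant
}
flagged_arrays = flag_low_correlation(toy_arrays)
```

Using the median rather than the mean keeps a single discordant array from dragging down the score of the remaining arrays.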
Datasets are selected from the public repository ArrayExpress (
We searched the open repository ArrayExpress (
Dataset | ArrayExpress ID | Array type | Number of samples |
Varambally | E-GEOD-3325 | Affymetrix Human Genome U133 Plus 2.0 | 19 (NP: 6, pPC: 7, mPC: 6) (−0) |
Wallace | E-GEOD-6956 | Affymetrix Human Genome U133A 2.0 | 46 (NP: 11 (−3), PC: 35 (−1)) |
Sun | E-GEOD-25136 | Affymetrix Human Genome U133A | 79 (indolent: 40 (−2), recurrent: 39 (−4)) |
Best | E-GEOD-2443 | Affymetrix Human Genome U133A | 20 (AI: 10 (−1), AD: 10 (−1)) |
Gregg | E-GEOD-20758 | Affymetrix Human Genome U133 2.0 | 6 (PC: 3, stromal cell samples: 3) (−6) |
The number of excluded arrays for each dataset using the standardized QC procedure is indicated in brackets. Abbreviations: NP: normal prostate, pPC: primary prostate cancer, mPC: metastatic prostate cancer, AI: androgen independent, AD: androgen dependent.
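The number of arrays that remain after QC follows directly from the table. The short sketch below does this arithmetic, reading the bracketed numbers as exclusion counts per group or per dataset; the variable names are illustrative only.

```python
# (total arrays, excluded arrays) per dataset, read off the table above.
datasets = {
    "E-GEOD-3325": (19, 0),
    "E-GEOD-6956": (46, 3 + 1),   # NP: -3, PC: -1
    "E-GEOD-25136": (79, 2 + 4),  # indolent: -2, recurrent: -4
    "E-GEOD-2443": (20, 1 + 1),   # AI: -1, AD: -1
    "E-GEOD-20758": (6, 6),       # all arrays excluded
}

# Arrays retained after the standardized QC procedure.
retained = {acc: total - excluded for acc, (total, excluded) in datasets.items()}

# Datasets that still contain arrays for downstream analysis.
usable = [acc for acc, n in retained.items() if n > 0]
```

Under this reading, E-GEOD-20758 retains no arrays and drops out of the downstream analysis.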
The first dataset by Varambally
The second dataset is a subset of 46 Affymetrix Human Genome U133A 2.0 GeneChips created from a gene expression experiment by Wallace
The third dataset by Sun
The fourth dataset by Best
The fifth dataset by Gregg
Microarray data analysis was performed using the open source language R (version 2.13.0) and R packages of Bioconductor 2.8
Linear modeling using the limma package was conducted to identify, for each dataset, the genes that were significantly changed between experimental groups, as defined by a p-value smaller than 0.05. These genes were then mapped to biological pathways using PathVisio with WikiPathways content.
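To make the selection and mapping step concrete, the sketch below applies the p < 0.05 criterion to a table of per-gene p-values and counts, for each pathway, how many of its genes were measured and how many meet the criterion. It does not reproduce limma or PathVisio; the function name, gene identifiers, pathway names and p-values are all hypothetical.

```python
def pathway_counts(p_values, pathways, alpha=0.05):
    """For each pathway return (n, r), where n is the number of pathway
    genes measured in the dataset and r the number of those with p < alpha."""
    significant = {gene for gene, p in p_values.items() if p < alpha}
    counts = {}
    for name, genes in pathways.items():
        measured = genes & set(p_values)  # ignore genes absent from the array
        counts[name] = (len(measured), len(measured & significant))
    return counts


# Made-up p-values per gene, standing in for the output of a linear model.
toy_p_values = {
    "gene_a": 0.003,
    "gene_b": 0.020,
    "gene_c": 0.300,
    "gene_d": 0.040,
    "gene_e": 0.600,
}

# Made-up pathway membership; gene_x is not measured in the toy dataset.
toy_pathways = {
    "toy_pathway_1": {"gene_a", "gene_b", "gene_c", "gene_x"},
    "toy_pathway_2": {"gene_d", "gene_e"},
}

counts = pathway_counts(toy_p_values, toy_pathways)
```

The pair (n, r) per pathway is the kind of input an over-representation statistic needs.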
Statistical and pathway analyses were applied to the data obtained by the standardized QC and normalization procedure (‘reprocessed data’) as well as to the normalized data as provided on ArrayExpress (‘published data’) for each dataset. The results of the reprocessed data and the published data were compared to get an overview of the overlap and differences in pathway analysis results. Results of all datasets were combined to robustly identify central biological pathways involved in prostate carcinogenesis. A detailed description of the applied methods and bioinformatics tools can be found in
It is necessary to assess the quality of microarrays and select those having sufficient quality before running further analyses. To control for the quality of each microarray within a dataset, several metrics were computed, resulting in plots and bar diagrams as illustrated in
Several QC results of the dataset by Varambally
The application of the standardized QC procedure to the dataset by Varambally
QC of the dataset by Wallace
QC of the dataset by Sun
QC of the dataset by Best
QC of the dataset by Gregg
Pathway analysis is considered to ease data interpretation and, most importantly, to yield more robust results than a signature of differentially expressed genes alone. Accordingly, differences in the statistical results between the data processed with the standardized procedure and the processed data from ArrayExpress were expected to be mitigated by performing the analysis at the level of biological pathways.
To this end, pathway analysis using PathVisio was performed to study changes at a biological process level, using pathway content from WikiPathways. Results were compared between the processed data obtained using the standardized procedure and the processed data downloaded from ArrayExpress. For the biological interpretation, only significant pathways with a Z-score higher than 1.9 in at least one of the comparisons were included.
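The text does not spell out how the Z-score is computed. The sketch below uses the hypergeometric-based over-representation Z-score commonly described for PathVisio and MAPPFinder, which is an assumption here rather than something stated in this paper. N is the number of measured genes, R the number meeting the significance criterion, n the number of measured genes in the pathway and r the number of those meeting the criterion.

```python
from math import isnan, nan, sqrt


def pathway_z_score(N, R, n, r):
    """Over-representation Z-score: (observed - expected) / standard deviation,
    with the standard deviation taken from the hypergeometric distribution.
    Returns NaN when the score is undefined, e.g. no pathway gene is measured."""
    if n == 0 or N <= 1:
        return nan
    p = R / N                                   # overall fraction of changed genes
    variance = n * p * (1 - p) * (1 - (n - 1) / (N - 1))
    if variance <= 0:
        return nan
    return (r - n * p) / sqrt(variance)


# Toy numbers: 10000 measured genes, 500 changed, a pathway with 50 measured
# genes of which 10 are changed (expected by chance: 2.5).
z_example = pathway_z_score(N=10000, R=500, n=50, r=10)

# A pathway with no measured genes has no defined score.
z_missing = pathway_z_score(N=10000, R=500, n=0, r=0)
```

With these toy numbers the pathway clears the 1.9 cutoff, whereas a pathway with no measured genes yields NaN instead of a score.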
Pathway analysis of the dataset by Varambally
Pathway | Z Score (ArrayExpress) | Z Score (Standardized processing) |
| | |
Glutathione metabolism | | 1.59 |
Striated Muscle Contraction | | 0.78 |
Endochondral Ossification | | 1.32 |
Delta-Notch Signaling Pathway | | 0.68 |
| | |
| | |
Eicosanoid Synthesis | 1.53 | |
Prostaglandin Synthesis and Regulation | 1.34 | |
Id Signaling Pathway | 1.14 | |
Selenium | 0.98 | |
Nicotine Activity on Dopaminergic Neurons | 0.69 | |
Cytoplasmic Ribosomal Proteins | 0.10 | |
Irinotecan Pathway | 0.07 | |
Ganglio Sphingolipid Metabolism | −0.21 | |
Sulfation | −0.34 | |
Pathway analysis is based on a comparison between benign prostate tissue and primary prostate cancer. Only significant pathways with a Z-score >1.9 in at least one of the two analyses are included. Significant Z-scores are depicted in bold; matches in pathways between the analyses are in italics.
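The inclusion rule and the notion of a match used in these tables can be expressed with two set operations. The sketch below assumes each analysis yields a mapping from pathway name to Z-score; the function name, pathway names and scores are hypothetical.

```python
def compare_analyses(published, reprocessed, threshold=1.9):
    """Return (included, matches): pathways significant in at least one
    analysis, and pathways significant in both."""
    sig_published = {name for name, z in published.items() if z > threshold}
    sig_reprocessed = {name for name, z in reprocessed.items() if z > threshold}
    included = sig_published | sig_reprocessed   # listed in the table
    matches = sig_published & sig_reprocessed    # shown as matches
    return included, matches


# Made-up Z-scores for four pathways in the two analyses.
toy_published = {"pathway_a": 2.4, "pathway_b": 2.1, "pathway_c": 1.2, "pathway_d": 0.3}
toy_reprocessed = {"pathway_a": 3.0, "pathway_b": 1.6, "pathway_c": 2.2, "pathway_d": float("nan")}

included, matches = compare_analyses(toy_published, toy_reprocessed)
```

A NaN score never exceeds the threshold in this comparison, so a pathway without a defined score in one analysis can only be included through the other.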
Pathway analysis of the published data from ArrayExpress detected 7 significantly altered pathways with a Z-score higher than 1.9, including “Cholesterol Biosynthesis”, “Glutathione metabolism”, and the “Delta-Notch Signaling Pathway”.
PathVisio results of the reprocessed data indicated 12 significantly changed pathways involved in prostate carcinogenesis, such as the “Cholesterol Biosynthesis”, “Hedgehog Signaling Pathway”, and “Selenium” pathway, amongst others. Three matches in significant pathways between the reprocessed data and the published data could be detected (
Pathway analysis of the published data comparing metastatic with primary prostate cancer revealed 25 significantly changed pathways, while 20 significantly altered pathways in the reprocessed data could be detected. A summary of these pathway analysis results is given in
Pathway | Z Score (ArrayExpress) | Z Score (Standardized processing) |
| | |
| | |
Glucuronidation | | 0.62 |
| | |
| | |
| | |
| | |
EGFR1 Signaling Pathway | | 1.80 |
| | |
| | |
IL-3 Signaling Pathway | | 1.16 |
Wnt Signaling Pathway NetPath | | 1.55 |
IL-7 Signaling Pathway | | 1.40 |
FAS pathway and Stress induction of HSP regulation | | 0.48 |
| | |
DNA damage response (only ATM dependent) | | 1.56 |
| | |
| | |
p38 MAPK Signaling Pathway (BioCarta) | | 0.66 |
Senescence and Autophagy | | 1.72 |
| | |
Insulin Signaling | | 0.38 |
| | |
| | |
Focal Adhesion | | 0.71 |
Endochondral Ossification | 1.69 | |
Nucleotide Metabolism | 1.68 | |
Myometrial Relaxation and Contraction Pathways | 1.55 | |
DNA Replication | 1.27 | |
One Carbon Metabolism | 0.88 | |
Angiogenesis | 0.68 | |
Pathway analysis is based on a comparison between primary prostate cancer and metastatic prostate cancer. Only significant pathways with a Z-score >1.9 in at least one of the two analyses are included. Significant Z-scores are depicted in bold; matches in pathways between the analyses are in italics.
A considerable overlap in pathways between the reprocessed data and the published data could be detected, where 14 complete matches in pathways were identified (
Pathway analysis of the dataset by Wallace
Pathway | Z Score (ArrayExpress) | Z Score (Standardized processing) |
Cytoplasmic Ribosomal Proteins | | 0.17 |
| | |
EGFR1 Signaling Pathway | | 1.35 |
| | |
TNF-alpha/NF-kB Signaling Pathway | | 1.05 |
| | |
Signaling of Hepatocyte Growth Factor Receptor | | 0.82 |
Androgen Receptor Signaling Pathway | | 0.57 |
| | |
IL-9 Signaling Pathway | | −0.34 |
T Cell Receptor Signaling Pathway | | 0.37 |
Non-homologous end joining | | 1.36 |
Proteasome Degradation | | 1.02 |
Serotonin Receptor 4/6/7 NR3C signaling | | −0.49 |
Notch Signaling Pathway | | −0.89 |
Fatty Acid Biosynthesis | | −0.34 |
Insulin Signaling | 1.49 | |
Pathway analysis is based on a comparison between normal prostate tissue and prostatic adenocarcinoma. Only significant pathways with a Z-score >1.9 in at least one of the two analyses are included. Significant Z-scores are depicted in bold; matches between the analyses are in italics.
PathVisio analysis of the published data indicated 16 significantly altered pathways, such as “Cytoplasmic Ribosomal Proteins”, “Electron Transport Chain”, and the “EGFR1 Signaling Pathway” (
As depicted in
Pathway | Z Score (ArrayExpress) | Z Score (Standardized processing) |
miRNAs involved in DDR | | −1.22 |
Angiogenesis | | 0.22 |
IL-2 Signaling Pathway | | −0.94 |
FAS pathway and Stress induction of HSP regulation | | −0.66 |
T Cell Receptor Signaling Pathway | | 1.27 |
p38 MAPK Signaling Pathway (BioCarta) | | −1.81 |
B Cell Receptor Signaling Pathway | | 0.20 |
Serotonin Receptor 4/6/7 NR3C signaling | | −0.59 |
IL-5 Signaling Pathway | | −0.69 |
G1 to S cell cycle control | | −2.49 |
Cell cycle | | NaN |
TCA Cycle | | NaN |
Type II interferon signaling (IFNG) | | −0.07 |
DNA damage response | | −1.08 |
GPCRs, Class B Secretin-like | 0.17 | |
Inflammatory Response Pathway | 0.13 | |
Cholesterol Biosynthesis | −0.99 | |
Pathway analysis is based on a comparison between recurrent and non-recurrent prostate cancer. Only significant pathways with a Z-score >1.9 in at least one of the two analyses are included. Significant Z-scores are depicted in bold. A NaN value commonly occurs when none of the genes in the pathway is present in the dataset.
Pathway analysis of the published data identified 14 significantly changed pathways (
PathVisio analysis of the dataset by Best
Pathway | Z Score (ArrayExpress) | Z Score (Standardized processing) |
| | |
Catalytic cycle of mammalian FMOs | | 0.44 |
| | |
| | |
IL-1 Signaling Pathway | | −0.41 |
Focal Adhesion | | 1.00 |
Nifedipine | | 0.69 |
Complement and Coagulation Cascades KEGG | | 1.20 |
Selenium metabolism/Selenoproteins | 1.80 | |
TGF-beta Receptor Signaling Pathway | 1.30 | |
ErbB signaling pathway | 1.02 | |
DNA damage response | 0.82 | |
Translation Factors | 0.77 | |
Blood Clotting Cascade | 0.32 | |
Oxidative Stress | 0.11 | |
Pathway analysis is based on a comparison between androgen-dependent and androgen-independent prostate cancer. Only significant pathways with a Z-score >1.9 in at least one of the two analyses are included. Significant Z-scores are depicted in bold; matches between the analyses are in italics.
Pathway analysis of the reprocessed data after QC detected 10 significantly changed biological pathways (
An extensive literature search was performed to substantiate the pathway analysis results. Pathway analysis identified several signaling cascades and cellular processes that were overrepresented across the different datasets. These pathways and processes seemed to be characteristic of prostate cancer initiation and progression and could be assigned to three main biological processes: cholesterol biosynthesis, epithelial-to-mesenchymal transition (EMT) involving epidermal growth factor receptor (EGFR) signaling, and increased metabolic activity. Therefore, the final biological interpretation focused on these cellular processes and their potential contribution to prostate cancer development.
Several experimental and epidemiological studies provide strong evidence that cholesterol biosynthesis plays a pivotal role in prostate cancer development and progression.
Several studies have shown that cholesterol can accumulate in solid tumors and that cholesterol homeostasis becomes disturbed in the prostate with advancing age and with the transition from a benign to a malignant state. Cholesterol accumulation in prostatic tumors likely occurs by several mechanisms, such as increased cholesterol uptake from the circulation, loss of feedback regulation due to downregulation of low density lipoprotein receptors, and upregulation of specific components of the mevalonate (cholesterol synthesis) pathway, such as 3-hydroxy-3-methylglutaryl-coenzyme A (HMG-CoA) reductase.
As cholesterol uptake and synthesis are linked with the cell cycle, the association between cholesterol, other lipogenic mechanisms and androgen action suggests that lipid products of these pathways play a role in androgenic stimulation of prostate cancer growth.
Recent studies investigating genes under transcriptional control of the androgen receptor revealed more than 300 androgen-responsive transcripts. The majority of these transcripts encode proteins that are involved in lipid metabolism. The androgen receptor is responsible for the recruitment of a group of transcription factors that drive the expression of the enzymes involved in lipid metabolism. These sterol regulatory element-binding proteins (SREBPs) consist of three related transcription factors, SREBP-1a, SREBP-1c, and SREBP-2, that have been indicated as critical regulators of androgen-regulated lipogenesis. SREBP-1c has been identified as being primarily responsible for the transcription of fatty acid biosynthesis genes, such as FAS, while SREBP-2 regulates genes of the cholesterol synthesis pathway, such as HMG-CoA reductase or farnesyl diphosphate synthase.
It has been indicated that the strictly coordinated expression and feedback regulation by this transcription factor family is frequently lost in prostate cancer. The elevated expression of a wide variety of genes involved in lipid metabolism supports an essential role of cholesterol synthesis in prostate cancer, but the mechanisms underlying the uncontrolled activation of SREBPs remain largely unknown.
Numerous signaling proteins have been identified to associate with plasma membrane lipid rafts, including the EGFR, the AR, heterotrimeric G-protein subunits, the T-cell receptor, as well as the interleukin-6 (IL-6) receptor. Signaling through the PI3K/Akt phosphorylation cascade has been demonstrated to be a frequent event in prostate tumors that harbor the inactivated lipid phosphatase tumor suppressor gene
Several of the indicated signaling proteins were also found in the pathway analysis results. The EGFR, AR and T-cell receptor, as well as several classes of G-protein coupled receptors appeared to be significantly altered during prostate cancer initiation (
To conclude, pathway analysis confirmed the results of several recent studies that identified cholesterol as playing a promotional role in prostate cancer. PathVisio analysis indicated a dysregulated cholesterol biosynthesis pathway as an essential mechanism in prostate cancer initiation and progression to a more aggressive, metastasizing cancer type. Furthermore, cholesterol appeared to be an important element controlling signaling events in prostate cancer cells. It is suggested that the dysregulation of enzymes involved in cholesterol biosynthesis and metabolism may result in increased cholesterol levels in tumor cells. The destabilized cholesterol equilibrium may influence the transition from a coordinated process of cell proliferation and death to a severely altered condition, resulting in uncontrolled growth and progression to androgen-independent prostate cancer.
Pathway analysis revealed several signaling cascades, such as the EGFR1/ErbB-, TGF-β-, Wnt-, Delta-Notch- and TNF-α/NF-κB signaling pathways, that can be assigned to a main process known as epithelial-to-mesenchymal transition. EMT is a key event during embryonic development that is required for morphogenetic movements during the reorganization of the embryonic germ layers. The process of EMT has been well documented in cell lines and mouse experiments, but its clinical relevance remains controversial.
Cells that undergo EMT are characterized by transient structural changes resulting in loss of polarity and contact with neighboring cells.
Several oncogenic pathways, such as the Wnt-, TGF-β-, Hedgehog-, TNF-α/NF-κB-, EGFR-, and Notch-signaling pathways, are thought to initiate EMT.
Overexpression of the EGFR family has been associated with disease progression of numerous malignancies including prostate cancer. In prostatic tumors, EGFR has been indicated to initiate EMT in cooperation with TGF-β and to enhance the invasion of prostate cancer cells. In the presence of androgens, endogenous and ectopically expressed AR directly associates with EGFR and decreases the activation of downstream PI3K signaling leading to cancer cell growth and survival. EGFR may also sensitize prostate cancer cells to low levels of androgens by enhancing co-activator binding and transcriptional activation of endogenous and ectopically expressed AR. Therefore, the observed cross-talk between the AR and EGFR axes leads to the assumption that EGFR-induced EMT and androgen-independence could occur simultaneously in prostatic tumor cells.
Another important pathway playing an essential role in the development and progression of prostate cancer is the HIV-I NEF pathway. This pathway comprises the tumor necrosis factor (TNF) and FAS receptor signaling pathways and seems to be particularly dysregulated in androgen-independent metastatic prostate cancer compared with localized primary prostatic tumors.
According to the literature, the TNF branch of the HIV-I NEF pathway in particular seems to be of high importance and consists of the activation of nuclear factor-κB (NF-κB) by TNF-α.
In conclusion, pathway analysis indicated several significant signaling cascades that are in concordance with findings in the literature, such as the TGF-β-, TNF-α/NF-κB-, and EGFR-signaling cascades. The interplay of several such extracellular signaling molecules, growth factors, and transcription factors has been suggested to induce EMT and has the potential to serve as an EMT marker.
Pathway analysis results led to the assumption that an altered metabolic activity might be involved in prostate carcinogenesis, as the analysis detected pathways like “mRNA processing”, a posttranscriptional modification involving enzymatic activity, the “Electron Transport Chain” (ETC), and “Oxidative phosphorylation” that are actively involved in metabolism. Those pathways were found to play an essential role during prostate cancer initiation and transition from a benign to a malignant state (
Several recent studies provide evidence that increased metabolic activity promotes prostate tumor growth. It has been shown that disrupted respiratory chain activity resulting from mutations in mitochondrial DNA (mtDNA) in prostate cancer cells leads to overproduction of reactive oxygen species (ROS), contributing to tumor growth.
Normal prostate epithelial cells are unique in that they accumulate high concentrations of zinc, which inhibits enzymes involved in citrate metabolism through the Krebs cycle. Malignant transformation of the prostate is associated with an early metabolic switch, causing decreased zinc accumulation and increased citrate oxidation by activating the enzyme m-aconitase.
The ROS induced mitochondrial dysfunction is subsequently able to activate nuclear genes and signaling pathways involved in tumor initiation and progression. For example, ROS are able to induce pathways, like the TNF-α/NF-κB- and PI3K signaling pathway that are involved in increased hypoxia-inducible factor α (HIFα) expression and that activate genes playing an essential role in angiogenesis and tumor metastasis, thereby contributing to tumor growth. Also pathway analysis results indicated the TNF-α/NF-κB signaling pathway as being significantly dysregulated in prostate carcinoma formation (
Pathway analysis confirmed the results of several recent studies and identified increased metabolic activity as a key process of prostate cancer initiation and progression. A metabolic switch of prostatic cells has been indicated as a key event during the transformation of benign epithelial cells into malignant cells. Furthermore, mitochondrial dysregulation as a consequence of elevated ROS production has been shown to play an essential role in prostate tumor growth and metastasis.
When comparing the pathway analysis results for the data processed by the standardized procedure with those for the data from ArrayExpress, the level of correspondence differs between datasets. For some datasets, large differences are observed, which could be caused by differences in any of the analytical steps, including (i) the removal of some of the arrays in the QC phase, (ii) the preprocessing and normalization methods applied, or (iii) the annotation of the reporters. We observed that none of the individual analytical steps in the standardized procedure consistently exerts the strongest effect on pathway analysis results (results not shown); which step matters most depends on the original quality of the dataset and the methods originally used. This paper shows, however, that a systematic analysis of existing datasets using a standardized approach is feasible and leads to meaningful and verifiable results, thereby stimulating reuse of already available datasets and reducing cost. Furthermore, it demonstrates that arrays of low quality are still present in several publicly available datasets. Using a pathway approach may, however, make study outcomes more robust to individual variations between datasets.
The application of pathway analysis using PathVisio to multiple datasets led to the identification of several signaling pathways and cellular processes that play an important role in prostate cancer development. These were subsequently assigned to three main biological processes: cholesterol biosynthesis, epithelial-to-mesenchymal transition and increased metabolic activity. These results were confirmed by findings in the literature, which indicate that these cellular processes are key contributors to prostate carcinogenesis and metastasis. An altered cholesterol metabolism has been shown to initiate prostate cancer and to promote the transition from a benign to a malignant state. Preclinical studies indicate that EMT is a hallmark of prostate cancer progression and metastasis, while increased metabolic activity has been demonstrated to contribute to prostate tumor growth and invasiveness as a consequence of ROS-induced mitochondrial dysregulation. These processes may deliver candidates for new biomarkers and novel targets for therapeutic regimens. Identifying the most commonly altered pathways in both primary and metastatic cancer could lead to more detailed and realistic, disease-specific maps. Superimposing expression data may help to discriminate treated from non-treated patients, or even improve our understanding of a drug’s mechanism of action or resistance.
In conclusion, we have demonstrated that the application of a standardized bioinformatics workflow, including QC, statistical analysis and pathway analysis, to publicly available datasets serves as a powerful, cost- and time-effective approach to reveal the most relevant biological mechanisms underpinning prostate cancer development and progression. Being a generic approach, it can be similarly applied to datasets related to any other disease or condition of interest.
Detailed description of microarray data analysis.
(DOCX)
Supplemental data.
(DOCX)
We would like to thank Andrea Romano, Department of Obstetrics and Gynaecology, Maastricht University Medical Centre, for his kind support and helpful discussion on the biological interpretation. Furthermore, we would like to express our gratitude to Maud Starmans, Maastro Clinic, Maastricht University Medical Centre, for careful revision of the manuscript.