A meta-analysis of all single-gene probes in the TCGA and HAS ovarian cohorts was performed to identify possible biomarkers, using Cox regression with expression as a continuous variable for overall survival. Genes were ranked by p-value using Stouffer’s method and selected for statistical significance with a false discovery rate (FDR) <.05 using the Benjamini-Hochberg method.
Twelve genes with high mRNA expression were prognostic of poor outcome with an FDR <.05 (AXL, APC, RAB11FIP5, C19orf2, CYBRD1, PINK1, LRRN3, AQP1, DES, XRCC4, BCHE, and ASAP3). Twenty genes with low mRNA expression were prognostic of poor outcome with an FDR <.05 (LRIG1, SLC33A1, NUCB2, POLD3, ESR2, GOLPH3, XBP1, PAXIP1, CYB561, POLA2, CDH1, GMNN, SLC37A4, FAM174B, AGR2, SDR39U1, MAGT1, GJB1, SDF2L1, and C9orf82).
A meta-analysis of all single genes identified thirty-two candidate biomarkers with a possible role in ovarian serous carcinoma. These genes can provide insight into the drivers or regulators of ovarian cancer and should be evaluated in future studies. Genes for which high expression indicates poor outcome are possible therapeutic targets where antagonists or inhibitors are already known. Additionally, the genes could be combined into a prognostic multi-gene signature and tested in future ovarian cohorts.
Citation: Willis S, Villalobos VM, Gevaert O, Abramovitz M, Williams C, Sikic BI, et al. (2016) Single Gene Prognostic Biomarkers in Ovarian Cancer: A Meta-Analysis. PLoS ONE 11(2): e0149183. https://doi.org/10.1371/journal.pone.0149183
Editor: William B. Coleman, University of North Carolina School of Medicine, UNITED STATES
Received: December 15, 2015; Accepted: January 4, 2016; Published: February 17, 2016
Copyright: © 2016 Willis et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Data for the Ovarian TCGA cohort is publicly available from TCGA Data Portal (https://tcga-data.nci.nih.gov/tcga/). Data for the Hungarian Academy of Science Cohort is available for download at http://www.kmplot.com.
Funding: This work was supported in part by NIH grants R01 CA114037 and NIH R01 CA 184968 (B. I. Sikic). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Ovarian cancer is the fifth leading cause of cancer-related deaths, with an estimated 22,000 new cases and 15,000 deaths a year in the United States. From 1950 to 2008, the ovarian cancer death rate of 10 per 100,000 women remained unchanged, indicating the need to identify novel therapies for this disease. Standard of care for advanced-stage ovarian cancer is extensive debulking surgery followed by chemotherapy [2–4]. A significant factor in the elevated mortality rate is the lack of disease-specific symptoms, which results in late-stage diagnoses, whereas the cure rate for early-stage diagnoses is 90% [5,6]. Identification of serum-based biomarkers and imaging to detect early-stage ovarian cancer for routine screening is one potential strategy to improve overall survival (OS).
Various groups have identified large multi-gene signatures that were prognostic of outcome in molecularly profiled ovarian tumor samples [8–21]. We sought to identify single-gene prognostic biomarkers by meta-analysis of publicly available mRNA expression data from ovarian cohorts, focusing on genes with known drug-gene interactions that could potentially be used to indicate alternative treatment strategies.
Materials and Methods
Data extraction was conducted in agreement with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidance (S1 File). The protocol used to perform this meta-analysis was not registered in advance because we used data as published and Cox regression analysis with expression as a continuous variable, without any pre-determined cutoffs. We used Cox regression analysis to determine the Wald test p-value for each Affymetrix probe as a continuous variable, with mRNA expression represented as a z-score. The Cox proportional hazards model was used to calculate the hazard ratios (HR) for OS and their 95% confidence intervals (CI) for each probe. The p-values for each probe from the two independent ovarian cohorts were combined using Stouffer’s method. The resulting combined p-value for each probe was used to rank the prognostic probes. Probes with a false discovery rate (FDR) <.05 using the Benjamini-Hochberg method were selected as being statistically significant. For Cox regression survival analysis and Kaplan–Meier figures, the Biojava3-survival module from BioJava was used. The Biojava3-survival module is a direct port of the Cox regression C code in the R survival package [24,25].
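The per-probe statistics described above can be illustrated with a minimal Python sketch. It is not the Biojava3-survival code used in the study. It assumes the Cox coefficient and its standard error have already been estimated for a probe, and the function names are hypothetical. The text does not say whether Stouffer's method was weighted or how the direction of effect was handled, so the unweighted form below, which treats each p-value as a one-sided tail probability, is an assumption.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def wald_summary(beta, se):
    """Return (HR, CI low, CI high, two-sided Wald p-value) for one probe.

    beta is the Cox coefficient per one z-score unit of expression and se is
    its standard error; both are assumed to come from an already fitted model.
    """
    z = beta / se
    p_value = 2.0 * (1.0 - _STD_NORMAL.cdf(abs(z)))
    half_width = _STD_NORMAL.inv_cdf(0.975) * se  # 95% interval on the log scale
    return (
        math.exp(beta),
        math.exp(beta - half_width),
        math.exp(beta + half_width),
        p_value,
    )


def stouffer(p_values):
    """Combine independent p-values with the unweighted Stouffer Z method.

    Each p-value must lie strictly between 0 and 1.
    """
    # -inv_cdf(p) equals inv_cdf(1 - p) but stays accurate for very small p.
    z_scores = [-_STD_NORMAL.inv_cdf(p) for p in p_values]
    z_combined = sum(z_scores) / math.sqrt(len(z_scores))
    return _STD_NORMAL.cdf(-z_combined)
```

For example, `stouffer([p_tcga, p_has])` would give the combined p-value for one probe from its two cohort-level p-values, where `p_tcga` and `p_has` are placeholder names.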
The TCGA Ovarian HG-U133A cohort was downloaded on May 21, 2015 from the Broad Institute FireBrowse Data Portal (www.firebrowse.org). This TCGA cohort was used as the discovery cohort, consisting of 470 samples with 249 events for OS. The OS events were determined from the metadata field “vital_status”, and the event or censoring time was the maximum of “days_to_last_followup” and “days_to_death” provided in OV.clin.merged.picked.txt. Additional metadata was merged from OV.clin.merged.txt. The TCGA ovarian cohort consists of 77% stage III and 15% stage IV serous carcinoma patients.
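The derivation of the OS endpoint from these clinical fields might look like the following Python sketch. The field names are those quoted above. The way “vital_status” encodes a death (the `DEATH_CODES` set) and the handling of missing values are assumptions, and the function name is hypothetical.

```python
import math

# Assumed encodings of a death event in "vital_status"; not stated in the text.
DEATH_CODES = {"1", "dead", "deceased"}


def overall_survival(record):
    """Return (time_in_days, event) for one patient, or None if no time exists.

    record is a dict of strings keyed by clinical field name. Missing or
    non-numeric values such as "NA" are ignored.
    """
    times = []
    for field in ("days_to_last_followup", "days_to_death"):
        try:
            value = float(record.get(field))
        except (TypeError, ValueError):
            continue  # field absent or not a number
        if math.isfinite(value):
            times.append(value)
    if not times:
        return None
    status = str(record.get("vital_status") or "").strip().lower()
    # Event or censoring time is the maximum of the two available times.
    return max(times), status in DEATH_CODES
```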
Next, a collection of ovarian data sets consisting of 1,287 samples was downloaded on December 6, 2013 from the kmplot.com website and was used as the second cohort in the meta-analysis. The ovarian cohort used for outcome analysis at the kmplot.com website is a collection of published cohorts profiled on the Affymetrix platform for which the raw CEL files were available, allowing MAS5 normalization as a combined cohort and unique sample identification. The HAS ovarian cohort (HAS = Hungarian Academy of Sciences) includes the TCGA ovarian cohort, and those samples were removed to establish an independent cohort. Additionally, the HAS ovarian cohort contains a high number of stage I and stage II samples, which were removed to match the predominance of stage III and stage IV samples in the TCGA ovarian cohort. The resulting independent HAS ovarian validation cohort consisted of 313 samples with 167 events for OS (91% stage III and 9% stage IV). The metadata for the HAS ovarian validation cohort indicates 188 serous carcinoma, 6 endometrial and 121 undefined samples. The HAS ovarian cohort includes samples from seven independent cohorts: GSE14764, GSE15622, GSE19829, GSE3149, GSE9891, GSE18520 and GSE26712. The HAS ovarian metadata is limited and does not indicate patient age or other standard cohort metrics.
The TCGA Ovarian Cohort and HAS Cohort are well-known, publicly available cohorts that can be downloaded by researchers for meta-analysis. The co-authors have no affiliation with the ovarian cohorts, and no changes were made to the mRNA expression values used in the meta-analysis.
Gene-annotation enrichment analysis was performed with DAVID tools at default settings.
The results of the meta-analysis for statistically significant genes with an FDR <.05 can be found in Table 1 where high expression indicates poor outcome, and in Table 2 where low expression indicates poor outcome. In total, each of the 17,169 Affymetrix probes was used to determine a prognostic p-value using Cox regression analysis. The p-values for each probe in the two independent cohorts were combined using Stouffer’s method and the probes were ranked. All 17,169 probes were used to determine the FDR, and probes with an FDR <.05 were considered statistically significant. In total, 32 probes had an FDR <.05, of which 12 had high expression indicating poor outcome and 20 had low expression indicating poor outcome. Genes with high expression indicating poor outcome are possible therapeutic targets with known antagonists or inhibitors.
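The FDR selection step can be sketched in Python as follows. This is a minimal illustration of the standard Benjamini-Hochberg step-up adjustment, not the code used in the study, and the function names are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_indices(p_values, fdr=0.05):
    """Return the indices of p-values whose adjusted value is below fdr."""
    adjusted = benjamini_hochberg(p_values)
    return [i for i, q in enumerate(adjusted) if q < fdr]
```

Applied to the combined Stouffer p-values of all probes, `significant_indices` would return the positions of the probes passing the FDR <.05 threshold.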
In Tables 1 and 2, (25–75)% is the difference between the 25th and 75th percentile expression values on a log scale. The Stouffer p-value, which combines the p-values from each cohort, was used as the ranking metric.
The complete list of probes and resulting p-values is provided in the supplemental material. For the probes with an FDR <.05, all HR directions were in agreement in the two cohorts, providing further support that the single probes were valid biomarkers with minimal false positives. The expectation is that a valid biomarker would have a consistent prognostic HR, for example, high expression would denote poor outcome in both cohorts. Using a Stouffer p-value cutoff of <.001 without an FDR correction yielded an additional 105 probes, of which 8 (7.6%) did not have HR agreement in the two cohorts and would be considered false positives. Using a Stouffer p-value <.01 identified an additional 432 probes, of which 70 (16%) did not have HR agreement. Using an FDR cutoff of <.05 established a list of 32 probes that were informative of outcome.
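The HR agreement check described above can be sketched in Python as follows. This is an assumed reading of "agreement", meaning both hazard ratios fall on the same side of 1; the function name is hypothetical.

```python
import math


def discordant_fraction(hr_pairs):
    """Fraction of probes whose two cohort HRs fall on opposite sides of 1.

    hr_pairs is a list of (hr_cohort_a, hr_cohort_b) tuples with positive HRs.
    """
    discordant = sum(
        1
        for hr_a, hr_b in hr_pairs
        # log(HR) is positive above 1 and negative below 1, so a negative
        # product means the two cohorts disagree on direction.
        if math.log(hr_a) * math.log(hr_b) < 0
    )
    return discordant / len(hr_pairs)
```

With 8 discordant probes out of 105, this fraction is 8/105, or about 7.6%; with 70 out of 432 it is about 16%.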
Gene enrichment analysis showed that the 20 genes where low expression indicates poor prognosis were associated with the endoplasmic reticulum, with a Benjamini-corrected p-value <.05. For the 12 genes where high expression indicates poor prognosis, no statistically significant association was found.
Meta-analysis of existing data in publicly available ovarian cancer cohorts may yield genes that should be investigated more closely and that may eventually lead to new drug treatments, which have been slow in coming, for ovarian cancer patients. Chemotherapy is currently used as the standard of care in conjunction with debulking surgery in patients with advanced ovarian cancer [2–4]. The addition of targeted therapy in combination with chemotherapy may improve OS; however, identification of these types of drugs remains elusive. Genes that are overexpressed in ovarian tumors are not only potential biomarkers of prognosis but may also be therapeutic targets if those genes correlate with a poor outcome. Conversely, overexpressed genes that are associated with a good outcome can be unintentionally targeted by standard cancer treatments or by off-target effects of drugs the patients may be taking for other health issues. We conducted a meta-analysis of mRNA expression data from two ovarian cohorts and used Cox regression, Stouffer’s method and Benjamini-Hochberg FDR correction to identify 12 overexpressed (Table 1) and 20 under-expressed (Table 2) genes that correlated with a poor outcome.
In this study, overexpression of 12 genes and underexpression of 20 genes were associated with a poor outcome. Thus, our meta-analysis has implicated genes that may be prognostic as well as potential therapeutic targets to pursue in the treatment of ovarian cancer. The ability to generate single-gene lists from published ovarian cohorts could also lead to a more thorough understanding of which genes contribute to the ovarian cancer tumorigenic process. Bioinformatics, in conjunction with analysis of clinical and literature databases, will therefore be required to cull these gene lists in order to focus on the most potentially relevant ones.
Conceived and designed the experiments: SW VV OG BS BLJ. Performed the experiments: SW. Analyzed the data: SW. Contributed reagents/materials/analysis tools: SW OG. Wrote the paper: SW VV OG MA CW BS DLJ.
- 1. Siegel R, Naishadham D, Jemal A. Cancer statistics, 2012. CA Cancer J Clin. 2012 Jan;62(1):10–29. pmid:22237781
- 2. Barakat RR, Markman M, Randall M. Principles and practice of gynecologic oncology. Lippincott Williams & Wilkins; 2009.
- 3. Chang SJ, Bristow RE, Ryu HS. Impact of complete cytoreduction leaving no gross residual disease associated with radical cytoreductive surgical procedures on survival in advanced ovarian cancer. Ann Surg Oncol. 2012;
- 4. Ibeanu OA, Bristow RE. Predicting the outcome of cytoreductive surgery for advanced ovarian cancer: a review. International Journal of Gynecological …. 2010;
- 5. Baker TR, Piver MS. Etiology, biology, and epidemiology of ovarian cancer. Semin Surg Oncol. 10(4):242–8. pmid:8091065
- 6. Holschneider CH, Berek JS. Ovarian cancer: epidemiology, biology, and prognostic factors. Semin Surg Oncol. 19(1):3–10. pmid:10883018
- 7. Nolen BM, Lokshin AE. Protein biomarkers of ovarian cancer: the forest and the trees. Future Oncol. 2012 Jan;8(1):55–71. pmid:22149035
- 8. Riester M, Wei W, Waldron L, Culhane AC, Trippa L, Oliva E, et al. Risk prediction for late-stage ovarian cancer by meta-analysis of 1525 patient samples. J Natl Cancer Inst. 2014 May 1;106(5):dju048. pmid:24700803
- 9. Verhaak R, Tamayo P, Yang JY, Hubbard D, Zhang H, Creighton CJ, et al. Prognostically relevant gene signatures of high-grade serous ovarian carcinoma. J Clin Invest. 2013;123(1):517–25. pmid:23257362
- 10. Waldron L, Haibe-Kains B, Culhane A, Riester M, Ding J, Wang X, et al. Comparative meta-analysis of prognostic gene signatures for late-stage ovarian cancer. J Natl Cancer Inst. 2014;10.
- 11. Yoshihara K, Tsunoda T, Shigemizu D, Fujiwara H, Hatae M, Fujiwara H, et al. High-risk ovarian cancer based on 126-gene expression signature is uniquely characterized by downregulation of antigen presentation pathway. Clin Cancer Res. 2012 Mar 1;18(5):1374–85. pmid:22241791
- 12. Yoshihara K, Tajima A, Yahata T. Gene expression profile for predicting survival in advanced-stage serous ovarian cancer across two independent datasets. PLoS One. 2010;
- 13. Sabatier R, Finetti P, Bonensea J, Jacquemier J, Adelaide J, Lambaudie E, et al. A seven-gene prognostic model for platinum-treated ovarian carcinomas. Br J Cancer. 2011 Jul 12;105(2):304–11.
- 14. Mok SC, Bonome T, Vathipadiekal V, Bell A, Johnson ME, Wong K-K, et al. A gene signature predictive for outcome in advanced ovarian cancer identifies a survival factor: microfibril-associated glycoprotein 2. Cancer Cell. 2009 Dec 8;16(6):521–32. pmid:19962670
- 15. Hernandez L, Hsu SC, Davidson B, Birrer MJ, Kohn EC, Annunziata CM. Activation of NF-kappaB signaling by inhibitor of NF-kappaB kinase beta increases aggressiveness of ovarian cancer. Cancer Res. 2010 May 15;70(10):4005–14. pmid:20424119
- 16. Denkert C, Budczies J, Darb-Esfahani S, Györffy B, Sehouli J, Könsgen D, et al. A prognostic gene expression index in ovarian cancer—validation across different independent data sets. J Pathol. 2009 Jun;218(2):273–80. pmid:19294737
- 17. Crijns A, Fehrmann R, Jong S de. Survival-related profile, pathways, and transcription factors in ovarian cancer. PLoS Med. 2009;
- 18. Integrated genomic analyses of ovarian carcinoma. Nature. 2011 Jun 30;474(7353):609–15.
- 19. Bonome T, Levine DA, Shih J, Randonovich M, Pise-Masison CA, Bogomolniy F, et al. A gene signature predicting for survival in suboptimally debulked patients with ovarian cancer. Cancer Res. 2008 Jul 1;68(13):5478–86. pmid:18593951
- 20. Bonome T, Lee J-Y, Park D-C, Radonovich M, Pise-Masison C, Brady J, et al. Expression profiling of serous low malignant potential, low-grade, and high-grade tumors of the ovary. Cancer Res. 2005 Nov 15;65(22):10602–12. pmid:16288054
- 21. Bentink S, Haibe-Kains B, Risch T, Fan JB. Angiogenic mRNA and microRNA gene expression signature predicts a novel subtype of serous ovarian cancer. PLoS One. 2012;
- 22. Moher D. Corrigendum to: Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. International Journal of Surgery 2010;8:336–341. Int J Surg. 2010;8(8):658.
- 23. Prlić A, Yates A, Bliven SE, Rose PW, Jacobsen J, Troshin PV, et al. BioJava: an open-source framework for bioinformatics in 2012. Bioinformatics. 2012 Oct 15;28(20):2693–5. pmid:22877863
- 24. Therneau T. A package for survival analysis in S. R package version 2.37–4. Available: http://CRAN.R-project.org/package=survival …. 2013;
- 25. Therneau TM, Grambsch PM. Modeling Survival Data: Extending the Cox Model. Springer Science & Business Media; 2000.
- 26. Gyorffy B, Lánczky A, Szállási Z. Implementing an online tool for genome-wide validation of survival-associated biomarkers in ovarian-cancer using microarray data from 1287 patients. Endocr Relat Cancer. 2012;19(2):197–208. pmid:22277193
- 27. Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009 Jan;4(1):44–57. pmid:19131956