Single Gene Prognostic Biomarkers in Ovarian Cancer: A Meta-Analysis

Purpose To discover novel prognostic biomarkers in ovarian serous carcinomas. Methods A meta-analysis of all single-gene probes in the TCGA and HAS ovarian cohorts was performed to identify possible biomarkers, using Cox regression with mRNA expression as a continuous variable and overall survival as the endpoint. Genes were ranked by p-value combined across cohorts using Stouffer’s method and selected for statistical significance at a false discovery rate (FDR) <.05 using the Benjamini-Hochberg method. Results Twelve genes with high mRNA expression were prognostic of poor outcome with an FDR <.05 (AXL, APC, RAB11FIP5, C19orf2, CYBRD1, PINK1, LRRN3, AQP1, DES, XRCC4, BCHE, and ASAP3). Twenty genes with low mRNA expression were prognostic of poor outcome with an FDR <.05 (LRIG1, SLC33A1, NUCB2, POLD3, ESR2, GOLPH3, XBP1, PAXIP1, CYB561, POLA2, CDH1, GMNN, SLC37A4, FAM174B, AGR2, SDR39U1, MAGT1, GJB1, SDF2L1, and C9orf82). Conclusion A meta-analysis of all single genes identified thirty-two candidate prognostic biomarkers in ovarian serous carcinoma. These genes may provide insight into the drivers or regulators of ovarian cancer and should be evaluated in future studies. Genes with high expression indicating poor outcome are possible therapeutic targets with known antagonists or inhibitors. Additionally, the genes could be combined into a prognostic multi-gene signature and tested in future ovarian cohorts.


Introduction
Ovarian cancer is the fifth leading cause of cancer-related deaths, with an estimated 22,000 new cases and 15,000 deaths a year in the United States [1]. From 1950 to 2008, the ovarian cancer death rate of 10 per 100,000 women remained unchanged, indicating the need to identify novel therapies for this disease. Standard of care for advanced-stage ovarian cancer is extensive debulking surgery followed by chemotherapy [2][3][4]. A significant factor in the elevated mortality rate is the lack of disease-specific symptoms, which results in late-stage diagnoses, whereas the cure rate for early-stage diagnoses is 90% [5,6]. Identification of serum-based biomarkers and imaging methods to detect early-stage ovarian cancer in routine screening is one potential strategy to improve overall survival (OS) [7].

Meta-Analysis
Data extraction was conducted in agreement with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidance (S1 File) [22]. The protocol for this meta-analysis was not registered in advance, given that we used the data as published and a Cox regression analysis with expression as a continuous variable, without any pre-determined cutoffs. We used Cox regression analysis to determine the Wald test p-value for each Affymetrix probe as a continuous variable, with mRNA expression represented as a z-score. The Cox proportional hazards model was used to calculate the hazard ratio (HR) for OS and its 95% confidence interval (CI) for each probe. The p-values for each probe from the two independent ovarian cohorts were combined using Stouffer's method. The resulting combined p-value for each probe was used to rank the prognostic probes. Probes with a false discovery rate (FDR) <.05 using the Benjamini-Hochberg method were selected as being statistically significant. For Cox regression survival analysis and Kaplan-Meier figures, the Biojava3-survival module from BioJava [23] was used. The Biojava3-survival module is a direct port of the Cox regression C code in the R survival package [24,25].
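As an illustrative sketch only (not the BioJava code used in this study), the p-value combination and FDR steps could be written in Python as follows. The function names and the example p-values are hypothetical. The sketch assumes the unweighted form of Stouffer's method, with each p-value converted to a z-score as if it were one-sided, so it does not take the direction of the HR into account; the weighting and sidedness used in the actual analysis are not stated here.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def stouffer_combine(p_values):
    """Combine independent p-values, each strictly between 0 and 1.

    Assumption: unweighted Stouffer's method, p-values treated as one-sided.
    """
    # By symmetry, -inv_cdf(p) equals inv_cdf(1 - p) but is more stable for small p.
    z_scores = [-STD_NORMAL.inv_cdf(p) for p in p_values]
    combined_z = sum(z_scores) / sqrt(len(z_scores))
    return STD_NORMAL.cdf(-combined_z)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-probe p-values from two cohorts (made-up numbers, not study data).
cohort_a = {"probe_1": 0.001, "probe_2": 0.20, "probe_3": 0.03}
cohort_b = {"probe_1": 0.004, "probe_2": 0.60, "probe_3": 0.01}

probes = sorted(cohort_a)
combined = [stouffer_combine([cohort_a[p], cohort_b[p]]) for p in probes]
fdr = benjamini_hochberg(combined)
significant = [p for p, q in zip(probes, fdr) if q < 0.05]
```

Because this form of the combination ignores the sign of the effect, a probe can reach a small combined p-value even when the two cohorts disagree on the direction of the HR, so direction has to be checked separately.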

Meta-Analysis Cohorts
The TCGA Ovarian HG-U133A cohort was downloaded on May 21, 2015 from the Broad Institute FireBrowse Data Portal (www.firebrowse.org). This TCGA cohort was used as the discovery cohort consisting of 470 samples with 249 events for OS. The OS events were determined from the metadata "vital_status" and the event/censor time was the maximum time from "days_to_last_followup" and "days_to_death" provided in OV.clin.merged.picked.txt. Additional metadata was merged from OV.clin.merged.txt. The TCGA ovarian cohort consists of 77% stage III and 15% stage IV serous carcinoma patients.
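The derivation of the OS event and time can be sketched in Python as below. This is a minimal illustration and not the code used in the study. The three field names come from the clinical file described above, but the record layout (one dictionary per patient), the encoding of "vital_status" (assumed here to be "1" or "dead" for a death), and the use of "NA" for missing values are assumptions; the function name is hypothetical.

```python
def overall_survival(record):
    """Return (time_in_days, event) for one patient, or None if no time is available."""

    def to_days(value):
        # Missing values such as "NA" or None cannot be parsed and are skipped.
        try:
            return float(value)
        except (TypeError, ValueError):
            return None

    candidates = [
        to_days(record.get("days_to_last_followup")),
        to_days(record.get("days_to_death")),
    ]
    observed = [days for days in candidates if days is not None]
    if not observed:
        return None

    # Assumed encoding of a death event; adjust to the actual file contents.
    status = str(record.get("vital_status", "")).strip().lower()
    event = status in {"1", "dead", "deceased"}

    # The event/censor time is the maximum of the two available times.
    return max(observed), event


# Made-up example records, not taken from the TCGA cohort.
deceased = {"vital_status": "1", "days_to_last_followup": "NA", "days_to_death": "1200"}
censored = {"vital_status": "0", "days_to_last_followup": "800", "days_to_death": "NA"}
results = [overall_survival(deceased), overall_survival(censored)]
```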
Next, a collection of ovarian data sets consisting of 1,287 samples was downloaded on December 6, 2013 from the kmplot.com website [26] and was used as the second cohort in the meta-analysis. The ovarian cohort used for outcome analysis at the kmplot.com website is a collection of published cohorts profiled on the Affymetrix platform for which the raw CEL files were available, allowing MAS5 normalization as a combined cohort and unique sample identification. The HAS ovarian cohort (HAS = Hungarian Academy of Sciences) includes the TCGA ovarian cohort, and those samples were removed to establish an independent cohort. Additionally, the HAS ovarian cohort contains a high number of stage I and stage II samples, which were removed to match the high proportion of stage III and stage IV samples in the TCGA ovarian cohort. The resulting independent HAS ovarian validation cohort consisted of 313 samples with 167 events for OS (91% stage III and 9% stage IV). The metadata for the HAS ovarian validation cohort indicates 188 serous carcinoma, 6 endometrial and 121 undefined samples. The HAS ovarian cohort includes samples from seven independent cohorts: GSE14764, GSE15622, GSE19829, GSE3149, GSE9891, GSE18520 and GSE26712. The HAS ovarian metadata is limited and does not indicate patient age or other standard cohort metrics.
The TCGA ovarian cohort and HAS ovarian cohort are well-known, publicly available cohorts that researchers can download for meta-analysis. The co-authors have no affiliation with the ovarian cohorts, and no changes were made to the mRNA expression values used in the meta-analysis.

Enrichment Analysis
Gene-annotation enrichment analysis was performed using the DAVID tools with default settings [27].

Results
The statistically significant genes from the meta-analysis (FDR <.05) are listed in Table 1 where high expression indicates poor outcome and in Table 2 where low expression indicates poor outcome. In total, each of the 17,169 Affymetrix probes was tested with Cox regression analysis to obtain a prognostic p-value. The p-values for each probe in the two independent cohorts were combined using Stouffer's method and the probes were ranked. All 17,169 probes were used to determine the FDR, and probes with an FDR <.05 were considered statistically significant. In total, 32 probes had an FDR <.05, of which 12 had high expression indicating poor outcome and 20 had low expression indicating poor outcome. Genes with high expression indicating poor outcome are possible therapeutic targets with known antagonists or inhibitors. The complete list of probes and resulting p-values is provided in the supplemental material. For the probes with an FDR <.05, all HR directions were in agreement in the two cohorts, providing further support that the single probes were valid biomarkers with minimal false positives. The expectation is that a valid biomarker would have a consistent prognostic HR, in that high expression would denote poor outcome in both cohorts. When a Stouffer's p-value cutoff of <.001 without FDR correction was used instead, an additional 105 probes were identified, of which 8 (7.6%) did not have HR agreement in the two cohorts and would be considered false positives. Using a Stouffer's p-value cutoff of <.01 identified an additional 432 probes, of which 70 (16%) did not have HR agreement. Using an FDR cutoff of <.05 therefore established a list of 32 probes that were informative of outcome.
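The HR agreement check described above can be illustrated with a short Python sketch. The function names and the example hazard ratios are hypothetical and are not study data; the sketch assumes that agreement simply means both HRs fall on the same side of 1.

```python
def hr_directions_agree(hr_first, hr_second):
    """True when both hazard ratios fall on the same side of 1.

    An HR of exactly 1 is grouped with HR < 1 here, which is an arbitrary choice.
    """
    return (hr_first > 1.0) == (hr_second > 1.0)


def discordant_fraction(hr_pairs):
    """Fraction of probes whose HR direction differs between the two cohorts."""
    discordant = sum(1 for a, b in hr_pairs if not hr_directions_agree(a, b))
    return discordant / len(hr_pairs)


# Made-up (cohort 1 HR, cohort 2 HR) pairs for four probes.
pairs = [(1.30, 1.15), (0.80, 0.90), (1.20, 0.95), (0.70, 0.85)]
fraction = discordant_fraction(pairs)  # one of the four pairs is discordant

# The percentage reported for the <.001 cutoff follows the same arithmetic.
reported_percent = round(100 * 8 / 105, 1)
```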
Gene enrichment analysis showed that the 20 genes where low expression indicates poor prognosis were associated with the endoplasmic reticulum, with a Benjamini-corrected p-value <.05. For the 12 genes where high expression indicates poor prognosis, no statistically significant association was found.

Discussion
Meta-analysis of existing data in publicly available ovarian cancer cohorts may yield genes that should be investigated more closely and that may eventually lead to new drug treatments for ovarian cancer patients, which have been slow in coming. Chemotherapy is currently used as the standard of care in conjunction with debulking surgery in patients with advanced ovarian cancer [2][3][4]. The addition of targeted therapy in combination with chemotherapy may improve OS; however, identification of these types of drugs remains elusive. Genes that are overexpressed in ovarian tumors are not only potential biomarkers of prognosis but may also be therapeutic targets if those genes correlate with a poor outcome. Conversely, overexpressed genes that are associated with a good outcome can be unintentionally targeted by standard cancer treatments or by off-target effects of drugs the patients may be taking for other health issues. We conducted a meta-analysis of mRNA expression data from two ovarian cohorts and used Cox regression, Stouffer's method, and the Benjamini-Hochberg FDR correction to identify 12 overexpressed (Table 1) and 20 underexpressed (Table 2) genes that correlated with a poor outcome. Thus, our meta-analysis has implicated genes that may be prognostic as well as potential therapeutic targets to pursue in the treatment of ovarian cancer. The ability to generate single-gene lists from published ovarian cohorts could also lead to a more thorough understanding of which genes contribute to the ovarian cancer tumorigenic process. Bioinformatics, in conjunction with analysis of clinical and literature databases, will therefore be required to cull these gene lists and focus on the most relevant ones.