Advanced-stage ovarian cancer patients are generally treated with platinum/taxane-based chemotherapy after primary debulking surgery. However, there is a wide range of outcomes for individual patients. Therefore, the clinicopathological factors alone are insufficient for predicting prognosis. Our aim is to identify a progression-free survival (PFS)-related molecular profile for predicting survival of patients with advanced-stage serous ovarian cancer.
Advanced-stage serous ovarian cancer tissues from 110 Japanese patients who underwent primary surgery and platinum/taxane-based chemotherapy were profiled using oligonucleotide microarrays. We selected 88 PFS-related genes by a univariate Cox model (p<0.01) and generated the prognostic index based on 88 PFS-related genes after adjustment of regression coefficients of the respective genes by ridge regression Cox model using 10-fold cross-validation. The prognostic index was independently associated with PFS time compared to other clinical factors in multivariate analysis [hazard ratio (HR), 3.72; 95% confidence interval (CI), 2.66–5.43; p<0.0001]. In an external dataset, multivariate analysis revealed that this prognostic index was significantly correlated with PFS time (HR, 1.54; 95% CI, 1.20–1.98; p = 0.0008). Furthermore, the correlation between the prognostic index and overall survival time was confirmed in the two independent external datasets (log rank test, p = 0.0010 and 0.0008).
The prognostic ability of our index based on the 88-gene expression profile in ridge regression Cox hazard model was shown to be independent of other clinical factors in predicting cancer prognosis across two distinct datasets. Further study will be necessary to improve predictive accuracy of the prognostic index toward clinical application for evaluation of the risk of recurrence in patients with advanced-stage serous ovarian cancer.
Citation: Yoshihara K, Tajima A, Yahata T, Kodama S, Fujiwara H, Suzuki M, et al. (2010) Gene Expression Profile for Predicting Survival in Advanced-Stage Serous Ovarian Cancer Across Two Independent Datasets. PLoS ONE 5(3): e9615. https://doi.org/10.1371/journal.pone.0009615
Editor: Zoltán Bochdanovits, VU University Medical Center and Center for Neurogenomics and Cognitive Research, The Netherlands
Received: November 3, 2009; Accepted: February 16, 2010; Published: March 12, 2010
Copyright: © 2010 Yoshihara et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported in part by a Grant-in-Aid for the Third-term Cancer Control Strategy Program from the Ministry of Health, Labor and Welfare, Japan (KT), and 2009 Research and Study Program of Tokai University Educational System General Research Organization (AT). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Patients with advanced-stage ovarian cancer generally undergo primary debulking surgery followed by platinum/taxane-based chemotherapy. Although the postoperative introduction of taxane drugs has improved the 5-year survival rate for advanced-stage ovarian cancer, patients with this cancer still have a 5-year survival rate of only 30%. Clinicopathological characteristics, such as debulking status after primary surgery, are clinically considered important indicators of prognosis. However, recurrence after optimal debulking surgery occurs in some patients, while disease-free status after incomplete surgery is maintained in others. In fact, it has been reported that 34% of patients treated with optimal surgery and platinum-taxane combination chemotherapy for advanced-stage ovarian cancer recur within 12 months. Therefore, these clinicopathological factors alone are insufficient for predicting prognosis and elucidating the pathological mechanisms of disease progression or recurrence. Molecular biology approaches can be used to identify new prognosis-related profiles, which may help to elucidate the pathology of advanced-stage serous ovarian cancer.
Microarray technology has developed very rapidly, and it has become relatively easy to analyze the expression levels of thousands of genes within cancer cells. Although many studies have reported associations between gene expression profiles and prognosis in cancer patients, only a limited number of such profiles are used in clinical settings. Microarray technology is clinically applied for predicting prognosis in breast cancer patients: MammaPrint™ (Agendia BV, Amsterdam, the Netherlands) has already been put to practical use for this purpose. In contrast, there are as yet no microarray kits for clinical diagnosis and management of patients with ovarian cancer.
Three studies have recently reported gene expression profiles that predict overall survival (OS) in ovarian cancer patients using microarray techniques. These studies used a relatively large sample size (n>80) to establish a survival-related profile in the discovery phase of the experiment and an external independent dataset as the validation set, to address the problem that the number of genomic variables examined is much larger than the number of subjects. Thus, research on overall survival-related profiles in ovarian cancer patients has progressed, whereas there are no extensive studies based on multicenter validation of gene expression profiles for prediction of disease progression or recurrence in patients with ovarian cancer. Prediction of the risk of recurrence in patients with advanced-stage ovarian cancer receiving standard treatments (primary surgery + platinum/taxane-based chemotherapy) is more important with respect to optimization of clinical management.
We have recently reported that there are high similarities in gene expression between early-stage serous ovarian cancer and a subset of advanced-stage serous ovarian cancer patients with favorable prognoses, and that patients with advanced-stage serous ovarian cancer fall into two molecular subgroups according to gene expression profiles reflecting tumor progression and prognosis. In this study, we focused on progression-free survival (PFS) time in a larger number of patients, all with advanced-stage serous ovarian cancer treated with platinum/taxane-based chemotherapy, and sought to identify a PFS-related gene expression profile using a new survival analysis method, the ridge regression Cox model. We then assessed the correlation between our PFS-related gene expression profile and survival time in an external independent dataset of advanced-stage serous ovarian cancer.
The clinical characteristics of the 110 Japanese patients with advanced-stage serous ovarian cancer are summarized in Table 1. In the discovery set, 93 patients (84.5%) were diagnosed with International Federation of Gynecology and Obstetrics (FIGO) stage III disease, and 17 patients (15.5%) with FIGO stage IV disease. All patients received platinum/taxane-based chemotherapy after primary surgery. The median progression-free and overall survival times were 17 and 31 months, respectively.
In addition, we used part of a publicly available microarray dataset (GSE9891) as an external independent dataset (see Materials and Methods). The clinical characteristics of the 87 patients with advanced-stage serous ovarian cancer in the external dataset are listed in Table S1. Kaplan-Meier survival analysis showed that there were no significant differences in PFS and OS time between patients of the discovery dataset and those of the external dataset (Figure S1). When we compared clinicopathological characteristics between the discovery set and the external dataset, there were significant differences in the frequencies of stage (Table S1). Because the grading system adopted in the external dataset was distinct from that in the discovery set, we could not make a simple comparison of malignant grade between the two datasets. We then examined the association between clinicopathological features and PFS time in patients with advanced-stage serous ovarian cancer in each dataset. Multivariate analysis revealed that only optimal surgery was an independent prognostic factor for PFS in the discovery dataset (Table S2) and that there was a marginally significant correlation between debulking status of primary surgery and PFS time in the external dataset (Table S2). Therefore, we planned first to develop a prognostic index based on PFS-related genes in the discovery dataset, second to evaluate the prognostic ability of our index in the external dataset using multivariate analysis, and third to assess the predictive performance of the prognostic index again after stratifying patients according to the debulking status of primary surgery.
Identification of PFS-Related Profile
Using the Agilent Whole Human Genome Oligo microarray, we generated gene expression data for 110 advanced-stage serous ovarian cancer patients. This dataset was then used as a discovery set for the identification of a PFS-related profile in patients with advanced-stage serous ovarian cancer. To further evaluate the PFS-related profile, we prepared part of the GSE9891 dataset, generated with the Affymetrix Human Genome U133 Plus 2.0 Array, as an external independent dataset (see Materials and Methods). To deal with cross-platform microarray data appropriately, we analyzed only genes common to the two platforms (28304 probes in the Agilent platform; 38497 probes in the Affymetrix platform) in this study. Of the 28304 Agilent probes, the 18178 probes with expression levels marked as “Present” in all 110 microarrays from the discovery set were further extracted to remove missing and uncertain signals on gene expression, and the data were then per-gene normalized in each dataset by transforming the expression of each gene to a mean of 0 and standard deviation of 1 (Figure S2).
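The per-gene normalization described above can be illustrated with a minimal Python sketch. The gene names and values are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

def normalize_per_gene(expression):
    """Transform each gene to mean 0 and standard deviation 1 across samples.

    expression: dict mapping a gene name to a list of expression values,
    one value per sample (hypothetical layout, chosen for illustration).
    """
    normalized = {}
    for gene, values in expression.items():
        mu = mean(values)
        sd = stdev(values)  # sample SD; an assumption, not stated in the text
        if sd == 0:
            raise ValueError(f"gene {gene} has zero variance")
        normalized[gene] = [(v - mu) / sd for v in values]
    return normalized

# Toy data: two hypothetical genes measured in three samples.
toy = {"GENE_A": [2.0, 4.0, 6.0], "GENE_B": [10.0, 10.5, 11.0]}
z = normalize_per_gene(toy)
```

Normalizing within each dataset separately, as the text describes, puts Agilent and Affymetrix measurements on a comparable scale before the same gene weights are applied to both.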
A univariate Cox proportional hazard model showed that the expression levels of 97 probes (representing 88 nonredundant genes) were correlated with PFS time (p<0.01). For the 8 genes tagged by multiple probes (17 probes in total), we selected as representatives the 8 probes (one probe per gene) with the largest sum of squares of individual expression values for the respective genes. A total of 88 genes (represented by 88 unique probes) were thereby identified as the PFS-related profile. Furthermore, we applied the ridge regression model to estimate optimal regression coefficients (β) for the 88 genes of the PFS-related profile (Table 2), and calculated the prognostic index for each sample using equation (1) as reported previously. The 88-gene prognostic indices obtained were in the range of -5.09 to 4.14 (median, 0.11), and the frequency distribution of the indices among the 110 patients was unimodal.
To assess the prognostic index as a categorical variable, we divided this dataset into two groups based on the median prognostic index of 0.11. Patients were assigned to the “high-risk” group if their prognostic index was greater than or equal to the median value, whereas the “low-risk” group was composed of cases with prognostic indices less than the median. As shown in Figure 1A, patients with high-risk prognostic indices had shorter median PFS times than those in the low-risk group (median PFS, 12 months vs. 51 months; log rank test, p<0.0001).
High-risk patients had significantly shorter progression-free survival times than low-risk patients (A) in the discovery set (log rank test, p<0.0001) and (B) in the external set (log rank test, p = 0.0004). Similarly, high-risk patients had significantly shorter overall survival times than low-risk patients (C) in the discovery set (log rank test, p<0.0001) and (D) in the external set (log rank test, p = 0.0010).
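As a rough illustration of the two steps in this paragraph, the following Python sketch picks one representative probe per gene by the largest sum of squared expression values and then computes a prognostic index as the coefficient-weighted sum of expression values, which is how the index is described in the text. All probe IDs, gene names, and coefficients are hypothetical.

```python
def pick_representative_probe(probes):
    """Return the probe ID whose expression values have the largest sum of squares.

    probes: dict mapping a probe ID to its expression values across samples.
    """
    return max(probes, key=lambda pid: sum(v * v for v in probes[pid]))

def prognostic_index(expression, beta):
    """Weighted sum of one sample's expression values.

    expression: dict gene -> normalized expression value for this sample
    beta:       dict gene -> ridge regression coefficient (weight)
    """
    return sum(beta[gene] * expression[gene] for gene in beta)

# Hypothetical gene tagged by two probes, measured in three samples.
probes_for_gene = {"probe_1": [1.0, -1.0, 0.5], "probe_2": [2.0, -2.0, 0.1]}
chosen = pick_representative_probe(probes_for_gene)

# Hypothetical two-gene profile and one sample.
weights = {"GENE_A": 0.5, "GENE_B": -0.25}
sample = {"GENE_A": 2.0, "GENE_B": -4.0}
index = prognostic_index(sample, weights)
```

A positive weight means higher expression pushes the index toward higher risk, and a negative weight does the opposite.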
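The median split used to define the risk groups can be sketched as follows. The function name and the example indices are hypothetical; the rule itself (index greater than or equal to the cutoff is high-risk) follows the text.

```python
from statistics import median

def assign_risk_groups(indices, cutoff=None):
    """Label each prognostic index as high-risk or low-risk.

    If no cutoff is given, the median of the supplied indices is used.
    Passing a fixed cutoff allows a threshold learned in one dataset
    to be reused in another.
    """
    if cutoff is None:
        cutoff = median(indices)
    return ["high-risk" if pi >= cutoff else "low-risk" for pi in indices]

# Hypothetical indices for five patients; their median is 0.11.
groups = assign_risk_groups([-1.2, 0.11, 2.5, -0.4, 0.9])

# Reusing a fixed cutoff of 0.11 on another (hypothetical) set of patients.
external_groups = assign_risk_groups([0.05, 0.11, 3.0], cutoff=0.11)
```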
We then performed univariate and multivariate Cox proportional hazard analyses to prove that the 88-gene prognostic index was an independent prognostic factor (Table 3). A univariate Cox's proportional hazard analysis showed that the prognostic index, stage, optimal surgery, and histological grade were correlated with PFS (p<0.0001, p = 0.022, p<0.0001 and p = 0.016, respectively). Moreover, a multivariate analysis showed that the prognostic index was most significantly associated with PFS time [hazard ratio (HR), 3.80; 95% confidence interval (CI), 2.68–5.61; p<0.0001].
Validation by Quantitative Real-Time RT-PCR
To validate the microarray expression data, we performed quantitative real-time RT-PCR for a subset of the discovery dataset (53 samples). The four genes, E2F2, FOXJ1, DNAH7, and FILIP1, were randomly selected for this purpose. There were significant correlations between microarray expression data and real-time RT-PCR expression data (Figure 2). In spite of the smaller sample size, we confirmed a significant association between PFS time and each of the real-time RT-PCR data for the four genes in the univariate Cox hazard model (data not shown).
Applying PFS-Related Profile to the External Dataset
We translated the 88 prognostic genes from Agilent probe IDs to 196 Affymetrix probes using a translation function in GeneSpring GX 10 and evaluated the present PFS-related profile in the external dataset (Figure S2). We calculated the prognostic index for each sample in the external dataset as the weighted sum of the expression values of the 88 PFS-related genes according to equation (1), in which the ridge regression coefficients (β) identified in the discovery set were used as weights for the respective genes (see Materials and Methods). We obtained prognostic indices ranging from -5.37 to 4.56 in the external dataset. The frequency distribution of the prognostic indices was not statistically different from that in the discovery set by the Kolmogorov–Smirnov test (p>0.05).
When we divided the external dataset into two subgroups by the median prognostic index (0.11) of the discovery set, a significant correlation was observed between risk classification and PFS (log rank test, p = 0.0004) (Figure 1B). In univariate analysis of the external data, the estimated prognostic index and optimal surgery were correlated with PFS time (p = 0.0001 and 0.049, respectively) (Table 3). Multivariate analysis showed that the prognostic index was an independent prognostic factor for PFS time (HR, 1.64; 95% CI, 1.27–2.13; p = 0.0001).
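The comparison of the two index distributions rests on the two-sample Kolmogorov–Smirnov statistic, the largest vertical gap between the two empirical cumulative distribution functions. The sketch below computes only that statistic in plain Python on hypothetical values. Converting it to a p-value is left to a statistics package and is not shown here.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|.

    Both empirical CDFs are step functions that only change at observed
    values, so it is enough to evaluate the gap at every observed value.
    """
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a_sorted + b_sorted:
        f_a = bisect_right(a_sorted, x) / len(a_sorted)  # fraction of a <= x
        f_b = bisect_right(b_sorted, x) / len(b_sorted)  # fraction of b <= x
        d = max(d, abs(f_a - f_b))
    return d

# Hypothetical prognostic indices from two datasets.
d_identical = ks_statistic([1, 2, 3], [1, 2, 3])
d_disjoint = ks_statistic([1, 2, 3], [4, 5, 6])
d_shifted = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
```

A small D, as for nearly overlapping distributions, is what a non-significant test result reflects.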
Assessment of Our Prognostic Index
To assess the sensitivity and specificity of our prognostic index, we used ROC curves for the index. An area under ROC curve of 0.5 (indicated by diagonal dotted lines in Figure S3) represents equality between true positive and false positive test results. The extent to which the ROC curve departs from the diagonal line to left and top axes is a measure of the effectiveness of the 88-gene prognostic index in the prediction of clinical outcome. The area under the ROC curves to distinguish early-relapse patients with less than 18 months of PFS times from late-relapse patients was 0.959 and 0.674 in the discovery set and the external dataset, respectively (Figure S3). When we used median value of prognostic index in the discovery set as the cut-off, the sensitivity and specificity were 88.9% and 85.7% in discovery dataset and 64.4% and 69.2% in the external dataset.
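The quantities in this paragraph can be illustrated with a short Python sketch. It treats early relapse as the positive class, computes the area under the ROC curve as the fraction of (early, late) patient pairs in which the early-relapse patient has the higher index, and computes sensitivity and specificity at a fixed cutoff. The scores and labels are hypothetical, and the sketch ignores censoring, which a real analysis would have to handle.

```python
def roc_auc(scores, is_early_relapse):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, is_early_relapse) if y]
    neg = [s for s, y in zip(scores, is_early_relapse) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(scores, is_early_relapse, cutoff):
    """Classify score >= cutoff as high-risk and compare with observed relapse."""
    pairs = list(zip(scores, is_early_relapse))
    tp = sum(1 for s, y in pairs if y and s >= cutoff)
    fn = sum(1 for s, y in pairs if y and s < cutoff)
    tn = sum(1 for s, y in pairs if not y and s < cutoff)
    fp = sum(1 for s, y in pairs if not y and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical prognostic indices and early-relapse flags for six patients.
scores = [2.1, 0.5, -0.3, 1.2, -1.5, 0.0]
early = [True, False, False, True, False, True]

auc = roc_auc(scores, early)
sens, spec = sensitivity_specificity(scores, early, cutoff=0.11)
```

An area of 0.5 corresponds to random ordering of early- and late-relapse patients, which matches the diagonal reference line described in the text.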
We performed survival analysis after stratifying patients according to the status of debulking surgery, which was an independent prognostic factor in multivariate analysis of the discovery dataset (Table 3). We divided patients into two groups (“optimal group” and “suboptimal group”) in each of the discovery and external datasets, and assigned each patient to “high-risk” or “low-risk” based on the median value of the current prognostic index in each stratum according to the debulking status. Kaplan-Meier survival analysis showed that high-risk patients had significantly shorter PFS times than low-risk patients in each of the four strata from the two datasets (Figure 3), as follows: optimal group (p<0.0001) and suboptimal group (p<0.0001) in our dataset; optimal group (p = 0.0034) and suboptimal group (p = 0.015) in the external dataset. This stratified analysis also indicated that the prognostic index was associated with PFS time independently of the debulking status.
High-risk patients had significantly shorter progression-free survival times than low-risk patients in (A) the optimal group (log rank test, p<0.0001) and (B) the suboptimal group of the discovery dataset (log rank test, p<0.0001), and in (C) the optimal group (log rank test, p = 0.0034) and (D) the suboptimal group of the external dataset (log rank test, p = 0.015).
Correlation between This Prognostic Index and Overall Survival
Overall survival is another important endpoint in patients with advanced-stage ovarian cancer, and hence we further examined whether the present 88-gene prognostic index could be extended to predicting the overall survival of patients. To evaluate the correlation between this prognostic index and overall survival time, we performed Kaplan-Meier survival curve analysis. Patients with high-risk prognostic indices had shorter overall survival times than the low-risk patients in the two datasets (log rank test, p<0.0001 and p = 0.0010, respectively) (Figure 1C, D). Furthermore, the prognostic index was significantly associated with overall survival time in both the discovery set and the external dataset in multivariate analysis (Table 4).
In addition, we examined the predictive ability of our prognostic index in the publicly available dataset of Dressman et al., in which patients were followed up for longer (median overall survival, 31 months; range, 1–185 months). Dressman's dataset was composed of 119 advanced-stage serous ovarian cancer patients treated with platinum-based chemotherapy (including non-taxane chemotherapy). Because their data were generated on a platform different from those of the foregoing two datasets, 75% of the 88 PFS-related genes were translated for survival prediction in this dataset. When we divided Dressman's dataset into two subgroups by the median prognostic index of the discovery dataset, a significant association was observed between risk classification and overall survival (log rank test, p = 0.0008) (Figure S4). The prognostic index was significantly correlated with overall survival time in multivariate analysis (HR, 1.51; 95% CI, 1.19–1.93; p = 0.0008).
Characterization of PFS-Related Profile
We conducted GO analysis to understand the biological characteristics of the 88 PFS-related genes. To characterize the gene list based on the GO classifications ‘biological process’, ‘molecular function’, and ‘cellular component’, we examined which categories were highly associated with the 88 genes. After multiple testing correction using the FDR method, 8 categories were significantly overrepresented (FDR q-value<0.10) (Figure 4). Among the 88 PFS-related genes, genes involved in GTPase binding (GO17016, GO31267 and GO51020), cellular localization (GO51649 and GO51641), intracellular transport (GO46907 and GO6886), and/or ciliary or flagellar motility (GO1539) were notably enriched. We investigated similarities in overrepresented GO categories between our 88 PFS-related genes and previously reported gene expression profiles correlated with prognosis in ovarian cancer. However, we could not identify common GO categories between our profile and the previously reported profiles (data not shown).
Eight gene ontology (GO) categories were significantly over-represented in GO-based profiling of the 88 genes after multiple testing correction by the Benjamini–Hochberg false discovery rate method (FDR q-value<0.10). Over-represented GO categories were identified using all genes on the Agilent platform as the background set for determining p-values. The actual number of PFS-related genes involved in each category is given in parentheses.
We further used IPA software to analyze the 88 PFS-related genes from the viewpoint of molecular interaction or pathway. The top three significant networks (score>25) are shown in Figures S5-7. Network 1 included 15 of the 88 prognostic genes, and was significantly associated with several IPA-defined networks: cell death, neurological disease, and cellular assembly and organization (Figure S5). Fourteen prognostic genes were included in network 2, which was defined as a network related to cancer, cell morphology, and renal and urological disease (Figure S6). Network 3 displayed significant interactions and interrelations between genes involved in cell-to-cell signaling and interaction, hematological system development and function, and immune cell trafficking (Figure S7). Among the 88 genes, we found several genes interacting with SRC or MYC (Figure S6), each of which has been reported as a representative gene in oncogenic pathways of ovarian cancer.
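The Benjamini–Hochberg correction named in this caption can be sketched in a few lines of Python. The p-values below are hypothetical and stand in for per-category enrichment p-values; the enrichment test itself is not reproduced.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg FDR q-values in the same order as the input.

    Each p-value is scaled by m / rank, then a running minimum taken from the
    largest p-value downwards makes the q-values monotone in the p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    q_values = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values

# Hypothetical enrichment p-values for four GO categories.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
significant = [value < 0.10 for value in q]
```

With a threshold of q<0.10, the first three hypothetical categories would be reported as over-represented and the fourth would not.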
In this study, we identified the prognostic index for predicting PFS time in patients with advanced-stage serous ovarian cancer treated with platinum/taxane-based adjuvant chemotherapy across two types of microarray expression data from the present discovery set and publicly available external set by using the ridge regression Cox model. The significant correlation between our prognostic index and OS time was also indicated in the two independent datasets.
In expression microarray analysis, there is a so-called “curse of dimensionality” problem: the number of genes is much larger than the number of samples. To improve the reliability of a gene expression-based prognostic model, it is necessary to avoid overfitting to the dataset and to confirm the reproducibility of the predictive ability in external independent datasets. Several bioinformatics approaches have been proposed to establish a model for survival prediction using microarray data. Bøvelstad et al. recently examined the prediction performance of the following seven methods: univariate selection, forward stepwise selection, principal components regression, supervised principal components regression, partial least squares regression, ridge regression, and the lasso, using three microarray datasets [Dutch breast cancer data (n = 295), diffuse large B-cell lymphoma data (n = 240), and Norway/Stanford breast cancer data (n = 115)]. They concluded that the univariate Cox model alone was insufficient for predicting survival and that the ridge regression Cox model demonstrated the best performance in the three datasets. Therefore, we used the univariate Cox model only for selecting genes related to PFS time, and adjusted the regression coefficients by the ridge regression Cox model in order to increase the predictive performance of the prognostic index in our dataset.
The current study was intended to identify a gene expression profile with a better ability to predict prognosis than other clinicopathological factors. Stratification of patients with ovarian cancer according to clinicopathological prognostic factors is one important approach to identifying a highly accurate prognostic index. After stratifying patients according to grade, FIGO stage, and status of debulking surgery, we investigated gene expression profiles for predicting PFS time in stage III grade 2/3 serous ovarian cancer patients who received optimal or suboptimal surgery. However, the prognostic indices from the stratified analyses showed poorer predictive performance than that from the non-stratified analysis (Table S3). Besides the reduction of sample size in the discovery and external datasets after stratification, the variation in clinical features and grading systems between the two datasets (Table S1) might have influenced the results of these stratified analyses. This is the main reason why we planned to identify a prognostic index based on PFS-related genes in 110 advanced-stage serous ovarian cancers and then to evaluate the significance of the prognostic index using multivariate analysis including grade, stage, and status of debulking surgery.
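To make the idea of ridge-penalized Cox regression concrete, the following is a minimal Python sketch that minimizes the negative log partial likelihood plus a squared-norm penalty by plain gradient descent. It is an illustration under stated assumptions, not the procedure used in this study: the penalty value is fixed by hand rather than chosen by 10-fold cross-validation, ties are handled in the simplest (Breslow-style) way, and the data, function name, step size, and iteration count are all hypothetical.

```python
import math

def fit_ridge_cox(X, times, events, penalty=1.0, lr=0.01, n_iter=1000):
    """Gradient descent on  -log partial likelihood + (penalty / 2) * ||beta||^2.

    X      : list of per-patient expression vectors (already standardized)
    times  : follow-up time for each patient
    events : 1 if progression was observed, 0 if the patient was censored
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        # exp(linear predictor) for every patient under the current beta
        w = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        grad = [penalty * b for b in beta]  # gradient of the ridge term
        for i in range(n):
            if not events[i]:
                continue  # censored patients contribute only through risk sets
            risk_set = [j for j in range(n) if times[j] >= times[i]]
            denom = sum(w[j] for j in risk_set)
            for k in range(p):
                expected = sum(w[j] * X[j][k] for j in risk_set) / denom
                grad[k] -= X[i][k] - expected
        beta = [b - lr * g for b, g in zip(beta, grad)]
    return beta

# Hypothetical toy data: higher values of the first gene go with earlier progression.
X = [[1.0, 0.2], [0.5, -0.1], [0.0, 0.3], [-0.5, -0.3], [-1.0, -0.1]]
times = [3, 8, 14, 25, 40]
events = [1, 1, 1, 1, 0]

beta_weak = fit_ridge_cox(X, times, events, penalty=0.1)
beta_strong = fit_ridge_cox(X, times, events, penalty=10.0)
```

A larger penalty shrinks the coefficients toward zero, which is the mechanism by which ridge regression limits overfitting when genes greatly outnumber patients.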
Although we enrolled ovarian cancer patients screened carefully according to three criteria (advanced stage, serous histological type, and platinum/taxane-based chemotherapy after primary surgery), we established no inclusion or exclusion criterion for histological grade, as did Crijns and colleagues. This is because a standard system for grading ovarian carcinomas has not yet been established worldwide, although several grading systems have been proposed for epithelial ovarian cancer. According to the three criteria above, we recruited 110 Japanese ovarian cancer patients as a discovery set for the PFS analysis. The prognostic index for each patient was simply calculated as the ridge-regression-weighted sum of the 88 gene expression values, and the prognostic power of our index was assessed using Tothill's dataset. Further, subsequent stratified analysis according to debulking status, which was an independent prognostic factor in multivariate analysis of the discovery dataset, indicated that our prognostic index was associated with PFS time independently of the debulking status. However, the sensitivity and specificity of the prognostic index for discriminating between early- and late-relapse patients were lower in Tothill's dataset than in the discovery set. This might be caused by differences in background with respect to ethnicity or microarray platform. Although differences in gene expression of cancer tissues among ethnicities have not been reported previously, several studies indicate that the proportions of clear cell and endometrioid histological types in epithelial ovarian cancer are higher in Asian populations than in non-Asian populations. A recent genome-wide association study identified a single nucleotide polymorphism at 9p22 associated with ovarian cancer risk in subjects with European ancestry but not in those of non-European descent. These types of differences between studies could be attributed to genetic as well as environmental factors. In addition, we cannot rule out the possibility that the present PFS-associated classifiers with ridge-regression-based weights still generalize insufficiently to the external dataset because of overfitting. Therefore, we will reconsider these important issues, such as between-study differences in ethnicities and microarray platforms and the overfitting problem, using a larger number of microarray data from advanced-stage serous ovarian cancer patients in order to obtain better classifiers for the prediction of prognosis. To improve the accuracy of the prognostic index, development of a prognostic index after stratification of patients will be a subject for further study.
Interestingly, the present 88-gene prognostic index for prediction of PFS time was also significantly associated with overall survival time in both our dataset and Tothill's dataset. Moreover, we examined the predictive ability of our prognostic index in Dressman's dataset, since patients in that dataset received longer-term follow-up than those in the above two datasets. Although Dressman's dataset (n = 119) included 34 patients treated with platinum/cyclophosphamide chemotherapy and 3 with single-agent platinum, the significance of this prognostic index for overall survival was still statistically supported in this dataset with longer follow-up. As treatments for recurrent ovarian cancer patients remain an open area of investigation aiming at survival benefit, our prognostic index for patients with advanced-stage serous ovarian cancer has the potential to predict not only PFS time but also overall survival time. In the future, we may apply the prognostic index to estimating the risk of recurrence in serous ovarian cancer patients and to selecting a novel treatment, such as dose-dense chemotherapy or a molecular-targeted agent, in order to improve the prognosis of high-risk patients.
Only a small number of genes overlap between our 88-gene PFS-related profile and previously reported expression profiles related to prognosis or sensitivity to platinum/taxane-based chemotherapy. Konstantinopoulos et al. have suggested that such discrepancies might be related to the use of different microarray platforms with different normalization methods and different degrees of contamination by noncancerous cells in a tumor sample, as well as differences in the patient populations under study. Nevertheless, several survival-associated genes, such as E2F2 and HLA-DMB, are included in the 88 PFS-related genes. Reimer et al. have reported that E2F2 is associated with grade 3 ovarian tumors and residual disease (more than 2 cm in diameter) after initial surgery, and that low E2F2 expression is significantly associated with favorable disease-free and overall survival in epithelial ovarian cancer. Callahan et al. have recently reported that high expression of HLA-DMB in ovarian cancer cells is correlated with increased numbers of tumor-infiltrating CD8-positive T lymphocytes and with good prognosis in advanced-stage high-grade serous ovarian cancer.
We performed GO analysis and IPA to assess the biological characteristics of the PFS-related genes. GO analysis revealed significant associations of GTPase binding, intracellular transport, and ciliary or flagellar motility with PFS (Figure 4). PLCE1 belongs to the GTPase binding category and activates MAP kinase or ERK, as shown in IPA network 3 (Figure S7). In particular, a previous report indicates that PLCE1 activates small G protein Ras/MAP kinase signaling, which is one of the important pathways associated with cell growth and differentiation. Intriguingly, CSE1L, included in the intracellular transport category, is involved in the regulation of multiple cellular mechanisms, proliferation, and apoptosis. Tanaka et al. have reported that CSE1L is associated with regulated expression of p53 target genes, and that downregulation of CSE1L protects cancer cells from DNA damage-induced apoptosis. DNAH2 and DNAH7 are components of the inner dynein arm of ciliary axonemes, and axonemal dyneins are molecular motors that drive the beating of cilia and flagella. Plotnikova et al. have reported that loss of cilia in cancer cells may contribute to the insensitivity of cancer cells to environmental repressive signals, partly owing to derangement of cell cycle checkpoints governed by cilia and centrosomes. In addition, IPA showed several genes interacting with SRC or MYC (Figure S6), each of which has been reported as a representative gene in oncogenic pathways of ovarian cancer. Dressman et al. have demonstrated that Src pathway activity is associated with chemotherapy response, based on a significant correlation between activation of the Src pathway and poor prognosis in patients with platinum-resistant ovarian cancer. MYC is a multifunctional proto-oncogene and is activated in about 30% of ovarian cancers by several mechanisms. Iba et al. report that MYC expression is associated with responsiveness to platinum-based chemotherapy and with prognosis in patients with epithelial ovarian cancer. Our PFS-related profile might have functional relevance to altered activities of several oncogenic pathways. Although we identified several genes whose molecular function could be linked to prognosis in ovarian cancer patients, further functional study will be necessary to clarify the biological and pathological implications of the PFS-related profile.
These results suggest that the gene expression profile could be a useful tool to predict disease progression or recurrence of advanced-stage serous ovarian cancer. To apply the gene expression profile in clinical practice, we will need to improve the predictive ability of the profile and confirm the reliability of survival profile in a prospective multi-center study. Nevertheless, the survival-related profile could provide an optimization of the clinical management and development of new therapeutic strategies for the serous ovarian cancer patients.
Materials and Methods
One hundred ten Japanese patients who were diagnosed with advanced-stage serous ovarian cancer between July 1997 and June 2008 were included in this study. Fresh-frozen samples were obtained from primary tumor tissues during primary debulking surgery prior to chemotherapy. All patients were treated with platinum/taxane-based chemotherapy after surgery. In principle, patients were seen every 1 to 3 months for the first 2 years. Thereafter, follow-up visits took place every 3 to 6 months in the third to fifth year, and every 6 to 12 months in the sixth to tenth year. At every follow-up visit, general physical and gynecologic examinations were performed, and CA125 serum levels were routinely determined. Staging of the disease was assessed according to the criteria of the International Federation of Gynecology and Obstetrics (FIGO). Optimal debulking surgery was defined as ≤1 cm of gross residual disease. The histological characteristics of surgically resected specimens were assessed on formalin-fixed, paraffin-embedded hematoxylin and eosin sections by two or three gynecological pathologists belonging to the Japanese Society of Pathology at each institute, and frozen tissues containing more than 80% tumor cells upon histological evaluation were used for RNA extraction. In this study, the degree of histological differentiation was determined according to the proportion of solid growth within the adenocarcinoma as follows: grade 1, less than 5% solid growth; grade 2, 6–50% solid growth; grade 3, over 50% solid growth, based on the grading system proposed by the Japan Society of Gynecologic Oncology.
PFS time was calculated as the interval from primary surgery to disease progression or recurrence. Based on the standard Response Evaluation Criteria In Solid Tumors (RECIST) guidelines, disease progression was defined as at least a 20% increase in the sum of the longest diameters of all target lesions, the appearance of one or more new lesions, and/or unequivocal progression of existing non-target lesions. Overall survival time was calculated as the interval from primary surgery to death due to ovarian cancer. This study was approved by the institutional ethics review boards at Niigata University (No. 239, 282, 285, and 318), Niigata Cancer Center Hospital (No. 25), Jichi Medical University (G07-01), Kagoshima City Hospital (H19-21), Hiroshima University (Hi-11), Nagasaki University (080509), Kumamoto University (No. 309), and Tokai University (07I-29). All patients provided written informed consent for the collection of samples and subsequent analysis.
Total RNA was extracted from tissue samples as previously described. Five hundred nanograms of total RNA were converted into labeled cRNA with nucleotides coupled to cyanine 3-CTP (Cy3) (PerkinElmer, Boston, MA, USA) using the Quick Amp Labeling Kit, one-color (Agilent Technologies). Cy3-labeled cRNA (1.65 µg) was hybridized for 17 hours at 65°C to an Agilent Whole Human Genome Oligo Microarray, which carries 60-mer probes to more than 40,000 human transcripts. The hybridized microarray was washed and then scanned in the Cy3 channel with the Agilent DNA Microarray Scanner (model G2565AA). Signal intensity per spot was generated from the scanned image using Feature Extraction Software version 9.1 (Agilent Technologies) with the default settings. Spots that did not pass quality control procedures were flagged as “Absent”. The MIAME-compliant microarray data were deposited into the Gene Expression Omnibus data repository (accession number GSE17260).
Microarray Data Analysis
We analyzed our dataset as a “discovery set” and the publicly available dataset as an “external dataset”. To account for differences in microarray platforms, we selected the genes common to the Agilent Whole Human Genome Oligo Microarray and the Affymetrix Human Genome U133 Plus 2.0 Array, which was the platform used in the external dataset (GSE9891).
Data normalization was performed in GeneSpring GX 10 (Agilent Technologies) as follows: (i) raw signals were thresholded to 1.0; (ii) 75th percentile normalization was chosen as the normalization algorithm; (iii) the baseline was transformed to the median of all samples. Furthermore, the expression level was normalized by Z-transformation (the mean expression was set to 0 and the standard deviation to 1 for each gene in each dataset). To exclude missing and uncertain expression signals, we used the 18,178 probes in our dataset whose expression levels were marked as “Present” in all microarrays.
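The per-gene Z-transformation described above can be sketched as follows. This is a minimal illustration rather than the GeneSpring procedure itself: the gene names and values are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which estimator was applied.

```python
from statistics import mean, stdev


def z_transform(values):
    """Scale one gene's expression values to mean 0 and standard deviation 1."""
    mu = mean(values)
    sd = stdev(values)  # assumption: sample SD (n - 1); the text does not specify
    if sd == 0:
        raise ValueError("constant expression; Z-score is undefined")
    return [(v - mu) / sd for v in values]


def z_transform_dataset(expression):
    """Apply the transformation gene by gene within a single dataset."""
    return {gene: z_transform(vals) for gene, vals in expression.items()}


# Hypothetical normalized expression values for two genes across four samples.
discovery = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [10.0, 10.5, 9.5, 10.0],
}
z_scores = z_transform_dataset(discovery)
```

Because the transformation is applied separately within each dataset, expression values from the two microarray platforms are placed on a comparable scale before the regression coefficients are transferred.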
The PFS-related genes from the 18,178 probes were identified by univariate Cox proportional hazard analysis, followed by a ridge regression, a penalized Cox regression analysis for survival prediction (Figure S2). We first identified 97 probes with expression levels correlating with the PFS time determined using the univariate Cox proportional hazard model (p<0.01). In case of multiple probes representing a given gene (so-called multiple tagged gene) in microarrays, only the probe with the largest magnitude (i.e., sum of the squares of per-individual expression values) was extracted as a representative probe for the gene. To avoid the problem of overfitting, ridge regression extension of the multivariate Cox model was employed. The ridge regression shrinks regression coefficients (β) of genes in multivariate Cox model by imposing a penalty on squared values of the coefficients, and is able to handle the problem of having larger number of expression values than individuals in an appropriate way. We estimated regression coefficients of the prognostic genes by the ridge regression Cox model using M-files (available at http://www.med.uio.no/imb/stat/bmms/software/microsurv/) for MATLAB (Mathworks, Natick, MA, USA). Using 10-fold cross-validation, we obtained regression coefficients with optimal penalty parameter for the penalized Cox model, and calculated a prognostic index for each patient as defined by

$$\text{Prognostic index} = \sum_{i=1}^{88} \beta_i X_i \qquad (1)$$

where $\beta_i$ is the estimated regression coefficient of each gene in discovery dataset under ridge regression multivariate Cox model and $X_i$ is the Z-transformed expression value of each gene. The estimated regression coefficient of each PFS-related gene given by ridge regression in the discovery set was also applied to calculate a prognostic index for each patient in external dataset using the equation above. We classified all patients into the two groups (high- and low-risk groups) by the median of the prognostic index in discovery set.
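The representative-probe rule, the linear-combination prognostic index, and the median split described above can be sketched as follows. This is an illustrative sketch only: it does not fit the ridge regression Cox model (the coefficients are taken as given), all gene names, probe names, and values are hypothetical, and assigning patients whose index exactly equals the cut-off to the low-risk group is an assumption that the text does not address.

```python
from statistics import median


def representative_probe(probes):
    """Return the probe with the largest sum of squared per-sample values."""
    return max(probes, key=lambda name: sum(x * x for x in probes[name]))


def prognostic_index(coefficients, z_expression):
    """Linear combination sum_i(beta_i * X_i) for one patient."""
    return sum(beta * z_expression[gene] for gene, beta in coefficients.items())


def assign_risk_groups(indices, cutoff):
    """High risk above the cut-off; ties go to low risk (assumption)."""
    return ["high" if pi > cutoff else "low" for pi in indices]


# Hypothetical probes for one multiply tagged gene.
probes = {"probe_1": [0.1, -0.2, 0.3], "probe_2": [1.0, -1.5, 0.5]}
chosen = representative_probe(probes)

# Hypothetical ridge coefficients and Z-transformed expression values.
beta = {"GENE_A": 0.5, "GENE_B": -0.25}
discovery_patients = [
    {"GENE_A": 1.0, "GENE_B": 0.0},
    {"GENE_A": -1.0, "GENE_B": 1.0},
    {"GENE_A": 0.0, "GENE_B": -1.0},
    {"GENE_A": -2.0, "GENE_B": 0.0},
]
external_patients = [
    {"GENE_A": 0.0, "GENE_B": 0.0},
    {"GENE_A": -1.0, "GENE_B": 2.0},
]

discovery_pi = [prognostic_index(beta, p) for p in discovery_patients]
cutoff = median(discovery_pi)  # cut-off is fixed in the discovery set
discovery_groups = assign_risk_groups(discovery_pi, cutoff)

# The same coefficients and the same cut-off are reused for the external set.
external_pi = [prognostic_index(beta, p) for p in external_patients]
external_groups = assign_risk_groups(external_pi, cutoff)
```

Reusing both the coefficients and the discovery-set median in the external dataset keeps the external evaluation free of any parameter tuned on external data.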
PFS between high- and low-risk groups was compared using Kaplan-Meier curves and the log rank test in GraphPad PRISM version 4.0 (GraphPad Software, San Diego, CA, USA). Furthermore, we evaluated the prognostic index in the multivariate Cox proportional hazard model using JMP version 6 (SAS Institute, Cary, NC, USA). We also examined how well the prognostic index discriminated between early and late relapse by plotting a receiver operating characteristic (ROC) curve for each dataset (JMP). Because 18 months is the median PFS time for advanced-stage ovarian cancer patients treated with cisplatin-paclitaxel, we used 18 months as the cut-off between early and late relapse. We performed ROC curve analysis for our prognostic index only in patients with follow-up for more than 18 months (discovery set, 103 samples; external dataset, 84 samples).
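The area under the ROC curve for separating early from late relapse can be sketched as follows. The analysis in this study was run in JMP; the function below is a stand-in that uses the rank-based (Mann-Whitney) formulation of the AUC, with early relapse as the positive class. The function name and the example values are hypothetical, and the input is assumed to contain only patients who already meet the follow-up criterion.

```python
def auc_early_relapse(prognostic_indices, pfs_months, cutoff_months=18.0):
    """AUC of the prognostic index for early (positive) versus late relapse."""
    pairs = list(zip(prognostic_indices, pfs_months))
    early = [pi for pi, t in pairs if t < cutoff_months]
    late = [pi for pi, t in pairs if t >= cutoff_months]
    if not early or not late:
        raise ValueError("need at least one early and one late relapse patient")
    # Fraction of (early, late) pairs in which the early-relapse patient
    # has the higher index; tied pairs count as one half.
    wins = 0.0
    for early_pi in early:
        for late_pi in late:
            if early_pi > late_pi:
                wins += 1.0
            elif early_pi == late_pi:
                wins += 0.5
    return wins / (len(early) * len(late))


# Hypothetical prognostic indices and PFS times (months) for six patients.
example_pi = [1.2, 0.8, -0.1, 0.3, -0.9, -0.4]
example_pfs = [6, 10, 14, 24, 30, 40]
auc = auc_early_relapse(example_pi, example_pfs)
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of early-relapse from late-relapse patients.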
To investigate the biological functions of the PFS-related gene expression profile, we used the GO Ontology Browser embedded in GeneSpring GX. The GO Ontology Browser was used to determine which gene ontology categories were statistically overrepresented in the resulting gene list. Statistical significance was determined by Fisher's exact test, followed by multiple testing correction using the Benjamini and Hochberg false discovery rate (FDR) method. Furthermore, we explored molecular interaction networks among the PFS-related genes using Ingenuity Pathway Analysis (IPA).
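The overrepresentation test and the FDR correction described above can be sketched as follows. This is not the GeneSpring implementation: the enrichment p-value is computed as a one-sided Fisher's exact test (upper hypergeometric tail), which is an assumption about the sidedness used, and the function names and example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """P(X >= k): at least k of the n listed genes fall in a GO category
    that contains K of the N background genes (one-sided Fisher's exact test)."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical example: 2 of 4 listed genes fall in a category
# that holds 5 of 20 background genes.
p_single = enrichment_p_value(k=2, n=4, K=5, N=20)

# Hypothetical raw p-values for four GO categories.
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

A category would then be reported as overrepresented when its adjusted value falls below the chosen FDR threshold.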
Quantitative Real-Time Reverse Transcription Polymerase Chain Reaction (RT-PCR) Analysis
Real-time PCR was performed on E2F2 (Hs00231667_m1, Applied Biosystems), FOXJ1 (Hs00230964_m1, Applied Biosystems), DNAH7 (Hs01022427_m1, Applied Biosystems), and FILIP1 (Hs00325074_m1, Applied Biosystems) for a subset of serous ovarian cancer (n = 53) as previously described. The relative quantification method was used to measure the amounts of the respective genes in serous ovarian cancer samples, normalized to ACTB (Hs99999903_m1, Applied Biosystems) and TBP (Hs99999910_m1, Applied Biosystems).
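The relative quantification step can be sketched with the 2^(−ΔΔCT) calculation. This is a minimal sketch under stated assumptions: the text does not say how the two reference genes (ACTB and TBP) were combined or which sample served as the calibrator, so averaging the reference Ct values and the calibrator values below are illustrative choices, and the function name and all Ct values are hypothetical.

```python
from statistics import mean


def relative_quantity(ct_target, ct_references, ct_target_cal, ct_references_cal):
    """Relative expression of a target gene by the 2^(-ddCt) calculation.

    ct_references holds the Ct values of the reference genes in the sample;
    the *_cal arguments hold the same measurements in a calibrator sample.
    Averaging the reference Ct values is an assumption.
    """
    delta_sample = ct_target - mean(ct_references)
    delta_calibrator = ct_target_cal - mean(ct_references_cal)
    delta_delta = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: target gene plus two reference genes.
fold_change = relative_quantity(
    ct_target=24.0,
    ct_references=[18.0, 26.0],
    ct_target_cal=27.0,
    ct_references_cal=[18.0, 26.0],
)
```

In this hypothetical example the target gene crosses the threshold three cycles earlier than in the calibrator after normalization, which corresponds to an eight-fold higher relative amount.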
Evaluation of PFS-Related Genes in the External Dataset
To confirm whether our expression profile could predict the prognosis of serous ovarian cancer patients in an independent dataset, we chose publicly available microarray data (GSE9891), because this was the only dataset that also disclosed individual clinical characteristics including PFS time. We examined the clinical information of this dataset using its supplementary data. From the original dataset (n = 285), we selected 87 samples that were (i) diagnosed as advanced-stage serous adenocarcinoma, (ii) treated with platinum/taxane-based chemotherapy, (iii) obtained from the primary lesion, and (iv) followed up for more than 12 months (Table S1). These samples were histologically graded by the Silverberg classification, whose grading system differs from that used in this study.
Figure S1. Kaplan-Meier survival curves comparing the 110 patients in this dataset with the 87 patients in Tothill's dataset.
(0.24 MB TIF)
Figure S2. Analytical process to develop a prognostic index for predicting survival.
(0.48 MB TIF)
Figure S3. Assessment of the sensitivity and specificity of the 88-gene prognostic index using receiver-operating characteristic (ROC) curves. With early relapse treated as positive in the analysis, the area under the ROC curve for distinguishing early-relapse patients (progression-free survival time of less than 18 months) from late-relapse patients was 0.959 in (A) the discovery set (early, n = 54; late, n = 49) and 0.674 in (B) the external set (early, n = 45; late, n = 39).
(0.42 MB TIF)
Figure S4. Applying the PFS-related gene expression profile to Dressman's dataset. (A) Multivariate analysis showed a significant association of overall survival with the prognostic index estimated in Dressman's dataset using the 88-gene linear combination model with the ridge regression coefficients from the present discovery set (HR, 1.51; 95% CI, 1.19–1.93; p = 0.0008). (B) Kaplan-Meier survival curves and the log rank test showed that high-risk patients had shorter overall survival than low-risk patients (median survival, 31 and 87 months for high- and low-risk patients, respectively; p = 0.0008).
(0.23 MB TIF)
Figure S5. Molecular interaction networks of 88 progression-free survival-related genes using Ingenuity Pathway Analysis (IPA) software. The prognostic genes incorporated into the respective networks are marked in gray.
(2.42 MB TIF)
Figure S6. Molecular interaction networks of 88 progression-free survival-related genes using Ingenuity Pathway Analysis (IPA) software. The prognostic genes incorporated into the respective networks are marked in gray.
(1.68 MB TIF)
Figure S7. Molecular interaction networks of 88 progression-free survival-related genes using Ingenuity Pathway Analysis (IPA) software. The prognostic genes incorporated into the respective networks are marked in gray.
(1.82 MB TIF)
Table S1. Clinical characteristics of advanced-stage serous ovarian cancer patients in Tothill's dataset (n = 87).
(0.04 MB DOC)
Table S2. Univariate and multivariate Cox's proportional hazard model analysis of prognostic factors for progression-free survival.
(0.04 MB DOC)
We thank tissue donors and supporting medical staff for making this study possible. We are grateful to C. Seki and A. Yukawa for their technical assistance.
Conceived and designed the experiments: KY AT TY II KT. Performed the experiments: KY AT. Analyzed the data: KY AT. Contributed reagents/materials/analysis tools: KY TY SK HF MS YO MH KS HF YK KK HM HT HK II KT. Wrote the paper: KY AT TY II KT.
- 1. McGuire WP, Hoskins WJ, Brady MF, Kucera PR, Partridge EE, et al. (1996) Cyclophosphamide and cisplatin compared with paclitaxel and cisplatin in patients with stage III and stage IV ovarian cancer. N Engl J Med 334: 1–6.
- 2. Piccart MJ, Bertelsen K, James K, Cassidy J, Mangioni C, et al. (2000) Randomized intergroup trial of cisplatin-paclitaxel versus cisplatin-cyclophosphamide in women with advanced epithelial ovarian cancer: three-year results. J Natl Cancer Inst 92: 699–708.
- 3. Cannistra SA (2004) Cancer of the ovary. N Engl J Med 351: 2519–29.
- 4. du Bois A, Reuss A, Pujade-Lauraine E, Harter P, Ray-Coquard I, et al. (2009) Role of surgical outcome as prognostic factor in advanced epithelial ovarian cancer: a combined exploratory analysis of 3 prospectively randomized phase 3 multicenter trials: by the Arbeitsgemeinschaft Gynaekologische Onkologie Studiengruppe Ovarialkarzinom (AGO-OVAR) and the Groupe d'Investigateurs Nationaux Pour les Etudes des Cancers de l'Ovaire (GINECO). Cancer 115: 1234–44.
- 5. Winter WE 3rd, Maxwell GL, Tian C, Carlson JW, Ozols RF, et al. (2007) Prognostic factors for stage III epithelial ovarian cancer: a Gynecologic Oncology Group Study. J Clin Oncol 25: 3621–7.
- 6. Konstantinopoulos PA, Spentzos D, Cannistra SA (2008) Gene-expression profiling in epithelial ovarian cancer. Nat Clin Pract Oncol 5: 577–87.
- 7. van 't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, et al. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415: 530–6.
- 8. Motoori M, Takemasa I, Yano M, Saito S, Miyata H, et al. (2005) Prediction of recurrence in advanced gastric cancer patients after curative resection by gene expression profiling. Int J Cancer 114: 963–8.
- 9. Chen HY, Yu SL, Chen CH, Chang GC, Chen CY, et al. (2007) A five-gene signature and clinical outcome in non-small-cell lung cancer. N Engl J Med 356: 11–20.
- 10. Schramm A, Schulte JH, Klein-Hitpass L, Havers W, Sieverts H, et al. (2005) Prediction of clinical outcome and biological characterization of neuroblastoma by expression profiling. Oncogene 24: 7902–12.
- 11. Bonome T, Levine DA, Shih J, Randonovich M, Pise-Masison CA, et al. (2008) A gene signature predicting for survival in suboptimally debulked patients with ovarian cancer. Cancer Res 68: 5478–86.
- 12. Crijns AP, Fehrmann RS, de Jong S, Gerbens F, Meersma GJ, et al. (2009) Survival-related profile, pathways, and transcription factors in ovarian cancer. PLoS Med 6: e24.
- 13. Denkert C, Budczies J, Darb-Esfahani S, Györffy B, Sehouli J, et al. (2009) A prognostic gene expression index in ovarian cancer - validation across different independent data sets. J Pathol 218: 273–80.
- 14. Hartmann LC, Lu KH, Linette GP, Cliby WA, Kalli KR, et al. (2005) Gene expression profiles predict early relapse in ovarian cancer after platinum-paclitaxel chemotherapy. Clin Cancer Res 11: 2149–55.
- 15. Spentzos D, Levine DA, Ramoni MF, Joseph M, Gu X, et al. (2004) Gene expression signature with independent prognostic significance in epithelial ovarian cancer. J Clin Oncol 22: 4700–10.
- 16. Agarwal R, Kaye SB (2006) Expression profiling and individualization of treatment for ovarian cancer. Curr Opin Pharmacol 6: 345–9.
- 17. Yoshihara K, Tajima A, Komata D, Yamamoto T, Kodama S, et al. (2009) Gene expression profiling of advanced-stage serous ovarian cancers distinguishes novel subclasses and implicates ZEB2 in tumor progression and prognosis. Cancer Sci 100: 1421–8.
- 18. Bøvelstad HM, Nygård S, Størvold HL, Aldrin M, Borgan Ø, et al. (2007) Predicting survival from microarray data–a comparative study. Bioinformatics 23: 2080–7.
- 19. FIGO Cancer Committee. (1986) Staging Announcement: FIGO Cancer Committee. Gynecol Oncol 25: 383–5.
- 20. Tothill RW, Tinker AV, George J, Brown R, Fox SB, et al. (2008) Novel molecular subtypes of serous and endometrioid ovarian cancer linked to clinical outcome. Clin Cancer Res 14: 5198–208.
- 21. International Federation of Gynecology and Obstetrics (1971) Classification and staging of malignant tumours in the female pelvis. Acta Obstet Gynecol Scand 50: 1–7.
- 22. Silverberg SG (2000) Histopathologic grading of ovarian carcinoma: a review and proposal. Int J Gynecol Pathol 19: 7–15.
- 23. Kommoss S, Schmidt D, Kommoss F, Hedderich J, Harter P, et al. (2009) Histological grading in a large series of advanced stage ovarian carcinomas by three widely used grading systems: consistent lack of prognostic significance. A translational research subprotocol of a prospective randomized phase III study (AGO-OVAR 3 protocol). Virchows Arch 454: 249–56.
- 24. Woo HG, Park ES, Cheon JH, Kim JH, Lee JS, et al. (2008) Gene expression-based recurrence prediction of hepatitis B virus-related human hepatocellular carcinoma. Clin Cancer Res 14: 2056–64.
- 25. Dressman HK, Berchuck A, Chan G, Zhai J, Bild A, et al. (2007) An integrated genomic-based approach to individualized treatment of patients with advanced-stage ovarian cancer. J Clin Oncol 25: 517–25.
- 26. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Statist Soc B 57: 289–300.
- 27. Bild AH, Yao G, Chang JT, Wang Q, Potti A, et al. (2006) Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature 439: 353–7.
- 28. Dupuy A, Simon RM (2007) Critical review of published microarray studies for cancer outcome and guidelines on statistical analysis and reporting. J Natl Cancer Inst 99: 147–57.
- 29. Bair E, Tibshirani R (2004) Semi-supervised methods to predict patient survival from gene expression data. PLoS Biol 2: e108.
- 30. van Houwelingen HC, Bruinsma T, Hart AA, Van't Veer LJ, Wessels LF (2006) Cross-validated Cox regression on microarray gene expression data. Stat Med 25: 3201–16.
- 31. Rosenwald A, Wright G, Chan WC, Connors JM, Campo E, et al. (2002) The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N Engl J Med 346: 1937–47.
- 32. Sorlie T, Tibshirani R, Parker J, Hastie T, Marron JS, et al. (2003) Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci U S A 100: 8418–23.
- 33. Tavassoli FA, Devilee P (2003) Pathology and Genetics. Tumours of the Breast and Female Genital Organs. IARC Press, Lyon.
- 34. Malpica A, Deavers MT, Lu K, Bodurka DC, Atkinson EN, et al. (2004) Grading ovarian serous carcinoma using a two-tier system. Am J Surg Pathol 28: 496–504.
- 35. Goodman MT, Howe HL, Tung KH, Hotes J, Miller BA, et al. (2003) Incidence of ovarian cancer by race and ethnicity in the United States, 1992–1997. Cancer 97 (10 Suppl): 2676–85.
- 36. McGuire V, Jesser CA, Whittemore AS (2002) Survival among U.S. women with invasive epithelial ovarian cancer. Gynecol Oncol 84: 399–403.
- 37. Song H, Ramus SJ, Tyrer J, Bolton KL, Gentry-Maharaj A, et al. (2009) A genome-wide association study identifies a new ovarian cancer susceptibility locus on 9p22.2. Nat Genet 41: 996–1000.
- 38. Ozols RF (2005) Treatment goals in ovarian cancer. Int J Gynecol Cancer 15 (Suppl 1): 3–11.
- 39. Katsumata N, Yasuda M, Takahashi F, Isonishi S, Jobo T, et al. (2009) Dose-dense paclitaxel once a week in combination with carboplatin every 3 weeks for advanced ovarian cancer: a phase 3, open-label, randomised controlled trial. Lancet 374: 1331–8.
- 40. Berchuck A, Iversen ES, Luo J, Clarke JP, Horne H, et al. (2009) Microarray analysis of early stage serous ovarian cancers shows profiles predictive of favorable outcome. Clin Cancer Res 15: 2448–55.
- 41. Helleman J, Jansen MP, Span PN, van Staveren IL, Massuger LF, et al. (2006) Molecular profiling of platinum resistant ovarian cancer. Int J Cancer 118: 1963–71.
- 42. Reimer D, Sadr S, Wiedemair A, Stadlmann S, Concin N, et al. (2007) Clinical relevance of E2F family members in ovarian cancer–an evaluation in a training set of 77 patients. Clin Cancer Res 13: 144–51.
- 43. Callahan MJ, Nagymanyoki Z, Bonome T, Johnson ME, Litkouhi B, et al. (2008) Increased HLA-DMB expression in the tumor epithelium is associated with increased CTL infiltration and improved prognosis in advanced-stage serous ovarian cancer. Clin Cancer Res 14: 7667–73.
- 44. Lopez I, Mak EC, Ding J, Hamm HE, Lomasney JW (2001) A novel bifunctional phospholipase c that is regulated by Galpha 12 and stimulates the Ras/mitogen-activated protein kinase pathway. J Biol Chem 276: 2758–65.
- 45. Behrens P, Brinkmann U, Wellmann A (2003) CSE1L/CAS: its role in proliferation and apoptosis. Apoptosis 8: 39–44.
- 46. Tanaka T, Ohkubo S, Tatsuno I, Prives C (2007) hCAS/CSE1L associates with chromatin and regulates expression of select p53 target genes. Cell 130: 638–50.
- 47. Plotnikova OV, Golemis EA, Pugacheva EN (2008) Cell cycle-dependent ciliogenesis and cancer. Cancer Res 68: 2058–61.
- 48. Darcy KM, Brady WE, Blancato JK, Dickson RB, Hoskins WJ, et al. (2009) Prognostic relevance of c-MYC gene amplification and polysomy for chromosome 8 in suboptimally-resected, advanced stage epithelial ovarian cancers: a Gynecologic Oncology Group study. Gynecol Oncol 114: 472–9.
- 49. Iba T, Kigawa J, Kanamori Y, Itamochi H, Oishi T, et al. (2004) Expression of the c-myc gene as a predictor of chemotherapy response and a prognostic factor in patients with ovarian cancer. Cancer Sci 95: 418–23.
- 50. Therasse P, Arbuck SG, Eisenhauer EA, Wanders J, Kaplan RS, et al. (2000) New guidelines to evaluate the response to treatment in solid tumors. European Organization for Research and Treatment of Cancer, National Cancer Institute of the United States, National Cancer Institute of Canada. J Natl Cancer Inst 92: 205–16.
- 51. Okada H, Tajima A, Shichiri K, Tanaka A, Tanaka K, et al. (2008) Genome-wide expression of azoospermia testes demonstrates a specific profile and implicates ART3 in genetic susceptibility. PLoS Genet 4: e26.
- 52. Livak KJ, Schmittgen TD (2001) Analysis of Relative Gene Expression Data Using Real-Time Quantitative PCR and the 2^(−ΔΔCT) Method. Methods 25: 402–8.