Gene Expression Profile for Predicting Survival in Advanced-Stage Serous Ovarian Cancer Across Two Independent Datasets

Background: Advanced-stage ovarian cancer patients are generally treated with platinum/taxane-based chemotherapy after primary debulking surgery. However, there is a wide range of outcomes for individual patients. Therefore, the clinicopathological factors alone are insufficient for predicting prognosis. Our aim is to identify a progression-free survival (PFS)-related molecular profile for predicting survival of patients with advanced-stage serous ovarian cancer.

Methodology/Principal Findings: Advanced-stage serous ovarian cancer tissues from 110 Japanese patients who underwent primary surgery and platinum/taxane-based chemotherapy were profiled using oligonucleotide microarrays. We selected 88 PFS-related genes by a univariate Cox model (p<0.01) and generated the prognostic index based on 88 PFS-related genes after adjustment of regression coefficients of the respective genes by ridge regression Cox model using 10-fold cross-validation. The prognostic index was independently associated with PFS time compared to other clinical factors in multivariate analysis [hazard ratio (HR), 3.72; 95% confidence interval (CI), 2.66–5.43; p<0.0001]. In an external dataset, multivariate analysis revealed that this prognostic index was significantly correlated with PFS time (HR, 1.54; 95% CI, 1.20–1.98; p = 0.0008). Furthermore, the correlation between the prognostic index and overall survival time was confirmed in the two independent external datasets (log rank test, p = 0.0010 and 0.0008).

Conclusions/Significance: The prognostic ability of our index based on the 88-gene expression profile in ridge regression Cox hazard model was shown to be independent of other clinical factors in predicting cancer prognosis across two distinct datasets. Further study will be necessary to improve predictive accuracy of the prognostic index toward clinical application for evaluation of the risk of recurrence in patients with advanced-stage serous ovarian cancer.


Introduction
Patients with advanced-stage ovarian cancer generally undergo primary debulking surgery followed by platinum/taxane-based chemotherapy. Although the postoperative introduction of taxane drugs has improved the 5-year survival rate for advanced-stage ovarian cancer, patients with this cancer still have a 5-year survival rate of only 30% [1][2][3]. Clinicopathological characteristics, such as debulking status after primary surgery, are clinically considered important indicators of prognosis [4,5]. However, recurrence after optimal debulking surgery occurs in some patients, while disease-free status after incomplete surgery is maintained in others. In fact, it has been reported that 34% of patients treated with optimal surgery and platinum-taxane combination chemotherapy for advanced-stage ovarian cancer recur within 12 months [4]. Therefore, these clinicopathological factors alone are insufficient for predicting prognosis and elucidating the pathological mechanisms of disease progression or recurrence. Molecular biology approaches can be used to identify new prognosis-related profiles, leading to elucidation of the pathology of advanced-stage serous ovarian cancer.
Microarray technology has developed very rapidly, and it has become relatively easy to analyze the expression levels of thousands of genes within cancer cells. Although many studies have reported associations between gene expression profiles and prognosis in cancer patients [6][7][8][9][10], only a limited number of such profiles are used in clinical settings. Microarray technology is clinically applied for predicting prognosis in breast cancer patients: MammaPrint™ (Agendia BV, Amsterdam, the Netherlands) has already been put to practical use for this purpose. In contrast, there are as yet no microarray kits for clinical diagnosis and management of patients with ovarian cancer.
Three studies have recently reported gene expression profiles that predict overall survival (OS) in ovarian cancer patients using microarray techniques [11][12][13]. These studies used a relatively large sample size (n > 80) to establish a survival-related profile in the discovery phase, and an external independent dataset as the validation set, to address the problem that the number of genomic variables examined is much larger than the number of subjects. Thus, research on OS-related profiles in ovarian cancer patients has progressed, whereas there are no extensive studies based on multicenter validation of gene expression profiles for prediction of disease progression or recurrence in patients with ovarian cancer [14][15]. Prediction of the risk of recurrence in patients with advanced-stage ovarian cancer receiving standard treatments (primary surgery + platinum/taxane-based chemotherapy) is more important with respect to optimization of clinical management [16].
We have recently reported that there are high similarities in gene expression between early-stage serous ovarian cancer and a subset of advanced-stage serous ovarian cancer patients with favorable prognoses, and that patients with advanced-stage serous ovarian cancer fall into two molecular subgroups according to gene expression profiles reflecting tumor progression and prognosis [17]. In this study, we focused on progression-free survival (PFS) time in a larger number of patients, restricted to those with advanced-stage serous ovarian cancer treated with platinum/taxane-based chemotherapy, and sought to identify a PFS-related gene expression profile using a new survival analysis method, the ridge regression Cox model [18]. We then assessed the correlation between our PFS-related gene expression profile and survival time in an external independent dataset of advanced-stage serous ovarian cancer.

Clinical Characteristics
The clinical characteristics of the 110 Japanese patients with advanced-stage serous ovarian cancer are summarized in Table 1. In the discovery set, 93 patients (84.5%) were diagnosed with International Federation of Gynecology and Obstetrics (FIGO) stage III disease, and 17 patients (15.5%) with FIGO stage IV disease [19]. All patients received platinum/taxane-based chemotherapy after primary surgery. The median progression-free and overall survival times were 17 and 31 months, respectively.
We used a part of publicly available microarray data (GSE9891) as an external independent dataset (See Materials and Methods) [20]. The clinical characteristics of the 87 patients with advanced-stage serous ovarian cancer in the external dataset are listed in Table S1 [20]. Kaplan-Meier survival analysis showed that there were no significant differences in PFS and OS time between patients of the discovery dataset and those of the external dataset (Figure S1). When we compared clinicopathological characteristics between the discovery set and the external dataset, there were significant differences in the frequencies of stage (Table S1). Because the grading system adopted in the external dataset differed from that in the discovery set [21][22][23], we could not make a simple comparison of malignant grade between the two datasets. We then examined the association between clinicopathological features and PFS time in patients with advanced-stage serous ovarian cancer in each dataset. Multivariate analysis revealed that only optimal surgery was an independent prognostic factor for PFS in the discovery dataset (Table S2) and that there was a marginally significant correlation between debulking status of primary surgery and PFS time in the external dataset (Table S2). Therefore, we planned first to develop a prognostic index based on PFS-related genes in the discovery dataset, second to evaluate the prognostic ability of our index in the external dataset using multivariate analysis, and third to assess the predictive performance of the prognostic index again after stratification of patients according to the debulking status of primary surgery.

Identification of PFS-Related Profile
Using the Agilent Whole Human Genome Oligo Microarray, we generated gene expression data for 110 advanced-stage serous ovarian cancer patients. This dataset was then used as a discovery set for the identification of a PFS-related profile in patients with advanced-stage serous ovarian cancer. To further evaluate the PFS-related profile, we prepared a part of the GSE9891 dataset, generated with the Affymetrix Human Genome U133 Plus 2.0 Array, as an external independent dataset (See Materials and Methods) [20]. To deal with cross-platform microarray data appropriately, we analyzed only genes common to the two platforms (28,304 probes on the Agilent platform; 38,497 probes on the Affymetrix platform). Of the 28,304 Agilent probes, 18,178 probes with expression levels marked as "Present" in all 110 microarrays from the discovery set were further extracted to remove missing and uncertain signals, and the data were then per-gene normalized in each dataset by transforming the expression of each gene to a mean of 0 and a standard deviation of 1 (Figure S2). A univariate Cox proportional hazard model showed that the expression levels of 97 probes (representing 88 nonredundant genes) were correlated with PFS time (p < 0.01). For the 8 genes tagged by multiple probes (17 probes in total), we selected as representatives the 8 probes (one per gene) with the largest sum of squares of individual expression values [24]. A total of 88 genes (represented by 88 unique probes) were thereby identified as the PFS-related profile. Furthermore, we applied the ridge regression model to estimate optimal regression coefficients (β) for the 88 genes of the PFS-related profile (Table 2), and calculated the prognostic index for each sample using equation (1) as reported previously [18]. The 88-gene prognostic indices obtained ranged from -5.09 to 4.14 (median, 0.11), and the frequency distribution of the indices among the 110 patients was unimodal.
To assess the prognostic index as a categorical variable, we divided this dataset into two groups based on the median prognostic index of 0.11 [9]. Patients were assigned to the "high-risk" group if their prognostic index was greater than or equal to the median value, whereas the "low-risk" group was composed of cases with prognostic indices below the median. As shown in Figure 1A, patients with high-risk prognostic indices had shorter median PFS times than those belonging to the low-risk group (median PFS, 12 months vs. 51 months; log rank test, p < 0.0001).
We then performed univariate and multivariate Cox proportional hazard analyses to test whether the 88-gene prognostic index was an independent prognostic factor (Table 3). Univariate Cox proportional hazard analysis showed that the prognostic index, stage, optimal surgery, and histological grade were correlated with PFS (p < 0.0001, p = 0.022, p < 0.0001 and p = 0.016, respectively). Moreover, multivariate analysis showed that the prognostic index was most significantly associated with PFS time [hazard ratio (HR), 3.80; 95% confidence interval (CI), 2.68-5.61; p < 0.0001].
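The representative-probe rule described above (one probe per gene, chosen by the largest sum of squared expression values) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the probe IDs, the gene symbols, and the toy values are hypothetical, and the values are assumed to be on the scale before per-gene Z-transformation, since standardized probes would all have nearly identical sums of squares.

```python
from collections import defaultdict


def pick_representative_probes(expression, probe_to_gene):
    """Keep one probe per gene: the probe with the largest sum of squares.

    expression    -- dict: probe ID -> list of per-sample expression values
    probe_to_gene -- dict: probe ID -> gene symbol
    """
    # Group probe IDs by the gene they tag.
    probes_by_gene = defaultdict(list)
    for probe, gene in probe_to_gene.items():
        probes_by_gene[gene].append(probe)

    # For each gene, select the probe whose values have the largest magnitude.
    representatives = {}
    for gene, probes in probes_by_gene.items():
        representatives[gene] = max(
            probes, key=lambda p: sum(v * v for v in expression[p])
        )
    return representatives


# Hypothetical toy data: GENE_A is tagged by two probes, GENE_B by one.
expression = {
    "probe_a1": [0.5, -0.2, 0.1],
    "probe_a2": [1.5, -1.0, 0.8],
    "probe_b1": [0.3, 0.4, -0.6],
}
probe_to_gene = {
    "probe_a1": "GENE_A",
    "probe_a2": "GENE_A",
    "probe_b1": "GENE_B",
}
representatives = pick_representative_probes(expression, probe_to_gene)
```

Applied to the 97 selected probes, a rule of this kind would reduce the 17 probes of the 8 multiply tagged genes to 8 probes, giving 88 unique probes in total.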
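The median split can be written out in a few lines. The sketch below uses hypothetical index values chosen so that the median equals 0.11; only the rule itself (greater than or equal to the cut-off means high-risk) is taken from the text.

```python
from statistics import median


def classify_risk(indices, cutoff):
    """Label each prognostic index as high-risk (>= cutoff) or low-risk."""
    return ["high-risk" if value >= cutoff else "low-risk" for value in indices]


# Hypothetical prognostic indices for five patients.
toy_indices = [-2.0, -0.5, 0.11, 0.9, 3.2]
cutoff = median(toy_indices)
groups = classify_risk(toy_indices, cutoff)
```

Because the cut-off is estimated once in the discovery set, the same numeric value can be passed as `cutoff` when classifying patients from another dataset.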

Validation by Quantitative Real-Time RT-PCR
To validate the microarray expression data, we performed quantitative real-time RT-PCR for a subset of the discovery dataset (53 samples). The four genes, E2F2, FOXJ1, DNAH7, and FILIP1, were randomly selected for this purpose. There were significant correlations between microarray expression data and real-time RT-PCR expression data (Figure 2). In spite of the smaller sample size, we confirmed a significant association between PFS time and each of the real-time RT-PCR data for the four genes in the univariate Cox hazard model (data not shown).

Applying PFS-Related Profile to the External Dataset
We translated the 88 prognostic genes with Agilent Probe IDs to 196 Affymetrix probes using a translation function in GeneSpring GX 10 and evaluated the present PFS-related profile in the external dataset (Figure S2). We calculated the prognostic index for each sample in the external dataset as the weighted sum of the expression values of the 88 PFS-related genes according to equation (1), in which the ridge regression coefficients (β) identified in the discovery set were used as weights for the respective genes (See Materials and Methods). We obtained prognostic indices ranging from -5.37 to 4.56 in the external dataset. The frequency distribution of the prognostic indices was not statistically different from that in the discovery set by the Kolmogorov-Smirnov test (p > 0.05).
When we divided the external dataset into two subgroups by the median prognostic index (0.11) in the discovery set, a significant correlation was observed between risk classification and PFS (log rank test, p = 0.0004) (Figure 1B). In univariate analysis of the external data, the estimated prognostic index and optimal surgery were correlated with PFS time (p = 0.0001 and 0.049, respectively) (Table 3). Multivariate analysis showed that the prognostic index was an independent prognostic factor for PFS time (HR, 1.64; 95% CI, 1.27-2.13, p = 0.0001).

Assessment of Our Prognostic Index
To assess the sensitivity and specificity of our prognostic index, we used ROC curves for the index. An area under the ROC curve of 0.5 (indicated by diagonal dotted lines in Figure S3) represents equality between true positive and false positive test results. The extent to which the ROC curve departs from the diagonal line toward the left and top axes is a measure of the effectiveness of the 88-gene prognostic index in the prediction of clinical outcome. The area under the ROC curve for distinguishing early-relapse patients (PFS time of less than 18 months) from late-relapse patients was 0.959 in the discovery set and 0.674 in the external dataset (Figure S3). When we used the median value of the prognostic index in the discovery set as the cut-off, the sensitivity and specificity were 88.9% and 85.7% in the discovery dataset and 64.4% and 69.2% in the external dataset.
We performed survival analysis after stratification of patients according to the status of debulking surgery, which was an independent prognostic factor in multivariate analysis of the discovery dataset (Table 3). We divided patients into two groups ("optimal group" and "suboptimal group") in each of the discovery and external datasets, and assigned each patient to "high-risk" or "low-risk" based on the median value of the current prognostic index in each stratum according to the debulking status. Kaplan-Meier survival analysis showed that high-risk patients had significantly shorter PFS times than low-risk patients in each of the four strata from the two datasets (Figure 3), as follows: optimal group (p < 0.0001) and suboptimal group (p < 0.0001) in our dataset; optimal group (p = 0.0034) and suboptimal group (p = 0.015) in the external dataset. This stratified analysis also indicated that the prognostic index was associated with PFS time independently of the debulking status.
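For readers who want to see how such figures are obtained, the sketch below computes sensitivity and specificity at a fixed cut-off and the area under the ROC curve from the rank-based (Mann-Whitney) formulation. It is a simplified illustration, not the analysis pipeline of this study: the scores and labels are hypothetical, every patient is assumed to have a known early-relapse status (censoring before 18 months is ignored), and the function names are invented for this example.

```python
def sensitivity_specificity(scores, is_early_relapse, cutoff):
    """Treat score >= cutoff as a positive (high-risk) call."""
    pairs = list(zip(scores, is_early_relapse))
    true_pos = sum(1 for s, early in pairs if early and s >= cutoff)
    false_neg = sum(1 for s, early in pairs if early and s < cutoff)
    true_neg = sum(1 for s, early in pairs if not early and s < cutoff)
    false_pos = sum(1 for s, early in pairs if not early and s >= cutoff)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


def roc_auc(scores, is_early_relapse):
    """AUC as the probability that an early-relapse patient outscores a
    late-relapse patient, counting ties as one half."""
    early = [s for s, flag in zip(scores, is_early_relapse) if flag]
    late = [s for s, flag in zip(scores, is_early_relapse) if not flag]
    wins = 0.0
    for e in early:
        for l in late:
            if e > l:
                wins += 1.0
            elif e == l:
                wins += 0.5
    return wins / (len(early) * len(late))


# Hypothetical prognostic indices and relapse labels for six patients.
toy_scores = [2.1, 0.8, -0.3, 0.4, -1.2, -0.9]
toy_early = [True, True, True, False, False, False]
sens, spec = sensitivity_specificity(toy_scores, toy_early, cutoff=0.11)
auc = roc_auc(toy_scores, toy_early)
```

An AUC of 0.5 from `roc_auc` corresponds to the diagonal line described above, and values closer to 1 indicate better separation of early-relapse from late-relapse patients.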

Correlation between This Prognostic Index and Overall Survival
Overall survival is another important endpoint in patients with advanced-stage ovarian cancer, and hence we further examined whether the present 88-gene prognostic index could also be used to predict the overall survival of patients. To evaluate the correlation between this prognostic index and overall survival time, we performed Kaplan-Meier survival curve analysis. Patients with high-risk prognostic indices had shorter overall survival times than the low-risk patients in the two datasets (log rank test, p < 0.0001 and p = 0.0010, respectively) (Figure 1C, D). Furthermore, the prognostic index was significantly associated with overall survival time in both the discovery set and the external dataset in multivariate analysis (Table 4).
In addition, we examined the predictive ability of our prognostic index in the publicly available Dressman's dataset [25], in which patients were followed up for longer (median overall survival, 31 months; range, 1-185 months). Dressman's dataset [25] was composed of 119 advanced-stage serous ovarian cancer patients treated with platinum-based chemotherapy (including non-taxane chemotherapy). Because their data were generated on a different platform from the foregoing two datasets, 75% of the 88 PFS-related genes were translated for survival prediction in this dataset. When we divided Dressman's dataset [25] into two subgroups by the median prognostic index in the discovery dataset, a significant association was observed between risk classification and overall survival (log rank test, p = 0.0008) (Figure S4). Its prognostic index

Characterization of PFS-Related Profile
We conducted GO analysis to understand the biological characteristics of the 88 PFS-related genes. To characterize the gene list based on the GO classifications 'biological process', 'molecular function', and 'cellular component', we examined which categories were highly associated with the 88 genes. After multiple testing correction using the FDR method [26], 8 categories were significantly overrepresented (FDR q-value < 0.10) (Figure 4). Among the 88 PFS-related genes, genes involved in GTPase binding (GO17016, GO31267 and GO51020), cellular localization (GO51649 and GO51641), intracellular transport (GO46907 and GO6886), and/or ciliary or flagellar motility (GO1539) were notably enriched. We investigated similarities in overrepresented GO categories between our 88 PFS-related genes and previously reported gene expression profiles that were correlated with prognosis in ovarian cancer [11,13]. However, we could not identify common GO categories between our profile and the previously reported profiles (data not shown).
We further used IPA software to analyze the 88 PFS-related genes from the viewpoint of molecular interactions and pathways. The top three significant networks (score > 25) are shown in Figures S5-7. Network 1 included 15 of the 88 prognostic genes, and was significantly associated with several IPA-defined networks: cell death, neurological disease, and cellular assembly and organization (Figure S5). Fourteen prognostic genes were included in network 2, which was defined as networks related to cancer, cell morphology, and renal and urological disease (Figure S6). Network 3 displayed significant interactions and interrelations between genes involved in cell-to-cell signaling and interaction, hematological system development and function, and immune cell trafficking (Figure S7).
Among the 88 genes, we found several genes interacting with SRC or MYC (Figure S6), each of which has been reported as a representative gene in oncogenic pathways of ovarian cancer [25,27].

Discussion
In this study, we identified the prognostic index for predicting PFS time in patients with advanced-stage serous ovarian cancer treated with platinum/taxane-based adjuvant chemotherapy across two types of microarray expression data from the present discovery set and publicly available external set by using the ridge regression Cox model. The significant correlation between our prognostic index and OS time was also indicated in the two independent datasets.
In expression microarray analysis, there is the so-called "curse of dimensionality" problem: the number of genes is much larger than the number of samples. To improve the reliability of a gene expression-based prognostic model, it is necessary to avoid overfitting to the dataset, and to confirm the reproducibility of the predictive ability in external independent datasets [28]. Several bioinformatics approaches have been proposed to establish a model for survival prediction using microarray data [18,29]. Bøvelstad et al. [18] recently examined the prediction performance of the following seven methods: univariate selection, forward stepwise selection, principal components regression, supervised principal components regression, partial least squares regression, ridge regression and the lasso, using three microarray datasets [Dutch breast cancer data (n = 295), diffuse large B-cell lymphoma data (n = 240), and Norway/Stanford breast cancer data (n = 115)] [7][30][31][32]. They concluded that the univariate Cox model alone was insufficient for predicting survival and that the ridge regression Cox model demonstrated the best performance in the three datasets. Therefore, we used the univariate Cox model only for selecting genes related to PFS time, and adjusted the regression coefficients by the ridge regression Cox model in order to increase the predictive performance of the prognostic index in our dataset. The current study is intended to identify a gene expression profile with a better ability to predict prognosis than other clinicopathological factors. Stratification of patients with ovarian cancer according to clinicopathological prognostic factors is one important approach to identifying a highly accurate prognostic index [11].
After we stratified patients according to grade, FIGO stage, and status of debulking surgery, we investigated gene expression profiles for predicting PFS time in stage III grade 2/3 serous ovarian cancer patients who received optimal or suboptimal surgery. However, the prognostic indices from the stratified analyses showed poorer predictive performance than that from the non-stratified analysis (Table S3). Besides the reduction of sample size in the discovery and external datasets after stratification, differences in clinical features and grading systems between the two datasets (Table S1) might have influenced the results of these stratified analyses. This is the main reason why we planned to identify a prognostic index based on PFS-related genes in 110 advanced-stage serous ovarian cancers and then evaluate the significance of the prognostic index using multivariate analysis including grade, stage, and status of debulking surgery.
Although we enrolled ovarian cancer patients screened carefully by the following three criteria: advanced stage, serous histological type, and platinum/taxane-based chemotherapy after primary surgery, we established no inclusion or exclusion criterion of histological grade for enrollment, as in the study by Crijns and colleagues [12]. This is because a standard system for grading ovarian carcinomas has not yet been established worldwide, although several grading systems have been proposed for epithelial ovarian cancer [21][22][23]33,34]. According to the three criteria above, we recruited 110 Japanese ovarian cancer patients as a discovery set for the PFS analysis. The prognostic index for each patient was simply calculated as the ridge-regression-weighted sum of the 88-gene expression values, and the prognostic power of our index was assessed using Tothill's dataset [20]. Further, subsequent stratified analysis according to debulking status, which was an independent prognostic factor in multivariate analysis of the discovery dataset, indicated that our prognostic index was associated with PFS time independently of the debulking status. However, the sensitivity and specificity of the prognostic index for discriminating between early- and late-relapse patients were lower in Tothill's dataset than in the discovery set. This might be caused by differences in background with respect to ethnicity or microarray platform. Although differences in gene expression of cancer tissues among ethnicities have not been reported previously, several studies indicate that the proportions of clear cell and endometrioid histological types in epithelial ovarian cancer are higher in Asian populations than in non-Asian populations [35,36]. A recent genome-wide association study has identified a single nucleotide polymorphism at 9p22 associated with ovarian cancer risk in subjects with European ancestry but not in non-European descendants [37].
Differences of this type between studies could also be attributed to genetic as well as environmental factors. In addition, we cannot rule out the possibility that the present PFS-associated classifiers with ridge-regression-based weights still have insufficient generalization properties on the external dataset due to the problem of overfitting. Therefore, we will reconsider these important issues, such as between-study differences in ethnicities and microarray platforms and the overfitting problem, using a larger number of microarray data from advanced-stage serous ovarian cancer patients in order to obtain better classifiers for the prediction of prognosis. In addition, to improve the accuracy of the prognostic index, development of a prognostic index after stratification of patients will be a subject for further study. Interestingly, the present 88-gene prognostic index for prediction of PFS time was also significantly associated with overall survival time in both our dataset and Tothill's dataset [20]. Moreover, we examined the predictive ability of our prognostic index in Dressman's dataset [25], since patients in their dataset received longer-term follow-up than those in the above two datasets. Although Dressman's dataset (n = 119) [25] included 34 patients treated with platinum/cyclophosphamide chemotherapy and 3 with single-agent platinum, the significance of this prognostic index for overall survival was still statistically supported in this dataset with longer follow-up. As treatments for recurrent ovarian cancer patients remain an open area of investigation aiming to lead to survival benefit [38], our prognostic index for patients with advanced-stage serous ovarian cancer displays a potential to predict not only PFS time but also overall survival time.
In the future, we may apply the prognostic indices to estimation of risk of recurrence for serous ovarian cancer patients and select a novel treatment such as dose-dense chemotherapy [39] or molecular-targeted agent for the purpose of improving prognosis of high-risk patients.
Only a small number of genes overlap between our 88-gene PFS-related profile and previously reported expression profiles that were related to prognosis or sensitivity to platinum/taxane-based chemotherapy [11][12][13][14][15]40,41]. Konstantinopoulos et al. [6] have discussed that these discrepancies might be related to the use of different microarray platforms with different normalization methods and different degrees of contamination by noncancerous cells in a tumor sample, as well as differences in the patient populations under study. Nevertheless, several survival-associated genes such as E2F2 and HLA-DMB [42,43] are included in the 88 PFS-related genes. Reimer et al. [42] have reported that E2F2 is associated with grade 3 ovarian tumors and residual disease (more than 2 cm in diameter) after initial surgery, and that low E2F2 expression is significantly associated with favorable disease-free and overall survival in epithelial ovarian cancer. Callahan et al. [43] have recently reported that high expression of HLA-DMB in ovarian cancer cells is correlated with increased numbers of tumor-infiltrating CD8-positive T lymphocytes, and with good prognosis in advanced-stage high-grade serous ovarian cancer.
We performed GO analysis and IPA to assess the biological characteristics of the PFS-related genes. GO analysis revealed significant associations of GTPase binding, intracellular transport, and ciliary or flagellar motility with PFS (Figure 4). PLCE1 belongs to the GTPase binding category and activates MAP kinase or ERK as shown in IPA network 3 (Figure S7). In particular, a previous report indicates that PLCE1 activates small G protein Ras/MAP kinase signaling [44], which is one of the important pathways associated with cell growth and differentiation. Intriguingly, CSE1L, included in the intracellular transport category, is involved in the regulation of multiple cellular mechanisms, proliferation, and apoptosis [45]. Tanaka et al. [46] have reported that CSE1L is associated with regulated expression of p53 target genes, and that downregulation of CSE1L protects cancer cells from DNA damage-induced apoptosis. DNAH2 and DNAH7 are components of the inner dynein arm of ciliary axonemes, and axonemal dyneins are molecular motors that drive the beating of cilia and flagella. Plotnikova et al. [47] have reported that loss of cilia in cancer cells may contribute to the insensitivity of cancer cells to environmental repressive signals, partly owing to derangement of cell cycle checkpoints governed by cilia and centrosomes. On the other hand, IPA showed several genes interacting with SRC or MYC (Figure S6), each of which has been reported as a representative gene in oncogenic pathways of ovarian cancer [25,27]. Dressman et al. [25] have demonstrated that Src pathway activity is associated with chemotherapy response because of a significant correlation between activation of the Src pathway and poor prognosis in patients with platinum-resistant ovarian cancer. MYC is a multifunctional proto-oncogene and is activated in about 30% of ovarian cancers by several mechanisms [48]. Iba et al. [49] report that MYC expression is associated with responsiveness to platinum-based chemotherapy and with prognosis in patients with epithelial ovarian cancer. Our PFS-related profile might have functional relevance to altered activities of several oncogenic pathways. Although we identified several genes whose molecular function could be linked to prognosis in ovarian cancer patients, further functional study will be necessary to clarify the biological and pathological implications of the PFS-related profile.
These results suggest that the gene expression profile could be a useful tool to predict disease progression or recurrence of advanced-stage serous ovarian cancer. To apply the gene expression profile in clinical practice, we will need to improve the predictive ability of the profile and confirm the reliability of the survival profile in a prospective multi-center study. Nevertheless, the survival-related profile could contribute to the optimization of clinical management and the development of new therapeutic strategies for serous ovarian cancer patients.

Tissue Samples
One hundred ten Japanese patients who were diagnosed with advanced-stage serous ovarian cancer between July 1997 and June 2008 were included in this study. Fresh-frozen samples were obtained from primary tumor tissues during primary debulking surgery prior to chemotherapy. All patients with advanced-stage serous ovarian cancer were treated with platinum/taxane-based chemotherapy after surgery. In principle, patients were seen every 1 to 3 months for the first 2 years. Thereafter, follow-up visits had an interval of 3 to 6 months in the third to fifth year, and 6 to 12 months in the sixth to tenth year. At every follow-up visit, general physical and gynecologic examinations were performed. CA125 serum levels were routinely determined. Staging of the disease was assessed according to the criteria of the International Federation of Gynecology and Obstetrics (FIGO) [19]. Optimal debulking surgery was defined as ≤1 cm of gross residual disease. The histological characteristics of surgically resected specimens were assessed on formalin-fixed and paraffin-embedded hematoxylin and eosin sections by two or three gynecological pathologists belonging to the Japanese Society of Pathology at each institute, and frozen tissues containing more than 80% tumor cells upon histological evaluation were used for RNA extraction. In this study, the degree of histological differentiation was determined according to the increase in the proportion of solid growth within the adenocarcinoma as follows: grade 1, less than 5% solid growth; grade 2, 6-50% solid growth; grade 3, over 50% solid growth, based on the grading system proposed by the Japan Society of Gynecologic Oncology.
PFS time was calculated as the interval from primary surgery to disease progression or recurrence. Based on standard Response Evaluation Criteria In Solid Tumors (RECIST) guidelines [50], disease progression was defined as at least a 20% increase in the sum of the longest diameters of all target lesions, or as the appearance of one or more new lesions and/or unequivocal progression of existing non-target lesions. Overall survival time was calculated as the interval from primary surgery to death due to ovarian cancer.

Microarray Experiments
Total RNA was extracted from tissue samples as previously described [17]. Five hundred nanograms of total RNA were converted into labeled cRNA with nucleotides coupled to cyanine 3-CTP (Cy3) (PerkinElmer, Boston, MA, USA) using the Quick Amp Labeling Kit, one-color (Agilent Technologies). Cy3-labeled cRNA (1.65 µg) was hybridized for 17 hours at 65°C to an Agilent Whole Human Genome Oligo Microarray, which carries 60-mer probes to more than 40,000 human transcripts. The hybridized microarray was washed and then scanned in the Cy3 channel with the Agilent DNA Microarray Scanner (model G2565AA). Signal intensity per spot was generated from the scanned image using Feature Extraction Software version 9.1 (Agilent Technologies) with the default settings. Spots that did not pass quality control procedures were flagged as "Absent". The MIAME-compliant microarray data were deposited into the Gene Expression Omnibus data repository (accession number GSE17260).

Microarray Data Analysis
We analyzed our dataset as a "discovery set" and the publicly available dataset as an "external dataset". To account for differences between microarray platforms, we restricted the analysis to genes common to the Agilent Whole Human Genome Oligo Microarray and the Affymetrix Human Genome U133 Plus 2.0 Array, which was the platform used in the external dataset (GSE9891) [20].
Data normalization was performed in GeneSpring GX 10 (Agilent Technologies) as follows: (i) raw signals were thresholded at 1.0; (ii) 75th percentile normalization was chosen as the normalization algorithm; (iii) the baseline was transformed to the median of all samples. Furthermore, the expression level was normalized by Z-transformation (the mean expression was set to 0 and the standard deviation to 1 for each gene in each dataset). In our dataset, only the 18,178 probes with expression levels marked as "Present" in all microarrays were used, in order to exclude missing and uncertain gene expression signals.
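The normalization steps described above can be sketched in a few lines of Python. This is an illustrative approximation, not the GeneSpring GX implementation: the log2 transform, the "inclusive" percentile method, the use of the sample standard deviation, and the function name `normalize` are all assumptions made for this example.

```python
import math
import statistics


def normalize(raw):
    """Approximate the described normalization on a small data set.

    raw: dict mapping sample name -> list of raw signals, with probes in
    the same order for every sample. Returns the same shape, Z-transformed.
    """
    samples = list(raw)
    n_probes = len(raw[samples[0]])

    # (i) Threshold raw signals at 1.0, then log2-transform (assumed step).
    logged = {s: [math.log2(max(v, 1.0)) for v in raw[s]] for s in samples}

    # (ii) 75th percentile normalization: shift each sample by its own
    # 75th percentile (the percentile method here is an assumption).
    shifted = {}
    for s in samples:
        p75 = statistics.quantiles(logged[s], n=4, method="inclusive")[2]
        shifted[s] = [v - p75 for v in logged[s]]

    z = {s: [0.0] * n_probes for s in samples}
    for j in range(n_probes):
        column = [shifted[s][j] for s in samples]
        # (iii) Baseline transformation: subtract the per-probe median.
        median = statistics.median(column)
        centered = [v - median for v in column]
        # Z-transformation: mean 0 and standard deviation 1 per probe.
        mean = statistics.mean(centered)
        sd = statistics.stdev(centered)  # sample SD; an assumption
        for s, v in zip(samples, centered):
            z[s][j] = (v - mean) / sd if sd > 0 else 0.0
    return z
```

Because the Z-transformation is applied within each dataset separately, a sketch like this would be run once on the discovery set and once on the external dataset rather than on the pooled data.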
The PFS-related genes among the 18,178 probes were identified by univariate Cox proportional hazard analysis, followed by ridge regression, a penalized Cox regression analysis for survival prediction (Figure S2). We first identified 97 probes whose expression levels correlated with PFS time in the univariate Cox proportional hazard model (p<0.01). Where multiple probes represented a given gene (a so-called multiple tagged gene) on the microarray, only the probe with the largest magnitude (i.e., the sum of the squares of per-individual expression values) was retained as the representative probe for that gene [24]. To avoid overfitting, the ridge regression extension of the multivariate Cox model was employed [18]. Ridge regression shrinks the regression coefficients (β) of genes in the multivariate Cox model by imposing a penalty on the squared values of the coefficients, and can appropriately handle the situation in which the number of expression values exceeds the number of individuals [30]. We estimated regression coefficients of the prognostic genes by the ridge regression Cox model using M-files (available at http://www.med.uio.no/imb/stat/bmms/software/microsurv/) for MATLAB (Mathworks, Natick, MA, USA). Using 10-fold cross-validation, we obtained regression coefficients with the optimal penalty parameter for the penalized Cox model, and calculated a prognostic index for each patient as defined by

\text{Prognostic index} = \sum_{i} \beta_i X_i

where \beta_i is the estimated regression coefficient of each gene in the discovery dataset under the ridge regression multivariate Cox model and X_i is the Z-transformed expression value of each gene [18]. The regression coefficient of each PFS-related gene estimated by ridge regression in the discovery set was also applied to calculate a prognostic index for each patient in the external dataset using the equation above. We classified all patients into two groups (high- and low-risk groups) by the median of the prognostic index in the discovery set [9].
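As a minimal sketch of the bookkeeping in this paragraph, the following Python functions pick a representative probe per gene, compute the prognostic index as a linear combination of ridge coefficients and Z-transformed expression values, and split patients by a cutoff. Fitting the ridge Cox model itself is not shown. The function names, gene and probe labels, and the rule that a patient exactly at the cutoff is assigned to the low-risk group are assumptions for illustration only.

```python
import statistics


def representative_probes(expr, probe_to_gene):
    """For each gene, keep the probe with the largest sum of squared values.

    expr: dict probe -> list of per-patient expression values.
    probe_to_gene: dict probe -> gene symbol.
    """
    best = {}
    for probe, values in expr.items():
        gene = probe_to_gene[probe]
        magnitude = sum(v * v for v in values)
        if gene not in best or magnitude > best[gene][1]:
            best[gene] = (probe, magnitude)
    return {gene: probe for gene, (probe, _) in best.items()}


def prognostic_index(z_expr, coef):
    """Sum of coefficient times Z-transformed expression over all genes.

    z_expr: dict gene -> Z-transformed expression for one patient.
    coef: dict gene -> ridge regression coefficient from the discovery set.
    """
    return sum(coef[gene] * z_expr[gene] for gene in coef)


def classify(indices, cutoff):
    """Assign 'high' risk above the cutoff, 'low' otherwise (tie rule assumed)."""
    return {pid: ("high" if pi > cutoff else "low") for pid, pi in indices.items()}


# Toy example with hypothetical coefficients and patients.
coef = {"GENE_A": 0.5, "GENE_B": -0.25}
patients = {
    "p1": {"GENE_A": 1.0, "GENE_B": 2.0},
    "p2": {"GENE_A": 2.0, "GENE_B": 0.0},
    "p3": {"GENE_A": -1.0, "GENE_B": 1.0},
}
indices = {pid: prognostic_index(x, coef) for pid, x in patients.items()}
cutoff = statistics.median(indices.values())  # median of the discovery set
groups = classify(indices, cutoff)
```

For an external dataset, the same `coef` and the same discovery-set `cutoff` would be reused, with only the Z-transformed expression values coming from the new patients.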
PFS in the high- and low-risk groups was compared using Kaplan-Meier curves and the log rank test in GraphPad PRISM version 4.0 (GraphPad Software, San Diego, CA, USA). We then evaluated the prognostic index in the multivariate Cox proportional hazard model using JMP version 6 (SAS Institute, Cary, NC, USA). We also examined how well the prognostic index discriminated between early and late relapse by plotting a receiver operating characteristic (ROC) curve for each dataset (JMP). Because 18 months is the median PFS time for advanced-stage ovarian cancer patients treated with cisplatin-paclitaxel [1], we used 18 months as the cut-off between early and late relapse. ROC curve analysis of the prognostic index was restricted to patients with follow-up of more than 18 months (discovery set, 103 samples; external dataset, 84 samples).
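The early-versus-late relapse comparison can be illustrated by computing the area under the ROC curve directly from pairwise comparisons. This is a sketch rather than the JMP procedure used in the study; the record layout, the function name `relapse_auc`, and the convention that "early" means progression before the 18-month cutoff are assumptions.

```python
def relapse_auc(records, cutoff=18.0):
    """Area under the ROC curve for early vs. late relapse.

    records: iterable of (prognostic_index, pfs_months, progressed, followup_months).
    Patients with follow-up of `cutoff` months or less are excluded, because
    their relapse status at the cutoff cannot be determined.
    """
    early, late = [], []
    for index, pfs_months, progressed, followup_months in records:
        if followup_months <= cutoff:
            continue
        if progressed and pfs_months < cutoff:
            early.append(index)
        else:
            late.append(index)
    if not early or not late:
        raise ValueError("need at least one early and one late relapse patient")

    # AUC equals the probability that a randomly chosen early-relapse patient
    # has a higher prognostic index than a late-relapse patient (ties count 0.5).
    score = 0.0
    for e in early:
        for l in late:
            if e > l:
                score += 1.0
            elif e == l:
                score += 0.5
    return score / (len(early) * len(late))


# Hypothetical records for illustration only.
records = [
    (1.2, 6, True, 30),
    (0.8, 12, True, 40),
    (-0.5, 25, True, 36),
    (0.9, 40, False, 40),
    (2.0, 10, False, 10),  # follow-up of 10 months: excluded
]
auc = relapse_auc(records)
```

An AUC of 0.5 would indicate no discrimination between early and late relapse, and values closer to 1 indicate that higher prognostic indices are concentrated among early-relapse patients.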
To investigate the biological functions of the PFS-related gene expression profile, we used the GO Ontology Browser embedded in GeneSpring GX [17,51]. The GO Ontology Browser was used to determine which gene ontology categories were statistically overrepresented in the resulting gene list. Statistical significance was determined by Fisher's exact test, followed by multiple testing correction using the Benjamini and Hochberg false discovery rate (FDR) method [26]. Furthermore, we explored molecular interaction networks among the PFS-related genes using Ingenuity Pathway Analysis (IPA) [17].
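The statistical core of the overrepresentation analysis can be sketched as a one-sided Fisher's exact test (computed from the hypergeometric distribution) followed by Benjamini and Hochberg adjustment. The code below is an illustration under assumptions: GeneSpring's exact test configuration is not stated in the text, so the one-sided form and both function names are choices made for this example.

```python
from math import comb


def fisher_overrepresentation_p(k, n, K, N):
    """One-sided Fisher's exact p-value for overrepresentation.

    N: genes in the background; K: background genes in the GO category;
    n: genes in the list of interest; k: list genes in the category.
    Returns P(X >= k) under the hypergeometric distribution.
    """
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In use, `fisher_overrepresentation_p` would be evaluated once per GO category, and the resulting list of p-values passed to `benjamini_hochberg` to control the FDR across categories.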

Evaluation of PFS-Related Genes in the External Dataset
To confirm whether our expression profile could predict the prognosis of serous ovarian cancer patients in an independent dataset, we used publicly available microarray data (GSE9891), chosen because this dataset also disclosed individual clinical characteristics, including PFS time. We examined the clinical information of this dataset using its supplementary data [20]. From the original dataset (n = 285), we selected 87 samples that were (i) diagnosed as advanced-stage serous adenocarcinoma, (ii) treated with platinum/taxane-based chemotherapy, (iii) obtained from the primary lesion, and (iv) followed up for more than 12 months (Table S1). These samples were histologically graded by the Silverberg classification [22], a grading system that differs from the one used in this study.

Figure S4. Applying the PFS-related gene expression profile to Dressman's dataset [25]. (A) Multivariate analysis showed a significant association of overall survival with the prognostic index estimated using the 88-gene linear combination model with the ridge regression coefficients from the present discovery set in Dressman's dataset (HR, 1.51; 95% CI, 1.19-1.93; p = 0.0008). (B) Kaplan-Meier survival curves and the log rank test showed that high-risk patients had shorter overall survival than low-risk patients (median survival, 31 and 87 months for high- and low-risk patients, respectively; p = 0.0008).