Exploring Gene Expression Signatures for Predicting Disease Free Survival after Resection of Colorectal Cancer Liver Metastases

Background and Objectives This study was designed to identify and validate gene signatures that can predict disease free survival (DFS) in patients undergoing a radical resection for their colorectal liver metastases (CRLM). Methods Tumor gene expression profiles were collected from 119 patients undergoing surgery for their CRLM in the Paul Brousse Hospital (France) and the University Medical Center Utrecht (The Netherlands). Patients were divided into high and low risk groups. A randomly selected training set was used to find predictive gene signatures. The ability of these gene signatures to predict DFS was tested in an independent validation set comprising the remaining patients. Furthermore, 5 known clinical risk scores were tested in our complete patient cohort. Result No gene signature was found that significantly predicted DFS in the validation set. In contrast, three out of five clinical risk scores were able to predict DFS in our patient cohort. Conclusions No gene signature was found that could predict DFS in patients undergoing CRLM resection. Three out of five clinical risk scores were able to predict DFS in our patient cohort. These results emphasize the need for validating risk scores in independent patient groups and suggest improved designs for future studies.


Introduction
Colorectal cancer is the third most common cancer in men and the second in women worldwide, accounting for approximately 608.000 deaths worldwide [1]. The liver is the most common and often only site of metastatic disease. The development of liver metastases in about 50% of patients is the major determinant of survival in patients with colorectal cancer. Surgical resection is the best treatment option for patients with colorectal liver metastasis offering a median survival of over 40 months after resection compared to a median survival of 18 months when treated with chemotherapy and 6 to12 months if patients remain untreated [2]. Unfortunately, 60%-80% of patients will develop local or distant recurrences after R0 resection of colorectal liver metastasis [2][3][4][5]. Patients with recurrence are likely to benefit from adjuvant chemotherapy. However, 20-40% of the patients do not develop recurrence and will probably better be left untreated after liver resection. Since chemotherapy is associated with serious morbidity and mortality, the therapy-associated risk should therefore be justified by a significant improvement in survival of these patients.
Many research groups have attempted to define factors predicting disease free survival and overall survival (OS) after resection of liver metastasis [5,6]. Recently, five published clinical risk scores, combining different clinical factors, were validated in an independent patient cohort demonstrating that two clinical risk scores were able to predict overall survival in an independent set of patients [7]. Prediction of (disease-free) survival might be improved by the use of gene expression which might capture tumor properties not reflected by clinicopathological variables.
Genome wide gene expression profiling has been used to predict disease outcome or response to therapy in many different tumor types [8,9] It has also been shown that expression profiling can be used to identify colorectal tumors with different aggressiveness and metastatic potential [10][11][12][13]. No study, however, has been published in which gene expression was used to predict disease free survival after resection of colorectal liver metastasis. Identification of a gene signature able to identify recurrence-prone colorectal liver metastases at time of resection would open the way for selection of patients who are likely to benefit from aggressive therapy after resection, while withholding others unnecessary treatment.

Patients and Tumor Samples
Hundred forty-eight patients met the in-and exclusion criteria expression. Profiles were successfully obtained for 119 patients. The baseline characteristics of the 119 included patients, shown in Table 1, did not differ significantly between the high versus low risk group, with the exception of administration of chemotherapy. High-risk patients received neoadjuvant chemotherapy more frequently and adjuvant chemotherapy less frequently than lowrisk patients. Patient samples had a mean tumor cell percentage of  Table S1.

Gene Expression Signature
Using the training set of 75 patients from both centers, a gene signature was discovered consisting of 20 genes (Table S2). This was the most predictive gene signature as measured within the training set, able to predict disease free survival with high statistical significance ( Figure 1A). When used to predict risk for the patients in the independent validation set of 44 patients, however, this gene signature was unable to significantly predict DFS ( Figure 1B). This points to overfitting on the training set patients, a fact underscored by the area under the curve (AUC) of 0.508 (95%CI 0.482-0.534) achieved during the signature discovery (see Methods). The power of the log-rank test used is shown in Figure S1. An analysis to find functional enrichment for the 20 genes in the signature failed to find any significant enrichment. Having failed to find a predictive gene signature, we examined whether a stricter definition of the high and low risk groups would result in a better gene signature by dividing the training set into a high risk group of patients with a DFS less than 6 months and a low risk group with a DFS of at least 2 years ( Figure 2B). Although the validation results of this gene signature seemed to show a positive trend it also failed to reach significance ( Figure 1C).
Some of the clinicopathological factors differed significantly between the patients from the Paul Brousse Hospital and the University Medical Center in Utrecht (Table S3). To explore the possibility that the previous failure to find a predictive gene signature might have been caused by these differences, the gene signature discovery was repeated for the UMC Utrecht samples and the Paul Brousse samples separately. The gene signature derived from the UMC Utrecht data alone did not hold any predictive power when validated ( Figure 1D). Validation of the Paul Brousse gene signature, however, did show a positive trend ( Figure 1E). The result of a multivariate Cox regression, however, suggests that the gene signature is not an independent predictive factor ( Table 2). Stage of the primary tumor and the administration of neoadjuvant chemotherapy seemed sufficient to predict DFS within the validation set. It is possible that neoadjuvant chemotherapy, which is administered before the sample collection, had an effect on the gene expression pattern and was therefore an interfering factor in the experimental setup. This is confirmed by the absence of predictive power when the signature discovery was performed exclusively on Paul Brousse patients who did receive neoadjuvant chemotherapy ( Figure 1F). Additionally, an analysis of the genes differentially expressed between patients treated with neoadjuvant chemotherapy and untreated patients revealed 875 genes that were significantly up-or downregulated (Table S4) suggesting that neoadjuvant chemotherapy induces a sizeable change in the measured gene expression. To investigate whether the absence of a predictive signature was caused by the neoadjuvant treatment bias in the high risk group the signature discovery was repeated using training sets were this bias was removed ( Figure 3) as well as analyzing the neoadjuvant treated and untreated patients separately ( Figure 4). The results strongly suggest that the absence of a predictive signature is independent of the effects of neoadjuvant treatment, adding the caveat that in some of these comparisons the sample size is low. Table S5 shows the predictive performance of all the gene signatures described above when used to predict DFS redefined as a dichotomous variable.
All items of the clinical risk scores were documented except for the status of the hepatoduodenal lymph nodes, which made it impossible for the risk score of Zakaria to be higher than 2. Because we did not include patients with extrahepatic disease in this study, the Basingstoke risk score was not complete. Three out of five clinical risk scores predicted DFS accurately in our patients including the Basingstoke, Fong and Nordlinger risk scores (Table 3). Of these, the score by Fong performed best. Kaplan Meier curves for high and low risk predicted patients, based on the different clinical scores, are depicted in Figure 5.

Discussion
This study was designed to identify and validate a gene expression based classifier that predicts DFS. Unfortunately, we were unable to find a gene signature that could significantly predict DFS in an independent validation set. A gene signature developed using only Paul Brousse patient samples did show a positive trend upon validation. However, in a multivariate Cox regression model, the signature did not prove to be an independent factor for DFS. Instead of reflecting tumor biology, the gene signature appeared to be influenced by a bias in prior administration of chemotherapy, a possibility which should be taken into account when conducting future studies. This view was strengthened both by the absence of predictive power in a gene signature designed in a subset including only Paul Brousse patients receiving neoadjuvant chemotherapy as well as an analysis of differential gene expression between patients treated with neoadjuvant chemotherapy and untreated patients which showed 875 genes differentially expressed. To rule out that the absence of a predictive gene signature was caused by the neoadjuvant treatment bias in the high risk patient group, the signature discovery was repeated using training sets were the neoadjuvant bias was removed as well as analyzing the neoadjuvant treated and untreated patients separately. Similar to earlier results of this study the resulting gene signatures were not predictive of DFS in the brackets. The p value of the log-rank test is shown as well, with the p value adjusted for multiple testing between brackets. A: Survival curves for patients in training set. Gene signature was discovered using the same training set. B: Survival curves for patients in the validation set. Gene signature was discovered using the full training set. C: Survival curves for patients in the validation set. Gene signature was discovered using the full training set defining high risk as DFS #6 months and low risk as DFS .2 years. D: Survival curves for UMC Utrecht patients in the validation set. Gene signature was discovered using the UMC Utrecht subset of the training set. E: Survival curves for Paul Brousse patients in the validation set. Gene signature was discovered using the Paul Brousse subset of the training set. F: Like E but including only Paul Brousse patients who received neoadjuvant chemotherapy (training and validation set). doi:10.1371/journal.pone.0049442.g001 validation set indicating that the overrepresentation of neoadjuvant treatment in the high risk patient group does not explain the lack of positive results.
We also tested five known clinical risk scores and found that Basingstoke, Fong and Nordlinger significantly predicted DFS in our patient group. The fact that three out of five scores were predictive is remarkable given the fact that these clinical risk scores  year unless mentioned otherwise. In all training sets the ratio of patients treated with neoadjuvant chemotherapy to untreated patients in high and low risk group was kept as equal as possible to preclude any treatment bias. The hazard ratio of the gene signature prediction is shown with the 95% confidence interval between brackets. The p value of the log-rank test is shown as well, with the p value adjusted for multiple testing between brackets. A: Survival curves for patients in the validation set. Gene signature was discovered using the full training set controlled for the neoadjuvant treatment bias. B: Survival curves for patients in the validation set. Gene signature was discovered using the full training set defining high risk as DFS #6 months and low risk as DFS .2 years and controlling for the neoadjuvant treatment bias. C: Survival curves for UMC Utrecht patients in the validation set. Gene signature was discovered using the UMC Utrecht subset of the training set controlled for the neoadjuvant treatment bias. D: Survival curves for Paul Brousse patients in the validation set. Gene signature was discovered using the Paul Brousse subset of the training set controlled for the neoadjuvant treatment bias. doi:10.1371/journal.pone.0049442.g003 (CRS) were developed in an era where the use chemotherapy in primary CRC was rare [14][15][16][17][18]. The same five clinical risk scores were recently validated by Reissfelder and colleagues. They found that the Fong and Iwatsuki scores were able to predict disease specific survival in their patients but not Nordlinger and the Basingstoke index [7]. It is remarkable that only the Fong score was predictive in both studies. The non-significant correlation of the Iwatsuki score with DFS could be due to the fact that the highest score could not be calculated, since we did not record the status of the hepatoduodenal lymph nodes. The question remains: why did we not find a signature predicting DFS after resection of colorectal liver metastases? Difficulties in predicting (disease free) survival with gene expression profiling have been reported recently. Lauss et al evaluated the performance of 8 published gene signatures in predicting recurrence in bladder cancer of which none survived the validation [19]. A review evaluating gene signatures developed for predicting survival in lung cancer in 16 studies were all found inadequate for use in clinical practice because of lacking or insufficient validation. In these studies, either the signature did not outperform clinical factors or the authors did not address the influence of any of the clinical factors [20].
We do believe that the design of our study was of sufficient quality to be able find a gene signature for predicting DFS. However, it cannot be excluded that a usable gene signature does Patients are divided into a high and a low risk prediction group based on the risk prediction of the different gene signatures. Gene signatures were discovered defining high risk as DFS #1 year and low risk as DFS .1 year unless mentioned otherwise. Both training and validation sets were separated into neoadjuvant treated and untreated patients. Results are only shown where the training sets contained enough high and low risk patients to make signature discovery possible. The hazard ratio of the gene signature prediction is shown with the 95% confidence interval between brackets. The p value of the log-rank test is shown as well, with the p value adjusted for multiple testing between brackets. A: Survival curves for patients in the validation set. Gene signatures were discovered using the full training set stratified by neoadjuvant treatment. B: Survival curves for untreated UMC Utrecht patients in the validation set. Gene signature was discovered using untreated UMC Utrecht patients in the training set. C: Survival curves for neoadjuvant treated Paul Brousse patients in the validation set. Gene signature was discovered using neoadjuvant treated Paul Brousse patients of the training set. doi:10.1371/journal.pone.0049442.g004 exist but was not found due to limiting factors in our study. These potential factors include our definition of high and low risk patients in the signature discovery, the number of patients included in the study especially in light of the heterogeneity of the patient group, the inclusion of patients from only two medical centers, the existence of a prior treatment effect and limits to the sensitivity of microarrays.
Liver metastases are by their nature biased towards a more aggressive subgroup of CRC. It could therefore be speculated that gene expression patterns that characterize rapidly recurring liver metastases are too subtle to be uncovered using the sample size employed in this study. Moreover, recurrence after resection of liver metastases might not be dependent on the characteristics of the liver metastasis itself, but on the presence of micrometastases at the time of liver resection.
Although we cannot exclude the existence of a predictive gene signature, no added benefit of gene expression signatures for the prediction of disease free survival in metastatic colorectal disease could be established based on the results of this study. Finally, the Fong clinical risk score, already validated by Reissfelder et al [7], is the most powerful risk score for predicting DFS of patients with resected CRLM of the five tested risk scores in our study. This clinical risk score should be used for stratification in prospective clinical studies examining the possible benefit of adjuvant therapies in patients undergoing surgery for CRLM. . Written informed consent was obtained from all patients. Samples were included of patients aged 18 years or older who underwent curative resection for histologically confirmed liver metastases from CRC. Patients with a history of non-colorectal malignancies, extrahepatic disease or macroscopic residual disease (R2) after surgery were excluded. Patients who received local ablative therapy or chemoembolization alone or in combination with resection were excluded. Only specimen were included that were snap frozen in liquid nitrogen within 30 minutes after resection and were stored in 280uC. The amount of stroma, tumor, benign liver cells and necrosis was determined by the two study pathologists (C.G and P.J.vD). Patients whose samples contained benign liver tissue or insufficient tumor cells were excluded from the study. Intraoperative ultrasound of the liver was performed in all patients to assess the size and location of the liver metastases. The size of the dataset was determined by the available patient tumor samples in the two participating institutions which fulfilled all in-and exclusion criteria. Patient-, tumor-and surgical characteristics were extracted from our prospectively collected databases. The definition of synchronous liver metastasis (diagnosis within two months after initial diagnosis) was based on that provided by the US National Cancer Institute.

Follow-up
All patients received standard follow up with spiral CT of the abdomen and chest every 3 months to monitor recurrences. Disease free survival was defined as the time from resection to the time of the first sign of recurrence on CT scanning. All patients were censored at the time of death or the last followup. Survival time was determined using the Kaplan-Meier survival function.

Gene Expression Profiling
RNA isolation. Total RNA was isolated from individual tissue samples using Trizol reagent (Invitrogen) following the manufacturer's protocol. RNA was purified using the RNeasy mini-kit (Qiagen) and was subjected to DNase treatment using the Qiagen DNA-free kit. The yield and quality of total RNA was checked by spectrophotometry and by the Agilent Bioanalyzer (Agilent). Thirteen samples were excluded on the basis of the RNA yield and cRNA yield (RNA integrity number [RIN] ,6). Eight samples were excluded due to amplification failures, and 8 more samples did not meet the labeling criteria, resulting in data from 119 samples.
cRNA synthesis and fluorescent labeling. All amplification and labelling procedures were performed in 96 wells plates (4titude, Bioke) on a customized Sciclone ALH 3000 Workstation (Caliper LifeSciences), with a PCR PTC-200 (Bio-Rad Laboratories), SpectraMax 190 spectrophotometer (Molecular Devices), and a magnetic bead-locator (Beckman). cRNA products were purified and concentrated with RNAClean (Agencourt, Beckman) according to manufacturer's protocol. mRNA was amplified by in vitro transcription using an anchored primer and T7 RNA polymerase on 1 mg of total RNA. First a double stranded cDNA template was generated including the T7 promoter. Next, this template was used for in vitro transcription with the T7 megascript kit (Ambion) to generate cRNA. During the in vitro transcription, 5-(3-aminoallyl)-UTP (Ambion) was incorporated into the single stranded cRNA. Samples with a yield less than 2000 ng or with small cRNA fragments (median less than 500 nt) were not used. Cy3 or cy5 fluorophores (GE Healthcare) were coupled to cRNA. We applied total RNA and cRNA quality control criteria in accordance with the Tumor Analysis Best Practices Working Group [21]. The yield and label incorporation of the cy-labeled cRNA was checked using spectrophotometry. Only samples with between 1.5% and 3% Cy-incorporation were included. Before hybridization, 300-1000 ng of Cy-labeled cRNA from one  biopsy was mixed with an equal amount of reverse color Cylabeled material from the reference sample. Microarray hybridization. For each sample, two expression profiles in dye-swap experiments were generated. The samples were compared against a commercial reference (Universal Human Reference RNA catalog #740000, Stratagene). The Human Array-Ready Oligo set (version 2.0) was purchased from Qiagen and spotted on Codelink slides (GE Healthcare) in a dust filtered and humidity controlled clean room. The microarrays contained 70-mer oligo-nucleotides representing 21.329 human genes and expressed sequence tags (ESTs), as well as 3871 additional spots for control purposes. Gene annotations were updated by BLAST analysis of all feature sequences using ENSEMBL build 55. Arrays were hybridized on a Tecan HS4800PRO hybridization station, using the protocol described previously [22]. Hybridized slides were scanned on an Agilent scanner (G2565BA) at 100% laser power and 60-90% PMT. After automatic data extraction using Imagene 8.0.1 (BioDiscovery), printtip Loess normalization was performed on mean spot intensities [23]. Dye bias was corrected based on a within-set estimate [24].
Data accessibility. In accordance with proposed MIAME (Minimum information about a microarray experiment) standards, primary and processed data as well as protocols were deposited in Array Express (http://www.ebi.ac.uk/microarray-as/aer) under accession number E-TABM-1112.

Identification of a Recurrence Signature
The cohort was randomly divided in a training set (n = 75) and a validation set (n = 44). The latter was not involved in gene selection to avoid a selection bias. For the purposes of discovering the gene signature, patients were initially divided in a high risk and a low risk group. High risk patients were defined as those with recurrence within 1 year (Figure 2). This threshold was based on the observation that a DFS ,1 year is predictive of adverse overall survival as described by Fong et al [14]. A division based on DFS #6 months (high risk) and DFS .2 years (low risk) was also applied ( Figure 2B). Using the training set, genes were ranked based on three different metrics (signal-to-noise-ratio, t-test statistic and Cox proportional hazard ratio). This ranking was done using a multiple sampling approach selecting 2/3 of the samples in each iteration. The 75 top ranked genes were used to predict the risk class of the samples in the remaining 1/3 of samples using nearest mean classification [9] and leave-one-out cross validation (LOOCV). Using these predictions a combined area under the curve for 1000 iterations was calculated giving an indication of the aggregated predictive power of the 75 gene signatures, where a value significantly above 0.5 points to true predictive power. The ranking of the genes were averaged over all 1000 iterations [25]. From the resulting ranked list, the gene signature with the strongest prognostic power (measured as overall accuracy of prediction) was determined using nearest mean classification and LOOCV starting from the best ranked gene and subsequently adding the next highest ranked gene in each iteration (forward selection) [9]. An independent measure of the predictive power was obtained by using the resulting gene signature to predict the risk class of the samples in the validation set (nearest mean, LOOCV). Kaplan-Meier analyses were used to estimate DFS and survival curves for the two predicted risk classes were compared using the Mantel-Cox log-rank test. A power analysis for the logrank test was done using the PS program [26]. Functional gene set enrichment analysis was performed using the Babelomics 4.2 webbased analysis suite including all the databases available for the enrichment analysis [27].

Analysis of Differential Gene Expression
Gene expression in patients treated with neoadjuvant chemotherapy was compared to expression in untreated patients using ANOVA [28]. In a fixed effect analysis, sample, array and dye effects were modelled. P values were determined by a permutation F2-test in which residuals were shuffled 5000 times globally.

Clinical Risk Scores
A univariate Cox proportional hazards regression model was used to estimate the hazard ratios of five clinical risk scores which were calculated for each patient [14][15][16][17][18] A multivariate analysis was also performed entering the factors with p values below 0.1 in the univariate analysis.

Statistical Testing and Software
All statistical tests were two-sided and statistical significance was assumed for p values less than 0.05. Where applicable, p values were adjusted for their false discovery rate using the Benjamini-Hochberg method [29]. Statistical analyses were done in R 2.7.0 with additional Bioconductor packages and SPSS for Windows version 15.0 (SPSS, Chicago, Illinois, USA). Figure S1 Power of the log-rank test. The statistical power of the log-rank test as a function of the hazard ratio of the gene signature prediction in the validation set. (TIF)      The hazard ratio of the clinical risk predictor is shown with the 95% confidence interval between brackets. The p value of the log-rank test is shown as well, with the p value adjusted for multiple testing between brackets. A: Iwatsuki (high risk $3, low risk ,3). B: Basingstoke (high risk $10, low risk ,10). C: Fong (high risk $3, low risk ,3). D: Mayo (high risk $2 low risk ,2). E: Nordlinger (high risk $4, low risk ,4). doi:10.1371/journal.pone.0049442.g005

Author Contributions
Exploring Gene Signatures in CRLM PLOS ONE | www.plosone.org