Prediction of early breast cancer patient survival using ensembles of hypoxia signatures

  • Inna Y. Gong,

    Roles Conceptualization, Formal analysis, Software, Visualization, Writing – original draft

    Affiliation Ontario Institute for Cancer Research, Toronto, ON, Canada

  • Natalie S. Fox,

    Roles Data curation, Resources, Software, Visualization, Writing – review & editing

    Affiliations Ontario Institute for Cancer Research, Toronto, ON, Canada, Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada

  • Vincent Huang,

    Roles Formal analysis, Visualization

    Affiliation Ontario Institute for Cancer Research, Toronto, ON, Canada

  • Paul C. Boutros

    Roles Conceptualization, Project administration, Supervision, Writing – review & editing

    Paul.Boutros@oicr.on.ca

    Affiliations Ontario Institute for Cancer Research, Toronto, ON, Canada, Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada, Department of Pharmacology and Toxicology, University of Toronto, Toronto, ON, Canada

Abstract

Background

Biomarkers are a key component of precision medicine. However, full clinical integration of biomarkers has been met with challenges, partly attributed to analytical difficulties. It has been shown that biomarker reproducibility is susceptible to data preprocessing approaches. Here, we systematically evaluated machine-learning ensembles of preprocessing methods as a general strategy to improve biomarker performance for prediction of survival from early breast cancer.

Results

We risk stratified breast cancer patients into either low-risk or high-risk groups based on four published hypoxia signatures (Buffa, Winter, Hu, and Sorensen), using 24 different preprocessing approaches for microarray normalization. The 24 binary risk profiles determined for each hypoxia signature were combined using a random forest to evaluate the efficacy of a preprocessing ensemble classifier. We demonstrate that the best way of merging preprocessing methods varies from signature to signature, and that there is likely no ‘best’ preprocessing pipeline that is universal across datasets, highlighting the need to evaluate ensembles of preprocessing algorithms. Further, we developed novel signatures for each preprocessing method, and the risk classifications from each were incorporated in a meta-random forest model. Interestingly, the classifications of these biomarkers and their ensemble show striking consistency, demonstrating that similar intrinsic biological information is being faithfully represented. As such, these classification patterns further confirm that there is a subset of patients whose prognosis is consistently challenging to predict.

Conclusions

Performance of different prognostic signatures varies with pre-processing method. A simple classifier by unanimous voting of classifications is a reliable way of improving on single preprocessing methods. Future signatures will likely require integration of intrinsic and extrinsic clinico-pathological variables to better predict disease-related outcomes.

Introduction

Cancer is fundamentally a disease driven by genetic alterations, with the stepwise accumulation of mutational hits in oncogenes and tumor suppressors [1]. However, cancer is not one disease but many, with significant variability between tumor subtypes and within individual tumours in both the rate of mutation and the specific genes that are mutated [2]. Consequently, the molecular landscape of tumours can vary wildly, leading to differences in progression and overall prognosis. These differences are described as genetic heterogeneity, while intra-tumor heterogeneity refers to heterogeneity within a tumor [3–6].

Currently, treatment decisions for individual patients are largely based on tumor subtype, histology and pathology; clinico-pathological correlation; and tumor size, nodal and metastatic status (TNM stage), along with a few molecular characteristics. This approach does not account for the wide spectrum of genetic burden experienced by the individual patients, leading to divergent responses to therapy that are currently unpredictable. Accordingly, biomarkers play a key role in the realization of precision oncology to determine the treatment that generates optimal response with minimal toxicity [7]. Biomarkers could be used at all stages of disease management, including prognosis (determining an individual patient’s likely course of disease-related outcomes such as recurrence and survival), or drug-sensitivity prediction [8, 9]. An ideal biomarker may predict multiple of these end-points simultaneously, and current research focuses on creating panels of biomarkers for each disease.

To this end, numerous groups have sought to develop transcriptomic biomarkers using microarray and RNA-sequencing approaches [7]. These efforts have resulted in a wide spectrum of signatures with prognostic potential, with the hope of bridging the gap between the underlying genomic heterogeneity and clinical oncology. However, few of these signatures have been successfully translated into routine clinical practice [10]. There are several reasons for this high failure rate of biomarkers [11]. First, there is little overlap in the genes incorporated across biomarkers, leading to criticism that variability in the experimental and computational techniques introduces artificial noise [12, 13]. Second, signatures have been derived from a variety of sources including cell lines, transgenic mouse models, combinations of biological pathways known to be perturbed in tumor subtypes, and profiling of tumor specimens. Third, small sample sizes with low statistical power limit the generalizability of the signatures [14]. Fourth, biases often exist between the training and testing populations, yielding a signature that reflects interdependencies between known clinical variables [15]. Fifth, the lack of guidelines on rigorous evaluation of biomarker performance in independent validation datasets further accentuates false-positive rates and confuses the literature [14]. Finally, the lack of standardized preprocessing methods challenges the consistency of the data obtained, which is often re-used in secondary studies.

Several groups have demonstrated that biomarker reproducibility is highly sensitive to the choice of preprocessing algorithm [13, 16, 17]. For example, we demonstrated that applying 24 preprocessing techniques for mRNA abundance normalization and predicting two established signatures led to only ~33% of patients having consistent predictions in a cohort of 442 non-small cell lung cancer (NSCLC) patients [18]. Surprisingly, those patients with unanimous predictions across all preprocessing methods had more robust classifications than those from any individual preprocessing algorithm alone. These findings were corroborated when we evaluated pipeline concordance in a cohort of 1,564 early breast cancers using hypoxia signatures. The ensemble approach of merging multiple preprocessing methods improved the performance of hypoxia signatures, outperforming any individual method [19].

Hypoxia is the result of cancer altering cellular metabolism to focus on anaerobic glycolysis, along with the tortuous nature of tumor blood vessels [19]. Hypoxic regions of the tumor have been implicated in promoting genomic heterogeneity, genomic instability and subclonal expansion of more aggressive tumor cells [20, 21]. The selective pressures experienced by tumor cells in hypoxic conditions consequently result in altered gene expression through epigenetics and transcription factor activation for angiogenesis, and in the gain of metastatic features. Hypoxia is associated with poor prognosis and treatment failure, prompting the development of several biomarkers to identify such patients [21, 22].

It is unclear why this ensemble-of-preprocessing-methods approach works so effectively. One hypothesis is that each individual preprocessing method removes a different aspect of underlying noise in the microarray dataset, and that the merged ensemble of noise reduction from various perspectives allows a more accurate estimate of the true biomarker signal. The vast majority of current implementations involve simple voting, which may significantly underestimate the advantages of ensembles. Further, the unanimous voting classification method leaves a large fraction (36%-80% depending on the signature) of patients unclassified. To try to bring such approaches to greater clinical utility, we set out to systematically evaluate whether ensembles of preprocessing methods may improve classification in a greater proportion of patients. We replaced the simple voting scheme with supervised machine-learning and evaluated a broad range of signatures.

Methods

Datasets

To systematically evaluate the impact of the preprocessing ensemble classifier on risk stratification, two separate sets of primary breast cancer mRNA abundance data were gathered. First, eight datasets profiled on the Affymetrix Human Genome U133A (HG-U133A) microarray platform were obtained and integrated, comprising a total of 1,564 early breast cancer patients [23–30]. Second, two datasets profiled on the Affymetrix Human Genome Plus 2.0 (HG-U133 Plus 2.0) GeneChip Array were obtained for a total of 579 early breast cancer patients [31, 32]. All samples incorporated in the analysis were surgical specimens taken prior to any treatment. To verify that the ensemble method can be effective in other data types, a prostate cancer methylation preprocessing dataset containing 310 samples normalized using 11 different strategies was used [33].

Preprocessing pipelines

To evaluate the performance of preprocessing ensemble classifiers learnt from various preprocessing pipelines, data from the two microarray platform datasets specified above were preprocessed in 24 different ways. Three aspects were varied to yield the 24 unique preprocessing methods: six preprocessing algorithms, two gene annotation methods, and two dataset handling procedures. The combinations of these that yield the 24 preprocessing pipelines were constructed as previously described [19]. Briefly, the six preprocessing algorithms include four without log2-transformation, namely Robust Multi-array Average (RMA) [34], MicroArray Suite 5.0 (MAS5) [35], Model-Based Expression Index (MBEI) [36], and GeneChip Robust Multi-array Average (GCRMA) [37], and two log2-transformed versions of MAS5 and MBEI. These algorithms were all available in the R statistical environment (R packages: affy v1.36.0, gcrma v2.30.0). S1 Table provides a brief summary of each of these algorithms. The two dataset handling approaches were independent and merged preprocessing. The two ProbeSet annotations used were either the default Affymetrix gene annotation (R packages: hgu133aprobe v2.10.0, hgu133acdf v2.10.0, hgu133a.db v2.8.0, hgu133plus2probe v2.6.0, hgu133plus2cdf v2.6.0, hgu133plus2.db v2.8.0) or an alternative Entrez Gene-based updated annotation (R packages: hgu133ahsentrezgprobe v15.1.0, hgu133ahsentrezgcdf v15.0.0, hgu133plus2hsentrezgprobe v15.1.0, hgu133plus2hsentrezgcdf v15.1.0). S2 Table provides a summary of each of these preprocessing pipelines.
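The count of 24 pipelines follows from crossing the three design factors (6 algorithms x 2 annotations x 2 handling procedures). The sketch below only enumerates those combinations; the string labels and the function name are illustrative assumptions, not identifiers from the original analysis.

```python
from itertools import product

# Labels are illustrative; the text names the methods but not these exact strings.
ALGORITHMS = ["RMA", "MAS5", "MBEI", "GCRMA", "log2_MAS5", "log2_MBEI"]
ANNOTATIONS = ["default", "alternative"]  # default Affymetrix vs. Entrez Gene-based
HANDLING = ["independent", "merged"]      # datasets preprocessed separately or together


def enumerate_pipelines():
    """Return one dict per algorithm x annotation x handling combination."""
    return [
        {"algorithm": alg, "annotation": ann, "handling": hand}
        for alg, ann, hand in product(ALGORITHMS, ANNOTATIONS, HANDLING)
    ]


pipelines = enumerate_pipelines()
print(len(pipelines))  # 24
```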

Patient risk classification: Hypoxia signatures

To assess the influence of preprocessing variation on risk stratification of patients, we used four published hypoxia gene signatures: Buffa metagene [38], Winter metagene [39], Hu signature [40], and Sorensen gene set [41]. These signatures were chosen as they exhibited the best performance in predicting patient outcome in our previous work. Briefly, each gene signature was used to stratify patients into either low-risk or high-risk. Following pre-processing of data using pipelines, the multi-gene signature score was calculated for each patient using all genes on the signature’s gene list. First, for each gene of the signature, patients were median dichotomized (0 or 1) based on the signal-intensity of the gene compared to the expression level of that gene across all patients. Next, the multi-gene signature score for each patient was calculated as the sum of all gene scores. Finally, the scores were used to median dichotomize patients into high and low risk groups for each signature.
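The three scoring steps described above can be sketched as follows. This is a minimal illustration on made-up intensities with hypothetical gene names; treating values exactly equal to the median as low (0) is an assumption, since the text does not say how ties were handled.

```python
from statistics import median


def dichotomize(values):
    """Return 1 for values strictly above the median, else 0 (tie rule is an assumption)."""
    cutoff = median(values)
    return [1 if v > cutoff else 0 for v in values]


def signature_risk_groups(expression):
    """expression maps gene -> per-patient intensities (same patient order for every gene).

    Returns (scores, risk): the per-patient sum of gene calls, and the
    median-dichotomized risk group where 1 means high-risk.
    """
    gene_calls = [dichotomize(vals) for vals in expression.values()]
    scores = [sum(calls) for calls in zip(*gene_calls)]
    return scores, dichotomize(scores)


# Toy data: three hypothetical genes measured in four patients.
toy = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [0.5, 0.1, 0.9, 0.7],
    "GENE_C": [10.0, 12.0, 11.0, 15.0],
}
scores, risk = signature_risk_groups(toy)
print(scores, risk)
```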

For preprocessing pipelines with independent dataset preprocessing, stratification was conducted independently for each dataset. In preprocessing pipelines with merged dataset preprocessing, stratification was conducted simultaneously across datasets. In summary, for each patient, 24 risk classifications (high or low risk) were derived from 24 different pre-processing pipelines based on gene signature expression.
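Given the 24 calls per patient, the simple unanimous-vote classifier used as a baseline elsewhere in this paper can be sketched as below. The function names are hypothetical, and the three-patient vote matrix is invented purely for illustration.

```python
def unanimous_call(votes):
    """votes: 0/1 risk calls (1 = high-risk), one per pipeline.

    Returns 1 or 0 when every pipeline agrees, otherwise None (unclassified).
    """
    total = sum(votes)
    if total == len(votes):
        return 1
    if total == 0:
        return 0
    return None


def classified_fraction(vote_matrix):
    """Fraction of patients that receive a unanimous call."""
    calls = [unanimous_call(v) for v in vote_matrix]
    return sum(c is not None for c in calls) / len(calls)


# Toy example: two unanimous patients and one with a single dissenting pipeline.
vote_matrix = [[1] * 24, [0] * 24, [1] * 23 + [0]]
print([unanimous_call(v) for v in vote_matrix])
```

Patients returning None are exactly those the unanimous scheme leaves unclassified, which is the limitation the supervised ensemble is meant to address.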

Brief descriptions of the original studies deriving these signatures are provided in S3 Table. Of note, genes contained in these signatures are genes that were found to be upregulated in hypoxic tumor environments, resulting in worse prognosis.

Ensemble classifier: Risk classification votes

The primary endpoint was to delineate whether an ensemble of preprocessing pipeline classifiers using hypoxia signatures may improve the prediction of prognosis in early breast cancer patients beyond that achieved by single pipeline classifiers. Since cause-specific mortality data is lacking in our study, individual patient survival outcome was defined as either 0 or 1 to represent dead or alive status at 5 years, respectively (events occurring after 5 years were censored). Five-year survival was chosen as it is an important survival time-point for breast cancer survivors due to the increasing causes of death unrelated to breast cancer in subsequent survivorship years. At the end of 5 years, 1,193 were censored while 371 cancer-related events occurred for patients profiled on the HG-U133A platform. For patients profiled on the HG-U133 Plus 2.0 platform, 352 were censored while 227 events occurred.

The 24 dichotomized risk profiles determined from each hypoxia signature were combined to develop a preprocessing ensemble classifier using random forest (randomForest package v4.6.10) to stratify patients within the HG-U133A and HG-U133 Plus 2.0 datasets, respectively, as good or poor prognosis. The HG-U133A and HG-U133 Plus 2.0 datasets were independently separated into training and testing sets by a sample size ratio of 1:1. Random sampling was employed to determine the training and testing sets, maintaining a balanced ratio between mortality and survival events in the resulting datasets. A random forest classifier was trained on the training set of HG-U133A and HG-U133 Plus 2.0, respectively, to prognosticate survival. The sampling parameter was set at the upper limit of the total number of events in the training set to maintain equal sampling from patients who survived and those who experienced an event at 5 years. Tuning of the random forest classifier parameters mtry (values 1, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24) and ntree (values 500, 1000, 2000, 5000) was done using a grid search. The best tuning parameters for the final classifier were selected based on the performance measure accuracy, as specified below.
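The data split and the tuning grid described above can be sketched as follows. This is only an outline under stated assumptions: the helper name, the seed, and the toy outcome vector are hypothetical, and the random forest fitting itself (done with the randomForest R package in the study) is not reproduced here.

```python
import random
from itertools import product


def stratified_split(outcomes, seed=0):
    """Split patient indices 1:1 into train/test within each outcome class,
    so both halves keep a similar ratio of events to non-events."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(set(outcomes)):
        idx = [i for i, y in enumerate(outcomes) if y == label]
        rng.shuffle(idx)
        half = len(idx) // 2
        train.extend(idx[:half])
        test.extend(idx[half:])
    return sorted(train), sorted(test)


# Candidate values listed in the text; every (mtry, ntree) pair is one grid point.
MTRY = [1, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
NTREE = [500, 1000, 2000, 5000]
grid = list(product(MTRY, NTREE))

# Toy outcome vector: 10 events and 30 non-events (invented for illustration).
outcomes = [1] * 10 + [0] * 30
train_idx, test_idx = stratified_split(outcomes)
print(len(grid), len(train_idx), len(test_idx))
```

Each grid point would then be fitted on the training indices and scored by accuracy, with the best-scoring pair retained.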

The test dataset was evaluated using each of the tuned models to produce 0 or 1 to predict whether each patient died by 5 years. To calculate performance, patients assigned to the good prognosis group were considered true negatives (TNs) if they were alive at 5 years and false negatives (FNs) if they died within 5 years. Similarly, patients assigned to the poor prognosis group were considered true positives (TPs) if they died within 5 years and false positives (FPs) if they were alive at 5 years. Subsequently, sensitivity, specificity, and accuracy were calculated accordingly. The area under the receiver operating characteristic (ROC) curve (AUC) was calculated from the ROC analysis using the random forest classification probability (pROC v1.8). The final tuning parameters selected were those that yielded the highest accuracy.
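The performance measures above follow the standard confusion-matrix definitions, with death within 5 years as the positive class. The sketch below uses invented labels and a hypothetical function name, and assumes both outcome classes are present so that no denominator is zero.

```python
def classification_metrics(died, predicted_poor):
    """died: 1 if the patient died within 5 years, else 0.
    predicted_poor: 1 if the classifier assigned poor prognosis, else 0.
    Assumes at least one patient in each true outcome class."""
    pairs = list(zip(died, predicted_poor))
    tp = sum(d == 1 and p == 1 for d, p in pairs)
    fn = sum(d == 1 and p == 0 for d, p in pairs)
    tn = sum(d == 0 and p == 0 for d, p in pairs)
    fp = sum(d == 0 and p == 1 for d, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Toy example: 3 deaths (2 detected) and 5 survivors (4 correctly called good prognosis).
metrics = classification_metrics(
    died=[1, 1, 1, 0, 0, 0, 0, 0],
    predicted_poor=[1, 1, 0, 0, 0, 0, 1, 0],
)
print(metrics)
```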

Ensemble classifier: Engineered variables

Following random forest classification using only the risk classification votes from different pipeline variants, classifiers were constructed using summary statistics as additional features. The engineered summary variables capture the total number of poor prognosis votes based on the variable aspects of preprocessing pipelines as follows: total number of votes overall, total number of votes for pipelines using separate preprocessing, total number of votes for pipelines using merged preprocessing, total number of votes for RMA pipelines, total number of votes for GCRMA pipelines, total number of votes for MBEI pipelines, total number of votes for MAS5 pipelines, total number of votes for log2 MBEI pipelines, total number of votes for log2 MAS5 pipelines, total number of votes for RMA and MAS5 pipelines, total number of votes for pipelines using default annotation, and total number of votes for pipelines using alternative annotation. The derivation of engineered variables is summarized in S4 Table.
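The engineered variables are counts of poor-prognosis votes grouped by pipeline characteristic. A minimal sketch, assuming each pipeline is described by a small metadata record, is shown below; the key names and labels are illustrative and do not come from S4 Table.

```python
from itertools import product

ALGORITHMS = ["RMA", "MAS5", "MBEI", "GCRMA", "log2_MAS5", "log2_MBEI"]
# Hypothetical metadata: one record per pipeline, in a fixed order.
pipelines = [
    {"algorithm": a, "annotation": n, "handling": h}
    for a, n, h in product(ALGORITHMS, ("default", "alternative"), ("independent", "merged"))
]


def engineered_vote_counts(votes, pipelines):
    """votes: 0/1 poor-prognosis calls aligned with `pipelines`.
    Returns the overall vote total plus one total per pipeline characteristic."""
    counts = {"total": sum(votes)}
    for key in ("handling", "algorithm", "annotation"):
        for vote, pipe in zip(votes, pipelines):
            name = f"{key}:{pipe[key]}"
            counts[name] = counts.get(name, 0) + vote
    # Combined RMA and MAS5 total, as listed among the engineered variables.
    counts["algorithm:RMA+MAS5"] = counts.get("algorithm:RMA", 0) + counts.get("algorithm:MAS5", 0)
    return counts


# Toy patient: called poor prognosis only by pipelines with merged preprocessing.
votes = [1 if p["handling"] == "merged" else 0 for p in pipelines]
counts = engineered_vote_counts(votes, pipelines)
print(counts["total"], counts["handling:merged"], counts["algorithm:RMA"])
```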

Random forest models were built upon the following feature combinations: ensemble of preprocessing pipeline variants and the engineered variables, ensemble of engineered variables, and ensemble of only feature variables selected by the Boruta algorithm (Boruta v4.0.0). Random forest models were tuned based on performance as described above. For the HG-U133A dataset, models were constructed by incorporating all patients in the cohort or only the subset of patients with unanimous agreement across the preprocessing pipelines. For the HG-U133 Plus 2.0 dataset, given the smaller sample size, models were constructed by incorporating all patients in the cohort to maintain sufficient statistical power.

Classifier evaluation

The prognostic performance of the tuned classifiers was evaluated on the test set using Kaplan-Meier estimates, with the log-rank test and an unadjusted Cox proportional hazards model used to compare the two groups (survival v2.38.0). In order to assess the performance of random forest-based ensemble classifiers, we compared the random forest classifier hazard ratio (HR), the HR in the subset of patients with unanimous agreement across 24 preprocessing pipelines, and the HRs of individual preprocessing pipelines. Similarly, the binary classification accuracy was compared. To compare between the random forest classifiers derived for each hypoxia signature, we assessed prognostic performance using the AUC. The ROC analysis was conducted for each signature using the random forest classification probability (pROC v1.8).
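For readers unfamiliar with the AUC, it equals the probability that a randomly chosen patient with an event receives a higher predicted probability than a randomly chosen patient without one, with ties counted as one half. The sketch below computes it by direct pairwise comparison on invented values; it is an illustration of the quantity, not the pROC implementation used in the study.

```python
def roc_auc(labels, scores):
    """labels: 1 for an event (death within 5 years), 0 otherwise.
    scores: predicted probability of an event.
    Pairwise (Mann-Whitney) formulation of the area under the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: one of the four event/non-event pairs is ranked the wrong way round.
auc = roc_auc(labels=[1, 1, 0, 0], scores=[0.9, 0.4, 0.5, 0.1])
print(auc)  # 0.75
```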

Statistical comparison analysis

We compared the HR performance in the array of random forest classifier models for each hypoxia signature. The classifier HRs were split based on the features used to build the classifier: preprocessing pipelines, engineered variables, and feature variable selection. A paired t-test was used to assess statistical differences in the log2-transformed Hazard Ratios.
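The paired comparison on log2-transformed hazard ratios can be sketched as below. The HR values are invented purely to make the arithmetic easy to follow, and the function name is hypothetical; the sketch returns the t statistic and degrees of freedom and leaves the p-value lookup to a statistics package.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(hr_a, hr_b):
    """Paired t statistic on log2 hazard ratios from two classifier families.
    hr_a and hr_b must be aligned (same signature or dataset in each position)."""
    diffs = [math.log2(a) - math.log2(b) for a, b in zip(hr_a, hr_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1  # statistic and degrees of freedom


# Toy values only: log2 differences are 1, 1 and 2.
t_stat, dof = paired_t_statistic([2.0, 4.0, 8.0], [1.0, 2.0, 2.0])
print(round(t_stat, 6), dof)
```

Working on the log2 scale makes a doubling and a halving of the HR symmetric around zero, which is why the differences are taken after transformation.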

New signature creation using preprocessing ensembles

Using the HG-U133A platform datasets, we sought to elucidate the ability of preprocessing ensemble classifiers to improve upon performance of novel signatures. To this end, we generated a 100-top-ranked-gene novel signature for individual preprocessing pipelines. This was done for preprocessing pipelines where all HG-U133A datasets were preprocessed together, yielding 12 individual signatures. To ascertain the signatures, each preprocessing normalization method was used to median-dichotomize the patient cohort by low or high abundance for each gene. The unadjusted Cox proportional hazard model was used to determine the univariate performance of individual genes to prognostic outcome. Statistical significance was assessed using the Wald test and p-values were false-discovery rate (FDR) adjusted to correct for multiple-testing. The 100 top-ranked genes with adjusted p-values < 0.05 were selected to constitute the signature. The individual signatures from the 12 preprocessing pipelines were validated using random forest classifiers using 10-fold cross-validation, where the random forest classifiers were trained on a training set and internally validated on a separate test set. The 12 good versus poor prognosis classifications were subsequently combined in a meta-random forest to evaluate its ability to predict prognosis compared to individual signature classifiers. The random forest model parameters were tuned as described above. Classification accuracy for each breast cancer subtype was calculated by subsetting to patients with known subtype information before dividing the number of correctly classified patients by the total number of patients with the subtype. The ensemble classification accuracy was calculated using all patients in this comparison.
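The gene-selection step (multiple-testing adjustment followed by taking the top-ranked significant genes) can be sketched as follows. The text states only that p-values were FDR adjusted; the Benjamini-Hochberg procedure used here, the ranking by adjusted p-value, the function names, and the toy p-values are all assumptions made for illustration.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_signature(gene_pvalues, size=100, alpha=0.05):
    """Keep genes with adjusted p < alpha, then take the `size` best-ranked."""
    genes = list(gene_pvalues)
    adj = benjamini_hochberg([gene_pvalues[g] for g in genes])
    passing = sorted((q, g) for q, g in zip(adj, genes) if q < alpha)
    return [g for _, g in passing[:size]]


# Toy Wald-test p-values for four hypothetical genes.
toy_p = {"G1": 0.001, "G2": 0.02, "G3": 0.03, "G4": 0.5}
adjusted = benjamini_hochberg(list(toy_p.values()))
signature = select_signature(toy_p, size=2)
print(signature)
```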

The method outlined to generate an ensemble classifier was also applied to a previously published prostate cancer methylation preprocessing dataset to further test the generalizability of this method.

Program usage

All statistical analyses and plotting were performed in R statistical environment (v3.2.1). The following packages were used for statistical analyses: randomForest v4.6.10, Boruta v4.0.0, survival v2.38.0, and pROC v1.8. All plots were generated in R using custom scripts for lattice (v0.2.31) and latticeExtra (v0.6.26).

Results

Study design: Ensembles of preprocessing pipelines

Our overall approach to evaluate non-linear preprocessing ensembles is outlined in Fig 1. Our goal was to determine how multiple pre-processing methods might best be combined to improve biomarkers predictive of patient prognosis. The datasets used were separated based on the microarray platform (HG-U133A and HG-U133 Plus 2.0) because of previously reported differences in their noise characteristics [19]. The union of all HG-U133A datasets contains 1,564 patients while that of the HG-U133 Plus 2.0 datasets contains 579. Each individual dataset was preprocessed using 24 pipeline variants, and then each hypoxia signature was scored for each pre-processing variant. This resulted in 24 predictions for each combination of patient and signature. Additionally, we derived several engineered variables from counting the total number of votes based on various preprocessing pipeline characteristics (S4 Table). Random forest classifiers were constructed to predict prognosis for individual patients using combinations of the ensemble of 24 preprocessing pipeline predictions and the engineered features. We evaluated the performance of these classifiers using Kaplan-Meier analysis, Cox proportional hazard model, and the binary classification accuracy.

Fig 1. Summary of the study design for ensemble classification for evaluation of a biomarker.

Microarray data are obtained from specific platforms and preprocessed using 24 different pipelines to normalize the mRNA gene expression. Risk groups are then assigned based on the biomarker of interest, resulting in a collection of either good or poor prognosis stratifications based on the expression obtained from the various preprocessing methods. Stratification into either good or poor prognosis represents a vote for that group, resulting in a score between 0 and 24. The ensemble of classifications is combined as features for random forest based machine learning. Random forest classifiers are trained on a selected training set and evaluated on the test set. The robustness of the classifier derived for the biomarker of interest is evaluated with Cox proportional hazard ratio modeling and Kaplan-Meier survival estimates.

https://doi.org/10.1371/journal.pone.0204123.g001

Different preprocessing ensembles perform best for different biomarkers

We compared the performance of the individual preprocessing pipelines to that of the ensemble approaches. This process was conducted for each of the four hypoxia signatures and both microarray platforms. S5 Table (HG-U133A) and S6 Table (HG-U133 Plus 2.0) list the hazard ratios (HRs) and 95% confidence intervals (CIs) determined for each of the 24 preprocessing pipelines, the random forest classifiers evaluated, and the simple preprocessing unanimous classifier, for each signature. Note that in this design each classifier is evaluated on a fully-independent validation cohort, to mitigate over-fitting.

Fig 2A shows a representative forest plot of the prognostic ability of various classifiers measured in HRs for the Winter metagene signature, using the HG-U133A microarray platform. The best prediction of prognosis was observed in the subset of patients with unanimous agreement across the pipelines [HR 3.48, 95% confidence interval (CI) 2.44–4.95, p = 4.99 x 10^−12]. However, the unanimous classification method only makes predictions for 41% (642) of patients while the remainder are unclassified. With incorporation of all patients in the HG-U133A dataset, the random forest classifier using engineered variables derived from votes of preprocessing pipeline features appeared to be a better predictor of prognosis than any individual pipeline (HR 2.39, 95% CI 1.94–2.93, p = 9.89 x 10^−17). Similarly, two other ensemble random forest classifiers (preprocessing pipelines in combination with engineered variables, and the preprocessing pipelines ensemble alone) also performed better than any individual pipeline (HR 2.25, 95% CI 1.83–2.76, p = 9.59 x 10^−15 and HR 2.24, 95% CI 1.82–2.75, p = 1.41 x 10^−14).

Fig 2. Representative hazard ratio forest plot and accuracy for Winter metagene signature using the HG-U133A microarray platform.

(A) Forest plot of log2 hazard ratios with 95% confidence intervals obtained for each of the 24 preprocessing (PP) methods, the random forest classifiers evaluated, and the simple unanimous vote classifier (total number of votes for poor prognosis either 0 or 24). The forest plot is ordered as decreasing hazard ratio. The dotted line represents a hazard ratio of 1. The blue hazard ratio with its 95% confidence interval represents the hazard ratio for the simple unanimous vote classifier. (B) Bar plot of accuracy obtained for each of the 24 preprocessing methods, the random forest classifiers evaluated, and the simple unanimous vote classifier. The bars are ordered by preprocessing pipelines, the unanimous classifier, and the best performing random forest classifier, from left to right.

https://doi.org/10.1371/journal.pone.0204123.g002

Surprisingly, though, this improved performance of the random forest classifier of pre-processing methods was not a general feature of signatures. Rather, the performance of the ensemble classifier in comparison to individual pipeline variants was highly variable for the Buffa (S1 Fig), Hu (S2 Fig), and Sorensen (S3 Fig) signatures. Further, the combination of features resulting in the best classifier was not consistent across the four signatures: engineered variables were important for the Buffa and Winter signatures (Buffa: HR 2.15, 95% CI 1.75–2.64, p = 4.03 x 10^−13; Winter: HR 2.39, 95% CI 1.94–2.93, p = 9.89 x 10^−17), but feature selection using the Boruta algorithm yielded the highest performing classifier for the Hu (HR 1.63, 95% CI 1.32–2.00, p = 3.87 x 10^−6) and Sorensen signatures (HR 2.28, 95% CI 1.87–2.78, p = 2.51 x 10^−16).

These findings of strong divergence in the best way to merge pre-processing algorithms held when we considered other metrics of classification accuracy besides HRs. For example, classification accuracy and evaluation of the area under the receiver operating characteristics curve (AUC) again show the benefits of specific pre-processing ensembles for the Winter signature (Fig 2B) matching those in the HR analysis, and analogously for the Buffa (S1 Fig), Hu (S2 Fig), and Sorensen signatures (S3 Fig).

These trends were also independent of the specific microarray platform used: results were comparable in patients analyzed using the HG-U133 Plus 2.0 microarray platform (S4–S7 Figs). The preprocessing unanimous classifier based on simple risk voting resulted in superior prognostication compared to individual preprocessing variants for all signatures except for Sorensen. Furthermore, the random forest classifiers evaluated did not improve upon unanimous classification, except for the Sorensen signature. The best performing random forest classifier was also inconsistent and variable across the biomarkers evaluated. The Kaplan-Meier plots for the HG-U133A dataset are shown in S8–S11 Figs. The Kaplan-Meier plots for the HG-U133 Plus 2.0 dataset are shown in S12–S15 Figs.

Comparison of patient prognosis prediction between signature classifiers

Taken together, our results show that it is possible to improve upon individual pre-processing pipelines using ensemble techniques, but that the best way to assemble these ensembles varies with the biomarker signature, and not the microarray platform. Fig 3A compares the best ensemble of pre-processing methods to the best individual preprocessing method for each signature and microarray platform. Consistent with our previous results, the random forest classifier outperformed the best individual preprocessing method for the Winter and Sorensen signatures, but not for the Buffa and Hu signatures. The ROC curves and corresponding AUCs obtained for the best ensemble of preprocessing strategies are shown in Fig 3B and Fig 3C. The Buffa, Winter, and Sorensen signature classifiers demonstrated similar AUCs for mortality risk stratification between the two microarray platforms. Conversely, the Hu signature classifier showed better risk stratification using the HG-U133 Plus 2.0 platform compared to the HG-U133A platform.

Fig 3. Summary hazard ratio forest plot and receiver operator curves.

(A) Forest plot of log2 hazard ratios with 95% confidence intervals obtained for the best performing preprocessing method, best performing random forest classifier, and the unanimous vote classifier. Plot is ordered by decreasing hazard ratio within each signature and microarray platform evaluated. Colors correspond to the specific signature evaluated. (B and C) Receiver operator curves and area under the curve (AUC) obtained from the best performing random forest classifier for each biomarker, as determined by the highest hazard ratio. HG-U133A ROC curves are shown in B, and HG-U133 Plus 2.0 ROC curves are shown in C.

https://doi.org/10.1371/journal.pone.0204123.g003

To determine if there are general properties of an ensemble of preprocessing methods that contribute to its performance, we compared each classifier feature to the ultimate performance of the classifier. This was done separately for both microarray platforms. For the HG-U133A platform, patients for whom all preprocessing methods gave consistent results (unanimous preprocessing agreement) were statistically easier to classify than those for whom there was divergence amongst the pre-processing methods. Patients with divergent predictions are thus more difficult to prognose, even though ensembles do improve upon the best individual pre-processing method. Similarly for the HG-U133 Plus 2.0 platform, patients with unanimous preprocessing agreement were significantly easier to classify, or showed a trend toward being easier to classify, than those with divergence across classifiers. This trend was consistent across all four signatures evaluated, and across both platforms, suggesting that there is a patient sub-group that is fundamentally easier to classify, and that the agreement of pre-processing methods on this sub-group can give increased confidence in the accuracy of molecular biomarkers.

Generalization to non-hypoxia signatures

To assess the generality of these observations, we trained independent prognostic signatures on each pre-processing method (Fig 4). Thus the same training dataset was pre-processed in 12 distinct ways, and then a learner was applied to each of these, leading to 12 distinct prognostic biomarkers. We focused on the HG-U133A data for this experiment, given its larger sample size. We selected a standard, straightforward machine-learning approach, involving feature selection with a univariate statistical test (Cox proportional hazards modeling) and modeling using the non-metric random forest approach. We then evaluated whether these 12 separate classifiers gave similar predictions for each individual patient, and attempted to create an ensemble of them. Finally, the twelve separate classifiers and the one ensemble classifier were validated on the independent validation dataset using the AUC and Cox proportional hazards modeling.

Fig 4. Summary of the study design for development of novel signature classifiers for each preprocessing pipeline and evaluating its performance in a meta-ensemble classifier.

Microarray data are obtained from specific platforms and preprocessed using 24 different pipelines to normalize the mRNA gene expression. The gene expression is median dichotomized into two expression groups. Novel signatures are determined as the top 100 genes that reached significance after adjustment for false discovery rate, for each preprocessing pipeline (total 12). Training a random forest classifier on each individual novel signature results in individual risk classifications of survival prognosis. These risk stratifications are subsequently combined in a meta-random forest classifier to evaluate the robustness of the signature with Cox proportional hazard ratio modeling and Kaplan-Meier survival estimates.

https://doi.org/10.1371/journal.pone.0204123.g004
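Two steps in the Fig 4 workflow, median dichotomization and selection of top-ranked genes after false discovery rate adjustment, can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the Benjamini-Hochberg procedure is assumed here because the specific FDR method is not named above, the 0.05 threshold is an assumption, and the function names and gene labels are hypothetical. The per-gene p-values would come from univariate Cox models, which are not shown.

```python
from statistics import median
from typing import Dict, List


def median_dichotomize(values: List[float]) -> List[int]:
    """Label each sample 1 if its expression exceeds the cohort median, else 0."""
    cutoff = median(values)
    return [1 if v > cutoff else 0 for v in values]


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_signature(gene_p: Dict[str, float], top_n: int = 100,
                     alpha: float = 0.05) -> List[str]:
    """Return up to top_n genes with adjusted p-value <= alpha, best first."""
    genes = list(gene_p)
    q_values = benjamini_hochberg([gene_p[g] for g in genes])
    significant = [(q, g) for q, g in zip(q_values, genes) if q <= alpha]
    return [g for _, g in sorted(significant)[:top_n]]


# Hypothetical per-gene p-values (assumed to come from univariate Cox models).
example_p = {"GENE_A": 0.01, "GENE_B": 0.04, "GENE_C": 0.03, "GENE_D": 0.5}
signature = select_signature(example_p)
groups = median_dichotomize([1.0, 2.0, 3.0, 4.0])
```

Repeating `select_signature` once per preprocessing pipeline would give one candidate gene list per pipeline, each of which could then be passed to a classifier.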

The signatures trained with each of the 12 preprocessing pipelines had remarkably similar accuracy and HRs (Fig 5A), and a subset of genes overlapped across multiple signatures (S7 Table). An ensemble of these 12 classifiers gave marginally improved predictions, but the improvement was not statistically significant, suggesting that the signatures are not providing complementary information. To verify this, we compared the agreement of the per-patient predictions across all signatures. Fig 5B illustrates the predictions of individual signature classifiers across all patients stratified by the true survival outcome. The signatures showed highly concordant classification: patients with mortality events were similarly classified as having poor prognosis across the signatures, and patients with continued survival were similarly classified as having good prognosis across the signatures. Similarly, inaccurate predictions of survival and mortality occurred in a comparable subset of patients across the signatures. To determine if the signatures' accuracy differed between subtypes, we subsetted the patients with known subtype information from the literature and calculated classification accuracy (Fig 5C). We find that accuracy was highest for normal-like and lowest for Luminal B tumours, and these values can be further improved during the model training process.
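The subtype-stratified accuracy calculation described above can be sketched in a few lines. The example is illustrative only: the predictions, outcomes, subtype labels, and the function name `accuracy_by_subtype` are hypothetical, and patients without subtype information are skipped, mirroring the subsetting step.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def accuracy_by_subtype(predicted: List[int], truth: List[int],
                        subtype: List[Optional[str]]) -> Dict[str, float]:
    """Classification accuracy per subtype, ignoring patients with no subtype."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for p, t, s in zip(predicted, truth, subtype):
        if s is None:
            continue  # patient has no subtype annotation
        totals[s] += 1
        hits[s] += int(p == t)
    return {s: hits[s] / totals[s] for s in totals}


# Hypothetical toy data: 1 = poor prognosis / event, 0 = good prognosis / no event.
predicted = [1, 0, 1, 1, 0, 0]
truth = [1, 0, 0, 1, 1, 0]
subtype = ["Luminal A", "Luminal A", "Luminal B", "Basal", None, "Luminal B"]

per_subtype = accuracy_by_subtype(predicted, truth, subtype)
```

With small subtype groups, as in this toy example, the per-subtype accuracies are unstable and should be interpreted alongside the number of patients in each group.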

Fig 5. Hazard ratio forest plots of classifier performance and heatmap of individual classifier predictions of survival prognosis.

(A) Forest plot with 95% confidence intervals of novel signature classifiers. The forest plot is ordered as decreasing hazard ratio. The dotted line represents a hazard ratio of 1. (B) Heatmap of classifier predictions of 5-year survival status. The classifiers (by row) from signatures are ordered by decreasing performance of patient outcome prediction. Patients (by column) are ordered by the degree of agreement of predictions across the array of novel signatures identified from 12 different preprocessing variant pipelines. The true outcome of patients is shown as either 5-year survival status or overall survival status up to the end of study follow-up. Blue represents true positives with correct prediction of poor prognosis. Purple represents true negatives with correct prediction of good prognosis. The white part of the heatmap represents incorrect predictions of good or poor prognosis. (C) Classification accuracy of ensemble model stratified by known subtype of the tumours and of the model itself when subsetted to samples with subtype information in the literature.

https://doi.org/10.1371/journal.pone.0204123.g005

To further evaluate the generalizability of this ensemble method, we executed the workflow on a prostate cancer methylation preprocessing dataset [33]. This set consists of the raw methylation values along with data from 11 preprocessing methods. Following the method previously outlined (Fig 4), individual classifiers were trained before training the ensemble classifier. Similar to the results from the breast cancer data (Fig 5), the results from the prostate cancer dataset showed that the ensemble classifier outperformed the majority, but not all, of the individual classifiers at predicting biochemical recurrence (S16 Fig).

Taken together, it appears that all signatures predict either good or poor survival for a similar cohort of patients. There remains a group of patients whose prognosis is difficult to predict, and leveraging orthogonal information from multiple pre-processing schemes will not help in making more accurate predictions for them.

Discussion

Some groups have suggested that different preprocessing methods have a minor effect on predictive signatures [42, 43]. Other work has suggested that this is incorrect, and that different preprocessing algorithms result in substantial differences in outcomes [18, 19]. Indeed, we previously showed that ensemble classification combining preprocessing techniques using a unanimous voting method could identify high-confidence predictions, thereby giving increased confidence to risk stratification tools. We sought here to extend this approach and to discover if the predictions from multiple pre-processing algorithms might be combined into more accurate ensemble calls.

Our results demonstrate that there is indeed value to leveraging multiple pre-processing techniques. However, they yield the surprising result that the optimal way to do so is dependent on the characteristics of an individual signature. That is, one must consider all pre-processing methodologies for each new biomarker to determine if and to what extent combining them will improve predictions: there is no apparent universal approach to optimize this problem, even holding the dataset constant. Further, ensembles appear to be limited in the extent to which they can improve signatures: there remains a subset of hard-to-classify patients for whom varying characteristics of the pre-processing do not help in classification. Large inter-individual differences exist in a plethora of extrinsic factors that play an equally important role in driving survival outcomes. These include environmental exposure factors, socioeconomic factors, patient compliance concerns, patient preferences, and social habits [44]. Treatment factors include success of surgery such as extent of margins, factors involved in the delivery of adjuvant treatments, as well as variability in the decision-making process between the patient and the treating physician. Currently, much of this information is not considered when evaluating the effect of intrinsic biological patterns on prognosis. Optimal prediction of outcomes will likely necessitate the integration of both intrinsic and extrinsic information in the biomarker development process. These findings are thus highly consistent with those of Tofigh et al., who demonstrated that the prognosis for a subset of breast cancer patients was intrinsically more difficult to predict [45].

Our results are not without limitations. First, the datasets included in the analyses herein represent only therapy-naïve early breast cancer tumors. It is well known that cancer is not a single disease, given the inter-tumor and intra-tumor heterogeneity observed. This precludes generalization of these results to other tumor types. Second, we used random forests to derive classifiers, and other machine learning algorithms may yield different results. Third, our study focused on four previously published hypoxia signatures, and it would be difficult to extrapolate our findings to other microarray-based signatures. Studies are needed to evaluate the findings herein for other clinically promising signatures. Lastly, we only used microarray datasets to assess the utility of random forest classifiers for risk stratification. It may be that preprocessing ensemble classifications will be of greater benefit in fields where existing preprocessing methods are less robust [46].

Taken together, our data further highlight the need to incorporate extrinsic factors not accounted for by intrinsic biological signals, in the pursuit of integrative signatures that will allow for the realization of precision oncology.

Supporting information

S1 Table. Overview of preprocessing algorithms.

https://doi.org/10.1371/journal.pone.0204123.s001

(DOCX)

S2 Table. Summary of 24 preprocessing methods.

https://doi.org/10.1371/journal.pone.0204123.s002

(DOCX)

S3 Table. Overview of hypoxia prognostic signatures.

https://doi.org/10.1371/journal.pone.0204123.s003

(DOCX)

S4 Table. Summary of votes used to calculate engineered variables.

https://doi.org/10.1371/journal.pone.0204123.s004

(DOCX)

S5 Table. Hazard ratios and 95% confidence intervals obtained for each of the 24 preprocessing methods, the random forest classifiers evaluated, and the simple unanimous vote classifier, per signature (HG-U133A microarray platform).

https://doi.org/10.1371/journal.pone.0204123.s005

(DOCX)

S6 Table. Hazard ratios and 95% confidence intervals obtained for each of the 24 preprocessing methods, the random forest classifiers evaluated, and the simple unanimous vote classifier, per signature (HG-U133 Plus 2.0 microarray platform).

https://doi.org/10.1371/journal.pone.0204123.s006

(DOCX)

S7 Table. Frequency of top ranked genes selected from each of the preprocessing pipelines by univariate Cox proportional hazard models.

https://doi.org/10.1371/journal.pone.0204123.s007

(DOCX)

S1 Fig. Hazard ratio forest plot and accuracy for Buffa metagene signature using the HG-U133A microarray platform.

(A) Forest plot of log2 hazard ratios with 95% confidence intervals obtained for each of the 24 preprocessing (PP) methods, the random forest classifiers evaluated, and the simple unanimous vote classifier (total number of votes for poor prognosis either 0 or 24). The forest plot is ordered as decreasing hazard ratio. The dotted line represents a hazard ratio of 1. The blue hazard ratio with its 95% confidence interval represents the hazard ratio for the simple unanimous vote classifier. (B) Bar plot of accuracy obtained for each of the 24 preprocessing methods, the random forest classifiers evaluated, and the simple unanimous vote classifier. The bars are ordered by preprocessing pipelines, the unanimous classifier, and the best performing random forest classifier, from left to right.

https://doi.org/10.1371/journal.pone.0204123.s008

(DOCX)

S2 Fig. Hazard ratio forest plot and accuracy for Hu signature using the HG-U133A microarray platform.

(A) Forest plot of log2 hazard ratios with 95% confidence intervals obtained for each of the 24 preprocessing (PP) methods, the random forest classifiers evaluated, and the simple unanimous vote classifier (total number of votes for poor prognosis either 0 or 24). The forest plot is ordered as decreasing hazard ratio. The dotted line represents a hazard ratio of 1. The blue hazard ratio with its 95% confidence interval represents the hazard ratio for the simple unanimous vote classifier. (B) Bar plot of accuracy obtained for each of the 24 preprocessing methods, the random forest classifiers evaluated, and the simple unanimous vote classifier. The bars are ordered by preprocessing pipelines, the unanimous classifier, and the best performing random forest classifier, from left to right.

https://doi.org/10.1371/journal.pone.0204123.s009

(DOCX)

S3 Fig. Hazard ratio forest plot and accuracy for Sorensen signature using the HG-U133A microarray platform.

(A) Forest plot of log2 hazard ratios with 95% confidence intervals obtained for each of the 24 preprocessing (PP) methods, the random forest classifiers evaluated, and the simple unanimous vote classifier (total number of votes for poor prognosis either 0 or 24). The forest plot is ordered as decreasing hazard ratio. The dotted line represents a hazard ratio of 1. The blue hazard ratio with its 95% confidence interval represents the hazard ratio for the simple unanimous vote classifier. (B) Bar plot of accuracy obtained for each of the 24 preprocessing methods, the random forest classifiers evaluated, and the simple unanimous vote classifier. The bars are ordered by preprocessing pipelines, the unanimous classifier, and the best performing random forest classifier, from left to right.

https://doi.org/10.1371/journal.pone.0204123.s010

(DOCX)

S4 Fig. Hazard ratio forest plot and accuracy for Buffa metagene signature using the HG-U133 Plus 2.0 microarray platform.

(A) Forest plot of log2 hazard ratios with 95% confidence intervals obtained for each of the 24 preprocessing (PP) methods, the random forest classifiers evaluated, and the simple unanimous vote classifier (total number of votes for poor prognosis either 0 or 24). The forest plot is ordered as decreasing hazard ratio. The dotted line represents a hazard ratio of 1. The blue hazard ratio with its 95% confidence interval represents the hazard ratio for the simple unanimous vote classifier. (B) Bar plot of accuracy obtained for each of the 24 preprocessing methods, the random forest classifiers evaluated, and the simple unanimous vote classifier. The bars are ordered by preprocessing pipelines, the unanimous classifier, and the best performing random forest classifier, from left to right.

https://doi.org/10.1371/journal.pone.0204123.s011

(DOCX)

S5 Fig. Hazard ratio forest plot and accuracy for Winter metagene signature using the HG-U133 Plus 2.0 microarray platform.

(A) Forest plot of log2 hazard ratios with 95% confidence intervals obtained for each of the 24 preprocessing (PP) methods, the random forest classifiers evaluated, and the simple unanimous vote classifier (total number of votes for poor prognosis either 0 or 24). The forest plot is ordered as decreasing hazard ratio. The dotted line represents a hazard ratio of 1. The blue hazard ratio with its 95% confidence interval represents the hazard ratio for the simple unanimous vote classifier. (B) Bar plot of accuracy obtained for each of the 24 preprocessing methods, the random forest classifiers evaluated, and the simple unanimous vote classifier. The bars are ordered by preprocessing pipelines, the unanimous classifier, and the best performing random forest classifier, from left to right.

https://doi.org/10.1371/journal.pone.0204123.s012

(DOCX)

S6 Fig. Hazard ratio forest plot and accuracy for Hu signature using the HG-U133 Plus 2.0 microarray platform.

(A) Forest plot of log2 hazard ratios with 95% confidence intervals obtained for each of the 24 preprocessing (PP) methods, the random forest classifiers evaluated, and the simple unanimous vote classifier (total number of votes for poor prognosis either 0 or 24). The forest plot is ordered as decreasing hazard ratio. The dotted line represents a hazard ratio of 1. The blue hazard ratio with its 95% confidence interval represents the hazard ratio for the simple unanimous vote classifier. (B) Bar plot of accuracy obtained for each of the 24 preprocessing methods, the random forest classifiers evaluated, and the simple unanimous vote classifier. The bars are ordered by preprocessing pipelines, the unanimous classifier, and the best performing random forest classifier, from left to right.

https://doi.org/10.1371/journal.pone.0204123.s013

(DOCX)

S7 Fig. Hazard ratio forest plot and accuracy for Sorensen signature using the HG-U133 Plus 2.0 microarray platform.

(A) Forest plot of log2 hazard ratios with 95% confidence intervals obtained for each of the 24 preprocessing (PP) methods, the random forest classifiers evaluated, and the simple unanimous vote classifier (total number of votes for poor prognosis either 0 or 24). The forest plot is ordered as decreasing hazard ratio. The dotted line represents a hazard ratio of 1. The blue hazard ratio with its 95% confidence interval represents the hazard ratio for the simple unanimous vote classifier. (B) Bar plot of accuracy obtained for each of the 24 preprocessing methods, the random forest classifiers evaluated, and the simple unanimous vote classifier. The bars are ordered by preprocessing pipelines, the unanimous classifier, and the best performing random forest classifier, from left to right.

https://doi.org/10.1371/journal.pone.0204123.s014

(DOCX)

S8 Fig. Kaplan-Meier survival curves evaluating the prognostic ability of the Buffa metagene signature using HG-U133A microarray platform.

(A) Prognostic ability of signature in patients with unanimous ensemble agreement across preprocessing pipelines. (B) Prognostic ability of signature classification using the best performing preprocessing pipeline. (C) Prognostic ability of signature classification using the best performing random forest-based ensemble of preprocessing pipelines. Hazard ratios and p-values are from Cox proportional hazard ratio modeling.

https://doi.org/10.1371/journal.pone.0204123.s015

(DOCX)

S9 Fig. Kaplan-Meier survival curves evaluating the prognostic ability of the Winter metagene signature using HG-U133A microarray platform.

(A) Prognostic ability of signature in patients with unanimous ensemble agreement across preprocessing pipelines. (B) Prognostic ability of signature classification using the best performing preprocessing pipeline. (C) Prognostic ability of signature classification using the best performing random forest-based ensemble of preprocessing pipelines. Hazard ratios and p-values are from Cox proportional hazard ratio modeling.

https://doi.org/10.1371/journal.pone.0204123.s016

(DOCX)

S10 Fig. Kaplan-Meier survival curves evaluating the prognostic ability of the Hu signature using HG-U133A microarray platform.

(A) Prognostic ability of signature in patients with unanimous ensemble agreement across preprocessing pipelines. (B) Prognostic ability of signature classification using the best performing preprocessing pipeline. (C) Prognostic ability of signature classification using the best performing random forest-based ensemble of preprocessing pipelines. Hazard ratios and p-values are from Cox proportional hazard ratio modeling.

https://doi.org/10.1371/journal.pone.0204123.s017

(DOCX)

S11 Fig. Kaplan-Meier survival curves evaluating the prognostic ability of the Sorensen signature using HG-U133A microarray platform.

(A) Prognostic ability of signature in patients with unanimous ensemble agreement across preprocessing pipelines. (B) Prognostic ability of signature classification using the best performing preprocessing pipeline. (C) Prognostic ability of signature classification using the best performing random forest-based ensemble of preprocessing pipelines. Hazard ratios and p-values are from Cox proportional hazard ratio modeling.

https://doi.org/10.1371/journal.pone.0204123.s018

(DOCX)

S12 Fig. Kaplan-Meier survival curves evaluating the prognostic ability of the Buffa metagene signature using HG-U133 Plus 2.0 microarray platform.

(A) Prognostic ability of signature in patients with unanimous ensemble agreement across preprocessing pipelines. (B) Prognostic ability of signature classification using the best performing preprocessing pipeline. (C) Prognostic ability of signature classification using the best performing random forest-based ensemble of preprocessing pipelines. Hazard ratios and p-values are from Cox proportional hazard ratio modeling.

https://doi.org/10.1371/journal.pone.0204123.s019

(DOCX)

S13 Fig. Kaplan-Meier survival curves evaluating the prognostic ability of the Winter metagene signature using HG-U133 Plus 2.0 microarray platform.

(A) Prognostic ability of signature in patients with unanimous ensemble agreement across preprocessing pipelines. (B) Prognostic ability of signature classification using the best performing preprocessing pipeline. (C) Prognostic ability of signature classification using the best performing random forest-based ensemble of preprocessing pipelines. Hazard ratios and p-values are from Cox proportional hazard ratio modeling.

https://doi.org/10.1371/journal.pone.0204123.s020

(DOCX)

S14 Fig. Kaplan-Meier survival curves evaluating the prognostic ability of the Hu signature using HG-U133 Plus 2.0 microarray platform.

(A) Prognostic ability of signature in patients with unanimous ensemble agreement across preprocessing pipelines. (B) Prognostic ability of signature classification using the best performing preprocessing pipeline. (C) Prognostic ability of signature classification using the best performing random forest-based ensemble of preprocessing pipelines. Hazard ratios and p-values are from Cox proportional hazard ratio modeling.

https://doi.org/10.1371/journal.pone.0204123.s021

(DOCX)

S15 Fig. Kaplan-Meier survival curves evaluating the prognostic ability of the Sorensen signature using HG-U133 Plus 2.0 microarray platform.

(A) Prognostic ability of signature in patients with unanimous ensemble agreement across preprocessing pipelines. (B) Prognostic ability of signature classification using the best performing preprocessing pipeline. (C) Prognostic ability of signature classification using the best performing random forest-based ensemble of preprocessing pipelines. Hazard ratios and p-values are from Cox proportional hazard ratio modeling.

https://doi.org/10.1371/journal.pone.0204123.s022

(DOCX)

S16 Fig. Hazard ratios of classifier performance and their predictions of biochemical recurrence in different methylation preprocessing methods for intermediate-risk prostate cancer patients.

(A) Similar to Fig 5, hazard ratios with 95% confidence intervals of novel signature classifiers ordered by decreasing hazard ratios. Dotted line represents a hazard ratio of 1. (B) Class predictions of biochemical recurrence from each of the classifiers and the ensemble. Each row is a classifier, and rows are ordered by decreasing hazard ratios; each column is a patient, and columns are ordered by agreement across the 13 classifiers. The true class for each patient is indicated by the bar on top. A predicted class label matching the true label is blue for true positive and purple for true negative, while white indicates an incorrect label.

https://doi.org/10.1371/journal.pone.0204123.s023

(DOCX)

Acknowledgments

The authors thank all members of the Boutros lab for helpful comments and suggestions.

References

  1. Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell. 2011;144(5): 646–674. pmid:21376230
  2. Burrell RA, McGranahan N, Bartek J, Swanton C. The causes and consequences of genetic heterogeneity in cancer evolution. Nature. 2013;501(7467): 338–345. pmid:24048066
  3. Boutros PC, Fraser M, Harding NJ, de Borja R, Trudel D, Lalonde E, et al. Spatial genomic heterogeneity within localized, multifocal prostate cancer. Nature genetics. 2015;47(7): 736–745. pmid:26005866
  4. Cooper CS, Eeles R, Wedge DC, Van Loo P, Gundem G, Alexandrov LB, et al. Analysis of the genetic phylogeny of multifocal prostate cancer identifies multiple independent clonal expansions in neoplastic and morphologically normal prostate tissue. Nature genetics. 2015;47(4): 367–372. pmid:25730763
  5. de Bruin EC, McGranahan N, Mitter R, Salm M, Wedge DC, Yates L, et al. Spatial and temporal diversity in genomic instability processes defines lung cancer evolution. Science. 2014;346(6206): 251–256. pmid:25301630
  6. Nik-Zainal S, Van Loo P, Wedge DC, Alexandrov LB, Greenman CD, Lau KW, et al. The life history of 21 breast cancers. Cell. 2012;149(5): 994–1007. pmid:22608083
  7. van't Veer LJ, Bernards R. Enabling personalized cancer medicine through analysis of gene-expression patterns. Nature. 2008;452(7187): 564–570. pmid:18385730
  8. Buyse M, Sargent DJ, Grothey A, Matheson A, de Gramont A. Biomarkers and surrogate end points—the challenge of statistical validation. Nature reviews Clinical oncology. 2010;7(6): 309–317. pmid:20368727
  9. Chin L, Gray JW. Translating insights from the cancer genome into clinical practice. Nature. 2008;452(7187): 553–563. pmid:18385729
  10. Diamandis EP. Cancer biomarkers: can we turn recent failures into success? Journal of the National Cancer Institute. 2010;102(19): 1462–1467. pmid:20705936
  11. Boutros PC. The path to routine use of genomic biomarkers in the cancer clinic. Genome research. 2015;25(10): 1508–1513. pmid:26430161
  12. Ioannidis JP, Allison DB, Ball CA, Coulibaly I, Cui X, Culhane AC, et al. Repeatability of published microarray gene expression analyses. Nature genetics. 2009;41(2): 149–155. pmid:19174838
  13. Venet D, Dumont JE, Detours V. Most random gene expression signatures are significantly associated with breast cancer outcome. PLoS computational biology. 2011;7(10):e1002240. pmid:22028643
  14. Kern SE. Why your new cancer biomarker may never work: recurrent patterns and remarkable diversity in biomarker failures. Cancer research. 2012;72(23): 6097–6101. pmid:23172309
  15. Iwamoto T, Pusztai L. Predicting prognosis of breast cancer with gene signatures: are we lost in a sea of data? Genome medicine. 2010;2(11):81. pmid:21092148
  16. Boutros PC, Lau SK, Pintilie M, Liu N, Shepherd FA, Der SD, et al. Prognostic gene signatures for non-small-cell lung cancer. Proceedings of the National Academy of Sciences of the United States of America. 2009;106(8): 2824–2828. pmid:19196983
  17. Starmans MH, Fung G, Steck H, Wouters BG, Lambin P. A simple but highly effective approach to evaluate the prognostic performance of gene expression signatures. PLoS ONE. 2011;6(12):e28320. pmid:22163293
  18. Starmans MH, Pintilie M, John T, Der SD, Shepherd FA, Jurisica I, et al. Exploiting the noise: improving biomarkers with ensembles of data analysis methodologies. Genome medicine. 2012;4(11):84. pmid:23146350
  19. Fox NS, Starmans MH, Haider S, Lambin P, Boutros PC. Ensemble analyses improve signatures of tumour hypoxia and reveal inter-platform differences. BMC bioinformatics. 2014;15:170. pmid:24902696
  20. Brown JM. Tumor hypoxia in cancer therapy. Methods in enzymology. 2007;435: 297–321. pmid:17998060
  21. Brown JM, Wilson WR. Exploiting tumour hypoxia in cancer treatment. Nature reviews Cancer. 2004;4(6): 437–447. pmid:15170446
  22. Wouters BG, van den Beucken T, Magagnin MG, Lambin P, Koumenis C. Targeting hypoxia tolerance in cancer. Drug resistance updates: reviews and commentaries in antimicrobial and anticancer chemotherapy. 2004;7(1): 25–40.
  23. Desmedt C, Piette F, Loi S, Wang Y, Lallemand F, Haibe-Kains B, et al. Strong time dependence of the 76-gene prognostic signature for node-negative breast cancer patients in the TRANSBIG multicenter independent validation series. Clinical cancer research: an official journal of the American Association for Cancer Research. 2007;13(11): 3207–3214.
  24. Miller LD, Smeds J, George J, Vega VB, Vergara L, Ploner A, et al. An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival. Proceedings of the National Academy of Sciences of the United States of America. 2005;102(38): 13550–13555. pmid:16141321
  25. Pawitan Y, Bjohle J, Amler L, Borg AL, Egyhazi S, Hall P, et al. Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts. Breast cancer research. 2005;7(6): R953–964. pmid:16280042
  26. Schmidt M, Petry IB, Bohm D, Lebrecht A, von Torne C, Gebhard S, et al. Ep-CAM RNA expression predicts metastasis-free survival in three cohorts of untreated node-negative breast cancer. Breast cancer research and treatment. 2011;125(3): 637–646. pmid:20352488
  27. Sotiriou C, Wirapati P, Loi S, Harris A, Fox S, Smeds J, et al. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. Journal of the National Cancer Institute. 2006;98(4): 262–272. pmid:16478745
  28. Symmans WF, Hatzis C, Sotiriou C, Andre F, Peintinger F, Regitnig P, et al. Genomic index of sensitivity to endocrine therapy for breast cancer. Journal of clinical oncology. 2010;28(27): 4111–4119. pmid:20697068
  29. Wang Y, Klijn JG, Zhang Y, Sieuwerts AM, Look MP, Yang F, et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet. 2005;365(9460): 671–679. pmid:15721472
  30. Zhang Y, Sieuwerts AM, McGreevy M, Casey G, Cufer T, Paradiso A, et al. The 76-gene signature defines high-risk patients that benefit from adjuvant tamoxifen therapy. Breast cancer research and treatment. 2009;116(2): 303–309. pmid:18821012
  31. Kao KJ, Chang KM, Hsu HC, Huang AT. Correlation of microarray-based breast cancer molecular subtypes and clinical outcomes: implications for treatment optimization. BMC cancer. 2011;11:143. pmid:21501481
  32. Sabatier R, Finetti P, Cervera N, Lambaudie E, Esterni B, Mamessier E, et al. A gene expression signature identifies two prognostic subgroups of basal breast cancer. Breast cancer research and treatment. 2011;126(2): 407–420. pmid:20490655
  33. Shiah YJ, Fraser M, Bristow RG, Boutros PC. Comparison of pre-processing methods for Infinium HumanMethylation450 BeadChip array. Bioinformatics. 2017;33(20): 3151–3157. pmid:28605401
  34. Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003;4(2): 249–264. pmid:12925520
  35. Hubbell E, Liu WM, Mei R. Robust estimators for expression analysis. Bioinformatics. 2002;18(12): 1585–1592. pmid:12490442
  36. Li C, Hung Wong W. Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biology. 2001;2(8):RESEARCH0032.
  37. Wu Z, Irizarry RA. Stochastic models inspired by hybridization theory for short oligonucleotide arrays. Journal of computational biology. 2005;12(6): 882–893. pmid:16108723
  38. Buffa FM, Harris AL, West CM, Miller CJ. Large meta-analysis of multiple cancers reveals a common, compact and highly prognostic hypoxia metagene. British journal of cancer. 2010;102(2): 428–435. pmid:20087356
  39. Winter SC, Buffa FM, Silva P, Miller C, Valentine HR, Turley H, et al. Relation of a hypoxia metagene derived from head and neck cancer to prognosis of multiple cancers. Cancer Research. 2007;67(7): 3441–3449. pmid:17409455
  40. Hu Z, Fan C, Livasy C, He X, Oh DS, Ewend MG, et al. A compact VEGF signature associated with distant metastases and poor outcomes. BMC Medicine. 2009;7:9. pmid:19291283
  41. Sorensen BS, Toustrup K, Horsman MR, Overgaard J, Alsner J. Identifying pH independent hypoxia induced genes in human squamous cell carcinomas in vitro. Acta Oncologica. 2010;49(7): 895–905. pmid:20429727
  42. Shi L, Campbell G, Jones WD, Campagne F, Wen Z, Walker SJ, et al. The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nature Biotechnology. 2010;28(8): 827–838. pmid:20676074
  43. Verhaak RG, Staal FJ, Valk PJ, Lowenberg B, Reinders MJ, de Ridder D. The effect of oligonucleotide microarray data pre-processing on the analysis of patient-cohort studies. BMC Bioinformatics. 2006;7:105. pmid:16512908
  44. Poole EM, Shu X, Caan BJ, Flatt SW, Holmes MD, Lu W, et al. Postdiagnosis supplement use and breast cancer prognosis in the After Breast Cancer Pooling Project. Breast cancer research and treatment. 2013;139(2): 529–537. pmid:23660948
  45. Tofigh A, Suderman M, Paquet ER, Livingstone J, Bertos N, Saleh SM, et al. The prognostic ease and difficulty of invasive breast carcinoma. Cell Reports. 2014;9(1): 129–142. pmid:25284793
  46. Ewing AD, Houlahan KE, Hu Y, Ellrott K, Caloian C, Yamaguchi TN, et al. Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection. Nature Methods. 2015;12(7): 623–630. pmid:25984700