Batch Effect Confounding Leads to Strong Bias in Performance Estimates Obtained by Cross-Validation

doi:10.1371/journal.pone.0100335

Figure 1.

The four confounding levels considered in this study.

The two bars for each confounding level correspond to the two batches. The different colors correspond to the two experimental groups (e.g., control and treated). The heights of the bars show the fraction of samples belonging to each category. In addition to the four situations shown in this figure, we also consider data sets with no batch effect, in which all samples are generated from the same batch.
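
The design in this figure can be sketched in a few lines of Python. This is only an illustration: the function names and the `confounding` parameter (the fraction of each group placed in its "own" batch) are assumptions made here, and the caption does not state the exact fractions used for the four levels.

```python
def allocate_samples(n_per_group, confounding):
    """Return (batch, group) labels for a two-group, two-batch design.

    `confounding` is a hypothetical parameter: the fraction of group 0 placed
    in batch 0 (and of group 1 placed in batch 1). 0.5 gives a balanced
    design, 1.0 gives full confounding of batch and group.
    """
    if not 0.5 <= confounding <= 1.0:
        raise ValueError("confounding must lie in [0.5, 1.0]")
    n_major = round(n_per_group * confounding)
    labels = []
    for group in (0, 1):
        for i in range(n_per_group):
            # The first n_major samples of each group go to the batch with the
            # same index as the group; the rest go to the other batch.
            batch = group if i < n_major else 1 - group
            labels.append((batch, group))
    return labels


def batch_composition(labels):
    """Share of each batch made up by each group (one reading of the bar heights)."""
    counts = {}
    for batch, group in labels:
        counts[(batch, group)] = counts.get((batch, group), 0) + 1
    totals = {}
    for (batch, _), c in counts.items():
        totals[batch] = totals.get(batch, 0) + c
    return {key: c / totals[key[0]] for key, c in counts.items()}
```

For example, `batch_composition(allocate_samples(20, 0.75))` describes a design in which three quarters of batch 0 comes from group 0, while a value of 1.0 places each group entirely in its own batch.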

Figure 2.

The cross-validation scheme employed in the study.

The upper panel illustrates the combination of the inner cross-validation loop, which is used to estimate the optimal combination of the classifier hyperparameter and number of features, and the outer cross-validation loop, which is used to estimate the predictive performance of the constructed classifier. The lower panel shows how the final classifier is built on the whole input data set, and its performance is estimated on an external validation data set. The bias of the estimate from the cross-validation procedure is obtained by comparing the values in the two colored boxes.
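
The scheme in this figure can be illustrated with a minimal nested cross-validation sketch in plain Python. Everything in it is an assumption made for illustration rather than the authors' code (which is in Supporting Information S2): the nearest-centroid classifier, the mean-difference feature ranking, the grid of feature counts, the fold counts, and the choice to report the bias as internal minus external accuracy. For brevity, only the number of features is tuned in the inner loop.

```python
import random


def folds(y, k, rng):
    """Split sample indices into k folds, stratified by class label."""
    idx = sorted(range(len(y)), key=lambda i: (y[i], rng.random()))
    return [idx[i::k] for i in range(k)]


def fit(X, y, n_features):
    """Nearest-centroid model on the n_features with the largest mean difference."""
    p = len(X[0])
    mean = {c: [sum(x[j] for x, lab in zip(X, y) if lab == c) / y.count(c)
                for j in range(p)] for c in (0, 1)}
    ranked = sorted(range(p), key=lambda j: -abs(mean[0][j] - mean[1][j]))
    return ranked[:n_features], mean


def accuracy(model, X, y):
    """Fraction of samples assigned to the class with the closer centroid."""
    feats, mean = model

    def predict(x):
        dist = {c: sum((x[j] - mean[c][j]) ** 2 for j in feats) for c in (0, 1)}
        return min(dist, key=dist.get)

    return sum(predict(x) == lab for x, lab in zip(X, y)) / len(y)


def cross_validate(X, y, k, rng, build):
    """Average held-out accuracy; `build` maps a training set to a fitted model."""
    scores = []
    for test in folds(y, k, rng):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        model = build([X[i] for i in train], [y[i] for i in train])
        scores.append(accuracy(model, [X[i] for i in test], [y[i] for i in test]))
    return sum(scores) / len(scores)


def tune_and_fit(X, y, grid, rng, inner_k=3):
    """Inner loop: choose the number of features, then refit on all of (X, y)."""
    best = max(grid, key=lambda m: cross_validate(
        X, y, inner_k, rng, lambda A, b: fit(A, b, m)))
    return fit(X, y, best)


def estimate_bias(X, y, X_ext, y_ext, grid=(1, 2, 5), outer_k=5, seed=0):
    """Return (internal, external, internal - external) accuracy estimates."""
    rng = random.Random(seed)
    # Outer loop: tuning is repeated inside every outer training set.
    internal = cross_validate(X, y, outer_k, rng,
                              lambda A, b: tune_and_fit(A, b, grid, rng))
    # Final classifier: tuned and fitted on the whole input data set.
    final_model = tune_and_fit(X, y, grid, rng)
    external = accuracy(final_model, X_ext, y_ext)
    return internal, external, internal - external
```

Because the tuning step runs inside each outer training set, the outer held-out samples never influence the choice of features. The internal estimate can still be optimistic when the held-out samples share a confounded batch structure with the training samples, which is the gap the two colored boxes are meant to expose.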

Figure 3.

Step-by-step description of the flowchart illustrated in Figure 2.

The code used to produce the results presented in this manuscript is provided in Supporting Information S2.

Figure 4.

Evaluation of classifiers built on data without truly differentially expressed genes between the classes, but with a batch effect with varying degrees of confounding with the class labels.

(a) Estimated predictive performance from the outer cross-validation (internal) and obtained by applying the constructed classifier to an external test set (external). (b) The fraction of predictor variables selected for the final classifier that were simulated to be differentially expressed and/or associated with the batch. The bars summarize results across all classifiers and all data set replicates. The bar heights represent the average fraction of variables extracted from each category, and the error bars extend one standard deviation above the average. Note that since there are no truly differentially expressed genes in this data set, the heights of the two corresponding bars are zero.
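
As a rough illustration of how the summary in panel (b) could be computed, the sketch below classifies the selected variables into categories and then takes the mean and standard deviation across runs. The category names, function names, and the use of the sample standard deviation are assumptions made here; the caption does not say which standard deviation estimator was used.

```python
from statistics import mean, stdev

# Hypothetical category labels for the simulated variable types.
CATEGORIES = ("de_only", "batch_only", "de_and_batch", "neither")


def category_fractions(selected, de_genes, batch_genes):
    """Fraction of the selected variables falling in each simulated category."""
    counts = dict.fromkeys(CATEGORIES, 0)
    for g in selected:
        is_de, is_batch = g in de_genes, g in batch_genes
        if is_de and is_batch:
            counts["de_and_batch"] += 1
        elif is_de:
            counts["de_only"] += 1
        elif is_batch:
            counts["batch_only"] += 1
        else:
            counts["neither"] += 1
    return {k: v / len(selected) for k, v in counts.items()}


def summarize(runs):
    """Per category: (bar height, error-bar length) as (mean, one sample SD).

    `runs` holds one category_fractions result per classifier and replicate;
    at least two runs are needed for the standard deviation.
    """
    return {k: (mean([r[k] for r in runs]), stdev([r[k] for r in runs]))
            for k in CATEGORIES}
```

When the set of differentially expressed genes is empty, as in this figure, the `de_only` and `de_and_batch` entries are zero in every run, so both the mean and the standard deviation for those categories are zero.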

Figure 5.

Evaluation of classifiers built on data without truly differentially expressed genes between the classes, but with a batch effect with varying degrees of confounding with the class labels, after the elimination of this batch effect with ComBat.

(a) Estimated predictive performance from the outer cross-validation (internal) and obtained by applying the constructed classifier to an external test set (external). (b) The fraction of predictor variables selected for the final classifier that were simulated to be differentially expressed and/or associated with the batch. The bars summarize results across all classifiers and all data set replicates. The bar heights represent the average fraction of variables extracted from each category, and the error bars extend one standard deviation above the average. Note that since there are no truly differentially expressed genes in this data set, the heights of the two corresponding bars are zero.
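
To give a feel for what a batch adjustment step does, the sketch below applies plain per-batch mean centering to each variable. This is a deliberately simplified stand-in and not ComBat, which is an empirical Bayes location-and-scale adjustment; the function name and data layout (rows are samples, columns are variables) are assumptions made here.

```python
def center_by_batch(X, batch):
    """Subtract each batch's per-variable mean and add back the overall mean.

    Simplified stand-in for a batch adjustment; it is NOT an implementation
    of ComBat. X is a list of samples (rows), batch a list of batch labels.
    """
    p = len(X[0])
    overall = [sum(x[j] for x in X) / len(X) for j in range(p)]
    adjusted = [list(x) for x in X]
    for b in set(batch):
        rows = [i for i, bb in enumerate(batch) if bb == b]
        for j in range(p):
            batch_mean = sum(X[i][j] for i in rows) / len(rows)
            for i in rows:
                adjusted[i][j] = X[i][j] - batch_mean + overall[j]
    return adjusted
```

After this adjustment every batch has the same per-variable mean. Note that when batch and class are fully confounded, centering by batch also removes any mean difference between the classes, because the two sets of labels coincide.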

Figure 6.

Evaluation of classifiers built on data containing truly differentially expressed genes between the classes, as well as a batch effect with varying degrees of confounding with the class labels.

(a) Estimated predictive performance from the outer cross-validation (internal) and obtained by applying the constructed classifier to an external test set (external). (b) The fraction of predictor variables selected for the final classifier that were simulated to be differentially expressed and/or associated with the batch. The bars summarize results across all classifiers and all data set replicates. The bar heights represent the average fraction of variables extracted from each category, and the error bars extend one standard deviation above the average.

Figure 7.

Evaluation of classifiers built on data containing truly differentially expressed genes between the classes, as well as a batch effect with varying degrees of confounding with the class labels, after the elimination of this batch effect with ComBat.

(a) Estimated predictive performance from the outer cross-validation (internal) and obtained by applying the constructed classifier to an external test set (external). (b) The fraction of predictor variables selected for the final classifier that were simulated to be differentially expressed and/or associated with the batch. The bars summarize results across all classifiers and all data set replicates. The bar heights represent the average fraction of variables extracted from each category, and the error bars extend one standard deviation above the average.

Figure 8.

Evaluation of classifiers built on data containing truly differentially expressed genes between the classes, but no batch effect.

(a) Estimated predictive performance from the outer cross-validation (internal) and obtained by applying the constructed classifier to an external test set (external). (b) The fraction of predictor variables selected for the final classifier that were simulated to be differentially expressed and/or associated with the batch. The bars summarize results across all classifiers and all data set replicates. The bar heights represent the average fraction of variables extracted from each category, and the error bars extend one standard deviation above the average. Note that since there is no batch effect in this data set, the heights of the two corresponding bars are zero.
