GradientScanSurv—An exhaustive association test method for gene expression data with censored survival outcome

doi:10.1371/journal.pone.0207590

Fig 1.

GradientScanSurv procedure for exhaustive association test of gene expression with survival outcome.

Expression data were permuted to form trial datasets (n ≥ 1000). A GoodCount was derived for each trial dataset and for the real dataset as the number of cutpoints that yielded significant logrank test p-values. The GoodCount p-value (GoodCountPval) is then the proportion of permuted trial datasets whose GoodCount is no less than that of the real data. See Materials and methods for details.
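The procedure in this caption can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the `min_group` parameter (a hypothetical minimum number of samples on each side of a cutpoint), the choice to place cutpoints between consecutive ranked samples, and the small hand-written two-group logrank test are all assumptions made for the example.

```python
import math
import random

def logrank_pvalue(times, events, in_high):
    """Two-group logrank test p-value (chi-square approximation, 1 df)."""
    obs_minus_exp = 0.0
    var = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)                          # at risk, all samples
        n1 = sum(1 for x, g in zip(times, in_high) if g and x >= t)  # at risk, high group
        d = sum(1 for x, e in zip(times, events) if e and x == t)    # events at time t
        d1 = sum(1 for x, e, g in zip(times, events, in_high)
                 if g and e and x == t)                              # events in high group
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(obs_minus_exp ** 2 / var / 2))

def good_count(expr, times, events, alpha=0.05, min_group=3):
    """Number of expression cutpoints whose logrank p-value is <= alpha.

    min_group is a hypothetical parameter, not taken from the paper.
    """
    order = sorted(range(len(expr)), key=lambda i: expr[i])
    count = 0
    for k in range(min_group, len(expr) - min_group + 1):
        high = set(order[k:])  # samples above the k-th ranked expression value
        in_high = [i in high for i in range(len(expr))]
        if logrank_pvalue(times, events, in_high) <= alpha:
            count += 1
    return count

def good_count_pval(expr, times, events, n_perm=1000, seed=0):
    """Proportion of permuted trials whose GoodCount is no less than the real one."""
    rng = random.Random(seed)
    real = good_count(expr, times, events)
    shuffled = list(expr)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between expression and survival
        if good_count(shuffled, times, events) >= real:
            hits += 1
    return hits / n_perm
```

Shuffling only the expression values keeps the survival times and censoring pattern fixed, so each trial GoodCount reflects what the cutpoint scan produces when expression carries no survival information.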

Fig 2.

Comparison of GradientScanSurv with PrognoScan results for the NRAS gene in LUAD datasets downloaded from the PrognoScan website.

(A). A joined table of GradientScanSurv results and direct results from the PrognoScan website for the NRAS gene in LUAD (lung adenocarcinoma) datasets. The red arrow indicates a dataset where GradientScanSurv called a significant association with GoodCountPval = 0.047, but PrognoScan missed the call with CORRECTED.P.VALUE = 0.092. (B). Screenshot of PrognoScan’s expression gradient-based logrank p-values plot. The blue vertical line marks the minimal p-value, which was used for the final CORRECTED.P.VALUE. (C). Screenshot of the PrognoScan report with the final CORRECTED.P.VALUE at 0.092. (D). GradientScanSurv gene expression gradient-based logrank p-values plot with GoodCountPval at 0.047. The green vertical lines indicate the cutpoints where the corresponding logrank p-values are significant at p-value ≤ 0.05. The univariate expression-based coxph p-value and the expression-rank-based coxph p-value are also reported here, as in panel A. This result is consistent with the TCGA validation result, which has GoodCountPval = 0.001. The plot also illustrates that higher expression of NRAS is associated with faster death, as indicated by the brown diamonds along the top part of the plots (Figure E in S1 File).

Fig 3.

Comparison of GradientScanSurv with PrognoScan results for the KRAS gene in LUAD datasets downloaded from PrognoScan.

(A). A joined table of GradientScanSurv results and direct results from the PrognoScan website for the KRAS gene in the same LUAD datasets. The red arrow indicates a dataset where GradientScanSurv called a significant association with GoodCountPval = 0.004, but PrognoScan barely missed the call with CORRECTED.P.VALUE = 0.0524 for dataset Jacob-00182-HLM with probeset 204010_s_at. (B). Screenshot of PrognoScan’s report for the dataset indicated by the red arrow in panel A. (C). GradientScanSurv gene expression gradient-based logrank p-values plot with GoodCountPval at 0.004 for the dataset indicated by the red arrow in panel A. See text for details.

Fig 4.

Performance comparison with ROC analysis by shared truth lists.

(A). ROC analysis comparing all listed methods on the same truth gene lists, each shared among multiple truth lists derived from individual methods, using PrognoScan LUAD data as the Training Set and TCGA LUAD data as the Testing Set. The settings table shows which datasets were used for the Training Sets and Testing Sets, as well as whether the Truth Set of genes derived from training is a list shared by at least 2 listed methods. The general ROC analysis procedure is described in the Materials and methods section. (B). Performance comparison table listing the number of trials in which a method has the largest AUC of the ROC curves compared with the other methods for the same truth gene list (shared by at least 2 listed methods) and the same Testing Set trial. All truth lists from Training Sets and result lists derived from Testing Sets are at adjusted p-value ≤ 0.05 for each method. Freq_2_TruthList is the truth gene list shared by at least 2 of the selected methods, Freq_3_TruthList is the truth gene list shared by at least 3 of the selected methods, and so on. Column “Total” gives the total counts over all scenarios (in a tie, each tied method gets 0.5 counts). Column “Percentage” lists the proportions of those counts. The details of each trial are shown in Table C in S1 File. (C). Average AUCs of ROC analysis for each method in each subset of the total trials, where a subset is defined by 100 trials of TCGA data combined with one of the PrognoScan datasets listed in column “PrognoScan_Datasets” (see also the Materials and methods section for details). The last column, “MaxAUCsMethods”, shows the method with the maximal AUC for each row. (D). Examples of ROC plots in which the AUC of the GradientScanSurv method is the largest.
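The tally in panel B can be illustrated with a short Python sketch. The function name and the input layout (one dictionary of per-method AUCs per trial, with `None` standing in for an unavailable AUC) are assumptions for the example. The caption states only that each method gets 0.5 counts in a tie, so the sketch applies that rule literally and does not address ties among more than two methods.

```python
from collections import defaultdict

def tally_best_auc(trials):
    """Count how often each method has the largest AUC across trials.

    trials: list of dicts mapping method name -> AUC (None if unavailable).
    Returns (counts, percentages), both keyed by method name.
    """
    counts = defaultdict(float)
    for aucs in trials:
        valid = {m: a for m, a in aucs.items() if a is not None}
        if not valid:
            continue  # no method produced an AUC in this trial
        best = max(valid.values())
        winners = [m for m, a in valid.items() if a == best]
        # A sole winner gets 1 count; in a tie, each tied method gets 0.5
        # (as stated in the caption; assumed here to mean two-way ties).
        share = 1.0 if len(winners) == 1 else 0.5
        for m in winners:
            counts[m] += share
    total = sum(counts.values())
    percentages = {m: 100.0 * c / total for m, c in counts.items()} if total else {}
    return dict(counts), percentages
```

For example, `tally_best_auc([{"A": 0.8, "B": 0.7}, {"A": 0.6, "B": 0.6}])` credits method A with one outright win plus half a count for the tied trial.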

Fig 5.

Performance comparison by ROC analysis with the same Training Set and Testing Set for each method.

(A). ROC analysis comparing all listed methods, each of which was applied to the same Training Set and Testing Set. The settings table shows which datasets were used for the Training Set and Testing Set: each method was applied to the same TCGA LUAD data as the Training Set and the same PrognoScan LUAD data as the Testing Set. (B). Performance comparison table listing the number of trials in which a method has the largest AUC of the ROC curves compared with the other methods, for the same Training Set and Testing Set datasets shown in panel A. Column “Total” gives, for each method listed in column “Methods”, the total count of trials in which that method was best. Column “Percentage” lists the proportions of those counts. All truth lists derived from the Training Set and result lists derived from the Testing Set are at adjusted p-value ≤ 0.05 for each method. The details of each trial are shown in Table E in S1 File. (C). Average AUCs of ROC analysis for each method in each subset of the total trials, where a subset is defined by 100 trials of TCGA data combined with one of the PrognoScan datasets listed in column “PrognoScan_Datasets” (see also the Materials and methods section for details). The last column, “MaxAUCsMethods”, shows the methods with the maximal AUC for each row. (D). Examples of ROC plots in which the AUC of the GradientScanSurv method is the largest. AUC = NA arises because the MedianPvals method has no positive calls in the Training datasets at adjusted p-value ≤ 0.05.
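The AUC = NA case in panel D follows directly from how an AUC is defined: with no positive genes in the truth list, there is nothing to rank against the negatives. A minimal Python sketch, assuming each gene has a score where higher means a stronger call (for example, one minus the adjusted p-value) and a flag for truth-list membership; the function name and score convention are illustrative, not taken from the paper.

```python
def roc_auc(scores, is_truth):
    """AUC as the probability that a truth gene outscores a non-truth gene.

    Returns None (the NA case) when the truth list is empty or when
    every gene is in the truth list, since the AUC is then undefined.
    """
    pos = [s for s, t in zip(scores, is_truth) if t]
    neg = [s for s, t in zip(scores, is_truth) if not t]
    if not pos or not neg:
        return None
    # Count positive/negative pairs ranked correctly; tied scores count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

This pairwise form is equivalent to the area under the ROC curve and makes the undefined case explicit instead of raising a division error.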

Table 1.

Comparison of shared genes identified by selected methods on two independent PAAD datasets.

Table 2.

Comparison of results by GradientScanSurv and Lasso methods on TCGA tumor data (adjusted p ≤ 0.05).
