Figure 1.

A schematic diagram of key gene identification and activity inference.

Pathways selected as significant are then subjected to CORG identification with respect to the phenotype of interest. Gene expression profiles of patient samples drawn from each disease subtype (e.g., good or poor prognosis) are transformed into a “pathway activity matrix”. For a given pathway, the activity is a combined z-score derived from the expression of its individual key genes. After the expression vector of each gene is overlaid on its corresponding protein in the pathway, the key genes that yield the most discriminative activities are found by a greedy search guided by their individual discriminative power (see Methods). The pathway activity matrix is then used to train a classifier.
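The activity inference described in this caption can be illustrated with a minimal sketch. It is not the authors' implementation: the combination rule (sum of member z-scores divided by the square root of the number of genes), the use of a Welch t-statistic as the discriminative score, the ranking by absolute t-score, and the stopping rule are all assumptions made for illustration, and every function name and data value below is hypothetical.

```python
import math
from statistics import mean, variance


def zscores(values):
    """Standardize one gene's expression across all samples."""
    m, s = mean(values), math.sqrt(variance(values))
    return [(v - m) / s for v in values]


def t_score(values, labels):
    """Welch t-statistic between samples labeled 1 and samples labeled 0."""
    a = [v for v, y in zip(values, labels) if y == 1]
    b = [v for v, y in zip(values, labels) if y == 0]
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def activity(z_by_gene, genes):
    """Combined z-score per sample (assumed rule: sum of z-scores / sqrt(k))."""
    k = len(genes)
    n = len(z_by_gene[genes[0]])
    return [sum(z_by_gene[g][j] for g in genes) / math.sqrt(k) for j in range(n)]


def greedy_key_genes(expr, labels):
    """Grow a gene set greedily while the activity's |t| keeps improving."""
    z = {g: zscores(v) for g, v in expr.items()}
    # Rank genes by their individual discriminative power.
    ranked = sorted(z, key=lambda g: abs(t_score(z[g], labels)), reverse=True)
    chosen = [ranked[0]]
    best = abs(t_score(activity(z, chosen), labels))
    for g in ranked[1:]:
        score = abs(t_score(activity(z, chosen + [g]), labels))
        if score <= best:
            break  # assumed stopping rule: stop at the first gene that does not help
        chosen.append(g)
        best = score
    return chosen, best


# Toy, made-up expression values for one pathway: 3 genes x 6 patients.
expr = {
    "g1": [5.0, 6.0, 5.5, 1.0, 1.5, 0.8],
    "g2": [4.0, 5.0, 4.2, 2.0, 1.0, 2.5],
    "g3": [1.0, 3.0, 2.0, 2.0, 1.0, 3.0],
}
labels = [1, 1, 1, 0, 0, 0]  # 1 = poor prognosis, 0 = good prognosis

chosen, best = greedy_key_genes(expr, labels)
print(chosen, round(best, 2))
```

Repeating this for each pathway and stacking the resulting activity vectors would give one row per pathway and one column per patient, which is the shape of the pathway activity matrix that the caption describes.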

Figure 2.

Discriminative power of pathway and gene markers in the breast and lung cancer datasets.

Mean absolute t-scores against phenotypes were compared among four marker sets, either in the source dataset used to identify the markers, shown in (A) and (C) for the two breast cancer datasets and in (E) and (G) for the two lung cancer datasets, or in an independent verification dataset, shown in (B), (D), (F), and (H). Pathway markers were ranked by their absolute t-scores from a two-tailed t-test on activity levels (see S(G) in Methods) between the two phenotypes of interest in the source dataset, and their discriminative power was measured in the same order in the verification dataset. Pathway activities were estimated using only CORGs (PAC) or all member genes (PAC_all). The individual predictive power of the CORGs in the top pathways was also evaluated using the same t-test on their gene expression levels (CORGs). A similar analysis was performed using as many top discriminative genes as there are CORGs covered by the pathway markers (Genes).
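The comparison in this figure can be sketched as follows: markers are ranked by absolute t-score in the source dataset, and the mean absolute t-score of the top k markers is then reported in both the source and the verification dataset, keeping the source ordering. This is an illustrative sketch only. The Welch form of the t-statistic, the function names, and the toy activity values are assumptions, not taken from the article.

```python
import math
from statistics import mean, variance


def abs_t(values, labels):
    """Absolute Welch t-statistic between samples labeled 1 and labeled 0."""
    a = [v for v, y in zip(values, labels) if y == 1]
    b = [v for v, y in zip(values, labels) if y == 0]
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs((mean(a) - mean(b)) / se)


def mean_abs_t_curve(source, verification, src_labels, ver_labels):
    """Mean |t| of the top-k markers, ranked once in the source dataset."""
    order = sorted(source, key=lambda m: abs_t(source[m], src_labels), reverse=True)
    src_scores = [abs_t(source[m], src_labels) for m in order]
    ver_scores = [abs_t(verification[m], ver_labels) for m in order]
    curve = [
        (k, mean(src_scores[:k]), mean(ver_scores[:k]))
        for k in range(1, len(order) + 1)
    ]
    return order, curve


# Toy, made-up pathway activities: 3 markers x 6 patients per dataset.
source = {
    "pathA": [2.1, 2.4, 2.2, 0.3, 0.1, 0.4],
    "pathB": [1.0, 1.5, 0.8, 0.6, 0.9, 0.2],
    "pathC": [0.5, 0.1, 0.9, 0.4, 0.2, 0.8],
}
verification = {
    "pathA": [1.9, 2.0, 2.5, 0.5, 0.2, 0.6],
    "pathB": [0.7, 1.1, 0.9, 0.8, 0.5, 0.4],
    "pathC": [0.3, 0.6, 0.2, 0.7, 0.1, 0.5],
}
labels = [1, 1, 1, 0, 0, 0]

order, curve = mean_abs_t_curve(source, verification, labels, labels)
for k, src_mean, ver_mean in curve:
    print(k, round(src_mean, 2), round(ver_mean, 2))
```

Because the ranking is fixed in the source dataset, the source curve can only decrease as k grows, while the verification curve shows how well that ranking carries over to independent data.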

Figure 3.

Classification accuracy within (A) and across (B) datasets.

Bar chart of classification performance, measured as area under the ROC curve (AUC), for CORG-based pathway markers (PAC), conventional pathway markers (Mean, Median, and PCA), and individual genes (Gene; the same number of top discriminative genes as there are CORGs in the pathway markers). Performance is summarized as the mean ± standard error of the AUC over 100 runs of 5-fold cross-validation within a dataset. For PAC_random, the AUC values of 1000 random gene sets were averaged. Numbers above the red bars are -log(p-value) from the Wilcoxon signed-rank test comparing the 500 AUCs of “PAC” with those of “Gene” (only values with p-value < 0.05 are shown). These p-values measure the significance of the difference between PAC-based and gene-based classification.
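The summary statistics named in this caption can be sketched in plain Python: a rank-based AUC, the mean and standard error of a list of AUCs, and a paired Wilcoxon signed-rank test. The sketch rests on assumptions: the test uses the normal approximation without tie or continuity correction, the logarithm is taken as base 10 (the caption does not state the base), and the paired AUC values are made up. The classifier and the cross-validation loop themselves are not shown.

```python
import math
from statistics import mean, stdev


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_ste(values):
    """Mean and standard error of the mean."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Two-sided p-value for paired samples (normal approximation)."""
    d = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(d)
    ranks = average_ranks([abs(v) for v in d])
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


# Made-up paired AUCs standing in for the per-fold results of two marker types.
pac_aucs = [0.70 + 0.01 * i for i in range(10)]
gene_aucs = [0.60] * 10

m, se = mean_ste(pac_aucs)
p = wilcoxon_signed_rank(pac_aucs, gene_aucs)
print(f"PAC AUC = {m:.3f} ± {se:.3f}; -log10(p) = {-math.log10(p):.2f}")
```

For real analyses, an established implementation such as a statistics library's signed-rank test is preferable, since it handles ties and small samples more carefully than this approximation.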

Table 1.

Frequently selected pathway markers for lung cancer prognosis.

Figure 4.

Pathway activity of the top frequently used markers in the two lung cancer datasets.

Activities were inferred from the CORGs identified in each dataset. Green and red blocks indicate pathways (rows) that are up- and down-regulated, respectively, in patients (columns) with a given prognosis (color bar above: pink and green indicate poor and good prognosis, respectively). Pathways are clustered by the similarity of their activities across patients.
