PenDA, a rank-based method for personalized differential analysis: Application to lung cancer

doi:10.1371/journal.pcbi.1007869

Fig 6

Genetic deregulations efficiently classify cancer histologies.

(a, b) Principal Component Analysis of TCGA non-small-cell lung cancers (ADC and SQCC cohorts) using the normalized count matrix (a) or the PenDA differential expression matrix (b) as input. Solid lines represent the decision boundary between ADC and SQCC histologies (linear SVM classifier trained on the first two principal components). Dashed lines represent the upper and lower margins of the decision boundary. Each symbol represents an individual sample (orange crosses for ADC, purple triangles for SQCC). (c) The bar plot at the bottom shows the histology predictions made by the SVM classifier. The SVM on PenDA correctly predicts 95% of ADCs and 93% of SQCCs; the SVM on counts correctly predicts 92% of ADCs and 92% of SQCCs. (d) Heatmap of the PenDA differential expression matrix restricted to a specific set of classifier genes (n = 875) in TCGA non-small-cell lung cancers: ADC (orange) and SQCC (purple). Two hierarchical clustering analyses were performed: one using Euclidean distance to order genes and one using Pearson correlation-based distance to group patients, with complete linkage in both cases. ADC subclasses (color-coded, classes I to III) are defined by cutting the dendrogram into n = 3 groups (cut level = green dashed line). (e) Graphical representation of the contingency table between ADC subtypes (Chen et al.) and ADC subclasses (PenDA analysis). Each bar represents the total number of patients in the corresponding cell of the table.
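
The analysis steps named in the caption (projection on the first two principal components, a linear SVM on those components, and complete-linkage clustering of patients with a Pearson correlation-based distance cut into 3 groups) can be illustrated with a minimal sketch. This is not the authors' code: the function names `classify_on_two_pcs` and `cluster_patients`, the use of scikit-learn and SciPy, the default SVM settings, and the synthetic data are all assumptions made for illustration. In particular, the toy matrix uses values in {-1, 0, 1}, which is an assumed encoding, and accuracy is computed on the training samples because the caption does not state how predictions were evaluated.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.decomposition import PCA
from sklearn.svm import SVC


def classify_on_two_pcs(matrix, labels):
    """Project samples on the first two PCs, then fit a linear SVM on them."""
    pcs = PCA(n_components=2).fit_transform(matrix)
    svm = SVC(kernel="linear").fit(pcs, labels)
    predicted = svm.predict(pcs)
    # Fraction of correctly predicted samples within each histology
    # (evaluated on the training samples; an assumption of this sketch).
    per_class = {
        str(c): float(np.mean(predicted[labels == c] == c))
        for c in np.unique(labels)
    }
    return pcs, svm, per_class


def cluster_patients(matrix, n_groups=3):
    """Complete-linkage clustering of rows with distance 1 - Pearson r."""
    tree = linkage(matrix, method="complete", metric="correlation")
    # Cut the dendrogram so that at most n_groups clusters remain.
    return fcluster(tree, t=n_groups, criterion="maxclust")


# Synthetic stand-in data (NOT TCGA): 40 samples x 30 genes.
rng = np.random.default_rng(0)
labels = np.array(["ADC"] * 20 + ["SQCC"] * 20)
toy = rng.choice([-1, 0, 1], size=(40, 30)).astype(float)
toy[:20, :10] = 1.0    # first 10 genes "up" in the toy ADC group
toy[20:, :10] = -1.0   # and "down" in the toy SQCC group

pcs, svm, per_class = classify_on_two_pcs(toy, labels)
groups = cluster_patients(toy, n_groups=3)
print(per_class)
print(np.bincount(groups)[1:])  # number of toy patients per cluster
```

In this sketch the decision boundary is the set of points where `svm.decision_function` equals 0, and the margins drawn as dashed lines in panels (a, b) would correspond to the values +1 and -1.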

doi: https://doi.org/10.1371/journal.pcbi.1007869.g006