Fig 1.

Flowchart of CASCAM for congruence quantification and selection.

Tumor and cancer model gene expression data are first harmonized (Module 1). Transparent machine learning by sparse discriminant analysis (SDA), combining prediction accuracy with an SDA-based deviance score, is then applied to pre-select candidate cancer models (Module 2). Finally, pathway-specific mechanistic explorations are carried out iteratively to select the final representative cancer model (Module 3). Blue frames represent input data, orange frames essential output results, parallelogram frames intermediate results, rectangular frames analysis processes, bullet-shaped frames visualizations, and rhombus frames decision points.
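As a rough illustration of the Module 2 pre-selection step, the sketch below keeps a model only if SDA assigns it to the target subtype and its deviance score is not significant. The default thresholds are the ones quoted in the Fig 4 caption (PSDA > 50%, pval(DSSDA) > 0.05; stricter 0.8 and 0.1). The class `CellLineScore`, the function `preselect`, and the model names are hypothetical; this is a sketch under those assumptions, not the CASCAM implementation.

```python
from dataclasses import dataclass


@dataclass
class CellLineScore:
    """Hypothetical container for one cancer model's SDA results."""
    name: str
    p_sda: float        # SDA probability of belonging to the target subtype (e.g., ILC)
    pval_ds_sda: float  # p-value of the SDA-based deviance score (large = no evidence of deviance)


def preselect(scores, p_threshold=0.5, pval_threshold=0.05):
    """Keep models classified to the target subtype AND not significantly deviant."""
    return [s.name for s in scores
            if s.p_sda > p_threshold and s.pval_ds_sda > pval_threshold]


scores = [
    CellLineScore("model_A", 0.92, 0.40),  # passes both default and stringent rules
    CellLineScore("model_B", 0.65, 0.01),  # classified to target but significantly deviant
    CellLineScore("model_C", 0.30, 0.70),  # congruent distance but classified to the other subtype
    CellLineScore("model_D", 0.70, 0.08),  # passes default rule only
]
print(preselect(scores))                                         # ['model_A', 'model_D']
print(preselect(scores, p_threshold=0.8, pval_threshold=0.1))    # ['model_A']
```

Requiring both conditions reflects the idea that classification alone can accept a model that sits far from the tumor distribution, while a small deviance score alone says nothing about which subtype the model resembles.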

Fig 2.

UMAP for comparison of multiple data harmonization approaches.

UMAP of normalized BC tumors (n = 960) and BC cell lines (n = 65) comparing five normalization approaches: (A) no correction, (B) quantile normalization, (C) ComBat, (D) Celligner using BC tumors and BC cell lines, and (E) Celligner using pan-cancer tumors and pan-cancer cell lines. The final approach (E) best eliminates batch effects, as shown by the thorough mixing of BC tumors and BC cell lines.
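For readers unfamiliar with approach (B), the sketch below shows generic quantile normalization: every sample is forced onto a common reference distribution, the mean of the k-th smallest value across samples. It assumes a samples × genes layout (a list of samples, each a list of gene values in the same gene order) and does not average tied values, unlike some production implementations. The function name is illustrative and is not taken from the paper.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a samples x genes matrix given as a list of lists.

    Ties are broken by gene order rather than averaged (a simplification).
    """
    n_samples = len(matrix)
    n_genes = len(matrix[0])
    # Reference distribution: mean of the k-th smallest value across all samples.
    sorted_samples = [sorted(sample) for sample in matrix]
    reference = [sum(s[k] for s in sorted_samples) / n_samples for k in range(n_genes)]

    result = []
    for sample in matrix:
        # Gene indices ordered from smallest to largest value in this sample.
        order = sorted(range(n_genes), key=lambda g: sample[g])
        normalized = [0.0] * n_genes
        for rank, gene in enumerate(order):
            normalized[gene] = reference[rank]  # replace value by reference value of same rank
        result.append(normalized)
    return result


print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

Because quantile normalization only equalizes marginal distributions, it cannot remove structured differences between tumors and cell lines, which is consistent with the stronger mixing achieved by the Celligner variants in panels (D) and (E).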

Fig 3.

UMAP after data harmonization with replicates and basal subtype information.

(A) The three replicates (cell line; cell line_C; cell line_I) of each of the eight cell lines are highly reproducible. (B) The lower-right cluster consists predominantly of tumors and cell lines annotated as basal-like (118/160 tumors and 26/28 cell lines).

Table 1.

Evaluation and properties of 13 popular machine learning methods.

Six methods used for cancer model prediction in previous studies are marked with an asterisk (*). Prediction accuracies are shown for three machine learning evaluation examples. Values in parentheses in the second column are standard deviations of accuracies over five repeats of five-fold cross-validation.
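The "five repeats of five-fold cross-validation" summary can be sketched as follows. The sketch assumes that each repeat reshuffles the samples, pools correct predictions over its five folds into one accuracy, and that the reported value is the mean and standard deviation over the five repeat-level accuracies; the paper may aggregate differently. Folds are not stratified here, and the nearest-centroid classifier and toy data are hypothetical stand-ins for the 13 methods in the table.

```python
import random
import statistics


def repeated_kfold_accuracy(X, y, fit_predict, n_splits=5, n_repeats=5, seed=0):
    """Return (mean, SD) of accuracy across repeats of shuffled k-fold CV."""
    rng = random.Random(seed)
    n = len(y)
    repeat_acc = []
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)                                   # new random split per repeat
        folds = [idx[k::n_splits] for k in range(n_splits)]
        correct = 0
        for test_idx in folds:
            test_set = set(test_idx)
            train_idx = [i for i in idx if i not in test_set]
            preds = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                                [X[i] for i in test_idx])
            correct += sum(p == y[i] for p, i in zip(preds, test_idx))
        repeat_acc.append(correct / n)                     # pooled accuracy of this repeat
    return statistics.mean(repeat_acc), statistics.stdev(repeat_acc)


def nearest_centroid(X_train, y_train, X_test):
    """Hypothetical toy classifier: assign each test point to the closest class mean."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, lab in zip(X_train, y_train) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return [min(centroids, key=lambda lab: dist2(x, centroids[lab])) for x in X_test]


# Well-separated toy data: every repeat should reach perfect accuracy.
X = [[0.1 * i] for i in range(20)] + [[10.0 + 0.1 * i] for i in range(20)]
y = ["IDC"] * 20 + ["ILC"] * 20
print(repeated_kfold_accuracy(X, y, nearest_centroid))  # (1.0, 0.0)
```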

Fig 4.

Genome-wide cell line congruence and pre-selection.

(A) SDA projected scatter plot. The y-axis shows the projected values of the 38 cell lines; the red and blue horizontal lines mark the median projected value (center) of ILC and IDC tumor samples, respectively. The density plots on the right show the distributions of 769 IDC (blue) and 191 ILC (red) tumors. Red dots denote cell lines that SDA classifies as ILC (threshold PSDA > 50%), and solid dots denote small SDA-based deviance scores (threshold pval(DSSDA) > 0.05). Dashed rectangles indicate more stringent criteria: cell lines with PSDA > 0.8 are enclosed by the green dashed rectangle, and those with pval(DSSDA) > 0.1 by the orange dashed rectangle. (B) Absolute SDA projected deviance scores with 95% confidence intervals. Nine unbiased-selected cell lines and one manually included cell line (marked with an asterisk) are ranked by |DSSDA|; the 95% confidence intervals are obtained by bootstrap analysis on the log scale.
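One way to read "95% confidence intervals obtained by bootstrap analysis on log-scale" is sketched below: resample the tumor projections, recompute the absolute deviance, take the bootstrap standard error of its logarithm, and back-transform log(estimate) ± 1.96·SE. Working on the log scale keeps the interval positive and allows it to be asymmetric. The statistic used here (distance of a model's projected value from the tumor median) is a hypothetical stand-in, not the paper's definition of DSSDA, and the normal-on-log-scale construction is an assumption.

```python
import math
import random
import statistics


def bootstrap_log_ci(samples, statistic, n_boot=1000, z=1.96, seed=0):
    """Return (estimate, lower, upper) using a normal approximation on the log scale."""
    estimate = statistic(samples)
    if estimate <= 0:
        raise ValueError("statistic must be positive to work on the log scale")
    rng = random.Random(seed)
    log_reps = []
    for _ in range(n_boot):
        resample = rng.choices(samples, k=len(samples))  # sample with replacement
        value = statistic(resample)
        if value > 0:                                     # log undefined at 0; skip such replicates
            log_reps.append(math.log(value))
    se = statistics.stdev(log_reps)                       # bootstrap SE of log(statistic)
    log_est = math.log(estimate)
    return estimate, math.exp(log_est - z * se), math.exp(log_est + z * se)


# Hypothetical example: tumor projections and one model's projected value.
tumor_projections = [float(v) for v in range(-10, 11)]
model_value = 5.0
abs_deviance = lambda s: abs(model_value - statistics.median(s))
print(bootstrap_log_ci(tumor_projections, abs_deviance))
```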

Fig 5.

Pathway- and gene-specific analysis for selection of representative cell line(s).

(A) Heatmap of pathway-specific deviance scores (DSpath) for 14 unbiased-selected pathways (30 < size < 200, |NES| > 1.5) and 1 manually included pathway (rows), and for 9 unbiased-selected cell lines and 1 manually included cell line (columns). The genome-wide SDA projected deviance score (DSSDA) is shown in the top side bar, and the pathway size and normalized enrichment score (NES) are shown on the left. A positive (negative) NES indicates up-regulation (down-regulation) in ILC compared with IDC. The average over the 14 pathways and the pre-selected “KEGG Cell Adhesion Molecules” pathway are shown at the bottom. P-values of DSpath are annotated in the heatmap (one circle: p-value < 0.1; two concentric circles: p-value < 0.05; three concentric circles: p-value < 0.01); smaller p-values indicate worse congruence. (B) Gene-specific heatmap of DSgene for the 10 selected cell lines and 22 DE genes in the “KEGG Cell Adhesion Molecules” pathway. (C) Part of the KEGG PathView topological network of the “KEGG Cell Adhesion Molecules” pathway for BCK4 (DSpath = 1.323). The result shows discordance of 10 genes in BCK4 (orange stars: up-regulation compared with ILC tumors; blue stars: down-regulation).
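The two rules stated in panel (A), the pathway filter (30 < size < 200, |NES| > 1.5) and the circle annotation of DSpath p-values, can be written down directly as below. The pathway records and their field names (`name`, `size`, `nes`) are hypothetical; the thresholds are taken from the caption. Note that the manually included pathway is added outside this filter.

```python
def select_pathways(pathways, min_size=30, max_size=200, min_abs_nes=1.5):
    """Return names of pathways with min_size < size < max_size and |NES| > min_abs_nes."""
    return [p["name"] for p in pathways
            if min_size < p["size"] < max_size and abs(p["nes"]) > min_abs_nes]


def significance_circles(pvalue):
    """Number of concentric circles used to annotate a DSpath p-value in the heatmap."""
    if pvalue < 0.01:
        return 3
    if pvalue < 0.05:
        return 2
    if pvalue < 0.1:
        return 1
    return 0  # no annotation: no evidence of incongruence


pathways = [  # hypothetical records
    {"name": "pathway_1", "size": 120, "nes": 2.1},   # kept
    {"name": "pathway_2", "size": 250, "nes": -2.4},  # too large
    {"name": "pathway_3", "size": 45, "nes": -1.2},   # |NES| too small
    {"name": "pathway_4", "size": 80, "nes": -1.8},   # kept (down-regulated in ILC)
]
print(select_pathways(pathways))                                   # ['pathway_1', 'pathway_4']
print([significance_circles(p) for p in (0.005, 0.03, 0.07, 0.2)])  # [3, 2, 1, 0]
```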

Fig 6.

Selecting representative PDO/PDX for ILC.

(A) SDA projected positions of PDO and PDX models from PDMR. Four models (three PDXs and one PDO; red circles) derived from the same patient (171881–019-R) were identified as candidate ILC models. The six models from this patient are labeled with their sample IDs. Among the PDX models, SDA deviance scores were highly consistent with passage number. (B) The six models derived from this patient were used for pathway-specific analysis; they show high congruence in most of the 14 pathways and in the Cell Adhesion pathway. (C) Violin plots show the positions of PDO.1 and PDX.1B for the six genes on which PDO.1 is discordant.

Table 2.

SDA-based genome-wide congruence summary for six models from patient 171881–019-R.

Later passages of PDX models have worse congruence (i.e., larger deviance scores).