Building, benchmarking, and exploring perturbative maps of transcriptional and morphological data

doi:10.1371/journal.pcbi.1012463

Fig 1.

Graphical abstract of the introduced framework.

Table 1.

Properties of the four datasets to which we apply the map-building and benchmarking framework.

These statistics pertain specifically to the portions of these datasets used in our application (Methods: Perturbation data collection), not necessarily to the full data available from each source. We use the original study authors’ definition of an “expressed” gene wherever applicable (Methods: Filtering genes based on expression).

Fig 2.

Benchmarking results in maps constructed using RxRx3 and different EFAAR pipelines.

Bars for consistency and magnitude (left) show the percentage of perturbations with a significant p-value (< .05). Bars for CORUM, HuMAP, Reactome, SIGNOR and StringDB show the biological relationship benchmarks, i.e., the percentage of annotated relationships falling within the 5% tails (from each side) of the distribution of all pairwise cosine similarities.
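The biological relationship benchmark described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the map is a NumPy array with one aggregated embedding per gene, that annotated relationships (e.g. CORUM or StringDB pairs) are supplied as gene-name pairs, and that the tails are bounded by the 5th and 95th percentiles of all distinct pairwise cosine similarities, inclusive. The function names `cosine_similarity_matrix` and `relationship_recall` are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities between the rows of a gene-by-feature matrix."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return unit @ unit.T


def relationship_recall(embeddings, genes, annotated_pairs, tail=0.05):
    """Fraction of annotated gene pairs whose cosine similarity lies in either tail.

    The null distribution is all distinct gene pairs in the map; the thresholds
    are its `tail` and `1 - tail` quantiles (an assumption for this sketch).
    """
    sims = cosine_similarity_matrix(embeddings)
    # Upper triangle without the diagonal: every unordered gene pair once.
    iu = np.triu_indices(len(genes), k=1)
    lower, upper = np.quantile(sims[iu], [tail, 1.0 - tail])
    index = {g: i for i, g in enumerate(genes)}
    # Only score annotated pairs whose two genes are both present in the map.
    pair_sims = np.array([
        sims[index[a], index[b]]
        for a, b in annotated_pairs
        if a in index and b in index and a != b
    ])
    if pair_sims.size == 0:
        return float("nan")
    in_tails = (pair_sims <= lower) | (pair_sims >= upper)
    return float(in_tails.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    genes = ["A", "B", "C"] + [f"G{i}" for i in range(7)]
    emb = rng.normal(size=(10, 16))
    emb[1] = emb[0]   # B duplicates A: cosine similarity 1 (right tail)
    emb[2] = -emb[0]  # C negates A: cosine similarity -1 (left tail)
    print(relationship_recall(emb, genes, [("A", "B"), ("A", "C")]))  # 1.0
```

Counting both tails reflects the caption's "from each side" wording: strongly anti-correlated perturbation profiles are treated as evidence of a relationship just as strongly correlated ones are.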

Fig 3.

Biological relationship benchmarking results in maps constructed using different EFAAR pipelines on the GWPS dataset.

Bar height shows the percentage of annotated relationships which fall within the 5% tails (from each side) of the distribution of all pairwise cosine similarities.

Fig 4.

Benchmarking results in maps constructed using different EFAAR pipelines on the cpg0016 dataset.

Bars for consistency and magnitude (left) show the percentage of perturbations with a significant p-value (< .05) for each measure. Bars for CORUM, HuMAP, Reactome, SIGNOR and StringDB show the biological relationship benchmarks, i.e., the percentage of annotated relationships falling within the 5% tails (from each side) of the distribution of all pairwise cosine similarities.

Fig 5.

Benchmarking results in maps constructed using different EFAAR pipelines on the cpg0021 dataset.

Bars for consistency and magnitude (left) show the percentage of perturbations with a significant p-value (< .05) for each measure. Bars for CORUM, HuMAP, Reactome, SIGNOR and StringDB show the biological relationship benchmarks, i.e., the percentage of annotated relationships falling within the 5% tails (from each side) of the distribution of all pairwise cosine similarities.

Fig 6.

Comparative analysis of biology surfaced by different maps.

(A) Venn diagram of the intersection of the CORUM protein complexes captured by the PCA-TVN maps from each of GWPS, cpg0016, and cpg0021. There are 153 evaluated complexes (those with at least ten expressed genes) for GWPS, 83 for cpg0016, and 169 for cpg0021. (B) A split cosine similarity heatmap of the genes in six non-overlapping complexes out of the 12 identified by all of the GWPS, cpg0016, and cpg0021 maps. Entries below the diagonal show similarities in the GWPS map, and entries above the diagonal show similarities in the cpg0016 map.

Fig 7.

A split cosine similarity heatmap of the Integrator complex subunits from the RxRx3 and GWPS maps.

Entries above the diagonal show similarities in the RxRx3 map, and entries below the diagonal show similarities in the GWPS map. Both maps show three main clusters, which correspond to the three main modules of the Integrator complex.
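A split heatmap like those in Figs 6B and 7 combines two maps in a single matrix: one map's cosine similarities fill the upper triangle and the other's fill the lower triangle, over the same ordered gene list. The sketch below is an assumption-laden illustration, not the authors' plotting code: it assumes both maps are NumPy arrays whose rows correspond to the same genes in the same order (feature dimensions may differ between, e.g., image-based and transcriptional maps). `split_similarity_matrix` is a hypothetical helper, and choosing the gene order (for example, grouping Integrator subunits by module) is left to the caller.

```python
import numpy as np


def _cosine(x: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities between rows of x."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    return unit @ unit.T


def split_similarity_matrix(emb_upper: np.ndarray, emb_lower: np.ndarray) -> np.ndarray:
    """Upper triangle from `emb_upper`'s map, lower triangle from `emb_lower`'s map.

    Both inputs must list the same genes in the same row order.
    """
    if emb_upper.shape[0] != emb_lower.shape[0]:
        raise ValueError("both maps must contain the same number of genes")
    s_upper, s_lower = _cosine(emb_upper), _cosine(emb_lower)
    combined = np.triu(s_upper, k=1) + np.tril(s_lower, k=-1)
    # Self-similarity is 1 in both maps, so the diagonal is shared.
    np.fill_diagonal(combined, 1.0)
    return combined
```

The resulting matrix can then be drawn with a diverging colormap fixed to [-1, 1], for instance with matplotlib's `imshow(combined, cmap="RdBu_r", vmin=-1, vmax=1)`, so that the two triangles are directly comparable.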

Fig 8.

Analysis of strongest gene connections to uncharacterized genes C18orf21 and C1orf131.

(A) Top 25 strongest connections to C18orf21 and the associated cosine similarities in each of the RxRx3 and GWPS maps. The six genes shared by both top-25 lists are connected by lines between the two heatmaps. (B) GO enrichment results for the six genes in the overlap of the top 25 strongest connections to C18orf21 in the RxRx3 and GWPS maps. Bar lengths represent the Bonferroni-corrected -log10(p-value) from a hypergeometric test. (C, D) The same analyses for C1orf131.
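The overlap and enrichment steps in panels A and B can be sketched as below. This is a minimal sketch, not the authors' pipeline: it assumes a one-sided hypergeometric over-representation test per GO term against a chosen background gene set, with Bonferroni correction over the number of terms tested. The background set, the GO annotation source, and the helper names (`top_k_neighbors`, `hypergeom_sf`, `go_enrichment`) are assumptions for illustration; the caption does not specify them.

```python
import math

import numpy as np


def top_k_neighbors(sims: np.ndarray, genes: list, query: str, k: int = 25) -> list:
    """The k genes with the highest cosine similarity to `query`, excluding itself."""
    i = genes.index(query)
    order = np.argsort(-sims[i])
    return [genes[j] for j in order if j != i][:k]


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population M, n successes, N draws).

    Equivalent to scipy.stats.hypergeom.sf(k - 1, M, n, N).
    """
    hits = sum(math.comb(n, x) * math.comb(M - n, N - x)
               for x in range(k, min(n, N) + 1))
    return hits / math.comb(M, N)


def go_enrichment(query_genes, go_terms: dict, background) -> dict:
    """Bonferroni-corrected -log10(p) for over-representation of each GO term."""
    background = set(background)
    query = set(query_genes) & background
    M, N = len(background), len(query)
    results = {}
    for term, members in go_terms.items():
        members = set(members) & background
        k = len(query & members)
        p = hypergeom_sf(k, M, len(members), N)
        # Bonferroni: multiply by the number of terms tested, capped at 1.
        p_adj = min(1.0, p * len(go_terms))
        results[term] = -math.log10(p_adj)
    return results
```

For example, with hypothetical similarity matrices `sims_rxrx3` and `sims_gwps` over a shared gene list, the overlap for C18orf21 would be `set(top_k_neighbors(sims_rxrx3, genes, "C18orf21")) & set(top_k_neighbors(sims_gwps, genes, "C18orf21"))`, which is then passed to `go_enrichment`.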