Fig 1.
Partitions of genomic regions based on Ensembl annotation and their predictability of blood cancer WGS data.
(A) The partition of the genomic region. (B-G) Median F1 values for the respective regions and their permuted controls when using off-the-shelf ML (B and E), ReVeaL on the original genomic areas (C and F) and ReVeaL on genomic areas normalized by length (D and G). See S2 Table for the F1 values.
Fig 2.
Disease-by-disease ReVeaL Analysis.
(A) F1 scores for genomic sectors for each disease are averaged over all 10 replicate analyses per chromosome, and the maximum F1 score is reported for that disease. ReVeaL scores on disease-label permutations are shown as overlaid hatched bars. The gray bar represents the mean over all diseases. (B-C) Boxplots of fg (shingle values representing the four moments of the distribution) across samples per disease for the top 2 ReVeaL features, with diseases ordered by decreasing median fg. The line above each boxplot represents the shingle; the yellow interval marks the portion of the segment that is masked. (D-G) t-SNE visualization (perplexity = 40, iterations = 300) using the top 50 shingle fg values (B and C) and the mutational load lg, defined as the number of mutations in a given window of the genomic region for a given patient (D and E), in exonic and dark sectors, respectively.
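As an illustration of the mutational load lg described in the Fig 2 caption, the following minimal sketch counts one patient's mutations per window of a genomic region. It is not the authors' implementation: the function name `mutational_load`, the fixed window size, and the half-open interval convention are assumptions made here for illustration only.

```python
import numpy as np

def mutational_load(positions, region_start, region_end, window_size):
    """Count mutations per window for a single patient (hypothetical helper).

    Assumes fixed-size windows tiling the half-open interval
    [region_start, region_end); the last window may be shorter.
    """
    positions = np.asarray(positions, dtype=np.int64)
    # Number of windows needed to cover the region (ceiling division).
    n_windows = -(-(region_end - region_start) // window_size)
    # Keep only mutations that fall inside the region.
    in_region = (positions >= region_start) & (positions < region_end)
    # Map each remaining mutation to the index of its window.
    window_idx = (positions[in_region] - region_start) // window_size
    # Tally mutations per window, including windows with no mutations.
    return np.bincount(window_idx, minlength=n_windows)

# Toy mutation positions for one patient (made-up values).
loads = mutational_load([5, 12, 13, 47, 100], 0, 50, 10)
```

Stacking the returned vectors for all patients would give a patients-by-windows matrix of lg values.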