Predicting Human Genetic Interactions from Cancer Genome Evolution

doi:10.1371/journal.pone.0125795

Fig 1.

Patterns across cancer genomes reflecting selection against gene co-inactivation, and the workflow to predict SL interactions.

(a) An SL interaction, SL1, between genes A and B can show a ‘compensation’ pattern across cancer genomes: when A is inactive (denoted by -1), B is more likely to be overactive (denoted by 1), compensating for the inactive A (genomes 1–10), than when A is active (genomes 11–30). Another SL interaction, SL2, can show a ‘co-loss underrepresentation’ pattern, in which a combined loss of A and B (denoted by -1 and -1, genome 10) is underrepresented across cancer genomes compared with a loss of either gene alone (genomes 2–9 and genomes 14–18). Note that SL1 can also be identified via the co-loss underrepresentation pattern, whereas SL2 can only be identified via this pattern. (b) The model requires two types of input data: i) CNVs measured by SNP arrays and ii) gene expression variations measured by RNAseq. In the CNV data, the status of a gene can be a homozygous deletion (two dashed lines), a heterozygous deletion (one dashed and one solid line) or normal (two solid lines). From the CNVs, we derived three fractions that quantify the likelihood that a gene pair has a homozygous co-loss (f1), a heterozygous co-loss (f2) or a mixed co-loss (f3) event. In the gene expression data, a gene can be under-expressed (one dashed line), normal (one solid line) or over-expressed (one bold line). From the expression status, we derived two fractions, f4 and f5: f4 is the likelihood that both genes in a pair are under-expressed, and f5 is the likelihood that a pair has an expression up-down event, in which one gene is over-expressed while the other is under-expressed. All five fractions showed a difference in distribution between SL and non-SL pairs. By integrating these five fractions into a prediction model, we can identify SL interactions, which can be presented as a network.
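The caption describes each fraction only as a "likelihood" of an event for a gene pair. The sketch below assumes the simplest reading, the share of samples in which the event occurs, and uses hypothetical integer encodings for CNV and expression status; the paper's Methods give the exact definitions, which may include normalization not reproduced here. The function names are illustrative, not from the paper.

```python
# Hypothetical status encodings (assumptions, not taken from the paper):
#   CNV:        -2 = homozygous deletion, -1 = heterozygous deletion, 0 = normal
#   expression: -1 = under-expressed,      0 = normal,                1 = over-expressed
HOM, HET = -2, -1
UNDER, OVER = -1, 1


def share(events):
    """Fraction of samples in which the per-sample event flag is True."""
    events = list(events)
    if not events:
        raise ValueError("no samples")
    return sum(events) / len(events)


def five_fractions(cnv_a, cnv_b, expr_a, expr_b):
    """Return (f1, f2, f3, f4, f5) for one gene pair, each read here as the
    share of samples showing the event (a simplifying assumption)."""
    if len(cnv_a) != len(cnv_b) or len(expr_a) != len(expr_b):
        raise ValueError("both genes must be profiled in the same samples")
    cnv = list(zip(cnv_a, cnv_b))
    expr = list(zip(expr_a, expr_b))
    f1 = share(a == HOM and b == HOM for a, b in cnv)       # homozygous co-loss
    f2 = share(a == HET and b == HET for a, b in cnv)       # heterozygous co-loss
    f3 = share({a, b} == {HOM, HET} for a, b in cnv)        # mixed co-loss
    f4 = share(a == UNDER and b == UNDER for a, b in expr)  # co-underexpression
    f5 = share({a, b} == {UNDER, OVER} for a, b in expr)    # expression up-down
    return f1, f2, f3, f4, f5


# Toy profiles for one pair: 5 CNV samples, 4 expression samples (made up).
print(five_fractions([-2, -1, 0, -2, -1], [-2, -1, -1, -1, -2],
                     [-1, 1, 0, -1], [-1, -1, 1, 1]))
# (0.2, 0.2, 0.4, 0.25, 0.5)
```

Using set equality for f3 and f5 makes the mixed and up-down events symmetric in the two genes, so the order of A and B in a pair does not matter.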

Table 1.

Five fractions derived from genomic variations for SL interaction identification.

Fig 2.

SL pairs are reflected in copy number variations.

SL pairs are less likely than non-SL or random pairs to have (a) homozygous co-loss events, (b) heterozygous co-loss events and (c) mixed co-loss events. The fractions for these three types of co-loss events are denoted f₁, f₂ and f₃ in Methods and Fig 1. Each dot is the fraction for a given pair, and the horizontal bar represents the mean of the fractions. P-values for the comparison between SL and non-SL pairs were calculated using a one-sided Wilcoxon rank test. P-values for the comparison between SL and random pairs were calculated from 1000 randomizations. P-values were adjusted for multiple comparisons using the Bonferroni correction (see Methods for details).
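The caption does not spell out the randomization procedure. A minimal sketch, assuming it compares the mean fraction of the SL pairs against the means of 1000 equally sized random draws from a pool of random gene pairs, followed by a Bonferroni adjustment, could look like this. The function names, the fixed seed and the +1 correction are assumptions for illustration.

```python
import random
from statistics import mean


def randomization_p_value(sl_values, random_pool, n_iter=1000, seed=0):
    """One-sided empirical p-value for 'SL pairs have a lower mean fraction than
    random pairs': how often a random draw of equally many pairs has a mean at
    least as low as the observed SL mean."""
    rng = random.Random(seed)
    observed = mean(sl_values)
    k = len(sl_values)
    as_low = sum(
        mean(rng.sample(random_pool, k)) <= observed for _ in range(n_iter)
    )
    # The +1 terms (a common convention, assumed here) keep the p-value above zero.
    return (as_low + 1) / (n_iter + 1)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

For a fraction where SL pairs are expected to be higher rather than lower (such as the expression up-down fraction), the comparison `<=` would flip to `>=`. For the SL versus non-SL comparison, a one-sided rank-sum test is available as `scipy.stats.mannwhitneyu(sl, non_sl, alternative="less")`; whether this matches the exact Wilcoxon variant used in the paper is an assumption.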

Fig 3.

SL pairs are reflected in gene expression variations.

(a) SL pairs are less likely to be co-underexpressed than the controls, i.e., non-SL or random pairs. The fraction for co-underexpression events is denoted f₄ in Methods and Fig 1. (b) SL pairs are more likely to have expression up-down events, in which one gene is over-expressed while the other is under-expressed. The fraction for this pattern is denoted f₅ in Methods and Fig 1. Each dot is the fraction for a given pair, and the horizontal bar represents the mean of the fractions. P-values for the comparison between SL and non-SL pairs were calculated with a one-sided Wilcoxon rank test. P-values for the comparison between SL and random pairs were calculated from 1000 randomizations. P-values were adjusted for multiple comparisons using the Bonferroni correction (see Methods for details).

Fig 4.

Receiver operating characteristic (ROC) curves.

(a) The ensemble-based prediction model that combines all five patterns has an area under the curve (AUC) of 0.75 (blue line), estimated by 10-fold cross-validation. Ensemble-based prediction models based on the individual, non-combined patterns, i.e., co-loss in CNVs, co-underexpression and expression up-down, are shown in red, green and purple, respectively, and have lower AUCs. Standard error bars are shown for each ROC curve. (b) The ensemble-based prediction model (the blue ROC curve) has a better performance than all the seven single. (c) The precision-recall curve is estimated from 10-fold cross-validation; standard error bars are shown. The curve is colored according to the probability cutoff, with the color scale shown on the right. The cutoffs of probability scores (p(x)), 0.81, are printed at the corresponding positions on the curve. The grey line represents the precision expected by chance alone.
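The AUC and precision-recall values in the caption come from 10-fold cross-validation; the sketch below covers only the last step, computing these metrics once held-out probability scores exist for labelled pairs. It uses the rank (Mann-Whitney) formulation of AUC instead of tracing the curve, and the chance precision is the fraction of SL pairs among all scored pairs. The scores and labels are made-up toy data, and the function names are hypothetical.

```python
import math


def roc_auc(scores, labels):
    """AUC as the probability that a randomly chosen SL pair (label 1) gets a
    higher score than a randomly chosen non-SL pair (label 0); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one SL and one non-SL pair")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall_at(scores, labels, cutoff):
    """Precision and recall when every pair with score >= cutoff is called SL."""
    called = [y for s, y in zip(scores, labels) if s >= cutoff]
    tp = sum(called)
    n_pos = sum(labels)
    precision = tp / len(called) if called else math.nan  # undefined with no calls
    recall = tp / n_pos if n_pos else math.nan
    return precision, recall


def chance_precision(labels):
    """Precision of random calls: the fraction of SL pairs among all pairs."""
    return sum(labels) / len(labels)


# Toy held-out scores and labels (made up for illustration only).
scores = [0.9, 0.85, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(roc_auc(scores, labels))                    # AUC = 8/9
print(precision_recall_at(scores, labels, 0.81))  # precision 1.0, recall 2/3
print(chance_precision(labels))                   # 0.5
```

The pairwise AUC is quadratic in the number of pairs, which is fine for a sketch; for large sets, a sort-based computation or a library routine such as scikit-learn's `roc_auc_score` is the usual choice.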
