Fig 1.
Workflow for generating multi-granularity graphs (MGGs).
Significant splice variants were identified with the NEEP method, using clinical data, the complementary DNA reference library, and RNA-Seq data from Ensembl and TCGA. MGGs were constructed for known protein-protein interactions from BioGRID that were mapped through Ensembl to at least one significant splice variant. Domains were predicted for each splice variant using HMMER3 [16] and Pfam [17]. Gained and ghost domains were identified in splice variants associated with survival. The final MGGs represent potential lost or gained protein interactions associated with patient survival.
Fig 2.
NEEP yields uniformly distributed p-values.
(A) The p-value distributions are shown for the minimum p-value method and for NEEP. An ideal statistical test produces p-values that are uniformly distributed under the null hypothesis. (B) Density line plots were constructed for each of 100 simulations of 1,000 random abundance patterns (blue) and for 20 sets of 1,000 values chosen at random from the initial null distribution of 1,000,000 values (red). The overlap between the two groups of density plots empirically confirms that the null distribution is the same for all expression patterns, given identical clinical data.
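The contrast in panel (A) can be reproduced in miniature. The sketch below is not the NEEP implementation: it assumes a normally distributed stand-in outcome instead of survival times, a two-sided z-test instead of a survival test, and candidate thresholds between the 15th and 85th percentiles. All function and variable names are hypothetical. It shows that taking the minimum p-value over many thresholds inflates small p-values under the null, and that ranking each minimum against an empirical null built from random abundance patterns restores approximate uniformity.

```python
import bisect
import math
import random
import statistics


def standardize(values):
    """Scale values to zero mean and unit sample variance."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]


def split_p_values(outcome, order, ks):
    """Two-sided z-test p-value for each low/high split of the ordered patients.

    `order` lists patient indices sorted by (random) expression; for a split
    size k, the first k patients form the low group and the rest the high group.
    """
    n = len(outcome)
    prefix = [0.0]
    for i in order:
        prefix.append(prefix[-1] + outcome[i])
    total = prefix[-1]
    p_values = []
    for k in ks:
        mean_lo = prefix[k] / k
        mean_hi = (total - prefix[k]) / (n - k)
        z = (mean_hi - mean_lo) / math.sqrt(1.0 / k + 1.0 / (n - k))
        p_values.append(math.erfc(abs(z) / math.sqrt(2.0)))
    return p_values


def simulate_min_p(outcome, n_patterns, rng, ks):
    """Minimum p-value over the splits `ks` for each random abundance pattern."""
    order = list(range(len(outcome)))
    minima = []
    for _ in range(n_patterns):
        rng.shuffle(order)  # a random pattern is a random ordering of patients
        minima.append(min(split_p_values(outcome, order, ks)))
    return minima


def empirical_p(value, null_sorted):
    """Fraction of null minima at or below `value`, with a +1 correction."""
    rank = bisect.bisect_right(null_sorted, value)
    return (rank + 1) / (len(null_sorted) + 1)


def frac_below(p_values, alpha=0.05):
    """Share of p-values below alpha; about alpha for a uniform distribution."""
    return sum(p < alpha for p in p_values) / len(p_values)


rng = random.Random(0)
n_patients = 100
# Stand-in clinical outcome, fixed across all random patterns (an assumption).
outcome = standardize([rng.gauss(0.0, 1.0) for _ in range(n_patients)])
all_splits = range(15, 86)   # assumed range of candidate thresholds
median_split = [50]          # a single, pre-specified threshold

single = simulate_min_p(outcome, 2000, rng, median_split)
min_p = simulate_min_p(outcome, 2000, rng, all_splits)
null_sorted = sorted(simulate_min_p(outcome, 3000, rng, all_splits))
corrected = [empirical_p(p, null_sorted) for p in min_p]

print("single threshold:", frac_below(single))
print("minimum p-value: ", frac_below(min_p))
print("empirical null:  ", frac_below(corrected))
```

Because the outcome is held fixed while only the abundance pattern is randomized, a single null distribution of minima serves every pattern in this sketch, which mirrors the property that panel (B) checks empirically.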
Fig 3.
Splice variants are shown in blue solid boxes and domains in dotted boxes. The significant splice variant (left box) is linked to its gained (green) and ghost (white) domains. Domains in gray belong to the non-significant splice variant (right box). Splice variants of the same gene that have identical domain connections in the MGG are stacked, and their identifiers are listed below the boxes. RAD51C-202 has two MGGs, which are shown separately to simplify visualization.
Table 1.
COSMIC mutation signatures associated with RAD51C-202 expression.
Fig 4.
Mutation signature 3 association with RAD51C-202.
Box plots were generated for the contribution of signature 3 in patients with low (red) and high (blue) expression of RAD51C-202. Patients with a contribution of 0 were removed for visualization. Signature 3 contributed more to the mutation signatures of patients with high RAD51C-202 expression than to those of patients with low expression.
Table 2.
Statistical test results comparing smoking variables to RAD51C-202 expression.
Fig 5.
Robustness of single threshold methods and NEEP.
(A) The central 50 ranks for the 100 simulations of each of the 181 significant splice variants are plotted as a shaded range according to their mean simulated rank, separately for the 5% simulations (red) and the 10% simulations (blue). Each of these ranges corresponds to an observed (non-simulated) rank, which is plotted as a dot along the same x-axis. The 5% simulation ranges have lower mean simulated ranks than the 10% simulation ranges; thus the red dots are closer to the identity line. The maximum mean simulated rank is lower for the 5% simulations; thus the red shade ends much sooner than the blue shade. (B) Density plot of the observed versus simulated rank pairs for the 181 significant splice variants.