Fig 1.

Overall analytical workflow.

Source-target pairs (STPs) are constructed using the links available in Reactome [19]. In the TCGA cancer cohorts, the mutation and copy number variation data are used to construct binary DNA aberration profiles; the presence of either a mutation or a high/low copy number variation at a given gene is treated as a DNA aberration of that gene in that sample. The gene expression data are used to construct binary RNA aberration profiles: a gene is RNA-aberrant in a sample if its expression falls outside the “normal” range (defined by quantiles) derived from TCGA normal tissue expression data, as previously described [14]. The binary profiles are combined to produce paired DNA-RNA aberrations, which are then filtered by keeping only the pairs whose source and target aberrations are significantly associated (two-sided χ² test). The selected STPs then give rise to individual source (DNA) and target (RNA) aberrations, providing binary omics profiles at the level of source, target, and pairs. STPs that are present in less than 2% of samples for a given tissue are omitted. Coverings are then computed at the pair, source, and target levels, and subtype and heterogeneity analyses are carried out.
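The filtering steps in this caption can be illustrated with a minimal pure-Python sketch. It is not the authors' pipeline: the function names, the encoding of copy number state as -1/0/+1, the toy vectors, and the 0.05 significance level (critical value 3.841 for one degree of freedom) are all assumptions made for illustration.

```python
# Toy sketch of the two STP filters (association test and 2% frequency).
# All names and thresholds below are illustrative assumptions.

def dna_aberrant(mutated, cnv_state):
    """1 if the gene is mutated or has a high/low copy number (assumed coded +1/-1)."""
    return int(bool(mutated) or cnv_state != 0)

def chi2_stat_2x2(dna, rna):
    """Pearson chi-square statistic (no continuity correction) for two binary vectors."""
    n = len(dna)
    obs = [[0, 0], [0, 0]]  # observed 2x2 contingency table, indexed [dna][rna]
    for d, r in zip(dna, rna):
        obs[d][r] += 1
    row = [sum(obs[0]), sum(obs[1])]
    col = [obs[0][0] + obs[1][0], obs[0][1] + obs[1][1]]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected == 0:
                return 0.0  # degenerate table: treat as no evidence of association
            stat += (obs[i][j] - expected) ** 2 / expected
    return stat

CHI2_CRIT = 3.841  # df = 1, assumed alpha = 0.05 (the caption does not state alpha)
MIN_FREQ = 0.02    # STPs aberrant in less than 2% of samples are omitted

def keep_stp(dna, rna):
    """Keep a pair if source and target are associated and the pair is frequent enough."""
    pair_freq = sum(d & r for d, r in zip(dna, rna)) / len(dna)
    return chi2_stat_2x2(dna, rna) > CHI2_CRIT and pair_freq >= MIN_FREQ

# Toy data: 20 samples, source gene g (DNA) and target gene g' (RNA).
dna = [1] * 10 + [0] * 10
rna = [1] * 9 + [0] + [0] * 9 + [1]
print(chi2_stat_2x2(dna, rna), keep_stp(dna, rna))
```

In practice a library routine such as a contingency-table test with a p-value would replace the fixed critical value; the hand-rolled statistic is used here only to keep the sketch self-contained.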


Table 1.

Examples of STPs.

For each of the six tissues, one example of a common STP λ = (g, g′) is shown. P(DNA&RNA) is our sample-based estimate of the probability that λ is an aberrant pair, namely, the fraction of samples of the indicated tissue for which the source gene g is DNA-aberrant and the target gene g′ is RNA-aberrant. Similarly, P(DNA) (respectively, P(RNA)) is the fraction of samples for which g is DNA-aberrant (resp., g′ is RNA-aberrant), and P(RNA|DNA) is the (estimated) conditional probability that g′ is RNA-aberrant given that g is DNA-aberrant.
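As a worked illustration of the four quantities defined in this caption, the sketch below computes them from two binary vectors (one entry per sample). The function name and the toy data are hypothetical.

```python
def stp_probabilities(dna, rna):
    """Sample-based estimates for one STP (g, g').

    dna[i] is 1 if source gene g is DNA-aberrant in sample i,
    rna[i] is 1 if target gene g' is RNA-aberrant in sample i.
    """
    n = len(dna)
    p_dna = sum(dna) / n
    p_rna = sum(rna) / n
    p_both = sum(d & r for d, r in zip(dna, rna)) / n
    # Conditional probability is undefined when g is never DNA-aberrant.
    p_rna_given_dna = p_both / p_dna if p_dna > 0 else float("nan")
    return {
        "P(DNA&RNA)": p_both,
        "P(DNA)": p_dna,
        "P(RNA)": p_rna,
        "P(RNA|DNA)": p_rna_given_dna,
    }

# Five toy samples.
probs = stp_probabilities([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
print(probs)
```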


Table 2.

Colon core STPs.

There are four “core” STPs which appear in every minimal covering of the colon samples. P(DNA&RNA) is the fraction of samples for which the source gene g is DNA-aberrant and the target gene g′ is RNA-aberrant; P(DNA) is the fraction of samples for which the source gene g is DNA-aberrant; P(RNA) is the fraction of samples for which g′ is RNA-aberrant; P(RNA|DNA) is the fraction of the samples with g DNA-aberrant for which g′ is RNA-aberrant.


Table 3.

Colon core source genes.

There are five “core” source genes which appear in every minimal source covering of the colon samples. P(DNA) is the fraction of samples for which the indicated source gene is DNA-aberrant; P(DNA&downstreamRNA) is the fraction of samples for which the indicated source gene is DNA-aberrant and there exists an RNA-aberrant gene among its targets. P(downstreamRNA|DNA) is the fraction of the samples with the indicated source gene DNA-aberrant for which there exists some RNA-aberrant gene among its targets.
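The source-level quantities differ from the pair-level ones in that the RNA event is "at least one target of the source gene is RNA-aberrant". A minimal sketch, with a hypothetical function name and toy data:

```python
def source_level_probabilities(dna_g, rna_targets):
    """Source-level estimates for one source gene g.

    dna_g[i] is 1 if g is DNA-aberrant in sample i.
    rna_targets is a list of binary vectors, one per target gene of g.
    """
    n = len(dna_g)
    # 1 if some target of g is RNA-aberrant in sample i.
    downstream = [int(any(t[i] for t in rna_targets)) for i in range(n)]
    p_dna = sum(dna_g) / n
    p_both = sum(d & r for d, r in zip(dna_g, downstream)) / n
    p_cond = p_both / p_dna if p_dna > 0 else float("nan")
    return {
        "P(DNA)": p_dna,
        "P(DNA&downstreamRNA)": p_both,
        "P(downstreamRNA|DNA)": p_cond,
    }

# Four toy samples, a source gene with two targets.
src = source_level_probabilities([1, 1, 0, 1], [[0, 1, 1, 0], [0, 0, 0, 0]])
print(src)
```

The target-level quantities of the next table follow by swapping the roles of DNA and RNA, with the existence taken over the sources of a target gene.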


Table 4.

Colon core target genes.

There are six “core” target genes which appear in every minimal target covering of the colon samples. P(RNA) is the fraction of samples for which the indicated target gene is RNA-aberrant; P(RNA&upstreamDNA) is the fraction of samples for which the indicated target gene is RNA-aberrant and there exists a DNA-aberrant gene among its sources. P(upstreamDNA|RNA) is the fraction of the samples with the indicated gene RNA-aberrant for which at least one of its sources is DNA-aberrant.


Table 5.

Statistics of optimal coverings.

For each of the six tissues, this table provides basic information about the optimal coverings at all levels: STP, source with target, target with source. For instance, for breast cancer, there are 4,026 candidate STPs after both filters (rejecting source-target independence and 2% tissue sample frequency); the minimal covering size is 67 STPs; at least one of these 67 STPs is aberrant in 95.4% of the breast cancer samples; and there are 21 STPs which appear in every minimal covering.
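To make the terms "minimal covering" and "core" concrete, the following sketch enumerates all minimal coverings of a toy data set by brute force and intersects them. Brute-force enumeration is a stand-in chosen for clarity, not the optimization procedure used in the article, and it is feasible only for a handful of candidates. The STP labels and sample ids are hypothetical.

```python
from itertools import combinations

def minimal_coverings(aberrant):
    """Return all smallest sets of STPs that cover every coverable sample.

    aberrant maps an STP label to the set of sample ids in which it is aberrant.
    A sample is coverable if at least one candidate STP is aberrant in it.
    """
    coverable = set().union(*aberrant.values())
    names = sorted(aberrant)
    for k in range(1, len(names) + 1):
        found = [
            set(combo)
            for combo in combinations(names, k)
            if set().union(*(aberrant[s] for s in combo)) == coverable
        ]
        if found:
            return found  # all coverings of the smallest feasible size k
    return []

def core(coverings):
    """STPs that appear in every minimal covering."""
    return set.intersection(*coverings) if coverings else set()

# Toy data: four candidate STPs over six samples; sample 6 has no aberrant STP.
aberrant = {"A": {1, 2}, "B": {3, 4}, "C": {4, 5}, "D": {3, 5}}
n_samples = 6
coverings = minimal_coverings(aberrant)
covered_fraction = len(set().union(*aberrant.values())) / n_samples
print(len(coverings[0]), len(coverings), core(coverings), covered_fraction)
```

Here the minimal covering size is 3, there are three minimal coverings, and only STP "A" belongs to all of them, since it alone covers samples 1 and 2.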


Fig 2.

Networks of pair coverings in breast cancer.

The network shown in the center depicts one covering of breast cancer samples by STPs, with source genes in orange, target genes in blue, and intermediary link genes in green. The thin and thick edges represent, respectively, the two types of relationships: “controls state change of” and “controls expression of” as designated in Reactome [19]. On the left, a selection of covering realizations is presented for three ER-positive samples, with aberrant STPs highlighted; on the right, three ER-negative samples are shown. The samples have different realizations over the covering network, and are ranked (top to bottom) by the number of events they exhibit. The sample networks demonstrate the inter-sample heterogeneity among the source and target realizations.


Fig 3.

Core set across tissues at source level.

There are 18 source genes which appear in the core set of at least two tissues. For instance, gene TP53 is a core gene for all six tissues, and genes PTEN and PIK3CA are core genes for three tissues. The color in the heatmap on the left represents the probability that the corresponding source gene is DNA-aberrant and there exists an RNA-aberrant target gene (thereby forming an aberrant source-target pair). On the right, black marks indicate whether each gene belongs to the core set of each tumor type.


Table 6.

Probabilities of source aberration with downstream target for breast cancer subtypes.

For PAM50 subtypes of breast cancer, the heatmap represents the probabilities that the indicated gene is a DNA-aberrant source gene with some downstream RNA-aberrant target. The sources are selected from the set of core genes for coverings of the given tissue; the selection criterion is that the probability of a DNA-aberration is high for at least one of the subtypes for that tissue. Core sources with varying probabilities present interesting candidates for discrimination between subtypes. For example, the DNA-aberration frequency of TP53 is much higher in the HER2-enriched and Basal-like subtypes than in Luminal A and Luminal B, whereas an aberration in PIK3CA is less frequent among Basal-like samples than among the other subtypes.


Table 7.

Probabilities of source aberration with downstream target for lung cancer subtypes.

For smoking-history-based categories of lung cancer, the heatmap represents the probabilities that the indicated gene is a DNA-aberrant source gene with some downstream RNA-aberrant target. The sources are selected from the set of core genes for coverings of the given tissue; the selection criterion is that the probability of a DNA-aberration is high for at least one of the subtypes for that tissue. TP53 and KRAS are both more frequently DNA-aberrant (with some downstream RNA-aberrant target) among smokers than among non-smokers, whereas EGFR is more frequently an aberrant source among non-smokers.


Table 8.

Probabilities of source aberration with downstream target for colon cancer subtypes.

For CRIS-class subtypes of colon cancer, the heatmap represents the probabilities that the indicated gene is a DNA-aberrant source gene with some downstream RNA-aberrant target. The sources are selected from the set of core genes for coverings of the given tissue; the selection criterion is that the probability of a DNA-aberration is high for at least one of the subtypes for that tissue.


Table 9.

Probabilities of source aberration with downstream target for prostate cancer subtypes.

For primary Gleason grade subtypes of prostate cancer, the heatmap represents the probabilities that the indicated gene is a DNA-aberrant source gene with some downstream RNA-aberrant target. The sources are selected from the set of core genes for coverings of the given tissue; the selection criterion is that the probability of a DNA-aberration is high for at least one of the subtypes for that tissue.


Fig 4.

Comparison of one covering network for luminal breast cancer subtypes.

The probabilities of DNA aberration (with targets) and RNA aberration (with sources) over the Luminal A and Luminal B populations of breast cancer are depicted by the size of each node in the network, which corresponds to one possible covering. The red arrows indicate some source and target genes that have noticeable differences in the respective probabilities between the two luminal subtypes (e.g., TP53, CHEK1, PIK3CA, and TOP2A; see also Table 6).


Fig 5.

Rates of covering assembly.

For each of four tissues (breast, colon, lung and prostate), several phenotypes are compared based on the proportion of samples actually covered when requesting 90% coverage or more for the given tissue by the optimization procedure. The boxplots represent the results of 20 iterations of normalizing for sample size among the phenotypes by random sampling. In general, coverings for more aggressive phenotypes assemble faster.


Table 10.

Entropy estimation at source level.

Entropy estimates for source aberrations with target for prostate Gleason sum, primary Gleason grade, tumor status, and lymph-node status. N is the total number of samples available in the given subtype.


Fig 6.

Coding tree for breast ER status at target level.

For breast tumor samples, a weighted coding tree T with depth d = 5 is constructed using one covering at target level. At each internal node, a sample is sent to the left if the two genes indicated at the node are both RNA-aberrant with some aberrant source, and to the right otherwise. Sample counts for each terminal node are indicated and highlighted using a green palette. The two histograms at the bottom show the sample distribution at the 32 terminal nodes for the ER-negative and ER-positive sub-populations.
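The routing rule described in this caption can be sketched as follows, assuming the tree is complete and stored in heap order (root first, children of node i at 2i+1 and 2i+2). The storage layout, the function name, and the placeholder gene labels are assumptions for illustration; a depth of 2 is used to keep the toy example short, whereas the tree in the figure has depth 5 and therefore 2^5 = 32 terminal nodes.

```python
from collections import Counter

def leaf_index(sample, node_gene_pairs, depth):
    """Route one sample down a complete binary coding tree.

    sample: set of target genes that are RNA-aberrant with some aberrant source.
    node_gene_pairs: list of 2**depth - 1 (gene, gene) tuples in heap order.
    Returns the terminal node number in 0 .. 2**depth - 1 (0 is the leftmost).
    """
    node = 0
    for _ in range(depth):
        g1, g2 = node_gene_pairs[node]
        go_left = g1 in sample and g2 in sample  # both genes aberrant: go left
        node = 2 * node + (1 if go_left else 2)
    return node - (2 ** depth - 1)

# Toy tree of depth 2 with placeholder gene labels (3 internal nodes, 4 leaves).
pairs = [("A", "B"), ("C", "D"), ("E", "F")]
samples = [{"A", "B", "C", "D"}, {"A", "B", "C"}, set(), {"E", "F"}]
leaves = [leaf_index(s, pairs, depth=2) for s in samples]
histogram = Counter(leaves)  # sample counts per terminal node
print(leaves, dict(histogram))
```

Computing such a histogram separately for the ER-negative and ER-positive samples gives the two distributions over terminal nodes that the figure compares.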
