Identification of Networks of Co-Occurring, Tumor-Related DNA Copy Number Changes Using a Genome-Wide Scoring Approach

doi:10.1371/journal.pcbi.1000631

Figure 1.

Co-occurrence score for paired continuous variables.

a. Four hypothetical pairs of DNA copy number change measurements are shown for a set of samples. Each pair is plotted in a scatter plot, giving each sample in the set an x- and y-coordinate. The random pair (first panel) is a noisy pair containing no effect. The constitutive member pair (second panel) consists of one measurement that is continuously high, paired with a measurement that varies between two noisy levels. The co-occurring signal (third panel) consists of two noisy measurements that alternate between a high and a basal level, but show concerted change. The mutually exclusive pair (fourth panel) also alternates between two levels, but a high value in one measurement excludes a high value in the other. b. In this example we show scoring for co-occurring gains. Therefore we set all negative values to zero. To score for loss-loss pairs we would set all positive values to zero and continue with the absolute values. For loss-gain analysis we would set the positive values of the x (y) axis to zero and use the absolute values in the x (y) direction. c. The first panel shows the resulting scores of the four pairs of measurements if only the sum of the minimum values is used. The second panel shows the score when the covariance is included.
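
The scoring described in panels b and c can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the toy measurements are invented, and computing the covariance on the clipped values (rather than on the raw values) is an assumption, since the caption does not specify it.

```python
def clip_gains(values):
    """Keep gains only: negative values are set to zero (panel b)."""
    return [max(v, 0.0) for v in values]


def covariance(x, y):
    """Sample covariance of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def cooccurrence_score(x, y):
    """Sum of per-sample minima multiplied by the covariance (panel c)."""
    gx, gy = clip_gains(x), clip_gains(y)
    min_sum = sum(min(a, b) for a, b in zip(gx, gy))
    # Assumption: covariance is taken over the clipped values.
    return min_sum * covariance(gx, gy)


# Noise-free toy versions of three of the four pairs (hypothetical data).
co_occurring = ([1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0])
constitutive = ([1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 0.0, 0.0])
exclusive = ([1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0])

for name, (x, y) in [("co-occurring", co_occurring),
                     ("constitutive", constitutive),
                     ("mutually exclusive", exclusive)]:
    print(name, cooccurrence_score(x, y))
```

In this toy setting the sum of minima alone cannot separate the constitutive pair from the co-occurring pair, because both share the same overlap. The covariance term removes the constitutive pair, whose first measurement does not vary.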

Figure 2.

Schematic overview of co-occurrence analysis.

a. Overview of aCGH data. The two inputs are vectors of genomic grid points, each spanning a chromosome arm. The genomic grid is constructed from aCGH probe measurements, as explained in the Materials and Methods section. b. All combinations of grid points from the two vectors are used to construct a genomic pair-wise space in which all further calculations are performed. In this panel a schematic view of the genomic pair-wise space is shown. Each pair of genomic grid points, one from each vector, is a point in this space, and each point contains two values. A pair-wise genomic matrix exists for each tumor in the data set. c. To score for co-occurrence, the minimum value of each pair of genomic grid points is summed over the tumors, and the covariance over tumors is calculated for each pair. This results in two equally sized matrices, which are multiplied element-wise to produce the co-occurrence score matrix. This matrix is again represented in the genomic pair-wise space. d. The co-occurrence score matrix is convolved with a Gaussian matrix to find local enrichment of high co-occurrence scores in the pair-wise space. Peaks in the convolved co-occurrence matrix are translated back to two genomic regions that are annotated as being co-aberrated across the tumor set. e. For the n-th peak in the Convolved Co-occurrence Matrix (CCM), two gene sets are defined, based on a 2σ window centered on the peak. f1. Using a protein-protein interaction database, the interactions between the gene sets derived from a single co-occurrence peak are analyzed, producing a set of interactions. f2. Using the Cancer Gene Census, we inspect the resulting gene sets for the presence of known tumor-suppressor genes and oncogenes.
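
The smoothing and peak detection of panel d can be illustrated with a small example. This is a sketch under stated assumptions, not the published pipeline: the kernel width, the kernel radius, the zero padding at the borders, the function names and the toy score matrix are all hypothetical choices made for illustration.

```python
import math


def gaussian_kernel(sigma, radius):
    """Square 2-D Gaussian kernel of side 2*radius + 1, normalised to sum to 1."""
    weights = [[math.exp(-(i * i + j * j) / (2.0 * sigma * sigma))
                for j in range(-radius, radius + 1)]
               for i in range(-radius, radius + 1)]
    total = sum(map(sum, weights))
    return [[w / total for w in row] for row in weights]


def convolve2d(matrix, kernel):
    """Same-size convolution with zero padding (assumed border handling).

    The Gaussian kernel is symmetric, so convolution and correlation coincide.
    """
    rows, cols = len(matrix), len(matrix[0])
    radius = len(kernel) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i in range(-radius, radius + 1):
                for j in range(-radius, radius + 1):
                    rr, cc = r + i, c + j
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += matrix[rr][cc] * kernel[i + radius][j + radius]
            out[r][c] = acc
    return out


def peak(matrix):
    """(row, column) index of the largest entry."""
    cells = ((r, c) for r in range(len(matrix)) for c in range(len(matrix[0])))
    return max(cells, key=lambda rc: matrix[rc[0]][rc[1]])


# Toy 7 x 7 score matrix: a 3 x 3 block of moderate scores around (2, 2)
# and one isolated high score at (5, 5).
scores = [[0.0] * 7 for _ in range(7)]
for r in range(1, 4):
    for c in range(1, 4):
        scores[r][c] = 1.0
scores[5][5] = 2.0

smoothed = convolve2d(scores, gaussian_kernel(sigma=1.0, radius=2))
print("raw peak:", peak(scores))
print("smoothed peak:", peak(smoothed))
```

The isolated score is the largest raw value, but after smoothing the peak moves to the centre of the block, which is the sense in which the convolution rewards local enrichment of high scores over single extreme grid points.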

Figure 3.

Two co-occurring losses detected in the 2Mb scale analysis.

Raw aCGH data of two co-occurring losses corresponding to four genomic loci are shown. The y-axis of the heatmaps contains the samples, ordered through standard hierarchical clustering. The x-axis contains the probes present in the four genomic loci, ordered by genomic location. The sample information bar contains the names of the cell lines analyzed, the disease of origin and whether the sample has a T-cell or B-cell lineage. These representations are based on the results of the analysis on the 2 Mb scale.

Table 1.

Occurrence of T-cell and B-cell related co-occurring losses.

Table 2.

Enrichment for Cancer Gene Census genes in top 50 co-occurring genomic loci.

Figure 4.

Significance of finding direct interactions in co-occurring genomic loci.

For two scales, the top 50 co-occurring gene lists for the gain-gain, loss-loss and loss-gain situations were compared to a random set of 100 pairs of genomic loci. For each genomic pair, the two gene sets were queried for direct interactions using the STRING database. Significance was ascertained using Fisher's exact test on the ratio of interacting genes to all genes in the co-occurrence gene sets versus the random gene sets.
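
The comparison described here amounts to a 2 x 2 contingency table: interacting and non-interacting genes, in the co-occurrence sets and in the random sets. The sketch below computes a one-sided Fisher's exact test from the hypergeometric distribution. Whether the original analysis was one-sided or two-sided is not stated in the caption, so the one-sided choice is an assumption, the function name is hypothetical, and the gene counts are invented purely for illustration.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns the probability of seeing at least `a` counts in the top-left
    cell when the row and column totals are held fixed.
    """
    row1 = a + b          # genes in the co-occurrence sets
    col1 = a + c          # interacting genes in both groups together
    n = a + b + c + d     # all genes
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Hypothetical counts: 30 of 200 genes interact in the co-occurrence sets,
# 10 of 200 genes interact in the random sets.
p_value = fisher_exact_greater(30, 170, 10, 190)
print(f"one-sided p-value: {p_value:.3g}")
```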

Figure 5.

Networks of co-occurring gain and loss.

The networks that result from hierarchical clustering of the Scale 2 results are shown in different panels. Each panel represents either the gain-gain, loss-loss or gain-loss analysis. The resultant network is visualized using the Cytoscape software package (www.cytoscape.org). Edge thickness scales with the number of co-occurrence links found between the two genomic loci. The size of each node is proportional to the highest rank found among the individual loci that constitute the node. If only one genomic location is present in a node, i.e. this location did not cluster with any other locations, it is colored gray. The cancer gene enrichment among all genes mapping to the locations described by the nodes is shown in the top right-hand corner. P-values are determined by Fisher's exact test. The functional interaction enrichment of all genes between nodes that are linked by an edge is represented in the lower right-hand corner of each panel. P-values are determined using Fisher's exact test, with randomly generated pairs of loci representing the null hypothesis.

Figure 6.

The gain-gain core network.

a. The reduced core network for the gain-gain analysis, obtained by pruning all edges with less than 5% support in the top 500 list of the Scale 2 analysis. Edge thickness and label represent the number of functional interactions, based on the STRING database, between genes associated with the connected nodes. The oncogenes, as defined by the Cancer Gene Census, that map within the regions defined by the nodes are shown in rectangular insets. b. Representation of the 10 most enriched Ingenuity terms associated with the entire collection of genes in the core network that have a STRING interaction along the edges. The x-axis shows the −log-transformed p-value, corrected by the Benjamini-Hochberg procedure as implemented in the Ingenuity software. c. Functional interaction enrichment is shown as a bar graph, which represents the ratio of interacting genes to the total number of genes. P-values are determined using Fisher's exact test, with randomly selected pairs of loci representing the null hypothesis.
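
The pruning step in panel a can be read as a simple threshold on edge support. The sketch below assumes that support means the number of co-occurrence links on an edge divided by the length of the ranked list (500); the caption does not define support explicitly, so this reading, the function name and the locus labels are all hypothetical.

```python
def prune_edges(link_counts, list_size=500, min_support=0.05):
    """Keep edges whose share of the ranked list is at least min_support."""
    return {edge: n for edge, n in link_counts.items()
            if n / list_size >= min_support}


# Hypothetical link counts between clustered loci.
link_counts = {
    ("locus_A", "locus_B"): 40,
    ("locus_B", "locus_C"): 25,
    ("locus_C", "locus_D"): 10,
}

core_network = prune_edges(link_counts)
print(core_network)
```

Under this reading, an edge needs at least 25 of the 500 top-ranked links to remain in the core network.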

Figure 7.

The loss-loss core network.

a. The reduced core network for the loss-loss analysis, obtained by pruning all edges with less than 5% support in the top 500 list of the Scale 2 analysis. Edge thickness and label represent the number of functional interactions, based on the STRING database, between genes associated with the connected nodes. The tumor suppressor genes, as defined by the Cancer Gene Census, that map within the regions defined by the nodes are shown in rectangular insets. b. Representation of the 10 most enriched Ingenuity terms associated with the entire collection of genes that have a STRING interaction between the 17p region and 9p, 9q, 13q, 16q or 22q, as determined by the Ingenuity software. The x-axis shows the −log-transformed p-value, corrected by the Benjamini-Hochberg procedure as implemented in the Ingenuity software. c. Functional interaction enrichment is shown as a bar graph, which represents the ratio of interacting genes to the total number of genes. P-values are determined using Fisher's exact test, with randomly selected pairs of loci representing the null hypothesis. d. A functional interaction network around the nuclear co-repressor NCOR1 (also known as TRAC1) is shown. This network is part of the network of interactors derived from the 17p interacting regions after removal of the canonical cancer genes TP53, RB1, CDKN2A and CDKN2B from the analysis. e. Illustration of the retroviral insertions mapped near CBFA2T3, recovered in a large screen of MuLV retroviral mutagenesis [11]. Insertions are shown as triangles. Blue triangles indicate insertions in the direction of transcription (plus); red triangles indicate insertions in the anti-transcription direction (minus). Insertions linked by dashed boxes are bi-allelic integrations recovered from the same tumor.
