Carcinogenesis is a complex, multifactorial, multistage process whose precise mechanisms are not well understood. In this study, we performed a genome-wide analysis of copy number variations (CNVs), breakpoint regions (BPRs) and fragile sites in 2,737 tumor samples from eight tumor entities and in 432 normal samples. CNV detection and BPR identification revealed that BPRs tended to accumulate in specific genomic regions in the tumor samples, whereas they were dispersed genome-wide in the normal samples. Hotspots were observed at which segments with similar copy number alterations overlapped and adjacent BPRs clustered. Evaluation of BPR occurrence frequency showed that, for each tumor entity, at least one BPR was detected in about 15% or more of the samples, whereas no BPR occurred in more than 12% of the normal samples. 127 of 2,716 tumor-relevant BPRs (termed ‘common BPRs’) also exhibited a noticeable occurrence frequency in the normal samples. Colocalization assessment identified 20,077 CNV-affected genes, 169 of which are known tumor-related genes. The most noteworthy genes are KIAA0513, which is important for immunologic, synaptic and apoptotic signal pathways; the intergenic non-coding RNA RP11-115C21.2, which possibly acts as an oncogene or tumor suppressor by changing the structure of chromatin; ADAM32, which is likely important in cancer cell proliferation and progression through ectodomain shedding of diverse growth factors; and the well-known tumor suppressor gene p53. The BPR distributions indicate that CNV mutations are likely non-random in tumor genomes. The marked recurrence of BPRs at specific regions supports common progression mechanisms in tumors. The presence of hotspots together with common BPRs, despite the small size of this group, implies a relation between fragile sites and cancer-gene alteration. Our data further suggest that both protein-coding and non-coding genes possessing a range of biological functions might play a causative or functional role in tumor biology.
This research enhances our understanding of the mechanisms for tumorigenesis and progression.
Citation: Marczok S, Bortz B, Wang C, Pospisil H (2016) Comprehensive Analysis of Genome Rearrangements in Eight Human Malignant Tumor Tissues. PLoS ONE 11(7): e0158995. https://doi.org/10.1371/journal.pone.0158995
Editor: Amanda Ewart Toland, Ohio State University Medical Center, UNITED STATES
Received: December 8, 2015; Accepted: June 25, 2016; Published: July 8, 2016
Copyright: © 2016 Marczok et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: Funding for S. Marczok and B. Bortz was provided by a grant of the Europäischer Fonds für regionale Entwicklung, EFRE 1416689, State Brandenburg (www.esf.brandenburg.de); and by the Bundesministerium für Bildung und Forschung in the project "Prediction and modeling of hybrid performance and yield gain in oilseed rape by systems biology" (PROGReSs – 031A297G) (www.bmbf.de).
Competing interests: The authors have declared that no competing interests exist.
The incidence of tumors increases rapidly with age [1, 2], but tumorigenesis is probably caused mainly by genetic predisposition, adverse environmental conditions and lifestyle behaviors [1, 3]. Cells affected by somatic mutations develop the typical hallmarks of cancer: sustaining proliferative signaling, evading growth suppressors, activating invasion and metastasis, enabling replicative immortality, inducing angiogenesis and resisting cell death. Unrepaired genetic variations have long been thought to be associated with carcinogenesis [4, 5]. One type of genomic aberration is copy number variation (CNV), which leads to an altered number (fewer or more) of copies of a genomic region in comparison with a reference genome. CNVs arise from diverse mechanisms, including nonallelic homologous recombination (NAHR), nonhomologous end joining (NHEJ) and fork stalling and template switching/microhomology-mediated break-induced replication (FoSTeS/MMBIR), during replication or recombination [6–8].
Previous studies [6, 9–13] have detected an increased number of CNVs in human malignant tumors and found a functional correlation of CNVs with tumorigenesis. For example, Gratias et al. reported small deletions in retinoblastomas and CNVs at chromosomal band 16q24, which is known to encompass, among others, the gene KIAA0513. The KIAA0513 gene has been postulated to play an important role in immunologic, synaptic and apoptotic signal pathways. Yang et al. observed an increase in the overall CNV burden in familial colorectal cancer patients compared with healthy controls, as well as a novel structural variation at 12p12.3, suggesting that the overall burden of CNVs contributes to familial colorectal cancer risk. TP53, also known as cellular tumor antigen p53, is involved in growth suppression and apoptosis. Several studies [11, 14–18] identified a remarkable number of CNVs characteristic of TP53-related tumors in Li-Fraumeni syndrome, an autosomal dominantly inherited disorder characterized by a variety of early-onset tumors. CNV breakpoints, which form the boundary between two copy number altered regions, have gained increasing attention in mechanistic studies of tumorigenesis. Li et al. identified cancer-type-specific breakpoint hotspots with distinct genomic patterns and found that these hotspots are enriched with known cancer genes. In an earlier study, we detected an increasing number of CNVs during tumor progression and identified some distinct breakpoint regions (BPRs) arising more frequently than other BPRs (e.g. fragile sites) in mouse mammary tumors. Our findings indicate that higher numbers of copy number alterations lead to an increased cancer risk. In contrast, a recent study by Stewart et al. proposed that tumor-inducing mutations occur by chance and are readily inducible at common fragile sites. In a more recent study, Tomasetti et al. showed that only one third of cancer cases are attributable to environmental factors or inherited predispositions.
The evidence suggests that random mutations in regulatory genomic regions during cell division and faulty repair mechanisms play the predominant role in carcinogenesis. These findings raise intriguing questions regarding the causal factors leading to cancer and the predictability of malignant events.
In the present study, we analyzed the influence of CNVs, BPRs and fragile sites on cancer development. In doing so, we investigated copy number altered regions and BPRs in 2,737 tumor samples from eight different tumor entities as well as in 432 normal samples from several tissue types including brain, gastric, lung, ovarian, prostate and renal tissues. Additionally, we evaluated the occurrences of tumor entity-specific, cancer-specific and common BPRs that would serve as a helpful clue to the mechanisms of tumorigenesis and tumor progression.
Materials and Methods
To undertake a comprehensive analysis, a large number of tumor samples from more than one tumor entity must be analyzed at high resolution. So far, the best resolution of genomic data is provided by next-, second- or third-generation sequencing technologies; however, these methods are very expensive and time consuming. A much cheaper and widely used alternative is SNP arrays, which offer a good balance between physical coverage and analysis speed. Moreover, a large pool of freely available data exists.
Malignant tumor data: The tumor data used for the genome-wide identification of CNVs and BPRs were taken from the publicly accessible Gene Expression Omnibus (GEO) database of the National Center for Biotechnology Information (NCBI). Specifically, raw CEL files from the Genome-Wide Human SNP Array 6.0 were analyzed in this study. In total, 2,737 malignant primary tumor samples from 8 different tumor entities were used, including 377 breast tumor, 189 colorectal tumor, 340 gastric tumor, 291 lung tumor, 1,104 pediatric medulloblastoma, 207 ovarian tumor, 120 prostate tumor and 109 renal tumor samples. For a detailed description of the whole data set see S1 Table.
Reference data: Reference data were retrieved from the International HapMap Project (Phase 3, Release #3) [24, 25]. 990 HapMap samples analyzed by the Genome-Wide Human SNP Array 6.0 were taken into account to build a reference set for comparison (S2 Table).
Normal samples: Additionally, 432 normal samples from the GEO database were included in the study: brain tissue (29 samples), gastric tissue (148 samples), lung tissue (62 samples), ovarian tissue (57 samples), prostate tissue (67 samples) and renal tissue (69 samples). These samples serve as the standard set for verification of CNV tumor specificity (see S3 Table for further details).
The detection of CNVs and of tumor entity-specific, cancer-specific and common BPRs using SNP signal intensities proceeded in three steps: (1) preprocessing of SNP array raw data, (2) calculation of the logarithmized ratio (log2 ratio) of the signal intensities of the tumor sample and the reference set for each SNP, and (3) segmentation [26, 27]. A pipeline was implemented to facilitate these tasks. The preprocessing was carried out using the Affymetrix Power Tools software (APT, Linux version 1.16.0), and the subsequent steps were performed in the freely available software R (version 3.0.2). The segmentation and subsequent detection of CNVs and BPRs were performed for each tumor sample separately. All segments are defined by the respective chromosome, the number of encompassed SNPs and the segmental mean signal intensity. Furthermore, the start and end positions of each segment are given as the genomic positions of the first and last SNP of the segment. To determine the genomic position of each SNP, we used the human genome hg19/GRCh37 as a reference.
Building the reference: To identify potential genomic alterations, we built a reference set derived from 990 HapMap samples. First, we used the APT software package to preprocess the raw data. Then the signal intensities of both SNP alleles were added up, and the average signal intensity of each SNP across all reference samples was calculated.
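The reference construction described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study (which relied on APT and R); the function names `total_intensity` and `build_reference` and the input layout are hypothetical, and the preprocessing step is assumed to have been done already.

```python
from statistics import mean


def total_intensity(allele_a, allele_b):
    """Sum the two allele intensities of every SNP (allele summation)."""
    return [a + b for a, b in zip(allele_a, allele_b)]


def build_reference(reference_samples):
    """Average the summed intensity of each SNP over all reference samples.

    reference_samples: list of (allele_a, allele_b) pairs, one pair per
    reference sample; each element is a list with one value per SNP,
    in the same SNP order for every sample (an assumption of this sketch).
    """
    totals = [total_intensity(a, b) for a, b in reference_samples]
    # zip(*totals) regroups the values so that each tuple holds one SNP
    # across all reference samples.
    return [mean(per_snp) for per_snp in zip(*totals)]


# Toy example with two reference samples and two SNPs.
reference = build_reference([
    ([1.0, 2.0], [1.0, 2.0]),
    ([3.0, 4.0], [1.0, 0.0]),
])
print(reference)
```

In the study the same averaging would run over the 990 HapMap samples and all SNPs on the array.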
Preprocessing and calculation of signal intensities: All samples were preprocessed separately by quantile normalization and background correction with the Birdseed v2 algorithm provided by APT [11, 30]. The default program settings were used. The overall signal intensity of each SNP per sample was then obtained by allele summation.
Segmentation: To detect genomic alterations in malignant tumors, we determined the segmentation profile for each sample. This was done by calculating, for each SNP, the log2 ratio of the signal intensity of the tumor sample to the reference intensity [26, 32]. The chromosomal segmentation of adjacent SNPs with similar log2 ratio values for the 22 autosomes was calculated using the circular binary segmentation (CBS) algorithm introduced by Olshen et al. after outlier detection and data smoothing (“smooth.CNA”). The Bioconductor package DNAcopy (version 1.32.0), which implements the CBS algorithm, was used with the significance level α set to 0.001, the standard deviation SD set to 0.5 (“sd.undo”) and the minimal number of markers per segment (“min.width”) set to 4.
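The per-SNP log2 ratio that feeds the segmentation can be sketched as follows. This is an illustrative Python fragment only; the function name `log2_ratios` is hypothetical, returning NaN for non-positive intensities is an assumption of this sketch, and the CBS segmentation itself (performed with DNAcopy in the study) is not reimplemented here.

```python
import math


def log2_ratios(sample_intensity, reference_intensity):
    """Return log2(sample / reference) for every SNP.

    Both inputs are lists of summed allele intensities in the same SNP
    order. Non-positive values cannot be log-transformed, so they are
    reported as NaN (a choice made for this sketch).
    """
    ratios = []
    for s, r in zip(sample_intensity, reference_intensity):
        if s <= 0 or r <= 0:
            ratios.append(float("nan"))
        else:
            ratios.append(math.log2(s / r))
    return ratios


# A SNP with twice the reference intensity gives +1, half gives -1.
print(log2_ratios([4.0, 1.0], [2.0, 2.0]))
```

A value near 0 indicates an intensity close to the reference, whereas positive and negative values point towards gains and losses, respectively.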
Identification of copy number altered genomic regions: The resulting segments (representing the respective continuous genomic regions) of all the tumor samples were then further analyzed. The average value of each segment was assigned to its SNPs as their SNP intensity. The SNP intensities of all the samples belonging to each tumor entity were averaged, and a mean value for each chromosome was calculated. The difference between each actual SNP intensity value and the chromosomal mean represents an altered copy number. We defined genomic regions with copy number alteration as segments exhibiting a difference of ≥ 0.1 or ≤ -0.1, respectively.
Determination of genes localized in regions with copy number alteration: The regions with copy number alteration were used to determine whether any gene could be affected by deleted or amplified genomic regions. The examination was done using the biomaRt package (version 2.24.0) in R, which offers access to several data sources including HapMap, HGNC, Ensembl, InterPro and Reactome. In the present study, gene detection was performed with Ensembl (version GRCh37.p13/release 82).
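The thresholding step can be sketched in Python as below. This is a simplified illustration under stated assumptions: the input is taken to be the entity-averaged SNP intensities of a single chromosome, the function name `label_alterations` and the labels "gain", "loss" and "neutral" are hypothetical, and only the cut-offs of ≥ 0.1 and ≤ -0.1 come from the text.

```python
from statistics import mean


def label_alterations(entity_snp_means, threshold=0.1):
    """Label each SNP value relative to the chromosomal mean.

    entity_snp_means: entity-averaged SNP intensities of one chromosome.
    A difference >= threshold is labelled "gain", a difference
    <= -threshold is labelled "loss", everything else "neutral".
    """
    chromosomal_mean = mean(entity_snp_means)
    labels = []
    for value in entity_snp_means:
        difference = value - chromosomal_mean
        if difference >= threshold:
            labels.append("gain")
        elif difference <= -threshold:
            labels.append("loss")
        else:
            labels.append("neutral")
    return labels


# Toy chromosome: three unchanged SNPs, one amplified, one deleted.
print(label_alterations([0.0, 0.0, 0.0, 0.4, -0.4]))
```

Runs of consecutive SNPs with the same non-neutral label would then correspond to the copy number altered regions discussed in the text.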
Breakpoint detection: To detect chromosomal BPRs, we considered the regions of adjacent segments whose segment mean difference was > 0.6 (corresponding to 1 copy number based on the log2 transformation). A potential BPR was defined as the genomic stretch between the last SNP position of a segment and the first SNP position of its successive adjacent segment. The actual breakpoint lies somewhere between these two genomic positions, but could not be detected exactly due to the layout of microarrays. For identification of specific BPRs, the number of detected BPRs was counted.
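The breakpoint rule above can be expressed as a short Python sketch. The `Segment` type and the function `find_bprs` are hypothetical names introduced for illustration, and comparing the absolute difference of the segment means against 0.6 is an assumption, since the text does not state how the sign of the difference was handled.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chrom: str   # chromosome name
    start: int   # position of the first SNP in the segment
    end: int     # position of the last SNP in the segment
    mean: float  # segment mean (log2 ratio)


def find_bprs(segments, min_diff=0.6):
    """Return potential BPRs as (chrom, last_snp_left, first_snp_right).

    Two adjacent segments on the same chromosome define a BPR when their
    segment means differ by more than min_diff (absolute difference,
    an assumption of this sketch).
    """
    ordered = sorted(segments, key=lambda s: (s.chrom, s.start))
    bprs = []
    for left, right in zip(ordered, ordered[1:]):
        if left.chrom != right.chrom:
            continue  # never pair segments across chromosomes
        if abs(right.mean - left.mean) > min_diff:
            bprs.append((left.chrom, left.end, right.start))
    return bprs


segments = [
    Segment("chr1", 100, 200, 0.0),
    Segment("chr1", 250, 400, -0.8),
    Segment("chr1", 450, 600, -0.7),
    Segment("chr2", 100, 300, 0.9),
]
print(find_bprs(segments))
```

Counting how often the same (chromosome, start, end) triple is returned across samples would then give the occurrence frequency of each BPR.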
Determination of the different BPR classes: Taking into consideration the occurrence frequency of each BPR and whether it also appeared in the normal samples, we divided the BPRs into four recurrence classes: (1) tumor entity-specific BPRs (occurrence ≥ 1% in exactly one tumor entity, with or without being found in the normal samples), (2) cancer-specific BPRs (occurrence ≥ 1% in more than 25% of the entities and < 0.5% in the normal samples), (3) common BPRs (occurrence ≥ 1% in ≥ 25% of the entities and ≥ 0.5% in the normal samples) and (4) non-cancer-specific BPRs (occurrence < 1% in all tumor entities).
For all 2,737 tumor samples from eight tumor entities, 64,720 different BPRs were identified, 7,324 of them in more than one tumor entity. On average, the BPRs span 6,831 bp, and their size ranges from 10 bp to 22,757,511 bp. 127 BPRs were detected in more than 1% of all the samples, 47 BPRs in at least 2% (Table 1) and 10 BPRs in 5% or more of the tumor samples. Furthermore, 8,853 BPRs were found exclusively in the normal samples and 7,695 BPRs in both the tumor and the normal samples (S7 Fig). A list of all the identified BPRs is given in S4 Table.
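The four classes can be written down as a small decision function. The following Python sketch is an interpretation, not the authors' code: the function name `classify_bpr` and the class labels are hypothetical, frequencies are assumed to be given in percent, and the entity share is applied as "at least 25%" for both the cancer-specific and the common class (the text uses "more than 25%" for one and "≥ 25%" for the other), so that a BPR recurring in two of eight entities is still classified.

```python
def classify_bpr(entity_freqs, normal_freq,
                 tumor_min=1.0, normal_min=0.5, entity_share=0.25):
    """Assign a BPR to one of the four recurrence classes.

    entity_freqs: dict mapping tumor entity name -> occurrence in percent.
    normal_freq:  occurrence in the normal samples in percent.
    """
    n_hit = sum(1 for f in entity_freqs.values() if f >= tumor_min)
    if n_hit == 0:
        return "not cancer-specific"
    if n_hit == 1:
        return "tumor entity-specific"
    if n_hit / len(entity_freqs) >= entity_share:
        # Recurrent in several entities: the normal samples decide.
        return "common" if normal_freq >= normal_min else "cancer-specific"
    return "unclassified"  # only reachable with more than eight entities


entities = ["brain", "breast", "colorectal", "gastric",
            "lung", "ovarian", "prostate", "renal"]
freqs = {name: 0.5 for name in entities}
freqs["brain"] = 15.4
print(classify_bpr(freqs, normal_freq=0.0))
```

With eight entities, a 25% share corresponds to two entities, which is why the sketch separates the single-entity case before applying the share.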
Tumor entity-specific, cancer-specific and common BPRs
To identify tumor entity-specific, cancer-specific and common breakpoint patterns, the occurrence of BPRs in each tumor entity was counted and compared between the entities. Most of the recurrent BPRs (2,279) are tumor entity-specific. 230 of the cancer-specific BPRs could be repeatedly detected in 25-75% and 7 in more than 75% of the tumor entities. The common BPRs could be determined 207 times in over 25% of the tumor entities. In addition, 32 BPRs were found to exhibit a distinct entity specificity. BPRs with a noticeable occurrence frequency (NOF-BPRs) occurred in at least 10% of all the samples of the individual entities.
Common BPRs: Two of the common BPRs appeared very frequently in seven out of eight tumor entities: one is localized on chromosome 16 between 85,091,864 bp and 85,092,483 bp and the other on the same chromosome from 85,092,748 bp to 85,092,892 bp. The former BPR occurs in 9.41 to 27.90% of all the samples of each entity, but only in 4.86% of all the normal samples. The latter ranges from 9.71 to as much as 43.48% per tumor entity and was found in only 22 out of 432 normal samples. Two other interesting adjacent BPRs on chromosome 8 (chr8: 39,225,941 bp to 39,288,762 bp and 39,397,732 bp to 39,398,022 bp) were found to occur in all the tumor entities with a frequency of up to 12.17% (S4 Table).
Cancer-specific BPRs: One of two cancer-specific BPRs occurring in more than 75% of the tumor entities was identified on chromosome 8 (6,104,977 bp to 6,107,427 bp) with a frequency of up to 15.87%. Two BPRs were detected on chromosome 4 (9,994,215 bp to 9,996,852 bp and 9,997,801 bp to 10,001,833 bp) at a mean frequency of 4.75 and 7.06%, respectively, each in 50% of the tumor entities (S4 Table).
Tumor entity-specific BPRs: However, most BPRs were found to occur frequently in only one tumor entity. For example, a BPR on chromosome 17 (18,917,915 bp to 19,168,912 bp) was detected in 15.40% of all the medulloblastoma samples, but in less than 0.6% of all the samples of the other individual tumor entities (S4 Table).
BPRs in healthy tissues: Eight of the 31 NOF-BPRs were also found in the normal samples with a frequency of 5% (common BPRs), but this frequency is in all cases lower than that in the tumor samples. Only two BPRs from the NOF-BPR set, both on chromosome 18 (4,976,160 bp to 4,979,612 bp and 4,989,683 bp to 4,990,804 bp), were found at a frequency greater than 10%, but they were detected more often in the renal and colorectal than in the other tumor samples (Fig 1 and Table 2).
Each circle corresponds to one tissue type (1- brain cancer (pediatric medulloblastoma), 2- breast cancer, 3- colorectal cancer, 4- gastric cancer, 5- lung cancer, 6- ovarian cancer, 7- prostate cancer, 8- renal cancer, 9- healthy tissue). Each gridline corresponds to 10%.
Comparison of altered genomic regions within different tumor entities
To recognize a possible influence of genomic alterations on tumorigenesis and progression, the genomic regions featuring an alteration in copy number were examined to determine whether they overlap with certain genes. To this end, the average segment mean for each tumor entity was determined and the genomic regions with an abnormal number of copies were identified. We detected both cancer-specific and tumor entity-specific regions of altered copy number. One genomic region on chromosome 16 (85,091,864 bp to 85,092,748 bp) appeared to be deleted in seven out of eight tumor entities (S1 and S2 Figs). In six tumor entities (breast, colorectal, lung, ovarian, prostate and renal), a genomic region on chromosome 8 between 161,222 bp and 39,397,732 bp was deleted, with the extent depending on the entity. Additionally, the genomic region from 6,689 bp to 18,917,915 bp on chromosome 17 is deleted in malignant tumor tissue by a factor of 0.6, whereas this segment was only slightly reduced (by a factor of 0.1) in breast, colorectal and ovarian tumor samples (S1 and S3 Figs). In all cases, the copy number is less altered in the normal tissue samples than in at least one type of tumor samples (S5–S13 Tables).
Distribution of BPRs and segments of altered copy number in intragenic and intergenic regions
To assess the impact of BPRs and copy number variations, we analyzed the frequency of BPRs and of copy number altered segments in intragenic and intergenic regions. The determined BPRs were nearly equally distributed over all eight tumor entities (only 5% difference) (S4 Fig). It is remarkable that the majority of BPRs are located either completely within intragenic regions (25%) or within regions overlapping genes and non-coding segments (35-40%) of the human genome. However, the fact that nearly one third of the BPRs occupy intergenic regions indicates altered DNA structures within regulatory or functionally important regions as well as in regions of unknown functionality. The number of affected intergenic regions is very high in breast, gastrointestinal and renal tissues (S4 and S5 Figs).
Affected genes in regions of altered copy number
To identify genes localized within the copy number altered regions, 564 (mostly short) segments with continuous and concordant SNP signal intensity alterations were examined. Most of these segments (147) were derived from the ovarian tumor samples and the fewest (16) from the brain (pediatric medulloblastoma) tumor samples. By contrast, only 10 were found in the normal samples. On chromosome 8, longer genomic regions were affected in five different tumor entities, but only very short segments in the normal tissues. In renal tumors, a remarkably increased number (127) of short (mostly deleted) regions could be observed. Notably, in breast cancer the p-arm of chromosome 1 was mainly amplified and the q-arm almost completely deleted (Fig 2).
Each circle corresponds to one tissue type (1- brain cancer (pediatric medulloblastoma), 2- breast cancer, 3- colorectal cancer, 4- gastric cancer, 5- lung cancer, 6- ovarian cancer, 7- prostate cancer, 8- renal cancer, 9- healthy tissues). Alterations are shown with a red inner bar for a loss (deletion) and a blue outer bar for a gain (amplification). The bars are localized to the respective genomic positions.
Across all eight tumor entities, 20,062 genes within copy number altered regions were identified, and 169 of them are described as tumor-associated. The majority of the genes were found in breast and ovarian cancer (10,979 and 9,982, respectively), and the same holds true for tumor-associated genes (99 and 86, respectively). The fewest affected genes and tumor-associated genes were encountered in gastric cancer, with 66 and 6 genes, respectively. Only seven affected genes were found in the normal samples, and to the best of our knowledge none of them is associated with cancer (S6 Fig and S16 Table). Only one genomic region on chromosome 8, where the ADAM (a disintegrin and metalloproteinase) 32 gene is located, was found to be affected by copy number alteration in all the tumor entities as well as in the normal samples. In seven out of eight tumor entities, the gene KIAA0513 was found to be affected. Twelve regions on chromosome 8 and one stretch on chromosome 21 were repeatedly found in six tumor entities. The same 838 gene regions were affected in five tumor entities; 826 of them are located on chromosome 8 and two on chromosome 17 (S14 and S15 Tables).
Comparison of BPRs and copy number altered segments between tumor and normal samples
The genome-wide analysis shows that the number of BPRs identified alongside the copy number altered regions in tumors is approximately equal to the number in healthy genomes. However, certain BPRs occur more frequently in the tumor entities, while the BPRs detected in healthy tissues are distributed almost homogeneously over the genome. As for the copy number altered segments, the individual tumor entities show stronger copy number alterations than the healthy tissue. These results are in line with those of previous studies [11, 35]. On average, 70 segments per tumor entity were detected and only 10 in the normal samples. The similar number of BPRs in normal tissues and tumor samples supports the idea that genomic variations, especially CNVs, likely amount to 4.8-9.5% in healthy genomes. In addition to the lower number of copy number altered segments in the normal samples, we identified a very low number of affected genes within those segments. These affected genes would probably not alter the phenotypic outcome, and they are not known to be cancer-associated.
Comparison of identified BPRs and copy number altered segments between different tumor entities
In the current study, many of the BPRs and copy number altered segments could be repeatedly found in multiple tumor samples and entities. To further investigate whether BPRs and segments with altered copy number are tumor entity-specific, cancer-specific or common genomic alterations, we evaluated the differences in these alterations between the eight tumor entities. We found apparent patterns in the genome-wide alterations as well as in the BPRs.
Tumor entity-specific BPR: By comparing the three cancer-relevant classes, we found that most of the BPRs are tumor entity-specific. Each tumor entity exhibited an individual BPR pattern to a certain extent, possibly indicating that tumorigenesis and tumor progression are partially due to individual genomic variations. A possible explanation for this might be the distinct differentiation of cells from various tissue sources. The difference in the activation of genomic regions across multiple tissue types could induce varying replication frequencies between different types of tissues. For example, the probability of a malignant heart tumor is very low because cardiomyocytes are postmitotic cells. Consequently, the differentiation of a cell could have a crucial influence on the individual BPR patterns and therefore also on tumorigenesis and tumor progression.
Cancer-specific BPR: The cancer-specific BPR class is the second most frequent class among the detected BPRs. The cancer-specific BPRs appeared only in the tumor and not in the healthy genomes. This class could be important for clarifying the common cancer risk factors associated with certain genomic positions that promote tumorigenesis.
Common BPR: The common BPR class was the least frequently occurring class found in this study. The BPRs and copy number altered segments identified in the tumor samples were also partly detected in the normal samples. Notably, the occurrence frequency of these BPRs was higher in at least one tumor entity than in the normal samples. These results support the fragile-site hypothesis that the genome is less stable at certain sites than at other genomic positions. The lower stability gives rise to a higher susceptibility to DNA breakage at those sites. This supports our assumption that fragmentation of the genome increases during tumor progression. Because of the high replication frequency of cancer cells, the probability of further breaks in those areas is very high. Such an event might account for the increased occurrence of the common BPRs in particular tumor entities. Similar findings have also been reported in the literature.
Presence of hotspot-regions
In this research, we observed an increased occurrence of several adjacent BPRs when comparing the detected structural alterations. We also noted an overlap of segments with similar copy number alterations, referred to as “hotspot-areas”. The presence of these areas is likely due to either technical or biological conditions. The technical reason might be the resolution and the layout of the detection method (SNP 6.0 Datasheet). Biologically, these regions could exhibit high gene activity, with the result that the DNA strand remains unwound for longer. Within such areas, the DNA may break at random more easily because of its lower stability. On the other hand, stochastic effects related to DNA replication may contribute to cancer incidence. Taken together, it is conceivable that breakage occurs randomly in general, but arises frequently in a few important regions because of genomic instability.
Importance of the NOF-BPRs
To illustrate the importance of the NOF set of BPRs, we assessed the colocalization of these BPRs, the associated CNV segments and the potentially involved genes. We found that these variants were related to both protein-coding and non-coding genes (e.g. lncRNA). About 0.8% of the identified genes are related to cancer. Most of the 31 NOF-BPRs fall into the common class (55%). Specifically, the most frequently detected BPR is located within the protein-coding gene KIAA0513 (chr16: 85,061,374 bp to 85,127,836 bp). It is well known that this gene plays an important role in immunologic, synaptic and apoptotic signal pathways. Thus, a deletion within the gene, as we detected in 7 of 8 tumor entities and the normal samples, could disturb gene expression and induce a loss of function as a signal molecule in apoptosis, consequently promoting tumorigenesis.
Two other interesting common BPRs were found on chromosome 8. The ADAM32 gene is located between these regions (chr8: 39,308,564 bp to 39,380,371 bp). Additionally, in all 8 tumor entities and the normal samples there was a deletion within this gene. Previous studies [43–45] have reported that members of the ADAM family of proteins such as ADAM8, ADAM9, ADAM10, ADAM12, ADAM15, ADAM17, ADAM19 and ADAM28 are overexpressed in human malignant tumors. ADAM proteins participate in mediating ectodomain shedding of several proteins, including tumor necrosis factor-α (TNF), transforming growth factor (TGF)-α and heparin-binding epidermal growth factor (HB-EGF) [43, 45, 46]. Dysregulation of TNF production has been linked to a variety of human diseases including Alzheimer’s disease, major depression and cancer. Several pathways have so far been postulated to account for the role of ADAMs in cancer cell proliferation and progression. One of them is the ectodomain shedding of the growth factors TGF-α and HB-EGF. This process may alter signaling on the surfaces of cancer cells, inducing amplified cell proliferation through autocrine and paracrine mechanisms. Based on our data, we speculate that, like many of the ADAMs, ADAM32 may be important in cancer cell proliferation and progression.
Of the NOF-BPRs, 29% were classified as cancer-specific. The gene RP11-115C21.2 (chr8: 6,261,072 bp to 6,264,663 bp), coding for a large intergenic non-coding RNA (lincRNA), is localized close to one BPR found in over 75% of the analyzed tumor entities (≥ 1% occurrence). In 5 tumor entities, the region of the gene was affected by an extended deletion. LincRNAs make up the majority of the long non-coding RNAs (lncRNAs). In the last few years, the importance of lncRNAs for tumorigenesis and mutagenesis has been uncovered. LncRNAs can act as either oncogenes or tumor suppressor genes by altering the structure of chromatin [48–50] and can also affect the transcription of protein-coding genes. Based on our analyses, we suggest that the regulation of the cell cycle and apoptosis was disturbed by the deleted segments and that RP11-115C21.2 acts as a tumor suppressor gene.
By contrast, only 16% of the NOF-BPRs were found to be tumor entity-specific. This suggests that BPRs with higher occurrence frequencies are preferentially detected in multiple tumor entities. A similar idea has also been proposed by a study of somatic copy number alterations. Accordingly, it could be supposed that the genomic alterations that promote cancer are common to multiple tumor entities, and that only a few individual alterations are associated with single tumor entities.
One BPR of this set is located on chromosome 17 between 18,917,915 bp and 19,168,912 bp and was detectable with sufficient frequency only in brain tumor tissues. By contrast, the complete region from 6,689 bp to 18,917,915 bp was deleted in three other tumor entities and in the brain cancer. Several tumor-associated genes are encoded in this region and are therefore affected. Among these, the best-known gene encodes the tumor suppressor p53 (TP53, 7,565,097 bp to 7,590,856 bp), which is important for cell cycle arrest, apoptosis, senescence and DNA repair and evokes changes in metabolism [52, 53].
In conclusion, our analysis of genetic variations ties together CNV detection, evaluation of BPR occurrence frequency, identification of CNV-affected genes and functional annotation. By studying a large number of tumor and healthy samples of diverse tissue origins, we have found that BPRs tended to occur more frequently in certain genomic regions in the tumor samples, whereas they were dispersed genome-wide in the normal samples. In general, therefore, it seems that some regions are preferential targets for the underlying mutations, suggesting that CNV mutations in tumor genomes are non-random. The higher recurrence of the tumor entity-specific BPRs could also suggest that there are several tissue-specific mechanisms of tumorigenesis and progression. The strongly enhanced occurrence of specific BPRs in tumors may be due to increased cell proliferation; however, several known tumor-associated genes were colocalized in the same genomic regions, which supports a common progression mechanism that explains the increasing fragmentation of DNA along with tumor progression. A majority of the identified tumor-relevant BPRs are either tumor entity-specific or associated with multiple entities (termed ‘cancer-specific’). A small part of these BPRs (termed ‘common BPRs’) also exhibited a noticeable occurrence frequency in the normal samples. Furthermore, we observed hotspots at which segments with similar copy number alterations overlapped and adjacent BPRs clustered. The presence of those hotspots and common BPRs implies that frequent mutations at fragile-site loci might also be responsible for cancer-gene alteration. Colocalization assessment and functional annotation revealed that not only protein-coding genes but also long intergenic non-coding RNAs were affected by CNV regions, suggesting that both protein-coding and non-coding genes with a broad range of biological functions might play a causative or functional role in tumor biology.
The findings of the present study, obtained from larger sets of samples of diverse tissue origins, may serve as useful clues for interpreting the mechanisms of carcinogenesis.
Identifying BPRs and characterizing their influence on tumor phenotypes can help to identify molecular factors and biomarkers responsible for tumorigenesis and progression, and hence to develop new and effective therapeutic strategies. Sorting BPRs into recurrency classes would be a suitable starting point. We have found a set of BPRs with noticeable occurrence frequency in specific tumor entities. However, more research is needed before the association between BPR classification and tumor phenotypes can be firmly established. This study showed that tumor genomes exhibit a large set of BPRs. Many factors contribute to DNA breakpoints, for example increased cell proliferation, failed DNA repair mechanisms, vulnerability to specific processes or damage, and histone modifications. In particular, interest in histone modifications has grown over the last decade because alterations in the function of histone-modifying enzymes lead to oncogenic transformation. With the methods of the present study, however, it is not possible to address the influence of this causal variant on cancer development and progression. Because genes interact with other genes in complex signaling or regulatory networks, and pathways are likely to cooperate, it would be desirable to incorporate information about the different pathways possibly involved in cancer cell proliferation and progression.
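The recurrency classes named in the conclusion (entity-specific, cancer-specific and common BPRs) can be expressed as a simple decision rule. The sketch below is one possible reading of those definitions and is not the classification procedure of this study: the function name `classify_bpr`, both cutoff parameters and the example frequencies are hypothetical and chosen for illustration only.

```python
from typing import Dict


def classify_bpr(freq_by_entity: Dict[str, float],
                 normal_freq: float,
                 tumor_cutoff: float,
                 normal_cutoff: float) -> str:
    """Assign one BPR to a recurrency class.

    freq_by_entity: fraction of samples carrying the BPR, per tumor entity.
    normal_freq:    fraction of normal samples carrying the BPR.
    Both cutoffs are placeholders, not the values used in the study.
    """
    # Tumor entities in which the BPR reaches the recurrence cutoff.
    recurrent_in = [e for e, f in freq_by_entity.items() if f >= tumor_cutoff]
    if not recurrent_in:
        return "not tumor-relevant"
    # Tumor-relevant BPRs that are also frequent in normal samples.
    if normal_freq >= normal_cutoff:
        return "common"
    if len(recurrent_in) == 1:
        return "entity-specific"
    return "cancer-specific"


# Invented toy frequencies, for illustration only.
example = {"brain": 0.30, "breast": 0.02, "lung": 0.01}
print(classify_bpr(example, normal_freq=0.0,
                   tumor_cutoff=0.15, normal_cutoff=0.05))
```

Keeping the cutoffs as explicit arguments makes it straightforward to examine how sensitive the class assignment is to the chosen thresholds.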
S1 Table. Tumor Samples.
List of GEO accession (GSM) numbers for all used tumor samples.
S2 Table. Reference Samples.
List of all used sample names from HapMap Project (Phase 3, Release #3) for the pooled reference dataset.
S3 Table. Normal Samples.
List of GEO accession (GSM) numbers for all used samples from healthy tissue.
S4 Table. Overview of all detected breakpoint regions (BPRs) (tumor and healthy tissue samples).
Data presented here include the related chromosome, start and end positions, relative occurrence probability in the particular tissues, the total occurrence over all samples and the related classification of the BPR.
S5 Table. Segments of copy number alteration in brain cancer (pediatric medulloblastoma).
Averaged log2 ratios of copy number alterations were evaluated for brain cancer tissues. Additional information presented in the table includes chromosome number, types of alterations (deletion = loss, amplification = gain), start and end positions.
S6 Table. Segments with altered copy numbers in breast cancer.
Averaged log2 ratios from chromosomal average in breast cancer tissues are given, together with chromosome number, types of alterations (deletion = loss, amplification = gain), start and end positions.
S7 Table. Segments with altered copy numbers in colorectal cancer.
Averaged log2 ratios from chromosomal average in colorectal cancer tissues are presented, together with additional information including chromosome number, types of alterations (deletion = loss, amplification = gain), start and end positions.
S8 Table. Segments with altered copy numbers in gastric cancer.
Averaged log2 ratios from chromosomal average in gastric cancer tissues were calculated. Additional results obtained include chromosome number, types of alterations (deletion = loss, amplification = gain), start and end positions.
S9 Table. Segments with altered copy numbers in lung cancer.
Averaged log2 ratios from chromosomal average in lung cancer tissues are given together with information including chromosome number, types of alterations (deletion = loss, amplification = gain), start and end positions.
S10 Table. Segments with altered copy numbers in ovarian cancer.
Averaged log2 ratios from chromosomal average in ovarian cancer tissues are presented together with additional information including chromosome number, types of alterations (deletion = loss, amplification = gain), start and end positions.
S11 Table. Segments with altered copy numbers in prostate cancer.
Averaged log2 ratios from chromosomal average in prostate cancer tissues are given, together with additional information including chromosome number, types of alterations (deletion = loss, amplification = gain), start and end positions.
S12 Table. Segments with altered copy numbers in renal cancer.
Averaged log2 ratios from chromosomal average in renal cancer tissues are shown together with additional information including chromosome number, types of alterations (deletion = loss, amplification = gain), start and end positions.
S13 Table. Segments with altered copy numbers in healthy tissues.
Data presented here are log2 ratios from chromosomal average in healthy tissues, chromosome number, types of alterations (deletion = loss, amplification = gain), start and end positions.
S14 Table. Overview of all genes potentially affected by segments with altered copy number.
For the potentially affected genes, Ensembl IDs are given together with additional information, including gene names (if known), the related chromosome, the start and end positions of the genes, whether a gene is affected at least once in the respective tissue (1 if affected, 0 if not) and the total number of potentially affected tissues.
S15 Table. Potentially affected tumor-associated genes.
Only known tumor-associated genes that are potentially affected by segments with altered copy number are summarized. The affected genes are listed with their Ensembl IDs, gene names (if known), the related chromosome, the start and end positions of the genes, whether a gene is affected at least once in the respective tissue (1 if affected, 0 if not) and the total number of potentially affected tissues.
S16 Table. Potentially affected genes in the healthy tissues.
Only genes that overlap segments of altered copy number in healthy tissues are summarized. The affected genes are listed with their Ensembl IDs, gene names, the related chromosome, the start and end positions of the genes, whether a gene is affected at least once in the respective tissue (1 if affected, 0 if not) and the total number of potentially affected tissues.
S1 Fig. Circos plot of copy number changes over the full genome.
Averaged log2 ratios of copy number were evaluated for the tumor entities and the healthy tissue samples over all autosomal chromosomes. Data are shown with entities numbered (1: brain cancer, pediatric medulloblastoma; 2: breast cancer; 3: colorectal cancer; 4: gastric cancer; 5: lung cancer; 6: ovarian cancer; 7: prostate cancer; 8: renal cancer; 9: healthy tissues).
S2 Fig. Circos plot of copy number changes in chromosome 16.
Averaged log2 ratios of the tumor entities and the healthy tissue samples are presented in single plots over chromosome 16, with entities numbered (1: brain cancer, pediatric medulloblastoma; 2: breast cancer; 3: colorectal cancer; 4: gastric cancer; 5: lung cancer; 6: ovarian cancer; 7: prostate cancer; 8: renal cancer; 9: healthy tissues).
S3 Fig. Circos plot of copy number changes in chromosome 17.
Averaged log2 ratios of the tumor entities and the healthy tissue samples over chromosome 17 are shown with entities numbered (1: brain cancer, pediatric medulloblastoma; 2: breast cancer; 3: colorectal cancer; 4: gastric cancer; 5: lung cancer; 6: ovarian cancer; 7: prostate cancer; 8: renal cancer; 9: healthy tissues).
S4 Fig. Bar plot of the occurrence of BPRs within intragenic and intergenic regions.
The intragenic and intergenic BPRs were counted. Shown are the percentages of BPRs in intragenic regions (dark green), in regions overlapping intra- and intergenic regions (light green) and in intergenic regions (grey) for every tumor entity and the healthy tissue.
S5 Fig. Bar plot of the occurrence of segments of altered copy number within intragenic and intergenic regions.
The intragenic and intergenic segments of altered copy number were counted. Shown are the percentages of segments in intragenic regions (dark green), in regions overlapping intra- and intergenic regions (light green) and in intergenic regions (grey) for every tumor entity and the healthy tissue.
S6 Fig. Bar plot of the frequency of genes possibly affected by segments of altered copy number.
The figure shows the number of affected genes (grey) compared to the number of affected tumor-associated genes (green) for every tumor entity and the healthy tissue.
Conceived and designed the experiments: HP. Analyzed the data: SM BB. Contributed reagents/materials/analysis tools: SM BB. Wrote the paper: CW HP.
- 1. Mahfouz EM, Sadek RR, Abdel-Latief WM, Mosallem FAH, Hassan EE. The role of dietary and lifestyle factors in the development of colorectal cancer: case control study in Minia, Egypt. Cent Eur J Public Health. 2014 Dec;22(4):215–222. pmid:25622477
- 2. Vogelstein B, Papadopoulos N, Velculescu VE, Zhou S, Diaz LA, Kinzler KW. Cancer Genome Landscapes. Science. 2013 Mar;339(6127):1546–1558. pmid:23539594
- 3. Stewart BW, Wild CP. In: World Cancer Report 2014. World Health Organization; 2014.
- 4. Hanahan D, Weinberg RA. The Hallmarks of Cancer: The Next Generation. Cell. 2011 Mar;144(5):646–674. pmid:21376230
- 5. Bernstein C, Prasad AR, Nfonsam V, Bernstein H. DNA Damage, DNA Repair and Cancer. In: Clark Chen, editor. New Research Directions in DNA Repair. InTech; 2013.
- 6. Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, et al. Global variation in copy number in the human genome. Nature. 2006 Nov;444(7118):444–454. pmid:17122850
- 7. Malhotra D, Sebat J. CNVs: Harbingers of a Rare Variant Revolution in Psychiatric Genetics. Cell. 2012 Mar;148(6):1223–1241. pmid:22424231
- 8. Zhang F, Khajavi M, Connolly AM, Towne CF, Batish SD, Lupski JR. The DNA replication FoSTeS/MMBIR mechanism can generate genomic, genic and exonic complex rearrangements in humans. Nat Genet. 2009 Jul;41(7):849–853. pmid:19543269
- 9. Gratias S, Rieder H, Ullmann R, Klein-Hitpass L, Schneider S, Bölöni R, et al. Allelic loss in a minimal region on chromosome 16q24 is associated with vitreous seeding of retinoblastoma. Cancer Res. 2007 Jan;67(1):408–416. pmid:17210724
- 10. Lauriat TL, Dracheva S, Kremerskothen J, Duning K, Haroutunian V, Buxbaum JD, et al. Characterization of KIAA0513, a novel signaling molecule that interacts with modulators of neuroplasticity, apoptosis, and the cytoskeleton. Brain Res. 2006 Nov;1121(1):1–11. pmid:17010949
- 11. Yang R, Chen B, Pfutze K, Buch S, Steinke V, Holinski-Feder E, et al. Genome-wide analysis associates familial colorectal cancer with increases in copy number variations and a rare structural variation at 12p12.3. Carcinogenesis. 2014 Feb;35(2):315–323. pmid:24127187
- 12. McCarroll SA, Kuruvilla FG, Korn JM, Cawley S, Nemesh J, Wysoker A, et al. Integrated detection and population-genetic analysis of SNPs and copy number variation. Nat Genet. 2008 Oct;40(10):1166–1174. pmid:18776908
- 13. Standfuss C, Pospisil H, Klein A. SNP microarray analyses reveal copy number alterations and progressive genome reorganization during tumor development in SVT/t driven mice breast cancer. BMC Cancer. 2012 Aug;12:380. pmid:22935085
- 14. Albertson DG, Collins C, McCormick F, Gray JW. Chromosome aberrations in solid tumors. Nat Genet. 2003 Aug;34(4):369–376. pmid:12923544
- 15. Shlien A, Tabori U, Marshall CR, Pienkowska M, Feuk L, Novokmet A, et al. Excessive genomic DNA copy number variation in the Li-Fraumeni cancer predisposition syndrome. Proc Natl Acad Sci USA. 2008 Aug;105(32):11264–11269. pmid:18685109
- 16. Silva AG, Krepischi ACV, Pearson PL, Hainaut P, Rosenberg C, Achatz MI. The profile and contribution of rare germline copy number variants to cancer risk in Li-Fraumeni patients negative for TP53 mutations. Orphanet J Rare Dis. 2014 Apr;9:63. pmid:24775443
- 17. Bougeard G, Sesboüé R, Baert-Desurmont S, Vasseur S, Martin C, Tinat J, et al. Molecular basis of the Li-Fraumeni syndrome: an update from the French LFS families. J Med Genet. 2008 Aug;45(8):535–538. pmid:18511570
- 18. Ruijs MW, Verhoef S, Rookus MA, Pruntel R, van der Hout AH, Hogervorst FB, et al. TP53 germline mutation testing in 180 families suspected of Li-Fraumeni syndrome: mutation detection rate and relative frequency of cancers in different familial phenotypes. J Med Genet. 2010 Jun;47(6):421–428. pmid:20522432
- 19. Bose P, Hermetz KE, Conneely KN, Rudd MK. Tandem Repeats and G-Rich Sequences Are Enriched at Human CNV Breakpoints. PLoS ONE. 2014 Jul;9(7).
- 20. Li Y, Zhang L, Ball RL, Liang X, Li J, Lin Z, et al. Comparative analysis of somatic copy-number alterations across different human cancer types reveals two distinct classes of breakpoint hotspots. Hum Mol Genet. 2012 Nov;21(22):4957–4965. pmid:22899649
- 21. Tomasetti C, Vogelstein B. Variation in cancer risk among tissues can be explained by the number of stem cell divisions. Science. 2015 Jan;347(6217):78–81. pmid:25554788
- 22. Zhao X, Li C, Paez JG, Chin K, Janne PA, Chen TH, et al. An integrated view of copy number and allelic alterations in the cancer genome using single nucleotide polymorphism arrays. Cancer Res. 2004 May;64(9):3060–3071. pmid:15126342
- 23. Gene Expression Omnibus. Available from: http://www.ncbi.nlm.nih.gov/geo/.
- 24. International HapMap Project: Raw Data Download Affymetrix6.0. Available from: http://hapmap.ncbi.nlm.nih.gov/downloads/raw_data/hapmap3_affy6.0/.
- 25. Altshuler DM, Gibbs RA, Peltonen L, et al. Integrating common and rare genetic variation in diverse human populations. Nature. 2010 Sep;467(7311):52–58. pmid:20811451
- 26. Li W, Olivier M. Current analysis platforms and methods for detecting copy number variation. Physiol Genomics. 2013 Jan;45(1):1–16. pmid:23132758
- 27. Karimpour-Fard A, Dumas L, Phang T, Sikela JM, Hunter LE. A survey of analysis software for array-comparative genomic hybridisation studies to detect copy number variation. Hum Genomics. 2010 Aug;4(6):421–427. pmid:20846932
- 28. Affymetrix Power Tools. Available from: http://www.affymetrix.com/estore/partners_programs/programs/developer/tools/powertools.affx#1_1.
- 29. R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria; 2013. Available from: http://www.R-project.org/.
- 30. de Andrade M, Atkinson EJ, Bamlet WR, Matsumoto ME, Maharjan S, Slager SL, et al. Evaluating the influence of quality control decisions and software algorithms on SNP calling for the affymetrix 6.0 SNP array platform. Hum Hered. 2011 Sep;71(4):221–233. pmid:21734406
- 31. Neuvial P, Bengtsson H, Speed T. In: Horng-Shing Lu H, Schölkopf B, Zhao H, editors. Statistical Analysis of Single Nucleotide Polymorphism Microarrays in Cancer Studies. Springer Handbooks of Computational Statistics. Springer Berlin Heidelberg; 2011. p. 225–255.
- 32. Le Scouarnec S, Gribble SM. Characterising chromosome rearrangements: recent technical advances in molecular cytogenetics. Heredity (Edinb). 2012 Jan;108(1):75–85.
- 33. Olshen A, Venkatraman E, Lucito R, Wigler M. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics. 2004 Oct;5:557–572. pmid:15475419
- 34. Durinck S, Moreau Y, Kasprzyk A, Davis S, De Moor B, Brazma A, et al. BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics. 2005 Aug;21(16):3439–3440. pmid:16082012
- 35. Beroukhim R, Mermel CH, Porter D, Wei G, Raychaudhuri S, Donovan J, et al. The landscape of somatic copy-number alteration across human cancers. Nature. 2010 Feb;463(7283):899–905. pmid:20164920
- 36. Zarrei M, MacDonald JR, Merico D, Scherer SW. A copy number variation map of the human genome. Nat Rev Genet. 2015 Mar;16(3):172–183. pmid:25645873
- 37. Nordman J, Orr-Weaver TL. Regulation of DNA replication during development. Development. 2012 Feb;139(3):455–464. pmid:22223677
- 38. McAllister HA, Fenoglio J. Tumors of the Cardiovascular System. In: Hartmannn WCW, editor. Atlas of Tumor Pathology. Washington, DC: Armed Forces Institute of Pathology; 1978. p. 46–47. (second series, fasc 15).
- 39. Dillon LW, Burrow AA, Wang YH. DNA Instability at Chromosomal Fragile Sites in Cancer. Curr Genomics. 2010 Aug;11(5):326–337. pmid:21286310
- 40. Gorgoulis VG, Vassiliou LV, Karakaidos P, Zacharatos P, Kotsinas A, Liloglou T, et al. Activation of the DNA damage checkpoint and genomic instability in human precancerous lesions. Nature. 2005 Apr;434(7035):907–913. pmid:15829965
- 41. Thys RG, Lehman CE, Pierce LC, Wang YH. DNA secondary structure at chromosomal fragile sites in human disease. Curr Genomics. 2015 Feb;16(1):60–70. pmid:25937814
- 42. Data Sheet: Genome-Wide Human SNP Array 6.0. Affymetrix. 2009;2:1–4.
- 43. Mochizuki S, Okada Y. ADAMs in cancer cell proliferation and progression. Cancer Science. 2007 May;98(5):621–628. pmid:17355265
- 44. Brocker C, Vasiliou V, Nebert D. Evolutionary divergence and functions of the ADAM and ADAMTS gene families. Hum Genomics. 2009 Oct;4(1):43–55. pmid:19951893
- 45. Fröhlich C, Klitgaard M, Noer JB, Kotzsch A, Nehammer C, Kronqvist P, et al. ADAM12 is expressed in the tumour vasculature and mediates ectodomain shedding of several membrane-anchored endothelial proteins. Biochem J. 2013 May;452(1):97–109.
- 46. Weber S, Saftig P. Ectodomain shedding and ADAMs in development. Development. 2012 Oct;139(20):3693–3709. pmid:22991436
- 47. Idriss HT, Naismith JH. TNF and the TNF receptor superfamily structure-function relationship(s). Microsc Res Tech. 2000 Aug;50(3):184–195. pmid:10891884
- 48. Gutschner T, Diederichs S. The hallmarks of cancer: a long non-coding RNA point of view. RNA Biol. 2012 Jun;9(6):703–719. pmid:22664915
- 49. Gutschner T, Hämmerle M, Eißmann M, Hsu J, Kim Y, Hung G, et al. The noncoding RNA MALAT1 is a critical regulator of the metastasis phenotype of lung cancer cells. Cancer Res. 2013 Feb;73(3):1180–1189. pmid:23243023
- 50. Yan X, Hu Z, Feng Y, Hu X, Yuan J, Zhao S, et al. Comprehensive Genomic Characterization of Long Non-coding RNAs across Human Cancers. Cancer Cell. 2015 Oct;28(4):529–540. pmid:26461095
- 51. Huarte M, Guttman M, Feldser D, Garber M, Koziol MJ, Kenzelmann-Broz D, et al. A large intergenic noncoding RNA induced by p53 mediates global gene repression in the p53 response. Cell. 2010 Aug;142(3):409–419. pmid:20673990
- 52. Lane DP. Cancer. p53, guardian of the genome. Nature. 1992 Jul;358(6381):15–16. pmid:1614522
- 53. Peifer M, Fernández-Cuesta L, Sos ML, George J, Seidel D, Kasper LH, et al. Integrative genome analyses identify key somatic driver mutations of small-cell lung cancer. Nat Genet. 2012 Oct;44(10):1104–1110. pmid:22941188