Colorectal Adenomas Contain Multiple Somatic Mutations That Do Not Coincide with Synchronous Adenocarcinoma Specimens

We have performed a comparative ultrasequencing study of multiple colorectal lesions obtained simultaneously from four patients. Our data show that benign lesions (adenomatous or hyperplastic polyps) contain a high mutational load. Additionally, multiple synchronous colorectal lesions show non-overlapping mutational signatures, highlighting the degree of heterogeneity between multiple specimens from the same patient. These observations suggest that not only the number of mutations but also an effective oncogenic combination of mutations can determine the malignant progression of colorectal lesions.


Introduction
Our current understanding of colorectal cancer assumes that its pathogenesis involves a progressive accumulation of genomic changes at multiple stages. Initiating events, such as driver mutations affecting the APC or KRAS genes, are followed by additional alterations in specific genes such as p16 and p53 [1] and in signalling pathways including WNT, MAPK, GNAS and TGFB that, over time, shape the genomic conditions that drive a premalignant lesion towards cancer [2-4]. Accordingly, premalignant lesions such as colorectal adenomas feature mutational events in APC, BRAF, KRAS and other genes [2,5]. As the disease progresses, colorectal adenocarcinomas can also accumulate mutations in genes such as p53 and FBXW7, as well as in the MAPK, TGFB, PI3K and DNA mismatch-repair pathways [3]. However, whether somatic mutations accumulate along the adenoma-carcinoma sequence within the same patient remains to be investigated.
Here we sequenced the whole exomes of multiple lesions from four non-MSI colorectal cancer patients, comprising adenoma and adenocarcinoma specimens taken during the same endoscopic procedure. Our first finding was that adenomas contained a large number of mutations, generally fewer than, but still comparable to, the number found in colorectal cancer samples. In addition, different adenomas within the same patient were strikingly heterogeneous. Analysis of mutation frequencies also showed that a large majority of the mutations found in adenoma samples were subclonal and probably passenger events.

Results and Discussion
We characterized the genomic variants in a series of untreated colorectal lesions, including adenocarcinomas, adenomas and hyperproliferative polyps, taken simultaneously by endoscopic resection, together with normal mucosa used as a control (S1 Table). The topology of each patient's lesions is shown in Figs. 1a, 2a, 3a and 3b, and the clinical characteristics are summarised in Table 1. We generated two paired-end 75-bp whole exome sequencing libraries and sequenced them on an Illumina HiSeq2000 instrument, which allowed us to map an average of ~102 million reads per sample. Under these conditions, the mean coverage of the targeted sequence was 99X (range 78X-141X), and a mean of 92.1% (range 89.8-95.9%) of targeted bases had at least 15X coverage (S1 Table). Somatic variants were identified using the SAMtools suite. Additionally, we used RAMSES software [6] to call potential mutations supported by minimum independent multi-aligner evidence, which enabled us to identify subclonal variants present in at least 5% of the reads. We also performed a secondary analysis on a selection of genes with known biological activity, which confirmed specific mutations in up to 76.5% of the genes carrying a mutational percentage above 15% in each sample in our primary analysis (Figs. 1b and 2b, S1 Fig. and S2 Table). Using the data from our primary analysis, and in line with previous observations in colorectal lesions [5], we observed that most mutations were C>T/G>A changes, occurring at CpG sites in up to 75% of cases (Fig. 4 and S5 Table). We reproduced these results using the validated data from the secondary analysis (S2 Fig.). The main findings are described in detail in Table 2 and S1-S5 Tables. We focused on alterations that could potentially change the expression or activity of the encoded proteins, namely amino acid-changing or truncating mutations.
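The CpG classification of the mutational spectrum described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline used in this study: each SNV is given as a hypothetical 0-based position in a toy reference string, substitutions are collapsed onto the pyrimidine (C/T) strand, and a change is counted as a CpG transition when a C>T is followed by a G or a G>A is preceded by a C.

```python
# Minimal sketch (not the study's pipeline): strand-collapsed substitution
# spectrum and CpG-context flag for SNVs on a toy reference string.
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def collapse(ref: str, alt: str) -> str:
    """Report a substitution relative to the pyrimidine (C/T) reference base."""
    if ref in "AG":
        ref, alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
    return f"{ref}>{alt}"


def is_cpg_transition(seq: str, pos: int, ref: str, alt: str) -> bool:
    """True for C>T followed by G, or G>A preceded by C (same CpG, other strand).
    `pos` is a 0-based index into `seq` (an assumption of this sketch)."""
    if seq[pos] != ref:
        raise ValueError(f"reference mismatch at position {pos}")
    if (ref, alt) == ("C", "T"):
        return pos + 1 < len(seq) and seq[pos + 1] == "G"
    if (ref, alt) == ("G", "A"):
        return pos > 0 and seq[pos - 1] == "C"
    return False


def spectrum(seq: str, variants):
    """Return (Counter of collapsed classes, number of CpG C>T/G>A changes)."""
    classes = Counter(collapse(ref, alt) for _, ref, alt in variants)
    cpg = sum(is_cpg_transition(seq, pos, ref, alt) for pos, ref, alt in variants)
    return classes, cpg


if __name__ == "__main__":
    toy_ref = "ACGTTCGAAC"  # hypothetical sequence
    toy_snvs = [(1, "C", "T"), (6, "G", "A"), (3, "T", "A"), (9, "C", "A")]
    classes, cpg = spectrum(toy_ref, toy_snvs)
    print(dict(classes), cpg)
```

Collapsing G>A onto C>T reflects that both describe the same base-pair change read from opposite strands, which is why the text reports them together.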
Analysing their incidence, we found that most, but not all, benign lesions (adenomas or hyperproliferative polyps) contained fewer genomic alterations than the colorectal cancer specimens (Figs. 1b, 2b, 3a and 3b and Table 2), with a mutational rate similar to that described by the TCGA network for non-hypermutated colorectal adenocarcinoma samples [3]. Using this approach, we detected one or more distinct alterations affecting APC in 6 of the 8 adenomas analysed, underlining the relevance of APC inactivation in the genesis of colorectal adenomas. Consistent with this, these benign lesions lacked mutations in genes or pathways considered essential in colorectal cancer [3], with the possible exception of PIK3CG in adenoma-2 (Fig. 2c) and the KRAS and NRAS mutations found in adenomas 4B and 4C (Fig. 3f). By contrast, a number of mutations found in the adenocarcinomas affected oncogenes such as GHR and INSR (Fig. 1c) or KRAS and ERBB4 (Fig. 2c), all well known for their ability to activate MAPK signalling. These occurred alongside other somatic mutations affecting SMAD genes (TGFB signalling; Figs. 2c and 3e) or adenylyl cyclases such as ADCY2 (Fig. 1c) and ADCY1 (Fig. 2c), which participate in the COX2-PGE2-PR-GNAS signalling axis (reviewed in [7]). When comparing the mutational spectra of multiple samples from the same patient, we did not find a single recurrent mutation; together with the multiple, non-recurrent alterations found in APC, this suggests an independent origin for the multiple adenomas and the adenocarcinoma within the same patient. In this respect, some individual lesions, such as adenoma-30 (Fig. 1), carried different APC mutations detected at different percentages (14% and 51%). This may reflect a degree of subclonal diversity that is not exclusive to adenomas, since adenocarcinoma-2 (Fig. 2) also harboured two distinct APC mutations, in 10.9% and 10.4% of the alleles read. Moreover, our observations, in line with those in [5], suggest that colorectal adenomas, regardless of their size or degree of dysplasia, and even hyperplastic polyps (both with reduced potential to progress to cancer), still feature a relatively high number of subclonal mutations combined into inefficient, non-carcinogenic signatures. Thus, the early steps of colorectal cancer could be characterised by highly dynamic genetic changes until an efficient neoplastic signature, giving rise to an infiltrating carcinoma, is generated. Given the limited number of patients analysed, we cannot yet establish whether all benign lesions carry a high mutational load. The same caveat applies to the observation that mutations found in adenomas do not coincide with those found in synchronous adenocarcinoma specimens from the same patient, although this finding is supported by data from other laboratories [5]. Characterising the precise mutational signatures that control tumour dynamics at specific stages of the disease may, in the near future, help guide the development of specific combination therapies.
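Checking for recurrent mutations between lesions of the same patient amounts to intersecting sets of variant calls. The minimal Python sketch below assumes each call is keyed by (chromosome, position, reference, alternate); the lesion names and coordinates are invented placeholders, not data from this study. The allele-fraction helper shows how percentages such as those quoted for the APC mutations are derived from read counts.

```python
# Minimal sketch: look for mutations shared between lesions of one patient
# and compute variant allele fractions (VAF). All data are hypothetical.
from itertools import combinations


def shared_mutations(calls):
    """Map each unordered pair of lesions to the set of calls they share."""
    return {(a, b): calls[a] & calls[b] for a, b in combinations(sorted(calls), 2)}


def recurrent(calls):
    """Return calls that appear in more than one lesion."""
    seen, repeated = set(), set()
    for muts in calls.values():
        repeated |= seen & muts
        seen |= muts
    return repeated


def vaf(alt_reads: int, depth: int) -> float:
    """Fraction of reads at a position that carry the variant allele."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return alt_reads / depth


if __name__ == "__main__":
    calls = {  # placeholder lesions and coordinates
        "adenoma_A": {("chr1", 101, "C", "T"), ("chr2", 202, "G", "A")},
        "adenoma_B": {("chr3", 303, "C", "A")},
        "carcinoma": {("chr4", 404, "T", "C"), ("chr2", 205, "G", "A")},
    }
    print(shared_mutations(calls))
    print(recurrent(calls))          # empty set: no recurrent mutation
    print(round(vaf(51, 100), 2))
```

Exact-key matching is deliberately strict: two different APC mutations in two lesions share no key, which is the situation the text describes as non-recurrent.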

Ethics statement
All human samples used in this study were collected after written informed consent, signed by each patient and the physician(s) involved, under a protocol approved by the CEIC (Comité Ético de Investigación Clínica, Cantabria; Clinical Research Ethics Committee of Cantabria). We kept the original records under restricted-access conditions to fulfil current legal requirements. All procedures were conducted and approved following the specific recommendations of the CEIC.

Patients and samples
Nine fresh-frozen colorectal samples, obtained by endoscopic resection from two previously untreated patients, were selected for whole exome sequencing. Samples from Patient 1 (Fig. 1) consisted of: a) normal mucosa, b) adenomatous polyp (30 cm), c) adenomatous polyp (90 cm) and d) adenocarcinoma. Samples from Patient 2 (Fig. 2) consisted of: a) normal mucosa, b) hyperplastic polyp, c) adenomatous polyp, d) adenomatous polyp and e) adenocarcinoma. Further information is provided in S1 and S5 Tables. All cases were reviewed by a panel of three pathology specialists, and lesions were graded following standard criteria [8].
Genomic DNA extraction, quantification, exome enrichment and sequencing
Purified genomic DNA (3 μg) was extracted from snap-frozen (fresh) samples using standard procedures. Briefly, samples were washed in PBS, centrifuged and lysed in the "Tissue and Cell Lysis Solution" buffer of the MasterPure kit (Epicentre) supplemented with proteinase K (5 μl/100 μl buffer), with overnight shaking at 56°C. DNA was extracted using phenol/chloroform/isoamyl alcohol (25:24:1) in a Phase Lock Gel Light tube (Eppendorf), then washed and precipitated. Genomic DNA was quantified using a Qubit dsDNA BR Assay Kit and a Qubit 2.0 fluorometer (Invitrogen) following the manufacturer's instructions. Genomic DNA (3 μg) was then enriched in each case for protein-coding sequences using the in-solution exome capture SureSelect Human All Exon 50 Mb kit (Agilent

Sequence mapping and identification of tumour variants
These methods have been described elsewhere [6]. Briefly, base calling and quality control were performed with the Illumina RTA sequence analysis pipeline. Sequence reads were trimmed up to the first base with a quality of more than 10. Mapping to human genome build hg19 (GRCh37) was done with GEM, allowing up to 4 mismatches [9]. Reads not mapped by GEM (~4%) were subjected to a final round of mapping with BFAST [10]. Results were merged, and only uniquely mapping, non-duplicate read pairs were used for subsequent analyses. The SAMtools suite [11] with default settings was used to call SNVs and short indels. Variants in regions of low mappability [12], with a read depth of < 10 or with a strand bias probability of < 0.001 were filtered out. Variants were annotated, and their effects predicted, with ANNOVAR [13] and snpEff [14], including information from dbSNP build 135 [15], the 1000 Genomes Project [16], the Exome Variant Server (NHLBI GO Exome Sequencing Project (ESP), Seattle, WA; http://evs.gs.washington.edu/EVS/) and an internal database of sequence variants identified in more than 100 control samples. Tags were added for positions with high strand bias, high tail distance bias, a read depth of < 10, and those in low-mappability regions. For the tumour-normal comparison, a Fisher's exact test p-value was calculated for positions with different genotypes in the two samples.
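The variant filters and the tumour-normal test described above can be expressed as follows. This is a sketch under assumptions, not the pipeline's code: read depth, strand-bias probability and a low-mappability flag are assumed to be available per variant, and the 2x2 table layout (rows tumour/normal, columns variant/reference read counts) is our illustrative choice. The two-sided Fisher's exact test is computed directly from the hypergeometric distribution so the example needs only the standard library; `scipy.stats.fisher_exact` implements the same test.

```python
# Sketch of the stated filters plus a two-sided Fisher's exact test
# (standard library only). Function and parameter names are hypothetical.
from math import comb

MIN_DEPTH = 10              # read depth < 10 is filtered out
MIN_STRAND_BIAS_P = 0.001   # strand-bias probability < 0.001 is filtered out


def passes_filters(depth: int, strand_bias_p: float, low_mappability: bool) -> bool:
    return depth >= MIN_DEPTH and strand_bias_p >= MIN_STRAND_BIAS_P and not low_mappability


def _table_prob(a: int, row1: int, col1: int, n: int) -> float:
    """Hypergeometric probability of top-left cell `a` given fixed margins."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(tumour_alt, tumour_ref, normal_alt, normal_ref) -> float:
    """Sum the probabilities of all tables with the same margins that are
    no more likely than the observed one."""
    row1 = tumour_alt + tumour_ref
    col1 = tumour_alt + normal_alt
    n = row1 + normal_alt + normal_ref
    p_obs = _table_prob(tumour_alt, row1, col1, n)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    total = sum(
        p for p in (_table_prob(x, row1, col1, n) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-7)   # small tolerance for floating-point ties
    )
    return min(total, 1.0)


if __name__ == "__main__":
    print(passes_filters(depth=35, strand_bias_p=0.2, low_mappability=False))
    print(fisher_exact_two_sided(8, 2, 1, 5))
```

A small p-value indicates that variant reads are distributed differently between tumour and normal than expected by chance, which is what distinguishes a somatic candidate from a germline or artefactual call.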

Detection of subclonal mutations
To identify mutations present in subclonal populations within the tumours, we used a slightly different analysis pipeline. Sequence reads were aligned to the human reference genome (GRCh37) using BWA, and the alignment was subsequently cleaned using SAMtools and Picard tools for mate-coordinate fixing and PCR duplicate flagging. Finally, the GATK indel realigner was used to realign locally around small insertions and deletions (indels). A program written in-house, RAMSES ("Realignment Assisted Minimum Evidence Spotter"; Ignacio Varela, manuscript in preparation), was used to identify coordinates with a minimum value of 2 at which high-quality reads, independently aligned with BLAT, reported differences from the reference genome in the tumour sample, with absolutely no evidence of the same change in the corresponding normal sample. Additionally, mutations located near DNA repeats, present in the dbSNP or 1000 Genomes databases, or reported near the ends of reads were removed. The functional consequence of each mutation was annotated using the Ensembl Perl API (Ensembl database, release 69), and only coding mutations were retained.
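RAMSES itself is unpublished, so the following is only a hypothetical re-expression of the acceptance criteria listed above as a Python filter. We assume that the "minimum value of 2" refers to supporting high-quality, BLAT-confirmed reads in the tumour; the `Candidate` fields are illustrative names, not RAMSES output.

```python
# Hypothetical re-expression of the subclonal-call criteria described in the text.
# This is NOT the RAMSES implementation; field names are illustrative.
from dataclasses import dataclass

MIN_TUMOUR_SUPPORT = 2  # assumed meaning of the "minimum value of 2"


@dataclass(frozen=True)
class Candidate:
    tumour_support: int        # high-quality tumour reads confirmed by BLAT realignment
    normal_support: int        # reads showing the same change in the matched normal
    near_repeat: bool
    in_dbsnp_or_1000g: bool
    near_read_end: bool
    coding: bool               # from Ensembl annotation


def keep(c: Candidate) -> bool:
    return (
        c.tumour_support >= MIN_TUMOUR_SUPPORT
        and c.normal_support == 0        # "absolutely no evidence" in the normal
        and not c.near_repeat
        and not c.in_dbsnp_or_1000g
        and not c.near_read_end
        and c.coding                     # only coding mutations were retained
    )


if __name__ == "__main__":
    good = Candidate(3, 0, False, False, False, True)
    print(keep(good), keep(Candidate(3, 1, False, False, False, True)))
```

Requiring zero supporting reads in the normal, rather than a low fraction, is what makes a low tumour threshold tolerable: it trades some sensitivity to normal-sample contamination for specificity against germline variants.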

Secondary analysis by 454 Roche
A total of 114 candidate variants from patients 1 and 2 were assessed by targeted resequencing using the GS Junior System (Roche). Amplicons of ~300 bp spanning the identified mutations were generated and ligated to specific adaptors (S3 Table). A pooled, barcoded mixture of amplicons was sequenced on the 454 GS Junior platform (Roche). Reads were aligned against the human reference genome (GRCh37) using the BWA-SW algorithm. SAMtools was subsequently used to generate BAM and pileup files, which were parsed using scripts written in-house.
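The in-house parsing scripts are not described further. As an illustration only, the sketch below counts reference and non-reference base calls in the read-base column of one samtools pileup line, following the documented pileup symbols: "." and "," for reference matches, "^" plus a mapping-quality character at read starts, "$" at read ends, "+n"/"-n" followed by n bases for indels, and "*" or "#" for deleted bases. Indel sequences are skipped rather than counted, which is a simplification.

```python
# Illustrative parser for the read-base column of a samtools pileup line.
# Not the study's in-house script; indel sequences are skipped, not counted.
import re
from collections import Counter

_INDEL_LEN = re.compile(r"\d+")


def count_pileup_bases(bases: str) -> Counter:
    counts = Counter()
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":                # read start; next char is mapping quality
            i += 2
        elif ch == "$":              # read end
            i += 1
        elif ch in "+-":             # indel, e.g. +2CG or -1G
            m = _INDEL_LEN.match(bases, i + 1)
            i = m.end() + int(m.group())
        else:
            if ch in ".,":
                counts["ref"] += 1
            elif ch.upper() in "ACGTN":
                counts[ch.upper()] += 1
            elif ch in "*#":
                counts["del"] += 1
            i += 1                   # '>' / '<' (reference skips) are ignored


    return counts


if __name__ == "__main__":
    print(count_pileup_bases("..,,^!.$A+2CGaT-1Gt*"))
```

Upper- and lower-case mismatches (forward and reverse strand) are pooled here; keeping them separate would allow a strand-bias check.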
Only positions with a minimum coverage of 20 reads in both tumour and normal samples were considered. Mutations supported by at least 5 independent mutant reads, corresponding to at least 1% of the total reads at that position in the tumour sample, and with no mutant reads in the corresponding normal sample, were considered validated.
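The validation rule stated above can be written as a small function. This is a minimal sketch assuming per-position read counts are already available for tumour and normal; it returns `None` when a position does not reach the coverage threshold, to keep "not assessable" distinct from "not validated". The independence of the mutant reads is not modelled here.

```python
# Sketch of the 454 validation rule; thresholds taken from the text.
from typing import Optional


def is_validated(tumour_depth: int, tumour_alt: int,
                 normal_depth: int, normal_alt: int,
                 min_depth: int = 20, min_alt_reads: int = 5,
                 min_fraction: float = 0.01) -> Optional[bool]:
    """None: position not considered (coverage < min_depth in either sample).
    True/False: mutation validated or not."""
    if tumour_depth < min_depth or normal_depth < min_depth:
        return None
    return (tumour_alt >= min_alt_reads
            and tumour_alt / tumour_depth >= min_fraction
            and normal_alt == 0)


if __name__ == "__main__":
    print(is_validated(400, 6, 300, 0))   # 6 reads, 1.5% of 400
    print(is_validated(1000, 6, 300, 0))  # 6 reads but only 0.6%
```

Both the absolute read count and the fraction must pass: at high depth, 5 reads can fall below 1%, while at low depth, 1% can correspond to fewer than 5 reads.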