Nonsense Mediated Decay Resistant Mutations Are a Source of Expressed Mutant Proteins in Colon Cancer Cell Lines with Microsatellite Instability

Background Frameshift mutations in microsatellite instability high (MSI-High) colorectal cancers are a potential source of targetable neo-antigens. Many nonsense transcripts are subject to rapid degradation due to nonsense-mediated decay (NMD), but nonsense transcripts with a cMS in the last exon or near the last exon-exon junction have intrinsic resistance to nonsense-mediated decay (NMD). NMD-resistant transcripts are therefore a likely source of expressed mutant proteins in MSI-High tumours. Methods Using antibodies to the conserved N-termini of predicted mutant proteins, we analysed MSI-High colorectal cancer cell lines for examples of naturally expressed mutant proteins arising from frameshift mutations in coding microsatellites (cMS) by immunoprecipitation and Western Blot experiments. Detected mutant protein bands from NMD-resistant transcripts were further validated by gene-specific short-interfering RNA (siRNA) knockdown. A genome-wide search was performed to identify cMS-containing genes likely to generate NMD-resistant transcripts that could encode for antigenic expressed mutant proteins in MSI-High colon cancers. These genes were screened for cMS mutations in the MSI-High colon cancer cell lines. Results Mutant protein bands of expected molecular weight were detected in mutated MSI-High cell lines for NMD-resistant transcripts (CREBBP, EP300, TTK), but not NMD-sensitive transcripts (BAX, CASP5, MSH3). Expression of the mutant CREBBP and EP300 proteins was confirmed by siRNA knockdown. Five cMS-bearing genes identified from the genome-wide search and without existing mutation data (SFRS12IP1, MED8, ASXL1, FBXL3 and RGS12) were found to be mutated in at least 5 of 11 (45%) of the MSI-High cell lines tested. Conclusion NMD-resistant transcripts can give rise to expressed mutant proteins in MSI-High colon cancer cells. If commonly expressed in primary MSI-High colon cancers, MSI-derived mutant proteins could be useful as cancer specific immunological targets in a vaccine targeting MSI-High colonic tumours.


Introduction
Approximately 15% of colorectal, gastric and endometrial cancers have defective DNA mismatch repair and high-level microsatellite instability (MSI-High) [1,2]. Most MSI-High tumours result from sporadic hypermethylation of the MLH1 gene promoter [3], but mutation of a DNA mismatch repair enzyme is the underlying defect in cases of hereditary non-polyposis colorectal cancer (HNPCC) [4]. Tumours with MSI have a form of genetic instability that manifests as frameshift mutations in repetitive microsatellite sequences of DNA [1]. Frameshift mutations in coding microsatellites (cMS) alter the genetic reading frame and in most instances encode for truncated proteins with unique C-terminal protein sequences.
An intriguing pathological feature of MSI-High colorectal cancers is a tendency to have increased numbers of tumour infiltrating lymphocytes (TILs) relative to MSI-Low and microsatellite stable (MSS) tumours [5]. The TILs in MSI-High tumours are activated CD8+ cytotoxic T lymphocytes (CTLs) [6], suited to recognising intracellular peptides such as MSI-derived neo-antigens, presented by HLA class I molecules. An MSI-High phenotype and increased TILs in a colorectal cancer connote a better stage-matched prognosis relative to MSI-Low or MSS tumours [7,8]. This survival advantage may reflect a protective benefit of the immune infiltrate [9].
Synthetic peptides corresponding to mutant proteins predicted to arise from MSI-associated frameshift mutations have been shown to be immunogenic. Over the past decade, cytotoxic T lymphocyte (CTL) responses have been described against HLA-A2 binding peptides arising from frameshift mutations in the A10 cMS of TGFBR2 [10,11], the A10 cMS of CASP5 [12], the T10 cMS of OGT [13] and recently the T15 cMS of U79260 (FTO) [14]. CTL responses to mutant proteins have been detected in patients with MSI associated colon cancers and in healthy HNPCC-mutation carriers, raising the possibility of protective immunosurveillance in the latter population [15].
There is published mutation data available for at least 1522 mononucleotide coding microsatellites from 460 genes in MSI-High colorectal cancers [16]. A correlation exists between cMS length and mutation rate. Genes with cMS mutations are potential sources of targetable proteins for immune therapy strategies. Since frameshift mutations affect cMS repeats in a predictable manner, the mutant proteins generated are likely to be conserved in a population of patients with MSI-High colon cancers [17,18]. Although no one mutant transcript is common to all MSI-High tumours, a vaccine that targets a set of commonly mutated proteins may be an effective treatment for MSI-High colon cancers. There is a need for new therapies against MSI-High colon cancers, since multiple studies have shown these tumours to be resistant to standard chemotherapeutics such as 5-fluorouracil [19,20,21].
The existence of MSI-derived mutant proteins has been inferred by immunological studies, but there is a lack of published data regarding the direct detection of the predicted MSI-derived mutant proteins. Western blot analyses have been published in which potential mutant protein bands were detected for the CREBBP, EP300 [22] and MBD4 genes [23], but there was no validation of these bands. Confirming the expression of mutant proteins in MSI-High tumours is important if these are to be successful therapeutic targets for a vaccine. Stable expression of mutant proteins which facilitates cross presentation of tumour antigens by professional antigen presenting cells and stimulates a T helper response is likely to provide superior anti-tumour immunity [18]. However, most mutant transcripts generated by MSI are targeted for rapid degradation by nonsense-mediated decay (NMD) [24,25] and in the absence of intervention, little to no targetable mutant protein will be generated from these aberrant transcripts [24].
In this study we have detected three MSI-derived mutant proteins, all of which arise from NMD-resistant transcripts. Following a genome-wide search for additional NMD-resistant transcripts, cMS-containing genes were screened for mutations in a panel of 11 MSI-High colon cancer cell lines. Genes were selected for mutation analysis on the basis of likely high mutation frequency (based on cMS length), likely immunogenicity (based on mutant C-terminus amino acid length) and likely expression in colon cancer (based on gene and protein expression data where available). Commonly mutated NMD-resistant transcripts were assessed for probable HLA-A*0201 CTL epitopes using in silico predictive software, SYFPEITHI [26], for comparison with MSIderived epitopes already shown to be immunogenic in vitro.
Our results confirm that NMD-resistant transcripts give rise to expressed mutant proteins and also show that commonly occurring frameshift mutations that generate NMD-resistant transcripts are a promising source of targetable tumour antigens in MSI-High colon cancers. A set of commonly expressed mutant proteins has the potential to form the basis for a multivalent vaccine or immunotherapeutic targeting MSI-High tumours.

MSI-High colon cancer cell lines harbor frameshift mutations in multiple cMS
Screening of the colon cancer cells lines for aberrant expression DNA mismatch repair proteins revealed an abnormal phenotype consistent with MSI in 11 of 17 cell lines ( Table 1). The cell lines were assessed for frameshift mutations in a selection of genes with cMS previously reported to develop mutations in MSI-High colon cancers. Numerous mutations consistent with an MSI-High phenotype were detected in each cell line with aberrant DNA mismatch repair expression by immunohistochemistry, but few mutations if any were detected in the remaining cell lines ( Table 2,  Table S1). Deletion of 1 DNA base pair was the most common mutation, accounting for 82% (214/262) of the mutations observed in this study (probable biallelic mutations counted as 2 mutations). Most frameshift mutations encode for truncated proteins of lower molecular weight with a mutated C-terminus (Table 3).

Detectable MSI-derived mutant proteins are expressed by NMD-resistant transcripts
To detect expressed mutant proteins, we purchased antibodies to the conserved N-terminal portion of selected proteins. We initially studied a selection of commonly mutated genes reported to be potential ''target genes'' [1]. Western blot experiments on whole cell lysates and immunoprecipitates of the genotyped cell lines identified protein bands corresponding to the expected molecular weight for normal BAX (Figure 1), MSH3 and CASP5 proteins (unpublished data) in cell lines with non-mutated alleles of these genes. However, no mutant protein bands were detected for any of these genes. In the case of BAX we generated polyclonal rabbit antibodies to a synthetic peptide corresponding to the mutated C-terminus. These antibodies showed specific binding to synthetic mutant peptide in Biacore, ELISA and Western blot assays, but did not detect a mutant BAX protein band in whole cell lysates or immunoprecipitates of BAX mutated cell lines (unpublished data). Mutations in BAX, MSH3 and CASP5 all encode for NMD-sensitive (NMD-S) mutant transcripts; mutant BAX and MSH3 are known to be susceptible to degradation [27]. NMD-sensitivity could account for our inability to detect any mutant proteins arising from these three genes, though abnormal protein folding with loss of the antibody epitopes might also be a factor. We therefore sought to detect mutant proteins arising from NMD-resistant (NMD-R) transcripts. Previously published Western blot data showed potential mutant protein bands of the CREBBP and EP300 genes, each arising from a heterozygous (21, wild type) frameshift mutation of a 5 mer cMS repeats in the last exon, detected in the LoVo (CREBBP) and HCT116 (EP300) cell lines respectively [22]. We confirmed the presence of these mutations in LoVo and HCT116 cell lines by PCR and DNA sequencing. These cMS were not mutated in any of the other MSI-High or MSS cell lines. Bands corresponding to the expected molecular weights of the normal CREBBP and EP300 protein were consistently detected in Western blot analyses of nonmutated cell lines. We detected abnormal protein bands corresponding to the expected molecular weights of mutant CREBBP and mutant EP300 in the LoVo and HCT116 cell lines respectively in Western Blots on whole cell lysates ( Figure 2) and on immunoprecipitates ( Figure S1). Our blot matched the previously published data for EP300 [22]. The prior CREBBP data matched a region of non-specific bands in our blot, but protein bands of appropriate molecular weight were also present in our blots. The CREBBP bands were also detected in experiments where one antibody was used for an immunoprecipitation and an antibody to a different CREBBP epitope was used for detection.

Validation of the mutant proteins
To confirm the identity of the abnormal protein bands and prove that these represent genuine examples of the expected mutant proteins, we initially performed tandem mass spectrometry on immunoprecipitates of the cell lines. This technique confirmed the identity of the normal CREBBP and EP300 protein bands ( Figure S1), thus validating the specificity of the antibodies used. The mutant proteins were not detected by this method, possibly due to co-immunoprecipitation of more abundant proteins of similar molecular weight such as MYH9 (unpublished data). The specificity of the mutant protein bands was instead confirmed by using small interfering RNA (siRNA) to perform gene specific knockdowns of the mutant CREBBP and EP300 proteins. Decreased expression of the mutant CREBBP protein relative to untransfected and non-targeted controls correlated with a significant decrease in targeted mRNA, confirmed by qRT-PCR ( Figure 3). Knockdown of the mutant EP300 protein was similarly demonstrated in immunoprecipitates ( Figure 4). These results confirm that the abnormal protein bands detected are mutated variants of the CREBBP and EP300 proteins. This finding indicates that certain MSI-derived mutant proteins are naturally expressed at detectable levels.

Detection of additional potential mutant protein bands
In further Western Blot experiments ( Figure 5), mutant TTK protein bands were detected in multiple mutated cell lines at slightly higher molecular weight (101 kDa) than non-mutated TTK protein (97 kDa), consistent with the predicted size (Table 3). Attempts at siRNA knockdown of mutant TTK were hindered by difficulty in resolving the mutant and normal protein bands. An AIM2 band was detected at about 46 kDa in Western blots on immunoprecipitates from whole cell lysates, consistent the size detected in previous studies using this antibody [28]. AIM2 protein was detected in multiple non-mutated cell lines and also in homozygous AIM2 mutant LoVo cells. The mutant and wildtype AIM2 proteins are of similar predicted molecular weights (39 kDa and 40 kDa respectively). Because NMDresistant transcripts often have a cMS in the last exon near the C-terminus, mutant proteins will often be of similar size to normal proteins. This makes validation of these mutant protein bands by techniques used to confirm the CREBBP and EP300 bands more difficult. Mutant protein bands were not detected for all NMD-resistant transcripts assessed. In the case of ACVR2A, protein bands were detected at the expected molecular weights for both normal and mutant protein, but in a non-specific manner that did not correlate with the mutation data. Multiple protein bands were detected in blots for TCF7L2, a gene with numerous alternatively spliced transcripts [29], but no mutant proteins were identified that correlated with the mutation data.

Genome-wide search for NMD-resistant transcripts as a source of targetable proteins
Having established that NMD-resistant transcripts can give rise to expressed MSI-derived mutant proteins, we sought to identify additional cMS-containing genes likely to generate expressed mutant proteins in MSI-High colon cancers that could serve as therapeutic targets. Using a bioinformatic strategy, we performed a genome-wide search for cMS-containing genes based on criteria predicted to encode for NMD-irrelevant mutant transcripts [24], which are intrinsically NMD-resistant (Table S2). We identified 1511 non-redundant mononucleotide cMS repeats $7 mer in length (including hypothetical genes) predicted to generate NMDresistant mutant transcripts. This compares with 17,654 mononucleotide cMS ($6 mer) identified (irrespective of NMD status) in a previous study [17].
Genes were selected for cMS mutation analysis based upon selection criteria likely to be important to utility as a targetable protein. We limited the study to the 316 mononucleotide cMS repeats $8 mer in length, since there is a correlation between microsatellite length and likely mutation frequency. Significant mutation rates (.40%) have been reported for some 8 mer repeats [16]. Genes with short predicted neo-C-terminal protein sequences (,10 amino acids) were excluded, as these are unlikely to generate useful T cell epitopes for a range of common HLA types, as is required to overcome MHC restriction. Excluding hypothetical genes (prefix LOC or KIAA), 179 cMS-containing genes satisfied our selection criteria (Table S3). Evidence for protein expression in colon cancers of the shortlisted genes was also sought as an additional selection criterion to indicate likely expression of the mutant proteins. The Human Protein Atlas [30] includes data showing protein expression in colorectal cancers by immunohistochemistry. However, such data is unavailable for many genes, due to dependance of IHC upon the availability of suitable antibodies. Therefore, gene expression profiling data was also used to identify genes with the highest mRNA expression levels based upon data from two studies [31,32] of MSI and MSS colon cancers ( Figure S2). Selected genes (Table 4) were screened for cMS mutations in a panel of 5 MSI and 1 MSS cell lines and a subsequent panel of 6 MSI cell lines was tested if the initial screen revealed any cMS mutations. Three genes (SFRS12IP1, MED8 and ASXL1) were mutated in 6/11 (55%) of MSI-High cell lines and two genes (FBXL3 and RGS12) were mutated in 5/11 (45%). Several genes with lesser mutation rates were also identified (Table S4).

In silico epitope analysis for NMD-resistant transcript derived mutant proteins
To be useful as therapeutic vaccine/immunotherapy targets, the mutant proteins identified need to generate CTL epitopes for common HLA types. In silico analysis using the publicly available software such as SYFPEITHI has been previously identified new CTL epitopes that were subsequently validated in functional assays [26]. To identify which MSI-derived mutant proteins are likely to be immunogenic, we assessed mutant proteins arising from NMD-resistant transcripts for potential HLA-A*0201 epitopes. HLA-A*0201 is the most common HLA type, present in about 50% of the population [33]. SYFPEITHI scores are based upon binding motifs common to known naturally occurring epitopes. According to the SYFPEITHI scoring guidelines, a naturally expressed epitope should be amongst the top 2% of all peptide scores assessed in 80% of cases [26]. We assessed mutant C-termini arising from common NMD-R mutations (Table S5) and the top 2% of HLA-A*0201 SYFPEITHI binding scores generated from the mutant proteins were determined ( Table 5). The top-ranking peptides had SYFPEITHI binding scores ranging from 23-31. These scores are comparable to the binding scores of 135 known HLA-A*0201 CTL cancer epitopes [26], including the 4 published MSI-derived epitopes, which have a median score of 24 and a mean of 23.4 (Table S6).
Based upon available reported cMS mutation rates for EPHB2, TTK and RGS12 (C8), a vaccine including these three mutant proteins will include at least one probable HLA-A*0201 epitope for 69% of patients (12(1241%)x(1227%)x(1228%)). Further HLA-A*0201 epitopes were identified for several genes (TMEM97, ASXL1, SFRS12IP1, RGS12 (A9), RTKN2) with common mutations in this study, indicating that also targeting these genes could broaden HLA-A*0201 coverage. Some rare mutations (e.g. NTAN1, RAPH1) also produce immunogenic epitopes that would not provide broad population coverage, but may also be worth targeting in selected patients.

Discussion
Microsatellite instability gives rise to predictable frameshift mutations involving the cMS of many genes. Mutant proteins arising from these mutations are a logical source of tumour-specific antigens that could account for both the presence of increased TILs and the better prognosis associated with an MSI-High phenotype. This concept is supported by a recent finding of a correlation between immune infiltrate density and the number of cMS frameshift mutations detected in a set of MSI-High colon cancers [34]. We have validated at least two examples (CREBBP and EP300) of naturally expressed mutant proteins in MSI-High colon cancer cell lines, both arising from NMD-resistant transcripts. These findings confirm that NMD-resistant transcripts can generate naturally expressed MSI-derived mutant proteins [34,35]. The cMS repeats in CREBBP and EP300 are short and unlikely to be mutated in a high proportion of MSI-High tumours. However, it is likely that some common NMD-resistant mutations such as TTK will also generate targetable mutant proteins in MSI-High cancers. Our genome-wide strategy to identify immunogenic NMD-resistant transcripts has identified five novel cMS mutations (SFRS12IP1, MED8, ASXL1, FBXL3, RGS12) in at least 45% of eleven MSI-High colon cancer cell lines tested. Mutation analysis in a larger series of primary MSI-High colon cancers will enable the mutation frequencies of these potential therapeutic targets to be more accurately determined.
In vitro studies have identified CTL responses to synthetic mutant peptides arising from common cMS mutations, using peripheral blood mononuclear cells from patients with MSI colon cancers [10,11,12,13,14,36] and intriguingly, also in healthy individuals with HNPCC [15]. These studies have shown that tumour-specific mutant proteins can stimulate in vitro T cell immunity and support the concept of a vaccination strategy. The published HLA-A*0201 CTL epitopes for TGFBR2 [10,11], CASP5 [12], OGT [13] and U79260(FTO) [14] all arise from NMD-sensitive mutant transcripts. NMD-inhibition experiments have shown that mutant TGFBR2 is one of the most upregulated transcripts, with 10-fold increased mRNA expression in one study [27,37]. The T cell responses conflict with the expected impact of NMD upon mutant protein expression. A transfection study showed that mutant TGFBR2 protein is detectable in transfectants of mutant cDNA (NMD-resistant), but not mutant gDNA (NMD-sensitive) [35]. We did not detect mutant proteins from NMD-sensitive transcripts (BAX, CASP5 and MSH3). However, NMD is a translation-dependant process [24] and lowlevel mutant protein expression from the initial round of translation may be sufficient to stimulate immune responses. CTL can show exquisite sensitivity to a specific peptide/HLA ligand, with as few as 3 target peptide/MHC complexes being sufficient for CTL-specific cell lysis in vitro [38]. Humoral responses to mutant TGFBR2 protein have been identified by ELISA, but only in a minority (10%) of MSI-High colon cancer patients [39]. A benefit of NMD-resistant transcripts as a source of targetable mutant proteins is that stably expressed tumour antigens can be cross-presented to the immune system by dendritic cells to stimulate CTL and T helper cell responses [40]. Cross-presentation is likely to result in superior anti-tumour immunity and NMD-resistant transcripts as a source of targetable mutant proteins. A fusion-gene assay reported a significant correlation between the expression of transfected MSI-derived mutant proteins and cross-presentation of co-expressed ovalbumin antigen [18]. For genes with negligible detectable protein, some moderate to strong CTL responses were detected, but without cross-presentation [18]. A limitation of the transfection assay is that mutant cDNA constructs were used and therefore the experiments did not evaluate the impact of NMD upon mutant protein expression. Our approach overcomes this through direct detection of mutant protein expression. A modified transfection assay requiring natural splicing of the C-terminus could also identify biologically relevant vaccination targets.
NMD-inhibition experiments have identified some ''NMDescape'' transcripts which partially resist degradation due to NMD [27],. MBD4 is predicted generate an NMD-sensitive transcript, but a faint abnormal protein band has been reported in Western blot analyses of MBD4 mutated cell lines. This may represent mutant MBD4 protein expression due to NMDescape [23]. NMD-escape transcripts may be a useful source of protein targets, but these are unable be predicted and further work is needed to determine which NMD-escape transcripts produce targetable mutant proteins. Strategies that interfere with NMD without inhibiting protein translation should cause a generalised upregulation in NMD-sensitive mutant transcripts, unmasking a range of tumour antigens. This approach could complement a vaccine targeting MSI-High tumours by broadening the range of available immune targets beyond NMD-resistant transcripts. NMD-inhibition by siRNA targeting of NMD regulatory genes SMG1 or UPF2 has recently been shown capable of inducing  protective in vivo immune responses to tumour cells transfected with an NMD-sensitive target antigen [41].
Our confirmation of mutant protein expression by MSI-High tumours may have important clinical applications. A vaccine targeting commonly expressed mutant proteins has been proposed as a promising strategy for the treatment of MSI-High tumours [15,18,34,42]. With sufficient targets, vaccination may be feasible provided that the mutant proteins are expressed, antigenic epitopes are generated and antigen presentation is functional. Further work is needed to determine which mutant protein immune responses can confer a survival benefit to patients with MSI-High tumours. Mutant proteins secreted into the serum could serve as useful biomarkers for the early detection of MSI-High tumours, particularly in the HNPCC affected population. Such biomarkers would be a useful adjunct to colonoscopic screening, since MSI-High colorectal cancers have a propensity for interval adenomas and tumours developing in the period between screening colonoscopies [43,44]. The common, predictable and tumour-specific nature of MSI-derived mutant proteins makes them ideal candidates as therapeutic targets or diagnostic markers.

Cell lines and culture
LIM colorectal cancer cell lines were established from tumour biopsies at the Ludwig Institute for Cancer Research (LICR) Melbourne Branch as described previously: LIM1215 [45], LIM1899 [46], LIM1863 [47], LIM2463 [48] and (LIM2099, LIM2405, LIM2408, LIM2537, LIM2550 and LIM2551) [49]. The LIM cell lines were established from both sporadic and familial colorectal cancers [49] and can be obtained from CellBank Australia (www.cellbankaustralia.com) or from the LICR Melbourne Branch on completion of a standard academic materials transfer agreement. SW1222 colorectal cancer cells were provided by the Ludwig Institute for Cancer Research, New York Branch, (New York, NY) and HCA7 colorectal cancer cells were obtained from the European Collection of Cell Cultures (Porton Down, UK). HCT116, LS174T, SW480, LoVo and HT29 colorectal  cancer cell lines and HeLa cervical cancer cells were obtained from the American Type Culture Collection (Rockville, MD). Cells were cultured at 37uC and 5% CO 2 in RPMI 1640 media (Invitrogen) supplemented with 10% foetal calf serum (FCS), 8 U/ ml insulin, 0.5 mg/ml hydrocortisone, 1610 22 mg/ml thioglycerol, 20 mg/mL streptomycin sulfate and 12 mg/mL benzylpenicilin or in DMEM media (Invitrogen) supplemented with 10% FCS, 20 mg/mL streptomycin sulfate and 12 mg/mL benzylpenicilin.

Assessment for MSI by immunohistochemistry (IHC)
Cell blocks of each cell line were prepared from 5610 7 cells, rinsed and resuspended in 5 mL PBS, centrifuged into a pellet, fixed in 10% neutral buffered formalin and embedded in paraffin. Sections 3 mm thick were prepared using Superfrost Plus Adhesion slides (Thermo Scientific). IHC for DNA mismatch repair proteins was performed on the sections using Vision Biosystems Bond-Max automated IHC stainer, with onboard antigen retrieval using Bond Epitope Retrieval Solution 2 Table 4. NMD-R C-terminal peptide sequences from genome-wide search selected for mutation analysis.    OGT SLYKFSPFPL 23 [13] CASP5 FLIIWQNTM 20 [12] TGFBR2 RLSSCVPVA 19 [10,11] doi:10.1371/journal.pone.0016012.t005

Mutation analysis of cMS
Cell lines were tested for frameshift mutations in a selection of cMS reported to be mutated in MSI-High colon cancers by DNA sequencing (TGFBR2, BAX, CASP5, TCF7L2, TAF1B, MSH3,  IGF2R, CREBBP, EP300, MARCKS, AIM2, ACVR2) or by  fragment analysis (ASTE1, PTHLH, C4orf6, EPHB2, TFE3,  TTK, PA2G4, KCNC1, SP100, FBXL3, MED8, RGS12, CIR). Genomic DNA was extracted from the cell lines with Qiagen DNeasy Blood & Tissue Mini Kit (Qiagen) using standard protocols. PCR primers ( Figure S7) were designed to flank the cMS of interest, yielding amplimers 100-800 base pairs in length (for DNA sequencing) or 90-200 base pairs in length with FAM tags located at the 59 end of the 59 oligonucleotides (for fragment analysis). PCR reactions were performed in 25 mL total volume, with 100 ng gDNA, 2.5 mL 106 reaction buffer, 1 mM MgCl 2 , 100 mM dNTPs, 200 mM of each primer and 0.5 U Taq DNA polymerase, using the following standard conditions: initial denaturation at 94uC for 3 min, 35 cycles (denaturation at 94uC for 30 sec, annealing at 58uC for 30 sec and primer extension at 72uC for 1 min). A final extension step was performed at 72uC for 10 min. PCR fragments were resolved by agarose gel electrophoresis with ethidium bromide and visualised with ultraviolet light. Gel extractions were performed to purify PCR fragments (Qiagen) prior to DNA sequencing, using 5 ng DNA per 100 base pairs of template and 3.2 pmol of forward or reverse primer in 10 ml ddH2O (Wellcome Trust Sequencing Centre, Monash Medical Centre, Victoria, Australia). DNA sequencing data was analysed using Chromas Version 2.33 (Technelysium) and assessed for cMS frameshift mutations. DNA sequencing data from MSS cell lines were used as controls to indicate the degree of PCR slippage artefact affected the normal cMS sequences. For fragment analysis, PCR products were diluted based upon assessment of the intensity of the bands detected in the DNA agarose gel, loaded into 96 well plates with 2 mL sample, 9.75 mL formamide and 0.25 mL LIZ size standard per well for analysis. Samples were denatured at 95uC for 3 minutes, incubated on ice for 1 minute and loaded into a sandwich for analysis. Data was analysed using Gene Mapper 4.0 software (Applied Biosystems). Samples from MSS cell lines were used as controls to indicate the normal fragment size and pattern of non-mutated cMS.

Prediction of mutant proteins arising from frameshift mutations
Genetic sequences were accessed from Ensembl [50] (latest release 58 -May 2010) and transcripts were assessed for the presence of the described cMS. Normal protein sequences and mutant protein sequences expected to arise from a frameshift mutation of 21 DNA base pair were determined for each cMScontaining transcript using ExPasy Translate Tool and the predicted molecular weight of the normal and mutant proteins were calculated with the ExPasy Compute pI/Mw Tool (Swiss Institute of Bioinformatics) [51].

RNA interference (RNAi) assays
Transient transfectants of cell lines were prepared using individual or SMARTpool preparations of small interfering RNA (siRNA) (Thermo Fisher Scientific), targeting the genes of interest. Transfectants of non-targeting (scrambled) siRNA and no siRNA (PBS only) were used as negative controls. Lipofectamine (7.5 ml) incubated with 250 ml OPTI-MEM media (Invitrogen) per transfection for 7 min was combined with siRNA (50 pmol in 250 ml OPTI-MEM per transfection) for 20 min at RT and added to cells pre-plated in 2 mL antibiotic-free OPTI-MEM media (6 well plates, LoVo 15610 ' 5/well, HCT116 10610 ' 5/well). After incubation overnight the siRNA-containing media was removed and replenished. At 24 hours post-transfection cell samples were split, one-third into fresh 6 well plates with 2 mL media/well for Western blot analysis and the remainder into 10 cm dishes and resuspended in 10 mL media/dish for qRT-PCR analysis. At 72 hours cells were harvested, counted using Countess Automated cell counter (Invitrogen) and lysed. Samples for Western blot analyses were stored at 280uC until ready for analysis. Specificity of candidate normal and mutant protein bands in western blots was confirmed by detection of decreased protein band intensity in lysates of cells transfected with siRNA targeting the gene of interest relative to negative controls. Blots were probed for a non-targeted gene (b-catenin) as a loading control to enable the relative ''knockdown'' to be calculated.
Extracts of total RNA for qRT-PCR were prepared from transfectants using RNAspin Mini RNA isolation kit (GE Healthcare) using standard protocols at room temperature. RNA concentration and quality (A260/280 ratio) was assessed with a NanoDrop ND-100 Spectrophotometer (Thermo Scientific). RNA samples were stored at 280uC until ready for analysis. Samples were digested using RNase free DNase and converted to cDNA with an RT kit (Applied Biosystems). Residual RNA was digested with RNAase. Serially diluted cDNA samples were analysed in duplicate by qRT-PCR with an ABI 7300 Real Time PCR instrument (Applied Biosystems), using Taqman (ABI) Master Mix, primers/probes for the gene of interest and for a control gene (GAPDH). No primer/probe, no RT enzyme and no cDNA negative controls were included and a standard curve was generated from the data. Fold-change in transfectant mRNA expression levels relative to untransfected cells was evaluated using the comparative C T (DDC T ) method, expressed as a range incorporating the standard deviation of DDC T . Gene-specific knockdown in siRNA transfectants was confirmed by detecting a significant decrease in mRNA levels of transfectants relative to untransfected cells.

Nano-Liquid Chromatography-Mass Spectrometric Analysis
Immunoprecipitates of cell lysates (5610 8 ) were produced lysed and size fractionated by SDS-PAGE as described above. Gels for mass spectrometry analyses were stained (1 hr) with Coomassie R-250 (Pierce) and de-stained (.1 hr) in distilled water. Gel sections (1.5 mm) were excised with a scalpel, extensively washed in deionized water and digested with trypsin (0.25 mg) using a robotic workstation (MassPrep, Micromass). Tryptic digests were concen-trated to ,10 mL by centrifugal lyophilization (Savant) in preparation for Electrospray-Ion Trap (ESI-IT) tandem mass spectrometry (MS/MS) (LCQ-Deca, Finnigan). Digests (,10 mL in 1% (v/v) formic acid) were transferred into 100 mL glass autosampler vials for injection and fractionation by nano-reversedphase-HPLC (Agilent Model 1100 capillary HPLC) using a 15060.15 mm I.D. RP-capillary column (Reprasil-Pur, C18-AQ, 3 mm, Dr Maisch, GmbH), linear 60 min gradient from 0%-100% Solvent B, flow rate of 0.8 mL/min at 45uC. Solvent A was 0.1% (v/v) aqueous formic acid and Solvent B was 0.1% aqueous formic acid/60% (v/v) acetonitrile. The capillary HPLC was coupled on-line to an ESI-IT mass spectrometer for automated MS/MS analysis of individually isolated peptide ions. Raw MS/ MS spectra were extracted using Extract-MSN in Bioworks 3.1 (Finnigan, San Jose, U.S.A.). Parameters used to create the peak lists were: minimum mass 400; maximum mass 5000; grouping tolerance 1.5; intermediate scans 1; minimum group count 1; automated calculation of charge state; 30 peaks minimum per spectrum; peptide charge states 1+, 2+ or higher; 62 Da peptide mass tolerance; 60.5 Da MS/MS fragment mass tolerance. Parent ion masses were determined based upon the isotope cluster spacing in the zoom scan spectrum and individual spectra files (.dta file extension) were generated. Acquired MS/MS spectra were searched against the non-redundant protein subset database (Ludwig NR_subset version_Q407) using the Mascot TM search algorithm (v2.1, Matrix Science, U.K.). Searches were conducted with carboxymethylation of cysteine as a fixed modification (+58 Da), variable oxidation of methionine (616 Da) and an allowance for up to 3 missed tryptic cleavages. Results containing possible peptide identifications by the Mascot algorithm were validated manually.

Bioinformatic search for NMD-resistant mutant transcripts
Coding microsatellite (cMS) containing sequences likely to be NMD-resistant upon mutation were identified using a computer program written in the Python language (http://www.python.org) using the BioPython library of Python functions (http://biopython. org). Matched human genomic DNA, mRNA and coding DNA sequence (CDS) gene information was obtained from the RefSeq database (NCBI build 37.1; ftp://ftp.ncbi.nih.gov/genomes/H_ sapiens/). The Python program processes each CDS entry, determines the coding DNA sequence and translates this sequence into a predicted amino acid sequence. Potentially erroneous CDS entries are discarded if they generate sequences lacking an initial methionine residue or a terminal stop codon. CDS entries generating sequences with a premature stop codon are also discarded. Retained CDS are checked for the presence of mononucleotide cMSs at least six bases in length (e.g. AAAAAA). One or two bases are deleted from the mononucleotide cMS and translated to generate amino acid sequences for the mutant proteins. If a premature stop codon in the frameshift mutated sequence is .55 bases upstream of the last exon-exon junction, this cMS is excluded from further consideration, as the mutant mRNA is likely to be NMD-sensitive. If a premature stop codon is in the last exon or within 55 bases of the last exon-exon junction, the cMS is retained, since the mutant mRNA is likely to be NMD-resistant. If no premature stop codon is generated, then frameshifted DNA sequence of the non-coding 39 end of the corresponding mRNA entry is assessed for the presence of a stop codon. DNA sequences lacking a stop codon are discarded, since the entire sequence is unable to be characterized. Predicted mutant DNA and amino acid sequences are characterized by length and site of the cMS repeat, type of deletion (one or two repeat units) and the length, molecular weights and sequences of the normal and corresponding mutant proteins. The presence of any additional cMS in the genetic sequences is also identified to detect any lengthy 59 cMS that may be susceptible to mutation and prevent some NMD-resistant transcripts from being generated. The program also assesses genes for dinucleotide cMS repeats using a similar algorithm.

Identification of new target genes
Genes from the genome-wide search were prioritized for study based upon multiple criteria including susceptibility to frequent frameshift mutation, likely antigenicity and likely protein expression in colon cancer. A minimum mononucleotide cMS length of eight nucleotides was chosen, since mutation rates in our desired range (.30%-50%) have been reported for cMS of this length in some genes, but not for shorter cMS [16]. Genes with ,10 amino acids of novel protein sequence were excluded, since these sequences are unlikely to generate a range of useful CTL epitopes. Immunohistochemical data regarding protein expression in colon cancer is available for some genes in the Human Protein Atlas [30]. In the absence of such data for many genes, gene expression microarray data were also used to assess which genes are highly expressed in MSI colorectal adenocarcinomas. Using two gene expression data sets that compared MSI with MSS colon cancers using the same Affymatrix chip (PMID: 17047040, GEO entry GSE4554 [31], PMID: 19088021, GEO entry GSE13294 [32]), we ranked shortlisted genes (NMDresistant transcripts, $8 mer cMS, $10 amino acids of novel mutant C-terminus) by median gene expression values. Genes with either moderate to strong protein expression in colon cancer in the Human Protein Atlas resource or with median values in the top 50 scores by either gene expression study were considered for cMS mutation analysis. Gene expression data does not necessarily indicate relative expression between different proteins, since factors such as hybridization efficiency influence the microarray expression read-out (evident in the variability in expression values between different probes for the same gene). However, there is some correlation between gene expression and protein expression and it is likely that high gene expression will correlate with protein expression.
To identify new cMS mutations, PCR primers were designed to amplify cMS-containing regions of the selected genes (Supplementary Data 1). PCR and DNA sequencing was performed using protocols described above. Five MSI-High colon cancer cell lines (HCT116, LoVo, LIM1215, LS174T and HCA7) and one MSS cell line (SW480) were initially screened for frameshift mutations in the selected genes. Genes bearing mutations in this initial screen were also tested for mutations in the remaining 6 MSI colon cancer cell lines.

In silico T cell epitope prediction
Mutant proteins predicted to be generated from common cMS mutations were screened for likely HLA-A*0201 CTL epitopes, using the publicly available predictive software, SYF-PEITHI [26]. This program uses a scoring system based upon the binding motifs of known CTL epitopes to generate binding scores that rank all potential nonamer and decamer peptides generated from the input protein sequences in order of likely binding to the HLA-A*0201 molecule. Peptides with scores in the top 2% of peptides were identified as likely CTL epitopes, the scores of which were validated by comparison with score of 133 known HLA-A*0201 CTL epitopes listed on the SYFPEITHI website [26] and 2 MSI-derived HLA-A*0201 epitopes [13,14] not listed at this site. Figure S1 Proteomic analysis for CREBBP and EP300. (A) Immunoprecipitates of SW480 (wild type CREBBP) and LoVo (21, wt) cell lines were prepared using anti-CREBBP antibody and (B) immunoprecipitates of LIM1215 (wild type EP300) and HCT116 (21, wt) cell lines were prepared using anti-EP300 antibody. Tryptic digests were prepared from the region of the gel including the suspected CREBBP and EP300 bands. Tandem mass spectrometry revealed multiple peptide sequences corresponding to CREBBP in the SW480 cell line and EP300 in the LIM1215 cell line, confirming the specificity of the wild type protein band. Mutant protein bands were not confirmed by this technique, possibly due to the presence of more abundant coimmunoprecipitated proteins such as MYH9 (data not shown). (DOCX) Figure S2 Gene expression to prioritise target gene selection. NMD-R transcripts with cMS $8 mer and mutant Cterminus $30 a.a. and CREBBP were assessed using colorectal cancer gene expression data from two studies. Boxplots of top 50 gene expression levels shown, with interquartile range above nonspecific background (set at 6). High gene expression was used to indicate likely protein expression. (DOC)