• Loading metrics

Evolution of the nonsense-mediated decay pathway is associated with decreased cytolytic immune infiltration

  • Boyang Zhao ,

    Roles Data curation, Formal analysis, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing (BZ); (JRP)

    Affiliations Department of Biomedical Engineering, College of Engineering, The Pennsylvania State University, University Park, Pennsylvania, United States of America, Quantalarity Research Group LLC, Houston, Texas, United States of America

  • Justin R. Pritchard

    Roles Conceptualization, Formal analysis, Funding acquisition, Investigation, Resources, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing (BZ); (JRP)

    Affiliation Department of Biomedical Engineering, College of Engineering, The Pennsylvania State University, University Park, Pennsylvania, United States of America

Evolution of the nonsense-mediated decay pathway is associated with decreased cytolytic immune infiltration

  • Boyang Zhao, 
  • Justin R. Pritchard


The somatic co-evolution of tumors and the cellular immune responses that combat them drives the diversity of immune-tumor interactions. This includes tumor mutations that generate neo-antigenic epitopes that elicit cytotoxic T-cell activity and subsequent pressure to select for genetic loss of antigen presentation. Most studies have focused on how tumor missense mutations can drive tumor immunity, but frameshift mutations have the potential to create far greater antigenic diversity. However, expression of this antigenic diversity is potentially regulated by Nonsense Mediated Decay (NMD) and NMD has been shown to be of variable efficiency in cancers. Here we studied how mutational changes influence global NMD and cytolytic immune responses. Using TCGA datasets, we derived novel patient-level metrics of ‘NMD burden’ and interrogated how different mutation and most importantly NMD burdens influence cytolytic activity using machine learning models and survival outcomes. We find that NMD is a significant and independent predictor of immune cytolytic activity. Different indications exhibited varying dependence on NMD and mutation burden features. We also observed significant co-alteration of genes in the NMD pathway, with a global increase in NMD efficiency in patients with NMD co-alterations. Finally, NMD burden also stratified patient survival in multivariate regression models in subset of cancer types. Our work suggests that beyond selecting for mutations that elicit NMD in tumor suppressors, tumor evolution may react to the selective pressure generated by inflammation to globally enhance NMD through coordinated amplification and/or mutation.

Author summary

The interactions between cancers and their immune microenvironments drive a co-evolutionary process. This can start with the mutational changes that generate neo-antigenic epitopes, leading to the activation of a cytotoxic T-cell response. Activated and proliferating T cells can search out and destroy tumor cells. Then, increased killing of the tumor can lead to changes such as increased expression of PD-L1 and/or loss of function mutations in MHC Class I proteins that help a tumor evade immune surveillance. Frameshift mutations in particular have been found to generate highly antigenic epitopes. However, they can also activate nonsense mediated decay (NMD), leading to the suppression of such epitopes. This presents a paradox in how frameshift mutations and NMD interact in eliciting/evading immune surveillance during the course of tumor evolution. Here, we identify a new example of conditional selection, whereby tumors can coordinately mutate and/or amplify genes in the NMD pathway. We observed significant co-occurrence of mutations and copy number alterations among the NMD pathway genes. The co-alterations within the NMD pathway increase the global NMD efficiency and are associated with lower cytolytic activity, and in some cases, worse overall survival.


The co-evolutionary arms race between cancer and the immune response can drive tumor evolution. Tumors with high levels of clonal neoantigens have higher levels of T-cell infiltration [1], and higher response rates to immunotherapies [13]. High levels of immune infiltration are also associated with loss of function mutations in Class I MHC proteins [4], suggesting that the inflammation caused by T-cell-tumor recognition can result in selective pressure to lose T-cell tumor interactions. Many of the same variables that have been associated with immune infiltration in untreated tumors have also been associated with therapeutic response to checkpoint inhibitors [59]. Thus, exploring the predictors of inflammation and survival in the public TCGA datasets is an important source of hypotheses about immuno-therapeutic responses in human tumors, and gives us a window into the process of co-evolution between tumors and the immune system that shapes the immune ecology of the tumor and its microenvironment.

Only a minority of patients across multiple cancer types have been shown to be sensitive to single agent immunotherapy [1013]. This has prompted the rapid clinical development of anti-PD-1 antibodies alongside biomarkers in diverse patient populations and in combination with a variety of established and experimental therapeutics. Notably, combining checkpoint inhibitors with patient stratification has led to the approval of Pembrolizumab (an anti-PD-1 monoclonal antibody) in previously untreated NSCLC patients with PD-L1 positive tumors [14]. However, like targeted therapy, only a minority of NSCLC patients are PD-L1 positive. Unlike targeted therapy, patients that are PD-L1 negative have non-trivial response rates to immunotherapy.

Beyond PD-L1 positivity, other factors such as tumor mutational burden, microsatellite instability, oncogenic viruses have also been associated with therapeutic response and tumor immune infiltrates [3,4,15]. The rationale for their utility is that increased antigenic burden creates a specific T-cell response. Neoantigen burden due to non-synonymous substitutions has been clearly associated with immunotherapeutic success, cytolytic activity, and overall survival [1,2,16]. This link has also been demonstrated in a prospective clinical trial with nivolumab plus ipilimumab. In this randomized trial, the authors demonstrated that stage IV or recurrent NSCLC (not previously treated with chemotherapy and with a tumor PD-L1 expression level of less than 1%) who have more than 10 nonsynonymous mutations per megabase have a 42.6% progression free survival at 1 year [17]. However, this recent clinical trial only examined single amino acid mutations, and many prior predictions of neoantigen burden also tended to predict neoantigens using single substitution variants [16,1820].

Interestingly, like PD-1, many patients with high neoantigen burden fail to respond to immunotherapy, and some patients with low neoantigen burden have durable responses to immunotherapy and exhibit high levels of tumor inflammation with cytolytic cells [2,21]. As such, there is a critical need to continue to understand and predict mediators of tumor immunity in humans. Recently, multiple improvements to neoantigen predictions have been made by incorporating clonality [1], indel mutations [22], and intron retention mutations [23] into studies of immune infiltration and immune response in tumors.

Specifically, frameshift mutations have been examined genomically, pre-clinically, and in patient case studies [24,25]. These revealed that mutagenic indels provide a highly immunogenic source of antigens [24]. However, deeper investigations of the different mechanisms of neoantigen generation have the potential to expand the prognostic power of clinicians to predict immunotherapy responses and indications. Moreover, better predictions of the neoantigen landscape should aid current and future efforts to develop neoantigen derived peptide vaccines. These peptide vaccines may help turn immunotherapy non-responsive tumors into immunotherapy responsive tumors.

However, the use of frameshift mutations for predictions in pan-cancer analyses raises an important question. Frameshifts were not initially considered in genomic analyses of neoantigens because they were considered to be unlikely to be expressed due to nonsense mediated decay (NMD) [1]. Frameshift mutations cause premature termination codons (PTCs), resulting in mRNAs that are the target of nonsense-mediated decay (NMD). While NMD should lead to a loss of expression of the resulting transcripts, NMD has been found to function with varying efficiency [26]. This may lead to reduced NMD and result in the expression of frameshift mutations that could be presented as neoantigens. In addition, genomic analyses of frameshift mutations and their expression suggests that NMD operates with reduced efficiency in cancer [24]. Finally, there is also evidence in preclinical models that inhibiting nonsense mediated decay can enhance tumor immunity [27]. Thus, we believe that NMD itself may also act as an independent biological filter of which indels are expressed, and thus we aimed to quantify the additional information that NMD brings to the prediction of frameshift neoantigens. We hypothesize that additional orthogonal predictive metrics of immune activation can be derived based on patient-level NMD efficiencies. Here, we examined multiple cancer indications for associations between NMD/mutational burden with cytolytic activity/survival. We find that accounting for NMD and indel mutations is significantly better than accounting for indels alone. We also find that NMD derived metrics have some independent prognostic value alongside current clinical parameters like PD-L1 expression and simple tumor mutation burden (TMB) [3,28].


NMD metrics are orthogonal predictors of tumor cytolytic activity

Recent work by Rooney et al. used the transcript levels of two cytolytic effectors GZMA and PRF1 (S1 Fig) to assess the immune cytolytic activity [4]. Here, using this measure for immune cytolytic activity, we quantitatively examined 17 cancer indications for the contribution of mutation variant counts to observed cytolytic activity (high versus low). We performed a pan-cancer analysis using a random forest model with the total counts of each mutation variant type as features. A final AUROC value of 0.59 suggest that using mutation counts does not fully explain the cytolytic activity, but they are statistically significant contributors (S2A and S2B Fig). Almost all the mutation variants are important and contribute to the model accuracy (S2C Fig). As expected, we observed that missense, nonsense, and silent mutation variants are correlated [24]. However, frameshift mutation counts are not strongly correlated with silent mutation counts, hence suggesting frameshift are an orthogonal predictor (S2D Fig).

Recent developments have suggested that frameshifts (which create very distinct neoepitopes) can improve the prediction of inflamed tumors and patient survival [24]. However, this presents a question: does patient level NMD independently associate with metrics of tumor inflammation and overall survival in a manner that is independent from indel abundance? Previous work performed an approximate correction for NMD, but, the NMD process has been shown to be complex and variable [26], and could be measured at the patient level by many metrics. For instance, the central tendency of NMD across all transcripts should give an indication of the efficiency of the process of NMD within an individual while the maximum NMD level within an individual for a specific transcript might measure the propensity for NMD to inhibit specific neoantigens. We hypothesized that to understand the role of nonsense mediated decay more deeply, we had to investigate many measures of NMD activity simultaneously. As NMD efficiency is measured at the individual gene level, while cytolytic activity is measured at the patient-level, we began by deriving multiple patient-level measures of ‘NMD burden’, using different approaches to aggregate the NMD efficiency values (Fig 1A and S3 Fig). This included a burden metric of nonsense mutations (ns), frameshift mutations (fs), and combined nonsense and frameshifts (ns+fs). We used multiple aggregated NMD metrics in order to cover our uncertainty in the relevant metric to generalize gene-level to patient-level NMD burden. We first examined the correlation among the variables, and observed that related variables (i.e. NMD related metrics, cytolytic activity metrics) tended to cluster together (Fig 1B). In addition, simple metrics of mutation abundance are positively correlated with cytolytic activity while most NMD-based metrics are negatively correlated (Fig 1B, S4 Fig). This suggests that higher NMD efficiency lowers the expression of indels and possibly neoantigens. This is consistent with NMD suppressing neoantigens in experimental models of cancer [27].

Fig 1. NMD burden as orthogonal predictors of cytolytic activity.

(A) Schematic of data processing pipeline for deriving NMD burden, incorporating TCGA datasets for CNA, exome-seq, and mRNA-seq. (B) Pan-cancer correlation among features for mutations and NMD burden. (C) Pan-cancer ROC for random forest model with mutation variant counts only (Mut), NMD burden only (NMD), or combined (Mut+NMD). (D) Out-of-bag error of overall model (black) and for predicting cytolytic activity low (red) and high (green), for combined random forest model. (E) Variable importance of the features used in the combined model, based on mean decrease in model accuracy. nmdns: NMD metric based on nonsense transcripts; nmdfs: NMD metric based on frameshift transcripts; nmdptc: NMD metric based on nonsense and frameshift transcripts; _n_decayed: number of transcripts with NMD; _frac_decayed: fraction of transcripts with NMD; _max: maximum NMD efficiency value; _med: median NMD efficiency; _mean: mean NMD efficiency; .wt: NMD efficiency metric weighted by mRNA expression.

Using our NMD features, we built pan-cancer models using mutation counts only, NMD-burden only, or a combination of the two feature groups. Surprisingly, NMD alone was as good of a predictor of pan-cancer cytolytic activity as mutations (AUROCs of ~0.6). Importantly, the combined model improved upon single data type predictors with an AUROC of 0.67 (Fig 1C and 1D). These results were significant as based on cross-validation, the AUROCs for mutation counts only, NMD-burden only, and mutation+NMD models were 0.59 ±0.03, 0.59 ± 0.03, and 0.66 ± 0.02. Thus, mutation counts and NMD-burden offer equally important orthogonal information that combines to improve our understanding of cytolytic activity in tumors (Fig 1D and 1E).

The NMD pathway is significantly and coordinately altered in cancer

A potential explanation for an association between the central tendency of NMD across all genes within a patient and cytolytic activity is that T-cell infiltration might exert selective pressure upon tumors to mutate or amplify genes in the NMD pathway. Towards this hypothesis we utilized the TCGA pan-cancer data sets. Focusing on the major genes in the NMD pathway (SMG1,5,6,7 and UPF1,2,3B) [29], we first examined whether these genes were amplified in cancer. Examining all patients in the Pan-Cancer dataset, we observed amplification of NMD pathway genes (Fig 2A) that resembled a gain of function pathway such as MAPK family members more than they did tumor suppressors such as P53 and PTEN (i.e. more amplifications than deletions were observed, S5 Fig). Beyond amplification, while none of the individual genes in the NMD pathway are predicted to be drivers of cancer, we thought it was possible that tumor evolution might select for co-occurrence of multiple individual NMD pathway mutations. Surprisingly, all 21 pairwise combinations of the 7 core NMD pathway genes exhibited a tendency towards mutational co-occurrence at a multiple hypothesis corrected p-value <0.001 (Fig 2B). The 7 genes were also co-amplified more often than one would expect by chance (corrected p-values < 0.05, Fig 2B). Only one of these pairwise mutual amplifications, SMG5-SMG7, contains 2 genes that reside in a similar genomic location on chromosome 1. Importantly, this co-alteration appears to also have a functional consequence. When we examined patients with no NMD alterations versus patients with co-alterations (i.e. >1 alteration) we observed a significant trend towards increasing NMD efficiency as NMD genes became coordinately altered (Fig 2C and S6 and S7 Figs). Moreover, patients with NMD alterations had lower cytolytic activity (S8 Fig).

Fig 2. NMD alterations co-occur and associated with improved NMD efficiency.

(A) Amplifications/deletions of genes in the NMD pathway (SMG1,5,6,7 and UPF1,2,3B) across different indications. (B) Co-occurrence of copy number and mutations of NMD genes, using the TCGA pan-cancer atlas datasets on cBioPortal. (C) NMD efficiency of patients with co-altered NMD genes versus those without any alterations. Y-values are shown as the difference in median of log10 transformed NMD metric values (co-altered versus no alterations). Each dot corresponds to a specific NMD burden metric in a given indication. Dots shown in red are statistically significant with adjusted p-value < 0.05; Mann-Whitney test with Benjamini-Hochberg multiple hypothesis correction.

Effects of NMD burden in individual cancer types

We next examined the contribution of our NMD metrics to predicting cytolytic activity in each individual indication in the TCGA. We observed varying AUROC patterns of mutations, counts, NMD burden, and combined models across the different indications (Fig 3 and S1 Table). Notably, for example, the contribution of NMD burden in GBM/LGG and LUSC toward predicting cytolytic activity were minimal while mutational counts added predictive value. On the other hand, in BLCA and LUAD, NMD burden contributed more than mutational counts toward predicting cytolytic activity. In indications COAD/READ, SKCM, HNSC, KIPAN, STAD, and UCEC, both NMD and mutational burden were important. In 6 out of 15 indications, the combined model resulted in AUROC values better than the individual models. We also randomized the input data and observed no predictive power in any of the null models (S9 Fig). Hence, the contributions of mutational counts and/or NMD burden were clearly significant. We next examined the individual prediction metrics within each feature category across the indications to infer which processes were important in which tumor type. The metric values varied in significance across all the TCGA indications (S10 Fig) with varying levels of contribution/significance toward the final model (S11 Fig). The features that contributed toward multiple indications included counts of missense, silent, and total mutations, and several of the NMD burden metrics (mostly as measured by mean or median) (S11 Fig). Most importantly, we observed that the metrics contributing toward the model are concordant with the final AUROC values of the model (e.g. indications where only NMD metrics are important based on AUROC values (Fig 3) also had only NMD metrics as important and statistically significant (S11 Fig)).

Fig 3. NMD burden improves predictivity of cytolytic activity.

Individual ROC, AUROC, and out-of-bag (OOB) error of random forest models, for (A) indications without microsatellite instability (MSI) incorporated into the model, and (B) indications with MSI incorporated into the model. AUROC data are shown as AUROC ± SE of AUROC from cross-validation.

Effects of NMD burden on survival outcomes

In addition to cytolytic activity, we examined the effects of NMD burden on survival. Here, we used the overall survival from TCGA clinical data and performed univariate Cox regression for each feature across all the indications. With multiple hypothesis correction, we observed mutational counts or NMD burden influence survival for a subset of the cancer types (BRCA, GBM/LGG, KIPAN, OV, SKCM, and UCEC) (S2 Table). We examined melanoma in particular, as it is among the cancer type with the highest tumor mutational burden (TMB) and many clinical biomarkers have been studied, with robust response to immunotherapies [30,31]. As expected, in melanoma higher cytolytic activity was associated with a better survival outcome (hazard ratio of cyt-high vs cyt-low, 0.5; 95% CI, 0.3–0.7); p-value = 0.002) (Fig 4A). A NMD burden metric was also shown to contribute to the overall survival differences (specifically nmdptc_med, which was found to be statistically significant in both univariate and multivariate models). Here a high NMD burden led to lower cytolytic activity (Fig 4B) and a worse overall survival (Fig 4C). We have also examined other known clinical biomarkers, including discretized TMB and PD-L1 levels, both of which were found to stratify the survival outcomes in subsets of cancer types (S12 Fig). Upon controlling for the covariates TMB, PD-L1, age, gender, and TNM stage, the same NMD burden metric was also found to be significant in melanoma (Fig 4D). All covariates were checked to be valid for the proportional hazards assumption based on scaled Schoenfeld residuals (S13 Fig).

Fig 4. Cytolytic activity and NMD burden stratifies overall survival outcomes.

(A) Kaplan-Meier overall survival for SKCM, stratified by cytolytic activity (categorized into low, med, and high based on quartiles). Legend shows the number of patients in each cytolytic activity level. (B) Association of feature nmdptc_med (NMD burden based on nonsense and frameshift, calculated based on median) on cytolytic activity. Statistical significance was determined using Mann-Whitney test. (C) Kaplan-Meier overall survival for SKCM, stratified by nmdptc_med (categorized into low, med, and high based on quartiles). (D) Forest plot of hazard ratios of each variable, in a multivariate Cox regression model for NMD burden (nmdptc_med), controlling for age, gender, TNM stage, tumor mutational burden (TMB), and PD-L1 levels.


A tumor’s mutational burden/neoantigen repertoire has been associated with inflammation, overall survival and therapeutic response to immunotherapies. Previous studies in the TCGA and in patients treated with checkpoint inhibitors have identified similar variables that predict immune infiltration and checkpoint response [4,8,3234]. Because frameshift neoantigen availability is hypothetically regulated by the efficiency of the NMD process, and functional evidence has suggested that inhibition of NMD can induce a tumor immune response and tumor regression in pre-clinical models by exposing neoantigens [27], we derived orthogonal patient-level metrics based on gene level NMD efficiencies [26] that improved our ability to predict tumor cytolytic activity both within and across TCGA cancer types. Tumors with co-amplifications and mutations in the NMD pathway had increased NMD efficiencies and were less likely to have immune infiltration. Beyond cytolytic activity we found that NMD metrics stratified some cancer types by distinct overall survival outcomes. These stratifications remained to be significant in melanoma after controlling for established clinical covariates PDL1 status and TMB.

In tumor suppressors, mutations arise spontaneously, and if they occur in a region that elicits NMD, the mutation can be selected for because NMD will eliminate the mRNA of the tumor suppressor. Clearly tumor evolution does not need to alter NMD to create loss of function in tumor suppressor proteins, it is simply selecting for mutations that happen to elicit NMD. However, our study suggests that at the level of a patient (and not a gene), mutations in NMD and a global increase in NMD efficiency can be selected for. The origin of the alterations in the NMD process could be due to enhanced suppression of tumor suppressor genes, or enhanced suppression of tumor neoantigens. Regardless of the causative selective pressure, the consequence is a tendency for genomic amplifications and mutations to co-occur within the key proteins in the NMD machinery (SMG 1,5,6,7, and UPF 1,2,3B). Consistent with this co-alteration, we observed that these co-occurring mutations/amplifications increased NMD efficiency in patients that had low cytolytic activity. This suggests that pan-cancer tumor evolution might select for co-alteration in the NMD pathway. The association of amplifications and not deletions as well as the observed functional increase in NMD efficiency suggests that cytolytic activity is inhibited by gain of function alterations in the NMD pathway.

The identification of correlates of tumor immune activity is a broad field with numerous potential candidates. One difficulty in interpreting these studies is that many studies find significant associations without quantifying the proportion of the data explained (and unexplained) by those variables. Thus, deciding which covariates to add, and how to quantify progress towards full prediction of cytolytic activity is difficult without the careful presentation of how much we understand versus how much we do not. Therefore, we present ROC curves and out-of-bag error estimates for all models. This quantitation allows us to clearly see that while we have identified significant predictors, our AUCs are mostly between 0.6 and 0.7, except for the pan-kidney dataset that has an AUC >0.8. This is for both previously studied mutational burden as well as our novel NMD burden. This suggests additional factors may contribute toward the full cytolytic activity observed.

A potential weakness of this study is the use of mutation burden rather than MHC class I binding predictions. However, predicting neoantigen-MHC class I binding remains challenging and prune to high false positives [35]. We chose here to instead focus on mutation burden to minimize confounders from imperfect predictions. In addition, previous studies showed that mutation burden is highly correlated with neoantigen burden [4,24], and as such likely to harbor similar information.

Beyond predictions of cytolytic activity, we quantified tumor mutation burden [36] and PD-L1 positivity across all sample and indications in the TCGA. In melanoma the effect of NMD significantly added to a multivariate Cox regression model. Though the effect was more modest after correcting for co-variates, we suggest that it is easy to examine NMD in future clinical datasets with larger cohorts to confirm the predictability based on NMD. In addition, as the survival outcomes stratified by NMD were limited to subsets of cancer types and different NMD metrics were found to be important, this highlights potentially the limitations of biomarkers such as that proposed here in being broadly applicable in a pan-cancer setting. We recommend that groups performing biomarker studies in treated and untreated patients should add metrics of NMD to attempt to understand and stratify responses across diverse cancers.

In addition to better understanding immunity and, potentially, the response to checkpoint therapy, the inhibition of NMD has been proposed as a therapeutic strategy in cancer [27]. Suppression of NMD can create a therapeutic effect through cell intrinsic (via restoring the activity of a tumor suppressor) or extrinsic mechanisms (via tumor immunity). Suppressing NMD activity has been proposed via directly inhibiting the NMD machinery (e.g. SMG1 or UPF1-SMG5/7 interactions) or modulating translation to promote PTC read-throughs [37]. Should a clinical candidate to inhibit NMD arise, the road to biomarker driven application of the cell intrinsic therapeutic effects is clear, but picking indications for the immune dependent activity requires studies like this one. Thus, we suggest that indications whose cytolytic activity is particularly well explained by NMD, or patients with alterations in the NMD pathway that increase NMD efficiencies might be interesting indications to look for cell non-autonomous immune driven efficacy of future NMD inhibitors.

Materials and methods


The following The Cancer Genome Atlas (TCGA) datasets were acquired from Board Institute GDAC Firehose repository: mRNA-seq V2 RSEM level 3, mutation calls level 3, copy number level 3, and clinical data level 1, for the following indications: BLCA (bladder urothelial carcinoma), BRCA (breast invasive carcinoma), CESC (cervical and endocervical cancers), COAD/READ (colorectal adenocarcinoma), GBM/LGG (glioma), HNSC (head and neck squamous cell carcinoma), KIPAN (pan-kidney cohort), LUAD (lung adenocarcinoma), LUSC (lung squamous cell carcinoma), OV (ovarian serous cystadenocarcinoma), PRAD (prostate adenocarcinoma), SKCM (skin cutaneous melanoma), STAD (stomach adenocarcinoma), THCA (thyroid carcinoma), and UCEC (uterine corpus endometrial carcinoma), downloaded over the period from October to December of 2017. Microsatellite instability (MSI) data was acquired from Hause et al[38]. MSI was included for analyses only for indications with at least 10 cases of MSI-H and MSS each (UCEC and STAD satisfied this criteria). Note COAD/READ also had substantial MSI-H, but the mutation calls dataset for the COAD/READ was limited (to 223 patients), thus after merging across datasets, there were too few MSI-H patients for MSI to be included for subsequent analyses.

Data preprocessing and quality controls

The datasets were further preprocessed in preparation for analyses and modeling in R. For all datasets, only one tumor sample was used per patient (filter was based on sample type code). In the case of patient tumors with multiple vials, the lowest-valued vial was used. For mRNA-seq data, the transcripts per kilobase million (TPM) value was used. TPM was calculated as scaled estimate (tau value) * 1e6. For CNA data, missing CNA were treated as zero (no CNA). For calculating NMD efficiency and burden (more details in NMD feature engineering section), a series of data cleaning and quality controls were performed (S2 Fig). PTC-bearing transcripts were excluded if they overlapped with CNA. Genes were removed from the NMD efficiency and burden calculation if the WT was noisy (CV > 0.05 or < 10 samples) or low expression (median TPM < 5).

Cytolytic activity

The cytolytic activity was calculated as the geometric mean of the expressions (in TPM+0.01) of GZMA and PRF1 [4]. The cytolytic activity values were also categorized into high/low with upper/lower quartiles.

NMD feature engineering

Metrics for nonsense-mediated decay burden was derived based on NMD efficiency values. The calculation for NMD efficiency was based on Lindeboom et al [26]. The data was preprocessed as described in the data quality controls section above. The efficiency was calculated at the gene-level, as the negative log base 2 transform of the ratio between the expression of the mutant-bearing transcript and the median mRNA expression of that transcript (calculated from samples with no mutations and copy number variations for that transcript). We accordingly derived NMD efficiencies for nonsense-, frameshift-, and nonsense/frameshift-bearing transcripts. Here it is important to clearly distinguish between NMD metrics. A measure of NMD burden is at the patient-level and is unique to this paper, while the measurement of NMD efficiency is at the gene-level and is the same as Lindeboom et al. To turn NMD efficiency measurements into estimates of NMD burden at the patient level, the NMD efficiency values were aggregated in several ways. This included calculation of the: median, mean, maximum, total number of genes with NMD, the fraction of total genes with NMD, and the median of the expression-weighted efficiency value (weighted by percentile of median WT gene expression out of all median WT gene expressions), and mean of the expression-weighted efficiency value.

Random forest model

A random forest model was used for predicting cytolytic activity (as a binary classification of low/high) based on the engineered feature set using the randomForest package in R. The random forest is a robust machine learning model that controls for overfitting internally and robust to multicollinearity. The hyperparameters for our random forest models were tuned using the caret package across the following values: number of randomly selected variables to try at each split, mtry [Round(α/2), α, 2α], where α = and N is the number of features; and number of trees to try [500, 1000]. The combination of hyperparameter values chosen for the final model was based on the model with the highest AUC. Since the overall goal of the model is to infer importance, the entire dataset with all patients was used for model building. Although random forest internally controls for overfitting, 10-fold cross-validation was also performed to check for robustness of the model. Different models were built with different subsets of feature classes (e.g. mutation, NMD burden, MSI, mutation + NMD burden, mutation + MSI, NMD burden + MSI). Pan-cancer models were also created. The individual datasets per indication were pooled together to generate the pan-cancer dataset. All metric calculations (e.g. median, etc.) were calculated at the per-indication level. Variable importance and significance were calculated using the rfPermute package. We also built null models where the dependent variable (cytolytic activity) was randomized. Ten randomized datasets were generated and random forest models (with cross-validation) were constructed on each. The final performance of the null model was reported as the mean and standard error of the AUROCs and OOB errors for each cancer type.

NMD alterations and pathway alterations

For NMD co-occurrence analyses, we used the cBioPortal [39] to examine all TCGA pan-cancer datasets (pan_cancer_atlas) on Jan 8, 2019. We searched NMD genes SMG 1,5,6,7 and USP 1,2,3B. Amplifications and mutations were searched separately. For copy number alterations of different pathways, we also searched the corresponding genes in cBioPortal. Genetic alterations of NMD and their association with NMD efficiency metrics and cytolytic activity metrics were also examined, with the metrics calculated per methods described above. For copy number variations, amplifications and deep deletions were considered using copy number cutoff of 1 and -1, respectively. NMD genes with no copy number variations (copy number value of zero) and no mutations were used as controls.

Survival analyses

Survival analyses were performed using the survival package in R. The survival function was estimated using Kaplan-Meier product limit method and its variance was estimated using Greenwood’s method. The hazard ratios were calculated using Cox proportional hazards model. In addition to univariate models, multivariate models were performed where each feature was controlled for age (categorical, < or ≥ 65), gender (categorical, male/female), TNM stage (categorical), TMB (categorical, ≤ or > median), and PD-L1 level (categorical, low/med/high with low/high based on top/bottom quartiles of TPM values). TMB was determined using total mutation counts of missense, nonstop, nonsense, and frameshift divided by 38Mb, an estimate of the exome size. The p-values were corrected for multiple hypothesis testing using Benjamini-Hochberg. Proportional hazards assumption for the final Cox regression models were also checked based on scaled Schoenfeld residuals plots and two-sided p-values.

Statistical analyses

All analyses were performed in R. Correlations among the covariates were calculated with Spearman correlation. In univariate analyses of feature values, the difference between categorical cytolytic low versus high was tested using Mann-Whitney for continuous features and Chi-squared for categorical features. Statistical test for trend between NMD metrics and NMD genetic alterations was performed using Jonckheere-Terpstra test. Statistical methods for random forest model and for survival analyses were described in their respective sections above.

Code and data availability

Code for the analyses and outputs are available on GitHub at

Supporting information

S1 Table. Multivariate random forest model performances for each cancer type.


S2 Table. Hazard ratios and statistical significance from univariate Cox regression models for each feature per cancer type.


S1 Fig. Mutation variant counts are associated with cytolytic activity, albeit small.

(A) Pan-cancer ROC curve for predicting cytolytic activity, using a random forest model with only counts of each mutation variant type (B) Out-of-bag error of overall model (black) and for predicting cytolytic activity low (red) and high (green). (C) Variable importance of the features used in the model, based on mean decrease in model accuracy. (D) Association in mutation counts among different mutation variant types. Missense, silent, and nonsense are correlated while frameshift is not.


S2 Fig. Correlation of gene expressions for GZMA and PRF1, component genes used for cytolytic activity.


S3 Fig. Pipeline schematic for data preprocessing and metric calculations for NMD burden.

The mRNA-seq, CNA, and exome-seq datasets were incorporated. Noisy genes were filtered out, followed by derivation of gene-level NMD efficiency values. expr, expression.


S4 Fig. Univariate regression of each feature to cytolytic activity, aggregated by indication.

Features were grouped into mutations (A), NMD frameshift-bearing (fs) (B), NMD nonsense-bearing (ns) (C), and NMD nonsense/frameshift-bearing (ptc) (D).


S5 Fig. Copy number alterations in different pathways across multiple indications, using the TCGA pan-cancer atlas datasets on cBioPortal.

Amplifications are shown in red and deletions in blue.


S6 Fig. NMD efficiency of patients with co-amplified NMD genes versus those without any alterations.

Y-values are shown as the difference in median of log10 transformed NMD metric values (co-altered versus no alterations). Dots shown in red are statistically significant with adjusted p-value < 0.05; Mann-Whitney test with Benjamini-Hochberg multiple hypothesis correction.


S7 Fig. Association between NMD co-alterations and global NMD efficiency (measured by different patient-level NMD metrics).

Statistical significance was assessed with Jonckheere-Terpstrata test for trends and p-values were multiple hypothesis corrected using the Benjamini-Hochberg method. Results with adjusted p-value < 0.05 are shown. The p-values shown in the plots are based on Mann-Whitney test of pair-wise comparisons.


S8 Fig. Association between NMD co-alterations and cytolytic activity.

Statistical significance was assessed with Mann-Whitney test and p-values were multiple hypothesis corrected using the Benjamini-Hochberg method. Results with adjusted p-value < 0.05 are shown. Nominal p-values shown in plots.


S9 Fig. Null models were not predictive of cytolytic activity.

ROC, AUROC, and out-of-bag (OOB) error of random forest models, built from ten randomized datasets, for (A) indications without microsatellite instability (MSI), and (B) indications with MSI. AUROC data are shown as AUROC ± SE of AUROCs.


S10 Fig. Distribution of values for each feature per indication.

Features were grouped into mutations (A), NMD frameshift-bearing (fs) (B), NMD nonsense-bearing (ns) (C), and NMD nonsense/frameshift-bearing (ptc) (D).


S11 Fig. Random forest model (with mutation variant counts (Mut) and NMD burden (NMD) combined) feature summaries for all indications.

(A) Mean decrease in model accuracy when a given feature is removed from the model. (B) Standardized mean decease in accuracy values from (A). (C) Statistically significant features from the model, with p-value < 0.05 marked in black.


S12 Fig.

Univariate overall survival analysis of SKCM for TMB (A) and PDL1 (B).


S13 Fig. Scaled Schoenfeld residuals for each covariate in the Cox regression model used in Fig 4C (SKCM).



We thank Pritchard lab members for helpful comments and criticism on the data and presentation.


  1. 1. McGranahan N, Furness AJS, Rosenthal R, Ramskov S, Lyngaa R, Saini SK, et al. Clonal neoantigens elicit T cell immunoreactivity and sensitivity to immune checkpoint blockade. Science (80-). 2016;351: 1463–1469. pmid:26940869
  2. 2. Rizvi NA, Hellmann MD, Snyder A, Kvistborg P, Makarov V, Havel JJ, et al. Mutational landscape determines sensitivity to PD-1 blockade in non-small cell lung cancer. Science (80-). 2015;348: 124–128. pmid:25765070
  3. 3. Hellmann MD, Callahan MK, Awad MM, Calvo E, Ascierto PA, Atmaca A, et al. Tumor Mutational Burden and Efficacy of Nivolumab Monotherapy and in Combination with Ipilimumab in Small-Cell Lung Cancer. Cancer Cell. 2018;33: 853–861.e4. pmid:29731394
  4. 4. Rooney MS, Shukla SA, Wu CJ, Getz G, Hacohen N. Molecular and genetic properties of tumors associated with local immune cytolytic activity. Cell. 2015;160: 48–61. pmid:25594174
  5. 5. Tumeh PC, Harview CL, Yearley JH, Shintaku IP, Taylor EJM, Robert L, et al. PD-1 blockade induces responses by inhibiting adaptive immune resistance. Nature. 2014;515: 568–71. pmid:25428505
  6. 6. Saltz J, Gupta R, Hou L, Kurc T, Singh P, Nguyen V, et al. Spatial Organization and Molecular Correlation of Tumor-Infiltrating Lymphocytes Using Deep Learning on Pathology Images. Cell Rep. 2018;23: 181–193.e7. pmid:29617659
  7. 7. Shin DS, Zaretsky JM, Escuin-Ordinas H, Garcia-Diaz A, Hu-Lieskovan S, Kalbasi A, et al. Primary Resistance to PD-1 Blockade Mediated by JAK1/2 Mutations. Cancer Discov. 2017;7: 188–201. pmid:27903500
  8. 8. Zaretsky JM, Garcia-Diaz A, Shin DS, Escuin-Ordinas H, Hugo W, Hu-Lieskovan S, et al. Mutations Associated with Acquired Resistance to PD-1 Blockade in Melanoma. N Engl J Med. 2016;375: 819–829. pmid:27433843
  9. 9. Gao J, Shi LZ, Zhao H, Chen J, Xiong L, He Q, et al. Loss of IFN-γ Pathway Genes in Tumor Cells as a Mechanism of Resistance to Anti-CTLA-4 Therapy. Cell. 2016;167: 397–404.e9. pmid:27667683
  10. 10. Borghaei H, Paz-Ares L, Horn L, Spigel DR, Steins M, Ready NE, et al. Nivolumab versus Docetaxel in Advanced Nonsquamous Non–Small-Cell Lung Cancer. N Engl J Med. 2015;373: 1627–1639. pmid:26412456
  11. 11. Kvistborg P, Philips D, Kelderman S, Hageman L, Ottensmeier C, Joseph-Pietras D, et al. Anti-CTLA-4 therapy broadens the melanoma-reactive CD8+ T cell response. Sci Transl Med. 2014;6: 254ra128–254ra128. pmid:25232180
  12. 12. Larkin J, Chiarion-Sileni V, Gonzalez R, Grob JJ, Cowey CL, Lao CD, et al. Combined Nivolumab and Ipilimumab or Monotherapy in Untreated Melanoma. N Engl J Med. 2015;373: 23–34. pmid:26027431
  13. 13. Motzer RJ, Tannir NM, McDermott DF, Arén Frontera O, Melichar B, Choueiri TK, et al. Nivolumab plus Ipilimumab versus Sunitinib in Advanced Renal-Cell Carcinoma. N Engl J Med. 2018;378: 1277–1290. pmid:29562145
  14. 14. Reck M, Rodríguez-Abreu D, Robinson AG, Hui R, Csőszi T, Fülöp A, et al. Pembrolizumab versus Chemotherapy for PD-L1–Positive Non–Small-Cell Lung Cancer. N Engl J Med. 2016;375: 1823–1833. pmid:27718847
  15. 15. Ock CY, Keam B, Kim S, Lee JS, Kim M, Kim TM, et al. Pan-Cancer Immunogenomic Perspective on the Tumor Microenvironment Based on PD-L1 and CD8 T-Cell Infiltration. Clin Cancer Res. 2016;22: 2261–2270. pmid:26819449
  16. 16. Anagnostou V, Smith KN, Forde PM, Niknafs N, Bhattacharya R, White J, et al. Evolution of Neoantigen Landscape during Immune Checkpoint Blockade in Non–Small Cell Lung Cancer. Cancer Discov. 2017;7: 264–276. pmid:28031159
  17. 17. Hellmann MD, Ciuleanu T-E, Pluzanski A, Lee JS, Otterson GA, Audigier-Valette C, et al. Nivolumab plus Ipilimumab in Lung Cancer with a High Tumor Mutational Burden. N Engl J Med. 2018; NEJMoa1801946. pmid:29658845
  18. 18. Giannakis M, Mu XJ, Shukla SA, Qian ZR, Cohen O, Nishihara R, et al. Genomic Correlates of Immune-Cell Infiltrates in Colorectal Carcinoma. Cell Rep. 2016;15: 857–865. pmid:27149842
  19. 19. Van Allen EM, Miao D, Schilling B, Shukla SA, Blank C, Zimmer L, et al. Genomic correlates of response to CTLA-4 blockade in metastatic melanoma. Science (80-). 2015;350: 207–211. pmid:26359337
  20. 20. Rajasagi M, Shukla SA, Fritsch EF, Keskin DB, DeLuca D, Carmona E, et al. Systematic identification of personal tumor-specific neoantigens in chronic lymphocytic leukemia. Blood. 2014;124: 453–462. pmid:24891321
  21. 21. Van Allen EM, Miao D, Schilling B, Shukla SA, Blank C, Zimmer L, et al. Genomic correlates of response to CTLA-4 blockade in metastatic melanoma.[Erratum appears in Science. 2015 Nov 13;350(6262):aad8366; PMID: 26564858]. Science (80-). 2015;350: 207–211.
  22. 22. Thorsson V, Gibbs DL, Brown SD, Wolf D, Bortone DS, Ou Yang T-H, et al. The Immune Landscape of Cancer. Immunity. 2018;48: 812–830.e14. pmid:29628290
  23. 23. Smart AC, Margolis CA, Pimentel H, He MX, Miao D, Adeegbe D, et al. Intron retention is a source of neoepitopes in cancer. Nat Biotechnol. 2018;36: 1056–1058. pmid:30114007
  24. 24. Turajlic S, Litchfield K, Xu H, Rosenthal R, McGranahan N, Reading JL, et al. Insertion-and-deletion-derived tumour-specific neoantigens and the immunogenic phenotype: a pan-cancer analysis. Lancet Oncol. 2017;18: 1009–1021. pmid:28694034
  25. 25. Duval A, Gayet J, Zhou XP, Iacopetta B, Thomas G, Hamelin R. Frequent frameshift mutations of the TCF-4 gene in colorectal cancers with microsatellite instability. Cancer Res. 1999;59: 4213–5. Available: pmid:10485457
  26. 26. Lindeboom RGH, Supek F, Lehner B. The rules and impact of nonsense-mediated mRNA decay in human cancers. Nat Genet. 2016;48: 1112–1118. pmid:27618451
  27. 27. Pastor F, Kolonias D, Giangrande PH, Gilboa E. Induction of tumour immunity by targeted inhibition of nonsense-mediated mRNA decay. Nature. 2010;465: 227–230. pmid:20463739
  28. 28. Garon EB, Rizvi NA, Hui R, Leighl N, Balmanoukian AS, Eder JP, et al. Pembrolizumab for the Treatment of Non–Small-Cell Lung Cancer. N Engl J Med. 2015;372: 2018–2028. pmid:25891174
  29. 29. Isken O, Maquat LE. The multiple lives of NMD factors: Balancing roles in gene and genome regulation. Nat Rev Genet. 2008;9: 699–712. pmid:18679436
  30. 30. Kitano S, Nakayama T, Yamashita M. Biomarkers for immune checkpoint inhibitors in melanoma. Front Oncol. 2018;8: 1–8.
  31. 31. Goodman AM, Kato S, Bazhenova L, Patel SP, Frampton GM, Miller V, et al. Tumor mutational burden as an independent predictor of response to immunotherapy in diverse cancers. Mol Cancer Ther. 2017;16: 2598–2608. pmid:28835386
  32. 32. McGranahan N, Rosenthal R, Hiley CT, Rowan AJ, Watkins TBKK, Wilson GA, et al. Allele-Specific HLA Loss and Immune Escape in Lung Cancer Evolution. Cell. 2017;0: 1259–1271.e11. pmid:29107330
  33. 33. Brown SD, Warren RL, Gibb EA, Martin SD, Spinelli JJ, Nelson BH, et al. Neo-antigens predicted by tumor genome meta-analysis correlate with increased patient survival. Genome Res. 2014;24: 743–50. pmid:24782321
  34. 34. Shukla SA, Rooney MS, Rajasagi M, Tiao G, Dixon PM, Lawrence MS, et al. Comprehensive analysis of cancer-associated somatic mutations in class I HLA genes. Nat Biotechnol. 2015;33: 1152–1158. pmid:26372948
  35. 35. The problem with neoantigen prediction. Nat Biotechnol. 2017;35: 97–97. pmid:28178261
  36. 36. Chalmers ZR, Connelly CF, Fabrizio D, Gay L, Ali SM, Ennis R, et al. Analysis of 100,000 human cancer genomes reveals the landscape of tumor mutational burden. Genome Med. 2017;9: 1–14.
  37. 37. Nickless A, Bailis JM, You Z. Control of gene expression through the nonsense-mediated RNA decay pathway. Cell Biosci. 2017;7: 1–12.
  38. 38. Hause RJ, Pritchard CC, Shendure J, Salipante SJ. Classification and characterization of microsatellite instability across 18 cancer types. Nat Med. 2016;22: 1342–1350. pmid:27694933
  39. 39. Gao J, Aksoy BA, Dogrusoz U, Dresdner G, Gross B, Sumer SO, et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci Signal. 2013;6: pl1. pmid:23550210