Abstract
Immunotherapy has recently shown important clinical successes in a substantial number of oncology indications. Additionally, the tumor somatic mutation load has been shown to associate with response to these therapeutic agents, and specific mutational signatures are hypothesized to improve this association, including signatures related to pathogen insults. We sought to study in silico the validity of these observations and how they relate to each other. We first addressed the question whether somatic mutations typically involved in cancer may increase, in a statistically meaningful manner, the similarity between common pathogens and the human exome. Our study shows that common mutagenic processes like those resulting from exposure to ultraviolet light (in melanoma) or smoking (in lung cancer) increase, in the upper range of biologically plausible frequencies, the similarity between cancer exomes and pathogen DNA at a scale of 12 to 16 nucleotide sequences (corresponding to peptides of 4 – 5 amino acids). Second, we investigated whether this increased similarity is due to the specific mutation distribution of the considered mutagenic processes or whether uniformly random mutations at equal rate would trigger the same effect. Our results show that, depending on the combination of pathogen and mutagenic process, these effects need not be distinguishable. Third, we studied the impact of mutation rate and showed that increasing mutation rate generally results in an increased similarity between the cancer exome and pathogen DNA, again at a scale of 4 – 5 amino acids. Finally, we investigated whether the considered mutational processes result in amino-acid changes with functional relevance that are more likely to be immunogenic. 
We showed that functional tolerance to mutagenic processes across species generally suggests more resilience to mutagenic processes that are due to exposure to elements of nature than to mutagenic processes that are due to exposure to cancer-causing artificial substances. These results support the idea that recognition of pathogen sequences as well as differential functional tolerance to mutagenic processes may play an important role in the immune recognition process involved in tumor infiltration by lymphocytes.
Citation: Ebrahimzadeh E, Engler M, Tse D, Cristescu R, Tchamkerten A (2019) Somatic mutations render human exome and pathogen DNA more similar. PLoS ONE 14(5): e0197949. https://doi.org/10.1371/journal.pone.0197949
Editor: Alvaro Galli, CNR, ITALY
Received: May 9, 2018; Accepted: February 18, 2019; Published: May 1, 2019
Copyright: © 2019 Ebrahimzadeh et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All data files are available from http://www.ensembl.org/Homo_sapiens/Info/Index, http://uswest.ensembl.org/biomart/martview/, and http://www.ncbi.nlm.nih.gov/.
Funding: This work was supported in part by the Agence Nationale de la Recherche (Grant Number 2KCPR923) and by the National Human Genome Research Institute of the National Institutes of Health under award number R01HG008164. Merck Research Laboratories provided additional support in the form of salary for author RC, but did not have any role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of this author are articulated in the author contributions section.
Competing interests: RC is employed by Merck Research Laboratories. This commercial affiliation does not alter our adherence to PLOS ONE policies on sharing data and materials. All other authors declare that no competing interests exist.
Introduction
Recent clinical advances firmly establish the role of immunotherapy (in particular, checkpoint inhibition targeting the CTLA4 and PD1/PD-L1 pathways [1]) in the treatment of cancer. However, the rates of response vary by indication, underlining the importance of identifying the patients most likely to respond [2–5]. In parallel, the analysis of the data from large-scale genomic efforts, including The Cancer Genome Atlas (TCGA [6]), has identified universal characteristics of the tumor and its environment that elicit potential recognition by the host immune system. In particular, somatic mutational load as inferred by DNA sequencing [7, 8] and cytolytic infiltrate as inferred by immunohistochemistry or RNA sequencing [9] have emerged as hallmarks of an immune-active tumor environment. It is thus important to understand the causality and mechanism of action that drive the heterogeneous composition of the tumor and its environment, and consequently the heterogeneity of response to immunotherapy, in order to select the right patients for treatment, potential combinations, and potential for early intervention.
Multiple recent studies have suggested a strong causal link between the mutational burden of the tumor and clinical response to immunotherapy across multiple indications including Melanoma [10, 11], Non Small Cell Lung Cancer [12], Bladder cancer [13] and Colorectal cancer [14]. In these studies, a strong relationship between neoantigen load (the number of mutations with immunogenic potential) and response to immunotherapy has been identified. Importantly, each of these indications is characterized by distinct mutagenic processes that result in abundant neoantigen load [7, 8]: UV light exposure in Melanoma, smoking in Non Small Cell Lung Cancer, APOBEC activation in Bladder cancer, and MMR deficiency in MSI-h Colorectal cancer. Whether particular mutations or mutational patterns preferentially induce an immunologic phenotype remains an open question [10, 11]. However, several hypotheses have recently been put forward, including the presence of mutations in particular genes [15, 16], or the presence of a transversion signature related to smoking [12]. In particular, Snyder et al. [11] put forward a hypothesis linking cancer exomes with patterns present in common pathogens. Namely, their results with exome analysis of Melanoma patients treated with Ipilimumab, a CTLA4 inhibitor, suggest that somatic mutations in cancer genomes that lead to tetrapeptides similar to those found in common pathogens are more likely to elicit a response to the therapy than common somatic mutations. This association is presumably driven by the innate ability of significant portions of the adaptive immune repertoire to recognize such pathogens.
We took an in-silico approach to evaluate the impact of certain mutagenic processes on the similarity between cancer exomes and pathogen DNAs. Somatic mutations are an inherent natural process related to cell division and aging which in some instances is exacerbated by mutagenic factors. We simulated such mutagenic processes using mixtures of mutational signatures with empirically derived mixing parameters. We used a simple similarity metric between the mutated exome and common pathogen exomes to estimate changes in overall potential immunogenicity of cancer exomes as compared to the normal exome. We considered simulations of the mutagenic processes that yield the most mutated cancer exomes, namely ultra-violet (UV) light (Melanoma), smoking (Non Small Cell Lung Cancer), and APOBEC activation (Bladder cancer) [7, 9]. Our results suggest that, in the upper range of biologically plausible mutation rates, mutagenic processes resulting from exposure to these common mutagens lead to cancer exomes that are more similar to pathogen DNAs at a scale of 12 to 16 nucleotides. These changes are subtle but nevertheless statistically significant and are particularly important in the range of peptide sizes that are relevant for epitope presentation in the human MHC mechanism; MHC presentation typically involves peptides with lengths between 8-18 nucleotides (8-13 for class I MHC and 13-18 for class II MHC [17]).
However, our results also suggest that the increased similarity need not be caused by the specificity of the mutation distribution. Depending on the pathogen, uniformly random mutations (at the same rate) may result in equal increased similarity. Finally, we show that increasing mutation rate generally results in increased similarity between cancer exomes and pathogen DNAs. These conclusions suggest that mutagenic processes might act as a mechanism of pressure that models the mutational spectra observed in tumors by increasing recognition from the host immune system.
In contrast to the aforementioned effect, which increases the likelihood that a cancer exome is recognized by the immune system, an antagonistic mechanism of pressure on the mutational landscape stems from tolerance by the immune system to natural mutagenic processes. To that end, we establish that exomes across species are generally more resilient, from a functional point of view related to the synonymity of amino-acid changes, to mutagenic processes that are due to exposure to elements of nature than to mutagenic processes that are due to exposure to cancer-causing artificial substances. In particular, we observe that the functionality of the genetic code (allocation of codons to amino-acids) is more resilient to UV light than to smoking mutagenic processes at a fixed rate. This suggests the possibility that there are different tissue-dependent evolutionary tolerance levels, modulated by the pathogen recognition apparatus in terms of both immune recognition and cancer development, which are for example reflected in the much higher mutational loads and immune infiltrate in Melanoma compared to Lung cancer [9].
1 Methods
We sought to assess whether certain mutagenic processes result in somatic alterations that increase the similarity of the mutated human exome with selected pathogens. Accordingly, we first defined a pairwise similarity metric among DNA sequences of different length and evaluated the similarity between pathogens and the normal human exome. Second, we simulated mutations resulting from different mutagenic processes at different mutation rates acting on the human exome and evaluated the consequent change in similarity of the mutated human exome with respect to the pathogen exomes. Third, we investigated the resiliency of exomes (human exome and model organism exomes) in terms of maintained functionality of the resulting amino-acids and compared the sequences of amino acids of the normal and mutated exomes.
Data and computing resources
We obtained the human normal exome from GRCh38 http://www.ensembl.org/Homo_sapiens/Info/Index.
We considered the following list of model organisms: Mus musculus (Mouse), Saccharomyces cerevisiae (Yeast), Felis catus (Cat), Drosophila melanogaster (Fruitfly), Caenorhabditis elegans (Nematode), Xenopus, Danio rerio (Zebrafish), Cavia porcellus (Guinea pig), Anolis carolinensis (Anolis). Exomes from these organisms were obtained from http://uswest.ensembl.org/biomart/martview/.
We considered the following list of viral pathogens: Cytomegalovirus (CMV), Dengue virus, Ebola virus, Epstein-Barr virus (EBV), Human Herpesvirus 6 (HHV), Human Papillomavirus (HPV), Measles virus, Yellow Fever virus. DNA sequences from these pathogens were obtained from http://www.ncbi.nlm.nih.gov/.
We considered simulations of mutational signatures resulting from ultra-violet (UV) light (specific to Melanoma), smoking (specific to Non Small Cell Lung Cancer (NSCLC)), and APOBEC activation (specific to Bladder cancer). These simulations were based on the data from [8, Supplementary information, Table S2] restricted to the set of patients with Melanoma cancer, NSCLC, and Bladder cancer.
For simulations we used Python 2.7.6 (libraries random, numpy, and scipy.stats) and ran programs on a shared server with 8 CPUs and 128GB memory.
2 Results
2.1 Pathogen DNA vs. human exome and MHC mechanism
To quantify the similarity between a pathogen DNA, denoted by x, and the human exome, denoted by y, we considered the following similarity score. For a given integer ℓ ≥ 1, the similarity score, denoted by sℓ(x, y), corresponds to the relative proportion of length-ℓ strings in the pathogen DNA that also appear in the human exome at least once, that is
$$s_\ell(x, y) = \frac{1}{L-\ell+1} \sum_{i=1}^{L-\ell+1} \mathbb{1}\{x_i^{i+\ell-1} \prec y\},$$
where
$$\mathbb{1}\{x_i^{i+\ell-1} \prec y\} = \begin{cases} 1 & \text{if } x_i^{i+\ell-1} \text{ appears in } y, \\ 0 & \text{otherwise.} \end{cases}$$
Here L denotes the length of the pathogen DNA, $x_i^{i+\ell-1}$ denotes the pathogen DNA substring starting at position i and ending at position i + ℓ − 1, and “≺” denotes string inclusion. In particular, sℓ(x, y) = 1 corresponds to the case where all length-ℓ strings in the pathogen DNA also appear in the human exome and sℓ(x, y) = 0 corresponds to the case where the pathogen DNA and the human exome have no length-ℓ string in common. Observe that sℓ(x, y) can be interpreted as the probability that a randomly and uniformly picked length-ℓ string in the pathogen DNA also appears in the human exome. Accordingly, we often refer to sℓ(x, y) as the matching probability. Finally, notice that sℓ(x, y) does not count multiplicity, i.e., strings that appear only once in the human exome and strings that appear multiple times in the human exome are not distinguished.
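The matching probability described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names are hypothetical, and the exome is assumed to be available as one plain string over the alphabet A, C, G, T, which ignores how matches across exon boundaries should be handled.

```python
def kmer_set(seq, k):
    """Return the set of distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity_score(pathogen, exome, k):
    """Fraction of length-k windows of `pathogen` that occur at least once in `exome`.

    Multiplicity in the exome is ignored: a window counts as a match whether it
    appears once or many times.
    """
    n_windows = len(pathogen) - k + 1
    if n_windows <= 0:
        raise ValueError("k must not exceed the pathogen length")
    exome_kmers = kmer_set(exome, k)  # built once, then used for O(1) lookups
    hits = sum(1 for i in range(n_windows) if pathogen[i:i + k] in exome_kmers)
    return hits / n_windows
```

Storing the exome substrings in a set keeps each window lookup constant-time, at the cost of holding all distinct length-k strings in memory.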
In Fig 1, each curve represents the matching probability sℓ(x, y) for a specific pathogen DNA x and the normal human exome y, for ℓ ∈ {9, 10, …, 18}. To benchmark these scores we also considered the matching probability with respect to a randomly and uniformly generated “pathogen” sequence, where each nucleotide is equally likely to occur. The average matching probability with respect to such a sequence is represented by the “Random” curve in Fig 1 and turns out to be independent of its length L. This curve is indistinguishable from the 95% confidence interval corresponding to a randomly generated sequence. Supporting material for Fig 1 is deferred to Section A.1 in the Appendix. We make the following observations:
- For all pathogens the similarity score is equal to one for ℓ ≤ 10, that is length ℓ ≤ 10 subsequences of the pathogen DNAs all appear in the human exome as well.
- The similarity scores are non-zero for all pathogens up to length ℓ = 20. At ℓ = 21 the similarity score is zero for the Ebola virus, the Measles virus, and the Dengue virus.
- For all ℓ ∈ {11, …, 18} the similarity score for pathogen DNAs is higher than for a random sequence, except for CMV (ℓ ∈ {15, …, 18}) and for HHV (ℓ ∈ {13, …, 18}).
- From ℓ = 10 there is a steep decrease in the similarity scores, down to less than 15% for ℓ = 15. A closer look at the data (see Tables in the Appendix A.1) reveals that, for all pathogens, the sharpest relative drop of the similarity score occurs from ℓ = 12 to ℓ = 13 or from ℓ = 13 to ℓ = 14.
- The difference in score across pathogens is maximal at ℓ ∈ {12, 13}.
The “Random” curve refers to the average score of a randomly and uniformly generated “pathogen” DNA sequence.
These in-silico observations are in line with the concept that 4 – 5 amino acids are enough for the presentation machinery in terms of both diversity of possible sequences (20^4 – 20^5) and differentiation of self from foreign sequences in the MHC machinery. Namely, this length is strikingly similar to the length of peptides studied in the signature determined by [11].
2.2 Impact of somatic mutations on pathogen DNA and human exome similarity score
To assess the impact of somatic mutations on pathogen DNA and human exome similarity score and identify the roles of mutation distribution and mutation rate we proceeded as follows:
- Normal exome vs. cancer exome: we investigated whether cancer somatic mutations render pathogen and human exome more similar, and whether random mutations alone, with uniform distribution across mutations, would produce the same results as (typically non-uniform) cancer-dependent mutations, at the same mutation rate.
- Impact of mutation rate: we investigated whether a higher mutation rate renders pathogen DNA and human exome more similar.
Central to our investigation is a notion of cancer channel described next.
Cancer channel.
We simulated the changes induced to the normal exome by cancer specific mutagens in a probabilistic way. The cancer exomes were generated from the normal exome by using cancer-dependent mixtures of mutational signatures with empirical weights derived from data in [8]. Note that even if a cancer typically exhibits a dominant mutational signature, the simulated mutagenic process results in a more realistic combination of such signatures. The similarity scores of the normal exome and cancer exome were then computed for each pathogen. To formalize our analysis, we used concepts from information theory, in particular related to communications over a noisy channel. To a given cancer and mutation rate we associated a transformation, referred to as “cancer channel,” which mimics the typical effects of the mutagenic process that are specific to the cancer at the given mutation rate. Analogously to a communication channel that alters a transmitted message because of noise (see, e.g., [18]), a cancer channel alters a DNA sequence because of somatic mutations. Given a particular cancer c and a mutation rate ρ the cancer channel assigns to each nucleotide α the probability of being mutated into nucleotide β. This probability was derived using data from [8, Supplementary information, Table S2] (see Appendix A.2 in this paper).
To obtain a cancer exome we “passed” the normal human exome y through the cancer channel $Q_{c,\rho}$, as shown in Fig 2. Specifically, the cancer exome $\tilde{y} = \{\tilde{y}_1, \tilde{y}_2, \ldots, \tilde{y}_G\}$ was generated from y so that the probability of obtaining $\tilde{y}$ from the normal exome y = {y1, y2, …, yG} was given by
$$P(\tilde{y} \mid y) = \prod_{i=1}^{G} Q_{c,\rho}(\tilde{y}_i \mid y_i).$$
Fig 2. The cancer exome $\tilde{y}$ is obtained from the normal exome y = {y1, y2, …, yG} through a cancer specific probabilistic transformation $Q_{c,\rho}$, which assigns to each nucleotide α the probability $Q_{c,\rho}(\beta \mid \alpha)$ of being mutated to nucleotide β. This transformation depends on both the mutation distribution specific to cancer c and the mutation rate ρ.
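As an illustration of passing a sequence through such a channel, the sketch below mutates each nucleotide independently according to a 4 × 4 table of transition probabilities. The function name and the numbers in `EXAMPLE_CHANNEL` are hypothetical placeholders chosen only so that each row sums to one; they are not the empirical values derived from [8].

```python
import random

NUCLEOTIDES = "ACGT"

# Hypothetical channel for illustration only (NOT the values derived from [8]).
# EXAMPLE_CHANNEL[a][b] is the probability that nucleotide a is read out as b.
EXAMPLE_CHANNEL = {
    "A": {"A": 0.999, "C": 0.0002, "G": 0.0005, "T": 0.0003},
    "C": {"A": 0.0003, "C": 0.999, "G": 0.0001, "T": 0.0006},
    "G": {"A": 0.0006, "C": 0.0001, "G": 0.999, "T": 0.0003},
    "T": {"A": 0.0003, "C": 0.0005, "G": 0.0002, "T": 0.999},
}


def pass_through_channel(seq, channel, rng):
    """Mutate each nucleotide of seq independently according to `channel`."""
    out = []
    for base in seq:
        weights = [channel[base][b] for b in NUCLEOTIDES]
        out.append(rng.choices(NUCLEOTIDES, weights=weights)[0])
    return "".join(out)


# Example: one simulated "cancer" version of a short toy sequence.
toy_cancer_sequence = pass_through_channel("ACGTACGTAC", EXAMPLE_CHANNEL, random.Random(0))
```

Passing an explicit `random.Random` instance makes each simulated exome reproducible from its seed, which is convenient when 1000 independent exomes are generated per setting.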
Normal vs. cancer specific and random mutations.
For a given pathogen x, cancer c, and mutation rate ρ we performed two tests. In Test 1, we evaluated the statistical significance of the effect of cancer somatic mutations in making the human exome more similar to pathogen DNA sequences. In Test 2, we compared cancer somatic mutations and random mutations in making the human exome more similar to pathogen DNA sequences. Both tests were performed for ρ-values of 0.0005, 0.001, and 0.01. The lowest mutation rate was chosen to be 0.0005 as it represents a good compromise between biological and statistical relevance. It lies in the upper range of the mutation rates observed in actual cancer samples [8] and in the lower range for statistical relevance (see next subsection).
Test 1: For each ℓ ∈ {10, 11, …, 18} we independently generated 1000 cancer exomes $\tilde{y}$ from the normal human exome y and computed the corresponding similarity scores $s_\ell(x, \tilde{y})$. P-values were computed for comparing the mean of $s_\ell(x, \tilde{y})$ against sℓ(x, y) using a one-sided t-test with a null hypothesis that the true mean of $s_\ell(x, \tilde{y})$ is no larger than sℓ(x, y).
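A one-sided, one-sample test of this kind can be sketched with the Python standard library alone. The function name is hypothetical, and the p-value uses a normal approximation to the t distribution, an assumption that is reasonable for samples of size 1000 but is not necessarily the computation performed in the study.

```python
import math
import statistics


def one_sided_one_sample_test(scores, baseline):
    """Test H0: mean(scores) <= baseline against H1: mean(scores) > baseline.

    Returns (t_statistic, approximate_p_value). The p-value relies on a normal
    approximation to the t distribution, which is adequate for large samples.
    The scores must not all be identical (zero standard deviation).
    """
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # empirical (n - 1) standard deviation
    t_stat = (mean - baseline) / (sd / math.sqrt(n))
    p_value = 1.0 - statistics.NormalDist().cdf(t_stat)
    return t_stat, p_value
```

Here `scores` would hold the similarity scores of the simulated cancer exomes and `baseline` the score of the normal exome; a p-value of at most 0.01 corresponds to the threshold used for the bars in Fig 3.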
Test 2: We replaced the cancer channel by a “random channel” which produced mutations at the same rate but in a uniform (1/3, 1/3, 1/3) manner. For each ℓ ∈ {10, 11, …, 18} we independently generated 1000 exomes $\hat{y}$ by passing the normal human exome y through the random channel and computed the corresponding similarity scores $s_\ell(x, \hat{y})$. P-values were computed for comparing the mean of $s_\ell(x, \hat{y})$ against the mean of the cancer exome scores $s_\ell(x, \tilde{y})$ (obtained in Test 1) using a two-sample one-sided t-test with a null hypothesis that the true mean of $s_\ell(x, \tilde{y})$ is no larger than the true mean of $s_\ell(x, \hat{y})$. Note that directly computing the true mean of $s_\ell(x, \tilde{y})$ over $\tilde{y}$ is impossible, as it amounts to computing a sum over all $4^G$ possible cancer exomes, and similarly for the mean of $s_\ell(x, \hat{y})$.
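The random channel used as a control can be written down explicitly: each nucleotide stays unchanged with probability 1 − ρ and is replaced by each of the three other nucleotides with probability ρ/3. The sketch below builds this table; the function name and the dictionary layout are illustrative choices, not taken from the study's code.

```python
NUCLEOTIDES = "ACGT"


def random_channel(rate):
    """Uniform channel: mutate with probability `rate`, equally likely to each other base.

    The result is a nested dict where channel[a][b] is the probability that
    nucleotide a becomes nucleotide b.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    return {
        a: {b: (1.0 - rate) if a == b else rate / 3.0 for b in NUCLEOTIDES}
        for a in NUCLEOTIDES
    }
```

Because the overall mutation rate is the same as in the cancer channel, any difference in similarity scores between the two channels can be attributed to the mutation distribution rather than to the number of mutations.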
In Fig 3, each histogram refers to a particular cancer and mutation rate. Red bars refer to Test 1 and blue bars refer to Test 2. Bar height represents, for any given subsequence length ℓ ∈ {10, 11, …, 18}, the proportion of pathogens (out of the 8 considered in this paper) for which the p-value is ≤ 0.01. Related data can be found in the tables of the Appendices A.3, A.4, and A.5 for ρ = 0.0005, ρ = 0.001, and ρ = 0.01, respectively. In these tables, the second column refers to sℓ(x, y), the third column gives a 95% confidence interval for the mean similarity score between the pathogen x and the simulated cancer exomes, the fourth column gives the p-value for Test 1 and the fifth column gives the p-value for Test 2. We make the following observations:
- Referring to Test 1 (red bars in Fig 3), all three mutagenic processes render the human exome more similar to all pathogen DNA sequences at all ρ ∈ {0.0005, 0.001, 0.01} and ℓ ∈ {12, …, 16}. For ℓ ≤ 11 or ℓ ≥ 17 the effects of the mutagenic processes on the similarity scores are less conclusive. This suggests that the increase of similarity is particularly relevant in the range of peptide sizes (4 – 5 amino-acids) that are relevant for epitope presentation by the human MHC system. Note, however, that the changes in similarity are small, typically ≪ 1% (see tables in Sections A.3-A.5, Columns 2, 3).
- Whether the above change of similarity is due to the specificity of the mutation distribution or random mutations trigger the same effect depends on the pathogen, the length, and the mutation rate. For instance, for Melanoma at ℓ = 13 the change in similarity due to cancer specific mutations is more pronounced for 5 out of the 8 pathogens, for ρ ∈ {0.0005, 0.001, 0.01}. By contrast, for all mutagenic processes there appears to be no statistical difference at length 11.
The height of the blue bars represents the proportion of pathogens whose DNA is more similar to cancer exomes than to exomes with equal mutation rate but uniformly distributed mutations (two-sample one-sided t-test p-value ≤ 0.01).
Impact of mutation rate.
To assess the impact of mutation rate on the similarity between pathogen DNA and human exome, for any given mutagenic process, pathogen DNA, and length we proceeded as follows. We first generated 1000 cancer exomes at mutation rate ρ = 0.0005 and 1000 cancer exomes at mutation rate ρ = 0.001. Second, we computed the similarity scores of the two sets of cancer exomes relative to the pathogen DNA. P-values were computed for comparing the means of the two sets of similarity scores using a two-sample one-sided t-test with a null hypothesis that the true mean of the similarity scores at the higher rate (ρ = 0.001) is no larger than the true mean of the similarity scores at the lower rate (ρ = 0.0005). We then repeated the experiment for ρ = 0.001 vs. ρ = 0.01. In Fig 4, the histograms represent the proportion of pathogens for which the p-value is ≤ 0.01: grey bars refer to the 0.0005 vs. 0.001 experiment and orange bars refer to the 0.001 vs. 0.01 experiment. We obtain the following result:
- For all combinations of mutagenic processes and pathogens, and for all ℓ ∈ {11, …, 16}, a higher mutation rate results in higher similarity score. For ℓ ∈ {9, 10, 17, 18} results are inconclusive.
Orange bars refer to the same proportions but when mutation rate increases from 0.001 to 0.01.
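The comparison of two sets of simulated scores can be sketched as a two-sample one-sided test that asks whether the scores at the higher mutation rate are larger on average than those at the lower rate. The version below is an assumption-laden illustration: it uses Welch's unequal-variance statistic and a normal approximation for the p-value (adequate for 1000 samples per group), and the function name is hypothetical. The text does not state which variant of the two-sample t-test was used.

```python
import math
import statistics


def two_sample_one_sided_test(scores_high, scores_low):
    """Test H0: mean(scores_high) <= mean(scores_low) against H1: it is larger.

    Uses Welch's statistic (no equal-variance assumption) and a normal
    approximation for the p-value. Returns (t_statistic, approximate_p_value).
    """
    var_high = statistics.variance(scores_high)
    var_low = statistics.variance(scores_low)
    std_err = math.sqrt(var_high / len(scores_high) + var_low / len(scores_low))
    t_stat = (statistics.mean(scores_high) - statistics.mean(scores_low)) / std_err
    p_value = 1.0 - statistics.NormalDist().cdf(t_stat)
    return t_stat, p_value
```

The same function applies to the comparison of cancer-specific and uniformly random mutations, with the cancer exome scores passed as the first argument.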
2.3 Resiliency of exomes with respect to mutagenic processes
In order to compare the resiliency of the model organism exomes with respect to mutagenic processes, we evaluated the error correction capabilities of the genetic code (the codon allocation to amino-acids) for each combination of model exome and mutagenic process. Referring to Fig 5, y = {y1, …, yL} represents a DNA sequence whose corresponding sequence of amino acids is {a1, …, aL/3}. This DNA sequence is then passed through a given cancer channel and results in a cancer sequence $\tilde{y} = \{\tilde{y}_1, \ldots, \tilde{y}_L\}$ and a corresponding sequence of cancer amino acids $\{\tilde{a}_1, \ldots, \tilde{a}_{L/3}\}$. From {a1, a2, …, aL/3} and $\{\tilde{a}_1, \ldots, \tilde{a}_{L/3}\}$ we computed the relative proportion of amino acids that were affected, that is
$$e(y, \tilde{y}) = \frac{3}{L} \sum_{i=1}^{L/3} \mathbb{1}\{\tilde{a}_i \neq a_i\}. \tag{1}$$
Finally, averaging over all possible realizations of $\tilde{y}$ (and therefore over $\{\tilde{a}_1, \ldots, \tilde{a}_{L/3}\}$), we obtained the average error probability
$$P_e = \mathbb{E}\big[e(y, \tilde{y})\big]. \tag{2}$$
Fig 6 represents $P_e$ for each combination of model organism, cancer mutation process, and mutation rate ρ ∈ {0.0001, 0.001, 0.01}. Notice that $P_e$ is not a linear function of ρ. Computation details for $P_e$ are deferred to the Appendix A.6. Referring to Fig 6, we obtain the following result:
- Although the proportion of non-synonymous mutations varies across exomes for the three types of mutagenic processes, it is always lowest for melanoma and highest for lung. Moreover, this ordering holds irrespective of the mutation rate. It should be noted that we evaluated the proportions of non-synonymous mutations for several other organisms as well (including the set of pathogens considered in this paper) and this finding was validated in all cases.
3 Discussion
We employed large-scale simulations to model the random (across space) effect of stochastic mutagenic processes on the human normal exome. We believe this is a valid approach since the available cancer exome data suggest that, while mutation rates vary at the granular level, the mutagenic processes in cancers with a large number of mutations affect all chromosomal regions of the exome equally [8]. We use this assumption to simplify the analysis.
Our in-silico results show that, in general, the typical stochastic mutagenic processes encountered in the major cancer indications with abundant neoantigens do appear to shift the peptide distribution of the modified exome universally towards a landscape that appears more similar to pathogenic insult. Specifically, all three mutagenic processes considered induce subtle but robust shifts in the measure by which we characterized the similarity between the normal human exome and pathogen DNA sequences, at mutation rates in the upper range of the mutation rates observed in actual cancer samples (≥ 0.0005). Moreover, the range of peptide lengths where this shift happens aligns with the typical length of peptides presented by the human MHC presentation system, suggesting an increased potential for recognition of these types of somatic mutations by a pathogen-trained host immune system.
We also note that for many combinations of pathogen DNA and mutagenic process this increase of similarity cannot be solely attributed to the mutation distribution; randomly and uniformly distributed mutations can cause similar shifts in similarity. By contrast, increasing the mutation rate while keeping the underlying mutation distribution fixed always results in an increased similarity between human exome and pathogen DNA at ℓ ∈ {11, …, 16}, which again corresponds to the length of peptides presented by the human presentation system. This suggests that the intensity of the mutation rate is an important parameter that directly affects the similarity between cancer exome and pathogen DNA.
We also observe that the effect of the considered mutagenic processes on the likelihood of observing a non-synonymous alteration is strikingly different across processes but consistent across the species studied in our framework (human and model organisms). Melanoma/UV light alterations are the least likely to result in amino acid functional changes, followed by APOBEC-driven alterations and then by smoking alterations, suggesting different error-correcting capabilities of the living exomes towards these various mutagenic insults. This is an attractive observation from an evolutionary perspective: due to universal exposure to sunlight, organisms likely developed similarly universal intrinsic protection from UV-light-type modifications to their exomes via the redundancies in the amino acid codon allocation. Similarly, APOBEC activation appears to be a universal innate protection mechanism that allows the cell to induce damaging mutations in foreign organisms, while the mutations resulting from tobacco smoking are less likely to have exerted evolutionary pressure. In summary, our in-silico approach reveals two competing mechanisms of tolerance pressure on the major mutagenic processes present in human cancers that modulate the potential immune recognition of alterations at the exome level through pathogen similarity and through functional redundancy; the balance between these mechanisms may significantly contribute to the eventual mutational landscape of advanced cancers.
A Appendices
A.1 Data for Fig 1
In Table 1 below we listed the similarity scores sℓ(x, y) of each pathogen x against the human exome y, as a function of the subsequence length ℓ.
The column “Random” refers to a 95% confidence interval for the similarity score between a randomly generated pathogen sequence X, where each nucleotide is independently and uniformly selected with probability 1/4, and the normal human exome y. To compute this confidence interval we proceeded as follows. The similarity score for a random instance X of length L is given by
$$s_\ell(X, y) = \frac{1}{L-\ell+1} \sum_{i=1}^{L-\ell+1} Z_i,$$
where the Zi’s are identically distributed Bernoulli random variables such that
$$P(Z_i = 1) = \frac{M_\ell}{4^\ell}. \tag{3}$$
Here Mℓ denotes the number of distinct length-ℓ substrings in the human genome and was computed empirically for ℓ ∈ {9, 10, …, 15}.
Taking expectation over X yields
$$\mathbb{E}\big[s_\ell(X, y)\big] = \frac{M_\ell}{4^\ell}.$$
A confidence interval for sℓ(X, y) was computed via Chebyshev’s inequality as follows. We have
$$P\Big(\big|s_\ell(X, y) - \mathbb{E}[s_\ell(X, y)]\big| \geq \varepsilon\Big) \leq \frac{\operatorname{Var}\big(s_\ell(X, y)\big)}{\varepsilon^2}. \tag{4}$$
Furthermore,
where for the second equality we used the fact that the Zi’s are identically distributed and that Zk and Zj are independent whenever j ≥ k + ℓ. Now
and since the Zi’s are binary random variables
Therefore,
(5)
Finally, from (3), (4), and (5) we get
To obtain a 95% confidence interval we picked
(6)
which is below 0.002 for all ℓ ∈ {9, 10, …, 15} regardless of the pathogen length L.
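The expected matching probability of a uniformly random sequence depends only on how many distinct length-ℓ strings the reference sequence contains. A minimal sketch follows, with a hypothetical function name and the assumption that the reference sequence contains only the letters A, C, G, T.

```python
def expected_random_match_probability(reference, k):
    """Probability that a uniformly random length-k string occurs in `reference`.

    This is the number of distinct length-k substrings of the reference divided
    by the 4**k possible strings. It does not depend on the length of the
    random sequence being scored.
    """
    distinct = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    return len(distinct) / 4 ** k
```

For a real exome the set of distinct substrings becomes large as k grows, so memory use, rather than running time, is the practical constraint of this direct approach.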
A.2 Cancer channel
We describe how we obtained the cancer channel $Q_{c,\rho}$ for a given cancer and mutation rate. For each cancer c (Melanoma, NSCLC, Bladder cancer) we considered the set $S_c$ of patients in [8, Supplementary information, Table S2] with that cancer. Then, for every mutation α → β we empirically computed the average proportion of mutations across patients
$$p_c(\alpha \to \beta) = \frac{1}{|S_c|} \sum_{i \in S_c} p_c(i, \alpha \to \beta),$$
where pc(i, α → β) denotes the proportion of α → β mutations among all mutations in patient i and was computed from [8, Supplementary information, Table S2]. The probability that a nucleotide α in the normal exome results in nucleotide β in the cancer exome is therefore given by
$$Q_{c,\rho}(\beta \mid \alpha) = \rho \, \frac{p_c(\alpha \to \beta)}{p(\alpha)}$$
for β ≠ α and
$$Q_{c,\rho}(\alpha \mid \alpha) = 1 - \sum_{\beta \neq \alpha} Q_{c,\rho}(\beta \mid \alpha).$$
The parameter ρ denotes the overall mutation rate and p(α) denotes the relative number of nucleotide α in the exome and was computed from [8, Supplementary information, Table S2].
Remark. Because in the data from [8, Supplementary information, Table S2] complementary mutations were counted under the same category (e.g., a change from cytosine to thymine would be treated the same as a change from guanine to adenine), mutation types were considered in pairs. Since the relative proportions of complementary pairs were not given, we made the assumption that they were equal. Hence, in the above expression pc(i, α → β) actually corresponds to
$$\frac{1}{2} \, p_c\big(i, \{\alpha \to \beta, \ \alpha' \to \beta'\}\big),$$
where (α′, β′) is the complementary pair of (α, β).
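One way to turn average mutation proportions into a channel is sketched below. It assumes that the probability of α being mutated into β equals ρ times the share of α → β among all mutations, divided by the frequency of α in the exome, so that the overall fraction of mutated nucleotides is ρ. The function name, argument names, and dictionary layout are hypothetical.

```python
NUCLEOTIDES = "ACGT"


def build_cancer_channel(mutation_share, base_freq, rate):
    """Build channel[a][b], the probability that nucleotide a becomes b.

    mutation_share[(a, b)]: average share of a->b among all mutations
                            (shares over all a != b sum to 1).
    base_freq[a]:           relative frequency of nucleotide a in the exome.
    rate:                   overall mutation rate (fraction of mutated nucleotides).
    """
    channel = {}
    for a in NUCLEOTIDES:
        row = {
            b: rate * mutation_share.get((a, b), 0.0) / base_freq[a]
            for b in NUCLEOTIDES
            if b != a
        }
        stay = 1.0 - sum(row.values())  # probability that a is left unchanged
        if stay < 0.0:
            raise ValueError("rate too large for the given shares and frequencies")
        row[a] = stay
        channel[a] = row
    return channel
```

With this normalization, summing the mutation probability of each nucleotide weighted by its frequency recovers the overall rate, which gives a quick sanity check on any table built from real data.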
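The second column in Tables 2–9 represents sℓ(x, y) as a function of ℓ. The third column represents a 95% confidence interval for the mean of the cancer exome similarity score $s_\ell(x, \tilde{y})$, obtained through a standard application of the central limit theorem. This confidence interval is given by
$$\left[\bar{s}_\ell(x) - 1.96 \, \frac{\sigma}{\sqrt{1000}}, \ \bar{s}_\ell(x) + 1.96 \, \frac{\sigma}{\sqrt{1000}}\right],$$
where $\bar{s}_\ell(x)$ denotes the average of $s_\ell(x, \tilde{y})$ over the 1000 independent trials and where σ denotes the empirical standard deviation of $s_\ell(x, \tilde{y})$. The fourth column in the tables of Sections A.3-A.5 gives the p-value for Test 1 and the fifth column gives the p-value for Test 2.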
A.3 ρ = 0.0005.
A.4 ρ = 0.001.
A.5 ρ = 0.01.
A.6 Error probability data for Fig 6
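To compute the average error probability $P_e$ in (2) we proceeded as follows. We have
$$P_e = \frac{3}{L} \sum_{i=1}^{L/3} P(\tilde{a}_i \neq a_i), \tag{7}$$
where the summation ranges over amino acid positions. Let us compute $P(\tilde{a}_1 \neq a_1)$; for the other terms we proceed in the same way. Observe that a1 is a function of the first three nucleotides y1, y2, y3 of the normal exome y. To emphasize this, let us write a1 as a1(y1, y2, y3). Similarly, $\tilde{a}_1$ is a function of the first three nucleotides $\tilde{y}_1, \tilde{y}_2, \tilde{y}_3$ of the cancer exome $\tilde{y}$ and we write it as $\tilde{a}_1(\tilde{y}_1, \tilde{y}_2, \tilde{y}_3)$. Therefore, we have
$$P(\tilde{a}_1 \neq a_1) = \sum_{(\tilde{y}_1, \tilde{y}_2, \tilde{y}_3) \,:\, \tilde{a}_1(\tilde{y}_1, \tilde{y}_2, \tilde{y}_3) \neq a_1(y_1, y_2, y_3)} \ \prod_{j=1}^{3} Q_{c,\rho}(\tilde{y}_j \mid y_j), \tag{8}$$
where $Q_{c,\rho}$ is the cancer channel defined in the Appendix A.2.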
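The per-codon computation can be carried out exactly by enumerating all 64 possible output codons. The sketch below is an illustration under stated assumptions: it uses the standard genetic code, treats a stop codon as one more symbol (so a change to or from a stop counts as an amino-acid change), and expects the channel as a nested dictionary `channel[a][b]`. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in the conventional TCAG order of the three bases.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_change_probability(codon, channel):
    """Probability that `codon` codes for a different amino acid after the channel."""
    original = CODON_TABLE[codon]
    total = 0.0
    for mutated in product(BASES, repeat=3):
        if CODON_TABLE["".join(mutated)] != original:
            prob = 1.0
            for src, dst in zip(codon, mutated):
                prob *= channel[src][dst]  # nucleotides mutate independently
            total += prob
    return total


def average_error_probability(seq, channel):
    """Average of the per-codon change probabilities over the codons of `seq`."""
    usable = len(seq) - len(seq) % 3  # drop any incomplete trailing codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return sum(codon_change_probability(c, channel) for c in codons) / len(codons)
```

Since only 64 distinct codons exist, a full exome can be processed by counting codon frequencies once and weighting the 64 per-codon probabilities, which avoids any simulation.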
References
- 1. Hoos A. Development of Immuno-Oncology Drugs – from CTLA4 to PD1 to the next generations. Nature Reviews Drug Discovery. 2016;15:235–247. pmid:26965203
- 2. Mariathasan S, Turley SJ, Nickles D, Castiglioni A, Yuen K, Wang Y, et al. TGFβ attenuates tumour response to PD-L1 blockade by contributing to exclusion of T cells. Nature. 2018;554(7693):544. pmid:29443960
- 3. Chowell D, Morris LG, Grigg CM, Weber JK, Samstein RM, Makarov V, et al. Patient HLA class I genotype influences cancer response to checkpoint blockade immunotherapy. Science. 2018;359(6375):582–587. pmid:29217585
- 4. Riaz N, Havel JJ, Makarov V, Desrichard A, Urba WJ, Sims JS, et al. Tumor and microenvironment evolution during immunotherapy with nivolumab. Cell. 2017;171(4):934–949. pmid:29033130
- 5. Łuksza M, Riaz N, Makarov V, Balachandran VP, Hellmann MD, Solovyov A, et al. A neoantigen fitness model predicts tumour response to checkpoint blockade immunotherapy. Nature. 2017;551(7681):517. pmid:29132144
- 6. The Cancer Genome Atlas. cancergenome.nih.gov.
- 7. Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SA, Behjati S, Biankin AV, et al. Signatures of mutational processes in human cancer. Nature. 2013;500(7463):415–421. pmid:23945592
- 8. Lawrence MS, Stojanov P, Polak P, Kryukov GV, Cibulskis K, Sivachenko A, et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature. 2013;499(7457):214–218. pmid:23770567
- 9. Rooney MS, Shukla SA, Wu CJ, Getz G, Hacohen N. Molecular and genetic properties of tumors associated with local immune cytolytic activity. Cell. 2015;160(1):48–61. pmid:25594174
- 10. Van Allen EM, Miao D, Schilling B, Shukla SA, Blank C, Zimmer L, et al. Genomic correlates of response to CTLA-4 blockade in metastatic melanoma. Science. 2015;350(6257):207–211. pmid:26359337
- 11. Snyder A, Makarov V, Merghoub T, Yuan J, Zaretsky JM, Desrichard A, et al. Genetic basis for clinical response to CTLA-4 blockade in melanoma. New England Journal of Medicine. 2014;371(23):2189–2199. pmid:25409260
- 12. Rizvi NA, Hellmann MD, Snyder A, Kvistborg P, Makarov V, Havel JJ, et al. Mutational landscape determines sensitivity to PD-1 blockade in non–small cell lung cancer. Science. 2015;348(6230):124–128. pmid:25765070
- 13. Rosenberg JE, et al. Atezolizumab in patients with locally advanced and metastatic urothelial carcinoma who have progressed following treatment with platinum-based chemotherapy: a single-arm, multicentre, phase 2 trial. The Lancet. 2016;387:1909–1920.
- 14. Le DT, Uram JN, Wang H, Bartlett BR, Kemberling H, Eyring AD, et al. PD-1 blockade in tumors with mismatch-repair deficiency. New England Journal of Medicine. 2015;372(26):2509–2520. pmid:26028255
- 15. Hugo W, et al. Genomic and Transcriptomic Features of Response to Anti-PD-1 Therapy in Metastatic Melanoma. Cell. 2016;165:35–44.
- 16. Eroglu Z, Zaretsky JM, Hu-Lieskovan S, Kim DW, Algazi A, Johnson DB, et al. High response rate to PD-1 blockade in desmoplastic melanomas. Nature. 2018. pmid:29320474
- 17. Wieczorek M, Abualrous ET, Sticht J, Álvaro-Benito M, Stolzenberg S, Noé F, et al. Major histocompatibility complex (MHC) class I and MHC class II proteins: conformational plasticity in antigen presentation. Frontiers in immunology. 2017;8:292. pmid:28367149
- 18. Cover TM, Thomas JA. Elements of information theory. John Wiley & Sons; 2012.