An in vitro model of tumor heterogeneity resolves genetic, epigenetic, and stochastic sources of cell state variability

Tumor heterogeneity is a primary cause of treatment failure and acquired resistance in cancer patients. Even in cancers driven by a single mutated oncogene, variability in response to targeted therapies is well known. The existence of additional genomic alterations among tumor cells can only partially explain this variability. As such, nongenetic factors are increasingly seen as critical contributors to tumor relapse and acquired resistance in cancer. Here, we show that both genetic and nongenetic factors contribute to targeted drug response variability in an experimental model of tumor heterogeneity. We observe significant variability to epidermal growth factor receptor (EGFR) inhibition among and within multiple versions and clonal sublines of PC9, a commonly used EGFR mutant nonsmall cell lung cancer (NSCLC) cell line. We resolve genetic, epigenetic, and stochastic components of this variability using a theoretical framework in which distinct genetic states give rise to multiple epigenetic “basins of attraction,” across which cells can transition driven by stochastic noise. Using mutational impact analysis, single-cell differential gene expression, and correlations among Gene Ontology (GO) terms to connect genomics to transcriptomics, we establish a baseline for genetic differences driving drug response variability among PC9 cell line versions. Applying the same approach to clonal sublines, we conclude that drug response variability in all but one of the sublines is due to epigenetic differences; in the other, it is due to genetic alterations. Finally, using a clonal drug response assay together with stochastic simulations, we attribute subclonal drug response variability within sublines to stochastic cell fate decisions and confirm that one subline likely contains genetic resistance mutations that emerged in the absence of drug treatment.


Novelty and utility of the genetic-epigenetic-stochastic (G/E/S) framework. The manuscript
pseudo-presents a set of ideas at the beginning as if they have never been described before. In fact, prior work by the authors follows almost the exact same organizing principles, with "genotype" substituted for "drug treatment" (PMID 29590606)

. It is fine to build off past examples-simply cite the best and most-informative examples and get on with the new results
here. Second, while I do not doubt that the G/E/S framework provides a straightforward physicsbased analogy, I question whether it provides more than a cartooned explanation of the findings. The framework has no predictive value, and the sketches are so flexible that it appears to be consistent with whatever data are at hand. Is there anything that could falsify the G/E/S framework? A theory that explains everything explains nothing.
Response: We thank the reviewer for this comment and apologize for any confusion that our presentation of the G/E/S framework may have caused. We certainly do not claim that these ideas are novel to us, nor that they have never been described before. To the contrary, we explicitly stated at the beginning of the third paragraph of the original manuscript that this three-tiered view of tumor heterogeneity has been proposed previously and provided numerous citations to relevant works. We followed that with a statement, which we believe to be true, that these ideas have not been fully accepted within the cancer research community because of a lack of strong experimental evidence to support them. In making these statements, we were attempting to frame the purpose of our paper as providing such evidence. Indeed, as the Reviewer notes, we used the same theoretical framework in our previous publication (PMID: 29590606) but did not include molecular or genomic profiling. This work builds on that prior work, albeit in a different cancer type. We recognize, however, that by focusing so heavily on the framework we failed to convey this message. Therefore, we have made significant changes to the manuscript to reduce the focus on the G/E/S framework (a term we no longer use) and instead focus on the experimental data that we collected and the model simulations we performed. For example, we modified the title to remove any reference to the framework. Also, we now present the framework briefly in the second paragraph of the Introduction, refer the reader to the Supplementary Text for additional details, and move immediately into the Results. We return to the framework in the second paragraph of the Discussion, where we use it to interpret our results and draw conclusions. We follow that with a paragraph where we briefly review some recent papers that we believe are consistent with the framework. We then conclude the manuscript with a paragraph where we use the framework to muse on possible reasons for why many modern cancer therapies fail and hypothesize on strategies that may prove more effective. We hope these changes more clearly communicate the goals of our manuscript and convey the proper deference to the wide body of literature on which our work is based.
With regard to the predictive value of the framework, we do believe that it can produce testable hypotheses and be falsified. A simple example of this are the transcriptomic profiles of the sublines. The theory asserts that single-cell cloning will produce genetically and epigenetically homogenous populations, at least for some period of time (depending on the depths of the basins). Therefore, one prediction would be that the transcriptomic profiles of the PC9-VU sublines will overlap with different portions of the PC9-VU parental transcriptome, which we indeed see. If we did not observe this in our data for the majority of the sublines, that would call into question the validity of the theory. Furthermore, the DS8 subline is an obvious exception. Since its transcriptomic profile is so distinct from that of the PC9-VU parental line, one would predict that it is genetically distinct, which we argue is true based on our data. If this were not the case, it would again call into question the validity of the theory. A third example relates to the outlier state in PC9-VU and DS9 that resides in the same region of transcriptomic space as DS8 (also see our response to Major Point #5 below). The nature of this state is unclear at present but in the Supplementary Text and Supplementary FIG. S14 we propose two hypotheses based on the G/E/S framework that could be tested experimentally. Moreover, as the Reviewer also notes (Major Point #5), this state could be similar to the preresistant state reported by Shaffer et al. (PMID: 28607484). An important question, however, is whether cells linger in that state or simply transition in and out randomly due to stochastic noise. The G/E/S framework would predict that cells linger since the state is a basin in an epigenetic landscape. In principle, this prediction could be tested by experimentally measuring residence times in the state and comparing against timescales associated with random fluctuations.
We hope that our responses above are convincing enough to assuage the Reviewer's concerns regarding the usefulness and legitimacy of the G/E/S framework as a theoretical concept. We also refer the Reviewer to our response to a similar question below by Reviewer #2 (Major Concern #2).
2. Lack of clarity on the meaning of "Unknown" exonic mutations. Amidst all the text in the Supplementary Note defining SNPs, indels, and missense mutations, I can find no explanation for how exome sequencing can yield an "Unknown" mutation. The only way I can fathom is if the capture beads pulled down collateral bits of intronic and extragenic sequence, but then those mutations should be omitted from the data reporting as they are not in exons. Most of the mutations reported in Fig. 3D and S10D fall into the "Unknown" category, so it is critical to know what these are.
Response: We apologize for the lack of clarity. "Unknown" corresponds to a category of mutations detected by the dNdScv tool that are not single-nucleotide variations. The unknown mutations have been annotated as the different types of mutations (primarily inframe and frameshift insertions/deletions). A small number of complex substitutions were also noted as Unknown mutations. With the addition of copy number variation data (revised FIGS. 3E and 4E; see response to Major Point #4 below), we have removed the complex mutations from the dataset. These changes have been updated in the figures (revised FIGS. 3D and 4D).
3. Lack of consistency in exome sequencing results. There are also inconsistencies in the exome data that were not acknowledged. For example, ZNF717 is reported to be mutated in three of the four clones in Fig. 4D. One would expect that frequency of mutation to be picked up in the PC9-VU bulk population, but no ZNF717 mutation is reported in Fig. 3D. Furthermore, only two ZNF717 mutations are reported among the five clones shown in Fig. S10B, which is supposed to be a oneclone expansion of the data in Fig. 4D.
Response: The inconsistencies noted by the reviewer were due to running the dNdScv tool on different subsets of the data. When running the tool on the entire 8 sample cohort together, most of those inconsistencies were resolved. Based on a preliminary check, depending on the samples input into the algorithm, certain samples pass or fail a specified confidence threshold (dNdS = number of nonsynonymous divided by synonymous mutations in a specific gene across all input samples). In the case of specific mutations (such as ZNF717), samples were close to the threshold. With all 8 samples together, a few genes (MUC3A, GOLGA6L7, MADCAM1, GOLGA8F, CHD4, GOLGA6L4, IGFN1) still exhibited mutations across multiple samples. These mutations were in different locations in the gene and, therefore, considered different mutational events. For clarity, those events were removed from the visualization so that only unique mutated genes (instead of distinct events) are included. Improved visualizations can be seen in FIGS. 3D and 4D. Furthermore, frequencies of mutations can (and probably were) muddled by the heterogeneity of each cell population. Cell line versions, like PC9-VU, likely had more genomic variability than the more homogeneous single-cell derived sublines. Due to the thresholds needed for variant detection, this variability would actually result in fewer detected mutations in a heterogeneous population because the confidence needed to detect these mutations would not be acheived. However, in a homogeneous population, mutations likely exist in all of the cells and, therefore, would have more statistical confidence. This result is present in the total number of detected variants ( Supplementary FIG. S4A) as well as the large overlap between PC9-VU and its derived clones ( Supplementary FIG. S4E). However, as mentioned above, running the dNdScv tool with all 8 samples combined has resolved this inconsistency.
4. Copy number alterations are ignored. The experimental implementation of the G/E/S framework focuses entirely on mutations and does not consider the impact of chromosome stability and copy-number alterations. In a larger, thematically-similar study of HeLa cells (PMID 30778230), copy-number alterations were a driving force for heterogeneity among derivatives of the line. The authors could address this concern without any further experiments by applying inferCNV to their bulk RNA-seq data on each PC9 derivative and clone. CCLE RNA-seq data on PC9s could serve as a "normal" starting reference, assuming that CCLE obtained their PC9s directly from the original source.
Response: We thank the reviewer for this suggestion. In the revised manuscript, we now use inferCNV on our single-cell RNA-seq data to predict potential copy number variations. Those results are now shown in revised FIGS. 3E and 4E. Specifically, cell line versions (revised FIG. 3E) were run with no reference set (such as CCLE data, to avoid potential batch effect errors), while sublines (revised FIG. 4E) were run with PC9-VU as the reference (revised Supplementary  FIG. S11). PC9-MGH seems to have the clearest changes in the cell line version cohort, with amplifications in chromosomes 6, 11, and 22 and a deletion in chromosomes 16 and 19. In the subline cohort, DS8 has major amplifications in chromosomes 6, 18, and 22 and major deletions in chromosomes 7 and 13. These results are now discussed in the second and third subsections of "Results" in the main text. Response: We agree with the reviewer on this point and have tried in the Supplementary Text ("Genetic, epigenetic, and stochastic sources of tumor heterogeneity" -formerly in the main text) to emphasize heritability and characteristic timescale as ways to differentiate epigenetic processes from stochastic processes. We generally think of the different sublines as corresponding to different epigenetic states and the spread of cells within the transcriptomic spaces of the sublines as being reflective of stochastic variability. However, the outlier state in PC9-VU and DS9 that resides in the same region of transcriptomic space as DS8 is admittedly a bit of a mystery at this point. Our intuition is that it corresponds to a shallow basin within the PC9-VU epigenetic landscape, i.e., it is less stable than the other states but more than simply the result of stochastic fluctuations. It is possible that a cell in this state acquired a genetic mutation that stabilized the state, resulting in what we see in DS8. We have added some language to this effect at the end of the third paragraph of the Discussion, where we cite the two works mentioned above by the Reviewer. We have also modified Supplementary FIG. S14 to illustrate these concepts visually. Of course, we cannot know whether our intuition is correct until we assess the lifetime of the state experimentally, as the Reviewer suggests. Unfortunately, with our current data, there are not enough DS9/PC9-VU cells (~2%) that fall within the DS8-like state to differentiate them from other cells in their respective groups, making biomarker identification infeasible. However, we plan to pursue this question further in the future. We are currently working on a follow-up study in melanoma on phenotypic state transitions that is similar in scope and which we hope to expand to PC9 and other EGFR-mutant NSCLC cell lines shortly.

Equating transcriptional and epigenetic states. The authors acknowledge that epigenetics is
6. Proof-by-intimidation reporting of the omics data. Whereas the cFP and DIP experiments are generally clear, well described, and informative, the omics data is presented in a manner that overwhelms the reader more than it empowers them. Taking Fig. 3D-E as a case in point, the text is illegible at anything approaching a normal page, the shading that redundantly encodes the bar graph to the right is imperceptible even on a retina display, and the cancer genes list is effectively devoid of information (while being crammed with gridded lines and microscopic font). I would recommend a phased presentation, where the most-critical results are presented in main figures, and the gory details are presented ONCE in the supplementary figures (i.e., Figure  7. Excluding trivial clone-to-clone differences. As understood, the PC9-VU clones were isolated from a polyclonal population of PC9-VU cells ectopically expressing H2B-mCherry. The abundance of the H2B marker could readily change the epigenetic state of cells, as mCherry is much larger than H2B, and there are many unpublished anecdotes of H2B-mCherry cells behaving oddly, possibly the result of artifactually opening chromatin that should normally be closed. At the minimum, the authors should assess the relative abundance of H2B-mCherry in the different clones and confirm that the overall expression of the label is not dramatically different among clones. Response: We thank the reviewer for this suggestion. PC9-VU clones were isolated from a polyclonal population of PC9-VU cells expressing H2B-RFP. Prior work from our lab (PMID: 22886092, PMID: 27135974) has shown no discernible effect of the RFP molecule on drugresponse phenotypes. To directly address the Reviewer's concern, we provide bulk RNA-seq normalized expression data among H2B gene family members across all eight samples in the study (see below). We see minimal variation in H2B expression level across these samples. Please note that we have not included this data in the manuscript in an attempt to limit length and number of figures as much as possible.
Minor points: 1. The "semantic similarity" is possibly a good way to bring together the mutational data with the transcriptomic data. However, there must be some type of interval estimate on the score and the polyclonal baseline used for comparison. Stability of the scores could be estimated by cross validation.
Response: We agree with the Reviewer on this point. To address this issue, we have modified the method for calculating the semantic similarity score to generate a 95% confidence interval as well as a baseline. In Supplementary FIG. S10 of the revised manuscript, we present these scores for each cell line version and subline normalized relative to their baselines (i.e., scores > 0 are considered significant). Note that we have moved the semantic similarity analysis into the Supplementary Figures because the analysis is somewhat more nuanced and less clear given these changes. We have replaced this analysis in the main text with a more straightforward correlation analysis of shared GO terms between data modalities (IMPACT mutations and differentially expressed genes). We also refer the Reviewer to our response to Reviewer #2 (Major Concern #5), where we provide a more detailed explanation of the modifications that we have made to the calculation of the semantic similarity score as well as our rationale for moving the results of the analysis to the Supplementary Figures.
2. I can understand the motivation for "withholding" the DS8 clone in the presentation, but it really exacerbates the bloat of the manuscript. Also, in the presentation of the first four clones, it is not clear why DS9 and DS3 were selected, as they are very similar in their response to EGFR inhibition. Perhaps it is better to frame those two clones as a "within group" comparison to estimate how different the transcriptional states can be and yet yield similar drug-induced responses.
Response: We thank the reviewer for this comment. By presenting the DS8 subclone near the end of the "Results" section we were attempting to simplify the presentation since our conclusion is that it is genetically distinct from the other subclones. However, we understand that this had the opposite effect and only confused the presentation. To address this, all data duplication of the sublines has been removed from the figures and we now present all omics data from the subclones in FIG. 4

of the revised manuscript and in various Supplementary Figures. We have also integrated the section on DS8 from the original manuscript into the subsections in the revised manuscript on subline heterogeneity and model results (third and fourth subsections of the Supplementary Text, respectively).
With regard to the choice of the four sublines (DS3, DS6, DS7, DS9), we based the decision on their DIP rate distributions rather than their population-level drug responses, which can be deceiving. As mentioned at the beginning of "One PC9-VU subline is genetically distinct, while all others are transcriptomically distinct from each other" in the revised manuscript, DS3 has a peak in the negative DIP rate range, DS6 has a peak near zero, and DS7 and DS9 have very similar distributions in the positive DIP rate range. If we had the chance to redo our selections, we may have decided to choose to DS4 instead of one of DS7 or DS9. However, we do not believe that this would have affected the primary conclusions of this work. Importantly, we were limited to four (of six) of these sublines because, together with DS8 and the three cell line versions, this was the maximum number of samples we could run on a single 10x Genomics single-cell RNA-seq experiment (using cell hashing) and comfortably get more than 1000 cells per sample. This allowed us to avoid batch effects that could have complicated our analyses.
3. The formulation of the parameter inference of the stochastic DIP models is generally well done. However, I am not clear on why the division rates appear to flat-line at a division rate of ~0.045. I hope that was not simply the maximum division rate tested. Also, it is inappropriate to use large p values as a metric of goodness of fit. Much better would be to report the K-S statistic and shade to indicate the minimum value(s) that are deemed of interest.
Response: We apologize for not including the full range of parameter pairs in our original parameter analysis. In the revised manuscript, we have rectified this by including division rate constants up to 0.08 h -1 (revised FIGS. 5D,H and Supplementary FIG. S13D). With regard to p-values, we no longer shade tiles in these figures based on p-value. We simply define a p-value cutoff and shade all tiles that exceed the threshold a color corresponding to the subline. Please note that based on a suggestion by Reviewer #2 (see Minor Comment #1) we now use the Anderson-Darling (A-D) test rather than the Kolmogorov-Smirnov test. This involved bootstrapping the experimental DIP rate distribution (100 times) and comparing each to the simulated DIP rate distribution to generate a distribution of p-values. If the <p> -sdev(p) > 0.05, we considered the comparison to be significant.
4. This is not entirely the fault of the authors, but I find the V3 output of the GeneRanger software annoying for GEO uploads. The barcodes file precludes reviewers from quickly spot-checking the raw data against the figures. If the gene names could be exported along with the unique barcodes, that would be preferred.
Response: We will submit a request to GEO to add the full matrix (with row and column names) to the database before publication of our manuscript. 2. Page 7: The statement "The result above illustrate…" is inaccurate. The only thing that can be said is that the mutations are associated with transcriptional differences.
Response: We apologize for the confusion. The full quote that the Reviewer is referring to is "The results above illustrate that genetic differences among cell line versions translate to epigenetic differences at the transcriptomic level." In this work, we are defining "epigenetic" to mean any heritable state of a cell that is not associated with the DNA. As discussed in the revised Supplementary Text ("Relationship between transcriptomics and epigenetics"), the transcriptomic state is just one lens through which the epigenetic state can be viewed. However, to avoid confusion, we have removed the sentence that the Reviewer is referring to from the main text and rewritten the paragraph (page 6 of the revised manuscript) discussing the quantification of the connection between genomic and transcriptomic states in the cell line versions.
3. Page 11: The statement "However, individual basins are not discernable." is false. The outlier basin is clearly visible in Fig. 3G.
Response: We apologize for the confusion. In this sentence, we were specifically referring to the scRNA-seq data for the PC9-VU cell line version shown in FIG. 3F of the main text. We were trying to make the point that it was not obvious from that data that there were discrete states "hidden" within it but obscured by stochastic fluctuations. This was meant to set up the statements in the following sentences about how the scRNA-seq data for the different sublines overlap with different portions of the PC9-VU parental transcriptomic space (FIG. 4F of the main text), which we view as evidence that PC9-VU has an associated epigenetic landscape with multiple phenotypic attractors (FIGS. 1 and 6 of the main text). We maintain this argument in the revised manuscript but have removed the sentence in question to avoid confusion. Figure S8: There is no need for a legend here. Just label the data points.

4.
Response: As suggested, in Supplementary FIG. S8 we have removed the legend and labeled the data points by sample.
Response: We thank the Reviewer for pointing this out. The Supplementary Text has been completely rewritten in the revised manuscript. The sentence that the Reviewer is referring to is no longer present.

Reviewer #2: [identifies himself as Aaron S Meyer]
Overall Comments: Hayford et al. provide a framework to deconvolve drug response variability into genetic, epigenetic, and stochastic sources in an in vitro heterogeneous tumor model. They explore the genetic and epigenetic differences that explain drug response variability using mutational impact, single-cell differential gene expression, and semantic similarity of gene ontology analysis across 3 versions of the PC9 NSCLC cell line and 8 sublines of PC9-VU. The authors conclude that the cell line versions, as well as single cell-derived sublines differ at genetic and transcriptomic levels. They argue that their framework could be employed to account for all levels of heterogeneity when evaluating cancer treatment.
Overall, the manuscript presents a comprehensive analysis of heterogeneity within a cell line model. It presents a valuable analysis that would be of interest to the drug response and tumor heterogeneity communities. Before it is suitable for publication, however, there are some major concerns outlined below that must be resolved. I expect the authors can fully address these concerns with text changes and some additional computational analysis.
Response: We thank Dr. Meyer for these overall comments. We address each aspect pointwise below.
Major concerns: 1. I appreciate that the authors have aimed to be comprehensive in their definitions and discussion. However, parts of the paper seem very wordy. As one example, genetic, epigenetic, and stochastic heterogeneity are defined within separate sections, and then again defined within a supplementary table. If there are suitable references the paper can use to explain these definitions, this could help to focus the paper on the new results presented here. It would also make the manuscript more accessible overall.
Response: We thank Dr. Meyer for this comment. In the revised manuscript, we have moved the descriptions of the three levels of heterogeneity (genetic, epigenetic, and stochastic) to the Supplementary Text ("Genetic, epigenetic, and stochastic sources of tumor heterogeneity") and have reduced the number and lengths of the definitions in Supplementary Table S1. We believe the manuscript is now much more streamlined and accessible.

I understand the comparison to the ideas proposed by Waddington but think this needs (1) be clarified, and (2) its ultimate usefulness highlighted. The Waddington landscape model specifically describes a developmental process, and how barriers between states might increase as development progresses. Where is the analogous process here? In what way can one use the attractor model as a testable hypothesis, or is this a tautological idea?
Response: We appreciate Dr. Meyer's comments and welcome the opportunity to clarify our thoughts on these important issues. We acknowledge that Waddington's landscape was developed as a conceptual framework for understanding cellular differentiation during development. However, we do not believe that the framework is exclusive to that setting. Rather, we view development, or more specifically, a developmental hierarchy as a certain type of Waddington landscape comprising successive basins of decreasing quasi-potential energy. As cells traverse down this hierarchy, from stem cell to progenitor cell and so on, reverse transitions, or "de-differentiation", become increasingly unlikely. This explains why barriers between states increase as development progresses. In cancer, we believe the same processes occur, except without a well-defined hierarchy. This is why tumors are composed of multiple phenotypes, some of which are "stem-like", the degree of stemness being a function of how deep or shallow a basin is within the landscape. This perspective has been discussed by many others, including Marusyk et al. (2012) (PMID: 22513401). We have added some language to the end of the second paragraph of the Introduction to address the issue of Waddington's landscape in development and how it applies in cancer.
With regard to the usefulness of the Waddington landscape concept in cancer, we argue that it can be extremely valuable in deciphering and interpreting complex and sometimes contradictory experimental results. For example, in a recent publication of ours (PMID: 29590606), we observed melanoma cell populations that when treated with a BRAF inhibitor expanded, regressed, expanded again, and then entered a state of balanced division and death that we termed "idling". At first, we had a hard time interpreting this behavior. However, once we started thinking in terms of epigenetic landscapes and built a simple model similar to that used in this work except with transitions between states, the explanation became clear: the drug changed the landscape, knocking the population out of equilibrium; the complex dynamics we observed were the result of a re-equilibration of the population over the new, drug-modified landscape. We also believe that the Waddington landscape concept can explain why therapies based on targeting specific genetic alterations or rare cell types, i.e., cancer stem cells (CSCs), so often fail. In the first case, since each genetic state has an associated epigenetic landscape with multiple phenotypes that cells diversify across, some of these phenotypes may be suited, by their molecular makeup, to survive drug treatments targeting that genetic state. These surviving cells may then acquire genetic resistance mutations, leading to tumor recurrence and metastasis. In the second case, a CSC can be thought of as a shallow basin within an epigenetic landscape. Killing cells in that basin does not eliminate the basin. Thus, once the drug is removed, cells from "adjacent" basins can transition and repopulate the basin, restarting tumor growth. We discuss these issues and more in the final paragraph of the Discussion section.

Finally, we do believe that the Waddington landscape model can produce testable hypotheses. For example, if the PC9-VU subline transcriptomes did not overlap with different portions of the PC9-VU parental transcriptome, that would have severely damaged our hypothesis that the sublines constitute individual basins within a common PC9-VU epigenetic landscape.
Moreover, the fact that the DS8 subline transcriptome does not overlap with that of PC9-VU led to the hypothesis that it is genetically distinct from PC9-VU, which was verified by our whole-exome data. Another example that has yet to be tested relates to whether the genetic mutation within DS8 was acquired before or after the subline was established. In the Supplementary Text ("Multiple possible routes to a new genetic state in DS8") and Supplementary FIG. S14, we present two possible scenarios based on the concept of the Waddington landscape. These hypotheses could be tested through some combination of singlecell cloning of DS8, scRNA-seq, and DNA barcoding to track lineages within co-cultured populations. We are currently devising experiments along these lines.
We also refer Dr. Meyer to our response to a similar question above by Reviewer #1 (Major Point #1).

The authors must at least acknowledge that heterogeneity can arise from environmental and pharmacologic differences. There is extensive literature on the contribution of these factors, even within simple tissue culture systems.
Response: We agree with Dr. Meyer and thank him for this comment. Environmental and pharmacological factors are actually implicitly accounted for in the genetic-epigeneticstochastic framework that we advocate for in this manuscript. Environmental factors can be seen as inputs or boundary conditions to the molecular networks that underlie the epigenetic landscape. Therefore, similar to the effect of genetic mutations (mentioned briefly in the Introduction), changes in the environment can alter the nature of the landscape, changing the number, types, and abundance of phenotypic states within a tumor. This is true for microenvironmental differences between primary and metastatic tumors as well as macroscale factors that act on the level of the patient, such as pollution or diet. Fluctuations in these environmental factors can also drive phenotypic state transitions, in the same way that intrinsic fluctuations in gene expression do.
Pharmacological factors have a similar effect but since they are usually small-molecule drugs, we think of them as new molecular species within the network, rather than inputs or boundary conditions. Adding new molecular species to a network can obviously change the nature of the epigenetic landscape. This concept is actually the basis for a recent publication from our lab (PMID: 29590606), where we measured the response of BRAF-mutant melanoma cell populations to a BRAF inhibitor over time. We used a three-state population dynamics model similar to the one used in this work, except we allowed phenotypic transitions between states. Our conclusion from that study was that the BRAF inhibitor changed the topography of the epigenetic landscape and that the complex short-term dynamics that we observed at the cell population level were due to a re-equilibration of cells across the drug-modified landscape.

To address Dr. Meyer's concern, we have added sentences to the first and second paragraphs of the Introduction listing out various sources of non-genetic heterogeneity and how changes in environmental and pharmacological factors can alter epigenetic landscapes within a tumor.
We hope this, together with the explanations above, sufficiently address this issue.

The tense throughout the manuscript is inconsistent which leads to confusion. The authors seem to start by referring to their experiments in the past, and observations in the present, which reads well. However, this changes a few times throughout the text.
Response: We apologize for this oversight. We have revised the manuscript to use consistent tense throughout, with experiments referred to in past tense and observations in present tense, as suggested.

Semantic similarity is an interesting idea for demonstrating conserved molecular programs.
Can the semantic similarity score be evaluated compared to a meaningful null model for significance? For example, randomized gene sets of the same size from those genes expressed in PC9? Maybe it can also be shown that the similarity between types of molecular measurements is higher within a cell line (e.g. BR8) as compared to between lines? I am concerned that GO enrichment might simply be a reflection of the genes expressed in this cell line, and that semantic similarity simply scales with the number of impacted genes.
Response: We would like to thank Dr. Meyer for this comment. In response to this comment and those from the other Reviewers, we have performed an exhaustive investigation into the semantic similarity metric and discovered that it does appear to scale (at least partially) with the number of impacted genes (see plot below). Since the score averages the maximum values from each column and row of the GO semantic similarity matrix to arrive at a single value, adding more GO terms (from more genes) acts to increase the average. Put simply, if we have a list of values x, the maximum value of x can only increase as more values are added to the list, it can never decrease. Thus, semantic similarity scores for samples and modalities with more genes tend to skew higher than those with fewer genes. To address this issue, we have modified the implementation of the metric in the following ways:

Developed a confidence interval metric -Instead of choosing GO terms to be input into the
semantic similarity calculation based on a p-value threshold (p<0.05), as in the original method), GO terms are now selected if r < 1-p, where r is a random number between 0 and 1. This creates longer GO term lists, which are then fed into the semantic similarity calculation. But instead of creating a "max average", as in the original version, we take the largest 1000 scores from the similarity matrix and calculate a median and 95% confidence interval.

Created a baseline score with confidence intervals -This score is created by compiling, for each sample (cell line version or subline), a random list of genes (from the hg38 reference)
with the same number of terms as those for each data modality. These random gene lists are run through the semantic similarity pipeline to generate a baseline median score and confidence interval (as described in point 1 above). The baseline median value + one standard deviation is subtracted from the experimental semantic similarity score to produce a normalized value. Thus, semantic similarity scores > 0 are considered significant.
Furthermore, there is also a question as to how much confidence to assign to the different GO term categories: "Biological Process" (BP), "Molecular Function" (MF), and "Cellular Component" (CC). The categories have drastically different numbers of terms in the GO network structure (BP ~14000 >> MF ~6000 > CC ~2000). We have taken the position here that the number of terms is a reflection of the "biological knowledge" contained within each category. Thus, we put more stock in the BP semantic similarity score than we do in that for MF and more in MF than for CC. This makes the interpretation of the semantic similarity analysis more nuanced. Because of this, we decided to move the results of this analysis to Supplementary FIG. S10. In the main text, in place of the semantic similarity analysis, we have added correlation plots for shared GO terms across modalities. We performed a GO over-enrichment analysis on each gene list (as in the first part of the semantic similarity calculation; see above) and identified terms that are significantly over enriched (p<0.05) for both modalities (IMPACT mutations and differentially expressed genes). We then plotted these shared terms on axes of log p-values, along with correlation coefficients if there were enough shared terms to warrant it (revised FIGS. 3G and 4G). We saw that, generally speaking, cell line versions (other than PC9-VU) and DS8 have positive correlation coefficients, while the other sublines have negative coefficients or not enough terms to calculate a correlation. Together with the semantic similarity analysis, we feel these results support our position that the cell line versions and the DS8 subline are genetically distinct but the other PC9-VU sublines are not. We have modified the main text to make this point.
6. The author's conclusions from their SSA model are overreaching and inconsistent. I agree that they can conclude their DIP rate measurements are consistent with the SSA model but do not agree that they can say much more. It is unclear to what extent their measurements are powered to reveal other forms of heterogeneity if present. Second, they are using one measure of phenotypic heterogeneity, when "epigenetic" heterogeneity is a much more encompassing term (e.g. drug response to other compounds). Third, their parameter uncertainty is simply that, and not evidence that the cells themselves fluctuate.
Response: We would like to thank Dr. Meyer for this comment and apologize for any confusion that our discussion of the SSA results may have caused. As Dr. Meyer points out, the most important conclusion of these results is that the DIP rate distributions for all of the PC9-VU sublines (other than DS8) can be described by a model with a single cell state. This provides strong evidence that the sublines are monoclonal ( revised FIG. 5), which is consistent with the idea that they constitute basins within an epigenetic landscape (revised FIGS. 1 and 6). We have modified the main text to emphasize this point exclusively.
Other conclusions that we could draw from these results are admittedly more subtle and we have removed them from the manuscript to avoid confusion. Conceptually, we were attempting to connect DIP rate distributions to intracellular fluctuations in molecular species concentrations. Since intracellular signaling networks govern cell fate decisions, it follows that fluctuations in the concentrations of species involved in cell fate decision making would result in randomness in division and death events and, hence, distributions of DIP rates. However, we are obviously not measuring fluctuations in protein levels to directly tie to cell fate decisions in this work and our main conclusions are not reliant on making this direct connection. Hence, we made the decision to remove these statements from the manuscript.
Also, we apologize for any confusion regarding the terms "phenotypic heterogeneity" and "epigenetic heterogeneity", which we use effectively interchangeably. Much of the material provided in the Supplementary Text (which was originally in the main text) and the definitions in Supplementary Table S1 are attempts to avoid confusion on such issues. Basically, we define a phenotypic, or epigenetic, state as a basin within an epigenetic landscape. Thus, a phenotypically, or epigenetically, heterogeneous population is one in which cells occupy numerous such basins. Examples are the cell line versions (VU, MGH, BR1); counterexamples are the monoclonal PC9-VU sublines (DS3, DS6, DS7, DS9). Critically, the epigenetic landscape is defined in terms of a set of intracellular molecular species, i.e., the molecular "state space". Which species define the landscape depends on context, e.g., in our case, the context is sensitivity to the EGFR inhibitor erlotinib, so the state space is technically defined by all molecular species involved in cellular response to erlotinib. However, this means that the phenotypic, or epigenetic, states can be different in different drugs. DS3 and DS6, for example, which are distinguishable in erlotinib (both by DIP rate and transcriptomic state) may not be under a different drug; conversely, DS7 and DS9, which cannot be distinguished in erlotinib, may be distinguishable under a different drug. Thus, while we agree with Dr. Meyer's statement that epigenetic heterogeneity can broadly refer to drug response to multiple compounds, we are using it in a narrower sense in our manuscript to mean degree of dispersion across multiple epigenetic basins within a specific context. We hope this explanation is sufficient to alleviate any concerns regarding this issue.
Minor comments: 1. When comparing the SSA model to DIP measurements, the Anderson-Darling test should be more sensitive to differences as compared to the Kolmogorov-Smirnov test. Also, bootstrap resampling is necessary to prevent bias in the test statistic. This is summarized well here: https://asaip.psu.edu/articles/beware-the-kolmogorov-smirnov-test/.
Response: We thank Dr. Meyer for this comment and online resource. In the revised manuscript, we have changed the statistical test to the Anderson-Darling test and bootstrap resampled the data (100 times) to compare to each simulated result, creating a range of pvalues. We chose <p> -sdev(p) > 0.05 as the condition under which we cannot reject the null hypothesis that the experimental and simulated distributions are the same. In addition, we have updated revised FIG. D,H and Supplementary FIG. S13D to remove shading based on pvalue and simply color the tiles if this condition is met (see also response to Reviewer #1, Minor Point #3). figure 2B, the y-axis is density-is the magnitude of this quantity meaningful (e.g. in units of cell number)? If not, it would be better to normalize the distributions such that the y-axis would be 0-1. FIG. 2B,D is in units of [# of colonies / DIP rate], so that the areas under the curves equal 1. We did this to facilitate direct comparisons among the distributions since each is based on a different number of cFP colonies. To avoid confusion, we have changed the y-axis labels in these plots to "Number of Colonies / DIP Rate". figure 3, it is not clear to which plot "CV=12.84" belongs. Same in figure 4.

Response: We apologize for the confusion. The CV values correspond to panels (A) in Figs. 3 and 4. We have modified the figures by moving the values into the middle of the circular bar plots to avoid confusion.
Reviewer #3:

Overall Comments:
Hayford et al tackle the issue of disentangling the genetic, epigenetic and stochastic sources of heterogeneity in cancer response to treatment. The authors tackle this problem with a combined approach of theoretical and mathematical modeling with genomics, single cell transcriptomic and experiments on drug response in three cancer cell lines and several monoclonal sublines derived from one of these three lines. This is a very important problem which has relevant repercussions on drug design and development of therapeutic strategies in cancer. The system and the approach used are clever and effective and the results are interesting. However, I think the authors could have done a better job in analyzing the large amount of data they produced, in particular the transcriptomic data, which could provide a much more detailed and informative characterization of the epigenetic landscape of the system. Moreover, the manuscript is massive, highly redundant and difficult to read because the information is poorly organized and scattered through main text, methods and supplementary note (the pdf is also missing line numbers which makes it difficult to review). Most importantly, the results can be presented in a much more concise and effective way. As the most striking example, almost all the data reported in figure 4 is repeated in figure 6 and supplementary figure S10 (even with different color codes), with the addition of another subline, making figure 4 B C D E F G H and I panels completely redundant. Also, in another example, the authors separate the sublines in different subplots in figure 2C when Figure S4 shows that sublines can be put in the same plot which is more concise and makes it easier to compare between them. A profound reorganization of the figures and of the results section is necessary before publication.
Response: We would like to thank the Reviewer for the insightful critique provided here and in the comments below. We have attempted to address each concern to the best of our ability and believe our revised manuscript is much stronger as a result. To address the concerns raised in this comment, we have removed and streamlined portions of the main text and Supplement. Specifically, we have moved the descriptions of the three levels of heterogeneity (genetic, epigenetic, stochastic) to the Supplementary Text, made more concise arguments in the Introduction, shortened the Glossary of Terms in Supplementary Table S1, 4F). We have also added analyses that provide insight into the biological basis for UMAP dimensions and clusters, performed a copy number variation analysis, and modified the semantic similarity analysis. Additional details are provided in our responses to the critiques below. Response: We thank the Reviewer for this comment. To address the first concern regarding the scRNA-seq data being split into three parts, in the revised manuscript we have combined all of the subline data, including for DS8, into a common UMAP plot (revised FIG. 4F). We still present the scRNA-seq data for the cell line versions separately (revised FIG. 3F) to coincide with the presentation in the main text and to avoid overlap in the UMAP plot between the sublines and PC9-VU parental cell line, which would make the two sets of data hard to distinguish. However, we have included topographical contour lines for the cell line versions in the UMAP plot for the sublines in order to facilitate direct comparisons (revised FIG. 4F).
Regarding biological interpretability of the UMAP dimensions and clusters, we used the VISION tool recommended by the Reviewer to generate distributions for 50 Hallmark gene signatures, taken from the MSigDB database (gsea-msigdb.org/gsea/msigdb/collections.jsp), for all cell line versions and sublines (Supplementary FIG. S9). In short, we see broad hallmark processes that differentiate the populations and, therefore, the UMAP axes. For example, the KRAS signaling score is enriched in PC9-MGH and the PI3K/AKT/MTOR signaling score is enriched in the PC9-VU cohort. Since PC9-MGH and PC9-VU differ along the UMAP_1 but not UMAP_2 axis ( revised FIG. 3F), this suggests that UMAP_1 has some relation to signaling through the parallel PI3K/AKT/MTOR and RAS/RAF/MEK/ERK pathways. Furthermore, PC9-BR1 is enriched for cholesterol homeostasis but not for Hedgehog signaling, while DS8 is enriched for DNA repair and unfolded protein response but not for the p53 pathway. PC9-BR1 and DS8 differ along the UMAP_2 but not UMAP_1 axis ( revised FIG. 4F), suggesting that these processes are relevant for the interpretation of the UMAP_2 axis. We have included a summary of these results in TABLE 1 of the revised manuscript and briefly discuss them in the Results section.
2. The GO semantic similarity analysis between the genetic and epigenetic level is potentially interesting but we lack information about the results and interpretation of this analysis. For example, the authors provide just an example of the analysis for subline DS8 in supplementary table S2 but they only provide the GO identifiers and not the names making any attempt to understand and interpret the analysis very hard. What are the biological processes and molecular functions that drive the semantic similarities between the genetic and epigenetic level of the cell lines? Are those processes or functions shared among some of the lines or sublines?
Response: To address the concern about biological interpretation, we have included in revised FIGS. 3G and 4G correlation plots for the GO terms that have shared significance between the data modalities (IMPACT mutations and differentially expressed genes). We display on the plots the GO term labels for all terms with -log10(p) > 2 for either modality, to give some insight into the molecular programs active within each sample. Please note that these correlation plots replace the semantic similarity scores included in the original manuscript. The semantic similarity metric has been modified in the revised manuscript and the analyses moved to Supplementary FIG. S10 (see our response to Reviewer #2-Major Point #5 for further details). We have also removed from Supplementary Table S2 the example GO similarity matrix for DS8 that the Reviewer references in their comment.
We would like to clarify that the goal of the semantic similarity analysis is to determine whether genetic variants within a cell line or subline can explain differential expression at the transcriptomic level within the same cell line or subline. If so, then there is evidence that the samples being compared (i.e., cell line versions, sublines) are genetically distinct. If not, then there is evidence that the samples represent basins within a common epigenetic landscape. Importantly, since we are specifically looking at mutations that are unique to each sample and genes that are differentially expressed within a sample relative to others within the sample group, we would not expect to see many, if any, shared processes. Indeed, there is only one shared term between PC9-VU and PC9-MGH, one between PC9-VU and DS3, and one between DS3 and DS9. No other pairs of samples showed any overlap in their GO terms.
3. In addition, if I understand correctly, the authors only compare each subline against the other sublines both at the genetic and RNA seq level, but wouldn't it be interesting to compare them also against the line from which they are derived?
Response: To address this concern, we have added a mutational overlap between PC9-VU and the sublines (Supplementary FIG. S4E), which shows that the largest degree of overlap, by far, is for all the sublines and PC9-VU. This is unsurprising since the sublines were derived from the PC9-VU parental population. We reference this in the revised text at the end of "Methods: Derivation of single-cell sublines". At the transcriptomic level, we have included contour lines for the cell line versions in the UMAP projection for the scRNA-seq data for the sublines (revised FIG. 4F). This facilitates a direct comparison between the PC9-VU subline transcriptomes and the PC9-VU parental transcriptome, which show significant overlap. We also show a bulk RNAseq comparison of the PC9-VU sublines and parental in Supplementary FIG. S8A. We hope these inclusions sufficiently address the Reviewers' concerns.

Why haven't the authors performed single cell RNA seq of DS1?
Response: As mentioned in our response to Reviewer #1 (Minor Point #2), we had space limitations in the scRNA-seq experiment to incorporate cell hashing and minimize batch effects. With the three cell line versions (VU, MGH, BR1) and the DS8 subline (which was an obvious choice), we had space for four additional samples out of the six remaining sublines. We chose these more or less randomly to cover a range of drug responses, based primarily on their DIP rate distributions ( revised FIG. 2D). We chose DS3 because it has a peak in the negative DIP rate range, DS6 because it has a peak near zero, and DS7 and DS9 because they have peaks in the positive DIP rate range. DS1 and DS4 just happened to be the two sublines we excluded. We do not believe that this affected the primary conclusions of our work in any way.

5.
It would be nice if the authors could confidently conclude that the BR1 line and the subline DS8, which are both resistant to the drug, achieve resistance through two distinct genetic and/or epigenetic cell states. However, from figure 4 G, (or figure 6 C) it seems to me that a minority of the DS8 cells occupy the same epigenetic state of BR1. It is therefore difficult to conclude that the resistance in DS8 is not driven by those cells that are close to BR1, also considering that the bimodal distribution in the DIP rate would suggest that it is a minority of cells that drive the resistance. Maybe the authors could sort and profile mutations and RNA of the high DIP rate and low DIP rate cells separately and look at where those cells map in the genetic and epigenetic landscape?
Response: We thank the reviewer for this comment. After a thorough re-analysis of the scRNAseq data, including doublet detection ( Supplementary FIG. S17A,B) and stricter thresholds for removing low-quality cells ( Supplementary FIG. S17C) , we no longer find overlap between PC9-BR1 and DS8 (revised FIGS. 3F and 4F). Moreover, the addition of copy number variant data (FIGS. 3E and 4E), which shows different amplifications and deletions within the two populations, strongly suggests that PC9-BR1 and DS8 are genetically distinct. Also, we agree with the Reviewer that sorting high and low DIP rate cells within DS8 and comparing their genomic and transcriptomic signatures is an excellent idea. We plan to pursue this in future work. However, we do not believe that this affects our primary conclusion in this work, which is that DS8 harbors a genetic resistance state that emerged spontaneously from the PC9-VU population.
6. As explained above, authors should try to present the analyses of all lines and sublines combined. It doesn't make sense to repeat the analysis and presentation of the sublines data before and after adding the DS8, it's too redundant. Then, they can easily discuss the different nature of the DS8 with respect to the other sublines referring to the same plots.
Response: We appreciate and agree with the Reviewer's comment. Originally, we were concerned that including DS8, which we conclude is genetically distinct from the other sublines, together with the sublines that we argue are genetically indistinguishable from each other and the parental PC9-VU cell line would complicate the story and confuse the reader. However, we realize that waiting to present DS8 at the end actually caused more confusion than it prevented. Thus, in the revised manuscript, we have consolidated all of the subline scRNA-seq data, including for DS8, together in FIG. 4 of the main text. We have kept the cell line version data separate in FIG. 3, however, to avoid overlap with the subline data and to coincide with how the data is presented in the text.

Minor points:
Main text: 1. First part of results i.e. the lengthy description of the three levels is actually more suitable for introduction and I'd suggest to shorten it to improve readability, or alternatively stick it to the supplementary note.
Response: We agree with the Reviewer on this point and have moved the descriptions of the three levels of heterogeneity (genetic, epigenetic, stochastic) to the Supplementary Text ("Genetic, epigenetic, and stochastic sources of tumor heterogeneity").