Fig 1.
Bayesian inference better assesses parameter inclusion in the "true" model.
(A) Aspects of linear regression model assessed by model selection and model averaging. (B) Candidate linear regression model set for multimodel inference analysis and comparison to AICc in our analysis of the example problem from Galipaud et al., 2014 [15]. (C) Results from our analysis of candidate models in (B) using data generation code from Galipaud et al., 2014 [15]. Candidate models are arranged along the x-axis by posterior probability. Each posterior probability from our analysis is compared to the AICc weight from our analysis. (D) Heatmap for each model averaging analysis based on data generation and candidate models in [15]. Top two rows are summed AICc weights (SW), where SW is a value starting at 0 and weights are summed until the final SW value; thus, the top color bar corresponding to these two rows moves from 0 (white) to 1 (dark red). First row, the idealized summed AICc weights based on generated “ground truth” data (with Pearson correlations between x2 and y and x3 and y different than 0.0 or 1.0, we cannot be sure what the “true” SW should be); second row, summed AICc weights from our analysis. Note the large difference between ideal SW for x4 (zero) and its calculated SW (0.25; see Table 1). Bottom two rows are posterior probabilities from nested sampling and Bayes-MMI, where before nested sampling, prior probabilities are at 0.5 (white in the bottom color bar) and color in the heatmap represents the probability of the variable’s inclusion in the “true” model (darkest blue for 0% probability, darkest red for 100% probability in the bottom color bar). Deeper colors indicate a larger deviation from the prior. Third row, the idealized probability based on “ground truth” data (similarly to SW, Pearson correlations between 0 and 1 for x2 and x3 mean we cannot be sure what “true” posterior probability should be); fourth row, posterior probability from our Bayes-MMI analysis. 
Note the closer correspondence between ideal posterior probability for x4 (zero) and its marginal-likelihood-derived probability (0.09; see Table 1) than between ideal SW and calculated SW.
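The two quantities compared in (D) can be sketched as follows. This is a minimal illustration, not the analysis code behind the figure: the four-model candidate set, the AICc scores, and the log marginal likelihoods are hypothetical placeholders, and equal model priors are assumed so that each variable's prior inclusion probability is 0.5.

```python
import math

def normalized_weights(log_scores):
    """Exponentiate log-scale scores relative to the best one, then normalize to sum to 1."""
    top = max(log_scores)
    rel = [math.exp(s - top) for s in log_scores]
    total = sum(rel)
    return [r / total for r in rel]

def akaike_weights(aicc):
    """Akaike weights: each weight is proportional to exp(-AICc / 2)."""
    return normalized_weights([-0.5 * a for a in aicc])

def posterior_probabilities(log_evidence):
    """Model posteriors under equal model priors: proportional to the marginal likelihood."""
    return normalized_weights(log_evidence)

def inclusion_total(model_vars, weights, variable):
    """Sum the weights of every candidate model that contains `variable`."""
    return sum(w for included, w in zip(model_vars, weights) if variable in included)

# Hypothetical candidate set: all subsets of {x2, x4}; values below are invented.
models = [set(), {"x2"}, {"x4"}, {"x2", "x4"}]
aicc = [104.0, 100.0, 105.5, 101.8]
log_evidence = [-53.0, -50.5, -55.5, -53.2]

aicc_w = akaike_weights(aicc)
post = posterior_probabilities(log_evidence)
sw_x4 = inclusion_total(models, aicc_w, "x4")  # summed AICc weight (SW) for x4
pp_x4 = inclusion_total(models, post, "x4")    # posterior inclusion probability for x4
print(f"SW(x4) = {sw_x4:.3f}, P(x4 in model | data) = {pp_x4:.3f}")
```

Both quantities use the same summation over models that contain the variable; they differ only in whether the per-model weight comes from AICc or from the marginal likelihood.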
Table 1.
SW and posterior probability calculations for each model variable.
Fig 2.
Conclusions and hypotheses from literature build mechanistic hypothesis exploration space for tumor growth and development.
(A) Synthesis of what is currently known about SCLC subtypes, which have been divided into two overall phenotypes, neuroendocrine (NE) and Non-NE, and then further classified into subtypes based on transcription factor expression. [1] NE SCLC cells, which do not express HES1, transition into Non-NE cells, which do. [2] HES1+ cells release unidentified factors (gray circle) that support viability and growth of HES1- cells, and the HES1+ and HES1- populations grow better together than separately. [3] Consensus across the field led to labeling SCLC phenotypic subtypes by the dominant transcription factor expressed in that subtype. [4] Subtype with transcriptional signature intermediate between NE and Non-NE, named SCLC-A2. [5] Phenotypic transitions occur in a hierarchical manner from SCLC-A to SCLC-N to SCLC-Y cells. (B)-(E) Candidate model examples representing SCLC biological hypotheses (Table 1). Here we indicate schematically how a population dynamics model can represent each biological hypothesis, and denote how the set of candidate models is built combinatorially, in order of (B)-(E). (B) Model topologies constructed with 2+ subtypes, with number of combinations per number of subtypes. There are 11 options total, and each of these moves forward to choose one effect option from (C [1], [2], or [3]). (C) Subtype effect schema, where there are different effectors between candidates and different affected cellular actions. If there are effects (C [2] or [3]), the model behaviors affected are chosen (C [4] & [6], [4] & [7], or [5] & [6]). Whether effects are present (C [2], [3]) or not (C [1]), the candidate moves forward to choose initiating subtype(s): each subtype in the model must follow (D [1], [2], or [3]) and corresponding transition schemes (E [1], [2]).
(D, E) Initiation schemes (D) and potential transition schemes (E), where all subtypes in topology must be accessible either as initiating subtypes or via transitions (D), unidirectional transitions are those that follow a hierarchy (E, top left), and bidirectional transitions must be symmetrical when present (E, top right and bottom). A: ASCL1, Achaete-scute homolog 1; N: NEUROD1, neurogenic differentiation factor 1; H: HES1, Hes Family BHLH Transcription Factor 1; P: POU2F3, POU class 2 homeobox 3; Y: YAP1, yes-associated protein.
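The count of 11 topologies in (B) and the accessibility rule in (D) can be sketched as below. This is an illustrative sketch only: it assumes the four subtypes are A, N, A2, and Y (an assumption consistent with 6 + 4 + 1 = 11 combinations of two or more subtypes), and the function names are hypothetical.

```python
from itertools import combinations

# Assumed subtype labels; the caption does not list the four explicitly.
SUBTYPES = ("A", "N", "A2", "Y")

def enumerate_topologies(subtypes=SUBTYPES, min_size=2):
    """Return every combination of at least `min_size` subtypes."""
    return [
        combo
        for size in range(min_size, len(subtypes) + 1)
        for combo in combinations(subtypes, size)
    ]

def all_accessible(topology, initiating, transitions):
    """Check that every subtype is an initiating subtype or reachable via transitions."""
    reached = set(initiating)
    frontier = list(initiating)
    while frontier:
        source = frontier.pop()
        for start, end in transitions:
            if start == source and end not in reached:
                reached.add(end)
                frontier.append(end)
    return reached == set(topology)

topologies = enumerate_topologies()
counts = {size: sum(1 for t in topologies if len(t) == size) for size in (2, 3, 4)}
print(len(topologies), counts)

# Hierarchical (unidirectional) scheme A -> N -> Y, initiated by A alone.
print(all_accessible(("A", "N", "Y"), {"A"}, {("A", "N"), ("N", "Y")}))
```

A candidate that fails `all_accessible` would contain a subtype that can never appear in the simulated population, so it would be excluded from the candidate set.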
Table 2.
Existing data pertaining to SCLC intratumoral heterogeneity and communication.
Fig 3.
Population composition data and probabilistic representation.
(A) CIBERSORT deconvolution of TKO and RPM genetically engineered mouse model (GEMM) samples (previously published) as well as SCLC-A cell line samples. CIBERSORT was performed on bulk RNA-sequencing data. (B) Probabilistic representation of tumor proportion based on mean and standard deviation of proportions across samples within an experimental model; these distributions were then used for fitting models to data. TKO, p53fl/fl;Rbfl/fl;p130fl/fl tumors [28]; RPM, Rb1fl/fl;Trp53fl/fl;Lox-Stop-Lox[LSL]-MycT58A tumors [36]; SCLC-A cell lines, a subset of SCLC cell lines from the CCLE [54] that we previously assigned as representative of tumors made up largely of the SCLC-A subtype [33].
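One way to turn per-sample proportions into the distributions described in (B) is sketched below. The caption does not state the distribution family, so the independent normal distribution per subtype used here is an assumption, and the sample values are invented for illustration.

```python
import math
from statistics import mean, stdev

def summarize(samples):
    """Mean and sample standard deviation of each subtype's proportion across samples."""
    subtypes = list(samples[0].keys())
    return {
        s: (mean([x[s] for x in samples]), stdev([x[s] for x in samples]))
        for s in subtypes
    }

def log_likelihood(simulated, summary):
    """Log-likelihood of simulated proportions under independent normals (assumed form)."""
    total = 0.0
    for subtype, (mu, sigma) in summary.items():
        z = (simulated[subtype] - mu) / sigma
        total += -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))
    return total

# Hypothetical deconvolution output for three samples of one experimental model.
samples = [
    {"A": 0.60, "N": 0.25, "Y": 0.15},
    {"A": 0.55, "N": 0.30, "Y": 0.15},
    {"A": 0.65, "N": 0.15, "Y": 0.20},
]
summary = summarize(samples)
near = log_likelihood({"A": 0.60, "N": 0.23, "Y": 0.17}, summary)
far = log_likelihood({"A": 0.20, "N": 0.60, "Y": 0.20}, summary)
print(near > far)
```

A model simulation whose steady-state proportions fall close to the sample means scores a higher likelihood than one far from them, which is what lets the distributions act as fitting targets.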
Fig 4.
Fitting to data and assigning Bayesian evidence separates candidate models into more and less likely.
(A) Aspects of linear regression model assessed by model selection and model averaging (see Fig 1A). (B) Aspects of mass-action kinetics model / ordinary differential equation assessed by model selection and model averaging. (C) Schematic representation of the equation in (B). (D)-(F) Evidence values (left y-axis) and posterior probability values (right y-axis) from nested sampling, one point per model, ordered from model with greatest evidence to model with least evidence. Models whose evidence values are within 10^(1/2) of the greatest evidence value, the "relative likelihood confidence interval," are colored in red. Nested sampling and evidence calculation are performed per dataset. (D) TKO dataset. (E) RPM dataset. (F) SCLC-A cell line dataset. (G) Numbers and percentages of models in the relative likelihood confidence interval, 95% confidence interval, and remaining non-confidence interval models. TKO, p53fl/fl;Rbfl/fl;p130fl/fl tumors [28]; RPM, Rb1fl/fl;Trp53fl/fl;Lox-Stop-Lox[LSL]-MycT58A tumors [36]; SCLC-A cell lines, a subset of SCLC cell lines from the CCLE [54] that we previously assigned as representative of tumors made up largely of the SCLC-A subtype [33].
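The red highlighting in (D)-(F) can be sketched as a cutoff on evidence ratios. This sketch assumes that "within 10^(1/2)" means an evidence ratio to the best model of at most 10^(1/2), which is a difference of at most 0.5 on a log10 scale; the model names and evidence values are hypothetical.

```python
def relative_likelihood_set(log10_evidence, log10_cutoff=0.5):
    """Rank models by evidence and flag those within the cutoff of the best model.

    `log10_evidence` maps model name -> log10 marginal likelihood.
    Returns (name, log10 evidence, inside_interval) tuples, best model first.
    """
    top = max(log10_evidence.values())
    ranked = sorted(log10_evidence.items(), key=lambda item: item[1], reverse=True)
    return [(name, value, top - value <= log10_cutoff) for name, value in ranked]

# Hypothetical log10 evidence values for four candidate models.
evidence = {"m1": -20.1, "m2": -20.4, "m3": -21.0, "m4": -23.7}
ranked = relative_likelihood_set(evidence)
highlighted = [name for name, _, inside in ranked if inside]
print(highlighted)
```

Working on the log scale avoids underflow, since marginal likelihoods from nested sampling are often far too small to represent directly.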
Fig 5.
Likely model topologies vary across datasets; transition rates vary according to subtype presence in similar ways.
(A) Hypothesis assessment of model topologies, per dataset. Probability indicates the result of Bayes' theorem using equivalent prior probabilities per topology (e.g., 9% probability that one of the topologies on the x-axis best represents a dataset) and Bayesian evidence values (marginal likelihoods) summed per topology. Model topologies are represented by images and corresponding numbers along the x-axis. Posterior probability is based on marginal likelihoods of all candidate models that include A as an initiating subtype. (B) Division and phenotypic transition parameters for TKO, RPM, and SCLC-A cell line datasets, comparing between higher-probability topologies (A) and four-subtype topology per dataset. Red arrowheads indicate higher A-to-A2 transition rate in 3-subtype TKO topology (A, A2, Y) compared to A-to-Y and A2-to-Y. Teal arrowheads indicate higher A-to-N transition rate in 4-subtype RPM topology compared to A-to-Y and N-to-Y. TKO, p53fl/fl;Rbfl/fl;p130fl/fl tumors [28]; RPM, Rb1fl/fl;Trp53fl/fl;Lox-Stop-Lox[LSL]-MycT58A tumors [36]; SCLC-A cell lines, a subset of SCLC cell lines from the CCLE [54] that we previously assigned as representative of tumors made up largely of the SCLC-A subtype [33]. (*) indicates significance between samples from BMA parameter distributions at family-wise error rate (FWER) = 0.01, averaged over ten sampling iterations using one-way ANOVA plus Tukey HSD.
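The topology probabilities in (A) can be sketched as follows: marginal likelihoods are summed within each topology, and because every topology carries the same prior (1/11, about 9%, for 11 topologies), the prior cancels on normalization. The model names, topology labels, and log-evidence values below are hypothetical.

```python
import math
from collections import defaultdict

def log_sum_exp(values):
    """Numerically stable log of a sum of exponentials."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def topology_posteriors(log_evidence_by_model, topology_of):
    """Posterior probability per topology from summed evidence and equal priors."""
    grouped = defaultdict(list)
    for model, log_z in log_evidence_by_model.items():
        grouped[topology_of[model]].append(log_z)
    # Sum marginal likelihoods within each topology (on the log scale).
    log_summed = {topology: log_sum_exp(vals) for topology, vals in grouped.items()}
    # Equal priors cancel, so normalizing the summed evidence gives the posterior.
    log_total = log_sum_exp(list(log_summed.values()))
    return {topology: math.exp(v - log_total) for topology, v in log_summed.items()}

# Hypothetical natural-log evidences for four models spread over two topologies.
log_evidence = {"m1": -10.0, "m2": -12.0, "m3": -9.0, "m4": -9.5}
topology_of = {"m1": "A-N", "m2": "A-N", "m3": "A-N-Y", "m4": "A-N-Y"}
posteriors = topology_posteriors(log_evidence, topology_of)
print(posteriors)
```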
Table 3.
Probabilities after hypothesis exploration using Bayesian multimodel inference.
Fig 6.
Across datasets, multimodel inference indicates likely bidirectional phenotypic transitions, suggesting high SCLC phenotypic plasticity.
(A) Heatmap for high-probability three-subtype topologies for each dataset (rows), all models initiated by A +/- other subtypes. Color represents the probability of each cellular behavior (column). Since prior probability starts at 0.5 (white), deeper colors indicate a larger deviation from the prior, with red vs blue indicating more likely or less likely, respectively. (B)-(D) Model schematics with each cellular behavior represented by edges coming from or moving toward each cell subtype (gray circles): growth rates (self-arrows) or transitions (arrows between gray circles). Edge colors correspond to colors for that behavior in the heatmap in (A). Top-scoring three-state topology for TKO dataset (B), RPM dataset (C), and SCLC-A cell line dataset (D). (E) Schematic of consolidated model behaviors, drawn from each dataset’s high-probability three-subtype topology results ((B)-(D)). When multiple dataset results included different posterior probabilities for a model feature, the one closest to 0.5 was chosen (most conservative). Edge colors correspond to posterior probabilities, with intensity of colors representing information gained from data, as in (A)-(E). (F) Parameter fitting results (part of the nested sampling algorithm) for four-subtype topology models initiated by A +/- other subtypes, across datasets. tsn, transition (e.g., subtype transition). TKO, p53fl/fl;Rbfl/fl;p130fl/fl tumors [28]; RPM, Rb1fl/fl;Trp53fl/fl;Lox-Stop-Lox[LSL]-MycT58A tumors [36]; SCLC-A cell lines, a subset of SCLC cell lines from the CCLE [54] that we previously assigned as representative of tumors made up largely of the SCLC-A subtype [33]. (*) indicates significance between samples from BMA parameter distributions at family-wise error rate (FWER) = 0.01, averaged over ten sampling iterations using one-way ANOVA plus Tukey HSD.
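The consolidation rule in (E) can be sketched as picking, for each model feature, the posterior probability that deviates least from the 0.5 prior. The feature names and per-dataset probabilities below are hypothetical, and the function names are illustrative only.

```python
def most_conservative(posteriors, prior=0.5):
    """Return the posterior probability closest to the prior (first one wins on ties)."""
    return min(posteriors, key=lambda p: abs(p - prior))

def consolidate(per_dataset, prior=0.5):
    """Apply the conservative rule to every feature reported by at least one dataset."""
    features = {}
    for results in per_dataset.values():
        for feature, probability in results.items():
            features.setdefault(feature, []).append(probability)
    return {f: most_conservative(ps, prior) for f, ps in features.items()}

# Hypothetical posterior probabilities of two transitions in three datasets.
per_dataset = {
    "TKO": {"A_to_N": 0.92, "N_to_A": 0.70},
    "RPM": {"A_to_N": 0.61, "N_to_A": 0.85},
    "SCLC-A": {"A_to_N": 0.18},
}
consolidated = consolidate(per_dataset)
print(consolidated)
```

Choosing the value nearest the prior means the consolidated schematic never claims more information gain for a feature than the least informative dataset supports.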