Estimating ectopic beat probability with simplified statistical models that account for experimental uncertainty

Ectopic beats (EBs) are cellular arrhythmias that can trigger lethal arrhythmias. Simulations using biophysically-detailed cardiac myocyte models can reveal how model parameters influence the probability of these cellular arrhythmias, however such analyses can pose a huge computational burden. Here, we develop a simplified approach in which logistic regression models (LRMs) are used to define a mapping between the parameters of complex cell models and the probability of EBs (P(EB)). As an example, in this study, we build an LRM for P(EB) as a function of the initial value of diastolic cytosolic Ca2+ concentration ([Ca2+]iini), the initial state of sarcoplasmic reticulum (SR) Ca2+ load ([Ca2+]SRini), and kinetic parameters of the inward rectifier K+ current (IK1) and ryanodine receptor (RyR). This approach, which we refer to as arrhythmia sensitivity analysis, allows for evaluation of the relationship between these arrhythmic event probabilities and their associated parameters. This LRM is also used to demonstrate how uncertainties in experimentally measured values determine the uncertainty in P(EB). In a study of the role of [Ca2+]SRini uncertainty, we show a special property of the uncertainty in P(EB), where with increasing [Ca2+]SRini uncertainty, P(EB) uncertainty first increases and then decreases. Lastly, we demonstrate that IK1 suppression, at the level that occurs in heart failure myocytes, increases P(EB).

Introduction Delayed after-depolarizations (DADs) are spontaneous depolarizations that occur during diastole [1]. When their amplitude is sufficiently large, DADs can trigger action potentials (APs) known as ectopic beats (EBs) [2]. Under certain conditions, for example in the setting of catecholaminergic polymorphic ventricular tachycardia, DADs can also trigger large scale cardiac arrhythmias [3]. Computational modeling of EBs and DADs has provided insights into the mechanisms by which they can trigger arrhythmias in the heart [4,5].
We seek to understand how variations in the underlying biophysical properties or physiological state of the myocyte influence the probability of occurrence of EBs. To do this, we use a three-dimensional spatial model of the ventricular myocyte developed by Walker et al [5] in which the fundamental events governing intracellular calcium (Ca 2+ ) dynamics are modeled stochastically. This model has previously been shown to reproduce realistic Ca 2+ waves, DADs, and ectopic beats driven by stochastic gating of L-type Ca 2+ channels (LCCs) and sarcoplasmic reticulum (SR) Ca 2+ release channels (ryanodine receptors, RyRs).
The complexity of this stochastic model makes it challenging to perform the many repeated simulations needed to estimate the probability of ectopic beats, denoted P(EB), as a function of underlying model parameters. Here we explore an approach designed for estimating P(EB) that is computationally efficient. To do this we leverage a technique first developed by Sobie et al [6,7] and apply it to formulate a logistic regression model (LRM) that maps myocyte model inputs (MMIs), which refers collectively to myocyte model parameters as well as state variable initial conditions, to probabilities of cellular arrhythmias. This approach, which we refer to as arrhythmia sensitivity analysis, allows for evaluation of the relationship between these event probabilities and their associated MMIs.
Use of an LRM to map MMIs to the probability of cellular arrhythmias also enables analyses of how experimental uncertainty in the measurement of MMIs influences estimates of these probabilities. One mainstream method for experimental uncertainty analysis is Monte Carlo simulation [8]. It has been used extensively in system biology and myocyte modeling [9][10][11]. However, such methods require many repeated model simulations, an approach which can become computationally demanding when using complex models of the myocyte such as that of Walker et al. [5]. LRMs also provide a computationally efficient approach for performing uncertainty analyses.
In this study, we develop a LRM model, mapping 4 MMIs (initial state of diastolic cytosolic Ca 2+ concentration ([Ca 2+ ] i ini ), initial state of SR Ca 2+ load ([Ca 2+ ] SR ini ), conductance of the inward rectifier current I K1 (G K1 ), and RyR opening rate (k RyR + )) to P(EB). We also investigate the role of uncertainty in [Ca 2+ ] SR ini on P(EB). It surprisingly reveals that as [Ca 2+ ] SR ini uncertainty increases, P(EB) uncertainty first increases (from a sharp unimodal distribution to a broad uniform-like distribution) and then decreases (toward a bimodal distribution with P (EB) concentrated at 0 or 1). Investigation of G K1 uncertainty showed that P(EB), with G K1 at reduced levels associated with heart failure, is significantly higher than that at normal G K1 density.

Modeling ectopic beat probability
An LRM was formulated to capture the quantitative relationship between P(EB) and 4  , all other initial states of the myocyte model were set to values consistent with the diastolic state. To study ectopic beats and DADs, experiments generally have to create conditions that will favor the generation of DADs. We therefore performed simulations under conditions that favor the generation of DADs and ectopic beats. We used a modified version of the Walker et al. model [5]. A novel Na + -Ca 2+ exchanger (NCX) model developed by Chu et al. [12] was incorporated into this model. Adjustments were made to the Ca 2+ flux rate from submembrane to cytosol, the conductance of the rapid delayed rectifier K + current and the density of the Ca 2+ dependent Clcurrent to ensure a realistic action potential duration (~290ms) and Ca 2+ transient amplitude (~800nM) [13]. A detailed description of model modifications is provided in Methods. Myocyte model simulations were performed under the β-adrenergic stimulation condition described in Walker et al [5], with the exception that k RyR + was increased 4-fold compared to its control value based on experimental data for heart failure [14] and to increase the propensity for EBs. A detailed description of simulation protocol is described in Methods and S1 Text. Because of the stochastic nature of the myocyte model, each realization may or may not generate a DAD or an ectopic beat. We defined a biophysically meaningful region of interest over which each of these parameters are varied and within which the LRM is built, where the range for [Ca 2+ ] i ini is 100nM-300nM, the range for [Ca 2+ ] SR ini is 300 μM-700 μM, the range for G K1 _sf is 0-1 and the range for k RyR + _sf is 0.5-1.5. Fig 1A shows five examples each of myocyte model simulated ectopic beats (red), DADs (blue), and no spontaneous depolarization events (green). Since ectopic beats consistently exhibit membrane potential waveforms that exceed 0 mV whereas DADs do not, a threshold of 0 mV was used to detect such events in each realization. Fig 1B demonstrates the relationship between this region of interest and P(EB) for both the underlying myocyte model and the LRM, where B T P (the argument of the logistic regression function) is the weighted (B) summation of LRM features (P). The nonlinear logistic relationship defines three domains of interest: transition domain, lower plateau domain, and upper plateau domain. The transition domain is a region in which P(EB) is a sensitive function of B T P, corresponding to the steep part of the curve in Fig 1B. The lower plateau domain and upper plateau domain are regions where P(EB) is relatively insensitive to MMI perturbation, with P(EB) =~0 in the lower plateau domain and P(EB) =~1 in the upper plateau domain. To build the LRM, we sampled 200 MMI sets from the region of interest and ran 100 realizations for each set from which we obtain P(EB). Since the transition domain is relatively narrow and P(EB) is a very steep function within it, we have developed a two-stage sampling strategy to ensure adequate sampling of this domain (see Methods and S1 Text for details). The 200 MMI sets and their corresponding P(EB)s were used to train the LRM. Once determined, the LRM enables direct calculation of P(EB) for any MMI set without the need to simulate the underlying complex stochastic three-dimensional myocyte model. Fig  1B shows the fidelity with which the LRM model (blue line) reproduces P(EB) estimated using the myocyte model (red points) for the training data set (200 MMI sets) as a function of B T P. Fig 1C shows the predicted P(EB) from the LRM versus P(EB) computed from the myocyte model simulations. The LRM performs well in predicting P(EB) (R 2 = 0.981 for the 102 MMI sets inside the transition domain, mean prediction error of 0.021 ± 0.034 (see Methods and Eq 2)), despite the highly complex nonlinear properties of the myocyte model. To test the ability of the LRM to generalize, Fig 1D and 1E compare the LRM predicted P(EB)s to P(EB)s computed using the myocyte model for test data consisting of 100 independent MMI sets. The LRM performs well on the test set (R 2 = 0.993 for the 22 sets within the transition domain, mean prediction error = 0.006 ± 0.013). Detailed modeling methods and performance metrics are described in Methods and S1 Text.
The LRM for P(EB) was formulated with a total of 10 features. _sf. The selection of quadratic features is based on minimization of the Akaike information criterion (CAIC) [15] associated with this set of features (see Methods for details). Prior to fitting the LRM, features were scaled to a range of 0 to 1 in order to ensure interpretability of the LRM weights. Table 1 shows the LRM weights ranked by feature importance. Feature [Ca 2+ ] SR ini is most important and is followed by feature k RyR  These results presented here demonstrate that LRMs can accurately predict P(EB) generated by the complex myocyte model. For the 100 MMI test sets used in each case, thousands of CPU hours are required to obtain event probabilities from the myocyte model whereas evaluation of LRMs requires negligible computing time on the order of milliseconds. The computationally demanding task of estimating these probabilities is made possible by this LRM approach.
In Fig 2A- . I K1 downregulation also increases P(EB). Note that the P(EB) shown in Fig 2A-2D is predicted directly from the LRM. These findings agree with earlier simulations by Walker et al. [5]. The positive

Uncertainty analysis of P(EB)
The LRM allows for analysis of the uncertainty of probabilistic events such as ectopic beats arising from the experimental uncertainties inherent in measurements of the MMIs. We treat MMIs as random variables, and then the LRM becomes a probability transformation function (i.e. S3 Eq) of these MMI random variables. MMIs with uncertainties are assumed to have a normal distribution with σ taking the experimentally measured value for each MMI. From the experimentally derived distribution for each MMI, we randomly generate 10 6 sets of MMIs and evaluate the LRM for each set to generate a predicted distribution of P(EB).
In   uncertainties, which leads to 3 different distribution patterns: unimodal (Fig 3A), approximately uniform (Fig 3B), and bimodal ( Fig 3C). The red line indicates the mean P(EB). Fig 3A (σ = 5 μM) is a unimodal distribution in which P(EB) takes on values at or near 0.5 with high probability. Fig 3B (σ = 15 μM) is an approximately uniform distribution between 0 and 1, which indicates a complete lack of certainty in the value of P(EB).To quantify the degree of uncertainty in P(EB), entropy, a measure of the uncertainty of random variables, was evaluated for each P(EB) distribution shown in Fig 3 [3]. Entropy is more appropriate than the variance to assess uncertainty for multimodal distributions [16]. A detailed description of the entropy calculation is provided in S1 Text. The distribution of P(EB) in Fig 3B has maximum entropy.
In Fig 3C (σ = 30 μM), the distribution of P(EB) becomes bimodal, with two peaks located at P (EB) values close to 0 and 1. In this scenario, the bimodal distribution indicates that P(EB) is either close to 0 or close to 1 given the symmetry of the P(EB) distribution. Thus, the entropy of the P(EB) distribution in Fig 3C less  becomes equally likely to take on values of 0 or 1 and mean P(EB) converges to 0.5 (Fig 3F).

Role of G K1 on P(EB)
The effect of I K1 current density scaling factor (G K1 _sf) uncertainty on P(EB) was evaluated. We

Discussion
Regression models have been used previously by Sobie et al. to estimate action potential duration, peak and resting membrane potential, and Ca 2+ transient amplitude from deterministic myocyte models, as well as the probability of the occurrence of Ca 2+ sparks from a stochastic Ca 2+ release site model [6,7]. Based on their method, we developed a new pipeline for establishing simplified models (LRMs) for estimating the probability of cellular cardiac arrhythmia events simulated by a complex stochastic myocyte model. Fig 1E shows excellent prediction performance of the LRM in predicting the probability of ectopic beats. The method developed in this work is universal in the sense that it should be considered a general approach to simplifying highly complex nonlinear stochastic models in a wide range of applications.
Our approach differs from that of Lee at al. [7] at two points in the modeling pipeline. First, we developed a two-iteration strategy for generating MMI sets. The second iteration ensures that the transition domain is adequately sampled. Second, we derived LRM quadratic features from the MMIs. The rationale for this approach is that since the myocyte model is highly nonlinear, we expect improved performance by allowing for a nonlinear relationship between MMIs and P(EB) in the LRM. To demonstrate the performance improvement of these two steps, we built preliminary LRMs from only the first iteration MMI sets and using only MMIs as features (i.e. without derived quadratic features). For prediction of P(EB), this approach yields an average error of 0.026 ± 0.058 (S2A Fig), which is worse than that (0.006 ± 0.013) obtained with the full pipeline (S2D Fig). The LRM approach has the benefit of computational efficiency. The average computational time required for each MMI set in the EB study is~0.4 (hours per realization) � 100 (realizations) = 40 CPU hours. To build the LRM, however, 200 training MMI sets must be simulated with the myocyte model, which takes~8000 CPU hours. After building the LRM, calculating for new MMI sets takes negligible time. For example, the computational time required to run the test set (100 MMI sets) in the ectopic beat study is therefore 4000 CPU hours. In contrast, the computational time for LRMs is negligible (~5 CPU ms). Additionally, in the uncertainty study of [Ca 2+ ] SR ini , 10 6 samples were randomly sampled from the uncertainty range of [Ca 2+ ] SR ini . If we estimate the P(EB) for each sample, the total computational time required is 0.4 � 100 � 10 6 = 4 � 10 7 CPU hour. In contrast, LRM takes less than 1 CPU second. These results show the dramatic improvement in computational efficiency of the LRM.
LRMs enable us to quantitatively explore the relationship between MMIs and event probabilities. In the ectopic beat study, Fig 2A and  , indicating that elevation of cellular Ca 2+ (i.e. Ca 2+ overload) tends to increase the likelihood of ectopic beats. This result is consistent with previous experimental and computational studies [5,21]. Fig 2C shows that increasing G K1 _sf reduces P(EB). I K1 downregulation has been shown to be associated with ventricular arrhythmias such as Anderson's syndrome [22] and LQTS [23]. Increased k RyR + _sf is also shown in Fig 2D to  To further analyze the importance of features in P(EB) prediction, we trained 9 sub-models which included only the top n (n ranging from 1 to 9) features showed in Table 1 on the same 200 training MMI sets, respectively. The R 2 and error (mean ± SD) were then obtained for all 9 sub-models on the same 100 test MMI sets (S1 Table). The sub-model based on the top 7 features (n = 7) has mean error of P(EB) = 0.013 and R 2 = 0.995, whereas the n = 6 sub-model has mean error P(EB) = 0.119 and R 2 = 0.758. This indicates that the n = 7 sub-model is the smallest sub-model with performance similar to the full model (R 2 = 0.999, mean error = 0.006). In addition, the n = 7 sub-model is the smallest sub-model containing all four MMIs. These results indicate that all 4 MMIs evaluated in this study are necessary to achieve a high level of LRM performance regardless of their relative importance rankings.
In Fig 3A- widely from 10μM to 290μM [17][18][19]. A consequence of this is that for the majority of the realistic uncertainty range (30-290 μM), myocytes are unlikely to be in the transition domain where both DADs and EBs occur with nonzero probability. Namely, with a small perturbation in [Ca 2+ ] SR ini relative to the uncertainty range, the most likely behavior of the cell would jump between always exhibiting DADs and always exhibiting EBs. This is consistent with the fact that the EB is a bifurcation phenomenon. Other studies also showed that EBs arise as a result of a bifurcation [25].
In Fig 4A, the heart failure (HF) G K1 study shows that, given the experimental uncertainty, P(EB) associated with HF G K1 is statistically significantly higher than P(EB) of normal G K1 . Consistent with our results, Maruyama et al. showed that suppressing I K1 considerably enhanced the DAD amplitude and P(EB) [26]. This result shows the potential in using the LRM-predicted probability of an arrhythmic event to test the propensity for arrhythmias resulting from variations in other ion currents. We chose to explore G K1 in HF myocytes because I K1 data were available, and this method can be used to explore the impact of variation in other MMIs as well. These results also demonstrate the potential role for P(EB) as a useful biomarker for arrhythmia predictions, particularly in mutations and diseases linked to DADs such as catecholaminergic polymorphic ventricular tachycardia [27]. Additionally, early afterdepolarization (EAD), another important type of cellular arrhythmia, can be predicted via LRM. Such an LRM may be used as a biomarker to predict the arrhythmogenic risk of EADrelated diseases such as long QT syndrome [28].
As shown by two case studies (uncertainty of [Ca 2+ ] SR ini as well as G K1 in heart failure), The LRM-based uncertainty analysis approach presented in this study enables efficient propagation of experimentally measured uncertainty to P(EB) prediction. To further demonstrate the broad significance of this uncertainty analysis approach, two additional potential applications are discussed. 1) In this work, each application of uncertainty analysis has explored the impact of uncertainty in a single MMI ([Ca 2+ ] SR ini or G K1 ). However, this uncertainty analysis approach can be applied to evaluate the role of uncertainties of all MMIs simultaneously. Such an approach may better mimic real experimental conditions where multiple quantities are being measured or controlled. 2) Additionally, this uncertainty approach can serve as a guide to experimentalists for the determination of the degree of accuracy required in any particular measurement in order to yield a reasonable prediction of P(EB). A potential future application of our P(EB) prediction approach will be to systematically study the relationship between P(EB) and the delayed afterdepolarization. It is well understood that triggered ectopic beats arise from DADs when DAD amplitude is sufficiently large to activate rapid inward Na + current [2]. Liu et al. have known that the DAD amplitude threshold for generating ectopic beats is dynamic [29]. These results imply that the relationship between P(EB) and DAD amplitude is not necessarily a simple monotonically increasing function. Some interesting as yet unanswered questions could be explored. For example, under what conditions (i.e., what set of MMIs), might P(EB) be relatively large while DAD amplitude distribution tends to smaller values, and vice versa. How does the uncertainty in DAD amplitude relate to the uncertainty in P(EB)s?
In summary, we developed a pipeline for building logistic regression models (LRMs) that predict the probability of cellular arrhythmias. These simplified models faithfully reproduce the relationship between parameters/inputs and probabilistic events as learned from the mechanistic biophysically-detailed stochastic myocyte models, but with far less computational burden. As far as we know, this is the first simplified model enabling quantitative investigation of determinants of the probability of cellular arrhythmia events in a complex stochastic spatial myocyte model. This approach, which we refer to as arrhythmia sensitivity analysis, allows for systematic study of the relationship between these event probabilities and their associated myocyte model inputs. As an example, we built a LRM to study the relationship between P (EB) and heart failure G K1 study shows the potential value of our approach in revealing the determinants of arrhythmia probability, indicating the potential for this method to be used as a tool for arrhythmia risk prediction.

Modified walker ventricular myocyte model
This study uses a modified version of the Walker et al. stochastic ventricular myocyte model [5]. The Walker et al. model is a biophysical-detailed three-dimensional spatial stochastic model of a cardiac ventricular myocyte containing 25,000 Ca 2+ release units (CRUs) (each representing a dyad) along with their sub-membrane compartments. A subset of CRUs (5400) was used for all simulations to accelerate the computation and all CRU Ca 2+ fluxes were scaled to the total number of CRUs. This model has been shown to reproduce realistic Ca 2+ waves as well as DADs. Chu et al. [12] recently developed a model of the Na + -Ca 2+ exchanger (NCX) that includes allosteric regulation of NCX function by Ca 2+ . This NCX model was incorporated into the Walker et al. model. Following Chu et al. [12], NCX was localized 15% in dyad, 30% in the submembrane compartment, and 55% in the cytosol. In addition to these changes, NCX conductance was increased 20%, the time constant of Ca 2+ flux from submembrane to cytosol was reduced by 20%, I Kr conductance increased by 20%, SERCA pump rate decreased by 25%, and number of I to2 channels reduced from 8 to 5 in each submembrane compartment to assure a more realistic canine APD (~290 ms) and Ca 2+ transient peak (~800 nM), similar to that of the normal canine ventricular myocyte [13].

Simulation protocol for ectopic beat
In the ectopic beat studies, the initial state of the myocyte model corresponds to the diastolic state, where membrane potential is -93.4 mV and NCX is partially allosterically activated (~20%-40%). To create the condition that favors the generation of DADs and ectopic beats, model simulations are performed under a β-adrenergic stimulation protocol, which is the same as previously described by Walker et al. [5,24], with the exception that RyR opening rate is increased 4-fold compared to its control value based on experimental data for heart failure [14] and to increase the propensity for EBs. No external current was applied, and simulation duration is up to 800ms. Please see details in S1 Text.

Statistical modeling
, where P = P o is a vector of features, B = B o is a vector of weights. P o consists only of linear features (i.e. MMIs). The subscript "o" stands for "initial", representing the first iteration of the pipeline. P(event) represents the estimated event occurrence probability. After this first iteration, the logistic equation Eq 1 is then used as an additional constraint to estimate the transition domain, which is a subregion within the region of interest for the MMI set. A detailed description of transition domain is in the next section. The second iteration then samples only within the transition domain, and steps 4-5 of the workflow are repeated. MMI sets and corresponding event probabilities obtained from both iterations are combined, and the final LRM is built again using Eq 1, where P = P f is the final feature vector and B = B f is the final weights vector. The subscript "f" stands for "final", representing the second iteration of the pipeline. In addition to linear features (MMIs), P f also includes quadratic features derived from the linear features. To select the best performing combination of quadratic features for P f , we formulated the following approach. We first construct a set of feature vectors which includes all possible combinations of quadratic features. Each feature vector in this set consists of all linear features and a unique combination of quadratic features. For each feature vector, we fit Eq 1 with the MMI sets from first and second iterations. We choose, as the best performing set of quadratic features, those that comprise the particular feature vector which minimizes the Akaike information criterion (CAIC) [15].
We refer to this model as the LRM. 100 training MMI sets are generated in both the first and second iterations. For each MMI set, 100 realizations (for ectopic beat study) are simulated to estimate the probability of the event of interest. To ensure the generalizability of the LRM, an additional independent 100 test MMI sets are generated within the region of interest to be used for evaluation of the performance of the LRM. All logistic regression model fits were performed with MATLAB R2018b fitglm function.

MMI identification and MMI region of interest
When building the LRM for predicting probability of occurrence of DAD-induced EBs (P (EB)), four MMIs were selected to be varied. These were initial states of [  [20] has been shown to occur in failing heart myocytes and changes in these cellular properties are associated with cellular arrhythmias. Thus, the range of 0-1 is selected for G K1 _sf. The range for k RyR + _sf is 0.5-1. 5 The region of interest can be separated into two mutually exclusive subspaces in terms of the arrhythmia event probability: plateau domain and transition domain. The plateau domain is the space where MMI sets simulated in the myocyte model yield P(event) strictly equal to 0 or 1. The transition domain is the space where MMI sets simulated in the myocyte model yield P(event) strictly > 0 and < 1. We can specify the lower plateau domain as corresponding to P (event) = 0 and the upper plateau domain as corresponding to P(event) = 1 since these are also mutually exclusive. Because of the finite number of realizations performed with the myocyte model for each MMI set, P(event) takes on a discrete set of values such that we can strictly follow this definition.

LRM validation metrics
Linear regression was performed between values of myocyte model-generated actual P(event) and LRM predicted P(event). The R-squared (R 2 ) from the linear regression was calculated. Absolute error was also calculated as ,where n is the number of MMI sets, P i (event) MM is event probability of ith MMI set obtained from the myocyte model, and P i (event) LRM is event probability of ith MMI set obtained from the LRM.

Uncertainty analyses
The LRM derived in this study is a function that maps MMIs to P(event). Experimental measurements of these MMIs exhibit variability, and this variation can be modeled by assuming these measurements are drawn from an underlying probability distribution. In this case, the event probability produced by the LRM is itself a random variable. We wish to characterize this uncertainty by computing the distribution of the estimated event probability. To do this, when the mean and variance of MMI estimates are available from experimental data, we assume MMIs are random variables drawn from a normal distribution with that mean and variance. In cases where the argument of the LRM (B f T P f ) involves a weighted sum of individual MMIs not including quadratic features, this distribution can be calculated analytically, and is the logit-normal distribution (Eq. S5) [33]. In cases where the feature vector P f includes elements containing quadratic features, the distribution of P(event) need to be estimated numerically. In this latter case, we randomly generate 10 6 sample vectors for a specific MMI set based on the underlying MMI uncertainty distributions and use the LRM to predict P(event) for all sample vectors. Following this approach, the P(event) distribution can be estimated.