A greedy classifier optimization strategy to assess ion channel blocking activity and pro-arrhythmia in hiPSC-cardiomyocytes

Novel studies conducting cardiac safety assessment using human-induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs) are promising but might be limited by their specificity and predictivity. It is often challenging to correctly classify ion channel blockers or to sufficiently predict the risk for Torsade de Pointes (TdP). In this study, we developed a method combining in vitro and in silico experiments to improve machine learning approaches in delivering fast and reliable prediction of drug-induced ion-channel blockade and proarrhythmic behaviour. The algorithm is based on the construction of a dictionary and a greedy optimization, leading to the definition of optimal classifiers. Finally, we present a numerical tool that can accurately predict compound-induced pro-arrhythmic risk and involvement of sodium, calcium and potassium channels, based on hiPSC-CM field potential data.


Introduction
The Comprehensive in vitro Proarrhythmia Assay (CiPA) is an initiative for a new paradigm in safety pharmacology to redefine the non-clinical evaluation of Torsade de Pointes (TdP) [1][2][3].
It aims to more precisely assess TdP risk in vitro by using a multifaceted approach that combines in vitro evaluations of electrophysiological responses in human-induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs) and in silico models providing reconstructions of drug effects on ventricular electrical activity [4,5].
Since CiPA, in vitro studies using hiPSC-CMs become an increasingly integrated part of today's cardiac safety assessment. While encouraging, adequately predicting TdP risk of unknown drugs based on in vitro studies alone is challenging [6]. Besides, the analysis of the large data sets derived from those studies is often far from being automated.
One of the main challenges in proposing a high-throughput screening based on novel devices is often related to the variability of the signals measured, that could pose sensible questions about the ability to extract useful information from them. The main impact of the present work is related to this aspect, and the proposed framework can be considered as a first preliminary step towards the setup of a systematic procedure.
The main focus of the present study is to investigate a computational tool that combines statistical analysis and machine learning approaches (used in this context in [7]) to the mathematical modeling and the numerical simulations (in silico experiments) of the drug effects on the field potential (FP) of hiPSC-CMs obtained by multi-electrode array (MEA) technology.
Two problems of interest in the pharmacology community will be addressed: the first one is related to the prediction of the proarrhythmic behaviour of a drug, and the second one to the ion channels blockade (see Fig 1 for a summary of the main keypoints of the article). These are typical classification tasks. Some classification studies in cardiac electrophysiology were proposed in the literature, on simulated action potentials [7,8] or ECG [9,10].
The contributions of the present paper are the following: 1. A dictionary based greedy optimisation method is proposed, that selects the most pertinent signal features to maximise the classification score. This procedure helps correct the classical markers used to analyse Field Potential signals and provides encouraging results.
2. The in vitro dataset is complemented by an an in silico dataset. This makes it possible to explore all the possible scenarios and help mitigate the high-dimensional/low sample size regime potentially affecting the performances of the classifiers. 3. In constructing the signal database, the uncertainties affecting the experimental setup are accounted for. Despite many variability sources, the proposed approach aims at defining a robust classifier. Concerning the problems considered in the present manuscript, in the Field Potential there is enough information to provide an answer to them, irrespective of all the uncertainties affecting the experimental setup. 4. The proposed approach was tested on real data coming from actual experiments performed with MEA technology.
The work is structured as follows: the first part is dedicated to the methods used to reproduce in silico physiological signals (FP and calcium transient signals) based on the bidomain equations [11] and the O'Hara, Virág, Varró and Rudy (ORd) ionic model [12]. The relation between drug concentration and ion channel activity is rendered through scaling factors depending on IC50 values (as proposed in [13][14][15]. The outputs of the in silico model are the simulations of the Field Potentials (FPs) recorded from extracellular micro electrodes, and the averaged calcium transient on a well ([Ca 2+ ] i ).
The second part is dedicated to the description of the method used to integrate in silico experiments and in vitro data in order to design an optimised classification tool. The proposed approach is based on the construction of a dictionary of linear and non-linear forms applied to the set of in vitro and in silico data; a greedy algorithm is defined to build a sparse observationto-prediction relation.
Finally, we applied the classification process in two situations: detecting torsadogenicity (TdP risk versus non-TdP risk) with a synthetic dataset and detecting ion channel blockade (for sodium, calcium or potassium channels) by the action of a given compound, on in vitro MEA data.
The classification results obtained show that the double greedy optimization strategy is effective in improving classifiers performances (with only a few parameters to be tuned) and is well adapted to study compound effects on hiPSC-CM electrophysiology that will aid in early and predictive cardiac safety assessment.

Materials and methods
In this section, we present the method developed to improve the classification of electrophysiological regimes based on MEA signals. It consists in fusing together information coming from available experimental MEA data and numerical simulations in order to design the classifiers to be used.
First, the experimental methods are described; then, we show the different models used to reproduce FPs and calcium signals (MEA computational model) and we end the section by presenting the optimised classification algorithm (Classification) and the definition of the dictionary entries (Dictionary construction). The structure of this section is shown in Fig 2.

Experimental setup
The methods used to perform the experiments and acquire the recordings of the FPs are presented in detail below.
Cell culture. Human iPSC-CMs (Pluricyte Cardiomyocytes, Ncardia, Leiden, The Netherlands) were stored in liquid nitrogen until thawed and cultured onto 96 well MEA plates (Axion Biosystems, Inc., Atlanta, USA) according to manufacturer instructions. Briefly, the MEA plates were coated with fibronectin (50μg/ml in PBS [+ Ca2+ & Mg2+], Sigma-Aldrich, St. Louis, MO, USA; Cat. No. F-1141) for 3 hours at 37˚C and 5% CO2. After 3 hours of incubation time, the excess of fibronectin coating solution was removed and cells were plated in a 5 μl droplet at a density of 25000 cells/well. After 1 hour of incubation (37˚C and 5% CO2), 100μl pre-warmed (37˚C) medium (Pluricyte Cardiomyocyte Medium, Ncardia, Leiden, The Netherlands) was carefully added to each well. Cells were maintained in Pluricyte Cardiomyocyte Medium for 8 days and refreshments took place at day 1 post-thaw and subsequently every other day. MEA recordings were performed at day 8 post-thaw. The choice of these different parameters of the experimental setup was presented and commented in [16].
Test compounds. At day 8 post-thaw, medium was refreshed at least 2 hours before compound addition. The 12 test compounds were provided by the Chemotherapeutic Agents Repository of the National Cancer Institute and consisted of a random subset of CiPA compounds. The compounds were from 3 clinical TdP risk categories: low/no (Loratadine, Mexiletine, Diltiazem), intermediate (Clozapine, Chlorpromazine, Clarithromycine, Cisapride, Droperidol) and high (Ibutilide, Dofetilide, Bepridil, Azimilide) [17] (see Table 1). Chemical stock solutions at 1000 folds of the target concentrations were prepared under sterile conditions in DMSO and stored at −20˚C, according to HESI Myocyte Phase II Validation Study Protocol instructions. The serial diluted compounds were further prepared in DMSO on the day of compound assay. The 10-fold final dilutions of the compounds were prepared with Pluricyte Cardiomyocyte Medium, for single time use only. Pluricyte Cardiomyocytes were exposed to four different concentrations of the compound, under sterile conditions in single point additions (i.e. one concentration per well) in five replicates for each concentration. Vehicle control was 0.1% DMSO.
Ethic statement. All the experiments were performed under permits granted from the Commissie Medische Ethiek Leiden University Medical Center (permit number: NL45478.058.13).
MEA recordings. At day 8 post-thaw, 96 well MEA plates seeded with hiPSC-CMs were placed in the Maestro MEA device (768-channel amplifier) with an integrated heating system, temperature controller and data acquisition interface (Axion BioSystems, Inc., Atlanta, USA). The field potential traces of the hiPSC-CMs were recorded prior to (baseline) and 30 min after compound addition for 5 min. The recording conditions were at 37˚C using Cardiac Standard filters and amplifiers in spontaneous cardiac mode (12.5 Hz sampling frequency, 2 kHz Kaiser Window, 0.1 Hz IIR). The beat detection threshold was 300 μV.

MEA computational model
This part provides a detailed description of the mathematical models used to simulate FPs in a realistic MEA geometry. Simulated FP studies were already performed for in silico assessment of drugs effects [14] or channel activity identification [18] and have shown the potency to reproduce and analyze compound effects on cardiac electrophysiology.
The first section concerns the bidomain equations, which governs the electrical activity propagation in a tissue. Since in a well the cells might not be perfectly uniformly distributed and the cell population might even be heterogeneous, a stochastic model of the population distribution was adopted, which is described in section Heterogeneity. In the last part (Drug modeling) we describe the compound simulation strategy, aiming at reproducing the experimental protocol used to classify reference compounds holding ion channel blocking properties.
Bidomain equations. To simulate MEA recordings, the bidomain Eq (1) were solved on a domain D, representing a well, by using a finite element method. The main geometrical hypothesis is that the cardiac cells in the well form a monolayer; therefore, to write the model, it is reasonable to consider a two dimensional domain, D � R 2 .
The equations read: where A m is the surface area of the membrane per unit volume of tissue, C m the membrane capacitance and z thick the thickness of the cell layer. V m and ϕ e correspond respectively to the transmembrane potential and extracellular potential and I ion is the ionic model depending on V m and the gating variables. The propagation velocity is carried by the intracellular and extracellular conductivity (σ i and σ e respectively). The boundary conditions used with the bidomain equations are the following: σ i rϕ i � n = 0 (with ϕ i = V m + ϕ e ), and either ϕ e = 0 on the region connected to the electrical ground or σ e rϕ e � n = 0 elsewhere.

PLOS COMPUTATIONAL BIOLOGY
To take the impact of the electrodes on the signal into account, an imperfect electrode model [18] is coupled to the bidomain equations. The model is described in Eq 2.
where � k e;mean is the averaged extracellular potential on the electrode e k , R el and C el are the electrode resistance and electrode capacitance respectively and R i is the internal resistance of the measurement device. τ = C el (R i + R el ) is the time constant of the RC circuit. Then, the field potential � k f measured on the electrode e k is given by � k f ¼ R i I k el . For this study, electrodes parameters values used are summarized in Table A in S1 Text.
The ionic current I ion (V m , w) and the state variable w are provided by the ORd model [12]. Three types of cells are considered to mimic the monolayer heterogeneity (Heterogeneity): Epicardial, Mid-myocardial and Endocardial. These cell types are simulated through specific sets of parameters given in [12]. This model takes into account the main concentration dynamics ([Na + ] i , [Ca 2+ ] i and [K + ] i ). The current I app = I app (x, y, t) is the origin of the activation. The source is supposed to be located in a unique region and is defined as follows: where the position (x 0 , y 0 ) is drawn randomly and r = 50μm is the radius of the source. I 0 = −130pA/pF is the maximum stimulation value, t 0 is the time when I app is at its maximum and s ¼ Dt 6 with Δt = 4ms. The discretization of the partial differential equation was done in space using P1 Lagrangian finite elements and in time using backward differentiation formula (BDF) schemes with a time step of 0.1 ms. The ODE system governing the action potential modeling (I ion (V m , ω) in Eq 1) was solved using BDF scheme with adaptive time steps and order, whose implementation is provided by Sundials' CVODE [19]. These space and time discretizations of the bidomain equations were already used in different studies (in silico ECG and in silico field potentials) and have shown qualitatively good results compared with real data [14,18,[20][21][22][23].
To mimic experimental measurements, a 10μV standard deviation noise of a zero-mean Gaussian was added to FPs.
As some devices are able to get the intracellular calcium transient by fluorescence, we made the assumption that we have access to intracellular calcium transient data. We added a zeromean Gaussian noise of 10 −3 μM on the intracellular calcium transient obtained by simulation with the ORd model.
Heterogeneity. The hiPSC-CMs used in this study are > 70% pure cardiomyocytes based on positive Troponin T (TnT) expression. At least 70% of the TnT positive cells express a ventricular phenotype (based on ventricular myosin light chain 2 (MLC2v) expression and patch clamp technology). The other 30% of the cell population are of mesodermal origin. The actual distribution of these cells inside the well is unknown, and is a source of uncertainty that we need to take into account when developing the classifier in order to provide meaningful results in realistic applications.
To simulate this heterogeneity in the cell distribution inside a well, a space stochastic process was introduced, similarly to what was proposed in [24]: let ðZ; A; PÞ be a complete probability space, Z being the set of outcomes, A a σ-algebra and P a probability measure: An hypothesis on the correlation of the process was made and expressed in Eq 5: that is, the correlation is normal and its length l c was set to 0.25 mm, which corresponds approximately to the distance between two electrodes. A Karhunen-Loève expansion based on the diagonalisation of the correlation kernel was used in order to generate the heterogeneity fields (see [24] for details). As shown in [25] in a case of a low value for l c , the medium is homogenised which leads to a decrease of the repolarization phase. The parameter l c = 0.25 mm was a good choice to qualitatively reproduce FP signals and more precisely the repolarization phase. When discretised on the finite element space (P1 Lagrangian elements were used), a cell type was affected to each node of the finite element mesh according to the following rule: Endocardial; otherwise: where c i 2 [0, 1] is given by the random process c discretised at the node whose coordinates are x (i) (see Eqs 4 and 5). Example of random heterogeneity obtained with this method is presented in Fig 3. Drug modeling. In this study we assume that a drug may affect only sodium, calcium and/or potassium channels. If the properties of a compound are known (e.g., IC50 for each ionic channel) then the conductance-block model [13][14][15] can be used to render its action on the ion channels. This model rewritten in Eq (7) qualitatively reproduces the expected compound effect on FPs [20].
where g control,s is the conductance of the channel s at control case (baseline), [D] is the concentration of the drug and IC50 s is the constant of the drug concentration at which current of channel s is blocked at 50%. We chose to set the Hill coefficient n at 1. The first reason is due to the confidence intervals of computed Hill coefficients for different compounds [26] which most of the time includes 1. The second reason comes from the use of the EFTPC in our simulations. Varying the Hill coefficient between 0.6 to 1.4, the standard deviation of the channel activity is lower than 0.05 for concentrations higher than the IC50 and lower than 0.03 for concentrations lower than the IC50 (see Fig 4). The use of the EFTPC leads to a low variability in the channel activity according to the Hill coefficient and studied compounds.
On the other hand, we can simulate unknown compounds, blocking randomly sodium, calcium and/or potassium channels.

Dictionary construction
In this section the details about the construction of the dictionary entries are provided and commented. As mentioned previously, the dictionary is a collection of linear and non-linear forms applied to the signals, corresponding to the definition of features (think, for instance, to the maximum of the signal, or its average, and so on). Since a greedy optimization strategy has PLOS COMPUTATIONAL BIOLOGY been devised, and since the internal stages of the descent are easily parallelizable, the affordable dictionary size is potentially large (a few hundred in the study, potentially few thousands).
In the present work, the dictionary is divided into two parts: • non-agnostic, or informed.

• agnostic
In the informed part we collect the biomarkers extracted from the signal, identified by the experts as correlated to some regime of interest. These quantities are meant to reveal a particular state of the system or alteration of a parameter. For instance, altering the sodium channel activity induces a modification in the depolarization amplitude [29]. The second part of the dictionary is agnostic, meaning that the linear and non-linear forms introduced are extracted from the signal as a mathematical object. The goal of the agnostic part of the dictionary is to enrich it, henceforth increasing the possibilities of computing from the dictionary an input leading to a good classification. The dictionary entries and their numbering are presented in Table D in S1 Text.
Dictionary entries: Electrophysiological biomarkers. First, some intuitive biomarkers were extracted, e.g. depolarization amplitude (DA), field potential duration (FPD), etc. These quantities (called parameters in the electrophysiology community are presented in Fig 6. Remark: In the electrophysiology community we often refer to parameters to designate quantities extracted from the experimental signals. In the present work, we follow the usage in applied mathematics and engineering communities, that name parameters the quantities affecting the state of the system and not the quantities read from the system observable. Their computation follows the work in [24] and it is described in more details in Section Field Potential Biomarkers computation (S1 Text). Concerning the calcium transient signal computation, details are given in Section Calcium Signals Biomarkers computation (S1 Text).
As these values of these biomarkers are computed in control and drug case, we decided to use relative values to the control case. For instance, the DA ratio is: DA drug DA ctrl . The justification of this choice is shown in Fig 7. As we can see, even if the control case is different, the impact due to a compound is qualitatively the same regardless of the heterogeneity field.
An example of an effect of a drug on the repolarization of the cells compared to baseline is presented in Fig 8. In a case where a drug does not affect the repolarization, we should obtain a curve similar to f(x) = x (red line in Fig 8).
In the case where the repolarization is affected by a compound, a distortion appears on the signal (see black dots lower panel in Fig 8 corresponding to an increase in the FPD). Five markers (from K1 to K5) were extracted from the signal, with FP drug rep and FP ctrl rep the repolarization part of the FP for the drug and control case: • Time of the maximum deviation.
Dictionary entries: Wavelets. In order to construct the agnostic part of the signal, a wavelet decomposition was considered for the repolarization phase. The number of coefficients retained is such that the signal could be represented up to the noise level by the wavelets expansion. When a new signal is analyzed, only the selected coefficients (already computed for the training database) are then used to reconstruct the signal. If the L 2 error is lower than an arbitrary value, we store these coefficients. Otherwise, we compute the new location and add the missing locations. The wavelet transform was done on the absolute difference between the   Table F in S1 Text) on the FP (from one electrode) and intracellular calcium transient (from one well) for two different heterogeneity fields. A finite element mesh of 96-well MEA device from Axion Biosystems was used for this simulation (see right panel of Fig 3).

Classification
Given a molecule which is a candidate to become a drug, several questions arise concerning its impact on the electrical activity of cells. Basic questions like: Is this drug blocking channel X? or: Is the drug potentially causing arrhythmia? are naturally treated by solving a classification problem.
One of the main difficulties related to such a study is the curse of dimensionality [30] since we are dealing with high dimension, low sample size data. Otherwise stated, the function to be identified in view of setting up efficient classifiers is defined over a high dimensional domain PLOS COMPUTATIONAL BIOLOGY and the number of available data is too low. To tackle this problem, numerical simulations were exploited. The rationale behind the strategy is twofold: first, we added virtual in silico experiments to the data set to increase the population size. Second, we exploited the simulations to extract meaningful low dimensional subsets of the data, which contributed to the mitigation of the high-dimensionality. Several methods are available in the literature to extract these low dimensional subsets [31].
However, the risk in using generic problem independent methods is that the subsets obtained could drastically reduce the amount of information conveyed about the quantity of interest to be classified.
Henceforth, we constructed a low dimensional subset of the data that has been exploited in the classifier construction, designed to deliver optimal classification performances. This method can be applied to all different classification techniques, and the result is classifier dependent.
Classification optimization. As stated, the aim was to set up a dimension reduction strategy in such a way that, when the extracted subset of the data is used as input for the classifier, we obtain optimal performances. The procedure reminds, in the spirit, the feature selection PLOS COMPUTATIONAL BIOLOGY algorithms (and feature hashing), with some important differences, which will be highlighted later on.
First of all, for a classification problem, we needed to define a database of signal samples. Let i 2 N � ; i � N s , the i-th signal sample is z ðiÞ ðtÞ : R þ ! R m , where m depends on the number of electrodes of the device, as well as the ability for instance to read intracellular calcium transients. The database contains either real signals coming from MEA experiments or numerical signals.
After having collected a database of N s signals, we defined a dictionary of linear and nonlinear forms that has been applied to the signal.
The whole dictionary applied to the signals population is stored in a matrix The input of the classifier is defined as a linear combination of the dictionary entries applied to a signal: The weights ω kj of the linear combination are stored in a matrix O 2 R d�N b . The goal, is to find the weights matrix which defines the input space maximizing the classification score, for a given classification strategy. The problem is recast as a minimization of a cost function, describing the classification performances.
In the binary classification case, let N s be the number of samples, l i = {−1, 1} the true label of the i th sample andl i ¼ fÀ 1; 1g the predicted label of the i th sample associated withp i 2 1 2 ; 1 � � the confidence of the classifier to be the predicted label. n 1 (respectively n −1 ) is the number of samples labeled as 1 (respectively -1) and is introduced to avoid a possible bias due to unbalanced classes. Then, we have n 1 + n −1 = N s . The δ i (x) function is the Dirac function (δ k (x) = 1 if x = k, 0 otherwise). Finally parameter α � 1 is introduced to penalize the false positive case (l i = 1 andl i ¼ À 1). The expression of the cost function is presented in Eq 11.
With this formulation, the minimum value of s N s is -2 and the highest value is 1 + α. See the demonstration in Cost function demonstration: 2 classes (S1 Text). The rationale of including the termsp i in the cost function is to better describe the performances of the classifier, accounting for the confidence in the classification, and not merely on the success rate. This aims at setting up a robust classification tool.
Following the same principles, the cost function can be extended to k classes as shown in Eq 12: where n j is the number of samples labeled j and α j > 0 the weight assigned if the predicted PLOS COMPUTATIONAL BIOLOGY label is not the class j. The value obtained for the best case is −k whereas the cost obtained in the worst case is P k j¼1 a j . Demonstrations are presented in Cost function demonstration: general case (S1 Text). In the case where we do not penalize classes, all the α j are equal to 1.
A regularization term was added to the cost function: where b 2 R þ is a penalization parameter. This term aims at breaking the scaling invariance of the linear combination of the dictionary entries. In particular, if a linear classifier is used, let a 2 R; a 6 ¼ 0, the classification score when using α O is the same as O, irrespective of the value of α.
The optimization problem reads: The optimization problem is challenging to be solved for several reasons: first, the dimension d of the input space to be found is not known a priori, but has to be determined. Second, given realistic signals, the number of features that can be extracted, and hence the size of the dictionary, can be quite large, leading to an optimization problem on a large dimensional space. To mitigate these difficulties, a greedy optimization strategy was adopted. The rationale is the following: we are interested in finding an input space of small dimension (to overcome the curse of dimensionality), and it is preferable that each input dimension is expressed as a sparse linear combination of the dictionary entries, to avoid overfitting problems. Henceforth, we start by defining an input space of dimension d = 1. For this, the weight matrix reduces to a vector. Assuming that only one dictionary entry can be used, an optimization is performed to choose the entry. Then, we look for a combination of two dictionary entries, and so on, till the score variation when adding an entry is less than a prescribed tolerance. When the first component of the input space x 1 is determined, we look for possible improvements by setting d = 2 and computing x 2 in the same way, when x 1 is the one found at the previous step of the algorithm. The greedy descent stops when the variation of the classification score is lower than an arbitrary tolerance. This double greedy strategy is defined in the pseudo-code Algorithm 2. The inner steps of the algorithm require the solution of a small size optimization problem, which is carried out by using a stochastic evolutionary strategy CMA-ES (presented in [32]).
Algorithm 2 Section Classification optimization: Sparse optimised classification. for ic = 1 to N comp do ⊳ Loop on the desired number of components to take into account for dimension id. 9: p save , ω save ⊳ Empty vectors. Store respectively best components position and best weights. 10:  The stopping criterion can be derived based on the analysis of the cost function. We propose to find the minimum variation which ensures that the classification does not change. It means that only the probabilities can differ between two consecutive cost computations without changing the predicted class. This corresponds to an increase of the margin between the classes. More details are given in Stop criterion based on the success rate variation (S1 Text).
Remark: the presented algorithm can be interpreted as a simple perceptron [33] with one hidden layer as shown in Fig 10 where all the weights are equal to zero except those selected by the algorithm. Instead of using usual activation function as sigmoid or Heaviside [34] we used the linear discriminant analysis (LDA) technique [35]. The analogy to the back propagation to compute the weights corresponds here to the CMA-ES algorithm. For a given set of weights, the obtained output is compared to the real solution through the cost function. Then a correction is made on the weight until convergence. Fig 11 shows the multidimensional case where the LDA classifier takes the different linear combinations computed for each dimension as input.
Cross-Validation. As inputs are computed to maximize the classification, a risk is to lose the generalization capacity of a good classifier. To prevent the overfitting and to increase the robustness of the strategy a random k-fold cross-validation was used. A stratification was applied on the data to ensure the conservation of the output repartition in each fold.
The pseudo-code is described in Algorithm 3 and the corresponding Scikit-Learn method was used.
Algorithm 3 Section Cross-Validation: Randomized K-fold cross-validation procedure.   The repetition of the random K-fold strategy allows the convergence of the weights regardless of the training and test set generated. The higher the number of weights to determine, the higher should be the number of random K-fold.

Results
Two different studies were performed in the present work: classify compounds for their risk on TdP; classify compounds for their ion channel blocking properties.
The results of these are presented hereafter.
In the first part of section TdP classification, we describe the study results based on the conductance-block model (see Eq 7). Using this model, we classified the TdP risk of 86 known compounds based on simulated data using the compound's IC50 values for blocking sodium, potassium and calcium currents and the effective free therapeutic plasma concentration (EFTPC) values, reported by the literature.
In the second study (section Channel classification) we classified compounds based on experimental data. The outcome consists in identifying which channel is affected (sodium, calcium or potassium) by a compound. These experiments were performed for 12 compounds using Pluricyte Cardiomyocytes. Five of them were used for the training set and seven of them for the validation. Because of the low sample size of data, a simulated database was generated to enrich the training set.
The stop criterion used for the following results is when the cost variation between the last two components is lower than 5%. The penalization parameter β described in Eq 13 was set to 0.1.

TdP classification
This section is dedicated to the torsadogenicity risk classification. Only simulated data are considered for this study. To predict the risk of TdP of a wide range of compounds, we simulated the application of 86 known compounds previously reported by [7].
Tests setup. The numerical choices leading to the results are summarised hereafter: • The false positive part in the cost function (see Eq 11) was taken α = 2, to minimise the false positive rate.
• Bidomain equation parameters are summarized in Table B in S1 Text.
• Drugs were modeled using Eq 7 presented in section Drug modeling. The IC50 values for each compound are given in [13,36]. Concentrations chosen to simulate compounds are the effective free therapeutic plasma concentrations (EFTPC). These values are listed in Tables E and F in S1 Text. Eighteen drugs were modeled twice because of their different IC50 and EFTPC observed in the literature (see Tables E and F in S1 Text).
• The corresponding channels blocked in the ORd model are I Na (g Na ), I Kr (g Kr ) and calcium channels (I CaL , I CaNa and I CaK ) through the PCa variable as previously reported in [7].
• A 6-well MEA device (Multichannel Systems) with 9 electrodes per well (60-6well-MEA20030iR-Ti [37]) where the corresponding finite element mesh is presented in the left panel of Fig 3. A cell heterogeneity field was applied on this finite element mesh following the strategy developed in section Heterogeneity.
• The sparse optimization was performed on a dataset of 1520 data points (76 first compounds, each compound simulated 20 times with different heterogeneities and sources). The FP traces corresponding to the last 10 compounds were also simulated 20 times with different heterogeneities and source, but used for the validation set. The same process was done for the calcium transient signals. The dictionary entries used for this classification problem are summarized in Table D in S1 Text.

Results of TdP classification
We start this section by commenting on the results of the classifier as function of the input space constructed by the greedy algorithm. In Fig 12 the success rate of the classification for the validation set is plotted as function of the cost presented in section Classification optimization, Eq (11). The cost minimised by the proposed algorithm is a pertinent descriptor of the success rate of the classifier. The input space selected by progressively increasing the input space dimension as well as the components per dimension produces a high success rate. The input space corresponding to the case where the input is in R 2 (with three dictionary components per direction) is shown in Fig 12, from which we can appreciate that the separation between the classes is satisfactory. The results of the classification are detailed. In Fig 13 the confusion matrices for the training set (left, in blue) and for the validation set (right, red) are shown. Globally, the results are similar for training and validation (no apparent overfitting phenomena were seen). The type II error (wrongly classifying a compound as non-torsadogenic) is well minimized thanks to the choice to penalize false positives (α = 2 in the cost function, Eq 11). In the validation set, no compounds were wrongly classified as non-torsadogenic. Only the Propranolol was misclassified as torsadogenic (see Table E in S1 Text).
It is interesting to monitor the classification results at different stages of the algorithm. The confusion matrices are given in Figure A in S1 Text. Confusion matrices obtained for test and validation sets show an improved TdP risk classification when we increase the number of components and dimensions. This improvement is particularly visible on the test set for the first components. The training on the first dimension is not sufficient to classify well the validation set. Meaning that other dictionary entries would have been selected by the algorithm (for the first dimension) if the validation set was in the training set. However, dictionary entries selected for the second dimension seem to be better to discriminate torsadogenic risk on the validation set.

Channel classification
This section is dedicated to the channel classification of 12 compounds based on in vitro data derived from MEA recordings of spontaneous beating hiPSC-CMs (Pluricyte Cardiomyocytes) cultured on 96 well MEA plates (8 electrodes per well, Axion Biosystems), as described in section Cell culture. As we are limited by the experimental sample size (see compound list in Table 1), we enriched the experimental database with a simulated database (for which we know the classification output). For this study only FP traces were recorded and used for the training and classification, no calcium transient measurements were performed.
Tests setup. The numerical choices leading to the results are summarised hereafter: • Bidomain equation parameters are summarized in Table C in S1 Text.
• Drugs were modeled using Eq 11 presented in section MEA computational model. The in silico database was generated blocking alternatively sodium (g Na ), potassium (g Kr ) or calcium (PCa) channels of the ORd model at a random percentage between 0% and 50%. Other channels are blocked between 0% to 5% to introduce some variability (e.g. blocking sodium at 35%, calcium at 2% and potassium at 3.5%). An example is shown in Fig 14. • The simulated sample size is 140 (computed from signals resulting from the simulation performed for different heterogeneity fields).
• A 96-well MEA device (Axion Bioystems) with 8 electrodes per well where the corresponding finite element mesh is presented in the right panel of Fig 3. A cell heterogeneity field was applied on this finite element mesh following the strategy developed in section Heterogeneity.
The experimental data leading to the results are summarised hereafter: • In vitro data used for this part are FP traces recorded from a hiPSC-CM monolayer (Pluricyte Cardiomyocytes, Ncardia) plated on a 96 well MEA plate (8 electrodes per well) Axion Biosystems (Classic MEA 96 M768-KAP-96 [38]).
• The 12 "CiPA" compounds listed in Table 1 were tested on Pluricyte Cardiomyocytes and FP traces were recorded before and 30 minutes post compound addition. MEA results of 5 PLOS COMPUTATIONAL BIOLOGY compounds were used for the training and MEA results of 7 "blind" compounds for the validation.
• Each compound was tested at 4 concentrations, 1 concentration per well and in 5 replicates (n = 5 per concentration).
The final experimental sample size was 75 for the training set and 85 for the validation set (some wells were removed from the analysis due to quiescence or noisy signal observations). The dictionary entry list is given in Table D in S1 Text.
Using the conductance-block model described in Eq 7 we obtain the percentage of activity for each channel and concentration. This is shown in Table G in S1 Text.
Two different kinds of classification problems have been studied. A binary classification (i. e. given a channel, is the molecule affecting its functioning), whose results are shown in section Binary classification: and a ternary classification (i.e. is the molecule affecting potassium, calcium or sodium?), whose results are reported in section Ternary classification. For the numerical experiments proposed, the success rate of the classifier for the training set was about 90%. In the following, we present in details the results on the in vitro data in the validation set.

PLOS COMPUTATIONAL BIOLOGY
Binary classification. We start this section by describing the outcome of the greedy algorithm selection. These are shown, for the three classification problems addressed, in Fig 15, in which the weights of the dictionary entries are plotted. The selected entries are different (also in number) for the different classification problems. For instance, for the sodium binary classification, we obtained 3 components for the dimension 1 whereas for the same dimension, we obtained 4 components for the potassium case and 5 for the calcium case. In all the cases, the linear combinations retained are sparse.
The classification results are reported hereafter. First, an aggregated result is presented (considering all the different concentrations, providing an overall label). Then, in the last part of this section, the results at different concentrations are described.
Binary classification: Aggregated result. Fig 16 shows classification results for the seven compounds that were included in the validation set. The value shown for each compound corresponds to the success rate of classifying the compound correctly as a blocker or non-blocker for either the sodium, potassium or calcium channel according to their label (see Table 1).
The results for the 7 molecules in the validation set are commented. (Table 1) For the sodium channel block classification, Loratadine is always well classified (as a non PLOS COMPUTATIONAL BIOLOGY sodium channel blocker) with high confidence (averaged probability returned by the classifier is close to one, see Fig 16). For potassium blockade, Loratadine is well classified in 80% of the cases. Moreover, when Loratadine is well classified, the classifier is more confident (> 0.8) than when it is misclassified (< 0.8). (Table 1) The classifier always returns Dofetilide as a potassium channel blocker with a high probability. Dofetilide is also classified as a sodium blocker, but only for 10% of the cases and with a lower probability than when it is not classified as a sodium blocker. For the calcium channel block classification, Dofetilide is considered as a non-calcium channel blocker for 40% of the cases but with a higher probability than when it is considered as a calcium channel blocker.

Dofetilide: potassium channel blocker
3. Chlorpromazine: potassium, calcium and sodium channel blocker (Table 1) Chlorpromazine is well classified for the potassium and calcium channel classifications (i.e. it is considered as a potassium channel blocker and calcium channel blocker). The success rates for Chlorpromazine are similar to those obtained with Loratadine. An explanation might be the fact that they approximately have the same factor between the hERG and Cav1.2 IC50 values. In addition, Chlorpromazine is classified as a sodium channel blocker in only 20% of all cases, but with a probability of 100%.  (Table 1) Clarithromycine is well considered as a non-sodium channel blocker with a high confidence and for all tested samples. In 70% of the cases Clarithromycine is well classified as a potassium channel blocker with around 90% of confidence. However, Clarithromycine is also labeled as a calcium blocker for 80% of the samples and with more than 80% of confidence. Important to note here is that, although Clarithromycine is labeled as a non-calcium channel blocker for only 20% of the samples, the confidence for this well classification is close to 100%. (Table 1) If we compare classification results obtained for Chlorpromazine and Cisapride, the potassium channel binary classification success rate is the same. However, the classifier is more confident when Cisapride is well classified. Moreover, for the calcium channel classification, Chlorpromazine is classified as a calcium channel blocker in 60% of the cases whereas Cisapride is classified as non-calcium channel blocker in 50% of the cases with a higher confidence than when Cisapride is misclassified as calcium channel blocker. Moreover, in 90% of the samples tested, Cisapride is being classified as a non-sodium channel blocker with a confidence close to 100%. These results are in good agreement with the high potency of Cisapride to block the hERG channel, and the multi-channel block ability (hERG, Nav1.5 and Cav1.2) for Chlorpromazine.

Cisapride: potassium channel blocker
6. Bepridil: potassium, calcium and sodium channel blocker (Table 1) Sodium and potassium channel blockade classification for Bepridil is well captured by the classifier (with high proportion and high confidence). Calcium channel blockade is not seen by the classifier for Bepridil. A potential explanation could be that if calcium and potassium channels are blocked simultaneously, Bepridil does not show a specific pattern of a calcium channel blocker, but essentially potassium and sodium channel patterns are detected as shown in Fig 17 (e.g FPD prolongation due to potassium channel block and DA decrease due to sodium channel block).
7. Azimilide: potassium, sodium and calcium channel blocker (Table 1) Azimilide is well classified as a potassium channel blocker with a high confidence and for 90% of the sample. The sodium channel blockade by Azimilide is clearly not seen by the classifier as 90% of the sample is labeled as non-sodium channel blockade with a confidence close to 100%. The calcium channel blockade classification is also less clear as only 70% of the samples are labeled as non-calcium channel blockade with almost 80% of confidence. This could be related to the potency of Azimilide to block the inward sodium currents and L-type calcium channels is lower than for blocking the hERG channel [52]. Besides, the highest concentration tested was lower than the IC50 values for blocking sodium and calcium channels (Table 1). A dictionary entry chosen by the algorithm for potassium and calcium blockade classification is the ratio RW RA (see Fig 15). As shown in [53], hERG channel block can induce a T-wave flattening in the ECG. This phenomenon is also observed in the FP repolarization of Pluricyte Cardiomyocytes for 0.1μM of Bepridil (see Fig 17) and could be an explanation of the RW RA selection by the algorithm.
For most of the cases, drugs are well classified with a high confidence. However, this is not always the case. For instance, Dofetilide has been perfectly classified as a potassium channel blocker with a high confidence (around 90%), but Dofetilide has also been misclassified as a calcium channel blocker with a high confidence (around 70%).
Binary classification: Study for each concentration. Details for ion channel block classification of each concentration of each compound are given in Fig 18. This figure shows how each compound was classified at each concentration. The interest is to study the evolution of the classification with respect to increasing concentrations.
As done in the previous section, we present the results for each of the 7 molecules in the validation set. (Table 1) We know from Fig 16 that Loratadine is always classified as a non-sodium channel blocker. From Fig 18 we can conclude that the confidence of Loratadine being a non-sodium channel blocker increases with higher concentrations. In addition, Loratadine has also been classified as a potassium channel blocker in 80% of the cases (see Fig 16). Fig 18 shows that the classification is the best at the highest concentration (no misclassification), which can be explained by the relatively low test concentrations compared to the IC50 values (Table 1). Moreover, for the first two concentrations in the potassium channel classification, the confidence is higher when Loratadine is well classified than when Loratadine is misclassified. A bad mark is the increase of the misclassified confidence for the first three concentrations.

PLOS COMPUTATIONAL BIOLOGY
However, for the highest concentration tested, none of the samples were misclassified. Concerning the classification for the calcium channel, the success rate of Loratadine to be classified as a calcium channel blocker was 70% (Fig 16); and based on Fig 18 we can conclude that the misclassified confidence decreases strongly when the concentration increases. This is in line with the differences seen in IC50 values between hERG, Cav1.2 and Nav1.5 (Table 1). (Table 1) Dofetilide was wrongly labeled as a calcium channel blocker in 60% of the cases (see Fig  16). However, the well-classified confidence increases strongly with the concentration (see Fig 18), which means that the confidence of Dofetilide being a calcium channel blocker decreases when the concentration increases. The well-classified probability for the sodium channel (Dofetilide being a non-sodium channel blocker) and potassium channel (Dofetilide being a potassium channel blocker) is around 90% or even higher for all concentrations tested.

Dofetilide: potassium channel blocker
3. Chlorpromazine: potassium, calcium and sodium channel blocker (Table 1) Chlorpromazine is known to block sodium, potassium and calcium channels (see Table 1).
Only for the first three concentrations, Chlorpromazine is clearly seen as a potassium PLOS COMPUTATIONAL BIOLOGY channel blocker (Fig 18). The fourth and highest concentration show that sodium and calcium channels are affected instead of potassium. This is in line with the different potencies of Chlorpromazine for the different ion channels: Chlorpromazine blocks hERG more potently than sodium or calcium (see Table 1). The calcium channel blockade is confirmed by the fact that well-classified confidence for calcium channel block increases with concentration, in addition to being well classified in 60% of all cases.
4. Clarithromycine: potassium channel blocker (Table 1) Clarithromycine is better classified as a potassium channel blocker at higher concentrations (higher confidence for the third concentration and no misclassification for the fourth concentration). Also the sodium classifier shows us that for all test concentrations, Clarithromycine is well classified as a non-sodium channel blocker. However, for any concentration, the calcium classifier does not give us satisfactory results, which means that Clarithromycine is wrongly classified as a calcium channel blocker. (Table 1) Well-classified confidence for Cisapride is always higher than misclassified confidence regardless of the concentration and, particularly for the sodium and potassium channel classifiers. This is in line with Cisapride being a very potent potassium blocker (see Table 1).

Cisapride: potassium channel blocker
6. Bepridil: potassium, calcium and sodium channel blocker (Table 1) Bepridil is well classified as a sodium and potassium channel blocker with a high confidence. This is not the case for the calcium classification. An explanation could be that the potassium channel blockade hides the effect of the calcium channel blockade as above mentioned.
7. Azimilide: potassium, sodium and calcium channel blocker (Table 1) Azimilide is classified as a potassium channel blocker with a probability higher than 90% for all concentrations tested. However Azimilide is misclassified for sodium and calcium channel blockade. As abovementioned, this could be related to the fact that Azimilide blocks the inward sodium currents and L-type calcium channels at concentrations 5-10 times higher than required for blocking the hERG channel [52].
Ternary classification. For the ternary classification we only considered one classifier but with three outputs: sodium, potassium and calcium channel blockers. Aggregated results for the ternary classification are presented in Fig 19. The ternary classifier classified all seven compounds from the validation set as potassium channel blockers.
As expected, the probability returned by the classifier decreases when the IC50 value increases (for example the probability for Loratadine to be a calcium channel blocker is 0.41 with IC50 = 11.4μM (see Table 1) and the probability for Dofetilide to be a calcium channel blocker is 0.29 with IC50 = 26.7μM (see Table 1)). These results do not take into account the different concentrations tested. The probabilities given for each concentration are given in Fig 20. The results for each of the 7 molecules are detailed hereafter. For sake of brevity, both aggregated and by-concentration results are commented. (Table 1) Loratadine is well classified as a potassium channel blocker with a probability equals to 0.57 (Fig 19). The second highest probabilty concerns the calcium channel blocker, which was expected as Loratadine is more potent to block hERG (6.1μM, see Table 1) than to block the L-type calcium channel (11.4μM, see Table 1). Fig 20 shows us that the confidence of the classifier is almost the same regardless to the concentration. (Table 1) From Fig 19, we can see that Dofetilide is well classified as a potassium channel blocker with a probability equals to 0.59. If we now look at the classification results of Dofetilide in Fig 20, a high concentration gives a higher probability of being a potassium channel blocker and a lower probability to be a calcium channel blocker (which is in line with the binary classification method presented in Fig 17). This can be explained by the fact that for the three lower concentrations, the potassium activity is always higher than 90%(see Table G in PLOS COMPUTATIONAL BIOLOGY S1 Text) whereas the highest concentration corresponds to a 75% activity (see Table G in S1 Text).

Dofetilide: potassium channel blocker
3. Chlorpromazine: potassium, calcium and sodium channel blocker (Table 1) Chlorpromazine is well classified as a potassium channel blocker with a probability equals to 0.56 (Fig 19). The probabilities for Chlorpromazine of being a calcium or sodium channel blocker are close to each other (Fig 19), which was expected as the IC50 values for calcium and sodium channel blockade are close to each other as well ( Table 1). The confidence to classify Chlorpromazine as a sodium channel blocker is the highest for the highest concentration tested (3μM, see Table 1) (Fig 20). An explanation of this result could be that some compensation effects would appear on the repolarization due to the simultaneous

PLOS COMPUTATIONAL BIOLOGY
block of potassium as well as calcium. 3μM of Chlorpromazine corresponds at a 50% activity of the sodium channel (see conductance-block model in section Drug modeling), which is clearly visible on the depolarization amplitude (see Fig 21). (Table 1) Clarithromycine is well classified as a potassium channel blocker with a probability equals to 0.56. It is interesting to see that the confidence of being a calcium channel blocker is lower for Clarithromycine than for Loratadine (see Fig 19). This point is expected because Loratadine is known to block calcium channels with an IC50 of 11.4μM (see Table 1), which is not the case for Clarithromycine (IC50 > 30μM). Another good point is that the confidence of being a potassium channel blocker for Clarithromycine slightly increases with higher concentrations (Fig 20). (Table 1) Cisapride is well classified as a potassium channel blocker with a probability equals to 0.6 (see Fig 19). The second highest confidence is for the calcium channel blocker. These channel blockade probabilities are in good agreement with the IC50 values of Cisapride for potassium channel blockade (0.02μM) and calcium channel blockade (11.8μM) ( Table 1).

PLOS COMPUTATIONAL BIOLOGY
The difference in these values might also explain the observation that the confidence of being a potassium channel blocker decreases when the concentration increases, following by a higher confidence of Cisapride being a calcium channel blocker (Fig 20). 6. Bepridil: potassium, calcium and sodium channel blocker (Table 1) Bepridil is well classified as a potassium channel blocker with a probability equals to 0.63 (see Fig 19). The order of the different ion channel blockade probabilities is in good agreement with the IC50 values order ( Table 1). The sodium channel blockade probability is 0.11. This probability is coherent in the sense that Bepidril is known to block the sodium channel; other compounds which are not known as sodium channel blockers have a lower probability (0.01-0.08) of being a sodium channel blocker (except for Dofetilide at low concentrations). Unexpectedly, Fig 19 shows that the probability to be a calcium channel blocker is similar between Bepridil and Dofetilide (not a calcium channel blocker). Even for the last concentration of Bepridil, there is a decreasing confidence of being a calcium channel blocker in favor of being a potassium and sodium channel blocker (Fig 20). This could be explained by the fact that Bepridil has a higher potency for blocking hERG compared to blocking calcium channels and that the effects of hERG channel blockade masked the effects of blocking calcium channels.
7. Azimilide: potassium, sodium and calcium channel blocker (Table 1) Azimilide is well classified as a potassium blocker with a probability of 0.68 (see Fig 19).
Although it is known that the potency of Azimilide to block the inward sodium currents and L-type calcium channels is lower than blocking the hERG channel, the probability of being a sodium channel blocker was still lower than expected and did not change with higher concentrations.

Discussion
Human iPSC-CMs are being increasingly adapted as a novel in vitro model to better recapitulate human heart function and to complement or replace existing in vitro assays for improved cardiac safety assessment. Of the many studies that have now investigated the impact of drugs on the electrophysiology of hiPSC-CMs, the most well-known is the multisite CiPA initiative. Data presented in [6] describe the utility of hiPSC-CMs in combination with MEA and voltage-sensing optical methods in evaluating the electrophysiological responses to 28 drugs linked to low, intermediate, and high TdP risk categories. Studies like the CiPA multisite study show promising results. However, predicting TdP risk at a reasonable level of accuracy remains a challenge. Besides, many screening platforms, like various MEA and calcium-flux devices, are becoming increasingly sophisticated and generate large multidimensional datasets. Improved automated analysis methods, including classification methods to accurately predict the risk for ion-channel block and TdP, are needed. In the present work, a preliminary step towards the setup of high-throughput screening procedures was attempted. In particular, a method was proposed to systematically deal with classification problems involving "CiPA" compounds for their risk to induce TdP as well as for their ion channel blocking properties.

Algorithm
This algorithm selects and combines pertinent features extracted from the signals in order to maximize the classification score (both in terms of the success rate and the confidence of the classifier) by means of a double greedy optimization. The algorithm promotes sparsity (hence mitigating the overfitting risk) and it is fully scalable in terms of parallelism (remark that the number of cores can potentially equal the dictionary entry size). In this paper, the input space computed by the algorithm maximize a score by linearly separating the classes samples, using the classical LDA method. It would be interesting to test the algorithm with other classifiers such as support vector machine (SVM) with different kernels or k nearest neighbor (KNN) and against classification with PCA.
We applied the algorithm on simulated FP and calcium transient data for TdP risk classification as well as on in vitro data coming from FP signals recorded from hiPSC-CMs (Pluricyte Cardiomyocytes) that were cultured on 96 well MEA plates and subjected to 12 CiPA reference compounds (5 compounds were used as a training set and 7 were used to validate the algorithm).

TdP risk assessment
The classifiers obtained have given encouraging results for a drug safety profile of the compounds. Compounds known to have a high TdP risk were 100% well classified according to the arrhythmogenicity risk classification and a compound known to have a low TdP risk was well classified in 67% of the cases. This is conforming to the fact that we decided to put a strong weight on the type II error (wrongly classify a compound as non-torsadogenic). Concerning the torsadogenicity classification, more tests have to be done with higher concentrations (10xEFTPC, 50xEFTPC, etc). Thus, the compound impact on physiological traces (FP and intracellular calcium transient) would be more important, which would improve the classification (bigger margin between the data points and the separation plan). However, even at EFTPC, the TdP risk classification results are encouraging as only Propranolol was misclassified as torsadogenic. Particularly, the algorithm allows us to weight the type II error. To improve the arrhythmogenicity assessment, a ternary classifier could be established to distinguish low, moderate or high TdP risk.

Ion-channel blockade
Concerning ion-channel blockade classification of compounds, potassium was always well classified with a high confidence. Moreover, for the ternary classification study, for most of the tested compounds, the lower the IC50 for a channel, the higher the confidence of the classifier to block this channel.
The binary sodium channel blockade classification is good for all the compounds except for Chlorpromazine (at low concentration). The ternary classification study shows similar probabilities of Chlorpromazine for blocking the sodium channel as for blocking the calcium channel, which is in agreement with the similar IC50 values of Chlorpromazine for these channels.
However, the binary classification is less good for the calcium channel blockade classification. This could also be related to the fact that all CiPA compounds from the validation set block the hERG channel and with higher potencies compared to blocking the calcium channel. The effect of blocking hERG could mask the effect of blocking calcium, making calcium channel blockade more difficult to classify.
In general, the binary and ternary classification strategies are in a good agreement (e.g potassium channel blockade is always well classified).
Nevertheless, more tests have to be done on the algorithm in order to validate and/or improve the classifiers.
For the channel block classification, simulations have been done only on highly pure channel block properties (no multi-channel blockade), simplified to only three types of channels: potassium, calcium or sodium, which is often not representative of the total ion channel blocking effects a compound could have. Training based on in silico multi-channel blockade would be more realistic and would most likely increase the robustness of the classification. Moreover, the present experimental protocol was performed at different concentrations for each compound. The dictionary entry could take this information into account.
Each application presented in this paper was based on one specific model of MEA device. It would also be interesting to know whether the MEA device might have an impact on the analysis of the drug effect, i.e. to study the case where we learn with one MEA device and we validate with data coming from another MEA device.
The addition of intracellular calcium transient data would increase the classification in order to identify not only effects on ion-channels but also to detect negative and positive inotropic effects thereby having the capability to classify other classes of compounds, such as calcium-sensitizers or adrenergic receptor agonists.
The time compound dynamic was not studied in this paper. The dictionary could be extended with new biomarkers as beat rate or depolarization standard deviation. These new entries could provide information on the impact of the compound on the monolayer stability. In order to represent this behavior in the in silico dataset, a pacemaker action potential model showing experimental beat rate behavior (Paci et al [54]) could be introduced.
In summary, the algorithm that we developed proved to be a promising tool to classify compounds for their risk to induce TdP as well as for their ion-channel blocking properties based on in vitro and in silico data derived from hiPSC-CMs. Therefore, this method can be implemented in in vitro MEA and/or calcium-flux studies using hiPSC-CMs where it may serve as a tool to improve machine learning approaches and to deliver fast and reliable prediction of drug-induced ion channel blockade and proarrhythmic behavior to advance cardiac safety assessment.