Multicenter validation of a machine learning phase space electro-mechanical pulse wave analysis to predict elevated left ventricular end diastolic pressure at the point-of-care

Background Phase space is a mechanical systems approach and large-scale data representation of an object in 3-dimensional space. Whether such techniques can be applied to predict left ventricular pressures non-invasively and at the point-of-care is unknown. Objective This study prospectively validated a phase space machine-learned approach based on a novel electro-mechanical pulse wave method of data collection through orthogonal voltage gradient (OVG) and photoplethysmography (PPG) for the prediction of elevated left ventricular end diastolic pressure (LVEDP). Methods Consecutive outpatients across 15 US-based healthcare centers with symptoms suggestive of coronary artery disease were enrolled at the time of elective cardiac catheterization and underwent OVG and PPG data acquisition immediately prior to angiography with signals paired with LVEDP (IDENTIFY; NCT #03864081). The primary objective was to validate an ML algorithm for prediction of elevated LVEDP using a definition of ≥25 mmHg (study cohort) and normal LVEDP ≤12 mmHg (control cohort), using AUC as the measure of diagnostic accuracy. Secondary objectives included performance of the ML predictor in a propensity-matched cohort (age and gender) and performance for an elevated LVEDP across a spectrum of comparative LVEDP (<12 through 24 at 1 mmHg increments). Features were extracted from the OVG and PPG datasets and were analyzed using machine-learning approaches. Results The study cohort consisted of 684 subjects stratified into three LVEDP categories, ≤12 mmHg (N = 258), LVEDP 13–24 mmHg (N = 347), and LVEDP ≥25 mmHg (N = 79). Testing of the ML predictor demonstrated an AUC of 0.81 (95% CI 0.76–0.86) for the prediction of an elevated LVEDP with a sensitivity of 82% and specificity of 68%. Among a propensity-matched cohort (N = 79) the ML predictor demonstrated a similar result, AUC 0.79 (95% CI: 0.72–0.86). 
Using a constant definition of elevated LVEDP and varying the lower threshold across LVEDP, the ML predictor demonstrated an AUC ranging from 0.79–0.82. Conclusion The phase space ML analysis provides a robust prediction for an elevated LVEDP at the point-of-care. These data suggest a potential role for an OVG and PPG derived electro-mechanical pulse wave strategy to determine if LVEDP is elevated in patients with symptoms suggestive of cardiac disease.

Introduction
'Phase space' is a concept based on dynamical systems theory in which possible states of a given object such as position and velocity are represented with each state corresponding to one unique point in phase space [1]. While originating from mechanical systems, it has application to cardiovascular physiology. In one possible application, while systolic dysfunction is characterized by reduced ejection fraction, additional modalities are required to adjudicate dysfunction that is limited to diastole, with the aim of estimating left ventricular (LV) filling pressures. Left ventricular end diastolic pressure (LVEDP) is of distinct interest. The measurement of LVEDP, whether in the presence of reduced or preserved ejection fraction, is complex and commonly characterized by multimodality diagnostic imaging. For example, elevation in brain natriuretic peptide (BNP) [2,3] and fixed ratios based on echocardiography (spectral Doppler and Tissue Doppler derived E/e') [4] are used to classify whether left atrial pressure is elevated. Several recent studies have aimed to predict diastolic dysfunction (i.e., intracardiac pressure elevation) using ML approaches, such as from CNN analysis of echocardiographic beat variability [5] and clustering of echocardiographic markers to understand the patterns of diastolic dysfunction across patients with symptomatic CVD [6]. While such developments are promising in the characterization of myocardial function, the prediction of LV pressure elevation as a binary classification (elevated or not elevated) across a spectrum of LV pressures that can be used to guide downstream testing and treatment is of value.
In this context, phase space is a continuous measurement that simultaneously captures data related to electromechanical and pulse-wave signals over successive cardiac cycles; the resultant biopotential plot is a large-scale data representation of myocardial function that is unique to any given person [1]. The benefits of such an approach are that it captures signals of myocardial function and dysfunction through high fidelity, time-series data collection that cannot be quantified by conventional non-invasive imaging or laboratory testing modalities [7,8].
The physiologic findings of the failing heart that result in elevations in LV filling pressures, LV end diastolic pressure (LVEDP) and left atrial pressure are commonly determined by electrocardiographic and echocardiographic findings of atrial and ventricular remodeling [9], functional changes in diastolic relaxation [10], and changes in flow dynamics [5]. Given a highly heterogeneous association between symptoms and the presence of cardiac dysfunction, especially in prevalent conditions such as heart failure with preserved EF (HFpEF), new modalities that leverage machine learning (ML) have emerged as potential tools to predict diastolic properties [11,12] and ejection fraction (EF) [13] through computational approaches including neural network analysis of electrocardiographic intervals and wavelet transformation to predict myocardial function and relaxation.
Similarly, we have previously demonstrated the diagnostic accuracy of an ML approach to predict obstructive coronary artery disease (>70% luminal stenosis) from a cardiac phase space analysis. The predictive algorithm was trained and validated with tomographic, voltage-gradient features that were paired with the degree of coronary stenosis defined at angiography [7]. An approach such as this provided a method to collect data at the time of an outcome of interest and provided a pathway to evaluate cardiac dysfunction at the point-of-care [8,14,15]. In this context, we investigated a novel ML algorithm based on electromechanical features that was derived non-invasively from orthogonal voltage gradient (OVG) and photoplethysmography (PPG) to predict an elevated LVEDP among symptomatic patients referred for cardiac catheterization, and herein report the findings from the multi-center, prospective validation cohort.
Demographic characteristics for the overall population, the study cohort, and the LVEDP groups are listed in Table 1. Within the study cohort, the mean age was 63 years and 45% were women. One third of the population had diabetes, and more than 70% had hypertension and/or hyperlipidemia. The mean EF was 60%, and 93% (N = 186) had preserved EF (>50%). 38% had obstructive CAD at angiography. Multivariate clinical predictors of an elevated LVEDP can be found in S1 File.

PLOS ONE
Phase space electro-mechanical pulse wave analysis to predict elevated left ventricular end diastolic pressure

Permutation feature importance and exemplar features
A permutation analysis was performed to determine feature importance in LVEDP elevation prediction, and the top 30 most contributive features were grouped by family (S2 File). The most contributive signal-based feature family was PPG indicators, followed by OVG spectral and phase space analysis. The most contributive feature within the PPG indicator family was the maximum of the PPG pulse base (S3 File).

Primary outcome-Performance of the machine-learned predictor
All results (primary and secondary) used the ensembled model as a single assessment of algorithm performance on the blinded validation cohort. Testing of the machine-learned predictor as a continuous measurement demonstrated an AUC of 0.81 (95% CI 0.76-0.86) for algorithmic performance (Fig 4) and corresponded to a sensitivity and specificity of 82% (95% CI: 72-90%) and 68% (95% CI: 61-72%), respectively. S4 File contains the 2x2 cross tabulation for the prediction of an elevated LVEDP based on the sensitivity and specificity determined from the primary analysis.

Secondary outcomes
Predictive performance for an elevated LVEDP among a propensity matched cohort. Among a propensity-matched cohort (N = 79 pairs of study subjects) between elevated and non-elevated LVEDP based on age and gender (S5 File), the machine-learned predictor demonstrated a similar result to the primary analysis for the prediction of an elevated LVEDP, AUC 0.79 (95% CI: 0.72-0.86, Fig 5).
Predictive performance for an elevated LVEDP across a spectrum of comparative LVEDP thresholds. Using a constant elevated definition of LVEDP ≥ 25 mmHg and varying the definition of non-elevated across LVEDP values (in 1 mmHg increments), the machine-learned predictor demonstrated an AUC ranging from 0.79-0.82 (Fig 6a) with corresponding specificity between 59%-69% (using the constant, predefined elevated LVEDP threshold yielding a sensitivity of 82%). Fig 6b illustrates the effect of varying both the definitions of LVEDP elevation and non-elevated, demonstrating a consistent performance, based on AUC, for the prediction of an elevated LVEDP at a threshold value of 25 mmHg.
Sub-group performance. Among the predefined sub-groups, the machine-learned predictor demonstrated an adequate diagnostic accuracy between group stratifications (Fig 7). While the predictive accuracy for an elevated LVEDP was similar among cohorts with obstructive and non-obstructive CAD (p = 0.31), there was a statistically significant difference with greater predictive accuracy among cohorts with preserved EF compared to low EF (p = 0.03).

Safety and adverse events
Testing of the machine-learned model within the pre-specified safety analysis among a healthy cohort without CV disease, at a threshold sensitivity of 82% reported in the primary analysis, demonstrated a specificity of 95% (95% CI: 90-97%). The corresponding 2x2 tabulation and AUC can be found in S4 File.

No adverse events related to device use were reported during the study.

Bayesian analysis and simulations for net reclassification index between BNP and the machine-learned predictor
A Bayesian analysis of the post-test probability of an elevated LVEDP based on a range of pretest probabilities and distributed according to the machine learned predictor, and the two

Discussion
There is a growing need for new methods to measure LV filling pressures. Recent studies have used ML approaches that analyze echocardiography data to predict diastolic dysfunction [5,6]. While the prospect for discovery is promising, the application of any new analytic technique requires robust methodologies for validation. In this context we employed a trial design of prospective validation [16] within a multicenter study. Prospective data collection permitted the validation dataset to be blinded from the training dataset. This is important because blinding may limit common biases such as spectrum bias and measurement bias between training and validation datasets, and may help balance clinical characteristics between both datasets [14]. Towards mitigating bias, we recruited a diverse cohort of patients across multiple healthcare centers and geographies and aimed to enroll study subjects that are representative of a real-world population with a clustering of cardiac risk factors and various ethnicities. Overall, half of the participants recruited were women, nearly 50% had a BMI ≥ 30 (mean of 36), and more than 90% had preserved EF (mean EF of 61%), a triad of findings in which conventional assessments of LV filling pressures such as BNP [17] and echocardiography [4,18] have marginal accuracy and vary significantly across those with symptoms of HF. This is particularly true in HFpEF given the heterogeneity of myocardial dysfunction (i.e. ischemic vs non-ischemic etiologies), the cardiopulmonary response to increased afterload and/or preload [19] and the phasic changes in left atrial function that are variable across individuals [9]. 
Our observation of a high incidence (34%) of elevated LVEDP among symptomatic patients who were referred to angiography for the evaluation of ischemic heart disease but did not have evidence of obstructive CAD is equally important, as it may reflect the underdiagnosis of HF in an ambulatory cohort and in those referred for further cardiovascular testing. Our hypothesis that electromechanical pulse wave features predict myocardial dysfunction is an extension of the hypothesis that the progression from normal myocardial mechanics to pathologic atrial and ventricular remodeling is a result of rising LV filling pressure. Whereas atrial enlargement and ventricular remodeling from alterations in myocardial tension and strain can be considered mechanical features of a pressure loaded left ventricle [20], we postulate that a high dimensional dataset captured from voltage gradients and photoplethysmography can accurately represent myocardial electromechanical function. In support of this argument, wavelet transformation and the mathematical conversion of an ECG into a normalized energy distribution (depicted by a color spectrum of myocardial energy) has recently emerged as a computational modeling and ML method for the prediction of diastolic dysfunction. Potter [11] and Sengupta [12] paired ECG data with echocardiographic data of diastolic abnormalities such as E/e' > 14, left atrial enlargement, and abnormal LA volume index among 398 and 188 patients at risk of HF, respectively. Using supervised and unsupervised ML including random forest classifiers on 250-650 wavelet features, testing of the ML algorithm on a validation data set demonstrated a high diagnostic accuracy (AUC 0.83-0.91) for the prediction of diastolic dysfunction, with a diagnostic performance that was greater than clinical prediction alone.
The present study extends Potter and Sengupta's results to a time-series, electromechanical and perfusion dataset with training and validation on direct LVEDP measurements. Within our dataset, in an LV with normal EF that is under high pressure [10], electromechanical features such as variation in atrial depolarization duration and ventricular repolarization in phase space plausibly represent physiologic findings of elevated left atrial pressure [9] and lusitropic changes of diastolic relaxation [11,12], respectively. The PPG-derived measurement of the pulse wave base may represent a systolic time interval of isovolumetric contraction, as this nadir point in the pulse wave is associated with the lowest photoabsorption immediately prior to onset of systole, a time point of interest that has been associated with HF [21]. Our method of high frequency data capture analyzed in phase space and the corresponding electromechanical features are unique in any given study subject. Once all features are evaluated from a patient's signal, the machine-learned model processes the feature values to yield a continuous score representing the risk of LVEDP elevation. In contrast to BNP or echocardiography, which use a threshold or binary values of elevated or non-elevated LV pressures, the high number of features used in this analysis creates a unique signature of LVEDP that is specific to that individual at the N-of-1 level.
One pertinent question leading from the present analysis is how our results are translated within the continuum of diagnostic tests to determine the presence of elevated LV pressure among symptomatic patients, particularly those with preserved EF. Several point-of-care diagnostic tests are available to diagnose acute HF including chest radiography, BNP/NT-pro-BNP, handheld echocardiography/lung ultrasound, and bioimpedance [22]. Various studies evaluating such point-of-care tests have largely concluded that lung ultrasound and echocardiography have utility to differentiate HF symptoms from non-HF symptoms [22]. While useful, these tests require trained individuals to acquire and interpret cardiopulmonary images and can have limited diagnostic accuracy as they are dependent on patient characteristics such as body habitus and on the user's experience with point-of-care imaging devices. Our findings of high diagnostic accuracy for prediction of elevated LVEDP when compared to a range of non-elevated LVEDP such as normal (≤12 mmHg) and mid-range (13-24 mmHg) are potentially valuable for 2 main reasons: 1) they support that our method for data capture and analysis of electromechanical data is robust and that our features are those data representations associated with elevated LVEDP; and 2) they suggest that LVEDP at a threshold of ≥25 mmHg is representative of an elevation that is clinically relevant, as our study population was derived from symptomatic patients requiring cardiac catheterization. The latter is important in the setting of high-risk HF patients such as those at risk for HF re-hospitalizations. 
If we assume that such patients have, at minimum, a 50% (intermediate) to 70% (high) pre-test probability of an elevated LVEDP, when used sequentially with BNP, our Bayesian simulation demonstrates that the ML predictor reclassifies a BNP of > 150 pg/ml (NRI of 0.24) from 59-77% to a post-test probability of 79-90%, with the greatest reclassification margin within the intermediate (30-50%) pre-test probability group. Such results may have clinical utility to more accurately triage HF patients at the point-of-care for further testing and to identify patients with HFpEF.
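As a simplified illustration of the Bayesian reasoning above, the following sketch converts a pre-test probability into a post-test probability after a positive result, using the likelihood ratio implied by the sensitivity (82%) and specificity (68%) reported in the primary analysis. It treats the ML predictor as a single stand-alone test and does not reproduce the sequential BNP simulation; the function name is illustrative only.

```python
def post_test_probability(pre_test: float, sensitivity: float, specificity: float) -> float:
    """Post-test probability after a positive result (odds form of Bayes' rule)."""
    if not 0.0 < pre_test < 1.0:
        raise ValueError("pre_test must lie strictly between 0 and 1")
    # Positive likelihood ratio: true-positive rate over false-positive rate.
    lr_positive = sensitivity / (1.0 - specificity)
    pre_odds = pre_test / (1.0 - pre_test)
    post_odds = pre_odds * lr_positive
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    # Low, intermediate, and high pre-test probabilities (illustrative values).
    for pre in (0.30, 0.50, 0.70):
        post = post_test_probability(pre, sensitivity=0.82, specificity=0.68)
        print(f"pre-test {pre:.0%} -> post-test {post:.1%}")
```

Because this sketch ignores BNP, its outputs are not expected to match the reclassification figures quoted in the text.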

Limitations
We identified that 14% (36/258) of subjects with non-elevated LVEDP were taking a diuretic at the time of enrollment, which may have impacted the performance of the ML predictor. When compared to the overall study population, the specificity was similar within this cohort (68% vs 64%, p = 0.57), and when the analysis was re-run excluding this cohort, there was no difference in overall specificity. While we contend that diuretics are an important factor when considering the measurement and prediction of LVEDP, the small number of subjects in this group does not permit us to determine its impact on performance within the study population as presented.
Our study population is intrinsically limited by the recruitment methodology, which enrolled subjects referred to left heart catheterization for assessment of obstructive CAD using coronary angiography, and specifically the subgroup where the treating physician chose to measure the LVEDP. We employed this study methodology to ensure that subjects had a catheterization-confirmed elevated LVEDP, but at the limitation of including only subjects referred for the evaluation of obstructive CAD. While this may introduce sample bias, we found significant CAD in only 38% of the overall study cohort, and a higher incidence of obstructive CAD was observed in subjects with non-elevated LVEDP than in those with elevated LVEDP (43% vs 24%). Upon subgroup analysis, there was no significant difference in algorithmic performance between those with and without obstructive CAD (p = 0.31).
Overfitting, and conversely generalizability, are critical considerations in machine learning, particularly when a large number of features are used for model development. The use of an ensemble, as is the case here, does not increase the likelihood of overfitting but rather mitigates it by reducing the dependence on a single model. The methods employed to avoid overfitting can include the use of cross-validation within the development data, using simple models with regularization and penalty terms, and testing the performance of the model on unseen data (doing so only once). With respect to the models in particular, first, each model is exposed to an average of 149 features (with a range of 89-194, S10 File). Second, the model hyperparameters were conservatively designed to mitigate the possibility of overfitting (S10 File). For example, four of the models were Random Forest, which intrinsically limits overfitting by only allowing each component tree access to the square root of the total number of features, and by bootstrap sampling training subjects so that every component tree only has access to a subset of the entire training set. Overfitting was additionally controlled through the use of the maximum tree depth hyperparameter. Deep trees with many splits increase the likelihood of overfitting, and therefore the depth was limited to 3-7. Other model types (Elastic Net and XGBoost) were also designed conservatively. Finally, the ultimate test of overfitting is the performance on unseen blinded data, which yielded an AUC of 0.81. In conclusion, through the analysis of the algorithm, and the performance on unseen blinded data, overfitting did not occur.
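The two Random Forest safeguards described above (a square-root-sized feature subset and bootstrap sampling of training subjects) can be sketched as follows. This is a minimal illustration that follows the per-tree description in the text; common Random Forest implementations typically apply the square-root rule at each split rather than once per tree. The function name and the example cohort sizes passed to it are illustrative, not the authors' code.

```python
import math
import random
from typing import List, Tuple


def tree_view(n_subjects: int, n_features: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Choose the subjects and features that one component tree is allowed to see."""
    # Bootstrap: draw n_subjects indices with replacement, so some subjects
    # repeat and others are left out of this tree's training set.
    subject_idx = [rng.randrange(n_subjects) for _ in range(n_subjects)]
    # Square-root rule: the tree sees about sqrt(n_features) distinct features.
    n_selected = max(1, math.isqrt(n_features))
    feature_idx = rng.sample(range(n_features), n_selected)
    return subject_idx, feature_idx


if __name__ == "__main__":
    subjects, features = tree_view(n_subjects=696, n_features=149, rng=random.Random(0))
    print(len(set(subjects)), "distinct subjects;", len(features), "features")
```

With the average of 149 features per model cited above, the square-root rule yields 12 features per tree in this sketch.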

Conclusions
We validated a machine learning algorithm of electromechanical pulse wave features to predict an elevated LVEDP among symptomatic patients with a precise measurement of LVEDP. Such techniques to quantify intracardiac pressure with machine learning on large datasets acquired with a portable digital device provide a new method to determine the presence or absence of HF. These data suggest a potential role for a novel OVG and PPG derived electromechanical diagnostic test for the prediction of an elevated LVEDP at the point-of-care.

Trial design
Enrollment in the overall IDENTIFY trial began on December 10th, 2018, with 3,486 participants consecutively enrolled as of April 2021, and the trial was primarily executed to develop a machine-learned predictor to determine the presence of obstructive coronary artery disease (CAD) defined at cardiac catheterization. A cohort analysis (N = 606) using a phase-space ML approach to predict CAD using the same inclusion/exclusion criteria as in this present study has been previously published [7]. The present results are reported according to STARD guidelines [23] (S6 File). The study was approved by a centralized IRB (Western IRB #20183107, now known as WIRB-Copernicus Group). It was initially released on clinicaltrials.gov on January 1st, 2019 (NCT #03864081), and was performed at 15 healthcare institutions in the United States (S7 File). A preliminary analysis of the LVEDP development cohort was presented at the 2020 Scientific Sessions of the American College of Cardiology [24].

Study population-Development and validation data
Data sources. The data sources for the study population included patients who were referred for angiography at the discretion of their treating physicians and for the evaluation of symptoms suggestive of CAD. Patients provided written informed consent to participate in the study. Inclusion and exclusion criteria have been previously published [7] and can be found in S8 File.
Development and validation groups. The study population was derived from a pooled individual patient-level analysis stratified by unique time points into development and validation cohorts. The development cohort comprised symptomatic patients referred to cardiac catheterization for the evaluation of CAD and consecutively enrolled between April 2017 and December 2017 (N = 696), and also included asymptomatic individuals without CVD (N = 576). A separate validation cohort comprised symptomatic patients referred to cardiac catheterization who were prospectively and consecutively enrolled between March 2019 and November 2019 (N = 1,023). The data sources for the development and validation cohorts were separate, with no data from the validation group used for development. Therefore, the validation group is considered blinded.

Primary and secondary study objectives
The primary objective was to develop and validate an ML algorithm for the prediction of an elevated LVEDP ≥ 25 mmHg. LVEDP was measured invasively at the time of cardiac catheterization using conventional techniques for left ventricular pressure assessments. For the primary objective, an analysis using a threshold of LVEDP of 25 mmHg (study cohort) was chosen and compared to individuals with a normal LVEDP defined as ≤ 12 mmHg (control cohort). These thresholds were selected to reflect those LVEDP measurements that are likely to be sufficiently high to result in symptoms (elevated LVEDP) or absence of symptoms (normal LVEDP), as they relate to a spectrum of symptoms among CV patients undergoing angiography, and those with obstructive and non-obstructive CAD [25,26].
Secondary objectives included those analyses to refine the primary objective within the following 5 categories:
1. Performance of the machine-learned predictor for an elevated LVEDP ≥ 25 mmHg among a propensity-matched cohort (scoring using age and gender).
2. Performance of the machine-learned predictor for an elevated LVEDP across a spectrum of comparative LVEDP thresholds (<12 through 24 at 1 mmHg increments).
4. Safety analysis and predictive accuracy of the machine-learned predictor using a healthy control cohort to determine the specificity and negative predictive value of the algorithm; and
5. Bayesian analysis to determine the post-test (i.e., posterior) probability of the machine-learned predictor based on varying the pre-test probability (i.e., low, intermediate, and high prior probability of elevated LV filling pressures) among symptomatic patients.

Acquisition system description
The acquisition system (CorVista Capture™ device) simultaneously collects two modalities of time series data: orthogonal voltage gradient (OVG) data, representing cardiac electromechanical activity analyzed in phase space, and photoplethysmography (PPG) data representing blood volume changes as a measurement of distal perfusion. Consecutive study subjects underwent signal acquisition immediately prior to angiography or within seven days prior to the procedure. In addition to OVG and PPG data, the device also captured patient-specific metadata (gender, age, height and weight). Signal data was acquired for 3.5 minutes.

Raw data collection
The OVG signal is collected using electrodes attached to the skin (S9 File). Specifically, the signal is acquired at 8 kHz (i.e., 8,000 samples per second, with each consecutive pair of samples separated by 0.000125 seconds) from seven electrodes at an amplitude resolution of 0.024 microvolts. The signal originates from three bipolar pairs of electrodes collecting data from the coronal, sagittal and transverse planes, and the seventh electrode acting as the reference. See S9 File for further details. Similar to existing signal collection methods, the OVG measures the biopotential at the surface of the skin caused by cardiac electrical activity. While the OVG signal acquisition resembles ECG, it differs with greater sampling frequency (conventional ECG sampling frequency of 500-2000 Hz) by a factor of 4-16 and provides broader spatial information due to the orthogonal lead configuration and vectors along different planes of the body [27,28].
The OVG biopotential data is represented within a three-dimensional phase space, where the parameters of the phase space are defined by the three bipolar orthogonal acquisition channels. Specifically, the amplitudes of three voltage gradient data points from the three channels form a three-dimensional coordinate within the phase space. As the signal progresses through time, it traces a phase space trajectory. The PPG signal contains red and infrared light components, both collected at 500 Hz via a finger clip sensor. The pulse wave is captured as the absorption of light in the tissue varies based on changes in cardiac activity.
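The phase space construction described above can be sketched as follows. This is a minimal illustration, assuming three equal-length sequences of samples (one per orthogonal channel) acquired at the stated 8 kHz rate; the function and variable names are hypothetical and are not taken from the device software.

```python
from typing import List, Sequence, Tuple

SAMPLING_RATE_HZ = 8_000  # OVG sampling rate stated in the text


def phase_space_trajectory(
    ch_x: Sequence[float], ch_y: Sequence[float], ch_z: Sequence[float]
) -> List[Tuple[float, float, float, float]]:
    """Return (time_s, x, y, z) points: one 3-D coordinate per simultaneous sample."""
    if not (len(ch_x) == len(ch_y) == len(ch_z)):
        raise ValueError("all three channels must have the same number of samples")
    dt = 1.0 / SAMPLING_RATE_HZ  # 0.000125 s between consecutive samples
    return [(i * dt, x, y, z) for i, (x, y, z) in enumerate(zip(ch_x, ch_y, ch_z))]


if __name__ == "__main__":
    # Two made-up samples per channel, purely to show the data layout.
    for point in phase_space_trajectory([0.0, 1.0], [2.0, 3.0], [4.0, 5.0]):
        print(point)
```

Connecting consecutive points in order of time gives the trajectory traced by the signal.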

Development & validation approach for the machine learning predictor
Development and validation of a machine learning predictor occurred in two distinct phases. In the first phase, the machine learning predictor was trained using the development dataset. Upon completion, the machine-learned predictor was finalized such that no further modifications were permitted. Then, in the second phase, the machine learning predictor was tested in the blinded validation cohort and the performance was assessed.

Signal processing and development of the machine-learned predictor
The sequence for processing a patient's data to generate the ML prediction occurred in four steps.
Step 1: Confirmation of signal quality. As an initial processing stage, the signal was confirmed to have adequate quality to proceed through the next three steps. The signal quality assessment checks for the presence of noise generated by common sources in a clinical environment. It has been previously published [8] and is summarized herein. The OVG signal is examined for the presence of powerline noise, which is the electrical noise at the frequency at which alternating current (AC) power is delivered (i.e., to electrical outlets, etc.); specifically, this is 60 cycles per second (Hz) in North America. The OVG signal is also examined for high-frequency noise. A SNR of 57 was considered acceptable for powerline noise, and of 19 for high frequency noise. The PPG signal is examined for sensor saturation, which can occur when the light emitted from the LED on one side of the finger clip directly enters the sensor on the opposite side of the clip without transiting the finger. The light is strong because it is not attenuated by the finger, and therefore an optical value is registered that exceeds the maximum measurable value. Excessive occurrence of this situation reduces the physiological information in the signal, and results in a prompt to attempt to reacquire the signal. SNR is not applicable to this score because the occurrence is transient.
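The PPG saturation check lends itself to a short sketch. The text does not state the criterion for "excessive occurrence", so the 5% limit below and the example sensor ceiling are placeholders chosen for illustration, and the function names are hypothetical.

```python
from typing import Sequence


def saturation_fraction(samples: Sequence[int], max_value: int) -> float:
    """Fraction of PPG samples at or above the sensor's maximum measurable value."""
    if not samples:
        raise ValueError("no samples provided")
    saturated = sum(1 for s in samples if s >= max_value)
    return saturated / len(samples)


def needs_reacquisition(samples: Sequence[int], max_value: int, max_fraction: float = 0.05) -> bool:
    """True when saturation is frequent enough to prompt a repeat acquisition.

    max_fraction is an assumed placeholder, not a value from the study.
    """
    return saturation_fraction(samples, max_value) > max_fraction


if __name__ == "__main__":
    ceiling = 4095  # assumed 12-bit sensor ceiling, for illustration only
    print(needs_reacquisition([1200, 4095, 4095, 1300], ceiling))
```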
Step 2: OVG and PPG feature extraction. After signal quality assessment, features were extracted from the signal. We defined a feature as a characteristic of the data that is automatically measurable on any acquired signal. A spectrum of feature domains of signal characterization were used in the present analysis and include the dynamics of the OVG and PPG signals in isolation; the synchronization dynamics of the OVG and PPG signals; the spectral properties of each signal modality; deviations of the OVG signal from subject-specific models; both conventional time-domain features and variations of those features; phase space features; PPG pulse-wave indicators; and approximation of a patient's respiration waveform (see S3 File). For example, the variation in the atrial depolarization duration is a feature extracted from the OVG signal in the time domain. Specifically, the duration of the atrial depolarization is measured on each cardiac cycle, which forms a distribution of durations across the length of data acquisition. The standard deviation of this distribution is then calculated to represent the variation in the atrial depolarization duration. Examples of other features can be found in S3 File.
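The atrial depolarization example can be written in a few lines. The sketch below assumes the per-cycle durations have already been measured (how they are delineated is not covered here) and uses the sample standard deviation; the text does not say whether the sample or population form was used, so that choice is an assumption.

```python
import statistics
from typing import Sequence


def atrial_depolarization_variation(durations_ms: Sequence[float]) -> float:
    """Standard deviation of per-cycle atrial depolarization durations (ms)."""
    if len(durations_ms) < 2:
        raise ValueError("at least two cardiac cycles are required")
    # Sample standard deviation (n - 1 denominator); an assumed choice.
    return statistics.stdev(durations_ms)


if __name__ == "__main__":
    # Made-up durations for four consecutive cardiac cycles.
    print(atrial_depolarization_variation([100.0, 102.0, 98.0, 100.0]))
```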
The OVG data represents the entirety of the electrical biopotential signals plotted on axes corresponding to the signal amplitude in millivolts of each channel, where ventricular depolarization and repolarization, and atrial depolarization, appear visually as loops. While this OVG data may appear similar across different people, it is unique for a given individual, therefore generating unique feature vectors in the high-dimensional feature space.
Step 3: Outlier detection. Mathematically outlying subjects were identified based on the signal's feature values using the Isolation Forest algorithm [29]. Excluding outlying data ensures that the algorithm is not exposed to any data that are significantly different from the development data.
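To make the idea behind Isolation Forest concrete, the following is a compact, standard-library Python sketch of the general algorithm: random axis-aligned splits isolate unusual points in fewer steps, and a short average path length translates to a score near 1. It is not the study's implementation; the tree count, subsample size, seed, and the score threshold used to exclude a subject are all assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree lookup among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random value."""
    if depth >= max_depth or len(rows) <= 1:
        return ("leaf", len(rows))
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:  # simplification: stop if the chosen feature is constant
        return ("leaf", len(rows))
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return ("node", feature, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, tree, depth=0):
    """Depth at which x lands in a leaf, adjusted for the leaf's unsplit size."""
    if tree[0] == "leaf":
        return depth + average_path_length(tree[1])
    _, feature, split, left, right = tree
    return path_length(x, left if x[feature] < split else right, depth + 1)


def anomaly_scores(rows, n_trees=100, sample_size=64, seed=0):
    """Scores in (0, 1]; values near 1 mark easily isolated rows. Needs >= 2 rows."""
    rng = random.Random(seed)
    m = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(m))
    trees = [build_tree(rng.sample(rows, m), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = average_path_length(m)
    return [2.0 ** (-(sum(path_length(x, t) for t in trees) / n_trees) / norm)
            for x in rows]
```

In use, each row would be one subject's feature vector, and subjects whose score exceeds a chosen threshold would be treated as outliers.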
Step 4: Machine-learned model optimization. Thirteen machine-learned models, optimized to return high values for elevated LVEDP and low values for non-elevated LVEDP, were evaluated given the features as input. Each model was trained individually by varying the subjects, features and thresholds. The model types also varied, and included Random Forest [30], Extreme Gradient Boosting [31] and Elastic Net [32]. See S10 File for a description of the algorithm, data, and hyperparameters for each model, as well as an explanatory figure for Extreme Gradient Boosting. Each of the 13 models was individually performant based on stratified 5-fold cross-validation repeated for 100 iterations (to vary the train/test folds) within the development data (S11 File), but each represents a unique analysis of LVEDP assessment. To capture the diversity of the models in a final single prediction, which is intended to eliminate the bias associated with the selection of a single model and thus reduce the likelihood of overfitting on the development data, the 13 models were amalgamated into a single predictive ensemble. The ensemble, composed of an average of the normalized outputs from the constituent models, is intended to outperform, on average, any model that we may have selected from the pool of 13 when applied to new data [33].
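The ensemble step, an average of normalized model outputs, can be sketched in Python as below. The normalization scheme is not specified in this section; the sketch assumes min-max scaling with a per-model reference range (hypothetically taken from the development data), and all names are illustrative.

```python
def min_max_normalize(scores, lo, hi):
    """Rescale raw model outputs using a fixed reference range [lo, hi]."""
    if hi <= lo:
        raise ValueError("reference range must satisfy hi > lo")
    return [(s - lo) / (hi - lo) for s in scores]


def ensemble_predict(model_outputs, reference_ranges):
    """Average the normalized outputs of the constituent models per subject.

    model_outputs: one list of raw scores per model, aligned by subject.
    reference_ranges: one (lo, hi) pair per model (assumed normalization).
    """
    if len(model_outputs) != len(reference_ranges):
        raise ValueError("one reference range is required per model")
    normalized = [min_max_normalize(scores, lo, hi)
                  for scores, (lo, hi) in zip(model_outputs, reference_ranges)]
    n_models = len(normalized)
    return [sum(per_subject) / n_models for per_subject in zip(*normalized)]
```

Normalizing before averaging keeps a model with a wide raw output range from dominating the ensemble score.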

Statistical analysis
Analyses were performed to determine the diagnostic performance of the machine-learned predictor used as a continuous measurement (independent variable) on the prediction of LVEDP (dependent variable) adjudicated as elevated (≥25 mmHg) or normal (≤12 mmHg). A threshold was then established on the continuous measurement to yield a binary output, after which standard techniques were used to calculate sensitivity, specificity, and negative and positive predictive values, with each study subject categorized as a true negative, true positive, false negative, or false positive. R 3.5.2 was used for statistical calculations, including 95% CIs, relevant statistical tests, ROC-AUCs, and propensity matching between the elevated LVEDP and control cohorts. CIs were calculated using DeLong's method for AUC, and the Clopper-Pearson method for sensitivity and specificity. Comparisons between ROC-AUCs were computed with DeLong's test. Independent clinical predictors for an elevated LVEDP were identified using multivariable logistic regression analyses. Permutation analysis was used to determine feature contribution within the machine learning model [30].
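The threshold-based statistics can be illustrated with a short Python sketch (the study itself used R 3.5.2). The sketch computes the ROC-AUC as the probability that an elevated-LVEDP subject scores higher than a normal-LVEDP subject, counting ties as one half, and derives the binary metrics from the four outcome counts. Function names are illustrative, and the confidence interval methods (DeLong, Clopper-Pearson) are not reproduced here.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def binary_metrics(scores, labels, threshold):
    """Sensitivity, specificity, PPV and NPV for score >= threshold as test-positive.

    Assumes every denominator is non-zero, i.e. both classes are present
    and both test results occur.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    tn = sum(1 for s, y in pairs if s < threshold and y == 0)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }
```

Here a label of 1 stands for elevated LVEDP and 0 for normal LVEDP.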
A simulation of BNP performance was performed using the known distributions of measured BNP values [2,3], specifically the estimated minimums, 25th percentiles, medians, 75th percentiles and maximums. The performance of BNP, as a commonly used point-of-care test for the prediction of heart failure, was explored using published studies that included patients with clinical characteristics similar to those in the present study (symptoms suggestive of decompensated HF [3], those with HFpEF, and those with obesity [2]), in order to compare the post-test probability of the machine-learned predictor vs BNP. This simulation was then used to explore the sequential prediction of an elevated LVEDP [2] and non-elevated LVEDP [3] (i.e., dyspnea due to non-cardiac causes) using the BNP post-test (i.e., posterior) probability as the pre-test (i.e., prior) probability for the machine-learned predictor. Therefore, to propose how the present results may be used in clinical practice, the sequence of this analysis is the following: pre-test probability → BNP post-test probability, used as the machine-learned predictor pre-test probability → machine-learned predictor post-test probability → prediction of elevated LVEDP. A net reclassification index (NRI) was also determined within this sequence [34].
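The sequential use of the two tests follows the standard odds form of Bayes' theorem, in which post-test odds equal pre-test odds multiplied by the likelihood ratio of the observed result. A minimal Python sketch is given below; the function names are illustrative, and any likelihood ratios supplied to it are inputs, not values reported by the study.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


def sequential_post_test(pre_test_probability, lr_bnp, lr_ml):
    """BNP post-test probability becomes the prior for the ML predictor."""
    after_bnp = post_test_probability(pre_test_probability, lr_bnp)
    after_ml = post_test_probability(after_bnp, lr_ml)
    return after_bnp, after_ml
```

Multiplying the two likelihood ratios in sequence treats the BNP and machine-learned results as conditionally independent, which is an assumption of this sketch.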
The simulation of BNP performance was performed using the distributions of measured BNP values in two relevant published cohorts [2,3], specifically the minimums, 25th percentiles, medians, 75th percentiles and maximums estimated from boxplots. The approximate values of the constraining statistics for the two BNP datasets are as follows: 0 ng/mL minimum for both datasets, 25th percentiles of 75 ng/mL for the non-cardiac etiology dataset and 250 ng/mL for the obese HFpEF dataset, respective medians of 190 ng/mL and 250 ng/mL, 75th percentiles of 475 ng/mL and 750 ng/mL, and maximums of 1075 ng/mL and 5000 ng/mL. These statistics were used as constraints to generate a simulated distribution of BNP for each of the cohorts, matching the number of subjects in the non-elevated and elevated LVEDP groups in the present work (258 subjects in the non-cardiac etiology group and 79 subjects in the obese HFpEF group). The performance of BNP was then assessed using these simulated data, from which the AUC was calculated, and thresholds of 50 pg/mL and 150 pg/mL were applied to calculate any statistics requiring a binary result (i.e., test-negative or test-positive, to calculate sensitivity, specificity, PPV, NPV, and likelihood ratios). The simulation was repeated for 1000 iterations, with the values of the performance statistics averaged and confidence intervals calculated using the distribution of the statistics over the iterations (i.e., values at the 2.5th and 97.5th percentiles). The simulation result was analyzed using a Bayesian methodology and used to calculate the NRI for the ML predictor based on published methods [34].
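One way such a constrained simulation could be set up is sketched below in Python. The text does not state how values were generated between the five constraining statistics; this sketch assumes values are uniformly distributed within each quartile interval, and the function names, seed, and nearest-rank percentile indexing are illustrative choices.

```python
import random
from bisect import bisect_left, bisect_right


def sample_from_five_numbers(five, n, rng):
    """Draw n values matching (min, Q1, median, Q3, max), uniform within each quarter."""
    values = []
    for _ in range(n):
        u = rng.random() * 4.0          # position across the four quarters
        k = min(int(u), 3)              # index of the quarter
        values.append(five[k] + (u - k) * (five[k + 1] - five[k]))
    return values


def roc_auc(pos, neg):
    """Probability that a positive exceeds a negative, ties counted as one half."""
    neg_sorted = sorted(neg)
    total = 0.0
    for p in pos:
        below = bisect_left(neg_sorted, p)
        ties = bisect_right(neg_sorted, p) - below
        total += below + 0.5 * ties
    return total / (len(pos) * len(neg))


def simulate_auc(five_neg, five_pos, n_neg, n_pos, iterations=1000, seed=0):
    """Mean AUC and percentile-based 95% interval over repeated simulations."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(iterations):
        neg = sample_from_five_numbers(five_neg, n_neg, rng)
        pos = sample_from_five_numbers(five_pos, n_pos, rng)
        aucs.append(roc_auc(pos, neg))
    aucs.sort()
    mean = sum(aucs) / len(aucs)
    lower = aucs[int(0.025 * (len(aucs) - 1))]   # approximate 2.5th percentile
    upper = aucs[int(0.975 * (len(aucs) - 1))]   # approximate 97.5th percentile
    return mean, lower, upper
```

With the constraining statistics listed above, a call would take the form `simulate_auc((0, 75, 190, 475, 1075), (0, 250, 250, 750, 5000), 258, 79)`. Applying a fixed threshold to each simulated dataset would yield the binary statistics in the same loop.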