Identifying Cardiac Syncope Based on Clinical History: A Literature-Based Model Tested in Four Independent Datasets

Background: We aimed to develop and test a literature-based model for symptoms that associate with cardiac causes of syncope. Methods and Results: Seven studies (the derivation sample) reporting ≥2 predictors of cardiac syncope were identified (4 Italian, 1 Swiss, 1 Canadian, and 1 from the United States). From these, 10 criteria were identified as diagnostic predictors. The conditional probability of each predictor was calculated by summation of the reported frequencies. A model of conditional probabilities and a priori probabilities of cardiac syncope was constructed. The model was tested in four datasets of patients with syncope (the test sample) from Calgary (n=670; 21% had cardiac syncope), Amsterdam (n=503; 9%), Milan (n=689; 5%) and Rochester (n=3877; 11%). In the derivation sample ten variables were significantly associated with cardiac syncope: age, gender, structural heart disease, low number of spells, brief or absent prodrome, supine syncope, effort syncope, and absence of nausea, diaphoresis and blurred vision. Fitting the test datasets to the full model gave C-statistics of 0.87 (Calgary), 0.84 (Amsterdam), 0.72 (Milan) and 0.71 (Rochester). Model sensitivity and specificity were 92% and 68% for Calgary, 86% and 67% for Amsterdam, 76% and 59% for Milan, and 73% and 52% for Rochester. A model with 5 variables (age, gender, structural heart disease, low number of spells, and lack of prodromal symptoms) was as accurate as the total set. Conclusion: A simple literature-based Bayesian model of historical criteria can distinguish patients with cardiac syncope from other patients with syncope with moderate accuracy.


Introduction
The differential diagnosis of syncope is wide, ranging from vasovagal syncope, to cardiac arrhythmias, orthostatic hypotension, and valvular heart disease [1]. Among patients less than 65 years of age, cardiac syncope comprises little more than one-fifth of causes of syncope, whereas in patients above 65 years it is the cause of syncope in 42% of cases [2,3]. Importantly, patients with cardiac syncope are at increased risk of cardiovascular events and mortality [4].
Several studies have investigated the usefulness of clinical history and demographics as predictors of the cause of syncope [5][6][7][8]; others have investigated criteria associated with adverse outcome in syncope patients [9][10][11][12][13][14]. However, these studies differ in setting, outcome, definitions, and potential markers in relation to syncope. The purpose of this study was to use these publications to develop an international, literature-based model to predict cardiac syncope. A model for predicting cardiac syncope based on historical and demographic criteria that is effective and robust to minor differences in definition and setting may provide a useful tool for risk stratification of patients presenting with syncope.
Accordingly, we conducted a literature search to identify studies reporting historical and demographic variables associated with cardiac syncope, and combined the reported results in a conditional probabilities model. We then tested the effectiveness of this model against data of patients with syncope from four centres that were not associated with the derivation populations.

Methods
Model derivation and model testing were conducted independently. Two samples of publications were defined. Publications were included in the derivation sample [5,6,15,16,17,18,19] if they presented summary findings such as proportions or sensitivity and specificity, but the patient-specific primary data sets were not acquired by the investigators. Publications were included in the test sample if they had summary data, and the patient-specific primary data sets were acquired by the investigators [7,8,9,20,21].
Our intent was to summarize the results from major articles published in the field. PubMed was searched for relevant studies in 2 separate searches. First, the terms "diagnosis", "signs and symptoms" and "vasovagal syncope" were entered. Second, the terms "clinical history", "diagnosis" and "syncope" were entered. The resulting abstracts were scanned manually and studies were included if they featured all of the following: patients with ≥1 transient loss of consciousness; a diagnosis of cardiac syncope vs. other causes, according to the degree of evidence accepted in each paper; and ≥2 historical symptoms or criteria (not counting results of physical exam or further diagnostics) reported in relation to the final diagnosis. Secondary searches were conducted in local bibliographies and in references from identified articles.

Key variables in the derivation sample
Variables from the 7 derivation populations [5,6,15,16,17,18,19] were selected as predictors if they were reported to be associated with specific causes of syncope. Similarities and differences among the derivation datasets in the prevalence of predictors, and the relation between predictors and outcome, were identified using logistic regression models. Some studies included a range of variables in a multiple logistic regression model without presenting a cross-tabulation of the presence of individual variables against the final diagnosis; therefore a statistically significant association between a predictor and the diagnosis may have been reported without a quantification of this association. To ensure data were entered in the model on all selected predictors, we introduced an arbitrary threshold: only variables reported to be statistically significantly associated with the diagnosis (regardless of how the results were presented), in all patients or among a subgroup of patients, in ≥3 studies, were included. Signs observed by bystanders were excluded because they are conditional upon the presence of bystanders [22].

Model construction in derivation sample
The most important model variables were combined in a conditional probabilities model. The conditional probabilities of predictors, given the diagnosis, were derived by summation of the frequencies reported in each study, whether the association was reported as statistically significant or not. Associations that were quantified indirectly (for example from sensitivity and specificity, or mean and standard deviation) were recalculated as frequencies, where possible.
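The pooling step described above can be sketched as follows. This is a minimal illustration, assuming that "summation of the reported frequencies" means adding numerators and denominators across studies within one diagnostic group; the function name and the counts are hypothetical and are not taken from the derivation studies.

```python
def pooled_conditional_probability(study_counts):
    """Pool P(predictor present | diagnostic group) across studies.

    study_counts: one (patients_with_predictor, patients_in_group) tuple
    per study, all referring to the same diagnostic group.
    """
    with_predictor = sum(k for k, _ in study_counts)
    total = sum(n for _, n in study_counts)
    if total == 0:
        raise ValueError("no patients reported for this group")
    return with_predictor / total


# Hypothetical counts, invented purely for illustration.
example_counts = [(5, 40), (10, 60)]
pooled = pooled_conditional_probability(example_counts)
print(pooled)  # 15 of 100 patients
```

Summing counts weights each study by its size, so a large study contributes more to the pooled estimate than a small one.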
The model consists of a graphical structure of nodes, illustrated in Figure 1. The nodes represent the study population, cardiac syncope and a range of predictors. The a priori probability of cardiac syncope corresponded with the prevalence of cardiac syncope in the study population. The direction of the arrows indicates that the outcome (cardiac syncope) is dependent on the predictors (age, gender, structural heart disease, etc.). For each symptom, the relation between the symptom and the outcome is characterised by a conditional probability table: the probability of the symptom given the presence or absence of cardiac syncope. For example, based on the literature search we know that the probability of nausea, given the presence of cardiac syncope, is 8%; the probability of nausea, given the presence of non-cardiac syncope, is 20%. The probabilistic information and the node structure are combined to calculate the probability that a patient suffers from cardiac syncope, using a joint Bernoulli probability model. The probability that a patient presenting with a specific set of historical criteria had cardiac syncope is the normalized product of the individual conditional probabilities, multiplied by the prior probability that a random member of the patient population has cardiac syncope. For each patient, the model output is a probability between zero and one.
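The calculation described above can be sketched in a few lines. This is an illustrative sketch only, assuming the predictors are conditionally independent given the diagnosis; it uses the nausea probabilities quoted in the text (8% and 20%) and the 21% prevalence of cardiac syncope in the Calgary data as the prior. The function and variable names are hypothetical.

```python
from math import prod


def posterior_cardiac(prior, cpt, findings):
    """Posterior P(cardiac syncope | findings).

    prior    : a priori probability of cardiac syncope in the population
    cpt      : {predictor: (P(present | cardiac), P(present | non-cardiac))}
    findings : {predictor: True or False}; predictors that were not
               assessed are simply left out
    """
    def likelihood(column):
        return prod(
            cpt[name][column] if present else 1.0 - cpt[name][column]
            for name, present in findings.items()
        )

    cardiac = prior * likelihood(0)
    non_cardiac = (1.0 - prior) * likelihood(1)
    return cardiac / (cardiac + non_cardiac)  # normalisation step


cpt = {"nausea": (0.08, 0.20)}  # values quoted in the text
with_nausea = posterior_cardiac(0.21, cpt, {"nausea": True})
without_nausea = posterior_cardiac(0.21, cpt, {"nausea": False})
print(round(with_nausea, 3), round(without_nausea, 3))  # 0.096 0.234
```

With a single predictor, reporting nausea lowers the probability of cardiac syncope from 21% to about 10%, and its absence raises it to about 23%. Further predictors multiply into the same two products.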

Model description
The accuracy of the model in predicting cardiac syncope was tested in datasets from 4 independent syncope populations for whom primary data were available [7,8,9,20,21]. A graphical representation of the final model structure is given in Figure 1. The population node contains 4 categories; one for each test dataset (Calgary [7,8,20], Amsterdam [21], Milan [9] and Rochester [2]). Each test dataset corresponds to a data-derived a priori chance of cardiac syncope. For each test subject, the prior probability of cardiac syncope, which is specific to the test population (e.g. 21% in the Calgary data), is then updated to a posterior probability: the probability of cardiac syncope given the population as well as the predictors for that subject.

Testing the Model from the Derivation Sample
For each test dataset, patient data were fitted to the conditional probabilities model, resulting in a predicted probability of cardiac syncope for each patient. The predicted probabilities against the actual diagnoses were displayed in a receiver operating characteristics (ROC) curve. A c-statistic was computed to give an overall estimate of how well the model discriminated cardiac vs. non-cardiac syncope in the test data.
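As an illustration of the c-statistic, the sketch below computes it directly from its definition: the proportion of (cardiac, non-cardiac) patient pairs in which the cardiac patient received the higher predicted probability, with ties counted as one half. This is equivalent to the area under the ROC curve. The function name and the example values are hypothetical; the study itself used SAS.

```python
def c_statistic(probabilities, is_cardiac):
    """Concordance between predicted probabilities and actual diagnoses."""
    cases = [p for p, y in zip(probabilities, is_cardiac) if y]
    controls = [p for p, y in zip(probabilities, is_cardiac) if not y]
    if not cases or not controls:
        raise ValueError("need at least one patient in each group")
    concordant = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return concordant / (len(cases) * len(controls))


# Hypothetical predictions for five patients, two with cardiac syncope.
example_probs = [0.90, 0.40, 0.35, 0.10, 0.40]
example_labels = [True, True, False, False, False]
print(c_statistic(example_probs, example_labels))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups. The pairwise loop is quadratic in the number of patients, which is acceptable for a sketch but slow for large datasets.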

Parsimonious model
Not all test datasets contained all model variables; therefore, for a fair comparison of the model efficiency in the 4 test datasets, model fits were repeated using only the 5 predictors available in all datasets.

Testing the Model using Resampled Data
Among patients who have only fainted once, and among older patients, historical criteria might be less predictive [17]. Accrual bias might, therefore, have affected the associations between predictors and outcome. We standardized the proportion of older patients, and of patients who only fainted once to attenuate differences due to accrual bias. Each dataset was resampled to create 1000 samples with a standardized distribution of age and number of spells (both as categorical variables), using PROC SURVEYSELECT in SAS. The overall distribution of age and number of spells reported in the literature used to build the model was used to create the standardised distribution. Only variables common to all datasets were used in the resampled data. Hugin 7.1 was used to compile the conditional probabilities model, to establish proof of concept. For computing large datasets, the model was programmed in Matlab using Kevin Murphy's Bayes Net Toolbox (BNT). All other analyses were conducted using SAS software, version 9.2.
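The standardization step can be sketched as stratified resampling. The sketch below is not the SAS procedure used in the study; it assumes sampling with replacement within each stratum, and the function name, field names and target shares are hypothetical.

```python
import random


def resample_to_target(patients, stratum_of, target, size, seed=0):
    """Draw about `size` patients with replacement so that each stratum
    appears in its target proportion."""
    rng = random.Random(seed)
    by_stratum = {}
    for patient in patients:
        by_stratum.setdefault(stratum_of(patient), []).append(patient)
    sample = []
    for stratum, share in target.items():
        pool = by_stratum.get(stratum, [])
        if not pool:
            continue  # stratum absent from this dataset, cannot be drawn
        sample.extend(rng.choices(pool, k=round(share * size)))
    return sample


# Hypothetical dataset: 10 older and 90 younger patients.
example_patients = (
    [{"age_group": "older", "id": i} for i in range(10)]
    + [{"age_group": "younger", "id": i} for i in range(10, 100)]
)
example_target = {"older": 0.5, "younger": 0.5}  # hypothetical target shares
resampled = resample_to_target(
    example_patients, lambda p: p["age_group"], example_target, size=200
)
print(len(resampled))  # 200
```

Repeating the call with seeds 0 to 999 would give 1000 resampled datasets. In the study the strata combined age category and number of spells, which here would mean returning a tuple from `stratum_of`.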

Predictors of cardiac syncope
Eleven variables were statistically significantly associated with the diagnosis in 3 or more of the derivation studies listed in Table 1. The resulting conditional probability tables for these variables are given in Table 2. Palpitations were more common among cardiac syncope patients in some studies [5,6], and more common among vasovagal patients in others [15,18]. Overall the association of palpitations with cardiac vs. noncardiac syncope was not statistically significant; therefore, this variable was excluded, leaving 10 variables. Factors associated with cardiac syncope were age >60 years, male sex, structural heart disease, <3 spells, supine syncope, and effort syncope. Factors associated with non-cardiac syncope included age <40 years, a long prodrome, nausea, diaphoresis, and blurred vision. These had moderate accuracy in identifying patients with cardiac syncope (Figure 2).

Test populations
There were 4 test datasets. Canada and United Kingdom: Syncope Symptom Study, 'Calgary data' [7,8,20]. Between January 1995 and July 2001, 670 patients (age 51±21 years) with at least one loss of consciousness were recruited from neurology, cardiology, pacemaker, arrhythmia and syncope clinics. All patients completed a 118-item questionnaire based on Calkins et al. [15], assessing symptom burden, provocative situations, perisyncopal symptoms, symptoms thought to be associated with epileptic seizures, signs observed by bystanders and relevant medical history. Clinical diagnoses were tightly and prospectively defined; patients underwent electrophysiologic studies and where necessary also tilt table tests (excepting those with a clinically declared cause of syncope such as sustained VT during syncope) [20]. In total, 138 (21%) patients had cardiac syncope.
Netherlands: Fainting Assessment Study (FAST), 'Amsterdam data' [21]. Between February 2000 and May 2002, 503 patients (age 52±19 years) presenting with a transient loss of consciousness at the neurology, cardiology, and internal medicine emergency department, or the cardiac emergency department of the Academic Medical Center in Amsterdam were enrolled. A standardised medical history was taken, using a questionnaire based on the European Society of Cardiology guidelines [1]. Clinical diagnoses were based on expert opinion after 2 years of follow-up, and documented arrhythmias where available. A total of 44 (9%) patients had cardiac syncope.
Italy: Short-Term Prognosis of Syncope (STePS) Study, 'Milan data' [9]. Between January and July 2004, 689 patients (age 60±21 years) presenting with syncope at emergency departments in one of four general hospitals in the Milan area were enrolled [9]. Cardiac syncope was mainly defined as a change in rhythm therapy or a documentation of a potential substrate for syncope. Only 37 (5%) patients had cardiac syncope.
United States, 'Rochester data' [2]. Between January 1996 and December 1998, 3877 patients (age 57±23 years) were evaluated for syncope at the Mayo Clinic in Rochester, Minn. Patients were referred from outpatient clinics, inpatient services, hospital emergency departments and other institutions. Clinical diagnoses were based on expert opinion, and documented arrhythmias where available. Cardiac syncope was defined as a documented symptom-rhythm correlation, or a potential substrate for syncope. In total, 424 (11%) patients had cardiac syncope.

Distribution of predictors in test populations
The prevalence of predictors in each dataset, and how the predictors were distributed among patients with cardiac and non-cardiac symptoms, is shown in Table 3; not all predictors were surveyed in all centres. Structural heart disease was common among Calgary (21%) and Milan (25%) patients, but much less so among Amsterdam (10%) and Rochester patients.
Table notes: Results are given as number (%). There were statistically significant differences in the overall prevalence of predictors between the 4 centres (chi-square tests, p<0.001 for all symptoms/demographics). * 'Blurred vision' was not explicitly asked in the Rochester study. The term 'blurred vision' was extracted from a text field for patients' comments. † In the Milan data, the number of spells was categorised as 'One spell' vs. 'More than one spell'. Because this was the only information available, it was used as a proxy for 'Two spells or less' vs 'More than two spells'.

Performance of conditional probabilities model in test populations
Three analyses of the test populations were performed: 1) all 4 test populations; 2) a heavily resampled population to mitigate data collection biases; and 3) a parsimonious model. The conditional probabilities model, using all available information for each dataset, resulted in c-statistics of 0.87 for the Calgary data [7,8,20], 0.84 for the Amsterdam data [21], 0.72 for the Milan data [9] and 0.71 for the Rochester data [2] (Figure 3). With a cut-off value of the probability of cardiac syncope of 0.02 (selected to favour sensitivity over specificity), the sensitivity and specificity of the full model were 92% and 68% for Calgary, 86% and 67% for Amsterdam, 76% and 59% for Milan, and 73% and 52% for Rochester. Using only variables common to all datasets (age, gender, structural heart disease, number of spells, and prodromal symptoms), a more parsimonious model resulted in very similar c-statistics of 0.88, 0.83, 0.72 and 0.69, respectively.
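To make the role of the cut-off concrete, the sketch below classifies each patient as predicted cardiac when the model probability reaches the cut-off of 0.02 and then computes sensitivity and specificity. Whether a probability exactly equal to the cut-off counts as positive is an assumption here, and the function name and example values are hypothetical.

```python
def sensitivity_specificity(probabilities, is_cardiac, cutoff=0.02):
    """Return (sensitivity, specificity) at a given probability cut-off."""
    tp = fn = tn = fp = 0
    for p, cardiac in zip(probabilities, is_cardiac):
        flagged = p >= cutoff  # assumption: ties count as test-positive
        if cardiac and flagged:
            tp += 1
        elif cardiac:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one patient in each group")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predictions: two cardiac and four non-cardiac patients.
cutoff_probs = [0.30, 0.01, 0.05, 0.015, 0.001, 0.10]
cutoff_labels = [True, True, False, False, False, False]
sens, spec = sensitivity_specificity(cutoff_probs, cutoff_labels)
print(sens, spec)  # 0.5 0.5
```

Lowering the cut-off flags more patients as possible cardiac syncope, which raises sensitivity at the cost of specificity; this is the trade-off the low value of 0.02 was chosen to favour.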
In an effort to identify the source of discrepancy in model accuracy for the 4 test datasets, we sought to reduce accrual bias by standardizing the distribution of age and number of spells. The conditional probabilities model on the resampled data resulted in c-statistics of 0.80 for Calgary, 0.78 for Amsterdam, 0.73 for Milan and 0.69 for Rochester (Figure 3). There were great differences among centres in the mean predicted probability of cardiac syncope among cardiac syncope patients (Figure 4). However, for all locations, the predicted probability of cardiac syncope was greater in cardiac syncope patients than in non-cardiac syncope patients (Mann-Whitney rank sum test; p<0.001 for all comparisons).

Discussion
We aimed to develop an international, literature-based model to predict cardiac syncope. This simple model of historical and demographic criteria distinguishes patients with cardiac syncope from patients with other causes of syncope, with accuracy in the range of the Framingham [23] and TIMI [24] scores. The accuracy of the simple model approaches the pooled estimates [25] of sensitivity and specificity of the San Francisco Syncope Rule used in emergency rooms for a variety of medical outcomes. However, there were differences between research centres in the predictive power of individual historical criteria, and therefore also in the predictive power of the overall model.
The inclusion of only 5 predictor variables preserved the overall accuracy of the model in all test datasets for cardiac syncope. They were age over 40 years, male sex, structural heart disease, no more than 2 spells, and brief or no prodromal symptoms. The parsimonious model is simpler, and therefore generally preferable. However, it will not perform as well in specific settings, such as when arrhythmias are suspected in young people [26]. The summary of predictive powers in Table 2 contains useful data for these particular settings.
This study is the first literature-based model to identify patients with cardiac syncope based on historical and demographic criteria. Predictive models are generally data-driven, and may not be effective in other populations. This model is based on the results of 7 studies, conducted in 4 countries, and is more likely to be robust. To confirm this, model testing was performed on independent primary datasets from 4 countries. A further strength was the use of a conditional probabilities model taking into account the prior probability of cardiac syncope. Prior probabilities may differ among locations (and did), and taking this into account makes the model more suitable for use in different settings.
Clinical and physiologic features: The predictor variables are simple, and have familiar clinical and physiologic features. For example, male sex predicts cardiac syncope, which resonates with the lower likelihood of males having vasovagal syncope [27,28]. Advancing age predicts cardiac syncope, and it also associates with age-dependent diseases such as sick sinus syndrome, aortic stenosis, and myocardial infarction. Factors associated with non-cardiac syncope such as age <40 years, a long prodrome, nausea, diaphoresis, and blurred vision resonate strongly with the demographic and clinical features of vasovagal syncope. The long prodrome and blurred vision are likely due to the relatively gradual decline of blood pressure, and the preceding diaphoresis reflects the intense cutaneous sympathetic activity that occurs with vasovagal syncope.
The complexities of health service and epidemiology studies of syncope have left the field with several large opportunities, which have only recently begun to be addressed. The main issues are the uncommon occurrence of documenting syncope in the act, the large differential diagnosis with quite varying severities of outcomes, the heterogeneous substrates that predispose to it, and the range of outcomes. Compounding this is a lack of uniform criteria and definitions for diagnosis, substrates, degree of certainty about diagnosis, variable approaches to diagnostic investigations, and variable definitions of outcomes. There are assumptions about the link between investigation results and true diagnosis, and therefore ongoing uncertainty that demonstrating electrical or structural substrates actually establishes a diagnosis. As well, the relative uncommonness of any single form of cardiac syncope often leads to the different forms being pooled together. With these problems it is not surprising that up to half of syncope goes undiagnosed outside specialty centres. Finally, rare causes of syncope such as genetic arrhythmias do not figure prominently in most studies because of their rarity.
There have been three approaches to these problems. The first is to use very rigorous diagnostic criteria for each cause of syncope, exemplified by the Calgary Syncope Symptom studies [7,8,20] and the ISSUE ILR studies [29][30][31][32]. The second is to develop uniform criteria for outcomes, as is now the case for studies of syncope in the emergency department [33]. This study takes a third approach, more common and pragmatic, more probabilistic and integrative. It is based on 11 studies, and integrates the assumptions widely made in the practice of syncope. For example, it implicitly assumes that all cardiac syncope lacks the autonomic symptoms associated with vasovagal syncope, and that most vasovagal syncope has these symptoms. It implicitly assumes, as do its integrated studies, that identification of a substrate diagnoses the cause of syncope, and it simply accepts the varied definitions of cardiac syncope. It also implicitly accepts that not all syncope can be diagnosed. Although this approach lacks the rigor of some earlier studies, it has the strength of reflecting current styles and standards of practice. It also establishes a foundation for future, more tightly controlled studies.

Limitations
There are several potential limitations. First, the model was based only on data reported in a sufficient number of the 7 derivation studies [5,6,15,16,17,18,19], raising the possibility of inclusion bias due to the biases built into the care patterns of the involved health care systems. For example, a completely publicly funded system with easy access to health care facilities might increase the a priori probability of non-cardiac syncope. Second, not all test datasets included all model variables. Third, although interactions between variables in relation to the outcome were tested in logistic regression models (data not shown), dependence between predictors was not taken into account.
Fourth, there are differences in terminology among centres in both derivation and test populations. Of the two studies used to derive conditional probabilities for structural heart disease, the term 'heart disease' was defined by one [19] but not the other [16], and the definition of 'structural heart disease' varied among the reports. This report suggests the importance of international agreement on definitions, and of an international approach [25,33] to the development of decision rules that work well across all reasonably relevant clinical populations.
Fifth, diagnostic scores are only as good as the clarity and certainty of the reference diagnoses. The Calgary Syncope Symptom Study [7,8,20] used very precise a priori definitions, essentially requiring either a hemodynamically documented syncopal spell or an induced arrhythmia or faint. Others included inferential causes, such as sinus bradycardia documented at another time [2]. Differences in the certainty of the diagnoses might explain some of the differences among the model behaviours in the test populations.
Finally, there were differences among test populations in important variables such as age and number of spells [2,7,8,9,20,21]. However, resampling to create standardized distributions of age and number of spells did not equalize the model accuracy for the 4 test datasets. Whether other uncollected factors affect the association between predictors and diagnosis of cardiac syncope remains unknown.

Conclusion
A literature-based conditional probability model using information on age, gender, structural heart disease, number of spells, (long) prodrome, nausea, diaphoresis, blurred vision, supine syncope and effort syncope identifies patients with cardiac syncope with moderate accuracy. Future studies might benefit from providing uniform definitions of predictors and outcomes, and collecting a wide range of patient characteristics to take into account inter-centre patient population differences.