Comparative analysis of the association between 35 frailty scores and cardiovascular events, cancer, and total mortality in an elderly general population in England: An observational study

Background
Frail elderly people experience elevated mortality. However, no consensus exists on the definition of frailty, and many frailty scores have been developed. The main aim of this study was to compare the association between 35 frailty scores and incident cardiovascular disease (CVD), incident cancer, and all-cause mortality. Also, we aimed to assess whether frailty scores added predictive value to basic and adjusted models for these outcomes.

Methods and findings
Through a structured literature search, we identified 35 frailty scores that could be calculated at wave 2 of the English Longitudinal Study of Ageing (ELSA), an observational cohort study. We analysed data from 5,294 participants, 44.9% men, aged 60 years and over. We studied the association between each of the scores and the incidence of CVD, cancer, and all-cause mortality during a 7-year follow-up using Cox proportional hazard models at progressive levels of adjustment. We also examined the added predictive performance of each score on top of basic models using Harrell’s C statistic. Using age of the participant as a timescale, in sex-adjusted models, hazard ratios (HRs) (95% confidence intervals) for all-cause mortality ranged from 2.4 (95% CI: 1.7–3.3) to 26.2 (95% CI: 15.4–44.5). In further adjusted models including smoking status and alcohol consumption, HR ranged from 2.3 (95% CI: 1.6–3.1) to 20.2 (95% CI: 11.8–34.5). In fully adjusted models including lifestyle and comorbidity, HR ranged from 0.9 (95% CI: 0.5–1.7) to 8.4 (95% CI: 4.9–14.4). HRs for CVD and cancer incidence in sex-adjusted models ranged from 1.2 (95% CI: 0.5–3.2) to 16.5 (95% CI: 7.8–35.0) and from 0.7 (95% CI: 0.4–1.2) to 2.4 (95% CI: 1.0–5.7), respectively. In sex- and age-adjusted models, all frailty scores showed significant added predictive performance for all-cause mortality, increasing the C statistic by up to 3%. None of the scores significantly improved basic prediction models for CVD or cancer. A source of bias could be the differences in mortality follow-up time compared to CVD/cancer, because the existence of informative censoring cannot be excluded.

Conclusion
There is high variability in the strength of the association between frailty scores and 7-year all-cause mortality, incident CVD, and cancer. With regard to all-cause mortality, some scores give a modest improvement to the predictive ability. Our results show that certain scores clearly outperform others with regard to three important health outcomes in later life. Finally, we think that despite their limitations, the use of frailty scores to identify the elderly population at risk is still a useful measure, and the choice of a frailty score should balance feasibility with performance.



Author summary
Why was this study done?
• Frailty has been associated with poor outcomes in the elderly, and the need for valid instruments for its assessment is generally recognised.
• Many frailty scores have been developed based on different theoretical concepts; however, none of them can be considered the gold standard.
• The predictive capacity of frailty scores has mainly been studied for mortality, and evidence on their ability to predict other important outcomes in the elderly population, such as cardiovascular or cancer events, is limited (cardiovascular) or nonexistent (cancer) at the present time.
What did the researchers do and find?
• We performed secondary analyses of the most comprehensive list of frailty scores in 5,294 participants of the English Longitudinal Study of Ageing, and we demonstrated that all frailty scores were associated with future mortality and that some of them were also associated with later cardiovascular events. However, no relationship with cancer was observed.
• In addition, the results of this study showed that multidimensional frailty scores may have a stronger and more stable association with mortality and incidence of cardiovascular events.
• In addition, the results showed that, despite significant associations of frailty scores with mortality, the discriminative ability that frailty scores add to chronological age may be limited.

Introduction
Although chronological age is the strongest determinant of disease occurrence and mortality, it is increasingly recognised that the process of ageing is heterogeneous [1] due to a combination of differences in lifetime cumulative exposure to determinants of chronic disease and differences in individual susceptibility. The concept of frailty was introduced as a way of identifying individuals who, at a given age, have a particularly fragile health balance and are therefore more vulnerable to rapid health deterioration and early mortality [2]. However, the operationalization of the concept of frailty has been fraught with difficulties, as different groups of researchers and clinicians have expressed diverging views on which characteristics make up frailty and on how these should be assessed individually and in unison.
Considering the type and composition of variables of frailty scores, four main approaches to frailty can be distinguished. First, the "phenotype of frailty" approach describes frailty as a physiological syndrome of diminished resistance to stressors associated with poor health outcomes [3]. Second, the "multidimensional" approach defines frailty as a dynamic process of loss of function in one or more domains, making the individual vulnerable [4]. Third, the "accumulation of deficit" approach counts the number of health problems or deficits to classify the individual as frail [5]. Fourth, we propose a "disability" approach, as frailty scores were created primarily with variables representing a degree of disability. We have included this classification even without a theoretical basis/reference, as these scores are used as frailty scores, although disability is considered by many authors more as a result of frailty or an overlap condition than as an equivalent of frailty [6].
There is no gold standard to measure frailty and many different frailty scores have been created, even within each of the four main approaches [7]. We have previously shown that there is only limited agreement in which individuals will be classified as frail, according to different scores, and that, in consequence, it is impossible to compare the prevalence of frailty or associations with relevant outcomes between studies using different frailty scores directly [8].
To fully assess and compare the performance of different frailty scores, it is also necessary to consider their prospective association and predictive ability for the main conditions that cause the loss of healthy life years and quality of life in an ageing population [9]. Prospective associations were used in this study to investigate frailty scores as risk factors of important outcomes in the elderly population: death or cardiovascular or cancer events [10]. Predictive value was used in this study to determine the ability of frailty scores to discriminate or separate participants who will from those who will not develop an event [11].
Many scores have shown strong associations with all-cause mortality, risk of hospitalization, and disability [7], but knowledge concerning their association with other major causes of ill-health and loss of quality of life, such as cardiovascular disease (CVD) events and cancer, is very limited. In a longitudinal study, Klein [12]. Another study shows associations between variables that form part of some frailty instruments and cancer incidence [13], but no direct large-scale comparison studies are available.
This comparative analysis is important beyond the fact that this has not been done. Researchers need more information on what frailty scores actually measure and how they can compare or pool results of studies using different frailty scores. Clinicians need more information on the performance of the scores and on the most appropriate instruments in clinical settings. Policy makers need more information on the usefulness of measuring frailty at a population level and how to achieve it with the best instruments.
Therefore, the objective of this study was to carry out a comparative external validation of a comprehensive list of frailty scores with regard to three important health outcomes in later life: CVD, cancer, and all-cause mortality, by direct comparison of the strength of associations and of added predictive value, using prospective data from a population-based study in the elderly. Some of the scales included are composite scales for physical activity or function, grouped as frailty scores for this paper.
Our hypothesis was that the marked heterogeneity in approach, type, and composition of frailty scores would translate into heterogeneity in associations and predictive ability, with important health outcomes.

Participants, inclusion criteria, and study design
Participants. Data on participants from the English Longitudinal Study of Ageing (ELSA) were used under data-sharing project number 82538. ELSA is an ongoing longitudinal cohort study based on a representative sample of the middle-aged and elderly general population (50 years and over) living in England [14]. ELSA has extensive subjective and objective information collected in biennial surveys (waves). All waves gathered information concerning physical, cognitive, and psychological health, disability, lifestyle factors, comorbidities, social participation, and social support. In addition, even-numbered waves include objective measures: physical functioning assessment and biological sampling [15]. Ethical approval was obtained from the Multicentre Research and Ethics Committee, and all participants provided written informed consent [16].
Inclusion criteria. Participants aged 60 or over (because not all frailty-related variables were measured in participants younger than 60 years) who gave permission to link their data with a national mortality register and had a nurse visit in wave 2 were included. The outcomes were measured up to 2012, when mortality data were assessed.

Study design
This is a longitudinal secondary data analysis of ELSA, and no formal written analysis plan exists. The analysis was planned in November 2015 during meetings with coauthors. We used the second wave (2004-2005) as baseline because this was the first wave with a clinical examination and laboratory samples. The exposure was the frailty state measured with 35 different frailty scores at baseline, and the follow-up time was from 2004-2005 to 2012.

Frailty scores
A structured search was performed to identify all published original frailty scores. The search strategy has previously been described in detail [8].
The original scores that could be calculated with the ELSA wave 2 data (i.e., those for which at least 80% of the necessary variables were measured) were selected. Multiple imputation was used to deal with missing data in the underlying measured study variables necessary to calculate the frailty scores. In order to obtain optimally plausible values for the scores, imputation was applied to the original underlying variables, and frailty scores were calculated a posteriori using imputed values.
To allow analysis on a single continuous scale, frailty scores were rescaled from 0 (non-frail) to 1 (maximum frail) by dividing the output of each frailty score by the maximum possible value. If the frailty score gave different weights to some variables, these weights were applied before rescaling. In addition, some frailty scores had to be inverted to match the definition of 0 as non-frail and 1 as maximum frail.
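The rescaling described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the item names, and the weights in the usage note are hypothetical.

```python
def weighted_raw_score(items, weights):
    """Sum item values after applying the weights defined by a (hypothetical) score."""
    return sum(value * weights[name] for name, value in items.items())


def rescale_score(raw, max_value, invert=False):
    """Rescale a raw frailty score to the range 0 (non-frail) to 1 (maximum frail)."""
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    if not 0 <= raw <= max_value:
        raise ValueError("raw score must lie between 0 and max_value")
    scaled = raw / max_value
    # Scores on which a high value means "fit" are flipped,
    # so that 1 always denotes maximum frailty.
    return 1.0 - scaled if invert else scaled
```

For example, with hypothetical items `{"weight_loss": 1, "exhaustion": 0, "slow_gait": 1}` and weights `{"weight_loss": 1, "exhaustion": 1, "slow_gait": 2}`, the weighted raw score is 3 out of a maximum of 4, which rescales to 0.75 (or 0.25 if the score has to be inverted).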
Scores were classified into 4 groups depending on their underlying frailty approach: phenotype of frailty (mainly physical functioning variables), multidimensional (at least 2 different dimensions and less than 30 variables), accumulation of deficits (at least 30 variables), and disability (mainly disability variables).
A total of 67 original frailty scores were found in the literature search; 35 of them had at least 80% of their variables available in the ELSA wave 2 data and were therefore selected (Table 1). Of these, 19 had binary cutoffs identifying frail and non-frail individuals, and 10 had categorical cutoffs, additionally identifying an intermediate pre-frail group [8].

Missing data
Missing data of some needed variables to calculate frailty scores were observed in 1 (<1.0%) to 3,037 (57.4%) participants. The mechanism of missing data was assumed to be missing at random because the underlying values necessary to calculate frailty scores that were missing for some individuals are likely to depend on observed data in the ELSA data. In other words, missing data did not depend on any unobserved data, but only upon observed data.
Each variable was defined as being of numerical, binary, or categorical type, which defined the appropriate method for imputation. The chained equations approach was chosen because it is a very effective, flexible, and straightforward method to impute data. This method is based on a set of models adapted to the type of missing value; the values are filled first with random sampling, based only on the observed data, and then also based on already imputed data [49,50].
The imputation model was built by selecting the best missing data predictors among the available variables. The imputation model incorporated strong predictors of missing data (cognition, disability) and confounders (age, sex, education, physical activity). Moreover, outcomes were included in the imputation model (mortality, cancer, CVD), but they were not imputed. To optimise the imputed values, the data were ordered from lower to higher percentage of missing data before running the imputation, and a seed was set to allow reproducibility.
We performed 30 imputations to create 30 different data sets. We ran 20 iterations for each of these 30 imputations, which was sufficient to achieve convergence of the Gibbs sampler. The imputations were assessed by hand (plausibility of imputed values compared to complete data) and by using graphical methods.

Outcomes
We assessed 3 main outcomes: all-cause mortality, CVD, and cancer events. Mortality data linked to ELSA participants were provided by the National Health Service's Central Registry, Southport, UK. For 68 participants, mortality was obtained from other sources (found during ELSA fieldwork or from participants' relatives). Main causes of death were registered as CVD, cancer, diseases of the respiratory system, and other causes. CVD or cancer events were defined by self-report in waves 3-5. A CVD event could be myocardial infarction, heart failure, stroke, or CVD death. A cancer event could be cancer of any type, including cancer death. For each outcome separately, participants' exposure time was calculated from the participant's age at entry (wave 2 clinical examination: 2004-2005) to the participant's age at first event or final censoring (date of mortality assessment: February 2012). Participants lost to follow-up were right-censored at the midpoint between their last visit and the next one. For analysis of CVD and cancer incidence, respective prevalent cases at baseline were excluded.
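The exposure-time rules above (age as timescale, midpoint censoring for participants lost to follow-up) can be sketched as below. This is an illustration under stated assumptions: the function and argument names are hypothetical, exact birth dates are assumed to be available, and the administrative censoring date is set to 1 February 2012 because the source gives only the month.

```python
from datetime import date

DAYS_PER_YEAR = 365.25


def age_at(birth, when):
    """Age in years at a given date."""
    return (when - birth).days / DAYS_PER_YEAR


def midpoint(first, second):
    """Date halfway between two visits."""
    return first + (second - first) / 2


def exposure_interval(birth, entry, event=None, last_visit=None,
                      next_visit=None, admin_censor=date(2012, 2, 1)):
    """Return (entry_age, exit_age, event_observed) on the age timescale."""
    if event is not None:
        exit_date, observed = event, True
    elif last_visit is not None and next_visit is not None:
        # Lost to follow-up: right-censor halfway between the last
        # attended visit and the next scheduled one.
        exit_date, observed = midpoint(last_visit, next_visit), False
    else:
        # Followed to the end: censor at the mortality assessment date
        # (day of month is an assumption).
        exit_date, observed = admin_censor, False
    return age_at(birth, entry), age_at(birth, exit_date), observed
```

Each participant then contributes a left-truncated interval (entry age, exit age) to the Cox model rather than a follow-up duration starting at zero.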

Definition of covariates/potential confounders
Smoker status was defined as never, previous, or current smoker. The maximum alcohol consumption per day was defined as 0, 1, 2, and >2 units/day. Body mass index (BMI) was defined as a continuous variable calculated as weight (kg)/height (m)². Self-reported physical activity was defined as time spent in vigorous, moderate, low, and sedentary activity. Diabetes was defined through self-reported medical diagnosis or fasting glucose ≥7.0 mmol/L or glycated haemoglobin ≥6.5% [51]. Hypertension was defined from systolic or diastolic blood pressure ≥140 or ≥90 mm Hg, respectively, or self-reported high blood pressure medication [52]. Anaemia was defined as a measured haemoglobin level <13 g/dL (men) and <12 g/dL (women) [53]. Arthritis was self-reported diagnosis. Neuropsychiatric problems were self-reported diagnoses of Alzheimer or Parkinson disease, dementia, or psychiatric problems. Cognition was evaluated with a total continuous cognitive index (memory and executive functions) [54]. Self-rated health was defined as excellent, very good, good, fair, or poor. Quality of life was evaluated with the 19-item control, autonomy, pleasure, and self-realization (CASP-19) questionnaire [55]. Depression symptoms were assessed with the 8-item Center for Epidemiologic Studies Depression Scale, with cutoff ≥4 points [56].
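The threshold-based definitions above can be written as simple rules. This is a minimal sketch with hypothetical function and argument names; it reads the thresholds for glucose, glycated haemoglobin, blood pressure, and the depression scale as "greater than or equal to", which is an interpretation of the source.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by squared height (m)."""
    return weight_kg / height_m ** 2


def has_diabetes(self_reported, fasting_glucose_mmol_l=None, hba1c_percent=None):
    """Self-reported diagnosis, fasting glucose >= 7.0 mmol/L, or HbA1c >= 6.5%."""
    return bool(
        self_reported
        or (fasting_glucose_mmol_l is not None and fasting_glucose_mmol_l >= 7.0)
        or (hba1c_percent is not None and hba1c_percent >= 6.5)
    )


def has_hypertension(systolic, diastolic, on_bp_medication):
    """Systolic >= 140 mm Hg, diastolic >= 90 mm Hg, or blood pressure medication."""
    return bool(systolic >= 140 or diastolic >= 90 or on_bp_medication)


def has_anaemia(haemoglobin_g_dl, sex):
    """Haemoglobin below 13 g/dL in men or below 12 g/dL in women."""
    threshold = 13.0 if sex == "male" else 12.0
    return haemoglobin_g_dl < threshold


def has_depressive_symptoms(cesd8_score):
    """Four or more points on the 8-item depression scale."""
    return cesd8_score >= 4
```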

Statistical analysis
We performed two parallel statistical analyses. The first was a continuous analysis with frailty scores rescaled to the range 0 (no frailty) to 1 (maximum frailty). The second was a categorical analysis of frailty scores using cutoffs when they were defined. All data analyses were carried out in R version 3.3.0 using the packages 'mice', 'lattice', 'survival', 'mitml', and 'survC1'. A p-value of less than 0.05 was considered statistically significant.
Survival analysis. Cox proportional hazards models were fitted for each outcome and independently for each frailty score as a continuous variable. Where a published cutoff level to define frailty was available, an additional model was run on the binary or categorical frailty classification.
For each outcome (all-cause mortality, CVD, and cancer events), 4 models were fitted with progressive levels of adjustment (0-3): model 0: frailty score; model 1: model 0 + sex; model 2: model 1 + smoking status and alcohol consumption; and model 3: model 2 + physical activity, BMI, diabetes, hypertension, CVD, cancer, anaemia, chronic obstructive pulmonary disease (COPD), arthritis, neuropsychiatric problems, depression, cognition, and self-rated health and quality of life. The covariates in each model were chosen because all of them could potentially be confounders, affecting the outcome and/or the exposure. To avoid collinearity issues, the covariates of model 3 were tailored to each frailty score, excluding covariates that were an underlying variable of the score or a highly correlated variable. For CVD and cancer models, CVD and cancer were excluded as covariates (see S1 Table).
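The nested covariate sets and the per-score tailoring of model 3 can be sketched as follows. The variable names are hypothetical labels, not ELSA variable names, and the sketch assumes one reading of the last rule: for CVD and cancer models, the covariate matching the outcome is dropped (the source sentence could also mean that both are dropped). The actual exclusions are listed in S1 Table.

```python
MODEL_1 = ["sex"]
MODEL_2 = MODEL_1 + ["smoking_status", "alcohol_consumption"]
MODEL_3 = MODEL_2 + [
    "physical_activity", "bmi", "diabetes", "hypertension", "cvd", "cancer",
    "anaemia", "copd", "arthritis", "neuropsychiatric_problems", "depression",
    "cognition", "self_rated_health", "quality_of_life",
]


def tailored_covariates(model, outcome, score_components):
    """Drop covariates that overlap with the score or duplicate the outcome."""
    excluded = set(score_components)
    if outcome in ("cvd", "cancer"):
        # Assumption: only the covariate matching the outcome is removed.
        excluded.add(outcome)
    return [covariate for covariate in model if covariate not in excluded]
```

For a hypothetical score that already contains a depression item, `tailored_covariates(MODEL_3, "cvd", {"depression"})` returns model 3 without `depression` and without `cvd`.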
The proportional hazards assumption was checked by adding a time-covariate interaction in the model. The interaction term was retained in the model if significant [57]. The Cox models were fitted in 30 imputed data sets and the results, including calculated 95% confidence intervals, were pooled according to Rubin's rules [58].
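Pooling with Rubin's rules combines the within-imputation and between-imputation variance of the log hazard ratio. The sketch below is illustrative only: the function names are hypothetical, and the confidence interval uses a normal approximation (z = 1.96) instead of the t reference distribution with Rubin's degrees of freedom, which is a simplification.

```python
import math


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates; return (pooled estimate, total variance)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")
    pooled = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


def pooled_hazard_ratio(log_hrs, standard_errors, z=1.96):
    """Return (HR, lower, upper) from per-imputation log HRs and standard errors."""
    pooled, total = pool_rubin(log_hrs, [se ** 2 for se in standard_errors])
    se = math.sqrt(total)
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se)
```

In the study, each list would hold 30 values, one per imputed data set.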
The discrimination ability was assessed with Harrell's C statistic [9] using a calendar time to event scale. Three base models were fitted for each outcome: model 1 = age and sex; model 2 = model 1 + smoking status and alcohol; model 3 = model 2 + physical activity, BMI, diabetes, hypertension, CVD, cancer, anaemia, COPD, arthritis, neuropsychiatric problems, depression, cognition, and self-rated health and quality of life. Each frailty score was added to each of these models, and improvement of the predictive ability was assessed by evaluating whether the C statistic of the model with the score was significantly higher than that of the respective base model. Results are expressed as the difference in C statistics (delta C with 95% confidence intervals) between each model including a score and its respective base model.
Sensitivity analysis. We performed a sensitivity analysis by excluding all events that occurred during the first year of follow-up, with the objective of assessing whether pre-existing disease near the date of enrolment could bias the results. For all-cause mortality, all analyses were also performed stratified by sex and age (>70/≤70 years).
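The delta C comparison described in this section can be illustrated with a plain implementation of Harrell's C for right-censored data. This is a minimal sketch, not the computation performed by the R packages the authors used: function names are hypothetical, pairs with tied event times are ignored, and no confidence interval is computed.

```python
def harrell_c(times, events, risk_scores):
    """Share of comparable pairs in which the earlier event has the higher risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[j] > times[i]:  # subject j outlived subject i
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def delta_c(times, events, base_risk, extended_risk):
    """C of the model including the frailty score minus C of the base model."""
    return (harrell_c(times, events, extended_risk)
            - harrell_c(times, events, base_risk))
```

Here `base_risk` and `extended_risk` stand for the linear predictors of a base model and of the same model with one frailty score added.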
This study is reported as per the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines (S1 Text). Table 2 shows the baseline characteristics of the participants included in the analysis.

Results
Of the 9,432 participants in wave 2 of ELSA, 5,294 (44.9% men) fulfilled the inclusion criteria. Mean age was 71.2 (SD: 8.0) years. The prevalence of CVD and cancer at baseline was 13.7% and 9.3%, respectively. Data from 4,554 participants free of CVD and 4,792 participants free of cancer at baseline were analysed in the respective incidence analyses.
For the majority of cases, the proportional hazards assumption was not met. Therefore, all figures and tables show hazard ratios (HRs) at the median follow-up time (3.5 years for mortality and 2.5 years for CVD and cancer events).
Table 3 shows all-cause mortality HRs for frailty scores calculated at the median follow-up time (3.5 years) and analysed as continuous variables at different levels of adjustment. The strength of the association between frailty scores and mortality ranged from an HR of 2.4 (95% CI: 1.7-3.3) to 26.2 (95% CI: 15.4-44.5) for those with the highest possible frailty state (rescaled to 1) compared with the lowest possible frailty state (rescaled to 0), with adjustment for sex. Adjustments in model 2 slightly attenuated associations for all scores, while retaining statistical significance in all cases. HRs for model 2 ranged from 2.3 (95% CI: 1.6-3.1) to 20.2 (95% CI: 11.8-34.5). Adjustments in model 3 attenuated associations for all scores, retaining statistical significance in 27 out of 35 cases. HRs for model 3 ranged from 0.9 (95% CI: 0.5-1.7) to 8.4 (95% CI: 4.9-14.4).
Table 3 illustrates the same analysis using categorical variables (frailty status). In sex-adjusted models, HRs ranged from 1.2 (95% CI: 0.9-1.7) to 3.4 (95% CI: 1.4-8.0), with 30 out of 37 cases showing a statistically significant association. Adjustments in model 2 attenuated associations, while retaining statistical significance in 28 out of 37 cases. HRs for model 2 ranged from 1.2 (95% CI: 1.0-1.4) to 3.0 (95% CI: 1.5-6.2). Adjustments in model 3 attenuated associations for all scores, retaining statistical significance in 10 out of 37 cases.
[Table footnotes, partially recovered: Self-reported frequency of at least once a week of mild/moderate/vigorous activity. 4 Diabetes defined as self-reported, or fasting glucose ≥7.0 mmol/L, or glycated haemoglobin ≥6.5%. 5 Hypertension defined as systolic ≥140 or diastolic blood pressure ≥90 mm Hg or taking antihypertensive medication. 6 Dyslipidemia defined as total cholesterol >6.2 mmol or taking medication. 7 CVD defined as self-reported myocardial infarction, stroke, or congestive heart disease. 8 Haemoglobin lower than 13 g/dL in men and 12 g/dL in women. 9 Depression defined with ≥4 out of the 8-item version of the Center for Epidemiological Studies-Depression Scale. 10 Sum of memory and executive indices; values range from 0 (worst) to 50 (best).]
The strongest and most stable associations with CVD events after adjustment were seen for scores from the "accumulation of deficits approach" group. Fig 2B and S4 Table show the analysis performed for incident CVD based on the categorical frailty definitions. Only 6 out of 37 HRs were statistically significant, and HRs ranged from 0.6 (95% CI: 0.4-1.0) to 2.7 (95% CI: 1.2-6.3) in sex-adjusted models. The effect of adjustment was a slight attenuation of the associations. S5 and S6 Tables show HRs for cardiovascular events assessed in yearly intervals with continuous and categorical analysis, respectively.
For incident cancer, analyses based on continuous scores (Fig 3A) yielded HRs ranging between 0.7 (95% CI: 0.4-1.2) and 2.4 (95% CI: 1.0-5.7), while most associations (31 out of 35) did not reach statistical significance in sex-adjusted models. Further adjustment (models 2 and 3) attenuated associations for all scores, and none retained statistical significance. Fig 3B and S7 Table show the results based on categorical frailty classifications, for which most associations did not reach statistical significance; also, with further adjustment (models 2 and 3), no score retained statistical significance. S8 and S9 Tables show HRs for cancer events assessed in yearly intervals, with continuous and categorical analysis, respectively.
Table 4 shows the discriminative ability of frailty scores for all-cause mortality using Harrell's C statistic. The improvement in prediction for each frailty score analysed as a continuous variable on top of a basic model consisting of age and sex ranged from 0.6% (95% CI: 0.2-0.9) to 3.1% (95% CI: 2.3-3.9) and was statistically significant for all scores. With model 2, improvement was significant in all cases and ranged from 0.4% (95% CI: 0.1-0.7) to 2.5%.
[Caption fragment, partially recovered: cardiovascular, cancer, anemia, COPD, arthritis, neuropsychiatric problems, depression, cognition, and self-rated health and quality of life. HRs were at 3.5 years (median follow-up for mortality).]
Analyses adding frailty categories to the age and sex basic model gave improvements ranging from 0.1% (95% CI: 0.0-0.2) to 2.1% (95% CI: 1.5-2.6), with all scores showing statistically significant improvement. In most cases, when the predictive value of the different scores was assessed over and above basic model 2, the improvement was attenuated; in most cases, it was still statistically significant.

Evaluation of discriminative ability
The C statistic of the basic model for CVD events based only on age and sex was 70.1 (95% CI: 65.7-74.4). None of the continuous scores added predictive performance to this model at a statistically significant level. In analyses of frailty categories, only the G-8 Geriatric Screening Tool (G8) score added statistically significant predictive value (delta C: 1.6 [95% CI: 0.4-2.8]) (S10 Table).
For cancer events, the C statistic of all three basic models was below 60, and all deltas were nonsignificant both in continuous and categorical analyses (S11 Table).

Sensitivity analysis
In sensitivity analyses excluding all events occurring the first year, we observed very similar results compared to those obtained with the total sample, although the strength of the associations was slightly diminished (S12 Table).
In sex-stratified analyses for all-cause mortality, men had slightly higher HRs than women. The strongest associations in both sexes were obtained with the "multidimensional approach" (S13 and S14 Tables).
In age-stratified analyses (>70/≤70 years), HRs for all-cause mortality were much higher in younger participants. However, the pattern of results was similar, with scores from the "multidimensional approach" showing the strongest associations with all-cause mortality in both age strata (S15 and S16 Tables).

Discussion
Our direct comparison of the association between 35 published frailty scores and three major health outcomes in later life demonstrates that there is great variability in the strength of the prospective association with CVD, cancer, and total mortality. Moreover, the strength of the association also differed between each of the three outcomes. While most scores added predictive ability to both simple and more complex underlying models for total mortality, this was not the case for CVD or cancer. Our finding of large heterogeneity in the magnitude of the association between different frailty scores and all-cause mortality may be due to the number and selection of variables that make up each score, along with the weight attached to each component variable in the score calculation. This is expected because these scores measure different dimensions of health, are underpinned by significantly different conceptualizations of frailty, and have different objectives of application. Therefore, the choice of a frailty score should also take into account these other aspects such as the target population (patients or general population) and the final objective of frailty assessment (clinical evaluation, research, or public health recommendations).
Interestingly, we observed that for many frailty scores, the proportional hazards assumption was not met and the association was significantly nonuniform during follow-up time. In most of these cases, HRs for all-cause mortality were lowest directly after baseline and increased subsequently, but in some cases (40-item Frailty Index [FI40]), the opposite pattern was seen, with HRs that decreased over time. While the former set may capture information regarding underlying determinants of longer-term poor health and thus be more interesting in prognostic settings, the latter set can be hypothesized to collect information about existing health problems.
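The rising and falling patterns described above can be illustrated with a minimal sketch. It assumes that the time-covariate interaction is linear in follow-up time, so that HR(t) = exp(beta + gamma × t); the source does not state the functional form, and the function name and coefficient values are invented for illustration.

```python
import math


def hazard_ratio_at(beta, gamma, t):
    """Hazard ratio at follow-up time t for a score with a linear time interaction."""
    return math.exp(beta + gamma * t)


# Hypothetical coefficients: a positive interaction gives an HR that rises
# after baseline, a negative interaction gives an HR that falls.
rising = [hazard_ratio_at(0.5, 0.1, t) for t in (0.0, 3.5, 7.0)]
falling = [hazard_ratio_at(0.5, -0.1, t) for t in (0.0, 3.5, 7.0)]
```

Reporting a single HR at the median follow-up time, as done in the tables, corresponds to evaluating this function at t = 3.5 years for mortality.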
To avoid overadjustment, the most adjusted models were fitted excluding variables that were underlying variables of frailty scores. We specifically chose these models to investigate whether the score retained an association over and above a comprehensive set of clinical indicators. Our observation of heterogeneity, not only in the strength of associations but also in the degree of attenuation upon the same sets of adjustments, confirms our earlier observation that different frailty scores cannot be assumed to be interchangeable.
Our finding of a difference between analyses based on continuous scores and categorical classifications of frailty and pre-frailty indicates that the analysis with cutoffs may lead to a loss of information. This observation reflects the well-known loss of information caused by categorization of continuous variables, which assumes that the risk level is uniformly low for all below the given threshold and high for all above the threshold. Although the wish to provide users with a score with clear categories is understandable from a clinical point of view, it should be considered with caution due to the disadvantages. We have previously shown that many individuals are categorised differently by different scores [8]. Moreover, cutoff levels derived from one population may not be applicable in another.
[Figure caption, partially recovered: Models were fitted using age as timescale, with time 0 = age at entry of study and time 1 = age at event or censoring date. Model 1 in blue: adjusted by sex. Model 2 in red: Model 1 + smoking status and alcohol consumption. Model 3 in green: Model 2 + physical activity, BMI, diabetes, hypertension, cancer, anemia, COPD, arthritis, neuropsychiatric problems, depression, cognition, and self-rated health and quality of life. HRs were at 2.5 years (median follow-up for CVD events). BDE, Beaver Dam Eye Study Index; BFI, Brief Frailty Index; BMI, body mass index; CGA, Comprehensive Geriatric Assessment; CGAST, Comprehensive Geriatric Assessment Screening Tests; COPD, chronic obstructive pulmonary disease; CSBA, Conselice Study of Brain Aging Score; CVD, cardiovascular disease; EFIP, Evaluative Frailty Index for Physical Activity; EFS, Edmonton Frail Scale; FI40, 40-item Frailty Index.]
A recent meta-analysis of 24 prospective studies, including 25 different scores, assessed the performance of frailty scores in mortality prediction and found a pooled relative risk (RR) of 1.83 (95% CI: 1.68–1.98) for all-cause mortality based on binary/categorical frailty classifications in elderly populations (≥65 years) [7]. This result is similar to our results in the older subgroup and in our analyses based on categorical classifications. The authors found high heterogeneity for both OR (I² = 95%, p < 0.001) and HR/RR (I² = 98%, p < 0.001) estimates. They attribute this to the different populations, follow-up periods, and concepts of frailty included in the meta-analysis. Our results are likely to be less heterogeneous because they come from an analysis of a single data set.
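For reference, the heterogeneity index quoted above is conventionally defined from Cochran's Q statistic. The notation below (k studies, study estimates with inverse-variance weights) is introduced here for illustration and is not taken from the cited meta-analysis.

```latex
Q = \sum_{i=1}^{k} w_i \left(\hat{\theta}_i - \hat{\theta}\right)^2,
\qquad
\hat{\theta} = \frac{\sum_{i=1}^{k} w_i \hat{\theta}_i}{\sum_{i=1}^{k} w_i},
\qquad
I^2 = \max\!\left(0,\ \frac{Q - (k - 1)}{Q}\right) \times 100\%
```

An I² of 95% therefore means that about 95% of the observed variability between study estimates is attributed to between-study differences rather than to sampling error.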
We also found an association between different frailty scores and incident CVD. This was not directly expected, as frailty scores were not designed for the prediction of CVD events. Our finding may be explained by the fact that some component variables of the frailty scores are themselves CVD events. Also, some variables are CVD symptoms and risk factors that could capture pre-existing presentations of CVD. Another explanation is that physicians may be less likely to treat CVD risk factors aggressively in frail patients. In addition, frailty and CVD may share etiological pathways such as chronic low-grade inflammation [59].
There are few prospective studies of the association between frailty scores and incident CVD. Our results expand upon the evidence summarised in a review by Chen [60], which showed a significant cross-sectional association between a binary frailty classification and prevalent CVD in several previous studies [12,26,61]. White et al. reported a statistically significant association (HR: 1.8 [95% CI: 1.4–2.3]) during 30 months of follow-up in a study analysing the Phenotype of Frailty (PHF) score only [62]. Finally, Afilalo et al. demonstrated that adding frailty and disability improves the discrimination of prediction models for mortality in cardiovascular patients [63].
Frailty scores were not associated with incident cancer. As with CVD, frailty scores were not designed for the prediction of cancer. A further possible explanation is that the development of cancer is too slow or too heterogeneous a process to be captured by frailty scores.
We found that almost all frailty scores improved the predictive ability of a simple age- and sex-adjusted base model for all-cause mortality. The scores that showed statistically significant added predictive value over and above the most complete base model collect information about weight loss and assess physical functioning, both important prognostic determinants, and they are based on relatively few variables, which makes them easily applicable in clinical settings. However, the magnitude of the added predictive value was modest (up to 3%) and might not be clinically relevant. This could be explained in part by the fact that the basic model (age and sex) already had good predictive ability.

[Figure caption fragment: ... cardiovascular, anaemia, COPD, arthritis, neuropsychiatric problems, depression, cognition, and self-rated health and quality of life. HRs were at 2.5 years (median follow-up for cancer events).]

Our results showed that frailty scores add predictive ability to chronological age and sex only when the outcome is mortality, and not for the prediction of incident CVD or cancer events. Ensrud et al. compared the mortality predictive ability of 2 scores, the Study of Osteoporotic Fractures (SOF) score and the PHF score, and did not find important differences in the values of the area under the curve (AUC), which were similar to those obtained in this study [64]. Also, Sourial et al. observed a modest improvement in the mortality predictive ability of age-sex models when several combinations of frailty scores were added to the models [65].
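As an illustration of how added predictive value can be quantified, the following sketch computes Harrell's C statistic for a base risk score and for the same score plus a frailty term. It is a minimal example on simulated data, not the study's analysis: the sample size, effect sizes, baseline hazard, the age-only base model, and the use of the true linear predictors (rather than fitted Cox models with age as the timescale) are all assumptions made for illustration.

```python
import math
import random

def harrell_c(times, events, risk):
    """Harrell's C: among usable pairs (the shorter time ended in an event),
    the fraction in which the subject who failed first had the higher risk.
    Pairs with tied risk count as half; pairs with tied times are skipped."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier failure
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable

rng = random.Random(7)      # fixed seed so the run is reproducible
N = 1500                    # hypothetical sample size
FOLLOW_UP = 7.0             # years until administrative censoring (assumed)
BASE_HAZARD = 0.03          # events per person-year at the reference profile (assumed)

age = [rng.uniform(60, 90) for _ in range(N)]
frailty = [rng.gauss(0, 1) for _ in range(N)]          # hypothetical standardised score
lp_base = [0.08 * (a - 70) for a in age]               # age-only linear predictor
lp_full = [b + 0.9 * f for b, f in zip(lp_base, frailty)]

times, events = [], []
for lp in lp_full:
    t = rng.expovariate(BASE_HAZARD * math.exp(lp))    # exponential survival time
    events.append(1 if t < FOLLOW_UP else 0)
    times.append(min(t, FOLLOW_UP))

c_base = harrell_c(times, events, lp_base)
c_full = harrell_c(times, events, lp_full)
print(f"C base: {c_base:.3f}  C with frailty: {c_full:.3f}  gain: {c_full - c_base:.3f}")
```

The simulated frailty effect is deliberately strong, so the gain in C printed here will be larger than the gains of up to 3% observed in this study.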
Our results also show that frailty scores from the accumulation of deficit and multidimensional families have stronger associations with mortality compared with the phenotype of frailty and disability families. In their meta-analysis, Vermeiren et al. did not report differences in the magnitude of the associations using different frailty approaches [7]. Our study has the clear advantage of making a direct comparison of the predictive performance of the different scores in the same population.

Strengths and limitations
Our study has several strengths. The large set of scores included allows for the comparison between families of scores as well as between individual scores.
We performed state-of-the-art multiple imputation to deal with missing data, thereby making optimal use of the available events and follow-up time. We decided to impute underlying variables in their most basic form, which means that we imputed binary, categorical, and continuous variables with different models. Continuous variables were not categorised. The goal was to obtain the most plausible values of frailty scores without losing information. We are convinced that frailty scores with underlying imputed variables give less biased results and increase statistical power and accuracy. When frailty scores cannot be calculated because some underlying variables are missing, a lot of information is likely to be lost. In addition, when some variables have missing data, we cannot rule out a missing at random mechanism. For example, a physical examination may be missing more frequently in frail participants, because they might decline the test for fear of falling. There is strong evidence of the need to impute missing data, especially when the missingness mechanism is not completely at random [66]. In addition, our results fill a gap, especially concerning the scarce information about the relationship between frailty scores and incident CVD and cancer. The results of this study are directly applicable to the general elderly English population and are probably also generalizable to similar populations in other European countries.
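Estimates from multiply imputed data sets are commonly combined with Rubin's rules. The text does not state the pooling method, so the sketch below reflects an assumption about standard practice rather than a description of this study's procedure. The function name `pool_rubin` and the example log hazard ratios are hypothetical.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool per-imputation estimates (e.g. log hazard ratios) with Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical log hazard ratios and their variances from m = 3 imputed data sets
log_hr = [0.5, 0.6, 0.7]
var_log_hr = [0.04, 0.04, 0.04]

pooled_log_hr, total_var = pool_rubin(log_hr, var_log_hr)
se = math.sqrt(total_var)
hr = math.exp(pooled_log_hr)
# Simplification: a normal quantile (1.96) is used here instead of the
# t distribution with Rubin's degrees of freedom.
ci = (math.exp(pooled_log_hr - 1.96 * se), math.exp(pooled_log_hr + 1.96 * se))
print(f"Pooled HR {hr:.2f} (95% CI: {ci[0]:.2f}-{ci[1]:.2f})")
```

The total variance exceeds the average within-imputation variance whenever the estimates differ across imputations, which is how the uncertainty due to the missing data is carried into the confidence interval.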
A limitation of our analysis was that we had to tailor some variables to calculate certain frailty scores. We based this adaptation on published studies when possible. Another important limitation was the different follow-up duration for total mortality compared to CVD and cancer. Almost 100% of ELSA participants were followed for all-cause mortality based on reliable and objective mortality registries. In contrast, more participants were lost to follow-up with regard to CVD and cancer end points. This could be a source of bias if loss to follow-up was associated both with frailty and with the two outcomes, because participants who were lost to follow-up could be precisely those who experienced a cardiovascular or cancer event. Also, the ascertainment of CVD and cancer was based on self-reports, possibly leading to misclassification due to differential recall. However, in both cases, the most likely impact of these sources of bias would be an underestimation of a true effect rather than identification of a spurious association. Finally, while the ELSA study is a rich source of data and well suited to the study of frailty, we performed a secondary data analysis, which meant that we had to adapt our data analysis to the existing data.
The best performing scores for all-cause mortality in the continuous analysis were those based on the multidimensional and accumulation of deficit approaches. The multidimensional scores can have few variables and, in consequence, are easy to apply in a clinical setting. These scores are tailored to capture features related to ill-health in later life over and above the obvious information we can obtain from a simple clinical history, such as polypharmacy, weight loss, depression symptoms, cognition, and self-reported health. Based on our data, we think that the isolated presence of comorbidity and/or polypharmacy is not enough to evaluate the presence of frailty, which means it is also necessary to measure physical and/or cognitive function.

Conclusions
It seems that while some scores can be regarded as a simple summary indicator for known risk factors, other scores capture other important information, such as self-reported health, medications, cognition, and disability. In our analysis of frailty categories, the best performing scores included physical functioning assessment. Overall, we found that multidimensional frailty scores have the strongest association and largest additional predictive performance for mortality outcomes.
Frailty scores could be considered clinically useful tools for identifying patients at higher risk of imminent death. However, the observed additional predictive ability for all-cause mortality is low, which reduces their clinical value for distinguishing individuals who will experience the outcome from those who will not.
There are marked differences between scores with regard to their complexity as well as the strength and stability of their association with all-cause mortality, probably due to great heterogeneity in the conception of the different scores. This means that users of frailty scores should carefully balance the feasibility of measurement with a score's performance. Our results provide evidence to guide clinicians, researchers, and public health practitioners in striking this balance.
We think that future research should focus on the study of the trajectories of frailty scores. Frailty should be assessed with the instrument best suited to this purpose. This approach could help identify individuals or characteristics of frailty early, in time to establish useful interventions in patients and/or the general population.

Table. Cancer hazard ratios of frailty scores assessed in intervals from 1 to 7 years: Age-adjusted model and categorical analysis. (DOCX)
S10 Table. Discriminative assessment of cardiovascular models using Harrell's C statistic (n = 4,554). (DOCX)
S11 Table. Discriminative assessment of cancer models using Harrell's C statistic (n = 4,792). (DOCX)
S12 Table. Sensitivity analysis: Mortality hazard ratios of frailty scores (n = 5,253). (DOCX)
S13 Table. Mortality hazard ratios of frailty scores in men (n = 2,377) calculated at median time follow-up (3.5 years). (DOCX)
S14 Table. Mortality hazard ratios of frailty scores in women (n = 2,917) calculated at median time follow-up (3.5 years). (DOCX)
S15 Table. Mortality hazard ratios of frailty scores in participants older than 70 years (n = 2,536) calculated at median time follow-up (3.5 years). (DOCX)
S16 Table. Mortality hazard ratios of frailty scores in participants of 70 years and younger (n = 2,758) calculated at median time follow-up (3.5 years).