Current Developments in Dementia Risk Prediction Modelling: An Updated Systematic Review

Background: Accurate identification of individuals at high risk of dementia influences clinical care, inclusion criteria for clinical trials and development of preventative strategies. Numerous models have been developed for predicting dementia. To evaluate these models we undertook a systematic review in 2010 and updated this in 2014 due to the increase in research published in this area. Here we include a critique of the variables selected for inclusion and an assessment of model prognostic performance.

Methods: Our previous systematic review was updated with a search from January 2009 to March 2014 in electronic databases (MEDLINE, Embase, Scopus, Web of Science). Articles examining risk of dementia in non-demented individuals and including measures of sensitivity, specificity or the area under the curve (AUC) or c-statistic were included.

Findings: In total, 1,234 articles were identified from the search; 21 articles met inclusion criteria. New developments in dementia risk prediction include the testing of non-APOE genes, use of non-traditional dementia risk factors, incorporation of diet, physical function and ethnicity, and model development in specific subgroups of the population including individuals with diabetes and those with different educational levels. Four models have been externally validated. Three studies considered time or cost implications of computing the model.

Interpretation: There is no one model that is recommended for dementia risk prediction in population-based settings. Further, it is unlikely that one model will fit all. Consideration of the optimal features of new models should focus on methodology (setting/sample, model development and testing in a replication cohort) and the acceptability and cost of attaining the risk variables included in the prediction score. Further work is required to validate existing models or develop new ones in different populations as well as determine the ethical implications of dementia risk prediction, before applying the particular models in population or clinical settings.


Introduction
Dementia is a complex disease often caused by a combination of genetic and environmental risk factors. Although many risk factors for the occurrence and progression of dementia have been identified, their utility for determining individual risk through dementia prediction models remains unclear.
Numerous models for predicting dementia, and more specifically Alzheimer's Disease (AD), have been developed [1]. Such models could be used to refine inclusion criteria for clinical trials, focus treatment and intervention more effectively and help with health surveillance. A systematic review published in 2010 identified over 50 different dementia risk prediction models [1]. The models differed in the number and type of variables used for risk score calculation, follow-up time, disease outcome and model predictive accuracy. The review concluded that no model could be recommended for dementia risk prediction largely due to methodological weaknesses of the published studies. Model development had generally been based on small cohorts, restricted to Caucasians, and at the time of the review there had been a lack of objective and unbiased model evaluation, such as external validation.
Over the last five years, research into dementia risk prediction has greatly expanded and dementia prevention is a high policy priority in many countries. For clinicians, researchers and policy makers to keep up to date with relevant findings and decide which model to apply to identify those at high risk of future dementia, they need accurate knowledge of model development (including component variables and validation work), discriminative accuracy, and the sensitivity and specificity of cut-off scores. In this review, based on the results of an updated literature search, we aim to evaluate the latest developments in dementia risk prediction modelling, including a critique of the variables selected for model inclusion and an assessment of model prognostic performance.

Methods
Search Strategy
[...] discriminative accuracy (AUC or c-statistic) and, where available, sensitivity and specificity estimates of cut-off scores and positive/negative likelihood ratios (LR+ or LR-, respectively). Two authors (ET, SH) independently assessed the quality of the included studies using an adapted version of the Newcastle-Ottawa Scale (NOS) for non-randomized studies, specifically cohort studies [5], as endorsed by the Cochrane Collaboration [6]. The NOS uses a star rating system to assess selection, comparability and outcome criteria. Items describing a non-intervention cohort were excluded and therefore the total rating was out of 6 (rather than 9).

Model Development
Most models have been derived using logistic regression [7,9,10,21,24-26] or Cox proportional hazards regression [8,11,13,15-20,22,23], usually with stepwise selection of candidate predictors (forwards or backwards, e.g., based on a p-value); one model used the Bayesian Information Criterion [19]. One model, the Australian National University Alzheimer's Disease Risk Index (ANU-ADRI), was developed using an evidence-based medicine approach rather than a data-analytical approach [4], and another, the Brief Dementia Screening Indicator (BDSI), was developed using data synthesis based on the best dementia predictors identified in four different cohort studies [20]. Where simple risk scores were computed, they were derived from the model's beta coefficients [4,13,16,19,20] or logit coefficients [21]. This is similar to the methodology employed for risk model development in other fields of medicine such as cardiovascular disease [27,28]. No models have yet been developed using systems biology or neural network approaches. Neural networks are loosely modelled on the way interconnected neurons in the human brain process data and learn from experience [29]. Given the potential complexity of the risk factor variables used in dementia risk prediction, neural networks may hold promise, although this has yet to be evaluated.
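As a minimal sketch of how a points-based score can be derived from regression coefficients, the Python example below rescales each beta coefficient by the smallest coefficient and rounds to whole points. The coefficient values, the intercept, the variable names and the rounding rule are all illustrative assumptions; they are not taken from any of the reviewed models, which may have used different scaling conventions.

```python
from math import exp

# Hypothetical log-odds (beta) coefficients for binary risk factors.
# Illustrative values only; not taken from any reviewed model.
BETAS = {
    "age_75_plus": 1.20,
    "low_education": 0.45,
    "diabetes": 0.60,
    "history_of_stroke": 0.85,
}
INTERCEPT = -3.0  # hypothetical baseline log-odds


def points_from_betas(betas):
    """Divide each beta by the smallest absolute beta and round to whole points."""
    base = min(abs(b) for b in betas.values())
    return {name: round(b / base) for name, b in betas.items()}


def risk_score(person, points):
    """Sum the points of the risk factors that are present for a person."""
    return sum(pts for name, pts in points.items() if person.get(name))


def predicted_probability(person, betas, intercept):
    """Probability from the underlying logistic model, for comparison."""
    logit = intercept + sum(b for name, b in betas.items() if person.get(name))
    return 1.0 / (1.0 + exp(-logit))


points = points_from_betas(BETAS)
person = {"age_75_plus": True, "diabetes": True}  # hypothetical individual
score = risk_score(person, points)
probability = predicted_probability(person, BETAS, INTERCEPT)
print(points, score, round(probability, 3))
```

Rounding to whole points makes the score easy to compute by hand but discards some information, which is one reason a simplified score can discriminate slightly less well than the full regression model.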

Risk Models
Models can be broadly divided into the following categories: (1) demographic only models; (2) cognitive based models (incorporating cognitive test scores, with or without subjective memory/cognitive complaint indicators or demographic data); (3) health variables and health risk indices (incorporating self-reported or objectively measured health status); (4) genetic risk scores including APOE, PICALM (Phosphatidylinositol binding clathrin assembly protein), CLU (Clusterin) and other genes associated with AD (i.e., BIN1, CR1, ABCA7, MS4A6A, MS4A4E, CD2AP, EPHA1 and CD33), either alone or in combination with non-genetic variables; and (5) multi-variable models typically incorporating demographic, health and lifestyle measures. Table 3 compares the model components between this and the 2010 review and illustrates that there is large variability in model components and differences across the two reviews. Differences are mainly in the addition of novel (non-traditional) dementia risk variables [25], information on diet [4], depressive symptomatology [4,13,24], ethnicity [14,20,23], and extension of genetic analysis to include non-APOE genes [17]. In addition, fewer cognitive tests are used and there is a smaller pool of candidate risk factors, likely due to more evidence being available.

Model Diagnostics
Performance of models has been assessed using measures of discriminative accuracy (e.g., AUC/c-statistic), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), internal calibration, LR+/LR-, the net reclassification index (NRI) and the integrated discrimination improvement (IDI) statistic. Discriminative accuracy was measured in all studies and ranged from low (0.49) [4] to moderate (0.89) [9]. Cut-off points with sensitivity and specificity estimates were reported in only five studies [7,19,22,23,26]. Where available, cut-off points were determined by: (1) maximisation of Youden's index (J = sensitivity + specificity - 1) [22,23,26]; (2) cross-validation, to correct for the optimism that results from validating on the learning data, with the cut-off chosen to correspond to a given sensitivity value [7]; or (3) defining the cut-off scores as those values with high specificity and increased PPV [19]. No model reported a cut-off score with both sensitivity and specificity over 80%. Three studies reported PPVs: (1) 9 to 41% (range across different lengths of follow-up interval and educational levels) [7]; (2) 6.6 to 49.9% (range across different cut-off scores) [23]; and (3) 14.7% with a cut-off that provided sensitivity of at least 80% [19]. Two studies reported NPVs: 86.0 to 99.0% (range across different cut-off scores) [7] and 97.7% [19]. If the prediction is better than chance, the NPV should be higher than the proportion of subjects who did not have the outcome of dementia (i.e., the "stable" subjects).

Table 4. Optimum features of study design and variables selected for dementia risk prediction models.

Data & Data Analysis
Minimal attrition or use of methodology that accounts for attrition (e.g., loss of follow-up and death)
Considerations of the description of population, diagnostic method and follow-up time
Examine internal validity (i.e., equivalent performance in different subgroups)

External validation
High AUC/c-statistic (closer to 1 the better)
Consideration of whether to prioritise sensitivity or specificity and the implications of doing so

Data Presentation
Sensitivity and specificity given at multiple cut-off points
Confidence Intervals of each statistic and if comparisons are made, use of a formal method of statistical inference

Risk Factor Timing
Special attention to mid-life risk factor ascertainment in older subjects (e.g., risk factors for dementia in midlife may not be associated with dementia in older subjects)

Patient Acceptability
Risk variable attainment (e.g., cost and ease of acquisition of the data from a patient as well as health care provider point of view)
Risk score calculation (acceptability to the patient/health care provider)

Since the proportion of subjects not becoming demented was generally >85% (and always >70%), as inferred from Table 2, a high NPV is expected, because NPV depends on the prevalence of the disease in the sample. The same logic can be applied to PPV. It should be noted that because the proportion of subjects becoming demented (or not) varies among the populations, variations in PPV and NPV do not necessarily reflect differences in the performance of the models. Only one study [26] reported LR+/LR- and found that a neuropsychological prediction model provided a clinically important change in pre- to post-test probability of converting to dementia (all cause) over 5 and 10 years of follow-up.
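The diagnostic measures discussed above can be illustrated with a short Python sketch. It selects a cut-off by maximising Youden's index on a small made-up sample, then shows how PPV and NPV shift with outcome prevalence when sensitivity and specificity are held fixed at 80%. All scores, outcomes and prevalence values are hypothetical, and the function names are chosen for this example only.

```python
def sens_spec(scores, outcomes, cutoff):
    """Sensitivity and specificity when score >= cutoff is classed as high risk."""
    pairs = list(zip(scores, outcomes))
    tp = sum(1 for s, y in pairs if y and s >= cutoff)
    fn = sum(1 for s, y in pairs if y and s < cutoff)
    tn = sum(1 for s, y in pairs if not y and s < cutoff)
    fp = sum(1 for s, y in pairs if not y and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutoff(scores, outcomes):
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1."""
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        sens, spec = sens_spec(scores, outcomes, cutoff)
        j = sens + spec - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


def predictive_values(sens, spec, prevalence):
    """PPV and NPV from sensitivity, specificity and prevalence (Bayes' rule)."""
    ppv = sens * prevalence / (sens * prevalence + (1 - spec) * (1 - prevalence))
    npv = spec * (1 - prevalence) / (spec * (1 - prevalence) + (1 - sens) * prevalence)
    return ppv, npv


def likelihood_ratios(sens, spec):
    """Positive and negative likelihood ratios (requires 0 < spec < 1)."""
    return sens / (1 - spec), (1 - sens) / spec


# Hypothetical risk scores and outcomes (1 = developed dementia).
scores = [1, 2, 2, 3, 3, 4, 5, 5, 6, 7]
outcomes = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]
cutoff, j = youden_cutoff(scores, outcomes)

# Same test characteristics, two hypothetical prevalences.
ppv_10, npv_10 = predictive_values(0.80, 0.80, 0.10)
ppv_30, npv_30 = predictive_values(0.80, 0.80, 0.30)
lr_pos, lr_neg = likelihood_ratios(0.80, 0.80)
print(cutoff, round(j, 3))
print(round(ppv_10, 3), round(npv_10, 3), round(ppv_30, 3), round(npv_30, 3))
```

In this sketch the NPV is high at both prevalences while the PPV roughly doubles as prevalence rises from 10% to 30%, which is why predictive values from cohorts with different dementia incidence are not directly comparable.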
Model calibration was rarely reported and, where reported, indicated good fit [13,14,20]. Reclassification indices, including the NRI and IDI statistics, which test the effect of adding variables to risk models, were used in two studies [14,15]. The first study evaluated the influence of the APOE genotype on the accuracy of AD risk assessment using NRI and found a significant improvement (NRI = 0.18, Z = 2.47, P = 0.01) compared to a non-APOE model; the IDI was estimated as 6.25 (Z = 3.75, P < 0.001) [15]. The second used NRI and IDI to assess the improvements in performance of the Cardiovascular Risk Factors, Aging and Dementia (CAIDE) risk score by adding new risk factors [14]. Here, both the NRI and IDI showed no model improvements with the additional variables.
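To make the reclassification statistics concrete, the following Python sketch computes a category-free (continuous) NRI and the IDI for a baseline model and an extended model. This is an assumption-laden illustration: the predicted probabilities are invented, and the reviewed studies may have used category-based variants of the NRI rather than the category-free form shown here.

```python
def continuous_nri(p_old, p_new, outcomes):
    """Category-free NRI: net upward movement in events plus net downward
    movement in non-events, each as a proportion of its own group."""
    up_e = down_e = n_e = 0
    up_n = down_n = n_n = 0
    for old, new, y in zip(p_old, p_new, outcomes):
        if y:
            n_e += 1
            up_e += int(new > old)
            down_e += int(new < old)
        else:
            n_n += 1
            up_n += int(new > old)
            down_n += int(new < old)
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n


def idi(p_old, p_new, outcomes):
    """IDI: change in mean predicted risk among events minus the change
    in mean predicted risk among non-events."""
    def mean(values):
        return sum(values) / len(values)

    ev_old = [o for o, y in zip(p_old, outcomes) if y]
    ev_new = [n for n, y in zip(p_new, outcomes) if y]
    ne_old = [o for o, y in zip(p_old, outcomes) if not y]
    ne_new = [n for n, y in zip(p_new, outcomes) if not y]
    return (mean(ev_new) - mean(ev_old)) - (mean(ne_new) - mean(ne_old))


# Hypothetical predicted probabilities from a baseline and an extended model.
outcomes = [1, 1, 1, 0, 0, 0, 0, 0]
p_old = [0.30, 0.20, 0.40, 0.10, 0.20, 0.30, 0.10, 0.20]
p_new = [0.40, 0.30, 0.35, 0.05, 0.20, 0.25, 0.15, 0.10]

nri_value = continuous_nri(p_old, p_new, outcomes)
idi_value = idi(p_old, p_new, outcomes)
print(round(nri_value, 3), round(idi_value, 3))
```

Positive values indicate that the extended model tends to move predicted risk in the right direction; a formal comparison would also need a standard error or confidence interval, which this sketch does not compute.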

Models for Specific Subgroups of Individuals
One study developed a risk score, the Type-2 Diabetes Specific Dementia Risk Score (DSDRS), for predicting 10-year incident dementia in a primary care setting, using a large cohort of individuals (N = 29,961) with type 2 diabetes [13]. The model incorporated age, education, microvascular disease, diabetic foot, cerebrovascular disease, cardiovascular disease, acute metabolic event and depression, and was well calibrated and externally validated with moderate levels of predictive accuracy (AUC = 0.74 development cohort vs. AUC = 0.75 validation cohort). In another study, model development was undertaken in a sample stratified by education level (low: no elementary school diploma vs. high: secondary school or university) and follow-up time (3 vs. 10 years) [7]. This resulted in four different models that varied by education and length of follow-up, as shown in Table 2.

Model Validation
Only four studies have undertaken external validation [4,13,14,20]. Differences between the AUCs in the development and validation cohorts for the different models tested are shown in Fig 2.

CAIDE.
The CAIDE model was developed in a sample of participants from Finland (N = 1,409; age range: 39 to 64 years) and uses risk factors in midlife to estimate an individual's risk of later-life dementia (mean follow-up time = 20 years). The model incorporates age, education, sex, cholesterol level, BMI and systolic blood pressure (with and without APOE e4 status) [30]. Using data from Kaiser Permanente (N = 9,480; age range: 40 to 55 years; mean follow-up time = 36.1 years), a similar AUC to that of the original study was reported (0.75 validation vs. 0.78 original cohort) [14]. Furthermore, when the Kaiser Permanente sample was stratified by ethnicity, the CAIDE model was found to predict dementia well across different ethnicities including Asian (AUC = 0.81), Black (AUC = 0.75) and White (AUC = 0.74). These estimates are comparable to those reported in the original publication (model development dataset AUC = 0.78) [30]. This study also attempted to improve the discriminative accuracy of the CAIDE score by adding new variables such as central obesity, depressed mood, diabetes, head trauma, poor lung function and smoking, but no significant improvement was shown [14].
When the CAIDE risk score was used to predict dementia in three older-aged cohorts (the Rush Memory and Aging Project, the Kungsholmen Project and the Cardiovascular Health Cognition Study), validation was found to be poor (AUC range all-cause dementia: 0.49 to 0.57; AUC range AD: 0.49 to 0.57) [4]. Interestingly, excluding BMI, or BMI and cholesterol level together, modestly increased discriminative accuracy for all-cause dementia (AUC range: 0.55 to 0.60) and AD (AUC range: 0.55 to 0.58) [4]. This result suggests that these variables may not be as important for predicting dementia in later life as in midlife. Indeed, in some studies of older-aged cohorts, higher BMI, cholesterol levels and blood pressure are found to be protective against dementia [31]. Therefore, poor transportability of the CAIDE model to these three cohorts may be due to the fact that the development dataset was a midlife cohort whereas the validation datasets were from older-aged cohorts (mean age at baseline range: 72.3 to 81.5 years). The results could also be due to attrition, as only 3% of participants were lost to follow-up in the original study [30] compared to more than 20% in the three validation cohorts [4].
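External validation of this kind amounts to applying a fixed score to a new cohort and recomputing its discrimination. A minimal Python sketch is given below, assuming the c-statistic for a binary outcome is computed as the proportion of case/non-case pairs in which the case has the higher score (ties counted as one half). Both cohorts are invented for illustration and do not represent any of the studies discussed.

```python
def c_statistic(scores, outcomes):
    """Proportion of (case, non-case) pairs in which the case has the
    higher score; tied scores contribute 0.5."""
    cases = [s for s, y in zip(scores, outcomes) if y]
    non_cases = [s for s, y in zip(scores, outcomes) if not y]
    concordant = 0.0
    for case_score in cases:
        for other_score in non_cases:
            if case_score > other_score:
                concordant += 1.0
            elif case_score == other_score:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


# Hypothetical development cohort: the score separates cases well.
dev_scores = [2, 3, 5, 6, 7, 8, 9, 10]
dev_outcomes = [0, 0, 0, 1, 0, 1, 1, 1]

# Hypothetical validation cohort: the same score separates cases poorly.
val_scores = [4, 5, 6, 6, 7, 8, 8, 9]
val_outcomes = [1, 0, 0, 1, 0, 1, 0, 1]

auc_dev = c_statistic(dev_scores, dev_outcomes)
auc_val = c_statistic(val_scores, val_outcomes)
print(auc_dev, auc_val)
```

A value near 0.5 in the validation cohort means the score ranks cases above non-cases little better than chance, which is the pattern described for the older-aged cohorts above.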

ANU-ADRI.
Using an evidence synthesis approach to model development, the ANU-ADRI model was developed to assess a person's risk for later-life AD (i.e., over 60 years of age) based on exposure to 11 risk and four protective factors: age, education, sex, BMI, diabetes, depression, cholesterol, traumatic brain injury, smoking, alcohol use, physical activity, pesticide exposure, social engagement, cognitive activity and fish intake. Validation of the ANU-ADRI score [4] produced moderate levels of discrimination for dementia when between eight and 10 of the different risk/protective variables were mapped across three studies (AUC range: 0.64 to 0.74). The studies included the Rush Memory and Aging Project (N = 903), the Kungsholmen Project (N = 905) and the Cardiovascular Health Cognition Study (N = 2,496). The variables mapped included demographic (age, gender, education), health (diabetes, traumatic brain injury, depressive symptoms), cognition (cognitive activity) and lifestyle factors (social network and engagement, smoking, alcohol, physical activity). When only the variables common to all cohorts were used (n = 6 variables: age, sex, education, diabetes, smoking, alcohol) the AUCs were 0.69 (95% CI 0.65 to 0.73), 0.68 (0.63 to 0.70) and 0.73 (0.71 to 0.76) in the Rush Memory and Aging Project, the Kungsholmen Project and the Cardiovascular Health Cognition Study, respectively. The authors did not test whether the differences in AUCs across cohorts, when common variables were mapped, were statistically significant. These results are interesting and raise the question of whether all or just some risk factors are needed to accurately predict future disease. It should be noted that one explanation for differences in AUC estimates could be variation in age. In particular, the Kungsholmen Project sample, which included participants born before 1912 (baseline age > 75), was older than the other two samples (Rush Memory and Aging Project baseline age > 53, Cardiovascular Health Cognition Study baseline age > 65). The study also investigated the effect of gender on the discriminative accuracy of the ANU-ADRI within each cohort and found only slight differences: the reported 95% CIs for males and females overlapped, suggesting that any differences in discriminative accuracy were not significant.
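Where only AUC estimates and their 95% CIs are published, an approximate formal comparison between two independent cohorts can be sketched as below. The sketch assumes the CIs are symmetric normal-approximation intervals and that the cohorts share no participants; the function names are hypothetical. The inputs are the reported AUCs of 0.69 (0.65 to 0.73) and 0.73 (0.71 to 0.76). This illustrates the arithmetic only and is not a reanalysis of the underlying data.

```python
from math import erf, sqrt


def se_from_ci(lower, upper, z_crit=1.96):
    """Approximate standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z_crit)


def compare_independent_aucs(auc1, ci1, auc2, ci2):
    """Two-sided z-test for the difference between two AUCs that were
    estimated in independent samples."""
    se1 = se_from_ci(*ci1)
    se2 = se_from_ci(*ci2)
    z = (auc1 - auc2) / sqrt(se1 ** 2 + se2 ** 2)
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value


z, p_value = compare_independent_aucs(0.69, (0.65, 0.73), 0.73, (0.71, 0.76))
print(round(z, 2), round(p_value, 3))
```

Comparing AUCs estimated in the same participants (for example, two models within one cohort) would need a method that accounts for the correlation between them, which this sketch does not do.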

BDSI.
The BDSI was developed with a three-step approach in four cohort studies: the Cardiovascular Health Study, Framingham Heart Study, Health and Retirement Study, and the Sacramento Area Latino Study on Aging [20]. First, a list of potential predictive factors available in most or all cohorts was identified. Second, in each cohort, variables most predictive of dementia at six years were identified independently. Third, a subset of variables that were consistently found in all four cohorts was identified and used in the model including demographics (age, education), health (history of stroke, diabetes mellitus, BMI, depressive symptoms) and lifestyle (assistance needed with money or medications) factors. The c-statistic for predicting 6-year incident dementia varied between the four cohorts from 0.68 to 0.78. Sensitivity analyses, using data from the Health and Retirement Study and Cardiovascular Health Study, suggested that discrimination was good across different race/ethnic groups: Health and Retirement Study (c-statistic = 0.75 Whites, 0.70 Blacks, 0.71 Latinos) and the Cardiovascular Health Study (c-statistic = 0.70 Whites, 0.65 Blacks) [20].

Cost Considerations
One study considered the impact on discriminative accuracy of modifying the calculation of a resource-intensive risk score to incorporate less expensive measures. The original resource-intensive model, the Late Life Dementia Risk Index, included demographic (age), lifestyle (alcohol consumption), neuropsychological (Modified Mini Mental State Examination (MMSE) score and Digit Symbol Substitution score), medical (history of coronary bypass surgery and BMI), physical functioning (time to put on and button a shirt, in seconds), genetic (APOE), cerebral magnetic resonance imaging (MRI) (white matter disease and enlarged ventricles) and carotid artery ultrasound (internal carotid artery thickness >2.2 mm) measures, and had good discrimination for the prediction of 6-year incident dementia (c-statistic = 0.81, 95% CI: 0.79 to 0.83) [32]. The revised model, the Brief Dementia Risk Index, incorporated age, neuropsychological testing (three-word delayed recall, interlocking pentagon copying, following verbal instructions (paper taking and folding) and a four-legged animal naming task (30 seconds)), self-reported attention difficulties (3 or more days per week in the last month), medical history (stroke, peripheral artery disease or coronary artery bypass surgery, and BMI) and alcohol consumption. It had a significantly lower discriminative accuracy (c-statistic = 0.77, p < 0.001), but was able to categorize subjects as having low, moderate or high risk of dementia with similar accuracy to the more resource-intensive score [21].

Discussion
The results from the review show that many new models for dementia risk prediction have been developed over the last five years. There have also been significant changes to the types of variables used when compared to the previous review. However, few studies have addressed the issues of external validation and the cost of using the risk model.

Strengths and Limitations
The strengths of this review are its systematic approach and inclusivity. There are some limitations. It is difficult to synthesise the literature on dementia risk prediction due to the large variability across studies in follow-up length (range: 1.5 years [9] to 17 years [15]), sample age (range: 40 to 99 years), outcomes tested (e.g., AD vs. all-cause dementia vs. dementia subtypes, quality of the diagnosis), source of population (volunteer vs. population representative) and the different variables incorporated into the prediction models. As such, a meta-analysis was not possible. Furthermore, any meaningful conclusions for population screening are limited by the lack of cost-effectiveness analysis and limited assessment of model transportability.

Clinical Implications
There is currently a clinical drive towards timelier diagnosis, particularly in developed countries such as the UK, where primary care directed enhanced services have been introduced to identify those at risk of developing dementia (e.g., people with stroke, diabetes or cardiovascular disease) [33]. A risk prediction tool, particularly in at-risk populations such as people with diabetes [13], could further enhance existing services. However, if model development in the field of dementia continues at its current pace and if dementia risk prediction is found to be useful and cost-effective, then researchers and possibly clinicians will face difficult choices regarding which model to apply, particularly as study comparison is difficult.

Variables in the Prediction Models and Comparison to Results from the First Review
As in the earlier review [1], elements common to the majority of risk scores include age, education, measures of cognition and health. However, new developments in dementia risk prediction include non-APOE genes and genetic risk scores [17,18], testing of non-traditional dementia risk factors [25], incorporation of information on diet [4], physical function [4], physical activity/exercise and ethnicity [16] into risk modelling, and model development in specific subgroups of the population (e.g., individuals with diabetes [13] and those with low vs. high educational attainment [7]) and over different follow-up times. Furthermore, fewer cognitive tests have been used in prediction models, reflecting our increasing knowledge of risk and protective factors. Despite the dramatic increase in the number of models and novel risk scores, discriminative accuracy has not changed to a significant degree when compared to the previous review (range in the 2010 review: 0.49 to 0.91 vs. range in this review: 0.49 to 0.89). Aside from cognitive based models, the best models are generally those that incorporate multiple risk factors across different variable categories (e.g., demographic, cognition, physical and health). Given our still limited knowledge of the genetic factors that influence dementia risk, the addition of novel (non-APOE) genetic risk factors to prediction models, where statistically tested, does not appear to significantly increase discriminative accuracy [17]. In contrast, the APOE genotype, at least in some studies, appears to be informative. Future work on polygenic risk scores would help to clarify this.
Further, there has been a drive towards more accessible and potentially modifiable variables. Modifiable variables are important as they have the potential to be specifically targeted in primary or secondary prevention. There is now also evidence that around a third of AD cases worldwide may be due to modifiable risk factors [34]. Risk models are likely to be used in the primary care setting, where imaging variables may be difficult to obtain; nor do imaging variables significantly improve discrimination in the prediction of dementia beyond the more readily available multifactorial models [35].

Stratified Analyses
Results from stratified analysis suggest that unique dementia risk prediction models may need to be developed depending on follow-up length (e.g., BMI and hypertension may be more important in mid-life compared to later life models) [7,23,26,36-38], an individual's education level (found in one study and requires replication) [7], health status (e.g., diabetes) [13], APOE status [23] and the outcome tested (e.g., most studies focus on all-cause dementia and different models may be needed depending on dementia subtype) [19]. Further research is required to develop risk models in samples stratified by other confounding factors known to influence the timing and presentation of dementia symptoms (such as mid vs. later life), as well as to investigate the interactions between different risk variables (e.g., APOE status and age).

Model Transportability
Before a risk prediction tool can be used in clinical practice or for research, transportability of the model outside the cohort from which it was developed needs to be assessed. Only four studies have externally validated dementia risk prediction models and the results were mixed [4,13,14,20]. The DSDRS model was developed and validated with moderate but similar levels of discrimination (AUC 0.74 development vs. 0.75 validation) [13]. This is in comparison to the CAIDE score (originally developed for midlife), which validated poorly in three separate (older) cohorts (AUC 0.77 development vs. AUC range 0.49-0.57 validation) [4]. However, it is important to note that the populations used in this validation of the CAIDE score differed markedly in age from the development cohort (mid vs. later life). Indeed, factors like higher BMI, blood pressure and cholesterol levels are associated with lower incidence of dementia in the oldest old [31]. In contrast, the CAIDE score was found to transport well when the test population better resembled the original population (i.e., was a mid-life cohort) [14]. Generally, external validation is difficult, largely due to the lack of available cohorts that are comparable in terms of follow-up times, data collected, age groups and risk variable measurement.
Most models have been developed in datasets from North America (N = 12 studies), with others developed in the UK (N = 1), Japan (N = 1), Austria (N = 1), the Netherlands (N = 2), Pan-Europe (N = 1), Germany (N = 3) and France (N = 1). Whether the different models are applicable across countries that vary by health and wealth is not known. Further, no models have been developed for predicting other dementia sub-types, such as vascular dementia or dementia with Lewy bodies, or for predicting different disease severity (e.g., mild, moderate and severe dementia). This may have implications for treatment options.

Issues Around Cost
The ability to assess the relative cost of calculating the different risk models and compare this against each model's accuracy will significantly influence recommendations about possible protocols for screening for dementia risk. Further, the incorporation of readily accessible primary care related factors would be most useful for application within clinical and population-based settings. Three studies [4,20,21] have considered time and/or financial implications in risk score calculation. However, in one study it was found that reducing the cost of risk score computation, by removing the need for MRI, ECG and detailed neuropsychological measures and replacing them with less resource-intensive variables (e.g., more detailed self-reported health history and simple cognitive test items), resulted in a significant decline in discriminative accuracy [21,32]. This raises the question of what is the best information, and the minimum data set, needed for accurate dementia risk prediction. It is important to note that current models have not yet utilized cerebrospinal fluid or positron emission tomography data, as recommended for classifying AD and its preclinical/prodromal states in new (clinical and research) criteria [39,40]. Although these measures can be used to assist dementia diagnosis, the feasibility and acceptability of incorporating them into risk prediction models, particularly in population-based settings, would likely be low.

Conclusions
Before a risk assessment tool can be implemented we need to know its discriminative accuracy, predictive value, cost-effectiveness, transportability (e.g., to particular populations, ages and genders), and the general availability of its variables (e.g., to enable cross-study comparison and result verification). We must also consider the design implications of the model: Are we interested in highly sensitive indicators of near-term (i.e., 3-year) incidence of dementia? What operating characteristics are optimal for longer-term (5 to 10 years) predictive models? Are we contemplating a stratified approach, whereby low-cost screening identifies subjects with higher risk for more detailed and costly assessments?
It is not possible to state with certainty whether there exists one model that can be recommended for dementia risk prediction in population-based settings. This is largely due to the lack of risk score validation studies. Consideration of the optimal features of new models should largely focus on methodology (model development and testing) and the cost and acceptability of deriving the risk factors. Further work is required to validate existing models or develop new ones, as well as to assess their cost-effectiveness and ethical implications, before applying the particular models in population-based or clinical settings. While it is difficult to recommend any particular model, we nonetheless offer some recommendations on the optimal features for new models (see Table 4).