An accurate and timely diagnosis for Alzheimer’s disease (AD) is important, both for care and research. The current diagnostic criteria allow the use of CSF biomarkers to provide pathophysiological support for the diagnosis of AD. How these criteria should be operationalized by clinicians is unclear. Tools that guide in selecting patients in which CSF biomarkers have clinical utility are needed. We evaluated computerized decision support to select patients for CSF biomarker determination.
We included 535 subjects (139 controls, 286 Alzheimer’s disease dementia, 82 frontotemporal dementia and 28 vascular dementia) from three clinical cohorts. Positive (AD like) and negative (normal) CSF biomarker profiles were simulated to estimate whether knowledge of CSF biomarkers would impact (confidence in) diagnosis. We applied these simulated CSF values and combined them with demographic, neuropsychology and MRI data to initiate CSF testing (computerized decision support approach). We compared proportion of CSF measurements and patients diagnosed with sufficient confidence (probability of correct class ≥0.80) based on an algorithm with scenarios without CSF (only neuropsychology, MRI and APOE), CSF according to the appropriate use criteria (AUC) and CSF for all patients.
The computerized decision support approach recommended CSF testing in 140 (26%) patients, which yielded a diagnosis with sufficient confidence in 379 (71%) of all patients. This approach was more efficient than CSF in none (0% CSF, 308 (58%) diagnosed), CSF selected based on AUC (295 (55%) CSF, 350 (65%) diagnosed) or CSF in all (100% CSF, 348 (65%) diagnosed).
We used a computerized decision support with simulated CSF results in controls and patients with different types of dementia. This approach can support clinicians in making a balanced decision in ordering additional biomarker testing. Computer-supported prediction restricts CSF testing to only 26% of cases, without compromising diagnostic accuracy.
Citation: Rhodius-Meester HFM, van Maurik IS, Koikkalainen J, Tolonen A, Frederiksen KS, Hasselbalch SG, et al. (2020) Selection of memory clinic patients for CSF biomarker assessment can be restricted to a quarter of cases by using computerized decision support, without compromising diagnostic accuracy. PLoS ONE 15(1): e0226784. https://doi.org/10.1371/journal.pone.0226784
Editor: Han Zhang, University of North Carolina at Chapel Hill, UNITED STATES
Received: September 16, 2019; Accepted: December 3, 2019; Published: January 15, 2020
Copyright: © 2020 Rhodius-Meester et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data from this study are available upon request. For this study we requested data from the PredictND consortium. The data-sharing agreement of the PredictND consortium however allows us use of data for this specific project only. The data underlying the results presented in the study are available from the PredictND-board; email@example.com.
Funding: This study is partly funded by Combinostics. The funder provided support in the form of salaries for authors [JK and JL], and had an additional role in the study as Juha Koikkalainen and Jyrki Lötjönen developed the method and quantitative raw data were generated using Combinostics’ tools. They also reviewed the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section. For development of the PredictAD tool, VTT Technical Research Centre of Finland has received funding from European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreements 601055 (VPH-DARE@IT), 224328 (PredictAD), and 611005 (PredictND). The latter had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: Ingrid M. van Maurik, Antti Tolonen, Kristian S. Frederiksen, Steen G. Hasselbalch, Sanna-Kaisa Herukka, Anne M. Remes and Yolande A.L. Pijnenburg have declared that no competing interests exist. I have read the journal’s policy and the authors of this manuscript have the following competing interests: Hanneke F.M. Rhodius-Meester performs contract research for Combinostics, all funding is paid to her institution. Juha Koikkalainen and Jyrki Lötjönen report that Combinostics owns the following IPR related to the paper: 1. J. Koikkalainen and J. Lötjönen. A method for inferring the state of a system, US7,840,510 B2. 2. J. Lötjönen, J. Koikkalainen and J. Mattila. State Inference in a heterogeneous system, US 10,372,786 B2. 3. J. Lötjönen and J. Koikkalainen. Method of inferring a need for medical test, FI 20195603. Koikkalainen and Lötjönen are shareholders in Combinostics. Lötjönen has been an invited speaker for Merck and Sanofi. Hilkka Soininen has served as consultant for ACImmune. Charlotte E. Teunissen has a collaboration contract with ADx Neurosciences, performed contract research or received grants from Probiodrug, Biogen, Esai, Toyama, Janssen prevention center, Boehringer, AxonNeurosciences, Fujirebio, EIP farma, PeopleBio, Roche. Frederik Barkhof receives personal fees from Springer, is a consultant to Bayer, Biogen, Roche, Novartis, Merck, IXICO, GeNeuro and GE Healthcare; receives grants from UK MS Society, Dutch Foundation MS Research, NWO, NIHR and IMI (H2020), outside the submitted work. Philip Scheltens has served as consultant for Wyeth-Elan, Genentech, Danone and Novartis and received funding for travel from Pfizer, Elan, Janssen and Danone Research. Wiesje M. van der Flier performs contract research for Biogen.
Research programs of Wiesje van der Flier have been funded by ZonMW, NWO, EU-FP7, Alzheimer Nederland, CardioVascular Onderzoek Nederland, Gieskes-Strijbis fonds, Pasman stichting, Boehringer Ingelheim, Piramal Neuroimaging, Combinostics, Roche BV, AVID. She has been an invited speaker at Boehringer Ingelheim and Biogen. All funding is paid to her institution. These competing interests do not alter our adherence to PLOS ONE policies on sharing data and materials.
An accurate and timely clinical diagnosis of Alzheimer’s disease (AD) is important both for care and research. Over the last decade the field has moved from a purely clinical concept to a more biological definition of AD. The development of CSF biomarkers has played a pivotal role in this paradigm shift, enabling accurate detection of AD pathology in vivo. The current diagnostic criteria emphasize the use of these biomarkers in suspected AD [2, 4].
However, diagnostic guidelines do not specify which patients should receive CSF testing, and as a result there is large variation in practice. Recently, a set of criteria for the use of CSF has been published, mirroring the appropriate use criteria (AUC) for amyloid PET, but these are still rather difficult to translate to clinical practice. First, they are phrased in rather general terms, e.g. ‘CSF biomarkers can be ordered for all patients with suspected underlying AD pathology’, which carries the risk of performing CSF testing unnecessarily. In addition, the AUC for CSF state that there ‘should be a determination of how CSF biomarkers might contribute to the diagnosis and clinical decision making’, but it is not self-evident how this should be done. Furthermore, ordering of CSF should also depend on ‘the confidence of the clinician in the diagnosis’.
Data-driven support tools could help to answer the question ‘for which patient should CSF biomarkers be measured?’ before embarking on actual testing. By adding simulated positive (AD-like) or negative (normal) CSF biomarker profiles for a specific patient to their a priori clinical information, such a tool could inform the clinician on how knowledge of CSF biomarkers would impact (confidence in) a diagnosis.
The Disease State Index (DSI) classifier is a clinical decision support system (CDSS) which supports the differential diagnosis of several types of dementia by combining and weighing all available information, including cognitive tests, automated MRI features and CSF biomarkers. This multi-class classifier has previously been shown to differentiate between several types of dementia [9, 10]. The DSI classifier can achieve a high probability of correct diagnosis even without CSF biomarkers, by using an optimal combination of tests [11, 12].
In this work, we took the functionality of our CDSS as an input to predict who would benefit from CSF testing. More specifically, we tested whether the CDSS may help to answer the following question: if a clinician already has information on neuropsychological tests and MRI brain, would additional CSF testing contribute significantly to a more accurate diagnosis? We thus aimed to evaluate how a computer-based tool can help in selecting those patients by using a data-driven approach and simulated CSF outcome.
We included 535 subjects from three memory clinics, as part of the PredictND project: 463 subjects from the Amsterdam Dementia Cohort from the Alzheimer center of the Amsterdam UMC (the Netherlands) [14, 15], 50 subjects from the Danish Dementia Research Centre at Copenhagen University Hospital, Rigshospitalet (Denmark) and 22 subjects from the Neurocenter at Kuopio University Hospital (Finland). The pooled cohort consisted of subjects with the following diagnoses: 286 (53%) AD, 82 (15%) frontotemporal dementia (FTD), 28 (5%) vascular dementia (VaD) and 139 (26%) controls with subjective cognitive decline (SCD). Subjects were eligible for inclusion if both CSF and brain MRI results were available.
All subjects had received a standardized work-up, including medical history, physical, neurological and neuropsychological assessment, MRI, laboratory tests and CSF. Subjects were diagnosed as SCD when the cognitive complaints could not be confirmed by cognitive testing and criteria for mild cognitive impairment (MCI) or dementia were not met. Probable AD was diagnosed using the core clinical criteria of the NIA-AA for AD dementia. Probable FTD (including the behavioural variant of FTD, progressive non-fluent aphasia and semantic dementia) was diagnosed using the criteria from Rascovsky and Gorno-Tempini, respectively [16, 17]. VaD was diagnosed using the NINDS-AIREN criteria. The local Medical Ethical Committee reviewed and approved the study, including its consent procedure, in accordance with the Declaration of Helsinki. Capacity to consent was determined by the clinician who performed the work-up. All patients provided written informed consent for their data to be used for research purposes.
Cognitive functions were assessed with a brief standardized test battery including widely used tests. We used the Mini-Mental State Examination (MMSE) for global cognitive functioning. For memory we included the Rey auditory verbal learning task (RAVLT). To measure mental speed and executive functioning we used Trail Making Test A and B (TMT-A, TMT-B). Language and executive functioning were tested by category fluency (animals). For behavioral symptoms we used the Neuropsychiatric Inventory (NPI). Missing data ranged from n = 3 (1%) (MMSE) to n = 126 (24%) (TMT-B).
Cerebrospinal fluid biomarkers
CSF beta amyloid 1–42 (AB42), total tau and tau phosphorylated at threonine 181 (p-tau) were measured with commercially available ELISA tests (Innotest, Fujirebio, Ghent, Belgium) locally according to standard procedures. Raw data values were used for analysis. Analysis showed comparable values between the centers and hence no further correction was performed.
Apolipoprotein-E (APOE) genotyping was performed locally in each center. Patients were dichotomized into APOE e4 carriers (hetero- and homozygous) and non-carriers. APOE data were available in 472 (88%) subjects.
MRI images were acquired using 1.5 T or 3 T scanners. The voxel size varied between 0.5–1.1 × 0.5–1.1 × 0.9–2.2 mm for T1 images and 0.4–1.3 × 0.4–1.2 × 0.6–6.5 mm for FLAIR images. We extracted five imaging markers using the cNeuro® cMRI quantification tool: 1) computed medial temporal lobe atrophy (cMTA), 2) computed global cortical atrophy (cGCA), 3) AD similarity scale, 4) Anterior Posterior index (API) and 5) white matter hyperintensities (WMH). First, we automatically computed the MTA and GCA scores [24, 25]. To do so, we defined volumes of brain structures from T1 image segmentations produced by a multi-atlas segmentation algorithm [9, 26]. The cMTA was defined from the volumes of the hippocampi and inferior lateral ventricles, separately for left and right. In addition to volumetry, voxel-based morphometry (VBM) was applied to compute gray matter concentrations, which were used to define the cGCA. For the AD similarity scale, the region of interest (ROI) in the patient image was represented as a linear combination of the corresponding ROIs from a database of previously diagnosed patients [28, 17]. A ROI-based grading imaging marker was defined as the share of the weights from the linear model having the diagnostic label AD while using the hippocampus area as a ROI. The API is a specific measure for characterizing fronto-temporal atrophy, a well-known hallmark of FTD; it measures the relative volume of the cortex in the fronto-temporal and parieto-occipital regions. In addition, we automatically extracted the volume of WMH from FLAIR images [9, 30]. All imaging markers were corrected for head size and for age and sex.
Disease State Index classifier
The Disease State Index (DSI) classifier is a simple supervised and data-driven machine learning method that compares different diagnostic groups with each other; controls, AD, FTD and VaD in this work [8, 13]. DSI is a continuous value between zero and one, measuring the similarity of all patient data to the diagnostic groups defined. It is computed in three steps. 1) For each single test result from cognitive tests, APOE, imaging or CSF, a reference dataset determines a fitness function f(x) = FN(x)/(FN(x)+FP(x)), where FN and FP are the false negative and false positive rates, respectively, when using x as a cut-off to classify patients into two diagnostic groups. This function is then used to assign a fitness value between zero and one to each test result of the patient. 2) The ‘relevance’ of each test result is defined as sensitivity+specificity-1 from the reference dataset. 3) DSI is calculated by averaging all fitness values, weighted by their relevance. This process is repeated for each pairwise comparison (controls-AD, controls-FTD, controls-VaD, AD-FTD, AD-VaD, VaD-FTD). The total DSI for each diagnostic group is the average over all pairwise comparisons, e.g. AD-controls, AD-FTD and AD-VaD for AD. All variables are corrected for age and sex. Due to the design of DSI, there is no need to impute data or exclude cases with incomplete data, as only available data are used.
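The three steps for a single pairwise comparison can be illustrated with a minimal Python sketch. It is not the authors' MATLAB toolbox. It assumes that higher values of a test point toward the second ("positive") group, and that relevance is taken at the best cut-off in the reference data (the text does not state which cut-off is used). The function names `fitness`, `relevance` and `dsi` are hypothetical.

```python
import numpy as np

def fitness(x, neg, pos):
    """f(x) = FN(x) / (FN(x) + FP(x)) when x is used as the cut-off.

    Assumption: values >= x are classified as the positive group.
    """
    neg, pos = np.asarray(neg, float), np.asarray(pos, float)
    fn = np.mean(pos < x)    # positives that fall below the cut-off
    fp = np.mean(neg >= x)   # negatives that fall at or above the cut-off
    if fn + fp == 0:
        return 0.5           # assumption: uninformative when both rates are zero
    return float(fn / (fn + fp))

def relevance(neg, pos):
    """sensitivity + specificity - 1, here taken at the best cut-off (assumption)."""
    neg, pos = np.asarray(neg, float), np.asarray(pos, float)
    best = 0.0
    for c in np.unique(np.concatenate([neg, pos])):
        sens = np.mean(pos >= c)
        spec = np.mean(neg < c)
        best = max(best, float(sens + spec - 1))
    return best

def dsi(patient, reference):
    """Relevance-weighted average of fitness values over the available tests.

    patient:   dict test name -> value (missing tests are absent or None)
    reference: dict test name -> (negative group values, positive group values)
    """
    num = den = 0.0
    for name, (neg, pos) in reference.items():
        x = patient.get(name)
        if x is None:
            continue         # only available data are used, no imputation
        w = relevance(neg, pos)
        num += w * fitness(x, neg, pos)
        den += w
    return num / den if den > 0 else float("nan")
```

In this sketch, a test with no discriminative value in the reference data receives a relevance of zero and so does not influence the index, and a missing test is simply skipped.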
Probability of correct class
A high total DSI value or a big difference in DSI values between the two most similar diagnostic groups (first and second DSI) provides more confidence for making a diagnosis than a low value or a small difference. Therefore, we define a new measure, probability of correct class (PCC), which estimates the probability that the suggested diagnosis for a patient is correct, when compared to the clinical diagnosis (ground truth). If the highest DSI value is 0.75 and the second highest is 0.67 for the patient being studied, PCC measures the share of correctly classified cases in the reference database having DSI close to 0.75 and difference close to 0.75–0.67 = 0.08. In other words, PCC estimates (personalized) confidence in classification, i.e., classification accuracy for this very patient. In clinical practice the clinician can adjust the applied PCC cutoff depending on the required accuracy. In this study, patients were considered as having a diagnosis with sufficient accuracy if PCC was at least 0.80. For comparison we performed a sensitivity analysis and repeated our analyses for different PCC cut-offs. Details on the definition and calculation of PCC can be found in the supplement file (S1 File).
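As an illustration only, the idea behind PCC can be sketched as a neighbourhood lookup in the reference database. The exact definition is given in S1 File; the fixed tolerance window `tol` and the function name `pcc` below are assumptions made for this sketch, not the published method.

```python
import numpy as np

def pcc(top_dsi, second_dsi, ref_top, ref_second, ref_correct, tol=0.05):
    """Share of correctly classified reference cases that resemble the patient.

    A reference case counts as similar when both its highest DSI and its
    difference between highest and second-highest DSI lie within `tol`
    of the patient's values (window width is an assumption of this sketch).
    """
    ref_top = np.asarray(ref_top, float)
    ref_diff = ref_top - np.asarray(ref_second, float)
    ref_correct = np.asarray(ref_correct, bool)

    diff = top_dsi - second_dsi
    near = (np.abs(ref_top - top_dsi) <= tol) & (np.abs(ref_diff - diff) <= tol)
    if not near.any():
        return float("nan")  # no comparable reference cases
    return float(ref_correct[near].mean())
```

For the example in the text (highest DSI 0.75, second highest 0.67), the function would average the correctness of reference cases with a highest DSI near 0.75 and a difference near 0.08.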
Simulated CSF to guide clinical decision making
In our search for a scenario with the largest proportion of patients diagnosed with high confidence (PCC≥0.80), against the smallest proportion of CSF tested, we applied four scenarios:
- Scenario A (“computerized decision support”): We performed CSF only when predicted to be useful (Fig 1A). This was modelled using a stepwise approach: first, patients diagnosed with PCC≥0.80 based on neuropsychology, MRI and APOE were regarded as clear cases and no CSF was ordered (step one). For the remaining patients with a PCC<0.80, we simulated adding CSF biomarkers; PCC was recomputed after adding positive (age- and sex-normalized median AB42, tau and p-tau values of the AD group) and negative (age- and sex-normalized median AB42, tau and p-tau values of the SCD group) CSF values (step two). If either of the resulting simulated PCC values reached the cut-off of ≥0.80, actual CSF values were added, and DSI and PCC were calculated using these real values (step three).
- Scenario B (“No CSF”): We calculated DSI and PCC for each patient using neuropsychology, MRI and APOE, excluding CSF (Fig 1B).
- Scenario C (“AUC”): We performed CSF based on the AUC criteria (Fig 1C), which state that CSF biomarkers can be applied to all patients with suspected underlying AD pathology. We operationalized this as a DSI for AD >0.6. In all patients with a DSI for AD >0.6, we then calculated PCC adding CSF (step two). For patients with a DSI for AD ≤0.6, no CSF was added and DSI and PCC were calculated using only neuropsychology, MRI and APOE.
- Scenario D (“All CSF”): We performed CSF in all patients (Fig 1D). We calculated DSI and PCC for each patient using neuropsychology, MRI, APOE and CSF.
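The stepwise logic of scenario A can be written down compactly. The sketch below is a hypothetical illustration that takes already computed PCC values as input; the names `Outcome` and `scenario_a` are not from the study, and the group numbers follow the four groups reported for the computerized decision support in the results. In a retrospective simulation such as this one the PCC with actual CSF is known for every patient; in practice it would only become available after testing.

```python
from dataclasses import dataclass

@dataclass
class Outcome:
    csf_ordered: bool   # was actual CSF testing recommended?
    diagnosed: bool     # final PCC at or above the cut-off?
    group: int          # 1-4, following the groups described in the results

def scenario_a(pcc_no_csf, pcc_sim_pos, pcc_sim_neg, pcc_actual_csf, cutoff=0.80):
    # Step one: clear case on neuropsychology, MRI and APOE alone, no CSF.
    if pcc_no_csf >= cutoff:
        return Outcome(csf_ordered=False, diagnosed=True, group=1)
    # Step two: neither simulated CSF profile lifts PCC to the cut-off.
    if max(pcc_sim_pos, pcc_sim_neg) < cutoff:
        return Outcome(csf_ordered=False, diagnosed=False, group=2)
    # Step three: CSF is ordered and PCC is recomputed with the actual values.
    diagnosed = pcc_actual_csf >= cutoff
    return Outcome(csf_ordered=True, diagnosed=diagnosed, group=3 if diagnosed else 4)
```

Keeping the cut-off as a parameter mirrors the statement that clinicians can adjust the PCC threshold to the accuracy they require.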
AUC: appropriate use criteria, operationalized as a DSI for AD >0.6, PCC: probability of correct class, NP: neuropsychology, MRI: magnetic resonance imaging, CSF: cerebrospinal fluid biomarkers, Sim.: simulate, FU: follow-up. Numbers in circles denote groups described in section 3.3 and Table 3.
We used a two-sample test of proportion to test differences in proportion of patients 1) with CSF tested, and 2) diagnosed with sufficient confidence (PCC≥0.80), between the computerized decision support and the other scenarios . We compared the different groups derived from the computerized decision support (scenario A), using Analysis of Variance (ANOVA).
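For readers who want to reproduce this kind of comparison, a two-sample test of proportions can be sketched with the Python standard library. The pooled z-test shown here is one common form and is an assumption of this sketch, since the text does not specify the exact variant used in STATA or R. It also treats the two proportions as independent samples, which is a simplification when both scenarios are applied to the same patients.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided z-test for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Example with counts from the abstract: patients diagnosed with sufficient
# confidence under computerized decision support (379/535) versus CSF in
# all patients (348/535).
z, p_value = two_proportion_z_test(379, 535, 348, 535)
```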
In an additional sensitivity analysis, we visualized the share of patients diagnosed (percentage of patients above PCC cutoff) and the share of patients with CSF measurement for a range of PCC cut-offs (0.5–1.0), and calculated the classification accuracy of the diagnosed patients. This was done for the four diagnostic scenarios: A) the computerized decision support, B) No CSF, C) AUC, and D) All CSF.
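A cut-off sweep of this kind for the computerized decision support can be sketched as follows. The function name `sweep` and its array inputs (per-patient PCC without CSF, the better of the two simulated PCC values, and PCC with actual CSF) are hypothetical and only illustrate the bookkeeping.

```python
import numpy as np

def sweep(pcc_no_csf, pcc_sim_best, pcc_actual, cutoffs):
    """Return (cutoff, share with CSF tested, share diagnosed) per cut-off."""
    pcc_no_csf = np.asarray(pcc_no_csf, float)
    pcc_sim_best = np.asarray(pcc_sim_best, float)
    pcc_actual = np.asarray(pcc_actual, float)
    rows = []
    for c in cutoffs:
        clear = pcc_no_csf >= c                       # diagnosed without CSF
        tested = ~clear & (pcc_sim_best >= c)         # simulation predicts benefit
        diagnosed = clear | (tested & (pcc_actual >= c))
        rows.append((float(c), float(tested.mean()), float(diagnosed.mean())))
    return rows
```

Plotting the second and third columns against the cut-off would give curves of the kind described for the sensitivity analysis.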
Statistical analyses were performed using STATA version 14.1 and R version 3.5.3. A MATLAB toolbox created by  was used in the DSI analyses. The analyses were performed in MATLAB version R2015b (MathWorks, Natick, MA, USA).
Overall mean age was 65±8 years and 262 (49%) subjects were female. Details on cognitive tests, MRI markers and CSF results by group can be found in Table 1.
Contribution of CSF to diagnosis
To evaluate in which patients the knowledge of CSF changed the confidence in the diagnosis, we examined four diagnostic scenarios, as shown in Fig 1.
In the computerized decision support (Fig 1A), the 308 (58%) patients who had PCC≥0.80 in step one (similar to scenario B) did not receive CSF testing. For the remaining 227 (42%) patients, we added simulated CSF values to identify how such values would impact diagnosis. In 140 patients the simulated PCC reached the cutoff of 0.80 (72 with only positive CSF biomarkers, 46 with only negative CSF biomarkers and 22 with conflicting CSF biomarkers). When we added the actual CSF results of these patients to calculate DSI and PCC (step three), an additional 71 patients reached our threshold to be diagnosed with sufficient confidence. As a result, while restricting measurement of CSF to only 26% (140/535) of patients, a diagnosis with sufficient confidence was reached in 71% of the patients ((308+71)/535). In comparison with the other scenarios, the computerized decision support diagnosed a significantly higher proportion of patients with sufficient confidence (Table 2). Of note, the computerized approach even outperformed scenario D (all CSF), emphasizing that performing CSF in patients where it is not useful causes confusion and thus lower confidence in the diagnosis. Furthermore, this approach had a significantly lower proportion of CSF tested compared to scenario C based on AUC.
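The counts in this paragraph can be cross-checked with a few lines of arithmetic. The variable names are illustrative; all numbers are taken from the text above.

```python
n_total = 535
clear_without_csf = 308                      # PCC >= 0.80 in step one
simulated_helpful = {"positive_only": 72, "negative_only": 46, "conflicting": 22}
diagnosed_after_actual_csf = 71              # reached PCC >= 0.80 in step three

remaining = n_total - clear_without_csf      # patients entering step two
csf_tested = sum(simulated_helpful.values()) # patients recommended for CSF
diagnosed = clear_without_csf + diagnosed_after_actual_csf

share_csf_tested = 100 * csf_tested / n_total
share_diagnosed = 100 * diagnosed / n_total
```

The derived values match the reported 227 remaining patients, 140 patients with CSF tested (26%) and 379 patients diagnosed (71%).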
Groups comparison computerized decision support
When we used the computerized decision support (scenario A, Fig 1A), we could discern different groups. Table 3 shows the baseline characteristics of these groups. In group 1, the largest group, patients already had PCC≥0.80 using neuropsychology, MRI and APOE. This group contained mainly controls and patients with a clear clinical profile, i.e. a large difference between the highest and second-highest DSI. In group 2, CSF was not helpful, as PCC remained <0.80 even after adding simulated CSF values. This group consisted mainly of AD patients, yet with more WMH and a small difference to the second-highest DSI, indicating a high probability of comorbid neuropathology. In group 3, CSF was helpful, as PCC increased from <0.80 to ≥0.80 by adding simulated CSF and was also ≥0.80 after adding the actual CSF values (a diagnosis could be made). This group contained mainly FTD and AD patients, in whom especially tau and p-tau were more abnormal. Finally, in group 4, PCC increased from <0.80 to ≥0.80 by adding simulated CSF, but remained <0.80 when adding the actual CSF values. Patients in group 4 seemed to have multiple underlying neuropathologies, with higher tau and more WMH, and a smaller difference with the second-highest DSI.
Nearly all patients in groups 3 and 4 were considered appropriate for CSF following the AUC, showing that our computer-supported decision does not result in a risk of missing patients who would actually have benefited from CSF. In addition, 45% of the patients in group 1 and 30% of the patients in group 2 were considered eligible for CSF testing according to AUC. Yet our computerized decision support predicted that additional CSF testing would not yield additional diagnostic accuracy.
Different PCC cut-offs
For the current manuscript, we arbitrarily defined an accurate diagnosis as PCC≥0.80. To address this inherent arbitrariness, we repeated our analyses for different PCC cutoffs. Fig 2 shows the share of patients with CSF tested and the share of patients diagnosed for different PCC cutoffs, comparing the four diagnostic scenarios: the computerized decision support, no CSF, AUC and CSF for all patients. Overall, a higher PCC cutoff resulted in a lower number of diagnosed patients. Independent of PCC cutoff, applying no CSF resulted in the smallest share of patients diagnosed, while the stepwise approach gave the largest share of patients diagnosed, with CSF tested in at most 26.5% of patients.
Blue: proportion of patients diagnosed, Red: proportion of patients with CSF measured, PCC: probability of correct class. Solid lines show results for the computerized decision support (Fig 1A), dotted lines show results for using no CSF, but only neuropsychology, MRI and APOE (Fig 1B), dashed dotted lines show results for AUC (Fig 1C) and dashed lines using all data (Fig 1D).
Regardless of PCC cutoff, the accuracy of the four different approaches remained comparable (see S1 Fig) demonstrating that the stepwise approach does not compromise accuracy.
Visualization of computerized decision support
To enable clinicians to use this method in daily practice, we have visualized the computerized decision support in Fig 3. Using the DSI classifier, clinicians select the PCC threshold (default PCC≥0.80) and simulate positive and negative CSF biomarkers. The resulting change in PCC can then support the clinician in deciding to refrain from or embark on CSF testing. In Fig 3, two examples of the visualization for clinical use are shown. For patient A, the clinician chose the default PCC cutoff of 0.80. When using only neuropsychology, MRI and APOE data, the highest DSI was obtained for AD, with a 72% probability of correct diagnosis. When adding simulated AD-like CSF, PCC increased to 0.89, resulting also in a higher DSI for AD. With simulated negative CSF, PCC also increased, to 0.86, resulting in a higher DSI for FTD. As the simulated CSF values resulted in a PCC above the cutoff of 0.80, the advice suggested by the tool is that “CSF measurement is considered potentially useful”. Following this advice, the results of the actually performed CSF testing are shown in the bottom panel. The actual CSF biomarker results were AD-like, resulting in a clinical AD diagnosis with 91% probability of correct diagnosis. For patient B, however, only simulated negative CSF increased the PCC above 0.80. Indeed, actual CSF ruled out AD and indicated FTD as the most probable diagnosis. This was also the clinical diagnosis.
PCC: probability of correct class, AD: Alzheimer’s disease, FTD: Frontotemporal dementia, VAD: Vascular dementia, CN: control. For patient A both simulated positive and negative CSF resulted in an increase of PCC. Adding actual CSF confirmed the AD diagnosis. For patient B simulated negative CSF increased the PCC. Adding actual CSF ruled out AD and indicated FTD as the most probable diagnosis.
Although CSF biomarkers are relevant for diagnosing and excluding AD, it remains unclear in which patients they have most added value. This study showed that a data-driven, individualized approach, simulating CSF outcome before embarking on actual CSF testing, can help in identifying those patients in whom CSF biomarkers are most likely to contribute to a more accurate diagnosis. Recommending CSF testing in only one quarter of patients, our computerized decision support led to a diagnosis with sufficient confidence in 71% of the patients. By contrast, in the three other scenarios (CSF in none, based on AUC or CSF in all) fewer patients were diagnosed with sufficient confidence. By visualizing the effect of simulated CSF (Fig 3) on the probability of correct class (PCC), clinicians can be supported in making a conscious decision in ordering additional biomarker testing.
Neurodegenerative disorders in general, and AD in particular, are increasingly considered biologically defined diseases. Current guidelines and diagnostic criteria therefore emphasize the use of biomarkers [2, 4]. By using these biomarkers, the clinician can either confirm or rule out underlying AD. Yet, clinicians do not often apply CSF biomarkers. The appropriate use criteria (AUC), which have been proposed as a guideline, are generally worded and challenging to translate to clinical practice. In our study, more than half of the patients would be selected for CSF measurement based on fulfillment of the AUC. In addition to specifying criteria, the AUC further mention that the clinician should estimate ‘how CSF biomarkers might contribute to the diagnosis and clinical decision making’. Operationalization of this statement is difficult. Tools are therefore needed to help answer the question ‘which patient benefits most from CSF?’. To select patients for amyloid measurement, previous studies have aimed to ‘predict’ amyloid positivity in controls and patients with AD [37–40]. Yet a clinician requires information on how knowledge of amyloid status predicts clinical outcome. We added to these studies by using a data-driven approach including not only controls and AD, but also patients with VaD and FTD, and simulated AB42, tau and p-tau values to guide decision making.
In the computerized decision support approach, PCC is first calculated using only neuropsychology, MRI and APOE data. If the clinician considers the PCC (= the confidence in the correct diagnosis) high enough, no additional testing is needed. This was frequently the case for controls and VaD. When the PCC is too low, simulated normal or AD-like CSF values are added, enabling clinicians to decide whether the change in PCC is enough for them to order additional testing. If the PCC remains low, the clinician could use this information to refrain from embarking on CSF testing, which would only yield unnecessary costs without improving diagnostic accuracy. Instead, the clinician could take other steps to improve diagnostic certainty, e.g. genetic screening or FDG-PET. One could imagine that comparable approaches would be developed for these other diagnostic tests.
As an example, we used PCC≥0.80 as a cut-off. There is however some inherent arbitrariness to the cut-off of diagnostic certainty. It is for example unclear what level of certainty is acceptable in clinical practice. Also, clinicians may value the PCC differently, and may want to use less or more stringent cut-off values. This is why we repeated our analyses for a wide range of cut-offs, resulting in Fig 2. Independent of PCC cut-off, the results remained largely similar: using estimated CSF values to support the choice to embark on testing, helps to reduce costs (smaller proportion CSF testing) while improving diagnostic certainty.
An often mentioned and important limitation of data-driven decision support systems is that they require multiple pieces of data that are not readily available to the clinician, or that the models are too complex, which limits their clinical footprint. In order for clinical decision support tools to be implemented in clinical practice, they should have an understandable basis and be intuitive (i.e. not a complex black box), time efficient, and assist rather than replace the clinician. Our stepwise approach meets all these requirements; well-known CSF values are simulated in a simple way and the tool provides a suggestion on the usefulness of CSF testing. We used the DSI classifier, an existing, validated machine learning algorithm [10, 12]. The DSI classifier has a graphical counterpart which makes interpretation of results more transparent to clinicians (an example is shown in Fig 3), tolerates missing data (no imputation needed), and gives information about the confidence of the classification.
Furthermore, we used data of three different memory clinic cohorts, warranting a heterogeneous sample. All patients came to the memory clinic seeking medical help; our data thus reflects real-life, clinical patients. In each clinic, patients were diagnosed using a standard clinical workup, and we were able to use measures that overlapped. CSF analyses were performed in three different laboratories, yet we found no significant differences in values between the three centers. MRI scans were acquired on systems with different field strengths, but the automatic analyses of these scans were all performed by the same software .
Some limitations also warrant discussion. First, we categorized patients as having single pathologies, while on average 20–40% of patients in fact have multiple pathologies contributing to their dementia syndrome. This is probably also the case in our sample, as can be seen in diagnostic group 4, described in Fig 1 and Table 3. Here we see a smaller difference with the second DSI (and subsequently a low PCC), suggesting mixed pathology. Second, a limitation could be the use of SCD as controls. However, if SCD is indeed an enriched population, then the ability of our computerized decision support approach to select the correct patient for CSF testing would be underestimated, and would be even better if ‘healthy controls’ were used. In fact, clinical practice is not about comparing AD patients with healthy controls, but about the differential diagnosis. As SCD patients do seek help for their complaints, they also reflect daily clinical practice, where clinical decision support systems should work. Third, we simulated positive or negative CSF biomarkers and did not include amyloid-PET. This would hamper use of our computerized approach, for example, in the United States, where amyloid-PET is more frequently used than CSF. However, we know CSF correlates strongly with amyloid-PET, especially when both AB42 and tau have been used (as we did here). Research replicating this study using PET instead would be a useful addition to this work. Another interesting field for future work is MCI. CSF is a known predictor of progression from MCI to dementia; applying a stepwise approach might also be helpful in guiding clinicians in this challenge. Finally, future studies should evaluate this approach in a prospective design.
The diagnostic routine for AD is very suitable for shared decision making, as there is typically more than one reasonable option to choose from, such as performing or not performing additional CSF testing. In shared decision making, the clinician and patient decide together which care plan best fits individual preferences and needs. An important aspect of shared decision making is setting appropriate expectations and recognizing that the decision for or against biomarker testing is highly individualized. However, previous research showed that informing patients about diagnostic testing in general was not standard in the memory clinic [36, 44, 47]. The current paper helps clinicians to appreciate whether biomarkers are potentially useful for a specific patient. This information could then be the topic of a dialogue between clinician and patient/caregiver before actually ordering the test. This in turn aids clinicians in setting appropriate expectations before embarking on biomarker testing.
In conclusion, we showed that computerized decision support can guide clinicians in identifying who will benefit from CSF testing and who will not. This resulted in an optimized diagnosis that is at the same time cost-effective. Clinicians can thus order CSF testing using a data-driven, individualized approach. By visualizing the effect of simulated CSF on diagnostic certainty, clinicians can be supported in making a well-weighed decision on ordering additional biomarker testing.
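As an illustration only, the selection logic summarized above can be sketched in a few lines of Python. This is one plausible reading of the approach, not the classifier used in this study: the function names, the `classify` callback and the toy classifier are hypothetical, and the highest class probability is assumed as a stand-in for PCC, since the true class is unknown at the time of ordering. Only the 0.80 confidence threshold is taken from this paper.

```python
from typing import Callable, Dict, Optional

PCC_THRESHOLD = 0.80  # "sufficient confidence" cutoff used in this paper

# Hypothetical classifier interface: patient features plus an optional
# simulated CSF profile ("positive", "negative" or None) -> class probabilities.
Classifier = Callable[[Dict[str, float], Optional[str]], Dict[str, float]]


def top_probability(probs: Dict[str, float]) -> float:
    """Highest class probability (assumed proxy for PCC)."""
    return max(probs.values())


def recommend_csf(features: Dict[str, float],
                  classify: Classifier,
                  threshold: float = PCC_THRESHOLD) -> bool:
    """Recommend CSF only if confidence is insufficient without it and at
    least one simulated CSF outcome would make it sufficient."""
    if top_probability(classify(features, None)) >= threshold:
        return False  # already confident without CSF
    return any(
        top_probability(classify(features, csf)) >= threshold
        for csf in ("positive", "negative")
    )


def toy_classifier(features: Dict[str, float],
                   csf: Optional[str]) -> Dict[str, float]:
    """Toy stand-in: a simulated CSF result shifts the AD probability by 0.25."""
    p_ad = features["p_ad_without_csf"]
    if csf == "positive":
        p_ad = min(1.0, p_ad + 0.25)
    elif csf == "negative":
        p_ad = max(0.0, p_ad - 0.25)
    return {"AD": p_ad, "non-AD": 1.0 - p_ad}


if __name__ == "__main__":
    for p in (0.9, 0.6, 0.5):
        decision = recommend_csf({"p_ad_without_csf": p}, toy_classifier)
        print(f"P(AD) without CSF = {p:.2f} -> recommend CSF: {decision}")
```

In this toy setting, a patient who is already classified with sufficient confidence is not referred for CSF, and neither is a patient for whom neither simulated CSF outcome would reach the threshold; only patients in between are selected.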
S1 Fig. Accuracy for different probability of correct class cutoffs, comparing computerized decision support, no CSF, AUC and CSF for all patients.
PCC: probability of correct class
Solid lines show results for the computerized decision support (Fig 1A), dotted lines show results for using no CSF, but only neuropsychology, MRI and APOE (Fig 1B), dash-dotted lines show results for AUC (Fig 1C) and dashed lines show results for using all data (Fig 1D).
Research of the Alzheimer Center Amsterdam is part of the neurodegeneration research program of Amsterdam Neuroscience. The Alzheimer Center Amsterdam is supported by Stichting Alzheimer Nederland and Stichting VUmc fonds. The clinical database structure was developed with funding from Stichting Dioraphte. Wiesje M van der Flier holds the Pasman chair.
- 1. 2016 Alzheimer's disease facts and figures. Alzheimers Dement. 2016;12(4):459–509. Epub 2016/08/30. pmid:27570871.
- 2. Jack CR Jr., Bennett DA, Blennow K, Carrillo MC, Dunn B, Haeberlein SB, et al. NIA-AA Research Framework: Toward a biological definition of Alzheimer's disease. Alzheimers Dement. 2018;14(4):535–62. Epub 2018/04/15. pmid:29653606.
- 3. Mattsson N, Lonneborg A, Boccardi M, Blennow K, Hansson O. Clinical validity of cerebrospinal fluid Abeta42, tau, and phospho-tau as biomarkers for Alzheimer's disease in the context of a structured 5-phase development framework. Neurobiol Aging. 2017;52:196–213. Epub 2017/03/21. pmid:28317649.
- 4. McKhann GM, Knopman DS, Chertkow H, Hyman BT, Jack CR Jr., Kawas CH, et al. The diagnosis of dementia due to Alzheimer's disease: recommendations from the National Institute on Aging-Alzheimer's Association workgroups on diagnostic guidelines for Alzheimer's disease. Alzheimers Dement. 2011;7(3):263–9. S1552-5260(11)00101-4 [pii]; pmid:21514250
- 5. Frisoni GB, Boccardi M, Barkhof F, Blennow K, Cappa S, Chiotis K, et al. Strategic roadmap for an early diagnosis of Alzheimer's disease based on biomarkers. Lancet Neurol. 2017;16(8):661–76. Epub 2017/07/20. pmid:28721928.
- 6. Shaw LM, Arias J, Blennow K, Galasko D, Molinuevo JL, Salloway S, et al. Appropriate use criteria for lumbar puncture and cerebrospinal fluid testing in the diagnosis of Alzheimer's disease. Alzheimers Dement. 2018;14(11):1505–21. Epub 2018/10/15. pmid:30316776.
- 7. Shortliffe EH, Sepulveda MJ. Clinical Decision Support in the Era of Artificial Intelligence. JAMA. 2018;320(21):2199–200. Epub 2018/11/07. pmid:30398550.
- 8. Mattila J, Koikkalainen J, Virkki A, Simonsen A, van GM, Waldemar G, et al. A disease state fingerprint for evaluation of Alzheimer's disease. J Alzheimers Dis. 2011;27(1):163–76. KG54325631131N10 [pii]; pmid:21799247
- 9. Koikkalainen J, Rhodius-Meester H, Tolonen A, Barkhof F, Tijms B, Lemstra AW, et al. Differential diagnosis of neurodegenerative diseases using structural MRI data. Neuroimage Clin. 2016;11:435–49. S2213-1582(16)30040-7 [pii]. pmid:27104138
- 10. Tolonen A, Rhodius-Meester HFM, Bruun M, Koikkalainen J, Barkhof F, Lemstra AW, et al. Data-Driven Differential Diagnosis of Dementia Using Multiclass Disease State Index Classifier. Front Aging Neurosci. 2018;10:111. pmid:29922145
- 11. Bruun M, Rhodius-Meester HFM, Koikkalainen J, Baroni M, Gjerum L, Lemstra AW, et al. Evaluating combinations of diagnostic tests to discriminate different dementia types. Alzheimers Dement (Amst). 2018;10:509–18. Epub 2018/10/16. pmid:30320203; PubMed Central PMCID: PMC6180596.
- 12. Rhodius-Meester HFM, Liedes H, Koikkalainen J, Wolfsgruber S, Coll-Padros N, Kornhuber J, et al. Computer-assisted prediction of clinical progression in the earliest stages of AD. Alzheimers Dement (Amst). 2018;10:726–36. Epub 2019/01/09. pmid:30619929; PubMed Central PMCID: PMC6310913.
- 13. Bruun M, Frederiksen KS, Rhodius-Meester HFM, Baroni M, Gjerum L, Koikkalainen J, et al. Impact of a Clinical Decision Support Tool on Dementia Diagnostics in Memory Clinics: The PredictND Validation Study. Current Alzheimer research. 2019;16(2):91–101. Epub 2019/01/04. pmid:30605060.
- 14. van der Flier WM, Pijnenburg YA, Prins N, Lemstra AW, Bouwman FH, Teunissen CE, et al. Optimizing patient care and research: the Amsterdam Dementia Cohort. J Alzheimers Dis. 2014;41(1):313–27. LR61J34616453267 [pii]; pmid:24614907
- 15. van der Flier WM, Scheltens P. Amsterdam Dementia Cohort: Performing Research to Optimize Care. J Alzheimers Dis. 2018;62(3):1091–111. Epub 2018/03/23. pmid:29562540.
- 16. Rascovsky K, Hodges JR, Knopman D, Mendez MF, Kramer JH, Neuhaus J, et al. Sensitivity of revised diagnostic criteria for the behavioural variant of frontotemporal dementia. Brain. 2011;134(Pt 9):2456–77. awr179 [pii]; pmid:21810890
- 17. Gorno-Tempini ML, Hillis AE, Weintraub S, Kertesz A, Mendez M, Cappa SF, et al. Classification of primary progressive aphasia and its variants. Neurology. 2011;76(11):1006–14. WNL.0b013e31821103e6 [pii]; pmid:21325651
- 18. Roman GC, Tatemichi TK, Erkinjuntti T, Cummings JL, Masdeu JC, Garcia JH, et al. Vascular dementia: diagnostic criteria for research studies. Report of the NINDS-AIREN International Workshop. Neurology. 1993;43(2):250–60. pmid:8094895
- 19. Folstein MF, Folstein SE, McHugh PR. "Mini-mental state". A practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res. 1975;12(3):189–98. 0022-3956(75)90026-6 [pii]. pmid:1202204
- 20. Schmand B, Houx P, de Koning I. Normen van psychologische tests voor gebruik in de klinische neuropsychologie (in Dutch). Nederlands Instituut van Psychologen. 2012;www.psynip.nl/website/sectoren-en-secties/sector-gezondheidszorg/neuropsychologie.
- 21. Reitan R. Validity of the Trail Making Test as an indicator of organic brain damage. Percept Mot Skills. 1958;8:271–6.
- 22. Van der Elst W, Van Boxtel MP, Van Breukelen GJ, Jolles J. Normative data for the Animal, Profession and Letter M Naming verbal fluency tests for Dutch speaking participants and the effects of age, education, and sex. J Int Neuropsychol Soc. 2006;12(1):80–9. S1355617706060115 [pii]; pmid:16433947
- 23. Cummings JL, Mega M, Gray K, Rosenberg-Thompson S, Carusi DA, Gornbein J. The Neuropsychiatric Inventory: comprehensive assessment of psychopathology in dementia. Neurology. 1994;44(12):2308–14. pmid:7991117
- 24. Scheltens P, Leys D, Barkhof F, Huglo D, Weinstein HC, Vermersch P, et al. Atrophy of medial temporal lobes on MRI in "probable" Alzheimer's disease and normal ageing: diagnostic value and neuropsychological correlates. J Neurol Neurosurg Psychiatry. 1992;55(10):967–72. pmid:1431963
- 25. Koikkalainen JR, Rhodius-Meester HFM, Frederiksen KS, Bruun M, Hasselbalch SG, Baroni M, et al. Automatically computed rating scales from MRI for patients with cognitive disorders. European radiology. 2019. Epub 2019/02/24. pmid:30796570.
- 26. Lotjonen JM, Wolz R, Koikkalainen JR, Thurfjell L, Waldemar G, Soininen H, et al. Fast and robust multi-atlas segmentation of brain magnetic resonance images. Neuroimage. 2010;49(3):2352–65. Epub 2009/10/28. pmid:19857578.
- 27. Ashburner J, Friston KJ. Voxel-based morphometry—the methods. Neuroimage. 2000;11(6 Pt 1):805–21. Epub 2000/06/22. pmid:10860804.
- 28. Vos SJ, Visser PJ, Verhey F, Aalten P, Knol D, Ramakers I, et al. Variability of CSF Alzheimer's disease biomarkers: implications for clinical practice. PLoS One. 2014;9(6):e100784. PONE-D-14-16936 [pii]. pmid:24959687
- 29. Bruun M, Koikkalainen J, Rhodius-Meester HFM, Baroni M, Gjerum L, van Gils M, et al. Detecting frontotemporal dementia syndromes using MRI biomarkers. NeuroImage Clinical. 2019;22:101711. Epub 2019/02/12. pmid:30743135; PubMed Central PMCID: PMC6369219.
- 30. Wang Y, Catindig JA, Hilal S, Soon HW, Ting E, Wong TY, et al. Multi-stage segmentation of white matter hyperintensity, cortical and lacunar infarcts. Neuroimage. 2012;60(4):2379–88. Epub 2012/03/06. pmid:22387175.
- 31. Buckner RL, Head D, Parker J, Fotenos AF, Marcus D, Morris JC, et al. A unified approach for morphometric and functional data analysis in young, old, and demented adults using automated atlas-based head size normalization: reliability and validation against manual measurement of total intracranial volume. Neuroimage. 2004;23(2):724–38. Epub 2004/10/19. pmid:15488422.
- 32. Cole TJ, Green PJ. Smoothing reference centile curves: the LMS method and penalized likelihood. Statistics in medicine. 1992;11(10):1305–19. Epub 1992/07/01. pmid:1518992.
- 33. Girdler-Brown BV, Dzikiti LN. Hypothesis tests for the difference between two population proportions using Stata. Southern African Journal of Public Health. 2018;2(3):63–8. http://dx.doi.org/10.7196/SHS.2018.v2.i3.71.
- 34. Cluitmans L, Mattila J, Runtti H, van Gils M, Lotjonen J. A MATLAB toolbox for classification and visualization of heterogenous multi-scale human data using the Disease State Fingerprint method. Stud Health Technol Inform. 2013;189:77–82. Epub 2013/06/07. pmid:23739361.
- 35. Da X, Toledo JB, Zee J, Wolk DA, Xie SX, Ou Y, et al. Integration and relative value of biomarkers for prediction of MCI to AD progression: spatial patterns of brain atrophy, cognitive scores, APOE genotype and CSF biomarkers. NeuroImage Clinical. 2014;4:164–73. Epub 2013/12/29. pmid:24371799; PubMed Central PMCID: PMC3871290.
- 36. Visser LNC, Kunneman M, Murugesu L. Clinician-patient communication during the diagnostic work-up: the ABIDE project. Alzheimer's & Dementia: The Journal of the Alzheimer's Association. 2019;11:520–8.
- 37. Prestia A, Caroli A, Wade SK, van der Flier WM, Ossenkoppele R, Van BB, et al. Prediction of AD dementia by biomarkers following the NIA-AA and IWG diagnostic criteria in MCI patients from three European memory clinics. Alzheimers Dement. 2015. S1552-5260(14)02890-8 [pii]; pmid:25646957
- 38. Insel PS, Palmqvist S, Mackin RS, Nosheny RL, Hansson O, Weiner MW, et al. Assessing risk for preclinical beta-amyloid pathology with APOE, cognitive, and demographic information. Alzheimers Dement (Amst). 2016;4:76–84. Epub 2016/10/11. pmid:27722193; PubMed Central PMCID: PMC5045949.
- 39. Verberk I, Slot RE, Verfaillie SC, Heijst H, Prins ND, Van Berckel B, et al. Plasma-amyloid as pre-screener for the earliest Alzheimer's pathological changes. Ann Neurol. 2018;in press.
- 40. Palmqvist S, Insel PS, Zetterberg H, Blennow K, Brix B, Stomrud E, et al. Accurate risk estimation of beta-amyloid positivity to identify prodromal Alzheimer's disease: Cross-validation study of practical algorithms. Alzheimers Dement. 2019;15(2):194–204. Epub 2018/10/27. pmid:30365928; PubMed Central PMCID: PMC6374284.
- 41. Zekry D, Hauw JJ, Gold G. Mixed dementia: epidemiology, diagnosis, and treatment. J Am Geriatr Soc. 2002;50(8):1431–8. Epub 2002/08/08. pmid:12165002.
- 42. Zwan M, van Harten A, Ossenkoppele R, Bouwman F, Teunissen C, Adriaanse S, et al. Concordance between cerebrospinal fluid biomarkers and [11C]PIB PET in a memory clinic cohort. J Alzheimers Dis. 2014;41(3):801–7. Epub 2014/04/08. pmid:24705549.
- 43. van Maurik IS, Zwan MD, Tijms BM, Bouwman FH, Teunissen CE, Scheltens P, et al. Interpreting Biomarker Results in Individual Patients With Mild Cognitive Impairment in the Alzheimer's Biomarkers in Daily Practice (ABIDE) Project. JAMA Neurol. 2017;74(12):1481–91. Epub 2017/10/20. pmid:29049480; PubMed Central PMCID: PMC5822193.
- 44. Van der Flier W, Kunneman M, Bouwman FH, Petersen R, Smets EMA. Diagnostic dilemmas in Alzheimer's disease: Room for shared decision making. Alzheimers Dement. 2017;3(3):301–4. http://dx.doi.org/10.1016/j.trci.2017.03.008.
- 45. Stiggelbout AM, Pieterse AH, De Haes JC. Shared decision making: Concepts, evidence, and practice. Patient education and counseling. 2015;98(10):1172–9. Epub 2015/07/29. pmid:26215573.
- 46. Grill JD, Apostolova LG, Bullain S, Burns JM, Cox CG, Dick M, et al. Communicating mild cognitive impairment diagnoses with and without amyloid imaging. Alzheimers Res Ther. 2017;9(1):35. Epub 2017/05/06. pmid:28472970; PubMed Central PMCID: PMC5418690.
- 47. Kunneman M, Pel-Littel R, Bouwman FH, Gillissen F, Schoonenboom NSM, Claus JJ, et al. Patients' and caregivers' views on conversations and shared decision making in diagnostic testing for Alzheimer's disease: The ABIDE project. Alzheimer's & dementia (New York, N Y). 2017;3(3):314–22. Epub 2017/10/27. pmid:29067338; PubMed Central PMCID: PMC5651429.