Evaluating trajectories of episodic memory in normal cognition and mild cognitive impairment: Results from ADNI

Background Memory assessment is a key factor for the diagnosis of cognitive impairment. However, memory performance over time may be quite heterogeneous within diagnostic groups. Method To identify latent trajectories in memory performance and their associated risk factors, we analyzed data from Alzheimer’s Disease Neuroimaging Initiative (ADNI) participants who were classified either as cognitively normal or as having Mild Cognitive Impairment (MCI) at baseline and were administered the Rey Auditory Verbal Learning Test (RAVLT) for up to 9 years. Group-based trajectory modeling on the 30-minute RAVLT delayed recall score was applied separately to the two baseline diagnostic groups. Results The analysis included 219 normal subjects with mean age 75.9 (range 59.9 to 89.6), 52.5% of whom were male, and 372 MCI subjects with mean age 74.8 (range 55.1 to 89.3), 63.7% of whom were male. For normal subjects, six trajectories were identified. Trajectories were classified by shape into three types, each of which may comprise more than one trajectory: stable (~30% of subjects), curvilinear decline (~28%), and linear decline (~42%). Notably, none of the normal subjects assigned to the stable stratum progressed to dementia during the study period. In contrast, all trajectories identified for the MCI group tended to decline, although some participants were later re-diagnosed with normal cognition. Age, sex, and education were significantly associated with trajectory membership for both diagnostic groups, while APOE ɛ4 was only significantly associated with trajectories among MCI participants. Conclusion Memory trajectory is a strong indicator of dementia risk. If the likely trajectory of memory performance can be identified early, such work may allow clinicians to monitor or predict progression of individual patient cognition. This work also shows the importance of longitudinal cognitive testing and monitoring.

Introduction From a clinical and research perspective, an individual's cognition may be categorized as unimpaired (normal cognition), mildly impaired (mild cognitive impairment or MCI), or moderately to severely impaired (dementia). Over time, those with normal cognition remain stable or decline to MCI or dementia. Similarly, those with MCI may remain stable (about 61% based on Wolf's study), progress to dementia (annual progression rate of about 9.6% in specialist settings and about 4.9% in community settings), or revert to normal (about 19.5%) [1][2][3][4]. Therefore, examining potential trajectories within populations and identifying individuals who are likely to follow particular cognitive trajectories could inform early diagnosis and predict progression. However, most analyses of cognitive test data, such as those based on linear mixed models, only provide a mean score to describe the average change for the study population over the follow-up time [5][6][7]. The different underlying developmental courses within certain diagnostic groups are poorly understood due to the limitations of these statistical methods. Group-Based Trajectory Modeling (GBTM), developed by Nagin [8,9], provides a solution for this issue. It assumes that the underlying population (such as people with MCI) is a mixture of at least two latent subgroups. Individuals in each latent subgroup follow a similar trajectory over time. Results of this analysis provide estimated longitudinal trajectories, sometimes called "developmental trajectories", and the procedure provides the estimated proportion of each subgroup following the same latent trajectory.
Memory assessment in neuropsychological testing is one of the key elements in the diagnosis of MCI and dementia [10]. One of the most commonly used tests for verbal memory assessment is the Rey Auditory Verbal Learning Test (RAVLT) [11], which is designed to evaluate episodic memory in persons aged 16 and older [12]. The RAVLT provides measures of immediate memory span, learning, and delayed recall, so the severity of memory dysfunction and changes over time can be evaluated. For instance, MCI patients show poorer learning than 'recovered' MCI and healthy control groups [13]. The RAVLT is easily administered, so researchers often prefer it to other list learning tests, especially under conditions of limited assessment time [14]. RAVLT performance is influenced by subjects' demographic characteristics, including age, education, and sex [15]. The RAVLT delayed recall score has been reported to have adequate discrimination in older adults with normal cognition vs. MCI (AUC = 0.71) and good discrimination for normal cognition vs. dementia (AUC = 0.93) [16].
Poor performance on the test is considered a prognostic marker for MCI and dementia [17]. Zhao et al.'s [18] study shows that the RAVLT performs better than the Complex Figure Test (CFT) for predicting progression from MCI to AD, and data from the Canadian Study of Health and Aging demonstrate that RAVLT short delayed recall may be used to predict incident dementia (sensitivity = 78%, specificity = 72%, positive likelihood ratio = 2.81 when combined with Wechsler Adult Intelligence Scale-Revised (WAIS-R) Digit Symbol) [19]. In the Gothenburg MCI study [20], neuropsychological tests including the RAVLT, along with hippocampal volume and cerebrospinal fluid markers, were used to predict progression from MCI to dementia within a follow-up time of two years. The authors found that a combination of all markers was the most successful in predicting dementia, but the RAVLT was the best individual predictor (AUC = 0.93) for dementia. The RAVLT has also been used to distinguish AD from other types of dementia [21].
In this analysis, we explored latent trajectories of episodic memory using GBTM and longitudinal RAVLT measures within two diagnostic groups: Alzheimer's Disease Neuroimaging Initiative Phase 1 (ADNI1) subjects with a diagnosis of normal cognition or MCI at study baseline. Also, we investigated whether trajectory membership predicted incident dementia.

Sample and data sources
Data were downloaded from the ADNI database (http://adni.loni.usc.edu/) on June 3, 2015. The primary goal of the ADNI project is to obtain and assess clinical, imaging, genetic and biospecimen biomarkers related to the development and progression of AD and to develop treatments that may slow the progression of AD [22]. More information can be found at www.adni-info.org.
Because our interest is focused on longitudinal change, our analysis was limited to ADNI1 participants, who have the longest follow-up. During ADNI1, which began recruiting participants in 2004, 400 MCI participants, 200 participants with early AD, and 200 control participants, all aged 55-90 years, were targeted for recruitment at 50 study sites across North America (actual enrollment: 397 MCI participants, 189 early AD, and 229 normal control participants). They were followed up at regular intervals from study baseline. Baseline MCI subjects were followed up every six months for the first three years and then yearly after that. Baseline normal subjects were followed up every six months for the first year and then yearly after that. ADNI1 has the following inclusion criteria for all subjects: 1) Hachinski Ischemic Score less than or equal to 4; 2) Age between 55 and 90; 3) Geriatric Depression Scale less than 6; 4) Visual and auditory acuity adequate for neuropsychological testing; 5) Good general health with no diseases precluding enrollment; 6) At least a 6th-grade education. Participants were classified as normal cognition or MCI based on criteria in Table 1 (more details can be found on the ADNI website: http://adni.loni.usc.edu/).
All ADNI research activities were approved by Institutional Review Boards (IRB) at the participating study sites, and all participants provided written informed consent. The University of Kentucky IRB declared this secondary analysis of ADNI data exempt since the ADNI data are de-identified.

Inclusion and exclusion criteria
All analyses for the current study were based on participants diagnosed with MCI or normal cognition who enrolled in ADNI1 and had any follow-up visits in ADNI1, ADNIGO, or ADNI2. Fourteen participants (1 American Indian, 12 Asian, and 1 of more than one race) were excluded because their numbers were too small for further analysis. Twenty-one participants with only one visit were also removed from the analyses, which resulted in 591 total participants for analysis: 219 normal participants and 372 MCI participants. No statistically significant differences in baseline age, sex, education, or baseline MMSE total score were found between included and excluded participants.

Rey auditory verbal learning test (RAVLT)
The RAVLT is a list-learning task that measures auditory verbal memory [23]. The RAVLT is conducted using two 15-item lists of unrelated words (List A and List B) that are read to the participant in a series of trials. To begin, List A is read to the participant, the participant is asked to repeat as many of the 15 words as they can, and the number of correct words is recorded. This procedure is repeated in another four trials, which results in 5 learning trial scores. Then the examiner reads the second list of 15 words (List B) to the participant, and the participant is asked to recall as many of the words in List B as possible. Next, the participant is again asked to recall the words in List A, and the number of words correctly recalled (immediate recall score) is recorded. The participant is then given different tasks to do for 30 minutes. After 30 minutes, the participant is asked again to recall as many words as they can from List A, and the number of correct words (30-minute delayed recall) is recorded. Last, the participant is asked to recognize the words in List A when presented with a sheet containing the 15 List A words plus 15 distractor words, and the examiner records the number of successes (recognition score). In the current study, the 30-minute delayed recall score, which ranges from 0 to 15 [24], is the outcome of interest.

Covariates
Covariates included APOE genotype, baseline age, race, sex, years of education, smoking information, and body mass index (BMI), as well as self-reported indicators of cardiovascular disease risk (i.e., diabetes and hypertension) and sleep apnea. APOE genotype, which has been shown to be associated with cognitive trajectory [25], was available for all 591 participants (ε2/2: 2 (0.34%); ε2/3: 46 (7.78%); ε2/4: 13 (2.20%); ε3/3: 281 (47.55%); ε3/4: 199 (33.67%); ε4/4: 50 (8.46%)). The genotypes were converted to a dummy indicator for a carrier of at least one ε4 allele. Age at baseline was calculated based on the participant's birthdate and visit date. Race was coded as a dummy variable: 0 (Black) and 1 (White). Similarly, smoking was coded as 0 (non-smoker) and 1 (current smoker). Since ADNI collects medical history as single-field text strings (variable "mhdesc"), the self-reported status of hypertension, diabetes, and sleep apnea was extracted by searching for keywords. For example, participants with sleep apnea were identified by first converting all "mhdesc" text string values to uppercase and then searching for the text string "SLEEP" to find participants who reported sleep problems. Each identified case was then checked individually to confirm sleep apnea. A similar procedure was conducted for the status of hypertension (keywords: "HYPERTENSION," "HIGH BLOOD PRESSURE") and diabetes (keyword: "DIABETES"). Misspelled conditions in the raw data were identified when each value was checked. These three variables were coded as dummy variables (0 = not reported and 1 = reported).
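The keyword screen and the ε4 carrier indicator described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the genotype string format ("3/4") are hypothetical, and the actual extraction was followed by a manual check of every flagged record, which no code here replaces.

```python
# Keywords named in the text; the dictionary layout itself is an assumption.
KEYWORDS = {
    "hypertension": ["HYPERTENSION", "HIGH BLOOD PRESSURE"],
    "diabetes": ["DIABETES"],
    "sleep": ["SLEEP"],  # broad screen only; sleep apnea needs manual confirmation
}


def flag_conditions(mhdesc):
    """Return 0/1 candidate flags for one free-text medical history string."""
    text = (mhdesc or "").upper()  # uppercase first, as described in the text
    return {
        name: int(any(keyword in text for keyword in keywords))
        for name, keywords in KEYWORDS.items()
    }


def is_e4_carrier(genotype):
    """Return 1 if a genotype string such as '3/4' has at least one e4 allele."""
    return int("4" in genotype)


flags = flag_conditions("High blood pressure since 1990; sleep apnea, uses CPAP")
carrier = is_e4_carrier("3/4")
```

A simple substring search will miss misspelled entries, which is consistent with the text's note that misspellings were only caught when each value was reviewed by hand.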

Statistical analysis
Analyses were conducted as follows. First, baseline differences in characteristics between normal and MCI participants were assessed with the chi-square test, t test, or Mann-Whitney test. Second, GBTM was applied to identify latent longitudinal trajectories of RAVLT 30-minute delayed recall scores for normal and MCI participants separately. For implementing GBTM, the mean level of the outcome was modeled first as a function of time, and latent groups were identified; the proportion of the population that follows each latent trajectory was estimated based on posterior probabilities of group membership. Each individual was assigned to the latent group for which their posterior probability of membership was highest. Next, we compared individuals' cognitive status at enrollment and cognitive status at the end of follow-up by each trajectory. Finally, we also examined how the probability of trajectory group membership varied with covariates versus an arbitrary reference trajectory group.
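The maximum-posterior assignment rule can be illustrated with a short sketch. It assumes the standard finite-mixture form of Bayes' rule, in which each group's posterior probability is proportional to its estimated proportion times the likelihood of the individual's observed scores under that group's trajectory; the numbers below are made up for illustration and the function names are hypothetical, not part of PROC TRAJ.

```python
def posterior_probabilities(group_proportions, likelihoods):
    """Bayes' rule for one individual: P(group j | data) for every group j.

    group_proportions: estimated share of the population in each group.
    likelihoods: likelihood of this individual's scores under each group.
    """
    weighted = [p * l for p, l in zip(group_proportions, likelihoods)]
    total = sum(weighted)
    return [w / total for w in weighted]


def assign_group(posteriors):
    """Index of the group with the maximum posterior probability."""
    return max(range(len(posteriors)), key=lambda j: posteriors[j])


# Illustrative values only (three groups, one individual).
posteriors = posterior_probabilities([0.5, 0.3, 0.2], [0.01, 0.04, 0.01])
group = assign_group(posteriors)
```

Here the second group wins even though it is not the largest, because the individual's scores are much more likely under its trajectory.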
To identify the best fitting GBTM models, various models were fitted for 2 to 6 trajectories (inclusive) [6] and all combinations of orders (quadratic was the highest order) of each group. All covariates were included and fixed at baseline. Bayesian Information Criterion (BIC) was applied to select the optimal number of groups and orders [8]. Then, log-likelihood ratio tests were applied to reduce the number of covariates in the model. GBTM accommodates different types of outcome data including count, psychometric scale, and dichotomous data [9]. Based on histograms of the outcome in each subsample (see Fig 1), we assumed that the 30-minute delayed recall scores followed a censored (i.e., bounded) normal distribution for the normal group and a zero-inflated Poisson (ZIP) distribution for the MCI group, which showed evidence of excess zero scores. In the normal group, the 30-minute delayed recall score was standardized by subtracting the baseline sample mean (7.5) and dividing by the sample standard deviation (3). The estimated scores were transformed back to the original scale when the figures (Figs 2 and 3) were plotted. For simplicity in the ZIP model for the MCI group, we assumed that the probability for the excess zero generating process was common to all trajectory groups and constant over time.
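Two of the modeling choices above lend themselves to a small numerical sketch: the standardization of scores in the normal group (using the reported mean of 7.5 and standard deviation of 3) and the zero-inflated Poisson form assumed for the MCI group. The code below uses the textbook ZIP probability mass function with a single excess-zero probability; the function names and example parameter values are assumptions, and nothing here reproduces the fitted models.

```python
import math

BASELINE_MEAN = 7.5  # baseline sample mean reported in the text
BASELINE_SD = 3.0    # sample standard deviation reported in the text


def standardize(score):
    """Map a raw 30-minute delayed recall score to the standardized scale."""
    return (score - BASELINE_MEAN) / BASELINE_SD


def unstandardize(z):
    """Map a standardized value back to the original 0-15 score scale."""
    return z * BASELINE_SD + BASELINE_MEAN


def zip_pmf(k, lam, p_zero):
    """Zero-inflated Poisson probability of observing count k.

    lam: Poisson mean of the count process.
    p_zero: probability of the excess-zero process (held common to all
            groups and constant over time in the text's simplification).
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return p_zero + (1.0 - p_zero) * poisson
    return (1.0 - p_zero) * poisson


z = standardize(10.5)
raw = unstandardize(z)
prob_zero = zip_pmf(0, 2.0, 0.3)  # illustrative parameter values
```

The zero probability under the ZIP form exceeds the plain Poisson value whenever p_zero is positive, which is how the model absorbs the excess zero scores seen in the MCI histogram.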
The final fitted models provide descriptive information on the estimated groups, which include: (1) posterior probabilities of an individual belonging to one of the identified groups, (2) the proportion of each potential trajectory group following the same latent trajectory, (3) regression parameters to define the shape of the trajectories over time (intercept only, linear, and quadratic in the present study), and (4) risk and protective factors associated with membership in a trajectory group. All data were analyzed using PC-SAS 9.4 (SAS Institute, Inc., Cary, NC), and 0.05 was set as the significance level. Group trajectory analyses were carried out using the procedure PROC TRAJ [5].

Table 2 presents the characteristics of participants overall and by cognitive status at baseline. Baseline normal participants (n = 219) were followed up longer than MCI participants (n = 372) (p < 0.001). Normal participants were older (p = 0.039), more highly educated (p = 0.049), more likely to be female (p = 0.007), and had higher BMI than MCI participants (p = 0.037). Normal participants comprised fewer APOE-ε4 allele carriers and participants with sleep apnea than the MCI group (p < 0.001). Over 91% of participants had at least three examinations. There were 2476 total observations from MCI participants and 1541 observations from normal participants.

Potential groups identified from cognitive normal and MCI by GBTM, respectively
GBTM was applied to test how many distinct trajectory groups fit the normal and MCI samples in this study, separately. Six latent profiles were identified for normal subjects (Fig 2), while five profiles were identified for MCI participants (Fig 3) based on the BIC values among the candidate trajectory models. Table 3 shows detailed descriptions of the trajectories for normal and MCI participants, including the shape of each group trajectory and the number of probable members, parameter estimates of trajectories, and mean and standard deviation of posterior probabilities. As shown in Table 3, for all six normal groups and all five MCI groups, the averages of the posterior membership probabilities were greater than 0.7, which indicates that the models are acceptable based on Nagin's 'rule of thumb' for the minimum average posterior probability [26]. To facilitate discussion, trajectories were labeled based on baseline cognitive diagnosis and the baseline mean score of 30-minute delayed recall (Figs 2 and 3, and Table 3). For example, 'Norm 6.9' means the trajectory for normal subjects with average 30-minute delayed recall = 6.9 at baseline. We further describe the six trajectories for normal subjects as three types based on their progress over time: stable (Norm 12.9 and Norm 6.9), curvilinear decline (Norm 9.4 and Norm 9.1), and linear decline (Norm 3.3 and Norm 6.2) (Fig 2). The Norm 12.9 (n = 22) and Norm 6.9 (n = 44) groups, which account for about 30% of normal participants in the sample, remained relatively stable over nine years of follow-up. Both Norm 9.4 and Norm 9.1 present curvilinear change over time, indicated by the quadratic term in the model (Table 3), but with different rates of decline at different times. Norm 9.1 (n = 30) showed a slow curvilinear decline during the first four years of follow-up and faster decline after four years, and Norm 9.4 (n = 31) revealed mild curvilinear decline (Fig 2). The individuals in Norm 3.3 and Norm 6.2 demonstrate linear decline over time (Fig 2). Notably, some of the groups are differentiated primarily by their intercepts, such as Norm 12.9, Norm 6.9, Norm 6.2, and Norm 3.3, which might suggest that participants assigned to the groups with low baseline means (e.g., Norm 3.3 and Norm 6.2) were misclassified at the time they enrolled in ADNI. In contrast to the groups identified for normal participants, all potential trajectory groups for MCI participants showed a tendency to decline, except MCI 0.0, which starts near and stays around "0" (floor effect) (Fig 3).
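The average posterior probability diagnostic mentioned above is straightforward to compute once each individual's posterior probabilities are available. The sketch below is an assumption-labeled illustration: the input is a hypothetical list of per-individual posterior vectors, each individual is assigned to the group with the highest posterior, and the mean of that highest value is taken within each group and compared with the 0.7 rule of thumb.

```python
from collections import defaultdict

MIN_AVERAGE_POSTERIOR = 0.7  # rule-of-thumb threshold cited in the text


def average_posterior_by_group(posterior_rows):
    """Mean posterior probability of assigned members, keyed by group index."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in posterior_rows:
        # Assign the individual to the group with the largest posterior.
        group = max(range(len(row)), key=lambda j: row[j])
        sums[group] += row[group]
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}


# Made-up posteriors for three individuals and two groups.
rows = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6]]
averages = average_posterior_by_group(rows)
acceptable = all(v > MIN_AVERAGE_POSTERIOR for v in averages.values())
```

In this toy input the second group falls below the threshold, which is the kind of result that would argue against a candidate model.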

Comparison between cognitive status at enrollment and cognitive status at the end of follow-up for individuals in each trajectory
Since the baseline cognitive status was evaluated in ADNI1 without regard to RAVLT assessments, we were able to investigate whether the development of each trajectory predicts dementia by calculating the proportion of participants with each cognitive status at the end of follow-up by trajectory (Table 4) within baseline normal and MCI participants, respectively. The majority of normal participants assigned to Norm 6.9 and Norm 12.9, who should be stable as shown in the trajectory, remained cognitively normal over nine years of follow-up; only 5 of the 66 participants in these two groups, all assigned to Norm 6.9, progressed to MCI status (Table 4). No participants in Norm 6.9 or Norm 12.9 progressed to dementia by the end of follow-up. Members of Norm 3.3 and Norm 9.1 were most likely to develop dementia by the end of follow-up (18% and 17% of group members, respectively), which demonstrates that baseline scores alone often poorly predict future cognitive status.
Similarly, for baseline MCI, participants in MCI 1.5 (n = 143) and MCI 0.0 (n = 66) were most likely to develop dementia by the end of follow-up, with over 70% of each group progressing (Table 4). Participants in MCI 3.3 (n = 66) had a slightly better chance to remain in MCI (52%) than develop dementia (48%), while the majority of participants in MCI 5.6 (70%) and MCI 10.1 (65%) remained MCI. Interestingly, 11% of participants in MCI 5.6 (n = 73), and 22% of participants in MCI 10.1 (n = 23) were re-diagnosed with normal cognition by the end of follow-up. Again, baseline performance was not a good predictor of future cognitive status since participants' cognitive status can convert, be stable, or progress to dementia. No participants in the groups of MCI 1.5, MCI 0.0, and MCI 3.3 were diagnosed as normal by the end of follow-up.

Risk factors associated with the probability of trajectory group membership
Analyses were also done to examine the factors that may influence group membership. Tables 5 and 6 present the parameter estimates for the risk factors associated with trajectory group membership in normal and MCI participants, respectively. The comparison groups were arbitrarily selected for both normal (Norm 9.4) and MCI (MCI 5.6) participants. Based on BIC and the log-likelihood ratio test, age, BMI, and education were retained in both the 6-group model for normal participants and the 5-group MCI model (Tables 5 and 6), while sex was only retained in the model for normal participants, and APOE ε4 was in the model only for MCI participants. Demographic variables associated with group membership among baseline normal participants (vs. Norm 9.4) included female sex (p = 0.02 for Norm 12.9), older age (p = 0.03 for Norm 6.2) and higher education (p = 0.02 for Norm 6.2, and p = 0.01 for Norm 6.9). For example, for the Norm 6.2 group among normal baseline participants, it was estimated that each additional year of education reduces the ratio of the probability of belonging to Norm 6.2 to the probability of belonging to Norm 9.4 by 22%. Similar effects were observed for Norm 3.3 vs. Norm 9.4. Presence of the APOE ε4 allele increased the probability ratio of belonging to MCI 1.5 vs. MCI 5.6 by 85%, and the probability ratio of belonging to MCI 0.0 vs. MCI 5.6 by 388%, holding other covariates in the model constant. Based on Table 6 for MCI participants, age is not significant but was kept in the model, which may suggest that age cannot distinguish the remaining groups from the reference group MCI 5.6, but it may distinguish MCI 3.3 from MCI 0.0 (data not shown). BMI was significant for MCI 1.5 (p = 0.02) and MCI 0.0 (p < 0.001); higher BMI increased the relative probability of classification into MCI 5.6.
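The percentage changes quoted above follow from exponentiating a coefficient of a multinomial logit model for group membership: a coefficient β changes the probability ratio versus the reference group by a factor of exp(β) per unit of the covariate. The sketch below only shows this conversion. The coefficients it produces are back-calculated from the percentages in the text as an illustration; they are not the estimates reported in Tables 5 and 6.

```python
import math


def percent_change_in_ratio(beta):
    """Percent change in the probability ratio (vs. reference) per unit of x."""
    return (math.exp(beta) - 1.0) * 100.0


def implied_coefficient(percent_change):
    """Logit coefficient that would produce the given percent change."""
    return math.log(1.0 + percent_change / 100.0)


# Back-calculated from the percentages in the text (illustrative only).
beta_education = implied_coefficient(-22.0)   # Norm 6.2 vs. Norm 9.4, per year
beta_apoe_mci15 = implied_coefficient(85.0)   # MCI 1.5 vs. MCI 5.6
beta_apoe_mci00 = implied_coefficient(388.0)  # MCI 0.0 vs. MCI 5.6
```

A negative coefficient therefore corresponds to a protective factor relative to the reference group, and a 388% increase corresponds to a ratio nearly five times as large.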

Discussion
In this study, six latent trajectories with three main change patterns (stable, linear decline, and curvilinear decline) were identified for baseline normal participants, while five latent trajectories were found for baseline MCI. These results demonstrate that within the same clinical diagnosis, distinct subgroups exist and may follow different developmental trajectories and experience disparate outcomes. The baseline scores that defined the trajectory groups were not a strong predictor of future cognitive status. Comparisons between cognitive status at enrollment and at the end of follow-up by trajectory verified the prognosis of these potential trajectories, which emphasizes the need for longitudinal data in making predictions about future cognition. Consistent with the findings on memory change trajectories in participants from the Australian Imaging, Biomarkers, and Lifestyle (AIBL) study [27] and in the Washington Heights Inwood Columbia Aging Project (WHICAP) [25], we identified stable and decline groups for baseline normal participants. Furthermore, our study also identified two curvilinear groups (Norm 9.1 and Norm 9.4), which account for about 28% of subjects in our sample. The Norm 9.1 group was stable during early follow-up, then showed a rapid decline in the following years. Several papers [28][29][30] described the sharp decline phenomenon, which suggested that some of those participants were initially cognitively stable but may have experienced a significant decline associated with cognitive impairment and dementia, and patients with rapid cognitive decline usually have a worse prognosis [29][30][31]. Compared to the 65.5% and 50% of participants assigned to stable groups in AIBL and WHICAP, respectively, we had proportionally fewer participants assigned to the stable groups (Norm 6.9 and Norm 12.9; about 30%). This inconsistency may be due to the larger number of trajectories identified in our study, the longer follow-up time in our analyses (9 years vs. 4.5 years in the AIBL study and six years in WHICAP), as well as different inclusion and exclusion criteria within each study.
To our knowledge, this is one of only a few studies to explore memory trajectories in MCI participants by using GBTM. Although most of the potential trajectory groups show a tendency to decline, 11% and 22% of participants in MCI 5.6 and MCI 10.1, respectively, were re-diagnosed with normal cognition at the end of follow-up, and 19% and 13% progressed to dementia, respectively, which reflects the heterogeneous outcomes often reported in MCI participants [32]. Consistent with other studies [4,6,32], our study supports the view that MCI may not be just the intermediate stage between normal cognition and dementia. The trajectories in MCI 1.5 and MCI 0.0 began with low scores, and the majority (73% in MCI 1.5 and 71% in MCI 0.0) progressed to dementia, which may indicate that the participants in these two groups were already at a late stage of MCI at enrollment. Based on our results, the rate of incident dementia from MCI may be correlated with the baseline mean of RAVLT 30-minute delayed recall: the higher the baseline mean value, the lower the incidence rate. Overall, the 9-year cumulative incidence of dementia from MCI was 53% (roughly 8% per year). The annual rate is comparable to the rate for the 5-year cumulative incidence of dementia from MCI reported in specialist centers (39%, or roughly 9% per year) [4]. Different demographic variables were associated with trajectory membership for normal and MCI participants. For normal baseline participants, older age and less education were significantly associated with being in the "linear decline" group (Norm 3.3), and participants with less education were relatively more likely to be in Norm 6.9. Being female was associated with a stable trajectory (Norm 12.9), which is shown in Table 6 (female 77%), but was inconsistent with Lin's study [33]. In baseline MCI participants, the genetic risk factor APOE-ε4 allele and/or lower BMI was associated with lower memory scores (MCI 1.5 and MCI 0.0).
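The text does not state how the cumulative incidences were converted to annual rates. One assumption that is consistent with the rounded figures is a constant annual rate that compounds over the follow-up period, so that the cumulative incidence equals 1 − (1 − r)^years. The sketch below applies that assumed formula to the two cumulative incidences quoted above; the function name is hypothetical.

```python
def annual_rate(cumulative_incidence, years):
    """Constant annual rate r solving cumulative = 1 - (1 - r) ** years."""
    return 1.0 - (1.0 - cumulative_incidence) ** (1.0 / years)


# Cumulative incidences quoted in the text.
rate_this_study = annual_rate(0.53, 9)        # 9-year incidence of 53%
rate_specialist_centers = annual_rate(0.39, 5)  # 5-year incidence of 39%
```

Under this assumption both values round to the "roughly 8%" and "roughly 9%" per year given in the text, whereas simply dividing the cumulative incidence by the number of years would give smaller figures.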

Strengths and limitations
The strengths of this study included relatively large baseline sample size (219 for normal and 372 for MCI), frequent clinical assessments, standardized diagnostic criteria for cognitive status, and standardized data collection procedure across multiple study sites. This allowed a rigorous investigation of memory trajectories and their relationship with risk or protective factors using long follow-up and multiple visits (up to 12 visits for over nine years).
One limitation of the study sample is that the participants in ADNI may not be representative of the general population of older adults in the United States. We focused only on participants from ADNI1 to obtain participants with longer follow-up, so we excluded the early MCI participants recruited in ADNIGO and the late MCI participants enrolled in ADNI2 due to insufficient follow-up. The diagnosis of MCI was made without further specifying the subtype of MCI (i.e., amnestic, nonamnestic, single domain, multiple domains). Thus, a more homogeneous set of trajectories may exist within subtypes of MCI participants. Furthermore, the specific trajectory groups defined in our analysis are not likely to generalize to other populations. In future studies, we aim to validate these trajectories using MRI or biomarker data and identify trajectories for subsets of MCI participants (i.e., early mild cognitive impairment (EMCI), late mild cognitive impairment (LMCI)).
Another limitation is the uncertainty of group membership. Even though the average posterior probability is high, the uncertainty of group assignment may lead to bias [5,34]. Also in general, although demographics and baseline scores may provide some guidance, patients cannot be assigned with accuracy to any trajectory at an initial visit but rather only after the subject has been followed for several assessments.

Conclusion
Group-based trajectory modeling can be used to identify latent subgroups of participants based on memory trajectory. The relationship between trajectory group and cognitive status at the end of follow-up confirmed that memory trajectory is an excellent indicator of dementia risk. If trajectory group membership can be identified reliably during early follow-up, such work will allow clinicians to monitor or predict progression of an individual patient's cognition. This work also shows the importance of longitudinal cognitive testing and monitoring.