Early Prediction of Alzheimer’s Disease Using Null Longitudinal Model-Based Classifiers

Incipient Alzheimer’s Disease (AD) is characterized by a slow onset of clinical symptoms, with pathological brain changes starting several years earlier. Consequently, supporting early and accurate AD diagnosis first requires understanding and differentiating age-related changes in brain regions in the absence of disease. However, the initial stage of AD is poorly understood: seemingly healthy elderly brains lose matter in regions related to AD, but similar changes are also found in non-demented subjects with mild cognitive impairment (MCI). Using a Linear Mixed Effects (LME) approach, we modelled the change of 166 Magnetic Resonance Imaging (MRI)-based biomarkers, available over a 5-year follow-up, in healthy elderly control (HC, n = 46) subjects. We hypothesized that, by identifying their significantly variant (vr) and quasi-variant (qvr) brain regions over time, it would be possible to obtain an age-based null model characterizing both their normal atrophy and growth patterns and the correlation between these two types of regions. Applying the null model to subjects clinically diagnosed as HC (n = 161), MCI (n = 209) and AD (n = 331), we estimated normal age-related changes and computed deviation scores (residuals) from the observed MRI-based biomarkers. Subject classification, as well as the early prediction of conversion to MCI and AD, was addressed through residual-based Support Vector Machine (SVM) modelling. We found reductions in most cortical volumes and thicknesses (with evident gender differences) as well as in sub-cortical regions, including greater atrophy in the hippocampus. The average accuracies (ACC) recorded for men and women were: AD-HC: 94.11%, MCI-HC: 83.77%, and MCI converted to AD (cAD) vs. MCI non-converter (sMCI): 76.72%.
Likewise, compared with standard clinical diagnosis, the SVM classifiers predicted conversion to AD (cAD) 1.9 years earlier for females (ACC: 72.5%) and 1.4 years earlier for males (ACC: 69.0%).


Sociodemographic and clinical ADNI data
APOE-ε4 carrier status has three states (0: non-carrier, 1: single-copy carrier, 2: two-copy carrier). All of these features were recorded at the screening visit of ADNI participants, and they have also been considered in other dementia studies based on the ADNI database.
CDRGLOBAL indicates the severity of dementia (0: no dementia, 0.5: very mild dementia, 1: mild dementia, 2: moderate dementia, 3: severe dementia); it is obtained with an algorithm that weights memory more heavily than the remaining five categories (orientation, judgment and problem solving, community affairs/involvement, home life and hobbies, and personal care).
MMSE and CDRGLOBAL are available for every participant visit and form the basis of the ADNI baseline clinical assessment.
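In analysis code, the two ordinal coding schemes above can be held as lookup tables. In this minimal sketch the dictionary names, the helper `apoe4_count` and its `'a/b'` genotype input format are hypothetical illustrations, not ADNI field definitions; the helper only counts ε4 alleles to obtain the 0/1/2 carrier state. The CDRGLOBAL scoring algorithm itself is not reproduced.

```python
# Lookup tables for the coding schemes described above
# (names are illustrative, not ADNI field names).
APOE4_STATES = {0: "non-carrier", 1: "single copy carrier", 2: "two copies carrier"}
CDR_GLOBAL = {
    0.0: "no dementia",
    0.5: "very mild dementia",
    1.0: "mild dementia",
    2.0: "moderate dementia",
    3.0: "severe dementia",
}


def apoe4_count(genotype: str) -> int:
    """Return the APOE-e4 carrier state (number of e4 alleles) for a genotype
    written as 'a/b', e.g. '3/4' (hypothetical input format)."""
    alleles = genotype.split("/")
    if len(alleles) != 2 or any(a not in {"2", "3", "4"} for a in alleles):
        raise ValueError(f"unexpected APOE genotype: {genotype!r}")
    return alleles.count("4")


print(apoe4_count("3/4"), APOE4_STATES[apoe4_count("3/4")])  # 1 single copy carrier
print(CDR_GLOBAL[0.5])                                        # very mild dementia
```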

PLSR modelling
By definition, after observing n data samples from each block of variables, PLSR decomposes the n × N matrix of zero-mean predictor variables $Y_{nv}$ and the n × M matrix of zero-mean response variables $Y_v$ into the form shown in Eq (1).

$$Y_{nv} = T P^{\top} + E, \qquad Y_{v} = U Q^{\top} + F \qquad (1)$$

where $\mathbf{y}_{nv} \in \mathbb{R}^{N}$ and $\mathbf{y}_{v} \in \mathbb{R}^{M}$ (the rows of $Y_{nv}$ and $Y_v$) contain the $y_0$ values of the qvr and vr ROIs, respectively. T and U are n × p matrices holding the p extracted score vectors (projections, components, latent vectors) of $Y_{nv}$ and $Y_v$, respectively. The N × p matrix P and the M × p matrix Q are the loading matrices, and the n × N matrix E and the n × M matrix F are the residual (error) matrices, assumed to be independent and identically distributed normal random variables. The decompositions of $Y_{nv}$ and $Y_v$ are chosen to maximize the covariance between T and U.
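To make Eq (1) concrete, the following is a minimal NumPy sketch of PLS2 using the NIPALS algorithm with regression-mode deflation. This is an assumption: the text does not state which PLSR algorithm or software was used, and the data below are synthetic. The hypothetical helper `pls_nipals` returns T, U and P as in Eq (1), plus the X-weights W and the Y-weights C that regression-mode PLS uses for prediction, so that $B = W (P^{\top} W)^{-1} C^{\top}$ maps centred qvr $y_0$ values to vr $y_0$ values.

```python
import numpy as np


def pls_nipals(X, Y, p, tol=1e-10, max_iter=500):
    """Minimal PLS2 (NIPALS, regression-mode deflation); illustrative sketch only.

    X: n x N predictors (qvr y0 values), Y: n x M responses (vr y0 values).
    Both blocks are centred here. Returns scores T, U (n x p), X-loadings
    P (N x p), X-weights W (N x p), Y-weights C (M x p) and the column means.
    """
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xk, Yk = X - x_mean, Y - y_mean
    n, N = X.shape
    M = Y.shape[1]
    T, U = np.zeros((n, p)), np.zeros((n, p))
    P, W, C = np.zeros((N, p)), np.zeros((N, p)), np.zeros((M, p))
    for k in range(p):
        u = Yk[:, np.argmax(Yk.var(axis=0))].copy()  # start from the most variable response
        t_old = np.zeros(n)
        for _ in range(max_iter):
            w = Xk.T @ u
            w /= np.linalg.norm(w)        # unit-norm X-weights
            t = Xk @ w                    # X-scores
            c = Yk.T @ t / (t @ t)        # Y-weights
            u = Yk @ c / (c @ c)          # Y-scores
            if np.linalg.norm(t - t_old) < tol * np.linalg.norm(t):
                break
            t_old = t
        p_k = Xk.T @ t / (t @ t)          # X-loadings
        Xk = Xk - np.outer(t, p_k)        # deflate X: after p steps Xk equals E in Eq (1)
        Yk = Yk - np.outer(t, c)          # regression-mode deflation of Y
        T[:, k], U[:, k], P[:, k], W[:, k], C[:, k] = t, u, p_k, w, c
    return T, U, P, W, C, x_mean, y_mean


# Synthetic example (not ADNI data): 46 subjects, N = 6 qvr ROIs, M = 3 vr ROIs,
# both driven by two shared latent factors.
rng = np.random.default_rng(0)
n, N, M, p = 46, 6, 3, 2
latent = rng.normal(size=(n, 2))
Y_nv = latent @ rng.normal(size=(2, N)) + 0.1 * rng.normal(size=(n, N))
Y_v = latent @ rng.normal(size=(2, M)) + 0.1 * rng.normal(size=(n, M))

T, U, P, W, C, x_mean, y_mean = pls_nipals(Y_nv, Y_v, p)
E = (Y_nv - x_mean) - T @ P.T                # residual matrix E of Eq (1)
B = W @ np.linalg.inv(P.T @ W) @ C.T         # N x M regression coefficients
Y_v_hat = (Y_nv - x_mean) @ B + y_mean       # vr y0 inferred from qvr y0
```

In regression mode the responses are deflated with the X-scores T, which is what allows vr values to be predicted from qvr values alone; a library implementation such as scikit-learn's `PLSRegression` follows the same scheme.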
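Final LME formulation
LME modelling was applied to each ROI separately in men and women, assuming a different random intercept (at the baseline response) for each subject. The effects of age ($\beta^{r}_{a}$) and educ ($\beta^{r}_{e}$) were assumed to be the same for all subjects. The y-intercept therefore varies between subjects but is the same for all observations of a given subject. Eq (2) describes the LME formulation used to model the change of every MRI biomarker.

$$y^{r}_{ij} = \beta^{r}_{1}\,\text{Intercept}_{ij} + \beta^{r}_{a}\,\text{age}_{ij} + \beta^{r}_{e}\,\text{educ}_{ij} + \alpha_{i1}\,\text{Intercept}_{ij} + \varepsilon_{ij} \qquad (2)$$
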
where $i = 1, \dots, n$ indexes subjects, n being the number of normal-HC csf subjects (n = 46); $j = 1, \dots, n_i$, where $n_i$ is the number of observations of subject i; and $r = 1, \dots, n_r$ indexes biomarkers ($n_r = 166$ ROIs). $y^{r}_{ij}$ is the value of the r-th ROI for the j-th of the $n_i$ observations of subject i. The coefficients $\beta^{r}_{1}$, $\beta^{r}_{a}$ and $\beta^{r}_{e}$ form a p × 1 vector of unknown fixed-effect parameters of ROI r, where p is the number of fixed effects including the intercept (here p = 3). These β's vary between ROIs but are common to all observations of all subjects. $\text{Intercept}_{ij}$, $\text{age}_{ij}$ and $\text{educ}_{ij}$ are the fixed-effects covariates (regressors) for the j-th response of the i-th subject; the $\text{Intercept}_{ij}$ regressor is constant and equal to 1. $\alpha_{i1}$ is the random-effect coefficient of the i-th subject and varies between subjects. $\varepsilon_{ij}$ is the error of the j-th observation of subject i.
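A model of the form of Eq (2) is normally fitted with a mixed-model package (for example statsmodels' `MixedLM` in Python or `lme4` in R); the text does not say which software was used. As a self-contained sketch, the hypothetical helper `fit_random_intercept` below fits Eq (2) for a single ROI by maximum likelihood, profiling out the ratio γ = τ²/σ² of random-intercept to residual variance with a simple grid search, and returns the best linear unbiased predictors of the $\alpha_{i1}$. Each subject's intercept $\hat{\beta}^{r}_{1} + \hat{\alpha}_{i1}$ is then the $y_0$ of Eq (3). ML instead of REML, the grid search and the simulated data are assumptions made for illustration.

```python
import numpy as np


def fit_random_intercept(y, X, groups, gammas=np.logspace(-4, 4, 161)):
    """ML fit of y_ij = x_ij' beta + alpha_i + eps_ij for one ROI (illustrative sketch).

    alpha_i ~ N(0, tau2), eps_ij ~ N(0, sigma2). X must contain the constant
    Intercept column. gamma = tau2 / sigma2 is profiled out by grid search.
    Returns beta, the BLUPs of alpha_i (ordered as the sorted group ids),
    tau2, sigma2 and the group ids.
    """
    ids = np.unique(groups)
    rows_by_subject = [np.flatnonzero(groups == g) for g in ids]
    n_total, n_fixed = X.shape
    best = None
    for gamma in gammas:
        # GLS normal equations; per subject H_i^{-1} = I - c * 11' with c = gamma / (1 + n_i * gamma)
        A = np.zeros((n_fixed, n_fixed))
        b = np.zeros(n_fixed)
        for rows in rows_by_subject:
            c = gamma / (1.0 + rows.size * gamma)
            sx, sy = X[rows].sum(axis=0), y[rows].sum()
            A += X[rows].T @ X[rows] - c * np.outer(sx, sx)
            b += X[rows].T @ y[rows] - c * sx * sy
        beta = np.linalg.solve(A, b)
        r = y - X @ beta
        quad = sum(r[rows] @ r[rows] - gamma / (1.0 + rows.size * gamma) * r[rows].sum() ** 2
                   for rows in rows_by_subject)
        logdet = sum(np.log1p(rows.size * gamma) for rows in rows_by_subject)
        sigma2 = quad / n_total
        loglik = -0.5 * (n_total * np.log(sigma2) + logdet + n_total)  # profile log-likelihood (up to a constant)
        if best is None or loglik > best[0]:
            best = (loglik, gamma, beta, sigma2)
    _, gamma, beta, sigma2 = best
    r = y - X @ beta
    # BLUP of each random intercept: the subject's shrunken mean residual
    alpha = np.array([rows.size * gamma / (1.0 + rows.size * gamma) * r[rows].mean()
                      for rows in rows_by_subject])
    return beta, alpha, gamma * sigma2, sigma2, ids


# Simulated HC cohort (illustrative values, not ADNI estimates)
rng = np.random.default_rng(1)
n_subj, visits = 46, np.array([0.0, 1.0, 2.0, 3.0, 5.0])   # hypothetical 5-year schedule
base_age = rng.uniform(65, 85, n_subj)
educ = rng.integers(12, 21, n_subj).astype(float)
beta_true = np.array([10.0, -0.05, 0.02])                  # (beta_1, beta_a, beta_e)
alpha_true = rng.normal(0.0, 1.0, n_subj)
groups = np.repeat(np.arange(n_subj), visits.size)
age = (base_age[:, None] + visits[None, :]).ravel()
X = np.column_stack([np.ones_like(age), age, np.repeat(educ, visits.size)])
y = X @ beta_true + alpha_true[groups] + rng.normal(0.0, 0.1, age.size)

beta_hat, alpha_hat, tau2, sigma2, ids = fit_random_intercept(y, X, groups)
y0_hat = beta_hat[0] + alpha_hat    # subject-specific y0 = beta_1 + alpha_i1
```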
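By reorganizing terms, the mixed-effects model defined in Eq (2) can be written as Eq (3).

$$y^{r}_{ij} = \left(\beta^{r}_{1}\,\text{Intercept}_{ij} + \alpha_{i1}\,\text{Intercept}_{ij}\right) + \beta^{r}_{a}\,\text{age}_{ij} + \beta^{r}_{e}\,\text{educ}_{ij} + \varepsilon_{ij} \qquad (3)$$
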
where the sum $(\beta^{r}_{1}\,\text{Intercept}_{ij} + \alpha_{i1}\,\text{Intercept}_{ij})$ represents $y_0$, the y-intercept value at the basal stage (see S1 Fig). Eq (4) is the matrix and vector notation of Eq (3):

$$y^{r}_{ij} = y_{0} + X_{ij}\,\beta^{r} + \varepsilon_{ij} \qquad (4)$$

where $\beta^{r} = (\beta^{r}_{a}, \beta^{r}_{e})'$ and $X_{ij}$ is the design matrix with the values of the $\text{age}_{ij}$ and $\text{educ}_{ij}$ regressors (without the constant term).

S1 Fig. Application of proposed method in a hypothetical example.
The LME and PLSR approaches are used to infer the ROI values at the basal stage and over time, and then to infer the residuals. The figure shows an example of LME-based trajectories for hypothetical variant and quasi-variant ROIs fitted on healthy elderly data. In each plot, $P_1$, $P_2$ and $P_3$ represent hypothetical observations of each ROI y for two subjects at three different ages ($a_1$, $a_2$ and $a_3$). The first subject is assumed to be HC and the second AD, and neither subject is assumed to have been used to build the models. The black lines represent the healthy-population regression line calculated for each ROI, where $\hat{y}_0$ is the y-intercept of the healthy population. The blue and red lines represent the individual regression lines estimated for both subjects under the assumption that both are healthy, and the points $\hat{P}_1$, $\hat{P}_2$ and $\hat{P}_3$ represent the inferred $\hat{y}$ values at the three ages. Note that $\hat{y}^{HC}_0$ and $\hat{y}^{AD}_0$ are the subject-specific y-intercepts estimated for the HC and AD subjects, respectively. In both cases, $\hat{y}^{HC}_0$ and $\hat{y}^{AD}_0$ of the vr ROI are inferred from $\hat{y}^{HC}_0$ and $\hat{y}^{AD}_0$ of the qvr ROI using the PLSR model (as described above). The slope $\beta_a$ is the rate of change of the ROI, in standard deviations, per unit of age, and it is the same for both estimated individual regression lines. $\varepsilon_{HC1}$, $\varepsilon_{HC2}$, $\varepsilon_{HC3}$, $\varepsilon_{AD1}$, $\varepsilon_{AD2}$ and $\varepsilon_{AD3}$ are the residuals of each observation with respect to the estimated individual regression lines, computed as $y - \hat{y}$. In this example, the AD residuals are larger than the HC residuals because this subject is possibly affected by further neurodegeneration.
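The residual computation illustrated in S1 Fig can be sketched as follows for a subject not used to build the models. Assumptions: the text does not state how the subject-specific $\hat{y}_0$ of a qvr ROI is estimated for such a subject, so here it is taken as the mean of $y - X\beta^{r}$ over the subject's observations (slopes fixed to the HC null model); the fitted PLSR model is represented by a stand-in function; and the helper `null_model_residuals` and all numbers are hypothetical.

```python
import numpy as np


def null_model_residuals(y_qvr, y_vr, X, beta_qvr, beta_vr, pls_predict):
    """Residuals of one subject's observations with respect to the healthy null model.

    y_qvr: (n_obs, N) observed qvr ROI values; y_vr: (n_obs, M) observed vr ROI values.
    X: (n_obs, 2) design rows (age_ij, educ_ij), without the constant term as in Eq (4).
    beta_qvr: (2, N), beta_vr: (2, M) fixed slopes (beta_a, beta_e) fitted on HC subjects.
    pls_predict: callable mapping a (1, N) row of qvr y0 values to a (1, M) row of vr y0 values.
    """
    # Assumption: the qvr y0 is the least-squares intercept of the subject's
    # observations when the slopes are fixed to the HC values.
    y0_qvr = (y_qvr - X @ beta_qvr).mean(axis=0, keepdims=True)
    # The vr y0 is inferred from the qvr y0 values through the PLSR model.
    y0_vr = pls_predict(y0_qvr)
    # Individual regression lines share the healthy slopes; residuals are y - y_hat.
    res_qvr = y_qvr - (y0_qvr + X @ beta_qvr)
    res_vr = y_vr - (y0_vr + X @ beta_vr)
    return res_qvr, res_vr


def pls_predict(y0):
    """Stand-in for a fitted PLSR model (hypothetical coefficients)."""
    return y0 @ np.array([[0.5], [0.5]])


# Hypothetical null model: two qvr ROIs and one vr ROI
beta_qvr = np.array([[-0.01, -0.02], [0.0, 0.0]])         # rows: beta_a, beta_e
beta_vr = np.array([[-0.05], [0.0]])

X = np.array([[70.0, 16.0], [72.0, 16.0], [74.0, 16.0]])  # ages a1, a2, a3; educ = 16
y_qvr = np.array([[8.0, 10.0]]) + X @ beta_qvr            # both subjects: healthy-looking qvr ROIs
y_vr_hc = 9.0 + X @ beta_vr                               # HC: vr ROI on its healthy line
y_vr_ad = y_vr_hc - np.array([[0.1], [0.3], [0.5]])       # AD: additional, accelerating loss

_, res_hc = null_model_residuals(y_qvr, y_vr_hc, X, beta_qvr, beta_vr, pls_predict)
_, res_ad = null_model_residuals(y_qvr, y_vr_ad, X, beta_qvr, beta_vr, pls_predict)
```

In this toy setting the HC residuals of the vr ROI are zero up to rounding, while the AD residuals reproduce the extra loss (-0.1, -0.3, -0.5), mirroring the larger AD residuals in S1 Fig.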