Quantitative Analysis of PiB-PET with FreeSurfer ROIs

In vivo quantification of β-amyloid deposition using positron emission tomography is emerging as an important procedure for the early diagnosis of Alzheimer's disease and is likely to play an important role in upcoming clinical trials of disease-modifying agents. However, many groups use manually defined regions, which are not standardized across imaging centers. Analyses are often limited to a handful of regions because manual region drawing is labor-intensive. In this study, we developed an automatic image quantification protocol based on FreeSurfer, an automated whole-brain segmentation tool, for quantitative analysis of amyloid images. Standard manual tracing and FreeSurfer-based analyses were performed in 77 participants, including 67 cognitively normal individuals and 10 individuals with early Alzheimer's disease. The manual and FreeSurfer approaches yielded nearly identical estimates of amyloid burden (intraclass correlation = 0.98) as assessed by the mean cortical binding potential. An MRI test-retest study demonstrated excellent reliability of FreeSurfer-based regional amyloid burden measurements. The FreeSurfer-based analysis also revealed that the majority of cerebral cortical regions accumulate amyloid in parallel, with the slope of accumulation being the primary difference between regions.


Introduction
The prevalence of Alzheimer's disease (AD) is expected to increase dramatically worldwide within the next 50 years [1]. The future success of disease-modifying therapies will depend on accurate early diagnosis before the onset of clinical symptoms [2,3,4]. Amyloid-beta (Aβ) plaque deposition is a hallmark of AD [5,6]. With the development of positron emission tomography (PET) tracers with high affinity for Aβ plaques, such as 11C-Pittsburgh Compound B (PiB), it is now possible to quantify neuropathology that was previously detectable only by postmortem examination [7]. PET enables in vivo visualization of AD pathology and allows for a broad range of metabolic processes to be assessed in preclinical and clinical AD. Individuals with AD and mild cognitive impairment have been shown to have elevated PiB retention in the cerebral cortex [7,8,9], although elevated PiB retention is also observed in some cognitively normal individuals. Aβ deposition in asymptomatic individuals may represent a preclinical biomarker of AD [10,11]. Therefore, it is critical to quantify the Aβ burden accurately and robustly, to further our understanding of disease mechanisms and to develop early diagnostic techniques.
Various imaging protocols and analysis procedures currently exist for PiB PET imaging. Our approach utilizes a 60-minute dynamic PiB scan. Binding potentials (BP_ND) are calculated using Logan graphical analysis [12] with cerebellar cortex as the reference region [4,11,13]. Manually defined regions of interest (ROIs) routinely examined include: prefrontal cortex (PREF), lateral temporal cortex (TEMP), precuneus (PREC), occipital lobe (OCC), head of the caudate (CAU), gyrus rectus (GR), cerebellum (CER), and brainstem (BS), with a predetermined set of rules for ROI delineation using co-registered MR images [11]. Based on these ROIs, our laboratory defines the mean cortical binding potential (MCBP) value as the mean BP_ND in PREF, PREC, TEMP, and GR [11]. Other investigators may use a dynamic scan of 90 minutes with a distribution volume ratio (DVR) value calculated using cerebellum as the reference region [14,15,16] and a different selection of manually defined ROIs. Additional technical variations include, but are not limited to, the use of standard uptake value ratio [17,18,19] and voxel-wise analyses [20,21]. Due to the variation in imaging and data analysis protocols, it is not known whether findings from different research groups can be meaningfully compared. One key difficulty in achieving a standard protocol is dependence on manually drawn regions. One laboratory has reported good inter-rater reliability (in 5 control and 5 AD individuals) [22], but reproducibility was limited to the same research group. It should also be noted that, in many amyloid imaging studies using either hand-drawn regions [11,13,14,15,17,19] or automatic templates [22,23,24], the rationale for ROI selection typically has been based on which regions have been previously reported as selectively affected by AD [11,16,24]. Other regions have generally been overlooked except in voxel-based analysis [21,25].
This study has three aims. First, we develop an automated, regional, quantitative amyloid imaging analysis protocol using FreeSurfer (Martinos Center for Biomedical Imaging, Charlestown, Massachusetts). We demonstrate that this protocol generates global amyloid deposition measurements comparable to results obtained with conventional hand-drawn regions. FreeSurfer automatically segments and parcellates T1-weighted brain MRIs [26,27,28]. This tool has been used in many neuroimaging studies, including those focused on AD [29,30,31]. As a second aim, we examine test-retest reliability of the FreeSurfer-based technique by analyzing the same PiB scan with FreeSurfer segmentation results from two consecutive MR scans. Finally, we investigate the distribution of amyloid deposition in FreeSurfer-defined cortical, subcortical and white matter regions of interest throughout the brain. Since the start of this work, a few groups have published research using FreeSurfer to facilitate PiB imaging quantification [32,33,34]. However, the relationship of FreeSurfer-based quantification to manual quantification has not been examined, and it is unknown how much the uncertainty in FreeSurfer segmentation affects PiB quantification. We examine both questions in this study.

Methods
I. Participants
Seventy-seven individuals (G1) aged 48 to 90 years old were selected from a larger population enrolled at the Washington University Knight Alzheimer's Disease Research Center (KADRC) in longitudinal studies of memory and aging. This cohort comprised 49 females and 28 males; 27 individuals were APOE4+ and 50 individuals were APOE4-. G1 includes representative participants across age and PiB status. Individuals were not excluded based on imaging findings; one individual had encephalomalacia, which provided a useful comparison between the manual and automated approaches. The clinical assessment protocol has been previously described [11,35,36]. In brief, a clinician determines the presence or absence of dementia and rates the severity in accordance with the Clinical Dementia Rating (CDR). CDR 0 indicates no cognitive impairment and CDR 0.5, 1, 2 and 3 indicate very mild, mild, moderate and severe dementia [35]. Our study included 67 non-demented individuals (CDR 0) and 10 individuals with very mild or mild dementia of the Alzheimer's type (CDR 0.5 or 1). All imaging was performed between 2005 and 2010.
A separate group of forty individuals (G2) aged 46 to 79 years old was selected from our KADRC participants for an MRI test-retest study. This cohort consisted of 29 females and 11 males; 15 individuals were APOE4+ and 24 were APOE4-; three individuals had a CDR rating of 0.5; one individual had no APOE status or CDR rating.
I.1 Ethics statement. All assessment and imaging procedures were approved by Washington University's Human Studies Committee. Written informed consent was obtained from all individuals or their care givers.

II. Imaging
In both cohorts, human brain PET imaging for amyloid deposition was performed using the radiotracer N-methyl-[11C]2-(4-methylaminophenyl)-6-hydroxybenzothiazole (PiB). Preparation of PiB was carried out according to the published protocol [37]. Dynamic PET imaging was conducted with a Siemens 962 HR+ ECAT scanner in three-dimensional mode after intravenous administration of approximately 12 mCi of PiB. The images were reconstructed on a 128 × 128 × 63 matrix (2.12 × 2.12 × 2.43 mm) using filtered back-projection. Typical dynamic scans had 25 × 5 second frames, 9 × 20 second frames, 10 × 1 minute frames, and 9 × 5 minute frames.

III. Manual ROI analysis
ROIs (Fig. 1) were manually defined according to previously described rules [11] using ANALYZE™ software [38] and MRI images previously transformed (12-parameter affine) to atlas space [39]. These regions were originally selected through review of the 30 to 60 minute PiB PET images in individuals with AD to optimize the detection of elevated PiB uptake [11]. PET-MR registration was performed using the VGM algorithm [40]. Manually defined ROIs were then transformed to the native PET space. Inter-frame motion correction for the dynamic PET images was performed using standard image registration techniques [41] implemented in in-house software [39]. Regional time-activity curves for each ROI were extracted by resampling the ROIs on the co-registered, unblurred PET images. Regional binding potentials (BP_ND) were estimated using Logan graphical analysis [12] with cerebellar cortex as reference [42]. The average of BP_ND from four regions (PREF, PREC, GR, and TEMP) determined the mean cortical binding potential (MCBP) [11]. The washout rate constant (k2) of the reference region (cerebellum) was set to 0.16/minute. It has previously been shown that varying k2 over a 10-fold range (0.05 to 0.5/minute) has minimal impact on the BP_ND values [11].
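The reference-tissue form of the Logan plot can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the in-house implementation: the function name, the trapezoidal integration, and the 30-minute start of the linear fit (`t_start`) are hypothetical choices; only the cerebellar reference and k2 = 0.16/minute come from the text.

```python
import numpy as np

def logan_reference_bp(t_mid, roi_tac, ref_tac, k2_ref=0.16, t_start=30.0):
    """Estimate BP_ND from the slope of a reference-tissue Logan plot.

    t_mid   : frame mid-times in minutes
    roi_tac : target-region activity at each frame
    ref_tac : reference-region (cerebellar cortex) activity at each frame
    k2_ref  : reference washout rate constant in 1/min (0.16 in the text)
    t_start : start of the linear fit in minutes (assumed, not from the text)
    """
    t = np.asarray(t_mid, dtype=float)
    roi = np.asarray(roi_tac, dtype=float)
    ref = np.asarray(ref_tac, dtype=float)

    def cumulative_integral(y):
        # Trapezoidal rule from time zero, with a zero sample prepended.
        tt = np.concatenate(([0.0], t))
        yy = np.concatenate(([0.0], y))
        return np.cumsum(np.diff(tt) * 0.5 * (yy[1:] + yy[:-1]))

    int_roi = cumulative_integral(roi)
    int_ref = cumulative_integral(ref)

    # Logan coordinates, restricted to late frames with non-zero activity.
    fit = (t >= t_start) & (roi > 0)
    x = (int_ref[fit] + ref[fit] / k2_ref) / roi[fit]
    y = int_roi[fit] / roi[fit]

    slope, _intercept = np.polyfit(x, y, 1)  # slope is the DVR
    return slope - 1.0                       # BP_ND = DVR - 1
```

Because the slope of the late, linear part of the plot is the distribution volume ratio, a region whose curve matches the reference curve yields BP_ND = 0 in this sketch.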

IV. FreeSurfer based analysis
FreeSurfer 5.0 was used to automatically segment the brain into various regions for G1 (as defined in the wmparc.mgz file); FreeSurfer 5.1 was used for brain segmentation for G2. Visual inspection of the automated segmentation results was performed for quality assurance purposes in all datasets. Corrections were made when necessary according to the FreeSurfer manual (http://surfer.nmr.mgh.harvard.edu/fswiki/). Corresponding regions from the left and right hemispheres of the brain were combined to form a single ROI, e.g., the Left-Cerebellum-Cortex and the Right-Cerebellum-Cortex were combined to form a single ROI for quantitative analysis. The procedures used to compute BP_ND values from FreeSurfer-defined and manually traced ROIs were otherwise identical.
To generate a comparable global amyloid deposition index similar to MCBP from our manual region approach, volumetric analysis was performed to identify FreeSurfer cortical regions maximally overlapping the manual ROIs (Table 1). To estimate the FreeSurfer version of MCBP (MCBP_FS), the FreeSurfer counterparts of the four manual regions (PREC_FS, PREF_FS, GR_FS, TEMP_FS) for MCBP calculation were used in the same fashion as in the manual technique.
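As a rough sketch of these two steps, the Python below pools left and right hemisphere curves and averages the four counterpart regions. The voxel-count weighting and both function names are assumptions made for illustration; the text states only that hemispheres were combined and that the four regions were used as in the manual technique.

```python
import numpy as np

# Region labels as named in the text.
MCBP_REGIONS = ("PREF_FS", "PREC_FS", "GR_FS", "TEMP_FS")

def combine_hemispheres(tac_left, n_left, tac_right, n_right):
    """Pool left and right curves, weighting each by its voxel count
    (the weighting scheme is an assumption)."""
    tac_left = np.asarray(tac_left, dtype=float)
    tac_right = np.asarray(tac_right, dtype=float)
    return (n_left * tac_left + n_right * tac_right) / (n_left + n_right)

def mcbp_fs(bp_nd):
    """Unweighted mean BP_ND over the four FreeSurfer counterpart regions.

    bp_nd : mapping from region label to BP_ND
    """
    return float(np.mean([bp_nd[name] for name in MCBP_REGIONS]))
```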

V. Partial volume correction
In addition to analysis based on raw regional time-activity curves, partial volume corrected results were also obtained for G1 using a two-component technique [43] that has been widely applied in the context of PiB data analysis [14,16,17]. A brain tissue mask is generated based on FreeSurfer segmentation, a CSF dilution factor is calculated for each region, and the raw time-activity curve for each region is corrected by this dilution factor before BP_ND is calculated.
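The regional correction step can be sketched in Python as follows. This is a hedged illustration rather than the released C code: it assumes the tissue mask has already been blurred to the PET resolution by the caller (`smoothed_tissue`), and the function names and the `floor` safeguard are hypothetical.

```python
import numpy as np

def dilution_factor(smoothed_tissue, roi_mask):
    """Mean of the resolution-matched tissue map inside one ROI.

    smoothed_tissue : brain tissue mask (1 = tissue, 0 = CSF) after blurring
                      to the PET resolution; the blurring is left to the caller
    roi_mask        : boolean array of the same shape selecting the ROI
    """
    values = np.asarray(smoothed_tissue, dtype=float)[np.asarray(roi_mask, dtype=bool)]
    if values.size == 0:
        raise ValueError("ROI mask is empty")
    return float(values.mean())

def correct_tac(raw_tac, factor, floor=0.1):
    """Divide a regional time-activity curve by its dilution factor.

    The floor guards against unstable corrections in regions that are
    mostly CSF; its value here is an arbitrary illustrative choice.
    """
    if factor < floor:
        raise ValueError("dilution factor too small for a stable correction")
    return np.asarray(raw_tac, dtype=float) / factor
```

A factor of 1 leaves the curve unchanged; a region in which the blurred mask averages 0.9 has its curve scaled up by 1/0.9.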

VI. Test-retest study (G2)
For G2, we processed the same PiB dataset with FreeSurfer ROIs generated from the two different MPRAGE scans. A mean test-retest variability measurement, ΔBP%, was calculated for each region and for MCBP according to Eq. 1, where N is the total number of participants (40), i is the index for individual participants, and BP_ND,i1 and BP_ND,i2 are the BP_ND values estimated using the first and second MPRAGE, respectively. In addition, a volumetric variability measurement, ΔVOL%, was calculated for each region from the repeated MPRAGE and FreeSurfer outputs following Eq. 2, where VOL_i1 and VOL_i2 are the total numbers of voxels in each FreeSurfer region obtained with the first and second MPRAGE.
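A commonly used form of such a mean percent-difference measure, written with the symbols defined above, is given below. It is offered as an assumption about the general shape of Eqs. 1 and 2 (absolute difference normalized by the mean of the two measurements), not as the exact expressions used in this study.

```latex
\Delta BP\% = \frac{100\%}{N} \sum_{i=1}^{N}
  \frac{\left| BP_{ND,i1} - BP_{ND,i2} \right|}
       {\left( BP_{ND,i1} + BP_{ND,i2} \right) / 2}
\qquad
\Delta VOL\% = \frac{100\%}{N} \sum_{i=1}^{N}
  \frac{\left| VOL_{i1} - VOL_{i2} \right|}
       {\left( VOL_{i1} + VOL_{i2} \right) / 2}
```

Under this form, a region with two BP_ND estimates of 0.50 and 0.51 in one participant would contribute about 2% to the mean.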

VII. Statistical analysis
Intra-class correlation coefficients (ICC) were calculated to examine the agreement between binding potentials estimated using the manual and FreeSurfer approaches. SAS software (SAS Institute Inc., Cary, North Carolina, USA) was used to calculate the ICC estimates and their confidence intervals. We adjusted for CDR status (CDR = 0: negative, CDR > 0: positive), age, and ApoE4 status. ApoE4 status was defined as 0 (no copies of ApoE4) or 1 (at least 1 copy of the ApoE4 allele). To adjust for these covariates, mixed models with a variance components structure were employed to estimate the ICC and 95% confidence intervals. We specified a random intercept to account for the within-subject correlation caused by each subject having two regional binding potential observations. In addition, we treated rater as a random effect. The variance components estimated from the mixed model provided an ICC estimate. ICC was estimated as σ²_w / (σ²_w + σ²_r + σ²), where σ²_w is the within-subject variance, σ²_r is the within-rater variance, and σ² is the residual variance.
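Once the three variance components have been estimated by the mixed model, the final ratio is a one-line calculation. The Python sketch below shows only that last step; the function and argument names are hypothetical, and fitting the mixed model itself (done in SAS in this study) is not reproduced.

```python
def icc_from_variance_components(var_w, var_r, var_residual):
    """ICC as var_w / (var_w + var_r + var_residual).

    var_w        : subject variance component (sigma^2_w in the text)
    var_r        : rater variance component (sigma^2_r in the text)
    var_residual : residual variance (sigma^2 in the text)
    """
    total = var_w + var_r + var_residual
    if total <= 0:
        raise ValueError("variance components must sum to a positive value")
    return var_w / total
```

For example, components of 0.98, 0.01, and 0.01 give an ICC of 0.98, since almost all of the variance is attributed to the subject term.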
In the MRI test-retest study, in addition to test-retest variability as defined by Eqs. 1 and 2, ICC was also calculated for repeated measurements of BP ND and FreeSurfer regional volumes for comparison with previously reported results.
To examine regional amyloid binding patterns, the Pearson correlation coefficient was evaluated across subjects between the regional BP_ND and the MCBP. Pearson correlation was also evaluated between cortical gray matter regions and the underlying white matter regions. Both Pearson correlation and Spearman correlation were evaluated for BP_ND estimated with and without partial volume correction.
We have previously used a manual MCBP cutoff of 0.18 as the criterion for PiB status determination [11,44,45]. To investigate the impact of using the FreeSurfer-based PiB quantification technique to assess PiB status, we also examined the feasibility of classifying participants as either PiB- or PiB+ using FreeSurfer-based global or regional binding potentials. These classifications were compared with results obtained by the manual MCBP approach.

VIII. Software
The FreeSurfer-based analysis workflow has been implemented as an open source package that can be run from a Linux command line or through the XNAT imaging informatics platform [46]. Specific modules include PET quantification and partial volume correction (C source code), a toolbox for image registration and analysis (C and Fortran), and a Unix shell script for executing the full workflow. The source code for the partial volume correction is available at https://bitbucket.org/nrg/fs_tools. The XNAT module includes a pipeline for executing the workflow, data types for representing the FreeSurfer and MCBP output, and web-based reports for displaying quality control and data reports. The XNAT module can be accessed on the XNAT Marketplace at https://marketplace.xnat.org/fspet.

Results
I. Manual vs. FreeSurfer region definitions
Excellent agreement in MCBP measurement was observed between the manual and FreeSurfer-based approaches without partial volume correction (ICC = 0.98; 95% CI: 0.97, 0.99). A recent review (in Russian) [47] briefly mentioned our technique and showed a modified version of Figure 2 to demonstrate that the FreeSurfer-based quantification method is an effective approach for routine analysis of amyloid PET imaging data. The results obtained by both methods were highly correlated (Pearson r = 0.99, p < 10^-68, MCBP_FS = 0.91 × MCBP_MAN + 0.03; Spearman r = 0.94). These results were generated without considering MR scanner differences. The same outcome was obtained when controlling for variability in MR scanners and excluding the 5 subjects scanned at 1.5T. Therefore, all the analyses presented below were based on all the participants without controlling for MR scanner differences. When partial volume correction was applied, agreement was still excellent although ICC decreased slightly (ICC = 0.95; 95% CI: 0.94, 0.96). With partial volume correction, the Pearson correlation between the two approaches was 0.99 (p < 10^-68), and the Spearman correlation was 0.92.
To categorize subjects as PiB- vs. PiB+, an MCBP cutoff value of 0.18 has been used in previous studies [11,44,45]. Using the same cutoff, the present cohort was separated into 52 PiB- subjects and 25 PiB+ subjects based on the conventional manual approach. Determination of PiB status was identical using FreeSurfer ROIs and the same 0.18 cutoff, which further demonstrates the equivalence of the two approaches.

II. MRI Test-retest reproducibility
Test-retest data are listed in Table 2. FreeSurfer segmented ROI volumes varied by a few percent (nominally, <5%) on repeat MRI. ICC values for ROI volume ranged from 0.684 for the frontal pole to 0.996 for Unsegmented White Matter. However, BP_ND measurements were remarkably stable (test-retest variability ranged from 0.25% for MCBP to 1.91% for CC_Mid_Posterior). Test-retest reproducibility of BP_ND assessed by ICC was excellent: the minimum ICC was 0.970 for CC_Central; in several regions, including MCBP and posterior cingulate cortex, test-retest ICC was 1.0.

III. Partial volume correction
Partial volume corrected binding potential values strongly correlated with uncorrected values (Table 3). Thus, in most ROIs, partial volume correction did not cause major changes in subject ranking, as revealed by high values of Spearman correlation. Most rank changes occurred in subjects with low levels of PiB uptake. Lower Spearman correlations were observed in regions with low PiB retention and narrow ranges of BP_ND values (e.g., hippocampus).

IV. Regional specificity of PiB binding
Traditionally, PiB status has been determined by evaluating MCBP, computed by averaging BP_ND over a fixed set of ROIs [11,14,48]. For this purpose, our group has used four ROIs (see Introduction) [10,11]. However, it is unclear whether the determination of PiB status is sensitive to this particular choice. To investigate this question, we evaluated regional BP_ND values in relation to our measure of MCBP. In the majority of the cortical regions, BP_ND values strongly correlated with the global MCBP (Table 4, Fig. 3). As might be predicted, regions with high levels of PiB binding in the clinically positive group (e.g., precuneus, BP_ND = 0.738 ± 0.286 (mean ± SD) and rostral anterior cingulate, BP_ND = 0.657 ± 0.295) (Fig. 4) showed the greatest correlation with MCBP (Pearson r = 0.98 and 0.96, respectively). Similarly, subcortical structures with high levels of PiB binding in the clinically positive group were also strongly correlated with MCBP, e.g., caudate (r = 0.853), putamen (r = 0.862), and accumbens (r = 0.913) (Table 5). Conversely, regions with lower BP_ND, e.g., the cuneus gyrus and the entorhinal cortex, correlated more weakly with MCBP (Fig. 3). These lower correlations may reflect a different trajectory of amyloid accumulation over time in high vs. low BP_ND regions.
As noted earlier, previous studies have classified individuals as PiB- vs. PiB+ using MCBP > 0.18 as the criterion [11,44,45]. We observed that many regions can be similarly used to classify individuals, provided an appropriate ROI-specific criterion is identified (Table 6). Among the FreeSurfer regions we examined, 26 cortical regions and 3 subcortical regions could be used to determine PiB positivity with less than 10% difference in classification using MCBP > 0.18 as the reference. Identical classification was obtained based on the BP_ND in four regions, viz., ctx-medialorbitofrontal, ctx-parsorbitalis, ctx-rostralmiddlefrontal, and GR_FS.
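One way an ROI-specific criterion could be searched for is sketched below in Python. The text does not describe how the regional thresholds were chosen, so the exhaustive search over candidate cutoffs and the function name are assumptions; only the MCBP > 0.18 reference criterion comes from the text.

```python
import numpy as np

def best_regional_cutoff(regional_bp, mcbp, mcbp_cutoff=0.18):
    """Find the regional BP_ND cutoff that best reproduces MCBP-based status.

    Returns (cutoff, disagreement), where disagreement is the fraction of
    participants classified differently from the MCBP > mcbp_cutoff reference.
    """
    regional_bp = np.asarray(regional_bp, dtype=float)
    reference = np.asarray(mcbp, dtype=float) > mcbp_cutoff

    # Candidate cutoffs: one below all values (everyone positive), the
    # midpoints between sorted distinct values, and the maximum (no one positive).
    s = np.unique(regional_bp)
    candidates = np.concatenate(([s[0] - 1.0], (s[:-1] + s[1:]) / 2.0, [s[-1]]))

    best = None
    for c in candidates:
        disagreement = float(np.mean((regional_bp > c) != reference))
        if best is None or disagreement < best[1]:
            best = (float(c), disagreement)
    return best
```

A region would meet the "less than 10% difference" description above if the returned disagreement is below 0.10, and would give identical classification if it is 0.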

Discussion
The main objective of this study was to examine the feasibility of using FreeSurfer-defined ROIs in place of manual regions for purposes of determining PiB status. A high level of agreement was found between the manual and FreeSurfer-based approaches to quantifying global amyloid burden using the MCBP. Moreover, we observed high test-retest ICC for BP_ND measurements using FreeSurfer segmentations of repeated MRI scans. In fact, this ICC (> 0.970) is better than the reported ICC values for inter-rater reliability and manual vs. automated comparison of regional PiB uptake measurements [22]. This indicates that FreeSurfer-based PiB quantification is reliable in many regions and can therefore be routinely deployed. Some regions, e.g., the frontal pole, exhibit variable FreeSurfer volumes (test-retest ICC = 0.684 in our data) [27]. Nevertheless, measured BP_ND was generally reliable, even in such regions (frontal pole ICC = 0.995 in our data). It should be pointed out that the BP_ND test-retest reproducibility in this study only represents uncertainty attributable to region definition; we did not conduct a full test-retest study with repeated PiB scans as done by Lopresti and colleagues [16]. Uncertainty in BP_ND (<1% for most regions) attributable to FreeSurfer ROI definition variability is only a small fraction of the full test-retest variability reported by Lopresti et al. (<5%) [16]. This study confirms the observation that amyloid deposition varies spatially [11,15]. Traditionally, a small number of regions with the greatest PiB binding potentials have been used to evaluate PiB status. However, we find that many regions are comparably useful in determining PiB status, albeit with different thresholds (Table 6). This observation reflects the high correlation of regional BP_ND to MCBP in many regions (Table 4).
The logic here is reminiscent of the demonstration by Haxby and colleagues that classification can be based on less robust features of imaging data [49]. Thus, it is not critical to identify the "optimal" set of regions for determination of PiB status. Rather, we should focus on developing a standard approach to facilitate multi-institutional studies and cross comparisons of results from various groups.
It has not been standard practice in our group to apply partial volume correction in PiB studies. The two-component partial volume correction technique adopted by many groups [17,22] compensates for brain atrophy without modeling the difference between gray and white matter. In a comparison study [50], it was demonstrated that three-component partial volume correction, which differentiates between gray and white matter, provides a more accurate estimation of regional intensity values. However, the three-component model was more sensitive to errors in image co-registration and segmentation. Therefore, it is not surprising that two-component partial volume correction did not change the rank of the amyloid burden measured by PiB PET, nor did it change correlation to MCBP within individual cortical regions. High correlations between cortical gray matter regions and the underlying white matter reflect the limited spatial resolution of PET. More sophisticated partial volume correction may enable detection of more localized variations in PiB retention, but these techniques must be thoroughly investigated to determine the impact of registration and segmentation errors.

Conclusion
FreeSurfer-based ROI analysis has the advantage of automated segmentation, which greatly reduces labor costs and potentially enables standardization across laboratories. In addition, since FreeSurfer is widely used in AD research [29,30,51], a FreeSurfer-based amyloid imaging analysis protocol would allow integration of amyloid deposition measurements with cortical thickness, volume and other anatomical measurements. Although some degree of variability exists in the automated segmentation procedure [26,28,52], and manual correction of FreeSurfer-derived boundaries is sometimes necessary, especially in the presence of atrophy, our MRI test-retest study demonstrated excellent reliability of the FreeSurfer-based estimation of regional BP_ND despite variability of ROI volumes. Our data also suggest that the majority of cerebral cortical regions accumulate amyloid in parallel. Longitudinal studies investigating the rate of amyloid accumulation both globally and regionally are ongoing in our laboratory.