Multi-Contrast Multi-Atlas Parcellation of Diffusion Tensor Imaging of the Human Brain

In this paper, we propose a novel method for parcellating the human brain into 193 anatomical structures based on diffusion tensor images (DTIs). This was accomplished in the setting of multi-contrast diffeomorphic likelihood fusion using multiple DTI atlases. DTI images are modeled as high dimensional fields, with each voxel exhibiting a vector valued feature comprising of mean diffusivity (MD), fractional anisotropy (FA), and fiber angle. For each structure, the probability distribution of each element in the feature vector is modeled as a mixture of Gaussians, the parameters of which are estimated from the labeled atlases. The structure-specific feature vector is then used to parcellate the test image. For each atlas, a likelihood is iteratively computed based on the structure-specific vector feature. The likelihoods from multiple atlases are then fused. The updating and fusing of the likelihoods is achieved based on the expectation-maximization (EM) algorithm for maximum a posteriori (MAP) estimation problems. We first demonstrate the performance of the algorithm by examining the parcellation accuracy of 18 structures from 25 subjects with a varying degree of structural abnormality. Dice values ranging 0.8–0.9 were obtained. In addition, strong correlation was found between the volume size of the automated and the manual parcellation. Then, we present scan-rescan reproducibility based on another dataset of 16 DTI images – an average of 3.73%, 1.91%, and 1.79% for volume, mean FA, and mean MD respectively. Finally, the range of anatomical variability in the normal population was quantified for each structure.


Introduction
For quantitative analysis of the human brain anatomy, defining structures or regions of interest (ROIs) is one of the first essential steps. There are many types of automated or manual approaches that have been proposed to define ROIs in the brain, based on locations and contrasts of the structures. These methods often require a priori knowledge as a form of atlas. For manual ROI drawing, an atlas could be a simple pictorial representation of a structure of interest, which guides operators to define the boundary. The manual delineation, while often used as a gold standard, is a time-consuming approach. Various types of automated parcellation tools have been proposed, which try to define the boundary of anatomical structures based upon image contrasts [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17]. Some of the advanced tools incorporate a priori knowledge about the location of the target structures as a form of probabilistic atlas [18][19][20][21][22][23][24][25][26]. This location constraint prevents the contrast-based boundary definition from leaking into unlikely regions.
To use the probabilistic location information in the atlas, the atlas has to be registered, or warped, to each subject image, in which voxel-to-voxel correspondence is established between the two coordinate systems where the atlas image and the subject image are defined. The concept of brain mapping also leads to an alternative approach for automated structural parcellation, which is called atlas-based parcellation [3,7,11,[27][28][29][30][31][32][33][34][35]. Namely, if the voxel-to-voxel mapping is perfectly accurate, any arbitrary structures can be defined only once in the atlas and such anatomical definitions can be transferred to the images being mapped to. There are no constraints in the number of structures or the way the structures are defined in the atlas. In this sense, we can assume that the whole-brain voxel mapping inherently contains parcellation tools for potentially all definable structures inside the brain.
The atlas-based parcellation, however, is accurate only if the voxel-to-voxel brain mapping correctly defines corresponding voxels between the two images, which is not always guaranteed. The accuracy level is influenced by various sources of differences between the atlas and the subject brains; these could be morphological (atrophy, hypertrophy, malformation, etc.) or contrast (biological such as signal hyperintensity or hypointensity or procedural such as imaging parameters).
To reduce the impact of erroneous mapping of voxels, and consequent mis-parcellation of target structures, multi-atlas approaches have been postulated. Suppose that the hippocampus is defined in Ndifferent atlases and each atlas is warped to a to-beparcellated subject image, then the Ndefinitions (labels) of the hippocampus are casted to the subject space, which can be fused (a.k.a ''label fusion'') based upon pre-defined algorithms such as those proposed by [7,[36][37][38][39][40][41][42]. If the mis-registration of each atlas causes random errors, the errors should be reduced by integrating Ndefinitions. It has been shown that simple label fusion techniques based on majority voting yield robust parcellations [7,41,43]. More recently, weighted majority voting strategies, by incorporating intensity information, demonstrated significant improvement in the parcellation accuracy. A variety of weighting approaches, based on intensity similarity metrics, have been proposed -global [36], local [44,45], semi-local [45,46], and nonlocal [47]. In addition to voting, a statistical fusion technique (i.e. Simultaneous Truth and Performance Level Estimation, STAPLE [42]) and a collection of its variants [48][49][50] have been proposed, in which a stochastic model of rater behavior has been incorporated in the estimation process. Compared with voting techniques, the main limitation of statistical fusion strategies is that the decision rule is independent of the image intensity while the major advantage is its underlying elegant mathematical theory. Initial attempts to incorporate the intensity information into the STAPLE framework rely on a priori similarity measures [51,52] or estimating the voxelwise correspondence between the registered rater and the subject using intensity information [49].
We have recently introduced the diffeomorphic likelihood fusion algorithm (DLFA) as an approach to integrate anatomical information from multiple T1-weighted atlases and fuse their anatomical features [53,54]. Unlike previous label fusion algorithms, DLFA does not fuse a set of binary label maps obtained from the atlas-to-subject propagations. DLFA poses the parcellation problem in the framework of maximum a posteriori (MAP) estimation, estimating the maximizing parcellation labels given the observable image intensity, similar to the idea proposed in [6]. The MAP estimation is handled within the class of generative models by representing the observable imagery as a conditionally Gaussian mixture random field, conditioned on the random atlaslabel pair and the diffeomorphic change of coordinates for each label. The atlas-label pair and their diffeomorphic correspondences are unknown and viewed as latent variables. Locality is introduced into the global representations of the deformable templates by allowing different atlas-label pairs to be used to interpret different voxels or different structures, under the assumption that the local optimal diffeomorphism varies from label to label for a given atlas. The MAP estimation is solved by iterating between fixing the local optimal diffeomorphisms and obtaining the maximizing parcellation labels, and then locally optimizing the local diffeomorphisms for the fixed parcellation, in an EM fashion [55]. The atlas-dependent structure-specific local diffeomorphisms are estimated in the E-step in the EM algorithm.
The purpose of this paper is to extend the DLFA to diffusion tensor imaging by incorporating multiple-contrast information. It arises naturally to extend a single contrast image such as T1weighted images to multi-contrast images (e.g. eigenvalues and eigenvectors of DTI) by assuming conditional independence in computing p(IjW ,a), where Idenotes the measurable image, W denotes a given parcellation label, and a the randomly selected atlas-label pair. Previously, vector-to-vector or tensor-to-tensor registration algorithms have also been introduced [56][57][58][59], which was further extended to multi-channel image registration, in which multiple-contrast information, such as FA, diffusivity, and fiber orientation, is used simultaneously to drive the registration algorithm [60]. These ideas could improve the registration accuracy between each atlas and the testing subject, but the incorporation of the multiple-contrast information in the multiatlas likelihood fusion process has not been introduced so far.
In this paper, we introduce a framework to incorporate the multi-contrast intensity information generated in DTI into the multi-atlas DLFA framework and apply it to whole brain parcellations into 159 structures. In the T1 case, the distribution of the intensity in each structure is modeled as a single Gaussian. In the DTI case, we use five intensity elements ([FA, MD, and fiber angle (a unit vector)]) with the intensity distribution of each, in every single label, being modeled as a Gaussian Mixture Model (GMM), the parameters of which are computed using maximumlikelihood estimation. In this study, we examine the parcellation accuracy of the method on 25 patient data with a varying degree of pathology. We also present the scan-rescan reproducibility of the method on another dataset of 16 healthy subjects which were scanned twice. In addition, the ranges of anatomical variability of all the structures in the 16 subjects were characterized.

Patient populations
All subjects used as atlases and the first testing dataset were obtained from the existing clinical database of pediatric brain MRI, and were older than 24 months of age. DTIs from sixteen subjects (Female = 7, Male = 9, age = 7.67+/24.12) were used to create the multiple atlases (Table 1). Among these sixteen subjects, ten subjects were diagnosed as normal. In order to cover the wide range of anatomical phenotypes in the multiple atlases, 6 cases with different types of anatomical abnormalities were also included in the atlas set as shown in Table 1.
For the scan-rescan reproducibility test, sixteen healthy volunteers with no history of neurological conditions (8 M/8 F, 22-61 years old, mean: 31 years old) participated in this study. This is the same data used by [61], where details of the protocol can be found.

Ethics Statement
This study was approved by the Johns Hopkins Medicine Institutional Review Board (JHM-IRB). All subjects provided written, informed consent for participation in accordance with the oversight of the JHM-IRB.

MRI scans
For the first dataset, MR imaging was performed using a 1.5T scanner (Avanto; Siemens, Erlagen, Germany). All patients underwent routine clinical multiplanar T1, T2, and FLAIR pulse sequences, including DTI. The DTI was obtained using a singleshot EPI with parallel acquisition. Diffusion weighting was performed along 21 independent axes with b = 1000 s/mm2, and repeated twice to enhance the SNR (TE = 84 ms, TR = 7700 ms). DTI was scanned in the axial orientation with an imaging matrix of 96696 (to 1926192 with zero-filled interpolation), FOV 2406240, and slice thickness 2.5 mm.
For the second dataset, subjects were scanned twice using a 3T MR scanner (Achieva, Philips Healthcare, Best, The Netherlands). The DTI dataset was acquired using a multi-slice, single-shot, echo-planar imaging (EPI), spin-echo sequence (TR/TE = 6281/ 67 ms, SENSE factor = 2.5). Sixty-five transverse slices were acquired parallel to the line connecting the anterior commissure (AC) to the posterior commissure (PC) with no slice gap and 2.2 mm nominal isotropic resolution (FOV = 2126212, data matrix = 96696, reconstructed to 2566256).

DTI processing
All DTI datasets were processed offline using DTIStudio software (H. Jiang and S. Mori, Johns Hopkins University, Kennedy Krieger Institute, lbam.med.jhmi.edu or www. MriStudio.org) [62]. The raw diffusion-weighted images were first co-registered to one of the b0 images with a 12-parameter affine transformation using Automated Image Registration (AIR) [63]. The six elements of the diffusion tensor, the fractional anisotropy (FA), and the mean diffusivity (MD) were calculated.

Initial creation of multiple atlases
For the sixteen subjects that were selected to be the multiple atlases, the images were first normalized to MNI coordinates with a nine-parameter affine transformation. The initial parcellation of the brain into 159 structures was performed using the atlas-based automated image parcellation pipeline as described in our previous publication [60]. We used our single-subject Eve atlas [64] and the accompanied brain parcellation map with 159 structural definitions as the template, which was warped to the 16 subjects using the three-contrast large deformation diffeomorphic metric mapping (LDDMM) [64][65][66]. The three contrasts included FA, MD, and the manually-delineated lateral ventricles. In a previous study, we tested the accuracy of this automated structural parcellation approach in cerebral palsy patients and excellent accuracy was reported [65]. In this study, we included patients with more severe abnormalities. If gross parcellation errors occurred, they were manually corrected to establish the multiple

Multi-contrast likelihood-fusion
Let (I a1 ,W a1 ),(I a2 ,W a2 ),:::,(I aN ,W aN ) f g denote N DTI atlaslabel pairs, where N~16 in this study. Instead of using singlevalued (T1-weighted) images, we use vector-valued images for both the atlases and the test subjects. For each atlas-label pair a, I a~½ I a FA ,I a MD ,I a x ,I a y ,I a z ,a[ a 1 ,a 2 ,:::,a 16 f g , where I a FA denotes the gray-scale FA image of the atlas-label pair, I a MD denotes the grayscale MD image of the atlas-label pair, and I a x ,I a y ,I a z denote the absolute values of the three elements of the primary eigenvector. In this sense, the image intensity at each voxel is a 5-element vector I a (x) : x[V?R 5 , with V5R 3 being a finite grid where the images are defined. For the label image W a in each atlas-label pair, we define it as a function from the image domain V to a subset of the non-negative integersW a (x) : x[V? 0,1,2,3,:::, f 159g, where W a (x)~0 for voxel x belonging to the unlabeled background, and W a (x)~k,k[ 1,2,3,:::,159 f gfor voxel x labeled as the k-th structure such as the left caudate, the right putamen, and so on. Correspondingly, we denote the to-be-parcellated test subject as (I,W ), where I~½I FA ,I MD ,I x ,I y ,I z and W is the label image we aim to obtain.
For multi-contrast, multi-atlas parcellation, the goal is to estimate the label map W associated with the im-ageI~½I FA ,I MD ,I x ,I y ,I z of the test subject, for which we solve via the Maximum a Posteriori (MAP) estimation To achieve this goal, we use the EM algorithm by introducing the latent variable A[ a 1 ,a 2 ,:::,a N f g that designates the random atlaslabel pair. The Q-function in the EM algorithm computes the loglikelihood of the complete data log p(I,W ,A) given the incomplete data -the to-be-parcellated measured image I and the previous parcellation label W old , where interpret the voxel xin the test image. Denoting the conditional probability of the atlas-label selector as P A(x) (ajI,W old ), the Qfunction reduces to: The sequence of iteratesW (1) ,W (2) ,:::, associated to the alternating maximization defined by the iteration is monotonic in the incomplete data likelihood with atlas selector P A(x) (ajI,W old ), the proof of which can be found in [53].
The algorithm can be summarized as: Step1: Initialize the diffeomorphism for each voxel x to be identical everywhere, as:Q Q old a (x)~Q Q (0) a . InitializeW old . Step2: Compute the approximated atlas-label selector as: Step3: Obtain a new parcellation image for the test image via Step4: Recalculate the diffeomorphisms of the atlases onto the parcellation labels via: Step5: Update the parcellation W old /W new and the optimal diffeomorhiphismsQ Q old a /Q Q new a , go to Step 2.
Stop the iteration if either {4 or the number of total iterations is bigger than 30.
Remarks. 1. To initialize the optimal diffeomorphismQ Q (0) a in Step 1 that is associated to the atlas-label pair a, we used the optimal diffeomorphism obtained from a two-channel LDDMM image mapping with one channel being the FA images and the other the MD images, which has been validated in registering DTI images [60]. Given the pair of the target Iand an atlasJ, we compute a diffeomorphic deformation Qbetween the two vector valued images I~½I FA ,I MD andJ~½J FA ,J MD such that where w 0 is the identity transformation. The optimal diffeomorphic deformation is generated by integrating the vector field, which is found to minimize the energy: v v t~a rg min v t : Parcellation of Diffusion Tensor Images where the parameters s FA and s MD control the weighting of the two contrast-matching terms of smoothness regularization terms.
In this study, we sets FA~sMD~1 . To ensure that the solution of Eq. (9) lies in the space of diffeomorphisms, the set of time-indexed vector fields v t must be sufficiently spatially smooth, requiring V to be a reproducing kernel Hilbert space. For computational purposes, we use an operator-induced norm on V such that where + 2p is the Laplacian operator with powerp §1:5. In this study, we usep~1,c~1, and ais selected according to the cascading method described in [60,67] as0:01{0:005{0:002. For the initialization of the parcellation label in Step1, there are multiple choices. In our case, we use the propagation of the labels of atlas-label pair a 1 under the optimal global diffeomorphism To incorporate the local optimized diffeomorphismQ Q old a (x) in the calculation of Eq. (6), we use mode approximation via The optimized diffeomorphism in Eq. (10) is obtained as the mode, as computed in Eq. (8).
3. In calculating the terms in Eq. (6), we assume that the prior distribution on the atlas-label pairAis uniformp A~a ð Þ~1 N ,a~a 1 ,:::,a N . Via Bayes' rule, we have: To calculatep(I(x)jW old (x),a,Q Q old a (x)), we define a hierarchical model between the image Iand the underlying diffeomorphic change in coordinates of the atlasQ Q a , so that W splits IandQ Q a . Conditioned on W , the joint measurement, I andQ Q a , is independent, giving rise to: Therefore, we have: where Mdenotes the total number of Gaussians in the mixture model, p(I a FA (x)jW a (x)~k,t) represents the probability density function of a single Gaussian and P t a ak t~1 are the mixing coefficients for different Gaussians. For the parameters of the mixture of Gaussians associated to a specific labelk, h ak~( m ak t ,s ak t ,a ak t ),t~1,2,: we employ the EM algorithm to derive the maximum-likelihood estimators. The termp(I FA (x)jW old (x),a) in Eq. (14) is computed according to For any given structure, the total numbers of Gaussians, M, for the mixtures are pre-defined. We set M~2 for the structures smaller than 1000 mm 3 and M~4 for those larger than 1000 mm 3 . These parameters in the GMM were empirically determined. The GMM is used to quantitatively characterize the characteristic shape of the histogram of the intensity distribution of each contrast in each structure. 4. To compute the Q-function as described in Eq. (7) To be specific, we use the Dice overlap measurement between W new and W a 0Q {1 a to approximate the termp(W new ja,Q). The optimal local diffeomorphism is assumed to come from a composition of the optimal global diffeomorphism and an optimal local 12-parameter affine transformation. Namely, Q a (x)~Q Q (0) a 0a(x), whereQ Q (0) a is the optimal global diffeomorphism computed in Step 1 and ais the optimal local 12-parameter affine transformation that maximizes p(W new ja,Q Q (0) a 0a)p(Q Q (0) a 0aja), where p(W new ja,Q Q (0) a 0a)is quantified as the Dice overlap between W new and W a 0(Q Q (0) a 0a) {1 . Note that the optimal local affine transformation is obtained on a structureby-structure basis. Therefore, for a single atlas-label pair a, the optimized local diffeomorphismsQ Q new a (x) should be identical for voxels in the same structure. Given atlas-label pair a, the prior distribution of the transformation p(Qja) is estimated as the multiplication of two terms p(Qja)~p(Q Q (0) a ja)p(aja,Q Q (0) a ), where p(Q Q (0) a ja)is estimated by the one over the metric distance [68] in diffeomorphism space given by the exponential of the geodesic length, computed from the two-channel LDDMM mapping. The prior on the 12-parameter affine transformationp(aja,Q Q (0) a ) is modeled as a multivariate Gaussian, p(aja,Q Q (0) a )~N M a ,C a ð Þ , similar to the strategy adopted in [69]. In our approach, we use M a~1 ,0,0,0,1,0,0,0,1,0,0,0 ½ T andC a~d iag 1e {2 ,1e {2 ,:::, Â À 1e {2 ,1e 2 ,1e 2 ,1e 2 Þ. We assume that all the parameters are mutually independent, and thus the covariance matrix is diagonal. Since the first 9 parameters in arepresent the affine matrix, their variances should be small, for which we assign 0.01. The last 3 parameters represent the translation in the x,y,zdirections, and therefore their variances should be big, for which we use 100.
Since the optimization of the local diffeomorphisms is based on the overlap between the parcellation W old of the target and the diffeomorphically deformed results of the atlas parcellations and the optimized diffeomorphismsQ Q a (x) are identical for voxels in the same structure, the term p(W old (x)ja,Q Q old a (x))in Eq. (11) is approximated as being proportional to the overlap distance between W old (x) and W a 0Q Q old a ({1)(x). Again, for atlas-label pair a, this quantity is identical for voxels in the same structure.
To sum up, the MAP estimation problem is solved in an EM approach. We iterate between fixing the local optimal diffeomorphism for each label in each atlas-label pair and obtaining the maximizing parcellation of the target, and then locally optimizing the diffeomorphisms associated to each label in each atlas-label pair given the fixed parcellation.

Image quantification
After the brain had been parcellated to the 159 structures, the peripheral ROIs were further decomposed to the CSF, cortex, and peripheral white mater using MD (threshold value = 0.0015 to separate the CSF and the tissue) and FA (threshold value = 0.    ROIs. The parcellation criteria used in this paper followed our previous publications [70,71], in which the cortex and the white matter definitions followed ICBM-LPBA40 [24] and probabilistic white matter atlas [71], respectively.

Manual delineation for accuracy measurements
Eighteen structures (sixteen white matter structures and two deep gray matter structures) were manually delineated on the preselected 2D slices of fourteen subjects from three groups -four from the normal group, five from the mild abnormal group, and five from the severe abnormal group. The manual delineation was performed by incorporating information from MD, FA, and colorcoded eigenvector maps. To investigate the intra-and inter-rater variability of the manual delineations, two raters (X.T. and J.H.) performed the manual delineations twice with more than 3-week intervals. To quantitatively evaluate the parcellation accuracy of our algorithm, we used four measurements: 1. Dice overlap coefficients We calculated the Dice overlap coefficients between the manually delineated 2D ROI and the corresponding ROI in the automated parcellations. The Dice overlap coefficient is calculated , where TP is the area of the region that belongs to both the automated ROI and the manual ROI, FP is the area of the region that belongs to the automated ROI but not the manual, and FN is the area of the region that belongs to the manual ROI but not the automated. 2. The correlation between the size of the manually delineated ROI and that of the automated ROI.
3. The correlation between the mean FA value of the manual ROI and the mean FA of the automated ROI.
4. The correlation between the mean MD value of the manual ROI and the mean MD of the automated ROI.
To evaluate the improvement in parcellation accuracy given by the multi-contrast approach, we compared the parcellations from the 5-contrast multi-atlas approach (FA, MD, vector elements x, y, and z, combined), with those obtained from the same multi-atlas but with only a single contrast -FA-only, MD-only, and a threecontrast approach -EV-only (x, y, z combined). These four methods vary from each other only in the computation of Eq. (14). To compare the four methods statistically, for each structure, we performed a one-way ANOVA to examine significant difference among the Dice results obtained from the four approaches. For statistical correlation analysis, we used William's modification of Hotelling's test [72].
For the scan-rescan reproducibility test, we investigated the volume difference between the automated parcellations of the same structure from the two scans for the same subject. The volume difference is computed as: , where vol(ROI 1 )denotes the volume of a specific ROI for scan 1 and vol(ROI 2 )denotes the volume of the same ROI for the second scan of the same subject.
In addition, we examined the difference between the mean FA value of the automated parcellation of each single structure for the first scan and that of the automated parcellation for the second scan, as well as the difference between the mean MD values. Results Figure 1 demonstrates the concept of the multi-contrast image parcellation using five different contrasts obtained from DTI as well as the concept of characterizing the intensity distribution of each contrast using a GMM. The five selected structures are spatially adjacent to each other and need to be accurately demarcated based on their contrast features. Each of the five contrast rows cannot uniquely differentiate all the five structures, but each column (structure) has a unique contrast signature by combining these five contrasts. For example, the distinction between the tissue and the ventricle is most clear on the MD image, while the distinction between the caudate and the anterior limb of internal capsule (ALIC) is most clear on the FA image. Likewise, the difference between ALIC and the posterior limb of internal capsule (PLIC) is largest in the eigenvector image. The GMM quantitatively characterizes the contrast features delineated by these multi-channel histograms. Figure 2 shows results of Dice measurements, reporting spatial agreement with manual delineation. The 5-contrast approach is compared with FA-only, MD-only, and EV-only approaches. Because of the unique contrast signature of each structure, the best contrast that can accurately define it varies. For example, to define the contricospinal tract (CST), the EV provides the best accuracy, but it provides poor results to define the putamen, which is best defined by FA or MD. The 5-contrast approach performs well for all structures. According to the results from the one-way ANOVA, we found statistical differences among the 4 approaches in 11 structures, in which the 5-contrast approach consistently achieved one of the best results. These structures include: the caudate, the putamen, the cingulate gyrus, the middle cerebellar peduncle, and the corticospinal tract in both hemispheres. The absolute Dice level was 0.8-0.9. Note that some structures are difficult to define even manually with perfect reproducibility. A good example is the superior longitudinal fasciculus (SLF), which has a vague structural boundary and the inter-rater variability is large (Dice = 0.8+/2 0.259). Because the manual definition is used as the gold standard, the spatial matching cannot be better than the inter-rater spatial matching (automated methods cannot be more accurate than manual delineation by definition).
The correlation coefficients between the sizes of the manual and the automated parcellations obtained from the four approaches for all the eighteen ROIs are listed in Table 3. For some structures, the ROI sizes from all the four automated approaches are highly correlated with the ROI sizes of the manual delineations. However, structures such as the caudate, the corticospinal tract (CST), and the cingulate gyrus (CG), the performance varies from approach to approach. In Figure 3 and Figure 4, we show examples of the correlation plot between the automated and manual approaches for a gray matter (caudate) and a white matter (CST) structures. Figure 5 shows actual parcellation results of the CST in the brainstem of subjects with different degrees of abnormalities, which demonstrates how the integration of five contrast information can accurately delineate the sizes. In this example, the fiberorientation information in the EV contrast is necessary to accurately reflect the small CST sizes in Case #3. Namely, the CST has a characteristic Z-orientation (blue) fiber orientation, which can uniquely differentiate the CST from the surrounding high-FA white matter structures. The integration of the EV information provides strong constraints for the parcellation, specifically defining the high-FA regions with a strong orientation alignment along the Z axis. The FA-, MD-, and EV-only Table 3. Pearson correlations between manual and four different automated ROI area measures, with bold typesetting indicating that the correlation between the automated and the manual measures is statistically significantly stronger than other methods (p,0.025). approaches extracted the CST accurately for Case #1 and #2, but grossly overestimated the CST size for Case #3. Based on the comparison results shown in Table 3, significant improvement of the correlation between the size of the automated parcellations and that of the manual delineations was achieved by the 5-contrast approach, compared with the other single contrast approaches. Likewise, we performed the manual-auto correlation analyses of the mean FA and MD values within each single ROI. As shown in Table 4 and 5, again, the 5-contrast approach is consistently superior to the other three approaches in terms of either FA or MD correlation. Figure 6 demonstrates parcellation results for three patients with different degrees of abnormalities. A high level of parcellation accuracy is visually appreciable for the wide variety of anatomical states.
According to our test-retest experiments, the reproducibility results were 4.7%, 2.19%, and 2% for the volume, FA, and MD, respectively, averaged over all 193 structures. If we remove 26 small gray matter structures (,1000 mm 3 ), the reproducibility improves to 3.73%, 1.91%, and 1.79% respectively. These small gray matter structures had poor test-retest reproducibility because they lack clear contrasts in DTI and they are usually not the targets of DTI measurements. Figure 7 shows the maps of cross-subject variability in the volume, FA, and MD measurements. The cross-subject variability is computed for each label of interest. It is quantified as the ratio of the standard deviation to the mean value across the sixteen subjects. A large amount of morphological variability was found in the ventricle volumes, while the standard deviations of the volumes of white matter structures are in 10-20% range. The standard deviations of the FA and MD were noticeably lower and most areas were below 10%. The table in Appendix shows a comprehensive report of the test-retest reproducibility and the cross-subject variability of all the 193 defined structures. These values should provide useful information for power calculation in future study designs.

Discussion
In this study, we developed and tested an automated image parcellation method based on a multi-contrast multi-atlas likelihood-fusion algorithm. DTI can generate multiple quantitative maps with markedly different qualities of anatomical contrasts. The mean diffusivity contrast provides clear distinction between the tissue (generally within the range of 0.6-0.9 cm 2 =s) and the CSF (approximately 3.0-3.5 cm 2 =s), providing a strong constraints to define the ventricles and the brain surface. The FA contrast provides sharp distinction between the gray (typically FA,0.15-0.25) and white matter structures (FA.0.15-0.25). The eigenvector (EV) can differentiate intra-white matter structures based on their characteristic orientations. In this work, we used the absolute values of the three components, EV-x, EV-y, and EV-z. While this approach solves the difficulties associated with the sign of the eigenvectors, some orientation information degenerates [73]. This is obviously a simplified approach and there is room for improvement. The mixture of these three types of information could also invite noise. For example, in the low-FA gray matter structures, the fiber orientation information may be random and should not receive significant weighting. This type of weighting is naturally achieved by incorporating the variability information about the voxel values within a single parcellated structure; for example, the EV information of the thalamus in Figure 1 shows almost equal values for the X, Y, and Z channels with high intrastructure variability. In Eq. (18), we use mixtures of Gaussians to model the intensity distribution within a single structure. If the intensity within the structure is homogeneous, the algorithm automatically assigns weight 1 to a single Gaussian. If there is high intra-structure variability, multiple Gaussians will be used to model the intensity, with each Gaussian being given a small weight. In this way, it effectively reduces the contribution of this contrast information in computing the quantity p(I m (x)jW ,a),m[fFA,MD,EV À x,EV À y,EV À zg in Eq. (14). By incorporating the consistent anatomical signatures into the parcellation criteria, we aim to achieve robust parcellation. In the past, multi-contrast image registration approaches have been postulated including ones for DTI data [60,[74][75][76][77][78][79]. These tensoror vector-based registration also indicated improved registration accuracy [56][57][58][59]. The proposed method can be considered as an extension of these previous works by incorporating them into a multi-atlas framework.
The improved accuracy, with respect to a single-contrast approach, is shown in Figure 2 using Dice measurements. While the performance of single-contrast approaches varies depending on the structure, the five-contrast approach consistently achieved  Table 4. Pearson correlations between the mean FA value within the manual ROI and the automated parcellations, with bold typesetting suggesting that the correlation between the automated and the manual measures is statistically significantly stronger than other methods (p,0.025).  the highest level of accuracy. As mentioned in the Results section, the accuracy of the automated method cannot be higher than the reproducibility level of the manual delineation. In this sense, the results in Figure 2 indicate that the five-channel approach is as good as a human rater. Careful observation of this figure reveals that, for many core white matter structures with distinctive and uniform tract orientations, the eigenvector contrast alone can provide a similar level of accuracy as the five-contrast method,  suggesting that the parcellation was mainly driven by the eigenvector contrast. However, if a structure is not characterized by a uniform fiber orientation due to curvature within a segment, such as GCC and BCC, the eigenvector may not always be the most reliable contrast. While the Dice measurements ( Figure 2) provide important information about the accuracy level of the automated parcellation, it is probable that the correlation results reported in Table 3 are more important for actual image-based studies. Anatomical delineations of brain structures depend on anatomical definitions. It is reasonable that there is consistent difference in boundaries of defined structures between two different approaches. In this sense, low Dice values do not necessarily mean that the automated results are not useful. The high correlation between the manual and automated methods indicates that they have similar powers to differentiate different anatomical states, which is ultimately the goal of quantitative analyses. In this respect, the high correlation between the 5-contrast and manual approaches is an encouraging result.
One limitation of our accuracy evaluation is that the measured structures were limited to the core brain structures, which can be reproducibly defined manually. This is inevitable because the manual delineation results were used as the gold standard. As reported in the Results section, one of the core white matter structures, the SLF, suffered from low inter-rater reproducibility due to its complex shape. Reproducible definitions of peripheral white matter regions by manual tracing would be prohibitively difficult and, due to the absence of the gold standard, the accuracy measurements of the proposed automated segmentation were challenging. In this study, we are therefore limited to reporting test-retest reproducibility and population variability measurements, which could be important resources for power analyses of the proposed method.
The test-retest reproducibility showed less than 5% variability for the volume measurement for most of the defined structures (see Appendix table).The test-retest reproducibility measures of FA and MD indicated higher reproducibility (less than 3%). The anatomical variability for the 193 measured structures reported in Figure 7 should provide information for power analysis to design population studies.
In this study, we reported the accuracy level of the multicontrast multi-atlas approach for a wide range of anatomical phenotypes. The performance of this technology relies heavily on the availability of atlases with consistent parcellation criteria. Creation of such atlases is a time consuming task, which is one of the limitations of this approach. The accuracy of the parcellation is, of course, influenced by the extent of the anatomical abnormality. Conceptually, the greater the number of the atlases and the wider the anatomical range the inventory includes, the wider the applicability of the tools regardless of the anatomical findings/pathology. However, the larger number of the atlases would cost computational time. In the current study, we used sixteen atlases. The relationship between the number and properties of the atlases and the resultant accuracy and applicable anatomical range is not systematically analyzed in this study, which is an important future investigation.
An interesting extension of this discussion is the necessity of a ''normal'' definition. Usually, when we establish a populationbased atlas, we invest great efforts to make sure the population consists of real, normal, healthy subjects. The involvement of abnormal cases in the atlas building would make the atlas biased. In the proposed multiple-atlas approach, we need to make sure that the atlases cover a wide range of anatomical phenotypes, at least if the approach is to have clinical applications; if a structure of interest is dislocated to a position beyond the range of all atlases, the likelihood would become zero and we cannot get correct parcellation. In this process, the achievement of accurate labeling is an independent issue from whether we define the normal anatomy in the atlases.
In conclusion, we developed a multi-contrast multi-atlas image parcellation algorithm and applied it to whole brain parcellations in DTI data. Compared to single-contrast approaches, improved parcellation accuracy was confirmed. Anatomical structures in patients with a wide range of anatomical states could be accurately parcellated by incorporating various anatomical phenotypes in the atlas inventory.

Supporting Information
Appendix S1 Quantification of the test-retest reproducibility and the cross-subject variability in the volumetric measurement, the mean FA value, and the mean MD value, for each of the 193 ROIs. (XLSX)