Reliable Dual Tensor Model Estimation in Single and Crossing Fibers Based on Jeffreys Prior

Purpose This paper presents and studies a framework for reliable modeling of diffusion MRI using a data-acquisition adaptive prior. Methods Automated relevance determination estimates the mean of the posterior distribution of a rank-2 dual tensor model exploiting Jeffreys prior (JARD). This data-acquisition prior is based on the Fisher information matrix and enables the assessment whether two tensors are mandatory to describe the data. The method is compared to Maximum Likelihood Estimation (MLE) of the dual tensor model and to FSL’s ball-and-stick approach. Results Monte Carlo experiments demonstrated that JARD’s volume fractions correlated well with the ground truth for single and crossing fiber configurations. In single fiber configurations JARD automatically reduced the volume fraction of one compartment to (almost) zero. The variance in fractional anisotropy (FA) of the main tensor component was thereby reduced compared to MLE. JARD and MLE gave a comparable outcome in data simulating crossing fibers. On brain data, JARD yielded a smaller spread in FA along the corpus callosum compared to MLE. Tract-based spatial statistics demonstrated a higher sensitivity in detecting age-related white matter atrophy using JARD compared to both MLE and the ball-and-stick approach. Conclusions The proposed framework offers accurate and precise estimation of diffusion properties in single and dual fiber regions.

Introduction new challenge arises: how to automatically adapt the model complexity to warrant an accurate characterization of the underlying diffusion processes?
Many model selection methods were introduced in the DWI field, for instance based on constrained spherical deconvolution (CSD) [14], the Bayesian information criterion (BIC) and the generalization-error [15] A limitation of the CSD approach is that it requires tuning of a threshold to reject small contributions. Furthermore, BIC is influenced by the estimated likelihood and non-estimated factors such as the number of parameters and the sample size. The generalization-error method is a non-local model selection technique. Importantly, all these approaches involve model selection techniques that make hard decisions to select an appropriate model.
Automatic relevance determination (ARD) aims to eliminate the redundant parameters in a complex model, such that the simplified model yields a better description of the data [16]. Behrens [8] adopted ARD for assessing the appropriate number of fiber orientations in each voxel for fiber tracking. This method ensures that if there is no evidence for a second fiber orientation in the data, the volume fraction attributed to this fiber will automatically be forced to zero. ARD methods assume a prior distribution for the model parameters. A Gaussian distribution is a straightforward choice for a prior [16]. Such a prior may involve hyper-parameters to tune its shape. Previous ARD approaches [8,17,18] involved marginalization (integration) over the hyper-parameters to get a prior for each parameter separately. Such a prior is likely to be suboptimal for different diffusion geometries since potential correlations between the parameters of complete models are ignored.
We present a new framework for data-acquisition adaptive estimation of the diffusion shape of simple and complex white matter structures. We consider the method data-acquisition adaptive as it takes properties of the data-acquisition into account such as the number of gradient directions, the number of b-values used and the noise level. The method is based on ARD for a rank-2 dual tensor model and assesses whether two anisotropic tensors are 'mandatory' to model the acquired diffusion-weighted signals. Our ARD estimates the mean of the a posteriori distribution, i.e. the model parameters given the data, exploiting Jeffreys prior [19] [20]: 'JARD' . This data-acquisition adaptive prior is based on the Fisher's information matrix [21]. Previous work on ARD for diffusion-weighted MRI primarily focused on the accurate reconstruction of fiber orientations based on the ball-and-stick model [8,17]. This rank-1 tensor model is not appropriate for estimating the diffusion shape as reflected by a rank-2 tensor model. The proposed JARD method is particularly suited for application in comparative studies in which the goal is to assess subtle differences in diffusion shape between patients and matched controls.

Methods
The JARD framework for estimation of the diffusion shape processes every voxel in the same way. It estimates the parameters of a constrained dual tensor model (DTM) by computing the mean of the posterior distribution sampled by a Markov Chain Monte Carlo (MCMC) approach. The algorithm is initialized by applying the constrained DTM to the measured diffusion-weighted signals using maximum likelihood estimation (MLE). The prior on the parameters in the MCMC sampling is based on the non-informative Jeffreys prior. This prior forces parameters, particularly the volume fraction, towards zero when there is little to no information in them. This will be verified experimentally in the Experiments and Results section.
(DTM) [3]. This model contains signal contributions of up to two fiber bundles and an isotropic component and is given by where S θ,j is the diffusion-weighted signal given parameter vector θ for diffusion weighting b j in gradient direction g j and S 0 the signal without diffusion weighting. D 1 and D 2 are rank-2 tensors to model the anisotropic diffusion in each fiber, D iso is the amount of isotropic diffusion (i.e., D iso ÁI 3×3 , D iso representing the scalar amount of isotropic diffusion), and f i represents the volume fraction of component D i . Note that the DTM in Eq (1) reduces to a single tensor model (STM)-reflecting a single fiber-if f 1 > 0 and f 2 = 0 or vice versa. The volume fraction parameters play an essential role in our JARD scheme.

Maximum Likelihood Estimation of a constrained DTM
The measured diffusion weighted image (DWI)S j;s with diffusion weighting b j in direction g j is corrupted by Rician noise of standard deviation σ [22]. Therefore, the probability density function (PDF) forS j;s is given by with I 0 (Á) the zero-th order modified Bessel function of the first kind. The DWIs are statistically independent, so that the joint probability density function pðS s jθÞ of the signal profileS s is given by the product of the marginal distributions for the measured signalsS j;s in each of the N g diffusion weighted directions g j : Here, pðS s jθÞ is the likelihood function of θ givenS s . The underlying parameter values can be inferred by maximizing this likelihood function [23] θ MLE ¼ arg max θ fpðS s jθÞg: ð4Þ Maximum likelihood estimation (MLE) has a number of favorable statistical properties in the estimation of diffusion properties in crossing bundles [3]. First, under very general conditions, MLE asymptotically reaches the Cramér-Rao lower bound (CRLB). The CRLB is a theoretical lower bound on the variance of any unbiased estimator. Second, the MLE is consistent, which means that it asymptotically (N g ! 1) converges to the true value of the parameter in a statistically well-defined way [24]. The dual tensor model given in (1) should be parameterized such that its parameter values reside in a well-defined range. In previous work [3], we parameterized the tensor D i as follows: is a diagonal matrix with the eigenvalues of the tensor D i on its diagonal. The non-negativity constraint that is imposed on the estimated diffusivity values is accomplished by employing an exponential mapping [3]. The matrices R i = 1,2 describe rotations around the x-, y-and z-axes: R i (α 1-4 ) = R x (α 1 )R y (α 2 )R z (α 3 ±α 4 /2). The first two rotations R x (α 1 )R y (α 2 ) determine the orientation of the plane in which the first principal eigenvectors of both tensor reside. R z (α 3 + α 4 /2) and R z (α 3 − α 4 /2) denote the in-plane rotations of the first principal eigenvector of the two tensors. As such, the parameter vector to be estimated for a dual-tensor model MLE becomes However, MLE does not necessarily yield useful estimates. A potential error in the estimated parameters is greatly influenced by the degrees of freedom (DOFs) and the covariance (s) between parameters. We demonstrated that restricting the DOFs by imposing constraints on the DTM greatly reduces the covariance between parameters. The experiments in [3] showed that precise and accurate estimation can be achieved if we apply the following constraints: Eq (6) imposes that the "unrestricted" diffusivity (i.e., free diffusivity) along the fibers, denoted by the first eigenvalues of D 1 and D 2 , are equal. Eq (7) states that the diffusion perpendicular to the fiber orientation is assumed to be axially symmetric, which models the average shape of axons. Eq (8) defines that D iso equals C free-water = 3×10 −3 mm 2 s −1 , the diffusivity of free water at body temperature 37°C, and Eq (9) states that the two anisotropic tensors plus the isotropic compartment fill the entire volume of each voxel. Constraining the DTM cannot avoid the inherent risk of overfitting. This happens when a complex model is fitted to simple data, e.g. fitting multiple tensors to data of a single fiber bundle. Typically, this yields an increase of the variance and the covariance of the parameters, but also leads to biased diffusivity estimates.

Automatic Relevance Determination
Bayes factors offer an alternative to model selection by the classical likelihood test [25]. It computes the evidence for a model to be used in model selection. However, calculating the evidence for any model requires integration over all model parameters, weighted by the parameter priors. This is computationally unfeasible, especially with high-dimensional parameter spaces for which no analytical solution exists. ARD was introduced exactly to cope with such issues [16] [8] [17] [26] [27]. Compared to the Bayes factors approach, ARD does not fit competing potential models to the data and compares them on the basis of the residual after fitting. Instead, ARD always fits a complex model to the data and forces irrelevant parameters to zero, so that a complex model reduces to a simpler one.
Our JARD estimates the mean of the posterior distribution of the constrained DTM based on Bayes' theorem [16]. The posterior distribution pðθjS s Þ is where pðS s jθÞ is the aforementioned likelihood function, p(θ) the prior probability of θ in the DTM, and pðS s Þ the evidence for the DTM. As the evidence term in Eq (10) is constant for any measured signal, the posterior probability distribution in JARD becomes pðθjS s Þ / pðS s jθÞpðθÞ ð11Þ Our framework estimates the posterior distribution given the data, which is influenced by the likelihood function and the prior. We introduce a data-acquisition adaptive prior for DTM parameters based on Jeffreys theorem (see next subsection). It allows simplifying a complex model to a simple model by automatically forcing volume fractions, which are not supported by data to zero. If the posterior estimates of both anisotropic fractions lie in a small interval around their MLE value, then this would indicate that the prior did not significantly change the outcome and that fitting the initial dual tensor model was justified. Reversely, if the posterior estimate for one of the two anisotropic fractions does not significantly differ from zero, then its corresponding tensor compartment can be treated an unnecessary parameter. In such a case, the estimation essentially returns a 'single-tensor' model.

Jeffreys Prior
Methods to choose the prior for a Bayesian analysis can be divided into two groups: informative and non-informative priors [27] [30]. An informative prior expresses specific, definite information about a variable, whereas an uninformative (or diffuse) prior expresses only general information about a variable. We aim to introduce a new, data-acquisition adaptive prior, which makes JARD non-informative. Specifically, we adopt Jeffreys non-informative prior p J (θ) which can be written as: where I(θ) denotes the Fisher information matrix given by The Fisher information matrix I(θ) provides the amount of expected information about the parameter vector θ in measurements. By definition, it is influenced by properties of the dataacquisition such as the number of data points and the noise. In general, Jeffreys prior is in agreement with one's intuition that if a parameter is necessary, it must be supported by the data. Poot [31] showed that the Fisher information matrix for Rice distributed measurements given by Eq (13) can be efficiently computed. We aim to exploit Jeffreys prior in Bayesian estimation to discriminate between configurations that yield a degenerate or a non-degenerate model. The preferred prior must convey support for a dual tensor representation in crossing fibers, e.g. represented by two tensors at 90-degree angles. Therefore, the prior should be flat in such a configuration, making that it only mildly affects she posterior distribution, which would peak near the initial dual tensor parameters obtained by MLE. Reversely, consider an actual single tensor configuration, e.g. a dual tensor model consisting of two tensors with the same shape and orientation and having f 1,2 = 0.45. In this case the dual-tensor model is degenerate and the prior must be harsh, promoting a near-zero volume fraction in the posterior distribution.
In order to meet these preferred properties, we use a prior based on Jeffries prior: Fig 1 illustrates the shape of the prior. Notice in Fig 1(A) that as the volume fraction f 1 % 0.45 the prior is rather flat. This is because in a true dual tensor configuration, there is no correlation between the model parameters, as such yielding a maximum for det(I(θ)). Reversely, as the volume fraction f 1 approaches zero to establish a single tensor configuration, there will be large correlation between several parameters yielding a very small det(I(θ)) and in turn a sharp rise in the prior probability. Additionally, Fig 1(B) shows that as orientation divergence α 4 decreases to zero, the prior increases in the direction of f 1 = 0. As such, the prior will favor a 'simpler' configuration (i.e. push f 2 to zero) as long as the decrease in goodness of fit is more than compensated for by an increase in the prior probability. Note that the bathtub shape of the prior probability in Fig 1(A) indicates that the roles of the two volume fractions can be exchanged. The deviation from a pure symmetric curve is due to the difference in FA of the two crossing fiber bundles. A nice property of Jeffreys prior is that it automatically adapts to data acquisition conditions. This is exactly why we describe our prior as data-acquisition adaptive. For example, a decreased number of gradient directions or an increased noise level will reduce the Fisher information of a dual tensor model. This corresponds to larger theoretical minimum variance and a reduced support of that model. Therefore, the curve representing the lower SNR in Fig 1 (C) (blue) starts to increase at a larger α 4 than that corresponding to the higher SNR (red). This shows that a higher SNR yields a better orientation sensitivity in detecting and estimating crossing fibers. In effect, our prior enforces that the dual-tensor model reduces to the single tensor representation unless there is sufficient support for two non-zero tensors in the data. Furthermore, a higher b-value by itself enhances the support for a two-tensor representation (see [13] [32] [33]). However, such a higher b-value usually comes with a lower signal to noise ratio. Therefore, the effect of varying the b-values is more difficult to predict. In previous work, we found that imaging at two b-values (1000, 3000) is appropriate for precise estimation of diffusion parameters in fiber crossings.

Results
All experiments below were carried out on a DELL laptop computer with an Intel i7-2620CPU @2.7GHz and 4 GB RAM running a Windows-7 64-bit operating system. The method was implemented in MATLAB_R2014b. The average execution time on our brain image data was 6 voxel/s. Applying our method to one DTI volume (from dataset A, see below) took approximately one day by parallel processing of voxels on a cluster of 12 Intel Pentium cores. Clearly, a further speedup can be achieved by involving more cores because the execution time decreases linearly with the number of cores. Furthermore, the presented number is measured based on interpreted MATLAB code. We expect that a further speedup can be achieved by compiling the MATLAB code.
In the first part of this section we evaluate the performance of estimating the parameters of our constrained dual tensor model by JARD and by MLE on simulated data. We studied the differences between JARD and MLE as a function of the volume fraction for simulated crossing fibers under realistic conditions. Diffusion measurements were simulated by means of the model presented in Eq (1). The parameters of crossing fibers are listed in Table 1 and are in agreement with the work of Pierpaoli [34] who reported diffusivities ranging from 0.25×10 −3 to 1.5×10 −3 mm −2 s. At the given diffusivities FA 1 = 0.91 and FA 2 = 0.67. The SNR (defined by S 0 /σ) was 25 [3]. Two measurements at b = 0 mm -2 s were simulated. Furthermore, 92 gradient directions were adopted for each of two b-values (1.0Á10 3 mm −2 s and 3.0Á10 3 mm −2 s), homogeneously distributed over the surface of a sphere. These settings are identical to dataset B (see below) and equal to those reported in [3].
In the second part of this section we demonstrate the potential of the proposed framework for some neuroimaging applications. Therefore, we evaluated the performance on varying types of brain datasets to verify whether a reliable estimation could be achieved. We applied JARD and MLE to the genu of the corpus callosum (CC) representing a single fiber region enclosed at both ends by a crossing with the corticospinal tract (CST). Subsequently, we show a neuroimaging application of our proposed framework, i.e. automatic estimation of diffusion properties.
Three different datasets, acquired with different acquisition protocols, were adopted to explore the two methods. Dataset A concerned diffusion data from one subject of the Human Connectome Project (HCP) [35]. The relevant acquisition parameters of this dataset were: three b-values 1000, 2000 and 3000 s/mm 2 , 90 gradient directions per b-shell and three measurements at b = 0 per b-shell, TE/TR 89.5/5520 ms, voxel size 1.25×1.25×1.25 mm 3 . Dataset B was acquired from one control subject (see also [3]). The acquisition parameters: two b-values 1000 and 3000 s/mm 2 , 92 gradient directions per b-shell and one measurements at b = 0 per bshell, TE/TR 84/3800 ms, voxel size 1.7×1.7×2.2 mm 3 . Dataset C consisted of data from 24 healthy controls from an ongoing DTI study into the effects of HIV on the brain [36]. The acquisition parameters were: two b-values at 1000 and 2000 s/mm 2 , 64 gradient directions bshell and one measurements at b = 0 per b-shell, voxel size 2.0×2.0×2.0 mm 3 . The SNR for each of the three datasets was found to be higher or equal than 20. This was determined by fitting a single tensor model to a selected region of the CC, after which we took the ratio between S υ,0 and the model residual as the estimated SNR.

JARD versus MLE: simulation experiment
The volume fraction of the constituting compartments of a dual tensor model is a crucial parameter for the modeling of simple and complex fiber geometries. Therefore, we evaluated the performance of the methods as the volume fractions of the two compartments were varied. To assess the robustness for variations in volume fraction, for each volume fraction and divergences of 90 degrees and 45 degrees we generated 100 realizations with the parameters given in Table 1. For each realizationθ JARD andθ MLE were computed. Fig 2 shows boxplots depicting the results of dual tensor estimation by JARD (red) and MLE (blue). Since the estimation procedure assigns a random label to the first and second tensor, we need to assign the two estimated tensors to the corresponding ground truth compartment. The estimated tensors are sorted by tensor similarity based on the Frobenius norm: the tensor with the smallest Frobenius norm with respect to the ground truth of compartment 1 received the label 1. Fig 2A-2D) show the results for the 90°crossing. The estimated volume fraction by JARD nearly ideally correlates with the ground truth over the entire range of volume fractions, both for single fiber (f 1 = 0 _ f 1 = 0.9) and crossing fiber configurations (f 1 2 (0, 0.9)), see Fig 2A and 2B). Clearly, MLE yields a random f 1 as the true volume fraction approaches that of a single fiber configuration. Fig 2(C) shows that the median FA 1 estimation is almost identical for both estimation methods in crossing fiber configurations, irrespective of the volume fraction. Fig 2(D) shows that the estimated FA 2 converges (almost) to the true value as the true f 2 increases. The FA estimation in single fibers (f 1 = 0.9 _ f 2 = 0.9) appears equally unbiased for both methods, but a considerably larger spread is encountered with MLE than with JARD. Note that the estimated FA's with f 1 = 0 _ f 2 = 0 essentially represent degenerate measurements and are therefore not shown in the graphs. In the absence of a ground truth this can be detected with JARD as the corresponding volume fraction is automatically forced to (near) zero (see Fig 2 (A)). MLE does not offer such a mechanism, which might lead to a fictitious fiber compartment. Fig 2E-2H) show the results for the 45°crossing. This figure shows similar trends as obtained for the 90°crossing in Fig 2A-2D The effect of varying the fiber orientation was assessed by generating diffusion measurements through Eq (1) using the parameter values given in Table 1, with α 4 2 {0, 20°, 40°, 90°}. For each such configuration 100 noisy realizations were generated. Subsequently, the model  Table 1 and acquisition parameters accords with Dataset B. The boxes display the median and 25th, respectively 75th percentiles of the data distribution; whiskers extend to 1.5 times the interquartile range; values outside these ranges are indicated as individual points.
doi:10.1371/journal.pone.0164336.g002 parameters were estimated by means of MLE and ARD using both the prior from Behrens' paper (i.e. f −1 (1−f) −1 ) as well as Jeffreys prior, referred to as BARD and JARD respectively.
The outcome of this experiment is presented in Fig 4. It contains scatterplots of estimated orientations of the largest eigenvectors of the tensors.
The results confirm that the three methods yield very similar performance as α 4 ! 20°. However, the proposed JARD clearly yields the most precise orientation estimation if α 4 = 0: the tensors with a large volume fraction assert a single orientation, while the scattered orientations all have a very small volume fraction.
Summarizing, the graphs demonstrate two things: 1) JARD facilitates accurate estimation of volume fractions especially with increasingly unbalanced real volume contributions, in which case MLE grossly fails; 2) the estimation of FA by JARD shows a much narrower distribution than by MLE, especially in single fibers

JARD versus MLE: brain imaging
To demonstrate the performance of JARD and MLE on brain data, one subject was randomly selected from each of the three aforementioned datasets (A, B, and C).   Table 1  Clearly, JARD forces the volume fraction of one tensor compartment nearly to zero in the single fiber region of the CC. Evidently, MLE yields an erratic outcome regarding both volume fraction and fiber orientation in the same region. Furthermore, notice that FSL-ARD often returns two fiber orientations in this region. In crossings like the region where CC crosses with CST, JARD, MLE and FSL-ARD yield a similar outcome regarding fiber orientations. These trends can be observed for each of the three datasets.
The performance of FSL-ARD is influenced by the so-called ARD weight: a higher weight reduces the number of secondary fibres per voxel. Fig 6 shows the influence of the FSL-ARD weight on FSL's performance. In this experiment, we examined the same regions as shown in Fig 5 and used three different FSL-ARD weights.

Fig 4. Scatterplots of estimated orientations by MLE and ARD using Behrens prior (BARD) and Jeffreys prior (JARD) for with varying α 4 .
The two axes in each subplot represent angles (unit: π rad). The orientation of the tensor with the largest volume fraction is shown in red, the other in gray. The intensity of each symbol is scaled by the volume fraction of the tensor. It can be noticed that a larger FSL-ARD weight reduces the number of modeled fibers, which is preferred in single fiber regions such as the central part of the corpus callosum. However, a larger weight simultaneously decreases the number of modeled fibers in crossing regions. Furthermore, the proper FSL-ARD weight varies for the different datasets: 10 for dataset A, 1 for dataset B and C. For instance, Fig 6(C) shows that weight = 10 yields good model selection in the single fiber region of Dataset A (see the red circle). However, at this same weight spurious second fibers are not effectively restrained in Dataset B (see the red circle in Fig 6(F)). On the other hand, at this weight no secondary fibers are detected in crossing regions (indicated by yellow circles in (f) and (i)). Observe that our framework (as reported in Fig 5) yields a better performance than FSL-ARD, even at the optimal weight settings, and does not require parameter tuning.  JARD framework (red) and MLE (green). We had to select the tensor compartment that corresponds to the GCC since the labels assigned by MLE and JARD are random. To solve this, the FA belonging to CC was selected based on "front evolution" [37]. In front evolution, the estimated tensor of one compartment is randomly chosen as the reference tensor. Then, the tensor of a neighborhood voxel with the smallest Frobenius norm to the reference tensor receives the same label. After processing all neighbors of the current front as such, these neighbors become the new reference tensors for the next iteration. The green (MLE) and red (JARD) areas indicate the uncertainty in the estimated value as quantified by the square root of the CRLB.

JARD versus MLE: Fractional Anisotropy along the CC
Confirming the above findings, estimation by JARD yields a rather constant FA with small variance. In contrast, MLE yields FA values with a larger variance, particularly in the central part of GCC. We attribute this to overfitting. JARD versus MLE: Dual-tensor FA and volume-fraction maps FA as well as volume-fraction maps have been used to detect white matter changes [38] [37]. Here, we will display FA and volume fraction maps generated by JARD and MLE and point out the differences.
Specifically, Fig 8 shows color-coded RGB maps encoding FA in the red channel and the corresponding volume fraction in the green channel. Left images show the outcome by JARD, right images by MLE; top images reflect the first tensor, bottom images denote the second tensor. The ordering of the tensors was performed by front evolution [37].
Regarding the JARD outcomes, one can observe that in Fig 8 In contrast, MLE does not specifically force the volume fraction of one tensor to zero in single fiber and gray matter regions. Therefore, the corpus callosum in

TBSS based on dual tensor FA and volume-fraction maps
For a statistical analysis of the FA and volume fractions with age, we included the healthy controls from dataset C [36]. The subjects aged between 45 and 50 (12 subjects, mean-age: 46.2, standard deviation 1.49) and those aged between 65 and 75 (12 subjects, mean-age: 68.2, standard deviation 3.72) were selected from the full control group. All data was registered to the MNI152 standard space using FNIRT [39]. We used the version of FNIRT that is implemented in FSL (version 5.0.7). Subsequently, differences between the two age groups were analyzed by means of the classical TBSS technique, i.e. single tensor analysis, [38] as well as the extended TBSS method for the dual tensor models [39]. Compared to the classical TBSS method, the extended TBSS technique employs 'front evolution' to avoid swapping of the two anisotropic components between the different images. Extended TBSS was used to analyze differences in the dual tensor volume fractions (comparable to [37]) as well as differences in the dual tensor FA maps between the two age groups. Importantly, our approach facilitates such a separate analysis of tensor volume and shape. Notice that the volume fraction used in [37] is a different variable, representing the stick strength in a ball-and-stick model. Fig 9 shows an axial slice containing multiple areas with crossing fibers and regions with just a single fiber. Fig 9(A) shows the classical, single tensor TBSS analysis and Fig 9(B) shows the extended TBSS analysis of the volume fraction estimated from FSL's ball-and-stick model. Fig 9C and 9E contain the extended TBSS analysis of the dual tensor FA maps, and Fig 9D and  9F) the extended TBSS analysis of the dual tensor volume fraction maps. The dual tensor estimations in Fig 9C and 9D were obtained by JARD, those in Fig 9E and 9F by means of MLE. The red-yellow colored regions in Fig 9A-9F identify regions where the differences are significant. Color-coded output displaying the FA (green channel) and the corresponding volume fraction (red channel) for ARD and MLE. The tensor compartments were classified into first and second by front evolution. FA of first tensor and its corresponding volume fraction by JARD (a) and by MLE (b); FA of the second tensor and its corresponding volume fraction by JARD (c) and by MLE (d). A percentile stretch was performed on the data for more contrast. This stretching put the 5% lowest signal values to zero and the 5% highest values to 1; the remaining values were linearly mapped in between. The red circle indicates a single fiber region, the blue circle a region with crossing fibers. It can be observed that the single tensor approach does not detect significant differences in the crossing fiber regions indicated by the red ellipse (Fig 9(A)). By comparison, the analysis of FSL's ball-and-stick method yields remarkably more regions with significant differences, particularly in fiber crossings such as in the red ellipse (Fig 9(B)). The JARD method (Fig 9C and  9D)) finds slightly more significant differences compared to both the classical approach and FSL's ball-and-stick method in single tensor regions, see the right part of the green circle ( Fig  9C and 9D)) and also in the a blue circle (Fig 9(D)). This signifies that the superfluous parameters are effectively eliminated by JARD in single fiber regions. In single fiber regions the extended TBSS analysis based on MLE (Fig 9E and 9F)) yields smaller regions with significant differences compared to the classical approach (Fig 9(A)). We attribute this to the large variability in such regions that we already observed with MLE e.g. in Fig 2. Furthermore, we found the expected similarities regarding detected differences in crossing regions (e.g. the red circles) between the MLE and JARD. This indicates that Jeffreys prior does not affect the MLE outcome in such regions. All significant differences are negative, i.e. reduced FA and volume fraction with increasing age. This outcome confirms the finding that significant age-related white matter atrophy was found in the corpus callosum [40].

Discussion
We developed a new framework for estimating the parameters of a constrained dual tensor model in diffusion-weighted MRI. It automatically determines to what extend the diffusion in a voxel should be modeled by one or two rank-2 tensors. In essence, the complexity of the model is implicitly inferred with JARD in a Bayesian probabilistic manner. An initial guess of the diffusion model is obtained by fitting a dual tensor model to the data with MLE. Subsequently, a new automated relevance determination method assesses whether two tensors are 'mandatory' to model the data. If this is not the case, the volume fraction of the superfluous tensor automatically reduces to (nearly) zero.
The proposed framework extends previous work by Behrens et al and Jbabdi et al [38] [37]. A crucial difference is that we employ a rank-2 tensor model, whereas the previous works concerned ball-and-stick models. As such, we aim to recover the full diffusion shape. In FSL 5.0.9, released after the initial submission of this manuscript, bedpostx was extended to include axially symmetric tensors (see http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/WhatsNew#anchor1). This might reduce the number of voxels where bedpostx finds multiple fibers in the corpus callosum (as in Fig 5). However, our model still differs from this new bedpostx in two important ways: (1) bedpostx forces both tensors to have the same perpendicular diffusivity and hence FA, and (2) our prior is data-acquisition adaptive. Furthermore, an important novelty of our work is that Jeffreys non-informative prior is exploited in JARD, yielding an alternative for previously used informative priors. It facilitates accurate and precise estimation of the volume fractions as well as the diffusion properties with a DTM in single fiber and crossing fiber regions. Jeffreys prior is based on the Fisher's information matrix which accounts for properties of the data acquisition, such as diffusion weighting b-value, the gradient directions and the effective SNR. Therefore we call this method data-acquisition adaptive.
We demonstrated that both in a central part of the corpus callosum (single fiber) as well as in a region where the corpus callosum crosses with the corticospinal tract, the configuration inferred by our method corresponds to the expected neuro-anatomy [41] [42]. The proposed framework has been compared with direct MLE of the same dual tensor model. Several differences between the proposed framework and MLE were observed. In regions that were considered to contain just a single fiber, MLE typically inferred a large volume fraction for both tensor components (see Fig 5). Here, the proposed framework yielded a single tensor representation by diminishing the volume fraction of the second tensor component. Furthermore, the FA estimated by MLE showed much more variation than the FA estimated by JARD in such a region (Fig 7). In regions that were considered to contain crossing fibers, the results of MLE and JARD were similar.
There are a few limitations of our method. Firstly, we assume a mono-exponential decay along the eigenvectors of the three compartments up to b = 3000 s/mm 2 . Measuring at higher b-values will certainly introduce sensitivity to different compartments such as the myelin sheet [33] with the associated restricted and hindered diffusion [5]. In the latter case, the Gaussian diffusion assumption is no longer valid. However, investigating such non-Gaussian diffusion is beyond the scope of this work. As such, we follow [3] and [43]. Secondly, recent studies reported the presence of a three-way crossing of fiber bundles [14] [44]. In our framework a dual-tensor model is employed to characterize voxels encompassing crossing fibers. The reason for limiting the number of anisotropic components to two is the limited SNR in our HARDI data. Notice that whereas [14] and [44] only recover the fiber orientation, we aim to reconstruct the full diffusion shape, which requires a higher SNR. In [3] we showed that estimating a dual rank-2 tensor model already requires HARDI at two b-values, data of sufficient SNR, and some model restrictions. The latter is needed to ensure stability as the number of model parameters may approach or even surpass the number of degrees of freedom present in the data. Therefore, fitting a triple rank-2 tensor model to voxels with a three-way-crossing will be even more challenging [13]. Developing methods for estimation of the diffusion properties in threeway-crossing fiber bundles will remain an important challenge for future research.
Diffusion imaging may reveal several aspects to white matter integrity: (I) locations of alterations; (II) which fiber tract is affected; (III) the exact change in diffusion. Previously, many solutions were already proposed for the first two aspects. Our work focused on all three aspects. Particularly, our framework may aid a more accurate characterization of the diffusion shape.