Fully Automated Segmentation of the Pons and Midbrain Using Human T1 MR Brain Images

Purpose This paper describes a novel method to automatically segment the human brainstem into midbrain and pons, called LABS: Landmark-based Automated Brainstem Segmentation. LABS processes high-resolution structural magnetic resonance images (MRIs) according to a revised landmark-based approach integrated with a thresholding method, without manual interaction. Methods This method was first tested on morphological T1-weighted MRIs of 30 healthy subjects. Its reliability was further confirmed by including neurological patients (with Alzheimer's Disease) from the ADNI repository, in whom the presence of volumetric loss within the brainstem had been previously described. Segmentation accuracies were evaluated against expert-drawn manual delineation. To evaluate the quality of LABS segmentation we used volumetric, spatial overlap and distance-based metrics. Results The comparison between the quantitative measurements provided by LABS against manual segmentations revealed excellent results in healthy controls when considering either the midbrain (DICE measures higher that 0.9; Volume ratio around 1 and Hausdorff distance around 3) or the pons (DICE measures around 0.93; Volume ratio ranging 1.024–1.05 and Hausdorff distance around 2). Similar performances were detected for AD patients considering segmentation of the pons (DICE measures higher that 0.93; Volume ratio ranging from 0.97–0.98 and Hausdorff distance ranging 1.07–1.33), while LABS performed lower for the midbrain (DICE measures ranging 0.86–0.88; Volume ratio around 0.95 and Hausdorff distance ranging 1.71–2.15). Conclusions Our study represents the first attempt to validate a new fully automated method for in vivo segmentation of two anatomically complex brainstem subregions. We retain that our method might represent a useful tool for future applications in clinical practice.


Introduction
In neuroimaging, brain segmentation plays an important role in several medical applications. This field of study has attracted much interest from the clinical community since in vivo automatic quantification of anatomical abnormalities in critical brain regions represents crucial information that might significantly impact clinical management and practice (i.e. identification of new biomarkers).
Although manual segmentation is currently considered the gold standard approach to determine the morphology of brain regions [1], this method is traditionally time-consuming and dependent on rater experience. Designing algorithms that automatically segment brain regions is challenging, especially for highly variable structures such as the ones located at subcortical level. In the last few years, a large amount of robust and fully automated segmentation tools have been developed for extracting complex anatomical brain regions. To overall summarize these methods several theoretical categories have been proposed.
For instance, automated segmentation methods may be classified in three broad classes: The first class of methods achieves brain tissue segmentation by applying statistical classification methods to the signal intensities. The initial works used information from only one MRI contrast (i.e., T1 weighted images) [2][3][4] while advanced techniques employed very complicated adaptive segmentation algorithms [5] or multichannel tissue segmentation methods combining data from different imaging contrasts [6], which drastically improved the ability to segment subcortical structures. However, analysis of the signal intensity alone is not sufficient to distinguish between different subcortical gray mater structures. For this reason, it has been suggested [7,8] that incorporating a priori constraints, such as corresponding anatomical landmarks, can improve the identification of boundaries in regions where anatomical boundaries are fuzzy and greater residual anatomic variability remains. (ii) The second class refers to methods that rely primarily on strong shape, mathematical models, where the algorithms are dedicated to discriminate between brain regions by using morphological contents (geometry) of MRIs. For example, FreeSurfer [9] employs affine transformations while combining the voxel intensity values relative to a probability distribution for tissue classes, with the information of the voxel location in respect with the neighboring structures obtained from a manually labeled atlas. Again, Pohl et al. [10] used an Expectation Maximization (EM) type algorithm with shape priors to perform segmentation. The EM algorithm is a general method of finding the maximum likelihood estimate of the parameters of an underlying distribution from a given data set when the data is incomplete or has missing values. This algorithm has been employed by several 3D automatic tools for improving brain segmentation [7,11]. (iii) Finally there are others tools that use mathematical models for brain segmentation that include both the appearance (voxel intensity) and shape (geometry) of each single brain region [12,13]. For example, in FSL/FIRST [13], automated segmentations proceed via a Bayesian probabilistic approach using shape and appearance models, built from a library of manually segmented images, parameterized as surface meshes and then modeled as point distributions.
Using the learned models, FIRST searches through linear combinations of shape modes of variation (principal components) to find the most probable shape instance given the observed intensities from the input image. FIRST uses an empirically determined fixed number of modes (iterations) for each structure. Finally, the vertex information or models are transferred back to the native space in which the boundaries were corrected and volumes (labels) were generated.
Again, an elegant division proposed by Khan et al. [14] distinguishes these methods in: (a) Knowledge-driven methods, which use implicit or explicit anatomical knowledge to guide the segmentation, mainly for individual structure such as caudate nucleus and hippocampus/amygdala [15,16]. (b) Probabilistic-based methods, which treat segmentation as a classification problem and estimate the labeling that maximizes an a posteriori probability given specific constraints (such as Freesufer [9]). (c) Deformable template-based methods, which involve finding a geometric transformation from a pre-labeled template scan to the target scan and propagating the labels with the same transformation to label the target brain.
Although all these methods contributed towards quickly and accurately obtaining in vivo 3D volumetric quantification of almost all clinically relevant subcortical brain regions, there is a specific subcortical region of the human brain that still remains sparsely studied: the brainstem. Among the above-mentioned techniques only few methods directly provide an objective quantification of the brainstem [6,9,13]. The human brainstem is a very complex structure composed by two sub-regions of great clinical interest, the pons and the midbrain, far from being adequately segmented. The automatic 3D segmentation of these two regions is essential by virtue of their neurophysiological peculiarities. The midbrain holds several important nuclei, such as substantia nigra and red nucleus, involved in several neurophysiological processes regulating emotion and motor behaviors. The pons connects the cerebellum to the main portion of the brain through two thick structures known as cerebellar peduncles. White matter inside the pons is crucial to a number of important motor functions, including: arousal, sleeping and sensory awareness. Moreover, there is increasing evidence that some neurodegenerative diseases, such as Parkinson's disease (PD) and Alzheimer's disease (AD), are characterized by early (and distinct) involvement of these two regions [17][18][19][20][21][22]. At this time, the only validated MRI-based measurement employed in clinical practice derives from conventional MRI using manual morphometric quantification. In fact, several authors [19,22,23], using different approaches, demonstrated that the diameter or area assessed on the mid-sagittal plane of these two brainstem structures, allow a reliable differential diagnosis of PD with respect to parkinsonism of different etiology, such as Progressive Supranuclear Palsy (PSP)(clinically similar to PD patients, but with a more rapid disease progression). In all these studies, the steps to identify and quantify the pons and midbrain are based upon manual intervention, thus making it difficult to assess reliability between the methods and the application on large samples (due to time constraints). For this reason, an accurate in vivo automatic measurement of these two regions is an essential step for improving clinical management of neurological patients, as well as, in longitudinal and prospective studies, thereby eliminating the problems associated with manual segmentation.
In this paper we present a novel method able to perform an unbiased automatic segmentation of the pons and the midbrain using high-resolution structural MRIs, in order to obtain accurate measurements of these two anatomically complex subcortical regions. This method is called: LABS (Landmark-based Automated Brainstem Segmentation). LABS is based on a revised landmark-based approach using general, widely-accepted knowledge of human brain morphology, which integrates information about tissue class, structure and position, without requiring manual intervention. This information is integrated with a ''thresholding-based approach'', that allows separation of the two classes through the choice of a pixel intensity value (''threshold'') [24], so that all pixels with intensity greater than the threshold are grouped within a class and all other pixels into another class. Our method has been tested using a 3Tesla MRI scanner on healthy populations having a large age-range and validated using quantitative comparisons. Moreover, to further validate our tool in the neurological realm, we also tested automatic 3D segmentation on patients with AD, obtained from another MRI scanner (Siemens 1.5T, ADNI database)

Participants
Thirty right-handed healthy right-handed subjects (mean age: 42.762.5 years; age range from 24 to 71 y; 15 males) participated in this study. Careful screening was performed to ensure that participants were free of psychiatric and/or neurological history, psychoactive drug treatment, drug or alcohol abuse, and had no MR contraindications. All participants gave written informed consent to participate in the present study, approved by the Ethical Committee of the University 'Magna Graecia' of Catanzaro according to the declaration of Helsinki.
To further validate our study, we enrolled 40 patients with AD (mean age: 71.866.8 years; 22 males) and 40 age-/sex-matched healthy controls (mean age: 73.7610.1 years; 23 males) from the Alzheimer's disease Neuroimaging Initiative (ADNI) repository (www.loni.ucla.edu/ADNI) (Table S1 and Table S2). The ADNI project was launched in 2003 by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), together with private pharmaceutical companies and non-profit organizations. The primary goal of ADNI has been to test whether serial MRIs, positron emission tomography, several biological markers, and clinical/neuropsychological assessments can be suitably combined to measure the progression of mild cognitive impairment and early AD.

MRI Acquisition
Brain MRI for healthy controls was performed on a 3 Tesla scanner with an 8-channel head coil (Discovery MR-750, GE, Milwaukee, WI, USA) at the Neuroimaging Research Unit, Institute of Neurological Sciences, National Research Council, Catanzaro, Italy. Structural MRI data were acquired using a 3D T1-weighted spoiled gradient echo (SPGR) sequence with the following parameters: TR: 3.7 ms, TE: 9.2 ms, FOV: 25.6 mm, flip angle 12u, voxel-size 16161 mm 3 , producing 368 slices covering the entire brain. Subjects were positioned to lie comfortably in the scanner with a forehead-restraining strap and various foam pads to ensure head fixation. All scans had equally good quality with negligible motion artifacts.
AD patients and age-/sex-matched healthy controls followed the ADNI MRI acquisition protocol [25]. We only used images acquired with 1.5 T scanners and already pre-processed to avoid artifacts due to magnetic field inhomogeneity and signal drifts [25]. For each subject, we used the MRI considered as the ''best'' quality scan by the ADNI investigators. In the description of the ADNI methods (http://www.loni.ucla.edu/ADNI/Data/ADNI_ Data.shtml), the ''best'' quality image is the one which was used for the complete pre-processing steps. The identification numbers of the images used in this study are reported in Table S1 and Table  S2.

Algorithm
In this section we describe the proposed automated segmentation approach to define the pons and midbrain parcellation. This method consists of three stages summarized by the outline in the Figure 1, whose details will be presented in the following sections. The input data for our method consisted of high-resolution T1weighted structural MRI scans. This method was first validated on population of 30 right-handed healthy subjects enrolled in our institute. Next, we tested its robustness on additional cohorts of AD and controls extracted from ADNI database.
The first stage was aimed to automatically define the midsagittal plane. For this reason, morphological high-resolution T1weighted images were rigidly registered (using a 6-parameter affine registration) to the template MNI image based on the mutual information metric using the software SPM8 software (http:// www.fil.ion.ucl.ac.uk/spm) and resampling the registered T1 using cubic spline interpolation. Subsequently, we individuated the midsagittal plane using specific anatomical landmarks such as the corpus callosum and the upper part of the brainstem. In the second stage we segmented the brainstem, mammillary body and the quadrigeminal plate using the mid-sagittal plane. The outline of these latter subcortical structures is extremely important for defining the planes useful for dividing brainstem into the pons and midbrain, following previous radiological and anatomical criteria [22,26]. Finally, in the last stage we used the planes of cut, previously defined, to delineate two subvolumes of images in which we may respectively segment and extract the entire volumes of the pons and midbrain.
Each step of our method is based upon a combination of a revised landmark-based approach together with a threshold-based algorithm. The threshold value used for each step was obtained using a revised Otsu et al. [24] approach, introducing some correction factors to improve the quality of the segmentation.

Individuation of mid-sagittal plane
The first stage was characterized by the individuation of the mid-sagittal plane within the 368 slices of our morphological T1weighted sequence. In general, the mid-sagittal plane can be defined as either the plane best matching the cerebral interhemispheric fissure [27,28] or the plane maximizing the bilateral symmetry [29][30][31]. A large amount of work has been dedicated to the automatic individuation of this plane using different approaches [32][33][34][35][36].
In our method we registered the volume of images to a template MNI using a rigid transformation. Next, to reach automatic identification of the ''mid-sagittal'' plane, several pre-processing steps were needed. First, we individuated in the overall T1weighted sequence, a sub-volume of 40 images, called S 1 , (arbitrarily) centered on the ''middle slice'' of slice number 184. Then, we defined the gravity center of the headmask on the middle slice. In particular, the gravity center was obtained by thresholding the morphological T1-weighted images using Otsu's method [24] ( Figure S1). In this way we obtained a binary mask where the coordinates X and Y of the barycenter B were given by Eq. 1, 2: where (x i ,y i ) and p i are the coordinates and the value of the i-th pixel and n is the numbers of pixels of image. Second, knowing the position of the gravity center on the middle slice we may automatically segment the corpus callosum on each slice within S 1 . To do this we binarized the image and then we individuated the corpus callosum as the biggest connected component [37] that is positioned closer to the gravity center of the headmask. The threshold value (t) used to determine the connected component was different from that used to find the center of gravity of the head. In fact, we used a higher threshold in order to separate the corpus callosum, a structure with a very high intensity and therefore very clear in a T1-weighted sequence, from surrounding regions. The threshold value used for the segmentation of the corpus callosum was (Eq.3): where t1 was the threshold obtained using Otsu's binarization [24]. To determine the connected components, we used a neighborhood of eight pixels; and before the binarization, we enhanced the contrast of the grayscale image by transforming the values using contrast-limited adaptive histogram equalization (CLAHE) with a 0.02 contrast enhancement limit of 64 tiles [38].We were then able to automatically discriminate the corpus callosum on each single slice. Examples of automatic segmentation of the corpus callosum on different slices, included in the subvolume S 1 , have been reported in Figure S2.
Considering the slice where we detected the smallest area of the corpus callosum, we automatically individuated three lines as shown in Figure 2. Line A is aligned with the left-bottom extreme point (point 1) and the right-bottom extreme point (point 2) of the corpus callosum. Lines B and C were perpendicular to the first line, and they crossed the corpus callosum respectively at points 1 and 2. These lines are helpful to discriminate the R 1 region on each slice that will be used to segment the upper part of the brainstem. The dimensions of the rectangle R 1 are L6L*2/3 where L was the distance between points 1 and 2.
Each slice within S 1 was binarized and the pixels that are out of the rectangle R 1 were set to zero. Next, we individuated the biggest connected component. This component is considered as the upper part of the brainstem (see Figure S3) and will be used to  calculate the number of pixels composing it on each slice (quantification of area).
In accordance with morphological knowledge, we defined the ''mid-sagittal plane'' as the slice where the upper part of the brainstem is minimal. In this slice we have the biggest distance between the midbrain tectum and the quadrigeminal plate; consequently, we may observe the maximal expansion of the Sylvius aqueduct. In fact, the expansion of Sylvius aqueduct (see Figure S4) is considered an additional anatomical marker characterizing the mid-sagittal plane.
Calculating the area in each single slice included in the subvolume R 1 (see Figure S5), the slice nu 20 is where we detected the minimal area of the upper part of the brainstem, thus it was definitively targeted as the mid-sagittal slice for this subject. Due to the extreme anatomical variability of the human brain, this latter step was repeated for each single subject. This approach has been further validated comparing the mid-sagittal slice, as automatically defined by LABS for each single subject, with that provided by two blind expert neuroradiologists (P.P, F.F). An excellent agreement was found. In fact, in 94% of the cases the algorithm individuated the same slice as those chosen by the neuroradiologists. Furthermore, in the remaining 6% of the cases, the difference in the spatial position between the automated and manual slice was minimal (about 1-2 slices).

Individuation of boundaries of pons and midbrain on mid-sagittal slice
The manual approach proposed by Oba et al. [22], and Luft et al. [26], to identify the boundaries of the pons and the midbrain requires localizing the entire brainstem on the mid-sagittal slice. We used the same method shown in the above section. In particular, repeating some steps previously described, we identified the position of the corpus callosum on the mid-sagittal slice where we re-applied the R 1 region ( Figure 3A). Within this subvolume we individuated the brainstem as the largest connected component. In this case we did not define a lower limit for the rectangle R 1 (as previously done, see Figure 2), thus allowing us to entirely segment the brainstem ( Figure 3B).
The threshold value used to evaluate the connected component was (Eq.4): After the segmentation of the brainstem ( Figure 3B), we were able to identify another two critical anatomical landmarks that will become important for the following steps: mammillary body and quadrigeminal plate ( Figure 4A). To identify the mammillary body we extracted the contour of the brainstem and studying the variation of thickness of the upper left profile we identified the enlargement of the rostral midbrain and so the mammillary body ( Figure S6). To identify the quadrigeminal plate we employed again the R 1 region, deleting the pixels belonging to the brainstem and identifying the plate as the connected component that had the center of gravity closer to the midbrain tectum ( Figure S7).
At this point we can define the boundaries of midbrain and pons according to the method proposed by Oba et al. [22], and Luft et al., [26]. In particular the cranial border of the midbrain is defined as the axial plane through the mammillary body and the superior edge of the quadrigeminal plate. The caudal border is defined as the axial plane aligned for the superior pontine notch and the inferior edge of the quadrigeminal plate. The inferior boundary of pons is composed of a plane parallel to the latter plane and aligned with the inferior pontine notch ( Figure 4B) To automatically define lines A and B, we employed three points ( Figure 5). The positions of the first and the second points are determined by extrapolating the contour of the left side of the region and defining the variations of its profile. Points 1 and 2 represent respectively the beginning and the end of the anterior ''hump'' of the pons. To find the third point we isolated the quadrigeminal plate and defined its inferior point. Finally, we separated the pons from the cerebellum using a coronal plane through two points belonging to the dorsum of the brainstem ( Figure S8).

3D segmentation of the pons and midbrain
In the final stage we extracted the entire volume of the pons and midbrain. First, we removed all pixels outside the brainstem setting the pixels to zero outside the boundaries defined in the previous sections and we resampled volumes to isotropic dimensions of 0.5 mm to allow a better spatial resolution of small brain structures. Then, using a thresholding approach we identified the area of the midbrain and pons on each slice, taking the largest connected component in the binarized image. Before the binarization we processed the image using contrast-limited adaptive histogram equalization. Furthermore, we re-introduced a corrector factor to binarize the image: the threshold value used was (Eq.5): In the Figure S9 and S10 we highlighted in red the automatic segmentation of the midbrain and pons as performed by LABS. As shown in the last slices of Figure S10, the loss of anatomical definition in the upper part of the pons is dependent upon the complex separation between the pons and the middle cerebellar peduncles. In the neuroradiological and neuroimaging community there is no consensus on which anatomical landmark might be useful to improve this separation. For this reason, we decided to introduce another arbitrary landmark in order to better delineate the middle cerebellar peduncles. To do that, the lateral margins of the superior cerebellar peduncle were used for demarcating the boundaries between the middle cerebellar peduncles and pons. In particular, a volumetric slab of 40 mm (0.5-mm section thickness) tangent to the floor of the fourth ventricle was placed on a midsagittal plane to cover the entire extension of superior cerebellar peduncles ( Figure S11).
The resulting oblique coronal images were used for subsequent individuation of two bilateral points employed as anatomical landmarks of the middle cerebellar peduncles ( Figure S12). We used these two points to define two vertical lines, which we then used to separate the pons from the middle cerebellar peduncle ( Figure S13).
Finally, it is possible to obtain a 3D reconstruction of the automatic segmentation of the pons and midbrain using the matlab function, called ''isosurface'' ( Figure 6). Overall, our algorithm run-time is between 10 min and 15 min on a 2.30 GHz Asus with an Intel Core i7-3610QM processor.

MRI-based Manual Volume
Automated brainstem subregion accuracies were compared with manual segmentations as performed by two independent raters (G.Z)(F.F.), with more than 10 years of experience in neuroradiology, blind to the aim of this study. Manual segmentation was performed using MRIcro software (www.mricro.com), drawing, on the mid-sagittal slice, three lines that defined the cranial and the caudal borders of the midbrain and the inferior boundary of the pons in accordance with the methods described by Oba et al. [22] and Luft et al. [26]. All pixels outside these boundaries were then automatically set to zero and the data sets were reoriented into the axial plane. Raters were thus able to segment the pons and midbrain on the same slices used by LABS. The binary masks representing the results of manual segmentation were generated for each subject and considered as the gold standard to evaluate   the performance of the LABS. The LABS output was similarly converted into binary images. Finally, the accuracy of LABS's performance was compared to manual tracing using the following criteria [39]: (i) percent volume overlap or Dice's coefficient as defined in Eq. 6 [40], (ii) percent volume difference as defined in Eq. 7 and (iii) the Hausdorff distance [41].
(i) The DICE coefficient is one of a number of measurements made to determine the extent of spatial overlap between two binary images. It is commonly used in the neuroimaging community to report on the performance of segmentation methods, and its values range between 0 (no overlap) and 1 (perfect agreement). Given two different labels of a structure, L 1 and L 2 , and a function V(L), which takes a label and returns its volume, the percent volume overlap is given by: (ii) For each individual segmentation result we find the volume (V) as the number of labeled voxels multiplied by the voxel dimensions. We then calculate the percentage absolute volumetric difference (AVD) as the ratio of the absolute difference between the original volume and the segmented volume, to the original volume (Eq.7). The absolute value is used to account for some segmentation results having a lower volume than the gold standard, and others having a higher volume.
(iii) The modified Haussdorf distance between two point sets A~a 1 , . . . ,a NA f g nd B~b 1 , . . . ,b NB f g f size NA and NB is defined as (Eq. (8)) where d B a ð Þ represents the minimum distance value at point a to the point set B.

Statistical Analysis
Statistical analysis was performed with Statistical Package for Social Sciences software-SPSS (version 12.0, Chicago IL, USA). Assumptions for normality were tested for all continuous variables. Normality was tested using the Kolmogorov-Smirnov test. All variables were normally distributed.
To analyse effects of age, a linear regression model (r's Pearson) was applied with the volumes of the midbrain and pons as extracted by LABS and by the two independent raters. Unpaired ttest was used to assess the presence of anatomical differences between the brainstem regions of healthy controls and AD patients. Age and gender were included in this model as nuisance variables. All statistical analyses had a two-tailed a level of ,0.05 for defining significance.

LABS validation on healthy controls
LABS showed that the average volumes of midbrain and pons in the human brain were of: 4031, 6 mm 3 and 10440,4 mm 3 (respectively, see Table 1 and Table 2) in agreement with previous post-mortem evidence [42]. The linear regression model revealed significant associations between age and the volumes of brainstem subregions either as measured by LABS (r = 20.44, p-value = 0.01; r = 20.36, p-value = 0.02; respectively for midbrain and pons) or by manual tracers (first rater: r = 20.36, p-value = 0.03; r = 20.31, p-value = 0.04; second rater: r = 20.34, p-value = 0.04; r = 20.33, p-value = 0.03; respectively for midbrain and pons).
To evaluate the accuracy of the LABS's parcellation compared to the gold standard (represented by manual segmentation) we used several MRI metrics (Table 1 and Table 2; Figure 7). A) DICE coefficient; the mean and standard deviation was excellent: (first/second rater) 0.9160.03/0.960.03 for mibrain; 0.9360.03/ 0.9560.03 for the pons. Similarly, the inter-rater variability for DICE coefficients was consistent among raters (mean 6 SD: 0.9560.02; 0.9760.03; respectively for the midbrain and pons). B) Volume Ratio; calculation of the volume ratio between the entire midbrain and pons as provided by LABS with respect to the gold standard confirmed the elevated accuracy of our method showing a slight tendency of LABS to overestimate morphology of the pons (mean 6 SD volume ratio for the first/second rater: 0.99960.11/ 1.000960.133 for midbrain; 1.0560.07/1.02460.05 for the pons). C) Hausdorff distance; examining the surface distances led to more meaningful comparisons between structures as only accuracy of the segmentation boundaries is taken into account. Our tabulated results showed an increased LABS surface distances over the midbrain (mean 6 SD for the first/second rater: 3.00560.75/2.89560.7), while the LABS segmentation boundaries followed the manual gold standard boundaries better over the pons (mean 6 SD for the first/second rater: 1.96661.86/ 2.0362.01), although a large standard deviation among measurements was detected.

LABS validation on neurological population
To further stress the robustness of our MRI-based automated segmentation method, we analysed MRIs of a neurological population. In particular, from the initial cohort we compared the LABS's performance against manual segmentation on randomly selected 10 AD patients extracted from ADNI database (Table S1). Figure S14 showed a qualitative evaluation of the overall 3D reconstruction of the pons and midbrain in the selected AD patients as performed by LABS. Table 3 (60.02). Similarly, the inter-rater variability for DICE coefficients was consistent among raters (mean 6 SD: 0.9160.08; 0.9560.06; respectively for the midbrain and pons). B) Calculation of the volume ratio between the entire midbrain and pons as provided by LABS with respect to the gold standard confirmed the elevated accuracy of our method also considering neurological patients, with values around 0.95 for the midbrain and values around 0.98 for the pons (mean 6 SD volume ratio for the first/second rater: 0.9560.09/0.9560.11 for midbrain; 0.9760.05/0.9860.08 for the pons). C) Hausdorff distance; similar to that found in healthy controls, tabulated results showing an increased LABS surface distances over the midbrain (mean 6 SD for the first/second rater: 1.7160.68/2.1560.83), while the LABS segmentation boundaries followed the manual gold standard boundaries better over the pons (mean 6 SD for the first/second rater: 1.0760.46/1.3360.59).
Since previous post-mortem and neuroimaging studies demonstrated that patients with AD might be characterized by early abnormalities of these two brainstem subregions [20,21], we further evaluated the presence of morphological atrophy, comparing the brainstem volumes, as assessed by LABS, on a larger sample size of AD patients and healthy controls (nu 40 for each group). Unpaired t-test revealed significant findings when comparing the pons and midbrain volumes between groups. The pons volume in AD patients resulted to be strongly reduced (,10%) with respect to controls (T -value = 4.48, p-level = 0.00002; mean 6 SD: 10693.026917.2 mm 3 , 9587.26661260.4 mm 3 ; respectively for controls and AD), whereas the statistical difference detected in the midbrain was less prominent (8.2%)(T -value = 2.89, p-level = 0.004; mean 6 SD: 4864.216492.3 mm 3 , 4461.66729.8 mm 3 ; respectively for controls and AD) (Table S1  and Table S2). To further confirm this later evidence we performed a voxel-based analysis on the same dataset using a well-known unbiased advanced neuroimaging tool: voxel-based morphometry (VBM). The objective quantification of volumetric loss in the brainstem subregions, as performed by LABS, was also confirmed by this different MRI approach (see Figure S15).

Discussion
Segmenting sub-cortical structures from 3D brain images has attracted much attention in recent years given the numerous clinical and neuroscientific applications. Until now, a large amount of advanced post-processing MRI tools have been developed allowing very reliable and consistent objective quantification of almost all subcortical regions of the human brain. However, two clinically relevant regions, such as the pons and midbrain, have received very little attention. As recently stated by Lambert et al., [6], this is in part due to the difficulties in resolving in vivo the internal architecture of the brainstem in a reliable and repeatable fashion. We present here a novel MRI method, called LABS, to automatically segment these two complex brain regions. Overall, LABS may be considered a part of the knowledgedriven methods [14][15][16] that generally use implicit or explicit anatomical knowledge to guide the segmentation, performing better when applied on single brain structures. Generally, this class of automatic 3D segmentation methods presents difficulties if pathology, scan sequence or manual delineation protocol differs from those that the method is designed for [14]. Our data did not support this hypothesis. Indeed, the robustness and accuracy of our method was tested by means of quantitative evaluations both on a large group of healthy subjects and on neurological patients characterized by evident anatomical abnormalities in the brainstem [20,21]. Moreover, further evaluation of our method has also been reached including MRIs acquired with a different scanner and protocol (ADNI database) [25]. On the entire set of healthy controls, quantitative comparison of the LABS's segmentations against the corresponding manual measurements showed excellent performance considering all reliability metrics. In the neurological realm, our software showed high performance in the segmentation of the pons, while more variability was detected in the quantification of the midbrain volume. However, when we directly compared the volumetric measurements of the brainstem subregions, as obtained from LABS, between controls and AD patients we found significant atrophies of the pons and midbrain. Overall, it is important to highlight that among all MRI metrics, LABS's performance had slightly lower accuracy when distancebased measurements were considered. In fact, the Hausdorff distance value, a measure strongly influenced by the shape and surface of the brain region, was higher in the midbrain with respect to pons, thus confirming the intrinsic anatomical complexity of this subcortical region. To the best of our knowledge, LABS is the first advanced neuroimaging technique able to automatically segment the pons and midbrain of the human brain. State-of-the-art methods for human brain MRI segmentation using conventional (manual morphometry) [19,22,23,26] and advanced automatic methods [3][4][5][7][8][9][10][11][12][13][14][15][16] are not applicable to clearly separate the midbrain and pons. LABS represents the first approach that clearly distinguishes these two brainstem subregions without requiring manual intervention. In fact, the segmentation and labeling of the pons and midbrain are performed using a revised landmark-based approach that integrates the well-known thresholding-based approach, with the individuation of specific anatomical markers (i.e., corpus callosum, mamillary body and quadrigeminal plate) useful to better identify and separate the brainstem subregions. Generally, automated segmentation methods, using landmark-based approach, have a more widespread applicability, mainly in neurological context [8]. However, in contrast to atlas or template-based methods (where atlas definition is highly dependent upon the population that they were obtained from), LABS relies heavily on anatomical landmarks present in all neurological and psychiatric populations.
Despite the excellent performance provided by LABS, our method has some limitations that deserve to be discussed. First, the robustness of our method relies heavily on the intensity thresholding of MRIs. Generally, intensity-based classification of MRIs has proven problematic. Intra-scan intensity inhomogeneities due to radio-frequency coils or acquisition sequences are a common source of difficulty that ultimately may disturb intensity-based segmentation methods. Although we also demonstrated the robustness of our method on a different scanner and protocol (ADNI database), a future target should be the inclusion of additional algorithms able to adapt intra-scan and inter-scan  intensity inhomogeneities, by using a priori knowledge of tissue proprieties and voxel-intensity [5]. Another limitation was the chosen anatomical boundaries for the separation of the pons and midbrain from surrounding structures. In particular, as regards the separation of the middle cerebellar peduncles from the pons, due to the lack of shared anatomical criteria, we only referred to one previous morphometric criterium [19] to extract useful landmarks. However, to improve the separation between the pons and middle cerebellar peduncles, we retain that an useful future improvement will derive from the implementation of a multimodal approach, integrating diffusion tensor images (DTI) information in the segmentation steps of our algorithm. With DTI, it would be possible to deeply delineate the boundary between the brainstem and the cerebellar peduncles using cerebellar tracts to place new reliable anatomical landmarks, Final, the computation of the midsagittal plane was restricted to planes parallel to the orientation of the volume after registration, without sub-voxel accuracy -as opposed to other midsagittal plane detection algorithms [43]. Although it has been proposed that this does not seem to be of critical importance in 3D brain images [12], we retain that this lack of precision had a negligible impact on our method since segmentation steps specifically relied on well-defined anatomical landmarks.
In conclusion, after additional developments (i.e., automatic adaptive correction of intensity inhomogeneities for segmenting MRIs), we retain that our method might represent a useful tool for future applications in clinical practice.  Figure S15 VBM analysis revealed the presence of significant volumetric WM loss in the midbrain and pons of AD patients compared to healthy controls. In order to further validate measurement of brainstem as performed by LABS, we employed voxel-based morphometry (VBM) in order to reveal subtle volumetric loss in AD patients. Data were processed using the SPM8 software where we applied VBM implemented in the VBM8 toolbox, incorporating the DARTEL toolbox that was used to obtain a high-dimensional normalization protocol (http://dbm.neuro.uni-jena.de/vbm.html). Images were bias-corrected, tissue classified, and registered using linear (12-parameter affine) and non-linear transformations, within a unified model. Subsequently, the warped white matter (WM) segment was affine transformed into MNI space and were scaled by the Jacobian determinants of the deformations (modulation). Finally, the modulated volumes were smoothed with a Gaussian kernel of 8 mm. The WM volume maps were statistically analysed using the general linear model based on Gaussian random field theory. We investigated the presence of volumetric differences between AD patients (nu40) and healthy controls (nu40) using unpaired t-test. Age and total intracranial volume (ICV) were included in the model as covariates of no-interest. We selected midbrain and pons as regions of interest (ROIs) for VBM analysis. These ROIs were created with the ''aal.02'' atlas included in the Wake Forest University Pickatlas software version 1.04 (http:// www.fmri.wfubmc.edu/download.htm). Statistical threshold was set at P,0.05 with Family-Wise error (FWE) correction for multiple comparisons within ROIs. As showed in Figure S15