Magnetic Resonance Imaging of the Newborn Brain: Automatic Segmentation of Brain Images into 50 Anatomical Regions

We studied methods for the automatic segmentation of neonatal and developing brain images into 50 anatomical regions, utilizing a new set of manually segmented magnetic resonance (MR) images from 5 term-born and 15 preterm infants imaged at term corrected age called ALBERTs. Two methods were compared: individual registrations with label propagation and fusion; and template based registration with propagation of a maximum probability neonatal ALBERT (MPNA). In both cases we evaluated the performance of different neonatal atlases and MPNA, and the approaches were compared with the manual segmentations by means of the Dice overlap coefficient. Dice values, averaged across regions, were 0.81±0.02 using label propagation and fusion for the preterm population, and 0.81±0.02 using the single registration of a MPNA for the term population. Segmentations of 36 further unsegmented target images of developing brains yielded visibly high-quality results. This registration approach allows the rapid construction of automatically labeled age-specific brain atlases for neonates and the developing brain.


Introduction
Anatomical structures can be segmented in biological images by transfer of voxel labels from an analogous image previously segmented into anatomical regions, or atlas [1]. This requires an accurate alignment and correspondence of structurally equivalent regions between the atlas and the target image usually achieved using non-rigid registration [2,3,4,5,6]. Segmentation methods often first register the atlas to the target image and then segment the target image into anatomical structures based on transferred information [5,7,8,9,10,11,12], although registering multiple atlases to the same target with subsequent fusion of different segmentations will frequently improve the final segmentation result, compensating for nonsystematic errors in single registrations [13,14,15,16,17].
Magnetic Resonance (MR) images of the brains of newborn infants have been particularly difficult to segment [18] due to: the low tissue contrast; signal inhomogeneity; intersubject differences due to the rapid development of the brain, especially in white matter (WM) structures [19]. The development of automatic segmentation methods has been further hindered by the lack of gold standard data for comparison and validation [20].
In this paper we present two methods for the automatic segmentation of neonatal and developing brain MR images into 50 regions of interest (ROI) utilizing a new set of manually defined neonatal atlases called ALBERTs [21]. The first approach is based on fusion of anatomical prior information from various neonatal atlases. The second approach is based on propagation of labels from a maximum probability neonatal ALBERT (MPNA). For both methods we evaluated the performance of different atlases and MPNAs and compared the results to the gold-standard manual segmentations.

Image Acquisition
MR images were acquired using a 3.0 Tesla Philips Achieva scanner (Philips Medical Systems, Best, The Netherlands). The technical characteristics of the scans, as well as detailed demographics, can be found in our previous work [21]. T1weighted magnetization prepared rapid-acquisition gradient echo volumes in the sagittal plane were acquired with an echo time of 4.6 ms and repetition time 17 ms; 124-150 sagittal slices of 1.6 mm thickness were acquired with a 210 mm field of view, a flip angle of 30u, and a 2566256 matrix, resulting in voxel sizes of 0.8260.8261.6 mm 3 . We used the following data sets: N 15 preterm neonates scanned at term (eight female), with a median gestational age at birth of 29 weeks (range 26-35 weeks) and a median gestational age at the time of the scan of the subjects of 40 (37)(38)(39)(40)(41)(42)(43) weeks, manually segmented [21]. Two were twins with a gestational age of 29 weeks, scanned at 40 weeks.
Approval for scanning the subjects had been obtained from the Hammersmith Hospital Research Ethics Committee, and written informed consent obtained prior to scanning. Post-processing of anonymised scan data that had been acquired for clinical purposes did not require individual consent from the individuals who had been scanned.
MR Image Pre-processing T1-weighted 3D image volumes were obtained in DICOM format and converted to the NIfTI format using the UCLA Laboratory of Neuro-Imaging's Debabeler Software (www.loni. ucla.edu/Software/Debabeler). The image matrix was reduced in superior, anterior, posterior, and lateral directions to contain five empty slices (560.82 = 4.1 mm) after the last slice containing skin. Inferiorly, five empty slices (561.6 = 8 mm) were added after the last slice where the posterior floor of the skull was visible. The reduction in matrix size simplified the subsequent bias correction step in that inferior extracranial signal would not have to be Step: Non-rigid registration of cohort to the candidate target; 2nd Step: Averaging of the nonrigid 10 mm transformation (blue arrow); 3rd Step: Inversion of the average nonrigid 10 mm transformation (red arrow); 4th Step: Composite transformation (nonrigid 2.5 mm+inverted average 10 mm); 5th Step: Averaging of the MRIs in average (AV) space. 1b. Second Iteration: 1st Step: Non-rigid registration of cohort to the new candidate (AV space); 2nd Step: Averaging of the nonrigid 10 mm transformation (blue arrow); 3rd Step: Inversion of the average nonrigid 10 mm transformation (red arrow); 4th Step: Composite transformation (nonrigid 2.5 mm+inverted average 10 mm); 5th Step: Averaging of the MRIs in New AV space. doi:10.1371/journal.pone.0059990.g001 considered. The padding around the skull was maintained because our previous work had shown the skull to be an essential landmark for successful registration in young children [16,17]. T1-weighted image volumes were corrected for non-uniformity using the FAST Software from the FMRIB Software Library (FSL version 4, [22]). T1-weighted images were resampled, creating isotropic voxels of 0.8260.8260.82 mm 3 using windowed sinc interpolation, to make them compatible with Analyze AVW 8.1 software. Re-orientation of the sagittal T1 volumes was performed with the horizontal line defined by the anterior and posterior commissures (AC-PC orientation) and the sagittal planes parallel to the midline [23]. We reduced the number of interpolation steps during reorientation by coregistration of the native images onto the re-orientated versions of themselves. This was performed using a method based on normalised mutual information and 7 th degree B-spline interpolation. Coregistration was performed using SPM2 (Statistical Parametric Mapping, Wellcome Trust Centre for Neuroimaging, UCL, London) [24,25] under Matlab version 6.5 (Mathworks Inc, Sherborn, MA, USA). A 16-bit voxel depth was maintained throughout the process.

Delineation, Manual Segmentation, and Nomenclature
The MR images had been manually segmented into 50 ROIs each, covering the whole brain, using newly created protocols established according to previously described principles [26,27,28]. Each voxel belongs to one ROI only, and their ensemble thus constitutes a brain atlas: a label-based encephalic ROI template (ALBERT), as described in detail in Gousias et al., 2012 [21]. The 60-pages illustrated Appendix of the aforementioned companion paper consists of the protocols for all the regions.
In the remainder, the brain atlases are designated as ALBERTs; ALBERTs having been registered onto other individual brain MRIs via their underlying MRI and MRI-to-MRI registrations as ''transformed ALBERTs''; average greyscale MRIs as ''Templates''; fused atlases in a template space (i.e. in an average space as opposed to an individual space) as maximum probability neonatal ALBERT (MPNA).

Automatic Segmentation via Multiple ALBERTs
Registration. After pre-processing, every neonatal subject was paired with every other neonatal subject for image registration, resulting in 380 (20619) image pairs. All pairs were aligned using 3D voxel-based registration in three steps using IRTK Software (available via http://wwwhomes.doc.ic.ac.uk/,dr/ software/): rigid, affine and non-rigid registration. Parameter settings were tuned to the specific challenges posed by images of neonates [17]. Blurring of both target and source images during the subsequent affine registration improved results. Furthermore, we increased the resolution levels from one to three, increased the number of iterations from 100 to 200, decreased the length of steps from 3.2 to 2 and used the correlation coefficient as the similarity measure in this step. For the final non-rigid registration, iterations were increased from 10 to 100 compared with adult-to-adult registration. The non-rigid step was based on the manipulation of a free-form deformation represented by displacements on a grid of control points blended using cubic B-splines [29] (available via http://wwwhomes.doc.ic.ac.uk/,dr/software/) and maximizing normalised mutual information (NMI; [30]. The registration was refined in a multi-resolution fashion by stepwise reduction of the control point spacing from 20 mm to 10 mm, 5 mm and finally 2.5 mm. Registration pairs were processed in parallel on a cluster of approximately 400 Linux PCs, controlled by Condor software (Version 6.7.13, http://www.cs.wisc.edu/condor/).
Label propagation. The output of the registration of an image pair is a transformation that maps the neonatal source image to the neonatal target image. These transformations were then applied to the ALBERTs using nearest-neighbour interpolation, resulting in 19 individualized propagated transformed ALBERTs for each of the 20 target brains.
Decision fusion. Each resulting transformed ALBERT assigns a structure label to every voxel in the corresponding MR image volume. To combine the information from multiple individual label sets into a single segmentation, we applied voterule decision fusion. The consensus class of each voxel was defined as the modal value of the distribution of the individual label assignments [31]. This approach yielded good results in our previous studies [14,16,17,26]). In the case of non-unique modes, one of the modal values was assigned at random. Even vs. odd numbers of individual label sets resulted in twice the number of equivocal voxels, but in absolute terms, the fraction was very small (less than 1% of the total number of voxels) [16]. Three versions were created.
First, we created fused atlases for all 20 subjects based on fusion of all remaining 19 transformed ALBERTs (ALBERTs_19). Secondly, for the 15 preterms, we also created fused atlases based on only the remaining 14 preterm transformed ALBERTs (ALBERTs_14_Pre); and finally for the term population (n = 5), we also created fused atlases based on only the remaining four term transformed ALBERTs (ALBERTs_4_Term).

Automatic Segmentation via Probabilistic Templates and MPNAs
Neonatal template creation. One of the term-born controls was selected as the candidate target. All remaining 19 data sets were registered to the candidate target with rigid, affine, and nonrigid registration starting with 20 mm spacing down to 2.5 mm as described above. The process is illustrated in Figure 1. The 10 mm non-rigid transformations were averaged and the average transformation was inverted, on the assumption that this average transformation maps the hypothetical average space we want to create to the candidate target space. Combining each 2.5 mm non-rigid transformation with the inverse average 10 mm nonrigid transformation we transferred each image, through the candidate target space, to the average space. In the second iteration ( Figure 1b) we used the mean intensity image in average space as the new candidate target, in order to reduce possible bias arising from the choice of the first candidate target. We registered the 19 data sets to the new candidate average and repeated the steps twice. After the second iteration, we obtained the new average space. Similar approach has been used for the creation of pediatric templates [32].
In order to assess the influence of the bias resulting from the choice of the candidate target on subsequent MPNA registrations we created four different average spaces. For three of these, the visually most representative and symmetrical of the term controls was selected as a candidate target and used to transform 1) the whole cohort, 2) the preterm data sets only, 3) the term data sets only. Finally, we selected the visually most representative and symmetrical preterm data set and transformed only the preterm data sets to this candidate target (Table 1).
After the creation of the average spaces for the cohort of neonates, each data set was registered to each average space using the same parameter settings as previously and the segmentations were fused to create eight different MPNAs (Table 2), each corresponding to one of the four template spaces (Table 1). This is similar to the creation of a maximum probability atlas for the pediatric population [32].

Validation
Validation of all automatic segmentations was achieved via overlap measurements, expressed as a Dice index (twice the intersection divided by the union; [33,34]) between the automatically created segmentations and the corresponding manually created ALBERT, which served as the gold standard. Automatic segmentations were based on individual pairwise registrations and subsequent label fusion of: N all 19 remaining manually created ALBERTs -ALBERTs_19, N 14 remaining preterms -ALBERTs_14_Pre, N four remaining terms -ALBERTs_4_Term. transformed ALBERTs. Only the template creation differs in terms of the initial candidate target (see Figure 1 as well as back-registration of the eight MPNAs in the four average template spaces. We also tested the performance of twin pair ALBERT after single label propagation on the corresponding twin brain, compared to the fusion of non-corresponding priors.
We compared the performance of the best methods for each group using two-tailed paired TTEST, after Bonferroni correction for multiple comparisons, with regards to each ROI. For the preterm population we compared the results of ALBERTs_19 with ALBERTs_14 and MPNA_04. For the term-borns we compared the results of ALBERTs_19 with ALBERTs_4 and MPNA_04_-Terms. Besides, we used two-tailed paired TTEST to compare the overall performance of these methods for each group.

Results
A total of forty atlases for 3T MR data sets of neonates resulting from individual pairwise registration and label fusion were created automatically (Figure 2), consisting of 50 ROIs each. Each atlas is the result of label fusion of the remaining 19 ALBERTs (20 atlases), 14 ALBERTs in the cases of preterms (15 atlases), or 4 ALBERTs (5 atlases) in the cases of terms.
In the case of template-based segmentations resulting from single registrations of a template to a target, we used the four templates created using different candidate targets and fusing different cohorts ( Figure 3) described in Table 1. The eight MPNAs created for the corresponding templates ( Figure 3) have been described in Table 2. In total, this resulted in 160 (8620) individualized segmentations via templates and MPNAs that were compared with their respective manual gold standard.
Validation was performed by means of Dice measurements. In Table 3 and Figure 4 we display the results of the validation of the different approaches for automatic segmentation when compared with manual gold standards. In Figure 4, we display some comparative Dice measurements for the approaches that performed best, either fusing ALBERTs or using MPNAs. The best methods for each group and the Dice indices for all 50 ROIs are displayed in Tables 4,5,6,7,8,9. We compared the performance of the best methods for each group using two-tailed paired TTEST, after Bonferroni correction for multiple comparisons, with regards to each ROI. For the preterm population we compared the results of ALBERTs_19 with ALBERTs_14 and MPNA_04. For the term-borns we compared the results of ALBERTs_19 with ALBERTs_4 and MPNA_04_-Terms. Two-tailed paired TTEST showed that the overall performance of ALBERTs_19 and ALBERTs_14 was significantly better than MPNA_04 (Table 3). Also, for the term population, MPNA_04_Terms performed significantly better than AL-BERTs_19 and ALBERTs_4 (Table 3).
For the preterms, in a regional level, ALBERTs_19 performed similarly to ALBERTs_14, without significant differences after Bonferroni correction for multiple comparisons (Tables 4-5). ALBERTs_19 performed better than MPNA_04 in all the regions in either one or both hemispheres, apart from the posterior part of the superior temporal gyrus and the posterior part of the cingulate gyrus (Tables 4, 6).
For the term-borns, in a regional level, ALBERTs_19 performed similarly to ALBERTs_4, without significant differences after Bonferroni correction for multiple comparisons (Tables 7-8). MPNA_04_Terms performed better than ALBERTs_19 in the  Table 4. doi:10.1371/journal.pone.0059990.g004 anterior part of the middle and inferior temporal gyrus and the parietal lobe (Tables 7, 9). MPNA_04_Terms performed better than ALBERTs_4 in the posterior part of the parahippocampal gyrus, the thalamus and the lentiform nucleus (Tables 8-9).
In Figure 5 we display the preliminary results of an automatic segmentation of developing brain MRIs that do not belong to the cohort of twenty used for the creation of the manually defined ALBERTs. In this instance, automatic segmentation was achieved via a single registration of a template constructed from term-borns (Template_04), whereas the MPNA was obtained through fusion of all ALBERTs transformed into the space of Template_04 (MPNA_04, see Tables 1 and 2). The results are visually acceptable.
ALBERTs and MPNAs with corresponding MRIs and templates will become available through our website www.braindevelopment.org.

Discussion
We present two methods for automatic segmentation of neonatal brain MR images into 50 ROIs. The first approach is based on fusion of anatomical prior information from various manually constructed neonatal atlases after one pairwise registration per atlas used. The second approach is based on propagation of labels from various neonatal MPNAs, requiring only one registration. In both cases we evaluated the performance of Such Dice overlaps are in line with results using maximum probability maps in adults, and somewhat lower than multi-atlas propagation and label fusion in adults [14]. Segmentations of unlabeled ex-cohort target images yielded segmentations of high quality on visual inspection. In terms of atlases used, pre-processing pipeline and parameters used, we present the first detailed evaluation of several strategies for the automatic segmentation of neonatal and developing brains. This was made possible through the availability of manual priors. It took 18 person-months to delineate the 1000 (50620) structures and thus create the first cohort of neonatal manual priors, and another four to check them for consistency with the protocols; it is unlikely that larger single-investigator datasets will ever become available. The ALBERTs will become available through our website (www.brain-development.org). While validation of automatic labeling methods is only possible within-sample, where labels created manually with the same protocol are available for calculating overlaps, the availability of the atlases will make it possible to assess automatic segmentation of other cohorts, e.g. NIHPD (http://pediatricmri.nih.gov/nihpd). The validation of the two different methods is based on leaveone-out approaches, which have been widely used by researchers in the past, including our team [14,16,26,35,36]. In such an approach, for the validation of the ALBERTs_19 performance for example, for each of the target brains we use as priors the remaining 19. This means that the manual segmentation of the target brain and the automatic segmentation we obtain after registration, propagation and fusion are two totally independent segmentations. The results of the validation by all means highlight the potential to segment unseen brains.
Automatic segmentation is commonly used in studies of adult MR brain images but has been challenging in infants. An initial spatial normalization to a template or average brain in a standard stereotaxic space [37,38] can be problematic. Spatial normaliza- tion requires an appropriate template [39], and when cerebral images of children are aligned using an adult template the variation of anatomical landmarks is increased [40,41], and greater nonlinear local deformations are required for registration [42]. Indeed the use of the adult MNI template [43] in infants and children has been criticized [44,45,46], and pediatric templates recommended for the analysis of pediatric images [47,48,49]. Transforming neonatal rather than pediatric cerebral images to an adult template has additional difficulties [37] including agedependent differences in regional brain size [50] and unmyelinated white matter with different MR characteristics in neonates [51], so that several groups have constructed specific neonatal templates [37,52,53,54,55,56].
Tissue segmentation is also difficult due to the different and highly variable tissue characteristics [18,57]. Prastawa et al. (2005) reported an automated method using a three-subject atlas for GM, CSF and myelinated and unmyelinated WM, but did not attempt subcortical GM segmentation [20]. Warfield et al. (2000) use a specific template for newborn brains with predefined classifications for myelinated and unmyelinated WM [58]. Huppi et al. (1998) and Inder et al. (2005) showed tissue class segmentation results of newborn infants using this method [59,60]. Kazemi et al. (2011) presented a neonatal brain phantom that consists of 9 different tissue types: skin, fat, muscle, skull, dura mater, gray matter, myelinated white matter, nonmyelinated white matter and cerebrospinal fluid [61]. Despite the difficulties there have been previous reports of automatic segmentation methods for newborn infants. Nishida et al. (2006) presented a semi-automated method for segmentation of preterm infants at term corrected age into anatomical ROI. Unfortunately, their cohort did not include any term controls and they did not validate with any gold standard segmentation and hence the comparison with our method is difficult [62]. There are also approaches based on Diffusion Tensor Imaging, resulting in segmentations with numerous regions with or without clear anatomical or functional correspondence [63,64]. These approaches yield results that are visually plausible, but have not yet been compared or validated against external standards in the neonatal population, as for example defined anatomical protocols. It is hence difficult to compare our work with these two studies.
The average spaces required for spatial normalisation were created using an approach similar to that of Guimond et al. (2000) and Rueckert et al. (2003) for averaging local deformations [39,65], also used in the pediatric population [32]. In some studies, contrary to the main trend of using a standard reference template like MNI, a single subject data set of the image group is selected as the reference or template image [66,67,68]. A disadvantage of this atlas construction method is that the resulting atlas can inherently contain unique features of the selected initial reference image, which results in local topological bias [69]. Group-wise registration, based on the minimization of the average deformation field, could be a solution to the problem [70,71]. However, the presence of a few very unusually shaped brains (cf.  number of subjects available due to the phenomenal effort required for manual delineation, leads us to believe that our strategy of explicitly choosing ''normal looking'' brains is appropriate in this situation. Extremely dolichocephalic subjects or subjects with obvious major asymmetries were not selected as candidate targets. Group-wise registration remains an area for future study. The average Dice indices for the various approaches (Table 3) and the Dice indices for the best approaches (Tables 4, 5 , 6, 7, 8, 9) indicate that fusing ALBERTs of the same group (in terms of degree of myelination) yields better or similar results than fusing more classifiers from different groups: term priors performed better for a term target and preterm priors better for a preterm target, confirming our previous findings [17]. Also, single propagations from twin pair, expected to be more similar than brain MRIs from unrelated subjects, perform at a level comparable to fusion. This finding highlights the importance of resemblance in sulcal and gyral patterns between the source and the target brain, as it has been shown before in corresponding scans between different ages in the context of longitudinal segmentation [17]. Optimal template selection approaches have previously been shown to be effective in atlas-based segmentation of confocal microscopy images of bee brains [15], as well as in human brain segmentation [72,73]. An MPNA was created for the term (MPNA_04_Term) and the preterm brain (MPNA_04_Preterm). This type of atlases [26,32] has shown its potential in the absence of a bigger database or for computational time savings [14]. In the present study, their application results in segmentation accuracies comparable with the segmentation using fusion of transformed ALBERTs. The results of the MPNA_04 template registration, between source template and target image, show the need for crispier and not extremely smooth templates (MPNA_02, 03), which incorporate the basic anatomical information from a smaller number of images and not necessarily the whole cohort (MPNA_01). Besides, the results illustrated in Figure 5 highlight the effectiveness of the MPNA_04 template registration through its intrinsic smoothness to capture the lack of prominent cortical anatomical landmarks in the extremely preterm population. The latter findings highlight the importance of the feature of smoothness, which has to be present but not to an extreme level. Template selection is important, because it has been shown that the choice of the template affects region-based volumetric analysis, either when the template does not correspond to the age cohort [49] or when multiple templates are used [74].
In neonates, the selection of the candidate target was also based on symmetry and normality criteria. The first template (term candidate target -all subjects) is slightly rounder on transverse sections than the second (term candidate target -preterms) ( Figure 3). This happens because the term brains seem to have a more round/spherical brain shape. The difference between the second template and the third (preterm candidate targetpreterms) is more obvious, especially in the subcortical tissues, because of the different candidate target (Figure 3). This could indicate that a good template, in terms of representation of anatomy and corresponding tissue properties, should be limited to a cohort of data sets of tight gestational age range, due to the rapid progression of myelination of the WM and the contrast issues arising as a consequence. In case of a wider gestational age range the template may become extremely blurry. This may be the reason template 3, which was based on the whole preterm cohort and not some images of tighter age range, did not perform as expected for the corresponding preterm population. The fact that template 04, based on a term control candidate average and MRI averaging of the transformed ALBERTs of the remaining four term controls (ALBERTs_4_Term), gave the best results for the term population also supports this statement.
Atlases containing such detailed segmentation can be useful in the monitoring of developmental growth of different brain regions in longitudinal studies or aid group comparisons between normal controls and pathological cases. The associated templates can be used as a reference in functional and connectivity studies and will benefit from the anatomical annotations contained in the associated MPNAs. Both methods presented here yield very plausible and comparable results, ALBERTs performing slightly better in absolute Dice measurements for the preterm. However, MPNAs have the advantage of requiring only one registration per target brain and will require fewer computational power resources (8 hours compared to 2068 = 160 hours for all ALBERTs, even if the latter process can be calculated in parallel on a cluster of computers).