A Novel Mouse Segmentation Method Based on Dynamic Contrast Enhanced Micro-CT Images

With the development of hybrid imaging scanners, micro-CT is widely used in locating abnormalities, studying drug metabolism, and providing structural priors to aid image reconstruction in functional imaging. Due to the low contrast of soft tissues, segmentation of soft tissue organs from mouse micro-CT images is a challenging problem. In this paper, we propose a mouse segmentation scheme based on dynamic contrast enhanced micro-CT images. With a homemade fast scanning micro-CT scanner, dynamic contrast enhanced images were acquired before and after injection of non-ionic iodinated contrast agents (iohexol). Then the feature vector of each voxel was extracted from the signal intensities at different time points. Based on these features, the heart, liver, spleen, lung, and kidney could be classified into different categories and extracted from separate categories by morphological processing. The bone structure was segmented using a thresholding method. Our method was validated on seven BALB/c mice using two different classifiers: a support vector machine classifier with a radial basis function kernel and a random forest classifier. The results were compared to manual segmentation, and the performance was assessed using the Dice similarity coefficient, false positive ratio, and false negative ratio. The results showed high accuracy with the Dice similarity coefficient ranging from 0.709 ± 0.078 for the spleen to 0.929 ± 0.006 for the kidney.


Introduction
Multi-modality imaging techniques have been widely used in preclinical research owing to their advantage in providing complementary information. As a predominant structural imaging modality, micro-CT has often been combined with various functional imaging modalities including positron emission tomography (PET) [1], single photon emission computed tomography (SPECT) [2], and optical imaging such as fluorescence molecular tomography (FMT) [3] and bioluminescence tomography (BLT) [4]. In a hybrid imaging system combining FMT and micro-CT, the anatomical information provided by micro-CT can be used to locate the a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 A homemade micro-CT is used to acquire the dynamic contrast enhanced (DCE) images. Then, a principal component analysis (PCA) is performed to reduce data dimensions and noise. Supervoxel generation is implemented to provide primitives for classification. Subsequently, supervised learning algorithms are used to classify the supervoxels into different categories according to the supervoxels' signal intensities at different time points. The heart, liver, spleen, lung, and kidney are extracted separately from the corresponding categories through morphological processing. The bone is segmented by a thresholding method. Finally, all organs were integrated, forming an intact mouse anatomy. The method was validated on seven BALB/c mice and two different classifiers.

Ethics statement
All of the experiments were conducted in accordance with the Institutional Animal Care and Use Committee of Hubei Province. The protocol was approved by the Institutional Animal Care and Use Committee at Tongji Medical College, Huazhong University of Science and Technology (Permit Number: 452). All efforts were made to minimize suffering during the study procedure.

Image acquisition
We used a homemade gantry rotation micro-CT system to acquire the DCE images. The system consists of a micro-focus X-ray tube (Apogee 93501, Oxford Instrument, U.S.) and a complementary metal-oxide-semiconductor (CMOS) based flat-panel detector (Dexela 1512, PerkinElmer, U.K.) that are fixed on a rotation stage (ADRT-260-180, Aerotech, U.S.). The mouse position is adjustable using a three-dimensional translational platform. Source-todetector distance and source-to-object distance are 632.1 mm and 332.7 mm, respectively, contributing to a magnification factor of 1.9. The field of view is 60.5 mm in diameter and 76.5 mm in length.
The heart rate of mice is several times greater than that of humans. Consequently, the non-ionic iodinated contrast agents could be cleared from the vessels of mice in several minutes. In order to acquire DCE images, the scanner was operated using the following high temporal resolution settings: X-ray tube settings of 50 kVp and 800 μA with 1 mm aluminum filter, a 40 ms exposure time per projection, 400 projections, a 360˚continuous rotation, a binning factor of 4, and no gating. Without a sliding ring, each scan ultimately took about 45 s. All images were reconstructed using the Feldkamp's filtered back-projection algorithm accelerated by a GPU [28], resulting in a matrix size of 200×200×450 with an isotropic voxel size of 157.5 μm.
Seven female BALB/c mice with body weights ranging from 14 to 18 g were used for demonstration of the proposed method. A dose of 1 g/kg urethane and 0.2 g/kg α-chloralose was administered intraperitoneally to anesthetize the mouse. Then, a polyethylene catheter, connected with a needle at one end and the tube of a 1 ml syringe at the other end, was inserted into the lateral tail vein and fixed with tape for contrast agent delivery. Each mouse lay prone on the sample holder. The tail with the needle and catheter was kept outside the scanner's field of view. Dynamic image sequence was acquired using the following protocol: after acquisition of the pre-contrast series, iohexol (300 mg I/ml, GE, Shanghai, China) was administrated by a syringe pump (KDS 210, KD scientific, Holliston, MA, USA) at a dose of 25 μl/g and a constant injection duration of 45 s, triggered at the start of the first post-contrast scan. A total of five post-contrast series were obtained with an interval of 50 s. Following the scanning process, mice were revived on a heating pad and returned to their home cages.
During the image acquisition process, some motion, due to breathing and cardiac motions, is likely to occur. Although algorithms [29,30] exist to correct the motion, no registration algorithms were used in this study.

Principal component analysis
As a simple and nonparametric method, principal component analysis (PCA) can be used to reduce the dimensionality of large multivariate datasets and retain the variability at the same time through constructing new uncorrelated variables, namely principal components (PCs) [31]. For each mouse, we reshaped the 3-D image matrix at one time point to a column vector and organized the six column vectors from six time points into a two-dimensional matrix, I-(X, t), with the individual signal versus time curves in its rows (X depicting spatial location and t depicting time). PCA decomposes I into the product of two parts: one of the parts consists of the PCs, which are orthonormal and ordered by decreasing amounts of the variability they represent; the other part is named scores and consists of the weights of each PC in the original data.
We chose the first four PCs, which explained more than 99% of total variability in I, resulting in a new dataset consisting of four scores for each voxel. Thus, the feature dimension reduced from six to four. Then, for each PC, the corresponding scores for the volume were normalized between [0, 1] and reshaped reversely to a 3-D image matrix for subsequent supervoxel generation. The advantage of the PCA here is its ability to remove the effects of noise and also to accelerate the following supervoxel generation and classification by reducing feature dimensions.

Supervoxel generation
Superpixel/supervoxel algorithms provide a convenient primitive to increase the granularity and overcome the influence of noise effectively. Furthermore, superpixels capture the local redundancy and reduce the computational complexity for further processing. The simple linear iterative clustering algorithm (SLIC) [32] was used in this research. The SLIC algorithm adapts a k-means clustering approach to generate 3-D supervoxels. This algorithm has been shown to be a fast method with good performance. SLIC initializes k cluster centers by sampling the grid regularly with distance S ¼ in all three dimensions, where N denotes the number of voxels. Next, the centers are moved to the lowest gradient position in a 3×3×3 neighborhood, followed by iterative clustering. A distance function is defined, combining the spatial and intensity proximity of voxels within a limited 2S×2S×2S region. Ultimately, post processing is applied to enforce connectivity.
The SLIC was extended to an n-feature image to enable extraction of supervoxels from DCE micro-CT images [26]. A distance function combining feature distance and spatial distance was defined as follows: where b jk is the score of the jth voxel corresponding to the kth PC, (x j1 , x j2 , x j3 ) is the 3-D coordinate of the jth voxel, and m is a constant allowing us to weigh the relative importance between color similarity and spatial proximity. Normalizing the spatial proximity and feature similarity by S and m allows the distance measurement to combine these quantities, which have very different ranges. S and m provide control over the size and compactness of the supervoxels, respectively. As presented above, we used n = 4 components, and the scores were between [0, 1]. In our experiments, S was chosen to be 8 by qualitatively assessing the ability of the supervoxel algorithm to correctly separate thin structures, and m was chosen to be 0.02 empirically to provide a good boundary adherence.

Supervoxel classification
After the supervoxels were generated, they were classified into different categories according to the signal intensities at different time points, which were represented by the scores of the first four PCs as previously mentioned. The mean value of all voxels in each supervoxel were calculated and used as the feature vectors. Based on the characteristics of the acquired DCE images, we planned to classify the image into ten categories, which were expected to correspond to heart, liver, spleen, lung, kidney, bone, intestinal wall, intestinal cavity, subcutaneous fat and muscle, and some small regions that adhere to skin that showed irregular changes because of physiologic motion. The training sets were recognized and selected manually from the images to be segmented. In consideration of the organ heterogeneity, for each category, the training set must contain supervoxels from different parts. For example, some supervoxels from the renal parenchyma and renal pelvis were both selected and labeled as kidney in the training set. In order to evaluate the influence of the training sample size, classifications using different numbers of training samples were compared.
The classification was performed with two different classifiers: a support vector machine (SVM) classifier with radial basis function (RBF) kernel and a random forest (RF) classifier.
SVM classifier with RBF kernel: SVM has strong generalization ability and presents advantages in solving small sample, nonlinear, and high dimension pattern recognition problems. An open source library LIBSVM [33] was used in our research. The cost parameter, C, and the shape parameter, γ, of the RBF kernel function were found using a grid search. Twenty-one different C-values, ranging from 2 −5 to 2 15 , and 31 different γ-values, ranging from 2 −15 to 2 15 , were tested, using a 5-fold cross-validation. The (C, γ)-pair that gave the smallest misclassification fraction was chosen for the SVM model. The probability estimates were generated from the SVM decision function.
Random forest (RF) classifier: The RF classifier was included since it incorporates feature selection as a part of its learning and classification algorithms [34]. RF classifiers do not overfit because of the law of large numbers. A RF is an ensemble of decision trees. The number of features selected at each node is recommended to be much less than the total number of features. Therefore, we set it to 2 to ensure the ability of a single decision tree. In addition, we tested the number of trees by 5-fold cross-validation and found 500 trees were enough to ensure the accuracy. The likelihood given by the classifier is computed by averaging the output over all the trees.

Objects extraction and integration
The value for each supervoxel was used as the values for all voxels in the supervoxel. For both classifiers, instead of directly using the classification results, the probabilistic outputs were used. A Gaussian filter was applied on the probability maps, and the voxels were finally labeled to the class with maximum probability. Given that the initial interval, S, for supervoxel generation was set to be 8 voxels, we used a 9×9×9 Gaussian filter with σ = 3.
The final step of our approach is to extract the organs from the corresponding classes by a simple post-processing procedure and then integrate them. For the heart, liver, spleen, lung, and kidney, the procedure was the same. First, a binary image was obtained from the classification results. Then, an object opening using a disk-shaped structure element with a radius of 2 voxels was implemented. Finally, an 8-neighborhood connected component analysis was applied, and holes in the objects were filled.
The bone has many tiny structures. Therefore, it cannot be obtained exactly using a similar procedure. Fortunately, the bone segmentation can be performed on images by a simple thresholding method. Due to the physiological motion, the bone might have small position displacements between different images. For the pre-contrast image, the threshold could be simply determined by the histogram. For the post-contrast images, to avoid the influence of other enhanced organs, the region of interest should be determined in advance by dilating the segmented bone in the pre-contrast image. Then, the threshold segmentation could be implemented in this region.
The subcutaneous fat and muscle and the intestine could not be classified and extracted successfully in this research due to their less prominent enhancement characteristics and anatomical characteristics. After all target organs had been extracted separately, they were integrated into a whole mouse volume of data, and overlaps were eliminated. The priority order was bone, kidney, lung, liver, heart, and spleen, and this order was decided according to the segmentation accuracy and stability.

Validation of the image segmentation
To evaluate our algorithm, we compared the results to the reference datasets segmented manually. In order to quantify the intra-operator variability of manual segmentation, two manual segmentation repetitions were carried out by one expert. For inter-operator variability, a comparison was performed between the segmentation results of two experts. Therefore, a total of three manual segmentations were achieved, which we called M1, M2, and M3, respectively. M1 and M2 were from the first expert, and M3 was from the second expert.
The manual segmentation was performed mainly based on the post-contrast image at 100 s on a 3-D visualization and analysis software platform (Amira, version 5.2.2, Germany). Three metrics, Dice similarity coefficient (DSC), false positive ratio (FPR), and false negative ratio (FNR), were used for accuracy assessment. They are defined as follows: FPR : FNR : where X and Y represent the segmented dataset and the reference dataset, respectively, and |•| denotes the number of voxels. DSC measures the similarity between the two datasets and is a more general measure for accuracy. FPR indicates the percentage of voxels of the segmented data that do not belong to reference data, while FNR indicates the percentage of voxels of the reference data that are not contained by segmented data.

Image acquisition
The images before and after contrast agent administration are shown in Fig 2A. The image at 0 s represented the first post-contrast image acquired during the inflow of contrast agent. Fig 2B  represents the relative signal enhancement versus time curves of the regions whose centers were marked by arrows with same color as in Fig 2A. The relative signal enhancement at a specific time point t is defined as follows: where T 0 and T t represent the intensity of pre-contrast and post-contrast at time point t, respectively. To avoid the influence of noise, we used the mean intensity of a region containing 100 voxels to calculate R t . The curves demonstrate that the heart is enhanced immediately after injection, and the intensity of the kidney remains high for a longer time. The intensity of the spleen has a steeper drop in intensity than the liver. The subcutaneous muscle and the intestinal cavity have little enhancement.
In our experiments, all mice tolerated the injection of 25 μl/g of iohexol, and no immediate behavioral changes were observed. As to the radiation dose, the exposure time was so short that the radiation dose was still very low even though we scanned six times for each mouse. We measured the radiation dose in a mouse carcass with a thermoluminescence dosimeter (CTLD-1000). The detector was calibrated at the National Institute of Metrology of P. R. China. The total dose of six scans is 53.52 mGy, which is only about 1% of the LD50/30 for a small rodent [11].

Influence of the number of training samples
We used a different number of training samples for classification and evaluated the influence of the number of training samples on the accuracy of our segmentation method. The test to evaluate the influence of the number of training samples was performed on one mouse. First, for each of the ten categories, a relatively large number of supervoxels (Table 1) were recognized from the enhanced images and chosen manually to constitute the total data set. Next, five subsets that contained 10%, 30%, 50%, 70%, and 90% of the total number of each category were used as training sets. For each subset, the samples were selected randomly and repeated five times. Finally, these training sets were used for classification, and the target organs were extracted by post-processing. Table 1 lists the number of supervoxels for each category in the total data set. The means and standard deviations of the Dice similarity coefficients compared with M1 for each training number are plotted in Fig 4 and listed in S1 Table. Apparently, the SVM results in a low accuracy for the spleen when the training sample size is too small. In addition, the RF could not extract the lung successfully due to its connection with subcutaneous tissue being misclassified. This happened once when 10% of samples were chosen and

Visual assessment of segmentation results
Fig 5 and S1-S3 Movies show a 3-D isosurface rendering of organs obtained by manual segmentation (M1) and our automatic segmentation using the SVM and RF. Fig 6 shows the segmented organ boundaries superimposed on contrast-enhanced images. Coronal and sagittal slices are presented. The segmentation of the lung and kidney were highly consistent with the ground truth. The heart and liver seemed to be influenced by large blood vessels, such as the posterior vena cava. In addition, the spleen had a larger region than the ground truth.

Quantitative assessment of segmentation results
According to the evaluation results of the influence of the training sample size, the training sample size used for all seven mice in our research was equivalent to about 50% of the total number. The quantitative evaluation of our method based on the two different classifiers is listed in S2 Table and illustrated in Fig 7. The mean values and standard deviations of DSC, FPR, and FNR are plotted. 'SM1' and 'RM1' represents the comparison of the automatic segmentation by SVM and RF with the manual segmentation (M1), respectively. 'M1M3' quantifies the inter-operator variability by comparing the manual segmentations of two independent experts. M3 was taken as reference dataset to compute the FPR and FNR. Similarly, 'M1M2' quantifies the intra-operator variability by comparing two manual segmentation repetitions of one expert, taking the M2 as reference dataset. For the DSC, a mean value closer to 1 means better accuracy. However, for the FPR and FNR, a mean value closer to 0 means better accuracy. The two classifiers yield similar DSC values that are above 0.700 for all organs, which means our results and the ground truth are in good correspondence [35]. The highest accuracy   is obtained for the kidney with a DSC of 0.929, a slightly lower than the DSC between two operators (0.949). Moreover, for the lung, the variability between the automatic and manual segmentation (0.904) is comparable to the inter-operator variability (0.908). Worse result is obtained for the spleen with a low DSC value of 0.709. For the bone, the manual segmentation was only performed in M1. Compared to M1, the bone segmented by the thresholding method yielded a large DSC of 0.925 ± 0.013. The FPR and FNR were 0.083 ± 0.041 and 0.068 ± 0.032, respectively. In order to evaluate the significance of difference between the SVM and RF, a Wilcoxon rank sum test for 'SM1' vs 'RM1' was performed with each organ for each of the three accuracy values. Most of the tests yielded results of p > 0.05, with only three exceptions: the results of the heart FPR and FNR were p = 0.0233 and p = 0.0122, respectively, and the result of the spleen FNR was p = 0.0262.

Discussion and Conclusions
In this manuscript, we proposed a robust automatic segmentation framework for mice based on DCE micro-CT images. Firstly, DCE images were acquired by a homemade micro-CT system. Then, PCA was used to reduce data dimensions, and the image was over-segmented to Dynamic Contrast Enhanced Micro-CT Image Mouse Segmentation Method supervoxels. Subsequently, supervoxels were classified into different categories based on signal enhancement characteristics. Finally, the heart, liver, spleen, lung, and kidney were achieved by simple morphological post-processing, and the bone was segmented by a thresholding method. Seven mice were used to validate the method. Fig 4 illustrates the influence of training set size on the accuracy of segmentation. It is clear that the spleen yields a very small DSC value by the SVM when the training set contains too few samples. In addition, we found that the lung could not be extracted successfully by morphological processing when the RF was applied with a small training set size. This result provides a guidance for decision in the training set size.
The quantitative evaluation demonstrates that our method has high accuracy. Judging from the results of the Wilcoxon rank sum test for 'SM1' vs 'RM1', the two classifiers, SVM with RBF kernel and RF, yield similar DSC without significant difference. As shown in Fig 7, the liver, lung, and kidney achieved remarkable high DSC values (>0.850) and low FPR and FNR values (<0.150), which indicates excellent agreement between the automatic segmentation and manual segmentation, only a little worse than the agreements between two independent manual segmentation. The heart yielded a suboptimal DSC value and a high FPR when it was segmented based on the RF. The segmentation errors occurred mostly due to the influence of large vessels close to the heart, which could be reduced by adjusting the parameter of postprocessing. For example, the radius of structure element used for object opening can be increased to separate the vessels from the heart. The poor performance of the spleen segmentation was induced by its similar intensity change with surrounding tissues, such as pancreas. In addition, due to its small size, small disagreement would lead a great decrease of the DSC value, which could be proved by the large inter-and intra-operator variability of the spleen manual segmentation. The bone segmented by a thresholding method produced a large DSC value of 0.925 ± 0.013.
To date, many efforts have been made to improve the accuracy of atlas-based registrations [12][13][14][15][16]. The performance of registration methods is highly dependent on the relationship between the atlas and the animals. Different strains of the animals need a different atlas. To mitigate individual differences of animals from the same strain, a large number of the animal model must be included, and the posture of the animals must be the same. Baiker et al. [13] had developed an articulated skeleton atlas to overcome large variations in posture. However, the result still had poor accuracy. The DSC between manually segmented and interpolated organs varied between 0.470 ± 0.080 for the kidneys and 0.730 ± 0.040 for the brain, which is far lower than our accuracy. A statistical atlas based on 45 training subjects was constructed by Wang et al. [15] and used to yield high accuracy registration. The lung and heart had a DSC as high as 0.900. However, for the abdominal organs, the DSC was below 0.800. The spleen only had a DSC of 0.450. In contrast, our method has the advantages of simple algorithms and high accuracies without restriction of variety for individual differences and postures of the animals. Except for atlas-based registrations, a few studies on accurate segmentation of whole mice CT images are found in literatures [17][18][19]. The small-animal-dedicated contrast agents, such as iodinated lipid emulsion contrast agents [18] and nanoparticulate contrast agents [19], are often expensive and unavailable. Moreover, due to the low clearance rate, this kind of agent is not suitable for daily repeated imaging and multi-modality imaging. Many simulations [6,18,36], phantom experiments [7,37], and ex vivo experiments [6,38] have proved that the anatomical structures extracted from micro-CT provided a priori for the FMT to reconstruct images with accurate localization and quantification of fluorophore distribution. Owing to the high clearance rate of the non-ionic iodinated contrast agents, the DCE micro-CT images can be acquired immediately after the FMT imaging without moving the mice. Thus, the anatomical information and functional information can be merged directly according to the dual-modal system geometry. As a result, the proposed method could further promote in vivo experiment researches in dual-modal micro-CT/FMT imaging.
It is worth noting that our method still has some limitations. First, no registration for the 4-D images is implemented to correct the breathing and cardiac motion. Second, our method relies on the temporal resolution of the micro-CT scanner due to the rapid clearance rate of the contrast agent. From the curves in Fig 2, we could conclude that a scanner with one-minute temporal resolution is able to capture enough information of different organs in our research for classification. In addition, the administered volume of contrast agent in our research is relatively large, almost 30% of the LD50 [39]. Although all the mice in our experiments tolerated this dose well and no immediate behavioral changes were observed, the adverse effects were not further investigated. However, it can be expected that these limitations will be overcome along with the fast development of the CT techniques. So far, some commercially available scanners are fast enough and have been used in vascular imaging enhanced by non-ionic iodinated contrast agents [21,40]. High performance reconstruction methods, which need fewer exposures, will also reduce the requirements of imaging speed and contrast agents [41,42].