A Comparison of Supervised Machine Learning Algorithms and Feature Vectors for MS Lesion Segmentation Using Multimodal Structural MRI

doi:10.1371/journal.pone.0095753

Table 1.

A summary of the supervised classification algorithms.

Figure 1.

Scatter plots of the T1-w, T2-w and FLAIR voxel intensities, and of functions of these intensities, for 10,000 voxels randomly sampled from the MRI studies of 5 randomly sampled subjects.

Each point in the plot represents a single voxel from a study. (A–C) Color key for these plots: (A) the FLAIR volume for an axial slice from a single subject's MRI study, (B) the technician's manual segmentation for this slice and (C) the colors used in the plots for this slice. Lesion voxels are pink, voxels within 1 mm of a lesion voxel are orange, voxels within 2 mm of a lesion voxel are blue and all other brain voxels are grey. The arrows indicate the order in which the features are created. For the unnormalized intensities, no plane can separate lesion voxels from non-lesion voxels; after normalization, and with the addition of features that include neighborhood information, a plane separates lesion from non-lesion voxels with improved accuracy.
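
The color key assigns each brain voxel to a band by its distance from the nearest manually segmented lesion voxel. The sketch below reproduces that banding under stated assumptions: NumPy/SciPy, boolean lesion and brain masks, a known voxel size in mm, and at least one lesion voxel. The function name and the integer codes standing in for pink, orange, blue and grey are hypothetical.

```python
import numpy as np
from scipy import ndimage

# Hypothetical integer codes standing in for the figure's colors.
OUTSIDE, OTHER, WITHIN_2MM, WITHIN_1MM, LESION = -1, 0, 1, 2, 3

def distance_bands(lesion_mask, brain_mask, voxel_size_mm=(1.0, 1.0, 1.0)):
    """Label brain voxels by distance (mm) to the nearest lesion voxel.

    Assumes the lesion mask contains at least one lesion voxel.
    """
    lesion = np.asarray(lesion_mask, dtype=bool)
    brain = np.asarray(brain_mask, dtype=bool)
    # distance_transform_edt gives, for every nonzero element, the distance to
    # the nearest zero element; passing ~lesion makes lesion voxels the zeros.
    dist = ndimage.distance_transform_edt(~lesion, sampling=voxel_size_mm)
    labels = np.full(lesion.shape, OUTSIDE, dtype=int)
    labels[brain] = OTHER
    labels[brain & (dist <= 2.0)] = WITHIN_2MM
    labels[brain & (dist <= 1.0)] = WITHIN_1MM
    labels[brain & lesion] = LESION
    return labels
```

Passing the scanner's voxel dimensions through `sampling` keeps the bands in millimetres rather than voxel units, which matters for anisotropic acquisitions.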

Table 2.

A summary of the six feature vectors.

Figure 2.

An axial slice of the FLAIR volume from a single subject, showing the normalization procedure, the manual lesion segmentation, neighborhood functions of the FLAIR volume, and an example classification result.

(A) FLAIR volume; (B) manual lesion segmentation; (C) FLAIR smoothed volume with neighborhood n = 41; (D) FLAIR first local moment volume with neighborhood n = 5; (E) FLAIR second local moment volume with neighborhood n = 5; (F) FLAIR third local moment volume with neighborhood n = 5; (G) the probability map produced by the logistic regression classifier on the smoothed feature vector; (H) the intensity scale of the probability map. The smoothed volumes act on large neighborhoods, while the local moment volumes act on smaller ones.
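
The neighborhood features in panels (C) to (F) can be approximated with box filters. This is a minimal sketch, not the authors' implementation: it assumes an n × n × n cubic window, takes the first local moment to be the local mean and the second and third to be local central moments, and uses a plain box (mean) filter for the smoothed volume, whereas the paper may use a different kernel or moment definition. Function names are hypothetical.

```python
import numpy as np
from scipy import ndimage

def local_mean(volume, n):
    """Mean over the n x n x n cube centred on each voxel (box filter).

    Edges repeat the border voxel (mode="nearest"); this edge handling is
    an assumption.
    """
    return ndimage.uniform_filter(np.asarray(volume, dtype=float), size=n, mode="nearest")

def local_moments(volume, n=5):
    """First local moment (local mean) and second/third local central moments."""
    v = np.asarray(volume, dtype=float)
    m1 = local_mean(v, n)        # E[x]
    e2 = local_mean(v ** 2, n)   # E[x^2]
    e3 = local_mean(v ** 3, n)   # E[x^3]
    mu2 = e2 - m1 ** 2                      # E[(x - m1)^2]
    mu3 = e3 - 3 * m1 * e2 + 2 * m1 ** 3    # E[(x - m1)^3]
    return m1, mu2, mu3
```

With a hypothetical 3-D array `flair`, `local_mean(flair, 41)` gives a volume analogous to panel (C) and `local_moments(flair, 5)` gives volumes analogous to (D) to (F). Deriving the central moments from raw moments needs only three filter passes, at a small cost in numerical precision.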

Table 3.

A summary of the training set, the training set after the voxel selection procedure has been applied, and the validation set.

Figure 3.

The partial Receiver Operating Characteristic (pROC) curves for the classification algorithms for false positive rates up to 10% in the validation set.

The diagonal line, shown in black on each plot for reference, represents a classifier that performs no better than chance. A plot is presented for each of the six feature vectors: (A) unnormalized, (B) normalized, (C) voxel selection, (D) smoothed, (E) moments, and (F) smoothed and moments. The performance of the simpler classification algorithms on the feature vectors whose features include spatial information is superior to that of the more complex classifiers on the original features in the unnormalized feature vector.
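
A pROC curve is the ordinary ROC curve cut off at a maximum false positive rate, here 10%. One way to build it from voxel-wise classifier scores is sketched below with NumPy, assuming higher scores mean "more likely lesion". Tied scores share one threshold, and the curve is closed by linear interpolation at the cut-off. Function names are hypothetical.

```python
import numpy as np

def roc_points(scores, labels):
    """(fpr, tpr) at every distinct score threshold, from strictest to loosest."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-scores, kind="mergesort")  # descending score
    s, y = scores[order], labels[order]
    # Last position of each run of tied scores: ties share one threshold.
    cut = np.r_[np.nonzero(np.diff(s))[0], s.size - 1]
    tp = np.cumsum(y)[cut]
    fp = np.cumsum(~y)[cut]
    fpr = np.r_[0.0, fp / (~y).sum()]
    tpr = np.r_[0.0, tp / y.sum()]
    return fpr, tpr

def partial_roc(scores, labels, max_fpr=0.10):
    """ROC curve truncated at max_fpr, closed by linear interpolation."""
    fpr, tpr = roc_points(scores, labels)
    i = np.searchsorted(fpr, max_fpr, side="right")  # first index with fpr > max_fpr
    f, t = fpr[:i], tpr[:i]
    if i < fpr.size and f[-1] < max_fpr:
        w = (max_fpr - fpr[i - 1]) / (fpr[i] - fpr[i - 1])
        f = np.r_[f, max_fpr]
        t = np.r_[t, tpr[i - 1] + w * (tpr[i] - tpr[i - 1])]
    return f, t
```

The chance diagonal in each panel corresponds to the points where the true positive rate equals the false positive rate.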

Figure 4.

The scaled partial Area Under the Curve (pAUC) for each algorithm on each feature vector.

Differences in scaled pAUC arise more from differences between feature vectors than from differences between classification algorithms. The scaled pAUC of the simpler classification algorithms on the developed feature vectors is larger than that of the more complex classifiers on the original features in the unnormalized feature vector.
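
The scaled pAUC summarizes each pROC curve as a single number. The caption does not state how the area is scaled, so the sketch below offers two common conventions, both assumptions: dividing the partial area by the largest attainable area (`max_fpr`), so that a perfect classifier scores 1 and a chance classifier `max_fpr / 2`; or the McClish standardization, which maps chance to 0.5 and perfection to 1. The inputs are the FPR/TPR points of a curve already truncated at `max_fpr`; the function name is hypothetical.

```python
import numpy as np

def scaled_pauc(fpr, tpr, max_fpr=0.10, mcclish=False):
    """Scaled partial AUC of a ROC curve already truncated at max_fpr.

    fpr, tpr: points sorted by fpr, starting at fpr = 0 and ending at max_fpr.
    """
    f = np.asarray(fpr, dtype=float)
    t = np.asarray(tpr, dtype=float)
    pauc = np.sum(np.diff(f) * (t[:-1] + t[1:]) / 2.0)  # trapezoidal rule
    if not mcclish:
        return pauc / max_fpr              # perfect = 1, chance = max_fpr / 2
    lo, hi = max_fpr ** 2 / 2.0, max_fpr   # partial areas of chance and perfect curves
    return 0.5 * (1.0 + (pauc - lo) / (hi - lo))  # chance = 0.5, perfect = 1
```

Restricting the area to low false positive rates emphasizes the operating range that matters in lesion segmentation, where non-lesion voxels vastly outnumber lesion voxels.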

Figure 5.

The Dice similarity coefficient (DSC) for all pairs of classification algorithm segmentations and manual segmentations.

The binary segmentations for each classification algorithm are obtained by thresholding at a false positive rate of 0.5% in the validation set. A plot is presented for each of the six feature vectors: (A) unnormalized, (B) normalized, (C) voxel selection, (D) smoothed, (E) moments, and (F) smoothed and moments. On the developed feature vectors, the class labels that the algorithms assign to the voxels are similar. Thus not only is the overall predictive performance of the methods similar on these vectors, but the resulting segmentations are also similar.
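
The two steps in this caption, choosing a cut-off at a fixed false positive rate and comparing binary masks with the Dice similarity coefficient DSC = 2|A ∩ B| / (|A| + |B|), are sketched below with NumPy. Taking the cut-off as the (1 − 0.005) quantile of the non-lesion voxels' scores is an assumed construction, and the function and dictionary key names are hypothetical.

```python
import numpy as np
from itertools import combinations

def threshold_at_fpr(scores, labels, target_fpr=0.005):
    """Cut-off above which about target_fpr of the non-lesion voxels fall."""
    scores = np.asarray(scores, dtype=float)
    negatives = scores[~np.asarray(labels, dtype=bool)]
    return np.quantile(negatives, 1.0 - target_fpr)

def dice(a, b):
    """Dice similarity coefficient 2|A & B| / (|A| + |B|) of two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else 2.0 * np.logical_and(a, b).sum() / total

def pairwise_dice(masks):
    """DSC for every unordered pair in a dict mapping name -> binary mask."""
    return {(p, q): dice(masks[p], masks[q]) for p, q in combinations(masks, 2)}
```

A segmentation is then `scores > cutoff`, with the cut-off computed once on the validation set. Including the manual segmentation as one entry of the dictionary yields both the algorithm-versus-algorithm and algorithm-versus-manual DSCs.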

Figure 6.

(A) The time in hours required to fit each algorithm on each feature vector and (B) the time in minutes required by the fitted algorithm to make a prediction for a single MRI study. In both bar plots, the horizontal axis is partitioned by the six feature vectors. The simpler algorithms without tuning parameters require substantially less computational time than the more complex methods.

Figure 7.

The impact of downsampling the training set on computational time and classification performance.

Time in hours to fit the algorithm (left column) and scaled pAUC for false positive rates up to 10% (right column) versus the number of voxels on which the algorithm is fit, for the unnormalized (A,B) and the smoothed and moments (C,D) feature vectors. Downsampling the training set is effective: the performance of the algorithms is not affected, while the computational time is substantially reduced.
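
The downsampling experiment amounts to fitting each algorithm on random subsets of training voxels of increasing size and timing each fit. The sketch below assumes simple random sampling without replacement (a stratified variant that preserves the lesion fraction is an alternative). `fit` stands for any hypothetical callable that trains a classifier, and the function names are likewise hypothetical.

```python
import time
import numpy as np

def downsample(X, y, n_voxels, seed=0):
    """Simple random sample of n_voxels training voxels, without replacement."""
    X, y = np.asarray(X), np.asarray(y)
    if n_voxels >= y.shape[0]:
        return X, y
    rng = np.random.default_rng(seed)
    idx = rng.choice(y.shape[0], size=n_voxels, replace=False)
    return X[idx], y[idx]

def time_fits(fit, X, y, sizes, seed=0):
    """Wall-clock seconds to run fit(X_sub, y_sub) for each training-set size."""
    times = {}
    for n in sizes:
        X_sub, y_sub = downsample(X, y, n, seed)
        start = time.perf_counter()
        fit(X_sub, y_sub)
        times[n] = time.perf_counter() - start
    return times
```

Pairing the timings with the scaled pAUC of each fitted model on the validation set would give the two columns of this figure.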

Figure 8.

The super learner coefficient versus the number of voxels the algorithm is fit on for the (A) unnormalized and the (B) smoothed and moments feature vectors.

As the number of voxels used to fit the algorithm changes, the super learner consistently assigns large weights to the same small set of algorithms. For the unnormalized feature vector, high coefficient weights are assigned to logistic regression, the random forest at one of its tuning parameter settings, and the Gaussian mixture model. On the smoothed and moments feature vector, the super learner favors the less complex algorithms: logistic regression, quadratic discriminant analysis, and linear discriminant analysis. Some weight is also assigned to the Gaussian mixture model and the random forest.
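
The super learner coefficients are the weights of the candidate algorithms in the ensemble. In the R SuperLearner package, the default metalearner (`method.NNLS`) regresses the outcome on the algorithms' cross-validated predictions with non-negative least squares and rescales the coefficients to sum to one. A Python sketch of that step follows, assuming the matrix of out-of-fold predictions has already been built; whether it matches the configuration used in this study is an assumption, and the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

def super_learner_weights(cv_preds, y):
    """Convex-combination weights for the candidate algorithms.

    cv_preds: (n_voxels, n_algorithms) out-of-fold predicted lesion probabilities.
    y: (n_voxels,) manual labels coded 0/1.
    """
    Z = np.asarray(cv_preds, dtype=float)
    coef, _residual = nnls(Z, np.asarray(y, dtype=float))  # non-negative least squares
    total = coef.sum()
    return coef / total if total > 0 else coef  # rescale so the weights sum to one

def super_learner_predict(preds, weights):
    """Weighted average of the algorithms' predicted probabilities."""
    return np.asarray(preds, dtype=float) @ weights
```

Because the weights are non-negative and sum to one, an algorithm whose predictions add nothing beyond the others tends to receive a weight at or near zero, which is consistent with the concentration of weight on a few algorithms seen in this figure.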
