Fig 1.

The Worcester Head Injury Model (WHIM).

Shown are the head exterior (a) and intracranial components (b), along with a peak fiber strain-encoded rendering of the segmented WM outer surface (c). The x-, y-, and z-axes of the model coordinate system correspond to the posterior–anterior, right–left, and inferior–superior directions, respectively. The strain image volume, which was used to generate the rendering within the co-registered head model for illustrative purposes, directly served as input signals for deep learning network training and concussion classification (see Fig 2).

Fig 2.

Structure of the deep learning network.

The network contained five fully connected layers that progressively compressed the fiber-strain-encoded image features, ultimately into a two-unit feature vector for concussion classification.
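The progressive compression can be sketched as a forward pass in plain numpy. Only the two-unit output is given in the caption; the layer widths, tanh nonlinearity, and softmax readout below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

# Illustrative layer widths: only the final two-unit output is stated in the
# caption; the input size and intermediate widths are placeholder assumptions.
layer_sizes = [4096, 512, 128, 32, 8, 2]

rng = np.random.default_rng(0)
# One (weight, offset) pair per fully connected layer: five layers in total.
params = [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
          for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    """Progressively compress the input through five fully connected layers."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:          # hidden layers: assumed nonlinearity
            x = np.tanh(x)
    e = np.exp(x - x.max())              # softmax over the two output units
    return e / e.sum()

p = forward(rng.standard_normal(layer_sizes[0]))
```

The two softmax outputs can then be read as class scores for the concussed versus non-injury cases.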

Table 1.

Summary of the dimensions of the weight and offset parameters, along with the normalization functions used to define the deep learning network.

See Appendix for details regarding the normalization functions.

Fig 3.

Illustration of the training and validation error functions and the corresponding validation accuracy.

(Top): Error functions from three deep learning training trials; (Bottom): the corresponding validation accuracy (based on the 10% of the training dataset held out for internal validation) vs. training epochs for three randomly generated trials. Maximum validation accuracies were achieved using an early-stopping criterion after 2000 epochs.

Fig 4.

Cumulative WM fiber strains on representative orthogonal planes for a pair of striking (non-injury) and struck (concussed) athletes.

Fig 5.

Probability maps of the selected WM voxels.

(a and b): using the F-score-based approach, or (c and d): the RF-based approach, based on 58 independent feature selections. In each trial, the two approaches selected 4% and 1% of the WM voxels, respectively, as features. To improve visualization, only voxels with a selection probability greater than 50% (i.e., selected at least 29 times) are shown. For the RF-based approach, SLF-R and EC-L were two dominant regions often selected for classification. See the Matlab figure (S1 Fig) in the supplementary material for interactive visualization.
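A minimal sketch of F-score-based voxel selection follows, using the common two-class F-score definition (after Chen and Lin); the paper's exact variant is not reproduced here and may differ, and the 4% threshold mirrors the caption.

```python
import numpy as np

def f_scores(X, y):
    """Two-class F-score per feature (voxel): between-class separation of the
    class means relative to the within-class variances. This is the common
    definition; the paper's exact formula may differ."""
    pos, neg = X[y == 1], X[y == 0]
    mu, mu_p, mu_n = X.mean(0), pos.mean(0), neg.mean(0)
    num = (mu_p - mu) ** 2 + (mu_n - mu) ** 2
    den = pos.var(0, ddof=1) + neg.var(0, ddof=1)
    return num / (den + 1e-12)

def select_top_fraction(X, y, frac=0.04):
    """Keep the top `frac` of voxels by F-score (4%, as in the caption)."""
    k = max(1, int(round(frac * X.shape[1])))
    return np.argsort(f_scores(X, y))[::-1][:k]
```

Repeating such a selection across the 58 leave-one-out trials and counting how often each voxel is chosen yields the probability maps shown.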

Table 2.

Summary of results.

Shown are leave-one-out cross-validation accuracy, sensitivity, and specificity based on the testing dataset for the three feature-based machine learning classifiers. No feature selection was conducted and WM voxels of the entire brain were used for classification. For RF, the 95% confidence intervals (CI) were also reported based on the 100 random trials.
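The leave-one-out protocol behind these numbers can be sketched as follows; the nearest-mean classifier here is a toy stand-in (not one of the paper's three classifiers), used only to make the cross-validation loop concrete.

```python
import numpy as np

def nearest_mean_predict(X_tr, y_tr, x):
    """Toy stand-in classifier: predict the class whose training mean is
    closest to x. Not one of the paper's classifiers."""
    m0 = X_tr[y_tr == 0].mean(0)
    m1 = X_tr[y_tr == 1].mean(0)
    return int(np.linalg.norm(x - m1) < np.linalg.norm(x - m0))

def leave_one_out(X, y, predict=nearest_mean_predict):
    """Leave-one-out CV: each of the N samples is held out once; accuracy,
    sensitivity, and specificity come from the N held-out predictions."""
    n = len(y)
    preds = np.empty(n, dtype=int)
    for i in range(n):
        mask = np.arange(n) != i
        preds[i] = predict(X[mask], y[mask], X[i])
    tp = np.sum((preds == 1) & (y == 1))
    tn = np.sum((preds == 0) & (y == 0))
    acc = (tp + tn) / n
    sens = tp / max(1, np.sum(y == 1))   # fraction of concussed cases caught
    spec = tn / max(1, np.sum(y == 0))   # fraction of non-injury cases caught
    return acc, sens, spec
```

With 58 athletes this yields 58 held-out predictions, matching the testing-dataset results reported in the table.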

Table 3.

Performance summary.

Shown are accuracy, sensitivity, and specificity of the three feature-based classifiers when using either the F-score or RF-based approach for feature selection prior to classification (95% CI for RF in parentheses).

Fig 6.

Comparisons of ROCs based on the testing dataset for all seven classifiers.

Fig 7.

Comparisons of ROCs based on the training datasets.

For the deep/machine learning techniques, only results from those with the RF-based feature selection are shown. The two ROCs correspond to the best and worst AUC, respectively.

Table 4.

Performance summary of the best performing feature-based classifiers (all with RF feature selection) as well as of the four scalar metrics from univariate logistic regression.

Accuracy, sensitivity, specificity and AUC were reported based on the 58 separate injury predictions in the leave-one-out cross-validation framework. The average AUC measures (and 95% CI) for the training datasets were also reported.
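The AUC reported here can be computed directly from the 58 held-out scores via the rank-sum (Mann–Whitney) identity; the following is a generic sketch of that computation, not the paper's code.

```python
import numpy as np

def auc(scores, y):
    """AUC via the Mann-Whitney identity: the probability that a randomly
    chosen positive (concussed) case scores above a randomly chosen
    negative (non-injury) case, with ties counted as half."""
    pos = scores[y == 1]
    neg = scores[y == 0]
    greater = (pos[:, None] > neg[None, :]).sum()   # pairwise comparisons
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

This rank-based form is equivalent to integrating the empirical ROC curve, so it applies equally to the scalar metrics under univariate logistic regression and to the classifier outputs.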

Table 5.

Summary of results.

Shown are out-of-bootstrap accuracy, sensitivity, specificity, and AUC (mean and 95% CI [59]), along with .632+ error [58] based on 100 bootstrapped trials using the best performing feature-based classifiers (all with RF feature selection) as well as those of the scalar metrics from univariate logistic regression.
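The out-of-bootstrap procedure can be sketched as below: train on each bootstrap resample and evaluate on the samples it left out, then summarize across trials. The `predict` argument is a caller-supplied classifier, and the .632+ correction [58] is omitted from this sketch.

```python
import numpy as np

def oob_accuracies(X, y, predict, n_boot=100, seed=0):
    """Out-of-bootstrap evaluation: for each of n_boot resamples, fit on the
    resample and test on the out-of-bag samples; report the mean accuracy
    and a percentile 95% CI. The .632+ error correction is not included."""
    rng = np.random.default_rng(seed)
    n = len(y)
    accs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)                # bootstrap resample
        oob = np.setdiff1d(np.arange(n), idx)      # out-of-bag samples
        if oob.size == 0:
            continue
        preds = np.array([predict(X[idx], y[idx], x) for x in X[oob]])
        accs.append(np.mean(preds == y[oob]))
    accs = np.array(accs)
    return accs.mean(), np.percentile(accs, [2.5, 97.5])
```

Sensitivity, specificity, and AUC follow the same pattern, accumulating the per-trial values over the 100 bootstrapped trials.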
