Dynamic changes in diffusion measures improve sensitivity in identifying patients with mild traumatic brain injury

The goal of this study was to investigate patterns of axonal injury in the first week after mild traumatic brain injury (mTBI). We performed a prospective cohort study of 20 patients presenting to the emergency department with mTBI, using 3.0T diffusion tensor MRI immediately after injury and again at 1 week post-injury. Corresponding data were acquired from 16 controls over a similar time interval. Fractional anisotropy (FA) and other diffusion measures were calculated from 11 a priori selected axon tracts at each time-point, and the change across time in each region was quantified for each subject. Clinical outcomes were determined by standardized neurocognitive assessment. We found that mTBI subjects were significantly more likely to have changes in FA in those 11 regions of interest across the one week time period, compared to control subjects whose FA measurements were stable across time. Longitudinal imaging was more sensitive to these subtle changes in white matter integrity than cross-sectional assessments at either of two time points, alone. Analyzing the sources of variance in our control population, we show that this increased sensitivity is likely due to the smaller within-subject variability obtained by longitudinal analysis with each subject as their own control. This is in contrast to the larger between-subject variability obtained by cross-sectional analysis of each individual subject to normalized data from a control group. We also demonstrated that inclusion of all a priori ROIs in an analytic model as opposed to measuring individual ROIs improves detection of white matter changes by overcoming issues of injury heterogeneity. Finally, we employed genetic programming (a bio-inspired computational method for model estimation) to demonstrate that longitudinal changes in FA have utility in predicting the symptomatology of patients with mTBI. 
We conclude concussive brain injury caused acute, measurable changes in the FA of white matter tracts consistent with evolving axonal injury and/or edema, which may contribute to post-concussive symptoms.

Introduction
Traumatic brain injury (TBI) is a significant medical problem worldwide. In the United States, visits to emergency departments (ED) for TBI increased more than 8-fold compared to the total increase in ED visits between 2006-2010, likely reflecting a combination of increased TBI exposure, awareness, and diagnosis [1]. Of all ED visits for concussions or TBI, approximately 75% of individuals are treated and released with diagnoses of mild TBI (mTBI) [2]. mTBI frequently results in cognitive deficits, motor dysfunction and emotional dysregulation [3]. Severity of axonal injury is a determinant of recovery following severe TBI [4], and diffusion tensor imaging (DTI) has emerged as an imaging modality for mTBI that can quantify white matter damage using DTI metrics such as fractional anisotropy (FA), radial, axial, and mean diffusivity (RD, AD, MD) [5][6][7]. However, both increases and decreases in FA may occur at different time points after TBI, and this, coupled with normal variability in DTI metrics in the population at large, represents a substantial challenge for the diagnostic use of DTI in mTBI [8][9][10][11].
If large enough, studies comparing mTBI to control subjects may average out between-subject variability in DTI metrics and show group differences, but natural variation in DTI measures limits the diagnostic value for detecting relatively subtle effects of mTBI in individual patients [12,13]. Animal studies suggest that reductions in FA occur late (7 days) after injury and not in the first 24 hours [14]. A recent meta-analysis of human subjects suggested that increases in FA occur early and drops in FA take longer to evolve, but this observation is based on a composite analysis of multiple separate studies [15]. There are not yet longitudinal studies of DTI metrics at multiple time points during the first week after injury.
We hypothesized that longitudinal imaging at multiple times during the first week after concussion would overcome the limitation of anatomic and mechanistic heterogeneity, and provide increased sensitivity for detection of white matter injuries. The purpose of this study was to quantify the acute changes that occur in DTI metrics in human mTBI subjects across the first week of injury. In addition, we sought to test the hypothesis that acquiring DTI data from mTBI patients at two time points within the first week of injury could improve identification of mTBI subjects compared to controls and delineate injury in mTBI subjects when focusing on the axonal regions of interest most commonly reported as abnormal in the mTBI-DTI literature [16]. The utility of genetic programming in the analysis of DTI metrics was explored as a novel way to diagnose patients who have suffered a mTBI and to predict future clinical outcome.

Study design and setting
This was an Institutional Review Board approved study. All participants provided written informed consent prior to participation in this study and no verbal consent was obtained.
We performed a prospective, controlled cohort study of adult patients with mTBI and healthy controls recruited from the emergency department at a single tertiary care academic medical center. Controls included both trauma patients with isolated extremity injuries without head trauma, as well as healthy, normal subjects. MRI of the brain was performed within 3 days of injury and again between 5 and 10 days post-injury, with a target interval of 7 days.

Study population
Between June 2011 and December 2012 patients coming to the emergency department were screened for eligibility by research staff. Eligible patients were those aged 18-60 years old who were diagnosed with mTBI, defined as an isolated head injury with an injury severity score (ISS) for any other organ system <2 and with two or more concussive symptoms including headache, loss of consciousness, blurred vision, confusion, dizziness, memory problems or poor balance. Patients were excluded if they: 1) did not have two or more concussive symptoms; 2) had severe TBI or past history of severe TBI (i.e., requiring surgery or rehabilitation); 3) were unable to complete initial MRI within 72 hours of injury; 4) had a pre-existing neurological disorder; 5) had a psychiatric condition (including depression or anxiety) requiring medical treatment within the past year; 6) had a history of substance abuse; 7) had any contraindications to MRI scanning. The control group was comprised of normal volunteers without acute injury who responded to flier advertisements and non-head injured extremity trauma patients who presented to the emergency department without head trauma or TBI-associated symptoms. All controls were subject to the same exclusion criteria as mTBI patients, aside from those pertaining to acute head injury.

Clinical data collection
Initial clinical data were prospectively extracted in the emergency department by research staff, through discussion with the patients' healthcare providers, structured patient evaluations and interviews, and reviews of medical records. Additional research-specific details were also collected from questionnaires and the Immediate Post-Concussion Assessment and Cognitive Testing (ImPACT) testing battery [17]. Follow-up data collection was performed by research staff at the time of repeat MRI. All study data were double-entered and compared for accuracy using the Research Electronic Data Capture (REDCap) tools hosted by UVM [18].

MRI acquisition
Brain MRI was performed at the earliest possible time point after injury and again approximately one week later. All initial scans were completed <72 hrs after injury, and the period of 7-10 days following injury for the follow-up scan was selected in order to cover the time period when most patients are maximally symptomatic after concussion [19]. MRI data were acquired on a Philips Achieva TX 3.0 Tesla (Philips Healthcare, Best, Netherlands) MRI scanner using an 8-channel brain coil with dual quasar gradients (maximum gradient strength 80 mT/m, slew rate 100 T/m/s). T1-weighted images were acquired using a 3D inversion recovery spoiled gradient echo technique (TE/TR/TI/flip angle = 3.7 ms/8.1 ms/1008 ms/8°, with a SENSE factor of 1.5). A sagittal acquisition matrix of 240x240x160 provided whole-brain coverage with an isotropic 1 mm spatial resolution and a scan time of less than 8 minutes. Diffusion-weighted images were acquired using a single-shot, spin-echo EPI acquisition with b = 1000 s/mm² with 46 uniformly distributed, non-collinear directions. An additional 6 images were acquired with no diffusion weighting (b = 0 s/mm²). The acquisition matrix was 120x120 with a field of view of 240x240 mm² using a SENSE factor of 2. Fifty-nine contiguous 2 mm-thick slices were acquired, aligned AC-PC, with TE/TR = 68 ms/10000 ms and a scan time of 9 minutes.

Outcome measures
Following the first MRI, research staff administered the ImPACT neurocognitive testing battery version 2.0. The ImPACT test measures attention span, working memory, sustained and selective attention, response variability, non-verbal problem solving, and reaction time, each of which is sensitive to mild cognitive impairment. ImPACT also includes a 22-item symptom score based on patient self-rating on a Likert scale ranging from 0 (the symptom was not experienced at all) to 6 (the symptom was the worst they had ever experienced). The 22 ImPACT symptoms include: headache, nausea, vomiting, balance problems, dizziness, fatigue, trouble falling asleep, sleeping more than usual, drowsiness, sensitivity to light, sensitivity to noise, feeling dazed or stunned, irritability, sadness, nervousness, feeling more emotional than normal, numbness or tingling, feeling slowed down, feeling mentally foggy, difficulty concentrating, difficulty remembering, and visual problems. Repeat ImPACT testing was done at 7-10 days following injury to determine the primary clinical outcome of the total number of post-concussive symptoms (ranging from 1-22). All ImPACT tests were administered in a quiet conference room directly outside the emergency department.

DTI data processing
The diffusion-weighted volumes were manually examined by a single research associate to check for artifacts due to, for example, cardiac pulsatility and subject motion. Corrupted volumes were excluded from the subsequent analysis, and subjects were removed from analysis if more than 4 of the 46 volumes required exclusion. The remaining volumes were coregistered using an affine transformation to correct for both head motion and eddy current-induced distortions. The data were then fit to the diffusion tensor model using FSL4 [20] to generate FA maps. FA maps were spatially normalized using the tract-based spatial statistics (TBSS) [20] processing stream built into FSL (see supplemental methods for the step-by-step processing stream). FA maps were transformed and resampled to 1 mm isotropic resolution in the template MNI 152 space. Eleven a priori regions of interest (ROIs) were selected as those most commonly reported in the mTBI-DTI literature [16]: the splenium, body and genu of the corpus callosum (CC); and the left and right posterior limbs of the internal capsule (PLIC), uncinate fasciculus (UF), corona radiata (CR), and corticospinal tract (CST). ROIs were pre-defined in MNI 152 space using the Johns Hopkins white matter anatomical atlas [21][22][23], and were of uniform size (Splenium: 12729 mm³, Body: 13711 mm³, Genu: 8851 mm³, PLIC R: 3754 mm³, PLIC L: 3752 mm³, UF R: 380 mm³, UF L: 376 mm³, CR L: 18077 mm³, CR R: 18074 mm³, CST R: 1362 mm³, CST L: 1370 mm³) for all control and mTBI subjects. All regions were verified by a neuroradiologist for anatomic accuracy for each scan from each subject. Mean FA, RD, AD and MD were then extracted for each ROI for both mTBI and control subjects.

Cross-sectional data analysis
We first performed cross-sectional comparisons of DTI metrics from each ROI for both timepoints between mTBI and control, using the Mann-Whitney U test, to determine if there were any time-points or regions that might differentiate the groups. We anticipated either no significant differences or differences that were significantly diluted due to noise generated by between-subject variability and injury heterogeneity. Specifically, between-subject variability introduces noise because the normal distribution of DTI metrics, including FA, for a given region is much larger than the magnitude of change attributed to TBI that has been observed in prior human and animal studies [8,14], and is likely more problematic in mTBI. Further, injury heterogeneity due to different mechanisms, directionality and magnitude of concussive forces likely leads to different white matter tract injuries [15], making it unlikely that changes within a specific ROI would be significant. We do acknowledge that specific regions appear more susceptible to injury and are likely affected by multiple types of injuries, but differences via cross-sectional analysis would regardless be subjected to the noise of injury heterogeneity, albeit to a lesser extent. To control for injury heterogeneity in the cross-sectional analysis, we employed a method reported by MacDonald et al. [8] where we used control subjects to create means and standard deviations for each ROI, which were used to identify the number of abnormal regions in mTBI subjects. We looked for significant differences across all 11 regions for abnormalities rather than each region individually, thus avoiding noise created by regional injury heterogeneity. Abnormal regions were defined as having a DTI metric (FA, MD, RD or AD) >2 standard deviations above or below the mean of controls. 
When calculating the number of subjects who would be expected by chance to have DTI metrics >2 standard deviations above or below the mean, based on a binomial distribution with n = 11 regions and assuming regions are independent, the difference between groups reaches statistical significance (p<0.05). When we compared the number of mTBI subjects with more than one abnormal region to the number expected by chance based on a binomial distribution, for n = 20 subjects, the difference between groups was not statistically significant (p = 0.0867).
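The chance expectation described above can be sketched numerically. This is a minimal illustration rather than the authors' SPSS procedure: it assumes normally distributed control values (so a single region lies beyond 2 standard deviations with probability of about 0.0455), independent regions, and a one-sided upper-tail binomial comparison. All function names are hypothetical. Under these assumptions the per-subject probability of having more than one abnormal region evaluates to roughly 0.087, which appears to coincide with the 0.0867 value reported alongside the n = 20 binomial comparison.

```python
import math

N_REGIONS = 11   # a priori ROIs per subject
N_SUBJECTS = 20  # mTBI subjects


def normal_two_sided_tail(z: float) -> float:
    """P(|Z| > z) for a standard normal variable (assumes normality)."""
    return math.erfc(z / math.sqrt(2.0))


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


# Probability that one region is >2 SD from the control mean by chance.
p_region = normal_two_sided_tail(2.0)

# Probability that a subject has MORE THAN ONE abnormal region out of 11.
p_subject = binom_upper_tail(2, N_REGIONS, p_region)

# Number of subjects expected by chance to have more than one abnormal region.
expected_subjects = N_SUBJECTS * p_subject


def chance_p_value(observed_subjects: int) -> float:
    """One-sided probability of seeing at least this many such subjects by chance."""
    return binom_upper_tail(observed_subjects, N_SUBJECTS, p_subject)


print(f"p_region={p_region:.4f}  p_subject={p_subject:.4f}  "
      f"expected={expected_subjects:.2f} of {N_SUBJECTS}")
```

Calling `chance_p_value(k)` with an observed count of subjects gives the tail probability that the count is compared against; the independence assumption is the one stated in the text and is likely optimistic for neighboring white matter tracts.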

Longitudinal data analysis
Next, we employed the use of longitudinal data so that within-subjects comparisons could be performed in order to assess changes in DTI metrics across time. By using the changes in DTI metrics to then compare controls and mTBI subjects, we avoided the noise introduced by between-subject variability. When evaluating the longitudinal data, we analyzed the absolute change in the DTI metrics so that both increases and decreases were included, as opposed to just looking unidirectionally or analyzing each separately. We analyzed bidirectional changes because there is no consensus on how DTI metrics change following mTBI, especially longitudinally during the acute post-injury time period, but there is however consensus that these changes likely indicate injury associated with mTBI [15,16]. There are currently human and animal studies independently showing increases [4,7] [24] [25] and decreases [26] [8] [27] [28] [29] in FA and other DTI metrics following mTBI, as well as a few showing both increases and decreases [30] [31] [15]. Additionally, there are no longitudinal studies of human mTBI patients, which acquire multiple images, within the first week of injury to suggest how DTI metrics may change in the acute to sub-acute period. Animal studies with longitudinal data acquired during the acute/subacute period following TBI have shown both increases [25] and decreases [27] in FA among white matter tracts following the initial FA changes immediately following injury. While more human [15] and animal studies have found that FA tends to decrease early following injury, they neither account for nor explain why others find contradictory results. Even four of the most recent animal studies using DTI to assess mTBI in the acute period found contradictory changes in FA and other DTI metrics in various white matter regions [24,27,28,30]. 
Therefore, because both human and animal studies report increases and decreases in FA and other DTI metrics following TBI, we felt that it was important not to limit our study and risk false negative results that could occur with only a unidirectional analysis.
Since current studies suggest increases and decreases occur following TBI, we performed another analysis considering any change in DTI metrics (either increase or decrease) as abnormal. To do this, longitudinal changes in DTI metrics were first compared between mTBI and control groups for each individual ROI, using the Mann-Whitney U test. Using this method, between-subject variability, but not injury heterogeneity, could be accounted for. We predicted that results might reach significance if a specific region was commonly injured or sensitive to injury, or may not reach significance at all if no particular region was injured with high enough frequency. To avoid noise created by injury heterogeneity within the longitudinal data, we again employed a similar approach to MacDonald et al., as we had with the cross-sectional data, by identifying the number of abnormal regions among mTBI subjects. Similarly, abnormal regions were defined as having a change in a DTI metric (FA, MD, RD or AD) >2 standard deviations above or below the mean of controls. Statistical significance was determined by first calculating the number of subjects that would be expected by chance to have DTI metrics >2 standard deviations above or below the mean, based on a binomial distribution of n = 11 regions of interest, assuming regions are independent. Then the number of mTBI subjects with more than one abnormal region was compared to the number expected by chance based on a binomial distribution with n = 20 subjects. Next, we performed the Wilcoxon rank test to compare mean changes in DTI metrics across all 11 ROIs between mTBI subjects and controls. This allowed for direct comparison between mTBI subjects and controls, while avoiding the noise created by between-subject variability and injury heterogeneity.
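The abnormal-region count on longitudinal data can be illustrated with a short sketch. It assumes each subject's FA values are held in a dictionary keyed by ROI name, that the control mean and sample standard deviation of the absolute change define the limits, and that the toy FA values below are invented for illustration only; none of the names come from the study's actual pipeline.

```python
from statistics import mean, stdev


def abs_changes(fa_t1, fa_t2):
    """Absolute FA change per ROI for one subject (dicts keyed by ROI name)."""
    return {roi: abs(fa_t2[roi] - fa_t1[roi]) for roi in fa_t1}


def control_limits(control_changes, n_sd=2.0):
    """Per-ROI (lower, upper) limits: control mean -/+ n_sd sample SDs."""
    limits = {}
    for roi in control_changes[0]:
        values = [c[roi] for c in control_changes]
        m, s = mean(values), stdev(values)
        limits[roi] = (m - n_sd * s, m + n_sd * s)
    return limits


def count_abnormal(subject_change, limits):
    """Number of ROIs whose change lies outside the control limits."""
    return sum(
        1
        for roi, value in subject_change.items()
        if value < limits[roi][0] or value > limits[roi][1]
    )


# Invented control changes (|FA at time 2 - FA at time 1|) for two ROIs.
controls = [
    {"genu": 0.002, "splenium": 0.003},
    {"genu": 0.004, "splenium": 0.001},
    {"genu": 0.003, "splenium": 0.002},
    {"genu": 0.005, "splenium": 0.004},
]
limits = control_limits(controls)

# Invented mTBI subject: large change in the genu, small change in the splenium.
subject = abs_changes(
    {"genu": 0.450, "splenium": 0.600},
    {"genu": 0.470, "splenium": 0.603},
)
n_abnormal = count_abnormal(subject, limits)
print(n_abnormal)
```

Because the count is taken across all ROIs, a subject is flagged regardless of which tract changed, which is how this approach sidesteps regional injury heterogeneity.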

Data reproducibility
To determine the reproducibility of FA measures, both between-subject and within-subject coefficients of variation (CVs) were calculated for the control subjects. The processing stream employs single step resampling of 2mm resolution subject data to 1mm resolution MNI space. Such resampling of data is unlikely to introduce substantial additional smoothness. However, to verify that resampling did not introduce such smoothing, coefficients of variation were calculated for controls a second time by transforming the mask images into the individual space for comparison without resampling of the diffusion data.
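The text does not give the exact formulas used for the two coefficients of variation, so the sketch below shows one common way to compute them, stated as an assumption: the between-subject CV is the sample standard deviation across subjects divided by their mean at a single time point, and the within-subject CV is the average, over subjects, of the standard deviation of that subject's two scans divided by their mean. The FA values are invented for illustration.

```python
from statistics import mean, stdev


def between_subject_cv(values):
    """CV across subjects at one time point: sample SD / mean."""
    return stdev(values) / mean(values)


def within_subject_cv(scan_pairs):
    """Average over subjects of (SD of repeated scans) / (mean of repeated scans)."""
    return mean([stdev(pair) / mean(pair) for pair in scan_pairs])


# Invented FA values for one ROI: (time 1, time 2) for four control subjects.
fa_pairs = [(0.60, 0.61), (0.66, 0.65), (0.55, 0.56), (0.70, 0.70)]

cv_between = between_subject_cv([t1 for t1, _ in fa_pairs])
cv_within = within_subject_cv(fa_pairs)
print(f"between-subject CV = {cv_between:.1%}, within-subject CV = {cv_within:.1%}")
```

With repeated scans that differ little within each subject, the within-subject CV is much smaller than the between-subject CV, which is the property the longitudinal design relies on.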

Software
All statistical calculations were performed in SPSS (PASW Statistics 18, release 18.0.2). Matlab was used to analyze the results of genetic programming.

Genetic programming
Symbolic regression was performed using the genetic programming (GP) package Eureqa [32] to see if we could predict recovery in mTBI patients. This powerful method implicitly combines feature selection, model identification, and parameter estimation, and has been successfully applied in various application domains [33], including identification of nonlinear relationships in BOLD time-series fMRI data between ROIs in the human brain [34]. GP is a population-based algorithm in which sets of candidate solutions (symbolic expressions) are allowed to evolve based on the principles of Darwinian evolution (reproduction with heritable variation, alternating with fitness-based selection). Eureqa is a bi-objective GP that seeks to simultaneously minimize model prediction error and model complexity, and each run returns a set of solutions that are non-dominated with respect to these two objectives. The user can then select solutions that appear to appropriately balance prediction accuracy and parsimony (to minimize the risk of over-fitting). GP is a "white box" optimization algorithm, in that the resultant predictive expressions may provide domain-specific insights. For this study, we sought to evolve functions capable of predicting the sum of post-concussive symptoms at time 2 (S2). We specified mean absolute prediction error as the primary objective, and model complexity was defined as the sum of the number of operators, constants, and variables (input features) in each evolved expression. Specifically, we allowed the GP to select and combine: 1) arithmetic operators from the set [18]; 2) co-evolved numerical coefficients; and 3) variables from sets of 1 to 12 possible input features, depending on the particular experiment. Following Eureqa's recommendations for small data sets, we performed each symbolic regression on all n = 36 data points.
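The two objectives and the idea of a non-dominated set can be made concrete with a small sketch. This is not Eureqa's implementation; it only illustrates mean absolute error as the error objective and a Pareto filter over (complexity, error). The candidate expressions, their complexities, and their errors are hypothetical placeholders.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    expression: str   # symbolic form (hypothetical examples below)
    complexity: int   # operators + constants + variables
    error: float      # mean absolute prediction error


def mean_absolute_error(predicted, observed):
    """Average of |prediction - observation| over all data points."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both objectives and better on one."""
    no_worse = a.complexity <= b.complexity and a.error <= b.error
    better = a.complexity < b.complexity or a.error < b.error
    return no_worse and better


def non_dominated(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical candidates with invented complexities and errors.
pool = [
    Candidate("c0", 1, 4.0),
    Candidate("a*S1 + b", 5, 2.5),
    Candidate("a*S1*S1 + b", 7, 2.6),                # dominated by the linear model
    Candidate("a*S1 + b*abs_dFA_roi", 9, 1.8),
    Candidate("a*S1 + b*abs_dFA_roi + c", 11, 1.8),  # same error, more complex
]
front = non_dominated(pool)
for c in front:
    print(c.complexity, c.error, c.expression)
```

Choosing a model from the resulting front is then a judgment about how much extra complexity a given drop in error is worth, which is the trade-off between accuracy and parsimony the text describes.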
We considered a total of 34 possible input features: the sum of post-concussive symptoms at time 1 (S1) and 3 types of FA data for each of the 11 ROIs. Nine distinct types of experiments were performed using Eureqa (Version 1.24.0), each using a set of 1-12 of these as input features, as detailed in Table 1. For each of the 9 types of experiments, we performed 5 independent de novo runs of Eureqa, each starting from random initial populations, to assess consistency of the evolved non-dominated sets of solutions. Each run of Eureqa was manually terminated after Eureqa's "percent converged" heuristic (based on time since last significant improvement) had reached 100%, which required on the order of 10^9 function evaluations per run. Running on an 8-core Intel i7-3770 CPU @ 3.40GHz desktop computer, this was usually achieved within 3-5 minutes for the experiments in which the lowest error solutions were achieved. However, in some experiments where there was little or no useful signal in the input feature set, this could take significantly longer (nearly all of the experiments terminated within 30 minutes or less; however, one run of experiment 1, which allowed only FAs at time period 1 as input features, required over 2 hours to converge).

Demographics, neurocognitive outcomes and symptoms scores
Subjects included 20 mild TBI patients: 11 male and 9 female, age range 18-57 years, mean age 30.6 years; 16 controls (9 trauma and 7 healthy): 7 male and 9 female, age range 20-57 years, mean age 28.1 years (Table 2). There were no demographic parameters that differed significantly between mTBI and controls. mTBI and extremity trauma control subjects were enrolled in the study and imaged within 72 hours of injury, with a mean time of 46.5±22 hours for mTBI and 57.6±12 hours for the controls. All subjects from the mTBI and control groups returned for follow-up imaging one week later, at a mean of 6.7±1.1 days for mTBI and 6.9±1.7 days for all controls. All mTBI subjects were symptomatic at the time of first imaging. There were no differences in head motion between mTBI and control groups. mTBI subjects reported significantly more symptoms at time-points 1 and 2 compared to controls. We observed a wide variation in reaction time, working memory and other neurocognitive testing results among both the injured groups and controls, but differences in functional measurements between groups were not significant (Table 3).
Table 3. Neurocognitive and symptom outcomes for the mTBI subjects and trauma controls. Outcomes were measured in those control subjects with extremity injuries only. Values given as mean ± standard deviation.

Reproducibility of diffusion measures
DTI measurements of white matter in healthy individuals across time were remarkably stable. The ROI for the Left Uncinate Fasciculus did not anatomically fit one control subject, so that region was therefore excluded from analysis for that individual. FA measurements across the 11 regions of interest showed that the between-subject variation (CV range 2.3% to 7.3%) was far greater than within-subject variation (CV range 0.4% to 4%). Through the use of longitudinal data, a 70-97% reduction in variance can be achieved (Table 4). CVs calculated in the native space of the control subjects showed negligible differences compared to the CVs calculated from the resampled control data used for analysis (S6 Table). Large white matter regions with higher FA values were also found to have lower levels of within-subject variation. Within our control population, we found a high degree of reproducibility across 1 week (for individual subjects at two time points R² = 0.971).
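As a rough check on the reported reduction in variance, the sketch below relates the two CV ranges under a stated assumption: for a fixed mean, variance scales with the square of the CV, so the reduction is taken as 1 − (CV within / CV between)². Pairing the endpoints of the two ranges is our simplification for illustration; the per-region values are in Table 4, which is not reproduced here, and the authors may have computed the figure differently.

```python
def variance_reduction(cv_within: float, cv_between: float) -> float:
    """Fractional variance reduction, assuming variance scales with CV squared."""
    return 1.0 - (cv_within / cv_between) ** 2


# Endpoints of the reported CV ranges, in percent (pairing is an assumption).
smallest_reduction = variance_reduction(cv_within=4.0, cv_between=7.3)
largest_reduction = variance_reduction(cv_within=0.4, cv_between=2.3)

print(f"{smallest_reduction:.1%} to {largest_reduction:.1%}")
```

Under this reading the two endpoints come out close to the 70% and 97% bounds quoted in the text.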

Cross-sectional analysis
We performed a cross-sectional analysis of the two time-points individually. We compared DTI measures in the 11 ROIs between mTBI and control subjects, and we also compared the total number of abnormal ROIs between mTBI and control subjects.
mTBI vs control region of interest analysis. Of the 11 ROIs, no significant (p>0.05) differences between mTBI and control subjects were found at either time-point 1 or 2 for any of the quantifiable DTI measures (FA, MD, RD, AD) (S1-S4 Tables) (Fig 1A and 1B).

Longitudinal analysis
We then examined the change in DTI measures between the two time-points.
mTBI vs control region of interest analysis. When comparing the absolute change between time 1 and time 2, we found the largest difference in the splenium of the CC in mTBI subjects compared to controls (p<0.05) (Table 5) (Fig 1C). None of the ten remaining ROIs differed significantly from controls. When collectively comparing all eleven ROIs, the mean FA change was significantly greater in the mTBI subjects than in the control group (p<0.05, Wilcoxon signed rank test). Further, 16 out of the 20 mTBI subjects were found to have at least one region, and as many as 7 regions, where the change in FA exceeded the changes seen across all controls within specific regions. The quantitative diffusion measurements of MD, RD, and AD comparing absolute changes across time-points are provided in supplemental tables (S2-S4 Tables).
mTBI abnormal regions analysis. FA, RD and AD all showed a significantly increased number of abnormal changes in ROIs across time-points. For FA, 9/20 subjects were found to have more than 1 region with abnormal changes (p<0.05). For RD, 8/20 subjects had more than 1 region with abnormal change (p<0.05), and for AD, 5/20 subjects had more than 1 region with abnormal change (p<0.05). MD did not show a significantly increased number of abnormal changes among the 11 regions (Fig 2).

Genetic programming
Finally, we performed a post-hoc exploration of our data to identify correlations between DTI measures and clinical outcome variables that may be tested in future studies. We computed the Pearson correlation coefficient (r) between each of the 34 possible input features provided to Eureqa and the outcome variable S2. Of these, only 9 had correlations that were statistically significant at the 0.05 level, as shown in Table 6. Not surprisingly, the strongest of these correlations was between S1 and S2. More interesting were the significant correlations we found between S2 and (i) the FA of region UF (left) at time period 1, (ii) longitudinal changes in the FA of CR-L_All, and (iii) absolute values of longitudinal changes in FA in 6 other ROIs.
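The feature screening step can be sketched as follows. The composition of the 34 features (S1 plus three FA-derived quantities for each of 11 ROIs) follows the text; taking those three quantities to be FA at time 1, ΔFA, and |ΔFA| is our reading of the experiments described later, and the feature names and data values below are invented for illustration. Significance testing of r is omitted.

```python
import math

ROIS = [
    "CC_splenium", "CC_body", "CC_genu",
    "PLIC_L", "PLIC_R", "UF_L", "UF_R",
    "CR_L", "CR_R", "CST_L", "CST_R",
]
FA_KINDS = ["FA1", "dFA", "abs_dFA"]  # assumed: FA at time 1, change, |change|

# One symptom feature plus 3 FA-derived features per ROI.
features = ["S1"] + [f"{kind}_{roi}" for kind in FA_KINDS for roi in ROIS]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented example: symptom counts at time 1 and time 2 for five subjects.
s1 = [14, 3, 9, 0, 18]
s2 = [10, 1, 7, 0, 15]
r_s1_s2 = pearson_r(s1, s2)
print(len(features), round(r_s1_s2, 3))
```

Pairwise correlations of this kind capture only linear main effects, which is why the symbolic regression results that follow can exceed them by using interaction and higher-order terms.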
However, Eureqa was able to evolve many expressions with interaction terms and/or higher-order terms that exhibited much higher correlations with S 2 . Of the 561 possible pairs of the 34 features, 98 of them had significant cross-correlations (p < 0.05); we hypothesized  Table 7, and for 4 of these expressions we plot predicted vs. observed (Fig 4).
We first addressed the results of experiments 1-4 (Fig 3A and 3C), which used only FA data for input features. In experiment 1, in which the only input feature was FA at time period 1 (blue lines), the very gradual decline in error with increasing complexity indicates that there is little or no useful signal in these data alone that is predictive of S2. On the other hand, regressing on ΔFA values (green lines) gave a much sharper drop in error than regressing on the same ΔFA values but with rows permuted (red lines), indicating that there is some useful signal included in the ΔFA values that is predictive of S2. The models resulting from regressing on |ΔFA| values (black lines) exhibited even lower errors and higher R² values than when regressing on ΔFA, and there was greater consistency between the 5 independent model runs using |ΔFA|, with an apparent "knee" (change in slope) in both error and R² at model complexity 9.
Not surprisingly, the errors in models from experiments 5-9, which were all evolved using S1 as an input feature (Fig 3B and 3D), were much lower than models of the same complexity that were evolved without using S1 (Fig 3A), since S1 had the largest main effect (Table 6). When S1 is the only input feature (Fig 3B, cyan lines), there is a sharp drop in error between the constant model and the linear model (Eq 1 in Table 7), but adding in higher order terms only reduces the error gradually, due to over-fitting. Similarly, including permuted ΔFA values (experiment 6, red lines), where we had randomized the association between features and outcomes, causes gradual decreases in error due to over-fitting. However, including FA1 (experiment 7, blue lines) appears to provide a small amount of additional signal. In fact, the equation numbered 2 actually dominates the linear model of the equation numbered 1 (Table 7). The best models were achieved using S1 in conjunction with either ΔFA (experiment 8, green lines) or |ΔFA| (experiment 9, black lines), as shown with the non-dominated equations numbered 3-8 (Table 7). Interaction terms in these expressions appear to be boosting the useful signal in these data. Again, the results were more consistent between runs when using |ΔFA| and, for the more parsimonious models (of complexity 13), the R² values were also higher when using |ΔFA|, indicating that there is more useful signal in these data.
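The permuted-row control used in these experiments can be sketched in a few lines. The idea is to shuffle whole rows of the ΔFA feature table so that each subject's outcome is paired with another subject's features; the feature values keep their distribution but lose any real association with S2. The data, the seed, and the function name below are invented for illustration and do not reflect how the permutation was actually generated in the study.

```python
import random


def permute_rows(feature_rows, seed=0):
    """Return a shuffled copy of the rows, leaving the input untouched."""
    rng = random.Random(seed)
    shuffled = list(feature_rows)
    rng.shuffle(shuffled)
    return shuffled


# Invented dFA rows (one row per subject, one column per ROI) and outcomes.
delta_fa_rows = [
    [0.010, -0.004],
    [0.002, 0.001],
    [-0.015, 0.008],
    [0.000, 0.003],
]
s2 = [12, 3, 15, 1]

# Outcomes stay in place; only the features are reassigned across subjects.
permuted_rows = permute_rows(delta_fa_rows, seed=1)
for row, outcome in zip(permuted_rows, s2):
    print(row, outcome)
```

Any reduction in error obtained when regressing on `permuted_rows` can only come from over-fitting, so it serves as a baseline against which the drop in error from the real ΔFA values is judged.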

Discussion
This neuroimaging study of emergency department patients with acute head injury, with imaging immediately after injury and one week later, has several important implications. First, we show that longitudinal analysis of changes in FA in each ROI, using each individual subject as their own control, discriminated mTBI subjects from controls, identifying abnormalities that would not have been detected using standard cross-sectional and whole brain analysis approaches. This corroborates prior findings of diffusion changes occurring across time following TBI, which have been described in human subject studies that used longitudinal imaging, but over a longer time period [11,26,35]. Our study is significant because it is the first to have employed multiple time-point imaging for both mTBI and control subjects within the first week after injury. We validate previous studies showing that ROI-based analysis of FA measurements provides the highest sensitivity for detection of axonal injury in mTBI. We also demonstrated that inclusion of all a priori ROIs in an analytic model, as opposed to measuring individual ROIs, improves detection of white matter changes by overcoming issues of injury heterogeneity. Finally, we used a genetic programming approach to provide support for a model associating longitudinal changes in FA with the severity of post-concussive symptoms 7-10 days after mTBI. In this model, the direction of change (increase or decrease in FA value) was not as predictive as the absolute change in FA. In sum, identification of white matter ROIs with changes in FA signal over the week after trauma may provide critical insight into understanding the clinical effects of concussive head injury.
There is no established consensus for optimal, quantitative DTI analysis to identify mTBI.
Region of interest (ROI) methods are generally used, but this technique may artifactually minimize group differences when ROIs are placed within maximal FA regions on post-processed FA maps, and may underestimate FA values if ROIs are placed adjacent to low-FA structures where partial volume averaging artifacts occur, such as gray-white junctions or the pericallosal white matter. Voxel-based methods such as tract-based spatial statistics, which normalize DTI data to a common space to account for differences in individual brains (i.e., size and shape), are subject to error from partial volume effects, particularly as slice thickness is increased. "Pothole" techniques [11,36-40], in which areas of unusually high or low FA are identified throughout the brain and the number and/or volume of such regions provides a single metric of injury, have been shown to result in large numbers of false positives due to the issue of multiple comparisons, and recent research has questioned aspects of the methodology used in many TBI studies [41]. In our study, we compared FA values in regions of the brain most commonly associated with mTBI [16,42] to provide a proxy measure of injury severity without introducing the issues of multiple comparisons to the extent seen in voxel-based analyses that may lead to false positive errors.
Fig 2 (caption, partial). ... the number of mTBI subjects with more than one abnormal region is significantly different to that expected by chance (binomial distribution, n = 20 subjects, p = 0.0867) [8]. https://doi.org/10.1371/journal.pone.0178360.g002
Table 6. Significant Pearson correlation coefficients were found between 9 of the 34 possible input features provided to Eureqa and the outcome variable S2. The notation ΔFA means FA1 - FA2.
Table 7.
https://doi.org/10.1371/journal.pone.0178360.g003
(Caption, partial) ... Table 6. The top one is simply the linear relationship with S1, the middle two were evolved from experiment 9 (using S1 and |ΔFA| as input features), and the bottom one was evolved from experiment 8 (using S1 and ΔFA as input features).
We demonstrate that in our cohort, counting the total number of abnormal ROIs in individual patients at either time point was not sensitive to detection of mTBI. There were a significant number of abnormal regions in only one DTI metric, AD, among mTBI subjects, but only at time-point 1. Otherwise no significant differences were found at either individual timepoints or for any individual ROI whether compared cross-sectionally or longitudinally. We observed some ROIs, such as the splenium of the CC, trended towards showing a significant difference in the comparison between groups, but these differences were not statistically significant when corrected for multiple comparisons. These findings demonstrate the impact that inter-subject variability and injury heterogeneity have on diluting subtle white matter changes seen following mTBI. We acknowledge that there are a number of studies demonstrating significant findings using cross-sectional ROI analyses, but many of these results may be explained by similarity of injury mechanism among the study population and/or increased severity of injury [8,15]. Further, based on our findings, we suggest results of prior studies may be missing significant white matter changes due to analysis methodology that is not sensitive enough to subtle changes in white matter seen across time. If these subtle changes play significant roles in the initial and long-term symptoms seen in mTBI, as we have suggested, then this may explain why DTI metrics have thus far been poorly correlated with mTBI symptoms and prognoses [15,16].
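The chance expectation used when counting abnormal ROIs can be reproduced from the numbers given in the figure captions. The sketch below assumes, as the captions state, that an abnormal region is one lying more than 2 standard deviations from the control mean (two-sided probability 0.0455 under normality) and that the 11 regions are independent. On those assumptions, the probability that a single subject has more than one abnormal region comes out at about 0.0867, which matches the p quoted for the subject-level binomial; reading that p as this per-subject probability is our interpretation and is not stated explicitly in the text. The helper names are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

P_REGION = 0.0455   # chance that one region lies beyond 2 SD (two-sided)
N_REGIONS = 11      # a priori regions of interest
N_SUBJECTS = 20     # mTBI subjects

# Chance that one subject has more than one abnormal region out of 11.
p_multi = (
    1.0
    - binom_pmf(0, N_REGIONS, P_REGION)
    - binom_pmf(1, N_REGIONS, P_REGION)
)

# Expected number of subjects (out of 20) with exactly k abnormal regions.
expected_subjects = [
    N_SUBJECTS * binom_pmf(k, N_REGIONS, P_REGION)
    for k in range(N_REGIONS + 1)
]

def upper_tail(k_obs, n, p):
    """P(X >= k_obs) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(k_obs, n + 1))
```

For an observed count `k_obs` of subjects with more than one abnormal region, `upper_tail(k_obs, N_SUBJECTS, p_multi)` gives a one-sided chance probability under the same independence assumption.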
Studies using whole-brain analyses are also susceptible to the effects of inter-subject variability, but to a larger extent avoid the issue of injury heterogeneity. This allows for improved ability to detect white matter changes and may explain why the analysis method has been popular in recent years. Our results however suggest that these studies likely still fail to capture the true degree and extent of white matter changes, thereby demonstrating an incomplete picture of the white matter disruption caused by mTBI. Our results support and extrapolate on the earlier discovery by Inglese et al. that abnormalities in FA and MD after mTBI are too subtle to be detected by whole brain analysis [43].
It is also worth considering whether FA is the most appropriate diffusion metric for assessing mTBI. Inglese et al. [43] performed a cross-sectional study of whole brain and ROI analysis in subjects after mTBI. Similar to our results, they found diffusion changes in FA within commonly injured white matter regions such as the CC and internal capsule, which were not detected by whole brain analysis. They concluded that early time-point imaging may have utility as a prognostic measure, consistent with our genetic programming analysis. Our findings differ with regard to the utility of MD; AD and RD had more diagnostic utility, but none of these measures provided much additional discriminatory capacity to distinguish mTBI patients from controls. Clinical MR DTI sequences model diffusion as a single three-dimensional ellipsoid, but newer methodologies, such as CHARMED [44] (composite hindered and restricted model of diffusion), which account for different compartments, may be more realistic and better differentiate edema from axonal disruption. Most tensor models do not account for the presence of multiple fiber populations within a single voxel, but instead average them together, resulting in an apparent decrease in FA. In addition, heterogeneity of the injuries and forces that lead to mTBI, as well as the spatial heterogeneity of damage within the brain parenchyma [13], ultimately limit the ability of DTI in the setting of acute mTBI regardless of the diffusion metric used. FA was the most sensitive measure in our study for capturing white matter changes in mTBI subjects. These findings are consistent with most current animal and human studies and demonstrate the importance of employing multiple DTI metrics during analysis in future studies [15,16,30].
Our study corroborates the importance of DTI with ROI analysis of FA to identify axonal injury in mTBI subjects, and further suggests that analysis of longitudinal imaging across time may detect more subtle white matter injury than single time point analysis.
Furthermore, closer examination of the DTI changes demonstrated via longitudinal analysis shows that most subjects have only a few regions with significant changes in white matter, rather than a more diffuse pattern across all ROIs. We believe that this further demonstrates the focal and heterogeneous nature of white matter injury after mTBI, which is likely secondary to variables such as injury mechanism, location, directionality and magnitude of impact. Our results show that, for every ROI analyzed, the subject with the greatest degree of change in FA for that ROI was always from the mTBI group. When looking across individuals, 80% of the mTBI subjects exhibited one or more regions with a change greater in magnitude than all changes seen in controls for that region. These findings provide convincing evidence that these changes are markers of white matter disruption. The reason that 80% and not 100% of mTBI subjects had a larger change than controls may be due to a variety of factors, including injury severity, the timing of imaging relative to the time frame of a specific patient's recovery, or location of the injury within an uncommon white matter tract. We also noted that certain regions, such as the splenium of the CC, had an increased frequency of white matter changes among mTBI subjects, likely representing an anatomical susceptibility, as has already been suggested by prior research [16]. Moreover, the regions most commonly associated with FA changes in our study were consistent with those previously published [16].
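The per-subject comparison against the control range can be expressed compactly. The following is a minimal sketch, assuming each subject is represented as a mapping from ROI name to longitudinal change in FA; the function name `fraction_beyond_controls`, the two ROI names used as keys, and all numeric values are hypothetical and are not study data. The sketch flags a patient when the magnitude of change in at least one ROI exceeds the largest magnitude of change seen in any control for that same ROI.

```python
def fraction_beyond_controls(patients, controls):
    """Return (fraction flagged, per-patient flags).

    A patient is flagged if, for at least one ROI, |change| exceeds the
    largest |change| observed in any control for that ROI.
    """
    rois = list(controls[0].keys())
    control_max = {
        roi: max(abs(c[roi]) for c in controls) for roi in rois
    }
    flags = [
        any(abs(p[roi]) > control_max[roi] for roi in rois)
        for p in patients
    ]
    return sum(flags) / len(patients), flags

# Hypothetical longitudinal FA changes per subject (NOT study data).
controls = [
    {"splenium": 0.002, "uncinate": -0.010},
    {"splenium": -0.003, "uncinate": 0.008},
    {"splenium": 0.001, "uncinate": 0.012},
]
patients = [
    {"splenium": 0.009, "uncinate": 0.004},
    {"splenium": -0.002, "uncinate": -0.020},
    {"splenium": 0.001, "uncinate": 0.005},
    {"splenium": -0.006, "uncinate": 0.011},
    {"splenium": 0.004, "uncinate": 0.030},
]
fraction, flags = fraction_beyond_controls(patients, controls)
```

Because the threshold is set per ROI from the controls' own test-retest changes, this comparison uses magnitude only and ignores whether FA rose or fell.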
In addition, we observed both increases and decreases in DTI signals among mTBI subjects following injury, which matches findings in both animal and human literature [15,16,30]. The variability in directionality of DTI metrics likely involves co-occurring processes of injury and recovery [15,30]. While there currently is no consensus on the true mechanism behind the injury and recovery processes occurring within white matter following mTBI, the most recent animal models have proposed the changes are driven by edema, gliosis, inflammatory cytokines, and various astrocytic processes [24,27,28,30]. All of these likely vary among individuals due to a large number of factors including severity of injury and individual recoverability.
Our results also demonstrate the highly reproducible nature of ROI-based FA measurements in human subjects across short periods of time. Longitudinal imaging in our control population enabled us to determine coefficients of variation (CV) for test-retest reliability. The CVs determined from our control subjects compared well to those published previously [45]. For large, uniform white matter structures such as the splenium, the within-subject CV of 0.4% is remarkably good. Smaller anatomic structures such as the uncinate fasciculus have substantially higher CVs. Because the higher sensitivity of the longitudinal study is likely due to the fact that within-subject variability is much smaller than between-subject variability, technical improvements such as multiband imaging [46] may enable faster, more accurate, and more reproducible determination of DTI metrics, but are unlikely to substantially increase sensitivity in cross-sectional studies, where between-subject variability is the dominant source of uncertainty. For longitudinal studies, within-subject reproducibility is the limiting factor, and technical improvements may improve sensitivity.
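The distinction between within-subject and between-subject variability can be made concrete with a small calculation. The sketch below assumes two scans per control subject and uses the standard test-retest estimate of within-subject standard deviation, sqrt(sum(d^2) / 2n), where d is the difference between the two scans; the exact CV definition used in this study is not given in this section, so this is an assumption. The function names and FA values are hypothetical and are not study data.

```python
from math import sqrt
from statistics import mean, stdev

def within_subject_cv(scan1, scan2):
    """Test-retest CV: within-subject SD from paired scans / grand mean."""
    diffs_sq = [(a - b) ** 2 for a, b in zip(scan1, scan2)]
    s_within = sqrt(sum(diffs_sq) / (2 * len(diffs_sq)))
    grand_mean = mean(scan1 + scan2)
    return s_within / grand_mean

def between_subject_cv(scan1, scan2):
    """CV of the per-subject mean FA across subjects."""
    subject_means = [(a + b) / 2 for a, b in zip(scan1, scan2)]
    return stdev(subject_means) / mean(subject_means)

# Hypothetical FA values for one ROI in five controls (NOT study data).
fa_t1 = [0.780, 0.752, 0.801, 0.765, 0.790]
fa_t2 = [0.783, 0.749, 0.799, 0.768, 0.787]
cv_within = within_subject_cv(fa_t1, fa_t2)
cv_between = between_subject_cv(fa_t1, fa_t2)
```

With these illustrative values the within-subject CV is roughly a tenth of the between-subject CV, which is the situation in which using each subject as their own control is expected to gain sensitivity.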
Not surprisingly, we found that the sum of post-concussive symptoms at time 1 (S1) by itself had a moderate linear relationship (R² = 0.43) with the sum of post-concussive symptoms at time 2 (S2). However, when we evolved non-linear expressions using Genetic Programming (GP) that combined S1 with longitudinal changes in FA, we found several expressions with relatively low model complexity (indicating they were not simply over-fitting the data) that were much more strongly predictive of S2 (with R² values up to 0.96). Furthermore, the observation that we could not evolve such strongly predictive models when we randomly permuted the longitudinal changes in FA from each patient, relative to the observed values of S1 and S2, is another indication that there is meaningful signal in the longitudinal changes in FA. Finally, in the more parsimonious evolved models, we observed greater consistency and greater predictive value when using the absolute value of longitudinal changes in FA. Thus, although these GP results, based on only 36 data points, are far from conclusive, they do offer support for the following two conjectures: (1) there is useful signal in longitudinal changes in FA that is associated with the severity of post-concussive symptoms 7-10 days after mTBI; and (2) using absolute values of longitudinal changes in FA appears to provide slightly more useful signal than using signed changes, especially at low model complexities, where over-fitting is less likely.
Limitations of our study include the relatively small sample size and the use of only two time points in the acute and subacute phases, as opposed to an additional third point in the chronic phase. A recent meta-analysis [15] showed that FA increases are generally seen in the acute stage 2-4 days post-injury, changes are inconsistent in the sub-acute stage of 4 days-2 weeks post-injury, and decreases are observed beyond 2 weeks. In our study, we observed greater change in the mTBI group compared to the control group. Directions of change varied in different ROIs. Whether longitudinal imaging is more sensitive than cross-sectional imaging may depend on the magnitude of change in DTI metrics over the time course between scans. Further work is required to determine optimal imaging time points to maximize the sensitivity of longitudinal studies. Another limitation is the analysis of only 11 white matter regions where injury was most commonly reported. Because only 11 regions were analyzed, it is likely that mTBI subjects had injury to additional regions, and it is possible that some mTBI subjects had no injury among our 11 chosen regions. It is also important to address the fact that changes in FA among our study population are relatively small in magnitude compared to overall FA values. While we acknowledge this, we have demonstrated, as have other studies [22], that FA is highly reproducible across time in healthy individuals, so statistically significant changes would still hold clinical significance. Further, subtle abnormalities are, to an extent, anticipated given the variable nature of injury within the mTBI population.
Furthermore, when examining large white matter tracts, it would be paradoxical to theorize that large-magnitude changes/insults, which commonly present with more severe neurological findings consistent with stroke or severe TBI, would produce the clinically non-specific symptomatology seen in mTBI.

Practical applications
Significant changes in FA were detected in patients with mTBI within the first week following injury. Although between-subject variation is a substantial obstacle in the study of mTBI using DTI, we demonstrate that longitudinal imaging over the first week after an injury improves the characterization of mTBI by DTI metrics. Our results suggest that sequential imaging of the same individual is superior to cross-sectional imaging for quantitative DTI analysis of mTBI. Our results using genetic programming support the suggestion that longitudinal changes in FA may have clinical utility in predicting severity of post-concussive symptoms a week or more after mTBI, and that the magnitude of change may be more predictive of long-term outcomes than the directionality of the change.
S1-S4 Figs. Number of abnormal regions of interest in mild traumatic brain injury (mTBI) subjects. Abnormal regions are defined as having DTI metrics more than 2 standard deviations above or below the mean for the control group. Blue bars indicate the number of mTBI subjects with a given number of abnormal regions. Red bars indicate the number of subjects that would be expected by chance, based on a binomial distribution with n = 11 regions, p = 0.0455. Regions are assumed to be independent [8]. Dashed boxes indicate metrics in which the number of mTBI subjects with more than one abnormal region is significantly different to that expected by chance (binomial distribution, n = 20 subjects, p = 0.0867). (TIF)
S1 Table. Reproducibility of FA seen in control subjects without resampling across 1 week. Binary masks defining each region of interest were transformed into the individual subject space using nearest neighbor resampling. This prevented any possible effect of smoothing of the original data, although the mask definitions are likely to be somewhat less accurate. For control subjects, the standard deviation of FA values for each ROI at time points 1 and 2, along with the standard deviation of change within subjects across the two time points, is shown. (DOCX)
S1 Methods. Supplemental methods. (DOCX)
S1 Dataset. Original DTI metrics data. (XLSX)