Uncovering precision phenotype-biomarker associations in traumatic brain injury using topological data analysis

Background
Traumatic brain injury (TBI) is a complex disorder that is traditionally stratified based on clinical signs and symptoms. Recent imaging and molecular biomarker innovations provide unprecedented opportunities for improved TBI precision medicine, incorporating patho-anatomical and molecular mechanisms. Complete integration of these diverse data for TBI diagnosis and patient stratification remains an unmet challenge.

Methods and findings
The Transforming Research and Clinical Knowledge in Traumatic Brain Injury (TRACK-TBI) Pilot multicenter study enrolled 586 acute TBI patients and collected diverse common data elements (TBI-CDEs) across the study population, including imaging, genetics, and clinical outcomes. We then applied topology-based data-driven discovery to identify natural subgroups of patients, based on the TBI-CDEs collected. Our hypothesis was two-fold: 1) A machine learning tool known as topological data analysis (TDA) would reveal data-driven patterns in patient outcomes to identify candidate biomarkers of recovery, and 2) TDA-identified biomarkers would significantly predict patient outcome recovery after TBI using more traditional methods of univariate statistical tests. TDA algorithms organized and mapped the data of TBI patients in multidimensional space, identifying a subset of mild TBI patients with a specific multivariate phenotype associated with unfavorable outcome at 3 and 6 months after injury. Further analyses revealed that this patient subset had high rates of post-traumatic stress disorder (PTSD), and enrichment in several distinct genetic polymorphisms associated with cellular responses to stress and DNA damage (PARP1), and in striatal dopamine processing (ANKK1, COMT, DRD2).

Conclusions
TDA identified a unique diagnostic subgroup of patients with unfavorable outcome after mild TBI that were significantly predicted by the presence of specific genetic polymorphisms. Machine learning methods such as TDA may provide a robust method for patient stratification and treatment planning targeting identified biomarkers in future clinical trials in TBI patients.

Trial Registration
ClinicalTrials.gov Identifier NCT01565551


PLOS ONE | DOI:10.1371/journal.pone.0169490 March 3, 2017

[...] application of TDA on real-world datasets [8][9][10]. TDA is a machine learning data analytic used to cluster patients based on functional outcome data to derive novel insights into disease mechanisms (Fig 1). To date, TDA has been successfully applied to biological datasets to discover novel insights including identification of subpopulations of cancer, identification of genomic biomarkers, disease association, RNA folding, viral evolution, immunology, diabetes, and preclinical spinal cord injury and TBI [10][11][12][13][14][15][16]. The current study aims to test the following hypotheses in the TRACK-TBI Pilot information commons: 1) TDA will reveal data-driven patterns in patient outcomes to identify candidate biomarkers of recovery following TBI, and 2) TDA-identified biomarkers predict patient outcome recovery after TBI.

TBI Common Data Elements (TBI-CDEs) and the TRACK-TBI pilot study
The NIH/NINDS developed the TBI-CDEs to overcome pitfalls in TBI clinical research, including lack of standardization in data collection and analysis, inability to appropriately stratify patients, and discordant injury types [17]. Using a consensus-based approach, NINDS working groups developed standards for data capture across 4 broad domains: clinical assessments and demographic information, genetics and proteomics, neuroimaging, and outcome measures. The NINDS-CDE planning committee instructed working groups to stratify data elements into 1 of 3 categories: 'core', 'supplemental', and 'emerging' [18]. Core elements comprise the most basic information: data that is absolutely fundamental to capture (e.g., gender, age). Supplemental elements provide additional diagnostic detail (e.g., education level, cause of injury). Emerging CDEs include innovative approaches that require validation before broad clinical adoption (e.g., imaging, serial plasma biomarkers) [19]. The TRACK-TBI Pilot study assessed the feasibility and utility of the TBI-CDEs in a prospective, limited multicenter (3-center) clinical observational trial [7], setting the stage for large-scale multicenter prospective efforts currently underway in the US and Europe [6,20].

Patient enrollment
Subject eligibility was based on presentation to any of the [...] [21]. Patients were excluded if pregnant, in custody, non-English speaking, or on a psychiatric hold (danger to self or others). Between April 2010 and May 2011 the TRACK-TBI Pilot enrolled 599 acute TBI patients; 13 subjects aged <16 years were excluded due to differences in variables recommended by CDE working groups, resulting in 586 subjects in the current analysis. TRACK-TBI Pilot collected 944 raw data elements per subject. From these, a set of 213 cleaned, well-curated endpoints was distilled for meaningful analysis. Eligible subjects were enrolled through convenience sampling at all three sites. Institutional Review Board (IRB) approval was obtained at all participating sites prior to study initiation. Written informed consent was obtained from all subjects prior to enrollment in the study. For patients unable to provide consent due to the severity of their injury, consent was obtained from their legally authorized representative (LAR). Patients were then re-consented, if cognitively able, at later inpatient and/or outpatient follow-up assessments for continued participation in the study. Children aged 13 and above provided their own written consent in addition to written parental/guardian consent. Clinical characteristics for patients included in the study are summarized in Table 1.

Clinical assessments and demographics
The CDE working group defined subject characteristics (i.e., demographics and social status), subject and family history, injury- or disease-related events (e.g., mechanism of injury, secondary insults), and assessments and evaluations (e.g., vital signs, intracranial pressure). The working group created a basic, intermediate, and advanced adaptation of each data element, offering investigators flexibility in the level of detail appropriate to a given study [22]. From the CDEs, the TRACK-TBI Pilot collected a combination of core, supplemental, and emerging variables.

Genetic material
DNA and acute plasma samples (<24 hours) were collected using standardized protocols developed by the NINDS TBI CDE biospecimens and biomarkers working group [23]. TRACK-TBI Pilot also followed the meticulous guidelines regarding how samples should be obtained, processed, stored locally, stored centrally, and shipped [23].

Neuroimaging
The NINDS neuroimaging working group supplied pathoanatomical definitions for 23 distinct lesion types to be used with any imaging modality. Core variables were distinguished as the presence or absence of individual lesions; lesion location and volumetric properties comprised most of the supplemental category, and emergent elements encompassed lesion-specific complexities. The imaging working group also provided recommendations for protocols to be used for both CT and MRI [24,25]. A board-certified neuroradiologist examined and coded all levels of neuroimaging variables. Recommended imaging parameters were implemented at all sites.

Outcomes
The outcomes working group delineated 12 domains of behavioral outcomes. Choosing 1 measure from 11 of the 12 domains, TRACK-TBI Pilot included a broad outcomes battery. Global outcome was assessed using a standard endpoint, the Glasgow Outcome Scale-Extended (GOS-E). The GOS-E is an 8-point clinical grading scheme for categorizing the outcome and disability spectrum from 'dead' (GOS-E = 1), through lower moderate disability (GOS-E = 5), to upper good recovery (GOS-E = 8). Recovery from TBI is evidenced by achieving a higher GOS-E over time. Supplemental cognitive and psychological assessments added to a more comprehensive understanding of a domain, whereas tools in the last stages of validation were considered emerging [26]. TRACK-TBI Pilot administered the core and a subset of supplemental measures at 3 and 6 months after injury. Study personnel received a priori training to ensure standardization.

Topological Data Analysis (TDA)
TDA was performed using a cloud-based analytic platform (Ayasdi, Inc. v 3.0) on 586 patients enrolled in the TRACK-TBI Pilot clinical observational trial. Patients were prospectively measured on over 900 separate variables, including the NIH/NINDS common data elements (CDEs). TDA was applied to extract the fundamental outcome features across multiple clinical variables simultaneously. For the purposes of TDA, we limited our analysis to 17 CDEs based on their clinical importance (Table 2). These 17 CDEs included CT findings, PTSD diagnosis, and cognitive measures of processing speed and verbal learning. TDA clustered patients into subgroups (nodes) based on similarity across the 17 measures, considered simultaneously as a holistic unit (Fig 1). Subgroups that share at least 1 patient in common are joined by a line (edge). The descriptive statistics of the 17 CDEs are summarized in Table 2. Missing data were only observed in the 6-month outcome variables. Determining whether there are natural subtypes within the TBI population based on these 17 CDEs presents an analytic problem that is both multi-dimensional (17 dimensions) and multi-scalar (each CDE has different range, distributional and metric features). TDA is mathematically well-suited for dealing with this complexity (see below) [8,9,11]. Simply put, TDA uses shape-based feature detection to extract the fundamental shape of the data-space. This shape is mathematically referred to as a 'Reeb graph' and represents the manifold of the outcome data space. We refer to the mapping of the patients within the TBI-CDEs as the 'syndromic space' of TBI (Fig 1A-1C). We refer to the TDA network as the TBI 'syndromic map' of patients within the syndromic space (Fig 1D).
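The node-and-edge rule described above (subgroups that share at least 1 patient are joined) can be illustrated with a minimal Python sketch. This is not the Ayasdi implementation. The function names `build_edges` and `subnetworks`, the node labels, and the patient identifiers are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def build_edges(nodes):
    """Join every pair of nodes (patient subgroups) that share a patient."""
    return sorted(
        (a, b)
        for a, b in combinations(sorted(nodes), 2)
        if nodes[a] & nodes[b]  # non-empty intersection of patient sets
    )


def subnetworks(nodes, edges):
    """Group nodes into connected components (sub-networks of the map)."""
    neighbors = {name: set() for name in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], []
        while stack:  # depth-first walk over connected nodes
            name = stack.pop()
            if name in seen:
                continue
            seen.add(name)
            component.append(name)
            stack.extend(neighbors[name] - seen)
        components.append(sorted(component))
    return components


# Hypothetical toy input: node label -> set of patient identifiers.
toy_nodes = {"n1": {1, 2}, "n2": {2, 3}, "n3": {4, 5}}
toy_edges = build_edges(toy_nodes)
print(toy_edges, subnetworks(toy_nodes, toy_edges))
```

Because nodes may overlap, a single patient can sit in several nodes at once, which is what lets the network show graded transitions between subgroups instead of hard partitions.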
TDA clustered patients using a norm correlation metric, which measures the distance between 2 points by the Pearson correlation computed on X' and Y', the column-wise mean-centered and variance-normalized versions of X and Y. This metric was combined with 2 lenses: multidimensional scaling (MDS) coordinate 1 and MDS coordinate 2. These lenses generate a factorization of the data matrix into linearly uncorrelated components, with MDS coordinate 1 representing the highest variance, and MDS coordinate 2 representing the second-highest variance. The patient data are mapped into a Euclidean space, minimizing the sum of squares error, using the distance matrix rather than the coordinates. Gower's normalization is applied before MDS to generate the lens values.

TDA then resamples the MDS space millions of times in a cloud-based supercomputer, with overlapping sample bins of variable sizes, to extract the shape of the data manifold. Binning size was set at a resolution of 30 and a gain of 3.0 (equalized). The resolution setting controls the number of bin partitions patients are clustered into, similar to scaling up or down on a microscope. Increasing the resolution increases the number of nodes in the analysis graph to reveal more fine structure in the syndromic space, with fewer patients per node, preserving only the strongest connections between groups of patients. Nodes that are weakly associated tend to break apart and create smaller subgroups of patients.

Gain is adjusted so that most data points will appear in the same number of bins that the gain is set to. Increasing the gain increases the number of connections between nodes/groups of patients to highlight relationships within the data. Reducing the gain value will result in smaller groups of nodes and more unconnected/single nodes. Equalizing the network distributes the patients evenly across all nodes in the network.
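The metric, lens, and binning steps described above can be sketched in Python with NumPy. This is an illustrative sketch under stated assumptions, not the platform's code:

- The distance is taken to be 1 minus the Pearson correlation of the standardized patient vectors.
- The lens uses classical MDS with double-centering of the squared distances. Whether this matches the platform's 'Gower's normalization' exactly is an assumption.
- Bins are spaced one step apart and are `gain` steps wide.
- The function names are hypothetical.

```python
import numpy as np


def norm_correlation_distance(data):
    """Pairwise 1 - Pearson r between rows (patients).

    Assumes no column is constant, so every column standard deviation is > 0.
    """
    x = np.asarray(data, dtype=float)
    # Column-wise mean-centering and variance normalization (X -> X').
    x = (x - x.mean(axis=0)) / x.std(axis=0)
    # np.corrcoef treats each row as a variable, giving patient-by-patient r.
    return 1.0 - np.corrcoef(x)


def mds_lens(dist, n_coords=2):
    """Classical MDS coordinates from a distance matrix (assumed lens form)."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double-centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_coords]  # largest variance first
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))


def overlapping_bins(values, resolution, gain):
    """Cover the lens range with `resolution` overlapping intervals.

    Interval centers are one step apart and each interval is `gain` steps
    wide, so an interior point falls in roughly `gain` intervals.
    """
    lo, hi = float(values.min()), float(values.max())
    step = (hi - lo) / resolution
    half_width = gain * step / 2
    centers = lo + step * (np.arange(resolution) + 0.5)
    return [np.flatnonzero(np.abs(values - c) <= half_width) for c in centers]
```

In a Mapper-style pipeline, the patients inside each bin would then be clustered with the distance matrix, and each resulting cluster would become a node; that clustering step is omitted here.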

Targeted hypothesis testing using General Linear Models (GLM)
SNPs found to be significantly enriched in the TDA-identified subgroups of mild TBI patients were formally tested for their influence on poor outcome after TBI. These patients exhibited worse GOS-E outcome between 3 and 6 months and a positive diagnosis of PTSD, detected by the PTSD Checklist, Civilian Version (PCL), a validated tool. The SNPs tested were PARP1, COMT, DRD2 and the 3 different ANKK1 SNPs. The statistical model was designed as a repeated measures analysis of variance (ANOVA), testing the 3-way interaction between SNP, CT pathology (yes or no), and change in GOS-E over time (3 to 6 months), performed on the full dataset. Results are reported both as within-subject effects, to tease out the influence of each polymorphism on GOS-E over time either with or without CT pathology, and as between-subject effects, to test main effects of each polymorphism on GOS-E outcome pooled across 3 and 6 months, either with or without CT pathology. This targeted hypothesis testing was performed in SPSS v.19 (IBM) using the general linear model command with type III sums-of-squares and a full factorial design. Significance was assessed at p < .05.
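The time-related part of this design can be illustrated with a simplified Python sketch. It is not the SPSS model used in the study. It relies on the standard result that, with only 2 time points, a time-by-group interaction in a repeated measures ANOVA is equivalent to a group effect on the change score (6-month minus 3-month GOS-E). The record field names, the function names, and the toy values are hypothetical; the example runs a one-way ANOVA across genotypes within a single CT stratum and omits the full 3-way structure.

```python
from collections import defaultdict
from statistics import mean


def change_scores(records):
    """Group 6-month minus 3-month GOS-E change by (genotype, CT status)."""
    cells = defaultdict(list)
    for r in records:
        key = (r["genotype"], r["ct_positive"])
        cells[key].append(r["gose_6mo"] - r["gose_3mo"])
    return dict(cells)


def one_way_f(groups):
    """One-way ANOVA F statistic across groups of change scores."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical toy records, not study data.
toy = [
    {"genotype": "A/A", "ct_positive": False, "gose_3mo": 6, "gose_6mo": 8},
    {"genotype": "A/A", "ct_positive": False, "gose_3mo": 7, "gose_6mo": 8},
    {"genotype": "A/T", "ct_positive": False, "gose_3mo": 7, "gose_6mo": 6},
    {"genotype": "A/T", "ct_positive": False, "gose_3mo": 6, "gose_6mo": 6},
]
cells = change_scores(toy)
ct_negative = [v for (genotype, ct), v in cells.items() if not ct]
print(one_way_f(ct_negative))
```

A full analysis would also need the CT factor, the pooled between-subject effects, and type III sums of squares for unequal cell sizes, which a dedicated statistics package handles more reliably than hand-written code.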

Natural subtypes of TBI population as defined by CDEs in a TDA network
The generated TBI syndromic map consisted of multiple sub-networks comprising 434 clusters (from 586 patients and 17 CDEs). Similar patients are grouped as a node (Fig 1D), with similarity defined topologically and in a multivariate fashion from all the CDEs used in the analysis. Similar nodes are close together and joined by an edge (Fig 1E). In this way, patient differences are graded by location across the syndromic map. The emergence of distinct sub-networks reflects distinct subpopulations of TBI patients. We statistically explored each sub-network to understand which CDEs play the most significant role in defining similarity and dissimilarity among patient sub-clusters.
The TBI syndromic map reveals that patients with acute pathological findings on CT (Fig 2A) and MR (Fig 2B) scans belonged to the same sub-networks, indicating that the CDEs used in the analysis clustered more severely injured TBI subpopulations together (red nodes on the right half of the network). In contrast, the left sides of the connected sub-networks contained patients who were CT-negative and mostly MR-negative (Fig 2A and 2B, blue clusters). The TBI syndromic map revealed relationships between patients as defined by the CDEs in a continuously graded manner across multiple dimensions including Glasgow Coma Score (GCS), the Marshall CT score [30], Rotterdam CT score [31], and the presence of individual CT features, both categorical and quantitative. In addition, a clear CT-negative sub-network emerged with corresponding high GCS, indicating mild TBI (data not shown).

[...] (Fig 2A), MRI positive findings (Fig 2B), a positive diagnosis of PTSD according to DSM IV criteria (Fig 2C). Red nodes and connections in the network highlight positive findings for these measures, showing a clear distinction between the left (blue) and the right (red) portions of the network. Our initial observation showed that the majority of patients with a diagnosis of PTSD did not show substantial brain pathology measured by either CT or MRI. When the network was colored by the GOS-E at both 3 months (Fig 2D) and 6 months (Fig 2E) after TBI, these patients with a positive PTSD diagnosis and no obvious brain pathology (N = 19) did show substantial functional deficits compared to the other CT-negative/MR-negative patients (N = 43) (circled area of the network). Data-driven exploration of this region of the network revealed a significant enrichment of the PARP1 SNP (Fig 2F) measured in these patients, not previously reported by the TRACK-TBI Pilot investigators.
Results from previously identified genetic polymorphisms for ANKK1 [27], COMT [28] and DRD2 [29] were confirmed to have an impact on outcome deficits in patients with TBI (Figures in S1-S3 Figs, Tables in S1-S6 Tables).

Mapping of TBI severity and long-term clinical outcome measures
In order to formally test the hypothesis that the PARP1 SNP was a significant predictor of GOS-E recovery in patients with mild TBI, we performed an independent analysis on the full dataset using a 3-way mixed general linear model with repeated measures. This analysis was structured as a full factorial design testing the impact of the following factors on GOS-E recovery: Time (repeated measure; 3 vs. 6 months), CT findings (between-subjects; yes/no) and PARP1 genotype (between-subjects; A/A, A/T, T/T). Significant within-subject effects were detected in the 3-way analysis: time by CT by PARP1 genotype interaction (N = 122 patients, PARP1 A/A (n = 33), A/T (n = 44), T/T (n = 45), p = 0.019). Among patients with no CT pathology, those with the T/T and A/T genotypes performed worse over time on the GOS-E compared with those with the A/A genotype (Fig 3, Tables 3 and 4). Clinical characteristics of patients in the TDA-selected subgroup circled in Fig 2 (N = 37) are summarized in Table 5, alongside clinical characteristics for all patients with data collected and analyzed for the PARP1 SNP (N = 298). The TDA-selected patient group was slightly younger (41.1 ± 14.2 TDA group, vs 43.5 ± 18.2 all PARP1 group), with 6.1% fewer females, 22.4% fewer Caucasians, and roughly 6% less likely to have finished high school or college. TDA-selected patients also had 15% less previous psychiatric history, but were more likely to have had a previous TBI, either with (28.9%) or without hospitalization (9.2%), and were 22% more likely to have received their TBI from an assault.

Fig 2. (A-F). TBI CDE network topology identifies the PARP1 SNP as a candidate predictor of GOS-E deficits in mild TBI.
Patients with TBI were mapped into a TDA network, highlighting color schemes for CT (A) and MRI (B) pathology and whether they had a confirmed diagnosis of PTSD (DSM IV) at 6 months post-TBI (C). Patients in the circled regions of the network were identified due to substantial dysfunction measured by the GOS-E both at 3 months (D) and 6 months (E) post-TBI, compared with other patients in the network with no CT pathology and no diagnosis of PTSD. Data-driven exploration of these patients in the network revealed a significant categorical enrichment for the PARP1 SNP (F), particularly the heterozygous allele (A/T). Heat map represents range of numerical values for each measure: Panels A-C yes (1 = red) vs. no (0 = blue); Panels D-E GOS-E range from less than 3 (blue) to 8 (red); Panel F PARP1 allele A/A = 1 = blue, A/T = 2 = yellow/green, T/T = 3 = red.
Hypothesis testing of the interaction between CT pathology and the ANKK1 SNP allele on GOS-E outcome over time revealed a significant 3-way interaction for ANKK1 Gly422Arg (rs4938016) only, and a significant difference in GOS-E scores at both 3 and 6 months for patients with a positive head CT for ANKK1 Gly318Arg (rs11604671). However, these differences were not found to significantly change over time (Figure in S1 Fig, Tables in S1 and S2 Tables). Hypothesis testing of the interaction between CT pathology and the COMT SNP allele on GOS-E outcome over time revealed both a significant influence of COMT on GOS-E recovery over time, and a 3-way interaction of GOS-E recovery time with the SNP allele and presence/absence of CT pathology, specifically in patients with a negative head CT.

Fig 3. (A-B). Hypothesis testing of PARP1 genetic polymorphism influence on GOS-E deficits in mild TBI.
GOS-E scores between 3 and 6 months post-TBI were plotted for patients who were CT negative (A) or CT positive (B), based on the SNP allele expressed (A/A = blue, A/T = yellow/green, T/T = red). Hypothesis testing of the interaction between CT pathology and the SNP allele over time revealed a significant 3-way interaction; however, no significance was detected at each time point individually. Only change in GOS-E over time was significant in patients with a negative head CT. *p < .05.
doi:10.1371/journal.pone.0169490.g003

TDA uncovered a subgroup of mild TBI individuals with poorer outcome, associated with increased PTSD rates and specific single-nucleotide polymorphisms (SNPs) associated with DNA damage and brain dopamine processing. The results provide proof-of-concept for application of multi-scalar big-data analytics to improve TBI precision medicine.

Discussion
TDA applied to data from multiple CT and MR imaging and neuropsychological domains captured the multidimensional locus of individual patients within the TBI syndromic space. Rapid mapping of TBI outcome onto the TDA-syndromic space revealed that mild TBI can be stratified into multiple subgroups that have differentiated outcome. A large subpopulation of mild TBI subjects showed poor recovery and tendency to deteriorate from 3-6 months postinjury (Fig 2D and 2E). These same individuals had very high rates of PTSD (Fig 2C) and significant enrichment in the heterozygous allele of the PARP1 SNP (Fig 2F) that is associated with cellular responses to stress and DNA damage [32,33]. TDA improves upon traditional outcome-prediction approaches for TBI that have relied on regression modeling of multiple predictors with respect to a single 'gold-standard' outcome measure (e.g., the GOS-E). By simultaneously leveraging the full information provided by all outcomes, TDA and related big-data approaches have potential to improve diagnosis and therapeutic targeting. For example, CT features and neuropsychiatric batteries provided alternative views of injury severity within the topological syndromic map (Fig 2), and considering each of these pieces of information in isolation would provide only a limited view of the full syndrome of TBI.
Therefore, once the TBI syndromic space was established using the pre-selected CDEs (Table 2), we were able to harness this full set of information for all patients to discover novel predictors of recovery following TBI, including several SNPs. The most striking genetic biomarker finding was that PARP1 predicted recovery in patients with a negative head CT, who would be considered to have a mild TBI (mTBI). Previous studies have implicated PARP1 as a useful therapeutic target in humans with TBI, particularly in patients with severe TBI who are enriched for the A/A allele [32]. Additionally, attempts to inhibit PARP1 in rat models of TBI have shown promise in helping to reduce cell death [33]. Therefore, PARP1 may be a useful biomarker in mTBI patients when considering patient trajectories and how to maximize recovery in patients presenting with this particular A/T SNP (rs3219119) of the PARP1 gene.
TDA also confirmed the influence of genes involved in dopamine processing reported previously in TRACK-TBI Pilot patients for ANKK1 [27] (Figure in S1 Fig, Tables in S1 and S2 Tables) and COMT [28] (Figure in S2 Fig, Tables in S3 and S4 Tables), and yielded the novel finding that the DRD2 SNP C/C allele was associated with better recovery of GOS-E in patients with a positive head CT (Figure in S3 Fig, Tables in S5 and S6 Tables). However, recent findings have suggested that the T/T allele may be predictive of better recovery on verbal learning tasks after correcting for injury severity [29]. These genes represent divergent molecular mechanisms that result in lowered brain dopamine signaling. ANKK1 T/T is associated with a 40% reduction in the DRD2 receptor [34], whereas the rs4680 SNP encodes for the Met158Val locus of COMT, and the G/G genotype has been associated with lower dopamine levels due to the increase in enzymatic activity [35]. Previous studies have investigated the effect of this mutation on personality traits, dubbing the resulting phenotype "warrior" in contrast to its "worrier" counterpart. The "warrior" phenotype is associated with higher concentration, memory, and cognitive function, with mixed reports on the ability to emotionally process stimuli. Specifically, there have been multiple studies linking the rs4680 G/G genotype with schizophrenia [36] and lower drug responsiveness for antidepressants and anti-narcoleptics [37,38]. The association of TBI outcome with these genotypes may be due to decreased dopamine levels rather than the specific biomolecular mechanism, leaving unanswered questions regarding the inherent predisposition to outcome and drug responsiveness of individuals suffering traumatic brain injuries.
Taken together, the results indicate that COMT and PARP1 may be useful biomarkers in a clinical prediction model to determine whether patients with an initial diagnosis of a mild TBI will develop significant functional deficit as measured on the GOS-E. ANKK1 and DRD2, on the other hand, may be useful biomarkers in a clinical prediction model for severe TBI, and warrant further investigation and cross-validation in a larger patient cohort to test whether mitigating the downstream effects of these genetic variants will improve outcome following TBI.
The present findings illustrate the value of TDA for expanding upon traditional diagnostic and prognostic tools for TBI. TDA exhibits several benefits as compared with regression methods, which perform poorly with numerous inter-correlated (multi-collinear) variables. In a regression context, multi-collinearity can lead to over-fitting to a particular dataset, limiting diagnostic value for distinct patient populations. In addition, traditional multiple regression models for TBI have been constructed to explain the variance of a single 'gold standard outcome,' for example the GOS-E. Such approaches ignore the fact that TBI outcome is intrinsically multifaceted. The most precise patient information is captured by considering all of the domains (e.g., psychological, cognitive) of outcome simultaneously, as is possible with TDA. Finally, traditional statistical approaches are designed to maximize the variance explained (predicted) in outcome and their performance is benchmarked by assessing value added over alternative/competing models. TDA does not suffer from these limitations because it is fundamentally focused on extracting the most robust shape (persistent homology) [8,9] across multiple alternative data views through numerous dimensions, different patient clustering algorithms, and patient subpopulations. In essence, TDA provides direct visualization of the shape of multidimensional TBI, enabling rapid insight-discovery not achievable through traditional analytics.
TDA and similar integrative analytics hold great promise to further propel recent advances in the use of novel molecular biomarkers, imaging biomarkers, and psychosocial outcomes for TBI [6,7,39,40,41]. To develop targeted therapeutic interventions, TBI clinician-researchers face the complex task of stratifying patients based on multifaceted information, and integrating information about TBI is fundamentally a data-intensive undertaking that could benefit from the application of advanced statistical pattern-detection approaches for enhanced decision support. Through integrative analytics of TRACK-TBI Pilot and similar datasets from other CNS diseases, TDA may help realize the potential of precision medicine to rapidly and accurately classify TBI and to identify subpopulations to target with precision medicine approaches.

Supporting information
S1 Fig. ANKK1 SNP distribution in TDA network and hypothesis testing on GOS-E recovery between 3 and 6 months post-TBI. (A) Distribution of 3 separate ANKK1 SNPs in the TDA network. (B) GOS-E scores between 3 and 6 months post-TBI were plotted for patients who were either CT negative or CT positive, grouped based on the SNP allele expressed. Hypothesis testing of the interaction between CT pathology and the ANKK1 SNP allele on GOS-E outcome over time revealed a significant 3-way interaction for ANKK1 Gly422Arg (rs4938016) only, and a significant difference in GOS-E scores at both 3 and 6 months for patients with a positive head CT for ANKK1 Gly318Arg (rs11604671). However, these differences were not found to significantly change over time.

[...] GOS-E scores between 3 and 6 months post-TBI were plotted for patients who were CT negative or CT positive, grouped based on the SNP allele expressed (C/C = blue, C/T = yellow/green, T/T = red). Hypothesis testing of the interaction between CT pathology and the DRD2 SNP allele on GOS-E recovery revealed a significant association of DRD2 with GOS-E at 3 and 6 months post-TBI; however, this was only detected in patients with a positive head CT and did not significantly change over time. *p < .05. (TIF)

S1 [...] (DOCX)

S1 Dataset. Minimal dataset of variables used to generate and color the TDA network. Variables included in this minimal dataset are those described in Table 2 as well as GOS-E and selected SNPs for PARP1, ANKK1, COMT and DRD2 used for hypothesis testing. The first column of the dataset is the global unique identifier for the TRACK-TBI Pilot dataset, which can be used to link to additional variables from these patients in the full dataset stored in the Federal Interagency Traumatic Brain Injury Research (FITBIR) informatics system (https://fitbir.nih.gov/) and the One Mind Portal (http://onemind.org/Our-Solutions/One-Mind-Portal). Access to the full dataset can be requested by qualified researchers through these data portals. (XLSX)

S1 Metadata. Relevant metadata for S1 Dataset to understand description and value ranges and codes for each variable used to generate and color the TDA network. Variables listed in column A of the S1 Metadata file are copied and transposed from the first row of variables in the S1 Dataset, and accompanied by definitions and value ranges and ordinal codes for each variable.