Diagnostic accuracy of tablet-based software for the detection of concussion

Despite the high prevalence of traumatic brain injuries (TBI), there are few rapid and straightforward tests to improve their assessment. To this end, we developed a tablet-based software battery ("BrainCheck") for concussion detection that is well suited to sports, emergency department, and clinical settings. This article is a study of the diagnostic accuracy of BrainCheck. We administered BrainCheck to 30 TBI patients and 30 pain-matched controls at a hospital Emergency Department (ED), and to 538 healthy individuals at 10 control test sites. We compared the results of the tablet-based assessment against physician diagnoses derived from brain scans, clinical examination, and the SCAT3 test, a traditional measure of TBI. We found consistent distributions of normative data and high test-retest reliability. Based on these assessments, we defined a composite score that distinguishes TBI from non-TBI individuals with high sensitivity (83%) and specificity (87%). We conclude that our testing application provides a rapid, portable testing method for TBI.


Introduction

Background
Between 1.6 and 3.8 million cases of traumatic brain injuries (TBI) are reported every year in America [1], of which 75% are classified as mild (mTBI) [2]. A large fraction of these cases come from military warzones, as well as from team sports such as football, rugby, hockey and soccer [3]. Collectively, these injuries result in over 50,000 deaths, leave over 70,000 patients with permanent neurological problems, constitute the leading cause of death and disability of U.S. children and young adults, and cost approximately $60B annually [2]. Further, there is a growing belief that the number of TBI cases is underestimated due to the lack of a central reporting system and the fact that many people with less severe injuries do not seek medical treatment [1,4,5]. Such statistics and the downstream neurodevelopmental consequences [6] have created a call to action to provide more objective measures of cognitive functioning, both before an injury occurs and post-injury to measure recovery [7].

Importance
Despite the prevalence of mTBI, early diagnosis of the condition remains a challenge [8]. Many individuals with mTBI do not immediately exhibit symptoms and do not have neurological deficits that can be detected by brain imaging or mental state examination. It is estimated that CT scans detect only about 10% of TBIs, those in which brain bleeding has occurred, leaving the remaining 90% to other diagnostic criteria. Thus, while moderate and severe TBI are fairly easily diagnosed, mild TBI often escapes notice. This is unfortunate, because early detection of mTBI is critical to avoiding secondary damage, which is sometimes irreversible [9].

Goals of this investigation
Although mTBI is often missed at an early stage [1], it does not have to be. Subtle brain injury can be detected with tests of attention, perception, and visuomotor skills [10]. We set out to determine whether such tests could be adapted to a tablet device and used to detect mTBI. To that end, we developed a tablet-based software battery with the aim of maximizing diagnostic accuracy, portability, and ease of operator use, while minimizing testing time and the possibility of malingering. Our approach capitalizes on brief, simple tests appropriate for tablet computing to tease out several aspects of mTBI.
Our primary research aim is to estimate the diagnostic accuracy of BrainCheck when compared to physician diagnoses of concussion. We also sought to determine the distribution of performance for each test variable, identify differences in performance driven by age or sex, quantify test-retest reliability, and evaluate the efficacy of a software module intended to test for malingering.

Selection of participants
This study was approved by the Institutional Review Board of the Baylor College of Medicine, and each participant was required to provide written informed consent. Informed verbal assent was required for adolescent participants in addition to written informed consent from their guardian.
TBI population and pain-matched controls. Eligible subjects were patients who were admitted to the Emergency Department (ED) at Ben Taub Hospital with a chief complaint of head trauma and who were determined by physician evaluation to have suffered an mTBI in accordance with the diagnostic criteria [11]. We also enrolled pain-matched control subjects from the ED, defined as those clinically determined to be experiencing similar amounts of pain but without any head injury (e.g., a twisted ankle). Physicians were blind to our test results because they reached diagnoses immediately prior to the administration of BrainCheck.
Normal population. For comparison, we also enrolled a healthy control population. Data was gathered from universities, high schools, community centers, and the Texas Medical Center accelerator; specifically, Rice University, University of Houston, Houston Community College, St. Thomas High School, Emery Weiner High School, KIPP Generations, KIPP Houston, KIPP Sunnyside, Tellepsen YMCA, Trotter YMCA, Weekly YMCA, Robinson Judson Jr. Community Center, Fonde Community Center, and the Texas Medical Center accelerator. Retest data was collected only from the Texas Medical Center accelerator. Participants from the local universities and the TMC accelerator were tested one-on-one; participants from the high schools and community centers were tested both individually and in groups of 2-12 people at a time. To ensure quality participant effort, each testing site had a minimum of three test administrators.
Inclusion and exclusion criteria. All participants were required to be between the ages of 18 and 64 and to have full function of both hands and perfect or corrected vision. For the mTBI group, inclusion criteria were as follows: patients were required to have presented to the ED with a suspected traumatically induced brain injury (if the patient was not sure that the head was directly injured in the traumatic event, they needed to report a loss of consciousness [LOC] or amnesia); patients were required to have a Glasgow Coma Scale (GCS) score of 14 or 15 on initial evaluation in the ED; and patients or their legal representatives had to be willing to undergo the informed consent process prior to enrollment in the study. Additionally, we required a physician diagnosis of mTBI in accordance with the diagnostic criteria of mTBI [11].
Subjects were excluded from the study if any of the following criteria were true: TBI within the last 6 months, any pre-existing neurological condition (including neurodegenerative disease, a primary diagnosis of ischemic or hemorrhagic stroke, or history of neurosurgery within the last 30 days) [12], less than 4 hours of sleep the night before [13], strenuous physical activity within 1 hour of testing [14], or any drug or alcohol use within the last 12 hours [15].
Participant recruitment. Different strategies were pursued depending on the location and the purpose. For the normative data, including the test-retest sample, participants from universities and high schools were referred by the athletic department, while those from community centers and the Texas Medical Center were recruited by posters in well-trafficked areas. Participants were unpaid, with the exception of a subset of the control adult participants in the Texas Medical Center who received a small compensation ($10) for taking sequential tests for the analysis of test-retest reliability.
For concussed patients and pain-matched controls, patients were recruited by nursing staff after diagnosis. BrainCheck's concussion battery was always administered within 24 hours of the physician diagnosis. Pain matching relied on pain as self-reported to nurses via a research case report. Once identified, participants were then consented for participation in the experiment. Ten percent of non-concussed patients refused to participate, whereas thirty percent of concussed patients were eligible but refused to consent. The discrepancy is likely due to collateral symptoms like blurred vision or dizziness that affected the patients' ability to use the tablet. These symptoms were more common in the concussed sample than in the non-concussed, pain-matched sample. Both samples were infrequently interrupted during the administration of BrainCheck; however, they differed in terms of their willingness to restart the battery. Whereas the non-concussed, pain-matched patients were always willing to restart, approximately 10% of concussed patients refused to restart. For the concussed group, 2 subjects had a GCS score of 14, while the others had a GCS score of 15. Seventy percent of the concussed patients had experienced loss of consciousness and amnesia. Patient demographics are shown in Table 1.
All normative and test-retest data was collected by the same certified psychiatric professional, who holds a bachelor of science in exercise physiology from University of Houston. He has administered BrainCheck for two years in his capacity as clinical coordinator.
For the concussion and pain-matched controls, diagnoses came from several different ER physicians with an MD. Participants were identified by hospital nursing staff and then evaluated by one of the authors (SY), who was trained to use the BrainCheck battery to ensure consistency across settings.

Methods and measurements
All participants (or their legal representatives) signed the informed consent forms prior to participation in the study, as approved by the Institutional Review Board at Baylor College of Medicine.
We created tablet-based versions of six neurocognitive tests, described below (Fig 1); collectively, the battery of tests is referred to as "BrainCheck". We compared BrainCheck against the reference standard of a physician diagnosis of concussion. This choice mirrors the reference standard used in other studies of traumatic brain injury (see, e.g., [9]). A brief description of each neurocognitive test follows.
An effective test of attention is the Flanker Task [16]. Patients with mTBI show significantly longer reaction times for alerting, orienting and executive parts of attention when compared with normal controls [17]. The test presents participants with a target item (in this case, a central arrow) flanked by congruent or incongruent arrows. Participants identify the direction of the target as quickly and accurately as possible.
The Digit Symbol Substitution Task measures general cognitive performance. Participants must match an arbitrary correspondence of symbols to digits; when presented with a new symbol, they find as quickly as possible the corresponding digit and answer by pressing the digit. This is a continuous performance task in which the participant makes as many correct matches as possible within a fixed testing period. It has been previously demonstrated that mTBI patients perform significantly worse on this task than controls [18].
The Stroop Task measures the reaction time required to overcome cognitive interference [19]. When the name of a color (e.g., "blue," "green," or "red") is printed in an incongruent color (for example, the word "blue" printed in red), naming the color of the word takes longer and is more prone to errors than when the word and color are congruent. Commonly used as a measure of executive function, this task measures a subject's ability to shift cognitive set [20] and provides a measure of cognitive inhibition [21,22], that is, the ability to inhibit an overlearned response [20]. The Stroop task is often used to screen for brain damage [23]. The magnitude of "Stroop interference" (the difference in reaction times between congruent and incongruent color words) is greater in patients with mTBI than in the normal population [24]. The TBI Clinical Trials Network has demonstrated that the Stroop task has diagnostic value for identifying neurocognitive deficits in patients with TBI [25].
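As a minimal sketch of how the Stroop interference described above could be computed from trial-level data, the Python snippet below subtracts the mean reaction time on congruent trials from the mean on incongruent trials. The trial record layout, the function name `stroop_interference`, and the choice to use only correct responses are illustrative assumptions; the article does not describe BrainCheck's internal scoring.

```python
from statistics import mean

def stroop_interference(trials):
    """Mean RT (ms) on incongruent trials minus mean RT on congruent trials.

    `trials` is a list of dicts with keys 'congruent' (bool), 'correct' (bool),
    and 'rt_ms' (float). This layout is hypothetical. Restricting the analysis
    to correct responses is an assumption, not a documented BrainCheck rule.
    """
    congruent = [t["rt_ms"] for t in trials if t["correct"] and t["congruent"]]
    incongruent = [t["rt_ms"] for t in trials if t["correct"] and not t["congruent"]]
    if not congruent or not incongruent:
        raise ValueError("need at least one correct trial of each type")
    return mean(incongruent) - mean(congruent)

# Made-up trials for illustration only.
example_trials = [
    {"congruent": True, "correct": True, "rt_ms": 600.0},
    {"congruent": True, "correct": True, "rt_ms": 640.0},
    {"congruent": False, "correct": True, "rt_ms": 750.0},
    {"congruent": False, "correct": True, "rt_ms": 790.0},
    {"congruent": False, "correct": False, "rt_ms": 500.0},  # error trial, excluded
]
interference_ms = stroop_interference(example_trials)
```

A larger positive value indicates that incongruent color words slowed the participant more.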
The Trail Making Test (TMT) is a neuropsychological test of visual attention and task switching. Participants are instructed to connect a set of 25 numbers in increasing order as rapidly as possible. The TMT provides measures of visual search speed, scanning, speed of processing, mental flexibility, and executive functioning. Trail Making Test A uses only numbers (1 through 25), while Trail Making Test B employs alternating numbers and letters (1-A-2-B-3-C-...). Both TMT-A and TMT-B have strong positive predictive values as a diagnostic measure for TBI [26].
Balance and coordination are often impaired in patients with mTBI [27][28][29][30]. To detect subtle defects in that domain, we developed a test in which a ball is displayed on the tablet, moving according to the tilt of the tablet. A participant holds the tablet out in front at arm's length, and tilts it appropriately to keep the ball in a central circle. The task is not difficult with normal coordination abilities; with a deficit it becomes measurably more difficult.
The Immediate and Delayed Recall Tests measure a participant's ability to correctly recall seen words in the presence of distractors. First, immediate recall is measured by serially displaying 10 words and then asking, over 20 trials, whether a given word (either a distractor or a target) was just seen. At the end of the testing battery, without seeing the original list again, participants are again presented with 20 words and asked whether each word was presented before. Both immediate and delayed recall show sensitivity to mTBI [31].

Analysis
We analyzed the data using custom software written in MATLAB. A group of tests taken by a single user at a single time is defined as a battery. We considered only complete batteries and, for the control populations, removed outlier data that was more than 3 standard deviations from the mean. Principal component analysis (PCA) was performed with the MATLAB function 'pca', which is built into the Statistics and Machine Learning Toolbox. Statistical significance of differences in mean values between groups was evaluated using the two-sample t-test, while statistical significance for different distributions was determined using the two-sample Kolmogorov-Smirnov test. All figure error bars reflect standard deviation from bootstrapping. Data is available at https://figshare.com/s/352b64af1ca84ed9251d (doi: 10.6084/m9.figshare.4887314).
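Two of the steps above, outlier removal at 3 standard deviations and bootstrapped error bars, can be sketched as follows. The authors' analysis was written in MATLAB; this standard-library Python version is only an illustration. The function names, the single-pass outlier rule, the number of resamples, and the example data are all assumptions.

```python
import random
from statistics import mean, stdev

def remove_outliers(values, n_sd=3.0):
    """Keep values within n_sd sample standard deviations of the mean.

    Single pass: the mean and SD are computed once from the full sample.
    Whether the original analysis iterated this step is not stated.
    """
    mu, sd = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) <= n_sd * sd]

def bootstrap_sd_of_mean(values, n_resamples=2000, seed=0):
    """Standard deviation of the sample mean across bootstrap resamples."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    boot_means = [mean(rng.choices(values, k=n)) for _ in range(n_resamples)]
    return stdev(boot_means)

# Made-up reaction times (ms); the last value is an obvious outlier.
scores = [500.0 + (i % 10) * 10.0 for i in range(40)] + [5000.0]
cleaned = remove_outliers(scores)
error_bar = bootstrap_sd_of_mean(cleaned)
```

Resampling with replacement (`rng.choices`) is what makes this a bootstrap; the spread of the resampled means serves as the error bar.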

Characteristics of study subjects
We enrolled 30 patients in the mTBI group (53% males, median age 32.2), 30 users in the pain-matched control group (40% males, median age 33.6), and 538 participants in the healthy control group (46% males, median age 24.9). Demographics are presented in Table 1. The battery of all six tests was completed within five minutes by the majority of users.

Main results: Normative data
Normative data for all the tests, using only the data from the healthy control group, are shown in Fig 2 and Table 2. In general, the data were well approximated by log-normal distributions, and the sample size was sufficient that the standard errors of the means were quite small (1-2% of the mean values).
Differences in test performance by age and sex. All tests showed age-dependent performance differences, with peak performance in the population aged between 19 and 51 and declining in older or younger cohorts (Fig 3). Although there were differences between the age brackets, there were no performance differences by sex (Fig 4), with the exception of small performance advantages for males on the coordination test (p < 0.02) and for females on the Trail Making Test-B (p < 1 × 10^-5).
Test-retest reliability. We next sought to determine the reliability of the tests by comparing the results from users who took the assessment more than once. The time interval between test administrations was at least 7 days. We found that most individual tests showed strong retest reliability, with correlation coefficients between first and second trials ranging from 0.6 to 0.9, as shown in
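The test-retest comparison described above amounts to correlating each participant's first-session score with their second-session score. The sketch below computes a Pearson correlation coefficient in plain Python. The article does not say which correlation coefficient was used, so Pearson is an assumption, and the scores are made up.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between paired first- and second-session scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for five participants tested twice.
first_session = [41.0, 55.0, 38.0, 62.0, 47.0]
second_session = [44.0, 53.0, 40.0, 65.0, 45.0]
retest_r = pearson_r(first_session, second_session)
```

Values near 1 mean that participants keep roughly the same rank order across sessions.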

Main results: Performance of concussed individuals
The performance of 30 mTBI patients was compared with that of 30 orthopedic controls. All tests were capable of distinguishing mTBI from controls (Fig 6) and mTBI from healthy individuals (see Table 3 for p-values of the differences in the means of these groups for each test). We also examined the performance of the individual tests as diagnostics for mTBI, presented as specificity and sensitivity as a function of the threshold for distinguishing TBI and healthy individuals (Fig 7). The most specific tests detected slightly more than 50% of mTBI cases, while the most sensitive test (the coordination test, 77% sensitivity) yielded 75% specificity.
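To make the threshold dependence concrete, the sketch below sweeps a decision threshold over one test's scores and reports sensitivity and specificity at each step. It assumes that higher scores indicate greater impairment and that a score at or above the threshold counts as a positive result. The function names and the scores are hypothetical.

```python
def sensitivity_specificity(patient_scores, control_scores, threshold):
    """Return (sensitivity, specificity) when score >= threshold means 'positive'.

    Assumes higher scores indicate greater impairment.
    """
    true_positives = sum(s >= threshold for s in patient_scores)
    true_negatives = sum(s < threshold for s in control_scores)
    return (true_positives / len(patient_scores),
            true_negatives / len(control_scores))

def sweep_thresholds(patient_scores, control_scores):
    """Evaluate every observed score as a candidate threshold."""
    thresholds = sorted(set(patient_scores) | set(control_scores))
    return [(t, *sensitivity_specificity(patient_scores, control_scores, t))
            for t in thresholds]

# Made-up scores for five patients and five controls.
patients = [12.0, 15.0, 18.0, 21.0, 25.0]
controls = [5.0, 8.0, 11.0, 14.0, 16.0]
curve = sweep_thresholds(patients, controls)  # (threshold, sensitivity, specificity)
```

Raising the threshold trades sensitivity for specificity, which is the relationship a curve like Fig 7 displays for each test.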
Defining a scoring metric for concussions. To improve on the sensitivity and specificity achieved by the individual tests, we sought to define a combined metric that would robustly discriminate patients with TBI while minimizing false positives. Thus, we defined an optimized linear sum of the scores from all six assessments. The mean of this score differed significantly between concussed and control individuals (p < 3 × 10^-5), and between concussed and healthy individuals (p < 1 × 10^-20) (Fig 8). This metric discriminated TBI with a sensitivity of 83% and a specificity of 87% (Fig 8).
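A linear composite of this kind can be sketched as a weighted sum of standardized test scores. The article reports neither the weights nor the optimization procedure, so the weights, the normative means and standard deviations, the z-scoring step, and the test names below are all placeholders. Only three tests are shown for brevity.

```python
def composite_score(raw_scores, norms, weights):
    """Weighted linear sum of z-scored test results.

    raw_scores: test name -> participant's raw score
    norms:      test name -> (normative mean, normative SD), hypothetical values
    weights:    test name -> weight, placeholders rather than the published fit
    """
    total = 0.0
    for test_name, weight in weights.items():
        norm_mean, norm_sd = norms[test_name]
        z = (raw_scores[test_name] - norm_mean) / norm_sd
        total += weight * z
    return total

# Illustrative values only; a negative weight means a lower raw score is worse.
norms = {"flanker_rt_ms": (500.0, 50.0),
         "stroop_rt_ms": (700.0, 100.0),
         "digit_symbol_correct": (30.0, 5.0)}
weights = {"flanker_rt_ms": 1.0, "stroop_rt_ms": 0.5, "digit_symbol_correct": -1.0}
participant = {"flanker_rt_ms": 600.0, "stroop_rt_ms": 900.0,
               "digit_symbol_correct": 20.0}

score = composite_score(participant, norms, weights)
flagged = score >= 3.0  # hypothetical decision threshold
```

Standardizing each test before summing keeps tests measured in different units (milliseconds, counts) on a comparable scale.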
An independent test for malingering. Many test takers may have incentives to intentionally perform poorly. Athletes who perform poorly at baseline could more easily be cleared to return to play following an incident, while others may deliberately perform poorly on a post-incident assessment in order to influence their diagnosis for insurance purposes. As a hidden metric for malingering, we included in the battery a test to detect deliberate poor performance. We asked participants to use a slider to adjust the relative size of two circles until they appeared to be equal (Fig 9). If the circles are surrounded by shapes of the same size, participants should be able to make the circles equally sized, even if they have experienced an mTBI; the only way a person could 'fail' the test is by deliberately trying to do so. This test showed normally distributed data around the expected mean of zero pixels. Most importantly, concussed and healthy individuals performed similarly on the test, so a large outlier performance on this test presumably indicates malingering rather than mTBI.
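One simple way to turn the circle-matching result into a flag is to compare the participant's size error with the spread of the normative distribution around zero. The article does not specify a cutoff, so the 3-standard-deviation rule, the normative standard deviation, and the function name below are assumptions for illustration.

```python
def flag_possible_malingering(size_error_px, norm_sd_px, n_sd=3.0):
    """Flag a circle-matching error that is an outlier relative to the norm.

    size_error_px: signed difference between the two circles, in pixels
                   (zero means a perfect match)
    norm_sd_px:    standard deviation of errors in the normative sample
                   (hypothetical value supplied by the caller)
    n_sd:          assumed cutoff in standard deviations, not a published one
    """
    return abs(size_error_px) > n_sd * norm_sd_px

# Hypothetical normative SD of 5 px.
careful_attempt = flag_possible_malingering(-4.0, norm_sd_px=5.0)
suspicious_attempt = flag_possible_malingering(40.0, norm_sd_px=5.0)
```

Because concussed and healthy participants performed similarly on this test, a flag here would point to effort rather than injury.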

Discussion
Patients with mTBI typically present with subtle deficits in perception, attention, executive function, working memory, spatial attention, and coordination, all of which can be detected with simple tests. While any one of these tests can suggest mTBI, the combination of six tests creates a much more comprehensive tool and thus has more opportunity to detect mTBI. In this study, we demonstrated that a simple battery of tablet-based, easy-to-understand psychophysical tests has high sensitivity and specificity for detecting mTBI.
Other computer-based psychophysics tests for mTBI exist, such as the ImPACT test, which has approximately the same specificity and sensitivity (89% and 82%, respectively) as BrainCheck [32,33]. However, the advantage of the current tests is that they are briefer, tablet-based, gamified, and very straightforward to use. The ability to perform the BrainCheck battery on a tablet in approximately five minutes (as compared to over 30 minutes for the ImPACT test) suggests an easier adoption into athletics, emergency situations, and the military.
Many other detection techniques are currently in development. These include the assessment of serum biomarkers, quantitative EEG, smooth pursuit eye tracking, and a variety of imaging techniques (e.g., diffusion tensor imaging, high definition fiber tractography, and magnetic resonance spectroscopy), but these approaches are expensive, require expertise to administer, and have an unclear future for use on the field. This leaves a need for first responders to have a decision-assist device in the field: one that is portable, accurate, and requires no expertise to administer. Tablet computing makes a portable approach to mTBI measurement possible, and it may offer characteristics not available with other types of objective testing. Further, because this test can be performed rapidly, requires no expertise to administer, and has safeguards against malingering, it may be a candidate for use by military and athletics teams to rapidly make fitness-for-duty or return-to-play decisions.
As with all tests, there is a trade-off between sensitivity and specificity. This trade-off is shown explicitly for the individual tests in Fig 7 and for the composite score in Fig 8. Considering the nature of TBI, it may be useful to have a test that is highly sensitive, even if it is not highly specific, or highly specific, even if it is not highly sensitive. For example, when making return-to-play decisions, a "better safe than sorry" approach is in order, making sensitivity the most important parameter. By choosing a low threshold (around 10), BrainCheck can achieve nearly 100% sensitivity with a reduction in specificity to approximately 70%. Conversely, in emergency department settings, CT scans are typically ordered even for minor head injuries. By choosing a high threshold (around 22), BrainCheck can achieve nearly 100% specificity, while reducing sensitivity to approximately 30%. Because even a small reduction in unnecessary CT scans would be beneficial, it may be useful to have a test that can be optimized for specificity. With a tunable threshold, the BrainCheck testing application provides a rapid, portable testing method for concussion that could be a very effective first-line test in a wide variety of settings.
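The threshold tuning discussed above can be illustrated by choosing the highest cutoff that still reaches a target sensitivity and then reading off the resulting specificity. The sketch assumes that a composite score at or above the threshold counts as positive. The function names and scores are hypothetical and are not the study data.

```python
import math

def threshold_for_sensitivity(patient_scores, target_sensitivity):
    """Highest threshold at which at least the target fraction of patients
    score at or above the threshold (score >= threshold means 'positive')."""
    if not 0.0 < target_sensitivity <= 1.0:
        raise ValueError("target_sensitivity must be in (0, 1]")
    ranked = sorted(patient_scores, reverse=True)
    # Small epsilon guards against floating-point error in the product.
    needed = max(1, math.ceil(target_sensitivity * len(ranked) - 1e-9))
    return ranked[needed - 1]

def specificity_at(control_scores, threshold):
    """Fraction of controls scoring below the threshold."""
    return sum(s < threshold for s in control_scores) / len(control_scores)

# Made-up composite scores.
patients = [12.0, 15.0, 18.0, 21.0, 25.0]
controls = [5.0, 8.0, 11.0, 14.0, 16.0]

# Return-to-play style setting: insist on detecting every patient.
screening_threshold = threshold_for_sensitivity(patients, 1.0)
screening_specificity = specificity_at(controls, screening_threshold)
```

Lowering the target sensitivity moves the threshold up and the specificity with it, which corresponds to the emergency department use case described above.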