Imputation strategies for missing baseline neurological assessment covariates after traumatic brain injury: A CENTER-TBI study

Statistical models for outcome prediction are central to traumatic brain injury research and critical to baseline risk adjustment. Glasgow coma score (GCS) and pupil reactivity are crucial covariates in all such models but may be measured at multiple time points between the time of injury and hospital and are subject to a variable degree of unreliability and/or missingness. Imputation of missing data may be undertaken using full multiple imputation or by simple substitution of measurements from other time points. However, it is unknown which strategy is best or which time points are more predictive. We evaluated the pseudo-R2 of logistic regression models (dichotomous survival) and proportional odds models (Glasgow Outcome Score—extended) using different imputation strategies on the The Collaborative European NeuroTrauma Effectiveness Research in Traumatic Brain Injury (CENTER-TBI) study dataset. Substitution strategies were easy to implement, achieved low levels of missingness (<< 10%) and could outperform multiple imputation without the need for computationally costly calculations and pooling multiple final models. While model performance was sensitive to imputation strategy, this effect was small in absolute terms and clinical relevance. A strategy of using the emergency department discharge assessments and working back in time when these were missing generally performed well. Full multiple imputation had the advantage of preserving time-dependence in the models: the pre-hospital assessments were found to be relatively unreliable predictors of survival or outcome. The predictive performance of later assessments was model-dependent. In conclusion, simple substitution strategies for imputing baseline GCS and pupil response can perform well and may be a simple alternative to full multiple imputation in many cases.


Introduction
As a major worldwide cause of death, disability and socioeconomic burden [1], traumatic brain injury (TBI) continues to be an important area of research into novel or better stratified interventions or systems of care. Baseline risk adjustment is critical to any outcome-focused research project aiming to better understand the influence of putative factors or treatments that may improve outcomes. There have been a number of attempts at baseline characterisation of which the International Mission on Prognosis and Clinical Trial Design in Traumatic Brain Injury (IMPACT) [2] and Corticosteroid Randomisation After Significant Head Injury (CRASH) [3] model are perhaps the best known. In common with other such models the presenting Glasgow coma score (GCS-or its motor component GCSm) and number of unreactive pupils are both important clinical parameters describing injury severity at presentation.
There is some potential for ambiguity in defining the baseline GCS and pupil reactivity as both may be confounded by other contributors to unconsciousness such as hypoxia or hypotension before resuscitation. The pre-hospital and resuscitation phase of severely injured TBI patients is a high-pressure and time-critical period which may additionally involve handover of care between a number of different individuals, not all of whom may be healthcare professionals experienced in GCS assessment. Coupled with a general lack of documentation standardisation between pre-hospital and hospital care, there may be a high proportion of such early data that will be missing, unreliable or confounded by under-resuscitation [4]. Dealing with these missing covariates is a critical data curation task since baseline adjustment underpins almost all analyses and therefore deserves particular methodological attention.
Complete case analysis is generally statistically undesirable and therefore some form of imputation strategy will be necessary to deal with missing values. Arrival [5] or post-emergency resuscitation scores may be substitutes, but it is not clear which is best or, when more than one is present, which to use. Furthermore, the GCS after TBI is not static; instead it may evolve dramatically in the early phase, and, since it is impossible to define standardised timepoints given the vast heterogeneity of TBI presentations, this may influence the measured level of consciousness even between similarly injured patients. There is no clear consensus on which time points to use; for example, IMPACT and Trauma Audit and Research Network (TARN) models [6] have published different approaches.
The concept of a 'post-stabilisation' score (as used in IMPACT) is superficially appealing but also problematic to unabiguously define clinically. In modern pre-hospital care and emergency medicine systems, anaesthesia and endotracheal intubation are key interventions which may even occur simultaneously with other resuscitation efforts so that the motor and verbal GCS sub-scores may be unavailable in the 'best resuscitated'/'post stabilisation' case. Again, this mandates some form of imputation. One simple 'shorthand' approach used by clinicians is to rate these components a '1' but this approach may over-estimate injury severity. Some form of imputation is desirable, and, whilst a regression model for the estimation of missing values has been published [7], such a univariate approach does not include other covariates and also discards discriminating pre-intubation information which might be prognostically important.
Multiple imputation is perhaps the most appropriate approach for dealing with missingness when constructing prediction models [8] and is likely to be less biased than complete case analysis if performed carefully. However, the computation of many large datasets may be computationally expensive and reproducibility will be determined by choice of covariates for the imputation as well as the imputation method chosen and the random seed. Multiple imputation offers the advantage of creating a complete dataset of covariates, avoiding combining data across various time-points. However, the variance from the various imputations must be incorporated in some way into the uncertainty of the final estimator. Methods for pooling results across imputations exist for many simpler model types but may not be established in other cases in general (in which case researchers may need to instead use a single imputation or alternative strategy).
The Collaborative European NeuroTrauma Effectiveness Research in Traumatic Brain Injury (CENTER-TBI) study [9,10] is a large, pan-European observational study which aims to better understand the determinants of outcome and optimal treatment by better clinical phenotyping through deep data collection. Detailed data on early neurological assessments as well as outcome data (including, inter alia, the extended Glasgow outcome score-GOSE) was collected. For all the reasons above there is considerable uncertainty in how best to robustly define the most 'reliable' GCS (or sub-score) to use in such observational research and what difference different imputation assumptions might make. It is therefore likely to be critical to establish this to ensure a consistent, principled and reproducible approach, and such considerations will be equally appropriate to other studies.
The CENTER-TBI dataset is particularly complex. The GCS and pupil response variables in CENTER-TBI are recorded at several time points: pre-hospital, arrival at any referring hospital (for secondary transfers), arrival at study hospital and post-stabilization. The objective of this work was to determine a clinically and statistically plausible method to obtain a derived baseline GCS for prognostic analyses in CENTER-TBI. We investigated different substitution methods for dealing with missing GCS and pupil reactivity data as well as comparing these strategies to more numerically cumbersome multiple imputation. Furthermore, we set out to determine the effect of different strategies for imputing missing motor and verbal sub-scores in anaesthetised and ventilated patients.

Materials and methods
We assessed the performance of these strategies and their combinations by comparing McFadden's pseudo-R 2 for both logistic regression (for dichotomous alive/dead outcome) and proportional odds logistic regression (for the ordered categorical modelling of GOSE) using the other IMPACT predictors as additional covariates.

Patients and data
Like many TBI studies, the IMPACT model was based on patients with moderate and severe TBI (i.e., GCS � 12). By contrast, the CENTER-TBI study is instead stratified into 'emergency department', 'hospital admission', 'ICU admission' strata. These strata are not directly comparable to the traditional 'mild', 'moderate' and 'severe' categorization. Since re-stratification would require the use of an optimal GCS, which is itself the goal of this work, we instead created models for all patients and for the ICU stratum only to avoid this circular logic. The latter stratum is likely to best approximate combined moderate/severe categories modelled in IMPACT.
We used release 1.0 of the CENTER-TBI data set with local data hosting, management and extraction on the Opal platform [11]. GCS component and number of unreactive pupils data from the original data set were available for the following time-points: pre-hospital, arrival at referral centre ED (where a secondary transfer took place), arrival at study hospital and ED discharge ('post-stabilisation'). IMPACT covariates were extracted from the electronic case report form data.

Ethics
The study was authorised by the CENTER-TBI management committee. The ethical approvals for the CENTER-TBI study have been previously described [9,10] and a full list of ethical approvals is available at https://www.center-tbi.eu/.

Imputation by substitution strategies
We considered five approaches for obtaining a derived GCS, GCS motor score and pupil reactivity: • 'IMPACT' approach: start with ED discharge assessments (approximates the 'post-stabilisation' score used in IMPACT). If absent, substitute with the next available value going back in time. I.e., ED discharge ! study hospital ED arrival ! referring hospital ED arrival ! prehospital.
• 'TARN' approach: start with the value on arrival to the referring hospital ED. If absent, find most reliable score going forward in time. I.e., referring hospital ED arrival (for secondary transfers) ! study hospital ED arrival. If this is missing then the prehospital score is used.
• 'Best score' approach (to reflect the severity of primary injury without neuroworsening/once resuscitated): Best neurological status (highest GCS sum or GCS motor score, fewest unreactive pupils) across all time points (pre-hospital/referring hospital (secondary transfers)/study hospital ED arrival/post-stabilization).
• 'Erasmus' approach: start with the study hospital arrival GCS score (reflects complete prehospital/referring centre stabilisation). If this is missing, use sequence study hospital arrival ! referring hospital arrival (for secondary transfers) ! pre-hospital.
• 'Worst score' approach (most pessimistic assessment): Worst neurological status (lowest GCS sum or GCS motor score, most unreactive pupils across all time points. The outcome of interest was 6 month survival or GOSE. Where this was missing, it was imputed (as explained below).

Statistical analysis
We used the same predictors as were found to be significant in the IMPACT model, viz. we added candidate GCS/GCSm and pupil unreactivity to a set of 'fixed' baseline characteristics; age, glucose, haemoglobin and CT characteristics (Marshall score, the presence of traumatic subarachnoid haemorrhage or epidural haematoma). We used 6 month GOSE as our outcome of interest.
To evaluate 'simple' imputation by substitution strategies for GCS, GCSm and pupil reactivity it was necessary to first obtain a fully imputed set of the remaining covariates and outcomes so that we did not have additional missingness in these parameters that varied between models. Linear regression was used for imputation of haemoglobin and glucose which were approximately normally distributed. Logistic regression was used for dichotomous presence or absence of an extradural haematoma or traumatic subarachnoid haemorrhage. Marshall score and 6 month GOSE (the outcome of interest) was imputed using a proportional odds model which also included 3 and 12 month GOSE for the imputation to better approximate 6 month GOSE. Cases in which no GOSE was available were deleted after imputation for statistical efficiency. Age, hypoxia and hypotension were included in the imputation as predictors but did not need to be imputed themselves: age data was complete and hypoxia and hypotension were assumed to not have been present if the data were missing.
The putative GCS, GCSm and pupil reactivity scores were then calculated as described above using Opal. The similarity among these scores was compared by pairwise Spearman's correlation analysis. We then used proportional odds logistic regression to build models with 6 month GOSE as an ordinal outcome evaluating all combinations of GCS or GCSm and pupil versions. We also constructed equivalent logistic regression models for 6 month survival status. Model performance was compared using their (McFadden's) pseudo-R 2 values.
To evaluate the performance of models with a completely imputed data set, similar imputations were performed for the complete data set by also including the GCS and GCS motor score as well as pupils for pre-hospital, study or referring hospital ED arrival score and ultimate ED discharge time points. Because the 'referring' ED time point applies only to the subset of patients who underwent a secondary transfer to the study centre, the imputed 'referring' and 'study hospital' arrival assessments were then combined into a single 'presenting ED arrival' time point representing the neurological assessment at the first ED to which the patient presented.
In all cases, 200 imputed data sets were obtained using 5 iterations using the MICE (version 3.3.0) package [12] for the R statistical programming language [13]. MASS (version 7.3.50) was used to build the proportional-odds model. The imputations were run on the machine (Intel Xeon Silver 4110 2.1G, 8C/16T, 10 GB RAM, 240 GB Hard disk) running Ubuntu 16.04.5 LTS (Xenial). Cases with no GOSE at any of the time points were included in the imputation but subsequently deleted before modelling for statistical efficiency in accordance with the methodology of [14]).

Results
The patient characteristics of the CENTER-TBI data set will be published elsewhere. However, for all strata there were (n = 4, 509) patients and the ICU stratum subset consisted of (n = 2, 138) patients. Multiple imputations each took approximately 6 hours of dedicated machine time.

Imputation by substitution
The above substitution strategies were implemented and the percentage missing for different imputation by substitution strategies for GCS and GCSm are shown in Table 1. Fig 1 shows how the resulting imputed data sets using different imputation strategies are weighted across time points. The distribution of GCS between 'mild', 'moderate' and 'severe' categories was similar and is presented in the Supporting information for all strata and the ICU stratum (S1 and S2 Figs). The GCS and GCSm and also pupils were highly correlated between imputation strategies (Fig 2) illustrating their degree of equivalence. Fig 3 shows the variation of McFadden's pseudo-R 2 for logistic regression models for alive/ dead constructed using differing 'simple' imputation by substitution schemes and using different strategies for handling motor and verbal scores for intubated or sedated patients. There was a variation in pseudo-R 2 between models that was often statistically significant but small in absolute value. For all stratum models, the optimal model (with R 2 � 0.44) was obtained by using the full GCS sum score and using the IMPACT methodology for imputation (although combining IMPACT GCS with ERASMUS pupils performed marginally better still) with deletion of patients where motor or verbal subscores could not be assessed. For the ICU stratum only (S3 Fig), the optimal strategy (R 2 � 0.38) was to use the best-neurology motor score although the best-neurology pupils was consistently sub-optimal across the models. In this group, treating un-assessed/untestable verbal or motor subscores as '1' produced comparable or possibly marginally better models than a deletion strategy. The equivalent results for proportional odds models for GOSE are shown in Fig 4. Again, there was variability in model performance that was typically statistically significant but small. Values of R 2 � 0.16 were much lower than for the dichotomous/logistic models. For the all strata models, the optimal model was obtained by combining the IMPACT method full GCS sum scores with either IMPACT or ERASMUS imputed pupil responses. In this group, imputing unavailable motor component as 1 but deleting cases for which verbal component was not assessable gave the best model (with R 2 � 0.165). For the ICU stratum, the best model (with R 2 � 0.14) was obtained using the best-neurology full GCS sum and by imputing missing motor and verbal components as 1 (S4 Fig). There was little to choose between pupil imputation strategies except that the best neurology strategy performed consistently badly. For the all strata group, the optimal strategy for handling missing motor or verbal components was deletion and imputation. Model performance was typically better using the GCS sum rather than motor sub-score and there was a consistent trend to better model performance using neurological assessments from later time points (although using pupil assessments at ED discharge was marginally inferior). The best possible R 2 was �0.44. For the ICU group, deleting and imputing missing motor or verbal GCS time

Fig 2. Spearman's correlations between imputed data for all patients (complete case analysis with verbal/motor scores for intubated/sedated patients treated as missing).
The figure shows that GCS, GCSm and pupil reactivity variables are highly correlated between substitution imputation methods. Furthermore, the GCS sum and motor component are also highly correlated. There is a modest negative correlation between imputed versions of pupil reactivity and GCS sum or motor score (the negative correlation reflecting the coding of pupils as the number of unreactive pupils).
https://doi.org/10.1371/journal.pone.0253425.g002 points was also a good strategy, yet substituting missing motor as 1 was marginally better still. Use of the full GCS sum again marginally outperformed the models based on the motor component alone. A similar consistent pattern to improved models at later time points (again, with ED discharge pupils performing marginally worse than the ED arrival values). For the ICU group (S5 Fig), the best R 2 � 0.38. Models based on pre-hospital neurological assessments were the worst performing for both all strata and ICU stratum groups.
The results for the fully-imputed proportional odds models are shown in Fig 6. Again, model performance was lower than that of the logistic regression model overall but slightly higher for the all strata group (best R 2 � 0.155) compared to the ICU subgroup (best R 2 � 0.135; S6 Fig). Deletion and imputation of missing motor and verbal sub-components was the best strategy in both groups. However, in contrast with the results of the logistic regression model, as summarised in Fig 5, the optimal time for prediction (in both groups) of the proportional odds model was at the time of arrival to the presenting hospital rather than at ED discharge. Again, using the pre-hospital time-point consistently generated the worst performing models.

Discussion
These results from CENTER-TBI are broadly in agreement with the performance of those previously quoted [15]. It is important to point out that the (pseudo)variance is highly dependent on model choice (e.g., dichotomous alive/dead vs proportional odds model for ordered GOSE categories, which is a more difficult modelling task, statistically speaking and has a  commensurately smaller R 2 ). Similarly, it is perhaps not surprising that model performance varies between patient subgroups; models for all strata generally performed better than those for the ICU subgroup (see Supporting information for figures) although the difference in R 2 was much smaller than the difference between the choice of logistic or proportional odds models. This is presumably because the all strata group contains a proportionately higher contribution from less severely injured patients who are more homogeneous.
Our results also demonstrate a sensitivity of model performance to choice of imputation strategy. This sensitivity is typically statistically significant (even accounting for the very large number of comparisons in this work) but small in absolute R 2 terms. Therefore for applications, such as baseline risk adjustment, where a truly optimal R 2 is not of primary importance, there is little to choose between these strategies. Broadly speaking, using the full GCS sum performs better than the motor score alone. Furthermore, a very simple imputation by substitution strategy based on filling in missing values using other available time-points outperforms full multiple imputation and is computationally far more straightforward. The IMPACT strategy (working backwards from the ED discharge) seems to generally work well, though 'best GCS' and 'best GCSm' approaches perform extremely well on the ICU stratum subset of patients presumably because these parameters best reflect the degree of primary neurological injury. In contrast, however, use of the best (i.e., highest number of reactive) pupils was a consistently poor strategy. This is interesting and presumably reflects a comparatively disproportionate effect of even an episode of one or two unreactive pupils on outcome.
For applications (e.g., in generating a prediction tool) where optimal model performance is important, the sensitivity to imputation strategy may be more important, and it is likely that the optimal choice will depend on the details of the task at hand. Furthermore, the inclusion  criteria, patient group and precise structure or missingness in the CENTER-TBI data set is likely to be different from other data sets, and it is likely that our findings will not generalise. Therefore, for specific applications where high model performance is mission critical, we recommend that a bespoke evaluation, analogous to that presented here, is undertaken to ensure the best strategy is obtained.
We have demonstrated that simple strategies for imputation of of GCS/pupil reactivity can outperform formal multiple imputation in terms of R 2 , simplicity of use and computing requirements. As demonstrated in Table 1, very low levels (<10%) can be achieved with such simple approaches, and therefore, imputation by substitution using, say, an 'IMPACT' approach has merit over a multiply-imputed strategy for general purposes. Having said this, such simple models achieve imputation by sacrificing the temporal data, and fully imputed models allow predictions at multiple time-points which may be of interest in some applications. Indeed, our findings from the multiply-imputed case are interesting. The models based on the pre-hospital time-point consistently perform worse than for those using later time points. This finding may be because pre-hospital clinical assessment is intrinsically less predictive with outcome being dominated by subsequent neuroworsening events. However, another equally plausible explanation is that this is simply a reflection of the high degree of missingness and variable accuracy of pre-hospital data. In either case, our results demonstrate the unreliability of pre-hospital data which is an important result when analysing studies or planning future studies.
It is interesting that there is a monotonic increase in logistic regression performance with neurological assessments at later time points being more predictive of outcome. Thus, the ED  discharge time point is, broadly speaking, recommended from our findings (although the trend was perhaps a little less pronounced in the ICU subgroup). This is perhaps not surprising since ED discharge is temporally closest to outcome and occurs after a period where the patients are maximally 'differentiated'. However, we have demonstrated that this is also model-specific. When a proportional odds model is used to model GOSE instead, the ED discharge time point performs almost as poorly as the pre-hospital one, and predictions are better based on the neurology at arrival at the presenting hospital. It is hard to be certain why this might be but clearly there is some factor that is predictive of intermediate outcomes but not gross survival that is diluted by the time of ED discharge. It is noteworthy that that ED discharge methods are heavily weighted towards ED arrival GCS measures, as illustrated in Fig 1. Again, this reinforces the need to individualise the time points of neurological assessment when planning TBI studies.
It is important to stress that what we have herein referred to as the 'TARN' imputation refers only to the order in which we choose substitutes for this 'simple' imputation. The actual TARN model uses a more sophisticated range of covariates (with GCS polytomised to reflect predictive power and additional categories to reflect injury severity and intubation status). In this work we have instead used the IMPACT model structure for all evaluations. This facilitates direct comparison as we are not interested necessarily in finding a new model but instead in comparing the predictive information content of different GCS/pupil response formulations. However, this may account for the slightly poorer performance of the TARN imputation in absolute terms; optimum imputation strategies for other bespoke models, such as that of  TARN, would need to be evaluated on a case-by-case basis if achieving the highest possible R 2 were of importance.
Our analysis was conducted on data from the CENTER-TBI study in order to determine the optimal imputation strategy in this case. Other studies are likely to have differing patient demographics, and the numbers and structural determinants of patterns of missingness may well differ in alternative medical systems. Indeed the patient populations we used are not directly comparable to the IMPACT group as the latter was developed on a group of moderate and severe TBI patients whereas CENTER-TBI was designed to be stratified by ICU, hospital or ED admission instead. As a result, we cannot be sure that our results will generalise directly. However, we believe that we have presented a statistical framework which other studies may wish to follow as a first step when considering their best baseline risk adjustment variables.

Conclusions
Simple imputation of missing GCS, GCS motor score or pupil reactivity using substitution strategies are computationally trivial and can outperform full multiple imputation in terms of higher pseudo-R 2 and achieve data missingness of less than 10%. For both logistic regression survival and proportional odds GOSE models, the R 2 is sensitive to imputation strategy and patient subgroup. However, this variability is small in absolute terms, and a strategy based on using assessments at discharge from the ED and choosing earlier time points if these are missing generally performs well. Full imputation using chained equations may be useful if temporally-sensitive predictions are to be made. In this case, pre-hospital assessments are unreliable predictors.