EEG-based motor network biomarkers for identifying target patients with stroke for upper limb rehabilitation and its construct validity

Rehabilitation is the main therapeutic approach for reducing poststroke functional deficits in the affected upper limb; however, significant between-patient variability in rehabilitation efficacy indicates the need to target patients who are likely to have clinically significant improvement after treatment. Many studies have determined robust predictors of recovery and treatment gains and yielded many great results using linear approachs. Evidence has emerged that the nonlinearity is a crucial aspect to study the inter-areal communication in human brains and abnormality of oscillatory activities in the motor system is linked to the pathological states. In this study, we hypothesized that combinations of linear and nonlinear (cross-frequency) network connectivity parameters are favourable biomarkers for stratifying patients for upper limb rehabilitation with increased accuracy. We identified the biomarkers by using 37 prerehabilitation electroencephalogram (EEG) datasets during a movement task through effective connectivity and logistic regression analyses. The predictive power of these biomarkers was then tested by using 16 independent datasets (i.e. construct validation). In addition, 14 right handed healthy subjects were also enrolled for comparisons. The result shows that the beta plus gamma or theta network features provided the best classification accuracy of 92%. The predictive value and the sensitivity of these biomarkers were 81.3% and 90.9%, respectively. Subcortical lesion, the time poststroke and initial Wolf Motor Function Test (WMFT) score were identified as the most significant clinical variables affecting the classification accuracy of this predictive model. Moreover, 12 of 14 normal controls were classified as having favourable recovery. In conclusion, EEG-based linear and nonlinear motor network biomarkers are robust and can help clinical decision making.


Ethics statement
This study was approved by the Institutional Review Board of Taipei Veterans General Hospital (approval number: 2010120091A).

Patients and outcome measurements
Patients with stroke admitted to the Rehabilitation Center at Taipei Veterans General Hospital were included. The inclusion criteria were as follows: (1) first hemiparetic stroke, (2) Brunnstrom stage between stage II and V, (3) no cognitive dysfunction and (4) willingness to participate and sign the informed consent form. The exclusion criteria were as follows: (1) unstable vital signs; (2) irreversible contracture over any of the joints of the upper extremity; (3) a history of surgery, fracture, arthritis, or pain; (4) spasticity (modified Ashworth Scale score of >2); (5) poststroke seizures; (6) heart attack within 3 months; and (7) cortical lesions in the 5 core motor areas. Of the recruited 60 patients, 7 withdrew because of personal reasons. Among the final 53 patients enrolled, EEGs of the first 37 patients were used as training data and those of the remaining 16 patients were used as validation data. This study was approved by the Institutional Review Board of Taipei Veterans General Hospital. All patients provided their written consent. They received 24-hour rehabilitation, 3 times per week for 8 weeks. The UL motor function of all patients was evaluated before and after the intervention by using the Fugl-Meyer Assessment of Physical Performance (FMA) [30], Upper Extremity Performance Evaluation Test for the Elderly (TEMPA) [31], and WMFT [32]. Table 1 lists the patients' demographic and clinical characteristics. Furthermore, we recruited 14 healthy right-handed participants in the control group for comparison.

Movement task and EEG preprocessing
Before rehabilitation, the patients were instructed to use their affected hand to perform either shoulder or elbow flexion and extension, depending on their movement ability. The healthy controls performed the right shoulder flexion and extension. All participants sat upright with 0 degree of shoulder flexion. For elbow movements, the affected upper arm rested on an armrest adjusted for the subject's height. Auditory cues were used to pace the movement (intertrial interval = 10 ± 2 s to avoid anticipation). Fail in completing a movement (angle of movement < 45 degree) was recorded digitally by the therapist pressing a button and this trial was discarded from further analysis. For every 10 success trials, participants took a short break of 1-10 minutes to prevent the distortion of performance and thus results engendered by muscle fatigue. We acquired 80 trials for each participant. All participants were instructed not to blink in the first 2 seconds of each movement.
During hand movements, 32-channel EEG data (10-20 system montage) were measured (sampling rate = 2000 Hz) and preprocessed offline by using SPM8 (Wellcome Trust Centre for Neuroimaging, http://www.fil.ion.ucl.ac.uk/spm/). The EEG data were first epoched with a peristimulus window of −1500 to 2000 milliseconds (time zero indicated the onset of an auditory cue) and filtered from 4 to 48 Hz. Artefact-contaminated trials such as electrooculography or movements (EEG amplitude > 500 μV) were excluded from further analysis. These artefact-free trials were then entered into DCM.  [12] as the initial guess. The locations of the 5    predefined areas were optimized individually through the DCM inversion. The model space of the connection architecture was constructed under the following hypotheses. First, intrinsic connections become more nonlinear after stroke, in contrast to being more linear under a healthy state. Second, reduced lateralized activation after stroke results in a less lateralized network [33]. This resulted in 6 models for comparison (see S1 Fig), which have been tested in healthy individuals [12]. Furthermore, we also constructed an all-linear model for comparison. All connections have been structurally identified in the macaque brains [34,35] and functionally tested using fMRI and EEG data in the human brain during motor tasks [36] [37,38]. For given DCM and EEG data, artefact-free epochs were projected from the channel space to sources by using the generalized inverse of a lead field matrix for our chosen sources and then transformed using a time-frequency Morlet wavelet transform (wavelet number 7) over the peristimulus time of -500 to 800 milliseconds. The absolute values of the resulting time-frequency responses were then averaged over artefact-free trials and baseline (−850 to −800 ms) corrected to produce the power spectrum at sources. Bayesian inversion was employed to estimate DCM parameters, particularly the matrices A that contain coupling parameters of modulating spectral activity induced by other sources and exogenous (e.g., stimulus) inputs. The details of this standard procedure have been reported in previous studies [12,28,37,38]. To perform a second-level analysis of the model space, the models of patients with lefthand weakness (right hemispheric lesion) were flipped along the mid-sagittal line. Bayesian model selection (BMS) was used to identify the best model under the fixed and random effects assumption at the individual and group levels [39,40,41] for the control and training groups.

DCM feature extraction
After establishing the best model, we divided the coupling parameters in the core motor cortical network into 4 frequency bands, namely ɵ (4-8 Hz), α (8-15 Hz), β (15-30 Hz), and γ (30-48 Hz), and 2 mechanisms, namely excitatory (positive) and inhibitory (negative); we averaged them across frequencies to give 8 parameters for each connection (Fig 1a). After BMS, we can then determine the number of attributes for each subject that entered into the classifier for constructing the predictive model. For example, there were 18 connections in the model profiled in Fig 1a, including 9 reciprocal connections between SMA-cM1, SMA-iM1, SMA-iPM, SMA-cPM, iPM-cPM, iPM-iM1, cPM-cM1, iPM-cM1 and iM1-cM1 and this yielded 144 attributes (8 parameters x 18 connections). For comparison, spectra over the 5 core areas were also averaged over the prespecified 4 frequency bands and mechanisms (positive and negative) to give 8 parameters for each source (Fig 1b). Hence, 40 attributes were obtained for each participant.

Classification and construct validity
The first 37 patients recruited (i.e., the training data set) were categorized as either having favorable (n = 19) or poor recovery (n = 18) according to whether their postrehabilitation improvement in FMA, WMFT, or TEMPA reached the 10% level of the maximum score [42] ( Table 1). The DCM features of the best model at the group level were used in a logistic regression for two-class classification (i.e., favorable and poor). Five-fold cross-validation was used to evaluate the classification accuracy of different combinations of frequency-specific features derived from DCM. After identifying the optimal combination of frequency-specific features, we used a backward elimination procedure to select significant connection-specific features [43,44].
Sixteen new patient datasets were used to validate the predictive model at the individual level. For comparison, we performed a dichotomous classification on the clinical factors of the validation dataset by using cutoffs that best separated the favorable from the poor outcomes. Finally, 14 datasets of healthy participants were tested through the predictive model to determine the degree to which the model reflects the normal motor network. The program used for classification and the identified predictive model in this study can be downloaded from http:// 140.115.52.12/.

Statistical tests
A 2-sample t-test was used to examine whether the clinical characteristic of patients significantly differed between datasets. Furthermore, after establishing the best predictive model, we statistically analyzed the frequency-and connection-specific parameters in the model by using analysis of variance to determine whether these predictors were significantly consistent across the patients. Statistical significance was set at P < .05.

Results
Demographic characteristics at the baseline and improvement after rehabilitation Table 1 listed the demographic characteristics of all patients at the baseline and improvement after rehabilitation and Table 2 summarized the statistic results of demographic and clinical characteristic. The statistical result confirmed that the validation dataset resembled the training dataset, suggesting a similarity among recruited patients. We observed no significant between-dataset differences in age, time poststroke, or initial impairment measured using the Brunnstrom stage, FMA, and TEMPA (Table 2). However, the WMFT scores were significantly lower in the training dataset than in the validation dataset (P = .04). Notably, age (P = .07) and time poststroke (P = .11) tended to differ between the 2 datasets, but the difference was not significant. The participants were significantly younger (P = .02) and the WMFT scores were significantly higher (P = .048) in the validation dataset than in the training dataset.
Regarding the clinical characteristics of the patients in the training dataset, only time poststroke was significantly shorter in the favorable outcome subgroup than in the poor outcome subgroup (P = .03).
Regarding the treatment effect of rehabilitation, the mean improvement scores in the training and validation datasets were 4.81 and 4.62 for the FMA, 6.54 and 6.25 for the TEMPA, and 4.59 and 2.90 for the WMFT, respectively. The mean improvement scores did not significantly differ between the 2 datasets (P > .05). However, in the training data set, all improvement scores were higher in the favorable outcome subgroup than in the poor outcome subgroup, indicating that the patients with favorable outcomes were adequately separated from those with poor outcomes. In addition, improvement scores did not significantly differ between the 2 favorable outcome subsets.

BMS result
The BMS results revealed that DCM1 and DCM3 were the best models for 19 and 18 patients in the training data set, respectively, at the individual level. At the group level, the BMS results under random effects assumption identified DCM1 as the optimal model for the training group, with an exceedance probability of 0.87 (Fig 2a, upper panel). Regarding the control group, under RFX, DCM1 was identified as the winning model for all at the individual level and at the group level with an exceedance probability of 0.97 (Fig 2a, lower panel). These results indicated that among the models being tested, DCM1 (shown in Fig 1a, right) was the most optimal for both the patients and healthy controls. We then further compared DCM1 with the all-linear model at both the individual and group level. The BMS results of the training dataset showed an overwhelming significance of DCM1 over all-linear model under FFX and RFX assumptions (Fig 2b).

Frequency-specific features and classification accuracy
We compared the classification accuracy of frequency-specific features derived from DCM1 with that of features derived from the spectrum in the 5 motor areas. The features of the motor network (i.e. DCM features) provided higher classification accuracy (up to 90%) than did the frequency-related source power alone (up to 80%; Fig 3a). Furthermore, we investigated the effect of the frequency and nonlinearity on the classification accuracy. Fig 3b presents a plot of all the combinations of frequency-specific features against the classification accuracy. We observed that, under DCM1, β oscillation was the most crucial component in the motor network, yielding an 83.1% ± 3.9% accuracy, followed by alpha and gamma rhythms. The β plus γ and β plus ɵ features yielded the highest accuracies (92.9% ± 5.0% and 92.9% ± 4.8%, respectively; Fig 3b, red arrows). However, both the combination of β and α features and the use of all features did not improve the classification accuracy. Regarding to the linear features derived from the all-linear DCM, we found the best accuracy was approximately 80% when using α+β +γ features.

Network -specific features
After identifying the optimal combination of frequency-specific features, we used a backward elimination procedure to select significant area-specific features. We determined 7 and 8 connections for β plus ɵ (Fig 4, left) and β plus γ (Fig 4, right) features, respectively, which could effectively differentiate between the patients with favorable and poor recovery. Three common β connections existed in both network models (marked with asterisks): 2 originating from SMA and 1 from cM1. Specifically, in the favorable recovery group, the SMA β connection exerted a higher negative effect on cPM but a lower inhibitory effect on cM1 while cM1 β activity facilitated SMA rhythms. In other words, we identified a greater involvement of the SMA and cM1, but low engagement of cPM, to be critical for treatment gains. However, none of the network-specific features identified reached significance in the ANOVA test (P > .05) at the group level to differentiate between the patients with favorable and poor recovery.

Construct validity and clinical variables with predictive value
We validated our model by using 16 new datasets of the patients with stroke. The overall accuracy, sensitivity, and positive predictive value were 81.3%, 90.9%, and 83.3%, respectively Neuromarkers for recovery (Table 3). Furthermore, we identified that subcortical lesions, time poststroke, and initial impairment scores measured using the WMFT were the most significant clinical variables affecting the classification accuracy in this model (Table 4). When the motor cortices were intact (i.e., subcortical lesions) and time poststroke was within 6 months, the classification rate was accurate (100% accuracy). The classification accuracy decreased with an increase in time poststroke, and it further decreased considerably when the time poststroke was >12 months (50% accuracy). In addition, the outcome was predictable in this model when the initial WMFT scores were above 45 (91% accuracy). Although age significantly differed between the training and validation datasets, it did not strongly affect the classification. The classification accuracies for the patients aged older than and younger than 55 years were 71% and 88%, respectively (Table 4). Furthermore, the predictive value of the network biomarkers was always superior to the prediction accuracy obtained by the dichotomous classification. Of the 14 control participants used to test the predictive model, 12 were predicted to have favorable outcomes and 2 to have poor outcomes.

Prior requirements for the application of the predictive model
In this study, we identified the linear and nonlinear motor network biomarkers using the prerehabilitation EEG data of a heterogeneous stroke population. By examining the clinical variables affecting the classification accuracy using independent data sets, we further recognized the patient populations for the predictive model.

Effect of time poststroke
The time poststroke in the training data set was significantly shorter in the patients with favorable outcomes (5.10 ± 4.63 mo) than in those with poor outcomes (8.22 ± 5.39 mo; P = .03), and this result supports the sensitive period for recovery ( 6 mo). In other words, stroke patients should receive the rehabilitation treatment as early as possible to enhance the efficacy. However, this result does not indicate that the predictive model can be applied only to patients close to their chronic stage, because the favorable group used in the construction of the predictive model had an average time poststroke of 5 months. The mean time poststroke was averaged across a broad range of time poststroke (1-15 mo) in this data set and the time poststroke of 10 of the 19 patients in the favorable group of the training data set was within 3 months (Table 1). In addition, because of the significant between-participant variability in neuroplasticity in response to rehabilitation, the sensitive period of "<6 months" is not a highly inferential factor. In the training data set, 9 of the 22 patients in the early-to-late subacute phase ( 6 mo) did not exhibit significant improvement after rehabilitation, whereas 5 of the 15 patients in the chronic phase demonstrated significant improvement (Table 1). Moreover, the factor of a time poststroke within 9 months became the best cutoff for the dichotomous classification. Therefore, by using an independent data set for validity testing, we demonstrate that patients having a time poststroke within 12 months are suitable for this predictive model (91% overall accuracy), with the model exhibiting excellent performance for patients with a time poststroke within 6 months (100% accuracy; Table 4). Notably, the performance of this predictive model was superior to the best accuracy of the dichotomous classification.
On the basis of our result that 85.71% healthy controls were classified as having favorable outcomes, we can infer that the predictive model of the motor network pattern resembles the Table 4. Clinical variables affecting the prediction.

Clinical variables EEG Prediction accuracy (n) Best dichotomic classification accuracy (n)
Lesion area Subcortical (n = 12) 100% (n = 12) Cortical (n = 4) 25% (n = 1) Neuromarkers for recovery normal motor network to a certain degree, but not completely. In addition, the classification accuracy decreased with an increase in the time poststroke, suggesting that recovery and/or maladaptation over time occurs through mechanisms that alter network patterns.

Effect of lesion site and initial WMFT scores
In addition to the time poststroke, we observed that the subcortical lesion and an initial WMFT score of >45 are also prerequisites for the application of this model. In the training data set, 23 of the 37 patients had a lesion at the subcortical level, and the validation data suggested that the best prediction could be obtained for subcortical lesion sites. Because electroencephalography measures the electrophysiological activity of cortical areas, any damage to the cortex would be expected to perturb the patterns and severely degrade the classification accuracy. By contrast, when the motor cortices are intact, the classification is accurate, as observed in this study. This result also indicates that the subcortical lesion consequently caused cortical network alternations, which are crucial in identifying patients who are responding to rehabilitation.
Only the initial WMFT scores significantly differed between the 2 datasets and 2 favorable subsets. The mean WMFT scores were significantly lower in the training data set than in the validation data set; in addition, the mean WMFT scores were significantly lower in the favorable group of the training data set than in that of the validation data set. It seems that a lower initial WMFT score appeared to predict better outcomes, as observed in the training data set. However, a similar trend was not observed in the validation data set because only a few patients (n = 5) were included in the poor outcome subset. Through dichotomous prediction, we observed that a WMFT score of <50 predicted better outcomes; however, the best accuracy was only 77%. By using the EEG predictive model, we identified a WMFT score of >45 to be inferential, with an accuracy of 91%. By contrast, this result did not support that a lower initial WMFT score could predict better outcomes as 8 of the 12 patients having a WMFT score of >45 were accurately predicted to have favorable outcomes by the EEG model.
In summary, patients who have subcortical lesions, an initial WMFT score of >45, and a time poststroke within 12 months are the most suitable candidates for this predictive model.

Benefits of favorable outcomes
Measuring rehabilitation outcomes is challenging because of the heterogeneity of stroke. Various measures are currently used in clinical and research practice, but no single measure is universally acceptable or sufficient for recording the minimal clinically important difference [45]. Therefore, in this study, a patient having an improvement of 6.6 on the FMA, 7.5 on the WMFT, or 16.2 on the TEMPA (i.e. 10% increase of the maximum score) was considered to have favorable outcomes. The improvement of 10% is the minimum effect that has been previously considered to be of clinical significance [42] and the FMA and WMFT are highly recommended for measuring the outcomes of patients with stroke [46]. The reported minimal clinically important difference values for chronic patients were 6 for the lowerextremity FMA [47] and 1-1.2 for the WMFT functional score [48]. Therefore, our definition of improvement is beyond the result of chance or increasing familiarity with the test observed in chronic patients. Nevertheless, the statistical results suggest that patients were adequately separated and the classification results are good, thus indicating that this definition is sufficient for our goal.

Functional significance of frequency-and network-specific biomarkers
In this study, we observed that the DCM derived network features can lead to the best prediction accuracy, thereby confirmed our hypothesis of combining linear and nonlinear (crossfrequency) network parameters to be favourable biomarkers. Furthermore, our result also suggested that the network parameters can add some predictive values to the accuracy as the frequency-specific network features outperformed the frequency-specific source power spectrum by 10%. Nevertheless, the frequency-specific source power spectrum along can provide up to 80% prediction accuracy, indicating the informative value for clinical management and prognostication of outcomes [14,49]. In addition, two possible predictive models (β+γ and β+θ) yielded relatively equal classification accuracy levels and we identified 3 consistent β connections in both network models. All the 3 connections located in the contralesional hemisphere and were connected to the SMA in the β band, highlighting the importance of the intact hemisphere and SMA in recovery. Specifically, our findings of higher SMA activity and lower inhibitory connection between the SMA and cM1 in the favorable recovery group are in agreement with previous studies that have reported that increased activity in the SMA may play a role in recovery [50] and that enhanced β connectivity between the SMA and cM1 is believed to subserve the control of the recovered hand [13]. Moreover, many studies using different methods and devices have confirmed a significant involvement of cM1 as compensatory mechanisms to recovery [51] [52,53,54]. Our finding, that the inhibition of cM1 activity was higher in the patients with poor recovery, may be one of the reasons responsible for poor recovery. Furthermore, a study reported that greater recruitment of cPM helps patients with more severe impairment to control the affected hand [55]. Our findings can explain the underlying mechanisms for these observations-the increase in cPM activity in patients with more severe impairment may be due to the insufficient inhibition from the SMA after stroke that altered the brain network. In summary, SMA β oscillations are the most prominent biomarkers for identifying target patients with stroke for rehabilitation. Regarding the translational values of these biomarkers, SMA activities could be the modulation target for rTMS.

Importance of nonlinear coupling
Nonlinear coupling in EEG/MEG signals has been seen in a variety of tasks, systems and pathological conditions, for instance, during face processing [56] or as an index of the depth of anaesthesia [57,58], though the nonlinearity may differed in the expressions of cross-frequency coupling. In the motor system, it has been shown that coupling between regions is nonlinear during the performance of a simple motor task [12,37,38]. This nonlinearity was thought to reflect the mechanisms at the level of post-synaptic receptors (e.g., NMDA receptors) that afferent inputs from other regions exert nonlinear effects on intrinsic dynamics [12]. In this study, the result of model comparison strongly points to nonlinear and linear connections (i.e. DCM1) (Fig 2b) for explaining the observed spectral dynamics after stroke. Furthermore, in support of our hypothesis, these nonlinear coupling features are informative as the prediction accuracy was improved when compared to the all-linear features (Fig 3b).

Spatial resolution of electroencephalography
Unprocessed scalp potentials derived from EEG systems involving <64 channels provide only a coarse spatial resolution [59]; therefore, there is a concern that the cortical activities of the M1 and PM may not have been appropriately separated in this study. We compared the mean spectral difference (msd) between the M1 and PM in both hemispheres by using all 37 patients' EEG data in the training data set. The statistical result suggested significant spectral differences between the iM1 and iPM (msd = −12.21%; t = −71.234, P < .0001) and between cM1 and cPM (msd = 26.49%;t = 154.5147, P < .0001). However, the mean differences of −12.21% and 26.49% also indicate a similarity of >70% in power. Because of the lack of a gold standard for comparison, additional studies using high-density EEG or magnetoencephalography may help derive such a standard. Nevertheless, it has been shown that 19-electrode array is adequate to detect the EEG based frequency-specific markers for predicting the clinical outcome after stroke [60].

Study considerations and limitations
First, all the patients underwent rehabilitation for 24 hours, and we considered that improvements in clinical scores represented the intervention outcome. However, 27 of the 53 patients having a time poststroke within 6 months may still experience spontaneous recovery and contribute to functional improvement; nevertheless, only 17 of the 27 patients in the early-to-late subacute phase but 13 of the 26 patients in the chronic phase exhibited significant improvement ( Table 1). The observed improvement cannot be completely explained by spontaneous recovery. This consideration instead reflects that intervention also plays a role in promoting recovery. Second, because we recruited a heterogeneous stroke population to construct the predictive model, the heterogeneity of patients with stroke rendered any in-depth interpretation of the observed neurophysiological difference between the favorable and poor outcome groups difficult. Thus, we discussed only the possible functional roles of 3 common connections in both predictive models, although some other connections in the predictive model may also be informative. Finally, the motor task used here may confound the results. Because the movements of the proximal part of the UL are partly controlled from the contralesional hemisphere [61], the finding that the intact hemisphere may be crucial in recovery is attributable to the uncrossed descending motor pathways. Additional studies using a set of homogeneous patients and a task that entails moving the distal part of the hand can provide insights into the underlying mechanisms.
In conclusion, poststroke linear and nonlinear features in the motor network are favorable biomarkers that can accurately predict treatment gains. The predictive value and the sensitivity of these biomarkers were 81.3% and 90.9%, respectively. Specifically, SMA beta oscillations were identified as the most crucial biomarker in the motor network. Our findings can help physicians in making clinical decisions, such as suggesting an alternative treatment along with rehabilitation at an early stage, and guide the design of rehabilitation strategies, consequently facilitating overall rehabilitation efficacy.