Motor cortical beta transients delay movement initiation and track errors

Motor cortical activity in the beta range (13-30 Hz) is a hallmark signature of healthy and pathological movement, but its behavioural relevance remains unclear. Recent work in primates and human sensory cortex suggests that sustained oscillatory beta activity observed on average, may arise from the summation of underlying short-lasting, high-amplitude bursts of activity. Classical human movement-related event-related beta desynchronisation (ERD) and synchronization (ERS) may thus provide insufficient, non-dynamic, summaries of underlying focal spatio-temporal burst activity, limiting insight into their functional role during healthy and pathological movement. Here we directly investigate this transient beta burst activity and its putative behavioural relevance for movement control, using high-precision magnetoencephalography (MEG). We quantified the subject-specific (n=8), trial-wise (n>12,000) dynamics of beta bursts, before and after movement. We show that beta activity on individual trials is dominated by high amplitude, short lasting bursts. While average beta changes generally manifest as bilaterally distributed activity (FWHM = 25mm), individual bursts are spatially more focal (FWHM = 6 mm), sporadic (1.3 −1.5/s), and transient (mean: 96 ms). Prior to movement (the period of the classical ERD), the timing of the last pre-movement burst predicts movement onset, suggesting a role in the specification of the goal of movement. After movement (the period of the classical ERS), the first beta burst is delayed by ~100ms after a response error occurs, intimating a role in error monitoring and evaluation. Movement-related beta activity is therefore dominated by a spatially dispersed summation of short lasting, sporadic and focal bursts. Movement-related beta bursts coordinate the retrieval and updating of movement goals in the pre- and post-movement periods, respectively.

beta activity increases before gradually returning to baseline, and this increase has been linked to 52 processes relating to updating of movement outcomes (Fine et al., 2017;Tan et al., 2014Tan et al., , 201653 Torrecillos et al., 2015). 54 In addition to healthy movement control, understanding the functional role of cortical beta activity 55 appears vital to unlocking understanding of the pathophysiology of diseases of the human motor system 56 discrimination, whereby non-detection is linked to higher burst rate and temporal coincidence with 74 sensory input (Shin et al., 2017). Trial-averaged beta activity may therefore reflect the underlying 75 cortical burst rate, which will itself stochastically determine the probability of the very largest bursts 76 occurring across trials at behaviourally relevant time points (Jones et al., 2009;Sherman et al., 2016;77 Shin et al., 2017). Burst occurrence may be linked to transient inhibitory activity directed to sensory 78 processing, an idea supported by biophysical modelling of inter-laminar neural dynamics (Shin et al., 79 2017). However, whether beta has an active processing and information transmission role which may 80 generalise to other brain systems (e.g. motor) is unknown ( Specifically, it would seem difficult to reconcile a role for beta in sustained facilitation of the current 86 motor state if bursts are sporadic, short-lasting events which do not reliably time-lock to behaviour and 87 dominate the changes in beta power both within and across trials. Here we address whether burst 88 characteristics such as rate, size and timing might be primary drivers of behaviour. 89 To investigate the role of cortical bursting activity in healthy subjects before and after movement, we 90 analyse trial-wise dynamic beta burst activity from motor cortex as obtained from a high precision MEG 91 dataset using subject specific head-casts and detailed cortical models (Bonaiuto et

99
We recorded high signal-to-noise (SNR) MEG, using individual head-casts (Bonaiuto et al., 2018; 100 Meyer et al., 2017;Troebinger et al., 2014bTroebinger et al., , 2014a, in 8 healthy subjects during a probabilistically 101 cued movement selection task ( Fig.1 A,B). This provided us with pre-and post-movement measures of 102 beta activity, with over 12,000 trials (per subject M=1620, SD=763.7) (Bonaiuto et al., 2017). Multiple 103 runs (1-4) were recorded per subject over several days and within each recording of 15 minutes, head 104 movements were only 0.23 ± 0.04, 0.25 ± 0.05 and 0.99 ± 0.54 mm in the x, y and z directions 105 respectively ( Fig.1 C), supporting high SNR individual datasets. Subject-specific head-casts improve within and between session co-registration accuracy and reduce 112 within-run head movements, facilitating the acquisition of large, high SNR, datasets. C Mean head 113 movements across all subjects (± SEM errorbars) are on average <1mm. 114 115 Source inversion and virtual electrode creation at the contralateral primary motor cortex (see methods) 116 disclosed a broadband (2-100 Hz) time series for each trial. Frequency specific changes in movement-117 related activity were first examined using a time varying spectrogram (averaged across all trials), shown 118 here for a representative subject (Fig 2A; top panel). The spectrogram illustrates the classical, average 119 (across trials), reduction in beta activity (13-30 Hz) prior to movement (ERD -blue), and, average 120 increase after movement (ERSred); also shown specifically for the beta band alone (Fig 2A; middle  121 panel). 122 This gradually modulating signal has been linked to mechanisms supporting the current motor state as 123 well as inhibitory processes relating to upcoming movement planning and preparation (Engel and Fries, 2010). Proposals about the functional relevance of movement-related beta activity generally rest on the 125 implicit assumption that individual trials show a similar pattern of activity to the average ERD and ERS 126 signal, and that the behaviourally relevant component is therefore the slowly modulating magnitude of 127 the beta signal. However, we next show that the coefficient of variation of the beta amplitude is 128 consistently higher than expected when compared to a phase-shuffled surrogate data, which is spectrally 129 identical to that of the representative subject (Fig 2A, lower panel; see methods). Subsequently, 130 individual trials from this same representative subject are plotted as spectrograms, filtered signal (with 131 amplitude enveloped superimposed), beta power (amplitude .^2), and individual selected beta burst time 132 series. This reveals the highly dynamic activity in the un-averaged data ( Fig 2B) which does not 133 obviously accord with the average and gradual changes in beta shown in the average spectrogram ( Fig  134   2A). Conventionally, beta amplitude envelopes have been averaged over trials to remove and smooth 135 out such dynamic fluctuations, with attention focused (and behavioural correlations made) on the slowly 136 changing, mean signals that results. We here asked whether such dynamic changes in beta activity 137 contributed to classical average signals, and whether these transient bursts had significant behavioural 138 relevance. We use "ERD" and "ERS" to denote average (across trials) changes in the magnitude of the 139 beta signal before and after movement, respectively. At the single trial (non-averaged) level we use 140 "amplitude" to refer to the time varying envelope of the beta signal (√power; Fig B second  In order to formally assess the relationship between the average beta ERD, ERS and transient burst 143 activity, we empirically determined in each individual and separately for the ERD and ERS period a 144 threshold for defining burst activity. These thresholds were quantified in terms of standard deviations 145 (SD) of the beta amplitude variability over time, above the median beta amplitude for the pre-(-3000ms 146  0ms) and post-(0  2000ms) movement periods. In order to define these thresholds, we used the 147 method of Shin et al. (Shin et al., 2017), and for each subject we correlated the trial-wise beta amplitude 148 with beta burst count across a range of different thresholds for the pre-and post-movement periods 149 separately. The burst definition threshold, across all subjects, that resulted in the highest correlation 150 between burst rate and mean (within trial) beta amplitude, was found to be 1.75 SDs above the median 151 for the pre-movement and post-movement periods. This threshold was then used on each subject (on 152 instruction cue and response locked datasets) so that their relative thresholds were matched (1.75 SDs 153 above beta amplitude median), although absolute thresholds could differ across subjects according to 154 their data and SNR. 155

Beta burst probability is modulated by movement 170
The temporal distribution of motor cortical beta bursts in relation to movement remains unexplored in 171 humans. To assess whether the beta ERD and ERS might contain significant beta burst activity, we 172 examined the relationship between the burst probability and the ERD and ERS in primary motor cortex 173 during the pre-and post-movement periods, respectively ( Fig 3A). 174 Pre-movement, the average beta burst rate was 1.3 ± 0.08 bursts per second (bps; -3s to 0s prior to 175 instruction cue). Normalised average ERD beta (across trials) in the same pre-movement period (% 176 relative to pre-movement mean) decreased from 16.3.3 ± 2.1 % (-3 s prior to the instruction cue 177 presentation) to -15.9 ± 3.2 (at the instruction cue; Fig 3C,  pre-movement period the beta burst probability across trials fell from 20.3 ± 0.7 % to 4.5 ± 1.0 % at the instruction cue (Fig 3C, top panel). In order to assess whether beta bursts made a significant contribution 182 to the shape of the average (classical) ERD signal, we then mathematically replaced the short periods 183 during each beta burst with the mean (pre-movement) beta amplitude. Therefore, if beta really were 184 sustained oscillations at the single trial level (i.e. bursts were noise, unrelated to movement) one would 185 expect that replacing sparse bursts would lead to only a small shift in all timepoints of the ERD 186 downwards but not greatly change the overall shape. Alternatively, if beta bursts appreciably contribute 187 to the ERD, we predicted that removing bursts would considerably flatten the shape of the ERD curve. 188 This was borne out by the data, in which bursts comprised only 11.9 ± 0.2 % of the total time course 189 (pre-movement) and yet their replacement with mean beta amplitude led to a significant reduction in 190 early ERD from 16.3.3 ± 2.1 % to -13 ± 1.6 % (-3 seconds prior to instruction cue, t7=-10.5; p<0.001). 191 There was a smaller reduction in the late pre-movement period from -15.9 ± 3.2 % to -24.2 ± 2.3 % 192 (onset of instruction cue, t7=-5.4; p=0.001) with therefore an overall effect of flattening of the ERD 193

curve. 194
Post-movement, the average beta burst rate was 1.51 ± 0.08 bps and the average beta ERS signal 195 increased from -25.8 ± 3.9 (button press) to 21.2 ± 4.9% at the peak (568 ms). Over the same period, 196 the beta burst probability increased from 2.3 ± 1.0 to 22.9 ± 2.5 %. Therefore, even at the peak of the 197 ERS, bursts were infrequent across trials at that exact moment (< 1 in 4 trials). Furthermore, during the 198 ERS period, rather than aligning to movement, bursts were temporally delayed and widely distributed 199 across the post-movement period with mean onset of the maximum of the first burst at 681 ± 34 ms post 200 button press with an average 404 ± 12.5 ms within subject standard deviation (ie variability across trials 201 for each individual subject). We then repeated our burst (mean) replacement analysis in the post-202 movement period with the prediction this would specifically attenuate the classical ERS (rather than 203 lower all time points across the time series, our alternative hypothesis if bursts were noise, unconnected 204 to behaviour). Here, replacement of the bursts, comprising just 12.8 ± 0.32 % of the total time course, 205 significantly attenuated the classical ERS, reducing it from a maximum of 21.2 ± 4.9 % above the mean 206 (568 ms) to -0.6 ± 1.8 % following burst replacement. Importantly, replacing periods of higher beta 207 amplitude with the (lower) mean beta amplitude will inevitably reduce the results (modified) ERD and 208 ERS across trials. However, what is notable about replacing the beta bursts was that it was the shape of 209 the ERD (gradient reduced) and ERS (peak flattened) that changed as opposed to a general (time 210 independent) shift across the whole period. Furthermore, the complete negation of the ERS peak despite 211 removal of just a small amount (12.8%) of the total data in the form of short lasting, infrequent (less 212 than 1 burst in 4 trials at peak) bursts (threshold of 1.75 sd above the median captures ~ top 12% of 213 data), strongly suggests that the cortical ERD and ERS are dominated by burst probability rather than a 214 slowly changing drift in amplitude. method for empirically determining the threshold (correlating burst rate with beta amplitude across 218 trials) we here correlate the time resolved burst probability ( Fig 3A, black line) against the ERD and 219 ERS over time (Fig 3C, blue line). This quantifies, over the timecourse of a trial, the relationship 220 between the shape of classical ERD and ERS and the shape of beta burst probability during the trial 221 ( Fig 3B), separately for the pre-movement and post-movement periods and can be considered as 222 correlating the blue and black curves ( Fig 3A) against each other. At each time-point, burst probability 223 and the ERD and ERS at that same point (across all trials) were closely related (mean R 2 of 0.93 ± 0.02 224 in the pre-movement period and a mean R 2 0.93 ± 0.03 in the post-movement period; Fig 3B). This 225 indicates that average beta changes (as performed in the majority of conventional analyses) is dominated 226 by burst probability across trials (Fig. 3B). In order to directly visualise the relationship between burst 227 timing and behaviour, we plotted the timing of beta bursts for all subjects and all trials separately for 228 the pre-and post-movement periods as a raster plot ( Fig 3C). This shows a striking consistency in the 229 distribution of bursts across subjects, with minimal bursts around movement onset, and then a 230 temporally diffuse increase in burst probability after movement. 231 232 233 Figure. 3. Burst probability is modulated by movement and accounts for the classical ERD and 235 ERS. A Normalised (% versus pre-movement mean) beta amplitude ± SEM (blue) is shown in the pre-236 movement period with burst probability (%) overlaid above (black). Beta ERD with bursts replaced by 237 the mean beta amplitude is shown overlaid (red) demonstrating that replacement of low numbers of 238 sporadic (comprising just 11.9% of total pre-movement time) bursts markedly flattens the ERD (top 239 panel). Below we show the same analysis for the post-movement period. Note that the burst probability 240 accurately tracks the classical change in the beta amplitude and also that replacement of short periods 241 of intermittent bursting (12.8% of total time series) by mean power removes the classical ERS peak and 242 attenuates the late ERS slope. Example of a single subject / trial (inset; 0 -1000ms post-movement) 243 beta amplitude (blue) shown with single burst replaced with the mean (red) B Burst probability (across 244 all trials) for each subjects is correlated against the normalised, time resolved ERD and ERS 245 demonstrating a strong and consistent relationship between them. C Raster plot showing timing of each 246 individual burst (single dash = peak of burst) for all 8 subjects (>12,000 trials) using the empirically 247 defined (1.75 sd > median) threshold (individual subjects divided by dashed red lines). This highlights 248 a notably consistent relationship of burst timing to movement onset with a gradual reduction in burst 249 probability in the time building up to movement and then a significant increase (weakly temporally 250 locked) after movement. 251 252

Beta bursts are larger than matched surrogate data 253
We next examined whether the variability of the beta signal could be differentiated from a spectrally 254 matched, surrogate (phase-shuffled) dataset across all subjects. We found that the coefficient of 255 variation of the beta amplitude was increased in all subjects in the real data compared to the surrogate 256 for the pre-(t7=21.3; p<0.0001; Fig 4A) and post-movement periods (t7=26.4; p<0.0001; Fig 4A)  as well as duration and previous evidence in primates suggests that the distribution of cortical beta 260 amplitude is skewed towards high amplitude bursts when compared to surrogate data in primates 261 (Feingold et al., 2015). The distribution of peak beta burst amplitudes in our dataset, across all time 262 points, was therefore compared with the peak beta burst distribution from the same surrogate (phase 263 shuffled) dataset. The comparison demonstrated that there were a greater number of larger bursts in the 264 original data compared to the surrogate dataset and this was significant for all burst sizes above the 30 th 265 percentile (pFDR<0.05, Fig. 4B). Burst duration was also longer than in the surrogate dataset, but in a 266 restricted range (300-400 ms, pFDR<0.05). In fact, most bursts were of short duration with an average 267 burst duration of 96.2 ±8.8 ms and 94.2 ± 10.4 ms in the pre-and post-movement periods, respectively 268 (t7= 0.7,p=0.49). Burst characteristics were significantly larger in terms of height (t7=2.3; p=0.049), but 269 showed no difference in length (t7=-0.7; p=0.49) w the pre-and post-movement period. 270 There was a tendency of M1 beta oscillations to produce more, higher powered and longer, bursts than 282 would be expected from spectrally matched surrogate data (Fig. 4). Therefore, this difference in size 283 and length, in addition to their strong modulation by behaviour ( Fig. 3)  burst across all subjects for real (blue) versus phase shuffled, surrogate (x1000 shuffles), data (red), 292 shown separately for the pre-(left) and post-(right) movement periods. C Sensor level scalp maps 293 showing sensor level activity (normalised within subjects and averaged over all bursts / subjects) 294 aligned to three periods of maximum extrema (indicated by vertical dashed lines). 295 Thus far we have formally examined the properties (height and duration) of the burst amplitude (Fig 4  296 A,B). We next proceeded to examine individual bursts in the time series domain to investigate their 297 consistency with regards to shape in the pre-and post-movement periods and within and across subjects 298 (unfiltered) together from the pre-and post-movements periods separately, aligned to the maximum of 300 the burst amplitude peak (± 100ms). We then averaged across these bursts to derive the mean burst 301 shape ( Fig 5A; average burst for each subject shown). In order to quantify the homogeneity of bursts, 302 we then cross-correlated each individual burst with the mean burst and extracted the maximum cross 303 correlation (from the centre point ± 25 ms; see methods) and averaged this across all bursts. This 304 demonstrated a mean maximum cross-correlation of 0.39 ± 0.03 and 0.44 ± 0.03 for the ERD and ERS 305 periods respectively (Fig 5B).
The subject specific mean burst shapes (real data and surrogate) were then taken forwards for analysis 307 of between subject effects (normalised by the maximum extrema). These highlighted a relatively 308 conserved mean burst shape across the pre-and post-movement periods, albeit with larger bursts post-309 movement, across subjects ( Fig 5B), a finding which was not found in the surrogate data. This suggests 310 that although there is a stereotyped average beta burst shape across subjects, there is a significant degree 311 of variability across bursts within subjects. Finally, in order to determine if the source reconstructed 312 beta signal corresponded to a single dipole, we analysed the sensor level data using the same epochs as 313 source level data at a single sensor (MLT14) overlying the left sensorimotor cortex. This demonstrates 314 a single dipole at the maxima of the beta burst (4ms) which is highly stereotyped in both the pre-and 315 post-movement bursts ( Fig 5C). 316 317

Spatial distribution of beta bursts and conventional ERD and ERS 318
Having demonstrated that beta bursts are transient, we proceeded to examine the distribution and 319 consistency of beta bursts over space. Previously, movement related average beta changes have been 320 we considered whether the spatially diffuse signal may be partially a result of averaging of bursts. We 323 therefore analysed the spatial topography of the ERD and ERS movement-related beta change (dB 324 normalised to -3000  -2500 ms pre-movement, shown in classical top panel spectrograms) during the 325 pre-and post-movement periods for each subject in source space. During the pre-movement (ERD) 326 period ( Figure 6C: left panels), the reduction in average beta power was spatially distributed and 327 bilateral in the majority of subjects, echoing previous reports (Cheyne, 2013;Doyle et al., 2005). This 328 spatial pattern was conserved in the post-movement (ERS) period although there was some notable 329 heterogeneity across subjects (Fig 6C; right panels). 330 The spatial topography of the ERD and ERS was then directly compared with the spatial topography of 331 the average beta burst. Source level beta burst topography (dB; normalised to beginning (100 ms pre-332 peak)) was averaged across all pre-and post-movement bursts separately to generate a topographical 333 map of the spatial distribution analogous to the ERD and ERS maps ( Figure 6B; right panels). This 334 demonstrated a similar topography of bursts between pre -and post-movement epochs, although post-335 movement bursts contained slightly greater power. Direct comparison of the spatial topography of the 336 beta burst and the average ERD and ERS revealed that the topography of the bursts was overall more 337 focal than that found for the ERD / ERS spectral changes. We formally quantified this by examining 338 the reduction in normalised power as a function of distance from the primary motor cortex, for the ERS 339 peak and for individual bursts (separately for pre-and post-movement bursts). While the peak height of the centre of the primary motor cortex was significantly steeper for bursts compared to the ERS period 342 beta activity (Fig 5C). For the pre-and post-movement bursts, the Full Width Half Maximum (FWHM) 343 was, on average, 7.25 ± 0.8 mm and 6.75 ± 0.75 mm, respectively. By contrast, for the conventional 344 trial averaged peak ERS, the FWHM was significantly greater (more diffuse) at 25 ± 2.4 mm (t7=7.4, 345 p<0.001 and t7=7.8, p<0.001). Therefore, we here show that beta bursts are, spatially more focal than 346 conventional trial averaged ERS activity. 347 348 Figure 6. Topography of classical average beta ERD and ERS signals is more spatially diffuse 350 than beta bursts. A

Beta burst timing predicts behaviour 366
Having shown that conventional, pre-movement beta ERD and post-movement ERS are dominated by 367 punctate, non-rhythmic, spatially focal, high amplitude burst events, we proceeded to examine the 368 putative functional relevance of these beta burst events in the cued movement selection task. All 369 subjects were able to complete the task and mean response time was 277.5 ± 29.6 ms with a mean error 370 rate of 12.6 ± 4.7 % (Fig 7B). 371 It has previously been demonstrated that pre-movement beta activity relates to response time (Doyle et  we observed that the ERD and ERS are tightly coupled to beta burst rate, we predicted that beta burst 374 rate in the pre-and post-movement periods would relate to response time and error monitoring, 375 respectively. 376 We therefore divided trials according to response time (median split; slow versus fast trials), and 377 compared the z-scored mean (averaged over the time period -2500  0 ms pre-instruction cue and 378 averaged over fast or slow trials separately) beta amplitude and burst rate in the pre-movement period 379 in these two groups of trials for all subjects. Pre-movement mean beta amplitude distinguished between 380 fast and slow trials (t7=3.2, p=0.015, pFDR=0.015), with faster trials displaying, on average, lower beta 381 amplitude ( Fig 7A). We then examined beta burst rate over the same period with a matched analysis 382 that showed that beta burst rate was also lower on trials with faster compared to slower response times 383 (t7=4.5, p=0.003, pFDR=0.006). 384 We next turned to the post-movement period (0-2000 ms) to investigate the relationship between beta 385 amplitude, burst activity and the commission of errors. We therefore divided trials into correct versus 386 incorrect trials and compared z-scored mean (averaged over the time period 0  2000 ms post 387 movement and averaged over correct and incorrect trials separately) beta amplitude and burst characteristics in these two groups of trials for all subjects. Trials with errors were weakly related to 389 post-movement mean beta amplitude (t7=2.1, p=0.07, pFDR=0.07) and burst rate (t7=2.7, p=0.03, 390 pFDR=0.06), whereby errors were associated with lower average beta amplitude and burst rates, although 391 this did not survive FDR multiple comparison correction. 392 In order to check whether the pre-and post-movement beta amplitude and burst rate were specific to 393 response time and error respectively we performed secondary (reversed) analyses. We asked whether 394 pre-movement beta related to decision error or post-movement beta related to response time but found 395 no significant effects (p>0.1 for all correlations, uncorrected). By way of further control analyses, we 396 also investigated other beta burst features and found no relationship between pre-movement beta burst We predicted that beta burst timing prior to movement would relate to response time, and beta burst 409 timing after movement would be associated with the commission of errors. To test this, we first 410 commenced with a conservative analysis and examined mean differences in burst timing across trials 411 with respect to behaviour. For this, we again split trials according to fast versus slow trials for the pre-412 movement period, and correct versus incorrect trials for the post-movement period. 413 The mean timing of the final beta burst before movement onset was earlier on fast compared to slow 414 trials in 6 out of 8 subjects (mean: 42 ± 22 ms; Fig 7B left panels). We then examined the time course 415 of this difference by plotting the mean burst probability on fast versus slow trials across all subjects and 416 found a critical window between -1.5 s and -0.9 s where the occurrence of bursts during this window 417 predicted slower response times (p<0.05, FDR corrected, Fig 7C upper panels). Burst timing prior to 418 movement onset was not related to movement errors (p>0.05, Fig 7C lower panels). 419 Post-movement, the timing of the first post-movement beta burst occurred later on error trials compared 420 with correct trials in all 8 subjects (mean: 105 ± 17 ms; Fig 7B right panels). When contrasting burst 421 probability at each time point for correct versus incorrect trials, we observed two periods of lower burst 0.046, Fig 7D, upper panel), the latter of which survived FDR correction. Post-movement burst timing 424 did not correlate with previous response time (p>0.05, Fig 7D lower panels). 425 In order to further examine the effect on behaviour of burst timing, on a trial-by-trial basis and also to 426 compare this with beta amplitude and beta burst rate, we performed a regression analysis with linear 427 mixed effects on the z-scored trial-wise beta amplitude, burst rate and burst timing (see methods). For 428 each trial, we determined the timing of the last beta burst prior to the instruction cue, and the first beta 429 burst post-movement for regression against response time and error commission, respectively. 430 In the pre-movement period we used a linear mixed effects model including trial-wise mean beta 431 amplitude, burst rate and the timing of the final burst in the movement preparation period (-2500 ms 432  0 ms) prior to the instruction cue and their effect on response time. We found that both beta amplitude 433 (χ 2 (1)=5.88, β=0.0031, p=0.015) and beta burst timing prior to instruction cue (χ 2 (1)=6.6, β=0.0022, 434 p=0.010) significantly related to response time, but no such relationship was found for beta burst rate demonstrates that pre-movement, the timing of the final burst is later in slow trials in 6 out of 8 subjects 462 and post-movement the timing of the first burst is earlier in 8 out 8 subjects on correct trials. Burst 463 probability plots show the window over which this burst timing operates and demonstrates a higher rate 464 of bursts (ie higher likelihood of a burst in this window on any given trial) in slow versus faster trials 465 between 1500  500 ms (red asterisks, FDR corrected). Conversely, post-movement, burst probability 466 plots shows a significant (FDR corrected, red asterisk) difference between burst probability in correct 467 (blue) versus incorrect (red) trials across all subjects (~1500ms, lower panels). The difference at the 468 first peak (600 ms) did not survive FDR-correction.

Average beta amplitude analyses conceal burst timing dynamics 481
It is increasingly becoming apparent that beta amplitude and particularly average beta changes in the mandates detailed decomposition of the average changes of rhythmic activity into the underlying 485 dynamics at a single trial level. By leveraging high-precision MEG, we here show that activity at the 486 single trial level is highly dynamic, and is dominated by bursts of beta activity that dictate the majority 487 of changes in synchronization and de-synchronization classically seen (on average) before and after 488 movement, respectively. This is despite the fact that beta bursts comprise a relatively small fraction of 489 total time in a given trial (pre-movement 11.9 %; post-movement 12.7 %), with a mean burst duration 490 of ~100ms, and burst rate of ~1.5 bps, even during the sustained average amplitude increases generally 491 seen post-movement (burst < 1 in 4 trials at ERS peak). Importantly however, this burst activity is 492 modulated by the motor response suggesting a behaviourally relevant role that is not well captured in 493 the average beta amplitude alone. Specifically, the timing of the beta burst before and after movement 494 predicts behaviour, and notably, in the post-movement period, this factored out the effect of beta 495 amplitude. In the pre-movement period, we found that bursts near to the instruction cue were followed 496 by slower response times. By contrast, in the post-movement period, beta bursts occurred earlier when 497 a correct response was made. This timing relationship with behaviour is concealed in conventional 498 analyses of trial averaged beta amplitude. 499

Beta burst timing in movement planning and performance monitoring 500
Our results provide a novel link between cortical beta burst activity in the primary motor cortex to 501 endogenous movement planning and performance in humans. This extends recent work on the short 502 lasting temporal dynamics of beta in the human sensory cortex, basal ganglia and non-human motor

Beta bursts suggest active processing 513
Our results here suggest that, rather than beta activity sub-serving a generalised merely inhibitory role, 514 it has a specific, information encoding role that can rapidly bias behaviour on a trial-by-trial basis 515 despite the events being short lasting. Such a view sides with recent proposals that link beta activity to 516 more complex information coding, such as response uncertainty (Grent- fractionally larger post-movement). This similarity is despite greatly differing behavioural demands 537 during these periods. Beta bursts prior to movement could then reflect the orchestration of incoming 538 information for the formation of action goals. Once a goal is specified, no further rehearsal is required 539 and bursts effectively stop as the moment of movement approaches. 540 By contrast, in the ERS period following movement, the updating of action goals based on the outcome 541 of a movement is required for learning, and a prominent role for beta in this process is consistent with the idea that beta activity tracks a history of inputs and errors (Gould et al., 2011(Gould et al., , 2012Tan et al., 2016;543 Tzagarakis et al., 2010). One could also speculate that the delay seen in the beta burst following errors 544 allows for more information gathering and sensori-motor integration. Others have suggested that beta 545 bursts may be simply represent inhibitory processing, even if transient and local (Sherman et  2017) or can also support long range communication (Fries, 2015) more generally remains to be 561

determined. 562
Here we specifically focused our analysis on beta burst rate and timing, and demonstrate that these can 563 account for the changes seen in the ERD and ERS both in terms of the neural signatures (slow changes 564 in beta amplitude are, in the main, the summation of bursts) and behaviour (burst timing is strongly 565 related to motor behaviour and for errors this relationship is stronger than that found with conventional 566 beta amplitude, even at the single trial level). Notably, if beta bursts do represent active processing, then 567 further information will likely be present in the time series shape that is lost in conventional spectral 568 analyses (which effectively enforce a frequency specific sinusoid), and which is becoming accessible

Spatial Topography of Beta Bursts revealed by high resolution MEG 571
With regards to a putative functional role, it is also worth noting that the spatial extent of classical beta 572 changes (ERD and ERS) is often topographically distributed and bilateral. Interestingly, using a 573 matched, parallel analysis on beta bursts, these appear more focally consistent than previously 574 appreciated. Our topographical maps are of the average (classical) ERD and ERS and average bursts. also contain important and behaviourally relevant information that may traverse the cortex over time. 578

Methods for dealing with this level of spatio-temporal complexity such as Independent Component 579
Analysis and Hidden Markov Modelling are now validated and will in the future enable the community 580 to address whether the time varying spatial topography of beta burst (Baker et al., 2014;Lee et al., 581 2003), or their propagation over the cortex (Rubino et al., 2006) also contribute to behaviour in the same 582 way as burst timing does, as shown here. 583 584

Conclusion 585
Beta activity in the primary motor cortex in humans is primarily characterised by brief, high amplitude, 586 sporadic bursts of activity. These bursts are spatially more focal than previously shown and through 587 summation account for the majority of classical, trial-averaged, ERD and ERS signals. Moreover, beta 588 burst timing is strongly related to motor behaviour with pre-movement beta bursts being associated with 589 delayed movement initiation and early post-movement bursts to correct performance. These results 590 challenge classical interpretations of beta activity that propose a global inhibitory role and suggest a 591 specific informational carrying role of beta burst activity during action preparation and decision 592 monitoring. 593 594 595

Methods 596
Eight healthy subjects participated, following informed written consent which was approved by the 597 UCL Research Ethics Committee (reference number 5833/001). All subjects were right handed, had 598 normal or corrected-to-normal vision, and had no history of neurological or psychiatric disease (6 male, 599 aged 28.5 ± 8.52 years). 600

Visually cued movement selection task 601
Participants performed a visually cued movement selection task, in which they were had to press a 602 corresponding (left or right) button as quickly and as accurately as possible after a visual instruction 603 cue (or up to a maximum of 1s in the absence of a response) (Figure 1B). The instruction stimuli were 604 preceded by a preparation cue that in 70% of trials predicted the direction of the upcoming imperative 605 stimulus, thus allowing participants to prepare their response in advance of the subsequent instruction 606

cue. 607
At trial onset, participants fixated on a white cross (0.5° × 0.5°) in the middle of the screen. Following 608 a short delay (uniform random distribution between 1 and 2s), a random dot kinetogram (RDK) 609 appeared with coherent motion to either the left or the right that indicated the likely direction of the 610 upcoming instruction cue (70% of the time). The RDK comprised a 10°×10° square aperture, in the 611 centre of the screen with 100, 0.3° diameter dots, all traversing at 4°/s. There were three levels of 612 coherent motion, which defined the percentage of dots that moved coherently, with the rest of the dots 613 moving in random directions. The middle level of coherence was set individually for each subject using 614 40 perceptual decision-making trials at the beginning of the experiment, with an adaptive staircase 615 design to achieve 82% accuracy (QUEST; Watson and Pelli, 1983). The lower and higher levels of 616 coherence were then set at 50% and 150% respectively and coherence levels were balanced across trials. 617 The RDK lasted 2s and was followed by a 500 ms delay and then by an instruction cue consisting of a 618 3° × 1° arrow pointing to the left or the right ( Figure 1A). Responses were indicated via a right-hand 619 index (left instruction cue) or right-hand middle finger (right instruction cue) on a button box. The 620 paradigm was implemented using the Cogent 2000 toolbox through MATLAB (The MathWorks, Inc., 621 Natick, MA. http://www.vislab.ucl.ac.uk/cogent.php). 622

Magnetic resonance imaging 623
Subjects underwent both a standard MRI scan for individualised head cast creation and a high 624 resolution, quantitative, multi-parameter map scan (Lutti et al., 2014), which was used for accurate 625 cortical surface extraction and source reconstruction. The first anatomical sequence used a 12-channel 626 head coil and was radiofrequency (RF) and gradient spoiled T1 weighted, 3D fast low angle shot 627 (FLASH) sequence with an image resolution of 1mm 3 , field-of-view: 256,256 and 192 mm along the Germany). Excitation flip angle was set to 12° to ensure sufficient signal-to-noise ratio and repetition 630 time was set to 7.96 ms. MPM protocol images consisted of 3 spoiled multi-echo 3D FLASH 631 acquisitions (800 µm isotropic resolution, proton density, T1 or MT-weighting) using a 32 channel 632 headcoil with two additional sequences used for calibration and correction of inhomogeneities in the 633 radio-frequency transmit field. To achieve an MT-weighting, a Gaussian RF pulse 2 kHz of resonance

Head-cast fabrication 643
In order to record beta activity with a maximal level of spatial precision we used subject specific head-644 casts which have been shown to optimise co-registration, reduce head-movements and thereby separately for the x, y and z directions (Fig 1C). Subjects performed 1 -4 recording sessions, with each 667 session split into 3 separate runs. Participants completed 3 blocks per session (with short breaks between 668 them), and 1-4 successful sessions on different days. This led to a total number of 1614 ± 763 (mean ± 669 sd) completed trials per subject taken forward for analysis. Data from individual runs within sessions 670 were concatenated and these datasets from each session analysed individually and results averaged 671 across session for each subject. The data was filtered with a 5th order (Butterworth) bandpass filter (2-672 100 Hz, Notch filter: 50Hz) and downsampled to 250 Hz. Eye blink artefacts were removed using 673 multiple source eye correction (Berg and Scherg, 1994) and trials with variance > 2.5 sd away from the 674 mean were excluded. Data was epoched from 3.5 s before to 1.5 s after the instruction cue (total 5 s) for 675 the analysis of pre-movement neural activity (cue-locked) and to 2 s before to 2 s after the button press 676 for post-movement analysis (response locked). subject with reference to the hand knob (Dechent and Frahm, 2003;Yousry et al., 1997). 694 In order to obtain the beta amplitude trace, the time series virtual electrode data was filtered using a 4 th 697 order (two pass) Butterworth filter with a frequency range of 13 -30 Hz. A complex (analytic) function 698 was then extracted from the time series using the Hilbert transform and the amplitude derived by taking 699 the modulus of that signal: 700 Where S(t) is the amplitude signal, u(t) is the filtered data and H represents the Hilbert transform. For 702 time resolved spectrograms (Fig 2A and Fig 6A/B)

Identification of beta bursts through empirically defined thresholds 707
In order to identify beta bursts, we first needed to select an appropriate threshold. Previous studies have 708 generally selected heuristic thresholds relative to the mean beta amplitude and shown robustness of 709 findings across a range of threshold levels (Feingold et al., 2015;Tinkhauser et al., 2017bTinkhauser et al., , 2017a. This 710 works well for individual studies but limits the scope for comparison across different brain sites and 711 recording methods as well as theoretically subsampling a smaller distribution of relevant bursts (if the 712 threshold is set too high), or alternatively, including smaller, non-burst, activity such as noise which 713 may dilute behavioural effects. Recently an empirical method has been introduced whereby burst 714 frequency is correlated with total beta power in the trial across a range of thresholds and the maxima in 715 the correlation used to define the threshold for defining bursts (Shin et al., 2017). Here we used a similar 716 technique to that previously described but defined the threshold in terms of standard deviations away 717 from the median signal (as opposed to multiples of the median) and so as not to transform the underlying 718 signal in any way performed this on the beta amplitude, rather than the beta power (amplitude.^2). 719 Therefore, for each subject we correlated the trial-wise mean beta amplitude with the number of burst 720 events in each trial (defined according to an amplitude threshold), separately for pre-and post-721 movement periods. Each subject performed 540 trials per recording session. Following automated 722 rejection, this may leave, for example, 535 valid trials. In such a case therefore, we would correlate the 723 535 mean amplitudes (across 13 -30 Hz) in the pre-movement period against the 535 burst counts in 724 the same pre-movement period (repeated separately for the post-movement period). This procedure 725 resulted in one correlation value for each subject for the pre-movement period and one correlation value 726 for the post-movement period, per burst definition threshold. This was then repeated across a range of 727 thresholds (defined in terms of SDs above the median beta amplitude) to generate two curves for each 728 subject, which showed the relationship between amplitude thresholds used to define the burst and the Curves were then averaged across all subjects and the peak taken at the group level (Fig S1), as the 731 definition of the amplitude thresholds for defining bursts. This revealed a peak correlation between 732 average beta power and burst amplitude at 1.75 standard deviations above the median, which was 733 consistent for both the pre-movement, cue-locked period, and the post-movement, button-locked period 734 ( Fig 2C). The threshold of 1.75 SDs above the median was then used for each individual subject, defined 735 according to their own dataset, so that they had statistically matched thresholds, although the absolute 736 threshold levels could differ to take account of varying SNR across subjects. Notably, this threshold 737 was robust to MEG inversion method (Fig S2). 738 Comparing our empirically derived threshold with previously used burst definition thresholds, we find 739 that this is higher than has been used in subcortical recordings (Tinkhauser et al., 2017b(Tinkhauser et al., , 2017a, is 740 comparable to findings in the motor cortex in primates (Feingold et al., 2015), but lower than shown 741 previously in the human sensory cortex, (Shin et al., 2017). The fact that this threshold level found here 742 is lower than that of other MEG data may partly related to the use of amplitude correlations rather than 743 power correlations (Shin et al., 2017). Indeed, using power (as opposed to amplitude) increases the 744 empirically derived threshold, but also leads to flatter correlation curves in our dataset (Fig S1). Burst probability (at each time point across trials) was calculated by binary coding all trial time points 757 as to whether the beta envelope was either above or below threshold 2 on each individual trial and this 758 matrix (trials x time points) averaged over trials and scaled by total trial number to give a time resolved 759 probability of bursts per subject. Time resolved ERD and ERS were calculated by averaging beta 760 amplitude across trials and this average value was then normalised as a % change to baseline in order 761 to facilitate comparison across all subjects (Fig 3A). With the purpose of quantifying the impact of 762 removal of bursts on the conventional ERD and ERS measures, bursts were identified (above threshold replaced) were then averaged across trials to demonstrate the effect of burst removal on the ERD and 766 ERS separately (Fig 3A, red trace = ERD and ERS with bursts replaced). The ERD and ERS (over time) 767 was then correlated against the burst probability curve (over time) to derive an R 2 value of correlation 768 between the two and plotted as a scatter plot (with a line of best fit) for the pre-and post-movement 769 periods separately. This is a measure of how the shape of the time resolved burst probability (Fig 3A;  770 black trace) matches that of ERD and ERD (Fig 3A; red trace) during the pre-and post-movement 771 periods respectively. Burst amplitude peak timings were also plotted in the form of a raster plot across 772 all trials for all subjects to show the change in burst timing with regard to behaviour (Fig 3C). 773 774

Beta burst characteristics 775
We first analysed beta amplitude (all beta amplitude, not restricted to bursts) variability and quantified 776 this by the coefficient of variation (CV = std beta amplitude / mean beta amplitude). This was performed 777 for each trial for the pre-and post-movement periods separately and plotted as a histogram of CV values 778 (across trials) for a single subject (Fig 2A) and for all subjects together (Fig 4A). This was then 779 compared to a spectrally matched surrogate dataset that retained all of the spectral features of the 780 original data except, through phase shuffling (prior to filtering), the time series was randomised and 781 therefore redistributed the (same) beta power across time, taking a highly nonstationary signal and 782 making it more stationary (function: spm_phase_shuffle; SPM12) (Feingold et al., 2015). Having 783 established that the beta amplitude is more variable than would be expected from a spectrally matched 784 surrogate dataset, we proceeded to examine the beta bursts specifically. Burst (as defined by beta 785 amplitude periods above our empirically defined, 1.75 sd > median, threshold) characteristics were 786 quantified by height and duration. Height was taken as the maximum of the peak of the burst in the 787 amplitude domain. Duration was defined as the time (ms) that the amplitude remained above threshold 788 2. These features were again compared to surrogate data using the phase shuffled dataset as described. 789 The differences (real datasurrogate data) in the burst characteristics between the two different 790 populations (same threshold used for burst definition in both datasets) were plotted for each subject on 791 a 3D histogram. We then tested the hypothesis that beta bursts were larger than would be expected by 792 comparing the distributions of burst heights and burst lengths for 10 different heights (percentiles) and 793 lengths (100 ms bins up to 1000 ms) against our surrogate, spectrally matched distributions (mean of 794 x1000 iterations) using paired t tests across subjects (FDR corrected). This allowed for ruling out the 795 possibility that beta burst activity was not just noise superimposed on top of a slowly changing signal, 796 which would predict a similar distribution of CV and beta bursts in a surrogate dataset. 797 To analyse the shape and consistency of the bursts we took our data and re-aligned it to the centre of 798 the peak of the amplitude. Unfiltered source level data was re-epoched around the amplitude peaks with across all bursts to look for consistency of the waveform (in the time domain) within the amplitude 801 aligned bursts and this was repeated with surrogate data (Fig 5A). Individual bursts were then cross-802 correlated (r) against the average burst and the maximum r value (max(r)) within 25 ms (half a cycle), 803 before or after the midpoint (to allow for some a small amount of jitter = ~1 beta cycle), taken as a 804 measure of the concordance of the individual burst with the average, repeated for the pre-movement 805 and post-movement bursts. The average waveform for each subject was then normalised by the 806 maximum (absolute) value for that subject and these normalised subject specific burstsaveraged 807 across all subjects with the time series +/-orientation displayed with the minimum closest to time zero 808 as per previous studies (Sherman et al., 2016;Shin et al., 2017). All statistical analyses were performed 809 on the beta amplitude envelope and were therefore invariant to inherent uncertainty in dipole orientation 810 causes by source inversion uncertainty and sulcal variation across subjects. Analyses were repeated in 811 the theta and gamma ranges as functional controls and to check that the behavioural relationships found 812 with beta bursts were specific (Fig S3). 813 Finally, the sensor level field map was examined by averaging the (subject mean normalised) magnetic 814 field for all bursts and subjects for a representative sensor overlying the left motor area (MLT14) at the 815 three moments of the average maximum dipole in the amplitude aligned dataset and plotted as an 816 average scalp map which confirmed the presence of a single dipole (spm_eeg_plotScalpData) that was 817 similar for pre-and post-movement periods. 818 819

Analysis of the burst spatial distribution 820
We examined the distribution of activity on the cortical mesh in source space using both a classical 821 (averaged across trials) relative change (ie ERD and ERS) and a matched analysis for average burst 822 activity changes at the peak. For spatial topography, we used the conventional, baseline corrected, (for 823 comparison across subjects) analysis consisting of averaging the power change of the period of interest 824 (e.g. ERD minimum, ERS maximum, burst maximum) relative to baseline for all different vertices 825 across the mesh: 826

Power change (dB) = 10log10(Power(poi)/Power(baseline) 827
The baseline was defined as the 200ms prior to the onset of the trial, the ERD as the final 200 ms prior 828 to button press and the ERS as 200 ms either side of the time point 600 ms post button press. The power 829 in the bursts was normalised to the start of the bursts (-100 ms from the peak) using the same formula 830 (dB). The power was then shown projected onto the mesh with colour range limits between -5 dB and 831 + 5 dB. For the calculation of the FWHM, the power at each vertex was binned according to distance 832 from the centre of the hand knob area (in 5mm bins up to 50 mm) of the primary motor cortex and the also for each individual burst at the peak (power drop off averaged over all bursts). The average power 835 change by distance for all bursts (pre-and post-movement) as well as the ERS power change by distance 836 were then normalised within subjects by dividing all of them by the maximum power (first bin) of the 837 average post-movement bursts (which was the largest). The FWHM was then calculated within 838 individual subjects as distance away from the centre of the M1 hand area for which the power had 839 dropped by half. 840 841

Behavioural analyses 842
In order to analyse the differential contribution of the beta amplitude and the burst characteristics with 843 regards to behaviour these features were calculated separately in the pre-and post-movement periods. 844 Responses that matched the direction of the instruction cue were classed as correct and the response 845 time (RT) was taken as the time difference between instruction cue and the button press. Beta amplitude, 846 burst rate, mean burst height and mean burst duration were calculated in the pre-movement period (cue 847 locked dataset; start of RDK -2500 ms  0 ms instruction cue) and post-movement period (button 848 locked dataset; 0 ms  2000 ms) and z-scored across all trials. Z-scored trials were then split according 849 to response time (slow versus fast) and decision error (correct versus incorrect) and compared across 850 all subjects by paired t-testing across subjects for both pre-and post-movement datasets respectively. 851 Burst timing was analysed in a similar fashion with the last pre-instruction cue burst timing taken 852 forwards for the pre-movement dataset and also the timing of the first burst after button press taken 853 forwards for the post-movement dataset. Timing resolved burst probabilities across all trials were 854 downsampled to a sampling frequency of 20 and then similarly grouped according to behavioural output 855 (response time and error) with differences compared between trials grouped by behaviour for each time 856 point by paired t-testing (and FDR correction). 857 Finally, in order to examine the effect of beta amplitude, burst rate and burst timing on a individual 858 trial-by-trial basis and directly compare their impact on behaviour we proceeded to take these features 859 forwards into a linear, mixed effects model (R statistics, version 3.4.3). Firstly, for RT (dependent 860 variable) we used a linear mixed model with congruence, coherence and their interaction as fixed effects 861 and beta amplitude, beta burst rate and beta burst timing plus a subject-specific intercept as random 862 effects. Fixed effects for this model were estimated using type III Wald F tests with Kenward-Rogers 863 approximated degrees of freedom (Kenward and Roger, 1997). For the post-movement analysis and 864 beta amplitude, burst rate and burst timing together we used a generalized linear mixed model with a 865 logit link function, using correct (true or false) in each trial as the dependent variable, congruence