Fig 1.
Model architectures based on neural networks with gated recurrent units (GRUs).
(A) Classifier. At each time step, the time series data, x_t, served as input. The recurrent neural network transformed the input into a latent representation, h_t, which then determined the output class scores. The unit with the highest activation determined the model’s prediction of the input stimulus at each time step. (B) Dimensionality reduction. The encoder shares the GRU component shown in A. GRU outputs, h_t, were first linearly projected to a lower-dimensional space using a fully connected layer (DR-FC). Classification was then performed based on the low-dimensional representation. (C) Input signal reconstruction. A separate GRU was trained independently to reconstruct the original brain signals from the low-dimensional signals.
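The classifier in panel A can be illustrated with a minimal NumPy sketch of a forward pass: a GRU updates its latent state from the input at each time step, a linear readout produces class scores, and the highest-scoring unit gives the prediction. This is not the authors' implementation. The gate equations follow one common GRU formulation (conventions differ between references), and the layer sizes, parameter names, and random initialization are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hidden, n_classes, seed=0):
    """Random, untrained parameters (hypothetical sizes and scale)."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in "zrh":  # update gate, reset gate, candidate state
        p["W" + g] = rng.normal(0.0, 0.1, (n_hidden, n_in))
        p["U" + g] = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        p["b" + g] = np.zeros(n_hidden)
    p["Wo"] = rng.normal(0.0, 0.1, (n_classes, n_hidden))  # readout layer
    p["bo"] = np.zeros(n_classes)
    return p

def gru_step(x_t, h_prev, p):
    """One GRU update: gates decide how much of the old state to keep."""
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev + p["bz"])  # update gate
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev + p["br"])  # reset gate
    h_cand = np.tanh(p["Wh"] @ x_t + p["Uh"] @ (r * h_prev) + p["bh"])
    return (1.0 - z) * h_prev + z * h_cand

def classify_sequence(x, p):
    """Return one predicted class per time step for x of shape (T, n_in)."""
    h = np.zeros(p["Uz"].shape[0])
    preds = []
    for x_t in x:
        h = gru_step(x_t, h, p)
        scores = p["Wo"] @ h + p["bo"]        # output class scores
        preds.append(int(np.argmax(scores)))  # unit with highest activation
    return preds
```

With untrained weights the predictions are arbitrary; the sketch only shows how a prediction is obtained at every time step rather than once per sequence.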
Fig 2.
Prediction of movie clip (15-way classification).
(A) Average clip prediction accuracy using the neural network architecture with gated recurrent units (GRUs) as a function of time (dark yellow). Accuracy increased sharply during the first 60 seconds and stabilized around 90 seconds. Results using multinomial logistic regression (Log-Reg), a feed-forward architecture (FF; 1 layer, 103 units), and temporal convolutional networks (TCN; kernel widths of 5 and 40) are also shown (see part B for labels). Error bars show the 95% confidence interval of the mean across test participants. (B) Summary of accuracy results after the 90-second transient period (see dashed line in part A).
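As an illustration of the error bars, the following sketch computes a 95% confidence interval of the mean across participants. It assumes a normal approximation (mean plus or minus about 1.96 standard errors); the caption does not state how the intervals were computed, and the function name and sample values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_ci(values, level=0.95):
    """Normal-approximation confidence interval for the mean of `values`."""
    m = mean(values)
    sem = stdev(values) / math.sqrt(len(values))   # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2.0)    # about 1.96 for level=0.95
    return m, m - z * sem, m + z * sem

# Hypothetical per-participant accuracies at one time point
m, lo, hi = mean_ci([0.8, 0.9, 1.0])
```

For small participant counts, a t-distribution quantile would give a somewhat wider interval than this approximation.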
Fig 3.
(A) Trajectories for all clips. Solid line: mean trajectory averaged across participants (line thickness is scaled by variance, which was highest at the end of the clip). All trajectories progressed away from the center (see sample arrows). The inset provides clip abbreviations. (B) Euclidean distance between trajectories. The Euclidean distance between the clip trajectory while watching Home Alone and the mean trajectory across participants for a second clip was computed. The thicker line corresponds to the distance of participants’ Home Alone trajectories to the mean of this clip. The same results are shown for all clips in S1 Fig. (C) Clip prediction accuracy and fraction of variance captured after reconstruction using low-dimensional models. Error bars correspond to the standard error of the mean across participants.
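The distance measure in panel B can be sketched as follows: average the low-dimensional trajectories across participants to obtain a mean trajectory for a clip, then take the Euclidean distance between one participant's trajectory and that mean at each time step. The data layout (a list of per-time-step coordinate tuples) and the function names are assumptions made for illustration.

```python
import math

def mean_trajectory(trajectories):
    """Average trajectories across participants.

    trajectories: list (participants) of lists (time steps) of coordinate tuples.
    """
    n = len(trajectories)
    n_steps = len(trajectories[0])
    n_dims = len(trajectories[0][0])
    return [
        tuple(sum(tr[t][d] for tr in trajectories) / n for d in range(n_dims))
        for t in range(n_steps)
    ]

def distance_over_time(trajectory, reference):
    """Euclidean distance between two trajectories at each time step."""
    return [math.dist(a, b) for a, b in zip(trajectory, reference)]

# Two hypothetical participants, two time steps, three latent dimensions
trajs = [[(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (3, 0, 0)]]
ref = mean_trajectory(trajs)
dists = distance_over_time(trajs[0], ref)
```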
Fig 4.
Determining brain contributions to classification.
(A) Saliency map for the “Star Wars” clip. For illustration, the top 30 regions are shown. Color scale in arbitrary units. See also the video for dynamics. (B) Saliency values for the first 60 seconds of two clips. The gray bands correspond to the 95th percentile of null saliency values generated via permutation testing. Scale of the y-axis is arbitrary.
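A generic sketch of how a 95th-percentile null band can be obtained by permutation testing is given below. The caption does not specify what was permuted or how saliency was recomputed, so the sketch assumes label shuffling and takes the statistic as a user-supplied function; `null_threshold`, `percentile`, and the toy `mean_diff` statistic are hypothetical names.

```python
import random

def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def null_threshold(statistic, data, labels, n_perm=1000, q=95, seed=0):
    """q-th percentile of the statistic under shuffled labels (assumed scheme)."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)                  # break the data-label link
        null.append(statistic(data, shuffled))
    return percentile(null, q)

def mean_diff(data, labels):
    """Toy statistic: difference of group means (stands in for saliency)."""
    a = [x for x, y in zip(data, labels) if y == 1]
    b = [x for x, y in zip(data, labels) if y == 0]
    return sum(a) / len(a) - sum(b) / len(b)
```

Observed values above the returned threshold would fall outside the null band.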
Fig 5.
(A) The impact of a lesion in each of the seven standard networks is shown. The orange line shows accuracy without any lesion. (B) The impact of lesions to specific subnetworks was also evaluated, revealing greater contributions to clip classification by specific sets of brain regions. (C) Comparison of removal of regions from a given network (visual, somato-motor, or default mode) relative to a random set of regions from that same network. Removing regions in descending order of overall saliency (see text) impacted classification more than selecting regions without considering saliency (without replacement). For illustration, we also display the regions of the most impactful subnetworks observed in part B (red dots; also shown in the insets). The gray region shows the 95% confidence interval when the same number of regions were excluded at random.
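The lesion analysis can be sketched as a comparison of classification accuracy before and after removing a set of regions from the input. The sketch below assumes that a lesion is implemented by setting the selected regions' time series to zero, which the caption does not confirm; `predict` stands for any trained classifier, and all names are hypothetical.

```python
def lesion(x, region_idx):
    """Copy of x (list of time-step rows) with the chosen regions zeroed."""
    drop = set(region_idx)
    return [[0.0 if j in drop else v for j, v in enumerate(row)] for row in x]

def accuracy(predict, samples, labels):
    """Fraction of samples for which predict(sample) matches the label."""
    correct = sum(predict(x) == y for x, y in zip(samples, labels))
    return correct / len(labels)

def lesion_impact(predict, samples, labels, region_idx):
    """Drop in accuracy caused by lesioning the given regions."""
    base = accuracy(predict, samples, labels)
    lesioned = accuracy(predict, [lesion(x, region_idx) for x in samples], labels)
    return base - lesioned

# Toy classifier that relies only on region 0 (for illustration)
def toy_predict(x):
    return 1 if sum(row[0] for row in x) > 0 else 0

samples = [[[1.0, 0.5], [1.0, 0.5]], [[-1.0, 0.5], [-1.0, 0.5]]]
labels = [1, 0]
```

Comparing `lesion_impact` for a chosen set of regions against the same quantity for equally sized random sets mirrors the comparison described in part C.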
Fig 6.
(A) Prediction of fluid intelligence scores as a function of time. Prediction (blue) fluctuates considerably but consistently exceeds chance values (indicated by the tick marks at the bottom). Values obtained by connectome-based predictive modeling (CPM) are indicated for comparison (red: CPM applied to Star Wars data; maroon: highest value applying CPM across all clips). The gray region indicates the 95th percentile region based on permutation testing (N.B.: applies to our method only, not CPM). (B) Prediction of verbal IQ as a function of time. Only short periods of the Star Wars clip exceeded chance levels. The green bar indicates a segment of the time series that is significant at the 0.05 level, corrected for multiple comparisons.
Fig 7.
Voxelwise prediction of experimental conditions: Threat vs. safe.
Only voxels from the anterior insula were employed. Saliency values at t = 6 are shown; for illustration no thresholding was applied.
Fig 8.
Selection of model architectures via cross validation on data from 100 participants.
Val: validation portion of training data. Error bars show 95% confidence intervals across folds.
Table 1.
Summary of models implemented in the paper.
Hyperparameters and other parameters: LR: learning rate, which controls the size of each gradient descent update; Dr: dropout for the GRU layer [53]; L2: L2 regularization coefficient for the GRU layer; BS: batch size, the number of training samples processed before updating model parameters; Ep: epochs, the number of passes through the training dataset.