Searching through functional space reveals distributed visual, auditory, and semantic coding in the human brain

The extent to which brain functions are localized or distributed is a foundational question in neuroscience. In the human brain, common fMRI methods such as cluster correction, atlas parcellation, and anatomical searchlight are biased by design toward finding localized representations. Here we introduce the functional searchlight approach as an alternative to anatomical searchlight analysis, the most commonly used exploratory multivariate fMRI technique. Functional searchlight removes any anatomical bias by grouping voxels based only on functional similarity and ignoring anatomical proximity. We report evidence that visual and auditory features from deep neural networks and semantic features from a natural language processing model, as well as object representations, are more widely distributed across the brain than previously acknowledged and that functional searchlight can improve model-based similarity and decoding accuracy. This approach provides a new way to evaluate and constrain computational models with brain activity and pushes our understanding of human brain function further along the spectrum from strict modularity toward distributed representation.

We thank the reviewer for their prior and current feedback, which has greatly strengthened our work.
-A few minor questions: how do the functional searchlights look across the two studies (Forrest and Sherlock)? Do they reveal similar distributed patterns across studies? This is related to my original concern of whether these are revealing some general distributed networks comparable across experiments.

This is an interesting question worth addressing in the manuscript. The movies differ in style, characters, dynamics, emotions, etc., so we did not expect them to drive brain activity such that the features discovered by SRM would be identical or highly similar across datasets. For example, an SRM feature (or set of features) could correspond to individual actors, speaker accents, or locations present in the Sherlock episode but not in Forrest Gump. If so, then voxels that organize along such dimensions in functional space would not be consistent across movies. Nonetheless, we did expect that some features might be present in both movies and thus that there would be some similarity between their functional spaces in the aggregate. For example, such features could respond more generically to faces, language, or buildings.
To test this possibility, we conducted a new analysis comparing the pairwise distances between voxels in the functional spaces derived from the Sherlock and StudyForrest datasets. For each dataset, the vector mappings of voxels into the features of the shared response were used to calculate the Euclidean distance between all pairs of voxels. The similarity of functional spaces between datasets was then calculated as the correlation of these distances across voxel pairs. Because the number of such pairs is astronomically large, we used a Monte Carlo approach: we calculated the correlation of distances in a random sample of voxel pairs and repeated this sampling several times to estimate the overall correlation. We tested statistical significance by comparing this value to a null distribution of correlations generated by randomization. The pairwise distances had a modest but highly significant positive correlation across datasets, suggesting that the functional spaces learned from the movies were at least somewhat similar.
Manuscript change: "Additionally, we tested whether the functional spaces generated from the Sherlock movie data and the StudyForrest movie data were similar. We found that the pairwise distances between voxels in these spaces had a modest but highly significant positive correlation (Fig S8). This suggests that SRM can capture a general organization of voxels common across audiovisual movie viewing conditions, but also that individual movie content may partly govern the learned functional spaces." p. 11-12

Added Figure S8: To determine the similarity of the functional spaces constructed by SRM for the Sherlock and StudyForrest datasets, we extracted the Euclidean distances between all voxel pairs in functional space and then correlated these distances between datasets. The overall correlation was estimated by randomly sampling the same 10,000 voxel pairs from the two datasets, calculating the correlation of distances in the sample, repeating 1,000 times with new random samples, and then averaging across iterations. There was a positive correlation overall (r = 0.198). To test statistical significance, we performed a non-parametric randomization test by repeating this process 1,000 times while randomly permuting the voxel distances of the Sherlock dataset each time to populate a null distribution. The true correlation of the datasets was highly significant (p < 0.001). One of the samples used to estimate the overall correlation is visualized in the plot, with each dot depicting one voxel pair and the axes corresponding to their Euclidean distance in the two functional spaces. Color coding by anatomical distance of each voxel pair indicates how dramatically SRM reshaped the brain based on function (i.e., voxels that are anatomically distant can be functionally close, and vice versa). a.u. indicates arbitrary units.
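The Monte Carlo estimate and randomization test described above can be sketched as follows. This is a minimal illustration, not the analysis code used for the paper: `func_a` and `func_b` stand for the (n_voxels, k) arrays mapping each voxel into the k SRM features of each dataset, and all function names are illustrative.

```python
import numpy as np

def sample_pair_distance_correlation(func_a, func_b, n_pairs=10_000,
                                     n_iters=1_000, rng=None):
    """Estimate the correlation of pairwise voxel distances between two
    functional spaces by averaging over random samples of voxel pairs."""
    rng = np.random.default_rng(rng)
    n_voxels = func_a.shape[0]
    corrs = []
    for _ in range(n_iters):
        # Draw the same random voxel pairs from both datasets.
        i = rng.integers(0, n_voxels, n_pairs)
        j = rng.integers(0, n_voxels, n_pairs)
        # Euclidean distance of each pair in each functional space.
        d_a = np.linalg.norm(func_a[i] - func_a[j], axis=1)
        d_b = np.linalg.norm(func_b[i] - func_b[j], axis=1)
        corrs.append(np.corrcoef(d_a, d_b)[0, 1])
    return float(np.mean(corrs))

def permutation_p_value(func_a, func_b, observed, n_perms=1_000, rng=None):
    """Non-parametric p-value: permute voxel identities in one dataset to
    break the correspondence, recompute the statistic, and compare."""
    rng = np.random.default_rng(rng)
    null = np.array([
        sample_pair_distance_correlation(
            func_a[rng.permutation(len(func_a))], func_b,
            n_pairs=10_000, n_iters=1, rng=rng)
        for _ in range(n_perms)
    ])
    return float((np.sum(null >= observed) + 1) / (n_perms + 1))
```

With identical inputs the statistic is 1 by construction; with unrelated inputs it hovers near 0, which is the contrast the randomization test exploits.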
-Results section still doesn't contain numbers but just qualifications of them. You can specify the average increase in correlations and accuracies in the text to give the reader an idea of effect sizes. There is no way to parse it from the text now. I also suggest using actual correlation and classification accuracy values as the main figure (S5 now) rather than the percent changes, which will be a more direct report of results.
We appreciate the reviewer's interest in presenting the data more directly. As suggested, we added a new figure (Figure 3) based on the old Figure S5. Because this figure reports all neural network layers while the main percent-change figure depicts only representative layers, we also added to the new figure the percent change for all layers from the old Figure S2. Additionally, we report the requested numerical results in the main text.
Added Figure 3: (A) Average performance in the top 1% of functional and anatomical searchlights for neural network similarity, annotation vector decoding, and localizer category decoding. White bars show functional searchlight performance; black bars show anatomical searchlight performance. Error bars represent standard error across subjects. Chance (red lines) was computed in the neural network similarity and annotation vector decoding analyses as the mean of a null distribution estimated non-parametrically by rolling data in time, and in localizer category decoding as the theoretical chance level (1/6 categories). (B) Percent improvement of functional over anatomical searchlight in the top 1% of searchlights for neural network similarity (all layers), annotation vector decoding, and localizer category decoding. To calculate percent improvement, we first subtracted the chance level from the performance of each searchlight type. Error bars represent 95% CIs.
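The percent-improvement calculation in the caption (chance-corrected performance of functional relative to anatomical searchlight) can be written compactly. This is a hedged sketch of our reading of the caption, with illustrative variable names rather than the paper's actual code:

```python
import numpy as np

def percent_improvement(func_perf, anat_perf, chance):
    """Percent improvement of functional over anatomical searchlight,
    after subtracting the chance level from both performances."""
    func_above = np.asarray(func_perf) - chance
    anat_above = np.asarray(anat_perf) - chance
    return 100.0 * (func_above - anat_above) / anat_above
```

For example, with six-way localizer decoding (chance = 1/6), functional accuracy of 0.50 versus anatomical accuracy of 0.40 yields an improvement of about 43%, larger than the raw 25% ratio because both values are first corrected for chance.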