Fig 1.
Replication of topics from Poldrack et al. [31].
The figure shows the results of applying the generic LDA model [23] to the Neurosynth database, as described in Poldrack et al. [31]. (A) Selected topics reported in Poldrack et al. [31] using an older Neurosynth database of 5,809 studies. (B) Closest-matching topics obtained by applying the same approach to the current, expanded Neurosynth database (11,406 studies).
Fig 2.
Schematic overview of the GC-LDA model.
Each document (an article in the Neurosynth corpus) is represented as a mixture of learned latent topics, where each topic is associated with both a three-dimensional Gaussian spatial distribution and a set of linguistic terms extracted from the abstract text.
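To make the schematic concrete, the following is a minimal generative sketch in the spirit of GC-LDA, not the authors' implementation. It assumes one Gaussian per topic (the full model can use several subregions per topic), and it assumes, in the style of correspondence LDA, that each word's topic is drawn in proportion to the document's peak-topic counts plus a hypothetical smoothing constant `gamma`. The function and parameter names are illustrative; see the full graphical model (Fig 9) for the exact specification.

```python
import numpy as np


def sample_document(rng, alpha, phi, means, covs, gamma, n_words, n_peaks):
    """Sample one synthetic document from a simplified GC-LDA-style model.

    alpha : (T,) Dirichlet prior over topics
    phi   : (T, V) per-topic word distributions (rows sum to 1)
    means : (T, 3) per-topic Gaussian means (one Gaussian per topic here)
    covs  : (T, 3, 3) per-topic covariance matrices
    gamma : smoothing added to peak-topic counts (assumed, hypothetical)
    """
    n_topics, vocab_size = phi.shape
    # Document-level mixture over topics.
    theta = rng.dirichlet(alpha)
    # Each activation peak picks a topic, then a 3-D coordinate from it.
    peak_topics = rng.choice(n_topics, size=n_peaks, p=theta)
    peaks = np.array([rng.multivariate_normal(means[t], covs[t]) for t in peak_topics])
    # Word topics follow the document's peak-topic counts (plus smoothing),
    # which ties the linguistic terms to the spatial activations.
    counts = np.bincount(peak_topics, minlength=n_topics) + gamma
    word_topics = rng.choice(n_topics, size=n_words, p=counts / counts.sum())
    words = np.array([rng.choice(vocab_size, p=phi[t]) for t in word_topics])
    return theta, peak_topics, peaks, word_topics, words


rng = np.random.default_rng(0)
T, V = 3, 5
phi = rng.dirichlet(np.ones(V), size=T)
means = rng.normal(size=(T, 3)) * 30.0
covs = np.stack([np.eye(3) * 25.0] * T)
theta, zp, peaks, zw, words = sample_document(
    rng, np.ones(T), phi, means, covs, gamma=0.01, n_words=20, n_peaks=10
)
```

Fitting the model reverses this process, inferring topic assignments for observed peaks and words rather than sampling them.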
Fig 3.
Selected topics learned by the GC-LDA model (for full results, see S1 Fig).
(a) Spatial distributions for 90 of the 200 topics. Each color represents a different topic. Top row: hard assignments of activations to topics; each point represents a single activation from a single study in the Neurosynth database (note that each topic is spatially represented by a mixture of only two symmetrically constrained Gaussians; multiple regions that appear to share a color reflect the unavoidable reuse of perceptually similar colors). Bottom row: estimated multivariate Gaussian mixture distribution of each topic. (b) Top semantic associates (word clouds) and activation distributions (orthogonal brain slices) for selected topics. The size of a term in each word cloud is proportional to the strength of its loading on the corresponding topic.
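The "two symmetrically constrained Gaussians" per topic can be illustrated with a small sketch. Here we assume the constraint means the second Gaussian is the mirror image of the first across the midsagittal plane (x to -x in MNI-style coordinates), with its covariance reflected accordingly and equal mixing weights by default; the paper's exact parameterization, including how subregion weights are set, may differ. `REFLECT_X`, `gaussian_pdf`, and `symmetric_topic_density` are hypothetical names.

```python
import numpy as np

# Reflection across the midsagittal plane (x -> -x); assumed convention.
REFLECT_X = np.diag([-1.0, 1.0, 1.0])


def gaussian_pdf(points, mean, cov):
    """Multivariate normal density evaluated at each row of `points`."""
    d = points.shape[1]
    diff = points - mean
    inv = np.linalg.inv(cov)
    # Squared Mahalanobis distance for every point.
    mahal = np.einsum("ij,jk,ik->i", diff, inv, diff)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * mahal) / norm


def symmetric_topic_density(points, mean, cov, w_first=0.5):
    """Density of a topic modeled as two mirror-image Gaussians.

    The second component's mean and covariance are the reflection of the
    first across x = 0, so only one mean and covariance are free parameters.
    """
    mirrored_mean = REFLECT_X @ mean
    mirrored_cov = REFLECT_X @ cov @ REFLECT_X
    return (w_first * gaussian_pdf(points, mean, cov)
            + (1.0 - w_first) * gaussian_pdf(points, mirrored_mean, mirrored_cov))


mean = np.array([-40.0, -20.0, 50.0])
cov = np.array([[50.0, 10.0, 0.0], [10.0, 60.0, 5.0], [0.0, 5.0, 40.0]])
p = np.array([[-38.0, -22.0, 48.0]])
```

With equal weights the density is identical at a point and at its mirror image, which is what "symmetric" buys: one set of spatial parameters describes a bilateral pattern.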
Fig 4.
Activation profiles and top-loading words for spatially overlapping topics in parietal cortex.
Top row: hard assignments of activations to topics; each point represents a single activation from a single study in the Neurosynth database. Bottom row: estimated multivariate Gaussian mixture distribution of each topic.
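One simple way to turn soft topic densities into the "hard assignments" shown in the top row is to give each activation to the topic with the highest posterior density. This is a hypothetical stand-in: it ignores per-document topic proportions, and we do not claim the figure was produced this way (the assignments may instead come from the model's sampled topic labels). `hard_assign` and its `weights` argument are illustrative names.

```python
import numpy as np


def hard_assign(points, means, covs, weights=None):
    """Assign each activation (row of `points`) to its most probable topic.

    means : (T, 3) topic means; covs : (T, 3, 3) topic covariances;
    weights : optional (T,) topic prior, uniform if omitted (assumption).
    """
    n_topics = len(means)
    d = points.shape[1]
    if weights is None:
        weights = np.full(n_topics, 1.0 / n_topics)
    log_post = np.empty((len(points), n_topics))
    for t in range(n_topics):
        diff = points - means[t]
        inv = np.linalg.inv(covs[t])
        mahal = np.einsum("ij,jk,ik->i", diff, inv, diff)
        _, logdet = np.linalg.slogdet(covs[t])
        # Unnormalized log posterior: log prior + Gaussian log density.
        log_post[:, t] = np.log(weights[t]) - 0.5 * (mahal + logdet + d * np.log(2.0 * np.pi))
    return np.argmax(log_post, axis=1)


means = np.array([[-40.0, 0.0, 0.0], [40.0, 0.0, 0.0]])
covs = np.stack([np.eye(3) * 25.0] * 2)
labels = hard_assign(np.array([[-38.0, 1.0, 0.0], [41.0, -2.0, 3.0]]), means, covs)
```

Working in log space avoids underflow when densities are tiny, which is common for points far from a topic's center.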
Fig 5.
Activation profiles and top-loading words for emotion-related topics.
Top row: hard assignments of activations to topics; each point represents a single activation from a single study in the Neurosynth database. Bottom row: estimated multivariate Gaussian mixture distribution of each topic.
Fig 6.
Data-driven estimation of hemispheric lateralization of cognitive function.
Top: histogram and kernel density estimation plot of the lateralization coefficient for all topics. Values below 0.5 represent left-lateralization; values above 0.5 represent right-lateralization. Bottom: selected topics that displayed notable hemispheric lateralization.
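A lateralization coefficient on the 0 to 1 scale described above could be computed as the fraction of a topic's activations lying in the right hemisphere, taking positive x as right (the MNI convention). This definition is an assumption for illustration; the paper may instead integrate each topic's Gaussian mass over each hemisphere. The small kernel density estimate mirrors the top panel, with a hypothetical fixed `bandwidth`.

```python
import numpy as np


def lateralization_coefficient(x_coords):
    """Fraction of a topic's activations with x > 0 (right hemisphere).

    Values below 0.5 indicate left lateralization, above 0.5 right.
    Points exactly on the midline are dropped (an assumption).
    """
    x = np.asarray(x_coords, dtype=float)
    x = x[x != 0]
    if x.size == 0:
        return 0.5  # no lateral information: treat as bilateral
    return float(np.mean(x > 0))


def gaussian_kde_1d(samples, grid, bandwidth=0.05):
    """Gaussian kernel density estimate of `samples`, evaluated on `grid`."""
    samples = np.asarray(samples, dtype=float)[:, None]
    z = (grid[None, :] - samples) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=0)


coef = lateralization_coefficient([-10.0, -20.0, 30.0, -5.0])
grid = np.linspace(-1.0, 2.0, 3001)
density = gaussian_kde_1d([0.2, 0.5, 0.8], grid)
```

Here `coef` is 0.25 (one of four activations is on the right), so this hypothetical topic would count as left-lateralized.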
Fig 7.
Examples of generative text-to-image and image-to-text mapping using the trained GC-LDA model.
(A) Generation of predicted whole-brain images from arbitrary text. (B) Topic-based decoding of discrete activation coordinates. (C) Topic-based decoding of continuous whole-brain images; examples are selected from the cognitive components reported in Yeo et al. [33], the BrainMap ICA components reported in Smith et al. [15], and the language and emotion contrasts from the n = 500 release of the HCP dataset. Note that the scale of the values in (B) and (C) depends on the input image and should not be interpreted in absolute terms.
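The two mapping directions can be sketched with a topic-word matrix `phi` (topics by terms) and a matrix of topic spatial maps `topic_maps` (topics by voxels). This is a simplified illustration, not the paper's exact procedure: text-to-image assumes a uniform topic prior so that p(topic | word) is proportional to `phi[:, word]`, and image-to-text scores topics by an unnormalized dot product with the image, which is one reason decoded values scale with the input image. Function names and shapes are assumptions.

```python
import numpy as np


def text_to_image(word_ids, phi, topic_maps):
    """Predict a whole-brain map from a bag of word indices.

    phi : (T, V) topic-word probabilities; topic_maps : (T, n_voxels).
    """
    cols = phi[:, word_ids]
    # p(topic | word) under a uniform topic prior, averaged over input words.
    p_t_given_w = cols / cols.sum(axis=0, keepdims=True)
    topic_weights = p_t_given_w.mean(axis=1)
    return topic_weights @ topic_maps


def decode_image(image, phi, topic_maps):
    """Score every vocabulary term for a continuous whole-brain image."""
    # Unnormalized topic scores: their scale depends on the input image.
    topic_weights = topic_maps @ image
    return topic_weights @ phi


phi = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
topic_maps = np.array([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
predicted = text_to_image([0], phi, topic_maps)
term_scores = decode_image(np.array([1.0, 1.0, 0.0, 0.0]), phi, topic_maps)
```

Doubling the input image doubles every term score without changing their ranking, which is why only relative values are meaningful.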
Fig 8.
Effects of different topic priors on decoding results.
The top 10 terms produced by the decoder are displayed for two different HCP contrasts (Gain > Loss from the Gambling task and Language > Math from the Language task) and three different sets of topic priors (left: uniform prior; middle: priors seeded with a list of reward-related terms; right: priors seeded with the Neurosynth “reward” meta-analysis map). For the non-uniform priors, results are displayed for priors of differing strengths (weak = 0.1, strong = 0.25). Line plots above the decoder outputs illustrate the prior distribution of topics used in each analysis (for the sake of visual clarity, topics are ordered by increasing weight separately in each case).
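The effect of a topic prior can be sketched as follows. We assume, purely for illustration, that "strength" is a mixing weight that blends a uniform prior with a seed distribution (so 0.1 and 0.25 would mean 10% and 25% seed), and that the prior multiplies image-derived topic scores before they are projected onto terms. The seed might come from topic loadings on reward-related terms or from a topic decomposition of a meta-analysis map; the paper's actual prior construction may differ. All names here are hypothetical.

```python
import numpy as np


def seeded_prior(seed_weights, strength):
    """Blend a uniform topic prior with a normalized seed distribution."""
    seed = np.asarray(seed_weights, dtype=float)
    seed = seed / seed.sum()
    uniform = np.full_like(seed, 1.0 / seed.size)
    return (1.0 - strength) * uniform + strength * seed


def seed_from_terms(phi, term_ids):
    """Seed weights: each topic's total loading on the seed terms."""
    return phi[:, term_ids].sum(axis=1)


def decode_with_prior(image, phi, topic_maps, prior):
    """Decode an image after reweighting topic scores by a prior."""
    topic_weights = (topic_maps @ image) * prior
    return topic_weights @ phi


phi = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
topic_maps = np.array([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
prior = seeded_prior(seed_from_terms(phi, [2]), strength=0.25)
scores = decode_with_prior(np.ones(4), phi, topic_maps, prior)
```

Because the prior is a convex blend, it always sums to 1, and a strength of 0 recovers the uniform prior exactly.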
Fig 9.
Graphical model of the full GC-LDA model.