Representational structure or task structure? Bias in neural representational similarity analysis and a Bayesian method for reducing bias

doi:10.1371/journal.pcbi.1006299

Fig 1.

Standard RSA introduces bias structure to the similarity matrix.

(A) A cognitive task including 16 different experimental conditions. Transitions between conditions follow a Markov process. Arrows indicate possible transitions, each with p = 0.5. The task conditions can be grouped into 3 categories (color coded) according to their characteristic transition structure. (B) Standard RSA of activity patterns corresponding to each condition estimated from a brain region reveals a highly structured similarity matrix (left) that reflects aspects of the transition structure in the task. Converting the similarity matrix C to a distance matrix 1-C and projecting it to a low-dimensional space using MDS reveals a highly regular structure (right). Seeing such a result, one may infer that representational structure in the ROI strongly reflects the task structure. (C) However, applying RSA to regression estimates of patterns obtained from pure white noise generates a closely matching similarity matrix (left), with a similar low-dimensional projection (right). This indicates that standard RSA can introduce spurious structure in the similarity matrix that does not exist in the data. (D) RSA using Euclidean distance as a similarity metric, applied to patterns estimated from the same noise (left), yields a slightly different, but still structured, similarity structure (right). (E) Calculating the correlation between raw patterns of resting state fMRI data (instead of patterns estimated by a GLM), assuming the same task structure as in (A), also generates spurious similarity structure, albeit different from those in (B-D). This structure is significantly correlated with the theoretical bias structure (details in main text). Left: average of the similarity structure based on raw patterns. Right: average of the theoretical bias similarity structure arising purely from task structure and fMRI noise autocorrelation. (F) The bias in this case comes from structured noise introduced during the GLM analysis.
Assuming the true patterns β (red dots) of two task conditions are anti-correlated (the horizontal and vertical coordinates of each dot represent the response amplitudes of one voxel to the two task conditions), regression turns the noise ϵ in fMRI data into structured noise (X^T X)⁻¹ X^T ϵ (blue dots). The correlation between the noise components of the estimated patterns is often non-zero (assumed to be positive here) due to the correlation structure in the design matrix and the autocorrelation of the noise. The estimated patterns (purple dots) are the sum of β and (X^T X)⁻¹ X^T ϵ. The correlation structure between estimated activity vectors for each condition will therefore differ from the correlation structure between the true patterns β. (G) Distribution of the autocorrelation coefficients in a resting state fMRI dataset, estimated by fitting an AR(1) model to the time series of each voxel resampled at TR = 2.4s. The wide range of autocorrelation across voxels makes it difficult to derive a simple analytic form of the bias structure introduced by the structured noise, and calls for modeling the noise structure of each voxel separately.
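
The mechanism in (F) can be illustrated with a minimal sketch, under assumptions that are not from the paper: a hypothetical two-condition design in which the second condition always begins 4 samples after the first, unit-variance white noise, and true patterns β equal to zero. For white noise the covariance of the estimation noise is proportional to (X^T X)⁻¹, so the expected spurious correlation between the two estimated patterns can be read off that matrix and compared with a simulation across voxels. All function names here are illustrative.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    return dot(du, dv) / math.sqrt(dot(du, du) * dot(dv, dv))


def make_design(n_t=200):
    """Hypothetical design: condition 2 always starts 4 samples after condition 1."""
    x1, x2 = [0.0] * n_t, [0.0] * n_t
    for onset in range(0, n_t - 12, 20):
        for t in range(onset, onset + 8):
            x1[t] = 1.0
        for t in range(onset + 4, onset + 12):
            x2[t] = 1.0
    return x1, x2


def ols_pair(x1, x2, y):
    """Solve the 2x2 normal equations (X^T X) b = X^T y in closed form."""
    a11, a12, a22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    c1, c2 = dot(x1, y), dot(x2, y)
    det = a11 * a22 - a12 * a12
    return (a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det


def simulate_bias(n_vox=2000, seed=0):
    rng = random.Random(seed)
    x1, x2 = make_design()
    # Normalized off-diagonal of (X^T X)^-1: correlation of the estimation noise.
    a11, a12, a22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    theoretical = -a12 / math.sqrt(a11 * a22)
    pattern1, pattern2 = [], []
    for _ in range(n_vox):
        y = [rng.gauss(0.0, 1.0) for _ in x1]  # pure white noise, true beta = 0
        b1, b2 = ols_pair(x1, x2, y)
        pattern1.append(b1)
        pattern2.append(b2)
    return theoretical, pearson(pattern1, pattern2)


theoretical, empirical = simulate_bias()
print(f"theoretical bias: {theoretical:.3f}, empirical similarity: {empirical:.3f}")
```

In this toy design the overlapping regressors make the estimation noise negatively correlated; the sign and size of the bias depend on the design matrix, and autocorrelated noise (as in real fMRI data) would change it further.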

Fig 2.

Generative model of Bayesian RSA.

The covariance structure U shared across all voxels in an ROI is treated as a hyper-parameter of the unknown response amplitude β. For voxel k, the BOLD time series Y_k are the only observable data. We assume Y_k is generated by task-related activity amplitudes β_⋅k (the k-th column of β), intrinsic fluctuation amplitudes β_0⋅k and spatially independent noise ϵ_k: Y_k = Xβ_⋅k + X₀ β_0⋅k + ϵ_k, where X is the design matrix and X₀ is the set of time courses of intrinsic fluctuations. ϵ_k is modeled as an AR(1) process with autocorrelation coefficient ρ_k and noise standard deviation σ_k. β_⋅k depends on the voxel’s pseudo-SNR s_k and noise level σ_k in addition to U: β_⋅k ∼ N(0, (s_k σ_k)² U). By marginalizing over β_⋅k, β_0⋅k, σ_k, ρ_k and s_k for each voxel, we can obtain the likelihood function p(Y_k|X, X₀, U) and search for the U that maximizes the total log likelihood of the observed data Y for all n_V voxels. The optimal U can be converted to a correlation matrix, representing the estimated similarity between patterns.
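
The forward direction of this generative model can be sketched for a single voxel as follows. This is only a simulation of Y_k from given parameters, not the marginalization or the search for U. Several choices are assumptions made for illustration: the 2×2 matrix U, the toy design X, a single constant column standing in for X₀, a standard normal draw for β_0⋅k, and the reading of σ_k as the standard deviation of the AR(1) innovations.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a small positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_beta(u_chol, s_k, sigma_k, rng):
    """Draw beta_.k ~ N(0, (s_k * sigma_k)^2 U) using the Cholesky factor of U."""
    z = [rng.gauss(0.0, 1.0) for _ in u_chol]
    return [s_k * sigma_k * sum(row[j] * z[j] for j in range(len(z))) for row in u_chol]


def ar1_noise(n_t, rho_k, sigma_k, rng):
    """AR(1) noise; sigma_k is taken here as the innovation standard deviation."""
    eps = [rng.gauss(0.0, sigma_k / math.sqrt(1.0 - rho_k ** 2))]  # stationary start
    for _ in range(n_t - 1):
        eps.append(rho_k * eps[-1] + rng.gauss(0.0, sigma_k))
    return eps


def simulate_voxel(x, x0, u_chol, s_k, sigma_k, rho_k, rng):
    """Y_k = X beta_.k + X0 beta_0.k + eps_k for one voxel."""
    beta = sample_beta(u_chol, s_k, sigma_k, rng)
    beta0 = [rng.gauss(0.0, 1.0) for _ in x0[0]]  # assumed prior, illustration only
    eps = ar1_noise(len(x), rho_k, sigma_k, rng)
    return [
        sum(xi * b for xi, b in zip(x_row, beta))
        + sum(xi * b for xi, b in zip(x0_row, beta0))
        + e
        for x_row, x0_row, e in zip(x, x0, eps)
    ]


rng = random.Random(1)
U = [[1.0, 0.6], [0.6, 1.0]]  # hypothetical shared covariance structure
# Toy design: two non-overlapping boxcar conditions, 100 time points.
X = [[1.0 if t % 20 < 5 else 0.0, 1.0 if 10 <= t % 20 < 15 else 0.0] for t in range(100)]
X0 = [[1.0] for _ in range(100)]  # constant column standing in for intrinsic fluctuations
Y_k = simulate_voxel(X, X0, cholesky(U), s_k=2.0, sigma_k=0.5, rho_k=0.4, rng=rng)
print(len(Y_k))
```

Because β_⋅k is scaled by s_k σ_k, a voxel with s_k = 0 carries no task-related signal, which is one way to read the role of the pseudo-SNR in the model.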

Fig 3.

Performance of BRSA and other methods on simulated data.

(A) We simulate task-related activation magnitude according to a multivariate normal distribution with a hypothetical “true” covariance structure U as displayed. (B) We use lateral occipital cortex (bright region) as an example ROI and resting state fMRI data from the Human Connectome Project as noise. (C) We multiplied the design matrix of the task in Fig 1A with the activity pattern simulated according to A and then added this “signal” to voxels in a cubical region of the ROI. The colors show the actual SNR of the added signal for one example simulated brain, corresponding to the plot outlined by a red square in F. SNR here is defined as the ratio of the standard deviation of the simulated signal to that of the noise (the time series of the resting state fMRI data): SNR_k = std(Xβ_⋅k) / std(noise_k), where noise_k is the time series of the resting state fMRI data in voxel k, treated as task-irrelevant noise in our simulation. (D) The pseudo-SNR map estimated by BRSA for the data with a true SNR map shown in C. The scale does not match the scale of true SNR, but the spatial pattern of SNR is recovered. The result corresponds to the simulation condition marked by the red box in F. (E) The distributions of the fitted pseudo-SNR of task-active (pink) and inactive (blue) voxels are highly separable. The inset shows that the SNR in active voxels and their fitted pseudo-SNR are significantly correlated (r = 0.62, p < 1.4e-20). (F) Average covariance matrix (top) and similarity matrix (bottom) estimated by BRSA in the cubic area in C, across different SNR levels (columns) and different numbers of runs (rows). The average SNRs within the voxels with signals added (i.e., voxels with color in C) are displayed at the bottom. Note that we do not expect the values in the covariance or correlation matrix to scale linearly with SNR. The major effect of SNR is that the similarity structure becomes noisier as SNR decreases.
(G) The corresponding result obtained by standard RSA based on activity patterns estimated within runs, which are spatially whitened. The major effect of SNR is that the bias structure becomes stronger in the result as SNR decreases. (H) The corresponding result of RSA based on cross-correlating patterns estimated from separate runs, which are spatially whitened based on the residuals of all scanning runs. The major effect of SNR is noisier results and smaller correlation coefficients overall as SNR decreases. (I) Top: average correlation (mean ± std) between the off-diagonal elements of the estimated and true similarity matrices, for each method, across SNR levels (x-axis) and amounts of data (separate plots). Bottom: The correlation between the average estimated similarity matrix of each method (for GBRSA, this is the single similarity matrix estimated) and the true similarity matrix. “point-est”: methods based on point estimates of activity patterns; “-crossrun”: similarity based on cross-correlation between runs; “-whiten”: patterns were spatially whitened.
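
Two quantities used in this figure are simple enough to sketch directly: the per-voxel SNR from (C), taken as the ratio of standard deviations of signal and noise, and the recovery score from (I), the correlation between off-diagonal elements of the estimated and true similarity matrices. The helper names and the example matrices below are hypothetical, and the sketch takes an estimated covariance matrix as given rather than fitting one.

```python
import math
import statistics


def snr(signal, noise):
    """Ratio of the standard deviation of the signal to that of the noise."""
    return statistics.pstdev(signal) / statistics.pstdev(noise)


def cov_to_corr(cov):
    """Convert a covariance matrix to a correlation (similarity) matrix."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    n = len(cov)
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


def off_diagonal(mat):
    """Upper-triangle entries, excluding the diagonal."""
    n = len(mat)
    return [mat[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    return num / math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))


def recovery_score(est_cov, true_cov):
    """Correlation between off-diagonal elements of the two similarity matrices."""
    return pearson(off_diagonal(cov_to_corr(est_cov)), off_diagonal(cov_to_corr(true_cov)))


# Hypothetical 3-condition example: the estimate has a different scale than the truth.
true_cov = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.3], [0.1, 0.3, 1.0]]
est_cov = [[2.0, 1.4, 0.4], [1.4, 2.0, 0.5], [0.4, 0.5, 2.0]]
print(recovery_score(est_cov, true_cov))
```

Converting to correlations before comparing removes the overall scale of the covariance estimate, in line with the caption's note that the values are not expected to scale linearly with SNR.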

Fig 4.

Limited performance of BRSA at very low SNR and small amount of data.

(A) The average correlation between the off-diagonal elements of the estimated and the true similarity matrices (mean ± std) as the number of simulated subjects increases. Each simulated subject had one run of data. Legend shows average SNR in task-responsive voxels. Half of the voxels do not include any signal related to the design matrix. The correlation reaches asymptotic levels slightly below 1 with increasing numbers of participants except when the SNR is extremely low (0.07), indicating that the bias is not fully eliminated. (B) The average correlation between the estimated similarity matrix and the expected bias structure assuming white noise. The estimated similarity structure is most dominated by the bias structure at the lowest SNR simulated (0.07). The negative correlation at the highest SNR reflects the weak negative correlation between the true similarity structure and expected bias structure (-0.055).

Fig 5.

Cross-validation reduces the chance of false positive results.

A group of 24 subjects with different SNRs were simulated as in Fig 3. 1 or 2 runs of data were used as training data and 1 left-out run was used as test data. The full BRSA model and a null model that assumes no task-related activity were fit to each simulated subject’s training data. Student’s t-test was performed on the differences between the cross-validation scores (the predictive log likelihoods for the test data) of the full model and null model across the simulated subjects to determine whether the full model should be accepted for each group of simulated subjects. This procedure was repeated on 36 different groups of simulated data (all using real fMRI data of non-overlapping subjects from the HCP dataset as noise). (A) Task-related signals were added to both training and test data. The frequencies with which the full models were accepted based on the t-test (correct acceptance) are displayed for each simulation condition, grouped by the amounts of training data (1 or 2 runs). SNRs in the task-active voxels (about 4% of all) are displayed on the top-right and correspond to the SNRs in Fig 3. The full model was almost always rejected at the lowest simulated SNR. (B) Mean ± std of the t-statistics of the difference between cross-validation scores of the full and null models across simulated groups, for the corresponding amounts of data and SNR in A. (C) Mean ± std of the difference between the cross-validation scores of the full models and the null models across simulated groups in A. (D) Mean ± std of the difference between the cross-validation scores when task-related signals were added to the training data but not to the test data. The statistical test correctly rejected the full model in all simulated groups. (E) Mean ± std of the difference between the cross-validation scores when neither training nor test data contain task-related signals. The statistical test correctly rejected the full model in all simulated groups.
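
The group-level decision described here reduces to a one-sample t-test on per-subject score differences. The sketch below assumes the cross-validation scores have already been computed; the scores shown are made-up numbers, and the one-sided test with a fixed critical value (about 1.714 for 23 degrees of freedom at the 0.05 level) is an assumption, since the caption does not state the sidedness or threshold used.

```python
import math
import statistics


def one_sample_t(differences):
    """t-statistic for the null hypothesis that the mean difference is zero."""
    n = len(differences)
    mean = statistics.mean(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return mean / standard_error


def accept_full_model(full_scores, null_scores, t_critical):
    """Accept the full model if it predicts test data better than the null model.

    full_scores and null_scores hold one predictive log likelihood per subject.
    t_critical is the one-sided critical value for len(full_scores) - 1 degrees
    of freedom (an assumed decision rule for this sketch).
    """
    differences = [f - n for f, n in zip(full_scores, null_scores)]
    return one_sample_t(differences) > t_critical


# Hypothetical scores for 24 subjects: the full model is slightly better on average.
full = [-1000.0 + 0.5 * i + (3.0 if i % 3 else -1.0) for i in range(24)]
null = [-1000.0 + 0.5 * i for i in range(24)]
print(accept_full_model(full, null, t_critical=1.714))
```

Testing the differences rather than the raw scores removes subject-to-subject variation in overall log likelihood, which would otherwise swamp the comparison between the two models.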

Fig 6.

Decoding capabilities of the BRSA method.

(A) Decoded task-related activity of the sixth condition from one simulated subject in one run of test data, and the true design matrix of that condition in the test data. The simulated data with the second highest SNR in Fig 3 were used. The BRSA model was fitted to one run of training data. (B) Average correlation between the decoded signals for each task condition (rows) and the time courses for each condition in the design matrix used to simulate the test data (columns). (C) The distribution of correlation between the decoded signals of each condition and the time series in the design matrix of the corresponding condition across 24 simulated subjects (pink), and the null distribution of correlation between condition-permuted decoded signals and the design matrix (blue). The Bhattacharyya coefficient [39] between the two distributions is 0.36.
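
The Bhattacharyya coefficient in (C) measures the overlap between two distributions: for discrete distributions p and q it is the sum over bins of sqrt(p_i q_i), equal to 1 for identical distributions and 0 for non-overlapping ones. A minimal histogram-based sketch follows; the binning (20 equal bins on [-1, 1], the range of a correlation) is an assumption, and the paper's exact estimation procedure may differ.

```python
import math


def histogram(samples, lo, hi, n_bins):
    """Normalized histogram of samples over n_bins equal-width bins on [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for x in samples:
        idx = int((x - lo) / width)
        idx = max(0, min(idx, n_bins - 1))  # clamp edge values into the outer bins
        counts[idx] += 1
    total = len(samples)
    return [c / total for c in counts]


def bhattacharyya_coefficient(a, b, lo=-1.0, hi=1.0, n_bins=20):
    """Overlap between the empirical distributions of two sets of correlations."""
    p = histogram(a, lo, hi, n_bins)
    q = histogram(b, lo, hi, n_bins)
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


# Hypothetical correlation values for decoded and condition-permuted signals.
decoded = [0.55, 0.62, 0.48, 0.71, 0.35, 0.58]
permuted = [0.02, -0.11, 0.38, 0.07, -0.05, 0.12]
print(bhattacharyya_coefficient(decoded, permuted))
```

With histogram estimates the value depends on the number of bins and the sample size, so small differences in the coefficient should not be over-interpreted.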
