Sparse coding reveals greater functional connectivity in female brains during naturalistic emotional experience

Functional neuroimaging is widely used to examine changes in brain function associated with age, gender or neuropsychiatric conditions. FMRI (functional magnetic resonance imaging) studies employ either laboratory-designed tasks that engage the brain with abstracted and repeated stimuli, or resting state paradigms with little behavioral constraint. Recently, novel neuroimaging paradigms using naturalistic stimuli are gaining increasing attraction, as they offer an ecologically-valid condition to approximate brain function in real life. Wider application of naturalistic paradigms in exploring individual differences in brain function, however, awaits further advances in statistical methods for modeling dynamic and complex dataset. Here, we developed a novel data-driven strategy that employs group sparse representation to assess gender differences in brain responses during naturalistic emotional experience. Comparing to independent component analysis (ICA), sparse coding algorithm considers the intrinsic sparsity of neural coding and thus could be more suitable in modeling dynamic whole-brain fMRI signals. An online dictionary learning and sparse coding algorithm was applied to the aggregated fMRI signals from both groups, which was subsequently factorized into a common time series signal dictionary matrix and the associated weight coefficient matrix. Our results demonstrate that group sparse representation can effectively identify gender differences in functional brain network during natural viewing, with improved sensitivity and reliability over ICA-based method. Group sparse representation hence offers a superior data-driven strategy for examining brain function during naturalistic conditions, with great potential for clinical application in neuropsychiatric disorders.


Introduction
Functional neuroimaging techniques, such as functional magnetic resonance imaging (fMRI), are widely used to examine changes in brain function associated with age, gender, and a wide range of neuropsychiatric disorders [1][2][3]. Most of these studies used task-based paradigms, where participants perform laboratory-designed tasks in the scanner. While these tasks are designed to engage and isolate a particular aspect of brain function such as working memory or visual perception, it is unclear whether and to what extent such paradigms could uncover the complex mental processes in real life. To address such limitation, recent fMRI studies employed naturalistic stimuli, such as movie and music, to examine neural processes under real-life condition [4][5][6][7][8][9][10]. Despite the dynamic and complex nature of these naturalistic paradigms, they evoke highly consistent brain responses across individuals, laying down the foundation of using naturalistic paradigms to study real-life cognition [4][5][6][7][8][9][10].
While naturalistic fMRI paradigms are increasingly used to map brain function in healthy populations, only a handful of studies have adopted them to the examination of group differences [2,[7][8][9]. One challenge is the lack of effective statistical models to decode neural correlates of naturalistic stimuli [11]. The inputs in naturalistic paradigms are often dynamic and complex, hence difficult to model using traditional hypothesis-driven methods such as general linear model (GLM) [12]. Data-driven approaches, which do not depend on specifications of the input stimuli, are hence better suited for naturalistic fMRI studies. Progress has been made with inter-subject correlation and independent component analysis (ICA): several studies have demonstrated that neural synchrony, as measured by inter-subject correlation, during natural viewing is reduced in patients with autism and depression [8,9]. However, inter-subject correlation compares neural responses at the group level, offering very limited view on individual brain responses. ICA, on the other hand, could reconstruct the functional networks for individual brain. A recent naturalistic fMRI study has successfully used ICA to capture altered functional brain networks in individual patients with reduced levels of consciousness [13].
ICA is based on the assumption of independence between each signal source [14]. Since brain is composed of complex interconnected networks, there is no biological reason for different spatial components to hold independent distributions [15]. In addition, ICA does not account for the intrinsic sparsity of neural coding [14]: the brain encodes information with activities of sparse sets of neuronal ensembles while the majority of neurons are silent [16]. To address this issue, recent studies have decomposed fMRI signals into linear combinations of multiple atoms based on sparse representation of whole-brain fMRI signals [17][18][19][20][21][22][23]. Sparse population coding of a set of neurons has been showed to be more effective than ICA in reconstructing brain networks [15], which is increasingly applied to fMRI data analyses [18,20,24]. The basic sparse representation pipeline is to extract the whole-brain fMRI signals of one subject into a big data matrix, which is subsequently decomposed into a dictionary matrix and an associated coefficient matrix by sparse coding algorithm [25]. Thus, the time series of each dictionary atom corresponds to the functional activities of a brain network, and its associated coefficient vector represents the spatial map of this brain network.
However, as there is no correspondence of dictionary atoms across subjects and groups, it is difficult to derive group inference or compare group differences using previous single subject sparse representation. To address this problem, we here adopted a group sparse representation-based computational framework to extract functional networks during natural viewing [26]. The advantage of our method is that a common signal dictionary can be learned from the aggregated fMRI signals of two groups of subjects and then the coefficient matrices corresponding to each common dictionary can be used to statistically assess group differences. To learn a common signal dictionary for two groups of subjects from group sparse representation, we assumed that all the subjects evoke highly consistent neural temporal responses during natural viewing, as revealed by previous naturalistic fMRI studies [6,27,28]. Here, we applied group sparse representation to naturalistic fMRI data acquired from healthy males and females while they watched an emotional movie. We then statistically compared the gender differences of functional activity based on the correspondences established by the common learned dictionary. The effectiveness and the reproducibility of our method were further evaluated against ICA-based method. Our results demonstrated the feasibility and superiority of group sparse representation on elucidating functional networks and group differences in functional brain activity under naturalistic conditions.

Overview
The overview of our analytical pipeline is shown in Fig 1. First, whole-brain fMRI signals of each subject were extracted using a common mask and then stacked into a 2D signal matrix (Fig 1a). Signal matrices from all participants were pooled and concatenated in the spatial dimension into a 2D matrix S (Fig 1b), which was then factorized into one common dictionary matrix D, and the associated coefficient matrix A composed of 2D individual coefficient submatrices for each participant, using online dictionary learning and sparse coding method (Fig  1c) [25]. Finally, the derived coefficient matrices were used to assess functional differences between two groups of subjects, and the test-retest reliability of group sparse representation (Fig 1d). All variables used in the main text are defined in S1 Table. Data acquisition and pre-processing 18 right-handed (10 females, 8males) healthy subjects (ages 27±2.7) participated in the study, who were all recruited from the University of Queensland and compensated for their participation. Every participant signed a written informed consent. The subject recruitment for the first session lasted from April to October in 2014, while the recruitment for second session lasted from July to December in 2014. The study was approved by the ethics committee of the The computational framework of group sparse representation on the whole-brain fMRI signals from two groups of subjects (G F : Female, G M : Male). (a) Extracting whole-brain fMRI data from subjects x (subscript represents the label of subject, e.g., x). (b) FMRI data matrices (S x ) from all the subjects are aggregated (S). (c) Coefficient matrix A with the same spatial information and group correspondence of S, which is decomposed into 2 matrices (A GF , A GM ) corresponding to two groups (G F , G M ), and each group is made of sub-matrices corresponding to the sparse representation for each subject. University of Queensland and was conducted according to National Health and Medical Research Council guidelines. The experiments comprised two scanning sessions with an interval of around 3 months. For each session, participants underwent an 8 min resting state fMRI exam with eyes closed, and then freely viewed a 20-min movie The Butterfly Circus. This is a short, positively valenced movie that depicts the story of a man born without limbs who is encouraged by the showman of a renowned circus to discover his own potential. All participants reported that they had not previously seen the movie. The movie stimulus was presented using the Presentation software (NeuroBehavioral Systems, USA) and displayed via an MRIcompatible monitor located at the rear of the scanner. The soundtrack of the movie was delivered through MRI-compatible audio headphones (Nordic NeuroLab, Norway). After 3 months, 16 (9 females, 7 males) subjects were scanned with the same protocol for test-retest reliability analysis (session B dataset), although 2 subjects could not return for the re-scan. Thus these 2 subjects were excluded from our test-retest reliability analysis. All structural and functional images were scanned in a whole-body 3T Siemens Trio MRI Scanner. The scanning parameters are TR/TE/FA/FOV of 2200ms/30ms/79˚/134mm×134mm, resolution of 3mm×3mm×3mm, and dimension of 64×64×44.
Functional images were preprocessed using Statistical Parametric Mapping toolbox (SPM12). The preprocessing pipeline included slice timing correction and realigning, co-registration, normalization, motion correction, spatial smoothing with 6mm full width half maximum Gaussian kernel, and band pass filtering (0.0085-0.15 Hz). Nuisance covariates including WM, CSF and Friston-24 motion parameters were then regressed out using the Data Processing Assistant for Resting-state fMRI software (DPARSF) to reduce potential effects of physiological confounds. By computing the intersection of all single brain masks of each participant together, we generated a group-wise common mask to extract whole brain signals. In this way, we ensured that the same voxels are processed for each participant. We also calculated the mean head motion parameter as the mean absolute displacement of each brain volume as compared to the previous volume estimated from the translation parameters in the x (left/right), y (anterior/posterior), and z (superior/inferior) directions (displacement = square root (x 2 +y 2 +z 2 )) of two groups [29], and found no significant group differences of mean head motion parameter (P>0.05, Mean±SD: 0.0552 mm±0.0357 for the female group, and 0.0703 mm±0.0292 for the male group).
While the whole naturalistic fMRI dataset was 20-minitute, we first segmented the 20-min fMRI dataset into three segments, according to the narrative structure of the movie The Butterfly Circus performed by three experts trained in screenwriting and film theory [10]. We then applied our framework to one segment of fMRI data (7~13min, both sessions) that comprised a complete event, for the efficiency of computation and investigation of the over-completion problem. We finally replicated all analyses on the whole 20-minute dataset for further validation. Based on both results, we investigated the over-completion problem, that is, less observations (n) than predictors (p) [25]. Since the two datasets yielded similar results, only results based on middle segment of fMRI are presented in the main text, while the results based on whole fMRI datasets are provided in the supplemental materials (S1-S4 Figs, S2 and S3 Tables).
Affective ratings of the movie. After participants completed each scanning session, they were asked to rate their experience when watching the movie, including the level of boredom, enjoyment, valence, as well as the audio and video quality of the movie during fMRI acquisition, in the scale between 1 and 5 (S4 Table). Note that higher rating of boredom means more boring the participants feel, while higher rating of enjoyment refers to more enjoyable the participants experience. Participants rated the movie as positive and happy (3.7±1.2). For ratings of boredom, where males (2±1.07) tend to rate higher scores than females (1.10±0. 32) indicating that females were more engaged than males during movie viewing (P<0.0217, twosample t-test; Table 1). We did not detect significant group differences in ratings of enjoyment, valence and audio and video quality between male and female (P>0.05; Table 1).

Dictionary learning and sparse representation theory
The sparse coding framework is implemented by custom codes in Matlab (MathWorks, MA, USA) and a publicly available software (http://spams-devel.gforge.inria.fr/). In this framework, each signal sample s i in the data matrix S = [s 1 ,s 2 ,. . .,s n ]2R t×n is modeled as the sparse and linear combination of atoms in a learned dictionary D = [d 1 ,d 2 ,. . .,d m ]2R t×m , i.e., s i = D × a i and S = D × A, where A = [a 1 ,a 2 ,. . .,a n ]2R m×n is the weight coefficient matrix for sparse representation and each column a i is the corresponding weight vector for s i [30].
Training a solution for sparse representation of S = [s 1 ,s 2 ,. . .,s n ]2R t×n , the empirical cost function is summarized in Eq (1) by considering the average loss of representation for n signals.
The loss function is defined in Eq (2) with the ℓ 2 norm that yields the minimization of the representation error, and the ℓ 1 norm that constrains the sparsity of a i . Here, λ is a regularization parameter to trade off the representation error and sparsity level.
As we mainly focus on the fluctuation shapes of input signals and aim to prevent D from becoming arbitrarily large, we constrain columns d 1 ,d 2 ,. . .,d m in D with Eq (3).
In summary, the sparse representation problem is summarized as a matrix factorization problem as shown in Eq (4). Similar as in Eq (2), the Frobenius norm is employed for factorization error minimization, and the ℓ 1 norm of A matrix yields sparsity. The alternative optimization strategy [31] is usually employed to solve the problem, where the dictionary D and coefficient A are iteratively optimized, by alternatingly minimizing over one while keeping the other fixed, and the dictionary D is initialized randomly, as proposed by [31]. A method based on this strategy, called online dictionary learning, which could deal with infinite data input [25], was proposed and the software (http://spams-devel.gforge.inria.fr/) was developed for public use. Comparing with classical dictionary learning methods that access the whole training data at each iteration to optimize the dictionary D and coefficient A, the online dictionary learning improves the efficiency by progressively augmenting the training data [25,32]. In each iteration, the sparse coding and dictionary updating is performed with a subset of the training data based on stochastic optimization. Afterwards, the subset is then augmented with a new training sample, and the optimization is performed again on the new training data with the outcome of the previous iteration as warm restart. The online dictionary learning repeats these iterations until all training data have been adopted, providing an efficient solution with low memory consumption and computational cost as compared to classical dictionary learning methods [25]. In this paper, we employ this online dictionary learning and sparse coding method for our group sparse representation analysis.
Sparse representation of whole-brain fMRI signals Whole-brain fMRI signals of each subject were extracted and stacked into a 2D matrix S x (x represents the label of participant, S Fp or S Mq ). The fMRI signals of each voxel make up the columns of S x . Then, all signal matrices from two groups were pooled and concatenated in the spatial dimension into a big 2Dmatrix S (Fig 1). Online dictionary learning and sparse coding method was adopted to decompose S into a learned dictionary matrix D and the coefficient matrix A [25]. Note that D is commonly shared by all subjects, and the A has the same spatial voxel organization and group correspondence of S, which permits group statistical analyses and comparison. Thus, A can be decomposed into 2 matrices which represent male and female groups, and each group comprises sub-matrices of participant as shown in Fig 2, Each column of D corresponds to a dictionary atom and its time course, and each row in A x (A Fp or A Mq ) represents its coefficient vector that assigns a coefficient to each voxel in the brain and can be mapped back to the brain volume.
There are two essential parameters in sparse coding strategy: λ that keeps balance between the residual error of sparse representation and the sparsity of spatial regions in each atom, and the number of dictionary atoms m. Currently, there is no established criteria on determining λ and m. We thus assessed the impact of parameter settings by systematically varying λ (0.1, 0.5, 1) and m (100, 200, 400). Specifically, we conducted network decomposition, group difference detection and test-retest reliability analyses using different combinations of λ and m. Results in the main text were obtained when λ is set as 0.5 and m as 200, and results using other parameter settings are presented in the supplementary materials. While our main conclusion is relatively robust to different parameter settings (S5-S7 Figs), we found the setting of λ as 0.5 and m as 200 produces the highest test-retest reliability and highest number of clusters showing significant group difference, suggesting this setting is relatively more robust (S6-S8 Figs).

Group-wise statistical analysis
Coefficient matrix A maintains the spatial information and group correspondence of S. A can be decomposed into 2 matrices corresponding to two groups (A GF , A GM ), and each group is made of sub-matrices corresponding to the sparse representation for each subject (Fig 2). F and M denote female and male, respectively. Each row of this sub-matrix A x represents the individual coefficient spatial map of each dictionary atom (subscript x represents the label of participant, e.g., Fp represents the pth subject in the Female group), so the A x (i, j) in each submatrix represents the reconstruction coefficient of the j th voxel to the i th atom in the dictionary (Fig 2). For all the subjects together, we hypothesize that each coefficient is group-wisely null, and the T-test (with T defined as Eq (7)) is carried out to test acceptance or rejection of the null hypothesis for each element. The derived T-value is transformed to z-score [12].
Due to the sparsity of coefficient matrix A, the t-test result of A is also sparse. Each row of ttest result represents the statistically significant contribution to each dictionary atom, and can be mapped back to brain volume. The resultant z-score map hence represents the spatial distribution of the atom commonly shared by two groups (Fig 2). Corresponding coefficient spatial maps were then used to compare differences between the two groups.
To test group differences in each dictionary atom, each row of sub-coefficient matrices A x (A Fp or A Mq ) are set as input of SPM12 for two-sample t-tests. Search volume is masked by the z-score map from the one-sample t-test corresponding to the same dictionary atom. Both left- tailed and right-tailed two-sample t-tests were performed. Group-difference results were thresholded using a joint probability distribution method to correct for multiple comparisons [33]. Two levels of thresholding were used, where a threshold of P<0.005 was set for voxel height, and paired with two different thresholds for cluster extent (FDR-corrected P<0.005 and FDRcorrected P<0.05).

Independent component analysis
To evaluate the performance of group sparse representation method, we compared it to three commonly-used data-driven strategies. First, we used tensor independent component analysis (tensor ICA) implemented in FSL MELODIC toolkit [34]. Tensor ICA is commonly used for decomposing the data into independent components where stimulus paradigm is consistent among subjects. Specifically, the number of components in our study was experimentally set to 50, or 100 when specified in the text. Dual regression was then used to project tensor ICA components to each subject space, which were then used for group statistical and test-retest reliability analyses. Similar method is employed to conduct group statistical analysis in SPM12. As no significant results were revealed with the stringent threshold used for group sparse representation, we used a lenient threshold where P<0.01was used for voxel height, and P<0.01was used for cluster extent.
In addition to tensor ICA, we also examined the functional data with spatial concatenation group ICA method, where the input and output is organized in the same way as group sparse representation [35,36]. This ICA method was implemented using the Fast ICA algorithm [37,38]. Specifically, subjects' data were concatenated in the spatial dimension into a big signal matrix, which was then factorized into a common time series mixing matrix and the independent spatial components matrix. The spatial components matrix had the same spatial organization of input data and was composed of 2D sub-matrices representing the spatial components for each participant, similar to the coefficient matrix derived by our group sparse representation. Here, the number of components was experimentally set to 50, or 100 when specified in the text. Similar method was adopted to generate group z-score maps and conduct group statistical analysis. As no significant results were revealed with the stringent threshold used for group sparse representation, we used a lenient threshold of P<0.01 for voxel height and P<0.01 for cluster extent.
Finally, we also adopted commonly used temporal concatenation group ICA, as implemented using GIFT Matlab software [39]. Subject-specific spatial maps were extracted using back-reconstruction for group statistical and test-retest reliability analyses. As no significant results were revealed with the stringent threshold used for group sparse representation, we used a lenient threshold where P<0.05 was used for voxel height and P<0.05 was used for cluster extent. Since temporal concatenation group ICA yielded similar results, they were mostly presented in the supplemental materials (S9 and S10 Figs, S5 Table).
Corresponding brain networks derived across methods were identified by matching them to established network template [40], followed by careful visual inspection. Furthermore, the corresponding clusters showing gender difference detected across methods were defined by matching them to the Brodmann area and automated anatomical labeling (AAL) atlas, followed by careful visual inspection.

Test-retest reliability analysis
To test the reproducibility of each brain network fMRI measures, we conducted the same group sparse representation, tensor ICA, spatial and temporal concatenation group ICA on fMRI dataset of session B (16 participants viewed the same movie for the second time), and then identified matching z-score maps which share maximum number of overlapping voxels. The matching was also confirmed with careful visual inspection. For each selected matching network, the networks from two scans were first binarized using a threshold of Z>1.65, and then the common brain region shared by two scans was defined as a mask to evaluate the reliability. The reliability was quantified by calculating the intra-class coefficient (ICC) between measures from the two scans [41,42]. A oneway ANOVA was applied to the measures of the two scans across subjects, to derive between-subject mean square error MS p and within-subject mean square error MS e , where the measures here referred to z-score, thus ensuring measures of different subjects comparable. ICC values were defined as Eq (8), where d is the number of repeated sessions (here d = 2). This form of ICC has been widely used in previous test-retest reliability analyses of fMRI data [43,44].
Only the common brain regions shared by both individual spatial maps and mask defined above were included in ICCs calculation. For both group sparse representation and ICA-based methods, we evaluated test-retest reliability at both scan-wise and voxel-wise levels, following previous methods [43]. Scan-wise level was defined as the average z-score across all voxels within the common mask generating a single ICC for whole network, while the voxel-wise level was defined as the individual voxel's z-score generating ICCs for all the voxels within mask. The test-retest reliability is classified as excellent (ICC>0.

Functional brain networks identified with group sparse representation
We first investigated whether group sparse representation approach could identify functional brain networks of interests during movie viewing. We applied group sparse coding to naturalistic stimuli fMRI data aggregated from 18 healthy subjects. Voxels with significant reference to each dictionary atom were determined using an experimentally determined threshold (Z>1.65, one-sample t-test). Group sparse representation identified several brain networks that have been established previously, including auditory (#72), visual (#19, #143, #37), dorsal attention (#28), default mode (#46), and salience networks (#119) (Fig 3). Some networks identified with sparse representation appeared to be combinations of different networks, such as auditory and supplementary motor networks (#16), default mode and salience networks (#81), default mode and cerebellar networks (#54, #132), salience and executive control networks (#64), auditory and visual networks (#61) (Fig 3). While network patterns identified under different λ (0.1, 0.5, 1) and m (100, 200, 400) settings are generally similar, the impact of parameter settings can be observed (S5 Fig): the coefficient maps appear to be coarse and noisy with small λ and dictionary size, and become sparse with large λ and dictionary size (S5 Fig). Overall, the setting of λ as 0.5 and m as 200 resulted in the most robust network decomposition.
With the same threshold of Z>1.65, both tensor ICA and spatial concatenation group ICA identified similar functional brain networks, including auditory (#7/#11), visual (#3, #22/#28, #47), dorsal attention networks (#11/#26), and default mode and salience (#2/#10) (Fig 4). Five networks were detected by all three methods (labeled in the same color in Figs 3 and 4). These results validate that group sparse representation can identify meaningful brain networks driven by naturalistic stimuli. In addition, sparse representation method detected artifact components that related to head-motion, white matter, susceptibility-motion, cardiac, and MRI acquisition/reconstruction (S11 Fig), similarly to the ICA methods [45,46]. While these networks share spatial patterns with ICA networks in general, they are more circumscribed than the latter, suggesting the sparse representation framework might be more specific and accurate in defining functional networks. We further examined this possibility by comparing 1) the sensitivity in detecting group differences and 2) the test-retest reliability among the three datadriven methods.

Gender differences in functional brain networks
A question rises up that whether group sparse representation could detect gender differences in brain responses to natural emotional experience. As shown previously, females tend to present greater emotional reactivity [2,7,47,48]. Hence, we hypothesized that brain responses to the naturalistic emotional stimulus is more robust in female than male participants. To test this hypothesis, we compared the coefficient spatial maps for each dictionary atom between the female and male groups. Remarkably, several network regions were found to show higher coefficients, or stronger functional connectivity, in females than males, whereas no regions were found to be higher in males (Fig 5a; two-sample t-tests). Specifically, females display significantly increased functional connectivity in higher order brain centers, including posterior cingulate cortex, precuneus, insula, anterior cingulate cortex, superior medial frontal gyrus, and superior parietal lobule, as well as thalamic nucleus and medial occipital gyrus (Fig 5a; Table 2, clusters 1-7; P<0.005 for voxel height and FDR-corrected P<0.005 for cluster extent). At a less stringent threshold, additional 7 clusters were detected at the precuneus, posterior cingulate cortex, superior temporal gyrus, inferior frontal gyrus, primary and secondary visual cortices and cerebellar Crus I (Fig 5a; Table 2, clusters 8-14; P<0.005 for voxel height and FDR-corrected P<0.05 for cluster extent). Note, with this less stringent threshold or even an uncorrected threshold (P<0.005 for voxel height and P<0.005 for cluster extent), we did not detect any clusters showing greater functional connectivity in male than female participants.
In contrast, the effect of gender difference was much weaker with tensor ICA and spatial concatenation group ICA networks, where no significant gender difference was detected using either the two FDR-corrected thresholds (P<0.005 for voxel height and FDR-corrected P<0.005 or 0.05 for cluster extent), or the one without FDR correction (P<0.005 for voxel height and P<0.005 for cluster extent). Only with a more lenient threshold were we able to detect significant clusters-3 clusters for tensor ICA and 6 clusters for spatial concatenation group ICA-with greater connectivity in female than male participants (Fig 5b and 5c, S6 Table, clusters 1-3, S7 Table, cluster 1-6; P<0.01 for voxel height and P<0.01 for cluster extent). Two of these clusters, precuneus and anterior cingulate cortex, were detected with the group sparse representation results at stringent FDR-corrected thresholds (highlighted with same color in Fig 5). These results suggested that group sparse representation is more effective and sensitive for detecting functional differences in brain networks than the ICA-based method.
In addition, we examined whether these results are robust to different parameter settings in sparse representation. We repeated the group statistical comparison analyses with λ of 0.1, 0.5, and 1, and dictionary size of 100, 200, and 400. Across all settings, we detected significantly greater functional connectivity in female than male group, and nothing in the opposite contrast (P<0.005 for voxel height and FDR-corrected P<0.05 for cluster extent). The comparison   (sorted by p-value in ascending  order). The Network Index refers to the index of dictionary atom generated by group sparse representation algorithm (corresponding to the index in Fig 3). Sparse coding reveals gender differences

Test-retest reliability
Next, we sought to establish the long term test-retest reliability of our method. The same participants underwent the fMRI scan while they viewed the same movie three months later (session B). Group sparse representation, tensor ICA, spatial and temporal concatenation group ICA were used to identify functional brain networks in the repeated scan sessions. To compare the test-retest reliability of these methods, we focused on four networks that can be detected between session A and B by all three methods: visual network, auditory network, dorsal attention network, and default mode-salience network (Fig 6 and S8 Fig). In addition to these four networks, group sparse representation identified one more matching network, the default mode-cerebellar network (Fig 6a), while tensor ICA identified the frontoparietal network. The reliability was assessed at both voxel-wise and scan-wise levels [43]. All four methods showed a range of reliability at the voxel level across different networks (Fig 6), with greater variability in tensor ICA networks (larger error bars in Fig 6b). On average, voxel-wise ICCs are moderate in the visual network (0.4573, #143; 0.4459, #30) and the default mode-salience network (0.5147, #81; 0.5376, #11) using group sparse representation and temporal concatenation group ICA, but reduced to fair or poor level with tensor ICA (0.3693, #3; 0.3940, #2) and spatial concatenation group ICA (0.3131, #28; 0.0363, #10) (Fig 6b). For the auditory network, average voxel-wise ICCs are at the moderate level using all four methods (0.4261, #72; 0.5086, #7; 0.4558, #11; 0.4268, #1) (Fig 6b). Average voxel-wise ICCs are fair in the dorsal attention network using group sparse representation (0.3595, #28) and temporal concatenation group ICA (0.3869, #32), and are at moderate level using the other two ICA methods (0.4969, #11 and 0.4794, #26) (Fig 6b). To statistically compare the reliability among four methods, we focused on the voxels that are identified by all methods for these matching networks. In two out of four (visual and default mode-salience), voxel-wise ICCs are significantly higher with group sparse representation than tensor ICA and spatial concatenation group methods (paired t-tests, Bonferroni-corrected P<0.001, Fig 6b).
To further assess the impact of parameter setting on test-retest reliability, we repeated the reliability analyses using a combination of component numbers and lambda (m = 100, 200, 400; λ = 0.1, 0.5, 1) for group sparse representation and a high component number for tensor ICA and spatial concatenation group ICA (100) (S8 Fig). We chose three functional networks that can be robustly identified under all settings: visual, auditory and dorsal attention network. Averaging across the three networks, the highest reliability was obtained with m = 200 and λ = 0.5, further supporting the use of these parameters in the main analyses (S8 Fig). Similarly, tensor ICA and group ICA were more reliable with 50 as the component number than 100 (S8 Fig).

Discussion
A variety of methods have been developed for functional neuroimaging analysis, such as general linear model (GLM) [12], seed-based method [49], principal component analysis (PCA) [50], singular value decomposition (SVD) [51], inter-subject correlation (ISC) [8], and independent component analysis (ICA) [40]. While GLM is widely used in detecting task-evoked brain activations, ICA is one of the most common methods in characterizing brain network using naturalistic or resting-state fMRI paradigms. Tensor ICA is particularly suitable for naturalistic fMRI paradigms when data are acquired with consistent stimulus presentation [9]. However, the underlying statistical assumption of ICA is not well supported by neurobiological basis-brain networks are not necessarily independent from each other. Rather, they could recruit inputs from the same cortical region and interact between each other in the service of behaviors [15,[52][53][54][55][56][57]. On the other hand, sparse coding algorithm does not constrain dictionary matrix to be uncorrelated or independent, resulting in low level of correlation among atoms [18,24]. In addition, sparse population coding of a set of neurons has been shown to be more effective in detecting brain activity patterns and brain networks [15,17,23]. Recently, a Sparse SPM algorithm based on group sparse dictionary learning shows superior performance for characterizing resting state functional network, over seed-based and ICA methods. However, due to generating common group network and different individual temporal dictionaries, this algorithm differs from our framework that assumes common temporal responses across subjects [23]. Therefore, our group sparse representation framework offers potential advantage in extracting functional brain networks during naturalistic paradigms. Our study presents comprehensive comparisons between group sparse representation-based and three ICA-based methods, based on their performance on network decomposition, group difference detection and test-retest reliability analyses. Our findings suggest that, while all four methods offer comparable test-retest reliability, sparse representation-based method could be more sensitive in identifying functional brain networks during naturalistic paradigms.
Our method builds upon previous studies that employed group sparse representation on static task-based fMRI data [26], and provides one of the first applications of group sparse representation to naturalistic neuroimaging paradigms. Comparing with individual-based sparse representation methods [18,24], group sparse representation method can automatically establish correspondences across individuals and populations, enabling detailed examinations of group differences or brain-behavioral correlations. Our results revealed that this method could identify well-known functional connectivity networks during dynamic naturalistic stimulation. Furthermore, we developed rigorous statistical tests to characterize group differences in functional activity, taking full advantage of the inherent correspondence between individual networks established by group sparse representation.
The current study employed group sparse representation to detect gender differences in brain responses during natural emotional experience. This was motivated by previous findings in psychology and cognitive neuroscience that females respond more strongly than males to affective stimuli [7,47,[58][59][60][61][62]. It is well recognized that females responds more strongly than males to affective stimulus, particularly in limbic regions such as the anterior cingulate cortex in response to negative valence [60,62]. Here, we examined the gender differences in emotional experience using a ecologically-valid paradigms [9]. We found support from behavioral ratings that females were more engaged by this dynamic emotional stimulus than males (Table 1). Furthermore, our study identified several brain regions that showed stronger functional activations in female. Many of these brain regions are known to contribute to affective processing. Several clusters are detected at the anterior insula (cluster #7) and anterior cingulate cortex (cluster #2, 4), which coactive in response to emotional salience [63]; the anterior insula is particularly postulated as a hub region that integrates intero-and exteroceptive information and generates the subjective experience of emotion [64,65]. In addition, the default mode network, anchored by the precuneus (cluster #1, 8,14) and posterior cingulate cortex (PCC) (cluster #1, 14), is activated more strongly in females than males, potentially reflecting greater episodic memory retrieval and self reflection associated with emotion and pain [66,67]. The superior temporal gyrus also showed stronger activations in females than males (cluster #11, 12), consistent with its role in processing emotional facial stimuli [68]. Although the underlying mechanism of these findings requires further investigation, they support greater responses to naturalistic emotional stimuli in females than males.
Our study not only demonstrated gender differences in naturalistic emotional experience, but also characterized meaningful functional brain networks. As shown in Fig 3, several brain networks identified with group sparse representation are engaged in audio-visual processing, including visual (#19, #143, #37) and auditory networks (#72). In addition, networks well established in the resting state literature are also detected, including the dorsal attention network (#28) associated with attention-demanding activities [69], salience network (#119) responding for salience detection [70], and default mode network (#46) [71]. Interestingly, sparse representation approach also identified brain networks that appear to be combinations of different network regions, such as auditory and supplementary motor networks (#16), default mode and salience networks (#81), default mode and cerebellar networks (#54, #132), salience and executive control networks (#64), auditory and visual networks (#61), which could reflect the interactions between these brain regions during movie viewing. These networks show consistent spatial and temporal patterns across sparse representation and ICA methods (Figs 3, 4 and 7). We then further assessed the test-retest reliability of these brain Sparse coding reveals gender differences networks identified by our method. While participants might be less engaged in viewing the same movie for the second time, we still observed good test-retest reliability for several networks, confirming that the natural viewing condition manifests improved reliability over resting state condition [72]. On the other hand, we found that the visual, auditory and attention networks show higher scan-wise ICC values than that of default-salience network (Fig 6). The lower reliable brain activities of higher order brain regions might indicate affective experiences might have changed during repeated viewing. Nonetheless, as our main goal of reliability analysis is to compare the performance of our method to other ICA-based methods, the absolute values for reliability is not the focus of this study. In addition, sparse representation method could separate artifact components relating to head-motion, white matter, susceptibilitymotion, cardiac, and MRI acquisition/reconstruction (S11 Fig) [45,46]. Some of these artifact components might also be influenced by the naturalistic stimuli, such as ones related to the head motion. Overall, our method can effectively and robustly detect meaningful brain networks driven by naturalistic stimuli.
We further assessed the effect of parameter setting in sparse representation with rigorous statistical analysis (S6 and S7 Figs). We found that the gender differences in functional connectivity were very robust to parameter settings-significantly greater functional connectivity in female than male group was identified across all parameter settings (P<0.005 for voxel height and FDR-corrected P<0.05 for cluster extent). The setting used for the main analysis, λ as 0.5 and m as 200, generated the highest number of clusters showing significant gender difference (S6 Fig). Network components extracted with this setting also showed the highest test-retest reliability (S8 Fig), suggesting the selection of this parameter setting is relatively more robust. Furthermore, our application was not significantly impacted by over-completion, that is, less observations (n) than predictors (p). Here, the observations are time points of fMRI scan, and the predictors refer to the learned dictionary atoms. Sparse representation using either a segment of fMRI data (n<p) or the whole data (n>p) generated very similar results (S1-S4 Figs, S2 and S3 Tables), suggesting that our results are robust to over-completion. Other studies have also shown that over-completion is a valid setting in Lasso framework and appears to offer several advantages, including greater robustness when facing with noises and other kinds of degradation, superior flexibility in matching the generative model to the structure of input data and better approximation of the underlying statistical distribution [25,[73][74][75].
In summary, our study developed a novel application of group sparse representation on naturalistic fMRI data. Using rigorous statistical analyses, we demonstrated that group sparse representation could reliably identify functional networks during natural viewing and detected subtle gender differences in these networks with higher sensitivity than ICA-based method. Hence, the group sparse representation framework could offer a suitable approach in analyzing dynamic fMRI data during naturalistic paradigms. In the future, dimension reduction of input data can be investigated to further improve the effectiveness and efficiency of our framework. In addition, it would be useful to test our framework to naturalistic fMRI datasets acquired during different emotional movies to further validate and investigate gender differences during affective processing. Moreover, this framework could potentially be useful in clinical research, to detect abnormal brain function and develop neuroimaging markers for neuropsychiatric disorders. One potential concern is that if large differences exist between the groups, such as patients and healthy controls, it may be inappropriate to pool fMRI data together for dictionary training. Further investigation is needed to further explore the potential of sparse representation in clinical neuroscience.  Table. Brain regions with greater activation in females than males as detected by group sparse representation for whole fMRI data(sorted by p-value in ascending order). (DOCX) S3 Table. Brain regions with greater activation in females than males as detected by tensor ICA for whole fMRI data(sorted by p-value in ascending order). (DOCX) S4 Table. The questionnaire for subjects rating their experience after scanning session. (DOCX) S5 Table. Brain areas with greater activation in females than males as detected by temporal concatenation group ICA (sorted by p-value in ascending order). (DOCX) S6 Table. Brain areas with greater activation in females than males as detected by tensor ICA (sorted by p-value in ascending order). (DOCX) S7 Table. Brain regions with greater activation in females than males as detected by spatial concatenation group ICA (sorted by p-value in ascending order). (DOCX)