Identification of alterations associated with age in the clustering structure of functional brain networks

Initial studies using resting-state functional magnetic resonance imaging on the trajectories of the brain network from childhood to adulthood found evidence of functional integration and segregation over time. The comprehension of how healthy individuals’ functional integration and segregation occur is crucial to enhance our understanding of possible deviations that may lead to brain disorders. Recent approaches have focused on the framework wherein the functional brain network is organized into spatially distributed modules that have been associated with specific cognitive functions. Here, we tested the hypothesis that the clustering structure of brain networks evolves during development. To address this hypothesis, we defined a measure of how well a brain region is clustered (network fitness index), and developed a method to evaluate its association with age. Then, we applied this method to a functional magnetic resonance imaging data set composed of 397 males under 31 years of age collected as part of the Autism Brain Imaging Data Exchange Consortium. As results, we identified two brain regions for which the clustering change over time, namely, the left middle temporal gyrus and the left putamen. Since the network fitness index is associated with both integration and segregation, our finding suggests that the identified brain region plays a role in the development of brain systems.


Introduction
The transition from childhood to adulthood involves major changes in cognitive and emotional functions. These changes are driven by continuous dynamic interactions between genetic profiles and environmental factors, which impact the structural and functional brain networks [1][2][3]. Current evidence suggests that such neurodevelopmental changes are the result of synaptic pruning and myelination [4][5][6]. Investigating developmental effects on functional networks is fundamental to better understanding cognition maturation, the hierarchical structure of neural circuitries [2], and the association between neural substrates and many psychiatric and neurological disorders [7][8][9][10]. PLOS  In several neuroimaging studies, the human brain has been generally modeled as a functional network (graph) composed of regions of interest (vertices) and their connections (edges). Pioneering studies based on resting-state protocol and functional magnetic resonance imaging (fMRI) investigated the trajectories of brain networks from childhood to adulthood and identified evidence of an age-evolving functional integration and segregation [11,12]. These studies found that during the developmental process, the organization of functional networks evolves from a local to a distributed architecture, mainly with regard to long-range connections [7,12]. In addition, other studies have suggested that children present stronger subcortical-cortical and weaker cortical-cortical connectivity than young adults do [1,13]. As concluded in a review written by [14], who combined results from [12,13,15,16], there is evidence of greater functional segregation in childhood and increased integration in adulthood.
Recent approaches have focused on the hypothesis that the functional brain network is organized into spatially distributed clusters (sets of vertices that are more tightly connected within a module and less connected to the vertices of other modules) that are associated with specific cognitive functions [17][18][19]. Some examples of these modules are the visual, default mode, cognitive control, motor, and auditory systems, usually identifiable using resting-state fMRI protocols in groups of subjects.
Several studies have suggested that this modular structure evolves across the life span [20,21]. For example, [15,22,23] investigated the influence of typical trajectories on the modular structure of brain networks. [22] focused on adulthood (from 25 to 65 years of age) while [15] focused on adolescence and early adulthood (from 7 to 31 years of age). [20,21,24,25] identified regions with linear and quadratic trajectories from childhood to adulthood by separately analyzing intra-and inter-modular connectivity. In a study based on a sample of 780 American youths (8 to 22 years old), [25] have demonstrated that brain functional connectivity is subject to a process of modular evolution, balancing within-and between-module connections. This study also showed that individual variability predicts cognitive performance. Moreover, in a review study of functional connectivity in resting-state fMRI, [26] concluded that developmental changes occur through segregation processes of local regions and integration of distant regions from distinct brain modules.
However, since there is evidence that integration and segregation processes are age dependent, the relevance of a brain region to each functional module could be different in childhood, adolescence, and adulthood. As a hypothetical example, one region could present an increased/decreased integration with a module when the individual gets older. However, studies exploring the age dependence of the goodness-of-fit of brain regions into a functional cluster are, to the best of our knowledge, scarce to nonexistent. One challenge in addressing this point is the lack of a proper methodological framework to analyze the data through this cluster analysis perspective. Most studies describe developmental changes in integration/segregation by using seed-based or independent components analyses.
In the current study, we tested the hypothesis that some brain regions change in terms of clustering structure across neurodevelopment (thus, an alternative manner to analyze segregation/integration during development). To investigate this hypothesis, we: (i) first defined an index that measures how well one region of interest (ROI) is clustered (i.e., a network fitness index) in its respective sub-system; and (ii) we identified the ROIs in which this index is associated with age. The intuitive idea is that, if the network fitness index of a region is altered across neurodevelopment, then this region presents an altered connectivity within its module (segregation) and also between regions belonging to other modules (integration). Finally, we applied this method to a large fMRI dataset composed of 397 males under 31 years of age (6.47-30.78 years) collected from the Autism Brain Imaging Data Exchange (ABIDE) Consortium.

Materials and methods
The data pre-processing, the construction of functional networks, and the clustering approach used here are similar as those described in our previous studies [27,28] using the ABIDE dataset. We briefly describe them in the following sub-sections.

Data description and pre-processing
An fMRI dataset composed of 1112 individuals was downloaded from the ABIDE Consortium website (http://fcon_1000.projects.nitrc.org/indi/abide/). The research performed at the ABIDE contributing sites complied with Health Insurance Portability and Accountability Act (HIPAA) guidelines and the 1000 Functional Connectomes Project/International Data-sharing Initiative protocols. All data distributed via the ABIDE website were fully anonymized in compliance with the HIPAA privacy rules, and no protected health information was included. To study how the clustering structure evolves over development, we excluded 367 individuals diagnosed with autism spectrum disorder (ASD), 57 without ASD subtype classification, all the 141 females, and 33 older than 31. To pre-process the fMRI data, we used the Athena pipeline (http://www.nitrc.org/plugins/mwiki/index.php/neurobureau:AthenaPipeline), which excluded 10 subjects. To minimize the effects of head movement during magnetic resonance scanning, we used the "scrubbing" procedure proposed by [29] and used in several other works [30][31][32][33]. Volumes with frame-wise displacement (FD) or temporal derivative of the root mean square (RMS) variance larger than the 95% percentile were excluded (two individuals). One hundred and five individuals with less than 100 volumes after scrubbing were also excluded. Thus, the final dataset used in our analysis was composed of 397 males (mean age ± standard deviation, 16.29 ± 5.61 years) distributed along 19 datasets collected from 16 international sites (there is more than one dataset per site) of the ABIDE Consortium (Table 1). All data collection was conducted with local internal review board approval and in accordance with local internal review board protocols. For further details about this dataset, refer to the ABIDE Consortium website.

Functional connectivity networks
To define the regions of interest (ROIs), we used the CC400 atlas [34]. Thirty-five ROIs including the ventricles were identified by using the MNI atlas and removed, leaving 316 ROIs. The average time-series signal of all voxels belonging to a ROI was considered to be representative of the region. The functional brain network for each subject was constructed by estimating the Pearson's correlation coefficient among the 316 ROIs (i.e., the similarity matrices M t for t = 1, . . ., 397), and the site effects (different fMRI recording sites) were removed using a generalized linear model with the Pearson's correlation coefficient as the response variable and the site as an independent covariate. Residues of this model were considered to be the Pearson's correlation coefficients without the site effect. Then, we discarded negative correlations as described by [12,31,[35][36][37]. Finally, the average functional network ( " M) was constructed by calculating the average of the Pearson's correlation coefficients (covariated by site). The average functional network " M will be used in Clustering analysis section to obtain a reference clustering structure.

Silhouette statistic
The silhouette statistic was proposed in 1987 by [38] to estimate the number of clusters in a dataset. The interpretation of this statistic is that it quantifies how well a specific item is fitted to the cluster it was assigned to by the clustering algorithm. In other words, the silhouette statistic is a measure of the goodness-of-fit of the obtained clustering structure. Thus, the number of clusters that maximize the silhouette statistic is the suggested number of modules in the dataset.
The silhouette statistic can be formalized as follows. Let X = {x 1 , . . ., x n } be the ROIs that are clustered into C = {C 1 , . . ., C k } clusters (sub-networks) by the spectral clustering algorithm (X ¼ S k q¼1 C q ) [39]. Let d(x, y) be the dissimilarity between ROIs x and y (one minus the Pearson's correlation coefficient between x and y), and let |C| be the number of ROIs of C. Then, we define the average dissimilarity of x to all ROIs of cluster C & X as follows: Denote by D q 2 C the cluster to which x q has been assigned by the clustering algorithm and by E q 2 C any other cluster different from D q , for all q = 1, . . ., n. Let a q be the dissimilarity of x q to the cluster to which it was assigned to and b q be the minimum dissimilarity of x q among all other clusters different from the cluster that it was assigned to by the algorithm. In other words, we have the following: Then, the silhouette statistic is defined as: The interpretation of the silhouette statistic is as follows [38]: if s q % 1, then a q ( b q , i.e., the distance of ROI x q to the cluster it was assigned to is smaller than the smallest dissimilarity between x q and other clusters. In other words, the ROI x q has been assigned to an appropriate cluster because the second-best choice cluster is not as close as the actual cluster. If s q % −1, it means that a q ) b q , i.e., the ROI x q lies much closer to the second-best choice cluster than to the cluster that this item was assigned to by the clustering algorithm. Hence, it is more natural to assign the ROI x q to the second-best choice cluster than to the actual cluster because this ROI x q has been "misclassified". Finally, if s q % 0, then a q % b q ; therefore, it is not clear whether x q should have been assigned to the actual cluster or to the second-best cluster because it lies equally far away from both. Based on these interpretations (goodness-of-fit), we will define the silhouette statistic s q as the Network Fitness Index (NFI).

Clustering analysis
To identify the ROIs associated with age in terms of functional clustering, we developed a framework based on the NFI. We describe our problem in the following way: let n be the number of ROIs, t be the number of subjects, and M 1 , . . ., M t be (n × n) similarity matrices representing the functional brain networks, with a column vector age = [age 1 , . . ., age t ] > containing the ages of each subject. We would like to identify the ROIs that cluster in a different manner in accordance with age. To calculate the NFI for each ROI of each subject, it is necessary to apply a common reference clustering structure (the clustering label of the ROIs) to all individuals. Thus, we applied the spectral clustering algorithm [39] on the similarity matrix " M to obtain the reference k sub-networks (clusters) C = {C 1 , . . ., C k }. Briefly, to cluster the ROIs into k modules, the spectral clustering algorithm selects the k eigenvectors associated with the k smallest eigenvalues of the similarity matrix " M and uses them as k-dimensional points. Then, the k-means clustering algorithm is applied to these k-dimensional points. Here, rather than using the usual k-means algorithm inside the spectral clustering, we used the k-medoids algorithm because the latter is more robust to outliers than the former [40]. Similar to other clustering algorithms, the spectral clustering algorithm also requires the number of clusters k as an input. We estimated the number of clusters by using the silhouette criterion, i.e., the number of clusters that maximizes the silhouette statistic [38].

Identification of brain regions associated with age
The identification of ROIs associated with age regarding their clustering structure was performed using a generalized linear model (GLM) with the NFI as the response variable and age as the predictor variable in each site with more than 30 subjects separately (because the ABIDE data set is known to present site effects, i.e., different scanners and acquisition protocols). To remove the effect of the proportion of volumes removed by the "scrubbing" procedure, we included it as a covariate. To combine the p-values obtained from the different sites we carried out the Fisher's method (meta analysis). ROIs presenting differences in nominal p-values greater than 5% between results obtained in data with and without "scrubbing" were discarded. A p-value threshold of 5% after correction for multiple tests by the false discovery rate (FDR) procedure [41] on data with "'scrubbing" was considered to be statistically significant. The whole pipeline is summarized in Fig 1. To further understand how the clustering structure changes at different neurodevelopmental stages, we divided the data into three groups, namely, children (subjects with ages between 6 and 13), adolescents (subjects with ages between 13 and 18), and adults (subjects with ages between 18 and 31): (i) 56 children (mean age ± standard deviation, 10.59 ± 1.61), (ii) 46 adolescents (15.72 ± 1.28), and (iii) 51 adults (22.85 ± 3.94).

Results
We pre-processed the fMRI data and constructed the functional brain networks as described in the "Materials and Methods" section. To select the number of clusters, we used the First, functional connectivity (FC) networks were constructed by estimating the Pearson's correlation coefficient between ROIs. Second, the spectral clustering algorithm was applied to the average functional network to obtain the subnetworks (clusters). Then, the network fitness index (NFI) was calculated for each ROI. To identify the ROIs associated with age, we modeled the NFI using a generalized linear model with the number of volumes removed by the "scrubbing" procedure as a covariate. This procedure was applied to data with and without "scrubbing" per site. P-values of different sites were combined by using the Fisher's method. ROIs that presented a difference in nominal p-values greater than 5% were excluded. The remaining p-values were corrected for false discovery rate. ROIs with a corrected p-value lower than 5% in the scrubbed data were considered to be associated with age. silhouette criterion [38]. Note that the highest value for the silhouette statistic is obtained when the number of clusters is k = 5 (Fig 2). Therefore, we set the number of clusters to k = 5, which is the estimated value by the silhouette statistic and also consistent with other reports [18,19,42]. The five sub-networks identified by the spectral clustering algorithm [39] are illustrated using different colors in Fig 3, namely the cerebellar network (CeN), default mode network (DMN), fronto-parietal network (FPN), somatomotor network (SN), and visual network (VN). We also analyzed when the number of clusters is set to k = 4 due to the similar silhouette statistic. In this case, the default mode and fronto-parietal networks become only one cluster, and the cerebellar system included part of the default mode network (S1 Fig). Thus, we considered k = 5 for further analyses.
Then, the identification of the ROIs associated with age was performed on data with and without "scrubbing". Eighty-five ROIs presenting a difference in nominal p-values greater than 5% between the analysis of data with and without "scrubbing" were excluded (high differences in p-values with and without scrubbing means that they are highly affected by head movement). Finally, the p-values for the remaining 231 ROIs were corrected for multiple tests using the FDR procedure [41]. ROIs with p < 0.05 (after correction) were considered as statistically significant.
Only two ROIS presented clustering goodness-of-fit associated with age, namely the left middle temporal gyrus and the left putamen (Fig 4). Fig 5 illustrates the associations between age and the network fitness index (NFI-a measure of goodness-of-fit of the ROI in the cluster) for both ROIs. Note that the NFI is negatively associated with age (the slopes are −0.005 with p < 0.001 for the left middle temporal gyrus and −0.007 with p < 0.001 for the left putamen). In other words, both ROIs tend to integrate with other brain systems (clusters) from childhood to adulthood.
Additionally, we calculated the proportion of subjects assigned to each brain module in each neurodevelopmental stage (children, adolescents, and adults) (Fig 6). By analyzing Fig 6, it is clear that the left putamen gradually diverges from the sensorimotor network to become   more integrated with the cerebellar system during development. Additionally, after childhood, the left middle temporal gyrus starts diverging from the default mode network to also participate in other systems, particularly, the somatomotor network. Fig 7 shows how the NFI changes over each developmental stage (childhood, adolescence, adulthood). For the left putamen, in fact the NFI decreases as the age increases. In other words, it corroborates the results obtained in Figs 4 and 5 that the left putamen is clustered in a different manner along neurodevelopment (and not in a specific transition between two stages). Furthermore, the same effect is observed in the left middle temporal gyrus.
To certify that these results are not due to numerical fluctuations or another source of error that was not taken into account, we verified the control of the type I error. The ages of the individuals were permuted and the association between the NFI and age re-calculated for the left Identification of alterations associated with age in functional brain networks middle temporal gyrus and left putamen. This procedure was repeated 1000 times. The proportion of falsely rejected null hypothesis for p-values lower than 1, 5, and 10% were 1.1, 3.7, and 10%, respectively for the left middle temporal gyrus and 0.5, 4.5, and 10.7%, respectively for the left putamen. Therefore, these results confirm that the type I error is effectively controlled in this data set.
To exclude the possibility that our results are due to unbalanced data (the number of individuals in the three sites used to estimate the association between NFI and age are 56 children, 46 adolescents, and 51 adults, i.e., a lower number of adults when compared to the set children + adolescents), we carried out the following experiment. We randomly re-sampled (with replacement) the same number of adults and children + adolescents, i.e., 51 adults and 51 from the set children + adolescents. Then, we performed the regression between NFI and age. We repeated this procedure 100 times and estimated the slope and respective p-value for the left middle temporal gyrus and the left putamen. The number of times that the slope's p-values were lower than 5, 1, and 0.1% were 99, 91, 49 for the left middle temporal gyrus, and 100, 96, 59 for the left putamen. Therefore, these results suggest that the significant relationships between NFI and age in these two ROIs are not due to unbalanced data.

Discussion
In the current study, we performed a whole brain functional network analysis to identify regions that are differentially clustered among childhood, adolescence, and adulthood. Functional connectivity was identified by using the Pearson's correlation coefficient. We defined the sub-networks by applying the spectral clustering algorithm, namely, the cerebellar network (CeN), default mode network (DMN), fronto-parietal network (FPN), somatomotor network (SN), and visual network (VN). These five modules are very similar to those that we identified in our previous studies [27,28] using other subsets of the ABIDE dataset.
The left putamen (Fig 4) identified as exhibiting significant age effects on clustering network fitness index (NFI) presented a reduction in the NFI over development (Fig 4), suggesting that this region may function as inter-system bridge with age. An increase in the intermodular connectivity could be explained by the increasing connections between other modules through myelination of the brain pathways [43]. The putamen is well established as an associative region. Thus, it is reasonable to expect developmental effects on this region, mainly regarding its role as integrative region in the overall brain network. The putamen, an element of the basal ganglia, is a hub that connects many systems, and it is involved in several functions, such as motor control [44], learning [45], and motivational outcomes of action [46]. Previous studies have investigated developmental effects on functional integration and segregation [14] by using centrality measures. However, none of them were based on the analysis of the evaluation of the quality of modular organization (i.e., clustering goodness-of-fit). Fig 6 suggests that during development, the left putamen becomes more integrated with the cerebellum. To the best of our knowledge, this is the first study to report this finding. Notably, it is well established in the literature that the putamen plays different roles in motor control [47].
Similarly, the left middle temporal gyrus, an associative region well established in cognitive processes involving memory, was also found to present a reduction in NFI. This finding points out that this region plays a role in integrating the DMN and other subsystems. However, further studies are necessary to understand the role of temporal regions in the dynamics of other brain systems, but this result is in agreement with previous neurodevelopmental studies reporting that DMN organization changes during childhood establishing long-range connections [7,12,15].
As limitations of the current study, we first note that head movement during fMRI scanning may affect functional connectivity estimates in developmental studies [29,48] by increasing short-distance correlations (between blood oxygen level-dependent signals) and decreasing long-distance correlations. Thus, to minimize the effects of head movement in our analyses, we applied the "scrubbing" procedure proposed by [29] and adopted a conservative approach, i.e., we discarded any ROI with difference in p-values greater than 5% between data with and without scrubbing. Since this is a developmental sample, we preferred to discard any findings, which could potentially be altered by scrubbing. Moreover, we included the proportion of scrubbed volumes in each subject as a covariate in the GLM model. The rationale is that individual realignment at preprocessing and scrubbing may not be completely eliminated due to non-linear artifactual components. It is important to remark that data scrubbing impacts the p-values associated with functional connections, by reducing the evidence against the null hypothesis. Thus, our approach based on p-values instead of the correlation coefficient is conservative. Second, note that the dataset is cross-sectional, whereas longitudinal data would be more appropriate for determining neurodevelopmental trajectories and also to identify the subgroup of subjects that presents the characteristics observed here. However, cross-sectional data are sufficient to demonstrate age effects on brain organization. Finally, the data were collected from 16 different sites, which may have used different scanners and protocols. For these reasons, we analyzed each site independently and also included the site as a covariate in GLM to minimize its effects.

Conclusion
In conclusion, we identified two regions with NFI associated with age and that maybe more integrated with other brain system, perhaps playing the role as a bridge between multiple systems. However, additional studies based on diffusion imaging and/or histology are still necessary. Finally, it is important to mention that defining typical developing age effects is a crucial first step to identifying deviant patterns that may lead to future mental disorders [49]. Possible future studies would be the investigation of divergent patterns in clinical and sub-clinical populations.