The authors have declared that no competing interests exist.

Conceived and designed the experiments: NL AP. Performed the experiments: NL. Analyzed the data: NL. Contributed reagents/materials/analysis tools: NL AP LJ. Wrote the paper: NL AP LJ.

Graph theory deterministically models networks as sets of vertices, which are linked by connections. Such mathematical representations of networks, called graphs, are increasingly used in neuroscience to model functional brain networks. It has been shown that many forms of structural and functional brain networks have small-world characteristics and thus constitute networks of dense local and highly effective distal information processing. Motivated by a previous small-world connectivity analysis of resting EEG data, we explored the implications of a commonly used analysis approach, namely comparing small-world characteristics between two groups using classical inferential statistics. This, however, becomes problematic when using measures of inter-subject correlations, as is the case in commonly used brain imaging methods such as structural and diffusion tensor imaging, with the exception of fibre tracking. Since there is only one data point for each voxel or region, a measure of connectivity can only be computed for a group. To empirically determine an adequate small-world network threshold and to generate the necessary distribution of measures for classical inferential statistics, samples are generated by thresholding the networks on the group level over a range of thresholds. We believe that there are two main problems with this approach. First, the number of thresholded networks is arbitrary. Second, the obtained thresholded networks are not independent samples. Both issues become problematic when using commonly applied parametric statistical tests. Here, we demonstrate potential consequences of the number of thresholds and the non-independence of samples in two examples (using artificial data and EEG data). Finally, we present alternative approaches that overcome these methodological issues.

The human brain is organized as a highly interconnected structural network that functionally connects adjacent and distant brain areas

Networks of so-called “small-world” topology constitute an ideal balance of efficient information transmissions between distant nodes (small path length), while retaining efficient local information processing (high clustering coefficient)

Although this research strategy provides promising insights, the commonly used analysis approach is associated with some particular statistical problems. In this paper we will discuss these problems and will present two alternative approaches that overcome these methodological issues.

Usually, small-world network analyses in the context of exploring interindividual differences aim to test whether parameters of network efficiency (i.e. path length and average clustering coefficient) are related to specific populations. For example, the researcher aims to examine whether two groups differ in terms of particular network parameters. In order to accomplish this comparison, the network parameters are calculated for each group separately and then compared between these groups using parametric tests, such as t-tests or ANOVAs. A common approach is to calculate various measures of dependency (e.g. correlation) between brain attributes obtained from regions of interest (e.g. cortical thickness, brain activity, etc.) that are extracted from anatomical or neurophysiological data (e.g. EEG, MEG, fMRI, MRI, or DTI). This leads to region-wise within-subject measures of connectivity. If measures of connectivity are obtained for each group separately - as with structural magnetic resonance imaging (sMRI) and diffusion tensor imaging (DTI) data (except for fibre tracking data) -

A commonly used strategy to conduct statistical comparisons for the latter type of data is to use different and arbitrarily chosen thresholds from which the different network measures are calculated

Although frequently used, this “multiple-thresholds-approach” is associated with several problems. First, the sample size varies with the number of chosen thresholds, and this influences the power of statistical testing. Second, the sets of thresholded mean correlation matrices are not independent (as classical statistics would require), because the information in a sparser correlation matrix is also contained in a denser correlation matrix. This is particularly problematic for parametric statistical tests, since they require independence of the data. Third, not only the number of thresholds but also the range of thresholds used to estimate the network parameters is arbitrary. For example, one could restrict the thresholds to a range from 0.2 to 0.6 or to a range from 0.3 to 0.8. Using these different ranges will generate different results.
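The non-independence described above can be made concrete with a minimal sketch. It assumes hypothetical random data and that an edge is kept whenever the correlation is at or above the threshold; the function name `threshold_edges`, the data dimensions, and the three thresholds are illustrative choices, not taken from the study. Under these assumptions, every sparser network is a subset of every denser one.

```python
import numpy as np

def threshold_edges(corr, r):
    """Return the undirected edges (i, j), i < j, whose correlation is >= r."""
    iu, ju = np.triu_indices(corr.shape[0], k=1)
    keep = corr[iu, ju] >= r
    return set(zip(iu[keep].tolist(), ju[keep].tolist()))

# Hypothetical data: 30 subjects x 20 regions, one value per region and subject.
rng = np.random.default_rng(0)
data = rng.standard_normal((30, 20))
corr = np.corrcoef(data, rowvar=False)      # 20 x 20 inter-regional correlations

thresholds = [0.1, 0.2, 0.3]                # illustrative, increasing thresholds
edge_sets = [threshold_edges(corr, r) for r in thresholds]

# Each sparser network (higher threshold) is contained in the denser one.
nested = all(edge_sets[k + 1] <= edge_sets[k] for k in range(len(edge_sets) - 1))
print([len(e) for e in edge_sets], nested)
```

Because the edge sets are nested by construction, measures derived from them cannot be treated as independent observations.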

Although the above-mentioned approach is not entirely wrong, since one may wish to compare the profiles of network parameters across the different thresholds, this approach can nevertheless lead to ambiguous results. In this paper, we will demonstrate with two examples how this approach can lead to ambiguous results. In the last part we will propose an alternative approach, which uses randomisation statistics and does not suffer from the above-mentioned statistical problems.

This study was conducted according to the principles expressed in the Declaration of Helsinki. The study was approved by the local ethics committee (Kantonale Ethikkomission: EK-80/2008). All participants provided written informed consent for the collection of samples and subsequent analysis.

For the first illustration of the problem associated with this approach we used EEG data from a previous study

We demonstrated this by using three different numbers of thresholds while keeping the range constant (range: 0.65–0.99). The sparsest network (threshold r = 0.99) was omitted, because the networks became no longer consistent. In the first trial we thresholded the connectivity matrix 10 times (increments: 0.034), resulting in 10 networks per group. In the second trial we thresholded the connectivity matrix 15 times (increments: 0.0227), and in the third trial 35 times (increments: 0.01). In a second step, the small-world parameters were calculated for each threshold per group. The different thresholded networks served as the different measurement units within each group.
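The thresholding procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the original analysis code: the correlation matrix is built from hypothetical data, an edge is kept when the correlation is at or above the threshold, the clustering coefficient is the mean local clustering coefficient of the binary graph, and the path length is averaged over connected node pairs only. Spacing the thresholds evenly from 0.65 up to, but excluding, 0.99 approximately reproduces the reported increments.

```python
import numpy as np

def clustering_coefficient(adj):
    """Mean local clustering coefficient of a binary undirected graph."""
    a = adj.astype(float)
    degree = a.sum(axis=1)
    triangles = np.diag(a @ a @ a) / 2.0         # triangles through each node
    possible = degree * (degree - 1) / 2.0       # neighbour pairs of each node
    local = np.divide(triangles, possible,
                      out=np.zeros_like(possible), where=possible > 0)
    return float(local.mean())

def characteristic_path_length(adj):
    """Mean shortest path length over connected node pairs (assumption:
    unreachable pairs are ignored); NaN if the graph has no edges."""
    a = adj.astype(int)
    n = a.shape[0]
    dist = np.full((n, n), np.inf)
    np.fill_diagonal(dist, 0.0)
    reached = np.eye(n, dtype=bool)
    frontier = np.eye(n, dtype=bool)
    step = 0
    while frontier.any():                        # breadth-first search from all nodes
        step += 1
        frontier = ((frontier.astype(int) @ a) > 0) & ~reached
        dist[frontier] = step
        reached |= frontier
    pairs = np.isfinite(dist) & ~np.eye(n, dtype=bool)
    return float(dist[pairs].mean()) if pairs.any() else float("nan")

def sweep(corr, lo=0.65, hi=0.99, n_thresholds=10):
    """Threshold corr at n_thresholds evenly spaced values in [lo, hi)."""
    rows = []
    for r in np.linspace(lo, hi, n_thresholds, endpoint=False):
        adj = corr >= r
        np.fill_diagonal(adj, False)             # no self-connections
        rows.append((float(r), int(adj.sum()) // 2,
                     clustering_coefficient(adj),
                     characteristic_path_length(adj)))
    return rows

# Hypothetical data with a shared signal so that correlations are high.
rng = np.random.default_rng(1)
shared = rng.standard_normal((200, 1))
data = 2.0 * shared + rng.standard_normal((200, 12))
corr = np.corrcoef(data, rowvar=False)
for n in (10, 15, 35):
    print(n, len(sweep(corr, n_thresholds=n)))
```

Each call to `sweep` returns one row (threshold, number of edges, clustering coefficient, path length) per threshold, so the number of "measurements" per group is set entirely by `n_thresholds`.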

Thus, we obtained 10 measurements for each small-world parameter in the first trial, 15 in the second trial, and 35 in the third trial. Afterwards, we separately compared these small-world parameters between the low IQ and the high IQ groups for each trial by using a t-test for independent samples (p<0.05). Because p-values depend on sample size, we also calculated effect sizes according to Cohen
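Why the number of thresholds matters can be illustrated with a small sketch. It assumes two hypothetical, purely illustrative threshold profiles of a network measure that differ by a constant offset; the function names `pooled_t_and_d` and `measure` and the offset of 0.03 are assumptions made for this example. For equal group sizes n, the pooled-variance t statistic equals Cohen's d multiplied by sqrt(n/2), so sampling the same profiles more densely inflates t while d stays nearly constant.

```python
import numpy as np

def pooled_t_and_d(x, y):
    """Independent-samples t statistic (pooled variance) and Cohen's d."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1)
                  + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    diff = np.mean(x) - np.mean(y)
    d = diff / np.sqrt(pooled_var)
    t = diff / np.sqrt(pooled_var * (1.0 / nx + 1.0 / ny))
    return float(t), float(d)

def measure(r, offset):
    """Hypothetical network measure that falls linearly with the threshold r."""
    return (1.0 - r) + offset

results = {}
for n in (10, 15, 35):
    # Same threshold range every time; only the number of thresholds changes.
    r = np.linspace(0.65, 0.99, n, endpoint=False)
    results[n] = pooled_t_and_d(measure(r, 0.03), measure(r, 0.0))
    print(n, results[n])
```

The p-value follows from t with 2n − 2 degrees of freedom, so it shrinks as thresholds are added even though the underlying group difference is unchanged.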

For the first trial (thresholding the matrix 10 times), there were no significant differences between the low and the high IQ groups regarding small-world parameters (clustering coefficient: t_{(8)} = 1.87, p = 0.078, Cohen’s d = 0.42; path length: t_{(8)} = −1.30, p = 0.21, Cohen’s d = 0.31; number of edges: t_{(8)} = 1.85, p = 0.08, Cohen’s d = 0.42). For the second trial (thresholding the matrix 15 times), we found significantly more edges (t_{(13)} = 2.40, p = 0.02, Cohen’s d = 0.38), a higher clustering coefficient (t_{(13)} = 3.07, p = 0.004, Cohen’s d = 0.46), and no difference in characteristic path length (t_{(13)} = −1.51, p = 0.14, Cohen’s d = 0.25) for the high IQ group compared to the low IQ group. For the third trial (thresholding the matrix 35 times), t-tests revealed highly significant differences between the high and the low IQ groups. There was a significantly increased number of edges (t_{(33)} = 3.52, p = 7.76 × 10^{−4}, Cohen’s d = 0.39) and a higher clustering coefficient (t_{(33)} = 4.44, p = 3.33 × 10^{−5}, Cohen’s d = 0.47) in the high IQ group. In contrast, we found a significantly decreased characteristic path length (t_{(33)} = −2.24, p = 0.02, Cohen’s d = 0.26). An overview of this data is presented in

Mean values for the small-world parameters clustering coefficient, path length, and number of edges. We thresholded the correlation matrix 10, 15, and 35 times; this resulted in different statistical results. For the version with 10 increments, t-tests revealed no statistical differences. For the version with 15 increments, the clustering coefficient and number of edges were significantly increased in the high IQ group compared to the low IQ group. In the version with 35 different thresholds, the comparison between the high and low IQ groups revealed significant effects for all small-world parameters. The high IQ group showed a significantly enhanced small-world topology. For an optimized display, the numbers of edges were scaled (number of edges divided by 1000).

In our second example, we use a simulation to illustrate how the commonly used multiple-threshold-approach may lead to false positive results. An illustration of the method is displayed in

Networks of two groups based on artificial data. The networks were thresholded over a set of thresholds.

Comparing the random networks of the two simulated groups for the first trial (10 thresholded connectivity matrices) within the threshold range of 0.86–0.91 revealed no significant difference in any of the small-world parameters. In the second trial (25 thresholded connectivity matrices), we found significantly more edges (t_{(23)} = 2.18, p = 0.03, Cohen’s d = 0.29) and a lower characteristic path length (t_{(23)} = −2.09, p = 0.04, Cohen’s d = 0.28) for the first group. For the third trial (50 thresholded connectivity matrices), the t-tests revealed highly significant differences between the two simulated groups. There was a significant increase in the number of edges (t_{(48)} = 3.19, p = 0.002, Cohen’s d = 0.30) and in the clustering coefficient (t_{(48)} = 2.20, p = 0.03, Cohen’s d = 0.21). In contrast, we found a significant decrease in the characteristic path length (t_{(48)} = −3.05, p = 0.003, Cohen’s d = 0.29).

Within the middle threshold range (0.50–0.54), there were no significant differences between the random networks of the two simulated groups in the first trial (10 thresholded connectivity matrices). For the second trial (25 thresholded connectivity matrices), however, the two simulated groups differed significantly only in the clustering coefficient (t_{(23)} = −2.19, p = 0.03, Cohen’s d = 0.29). The number of edges showed a trend towards a decrease in group one (t_{(23)} = −1.83, p = 0.07, Cohen’s d = 0.23). In the third trial (50 thresholded connectivity matrices), the random network of the first group showed a decreased number of edges (t_{(48)} = −2.61, p = 0.03, Cohen’s d = 0.25) and a decreased clustering coefficient (t_{(48)} = −2.97, p = 0.02, Cohen’s d = 0.28) compared to the random network of the second group. The path length of the first group was significantly higher (t_{(48)} = 2.24, p = 0.03, Cohen’s d = 0.22).

For the lower threshold range (0.001–0.06), the first and second trials revealed no significant differences, but the third trial (50 thresholded connectivity matrices) showed a lower number of edges (t_{(48)} = −2.41, p = 0.02, Cohen’s d = 0.23) and a lower clustering coefficient (t_{(48)} = −2.21, p = 0.03, Cohen’s d = 0.22) for the first group’s random network. All the results are presented in

Displayed are the results of the second example, which used artificial data. The comparison of the two networks, based on artificial data, revealed several significant differences. Depending on the number of thresholds (defining the different measurement units within each group) and the threshold range used for the comparison, completely distinct results could be obtained. For an optimized display, the numbers of edges were scaled (number of edges divided by 1000).

The same data set was used as in the first example, which made use of the multiple-thresholds-approach (see above). In line with that example, we created a mean connectivity matrix (averaged across all subjects), which was then thresholded with a set of different thresholds (range r = 0.55–0.95, increments: 0.05). In the second step, small-world network parameters (clustering coefficient, path length) were calculated for the different thresholded mean coherence matrices. Here we present the results for the chosen threshold that best corresponds to a small-world topology (r = 0.85). This threshold was applied to the mean connectivity matrices of the low and high IQ groups. This is only one of several possible approaches to choosing a threshold. In the upcoming discussion section we delineate the other possibilities. For more information regarding the results of the other thresholds please consider

As in the first example of the

The permutation analysis revealed that the high IQ group has significantly more edges than the low IQ group (p<0.001). Moreover, we found an increased clustering coefficient (p<0.001) and a decreased characteristic path length (p = 0.004) for the high IQ group compared to the low IQ group. Thus, the high IQ group exhibits significantly more small-world topology. All results are summarized in

Displayed are the distributions of the randomly generated group pair differences. The red arrow indicates where the differences of the real data (i.e. the empirical difference between high and low IQ groups) are located within the distribution. The results show that the high IQ group revealed increased small-world network parameters.

In the present example, we used the same data set as in the second example of the multiple-thresholds-approach with artificial random networks (connectivity matrices). Again, we assume two different groups with 30 subjects per group, with only one value per node and subject (e.g. the cortical thickness or FA value in this specific region). We again have 84 simulated brain regions per subject, and we allocated random values to each simulated brain region for each single subject. These data were used to construct the correlation matrix between all pairs of nodes, resulting in an 84×84 association matrix (network) for each group. These matrices served as representations of the networks of two different groups.
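The construction of the artificial group networks can be sketched as follows. This is a minimal illustration: the source specifies only "random values", so the standard normal distribution, the seed, and the function name `simulate_group` are assumptions.

```python
import numpy as np

N_SUBJECTS, N_REGIONS = 30, 84
rng = np.random.default_rng(42)     # arbitrary seed, for reproducibility only

def simulate_group(rng):
    """Return an 84 x 84 inter-regional correlation matrix for one group.

    Each subject contributes one random value per region (assumption:
    standard normal), and correlations are computed across subjects.
    """
    values = rng.standard_normal((N_SUBJECTS, N_REGIONS))
    return np.corrcoef(values, rowvar=False)

corr_a = simulate_group(rng)        # network of the first simulated group
corr_b = simulate_group(rng)        # network of the second simulated group
print(corr_a.shape, corr_b.shape)
```

Since both groups are drawn from the same distribution, any significant group difference found in these networks is a false positive.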

However, instead of using the

The permutation analysis revealed no significant differences regarding the clustering coefficient (p = 0.46) or the characteristic path length (p = 0.88). The results are illustrated in
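A group-level permutation test of this kind can be sketched as follows. The sketch rests on several assumptions that are not taken from the source: subjects are randomly reassigned to two groups of the original sizes, the group networks are rebuilt and thresholded at a single illustrative threshold of 0.2, the test is two-sided, and the p-value uses the common "+1" correction. The names `group_measure` and `permutation_test` are hypothetical, and only the clustering coefficient is shown.

```python
import numpy as np

def clustering_coefficient(adj):
    """Mean local clustering coefficient of a binary undirected graph."""
    a = adj.astype(float)
    degree = a.sum(axis=1)
    triangles = np.diag(a @ a @ a) / 2.0
    possible = degree * (degree - 1) / 2.0
    local = np.divide(triangles, possible,
                      out=np.zeros_like(possible), where=possible > 0)
    return float(local.mean())

def group_measure(values, threshold):
    """Clustering coefficient of the thresholded group correlation network."""
    corr = np.corrcoef(values, rowvar=False)
    adj = corr >= threshold
    np.fill_diagonal(adj, False)
    return clustering_coefficient(adj)

def permutation_test(group_a, group_b, threshold, n_perm=1000, seed=0):
    """Two-sided permutation test on the group difference of the measure."""
    rng = np.random.default_rng(seed)
    observed = group_measure(group_a, threshold) - group_measure(group_b, threshold)
    pooled = np.vstack([group_a, group_b])
    n_a = group_a.shape[0]
    null = np.empty(n_perm)
    for k in range(n_perm):
        order = rng.permutation(pooled.shape[0])   # random group reassignment
        null[k] = (group_measure(pooled[order[:n_a]], threshold)
                   - group_measure(pooled[order[n_a:]], threshold))
    p = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
    return observed, null, float(p)

# Hypothetical data: two groups of 30 subjects with 84 regional values each.
rng = np.random.default_rng(7)
group_a = rng.standard_normal((30, 84))
group_b = rng.standard_normal((30, 84))
observed, null, p_value = permutation_test(group_a, group_b, threshold=0.2)
print(observed, p_value)
```

The null distribution is generated from the data themselves, so the test does not rely on independent samples of thresholded networks or on a particular distributional form.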

Displayed are the distributions of the randomly generated group pair differences. The red arrow indicates where the differences of the original data are located within the distribution. The results show that there are no significant differences regarding the clustering coefficient or the characteristic path length.

The same data set was used as in Example 1 of the

The t-test for independent samples comparing the high IQ group vs. the low IQ group revealed a significantly increased number of edges (t_{(57)} = 2.83, p = 0.006), a significantly increased clustering coefficient (t_{(57)} = 3.54, p = 0.001), and a significantly decreased characteristic path length (t_{(57)} = −2.70, p = 0.009) (See

In this example we used a similar data set as in the second example of the

The t-test for independent samples comparing the two groups did not reveal significant effects for the clustering coefficient (t_{(58)} = 0.24, p = 0.81) or for the characteristic path length (t_{(58)} = 0.56, p = 0.58).
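The single-subject approach can be sketched as follows, assuming each subject has its own connectivity matrix. The simulated time series, their dimensions, the threshold of 0.1, and the names `subject_measure` and `pooled_t` are all hypothetical choices for this illustration. Each subject contributes exactly one value per network parameter, so the observations entering the t-test are independent and the degrees of freedom reflect the number of subjects.

```python
import numpy as np

def clustering_coefficient(adj):
    """Mean local clustering coefficient of a binary undirected graph."""
    a = adj.astype(float)
    degree = a.sum(axis=1)
    triangles = np.diag(a @ a @ a) / 2.0
    possible = degree * (degree - 1) / 2.0
    local = np.divide(triangles, possible,
                      out=np.zeros_like(possible), where=possible > 0)
    return float(local.mean())

def subject_measure(timeseries, threshold):
    """Clustering coefficient of one subject's thresholded correlation network."""
    corr = np.corrcoef(timeseries, rowvar=False)   # regions x regions
    adj = corr >= threshold
    np.fill_diagonal(adj, False)
    return clustering_coefficient(adj)

def pooled_t(x, y):
    """Independent-samples t statistic (pooled variance) and its df."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / df
    t = (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var * (1.0 / nx + 1.0 / ny))
    return float(t), df

# Hypothetical data: 30 subjects per group, 100 time points, 20 regions each.
rng = np.random.default_rng(3)
N_SUBJECTS, N_TIMEPOINTS, N_REGIONS = 30, 100, 20

def simulate_group():
    return np.array([
        subject_measure(rng.standard_normal((N_TIMEPOINTS, N_REGIONS)), 0.1)
        for _ in range(N_SUBJECTS)
    ])

group_a = simulate_group()          # one clustering coefficient per subject
group_b = simulate_group()
t_value, df = pooled_t(group_a, group_b)
print(t_value, df)
```

Unlike in the multiple-thresholds-approach, the sample size here is fixed by the number of subjects rather than by an arbitrary number of thresholds.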

Graph-theoretical approaches are an elegant way to describe functional or structural brain networks on the basis of large anatomical and neurophysiological data sets. Although attractive, these techniques are associated with some statistical problems, which have been described in this paper. A major problem is the basis on which inferential statistics are performed when statistically testing the measures obtained from graph-theoretical analyses. A typical approach is to compare the graph-theoretical measures between two different groups. Several papers have adopted the multiple-thresholds-approach by using different thresholds for which different graphs are computed separately for each group. The obtained graph-theoretical measures for each group are then subjected to a between-groups statistical test. Typically this approach is used in the context of graph-theoretical network analyses conducted with cortical thickness and FA data. Since for each voxel or region there is only one data point available, connectivity measures can only be computed for an entire group. Thus, there is no distribution of measures available to calculate statistical tests. To generate the necessary distribution of measures for classical inferential statistics, some studies generated an artificial distribution by thresholding the networks on the group level over a range of thresholds and thus collected several connectivity measures. These different measures were then subjected to between-groups statistical tests. One problem with this approach is that these measures are not independent of each other, since the information of sparser networks (thresholded using high thresholds) is also included in denser networks (thresholded using low thresholds). These networks, and thus the derived measures, are strongly inter-correlated and should not be treated as coming from different subjects. This is a serious problem, especially for parametric inferential statistical analyses, which require independence between the measurements. A further problem is that the power of the statistical tests strongly depends on the number of measurements and, in this case, on the number of thresholds used.

We demonstrated these problems on the basis of a real EEG data set and simulated data. As expected, the p-values strongly depend on the number of thresholds. Thus, a researcher could easily manipulate the obtained p-value by arbitrarily changing the number of thresholds until the desired p-value is obtained. To circumvent this problem, effect size measures are more suitable because they are independent of sample size. Indeed, the effect sizes we obtained were similar regardless of the number of thresholds. Therefore, effect sizes are an important measure that should be reported alongside the p-values if one still uses the multiple-thresholds-approach. If one is really interested in comparing the profiles of the network parameters across the different thresholds, randomization tests should be used, since they do not require independence of the data.

We described two different approaches that deal with the non-independence problem in a valid manner, namely the group-level-permutation-statistic-approach and the single-subject-connectivity-matrices-approach. For intra-subject connectivity measures, such as correlations between time series of resting-state fMRI, coherence measures of EEG, or measures of dependency obtained by fibre tracking in diffusion tensor imaging, both suggested approaches are applicable. Whether the group-level-permutation-statistic-approach or the single-subject-connectivity-matrices-approach should be employed depends on the available data and the research question at hand. The advantages of the randomisation procedure are that permutation statistics can be applied when the assumptions of classical inferential statistics are untenable or distribution of the data is unknown and sample size is small

Another unsolved problem within the thresholding procedures is that there is currently no definitive and generally accepted strategy for applying particular thresholds in graph-theoretical network analyses. How large should the threshold steps be? What are the smallest or largest thresholds that one can use? There are currently no concrete answers to these questions. Nonetheless, we present here and in our previous studies

Using unthresholded weighted connectivity matrices (as demonstrated above) is another possibility to statistically test the network parameters, but this approach can also generate long computation times. In addition, thresholded networks exhibit a clearer small-world topology, because the noise of the data is reduced by the thresholding procedure

Using graph-theoretical network approaches in the context of neuroscience research is a relatively young scientific field. Although promising, this approach can be associated with some methodological problems. Apart from the thresholding problem, there are several further methodological issues. For example, the set of nodes of the network has to be carefully selected, as it largely determines the connections and therefore also the interpretation of the brain networks

For most studies, anatomical templates such as Brodmann areas or the Automated Anatomical Labeling (AAL) atlas were used. An immense advantage of using an anatomical template is that networks from different studies, even functional and structural networks, can be directly compared. Therefore, fMRI, structural MRI, and DTI studies most often use one of these template maps. The disadvantage of these template maps is that the regions can vary considerably in size (number of voxels within a node). Therefore, new approaches have been developed to define nodes. One promising approach is to define the nodes on the basis of data-driven techniques

Taken together, there are several valid possibilities of dealing with thresholding in network analysis. The choice of approach should be based on the particular hypothesis, the amount of data, the methods used for network analysis, and the resources that are available for the computations. We suggest that if a connectivity matrix can be calculated for each individual subject, one should not create mean connectivity matrices for a whole group and compare this mean connectivity between different groups.

Displayed are the distributions of the randomly generated group pair differences for all thresholds. The red arrow indicates where the differences of the real EEG data are located within the distribution. The results of all thresholds show that the high IQ group exhibited increased small-worldness.

(DOC)

Listed are the p-values for each small-world parameter of the permutation statistics of the first example (EEG data) for all thresholds. We compared the difference of the real EEG data to 1000 randomly generated group pairs. All thresholds showed an increased small-worldness for the high IQ group.

(DOC)

Listed are the t-values and p-values for each small-world parameter of the single subject method of the first example (EEG data) for all thresholds. We compared the small-world parameters between the high and the low IQ group for each threshold separately. All thresholds showed an increased small-worldness for the high IQ group.

(DOCX)