Choosing Wavelet Methods, Filters, and Lengths for Functional Brain Network Construction

Wavelet methods are widely used to decompose fMRI, EEG, or MEG signals into time series representing neurophysiological activity in fixed frequency bands. Using these time series, one can estimate frequency-band specific functional connectivity between sensors or regions of interest, and thereby construct functional brain networks that can be examined from a graph theoretic perspective. Despite their common use, however, practical guidelines for the choice of wavelet method, filter, and length have remained largely undelineated. Here, we explicitly explore the effects of wavelet method (MODWT vs. DWT), wavelet filter (Daubechies Extremal Phase, Daubechies Least Asymmetric, and Coiflet families), and wavelet length (2 to 24)—each essential parameters in wavelet-based methods—on the estimated values of graph metrics and in their sensitivity to alterations in psychiatric disease. We observe that the MODWT method produces less variable estimates than the DWT method. We also observe that the length of the wavelet filter chosen has a greater impact on the estimated values of graph metrics than the type of wavelet chosen. Furthermore, wavelet length impacts the sensitivity of the method to detect differences between health and disease and tunes classification accuracy. Collectively, our results suggest that the choice of wavelet method and length significantly alters the reliability and sensitivity of these methods in estimating values of metrics drawn from graph theory. They furthermore demonstrate the importance of reporting the choices utilized in neuroimaging studies and support the utility of exploring wavelet parameters to maximize classification accuracy in the development of biomarkers of psychiatric disease and neurological disorders.

Introduction method and length impact subsequent graph metrics, but wavelet type has little effect. Based on our findings, we suggest that researchers use MODWT methods with a wavelet length of 8 or greater, and carefully report their choices to enhance comparability of results across studies.

Ethics Statement
All human subjects provided written informed consent for the study approved by the Institutional Review Board at the University of Minnesota.

fMRI data acquisition and preprocessing
Resting-state fMRI data from 29 healthy controls (11 females; age 41.1 ± 10.6 (SD)) and 29 participants with chronic schizophrenia (11 females; age 41.3 ± 9.3 (SD)) were included in this analysis (See [46] for detailed characteristics of participants and imaging data). A Siemens Trio 3T scanner was used to collect the imaging data, including a 6-min (TR = 2 secs; 180 volumes) resting-state fMRI scan, in which participants were asked to remain awake with their eyes closed, a field map scan, and a T1 MPRAGE whole brain volumetric scan. The fMRI data were preprocessed using FEAT (FMRIB's Software Library in FSL) with the following pipeline: deletion of the first 3 volumes to account for magnetization stabilization; motion correction using MCFLIRT; B0 fieldmap unwarping to correct for geometric distortion using acquired field map and PRELUDE+FUGUE52; slice-timing correction using Fourier-space time-series phase-shifting; non-brain removal using BET; regression against the 6 motion parameter time courses; registration of fMRI to standard space (Montreal Neurological Institute-152 brain); registration of fMRI to high resolution anatomical MRI; registration of high resolution anatomical MRI to standard space. Importantly, the two groups had similar mean RMS motion parameters: Two-sample t-tests of mean RMS translational and angular movement were both not significant (p = 0.14 and p = 0.12, respectively).

Statistical analysis
All calculations were done in MATLAB R2013b (The MathWorks Inc.). We used the WMTSA Wavelet Toolkit for MATLAB (http://www.atmos.washington.edu/~wmtsa/) to perform the wavelet decompositions, and we used the Brain Connectivity Toolbox (https://sites.google. com/site/bctnet/) to estimate values for graph metrics.

Network construction
We extracted average time series for each participant from 90 of the 116 anatomical regions of interest (ROIs) defined by the AAL atlas [47] covering the whole brain and including cortical and subcortical regions but excluding the cerebellar regions and vermis. We performed a battery of wavelet decompositions on each regional mean time series by varying wavelet method (DWT vs. MODWT), wavelet filter (Daubechies Extremal Phase, Daubechies Least Asymmetric, and Coiflet families), and wavelet length (from a minimum of 2 to a maximum of 24). In prior literature, both the discrete wavelet transform (DWT) and the maximal overlap discrete wavelet transform (MODWT) methods have been used to create functional connectivity matrices (see [48] and [49] respectively for examples). DWT is an orthogonal transform, just as the discrete Fourier transform (DFT); MODWT adds redundancy to DWT, and can be thought as a non-downsampled version of it [27].
Wavelet filter and length alter the symmetry and shape of the wavelet (see Fig 1 for illustrations of how wavelet filters differ from one another, and how the same filter of different lengths display very distinct shapes). To examine the effect of wavelet filter, we apply Daubechies Extremal Phase, Daubechies Least Asymmetric, and Coiflet families [27], which together constitute the most widely used orthogonal and compactly supported types of wavelet filters. We abbreviate these three filters types as D (Daubechies Extremal Phase), LA (Daubechies Least Asymmetric), and C (Coiflet). To examine the effect of wavelet length, we vary the length of the filter from a minimum of 2 to a maximum of 24. Note that the Daubechies Least Asymmetric family is only defined for lengths greater than or equal to 8, and the Coiflet family is only defined for lengths that are multiples of 6. The exact range of wavelet lengths for each family is also prespecified by the software package we utilized, namely the WMTSA Wavelet Toolkit for MATLAB. We refer to each wavelet type and length together; for example, D4 refers to the Daubechies Extremal Phase filter that has a length of 4.
Consistent with prior work [19,50], we apply this battery of wavelet decompositions to each regional mean time series and extract wavelet coefficients for the first four wavelet scales, which in this case correspond to the frequency ranges 0.125*0.25 Hz (Scale 1), 0.06*0.125 Hz (Scale 2), 0.03*0.06 Hz (Scale 3), and 0.015*0.03 Hz (Scale 4). For each subject, wavelet method (DWT vs. MODWT), wavelet filter (Daubechies Extremal Phase, Daubechies Least Asymmetric, and Coiflet families), and wavelet length (2-24), we constructed a correlation matrix whose ij th elements were given by the estimated wavelet correlation between the wavelet coefficients of brain region i and the wavelet coefficients of brain region j.

Graph metrics
We characterized the organization of each functional connectivity matrix using both weighted and binary graph metrics. Note that we use the term "metric" due to its prevalence in the literature, but that these summary statistics do not necessarily have the properties of a metric as defined formally in the field of mathematics. To examine simple properties of the correlation matrix itself, we followed [45,51] and calculated (i) the mean correlation coefficient of the matrix as the average of the upper triangular elements of the matrix, and (ii) the variance of the correlation coefficients of the matrix as the variance of the upper triangular elements of the matrix.
To examine the topological properties of each functional connectivity matrix, we performed a cumulative thresholding approach [45] by which we thresholded each matrix to maintain the strongest edges, giving a binary undirected network that has a density of 30% (see the SI for examination of other thresholds). The choice of this threshold is based on a large and growing literature demonstrating small-world attributes of neuroimaging-based brain networks thresholded to retain this density [19,28,48,51]. On this thresholded binary matrix, we calculated several graph metrics, including the clustering coefficient, characteristic path length, global efficiency, local efficiency, modularity, and number of communities. See the Appendix for mathematical definitions of these graph metrics.
The maximization of modularity requires the investigator to make several methodological choices [52]. Due to the heuristic nature of the Louvain algorithm [53] used in maximizing the modularity quality function [54] and the degeneracy of the modularity landscape [55], we performed 20 optimizations of Q for each functional connectivity matrix. The modularity values that we report are the mean values over these 20 optimizations. We also constructed a consensus partition [56] from these optimizations using a method that compares the consistency of community assignments to that expected in a null model [52].

Classification between healthy controls and schizophrenia patients
To inform the utility of various wavelet methods, filters, and lengths in neuroimaging studies of functional brain network architecture, we performed a classification analysis in which we sought to classify functional connectivity matrices extracted from 29 healthy subjects from those extracted from 29 people with schizophrenia [45]. This particular data set is well-suited to this study because it has been difficult to classify these two groups of subjects using binary networks constructed from traditional methods; the data set therefore offers a reasonable testbed for optimization of classification accuracy as a function of methodological variation. To perform this classification, we gathered all graph metrics obtained in scale 2 (corresponding to the most commonly utilized frequency band for resting state network analyses [45,51]), and used a classification algorithm referred to as the C5.0 algorithm (http://www.rulequest.com/ see5-info.html) to generate decision trees to classify data from healthy controls versus people with schizophrenia. The C5.0 algorithm supports boosting, and is faster and more memory efficient than the previous C4.5 algorithm [57], which in turn is an extension of the earlier ID3 algorithm [58]. We generated decision trees with 10-trial boosting and 6-fold cross validation.
The boosting method, AdaBoost, allows us to generate multiple decision trees for a given set of training data and combine them for better classification while avoiding overfitting [59]. Utilizing the cross validation procedure, we randomly divided all of the subjects into 6 groups, and for each group, we trained a set of boosting decision trees on 5 groups and tested the decision trees on the remaining group. The results we report are the cumulative results across these 6 groups.

Results
In this section, we examine the effects of wavelet method (DWT vs. MODWT), wavelet filter (Daubechies Extremal Phase, Daubechies Least Asymmetric, and Coiflet families), and wavelet length (2-24) on (i) the estimated values of graph metrics in healthy subjects, and (ii) the classification accuracy in distinguishing between functional connectivity matrices extracted from people with schizophrenia and healthy controls.

The Effect of Wavelet Method: DWT vs. MODWT
Both the DWT and the MODWT have previously been utilized to obtain wavelet coefficients for regional time series, prior to the construction of functional connectivity matrices representing graphs or networks (see [48] and [49] for recent examples). Here we performed a direct comparison between DWT and MODWT in terms of their effects on estimated network organization. In Fig 2, we show the mean and variance of the correlation coefficients of the functional connectivity matrices of healthy controls for all 3 wavelet filters, all 4 wavelet scales, and all wavelet lengths. In general, the shapes of the graph metric versus wavelet length curves for both methods show qualitative similarities. We also observe that both DWT and MODWT give similar standard errors across subjects in the mean correlation coefficient and the variance of correlation coefficients.
Despite these gross qualitative similarities, we observe that the two methods differ in terms of (i) the variation of graph metric values over wavelet lengths, and (ii) the magnitude of variance of correlation coefficients. Diagnostic values obtained using MODWT show a smooth change with increasing wavelet length, for all 3 wavelet filters and all 4 wavelet scales corresponding to different frequency bands (see Fig 2 panels A and C). In contrast, graph metric values obtained from DWT do not show smooth changes with increasing wavelet length (see Fig 2 panels B and D). To quantify these observations, we calculated the sum of the absolute value of differences between graph metrics at consecutive lengths. For each scale and wavelet filter, we performed a paired t-test to test for differences in the mean. We found that-indeed-the variation of graph metric values over wavelet lengths is significantly greater when using DWT than when using MODWT for all scales and all filters except scale 1 Coiflet; see Table 1.
Furthermore, the variance of correlation coefficients extracted using the MODWT method are smaller in magnitude than the variance of the correlation coefficients extracted using the DWT method (compare Fig 2 panels C and D). To quantify this observation, we averaged the variance of the correlation coefficients over all wavelet lengths and filter types, separately for each scale. We performed a paired two-sided t-test to measure the difference between the average variance of correlation coefficients obtained using the DWT method versus those obtained using the MODWT method. We found that the average variance of the correlation coefficients was larger in the DWT case than in the MODWT case for all 4 wavelet scales: t = 5.87 and p < 0.0001 (Scale 1), t = 8.89 and p < 0.0001 (Scale 2), t = 14.64 and p < 0.0001 (Scale 3), and t = 9.44 and p < 0.0001 (Scale 4). Together these results are consistent with the theoretical notion that DWT provides more noisy estimates of structure than MODWT, and support the common preference in neuroimaging studies to use MODWT over DWT [19].
Based on its reliable variation with wavelet length, we restrict ourselves to the study of graph metrics extracted using the MODWT method for the remainder of this paper.

The Effect of Wavelet Filter Type
In prior literature, many wavelet filters have been applied to the extraction of regional time series prior to functional brain network construction, including Daubechies [48], and Least Asymmetric families [60]. Moreover, Coiflet wavelets have been shown to provide superior compression performance in magnetic resonance images [61]. Here we performed a direct comparison between Daubechies Extremal Phase, Daubechies Least Asymmetric, and Coiflet families in terms of their effects on estimated network organization. To isolate the effect of wavelet filter, we examine graph metrics obtained using each filter family and a fixed wavelet length. In Fig 3, we show representative results from a comparison of D6 and C6, and a comparison of D8 and LA8 in wavelet scale 2. Qualitatively, we observe no significant differences in graph metrics estimated from different wavelet filters of the same wavelet length. To confirm this observation quantitatively, we use a sign test (due to the skewed distribution of the data) to test the hypothesis that the difference median is zero between the distributions of graph metrics for D6 and C6, and between the distributions of graph metrics for D8 and LA8. Consistent with our qualitative observations, we find no significant differences (as defined as p < 0.05 corrected for multiple comparisons using a conservative family-wise error correction). Note that we observe qualitatively similar results for other wavelet scales and other graph densities (see S1 File).

The Effect of Wavelet Filter Length
In prior literature, many wavelet lengths have been applied to the extraction of regional time series prior to functional brain network construction (for example see [48] and [60]). Here we performed a direct comparison between wavelet lengths 2 through 20 (Daubechies Extremal Phase), 8 to 20 (Daubechies Least Asymmetric), and 6 to 24 (Coiflet). Note these length choices were dictated by those available in the WMTSA toolbox (see Methods). Consistent with effects shown in Fig 2, we observe that the length of the wavelet filter affects graph metrics differently; some graph metrics are affected significantly (such as the modularity index), and other graph metrics are affected very little (such as the characteristic path length); see Fig 4. To quantify the differential sensitivity of graph metrics to wavelet length, we performed a set of repeated measures ANOVA, for each graph metric and each type of wavelet filter. Here, wavelet filter length was treated as a categorical factor, and graph metric type was treated as a repeated measure. For complete results for each of these ANOVAs, see Table 2. We observe that the mean and variance of correlation coefficients are significantly affected by wavelet length in all 3 wavelet filters. The characteristic path length and global efficiency are not significantly affected by wavelet length in any of the 3 wavelet filters. The clustering coefficient, local efficiency, modularity, and number of communities are affected by wavelet length in some but not all of the wavelet filters. These results demonstrate that graph metrics are differentially Table 1. Variation of Diagnostic Values Over Wavelet Lengths t-values and p-values for two-sample t-tests measuring the differences in the sum of the absolute value of differences between graph metrics at consecutive lengths obtained from the MODWT approach as opposed to the DWT approach (df = 28 over the 29 healthy control subjects). Paired t-tests were performed separately for each filter type ("D" = Daubechies Extremal Phase, "LA" = Daubechies Least Asymmetric, and "C" = Coiflet) for each wavelet scale separately.

Scale
Filter type Mean correlation coefficient Variance of correlation coefficients sensitive to wavelet length, challenging the potential performance of meta-analyses that incorporate results obtained using different wavelet length and filters. Note that we observe qualitatively similar results for other wavelet scales (see S1 File).

Classification in Psychiatric Disease
Finally, we asked whether different wavelet filters provide different degrees of statistical sensitivity or classification accuracy when seeking to distinguish between functional connectivity matrices extracted from healthy controls versus those extracted from people with schizophrenia. To determine whether different wavelet filters provide different degrees of statistical sensitivity for group comparisons, we first visually inspect graph metric values in wavelet scale 2 as a function of filter type and length (compare dark and light lines in Fig 4). We observe that group differences in mean correlation coefficient, variance of correlation coefficients, clustering coefficient, modularity, and number of communities appear to be larger for longer wavelet lengths, across all three filter types. To quantify these observations, we performed a two-sample t-test between graph metric values extracted from the two groups (patients vs. controls) for each filter type and length (see Fig 5). In general, we observe that the p-values decreased with increasing wavelet length (as demonstrated by the increase in the minus log p-values in Fig 5), suggesting that longer wavelets display greater statistical sensitivity to group differences in these data. This finding was particularly salient for the mean correlation coefficient, variance of the correlation coefficients, clustering coefficient, modularity and number of communities, consistent with our visual inspection of Fig 4. In the SI, we explore the dependence of these results on methodological choices in network construction including the measure of functional connectivity (partial correlation, wavelet coherence, and the wavelet correlation used in the main manuscript), strength of edges (strongest versus weakest [45,62]), and time series (wavelet details vs. wavelet coefficients). We observe that the effect of wavelet length is more salient (i) when using wavelet correlation than when using wavelet coherence or partial correlation, and (ii) when using the strongest 30% connections or 10% weakest connections than when using the 30% or 1% weakest connections. Results are consistent across the use of both wavelet details and wavelet coefficients. Based on prior work [45], we speculate that the networks constructed from the 1% weakest connections display significant spatial localization and the networks that constructed from the 30% weakest (C-F) Binary graph metrics calculated at a graph density of 30% obtained through a cumulative thresholding procedure, including (C) the clustering coefficient, (D) characteristic path length, (E) global efficiency, (F) local efficiency, (G) modularity index Q, and (H) the number of communities. The more saturated curves represent data from the 29 healthy controls, while the less saturated curves represent data from 29 people with schizophrenia. Error bars depict standard errors of the mean across subjects. Note that the range of lengths examined for each wavelet family is sufficient to observe significant trends in graph metrics. See S1 File for qualitatively similar results obtained at different wavelet scales.
doi:10.1371/journal.pone.0157243.g004 Table 2. Effect of Wavelet Length. Results of Repeated Measures ANOVAs for graph metrics extracted from 29 healthy controls at scale 2 and a graph density of 30%; wavelet length is treated as a factor and graph metric is treated as a repeated measure, separately for each wavelet filter type. Effects that are significant at p < 0.05, uncorrected, are shown in red. connections display significant random structure, together overshadowing the potential effects of wavelet length on group differences. We build on the above results drawn from parametric t-tests by applying non-parametric machine learning techniques to determine whether different wavelet filters provide different degrees of classification accuracy. Specifically, we generated decision trees (see Methods) to classify healthy controls and people with schizophrenia based on graph metrics extracted from functional brain networks constructed from correlations in scale 2 wavelet coefficients (see S1 File for results across all 4 wavelet scales). We observe that the classification accuracy ranged from approximately 63.8% to approximately 82.8%, the classification sensitivity ranged from approximately 65.5% to approximately 96.6%, and the classification specificity ranged from approximately 51.7% to 79.3% (see Fig 6). The poorest classification accuracy and specificity occurred in short wavelets using the Daubechies Extremal Phase filter, and the best classification results occurred for relatively long wavelets using the Daubechies Least Asymmetric filter (LA14), which gave 82.8% accuracy and 96.6% sensitivity. These results support those obtained from the parametric t-test analysis, that larger wavelet lengths display greater statistical sensitivity to group differences in these data.

Discussion
Wavelet-based methods offer extensive benefits in time series analysis and functional brain network construction. These include denoising capabilities [25], robustness to outliers [19], utility in null model construction [26], frequency-specificity without edge effects [27], and accurate estimates of functional connectivity in long memory processes [32,33], such as those observed in fMRI time series [28-30, 34, 35]. Yet despite their utility, fundamental principles to guide the performance of wavelet-based methods remain largely undefined, hampering comparability and reproducibility of wavelet-based functional connectivity studies. Here we explicitly fill this gap by exploring the use of different wavelet methods (MODWT vs. DWT), filters (Daubechies Extremal Phase, Daubechies Least Asymmetric, and Coiflet families), and lengths  and by determining their implications for the estimated values of functional graph metrics and the sensitivity to group differences. We found that the MODWT produces less variable estimates than the DWT method, and that wavelet length significantly impacts graph metric values and sensitivity to group differences. Collectively, our results underscore the importance of reporting the choices utilized in neuroimaging studies and provide concrete recommendations for these choices in wavelet-based analyses.
Wavelet Filtering versus Band Pass Filtering Before embarking on a thorough discussion of our results, it is worth briefly mentioning the differences-including advantages and disadvantages-of wavelet filtering [63] versus band pass filtering. Importantly, the differences between the two methods have been extensively described and quantified in the mathematics and signal processing literature, but unfortunately have rarely been discussed in the neuroimaging literature. Perhaps the two most salient differences between the two methods lie in the preservation of signal shape [64] and in the ability to denoise signal content [65]. Simple bandpass filters traditionally used in time series analysis have the benefits of being easy and fast to compute [66] but because they are implemented in the frequency domain, they have the unfortunate side effect of distorting the time domain, which in turn can alter the shape of transient waveforms, such as those characteristic of neural signals [64]. Wavelet decomposition methods by contrast work simultaneously in the frequency and time domains by fitting "little waves" or wavelets directly to the data, which offers greater veracity in representing neural signals [64]. Moreover, wavelets offer simple denoising properties not offered by bandpass filtering [64,67,68]. This is specifically because bandpass filters can only clean data in which the signal and the noise occupy different frequency bands. Unfortunately, in many neurophysiological time series, noise and desired signal occupy overlapping frequency bands, which is where wavelets can be used to some advantage. Specifically, wavelet methods can be used to decompose a noisy signal into different scales and remove the noise while preserving the signal, regardless of the frequency content. These denoising properties have demonstrated particular utility in the study of fMRI time series [67,69,70], such as those used in this study, but also in the context of ASL [71] and PET [72]. Together, the capabilities for signal preservation and cleaning make wavelet methods often preferable to bandpass methods.
Discrete versus Continuous Wavelet Transforms Once one has chosen to apply a wavelet filtering method, one is faced with the choice of using a discrete wavelet transform (DWT; or its close cousin the maximum overlap discrete wavelet transform, MODWT), or using a continuous wavelet transform (CWT) [73]. Indeed, the CWT implemented with a Morlet wavelet [74] has been used successfully in the study of fMRI time series [75]. CWT works by computing the inner products of a continuous signal with a set of continuous wavelets [73]. While an interesting and worthwhile approach, CWT has specific mathematical disadvantages that motivated us (and others [19,25,26,32,33], particularly in the context of fMRI time series [28-30, 34, 35]) to focus our exposition on MODWT and DWT. These include the fact that (i) the CWT is inherently redundant and therefore computationally intensive, (ii) it does not provide information regarding signal phase, and (iii) one cannot reconstruct the original signal from the CWT coefficients (again, due to the redundancy) [73]. However, CWT also has some benefits, particularly in enabling an assessment of frequency bands that are not necessarily different by powers of 2. To capitalize on the major advantages of both the DWT and CWT approaches, it could be interesting in future to use the notions of wavelet packet trees to study hierarchical structure in fMRI BOLD time series and its impact on observed functional graph architecture [76].
In the remainder of this section, we translate the results of our paper into concrete recommendations for the field, and we close with a brief discussion of important future directions.
The Choice of Wavelet Method The superior performance of MODWT in the context of the numerical experiments performed here is consistent with features of its theoretical construction [77]. First, and perhaps most importantly, MODWT is well defined for any signal length, making it statistically appropriate for the processing of arbitrary signals. In contrast, strictly speaking a DWT of level J 0 can be applied only to signals whose length is a multiple of 2 J 0 , significantly limiting its application to signals of arbitrary lengths. In practice when applying the DWT to signals of arbitrary lengths, one can choose to avoid this issue-as we did in this study-by preserving at most one extra scaling coefficient at each level of wavelet decomposition. Second, while DWT is an orthogonal transform, MODWT is not. In fact, MODWT is highly redundant and invariant under 'circular shift' [27,77]. This feature of MODWT preserves the smooth time-varying structure in regional time series that is otherwise lost during the application of DWT. In the context of human neuroimaging, analyses based on MODWT therefore more accurately reflect the dynamics of brain activity.
The Choice of Wavelet Filter Type and Length Wavelet filter types offer differently shaped wavelets that can be applied to empirical time series in a wavelet decomposition. While there is a generally well-accepted notion that one should choose a wavelet that displays similar timevarying features to the time series at hand, we observed that wavelet filter type had very little influence on graph metrics extracted from resting state fMRI signals. The much larger factor impacting graph metrics was the wavelet length, which tunes the fine-scale detail of the wavelet shape: larger wavelet length provides smoother wavelets. In general, graph metrics obtained using the Daubechies Extremal Phase wavelets changed more from wavelet lengths 2 to 6 than from lengths 6 to 20. These results are intuitive: the changes in wavelet smoothness are more apparent at shorter wavelet lengths than at larger wavelet lengths, and their impact on estimated wavelet coefficients should follow. From a reliability perspective, we would argue that one would wish to choose a wavelet of a relatively larger length, to ensure that one's results are (i) not sensitive to artifacts of jagged edges in the wavelet and (ii) are relatively robust to small perturbations in wavelet length. Yet, it is important to keep in mind that very large wavelet lengths may suffer from the following limitations: (i) more coefficients may be influenced by boundary conditions, (ii) a decrease in the degree of localization of the wavelet coefficients, (iii) an increase in computational burden [27]. The ideal choice may therefore be a moderate length that retains the advantages of long wavelets without gaining any associated disadvantages.
Wavelets for Classification In our methodological recommendations thus far, we have called on arguments of reliability, insensitivity to artifact, and decreased variability to support specific choices in wavelet-based functional network analysis. In a final analysis we further asked whether one can support these choices based on differential sensitivity to group differences in functional network architecture. In analyses based on scale 2 wavelet coefficients (corresponding to 0.06-0.125 Hz), the answer is clear: longer wavelet lengths provide increased sensitivity to group differences as measured both by parametric t-tests and non-parametric machine learning algorithms based on decision trees. Using these longer wavelets, we observe significantly greater classification accuracy, sensitivity, and specificity values than those previously observed in this same data set [45], complementing prior work demonstrating differences in spontaneous low-frequency (<0.1 Hz) fluctuations in BOLD signal [78,79] and functional or structural network architecture [80][81][82][83] between schizophrenia patients and healthy controls. Thus, in addition to their benefits in terms of sensitivity and robustness, longer wavelets offer greater sensitivity to group differences in this data set, supporting their choice in the performance of wavelet-based analyses of resting state fMRI data more broadly. We speculate that there might be some underlying structural difference between the two groups of subjects that is consistent among individuals, and that the longer wavelet lengths smooth small differences between individuals so that large-scale differences are clearer. More generally, we speculate that larger wavelet lengths are better able to distinguish group-level features, while shorter wavelets may better distinguish individual-level features.
Longer Wavelets and Vanishing Moments As mentioned earlier, wavelet methods offer significant advantages over bandpass filtering in terms of signal preservation and denoising. These capabilities are supported by the fact that wavelets have vanishing moments [84], the number of which is the maximum degree of the polynomials the scaling function can reproduce. Both Daubechies and Coiflet wavelets have p vanishing moments for lengths 2p [85]. For denoising purposes, it has been suggested that the number of vanishing moments should be greater than 2H + 1 where H is the Hurst exponent [86]. Preliminary evidence suggests that the Hurst exponent for fMRI noise lies below 1, suggesting that one might wish to use wavelets with at least 4 vanishing moments [86]. In the Daubechies family, this would correspond to a wavelet of length 8, which is a length that our results also support as demonstrating particularly high reliability, decreased sensitivity to artifact, and decreased variability. However, in the context of other clinical or task data, these choices might be quite different [87]. While these heuristics suggest a minimal number of vanishing moments, it is not as simple to define a maximal number of vanishing moments that should be considered. Evidence suggests that very large numbers of vanishing moments can lead to computational artifacts in the decomposed signal [86]. However, the point at which these artifacts occur is difficult to predict for different data types. Our work therefore offers a numerical approach to identifying wavelet lengths that maximize a statistic of interest such as the classification accuracy in a diagnostic test.
Methodological Considerations In general, our results point to the optimality of longer wavelets for functional network construction from spontaneous fluctuations of the BOLD signal in the low frequency band of 0.06-0.12 Hz (corresponding to wavelet scale 2 in these data). However, this scale covers only a portion of the larger frequency range often interpreted in resting fMRI: 0.008-0.15Hz. This larger frequency range actually encompasses several wavelet scales at a TR of 2 s: scale 1 corresponding to 0.12-0.25 Hz, scale 2 corresponding to 0.06-0.12 Hz, scale 3 corresponding to 0.03-0.06 Hz, and scale 4 corresponding to 0.015-0.03 Hz. While here in the main text we focus on scale 2, in the SI we provide a detailed account of the structure of graphs constructed from scales 1, 3, and 4. The inclusion of these scales is supported by previous literature demonstrating their neurophysiological relevance in terms of graph construction and functional connectivity profiles. For example, there is extensive precedent for examining functional connectivity patterns in frequency ranges including but not limited to scale 2 (0.06-0.12 Hz), in particular dating back to some of the very earliest work roughly 10 years ago [19]. In their very influential paper, Achard and colleagues demonstrated that functional networks obtained at rest displayed significant small-world structure in scales 1 through 6, in their data representing frequency bands covering 0.007-0.45 Hz [19]. Since that seminal paper, others have demonstrated that graphs constructed from frequencies above the traditionally studied bands (e.g., 0.09-0.18 Hz) demonstrate greater heritability than graphs constructed from frequency bands below 0.09 Hz [50]. However, while heritability of graph statistics might be greatest in high frequency bands, sensitivity to disease may be greatest in other bands. For example, Wang and colleagues demonstrate that resting state functional connectivity graphs in people with amnestic mild cognitive impairment are most different from healthy controls in the low frequency band 0.031-0.063 Hz [88]. Together, these data argue that a thorough assessment across several wavelet scales may be warranted depending on the scientific question at hand.
In light of these prior studies, it is interesting to note that for higher frequencies such as those probed by scale 1 coefficients (corresponding to 0.125-0.25 Hz), shorter wavelet lengths appear to provide better sensitivity to group differences; see the SI. These results suggest that the optimal methodological choice for wavelet length might depend on the frequency band of interest, and therefore the properties of the signal being studied, an observation that might be particularly relevant in the assessment of functional networks in EEG and MEG data [89]. Such a conclusion is supported by work identifying a variety of wavelet lengths and types as optimal for classification schemes in EEG signals [90] and other complex systems [91,92]. More work is therefore necessary to determine rules of thumb for wavelet analysis that are generalizable across frequency bands and imaging modalities.
We have exercised these methods on functional networks constructed using the AAL atlas applied to resting state fMRI data, which represent common choices in functional network analysis in both health and disease. It will be interesting in future to assess the utility of these methods in other parcellation schemes and in task-based data.
Finally, we have utilized orthogonal wavelet filters in this work, largely based on their common application in fMRI time series analysis [29,30,34], and the prior precedent using these filters in the construction of functional brain networks in health and disease [15,23,28,35,39,41,42]. However, it is important to note that bi-orthogonal wavelet filters, and symmetric wavelet functions are interesting alternatives [73], and have offered some utility in MR and other image processing [93,94]. Future work could determine the utility of these other filters in the context of functional brain network construction and sensitivity to connectomic disruptions in psychiatric disease [95,96].

Conclusion
As a final note, it is worth pointing out that the wavelet decompositions utilized here build on procedures currently employed in the literature on functional brain network construction in an effort to provide the field with a few useful rules of thumb. However, other wavelet-based analysis techniques do exist-including wavelet packets, dual-tree complex wavelet transforms, and double-density DWT-that have not yet been applied to this problem, and it is not yet known whether these alternative techniques might provide complementary insights into whole-brain patterns of functional connectivity. It will be interesting in future to assess the utility of these alternative methods in reliably quantifying brain network organization and its alteration in disease states.

Appendix 1: Relationship Between Sampling Frequency and Wavelet Scales
The frequency ranges extracted by a wavelet decomposition directly depend on the sampling frequency of the data. It is therefore important to delineate which features of our results are generalizable across data sets acquired with different sampling frequencies. The data used here was acquired with a TR of 2 s (a common choice), and therefore contains information up through the frequency 0.25 Hz. A wavelet decomposition of this signal affects consecutive scales in which the observed signal is repeatedly convolved with a wavelet filter (which behaves as a high-pass filter) and a related scaling filter (which behaves as a low-pass filter). The first four scales therefore correspond to the frequency ranges of approximately 0.125−0.25 Hz, 0.06 −0.125 Hz, 0.03−0.06 Hz, and 0.015−0.03 Hz, respectively. We note that different sampling frequencies may be used in other experiments, and the applicability of our specific results will depend on the degree of overlap in the frequency ranges of wavelet scales. However, our approach and conclusions regarding (i) the benefits of MODWT, (ii) the utility of moderate wavelet lengths, and (iii) the relatively small effect of wavelet filter are expected to be more generally applicable.
Appendix 2: Definitions of Graph Metrics 1. Clustering coefficient C: The clustering coefficient is used to quantify the local clustering properties of the network. First, the local clustering coefficient C i of a node i can be defined as the fraction of actual edges between its neighbors [97]: where A refers to the adjacency matrix, and k i refers to the degree of node i. Then, the clustering coefficient of the network is defined as the mean of C i over all nodes.
2. Characteristic path length L: The characteristic path length is defined as the length of the geodesic path between two vertices, averaged over all pairs of connected vertices: where V m refers to the set of vertices in connected component m, d ij refers to the geodesic distance between node i and j, and n m refers to the number of nodes in connected component m.
3. Global efficiency E glob [98]: The global efficiency has been interpreted as a measure of how effectively information can be exchanged through the network. It is defined as follows: where n is the number of nodes in the network.

4.
Local efficiency E loc [98]: The local efficiency of node i assesses the efficiency of the subgraph formed by the neighbors of i: The local efficiency of the entire network is taken as the mean of E loc,i over all nodes in the network.
5. Modularity Q [99][100][101]: The modularity of a network under a specific partitioning paradigm measures how well the network is divided into non-overlapping groups (or communities) of nodes such that the number of within-group edges is larger than expected in some null model [99][100][101][102][103]. The modularity index is defined as: where l is the number of edges in the network, c i and c j are the communities containing nodes i and j, respectively, and δ(c i , c j ) is the Kronecker delta. In this study, we presented the maximum modularity value obtained with the Louvain algorithm [53] over 100 nearly degenerate solutions [55].
Supporting Information S1 File. Supplementary Material for "Choosing Wavelet Methods, Filters, and Lengths for Functional Brain Network Construction". (PDF)