Multimodal principal component analysis to identify major features of white matter structure and links to reading

The role of white matter in reading has been established by diffusion tensor imaging (DTI), but DTI cannot identify specific microstructural features driving these relationships. Neurite orientation dispersion and density imaging (NODDI), inhomogeneous magnetization transfer (ihMT) and multicomponent driven equilibrium single-pulse observation of T1/T2 (mcDESPOT) can be used to link more specific aspects of white matter microstructure and reading due to their sensitivity to axonal packing and fiber coherence (NODDI) and myelin (ihMT and mcDESPOT). We applied principal component analysis (PCA) to combine DTI, NODDI, ihMT and mcDESPOT measures (10 in total), identify major features of white matter structure, and link these features to both reading and age. Analysis was performed for nine reading-related tracts in 46 neurotypical 6–16 year olds. We identified three principal components (PCs) which explained 79.5% of variance in our dataset. PC1 probed tissue complexity, PC2 described myelin and axonal packing, while PC3 was related to axonal diameter. Mixed effects regression models did not identify any significant relationships between principal components and reading skill. Bayes factor analysis revealed that the absence of relationships was not due to low power. Increasing PC1 in the left arcuate fasciculus with age suggest increases in tissue complexity, while increases of PC2 in the bilateral arcuate, inferior longitudinal, inferior fronto-occipital fasciculi, and splenium suggest increases in myelin and axonal packing with age. Multimodal white matter imaging and PCA provide microstructurally informative, powerful principal components which can be used by future studies of development and cognition. Our findings suggest major features of white matter undergo development during childhood and adolescence, but changes are not linked to reading during this period in our typically-developing sample.


Introduction
Reading is a sophisticated skill with many constituent systems including vision, language, memory, and attention. White matter fibers play an important role in connecting these systems and facilitating coordinated processing across the reading network. Diffusion tensor imaging (DTI) is frequently used to investigate links between white matter and reading thanks to its sensitivity to white matter microstructural features. DTI studies have linked reading to white matter in a broad network of tracts including the arcuate, superior and inferior longitudinal, inferior fronto-occipital, and uncinate fasciculi, and the posterior corpus callosum [1][2][3][4][5], such that markers of increased white matter maturity correlate with better reading scores.
Additionally, longitudinal DTI studies show that maturation of reading-related tracts is related to improvements in reading ability [6][7][8][9][10]. White matter abnormalities have been observed in children with reading difficulties, most often in left temporo-parietal white matter [11][12][13][14] as language and reading networks are typically left lateralized [11,15,16]. Finally, changes in DTI measures are observed in reading-related white matter following reading interventions [17][18][19]. DTI studies have identified a network of white matter related to reading but cannot comment on the particular features of white matter microstructure driving these relationships. Fractional anisotropy (FA) and mean diffusivity (MD) describe water diffusion and are simultaneously sensitive to many microstructural factors [20][21][22][23]. Newer techniques with increased specificity may be used to build upon DTI literature. Neurite orientation dispersion and density imaging (NODDI) produces the neurite density index (NDI) and orientation dispersion index (ODI) which are sensitive to axonal packing and tract coherence, respectively [24]. Inhomogeneous magnetization transfer (ihMT) and multicomponent driven equilibrium singlepulse observation of T1 and T2 (mcDESPOT) produce the quantitative ihMT (qihMT) and myelin volume fraction (VF m ) measures respectively, both sensitive to myelin [25,26]. Additionally, measures of axon volume and myelin volume such as NDI and VF m can be combined to produce the g-ratio, which describes the ratio of axon thickness to total fiber diameter [27]. These methods have been validated in vitro [28][29][30][31][32][33], and they hold great potential to clarify our understanding of white matter development and links to reading.
Investigating multiple imaging measures in a univariate fashion, the typical practice in developmental studies to date, necessarily increases the number of comparisons and may introduce redundancy via shared sensitivities between metrics, reducing the discriminating power of the analysis. One solution to reduce comparisons and exploit shared sensitivities is to collapse white matter measures into orthogonal components via principal component analysis (PCA). A framework using PCA for dimensionality reduction in white matter has been recently described [34], and resultant components were linked to age, suggesting developmental sensitivity. The goal of this study was to combine white matter imaging techniques (DTI, NODDI, ihMT, and mcDESPOT) to better understand relationships between brain structure and reading in a sample of healthy 6-16 year old children. We aimed to investigate links between resultant principal components and both age and reading to describe development of key microstructural features and how these features underlie reading. We hypothesized that observed principal components would represent diffusion restriction and tissue complexity, similar to previous studies [34]. Furthermore, we expected that these components would be linked to age and reading proficiency in reading-related tracts, such that indications of more myelin, axonal packing, and fiber coherence would increase with age and would relate to better reading performance.

Participants
46 healthy participants aged 6-16 years (mean age: 11.0 ± 2.6 years, 24 males / 22 females) were recruited as part of an ongoing study on pediatric brain development. Inclusion criteria were: 1) uncomplicated birth between 37-42 weeks' gestation, 2) no history of developmental disorder, psychiatric disease, or reading difficulty, 3) no history of neurosurgery, and 4) no contraindications to MRI. 22 children (mean age: 13.3 ± 2.6 years, 11 males / 11 females) returned 2 years after their initial visit for a second scan and cognitive assessment. All subjects provided informed assent and parents/guardians provided written informed consent. Gender was determined by parent report. This study was approved by the local research ethics board, Conjoint Health Research Ethics Board (CHREB, ID: REB13-1346). All subjects provided informed assent and parents/guardians provided written informed consent.

Image processing
All images were visually inspected for quality assessment and processed separately using appropriate tools before being combined for principal component analysis. Preprocessing for T1 images was carried out in FreeSurfer 5.3 (http://surfer.nmr.mgh.harvard.edu/) for intensity normalization and brain extraction. Preprocessing for DTI datasets was performed within ExploreDTI [35]. Preprocessing steps included signal drift correction [36], brain extraction, eddy current and motion corrections [37,38], and registration to skull-stripped T1 images to correct geometric distortions induced by echo-planar imaging. The REKINDLE model was used to calculate FA, MD, radial diffusivity (RD), and axial diffusivity (AD) maps for each subject using the b = 900 s/mm 2 shell only [39]. Whole brain tractography was performed on b = 900 s/mm 2 data using constrained spherical deconvolution [40] with L_max = 6, 2mm isotropic seed voxels, 1mm step size, FA threshold of 0.2, 30 maximum angle of deviation and an acceptable streamline range of 50 to 500mm. Following whole brain tractography, semiautomated methods [41] were performed to segment the arcuate, inferior longitudinal (ILF), inferior fronto-occipital (IFOF), and uncinate fasciculi bilaterally, along with the splenium, as shown in Fig 1. A 11-year old female with high data quality was selected as the exemplar participant for this process; all regions were drawn on this template brain and then registered to other participants' data for tracking in native space [42]. Processed multi-shell DTI datasets were also exported to the NODDI Toolbox (http://www.nitrc.org/projects/noddi_toolbox) for calculation of isotropic (f iso ) and intracellular (f icvf , or NDI) volume fractions and ODI.
Pseudo-quantitative ihMT maps (qihMT) and magnetization transfer ratio (MTR) maps were calculated from ihMT data using an in-house GE protocol as described in previous work [43]. Following MTR and qihMT image production, brain extraction was performed on MTR images using FSL's BET2 tool [44], and resulting brain-extracted MTR image was used as a mask to produce a brain-extracted qihMT image.
mcDESPOT SPGR, IR-SPGR, and bSSFP images were aligned to the SPGR image with the largest α then processed by fitting T1, T2, and volume fractions to three water compartments (myelin-bound, intra/extracellular, and free), along with exchange rates between myelinbound and intra/extracellular water [45]. The myelin-bound water volume fraction from this fitting was used to produce VF m maps for each participant. G-ratio maps were computed using VF m , NDI, and f iso maps to calculate the fiber volume fraction (FVF) and g-ratio using the following two equations.
ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ð1 À VF m Þ=FVF p Following production of all measure maps, qihMT, MTR, VF m , NDI, and ODI maps were registered to b = 900 s/mm 2 FA maps using Advanced Normalization Tools (ANTs) [46]. Default parameters from antsRegistrationSyN.sh were used, with the-t s flag chosen to select rigid, affine, and deformable symmetric normalization transforms. Then, the mean FA, MD, AD, RD, NDI, ODI, MTR, qihMT, VF m , and g-ratio values were extracted for all 9 tracts of interest (Fig 1) per participant. Additionally, along-tract analysis was performed in Explor-eDTI [47,48], to sample all ten measures at twenty equidistant points along each tract. visually depicts all processing steps performed following preprocessing of images in their native space.

Reading assessments
Reading was evaluated using the Wechsler Individual Achievement Test-Third Edition: Canadian [49]. Participants completed the Reading Comprehension, Word Reading, Pseudoword Decoding, and Oral Reading Fluency subtests. From these subtests, the Total Reading Composite Score was computed as a measure of general reading proficiency. This score combines phonological awareness, reading comprehension, and fluency.

Principal component analysis
To implement principal component analysis in white matter, we followed the methods described in Chamberland et al [34]. All analysis was conducted in R version 3.6.1 [50]. First, along tract data for each subject's first time point (10 measures x 9 tracts of interest x 20 points along each tract) was combined into a single table for principal component analysis (described in Chamberland et al [34]). A Kaiser-Meyer-Olkin (KMO) test was conducted via the KMO() function to assess correlations between input measures and indicate the suitability of our Processing pipeline to prepare imaging data for principal component analysis. Preprocessed diffusion-weighted images (A) were registered to T1-weighted anatomical images (B). Measure maps from NODDI, ihMT, and mcDESPOT sequences were registered to diffusion-weighted images in anatomical space (C) to produce all measure maps in anatomical space (D). Next, whole brain tractography was computed from b = 900s/mm 2 data using constrained spherical deconvolution (E), and tracts of interest were segmented in a semiautomated fashion in ExploreDTI (F). Measure means were extracted for each tract of interest (G) and along each tract of interest at 20 equidistant segments (H).
https://doi.org/10.1371/journal.pone.0233244.g002 measure set for PCA; values >0.5 indicate suitability [51]. PCA was performed via the prcomp () function (using the scale = 1 option to normalize each feature independently). Following PCA, input variable contributions to principal components along with correlations between variables within along-tract data were inspected to identify redundancy between variables. In the case of highly collinear measures (moderately to highly correlated (|r| > 0.6) and contributed to PCA outputs similarly), the variable with highest correlations to all other input measures was removed to improve stability of PCA computations [52] and PCA was recomputed. Principal components with eigenvalue > 1 were retained, while other components were discarded [53]. Varimax rotation was applied on retained principal components via the varimax () function to maximize differences in principal components loadings and improve interpretation of component sensitivities. Measures were considered meaningful contributors to a resultant principal component if they accounted for above average variance (>11.1%) in the component.

Statistical analysis
All statistical analysis was performed in R version 3.6.1 [50]. Following varimax rotation, longitudinal principal component weightings were calculated by multiplication of time point 2 along tract data with the rotation matrix output by varimax(). Next, along tract weightings for principal components were averaged in each tract to produce mean principal component weightings for each subject in all 9 investigated tracts. Linear mixed effects models were computed via lmer() [54] to investigate relationships between principal components with Total Reading and age in each tract. Age models included age, gender, an age � gender interaction, and a random intercept per subject, to account for repeated measures within subjects. If the age � gender interaction was not significant, it was removed and the model was rerun. Total Reading models for each tract included all retained principal components along with age, and gender if a gender effect was observed for any principal component. Restricted maximum likelihood was used for all models. Benjamini-Hochberg false discovery rate (FDR) correction was used to correct for 27 comparisons (9 tracts x three principal components). Multiple comparisons corrections were conducted separately for age and Total Reading findings. Example formulas are provided below. Time point 1 data for each measure included in our final PCA was correlated with Total Reading via partial correlation in each region, controlling for age, and FDR correction was applied for 9 correlations across each measure.
Bayes factor analysis was performed via generalTestBF in the BayesFactor package for R [55] to supplement regression analysis by assessing the observed statistical power of models connecting retained principal components and Total Reading. Bayes factors output by general-TestBF were inverted to reflect the ratio of likelihood of the null hypothesis divided by the likelihood of a given model. A Bayes factor greater than 3, indicating our data was 3 times more likely to be described by the null hypothesis than a given model, was considered evidence for the null hypothesis. A Bayes factor less than 1/3, indicating that a model including our chosen predictors was 3 times more likely to explain our data than the null hypothesis, was considered evidence for the alternative hypothesis. Bayes factors between 1/3 and 3 were considered indicators of low power, such that neither evidence for the null or alternative hypotheses could be inferred [56]. MTR was removed from our principal component analysis due to high collinearity with qihMT (r 2 = 0.64). Three principal components were identified in our final model, which collectively explained 79.5% of variance (KMO test value = 0.53). Measures contributing greater than 11.1% variance (expected if all variables contributed uniformly) to a component following varimax rotation are visualized in Fig 4. Interpretation of principal components was carried out by evaluating the common microstructural sensitivities of each measure, and by comparison to previous PCA analyses in white matter [34,57]. Principal component (PC) 1 explained 37.5% of variance and was primarily composed of measures sensitive to tissue complexity: FA, AD, ODI, along with MD. PC2 explained 23.0% of variance and was composed of measures sensitive to myelin and axon packing: FA, MD, RD, and NDI. PC3 explained 19.0% of variance and was driven by measures sensitive to myelin and axonal diameter, VF m and g-ratio.

Principal component analysis
As shown in Fig 4 panel A, FA and MD contributed strongly to PC1 and PC2 even after varimax rotation, likely because FA and MD are broadly sensitive to white matter structure. To better interpret components, we removed FA and MD and recomputed PCA (results shown in Fig 4, panel B). The reduced model (denoted as PC B ) had three principal components that explained 77.3% of variance (KMO = 0.43). PC1 B explained 36.6% of variance and was composed of RD, NDI, and qihMT. PC2 B explained 22.7% of variance and was composed of VFm and g-ratio. Finally, PC3 B explained 18.0% of variance and was driven by AD and ODI. Mixed effects regression models and Bayes factor analyses were conducted with the full PCA model including FA and MD to provide comparable data to previous studies, to preserve power to detect age and reading effects, and because the KMO test value of 0.43 for PC B indicated that input variables may not share enough information for robust factor analysis.

Regression models
Mixed effects models results linking principal components to Total Reading scores are summarized in Table 1. No significant relationships were observed between principal components and Total Reading. To further investigate the absence of significant relationships between principal components and Total Reading, we followed up by running mixed effects models between principal components and subtest scores for Reading Comprehension, Word Reading, Pseudoword Decoding, and Oral Reading Fluency. No significant relationships were observed between principal components and reading subtest scores. Correlations between the initial measure set and Total Reading are summarized in S1 Table. No significant correlations were observed between individual measures and Total Reading scores. Table 2 summarizes models linking principal components to age and gender. A significant relationship between PC1 and age was observed in the left arcuate (t = -2.93, p = 0.004). Increases in PC1 with age suggest increased diffusion restrictions and tissue complexity

Bayes factor analysis
Bayes factors analysis was conducted to evaluate Total Reading mixed effects regression models. Results from this analysis are summarized in Table 3. Bayes factors including all principal components and age as covariates of Total Reading were greater than 3 in all regions, indicating evidence for the null hypothesis.

Discussion
We applied principal component analysis in a multimodal dataset including highly specific measures of myelin, axon packing, and fiber coherence to investigate white matter development and links to reading. PCA identified three principal components that explained a large proportion of variance (79.5%) in our dataset, and represented tissue complexity (axon coherence), diffusion restriction (axonal packing and myelination), and axon diameter. The interpretation of principal components was based upon common sensitivities shared by the measures in each component and previous literature. The sensitivity of each individual metric included in PCA has been histologically validated [20,[28][29][30][31][32][33], suggesting that the interpretations presented here are biologically meaningful. PC1 explained the largest amount of variance (37.5%). With significant contributions from FA, MD, AD, and ODI, PC1 probed diffusion anisotropy and was driven by axon integrity and coherence. PC2 explained 23.0% of variance and reflects myelin and axonal packing, as shown by heavy loadings of FA, MD, RD, and NDI. Finally, PC3 explained 19.0% of variance and was driven by VF m and g-ratio. PC3 likely corresponds to axon diameter, as principal components are expected to be orthogonal and PC2 contains several myelin-sensitive measures. Studies employing PCA with white matter imaging measures have identified similar principal components related to diffusion anisotropy and overall diffusivity [34,57]. Our PCA expands upon previous findings by including non-diffusion measures from magnetization transfer and relaxometry. This allowed our multimodal PCA to identify a novel third component related to axon diameter. Shared information between white matter imaging metrics resulted in measures loading onto multiple principal components, in particular FA and MD. This was addressed in multiple ways. First, in the case of highly correlated variables, redundant variables (MTR) were Table 3 Principal components were not significantly related to Total Reading scores in any investigated region. Bayes factors suggested the null hypothesis was substantially more likely than the alternative hypothesis in all regions. No significant relationships were identified in follow-up mixed effects models including principal components, age and scores from subtests included in the Total Reading composite score. Further, no significant correlations between initial measures and Total Reading scores were significant following multiple comparisons corrections. These findings suggest that gross relationships between white matter structural features and Total Reading ability are absent in typically developing children and adolescents, who tended to be skilled readers in our sample. Expanding this analysis to a larger age range or a population with reading difficulties may provide a larger effect to assess, and further insight into the role of white matter in reading.

. Bayes factors assessing the likelihood of the null hypothesis condition (no relationship between Total Reading scores and model components) versus the likelihood of the model condition (relationships between included components and Total Reading
Despite a lack of broad relationships between key white matter features and reading, some findings here hint that more specific relationships may be present in our sample. Pvalues < 0.1 suggest a larger sample may find significant relationships between PC2 or PC3 and Total Reading in the left IFOF and left uncinate, respectively. Left hemisphere ventral white matter supports reading processing in skilled readers, and left inferior frontal regions have been consistently highlighted as related to reading skill in previous studies [3,6,[8][9][10]. Additionally, qihMT was correlated with Total Reading ability in the bilateral arcuate fasciculus and ILF, the right IFOF and right uncinate fasciculus, and was trend level in the left IFOF (see S1 Table), though these findings did not survive multiple comparison corrections. Interestingly, qihMT was not significantly related to Total Reading in either the left IFOF or uncinate fasciculus, where trend level relationships with principal components were found. Trend level relationships between PC2, PC3, or qihMT and Total Reading provide some evidence for a link between axon diameter and myelin and reading. However, these relationships must be investigated and confirmed by future studies.
Links between principal components and age were identified throughout the brain. Relationships between PC2 and age were most prominent, found in all tracts except the uncinate fasciculus, and are visualized as scatterplots in Fig 5. Age-related trends tended to be similar between left and right hemispheres, suggesting that at the macro-scale, brain development is similar between hemispheres. This is in contrast to investigations of individual microstructural features, where increases in VF m were shown to be largely left-lateralized during adolescence [58]. PC2 findings may be driven by NDI, as NDI has been previously shown to be age-sensitive and increases bilaterally throughout adolescence [58][59][60]. One relationship between PC1 and age remained in the left arcuate following multiple comparisons. While axon coherence tends to be stable across adolescence [61][62][63], we show that changes may still be ongoing in some regions. Gender was related to PC3 in the right inferior longitudinal fasciculus such that males had higher values than females. Higher PC3 values reflect higher VF m and lower g-ratio values, thus the development of the right inferior longitudinal fasciculus may be further along in males. Studies of sex effects on white matter development have produced mixed results, suggesting either absence of or minor developmental effects during childhood and adolescence (for review see [64]), but large longitudinal studies remain necessary to effectively assess sex and gender effects across development.
This study has several limitations. First, inclusion of broadly sensitive measures such as FA and MD decreased clarity in interpretation of our principal components. We included these metrics to provide a baseline for future work applying principal component analysis in white matter, and to better connect to previous work. Future investigators should seek to refine their set of included metrics and exclude generally sensitive measures which may mask loadings of other, more specific metrics. Second, not all participants provided longitudinal data, and younger participants contributed fewer longitudinal data points than older participants. Future studies with more longitudinal data may be better able to elucidate relationships between components of white matter structure and age or reading across development. Finally, although the metrics applied here have been histologically validated, none are truly specific to any microstructural feature. Principal component analysis helps to address these sensitivities by focusing on information that is shared between measures, but our interpretation is still complicated by the multiple factors which affect each imaging metric.

Conclusions
Here, we combined multimodal imaging techniques to assess microstructure in readingrelated white matter tracts. Principal component analysis revealed three key features of white matter microstructure that explained 79.5% of variance in our dataset. Principal components were related to tissue complexity, axon packing and myelin, and axon diameter. No significant relationships were observed between principal components and Total Reading scores, suggesting gross relationships between white matter structural features and reading are not present in typical children and adolescents. Some trend level results suggest minor roles for axon diameter and myelin in reading ability, but these findings must be confirmed by further research. Principal components were sensitive to age effects, consistent with previous studies. PCA is an effective tool to preserve power and exploit shared variance between imaging metrics. Resultant principal components are age-sensitive have expanded our understanding of links between white matter and reading. This study provides an important initial description of PCA in a multimodal set of white matter imaging metrics, and will serve as an important baseline for future studies investigating white matter in development or cognitive disorders.