Genome-Wide Association Analysis with Gray Matter Volume as a Quantitative Phenotype in First-Episode Treatment-Naïve Patients with Schizophrenia

Reduced Gray matter (GM) volume is a core feature of schizophrenia. Mapping genes that is associated with the heritable disease-related phenotypes may be conducive to elucidate the pathogenesis of schizophrenia. This study aims to identify the common genetic variants that underlie the deficits of GM volume in schizophrenia. High-resolution T1 images and whole genome genotyping data were obtained from 74 first-episode treatment-naïve patients with schizophrenia and 51 healthy controls in the Mental Health Centre of the West China Hospital, Sichuan University. All participants were scanned using a 3T MR imaging system and were genotyped using the HumanHap660 Bead Array. Reduced GM volumes in three brain areas including left hOC3v in the collateral sulcus of visual cortex (hOC3vL), left cerebellar vermis lobule 10 (vermisL10) and right cerebellar vermis lobule 10 (vermisR10) were found in patients with schizophrenia. There was a group by genotype interaction when genotypes from genome-wide scan were subsequently considered in the case-control analyses. SNPs from three genes or chromosomal regions (TBXAS1, PIK3C2G and HS3ST5) were identified to predict the changes of GM volume in hOC3vL, vermisL10 and vermisR10. These results also highlighted the usefulness of endophenotype in exploring the pathogenesis of neuropsychiatric diseases such as schizophrenia although further independent replication studies are needed in the future.


Introduction
Schizophrenia is one of the leading causes of mental disability, which affects about 1% of the population worldwide [1]. It has been suggested that genetic factors plays an important role in the pathophysiology of schizophrenia by many convincing studies [2]. Indeed, the heritability of schizophrenia is as high as 80% [3]. However, it is challenging to identify genes involved in the pathogenesis of schizophrenia due to its complex model of inheritance and the unknown pathophysiology of the disorder. Furthermore, phenotypic heterogeneity, such as various clinical presentation and duration of illness, complicated the genetic study in schizophrenia.
Of these previous studies, the latest Psychiatric Genomics Consortium (PGC) identified 61 common variants in liability to schizophrenia. Two studies in Han Chinese population suggested some common variants involved in susceptibility of schizophrenia [4,5], though there were no overlapping SNPs detected from the two studies. One of possible reasons may be that the heterogeneity of clinical manifestations such as the severity of disease and more subtle characteristics were ignored in large according to the clinical diagnostic categories. The current diagnosis for schizophrenia mainly depends on subjective and qualitative information. In case-control design of GWAs, it is also possible that a small proportion of current healthy controls have actually harbored disease because of the long clinical silent prodromal phase. On the other hand, although the GWAs have successfully identified some risk genes of complex diseases including diabetes, macular degeneration, Crohn disease, Alzheimer disease, and Parkinson disease [6], it is a great challenge to have enough sample sizes to obtain satisfied statistical power, and to control population stratification which is due to the combination of samples from multiple ethnic backgrounds. The unsatisfied statistical power due to small sample size can be improved by using a quantitative trait (QT) analysis [7]. Compared to a simple dichotomous classification of diagnosis, QT related to studied disorder is an objective and informative measurement and could be used as endophenotypes or intermediate phenotypes in genetic analysis. Endophenotypes have been considered to be more proximal to the biological etiology of the disorder [8], and may provide an alternative strategy to explore the pathogenesis of complex genetic diseases such as schizophrenia.
The meta-analytic, twin and family studies showed that whole and regional GM volumes are highly heritable [9,10]. And GM volume abnormalities covary in a dose-dependent manner with risk for schizophrenia [11,12]. Thus it should be a good strategy to use GM volume as one of the endophenotypes in molecular studies of schizophrenia. Some of the previous studies have successfully performed candidate gene analyses using GM as QT [13][14][15], however, only 3 studies employed GWAs design for schizophrenia [7,16,17]. Moreover, QT-GWAs design in Chinese sample is still rare.
The aim of the present study is to identify the common genetic variants that underlie the deficits of GM volume in schizophrenia. We used the SPM Anatomy Toolbox [18][19][20] to obtain anatomical region of interests (ROIs) volume and to detect the brain areas with different ROIs volumes between patients with first-episode treatment-naïve schizophrenia and healthy controls. Subsequently, GM volumes in these brain areas were integrated into genetic data from GWAS analysis as QTs in order to identify novel susceptibility loci for schizophrenia.

Participants
The study was approved by the Ethics Committee of the West China Hospital of Sichuan University. All next of kin, carer takers or guardians consented on the behalf of participants to provided written informed consent for their participation. We used the following criteria to evaluate whether the participants had the capacity to consent: Firstly, patients have the ability to understand; Second, patients have the ability to know reason; Thirdly, patients have the ability to make rational decisions. If participants failed to fill out the consent form more than twice, their guardians were asked to fill out the consent on the behalf of patients. A total of 125 participants were recruited at the Mental Health Centre of the West China Hospital, Sichuan University, People Republic of China, including 74 first-episode patients with schizophrenia and 51 healthy controls. The Structured Clinical Interview for DSM-IV (Diagnostic and Statistical Manual of Mental Disorders, fourth edition) -Patient Version (SCID-P) was used and patients fulfilling the diagnostic criteria for schizophrenia as specified in DSM-IV were included. The interviews were performed and the diagnoses of schizophrenia were confirmed by an attending psychiatrist and a trained interviewer. Healthy controls were recruited from the local area by poster advertisement and they were screened for the lifetime absence of psychiatric illnesses by the SCID nonpatient version (SCID-NP). All healthy controls were interviewed to assure that there was no history of psychiatric illness in their first-degree relatives. All patients with schizophrenia have been followed up for at least 6 months in order to confirm the diagnosis of schizophrenia. All participants are Han Chinese from Sichuan province of China.

Imaging acquisition
One patient and one control were excluded from subsequent analysis because of poor images' quality. Finally, data were acquired for the 125 participants undergoing MRI scans in the Department of Radiology at West China Hospital, Sichuan university with a 3T MR imaging system (EXCITE, General Electric, Milwaukee, USA) which has an eight-channel phase array head coil. High-resolution T1 images were obtained by the 3D spoiled gradient echo sequence (SPGR) from all participants. The protocols used were as follows: TR=8.5ms; TE=3.93ms; flip angle=12°; thickness of slice=1 mm; single shot; field of view (FOV) =24cm×24cm; matrix =256×256; size of voxel=0.47×0.47×1 mm 3 . In total, 156 slices of axial images were collected from each brain.

Image preprocessing
The DICOM format data were collected from magnetic resonance scanning and then transformed using MRIcro software [41]. We used the non-parametric non-uniformity intensity normalization (N3) in the MINC software package (http://wiki.bic.mni.mcgill.ca/index.php/MINC) to correct the intensity inhomogeneity of images by manually aligning on the AC-PC line. The Brain Extraction Tool (BET) in FSL software was used to remove non-brain tissues and structures in the images [42] and to obtain native images for further processing. In order to process the data, Statistical Parametric Mapping (SPM2 package, Wellcome Department of Imaging Neuroscience, London; http://www.fil.ion.ucl.ac.uk/spm) software and the optimized VBM method were implemented in Matlab 7.0. All the 125 participants underwent structural MRI scans (in native space) and the brain areas were segmented into GM, white matter (WM) and cerebrospinal fluid (CSF). We obtained the GM images which were affine-normalized in the same stereotactic space (International Consortium for Brain Mapping) according to the GM template (gray. mnc). The normalization parameters were implemented to the original structural images. Then the tissue segmentations were applied to normalized images to obtain image of GM, WM and CSF. Specifically, the images were normalized and segmented by an integrated generative model (unified segmentation) [43] with the customized templates. Jacobian determinants derived from the spatial normalization step were applied to the segmented images to correct voxel signal intensity for volume displacement and to reflect the volume during normalization. Finally we used a 6-mm FWHM kernel to smooth each optimally normalized, segmented and modulated image. The SPM Anatomy Toolbox [18] was applied to obtain anatomical ROIs volumes.

Statistical Analysis Imaging
In order to detect whether there were any differences in the volume of areas extracted from smoothed images by the SPM Anatomy Toolbox between schizophrenia and control individuals, we used SPSS16.0 statistical software to perform an ANOVA analysis to explore the differences of GM volume between schizophrenia and controls, age and gender were incorporated as covariates. Statistically significant p value thresholds were set at<0.05 after Bonferroni correction.

Genotyping and quality control
We obtained DNA with the standard isolation method from whole blood samples and performed genotyping using the HumanHap660 Bead Array. The quality of the SNP genotypes was assessed for genotyping rates across samples, minor allele frequency (MAF) and Hardy-Weinberg equilibrium (HWE) tests (only in controls). Participants with genotyping rate < 95% were excluded. The gender of each participant was confirmed with genotyping data on gender-specific loci provided by Illumina. For pairs of participants with identical genotypes, the member with the lower genotype call rate was excluded. Finally, 99393 SNPs were removed with more than 10% missing genotypes across samples, 95922 SNPs were removed with MAF<5% and 331 SNPs failed the HWE test in control subjects (i.e. p-value <10 -5 ). Consequently a total of 464,219 SNPs passed the quality control and a mean call rate of 98.9% was obtained; 2 patients and 1 control failed gender check and were excluded from subsequently analysis. Finally, 74 patients and 51 controls passed quality control and were included in subsequent analysis.

Genotype imputation
BEAGLE was employed to infer genotypes for untyped SNPs. We used the haplotypes from the HapMap project as the reference dataset for the genotype imputation, which include the phased genotypes of 45 subjects from the CHB panel of HapMap Phase 2, 41 subjects from the CHB panel and 83 subjects from the CHD panel of HapMap Phase 3. The phased genotypes were downloaded from http:// hapmap.ncbi.nlm.nih.gov/downloads/phasing/?N=D. The tool IGG3 [44] (http://bioinfo.hku.hk/iggweb/) was used to integrate the phased genotypes into our GWAS genotype dataset and to export the integrated data in BEAGL input format. In imputation, the parameters of BEAGLE were set by default.

Correction for population stratification
Population stratification was detected by using EIGENSTART [45], which employs principal components analysis (PCA) to capture hidden population structure in GWAS data. Prior to the PCA, chromosomal regions of long-range linkage disequilibrium (LD) [46] were removed and the SNPs were pruned so that the retained neighboring SNPs were in weak LD with each other (with PLINK command --indeppairwise 50 10 0.5). PCA was performed on the remaining set of 189,398 SNPs. The top 10 principal components (PCs) were extracted for subsequent QT analyses as covariates (See Figure S1).

QT analysis
The volumes of hOC3vL, vermisL10 and vermisR10 (see Figure S2), which were detected to be different between the patients and controls, were used as QTs in separate regression analyses. The 125 individuals who passed both imaging and genotyping quality control were included in QT analyses. The statistical model was based on comparing the differential effects of SNP association by diagnosis on the brain imaging QT. Out of the 4 possible models (additive, codominant, dominant and recessive) the linear model with interactions between diagnosis and SNPs implements the additive model that generally reflects the additive contribution to risks for complex diseases [47]. For QT analyses, we implemented the PLINK program with linear model function (--linear) [48] (http:// pngu.mgh.harvard.edu/~purcell/plink/), in which each QT was regressed on interactive effects between diagnosis and an amount of minor alleles of the 464,219 SNPs having passed quality control. The covariates (age and gender) were included in the linear model. Meanwhile, the top 10 PCs were included as covariates in QT analyses to control potential population stratification effects. In these models, we used an ordinal variable (controls =1, schizophrenia=2) to reflect the disease status. On the other hand, since in the initial analyses of interaction term model, there were a large number of variables analyzed and the complexity of the model was applied, we chose the p value threshold ≤ 10 -6 to keep consistent with WTCCC recommendations [6]. Meanwhile, we employed an even more conservative regulation that required to detect at least 2 SNPs with the p value <10 -6 within a gene or intergenic region for each QT analysis. The results were shown in Manhattan plots and Quantile-quantile plots (See Figure 1 and Figure S3).

Set-based analysis
Set-based analyses were used for the significant genes in single SNPs regressive analysis in PLINK. This method combined the association p-values of SNPs within the same gene (coding sequences± QUOTE _D5kb flanking regions) to calculate a gene-based p-value. This gene-based test accounted for gene size and LD pattern. Each SNP in the gene set was searched for its possible SNPs in LD (r 2 =0.5). Standard single SNP QT analysis was then performed. All 'independent' SNPs (r 2 =0.5) with p-values below 0.05 were selected in the set. The best SNP was selected first and subsequent SNPs were selected in the order of decreasing statistical significance after removing SNPs in LD with previously selected SNPs. The statistical parameters of each SNP subset were calculated as the mean of these single SNP statistics. The data set was further permuted by 10,000 times, keeping LD between SNPs constant, which was also called as permuting trait. The empirical p-value for set (empirical p value EMP1) was the number of times that the permuted set statistically exceeded the original one of the set. The empirical p values were corrected for the multiple SNPs within a set (http:// pngu.mgh.harvard.edu/~purcell/plink/anal.shtml#set).

Demographic data and Clinical characteristics
In present study, there were 74 patients with schizophrenia (37 males and 37 females) and 51 healthy controls (28 males and 23 females). All patients with schizophrenia were firstepisode and treatment-naïve by the time of entering the study. Table 1 represents the demographic and clinical characteristics of all the participants (see Table 1). There were no significant differences in mean age (t=1.29; p=0.20), gender distribution (Pearson's χ 2 =0.29; p=0.56) and mean years of education (t=1. 44; p=0.15) between patients with schizophrenia and normal controls. The mean duration of disease in patients was 9.91 (SD =6.21, range 3-11.2) months and the mean age of onset was 23.67 (SD = 8.56, range 16-35.2) years old when they were recruited. The average of total PANSS score of the patient group was 94.54 (SD =15.00), suggesting that patients were in the acute phases of the disease.

Reduced GM volume in patients with schizophrenia
By using SPM Anatomy Toolbox, we found that GM volumes in three brain areas including left hOC3v in the collateral sulcus of visual cortex (hOC3vL), left cerebellar vermis lobule 10 (vermisL10) and right cerebellar vermis lobule 10 (vermisR10) were significantly reduced in patients with schizophrenia (See Figure S1). Furthermore, the correlation analysis showed that the correlative coefficient between vermisL10 and vermisR10 was 0.91(p=0.001), the correlative coefficient between vermisL10 and hOC3vL was 0.45(p=0.02) and the correlative coefficient between vermisL10 and hOC3vL was 0.43(p=0.04), respectively.

Genetic analysis of QT
After quality control and genotype imputation, the final dataset consisted of 1,983,054 SNPs from 125 individuals. The genomic inflation factors of 3 endophenotyps after PCA adjustment (λ) were 1.025, 0.991, 1.021 and 1.018 respectively, indicating the absence of major population stratification (see Figure S3). For the interaction terms of SNP x diagnosis, using the criteria that requires at least 2 SNPs within a gene or intergenic region with the p value <10 -6 , we identified a number of SNPs from three genes associated significantly with GM volumes in above three brain areas, respectively. As shown in Table 2, 3 SNPs (2 typed and 1 imputed) on the TBXAS1 gene were associated with the GM volume of hOC3vL. 4 SNPs (2 typed and 2 imputed) on the PIK3C2G gene were associated with the GM volume of vermisR10 and vermisL10, with additional 2 SNPs (1 typed and 1 imputed) on PIK3C2G associated only with the GM volume of vermisL10. 9 SNPs (4 typed and 5 imputed) on the HS3ST5 gene were associated with the GM volume of vermisR10. The regional SNPs LD maps of in chromosomal region of TBXAS1, PIK3C2G and HS3ST5 were shown in Figure 2. Moreover, as shown in Figure 3, there were close correlations amongst the SNPs inside each gene. Also, correlations could be found between typed and imputed SNPs.
The effect per minor allele on GM volume of above three brain areas was relatively small, with β approximately 4% ( Table 2). The SNP effect of the minor allele in TBXAS1 and PIK3C2G genes were consistent with the reduced GM volume of hOC3vL and bilateral cerebellar vermis lobule 10, respectively. However, the effect of the minor allele in HS3ST5 gene was correlated with the increased GM volume of vermisR10.
Furthermore, we compared the GM volumes of three brain areas according to genotypes of each significant SNP in table 2 in cases and controls, respectively. We found that patients with homozygous GG of rs10277664 in TBXAS1 gene have reduced GM volume of hOC3vL while compared to healthy controls (p<0.001), but there were no significant difference for subjects with heterozygous (AG) or homozygotes (AA) between patients and controls. We also found that there was a significant smaller GM volume of hOC3vL in individuals with genotype GG of rs10277664 in TBXAS1 gene than individuals with genotype AA in patients only (p<0.03) (see Figure 3). We also used set-based analysis to obtain the combined association p-values of SNPs within the same gene for each QT. Table S1 showed gene-wise p values, number of SNPs in set, total number of SNPs below p-value threshold, and number of significant SNPs passing LD-criterion.

Discussion
In present study, we used the SPM Anatomy Toolbox to obtain anatomical ROIs volumes and detected significant differences of the GM volumes in hOC3vL, vermisL10 and vermisR10 between patients with schizophrenia and healthy controls. The GM volume in the three regions was found to be correlated with each other, which may imply the role of same or shared genetic factors to these deficits in these brain regions. Furthermore, by using GM volume from these 3 brain areas as QT, we performed a GWAS on 74 patients with schizophrenia and 51 controls from a Han Chinese population and identified a number of SNPs from three genes or chromosomal regions (TBXAS1, PIK3C2G and HS3ST5) were associated with changed GM volumes of hOC3vL, vermisL10 and vermisR10. The fact that more than 2 SNPs on the genotyping chip exhibited association and they were in strong LD confirmed that the significant association was not likely to result from a genotyping artifact. Moreover, imputed SNPs were identified to be significantly associated with candidate endophenotypes, which further confirmed that the associations detected in our study were genuine.
The hOC3v area is located in the collateral sulcus of visual cortex and represents the anatomical substrates of the functionally defined areas of VP/V3v [21]. Previous studies on neurocognitive deficits of schizophrenia have been mainly focused on executive function, attention and working memory with less attention on perceptual processing. Studies have shown that perceptual deficits are one of core features of clinical manifestations in patients with schizophrenia [22][23][24][25][26]. The lower local sulcal index in the left collateral sulcus were detected in patients with early-onset schizophrenia by Penttilä et al. [27]. The present study adds new evidence for the deficit of visual processing in patients with schizophrenia, which support the hypothesis of dysfunction within low-level visual pathways involving thalamocortical radiations [22,28]. On the other hand, cerebellar vermis consists of ten lobules (lobule I-X) [29] and some studies found that the GM deficits in firstepisode schizophrenia were more prominent in the right lobule III and IX of cerebellar vermis while compared to healthy controls [30]. To the best of our knowledge, the present study is the first one to identify the GM deficits in bilateral lobule X in patients with schizophrenia.
Three genes (TBXAS1, PIK3C2G and HS3ST5) have been implicated with the GM volume changes of cortex in schizophrenia, particularly in the collateral sulcus of visual cortex and cerebellar vermis lobule 10. PIK3C2G gene is located to 12p12 and it encodes the protein which belongs to the phosphoinositide 3-kinase (PI3K) family. Previous studies suggested that the PI3K plays important role in the myelinsignaling pathway, involved in cell proliferation, oncogenic transformation, cell survival, cell migration and intracellular protein trafficking. PI3K also involved in the prosurvival effects of BDNF in the SH-SY5Y neuroblastoma cell line, as well as in the protective effects of BDNF against cortical neuronal death. Jungerius et al. found a weak but significant association between PIK3C2G gene and schizophrenia [31]. HS3ST5 gene is located in 6q21 and encodes a protein that belongs to a group of heparan sulfate 3-O-sulfotransferases (EC 2.8.2.23) and transfers sulfate from 3-prime-phosphoadenosine 5-prime phosphosulfate (PAPS) to heparan sulfate and heparin. The product of HS3ST5 gene, which is a member of the human 3-OST family, regulates heparan sulfate binding to AT III and Gd. A study has revealed that 3-OST-5 was highly expressed in fetal brain, followed by adult brain and spinal cord. Its level is very low or undetectable in other tissues [32]. In addition, this chromosome region (6q21) has long been of interest for studying schizophrenia although replicating positive linkage and association in schizophrenia are difficult [33]. TBXAS1 gene is located to 7q34-q35 and has been related with the physiopathology of a number of diseases including cerebral infarction, myocardial infarction and stroke [34,35]. It belongs to the cytochrome P450 superfamily. Alternative spliced transcript variants encoding different isoforms have been found for this gene. The cytochrome P450 proteins are monooxygenases that catalyze reactions involved in drug metabolism and the synthesis of cholesterol, steroids and other lipids. This endoplasmic reticulum membrane protein catalyzes the conversion of prostglandin H2 to thromboxane A2, a potent vasoconstrictor and an inducer of platelet aggregation.
It should be noted that the majority of SNPs on the Illumina Infinium HumanHap 660 BeadArray are tagging SNPs as surrogate makers to represent a given small chromosome region. The association identified in present study suggested evidence that areas of three genes (TBXAS1, PIK3C2G and HS3ST5) or related chromosome regions harbor susceptibility allele for schizophrenia with GM volume as QT. However, it is likely that the causal SNPs for the disorder are just located in close proximity to these positive SNPs identified in current study and requires confirmation in independent samples. Future studies should consider sequencing these genomic regions in a larger cohort of patients with schizophrenia to identify causal genetic variants for schizophrenia. It will also be useful to explore the biological functions of these genes for the disease.
By using the case-control approach, previous GWAS of schizophrenia have detected several noteworthy candidate genes [4,5]. However there has been little consistency in those findings. In current study, the linear model with interaction of diagnosis x SNPs has been used with the advantageous to detect the effects of different genotypes between case and control in QT analysis. In order to exclude the false positive in the QT analyses in the interaction term model, we chose a threshold of p value <10 -6 for at least 2 independent SNPs within a gene/intergenic region in each QT analysis. Moreover, this study has another advantage by including the first-episode treatment-naïve patients whereas most previous studies of QT GWAS included chronic and long-term drugs-taking patients  [7,36]. The latter could disguise the effects of some confounding factors, such as antipsychotic medications, duration of illness, hospitalization and so on. Ho et al. found that antipsychotics had a subtle but measurable influence on brain tissue losses over time in a long-term longitudinal study of 211 patients over 7-14 years follow up [37]. More extensive structural brain abnormalities were found in chronic patients with schizophrenia that may reflect the progressive nature of this condition compared to first-episode patients with schizophrenia. Thus the brain deficits in early stage could reflect the features of neurodevelopmental deficits of schizophrenia. With inconsistent with our finding, Hu et al. and Leung et al. found that grey matter deficits in frontal, temporal, insular, striatal, posterior cingulate, and cerebellar for firstepisode drug naïve schizophrenic patients [38,39]. We should acknowledge that it is difficult to detect the similar brain regions with deficits between different studies. And the difference could be due to lots of reasons, especially methodology. In our study, we used the SPM Anatomy Toolbox to obtain ROI regions which was a finer segmentation for whole brain, and could partly account for the difference in results. In addition, sample size and race of participants could partly contribute to the difference too. It should be acknowledged that our study still has some limitations. First, the sample size was not big enough for GWAs. Second, this study needs to confirm by independent replications, especially in non-Chinese samples. Third, the SNPs detected were in non-encoding regions of the genes, so further functional analyses are needed for these genes. Finally, since environment factors, such as city living and urban upbringing [40], play an important role in brain development and risk of schizophrenia, it will be better for this study to include environmental factors and examine their roles in the pathogenesis of schizophrenia.
In summary, through the GWAS approach combined with GM volume as QTs, we identified some novel susceptibility loci for schizophrenia on 3 genes/intergenic regions, namely TBXAS1, PIK3C2G and HS3ST5, as potential risk factors for schizophrenia in first-episode treatment-naïve patients with schizophrenia and controls. The present study also highlighted the usefulness of endophenotype in exploring the pathogenesis of neuropsychiatric diseases such as schizophrenia although further independent replication studies are warranted in the future.