¶ MR, MM, and BB also contributed equally.
‡ Membership for GROUP Investigators and iPSYCH-GEMS SCZ working group is provided in the Acknowledgments.
The authors declare no conflict of interest.
Conceived and designed the experiments: DJ BH MZ SC MMN MR MM BB. Performed the experiments: DJ ML CL SR MZ MM. Analyzed the data: DJ BH SC MMN MR MM BB. Contributed reagents/materials/analysis tools: MZ JF SHW TWM JT JS SM FD IG TGS RM IN HS DR WM AB RO SC MMN MR BB. Wrote the paper: DJ BH SC MMN MR MM BB.
In the present study, an integrated hierarchical approach was applied to: (1) identify pathways associated with susceptibility to schizophrenia; (2) detect genes in these pathways that may be affected because they contain an associated polymorphism; and (3) annotate the functional consequences of such single-nucleotide polymorphisms (SNPs) in the affected genes or their regulatory regions. The Global Test was applied to detect schizophrenia-associated pathways using discovery and replication datasets comprising 5,040 and 5,082 individuals of European ancestry, respectively. Information concerning functional gene-sets was retrieved from the Kyoto Encyclopedia of Genes and Genomes, Gene Ontology, and the Molecular Signatures Database. Fourteen of the gene-sets or pathways identified in the discovery dataset were confirmed in the replication dataset. These include functional processes involved in transcriptional regulation and gene expression, synapse organization, cell adhesion, and apoptosis. For two genes, i.e.
Large-scale genetic studies of complex diseases such as schizophrenia have identified a variety of susceptibility loci. Since many of the respective variants have only a weak influence on disease risk, pathophysiological interpretation of the results is problematic. Investigation of the joint effects of multiple functionally related genes or pathways increases the power to detect disease-related genes and provides insights into the etiology of the disease in question. In the present study, an integrated hierarchical approach was applied to: (i) identify pathways associated with the complex neuropsychiatric disease schizophrenia; (ii) detect potentially affected genes in these pathways; and (iii) annotate the functional consequences of genetic markers in the affected genes or their regulatory regions. Two samples comprising >10,000 individuals of European ancestry as well as data from the Psychiatric Genomics Consortium schizophrenia study were examined. Pathways representing transcriptional regulation and gene expression, cell adhesion, apoptosis, and synapse organization showed significant association with schizophrenia. In particular,
Genome-wide association studies (GWAS) have identified common susceptibility variants for numerous disorders
At the time of writing, several methods are in use for the pathway-based analysis of GWAS data
Various methodological approaches to pathway association analysis are available. Maciejewski
While selection of the pathway association method is an important consideration, the power of a given pathway association study is also dependent upon other factors. These include the biological information (i.e. from gene-set and pathway databases) that is integrated into the model, the use of independent replication datasets, and the different levels of interpretation, which extend from the pathway level to the level of SNPs.
Consequently, researchers are now adapting analytical frameworks to increase their power and potential impact. To this end, the present study applied a hierarchical approach (see
Application of the Global Test to the BOMA-UTR dataset, which combines data from the MooDS SCZ consortium (BOMA) with independent data from a Dutch study (UTR),
Sample | Ancestry | Case (n) | Control (n) | Platform | Reference |
BOMA | German | 1,531 | 2,168 | I5, I6Q, IWQ | |
UTR | Dutch | 699 | 642 | I5 | |
GAIN | European | 1,157 | 1,364 | A6 | |
MGS | European | 1,279 | 1,282 | A6 | |
Platforms are: I5, Illumina HumanHap 550; I6Q, Illumina Human610 Quad; IWQ, Illumina Human660W-Quad; A6, Affymetrix Genome-Wide Human SNP Array 6.0.
The publication reporting individual sample-level genotypes for schizophrenia is listed.
Discovery set: single nucleotide polymorphisms (SNPs) before pruning – 491,393; after pruning – 419,267.
Replication set: SNPs before pruning – 669,059; after pruning – 552,988.
Description | Pathway ID | BOMA-UTR BH | BOMA-UTR P | GAIN-MGS BH | GAIN-MGS P | SNPs, BOMA-UTR (n) | SNPs, GAIN-MGS (n) |
dbMIR:gagcctg,mir-484 | GAGCCTG,MIR-484 | 1.60E-02 | 1.32E-04 | 1.01E-04 | 6.66E-06 | 1,658 | 2,332 |
dbGO:0008270:zinc ion binding | GO:0008270 | 1.58E-02 | 8.13E-06 | 1.01E-04 | 7.46E-06 | 14,839 | 34,704 |
dbGO:0046914:transition metal ion binding | GO:0046914 | 5.17E-04 | 8.20E-07 | 1.02E-04 | 1.14E-05 | 17,193 | 40,248 |
dbTFT::v$hnf4 q6 | V$HNF4_Q6 | 5.85E-04 | 6.42E-06 | 5.04E-04 | 9.33E-05 | 3,450 | 4,375 |
dbGO:0010628:positive regulation of gene expression | GO:0010628 | 2.34E-02 | 3.21E-04 | 7.88E-04 | 1.75E-04 | 8,878 | 18,006 |
dbTFT:v$chop 01 | V$CHOP_01 | 3.76E-05 | 1.65E-07 | 5.51E-03 | 1.63E-03 | 4,365 | 5,436 |
dbKEGG:04514:cell adhesion molecules (cams) | hsa04514 | 2.02E-02 | 8.45E-04 | 1.21E-02 | 4.02E-03 | 3,562 | 4,846 |
dbTFT:v$ciz 01 | V$CIZ_01 | 3.76E-05 | 1.63E-07 | 1.59E-02 | 5.88E-03 | 4,443 | 5,987 |
dbKEGG:04210:apoptosis | hsa04210 | 1.97E-02 | 6.42E-04 | 3.33E-02 | 1.48E-02 | 985 | 1,304 |
dbTFT:v$sox5 01 | V$SOX5_01 | 1.02E-03 | 2.01E-05 | 5.32E-02 | 2.56E-02 | 5,067 | 6,159 |
dbTFT:v$cebpa 01 | V$CEBPA_01 | 4.03E-04 | 3.54E-06 | 7.84E-02 | 4.07E-02 | 4,113 | 5,133 |
dbTFT:v$ptf1beta q6 | V$PTF1BETA_Q6 | 1.02E-03 | 1.60E-05 | 1.07E-01 | 6.73E-02 | 4,849 | 5,911 |
dbCGP:Kyng dna damage by uv | KYNG_DNA_DAMAGE_BY_UV | 2.89E-02 | 9.89E-05 | 1.49E-01 | 9.96E-02 | 577 | 732 |
dbGO:0050808:synapse organization | GO:0050808 | 3.21E-02 | 2.89E-05 | 3.28E-01 | 2.92E-01 | 2,504 | 3,862 |
* - Significant pathways identified by more than one pathway analysis method within the BOMA-UTR data set. The test statistics obtained using the alternative algorithms are provided in
Note: FDR – False Discovery Rate; BH – Benjamini-Hochberg.
To visualize the integrated Global Test results at the SNP, gene, and pathway levels, Circos plots were generated for the entire genome (
(B) Inset legend providing information represented by each data ring. Notes: for visibility, the implicated gene locations were magnified by up to 1200%. The inset legend image provides information represented by each ideogram. The −log10 values of the individual SNP and gene p-values increase radially outward. The arc of each heatmap wedge maps directly to the location of the SNP in the genome. The arc width is proportional to the size of the associated gene (plus 20 kb upstream and downstream). Individual SNP p-values for the BOMA-UTR and the GAIN-MGS data sets are shown as scatterplots on ideograms A and B. The gene p-values for the Psychiatric Genomics Consortium (PGC) datasets are shown as a scatterplot on ideogram C. The significance scores for genes contributing to pathway significance are shown as heatmaps on ideograms 1–14: 1 - dbGO:0050808:synapse organization; 2 - dbKEGG:04514:cell adhesion molecules; 3 - dbCGP:Kyng dna damage by UV; 4 - dbKEGG:04210:apoptosis; 5 - dbGO:0046914:transition metal ion binding; 6 - dbGO:0008270:zinc ion binding; 7 - dbGO:0010628:positive regulation of gene expression; 8 - dbMIR:gagcctg,mir-484; 9 - dbTFT:v$cebpa 01; 10 - dbTFT::v$hnf4 q6; 11 - dbTFT:v$chop 01; 12 - dbTFT:v$ptf1beta q6; 13 - dbTFT:v$ciz 01; 14 - dbTFT:v$sox5 01. The darker the red, the higher the contribution of the SNP/gene to the association of the respective pathway. Comparing the overlap of important genes across pathways shows whether they lie at the intersections of those pathways.
A total of 100 genes fulfilled the criteria described in the Methods section “Gene-based analysis with Global Test and FORGE”, i.e. these genes map to SNPs with a component Global Test p-value of <0.001 in the BOMA-UTR dataset. Of these, the following eight genes were annotated to at least four (up to eight) of the 14 replicated pathways, thus indicating their potential importance in terms of SCZ risk:
Of the genes that were annotated to the 14 replicated pathways, the top 100 were then tested in the Psychiatric Genomewide Association Study Consortium (PGC) data. Of these, significant results were obtained for 18 genes (see
PolyPhen-2 predicted that the coding SNPs of interest in
Notes: * genotyped in the BOMA-UTR data set and sorted by their genomic coordinates. SNPs are within or 20 kb upstream and downstream of
The complete functional annotation data for the SNPs of
In the present study, a genome-wide pathway association analysis was performed by means of the Global Test. The analyses involved well-curated descriptions of 7,350 pathways, and were carried out on large-scale discovery and replication datasets. A gene-based analysis of genes with a high contribution to the significance of the top pathways was then performed using the SCZ GWAS results of the PGC. Finally, a functional SNP-based analysis of the top hit genomic regions was conducted. Through this hierarchical approach, we were able to replicate pathway findings from previous studies of SCZ and detect novel pathways and genomic regions with an association to SCZ in the investigated samples. In the discovery set, we detected evidence for a significant contribution of 27 pathways. Of these, 14 remained significant in the replication dataset. The 14 replicated pathways are involved in transcriptional regulation and gene expression, synapse organization, cell adhesion, and apoptosis.
Previous pathway analyses of SCZ GWAS data have identified associations with pathways that are mainly involved in processes critical to synaptic function, neurodevelopment, cell adhesion, the immune system, the estrogen biosynthetic process, and apoptosis
However, the majority of pathways with significant association to SCZ in the present study are novel, and they are mainly involved in transcriptional regulation and gene expression. One reason why previous pathway-based studies of SCZ did not generate similar findings may be that they focused mainly on gene sets from the KEGG and BioCarta databases, whereas we accessed several pathway databases. These included the GO database, as well as specialized gene-set collections on chemical and genomic perturbations (dbCGP) and on transcriptional regulation (dbTFT and dbMIR). It should be noted that only a few of our 14 replicated pathways achieved significance in the analysis of our discovery sample using GRASS
As part of our hierarchical approach, we aimed to identify which genes in a particular pathway could be responsible for the association with SCZ risk. Integration of gene-based analysis facilitated both the prioritization of potential candidate genes and more precise formulation of hypotheses concerning the functional consequences of the potential pathway perturbations (i.e. at the gene- and SNP-level). In particular, we explored how variants that emerged as being of importance for our pathway- and gene-based signals might affect the function and regulation of other genes.
In the gene-based analysis,
Another top hit gene in the present study was
The association with the apoptosis pathway was driven predominantly by a SNP which mapped to
In conclusion, the present study demonstrated that use of information from databases focusing on cell-regulatory networks together with information from traditional pathway database resources can facilitate the identification of susceptibility factors for the complex neuropsychiatric disease SCZ. Through the application of a well-designed hierarchical framework, our study highlighted the importance of calcium channel signaling, cell adhesion, and the modulation of transcriptional regulation implicated in neuronal diversity, neurite growth, and synapse formation in the etiology of SCZ. In particular,
Each participant provided written informed consent prior to inclusion and all aspects of the study complied with the Declaration of Helsinki. The study was approved by the ethics committees of all study centers. For the German samples, this comprised the Ethics Committee of the Rheinische Friedrich-Wilhelms-University Medical School in Bonn, Ethics Committee “Medizinische Ethik-Kommission II” of the University of Heidelberg, the Ethics Committee of the Friedrich-Schiller-University Medical School in Jena, and the Ethics Committee of the Ludwig-Maximilians-University Munich. Samples obtained through dbGaP were collected using institutional review board-approved protocols in three studies, i.e. Schizophrenia Genetics Initiative (SGI), Molecular Genetics of Schizophrenia Part 1 (MGS1), and MGS2.
Participants from four datasets were included (
To accommodate the Global Test's assumption of independence between variables, the SNP set was pruned on the basis of a variance inflation factor (VIF) using a sliding-window approach, as implemented in PLINK
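The VIF-based pruning step can be illustrated with the minimal Python sketch below. It is a simplified re-implementation written for illustration, not PLINK's code, and it assumes additively coded genotypes (0/1/2), no missing calls, and no monomorphic SNPs. Each SNP's VIF is 1/(1 − R²), where R² comes from regressing that SNP on all other SNPs in the window; this equals the corresponding diagonal entry of the inverse correlation matrix. Within each window the SNP with the largest VIF is dropped until all VIFs fall at or below the threshold. The window size (50), step (5), and threshold (2.0) are illustrative assumptions; the parameters used in the study are not stated in this section.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two genotype vectors (assumes neither is monomorphic)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(m: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [rv - f * cv for rv, cv in zip(a[r], a[col])]
    return [row[n:] for row in a]


def vifs(genos: List[List[float]], ridge: float = 1e-8) -> List[float]:
    """VIF_j = 1 / (1 - R_j^2) = j-th diagonal entry of the inverse correlation
    matrix. The tiny ridge keeps the inverse defined when SNPs are perfectly
    collinear; their VIFs then become very large."""
    k = len(genos)
    corr = [[pearson(genos[i], genos[j]) + (ridge if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    inv = invert(corr)
    return [inv[j][j] for j in range(k)]


def prune_window(genos: List[List[float]], idx: List[int], threshold: float) -> List[int]:
    """Repeatedly drop the SNP with the largest VIF until all VIFs <= threshold."""
    kept = list(idx)
    while len(kept) > 1:
        v = vifs([genos[i] for i in kept])
        worst = max(range(len(kept)), key=lambda j: v[j])
        if v[worst] <= threshold:
            break
        kept.pop(worst)
    return kept


def vif_prune(genos: List[List[float]], window: int = 50, step: int = 5,
              threshold: float = 2.0) -> List[int]:
    """`genos` holds one genotype vector per SNP, ordered by position. Slide a
    window of `window` SNPs forward by `step` SNPs and return surviving indices."""
    removed = set()
    for start in range(0, len(genos), step):
        idx = [i for i in range(start, min(start + window, len(genos))) if i not in removed]
        kept = set(prune_window(genos, idx, threshold))
        removed.update(i for i in idx if i not in kept)
        if start + window >= len(genos):
            break
    return [i for i in range(len(genos)) if i not in removed]
```

With two identical SNPs and a third SNP uncorrelated with them, this sketch removes exactly one of the duplicates and keeps the uncorrelated SNP. The greedy "drop the worst SNP first" rule is one reasonable design choice; it tends to retain more SNPs than removing every SNP above the threshold at once.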
For the gene-based analysis, PGC data (
SNPs were annotated with information from dbSNP Build 127. The “seq-gene” file containing information for annotating the SNP rs numbers to ENTREZ gene IDs was downloaded from the NCBI ftp website (BUILD 36.3). SNPs were assigned to a gene if the SNP was located within the genomic sequence or within 20 kb of the 5′ and 3′ ends of the first and last exons in order to account for important regulatory regions
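The SNP-to-gene assignment rule described above (SNP inside the gene, or within 20 kb of its 5′ and 3′ ends) can be sketched as a simple interval test. This is a hypothetical illustration, not the authors' annotation pipeline: the `Gene` record, its fields, and the example identifiers are assumptions, and gene boundaries are taken to be the lowest and highest coordinates of the first/last exon span on the same genome build as the SNP positions.

```python
from typing import Dict, List, NamedTuple, Tuple

FLANK_BP = 20_000  # 20 kb on each side of the gene, as described in the text


class Gene(NamedTuple):
    gene_id: str  # e.g. an Entrez gene ID (hypothetical values in the test)
    chrom: str
    start: int    # lower genomic coordinate of the first/last exon span
    end: int      # upper genomic coordinate of that span


def assign_snps_to_genes(
    snps: Dict[str, Tuple[str, int]],
    genes: List[Gene],
    flank: int = FLANK_BP,
) -> Dict[str, List[str]]:
    """Map each SNP (rs ID -> (chromosome, position)) to every gene whose
    span, extended by `flank` bp on both sides, contains the SNP.

    Because the flank is symmetric, using the lower/upper coordinates covers
    both the 5' and 3' ends regardless of strand. A SNP may map to several
    genes or to none.
    """
    genes_by_chrom: Dict[str, List[Gene]] = {}
    for gene in genes:
        genes_by_chrom.setdefault(gene.chrom, []).append(gene)

    mapping: Dict[str, List[str]] = {}
    for rs_id, (chrom, pos) in snps.items():
        mapping[rs_id] = [
            g.gene_id
            for g in genes_by_chrom.get(chrom, [])
            if g.start - flank <= pos <= g.end + flank
        ]
    return mapping
```

A linear scan per chromosome is adequate for a sketch; for genome-wide data (hundreds of thousands of SNPs) an interval tree or a sort-and-sweep over positions would be the more efficient choice.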
Selected gene-set collections were accessed from the Molecular Signatures Database (MSigDB, version 3.0)
For the pathway-based analysis, the Global Test
At the discovery stage of the analysis, a less conservative correction for multiple testing was applied in order to prioritize the identification of associated pathways. This approach is justified because any false positives would be filtered out in the replication analysis. Multiplicity correction was applied separately to each collection of pathways/gene-sets. For pathways/gene-sets retrieved from the KEGG, Reactome, and MSigDB gene-set collections, the pathway scores were corrected for multiple testing using the Benjamini-Hochberg method
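The Benjamini-Hochberg step-up procedure, applied separately within each gene-set collection, can be written in a few lines. The sketch below is a generic textbook implementation, not the code used in the study; the collection and gene-set names in the test are hypothetical. Correcting per collection means that the number of tests m is the number of gene-sets in that collection rather than the total across all collections.

```python
from typing import Dict, List, Sequence


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Return BH-adjusted p-values in the original order:
    q_(i) = min over j >= i of min(1, p_(j) * m / j), with p_(1) <= ... <= p_(m)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bh_per_collection(collections: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Apply BH separately to each collection (collection -> {gene-set ID: p})."""
    corrected: Dict[str, Dict[str, float]] = {}
    for collection, pvals in collections.items():
        ids = list(pvals)
        q = benjamini_hochberg([pvals[g] for g in ids])
        corrected[collection] = dict(zip(ids, q))
    return corrected
```

For example, the raw p-values 0.01, 0.04, 0.03, and 0.005 within one collection become 0.02, 0.04, 0.04, and 0.02 after adjustment, whereas a single gene-set tested alone in another collection keeps its raw p-value.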
To estimate the contributions of individual SNPs to a pathway- or a gene association, the component global test was performed using the
Only pathways that were significantly associated with SCZ in the discovery set were followed up (
The aim of the second step (
The third step (
Heatmap of the level of gene overlap between the 27 schizophrenia-associated pathways. The value in each cell indicates the maximum fractional overlap of the genes in a pathway (listed on the y-axis). The corresponding pathway on the x-axis is the pathway with the highest overlap (self-overlap is excluded).
(TIF)
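The overlap measure in the heatmap caption above can be sketched as follows. This is an assumed reading of the caption, not the authors' code: for each pathway A, the fraction is taken as |A ∩ B| / |A| (relative to the size of the y-axis pathway), and the partner B with the largest fraction is reported, excluding A itself. Under this reading the measure is asymmetric, since a small pathway nested inside a large one has a fraction of 1 while the large pathway's fraction is small. Pathway and gene names in the test are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple


def max_fraction_overlap(
    pathways: Dict[str, Set[str]],
) -> Dict[str, Tuple[Optional[str], float]]:
    """For each pathway A, return (B, |A & B| / |A|) for the other pathway B
    that contains the largest fraction of A's genes; self-overlap is excluded.
    Ties keep the first pathway encountered in dictionary order."""
    result: Dict[str, Tuple[Optional[str], float]] = {}
    for name, genes in pathways.items():
        best_name: Optional[str] = None
        best_frac = 0.0
        for other, other_genes in pathways.items():
            if other == name or not genes:
                continue
            frac = len(genes & other_genes) / len(genes)
            if best_name is None or frac > best_frac:
                best_name, best_frac = other, frac
        result[name] = (best_name, best_frac)
    return result
```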
Hierarchical clustering of replicated pathways. The data are the counts of overlapping implicated single nucleotide polymorphisms, as detected using the Global Test in the BOMA-UTR dataset.
(TIF)
Comparison of the p-values obtained from the single nucleotide polymorphism-label permutation and subject-sampling test for all gene-sets.
(TIF)
Comparisons of FDRs (BH) and P-values (P) for (
(DOC)
Comparison of redundancies in the subsets of the 6 pathway databases/gene-set collections.
(DOC)
(
(DOC)
List of schizophrenia (SCZ) associated genes, their p-values (FORGE analysis), and membership in the SCZ associated pathways discovered and replicated in the present study. Pathways in bold also showed an overall association using one of the other three methods (ALIGATOR, GRASS, gseaSNP) applied in the present study.
(DOC)
Potential functional consequences of CTCF associated SNPs.
(XLS)
Potential functional consequences of CACNB2 associated SNPs.
(XLS)
The Global Test results for the discovered gene-sets remained significant when the test was repeated with varying degrees of multicollinearity in the data.
(DOC)
Description of supplementary results and methods.
(DOC)
We thank two anonymous reviewers, whose comments and suggestions improved and clarified the manuscript. We are grateful to all of the patients who contributed to this study. We also thank the probands from the community-based cohorts of PopGen, KORA, and the Heinz Nixdorf Recall (HNR) study. We thank Rolf Kabbe and Karl-Heinz Groß for providing IT support. We thank Christine Schmäl for her critical reading of the manuscript. We acknowledge the contribution of Fitnat Buket Basmanav to the generation of the genome-wide association study data sets analyzed in the present study.