A Powerful Procedure for Pathway-Based Meta-analysis Using Summary Statistics Identifies 43 Pathways Associated with Type II Diabetes in European Populations

Meta-analysis of multiple genome-wide association studies (GWAS) has become an effective approach for detecting single nucleotide polymorphism (SNP) associations with complex traits. However, it is difficult to integrate the readily accessible SNP-level summary statistics from a meta-analysis into more powerful multi-marker testing procedures, which generally require individual-level genetic data. We developed a general procedure called Summary based Adaptive Rank Truncated Product (sARTP) for conducting gene and pathway meta-analysis that uses only SNP-level summary statistics in combination with genotype correlation estimated from a panel of individual-level genetic data. We demonstrated the validity and power advantage of sARTP through empirical and simulated data. We conducted a comprehensive pathway-based meta-analysis with sARTP on type 2 diabetes (T2D) by integrating SNP-level summary statistics from two large studies consisting of 19,809 T2D cases and 111,181 controls with European ancestry. Among 4,713 candidate pathways from which genes in neighborhoods of 170 GWAS established T2D loci were excluded, we detected 43 T2D globally significant pathways (with Bonferroni corrected p-values < 0.05), which included the insulin signaling pathway and T2D pathway defined by KEGG, as well as the pathways defined according to specific gene expression patterns on pancreatic adenocarcinoma, hepatocellular carcinoma, and bladder carcinoma. Using summary data from 8 eastern Asian T2D GWAS with 6,952 cases and 11,865 controls, we showed 7 out of the 43 pathways identified in European populations remained to be significant in eastern Asians at the false discovery rate of 0.1. We created an R package and a web-based tool for sARTP with the capability to analyze pathways with thousands of genes and tens of thousands of SNPs.


Introduction
Genome-wide association study (GWAS) has become a very effective way to identify common genetic variants underlying various complex traits [1]. The most commonly used approach to analyze GWAS data is the single-locus test, which evaluates one single nucleotide polymorphism (SNP) at a time. Despite the enormous success of the single-locus analysis in GWAS, proportions of genetic heritability explained by already identified variants for most complex traits still remain small [2]. It is increasingly recognized that the multi-locus test, such as genebased analysis and pathway (or gene-set) analysis, can be potentially more powerful than the single-locus analysis, and shed new light on the genetic architecture of complex traits [3,4].
The pathway analysis jointly tests the association between an outcome and SNPs within a set of genes compiled in a pathway according to existing biological knowledge [4]. Although the marginal effect of a single SNP might be too weak to be detectable by the single-locus test, accumulated association evidence from all signal-bearing SNPs within a pathway could be strong enough to be picked up by the pathway analysis if this pathway is enriched with outcome-associated SNPs. Various pathway analysis procedures have been proposed in the literature, with the assumption that researchers could have full access to individual-level genotype data [5][6][7][8][9]. In practice, pathway analysis usually utilizes data from a single resource with limited sample size, as it can be challenging to obtain and manage individual-level GWAS data from multiple resources. As a result, pathway analysis often fails to identify new findings beyond what have already been discovered by the single-locus tests. To maximize the chance of discovering novel outcome-associated variants by increasing sample size, a number of consortia have been formed to conduct single-locus meta-analysis on data across multiple GWAS [10][11][12][13][14]. The single-locus meta-analysis aggregates easily accessible SNP-level summary statistics from multiple studies. Similarly, the pathway-based meta-analysis [15][16][17][18][19][20][21] that integrates the same type of summary data across participating studies could provide us a greater opportunity for detecting novel pathway associations. Future association studies focusing on identified pathways would have a much-reduced multiple-comparison burden in searching for novel variants with main or complicated nonlinear joint effects on the outcome of interest.
In this paper, we developed a pathway-based meta-analysis procedure by extending the adaptive rank truncated product (ARTP) pathway analysis procedure [9], which was originally developed for analyzing individual-level genotype data. The new procedure, called Summary based ARTP (sARTP), accepts input from SNP-level summary statistics, with their correlations estimated from a panel of reference samples with individual-level genotype data, such as the ones from the 1000 Genomes Project [22,23]. This idea was initially used in conducting genebased meta-analysis [24,25] or conditional test [26]. As will be shown in the Results Section, sARTP usually has a power advantage over its competitors. In addition, sARTP is specifically designed for conducting pathway-based meta-analysis using SNP-level summary statistics from multiple studies. In real applications (e.g., the type 2 diabetes example described below), it is very common that different studies could have genotypes measured or imputed on different sets of SNPs. As a result, the sample size used in the pathway-based meta-analysis on each SNP can be quite different. Ignoring the difference in sample sizes across SNPs in a pathwaybased meta-analysis would generate biased testing results.
Pathway analysis generally targets two types of null hypotheses [4], including the competitive null hypothesis [15,16,[18][19][20], i.e., the genes in a pathway of interest are no more associated with the outcome than any other genes outside this pathway, and the self-contained null hypothesis [17,21], i.e., none of the genes in a pathway of interest is associated with the outcome. The sARTP procedure focuses on the self-contained null hypothesis, as our main goal is to identify outcome-associated genes or loci. Also, as pointed out by [27], tests for the competitive null hypothesis often assume that genotype measured at different genes are independent when evaluating the association significance level. This assumption, which is generally invalid in practice, is unnecessary for sARTP when testing the self-contained null hypothesis. One may refer to [27] and [4] for more discussion and comparison of these two types of hypotheses.
The pathways defined in many public databases can consist of thousands of genes and tens of thousands of SNPs. To make the procedure applicable to large pathways, or pathways with high statistical significance, we implement sARTP with efficient and parallelizable algorithms, and adopt the direct simulation approach (DSA) [28] to evaluate the significance of the pathway association.
We demonstrated the validity and power advantage of sARTP through simulated and empirical data. We applied sARTP to conduct a pathway-based meta-analysis on the association between type 2 diabetes (T2D) and 4,713 candidate pathways defined in the Molecular Signatures Database (MSigDB) v5.0. The analysis used SNP-level summary statistics from two sources with European ancestry. One is generated from the Diabetes Genetics Replication and Meta-analysis (DIAGRAM) consortium [13], which consists of 12,171 T2D cases and 56,862 controls across 12 GWAS. The other one is based on a T2D GWAS with 7,638 T2D cases and 54,319 controls that were extracted from the Genetic Epidemiology Research on Aging (GERA) study [29,30]. The novel T2D-associated pathways detected in the European population were further examined in Asians using summary data generated by the Asian Genetic Epidemiology Network (AGEN) consortium meta-analysis, which combined 8 GWAS of T2D with a total of 6,952 and 11,865 controls from eastern Asian populations [10].

Materials and Methods
The Pathway-Based Meta-analysis Procedure Here we describe the proposed method sARTP for assessing the association between a dichotomous outcome and a pre-defined pathway consisting of J genes. The same procedure can be applied to study a quantitative outcome with minor modifications.
Score statistics and their variance-covariance matrix. We assume we have data from L GWA studies, with each consisting of n (l) subjects, l = 1,Á Á Á,L. Each gene in that pathway can contain one or multiple SNP(s), while any two genes may have some overlapped SNPs. For simplicity, we use superscript l to represent an individual study. For subject i in study l, i = 1,Á Á Á,n (l) , let y ðlÞ i be the dichotomous outcome (e.g., disease condition, case/control status) taking values from {0,1}, and let X ðlÞ i be the vector of covariates to be adjusted for. The centralized genotypes of q SNPs within a pathway are presented as a vector G ðlÞ i ¼ ðg ðlÞ i1 ; Á Á Á ; g ðlÞ iq Þ T for subject i. We assume the following logistic regression model as the risk model Under the self-contained null hypothesis H 0 : γ = 0, we denote the maximum likelihood estimate of α (l) as b a ðlÞ . Let b y ðlÞ i ¼ 1=ð1 þ expðÀX ðlÞ i b a ðlÞ ÞÞ and u ðlÞ The Rao's score statistic vector on γ, which is the sum of score vectors from L participating studies, follows the asymptotic multivariate normal distribution N(0,V), where When only the summary statistics, i.e., the estimated marginal log odds ratios b b ðlÞ t and their standard errors t ðlÞ t are available for each of the L studies, the score statistic at SNP t, defined by (Eq 1) can be approximated as Assume that ρ ts can be estimated from a public dataset (e.g., 1000 Genomes Project) and the sample sizes n ðlÞ t and n ðlÞ ts are known, we can approximately recover the variance-covariance matrix V = (V ts ) q×q of score statistics S = (S t ) q×1 . In cases when we only have the SNP p-value p and its marginal log odds ratio b b, we can compute its standard error as Combining score statistics for pathway analysis. With recovered score statistics vector S and its variance-covariance matrix V, we can conduct a pathway association test using the framework of the ARTP method. The ARTP method first combines p-values of individual SNPs within a gene to form a gene-based association statistic (i.e., the gene-level p-value), and then combines the gene-level p-values into a final testing statistic for the pathway-outcome association. In the original ARTP method, [9] proposed the use of a resampling-based method to evaluate the significance level of the pathway association test. Here we integrate the SNPlevel score statistics into the ARTP framework and use DSA [28] to evaluate the significance level, which is much faster than the original ARTP algorithm [31]. Below is a brief summary of the improved ARTP algorithm.
First we obtain the p-values p ð0Þ t 1 ; Á Á Á ; p ð0Þ t q j of q j distinct SNPs in gene j as p ð0Þ jð1Þ ; Á Á Á ; p ð0Þ jðq j Þ be their order statistics such that p ð0Þ jð1Þ Á Á Á p ð0Þ jðq j Þ . For any predefined integer K and SNP-level cut points c 1 < Á Á Á < c k , we define the observed negative log product statistics for that gene at cut point c k as We sample M copies of vectors of the score statistic from the null distribution N(0,V) and convert each of them to be the tail probability of w 2 1 as p ðmÞ t 1 ; Á Á Á ; p ðmÞ t q j , m = 1,Á Á Á,M, which are then used to calculate w ðmÞ jk , m = 1,Á Á Á,M. The significance of w ð0Þ jk can be estimated as The ARTP statistic for testing association between gene j and the outcome is defined as T ð0Þ jk . Note that for any w ðmÞ jk , the set fw ðm0Þ jk : m 0 2 f0; Á Á Á ; Mg and m 0 6 ¼ mg forms its empirical null distribution. The significance of w ðmÞ jk therefore can be estimated as This idea, which was given by [32], can be used to avoid the computationally challenging nested two-layer resampling procedure for evaluating p-values. The p-value of T ð0Þ j can be readily calculated as j is the estimated gene-level p-value for the association between the outcome and the jth gene. To obtain the pathway p-value, a similar procedure as above can be applied to combine already established gene-level p-values z ð0Þ j , j = 1,Á Á Á,J, through a set of K 0 gene-level cut points d 1 < Á Á Á <d K 0 . For simplicity, let z ð0Þ k be the significance (p-value) of negative log product statistics defined on z ð0Þ j , j = 1,Á Á Á,J at a specific cut point d k , k = 1,Á Á Á,K 0 . The ARTP statistic for the pathway association is defined as T ð0Þ ¼ min k¼1;ÁÁÁ;K 0 z ð0Þ k . The top d k Ã genes, at which z ð0Þ k Ã ¼ min k¼1;ÁÁÁ;K 0 z ð0Þ k , can be regarded as the set of selected candidate genes that collectively convey the strongest pathway association signal.
In the following discussion, we will use the term sARTP to represent the proposed pathway analysis procedure using the SNP-level summary statistics as input, and reserve the term ARTP to represent the original ARTP procedure that requires the individual-level genetic data. Both procedures adopt the DSA algorithm to accelerate evaluating the significance level. When performing the pathway analysis in this paper, we set SNP-level cut points as (c 1 ,c 2 ) = (1,2), i.e., gene-level association is summarized by one or two most significant SNPs within each gene, and gene-level cut points as d k ¼ kmaxð1; dJ=20eÞ, k = 1,Á Á Á,10, where J is the number of genes in a pathway, and dJ=20e is the largest integer that is less or equal to J / 20. We used M = 10 5 DSA steps to assess the significance level of each pathway in the initial screening. For pathways with estimated p-values <10 −4 , we further refined their p-value estimates with M = 10 7 or 10 8 DSA steps.
Applying sARTP to meta-analysis result. Many GWAS consortia usually publish their meta-analysis results by providing only the combined results from the fixed effects model, rather than the summary statistics from each participating study. We can apply sARTP to this metaanalysis result directly, with some modifications. First, since the reported marginal log odds ratios for each SNP by using the fixed effects inverse-variance weighting method is given by ; with its standard error given by Based on (Eq 4), we can see S t % t À2 t b b t . By assuming large sample sizes and certain conditions (see S1 Text), we can also approximate the covariance between S t and S s , which is given by (Eq 5), as where n t ¼ X L l¼1 n ðlÞ t , and n ts ¼ n ðlÞ ts . Thus, using just the meta-analysis result, without knowing summary statistics from each participating study, we can still obtain S t exactly, and approximately recover V ts . As a result, we can carry out the pathway-based meta-analysis based on the SNP-level meta-analysis result as if it were summary data from a single study. We call this approach the Meta-analysis based sARTP (MsARTP). However, to apply the MsARTP, we need additional sample size information n t and n st in order to properly estimate the variance-covariance matrix defined by (Eq 7). If the same set of SNPs are studied by all participating studies, we have n t = n s = n st , and the approximation (Eq 7) becomes V ts % r ts t t t s , i.e., we can obtain the estimated variance-covariance matrix without knowing n t and n st . But in most applications, not all GWAS choose the same SNP genotyping array, even after the imputation using the same reference genomes. As a result, the SNP coverage, i.e., the set of SNPs evaluated in each participating study can be quite different. In those situations, we need to know the SNP coverage information in each participating study in order to obtain n s and n st . We will show in the Results Section that using MsARTP with an inappropriate uniform coverage assumption (i.e., n ts = n t = n s ), which is commonly made by many multilocus approaches, can lead to inflated type I error.
Given SNP-level summary statistics from each participating study, we can either apply sARTP directly, or first conduct a SNP-level meta-analysis, and then apply MsARTP to the meta-analysis result. These two approaches use the same score statistics, and different but consistent estimates for the variance-covariance matrix. Numeric experiments in the Results Section suggest that these two approaches generate vary similar pathway p-values.

Study Materials
Pathway and gene definition. We downloaded definitions for 4,716 human and murine (mammalian) pathways (gene sets) from the MSigDB v5.0 (C2: curated gene sets). Genomic definitions for genes were downloaded from Homo sapiens genes NCBI36 and reference genome GRCh37.p13 using the Ensemble BioMart tool.
DIAGRAM study. The DIAGRAM (DIAbetes Genetics Replication And Meta-analysis) consortium conducted a large-scale GWAS meta-analysis to characterize the genetic architecture of T2D [13]. We downloaded the summary statistics generated by the DIAGRAMv3 (Stage 1) GWAS meta-analysis from www.diagram-consortium.org [13]. The meta-analysis studied 12 GWAS with European ancestry consisting of 12,171 cases and 56,862 controls. Up to 2.5 million autosomal SNPs with minor allele frequencies (MAFs) larger than 1% were imputed using CEU samples from Phase II of the International HapMap Project. Study-specific covariates were adjusted in testing T2D-SNP association under an additive logistic regression model [13]. SNP-level summary statistics from each GWAS were first adjusted for residual population structure using the genomic control (GC) method [33], and then combined in the fixed effects meta-analysis.
We sorted 2.5 million autosomal SNPs by their corresponding meta-analysis sample sizes in S1 Fig, which shows that there are two major groups of SNPs with equal sample sizes. One group of 469,985 SNPs (19.0%) had 12,171 cases and 56,862 controls, which included all the available samples in the meta-analysis; another group of 1,431,361 SNPs (57.9%) had 9,580 cases and 53,810 controls. Since the calculation of covariance V ts in (Eq 7) relies on n ts , the number of samples having genotypes available at both SNP s and SNP t, in order to obtain an accurate estimate of n ts , we focused on these two groups of SNPs, which in combination had a total of 1,901,346 SNPs. For any two SNPs in this reduced set, it is certain n ts = min(n t , n s ). The Pearson's correlation coefficients ρ ts were estimated using an external reference panel consisting of genotypes on 503 European subjects (CEU, TSI, FIN, GBR, and IBS) from the 1000 Genomes Project (Phase 3, v5, 2013/05/02).
GERA study. We assembled a GWAS on T2D from the Genetic Epidemiology Research on Adult Health and Aging (GERA, dbGaP Study Accession: phs000674.v1.p1). The GERA project includes a cohort of over 100,000 adults who are members of the Kaiser Permanente Medical Care Plan, Northern California Region, and participating in the Kaiser Permanente Research Program on Genes, Environment, and Health (RPGEH). From the GERA data, we compiled a GWAS with 7,638 T2D cases and 54,319 controls (subjects without T2D) who selfreported to be non-Hispanic White Europeans in the RPGEH survey. We performed the genotype imputation with IMPUTE2 [34] using CEU reference samples from Phase II of the International HapMap Project. After removing SNPs with low imputation quality (r 2 < 0.3), we ended up with 2.4 million SNPs for further analysis. In the single-locus analysis, we adjusted for the categorized body mass index (BMI) provided in the downloaded dataset (adding a category for missing BMI), gender, year of birth (in five-year categories), a binary indicator on whether or not a participant was diagnosed with cancer (includes malignant tumors, neoplasms, lymphoma and sarcoma), and the top five eigenvectors for the adjustment of population stratification. In the following discussion, we refer this assembled T2D GWAS as the GERA study. When analyzing the SNP-level summary data from the GERA study, the Pearson's correlation coefficients ρ ts were estimated using an external reference panel consisting of genotypes on 503 European subjects from the 1000 Genomes Project.
AGEN-T2D study. The Asian Genetic Epidemiology Network (AGEN) consortium carried out a meta-analysis by combining eight GWAS of T2D with a total of 6,952 cases and 11,865 controls from eastern Asian populations [10]. The meta-analysis was conducted with the fixed effect model. We obtained SNP-level summary statistics on 2.6 million imputed and genotyped autosomal SNPs from AGEN, and used this summary data to evaluate whether pathway associations identified in European populations remain to be present in Asians. We adopted an external reference panel consisting of 312 eastern Asian subjects (103 from CHB, 105 from CHS, and 104 from JPT) from the 1000 Genomes Project for the variance-covariance matrix estimation in the pathway analysis.

Simulation Studies
Firstly, we conducted a simulation study to evaluate the empirical size of sARTP and MsARTP. Secondly, we compared empirical powers of different strategies for carrying out pathway-based meta-analysis that integrated summary statistics from multiple studies. We also evaluated whether results from sARTP were consistent with the ones from MsARTP. Thirdly, we compared our method to the recently developed method aSPUsPath [8] that can be used for pathway-based meta-analysis. We used the R package, aSPU (version 1.39), with the default settings given in [8,17] to conduct the aSPUsPath test.
Empirical size of sARTP and MsARTP. To evaluate the empirical size of sARTP and MsARTP, we conducted a simulation study by using individual-level GWAS data of the pathway PUJANA_BREAST_CANCER_WITH_BRCA1_MUTATED_UP (including 728 SNPs in 50 genes) from the GERA study. We picked 12,000 samples randomly for this experiment. By keeping their genotypes unchanged, we randomly assigned 6,000 subjects as cases and the remaining as controls to generate 500,000 datasets. We split each dataset into three case-control studies, each with 2,000 cases and 2,000 controls. To mimic the scenario when not all studies have their genotypes measured on the same set of SNPs (such as the one occurred in the DIAGRAM and AGEN data), we assumed that each case-control study had genotypes measured on only half of SNPs in the pathway. For each generated dataset that consisted of three case-control studies, we applied sARTP to the SNP-level summary data obtained from each case-control study, and MsARTP to the meta-analysis result based on the three case-control studies, with the variance-covariance matrix estimated by an external reference panel (with 503 European reference samples from the 1000 Genomes Project), or an internal reference panel (with 500 samples randomly selected from the GERA data).
Based on results from the 500,000 generated datasets, this simulation study showed that both sARTP and MsARTP, using the internal or external reference samples, can well control their empirical sizes (Table 1). Given the same reference panel, the p-values estimated from sARTP and MsARTP are highly consistent (Pearson's correlation coefficient > 0.99). Furthermore, the p-values of sARTP (or MsARTP) estimated with an external or internal reference panel are also very consistent (Pearson's correlation coefficient > 0.99). More numeric experiments demonstrating the validity of sARTP under the null are described in S2 Text.
To demonstrate the importance of knowing n t and n ts when applying MsARTP to the metaanalysis result, we analyzed each simulated dataset using MsARTP assuming the uniform coverage (n ts = n t = n s ). We called this approach MsARTP-u. It is clear from Table 1 that MsARTP-u assuming the uniform coverage suffers from inflated type I errors with either the internal or external reference panel.
Empirical power of sARTP and MsARTP for pathway-based meta-analysis. We conducted a set of simulation studies to compare the power of different strategies to carry out pathway analysis when SNP-level summary statistics were available from multiple studies. We considered a hypothetical pathway consisting of 50 genes randomly selected from chromosome 17, each with 20 randomly chosen SNPs. The joint genotype distribution at the 20 SNPs within each gene was defined by the observed genotypes in the GERA study. We further assumed that all genes in that pathway are independent. This assumption is unnecessary for sARTP and MsARTP, but it was introduced for simplifying the simulation. For the risk model, we assumed the first M (M ¼ 5; 10; 15) genes were associated with the outcome. Within each outcomeassociated gene, we picked the SNP with its MAF closest to the median MAF level within the gene to be functional. We considered the following risk model where g Ã l is the genotype (encoded as 0, 1, or 2 according to counts of minor alleles) at the functional SNP within gene l. Under this model, g Ã l is also the marginal log odds ratio for the lth functional SNP [9]. Given the sample sizes of cases and controls, and the MAF of the lth functional SNP, g Ã l was chosen such that the theoretical power of the trend test to detect the lth functional SNP is equal to P (P ¼ 0:3; 0:4), with 0.05 as the targeted type I error rate. For every pair of ðM; PÞ, we generated 1,000 datasets, each consisting of three case-control studies, with the same sample size and SNP coverage configurations used for evaluating the empirical size. Given the genotype distribution in the general population, individual-level genotype data for a case-control study can be generated according to the assumed risk (model 8).
We assumed that only SNP-level summary statistics from each of the three studies were available. For each simulated dataset, we applied sARTP and MsARTP, using either an internal or external reference panel to estimate the variance-covariance matrix. The sARTP and MsARTP approaches integrate association evidence across SNP-level summary statistics, which are obtained by pooling information from all participating studies on individual SNPs. As a comparison, we also considered a naïve approach, in which we first applied sARTP to analyze the summary statistics from each study separately, and then combined the three pathway p-values with Fisher's method. This naïve approach could be useful when the researchers do not have access to the SNP-level summary data but the pathway p-values from individual studies. The empirical powers are compared at the type I error level of 0.05, and are summarized in Table 2. It is obvious that the pathway-based meta-analysis using sARTP, with either the internal or external reference panel, have almost the same level of power as the MsARTP method. It is also evident that both sARTP and MsARTP are more powerful than the naïve approach, which suggests that it is always be beneficial to have the SNP-level summary statistics from each participating study, or SNP-level meta-analysis result when conducting a pathway analysis.
Given the SNP coverage information, the MsARTP method is a valid pathway association test that has well controlled type I error and similar power to the sARTP method. In the following analysis, either sARTP or MsARTP is chosen depending on the type of available data. For the sake of simplicity, we always label the chosen procedure as sARTP.
Power comparison between sARTP and aSPUsPath. Since the aSPUsPath method in the current aSPU package cannot handle summary data from multiple studies, or meta-analysis results from studies with varied SNP coverage, we focused on the scenario with just one study, and adopted the similar simulation strategy as the one used by [8] to compare the power between sARTP and aSPUsPath. We simulated haplotypes on a set of SNPs within a gene in the general population using the algorithm of Wang and Elston [35]. Then the joint genotypes on a subject can be formed by randomly pairing two haplotypes. In brief, we first chose the MAF for each SNP by randomly sampling a value from the uniform distribution U(0.1,0.4).
Then for the set of SNPs in a gene we sampled a latent vector Z ¼ ðz 1 ; Á Á Á ; z q Þ T from a multivariate normal distribution with a covariance matrix Cov(z i , z j ) = ρ |i−j| ,1 i,j q, where ρ was sampled from the uniform distribution U(0,0.8) for a given gene. We randomly picked 50% of the SNPs and converted their simulated z i into minor and major alleles (coded as 0, 1), with the cuts chosen for each z i such that the resultant minor allele has its frequency defined by the specified MAF. For the remaining SNPs, we used the same algorithm to dichotomize −z i into minor and major alleles. This created a more realistic haplotype structure such that a haplotype can consist of a mixture of minor and major alleles. Genotypes on SNPs from different genes were generated independently. Given the number of genes (20, 50, or 80) in a pathway, the proportion of genes (5%, 10%, 20%, and 30%) associated with the outcome, and a chosen common value for all log odds ratios (γ Ã ) in the risk (model 8), we repeated the following steps to generate 1,000 case-control studies, with each consisting of 1,000 cases and 1,000 controls. First, the number of SNPs within each gene was randomly chosen from 10 to 100. Second, for each randomly selected outcomeassociated gene, we randomly picked a functional SNP. Third, we use the aforementioned algorithm of Wang and Elston [35] to generate the individual-level genotype data for a case-control study according to the specified risk model. We also considered the situation where all γ Ã in the risk (model 8) had the same magnitude but different directions. More precisely, when generating a case-control study at the third step, we defined the risk (model 8) by randomly choosing the direction of each log odds ratio to be positive or negative with equal probability. Furthermore, we considered a more complex scenario where each outcome-associated gene had one or two functional SNPs, each with equal probability.
All simulation results are given in S1 and S2 Tables. It is clear that sARTP are generally more powerful than aSPUsPath, especially when the signal-to-noise ratio (the proportion of genes including a functional SNP) is relatively low. The two types of tests tend to have comparable performance when the signal-to-noise ratio increases to 30%, although it is uncommon for a candidate pathway to have such a high signal-to-noise ratio in real applications. For example, among the 4,713 candidate pathways analyzed in the next section, only 4.2% and 0.9% of the pathways have over 20% and 30% of their genes that are likely to contain association signals (i.e., with gene-level p-values < 0.05). From S1 and S2 Tables, we also notice that the advantage of sARTP over aSPUsPath is more evident if not all minor alleles of the functional SNPs are deleterious (or protective) variants (i.e., γ Ã in the risk (model 8) are not all positive). This is expected, as the sARTP approach does not take the effect direction of the minor allele at each SNP into consideration, while aSPUs-Path integrates a set of candidate statistics, including the one similar to the burden test that assumes all minor alleles are either deleterious or protective. When this assumption is not valid, the inclusion of the burden test statistic in aSPUsPath is unlikely to enhance the power, but certainly would increase the multiple-testing penalty.

Evaluation of sARTP Using Data from T2D Studies
To demonstrate the consistency between results obtained by sARTP using SNP-level summary statistics and the ones by ARTP using individual-level genotype data, we compared pathway analysis results from three different procedures on the 4,713 candidate pathways using the GERA GWAS data. Details on how those 4,713 pathways were pre-processed are given in the Results of T2D Pathway Analysis Section. We applied sARTP to the SNP-level summary statistics generated from the GERA study, using either an internal or an external reference panel. We also obtained the pathway p-values by directly applying the ARTP method to the individual-level GERA GWAS data. Fig 1 shows the comparison among p-values from these three analyses, and demonstrates that all three approaches can generate very consistent results.

Results of T2D Pathway Analysis
Findings from the European populations. Since our goal was to identify new susceptibility loci for T2D through the pathway analysis, we excluded 170 high evidence T2D associated SNPs that were either listed in [13] or found from the GWAS Catalog satisfying the following (2) had reported p-values <10 −7 on the initial study; and (3) were replicated on independent studies. We excluded 195 SNPs that has their single-locus testing p-values less than 10 −7 in either DIAGRAM or GERA data to ensure that the pathway analysis result was not driven by a single SNP. In addition, we further excluded genes within a ±500kb region from each of the removed SNPs to eliminate potential association signals that could be caused by linkage disequilibrium (LD) with the index SNPs.
We conducted three types of pathway-based meta-analyses using sARTP, including the one using the DIAGRAM SNP-level summary statistics, the one using the GERA SNP-level summary statistics, and the pathway meta-analysis combining SNP-level summary statistics from both DIAGRAM and GERA studies. When applying the pathway-based meta-analysis to a single gene, we refer to this as the gene-level meta-analysis. We used the external reference panel of 503 Europeans from the 1000 Genomes Project to estimate the variance-covariance matrix.
Before performing a pathway analysis, we applied LD filtering to remove redundant SNPs. For any two SNPs with their pairwise squared Pearson's correlation coefficient > 0.9 estimated from the external reference panel from the 1000 Genomes Project, we removed the one with a smaller value defined as, 2f(1−f)n 0 n 1 n −1 , where n 0 and n 1 are numbers of controls and cases, n = n 0 + n 1 , and f is the MAF based on the reference panel. This value is proportional to the non-centrality parameter of the trend test statistic at a given SNP. We also excluded SNPs with MAF < 1%. After all SNP filtering steps, we had a total of 4,713 pathways for the analysis. The summary of the number of genes and SNPs used in each pathway analysis is given in S2 Fig. The DIAGRAM study had a genomic control inflation factor λ GC = 1.10 based on the published meta-analysis result. The assembled GERA T2D GWAS had λ GC = 1.08. When conducting the pathway analysis on each of two studies, we adjusted the inflation by using the corresponding ffiffiffiffiffiffi ffi l GC p to rescale the standard error of estimated log odds ratio at each SNP. The single-locus meta-analysis combining results from DIAGRAM and GERA datasets had an inflation factor λ GC = 1.067 after each study had adjusted for its own inflation factor. We further adjusted this inflation in the pathway and gene-level meta-analysis when combining SNPlevel summary statistics from both studies using formulas (4) and (5).
The Q-Q plots of gene-level and pathway p-values are given in Fig 2. Gene-level p-value Q-Q plots based on the three analyses show no sign of inflation with their λ GC close to 1.0, but suggest that there are enriched gene-level association signals at the tail end. The pathway pvalue Q-Q plots, on the other hand, shift away from the diagonal identify line and have much higher λ GC , which suggests that T2D associated genes are preferably included in pathways under study. In fact, it can be seen from S3 Fig that a gene with a smaller gene-level meta-analysis p-value tends to be included in more pathways, even though the 4,713 pathways collected from MSigDB v5.0 are not specifically defined for the study of T2D. Fig 2 illustrates that the gene and pathway level signal from the GERA study tends to be slightly stronger than that from the DIAGRAM study. The main reason is that the DIA-GRAM summary result had gone through two rounds of inflation adjustments, with the first round done at each participating study, and the second round on the meta-analysis result. Also, its second round adjustment (λ GC = 1.10) is larger than the one applied to the GERA study (λ GC = 1.08). Adjusting for λ GC in the pathway analysis could be too conservative, since some proportion of the inflation can be caused by the real polygenic effect. A less conservative adjustment could be possible, but it might not be adequate. More discussions on this issue are given in the Discussion Section.
Based on the pathway meta-analysis on a total of 4,713 pathways, we identified 43 significant pathways with p-values less than 1.06×10 −5 , the family-wise significant threshold based on the Bonferroni correction. Their pathway meta-analysis results as well as results from individual studies are summarized in Table 3. More detailed results on each of 43 significant pathways are given in the S6-S48 Figs and S6 Table. There are a total of 15,946 unique genes in all 4,713 pathways. The top 50 genes with smallest gene-level p-values based on the gene metaanalysis are listed in S3 Table. Because of the LD filtering, a gene belonging to two pathways might end up with slightly different sets of SNPs. To remove this ambiguity, we obtained the gene-level p-values by conducting a gene-level meta-analysis on each gene separately.
From Table 3, we can notice that some identified pathways have relatively weak association signals from each of the two studies, but have very significant p-values based on the pathway meta-analysis on the two studies combined. For example, the pathway RIZ_ERYTHROID_ DIFFERENTIATION has p-values of 0.0233 and 0.0231 based on DIAGRAM and GERA studies, respectively. Combining these two p-values using Fisher's method yields a p-value of 0.0046. On the other hand, the pathway meta-analysis produces a much more significant result (p = 6.15×10 −7 ). This demonstrates the power advantage of the pathway meta-analysis over the approach that simply combines the pathways p-values from individual studies. The aforementioned simulation studies also confirmed this observation (Table 2).
In Fig 3, we illustrate the connection between the 43 significant pathways and a group of genes showing association evidence. For the purpose of illustration, in the figure we only focus on 46 genes that are covered by the 43 pathways and have their gene-level meta-analysis p-values less than 0.001. It is evident from Fig 3 that a cluster of 4 genes, UBE2Z, SNF8, GIP, and ATP5G1, has the most significant gene-level p-values (S3 Table), and contribute association signals to 20 out of 43 significant pathways (S6-S25 Figs). These 4 genes overlap each other at chromosome 17q21. This region contains a previously unidentified genome-wide significant synonymous SNP rs1058018 (meta-analysis p = 3.06×10 −8 ) after two rounds of inflation adjustments. More detailed information on SNP rs1058018 and SNPs in that region are given  in S4 Table, S4 and S5 Figs. By conditioning on rs1058018, none of the other SNPs in this region are significant based on the conditional association analysis using the GERA individuallevel GWAS data. Based on GTEx data v6, rs1058018 is a cis eQTL for UBE2Z in blood (p = 7.9×10 −15 ). UBE2Z is involved in Class I MHC antigen processing and presentation (Gen-eCards). The region at 17q21 was previously implicated to be associated with T2D through a candidate gene/loci approach [36]. Although genes at the 17q21 region carry the strongest association signal, 11 out of those 20 pathways remain to be globally significant (p < 1.06×10 −5 ) after excluding those genes from the pathway definition (Table 3).
The majority of 43 identified pathways are enriched with signals from multiple chromosomal regions as demonstrated by the Q-Q plots of their SNP-level and gene-level p-values (S6-S48 Figs). For example, the strongest T2D-associated pathway, SCHLOSSER_SERUM_ RESPONSE_UP, consists of 103 genes, which includes two genes with p-values < 0.001, and has 20 genes with p-values between 0.001 and 0.05 (S26 Fig, and Supplemental data). We conducted the ingenuity pathway analysis on those 22 genes with p-values less than 0.05, and found enrichment of these genes in caveolae-mediated cytosis (important for removal of low/ high density lipoproteins), and lipid metabolism pathways, and in functions/diseases related to differentiation of phagocytes and transport of proteins.
It is assuring that our pathway analysis detected several pathways that are natural candidates underlying the development of T2D, including the pathways KEGG_MATURITY_ONSET_ DIABETES_OF_THE_YOUNG (S31  that these well-defined T2D-related pathways are enriched with additional unidentified and contributory T2D-associated genes. Among the 43 globally significant pathways, there are multiple ones that are defined according to specific gene expression patterns on various tumor types, including pancreatic adenocarcinoma (S21 Fig), hepatocellular carcinoma (HCC) (S27 and S32 Figs), bladder carcinoma (S42 Fig), nasopharyngeal carcinoma (S24 Fig), and familial breast cancer (S34 and S37 Figs). It is well recognized that T2D patients have elevated risk of cancer at multiple cancer sites, such as the liver and pancreas [37,38]. These findings can provide valuable insights into the genetic basis underlying the connection between T2D and a host of different cancers.
In the above analysis, we used the sARTP method with the gene-level association evidence summarized by one or two most significant SNPs within each gene, under the assumption that there are at most two independent association signals within a given gene. We also applied sARTP by using 3 SNP-level cut points (i.e., (c 1 ,c 2 ,c 3 ) = (1,2,3)) to reanalyze the 4,713 pathways based on the combined data of DIAGRAM and GERA. It appears that results obtained by sARTP with 3 SNP-level cut points are very consistent with those with 2 cut points (S49 Fig).
Findings from Eastern Asian populations. We reanalyzed the 43 significant pathways identified from the European populations using summary-level data generated by the AGEN-T2D study. An inflation factor λ GC = 1.03 calculated from the AGEN-T2D meta-analysis was adjusted in the pathway meta-analysis. The genetic regions excluded from analyzing the DIAGRAM and GERA studies were also excluded from the AGEN study. The results were summarized in Table 4. There are 10 out of 43 pathways with the unadjusted p-value less than 0.05, suggesting that many pathways identified from the European populations were also enriched with T2D-associated genes in the eastern Asian populations (S7 Table). Among the 43 pathways, we were able to identify 4 significant T2D-associated pathways at the false discovery rate (FDR [39]) of 0.05 (S27, S12, S32 and S36 Figs), and 3 additional T2D-associated pathways at the FDR of 0.1 (S47, S44, and S25 Figs). All the pathway p-values remain basically the same level if we further excluded genes within ±500kb regions surrounding the GWAS T2D loci established in eastern Asian populations. These results support the presence of trans-ethnic pathway effect on T2D in European and eastern Asian populations [11,12].
Given the existing epidemiologic evidence on the close connection between T2D and the liver cancer, it is noteworthy that the two HCC related pathways (S27 and S32 Figs) identified in European populations remain to be significant in eastern Asian populations at the FDR of 0.05 (Table 4). The pathway PATIL_LIVER_CANCER consists of 653 genes (after data preprocessing) that are highly expressed in HCC and are enriched with genes having functions related to cell growth, cell cycle, metabolism, and cell proliferation [40]. The other pathway, HOSHI-DA_LIVER_CANCER_SUBCLASS_S3 consists of 240 genes that show similar gene expression variation patterns and together define a HCC subtype with its unique histologic, molecular and clinical characteristics [41]. These two pathways have only 6 genes in common, and none of the 6 genes has a gene-level p-value < 0.05 in either European or eastern Asian data. More in depth investigations of these two complementary pathways could lead to further understanding the connection between T2D and the liver cancer.
The genome-wide significant SNP rs1058018 at the 17q21 region identified through the combined analysis of DIAGRAM and GERA studies turned out to be null in the AGEN-T2D study (p = 0.29). This could be due to the relatively small sample size of the AGEN-T2D study, or the genetic risk heterogeneity at the 17q21 locus among different ethnic populations. Nevertheless, 2 out of the 20 pathways (S12 and S25 Figs) that contain genes within the 17q21 region are still significant at the FDR of 0.1. Among the 23 pathways that do not contain any gene within the 17q21 region, 5 pathways remain significant at the FDR of 0.1 (S27, S32, S36, S47 and S44 Figs).

Discussion
We developed a general statistical procedure sARTP for pathway analysis using SNP-level summary statistics generated from multiple GWAS. By applying sARTP to summary statistics from two large studies with a total of 19,809 T2D cases and 111,181 controls with European ancestry, we were able to identify 43 globally significant T2D-associated pathways after excluding genes in neighborhoods of GWAS established T2D loci. Using summary data generated from 8 T2D GWAS with 6,952 cases and 11,865 controls from eastern Asian populations, we further showed that 7 out of 43 pathways identified in the European populations were also significant in the eastern Asian populations at the FDR of 0.1. The analysis clearly highlights novel T2D-associated genes and pathways beyond what has been known from single-SNP association analysis reported from largest GWAS to date. Since the new procedure requires only SNP-level summary statistics, it provides a flexible way for conducting pathway analysis, alleviating the burden of handling large volumes of individual-level GWAS data.
We have developed a computationally efficient R package called ARTP2 implementing the ARTP and sARTP procedures, so that it can be used for conducting pathway analysis based on individual-level genetic data, as well as SNP-level summary data from one or multiple GWAS. The R package also supports the parallelization on Unix-like OS, which can substantially accelerate the computation of small p-values when a large number of resampling steps are needed. The ARTP2 package has a user-friendly interface and provides a comprehensive set of data preprocessing procedures to ensure that all the input information (e.g., allele information of SNP-level summary statistics and genotype reference panel) can be processed coherently. To make the sARTP method accessible to a wider research community, we have also developed a web-based tool that allows investigators to conduct their pathway analyses using the computing resource at the National Cancer Institutes through simple on-line inputs of summary data.
Single-locus analysis of GWAS usually has its genomic control inflation factor larger than 1.0. Some proportion of the inflation can be attributed to various confounding biases, such as the one caused by population stratification, while the other part can be due to the real polygenic effect. In the pathway analysis it is important to minimize the confounding bias at the SNP-level summary statistic. Otherwise a small bias at the SNP level can be accumulated in the pathway analysis, and lead to an elevated false discovery rate. Here we try to remove the confounding bias by adjusting for the genomic control inflation factor observed at the GWAS study. This approach is conservative because part of the inflation can be caused by the real polygenic effect. Recently, [42] developed the LD score regression method to quantify the level of inflation caused solely by the confounding bias. Adjusting for the inflation factor estimated by this method, instead of the genomic control inflation factor, can potentially increase the power of the pathway analysis. However, the LD score regression method relies on a specific polygenic risk model, and its estimate might not be robust for this model assumption. More investigations are needed to evaluate the impact of this new inflation adjustment on the pathway analysis.
There are several other strategies to increase the power of pathway analysis besides increasing sample size [4]. One area of active research is to find better ways to define the gene-level summary statistic using observed genotypes on multiple SNPs, so that it can accurately characterize the impact of the gene on the outcome [43][44][45][46]. In our proposed procedure, we adopt a data driven approach to select a subset of SNPs within a gene that collectively show the strongest association evidence. Because of this, we have to pay the penalty of multiple-comparison in the final pathway significance assessment. However, it is well recognized that SNPs at different loci can have varied levels of functional implications. We can potentially reduce the burden of multiple-comparisons and thus improve the power of the pathway analysis, by prioritizing SNPs according to existing genomic knowledge and other data resources. For example, [47] recently proposed a new gene-level summary statistic based on a prediction model that was trained with external transcriptome data. The gene-level summary statistic is defined as the predicted value that estimates the component of gene expression regulated by a subject's genotypes within the neighborhood of the considered gene. Pathway analysis procedures using this kind of biologically informed gene-level summary statistic can be easily incorporated into the ARTP2 framework.
The sARTP method can be easily expanded to adopt other multi-locus statistics in accumulating association within a gene, as long as they can be written in terms of SNP-level score statistics and their variance-covariance matrix. For example, the current ARTP2 package provides the option for conducting the pathway meta-analysis using the joint test statistics proposed by [31].
When conducting pathway analysis with individual-level genetic data, we could run into a computing memory issue if the study has a large sample size and the pathway consists of a large number of genes and SNPs (S4 Fig). The ability of performing pathway analysis using summary data provides a convenient and efficient solution in those situations. We can first calculate the SNP-level summary statistics based on the individual-level genetic data, and then randomly sample a small proportion of the original data as an internal reference to estimate the variance-covariant matrix for score statistics at considered SNPs. Based on our experiments, using 500 or more subjects to form a reference panel would be good enough to generate accurate pathway p-values. As shown in Fig 1, the testing results using this approach are very consistent with those based on individual-level genotype data.
The sARTP approach can be applied directly to SNP-level meta-analysis results. This is very convenient as meta-analysis results are in general easily accessible. But we want to emphasize that it is important to know the set of the SNPs studied by each participating study in order to apply sARTP properly, as the SNP coverage information is essential for accurately estimating the variance-covariance matrix of SNP-level score statistics. GWAS consortia usually do not post the SNP coverage information when releasing their meta-analysis results. Many statistical packages designed for conducting multi-locus analysis based on meta-analysis results often assume the uniform coverage [15-18, 24, 25, 48]. As we already have demonstrated in the context of pathway analysis, this type of over-simplification could lead to inflated false positive rate.
The proposed procedure assumes that all participating studies are conducted with subjects with the same ancestry background. If this is not the case, a simple approach is to use the Fisher's method to combine pathway p-values estimated on different ethnic populations. However, if there were no evidence for the existence of cross ethnic risk heterogeneity, it would be more powerful to assume a fixed effects model on the SNP-level association when performing the pathway analysis. In that case, since the LD structures in different ethnic populations are different, we need a separate reference panel for each ethic group to derive the corresponding variance-covariance matrix of the score statistics. The current ARTP2 package needs to be modified to accommodate such a more complicated case.
As already demonstrated by many successful GWAS meta-analysis, increasing the sample size through combining results from multiple studies is a very effective way to improve our chance for new findings. For the same reason, pathway-based meta-analysis can provide us with new opportunities to uncover biological pathways that are previously undetectable due to the limitation on the sample size. With more summary data from meta-analysis becoming increasingly available, we expect the ARTP2 package would be a valuable tool for further exploring the genome in search for the hidden heritability.

Web Resources
The URLs for data and software presented herein are as follows: S1 Text. Recovering score statistics and their variance-covariance matrix using summary results from the fixed effects model.