An Optimal Bahadur-Efficient Method in Detection of Sparse Signals with Applications to Pathway Analysis in Sequencing Association Studies

Next-generation sequencing data pose a severe curse of dimensionality, complicating traditional "single marker-single trait" analysis. We propose a two-stage combined p-value method for pathway analysis. The first stage is at the gene level, where we integrate effects within a gene using the Sequence Kernel Association Test (SKAT). The second stage is at the pathway level, where we perform a correlated Lancaster procedure to detect joint effects from multiple genes within a pathway. We show that the Lancaster procedure is optimal in Bahadur efficiency among all combined p-value methods. The Bahadur efficiency, lim_{ε→0} N^(2)/N^(1) = φ_12(θ), compares the sample sizes required by different statistical tests when signals become sparse in sequencing data, i.e., ε → 0. The optimal Bahadur efficiency ensures that the Lancaster procedure asymptotically requires a minimal sample size to detect sparse signals (P_{N^(i)} < ε → 0). The Lancaster procedure can also be applied to meta-analysis. Extensive empirical assessments of exome sequencing data show that the proposed method outperforms Gene Set Enrichment Analysis (GSEA). We applied the competitive Lancaster procedure to meta-analysis data generated by the Global Lipids Genetics Consortium to identify pathways significantly associated with high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, triglycerides, and total cholesterol.


Introduction
Next-generation sequencing (NGS) technology has opened a new era for studying genetic associations with complex diseases. Yet, although whole-genome searching has become easier and less costly to perform, our ability to critically evaluate such high-throughput data has not improved substantially. Sequencing data often contain millions of genetic variants, but testing millions of markers using the "single marker-single trait" analysis often loses power after the multiple-testing adjustment. Genome-wide significance requires a strict Bonferroni correction with p-value < 2.5×10⁻⁶ for a total of 20,000 gene-based statistical tests. To maintain statistical power for detecting rare variants, a theoretical sample size of n > 10,000 may be required for sequencing data [1].
These dimensional challenges motivate us to aggregate effects from multiple genes using pathway analysis. Genetic pathways comprise molecular entities that interact with each other to regulate specific cell functions, metabolic processes, biosynthesis, and embryonic development. For non-Mendelian diseases and complex traits, multiple genetic risk factors may function together in a pathway. As a result, signals may not be significant in the "single marker-single trait" analysis, but modest p-values from multiple related genes might jointly provide valuable information regarding gene function and regulation. Pathway information can be extracted from bioinformatic resources, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) [2], the PANTHER classification system for protein sequence data [3], and the Reactome database for human pathway data [4].
We propose a two-stage combined p-value method for pathway (gene set) analysis of NGS data. The first stage is at the gene level, where we integrate effects from rare and common variants within a gene. The goal of the first-stage analysis is to generate a p-value that summarizes the overall effect within a gene. The second stage is at the pathway level, where we aggregate p-values among all genes in a pathway.
An exome sequencing simulation study was conducted to compare the SKAT-Lancaster procedure and Gene Set Enrichment Analysis (GSEA) [5]. We applied the competitive Lancaster procedure to meta-analysis data generated by the Global Lipids Genetics Consortium.

Two-Stage Pathway Analysis for Sequencing Data
Effects differ in nature between the gene and pathway levels. At the gene level, we are interested in identifying rare genetic variants from high-throughput data. At the pathway level, genes with similar functions work together to fulfill biological tasks, so we are interested in detecting small, common effects among genes. The proposed "SKAT-Lancaster" procedure provides a two-stage framework to (1) reduce the dimension of genetic variants, (2) combine effects from multiple genes, and (3) take the genetic correlation architecture into account.
Stage I-Gene Level Testing. In the first stage, we suggest integrating effects from rare variants within the i-th gene using the Sequence Kernel Association Test (SKAT) [6]. Several tests have been proposed to analyze rare variants at the gene level, including burden tests and the C-alpha test. We choose SKAT because it has been proven to be a locally most powerful score test [7].
Let G_ij be the j-th variant of the i-th gene, and let β_i = (β_i1, ⋯, β_ij, ⋯) be the effects of the markers in the i-th gene. We generate a p-value P_i for the i-th hypothesis test H_0i: β_i = 0 vs. H_ai: β_i ≠ 0, where 0 denotes the zero vector. SKAT is a locally most powerful score test on the variance component of the regression Y = Xα + G_i β_i + ε, where Y is a phenotype, α is a vector of fixed effects from covariates X, and ε is an error term. To increase power, SKAT tests H_0i: β_i = 0 by treating β_ij as a random variable with mean zero and variance w_ij τ_i, where τ_i is a common variance component and w_ij is a pre-specified weight for variant G_ij. As a result, H_0i: β_i = 0 is equivalent to H_0i: τ_i = 0. The variance component score statistic is Q = (Y − μ̂)ᵀ G_i W_i G_iᵀ (Y − μ̂), where μ̂ = X α̂ is the predicted mean of Y under H_0i and W_i = diag(w_i1, w_i2, ⋯) contains the weights of the variants. Under the null hypothesis, Q follows a mixture of chi-square distributions [6]. Common variants, population stratification, and other covariates can also be included as fixed effects in the model. The goal of the first-stage analysis is to generate a p-value that summarizes the overall effect for each gene.
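A minimal numerical sketch of the score statistic Q may help fix ideas. The code below is illustrative, not the authors' implementation: it assumes an intercept-only null model (so μ̂ is simply the sample mean of Y), uniform weights, and obtains a p-value from the chi-square-mixture representation by Monte Carlo rather than an exact method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative SKAT-style score statistic for one gene (intercept-only null).
n, p = 500, 10                                   # subjects, variants in gene i
maf = rng.uniform(0.005, 0.05, size=p)           # rare-variant allele frequencies
G = rng.binomial(2, maf, size=(n, p)).astype(float)  # genotype matrix G_i
Y = rng.normal(size=n)                           # phenotype simulated under H0
w = np.ones(p)                                   # pre-specified weights w_ij

resid = Y - Y.mean()                             # Y - mu_hat under H_0i
W = np.diag(w)
Q = resid @ G @ W @ G.T @ resid                  # Q = (Y-mu)^T G_i W_i G_i^T (Y-mu)

# Under H0, Q is approximately a mixture sum_k lam_k * chi2_1, with lam_k the
# eigenvalues of sigma^2 * W^{1/2} G^T G W^{1/2} (covariate-free sketch).
sigma2 = resid.var(ddof=1)
half_W = np.diag(np.sqrt(w))
lam = np.linalg.eigvalsh(half_W @ G.T @ G @ half_W) * sigma2

# Satterthwaite-style moment matching of the mixture by c * chi2_v:
mean_Q, var_Q = lam.sum(), 2.0 * (lam ** 2).sum()
c, v = var_Q / (2.0 * mean_Q), 2.0 * mean_Q ** 2 / var_Q

# Monte Carlo p-value from the mixture representation (avoids special functions).
chi1 = rng.chisquare(1, size=(20000, p))
p_value = float(np.mean(chi1 @ lam >= Q))
print(round(p_value, 2))
```

In practice the null distribution of Q is evaluated exactly (e.g., by Davies' method) inside the SKAT software; the Monte Carlo step here is only a stand-in.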
Stage II-Pathway Level P-value Combination. The second stage is at the pathway level, where we perform the modified Lancaster procedure to combine effects from multiple genes within a pathway. We choose the Lancaster procedure because it is optimal in Bahadur efficiency among all weighted combined p-value methods. The original Lancaster procedure is based on the independent p-value assumption. However, genetic data are highly correlated and ignoring the correlation structure will severely inflate the Type I error rate. Thus we need a modification of the Lancaster procedure to take the complex correlation structure among genetic variants into account [8].
Consider m sequences of test statistics {T^(i)_{n_i}}, i = 1, 2, ⋯, m, and the corresponding significance levels {P^(i)_{n_i}}, where n_i is the sample size for the i-th test statistic. Let the Lancaster statistic be T^Lancaster_n = Σ_{i=1}^m F_i^{-1}(1 − P^(i)_{n_i}), where F_i^{-1} is the inverse CDF of the chi-square distribution with w_i degrees of freedom. Its variance is var(T^Lancaster_n) = Σ_i Σ_j r_ij, where r_ij = cov(F_i^{-1}(1 − P^(i)_{n_i}), F_j^{-1}(1 − P^(j)_{n_j})) takes the correlation among p-values into account. We use the Satterthwaite method to match the mean and variance of T^Lancaster_n and cχ²_ν and solve the equations to derive c and ν. Thus we have c = var(T^Lancaster_n) / [2 E(T^Lancaster_n)] and ν = 2 [E(T^Lancaster_n)]² / var(T^Lancaster_n). As genetic variants have a very complex correlation architecture, there is no analytical form for the exact correlated Lancaster procedure. The Satterthwaite approximation is an effective approach to summarize the distribution of the exact correlated Lancaster procedure. Q-Q plots from simulated data suggest a good match between the approximated and exact T^Lancaster_n, with a very slight deviation in the tail. By introducing the correlation structure, the Satterthwaite approximation can significantly reduce the Type I error among correlated p-values.
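The moment-matching step can be sketched in a few lines. The following is an illustrative implementation, not the authors' code: it assumes all weights w_i = 2, so F_i^{-1}(1 − p) = −2 ln p in closed form (general weights would use the chi-square quantile function), and it estimates E(T) and var(T) by simulating correlated null p-values from an assumed equicorrelated Gaussian copula.

```python
import numpy as np
from math import erfc
from scipy.stats import chi2

rng = np.random.default_rng(1)

m = 8                                            # genes in the pathway
C = 0.4 * np.ones((m, m)) + 0.6 * np.eye(m)      # assumed correlation of scores
Lc = np.linalg.cholesky(C)
sf = np.vectorize(lambda z: 0.5 * erfc(z / np.sqrt(2.0)))   # P(Z > z)

# Estimate E(T) and var(T) (equivalently, the r_ij terms) under H0 by
# simulating correlated null p-values from the assumed structure.
B = 20000
Pnull = sf(rng.normal(size=(B, m)) @ Lc.T)
Tnull = (-2.0 * np.log(Pnull)).sum(axis=1)       # T = sum_i F_i^{-1}(1 - P_i), w_i = 2
ET, VT = Tnull.mean(), Tnull.var()

c, v = VT / (2.0 * ET), 2.0 * ET ** 2 / VT       # Satterthwaite: T ~ c * chi2_v

p_obs = np.full(m, 0.3)                          # example gene-level p-values
T_obs = (-2.0 * np.log(p_obs)).sum()
p_satt = chi2.sf(T_obs / c, df=v)                # approximate combined p-value
p_mc = float(np.mean(Tnull >= T_obs))            # Monte Carlo reference
print(round(p_satt, 2), round(p_mc, 2))
```

The approximate p-value and the Monte Carlo reference should agree closely in the bulk of the distribution, mirroring the Q-Q agreement described above.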

Self Contained vs. Competitive Lancaster Procedure
The main difference between competitive and self-contained tests lies in the formulation of the null hypothesis [9]. Let μ_i stand for the effect size of the i-th pathway. The null hypothesis for the self-contained test of the i-th pathway is H_0,self-contained: μ_i = 0. Thus, the correlated Lancaster procedure can be considered a self-contained test.
The null hypothesis in the competitive test is H_0,competitive: μ_1 = μ_2 = ⋯ = μ_i = ⋯. The competitive Lancaster test can be carried out using permutation testing:
Step 1: Let P_i be the p-value from the Lancaster procedure in the i-th real pathway.
Step 2: Create L, say 100,000, permuted pathways by shuffling genes among pathways. The permuted pathway sizes should resemble the real pathway sizes. Let P_l be the p-value from the Lancaster procedure in the l-th permuted pathway, for l = 1, 2, ⋯, L.
Step 3: The p-value of the competitive Lancaster procedure for the i-th real pathway is Σ_{l=1}^L I{P_i ≥ P_l} / L, where I{·} is an indicator function.
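The three steps above can be sketched as follows. This is an illustrative toy run, not the authors' code: gene-level p-values and pathway membership are simulated, and the combined p-value uses Fisher's combination (the Lancaster procedure with equal weights w_i = 2) on independent gene-level p-values.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)

gene_p = rng.uniform(size=2000)                  # gene-level p-values (all genes)
pathway = np.arange(40)                          # genes in the i-th real pathway

def lancaster_p(idx):
    # Fisher combination = Lancaster with w_i = 2 for every gene.
    T = -2.0 * np.log(gene_p[idx]).sum()
    return chi2.sf(T, df=2 * len(idx))

P_i = lancaster_p(pathway)                       # Step 1

L = 2000                                         # Step 2 (the paper uses 100,000)
perm_p = np.array([
    lancaster_p(rng.choice(gene_p.size, size=pathway.size, replace=False))
    for _ in range(L)
])                                               # permuted pathways of equal size

p_comp = float(np.mean(P_i >= perm_p))           # Step 3: sum_l I{P_i >= P_l} / L
print(round(p_comp, 2))
```

Because the toy data are generated under the null, the competitive p-value is expected to be roughly uniform rather than significant.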

Meta-analysis in Sequencing Association Studies
Due to cost, the rarity of diseases involved, and high dimensionality of variants, sequencing association studies are often underpowered to detect modest genetic effects. Meta-analysis can be used to address this issue by analyzing data across studies. Meta-analysis uses study-specific summary statistics, allowing investigators to combine information across studies when individual-level data cannot be shared.
The Lancaster procedure is independent of the SKAT test. One can directly apply the Lancaster procedure to meta-analysis, as we demonstrate in our analysis of the Global Lipids Genetics Consortium data. In this work, we chose SKAT to pair with the Lancaster procedure in order to detect rare variants in exome sequencing data. For other types of sequencing data, we suggest replacing SKAT with other statistical tests, such as FaST-LMM [10] or GEMMA [11], at the gene level and then applying the Lancaster procedure to combine multiple effects at the pathway level.

Lancaster Procedure Is Optimal in Bahadur Efficiency
Several weighted combined p-value methods have been developed; see [12] for a comprehensive review. Since high-throughput sequencing data pose a severe challenge to retaining statistical power at small sample sizes when detecting sparse signals, it is critical to theoretically assess the efficiency of the weighted combined p-value methods. Let P_i (i = 1, 2, ⋯, m) be p-values from m hypothesis tests. Littell and Folks [13,14] showed that Fisher's method of combining independent tests, T_Fisher = −2 Σ_{i=1}^m ln P_i, is asymptotically optimal in Bahadur efficiency. However, Fisher's method does not allow a weight function when combining p-values.
The weight function can be used to integrate multi-source omics data from different sequencing platforms. For instance, one can apply weight functions to integrate microarray data and ChIP-seq data to identify the proteins involved in transcription. In this case, the weight functions can be considered prior information to ensure that a binding call is a real signal rather than an artifact. As [15] pointed out, carefully chosen weights can generally improve the power of a combination of p-values.
There is no uniformly most powerful method of combining p-values. Bahadur efficiency is an important way to compare the sample sizes required by two statistics in the detection of sparse signals (ε → 0).
The Notion of Bahadur Relative Efficiency. Consider a hypothesis test for H_0: θ ∈ Θ_0 vs. H_a: θ ∈ Θ − Θ_0. Bahadur efficiency offers an asymptotic relative comparison between two competing test statistics: under H_a, the test statistic whose significance level converges to zero at a faster rate is considered more Bahadur efficient.
Let T_n be a real-valued test statistic depending on an independent sample x_1, x_2, ⋯, x_n, for n = 1, 2, ⋯. Assume that for all θ ∈ Θ_0, T_n follows the same null CDF F_0. Let t be the value attained by T_n; then the significance level of T_n is P_n = 1 − F_0(t). Suppose that −2 ln P_n / n converges to c(θ) with probability 1, i.e., Pr(lim_{n→∞} −2 ln P_n / n = c(θ)) = 1, where c(θ) depends on θ under the alternative hypothesis; c(θ) is called the Bahadur efficiency slope of T_n as n → ∞. Consider two competing sequences of test statistics, {T_n^(1)} and {T_n^(2)}, with Bahadur efficiency slopes c_1(θ) and c_2(θ), respectively. The ratio φ_12(θ) = c_1(θ)/c_2(θ) is the Bahadur efficiency of {T_n^(1)} relative to {T_n^(2)}. Let N^(i) be the minimal sample size satisfying P_{N^(i)} < ε for the i-th test. Bahadur [16] shows that lim_{ε→0} N^(2)/N^(1) = φ_12(θ) with probability 1 under H_a: θ ∈ Θ − Θ_0, which indicates that the Bahadur efficiency ratio φ_12(θ) gives the limiting ratio of sample sizes required by the two statistics to attain an equally small significance level. As a result, {T_n^(1)} is deemed superior to, i.e., more Bahadur efficient than, {T_n^(2)} if φ_12(θ) ≥ 1 under H_a: θ ∈ Θ − Θ_0.

Bahadur Efficiency for the Lancaster Procedure, Weighted Z-test, and Good's Test. Consider m sequences of test statistics {T^(i)_{n_i}}, i = 1, 2, ⋯, m, and the corresponding significance levels {P^(i)_{n_i}}, where n_i is the sample size for the i-th test statistic. Assume that for each i = 1, 2, ⋯, m, the sequence {T^(i)_{n_i}} has a Bahadur efficiency slope c_i(θ), that is, Pr(lim_{n_i→∞} −(2/n_i) ln P^(i)_{n_i} = c_i(θ)) = 1. Assume also that the sample sizes n_1, ⋯, n_m have an average n̄ = (n_1 + ⋯ + n_m)/m and that n_i/n̄ → λ_i. We first derive the Bahadur efficiency for the Lancaster test. Let f_i, F_i, and F_i^{-1} be the PDF, CDF, and inverse CDF of the chi-square distribution with w_i degrees of freedom.

[Theorem 2] Assume m independent test statistics T^(1)_{n_1}, ⋯, T^(m)_{n_m} have significance levels P^(1)_{n_1}, ⋯, P^(m)_{n_m}, respectively. Then the Lancaster statistic T^Lancaster_n = Σ_{i=1}^m F_i^{-1}(1 − P^(i)_{n_i}) has the Bahadur efficiency slope c_Lancaster(θ) = Σ_{i=1}^m λ_i c_i(θ).
The weighted Z-test statistic, T^wZ_n = Σ_{i=1}^m w_i Φ^{-1}(1 − P^(i)_{n_i}) / (Σ_{i=1}^m w_i²)^{1/2}, has the Bahadur efficiency slope c_wZ(θ) = [Σ_{i=1}^m w_i (λ_i c_i(θ))^{1/2}]² / Σ_{i=1}^m w_i². When w_i = 1 for all i = 1, 2, ⋯, m, the weighted Z-test reduces to the regular Z statistic, with slope c_regular Z(θ) = [Σ_{i=1}^m (λ_i c_i(θ))^{1/2}]² / m. This agrees with the Bahadur efficiency finding for the regular Z-test in [13]. The Lancaster test statistic is superior to the weighted Z-test and the regular Z-test in terms of Bahadur efficiency: using the induction method, we show that the Bahadur relative efficiency φ_12(θ) = c_Lancaster(θ)/c_wZ(θ) ≥ 1. The fact that lim_{ε→0} N^(2)/N^(1) = φ_12(θ) indicates that the Lancaster procedure will require smaller sample sizes than the weighted Z-test to achieve the same significance level.
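A quick numeric sanity check of these comparisons is easy to run. The code below assumes the slope expressions c_Lancaster = Σ λ_i c_i, c_wZ = (Σ w_i √(λ_i c_i))² / Σ w_i², c_regular Z = (Σ √(λ_i c_i))² / m, and c_Good = Σ w_i λ_i c_i / max_i(w_i), with arbitrary illustrative values for λ_i, c_i, and w_i; the inequalities follow from the Cauchy-Schwarz inequality and from the shrinkage factor w_i / max(w).

```python
import numpy as np

rng = np.random.default_rng(3)

m = 6
lam = rng.dirichlet(np.ones(m)) * m              # λ_i = n_i / n-bar, averaging to 1
c = rng.uniform(0.1, 2.0, size=m)                # individual slopes c_i(θ)
w = rng.uniform(0.5, 3.0, size=m)                # weights

c_lancaster = float(np.sum(lam * c))                                # Σ λ_i c_i
c_weighted_z = float(np.sum(w * np.sqrt(lam * c)) ** 2 / np.sum(w ** 2))
c_regular_z = float(np.sum(np.sqrt(lam * c)) ** 2 / m)
c_good = float(np.sum(w * lam * c) / w.max())

# Cauchy-Schwarz gives c_lancaster >= c_weighted_z (and >= c_regular_z);
# c_lancaster >= c_good since each term is shrunk by w_i / max(w) <= 1.
print(c_lancaster >= c_weighted_z, c_lancaster >= c_regular_z,
      c_lancaster >= c_good)
```

Re-running with any other positive λ_i, c_i, w_i leaves the three comparisons True, which is what the optimality claims require.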
When the weights are unequal, Good's statistic Q = −2 Σ_{i=1}^m w_i ln P_i has, for distinct weights, the null survival function Pr(Q > q) = Σ_{i=1}^m [w_i^{m−1} / Π_{j≠i}(w_i − w_j)] exp(−q/(2 w_i)), and the corresponding Bahadur efficiency slope is c_Good(θ) = Σ_{i=1}^m w_i λ_i c_i(θ) / max_i(w_i). The maximal weight, max_i(w_i), has a strong impact on the Bahadur efficiency of Good's test. Only the individual test(s) assigned the maximal weight retain their full slope contribution λ_i c_i(θ); the other individual tests lose relatively more Bahadur efficiency as the maximal weight grows. That is, if w_i < max_i(w_i), then w_i λ_i c_i(θ)/max_i(w_i) < λ_i c_i(θ).
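The closed-form survival function of Q = −2 Σ w_i ln P_i (a sum of independent exponentials with distinct scales 2w_i) can be verified directly by simulation. The weights below are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(4)

w = np.array([1.0, 2.0, 3.5])                    # distinct weights
m = len(w)

def good_sf(q):
    # Pr(Q > q) = sum_i [w_i^{m-1} / prod_{j != i}(w_i - w_j)] exp(-q / (2 w_i))
    total = 0.0
    for i in range(m):
        denom = np.prod([w[i] - w[j] for j in range(m) if j != i])
        total += w[i] ** (m - 1) / denom * np.exp(-q / (2.0 * w[i]))
    return float(total)

P = rng.uniform(size=(200000, m))                # independent null p-values
Q = -2.0 * (np.log(P) * w).sum(axis=1)           # Good's statistic under H0
q0 = 15.0
print(round(good_sf(q0), 3), round(float(np.mean(Q > q0)), 3))
```

The analytic tail probability and the Monte Carlo estimate agree to within simulation error, and good_sf(0) = 1 confirms the mixture coefficients sum to one.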
The Lancaster procedure is superior to Good's test in terms of Bahadur efficiency, i.e., φ_12 = c_Lancaster(θ)/c_Good(θ) ≥ 1 for all θ ∈ Θ − Θ_0 and lim_{ε→0} N^(2)/N^(1) = φ_12(θ). For large-scale tests, which often occur in next-generation sequencing data, the Lancaster procedure will require relatively smaller sample sizes than Good's test, i.e., N_Lancaster ≤ N_Good, as the significance level goes to 0, which represents sparse signals in high-throughput data.
Lancaster Procedure Has the Optimal Bahadur Efficiency. We can further prove that the Lancaster procedure reaches the upper bounds of Bahadur efficiency among all nondecreasing T n . Thus the Lancaster procedure has the optimal Bahadur efficiency compared to all other combination methods under mild conditions.
[Proposition 1] Let T_n be any function of m independent test statistics T^(1)_{n_1}, ⋯, T^(m)_{n_m}, and let c_any(θ) > 0 be the Bahadur efficiency slope of T_n. Assume T_n is non-decreasing in the sense that t_1 ≤ t_1*, ⋯, t_m ≤ t_m* implies T_n(t_1, ⋯, t_m) ≤ T_n(t_1*, ⋯, t_m*). Then the Lancaster statistic has the optimal Bahadur efficiency, with c_Lancaster(θ) ≥ c_any(θ) for all θ ∈ Θ − Θ_0.
The Lancaster procedure and Fisher's test both have the optimal Bahadur efficiency among all non-decreasing combined tests. Since the Lancaster procedure can incorporate weight functions for auxiliary information in modeling and testing, it has more flexibility and can be considered the optimal generalized Fisher's method. The non-decreasing condition, t_1 ≤ t_1*, ⋯, t_m ≤ t_m* ⇒ T_n(t_1, ⋯, t_m) ≤ T_n(t_1*, ⋯, t_m*), is easy to meet in practice.
Comparing Bahadur Efficiency for Correlated Data. It is critical to assess Bahadur efficiency for correlated data as it will shed light on the impact of correlation structures on the asymptotic convergence rate of significance levels and will further impact the sample sizes required for the experiments. This is a challenging topic since the distributions of combined test statistics under complex correlation structures have no closed analytical forms. To address this issue, we give an approximate Bahadur efficiency using the techniques described in the Methods Section. Below are some interesting results.
[Proposition 2] When the m statistics T^(1)_{n_1}, ⋯, T^(m)_{n_m} are correlated, under H_a: θ ∈ Θ − Θ_0, the Lancaster statistic and Good's test statistic each have an approximate Bahadur efficiency slope, derived using the approximation techniques described in the Methods Section.

Simulation Study
We conducted an extensive simulation study to assess the type I error and power of the SKAT-Lancaster procedure and compared the proposed method to Gene Set Enrichment Analysis (GSEA) [5]. The empirical assessment was based on rigorous simulation algorithms for sequencing-based genome-wide association studies [18]. The simulation used the whole exome sequencing genotype data from the 1000 Genomes Project Phase 1 study (n = 822 individuals). After filtration, 40,918 biallelic protein-changing coding variants in selected pathways were mapped to KEGG and Biocarta pathways. To avoid testing over- or under-sized pathways, we selected pathways containing 10 to 100 genes, yielding 353 pathways with 3304 genes for the simulation study.
We applied a genome-wide additive model to evaluate pathway-testing methods under realistic genetic architectures. Let Y_i = X_i β + ε_i, where Y_i is a continuous trait, the vector X_i is the whole exome sequencing genotype for the i-th subject, and ε_i ~ i.i.d. N(0, σ²) is random noise. The vector β contains the genetic effect regression coefficients corresponding to the genotyped variants. In the simulation, the j-th variant is causal if |β_j| > 0; pathways and genes are causal if they harbor causal variants. We adopted a stochastic hierarchical effect model, β_pgv = C_p × C_g d_g × C_gv d_gv × e_pgv, to distribute the total genetic variance into pathways, genes, and individual variants [18]. Within a central causal pathway, we first randomly selected 50% of the genes to be associated with the trait, and then randomly selected 70% of the variants in causal genes to be associated with the trait. We randomly assigned 80% (20%) of causal genes to be detrimental (protective); likewise, 80% of variants within causal genes were detrimental and 20% were protective. We set the whole-genome heritability h² = Var(Xβ)/(Var(Xβ) + σ²) = 20%, which resembles heritability in real data, often ranging between 20% and 30%. We used Bonferroni correction to control the Family-Wise Error Rate (FWER) and set the genome-wide significance level at α = 0.05/353 = 1.4164×10⁻⁴ for testing 353 pathways. We performed principal component analysis and included the top 3 principal components as covariates in the regression analyses to adjust for population stratification.
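The heritability calibration in this model amounts to choosing the noise variance σ² so that Var(Xβ)/(Var(Xβ) + σ²) = h². A minimal sketch, with made-up genotype frequencies and effect sizes in place of the 1000 Genomes data and the hierarchical effect model of [18]:

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulate Y = X beta + eps with heritability calibrated to h^2 = 0.20
# (illustrative sizes and effects; not the study's actual genotypes).
n, p = 822, 1000
maf = rng.uniform(0.005, 0.3, size=p)
X = rng.binomial(2, maf, size=(n, p)).astype(float)

beta = np.zeros(p)
causal = rng.choice(p, size=int(0.05 * p), replace=False)    # 5% causal variants
signs = rng.choice([1.0, -1.0], size=causal.size, p=[0.8, 0.2])  # 80% detrimental
beta[causal] = signs * np.abs(rng.normal(size=causal.size))

h2 = 0.20
g = X @ beta                                     # genetic component X beta
sigma2 = g.var() * (1.0 - h2) / h2               # solve Var(g)/(Var(g)+s2) = h2
Y = g + rng.normal(scale=np.sqrt(sigma2), size=n)

h2_hat = g.var() / (g.var() + sigma2)            # recovers h2 by construction
print(round(float(h2_hat), 2))
```

Scaling σ² rather than β keeps the relative effect sizes produced by the hierarchical model intact while fixing the total heritability.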
In the SKAT-Lancaster procedure, we first performed SKAT to test the overall effects at the gene level. Then we considered 4 weight functions in the Lancaster procedure when combining p-values among genes in a pathway:
• Gene-size adjusted weight: a function of n_i, where n_i is the number of SNPs in the i-th gene and ñ = median(n_i) is the median gene size. This weight can remove bias when testing overly small or overly generalized pathways.
• AIC weight and BIC weight: these weight functions calculate the degree of variation summarized by the gene-level multi-SNP regression.
• Uniform weight: all genes receive equal weights.
We considered 3 simulation scenarios (Table 1). In Scenario 1, we assessed the global null hypothesis type I error by setting all genetic effect coefficients to zero, i.e., β = 0. Any pathways or genes reaching the significance level were considered false positives. The results in Table 2 indicate that the SKAT-Lancaster procedure has well-controlled type I error rates (≈10⁻⁴). We further investigated the Q-Q plot comparing observed p-values versus expected p-values (Fig 1). The type I error inflation factor (λ) is the ratio between the area under the curve and the area under the diagonal reference line. Fig 1 indicates that the SKAT-Lancaster procedure with the 4 weight functions has no inflation of the global null hypothesis type I error rate (λ < 1).
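The area-ratio inflation factor described above can be computed directly from a set of null p-values. The sketch below is illustrative (simulated uniform p-values for 353 pathway tests, −log10 scale as in a standard Q-Q plot); a well-calibrated test yields λ near 1.

```python
import numpy as np

rng = np.random.default_rng(6)

n_path = 353
pvals = rng.uniform(size=n_path)                 # p-values under the global null
exp_q = (np.arange(1, n_path + 1) - 0.5) / n_path   # expected uniform quantiles

x = np.sort(-np.log10(exp_q))                    # expected -log10(p), ascending
y = np.sort(-np.log10(pvals))                    # observed -log10(p), ascending

def trap(yv, xv):
    # Trapezoidal area under the curve (yv vs. xv).
    return float(np.sum((yv[1:] + yv[:-1]) * np.diff(xv)) / 2.0)

lam = trap(y, x) / trap(x, x)                    # area ratio vs. the diagonal
print(round(lam, 2))
```

Values of λ well above 1 would indicate inflation of the observed p-values relative to the uniform reference.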
In Scenarios 2 and 3, we assessed the stringent power and the lenient power by randomly generating one central causal pathway in each simulation (Table 1). The stringent power is the percentage of times the central causal pathway is found significant. Because pathways are correlated, pathways that share causal genes with the central causal pathway are overlapping causal pathways; the lenient power is the percentage of times (central and overlapping) causal pathways are found significant.
The results in Table 2 indicate that the SKAT-Lancaster procedure outperformed GSEA. In Scenario 2, the SKAT-Lancaster procedure with the 4 weight functions had lenient power ranging between 0.826 and 0.884, while GSEA had lenient power of 0.373. In Scenario 3, the SKAT-Lancaster procedure with the 4 weight functions had lenient power ranging between 0.543 and 0.645, while GSEA had lenient power of 0.505. Because causal variants were randomly assigned in Scenarios 2 and 3, the SKAT-Lancaster procedure with the uniform weight had the best detection power.
Regarding computing time, the self-contained Lancaster procedure compares a test statistic to an asymptotic distribution and thus does not require intensive computation. The competitive Lancaster procedure is permutation-based and has computational efficiency similar to that of GSEA.

Case Study: Lipid Meta-Analysis
We illustrate our method using meta-analysis data generated by the Global Lipids Genetics Consortium. To identify new loci and validate existing loci associated with lipids, [19] analyzed the levels of low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, triglycerides (TG), and total cholesterol (TC) in 196,475 individuals from 60 studies. A total of 1,048,161 Single Nucleotide Polymorphisms (SNPs) were genotyped using genome-wide association study (GWAS) arrays and Metabochip arrays. These variants were selected from promising loci associated with lipids and coronary artery disease, based on findings from previous GWAS studies and the 1000 Genomes Project. Subjects taking lipid-lowering medications were excluded from the meta-analysis. The additive effect of each SNP on blood lipid levels, after adjusting for age and sex, was analyzed, and a p-value was generated for each SNP and each lipid variable. Genomic control values for the initial meta-analyses were 1.10-1.15, indicating that population stratification had only a minor impact on the results [20].

Table 1. Simulation scenarios and parameters.
Scenario 1 (type I error):
• Phenotype is normally distributed.
• No pathways, genes, or variants are associated with the trait.
• Significance level is 0.05/353; all significant results are considered type I errors.
Scenarios 2 and 3 (power):
• Phenotype is normally distributed.
• Randomly assign one central causal pathway. Within the central causal pathway, randomly assign 50% causal genes; randomly assign 70% causal variants in the associated genes.
• Significance level is 0.05/353.

The SKAT-Lancaster procedure can only be applied to original individual-level data. Remarkably, as the Lancaster procedure is independent of the SKAT test, it can be applied to secondary data analysis. To identify pathways that are more significant than others, we performed the competitive Lancaster procedure. In the competitive test, we performed 100,000 permutations and ensured that the permuted pathways preserved the size and characteristics of the original pathways. Our simulation study showed that the competitive Lancaster procedure had well-controlled type I error rates to prevent false discoveries.
Before comparing the proposed method to Fisher's method [21] and the weighted Z-test [22], we considered 4 weight functions for the Lancaster procedure:
• w_1: a weight adjusted by gene size, where n_i is the number of SNPs in the i-th gene and ñ = median(n_i) is the median gene size; this weight removes the bias from large genes.
• w_2: a weight based on the minor allele frequency (MAF), in which common variants receive higher weights.
• w_3: a MAF-based weight in which rare variants receive higher weights.
• w_4: uniform weights.
Pathway analysis was performed using the gene ontology (GO) gene sets from http://www.broadinstitute.org/gsea/index.jsp. A total of 1454 pathways were analyzed, and multiple testing was adjusted by the False Discovery Rate (FDR) [23]. The numbers of significant pathways are summarized in Fig 2. As shown in Table 3, the Lancaster procedure outperformed Fisher's method and the weighted Z-test by identifying more significant pathways. When the Lancaster procedure was assigned uniform weights (w_4), it performed equivalently to Fisher's method. The weighted Z-test is not optimal in Bahadur efficiency, so it identified fewer pathways than the Lancaster procedure and Fisher's method. Weight functions w_1 and w_2 outperformed w_3 and w_4, indicating that removing gene-size bias and assigning higher weights to common variants can improve the power of the Lancaster procedure.

We compared our pathway findings with the findings from the MAGENTA analysis in [19] (Table 4). The Lancaster procedure (w_1) showed that the "enzyme binding" pathway is significantly associated with HDL (FDR < 10⁻⁵), which agrees with the finding from [19] (FDR = 0.038). The "enzyme binding" pathway contains 178 genes that interact selectively and non-covalently with any enzyme. The Lancaster procedure (w_1, w_2, w_4) showed that the "lipid transport" pathway is significantly associated with LDL (FDR-adjusted p-value < 10⁻⁵), which agrees with the finding from [19] (FDR = 0.0016). The "lipid transport" pathway contains 28 genes involved in the directed movement of lipids into, out of, within, or between cells. Lipids are compounds soluble in an organic solvent but not, or only sparingly, in an aqueous solvent. The Lancaster procedure (w_1, w_2, w_4) found that the "lipoprotein metabolic process" pathway is significantly associated with LDL (FDR-adjusted p-value < 10⁻⁵), which agrees with the finding from [19] (FDR = 0.00017).
The "lipoprotein metabolic process" pathway contains 33 genes involved in the chemical reactions of lipoproteins: any conjugated, water-soluble protein in which the non-protein moiety consists of a lipid or lipids.

Discussion and Conclusions
The proposed two-stage approach is a powerful tool for integrating information in pathway analysis of sequencing association studies. The first stage is gene-based testing, where effects from rare variants within a gene are summarized into one p-value using the SKAT test. In the second stage, p-values from multiple genes are combined for pathway analysis and meta-analysis using the correlated Lancaster procedure. In this work, we assess the Bahadur efficiency of weighted combined p-value methods and prove that the Lancaster procedure is optimal in Bahadur efficiency under very mild conditions. There has been a lack of theoretical comparison among combined p-value methods. Several simulation studies have compared weighted combined p-value methods [15,22,24]. With more than 400 citations in the literature, these studies have generated heated discussions in the research community but yield controversial results across simulation scenarios. Thus, we fill the gap by comparing the Bahadur efficiency among methods. Bahadur efficiency is a critical measure of the performance of statistical testing [25,26]; in [25], it was applied to sensitivity analyses in observational studies. The Bahadur efficiency, lim_{ε→0} N^(2)/N^(1) = φ_12(θ), compares sample sizes among different statistical tests when signals become sparse in sequencing data, i.e., ε → 0. As the number of genetic variants scanned by sequencing technology increases from thousands to millions, the signals associated with phenotypes become sparse, requiring a more stringent statistical significance level to detect them, i.e., P_{N^(i)} < ε → 0. The optimal Bahadur efficiency ensures that the Lancaster procedure asymptotically requires a minimal sample size to detect sparse signals.
Among combined p-value methods, the Lancaster procedure can be considered the generalized Fisher's method with a weight function. Weight functions, when used appropriately, can generally increase the power of combined p-value methods [27-29].
Evaluating Bahadur efficiency for high-throughput genetic data is critical since no combined p-value method is uniformly most powerful. Bahadur efficiency calculates the limiting ratio of sample sizes required by two statistics to attain an equally small significance level. The optimal Bahadur efficiency indicates that the Lancaster procedure asymptotically requires a minimal sample size to attain that significance level.