A powerful and versatile colocalization test.

Transcriptome-wide association studies (TWAS and PrediXcan) have been increasingly applied to detect associations between genetically predicted gene expressions and GWAS traits, which may suggest, however do not completely determine, causal genes for GWAS traits, due to the likely violation of their imposed strong assumptions for causal inference. Testing colocalization moves it closer to establishing causal relationships: if a GWAS trait and a gene’s expression share the same associated SNP, it may suggest a regulatory (and thus putative causal) role of the SNP mediated through the gene on the GWAS trait. Accordingly, it is of interest to develop and apply various colocalization testing approaches. The existing approaches may each have some severe limitations. For instance, some methods test the null hypothesis that there is colocalization, which is not ideal because often the null hypothesis cannot be rejected simply due to limited statistical power (with too small sample sizes). Some other methods arbitrarily restrict the maximum number of causal SNPs in a locus, which may lead to loss of power in the presence of wide-spread allelic heterogeneity. Importantly, most methods cannot be applied to either GWAS/eQTL summary statistics or cases with more than two possibly correlated traits. Here we present a simple and general approach based on conditional analysis of a locus on multiple traits, overcoming the above and other shortcomings of the existing methods. We demonstrate that, compared with other methods, our new method can be applied to a wider range of scenarios and often perform better. We showcase its applications to both simulated and real data, including a large-scale Alzheimer’s disease GWAS summary dataset and a gene expression dataset, and a large-scale blood lipid GWAS summary association dataset. An R package “jointsum” implementing the proposed method is publicly available at github.


Introduction
We develop a simple and general approach overcoming the above shortcomings of the existing methods. We call this approach the conditional method, due to the central role of conditional modeling in fine mapping; in addition, more general than most existing methods, our approach applies to multiple traits with only summary statistics based on one or more possibly dependent/overlapping samples. We demonstrate that, in certain scenarios, the new method has power advantages over the existing methods using simulated data. Also, to show our method's flexibility as well as its ability to detect significant loci, we compare different methods using the large-scale GWAS lipid data [30] with four traits and summary statistics only, and to the largest AD GWAS by IGAP [31] with GWAS summary statistics only and an individuallevel gene expression dataset from ADNI [32].

Existing methods
We will compare our method with some major representatives for the existing methods, namely JLIM, coloc, coloc.abf, HEIDI and eCAVIAR, the details of which can be found in the S1 Text. To summarize, JLIM needs individual level data for the second trait; JLIM, coloc, coloc.abf and HEIDI mainly aim at the scenario with at most one causal SNP for each trait in a locus; eCAVIAR is more robust when allelic heterogeneity (AH) exists (i.e. there are multiple causal SNPs for a trait), but it still requires giving a (usually small) maximum number of causal SNPs (to limit the computational burden).

Testing colocalization with conditional analysis
Suppose we are interested in p traits and q SNPs, and we want to test whether at least one of the SNPs is causal for all the traits. We first assume the availability of individual-level data, and will discuss its extension to GWAS summary data at the end. The kth SNP and the jth trait in the individual level data are X k = (X 1k . . .X nk ) 0 and Y j = (Y 1j . . .Y nj ) 0 , where n is the sample size. With the individual level data, we can easily build a conditional model for each trait as where X = (X 1 . . .X q ), b j = (b j1 . . .b jq ) 0 , and e j = (e 1j . . .e nj ) 0 ; e j is assumed to be independently normal with mean 0. The null and alternative hypotheses for our colocalization test are H 0 : no k satisfies b 1k 6 ¼ 0; b 2k 6 ¼ 0; . . . ; b pk 6 ¼ 0; H 1 : at least one k satisfies b 1k 6 ¼ 0; b 2k 6 ¼ 0; . . . ; b pk 6 ¼ 0: Suppose the estimated coefficients and standard errors areb jk and seðb jk Þ, where k = 1,. . .,q and j = 1,. . .,p. We can also get the p-value forb jk , denoted by P jk . To test H 0 , we can apply the principles of the Intersection-Union Test (IUT) and the Union-Intersection Test (UIT) [33,34]. We divide the null hypothesis into simpler sub-hypotheses like H 1k : b 1k 6 ¼ 0; b 2k 6 ¼ 0; . . . ; b pk 6 ¼ 0; H 0jk : b jk ¼ 0: P jk can be considered as the test statistic for testing H 0jk . By IUT, max j (P jk ) is a valid test statistic for H 0k = S j H 0jk . If max j (P jk ) is small enough for SNP k, it suggests colocalization at this position/SNP since all of its P jk 's are small (i.e., its effects on all traits are significant). By UIT, min k [max j (P jk )] is a valid test statistic for H 0 = T k H 0k . If min k [max j (P jk )] is small enough, it suggests there is colocalization since at least one SNP's max j (P jk ) is small enough. Thus we can simply define our test statistic for H 0 as Suppose the given nominal statistical significance level is α. The colocalization test can be carried out by rejecting H 0 if P � <α, which means for at least one SNP, its effects on all of the traits are nonzero (p-value < α). However, this approach involves multiple testing, which will lead to inflated type I errors. Hence, we propose using the Bonferroni correction: we reject H 0 if and only if P � <α/q. In this way, the type I error rate is guaranteed to be controlled.
If we have only GWAS summary statistics, instead of individual-level data, we can also perform the colocalization test. Denote the estimated marginal effect size of the kth SNP on the jth trait and its variance byb jk andv arðb jk Þ, respectively. Suppose we also have some reference panel to estimate the LD among the SNPs. We can use some well-known method, e.g. described in [18], to estimate the coefficients and standard errors in each conditional model (1). Now after obtaining our estimatesb jk and seðb jk Þ, we can get the p-values and conduct the above colocalization test.

The conditional method with monte carlo approximation
Since the Bonferroni correction may be quite conservative, we propose another way to calculate the p-value, called the conditional method with Monte Carlo approximation (CMC), to test colocalization, which is expected to be less conservative. For convenience, we denote the previously described conditional method with Bonferroni correction by CB. Suppose Z = (Z jk ) is the Z-statistic matrix with Z jk ¼b jk =seðb jk Þ for trait j and SNP k. Assume that the covariance/correlation matrices are constant across the rows and columns, denoted as Cov(Z j .) = corr(Z j .) = R 1 , and Cov(Z .k ) = corr(Z .k ) = R 2 . R 1 and R 2 are usually estimated by and regarded as the correlations among the SNPs and among the traits respectively. Our test statistic is Suppose that the observed test statistic is t. The p-value is If we know the mean of Z jk 's under the null, we can calculate P(max k (min j |Z jk |)�t) by using multivariate normal tail probabilities or Monte Carlo simulations. The challenge here is that the means of Z jk 's are unknown under H 0 , which is a composite null hypothesis. For instance, it is possible that the first SNP has some effect on the first trait but not the second trait, which means Z 11 has a nonzero mean while Z 21 has a zero mean. It is also possible that the first SNP does not have any effect at all, meaning that Z 11 and Z 21 both have a zero mean. In different situations, the distribution of max k (min j |Z jk |) can vary a lot, which we will show numerically in the simulation section. What we know is for each SNP k, at most (p−1) of the Z jk 's have a nonzero mean under the null and at least one has a zero mean. A conservative way to get around is to assume all Z 11 ,. . .,Z 1q have a zero mean and use Pðmax k ðmin j jZ jk jÞ > tÞ � Pðmax k ðjZ 1k jÞ > tÞ ¼ 1 À PðjZ 11 j < t; . . . ; jZ 1q j < tÞ; where P(|Z 11 |<t,. . .,|Z 1q |<t) is now a calculable tail probability. The problem of this approach is that it is quite conservative, due to its use of the above inequality.
We propose to use the current data to roughly estimate which of the Z jk 's have nonzero means. For each SNP k, we compare its |Z jk |'s with |F −1 (θ/2)|, where F is the probability function of the standard normal distribution and θ is a tuning parameter, usually chosen as 0.05 or 0.1. We will discuss more about it later. If |Z j � k | is greater than |F −1 (θ/2)|, we assume under the null, SNP k is associated with trait j � ; otherwise SNP k has no effect on trait j � . Since each SNP must have no effect on at least one trait under the null (no colocalization), we always assume the effect corresponding to the smallest |Z jk | to be 0 regardless of how large this |Z jk | is. Let 4 be the set of all the trait-SNP pairs (j,k) whose corresponding effects are assumed to be nonzero. Then we assume for each k, which means |Z jk |'s with nonzero means are always larger than the other |Z jk |'s under the null. If none of the |Z jk |'s is greater than |F −1 (θ/2)|, we assume SNP k has no effect on any trait under the null, so Z jk 's all have a zero mean. In this way, we can estimate Now all of the Z jk 's on the right-hand side have a zero mean, and their covariance matrix can be easily constructed with R 1 and R 2 , because CovððZ j1 ; . . . ; Z jq Þ0Þ ¼ R 1 ; CovððZ 1k ; . . . ; Z pk Þ0Þ ¼ R 2 ; CovðZ jk ; Z j0k0 Þ ¼ 0 ðj 6 ¼ j0; k 6 ¼ k0Þ: We can break the whole probability down into tail probabilities that can be calculated by the "pmvnorm" function, as shown in the appendix, but the number of tail probabilities is usually too big (e.g. 3 q ) due to the limitation of the function, which makes it computationally infeasible for large q's. A better way is to use a Monte Carlo approximation. For each iteration b, we can simulate Z jk (b) 's that satisfy Z jk (b) = 24 from a multivariate normal distribution with zero mean and then calculate t ðbÞ ¼ max k min j:Z jk ðbÞ = 24 jZ jk ðbÞ j. Then the p-value is simply Ift ðbÞ > tg: Note that it is equivalent to simulate all of the Z jk (b) 's from MVN(μ,S) and calculate t (b) = max k min j |Z jk (b) |, where S = R 2 �R 1 and μ = (μ 11 ,. . .,μ 1q ,μ 21 ,. . .,μ 2q ,. . .,μ p1 ,. . .,μ pq ) 0 . Now μ jk = 0 if Z jk = 24 and μ jk = Inf otherwise, for defining μ jk = Inf results in Pðmax k ðmin j jZ jk jÞ � tÞ ¼ Pðmin j:Z j1 = 24 jZ j1 j < t; . . . ; min j:Z jq = 24 jZ jq j < tÞ. In practice, we can use a large value like 999 for μ jk (j, k: Z jk 24). For convenience, we denote this method by CMC. The choice of B depends on the significance level α. In the simulations where we aim at α = 0.05,B = 10 4 is usually more than enough. For the real data application where α = 5×10 −8 is often considered, we can try B = 10 4 first. If the p-value of one locus turns out to be less than 0.005, we can try B = 10 6 . If the new p-value turns out to be less than 5×10 −5 , we can use B = 10 9 to get a final p-value for that locus. We will follow this proposal in the next sections unless otherwise specified.

Model averaging
The previously described process uses |Z jk |'s to determine the null distribution and then uses Monte Carlo simulations to estimate a p-value. Nevertheless, its performance may be affected if the null distribution is not optimally determined (i.e. the assumed 4 is far off from the truth). Following the model averaging idea from [35], we can use different 4's to get different p-values, and then take the weighted average to get a final p-value. The process to get different 4's is: (1) Order |Z jk |'s. Suppose (j 1 ,k 1 ) has the largest |Z jk |, (j 2 ,k 2 ) has the second largest |Z jk |, etc.
(2) 4 0 is ϕ, the empty set, meaning that all of the means are zero.
(3) Examine each trait-SNP pair (j 1 ,k 1 ), (j 2 ,k 2 ), etc. Suppose currently the biggest 4 we have is 4 t and the trait-SNP pair we are looking at is (j l ,k l ).
Check whether (j l ,k l ) satisfies both a. jZ j l k l j is larger than a cutoff ξ.
b. After adding (j l ,k l ) into 4 t , it still satisfies the null hypothesis (i.e. the new 4 does not contain all (j,k l )'s with the same k l ).
If (j l ,k l ) satisfies these two conditions, then If not, move to the next pair (j l+1 ,k l+1 ) and check the conditions again. Continue this process until one of the following conditions is met: A. All of the |Z jk |'s that are larger than ξ have been examined.
For each 4 t , we can use the approach in the previous subsection to get a p-value P t . We can also get a weight, w t , for this p-value using the AICc value [36] as in [35], details of which are provided in S1 Text. Then the final p-value for testing colocalization is ∑ t w t P t /∑ t w t . For short, we call this method MA-CMC, or simply MA. If ξ is too small, it can lead to inflated type I errors. Our recommended value for ξ is 1.64. To save computing time, we recommend using a moderate u (e.g. 10 by default). Note that when we use the above process to get different 4's, if the number of SNPs is large (e.g. 100), instead of adding one SNP at a time like described in step (3), it may be beneficial to add multiple SNPs at a time. For example, if we add one SNP at a time, the largest size of 4 (or number of nonzero effects) is u. If we add s SNPs at a time, the largest number of nonzero effects that can be considered is su.

Selecting the tuning parameters
As mentioned before, CMC depends on a tuning parameter θ. When θ is small, CMC will be less conservative, but it can have inflated type I errors as well. When θ is large, CMC will be more conservative with loss of power. As a result, we recommend choosing a θ that is not too large or too small (e.g. 0.1). One way to interpret θ is that if the mean of Z jk is zero, the probability of getting |Z jk |>|F −1 (θ/2)| will be θ. More extensive simulation studies and discussions on choosing θ are provided in the S1 Text, based on which we decide to use θ = 0.1 by default. MA involves two more tuning parameters s and u. We recommend choosing u = 10 and modifying s according to the number of SNPs in each locus (e.g. s = 1 for q�20; s = 2 for 20<q�50; s = 3 for q>50). This choice gives us a fairly reasonable su, which means the largest number of nonzero effects that can be considered under the null. When su is fixed, choosing a larger u and a smaller s means adding more models. This usually leads to higher power but also larger type I errors, partly because the more models we add, the more likely one or more of them have significant results with none-negligible weights. Based on our experience and to keep it simple, we suggest using u = 10 by default, though cautions have to be taken (e.g. with possible sensitivity analyses).

Package availability
The coloc method uses the "coloc.test" function (for hypothesis testing) and "coloc.abf" (for drawing ROC curves) from the R "coloc" package. The HEIDI method was implemented by the original authors in the "smr" software available at http://cnsgenomics.com/software/smr/ #SMR&HEIDIanalysis, while the eCAVIAR method is available at http://genetics.cs.ucla.edu/ caviar/download.html. The JLIM method was originally implemented as the "jlim.test" function in the "jlimR" package, which requires input files in specific formats (e.g. need to include unnecessary SNP information); we re-implemented JLIM in our own R function, which only needs the p-values, a reference panel and permuted genotypes as input. This function for JLIM and the functions for CB, CMC and MA are included in the R package "jointsum", publicly available at https://github.com/yangq001/conditional.

Simulations
Possible null distributions. First, we use a simple example to demonstrate that the null distribution of the test statistic of the conditional method largely depends on the unknown truth of which effects are nonzero under the composite null hypothesis. Suppose we have 5 independent SNPs and 3 independent traits. Fig 1 shows the different densities of the test statistic T conditional under different null scenarios. In the first scenario, no SNP has any effect on PLOS COMPUTATIONAL BIOLOGY any trait. In the second scenario, the first two SNPs have nonzero effects on the first two traits. In the third scenario, the first four SNPs have nonzero effects on the first two traits. Since no SNP has nonzero effects on all of the three traits, the null hypothesis holds (i.e., there is no colocalization). As shown in Fig 1, the null distribution of the test statistic changes dramatically with different scenarios. The 0.95-quantiles are 1.24, 1.42 and 1.51 respectively. If the truth is scenario 3, but we use the null distribution in scenario 1 to generate test statistics, then it will lead to inflated type I errors; it is not hard to imagine the scenarios of using an inappropriate null distribution would lead to substantial power loss too. This confirms the importance of estimating the nonzero components as we did for CMC and MA.
Next, we did some simulation studies to compare the performances of different methods (CB, CMC, MA, JLIM, etc.). Independent samples. We simulated two traits for q SNPs in a small region in chromosome 19, making sure none of the correlations' absolute values were greater than 0.9. For the first trait, we selected the first 2000 subjects from the 4136 subjects in the Lung Health Study (LHS) data, and then used a linear model Y = Xβ+ε to obtain the trait, where β = (β 1 . . .β q ), and ε was independently normally distributed with mean 0 and variance 1. β k was nonzero when the kth SNP was causal for the first trait. For the second trait, we used the rest of the subjects and a similar model Y � = X � β � +ε � . Note that X and X � represented the same set of SNPs but for different samples, so the two traits, based on two non-overlapping samples, were independent. We regressed Y and Y � separately on each SNP using the corresponding subjects to get the summary statistics of marginal associations for each trait-SNP pair. The individual level genotype data were also used as our reference panels.
We compared the conditional method with various approaches (i.e., without Bonferroni adjustment, with Bonferroni adjustment, CMC, MA) along with JLIM, coloc and HEIDI, using the suggested tuning parameters for the methods (e.g. the threshold for neighborhood in JLIM was ffi ffi ffi ffi ffi ffiffi 0:8 p ). We used 500 permutations for JLIM. In addition to conducting these regional tests, we also obtained the results of testing whether a certain SNP k is causal for both traits. Simply reject the null hypothesis if both p-values of this SNP are smaller than α. The pvalues can be from the marginal models or the conditional models. For our simulations, we only included the case looking at the first SNP using the conditional p-values. To distinguish testing one SNP and testing a region in conditional analysis, we indicate the former by "1st" in the tables.
First, we looked at the type I errors of JLIM, conditional methods, CMC and MA. In this scenario, the rejection rates of coloc and HEIDI were their empirical power. As shown in Table 1, the conditional method with Bonferroni's adjustment for multiple testing was conservative, as expected, while CMC and MA were less conservative. The type I error rate of CMC went up as θ went down. As for JLIM, it had inflated type I errors in the last situation because it only looked at marginal effects. Due to its correlation with other SNPs, a non-causal SNP might turn out to have the most significant marginal association. In this case, SNPs 1 and 5 were causal for the first trait, but the correlation between these two SNPs and SNP 4 made SNP 4 the most marginally significant. As a result, JLIM tended to falsely conclude with colocalization, yielding larger type I errors. We also discovered that JLIM had very low rejection rates when the causal SNP for trait 2 was not SNP 1, which is due to JLIM's definition of neighborhoods. Detailed explanations can be found in S1 Text.
Next, we looked at the power of different methods. Since the conditional method without adjustment had already been demonstrated to have inflated type I errors, we did not include this approach anymore. For convenience, the conditional method will refer to the conditional method with Bonferroni correction from this point unless otherwise specified. Besides, a smaller θ was shown to make CMC less conservative but it would also possibly lead to inflated type I errors, so we chose θ = 0.1, a medium value, from this point on. More simulation results and discussions on the choice of θ are provided in S1 Text. As Table 2 shows, when both causal locations were SNP 1, the conditional method testing SNP 1 had lower power than JLIM, which was probably due to estimating all those parameters in the conditional model. However, sometimes JLIM had very low power when there were more than one causal SNP for a trait. For example, when SNP 1 was causal for trait 1, and both SNP 1 and SNP 4 were causal for trait 2, but the effect of SNP 4 was stronger, JLIM tended to take SNP 4 as the only causal location for trait 2. As a result, it concluded that there was no colocalization, which led to loss of power. In contrast, the conditional method was able to distinguish multiple causal SNPs and thus maintaining its power in this case. Using CMC had much higher power than using the Bonferroni adjustment. Here, MA performed similarly to CMC, but we will show later that sometimes the former was more powerful.
In addition, we compared the receiver operating characteristic (ROC) curves of JLIM, CB, CMC and coloc, as well as eCAVIAR (with the maximum number of causal SNPs set to 3). This time we used the more recent and popular Bayesian version (coloc.abf) of coloc, which gives the posterior probability of having one common causal variant [29,37]. We combined the scenarios in Table 1 and Table 2 to calculate the true positive rates (TPR) and false positive rates (FPR). Due to the close performance of CMC and MA, only the curve for CMC was plotted for better visualization. As shown in Fig 2, JLIM did not perform well at all since its Table 1. Rejection rates (type I errors for JLIM and conditional, power for coloc and HEIDI). q = 13. 1000 iterations. α = 0.05. The correlation is -0.69 between SNP 1 and SNP 3, 0.68 between SNP 1 and SNP 4, 0.02 between SNP 1 and SNP 5, -0.35 between SNP 4 and SNP 5. Different subjects for two traits. For CMC, B = 10 4 . For MA, B = 10 3 , u = 10 and s = 1.

PLOS COMPUTATIONAL BIOLOGY
assumption (at most 1 causal SNP) was often violated. The assumption of eCAVIAR (at most 3 causal SNPs) was sometimes violated too. In the last two scenarios in Table 2 with more causal SNPs, eCAVIAR had lower power because it failed to detect all the causal SNPs and thus was less likely to establish colocalization. As a result, when all the scenarios were combined, eCA-VIAR did not have much advantage. Meanwhile, CB and CMC were quite robust and did better than the Bayesian coloc (using default settings) in these scenarios. Note that JLIM's ROC curve seemed to have an uncommon shape in Fig 1. The main reason is that JLIM had many p-values of 0 or 1. We have already shown that sometimes JLIM had zero rejection rates in some scenarios in Table 1. Most of the p-values in those cases were 1. Likewise, when the alternative was true, because of the way of constructing the test statistic, the p-values of JLIM tended to be exactly 0 once it successfully picked up the right causal SNP. We tried increasing the number of permutations, but it did not really change the result. Dependent samples. In the above scenarios, we assumed the data for the two traits came from different subjects. We also looked at some other situations where the two traits were correlated from the same subjects. This time we only used the 2000 subjects previously used for trait 1. The models for two traits were Y = Xβ+ε and Y � = Xβ � +ε � . For subject i, the correlation between ε i and ε � i was 0.5. As shown in Table 3, the results were similar to the previous ones. The conditional method with Bonferroni adjustment was conservative, and CMC was less conservative. MA had higher power than CMC. JLIM had better performance in some cases, but in others it could have inflated type I errors or very low power because of its problematic assumption of at most one causal SNP for each trait (and that this causal SNP should be marginally the most significant). The conditional method did not have this issue at all.  Table 1 and

PLOS COMPUTATIONAL BIOLOGY
We did some other simulations that were closer to real scenarios, involving larger numbers of SNPs and various effect sizes. The genotypes were obtained from all the subjects (around 4000) in the LHS data. We used the Lipid data from the Global Lipids Genetics Consortium GWAS study [30] to get some loci on chromosome 1 and set up the linear models to simulate 2 traits, LDL and HDL. To determine the loci, we expanded from the marginally significant SNPs for LDL in a similar way to what was done by [38]. We also made sure none of the correlations were greater than 0.95. Then we used the Lipid data with the 1000G data as reference to build linear modelsỸ ¼Xβ þε andỸ � ¼Xβ � þε � whereỸ andỸ � here stand for LDL and HDL,X is the genotypes of the subjects used for the Lipid data. The models we used to simulate two traits were Y = Xβ+ε and Y � ¼ Xβ � þ ε � where X is the genotypes from the LHS data. β and β � were obtained by shrinking smaller effects (<τ) inβ andβ � to 0 and multiplying larger effects (�τ) by 2. Then we applied the methods in the same way as previous. The "true" effect sizes in each region can be found in S1 Text.
The first part of Table 4 shows the results for 2 regions (A1 and A2) with no colocalization. Both JLIM and CMC were able to control type I errors, while the conditional method with Bonferroni adjustment was much more conservative. Coloc did not reject its null hypothesis, probably because its assumption of only one causal SNP for each trait was violated. The second part of the table contains several examples of loci with colocalization (regions B1-B6). CMC and MA worked well and showed higher power than JLIM and CB in most cases. JLIM often had very low power when applied to these loci, because it only focused on the most significant SNP for each trait, while the SNP(s) with colocalization usually did not have the largest marginal effect size. In the meantime, coloc had very high rejection rates because there were many causal SNPs and the proportionality assumption did not hold. These results are consistent with what we had in Table 2 (rows 3, 5, 6 and 7).

Lipid Data: colocalization of two GWAS traits
We looked at the Lipid data from the Global Lipids Genetics Consortium GWAS study [30], containing summary statistics for traits LDL, HDL, TG and TC. The 1000 Genomes Project data [39] were used as the reference panel. To compare the performance of different methods, we applied the conditional methods and coloc to test colocalization of only two traits, LDL and HDL, in different regions, and we also tested each SNP in the marginal models and each lead SNP in the conditional models for colocalization. First, we focused on several chromosomes that had multiple SNPs associated with both LDL and HDL (marginal p-value < 5e-8). Then we looked at the SNPs that were also present in the 1000G data (with 503 subjects) on those chromosomes. Some of the SNPs were removed because they did not appear in the 1000G data. Instead of using sliding windows, we defined the loci associated with LDL following the procedure by [38]. JLIM could not be applied to this scenario because individual level data was not available for either trait. The marginal analysis looking at one SNP at a time examined each SNP on the chromosomes of interest, while the other methods only looked at the defined loci associated with LDL. According to Table 5, the conditional method with Bonferroni correction did not detect any significant loci while CMC found many, proving that CMC can be much less conservative. MA's result was close to CMC's. Coloc's result was different, likely because its null hypothesis was the opposite of the other methods' while the chosen cutoff was the same; this might lead to many false discoveries.

Lipid Data: colocalization of more than two traits
Using the conditional method with Bonferroni correction or CMC, we are also able to test colocalization of more than 2 traits, while the other methods we discussed can only be applied to 2 traits. We tested colocalization for LDL, HDL and TG with the same SNPs and loci as in the previous section. As shown in Table 6, CMC detected more colocalization loci than the conditional method with Bonferroni correction. The results of testing loci and testing lead SNPs were relatively consistent. Table 5. Numbers of SNPs and loci with colocalization. LDL and HDL only. For the methods that test each locus, α = 0.05/#loci. For the marginal method that tests each SNP separately, the numbers are the numbers of significant SNPs, and α = 0.05/(#SNPs in "Marginal"). The numbers in parentheses were obtained using the cut-off α = 5E-8. For coloc, the numbers of loci with colocalization are the ones that were not rejected for its null hypothesis under α.

PLOS COMPUTATIONAL BIOLOGY
In addition, we tested colocalization of all four traits, LDL, HDL, TG and TC with the new methods. This time we used the same gap length and window size to obtain large non-overlapping windows, and then applied the new methods to each window. We also applied the marginal analysis looking at one SNP at a time to see whether a SNP was associated with all the traits (all p-values < α). Note that the reference data (1000G) only have 503 subjects while the window size is as large as 100. The joint models we build may be quite inaccurate. Hence, we applied the Sum-MI method in [40], which uses the reference panel and marginal summary statistics to build joint models more effectively. Then we used the summary statistics (joint effects and their covariance matrix) from these joint models for the conditional method.
As Table 7 shows, in the presence of four traits, the conditional methods were able to detect multiple significant loci with summary statistics. MA had the most significant results.

IGAP and ADNI Data: colocalization of AD and gene expression
We applied the methods to the largest AD GWAS data by IGAP (International Genomics of Alzheimer's Project) [31], along with the ADNI (Alzheimer's Disease Neuroimaging Initiative) data [32] to detect colocalization of AD and gene expressions. CMC and MA's results were more significant than CB's. JLIM and the conditional methods had different results, mostly because JLIM looked at the marginal effects while the conditional methods looked at the joint effects. Fig 3 is a LocusZoom plot [41] that shows the difference between JLIM and the conditional method. For this locus, JLIM did not detect colocalization (its p-value was almost 1), while the conditional method did with a looser threshold (p-value = 8e-7). Note that JLIM only looked at the marginal effects while the conditional method examined the conditional Table 6. Numbers of loci with colocalization. LDL, HDL and TG. For the methods testing each locus, α = 0.05/#loci. For the marginal method that tests each SNP separately, the numbers are the numbers of significant SNPs, and α = 0.05/(#SNPs in "Marginal"). For testing lead SNPs, α = 0.05/(#SNPs in "Regional").

Chr
Marginal Regional

PLOS COMPUTATIONAL BIOLOGY
ones. None of the SNPs were marginally significant enough for both traits, so JLIM did not conclude colocalization. However, in the conditional analysis, the p-values of some SNPs for trait 1 became much smaller. As a result, the conditional method gave a much smaller p-value. More details and other examples are provided in S1 Text.

Discussion
We have presented a new method to test colocalization with conditional analysis, which plays a central role in fine mapping to distinguish causal SNPs from marginally associated ones due to the latter's LD with the former. The proposed conditional method offers some distinct advantages over other existing methods in many scenarios. First of all, unlike coloc and HEIDI, the conditional method's null hypothesis is that there is no colocalization, which seems more natural and desirable for the purpose of detecting colocalization. Besides, compared to methods like JLIM, the conditional method does not put any restriction on the PLOS COMPUTATIONAL BIOLOGY number of causal SNPs in a locus. It also considers conditional effects of SNPs in a locus, rather than marginal effects, which could be quite different. As shown in the simulation study and discussed in the literature [17,18], when multiple SNPs are in LD, the marginally most significant SNP may not be one of the causal ones, in which case JLIM may draw a wrong conclusion due to its relying on the marginally most significant SNP. JLIM may also ignore a SNP's causal effect when another SNP's effect is stronger. However, the conditional method can handle these cases with ease. The difference between marginal and conditional analyses is also shown in our real data examples. In addition, some methods require individual level data for at least one trait, while the conditional method can be easily applied to GWAS summary statistics with a reference panel. Finally, among the methods we have compared, the conditional method is the only one that is currently available for dealing with more than two traits from either independent or overlapping samples. We have developed two new methods called CMC and MA-CMC (or simply MA), which are less conservative than the Bonferroni correction, in conditional analysis of multiple SNPs in a candidate locus. The performances of CMC and MA have been shown to improve over many existing methods both in simulations and real data applications. A technical challenge in colocalization testing with conditional analysis is the nature of the composite null hypothesis: the null distribution of the test statistic (under H 0 ) is unknown (with some unknown nuisance parameters); specifically, under H 0 , although we have the asymptotic distribution of (Z jk ) as a matrix normal variate, but for any given j and k, the mean of Z jk is unknown that may not equal to 0. In CMC, we have proposed a simple method to estimate these non-zero means/ components; it can be made more conservative with the choice of a larger tuning parameter value (closer to 1) to estimate an upper bound of the true p-value, yielding a type I error rate under the nominal significance level with possible loss of power; we proposed a default value that seems to be working well across a wide range of simulations. However, CMC may also suffer from power loss when the estimation of non-zero components is too difficult to be good. To deal with this problem, we proposed to use MA, which looks at multiple estimates/choices of non-zero components and takes the weighted average of different results. As long as one of those choices is close to the truth, it can benefit MA's power, making it even less conservative than CMC. One limitation of MA is that it not only requires a cut off value as CMC does, but also needs a pre-specified number of effects to be added each time as well as the total number of models. Furthermore, MA requires more computing time. In addition, we tried the harmonic mean p-value (HMP) by [42] for the conditional method to replace the Bonferroni correction, but the improvement was very limited (so we did not include the results). In the future, it will be worthwhile to further explore other options to yield less conservative (and thus more powerful) colocalization tests in conditional analysis. One possible way to make the new method more efficient is to apply a sequential Monte Carlo method [43]. Finally, we would like to mention that, as for any challenging problem in practice, it might be more informative to apply multiple approaches, instead of a single one, to reach a more robust conclusion. Our proposed methods as hypothesis testing can be applied first to examine if there is any colocalization. If so, a Bayesian approach may be used to identify the SNPs that are likely to be causal for multiple traits. The first step focuses on the global question of the presence/ absence of colocalization, while the second step aims at identifying specific causal SNPs, which is more challenging and may require larger samples.
Supporting information S1 Text. A supplementary file contains a brief review on some main representatives of the existing methods, plots of effect sizes for Table 4, results for IGAP and ADNI data and some other technical discussions. (DOCX)