Leveraging pleiotropy to discover and interpret GWAS results for sleep-associated traits

Genetic association studies of many heritable traits resulting from physiological testing often have modest sample sizes due to the cost and burden of the required phenotyping. This reduces statistical power and limits discovery of multiple genetic associations. We present a strategy to leverage pleiotropy between traits to both discover new loci and to provide mechanistic hypotheses of the underlying pathophysiology. Specifically, we combine a colocalization test with a locus-level test of pleiotropy. In simulations, we show that this approach is highly selective for identifying true pleiotropy driven by the same causative variant, thereby improves the chance to replicate the associations in underpowered validation cohorts and leads to higher interpretability. Here, as an exemplar, we use Obstructive Sleep Apnea (OSA), a common disorder diagnosed using overnight multi-channel physiological testing. We leverage pleiotropy with relevant cellular and cardio-metabolic phenotypes and gene expression traits to map new risk loci in an underpowered OSA GWAS. We identify several pleiotropic loci harboring suggestive associations to OSA and genome-wide significant associations to other traits, and show that their OSA association replicates in independent cohorts of diverse ancestries. By investigating pleiotropic loci, our strategy allows proposing new hypotheses about OSA pathobiology across many physiological layers. For example, we identify and replicate the pleiotropy across the plateletcrit, OSA and an eQTL of DNA primase subunit 1 (PRIM1) in immune cells. We find suggestive links between OSA, a measure of lung function (FEV1/FVC), and an eQTL of matrix metallopeptidase 15 (MMP15) in lung tissue. We also link a previously known genome-wide significant peak for OSA in the hexokinase 1 (HK1) locus to hematocrit and other red blood cell related traits. Thus, the analysis of pleiotropic associations has the potential to assemble diverse phenotypes into a chain of mechanistic hypotheses that provide insight into the pathogenesis of complex human diseases.

improves the chance to replicate the associations in underpowered validation cohorts and leads to higher interpretability. Here, as an exemplar, we use Obstructive Sleep Apnea (OSA), a common disorder diagnosed using overnight multi-channel physiological testing. We leverage pleiotropy with relevant cellular and cardio-metabolic phenotypes and gene expression traits to map new risk loci in an underpowered OSA GWAS. We identify several pleiotropic loci harboring suggestive associations to OSA and genome-wide significant associations to other traits, and show that their OSA association replicates in independent cohorts of diverse ancestries. By investigating pleiotropic loci, our strategy allows proposing new hypotheses about OSA pathobiology across many physiological layers. For example, we identify and replicate the pleiotropy across the plateletcrit, OSA and an eQTL of DNA primase subunit 1 (PRIM1) in immune cells. We find suggestive links between OSA, a measure of lung function (FEV 1 /FVC), and an eQTL of matrix metallopeptidase 15 (MMP15) in lung tissue. We also link a previously known genome-wide significant peak for OSA in the hexokinase 1 (HK1) locus to hematocrit and other red blood cell related traits. Thus, the analysis of pleiotropic associations has the potential to assemble diverse phenotypes into a chain of mechanistic hypotheses that provide insight into the pathogenesis of complex human diseases.

Introduction
Genome-wide association studies of human phenotypes ranging from gene expression to human diseases are now routine. Cumulatively, the data indicate that complex traits are highly polygenic [1,2], and genetic correlation between these traits indicates abundant pleiotropy [3][4][5]. Interpreting the plethora of results raises two major challenges: first, generating testable mechanistic hypotheses about the underlying pathophysiology; and second, increasing statistical power to identify associations in studies of traits with small or moderate sample sizes. Leveraging pleiotropy can help address both of these challenges. Previous work has demonstrated that including many correlated traits in association studies increases power to detect associations common to multiple traits [3,6,7]. This approach is largely untried in genetic investigations of Obstructive Sleep Apnea (OSA). Here, we demonstrate that using shared associations between correlated traits can identify effects in under-powered studies of OSA, and that leveraging molecular and physiological endophenotypes in this way also generates clear and testable biological hypotheses.
OSA is characterized by recurrent episodes of partial or complete obstruction of the pharyngeal airway resulting in multiple physiological disturbances, including sympathetic nervous system activation, increased energy cost of breathing, intermittent hypoxemia, and wide swings in intrathoracic pressure. This disorder is highly prevalent in the general population, affecting more than 10% of middle-aged adults, with increased prevalence observed with aging, obesity, and cardiometabolic disease, and is more common in men [8]. OSA leads to sleep disruption, particularly increased sleep fragmentation and decreased proportion of restorative stages of sleep, resulting in daytime sleepiness, impaired quality of life and cognitive deficits [9]. Moreover, OSA is associated with increased rates of hypertension, incident heart disease, stroke, diabetes, depression, certain cancers, and overall mortality [10][11][12][13][14][15][16][17][18][19]. Despite the large number of epidemiological studies indicating that OSA is closely associated with these outcomes, there appear to be subgroup differences in susceptibility, e.g., middle-aged individuals and men are more likely to experience OSA-related cardiovascular disease in some studies than older individuals and women, respectively [20]. This underscores gaps in our knowledge of the pathophysiological pathways linking OSA to other diseases [21,22]. Pathophysiological pathways linking OSA to other diseases and factors that influence individual differences in susceptibility are poorly understood. While there are several effective treatments for OSA, including Continuous Positive Airway Pressure (CPAP), there appears to be substantial variation in overall clinical response and attenuation of cardiometabolic consequences, suggesting heterogeneity in both the etiology of the disease and susceptibility to its physiological disturbances.
Indices of OSA, including the Apnea-Hypopnea Index (AHI; the number of breathing pauses per hour of sleep), apnea event duration, indices of overnight hypoxemia, habitual snoring, and excessive daytime sleepiness, show substantial heritability in family studies [23]. Past studies have identified only a handful of associations with a variety of OSA-related traits. We have previously described a GWAS of OSA traits measured by overnight polysomnography in multi-ethnic cohorts totaling~20,000 individuals [24]. In that study, we found two genomewide significant multiethnic associations: variants in a locus on 10q22 were associated with indices of average and minimum SpO 2 and percentage of sleep with SpO 2 < 90%, and variants in a locus on 2q12 were associated with minimum oxygen saturation (SpO 2 ). In another study, we identified a locus in 17p11 with a male-specific effect on AHI [25]. Furthermore, in an admixture mapping study in Hispanic/Latino Americans, we identified a locus on 2q37 associated with AHI and one in a locus on 18q21 associated with AHI and SpO 2 < 90% [26].
The low number of genetic associations reported to date only explains a small fraction of OSA trait heritability. This relative paucity of findings is driven primarily by modest sample sizes, a reflection of the expense and complexity of measuring physiological phenotypes by overnight polysomnography or respiratory polygraphy. This also limits our ability to fine-map associations down to causative variants and thus identify relevant genes. Data on hundreds of thousands of individuals-sample sizes at which GWAS designs are well-powered to detect tens of loci and, in combination with additional experiments, fine-map some of them-have yet to be collected for OSA traits based on overnight polysomnography or respiratory polygraphy and may never be available [1,2]. Biological interpretation of available genetic associations is further complicated by the observation that most GWAS effects localize to enhancer regions and other regulatory elements and are often distal to physiologically relevant genes [27,28]. Now that GWAS of massive sample sizes have been accumulated for various comorbid conditions and endophenotypes related to OSA, we hypothesize that analysis of shared associations across correlated traits can identify effects in underpowered studies of OSA and generate clear and testable biological hypotheses. A number of computational methods increase power for discovering genetic associations by capitalizing on pleiotropy between disease phenotypes or between a disease and a molecular trait such as gene expression. One common approach takes advantage of the genetic correlation among phenotypes [6,[29][30][31]. This class of methods gains substantial additional power by pooling association signals across traits. However, such methods suffer from power loss when the correlation of genetic effect sizes is highly variable across the genome or limited to a subset of loci. For example, an approach such as MTAG is not suitable for small GWAS studies when the assumption of the homogeneity of genetic correlation is violated. The estimated genetic effect of an underpowered trait can be inflated by the strong genetic signals of well-powered traits that are pooled together even if the variant is not causative for the underpowered trait [6]. An alternative approach analyzes individual loci to detect pleiotropic alleles, with no regard to genetic correlation [4,7,[32][33][34][35][36][37][38][39][40][41]. Only a handful of existing methods account for the possibility that the apparent pleiotropy is driven by the linkage disequilibrium (LD) between two distinct causative variants each of which drives only one phenotype [4,[36][37][38][39][40].
In this study, we apply a colocalization method to detect shared associations between OSA and other related traits. We elected Joint Likelihood Mapping method (JLIM) [39] and eCAVIAR [38] as colocalization methods, but our general strategy does not depend on any specific method. We focus on a set of well powered intermediate traits which have previously been implicated in the pathobiology of OSA. Given prior GWAS studies suggesting the involvement of inflammatory genes in OSA [42][43][44], and cohort studies reporting high levels of inflammation, including elevations in neutrophils and monocytes in OSA [45,46], we included leukocyte and platelet related traits in our pleiotropic comparisons. Similarly, we also included red blood cell related traits given prior GWAS implicating iron metabolism [26] and erythrocyte function [24]. We will refer to these as clinical traits. In addition, OSA is associated with lung [47,48], obesity and cardiovascular-related disorders [46,[49][50][51], and we have included clinical traits that reflect these overlaps, together with gene expression traits in tissues implicated in these diseases.
By linking different clinical and gene expression traits to OSA at specific loci, our analysis suggests new hypotheses about OSA pathobiology across many physiological layers as well as finding a new association.

Creating a framework to identify associations in underpowered GWAS through pleiotropy
We used a colocalization method to identify pleiotropic loci, where a genetic effect drives association to two traits. First, we selected genome-wide significant loci (association p < 5 x 10 −8 ) in our well-powered trait (here, a clinical trait), and from these we selected the subsets which also show nominal association to OSA traits (p < 0.01 for any SNP in the locus). We then used colocalization to directly evaluate if the association to the two traits was consistent with the same underlying effect, indicating a pleiotropic effect. This two-step strategy allowed us to distinguish between cases where there was association only in the clinical trait; where there was a shared association in both traits; and if there were distinct associations in both traits stemming from different underlying effects (S1 and S2 Figs).

Simulations
We first used simulations to assess our strategy to identify true associations in an underpowered study by detecting pleiotropic associations with a better-powered study of another trait and following up with replication in an independent cohort. To assess sensitivity and specificity, we simulated variable proportions of shared and distinct causal effects, whilst assuming high polygenicity for each trait and only one causative variant per locus. We simulated GWAS statistics for pairs of traits corresponding to our own study design: we simulated situations where no variant is causative for an underpowered trait (H 0 ), where the same variant is causative for the association to both traits (H 1 ), and where two distinct variants in LD drive each of the associations at a locus (H 2 ) (Methods). To simulate summary statistics from a well-powered GWAS (representing our clinical traits), we sampled values from a multivariate normal (MVN) distribution using the local LD matrix as a variance-covariance parameter [52] with clinical trait sample size of 150,000. To simulate a GWAS of limited power (representing our OSA discovery cohorts), we used genotypes from ancestry-matched samples as a reference from which we simulated quantitative traits in 10,000 samples as surrogates for the OSA GWAS. Most overlapping associations are expected to be due to distinct variants [39]; we simulated scenarios with just 5% and 20% of true associations being driven by the same variant in both traits (H 1 ). For example, in the first case, we simulated 2,500 loci with 3.5% corresponding to H 1 , 66.5% corresponding to H 2 and 30% corresponding to no association in the underpowered trait (H 0 ). The focus of our simulation study was the ability to replicate the association signal for the primary phenotype in an independent cohort. This means successfully discriminating between (H 1 +H 2 ) and H 0 . We leave the interpretability question (discriminating H 1 from H 2 ) to Discussion.
We tested two colocalization methods-JLIM and eCAVIAR (a popular Bayesian colocalization test) [38,39]. We also tested a popular pleiotropy-informed test, conditional false discovery rate (cFDR) [32,33]. cFDR is not based on the LD-informed colocalization of the association signals. All methods appear underpowered in the discovery dataset at a sample size typical of current genetic studies of OSA (n = 10,000); the methods are only able to find associations if the observed effect is larger than the true effect (S3 and S4A Figs). Among the true associations identified by JLIM, eCAVIAR and cFDR, there is an enrichment of H 1 loci. In spite of this enrichment, many loci identified in the discovery cohort are driven by distinct variants (H 2 ): (for the simulation with 5% of the true associations being H 1 , 54.7, 55.8 and 61.5% of loci identified by JLIM, eCAVIAR and cFDR, respectively, are H 2 ). At the same time, almost all proportions of successfully replicated associations are driven by the same variant (H 1 ) in our simulation regimes (Figs 1C and 1D and S4B-S4F).
Because the colocalization tests (such as JLIM and eCAVIAR) and cFDR are using different features of the data, we found that taking a consensus between them can effectively exclude the majority of H 2 from the analysis (S3 Fig). In the intersection of JLIM and cFDR, only 3.5% of identified loci are H 2 ; in the intersection of eCAVIAR and cFDR, only 5.3% are H 2 (S4A Fig). Even though the elimination of H 2 loci increases the false discovery rate (fraction of H 0 ) in the discovery sample, it substantially increases the rate of successful replication. The main reason for the increase in the rate of successful replication is the prioritization of H 1 loci of larger true effect size and the corresponding reduction of multiple testing burden at the replication stage. The benefit of this approach increases with the fraction of loci driven by distinct effects (H 2 ) (S5 Fig). It also increased when the fraction of loci of no true effect (H 0 ) is low (as in the case of very high polygenicity [53] (S6 Fig)).
To test if the consensus method allows for more replicated loci simply due to the smaller number of candidate loci, we tightened the p-value threshold of cFDR to select the same number of candidates in discovery as the consensus method (S7 Fig) or to control the rate of H 0 loci (S8 Fig). We also explored a range of other simulation conditions, including varying the total number of examined well-powered trait-associated loci, correlation of effect sizes between well-powered and underpowered traits, trans-ethnic validation cohort, and the presence of A total of 2,500 association peaks from well-powered GWAS studies (n = 150,000) were tested for the shared effect in simulated discovery cohorts (n = 10,000), multiple causative variants in locus (S9-S12 Figs). Consistently, the consensus method resulted in more replicated loci than cFDR alone when the sample size is limited.
Next, we compared the rate of successful replication between the consensus approach and Bayesian meta-analysis methods. In contrast to colocalization methods, the Bayesian metaanalysis approach leverages the correlation of effect sizes between traits rather than SNP-level colocalization of causative variants [29][30][31]. We chose to compare two such Bayesian metaanalysis methods, MetABF and CPBayes, with the consensus approach using the simulated dataset. Again, when the sample size was limited (n < 20,000), our consensus method found more replicated associations than meta-analysis approaches (S13 and S14 Figs). As the sample size of the validation cohort increases, both MetABF and CPBayes identified more H 2 loci, which led to more replicated loci than the consensus method. Since the meta-analysis methods are designed for a pleiotropy analysis of more than two traits, in principle, they can boost the power by using more than one well-powered trait. To test if the multi-trait meta-analysis can outperform the pairwise meta-analysis, we simulated nine additional well-powered traits, generating GWAS statistics for one underpowered trait and ten well-powered traits in each locus (Methods). For the additional well-powered traits, we randomly decided whether the simulated causative variant was to be the same as or distinct from that of the main well-powered trait. Using this simulated dataset, we examined the power of the above Bayesian methods and non-parametric meta-analysis (iGWAS) [7] in multi-trait and pairwise settings. Overall, we could not see a clear advantage of multi-trait meta-analyses over pairwise tests in our simulated data (S15 Fig). Rather, particularly for CPBayes and iGWAS, multi-trait tests substantially underperformed pairwise tests. This result demonstrates that for underpowered studies, the consensus approach focused on SNP-level colocalization outperforms meta-analysis techniques in pairwise and multi-trait settings when the association signals are driven by distinct rather than the same effect in a large fraction of loci.
Finally, we found that looking at more trait combinations with more GWAS peaks results in proportionally more discoveries (Fig 1E). Overall, our simulations demonstrate that casting a wide net across many traits (increasing the number of GWAS peaks) and taking the consensus between pleiotropy mapping methods is a viable strategy to increase discoveries in underpowered studies. We therefore felt justified in pursuing this strategy using real data, to make additional discoveries in traits related to OSA.

Identifying pleiotropic associations between clinical traits and sleep apnearelated traits
Based on clinical relevance [54] and heritability [23], we focused on four OSA-related traits measured in five European-ancestry cohorts: the apnea-hypopnea index (AHI) [25], average respiratory event (apneas or hypopneas) duration [26], and minimum and average oxygen saturation (SpO 2 ) during sleep [24]. We used summary statistics from the remaining multi-ethnic cohorts in our replication effort. and then select loci were tested for replication in simulated validation cohorts of the same genetic ancestry (n = 10,000-35,000). The candidate loci were identified by conditional false discovery rate (cFDR), colocalization tests (JLIM and eCAVIAR), or the intersection of cFDR and colocalization test. The p-value cutoff was set to 0.01 for cFDR and JLIM, and for eCAVIAR, the equivalent posterior threshold was calibrated using H 0 simulation data. The 2,500 GWAS peaks consist of the loci simulating no causal effect for underpowered traits (H 0 ) and those simulating the same causal effect between two traits (H 1 ) or distinct causal effects (H 2  We assembled a collection of GWAS summary statistics for a total of 55 candidate intermediate traits from across these physiological areas: erythroid, leukocyte and platelet counts and function, from a study combining the UK Biobank and INTERVAL datasets (170,000 individuals of European ancestry) [55]; cardiovascular, metabolic and respiratory traits from the UK Biobank (380-450,000 European ancestry participants) [56,57], and cardio-metabolic traits (36,000 European ancestry participants) [58]. We then compared associations in each of these clinical traits to our OSA traits (6,781 European ancestry participants; S1 Table), to identify potential associations in the latter. A complete list of clinical traits we considered is presented in S2 Table. We tested for directional causal effects of the selected clinical traits on our OSA related traits using Mendelian Randomization (MR) [59]. Due to the low sample sizes in OSA traits, no comparison reached statistical significance after multiple test correction (S3 Table). After excluding the extended MHC region and the sex chromosomes, we identified 3,191 genomewide significant associations (p < 5 x 10 −8 ) in the 55 clinical traits, of which 2,939 had a corresponding suggestive association to one of the four OSA traits (p < 0.01 at any SNP in the locus; S4 Table). We then explicitly tested for evidence of pleiotropy between clinical and OSA traits using JLIM (2,142 to 2,236 tests for each OSA trait). We found evidence that in 61/2,939 of these regions the OSA and clinical trait associations are consistent with a shared, pleiotropic underlying causative variant by JLIM (false discovery rate (FDR) < 0.20) and at the same time show evidence of association to OSA by cFDR (Tables 1 and S5).
To independently validate our 61 putative OSA trait associations from the discovery stage, we compiled summary statistics for the same traits in 15,594 individuals of Asian, African, European and Hispanic ancestries/backgrounds (S6 Table). These individuals do not overlap with those from the cohorts used in our colocalization analysis using JLIM. We did not attempt to replicate pleiotropic associations; we only replicated the OSA association statistics. JLIM relies on local LD patterns being preserved between clinical and OSA trait cohorts, so we cannot use multi-ethnic data in our discovery analysis of pleiotropy. We found that 2/61 variants in S5 Table show significant association with the same OSA trait as the initial observation, after Bonferroni correction for the number of tests performed. The variant in SNP rs17476364 (Fig 2) links 11 of 12 red blood cell related clinical traits analyzed with average SpO 2 during sleep. It is an intronic variant in the hexokinase 1 (HK1) region in chromosome 10, and has been previously reported, as it reached genome-wide significance in association to minimum and average SpO 2 [24]. Another variant in SNP rs2277339 is a missense coding variant in DNA primase subunit 1 (PRIM1) in chromosome 12. It links plateletcrit (ratio of platelet volume to whole blood volume, a marker of thrombotic as well as a broad range of inflammatory processes [60]) to AHI. In the UK Biobank, it has documented significant associations to height, waist to hip-ratio, age at menopause and multiple red blood cell related traits [61]. Further 8 variants, shown in Table 1, were below nominal association p values < 0.05 in the replication set, but did not survive multiple test correction.

Incorporating gene expression to construct molecular hypotheses of sleep apnea physiology
Non-coding regions with evidence of gene regulatory activity carry a large proportion of heritability in most traits analyzed in large GWAS [62]. We reasoned that some OSA causative variants would reside in such regulatory regions, and thus act on gene regulation. We therefore sought shared associations between gene expression traits and the clinical traits for which we identified a pleiotropic association in the 61 loci in Tables 1 and S5. To do so, we compiled expression quantitative trait loci (eQTL) data for protein-coding genes expressed in lung, liver, spleen and skeletal muscle from individuals with European ancestry from the GTEx Project [63], and monocyte, T cell and neutrophil populations in individuals from BLUEPRINT [64]. We chose these tissues for potential relevance to OSA pathology: the lung is involved in OSA-related hypoxemia [47,48]; previous GWAS associations have implicated the neuromuscular junction in overnight SpO 2 levels, and abnormalities in upper airway muscle function are fundamental mechanisms for sleep apnea [24,65]; the spleen and liver are known to mediate filtration of erythrocytes, iron homeostasis and production of inflammatory cytokines; and leukocytes are key modulators of inflammation, an antecedent risk factor of OSA development [66]. We calculated FDR based on JLIM p-values over the 5,860 comparisons against 1,009 protein-coding genes of which transcription start sites are within 1 Mb from the index SNPs of clinical traits.
We were able to identify shared associations between eQTL and clinical traits in 7/61 loci (FDR < 0.05; Table 2). This includes several notable examples, including rs2277339, one of our two replicated SNPs ( Table 1). The rs2277339 SNP is a missense variant of PRIM1 but also an eQTL for PRIM1 levels in monocytes and T cells ( Fig 3A). Another example is a locus on chromosome 16 where we find that an eQTL for MMP15 (matrix metallopeptidase 15) expression in lung tissue is pleiotropic with a measure of lung function (FEV 1 /FVC, the ratio of (S1 Table). The replication p-value refers to the association p-value of the SNP to the OSA trait in meta-analyzed validation cohorts (S6 Table). The JLIM p-value corresponds to the tests for the shared effect between the clinical and OSA phenotypes.
https://doi.org/10.1371/journal.pgen.1010557.t001  forced expired volume per second to forced vital capacity), which in turn is pleiotropic with an association with minimum SpO 2 during sleep ( Fig 3B). The lead SNP for MMP15 (rs4784886) is marginally associated with minimum SpO 2 in the validation cohort (nominal p = 0.051).
Overall, of the eight identified genes, four genes (MMP15, SESN1, FOXO3 and BRD3) are known to be induced by hypoxia and oxidative stress, and three genes (FOXO3, BRD3 and GSDMA) are implicated in inflammatory responses. The colocalization of associations to gene expression in specific cell types, clinical and sleep apnea traits suggests biological hypotheses of the pathophysiology underlying OSA. We also investigated pleiotropy between gene expression traits and OSA in the three loci harboring known genome-wide significant OSA associations in the discovery sample (Tables 3  and S7). In each locus, we compared OSA trait summary statistics to eQTLs for genes within 1Mb from the most associated variant, where there exists a SNP with eQTL association pvalue < 5x10 -8 in the locus. We replicated a previously found pleiotropic effect in a locus on chromosome 17 [24], where minimum oxygen saturation (SpO 2 ) colocalizes with expression of the epsilon subunit of the nicotinic receptor (CHRNE), in various tissues, including neutrophils, monocytes, spleen and muscle (Fig 4).

Discussion
In our comparison of clinical (respiratory, cardiometabolic, hematologic, inflammatory) traits to OSA-related traits, the strongest finding lies in an intronic region of hexokinase 1 (HK1) and is associated with average overnight oxygen saturation level (SpO 2 ). This locus is pleiotropic with most of the red blood cell related traits tested (Fig 2) and corresponds to one of the most significant genome-wide associations we had previously reported from this data [24]. Prior to this analysis, two alternative hypotheses for the etiology of this signal had been proposed: that HK1 acted by modulating inflammation, or that it affected OSA by altering erythrocyte function. Our results provide evidence that is consistent with the erythrocyte pathway hypothesis. Mutations in HK1 have been implicated in anemia, together with severe hemolysis and marked decreases in red blood cells [67]. As discussed previously [24], it is possible that HK1 affects the Rapoport-Luebering shunt through glycolytic pathway intermediates, which in turn mediates oxygen carrying in mature erythrocytes. Factors that influence arterial oxygen levels can lead to a more severe OSA phenotype (i.e., lower average levels of oxygen saturation predispose to greater hypoxemia with each breathing obstruction). Lowered oxygen carrying  capacity and thus more tissue hypoxia could also contribute to breathing instability (and thus apneas) via Hypoxia-Inducible Factor-1 (HIF-1) and enhanced carotid body sensitivity and chemoreflex activation, or through long-term respiratory facilitation and plasticity [68,69]. The analysis of pleiotropy can be used to concatenate more than one phenotype to create candidate "causal chains," which by linking eQTLs to well-powered traits to sleep apnea related traits can hint at promising biological targets. Among the most significant results for this multicomponent model is DNA primase subunit 1 (PRIM1). We found that a missense variant in PRIM1 colocalizes with its own expression levels in monocytes and T cells, plateletcrit and AHI. The association of this variant with AHI was replicated in an independent validation cohort. The PRIM1 deficiency is known to cause lymphopenia along with severe growth retardation, microcephaly and "triangular face" [70]. The impact of PRIM1 on craniofacial morphology may contribute to OSA by causing a narrow airway, and thereby increasing the risk of airway obstruction during sleep; however, an inflammatory mechanism is also possible.
Another example of causal chains is matrix metallopeptidase 15 (MMP15), a gene whose expression in lung tissue is affected by an eQTL that colocalizes with a lung function phenotype (FEV 1 /FVC) which itself colocalizes with minimum SpO 2 during sleep. Matrix metallopeptidases (MMPs), a family of proteolytic enzymes that can be activated by inflammation and oxidative stress, participate in and potentiate tissue remodeling by breaking down the extracellular matrix. MMPs have been suggested to play an etiological role in OSA-related pathophysiological responses that may lead to multiple organ dysfunction [71], and specifically, may lead to OSA by contributing to abnormalities in the extracellular matrix of the skeletal muscle in the upper airway, predisposing to passive airway collapse [72]. In particular, MMP15 (membrane-type 2 MMP) is known to be upregulated by HIF-1α under hypoxic conditions [73,74] and is expressed in alveolar epithelial cells in Idiopathic Pulmonary Fibrosis (IPF) [75], an Interstitial Lung Disease (ILD) characterized by chronic inflammation, progressive formation of scar tissue and decreased lung function. OSA is highly prevalent in ILD as well as associated with subclinical markers of ILD [47]. OSA is also common in Chronic Obstructive Pulmonary Disease (COPD), and this overlap is especially associated with more severe hypoxemia [76]. Our results suggest a common causal pathway linking these lung diseases with OSA.
We also found that GSDMA (Gasdermin A) eQTL in T cells colocalizes with associations to the lymphocyte count, AHI as well as oxygen saturation. While Gasdermin A is considered to be expressed mostly in epithelial cells rather than in T cells, broadly, gasdermins mediate inflammatory responses via pyroptosis, a form of programmed cell death leading to the release of proinflammatory molecules, and play a key role in NLRP-inflammasome responses. Increased level of gasdermin D has been shown to mediate hypoxia-related muscle injury in an animal model of OSA [77]. Inflammatory mechanisms may increase risk for OSA through effects on soft tissue structures in the upper airway, muscle function, and/or neural control of breathing, as suggested by a prospective study of inflammation leading to higher risk of OSA [66].
Another interesting result is an eQTL in the epsilon subunit of the nicotinic acetylcholine receptor CHRNE that colocalizes with a genome-wide significant association in minimum oxygen saturation. This receptor is present at neuromuscular junctions, and mutations in this subunit are known to cause congenital myasthenic syndrome in humans that can result in progressive respiratory impairment [78]. As discussed above, abnormal skeletal muscle responses in the upper airway are considered to be central in the pathogenesis of OSA.
We have shown that we can leverage pleiotropy between OSA and physiological traits to identify the known OSA association in the HK1 locus. We could not identify two other known OSA associations at 2q12 and 17p11 using this approach, as there is no apparent pleiotropy between OSA and our library of clinical traits. Adding gene expression traits to this analysis increases our discovery ability, as we can identify the 17p11 association as pleiotropic with an eQTL for CHRNE in neutrophils, monocytes, spleen and muscle (Table 3). Furthermore, in simulations, we showed that using a large array of clinical trait-associated loci has a potential to boost the power to identify associations in underpowered studies by casting a wider net ( Fig  1E). These results warrant further investigation on using gene expressions in diverse cell types and conditions as a library of well-powered traits for underpowered association studies.
Replication in an independent cohort is frequently necessary to validate GWAS findings in small discovery samples. In this study, we significantly replicate the association of two loci to OSA after Bonferroni correction and identify additional eight loci that replicate only nominally but are probably enriched with true positive associations to OSA (S16 Fig). We randomly selected 28,558 unlinked autosomal SNPs (r 2 < 0.1 within 100kb windows) from our initial OSA GWAS in individuals of European ancestry. We focused on AHI since we detected the most SNPs reaching nominal replication in this trait. We found that 11,910 of these would have been selected for our colocalization analysis, in line with expectation for the presence of a SNP within the 100kb distance satisfying our in-sample threshold of association p < 0.01. Out of these, 6,123 had a significant out-of-sample nominal replication (association p < 0.05). In comparison, our pleiotropy analysis results for AHI are 4.6-fold enriched at this threshold (4/ 20 independent hits with r 2 < 0.1), suggesting the presence of true associations to sleep traits (one-sided Fisher Exact Test p = 0.018). In fact, the 76/975 associations that did not show significant evidence of pleiotropy were also slightly enriched for nominal replication relative to the set of randomly selected variants (one-sided Fisher Exact Test p = 0.00048, 1.6-fold enrichment). This suggests that additional pleiotropic effects-and therefore true OSA associationsremain to be discovered above the FDR cutoff we applied to our colocalization results although we cannot rule out the possibility that the random SNPs may not fully recapitulate the functional properties of GWAS SNPs from clinical traits.
From a methodological perspective, the analysis of pleiotropy has become an important tool in the analysis of complex trait genetics. Most complex traits are highly polygenic, implying that many variants associated with a single trait will also be associated with other traits or will be in LD with such variants. Different computational methods are required for different applications and for different genetic architectures. If the goal is to increase power to detect an association and the genetic correlation is broadly dispersed over many loci, methods explicitly capitalizing on the broad genetic correlation are capable of producing large power gains [6,29]. In cases where the majority of overlaps in GWAS association peaks between traits are driven by LD between distinct causative variants, power can still be increased with the help of other methods that leverage pleiotropy to reduce multiple testing burden [32,33]. Development of another group of approaches was motivated by the need to link genetic associations to genes via eQTL data [36,38,39] but, as shown here, these methods can be easily adapted to the analysis of other traits. Because of the abundance of association signals, especially for cellular and molecular traits, distinguishing between true pleiotropy due to the same underlying causative variants and different causative variants in LD is important for all the applications. Therefore, in our study of OSA, we selected a method that explicitly models LD structure in the locus. The drawback of this choice is the need to restrict the discovery sample to a demographically homogeneous subset while using the available multi-ethnic cohort for replication. While colocalization tests can distinguish the same and distinct causative variants in overall, the specificity is diminished in cases where association signals of two traits are driven by distinct causative variants in high LD (S2 Fig). For example, when the distinct causative variants are in LD between 0.8 and 0.95 (3.7% of simulated data assuming the random LD distribution), eCA-VIAR and JLIM misclassify 1.8 and 2.3% of H 2 loci, respectively, as driven by the same variant (JLIM p < 0.01 or at equivalent posterior probability threshold for eCAVIAR). JLIM estimates accurately calibrated p-values under H 0 and conservatively approximate p-values under H 2 . For underpowered GWAS studies, with limited sample sizes, we lowered JLIM's genetic resolution parameter θ to 0.5 in this study. JLIM does not attempt to distinguish distinct causative variants beyond its specified genetic resolution from the same causal effect. When distinct causative variants for two traits are separated with LD r 2 < 0.5, JLIM maintains the ability to distinguish the same and distinct effects in simulations (S2D-S2F Fig). And the overall specificity of JLIM to distinguish H 1 from H 2 is similar or higher compared to other colocalization method such as eCAVIAR in our simulated datasets (S2B Fig). Pleiotropy does not necessarily imply a causal relationship between phenotypes. Confounding by unadjusted covariates in GWAS data can further complicate the interpretation of causal chains due to the difficulty to distinguish between direct and indirect associations. Nonetheless, as we demonstrate here, a shared genetic basis between OSA and organismal, cellular and molecular traits can reveal new aspects of the underlying biology. This will likely be of benefit for other clinically relevant traits that are difficult to study at the scale required for GWAS. Traits that are burdensome or expensive to phenotype, rare diseases that are hard to sample and diseases that affect under-represented populations could all lead to underpowered genetic studies, which are unlikely to get dramatically higher sample sizes in the near future. Therefore, there is an unmet need to optimize the signals that can be extracted from small GWAS, and the strategy presented here should help in achieving this goal.

Ethics statement
This research was approved by Partners Healthcare IRB (protocol #2010P001765).

OSA Cohorts
To study pleiotropic associations underlying the risk of OSA, we prepared two sets of cohorts: the discovery cohorts to identify pleiotropic variants and independent replication cohorts to validate their associations to OSA traits. For the discovery cohorts, we used individual-level genotype data in order to determine the significance of pleiotropy by permutation (JLIM) [39]. At the replication stage, we do not carry out any pleiotropy analysis, we only check for genetic association to OSA, so summary-level association statistics were sufficient. In addition, we restricted the genetic ancestry of GWAS discovery cohorts to that of European ancestry, to match GWAS of clinical traits. This was in order to avoid potential issues due to the mismatch of LD patterns in our pleiotropy analysis. In contrast, we did not require the replication cohorts to have any specific ancestry. Thus, our replication cohorts included all available ethnicities.
The discovery cohorts included the subset of samples of European ancestry from the following five cohorts: the Atherosclerosis Risk in Communities Study (ARIC) [79], Osteoporotic Fractures in Men (MrOS) Study [80], Multi-Ethnic Study of Atherosclerosis (MESA) [81], Cardiovascular Health Study (CHS) [82] and the Western Australian Sleep Health Study (WASHS) [83]. ARIC is a study that investigates atherosclerosis and cardiovascular risk factors. It is one of the cohorts included in the Sleep Heart Health Study, which collected polysomnography and genotype data [84]. Genotype data were obtained through dbGaP (phg000035.v1.p1). MESA is a population-based study focused on cardiovascular risk factors, which included participants of four ethnicities: African-, Asian-, European-and Hispanic/ Latino-Americans ranging from ages of 45 to 86 years old. We only included samples from European-Americans in the discovery cohort. Polysomnography data measuring sleep-related traits were later obtained from individuals who did not use overnight oxygen, CPAP or an oral device for sleep apnea [85]. MrOS is a multi-center prospective epidemiological cohort assembled to examine osteoporosis, fractures and prostate cancer in older males [86]. An ancillary study (MrOS Sleep) measured sleep disturbances and related outcomes [87]. CHS is a cohort aimed to study coronary heart disease and stroke in individuals aged 65 and older, and genotype data were obtained through dbGaP (Illumina CNV370 and IBC; phg000135.v1.p1 and phg000077.v1.p1). WASHS is a clinic-based study designed to examine OSA and its associated genetic risk factors in patients referred to a sleep clinic in Western Australia. Not all individuals had measurements for the four OSA-related traits of interest. Details on genotyping, imputation and QC procedures have been previously reported [24]. See S1 Table for the details.
The replication cohorts were: the Hispanic Community Health Study/Study of Latinos (HCHS/SOL) [88,89], Starr County Health Studies (Starr) [90], Cleveland Family Study (CFS) [91] and Framingham Heart Study (FHS) [92], in addition to non-European samples of CHS and MESA. HCHS/SOL is a population-based study to examine protective and risk factors for many health conditions among Hispanic/Latinos living in four urban areas within the USA (Chicago IL, Miami FL, San Diego CA and Bronx NY). Starr is a cohort collected to study risk factors for diabetes in a population of Mexican-Americans in Texas, later phenotyped for sleep traits [93]. CFS is a family-based study, which recruited patients with OSA, their relatives and neighborhood control families to study the familial and genetic basis of sleep apnea (356 families of African American or European American ancestry). We included only unrelated individuals from CFS. FHS is an epidemiological cohort established to study cardiovascular disease risk factors, using follow-up medical examinations every two years for the population of European Ancestry in Framingham, MA. Data from the first Sleep Heart Health Study were obtained between 1994-1998. Genotype data were obtained through dbGaP (Affymetrix 500k; phg000006.v7). See S6 Table for the details of each cohort.
We examined the following four OSA-related traits in the discovery and replication cohorts: minimum and average oxygen saturation (SpO 2 ), apnea-hypopnea index (AHI) and event duration. Briefly, the minimum and average SpO 2 were calculated from oximetry-based SpO 2 measurements over the entire recorded sleep interval excluding occasional waking periods. AHI was scored by counting the number of episodes of complete (apnea) or partial (hypopneas) airflow reduction associated with � 3% desaturation per hour of sleep. The event duration was measured for the average length of apneas and hypopneas, from the nadir of the first reduced breath to the nadir following the last reduced breath (in seconds). The full description of phenotyping protocols is present in the original studies which first reported their genetic analysis in the context of OSA [24][25][26]. We rank-normalized all OSA traits, separately in each cohort, in order to obtain normally distributed phenotypes.

Clinical trait data
For clinical traits, we used GWAS summary statistics calculated for various traits in the UK Biobank [56,94], blood cell-related phenotypes in a general UK population [55] and cardiometabolic phenotypes in individuals of European ancestry [58]. There is no sample overlap between clinical trait GWAS data and our discovery or replication cohorts. The full list of clinical traits is shown in S2 Table. The GWAS summary statistics for UK Biobank traits and blood cell counts were downloaded from their websites. The summary statistics of cardio-metabolic traits from [58] were obtained directly from the authors.

Identifying pleiotropic variants affecting both clinical traits and OSA
We applied Joint Likelihood Mapping (JLIM version 2.0) [39] to test whether the association signals of clinical and OSA traits were driven by a shared genetic effect. We ran JLIM only on the loci in which there was strong evidence of association to a clinical trait (genome-wide significant) and a suggestive association to the OSA trait (p < 0.01 at any SNP in the locus). In these loci, JLIM compares the likelihood of observed association signals under the following three competing possibilities: the OSA trait has no causative variant in the locus ("H 0 "), the same OSA causative variant is shared with the clinical trait ("H 1 "), and the OSA causative variant is distinct from the clinical trait causative variant ("H 2 "), as shown in S1 Fig. Since underlying association data have the limited genetic resolution, we test for modified hypotheses H y 1 and H y 2 instead of H 1 and H 2 . H y 1 represents that the causative variants of two traits are identical or in high LD (r 2 >θ), and H y 2 assumes that the causative variants are separated by the LD threshold (r 2 <θ). In our simulation of underpowered GWAS (see below for the details), only a small fraction (9.0%) of loci had lead SNPs in tight LD (r 2 >0.8) with causative SNPs; only 15.7% of loci had the modest LD of r 2 >0.5 between lead and causative SNPs. This contrasts with well-powered GWAS data, for which, in 91.4% of simulated loci, lead SNPs were in LD of r 2 >0.8 with causative SNPs. To account for this limited statistical resolution of underpowered GWAS data, we lowered θ to 0.5 from the default of 0.8 in this study.
JLIM calculates the ratio between the likelihood of the data under H y 1 compared to that under H y 2 and evaluates the significance of this statistic by permuting the phenotypes simulating the lack of causal effect under H 0 . The false positives due to H y 2 are indirectly controlled by the asymptotically conservative behavior of JLIM: with sufficiently large effect sizes or sample sizes, the cumulative distribution function of JLIM statistic shifts lower under H y 2 than under H 0 as previously shown analytically [39]. Thus, under the asymptotic condition, JLIM guarantees that the p-value estimated under H 0 can be used to control for H y 1 as well. However, under non-asymptotic conditions, in particular in case of underpowered cohorts of limited sample sizes, the power to distinguish H y 1 from H y 2 will be diminished. This reduced specificity is more pronounced when the LD between distinct causative variants for two traits is substantial (See Supplements of [39] for the detail). JLIM assumes that only up to one causative variant is present for each trait in a locus. However, simulations showed that the accuracy of JLIM remained robust in the presence of multiple causative variants in a locus (S12 Fig).
To run JLIM, we used the genetic association statistics of OSA traits calculated over all common SNPs in a 200kb analysis window around the focal SNP (the lead SNP of a clinical trait). We derived these statistics from our discovery cohorts by combining association signals of each cohort using an inverse-variance weighted meta-analysis approach. The association statistics were calculated in individual cohorts by linear regression adjusting for age, sex, BMI and the top three principal components. The principal components were calculated from genome-wide genotype data in each cohort separately. We used mean imputation for missing covariate values. Multi-allelic SNPs and rare variants with minor allele frequencies (MAF) below 0.05 were excluded from the analysis. We only used variants present in all of the discovery cohorts. To reflect the limited resolution for fine-mapping of causative variants in underpowered studies, we relaxed JLIM's genetic resolution parameter θ to 0.5 from the default value of 0.8. All other parameters were set to the default. We ran JLIM only in the discovery cohorts. JLIM 2.0 requires individual-level genetic data for underpowered traits in order to run permutations. This limits the applicability of our approach to more general cases where only summary-level data are available. The direct permutation on individual-level genetic data, however, enables robust estimation of p-values even under the vagaries of imputation noise across cohorts for underpowered traits. For the permutation procedure, we used the same pipeline described above to generate permuted association statistics for JLIM. OSA phenotypes were randomly shuffled in each cohort separately. For each permutation, the association statistics were calculated in the same way including all the covariates. Then, the cohort-level association statistics calculated on permuted data were combined across cohorts by meta-analysis. This permutation procedure was repeated up to 100,000 times, adaptively, to estimate JLIM p-values.
We accounted for the multiple testing burden with the False Discovery Rate (FDR), separately for each clinical and OSA trait combination. Specifically, we used the Benjamini-Hochberg procedure to calculate the FDR as follows: where u and v indicate a combination of clinical and OSA traits, JLIM u,v (l) is the JLIM p-value of tested locus l, and N u,v is the number of loci tested between the trait pairs u and v. JLIM hits were obtained at the FDR threshold of 0.2. The JLIM/cFDR consensus was defined by the intersection between the JLIM (FDR < 0.2) and cFDR hits defined at the same p-value threshold. Specifically, given the JLIM p-value cutoff of FDR 0.2, i.e., p 0.2 such that FDR u,v (p 0.2 ) = 0.2, the cFDR hits were defined by the association p-value below p 0.2 for an OSA trait. Note that the p-value of association to a clinical trait is always ascertained to be less than 5 x 10 −8 in our analysis. The cFDR association p-values were examined at the focal SNP.

Comparison with eCAVIAR
eCAVIAR (version 2.2) was run in the default setting. The reference LD matrix was set to the LD estimated from the subjects of European ancestry (n = 10,000) randomly subsampled out of our OSA discovery cohort. Colocalization Posterior Probability (CLPP) was calculated for lead SNPs of well-powered clinical traits to identify pleiotropic loci. The CLPP thresholds were calibrated to be equivalent to p-value cutoffs using unfiltered null simulation datasets (H 0 ), which were not preconditioned on the minimum association p-values in loci.

Replication of OSA associations in independent samples
We validated the OSA associations identified in the discovery cohorts by replicating them in out-of-sample multi-ethnic replication cohorts (S6 Table). There was no sample overlap between our discovery and replication cohorts. We combined the p-values of associations across the six replication cohorts by applying an inverse variance-weighted meta-analysis technique. We defined the p-value of association < 0.05 as nominal evidence of replication and the p-value < 0.05/61 as a more stringent Bonferroni-corrected replication cutoff, given that 61 independent SNPs were uncovered for their pleiotropic associations to OSA in the discovery cohort (Tables 1 and S5).

Comparison of nominal replication rates with random SNPs
To assess the sensitivity of our pleiotropy analysis, we compared the nominal replication rate of random SNPs with that of candidate pleiotropic variants identified in the discovery cohort. We started with 87,938 SNPs randomly selected across the autosomes, excluding chromosome 6 to avoid the major histocompatibility complex locus. Then, we used Plink to apply LD pruning on the random set of SNPs and obtained a subset of 28,558 independent SNPs. The LD pruning procedure ensured that the r 2 between SNPs was less than 0.1 in the distance of 100kb in the LD background of non-Finnish Europeans (n = 404) from the 1000 Genomes Project. For the fair comparison between random and predicted pleiotropic SNPs, we further filtered these random SNPs based on the OSA association p-values in 200kb windows. Overall, we kept only 11,910 random independent SNPs that have an AHI-associated SNP (p < 0.01) within 100kb distance. On all loci examined for the pleiotropy, we similarly applied the LD pruning procedure on the focal SNPs to obtain a subset of independent SNPs. During the pruning steps, SNPs with more significant JLIM p-values were preferentially retained. After the LD pruning, we obtained 995, 1,024, 967 and 961 independent SNPs for AHI, average SpO 2 , minimum SpO 2 and event duration, respectively; of which 20, 12, 6 and 11 SNPs are predicted to be pleiotropic in the discovery cohort.

Identifying pleiotropic variants affecting both gene expressions and OSA
We used cis-expression quantitative trait loci (eQTLs) from the Gene-Tissue Expression project (GTEx release v8) [63] and BLUEPRINT epigenome project [64], to examine pleiotropy between the variation in gene expression levels and OSA phenotypes. Among the GTEx datasets, we only considered liver (n = 178), spleen (n = 179), skeletal muscle (n = 588) and lung (n = 436) tissues for our analysis, based on the potential relevance of these tissues to OSA and their sample sizes. Again, we used eQTLs calculated only with samples of European ancestry for this analysis. The genome-wide summary statistics of European American eQTLs were obtained from the GTEx Consortium. For the analysis of immune cell eQTLs, we used BLUE-PRINT datasets which consisted of genotypes of participants and expression profiles of CD14 + monocytes (n = 191), neutrophils (n = 196) and CD4 + T cells (n = 167). The RNA transcripts of BLUEPRINT samples were derived from unstimulated primary cells collected from healthy individuals of European ancestry. The genome-wide summary statistics of BLUEPRINT eQTLs were downloaded from the eQTL Catalogue [95].
We start with the 61 index SNPs corresponding to the candidate pleiotropic loci identified by JLIM FDR < 0.2 and cFDR consensus (S5 Table). Using GTEx and BLUEPRINT eQTLs, we scanned for pleiotropy between eQTLs and clinical traits. We considered all protein-coding genes whose transcription start sites (TSS) were less than 1Mb away from the focal SNP of a clinical trait (1,011 unique genes; 5,690 eQTLs). The protein-coding genes were defined by Ensembl annotation (release 104). The genes with eQTL association p-value > 0.05 at all SNPs in the locus were excluded due to weak evidence of association to gene expression. Overall, a total of 5,860 clinical traits/eQTLs pairs (1,009 unique genes; 5,023 eQTLs) were tested for the pleiotropy using JLIM in the default setting (Table 2). JLIM version 2.5 was used to test for pleiotropy only using summary-level eQTL data that are publicly available. The p-values were calculated by adaptive resampling (up to 10,000 iterations). The null distribution of JLIM statistic was generated by random sampling of phenotypes from a normal distribution, sampling of genotypes from the reference LD panel (1000 Genomes, non-Finnish Europeans), and then linearly regressing the sampled phenotypes on the genotypes. Multiallelic SNPs and rare variants with MAF < 0.05 were excluded from the analysis. The multiple testing burden was accounted for with the False Discovery Rate (FDR) by applying the Benjamini-Hochberg procedure separately to each tissue/cell type and clinical trait combination.

Using gene expression as clinical traits
In three loci with known genome-wide associations to OSA traits, we tested for pleiotropy between gene expression and OSA traits using cis-eQTLs as clinical traits. We examined four GTEx tissues and three BLUEPRINT immune cell types, similarly to the above causal chain analysis. There are a total 57 eQTLs in the three loci satisfying: 1) the eQTL p-values < 5 x 10 −8 for a SNP within the 200kb window centered at the known OSA-associated SNP, and 2) the transcription start sites being within 1 Mb from the OSA-associated SNP. Using these 57 eQTLs as clinical traits, we applied JLIM in the default setting (S7 Table). For the OSA traits, we used the association statistics from our discovery cohorts. The JLIM FDR was calculated using the Benjamini-Hochberg procedure (a total of 57 tests).

Simulated datasets
To compare the accuracy of JLIM to other methods, we simulated genetic loci with pleiotropic associations under different scenarios with unbalanced sample sizes. For one of two traits, we simulated a well-powered GWAS of a quantitative trait with a sample size of 150,000. For the other trait, we simulated an underpowered GWAS with a much smaller sample size of 10,000. To generate datasets of realistic LD backgrounds, we used real genotypes of 80 randomly picked loci across the genome. Each locus was 200kb in length. Chromosome 6 and the sex chromosomes were excluded from our simulations due to the difference in LD patterns from the rest of genome.
In these loci, we generated 80,000 sets of association statistics under each of the following six scenarios (H 0 , H 1 and four configurations of H 2 ). For the simulations of H 0 , the SNP in the midpoint of the genomic segment was chosen as the causative variant for well-powered traits, but no causal effect was simulated for underpowered traits. For H 1 , the midpoint SNP was taken as the shared causative variant for both well-powered and underpowered traits. For H 2 , the midpoint SNP was selected to be causal for well-powered traits, but a distinct SNP was selected as the causative variant for underpowered traits. The distinct SNP was randomly chosen in the intervals of LD relative to the midpoint SNP, selecting one in each of the following LD ranges: |r| < 0.3, 0.3 < |r| < 0.6, 0.6 < |r| < 0.8 and 0.8 < |r| < 0.95.
For each pair of causative variants, we randomly sampled the genetic effect sizes (β 1 , β 2 ), corresponding to well-powered and underpowered traits, respectively, from the following bivariate normal distribution: where s 2 1 and s 2 2 are per-SNP heritability of two traits, set to 5 x 10 −5 under the assumptions of the heritability of 0.5, causal fraction of 0.01 and 1,000,000 independent markers across the genome. The parameter ρ was used to represent the correlation of causal effect sizes between two traits. For H 1 , we generated simulated data under no correlation (ρ = 0) as well as moderate to high correlation (ρ = 0.5, 0.7 or 0.9). For H 2 , we assumed ρ = 0 since the causative variants are not shared. For H 0 , β 2 was set to 0.
JLIM only requires GWAS summary statistics for the well-powered trait. Therefore, for all SNPs j = 1,. . .,m in each locus, we generated summary statistics by sampling the observed association statistics z = (z 1 ,. . .,z j ,. . .,z m ) from the following multivariate normal distribution [52]: where N 1 is the GWAS sample size (150,000), m is the number of markers in the locus, D is the m x m local LD matrix, and β is the m-dimensional vector of true causal effects (on standardized genotype values) of all SNPs in the locus. Here, β was set to β 1 at the causative SNP and to 0 at all other SNPs. The p-values of association were calculated from the z scores as follows: where z j is the association statistic at SNP j and F is the standard normal cumulative distribution function. JLIM 2.0 requires individual-level genotype data as well as association statistics for the second trait. Therefore, for the underpowered trait, we generated a discovery cohort with simulated phenotypes for all individuals; then, we calculated the association statistics by linear regression between genotypes and phenotypes, instead of sampling the summary statistics from a multivariate normal distribution. The genotype data of 10,000 individuals were obtained by subsampling from six cohorts of European Ancestry (MESA, ARIC, MrOS, CHS, CFS, FHS and WASHS). The phenotype value y i of each individual i was generated by random sampling from the following standard normal distribution: where x i is the genotype of the causative variant in the individual i (standardized to the mean of 0 and the variance of 1). The association statistics and p-values of association were obtained by regressing the phenotypes y i on the genotypes of each SNP in the locus. We excluded multiallelic sites and variants with MAF < 0.05 from the analysis.
The above process generated up to 480,000 sets of association data under scenarios of H 0 , H 1 and H 2 in the genetic background of 80 loci (i.e., 80 x 6 x 1,000). Because of sampling noise, in a minority of simulations, association data of some loci failed to pass the genome-wide significance threshold for well-powered traits. We rejected such simulation runs to mimic our data analysis, where we only included loci in which the clinical trait association p-value was genome-wide significant. Similarly, we excluded instances where there exists no SNP with the p-value of association < 0.01 for underpowered traits. In total, we retained 25,958, 31,858, 28,602, 29,131, 27,025 and 25,115 sets of association data for H 0 , H 1 , and H 2 of distinct causative variants in low to high LD, respectively.
Next, we estimated the distribution of LD between random pairs of SNPs in the 80 simulated loci and titrated the causative variants of H 2 simulations to follow this distribution. Specifically, we subsampled H 2 datasets to fit to the following composition: 72% for |r| < 0.3, 19% for 0.3 < |r| < 0.6, 5% for 0.6 < |r| < 0.8 and 4% for 0.8 < |r| < 0.95 (S2C Fig). After this, we generated GWAS datasets of T loci by random subsampling of H 0 , H 1 and H 2 . T represents the total number of GWAS loci available from well-powered traits and varied from 500 to 5,000. The proportion of H 0 out of T loci varied from 0 to 30%. The relative ratios of loci simulating H 1 and H 2 were set to 1:19, 1:9, 1:4 or 1:0.
To assess the power to replicate the pleiotropic associations in independent validation cohorts, we calculated the expected association statistic z v j for the underpowered trait at the focal SNP j: where the focal SNP j was defined by the lowest p-value of association to the well-powered trait, N v is the sample size of validation cohort for the underpowered trait, ranging from 10,000 to 35,000, β 2 is the effect size for the underpowered trait at its causative SNP j 0 , and r jj 0 is the LD between two SNPs j and j 0 . For the LD backgrounds of validation cohorts, we used CEU and YRI from the 1000 Genomes Project to represent ancestry-matched and trans-ethnic validation cohorts, respectively. The expected number of replicated signals after Bonferroni correction was calculated by estimating: where M is the number of pleiotropic associations found in each subsampled GWAS dataset of T loci.
Last, we also evaluated the accuracy of our method under the presence of multiple causative variants that were shared between two traits in H 1 . In addition to the midpoint SNP, we randomly picked another SNP in the locus as a second shared causative variant. The effect sizes of the two causative variants for well-powered and underpowered traits were sampled from the same bivariate normal distribution. We generated GWAS datasets for the well-powered and underpowered traits in a similar manner as the simulations of single causative variants. The correlation of effect sizes between well-powered and underpowered traits (ρ) was set to 0.7 for H 1 . The proportion of H 0 was assumed to be 30% of examined GWAS loci (2,500). The relative proportion of H 1 and H 2 was set to 1:19. The proportion of loci with two causative variants was set to 1/4 of all H 1 as expected under Poisson distribution with the causal fraction of 0.01.

Simulated datasets for the meta-analysis of more than two traits
GWAS data were simulated for ten well-powered and one underpowered traits under H 0 , H 1 and H 2 (S15A Fig). One of the well-powered traits was ascertained to have a genome-wide significant association (the main well-powered trait). Here, we simulated association statistics only for the causative SNP of the main well-powered trait rather than simulating for entire SNPs in the locus. For simplicity of simulation, we assumed that the lead SNP of the main well-powered trait is in tight LD with its causative variant. The rest of well-powered traits have the same or distinct causative variants from the main well-powered trait. The number of additional well-powered traits simulating the same causative variants was randomly decided by sampling from a binomial distribution Binom n ¼ 9; p ¼ 1 20 À � , and the rest were assumed to have distinct causative variants. For underpowered traits, the causative variant was assumed to be absent, same or distinct from the main well-powered trait depending on whether the locus was simulated under H 0 , H 1 or H 2 , respectively. Effect sizes of all traits sharing the same causative variant were sampled together from a multivariate normal distribution with the correlation of effect sizes between traits set to 0.7. Effect sizes of traits simulating distinct causative variants were sampled independently and then multiplied by a random variable representing the LD between causative and tested SNPs. The association statistics were generated with the sample sizes of n = 150,000 for well-powered traits and n = 10,000 for underpowered traits. In total, 3,751, 613 and 6,229 sets of simulated association statistics were generated under H 0 , H 1 and H 2 , respectively. From these simulated H 0 , H 1 and H 2 datasets, GWAS datasets of 2,500 loci were generated by random subsampling at the proportions of 30%, 3.5% and 66.5%, respectively (H 1 :H 2 ratio of 1:19). We evaluated the power to replicate the candidate pleiotropic loci using the subsampled GWAS datasets (1,000 iterations). The power to replicate after Bonferroni correction was estimated in the same way as other simulations. The validation cohort were assumed to be of the same genetic ancestry as the discovery cohort, and the sample sizes varied from n = 10,000 to 35,000. For pairwise analyses, we used only the main well-powered trait and underpowered trait. On the other hand, for full multi-trait analyses, we used the entire data including all ten well-powered traits and one underpowered trait.

Meta-analysis methods
We benchmarked the power to replicate pleiotropic associations in underpowered cohorts by applying three meta-analysis methods-MetABF, CPBayes and iGWAS-on simulated datasets. We ran MetABF in the subset-exhaustive mode to scan all possible subsets of pleiotropic traits. The parameter for the correlation of effect sizes was set to the known value of simulation (ρ = 0.7). MetABF was run three times with the scale parameter of causal effect sizes set to 0.1, 0.2 and 0.4, and the Approximate Bayes Factors (ABF) were averaged over the three runs as recommended by the authors. MetABF reports the ABF relative to H 0 , thus we calculated the ABF of pleiotropy to the underpowered trait relative to that of no pleiotropy to the underpowered trait by summing ABFs over subsets. In contrast, we ran CPBayes in the fully automatic setting. All parameters of the pleiotropy were directly learned from the data. We used the PPAj (Posterior Probability of Association) of the underpowered trait to identify pleiotropic associations. In comparison to the two Bayesian methods, iGWAS is a non-parametric meta-analysis test. We ran iGWAS in the default setting. For all three methods, candidate pleiotropic associations were identified in the discovery cohort at the same p-value cutoff of 0.01. For the Bayesian methods, we applied the equivalent ABF and PPAj thresholds by calibrating them to have the false positive rate of 0.01 in our null simulation (H 0 ).  . (B-F) The number of loci replicated in a validation cohort, subdivided by the configuration of causative variants: (B) H 1 , H 2 with the LD between causative variants to be in the ranges of (C) |r| < 0.3, (D) 0.3 < |r| < 0.6, (E) 0.6 < |r| < 0.8 and (F) 0.8 < |r| < 0.95. In all panels, simulation was conducted under the following parameters: A total of 2,500 association peaks from well-powered GWAS studies (n = 150,000) were tested for pleiotropy in simulated discovery cohorts (n = 10,000), and then the candidate pleiotropic loci were tested for replication in simulated validation cohorts of the same genetic ancestry (n = 10,000-35,000). The candidate loci were identified by conditional false discovery rate (cFDR), eCAVIAR, Joint Likelihood Mapping (JLIM), or the intersection of eCAVIAR or JLIM and cFDR, all at the p-value cutoff of 0.01 (or equivalent posterior cutoff). The 2,500 GWAS peaks consist of the loci simulating no causal effect for underpowered traits (H 0 ) and those simulating the same causal effect between two traits (H 1 ) or distinct causal effects (H 2 ). The proportion of H 0 was set to 30%, and the remaining 70% of loci were split to H 1 and H 2 at the ratio of 1:19. The effect sizes of causative variants are correlated (ρ = 0.7) under H 1 but uncorrelated under H 2 . Bonferroni correction was applied on replication tests. The shaded area denotes the 95% CIs. (TIF)

S5 Fig. The projected number of replicated loci in simulations, varying the relative ratio between H 1 and H 2 .
A total of 2,500 association peaks from well-powered GWAS studies (n = 150,000) were tested for pleiotropy in a discovery cohort (n = 10,000), and then the candidate pleiotropic loci were tested for replication in an independent validation cohort of the same genetic ancestry (n = 10,000-35,000). The candidate loci were identified by conditional false discovery rate (cFDR), Joint Likelihood Mapping (JLIM), or the intersection of both, all at the p-value cutoff of 0.01. The 2,500 GWAS peaks consist of the loci simulating no causal effect for underpowered traits (H 0 ) and those simulating the same causal effect between two traits (H 1 ) or distinct causal effects (H 2 ). The proportion of H 0 was set to 30%, and the remaining 70% of loci were split to H 1  were tested for pleiotropy in a discovery cohort (n = 10,000), and then the candidate pleiotropic loci were tested for replication in an independent validation cohort of the same genetic ancestry (n = 10,000-35,000). The candidate loci were identified by conditional false discovery rate (cFDR), Joint Likelihood Mapping (JLIM), or the intersection of both, all at the p-value cutoff of 0.01. The 2,500 GWAS peaks consist of the loci simulating no causal effect for underpowered traits (H 0 ) and those simulating the same causal effect between two traits (H 1 ) or distinct causal effects (H 2 ). The proportion of H 0 was varied to (A) 0%, (B) 10%, (C) 20% and (D) 30%, and the remaining loci were split to H 1 and H 2 at the ratio of 1:19. The effect sizes of causative variants are correlated (ρ = 0.7) under H 1 but uncorrelated under H 2 . Bonferroni correction was applied on replication tests. The shaded area denotes the 95% CIs. (TIF) S7 Fig. The projected number of replicated loci in simulations where the cFDR threshold was tightened to identify the same number of candidate loci as the JLIM/cFDR consensus method in a discovery cohort. A total of 2,500 association peaks from well-powered GWAS studies (n = 150,000) were tested for pleiotropy in a discovery cohort (n = 10,000), and then the candidate pleiotropic loci were tested for replication in an independent validation cohort of the same genetic ancestry (n = 10,000-35,000). The candidate loci were identified by conditional false discovery rate (cFDR) or by the JLIM/cFDR consensus method. (A) The consensus method (red line) selected 12.9 candidate pleiotropic loci by taking the intersection between JLIM p < 0.01 and cFDR (association p < 0.01 for an underpowered trait). For the comparison, we tightened cFDR threshold to underpowered trait assoc p < 0.0012 so to identify the same number of candidate loci in a discovery cohort (black line). (B) Similarly, the consensus method (red line) found 7.5 candidate loci by taking the intersection between JLIM p < 0.005 and cFDR (assoc p < 0.005 for underpowered trait). The cFDR threshold was tightened to underpowered trait assoc p < 0.00060 for the same number of candidates (black line). In both panels, the 2,500 GWAS peaks consist of the loci simulating no causal effect for underpowered traits (H 0 ) and those simulating the same causal effect between two traits (H 1 ) or distinct causal effects (H 2 ). The proportion of H 0 was set to 30%, and the remaining 70% of loci were split to H 1  A total of 2,500 association peaks from well-powered GWAS studies (n = 150,000) were tested for pleiotropy in a discovery cohort (n = 10,000), and then the candidate pleiotropic loci were tested for replication in an independent validation cohort of the same genetic ancestry (n = 10,000-35,000). The candidate loci were identified by conditional false discovery rate (cFDR) or by the JLIM/cFDR consensus method. (A) The consensus method (red line), by taking the intersection between JLIM p < 0.01 and cFDR p < 0.01, showed the empirical false positive rate of 0.0038 in simulated H 0 dataset. To match this false positive rate, we tightened cFDR threshold to p < 0.0038 (black line). (B) Similarly, the consensus method (red line) showed the empirical false positive rate of 0.0018 in H 0 when the intersection was taken at JLIM and cFDR p < 0.005. To match the false positive rate, the cFDR threshold was tightened to p < 0.0018 (black line). The cFDR p-value refers to the p-value of association to an underpowered trait. In both panels, the 2,500 GWAS peaks consist of the loci simulating no causal effect for underpowered traits (H 0 ) and those simulating the same causal effect between two traits (H 1 ) or distinct causal effects (H 2 ). The proportion of H 0 was set to 30%, and the remaining 70% of loci were split to H 1 and H 2 at the ratio of 1:19. The effect sizes of causative variants are correlated (ρ = 0.7) under H 1 but uncorrelated under H 2 . Bonferroni correction was applied on replication tests. The shaded area denotes the 95% CIs.  total of (A) 500, (B) 1,000, (C) 2,500 and (D) 5,000 association peaks from well-powered GWAS studies (n = 150,000) were tested for pleiotropy in a discovery cohort (n = 10,000), and then the candidate pleiotropic loci were tested for replication in an independent validation cohort of the same genetic ancestry (n = 10,000-35,000). The candidate loci were identified by conditional false discovery rate (cFDR) or by the JLIM/cFDR consensus method. The consensus method (red line), by taking the intersection between JLIM p < 0.01 and cFDR p < 0.01, showed the empirical false positive rate of 0.0038 in simulated H 0 dataset. To match this false positive rate, we tightened cFDR threshold to p < 0.0038 (black line). The cFDR p-value refers to the p-value of association to an underpowered trait. The GWAS peaks consist of the loci simulating no causal effect for underpowered traits (H 0 ) and those simulating the same causal effect between two traits (H 1 ) or distinct causal effects (H 2 ). The proportion of H 0 was set to 30%, and the remaining 70% of loci were split to H 1 and H 2 at the ratio of 1:19. The effect sizes of causative variants are correlated (ρ = 0.7) under H 1 but uncorrelated under H 2 . Bonferroni correction was applied on replication tests. The shaded area denotes the 95% CIs. (TIF) S10 Fig. The projected number of replicated loci in simulations where the correlation of effect sizes for H 1 varied. A total of 2,500 association peaks from well-powered GWAS studies (n = 150,000) were tested for pleiotropy in a discovery cohort (n = 10,000), and then the candidate pleiotropic loci were tested for replication in an independent validation cohort of the same genetic ancestry (n = 10,000-35,000). The candidate loci were identified by conditional false discovery rate (cFDR) or by the JLIM/cFDR consensus method. The consensus method (red line), by taking the intersection between JLIM p < 0.01 and cFDR p < 0.01, showed the empirical false positive rate of 0.0038 in simulated H 0 dataset. To match this false positive rate, we tightened cFDR threshold to p < 0.0038 (black line). The cFDR p-value refers to the pvalue of association to an underpowered trait. The 2,500 GWAS peaks consist of the loci simulating no causal effect for underpowered traits (H 0 ) and those simulating the same causal effect between two traits (H 1 ) or distinct causal effects (H 2 ). The proportion of H 0 was set to 30%, and the remaining 70% of loci were split to H 1  A total of 2,500 association peaks from well-powered GWAS studies (n = 150,000) were tested for pleiotropy in a discovery cohort (n = 10,000), and then the candidate pleiotropic loci were tested for replication in an independent validation cohort of (A) the same (CEU) and (B) different (YRI) genetic ancestry (n = 10,000-35,000). The LD patterns of genetic ancestry were obtained from the 1000 Genomes Project data. In both panels, the candidate loci were identified by conditional false discovery rate (cFDR) or by the JLIM/cFDR consensus method. The consensus method (red line), by taking the intersection between JLIM p < 0.01 and cFDR p < 0.01, showed the empirical false positive rate of 0.0038 in simulated H 0 dataset. To match this false positive rate, we tightened cFDR threshold to p < 0.0038 (black line). The cFDR pvalue refers to the p-value of association to an underpowered trait. The 2,500 GWAS peaks consist of the loci simulating no causal effect for underpowered traits (H 0 ) and those simulating the same causal effect between two traits (H 1 ) or distinct causal effects (H 2 ). The proportion of H 0 was set to 30%, and the remaining 70% of loci were split to H 1 and H 2 at the ratio of 1:19. The effect sizes of causative variants are correlated with ρ = 0.7 under H 1 but uncorrelated under H 2 . Bonferroni correction was applied on replication tests. The shaded area denotes the 95% CIs. (TIF) S12 Fig. The projected number of replicated loci in simulations of multiple causative variants for H 1 . A total of 2,500 association peaks from well-powered GWAS studies (n = 150,000) were tested for pleiotropy in a discovery cohort (n = 10,000), and then the candidate pleiotropic loci were tested for replication in an independent validation cohort of the same genetic ancestry (n = 10,000-35,000). The candidate loci were identified by conditional false discovery rate (cFDR) or by the JLIM/cFDR consensus method. The consensus method (red line), by taking the intersection between JLIM p < 0.01 and cFDR p < 0.01, showed the empirical false positive rate of 0.0038 in simulated H 0 dataset. To match this false positive rate, we tightened cFDR threshold to p < 0.0038 (black line). The cFDR p-value refers to the pvalue of association to an underpowered trait. The 2,500 GWAS peaks consist of the loci simulating no causal effect for underpowered traits (H 0 ) and those simulating the same causal effect between two traits (H 1 ) or distinct causal effects (H 2 ). The proportion of H 0 was set to 30%, and the remaining 70% of loci were split to H 1 and H 2 at the ratio of 1:19. In Panel (A), only one causative variant was simulated for H 1 , whereas in Panel (B), up to two causative variants were simulated for H 1 . The proportion of loci with two causative variants was set to 1/4 of all H 1 as expected under Poisson distribution with the causal fraction of 0.01. The effect sizes of causative variants are correlated with ρ = 0.7 under H 1 but uncorrelated under H 2 . Bonferroni correction was applied on replication tests. The shaded area denotes the 95% CIs. (TIF) S13 Fig. Comparison with Bayesian meta-analysis methods with a validation cohort from the same ancestry. A total of 2,500 association peaks from well-powered GWAS studies (n = 150,000) were tested for pleiotropy in a discovery cohort (n = 10,000), and then the candidate pleiotropic loci were tested for replication in an independent validation cohort of the same genetic ancestry (n = 10,000-35,000). The candidate loci were identified by the JLIM/ cFDR consensus method, MetABF and CPBayes. The consensus method (red line), by taking the intersection between JLIM p < 0.01 and cFDR p < 0.01, showed the empirical false positive rate of 0.0038 in simulated H 0 dataset. Bayesian posterior thresholds for MetABF (green) and CPBayes (purple) calibrated using H 0 loci to match the false positive rate of the consensus method. The 2,500 GWAS peaks consist of the loci simulating no causal effect for underpowered traits (H 0 ) and those simulating the same causal effect between two traits (H 1 ) or distinct causal effects (H 2 ). The proportion of H 0 varied to (A) 0%, (B) 10%, (C) 20% and (D) 30%, and the remaining loci were split to H 1 and H 2 at the ratio of 1:19. In all panels, the effect sizes of causative variants are correlated with ρ = 0.7 under H 1 but uncorrelated under H 2 . Bonferroni correction was applied on replication tests. The shaded area denotes the 95% CIs. In (E), the fraction of H 1 among replicated loci is compared among three methods when the proportion of H 0 is 30% (similar for different proportions of H 0 ). (TIF) S14 Fig. Comparison with Bayesian meta-analysis methods with a validation cohort from the different ancestry (YRI). A total of 2,500 association peaks from well-powered GWAS studies (n = 150,000) were tested for pleiotropy in a discovery cohort (n = 10,000), and then the candidate pleiotropic loci were tested for replication in an independent validation cohort of the different genetic ancestry (YRI; n = 10,000-35,000). The candidate loci were identified by the JLIM/cFDR consensus method, MetABF and CPBayes. The consensus method (red line), by taking the intersection between JLIM p < 0.01 and cFDR p < 0.01, showed the empirical false positive rate of 0.0038 in simulated H 0 dataset. Bayesian posterior thresholds for MetABF (green) and CPBayes (purple) calibrated using H 0 loci to match the false positive rate of the consensus method. The 2,500 GWAS peaks consist of the loci simulating no causal effect for underpowered traits (H 0 ) and those simulating the same causal effect between two traits (H 1 ) or distinct causal effects (H 2 ). The proportion of H 0 varied to (A) 0%, (B) 10%, (C) 20% and (D) 30%, and the remaining loci were split to H 1 and H 2 at the ratio of 1:19. In all panels, the effect sizes of causative variants are correlated with ρ = 0.7 under H 1 but uncorrelated under H 2 . Bonferroni correction was applied on replication tests. The shaded area denotes the 95% CIs. In (E), the fraction of H 1 among replicated loci is compared among three methods when the proportion of H 0 is 30% (similar for different proportions of H 0 ). (TIF) S15 Fig. Simulated power of multi-trait meta-analysis compared to pairwise analysis. Ten well-powered traits and one underpowered trait were simulated under H 0 , H 1 and H 2 . One of the well-powered traits is ascertained to have a genome-wide significant association peak, and the rest of well-powered traits were randomly decided to have the same or distinct causative variants by sampling from a binomial distribution Binom(n = 9, p = 1/20). The underpowered trait harbors no causative variant, the same causative variant or distinct causative variant depending on whether it is simulating H 0 , H 1 or H 2 . Effect sizes of all traits sharing the same causative variant were sampled together from a multivariate normal distribution with the correlation parameter of 0.7. Effect sizes of traits simulating distinct causative variants were sampled independently. GWAS association statistics at the focal SNP were generated with the sample sizes of n = 150,000 for the well-powered traits and n = 10,000 for the underpowered trait. The validation cohort of the same ancestry was simulated with the sample sizes of n = 10,000 to 35,000. From these simulated H 0 , H 1 and H 2 loci, a total of 2,500 GWAS loci were randomly selected at the proportions of 30%, 3.5% and 66.5%, respectively (H 1 :H 2 ratio of 1:19). (B) We applied iGWAS, CPBayes and MetABF on these data. Dashed lines indicate pairwise analyses for which only one genome-wide significant well-powered trait was compared with an underpowered trait. Solid lines indicate multi-trait analyses using the full set of ten well-powered traits and one underpowered trait. Candidate pleiotropic loci were selected for replication analysis at the cutoffs of p-value of 0.01 (iGWAS), or equivalent posterior probability thresholds calibrated in the H 0 dataset (CPBayes and MetABF). Bonferroni correction was applied on replication tests in the validation cohort. (TIF)

S16 Fig. Pleiotropic loci identified by the intersection of JLIM and cFDR are enriched with
SNPs that nominally replicate out of sample. The plot shows the fractions of randomly selected and putative pleiotropic loci with OSA associations that are nominally replicated in the meta-analyzed independent validation cohort (S6 Table). (TIF) S1