Abstract
Genome-wide association studies (GWAS) help to identify disease-linked genetic variants, but pinpointing the most likely causal genes in GWAS loci remains challenging. Existing GWAS gene prioritization tools are powerful but often use complex black box models trained on datasets containing biases. Here, we used a data-driven approach to construct a truth set of causal genes in 200 GWAS loci. We found that a simple logistic regression model performed as well as a more complex XGBoost model, and that many commonly-used gene prioritization features could be removed without meaningfully affecting performance (e.g., expression quantitative trait locus colocalization and Mendelian randomization). We present CALDERA, a gene prioritization tool that uses a logistic regression model and uses just four input features. In independent benchmarking datasets of resolved GWAS loci, CALDERA achieved state-of-the-art performance in comparison with other methods (FLAMES, L2G, and cS2G). CALDERA outputs causal gene probabilities for all genes in a given GWAS locus and we show that these probabilities are well-calibrated. Applying CALDERA to 93 UK Biobank traits, we predicted 11,956 putative causal genes, potentially resolving up to 52% of loci. Overall, CALDERA provides a powerful solution for prioritizing potentially causal genes in GWAS loci that minimizes the data processing required to construct input features and generates an easily-interpretable output score.
Author summary
Genome-wide association studies are a type of genetic study that has identified many genetic mutations involved in disease. However, in most cases the genes that are affected by these mutations are unknown. To predict these “effector genes”, we introduce a new tool called CALDERA. We show that existing tools may be unnecessarily complex: CALDERA achieves state-of-the-art prediction of known effector genes despite using a simpler machine learning model and far fewer input variables. By applying CALDERA to the results of genetic studies for 93 different traits and diseases, we were able to predict the likely effector gene for 52% of all trait- and disease-associated mutations, for a total of 11,956 likely effector genes. By accurately linking mutations to genes, we gain a better understanding of disease biology and uncover potential opportunities to treat disease by manipulating the function of these genes.
Citation: Schipper M, Ulirsch J, Posthuma D, Ripke S, Heilbron K (2026) Simplifying causal gene identification in GWAS loci. PLoS Genet 22(3): e1012079. https://doi.org/10.1371/journal.pgen.1012079
Editor: Sungho Won, Seoul National University, KOREA, REPUBLIC OF
Received: March 6, 2025; Accepted: March 3, 2026; Published: March 17, 2026
Copyright: © 2026 Schipper et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: CALDERA is available as a set of open-source R scripts at https://github.com/kheilbron/caldera. All the credible set and variant-to-gene mapping data for the UK Biobank traits are available at https://www.finucanelab.org/data.
Funding: KH received support for the research of this work from the Alexander von Humboldt Foundation. SR received support for the research of this work from the German Center for Mental Health (DZPG), the European Union’s Horizon program (101057454, “PsychSTRATA”), and The German Research Foundation (402170461, grant “TRR265”). DP and MS received support for the research of this work from The Netherlands Organization for Scientific Research (NWO Gravitation: BRAINSCAPES: A Roadmap from Neurogenetics to Neurobiology - Grant No. 024.004.012). DP received support for the research of this work from The European Research Council (Advanced Grant No ERC-2018-AdG GWAS2FUNC 834057) and the European Union’s Horizon program (964874, “REALMENT”). DP and SR received support for the research of this work from the National Institute Of Mental Health of the National Institutes of Health (Award Number: R01MH124873). The content is the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: MS, DP, and SR have declared that no competing interests exist; JU is an employee of Illumina; KH is a former employee of 23andMe, Inc. and a current employee of Bayer AG.
Introduction
Genome-wide association studies (GWAS) are a valuable tool for identifying associations between diseases and genetic variants. However, many GWAS loci contain multiple genes and the vast majority of GWAS variants do not alter protein coding sequences [1]. A key challenge in using GWAS data is determining which genes are affected by disease-associated variants. Several gene prioritization tools have been developed to identify the most likely effector gene for a given GWAS signal such as Ei [2], FLAMES [3], and L2G [4]. These three tools all model the probability that each gene in a GWAS locus is a causal gene using 1) XGBoost, 2) a truth set of causal and non-causal trait-gene pairs, and 3) a variety of features. The FLAMES study [3] performed a head-to-head comparison of these methods and found that FLAMES outperformed L2G [4] and Ei [2], which in turn outperformed cS2G [5].
There are two main drawbacks to current gene prioritization tools. First, XGBoost models are challenging to interpret. While regression methods assign a single, fixed effect size to each feature, XGBoost uses ensembles of decision trees in which a feature’s contribution depends on the specific combinations and thresholds of other features along the decision paths. As a result, the influence of a given feature is context-dependent and non-linear, making it challenging to isolate and interpret why one gene receives a higher score than another. Second, models need to be trained on a ground truth dataset. Expert-curated causal genes have been shown to be biased toward genes in close proximity to GWAS hits and biased toward genes affected by coding credible set variants [4]. Although some methods try to mitigate this by using a data-driven strategy for constructing ground-truth datasets [3, 5], none have actively corrected for potential sources of bias.
To address these issues we present a novel gene prioritization tool, CALDERA (CALling Disease-RelAted genes). Compared to existing XGBoost-based models, CALDERA uses a simpler, more interpretable logistic regression model and far fewer input features (4 versus >45). CALDERA mitigates bias by using a data-driven truth set. We show that CALDERA achieves state-of-the-art performance when predicting causal genes in gold standard benchmarking datasets that were not used during training.
Results
Study overview
We used a data-driven dataset of gene-trait pairs to develop a simple, yet powerful model for prioritizing genes in GWAS loci (Fig 1). Using three benchmarking sets, we show that CALDERA’s performance is superior or equivalent to other methods, but with a simpler and more interpretable model. Lastly, we performed a literature analysis to validate the top CALDERA predictions in 93 UK Biobank traits.
A data-driven dataset of causal and non-causal gene-trait pairs was used to train the CALDERA model. Performance versus three other methods was tested in three external datasets of known causal and non-causal gene-trait pairs. CALDERA was run on 93 UK Biobank traits and literature validation was performed for the top 20 predictions.
Model development
Defining causal genes.
We constructed a set of putatively causal (and non-causal) trait-gene pairs using SuSiE [6] credible sets for 25 independent (genetic correlation < 20%) UK Biobank traits, similar to Weeks and colleagues [7]. Within a given trait, we defined causal genes as those that were 1) affected by a fine-mapped non-synonymous variant (posterior inclusion probability [PIP] > 50%) and 2) within 300kb of a separate non-coding credible set (no non-synonymous variant PIP > 50%). The PIP threshold was chosen to remain consistent with the study by Weeks and colleagues [7]. The choice of window size is elaborated upon in the Methods (section “Creating a set of causal and non-causal trait gene pairs”), and changes in window size do not meaningfully impact model performance (see the Results section “Sensitivity analyses”, Table 2). We defined non-causal genes as all other genes within 300kb of these non-coding credible sets. This resulted in a set of 200 putatively causal genes and 2,068 putatively non-causal genes across 25 independent traits.
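The labeling rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout (`start`, `end`, `id`), the helper names, and the choice to measure distance as the gap between the credible set interval and the gene body are all assumptions made for illustration. The requirement that exactly one coding gene lies nearby follows the Methods.

```python
WINDOW_BP = 300_000  # window size used for the truth set


def interval_distance(a_start, a_end, b_start, b_end):
    """Gap in bp between two intervals on one chromosome (0 if they overlap)."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def label_genes(credible_set, genes, coding_gene_ids, window_bp=WINDOW_BP):
    """Label genes near a non-coding credible set as causal (True) or not (False).

    Hypothetical inputs: `credible_set` and each gene are dicts with `start`
    and `end`; `coding_gene_ids` holds genes carrying a non-synonymous
    variant with PIP > 50% for the same trait.
    """
    nearby = [
        g for g in genes
        if interval_distance(credible_set["start"], credible_set["end"],
                             g["start"], g["end"]) <= window_bp
    ]
    nearby_coding = [g["id"] for g in nearby if g["id"] in coding_gene_ids]
    if len(nearby_coding) != 1:
        return {}  # keep only credible sets near a single coding gene
    return {g["id"]: g["id"] == nearby_coding[0] for g in nearby}


credible_set = {"start": 1_000_000, "end": 1_050_000}
genes = [
    {"id": "GENE_A", "start": 1_100_000, "end": 1_150_000},  # 50kb away
    {"id": "GENE_B", "start": 1_400_000, "end": 1_420_000},  # 350kb away
    {"id": "GENE_C", "start": 800_000, "end": 900_000},      # 100kb away
]
labels = label_genes(credible_set, genes, coding_gene_ids={"GENE_A"})
```

In this toy locus, GENE_A is labeled causal, GENE_C non-causal, and GENE_B falls outside the window and is excluded.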
Selecting a learning model.
Next, we trained logistic regression and XGBoost models to predict causal gene status using a set of 16 features derived from: distance to GWAS lead variant, non-synonymous variant PIP (all < 50% by definition), number of local genes, activity-by-contact (ABC) [8], enhancer-promoter correlation [9–11], eQTL colocalization [12], promoter capture Hi-C (PCHi-C) [13, 14], summary data-based Mendelian randomization (SMR) [15], transcriptome-wide association studies (TWAS) [16], DEPICT [17], NetWAS [18], and polygenic priority scores (PoPS) [7]. To assess model performance, we trained the models in a leave-one-trait-out cross-validation framework.
Model performance in held-out traits was similar for both logistic regression (Figs 2 and S1, AUPRC = 59.6%, 95% CI = 52.6% to 66.2%) and XGBoost (AUPRC = 58.5%, 95% CI = 51.5% to 65.1%). This suggests an absence of strong feature-feature interactions and non-linear relationships between causal gene status and features (after feature transformation, see Methods). Given the similar performance, we proceeded with the simpler logistic regression model since it is computationally faster to run, does not require hyperparameter tuning, and is easier to interpret (see Introduction).
Full = the full set of 16 gene prioritization features, basic = the basic set of 4 gene prioritization features. Random refers to predictive performance of assigning a random gene in the locus to be the effector gene.
Selecting a set of features.
Applying these models to obtain predictions for a new GWAS of interest requires running a wide range of pipelines to construct the full feature set. We therefore tested the performance of a logistic regression model that used only a basic set of features: distance to GWAS lead variant, non-synonymous variant PIP, number of local genes, and PoPS. Despite the large reduction in the number of features, the performance in held-out traits was similar for both the full feature set (AUPRC = 59.6%, 95% CI = 52.6% to 66.2%) and the basic feature set (AUPRC = 60.0%, 95% CI = 53.1% to 66.6%; Fig 2). We therefore proceeded using the basic feature set.
Assuming a single causal gene per credible set.
Under the assumption that a given credible set will typically affect a single causal gene, we normalized model output probabilities to sum to 100% for each credible set. We refer to this model as CALDERA: a logistic regression model trained on a data-driven set of causal and non-causal genes using a basic set of four features followed by local normalization. We found the AUPRC in held-out traits was greater for CALDERA (AUPRC = 64.1%, 95% CI = 57.2% to 70.5%) than for the same model without local normalization (AUPRC = 60.0%, 95% CI = 53.1% to 66.6%; Fig 2). This shows that the single causal gene assumption increases prediction performance. To help visualize the predicted feature effects, we plotted the model-predicted causal probability across a wide range of actual feature values (S2 Fig).
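As an illustration of local normalization, the following Python sketch rescales raw model probabilities so that they sum to 1 within each credible set. Dividing by the within-set total is assumed here as the simplest way to achieve this. The tuple layout and function name are hypothetical and are not taken from the CALDERA code.

```python
from collections import defaultdict


def normalize_within_credible_set(predictions):
    """Rescale probabilities so they sum to 1 within each credible set.

    `predictions` is a list of (credible_set_id, gene, probability) tuples.
    Logistic regression outputs are strictly positive, so totals are non-zero.
    """
    totals = defaultdict(float)
    for cs_id, _gene, prob in predictions:
        totals[cs_id] += prob
    return [(cs_id, gene, prob / totals[cs_id])
            for cs_id, gene, prob in predictions]


raw = [
    ("cs1", "GENE_A", 0.30),
    ("cs1", "GENE_B", 0.10),
    ("cs2", "GENE_C", 0.05),
    ("cs2", "GENE_D", 0.05),
]
normalized = normalize_within_credible_set(raw)
```

In this toy example, GENE_A rises from 0.30 to 0.75 because it accounts for three quarters of the total probability in its credible set, while the two equally scored genes in the second set each receive 0.5.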
Model performance
Benchmarking performance.
We compared the performance of CALDERA with three other methods (FLAMES [3], L2G [4], and cS2G [5]) in three external gold standard datasets of causal and non-causal trait-gene pairs. CALDERA outperformed all other methods in the Open Targets gold standard dataset (Figs 3, left panel, and S3). Even though L2G was trained on this dataset, AUPRC was greater for CALDERA (AUPRC = 84.4%, 95% CI = 75.8% to 90.4%) than for L2G (AUPRC = 72.9%, 95% CI = 63.1% to 80.8%). In a gold standard dataset derived from burden tests of rare coding variants in the UK Biobank (Fig 3, middle panel), FLAMES outperformed all other methods (AUPRC = 56.0%, 95% CI = 48.1% to 63.7%), followed by CALDERA (AUPRC = 55.4%, 95% CI = 47.4% to 63.1%). Note that FLAMES was trained using UK Biobank rare coding variant burden data, including data from some of the same phenotypes as the test dataset. However, FLAMES was not trained on any of the loci used in the test dataset. Finally, CALDERA outperformed all other methods in a gold standard dataset derived from the serum levels of three molecules that belong to well-characterized biochemical pathways (the “Molecular Traits” dataset; Fig 3, right panel). Combining results from all three external benchmarking datasets, CALDERA (AUPRC = 67.7%, 95% CI = 62.0% to 72.9%) had a significantly larger AUPRC than L2G (AUPRC = 57.6%, 95% CI = 51.7% to 63.3%, P = 0.014) and cS2G (AUPRC = 31.0%, 95% CI = 25.9% to 36.7%, P < 2.2 × 10⁻¹⁶), and was not significantly different from FLAMES (AUPRC = 66.0%, 95% CI = 60.3% to 71.4%, P = 0.679). These results demonstrate that CALDERA achieves state-of-the-art performance.
Calibration.
CALDERA predictions for held-out traits were largely well-calibrated (Brier score = 4.9%), although predictions between approximately 2% and 15% were slightly lenient and predictions between 35% and 55% were slightly conservative (Fig 4). CALDERA predictions were similarly well-calibrated in the three gold standard datasets (S4 Fig), and achieved smaller Brier scores than all other methods in all three datasets (Table 1). Note that Brier scores are not meaningful for FLAMES and cS2G because, unlike CALDERA and L2G, their outputs do not represent easily-interpretable causal gene probabilities.
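For readers unfamiliar with the Brier score, it is the mean squared difference between predicted probabilities and binary outcomes, so smaller values indicate better-calibrated and sharper predictions. The Python sketch below uses made-up toy values, not CALDERA output.

```python
def brier_score(probabilities, labels):
    """Mean squared difference between predicted probabilities and 0/1 labels."""
    if not probabilities or len(probabilities) != len(labels):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - y) ** 2 for p, y in zip(probabilities, labels)]
    return sum(squared_errors) / len(squared_errors)


# Toy example: two causal genes (label 1) and two non-causal genes (label 0).
probs = [0.9, 0.1, 0.2, 0.7]
labels = [1, 0, 0, 1]
score = brier_score(probs, labels)  # (0.01 + 0.01 + 0.04 + 0.09) / 4
```

Because the score treats its inputs as probabilities, it is only informative for methods whose outputs can be read that way, which is why it is reported here for CALDERA and L2G.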
The x-axis represents the model predicted probability in held-out trait data and the y-axis represents the ground truth causal probability. The solid line represents the fitted values from generalized additive models with shaded areas representing 95% confidence intervals. The dashed line represents perfect calibration.
Sensitivity analyses.
We required causal genes to be within 300kb of a non-coding credible set. We repeated our analyses using alternative windows of 100kb, 200kb, and 400kb (Table 2). AUPRC in the three external benchmarking datasets varied little, suggesting that CALDERA is robust to the choice of locus size.
We also assumed a single causal gene for each credible set and therefore normalized model output probabilities to sum to 100%.
Since running PoPS requires more than 57,000 input features, we tested whether it would be possible to further simplify CALDERA by removing PoPS scores. Performance in the CALDERA truth set decreased substantially when removing PoPS scores (AUPRC = 46.9%, 95% CI = 40.1% to 53.9%) or replacing them with similarity-based scores from DEPICT (AUPRC = 49.0%, 95% CI = 42.1% to 55.9%) or NetWAS (AUPRC = 48.6%, 95% CI = 41.8% to 55.5%). We therefore retained PoPS scores in the CALDERA model.
Implementation
We have provided code to run CALDERA using only two input files: a PoPS output file and a file containing credible set information (https://github.com/kheilbron/caldera). There, we have also provided example input files from a recent Parkinson’s disease study [19] and a pre-generated output file to facilitate easy, reproducible use of CALDERA. Using a standard laptop (Apple M2 Pro chip, 10-core CPU, 16GB of RAM), CALDERA processed 93 UK Biobank traits—including 774,014 variants from 22,802 credible sets—in just 29 seconds on a single CPU thread with a peak memory usage of 336MB. We identified 11,956 genes with an estimated causal gene probability > 50%, potentially resolving up to 52% of these GWAS loci. We have provided CALDERA results for all 93 UK Biobank traits in S1 Table.
Biological validation of CALDERA predictions
Beyond the data-driven benchmarking we sought to validate top predictions with high-quality biological evidence. We investigated whether CALDERA’s top 20 predictions were consistent with previously reported gene-trait associations in the literature (see Methods) and found support for 17 of the 20 highest-scoring trait-gene pairs across all 93 UK Biobank traits (Table 3).
For height, specific mutations in PDE4D, ZBTB20, NPR3, LAMA2, ADAMTS17, and PAPPA2 cause Mendelian disorders that result in short stature [20–24,27,38,39,41,42], and IGF1, TBX3, and PAPPA2 are directly involved in the IGF1 growth pathway, a major determinant of adult height [24, 31,32,45–47]. Additionally, specific PCSK5 mutations in mice cause anteroposterior patterning defects and hindlimb hypoplasia [28], NOX4 knockout zebrafish displayed drastically-reduced bone development and mineralization [34], and BMP5 knockout mice displayed skeletal defects and decreased body size [36, 37].
For platelet count, specific mutations in RUNX1 cause a Mendelian disorder resulting in a ~90% reduction in platelet count [29, 30], while CD36 plays a major role in platelet activation in response to oxidized low-density lipoprotein [25] and CD36 deficiency has been linked to refractoriness of platelet transfusions, thrombocytopenia and the development of post-transfusion purpura [26]. Experiments in mice have shown SYK to be involved in platelet signaling, and SYK inhibitors increase platelet counts in humans [43]. For mean corpuscular volume, RUNX1 has been shown to alter red blood cell volume in mouse studies [44]. POT1 forms a complex with TPP1 and individuals with mutations in the POT1-TPP1 complex have been shown to suffer from increased hematological neoplasms, which are often characterized by an increase in corpuscular volume [40]. Finally, CBLB is a major susceptibility gene for rat models of type 1 diabetes, a disease that is associated with progressive decline in kidney function [35].
In total, 17 of our 20 (85%) top predictions passed our validation criteria. The remaining three gene-trait pairs that were not validated may be: 1) cases of incorrect CALDERA predictions, or 2) cases of correct CALDERA predictions where no high-quality validation data was available. Additionally, we applied the same validation criteria to 10 randomly-selected predictions with a CALDERA score of 0.75. Of these genes, 7 are supported by high-quality literature evidence (70%).
As an additional source of biological validation, we inspected the 34 causal genes in the Open Targets benchmarking dataset that were derived from functional studies (e.g., perturbation studies, CRISPR). CALDERA correctly assigned the highest score in the locus to 74% of these genes, including validated genes for type 2 diabetes (e.g., SLC30A8, WFS1, GLIS3, ANKH) and breast cancer (e.g., TGFBR2, CUX1, SETBP1, FGF10).
Discussion
In this work we have developed CALDERA, a tool for prioritizing genes in GWAS loci that achieves state-of-the-art prediction performance.
To reduce bias in the CALDERA truth set, we used a data-driven truth set rather than one that was manually curated by human experts. The L2G study found that some features related to distance and non-synonymous variants performed much better in manually-curated datasets than in data-driven datasets derived from ChEMBL [4]. This suggests that many of the manually-curated causal genes were selected precisely because of their close proximity to a GWAS signal or due to a credible set coding variant.
We compared the performance of CALDERA with three other gene prioritization tools in three independent benchmarking datasets. Combining all three datasets, CALDERA (AUPRC = 67.7%) significantly outperformed cS2G (AUPRC = 31.0%, P < 2.2 × 10⁻¹⁶) and L2G (AUPRC = 57.6%, P = 0.014), and achieved similar performance to FLAMES (AUPRC = 66.0%, P = 0.679). Notably, CALDERA outperformed L2G in the Open Targets dataset even though this was the L2G training dataset (AUPRC = 84.4% for CALDERA versus 72.9% for L2G). In the ExWAS dataset, FLAMES was the only tool that achieved a greater AUPRC than CALDERA (AUPRC = 56.0% for FLAMES versus 55.3% for CALDERA). Interestingly, FLAMES and CALDERA do not significantly differ in predictive performance in our benchmarks, despite CALDERA using a simpler logistic regression model rather than an XGBoost model, and far fewer features. Similarly, training CALDERA as an XGBoost model did not lead to improved performance. Given similar performance, we prefer CALDERA’s logistic regression model because it is considerably easier to interpret. One possible explanation for why FLAMES and CALDERA outperformed cS2G and L2G is that both CALDERA and FLAMES used features derived from PoPS. While most features used by gene prioritization tools are derived from a single focal GWAS locus, PoPS is a similarity-based method that integrates genome-wide information [7]. As such, PoPS may contribute information that is orthogonal to the information provided by locus-based features. We argue that one of the main improvements of CALDERA is that it outperforms most current methods, performs similarly to FLAMES, and does so with a much simpler model.
ABC, enhancer-promoter correlation, eQTL colocalization, PCHi-C, SMR, and TWAS are all locus-based features that have been shown to predict causal genes in GWAS loci [7]. However, we found that removing these tools had little impact on the predictive performance of our models. This suggests that the information provided by these methods was already accounted for by the basic set of CALDERA features derived from gene distance, non-synonymous variant PIP, PoPS, and the number of nearby genes. Removing these features allowed us to reduce the number of datasets and processing steps required to run CALDERA.
Since CALDERA uses a logistic regression model, an increase in a given feature leads to a linear increase in the log odds that a given gene is causal. As shown in S2 Fig, this approach makes it simple to visualize and understand the relationship between features and CALDERA’s predicted causal probabilities. In contrast, this is not possible for XGBoost models, where the effect of increasing a given feature is typically dependent on the values of other features. PoPS values are generated by a ridge regression model that requires ~58,000 input features, making them challenging to interpret. Nevertheless, it is straightforward to visualize the relationship between PoPS values and CALDERA predictions.
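The linearity on the log-odds scale can be written out explicitly. The notation below is introduced here for illustration and does not appear in the original text: p_i is the pre-normalization causal probability for gene i, x_ij is the j-th of the four transformed features, and the β terms are the fitted coefficients.

```latex
\log\frac{p_i}{1 - p_i} \;=\; \beta_0 + \sum_{j=1}^{4} \beta_j x_{ij}
\qquad\Longrightarrow\qquad
\frac{\partial}{\partial x_{ij}} \log\frac{p_i}{1 - p_i} \;=\; \beta_j
```

A one-unit increase in feature j therefore adds β_j to the log odds regardless of the values of the other three features, which is the property that does not hold for tree ensembles.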
This study has two main assumptions. We assumed that genes bearing a coding variant with PIP > 50% are causal for a given trait. While a variant with PIP = 50% should only have a 50% probability of being the causal variant, this probability should be much greater for coding variants [48] and 73% of our causal genes had a coding variant PIP > 90%.
More importantly, we also assumed that all non-coding credible sets within 300kb of one of these genes also act through the same causal gene. Reprocessing published data [49], we found that 87% of cis-eQTLs were within 100kb of their effector gene and that the percentage of effector genes decreased steeply as the distance increased further (S5 Fig). We found similar results for the distance between GWAS hits and their nearest gene, a proxy for the causal gene (S5 Fig). By definition, the distance between GWAS hits and their true effector genes must be greater. Nevertheless, these data and others [50] suggest that, beyond a certain distance, the probability of being a causal gene begins to decrease in an exponential-like fashion. As such, distal “causal genes” in the CALDERA truth set are more likely to be incorrect. Including incorrect causal genes introduces noise to the truth set and deflates coefficient estimates. Nevertheless, we believe that there is value in using a data-driven truth set that is free from human biases and that CALDERA’s strong performance in external benchmarking datasets demonstrates the utility of this approach.
At the same time, there are well-documented examples where the causal gene lies further than 300kb from the credible set [51]. Nevertheless, CALDERA showed good calibration (S4 Fig) in all three external benchmarking datasets, which used 500–750kb windows. This suggests that CALDERA can be robustly applied to larger locus definitions than the ones on which it was trained.
A limitation of CALDERA is that it was trained on features computed using in-sample linkage disequilibrium (LD) from one cohort (UK Biobank). Using out-of-sample LD reference panels can lead to errors in both sources of CALDERA features—PoPS and fine-mapped credible sets. Additionally, GWASes that meta-analyze multiple cohorts commonly have heterogeneous sample sizes across variants. This leads to misspecified credible set PIPs [52], although PoPS can process variant-specific sample sizes and is therefore more robust. Prior to using CALDERA, we therefore advise the use of tools to check for discrepancies between GWAS summary statistics and the LD reference panel and the removal of failing variants or loci [52, 53].
Finally, the performance of CALDERA has not been benchmarked in GWASes from non-European populations. Previous work has shown that polygenic risk scores trained in one population typically perform poorly in other populations [54]. Unfortunately, this is challenging to test at present. Identifying the 200 causal trait-gene pairs in the CALDERA truth set required GWAS data for 25 independent traits, each of which was performed on hundreds of thousands of individuals. Fortunately, this is likely to be possible in the near future thanks to biobank-scale initiatives in individuals of diverse ancestries, such as All of Us [55].
In conclusion, CALDERA is a powerful tool for GWAS gene prioritization that accounts for training set biases and uses a simple logistic regression model. It achieved consistently greater predictive power than L2G and cS2G in external benchmarking datasets. Unlike cS2G and FLAMES, CALDERA returns estimated causal gene probabilities that can be intuitively interpreted. Leveraging CALDERA could aid in the prioritization of novel causal disease genes.
Methods
Creating a set of causal and non-causal trait gene pairs
To define a set of causal (and non-causal) trait-gene pairs, we employed a similar approach to previous work by Weeks and colleagues [7]. We started with SuSiE credible sets for 39 independent UK Biobank GWASes [56] (S2 Table for independent traits). To minimize the risk of errors in SuSiE fine-mapping, we subsetted to the top 5 credible sets within each region. We compared subsetting to the top 3–10 credible sets and found that 5 yielded the highest model performance (S3 Table; see “Model training, testing, and performance”). We identified credible sets containing a non-synonymous variant with a PIP > 50% (“coding credible sets”) and the affected gene (“coding genes”). The 50% PIP threshold was chosen to maintain consistency with the study by Weeks and colleagues [7]. We designated the remaining credible sets as “non-coding credible sets” (no non-synonymous variant with PIP > 50%). We subsetted to non-coding credible sets within 300kb of a single coding gene for the same trait and with a maximum credible set width of 400kb. We extracted all protein-coding genes within 300kb of each of these non-coding credible sets, assigned the nearby coding gene as a “causal gene”, and assigned all others as “non-causal genes”. As such, the maximum locus size was 1Mb—a 400kb credible set plus 300kb on either side. Although this is smaller than the 1.5Mb locus size used by FLAMES [3] and similar to the 1Mb locus size used by L2G [4], previous work has shown that 90% of eQTLs are found within 130kb of their causal gene and that 90% of GWAS hits are found within 108kb of the nearest gene (a proxy for the causal gene) [49]. Under this design, a gene could be causal for multiple non-coding credible sets, potentially leading to pseudoreplication. As such, if a causal gene was shared by multiple non-coding credible sets from the same trait, we subsetted to the non-coding credible set with the strongest statistical association (as measured by SuSiE approximate Bayes factor). 
If a causal gene was shared by multiple non-coding credible sets across traits, we subsetted to the trait with the fewest causal genes to achieve a more balanced distribution of causal genes per trait.
Variant-to-gene evidence
Next, we joined variant-to-gene mapping evidence to this causal gene dataset by trait and gene. We extracted all predictive features for all trait-gene pairs from the original Polygenic Priority Score (PoPS) study [7]. These features were: distance to GWAS lead variant, fine-mapped non-synonymous variant PIP, activity-by-contact (ABC) [8], enhancer-promoter correlation [9–11], expression quantitative trait locus (eQTL) colocalization [12], PCHi-C [13, 14], summary data-based Mendelian randomization (SMR) [15], transcriptome-wide association studies (TWAS) [16], DEPICT [17], NetWAS [18], and PoPS [7]. DEPICT, NetWAS and PoPS are all similarity-based approaches that prioritize genes with characteristics that are overrepresented in regions of the genome that contain GWAS signals. To narrow the search space to credible genes, we only included canonical ENSGIDs. To determine the number of local genes, we included all GENCODE v44 [57] genes within 300kb of the focal credible set (see the “Sensitivity analyses” Results section).
Feature engineering and missing data imputation
To prevent information leakage, we used PoPS values that were generated using the leave-one-chromosome-out (LOCO) method. PoPS values are derived from MAGMA z-scores and a gene’s MAGMA z-score is derived from all independent variants falling between its transcription start and end sites, including the coding variants that we used to define causal genes. The LOCO approach ensures that a gene’s PoPS score is computed without including its own MAGMA z-score in the training process.
For SMR [15] and TWAS [16], we used the absolute value of the z-score. Using local polynomial regression, we found that distance (GWAS lead variant to gene body) had a non-linear relationship with the log odds of being a causal gene. To improve linearity, we added 1 kilobase before applying a log10 transformation (see S6 Fig for local polynomial regression results before and after transformation). We multiplied the transformed distance values by -1 to ensure a positive relationship with causal gene status, consistent with all other input features. We transformed the number of local genes by taking the inverse, which is equivalent to the prior probability that a gene is causal in the locus. By applying a logit10 transformation, we then converted these probabilities to log10 odds. Some input feature values were missing due to an absence of required data in the locus (e.g., no eQTLs with which to run TWAS). We imputed these values to 0, the minimum possible value (e.g., TWAS absolute z-score). There were no missing data for the basic feature set of PoPS, distance, non-synonymous variant PIP, or the number of local genes.
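The transformations described above can be sketched in Python as follows. This is an illustrative reading of the text rather than the CALDERA code. The function names are hypothetical, distances are assumed to be supplied in base pairs and converted to kilobases, and the handling of a locus with a single gene (where the prior is 1 and the log odds are infinite) is not described in the text, so the sketch simply raises an error in that case.

```python
import math


def transform_distance(distance_bp):
    """Negated log10 of (distance in kb + 1 kb); larger means closer.

    Assumption: input is in base pairs and the log is taken on the kb scale.
    """
    return -math.log10(distance_bp / 1000.0 + 1.0)


def transform_gene_count(n_local_genes):
    """log10 odds of the prior probability 1 / n_local_genes."""
    if n_local_genes < 2:
        # Prior of 1 gives infinite odds; the source does not say how this
        # case is handled, so this sketch refuses it.
        raise ValueError("need at least two local genes")
    prior = 1.0 / n_local_genes
    return math.log10(prior / (1.0 - prior))


def impute_missing(value):
    """Replace a missing feature value (None) with 0, the minimum possible."""
    return 0.0 if value is None else value


d0 = transform_distance(0)        # gene overlaps the lead variant
d9 = transform_distance(9_000)    # 9 kb away
g11 = transform_gene_count(11)    # prior 1/11, odds 1/10
```

With these definitions, a gene overlapping the lead variant scores 0, a gene 9kb away scores -1, and a locus with 11 genes yields a prior log10 odds of -1.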
Model training, testing, and performance
To maximize the applicability of model predictions to new traits, we trained models using a leave-one-trait-out (LOTO) cross-validation framework. We held one trait out as a test set, trained models on the remaining 24 traits, and used these trained models to predict causal gene probability in the held-out test set. We then used these predictions to compute AUPRC using the pr.curve function and the auc.integral method from the PRROC R package [58]. We computed AUPRC 95% CIs using the logit method [59]. By evaluating model performance in a held-out trait, we sought to develop a model that would generalize well to new, unseen traits. AUPRC is a particularly suitable metric for evaluating binary classification models when there is an unequal number of positive and negative labels (i.e., causal and non-causal genes) because it emphasizes accuracy in the minority class.
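The two ideas in this paragraph, leave-one-trait-out splitting and a logit-scale confidence interval for AUPRC, can be sketched in Python. The code is illustrative only: the row layout and function names are hypothetical, and the interval uses one common formulation of the logit method in which the standard error on the logit scale is 1 / sqrt(n × AUPRC × (1 − AUPRC)), with n taken to be the number of positive examples. Whether this matches the authors' exact calculation is an assumption.

```python
import math
from collections import defaultdict


def leave_one_trait_out(rows):
    """Yield (held_out_trait, train_rows, test_rows) for each trait in turn."""
    by_trait = defaultdict(list)
    for row in rows:
        by_trait[row["trait"]].append(row)
    for trait in sorted(by_trait):
        test = by_trait[trait]
        train = [r for t, rs in by_trait.items() if t != trait for r in rs]
        yield trait, train, test


def auprc_logit_ci(auprc, n_positive, z=1.959964):
    """Approximate 95% CI for an AUPRC, computed on the logit scale."""
    eta = math.log(auprc / (1.0 - auprc))
    se = 1.0 / math.sqrt(n_positive * auprc * (1.0 - auprc))

    def expit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return expit(eta - z * se), expit(eta + z * se)


# Toy rows with hypothetical trait and gene names.
rows = [
    {"trait": "height", "gene": "G1", "causal": 1},
    {"trait": "height", "gene": "G2", "causal": 0},
    {"trait": "platelet_count", "gene": "G3", "causal": 1},
    {"trait": "urate", "gene": "G4", "causal": 0},
]
splits = list(leave_one_trait_out(rows))
low, high = auprc_logit_ci(0.596, n_positive=200)
```

Under these assumptions, an AUPRC of 59.6% with 200 causal genes gives an interval of roughly 52.7% to 66.2%, close to the interval reported for the full-feature logistic regression model. The small difference at the lower bound is expected because the input AUPRC is rounded.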
XGBoost
We trained XGBoost models using the xgboost and mlr R packages. We performed an additional round of LOTO cross-validation within the training data to select hyperparameter values. We used a binary logistic objective function and 100 hyperparameter sets. For each set, we randomly sampled hyperparameters from uniform distributions (see S5 Table for hyperparameters and their ranges). We used the hyperparameter set that minimized the cross-validation log loss to train a model and applied it to the held-out test set.
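The random search described above can be sketched in Python as follows. This is an illustration rather than the authors' mlr-based implementation: the two parameters and their ranges are placeholders (the tuned parameters and actual ranges are listed in S5 Table), and `cv_log_loss` stands in for a user-supplied function that runs the inner LOTO cross-validation and returns the mean log loss.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical search space for illustration only; see S5 Table for the real one.
SEARCH_SPACE: Dict[str, Tuple[float, float]] = {
    "eta": (0.01, 0.3),
    "subsample": (0.5, 1.0),
}


def sample_hyperparameter_sets(
    space: Dict[str, Tuple[float, float]], n_sets: int = 100, seed: int = 0
) -> List[Dict[str, float]]:
    """Draw n_sets independent hyperparameter sets from uniform distributions."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        for _ in range(n_sets)
    ]


def select_best(
    candidates: List[Dict[str, float]],
    cv_log_loss: Callable[[Dict[str, float]], float],
) -> Dict[str, float]:
    """Return the candidate with the smallest cross-validation log loss."""
    return min(candidates, key=cv_log_loss)


if __name__ == "__main__":
    candidates = sample_hyperparameter_sets(SEARCH_SPACE)
    # Dummy scoring function standing in for inner LOTO cross-validation.
    best = select_best(candidates, lambda params: abs(params["eta"] - 0.1))
    print(best)
```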
Basic feature set
To minimize the number of datasets and computational pipelines required to generate CALDERA input features, we aimed to create a minimal set of features that would yield an AUPRC similar to that of the full feature set. We selected distance, coding PIP, and the number of local genes because they all had large feature importance values in the L2G and FLAMES models [3, 4]. In addition to these locus-based predictors, we selected PoPS. Previous work has shown that PoPS’s genome-wide approach strongly increased positive predictive value when combined with locus-based predictors [3, 7].
Assuming a single causal gene per credible set
Under the assumption that there will typically only be a single causal gene within a GWAS locus, we normalized model output probabilities to sum to 100% within each locus (see results section: “Assuming a single causal gene per credible set”).
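A minimal Python sketch of this within-locus normalization is shown below. The function name and the (locus, gene, probability) row layout are hypothetical. The sketch assumes every locus has a strictly positive total probability, which holds for logistic regression outputs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def normalize_within_locus(
    rows: List[Tuple[str, str, float]]
) -> Dict[Tuple[str, str], float]:
    """Rescale per-gene probabilities so that they sum to 1 within each locus."""
    totals: Dict[str, float] = defaultdict(float)
    for locus, _gene, prob in rows:
        totals[locus] += prob
    return {(locus, gene): prob / totals[locus] for locus, gene, prob in rows}


if __name__ == "__main__":
    raw = [("locus1", "GENE_A", 0.2), ("locus1", "GENE_B", 0.6), ("locus2", "GENE_C", 0.3)]
    for key, value in normalize_within_locus(raw).items():
        print(key, round(value, 3))
```

Note that a locus containing a single gene always receives a normalized score of 100% under this scheme, regardless of the unnormalized model output.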
Benchmarking performance
To compare the performance of CALDERA and three other tools (FLAMES, L2G, cS2G), we used three benchmarking datasets from the FLAMES study [3]. The “Open Targets dataset” (assembled by the L2G study [4] and referred to as the “expert-curated” dataset in the FLAMES study) contained 105 causal genes that were either curated by human experts (e.g., the ProGeM study [60]) or identified using a data-driven analysis of the ChEMBL drug database [61], as well as 991 nearby non-causal genes. The “ExWAS dataset” (assembled by the FLAMES study [3]) contained 160 causal genes with a significant missense or loss-of-function burden test association in the UK Biobank whole exome sequencing data, as well as 2,019 nearby non-causal genes. The “Molecular Traits dataset” (assembled by Sinnott-Armstrong and colleagues [62]) contained 29 causal genes and 310 nearby non-causal genes for the serum levels of urate, IGF-1, and testosterone. A large proportion of the strongest GWAS loci for these molecular traits contain genes with known biological relevance based on preexisting knowledge of their metabolic pathways.
We directly used pre-computed values from the FLAMES study for PoPS (‘PoPS_Score’) and distance (‘distance’, also used to compute the number of genes within 300 kb of each GWAS signal). The definition of coding variant impact differed between CALDERA (non-synonymous variant PIP) and the FLAMES study (‘VEP_sum’). We therefore used the credible sets from the FLAMES study to directly compute CALDERA-compatible values. Seven traits in the ExWAS dataset were identical or highly correlated with the traits used to train CALDERA. We therefore used a version of CALDERA excluding these traits (calcium, estimated bone mineral density, hemoglobin, hemoglobin A1c, adult height, low density lipoprotein cholesterol, total bilirubin) for testing in the ExWAS dataset. Likewise, we used a version of CALDERA that excluded serum IGF-1 protein level for testing in the “Molecular Traits” dataset. We used the FLAMES, L2G, and cS2G values reported by the FLAMES study [3]. As with model development in the CALDERA training dataset, we evaluated the performance of each model in each benchmarking dataset using AUPRCs from the pr.curve function and the auc.integral method from the PRROC R package [58].
Calibration
We sought to ensure that CALDERA output probabilities were well-aligned with the actual causal probabilities in the training dataset. We generated calibration plots using the cal_plot_logistic function from the “probably” R package.
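For intuition, calibration can also be checked with a simple binned comparison of predicted and observed frequencies, sketched below in Python. This is a different and cruder technique than the smoothed logistic fit produced by cal_plot_logistic, and it is not the authors' code. The function name and the choice of equal-width bins are assumptions.

```python
from typing import List, Tuple


def binned_calibration(
    probs: List[float], labels: List[int], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Return (mean predicted probability, observed causal fraction, count) per non-empty bin."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Equal-width bins on [0, 1]; a probability of exactly 1.0 goes in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    summary = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            summary.append((mean_pred, observed, len(members)))
    return summary


if __name__ == "__main__":
    predicted = [0.05, 0.10, 0.15, 0.80, 0.90, 0.95]
    is_causal = [0, 0, 0, 1, 1, 1]
    for row in binned_calibration(predicted, is_causal, n_bins=5):
        print(row)
```

For a well-calibrated model, the mean predicted probability and the observed causal fraction should be close in every bin that contains enough genes to estimate the fraction reliably.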
Biological validation
We aimed to biologically validate the top CALDERA predictions for 93 UK Biobank traits. To avoid cherry-picking examples, we evaluated the 20 unique gene-trait pairs with the largest CALDERA scores, after removing loci containing only one gene (for which the CALDERA score is 100% by definition). Specifically, we evaluated whether the predictions made by CALDERA were supported in the literature. We separated predictions into two categories: “supported” and “not supported”. The “supported” category indicates strong support from the literature through functional evidence (e.g., similar phenotypes in model organism knockouts), Mendelian forms of the phenotypes associated with specific mutations in the same gene, or the gene being part of a well-studied pathway directly underlying the trait of interest. Remaining gene-trait pairs were classified as “not supported”. For comparison, we repeated this analysis for the 10 gene-trait pairs with CALDERA scores closest to 75% (S4 Table).
Supporting information
S1 Fig. Comparisons of causal probabilities across models.
A. XGBoost with the full feature set versus logistic regression with the full feature set. B. Logistic regression with the full feature set and gene-level covariates (GLCs) versus logistic regression with the basic feature set and GLCs. C. Logistic regression with the basic feature set and GLCs versus CALDERA. Each point represents a single trait-gene pair. The solid black line represents an equivalent value for the x- and y-axis variables.
https://doi.org/10.1371/journal.pgen.1012079.s001
(TIF)
S2 Fig. Relationships between predicted causal gene probabilities and PoPS (top left panel), distance between gene and GWAS lead variant (in base pairs, top middle panel), non-synonymous credible set variant posterior inclusion probability (PIP, bottom left panel), and number of local genes (bottom middle panel).
The lower y-axis represents the logit-transformed probability that a given gene is causal for a given trait. The x-axis represents global feature values ranging from the 5th to the 95th percentile (except for coding PIP, which ranges from the 0th to the 100th percentile). Histograms showing the global feature distribution are plotted at the top of each panel. For coding PIP, the histogram y-axis was truncated at 20 for clarity (count of first bin = 4,787). All other features were set to their means, leading to low overall probabilities. Although transformed distances were used to train the model, untransformed values are presented to facilitate interpretation. The right panel shows the distribution of CALDERA scores for causal (top) and non-causal (bottom) genes in the training dataset.
https://doi.org/10.1371/journal.pgen.1012079.s002
(TIFF)
S3 Fig. Area under the precision-recall curve (±95% confidence intervals) for CALDERA, FLAMES, L2G, and cS2G model predictions in the Open Targets ground truth dataset (left panel), a ground truth dataset derived from burden tests of rare coding variants in the UK Biobank (middle panel), or a ground truth dataset derived from three well-characterized serum metabolite levels (right panel).
https://doi.org/10.1371/journal.pgen.1012079.s003
(TIF)
S4 Fig. CALDERA calibration plots in the Open Targets (left), ExWAS (middle), and Molecular Traits (right) gold standard datasets.
The x-axis represents the model-predicted probability in held-out trait data and the y-axis represents the ground truth causal probability. The solid lines represent the fitted values from generalized additive models with shaded areas representing 95% confidence intervals. The dashed lines represent perfect calibration.
https://doi.org/10.1371/journal.pgen.1012079.s004
(TIF)
S5 Fig. The proportion of genes that lie in various distance bins for eQTLs and their actual effector genes, and for GWAS hits and their nearest genes.
The data were reprocessed from Mostafavi et al. 2023 [49].
https://doi.org/10.1371/journal.pgen.1012079.s005
(TIF)
S6 Fig. Relationships between predicted causal gene probabilities and distance between gene and GWAS lead variant (left panel), gene coding sequence length (middle panel), and total Roadmap enhancer length (right panel) both before (top panels) and after transformation (bottom panels).
The lower y-axis represents the logit-transformed probability that a given gene is causal for a given trait. The x-axis represents feature values across their entire range. Histograms showing the feature distribution are plotted at the top of each panel. Green lines represent the fitted values of a local polynomial regression with grey shading representing 95% confidence intervals. In all cases, transformation improved the linearity of the relationship between the feature and the probability of being a causal gene in the training dataset.
https://doi.org/10.1371/journal.pgen.1012079.s006
(TIFF)
S1 Table. CALDERA results for 93 UK Biobank traits.
locus: a unique identifier for each independent locus based on the trait, chromosome, position, and credible set number. Locus_pos: the chromosome, start, and end position of the locus. gene: gene name. caldera: CALDERA score (probability of being a causal gene). multi: CALDERA score without normalizing so that probabilities sum to 1 within each locus. n_genes: number of genes in the locus. dist: distance between credible set and gene. pops: PoPS value. coding: summed posterior inclusion probability of non-synonymous variants affecting this gene. impute_pops: logical, was the PoPS value imputed (to 0) due to being missing in the PoPS gene list. ensgid: ENSEMBL gene identifier. trait: abbreviated trait name. description: trait description.
https://doi.org/10.1371/journal.pgen.1012079.s007
(XLSX)
S2 Table. Set of 39 independent traits used to construct the CALDERA training dataset.
Trait: phenotype abbreviation. Description: phenotype description. N individuals: Total sample size. N cases: number of cases (binary traits only). N controls: number of controls (binary traits only). N causal genes: number of causal genes for this trait in the training dataset. N non-causal genes: number of non-causal genes for this trait in the training dataset.
https://doi.org/10.1371/journal.pgen.1012079.s008
(XLSX)
S3 Table. Area under the precision-recall curve for CALDERA models after subsetting to the top 3–10 SuSiE credible sets.
# credible sets: number of SuSiE credible sets per region that were used for defining the training dataset. AUPRC: area under the precision-recall curve of CALDERA in the held-out test set. Lower 95% CI: lower bound of the AUPRC 95% confidence interval. Upper 95% CI: upper bound of the AUPRC 95% confidence interval. # causal genes: number of causal genes in the training dataset. # non-causal genes: number of non-causal genes in the training dataset.
https://doi.org/10.1371/journal.pgen.1012079.s009
(XLSX)
S4 Table. Biological validation of 10 randomly-selected predictions with a CALDERA score of 75%.
https://doi.org/10.1371/journal.pgen.1012079.s010
(XLSX)
S5 Table. List of XGBoost hyperparameters that were tuned during leave-one-trait-out cross-validation (“Parameter” column).
We randomly sampled hyperparameter values from uniform distributions ranging between the values in the “Minimum” and “Maximum” columns.
https://doi.org/10.1371/journal.pgen.1012079.s011
(XLSX)
Acknowledgments
We thank SURF (www.surf.nl) for their support in using the Snellius National Supercomputer and Leonhard Kohleick for valuable feedback.
Web resources
PoPS: https://github.com/FinucaneLab/pops, MAGMA: https://cncr.nl/research/magma/, Gencode release 44: https://www.gencodegenes.org/human/release_44.html, The Mostafavi et al. 2023 [49] Zenodo repository: https://zenodo.org/records/6618073
References
- 1. Watanabe K, Stringer S, Frei O, Mirkov MU, de Leeuw C, Polderman TJC, et al. Author Correction: A global overview of pleiotropy and genetic architecture in complex traits. Nat Genet. 2020;52(3):353. pmid:32029922
- 2. Forgetta V, Jiang L, Vulpescu NA, Hogan MS, Chen S, Morris JA, et al. An effector index to predict target genes at GWAS loci. Hum Genet. 2022;141(8):1431–47. pmid:35147782
- 3. Schipper M, de Leeuw CA, Maciel BAPC, Wightman DP, Hubers N, Boomsma DI, et al. Prioritizing effector genes at trait-associated loci using multimodal evidence. Nat Genet. 2025;57(2):323–33. pmid:39930082
- 4. Mountjoy E, Schmidt EM, Carmona M, Schwartzentruber J, Peat G, Miranda A, et al. An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci. Nat Genet. 2021;53(11):1527–33. pmid:34711957
- 5. Gazal S, Weissbrod O, Hormozdiari F, Dey KK, Nasser J, Jagadeesh KA, et al. Combining SNP-to-gene linking strategies to identify disease genes and assess disease omnigenicity. Nat Genet. 2022;54(6):827–36. pmid:35668300
- 6. Wang G, Sarkar A, Carbonetto P, Stephens M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J R Stat Soc Series B Stat Methodol. 2020;82(5):1273–300. pmid:37220626
- 7. Weeks EM, Ulirsch JC, Cheng NY, Trippe BL, Fine RS, Miao J, et al. Leveraging polygenic enrichments of gene features to predict genes underlying complex traits and diseases. Nat Genet. 2023;55(8):1267–76. pmid:37443254
- 8. Fulco CP, Nasser J, Jones TR, Munson G, Bergman DT, Subramanian V, et al. Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations. Nat Genet. 2019;51(12):1664–9. pmid:31784727
- 9. Andersson R, Gebhard C, Miguel-Escalada I, Hoof I, Bornholdt J, Boyd M, et al. An atlas of active enhancers across human cell types and tissues. Nature. 2014;507(7493):455–61. pmid:24670763
- 10. Liu Y, Sarkar A, Kheradpour P, Ernst J, Kellis M. Evidence of reduced recombination rate in human regulatory domains. Genome Biol. 2017;18(1):193. pmid:29058599
- 11. Ulirsch JC, Lareau CA, Bao EL, Ludwig LS, Guo MH, Benner C, et al. Interrogation of human hematopoiesis at single-cell and single-variant resolution. Nat Genet. 2019;51(4):683–93. pmid:30858613
- 12. Hormozdiari F, van de Bunt M, Segrè AV, Li X, Joo JWJ, Bilow M. Colocalization of GWAS and eQTL Signals Detects Target Genes. Am J Hum Genet. 2016;99:1245–60.
- 13. Jung I, Schmitt A, Diao Y, Lee AJ, Liu T, Yang D, et al. A compendium of promoter-centered long-range chromatin interactions in the human genome. Nat Genet. 2019;51(10):1442–9. pmid:31501517
- 14. Javierre BM, Burren OS, Wilder SP, Kreuzhuber R, Hill SM, Sewitz S, et al. Lineage-Specific Genome Architecture Links Enhancers and Non-coding Disease Variants to Target Gene Promoters. Cell. 2016;167(5):1369–1384.e19. pmid:27863249
- 15. Zhu Z, Zhang F, Hu H, Bakshi A, Robinson MR, Powell JE, et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat Genet. 2016;48(5):481–7. pmid:27019110
- 16. Gusev A, Ko A, Shi H, Bhatia G, Chung W, Penninx BWJH, et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat Genet. 2016;48(3):245–52. pmid:26854917
- 17. Pers TH, Karjalainen JM, Chan Y, Westra H-J, Wood AR, Yang J, et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat Commun. 2015;6:5890. pmid:25597830
- 18. Greene CS, Krishnan A, Wong AK, Ricciotti E, Zelaya RA, Himmelstein DS, et al. Understanding multicellular function and disease with human tissue-specific networks. Nat Genet. 2015;47(6):569–76. pmid:25915600
- 19. Lange LM, Cerquera-Cleves C, Schipper M, Panagiotaropoulou G, Braun A, Kraft J, et al. Prioritizing Parkinson’s disease risk genes in genome-wide association loci. NPJ Parkinsons Dis. 2025;11(1):77. pmid:40240380
- 20. Morgul A, Sharova M, Kenis V, Orlova M, Ryzhkova O, Markova T. Case Report: The widening genetic and phenotypic spectrum of ultra-rare PDE4D-related acroscyphodysplasia. Front Med (Lausanne). 2025;12:1623593. pmid:40917827
- 21. Michot C, Le Goff C, Goldenberg A, Abhyankar A, Klein C, Kinning E, et al. Exome sequencing identifies PDE4D mutations as another cause of acrodysostosis. Am J Hum Genet. 2012;90(4):740–5. pmid:22464250
- 22. Lindstrand A, Grigelioniene G, Nilsson D, Pettersson M, Hofmeister W, Anderlid B-M, et al. Different mutations in PDE4D associated with developmental disorders with mirror phenotypes. J Med Genet. 2014;51(1):45–54. pmid:24203977
- 23. Lynch DC, Dyment DA, Huang L, Nikkel SM, Lacombe D, Campeau PM, et al. Identification of novel mutations confirms PDE4D as a major gene causing acrodysostosis. Hum Mutat. 2013;34(1):97–102. pmid:23033274
- 24. Fujimoto M, Andrew M, Dauber A. Disorders caused by genetic defects associated with GH-dependent genes: PAPPA2 defects. Mol Cell Endocrinol. 2020;518:110967. pmid:32739295
- 25. Podrez EA, Byzova TV, Febbraio M, Salomon RG, Ma Y, Valiyaveettil M, et al. Platelet CD36 links hyperlipidemia, oxidant stress and a prothrombotic phenotype. Nat Med. 2007;13(9):1086–95. pmid:17721545
- 26. Lee K, Godeau B, Fromont P, Plonquet A, Debili N, Bachir D, et al. CD36 deficiency is frequent and can cause platelet immunization in Africans. Transfusion. 1999;39(8):873–9. pmid:10504124
- 27. Cordeddu V, Redeker B, Stellacci E, Jongejan A, Fragale A, Bradley TEJ, et al. Mutations in ZBTB20 cause Primrose syndrome. Nat Genet. 2014;46(8):815–7. pmid:25017102
- 28. Szumska D, Pieles G, Essalmani R, Bilski M, Mesnard D, Kaur K, et al. VACTERL/caudal regression/Currarino syndrome-like malformations in mice with mutation in the proprotein convertase Pcsk5. Genes Dev. 2008;22(11):1465–77. pmid:18519639
- 29. Stockley J, Morgan NV, Bem D, Lowe GC, Lordkipanidzé M, Dawood B, et al. Enrichment of FLI1 and RUNX1 mutations in families with excessive bleeding and platelet dense granule secretion defects. Blood. 2013;122(25):4090–3. pmid:24100448
- 30. Latger-Cannard V, Philippe C, Bouquet A, Baccini V, Alessi M-C, Ankri A, et al. Haematological spectrum and genotype-phenotype correlations in nine unrelated families with RUNX1 mutations from the French network on inherited platelet disorders. Orphanet J Rare Dis. 2016;11:49. pmid:27112265
- 31. Blum WF, Schweizer R. Insulin-Like Growth Factors and Their Binding Proteins. Diagnostics of Endocrine Function in Children and Adolescents. KARGER. 2003:166–99.
- 32. Savage MO, Burren CP, Rosenfeld RG. The continuum of growth hormone-IGF-I axis defects causing short stature: diagnostic and therapeutic challenges. Clin Endocrinol (Oxf). 2010;72(6):721–8. pmid:20050859
- 33. Park YM. CD36, a scavenger receptor implicated in atherosclerosis. Exp Mol Med. 2014;46(6):e99. pmid:24903227
- 34. Cao Z, Liu G, Zhang H, Wang M, Xu Y. Nox4 promotes osteoblast differentiation through TGF-beta signal pathway. Free Radic Biol Med. 2022;193(Pt 2):595–609. pmid:36372285
- 35. Yokoi N, Komeda K, Wang H-Y, Yano H, Kitada K, Saitoh Y, et al. Cblb is a major susceptibility gene for rat type 1 diabetes mellitus. Nat Genet. 2002;31(4):391–4. pmid:12118252
- 36. Costantini A, Guasto A, Cormier-Daire V. TGF-β and BMP Signaling Pathways in Skeletal Dysplasia with Short and Tall Stature. Annu Rev Genomics Hum Genet. 2023;24:225–53. pmid:37624666
- 37. Baldarelli RM, Smith CL, Ringwald M, Richardson JE, Bult CJ, Mouse Genome Informatics Group. Mouse Genome Informatics: an integrated knowledgebase system for the laboratory mouse. Genetics. 2024;227(1):iyae031. pmid:38531069
- 38. Boudin E, de Jong TR, Prickett TCR, Lapauw B, Toye K, Van Hoof V, et al. Bi-allelic Loss-of-Function Mutations in the NPR-C Receptor Result in Enhanced Growth and Connective Tissue Abnormalities. Am J Hum Genet. 2018;103(2):288–95. pmid:30032985
- 39. Sarkozy A, Foley AR, Zambon AA, Bönnemann CG, Muntoni F. LAMA2-Related Dystrophies: Clinical Phenotypes, Disease Biomarkers, and Clinical Trial Readiness. Front Mol Neurosci. 2020;13:123. pmid:32848593
- 40. Wang T, Mei S, Fu R, Wang H, Shao Z. Expression of Shelterin component POT1 is associated with decreased telomere length and immunity condition in humans with severe aplastic anemia. J Immunol Res. 2014;2014:439530. pmid:24892036
- 41. Morales J, Al-Sharif L, Khalil DS, Shinwari JMA, Bavi P, Al-Mahrouqi RA, et al. Homozygous mutations in ADAMTS10 and ADAMTS17 cause lenticular myopia, ectopia lentis, glaucoma, spherophakia, and short stature. Am J Hum Genet. 2009;85(5):558–68. pmid:19836009
- 42. Khan AO, Aldahmesh MA, Al-Ghadeer H, Mohamed JY, Alkuraya FS. Familial spherophakia with short stature caused by a novel homozygous ADAMTS17 mutation. Ophthalmic Genet. 2012;33(4):235–9. pmid:22486325
- 43. Rasheed F, Khan A, Musleh Ud Din S, Zaidi M, Ayub MA, Hussein A. Efficacy and safety of spleen tyrosine kinase inhibitors in chronic immune thrombocytopenic purpura: A meta-analysis of randomized controlled trials. Blood. 2025;146(Supplement 1):6589–6589.
- 44. Matsuura S, Komeno Y, Stevenson KE, Biggs JR, Lam K, Tang T, et al. Expression of the runt homology domain of RUNX1 disrupts homeostasis of hematopoietic stem cells and induces progression to myelodysplastic syndrome. Blood. 2012;120(19):4028–37. pmid:22919028
- 45. Peres J, Mowla S, Prince S. The T-box transcription factor, TBX3, is a key substrate of AKT3 in melanomagenesis. Oncotarget. 2015;6(3):1821–33. pmid:25595898
- 46. Peres J, Damerell V, Chauhan J, Popovic A, Desprez P-Y, Galibert M-D, et al. TBX3 Promotes Melanoma Migration by Transcriptional Activation of ID1, which Prevents Activation of E-Cadherin by MITF. J Invest Dermatol. 2021;141(9):2250–2260.e2. pmid:33744299
- 47. Okawa ER, Gupta MK, Kahraman S, Goli P, Sakaguchi M, Hu J, et al. Essential roles of insulin and IGF-1 receptors during embryonic lineage development. Mol Metab. 2021;47:101164. pmid:33453419
- 48. Weissbrod O, Hormozdiari F, Benner C, Cui R, Ulirsch J, Gazal S, et al. Functionally informed fine-mapping and polygenic localization of complex trait heritability. Nat Genet. 2020;52(12):1355–63. pmid:33199916
- 49. Mostafavi H, Spence JP, Naqvi S, Pritchard JK. Systematic differences in discovery of genetic effects on gene expression and complex traits. Nat Genet. 2023;55(11):1866–75. pmid:37857933
- 50. Fauman EB, Hyde C. An optimal variant to gene distance window derived from an empirical definition of cis and trans protein QTLs. BMC Bioinformatics. 2022;23(1):169. pmid:35527238
- 51. Claussnitzer M, Dankel SN, Kim KH, Quon G, Meuleman W, Haugen C. FTO Obesity Variant Circuitry and Adipocyte Browning in Humans. N Engl J Med. 2015;373:895–907.
- 52. Kanai M, Elzur R, Zhou W, Global Biobank Meta-analysis Initiative, Daly MJ, Finucane HK. Meta-analysis fine-mapping is often miscalibrated at single-variant resolution. Cell Genom. 2022;2(12):100210. pmid:36643910
- 53. Chen W, Wu Y, Zheng Z, Qi T, Visscher PM, Zhu Z, et al. Improved analyses of GWAS summary statistics by reducing data heterogeneity and errors. Nat Commun. 2021;12(1):7117. pmid:34880243
- 54. Kachuri L, Chatterjee N, Hirbo J, Schaid DJ, Martin I, Kullo IJ, et al. Principles and methods for transferring polygenic risk scores across global populations. Nat Rev Genet. 2024;25(1):8–25. pmid:37620596
- 55. All of Us Research Program Genomics Investigators. Genomic data in the All of Us Research Program. Nature. 2024;627(8003):340–6. pmid:38374255
- 56. Kanai M, Ulirsch JC, Karjalainen J, Kurki M, Karczewski KJ, Fauman E, et al. Insights from complex trait fine-mapping across diverse populations. medRxiv. 2021.
- 57. Frankish A, Diekhans M, Ferreira A-M, Johnson R, Jungreis I, Loveland J, et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 2019;47(D1):D766–73. pmid:30357393
- 58. Grau J, Grosse I, Keilwagen J. PRROC: computing and visualizing precision-recall and receiver operating characteristic curves in R. Bioinformatics. 2015;31(15):2595–7. pmid:25810428
- 59. Boyd K, Eng KH, Page CD. Area under the precision-recall curve: Point estimates and confidence intervals. Machine learning and knowledge discovery in databases. Springer Berlin Heidelberg. 2013:451–66.
- 60. Stacey D, Fauman EB, Ziemek D, Sun BB, Harshfield EL, Wood AM, et al. ProGeM: a framework for the prioritization of candidate causal genes at molecular quantitative trait loci. Nucleic Acids Res. 2019;47(1):e3. pmid:30239796
- 61. Gaulton A, Hersey A, Nowotka M, Bento AP, Chambers J, Mendez D, et al. The ChEMBL database in 2017. Nucleic Acids Res. 2017;45:D945–54.
- 62. Sinnott-Armstrong N, Naqvi S, Rivas M, Pritchard JK. GWAS of three molecular traits highlights core genes and pathways alongside a highly polygenic background. Elife. 2021;10:e58615. pmid:33587031