Abstract
Early-stage diagnosis of paroxysmal atrial fibrillation (PAF) is challenging owing to its frequently asymptomatic nature. However, the genetic factors underlying PAF and the predictive utility of polygenic risk scores (PRSs) for PAF in Asian populations remain elusive. We aimed to explore PAF-associated genetic variants in a Japanese cohort and to evaluate the predictive performance of PAF-specific PRSs. This study included 2,604 participants. Following exclusion, quality control, and genotype imputation, a genome-wide association study (GWAS) was conducted. The predictive performance of 30 sets of PRS models constructed across various thresholds was evaluated using three machine learning methods. Model performance was assessed using the area under the curve (AUC), and feature contributions were assessed using SHapley Additive exPlanations (SHAP). The GWAS using 1,038 PAF cases and 744 controls identified 82 genome-wide significant variants (P < 5 × 10−8), all on chromosome 4q25. Of these, 80 variants clustered upstream of PITX2, and two were located in LINC01438. Fine mapping identified two independent intergenic signals, with rs2200732 as the lead single-nucleotide polymorphism. The best PRS-only model achieved an AUC of >0.70, which increased to 0.737 in additive models incorporating both PRS and clinical variables. SHAP analysis consistently ranked PRS as the most influential predictor, above all clinical variables included in this study. These results suggest that genetic risk, particularly at the established 4q25/PITX2 locus, contributes substantially to PAF susceptibility in this Japanese cohort and that PRS may improve early risk stratification when integrated with clinical risk factors.
Citation: Shiomi M, Nagata Y, Sudo T, Takahashi K, Higuchi C, Ihara K, et al. (2026) Genetic variants and polygenic risk scores associated with paroxysmal atrial fibrillation in the Japanese population. PLoS One 21(5): e0344360. https://doi.org/10.1371/journal.pone.0344360
Editor: Xiang Zhu, Calico: Calico Life Sciences LLC, UNITED STATES OF AMERICA
Received: October 3, 2025; Accepted: February 19, 2026; Published: May 4, 2026
Copyright: © 2026 Shiomi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Genotype data has been deposited at the Japanese Genotype-phenotype Archive (JGA, https://www.ddbj.nig.ac.jp/jga), which is hosted by the Bioinformation and DDBJ Center, under accession number JGAS000866. The successful PRSs resulting from this study have been deposited in the PGS Catalog (publication ID: PGP000775 and score ID: PGS005388). The analysis code used in this study is publicly available on Zenodo at https://doi.org/10.5281/zenodo.17918863.
Funding: This work was supported in part by the Japan Agency for Medical Research and Development (AMED; https://www.amed.go.jp/en/index.html) under Grant Number 21he2102002h0003 to T.F. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: T.S. reports receiving research funding from the City of Shizuoka. This does not alter our adherence to PLOS ONE policies on sharing data and materials. All other authors have declared that no competing interests exist.
Introduction
Paroxysmal atrial fibrillation (PAF), the most common clinical subtype of atrial fibrillation (AF), accounts for approximately 50% of all AF cases in Japan [1]. Clinically, AF is classified as paroxysmal, persistent, long-standing persistent, or permanent. PAF is characterized by its intermittent and self-terminating nature, resolving spontaneously within 7 days of onset. Although PAF often initially presents with brief episodes, it can progress to persistent or permanent AF over time [2], leading to a worsening prognosis as the disease advances [3]. The majority of patients hospitalized for stroke and diagnosed with new-onset AF reportedly have asymptomatic PAF [4]. Moreover, PAF is reported to be more prevalent than persistent AF among patients with a history of stroke or transient ischemic attack [5]. These findings highlight the need for early detection and intervention. However, diagnosing PAF remains challenging owing to its intermittent, short-lived, and frequently asymptomatic presentation.
The development of AF, including PAF, is associated with advanced age and established risk factors, such as smoking, alcohol consumption, obesity, hypertension, diabetes, and cardiovascular disease [6]. In addition to these clinical and environmental factors, genetic predisposition plays a significant role in AF pathogenesis [7–11]. AF can occur in the absence of conventional risk factors, suggesting familial or heritable forms [7].
Since the first genome-wide association study (GWAS) on AF, over 100 susceptibility loci have been identified across multiple ethnic populations [8–11]. Among these, the 4q25 locus near PITX2 is one of the most consistently replicated genetic signals for AF susceptibility [8–11]. PITX2 encodes a transcription factor implicated in atrial development and electrophysiological regulation [10]. Functional studies, including animal models and human iPSC-derived cardiomyocytes, further suggest that reduced PITX2 activity may alter the electrophysiological properties of cardiomyocytes [12,13]. The genetic architecture of AF has been reported to differ between Japanese and European populations [11]. In Japanese cohorts, associations at the 4q25/PITX2 locus have also been confirmed, and six additional loci (KCND3, PPFIA4, SLC1A4–CEP68, HAND2, NEBL, and SH3PXD2A) have been identified [11]. Several of these genetic loci are specific to Japanese and East Asian populations and have not been consistently observed in studies of populations with European ancestry, highlighting genetic diversity between populations. These ancestry-dependent differences underscore the need for gene discovery and risk prediction using population-matched cohorts. However, most previous studies have treated AF as a single phenotype. As a result, the specific contributions of established AF-associated loci, such as PITX2, to PAF remain poorly understood.
Polygenic risk scores (PRSs), which aggregate the effects of multiple disease-associated single-nucleotide variants, have emerged as valuable tools for estimating the genetic risk of an individual. Khera et al. developed PRSs for five common diseases, including AF, and demonstrated that these scores can identify individuals with risk comparable to that conferred by monogenic variants [14]. Although PRSs for AF show limited predictive performance when used in isolation, their performance improves when combined with clinical risk models [15]. Most AF PRSs have been developed in populations of European ancestry [15], although several recent studies have constructed and validated AF PRSs in Japanese cohorts [16–18]. However, these studies have primarily focused on general AF phenotypes without distinguishing PAF from persistent AF. Additionally, the utility of PRSs specifically for PAF remains largely underexplored, despite its clinical significance and high prevalence. As PAF may represent a distinct phenotype within AF, evaluating its genetic features and predictive models separately is essential to advance precision medicine approaches.
In this study, we specifically targeted PAF, which may have distinct genetic underpinnings and clinical implications. We conducted a GWAS in a Japanese population to identify susceptibility loci associated with PAF. Furthermore, we constructed PRSs by aggregating the effects of identified risk variants using a clumping and thresholding approach. These PRSs were evaluated using multiple machine learning algorithms to assess their predictive utility alone and in combination with clinical variables and gene–environment interaction terms. To enhance interpretability and facilitate clinical translation, we used SHapley Additive exPlanations (SHAP) to quantify the relative contributions of genetic and clinical predictors [19]. Overall, this study aimed to identify PAF-relevant susceptibility loci and to evaluate whether PRSs can predict the risk of PAF in a Japanese cohort. Our findings indicate that these aims were achieved.
Materials and methods
Study participants
In total, 2,604 participants were enrolled between March 1, 2020 and December 31, 2021, including inpatients and outpatients from the Department of Cardiology of Tokyo Medical and Dental University Hospital, Jichi Medical University Hospital, Yokohama City Minato Red Cross Hospital, Jichi Medical University Saitama Medical Center, Yokosuka Kyosai Hospital, National Hospital Organization Disaster Medical Center, and Tsuchiura Kyodo General Hospital. All participants were Japanese and were classified as PAF cases or controls; controls were unrelated individuals without any history of AF, recruited from the same institutions. AF was defined as an arrhythmia lasting over 30 s documented on an electrocardiogram (ECG), and PAF was defined as AF that spontaneously terminated within 7 days of onset, in accordance with the Japanese Circulation Society/Japanese Heart Rhythm Society guidelines [20]. All participants underwent ECG assessments at their respective institutions. PAF cases were identified based on physician diagnoses with ECG documentation. Controls were defined as individuals without a documented history of AF/PAF and without AF documented on the ECG performed at the participating institutions. Clinical exclusion criteria were: (1) missing clinical information, (2) AF other than PAF, (3) history of heart failure (HF), (4) coronary artery disease (CAD), or (5) valvular heart disease. HF, CAD, and valvular heart disease, organic cardiac conditions that directly affect AF [6], were excluded to avoid potential confounding in comparisons between PAF cases and controls from the same cardiology departments. Collected clinical information included age, body mass index (BMI), cardiometabolic parameters at sample collection, age at PAF diagnosis (for PAF cases), sex, smoking, alcohol consumption (current drinker vs. non-drinker), histories of hypertension (HT), diabetes mellitus (DM), dyslipidemia (DL), and cerebral infarction (CI), and family history of AF. In addition to diagnoses recorded in the medical records, HT was defined as a systolic blood pressure ≥140 mmHg, a diastolic blood pressure ≥90 mmHg, or the use of antihypertensive medications [21]. DM was defined as a fasting plasma glucose level ≥126 mg/dL, a random plasma glucose level ≥200 mg/dL, or the use of antidiabetic medications [22]. DL was defined as low-density lipoprotein cholesterol ≥140 mg/dL, high-density lipoprotein cholesterol <40 mg/dL, triglycerides ≥150 mg/dL in the fasting state or ≥175 mg/dL in the non-fasting state, or the use of lipid-lowering medications [23].
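The HT, DM, and DL definitions above combine recorded diagnoses, medication use, and measurement cut-offs with a logical OR. A minimal Python sketch of that logic follows; the record structure and all field names are hypothetical, and treating an unmeasured value (None) as not meeting a cut-off is our assumption rather than a rule stated in the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClinicalRecord:
    # Hypothetical fields; None means the value was not measured.
    hx_ht: bool = False                # HT diagnosis in the medical record
    hx_dm: bool = False                # DM diagnosis in the medical record
    hx_dl: bool = False                # DL diagnosis in the medical record
    on_antihypertensive: bool = False
    on_antidiabetic: bool = False
    on_lipid_lowering: bool = False
    sbp_mmhg: Optional[float] = None
    dbp_mmhg: Optional[float] = None
    fasting_glucose_mg_dl: Optional[float] = None
    random_glucose_mg_dl: Optional[float] = None
    ldl_mg_dl: Optional[float] = None
    hdl_mg_dl: Optional[float] = None
    triglycerides_mg_dl: Optional[float] = None
    fasting_sample: bool = True        # whether lipids were drawn while fasting


def _at_least(value: Optional[float], cutoff: float) -> bool:
    """True only if the value was measured and reaches the cut-off."""
    return value is not None and value >= cutoff


def has_hypertension(r: ClinicalRecord) -> bool:
    # SBP >= 140 mmHg, DBP >= 90 mmHg, or antihypertensive use
    return (r.hx_ht or r.on_antihypertensive
            or _at_least(r.sbp_mmhg, 140) or _at_least(r.dbp_mmhg, 90))


def has_diabetes(r: ClinicalRecord) -> bool:
    # Fasting glucose >= 126 mg/dL, random glucose >= 200 mg/dL, or antidiabetic use
    return (r.hx_dm or r.on_antidiabetic
            or _at_least(r.fasting_glucose_mg_dl, 126)
            or _at_least(r.random_glucose_mg_dl, 200))


def has_dyslipidemia(r: ClinicalRecord) -> bool:
    # LDL-C >= 140, HDL-C < 40, TG >= 150 (fasting) or >= 175 (non-fasting) mg/dL,
    # or lipid-lowering medication
    tg_cutoff = 150 if r.fasting_sample else 175
    low_hdl = r.hdl_mg_dl is not None and r.hdl_mg_dl < 40
    return (r.hx_dl or r.on_lipid_lowering or low_hdl
            or _at_least(r.ldl_mg_dl, 140)
            or _at_least(r.triglycerides_mg_dl, tg_cutoff))
```

For example, a triglyceride value of 160 mg/dL meets the DL criterion in a fasting sample but not in a non-fasting one.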
This study was approved by the institutional review boards of all participating institutions. Ethical approval for the overall study was granted by the Ethics Committee of Tokyo Medical and Dental University (No. O2019-006), and the study was conducted in accordance with the principles of the Declaration of Helsinki. Written informed consent was obtained from all participants.
Single-nucleotide polymorphism (SNP) genotyping, quality control, and genetic imputation
Genomic DNA was extracted from peripheral blood samples and genotyped using the Infinium Asian Screening Array-24 v1.0 BeadChip (Illumina, Inc., San Diego, CA, USA), which covers 659,184 common SNPs, at Macrogen Japan (Tokyo, Japan). Genotype calling was performed using GenomeStudio v2.0 (Illumina, Inc.). Quality control of the raw data was conducted using PLINK v1.9 [24] at the sample and SNP levels. Samples were excluded for sex discrepancy, call rate <97%, excess heterozygosity (heterozygosity rate more than 3 standard deviations from the mean), or relatedness (PI_HAT > 0.185). Population stratification was assessed using principal component analysis, comparing the first four principal components with those of five reference populations (Africans, Americans, South Asians, East Asians, and Europeans) from the 1000 Genomes Project [25]. SNPs were excluded if they had a call rate <95%, Hardy–Weinberg equilibrium P-value <1.0 × 10−6, or minor allele frequency (MAF) <0.01. Association analyses were performed using autosomal SNPs. Pre-phasing was performed using SHAPEIT2 [26] and imputation using Minimac3 [27], with the 1000 Genomes Phase 3 reference panel [25]. After imputation, variants with low imputation quality (Rsq < 0.3), MAF < 0.01, or duplicate entries were excluded. Accordingly, our association analyses focused on variants with a MAF of ≥0.01, and rare-variant association tests were not performed.
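The sample- and SNP-level filters above can be expressed as simple predicates. The Python sketch below assumes that per-sample call rates, heterozygosity rates, sex-check flags, pairwise PI_HAT values, and per-SNP call rates, HWE P-values, and MAFs have already been computed (for example, by PLINK). The data layout is hypothetical, and dropping the second member of each related pair is our simplification, not a procedure stated in the study.

```python
import statistics
from dataclasses import dataclass


@dataclass
class SampleStats:
    sample_id: str
    call_rate: float      # fraction of SNPs successfully called
    het_rate: float       # observed heterozygosity rate
    sex_mismatch: bool    # reported sex disagrees with genotype-inferred sex


def failing_samples(samples, related_pairs, pi_hat_max=0.185):
    """Return IDs of samples removed by sample-level QC.

    related_pairs: iterable of (id1, id2, pi_hat) tuples.
    """
    het = [s.het_rate for s in samples]
    mu, sd = statistics.mean(het), statistics.stdev(het)
    failed = {
        s.sample_id
        for s in samples
        if s.sex_mismatch
        or s.call_rate < 0.97
        or abs(s.het_rate - mu) > 3 * sd      # outside mean +/- 3 SD
    }
    # Relatedness: drop one member of each pair with PI_HAT above the cut-off
    # (here, arbitrarily, the second member) unless one is already removed.
    for id1, id2, pi_hat in related_pairs:
        if pi_hat > pi_hat_max and id1 not in failed and id2 not in failed:
            failed.add(id2)
    return failed


def snp_passes(call_rate, hwe_p, maf):
    """SNP-level QC: call rate >= 95%, HWE P >= 1e-6, MAF >= 0.01."""
    return call_rate >= 0.95 and hwe_p >= 1e-6 and maf >= 0.01
```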
GWAS analysis
GWAS was conducted using logistic regression in PLINK [24], adjusting for covariates including sex, smoking status, alcohol consumption, history of HT, DM, DL, and CI, and family history of AF (maternal, paternal, and other relatives). We did not include BMI or cardiometabolic laboratory measurements at the time of sample collection as covariates because these measurements may be influenced by ongoing treatments and, for some cases, may have been obtained after the PAF diagnosis. Instead, we adjusted for major comorbidity histories defined by standardized clinical criteria. Genome-wide significance was defined as P < 5 × 10⁻⁸ and suggestive significance as P < 1 × 10⁻⁵. SNP annotations were obtained from SNPnexus [28] and dbSNP [29]. To identify lead SNPs, the Functional Mapping and Annotation (FUMA) platform v1.5.2 [30] was used, incorporating GWAS summary statistics and linkage disequilibrium (LD) data from the 1000 Genomes Phase 3 East Asian panel [25]. FUMA was also used to map GWAS-identified SNPs to 18,900 protein-coding genes and to calculate gene-level P-values using multi-marker analysis of genomic annotation (MAGMA) v1.6 [31]. A Bonferroni correction was applied to the gene-based analysis, with the significance threshold set at P < 2.6 × 10⁻⁶ (0.05/18,900). Independent significant SNPs were defined as those with P < 5 × 10⁻⁸ and pairwise LD r² < 0.6, and lead SNPs as the subset of independent significant SNPs with pairwise LD r² < 0.1. Annotations were based on the GRCh37 (hg19) genome build. Functional annotation of lead SNPs included evaluation of the combined annotation-dependent depletion (CADD) score (deleteriousness) [32], the RegulomeDB score (regulatory potential) [33], expression quantitative trait loci (eQTL), and three-dimensional chromatin interactions (Hi-C data) [34]. The false discovery rate (FDR) was used to adjust for multiple comparisons.
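Two numerical steps in this paragraph can be checked directly: the gene-level Bonferroni threshold (0.05/18,900 ≈ 2.65 × 10⁻⁶, reported as 2.6 × 10⁻⁶) and the two-stage LD definition of independent significant SNPs (r² < 0.6) and lead SNPs (r² < 0.1). The Python sketch below uses a simple greedy pass in order of increasing P-value, which is a simplification of the clumping FUMA performs; the SNP names, P-values, and r² values used with it are invented for illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def greedy_ld_prune(snps, r2, max_r2):
    """Keep SNPs in order of increasing P, dropping any SNP whose r^2 with an
    already-kept SNP is >= max_r2.

    snps: list of (snp_id, p_value); r2: dict frozenset({a, b}) -> r^2
    (pairs absent from the dict are treated as r^2 = 0).
    """
    kept = []
    for snp, _p in sorted(snps, key=lambda item: item[1]):
        if all(r2.get(frozenset((snp, k)), 0.0) < max_r2 for k in kept):
            kept.append(snp)
    return kept


def independent_and_lead(snps, r2, p_max=5e-8):
    """Independent significant SNPs (r^2 < 0.6), then lead SNPs (r^2 < 0.1)."""
    significant = [s for s in snps if s[1] < p_max]
    independent = greedy_ld_prune(significant, r2, 0.6)
    lead = greedy_ld_prune([s for s in significant if s[0] in independent], r2, 0.1)
    return independent, lead
```

With SNPs A, B, and C (all genome-wide significant) where r²(A, B) = 0.8 and r²(A, C) = 0.3, B is absorbed by A at the first stage, while C remains an independent significant SNP but is not a separate lead SNP because its r² with A exceeds 0.1.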
Lead SNPs identified using FUMA [30] were subsequently used as covariates in conditional analysis performed in PLINK [24] to assess the independence of genome-wide significant insertions and deletions (INDELs) located within the same loci. Regional plots were generated using LocusZoom to visualize the LD structures surrounding significant variants [35].
PRS construction
Before PRS calculation, the entire dataset was randomly split into a training dataset (90%) and a test dataset (10%) while preserving the case–control ratio through stratified sampling. Given the modest sample size, we adopted a 90/10 split to maximize the samples available for model development while maintaining an independent dataset for final evaluation. This split was performed once and fixed throughout all subsequent analyses to avoid information leakage. PRSs for PAF were constructed using SNP effect size estimates (summary statistics) from a published AF GWAS [11]. The rationale for selecting this reference GWAS is detailed in the Results section. The framework of this study is shown in Fig 1. To evaluate the effect of GWAS thresholds and LD clumping on PRS, we created 30 SNP sets representing all combinations of three P-value thresholds (5 × 10⁻⁸, 1 × 10⁻⁵, and 1 × 10⁻⁴) and 10 squared correlation (r²) thresholds for clumping (no clumping and 0.9 to 0.1) within a 250-kb window. For each SNP set in the training and test datasets, we calculated the PRS for each individual as:
$$\mathrm{PRS} = \sum_{i=1}^{n} \beta_i G_i$$
where $\beta_i$ is the log odds ratio (effect size) for SNP $i$, taken from the summary statistics published in the AF GWAS [11]; $G_i$ is the genotype count (0, 1, or 2) for SNP $i$ in the individual evaluated in this study; and $n$ is the number of SNPs included in the SNP set.
The entire dataset was randomly split using stratified sampling into training (90%) and test (10%) datasets to maintain the case–control ratio. Within the training dataset, 30 sets of PRS models were constructed using all combinations of three P-value threshold settings (5 × 10⁻⁸, 1 × 10⁻⁵, and 1 × 10⁻⁴) and 10 linkage disequilibrium (r²) clumping threshold settings (no clumping and 0.9 to 0.1) based on summary statistics from a published AF GWAS. For each PRS set, three machine learning models (RF, XGB, and LGBM) were trained using five-fold stratified cross-validation. The best combination of PRS threshold setting and model was selected based on the mean cross-validated area under the curve and evaluated on the test dataset using 1,000 bootstrap resamples. Three types of models were developed: PRS-only, additive (PRS + clinical variables), and multiplicative (additive + interaction terms). To interpret model predictions, SHapley Additive exPlanations values were computed for the additive and multiplicative models to estimate the contribution of each feature to prediction outcomes. AF, atrial fibrillation; LGBM, light gradient boosting machine; PRS, polygenic risk score; RF, random forest; XGB, extreme gradient boosting.
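The PRS in the PRS construction subsection is a weighted sum of genotype counts over a thresholded SNP set. The Python sketch below illustrates the stratified 90/10 split, P-value thresholding of external summary statistics, and the weighted sum. LD clumping is omitted for brevity, and the data structures, SNP identifiers, and default random seed are hypothetical rather than taken from the study's code.

```python
import math
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.1, seed=0):
    """Return (train_idx, test_idx), keeping the case-control ratio in each part."""
    rng = random.Random(seed)          # hypothetical seed
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_frac)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


def select_snps(sumstats, p_threshold):
    """sumstats: dict snp_id -> (beta, p). Keep SNPs with p < p_threshold."""
    return {snp: beta for snp, (beta, p) in sumstats.items() if p < p_threshold}


def prs(genotypes, betas):
    """PRS = sum_i beta_i * G_i, with G_i in {0, 1, 2} for one individual."""
    return math.fsum(beta * genotypes[snp] for snp, beta in betas.items())
```

With the study's 1,038 cases and 744 controls, this split places 104 cases and 74 controls in the test set.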
Machine learning-based evaluation of PRS
In the training dataset, three ensemble-based machine learning models—random forest (RF), extreme gradient boosting (XGB), and light gradient boosting machine (LGBM)—were evaluated owing to their ability to model non-linear relationships, robustness to feature correlation, and compatibility with SHAP-based interpretation. These three algorithms were selected a priori as representative tree-ensemble methods for tabular clinical/genetic data, facilitating a consistent SHAP interpretability workflow using TreeExplainer across models. Single decision trees and non-tree-based models were not evaluated in this analysis. Three model types were evaluated: 1) PRS-only model; 2) additive model: PRS + clinical variables (sex, smoking, alcohol consumption, history of HT, DM, DL, and CI, and family history of AF); and 3) multiplicative model: additive model + interaction terms (PRS × each clinical variable). Interaction terms were specified a priori to assess potential effect modification of genetic risk by major clinical factors known to be relevant to AF. To limit model complexity and reduce the risk of overfitting, interactions were restricted to covariates with complete data (no missingness). Hyperparameter tuning was performed using Bayesian optimization with Optuna (Tree-structured Parzen Estimator; seed = 42) under five-fold stratified cross-validation, maximizing cross-validated area under the curve (CV-AUC); we ran 100 trials per algorithm per feature set. Hyperparameter search spaces for RF, XGB, and LGBM are provided in S1 Table. The model with the highest mean CV-AUC across the five folds was selected and tested on the test dataset. To assess robustness, 95% confidence intervals (95% CI) were calculated using 1,000 non-parametric bootstrap resamples from the test dataset. Performance metrics included AUC, area under the precision–recall curve (AUPRC), and F1 score, defined as the harmonic mean of precision and recall. 
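The three model types differ only in which columns enter the model. A minimal Python sketch of how the feature sets could be assembled is shown below. The column names are hypothetical, and because the study restricted interaction terms to covariates without missing data, the set of interaction covariates is left as a parameter rather than fixed.

```python
CLINICAL = ["sex", "smoking", "alcohol", "ht", "dm", "dl", "ci",
            "fh_maternal", "fh_paternal", "fh_other"]  # hypothetical names


def build_features(row, model_type, interaction_keys=CLINICAL):
    """Return the feature dict for one participant.

    row: dict with a 'prs' value and 0/1-coded clinical variables.
    model_type: 'prs_only', 'additive', or 'multiplicative'.
    """
    if model_type == "prs_only":
        return {"prs": row["prs"]}
    features = {"prs": row["prs"], **{k: row[k] for k in CLINICAL}}
    if model_type == "additive":
        return features
    if model_type == "multiplicative":
        # PRS x clinical-variable interaction terms added to the additive set
        features.update({f"prs_x_{k}": row["prs"] * row[k] for k in interaction_keys})
        return features
    raise ValueError(f"unknown model_type: {model_type!r}")
```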
As a sensitivity analysis, we also evaluated clinical-covariate–only models (excluding PRS) using the same fixed training/test split and evaluation pipeline.
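The test-set metrics can be sketched without any machine learning library. The Python code below computes the AUC as the Mann–Whitney probability that a randomly chosen case scores higher than a randomly chosen control, the F1 score from binary predictions, and a 95% CI from bootstrap resamples of the test set. The study does not state which bootstrap interval was used, so the percentile method here is an assumption, as is skipping resamples that contain only one class.

```python
import random


def auc(y_true, scores):
    """AUC = P(score_case > score_control), ties counted as 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall (0.0 if there are no true positives)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (resampling individuals with replacement)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:              # AUC needs both cases and controls
            stats.append(auc(yb, [scores[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]
```

The pairwise AUC is quadratic in the number of individuals, which is acceptable for a test set of a few hundred people.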
SHAP-based feature interpretation
To enhance model transparency and assess feature contribution, SHAP values were calculated to estimate the marginal contribution of each feature to individual predictions [19]. The TreeExplainer method was used for each model (RF, LGBM, and XGB). Summary plots were generated to visualize the distribution of feature importance across all samples. This interpretability analysis was applied to additive and multiplicative models.
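TreeExplainer computes SHAP values efficiently by exploiting tree structure. To make the underlying quantity concrete, the Python sketch below computes exact Shapley values by brute-force enumeration of feature subsets, with "absent" features replaced by a single baseline vector (for example, the training-set mean). This single-baseline, exponential-time formulation is a simplified stand-in chosen for illustration, not the TreeExplainer algorithm, and is practical only for a handful of features. The sketch also ranks features by mean absolute Shapley value, the quantity displayed in SHAP bar plots.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at input x.

    Features outside a coalition take their baseline value. Cost grows as 2^n.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]          # add feature i to the coalition
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def mean_abs_ranking(f, X, baseline, names):
    """Rank features by mean |Shapley value| across individuals."""
    rows = [shapley_values(f, x, baseline) for x in X]
    importance = {name: sum(abs(r[k]) for r in rows) / len(rows)
                  for k, name in enumerate(names)}
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
```

A useful check on any implementation is the efficiency property: the values for one individual sum to f(x) minus f(baseline).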
Statistical analysis
Continuous variables are summarized as mean ± standard deviation, and categorical variables as counts and percentages. Differences between PAF cases and controls were tested using Student’s t-test for continuous variables and Fisher’s exact test for categorical variables. General statistical analyses were conducted using R v4.4.1. All statistical tests were two-sided at the 0.05 significance level, unless stated otherwise. Machine learning and SHAP analyses were performed in Python v3.12.9 using scikit-learn v1.6.1 (including RF), LightGBM v4.6.0, XGBoost v3.0.0, and Optuna v4.3.0.
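Fisher's exact test for a 2 × 2 table uses only hypergeometric probabilities, so it can be sketched with the Python standard library. The two-sided P-value below sums the probabilities of all tables with the same margins that are no more likely than the observed one, with a small relative tolerance for ties, which mirrors the convention of R's fisher.test. This is an illustrative sketch, not the study's analysis code, and the tables used with it are textbook examples rather than study data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

For the classic table [[3, 1], [1, 3]], the two-sided P-value is 34/70 ≈ 0.486.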
Results
Quality control, imputation, and clinical characteristics
Of the 2,604 participants enrolled, 764 were excluded based on clinical criteria, and an additional 58 were removed during genotype-based quality control. Additional details on the specific exclusion reasons and quality control thresholds are provided in S1 Fig. Finally, 1,782 participants (1,038 PAF cases and 744 controls) were included in the GWAS and PRS analyses. At the SNP level, 215,273 variants were removed based on quality control filters, leaving 443,911 variants for the downstream analysis. After imputation filtering, 8,094,202 variants were selected for GWAS (S1 Fig). Baseline characteristics of the participants are summarized in Table 1. Compared with controls, PAF cases were more likely to be male (P = 0.009), to be older at the time of sample collection (66.4 ± 11.5 vs. 64.9 ± 13.6 years; P = 0.02), to consume alcohol (P < 0.001), and to have a family history of AF (maternal: P < 0.001; paternal: P = 0.002; other relatives: P < 0.001). No significant differences were observed for smoking or histories of HT, DM, DL, or CI. BMI at sample collection was comparable between PAF cases and controls. Age at sample collection was available for all participants, whereas age at PAF diagnosis was recorded only for a subset of cases. Among all PAF cases (n = 1,038), age at PAF diagnosis was available for 734 patients (70.7%). In this subset, the mean age at diagnosis was 65 ± 12 years, and 65% were aged ≥60 years.
GWAS analysis
GWAS identified 82 genome-wide significant variants (P < 5 × 10⁻⁸), all located on chromosome 4q25 (Fig 2A, S2 Table). An additional 78 variants showed suggestive significance (P < 1.0 × 10⁻⁵; S3 Table). A quantile–quantile plot indicated minimal genomic inflation (λ = 0.98), suggesting limited population stratification (Fig 2B). Gene-based MAGMA [31] analysis revealed no genes reaching genome-wide significance (threshold P < 2.6 × 10⁻⁶; Fig 2C), with λ = 1.05 (Fig 2D). Of the 82 variants, 72 were SNPs and 10 were INDELs. Seventy SNPs and all 10 INDELs were intergenic, located upstream of PITX2, whereas the remaining two SNPs were within LINC01438, a long intergenic non-coding RNA. Using FUMA [30], two independent significant intergenic SNPs (rs2200732 and rs13122916) were identified, with rs2200732 designated as the lead SNP (S2 Fig). Functional annotation of these two SNPs showed no deleterious potential based on CADD [32] or RegulomeDB scores [33], and neither was associated with PITX2 expression in cis-eQTL analyses. Chromatin interaction analysis identified three additional genome-wide significant SNPs (rs7434417, rs1906591, and rs6843082) in predicted enhancer regions that physically interacted with PITX2 regulatory elements (FDR < 1.0 × 10⁻⁶). These interactions were observed in mesenchymal stem cells and mesoderm-derived tissues. In conditional analysis using the lead SNPs as covariates, none of the 10 INDELs reached suggestive significance, indicating that they had no effects independent of these SNPs.
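The genomic inflation factor λ reported for the quantile–quantile plots is conventionally the median of the observed 1-df chi-square association statistics divided by its expected median under the null, (Φ⁻¹(0.75))² ≈ 0.4549. A standard-library Python sketch follows. It assumes the statistics are recovered from two-sided P-values; the text does not describe exactly how λ was computed in this study.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """lambda_GC = median observed 1-df chi-square / median expected under the null.

    p_values: two-sided association P-values in (0, 1].
    """
    z = NormalDist()
    # A 1-df chi-square statistic equals z^2, with |z| = |Phi^{-1}(p / 2)|
    chi2 = [z.inv_cdf(p / 2) ** 2 for p in p_values]
    expected_median = z.inv_cdf(0.75) ** 2   # about 0.4549
    return median(chi2) / expected_median
```

Uniformly distributed P-values give λ close to 1, and systematic inflation of the test statistics pushes λ above 1.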
(A) Manhattan plot of the SNP-based GWAS. The X‐axis represents the chromosomal location, and the Y‐axis represents the − log10 P‐value. The red line indicates the genome-wide significance threshold of P = 5 × 10−8 and the blue line indicates the suggestive threshold of P = 1 × 10−5. (B) Quantile–quantile plot of SNP-based GWAS. The red diagonal line represents the expected distribution of P-values under the null hypothesis of no association. (C) Manhattan plot of the gene-based GWAS. The X‐axis represents the chromosomal location, and the Y‐axis represents the − log10 P‐value. The red dashed line indicates the statistical significance threshold of P = 2.6 × 10−6, which is Bonferroni-corrected for the 18,900 protein-coding genes. (D) Quantile–quantile plot of gene-based GWAS. The red dashed diagonal line represents the expected distribution of P-values under the null hypothesis of no association. GWAS, genome-wide association study; PAF, paroxysmal atrial fibrillation; SNP, single-nucleotide polymorphism.
PRS performance
In our GWAS, all genome-wide significant variants were located at 4q25, near PITX2. To avoid overfitting and utilize well-powered effect estimates, PRSs were built from an independent large-scale Japanese AF GWAS [11] with no sample overlap. Thirty sets of PRS models were created, combining three P-value thresholds with 10 LD r² thresholds. Among PRS-only models, the highest 5-fold CV-AUCs were consistently achieved with a P-value threshold of 1.0 × 10⁻⁵ and an r² clumping threshold of 0.3, a set comprising 122 SNPs (S3 Fig). In the test dataset, the LGBM-based PRS-only model achieved the highest AUC (0.702; 95% CI: 0.624–0.770), followed by RF (0.695) and XGB (0.683) (Table 2). All methods showed comparable AUPRC and F1 scores (Table 2). The PRS was standardized using the mean and standard deviation of controls in the training cohort. In the test cohort, the standardized PRS had an overall mean (± standard deviation) of 0.22 ± 1.01. The mean PRS was significantly higher in cases than in controls (0.50 ± 1.01 vs. −0.16 ± 0.88; P = 5.55 × 10⁻⁶, t-test). Overall predictive accuracy, which reflects both calibration and discrimination, was assessed using the Brier score in the test dataset. The Brier scores were 0.224 for RF, 0.226 for XGB, and 0.228 for LGBM, indicating similar overall predictive error across models.
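The PRS standardization and the Brier score described above amount to a few lines of arithmetic. In the Python sketch below, the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not specify it, and the toy values used with it are illustrative only.

```python
from statistics import mean, stdev


def standardize_prs(train_prs, train_labels, prs_values):
    """Z-score PRS values using the mean and SD of training-set controls (label 0)."""
    controls = [p for p, y in zip(train_prs, train_labels) if y == 0]
    mu, sd = mean(controls), stdev(controls)
    return [(p - mu) / sd for p in prs_values]


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)
```

As a reference point, a model that always predicts the case proportion q has a Brier score of q(1 − q); with the overall proportion of cases in this study (1,038/1,782 ≈ 0.58), that value is about 0.24.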
For additive models incorporating clinical variables, the highest test AUC was achieved using XGB (0.737; 95% CI: 0.668–0.804), followed by RF (0.727) and LGBM (0.704) (Table 2). Minor improvements in AUPRC and F1 scores indicated modest benefits from adding clinical variables. As a sensitivity analysis, clinical-only models achieved AUCs of 0.709 (RF), 0.700 (XGB), and 0.671 (LGBM) in the test dataset (S4 Table). Compared to the clinical-only models, adding PRS to the clinical covariates yielded a modest but consistent improvement in test AUC (Table 2). Multiplicative models, including interaction terms between PRS and clinical variables, showed the best performance using XGB (AUC: 0.715; 95% CI: 0.643–0.785), followed by RF (0.703) and LGBM (0.700) (Table 2). Receiver operating characteristic curves for the test dataset are shown in Fig 3 for the PRS-only, additive, and multiplicative models.
Curves are shown for RF, XGB, and LGBM; AUCs (95% confidence intervals) are indicated in the legends. AUC, area under the curve; LGBM, light gradient boosting machine; RF, random forest; XGB, extreme gradient boosting.
SHAP-based feature contributions
To interpret model predictions, SHAP summary plots were generated for each machine learning method under both additive and multiplicative models. Fig 4 shows the global and local SHAP summary plots for the additive models using RF, XGB, and LGBM. In the global feature importance bar plots on the left, PRS consistently ranked as the most influential predictor across all models. The beeswarm plots (Fig 4, right) show the distribution of SHAP values for each feature, illustrating how variations in feature values influence individual predictions. The wide and symmetrical distribution of SHAP values for PRS indicates its strong and consistent effect on PAF risk prediction. Alcohol consumption and sex were the next most important features, although their contributions were smaller than that of PRS. As shown in the beeswarm plots, individuals who reported alcohol consumption exhibited higher SHAP values than non-drinkers, and men had higher predicted risks than women. Fig 5 shows the same plots for the multiplicative models using RF, XGB, and LGBM. In the global feature importance bar plots, PRS again emerged as the most influential predictor across all models. Several interaction terms, such as PRS × sex and PRS × DL, ranked among the top features; however, their contributions were consistently smaller than that of the main PRS effect. The beeswarm plots on the right display the distribution of SHAP values for both main and interaction features. Although PRS exhibited the widest and most symmetrical SHAP distribution, suggesting a strong and consistent impact on PAF risk prediction, interaction terms showed narrower distributions. For instance, in the XGB model, the mean absolute SHAP value for PRS was 0.34, whereas that for PRS × sex was 0.07, indicating a modest contribution of gene–environment interactions compared with the dominant genetic effect.
SHAP global feature importance bar plots (left) and local explanation summary beeswarm plots (right) are shown for (A) RF, (B) XGB, and (C) LGBM. In each model, features are ranked by their mean absolute SHAP values, with the same ranking applied in both bar and beeswarm plots. The bar plots indicate the average contribution of each feature to the magnitude of the model output. In the beeswarm plots, each dot represents one individual in the test dataset. The X-axis shows the SHAP value, indicating the positive or negative contribution of the features to the prediction. Colors represent feature values: red, high; blue, low. Vertical dispersion indicates the density of SHAP values across individuals. CI, cerebral infarction; DL, history of dyslipidemia; DM, history of diabetes mellitus; HT, history of hypertension; LGBM, light gradient boosting machine; Maternal, family history in the mother; Other, family history in other relatives; Paternal, family history in the father; PRS, polygenic risk score; RF, random forest; SHAP, SHapley Additive exPlanations; XGB, extreme gradient boosting.
SHAP global feature importance bar plots (left) and local explanation summary beeswarm plots (right) are shown for (A) RF, (B) XGB, and (C) LGBM. In each model, the top 20 features are ranked by their mean absolute SHAP values, with the same ranking applied in both bar and beeswarm plots. The bar plots indicate the average contribution of each feature to the magnitude of the model output. In the beeswarm plots, each dot represents one individual in the test dataset. The X-axis shows the SHAP value, indicating the positive or negative contribution of the features to the prediction. Colors represent feature values: red, high; blue, low. Vertical dispersion indicates the density of SHAP values across individuals. CI, cerebral infarction; DL, history of dyslipidemia; DM, history of diabetes mellitus; HT, history of hypertension; LGBM, light gradient boosting machine; Maternal, family history in the mother; Other, family history in other relatives; Paternal, family history in the father; PRS, polygenic risk score; RF, random forest; SHAP, SHapley Additive exPlanations; XGB, extreme gradient boosting.
Discussion
This study identified genetic variants associated with PAF in a Japanese population and evaluated the predictive utility of PRSs using multiple machine learning models. In addition to confirming the well-established PITX2 locus on chromosome 4q25 as a major determinant of AF, our findings offer insight into the genetic architecture and risk modeling of PAF as a distinct clinical subtype. This is among the few studies to focus specifically on PAF and integrate interpretable machine learning approaches to quantify the relative contributions of genetic and clinical factors.
Our GWAS identified 82 genome-wide significant variants, all on chromosome 4q25; 80 of these clustered upstream of PITX2, and two were located in LINC01438. This region has been repeatedly reported as the most robust genetic risk locus for AF across diverse populations, including Japanese cohorts [8,9,11,16,36–39]. In our cohort, two independent significant SNPs, rs2200732 and rs13122916, were identified, with rs2200732 as the lead SNP. A previous well-powered GWAS in a Japanese population identified three independently associated variants near PITX2 (rs2220427, rs6843082, and rs3853445) linked to AF susceptibility [11]. Other GWASs in the Japanese population have reported additional associated SNPs in this region, including rs4611994, rs1906617, and rs6817105 [16,38,39]. Although rs2200732 exceeded the genome-wide significance threshold (P < 5 × 10−8) in a previous study, it was not identified as an independent signal [11], suggesting that rs2200732 may be a proxy variant in linkage disequilibrium with previously reported AF-associated lead variants at the 4q25/PITX2 locus. Furthermore, the rs2200732 region is enriched with monomethylated histone H3 lysine 4, a marker indicative of enhancer activity [38]. However, enhancer reporter assays did not demonstrate significant differences in promoter activity between the risk and protective alleles of rs2200732 [38]. Neither rs2200732 nor rs13122916 showed evidence of deleteriousness or a cis-eQTL association with PITX2; however, chromatin interaction analysis revealed that nearby SNPs (rs7434417, rs1906591, and rs6843082) were located in enhancer regions that physically interacted with PITX2 regulatory domains in mesoderm-derived tissues. In particular, rs7434417 and rs6843082 have previously been reported as AF-associated SNPs [9,11,38], whereas rs1906591 has been associated with ischemic and cardioembolic stroke risk [40].
These findings suggest that chromatin architecture at the 4q25 locus mediates regulatory effects on PITX2 expression and highlight the need for further functional studies to elucidate the biological relevance of this region in PAF pathogenesis.
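Whether one SNP is a proxy for another is usually judged by their pairwise linkage disequilibrium (r2). As a minimal sketch, assuming phased haplotypes are not used and genotype dosages (0 to 2 copies of the risk allele) are available for the same individuals, r2 can be approximated as the squared Pearson correlation of the two dosage vectors. The function name `dosage_r2` and the toy values are hypothetical; this genotype-based estimate is not necessarily how LD was computed in this study or in [11].

```python
from statistics import fmean


def dosage_r2(g1, g2):
    """Approximate LD (r^2) between two SNPs as the squared Pearson
    correlation of their genotype dosages (0, 1, or 2 risk alleles,
    or fractional imputed dosages) measured in the same individuals.
    Genotype-based approximation, assumed for illustration only."""
    if len(g1) != len(g2) or len(g1) < 2:
        raise ValueError("need two equal-length dosage vectors (n >= 2)")
    m1, m2 = fmean(g1), fmean(g2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    var1 = sum((a - m1) ** 2 for a in g1)
    var2 = sum((b - m2) ** 2 for b in g2)
    if var1 == 0 or var2 == 0:
        # A monomorphic SNP has no variance, so r^2 is undefined; report 0.
        return 0.0
    return (cov * cov) / (var1 * var2)


# Made-up dosages for a lead SNP and a candidate proxy in five individuals.
lead = [0, 1, 2, 1, 0]
proxy = [0, 1, 2, 1, 1]
print(round(dosage_r2(lead, proxy), 3))  # 0.714
```

A high r2 between a variant and an established lead SNP is consistent with the variant tagging the same signal rather than representing an independent one, which is the interpretation offered above for rs2200732.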
We constructed 30 sets of PRS models and evaluated their predictive performance using three tree-based machine learning methods: RF, XGB, and LGBM. Although numerous PRSs have been developed for AF and significant associations between PRS and AF risk have been demonstrated, PRS alone has been noted to have limited clinical utility [15]. We compared three types of models: PRS-only models, additive models (PRS + clinical variables), and multiplicative models (additive model + interaction terms). The best-performing PRS-only models achieved AUCs exceeding 0.70 in the test dataset, higher than the AUCs of approximately 0.60 typically reported for AF-PRS models in large datasets such as the UK Biobank [15]. In a Japanese study, weighted genetic risk scores achieved an AUC of approximately 0.641 [16]. Our additive models showed modest improvements in performance, with AUCs of up to 0.737, whereas multiplicative models incorporating gene–environment interactions offered no additional gain. PRS was consistently the top-ranked feature across all models, suggesting that, among the predictors included in our models, genetic predisposition contributes substantially to PAF risk prediction even when major cardiometabolic comorbidities (hypertension, diabetes, and dyslipidemia) and lifestyle factors are taken into account. These findings highlight the importance of evaluating AF subtypes separately, as risk contributions may vary with clinical and pathophysiological profiles. Because PAF is often asymptomatic and episodic, and differs from persistent AF in its natural history and treatment response, a better understanding of its genetic determinants may facilitate earlier detection and personalized management.
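The 30 PRS sets and the three model types can be made concrete with a short sketch. It assumes that the 30 sets are the combinations listed in the S3 Fig legend (no clumping plus LD r2 thresholds from 0.9 to 0.1, assumed here to be in steps of 0.1, crossed with three P-value thresholds), and that the multiplicative model adds one PRS × covariate product per clinical variable. The column names in `CLINICAL` and the function `build_features` are hypothetical, and only a subset of the clinical variables is shown.

```python
from itertools import product

# PRS settings from the S3 Fig legend: LD r^2 clumping thresholds
# (None = "no clumping", then 0.9 down to 0.1) x three P-value thresholds.
# The 0.1 step size is an assumption consistent with the 30-set total.
P_THRESHOLDS = (5e-8, 1e-5, 1e-4)
R2_THRESHOLDS = (None,) + tuple(round(0.1 * k, 1) for k in range(9, 0, -1))
PRS_SETTINGS = list(product(R2_THRESHOLDS, P_THRESHOLDS))

# Hypothetical column names for a subset of the clinical covariates.
CLINICAL = ("sex", "alcohol", "HT", "DM", "DL")


def build_features(row, model_type):
    """Return the predictors for one participant under a given model type."""
    features = {"PRS": row["PRS"]}
    if model_type == "prs_only":
        return features
    # Additive model: PRS plus clinical variables as separate main effects.
    features.update({name: row[name] for name in CLINICAL})
    if model_type == "additive":
        return features
    if model_type == "multiplicative":
        # Additive model plus one PRS x covariate interaction per variable.
        for name in CLINICAL:
            features[f"PRS x {name}"] = row["PRS"] * row[name]
        return features
    raise ValueError(f"unknown model type: {model_type!r}")


participant = {"PRS": 1.2, "sex": 1, "alcohol": 1, "HT": 0, "DM": 0, "DL": 1}
print(len(PRS_SETTINGS))  # 30
print(sorted(build_features(participant, "multiplicative")))
```

Tree-based learners can in principle discover such interactions from the additive features on their own, which offers one possible reading of why explicit interaction terms added little here.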
A key strength of our study is the incorporation of SHAP to enhance model interpretability. SHAP provides individual-level feature attributions, enabling the direct comparison of feature contributions across models. Across all modeling approaches, SHAP consistently identified PRS as the most important predictor, with wide and symmetrical distributions indicating strong and consistent contributions to PAF risk prediction. This pattern was observed in both additive and multiplicative models, suggesting robustness to modeling strategy. Among clinical variables, alcohol consumption and sex emerged as the next most influential features in additive models, although their contributions were smaller than that of PRS. Individuals with a history of alcohol consumption exhibited elevated SHAP values, reflecting an increased predicted risk of PAF. Similarly, sex-related patterns indicated that male participants showed a higher predicted risk than female participants, consistent with prior AF epidemiological findings [41,42]. In contrast, interaction terms in multiplicative models, such as PRS × sex or PRS × DL, did not surpass the main effect of PRS in either SHAP values or model performance. For example, in the XGB multiplicative model, PRS had a mean absolute SHAP value of 0.34, whereas PRS × sex had only 0.07. This quantitative ranking reinforces the conclusion that the explicit modeling of gene–environment interactions offers limited additional value. Previous studies have shown that incorporating covariates, such as age, sex, BMI, and the CHARGE-AF score, can improve model performance, achieving C-indices or AUCs of approximately 0.72–0.83 [15,43,44]. Similar improvements have been observed in Japanese and East Asian cohorts after adjusting for age, sex, genotype array, BMI, hypertension, and major principal components [17,45]. Our results for PAF are consistent with these findings. 
However, because the data available at diagnosis were limited, we were unable to include age at onset, BMI, or the CHARGE-AF score, which have been shown to improve prediction accuracy in previous studies; including these variables could enhance future models. Moreover, previous studies have demonstrated an association between PRS and age at onset, reporting that individuals with higher PRS values tend to develop AF at younger ages [16,18,46,47]. In the present study, age at diagnosis was available for only a subset of participants, most of whom were diagnosed after the age of 60 years. Further studies that include younger patients with PAF are needed to evaluate age-specific predictive accuracy and clarify the role of PRS across age strata.
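The feature ranking discussed above (for example, a mean absolute SHAP value of 0.34 for PRS versus 0.07 for PRS × sex in the XGB multiplicative model) is a simple aggregation over a matrix of per-individual SHAP values, the same quantity shown in SHAP global importance bar plots. The sketch below assumes such a matrix (rows are individuals, columns are features) has already been produced by an explainer for a single model output; the function name and the toy values are hypothetical and are not study data.

```python
def rank_by_mean_abs_shap(shap_values, feature_names):
    """Rank features by mean absolute SHAP value across individuals.

    shap_values: one row per individual, one column per feature,
    for a single model output (e.g. the PAF class).
    """
    if not shap_values:
        raise ValueError("need at least one individual")
    n = len(shap_values)
    scores = {
        name: sum(abs(row[j]) for row in shap_values) / n
        for j, name in enumerate(feature_names)
    }
    # Highest mean |SHAP| first, as in SHAP global importance bar plots.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up SHAP values for three individuals (not study data).
toy = [
    [0.50, -0.10, 0.02],
    [-0.30, 0.05, -0.04],
    [0.20, -0.06, 0.00],
]
for name, score in rank_by_mean_abs_shap(toy, ["PRS", "sex", "PRS x sex"]):
    print(f"{name}: {score:.3f}")
```

Because the absolute value is taken before averaging, a feature whose contributions are large but split between positive and negative directions still ranks highly; the direction is what the beeswarm plots add.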
This study has several strengths, including the use of a well-characterized Japanese cohort, the application of multiple tree-based machine learning algorithms, and the incorporation of interaction modeling and SHAP-based interpretability analyses. However, certain limitations should be acknowledged. First, external validation in an independent Japanese cohort was not performed. Additionally, owing to the modest sample size, this study may have been underpowered to detect genome-wide significant loci beyond the 4q25/PITX2 region, particularly variants with small effect sizes or low allele frequencies. Larger studies and validation in independent cohorts will be required to confirm the reliability and reproducibility of our findings. Second, because the control group comprised individuals recruited from cardiology departments, it does not fully represent the general population, potentially introducing selection bias; consequently, the generalizability of our findings may be limited. In addition, because PAF can be intermittent and asymptomatic, undiagnosed PAF among controls cannot be completely excluded. Future studies incorporating population-based controls or external cohorts would help address this limitation and enable sensitivity analyses to assess the effect of control selection on risk estimation. Third, the PRS was derived from AF-associated loci identified in previous studies and may not fully capture the genetic risk specific to PAF. Future analyses using PAF-specific GWAS summary statistics, once available, may improve predictive accuracy. In addition, because variants with MAF < 0.01 were excluded during post-imputation quality control, we did not evaluate rare-variant associations; further studies will be needed to assess the contribution of rare variants to PAF. Fourth, the use of a single train/test data split restricts a comprehensive assessment of variability in model performance.
Additionally, because the test dataset comprised only 10% of the cohort, performance estimates may be unstable. Although five-fold cross-validation was applied within the training dataset for hyperparameter tuning, a fixed train/test data split was maintained to prevent information leakage and ensure fair comparisons across PRS thresholds and machine learning models. While this approach favors comparability, it may underrepresent performance variability. Even though we reported bootstrap-based confidence intervals, performance may still be sensitive to the specific random split. Future studies should assess model stability using repeated resampling or nested cross-validation in larger cohorts and, importantly, validate the models in independent external cohorts. Finally, as this study focused exclusively on a Japanese population, the external generalizability of our findings to other ethnic groups remains untested. Further studies are needed to validate these results in diverse populations to establish their broader applicability.
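To illustrate what a bootstrap confidence interval on a single, small test set does and does not capture, the sketch below resamples test-set individuals with replacement and reports a percentile interval for the AUC. The resampling count, CI method, seed, and function names are assumptions, since the number of resamples and the interval type are not stated here. The study used scikit-learn, whose `roc_auc_score` would normally compute the AUC; this sketch uses the equivalent pairwise (Mann-Whitney) definition so that it needs no third-party packages.

```python
import random


def auc(y_true, scores):
    """AUC as the probability that a random case scores higher than a
    random control (ties count 1/2), i.e. the Mann-Whitney form."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC requires at least one case and one control")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap CI from resampling the test set.
    Labels are assumed to be 0 (control) and 1 (case)."""
    point = auc(y_true, scores)  # also checks that both classes are present
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y_b = [y_true[i] for i in idx]
        if 0 < sum(y_b) < n:  # skip resamples that lack one class
            stats.append(auc(y_b, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[round(alpha / 2 * (n_boot - 1))]
    hi = stats[round((1 - alpha / 2) * (n_boot - 1))]
    return point, (lo, hi)


# Made-up labels and predicted probabilities for a tiny test set.
y_test = [0, 0, 1, 0, 1, 1, 0, 1, 1, 0]
s_test = [0.1, 0.4, 0.35, 0.2, 0.8, 0.7, 0.3, 0.9, 0.6, 0.5]
print(bootstrap_auc_ci(y_test, s_test, n_boot=200))
```

Note that this interval reflects sampling variability of the fixed test set only; it does not capture the variability arising from how the data were split, which is why repeated resampling or nested cross-validation is suggested above.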
From a clinical perspective, our findings suggest that PRS could serve as a valuable component in the early risk stratification of PAF, particularly among individuals who do not exhibit conventional clinical risk factors. Given the intermittent and frequently asymptomatic nature of PAF, genetics-based risk models may facilitate proactive screening strategies and enable earlier detection and intervention.
Conclusions
The present study demonstrated that variants near the PITX2 locus strongly influenced PAF risk in the Japanese population and that PRSs derived from AF-associated loci offered robust predictive value even when used alone. Incorporating clinical variables in additive models yielded modest improvements, whereas multiplicative models offered minimal incremental benefit. SHAP-based interpretability analyses consistently identified PRS as more influential than any of the clinical variables included in our models. However, some established clinical predictors, particularly BMI at the time of PAF diagnosis and validated clinical risk scores, were not available for all participants. Incorporating these parameters into future models and validating them in external cohorts could enhance individual-level risk prediction and increase clinical relevance.
Supporting information
S1 Fig. Flowchart of Participant and Variant Selection during Quality Control and Genotype Imputation.
In total, 2,604 participants were genotyped. Of these, 764 participants were excluded based on clinical criteria and 58 based on genotype quality control, leaving 1,782 participants (cases: 1,038; controls: 744) for the final analysis. After 215,273 variants were removed based on quality control criteria, 443,911 SNPs were retained for these participants. Following subsequent genotype imputation using the 1000 Genomes Phase 3 reference panel [25], variants with low imputation quality (Rsq < 0.3), low minor allele frequency (<0.01), or duplication were excluded. Finally, 8,094,202 SNPs were retained for GWAS and PRS analyses. AF, atrial fibrillation; CAD, coronary artery disease; HF, heart failure; PAF, paroxysmal atrial fibrillation; PCA, principal component analysis; PRS, polygenic risk score. a: 5 cases and 2 controls with 2 or 3 overlapping exclusions.
https://doi.org/10.1371/journal.pone.0344360.s001
(PDF)
S2 Fig. Regional Plot of Chromosome 4 with Surrounding Independent Significant SNPs.
Independent significant SNPs are represented using purple diamonds. Colors indicate linkage disequilibrium (r2) with independent significant SNPs. (A) rs2200732. (B) rs13122916. SNP, single-nucleotide polymorphism.
https://doi.org/10.1371/journal.pone.0344360.s002
(PDF)
S3 Fig. Cross-validated AUC Heatmaps for PRS-only Models using Three Machine Learning Methods.
Heatmaps showing the cross-validated AUC values for PRS-only models across combinations of P-value and linkage disequilibrium r2 thresholds. Three machine learning models are shown: RF, XGB, and LGBM. Each heatmap displays AUC values derived from five-fold cross-validation in the training dataset. Rows correspond to linkage disequilibrium r2 thresholds (no clumping and 0.9 to 0.1) and columns represent P-value thresholds (5 × 10−8, 1 × 10−5, and 1 × 10−4). Darker shades indicate higher AUCs. The numeric values within each cell represent the mean AUC obtained for that specific parameter combination. AUC, area under the curve; LGBM, light gradient boosting machine; PRS, polygenic risk score; RF, random forest; XGB, extreme gradient boosting.
https://doi.org/10.1371/journal.pone.0344360.s003
(PDF)
S1 Table. Hyperparameter Search Spaces and Fixed Settings.
Hyperparameter tuning was performed only on the training dataset, using Bayesian optimization with Optuna (Tree-structured Parzen Estimator; five-fold stratified cross-validation; 100 trials per algorithm; seed = 42) to maximize the cross-validated area under the curve. LGBM, light gradient boosting machine; RF, random forest; XGB, extreme gradient boosting.
https://doi.org/10.1371/journal.pone.0344360.s004
(XLSX)
S2 Table. Variants associated with Paroxysmal Atrial Fibrillation reaching Genome-wide Significance.
A1, risk allele; A2, reference allele; BETA, regression coefficient; Chr, Chromosome; L95, lower 95% confidence interval; U95, upper 95% confidence interval. Chromosome position (GRCh37/hg19).
https://doi.org/10.1371/journal.pone.0344360.s005
(XLSX)
S3 Table. Variants associated with Paroxysmal Atrial Fibrillation reaching Suggestive Significance.
A1, risk allele; A2, reference allele; BETA, regression coefficient; Chr, Chromosome; L95, lower 95% confidence interval; U95, upper 95% confidence interval. Chromosome position (GRCh37/hg19).
https://doi.org/10.1371/journal.pone.0344360.s006
(XLSX)
S4 Table. Performance of clinical-covariate–only models (excluding PRS) in the test dataset.
AUC, area under the curve; AUPRC, area under the precision-recall curve; CI, confidence interval; LGBM, light gradient boosting machine; RF, random forest; XGB, extreme gradient boosting. The F1 score was calculated as 2 × (precision × recall)/(precision + recall). All metrics were computed using scikit-learn v1.6.1 in Python v3.12.9.
https://doi.org/10.1371/journal.pone.0344360.s007
(XLSX)
Acknowledgments
We would like to express our gratitude to all the patients and their families. We thank Makiko Matsuda and Takamasa Ichikawa for their kind support and advice regarding this study.
References
- 1. Akao M, Chun Y-H, Wada H, Esato M, Hashimoto T, Abe M, et al. Current status of clinical background of patients with atrial fibrillation in a community-based survey: the Fushimi AF Registry. J Cardiol. 2013;61(4):260–6. pmid:23403369
- 2. Kato T, Yamashita T, Sagara K, Iinuma H, Fu L-T. Progressive nature of paroxysmal atrial fibrillation. Observations from a 14-year follow-up study. Circ J. 2004;68(6):568–72. pmid:15170094
- 3. de Vos CB, Pisters R, Nieuwlaat R, Prins MH, Tieleman RG, Coelen R-JS, et al. Progression from paroxysmal to persistent atrial fibrillation clinical correlates and prognosis. J Am Coll Cardiol. 2010;55(8):725–31. pmid:20170808
- 4. Sposato LA, Klein FR, Jáuregui A, Ferrúa M, Klin P, Zamora R, et al. Newly diagnosed atrial fibrillation after acute ischemic stroke and transient ischemic attack: importance of immediate and prolonged continuous cardiac monitoring. J Stroke Cerebrovasc Dis. 2012;21(3):210–6. pmid:20727789
- 5. Rizos T, Wagner A, Jenetzky E, Ringleb PA, Becker R, Hacke W, et al. Paroxysmal atrial fibrillation is more prevalent than persistent atrial fibrillation in acute stroke and transient ischemic attack patients. Cerebrovasc Dis. 2011;32(3):276–82. pmid:21893980
- 6. Joglar JA, Chung MK, Armbruster AL, Benjamin EJ, Chyou JY, Cronin EM, et al. 2023 ACC/AHA/ACCP/HRS Guideline for the Diagnosis and Management of Atrial Fibrillation: A Report of the American College of Cardiology/American Heart Association Joint Committee on Clinical Practice Guidelines. Circulation. 2024;149(1):e1–156. pmid:38033089
- 7. Darbar D, Herron KJ, Ballew JD, Jahangir A, Gersh BJ, Shen W-K, et al. Familial atrial fibrillation is a genetically heterogeneous disorder. J Am Coll Cardiol. 2003;41(12):2185–92. pmid:12821245
- 8. Gudbjartsson DF, Arnar DO, Helgadottir A, Gretarsdottir S, Holm H, Sigurdsson A, et al. Variants conferring risk of atrial fibrillation on chromosome 4q25. Nature. 2007;448(7151):353–7. pmid:17603472
- 9. Roselli C, Chaffin MD, Weng L-C, Aeschbacher S, Ahlberg G, Albert CM, et al. Multi-ethnic genome-wide association study for atrial fibrillation. Nat Genet. 2018;50(9):1225–33. pmid:29892015
- 10. Vinciguerra M, Dobrev D, Nattel S. Atrial fibrillation: pathophysiology, genetic and epigenetic mechanisms. Lancet Reg Health Eur. 2024;37:100785. pmid:38362554
- 11. Low S-K, Takahashi A, Ebana Y, Ozaki K, Christophersen IE, Ellinor PT, et al. Identification of six new genetic loci associated with atrial fibrillation in the Japanese population. Nat Genet. 2017;49(6):953–8. pmid:28416822
- 12. Schulz C, Lemoine MD, Mearini G, Koivumäki J, Sani J, Schwedhelm E, et al. PITX2 Knockout Induces Key Findings of Electrical Remodeling as Seen in Persistent Atrial Fibrillation. Circ Arrhythm Electrophysiol. 2023;16(3):e011602. pmid:36763906
- 13. Kim K, Blackwell DJ, Yuen SL, Thorpe MP, Johnston JN, Cornea RL, et al. The selective RyR2 inhibitor ent-verticilide suppresses atrial fibrillation susceptibility caused by Pitx2 deficiency. J Mol Cell Cardiol. 2023;180:1–9. pmid:37080450
- 14. Khera AV, Chaffin M, Aragam KG, Haas ME, Roselli C, Choi SH, et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat Genet. 2018;50(9):1219–24. pmid:30104762
- 15. Gibson JT, Rudd JHF. Polygenic risk scores in atrial fibrillation: Associations and clinical utility in disease prediction. Heart Rhythm. 2024;21(6):913–8. pmid:38336192
- 16. Liu L, Ebana Y, Nitta J-I, Takahashi Y, Miyazaki S, Tanaka T, et al. Genetic Variants Associated With Susceptibility to Atrial Fibrillation in a Japanese Population. Can J Cardiol. 2017;33(4):443–9. pmid:28129963
- 17. Okubo Y, Nakano Y, Ochi H, Onohara Y, Tokuyama T, Motoda C, et al. Predicting atrial fibrillation using a combination of genetic risk score and clinical risk factors. Heart Rhythm. 2020;17(5 Pt A):699–705. pmid:31931171
- 18. Miyazawa K, Ito K, Ito M, Zou Z, Kubota M, Nomura S, et al. Cross-ancestry genome-wide analysis of atrial fibrillation unveils disease biology and enables cardioembolic risk prediction. Nat Genet. 2023;55(2):187–97. pmid:36653681
- 19. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. In: NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017. 4768–77.
- 20. Takase B, Ikeda T, Shimizu W, Abe H, Aiba T, Chinushi M, et al. JCS/JHRS 2022 Guideline on Diagnosis and Risk Assessment of Arrhythmia. Circ J. 2024;88(9):1509–95. pmid:37690816
- 21. Umemura S, Arima H, Arima S, Asayama K, Dohi Y, Hirooka Y, et al. The Japanese Society of Hypertension Guidelines for the Management of Hypertension (JSH 2019). Hypertens Res. 2019;42(9):1235–481. pmid:31375757
- 22. Araki E, Goto A, Kondo T, Noda M, Noto H, Origasa H, et al. Japanese Clinical Practice Guideline for Diabetes 2019. Diabetol Int. 2020;11(3):165–223. pmid:32802702
- 23. Okamura T, Tsukamoto K, Arai H, Fujioka Y, Ishigaki Y, Koba S, et al. Japan Atherosclerosis Society (JAS) Guidelines for Prevention of Atherosclerotic Cardiovascular Diseases 2022. J Atheroscler Thromb. 2024;31(6):641–853. pmid:38123343
- 24. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–75. pmid:17701901
- 25. 1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, et al. A global reference for human genetic variation. Nature. 2015;526(7571):68–74. pmid:26432245
- 26. Delaneau O, Marchini J, Zagury J-F. A linear complexity phasing method for thousands of genomes. Nat Methods. 2011;9(2):179–81. pmid:22138821
- 27. Das S, Forer L, Schönherr S, Sidore C, Locke AE, Kwong A, et al. Next-generation genotype imputation service and methods. Nat Genet. 2016;48(10):1284–7. pmid:27571263
- 28. Oscanoa J, Sivapalan L, Gadaleta E, Dayem Ullah AZ, Lemoine NR, Chelala C. SNPnexus: a web server for functional annotation of human genome sequence variation (2020 update). Nucleic Acids Res. 2020;48(W1):W185–92. pmid:32496546
- 29. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001;29(1):308–11. pmid:11125122
- 30. Watanabe K, Taskesen E, van Bochoven A, Posthuma D. Functional mapping and annotation of genetic associations with FUMA. Nat Commun. 2017;8(1):1826. pmid:29184056
- 31. de Leeuw CA, Mooij JM, Heskes T, Posthuma D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput Biol. 2015;11(4):e1004219. pmid:25885710
- 32. Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 2014;46(3):310–5. pmid:24487276
- 33. Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M, et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 2012;22(9):1790–7. pmid:22955989
- 34. van Berkum NL, Lieberman-Aiden E, Williams L, Imakaev M, Gnirke A, Mirny LA, et al. Hi-C: a method to study the three-dimensional architecture of genomes. J Vis Exp. 2010;(39):1869. pmid:20461051
- 35. Pruim RJ, Welch RP, Sanna S, Teslovich TM, Chines PS, Gliedt TP, et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics. 2010;26(18):2336–7. pmid:20634204
- 36. Christophersen IE, Rienstra M, Roselli C, Yin X, Geelhoed B, Barnard J, et al. Large-scale analyses of common and rare variants identify 12 new loci associated with atrial fibrillation. Nat Genet. 2017;49(6):946–52. pmid:28416818
- 37. Lubitz SA, Lunetta KL, Lin H, Arking DE, Trompet S, Li G, et al. Novel genetic markers associate with atrial fibrillation risk in Europeans and Japanese. J Am Coll Cardiol. 2014;63(12):1200–10. pmid:24486271
- 38. Ebana Y, Ozaki K, Liu L, Hachiya H, Hirao K, Isobe M, et al. Clinical utility and functional analysis of variants in atrial fibrillation-associated locus 4q25. J Cardiol. 2017;70(4):366–73. pmid:28087289
- 39. Ellinor PT, Lunetta KL, Albert CM, Glazer NL, Ritchie MD, Smith AV, et al. Meta-analysis identifies six new susceptibility loci for atrial fibrillation. Nat Genet. 2012;44(6):670–5. pmid:22544366
- 40. Sun L, Zhang Z, Xu J, Xu G, Liu X. Chromosome 4q25 Variants rs2200733, rs10033464, and rs1906591 Contribute to Ischemic Stroke Risk. Mol Neurobiol. 2016;53(6):3882–90. pmid:26162320
- 41. Shantsila E, Choi E-K, Lane DA, Joung B, Lip GYH. Atrial fibrillation: comorbidities, lifestyle, and patient factors. Lancet Reg Health Eur. 2024;37:100784. pmid:38362547
- 42. Magnussen C, Niiranen TJ, Ojeda FM, Gianfagna F, Blankenberg S, Njølstad I, et al. Sex Differences and Similarities in Atrial Fibrillation Epidemiology, Risk Factors, and Mortality in Community Cohorts: Results From the BiomarCaRE Consortium (Biomarker for Cardiovascular Risk Assessment in Europe). Circulation. 2017;136(17):1588–97. pmid:29038167
- 43. Homburger JR, Neben CL, Mishne G, Zhou AY, Kathiresan S, Khera AV. Low coverage whole genome sequencing enables accurate assessment of common variants and calculation of genome-wide polygenic scores. Genome Med. 2019;11(1):74. pmid:31771638
- 44. Khurshid S, Mars N, Haggerty CM, Huang Q, Weng L-C, Hartzel DN, et al. Predictive Accuracy of a Clinical and Genetic Risk Model for Atrial Fibrillation. Circ Genom Precis Med. 2021;14(5):e003355. pmid:34463125
- 45. Tanigawa Y, Qian J, Venkataraman G, Justesen JM, Li R, Tibshirani R, et al. Significant sparse polygenic risk scores across 813 traits in UK Biobank. PLoS Genet. 2022;18(3):e1010105. pmid:35324888
- 46. Nielsen JB, Fritsche LG, Zhou W, Teslovich TM, Holmen OL, Gustafsson S, et al. Genome-wide Study of Atrial Fibrillation Identifies Seven Risk Loci and Highlights Biological Pathways and Regulatory Elements Involved in Cardiac Development. Am J Hum Genet. 2018;102(1):103–15. pmid:29290336
- 47. Mars N, Koskela JT, Ripatti P, Kiiskinen TTJ, Havulinna AS, Lindbohm JV, et al. Polygenic and clinical risk scores and their impact on age at onset and prediction of cardiometabolic diseases and common cancers. Nat Med. 2020;26(4):549–57. pmid:32273609