Obesity and risk of female reproductive conditions: A Mendelian randomisation study

Background
Obesity is observationally associated with altered risk of many female reproductive conditions. These include polycystic ovary syndrome (PCOS), abnormal uterine bleeding, endometriosis, infertility, and pregnancy-related disorders. However, the roles and mechanisms of obesity in the aetiology of reproductive disorders remain unclear. Thus, we aimed to estimate observational and genetically predicted causal associations between obesity, metabolic hormones, and female reproductive disorders.

Methods and findings
Logistic regression, generalised additive models, and Mendelian randomisation (MR) (2-sample, non-linear, and multivariable) were applied to obesity and reproductive disease data on up to 257,193 women of European ancestry in UK Biobank and publicly available genome-wide association studies (GWASs). Body mass index (BMI), waist-to-hip ratio (WHR), and WHR adjusted for BMI were observationally (odds ratios [ORs] = 1.02–1.87 per 1-SD increase in obesity trait) and genetically (ORs = 1.06–2.09) associated with uterine fibroids (UF), PCOS, heavy menstrual bleeding (HMB), and pre-eclampsia. Genetically predicted visceral adipose tissue (VAT) mass was associated with the development of HMB (OR [95% CI] per 1-kg increase in predicted VAT mass = 1.32 [1.06–1.64], P = 0.0130), PCOS (OR [95% CI] = 1.15 [1.08–1.23], P = 3.24 × 10⁻⁰⁵), and pre-eclampsia (OR [95% CI] = 3.08 [1.98–4.79], P = 6.65 × 10⁻⁰⁷). Increased waist circumference posed a higher genetic risk (ORs = 1.16–1.93) for the development of these disorders and UF than did increased hip circumference (ORs = 1.06–1.10). Leptin, fasting insulin, and insulin resistance each mediated between 20% and 50% of the total genetically predicted association of obesity with pre-eclampsia. Reproductive conditions clustered based on shared genetic components of their aetiological relationships with obesity. This study was limited in power by the low prevalence of female reproductive conditions among women in the UK Biobank, with little information on pre-diagnostic anthropometric traits, and by the susceptibility of MR estimates to genetic pleiotropy.

Conclusions
In a systematic, large-scale genetics-based analysis of the aetiological relationships between obesity and female reproductive conditions, we found that common indices of overall and central obesity were associated with increased risks of reproductive disorders to heterogeneous extents. Our results suggest the utility of exploring the mechanisms mediating the causal associations of overweight and obesity with gynaecological health to identify targets for disease prevention and treatment.


Introduction
Obesity is commonly understood as the excess accumulation of body fat, which leads to increased health risks. In women, increased body mass index (BMI) is associated with increased prevalence of gynaecological conditions, including excessive and abnormal menstrual bleeding [1,2], endometriosis and uterine fibroids (UF) [3,4], polycystic ovary syndrome (PCOS) [5,6], complications of pregnancy such as pre-eclampsia and eclampsia [7], miscarriage [8,9], and infertility [10,11]. These are often non-linear and heterogeneous relationships. While the risks of anovulatory infertility and recurrent miscarriages are highest in obese women, underweight women also have increased risk of infertility [9,12]. The association of BMI with endometriosis varies by disease severity, as women with advanced-stage endometriosis have lower BMI than those with minimal disease, and the inverse BMI-endometriosis association is stronger in women with infertility [13,14]. Finally, although the severity of PCOS and menstrual disorders increases with overall obesity, women presenting with these conditions are more likely to store fat in the abdominal region, regardless of their BMI [2,5].
Previous estimates of the associations of obesity with reproductive conditions have primarily been based on observational study designs including case-control studies [8,15,16] and cross-sectional studies in randomly selected women [1], or cross-sectional studies conducted only in women with obesity [2]. While the direction of effect is largely consistent across studies, the heterogeneity in selection of cases, controls, and populations observed in these studies is reflected in heterogeneity in the effect estimates. Further, observational epidemiological studies are limited in assessing causality, due to confounding and reverse causation. The Mendelian randomisation (MR) framework is a genetics-based instrumental variable approach that relies on the random and fixed assignment of genetic variants at conception to estimate the causal effect size of genetically predicted exposures on an outcome. MR has previously indicated causal associations of genetically predicted BMI with some subtypes of ovarian cancer (OR = 1.29 per 5 units of BMI) [17], endometrial cancer (OR = 2.06 per 5 units of BMI) [16], and PCOS (OR = 4.89 per 1 SD higher BMI) [18]. However, the aetiological role of obesity and body fat distribution in many other female reproductive diseases has not been reported. It is especially relevant to investigate the effects of fat distribution, as there are intricate metabolic and endocrine links between adipose tissue and female reproductive organs. Yet, causal investigations of such relationships are lacking.
Leptin, which is a hormone secreted by adipocytes and elevated in individuals with obesity, is increased in women with endometriosis, UF, and hypertensive disorders of pregnancy, even when analyses are adjusted for BMI [7,19-22]. Obesity-induced insulin resistance additionally increases the risk and severity of PCOS and pre-eclampsia by dysregulating steroid hormone and metabolic pathways [5,23,24]. The dysregulation of sex hormones, including oestrogen and testosterone, is likely to play a role in the obesity-driven development of female reproductive disorders due to its close associations with body fat [23,25]. Yet, to the best of our knowledge, the causal impact of these factors in mediating the relationships between obesity and gynaecological diseases has not been detailed.
Here, we apply logistic regression, generalised additive models (GAMs), and 2-sample, non-linear, and multivariable MR to dissect the relationships of overall obesity and body fat distribution with a range of female reproductive disorders, and to investigate the mediating role of metabolic factors including leptin and insulin.

Observational associations in UK Biobank
UK Biobank (UKBB) is a prospective UK-based cohort study with approximately 500,000 participants aged 40-69 years at recruitment on whom a range of medical, environmental, and genetic information is collected [26]. We included 257,193 individuals self-identifying as females of white ancestry in UKBB in our analyses. Baseline measurements of BMI (total body weight [kg]/standing height squared [m²]) and waist-to-hip ratio (WHR) (waist circumference [WC] [cm]/hip circumference [HC] [cm]), and WHR adjusted for BMI (WHRadjBMI) were used to estimate general obesity (BMI) and central obesity (WHR and WHRadjBMI). In response to peer review comments, comparative body size at age 10 years (as self-reported in a questionnaire with the options 'thinner', 'plumper', or 'about average') was used to estimate adiposity at an earlier time point, i.e., before diagnosis of most reproductive disorders. Cases of reproductive conditions were identified based on ICD-9 and ICD-10 primary and secondary diagnoses from hospital inpatient data, self-reported illness codes, and primary care records (Table 1). We fitted logistic regression models to estimate the associations of BMI, WHR, and WHRadjBMI with prevalence of endometriosis (7,703 cases, 249,490 controls); heavy menstrual bleeding (HMB) (17,229 cases, 239,964 controls); infertility (2,194 cases, 254,999 controls); self-reported stillbirth, spontaneous miscarriage, or termination (81,102 cases, 176,091 controls); PCOS (746 cases, 256,447 controls); pre-eclampsia (2,242 cases, 254,951 controls); and UF (19,192 cases, 238,001 controls). Case definitions for pre-eclampsia included eclampsia cases to capture cases in which the former may have developed into the latter. For each disease, individuals not included in the case group were used as controls. BMI, WHR, and WHRadjBMI were adjusted for age, age squared, assessment centre, and smoking status. The residuals were rank-based inverse normally transformed.
Multiple testing correction for 21 tests (3 exposures × 7 outcomes) was applied using the false discovery rate (FDR) to evaluate statistical significance while minimising false negatives [27,28].
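The transformation and multiple-testing steps described above can be sketched in Python. The original analyses were run in R, so this is an illustrative sketch rather than the authors' code. The function names `rank_inverse_normal` and `benjamini_hochberg`, the rank offset `c = 0.5`, the tie-breaking rule, and the choice of the Benjamini-Hochberg step-up procedure as the FDR method are all assumptions made for illustration; the inputs are toy values, not study data.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=0.5):
    """Map values to standard normal quantiles by rank.

    Assumption: offset c = 0.5 and ties broken by input order; the
    source does not state which variant of the transform was used.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    residuals = [3.0, 1.0, 2.0]  # toy covariate-adjusted residuals
    print(rank_inverse_normal(residuals))
    pvals = [0.01, 0.04, 0.03, 0.005]  # toy P values
    print(benjamini_hochberg(pvals))
```

In the setting described here, the list passed to `benjamini_hochberg` would hold the 21 P values (3 exposures × 7 outcomes), and an association would be called significant when its adjusted value falls below the chosen FDR threshold.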
We also tested associations without adjustment for smoking status, as it has previously been suggested that higher BMI increases risk of smoking [29] and adjustment for both could therefore induce collider bias. Adjustment for menopause status was not performed as up to 42% of women with reproductive disorders in UKBB report being unsure of their menopause status, as compared to 16% of women who do not have a recorded history or presence of a reproductive condition (Table 1).
To evaluate the presence of non-linear observational associations between obesity and each reproductive trait, fractional polynomial regression following the closed test procedure was performed using the mfp v1.5.2 R package [30]. This algorithm tests for the presence of an overall association, determines the likelihood of non-linearity, and selects the best-fitting fractional polynomial function. We also fitted GAMs to the same data, allowing for smoothing of the obesity trait with splines, using the mgcv 1.8-31 R package [31]. These models allow a greater degree of flexibility in modelling curves that cannot be represented by polynomials of the nth degree; however, they are also more complex and thus less immediately interpretable [32]. All models were adjusted for age, age squared, assessment centre, and smoking status. Model fits were compared with Akaike's information criterion (AIC), with lower AIC indicating better fit [33]. As a data-driven investigation, the analyses upon which this paper is based were planned and designed for each of the sections independently; no overall prospective analysis plan was followed. Analyses dependent upon results from other sections, such as the mediation analysis dependent on associations from 2-sample MR, are noted as such within each section. This study is reported as per the Strengthening the Reporting of Observational Studies in Epidemiology Using Mendelian Randomization (STROBE-MR) guideline (S1 Checklist).

Two-sample MR
Genetic instruments for BMI, WHR, and WHRadjBMI were selected based on the sentinel variants at genome-wide significant loci (P < 5 × 10⁻⁹ to account for denser imputation data) reported in the largest publicly available European ancestry genome-wide association studies (GWASs), which are meta-analyses of the Genetic Investigation of ANthropometric Traits (GIANT) consortium and UKBB [34]. Three instrument weighting strategies were considered where sex-stratified GWAS results were available: (i) SNPs from combined-sex GWASs with combined-sex weights (effect sizes), (ii) combined-sex SNPs with female-specific weights, and (iii) female-specific SNPs with female-specific weights. The method of female-specific SNPs with female-specific weights produced the strongest instruments as evaluated by F-statistics, i.e., mean β²/σ² over all SNPs in the instrument, and was thus chosen for analysis (S1 Table). Additionally, due to concerns of ascertainment bias in UKBB [38,39], sensitivity analyses with combined-sex instruments (combined-sex SNPs with combined-sex weights) were also performed. In response to peer review comments, an additional sensitivity analysis with SNPs from previous GIANT releases for BMI [40] and WHR and WHRadjBMI [41] that do not include UKBB participants was also performed to alleviate potential ascertainment bias. Finally, to alleviate concerns of collider bias in the WHRadjBMI GWAS [42], we constructed a joint WHR and BMI instrument to perform multivariable MR.
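The instrument-strength comparison above rests on the mean of the per-SNP ratio β²/σ². A minimal Python sketch follows, assuming each instrument is supplied as two equal-length lists of SNP-exposure effect sizes and standard errors. The function name `mean_f_statistic`, the strategy labels, and the toy numbers are hypothetical and are not taken from the study.

```python
def mean_f_statistic(betas, standard_errors):
    """Mean of per-SNP F-statistics, beta^2 / se^2, across an instrument."""
    if not betas or len(betas) != len(standard_errors):
        raise ValueError("need equal-length, non-empty inputs")
    per_snp = [(b / se) ** 2 for b, se in zip(betas, standard_errors)]
    return sum(per_snp) / len(per_snp)


if __name__ == "__main__":
    # Toy SNP-exposure effects and standard errors (not study data).
    strategies = {
        "combined_snps_combined_weights": ([0.020, 0.030], [0.004, 0.005]),
        "female_snps_female_weights": ([0.030, 0.040], [0.004, 0.005]),
    }
    scores = {
        name: mean_f_statistic(b, se) for name, (b, se) in strategies.items()
    }
    # The strategy with the largest mean F-statistic is the strongest instrument.
    print(max(scores, key=scores.get), scores)
```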
Associations of the genetic instruments for obesity traits with female reproductive diseases were obtained by performing a fixed-effect inverse-variance-weighted (IVW) meta-analysis of publicly available GWAS summary statistics from 2 large biobank projects, FinnGen and UKBB [43]. The meta-analysis was performed using METAL [44] by matching the relevant ICD codes (S2 Table) for the following traits: infertility (4,996 cases, 421,223 controls), pre-eclampsia (2,711 cases, 480,373 controls), and UF (21,835 cases, 456,551 controls). For endometriosis, summary statistics were obtained by request from a recent European ancestry GWAS [45] and meta-analysed as above with publicly available FinnGen and UKBB summary statistics (12,210 cases, 450,183 controls). For HMB (9,813 cases, 210,946 controls), sporadic miscarriage (i.e., 1-2 miscarriages; 50,060 cases, 174,109 controls), and multiple consecutive miscarriage (i.e., ≥3 consecutive miscarriages; 750 cases, 150,215 controls), publicly available summary statistics were obtained from recent European ancestry GWASs that include UKBB individuals [3,46]. For PCOS, estimates were based on a fixed-effect IVW meta-analysis of published GWAS summary statistics [47], publicly available GWAS results by FinnGen, and a European ancestry GWAS run in UKBB using SAIGE (11,186 cases, 273,812 controls) (S3 Table). As a sensitivity analysis, all MR tests were performed using disease association estimates based on FinnGen only, where available, to alleviate bias due to sample overlap between the exposure and outcome GWAS sources.
Power to detect MR associations was calculated using 2 methods, one designed for general 2-sample MR [48] and the other for MR performed on binary outcomes [49]. Briefly, these methods calculate power by accounting for GWAS sample size, the proportion of cases in case-control GWASs, and the variance explained by genetic instruments for the exposure. The power to detect a true odds ratio (OR) association of 1.1 or more extreme at an unadjusted significance level of 0.05 was estimated.
Instrument SNPs were extracted from the outcome GWAS results and harmonised for consistency in the alleles, and MR was performed using the TwoSampleMR v0.5.4 R package [50]. Two MR methods, IVW and MR-Egger, were evaluated, and the best method was selected via Ruecker's framework [51]. Briefly, this framework advises choosing the MR method with the least heterogeneity as assessed by Cochran's Q statistic, while accounting for the trade-off between power and pleiotropy [52]. IVW results, which were chosen by Ruecker's framework for all tested associations, are reported in the main text. However, as these are still susceptible to pleiotropy, results from all methods, including weighted median MR [53], are calculated for robustness and displayed in the supplementary information. Correction for multiple hypothesis testing for 24 tests in each analysis (3 exposures × 8 outcomes) was applied with the FDR method, and significance established at FDR < 0.05. MR-Egger intercept tests were performed to detect horizontal pleiotropy, and single-SNP and leave-one-out analyses were used to identify outlier SNPs driving relationships [50].
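To make the IVW estimate and the Cochran's Q heterogeneity statistic concrete, the following Python sketch computes both from harmonised summary statistics. It is not the TwoSampleMR implementation: the function name `ivw_fixed_effect` is hypothetical, and the sketch assumes a fixed-effect model with first-order weights, so that the standard error of each per-SNP Wald ratio is taken as the SNP-outcome standard error divided by the absolute SNP-exposure effect.

```python
import math


def ivw_fixed_effect(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate from per-SNP Wald ratios.

    Assumption: first-order weights, i.e. se(ratio) = se_outcome / |beta_exposure|,
    ignoring uncertainty in the SNP-exposure effects.
    Returns (estimate, standard_error, cochran_q, degrees_of_freedom).
    """
    # Per-SNP causal estimates: SNP-outcome effect over SNP-exposure effect.
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    # Inverse-variance weights: 1 / se(ratio)^2 = (beta_exposure / se_outcome)^2.
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations of the ratios from the IVW estimate.
    cochran_q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    return estimate, standard_error, cochran_q, len(ratios) - 1


if __name__ == "__main__":
    # Toy harmonised effects (not study data): SNP-exposure betas in SD units,
    # SNP-outcome betas on the log odds scale.
    est, se, q, df = ivw_fixed_effect(
        [0.10, 0.20, 0.15], [0.020, 0.050, 0.030], [0.010, 0.012, 0.011]
    )
    print("OR per 1-SD increase:", math.exp(est), "Q:", q, "df:", df)
```

Under these assumptions, Q is compared against a chi-squared distribution with the returned degrees of freedom; a large Q relative to that reference suggests heterogeneity among the per-SNP estimates, which is what the model selection framework described above weighs against power.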

Non-linear MR
For non-linear MR analyses, we selected female UKBB participants of white British ancestry with no second-degree or closer relatives in the study, as identified by the UKBB team [55], to avoid violation of the MR assumption of random assignment of genetic variants; 207,705 women were retained following this selection. Genetic instruments for BMI were constructed for each individual using female-specific index variants from Pulit et al.'s GIANT-UKBB meta-analysis [34]. The instruments for BMI explained 4.15% of trait variance after adjustment for age, age squared, smoking status, assessment centre, genotyping array, and the first 10 genetic principal components to account for population stratification. Binomial non-linear MR, a method designed to assess genetically predicted causal relationships in different exposure strata while avoiding collider bias, was performed using the fractional polynomial method with 100 quantiles and the piecewise linear method with 10 quantiles [56]. Outcomes were restricted to female reproductive disorders with prevalence > 5% in UKBB (i.e., HMB, miscarriage, and UF) to maintain sufficient sample sizes in each quantile to estimate localised average causal estimates. We assessed non-linearity with the fractional polynomial non-linearity and Cochran's Q tests, and tested heterogeneity of the instrumental variable with the Cochran's Q and trend tests. All analyses were performed with the nlmr v2.0 R package [56].

MR with mediation analysis
To investigate the extent to which obesity affects female reproductive disorders via hormone-related mediators, 2-step MR by the product of coefficients method was performed using GWAS summary statistics. This method was chosen as female reproductive disease phenotypes are binary outcomes with disease prevalence < 10% in UKBB, for which 2-step MR provides the least biased estimates of mediation [57]. Summary statistics for leptin (N = 33,987) [58], fasting insulin (N = 51,750) [59], and insulin sensitivity (N = 16,753) [60] were obtained from publicly available European ancestry GWAS sources that do not include samples from UKBB to minimise bias from sample overlap (S1 Table).
In the first step of 2-step MR, the mediators were regressed on obesity-related exposures using summary statistics MR methods described above. The direction of causality for all relationships was confirmed with the MR-Steiger directionality test [61], and reciprocal MR with mediator instruments and obesity-related exposures as outcomes were performed to ensure correct direction of causality. In the second step, multivariable MR was performed using combined genetic instruments for each obesity trait and hormone to estimate the independent effect of the mediator on each outcome after adjusting for the value of the exposure, and to estimate the independent effect of the exposure on outcome when adjusted for the value of each mediator. This was only done for traits where the total unadjusted effect of the exposure on the outcome was significant (FDR < 0.05). ORs for binary outcomes were converted to log ORs to calculate mediated effect by the product of coefficients method. The proportion of effect mediated was calculated by dividing the indirect effect by the total effect. Standard errors were estimated with the delta method [62].
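The product of coefficients calculation described above can be sketched as follows. This is an illustrative Python sketch, not the authors' code: the function name `mediation_product_of_coefficients` and the toy inputs are hypothetical, and the first-order delta-method standard errors assume that the three input estimates are independent (no covariance terms), which the source does not specify.

```python
import math


def mediation_product_of_coefficients(a, se_a, b, se_b, total, se_total):
    """Indirect effect and proportion mediated with delta-method SEs.

    a: exposure -> mediator effect (step 1)
    b: mediator -> outcome log OR, adjusted for the exposure (step 2)
    total: total exposure -> outcome log OR
    Assumption: a, b, and total are treated as independent estimates.
    """
    indirect = a * b
    # Delta method for a * b: gradient is (b, a).
    se_indirect = math.sqrt((b * se_a) ** 2 + (a * se_b) ** 2)
    proportion = indirect / total
    # Delta method for a * b / total: gradient is (b/total, a/total, -a*b/total^2).
    se_proportion = math.sqrt(
        (b / total * se_a) ** 2
        + (a / total * se_b) ** 2
        + (indirect / total ** 2 * se_total) ** 2
    )
    return indirect, se_indirect, proportion, se_proportion


if __name__ == "__main__":
    # Toy values (not study data); ORs are converted to log ORs first.
    total_log_or = math.log(2.0)
    mediator_log_or = math.log(1.5)
    print(
        mediation_product_of_coefficients(
            0.5, 0.05, mediator_log_or, 0.10, total_log_or, 0.15
        )
    )
```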

Disease and SNP clustering
To assess similarities in the aetiological relationships of different reproductive conditions with obesity traits, we projected single SNP genetic effect estimates for BMI, WHR, and WHRadjBMI on the reproductive traits, estimated using the Wald ratio, in a 2-dimensional space using the Uniform Manifold Approximation and Projection (UMAP). Briefly, each 8-by-M matrix of SNP effect estimates (for 8 female reproductive outcomes, where M is 281 for BMI, 203 for WHR, and 266 for WHRadjBMI) was reduced to a 2-dimensional representation while maintaining as close a topological relationship between the 8 reproductive outcomes as possible, as measured by cross entropy [63]. SNPs were annotated to their nearest gene with SNPsnap [64].
To identify the genetic instruments driving the obesity-reproductive trait association, and identify clusters of SNPs with distinct associations, we clustered SNPs by the magnitude of their causal estimates using mixture model clustering in the MR-Clust v0.1.0 R package [65]. For each obesity trait-reproductive disease pair, the algorithm distinguishes the genetic instruments for the obesity traits that do not have an effect on the disease ('null clusters') from those that have a similar scaled effect on the disease ('substantial clusters') and those that have a scaled effect that cannot be grouped with other variants ('junk clusters').

Research ethics
This research has been conducted using the UKBB resource under application number 11867. All procedures and data collection in UKBB were approved by the UKBB Research Ethics Committee (reference number 11/NW/0274), with participants providing full written informed consent for participation in UKBB and subsequent use of their data for approved applications. All other publicly available de-identified summary data used in this study have ethical permissions from their respective institutional review boards.
Non-linear models explained the associations of BMI with many reproductive disorders better than linear models. We observed inverted-U and plateau relationships with endometriosis ( Table). All 3 obesity traits displayed U-shaped relationships with PCOS.

Body fat distribution is genetically causally related to risk of female reproductive diseases
Genetically predicted VAT mass was associated with the development of pre-eclampsia (OR [95% CI] per 1-kg increase in predicted VAT mass = 3.08 [1.98-4.79], P = 6.65 × 10⁻⁰⁷). The differential association of genetically predicted body fat distribution with female reproductive traits was further reflected in the heterogeneous associations of WC and HC with disease development. Increased WC posed a higher risk than did increased HC for pre-eclampsia (OR per 1-SD increase: 1.93 for WC versus 1.40 for HC, heterogeneity P-het = 0.0373), HMB (1.41 for WC versus 1.12 for HC, P-het = 3.60 × 10⁻⁰³), UF (1.32 for WC versus 1.12 for HC, P-het = 7.70 × 10⁻⁰³), and PCOS (1.16 for WC versus 1.10 for HC, P-het = 0.0325). We did not see this heterogeneity in observational associations (all P-het > 0.164) (Fig 3; S12 Table).
A series of logistic regression, fractional polynomial, and generalised additive models were fitted to estimate the probability of developing female reproductive disorders as a function of obesity-related traits in UK Biobank. Predicted fits for logistic and best-fitting non-linear models were better than those for logistic regression (as evaluated with Akaike information criterion), and 95% confidence intervals about the mean are displayed. Asterisks indicate that non-linear models fit the data better than linear models. All models were adjusted for age, age squared, assessment centre, and smoking status. BMI, body mass index; spont. miscarr., spontaneous miscarriage; WHRadjBMI, waist-to-hip ratio adjusted for BMI.
No significant associations were found when restricting MR analyses to genetic instruments with a specific effect on waist but not HC, or on hip but not WC (S13 Table; S1 Fig), but the power of these instruments to detect ORs more extreme than 1.1 was limited to 5%-20% (S14 Table). No non-linear MR models explained the genetic associations of BMI with any reproductive disorder better than linear MR models (S4 Fig). However, the power to detect non-linear effects was severely limited by the lower number of cases in each quantile of the BMI distribution in which analyses were run. Statistical significance after multiple testing correction for 24 tests (3 exposures × 8 outcomes) was established at FDR < 0.05, unadjusted P < 0.03.
Fig 2. Odds ratios and 95% confidence intervals per 1 SD higher obesity trait are displayed. Significant relationships (false discovery rate [FDR]-adjusted P value < 0.05) are in solid lines while non-significant (n.s.) ones are shown with dotted lines. For observational results, BMI, WHR, and WHRadjBMI adjusted for age, age squared, region (assessment centre), and smoking status are used as predictors in a logistic regression model. Causal relationships between genetically predicted obesity-related traits and female reproductive disorders are assessed by 2-sample Mendelian randomisation (MR). The displayed method (inverse-variance-weighted) was determined via Ruecker's model selection framework to minimise heterogeneity of the estimate. "Miscarriage (sporadic)" is self-reported stillbirth, spontaneous miscarriage, or termination for observational results and sporadic miscarriage for MR results. BMI, body mass index; HMB, heavy menstrual bleeding; Misc. (mult.), multiple consecutive miscarriage; PCOS, polycystic ovary syndrome; WHR, waist-to-hip ratio; WHRadjBMI, waist-to-hip ratio adjusted for BMI.
https://doi.org/10.1371/journal.pmed.1003679.g002
SNPs identified in female-only GWASs and with female-specific weights for BMI, WHR, and WHRadjBMI [34] were found to be the strongest instruments, with F-statistics > 60; instrument strength for WC and HC was >45 (S1 Table). We prioritised MR results based on the IVW method over less powered MR-Egger and weighted median methods, as there was no evidence for directional pleiotropy (MR-Egger horizontal pleiotropy P > 0.0547) and the ratio of Cochran's Q′ (Egger) to Q (IVW) was >0.876 (S7 Table). For the 24/64 analyses for which IVW indicated a significant effect, the effect estimate of the other methods was either directionally consistent (13/24) or non-significant (11/24), but never opposite (S1 Fig). Estimates were also consistent when based only on FinnGen summary statistics (heterogeneity P > 0.163), with instruments from GIANT only (heterogeneity P > 0.0541), or with combined-sex instruments (heterogeneity P > 0.999), suggesting that the findings were not substantially biased due to sample overlap between exposure and outcome GWAS sources or ascertainment bias in UKBB [35,36] (S9 and S10 Tables; S1 and S2 Figs). Finally, results for WHRadjBMI did not appear to be affected by collider bias, as estimates did not differ when using WHRadjBMI GWAS instruments compared to a multivariable analysis for WHR SNPs and BMI SNPs in the same model (S11 Table).
We did not find evidence for reverse causal associations of endometriosis, PCOS, or UF with BMI, WHR, and WHRadjBMI (S15 Table). However, these estimates may be biased by weak genetic instruments for endometriosis (F-statistic = 5.13) and UF (F-statistic = 11.1) and high heterogeneity for all associations (Cochran's Q P < 4.71 × 10⁻⁰⁶). We were limited in assessing the reverse causality of other female reproductive conditions on obesity traits by the lack of index SNPs in large-scale publicly available GWAS summary statistics.
We calculated the proportion of total obesity effect mediated by leptin, fasting insulin, and the insulin sensitivity index (ISI) for disorders where the effects of obesity traits and mediators were significant at unadjusted P < 0.05. We found that leptin (50.2% of the effect of BMI on pre-eclampsia), fasting insulin (27.7%-36.6%), and ISI (19.1%-50.1%) each mediated a proportion of the total genetically predicted effects of obesity traits on female reproductive disorders (Table 4).

Other metabolic and hormone pathways may drive the aetiological relationships of obesity with female reproductive diseases
We assessed the similarities in the aetiological relationships of different reproductive conditions with obesity by projecting the single SNP genetic effect estimates for BMI, WHR, and WHRadjBMI on the reproductive traits in a 2-dimensional space using UMAP (Fig 5A). The UMAP projections based on all obesity traits clustered female reproductive diseases into 3 groups. The first, consisting of endometriosis, UF, infertility, and HMB, was separated from the second, consisting of sporadic and multiple consecutive miscarriage. The third consisted of PCOS and pre-eclampsia, which clustered closely in UMAP plots of the effect of WHR and WHRadjBMI variants but were separated by BMI-associated variants. This reflects a shared genetic component of the aetiological role of general and central obesity within each of the 3 groups of reproductive conditions.
We further examined whether different aspects of obesity play an aetiological role in different reproductive conditions. For each obesity trait-reproductive disease pair, we grouped the genetic instruments for the obesity traits by those that do not have associations with the disease ('null clusters'), those that have a similar scaled association with the disease ('substantial clusters'), and those that have a scaled association that cannot be grouped with other variants ('junk clusters') using MR-Clust [65]. One substantial cluster was identified for each pair of obesity traits and reproductive conditions. The only exception to this was the pair WHRadjBMI and UF, for which 2 substantial clusters were identified, one with positive genetic association and the other with negative association (S5 Fig; S17 Table). Of the 4 SNPs in the negative effect cluster, rs2277339 (missense variant in PRIM1 and upstream of HSD17B6, involved in steroid biosynthesis) is associated with primary ovarian insufficiency, early menopause, and PCOS [68,69], and rs11694173 is intronic to THADA, which is also associated with PCOS [47]. On the other hand, 4 of 10 SNPs in the positive effect cluster are associated with metabolic traits: rs12328675 and rs2459732 are associated with circulating leptin [70], rs6905288 is associated with type 2 diabetes and thyroid stimulating hormone [71,72], and rs4686696 is intronic to insulin-like growth factor IGF2BP2. SNPs with high probability of belonging to the substantial cluster (≥80% probability) were generally unique to each obesity-disease relationship, with no more than 2 variants shared between any 2 clusters (Fig 5B). However, 6 BMI index SNPs had positive genetic estimates for both PCOS and pre-eclampsia, including rs1121980 in the adipose-associated gene FTO and rs7498665 in SH2B1, linked to insulin resistance in obesity. The BMI-associated variant rs7084454 (intronic to MLLT10) was shared by substantial clusters for PCOS, endometriosis, and UF, while rs114760566 (mapped to HMGA1, associated with type 2 diabetes and multiple lipomatosis) was shared by endometriosis and UF. We evaluated the biological effect of the top SNPs in each substantial cluster with the DEPICT algorithms for pathway enrichment and gene prioritisation. We recapitulated the known association of the GEMIN5 subnetwork with PCOS in SNPs causal for BMI-PCOS [73]. Gene prioritisation for WHR-endometriosis causal SNPs highlighted TBX15, an important mesodermal transcription factor with roles in endometrial and ovarian cancer [74,75].
Fig 4. Mendelian randomisation (MR) to estimate hormonally mediated effects between obesity and female reproductive disorders. In step 1, the effect of exposures on mediators is estimated using instruments for the exposure alone, while in step 2 the independent effect of the mediators on outcomes is estimated using multivariable MR (MVMR) adjusted for exposures. All SNP-phenotype effect estimates come from different genome-wide association studies. (B) Estimated effects of obesity traits on female reproductive disorders adjusted for mediators. MVMR was performed with combined genetic instruments for each exposure-mediator combination, displayed here for relationships where unadjusted (unadj.) exposure-outcome effect was significant (false discovery rate [FDR] < 0.05). (C) Example of a mediated relationship, shown here for BMI effect on pre-eclampsia. Estimated effects for exposure-mediator (betas and standard errors) and for mediator-outcome (log odds ratios and standard errors) effects are shown. Repr., reproductive; BMI, body mass index; n.s., not significant; PCOS, polycystic ovary syndrome; sensitiv., sensitivity; WHR, waist-to-hip ratio; WHRadjBMI, waist-to-hip ratio adjusted for BMI.
https://doi.org/10.1371/journal.pmed.1003679.g004

Discussion
In this systematic genetics-based causal investigation of the aetiological role of obesity in female reproductive health, we report evidence that common indices of obesity are associated  with increased risk of a broad range of reproductive conditions, and these associations may be non-uniform across the obesity spectrum. The strongest association of generalised obesity was found with pre-eclampsia, while more modest associations were observed for nearly all other studied conditions. We identified endocrine mechanisms, including those related to leptin and insulin resistance, as potential drivers of aetiological relationships of both generalised and central obesity with female reproductive health. Finally, we found genetic evidence that certain groups of reproductive conditions, such as UF and endometriosis, may share a mechanistically similar relationship with obesity. The results from our MR investigations are less likely to be biased by confounding or reverse causation than observational epidemiological results. We undertook multiple supplementary and sensitivity analyses to evaluate the plausibility of instrumental variable assumptions and robustness to horizontal pleiotropy, outliers, collider bias, and sample overlap that may invalidate or bias MR estimates. While causal associations from MR must be interpreted with caution as several assumptions of the method are untestable, the concordance of our estimates from different methods and analytical approaches indicates strong support for a causal role of obesity in the aetiology of female reproductive conditions.
Our findings highlight that the relationships between obesity and female reproductive disorders are (i) non-uniform in their nature and strength and (ii) observationally nonlinear across the obesity spectrum. We report substantial differences in the genetically predicted causal associations of BMI with reproductive diseases, with each 1-SD increase in BMI associated with double the risk of pre-eclampsia, but more moderately (ORs = 1.01-1.25 for PCOS, miscarriage, UF, and HMB) or not at all (infertility and endometriosis) affecting other conditions. Conversely, central fat distribution independent of BMI showed substantial genetically predicted effects on both infertility and endometriosis (ORs per 1-SD increase in WHRadjBMI = 1.21-1.46) as well as on pre-eclampsia and UF (ORs = 1.17-1.43), but not on PCOS, HMB, and miscarriage. These findings highlight that the aetiological role of obesity in female reproductive diseases is heterogeneous in its effect strength, and may be driven by overall adiposity (PCOS, HMB, and miscarriage), by isolated central obesity (infertility and endometriosis), or by both generalised and central obesity (pre-eclampsia and UF).
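To make concrete how per-SNP summary statistics yield an OR per 1-SD increase in an obesity trait, the following is a minimal sketch of a fixed-effect inverse-variance weighted (IVW) 2-sample MR estimate. It is one standard estimator and is not necessarily the exact model fitted in this study. All input values are hypothetical, and the function name `ivw_estimate` is introduced here for illustration only.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the log-odds of disease per 1-SD exposure.

    Each SNP contributes a Wald ratio (beta_outcome / beta_exposure),
    weighted by beta_exposure**2 / se_outcome**2.
    """
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    std_error = math.sqrt(1.0 / total_weight)
    return estimate, std_error


# Hypothetical summary statistics for three instruments:
# SNP-exposure effects (SD units), SNP-outcome effects (log-odds), and
# standard errors of the SNP-outcome effects.
bx = [0.10, 0.05, 0.08]
by = [0.020, 0.012, 0.014]
se = [0.010, 0.008, 0.009]

log_or, se_log_or = ivw_estimate(bx, by, se)
odds_ratio = math.exp(log_or)
ci = (math.exp(log_or - 1.96 * se_log_or), math.exp(log_or + 1.96 * se_log_or))
print(f"OR per 1-SD = {odds_ratio:.2f} (95% CI {ci[0]:.2f}-{ci[1]:.2f})")
```

Because the estimate is formed on the log-odds scale, it is exponentiated only at the end to give the OR and its confidence interval.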
For several reproductive conditions, we found substantial differences between the observational and genetically predicted causal estimates, which may indicate a bidirectional relationship between obesity and reproductive health. For instance, while the observational analyses suggested an 87% increase in PCOS risk per 1 SD higher BMI, the MR analyses indicated that each 1 SD higher genetically predicted BMI was associated with an increased PCOS risk of only 13%. Similarly, 1-SD increases in genetically predicted WHR and WHRadjBMI were associated with a 24% increase in endometriosis risk, while the observational analyses suggest more modest increases in risk of 7% and 2%, respectively. This discrepancy may in part be due to reverse causality, which we were not powered to detect in this study, as the available genetic instruments for reproductive conditions are substantially fewer in number and weaker than those for BMI and WHR. The obesity traits upon which the observational analyses were based were measured at ages 40-69 years, which was for most conditions likely to be several years or decades after women developed the condition, and often post-menopause. While our observational analyses adjusted for the effect of age on obesity traits, adjusting for menopause status proved to be unreliable as up to 42% of women with reproductive diseases in UKBB (some of whom had undergone hysterectomies) were unsure of their menopause status, as opposed to 16% of female participants without a diagnosis of any of the studied conditions. The observational estimates may therefore capture both the effect of obesity on disease risk as well as any downstream effects of the disease or commonly used treatments on body weight and fat distribution. For instance, the large observational effect of BMI on PCOS prevalence may reflect both a causal association of obesity with disease risk [76], as captured by the genetically predicted effect, as well as weight gain as a consequence of PCOS [77].
Other potential contributing factors to the differences between genetic and observational estimates are confounding by unmeasured variables, leading to inflated observational associations [78,79]; referral bias, wherein obesity status affects the likelihood of receiving a diagnosis [80,81]; and differences in pre- and post-menopausal weight and body fat distribution not captured by age [82].
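The percentages quoted above follow directly from the ORs, as the short sketch below illustrates. It assumes that the 87% and 13% figures for BMI and PCOS correspond to ORs of 1.87 and 1.13. The log-odds comparison at the end is an illustrative addition rather than a statistic reported in this study, and the function name is hypothetical.

```python
import math


def percent_change_in_odds(odds_ratio):
    """Convert an OR per 1-SD increase into a percentage change in odds."""
    return (odds_ratio - 1.0) * 100.0


# Assumed ORs implied by the percentages quoted in the text (BMI and PCOS).
observational_or = 1.87
genetic_or = 1.13

print(f"Observational: {percent_change_in_odds(observational_or):.0f}% higher odds")
print(f"Genetic:       {percent_change_in_odds(genetic_or):.0f}% higher odds")

# ORs combine multiplicatively, so the two estimates are more fairly
# compared on the log-odds scale than as raw percentages.
log_ratio = math.log(genetic_or) / math.log(observational_or)
print(f"Genetic estimate as a fraction of the observational log-odds: {log_ratio:.2f}")
```

Strictly, these quantities describe changes in odds; they approximate changes in risk only when the condition is uncommon.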
While the observational relationships between obesity and some female reproductive disorders were non-linear, we did not find non-linearity in the genetically predicted effects of BMI on these diseases. The non-linear MR analyses were likely underpowered to detect associations, with few cases in each quantile of the BMI spectrum. However, as current GWAS approaches are focused on identifying genetic determinants of BMI across the full BMI spectrum, it is possible that the instruments used here do not capture genetic factors that specifically explain variations in BMI among those with lower (20-25 kg/m²) or higher (40-45 kg/m²) BMI. There is currently no evidence for this, but if this were the case, then our analysis would not identify non-linear causal associations.
We noted that genetic estimates for the effect of fat distribution were not similarly attenuated when compared to BMI effects. This disparity may be due to the differing impacts of overall and abdominal (central) adiposity, as the latter is thought to be biologically more directly linked to female reproductive health than generalised obesity, via pathways including insulin resistance and hyperandrogenaemia [5,15,16,83]. Supporting the stronger effect of central body fat, we also reported greater genetic effect estimates of WC than HC with HMB, PCOS, pre-eclampsia, and UF. Genetically predicted VAT mass was associated with increased risk of PCOS and pre-eclampsia, in line with observational studies [84]. VAT mass is also observationally associated with UF [16], yet we did not find a significant association of genetically predicted VAT mass with development of UF, which may suggest a bidirectional or reverse causal relationship.
Endometriosis and infertility were the only reproductive conditions that did not show a consistently positive link with obesity. The modest observational associations of both BMI and WHR with higher endometriosis prevalence in UKBB contradict previous studies, including prospective cohort studies, which reported that lower BMI was associated with increased disease prevalence [14,85,86]. The positive association with endometriosis may in part be due to weight gain as a consequence of the disease, for instance due to hormonal treatments [87-89], chronic pain [90], inflammation [91], or earlier onset of menopause [92]. However, we did not find evidence that generalised obesity plays a causal role in the aetiology of endometriosis, which suggests that the observational finding reflects a reverse causal relationship. Indeed, we noted that being thinner than average at age 10 years posed a higher risk for development of endometriosis than did having comparatively larger body size, although both were associated with higher prevalence of disease compared to those who reported average body size in early life. Conversely, the positive genetically predicted effect of WHRadjBMI on endometriosis risk indicates a causal role for abdominal fat distribution. For infertility, we observed a similar divergence between the observational and genetically predicted effects of obesity traits, with BMI showing a negative observational association, but WHRadjBMI a genetically predicted positive association. The causes of female infertility are multiple, ranging from PCOS [93] and anovulation [94] to tubal disease [95], endometriosis [96], low oocyte quality [97], hormonal and immunological dysfunction [98-101], and yet unknown mechanisms. Each of these may have distinct and complex relationships with obesity, which cannot be captured by studying the links with infertility of any cause.
Non-linear effects, such as the increased association of under- and overweight with incidence of infertility [12,102], may also obscure these estimates, although our observational analyses did not provide evidence for a non-linear relationship.
We conducted one of the first genetics-based investigations of the mediating hormonal pathways underlying the causal relationships between obesity and female reproductive health. We identified mechanisms related to insulin resistance and leptin as mediators of the effects of obesity traits on UF and pre-eclampsia. The latter is consistent with hypotheses that obese women with metabolic dysregulation are at highest risk of developing hypertensive disorders of pregnancy via angiogenic and pro-inflammatory mechanisms. Increased circulating leptin may have a vasoconstrictive, hypertensive effect, which may be worsened by attenuation of insulin-induced vasorelaxation and increased levels of TNF-alpha and IL6 [7,22].
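The notion of a mediated proportion can be made concrete with a minimal sketch of the product-of-coefficients approach often used in 2-step MR. This is an assumption about the general method and is not a reproduction of the exact analysis in this study. All effect sizes are hypothetical, and the function name is introduced here for illustration.

```python
def proportion_mediated(beta_exposure_mediator, beta_mediator_outcome, beta_total):
    """Proportion of the total effect that acts through the mediator.

    beta_exposure_mediator: effect of the obesity trait on the hormone
        (step 1, in SD units of the mediator).
    beta_mediator_outcome: effect of the hormone on the disease, adjusted
        for the obesity trait (step 2, log-odds).
    beta_total: total effect of the obesity trait on the disease (log-odds).
    """
    if beta_total == 0:
        raise ValueError("Total effect is zero; proportion is undefined.")
    indirect_effect = beta_exposure_mediator * beta_mediator_outcome
    return indirect_effect / beta_total


# Hypothetical effect sizes, not estimates from this study.
share = proportion_mediated(
    beta_exposure_mediator=0.4,
    beta_mediator_outcome=0.5,
    beta_total=0.7,
)
print(f"Proportion mediated: {share:.0%}")
```

The calculation is carried out on the log-odds scale, and the resulting proportion is only interpretable when the indirect and total effects point in the same direction.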
Finally, genetic clustering of female reproductive conditions revealed evidence supporting common genetic causes for the effects of obesity on endometriosis, UF, and HMB, which are known to share mechanisms of development [3,103]. The projection of infertility with these diseases merits following up on the genetic basis of endometriosis-related infertility, with an eye to prevention and treatment.

The main strength of our work is the systematic approach to characterising the relationship between a broad range of obesity traits and common female reproductive conditions using both observational and genetic approaches. All observational associations were estimated in the same large-scale cohort study, which tends to lead to less biased estimates than case-control studies, upon which most previous results were based. Moreover, to the best of our knowledge, we have conducted the first genetics-based mediation analyses to pinpoint the mechanisms driving the causal association of obesity with reproductive diseases.
Reproductive conditions remain underdiagnosed and underreported in the UK, which was reflected in their low prevalence among female UKBB participants (Table 1). Although we based our case definitions on 3 distinct sources, i.e., participants' responses to interviews and structured surveys, primary care electronic health records dating back to 1938 at the earliest, and secondary care hospital in-patient records dating back to 1981 at the earliest, a participant's diagnosis may have been missed if, for example, the diagnostic code was not entered, the diagnosis was made in a setting where electronic health records were not implemented, or the participant could not recall ever having received such a diagnosis at baseline assessment. The low prevalence of female reproductive disorders posed a limitation to our analyses in UKBB by reducing power to identify significant associations. For this reason, we opted to use broad case categories, such as infertility of any cause, as we had insufficient power and information to examine conditions by subtypes. We also restricted our analyses to women of European ancestry, due to a lack of genetic data on women of other ancestries. Many of the reproductive diseases included here, with UF being the most notable example [104,105], are more prevalent in non-European populations, and our results may not be transferable to women of other ancestries [88,106,107], which highlights the urgent need to set up large-scale studies similar to UKBB with participants of non-European ancestry. We were further limited in investigations of metabolic, hormonal, and inflammatory mediating mechanisms by a lack of publicly available GWAS summary statistics for these traits. Finally, the lack of data on BMI and WHR prior to disease onset, and limited information on the age at which reproductive conditions were first diagnosed, complicated the interpretation of our findings from observational analyses in UKBB.
Key priorities for the future are the further exploration and validation of the pathways through which obesity increases the risk of female reproductive disease. Notably, our finding that insulin resistance may be an important mediating mechanism warrants further attention, as affordable and safe treatments are available to increase insulin sensitivity. This is demonstrated by the successful use of metformin treatment in women with PCOS [108], but such a treatment strategy has not yet been explored for other reproductive conditions linked to obesity. More generally, better and more detailed diagnostic information on reproductive health in large-scale cohort studies is urgently required for future research on the causes, consequences, and aetiological mechanisms of female reproductive illnesses.
In conclusion, we provide genetic evidence that both generalised and central obesity play an aetiological role in a broad range of female reproductive conditions, but the extent of this link differs substantially between conditions. Our findings also highlight the importance of hormonal pathways, notably those involving leptin and insulin resistance, as mediating mechanisms and potential targets for intervention in the treatment and prevention of common female reproductive conditions.

Table. Comparing causal effect of WHR adjusted for BMI on female reproductive conditions using WHRadjBMI GWAS instruments versus multivariable Mendelian randomisation for WHR and BMI in same model. (XLSX)
S12 Table. Observational and genetically predicted causal associations of hip circumference and waist circumference with female reproductive conditions. (XLSX)
S13 Table. Two-sample Mendelian randomisation effect estimates of waist-specific WHR and hip-specific WHR on female reproductive conditions. (XLSX)
S14 Table. Power calculations for all 2-sample Mendelian randomisation analyses. (XLSX)
S15 Table. Reciprocal Mendelian randomisation for effect of female reproductive conditions on BMI, WHR, and WHRadjBMI. (XLSX)
S16 Table. Multivariable Mendelian randomisation effect estimates of obesity-related traits on female reproductive conditions, adjusted for metabolic hormones. (XLSX)
S17 Table. Cluster classification and probabilities for each SNP in obesity instruments, with Mendelian randomisation effect estimates on female reproductive conditions. (XLSX)
S18 Table. Step 1 of mediation Mendelian randomisation: 2-sample Mendelian randomisation effect estimates of obesity-related exposures on metabolic hormones. (XLSX)
S19