Do sex hormones confound or mediate the effect of chronotype on breast and prostate cancer? A Mendelian randomization study

Morning-preference chronotype has been found to be protective against breast and prostate cancer. Sex hormones have been implicated in relation to chronotype and the development of both cancers. This study aimed to assess whether sex hormones confound or mediate the effect of chronotype on breast and prostate cancer using a Mendelian Randomization (MR) framework. Genetic variants associated with chronotype and sex hormones (total testosterone, bioavailable testosterone, sex hormone binding globulin, and oestradiol) (p<5×10−8) were obtained from published genome-wide association studies (n≤244,207 females and n≤205,527 males). These variants were used to investigate causal relationships with breast (nCases/nControls = 133,384/113,789) and prostate (nCases/nControls = 79,148/61,106) cancer using univariable, bidirectional and multivariable MR. In females, we found evidence for: I) Reduced risk of breast cancer per category increase in morning-preference (OR = 0.93, 95% CI:0. 88, 1.00); II) Increased risk of breast cancer per SD increase in bioavailable testosterone (OR = 1.10, 95% CI: 1.01, 1.19) and total testosterone (OR = 1.15, 95% CI:1.07, 1.23); III) Bidirectional effects between morning-preference and both bioavailable and total testosterone (e.g. mean SD difference in bioavailable testosterone = -0.08, 95% CI:-0.12, -0.05 per category increase in morning-preference vs difference in morning-preference category = -0.04, 95% CI: -0.08, 0.00 per SD increase in bioavailable testosterone). In males, we found evidence for: I) Reduced risk of prostate cancer per category increase in morning-preference (OR = 0.90, 95% CI: 0.83, 0.97) and II) Increased risk of prostate cancer per SD increase in bioavailable testosterone (OR = 1.22, 95% CI: 1.08, 1.37). No bidirectional effects were found between morning-preference and testosterone in males. While testosterone levels were causally implicated with both chronotype and cancer, there was inconsistent evidence for testosterone as a mediator of the relationship. The protective effect of morning-preference on both breast and prostate cancer is clinically interesting, although it may be difficult to effectively modify chronotype. Further studies are needed to investigate other potentially modifiable intermediates.

In bidirectional MR analyses, we performed MR to evaluate the reciprocal effect of each hormone on chronotype. Together with the effect obtained from the MR of chronotype on hormones, these estimates can be used to determine directionality of the relationship. To ensure the robustness of the inferred direction between chronotype and the sex hormone traits, the instruments were subjected to Steiger filtering to assess strength of SNP associations with both exposure and outcome [6]. Any SNPs found to be more strongly associated with the outcome than the exposure were subsequently removed from the instrument, to ensure that the instrument is only influencing the outcome through the exposure of interest. This Steiger filtering of instruments was also subjected to sensitivity testing (see 'sensitivity analyses'). The same approach was taken as in the univariable, with effect estimates obtained from IVW analysis. Multivariable MR (mvMR) analysis was conducted to assess the direct effects of chronotype and sex hormones on both breast and prostate cancer [7,8]. For these analyses, the instruments for each exposure included in the previous uvMR analyses were used. These instruments were first clumped individually (r2 = 0.001), before being combined and then clumped again [9]. The mv_multiple() function was then used to obtain an inverse variance weighted (IVW) estimate for the direct causal effect of each exposure on the outcome, in which instruments for each exposure are regressed against the outcome together, weighting for the inverse variance of the outcome [5]. Evidence of attenuation of the chronotype effect in mvMR compared to the corresponding uvMR results can be used to indicate a mediated (or indirect) effect executed via the sex hormones investigated in this study [10].

Supplementary Methods B: Genotyping, Imputation and association analysis (UKB)
The full data release contains the cohort of successfully genotyped samples (n=488,377). 49,979 individuals were genotyped using the UK BiLEVE array and 438,398 using the UK Biobank axiom array. Pre-imputation quality control, phasing and imputation are described elsewhere [11]. In brief, prior to phasing, multiallelic SNPs or those with minor allele frequency (MAF) ≤1% were removed.
Phasing of genotype data was performed using a modified version of the SHAPEIT2 algorithm [12].
Genotype imputation to a reference set combining the UK10K haplotype and HRC reference panels 8 was performed using IMPUTE2 algorithms [13]. The analyses presented here were restricted to autosomal variants within the HRC site list using a graded filtering with varying imputation quality for different allele frequency ranges. Therefore, rarer genetic variants are required to have a higher imputation INFO score (Info>0.3 for MAF >3%; Info>0.6 for MAF 1-3%; Info>0.8 for MAF 0.5-1%; Info>0.9 for MAF 0.1-0.5%) with MAF and Info scores having been recalculated on an in-house derived 'European' subset [14].
Individuals with sex-mismatch (derived by comparing genetic sex and reported sex) or individuals with sex-chromosome aneuploidy were excluded from the analysis (n=814). We restricted the sample to individuals of 'European' ancestry as defined by an in-house k-means cluster analysis performed using the first 4 principal components provided by UKB in the statistical software environment R (n=464,708) [14].
Genome-wide association analysis (GWAS) was conducted using linear mixed model (LMM) association method as implemented in BOLT-LMM (v2.3) [15]. To model population structure in the sample we used 143,006 directly genotyped SNPs, obtained after filtering on MAF > 0.01; genotyping UOB Open rate > 0.015; Hardy-Weinberg equilibrium p-value < 0.0001 and LD pruning to an r2 threshold of 0.1 using PLINKv2.00. Genotype array was adjusted for in the model.

Supplementary Methods C: Genotyping and Imputation (BCAC)
Summary GWAS data obtained from BCAC came from a meta-analysis of 133,384 breast cancer cases and 113,789 controls [1]. Summary statistics were also obtained from analyses of 106,491 SNPs. Criteria for removal of SNPs include: i) concordance of <98% among 5,280 duplicate sample pairs; ii) a p < 1.0 x 10 -12 in cases or p < 1.0 x 10 -7 in controls and; iii) a call rate of <95%. SNPs were also removed where the cluster plot was judged to be inadequate. Samples from the overall breast cancer GWAS were also obtained through imputation using the 1000 genomes project phase 3 reference panel [16], where criteria for removal include: i) minor allele frequency of <1%; ii) a call rate of <98% or; iii) different frequency or absence from the 1000 genomes reference panel.
Results from the most recent OncoArray analyses were combined with 46,785 cases and 42,892 controls from the previous iCOGS genotyping project, plus 46,785 cases and 42,892 controls from eleven other breast cancer GWAS using a fixed-effects meta-analysis. All contributory GWAS were adjusted for principal components of European ancestry. OncoArray analyses were further adjusted for country and iCOGS analyses for study. All participating studies had the approval of their appropriate ethics review board and all participants provided informed consent.

Supplementary Methods D: Genotyping and Imputation (ELLIPSE/PRACTICAL)
The ELLIPSE Consortium includes the meta-analysis of both existing and novel GWAS, plsu iCOGS OncoArray genotyping of 533,631 selected variants from ELLIPSE studies was conducted across five sites (Cambridge, CIDR, Copenhagen, USC, and NCI).Exclusion criteria for SNPs included: a <95% call rate by study; a <98% call rate in any study; a p < 1.0 x 10 -12 in cases or p < 1.0 x 10 -7 in controls; concordance of <98% among 11,260 duplicate pairs; a minor allele frequency of <1%; different frequency or absence from the 1000 genomes reference panel or; where the cluster plot was judged to be inadequate. Of the 533,631 SNPs on the OncoArray, 498,417 were retained following quality control. Approximately 70 million SNPs were imputed for all samples using the 1000 genomes project phase 3 reference panel. OncoArray and GWAS datasets were imputed through a two-stage approach, using SHAPEIT31 for phasing and IMPUTEv2 [17] for imputation.

Supplementary Methods E: Assessment of MR Assumptions
MR analysis relies on three key assumptions: i) IVs must be robustly associated with the exposure of interest; ii) IVs must be independent of confounders of the exposure-outcome association; iii) IVs must only influence the outcome through the exposure of interest [18].
To test the first of the MR assumptions, F-statistics were calculated for all IVs used in univariable analysis to assess instrument strength. The F-statistic incorporates phenotypic variance explained by genetic variants (r2), sample size, and number of variants to estimate strength of the relationship between IVs and phenotype [19] , [20]. Instrument strength was also assessed in mvMR analyses with conditional F-statistics [21]. In two-sample MR weak instrument bias is expected to result in attenuation of effects towards the null.
While germline genetic variants should not plausibly be influenced by confounding factors, concerns about violation of the second MR assumption usually focus on confounding by ancestry. Steps were taken to minimise population stratification in the GWAS performed although this assumption is difficult to test, particularly in a two-sample setting.
The third assumption relates to potential horizontal pleiotropy of the IVs used in the genetic variants. In particular, the main IVW approach used assumes that the genetic IVs are valid and that there is no unbalanced horizontal pleiotropy. We undertook complementary sensitivity analyses using methods that relax this assumption. MR-Egger, is similar to IVW but does not constrain the UOB Open intercept to be zero. A non-zero intercept indicates the presence of horizontal pleiotropy and the slope (effect estimate) has controlled for potential bias due to unbalanced horizontal pleiotropy. [22] Both IVW and MR-Egger assume that the 'instrument strength independent of direct effect' (INSIDE) assumption is not violated (i.e. that there is no correlation between the genetic IV-exposure association and unmediated IV-outcome association). The weighted median method assumes that ≥50% of the weight in the analysis stems from IVs which are valid (i.e. if one strong instrument reflecting 50% or more of the variation in the exposure is pleiotropic or several weaker instruments together explain 50% or more of the association and are pleiotropic, the estimate will be biased) [23]. Weighted mode-based MR, estimates from individual valid IVs are in the majority [24].
Scatter plots were used to visualise consistency between IVW, MR-Egger, median and mode effect estimates for initial uvMR of chronotype and sex hormone measures on breast and prostate cancer risk. In subsequent bdMR, and mvMR, MR-Egger alone was used to estimates effects which were robust to pleiotropy.
Steiger filtering was applied to instruments in the bdMR analysis of chronotype and sex hormones.
This approach removes SNPs that explain more of the variance in the outcome than the exposure and consequently reduces the likelihood of erroneous results due to pleiotropy [6]. Steiger sensitivity ratios (R) were also calculated for these analyses, which considers the measurement precision of both exposure and outcome, and represents the likelihood that the observed direction of association is correct [25]. An R>1 provides evidence to support the direction of association from exposure to outcome.