Better together against genetic heterogeneity: A sex-combined joint main and interaction analysis of 290 quantitative traits in the UK Biobank

Genetic effects can be sex-specific, particularly for traits such as testosterone, a sex hormone. While sex-stratified analysis provides easily interpretable sex-specific effect size estimates, the presence of sex differences in a SNP's effect implies a SNP×sex interaction. This motivates the often overlooked joint test, which tests a SNP's main effect and SNP×sex interaction effect simultaneously. Notably, even without individual-level data, the joint test statistic can be derived from sex-stratified summary statistics through an omnibus meta-analysis. Using the available sex-stratified summary statistics of the UK Biobank, we performed such omnibus meta-analyses for 290 quantitative traits. Results revealed that this approach is robust to genetic effect heterogeneity and can outperform the traditional sex-stratified or sex-combined main-effect-only tests. Therefore, we advocate using the omnibus meta-analysis, which captures both the main and interaction effects. Subsequent sex-stratified analysis should be conducted for sex-specific effect size estimation and interpretation.

standard statistical tests derived directly or indirectly from regression and expected to be accurate, particularly when the trait is normally distributed. For completeness, however, we report the empirical type I error rates under the null of no association, including those from the sensitivity analyses.

Empirical type I error rates
We used $R = 5 \times 10^6$ replications to evaluate the empirical type I error rates at the nominal level of $\alpha = 10^{-5}$. Results in Fig i show that, as expected, all six tests are accurate: the nominal level $\alpha = 10^{-5}$ is covered by the 95% binomial proportion confidence interval, $\hat{\alpha} \pm z \sqrt{\hat{\alpha}(1-\hat{\alpha})/R}$, where $\hat{\alpha}$ is the empirical type I error rate. We note that $\alpha = 5 \times 10^{-8}$ is ideal but requires at least $R = 10^{10}$ replicates for accurate type I error evaluation at each sample size considered, which is computationally expensive.
In Scenario A1, where genetic effects are the same in females and males, it is easy to see that this is the best-case scenario for $T_{1,mega}$, which jointly analyzes all available samples through a (correctly specified) main-effect-only regression model.
Similarly, $T_{Female}$ should have the lowest power, as the female-only analysis ignores the $n_M = k \cdot n_F$ males among the $n$ available samples. Additionally, when there is no effect heterogeneity, meta-analysis (linearly combining $Z_{Female}$ and $Z_{Male}$) is as efficient as mega-analysis [1]. Thus, as expected, $T_{1,metaL}$ has the same power as $T_{1,mega}$; the two power curves overlap in Fig ii, Scenario A1.
As there is no interaction effect in Scenario A1, the 2 df $T_{2,mega}$ test, which jointly tests the main and interaction effects, is expected to be less powerful than $T_{1,mega}$. However, results in Fig ii, Scenario A1, show that the loss of power is marginal, suggesting that $T_{2,mega}$ is robust even when there is no interaction effect. Additionally, $T_{2,mega}$ is noticeably more powerful than the minimum p-value approach, $T_{Min}$. Finally, the power curve of $T_{2,mega}$, interestingly, overlaps with that of $T_{2,metaQ}$ (quadratically combining $Z_{Female}^2$ and $Z_{Male}^2$).
In Scenario A2, where the genetic effect exists only in females, the female-only sex-stratified test $T_{Female}$ is the most powerful, as expected, and $T_{1,mega}$ is the least powerful, with a substantial loss of power; the power of $T_{1,metaL}$ is practically identical to that of $T_{1,mega}$, as expected. Compared with Scenario A1, the performances of $T_{Female}$ and $T_{1,mega}$ (and $T_{1,metaL}$) are reversed in Scenario A2, suggesting that neither method is robust against different alternatives, which are unknown in practice. Importantly, $T_{2,metaQ}$ is also competitive in this case, with power only slightly lower than that of the most powerful method, $T_{Female}$. Interestingly, as in Scenario A1, the power curve of $T_{2,mega}$ overlaps with that of $T_{2,metaQ}$ in Scenario A2 as well.
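The close agreement between the $T_{2,mega}$ and $T_{2,metaQ}$ power curves can be checked on a single simulated data set. The Python sketch below is illustrative only and is not the simulation code behind the figures. It assumes a working model $y \sim 1 + \text{sex} + G + G \times \text{sex}$ with standard normal errors, a Scenario A2-like setting (effect 0.15 in females, 0 in males), MAF 0.25 and $n_F = n_M = 5{,}000$, and it takes $T_{2,metaQ}$ to be the sum of the two squared sex-stratified Z statistics. The helper names `z_score` and `joint_mega` are hypothetical.

```python
import numpy as np
from scipy import stats

def z_score(g, y):
    """Wald Z statistic for the slope in a simple regression of y on g."""
    fit = stats.linregress(g, y)
    return fit.slope / fit.stderr

def joint_mega(g, sex, y):
    """2 df Wald chi-square for the SNP and SNP-by-sex terms (assumed model)."""
    X = np.column_stack([np.ones(len(y)), sex, g, g * sex])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])   # pooled residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)
    b, V = beta[2:], cov[2:, 2:]                     # SNP and SNP-by-sex block
    return float(b @ np.linalg.solve(V, b))

rng = np.random.default_rng(2024)
n_f = n_m = 5000
maf = 0.25
sex = np.repeat([0, 1], [n_f, n_m])                  # 0 = female, 1 = male
g = rng.binomial(2, maf, size=n_f + n_m)             # additive genotype coding
beta = np.where(sex == 0, 0.15, 0.0)                 # female-only effect (A2-like)
y = beta * g + rng.standard_normal(n_f + n_m)

z_f = z_score(g[sex == 0], y[sex == 0])
z_m = z_score(g[sex == 1], y[sex == 1])
t_metaq = z_f**2 + z_m**2                            # from summary statistics only
t_mega = joint_mega(g, sex, y)                       # from individual-level data

print("T_2,metaQ:", t_metaq, "p =", stats.chi2.sf(t_metaq, 2))
print("T_2,mega :", t_mega, "p =", stats.chi2.sf(t_mega, 2))
```

Under this assumed model the two statistics differ only through the residual variance estimate (sex-specific versus pooled), which is one plausible reason the power curves are so close; the source itself notes that a theoretical justification is still needed.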
In Scenario A3, where genetic effects exist in both females and males but differ in magnitude and/or direction, it is reasonable to predict that no method dominates the others, as confirmed by the results in Fig ii. Specifically, (I) when the genetic effect in males is close to zero, the relative performance of the different methods is, as expected, similar to that observed in Scenario A2: the female-only sex-stratified test $T_{Female}$ has the highest power, but $T_{2,metaQ}$ and $T_{2,mega}$ are competitive. (II) When the genetic effect in males is close to 0.15 (i.e. without effect heterogeneity between males and females), the relative performance is similar to that in Scenario A1: $T_{1,mega}$ and $T_{1,metaL}$ have the highest power, but $T_{2,metaQ}$ and $T_{2,mega}$ are competitive. (III) When the genetic effect in males is close to $-0.15$ (i.e. with severe effect heterogeneity, to the extent of opposite effect directions), $T_{2,metaQ}$ and $T_{2,mega}$ are clearly more powerful than the other methods.
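The three regimes (I) to (III) can be reproduced approximately without simulation. The sketch below is a back-of-the-envelope large-sample approximation, not the authors' power calculation. It assumes each sex-stratified Z statistic is normal with unit variance and mean $\beta_s \sqrt{n_s \cdot 2\,\mathrm{MAF}(1-\mathrm{MAF})}/\sigma$ (Hardy-Weinberg genotype variance, residual standard deviation $\sigma = 1$), that $T_{1,metaL}$ uses square-root sample-size weights, and that $T_{2,metaQ}$ is the sum of the two squared Z statistics. The function names are hypothetical.

```python
import math
from scipy import stats

ALPHA = 5e-8  # genome-wide significance level used for the power figures

def power_1df(mu, alpha=ALPHA):
    """Two-sided power of a Z test whose statistic is N(mu, 1)."""
    c = stats.norm.isf(alpha / 2)
    return stats.norm.sf(c - mu) + stats.norm.cdf(-c - mu)

def power_2df(nc, alpha=ALPHA):
    """Power of a 2 df chi-square test with noncentrality parameter nc."""
    crit = stats.chi2.isf(alpha, 2)
    if nc == 0:
        return stats.chi2.sf(crit, 2)     # central case: power equals alpha
    return stats.ncx2.sf(crit, 2, nc)

def approx_power(beta_f, beta_m, n_f=5000, n_m=5000, maf=0.25, sigma=1.0):
    """Approximate power of three of the six tests (assumptions in the text)."""
    var_g = 2 * maf * (1 - maf)           # genotype variance under HWE
    mu_f = beta_f * math.sqrt(n_f * var_g) / sigma
    mu_m = beta_m * math.sqrt(n_m * var_g) / sigma
    # assumed sqrt-sample-size weights for the linear meta-analysis
    mu_l = (math.sqrt(n_f) * mu_f + math.sqrt(n_m) * mu_m) / math.sqrt(n_f + n_m)
    return {
        "T_Female": power_1df(mu_f),
        "T_1,metaL": power_1df(mu_l),
        "T_2,metaQ": power_2df(mu_f**2 + mu_m**2),
    }

for beta_m in (0.0, 0.15, -0.15):         # regimes (I), (II), (III)
    print(beta_m, approx_power(0.15, beta_m))
```

With opposite effect directions the linear combination has mean zero under these assumptions, so its power falls to the significance level, whereas the quadratic combination keeps the full noncentrality; this is the mechanism behind regime (III).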
Finally, across the range of parameter values under Scenario A3, the empirical powers of $T_{2,mega}$ and $T_{2,metaQ}$ are the same, as they were under A1 and A2. Thus, although theoretical justification is needed, our simulation results suggest that the 2 df interaction mega-analysis can be obtained from sex-stratified summary statistics through $T_{2,metaQ}$ (quadratically combining $Z_{Female}^2$ and $Z_{Male}^2$).
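As a concrete illustration of the summary-statistic route, the sketch below computes three of the tests from one SNP's sex-stratified Z statistics. It is a minimal example under stated assumptions rather than the authors' implementation: $T_{2,metaQ}$ is taken as $Z_{Female}^2 + Z_{Male}^2$ referred to a $\chi^2_2$ distribution, $T_{1,metaL}$ uses square-root sample-size weights, and the minimum p-value is adjusted with a Sidak correction for two independent tests. The function name `sex_stratified_tests` and the returned keys are hypothetical.

```python
import math
from scipy import stats

def sex_stratified_tests(z_f, z_m, n_f, n_m):
    """Combine female and male Z statistics for one SNP (assumed definitions)."""
    # Omnibus (quadratic) meta-analysis: 2 df chi-square
    t_q = z_f**2 + z_m**2
    p_q = stats.chi2.sf(t_q, 2)

    # Traditional (linear) meta-analysis with assumed sqrt-sample-size weights
    z_l = (math.sqrt(n_f) * z_f + math.sqrt(n_m) * z_m) / math.sqrt(n_f + n_m)
    p_l = 2 * stats.norm.sf(abs(z_l))

    # Minimum p-value of the two sex-stratified tests, Sidak-adjusted (assumption);
    # p * (2 - p) equals 1 - (1 - p)**2 but is stable for very small p
    p_min = min(2 * stats.norm.sf(abs(z_f)), 2 * stats.norm.sf(abs(z_m)))
    p_min_adj = p_min * (2 - p_min)

    return {"T_metaQ": t_q, "p_metaQ": p_q,
            "Z_metaL": z_l, "p_metaL": p_l,
            "p_min_adj": p_min_adj}

# Opposite-direction effects of equal strength in the two sexes
res = sex_stratified_tests(z_f=5.0, z_m=-5.0, n_f=5000, n_m=5000)
print(res)
```

In this example the linear combination cancels exactly while the quadratic combination does not, which mirrors the behaviour described for opposite effect directions.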

Sensitivity studies with model mis-specification and binary outcome
When the error distribution is $t_4$ or $\chi^2_4$, given our simulation sample size ($n_F = 5{,}000$ with a male-to-female sample size ratio of 0.5, 1, 1.5, or 2), the empirical type I error rates can be slightly inflated for all six tests examined when the MAF is low, at 0.05 (Fig vi). This is, however, not surprising, as the sampling distribution of the least squares estimator converges more slowly when the residual distribution is non-normal, particularly when it is asymmetric (e.g. $\chi^2_4$) [2]. Consequently, we did not consider power evaluation with non-normal residuals.
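Whether an empirical rate counts as inflated can be judged with the binomial proportion interval given in the caption of Fig i. The sketch below applies that formula; the rejection counts in the usage lines are hypothetical values chosen for illustration and are not results from the study, and the function names are likewise hypothetical.

```python
import math

def type1_ci(alpha_hat, n_rep, z=1.96):
    """95% binomial proportion interval: alpha_hat +/- z*sqrt(alpha_hat(1-alpha_hat)/R)."""
    half = z * math.sqrt(alpha_hat * (1 - alpha_hat) / n_rep)
    return alpha_hat - half, alpha_hat + half

def covers_nominal(n_reject, n_rep, nominal):
    """True if the interval around the empirical rate contains the nominal level."""
    lo, hi = type1_ci(n_reject / n_rep, n_rep)
    return lo <= nominal <= hi

R = 5_000_000          # replications used in the study
NOMINAL = 1e-5         # nominal level; R * NOMINAL = 50 expected rejections

# Hypothetical rejection counts, for illustration only
print(covers_nominal(50, R, NOMINAL))   # empirical rate equal to the nominal level
print(covers_nominal(80, R, NOMINAL))   # empirical rate of 1.6e-5
```

Because only about 50 rejections are expected at $R = 5 \times 10^6$ and $\alpha = 10^{-5}$, the interval is fairly wide relative to the nominal level, which is consistent with the remark that evaluating $\alpha = 5 \times 10^{-8}$ would need far more replicates.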
When the phenotype values are simulated from a dominant genetic model while the working model is additive, or when the trait is binary, the type I error control results are similar to those above (Fig vii and Fig viii). Finally, across the three alternatives considered, the relative performances of the different tests for dominant generating models (Fig ix, the right panel) and for binary traits (Fig x, left panel for the additive model and right panel for the dominant model) remain consistent with those observed for normally distributed traits and additive genetic models, as shown in Fig ii and Fig iii-Fig v.

Fig i. The empirical type I error rates at the nominal $\alpha = 10^{-5}$ based on $R = 5 \times 10^6$ replications and their 95% confidence intervals under the null scenario, with $n_F = 5{,}000$, across different $k = n_M/n_F$ and MAF. Six association testing methods were evaluated: (1) the female-only SNP main effect test ($T_{Female}$); (2) the minimum p-value of the female and male sex-stratified analyses ($T_{Min}$); (3) the traditional meta-analysis ($T_{1,metaL}$); (4) the SNP main effect test ($T_{1,mega}$); (5) the omnibus meta-analysis ($T_{2,metaQ}$), the recommended test when only sex-stratified summary statistics are available; (6) the SNP main and SNP×sex interaction joint analysis ($T_{2,mega}$). The dashed line indicates the nominal type I error rate of $10^{-5}$. The 95% binomial proportion confidence interval is constructed by the formula $\left(\hat{\alpha} - 1.96\sqrt{(1-\hat{\alpha})\hat{\alpha}/R},\ \hat{\alpha} + 1.96\sqrt{(1-\hat{\alpha})\hat{\alpha}/R}\right)$, where $\hat{\alpha}$ is the empirical type I error rate.

Fig ii. Power comparison at $\alpha = 5 \times 10^{-8}$ with female sample size $n_F = 5{,}000$ and $k = 1$ ($n_M = 5{,}000$) under the three alternative scenarios, A1, A2 and A3, stratified by the MAF. A1: homogeneous genetic effect between females and males, with genetic effect sizes ranging from 0 to 0.3; A2: female-only genetic effect, with genetic effect sizes ranging from 0 to 0.3; A3: heterogeneous genetic effect between females and males, with the genetic effect in females kept at 0.15 while the effect in males ranged from $-0.3$ to 0.3. Six association testing methods were evaluated: (1) the female-only SNP main effect test ($T_{Female}$); (2) the minimum p-value of the sex-stratified analyses ($T_{Min}$); (3) the traditional meta-analysis ($T_{1,metaL}$); (4) the SNP main effect test ($T_{1,mega}$); (5) the omnibus meta-analysis ($T_{2,metaQ}$), the recommended test when only sex-stratified summary statistics are available; (6) the SNP main and SNP×sex interaction joint analysis ($T_{2,mega}$). Results for other male-to-female ratios $k = n_M/n_F$ are shown in Fig iii-Fig v.

Fig iii. Power comparison at $\alpha = 5 \times 10^{-8}$ with female sample size $n_F = 5{,}000$ under the alternative scenario A1 (homogeneous genetic effects between females and males), stratified by MAF and male-to-female sample size ratio. The genetic effect sizes ranged from 0 to 0.3. Six association testing methods were evaluated: (1) the female-only SNP main effect test ($T_{Female}$); (2) the minimum p-value of the sex-stratified analyses ($T_{Min}$); (3) the traditional meta-analysis ($T_{1,metaL}$); (4) the SNP main effect test ($T_{1,mega}$); (5) the omnibus meta-analysis ($T_{2,metaQ}$), the recommended test when only sex-stratified summary statistics are available; (6) the SNP main and SNP×sex interaction joint analysis ($T_{2,mega}$).

Fig iv. Power comparison at $\alpha = 5 \times 10^{-8}$ with female sample size $n_F = 5{,}000$ under the alternative scenario A2 (female-only genetic effect), stratified by MAF and male-to-female sample size ratio. The genetic effect sizes ranged from 0 to 0.3. Six association testing methods were evaluated: (1) the female-only SNP main effect test ($T_{Female}$); (2) the minimum p-value of the sex-stratified analyses ($T_{Min}$); (3) the traditional meta-analysis ($T_{1,metaL}$); (4) the SNP main effect test ($T_{1,mega}$); (5) the omnibus meta-analysis ($T_{2,metaQ}$), the recommended test when only sex-stratified summary statistics are available; (6) the SNP main and SNP×sex interaction joint analysis ($T_{2,mega}$).

Fig v. Power comparison at $\alpha = 5 \times 10^{-8}$ with female sample size $n_F = 5{,}000$ under the alternative scenario A3 (heterogeneous effects), stratified by MAF and male-to-female sample size ratio. The genetic effect in females was kept at 0.15, while the effect in males ranged from $-0.3$ to 0.3. Six association testing methods were evaluated: (1) the female-only SNP main effect test ($T_{Female}$); (2) the minimum p-value of the sex-stratified analyses ($T_{Min}$); (3) the traditional meta-analysis ($T_{1,metaL}$); (4) the SNP main effect test ($T_{1,mega}$); (5) the omnibus meta-analysis ($T_{2,metaQ}$), the recommended test when only sex-stratified summary statistics are available; (6) the SNP main and SNP×sex interaction joint analysis ($T_{2,mega}$).

Fig vi. Results of sensitivity study: non-normal residuals. The empirical type I error rates at the nominal $\alpha = 10^{-5}$ based on $R = 5 \times 10^6$ replications and their 95% confidence intervals under the null scenario, with $n_F = n_M = 5{,}000$, stratified by residual distributions: (a) standard normal, (b) $t_4$ and (c) $\chi^2_4$. The genotypes were simulated under the additive model. Six association testing methods were evaluated: (1) the female-only SNP main effect test ($T_{Female}$); (2) the minimum p-value of the sex-stratified analyses ($T_{Min}$); (3) the traditional meta-analysis ($T_{1,metaL}$); (4) the SNP main effect test ($T_{1,mega}$); (5) the omnibus meta-analysis ($T_{2,metaQ}$), the recommended test when only sex-stratified summary statistics are available; (6) the SNP main and SNP×sex interaction joint analysis ($T_{2,mega}$). The dashed line indicates the nominal type I error rate of $10^{-5}$. The 95% binomial proportion confidence interval is constructed by the formula $\left(\hat{\alpha} - 1.96\sqrt{(1-\hat{\alpha})\hat{\alpha}/R},\ \hat{\alpha} + 1.96\sqrt{(1-\hat{\alpha})\hat{\alpha}/R}\right)$, where $\hat{\alpha}$ is the empirical type I error rate.

Fig vii. Results of sensitivity study: non-additive generating model. The empirical type I error rates at the nominal $\alpha = 10^{-5}$ based on $R = 5 \times 10^6$ replications and their 95% confidence intervals under the null scenario, with $n_F = n_M = 5{,}000$, stratified by residual distributions: (a) standard normal, (b) $t_4$ and (c) $\chi^2_4$. The genotypes were simulated under the dominant model. Six association testing methods were evaluated: (1) the female-only SNP main effect test ($T_{Female}$); (2) the minimum p-value of the sex-stratified analyses ($T_{Min}$); (3) the traditional meta-analysis ($T_{1,metaL}$); (4) the SNP main effect test ($T_{1,mega}$); (5) the omnibus meta-analysis ($T_{2,metaQ}$), the recommended test when only sex-stratified summary statistics are available; (6) the SNP main and SNP×sex interaction joint analysis ($T_{2,mega}$). The dashed line indicates the nominal type I error rate of $10^{-5}$. The 95% binomial proportion confidence interval is constructed by the formula $\left(\hat{\alpha} - 1.96\sqrt{(1-\hat{\alpha})\hat{\alpha}/R},\ \hat{\alpha} + 1.96\sqrt{(1-\hat{\alpha})\hat{\alpha}/R}\right)$, where $\hat{\alpha}$ is the empirical type I error rate.

Fig viii. Results of sensitivity study: binary trait and non-additive generating model. The empirical type I error rates with binary response at the nominal $\alpha = 10^{-5}$ based on $R = 5 \times 10^6$ replications and their 95% confidence intervals under the null scenario, with $n_F = n_M = 5{,}000$. We simulated data from (a) additive and (b) dominant genetic models, respectively, although we assumed an additive model for association testing. Six association testing methods were evaluated: (1) the female-only SNP main effect test ($T_{Female}$); (2) the minimum p-value of the sex-stratified analyses ($T_{Min}$); (3) the traditional meta-analysis ($T_{1,metaL}$); (4) the SNP main effect test ($T_{1,mega}$); (5) the omnibus meta-analysis ($T_{2,metaQ}$), the recommended test when only sex-stratified summary statistics are available; (6) the SNP main and SNP×sex interaction joint analysis ($T_{2,mega}$). The dashed line indicates the nominal type I error rate of $10^{-5}$. The 95% binomial proportion confidence interval is constructed by the formula $\left(\hat{\alpha} - 1.96\sqrt{(1-\hat{\alpha})\hat{\alpha}/R},\ \hat{\alpha} + 1.96\sqrt{(1-\hat{\alpha})\hat{\alpha}/R}\right)$, where $\hat{\alpha}$ is the empirical type I error rate.

Fig x. Binary trait power at $\alpha = 5 \times 10^{-8}$ under the three alternative scenarios, stratified by genetic model and male-to-female sample size ratio ($k$). Sample sizes are $n_F = 5{,}000$ and $n_M = k \times n_F$, with MAF $= 0.25$. The columns correspond to the additive and dominant genetic models, respectively, although we assumed an additive model for association testing. Six association testing methods were evaluated: (1) the female-only SNP main effect test ($T_{Female}$); (2) the minimum p-value of the sex-stratified analyses ($T_{Min}$); (3) the traditional meta-analysis ($T_{1,metaL}$); (4) the SNP main effect test ($T_{1,mega}$); (5) the omnibus meta-analysis ($T_{2,metaQ}$), the recommended test when only sex-stratified summary statistics are available; (6) the SNP main and SNP×sex interaction joint analysis ($T_{2,mega}$).