Fine-Scale Patterns of Population Stratification Confound Rare Variant Association Tests

Advances in next-generation sequencing technology have enabled systematic exploration of the contribution of rare variation to Mendelian and complex diseases. Although it is well known that population stratification can generate spurious associations with common alleles, its impact on rare variant association methods remains poorly understood. Here, we performed exhaustive coalescent simulations with demographic parameters calibrated from exome sequence data to evaluate the performance of nine rare variant association methods in the presence of fine-scale population structure. We find that all methods have an inflated spurious association rate for parameter values that are consistent with levels of differentiation typical of European populations. For example, at a nominal significance level of 5%, some test statistics have a spurious association rate as high as 40%. Finally, we empirically assess the impact of population stratification in a large data set of 4,298 European American exomes. Our results have important implications for the design, analysis, and interpretation of rare variant genome-wide association studies.


Figures
[Figure S1 flowchart. Panel A, Calibrated Parameters: calibrate parameters using Eq. 4-5 on exome data. Panel B, Five Pop. Null Test: using parameters from A, generate cases/controls with Eq. 1-3. Panel C, Five Pop. Power Test: using parameters from A, generate cases/controls with a logistic function first, and then Eq. 1-3. Panel D, Two Pop. Null Test: using parameters from A, but with only two populations and variable split times, generate cases/controls with Eq. 1-3.]
Figure S1: Simulation scenarios for the various analyses. A) A reproduction of Figure 1 from the main text. B) Data produced by this scenario were used to test spurious association rates with five populations. C) Similar to B, but the data were combined with a logistic regression to generate 'causative' variants in order to test for power. D) A slimmed-down version of B in which the split time parameter was varied to test how divergence affects spurious association rates.

Figure S5: The effects of PCA correction on logistic regression-based methods. Similar to the results reported for CMC in Figure 3, these are the results for T1, StepUp, and StepDown in the five-population scenario. The columns show T1, StepUp, and StepDown; the first row has no PC correction, the second includes one PC as a covariate, and the final row includes ten PCs as covariates. Spurious association rates (SAR) lower than 5% are represented as white; higher rates are shown on a sequential color scale from red (lowest) to blue (highest).

Figure S7: The effects of PCA correction on low p-values in logistic regression-based methods.
Here, we repeated the analysis of Figure 2 and Figure S5 for the T1 method, but increased the number of permutations to 100,000 in order to sample low p-values. All of the data are the same as before, including phenotypes; with 1,000 "gene" repetitions, the expected number of spurious associations is approximately zero at α = 0.0001. Spurious association rates (SAR) lower than 0.1% are represented as white; higher rates are shown on a sequential color scale from red (lowest) to blue (highest; in this figure, 8%).

Figure S9: The effects of PCA correction on logistic regression-based methods in the two-population scenario. The columns show CMC, T1, StepUp, and StepDown; the first row has no PC correction, the second includes one PC as a covariate, and the final row includes ten PCs as covariates. The dashed black line represents the α = 0.05 value used to determine significance, and the dotted lines represent the 95% confidence intervals calculated by bootstrapping.

Tables

T(rare): a logistic regression test that predicts the phenotype from the presence or absence of any rare variant (≤ rare) in a region (for example, if rare is defined as 1%, then it is the common T1 test) [1][2][3].

Combined Multivariate and Collapsed Test (CMC): based on a logistic regression. Rare variants are collapsed into an aggregate as in T(rare), but the common variants (defined as > rare) are each included as a separate predictor of the phenotype; thus the model is $\mathrm{logit}(Y) = \alpha + \beta_a X_a + \sum_{i=1}^{N} \beta_i X_i$, where $X_a$ is the aggregate variable and $N$ is the number of common variants.

Madsen-Browning Weight Test: as implemented in Madsen and Browning (2009) [4]. It is based on a rank statistic of variants, weighted by allele frequency in unaffected individuals (e.g. $1/[n_i p_i q_i]$). Significance is assessed as a one-tailed test, either by normal approximation by permutation or by standard permutation.

Variable Threshold Test: a one-tailed Z statistic, which is optimized by assessing the frequency of alleles that should be included [3].

RareCover: as implemented by Bhatia et al. (2010) [5]. It is a χ² test that selects and collapses rare variants with a greedy optimization algorithm.

StepUp: similar to RareCover, but based on logistic regression [6,7]. Initially, the model fits each variant separately to estimate relative coefficients (negative equals protective, positive equals detrimental). Then each variant is added to the appropriate aggregate variable one at a time, optimizing for the highest likelihood. The model is $\mathrm{logit}(Y) = \alpha + \beta_p X_p + \beta_n X_n$, where 'p' signifies positive and 'n' signifies negative.

StepDown: a variant of StepUp's optimization procedure. Instead of starting with no variants and adding them one at a time, it starts with all variants in their aggregate variable and tests each one by sequential deletion. If deleting a variant does not reduce the likelihood, the variant is restored. It cycles through the variants only once and is faster than StepUp.
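
The collapsing step behind T(rare) and the spurious association rate (SAR) reported in the figures can be illustrated with a short sketch. This is not the authors' implementation. The function names (`collapse_rare`, `carrier_diff`, `permutation_p`, `spurious_association_rate`) are hypothetical, genotypes are assumed to be coded as minor-allele counts (0, 1, 2) per individual, and, to keep the example dependency-free, the logistic regression is replaced by a simpler permutation test on the difference in carrier proportion between cases and controls.

```python
import random


def collapse_rare(genotypes, rare=0.01):
    """Return one 0/1 flag per individual: 1 if the individual carries any
    variant whose sample allele frequency is <= rare (hypothetical helper)."""
    n = len(genotypes)
    n_var = len(genotypes[0])
    # Sample allele frequency per variant (diploid: 2 alleles per individual).
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(n_var)]
    rare_idx = [j for j, f in enumerate(freqs) if 0 < f <= rare]
    return [int(any(row[j] > 0 for j in rare_idx)) for row in genotypes]


def carrier_diff(carrier, phenotype):
    """Absolute difference in carrier proportion between cases (1) and controls (0)."""
    cases = [c for c, y in zip(carrier, phenotype) if y == 1]
    controls = [c for c, y in zip(carrier, phenotype) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permutation_p(carrier, phenotype, n_perm=999, seed=0):
    """Permutation p-value: shuffle phenotype labels and count statistics at
    least as extreme as the observed one (assumption: stands in for the
    logistic regression used by T1 in the text)."""
    rng = random.Random(seed)
    observed = carrier_diff(carrier, phenotype)
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if carrier_diff(carrier, shuffled) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def spurious_association_rate(p_values, alpha=0.05):
    """Fraction of null replicates ("genes") declared significant at alpha."""
    return sum(p <= alpha for p in p_values) / len(p_values)
```

Under a well-calibrated test with no stratification, `spurious_association_rate` applied to p-values from null replicates should be close to `alpha`; values well above `alpha` correspond to the colored (non-white) cells described in the figure captions. Note that the add-one correction bounds the smallest attainable p-value at 1/(n_perm + 1), which is why sampling p-values near 0.0001 requires on the order of 100,000 permutations.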