A sibling method for identifying vQTLs

The propensity of a trait to vary within a population may have evolutionary, ecological, or clinical significance. In the present study we deploy sibling models to offer a novel and unbiased way to ascertain loci associated with the extent to which phenotypes vary (variance-controlling quantitative trait loci, or vQTLs). Previous methods for vQTL mapping either exclude genetically related individuals or treat genetic relatedness as a complicating factor, addressed by adjusting estimates for non-independence in phenotypes. The present method instead uses genetic relatedness as a tool for obtaining unbiased estimates of variance effects rather than treating it as a nuisance. The family-based approach, which exploits random variation between siblings in minor allele counts at a locus, also allows controls for parental genotype, mean effects, and non-linear (dominance) effects that may spuriously appear to generate variation. Simulations show that the approach performs as well as two existing methods (squared Z-score and DGLM) in controlling type I error rates when there is no unobserved confounding, and significantly better than these methods in the presence of small degrees of confounding. Using height and BMI as empirical applications, we investigate SNPs that alter within-family variation in these traits, as well as pathways that appear to be enriched. One significant SNP for BMI variability, in the MAST4 gene, replicated. Pathway analysis revealed one gene set, encoding members of several signaling pathways related to gap junction function, that appears significantly enriched for associations with within-family height variation in both datasets (while not enriched in the analysis of mean levels). We recommend approximating laboratory random assignment of genotype using family data and paying more careful attention to the possible conflation of mean and variance effects.
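As an illustration only, the sibling-pair regression described above can be sketched in Python. The sketch assumes that the predictor is the pair's total minor allele count, that the controls are the summed parental genotype and the pair's mean trait value, and that p-values come from a normal approximation. The data-generating model, the function names (`simulate_sibling_pairs`, `ols_pvalues`, `rejection_rate`), and all parameter values are hypothetical and are not taken from the study.

```python
import math
import numpy as np


def simulate_sibling_pairs(n_pairs, maf, mean_eff, var_eff, rng):
    """Simulate genotypes and a trait for sibling pairs (hypothetical model)."""
    # Each parent carries two alleles; 1 marks the minor allele.
    mother = rng.binomial(1, maf, size=(n_pairs, 2))
    father = rng.binomial(1, maf, size=(n_pairs, 2))
    rows = np.arange(n_pairs)

    def child_genotype():
        # Mendelian transmission: one randomly chosen allele from each parent.
        from_mother = mother[rows, rng.integers(0, 2, size=n_pairs)]
        from_father = father[rows, rng.integers(0, 2, size=n_pairs)]
        return from_mother + from_father

    g1, g2 = child_genotype(), child_genotype()
    parental = mother.sum(axis=1) + father.sum(axis=1)
    family = rng.normal(0.0, 1.0, size=n_pairs)  # intercept shared by both siblings

    def trait(g):
        # Each minor allele shifts the mean by mean_eff and multiplies
        # the residual SD by exp(var_eff). Both are assumptions of this sketch.
        return family + mean_eff * g + np.exp(var_eff * g) * rng.normal(size=n_pairs)

    return g1, g2, parental, trait(g1), trait(g2)


def ols_pvalues(X, y):
    """OLS coefficients with two-sided p-values from a normal approximation."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    pvals = np.array([math.erfc(abs(z) / math.sqrt(2.0)) for z in beta / se])
    return beta, pvals


def rejection_rate(n_reps, n_pairs, maf, mean_eff, var_eff, alpha=0.05, seed=0):
    """Share of replicates in which the allele-count coefficient has p < alpha."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_reps):
        g1, g2, parental, y1, y2 = simulate_sibling_pairs(
            n_pairs, maf, mean_eff, var_eff, rng
        )
        pair_sd = np.abs(y1 - y2) / math.sqrt(2.0)  # sample SD of two values
        # Columns: intercept, pair allele count, parental genotype, pair mean.
        X = np.column_stack(
            [np.ones(n_pairs), g1 + g2, parental, (y1 + y2) / 2.0]
        )
        _, pvals = ols_pvalues(X, pair_sd)
        hits += int(pvals[1] < alpha)
    return hits / n_reps
```

Calling `rejection_rate(200, 1000, maf=0.3, mean_eff=0.0, var_eff=0.0)` estimates the type I error rate under this toy null model, and the same call with `var_eff=0.3` estimates power. These settings are illustrative and do not reproduce the simulations reported in the paper.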

A) Observed versus expected p-value distributions for analysis of sibling-pair standard deviation in height for FHS generation-three respondents, with controls for parental genotype, mean height of the sibling pair, sex, and sex difference. B) Same as in (A) except for BMI instead of height. Shaded gray regions depict 95% confidence intervals.
Figure D. Power to detect an effect size of R². The figure contrasts power at three potential sample sizes, defined as the number of sibling pairs in the data (see Methods): 1) the Framingham Heart Study (FHS) sample used in the present analysis; 2) the National Longitudinal Study of Adolescent to Adult Health (AddHealth) sample; and 3) the UK Biobank sample. Likewise, the figure contrasts two potential p-value thresholds: p < 10⁻⁵ for the discovery analysis and p < 0.05 for the confirmation analysis. The figure shows that although the sample used in the present analysis (FHS) is not adequately powered to detect realistic effect sizes of R² < 0.01, newly released datasets with larger sibling subsamples are adequately powered to detect effects using the method.
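The dependence of power on the number of sibling pairs and on the p-value threshold can be illustrated with a rough calculation. The sketch below assumes a two-sided one-degree-of-freedom test with a normal approximation and a noncentrality parameter of N·R²/(1 − R²); the calculation described in the paper's Methods may differ. The function name `power_for_r2` and the sample sizes in the loop are hypothetical placeholders, not the actual FHS, AddHealth, or UK Biobank counts.

```python
from statistics import NormalDist


def power_for_r2(n_pairs, r2, alpha):
    """Approximate power of a two-sided 1-df test for a predictor explaining r2."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    std_normal = NormalDist()
    # Square root of the assumed noncentrality parameter n_pairs * r2 / (1 - r2).
    shift = (n_pairs * r2 / (1.0 - r2)) ** 0.5
    critical = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Probability that the test statistic falls in either rejection tail.
    return std_normal.cdf(shift - critical) + std_normal.cdf(-shift - critical)


# Hypothetical numbers of sibling pairs, chosen only to show the trend.
for n_pairs in (1_000, 5_000, 20_000):
    for alpha in (1e-5, 0.05):
        print(n_pairs, alpha, round(power_for_r2(n_pairs, 0.005, alpha), 3))
```

Under these assumptions, power rises with the number of pairs and falls as the threshold tightens, which is the qualitative pattern the figure describes.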
The results show two types of inflated type I error rates. First, when a variant has null effects (effects on neither the mean nor the variance) and there is confounding, the DGLM has an inflated type I error rate, detecting β ≠ 0 in 77.8% of simulations. Second, when a variant has variance effects but no mean effects, the method also has an inflated type I error rate, detecting β ≠ 0 in 11% of cases in the absence of confounding and 70% of cases in the presence of confounding.
Columns: coefficient on minor allele count; % coef ≠ 0 at p < 0.05.

The results show an inflated type I error rate (estimating β ≠ 0 even though the allele affects the variance and not the mean) that is smaller but still present in the demeaned data. The results also show that while demeaning reduces the type I error rate (false detection of mean effects), the transformation leads to type II errors (failure to detect variance effects when these are present).
Columns: coefficient on minor allele count; % coef ≠ 0 at p < 0.05.

The results show the percentage of simulations for which the coefficient on the minor allele count is significant at the p < 0.05 level when we regress the sibling standard deviation of the trait on this count and controls. For the method to adequately control type I error, this percentage should be low for traits simulated to have (1) neither mean nor variance effects or (2) mean effects only. For the method to be adequately powered, this percentage should be high for traits simulated to have (1) variance effects only or (2) both mean and variance effects. On the first goal, the squared Z-score and DGLM each display type I error rates of around 20% in detecting variance effects in traits simulated to have mean effects only when an unobserved confounder is present, whereas the sibling SD method avoids this type of error (underlined rows) both with and without controls for parental genotype. On the second goal, the method detects variance effects both when the trait has variance effects only and when it exhibits both mean and variance effects. The first half of the table also shows a lower type I error rate than the squared Z-score when there is no unobserved confounder.

Table K. Illustrating correlation between population indicators and family-level intercept across 1000 replicates at four degrees of family-level confounding. The results show that at low levels of confounding (ρ ≤ 0.1), the broad ancestry indicators are correlated with the indicator for family genotype but are too broad to fully capture the confounding. As the confounding increases beyond low levels, the ancestry indicators better capture the confounding.
cor(X_ij, α_j)    Percent significant β on population indicator
0                 0.1520
0.01              0.1790
0.05              0.1630
0.1               0.1750

Table L. Relationship between family intercept and observed genotype. The table shows that as the degree of between-family confounding increases, there is a stronger relationship between the genotype and the intercept that shifts levels of a trait up or down between families.