Admixture mapping in the Hispanic Community Health Study/Study of Latinos reveals regions of genetic associations with blood pressure traits

Admixture mapping can be used to detect genetic association regions in admixed populations, such as Hispanics/Latinos, by estimating associations between local ancestry allele counts and the trait of interest. We performed admixture mapping of the blood pressure (BP) traits systolic and diastolic blood pressure (SBP, DBP), mean arterial pressure (MAP), and pulse pressure (PP), in a dataset of 12,116 participants from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). Hispanics/Latinos have three predominant ancestral populations (European, African, and Amerindian), for each of which we separately tested local ancestry intervals across the genome. We identified four regions that were significantly associated with a blood pressure trait at the genome-wide admixture mapping level. A 6p21.31 Amerindian ancestry association region has multiple known associations, but none explained the admixture mapping signal. We identified variants that completely explained this signal. One of these variants had p-values of 0.02 (MAP) and 0.04 (SBP) in replication testing in Pima Indians. An 11q13.4 Amerindian ancestry association region spans a variant that was previously reported (p-value = 0.001) in a targeted association study of BP traits and variants in the vitamin D pathway. There was no replication evidence supporting an association in the identified 17q25.3 Amerindian ancestry association region. For a region on 6p12.3, associated with African ancestry, we did not identify any candidate variants driving the association; it may be driven by rare variants. Whole genome sequence data may be necessary to fine-map these association signals, which may contribute to disparities in BP traits between diverse populations.


The linear model and local ancestries
Suppose that an outcome $y$ follows the linear model
$$y_i = x_i^T \alpha + g_i \beta + \epsilon_i, \quad i = 1, \ldots, n,$$
where $\alpha$ is the vector of covariate effects, $x_i$ is the vector of covariates for participant $i$, $g_i$ is the genotype count (or dosage) of participant $i$, $\beta$ is its effect, and $\epsilon_i$ is a random error, independent of the covariates and the genotype (but not necessarily independent between participants). Note that we assume here that the genotype effect is the same for all individuals, regardless of subgrouping, e.g. by local genetic ancestry. Suppose further that study individuals are admixed and, for simplicity, that the admixture is between two parental populations. Therefore, each allele of variant $g$ could have been inherited by descent from ancestry $A_1$ or from ancestry $A_2$. We assume that there are intervals with known local ancestry (LAI, local ancestry intervals), and that within each such interval all of a participant's alleles on a given chromosome copy originate from the same ancestry.
Focusing on a given local ancestry interval, each individual $i$ carries two alleles of variant $g$, one on each chromosome copy (this holds for all variants in the interval; here we focus on a single variant). Each allele of $g$ is either $a$ or $b$. In a model in which $a$ is the effect allele, we say that $g_{i1} = 0$ if the allele of $g$ on one of the chromosome copies is $b$ and $g_{i1} = 1$ if it is $a$. Similarly, $g_{i2}$ takes the value 0 or 1 according to the allele on the other copy. More formally, for $j = 1, 2$,
$$g_{ij} = \begin{cases} 0 & \text{if the allele of } g \text{ on copy } j \text{ of its chromosome is } b, \\ 1 & \text{if the allele of } g \text{ on copy } j \text{ of its chromosome is } a, \end{cases}$$
so that $g_i = g_{i1} + g_{i2}$. Since the allele $g_{ij}$ may be inherited from either $A_1$ or $A_2$, it has the conditional expected values
$$E[g_{ij} \mid A_k] = \Pr(g_{ij} = 1 \mid \text{LAI on copy } j \text{ of the chromosome is from population } k) = p_k,$$
which is the frequency of allele $a$ in population $A_k$, $k = 1, 2$.

When the variant effect is the same in the two populations
In admixture mapping, we regress the outcomes against counts of (estimated) local ancestries. Therefore, our working model is
$$y_i = x_i^T \alpha + \gamma (\ell_{i1} + \ell_{i2}) + \epsilon_i, \quad i = 1, \ldots, n,$$
where the local ancestry indicator, written here as $\ell_{ij}$, is
$$\ell_{ij} = \begin{cases} 0 & \text{if the ancestry of the LAI of } g \text{ on copy } j \text{ of its chromosome is } A_1, \\ 1 & \text{if the ancestry of the LAI of } g \text{ on copy } j \text{ of its chromosome is } A_2. \end{cases}$$
Note that genotype $g$ is not directly modeled, so the model and the estimated effect $\gamma$ do not explicitly depend on the genotypes in the LAI. However, the relationship between the admixture mapping estimand and the genotype effect estimand can be studied, specifically under the single allelic model, which assumes that only a single variant in the interval is associated with the outcome.
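To characterize the admixture mapping estimand $\gamma$, consider the expected value of the outcome under the three possible settings of local ancestry:
$$E[y_i \mid \text{both copies from } A_1] = x_i^T \alpha + 2\beta p_1,$$
$$E[y_i \mid \text{one copy from } A_1, \text{ one from } A_2] = x_i^T \alpha + \beta (p_1 + p_2),$$
$$E[y_i \mid \text{both copies from } A_2] = x_i^T \alpha + 2\beta p_2.$$
The constant term $2\beta p_1$ is absorbed into the intercept of the working model. Thus, the effect of increasing the $A_2$ ancestry count by 1 is $\gamma = \beta(p_2 - p_1)$, which depends on both the genotype effect and the difference between the allele frequencies in the two ancestries.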
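The relation $\gamma = \beta(p_2 - p_1)$ can be checked numerically. The following is a minimal simulation sketch, not part of the original analysis. It assumes no covariates other than an intercept, independent chromosome copies, and standard normal errors; the function name `simulate_admixture_slope` and all parameter values are hypothetical choices made for illustration.

```python
import random


def simulate_admixture_slope(beta, p1, p2, prob_a2=0.5, n=100_000, seed=1):
    """Simulate the single-variant model and return the least-squares slope
    of the outcome on the A2 local ancestry count (an estimate of gamma).

    Assumptions (illustrative only): intercept-only covariates, independent
    chromosome copies, N(0, 1) errors.
    """
    rng = random.Random(seed)
    counts, outcomes = [], []
    for _ in range(n):
        a2_count = 0  # number of chromosome copies inherited from A2
        genotype = 0  # number of copies of the effect allele a
        for _copy in range(2):
            from_a2 = rng.random() < prob_a2
            a2_count += from_a2
            allele_freq = p2 if from_a2 else p1
            genotype += rng.random() < allele_freq
        counts.append(a2_count)
        outcomes.append(beta * genotype + rng.gauss(0.0, 1.0))

    # Simple linear regression slope: cov(count, y) / var(count).
    mean_c = sum(counts) / n
    mean_y = sum(outcomes) / n
    cov = sum((c - mean_c) * (y - mean_y) for c, y in zip(counts, outcomes)) / n
    var = sum((c - mean_c) ** 2 for c in counts) / n
    return cov / var


beta, p1, p2 = 0.5, 0.2, 0.6
gamma_hat = simulate_admixture_slope(beta, p1, p2)
gamma_theory = beta * (p2 - p1)
print(f"simulated slope: {gamma_hat:.3f}, beta * (p2 - p1): {gamma_theory:.3f}")
```

Because the genotype is never used in the regression, the slope recovers only the product of the genotype effect and the allele frequency difference, not $\beta$ itself.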

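When the variant effect differs between the two populations
Assume now that the effects differ between the two populations, say $\beta_1$ for alleles inherited from $A_1$ and $\beta_2$ for alleles inherited from $A_2$. We get:
$$E[y_i \mid \text{both copies from } A_1] = x_i^T \alpha + 2\beta_1 p_1,$$
$$E[y_i \mid \text{one copy from } A_1, \text{ one from } A_2] = x_i^T \alpha + \beta_1 p_1 + \beta_2 p_2,$$
$$E[y_i \mid \text{both copies from } A_2] = x_i^T \alpha + 2\beta_2 p_2,$$
so that $\gamma = \beta_2 p_2 - \beta_1 p_1$. Here, admixture mapping may detect an effect even if $p_1 = p_2$, in contrast to the same-effect scenario, in which $\gamma = 0$ whenever $p_1 = p_2$.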
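As a small worked check of the two scenarios, the sketch below computes the admixture mapping estimand directly from the conditional expectations. The function names and numeric values are hypothetical and chosen only for illustration.

```python
def expected_genotype_contribution(n_a2, beta1, beta2, p1, p2):
    """Expected genotype contribution to the outcome when n_a2 of the two
    chromosome copies are inherited from A2 (and 2 - n_a2 from A1)."""
    return (2 - n_a2) * beta1 * p1 + n_a2 * beta2 * p2


def admixture_estimand(beta1, beta2, p1, p2):
    """Change in expected outcome per additional A2 copy: beta2*p2 - beta1*p1."""
    return (expected_genotype_contribution(1, beta1, beta2, p1, p2)
            - expected_genotype_contribution(0, beta1, beta2, p1, p2))


# Equal allele frequencies, equal effects: no admixture mapping signal.
gamma_same_effect = admixture_estimand(0.25, 0.25, 0.3, 0.3)

# Equal allele frequencies, different effects: a signal remains.
gamma_diff_effect = admixture_estimand(0.1, 0.4, 0.3, 0.3)

print(gamma_same_effect, gamma_diff_effect)
```

Setting `beta1 == beta2` reduces the estimand to $\beta(p_2 - p_1)$, the same-effect case.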

Comparing power between association analysis and admixture mapping
Power for testing variants depends on their frequency, effect size, sample size, and the p-value threshold used for the analysis. In the case of admixture mapping, the frequency in question is in fact the frequency of one of the ancestries. While there are many variables to consider, we demonstrate here a few scenarios in which admixture mapping is more powerful than association testing, in the two-ancestry setting and under the same-effect-size assumption. Our calculations are based on the accepted p-value threshold for association analysis, the so-called genome-wide significance threshold of $5 \times 10^{-8}$, and the p-value threshold for admixture mapping found by Brown et al. (2017) for the HCHS/SOL, $5.68 \times 10^{-5}$. Note that another scenario of higher power for admixture mapping is that of a variant that was neither genotyped nor imputed but lies in an interval with inferred local ancestry. Such variants cannot be detected by association analysis, but admixture mapping may still yield significant associations.
Denote by $p_a$ the probability that an allele is inherited from ancestry $A_1$. In an association analysis that does not make use of local ancestry information, the frequency of the effect allele is
$$\begin{aligned} p(g = 1) &= p(g = 1, \text{allele from } A_1) + p(g = 1, \text{allele from } A_2) \\ &= p(g = 1 \mid \text{allele from } A_1)\, p_a + p(g = 1 \mid \text{allele from } A_2)\, (1 - p_a) \\ &= p_1 p_a + p_2 (1 - p_a) \\ &= (p_1 - p_2) p_a + p_2. \end{aligned}$$
Therefore, power for association analysis is based on allele frequency $(p_1 - p_2) p_a + p_2$ and effect size $\beta$, while power for admixture mapping is based on the ancestry frequency $p_a$ and effect size $(p_1 - p_2)\beta$. Note that if the same p-value threshold were used to determine significance in association and admixture mapping, then association mapping would always be at least as powerful as admixture mapping (given that the genotype is available for association mapping). This follows from the law of total variance: writing $f = p(g = 1)$, the per-allele genotype variance decomposes as $f(1 - f) = p_a p_1 (1 - p_1) + (1 - p_a) p_2 (1 - p_2) + p_a (1 - p_a)(p_1 - p_2)^2$, so the signal available to association testing, proportional to $f(1 - f)\beta^2$, is never smaller than the signal available to admixture mapping, proportional to $p_a (1 - p_a)(p_1 - p_2)^2 \beta^2$. The power advantage of admixture mapping instead comes from its less stringent p-value threshold ($5.68 \times 10^{-5}$ rather than $5 \times 10^{-8}$). This advantage is present when the local ancestry frequency $p_a$ is relatively high (close to 0.5), $|p_1 - p_2|$ is relatively large, and the variant effect is relatively large, as seen in Figure H.
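The comparison can be sketched with a normal approximation to the power of a two-sided test. This is an illustrative calculation and not necessarily the one used to produce Figure H. It assumes that effect sizes are expressed in standard deviations of the outcome, that the residual standard deviation is approximately 1, and that the two chromosome copies are independent, so that the variance of an allele or ancestry count with frequency $q$ is $2q(1 - q)$. All function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def two_sided_power(sqrt_ncp, alpha):
    """Approximate power of a two-sided z-test with the given square root
    of the non-centrality parameter and significance threshold alpha."""
    z_crit = -_STD_NORMAL.inv_cdf(alpha / 2)
    return (_STD_NORMAL.cdf(sqrt_ncp - z_crit)
            + _STD_NORMAL.cdf(-sqrt_ncp - z_crit))


def association_power(n, beta, p1, p2, p_a, alpha=5e-8):
    """Power of association testing: allele frequency (p1 - p2) * p_a + p2,
    effect size beta (in outcome SD units; residual SD assumed ~1)."""
    freq = (p1 - p2) * p_a + p2
    sqrt_ncp = abs(beta) * (n * 2 * freq * (1 - freq)) ** 0.5
    return two_sided_power(sqrt_ncp, alpha)


def admixture_power(n, beta, p1, p2, p_a, alpha=5.68e-5):
    """Power of admixture mapping: ancestry frequency p_a,
    effect size (p1 - p2) * beta (same unit assumptions as above)."""
    gamma = (p1 - p2) * beta
    sqrt_ncp = abs(gamma) * (n * 2 * p_a * (1 - p_a)) ** 0.5
    return two_sided_power(sqrt_ncp, alpha)


# Hypothetical scenario: balanced ancestry, strongly differentiated variant.
scenario = dict(n=12_500, beta=0.07, p1=0.9, p2=0.1, p_a=0.5)
power_assoc = association_power(**scenario)
power_admix = admixture_power(**scenario)
print(f"association: {power_assoc:.3f}, admixture mapping: {power_admix:.3f}")
```

In this hypothetical scenario admixture mapping has the higher approximate power; passing the same `alpha` to both functions reverses the ordering, in line with the argument above.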

Additional power considerations
More complex scenarios, e.g. different genotype effects between ancestries, multiple local ancestries, and multiple causal variants in the LAI, are plausible but harder to derive analytically due to the increase in the number of free parameters.

Figure H: The power difference between association and admixture mapping for given values of the frequency $p_a$ of ancestry 1 ($A_1$), the effect size in standard deviations of the outcome, the MAF $p_1$ of the single causal variant in $A_1$, and the MAF $p_2$ of the causal variant in $A_2$, based on $n = 12{,}500$ and p-value thresholds for significance of $5 \times 10^{-8}$ in association mapping and $5.68 \times 10^{-5}$ in admixture mapping. Positive values indicate larger power for association mapping, and negative values indicate larger power for admixture mapping.