Conceived and designed the experiments: AK DR. Performed the experiments: AK. Analyzed the data: AK. Wrote the paper: AK DR.

The authors have declared that no competing interests exist.

Allele frequency differences across populations can provide valuable information both for studying population structure and for identifying loci that have been targets of natural selection. Here, we examine the relationship between recombination rate and population differentiation in humans by analyzing two uniformly-ascertained, whole-genome data sets. We find that population differentiation as assessed by inter-continental F_{ST} shows negative correlation with recombination rate, with F_{ST} reduced by 10% in the tenth of the genome with the highest recombination rate compared with the tenth of the genome with the lowest recombination rate (P≪10^{−12}). This pattern cannot be explained by the mutagenic properties of recombination and instead must reflect the impact of selection in the last 100,000 years since human continental populations split. The correlation between recombination rate and F_{ST} has a qualitatively different relationship for F_{ST} between African and non-African populations and for F_{ST} between European and East Asian populations, suggesting varying levels or types of selection in different epochs of human history.

A common assumption when analyzing patterns of human genetic variation is that most of the genome can be treated as “nearly neutral,” in the sense that the effects of natural selection on allele frequencies are very small compared with the influence of population demographic history. To test the validity of this assumption, we analyzed data from more than a million human polymorphisms and summarized allele frequency differences across populations. We find that, compared with the genome-wide average, allele frequency differences are 7% reduced on average in the tenth of the genome with the highest recombination rate and are 3% increased in the tenth with the lowest rate. Such a correlation cannot be explained by demography. Instead, the pattern reflects the fact that forces of natural selection have had a profound impact on patterns of variation throughout the genome in the last 100,000 years.

Single Nucleotide Polymorphism (SNP) allele frequency differentiation between human populations is usually measured by Wright's F_{ST}. By examining patterns in F_{ST}, it may be possible to learn about how much of the overall differentiation in allele frequencies across populations can be accounted for by natural selection.

Many studies have documented a positive correlation between nucleotide diversity and recombination rate.

A substantial body of research has explored an alternative, mechanistic explanation for the observed positive correlation between nucleotide diversity and recombination, namely the mutagenic effect of recombination.

In this study, we examine the relationship between recombination rate and population differentiation in allele frequencies of ascertained polymorphisms. For this type of analysis, the mutagenic effect of recombination is not a confounder, and any observed correlation is expected to be the result of selection in human history since the split of the analyzed populations. Different types of natural selection are expected to affect patterns of allele frequency differentiation. Positive selection is predicted to produce a negative correlation between F_{ST} and recombination rate if adaptation is local and selective sweeps drive alleles to high frequency in some but not all of the populations between which F_{ST} is measured. Selective sweeps are expected to extend less far in regions of higher recombination rate, and thus allele frequency differentiation is expected to be higher on average in regions of low recombination rate. Negative selection can also produce a negative correlation between F_{ST} and recombination rate. While negative selection is expected to decrease population differentiation at the site under selection itself, background selection reduces within-population diversity at linked sites, which is expected to increase F_{ST} there, with a more marked effect in regions of lower recombination.

To empirically examine the correlation of recombination rate and allele frequency differentiation in human populations, we needed to examine data sets that were ascertained in a uniform way across the genome, in a manner that is independent of local recombination rate. We focused on two data sets: 1,110,338 Perlegen “class A” SNPs, and a set of SNPs that were uniformly ascertained in two chromosomes of the same ancestry and genotyped in the HapMap samples.

Our primary set of analyses was carried out on 1,110,338 Perlegen “class A” autosomal SNPs.

To examine whether and how population differentiation depends on recombination rate, we assigned SNPs to equally-sized bins according to the recombination rate around them. For each bin, we estimated global population differentiation for the SNPs in that bin as the F_{ST} between the three population samples, which consisted of 24 individuals of European ancestry, 24 individuals of Han Chinese ancestry, and 23 African Americans. Across bins, F_{ST} is strongly negatively correlated with the median recombination rate (r = −0.962; P = 8.9×10^{−6}). A linear regression of population differentiation as a function of recombination rate provides a reasonably good fit to the data, and predicts an average decrease of 0.0048 (4%) in F_{ST} for every 1 cM/Mb increase in recombination rate. A significant negative correlation is also observed between F_{ST} of single SNPs and the recombination rate around them (r = −0.015; P≪10^{−12}).
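The binning step described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `equal_size_bins` and the use of NumPy are assumptions, and the input is taken to be an array holding one recombination rate (cM/Mb) per SNP.

```python
import numpy as np

def equal_size_bins(rho, n_bins=10):
    """Assign SNPs to n_bins bins holding (nearly) equal numbers of SNPs.

    rho : 1-D array of recombination rates, one per SNP (hypothetical input).
    Returns per-SNP bin labels (0 = lowest rates) and the median rate per bin.
    """
    rho = np.asarray(rho, dtype=float)
    order = np.argsort(rho, kind="stable")          # SNP indices sorted by rate
    labels = np.empty(len(rho), dtype=int)
    for k, members in enumerate(np.array_split(order, n_bins)):
        labels[members] = k                         # consecutive chunks of the sort
    medians = np.array([np.median(rho[labels == k]) for k in range(n_bins)])
    return labels, medians
```

Splitting the sorted order into consecutive chunks, as opposed to cutting the recombination rate axis at fixed values, is what keeps the number of SNPs per bin equal; the per-bin medians then serve as the x-coordinates for the regression.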

We placed 1,110,338 SNPs into 10 bins according to the recombination rate in a 3 Mb window centered on each SNP. The x-axis of all panels indicates the recombination rate, with the values indicated on the ticks corresponding to the edges between 10 bins. For each bin, at an x-axis position corresponding to the median recombination rate across the SNPs at that bin, the figure presents (A) global population differentiation between African Americans, Europeans, and Chinese; (B) F_{ST} between African Americans and Europeans; (C) F_{ST} between African Americans and Chinese; and (D) F_{ST} between Europeans and Chinese. Error bars indicate ±1 standard error, which is estimated based on 1,000 moving block bootstraps over the SNPs in the bin. Linear regression of F_{ST} estimates as a function of median recombination rate in each bin is also presented (solid line) and corresponds to (A) 0.1280 − 0.0048ρ, (B) 0.1138 − 0.0057ρ, (C) 0.1546 − 0.0067ρ, and (D) 0.1156 − 0.0022ρ. The corresponding correlation coefficient estimates between F_{ST} and median recombination rate are (A) r = −0.962 (P = 8.9×10^{−6}), (B) −0.815 (P = 0.0041), (C) −0.931 (P = 0.0001), and (D) −0.361 (P = 0.306). For comparison, population differentiation based on all SNPs in all bins combined is also presented (horizontal dotted line). The y-axis range is different between the four panels but spans 0.02 units in all.
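As a worked check of how the fitted lines translate into a relative decrease per cM/Mb, the short sketch below divides each slope by its intercept. The intercepts and slopes are the ones reported in the caption; the helper names `relative_slope` and `predicted_fst` are illustrative assumptions.

```python
# Intercept b0 and slope b1 of F_ST = b0 + b1 * rho, as reported for each panel.
FITS = {
    "A: global":              (0.1280, -0.0048),
    "B: AA vs. Europeans":    (0.1138, -0.0057),
    "C: AA vs. Chinese":      (0.1546, -0.0067),
    "D: Europeans vs. Chinese": (0.1156, -0.0022),
}

def relative_slope(b0, b1):
    """Fractional change in F_ST per 1 cM/Mb, relative to the intercept."""
    return b1 / b0

def predicted_fst(b0, b1, rho):
    """F_ST predicted by the linear fit at recombination rate rho (cM/Mb)."""
    return b0 + b1 * rho

if __name__ == "__main__":
    for panel, (b0, b1) in FITS.items():
        print(f"{panel}: {100 * relative_slope(b0, b1):.1f}% per cM/Mb")
```

For the global fit this gives 0.0048/0.1280 ≈ 3.8%, which rounds to the 4% decrease per cM/Mb quoted in the text; the two African American comparisons come out near 5% and 4%, and the European–Chinese comparison near 2%.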

To characterize in more detail the relationship between population differentiation and recombination rate, we designed a more elaborate statistical framework that overcame three limitations of the correlation and regression analysis described above: (i) The previous analysis did not incorporate uncertainty in F_{ST} estimation that is due to a limited number of SNPs in each bin. (We did estimate standard errors in each bin via bootstrapping for presentation as error bars, but they did not enter the correlation and regression between F_{ST} and recombination rate.) (ii) Because the regression did not incorporate this uncertainty, we could not apply the analysis to many recombination rate bins, since as the number of SNPs per bin decreased the noise obscuring the signal increased. (iii) The analysis did not account for correlation between SNPs due to linkage disequilibrium (LD), which is especially important since LD is itself correlated with recombination rate.

To overcome these limitations, we developed a bootstrapping framework for estimating several statistics that capture the relationship between recombination rate and population differentiation. The framework generates many data sets of the same size as the original using a Moving Block Bootstrap (MBB).

After applying the bootstrapping framework to the data set, the correlation between F_{ST} and recombination rate remains extremely significant: r = −0.89±0.06 (P≪10^{−12}). Linear regression of F_{ST} as a function of recombination rate results in a best fitting relationship of 0.1280 − 0.0049ρ, which is very similar to the naïve analysis; but, importantly, the bootstrapping framework allows us to perform hypothesis testing based on the standard error estimates across bootstraps. The regression slope is significantly different from zero (−0.0049±0.0007; P = 2.6×10^{−12}), as is a t-statistic for the significance of the linear regression coefficient: −6.09±2.18 (P = 0.0051). The t-statistic is also significant when the data are partitioned into 40 bins (−5.45±0.99; P = 3.5×10^{−8}). We use 10 recombination rate bins in all following analyses.

Columns 2–5 report global F_{ST}; columns 6–8 report pairwise F_{ST}. Entries are estimate ± standard error across bootstraps, with P-values in parentheses.

| Statistic | All SNPs | All SNPs (40 bins) | Coding SNPs | Non-coding SNPs | AA vs. Europeans | AA vs. Chinese | Europeans vs. Chinese |
|---|---|---|---|---|---|---|---|
| Number of SNPs | 1,110,338 | 1,110,338 | 21,391 | 1,088,947 | 1,110,338 | 1,110,338 | 1,110,338 |
| b_{0} | 0.1280±0.0012 (≪10^{−12}) | 0.1277±0.0011 (≪10^{−12}) | 0.1381±0.0038 (≪10^{−12}) | 0.1279±0.0012 (≪10^{−12}) | 0.1137±0.0014 (≪10^{−12}) | 0.1547±0.0015 (≪10^{−12}) | 0.1156±0.0018 (≪10^{−12}) |
| b_{1} | −0.0049±0.0007 (2.6×10^{−12}) | −0.0046±0.0007 (5.0×10^{−11}) | −0.0081±0.0022 (2.3×10^{−4}) | −0.0048±0.0007 (7.0×10^{−12}) | −0.0057±0.0008 (1.0×10^{−12}) | −0.0067±0.0010 (2.1×10^{−11}) | −0.0021±0.0010 (0.036) |
| Correlation coefficient r | −0.8862±0.0567 (≪10^{−12}) | −0.6476±0.0678 (≪10^{−12}) | −0.6801±0.1286 (1.2×10^{−7}) | −0.8823±0.0561 (≪10^{−12}) | −0.7697±0.0589 (≪10^{−12}) | −0.8448±0.0635 (≪10^{−12}) | −0.3244±0.1544 (0.036) |
| t-statistic | −6.0926±2.1758 (0.0051) | −5.4512±0.9889 (3.5×10^{−8}) | −2.8775±1.0702 (0.0072) | −5.8950±1.9773 (0.0029) | −3.5253±0.6955 (4.0×10^{−7}) | −4.7959±1.2800 (1.8×10^{−4}) | −1.0147±0.5389 (0.0597) |
| b_{1}/b_{0} | −0.0379±0.0052 | −0.0358±0.0049 | −0.0585±0.0147 | −0.0376±0.0053 | −0.0499±0.0064 | −0.0430±0.0059 | −0.0185±0.0088 |

We cannot envision any demographic or mechanistic explanation that would produce a correlation between recombination rate and allele frequency differentiation as observed, and we hypothesize that our observations reflect a history of natural selection. Natural selection is usually expected to increase population differentiation at linked neutral sites. Consistent with this expectation, coding SNPs (cSNPs) show significantly higher (P<10^{−3}) population differentiation: 0.1265±0.0016 compared with 0.1212±0.0006 for non-coding SNPs (ncSNPs).

The novel signal of a negative correlation with recombination rate that we observed is more pronounced in genes: The slope of the regression of F_{ST} as a function of recombination rate is steeper for cSNPs than for ncSNPs (−0.0081±0.0022 compared with −0.0048±0.0007 per cM/Mb).

Global population differentiation between African Americans, Europeans, and Chinese is presented for coding SNPs (cSNPs). Except for focusing on the 21,391 SNPs in coding exons, the figure is identical to the panel for global population differentiation across all SNPs. In addition to the linear regression of F_{ST} estimates as a function of the median recombination rate in each bin (solid line; 0.1381 − 0.0081ρ), the linear regression for the rest of the data set (non-coding SNPs) is provided (dashed line; 0.1278 − 0.0048ρ), which is very similar to the regression based on the entire data set. The correlation coefficient between F_{ST} of cSNPs and median recombination rate is −0.752 (P = 0.012).

To test formally whether selection has an impact on the correlation with recombination rate above and beyond the general effect of increased population differentiation in genes, while controlling for the characteristically different recombination rate in genes, we repeated the bootstrapping framework analysis on cSNPs and compared it with analysis of ncSNPs.

We next considered the effect of recombination rate on population differentiation between each pair of populations separately. A negative correlation is observed between all pairs of populations, but the pattern is qualitatively different across population pairs. The slope is significantly steeper, in terms of the relative decrease in F_{ST} per cM/Mb (to account for the varying levels of population differentiation between different populations), for F_{ST} between African Americans and Europeans and for F_{ST} between African Americans and Chinese, than for F_{ST} between Europeans and Chinese (P = 0.004 and P = 0.021). The relative slopes do not differ significantly between F_{ST} between African Americans and Europeans and F_{ST} between African Americans and Chinese (P = 0.43). At the level of single SNPs, there is a significant correlation with recombination rate for African American–European F_{ST} (r = −0.0183; P≪10^{−12}), as well as for African American–Chinese F_{ST} (r = −0.0147; P≪10^{−12}), but no correlation with recombination rate for European–Chinese F_{ST} (r = 0.0002; P = 0.81).

The weaker correlation for the F_{ST} between European and Chinese populations is driven by a dip in differentiation at very low recombination rate loci. A quadratic regression for this pair gives significant linear (P<10^{−3}) as well as quadratic (P = 1.8×10^{−5}) terms. Conversely, quadratic regression gives a non-significant quadratic term for F_{ST} between African Americans and each of the other two populations and if anything is slightly convex. As expected, for single SNP analysis (without binning by recombination rate), linear regression is very significant for F_{ST} between African Americans and either non-African population (P≪10^{−12}). For F_{ST} between Chinese and Europeans, however, linear regression is not significant (P = 0.81), while a quadratic regression is very significant (P≪10^{−12}).
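The concave versus convex distinction drawn above comes down to the sign of the quadratic coefficient. The sketch below shows one way to obtain it from binned values with an ordinary least-squares polynomial fit; the function name `quadratic_shape` and the choice of `numpy.polyfit` are assumptions, and the sketch does not reproduce the significance tests reported in the text.

```python
import numpy as np

def quadratic_shape(rho, fst):
    """Fit fst = c0 + c1*rho + c2*rho**2 and classify the curvature.

    rho, fst : per-bin median recombination rates and F_ST estimates
               (hypothetical inputs).
    Returns the coefficients (c0, c1, c2) and 'concave' if c2 < 0
    (inverse U-shape), otherwise 'convex'.
    """
    c2, c1, c0 = np.polyfit(np.asarray(rho, float), np.asarray(fst, float), 2)
    shape = "concave" if c2 < 0 else "convex"
    return (c0, c1, c2), shape
```

A dip in differentiation at the lowest recombination rates, followed by a decline at high rates, is what produces a negative quadratic coefficient and hence the "concave" label.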

These results suggest a qualitatively different effect of recombination rate on allele frequency differentiation for different pairs of human populations and in different epochs of human history. In particular, most of the signal we observed of a correlation between recombination rate and F_{ST} of all three populations is driven by differentiation between African Americans and the two non-African populations.

Considering the complexity of the ascertainment scheme, we were concerned that genomic features that affect SNP discovery, such as GC and repeat content, could confound the relationship between F_{ST} and recombination rate. However, GC and repeat content are expected if anything to produce a correlation in the opposite direction to the one we observed, as these features are associated with lower resequencing depth, which would be expected to bias SNPs toward having a higher minor allele frequency. High minor allele frequency SNPs are empirically observed to be more differentiated than lower frequency SNPs, so such a bias would be expected to produce a positive correlation between recombination rate and F_{ST} since these features are associated with higher recombination rate.

To replicate our results in an independent data set with less complex ascertainment, we applied similar analyses to a data set of uniformly-ascertained SNPs that we previously reported, where ascertainment was carried out in two chromosomes of known ancestry in a way that is independent of the effect of genomic features on coverage and where the discovery in two chromosomes cannot result in a frequency bias associated with recombination rate.

Similar to the corresponding figure for the Perlegen data, the panels present (A) global population differentiation between YRI, CEU, and ASN; (B) F_{ST} between YRI and CEU; (C) F_{ST} between YRI and ASN; and (D) F_{ST} between CEU and ASN. Linear regression as a function of the median recombination rate (solid line) is (A) 0.1473 − 0.0026ρ, (B) 0.1541 − 0.0028ρ, (C) 0.1819 − 0.0046ρ, and (D) 0.1060 − 0.0005ρ. The corresponding correlation coefficient estimate between F_{ST} and median recombination rate is (A) r = −0.526 (P = 0.118), (B) −0.482 (P = 0.158), (C) −0.634 (P = 0.049), and (D) −0.066 (P = 0.857).

Results from the analysis with uniformly-ascertained subsets of HapMap replicated the previous results based on the Perlegen data. In particular, we replicated the strong negative correlation between global population differentiation and recombination rate. The negative relationship is also significant for F_{ST} between YRI and CEU (P = 0.008 for correlation and P = 0.024 for regression coefficient) and for F_{ST} between YRI and ASN (P = 1.0×10^{−5} and P = 3.0×10^{−4}), but not for F_{ST} between CEU and ASN (P = 0.59 and P = 0.59).

In this study, we have explored whether local recombination rate is associated with allele frequency differentiation across human populations when examined on a genome-wide scale. The negative correlation we find in the Perlegen data set (the larger of the two data sets we analyzed) corresponds to an average decrease of 4% in F_{ST} for every 1 cM/Mb increase in recombination rate. This correlation is mostly driven by the differentiation between African and non-African populations, where the decrease in F_{ST} is 5% for every cM/Mb. The differentiation of European and East Asian populations shows a qualitatively different, inverse U-shaped relationship with recombination rate. These results are present in both the data sets we analyzed, and unlike similar results for nucleotide diversity are not sensitive to the mutagenic effect of recombination. By considering only data sets that have been uniformly-ascertained, we also ruled out the possibility that the correlation is due to ascertainment biases correlating with recombination rate.

We considered various explanations for these observations, all involving natural selection. We first considered evolution favoring higher recombination rate in functionally important elements, which could potentially contribute to the higher recombination rate observed in genes, and which could also generate a correlation between recombination rate and allele frequency differentiation. However, we realized that this would generate a correlation in the opposite direction to what we observe: it is expected to result in a higher recombination rate in functionally important regions, which exhibit higher differentiation on average.

The only force we could identify that can explain the observation of a negative correlation between recombination rate and F_{ST} is directional selection; that is, hitchhiking linked to positively selected alleles (sweeps) or background selection linked to negatively selected alleles. Evidence for this explanation comes from the stronger negative correlation between population differentiation and recombination rate in coding regions, though we did not have enough data to establish the difference between coding SNPs and the rest of the genome with high statistical significance. Hitchhiking of recent, geographically localized selective sweeps after the split of African and non-African populations is a potential explanation of these results, especially considering the magnitude of the effect we observed, since it is expected to increase population differentiation and to have a more marked effect in regions of lower recombination. Alternatively, background selection also has the potential to increase population differentiation, because it is expected to decrease within-population diversity at regions linked to loci under negative selection, which will have a more marked effect in regions of lower recombination.

The different nature of the effect these two selective forces have on population differentiation should make it possible to distinguish them in finer-scale studies.

A striking observation in our study is the qualitatively different relationships between recombination rate and allele frequency differentiation for different pairs of populations, suggesting that selection has acted in different ways over different epochs of history. A possible explanation for the stronger correlation observed between African and non-African populations, compared to that between Europeans and East Asians, is the smaller effective population size in non-African population history compared with African history since they diverged. Although in general a reduced effective population size makes selection less efficient, it can increase the impact of background selection on patterns of genetic variation since weakly deleterious mutations are less efficiently purged from the population, thereby reaching higher frequencies which results in more extensive background selection when they are purged

Another scenario that could potentially produce different F_{ST} patterns between different pairs of populations is if selective sweeps were shared to different extents across populations. When an allele that arose in one population and is under selection enters a second population via migration, F_{ST} at linked neutral sites can actually be reduced, which would produce a positive correlation between F_{ST} and recombination rate. Europeans and East Asians exchanged genes more recently than both did with Africans, and hence a larger fraction of selective sweeps are expected to be shared between these two populations, introducing a component of positive correlation between F_{ST} and recombination rate. The signature of global selective sweeps is expected to decay differently with genetic distance from the selected site than the decay due to local sweeps, which might contribute to the non-monotonic relationship observed for this pair between F_{ST} and recombination rate. A limitation of this explanation for our observations, however, is that the phenomenon of reduced F_{ST} due to a global selective sweep has only been demonstrated for populations that are much more diverged than human populations.

To further study the pattern observed between different pairs of populations, we explored the relationship between F_{ST} and recombination rate in additional populations by studying data from HapMap 3, which genotyped 1,184 individuals from 11 populations. We again observed a negative correlation between F_{ST} and recombination rate, with an average decrease of 3% in global F_{ST} between all 11 populations for every 1 cM/Mb increase in recombination rate. For pairwise F_{ST}, the same observations we made with the Perlegen data are also observed, with a quadratic regression being concave only for inter-continental F_{ST} between European and Asian populations.

We placed 1,326,404 autosomal HapMap 3 SNPs (release 2) into bins according to the recombination rate around each SNP. The panels present F_{ST} estimates as a function of the median recombination rate in each bin and partition the population pairs as follows: (B) F_{ST} between an African and a non-African population, where a negative correlation is observed with recombination rate, and where the quadratic regression is convex; (C) F_{ST} between a population of European, East Asian, or South Asian ancestry and a second population of a different one of these three ancestries, which shows a concave quadratic regression for all pairs of populations, and which recapitulates the result observed between North Europeans and East Asians in the uniformly-ascertained datasets; (D) F_{ST} between two African populations, which shows a much steeper linear regression compared to intercontinental F_{ST}, as well as a convex quadratic regression; and (E) F_{ST} between closely-related non-African populations (within either Europe or East Asia; genome-wide F_{ST}<0.008), showing a very steep linear regression and a convex quadratic regression. F_{ST} based on all SNPs in all bins combined is presented as a horizontal dotted line and is equal to 1 in panels B–E since these present normalized F_{ST} values obtained by dividing each value by the genome-wide F_{ST} for the same pair of populations. Population codes are as follows: WAF (“West African”) is a combined sample of YRI (Yoruba in Ibadan, Nigeria) and LWK (Luhya in Webuye, Kenya); EAS (“East Asia”) is a combined sample of CHB (Han Chinese in Beijing, China), CHD (Chinese in Metropolitan Denver, CO, USA), and JPT (Japanese in Tokyo, Japan); EUR (“Europe”) is a combined sample of CEU (ancestry from Northern and Western Europe) and TSI (Toscani in Italia); GIH is a sample of Gujarati Indians in Houston, TX, USA; MKK is a sample of Maasai in Kinyawa, Kenya; and CHI (Chinese) is a combined sample of CHB and CHD.

In addition to qualitatively replicating our findings, analysis of HapMap 3 data allows us to generalize them to additional populations. A striking result is that the relationship between F_{ST} and recombination rate is stronger for F_{ST} between pairs of closely-related populations, whether within or outside Africa: F_{ST} between a West African sample and Maasai (of mixed West African and East African ancestry) decreases steeply with recombination rate, F_{ST} between Italians and individuals of North-Western European ancestry decreases by 10% for every cM/Mb, and F_{ST} between Japanese and individuals of Chinese ancestry decreases by 4%. A convex relationship is observed for F_{ST} between closely-related populations in a quadratic regression analysis.

The approach presented in this study allows not only a comparison of the effect selection has on allele frequency differentiation at different historical times, but also a comparison across different compartments of the genome. Repeating the analysis on Perlegen “class A” X-linked SNPs (to contrast with the autosomal analyses we report above), we observed a very significant correlation between global population differentiation of X-linked SNPs and recombination rate, with a correlation coefficient of −0.86 (P = 0.001) when partitioning the data into 10 bins. The fitted linear regression (0.2272 − 0.0552ρ) corresponds to a decrease of about 24% in F_{ST} for every 1 cM/Mb increase in recombination rate (compared with 4% predicted for the autosomes). If this suggestive result is verified, it will point to natural selection playing more of a role in allele frequency differentiation on chromosome X than on the autosomes. This observation is especially interesting in light of our recent finding that chromosome X exhibits higher allele frequency differentiation between Africans and non-Africans than would be expected from ¾ the effective population size of the autosomes.

More generally, these results show that comparing the relationship of differentiation and recombination rate between different genomic regions and in different populations is a promising direction to be explored in future studies with larger data sizes. In addition to using this approach to study natural selection, by extrapolating the prediction of population differentiation to “infinite” recombination rate it might be possible to predict the level of population differentiation that is due to genetic drift alone, separate from the effect of selection, since every nucleotide becomes independent of selection at nearby sites. (Population differentiation is still affected by natural selection at the sites directly under selection.) Models of demographic history would be expected to be more accurate if one used the prediction of F_{ST} for high recombination rate, rather than the genome-wide average. This is in the same spirit as studies targeting regions of high recombination rate and far from functional elements to infer human demographic history. For example, for F_{ST} in the Perlegen data set, the genome-wide estimate is 0.121, while it is 0.113 for the SNPs in the bin with 10% highest recombination rate. For demographic inference, a difference of this size in F_{ST} would have a profound effect. The effect would be even larger if extrapolating to higher recombination rates.

A limitation of this study is the genetic map available. We chose to use a pedigree-derived human genetic map

In conclusion, we have shown that genome-wide human population differentiation in allele frequencies is significantly correlated with recombination rate on a megabase scale, demonstrating that natural selection has had a profound effect on allele frequency distributions averaged over the last hundred thousand years. While these results likely reflect the effects of hitchhiking and background selection, disentangling the strengths of these two forces will require extending the analyses presented in this paper. One important direction is to use genetic maps that have fine spatial resolution, which may shed light on the detailed distribution of selective coefficients that have shaped allele frequency differentiation. A second direction in which these results can be extended is to compare more populations of continentally diverse ancestry. This should facilitate an exploration of the relationship between recombination rate and population differentiation during different epochs of human evolution, and should allow a better understanding of how demographic history has shaped the impact of natural selection on patterns of human genetic variation.

To examine the correlation between SNP allele frequency differentiation and recombination rate in a way that is not sensitive to the confounder of recombination rate-dependent SNP ascertainment, we limited our main analysis to autosomal Perlegen “class A” SNPs.

We replicated our results in a data set of SNPs that were uniformly ascertained as polymorphic in exactly two chromosomes of the same ancestry and genotyped in all HapMap samples.

We determined recombination rate around each SNP based on the deCODE genetic map.

To estimate pairwise allele frequency differentiation between populations, we used a previously described formulation of the F_{ST} statistic. The resulting F_{ST} estimates are almost identical to the estimates obtained based on the estimator of Weir and Cockerham. Global F_{ST} was computed over all population-pairs. We estimated pairwise and global F_{ST} across all SNPs in each recombination rate bin. We also estimated F_{ST} standard errors (for presentation in the figures) using moving block bootstraps over the SNPs in each bin. We similarly estimated F_{ST} for all SNPs irrespective of bins, and used the standard errors for z-tests to determine whether the genome-wide F_{ST} values are different, e.g. between cSNPs and ncSNPs.
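To illustrate what a per-bin, ratio-of-sums estimate of pairwise F_{ST} can look like, the sketch below uses Hudson's estimator as a stand-in. This is an assumption made for illustration only: it is not necessarily the formulation used in this study, and the function name and inputs (sample allele frequencies and numbers of sampled chromosomes) are hypothetical.

```python
import numpy as np

def hudson_fst(p1, n1, p2, n2):
    """Pairwise F_ST over a set of SNPs using Hudson's estimator (stand-in).

    p1, p2 : sample allele frequencies per SNP in populations 1 and 2.
    n1, n2 : number of sampled chromosomes in each population
             (e.g. 48 for 24 diploid individuals).
    The per-SNP numerators and denominators are summed before dividing,
    so the result is a ratio of sums across the SNPs supplied.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    # Squared frequency difference, corrected for finite sample size.
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    # Probability that two chromosomes, one from each population, differ.
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num.sum() / den.sum()
```

Calling the function on the SNPs of one recombination rate bin at a time gives one estimate per bin; summing numerators and denominators before dividing avoids the instability of averaging noisy per-SNP ratios.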

We randomly resampled 1,000 data sets of SNPs from the set of all SNPs using MBB. For each of these sets, we started by stratifying resampled SNPs into bins according to their recombination rate, and then repeated the procedure that was used to analyze the original data set. Specifically, we estimated F_{ST}, and then estimated the correlation and regression between F_{ST} in a bin and the bin's median recombination rate ρ, fitting F_{ST} = b_{0} + b_{1}ρ, together with the relative slope b_{1}/b_{0}.
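The resampling procedure can be sketched as below. Several details are assumptions made for illustration and are not taken from the text: the block length of 100 consecutive SNPs, the representation of F_{ST} as per-SNP numerator and denominator arrays (`num`, `den`) that are summed within bins, and all function names. SNPs are assumed to be ordered by genomic position so that blocks preserve local LD.

```python
import numpy as np

def mbb_indices(n, block_len, rng):
    """One moving-block-bootstrap resample: overlapping blocks of consecutive
    indices drawn with replacement and concatenated to length n."""
    n_blocks = -(-n // block_len)                       # ceiling division
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    idx = (starts[:, None] + np.arange(block_len)).ravel()
    return idx[:n]

def binned_fit(rho, num, den, n_bins=10):
    """Bin SNPs by recombination rate, estimate F_ST per bin as a ratio of
    sums, and fit F_ST = b0 + b1 * (median rho of the bin)."""
    order = np.argsort(rho, kind="stable")
    bins = np.array_split(order, n_bins)                # equal-sized bins
    med = np.array([np.median(rho[b]) for b in bins])
    fst = np.array([num[b].sum() / den[b].sum() for b in bins])
    b1, b0 = np.polyfit(med, fst, 1)                    # slope, intercept
    r = np.corrcoef(med, fst)[0, 1]
    return np.array([b0, b1, r, b1 / b0])

def bootstrap_fit(rho, num, den, block_len=100, n_boot=1000, seed=0):
    """Mean and standard error of (b0, b1, r, b1/b0) across MBB resamples."""
    rng = np.random.default_rng(seed)
    n = len(rho)
    stats = np.empty((n_boot, 4))
    for i in range(n_boot):
        idx = mbb_indices(n, block_len, rng)
        stats[i] = binned_fit(rho[idx], num[idx], den[idx])
    return stats.mean(axis=0), stats.std(axis=0, ddof=1)
```

Re-binning inside each resample, as opposed to resampling within fixed bins, lets the uncertainty in bin membership and bin medians propagate into the standard errors of the regression statistics.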

Based on averaged estimates across resamplings and their standard errors, we performed two-sided z-tests for the significance of each statistic.
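The two-sided z-test amounts to dividing an estimate by its bootstrap standard error and converting the result to a normal tail probability. A minimal sketch using only the standard library follows; the function name is an assumption.

```python
import math

def two_sided_z_test(estimate, std_err):
    """Return (z, p) for H0: the true value is zero, with a normal reference.

    p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    z = estimate / std_err
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

if __name__ == "__main__":
    # Values taken from the bootstrap table (estimate, standard error).
    for label, est, se in [
        ("b1, all SNPs", -0.0049, 0.0007),
        ("b1, Europeans vs. Chinese", -0.0021, 0.0010),
        ("t-statistic, all SNPs", -6.0926, 2.1758),
    ]:
        z, p = two_sided_z_test(est, se)
        print(f"{label}: z = {z:.2f}, P = {p:.2g}")
```

Applied to the rounded table entries, this gives P ≈ 2.6×10^{−12} for the global slope, P ≈ 0.036 for the European–Chinese slope, and P ≈ 0.0051 for the t-statistic, in line with the reported values.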

Relationship between population differentiation and recombination rate for different minor allele frequencies (MAFs). We divided 1,110,338 SNPs into 4 categories according to their MAF: (A) MAF≤0.125, (B) 0.125<MAF≤0.25, (C) 0.25<MAF≤0.375, and (D) 0.375<MAF (≤0.5). For each category, we partitioned SNPs into 10 bins according to the recombination rate around each SNP, and presentation is similar to the main-text figure for all SNPs. Linear regression of F_{ST} estimates as a function of the median recombination rate in each bin (solid line) is (A) 0.0855 − 0.0010ρ, (B) 0.1222 − 0.0048ρ, (C) 0.1371 − 0.0059ρ, and (D) 0.1434 − 0.0065ρ. The corresponding correlation coefficient estimates between F_{ST} and median recombination rate are (A) r = −0.691 (P = 0.0269), (B) −0.960 (P = 1.0×10^{−5}), (C) −0.954 (P = 1.9×10^{−5}), and (D) −0.865 (P = 0.0012). We emphasize that natural selection being the force behind the correlation between population differentiation and recombination rate can entail a (non-causal) relationship between MAF and recombination rates since selection changes allele frequencies. Nevertheless, the correlation of population differentiation and recombination rate is very significant for all categories of common SNPs (B–D). The results are not as significant, though a correlation is observed, for SNPs of low MAF (A), likely due to the effect of negative selection on allele frequencies.


Population differentiation in allele frequencies is inversely correlated with recombination rate on chromosome X. We placed 26,074 Perlegen “class A” X-linked SNPs into 10 bins according to the recombination rate in a 3 Mb window centered on each SNP. The x-axis of all panels indicates the recombination rate, with the values indicated on the ticks corresponding to the edges between the 10 bins. For each bin, at an x-axis position corresponding to the median recombination rate across the SNPs at that bin, the figure presents global population differentiation between African Americans, Europeans, and Chinese. Error bars indicate ±1 standard error, which is estimated based on 1,000 moving block bootstraps over the SNPs in the bin. Linear regression of F_{ST} estimates as a function of the median recombination rate in each bin is also presented (solid line), corresponding to 0.2272 − 0.0552ρ. The corresponding correlation coefficient between F_{ST} and median recombination rate is r = −0.860 (P = 0.0014). For comparison, population differentiation based on all SNPs in all bins combined is also presented (horizontal dotted line).


Results when recombination rate is estimated in a 5 Mb window. The figure mirrors


Results when recombination rate is estimated in a 1 Mb window. The figure mirrors


Results when filtering regions near centromeres and telomeres. In the main analyses a SNP was discarded if the 3 Mb around it overlaps either a centromere or a telomere. To more cautiously account for the possibility of our results being sensitive to centromeric or telomeric regions, we repeated the analysis while also discarding SNPs for which the 3 Mb window around them is within 5 Mb of such a region (namely, the SNP is within 6.5 Mb of such a region). The figure mirrors


Bootstrapped correlation and regression coefficient estimates of F_{ST} as a function of recombination rate in uniformly-ascertained subsets of HapMap. The table mirrors the bootstrap table for the Perlegen data and presents results for global F_{ST} between all three HapMap populations, as well as pairwise F_{ST} between each pair of populations.


We thank H. Chen, S. Mallick, N. Patterson, and M. Przeworski for discussions and comments.