Assessing the Evolutionary Impact of Amino Acid Mutations in the Human Genome

Quantifying the distribution of fitness effects among newly arising mutations in the human genome is key to resolving important debates in medical and evolutionary genetics. Here, we present a method for inferring this distribution using Single Nucleotide Polymorphism (SNP) data from a population with non-stationary demographic history (such as that of modern humans). Application of our method to 47,576 coding SNPs found by direct resequencing of 11,404 protein coding-genes in 35 individuals (20 European Americans and 15 African Americans) allows us to assess the relative contribution of demographic and selective effects to patterning amino acid variation in the human genome. We find evidence of an ancient population expansion in the sample with African ancestry and a relatively recent bottleneck in the sample with European ancestry. After accounting for these demographic effects, we find strong evidence for great variability in the selective effects of new amino acid replacing mutations. In both populations, the patterns of variation are consistent with a leptokurtic distribution of selection coefficients (e.g., gamma or log-normal) peaked near neutrality. Specifically, we predict 27–29% of amino acid changing (nonsynonymous) mutations are neutral or nearly neutral (|s|<0.01%), 30–42% are moderately deleterious (0.01%<|s|<1%), and nearly all the remainder are highly deleterious or lethal (|s|>1%). Our results are consistent with 10–20% of amino acid differences between humans and chimpanzees having been fixed by positive selection with the remainder of differences being neutral or nearly neutral. Our analysis also predicts that many of the alleles identified via whole-genome association mapping may be selectively neutral or (formerly) positively selected, implying that deleterious genetic variation affecting disease phenotype may be missed by this widely used approach for mapping genes underlying complex traits.


Introduction
The distribution of fitness effects of newly arising mutations largely determines the importance of different evolutionary forces (e.g., strong and weak selection, genetic drift, recombination, and bottlenecks) on the patterning the history of a population [1]. Quantifying this distribution could, therefore, inform debates in evolutionary and medical genetics including the advantages of sex and recombination [2], the applicability of neutral and nearly neutral theories of molecular evolution to population genetics [3,4], and the importance of different population genetic theories of human complex disease [5]. Estimates for the proportion of mutations in the human genome that are neutral, deleterious, and nearly-neutral vary widely [6][7][8][9][10][11]. Many factors account for differences among estimates including quantity and quality of data, methodological approaches, and interpretation of results (including definitions of what is meant by deleterious, neutral, and nearly-neutral).
Two main approaches are used to estimate the distribution of fitness effects (DFE) of new mutations: direct estimates based on mutation accumulation (MA) experiments and indirect estimates based on population-genetic analyses of polymorphism and/or divergence data. MA experiments have generally found that the fitness distributions of new mutations are consistent with a leptokurtic (i.e., highly peaked) DFE with a maximum at neutrality [12][13][14][15]. The precise functional form (e.g., gamma, lognormal, etc.) remains unknown, however, since MA experiments are labor intensive and can only measure the fitness consequences of mutations of large effect. Another drawback is that MA cannot readily be applied to many species of interest (including humans) for both ethical and practical reasons.
Analyses of McDonald-Kreitman (MK) contingency tables (i.e., comparison of the silent to replacement ratio of polymorphisms and fixed differences) have yielded various conclusions about the DFE, including that it may be normally distributed [16], nearly exponentially distributed [17], or gamma distributed with a shape parameter between 0.1 and 1 [18]. These tests have few degrees of freedom for model fitting and therefore have low power for distinguishing between alternative distributions. MK tables also ignore allele frequency information such as the relative proportion of low-and high-frequency SNPs.
Analyses that utilize the full distribution of allele frequencies (i.e., the site-frequency spectrum) should provide better estimates of the DFE, although such methods would be very sensitive to the effects of demographic history and ascertainment biases in SNP discovery [19][20][21]. Furthermore, reliable outgroup data are needed in order to polarize SNPs (i.e., distinguish between derived SNPs at lowand high-frequency), and multiple substitutions need to be considered carefully [22,23]. The site-frequency spectrum has previously been used to estimate constant selection for nonstationary demographic models [19] and to estimate a DFE assuming the population under study is panmictic and constant in size [11], but neither study allowed for simultaneous inference of demography and a distribution of fitness effects. Recently, Keightley and Eyre-Walker [24] developed a method for simultaneous inference for demography and a distribution of fitness effects; however, their method is limited to very simple demographic scenarios (single population size change events) and was applied to a human dataset with a small number of loci without outgroup data. In this study, we describe a method to simultaneously infer both demography and the DFE from genome-wide polymorphism data, and describe the results of applying this method to a large polarized SNP dataset of 47,576 coding SNPs free of ascertainment bias, from two human populations. This approach enables us to estimate the DFE with more precision than previous approaches as well as explore various distributions that may underlie these effects and estimate the proportion of mutations fixed by adaptive evolution.
Briefly, we extend the Poisson Random Field approach outlined in Williamson et al. [19] for estimating demography and a constant selection coefficient to allow for inference of the DFE of newly arising mutations. This maximum likelihood approach uses putatively neutral synonymous SNPs to infer the demographic history of the population and then numerically solves for the transient distribution of allele frequencies for a given selective effect under that demographic model. We integrate these distributions over a range of possible fitness effects, weighted according to the candidate distribution, to find the maximum likelihood distribution of fitness effects. As in previous studies (e.g., [1,18,24]), we focus on the DFE of deleterious nonsynonymous mutations. We assume s and 2s are the selective disadvantage of heterozygotes and homozygous mutants and infer the DFE based on the scaled selection parameter, c = 2N e s. Our inference method only considers polymorphism data; however, once we have inferred the demographic and selective parameters, we can calculate the expected number of nonsynonymous fixed differences between humans and chimpanzees. Comparing our estimate to the number of observed differences offers a means of testing the predictions of our model.
We applied this method to a database of human autosomal coding SNPs ascertained without bias and for which a syntenic, informative chimpanzee outgroup nucleotide was available. SNPs were initially obtained by direct resequencing of exons from 20,362 putative genes in 15 African Americans and 20 European Americans. In total, 17.8 Mb of genic sequence (approximately 61% of autosomal RefSeq gene bases) containing 47,576 diallelic SNPs (25,145 synonymous and 22,431 nonsynonymous) from 11,404 genes met our bioinformatic criteria and were used in our population analysis.

Numerical and Simulation Results To Test Performance of the Method
The statistical method described below was implemented in an ANSI C computer program which estimates parameters of multiepoch population size change models, and conditional on demographic parameters, estimates parameters of 13 different selection models (see Table 1). To test the accuracy of the program we simulated 100 datasets with and without linkage under the best-fit demographic model and gamma-distribution selection model inferred from the data (see Methods). The method was unbiased in estimating the demographic and selection parameters, and the amount of linkage in the dataset did not affect the accuracy or power of our inference appreciably ( Figure 1).

Inference of Demographic History
We find that a two-epoch instantaneous growth model provides a good fit to the synonymous site-frequency spectrum in the African American sample (x 2 obs = 40.3; p.0.05 based on coalescent simulations with recombination, see Figure S1, Figure S2, Figure S3 and Table S1). For the European American data, in contrast, growth or single bottleneck models were a poor fit (p,0.001). In order to even marginally fit the European American SFS data, a model with at least six parameters is necessary (i.e., the ''complex'' bottleneck scenario; x 2 obs = 133.5; p,0.01). The reason for this is that the European Americans show a large number of high-frequency derived SNPs even after correcting for multiple-hits using the method of Hernandez et al. [22]. This uptick in high frequency derived SNPs is also evident in the European American nonsynonymous site-frequency spectrum and because of the uncertainty in the cause of the departure from the model at synonymous sites, we focus much of our analyses on the African American data which showed no such bias.

Distribution of Fitness Effects of Newly Arising Mutations
To infer the DFE at nonsynonymous sites, we analyzed the unfolded nonsynonymous site-frequency spectra using 13 different selection models (see Table 1 and Figure 2). After demographic correction (see Methods and Text S1), both populations showed

Author Summary
Although mutations are known to cause varying degrees of harmful effects, it is difficult to quantify the distribution that best describes the variation of fitness effects of these mutations. Here we present a new method for inferring this distribution and inferring population history using Single Nucleotide Polymorphism (SNP) data from human populations. Using 47,576 SNPs discovered in 11,404 genes from sequencing 35 individuals (20 European Americans and 15 African Americans), we find evidence of an ancient population expansion in the sample with African ancestry and a relatively recent bottleneck in the sample with European ancestry. In both populations, the patterns of variation are consistent with a leptokurtic distribution of selection coefficients (e.g., gamma or lognormal) peaked near neutrality. Specifically, we predict 27-29% of amino acid changing (nonsynonymous) mutations are neutral or nearly neutral, 30-42% are moderately deleterious, and nearly all the remainder are highly deleterious or lethal. Furthermore, we infer that 10-20% of amino acid differences between humans and chimpanzees were fixed by positive selection, with the remainder of differences being neutral or nearly neutral.
less diversity than expected under strict neutrality at nonsynonymous sites, especially at high-frequency derived classes, indicative of purifying selection ( Figure 2).
One-parameter models of selection, particularly those assigning a single selection coefficient to all nonsynonymous mutations, did a poor job of recovering the nonsynonymous site-frequency spectrum and predicting the number of observed nonsynon-ymous differences ( Table 1). The best one-parameter model was a model of purifying selection which predicts 32.2% of amino acid sites are neutral, and 68.8% are strongly constrained. This model is 136 log-likelihood units below the maximum of more complex two-and three-parameter models, suggesting additional parameters provide highly significant improvements to the models.  Several two-parameter selection models-notably the gamma and lognormal-resulted in much better predictions (DLL = 3.7 and DLL = 5.1 below the maximum). All of these best-fit twoparameter selection models slightly under-predicted the number of observed nonsynonymous human-chimp differences which is consistent with the fixation of rare positively-selected mutations that were not incorporated into these models. Three-parameter selection models did not provide a measurably improved fit to the site-frequency spectrum except for the normal distribution model with a proportion of lethal site (normal+lethal, DLL = 0), which was the only model tested that allowed for weak positive selection on segregating variation.
The best fitting gamma and lognormal distributions are both highly leptokurtic, with the bulk of mass centered on neutrality and a long negative tail extending into lethality. Our confidence in the proportion of nearly neutral mutations (|s|,0.0001) is tightly clustered around 28% in African Americans (27.3-29.0%, Tables 2  and 3). Our inferences are less tightly centered for more deleterious classes, as expected since the estimates of these classes are basically extrapolations of the distribution from the nearlyneutral data, since highly deleterious classes contribute little if any to observed polymorphism. Our confidence intervals incorporate both our demographic and selection parameter uncertainty, although they do critically depend on the selective effects following the gamma distribution. For example, the normal+lethal model also provides a good fit to the data but predicts somewhat different proportions of mutations in each selective class ( Figure 3). This illustrates a general point that methods that make use of standing genetic variation to infer the strength of selection have little power to distinguish among models for the strength of highly deleterious mutations since these mutations contribute little to extant polymorphism. We note that in African Americans, the mean selection coefficient for newly arising mutations under the best-fit distributions is 20.058 for the lognormal and 20.029 for the gamma (95% C.I. = [20.059, 20.018]), but the mean selection coefficient for mutations that are segregating (have two alleles in the sample) is less negative and nearly identical for the two distributions (20.000136 for the lognormal and 20.000140 with 95% C.I. = [20.000143, 20.000134]) for the gamma). Recently, Kryukov et al. [25] have estimated s for rare segregating human alleles, and their estimated range (20.003-20.001) is, encouragingly, between our estimate for new mutations and for segregating alleles.
Although selective inference is complicated by the complex demographic history of European Americans, we find no evidence that the DFE among new mutations differs between European and African Americans if we assume a gamma or lognormal distribution of fitness effects (Table 2; Figure S4). Other distributions provide a better fit to the European American data, but these all have a neutral point mass or a positive selection component which may lead to overfitting to the uptick of highfrequency derived SNPs observed in this population. We find it highly unlikely that the true DFE lacks moderately deleterious mutations, as is predicted by the best-fitting exponential+neutral or normal+gamma models inferred from the European American data ( Table 2). We infer that European Americans have a similar mean selection coefficient for newly arising mutations as African Americans (20.030 under the best-fit gamma model), but that the  Our estimates of the proportion of mutations that are strongly deleterious (|s|.1%), mildly deleterious (0.1%,|s|,1%), weakly deleterious (0.01%,|s|,0.1%), and nearly neutral (|s|,0.01%) are not affected by using the folded (i.e., the distribution of minor allele frequency) vs. unfolded site-frequency spectrum or by removing singletons from the analysis (Table 3). This suggests that biases in SNP calling (which would predominantly affect singleton SNPs) and polarization of the ancestral state of mutations (which would only affect the unfolded SFS) have little influence on our inferences from the African American data. Furthermore, reasonable levels of weak, purifying selection at putatively neutral synonymous sites have only a modest affect on the inference of selection at nonsynonymous sites, although they can alter the demographic parameters significantly ( Table 3).

Distribution of Fitness Effects on Nonsynonymous SNPs and Fixed Differences
Based on our inference of the DFE of newly arising mutations, we can estimate the proportion of segregating SNPs or humanchimp fixed differences that fall within various selective ranges. Assuming the best-fit two-parameter gamma-distributed selection model, we estimate that around half of nonsynonymous mutations are strongly or mildly deleterious (|s|.0.1%). Because of the strength of purifying selection, however, we estimate that fewer than 0.5% of amino acid replacing SNPs segregating at any frequency above 5% have fitness effects this extreme (see orange  Figure 3. Cummulative proportion of nonsynonymous mutations with a selection coefficient less than s. Gamma and lognormal curves represent the best-fit gamma and lognormal models to the African American polymorphism data (Table 1). Gamma+pos and wnorm are the best-fit gamma distribution with positive selection at 2N e s = 5 and best-fit weighted normal model to the African American polymorphism+divergence data. All four distributions predict nearly identical site-frequency spectra that closely match the observed data. Left side are deleterious selection coefficients; right side are advantageous selection coefficients. doi:10.1371/journal.pgen.1000083.g003 bars in Figure 4). That is, most of the segregating variation above 5% frequency in the population is predicted to be nearly neutral (|s|,0.01%; Figure 4 green and blue bars) or weakly deleterious (0.01%,|s|,0.1%; Figure 4 yellow bars) with a higher proportion of neutral variation as the allele frequency increases. This culminates in a prediction that only 1% of nonsynonymous human-chimp differences are slightly deleterious (yellow bars), with the remainder being neutral or positively selected ( Figure 4). Our data do show significant evidence for positive selection having played an important role in patterning human-chimp amino acid divergence. All best-fit two-and three-parameter selection models which do not include positive selection (i.e., all except Normal+lethal) underpredict the number of human-chimp nonsynonymous differences, consistent with a portion of differences being fixed through positive selection. For example, the expected number of nonsynonymous differences under the African American gamma model (20,113) is 9.3% less than what we observe (22,180; 95% C.L. = 5.4%, 12.8%). Modeling the DFE as a gamma function plus a positive selection component ( Figure 4A-C) results in a significantly better fit to the nonsynonymous sitefrequency spectrum than the best-fit gamma function (Ddf = 1; DLL = 4.1 for model in Figure 4B) or the best-fit gamma function with a neutral point mass (Ddf = 0; DLL = 3.3 for model Figure 4B). Furthermore, including positive selection of varying strength (from weak to moderate to strong) into the model explicitly ( Figure 4A-C) increases our estimate of the proportion of differences fixed through adaptive evolution to around 20% instead of 9.3% but does not alter our deleterious mutation estimates appreciably. Based on these inferences, we expect that 10-20% of humanchimp differences were fixed through positive selection, depending on the exact nature of the DFE of beneficial mutations and the relative proportion of weakly versus strongly beneficial mutations. It is also important to note that if the strength of selection of beneficial mutations is very small (e.g., c+#5), then our approach predicts that more than 20% of human-chimp differences are adaptive and that appreciable levels of human segregating variation are subject to positive selection ( Figure 4A).

Concordance of PolyPhen Classifications and Inference of Selection
To further refine our estimates, we utilized PolyPhen [26,27] to classify amino acid replacing SNPs as ''benign'', ''possibly damaging'' or ''probably damaging'' based on site-specific sequence conservation among mammals as well as location in the three dimensional structure of the protein molecule (if known). (It is important to note that the term ''damaging'' used by PolyPhen is meant to reflect only that the mutation affects protein structure and not that the mutation results in loss or gain of function). After running PolyPhen, our dataset was reclassified into 15,916 benign, 4,199 possibly damaging, and 2,646 probably damaging SNPs, and we inferred the DFE for each class. As expected, the degree of purifying selection increases from benign to possibly damaging to probably damaging (Table 4). For possibly and probably damaging SNPs, the lognormal distribution was the best-fitting two-parameter model; for benign SNPs, the neutra-l+exponential distribution provided the best fit (Table 4). We estimate that approximately 5% of nonsynonymous differences classified by PolyPhen as benign were fixed through positive selection, while 27% and 35% of those classified as possibly and probably damaging were inferred to be fixed through positive selection. Furthermore, although the selection coefficient of a segregating benign mutation is less negative than that of a segregating damaging mutation, because there are so many more benign mutations and because they segregate at higher frequencies, the cumulative effect of benign mutations on fitness is greater than the cumulative effect of the rarer but more damaging mutations. Based on the best-fit models in Table 4 and the observed allele frequencies in each class, we predict the average fitness in the population is reduced by 13.4% due to weakly deleterious ''benign'' alleles but only by 4.3% and 3.2% for possibly and probably damaging alleles, respectively.

Discussion
This study represents the most comprehensive analysis to date using polymorphism frequencies to infer the DFE of newly arising mutations in the human genome. Several previous studies have suggested that the DFE is highly leptokurtic with a sharp peak around neutrality and a long tail extending into lethality [11,18,24]. Here, we find that such models provide a good fit to the data, but so does a mixture model with a Gaussian component centered on moderately deleterious mutations (,45%) and 55% of mutations being lethal.
Our estimate that 27.3-29.0% of nonsynonymous mutations are neutral is remarkably consistent (although somewhat larger) than Eyre-Walker et al.'s [11] estimate of 23% (21-29%) inferred from a smaller dataset using folded frequency data and the hybrid approach of Yampolsky et al. [10] except that we expect a slightly larger proportion of mutations to be nearly neutral (Table 5). Using the folded site-frequency spectrum (i.e., minor allele frequency distribution) or removing singleton SNPs had little effect on the predictions of the model ( Table 3), suggesting that they are robust to errors in SNP calling, assembly, and alignment to the chimpanzee genome. Even if selection at synonymous sites were as strong as c = 21 (stronger than current estimates in humans; e.g., [28,29]), the parameter estimates for the fitness distribution are within those estimated under the neutral assumption ( Table 3). The demographic inference, however, changes if there is selection at silent sites-we infer a significantly larger population size undergoing an expansion event significantly more recently than we did under the assumption of synonymous sites being neutral (Table 3). Since c = 21 is almost certainly an overestimate of selection on synonymous sites, and since the resulting model results in a poorer fit to both the synonymous and nonsynonymous site-frequency spectra (DLLsil = 14.4 and DLLrepl = 8.1 in African Americans), our inferences using c = 0 at synonymous sites should be reasonable.
It is important to note that our approach differs from some recent methods proposed for estimating the DFE on new mutations as well as some recent estimates of the DFE in humans (e.g., [24]). That algorithm, as described and implemented, only considers a single size change event due to computational limitations of their matrix multiplication approach. Our method, on the other hand, uses an algorithm with running time independent of the number of size change events, so that is effectively no limit to the number of events that can be considered (e.g., the population size can change every generation with essentially no added overhead). Our conclusions also differ from those of the previous group in that we find no significant difference in the DFE between African and European Americans, although we do find that alleles segregating in European Americans are on average more deleterious than those segregating in African Americans consistent with Lohmueller et al. [30]. Our comparison of observed versus predicted site-frequency spectra for silent sites suggests that our three-size-change model is a significantly better fit than the single-size-change model, and that the difference in DFEs between Europeans and African Americans reported by [24] may largely be a consequence of a poor fit of the single size change model to the European SFS data. That is, demography matters when estimating the parameters of the DFE as our and their analysis suggests and if the demographic model is misspecified, estimates of the parameters in the DFE will be significantly biased.
Despite the large number of deleterious mutations that enter the population each generation, we estimate that the vast majority of common human genetic variation (e.g., SNPs at .5% derived allele frequency) is neutral or nearly neutral. We hold that these results have important implications for interpreting the outcome of whole genome association mapping studies that aim to identify alleles underlying common human diseases. Specifically, if chronic disease has a negative impact on Darwinian fitness, whole-genome association mapping approaches that survey common genetic variation for association with disease may be missing the most evolutionarily interesting and medically relevant alleles. The reason is that these assays have highest power to identify susceptibility alleles at moderate frequency, and are therefore biased towards finding evolutionarily neutral mutations (or formerly positively selected alleles). Re-sequencing in large samples of phenotypically extreme individuals, on the other hand, is much more likely to discovery rare, large-effect mutations that are predicted by our analysis (and others) to be deleterious. Population genetic theory also predicts that deleterious alleles will not spread widely from their geographic point of origin unless they stochastically drift in frequency due to founder effects or bottlenecks. Therefore, one would not expect, a priori, an association between a deleterious mutation and a disease phenotype to replicate across human populations even if the mutation has a significant attributable risk within a single population. Conversely, neutral polymorphisms (and formerly positively selected polymorphisms) that affect disease may likely replicate across human populations. These ascertainment biases in the evolutionary classes of alleles that can currently be associated with common human disease cloud our understanding of the relationship between chronic disease and Darwinian fitness.
Future work could extend this approach to allow for nonadditive genetic effects and inference of more complex demographic models from joint site-frequency spectra. The method of Williamson et al. [31] of inferring selection and dominance could be incorporated into this model, although it is likely that the degree of dominance of new mutations follows a distribution whose shape may depend on the fitness effect of the mutation. Based on our results, it is unlikely that the synonymous and nonsynonymous site-frequency spectra contain enough information to allow for an inference of dominance parameters in addition to selective and demographic parameters. Like previous studies of the DFE in humans, our results should be interpreted as the distribution of heterozygote fitnesses over all genetic backgrounds [11].
A potential concern with our analysis is that we ignore linkage among selected alleles and among selected and neutral alleles in deriving predictions for the site-frequency spectrum. Ignoring linkage may bias our inference regarding the parameters of the DFE. It is important to note, however, the level of linkage in our dataset (15,681 nonsynonymous SNPs in African Americans spread over 11,404 genes and ,100,000 exons) is slight and simulation results suggest that our parameter estimates and confidence intervals are robust to this complication (Figure 1).
A consequence of this DFE is that the molecular clock should be fairly insensitive to changes in N e at nonsynonymous sites. Assuming mutations occur at a constant rate per generation, nearly neutral theory predicts that a selective regime that is gamma-distributed with a shape parameter a results in a rate of amino acid substitution proportional to N e 2a per generation [4,32]. Since generation time is approximately proportional to N e 0.33 in mammals [32], we expect amino acid replacing substitutions to fix at a rate proportional to ,N e 0.15 per year if a = 0.158-0.206 as we have inferred. This would explain why the observed relationship between population size and substitution rate is stronger at synonymous sites (which should fix at a rate proportional to N e 0.33 ) than at nonsynonymous sites [3,4,33]. However, our estimate of a is sensitive to whether or not there is a point mass at neutrality in addition to a gamma distribution of fitness effects (Table 1).
Encouragingly, our frequency-based method of selective inference concurs with PolyPhen's fitness classification method which is based on conservation and physiochemical properties and does not utilize frequency information. Benign mutations are best described by a model with a significant neutral class (40.9%) and an exponentially-distributed deleterious class, whereas possibly and probably damaging mutations are best described by lognormal distribution with significant levels of deleterious mutation (Table 4), although these last two distributions were not significantly better fitting than the gamma distribution. Despite the heterogeneity of these distributions, the overall estimate of adaptive evolution is the same as that obtained from using a single gamma distribution-both models lead to a ,10% underprediction in the number of human-chimp nonsynonymous differences. The proportionately higher levels of inferred adaptive evolution at possibly and probably damaging sites versus benign sites agrees with Kimura's [6] observation that stochastic loss of weak beneficial mutations results in mutations of intermediate effect being most likely to adaptively fix rather than those of small effect as predicted by Fisher [34]. Table 5. A comparison of several recent estimates of the DFE at nonsynonymous sites in humans.

this study
Eyre-Walker et al. [11] Yampolsky et al. [10] African American (gamma distribution) Note that the selection intervals are those used by [10] and differ from those used in Tables 2-4. doi:10.1371/journal.pgen.1000083.t005 In this investigation we did not look directly at the distribution of positively selected amino acid mutations owing to the difficulty of inferring the distribution when there are so many fewer positively selected mutations segregating than neutral or negatively selected ones. Nevertheless, we note that both African and European Americans exhibit about 10% more amino acid divergence than expected from best-fitting selective models that ignore positive selection. This suggests 10-20% of amino acid divergence is due to positive selection (Figure 4). This result are somewhat consistent with Fay et al.'s [8] estimate that 35% of human-chimp nonsynonymous fixed differences were adaptively driven, but contrast with recent calculations suggesting essentially none of these differences were driven by positive selection [35,36]. We note that non-selective factors influencing fixation rates (such as linkage) should apply equally to synonymous sites and should not bias the number of nonsynonymous differences relative to the number of synonymous differences (divergence time was estimated from synonymous differences).
Despite the considerable dataset used in this study, our power to infer selective distributions was limited to two-and threeparameter models, some of which were indistinguishable. Although there are good theoretical and experimental reasons to prefer leptokurtic distributions like the gamma and the lognormal to fit the DFE (e.g., [4,37,38]), even with a large, genome-wide resequencing dataset, site-frequency spectrum data alone are insufficient to rule out alternative models such as the normal+lethal (Gaussian shift) model. Previous studies have noted the lack of power to infer the tails of the fitness distribution from population genetic analyses, but we see that even distributions that differ significantly at intermediate frequencies can predict nearly identical site-frequency spectra ( Figure 3). Furthermore, complex demography such as that found in our European American sample (see Text S1) can limit the power to infer selection even more. Lastly, it is important to note that we have not explicitly included migration in this analysis due to computational complexity. We hope to do so in future work, and while we suspect it will improve demographic inference, we do not expect it will greatly change our estimates of the DFE.
Although we encourage adding more genes to the dataset, such work is unlikely to increase the power of our analysis substantially. Deeper resequencing within humans, on the other hand, may likely yield improvements as estimates of rare allele frequencies are refined. Likewise, if the shape of the DFE is similar across species, analyzing the DFEs in species with various population sizes could be quite informative [1,18,24]. We note that when N e is very large, the assumption that synonymous sites are neutral may no longer be met [39]; however, our method is capable of jointly inferring the distribution of nearly neutral fitness effects and demography from the synonymous site-frequency spectrum prior to inferring the distribution of nonsynonymous fitness effects (as we did for Table 3). Independent estimates of the distribution of synonymous site fitness effects would be needed, however, to minimize the number of parameters being estimated from the site-frequency spectra on top of the demographic and nonsynonymous selective parameters.

Bioinformatic Pipeline
Resequencing was performed by Celera Genomics using PCR amplification of over 20,362 putative genes from the 2001 human genome in 19 African Americans and 20 European Americans (see [40] for a detailed description of this sequencing). These potential gene regions were included in our analysis if they could be uniquely mapped to hg18 in a region containing a refseq18 gene [41]. Refseq18 annotations were used to determine reading frame; regions that mapped to multiple Refseq genes were excluded if two or more of these genes were out-of-frame (89 kb of the 18.3 Mb of aligned regions). Furthermore, isPCR [42] was run on the 199k amplified primer pairs with perfect match = 15 and maximum product size = 800 bp. Regions with multiple isPCR hits (16.2 kb containing 76 SNPs) were removed from the data.
Finally, these high-quality amplified coding regions were included in the final analysis if they occurred in genomic regions corresponding to top-level syntenic or inverted chimp regions in net alignments between hg18 and panTro2 [42,43]. In this way, chimp outgroup information could be used to polarize the SNPs. In total, we analyzed 17.

Inference of Demography
To control for the effect of demographic history on SNP frequencies, we used the method of Williamson et al. [19] to find the best-fit history of instantaneous population size changes to account for the frequency spectrum of synonymous SNPs assuming that such SNPs behave neutrally. To reduce any bias in this estimate caused by recent admixture, we removed from the analysis the four African Americans that showed high levels of European admixture. Since some SNPs did not amplify in all 15 non-admixed African American individuals or 20 (non-admixed) European American individuals, we performed a hypergeometric projection (gsl_ran_hypergeometric_pdf in Galassi et al. 2006) down to N = 24 chromosomes (in African Americans) and N = 32 chromosomes (in European Americans) [44]. SNPs where fewer than 80% of the chromosomes were sampled in a population or with some low quality base calls were excluded from the analysis for that population. After projection, the synonymous site frequency spectra contained 10746.1 and 8663.3 SNPs and the nonsynonymous site frequency spectra contained 7052.8 and 6273.0 SNPs for the African and European Americans, respectively (Table S2).
The putative ancestral state of each SNP was identified using the chimpanzee outgroup provided by the chimp genome (panTro2). Because mutation rates vary widely across the genome and are highly context dependent, we accounted for uncertainty in the ancestral state of each SNP following the method of Hernandez et al. [22]. Briefly, this method accounts for the probability of misidentifying the ancestral state of a SNP by modeling the observed frequency spectrum as a mixture of SNPs whose ancestral states were correctly identified and those that were misidentified under the context-dependent substitution process inferred by Hwang and Green [45] along the human lineage. This mixture model results in a system of equations whose unknown quantities represent the true frequency spectrum, which can readily be solved to correct for ancestral misidentification. This context-based correction is robust to the unequal distribution of CpG context between synonymous and nonsynonymous sites, a phenomenon that biases selection studies using nucleotide substitution matrices for multiple-hit correction [46].
Considering each population's corrected synonymous sitefrequency spectrum separately, we used maximum likelihood to determine the best-fit demographic model. In both populations, a two-epoch model (two free parameters: timing and magnitude of size change) was a significantly better fit than a stationary model (Table S1). This two-epoch model was sufficient to account for the observed African American but not the observed European American synonymous site-frequency spectrum ( Figure S1 and Figure S2). We optimized the parameters of the African American demographic model using a multinomial likelihood function conditioning on the number of SNPs observed in the sample (i.e., each multinomial count is the proportion of SNPs in that frequency class multiplied by the total number of SNPs observed in the data) , and estimated the scaled human-chimp divergence time by finding the value that correctly predicted the number of observed synonymous human-chimp differences under this demographic model assuming a stationary model of chimp demography (with N e = human ancestral N e ) and 300,000 generations since human-chimp divergence. We then set the European American ancestral population size and per-nucleotide mutation rate (m) to that inferred from the African American data and fit a simple bottleneck (four free parameters: timing and magnitude of bottleneck; timing and magnitude of expansion) and a complex bottleneck (six free parameters: timing and magnitude of bottleneck; timing and magnitude of recovery; timing and magnitude of expansion) using a Poisson likelihood function and back-calculated European American N anc and m using these parameters and a multinomial likelihood function. Although the maximum likelihood was nearly identical between the two bottleneck models and subsequent inferences of selection were robust to which of these two models we used (Table 2), the complex bottleneck provided a better goodness-of-fit, although it still failed to explain the excess of high-frequency derived SNPs in the sample. This excess may be caused by migration among Europeans and other populations or linkage to selected sites, although we cannot rule out uncorrected multiple hits or sequencing error. This excess is also apparent at European American nonsynonymous sites and is likely responsible for the overestimate in neutral and positively selected classes in some European American three-parameter selection distributions ( Table 2).

Inference of Distribution of Fitness Effects of Newly Arising Mutations
We use the distribution of sample frequencies among variable nucleotides in the coding gene alignments (i.e., the site-frequency spectrum of coding SNPs) to infer the distribution of fitness effects among newly arising mutations. Our analytical approach makes use of standard Wright-Fisher population genetic theory within a Poisson Random Field setting [16,19,31,[47][48][49]. The assumptions of this model include independence among SNPs, genic selection, an underlying Poisson process governing mutations, and a piecewise constant population of large size amenable to modeling using diffusion approximations. The model we employ is an extension of Williamson et al. [19], where we present the relevant population and statistical inference theory for modeling genic selection in a population experiencing a recent size change. The key addition to our previous model is a distribution of fitness effects among new mutations. This amounts to modeling the components of the site-frequency spectrum-defined as (X 1 , X 2 , …, X n ) where X i is the number of SNPs with i derived alleles in a population of n/2 individuals (n chromosomes)-as independent Poisson random variables with mean: where h is the genome-wide mutation rate, x represents the (unknown) population frequencies of mutations, f(x; H, c) is the distribution of mutation frequencies given selection (c = 2N e s) and demographic history (H), and g(c) is the distribution of fitness effects among new mutations. The synonymous mutation rate (h S ) was calculated from the number of synonymous segregating sites by the method of Williamson et al. [19]; the nonsynonymous mutation rate (h N ) was fixed as 2.5 6h S [45]. Given the potentially different demographic histories of individuals with recent African and European ancestries, we model the two groups separately. Although in theory c = 2N e s can encompass any value from 2' to ', we limited our integration of the distribution to 2N e (corresponding to lethality) to 210 26 (corresponding to neutrality), plus a point-mass (usually at c = 0) when appropriate. Only models incorporating a normal distribution were integrated above c = 0. For distributions with a significant (.0.01%) mass between c = 210 26 and 0, we added that mass to the weighting at c = 0. Piecewise integrations were performed using midpnt, midinf, and qromo [50] with EPS = 2.5610 29 . Integration usually required evaluating the site-frequency spectrum at approximately 1,000 c values. Each evaluation took around 1 sec and the result could be stored to facilitate the evaluation of subsequent distributions. Our algorithm, implemented in C, is available in the computer program prfreq available for download from the Bustamante lab website (,http://bustamantelab.cb.bscb.cornell.edu/software. shtml.).
To obtain approximate confidence intervals around the maximum likelihood estimates for each African American selective model, we generated 200 datasets from the observed African American synonymous and nonsynonymous site-frequency spectra by drawing a random Poisson variates for each entry of the sitefrequency spectrum using the observed data as the mean for the variable. We optimized the demographic parameters for each synonymous dataset and then proceeded to optimize the selection parameters under that demographic model. These inferred parameters can be ordered to generate approximate 95% confidence limits that incorporate the uncertainty of both the demographic and selective parameter estimates. Exact 95% limits would require simulating datasets with linkage and simultaneously inferring the demographic and selection parameters. However, the level of linkage in our genome-wide dataset is slight and does not affect our estimates appreciably ( Figure 1). Additionally, our algorithm is capable of jointly inferring the demographic and selection parameters, but computationally it is more demanding, and we find that the nonsynonymous site-frequency spectrum adds little power to our ability to infer demography ( Figure S3).

Inference of Mutational Effects across PolyPhen Classes
For each nonsynonymous SNP in our database, we sought a PolyPhen classification for every transcript in the human genome containing that SNP. Briefly, PolyPhen will classify a SNP as benign, possibly damaging, or probably damaging based on how conserved the site is across a multi-species alignment, functional annotation, and upon the structural consequences of the SNP [26,27]. To assess evolutionary conservation, our sequences were aligned to those in the nrdb95 database (a union of the Swissprot, Swissnew, Trembl, Tremblnew, Genbank, PIR, Wormpep and PDB databases, with sequences of .95% similarity removed; [51]) using BLAST, and conservation was measured using PSIC. Functional annotations were taken from the UniProt database (http://www.pir.uniprot.org/), and structures were taken from PDB or PQS. Excluding SNPs that were unclassifiable or had different PolyPhen classifications for different transcripts, we retained 15,916 benign, 4,199 possibly damaging, and 2,646 probably damaging SNPs. The population-specific site-frequency spectrum for each mutational class was calculated as before, and selective inference was run assuming class-specific genome-wide mutation rates. To determine what proportion of new nonsynonymous mutations PolyPhen classifies as benign, possibly damaging, or probably damaging, we randomly mutated coding sequences from RefSeq 18 using a context-dependent substitution model [45]. We mutated 0.2% of the sites for a total of 46,492 nonsynonymous SNPs across 18,657 RefSeq transcripts that were then analyzed using PolyPhen. We found 59.5%, 21.2% and 19.3% of mutations were classified as benign, possibly damaging, and probably damaging, respectively, and used these proportions to calculate the class-specific genome-wide mutation rates. Finally, we also obtained PolyPhen classifications for each human-chimp fixed difference in our dataset.