The evolution of group differences in changing environments

The selection pressures that have shaped the evolution of complex traits in humans remain largely unknown, and in some contexts highly contentious, perhaps above all where they concern mean trait differences among groups. To date, the discussion has focused on whether such group differences have any genetic basis, and if so, whether they are without fitness consequences and arose via random genetic drift, or whether they were driven by selection for different trait optima in different environments. Here, we highlight a plausible alternative: that many complex traits evolve under stabilizing selection in the face of shifting environmental effects. Under this scenario, there will be rapid evolution at the loci that contribute to trait variation, even when the trait optimum remains the same. These considerations underscore the strong assumptions about environmental effects that are required in ascribing trait differences among groups to genetic differences.


Introduction
The last couple of decades of research in human genetics have substantiated the perspective on phenotypes that was developed by quantitative geneticists over the past century [1,2]. In particular, it is now clear that most human traits of interest, whether morphological, physiological, or behavioral, are "complex," meaning that individuals differ in their heritable phenotypes because of numerous small contributions of loci scattered throughout the genome and varying environments [3][4][5][6]. A paradigmatic example is height [7,8]. The height of an individual adult depends on myriad alleles they carry in their genome, features of the environments in which they develop, grow up, and live, and potentially intricate interactions between their genetics and the environment.
Given the advent of genome-wide association studies (GWAS), it is now possible to map loci that contribute to human trait variation and to construct predictors of individual trait values, so-called "polygenic scores" (PGS) [9,10]. Although in their infancy, PGS have already been deployed for many purposes, notably in disease prognosis (e.g., [11][12][13][14]). PGS have also been used in comparisons between sets of individuals with different genetic ancestries (e.g., [15][16][17][18][19][20] trait (i.e., are "causal"), population structure confounding, and many other factors [21][22][23][24][25][26]. Partly as a result, even where existing PGS explain a substantial proportion of trait variation in the GWAS sample, they are poor predictors of trait values in individuals that differ from the GWAS sample in their genetic ancestry (or in other characteristics) [21,22,27,28]. Given these limitations, it is no surprise that PGS distributions differ among genetic ancestry groups, and indeed, current comparisons of PGS are very difficult to interpret [29]. An underlying question remains, however: Under what conditions should we expect the true, heritable component of trait variation to differ among groups and what, if anything, can we learn from genetic differences about the selection pressures that shaped trait differences? To date, the question of how genetic differences between groups might arise has been considered primarily in light of 2 possibilities: that a trait has no fitness effects (i.e., is "neutral") or that the trait optimum changed recently, leading to directional natural selection. It is well understood that if the alleles that contribute to variation in the trait are neutrally evolving, their frequencies will drift over time and consequently vary somewhat across the globe [19,30]. Similarly, it is intuitive that if a complex trait is under directional selection, such that higher (or lower) trait values are favored, allele frequencies will change more rapidly than under drift alone [19,[31][32][33][34][35][36][37][38]. In these cases, it is also well appreciated that because environmental effects differ over time and across the world, the resulting differences in trait values between groups are unpredictable (e.g., [29,30,39,40]).
In this Essay, we review these results but emphasize an alternative scenario, which may be quite common [31]: that the trait is under ongoing stabilizing selection, i.e., that there exists a stable optimal trait value and selection against values far from it. As we outline, under stabilizing selection for a fixed trait optimum, shifting environmental effects on the trait will lead to rapid changes in frequency at loci that influence trait variation. In other words, there will be transient polygenic adaptation even in the absence of a change in trait optimum. We discuss the implications in 2 settings where genetic differences have been interpreted in terms of trait differences: comparisons of human groups and tests for polygenic adaptation.

Assumptions
For concreteness, we focus on 2 geographically separated groups of individuals (henceforth "populations") who derive from a common ancestral group and ignore demographic effects such as those induced by changing population sizes or structure within populations. In practice, the classification of individuals into groups is at the discretion of the researcher and may depart from our assumptions (for example, because the groups subsequently came into contact and intermixed [41]). Even this simplified model suffices to highlight many interpretative challenges, however.
We consider a quantitative trait, whose individual values (Y) are given by the sum of the total genetic effect (G) across all causal sites in the genome, polymorphic or not, and the total environmental effect (E): Here, the genetic differences among individuals are due to many independently evolving loci of small effect. We make the standard assumption that alleles are additive at a locus and across loci, thereby ignoring possible dominance or epistatic effects. We do so because an additive model provides a very good fit to existing GWAS data for humans [6,[42][43][44]; because most theory for the evolution of complex traits has been developed for this model (e.g., [19,26,38,45,46]); and most importantly, because, while it is a simplification, the qualitative points that we make do not depend on it.
As is also standard, we assume that within each population, E is Normally distributed and independent of G 7 . For now, we also ignore possible interactions between genetics and environment. We note that the word "environment" has ambiguous uses in evolutionary genetics. Often, it refers to the ecological context that shapes the fitness function (i.e., the relationship between values of Y and fitness). Here, as is common in quantitative genetics, we mean the "environmental effect" on the trait, namely E in Eq 1.
In what follows, we focus on the true genetic effects on the trait and not what can currently be learned about them. We refer to the true, total genetic effect G on a trait as the "polygenic effect" (this quantity is akin to the breeding value in quantitative genetics [9] and to what in other contexts is called the genetic or genotypic value [2,19]). For a given individual, the polygenic effect is constructed by summing the alleles over every site in their genome, weighted by their effects on the trait value. Given this setup, a PGS can be viewed as an estimate of the polygenic effect (shifted by a constant). As noted above, PGS rely on results from GWAS and currently suffer from numerous limitations (e.g., [21,22,26,28]). Unless otherwise stated, however, when we discuss the genetics of trait variation, we do not mean the PGS, but the polygenic effect, assuming knowledge of the exhaustive set of causal loci and their exact effects on the trait.

Polygenic effects are expected to evolve under most plausible scenarios
A trait with no fitness effects. If fitness is the same for all trait values and correlated traits are similarly neutral, the distribution of polygenic effects and trait values will be approximately Normal in each population, and the expectation of the polygenic effects will be the same. In any given realization of the evolutionary process, however, the mean polygenic effect in the 2 populations will diverge as allele frequencies undergo genetic drift [19,46,47]. In humans, the difference in means will tend to be small (relative to the variation within populations), as there has been little drift among populations [19], and either population may end up with a slightly higher mean polygenic effect, with equal probability (Fig 1, case 1).
Random genetic drift is always occurring, so will contribute to differences in polygenic effects regardless of additional selection pressures. For that reason among others, a neutral model may be a useful point of departure. Yet it is unlikely to apply strictly. Indeed, the flip side of the high polygenicity of many traits is that variants in the human genome often affect more than 1 trait [6,48]. Such pleiotropic effects are manifest in the genetic correlations among seemingly distinct traits [49,50] and to be expected from consideration of network pleiotropy (as outlined in [51,52]). As a consequence, even when the focal trait itself has no effect on evolutionary fitness, variation in the trait may often be under selection because of the pleiotropic effects of the loci that shape it [31,53].
Directional and stabilizing selection. One alternative to strict neutrality is for the focal trait to have been under recent directional selection, such that higher (or lower) trait values were favored [6,19,35,[54][55][56][57]. As an example, it has been proposed that darker skin pigmentation was beneficial to individuals that moved to a geographic location with higher levels of ultraviolet radiation (UVR) [58]. In this loose conceptualization of directional selection, higher trait values are always better, at least transiently.
Another formulation, which may be more realistic, is to consider a trait evolving under stabilizing selection, such that most individuals have trait values around an intermediate optimum, then imagine a sudden change in optimum (due, say, to a change in UVR levels). (In this case, a higher trait value is not always favored, for example, because it may greatly exceed the new optimum value.) This scenario has been the focus of many decades of research in evolutionary quantitative genetics [1,2,6,31,53,59,60] but has received less attention in human and population genetics (but see [61][62][63][64] Stabilizing selection can arise from trade-offs between the fitness advantages and disadvantages provided by the trait, leading to selection against extreme values in either direction; a textbook example is birth weight, which leads to lower infant survival when too low or too high [65]. Importantly, stabilizing selection on a trait can also be only "apparent," arising from pleiotropic effects of variants contributing to the trait [31,48,66]. For example, a focal trait (such as height) may have no direct effect on evolutionary fitness, but individuals with extreme phenotypes (very tall or very short) could nonetheless be selected against because carrying too many alleles that nudge in the same direction (many tall alleles or many short ones) leads to other phenotypes that are deleterious [48,67]. Even if increased height itself were favored, its dynamics may still be dominated by stabilizing selection because of deleterious consequences (e.g., on cancer risk or musculoskeletal problems) of carrying too many height-increasing alleles ( [53] and references therein). Depending on one's view of what constitutes a trait, the distinction between direct and apparent stabilizing selection may be somewhat semantic [6,48].
For now, we focus on a single trait under stabilizing selection, in which case it is standard to model the polygenic effect and the trait as Normally distributed (assuming no mutation bias) (e.g., [45]), and consider the impact of changing the optimum, say to a higher trait value. Assuming this new optimum is not exceedingly far off (relative to the trait variance present within the population), it is rapidly attained through a slight, average increase in the frequencies of many trait-increasing alleles [33,37,38]. The transient period of directional selection is followed by a much longer one during which stabilizing selection on the trait again dominates allele frequency dynamics [35,38]. Regardless, there will be a rapid shift upward in the polygenic effect (Fig 1, case 2).
Stabilizing selection for a fixed optimum. While possible changes in complex traits over the course of human evolution have received a lot of attention, there is often little reason to expect contemporary populations to differ in their fitness optima. Thus, an important alternative to consider is what happens when the trait is under ongoing stabilizing selection for the same optimum in both populations. If we assume the same fitness function and the same distribution of environmental effects on the trait in the 2 populations, then the distribution of the polygenic effects will also be the same.
For many traits, however, it is unrealistic to assume that the distribution of salient environmental effects is identical across groups or that variation in these effects is negligible. Height is a particularly well-studied case, in which environmental effects (e.g., due to nutrition) are known to vary over time and among countries [68] and can be dramatic: As 1 example, South Korean women gained approximately 20 cm on average from 1896 to 1996 [69]-on par with the entire range of mean female heights observed across countries in 1996. More generally, differences in environmental effects among populations can arise from many sources, if individuals in the groups do not have identical distributions of diets, living conditions, medical care, incomes, and so forth (e.g., [22,[70][71][72]).
The seemingly innocuous observation that salient environmental effects probably differ somewhat between any 2 populations has a key implication. If we assume the same fitness function (including the same optimum) in the 2 populations, but posit that the environmental effects tend toward smaller trait values in population 2, then we expect selection favoring a higher polygenic effect in population 2 and a concomitant upward shift in the mean polygenic effect (Fig 1, case 3). For instance, if we imagine poorer maternal nutrition on average in population 2 and the same optimal birth weight in both populations, then we might expect a higher polygenic effect in population 2-not because there is selection for higher birth weight in population 2, but because of selection to counteract the poorer maternal nutrition (i.e., "genetic compensation"; [73][74][75]). This type of compensation has been invoked to explain adaptation to high-altitude hypoxia in mammals, the hypothesis being that genetic changes in high-altitude populations counteracted the maladaptive (environmental) effects of acclimatization, thereby maintaining similar hematological profiles to low-altitude populations [76]. As these examples illustrate, there can be polygenic adaptation and a shift in mean polygenic effect between populations even when there is no difference in trait optimum.
We may also consider a case in which the fitness functions are the same in the 2 populations but the variance of environmental effects differs, for instance, because individuals in population 1 encounter more heterogeneous environments than those in population 2. In this case, the fitness difference between any 2 polygenic effects will be smaller in population 1 than in population 2 because fitness in population 1 depends more heavily on environmental effects [77]. (For simplicity, we ignore further complications such as canalization and selection for robustness [64,[78][79][80].) Because the effects of stabilizing selection are more pronounced in population 2 than in population 1, at equilibrium, polygenic effects in population 2 will be less variable (Fig  1, case 4, where for ease of interpretation, the mean environmental effects are the same).
If we instead suppose that the focal trait is not under direct but under apparent stabilizing selection as a result of correlations with other traits, the effects of shifting environmental effects on polygenic effects are harder to predict; for one, they will depend on the precise nature of the pleiotropic effects [6,48,53,81]. But the qualitative point remains: In the face of changing environmental effects, stabilizing selection for a fixed optimum, like selection for a new trait optimum, can lead to polygenic adaptation.
To recap, the genetic contribution to a trait will evolve whether the trait is strictly neutral, dominated by transient directional selection for a new optimum, or under stabilizing selection for the same optimum, as long as salient environmental effects differ among groups of people or over time. Thus, we should expect that mean polygenic effects will not be identical between populations.

Interpretations of trait differences rely on strong assumptions about environmental effects
Regardless of how population differences in polygenic effects arise, their mere existence implies nothing about their relationship to trait differences. Indeed, when both polygenic and environmental effects differ, all bets are off (see, e.g., [29,30,39,74,75,82]): The mean polygenic effect could differ substantially even when the mean trait value does not differ at all (Fig 2, case A). The mean polygenic effect could be higher in population 2 when the mean trait value is lower (Fig 2, case B). Or the differences in polygenic effects and trait values could align (Fig 2, case D). Only when environmental effects on the trait are similar in the 2 populations should we expect mean trait difference to mirror the difference in mean polygenic effect (Fig 2, case C).
Such considerations highlight an important-and at times implicit-assumption made in comparisons of PGS and observed mean trait values across groups with different genetic ancestries: that environmental effects are identical (e.g., [16,83,84]). As noted in these studies, mean differences in PGS among groups often do not track differences in mean trait values: For example, on the basis of the PGS, one would predict that, on average, individuals in 1 group should be shorter than individuals in the other, when in fact they are taller. Importantly, that observation alone is not evidence that PGS are unreliable outside of the ancestry in which they are estimated. There are many reasons to expect that PGS would not port reliably from a GWAS sample to individuals of other genetic ancestries and several lines of empirical evidence that they do not [16,21,22,25,26]. Even if all current limitations of PGS were surmounted, however, a discrepancy between a between-population difference in PGS and corresponding trait may well persist, owing to differences in environmental effects between populations.

Adaptive differences in polygenic effects could reflect changes in trait optima or shifts in environmental effects
Assumptions about environmental effects also matter critically when interpreting evidence for adaptive differences in polygenic effects. Under the standard quantitative genetic assumptions, a shift in the optimum of a trait given a fixed environmental effect has mathematically equivalent implications for selection on the polygenic effect as does shifting the mean environmental effect while keeping the optimum fixed. Thus, an increase in the polygenic effect for a trait such as height could reflect selection for increased height in a fixed nutritional environment or selection for the same optimal height in an environment where nutritional effects on height are suddenly much lower, or the combined effect of the two.
This equivalence between an environmental shift for a fixed optimum and a shift in the trait optimum shines a different light on the interpretation of tests for polygenic selection. Given mounting evidence that many adaptations in human evolution were likely polygenic https://doi.org/10.1371/journal.pbio.3001072.g002 [6,35,54,55,[85][86][87], there has been a recent push to identify the signature of polygenic adaptation in sets of loci associated with a trait in a GWAS [18][19][20]32,[88][89][90][91][92][93]. Interpreting the results is extremely tricky, as a number of authors have underscored [19,[28][29][30]40], both because of technical challenges (e.g., residual stratification in GWAS and the poor portability of PGS [21][22][23]28,84,94,95]) and deeper conceptual issues, such as the problem of selection on correlated traits [6,30,53,62,92,96,97]. Nonetheless, where these methods have identified possible signatures of polygenic selection, they have often been interpreted as pointing to a trait (or a correlated trait) whose optimum value has shifted in response to natural selection in the course of human evolution. An alternative interpretation is that the optimal trait value has remained the same, but environmental effects have shifted [6,19,40,75,90]. Thus, evidence of polygenic adaptation could be reflective of a change of trait optimum, but need not be. Of course, it is also possible that drastic changes in environmental effects are often accompanied by shifts in the optimum trait value.
This interpretative ambiguity is heightened by the fact that traits are inevitably somewhat arbitrary constructs [6,40,48]. Consider body size as 1 example. Bergmann's rule [98] proposes that in homeotherms, body shape varies across the globe in part in order to optimize heat retention. Under Bergmann's rule, smaller body size leads to lower ratios of surface area to volume and higher heat retention and is therefore favored in colder climates (e.g., [99][100][101] and references therein). This selection pressure could be interpreted either as favoring different body sizes in different ecological settings or as maintaining the same heat retention optimum in the face of a changing environmental effect. Another example may be provided by skin pigmentation levels, which mediate UVR penetration into the skin and vary with UVR exposures across the globe. The extent of UVR penetration is thought to be subject to a trade-off that arises from dual effects of UVR on folate degradation and vitamin D synthesis-2 critically important vitamins [58]. Numerous studies have reported evidence for polygenic adaptation at the loci that contribute to variation in skin pigmentation levels, both across the globe and over time (e.g., [102,103]). These signatures of polygenic adaptation are usually viewed as resulting from repeated episodes of directional selection on skin pigmentation in environments with different UVR levels, but could also be seen as arising from selection for a similar degree of UVR penetration in the face of varying UVR levels. In other cases, much less is known about the relationship between the traits and fitness, and the interpretation will be all the more challenging.
That polygenic adaptation can arise both from a shift in the fitness optimum and from stabilizing selection for a fixed optimum in the face of shifting environmental effects further suggests that transient polygenic adaptation may be widespread [6,74,104]. Just how common we should expect such adaptive bouts to be depends, among other factors, on the timescale over which environmental effects change (relative to the timescale of causal allele dynamics) and their degree of autocorrelation across time [53,105], and remains to be investigated. In any case, these considerations raise interesting questions about the proper framing of tests for polygenic adaptation, if indeed polygenic adaptation has been the norm rather than the exception in human evolution.

Gene by environment interactions
In discussing how traits are expected to evolve, we have ignored the possibility of gene by environment (GxE) interactions, in which the genetic effects manifest themselves differently in distinct environments. In model organisms, in which environments can be controlled and manipulated, such interactions are ubiquitous (e.g., [2,106,107]). In humans, the same types of experiments obviously cannot be conducted, and statistical approaches to detect GxE have identified few clear-cut cases [25]. If GxE interactions turn out to be prevalent in humans too, they would only amplify the points highlighted in this Essay: (1) that we should expect polygenic effects to differ across groups; and (2) that relating differences in polygenic effects to trait differences requires strong assumptions about the stability of environmental effects.
GxE interactions are more than a complication for the models whose results we have summarized, however. Their existence challenges the validity of the generative models on which we rely to make inferences about complex traits (e.g., heritability estimation [108]). These models envision variation in a trait as arising from strictly genetic effects, independent environmental effects, and in some cases, a specific way in which genetics and environment could interact. For traits such as behaviors, for which genetic effects plausibly arise entirely and inextricably by interaction with environmental effects, it is not obvious to us that this conception is apt-even when a statistical model based on it appears to capture substantial trait variance [29,40,109,110]. If instead it is more sensible to think of such traits as emerging from GxE interactions, polygenic effects will be incommensurable across environments. These considerations are beyond the scope of this Essay, but seem to us worth keeping in mind, especially when these models are relied on, at times implicitly, to interpret differences among people over time and across the globe.

Conclusion
Whether a trait has no fitness consequences, its fitness optimum recently changed, or its fitness optimum has remained the same in the face of shifting environmental effects, we should expect mean polygenic effects to evolve. Therefore, a shift in polygenic effects does not imply that the trait optimum has changed. Nor does it follow, without strong assumptions about environmental effects, that trait value changes will mirror the direction or magnitude of the genetic changes.
Such considerations apply to any organism (e.g., [105,111]), but are particularly consequential for humans and other nonexperimental species. They underscore the importance of adaptation studies in which environmental effects can be rendered identical, as done in common garden experiments for example. In humans, however, even when current limitations of PGS are overcome, the absence of knowledge of the salient environmental effects (and possibly also their interaction with genetics) may generate a fundamental interpretative ambiguity.