
The distribution of fitness effects during adaptive walks using a simple genetic network

  • Nicholas L. V. O’Brien,

    Roles Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    n.obrien@uq.edu.au (NLVO’B); d.ortizbarrientos@uq.edu.au (DO-B)

    Affiliations School of the Environment, The University of Queensland, Brisbane, Queensland, Australia, ARC Centre of Excellence for Plant Success in Nature and Agriculture, The University of Queensland, Brisbane, QLD, Australia

  • Barbara Holland,

    Roles Methodology, Supervision, Writing – review & editing

    Affiliations School of Natural Sciences, University of Tasmania, Hobart, Tasmania, Australia, ARC Centre of Excellence for Plant Success in Nature and Agriculture, University of Tasmania, Hobart, Tasmania, Australia

  • Jan Engelstädter,

    Contributed equally to this work with: Jan Engelstädter, Daniel Ortiz-Barrientos

    Roles Conceptualization, Methodology, Supervision, Writing – review & editing

    Affiliations School of the Environment, The University of Queensland, Brisbane, Queensland, Australia, ARC Centre of Excellence for Plant Success in Nature and Agriculture, The University of Queensland, Brisbane, QLD, Australia

  • Daniel Ortiz-Barrientos

    Contributed equally to this work with: Jan Engelstädter, Daniel Ortiz-Barrientos

    Roles Conceptualization, Funding acquisition, Methodology, Supervision, Writing – review & editing

    Affiliations School of the Environment, The University of Queensland, Brisbane, Queensland, Australia, ARC Centre of Excellence for Plant Success in Nature and Agriculture, The University of Queensland, Brisbane, QLD, Australia

Abstract

The tempo and mode of adaptation depend on the availability of beneficial alleles. Genetic interactions arising from gene networks can restrict this availability. However, the extent to which networks affect adaptation remains largely unknown. Current models of evolution consider additive genotype-phenotype relationships while often ignoring the contribution of gene interactions to phenotypic variance. In this study, we model a quantitative trait as the product of a simple gene regulatory network, the negative autoregulation motif. Using forward-time genetic simulations, we measure adaptive walks towards a phenotypic optimum in both additive and network models. A key expectation from adaptive walk theory is that the distribution of fitness effects of new beneficial mutations is exponential. We found that both models instead harbored distributions with fewer large-effect beneficial alleles than expected. The network model also had a complex and bimodal distribution of fitness effects among all mutations, with a considerable density at deleterious selection coefficients. This behavior is reminiscent of the cost of complexity, where correlations among traits constrain adaptation. Our results suggest that the interactions emerging from genetic networks can generate complex and multimodal distributions of fitness effects.

Author summary

Historically, models of adaptation have typically considered traits as a sum of effects at many genes. Mutations in these genes move a population incrementally closer to an optimum phenotype. However, the genetic basis of traits is often more complex, with interwoven networks of genes creating non-additive effects that might reduce natural selection’s ability to steer populations towards optimal trait combinations. In this study, we developed a model that simulates the evolution of a trait as the product of a gene regulatory network. We used this model to compare the effects of the network on adaptation to a typical additive model. We found that mutations in network populations were more likely to drive the population away from an optimum phenotype. We likened this result to the “cost of complexity”, where more complicated genetic systems can create constraints on adaptation. Our results suggest that the non-additive interactions emerging from genetic networks might alter the adaptive dynamics predicted under additive models.

Introduction

A lingering question in the study of adaptation concerns the distribution of effect sizes of adaptive alleles that affect trait and fitness variation in natural populations. Research across various species and traits has yielded mixed results. In some cases, natural selection favors alleles with small phenotypic effects (e.g. [1–3]), such as the alleles contributing to the evolution of body size in mammals [4]. Other times, however, alleles with larger effects are preferred, such as those found in flower color changes that affect pollinator preferences (e.g. [5–7]). Understanding when allelic effects of different sizes are favored is crucial to comprehending the “adaptive walk”: the metaphorical path a population takes through genetic space as it adapts to its environment [8]. An adaptive walk begins when environmental change induces an “optimum shift”, where the phenotype with the highest fitness (the phenotypic optimum) is suddenly changed. This drives the population out of mutation-selection balance and towards the new optimum via directional selection [9]. Each “step” in the walk represents a beneficial allele fixing in the population (i.e. reaching 100% frequency), moving it closer to the new phenotypic optimum [10, 11]. The distribution of fitness effects (DFE) among beneficial alleles is a key feature of adaptation, as it describes the tempo and mode of the adaptive walk in phenotypic space: how many steps there are, how large each step is, and in which direction each step drives the population.

Adaptive walks and the distribution of fitness effects

The leading theory of adaptive walks was developed by Gillespie and Orr [12–14]. Under this theory, adaptive walks are characterized by a sequence of mutations with diminishing effect sizes. A key prediction of the Gillespie-Orr model is that the fitness effects of beneficial mutations sampled during adaptation form a negative exponential distribution [14, 15]. This shape emerges as populations approach the optimum phenotype: large effect mutations become progressively disadvantageous due to the increased risk of “overshooting” the optimum. Gillespie recognized that extreme-value theory, a statistical branch focused on sampling from extreme tails of distributions, could be integrated into the genetic theory of adaptation to study the DFE. To understand this connection, consider the distribution of fitnesses among all genotypes. Under normal circumstances, the wild-type genotype should have relatively high fitness, and hence it belongs to the right tail of this distribution. The remainder of the right tail of the DFE consists of beneficial mutations. Extreme value theory predicts that this right tail can be described by a family of distributions termed the extreme value distribution (EVD). Gillespie posited that adaptive steps are samples from the EVD [13]. Studying the shape of the EVD highlights the availability of beneficial mutations to a population and which effect sizes are likely to contribute to adaptation.

The EVD can be broadly categorized into one of three “domains of attraction”, or shapes: the Gumbel, Weibull, or Fréchet family of distributions (Fig 1; [16–18]). Gillespie argued that the Gumbel is most likely to represent empirical DFEs as it captures common distributions such as the normal, gamma, and log-normal [12, 13]. The Gillespie-Orr exponential expectation outlined above assumes a Gumbel domain [14]. Mutagenesis studies and spontaneous mutation experiments largely support this theoretical expectation (e.g. [14, 19–24]). However, evidence in favor of alternative domains does exist, suggesting the field is still open to exploration. Adaptive walks with Weibull-distributed DFEs are characterized by fewer large-effect beneficial mutations compared to Gumbel EVDs [25, Fig 1]. Rokyta et al. [16] observed a Weibull distribution in the ID11 ssDNA phage, hinting at an upper limit on the size of beneficial fitness effects. Further, stabilizing selection can generate a Weibull-distributed EVD by limiting the size of beneficial mutations as populations approach the optimum [18]. On the other hand, Fréchet EVDs have more frequent beneficial mutations [25, Fig 1]. Schenk et al. [26] observed a Fréchet EVD in Escherichia coli adapting to antibiotics. Although it is unclear as to which conditions cause Fréchet EVDs to arise, environments which invoke strong selective pressures on populations are associated with their appearance [26–29].

Fig 1. Examples of how the shape parameter, κ, can change the generalized Pareto distribution (GPD).

The GPD, an extreme value distribution, characterizes the behavior of extreme values: random samples from the right tail of a continuous distribution. Extreme values are used to represent beneficial mutations during an adaptive walk. Depending on κ, the GPD can belong to one of three domains: Gumbel, Weibull, or Fréchet. When κ = 0, the extreme values align with the Gumbel distribution, which is the anticipated distribution for beneficial mutations during an adaptive walk (solid line). If κ < 0, the extreme values take on a Weibull distribution, exhibiting a truncated right tail (dotted line). Decreasing κ shifts the maximum extreme value (the truncation point) towards zero. Conversely, for κ > 0, the extreme values follow the Fréchet distribution, characterized by an extended right tail (dashed line). The specific κ values used in this plot are 0, -0.75, and 1 for Gumbel, Weibull, and Fréchet, respectively.

https://doi.org/10.1371/journal.pgen.1011289.g001
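The three domains described in the Fig 1 caption can be illustrated with the standard GPD density with location zero. The following is a minimal sketch rather than code from the study: the function name `gpd_pdf` and the unit scale parameter are assumptions made for the example, and the κ values are the ones quoted in the caption.

```python
import math


def gpd_pdf(x, kappa, scale=1.0):
    """Density of the generalized Pareto distribution (location 0) at x."""
    if x < 0 or scale <= 0:
        return 0.0
    z = x / scale
    if kappa == 0:
        # Gumbel domain: the GPD reduces to an exponential distribution.
        return math.exp(-z) / scale
    base = 1.0 + kappa * z
    if base <= 0:
        # Weibull domain (kappa < 0): no density beyond x = -scale / kappa.
        return 0.0
    return base ** (-1.0 / kappa - 1.0) / scale


# Shape values quoted in the Fig 1 caption; the scale of 1 is an assumption.
for name, kappa in [("Gumbel", 0.0), ("Weibull", -0.75), ("Frechet", 1.0)]:
    densities = [gpd_pdf(x, kappa) for x in (0.0, 0.5, 1.0, 2.0)]
    print(name, kappa, [round(d, 4) for d in densities])
```

With κ = -0.75 and unit scale the density is zero beyond x = 1/0.75 ≈ 1.33, which is the truncated right tail of the Weibull domain, whereas κ = 1 retains visible density at large x.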

Many factors can influence the DFE among beneficial mutations, and these can produce non-Gumbel EVDs. Some of these factors are environmental, such as stabilizing selection driving Weibull EVDs [18]. However, developmental and selective constraints also contribute to the shape of the DFE, implicating the structure of the genotype-phenotype-fitness (GPW) map in the DFE’s shape [30–32].

Genotype-phenotype maps in adaptation

The genotype-phenotype-fitness (GPW) map is composed of two distinct components: the genotype-phenotype and the phenotype-fitness relationships. The genotype-phenotype (GP) map describes how developmental and physiological processes translate genotypes into phenotypes, while the phenotype-fitness map describes how ecological and selection regimes create fitness differences between phenotypes. For example, stabilizing selection favors intermediate phenotypes, while directional selection favors extreme phenotypes in one direction [33]. The phenotype-fitness map, along with the related genotype-fitness map, has been extensively studied in quantitative and population genetics (e.g. [34–38]). Because the GP map is notoriously challenging to estimate, many quantitative genetics models assume an additive relationship between genotype and phenotype [39, 40]. These models derive from Fisher’s [41] infinitesimal model, which supposes that continuous trait distributions can be produced by loci under Mendelian segregation as long as those loci are a) many, and b) have small, additive effects on the phenotype [41]. This model is the basis of modern quantitative genetics, providing a simple GP map that requires no information about the underlying genetic systems that describe the developmental and physiological underpinnings of traits [39, 42].

Overwhelming empirical evidence suggests that non-additive gene interactions (epistasis) are ubiquitous in nature [43–45] and play a crucial role in adaptation [46–49]. Additive models, including the infinitesimal model, often fail to capture these gene interactions, and when such interactions are identified, it is unclear how they represent biological gene action [50]. Hence, it is unknown how functional epistasis might contribute to the developmental constraints that limit evolution [44].

Integrating information about the underlying systems describing trait development enables us to capture genetic interactions affecting fitness. Modeling the molecular networks underpinning trait development and expression can help us understand how gene interactions and regulatory processes shape the survival and reproductive success of organisms, and provide a mechanistic view of how variation arises in natural populations. As we delve deeper into the nature of the genotype-phenotype-fitness landscape, it becomes important to re-evaluate our foundational quantitative genetic models in light of our current understanding of the molecular basis of traits. For instance, how does the distribution of fitness effects change when non-additive GP maps are considered? Does this affect a population’s chance to adapt to a changing environment? To address these questions, we consider a simple gene regulatory network motif to motivate our approach and implement a nonlinear GP map into adaptive walk theory.

At the foundation of any trait lies a complex network of interacting genes and regulatory elements which control the expression of proteins [51]. Systems biologists employ mathematical networks known as gene regulatory networks (GRNs) to model such systems [52]. Empirical networks harbor startling complexity: for example, consider the circadian clock network in Arabidopsis thaliana. This network is driven by a number of interacting network motifs: small, common subnetworks with particular effects on gene expression. For instance, a feed-forward loop motif in this network generates pulses of expression in PRR9/7 [53]. In addition, a negative feedback loop (another motif) between CCA1/LHY and PRR5/TOC1 generates a bistable switch [53]. This switch is toggled at daybreak by the aforementioned PRR9/7 feed-forward loop and again at dusk by yet another circuit [53]. Given the complexity of many GRNs, it is common to study motifs as separate units to elucidate the reasons for their repeated recruitment into so many systems [54–56]. This allows systems biologists to probe the general effects of motifs on a broad range of networks. In this study, we focus on the simplest network motif, negative autoregulation (NAR; Fig 2). This motif consists of two genes: gene X activates gene Z (indicated by the pointed arrow in Fig 2A), while Z’s product limits its further expression (indicated by the flat arrow in Fig 2A).

Fig 2. A negative autoregulation (NAR) motif and its expression curve.

The NAR motif consists of two genes, X and Z. The expression of X activates Z, and the expression of Z inhibits further Z production (A). This results in a characteristic expression curve (B). Z production begins when X is activated (blue shaded area) and stops when it is inactivated. Gene Z product approaches a steady state concentration and then quickly falls off in the absence of X.

https://doi.org/10.1371/journal.pgen.1011289.g002

NAR motifs are highly prevalent in biological networks. Close to half of all transcription factors in E. coli are negatively autoregulated [57, 58]. In plants, expression of the oil biosynthesis transcription factor WRINKLED1 (WRI1) is driven via a NAR network that has been evolutionarily conserved from at least the split between monocotyledons and dicotyledons (about 140 million years) [59, 60]. The ubiquity of the NAR motif in nature comes as a result of its self-balancing property: when a gene is overexpressed, the NAR mechanism triggers to reduce that gene’s production, leading to a steady state of gene expression [52]. This reduces variability in gene expression between cells and accelerates responses to environmental cues [61, 62]. Owing to its simplicity and ubiquity, we consider the NAR motif a reasonable toy model for beginning to explore the evolution of complex traits mediated by genetic networks.

How might we expect networks to affect adaptation? Genetic networks impose functional epistasis, biological interactions between genes, which can create nonlinear GP maps [63]. This results in rugged fitness landscapes where populations can get trapped at local optima because the path towards the global peak (which maximizes fitness) is paved by low-fitness genotypes [34]. Adaptation should then be limited by the structure of the fitness landscape: the more local peaks there are, the less likely it is for a population to be able to find the global optimum. In turn, the structure of the fitness landscape will depend on the nature of the underlying network. For instance, the NAR network provides a relatively simple fitness landscape. Kozuch et al. [64] investigated the fitness landscape of E. coli’s lexA NAR network. The authors found that the fitness landscape was ridge-like, with fitness maximized along a parameter space that balanced lexA production with the strength of autoregulation [64]. However, other motifs might produce different constraints on adaptation. Recent work by Baier et al. [65] found that a synthetic gene network based on a feed-forward motif produced reciprocal sign epistasis, a prerequisite for a rugged fitness landscape. Further, empirical fitness landscapes suggest that while ruggedness is common (although not ubiquitous) [66], connectivity is also high, making these rugged landscapes searchable [67].

In this paper, we use a novel approach to model a quantitative trait as the product of a NAR motif. Previous attempts to reconcile quantitative and population genetics with systems biology have used a variety of approaches. Some have focused on the explicit modeling of network structures (e.g. [68, 69]), whilst others have considered systems as hierarchical structures of developmental parameters that resemble quantitative traits (e.g. [70, 71]). Our approach is the first to our knowledge to combine both approaches. We model a quantitative trait as the expression of the NAR motif via a system of ordinary differential equations (ODE) similarly to François [69]. However, instead of modeling a gene network’s evolvability via the addition/removal of genes to the network, we instead consider the perturbation of expression dynamics via quantitative changes in the coefficients of the ODE/s, similarly to Slatkin [70]. This approach combines the biological realism of network modeling with the wealth of existing tools for studies of the evolution of quantitative traits. We use our approach to evaluate the shape of the DFE among beneficial mutations during adaptive walks, finding stark differences from classical expectations in some cases. We use Wright-Fisher simulations to describe the adaptive walks of populations following an optimum shift with either an additive or NAR GP map, and examine which extreme value distribution domain (Gumbel, Weibull, or Fréchet) best captures the behavior of our model. We then identify if the network imposes constraints on adaptation compared to an additive model. Finally, we discuss the contributions of genetic network architectures to adaptation and propose avenues for further exploration using similar systems models.

Materials and methods

The model

Modeling the NAR motif.

To model the expression patterns of the NAR motif, we first translate its network diagram (Fig 2A) to a system of ordinary differential equations (ODE). ODEs are commonly used to model gene networks due to their balance between efficiency and realism [72, 73]. The solution to the ODE predicts gene expression over a time period, such as during cell development. The NAR ODE is given by:

$$\frac{dZ}{dt} = \beta_Z \, \frac{X^h}{K_{XZ}^h + X^h} \, \frac{K_Z^h}{K_Z^h + Z^h} - \alpha_Z Z \tag{1}$$

The coefficients in this equation have biological relevance. X and Z represent the cellular concentrations of the products of the two genes. h is the Hill coefficient reflecting the sensitivity of the system to the presence of X and/or Z products. Higher values indicate a more rapid response to increasing X and/or Z concentration [52]. KXZ and KZ are activation and repression coefficients, respectively. They control how quickly the presence of X or Z drives the activation or suppression of further Z expression [52]. In this study, we fixed these parameters (values can be found in S1 Table) in favor of studying the evolution of the remaining two: αZ and βZ.

αZ is the rate at which Z is removed from the cell. This might reflect, for example, the activity of ubiquitinase in tagging Z for removal and the 26S proteasome in breaking down the protein [74, 75]. βZ represents the production rate of Z, which is influenced by factors such as transcription factor binding affinity, enhancers, silencers, and trans-acting regulatory elements [76]. These biological interpretations lend some realism to the modeling of quantitative traits. They also make the action of mutations that affect these coefficients interpretable, as explained in the following sections.

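X follows a step function and is only expressed within the interval t ∈ [tstart, tstop]:

$$X(t) = \begin{cases} 1, & t_{\mathrm{start}} \le t \le t_{\mathrm{stop}} \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

where tstart and tstop are the time points at which X is activated and deactivated. Values for these parameters are given in S1 Table.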

The solution of the ODE is characterized by a nonlinear approach to maximum Z expression, followed by a decline after X expression ceases (Fig 2B). The area underneath the expression curve (Fig 2B) is the total amount of Z produced, which we can take as a quantitative trait value. To create variation in this trait value, the coefficients of Eq 1 can be varied. We refer to these coefficients as “molecular components”. In this study, we focus on modeling mutations in two of the NAR’s molecular components: the Z degradation rate, αZ, and the Z production rate, βZ. Mutations in our model have direct effects on either αZ or βZ. This leads to a differently-shaped expression curve when the ODE is solved, and hence a (potentially) different trait value.

We adopt a model where loci contribute multiplicatively to the values of molecular components. We refer to these loci as molecular quantitative trait loci (mQTLs). The multiplicative transformation ensures that the molecular components are always positive, which is essential as these values represent rates which cannot be negative. We study the evolution of the mQTLs using the Wright-Fisher model, a foundational model in population genetics which describes the stochastic process of allele frequency change in a finite population [34, 77]. In our implementation, individuals in the Wright-Fisher population are diploid.

Although this implementation is complex for a simple adaptive walk scenario, explicitly modeling the entire population means that the adaptive walks can emerge organically and situations with small deviations from strict adaptive walk scenarios (where sometimes more than two alleles segregate in a population) are also covered. Also, this means the model is easily extensible to many different genetic architectures with different networks, numbers of contributing loci, recombination rates, and mutational effect sizes. For a more abstract formulation of such a generalised model, refer to S1 Appendix.

Phenotype calculation.

To calculate an individual’s phenotype from their set of mQTLs, we follow the below algorithm:

  1. Take the exponent of the sum of all alleles for αZ across all loci and both homologous chromosomes (this is analogous to an additive model summing effects across all loci, but with a multiplicative transformation as described above).
  2. Repeat step 1 for βZ.
  3. Substitute the αZ and βZ values into the ODE, Eq 1, and solve the ODE between time points 0 and 10 to get an expression curve.
  4. Take the area under the expression curve to get the total amount of Z expression (the phenotype).
  5. Repeat steps 1–4 for each individual in the population.

This algorithm is described mathematically below.

Let C be a vector of the molecular components. An allele at locus i and chromosome j has an effect aij on one of the molecular components.

Given an individual’s alleles, the value for each molecular component is calculated by exponentiation of the sum of allelic effects across all mQTLs and chromosomes. For molecular component k:

$$C_k = \exp\left( \sum_{i=1}^{L_Q} \sum_{j=1}^{2} a_{ij} \right) \tag{3}$$

where Ck ∈ [0, ∞) and the sum includes only alleles at mQTLs affecting component k. LQ represents the number of causal loci along the genome. In all simulations, LQ = 2. This treatment of allelic effects assumes no explicit epistasis or dominance deviations; however, these can arise as consequences of the nonlinear ODE solution.

After calculating the molecular component values (C), we can determine the amount of Z expression, the phenotype. We achieve this by constructing a system of ODEs (Eq 1) and solving for the area under the expression curve (i.e. the total Z produced during a time period).

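We express the ODE as an initial value problem where, given the “initial” or starting value of the function, we can determine the function’s behavior for subsequent time points using the ODE:

$$\frac{dZ}{dt} = F(t, Z), \qquad Z(t_0) = Z_0$$

where F is the function that defines the differential equation, t0 is the starting time, and Z0 is the starting value. To calculate the trait value, we integrate the expression curve:

$$P = \int_{0}^{t_{\max}} Z(t) \, dt \tag{4}$$

where tmax = 10.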

In Eq 2, tstart = 1 and tstop = 6. Z activity is evaluated for 0 ≤ t ≤ 10. We chose tmax = 10 to allow for a wide range of possible phenotypes while limiting the computational cost of solving the ODE. tstart and tstop were chosen so that X is activated half of the total time evaluated.
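The phenotype calculation described in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's implementation (which uses the Ascent library inside SLiM): the ODE is taken to have the usual NAR form with Hill activation by X and Hill self-repression by Z, the values of `HILL_H`, `K_XZ` and `K_Z` are hypothetical placeholders for the S1 Table values, X is given an amplitude of 1, Z(0) = 0 is assumed, and a fixed-step fourth-order Runge-Kutta scheme stands in for the actual solver.

```python
import math

# Hypothetical placeholder values; the study's fixed values are in its S1 Table.
HILL_H = 8.0
K_XZ = 1.0
K_Z = 1.0
# Timing values stated in the text.
T_START, T_STOP, T_MAX = 1.0, 6.0, 10.0


def x_input(t):
    """Step input: X is on (assumed amplitude 1) only within [T_START, T_STOP]."""
    return 1.0 if T_START <= t <= T_STOP else 0.0


def dz_dt(t, z, alpha_z, beta_z):
    """Assumed NAR form: Hill activation by X, Hill self-repression by Z, decay."""
    x = x_input(t)
    z = max(z, 0.0)  # concentrations cannot be negative
    activation = x ** HILL_H / (K_XZ ** HILL_H + x ** HILL_H)
    repression = K_Z ** HILL_H / (K_Z ** HILL_H + z ** HILL_H)
    return beta_z * activation * repression - alpha_z * z


def molecular_component(allelic_effects):
    """Exponentiate the summed allelic effects so the rate stays positive."""
    return math.exp(sum(allelic_effects))


def phenotype(alpha_alleles, beta_alleles, dt=0.01):
    """Integrate Z with fixed-step RK4 and return the area under the curve."""
    alpha_z = molecular_component(alpha_alleles)
    beta_z = molecular_component(beta_alleles)
    z, area = 0.0, 0.0  # assumed initial condition Z(0) = 0
    for step in range(int(round(T_MAX / dt))):
        t = step * dt
        k1 = dz_dt(t, z, alpha_z, beta_z)
        k2 = dz_dt(t + dt / 2, z + dt * k1 / 2, alpha_z, beta_z)
        k3 = dz_dt(t + dt / 2, z + dt * k2 / 2, alpha_z, beta_z)
        k4 = dz_dt(t + dt, z + dt * k3, alpha_z, beta_z)
        z_next = z + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        area += 0.5 * (z + z_next) * dt  # trapezoid rule for the area
        z = z_next
    return area


# One locus per molecular component, two homologous chromosomes each.
baseline = phenotype(alpha_alleles=[0.0, 0.0], beta_alleles=[0.0, 0.0])
more_production = phenotype(alpha_alleles=[0.0, 0.0], beta_alleles=[0.5, 0.5])
print(baseline, more_production)
```

Because the allelic effects are summed before exponentiation, alleles with effect zero leave a molecular component at 1, and positive effects on the βZ locus raise the total amount of Z produced.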

Fitness calculation.

To map the trait value (P) to fitness (w), we utilize a Gaussian fitness function, which allows us to model directional and stabilizing selection. The Gaussian fitness function, originally described by Lande [40], is defined as follows: (5)

In this function, Δz = P − PO, where PO is the optimal phenotype, and σ is the width of the fitness function. Wider functions represent weaker selection.
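A Gaussian fitness function of this kind can be sketched as follows. The exact form of Eq 5 is not reproduced in this text, so the common parameterization exp(−Δz²/(2σ²)) used here is an assumption, as are the function and argument names; other scalings of the width parameter would change the numbers but not the qualitative behavior.

```python
import math


def gaussian_fitness(phenotype, optimum, width):
    """Fitness declines with squared distance from the optimum.

    Assumed parameterization: w = exp(-dz^2 / (2 * width^2)).
    """
    delta_z = phenotype - optimum
    return math.exp(-(delta_z ** 2) / (2.0 * width ** 2))


# A phenotype at the optimum has fitness 1; fitness falls off on either side.
for width in (0.5, 1.0, 2.0):
    print(width, gaussian_fitness(phenotype=1.0, optimum=2.0, width=width))
```

Under this form, a wider function assigns higher fitness to the same displaced phenotype, which corresponds to weaker selection.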

Evolutionary simulation.

To model the evolution of the network over multiple generations, we employ a Wright-Fisher (WF) model [34, 78]. We consider a diploid population of N = 5000 individuals with random mating and non-overlapping generations. This satisfies the requirement that 2Nμ ≪ 1 (where 2N is the total number of genomes in the population and μ is the per-locus mutation rate; 2Nμ = 0.0915), which is necessary to meet the Gillespie-Orr model’s assumption of strong selection relative to mutation (see section Model validation for more information on this) [12, 79]. Offspring are generated by sampling with replacement from parents, with the sampling probability weighted by their relative fitness. Individuals possess a pair of homologous chromosomes with two mQTLs: one for each molecular component. Random mutations occur at a rate of μ = 9.1528 × 10⁻⁶ per locus per generation, for a total rate of 1.831 × 10⁻⁵ across both loci. This mutation rate is based on the average mutation rates observed in A. thaliana and adjusted to per-locus rates by multiplying it with the average length of a eukaryotic gene (1346 bp) [80, 81]. A. thaliana was chosen as it is a model plant species with good estimates of mutation rates. The mutational effects of mQTLs on molecular components were drawn from a standard normal distribution and the effect added to the previous allele at that locus such that $a_{\mathrm{new}} = a_{\mathrm{old}} + x$ with $x \sim \mathcal{N}(0, 1)$, where anew is the new allele at a given locus and aold is the previous allelic effect at that locus. After being sampled, the allelic effects were exponentiated as per Eq 3. We assume free recombination between loci.
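One generation of such a Wright-Fisher process might look like the sketch below. It is a simplified illustration rather than the SLiM implementation: the data layout (a list of loci, each a tuple of two allelic effects) and the function names are assumptions, and selfing is not excluded. The first lines also check the mutation-supply arithmetic quoted above.

```python
import random

N = 5000          # diploid population size stated in the text
MU = 9.1528e-6    # per-locus, per-generation mutation rate stated in the text
print(2 * N * MU)  # mutation supply 2N*mu, about 0.0915


def make_offspring(parents, fitnesses, rng, mu=MU):
    """Draw two parents by fitness-weighted sampling and build one offspring.

    Each individual is a list of loci; each locus is a tuple holding the
    allelic effects on the two homologous chromosomes (assumed layout).
    """
    mother, father = rng.choices(parents, weights=fitnesses, k=2)
    child = []
    for locus in range(len(mother)):
        alleles = []
        for parent in (mother, father):
            # Free recombination: each locus is inherited independently.
            allele = rng.choice(parent[locus])
            if rng.random() < mu:
                # New effect = old effect + standard normal draw.
                allele += rng.gauss(0.0, 1.0)
            alleles.append(allele)
        child.append(tuple(alleles))
    return child


def next_generation(parents, fitnesses, rng, mu=MU):
    """Non-overlapping generations: replace all parents with N offspring."""
    return [make_offspring(parents, fitnesses, rng, mu) for _ in parents]


rng = random.Random(42)
founders = [[(0.0, 0.0), (0.0, 0.0)] for _ in range(50)]  # small demo population
offspring = next_generation(founders, [1.0] * len(founders), rng)
print(len(offspring))
```

The demonstration uses 50 individuals only to keep the example fast; the weighted draw is repeated per offspring, which is simple but not efficient at N = 5000.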

We also considered an additive model. In this scenario, genomes also consisted of two causal loci to match the mutational target size of the NAR model. However, the trait value was given by summing allelic effects across QTLs instead of solving an ODE:

$$P = \sum_{i=1}^{L_Q} \sum_{j=1}^{2} a_{ij} \tag{6}$$

where aij is the allelic effect of locus i on chromosome j.

Computational implementation

To investigate our model’s behavior during an adaptive walk, we implemented the model in a custom version of the forward-time, individual-based simulation software SLiM 3.7.1 [82]. Our modified SLiM implementation can be accessed openly at https://github.com/nobrien97/SLiM/releases/tag/AdaptiveWalks2023. SLiM scripts are available at https://github.com/nobrien97/NARAdaptiveWalk2023. A flowchart of the SLiM implementation is given in S1 Fig. The ODE was solved by integrating the Ascent numerical solution library [83] into SLiM. Ascent solved Eq 4 to produce trait values for the individuals in the population. Since individuals often shared genotypes (and hence had the same ODE inputs), ODE solutions were cached in memory to reduce redundant solutions and improve performance.
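The caching idea can be illustrated with Python's `functools.lru_cache`. This is a sketch of the general technique, not the study's C++ code: `solve_trait` is a hypothetical stand-in for the expensive ODE solve (a simple linear production and decay model integrated with Euler steps, not the NAR system), and the cache key is assumed to be the pair of molecular component values.

```python
from functools import lru_cache

SOLVER_CALLS = 0  # counts how often the expensive solve actually runs


def solve_trait(alpha_z, beta_z):
    """Hypothetical stand-in for the expensive ODE solve (not the NAR system)."""
    global SOLVER_CALLS
    SOLVER_CALLS += 1
    z, area, dt = 0.0, 0.0, 0.01
    for step in range(1000):
        t = step * dt
        x = 1.0 if 1.0 <= t <= 6.0 else 0.0
        z_next = z + dt * (beta_z * x - alpha_z * z)  # forward Euler step
        area += 0.5 * (z + z_next) * dt
        z = z_next
    return area


@lru_cache(maxsize=None)
def cached_trait(alpha_z, beta_z):
    """Solve once per distinct (alpha_z, beta_z) pair and reuse the result."""
    return solve_trait(alpha_z, beta_z)


# Most individuals share a genotype, so most lookups are cache hits.
population = [(1.0, 1.0)] * 4998 + [(1.0, 1.5), (0.8, 1.0)]
traits = [cached_trait(alpha_z, beta_z) for alpha_z, beta_z in population]
print(SOLVER_CALLS)  # 3: one solve per distinct genotype
```

With 5000 individuals carrying three distinct genotypes, the solver runs three times and the remaining 4997 lookups are served from memory.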

Both models underwent 50,000 generations of burn-in. The burn-in ensured that populations had high fitness just prior to the optimum shift, an assumption of the Gillespie-Orr model [12–14]. During burn-in, populations adapted to a phenotypic optimum at PO = 1. The phenotypic optimum was then instantaneously shifted to PO = 2, and the population was monitored for 10,000 generations of adaptation. We measured the phenotypic means and allelic effects of segregating and fixed mutations every 50 generations. We replicated each model 2,880 times with 32-bit integer seeds sampled from a uniform distribution in R 4.3.1 [84]. Detailed simulation parameters can be found in Table 1. Simulations were executed on the National Computational Infrastructure’s Gadi HPC system.

Table 1. Simulation parameters.

Table of symbols, names, descriptions, and values for relevant parameters used in the forward-time Wright-Fisher simulations.

https://doi.org/10.1371/journal.pgen.1011289.t001

Of the simulation parameters, the most important were the fitness function width, σ, and the optimum shift amount, Oshift. The Gillespie-Orr model assumes that the wild-type has relatively high fitness, so that in the total space of genotypes, it exists on the right-tail of the distribution [14]. Hence, σ and Oshift need to be chosen so that at the optimum shift (generation 50,000) fitness is still relatively high. We chose σ and Oshift so that this assumption is met: at the optimum shift, individuals perfectly adapted to the burn-in optimum suffered a ∼ 5% drop in relative fitness.

Model validation

The Gillespie-Orr model assumes that there is strong per-locus selection relative to mutation [12, 13]. This regime is often referred to as the strong selection weak mutation (SSWM) paradigm, as opposed to the weak selection strong mutation regimes found in polygenic models (e.g. the infinitesimal model) [41, 85].

To ensure we were in the SSWM domain assumed by adaptive walk theory, we measured population heterozygosity and the effects of segregating alleles on trait variation. To measure heterozygosity, H, we used the equation

$$H = \frac{N_{\mathrm{het}}}{N} \tag{7}$$

where Nhet is the number of individuals heterozygous at locus i, and N is the total population size. Since we had two genes in our simulations (one for each molecular component, and two for the additive model), we took the average heterozygosity across both. Under SSWM theory, heterozygosity should be close to zero, as evolution is mutation limited and beneficial alleles should be quickly fixed. To measure the effects of segregating alleles on the trait, we measured the ratio between the phenotype created by only fixations and the population mean phenotype,

$$r = \frac{P_{\mathrm{fixed}}}{\bar{P}} \tag{8}$$

where Pfixed is the phenotype due to only fixed alleles and $\bar{P}$ is the mean population phenotype. Pfixed was calculated by removing any segregating effects from the αZ and βZ values and recalculating the phenotype via Eq 4. Under SSWM, r ≈ 1, as the phenotype of the population should be constructed from only fixations due to strong selection quickly fixing any segregating beneficial alleles.
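Both validation statistics are simple to compute from simulation output. The sketch below assumes a hypothetical data layout (each locus as a list of two-allele tuples, one per individual) and hypothetical function names; it is meant only to make the two definitions concrete.

```python
def locus_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles at one locus."""
    n_het = sum(1 for allele_1, allele_2 in genotypes if allele_1 != allele_2)
    return n_het / len(genotypes)


def mean_heterozygosity(loci):
    """Average the per-locus heterozygosity across all loci."""
    return sum(locus_heterozygosity(genotypes) for genotypes in loci) / len(loci)


def fixation_ratio(p_fixed, phenotypes):
    """Ratio of the fixation-only phenotype to the population mean phenotype."""
    mean_phenotype = sum(phenotypes) / len(phenotypes)
    return p_fixed / mean_phenotype


# Toy population of four individuals and two loci (illustrative values only).
alpha_locus = [(0.0, 0.0), (0.0, 0.3), (0.0, 0.0), (0.0, 0.0)]
beta_locus = [(0.1, 0.1), (0.1, 0.1), (0.1, 0.1), (0.1, 0.1)]
print(mean_heterozygosity([alpha_locus, beta_locus]))
print(fixation_ratio(p_fixed=1.9, phenotypes=[1.9, 2.0, 1.9, 1.9]))
```

In this toy example one of four individuals is heterozygous at the first locus and none at the second, and the ratio is close to 1 because the single segregating allele shifts the mean phenotype only slightly.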

Fitness effect calculation

To assess the shape of the DFE, we needed to measure the fitness effects of mutations that arose during the adaptive walk. We performed single gene knockouts and measured the difference between the “wild-type” (genotype AA) and knockout’s (genotype aa) relative fitness. A flowchart is given in S2(A) Fig. To measure a given mutation’s effect on fitness, we first subtracted that mutation’s homozygous effect on the molecular component (αZ or βZ) and recalculated the trait value via Eq 4. We then calculated the relative fitness of this knocked-out individual (waa) via Eq 5. To get our final homozygous fitness effect, we subtracted waa from the relative fitness of an individual with the mutation (wAA):

$$s = w_{AA} - w_{aa} \tag{9}$$

We built a custom tool to perform the phenotype and fitness recalculations, using the Ascent C++ library to solve ODEs [83]. This library was also used in the SLiM simulations, keeping the phenotype calculation consistent. The source code for our tool can be found at https://github.com/nobrien97/odeLandscape. Fitness effects were calculated in a similar manner for additive simulations. However, the phenotype was computed as the sum of additive effects instead of using the network ODE solution.
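For the additive case, the knockout procedure can be sketched in a few lines of Python. This is a minimal illustration, not the odeLandscape tool: the Gaussian fitness function stands in for Eq 5, and its optimum and width values are assumptions chosen only for the example.

```python
import math


def gaussian_fitness(phenotype, optimum=2.0, width=1.0):
    """Assumed Gaussian fitness function (stand-in for Eq 5); width is hypothetical."""
    return math.exp(-((phenotype - optimum) ** 2) / (2.0 * width ** 2))


def knockout_fitness_effect(effects, index, optimum=2.0, width=1.0):
    """Homozygous fitness effect of one mutation in an additive model.

    effects: homozygous effects of all alleles carried by the individual.
    index:   position of the focal mutation to knock out.
    """
    p_with = sum(effects)                  # phenotype with the mutation (AA)
    p_without = p_with - effects[index]    # phenotype after the knockout (aa)
    w_AA = gaussian_fitness(p_with, optimum, width)
    w_aa = gaussian_fitness(p_without, optimum, width)
    return w_AA - w_aa                     # fitness with minus fitness without


# Hypothetical individual carrying two alleles; knock out the second one
s = knockout_fitness_effect([1.0, 0.5], index=1)
```

In the network model, the sum of effects would be replaced by a numerical solution of the ODE for the modified αZ or βZ value.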

In silico mutant screen

In addition to the DFE of mutations that arose during the adaptive walk, we were also curious about the total DFE across the space of possible mutations (including deleterious mutations that would be lost quickly). We ran an in silico mutant screen experiment to determine whether the fitness distribution of possible mutations differed between additive and NAR populations. A flowchart is given in S2(B) Fig. We sampled 1,000 mutations from a standard normal distribution and added them to the molecular component values/quantitative trait values of individuals at each step of the adaptive walk. This was done for both network and additive models. We chose a standard normal distribution as it matched the distribution of new mutations that populations faced in the simulations. For the network models, mutations were applied separately to αZ and βZ, while in the additive models, the mutation effect was added to the quantitative trait value. We then recalculated the phenotypic and homozygous fitness effects of the sampled mutations using the fitness effect calculation methods described above. The source code for this experiment can be found at https://github.com/nobrien97/NARAdaptiveWalk2023.
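A simplified version of the screen for the additive model might look like the following Python sketch. It assumes a Gaussian fitness function with a hypothetical width, ignores diploid scaling of effects, and uses illustrative names throughout; it is not the code in the repository above.

```python
import math
import random


def gaussian_fitness(phenotype, optimum, width):
    """Assumed Gaussian fitness function; the width value is hypothetical."""
    return math.exp(-((phenotype - optimum) ** 2) / (2.0 * width ** 2))


def additive_mutant_screen(trait_value, optimum=2.0, width=1.0,
                           n_mutations=1000, seed=1):
    """Sample standard normal mutations and return their fitness effects."""
    rng = random.Random(seed)
    w_wild = gaussian_fitness(trait_value, optimum, width)
    effects = []
    for _ in range(n_mutations):
        mutation = rng.gauss(0.0, 1.0)     # standard normal mutational effect
        w_mut = gaussian_fitness(trait_value + mutation, optimum, width)
        effects.append(w_mut - w_wild)     # fitness effect of this mutant
    return effects


# Hypothetical population sitting one trait unit below the optimum
screen = additive_mutant_screen(trait_value=1.0)
p_beneficial = sum(s > 0 for s in screen) / len(screen)
```

Running the same function with the trait value at the optimum returns no beneficial mutations, which is the expected behavior under stabilizing selection.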

Expected waiting times to beneficial mutation

To tie the mutant screen findings to a quantitative measure of adaptation, we used the mutant screen fitness effect distribution to estimate the expected waiting time to a beneficial mutation between models. We first calculated waiting times, twait, for a particular simulation replicate at a given adaptive step using the equation $t_{wait} = 1/(4N\mu p_{s>0})$ (10), where N is the population size, μ is the per-locus, per-generation mutation rate, and ps>0 is the probability that a new mutation is beneficial. N is multiplied by 4 because the population is diploid, and because there were two loci contributing to the trait. ps>0 is given by $p_{s>0} = m_{s>0}/m$, where m is the total number of mutations generated in the in silico mutation screen experiment and ms>0 is the number of mutation screen alleles with beneficial effects on fitness.
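The waiting time calculation can be sketched as follows, assuming that the expected waiting time is the reciprocal of the per-generation supply of beneficial mutations, 4Nμps>0. The population size and mutation rate in the example are hypothetical placeholders rather than the simulation settings.

```python
def prob_beneficial(fitness_effects):
    """Proportion of screened mutations with a positive fitness effect."""
    return sum(s > 0 for s in fitness_effects) / len(fitness_effects)


def waiting_time(n, mu, p_ben):
    """Expected generations until a beneficial mutation arises.

    Assumes 2N gene copies at each of two loci, giving 4 * N * mu new
    mutations per generation, a fraction p_ben of which are beneficial.
    """
    return 1.0 / (4.0 * n * mu * p_ben)


p = prob_beneficial([0.02, -0.1, -0.3, 0.0])   # one beneficial mutation out of four
t = waiting_time(n=5000, mu=1e-5, p_ben=p)     # hypothetical N and mu
```

Halving the proportion of beneficial mutations doubles the expected waiting time, so small differences in ps>0 between models translate directly into slower adaptive steps.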

Waiting times were calculated for each adaptive step and model. Comparisons between models were done using a bootstrap analysis. We sampled 100,000 different NAR-Additive model pairs across all adaptive steps, calculating the difference between their waiting times. We then calculated the means and 95% confidence intervals (CIs) for each model’s waiting times and the difference between models (Δtwait). The source code for this analysis can be found at https://github.com/nobrien97/NARAdaptiveWalk2023. A flowchart describing this methodology is shown in S3 Fig.
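The paired bootstrap can be illustrated with a short Python sketch. The percentile interval used here is an assumption, since the text does not specify how the confidence intervals were constructed, and the input values are hypothetical.

```python
import random


def bootstrap_waiting_time_difference(nar_times, additive_times,
                                      n_draws=100_000, seed=1):
    """Resample NAR-additive pairs and summarize the difference in waiting times.

    Returns the mean difference and a 95% percentile interval (an assumed
    interval type, used here only for illustration).
    """
    rng = random.Random(seed)
    diffs = sorted(rng.choice(nar_times) - rng.choice(additive_times)
                   for _ in range(n_draws))
    mean_diff = sum(diffs) / n_draws
    lower = diffs[int(0.025 * (n_draws - 1))]
    upper = diffs[int(0.975 * (n_draws - 1))]
    return mean_diff, (lower, upper)


# Hypothetical waiting times (generations) for a handful of replicates
mean_diff, ci = bootstrap_waiting_time_difference(
    [230.0, 240.0, 235.0], [135.0, 138.0, 136.0], n_draws=10_000)
```

A positive mean difference with an interval excluding zero would indicate longer waiting times in the NAR model.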

We also measured the difference in ps>0 between models and across adaptive steps using a linear model: (11) where α is the intercept, βi is the slope coefficient for predictor variable i, and ϵ is the residual error. To estimate marginal means and contrasts, we used the R package emmeans 1.8.5 in R 4.3.1 [84, 86].

Fitness landscapes

With the mutant DFE evaluated, we turned to how the underlying network might drive the complex distribution that we discovered. This involved constructing a fitness landscape to show how fitness changed with different combinations of αZ and βZ, introducing epistasis to the system. We generated 160,000 combinations of αZ and βZ, sampling αZ and βZ values between 0 and 3. We then calculated the phenotype and fitness for each combination using the ODE landscaper tool we used to calculate fitness effects. We plotted the resulting fitnesses against αZ and βZ using ggplot2 3.4.2 [87] in R 4.3.1 [84]. A similar method was used to estimate the fitness landscape of the ratio βZ/αZ as αZ increased. For this analysis, we generated another 160,000 combinations of αZ and βZ/αZ ratios. Both αZ and βZ/αZ values were sampled between 0 and 3.

A key point to note is that the DFE and the fitness landscape are reflections of each other. The fitness landscape represents the entire space of possible αZ and βZ combinations, whereas the DFE that we estimated during the mutation screen experiments is a subspace of that landscape. This subspace is the area that a population can reach via a single mutational step. Hence, the DFE represents the explorable portion of the fitness landscape for a population at a particular point in time, whereas the fitness landscape is the total space explorable by any population at any time, given a specific genetic architecture and phenotypic optimum. Neither depends on the evolutionary history of the population nor on the SSWM assumption of this study.

DFE analysis

The final part of our analysis involved measuring the shape of the distribution of beneficial fitness effects (DFE among beneficial mutations), which should be Gumbel-distributed under Gillespie-Orr predictions [13, 15]. We used a method developed by Beisel et al. [88] to fit a generalized Pareto distribution (GPD) to the DFE among beneficial mutations obtained from the mutant screen. A flowchart describing this approach is provided (S4 Fig). The GPD characterizes a distribution of extreme events that exceed a threshold. The threshold is the high-fitness wild-type and the extreme events are beneficial mutations. The shape parameter of the GPD, κ, indicates which domain of attraction the underlying distribution belongs to. Beisel et al.’s [88] method uses a likelihood ratio test to evaluate if κ = 0, corresponding to the Gumbel case. If κ > 0, the DFE belongs to the Fréchet domain, and if κ < 0, it belongs to the Weibull domain [88, 89].

We used two different sampling approaches to fit our data to GPDs (S4 Fig). In the first method, we pooled mutant screen alleles across all replicates to create a pooled DFE of beneficial mutations for additive and network models. We sampled 1,000 mutations from the pooled DFE and fit the GPD to the resulting sampling distribution. We repeated this process 10,000 times, yielding a distribution of estimates of κ. The pooled approach shows the average shape of the DFE among beneficial mutations across many adaptive walks. In the second method, we sampled up to 100 mutations from each simulation’s mutant screen DFE of beneficial mutations and fit the GPD to that sample. This was done per adaptive step. Simulation-step pairs with fewer than 20 beneficial alleles were discarded as they could not be reliably used to estimate the GPD shape [88]. We obtained a distribution of κ estimates, one for each simulation-step pair. This approach shows the variability in the shape of the DFE during different adaptive walks. In both methods, p-values from the likelihood ratio tests were combined across replicates using Fisher’s method [90]. Both approaches fit the GPD using the R package GenSA 1.1.8 [91] in R 4.3.1 [84], with code modified from Lebeuf-Taylor et al. [92].
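To make the fitting step concrete, the Python sketch below fits a GPD with threshold zero, tests κ = 0 with a likelihood ratio test, and combines p-values with Fisher's method. It is a simplified stand-in rather than the published procedure: it uses a grid-search profile likelihood instead of GenSA, and the chi-square reference distribution for the test is an assumption made for brevity that may differ from the one used by Beisel et al. [88]. The grid search is also poorly suited to strongly bounded cases (κ < −1), where the likelihood is unbounded. All names and sample values are illustrative.

```python
import math
import random


def gpd_profile_fit(x, n_grid=400):
    """Grid-search profile MLE for a GPD with threshold 0.

    With theta = kappa / tau, the MLE of kappa for fixed theta is
    mean(log(1 + theta * x)), and the profile log-likelihood is
    -n * (log(tau) + kappa + 1). Returns (loglik, kappa, tau).
    """
    n, x_max, x_mean = len(x), max(x), sum(x) / len(x)
    # kappa = 0 (exponential) is the theta -> 0 limit of the profile.
    best = (-n * (math.log(x_mean) + 1.0), 0.0, x_mean)
    # Negative theta must stay above -1 / x_max (bounded, Weibull-domain fits).
    thetas = [-(k / (n_grid + 1)) / x_max for k in range(1, n_grid + 1)]
    # Positive theta on a log scale (heavy-tailed, Frechet-domain fits).
    thetas += [10 ** (-3 + 4 * k / n_grid) / x_mean for k in range(n_grid + 1)]
    for theta in thetas:
        kappa = sum(math.log1p(theta * xi) for xi in x) / n
        if kappa == 0.0:
            continue
        tau = kappa / theta
        loglik = -n * (math.log(tau) + kappa + 1.0)
        if loglik > best[0]:
            best = (loglik, kappa, tau)
    return best


def lrt_kappa_zero(x):
    """Likelihood ratio test of kappa = 0 (assumed chi-square(1) reference)."""
    loglik_gpd, kappa, _ = gpd_profile_fit(x)
    loglik_exp = -len(x) * (math.log(sum(x) / len(x)) + 1.0)
    lr = max(0.0, 2.0 * (loglik_gpd - loglik_exp))
    p_value = math.erfc(math.sqrt(lr / 2.0))   # chi-square(1) survival function
    return kappa, p_value


def fisher_combined_p(p_values):
    """Fisher's method: -2 * sum(ln p) is chi-square with 2k degrees of freedom."""
    half = -sum(math.log(max(p, 1e-300)) for p in p_values)
    if half == 0.0:
        return 1.0
    # Chi-square survival with even df equals a Poisson(half) CDF at k - 1.
    terms = (math.exp(j * math.log(half) - math.lgamma(j + 1) - half)
             for j in range(len(p_values)))
    return min(1.0, sum(terms))


rng = random.Random(1)
# Simulated beneficial effects: exponential (kappa = 0) and bounded (kappa = -0.4).
exp_sample = [rng.expovariate(1 / 0.05) for _ in range(500)]
bounded_sample = [0.05 * (1.0 - (1.0 - rng.random()) ** 0.4) / 0.4
                  for _ in range(1000)]
kappa_exp, p_exp = lrt_kappa_zero(exp_sample)
kappa_bounded, p_bounded = lrt_kappa_zero(bounded_sample)
combined_p = fisher_combined_p([p_exp, p_bounded])
```

A negative κ estimate with a small p-value, as expected for the bounded sample, corresponds to the Weibull domain, whereas the exponential sample should give an estimate near zero.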

Results

Assumption validation

We first checked if we were in the strong selection weak mutation (SSWM) regime assumed by the Gillespie-Orr model. Under SSWM, each beneficial mutation should fix before another arises. Hence, heterozygosity at QTLs/mQTLs should be close to zero most of the time. We found observed heterozygosity (H) at the QTLs/mQTLs was low on average, peaking just after the optimum shift at 11.39% ± 0.423% (95% confidence interval, CI) and 9.637% ± 0.400% (95% CI) in additive and NAR models, respectively (S8 Fig). The grand mean H across all time points was 8.368% ± 0.028% (95% CI) and 7.841% ± 0.027% (95% CI) for additive and NAR models. Although H was not zero, the majority of variation arising was not beneficial (Fig 5B), meaning that the chance that two beneficial mutations could co-segregate was lower than H suggests.

To further test the SSWM assumption, we also measured the ratio between phenotypes generated by only fixations (i.e. the phenotype in absence of all segregating variants) and the mean population phenotype (Eq 8). This ratio describes how much trait variation was contributed by segregating variants compared to fixations. When r = 1, segregating variants contribute no trait variation. This is the SSWM expectation. When r > 1, segregating variants decrease the trait value on average compared to the phenotype due to fixations, and vice versa when r < 1. We found that most populations had no variation contributed by segregating variants (S9 Fig). The average r was 1.019 ± 0.007 (95% CI) and 1.024 ± 0.011 (95% CI) for additive and NAR populations respectively. From these figures, it seems that the populations are in the SSWM domain, enabling us to explore the extreme value theory expectations set out by Gillespie and Orr [12–14].

NAR and additive populations differ in their ability to adapt

We first examined the phenotypic response to selection during adaptive walks in populations with either an additive or a network (NAR motif) GP map. We defined an adapted population as one with a mean phenotype within 10% of the optimum (). We chose 10% as the threshold as it only decreased fitness by 0.04% relative to a 5% threshold, but increased the sample size of adapted populations by 42% (2636 adapted populations vs. 1529). By the end of the simulation, 50.5% of the additive simulations and 41% of the network simulations had adapted. For the remainder of the results, we consider only the adapted populations.

Among the adapted populations, there were differences in the rate of trait evolution between the models (Fig 3). Network model populations took longer to reach the optimum on average (Fig 3A), although the changes in phenotype across adaptive steps were similar between the models (Fig 3B). Note that this difference cannot be solely attributed to the NAR ODE, as the multiplicative scaling of the allelic effects also contributes to differences between the models. See the Discussion for more on this distinction. There was no difference between the models in the number of adaptive steps taken. Of the populations that adapted, 66.3% did so in a single step, 28% in two steps, and the remaining 5.7% adapted in three or more steps. On average, the number of steps taken was 1.4 ± 0.026 (95% CI). This agrees with results from Orr [93], who found a mean walk of ≈ 1.72 steps, assuming that the genotype was comprised of many loci and that the starting genotype of the walk was randomly sampled.

Fig 3. Phenotypic evolution over 10,000 generations of adaptation across 2,880 replicates of additive and network models.

(A) The distribution of mean phenotypes among replicate populations during the adaptive period. The -1000 timepoint represents 1,000 generations before the optimum shifted. (B) The mean phenotype at each adaptive step (each fixation along the adaptive walk). The last step is pooled across all steps greater than or equal to step three in the walk. The dotted line in A and B represents the post-shift optimum. The grey area shows the overlap between additive and network distributions.

https://doi.org/10.1371/journal.pgen.1011289.g003

Among populations that had not yet adapted after one step (the left tail in Fig 3), the median step in a network model occurred 900 (95% CI [150, 1600]) generations later than in the additive models (S5 Fig). To investigate the underlying causes of the differences in adaptation success between models, we examined the distribution of beneficial fitness effects among fixations during the adaptive walk.

The distribution of beneficial fitness effects among fixations is similar between NAR and additive populations

Fixations had similar effect sizes in both models and became smaller over the course of the adaptive walk (Fig 4). We found that the mode of the distribution of beneficial fixations was offset from 0 (S6 Fig), supporting a gamma distribution of fixations rather than an exponential distribution. We fit gamma distributions to the DFE among fixations in both the additive and network models using fitdistrplus 1.1.8 in R 4.3.1 [84, 94]. The deleterious fixations shown in Fig 4 were excluded from the dataset when fitting the gamma distribution. These deleterious fixations were neutral or adaptive prior to the shift in the optimum and typically reached high frequencies before the adaptive walk commenced (S7 Fig). Therefore, these fixations can be attributed to the action of genetic drift.

Fig 4. Fitness effects among fixations during adaptive walks to a phenotypic optimum across 2,880 replicates of additive and network models.

(A) The overall distribution of all fixations across the entire walk. (B) The distribution of fixations at each adaptive step. The grey area shows overlap between the additive and network distributions of effects.

https://doi.org/10.1371/journal.pgen.1011289.g004

We compared the shape and rate parameters of the gamma fits between the models. The network model populations exhibited slightly fewer small effect fixations and had a shorter tail compared to the additive models (NAR: Shape = 2.032 ± 0.147 95% CI, Rate = 44.392 ± 3.651 95% CI; Additive: Shape = 1.722 ± 0.108 95% CI, Rate = 37.588 ± 2.730 95% CI). However, the DFE among beneficial fixations did not differ strongly between the models. To investigate this further, we needed to determine the availability of beneficial mutations to the populations. We did this by conducting an in silico mutant screen experiment, examining the DFE among new mutations at each adaptive step.

The distribution of fitness effects among possible mutants is complex and deleterious in NAR populations

The in silico mutation screen experiment revealed that the DFE among new mutations differed dramatically between models (Fig 5A). Both were strongly negatively skewed; however, mutations in NAR populations were overall more deleterious than those in additive models, with a longer and more pronounced tail of strongly deleterious mutations. In addition, the DFE in NAR populations was bimodal (Fig 5A). The first mode was at s ≈ 0, similar to the additive DFE mode (NAR: s = −0.0003; Additive: s = −0.0004). The second mode was at s = −0.113. Among beneficial mutations, the distributions were more similar.

Fig 5. Fitness effects among possible mutations and the consequences for adaptation in 2,880 replicates of additive and network models.

(A) The distribution of fitness effects among 1,000 randomly sampled mutations in additive and network models. The grey area represents overlap between additive and network models. (B) The proportion of mutations that are beneficial over the adaptive walk. (C) The change in beneficial mutation probability corresponds to the expected waiting time for a beneficial mutation.

https://doi.org/10.1371/journal.pgen.1011289.g005

Data pooling affects generalized Pareto distribution fitting results

We fit a generalized Pareto distribution (GPD) to the DFE among beneficial mutations generated by the mutant screen experiment using methods developed by Beisel et al. [88]. This was done separately for additive and network models. The shape parameter of the GPD (κ) describes what domain the DFE belongs to: κ = 0 specifies a Gumbel domain, κ < 0 specifies a Weibull domain, and κ > 0 specifies a Fréchet domain. We fit the GPD in two ways: first, by pooling all replicates and sampling from this pooled distribution to estimate the GPD parameters; and second, by sampling each replicate at each of its adaptive steps separately and fitting a GPD to each sample (for more information, see the Methods section). Under the pooled approach, we found evidence for a minor deviation from an exponential distribution towards the Weibull domain in both the additive and network models (Network: ; Additive: ). However, this deviation is small enough that the adaptive walk should behave as if its DFE belonged to the Gumbel domain [17].

Under the per-simulation method, we saw a different result. The DFEs among beneficial mutations in both NAR and additive populations were strongly Weibull distributed. In the NAR model, κ tended to decrease over the adaptive walk. However, the additive model κ estimate was relatively stable at around κ ≈ −2, slightly increasing over the adaptive walk (S11 Fig and S2 Table). In addition, these non-pooled fits were considerably more variable than under the pooled approach (S10 Fig), particularly at early adaptive steps. Under the per-simulation/adaptive-step approach, across all adaptive steps, 99.9% of additive replicates and at least 99.6% of NAR replicates had DFEs of beneficial mutations that fit the Weibull domain. Under the pooling method, all replicates in both models were approximately Gumbel-distributed (i.e. κ ≈ 0).

Beneficial mutations are less common in NAR populations than additive populations

Despite sharing similar DFEs, beneficial mutations were on average less common in network models than in additive models (Linear model, Eq 11; F7,5171 = 1239, R2 = 0.626, Fig 5B). The difference between models diminished with adaptive steps (as populations neared the optimum). Immediately after the optimum shift, mutations in NAR populations were 9.504 ± 0.639% (95% CI) less likely to be beneficial. At the first adaptive step, this difference decreased to 4.648 ± 0.639% (95% CI). At the second adaptive step, the NAR models were 3.630 ± 1.107% (95% CI) less likely to produce a beneficial mutation. At further adaptive steps, there was no difference between models. We measured the effect of the lower beneficial mutation rate on adaptation by estimating the waiting time for a new beneficial mutation to arise. We found that the mean expected waiting time for a beneficial mutation in NAR populations was 98.555 ± 5.816 (95% CI) generations longer than in additive populations (Additive mean waiting time: 136.382 ± 2.131 (95% CI) generations; NAR mean waiting time: 234.937 ± 5.414 generations; Fig 5C). Given that the DFE differed between models, we wanted to explore the causes underpinning this. To do so, we investigated the fitness landscape of αZ and βZ.

The NAR fitness landscape is ridge-like across molecular component space

We found a pronounced fitness ridge on the NAR fitness landscape, surrounded by valleys of low fitness (Fig 6A). This ridge lay diagonally across αZ and βZ space, suggesting that fitness depended on the ratio of the molecular components. This relates to the underlying ODE: βZ/αZ is the level of steady state Z expression under simple gene regulation (i.e. without any negative feedback or induction) [52]. Plotting this ratio over αZ confirmed this: βZ/αZ was the arbiter of fitness as long as αZ ⪆ 0.5 (Fig 6B). We investigated how the βZ/αZ ratio contributed to the shape of the GP map, finding a nonlinear relationship between βZ/αZ and phenotype (Fig 6C). Around the optimum, increasing βZ/αZ quickly increases the trait value; however, with larger βZ/αZ increases, the trait value changes less (Fig 6C). We approximated the optimum βZ/αZ ratio (), by substituting our fixed molecular components (KXZ = 1, KZ = 1, h = 8) into Eq 1. We considered the case where X = 1, simplifying to

Fig 6. The fitness and phenotype landscapes of the network model.

(A) The fitness landscape with regards to the molecular components of the network, αZ and βZ. The landscape consists of a high-fitness ridge (yellow) surrounded by low-fitness valleys (purple). (B) The ratio of βZ/αZ, the steady state of the network, is what matters for adaptation. As αZ increases relative to βZ/αZ, fitness becomes constant over αZ. (C) The phenotype is a nonlinear function of the network steady state, βZ/αZ, although largely linear around the phenotypic optimum. An optimum ratio at maximizes fitness. The dashed black line shows the phenotypic optimum. (D) The contributions of mutations in each molecular component to the adaptive walk. This is taken for populations with at least 2 steps in the walk. ϕ shows the difference in sums of absolute allelic effects among fixations in αZ and βZ across an adaptive walk. Values greater than 0 represent larger changes in αZ than βZ, and vice versa for values smaller than 0. At 0, both αZ and βZ changed by the same amount during the adaptive walk.

https://doi.org/10.1371/journal.pgen.1011289.g006

The equilibrium fulfills the equation

If we assume a model of simple gene regulation, where θ is the steady state of gene expression under simple gene regulation [52]. For small values of Z (where Z < 1), this is well approximated by

Using this, we approximate the solution of the ODE by the equilibrium whenever X = 1 and by 0 when X = 0. Hence, the solution can be given by When the optimum is at P = M = 2, the optimum steady state is

This assumes that the increase in Z after X is activated is rapid, and likewise for the decrease in Z when X is deactivated (i.e. in Fig 2B, the green curve should be close to rectangular around X activation). This behavior is driven by h: for our chosen value, the above approximation predicts the observed in Fig 6B.

To explore how the nonlinearity of the GP map might have influenced how populations navigated molecular component space, we examined the difference in summed absolute additive effect sizes on molecular components among populations that adapted by two steps or more, denoted by ϕ (Fig 6D). Mathematically, $\phi = \sum_i 2|a_{\alpha_Z,i}| - \sum_j 2|a_{\beta_Z,j}|$, where a is the allelic effect on αZ or βZ without the multiplicative transformation (i.e. the value composed by Eq 6 instead of Eq 3), and the sums run over the fixations in each molecular component during the walk. Allelic effects were multiplied by two since we simulated a diploid population. When ϕ > 0, fixations have larger effects on the αZ axis than the βZ axis, and vice versa for when ϕ < 0. When ϕ = 0, there are equal contributions from mutations affecting αZ and βZ. On average, ϕ = 0.164 ± 0.150, suggesting that αZ mutations contributed most to adaptation, albeit by a small amount. Of the populations considered for this analysis, 29.5% adapted with only αZ mutations, 25% with only βZ mutations, and the remaining 45.5% used a combination of both to reach the optimum.
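As a small worked example, ϕ can be computed from the untransformed allelic effects of the fixations along a walk. The sketch below assumes ϕ is the difference between the summed absolute diploid effects on αZ and on βZ, following the description in the Fig 6D legend; the effect values are hypothetical.

```python
def phi(alpha_effects, beta_effects):
    """Difference in summed absolute diploid effects between alphaZ and betaZ fixations.

    alpha_effects, beta_effects: untransformed allelic effects of fixations
    affecting alphaZ and betaZ, respectively, across one adaptive walk.
    """
    total_alpha = sum(2.0 * abs(a) for a in alpha_effects)   # doubled for diploidy
    total_beta = sum(2.0 * abs(a) for a in beta_effects)
    return total_alpha - total_beta


# Hypothetical walk with two alphaZ fixations and one betaZ fixation
walk_phi = phi(alpha_effects=[-0.3, -0.1], beta_effects=[0.25])
```

Here the walk gives a positive ϕ of about 0.3, meaning that αZ changed more than βZ in total.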

Discussion

Overview

In this study, we explored how a simple gene network might influence trait evolution during an adaptive walk scenario. Models of adaptation have traditionally relied on direct genotype-phenotype relationships [42, 95, 96], overlooking the intricate gene interaction networks that drive trait expression at different stages of development (but see [68, 70, 71]). Understanding how genetic networks influence adaptation is an under-explored area of research, particularly in the study of DFEs.

We observed that both network and additive models exhibit similarities in key aspects of their adaptive walks, specifically the number of adaptive steps and the distribution of fitness effects among both fixations and beneficial mutations (Figs 3 and 4). However, the characterization of the DFE among beneficial mutations was sensitive to the estimation method employed. Utilizing a pooling approach under the assumptions of the Central Limit Theorem leads to a homogenization of the shape parameter κ, nudging the distribution towards the Gumbel domain, and thus suggesting exponential spacing between adjacently-ranked beneficial alleles [17]. The averaging effect of pooling samples can mask underlying Fréchet or Weibull behaviors in κ, effectively reducing our ability to discriminate between different extreme value distributions.

In contrast, using a non-pooled approach preserves the contributions of each replicate’s unique combination of mutations to the shape parameter, improving the detection of non-Gumbel behavior. However, this is not always possible in experimental studies where the number of beneficial mutations obtained per round of adaptation is small. Under these conditions, pooling between replicates is required to reduce the risk of incorrectly rejecting a Gumbel distribution [17, 88]. We suggest that researchers should be cautious of the homogenization effect of the pooling approach when estimating the DFE among beneficial mutations. Given these considerations, we next discuss the variation in estimates of the shape parameter κ when using pooled and non-pooled approaches.

When we fit GPDs to each replicate with their own set of mutations, almost all the fitted GPDs belonged to the Weibull domain (S10 Fig). In addition, κ was quite variable between simulations, models, and adaptive steps. κ affects the GPD by modulating an upper bound on the effect size of beneficial mutations. With decreasing κ, this upper bound approaches zero. κ was relatively stable in both models, but tended to decrease over longer walks under the network model and increase over longer walks in the additive model (particularly when comparing κ before the optimum shift and at adaptive steps ≥ 3; S11 Fig and S2 Table). Regardless of this change, both models were largely Weibull-distributed at all adaptive steps (over 99% of simulations/adaptive steps belonged to the Weibull domain), which rejects Gillespie’s Gumbel expectation.

Weibull-distributed DFEs of beneficial effects have been observed in a number of empirical studies in viral and bacterial populations (e.g. [16, 20, 29]). Furthermore, a Weibull distribution of effects is predicted when populations are close to an optimum and/or under stabilizing selection [18]. Hence, our results match theoretical expectations. Evidence for limits on adaptation seems common and both models seem to be affected by such limitations. While both models were starved of large-effect beneficial mutations relative to the exponential expectation, this effect was more limiting for network models than additive as the walk went on. This was further highlighted by a major difference between network and additive responses to selection: the rate of trait evolution.

Trait adaptation and network evolution

In our simulations, we found that network simulations approached the optimum more slowly than additive models on average (Fig 3A). This difference in rate might not have only been due to the NAR structure, but also because of the difference in allelic effect scaling between additive and network models, as shown in Eqs 3 and 6. This scaling difference can influence rates of adaptation, mandating caution in attributing observed differences solely to the network’s complexity. Hence, we do not make conclusions on how much the observed differences are due to the NAR network’s structure compared to the difference in allelic effect scaling. Nonetheless, there was a difference in the rate of adaptation between the models and the NAR DFE does not completely reflect the expectation under a multiplicative model, so the distribution of effects is at least partly mediated by the NAR structure. This will be the subject of a future study.

We explored the distribution of new mutations and discovered a complex DFE in the network models, contrasting with the simple distributions observed in additive models (Fig 5). The additive model distribution largely met Gillespie’s expectations: the DFE consisted mostly of neutral mutations, with a small number of beneficial mutations and a long tail of deleterious mutations (Fig 5, [97] pg. 267, Fig 6.5).

In the network model, we found a bimodal distribution of fitness effects. The bimodality is more pronounced than expected compared to a multiplicative model, suggesting that the NAR contributes to this shape. This observation aligns with previous studies that have reported complex and multimodal distributions of deleterious fitness effects (e.g. [98, 99]), potentially attributed to different classes of mutations with distinct DFEs [24]. A recent empirical study in Escherichia coli also found a similar bimodal DFE following a similar evolutionary regime [100]. Mutations in gene regulatory regions can have widely varying effects on fitness (e.g. [101–103]), suggesting that network structure can strongly affect the fitness distribution of mutants on a given molecular component [104]. The ridge-like shape of the αZ/βZ fitness landscape reflects this, showing that dependencies between the molecular components drive the fitness of organisms in the NAR model and that both the phenotypic and fitness effect of a mutation depends on its genetic background (Fig 6).

The additive model has a simple, one-dimensional fitness landscape: it recapitulates the fitness function, a normal distribution (Eq 5). This simplicity is not the case in the NAR model. The fitness landscape between the molecular components of the NAR revealed non-interchangeable per-locus effects (Fig 6A and 6C). The ridge-like shape of the landscape corroborates findings by Kozuch et al. [64], who investigated a similar NAR motif in the E. coli lexA transcription factor. NAR populations also faced lower beneficial mutation rates and a slight bias favoring αZ mutations over βZ, which might have been caused by the ridge-like landscape (Figs 5 and 6D). The bias suggests αZ mutations should be less deleterious on average than βZ mutations. Supporting this, αZ mutations were slightly less likely to be strongly deleterious, instead falling between the distribution’s modes (S13 Fig). However, both mutation types remained largely deleterious, and had similar DFEs among beneficial mutations. A potential explanation for the bias towards αZ mutations involves the optimal βZ/αZ ratio of 0.8 post-shift, versus 0.4 at burn-in. To reach this optimum from burn-in, populations could increase βZ or decrease αZ. However, the required αZ decrease exceeded the βZ increase. In essence, larger αZ mutations were needed to reach the new optimum. Crucially, αZ and βZ mutations did not affect the phenotype equally.

Complexity and the cost of adaptation

In our study, the NAR model demonstrated greater complexity compared to traditional additive models. We discovered a nonlinear relationship between βZ/αZ and phenotype, embodied by a nonlinear genotype-phenotype (GP) map (Fig 6C). Such maps are known to impose adaptive constraints [105, 106], reflected in the multimodal shape of the deleterious DFE for NAR populations (Fig 5A). The NAR model’s increased complexity through multiple molecular components and dynamic biochemical interactions renders it more susceptible to deleterious mutations. This vulnerability could amplify Hill-Robertson interference, with beneficial alleles overshadowed by deleterious backgrounds [107]. For instance, consider the persistence of epistasis over time. Gene interactions can lead to situations where genetic combinations have greater effects on fitness than the sum of their parts (positive epistasis), or the combinations can reduce fitness compared to the sum expectation (negative epistasis). Drift and mutation generate positive and negative epistasis in similar proportions [108]. However, when recombination is rare, selection will quickly fix synergistic combinations, leading to an excess of deleterious gene combinations which persist through drift and limit the efficiency of selection [107]. When a gene network introduces further epistasis on traits, this might compound the imbalance between negative and positive epistasis. Thus, network complexity may trade off with adaptive potential.

Trade-offs between complexity and adaptability echo the cost of complexity described by Orr [109]. The cost of complexity postulates that an increasing number of traits under selection leads to an increasingly deleterious mutation space, impeding adaptation. This cost could also apply to network models of traits, where molecular components under selection limit adaptation. While the NAR motif is simple, we were surprised by the complexity already apparent in the DFE. If even simple networks can generate such complexity, the cost of complexity might be more important for driving the evolution of traits than previously thought. Some empirical examples of these costs in network-mediated traits exist. Costanzo et al. [110] found that increasing the degree of genetic interaction in Saccharomyces cerevisiae genes was negatively correlated with single mutant fitness, suggesting an adaptive cost to maintaining a highly connected gene network. This cost might lead to the long-term stability of gene networks or complex traits: comparisons between S. cerevisiae and Schizosaccharomyces pombe genetic interaction networks revealed remarkable similarity in network structure [111]. Another study by Barua and Mikheyev [112] found features of housekeeping gene networks involved in reptile venom production were conserved between amniote clades, with those genes contributing to saliva production in mammals. The cost of complexity suggests rather strong limitations on the evolution of complex genetic networks: in Orr’s [109] model, the rate of adaptation declines according to the inverse square root of the number of traits, $1/\sqrt{n}$ for n traits. Empirical evidence suggests that biological networks can be extremely complicated, connecting hundreds of nodes. Adaptation involving such a highly connected network would be extremely slow if Orr’s [109] theory is correct. However, aspects of network structure, including local connectivity and modularity, might alleviate such a cost of molecular complexity.

One empirical example of network structure modulating the rate of evolution is the biofilm production network in Candida species [113]. In this network, seven master regulators control the expression of about one-sixth of the C. albicans genome and are required to drive biofilm production [113]. Genes that are connected to one or more of these master regulators are much less connected than the master regulators themselves and more “free” to evolve without affecting the expression of hundreds of other genes. Mancera et al. [113] found that the master regulators evolve slowly compared to the target genes, possibly due to the structure of the network. Since the master regulators affect the expression of hundreds of genes, the space of beneficial mutations at master regulator loci might scale according to the cost of complexity. On the other hand, the authors found that target genes are considerably more divergent between Candida species and populations, perhaps due to target genes harboring fewer interactions within the network than master regulators [113].

Although the NAR motif is a simple network, it appears to impose surprisingly strong adaptive constraints on populations. This point is underlined by the density of strongly deleterious alleles in the DFE of new mutations (Fig 5). Fixing multiple NAR molecular components (e.g. KZ, KXZ, h) renders the model’s behavior more predictable. For sufficiently large αZ, individual fitness correlates solely with the ratio βZ/αZ (Fig 6B), reflecting the steady-state cellular concentration under simple regulation [52]. This outcome may be attributed to our parameter choices; specifically, the rapid onset of Z activation/suppression due to h = 8. Varying these Hill coefficients or treating them as evolvable components in future work could offer deeper insights into how more gradual responses to X activation/deactivation and Z production might modulate total Z expression [52]. Splitting the Hill coefficient into an activation coefficient (hX) and a repression coefficient (hZ) would give further insight into the relative importance of X and Z responses to phenotype variation. Moreover, allowing these parameters to mutate could elucidate the full NAR fitness landscape and the cost of complexity apparent in a full NAR system.
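
The dependence of the steady state on βZ/αZ can be illustrated with a minimal sketch. It assumes the textbook Hill-repression form of negative autoregulation with X fully active, dZ/dt = βZ·KZ^h/(KZ^h + Z^h) − αZ·Z. This form, the function name nar_steady_state, and the default KZ = 1 are illustrative assumptions, not necessarily the exact equations or parameter values solved in the simulations.

```python
def nar_steady_state(beta_z, alpha_z, k_z=1.0, h=8, iterations=200):
    """Steady-state Z for the assumed ODE dZ/dt = beta_z * rep(Z) - alpha_z * Z,
    where rep(Z) = k_z**h / (k_z**h + Z**h) is Hill repression of Z by itself."""

    def net_rate(z):
        return beta_z * k_z**h / (k_z**h + z**h) - alpha_z * z

    # net_rate is positive at Z = 0, non-positive at Z = beta_z / alpha_z,
    # and decreases monotonically in between, so bisection finds the root.
    low, high = 0.0, beta_z / alpha_z
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if net_rate(mid) > 0.0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


# Hypothetical (beta_z, alpha_z) pairs; the first two share the same ratio.
for beta_z, alpha_z in [(0.1, 1.0), (1.0, 10.0), (5.0, 1.0)]:
    print(beta_z / alpha_z, nar_steady_state(beta_z, alpha_z))
```

Under this assumed form, the steady state solves (βZ/αZ)·KZ^h/(KZ^h + Z^h) = Z, so it depends on βZ and αZ only through their ratio, and it approaches βZ/αZ itself when that ratio is well below KZ. The sketch says nothing about the transient dynamics, which do depend on αZ separately.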

Should KZ be allowed to evolve, the fitness landscape would likely exhibit increased ruggedness. In this context, the optimal βZ/αZ ratio would be contingent upon the KZ value. A similar effect is anticipated for KXZ, the regulator of Z’s production rate in response to X. Consequently, both steady-state and fitness would depend on KZ and KXZ. Introducing more evolvable molecular components is expected to amplify the landscape’s ruggedness due to interdependencies that influence fitness, exacerbating the previously noted cost of complexity. To substantiate these projections, future research should explore more intricate networks, as exemplified by empirical studies such as Bertheloot et al. [114]. In addition, exploring the behavior of other network motifs will elucidate the generality of our findings.

Other motifs that are common in nature could also be reasonable choices for a toy model of network-mediated traits. These include feedback loops, feed-forward loops, single input modules, and cascades. Each of these motifs has different behaviors that could lead to different evolutionary dynamics. For instance, feedback loops can create oscillatory expression patterns [115]. Autoregulation is a simple example of a feedback loop; however, multi-locus feedback loops also exist. The p53–Mdm2 feedback loop is an example of a two-gene feedback loop in vertebrates. p53 is an important tumor suppressor transcription factor. Its expression is kept at an equilibrium cell concentration by the presence of Mdm2 [116]. p53 is a positive regulator of Mdm2, whilst Mdm2 inhibits p53 expression. When this balance is interrupted, increased Mdm2 production results in cell proliferation, whilst increased p53 production leads to cessation of growth [116]. Another common motif, the feed-forward loop, has eight different configurations, each with different behaviors [52]. Some of these configurations are more common in transcription networks than others, suggesting a selective pressure favoring these ubiquitous forms [117]. One of these common forms, the type I coherent feed-forward loop, is implicated in compensatory evolution via the emergence of epistasis [118].

Ramifications of strong selection weak mutation (SSWM) and alternative approaches

This study focused on adaptation by employing a Gillespie-Orr model, which operates under the assumption of SSWM, characterized by Ns ≫ 1 and Nμ ≪ 1. We ensured adherence to the SSWM assumption by choosing an appropriate population size and mutation rate (see Table 1). In our simulations, Nμ = 0.092. However, Ns varies due to selection coefficients being dependent on individuals’ genetic backgrounds and phenotypes. Nonetheless, Ns > 1 holds at s > 0.0002 for this model, so most beneficial mutations should meet the Ns ≫ 1 criterion. When they do not, these mutations are driven by drift [119]. This also applies to slightly deleterious mutations. N|s| = 1 represents a “drift barrier”, where the fitness effects of alleles become effectively too small to drive allele frequency changes, leading to drift-dominated dynamics.

The drift barrier explains our finding that the DFE among fixations followed a gamma distribution as opposed to an exponential in both network and additive models (S6 Fig). As populations approach the optimum, selection’s efficiency is weakened by the drift barrier [119]. At the barrier, the probability of fixation (π) is no longer predicted by π ≈ 2s. Instead, π = 1/(2N) becomes more applicable [120]. The switch from selection-dominated to drift-dominated dynamics occurs when N|s| = 1 [121]. In our simulations, this means that when |s| < 0.0002, drift is expected to dominate. Owing to the relatively large population size in our simulations (N = 5000), drift is unlikely to drive allele fixations early in the adaptive walk (i.e. when most beneficial alleles have s > 0.0002); however, later steps might be more affected as the possible fitness improvement of any given allele faces diminishing returns.
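
The two regimes can be checked numerically. The sketch below uses Kimura’s diffusion approximation for the fixation probability of a new additive mutation in a diploid Wright-Fisher population, π = (1 − e^(−2s))/(1 − e^(−4Ns)). This formula is an assumption brought in for illustration (with N standing in for the effective population size) and is not a description of how fixations were measured in the simulations; the function name is hypothetical.

```python
import math


def fixation_probability(s, n):
    """Kimura's diffusion approximation for a new mutation at frequency 1/(2n)
    with additive selection coefficient s in a diploid population of size n."""
    if s == 0.0:
        return 1.0 / (2 * n)  # neutral limit
    exponent = -4.0 * n * s
    if exponent > 700.0:
        return 0.0  # strongly deleterious: avoid overflow, probability ~ 0
    # expm1 keeps precision when s is very small.
    return math.expm1(-2.0 * s) / math.expm1(exponent)


N = 5000
drift_barrier = 1.0 / N  # |s| at which N|s| = 1

for s in (1e-7, drift_barrier, 1e-2):
    pi = fixation_probability(s, N)
    print(f"s={s:g}  pi={pi:.3e}  2s={2 * s:.3e}  1/(2N)={1 / (2 * N):.3e}")
```

With N = 5000, the approximation approaches 2s for s well above 0.0002 and approaches 1/(2N) for s well below it, which is the transition described above.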

Another assumption of the SSWM model is that only one mutation segregates at any given time point, i.e. before a new beneficial mutation arises, the previous beneficial mutation must have fixed. This was met most of the time (see S8 and S9 Figs); however, some simulations revealed instances where segregating variation contributed to adaptation, including the presence of alleles under balancing selection (S12 Fig). Future research should therefore explore the model using a polygenic approach, where many loci contribute small amounts to the molecular components underlying complex traits. Adaptation would then occur by small frequency shifts at these loci [122]. The genetic architecture of polygenic traits, that is, the number of loci, frequencies, and the distributions of their additive and non-additive effects on phenotypes, can greatly affect the response to selection [85]. There are also expectations for how genetic networks might determine the genetic architectures of polygenic traits: large-effect loci are likely linked to cis-regulatory features and coding region mutations, whilst small-effect loci are more likely to represent trans-regulatory elements [2, 123]. However, the non-additive effects of genetic networks on phenotypes remain understudied. These interactions have ramifications for theories of the evolution of recombination [124], and the persistence of linkage disequilibrium in natural populations [108].

A final point on SSWM is that it assumes that the fitnesses of a wild-type genotype and its mutants are uncorrelated [125]. This may not be the case: Cowperthwaite et al. [125] found that the fitnesses of simulated RNA mutations were correlated with those mutations’ wild-type genotypes, violating the Gillespie-Orr model’s assumptions. When mutations are correlated with the wildtype, the adaptive walk might be elongated, as the fitness benefit of a mutation is reduced relative to an uncorrelated beneficial allele [126]. However, work by Orr [127] showed that most results from the Gillespie-Orr model are robust to such correlations, especially those pertaining to the first step of the walk.

Opportunities and challenges in modeling complex genetic networks

Our study serves as an introductory venture into the evolutionary modeling of genetic networks, elucidating how they guide evolutionary trajectories. The strong selection and weak mutation assumptions improve computational efficiency and interpretability but sideline rich polygenic dynamics [36, 85]. Our simplified NAR motif offers initial insights but lacks empirical complexity, making future work with more complex architectures important. Moving forward, our focus will shift towards a spectrum of genetic architectures to better capture non-additive effects. This will include investigating other network motifs, including several feed-forward loops. By doing so, we will enrich our understanding of how topology influences evolutionary pathways [128]. This transition will require further statistical and computational innovations to maintain feasibility. Collaborations with molecular biologists will enrich our models with further mechanistic details. Rigorous validation against empirical data and existing theories will be a cornerstone in constructing more realistic models of adaptation (e.g. [114]).

To drive these future works, we have constructed a general form of our network model in S1 Appendix. In the full model, we envision a phenotype as a combination of “molecular traits”: intermediate traits like methylation or gene expression which represent cellular or developmental processes [129]. Our NAR model represents the simplest case, where there is one molecular trait, Z expression, which we treat as the phenotype. In more complex implementations, phenotypes can emerge from a composite of molecular traits. Each molecular trait has its own differential equation and its own set of molecular components driving gene expression, creating a hierarchical model of gene expression driving quantitative trait variation.

As we delve into more complex systems, computational hurdles become prominent. The complexity of the ODE system significantly impacts in silico performance [130, 131]. We are considering optimization strategies such as smarter caching of common genotypes, data-oriented design, and adaptive step-size algorithms. Pre-computed ODE solutions stored on disk are also under exploration as a means to tackle the computational load. Another option is to approximate ODE solutions via smooth functions or a deep neural network trained on previous simulated data. In summary, our study, while preliminary, opens up avenues for understanding the complex interplay between genetic networks and evolutionary dynamics, leaving us optimistic about the future of this research area.

Conclusion

Our network approach contributes methodologically to the field and holds broad applicability to many open questions in evolutionary biology, from quantitative genetics to the mechanisms underpinning adaptability. It offers a robust framework to investigate issues such as the maintenance of genetic variation, the evolution of recombination, and the dynamics of adaptation from both standing variation and de novo mutations. We have introduced how networks can generate epistasis, with implications for the evolution of recombination, maintenance of linkage, and the trade-offs between network complexity and adaptability. Our study here shows that networks can create complex distributions of fitness effects that differ from additive expectations. While promising, it is essential to acknowledge the model’s limitations, particularly concerning its computational demands and simplifications, which future work should aim to address. Empirical validation through evolutionary experiments will be crucial in assessing our model’s real-world applicability. Expanding into adaptation by standing variation will further illuminate the role of genetic interactions during polygenic adaptation. As detailed systems models of complex traits become increasingly available, our framework can be employed to simulate the evolution of traits based on empirically-described networks. The utility of our model also extends beyond evolutionary biology, offering valuable insights for interdisciplinary collaborations that aim to integrate molecular biology, computational science, and statistical genetics. Future work should focus on leveraging this interdisciplinary potential and on exploring more complex genetic networks to better understand how gene interactions constrain evolution.

Supporting information

S1 Appendix. A hierarchical molecular network model for quantitative traits.

An extension of the negative autoregulation model to a general form to model quantitative traits under the control of an arbitrary genetic network.

https://doi.org/10.1371/journal.pgen.1011289.s001

(PDF)

S1 Table. NAR model parameters.

Table of symbols, names, descriptions, and values for relevant parameters used in the NAR model.

https://doi.org/10.1371/journal.pgen.1011289.s002

(PDF)

S2 Table. Generalized Pareto distribution parameter estimates.

Mean generalized Pareto distribution (GPD) parameters fit to mutant screen distributions of fitness effects over adaptive walks in additive and network populations. Brackets indicate 95% confidence intervals. Parameters were estimated by fitting a GPD to a random sample of mutations from the mutant screen experiments conducted on each replicate at each adaptive step. κ is the shape parameter of the GPD. κ̄ is the mean κ value across n replicate simulations. The log-likelihood ratio tests the null hypothesis that a sample of beneficial mutations belongs to an exponential distribution (which meets κ = 0). P-values across the n replicates were combined using Fisher’s method.

https://doi.org/10.1371/journal.pgen.1011289.s003

(PDF)

S1 Fig. Flowchart of SLiM simulation procedure.

SLiM simulates a Wright-Fisher process to model evolutionary change through a genotype-phenotype-fitness (GPW) map. Our simulation began with 50,000 generations of burn-in to ensure populations were adapted to the environment. We then shifted the phenotypic optimum and adaptation was tracked for a further 10,000 generations. This process was repeated 2,880 times per model for replication purposes (i.e. there were 2,880 replicates of network and additive adaptive walks for a total 5,760 simulations). The GPW map consisted of several stages: first, the genotype was translated to phenotype via summing genetic effects at QTLs (for additive models) or by solving a system of ordinary differential equations (network models, P/t figure shows solutions to the differential equation for the blue and purple genotypes). Phenotype was then translated to fitness by a stabilizing selection fitness function (shown by the w/P figure—the purple phenotype has lower fitness than the blue phenotype). The fitness value then influenced the chance that an individual was sampled as a parent for the next generation. After parents were chosen, random mutations (μ) could occur to introduce further genotypic and phenotypic variation in the next generation (shown in orange).

https://doi.org/10.1371/journal.pgen.1011289.s004

(TIFF)

S2 Fig. Flowchart describing how selection coefficients and the distribution of fitness effects (DFE) among new mutations were estimated.

(A) Consider two alleles contributing to the phenotype (blue and purple boxes) at either the same or different loci. Individuals with both alleles have some phenotype (P1), which gives rise to a fitness (w1). By removing the purple allele and recalculating the phenotype, we obtain a different phenotype (P2) and fitness (w2). The difference between w1 and w2 represents the selection coefficient (s) of the purple allele. This difference can be measured either through addition (adding the purple allele to the genotype) or by knockout (removing the purple allele from the genotype). (B) To estimate the distribution of fitness effects (DFE) among new mutations, we conducted a mutation screen experiment. We generated mutants by taking 1,000 samples (ϵ) from a standard normal distribution and adding those to the molecular component values from the SLiM simulations (αZ and βZ). Mutant phenotypes (Pm) were calculated by inputting the mutant component values into an ordinary differential equation and solving it. The 1,000 samples were independently added to each molecular component to measure the DFE of both components. We then calculated the selection coefficients of ϵ by the addition method in (A). Pm represents P1 in (A), whilst the phenotypes without the added ϵ represent P2. We then plotted the joint distribution of s across all sampled ϵ.

https://doi.org/10.1371/journal.pgen.1011289.s005

(TIFF)
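
The mutation screen in S2 Fig can be sketched in a few lines. This is a toy illustration only: the Gaussian form of the stabilizing selection function, the additive stand-in for the phenotype calculation (the network model instead solves an ordinary differential equation), and all names and numeric values below are assumptions made for the example.

```python
import math
import random


def gaussian_fitness(phenotype, optimum, width):
    """Assumed Gaussian stabilizing selection around the optimum."""
    return math.exp(-((phenotype - optimum) ** 2) / (2.0 * width ** 2))


def mutant_screen(components, phenotype_fn, optimum, width,
                  n_mutants=1000, seed=1):
    """Add standard-normal effects to each component independently and
    return selection coefficients s = w_mutant - w_background per component."""
    rng = random.Random(seed)
    w_background = gaussian_fitness(phenotype_fn(components), optimum, width)
    s_by_component = {}
    for name in components:
        s_values = []
        for _ in range(n_mutants):
            mutant = dict(components)           # copy the background genotype
            mutant[name] += rng.gauss(0.0, 1.0)  # the sampled effect (epsilon)
            w_mutant = gaussian_fitness(phenotype_fn(mutant), optimum, width)
            s_values.append(w_mutant - w_background)
        s_by_component[name] = s_values
    return s_by_component


def toy_phenotype(components):
    # Hypothetical additive stand-in for solving the network ODE.
    return sum(components.values())


background = {"alpha_Z": 1.0, "beta_Z": 1.0}  # hypothetical component values
screen = mutant_screen(background, toy_phenotype, optimum=3.0, width=1.0)
for name, s_values in screen.items():
    p_beneficial = sum(s > 0 for s in s_values) / len(s_values)
    print(name, round(p_beneficial, 3))
```

Pooling the values in screen across components gives the joint distribution of s, and the fraction with s > 0 is the quantity used for waiting times in S3 Fig.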

S3 Fig. Flowchart describing how waiting times to new beneficial mutations were calculated.

(A) For each of the 2,880 replicates, we took the distribution of s calculated during the mutation screen experiments and extracted the proportion of new mutations with s > 0, ps>0. This is the area under the curve shown in dark blue/red (to the right of the dashed line). This was repeated for each adaptive step (Nsteps) within a replicate (i.e. replicates with more than one adaptive step had ps>0 calculated for each adaptive step). The waiting time to a new beneficial mutation for a given replicate at a given step was calculated as twait = 1/(2NμLQps>0), where N is the population size (multiplied by 2 because we simulate a diploid organism), μ is the per-locus, per-generation mutation rate, and LQ is the number of causal loci. (B) To calculate the difference between models in the waiting time to a new beneficial mutation, we ran a bootstrap analysis. We sampled 100,000 random pairs of additive and network models, calculating twait for each. We calculated the difference between their waiting times, Δtwait. The random sampling gave a distribution of differences between network and additive waiting times. We used a paired t-test to determine whether the mean difference in waiting times between models differed from zero (and hence whether there was a difference between models).

https://doi.org/10.1371/journal.pgen.1011289.s006

(TIFF)
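
The waiting-time calculation in S3 Fig can be written out as a short sketch. The formula follows the caption; the mutation rate, the proportions of beneficial mutations, the sign convention (network minus additive), and the function names are hypothetical values and choices used only for illustration, and the t-test step is not reproduced.

```python
import random


def waiting_time(n, mu, l_q, p_beneficial):
    """t_wait = 1 / (2 * N * mu * L_Q * p_{s>0}), in generations."""
    return 1.0 / (2 * n * mu * l_q * p_beneficial)


def sample_pair_differences(t_additive, t_network, n_pairs, seed=1):
    """Draw random additive/network pairs and return network minus additive."""
    rng = random.Random(seed)
    return [rng.choice(t_network) - rng.choice(t_additive)
            for _ in range(n_pairs)]


# Hypothetical inputs: two causal loci, an illustrative mutation rate, and
# made-up proportions of beneficial mutations for a few replicates per model.
N, MU, L_Q = 5000, 1e-5, 2
t_add = [waiting_time(N, MU, L_Q, p) for p in (0.25, 0.30, 0.20)]
t_net = [waiting_time(N, MU, L_Q, p) for p in (0.10, 0.15, 0.12)]

differences = sample_pair_differences(t_add, t_net, n_pairs=1000)
print(sum(differences) / len(differences))  # mean difference in generations
```

Because twait is inversely proportional to ps>0, a model in which a smaller fraction of new mutations is beneficial waits proportionally longer for each adaptive step.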

S4 Fig. Flowchart describing the estimation procedure for the shape of the distribution of fitness effects (DFE) among beneficial mutations via two approaches.

(A) Using the pooled distribution of s across all simulations and adaptive steps that we calculated during the mutation screen experiments, we sampled 1,000 beneficial mutations (where s > 0) to generate a distribution of beneficial mutations. We then fit a generalized Pareto distribution (GPD) to this sample using the method outlined by Beisel et al. [88]. From the fit, we extracted the shape parameter, κ. This process was repeated 10,000 times to generate a distribution of κ. We treated the mean, κ̄, as the estimate for the shape parameter. (B) We sampled nx,y mutations from each simulation’s s distribution at each adaptive step. The s distributions were generated during the mutation screen experiments. nx,y was set according to the number of mutant screen alleles with s > 0 in replicate x and adaptive step y. The sampled distributions were fit to a GPD as per (A), generating a distribution of κ from each replicate and adaptive step. The mean, κ̄, was again treated as the estimate for the shape parameter.

https://doi.org/10.1371/journal.pgen.1011289.s007

(TIFF)
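
For reference, the role of the shape parameter can be read from the standard form of the GPD distribution function for beneficial selection coefficients. The scale parameter τ is notation introduced here for illustration and does not appear in the captions above.

```latex
F(s \mid \kappa, \tau) =
\begin{cases}
1 - \left(1 + \dfrac{\kappa s}{\tau}\right)^{-1/\kappa}, & \kappa \neq 0,\\[1ex]
1 - e^{-s/\tau}, & \kappa = 0,
\end{cases}
\qquad
\begin{cases}
s \ge 0, & \kappa \ge 0,\\
0 \le s < -\tau/\kappa, & \kappa < 0.
\end{cases}
```

The case κ = 0 is the exponential distribution used as the null hypothesis in the likelihood ratio test. For κ < 0 (the Weibull domain of attraction) the distribution is truncated at −τ/κ, so for a fixed τ a more negative κ implies a smaller maximum beneficial effect.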

S5 Fig. Distribution of generations at which mutations fixed in adapting additive (red) and network (blue) populations at each adaptive step.

There was no significant difference between populations at adaptive step 1. Generations are given relative to the optimum shift so that Generation 0 is when the optimum shifted.

https://doi.org/10.1371/journal.pgen.1011289.s008

(TIFF)

S6 Fig. Zoomed view of Fig 4B.

The mode of the distribution is clearly greater than 0, possibly due to small effect alleles being more susceptible to loss by drift.

https://doi.org/10.1371/journal.pgen.1011289.s009

(TIFF)

S7 Fig. Behavior of deleterious fixations in additive (red) and network (blue) models.

(A) The selection coefficient, s, decreases after the optimum shift at generation 50,000. (B) The allele frequency, p, of the fixations at the optimum shift.

https://doi.org/10.1371/journal.pgen.1011289.s010

(TIFF)

S8 Fig. Mean observed heterozygosity in additive (red) and network (blue) populations over the adaptation period.

The mean is taken over 2,880 replicates per group. Heterozygosity was measured at the two QTLs (mQTLs in NAR models) in both models.

https://doi.org/10.1371/journal.pgen.1011289.s011

(TIFF)

S9 Fig. Ratio of fixed effect phenotypes to mean population phenotype in additive and network models.

Values > 1 indicate that segregating variation decreases the mean population phenotype, and vice-versa for values < 1. Values = 1 indicate no segregating variation contributing to trait variance in the population.

https://doi.org/10.1371/journal.pgen.1011289.s012

(TIFF)

S10 Fig. Comparison of sampled generalized Pareto distribution (GPD) fits to the distributions of fitness effects (DFE) among beneficial mutations fit onto mutations sampled by two methods.

Mutations were created by a mutant screen experiment where 1,000 mutations were randomly generated and their fitness effects measured relative to the phenotype of a population at a given adaptive step. The pooled method (A) pooled together the mutations from all mutant screens across simulation replicates and adaptive steps and used bootstrapping to repeatedly sample 1,000 alleles from the pooled distribution to fit a GPD onto. The non-pooled method (B) fit the GPD onto a sample of 100 mutant screen alleles from each replicate simulation at each of its adaptive steps. Each figure shows the resulting GPD fits of 5 randomly sampled replicates per model of these two methods (i.e. either 5 simulations in the non-pooled case or 5 repeated samples of 1,000 alleles for the pooled case). Note that fewer than 5 samples existed for adaptive steps > 1 in (B), owing to the rarity of walks that long.

https://doi.org/10.1371/journal.pgen.1011289.s013

(TIFF)

S11 Fig. Comparison of mean shape parameters, κ, from generalized Pareto distribution (GPD) fits to beneficial alleles randomly sampled from each simulation’s mutant screen experiment between models and at each adaptive step.

κ describes the shape of the GPD, with negative values (seen here) indicating a Weibull domain of attraction. As κ decreases, the maximum size of a beneficial mutation decreases. Error bars are 95% confidence intervals. Sample sizes for each group are given in S2 Table.

https://doi.org/10.1371/journal.pgen.1011289.s014

(TIFF)

S12 Fig. Two examples of alleles under balancing selection during adaptation in both additive (red) and network (blue) models.

The additive mutation had a phenotypic effect α = −0.618 while the NAR allele had a phenotypic effect α = −0.689.

https://doi.org/10.1371/journal.pgen.1011289.s015

(TIFF)

S13 Fig. The distribution of fitness effects among new mutations in network models for both αZ and βZ molecular components.

Compared to βZ mutations, αZ mutants were less likely to have strongly deleterious effects and more likely to have slightly deleterious effects existing on one of the two modes of the distribution. The shape of the distribution was similar between the molecular components.

https://doi.org/10.1371/journal.pgen.1011289.s016

(TIFF)

Acknowledgments

We would like to acknowledge members of the Ortiz-Barrientos lab for their feedback on earlier versions of this manuscript. This research was conducted using the Gadi HPC system maintained by the National Computational Infrastructure (NCI), which is supported by the Australian Government.

References

  1. Visscher PM, Brown MA, McCarthy MI, Yang J. Five Years of GWAS Discovery. Am J Hum Genet. 2012;90(1):7–24. pmid:22243964
  2. Boyle EA, Li YI, Pritchard JK. An Expanded View of Complex Traits: From Polygenic to Omnigenic. Cell. 2017;169(7):1177–1186. pmid:28622505
  3. Kosheleva K, Desai MM. Recombination Alters the Dynamics of Adaptation on Standing Variation in Laboratory Yeast Populations. Mol Biol Evol. 2018;35(1):180–201. pmid:29069452
  4. Kemper KE, Visscher PM, Goddard ME. Genetic Architecture of Body Size in Mammals. Genome Biol. 2012;13(4):244. pmid:22546202
  5. Dorweiler J, Stec A, Kermicle J, Doebley J. Teosinte Glume Architecture 1: A Genetic Locus Controlling a Key Step in Maize Evolution. Science. 1993;262(5131):233–235. pmid:17841871
  6. Bradshaw HD, Wilbert SM, Otto KG, Schemske DW. Genetic Mapping of Floral Traits Associated with Reproductive Isolation in Monkeyflowers (Mimulus). Nature. 1995;376(6543):762–765.
  7. Bomblies K, Peichel CL. Genetics of Adaptation. PNAS. 2022;119(30):e2122152119. pmid:35858399
  8. Walsh B, Lynch M. Long-Term Response: 3. Adaptive Walks. In: Walsh B, Lynch M, editors. Evolution and Selection of Quantitative Traits. Oxford University Press; 2018. p. 991–1013. Available from: https://doi.org/10.1093/oso/9780198830870.003.0027.
  9. Hayward LK, Sella G. Polygenic Adaptation after a Sudden Change in Environment. eLife. 2022;11:e66697. pmid:36155653
  10. Maynard Smith J. Natural Selection and the Concept of a Protein Space. Nature. 1970;225(5232):563–564.
  11. Kauffman S, Levin S. Towards a General Theory of Adaptive Walks on Rugged Landscapes. J Theor Biol. 1987;128(1):11–45. pmid:3431131
  12. Gillespie JH. A Simple Stochastic Gene Substitution Model. Theor Popul Biol. 1983;23(2):202–215. pmid:6612632
  13. Gillespie JH. Molecular Evolution over the Mutational Landscape. Evolution. 1984;38(5):1116–1129. pmid:28555784
  14. Orr HA. The Population Genetics of Adaptation: The Distribution of Factors Fixed during Adaptive Evolution. Evolution. 1998;52(4):935–949. pmid:28565213
  15. Orr HA. The Distribution of Fitness Effects among Beneficial Mutations. Genetics. 2003;163(4):1519–1526. pmid:12702694
  16. Rokyta DR, Beisel CJ, Joyce P, Ferris MT, Burch CL, Wichman HA. Beneficial Fitness Effects Are Not Exponential for Two Viruses. J Mol Evol. 2008;67(4):368–376. pmid:18779988
  17. Joyce P, Rokyta DR, Beisel CJ, Orr HA. A General Extreme Value Theory Model for the Adaptation of DNA Sequences under Strong Selection and Weak Mutation. Genetics. 2008;180(3):1627–1643. pmid:18791255
  18. Martin G, Lenormand T. The Distribution of Beneficial and Fixed Mutation Fitness Effects Close to an Optimum. Genetics. 2008;179(2):907–916. pmid:18505866
  19. Imhof M, Schlötterer C. Fitness Effects of Advantageous Mutations in Evolving Escherichia Coli Populations. PNAS. 2001;98(3):1113–1117. pmid:11158603
  20. MacLean RC, Buckling A. The Distribution of Fitness Effects of Beneficial Mutations in Pseudomonas Aeruginosa. PLOS Genetics. 2009;5(3):e1000406. pmid:19266075
  21. Kassen R, Bataillon T. Distribution of Fitness Effects among Beneficial Mutations before Selection in Experimental Populations of Bacteria. Nat Genet. 2006;38(4):484–488. pmid:16550173
  22. Sanjuán R, Moya A, Elena SF. The Distribution of Fitness Effects Caused by Single-Nucleotide Substitutions in an RNA Virus. PNAS. 2004;101(22):8396–8401. pmid:15159545
  23. Vale PF, Choisy M, Froissart R, Sanjuán R, Gandon S. The Distribution of Mutational Fitness Effects of Phage Φx174 on Different Hosts. Evolution. 2012;66(11):3495–3507. pmid:23106713
  24. Eyre-Walker A, Keightley PD. The Distribution of Fitness Effects of New Mutations. Nat Rev Genet. 2007;8(8):610–618. pmid:17637733
  25. Charras-Garrido M, Lezaud P. Extreme Value Analysis: An Introduction. Journal de la SFdS. 2013;154(2):66–97.
  26. Schenk MF, Szendro IG, Krug J, de Visser JAGM. Quantifying the Adaptive Potential of an Antibiotic Resistance Enzyme. PLOS Genetics. 2012;8(6):e1002783. pmid:22761587
  27. Neidhart J, Szendro IG, Krug J. Adaptation in Tunably Rugged Fitness Landscapes: The Rough Mount Fuji Model. Genetics. 2014;198(2):699–721. pmid:25123507
  28. Bank C, Hietpas RT, Wong A, Bolon DN, Jensen JD. A Bayesian MCMC Approach to Assess the Complete Distribution of Fitness Effects of New Mutations: Uncovering the Potential for Adaptive Walks in Challenging Environments. Genetics. 2014;196(3):841–852. pmid:24398421
  29. Foll M, Poh YP, Renzette N, Ferrer-Admetlla A, Bank C, Shim H, et al. Influenza Virus Drug Resistance: A Time-Sampled Population Genetics Perspective. PLOS Genetics. 2014;10(2):e1004185. pmid:24586206
  30. Connallon T, Clark AG. The Distribution of Fitness Effects in an Uncertain World. Evolution. 2015;69(6):1610–1618. pmid:25913128
  31. Eyre-Walker A. Genetic Architecture of a Complex Trait and Its Implications for Fitness and Genome-Wide Association Studies. PNAS. 2010;107(suppl_1):1752–1756. pmid:20133822
  32. Salvador-Martínez I, Coronado-Zamora M, Castellano D, Barbadilla A, Salazar-Ciudad I. Mapping Selection within Drosophila Melanogaster Embryo’s Anatomy. Mol Biol Evol. 2018;35(1):66–79. pmid:29040697
  33. Simpson GG. Tempo and Mode in Evolution. Columbia University Press, N.Y.; 1944.
  34. Wright S. Evolution in Mendelian Populations. Genetics. 1931;16(2):97–159. pmid:17246615
  35. Wright S. The Roles of Mutation, Inbreeding, Crossbreeding, and Selection in Evolution. Proceedings of the XI International Congress of Genetics. 1932;8:209–222.
  36. Thornton KR. Polygenic Adaptation to an Environmental Shift: Temporal Dynamics of Variation under Gaussian Stabilizing Selection and Additive Effects on a Single Trait. Genetics. 2019;213(4):1513–1530. pmid:31653678
  37. Maynard Smith J, Burian R, Kauffman S, Alberch P, Campbell J, Goodwin B, et al. Developmental Constraints and Evolution: A Perspective from The Mountain Lake Conference on Development and Evolution. Q Rev Biol. 1985;60(3):265–287.
  38. Lande R, Shannon S. The Role of Genetic Variation in Adaptation and Population Persistence in a Changing Environment. Evolution. 1996;50(1):434–437. pmid:28568879
  39. Walsh B, Lynch M. Evolution and Selection of Quantitative Traits. New York, NY, USA: Oxford University Press; 2018.
  40. Lande R. Natural Selection and Random Genetic Drift in Phenotypic Evolution. Evolution. 1976;30(2):314–334. pmid:28563044
  41. Fisher RA. The Correlation between Relatives on the Supposition of Mendelian Inheritance. Trans R Soc Edinburgh. 1918;52:399–433.
  42. Falconer DSM. Introduction to Quantitative Genetics. 4th ed. Longmans Green, Harlow, Essex, UK: Pearson Education Limited; 1996.
  43. Álvarez-Castro JM, Carlborg Ö. A Unified Model for Functional and Statistical Epistasis and Its Application in Quantitative Trait Loci Analysis. Genetics. 2007;176(2):1151–1167. pmid:17409082
  44. Hansen TF. Why Epistasis Is Important for Selection and Adaptation. Evolution. 2013;67(12):3501–3511. pmid:24299403
  45. Ang RML, Chen SAA, Kern AF, Xie Y, Fraser HB. Widespread Epistasis among Beneficial Genetic Variants Revealed by High-Throughput Genome Editing. Cell Genomics. 2023;3(4):100260. pmid:37082144
  46. Draghi JA, Parsons TL, Plotkin JB. Epistasis Increases the Rate of Conditionally Neutral Substitution in an Adapting Population. Genetics. 2011;187(4):1139–1152. pmid:21288876
  47. Breen MS, Kemena C, Vlasov PK, Notredame C, Kondrashov FA. Epistasis as the Primary Factor in Molecular Evolution. Nature. 2012;490(7421):535–538. pmid:23064225
  48. Bendixsen DP, Østman B, Hayden EJ. Negative Epistasis in Experimental RNA Fitness Landscapes. J Mol Evol. 2017;85(5):159–168. pmid:29127445
  49. Weinreich DM, Delaney NF, DePristo MA, Hartl DL. Darwinian Evolution Can Follow Only Very Few Mutational Paths to Fitter Proteins. Science. 2006;312(5770):111–114. pmid:16601193
  50. Huang W, Mackay TFC. The Genetic Architecture of Quantitative Traits Cannot Be Inferred from Variance Component Analysis. PLOS Genetics. 2016;12(11):e1006421. pmid:27812106
  51. Doane AS, Elemento O. Regulatory Elements in Molecular Networks. WIREs Systems Biology and Medicine. 2017;9(3):e1374. pmid:28093886
  52. Alon U. An Introduction to Systems Biology: Design Principles of Biological Circuits. 2nd ed. Boca Raton, FL, USA: CRC Press; 2019.
  53. Joanito I, Chu JW, Wu SH, Hsu CP. An Incoherent Feed-Forward Loop Switches the Arabidopsis Clock Rapidly between Two Hysteretic States. Sci Rep. 2018;8(1):13944. pmid:30224713
  54. Alon U. Network Motifs: Theory and Experimental Approaches. Nat Rev Genet. 2007;8(6):450–461. pmid:17510665
  55. Hallinan JS, Jackway PT. Network Motifs, Feedback Loops and the Dynamics of Genetic Regulatory Networks. In: 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology; 2005. p. 1–7. Available from: https://ieeexplore.ieee.org/abstract/document/1594903.
  56. Seo CH, Kim JR, Kim MS, Cho KH. Hub Genes with Positive Feedbacks Function as Master Switches in Developmental Gene Regulatory Networks. Bioinformatics. 2009;25(15):1898–1904. pmid:19439566
  57. Stewart AJ, Seymour RM, Pomiankowski A, Reuter M. Under-Dominance Constrains the Evolution of Negative Autoregulation in Diploids. PLoS Comput Biol. 2013;9(3):e1002992. pmid:23555226
  58. Shen-Orr SS, Milo R, Mangan S, Alon U. Network Motifs in the Transcriptional Regulation Network of Escherichia Coli. Nat Genet. 2002;31(1):64–68. pmid:11967538
  59. Chaw SM, Chang CC, Chen HL, Li WH. Dating the Monocot–Dicot Divergence and the Origin of Core Eudicots Using Whole Chloroplast Genomes. J Mol Evol. 2004;58(4):424–441. pmid:15114421
  60. 60. Snell P, Grimberg Å, Carlsson AS, Hofvander P. WRINKLED1 Is Subject to Evolutionary Conserved Negative Autoregulation. Frontiers in Plant Science. 2019;10. pmid:30984229
  61. 61. Rosenfeld N, Elowitz MB, Alon U. Negative Autoregulation Speeds the Response Times of Transcription Networks. J Mol Biol. 2002;323(5):785–793. pmid:12417193
  62. 62. Nevozhay D, Adams RM, Murphy KF, Josić K, Balázsi G. Negative Autoregulation Linearizes the Dose–Response and Suppresses the Heterogeneity of Gene Expression. PNAS. 2009;106(13):5123–5128. pmid:19279212
  63. 63. Hether TD, Hohenlohe PA. Genetic Regulatory Network Motifs Constrain Adaptation Through Curvature in the Landscape of Mutational (Co)Variance. Evolution. 2014;68(4):950–964. pmid:24219635
  64. 64. Kozuch BC, Shaffer MG, Culyba MJ. The Parameter-Fitness Landscape of lexA Autoregulation in Escherichia Coli. mSphere. 2020;5(4):e00718–20. pmid:32817380
  65. 65. Baier F, Gauye F, Perez-Carrasco R, Payne JL, Schaerli Y. Environment-Dependent Epistasis Increases Phenotypic Diversity in Gene Regulatory Networks. Science Advances. 2023;9(21):eadf1773. pmid:37224262
  66. 66. de Visser JAGM, Krug J. Empirical Fitness Landscapes and the Predictability of Evolution. Nat Rev Genet. 2014;15(7):480–490. pmid:24913663
  67. 67. Gavrilets S. Evolution and Speciation on Holey Adaptive Landscapes. Trends Ecol Evol. 1997;12(8):307–312. pmid:21238086
  68. 68. Wagner A. Does Evolutionary Plasticity Evolve? Evolution. 1996;50(3):1008–1023. pmid:28565284
  69. 69. François P. Evolving Phenotypic Networks in Silico. Semin Cell Dev Biol. 2014;35:90–97. pmid:24956562
  70. 70. Slatkin M. Quantitative Genetics of Heterochrony. Evolution. 1987;41(4):799–811. pmid:28564358
  71. 71. Rice SH. The Evolution of Canalization and the Breaking of Von Baer’s Laws: Modeling the Evolution of Development with Epistasis. Evolution. 1998;52(3):647–656. pmid:28565257
  72. 72. Schlitt T, Brazma A. Current Approaches to Gene Regulatory Network Modelling. BMC Bioinformatics. 2007;8(6):S9. pmid:17903290
  73. 73. Karlebach G, Shamir R. Modelling and Analysis of Gene Regulatory Networks. Nat Rev Mol Cell Biol. 2008;9(10):770–780. pmid:18797474
  74. 74. Morimoto D, Walinda E, Fukada H, Sugase K, Shirakawa M. Ubiquitylation Directly Induces Fold Destabilization of Proteins. Sci Rep. 2016;6(1):39453. pmid:27991582
  75. 75. Schmidtke G, Kalveram B, Weber E, Bochtler P, Lukasiak S, Hipp MS, et al. The UBA Domains of NUB1L Are Required for Binding but Not for Accelerated Degradation of the Ubiquitin-like Modifier FAT10 *. J Biol Chem. 2006;281(29):20045–20054. pmid:16707496
  76. 76. Halfon MS. Silencers, Enhancers, and the Multifunctional Regulatory Genome. Trends Genet. 2020;36(3):149–151. pmid:31918861
  77. 77. Fisher RA. The Genetical Theory of Natural Selection. Oxford, UK: The Clarendon press; 1930.
  78. 78. Fisher RA. XXI.—On the Dominance Ratio. Proc R Soc Edinb. 1923;42:321–341.
  79. 79. Orr HA. The Population Genetics of Beneficial Mutations. Philos Trans R Soc Lond, B, Biol Sci. 2010;365(1544):1195–1201. pmid:20308094
  80. 80. Aston E, Channon A, Belavkin RV, Gifford DR, Krašovec R, Knight CG. Critical Mutation Rate Has an Exponential Dependence on Population Size for Eukaryotic-Length Genomes with Crossover. Sci Rep. 2017;7:15519. pmid:29138394
  81. 81. Xu L, Chen H, Hu X, Zhang R, Zhang Z, Luo ZW. Average Gene Length Is Highly Conserved in Prokaryotes and Eukaryotes and Diverges Only between the Two Kingdoms. Mol Biol Evol. 2006;23(6):1107–8. pmid:16611645
  82. 82. Haller BC, Messer PW. SLiM 3: Forward Genetic Simulations beyond the Wright-Fisher Model. Mol Biol Evol. 2019;36(3):632–637. pmid:30517680
  83. 83. Berry S, Walcott M, Avalos CG, Du R. Ascent; 2021. AnyarInc. Available from: https://github.com/AnyarInc/Ascent.
  84. 84. R Core Team; 2023. R: A Language and Environment for Statistical Computing. Vienna, Austria.
  85. 85. Barghi N, Hermisson J, Schlötterer C. Polygenic Adaptation: A Unifying Framework to Understand Positive Selection. Nat Rev Genet. 2020;21(12):769–781. pmid:32601318
  86. 86. Lenth RV; 2023. Emmeans: Estimated Marginal Means, Aka Least-Squares Means.
  87. 87. Wickham H. Ggplot2: Elegant Graphics for Data Analysis. Use R!. Springer-Verlag New York; 2016. Available from: https://ggplot2.tidyverse.org.
  88. 88. Beisel CJ, Rokyta DR, Wichman HA, Joyce P. Testing the Extreme Value Domain of Attraction for Distributions of Beneficial Fitness Effects. Genetics. 2007;176(4):2441–2449. pmid:17565958
  89. 89. Fisher RA, Tippett LHC. Limiting Forms of the Frequency Distribution of the Largest or Smallest Member of a Sample. Math Proc Camb Philos Soc. 1928;24(2):180–190.
  90. 90. Fisher RA. Statistical Methods for Research Workers. Statistical methods for research workers. 1936;(6th Ed).
  91. 91. Xiang Y, Gubian S, Suomela B, Hoeng J. Generalized Simulated Annealing for Efficient Global Optimization: The GenSA Package for R. The R Journal. 2013;5/1.
  92. 92. Lebeuf-Taylor E, McCloskey N, Bailey SF, Hinz A, Kassen R. The Distribution of Fitness Effects among Synonymous Mutations in a Gene under Directional Selection. eLife. 2019;8:e45952. pmid:31322500
  93. 93. Orr HA. A Minimum on the Mean Number of Steps Taken in Adaptive Walks. J Theor Biol. 2003;220(2):241–247. pmid:12468295
  94. 94. Delignette-Muller ML, Dutang C. fitdistrplus: An R Package for Fitting Distributions. J Stat Softw. 2015;64(4):1–34.
  95. 95. Kopp M, Hermisson J. The Genetic Basis of Phenotypic Adaptation II: The Distribution of Adaptive Substitutions in the Moving Optimum Model. Genetics. 2009;183(4):1453–1476. pmid:19805820
  96. 96. Matuszewski S, Hermisson J, Kopp M. Catch Me If You Can: Adaptation from Standing Genetic Variation to a Moving Phenotypic Optimum. Genetics. 2015;200(4):1255–1274. pmid:26038348
  97. 97. Gillespie JH. The Causes of Molecular Evolution. University Press; 1991.
  98. 98. Kousathanas A, Keightley PD. A Comparison of Models to Infer the Distribution of Fitness Effects of New Mutations. Genetics. 2013;193(4):1197–1208. pmid:23341416
  99. 99. Zeyl C, DeVisser JAGM. Estimates of the Rate and Distribution of Fitness Effects of Spontaneous Mutation in Saccharomyces Cerevisiae. Genetics. 2001;157(1):53–61. pmid:11139491
  100. 100. Couce A, Limdi A, Magnan M, Owen SV, Herren CM, Lenski RE, et al. Changing Fitness Effects of Mutations through Long-Term Bacterial Evolution. Science. 2024;383(6681):eadd1417. pmid:38271521
  101. 101. Halligan DL, Kousathanas A, Ness RW, Harr B, Eöry L, Keane TM, et al. Contributions of Protein-Coding and Regulatory Change to Adaptive Molecular Evolution in Murid Rodents. PLOS Genetics. 2013;9(12):e1003995. pmid:24339797
  102. 102. Williamson RJ, Josephs EB, Platts AE, Hazzouri KM, Haudry A, Blanchette M, et al. Evidence for Widespread Positive and Negative Selection in Coding and Conserved Noncoding Regions of Capsella Grandiflora. PLOS Genetics. 2014;10(9):e1004622. pmid:25255320
  103. 103. Ohta T. Near-Neutrality in Evolution of Genes and Gene Regulation. PNAS. 2002;99(25):16134–16137. pmid:12461171
  104. 104. Erwin DH, Davidson EH. The Evolution of Hierarchical Gene Regulatory Networks. Nat Rev Genet. 2009;10(2):141–148. pmid:19139764
  105. 105. Wade MJ, Winther RG, Agrawal AF, Goodnight CJ. Alternative Definitions of Epistasis: Dependence and Interaction. Trends Ecol Evol. 2001;16(9):498–504.
  106. 106. Wagner A. Genotype Networks Shed Light on Evolutionary Constraints. Trends Ecol Evol. 2011;26(11):577–584. pmid:21840080
  107. 107. Hill WG, Robertson A. Effect of Linkage on Limits to Artificial Selection. Genet Res. 1966;8(3):269–294. pmid:5980116
  108. 108. Ortiz-Barrientos D, Engelstädter J, Rieseberg LH. Recombination Rate Evolution and the Origin of Species. Trends Ecol Evol. 2016;31(3):226–236. pmid:26831635
  109. 109. Orr HA. Adaptation and the Cost of Complexity. Evolution. 2000;54(1):13–20. pmid:10937178
  110. 110. Costanzo M, Baryshnikova A, Bellay J, Kim Y, Spear ED, Sevier CS, et al. The Genetic Landscape of a Cell. Science. 2010;327(5964):425–431. pmid:20093466
  111. 111. Koch EN, Costanzo M, Bellay J, Deshpande R, Chatfield-Reed K, Chua G, et al. Conserved Rules Govern Genetic Interaction Degree across Species. Genome Biology. 2012;13(7):R57. pmid:22747640
  112. 112. Barua A, Mikheyev AS. An Ancient, Conserved Gene Regulatory Network Led to the Rise of Oral Venom Systems. PNAS. 2021;118(14):e2021311118. pmid:33782124
  113. 113. Mancera E, Nocedal I, Hammel S, Gulati M, Mitchell KF, Andes DR, et al. Evolution of the Complex Transcription Network Controlling Biofilm Formation in Candida Species. eLife. 2021;10:e64682. pmid:33825680
  114. 114. Bertheloot J, Barbier F, Boudon F, Perez-Garcia MD, Péron T, Citerne S, et al. Sugar Availability Suppresses the Auxin-Induced Strigolactone Pathway to Promote Bud Outgrowth. New Phytologist. 2020;225(2):866–879. pmid:31529696
  115. 115. Pigolotti S, Krishna S, Jensen MH. Oscillation Patterns in Negative Feedback Loops. PNAS. 2007;104(16):6533–6537. pmid:17412833
  116. 116. Wu X, Bayle JH, Olson D, Levine AJ. The P53-Mdm-2 Autoregulatory Feedback Loop. Genes Dev. 1993;7(7a):1126–1132. pmid:8319905
  117. 117. Mangan S, Alon U. Structure and Function of the Feed-Forward Loop Network Motif. PNAS. 2003;100(21):11980–11985. pmid:14530388
  118. 118. Bullaughey K. Multidimensional Adaptive Evolution of a Feed-Forward Network and the Illusion of Compensation. Evolution. 2013;67(1):49–65. pmid:23289561
  119. 119. Lynch M. Evolution of the Mutation Rate. Trends Genet. 2010;26(8):345–352. pmid:20594608
  120. 120. Charlesworth B, Charlesworth D. Elements of Evolutionary Genetics. Greenwoord Village, Colorado, USA: Roberts and Company; 2010.
  121. 121. Koonin EV. Splendor and Misery of Adaptation, or the Importance of Neutral Null for Understanding Evolution. BMC Biology. 2016;14(1):114. pmid:28010725
  122. 122. Walsh B, Lynch M. Maintenance of Quantitative Genetic Variation. In: Walsh B, Lynch M, editors. Evolution and Selection of Quantitative Traits. Oxford University Press; 2018. p. 1016–1078. Available from: https://doi.org/10.1093/oso/9780198830870.003.0028.
  123. 123. Liu X, Li YI, Pritchard JK. Trans Effects on Gene Expression Can Drive Omnigenic Inheritance. Cell. 2019;177(4):1022–1034.e6. pmid:31051098
  124. 124. Ritz KR, Noor MAF, Singh ND. Variation in Recombination Rate: Adaptive or Not? Trends Genet. 2017;33(5):364–374. pmid:28359582
  125. 125. Cowperthwaite MC, Bull JJ, Meyers LA. Distributions of Beneficial Fitness Effects in RNA. Genetics. 2005;170(4):1449–1457. pmid:15944361
  126. 126. Kryazhimskiy S, Tkačik G, Plotkin JB. The Dynamics of Adaptation on Correlated Fitness Landscapes. PNAS. 2009;106(44):18638–18643. pmid:19858497
  127. 127. Orr HA. Theories of Adaptation: What They Do and Don’t Say. Genetica. 2005;123(1):3–13. pmid:15881676
  128. 128. Siegal ML, Promislow DEL, Bergman A. Functional and Evolutionary Inference in Gene Networks: Does Topology Matter? Genetica. 2007;129(1):83–103. pmid:16897451
  129. 129. Claringbould A, de Klein N, Franke L. The Genetic Architecture of Molecular Traits. Curr Opin Syst Biol. 2017;1:25–31.
  130. 130. Agocs FJ. (Py)Oscode: Fast Solutions of Oscillatory ODEs. Journal of Open Source Software. 2020;5(56):2830.
  131. 131. Städter P, Schälte Y, Schmiester L, Hasenauer J, Stapor PL. Benchmarking of Numerical Integration Methods for ODE Models of Biological Systems. Sci Rep. 2021;11(1):2696. pmid:33514831