Parallel Evolutionary Dynamics of Adaptive Diversification in Escherichia coli

The divergence of Escherichia coli bacteria into metabolically distinct ecotypes has a similar genetic basis and similar evolutionary dynamics across independently evolved populations.


Introduction
The causes and mechanisms of diversification are central issues in evolutionary biology. Explanations that involve the splitting of an ancestral population into geographically or otherwise isolated populations (allopatric diversification) have historically been favored because of theoretical difficulties with sympatric diversification (i.e., diversification without isolation) [1][2][3][4][5]. In the last 15 years, though, two major developments have increased the attractiveness of sympatric explanations. First, models of sympatric diversification have largely overcome earlier theoretical objections, showing that sympatric diversification can occur due to frequency-dependent selection under a wide range of conditions [6][7][8][9]. Second, empirical evidence from both laboratory experiments [10][11][12][13][14][15][16][17][18][19] and field studies [20][21][22][23] suggests that diversification can occur in sympatry, sometimes on time scales of hundreds of generations, and that such diversification may be an important source of biological diversity.
Sympatric diversification can be driven by frequency-dependent selection in a process called adaptive diversification and under conditions that may be quite general [7][8][9][24]. This process can be described by the theoretical framework of adaptive dynamics [6,25,26]. A crucial component of this framework is the concept that the environment a population experiences, and that drives its evolutionary dynamics, depends in part on the phenotypic distribution of the population itself and the resulting ecological dynamics. Adaptive diversification occurs through evolutionary branching [6], a process in which selection drives a population to a point in phenotype space at which selection becomes disruptive. At this point, the population diverges into two lineages, which may continue to diverge.
In general, the problem of adaptive diversification and speciation is twofold: on the one hand, one wants to identify the ecological conditions that lead to disruptive selection and evolutionary branching, and on the other hand, one wants to understand the mechanisms interrupting gene flow between ecologically diverging subpopulations. Both of these aspects of adaptive diversification have been studied extensively in the theoretical literature (e.g., [7][8][9]). Here we experimentally address the first of these issues using asexual organisms, in which mating does not lead to recombination between diverging subpopulations, and which are therefore ideally suited to study the ecological conditions generating the frequency dependence necessary for adaptive diversification. Indeed, adaptive diversification has been documented in microbial evolution experiments [11,12][27][28][29][30][31] in which well-mixed populations of Escherichia coli bacteria founded with a single genotype repeatedly evolve two metabolically distinct phenotypes. When grown in well-mixed serial batch cultures in medium with glucose and acetate as carbon sources, E. coli cells preferentially metabolize glucose and excrete acetate until the glucose is depleted and then undergo a diauxic switch to acetate consumption [32]. In several populations evolving in these conditions for more than 1,000 generations, two coexisting phenotypes emerged that differ in their diauxic lag, that is, in the time required to switch to acetate metabolism: the slow switcher (SS) has a longer diauxic lag than that of the fast switcher (FS) [11,28]. These two phenotypes reflect a tradeoff in carbohydrate metabolism: SS strains grow more quickly than FS strains when glucose is abundant, but are unable to efficiently catabolize acetate, while FS strains continue to grow rapidly on acetate after glucose is depleted [28].
The evolution of the FS and SS phenotypes in multiple replicate lines is a striking example of convergence at the phenotypic level, suggesting a deterministic adaptive process.
However, the evolutionary branching predicted by adaptive dynamics models necessarily involves changing selective pressures. Therefore, the similar outcomes of diversification across replicate populations are qualitatively different from parallel adaptation to a fixed adaptive landscape. Rather, in this case, the entire process of genetic change leading to environmental change and new selective pressures that in turn cause further genetic change has occurred in parallel. This suggests that not only the outcome of evolution is parallel but the evolutionary dynamics as well.
In spite of phenotypic evidence for adaptive diversification, there is limited information available on the genetic changes underlying this process. In fact, to our knowledge there are no examples of sympatric diversification for which the underlying genetics have been fully described. In the FS and SS example, the degree to which the similar, independently evolved phenotypes reflect similar underlying genetics in different populations is unknown. This has implications for the genotype-phenotype map: Are there few genetic ways to produce FS and SS phenotypes or many? Also unknown is the degree to which the similar evolutionary outcomes reflect similar evolutionary dynamics; the results of previous studies suggest that the degree of similarity in the type, order, and timing of adaptive changes across independently evolving populations varies widely (e.g., [33][34][35]). This in turn has implications for the degree of determinism in the evolutionary dynamics: Are there many paths or few that lead to similar phenotypic (and possibly genetic) outcomes? And are the changing selective pressures predicted by adaptive dynamics models reflected in genetic changes leading to new selective pressures that in turn cause further genetic change? If such a pattern is present in multiple replicate lines, this would provide evidence that not only the outcome of evolution is predictable, but the evolutionary dynamics as well.
To trace the dynamics of genetic change underlying adaptive diversification, we combined sequencing of FS and SS clones isolated near the end of the evolution experiment with sequencing of whole-population samples from time points in the frozen (''fossil'') record of the experiment. We sequenced two FS clones, two SS clones, and 16 time point samples for each of three replicate evolution experiments (called populations 18, 19, and 20 [28]). Sequencing of SS and FS clones allowed us to identify mutations associated with the phenotypes of interest, and sequencing of whole-population samples from the fossil record of the experiments allowed us to trace the origin, increase, and (occasionally) extinction of these and other mutations. Finally, comparing these results across three independently evolved populations allowed us to assess the degree to which a similar ecological setting led to similar evolutionary dynamics and outcomes (i.e., the degree of determinism).

Results
Sequencing the SS and FS clones revealed striking similarities in the genetic changes underlying the derived phenotypes across the three replicate populations (Figure 1). Each of the SS clones carried a mutation in spoT, a deletion of part or all of the ribose operon (rbs), and a mutation in nadR (Figure 1). One or two additional mutations appeared in some SS clones, but these were not shared between clones. No mutations were fixed in any of the three replicate populations, and in no case was any specific genetic change shared between FS and SS clones. In population 19, the two SS clones did not share any mutations (Figure 1b), indicating that they evolved independently from the ancestral strain (although each clone has a mutation in spoT, nadR, and rbs). Thus, the six sequenced SS clones represent four separate origins of the SS phenotype, all of which evolved parallel changes to the same three loci.
Each of the FS clones carried 6-10 mutations relative to the ancestral strain, most of which were shared between the two clones from each population (Figure 1). Assuming a single origin for each mutation, we infer that these shared mutations occurred before the two sequenced clones last shared a common ancestor. Phenotypically, the FS type represents a novel metabolic strategy, while the SS type is more similar to the ancestral strain [11,27,28,30], and this difference is reflected in the underlying genetics. In all three populations, the FS clones are more genetically distant from the ancestor than the SS clones (paired t test, n = 4 independent comparisons, two-tailed p = 0.0008). FS clones from different populations are also more genetically dissimilar than SS clones from different populations: in contrast to spoT, rbs, and nadR in the SS clones, there were no genes that carried mutations in the FS clones from all three populations.

Author Summary
The causes and mechanisms of evolutionary diversification are central issues in biology. There is well-established theory that predicts that adaptive diversification can arise because of ecological interactions between individuals, such as competition or predation, but there are no empirical examples in which this process has been observed at the genetic level. We documented the genetic basis of adaptive diversification resulting from competition for resources in populations of the bacterium Escherichia coli. The populations diversified into two coexisting ecotypes representing different physiological adaptations. We found that similar but independently evolved phenotypes often shared mutations in the same gene and, in four cases, shared identical mutations at the same nucleotide position. Timelines of allele frequencies extracted from the frozen ''fossil record'' of three evolving populations showed parallel evolutionary dynamics, suggesting that mutations causing one type of physiology changed the ecological environment and allowed invasion of mutations causing an alternate physiology. The results provide empirical evidence of adaptive diversification as a predictable evolutionary process.

Figure 2 summarizes the evolutionary dynamics unfolding in each of the three evolution experiments, and Figures 3 and 4 show the frequencies of the mutations found in the various SS and FS endpoint clones over time. These timelines suggest that each ecotype affected the other's evolution by altering the available ecological opportunities. In all three evolving populations, nonsynonymous SS-associated spoT and rbs mutations were the first to reach high frequency and likely increased the degree of specialization on glucose [36,37]. In population 18, for which the timeline of metabolic phenotypes has been documented [28], the rapid rise of these mutations corresponds very well with the increase in the mean switching lag shown in Figure 1B of Spencer et al. [28].
Similarly, in population 20, SS bacteria were present by generation 200 [31]. In both cases, spoT and rbs were the only SS-associated mutations present when the SS phenotype was first detected, so one or both of these mutations must have caused the SS phenotype. It is known that spoT mutations can confer a substantial advantage by reducing the lag phase before exponential growth on glucose and by increasing the maximum growth rate on glucose, both of which presumably occur through partial deactivation of the stringent stress response [37,38]. This may in turn make it harder for the cells to switch to acetate consumption after glucose is exhausted, and hence cause the SS phenotype.
Due to an IS150 element immediately upstream of the rbs operon, deletions of all or part of rbs occur at high frequency (~5 × 10^-5 per cell generation) in the ancestral E. coli strains used in our evolution experiments and provide a ~1%-2% fitness advantage in glucose minimal medium [36]. Since rbs deletions were also the first mutations to occur in two of the three FS lineages (Figure 1b, c), it is likely that rbs deletions alone do not cause either the SS or the FS phenotype, but rather that rbs deletion mutants were a common genetic background early in the experiment and that the mutations causing the SS and most FS phenotypes occurred on this background.
By generation 342, the frequency of SS-associated spoT and rbs mutations was high (>65%) in all three populations (Figure 3). If either or both of these mutations are responsible for an increase in acetate lag (as must be the case in population 18), their increased frequency would have caused a change in the daily regime of nutrient concentrations in the experimental environment, namely that more acetate was available later in the growth phase. The first FS-associated mutations began to rise in frequency at this time (Figures 2 and 4). This wave of invasion involved a different set of genes in each population, but some evidence of parallelism is apparent here as well: the mutations increasing at this time included an identical insertion in the yfbV/ackA intergenic region in populations 18 and 19, and different mutations affecting the ptsG gene in populations 19 and 20. In all three populations, the first FS-associated mutations to reach appreciable frequency included ones in or upstream of genes related to acetate utilization and excretion and glucose metabolism. These mutations appeared either in the remaining ancestral genetic background or in rbs deletion mutants and led to coexistence between the SS and FS lineages that persisted until the end of the evolution experiments. These early FS-associated mutations occurred upstream of ackA in populations 18 and 19, in iclR in population 18, in pta in population 20, and in or upstream of ptsG in populations 19 and 20 (Figure 2). The timing of these invasions, which in all three populations only reached appreciable frequencies after SS-associated mutations had reached high frequency, is consistent with FS-like phenotypes evolving as an adaptation to the novel ecological niche of greater acetate availability generated by increased glucose specialization of the SS. These early FS invasions thus generated the basic SS-FS polymorphism that persisted to the end of the evolution experiment.
Experimental evidence demonstrates that the long-term coexistence of FS and SS is due to frequency-dependent interactions [28][29][30][31]. Again, in population 18 the correspondence with phenotypic change is conspicuous: clones with a short acetate lag were first detected around the same time (ca. generation 500, Figure 1B in Spencer et al. [28]) at which the first three FS-associated mutations reached appreciable frequency: a nonsynonymous substitution in yijC, an insertion upstream of ackA (yfbV/ackA+T), and a 10 bp deletion in iclR (Figures 2 and 4). Thus one or more of these must have produced the FS phenotype. By the same logic, one or more of the four FS-associated mutations present in population 20 when FS were first detected at generation 200 (rbs, pta, ptsG, and yceA) must be sufficient to produce the FS phenotype.
The functions of some genes in the initial FS invasions suggest their involvement in similar phenotypic changes across populations. The yfbV/ackA insertion in populations 18 and 19 affects a potential transcriptional recognition sequence of the global fermentation activator arcA upstream of ackA [39], suggesting that this mutation affects ackA expression, and hence acetate metabolism. In population 20, a mutation in pta rose in frequency at about the same time (Figure 4c), and all six sequenced FS clones bear one of these two mutations. Since ackA and pta catalyze subsequent reactions in the pathway of acetate utilization and excretion (acetate ↔ acetylphosphate and acetylphosphate ↔ acetyl-CoA, respectively), these two mutations may have similar metabolic effects.
The function of ackA as an important regulator of acetate metabolism and the independent origin of the identical yfbV/ackA mutation in populations 18 and 19 strongly suggest that this intergenic substitution is at least partially responsible for the reduced acetate lag in the FS clones (although FS in population 20 has a different genetic basis). Similarly, iclR is a regulator of the acetate operon aceBAK, and in an experimental population not included in this study, an insertion in iclR acting as a stop codon was previously shown to be partly responsible for the FS phenotype by derepressing the acetate operon [27]. This suggests that the iclR deletion in population 18 has contributed to the FS phenotype as well. Finally, yijC, a repressor of genes involved in fatty acid biosynthesis [38], could play a role in the FS phenotype by altering the relative amounts of acetyl-CoA used in fatty acid biosynthesis and in the citric acid cycle.
In population 20, a mutation in ptsG was one of the first FS-associated mutations to invade (Figure 4c), while in population 19, an IS186 insertion sequence appeared in the intergenic region upstream of ptsG around the same time, potentially disrupting its transcriptional regulation. The enzyme encoded by ptsG, a glucose-specific PTS permease, is involved in the uptake of glucose and its transport across the cell membrane [40], and disruption or down-regulation of these functions would be consistent with the FS phenotype.

Figure 1. [...] ackA) indicate that the mutation is in the intergenic region between the indicated genes. Underlined gene names indicate nonsynonymous changes (including indels in coding regions). Colored gene names (other than black) indicate changes in or upstream of the same gene. Timing of mutations and divergences was inferred from the fossil record: a mutation found in a clone was assumed to have arisen at the midpoint between the first time step at which the mutation was detected in the fossil record and the previous time step; divergences between clones are assumed midway between the last mutation they share and the first mutation they do not share. Because of limitations on time resolution and minimum detectable frequency, timing of all such events should be viewed as approximations. Mutations found in clones but not in time point samples are assumed to have occurred near the end of the experiment and are marked with asterisks (*). doi:10.1371/journal.pbio.1001490.g001

Figure 2. Dynamics of the frequencies of mutations detected in the fossil record of three evolving populations. Shades of blue (above) indicate the mutations associated with FS clones as identified in Figure 1, and shades of green (below) indicate the mutations associated with SS clones as identified in Figure 1 (mutations with a * in Figure 1 are not shown, because their frequency was not high enough to be detected in the time point samples). Gold indicates ancestral strains (which may include mutations not associated with any sequenced clone). The white region in (b) indicates an independent origin of the SS phenotype. Mutations within a lineage are cumulative; that is, mutations corresponding to lighter regions appear in the genetic background corresponding to the darker regions in which the lighter regions are nested. For example, in population 20 the first mutations to appear in the SS lineage were a 1,160 bp deletion in the rbs operon and a substitution in codon 454 of spoT. An IS150 insertion in the intergenic region between mokB and trg appeared on this background around generation 300 and remained at low frequency for the rest of the experiment. Around generation 650, a single bp deletion in codon 394 of nadR appeared on the rbs Δ1160 bp + spoT-454 + mokB/trg IS150 background. Grouping of mutations into lineages was based on their presence together in sequenced clones (in this case 20-SS2), and their order of appearance was inferred from the time point sample in which each was first detected. In addition, three mutations not found in any of the sequenced clones but whose association with SS and [...]
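The timing rule stated in the Figure 1 caption (origin at the midpoint between the first detection and the previous time step; divergence midway between the last shared and first unshared mutation) can be illustrated with a minimal sketch. The function names and the sampling schedule below are hypothetical and are not taken from the study; treating generation 0 as the "previous time step" for a mutation detected in the very first sample is also an assumption made only for this illustration.

```python
from bisect import bisect_left

def inferred_origin(sample_generations, first_detected):
    """Midpoint between the first sample in which a mutation is detected
    and the previous sample (assumed to be generation 0 if none exists)."""
    gens = sorted(sample_generations)
    i = bisect_left(gens, first_detected)
    if i == len(gens) or gens[i] != first_detected:
        raise ValueError("first_detected must be one of the sampled generations")
    previous = gens[i - 1] if i > 0 else 0
    return (previous + first_detected) / 2

def inferred_divergence(last_shared_origin, first_unshared_origin):
    """Midway between the last mutation two clones share and the first
    mutation they do not share."""
    return (last_shared_origin + first_unshared_origin) / 2

# Hypothetical sampling schedule (in generations), for illustration only.
samples = [100, 200, 300, 400]
origin_a = inferred_origin(samples, 200)         # midpoint of 100 and 200
origin_b = inferred_origin(samples, 400)         # midpoint of 300 and 400
split = inferred_divergence(origin_a, origin_b)  # midpoint of the two origins
```

Because each estimate can be off by up to half a sampling interval, the results inherit the caveat in the caption that all such timings are approximations.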
After FS mutations had risen to intermediate frequencies (>0.15), several SS-associated nadR mutations appeared at detectable frequencies in each of the three populations (Figures 1, 2, and 5; Text S1). The proliferation of these mutations (≥5 in each population) after generation 500 is striking since no nadR mutations were present at detectable frequency before this time. nadR plays an important role in many metabolic pathways, including growth on carbohydrates [41,42], and the observed mutations show a surprising degree of parallelism. The nadR mutation with the highest mean frequency in population 20 (nadR-290) was identical to that in 19-SS1 in population 19, and a different mutation in the same codon was present in population 18. All three populations also included a mutation in codon 294 of nadR, and this was identical between populations 18 and 20 (Figure 5a, c). Thus, a different pair of mutations in these two codons is found in each of the three populations, though each mutation is shared by two populations.
The nadR mutation found in both SS clones from population 18 was only detected in a single Illumina read in the time point samples (at generation 482), indicating that it was present at very low frequency. The presence of such a low-frequency mutation in both sequenced clones suggests that it had a phenotypic effect (since we preferentially selected clones that were clearly of the SS phenotype; see Materials and Methods). The protein encoded by nadR has both enzymatic and regulatory roles in the NAD biosynthetic pathway and plays important roles in glycolysis and the citric acid cycle. The presence of nadR mutations in all six sequenced SS clones and none of the six sequenced FS clones strongly suggests that these mutations are adaptive for the SS, but not the FS, phenotype. It is interesting to note that mutations in nadR were found in 12 of 12 experimental E. coli populations after 20,000 generations of evolution in glucose minimal medium [42], and that one of these was identical to the nadR-290 mutation in populations 19 and 20.
In populations 18 and 20, invasion by SS-associated nadR mutants was followed by rapid increases in frequency of a second set of FS-associated mutations (Figure 2). In population 18, a spoT mutation identical to that in FS from population 20 (spoT-414) increased in frequency only to be replaced by another spoT mutation (spoT-369) that had previously been present at very low frequency. In population 20, the second set of FS-associated mutations included one in a global regulator (arcA) known to increase acetate consumption [31]. It is likely that the FS-associated arcA mutation in population 20 affects the expression of ackA; if so, one of the phenotypic effects of this mutation may be similar to that of the yfbV/ackA insertion in populations 18 and 19. This would explain why this mutation has a larger impact on SS clones than on FS clones [31]: if the primary phenotypic effect of the arcA mutation is to alter the rates of acetate utilization and/or excretion, the FS-associated pta mutation may have made this effect at least partially redundant in population 20.
In addition to the spoT mutations associated with FS and SS clones, one other mutation in spoT was present at ≥20% frequency at some time in each of the three populations (Figure S1). In populations 18 and 20, this mutation was lost by the end of the experiment. In population 19 this spoT mutation increased in frequency near the end of the experiment as the spoT mutation associated with 19-SS1 underwent a corresponding decline. The transient spoT mutation in population 18 was identical to that associated with the FS clones in population 20 (Figure S1a, c), and hence is likely to be FS-associated. This indicates that mutations in the stringent response can be adaptive for either the SS or the FS phenotype [43]. The phenotype associated with the spoT-316 mutation in population 20 is not known.
Several other mutations not associated with any of the FS and SS clones were present at detectable frequencies in each of the three fossil records (Figure S2). A complete list of detected mutations and the samples in which they were found is shown in Table S1.

Discussion
Microbial evolution experiments are a powerful approach to understanding evolutionary dynamics, combining controlled conditions with the capability for experimental replication to allow strong inferences of causation. In addition, rapid reproduction allows laboratory experiments lasting hundreds or thousands of generations, and cryopreservation allows direct comparisons between ancestors and descendants. The recent rapid advance of nucleic acid sequencing technologies has made whole-genome sequencing feasible for both single microbial strains and whole populations containing a variety of strains. The combination of microbial evolution experiments and next-generation sequencing technologies provides an unprecedented opportunity to observe the temporal dynamics of evolutionary change across the entire genome [44,45]. Replicating this approach in multiple independent populations can tell us whether adaptive sympatric diversification in independent populations involves similar genetic mechanisms and similar evolutionary dynamics. Our results revealed both shared and unique genetic mechanisms underlying the evolution of pairs of metabolically distinct ecotypes in different populations. In some cases, similar phenotypes had mutations in different genes (e.g., the wecF, uppS, and arcA mutations in the FS clones from populations 18, 19, and 20, respectively; no mutations in these genes were detected in either clones or time point samples in the other populations). In some cases, mutations affected different codons of the same gene, as in the distinct spoT and nadR mutations found in the SS clones from all three populations. We also observed different changes to the same codon (e.g., codons 290 and 294 of the nadR gene; Figure 5). Of the 45 mutations shown in Figure 1, 21 (47%) occurred in a nucleotide, codon, or gene that also had a mutation associated with the same ecotype in another population.
The pattern of genetic invasions evident in the fossil records also revealed strikingly similar evolutionary dynamics: in all three evolving populations, SS-associated spoT and rbs mutations were the first to invade, followed by FS-associated mutations affecting acetate and glucose metabolism, followed by SS-associated mutations in nadR, and finally additional FS-associated mutations. In spite of several mutations showing evidence of strong positive selection, such as the SS-associated spoT mutations in all three populations, no mutation was fixed in any of the three populations. Many mutations that increased rapidly after their initial appearance later declined in frequency yet were then maintained in the populations at intermediate frequencies.
Apart from genetic drift, two separate (but not mutually exclusive) processes could explain the repeated and parallel invasions and long-term coexistence observed in these three populations. We do not consider genetic drift as an explanation because the large effective population sizes make drift implausible for allele frequency changes greater than a fraction of 1% from one time point sample to the next (see Materials and Methods).
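To give a sense of the scale behind the drift argument, the following minimal sketch computes the expected standard deviation of an allele frequency after a number of generations of pure drift in a haploid Wright-Fisher population. The effective population size and sampling interval used here are hypothetical placeholders, not values from this study (the actual values are given in Materials and Methods), and the Wright-Fisher model itself is an assumption of the sketch.

```python
import math

def drift_sd(p0, n_e, generations):
    """Standard deviation of allele frequency after `generations` of drift
    in a haploid Wright-Fisher population of effective size n_e:
    Var(p_t) = p0 * (1 - p0) * (1 - (1 - 1/n_e) ** t)."""
    # expm1/log1p keep precision when 1/n_e is very small.
    lost = -math.expm1(generations * math.log1p(-1.0 / n_e))
    return math.sqrt(p0 * (1.0 - p0) * lost)

# Hypothetical values for illustration only (not from the study).
N_E = 1e7          # assumed effective population size
INTERVAL = 70      # assumed generations between time point samples
sd = drift_sd(0.5, N_E, INTERVAL)
print(f"Expected drift SD at p = 0.5: {sd:.4%}")
```

Under these assumed values the expected change from drift is a small fraction of 1%, the scale referred to in the paragraph above; smaller effective sizes or longer intervals would increase it.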
Clonal interference, which involves the coexistence of two or more beneficial mutations on different genetic backgrounds, is one potential explanation. This process allows long transient polymorphisms to be maintained in asexual populations because several different and almost equally beneficial mutations can be present in different subpopulations [46][47][48]. Thus, clonal interference is expected to lead to longer fixation times, elevated levels of polymorphism, and generally more complex evolutionary dynamics in asexual populations such as the ones studied here. One probable example of clonal interference is the replacement of spoT-414 by fecI/insA-25, spoT-369, and wecF-244 within the FS lineage in population 18. Around generation 700, the spoT-414 mutation appeared on the FS background and began to rapidly invade, while the fecI/insA-25 and spoT-369 mutations remained at low frequency. Before the spoT-414 mutation could reach fixation within the FS lineage, though, the wecF-244 mutation appeared and quickly replaced all other FS lineages, including that with spoT-414.
Another possible explanation for long-term coexistence is the coevolution of diverging phenotypes through environmental feedbacks and frequency-dependent selection. In this scenario, the adaptive landscape changes as metabolic changes in one subpopulation create a new niche, which another subpopulation evolves to fill. Since the only source of environmental change over the course of the experiments was the bacteria themselves, any such changes in the selective regime must have been generated by changes in the genetic, and hence metabolic, makeup of the bacterial populations. Such environmental feedback generates frequency dependence and is at the core of the theory of adaptive diversification [6][7][8][9]. An example of environmentally mediated negative frequency dependence is the interaction between the SS lineage and the wecF-244 containing FS lineage in population 18: the wecF-244 mutation invaded the FS lineage rapidly, indicating a strong selective advantage. In the absence of any frequency-dependent interactions, such an advantageous mutation would continue to invade, going quickly to fixation unless another even more advantageous mutation appeared (as in the clonal interference scenario). In this case, though, neither of these explanations is viable: after quickly fixing within the FS lineage, the wecF-244 mutation leveled off (or even declined in frequency) in the absence of any new mutations.
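The contrast drawn above, between an advantageous mutation that proceeds to fixation and one that levels off under negative frequency dependence, can be illustrated with a toy two-type haploid selection model. This is only a sketch: the linear fitness function, the selection coefficients, and the equilibrium frequency are hypothetical and were not estimated from the populations studied here.

```python
def next_frequency(p, w_focal, w_other=1.0):
    """One generation of haploid selection on a focal type at frequency p."""
    return p * w_focal / (p * w_focal + (1.0 - p) * w_other)

def simulate(p0, fitness_of_focal, generations):
    """Iterate selection; fitness_of_focal maps current frequency to fitness."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, fitness_of_focal(p))
    return p

# Hypothetical parameters for illustration only.
S_CONST = 0.1   # constant selective advantage
S_FD = 0.5      # strength of frequency dependence
P_EQ = 0.3      # frequency at which the focal type's advantage vanishes

# Constant advantage: the focal type keeps invading toward fixation.
final_const = simulate(0.01, lambda p: 1.0 + S_CONST, 2000)

# Negative frequency dependence: advantage when rare, disadvantage when common,
# so the focal type levels off at an intermediate frequency.
final_fd = simulate(0.01, lambda p: 1.0 + S_FD * (P_EQ - p), 2000)

print(f"constant advantage: {final_const:.3f}; frequency-dependent: {final_fd:.3f}")
```

In this toy model the second trajectory rises quickly and then plateaus without any new mutation appearing, which is the qualitative signature the text attributes to the wecF-244 lineage.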
Taken by themselves, most of our results could be explained by either clonal interference or reciprocal niche construction. Since both processes can explain the long-term coexistence of multiple lineages, it can be difficult to distinguish between them. However, the populations in this study have also been the subject of numerous previous studies, and this prior work aids substantially in interpreting the current results. When this additional information is taken into account, it is clear that although clonal interference may explain some of the observed dynamics, it is unlikely to explain all of them.
The main reason for this is that we already know from previous experimental analyses that the coexistence between the SS and FS ecotypes involves frequency dependence, at least in populations 18 and 20 (e.g., [28][29][30][31]). In particular, the polymorphisms between SS and FS lineages that we observed arising early on in the evolution experiments are maintained by selective forces favoring rare ecotypes. For population 20, [31] has explicitly shown the action of frequency dependence throughout the fossil record in invasion experiments with SS and FS strains extracted at various time points. In addition, Spencer et al. [28] have already argued in detail why clonal interference is unlikely to be the main driver for the pattern of evolutionary branching observed in population 18, which is one of the populations used for the present study. Clonal interference may have played a role in generating some of the polymorphisms observed within the SS and within the FS lineages, and in the timing of the rise of various mutations. Overall, however, it seems clear that the basic coexistence between the SS and FS ecotypes is not due to clonal interference, but to frequency-dependent ecological interactions. Indeed, similar evidence has led to the conclusion that a polymorphism in one of R. Lenski's long-term experimental lines, Ara-2 [18,49], evolved as a result of niche construction [15,50].
It is a hallmark of frequency dependence that one type's abundance creates the niche for another type's invasion. Although we cannot rule out clonal interference, the sequence of alternating invasions observed in the fossil records of our experimental lines is consistent with this process of reciprocal niche construction. In particular, as is apparent from Figures 3 and 4, the rise of the first FS mutations consistently following in the wake of the establishment of first SS mutations is conspicuous, and so is the rise of the SS-associated nadR mutations following the appearance of the first FS mutations. We note that the limited replication of this study prevents many rigorous statistical tests, so that many of our results can only be described qualitatively, not quantitatively. With the continuing rapid decline in the cost of sequencing data, it is quickly becoming feasible to carry out studies similar to ours with higher temporal resolution and across larger numbers of populations, which will make rigorous statistical analyses possible.
Nevertheless, it seems unlikely that the consistent pattern of alternating invasions observed in our three lines is due to chance alone, and given that the endpoint FS and SS strains coexist due to frequency dependence, it is tempting to conclude that the patterns of invasion reflect the action of frequency-dependent selection in the course of the evolution experiment. The observed diversification should then be viewed in the light of the theory of adaptive diversification due to frequency-dependent interactions [9]. It is worth noting that much (but not all) of this theory is based on the assumption of many mutations of small effect, and the basic theoretical phenomenon of evolutionary branching in particular is an essentially continuous process in phenotype space [6], which moreover is often presented as a symmetric pattern of diversification. In contrast, in our experimental lines diversification is obviously due to a few mutations of large effect, and the pattern of diversification is asymmetric in phenotype space [28]. However, many aspects of the theory of adaptive diversification are robust to introducing large mutational effects, and asymmetric evolutionary branching is entirely feasible [9]. Therefore, our experimental results can be seen as proof of this robustness, and as providing a full description of adaptive diversification at the genetic level, revealing parallel evolutionary dynamics, and thus a high degree of determinism, in the sympatric origin and subsequent divergence of ecologically distinct lineages.

Materials and Methods
We isolated clones from frozen samples of populations 18 and 19 from day 156 of the evolution experiment of Spencer et al. [28]. Frozen samples were inoculated into 10 mL of the growth medium, grown overnight at 37 °C with shaking, and spread onto agar plates. We arbitrarily chose 10 small colonies and 10 large colonies from each population and measured their growth profiles over 24 h as described in Spencer et al. [28]. From each population, we chose two large colonies with unambiguous SS growth profiles and two small colonies with unambiguous FS growth profiles for sequencing. For population 20, we used previously isolated clones [30], also from day 156 of the experiment. In this experiment, replicate populations were founded from isogenic lines of E. coli B and cultured in well-mixed conditions for 183 d (~1,230 generations) with daily (~6.7 generations) transfers to fresh medium (Text S1). Populations 18 and 20 were initiated with REL606, and population 19 with REL607 [51]. REL606 and REL607 perform similarly in the growth environment of the evolution experiment [28,51,52] [53] for use in all downstream analyses. All FASTQ files were deposited in the NCBI short read archive (accession: SRP017657). We identified SNPs and small (≤4 bp) indels and estimated their frequencies in the time point samples using both the main public server and local instances of Galaxy (details below) [54][55][56]. To identify larger indels and estimate their frequencies in the time point samples, we used BreSeq version 0.16 [57]. The sequence [58] of the ancestral strain REL606 (GenBank accession number NC_012967.1) was used as the reference for all mutation screens.
FASTQ files were first filtered for quality, retaining only those reads with ≤5 bases with quality scores <20. Reads were aligned to the reference genome using BWA version 0.5.9-r16 [59] with default settings and treating the reads as single-end, and variants were identified using SAMTools version 0.1.12-r862 [60]. For the 60 sequenced samples (12 clones and 48 time points), average coverage (over the 4,629,812 bp of the reference genome) ranged from 72× to 2,500×. For all 60 samples, >99% of the genome was covered by >30 aligned reads.
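The read-level quality criterion above can be illustrated with a minimal sketch. This is not the Galaxy workflow that was actually used; the function names, the four-line FASTQ record layout, and the Phred+33 quality offset are assumptions made for illustration (some Illumina pipelines of that period used a +64 offset instead).

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, separator, qualities


def count_low_quality(qual: str, min_q: int = 20, offset: int = 33) -> int:
    """Count bases whose Phred score is below min_q.

    The offset of 33 is an assumption (Sanger/Illumina 1.8+ encoding).
    """
    return sum(1 for ch in qual if ord(ch) - offset < min_q)


def filter_fastq(lines: Iterable[str], max_low: int = 5,
                 min_q: int = 20, offset: int = 33) -> Iterator[Record]:
    """Yield only reads with at most max_low bases of quality below min_q.

    Assumes well-formed input with exactly four lines per record.
    """
    it = iter(lines)
    # zip over the same iterator four times groups lines into records
    for header, seq, plus, qual in zip(it, it, it, it):
        record = (header.rstrip("\n"), seq.rstrip("\n"),
                  plus.rstrip("\n"), qual.rstrip("\n"))
        if count_low_quality(record[3], min_q, offset) <= max_low:
            yield record
```

Passing an open file handle as `lines` would stream the filter over a FASTQ file without loading it into memory.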
We report the frequencies of all variants that both appear in more than one time point sample (within the same population) and rise to at least 5% frequency in one or more of the samples. We also report the frequencies of variants that are found in the clonal samples, regardless of their frequency in the time point samples. Variants supported by a single read at a given time point are not reported unless supported by multiple reads in the next time point. We estimated the frequencies of large deletions (>4 bp) by manually inspecting all reads in which ≥10 bp matched each side of the deleted region. In a few cases, we were able to determine linkage between nearby mutations by examining individual Illumina reads that spanned both loci.
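The reporting criteria for SNPs and small indels can be summarized in a short sketch. It reflects one reading of the rules stated above and is not the code used in the study: the names `supported` and `reportable` and the per-time-point `(variant reads, total reads)` representation are hypothetical, and a single-read observation is assumed to count only when the following time point has at least two supporting reads.

```python
from typing import List, Tuple

# One (variant reads, total reads) pair per time point, in temporal order
Counts = List[Tuple[int, int]]


def supported(counts: Counts, i: int) -> bool:
    """Decide whether the observation at time point i counts as a detection."""
    alt, _ = counts[i]
    if alt >= 2:
        return True
    # A single read counts only if the next time point has multiple reads
    if alt == 1 and i + 1 < len(counts):
        return counts[i + 1][0] >= 2
    return False


def reportable(counts: Counts, in_clone: bool, min_freq: float = 0.05) -> bool:
    """Apply the reporting rules to one variant within one population."""
    if in_clone:
        # Variants found in clonal samples are reported regardless of frequency
        return True
    freqs = [alt / total
             for i, (alt, total) in enumerate(counts)
             if total > 0 and supported(counts, i)]
    # Must appear in more than one time point and reach min_freq at least once
    return len(freqs) > 1 and max(freqs) >= min_freq
```

For example, under these assumptions a variant with counts `[(0, 100), (3, 100), (10, 100)]` would be reported, whereas one seen only at a single time point would not unless it also occurred in a sequenced clone.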
To distinguish changes in allele frequency due to selection from those due to drift, we assume an effective population size (N_e) of 3.3 × 10^7, as estimated for E. coli grown in similar conditions [51]. Under the Wright-Fisher model [61,62], drift is a Markov chain, which generates a variance in allele frequency of pq/N_e after one generation (for haploids). After t generations, the variance is pq(1 − e^(−t/N_e)). If we assume p = q = 0.5 (which yields the fastest drift), the variance after 82 generations (the average time separating our time point samples) is 6.21 × 10^−7 (s.d. = 7.88 × 10^−4, or 0.08%). Using the normal approximation of the binomial, the probability that drift causes an allele frequency change ≥1% from one time point sample to the next is less than 1 × 10^−12. Thus, even accounting for multiple tests, the possibility that any of the changes in allele frequency that we discuss are caused solely by drift is remote.

Text S1
Supplementary information incorporating Supplementary Methods 1 (Materials and Methods details) and Supplementary Results 1 (nadR mutations found in the timelines of the fossil record, but not in the sequenced clones). (DOCX)
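The drift bound given in the Materials and Methods can be checked numerically with a few lines. This is a sketch of the stated calculation, not code from the study; the function names are hypothetical, and the probability is taken as two-sided under a normal distribution with the computed standard deviation.

```python
import math


def drift_variance(p: float, n_e: float, t: float) -> float:
    """Variance in allele frequency after t generations: pq(1 - e^(-t/N_e))."""
    q = 1.0 - p
    # -expm1(-x) equals 1 - e^(-x) but stays accurate for very small x
    return p * q * -math.expm1(-t / n_e)


def prob_change_at_least(delta: float, sd: float) -> float:
    """Two-sided normal tail probability P(|change| >= delta)."""
    z = delta / sd
    return math.erfc(z / math.sqrt(2.0))


# Values stated in the text: N_e = 3.3e7, t = 82 generations, p = q = 0.5
var = drift_variance(0.5, 3.3e7, 82)
sd = math.sqrt(var)
prob = prob_change_at_least(0.01, sd)
print(f"variance = {var:.3g}, s.d. = {sd:.3g}, P(change >= 1%) = {prob:.3g}")
```

With these inputs the variance and standard deviation agree with the values quoted in the text to three significant figures, and the tail probability falls far below the stated bound of 1 × 10^−12, since a 1% change corresponds to more than 12 standard deviations.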

Table S1
Mutations detected in all samples and their effect (if known) on the encoded protein. Predicted effects on amino acid sequences are classified as "Frameshift" (e.g., indels of one or two base pairs), substitutions (e.g., "R→H" indicates an arginine residue replaced with a histidine), or synonymous (e.g., "R→R"). Samples are indicated as population number/clone (FS or SS) or time point (TP, indicating that the mutation is found at >5% in one or more time point samples). Numbers after gene names indicate the affected codon. Gene names separated by a forward slash ("/") indicate a mutation in the intergenic region. (DOCX)