The First Steps of Adaptation of Escherichia coli to the Gut Are Dominated by Soft Sweeps

The accumulation of adaptive mutations is essential for survival in novel environments. However, in clonal populations with a high mutational supply, the power of natural selection is expected to be limited. This is due to clonal interference - the competition of clones carrying different beneficial mutations - which leads to the loss of many small effect mutations and fixation of large effect ones. If interference is abundant, then mechanisms for horizontal transfer of genes, which allow the immediate combination of beneficial alleles in a single background, are expected to evolve. However, the relevance of interference in natural complex environments, such as the gut, is poorly known. To address this issue, we have developed an experimental system which allows to uncover the nature of the adaptive process as Escherichia coli adapts to the mouse gut. This system shows the invasion of beneficial mutations in the bacterial populations and demonstrates the pervasiveness of clonal interference. The observed dynamics of change in frequency of beneficial mutations are consistent with soft sweeps, where different adaptive mutations with similar phenotypes, arise repeatedly on different haplotypes without reaching fixation. Despite the complexity of this ecosystem, the genetic basis of the adaptive mutations revealed a striking parallelism in independently evolving populations. This was mainly characterized by the insertion of transposable elements in both coding and regulatory regions of a few genes. Interestingly, in most populations we observed a complete phenotypic sweep without loss of genetic variation. The intense clonal interference during adaptation to the gut environment, here demonstrated, may be important for our understanding of the levels of strain diversity of E. coli inhabiting the human gut microbiota and of its recombination rate.


Introduction
Mutation is the fuel of evolution and beneficial mutations the driver of organismal adaptation. When a small group of organisms founds a new niche it will rely on de novo mutations to adjust to such novel environment. If the rate of emergence of new beneficial mutations is low, adaptation will proceed through the asynchronous accumulation of such mutations, one at a time. This will result in short-term polymorphism, during which the frequency of the beneficial mutation will rapidly change. Such strong selective sweeps will purge linked neutral variation, a phenomenon long recognized as the signature of selective sweeps in bacterial populations [1]. Thus, for reasonably small populations and for low mutation rates, population mean fitness will increase in steps, where at each step neutral variation becomes completely depleted. The adaptive walk will therefore proceed by discrete movements along the fitness landscape.
Over the years studies of microbial adaptation in different environments have repeatedly detected deviations from this simple pattern [2][3][4][5][6]. It is now much more commonly accepted that, in reasonably large microbial populations, many distinct adaptive mutations may arise and compete for fixation. The pattern of microbial adaptation has therefore been supportive of an evolutionary mechanism described in the sixties -the Hill-Roberston effect [7]. This has been coined clonal interference (CI) in the context of clonal bacterial populations (reviewed in [8]). This effect is theoretically expected to limit the speed of adaptation in asexual versus sexual populations [9]. A great number of beneficial mutations are lost and the distribution of mutations that fix can be greatly affected by interference [8,10,11].
But how strong is CI and what consequences does it entail? While it has been shown to occur in bacteria [5,12] and eukaryotes [3,13] in laboratory settings, its relevance in natural environments is poorly known. Interestingly, CI has recently been inferred to be an important determinant of the evolution of the influenza virus [14]. Several examples from the study of rapid adaptation in natural populations have also shown the violation of the classical hard selective sweep model as well as the assumption of the mutation limited regime of adaptation. In contrast multiple adaptive alleles at the same locus can sweep through the populations at the same time, a phenomenon known as soft sweeps. These alleles can emerge from de novo mutation or from standing genetic variation (see [15] for a revision). The phenomenon of CI is theoretically expected to impact the dynamics of adaptation if the effective population size (N e ) and/or the rate of occurrence of beneficial mutations (U b ) is large. More specifically, when the number of competing mutations is bigger than one, that is if 2N e U b Ln(N e s b /2) .1 (where s b is the mean effect of a beneficial mutation [8]). Desai and Fisher [16] made the case that in extremely large populations with very high mutational inputs, haplotypes with multiple beneficial mutations are expected to arise and increase in frequency. Furthermore, important advances of the theory of CI have also recently been made [11,17,18]. Since the parameter values important to determine the importance and level of interference are expected to be dependent on the environment, the relevance of CI for bacterial evolution in natural conditions remains to be demonstrated.
E. coli K-12 has been an important model organism in many fields of biology, since its isolation from human feces in 1922 [19]. Accumulation of mutations during its adaptation to the gut has been observed [20][21][22], but the fitness landscape characteristic of this environment as well as the extent of interference E. coli experiences is not known.
We have studied the process of accumulation of beneficial mutations and the strength of their effects in a natural environment of E. coli, the mouse gut. We found that CI is pervasive in vivo and described the genetic basis of the initial steps of adaptation, which were observed to exhibit a striking parallelism. Remarkably we have found that in the same population, distinct mutations with equivalent functional effects (i.e., targeting the same gene or operon) reach detectable frequencies simultaneously. This leads to the occurrence of a phenotypic hard sweep without loss of variation at the genetic level. However, most of these mutations get extinct after a few generations, a signature of soft sweeps [23].

Results and Discussion
Dynamics of adaptation of E. coli through changes in the frequency of a neutral marker A genetically homogeneous population of E. coli, except for a chromosomally encoded fluorescence marker (see Material and Methods), was used to trace the occurrence of different adaptive mutations [2] and determine the strength of their fitness effects. This population, composed of equal amounts of two subpopulations expressing either a yellow (YFP) or cyan (CFP) fluorescent protein, was used to inoculate inbred mice orally. Subsequently, we followed fluorescent-markers frequency from daily collected fecal samples, for 24 days. The experiment was repeated three times to a total of 15 mice housed individually, where E. coli adaptation was followed.
The adaptive dynamics observed in 15 independently evolving populations are shown in Figure 1A (see also Figure S1 for the E. coli loads over time). Some of the dynamics observed, exhibit a typical signature of CI. An initial increase in frequency of one neutral marker, followed by replacement by the subpopulation bearing another (e.g. Figure 1A, population 1.9) was diagnostic of CI. Presumably, the markers hitchhike with successions of beneficial mutations [2]. We also observed cases in which the frequencies of the markers remained stable, ( Figure 1A, populations 1.6 and 1.8), for up to 24 days. This time corresponds to approximately 400 generations, if we take the 80 minute generation time estimated in the gut [24]. Stable marker dynamics are expected under two completely opposing scenarios: neutral evolution due to the lack of occurrence of adaptive mutations, or very intense CI caused by strong mutations of similar effect occurring in the different fluorescence backgrounds simultaneously. To distinguish between these scenarios and to further show that strong adaptive mutations are occurring, we performed in vivo competitive fitness assays of several clones with the ancestral strain. We tested clones from populations where a clear signal of adaptation was detected (populations 1.12 and 1.13 in Figure 1A) or was not (1.6 and 1.8 in Figure 1A). All clones tested were found to carry beneficial mutations ( Figure 1B). As expected, the clones isolated after a clear sweep ( Figure 1B, populations 1.12 and 1.13), showed a fitness increase. Importantly, clones sampled from the populations in which only slight changes in marker frequency were detected also showed a fitness increase ( Figure 1B, populations 1.6 and 1.8). This demonstrates that the evolution process involved multiple adaptive mutations competing for fixation. Consistent with the small changes in marker frequency observed, the competitive fitnesses were similar for clones isolated from the same evolving population, but bearing distinct neutral markers (Mann-Whitney test P = 0.7 population 1.6, P = 0.4 population 1.8).

The distribution of effects of mutations that contributed to adaptation
From the dynamics of neutral markers one can try to estimate the important evolutionary parameters of the adaptive process: mutation rate and selective effects [2,[25][26][27]. One method that has been used [25] summarizes neutral marker dynamics in replicate experiments in two statistics: i) the time it takes for one of the markers to first change significantly in frequency and ii) the slope associated with that frequency change. The time for the first marker deviation corresponds to the time where the natural logarithm of the marker ratio significantly changes from the initial ratio. The slope associated with a frequency change is the slope of the linear regression of the natural logarithm of the ratio between the two markers after a significant frequency change is detected.

Author Summary
Adaptation to novel environments involves the accumulation of beneficial mutations. If these are rare the process will proceed slowly with each one sweeping to fixation on its own. On the contrary if they are common in clonal populations, individuals carrying different beneficial alleles will experience intense competition and only those clones carrying the stronger effect mutations will leave a future line of descent. This phenomenon is known as clonal interference and the extent to which it occurs in natural environments is unknown. One of the most complex natural environments for E. coli is the mammalian intestine, where it evolves in the presence of many species comprising the gut microbiota. We have studied the dynamics of adaptation of E. coli populations evolving in this relevant ecosystem. We show that clonal interference is pervasive in the mouse gut and that the targets of natural selection are similar in independently E. coli evolving populations. These results illustrate how experimental evolution in natural environments allows us to dissect the mechanisms underlying adaptation and its complex dynamics and further reveal the importance of mobile genetic elements in contributing to the adaptive diversification of bacterial populations in the gut.
These summary statistics lead to estimation of two effective evolutionary parameters (U e and s e ) [2]. Under intense CI U e (the effective genomic mutation rate) will be an underestimate of U b , and s e (the effective selection coefficient of beneficial mutations) an overestimate of s b [27]. For the dynamics in Figure 1, the summary statistics imply U e ,7610 27 and s e ,0.075.
We also have used a recently developed method [26] to estimate the fitness of the haplotypes that segregate at sufficiently high frequency to lead to changes in marker dynamics. This method takes the frequencies of the neutral markers across all the time points of the experiment and allows for the occurrence of CI. We started by estimating the distribution of fitness effects of the minimum number of haplotypes, as proposed by Illingworth and Mustonen [26], to fit the marker frequency data. Assuming the simplest possible model of Darwinian selection, with constant selection across space and time (admittedly an oversimplified view of the gut), we fitted a series of models, which differ in the number of beneficial mutations. Starting from the simplest minimal model where only one beneficial mutation occurs, we sequentially increase the number of mutations until a maximum of five. For each model this approach fits, by maximum likelihood, the observed frequency changes at a marker locus with two alleles (our fluorescent alleles) and then chooses the simplest possible model. The predictions described faithfully the empirical data in terms of marker frequencies (lines in Figure 1A). With this method the distribution of haplotype fitnesses across all populations, can be determined ( Figure 2). The inferred fitness effects of segregating haplotypes are quite large with a mean of (15%) and some haplotypes leading to increases in fitness of more than 30%. We note that given the intense CI observed, this approach misses some of the mutations contributing to the dynamics (see below).

Genetic basis of the initial adaptations to the mouse gut
To characterize genetically the first steps of the adaptive process in vivo, we performed whole genome sequencing (WGS) of the ancestral and independently evolved clones. Each evolved clone was isolated from a different mouse after colonization for 24 days (,400 generations). The mean number of mutations per evolved clone was 2.3, with a minimum of 1 and a maximum of 4 ( Figure 3 and Table S2, see also Table S1).
The genetic analysis showed that similar and parallel adaptive paths were taken during the initial adaptation to the gut ( Figure 3, Table S2). The most striking recurrent event was detected at the level of mutations in the gat operon, which occurred in all the sequenced clones (79% IS insertions and 21% small indels). The gat operon consists of six genes that collectively allow for galactitol metabolism [28]. We found that galactitol had an inhibitory effect on the ancestral strain when grown in minimal medium with glycerol ( Figure S2). However in all the evolved clones (carrying  mutations either in the coding region of gatA, gatC, gatZ or in the regulatory region of gatY (see Figure 3)) this inhibition was surpassed ( Figure S2). All these clones exhibited a gat-negative phenotype, inability to metabolize galactitol. Since galactitol is part of the host' metabolism of galactose, E. coli might be frequently exposed to this compound in the gut. This may have selected for the gat-negative phenotype.
Parallelism at the gene level was also observed: four clones carried mutations in srlR, which is a DNA-binding transcription factor that represses the srl operon involved in sorbitol metabolism [29]. At least two of the mutations inactivated the gene, since these caused a frameshift and a stop codon (Table S2). The inactivation of srlR leads to the constitutive expression of the srl operon [29], thus these mutants are expected to readily consume sorbitol, even at low concentrations. This might represent an advantage when facing limiting nutrients and strong competition, compatible with the hypothesis that the ability to use several limiting carbon sources is an important advantage for E. coli inhabiting the mammalian gut [30,31].
Two other parallelisms in the regulatory regions of dcuB (three clones) and focA (two clones) were observed. Both of these genes encode transmembrane transporters whose expression occurs almost exclusively in anaerobic conditions. The Dcu system mediates the uptake, exchange and efflux of C 4 -dicarboxylic acids [32]; fumarate is a C4-dicarboxylate acid and is the most important anaerobic electron acceptor for E. coli in the intestine [33]. Additionally, focA is a putative transporter of formate [34] and is the ''signature compound'' of E. coli anaerobic metabolism [35]. During anaerobic growth, pyruvate is cleaved into formate that is subsequently used as a major electron donor for the anaerobic reduction of fumarate [36] and nitrate or nitrite [37]. Hence, given their roles in anaerobic respiration it would be expected that dcuB and focA are under selective pressure in the gut and that the mutations observed may be important for regulating anaerobic respiration of E. coli in the gut.
Interestingly, one clone carried a non-synonymous mutation in arcA, a global regulator that governs respiratory flexibility of E. coli. arcA allows controlled switching between aerobic and anaerobic respiratory genes during fluctuating oxygen conditions, a trait crucial for E. coli to efficiently compete with the gut flora composed essentially of anaerobes [33,38].
Previous studies have found increased expression of dcu, gat and srl regulons during growth of E. coli on mucus [20]. Together with our findings of mutations on these genes, suggests that these regulons are under strong selection.
One common adaptation in E. coli reported to occur in the mouse gut is decreased motility [21,31,39], which typically occurs in strains that are hypermotile as a result of an IS element upstream of the flhDC operon. Since our strain is not hypermotile, it is not surprising that mutations in this region were not observed in our study.
Taken together, our genetic analyses revealed that the initial adaptive steps of E. coli to the mouse gut involved gene inactivation or modulation through IS elements. In terms of the relative contribution of regulatory versus coding regions to adaptation [40,41], we found that half of the IS insertions occurred in regulatory regions and one third of all mutations were located in these regions. We therefore observe that both changes in regulatory and coding regions contributed to E. coli adaptation to the gut. We also note that the first step of adaptation involved loss of function mutations (namely in the gat operon), which supports the recent view that null mutations may play an important role in early adaptation to new environments [42].
Previous studies [4,43] focusing on evolution of E. coli populations in glucose-limited chemostats, a device used to simulate the human gut [44], during a similar period of time (26 The genes dcuB, srlR and focA and one operon (gat) are highlighted. These represent regions of parallel mutation in at least two genomes. The genomic context of these mutations is represented on the right. (reg) after the gene name, means that the regulatory region, rather than the coding region, was affected. Numbers above marked mutations represent the number of times a particular mutation was detected at the same position. doi:10.1371/journal.pgen.1004182.g003 days), found a high degree of parallelism just as we observe here. Furthermore extensive phenotypic diversity was observed [43] and whole genome sequencing identified similar numbers of substitutions and a contribution of IS elements to the adaptive process [45]. However the targets of selection in glucose-limited chemostats were very different from those found during its adaptation to the gut, reflecting the distinct specific selective pressures E. coli is subjected to in the different environments.
Phenotypic hard sweeps and genetic soft sweeps at the gat operon Given the striking parallelism that occurred in the gat operon, through knockout mutations, we sought to determine the timing of its emergence in each of the populations. We followed the dynamics of the gat-negative phenotype during adaptation ( Figure 4). In eight populations the advantageous mutation could be detected just 2 days post-colonization. In one population an extraordinarily rapid phenotypic sweep could be seen: a population where initially the majority of clones were gat-positive changed completely its phenotype within 2 days (Figure 4, population 1.5), showing the strong benefit associated with this phenotype.
In most populations the increase in frequency of the gat-negative phenotype was followed by a change in marker frequency ( Figure  S3A), consistent with the expected dynamics of a strong beneficial mutation occurring in a given fluorescent background sweeping to fixation. However in some populations fixation of the gat-negative phenotype was observed without fixation of the neutral marker ( Figure S3B). This suggests that independent mutations causing the same phenotype have occurred, i.e. clonal competition emerging from multiple alleles at the same locus with similar selective effects is driving these dynamics.
We next sought to estimate the strength of selection associated with new beneficial alleles that emerge and escape stochastic loss from the initial change in frequency of the gat-negative phenotype. If we take population 1.12, where only one allele was detected in the gat operon, we can estimate the selective advantage conferred by this mutation (s gat ) from the slope of the Ln(x/(1-x)) along time, where x is the frequency of the gat-negative phenotype. When doing so we find that s gat = 0.07 (60.01) (Figure 4 inset). In the other populations the strength of selection is more difficult to estimate due to the emergence of many distinct alleles, which leads to an underestimate of their beneficial effect due to CI.
To determine directly the selective advantage of gat alleles we performed in vivo competitions against the ancestor. As shown in Figure S4, the mutant gatZ (clone 4YFP) had, on average, an advantage of 0.065 (60.013) over the ancestral, which is compatible with that inferred from the dynamics of the gatnegative phenotype.
To gain further insight into the regime of interference detected by the change in frequency of the neutral markers, we studied polymorphism levels of four of the adapting populations. We followed the change in frequency of haplotypes at different points in time in two populations that showed a rapid change in marker frequency (populations 1.5 and 1.12) and in two other populations where the neutral marker was kept polymorphic throughout the period studied (1.1 and 1.11) (Tables S3 to S6). Each haplotype was the combined genotype at 7 selected loci: gatY, gatZ, gatA, gatC, srlR, dcuB and focA. These loci were previously observed to be mutational targets in two or more of the sequenced independently evolved clones (see Fig. 3), which we interpret as important players in adaptation to the gut, and thus likely exhibiting segregating polymorphisms in most populations. We note that we determined the haplotype structure of a sample of clones (between 20 and 40 clones per time point, per population) based on the mutations found in the sequenced clones, that is, we looked for IS insertions in gatY, gatZ, gatA, gatC, dcuB and focA, and SNPs or small indels in gatZ, gatC and srlR (see Methods). The presence of an extra mutation, a duplication of ,150 Kb, was also tested in population 1.12. Figure 5 shows the dynamics of these haplotypes in the populations studied (see also Figure S5 for the sums of the frequencies of genotypes with possible equivalent phenotypes over time). Extensive haplotype diversity is found in all populations but to a lesser extent in populations 1.12 and 1.5. For example, in the first ,100 generations we observe 16 and 8 haplotypes segregating in populations 1.1 and 1.11, but only 3 and 5 in populations 1.12 and 1.5, respectively. This observation is concordant with the neutral marker dynamics, where populations 1.12 and 1.5 showed a rapid sweep of one of the markers while the other two (1.1 and 1.11) maintained the ratio of these markers closer to 50% for a longer period. The first mutation detected, without exception, impairs galactitol metabolism, thus surpassing the vital inhibitory effect of this compound on E coli' growth. We observed multiple soft sweeps of mutations, with possibly similar fitness benefit, occurring independently in the gat operon and competing for fixation. We directly tested for this by performing competitions between clones carrying different gat alleles: mutations in gatZ against gatC. The results of these competitions show an equivalent selective coefficient of at least these two mutants ( Figure S6), supporting the hypothesis that the different mutations in the gat operon present in the populations confer the same phenotype and fitness advantage. This provides an explanation for the observed coexistence between different alleles at the same locus and the phenotypic replacement detected.
After the mutations in the gat operon secondary adaptive mutations occurred and promoted the increase in the frequency of haplotypes carrying more than one beneficial mutation. Overall, the adaptive process appears non-mutation limiting, allowing polymorphism levels within populations to be kept high until the end of the period studied. A closer look to the haplotype dynamics points to the strong possibility of further mutations occurring beyond the ones we have targeted. For example, at generation 126 in population 1.12, the increase in frequency of the neutral allele yfp seems to be caused by some other mutation besides gatC. Furthermore the occurrence of a large duplication in the same population appears to cause a beneficial effect. To gain access to further beneficial mutations, we performed whole genome sequencing of this (1.12) and two other populations (1.11 and 1.10) at the end-point of the experiment (see Table S7). This procedure aimed at querying the possibility of beneficial mutations at other loci significantly contributing to the adaptive process. In all populations we observed multiple mutations in some of the targets previously identified (gat operon, srlR and focA) as well as several other loci that were targets for mutation unique to a single population. The adaptive value of these last events is difficult to ascertain since these might have increased in frequency to detectable values due to linkage with other beneficial mutations or just by chance. This analysis allowed uncovering two new parallel mutational targets. One of these targets was the regulatory region of yjjP where an IS insertion occurred in population 1.11 and also on clone 3YFP (from population 1.3) and the other target was asnA, where either a deletion or a duplication of around 50 bp was observed in two different populations (1.11 and 1.12). yjjP codes for an inner membrane structural protein of unknown function [46] and asnA codes for one of the two asparagine synthetases in E. coli, catalyzing the ammonia-dependent conversion of aspartate to asparagine [47].

Potential role of epistasis in the adaptive process
Overall, the haplotype analysis found ,6% of triple mutants. These were all gat-negative and carried either a srlR and dcuB mutation or a srlR and focA mutation. dcuB and focA were never found in combination, suggesting that they may interact epistatically. This can be possibly due to their common involvement in anaerobic respiration modulation, thus fulfilling related functions. This last observation suggests a role for negative epistatic interactions whereby mutations are beneficial individually but deleterious when in combination. The fact that repeated selection of dcuB or focA mutations was observed, but these mutations were never found in the same genetic background, may indicate a disadvantage of the double mutant carrying mutations that individually are presumably beneficial. Another example of possible epistasis is the case of the mutations in the gat operon. These occur very rapidly and lead to the inactivation of the gat operon, conferring an estimated advantage of approximately 7%. It is therefore expected that additional mutations subsequently occurring in other gat genes would not confer a strong or any advantage, since the first mutation already resulted in a loss of function.

Test for negative frequency-dependent selection
Negative frequency dependent selection, i.e., fitness advantage of clones when at low frequency and fitness disadvantage when at high frequency, is known to lead to maintenance of genetic diversity [12,43,48]. Since the gut is a complex environment [49] where non-transitive ecological interactions may occur, we tested for variation in fitness effects with initial frequency of evolved clones. Negative frequency-dependent interactions could explain the long-term maintenance at low frequency of some of the fluorescent subpopulations observed in the adaptive dynamics ( Figure 1A, populations 1.5, 1.12 and 1.13). We tested population 1.13 for this type of selection, where CFP marker increased till 0.95 but never achieved fixation. In Figure S7 we show the results of the tested adapted clones in competitive fitness assays at different initial frequencies. While we observed that clones have an advantage when rare we also observe that such selective advantage is present when at high frequency. No disadvantage was found at any of the initial frequencies tested, which suggests that negative frequency dependent selection is not a major force operating in this population.

No evidence for emergence of mutators
Our finding that mutations of large beneficial effects can accumulate rapidly in vivo suggested to us that mutators could have emerged. Mutators may play an important role in bacterial evolvability and their emergence typically involves mutations in genes of the DNA repair system [50]. Depending on the gene mutated a mutator bacteria can acquire an increase in its genomewide mutation rate of 10 up to 10 000 fold [51]. Indeed cocolonization of wild-type and mutator bacteria in germ free mice [52] has shown that mutators can increase in frequency by hitchhiking with beneficial mutations to which they are genetically linked to. Furthermore mutators have been observed to sometimes emerge de novo in the early stages of adaptation to glucose-limited environments in laboratory experiments [53,54]. We therefore tested for an increased rate of mutation through fluctuation assays of several adapted clones. These tests however provided no evidence for significant increases in genomic mutation rate ( Figure  S8).

Conclusions
In conclusion, we demonstrate here that E. coli MG1655 adapts very rapidly to the intestine of streptomycin-treated mice. The first steps of adaptation of E. coli to the mouse gut are dominated by soft sweeps and adaptive mutations of large effect. We observed a regime of intense clonal interference where haplotypes carrying more than one beneficial mutation compete for fixation. This is the regime assumed under the Fisher-Muller theory that predicts that recombination speeds up adaptation by reducing competition between beneficial mutations. This data therefore provides support for the Fisher-Muller effect as an important mechanism for the maintenance of homologous gene recombination in bacteria [55]. It shows that E. coli can adapt to this complex ecosystem very fast: within two days a phenotypic replacement could be detected. The first beneficial mutations targeted the same gene or operon and transposable elements made a substantial contribution to adaptation. These results demonstrate the remarkable adaptive potential of bacteria in one of the most complex environments of their natural ecology and may have important consequences for our understanding of the species and strain diversity in the gut microbiota.

Ethics statement
All experiments involving animals were approved by the Institutional Ethics Committee at the Instituto Gulbenkian de Ciência (project nr. A009/2010 with approval date 2010/10/15), following the Portuguese legislation (PORT 1005/92), which complies with the European Directive 86/609/EEC of the European Council.
Strains DM08-YFP and DM09-CFP (MG1655, galK::YFP/CFP amp R (pZ12), str R (rpsl150), DlacIZYA) were used in the initial colonization experiment. These strains contain the yellow (yfp) or cyan (cfp) fluorescent genes linked to amp R in the galK locus under the control of a lac promoter and were obtained by P1 transduction from previously constructed strains [2]. To ensure constitutive expression of the fluorescent proteins the lac operon was deleted using the Datsenko and Wanner method [56].

Fluorescent marker dynamics during mouse colonization
To study E. coli adaptation to the gut we used a streptomycintreated mouse colonization model [49]. Briefly, 6-to 8-week old C57BL/6 male mice raised in specific pathogen free (SPF) conditions were given autoclaved drinking water containing streptomycin (5 g/L) for one day. After 4 hours of starvation for water and food the animals were gavaged with 100 ml of a suspension of 10 8 colony forming units (CFUs) of a mixture of YFP-and CFP-labeled bacteria (ratio 1:1) grown at 37uC in brain heart infusion medium to OD 600 of 2. After gavage, the animals were housed separately and both food and water containing streptomycin were returned to them. Mice fecal pellets were collected for 24 days, diluted in PBS and plated in Luria Broth agar (LB agar). Plates were incubated overnight and the frequencies of CFP-or YFP-labeled bacteria were assessed by counting the fluorescent colonies with the help of a fluorescent stereoscope (SteREO Lumar, Carl Zeiss). Plating the fecal samples allowed checking for morphological phenotypic diversity of the colonies, a phenomenon reported previously in previous studies of adaptation of E. coli to the mouse gut [20][21][22]31,39]. However, no consistent changes in colony morphology were observed during the evolution experiments.
A sample of each collected fecal pellet was daily stored in 15% glycerol at 280uC for future experiments.
For the initial colonization a group of 5 mice was gavaged and followed for 24 days, and this procedure was repeated for two more groups of 5 mice. Thus a total of 15 mice were analyzed ( Figure 1A and S1).

Competitive fitness assays in vivo to test for adaptation
We measured the relative fitness in vivo by bacterial clones isolated from mouse fecal samples after 24 days of colonization (approximately 400 generations). Individual colonies or mixtures of approximately 30 colonies with the same fluorescent marker (population of clones) were picked from mouse fecal platings, grown in 10 ml of Luria Broth medium (LB) supplemented with ampicillin (100 mg/ml) and streptomycin (100 mg/ml) and stored in 15% glycerol at 280uC.
The isolated clones were competed against the ancestral strain labeled with the opposite fluorescent marker, at a ratio of 1 to 1, in 1-2 day co-colonization experiments following the same procedure described for the evolution experiments. The selective coefficient (fitness gain) of these clones in vivo (presented in Figure 1B) was estimated as: where s b is the selective coefficient of the evolved clone, Rf ev/anc and Ri ev/anc are the ratios of evolved to ancestral bacteria in the end (f) or in the beginning (i) of the competition and t is the number of generations per day. We assume t = 18, in accordance with the 80 minute generation time estimated in previous studies on E. coli colonization of streptomycin-treated mouse [24,49,57].

Estimation of rate of beneficial mutations
To infer the effective evolutionary parameters (U e and s e ) we used the method described in [25], available online at http:// barricklab.org/twiki/bin/view/Lab/ToolsMarkerDivergence. This procedure involves three main steps: simulate families of marker trajectories at many combinations of mutation rate (U) and selection coefficient (s), summarize the simulated and experimental marker trajectories in the statistics t e and a e (that represent the time of divergence of marker frequency and the rate of divergence, respectively); compare the distributions of t e and a e of the simulated dynamics and the experimental data and find the values of U and s that best explain the experimental marker dynamics.
Using the model described in [25] we simulated sets of 100 replicate populations evolved under different parameter combinations of U and s. U ranged from 10 28 to 10 24 (with increments of 10 28 , 10 27 , 10 26 , 10 25 and 10 24 ) and s ranged from 0.07 to 0.1 (with increments of 0.005). We assumed a population size of 10 7 , based on the weight of feces per day (1.560.12 (2 s.e.m.) grams) that a mouse produces, on the observed number of CFUs (per gram of feces) along the experiments (which remained stable over the length of the experiment around 10 8 , see Figure S1), and on the number of generations per day, which is 18 [24]. A population size of the same order of magnitude was measured by Leatham-Jensen et al. [58] in the intestinal mucus, where the majority of bacterial division occurs [49].
This simulated data consist of the logarithm of the ratio of the two marker frequencies (Rf = f YFP (ti)/(f CFP (ti)) at several time points (ti), (ln(Rf(ti))).
We then used both the simulated and experimental marker dynamics as input in the program marker_divergence_fit.pl, that summarizes the evolutionary dynamics in two statistics: t e and a e . The first is the time, t e , where a significant deviation of ln(Rf(t e )) from ln(Rf(t = 0)) occurs. The second is the rate of change of ln(Rf(t)) with time, that is, t e sets the time of divergence of marker frequency and a e the rate of divergence. Each replicate population is summarized by a single value of t e and a e , and the n replicate populations (characterized by a given combination (U, S)) result in a distribution of T(t e ) and A(a e ). These distributions are then compared, using the program marker_divergence_significance.pl, to the distributions of t e and a e that summarize the experimental data To(t e ) and Ao(a e ) using a two-dimensional Kolmogorov-Smirnov to test the fit between the simulated data and the pseudoobserved data. The combination (U, S) that gives rise to the highest P value is taken as U e and S e , even when the hypothesis that the distributions are different cannot be rejected.

Estimation of distribution of haplotypes fitness
To estimate the fitness effects of beneficial mutations that may contribute to adaptation we used a maximum likelihood approach to infer the number, fitness effect and time of establishment of beneficial mutations in an asexual population from individual marker frequencies trajectories [26]. This method, which is available on web page: http://www.sanger.ac.uk/resources/ software/optimist/, makes no a priori assumptions about the distribution of mutation effects. It has been tested against simulated data and has been shown to be able to: reproduce complex trajectories of neutral markers, make accurate estimates of the distribution of haplotype fitness effects, and provide good estimates of the number of mutations segregating at high frequencies, at least for values of mutation rates that are not very large. We note that this method may miss many small effect beneficial mutations if these indeed occur at high rates.
It assumes an initial population of cells comprising two haplotypes with equal fitness, corresponding to two neutral markers. A beneficial mutation occurring in one of the backgrounds is represented as a new haplotype, which is assumed to exist at a given time (Tb i ) and with a frequency of 0.001 in the population. Since this new haplotype has increased fitness (W i ) relative to the initial wild-type population, its frequency will increase, leading to an increase in frequency of the fluorescent marker population in which the mutation has occurred. The simplest model that can be assumed is that where a single beneficial mutation occurs. That will imply a given time of establishment and effect of the mutation that maximize the probability of observing the marker frequency data. The second model that is considered is one where two mutations can occur, which will lead to a given likelihood of the data. Models assuming different numbers of beneficial mutations can thus be compared based on their likelihood score using Akaike information criteria [26]. With this method we obtain a distribution of haplotype fitnesses with the minimal number of mutations that can explain the marker frequency dynamics (v h ).
Whole genome re-sequencing and mutation prediction Clone analysis: After 24 days of gut colonization one clone from populations 1.1 to 1.14 and the two ancestors (MG1655-YFP and MG1655-CFP) were isolated and used to seed 10 mL of LB (Line 1.15 was not analyzed since the mouse from this line died at that time point). These cultures were then grown at 37uC with agitation. Subsequently DNA was isolated following a previously described protocol [59]. The DNA library construction and sequencing was carried out by BGI. Each sample was pair-end sequenced on an Illumina HiSeq 2000. Standard procedures produced data sets of Illumina paired-end 90 bp read pairs with insert size (including read length) of ,470 bp. Genome sequencing data have been deposited in the NCBI Read Archive, http:// www.ncbi.nlm.nih.gov/sra (accession no. SRP017347). Mutations were identified using the BRESEQ pipeline [60]. To detect potential duplication events we used ssaha2 [61] with the pairedend information. This is a stringent analysis that maps reads only to their unique match (with less than 3 mismatches) on the reference genome. Sequence coverage along the genome was assessed with a 250 bp window and corrected for GC% composition by normalizing by the mean coverage of regions with the same GC%. We then looked for regions with high differences (.1.4) in coverage. Large deletions were identified based on the absence of coverage. For additional verification of mutations predicted by BRESEQ, we also used the software IGV (version 2.1) [62].
Ancestral genome: The sequence reads from MG1655 were mapped to the reference strain [63]. The two ancestors carried the mutations listed in Table S1. The mutations underlined were present in the YFP ancestor but not in the CFP. The sequences of the 14 sequenced clones were then interrogated against this ancestral genome and the mutations identified are listed in Table S2.
Population analysis: DNA isolation was obtained in the same way as described above for the clone analysis except that instead of one clone per population a mixture of .1000 clones per population was used. Three populations were sequenced: 1.10, 1.11 and 1.12.
The DNA library construction and sequencing was carried out by the IGC genomics facility. Each sample was pair-end sequenced on an Illumina MiSeq Benchtop Sequencer. Standard procedures produced data sets of Illumina paired-end 250 bp read pairs. The mean coverage per sample was as ,260x for population 1.10, ,160x for population 1.11 and ,100x for population 1.12. Genome sequencing data have been deposited in the NCBI Read Archive http://www.ncbi.nlm.nih.gov/sra (accession no. SRP033025). Mutations were identified using the BRESEQ pipeline with the polymorphism option on (see Table  S7). The default settings were used except for: a) require a minimum coverage of 3 reads on each strand per polymorphism; b) eliminate polymorphism predictions occurring in homopolymers of length greater than 3; c) eliminate polymorphism predictions with significant (p-value,0.05) strand or base quality score bias. All predicted polymorphisms were manually inspected using IGV.
Test for the advantage of the evolved clones in the presence of galactitol All clones that were sequenced carried mutations in the gat operon. To test for any fitness advantage of these evolved clones, we measured their growth in M9 minimal medium (MM) supplemented with glycerol and with glycerol and galactitol, and compared growth in both conditions. All growth curves were conducted in 96-well plates incubated at 37uC with aeration (Thermoshaker PHMP-4, Grant). After a first overnight growth in MM with glycerol (0.4%) the cultures were normalized to the same OD 600 (approximately 0.1) and diluted 100-fold. We used 5 ml of the 10 22 dilution to inoculate in triplicate wells containing MM supplemented either with glycerol (0.4%) or glycerol (0.4%) and galactitol (0.4%). Optical density at 600 nm was monitored for 36 hours using a microplate reader (Victor3, PerkinElmer). The results are shown in Figure S2.

Emergence and dynamics of mutations in the gat operon
To investigate the dynamics of appearance and expansion of beneficial mutations in the gat operon we determined the frequency of bacteria unable to metabolize galactitol (gat-negative phenotype) within a given population of the evolution experiment.
For each population, a sample of the frozen fecal pellets was diluted in PBS and plated in Mac Conkey agar supplemented with 1% of galactitol and streptomycin (100 mg/ml); a differential medium used to monitor the ability of bacteria to use galactitol. Plates were incubated for 20 hours at 28uC. The frequency of galactitol mutants for each time point was estimated by counting the number of white (galactitol mutants) and red colonies in Mac Conkey-galactitol plates.
Identification of adaptive mutations and estimate of haplotype frequencies in populations 1.1, 1.5, 1.11 and 1.12 during 24 days of adaptation to the mouse gut In order to identify the adaptive mutations and estimate the haplotype frequencies, between 20 and 40 clones were analyzed per time point per population. These clones were obtained by plating directly the frozen fecal samples and thus correspond to samples representative of the evolving E. coli population. The presence of the adaptive mutations was queried by target PCR. The increase in size of the PCR product was indicative of the presence of an IS element. The detection of SNPs was done by target Sanger sequencing using the primers listed in Table S8. PCR reactions were performed in a total volume of 50 ml containing 1 ml bacterial culture, 10 mM of each primer (forward and reverse), 200 mM dNTPs, 1 U Taq polymerase and 1X Taq polymerase buffer. The PCR reaction conditions were as follows: 95uC for 3 min followed by 34 cycles of 95uC for 30 s, 60uC for 30 s, 72uC for 2 min, followed by 72uC for 5 min. srlR, gatC and gatZ were sequenced directly from the PCR products using the primers listed in Table S8.
To detect an insertion of 1 bp in gatC, the gene was PCR amplified and then submitted to an enzymatic restriction with MvaI enzyme. The mutant was identified based on the fragment restriction profile.
The 150 Kb duplication was detected by amplification of the new junction formed by the tandem duplication. The PCR was performed using the same conditions described before and the primers listed in Table S8.
Direct estimation of the advantage of the gat alleles by competition To determine the advantage of the gat alleles we performed in vivo fitness assays. The competition experiments lasted for three days and were performed according to the protocol described above for the evolution experiment.
In the first set of competitions ( Figure S4) we competed a clone genetically similar to the ancestral but bearing an IS insertion in the gatZ gene (sequenced clone 4) against the ancestral. In the second set of competitions ( Figure S6) we competed the gatZ mutant against a gatC mutant (derived from sequenced clone 12). In addition to a 1 bp insertion in the coding region of the gatC gene, this clone originally carried an unstable 150 kb duplication, lost during in vitro manipulations to create the mutant gatC.

Testing for negative frequency dependent selection
To test for this form of selection we isolated around 30 YFP and 30 CFP clones from fecal samples after 24 days of colonization of population 1.13. We then performed competitive fitness assays in vivo. The mixture of the 30 YFP clones was competed against the mixture of the 30 CFP clones at three initial ratios (1:1, 10:1 and 1:100) for 2 days, using the procedure described for the evolution experiments. The results are shown in Figure S7.

Test for increased mutation rate
To test for the possible emergence of mutators during adaptation to the gut we determined the frequency of rifampicin-resistant mutants, in each of the evolved populations. We grew pre-cultures of the evolved populations by inoculating in 10 ml of LB a sample of each population from the last day of the experiments (approximately 400 generations of gut colonization). Pre-cultures were grown overnight at 37uC with aeration in the presence of ampicillin (100 mg/ml) and streptomycin (100 mg/ml). The pre-cultures were then diluted and approximately 1000 cells (10 ml of a 10 -4 dilution) were inoculated in triplicate in 1 ml of LB and incubated overnight. Aliquots of each tube were plated in LB agar and LB agar supplemented with rifampicin (100 mg/ml) and incubated overnight at 37uC. The frequency of mutation to rifampicin resistance was calculated as the ratio between the number of rifampicin resistant mutants and the total number of individuals in each population.  Figure S4 Direct estimate of the selective advantage of a gat allele over the ancestral in the mouse gut. Results of three independent competition experiments between gatZ mutant and the ancestral. We estimated a selective advantage of gatZ (corresponding to the slope (62 s.e.m) of the linear regression of ln(gatZ/anc) along the three days of competition) of 0.065 (60.013), R2 = 0.99 over the ancestral. (TIF) Figure S5 Sums of frequencies of genotypes at different loci along time for the populations represented in Figure 5. As in Figure 5 these frequencies are represented along 24 days (corresponding to 432 generations) of evolution inside the mouse gut (see Tables S3 to S6 for numeric data). A corresponds to population 1.1, B to 1.11, C to 1.12 and D to 1.