The rate and potential relevance of new mutations in a colonizing plant lineage

By following the evolution of populations that are initially genetically homogeneous, much can be learned about core biological principles. For example, it allows for detailed studies of the rate of emergence of de novo mutations and their change in frequency due to drift and selection. Unfortunately, in multicellular organisms with generation times of months or years, it is difficult to set up and carry out such experiments over many generations. An alternative is provided by “natural evolution experiments” that started from colonizations or invasions of new habitats by selfing lineages. With limited or missing gene flow from other lineages, new mutations and their effects can be easily detected. North America has been colonized in historic times by the plant Arabidopsis thaliana, and although multiple intercrossing lineages are found today, many of the individuals belong to a single lineage, HPG1. To determine in this lineage the rate of substitutions—the subset of mutations that survived natural selection and drift–, we have sequenced genomes from plants collected between 1863 and 2006. We identified 73 modern and 27 herbarium specimens that belonged to HPG1. Using the estimated substitution rate, we infer that the last common HPG1 ancestor lived in the early 17th century, when it was most likely introduced by chance from Europe. Mutations in coding regions are depleted in frequency compared to those in other portions of the genome, consistent with purifying selection. Nevertheless, a handful of mutations is found at high frequency in present-day populations. We link these to detectable phenotypic variance in traits of known ecological importance, life history and growth, which could reflect their adaptive value. Our work showcases how, by applying genomics methods to a combination of modern and historic samples from colonizing lineages, we can directly study new mutations and their potential evolutionary relevance.


INTRODUCTION
Colonizing or invasive populations sampled through time (1,2) constitute "natural experiments" where it is possible to study evolutionary processes in action (3) . Colonizations, which are dramatically increasing in number (4,5) , sometimes are characterized by strong bottlenecks and genetic isolation (6,7) , and thus greatly facilitate the observation of new mutations and potentially their effects under natural population dynamics and selection (8) . Colonizations thus offer a complementary approach to other studies of new mutations, which often minimize natural selection, for example in laboratory mutation accumulation experiments (9) and parent-offspring comparisons (10) . The study of colonizations is also complementary to the investigation of genetic divergence over long time scales, e.g., between distant species (11) , where the results are largely independent of short-term demographic fluctuations. There is exist in this putatively pure HPG1 set (Fig. 4). As a reminder, random pairs of A. thaliana accessions from the native range or pairs of non-HPG1 typically differ by about 500 SNPs / 100 kb (21) (see scale in There were no SNPs in mitochondrial nor chloroplast genomes, which already suggested a recent common origin, and genome-wide nuclear diversity (π = 0.000002, θ W = 0.00001, with 5,013 full informative segregating sites) was two orders of magnitude lower than in the native range of the species (θ W = 0.007) (21) (Table S1) (Supplementary Text 6). The population recombination parameter was also four orders of magnitude lower (4 N e r = ρ = 3.0x10 -6 cM bp -1 ) than in the native range (ρ = 7.5x10 -2 cM bp -1 ) (26) (Supplementary Text 6). While recombination occurs in every generation, regardless of self-fertilization or outcrossing, it is only observable after outcrossing between genetically non-identical individuals, and this is what the population recombination parameter reports. We must stress that because A. thaliana can outcross at rates of several percent per generation (23,27) , but because the HPG1 population is genetically so homogeneous, we are mostly "blind" to the consequences of outcrossing in this special case. The lack of "observable recombination" in the genome is important, as it allows for the use of straightforward phylogenetic methods to calculate a mutation rate. The enrichment of low frequency variants in the site frequency spectrum (Tajima's D = -2.84; species mean = -2.04, (21) ) and low levels of polymorphism are consistent with a recent bottleneck followed by population expansion (Fig. 3). The obvious explanation is that the strong bottleneck corresponds to a colonization founder event, likely by very few closely related individuals, or perhaps only a single plant.
Altogether these patterns indicate that the collection of HPG1 plants we investigated constitute a quasi-clonal and quasi-identical set of individual genomes, mostly devoid of observable recombination and population structure, and thus eminently suited for the study of naturally arising de novo mutations.

The genome-wide substitution rate
It is important to distinguish between the mutation rate , which is the rate at which genomes change due to DNA damage, faulty repair, gene conversion and replication errors, and substitution rate , which is the rate at which mutations survive and accumulate under the influence of demographic processes and natural selection (28,29) . Under neutral evolution, mutation and substitution rates should be equal (29) .
The simple evolutionary history of the HPG1 population enables direct estimates of substitution rates, and the comparison of theses between different genome annotations, as well as with mutation rates from controlled conditions experiments, could reveal the role played by both demographic and selective forces.
To estimate the substitution rate in the HPG1 lineage, we used distance-and phylogeny-based methods that take advantage of the known collection dates (Supplementary Text 7). The distance method is independent of recombination and has been previously applied to viruses (30) and humans (31) . The substitution rate is calculated from correlation between differences in collection time in historic-modern sample pairs, and the number of nucleotide differences between those pairs relative to a reference (Fig. 3C), scaled to the size of the genome accessible to Illumina sequencing. This method resulted in an estimated rate of 2.11x10 -9 substitutions site -1 year -1 (95% bootstrap Confidence Interval [CI]: 1.88-2.33x10 -9 ) using rigorous SNP calling quality thresholds. Relaxing the thresholds for base calling and minimum genotyped rate affects both the number of called SNPs and the length of the interrogated reference sequence (32) . These largely cancelled each other out, and the adjusted estimates were relatively stable, between 2.1-3.2x10 -9 substitutions site -1 year -1 (Table S3, Supplementary Text 3).
The second method, a Bayesian phylogenetic approach, uses the collection years for tip-calibration and assumes a relaxed molecular clock. It summarizes thousands of plausible coalescent trees, and it has been extensively used to calculate evolutionary rates in various organisms (33-35) . This method yielded a substitution rate of 4.0x10 -9 , with confidence ranges overlapping the above estimates (95% Highest Posterior Probability Density [HPPD]: 3.2-4.7x10 -9 ).
Based on the similar results obtained with two very different methods, we can confidently say that the substitution rate in the wild populations of HPG1 is between 2 and 5 x10 -9 site -1 year -1 .
To date the colonization of N. America by HPG1 A. thaliana and to improve the description of intra-HPG1 relationships compared to that from a NJ tree, we further used a Bayesian phylogeny. At first sight, the 73 modern samples appeared separated from the herbarium samples (Fig. 3B), but the superimposition of thousands of possible trees showed that the apparent separation of samples was less clear near the root (Fig. 3A). Long terminal branches reflected that the majority of the variants are singletons, typical of populations that expand after bottlenecks.
The mean estimate of the last common HPG1 ancestor, the average tree root, was the year 1597 (HPPD 95%: 1519-1660) (Fig. 3A, B), and an alternative non-phylogenetic method gave a similar estimate, 1625. Both estimates are older than a previously suggested date in the 19 th century, using a laboratory mutation rate estimate and having no information from herbarium samples (25) . Because HPG1 appears to have been the most abundant lineage in N. America since the 1860s, we believe it could have been one of the first, if not the first colonizer that could establish itself in N. America. If that is true, the time of coalescence of the HPG1 diversity could be close to the time of HPG1 introduction to N. America. During the colonial period, many European immigrants settled on the East coast, consistent with N. American A. thaliana lineages being genetically closest to British and coastal West European populations (21) . Coincidently, the oldest herbarium samples (12 out of the 27) were HPG1 and came from the East Coast, and we found a significant correlation between collection date and both latitude and longitude (Fig. 1C). This could indicate that after the colonization they moved from the East Coast to the Midwest -the other main area of the distribution that experienced an agricultural expansion in the 19 th century (36) . Still, these conclusions need to be treated with caution, since regardless of the robustness of the results and our attempts to sample evenly from available collections, there could be unknown biases in the 19 th century herbaria.

Mutation spectra across genome annotations
Although for dating divergence events a substitution rate expressed by years is ideal, in order to compare substitution and mutation rates, both need to be expressed per generation. While A. thaliana is an annual plant, seed bank dynamics generate a delay of average generation time at the population scale.
A comprehensive study of multiple A. thaliana populations in Scandinavia found that dormant seeds could wait for longer than a year in the seed bank, generating overlapping generations and an delayed average generation time of 1.3 years (37) with a notable variance across populations. Multiplication by the mean generation time led to an adjusted rate of 2.7x10 -9 substitutions site -1 generation -1 (95% CI 2.4-3.0x10 -9 ) (Fig. 3E). To be able to compare this rate with a reference, we also re-sequenced mutation accumulation (MA) lines in the Col-0 reference background grown under controlled conditions in the de novo mutation rate in A. thaliana greenhouse that had been analyzed before with less advanced short read sequencing technology (38) .
From the new re-sequencing data, we obtained an updated rate of 7.1x10 -9 mutations site -1 generation -1 (95% CI 6.3-7.9x10 -9 ) (Tables S2, S3, Supplementary Text 4 and 7). This is two-to three-fold higher than the per-generation estimate in the wild, but within the same order of magnitude.
The same holds for rates in different genome annotations, i.e. genic, intronic and intergenic regions, but the confidence intervals overlapped in many cases (Table S3).
Differences in per-generation rates between laboratory and wild populations could stem from both methodological as well as biological causes. For instance, if the true average generation time was actually over 3 years / generation, the differences would cancel out (Fig. 3E Fig. S4).
An alternative evolutionary explanation to the aforementioned laboratory and wild populations' rates differences is that purifying selection in the wild would slow down the accumulation of mutations by removing deleterious mutations (Fig. 3E). This has been observed before and is one of the accepted causes of the discrepancy between the so called long-and short-term substitution rates in a range of organisms (40) .
In order to provide evidence for negative purifying selection acting in the wild, we performed three types of analyses involving comparisons across genomic annotations within the HPG1 dataset. intergenic regions, intronic regions, and all non-coding regions of genome. All three pairwise comparisons showed a depletion of coding SNPs and an enrichment of intergenic, intronic and non-coding SNPs (odds ratio>2, p<10 -16 ). An obvious explanation is that in genome annotations where a mutation is more likely to be deleterious, i.e. coding regions, the number of observed variants should be lower due to selection having removed them from the population before we could sequence them.
Secondly, we studied the Site Frequency Spectrum (SFS) of genetic variants. The rationale was that because purifying natural selection is more efficient at removing intermediate-frequency variants, variants that tend to be deleterious or slightly deleterious should be found at lower frequency than those that only suffer neutral drift (41) . We built contingency tables of coding, intergenic, intronic and non-coding variants segregating above and and below the conventional frequency cutoff of 5% to separate low-and intermediate-frequency variants (42) . We found that SNPs in coding regions were more likely to be at low frequency than those in intergenic (odds ratio=2.34, p=3.09x10 -11 ), intronic (odds ratio=1.48, p=0.02), and all non-coding regions (odds ratio=2.05, p=1.29x10 -8 ). We carried out the same analysis using nonsynonymous and synonymous SNPs, which are easily interpretable in terms of the selection regimes under which they evolve. We did not find an enrichment (p=0.67), perhaps a consequence of the small number of such mutations (Table S3).
Thirdly, to verify that the full frequency spectrum of coding SNPs was shifted to lower frequencies (i.e. the results were not dependent on the arbitrary 5% frequency cutoff), we used the nonparametric Kolmogorov-Smirnov test for two samples. We found that the cumulative distribution of the site frequency spectrum (CD SFS ) of coding regions is above (i.e., the frequency distribution is overall skewed to lower values) both the intergenic CD SFS (p=3.25x10 -6 ) and the non-coding regions CD SFS (p=0.001), but not the intronic CD SFS (p=0.60) (Fig. S5). As in our previous analysis, the comparison between the nonsynonymous and synonymous CD SFS yielded, likely for similar reasons, no differences (p=0.53).
All in all, these results support that purifying selection is a force shaping to some degree the diversity across the HPG1 genome and might therefore as well contribute to the differences between HPG1 and MA rates.

Potentially advantageous de novo mutations
Finally, having discovered over 5,000 de novo mutations in the HPG1 lineage, we wondered whether there is any evidence for an adaptive role of these de novo mutations in the colonization of N. America by HPG1. We noted that some new mutations had risen to intermediate or even high frequencies in the HPG1 samples. This might have been the consequence of drift from stochastic demographic processes, or it could have been caused by positive natural selection. To find direct evidence for the latter, we grew the modern accessions in a common garden and studied phenotypes of known importance in ecology of invasions (43) , namely flowering time and root traits (see Supplementary Text 8). Using linear mixed models, we calculated the proportion of variance explained (also called narrow sense heritability, h 2 ) with a kinship matrix of all SNPs that had become common (>5%, n=391). We found significant heritable variation for multiple traits including the growth rate in length (h 2 =0.64) and the average root gravitropic direction (h 2 =0.54). As in our study mutations are the main source of genetic variants, these mutations -or mutations linked to them -should be responsible for significant quantitative variation in several traits (Table S4, Supplementary Text 10). The existence of mutation-driven phenotypic variation at least indicates that natural selection could have acted upon such phenotypic variation.
Although linkage disequilibrium (LD) among SNPs is high, the fact that HPG1 genomes differ in very few SNPs greatly reduces the list of candidate loci that might generate the observed phenotypic variation (Fig. S6) (44) . With this reasoning in mind and understanding the limitations imposed by LD, we carried out a genome-wide association (GWA) analysis and found 79 SNPs associated with one or more root traits, mostly growth and directionality (Fig. 4). Twelve SNPs were in coding regions and seven resulted in nonsynonymous changes -some producing non-conservative amino-acid changes and thus likely to affect protein structure and/or function (Table 1, based on transition scores from (45) ). Due to the aforementioned LD, in some cases the results of associations could not be confidently assigned to a specific SNP and thus we report the number of other associated mutations with r 2 > 0.5 (Table 1, Fig.   S6). For other cases, we were able to pinpoint clear candidates that were not in LD with other SNPs and whose functional annotation had a strong connection to the phenotype (Table 1, Fig. S6). For example, one SNP associated with root gravitropism was not linked to any other SNP hit and it was found at 40% frequency (top 3% percentile). This SNP produces a cysteine to tryptophan change in AT5G19330, which is involved in abscisic acid response and confers salt tolerance when overexpressed (46) . Another nonsynonymous SNP associated with root growth is located in AT2G38910, which encodes a calcium-dependent kinase that is a factor regulating root hydraulic conductivity and phytohormone response in vitro (47,48) .
Nineteen other SNPs were associated with climate variables after correction for latitude and longitude ( www.worldclim.org , Table S4 ), and generally tended to coincide with top root-associated SNPs (odds ratio = 3.9, Fisher's Exact test p = 0.002; Fig. 4, and Table S5). Specifically, this means that alleles increasing root length and gravitropic growth were present in areas with lower precipitation, and vice versa (Pearson's correlation r=0.85, p=0.003). This indicates that phenotypic variation generated by mutations coincides with environmental (and not geographic) gradients along the colonized areas.
Compared to other mutations with matched allele frequencies, root-associated mutations are first found in older herbarium samples nearer to Lake Michigan (Fig. S5), the area in US that seems to be most populated by A. thaliana (21) . This could be explained by natural selection having maintained mutations with phenotypic effect for a longer time than neutral mutations or perhaps that this mutations were selected for in a new environment. All in all our results are compatible with natural positive selection having already acted on root morphology variation that was generated by de novo mutations in this colonizing lineage. To confirm such hypotheses of local adaptation by de novo mutations, it will be necessary to grow collections of divergent HPG1 individuals in multiple contrasting locations over several years, and ideally revive historical specimens to compare performance (49) .

Conclusions
In summary, we have exploited whole-genome information from historic and contemporary collections of a herbaceous plant to empirically characterize evolutionary forces during a recent colonization. With this natural time series experiment we could directly estimate the nuclear substitution rate in wild A. thaliana populations -a parameter difficult to characterize experimentally (9) . This allowed us to date the colonization time and spread of HPG1 in N. America. We provide evidence that purifying selection has already changed the site frequency spectrum in the course of just a few centuries. Finally, we discovered that a small number of de novo mutations that rose to intermediate frequency can together explain quantitative variation in root traits across environments. This strengthens the hypothesis that some de novo variation could have had an adaptive value during the colonization and expansion process, a hypothesis that has been put forward as one of the possible solutions to the genetic paradox of invasion in plants (17) . This process might be more relevant in self-fertilizing plants, which typically have less diversity than outcrossing ones (50) , but have higher growth rates (43) and account for the majority of successful plant colonizers (5) . While A. thaliana HPG1 is not an invasive, i.e. harmful, species, it can teach us about fundamental evolutionary processes behind successful colonizations and adaptation to new environments. Our work should encourage others to search for similar natural experiments and to unlock the potential of herbarium specimens to study "evolution in action".

Sample collection and DNA sequencing
Modern A. thaliana accessions were from the collection described by Platt and colleagues (23) , who identified HPG1 candidates based on 149 genome-wide SNPs (Table S1, Supplementary Text 1).
Herbarium specimens were directly sampled by Max Planck colleagues Jane Devos and Gautam Shirsekar, or sent to us by collection curators from various herbaria (Table S1, Supplementary Text 1).
Among the substantial number of specimens in the herbaria of the University of Connecticut, the Chicago Field Museum and the New York Botanical Garden, we selected herbarium specimens spaced in time so there was at least one sample per decade starting from the oldest record (1863). The differences in geographic biases of herbarium and modern collections are difficult to know (2) , thus we did choose both historic and modern samples that were as regularly distributed in space as possible, and sample overlapping locations wherever possible. DNA from herbarium specimens was extracted as  Table S1. We also re-sequenced the genomes of twelve Col-0 MA lines (57,58) ( We estimated genetic diversity as Watterson´s θ (64) and nucleotide diversity π, and the difference between these two statistics as Tajimas's D (65) using DnaSP v5 (66) . We estimated pairwise linkage disequilibrium (LD) between all possible combinations of informative sites, ignoring singletons, by computing r 2 , D and D ' statistics using DnaSP v5 (66) . For the modern individuals, we calculated the recombination parameter rho ( 4N e r ) also using DnaSP v5 (66) .

Substitution and mutation rate analyses
Similarly as in Fu et al. (67) , we used genome-wide nuclear SNPs to calculate pairwise "net" genetic distances using the equation D' ij = D ic -D jc , where D ' ij is the net distance between a modern sample i and a herbarium sample j ; D ic the distance between the modern sample i and the reference genome c ; To fully account for the non-independence of points, we need to work with phylogenies. The Bayesian phylogenetics approach we used is implemented in BEAST v1.8 (63) and is called tip-calibration, and calculates a substitution rate along the phylogeny. Our analysis optimized simultaneously and in an iterative fashion using a Monte Carlo Markov Chain (MCMC) a tree topology, branch length, substitution rate, and a demographic Skygrid model (Supplementary Text 7). The demographic model is a Bayesian nonparametric one that is optimized for multiple loci and that allows for complex demographic trajectories by estimating population sizes in time bins across the tree based on the number of coalescent -branching -events per bin (68) . We also performed a second analysis run using a fixed prior for substitution rate of 3x10 -9 substitutions site -1 year -1 based on our previous net distance estimate to confirm that the MCMC had the same parameter convergence, e.g. tree topology, as in the first "estimate-all-parameters" run.
Having a substitution rate per year we can estimate the time to the most common recent ancestor L solving d = 2L x µ where d is the average pairwise genetic distance between our samples and µ is the calculated substitution rate from the distance method. This yielded 363 years, which subtracted to the average collection date of the samples, produced a point estimate of 1615. We compare this estimate with the inferred phylogeny root from the BEAST analysis.

Inference of genome-wide selection
We separately analyzed sequences at different annotations, since as they might be under different selection regimes (i.e. evolutionary constraints). We computed one-tailed Fisher's exact test using the base stats package in R (69) on tables of counts of the total number of positions in the genome annotated as a coding or non-coding (intergenic, intronic, all other noncoding) and the number of SNPs of each annotation present in the HPG1 dataset: The test will return whether coding regions have a lower number of SNPs than other reference annotation (intronic, interenic, all non-coding regions), as expected by the total number of positions in the genome annotated as such. We also constructed contingency tables to test whether the SNPs are more likely to be found at low (<5%) or intermediate (5≥%) frequency: Finally, we calculated the unfolded Site Frequency Spectrum (SFS) based on the order of appearance of genetic variants in the herbarium dataset. We then used the the Kolmogorov-Smirnov two-samples test and 10,000 bootstrap resampling using the R package Matching v. 4.9-2 (ref. (70) ) to calculate whether the frequency spectrum was lower for coding SNPs than for other SNPs. Additionally, we also repeated these analyses comparing nonsynonymous and synonymous mutations.

Association analysis
We collected flowering, seed and root morphology phenotypes for 63 accessions (Supplementary Text 8). For associations with climate parameters, we followed a similar rationale as previously described (71) . We extracted information from the bioclim database ( http://www.worldclim.org/bioclim ) at a 2.5 variants, we used a linear mixed model: y = Xb + Zu + ε ; where y is the phenotype or climate variable, X is the genotype states at a given SNP, b is the fixed phenotypic effect of such SNP, Z is the design matrix of genome identities, u is the random genome background effect informed by the kinship matrix and distributed as MVN (0, σ g A ), and ε is the random error term. The ratio of σ g / σ T is commonly called narrow sense heritability, "chip" heritability, or proportion of variance explained by genotype (73) .
Only SNPs with MAF>5% (n=391) were used to build a kinship or relationship matrix A . Note that the differences between any two genotypes were of the order of one or few dozens of SNPs. While this approach is appropriate to calculate a chip heritability, it would not be very useful to detect significant SNP, as the random factor accumulates all the available variation (Table S4). We therefore run regular GWA model without kinship matrix: y = Xb + ε ; but generated a p-value empirical null distribution based on running such model over 1,000 permuted datasets, which lead to conservative significance calculation (Fig. S6, Data Appendix S1). The p-values from running the association in the real data that were below the 5% tail in the empirical distribution could be considered significant. However, we also established a conservative "double" Bonferroni correction, where the significant threshold was lowered to 0.01% (= 5% / [number of SNPs + number of phenotypes tested]). All significant SNPs are shown in Table S5, and a subset in Table 1. Although many phenotypic traits did not have significant SNPs, we show all the QQ plots in the Data Appendix S1 file.     Substitution rates for HPG1 were re-scaled to a per generation basis assuming different generation times. Confidence intervals in HPG1 substitution rates were obtained from 95% confidence intervals of the slope from 1,000 bootstraps (Table S4 for actual values).  Table 1 and Table S5). The rate and potential relevance of new mutations in a colonizing plant lineage 8

Sample collection and preparation
Seeds from modern accessions (Table S1) were bulked at the University of Chicago. Progeny for DNA extraction was grown at the Max Planck Institute for Developmental Biology. We used 2 to 8 mm 2 of dried tissue for destructive sampling from the herbarium specimens (Table S1).

SNP calling thresholds
To assess the effect of SNP calling thresholds on the mutation rate, we employed three different SHORE v0.9.0 quality thresholds following previous work (see Table S4 from (5)  information (0% missing rate). Thus, the most rigorous case would be 32-32 quality and 0% missing rate, and the most relaxed 24-24 quality and 50% maximum missing rate. Substitution rate calculations (section 7.2) were done for datasets from all combinations of these quality parameters (Fig. S3), and we chose the regular 32_15 quality threshold and complete information for the final estimate (Fig 3 C, E).

HPG1 and other haplogroups in North America
The modern samples had been originally selected based on previous genotyping efforts of about  S1A). We also built the same tree with the whole-genome sequences (Fig. S1B), which was mostly in agreement with the 149 SNP tree.
The previous work had identified several other haplogroup in N. America (9) . Not surprisingly, HPG1 individuals outcross with other lineages, and this accounts for some of the individuals which we later removed, because they did not agree completely in all 149 markers with the HPG1 consensus.

North american private diversity
Having identified these bona fide HPG1 individuals, we wanted to confirm that the diversity has a legitimate origin from de novo mutations. For that we used the 1001 Genomes resource ( www.1001genomes.org ), which covers a sampling of populations from the native Eurasian and African range. Subsetting the genomes from this resource to only European accessions, and limiting the SNP set to those with ≥1% frequency of alternative alleles and a maximum of 50% missing data (the same quality rate as our HPG1 SNP call), there were 300 variants out of all 5,181 HPG1 variants that were also found in Europe or Asia (5.7%). Changing the maximum missing data to 10% we get a more conservative estimate of 1.8% overlap, while increasing the maximum missing data to 90%, we get the anti-conservative estimate of 6.5% overlap. Only one of the reported SNPs associated with phenotypes (see Section 8 ) was among these shared variants. There are several scenarios that can explain these shared SNPs. One is simply that there was not a single founding seed, but a few of closely related individuals coming from the native range.
Other explanations are that parallel mutations occurred in North America and Eurasia, that HPG1 individuals were reintroduced to Europe, or that reversion-mutation occurred in some HPG1 individuals. The latter is not implausible given the large population size of the species and the fact that about 10% of all sites in the genome are SNPs in the 1001 Genomes collection. As explained in the main text, SNP sharing due to admixture with other lineages is extremely unlikely, as such cases should be evident as blocks of high SNP diversity along the genome (Fig. 4).
Finally, regarding chloroplast diversity, we did not find any SNP in the chloroplast of HPG1 individuals. This is probably because chloroplast mutation rates are much slower (10) and because the founder colonizers actually came from a small batch of seeds from an identical mother (chloroplast diversity in the native range is of 2,842 SNPs (11) ).

Extent of linkage disequilibrium and recombination
We estimated pairwise linkage disequilibrium (LD) between all possible combinations of informative sites, ignoring singletons, by computing r 2 , D and D ' statistics. LD decay was estimated using a linear regression approach. Linkage disequilibrium parameter | D '| did not decay with physical distance (intercept = 0.99, slope = 0.00) among all SNP pairs. Indeed 99.975% of pairwise SNP comparisons had | D '|=1 meaning that 99.975% of those comparisons only three out of the four possible gametes (ab, aB, Ab, AB) are found and thus mutation alone can explain their existence without the need of invoking recombination. In other words, such three gametes can be represented in a tree structure.
LD and recombination related statistics were determined using DnaSP v5 (12) .  (Table S3). The genome length was determined as all base pairs with coverage higher or equal to 3, and a SHORE mapping quality score of at least 32 in one sample (Table S2).

Net distances
For the "net genetic distances" method, we computed confidence intervals of the b regression slope coefficient ( D ' = a + bT ') using a bootstrap with replacement of 1,000 samples to avoid over-confident confidence intervals due to lack of independence of points (13) . We used either all SNPs or SNPs at specific annotations to calculate different substitution rates and scaled the slope into a per-base rate using all positions (of the given annotation) that passed alternative or reference call quality thresholds rather than using a single value of genome length (Table S3). For all annotations we calculated substitution rates with three quality thresholds and either full information per SNP or allowing a maximum of 50% missing accessions per SNP (see Section 3 and Fig. S1C).
For some annotations substitution rates were not reliable. For instance, in 3' and 5' UTR regions, we did not have enough mutations (on average~1 SNP difference between any pair), and thus do not report these regions' rates. We could also have less power to discover SNPs in annotations with extensive structural variation such as active transposable elements (14) .
Transposons, which comprise~8% of the genome and~19% of all the SNPs in greenhouse MA lines, had fewer SNPs called than expected in HPG1. This would explain the atypically low transposon substitution rate (Table S3). Therefore, transposon substitution rates in HPG1 cannot be trusted.

Bayesian tip-calibration
For the second approach to estimate a substitution rate, the Bayesian phylogenetics tip-calibration approach, we performed systematic runs and chain convergence assessments of different demographic and molecular clock models. We found the Skygrid demographic model (15) (Fig. 3B). Additionally, we used DensiTree (18) to simultaneously draw the 10,000 BEAST trees with the highest posterior probability (Fig. 3A). Since all trees were drawn transparently, agreements in both topology and branch lengths appear as densely colored regions, while areas with little agreement appear lighter.

Methylation status of mutated sites
As in many other species, the spectrum of de novo mutations in the greenhouse-grown A. thaliana MA lines is biased towards G:C → A:T transitions (8) , leading to an inflated transition-to-transversion ratio (Ts/Tv). This bias is less pronounced in recent mutations in a Eurasian collection of natural accessions (Fig. 5A of (19) and in HPG1 accessions (Fig. 3D). A recent multigenerational salt stress experiment in the greenhouse also showed a more balanced Ts/Tv (20) . These findings indicate that less benign conditions might promote a lower Ts/Tv, and one possible cause are methylation patterns, known to change under different environments (21) .
We interrogated the potential evolutionary role of cytosine methylation in the mutability of cytosine bases in the HPG1 accessions. For reference DNA methylation data, we used previously generated bisulfite-sequencing data of HPG1 strains (7) and of Col-0 MA lines (5) , respectively. For both datasets, methylation status was calculated as the fraction of reads with methylated cytosines by the total number of reads at a certain cytosine position in the genome. Our rationale was that if methylation affected mutability, the degree of methylation at positions were we find a new mutation should be higher. To be sure that a given site in HPG1 was a new mutation, we only considered positions for which we could determine that state by alignment to the A. lyrata genome (22) . The "tested sites" were positions in HPG1 that had a mutation both from A. lyrata and A. thaliana Col-0.
These positions can be of two kinds, "fixed" if all HPG1 individuals carry the alternative, or "segregating" if both reference and alternative alleles exist in HPG1. As control, "control set", we used cytosine positions that did not vary across HPG1, A. lyrata and A. thaliana . To produce the methylation distribution of the control set we randomly chose 1,000 invariant cytosine positions. For the test sets, we averaged the methylation degree and compared it with the control distribution.
Ancestral cytosines with higher methylation in both A. thaliana Col-0 reference and HPG1 pseudo-reference methylome datasets were more likely to mutate to thymines in HPG1 (Fig. S2 A-D). Additionally, the methylation degree at substitutions inside genes was higher in the HPG1 methylome ( Fig. S2 B,D). While some C → T changes could be explained by higher spontaneous deaminations known to happen more often at methylated cytosines, also C → A/G substitutions were more likely to have been methylated. If this process is common enough, the Ts/Tv ratio should decrease. We are far from understanding differences in Ts/Tv in natural and controlled conditions, but definitely methylation status seems to have a strong statistical connection with mutability. We used the means per genotypes and per time series for association analyses.

Seed size
We spread the seeds of given genotypes on separate plastic square 12

Quantitative genetic analyses
For 63 modern accessions, we measured time to bolting and flowering, seeds per plant, seed size, and 15 root phenotypes in common chamber or common garden settings. For all 100 accessions, climatic information from the bioclim database ( www.worldclim.org/bioclim ) was extracted using their geographic coordinates. For historic samples, some locations were only known by county name. In this case we assigned the geographic coordinate location of the centroid of the county.

Heritability
We performed association analyses using the R package GenABEL (24) , with measured phenotypes

Linear Models
For association analyses we first employed a linear mixed model that fitted the kinship matrix using the mmscore function. This model is of the type: y = Xb + Zu + ε (see Main text Methods) (26) .
Only three significant SNP hits were discovered using a 5% significance threshold after False Discovery Rate correction (FDR). This was expected since we have few variants and these would have originated in an approximated phylogeny structure. We concluded that fitting the kinship matrix in our model was not appropriate since there would be no residual variation for association with specific SNPs. With this rationale we employed a fixed effects linear model using the qtscore function SNPs with p-values below 5% in the empirical p-value distribution should be considered significant (but see next section). In climatic models, we included longitude and latitude as covariates to correct for any spurious association between SNPs and climate gradients created by the migratory pattern of isolation by distance.

Evaluation of significance
Significant SNPs were interspersed throughout the genome (Fig. 4)   To further ensure that we avoided false positive results, we also prioritized SNPs whose empirical p-value was not below 5% only but also below 5% / (number of SNPs + number of traits) = 0.01%. This "double" Bonferroni correction was very conservative (Table 1, Table S5).

Context of de novo mutations associated with phenotypes
For each SNP in our dataset, we determined the ancestral and derived states, by identifying which allele was found in the oldest herbarium samples. We compared the time of emergence and the centroid of geographic distribution of the alternative alleles of SNP hits to random draws of SNPs with the same MAF filtering (5%) (Fig. S1).

Functional information
On top of phenotypic and climatic associations of SNP hits, we also provide a likely functional effect employing a commonly used amino acid matrix of biochemical effects (28

Proof of concept examples
We argue that the power of our association approach relies on the fact that HPG1 lines resemble Near Isogenic Lines (NILs) produced by experimental crosses (29) (Fig. S2A). Similar to genome-wide association studies (GWA), power depends on many factors, namely the noise of phenotype under study, architecture of phenotypic trait, quality of genotyping, population structure, sample diversity, sample size, allele frequency, and recombination. On one hand, association analyses in NILs suffer from large linkage blocks, but confident results can be achieved due to accurate measurement of phenotypes, limited genetic differences between any two lines, and high quality genotypes. In common GWA studies such as in humans, there are multiple confounding effects.
Among the confounders are (1) that any two samples differ in hundreds of thousands of SNPs, and (2) that historical and geographic stratification produce non-random correlations among those SNP differences. This considerably complicates the identification of phenotypic effects at specific genes, and power relies greatly on large sample sizes to achieve the sufficient number of recombination between markers.
To provide support for the non-synonymous SNP on chromosome 5, at position 6,508,329 in AT5G19330, we looked for pairs of lines that carry the ancestral and the derived allele, but that differ in few (or no other) SNPs in the genome. When considering all genic substitutions with a minimum allele frequency of 5% (Fig. S2A), we identified 20 pairs of lines differing only in the AT5G19330 SNP and another linked SNP (located on a different chromosome, association p-value > 0.4). The phenotypic differences in mean gravitropic score of these almost-identical pairs were significantly higher than phenotypic differences among all pairs of HPG1 lines, and genetically identical pairs attending to substitutions inside genes (Fig. S2A). Furthermore, this SNP was not in complete linkage with any other SNP hit ( r 2 < 0.5) (Fig. S2D). The same approach was used to examine the SNPs in AT1G54440 (Fig. S2E) and AT2G16580 (Fig. S2F) Table S1. HPG1 sample information. Table S2. Sample information for Col-0 mutation accumulation lines.  Table S4. Description of phenotypic and climatic variables for association mapping analyses. Table S5. SNP hits from association analyses and several descriptors.
Data Appendix S1 : For each trait employed in association analyses, we report the histogram distribution and the QQ plot of p-values to ensure that no trait departs exaggeratedly from the normal distribution, and that no inflation of p-values is observed (when lambda ≤ 1, there is no inflation of false positives).  substitutions that are shared -fixed -across all individuals (light red) or whose allele are present at an intermediate -segregating -frequency (dark red). Likewise, average methylation is shown for sites that changed from cytosine to adenine (blue) that that are fixed (light blue) or segregating (dark blue). The fact that the average methylation is higher in new substitutions than in invariant positions supports a connection between methylation and mutability of sites.