Seed size and its rate of evolution correlate with species diversification across angiosperms

Species diversity varies greatly across the different taxonomic groups that comprise the Tree of Life (ToL). This imbalance is particularly conspicuous within angiosperms, but is largely unexplained. Seed mass is one trait that may help clarify why some lineages diversify more than others because it confers adaptation to different environments, which can subsequently influence speciation and extinction. The rate at which seed mass changes across the angiosperm phylogeny may also be linked to diversification by increasing reproductive isolation and allowing access to novel ecological niches. However, the magnitude and direction of the association between seed mass and diversification has not been assessed across the angiosperm phylogeny. Here, we show that absolute seed size and the rate of change in seed size are both associated with variation in diversification rates. Based on the largest available angiosperm phylogenetic tree, we found that smaller-seeded plants had higher rates of diversification, possibly due to improved colonisation potential. The rate of phenotypic change in seed size was also strongly positively correlated with speciation rates, providing rare, large-scale evidence that rapid morphological change is associated with species divergence. Our study now reveals that variation in morphological traits and, importantly, the rate at which they evolve can contribute to explaining the extremely uneven distribution of diversity across the ToL.

Introduction phylogenetic timetree available [17] with an unparalleled dataset of seed mass measurements from over 30,000 angiosperm species [18]. We estimated rates of speciation, extinction, and seed size evolution across the phylogeny using Bayesian, method-of-moments, and maximum likelihood analyses-each with different assumptions regarding rate variation through time. We then tested whether there were any links between rates of diversification and both seed size and its rate of evolution. Additionally, we examined whether these links were consistent across different methodologies and timescales.

Results
Our results point to a strong association between angiosperm diversification and rates of seed size evolution irrespective of analytical method or timescale, with weaker evidence for a link between macroevolutionary dynamics and absolute seed size. In the first instance, we calculated rates of speciation (λ), extinction (μ) and seed size evolution by using Bayesian Analysis of Macroevolutionary Mixtures (BAMM) [19]. BAMM models rate heterogeneity through time and lineages, and accounts for incomplete taxon sampling. We used a phylogenetic tree that contained 29,703 angiosperm species for the speciation/extinction analysis. The tree was subset to 13,577 species with seed size data for phenotypic evolution analysis. As expected, given the high degree of taxonomic imbalance observed in the angiosperm phylogeny, we found strong support for more than 500 shifts in the rates of diversification. There was also marked heterogeneity in the rates of seed size evolution (Fig 1), which varied over 3 orders of magnitude (S1 Fig).
We then estimated whether shifts in macroevolutionary dynamics (λ, μ, and r) estimated with BAMM were significantly correlated with absolute seed size and tip-specific rates of seed size evolution by comparing the empirical correlations to a null distribution generated using STructured Rate Permutations on Phylogenies (STRAPP), which is robust to phylogenetic pseudoreplication (see Materials and methods for details) [20]. We were able to link major differences in diversity across angiosperm clades with both the present rate of phenotypic evolution and the absolute value of trait itself. Specifically, increased speciation was associated with a faster rate of seed size evolution (Spearman's ρ = 0.55, p-value < 0.0001; Fig 2A). Increased extinction rates were similarly associated with higher evolvability (ρ = 0.44, p-value < 0.0001; Fig 2B), but given the weaker effect, the net outcome of λ−μ was that diversification rates were positively correlated with phenotypic change (ρ = 0.49, p-value < 0.0001; Fig 2C). We also identified an association between seed size and both speciation (ρ = −0.17, p-value = 0.003; Fig  2D), and extinction rates (ρ = −0.17, p-value = 0.003, Fig 2E). As the correlations with speciation and extinction were in the same direction and of comparable magnitude, and estimates of extinction rates were relatively variable (Fig 2E), net diversification rates did not change with seed size (ρ = −0.12, p-value = 0.077; Fig 2F). Generally, the observed correlations arose from many phenotypically fast-evolving clades distributed across the phylogeny (S1 Fig) and were robust to prior choice in the BAMM analyses (S2 Fig).
Given recent discussion on the reliability of BAMM for estimating diversification rates ( [21], but see [22]), we tested the robustness of our results by using alternative methodologies to infer macroevolutionary dynamics across clades at different timescales. Ten, 2 million yearwide time slices from the present up to 20 million years (myr) ago were defined. These time slices were used to identify the most inclusive monophyletic clades of at least 4 species in which we had estimated at least a 70% probability of recovering the correct crown age node of the clade (see Materials and methods). For each resulting clade in each time slice, we calculated diversification rates using a method-of-moments estimator, which assumes rates are constant over time [23]. We also fitted a series of time-dependent diversification models to each clade with R: Phylogenetic ANalyses of DiversificAtion (RPANDA), which uses a maximum likelihood approach to estimate speciation and extinction and allows for incomplete taxon sampling [24] (see S1 Table for a summary of the best fitting models for each time slice). Rates of seed size evolution were estimated within each clade that also had at least 4 species with seed size data by fitting both Brownian motion (BM) and early burst (EB) or accelerating decelerating [25] models of trait evolution. Mirroring the BAMM results, we found a positive correlation between the rate of seed size evolution and speciation rates that was consistent across time slices (Fig 3A, S3A Fig). As expected given the weaker association between seed size and speciation found in our BAMM analyses, correlations were generally weaker and nonsignificant (Fig 3B), except for 1 of the time slices (S3B Fig). Phylogenetic tree of 13,577 species of flowering plants with seed mass, rate of seed mass change, and speciation (λ), extinction (μ), and net diversification (r) rates estimated by BAMM. Seed mass and rate data were standardised to Z-scores so that variation could be directly compared. λ, μ, and r were calculated with a larger, 29,703-species tree. Photo credits (Fig 1) We also found limited evidence that other traits that covary with seed size (S2 and S3 Tables) better explained our results. If our results were explained by seed size being a proxy of some other phenotypic trait that ultimately influenced speciation, we would expect this other phenotypic trait to be strongly correlated with seed size. We would also expect a significantly stronger correlation of speciation with both this trait and its rate of evolution when compared with seed size and its rate of evolution. By comparing the effects of genome size, life cycle, plant height, and woodiness across a subset of 1,007 species in our dataset, we found that only the distinctions between woody and herbaceous and annual and perennial species were more strongly correlated with macroevolutionary dynamics than absolute seed size. The rate of phenotypic evolution of all the continuous traits (genome size, seed size, and plant height) was strongly and similarly correlated with speciation (S4 Fig, S4 Table). Importantly, however, neither the rate of evolution nor the absolute values of both genome size and plant height were more strongly associated with speciation than seed size or its rate of evolution (S4 Fig). These results therefore suggest that the correlation between macroevolutionary dynamics and both seed size and its rate of evolution is not simply mediated by other phenotypic traits. . Spearman correlations were calculated between speciation (λ), extinction (μ), and net diversification (r), and each of (a) present-day rate of seed mass change and (b) seed mass. Coloured lines are correlations for each of one sample of the BAMM posterior distribution, bold line is the median. The insets show the density plots of the absolute difference between the observed and null correlation calculated across 1,000 structured permutations of the evolutionary rates on the phylogenetic tree (myr, million years). See http://www.github.com/javierigea/seed_size for data.

Discussion
Our study supports the idea that variation in seed mass and, particularly, its rate of evolution can help explain disparity in diversification across the angiosperm phylogeny by playing a central role in plant life history. As we show with our clade-based analysis, our results are repeatable across many methodologies, varying timescales, and are not restricted to a particular taxonomic rank (Figs 2 and 3).
The robust association of high rates of phenotypic change with lineage diversification has recently been observed in other taxonomic groups [9,26], but never across the whole of the angiosperm ToL as we find here. Accelerated morphological evolution may allow radiating lineages to occupy more complex adaptive landscapes [27]. Species with greater rate of change in their seed mass (i.e., higher evolvability) could also shift between adaptive peaks or develop reproductive barriers more rapidly. Alternatively, the theory of punctuated equilibria [28], whereby morphological changes can arise from the speciation process, might also explain the connection of phenotypic evolution with species divergence. However, current methods do not allow us to distinguish whether speciation is responding to morphological change or vice versa when reconstructing 250 myr of evolutionary history [9].
Seed mass itself will also covary with dispersal ability and environmental tolerance in ways that can change speciation. For example, we found that smaller-seeded genera had faster speciation rates. This may be because smaller-seeded genera generally disperse over larger distances [29], which can promote speciation by creating isolated populations [30]. However, the relationship between dispersal and speciation is highly context dependent. The permeability of a landscape to dispersal determines the dispersal distances that may promote species divergence. For example, long distance dispersal may be needed for isolation to occur in continuous habitats but be less effective in highly fragmented landscapes [31]. Therefore, the weak correlation that we observed between seed size and diversification might reflect contradicting patterns operating in different angiosperm clades. Dispersal syndromes may also modify the effect of seed size on speciation. For instance, species with larger seeds are generally associated with biotic dispersal that distributes seeds over greater distances than wind or gravity dispersal [5]. However, broad-scale predictions for the effects of dispersal syndromes on diversification may be inaccurate because the former depend on landscape connectivity [8] and can sometimes be inconsistent, e.g., a wind-dispersed seed might be transported by an animal. Detailed contextual data will be necessary to expand upon the mechanisms underlying our findings in specific regions and clades.
Although seed mass is associated with other traits that can affect diversification, there is little evidence that these better explain our observed correlations or that seed size is a mere proxy for one of these other traits. For example, genome size positively correlates with seed mass [32], and faster rates of genome size evolution have been linked to increased speciation in angiosperms [12]. Shorter, smaller-seeded plants also tend to have faster life cycles, which may accelerate mutation rates [33,34] and promote diversification [35]. But unlike other traits [12], both absolute seed size as well as its rate of change were correlated with speciation. Thus, although other traits surely influence diversification [36], we argue that our results generally reflect the role of seed size as a trait that integrates across multiple aspects of life history characteristics in ways that can predictably influence plant macroevolutionary dynamics (S5 Fig).
Our analyses build upon the largest available phylogenetic timetree for angiosperms in ways that do not consider topological and branch length uncertainty. Similar to other megaphylogenies, the low sampling fraction (approximately 10% of described flowering plants) and limited number of phylogenetic markers (a maximum of 7), which were employed in constructing our phylogeny [17], may affect the inference of macroevolutionary estimates [37,38]. In any case, we believe our results, particularly the strong correlation between the rate of seed size evolution and speciation rates may reflect general patterns on how biodiversity is generated across angiosperms. Detailed studies in well-sampled clades can expand upon our findings and reveal different relationships operating in particular groups of organisms.
The approach applied here can help to unravel the processes responsible for generating large-scale asymmetries in biodiversity. It also offers the potential to test how widely varying traits and their rate of morphological evolution influence other aspects of the evolution and adaptation of flowering plants (e.g., [17]). Clade-specific exceptions arising from local interactions with nonfocal traits [39] and specific spatiotemporal contexts will undoubtedly interact with broad-scale macroevolutionary patterns and may modulate the effects of seed mass on diversification. Regardless, our results show that seed size and its rate of evolution correlate with speciation and extinction across the flowering plants. This finding may help to explain why some clades are much more species-rich than others and points to the role of rapid morphological evolution in generating greater levels of diversity.

Seed mass and phylogenetic dataset
Seed mass data for 31,932 species were obtained from the Royal Botanic Gardens Kew Seed Information Database [18]. Species names were standardised with The Plant List (TPL) nomenclature [2] and cleaned using the Taxonstand R package [40]. Further processing was performed with the taxonlookup R package [41], which is a complete genus-family-order mapping for vascular plants that draws from TPL, the Angiosperm Phylogeny website [42], and a higher-level manually-curated taxonomic lookup [17].
We used the most comprehensive phylogenetic tree for land plants [17,43] that comprises 31,389 species. Taxonomic information for our phylogenetic tree was run through Taxonstand Diversification and phenotypic evolution analyses Speciation, extinction, and net diversification rates were estimated across the 29,703 species tree by using BAMM version 2.5.0 [19]. BAMM models shifts in macroevolutionary regimes across a phylogenetic tree using reversible-jump Markov chain Monte Carlo (rjMCMC) sampling. The size of the tree precluded a single analysis from readily converging. Therefore, we divided our initial tree into clades of no more than 6000 species. This resulted in 6 monophyletic clades and 1 additional clade that contained the backbone of the tree (i.e., 1 representative of the 6 monophyletic clades) plus the remaining, unassigned species (S7 Fig). We then ran BAMM speciation/extinction analyses for each of the 7 clades (6 monophyletic clades plus the backbone set). Initial prior settings were calculated with the setBAMMpriors function in BAMMtools [44], and the expectedNumberOfShifts parameter was set at 50. Following [12], we incorporated nonrandom incomplete sampling information by calculating the proportion of species sampled inside each family and estimated the backbone sampling as the overall proportion of sampled species. Taxonlookup was used as a reference for these calculations. Analyses were run for 150 million generations and convergence was verified by plotting chain traces and ensuring that the effective sample sizes of all relevant parameters exceeded 200. The first 100 million generations were discarded as burn-in. The resulting event files for all 7 analyses were combined into a single event file, effectively generating one BAMM result set that was then analysed following standard procedure. This clade-level analysis allowed for diversification rate estimation with the most complete dataset that was available. The resulting rate estimates results were strongly correlated with estimates from a single speciation/extinction analysis using the 13,577 species that were present in our seed size database (S8 Fig; r = 0.92, pvalue < 0.001).
Rates of seed size evolution were also estimated with BAMM by using the phylogenetic tree of the 13,577 species that were present in our seed size database. Initial priors were calculated as above and analyses were run for 300 million generations. The initial 30 million generations were discarded as burn-in. We analysed BAMM prior sensitivity following recent concerns ( [45], but see [22]) by rerunning both the diversification and the trait evolution analyses for the 13,577 species dataset with different settings for the expectedNumberOfShifts parameter of either 25, 50, 100, or 250. These analyses confirmed a low prior sensitivity for the posterior of the number of expected rate shifts (S2 Fig). Finally, we obtained clade-based measures of diversification and seed size evolution across the species-level tree. Clades were defined as nonoverlapping monophyletic groupings of 4 or more species and their ages were defined using a series of 2-myr-wide time slices from the present up to 20 myr ago (see S13 Fig for the number of clades in each time slice). For each clade, we estimated its sampling fraction by weighting the genus-specific sampling fractions (i.e., the number of congeneric species in the 13,577 species tree divided by the total number of described species for that genus) with the number of species from each genus present in the clade. A minimum clade-specific sampling fraction of 0.3 was used for inclusion in our analyses, ensuring that the crown sampling probability for each clade was at least 0.7 (the actual median clade crown sampling probabilities for the time slices ranged between 0.90 and 0.96). Crown ages for the selected clades were then used to estimate net diversification rates by using the method-of-moments estimator [23]. Following standard practice [46,47], we assumed three values of relative extinction fraction, ε = 0, 0.5, and 0.9. Different values did not affect our conclusions (results not shown); therefore, we present the results of the intermediate extinction fraction (ε = 0.5). We also used RPANDA to fit a series diversification models that estimated time-dependent rates for each clade. We fitted 6 different models of diversification: (i) pure birth model with constant λ (speciation rate); (ii) pure birth model with exponential λ; (iii) birth-death model with constant λ and μ (extinction); (iv) birth-death model with exponential λ and constant μ; (v) birth-death model with constant λ and exponential μ; and (vi) birth-death model with exponential λ and exponential μ. We then used AIC-based model selection to select the best fitting model and obtain the corresponding macroevolutionary parameters. Finally, we estimated the rates of seed size evolution by fitting BM and EB models of evolution to the seed size data within each clade using fitContinuous from the geiger R package [48]. We performed AIC-based model selection to find the best fitting model of trait evolution. More than 99.7% of the clades showed a higher support for the BM model. Additionally, to account for possible biases when analysing clades with many noncongeneric species, we confirmed the results of our clade-based analysis by considering clades that only contained congeneric species (S9 Fig).

Correlation of diversification and trait evolution
All rate variables were log-transformed for the correlation analyses. Following [20], we treated seed mass evolutionary rates at the tips of the tree as character states and used STRAPP to test for multiple associations between these present-day rates and BAMM-estimated diversification dynamics. We similarly analysed the correlation of absolute seed mass and speciation/extinction dynamics. STRAPP compares the correlation between a focal trait and a macroevolutionary parameter (λ, μ, or r) to a null distribution of correlations. The null correlations are generated by permuting the evolutionary rates in the tips of the phylogenetic tree while maintaining the location of rate shift events in the phylogeny. In each case, we calculated the absolute difference between the observed correlation of the macroevolutionary rate and the trait state and the null correlation obtained by the structured permutations across 5,000 samples from the BAMM posterior (for an example of the observed correlation in 1 of the samples in the posterior, see S10 Fig). The reported p-value was the proportion of replicates where the null correlation coefficient was greater than the observed correlation. We found a low, type I error associated with our STRAPP correlation analysis (p-value = 0.094, S11 Fig). We also investigated the correlation between the mean speciation and mean phenotypic rates across all the branches in the tree using an ordinary least squares regression (S12 Fig). We note however, that this regression does not correct for phylogenetic dependence in the branch estimates, despite suggesting something about patterns across the whole evolutionary history of the angiosperms.
For the clade-based analyses, we estimated the correlation between speciation rates (measured as λ at present time for the RPANDA analyses) and each of seed size rate of evolution and phylogenetically-corrected mean seed size (i.e., trait value at the root node of the clade under a BM). Correlations were estimated by using phylogenetic generalised least squares (PGLS) as implemented in the R package caper [49] (S13 Fig and S14 Fig). Similar results as presented in the main text were obtained when analysing net diversification instead of speciation rates (S15 Fig). Finally, we similarly analysed the correlation between speciation and each of seed size and its rate of change when selecting only clades consisting of congeneric species. Again, this analysis resulted in a similar pattern as the one presented in the main text ( S9 Fig).

Diversification dynamics and other phenotypic traits
Seed mass is central to a network of inter-correlated traits associated with plant life history that can impact diversification. Some of these traits are genome size or plant C-value (measured as picograms of DNA per haploid nucleus), plant height, life cycle, and woodiness. We compared the correlation between macroevolutionary parameters (λ, μ, r) and each of seed mass, seed mass rate of evolution, C-value (i.e., genome size), plant height, life cycle, and woodiness across a dataset of 1,007 angiosperm species for which all phenotypic traits could be assembled. Genome content and life cycle data were downloaded from the Plant DNA C-values database [50]. Woodiness data were obtained from [17,51]. Plant height data were obtained from the TRY database [52]. The rates of phenotypic evolution for the continuous traits (seed size, plant height, and C-value) were calculated as the phenotypic rate inferred at the tips of the tree in our main BAMM analysis (see above). Surprisingly, mean seed mass did not differ between the 214 strictly annual and 793 perennial plants when accounting for phylogenetic relationships using a phylogenetic ANOVA (S16 Fig, phylANOVA: p value = 0.308, significance assessed with 1,000 random simulations with phytools [53]). In this reduced dataset, we ran STRAPP correlations for each focal trait with the diversification parameters calculated from our main BAMM analysis. We then calculated the absolute differences in the observed and the null correlations between the macroevolutionary parameters and seed mass, C-value, "annuality" (a binary variable specifying whether the species was strictly annual or not), plant height, woodiness (a binary variable specifying whether the species was herbaceous or woody) and the rates of seed size evolution, genome size evolution, and plant height evolution. As expected, all phenotypic traits and their rates of evolution were correlated with each other (  We estimated the type I error rate of our analysis by simulating neutral traits on the angiosperm phylogenetic tree. We performed 1,000 simulations and then ran 1,000 STRAPP tests with each simulated dataset. We estimated the corresponding p-values for the association between traits and diversification and calculated the type I error as the proportion of datasets where a significant association (p-value < 0.05) was detected.  Table. RPANDA diversification models for the clade-based analyses. For each 2-million year (myr) time slice, we counted the number of clades where the best-fitting model was either i) birth-death model with constant λ (speciation) and μ (extinction) (lambda.cst.mu.cst); pure birth model with constant λ (lambda.cst.mu0); pure birth model with exponential λ (lambda. exp.mu.0); birth-death model with exponential λ and constant μ (lambda.exp.mu.cst); birthdeath model with exponential λ and exponential μ (lambda.exp.mu.exp); or birth-death model with constant λ and exponential μ (lambda.cst.mu.exp). (DOCX) S2 Table. Correlations of seed size and other phenotypic traits. Trait values were obtained from a 1,007 species tree where all species had data for seed size, C-value and plant height. The values are the slopes of the PGLS regressions and asterisks denote statistically significant correlations (p-value < 0.05). (DOCX) S3 Table. Correlations of seed size rate of evolution and other phenotypic rates of evolution. Rate values were obtained from a 1,007 species tree where all species had data for seed size, C-value and plant height. The values are the slopes of the PGLS regressions and asterisks denote statistically significant correlations (p-value < 0.05). (DOCX) S4 Table. STRAPP correlations (rho) for 1,007 species of angiosperms with seed size, genome size (i.e., C-value), life cycle, height, woodiness data and rates of seed size, C-value and height evolution.