Episodic Evolution and Adaptation of Chloroplast Genomes in Ancestral Grasses

Background It has been suggested that the chloroplast genomes of the grass family, Poaceae, have undergone an elevated evolutionary rate compared to most other angiosperms, yet the details of this phenomenon have remained obscure. To know how the rate change occurred during evolution, estimation of the time-scale with reliable calibrations is needed. The recent finding of 65 Ma grass phytoliths in Cretaceous dinosaur coprolites places the diversification of the grasses to the Cretaceous period, and provides a reliable calibration in studying the tempo and mode of grass chloroplast evolution. Methodology/Principal Findings By using chloroplast genome data from angiosperms and by taking account of new paleontological evidence, we now show that episodic rate acceleration both in terms of non-synonymous and synonymous substitutions occurred in the common ancestral branch of the core Poaceae (a group formed by rice, wheat, maize, and their allies) accompanied by adaptive evolution in several chloroplast proteins, while the rate reverted to the slow rate typical of most monocot species in the terminal branches. Conclusions/Significance Our finding of episodic rate acceleration in the ancestral grasses accompanied by adaptive molecular evolution has a profound bearing on the evolution of grasses, which form a highly successful group of plants. The widely used model for estimating divergence times was based on the assumption of correlated rates between ancestral and descendant lineages. However, the assumption is proved to be inadequate in approximating the episodic rate acceleration in the ancestral grasses, and the assumption of independent rates is more appropriate. This finding has implications for studies of molecular evolutionary rates and time-scale of evolution in other groups of organisms.


Introduction
The grass family, Poaceae, is one of the largest plant families, comprising about 10,000 species including the most important agricultural plants, rice, wheat and maize, and grass-dominated ecosystems comprise about one-third of Earth's vegetative cover and support a vast number of animals [1]. It has long been suggested that the chloroplast (chl) genomes of the grass family have undergone an elevated evolutionary rate compared to other angiosperms [2][3][4][5], yet little is known when, why and how this rate change occurred.
To examine how the rate change occurred during evolution, it is prerequisite to know the time-scale of evolution. It has become increasingly feasible to estimate the phylogenetic tree and the time-scale of Angiosperm evolution by using chl genome sequences [5][6][7][8][9]. A reliable calibration is necessary to obtain reliable time estimates, but lack of good fossil evidence of the ancestral grasses has prevented us from addressing this issue. The recent finding of 65 Ma grass phytoliths in Cretaceous dinosaur coprolites [10,11] places the diversification of the grasses to the Cretaceous period, and provides a reliable calibration in studying the tempo and mode of grass chl evolution. By using this calibration, we here find that episodic rate acceleration occurred in the common ancestral branch of the core Poaceae (a clade formed by rice Oryza, wheat Triticum, maize Zea, and their allies) accompanied by adaptive evolution in several chl proteins, while the rate reverted to the slow rate typical of most monocot species in the terminal branches. We also find that the widely used method for estimating divergence times based on the assumption of correlated rates between ancestral and descendant lineages [12][13][14] proved to be inadequate in approximating the process of grass chl evolution, and the assumption of independent rates [15] is more appropriate to studies of rate change over time. These results have implications for studies of molecular evolutionary rates and time-scale of evolution in other groups of organisms.

Results
Estimation of time-scale and pattern of rate change Fig. 1 shows the ML phylogenetic tree of Angiosperm chl with Gymnosperm as an outgroup. The elongated branches of Poaceae are in accord with their widely accepted rate acceleration [2][3][4][5]. The global clock model in Poales (including Poaceae and Typha)+Musa was rejected when compared with the 2-local-clocks model (Poaceae lineages have a different rate from basal lineages such as Typha and Musa) by the likelihood ratio test (LRT) (x 2 1~2 903:89, Pv10 {300 with the codon-substitution model). Moreover, longer distances of the Poaceae species from Musa than the Typha/Musa distance both in terms of non-synonymous and synonymous substitutions (Fig. 1) indicate that both types of substitutions have undergone rate acceleration along the line leading to Poaceae.
To explore the pattern of rate change during the course of grass evolution in more detail, we estimated the time-scale of Angiosperm phylogeny, particularly focusing on monocots. Although several powerful methods have been developed for molecular time estimation allowing the rate change (a relaxed clock) [12][13][14][15][16][17], the poor quality of the fossil record for early grasses has prevented us from addressing this issue. Previously, the divergence among major groups of Poaceae was thought to have occurred in early Cenozoic, and the 60 Ma [18] and 50-60 Ma [5] date calibrations for the maize/wheat divergence were used in estimating the monocots/eudicots divergence time with chl DNA sequences. However, recent findings of grass phytoliths in Cretaceous dinosaur coprolites [10,11] provided evidence that the major groups of core Poaceae had already diversified before Cretaceous/Tertiary (K/T) boundary of 65 Ma. Fig. 2A gives time estimates of the monocots evolution ( Fig. 3 and Table 1 for the whole angiosperms) by a relaxed clock based on the Bayesian method implemented in MCMCTREE (in PAML [19]) with a constraint of .65 Ma for the Zea/Oryza divergence and with the independent-rates model for the rate change along lineages [15,17]. In order to illustrate the rate change during evolution, a single instance of estimated rates along the lineage from the root to Oryza is also shown in Fig. 2A, where elevation of the rate only occurred on the common ancestral branch of Poaceae after they diverged from Typha.
Although the fossil evidence for the .65 Ma constraint of the Zea/Oryza divergence is important in demonstrating the rate acceleration in ancestral grasses with subsequent slow-down, it is not a prerequisite. Even when the constraint was removed, almost the same pattern of rate change as that with the .65 Ma constraint was obtained when the IR model was used (Fig. 2B and  Table 2), although the time estimate of the Zea/Oryza separation became younger (55.0 Ma). This time estimate is consistent with a conservative date of .50 Ma presented in refs. [5,20], and our conclusion of the reverted slow rate in contemporary Poaceae can be regarded as robust to the calibration points used.
In the first model of the relaxed clock implemented by Thorne and colleagues [12,13], rates are auto-correlated between ancestral and descendant lineages on the tree, and the model is called the correlated-rates (CR) model. Sanderson's method of nonparamet- Figure 1. The phylogenetic tree of chloroplast genomes for the 31 species. The tree topology in Fig. 3 of ref. [6] was used, and the branch lengths were estimated by the ML with the codon-substitution model [21,22] (CODEML in PAML [19]). The root was arbitrarily placed between Gymnosperm and Angiosperm. Non-synonymous (d N ) and synonymous (d S ) distances of Poales from Musa were estimated by CODEML. doi:10.1371/journal.pone.0005297.g001 ric rate-smoothing [14] was also based on the same idea. Later, an alternative model named the independent-rates (IR) model with no auto-correlation was developed [15,17]. In Table 1, estimates of divergence times with the .65 Ma constraint for the Zea/Oryza separation are compared between the IR and CR models. The CR model tends to give older estimates for the nodes preceding the Zea/Oryza separation than the IR model. For example, the monocots/eudicots divergence time estimate was 239.1 Ma with the CR model, while the estimate with the IR model was 212.5 Ma which is more in accord with the recently published estimate of 140-150 Ma [5] even though it is still older. Without the .65 Ma constraint, the time estimate of the Zea/Oryza separation became too young (36.9 Ma) from the CR model to be compatible with the suggestion of .50 Ma from the previous works [5,20], while the IR model gave compatible estimate of 55.0 Ma as mentioned before.
In order to examine the impact of including rapidly evolving Poaceae in the analysis, a comparison between the two models was carried out excluding Poaceae ( Table 3). The time estimates were similar between the two models, and were similar to those from the IR model including Poaceae. For the monocots/eudicots separation, the IR model gave almost consistent estimates of 212.5, 207.7, and 216.5 Ma, respectively, with the .65 Ma constraint for the Zea/Oryza separation, without the constraint, and excluding Poaceae, while the CR model gave more diverged estimates of 239.1, 220.4, and 223.2 Ma, respectively. Interestingly, the estimates for this separation were very close between the two models when Poaceae species were excluded. This suggests that the episodic rate acceleration in ancestral Poaceae causes biased estimates, which the CR model cannot accommodate.
In the above mentioned analyses, before fixing the shape and scale parameters (a and b) in the gamma prior for parameter s 2 , which specifies how variable the rates are across branches, impact of priors on these parameters to posterior time and rate estimates was examined in detail (Tables S1, S2, S3, S4). Posterior time estimate for the Zea/Oryza separation depended less on the choice of the gamma prior for parameter s 2 with the IR model (Tables S1 and S3) than with the CR model (Tables S2 and S4), and therefore a and b in the gamma prior for parameter s 2 were arbitrarily chosen to be 1.0 and 10.0, respectively, in the analyses of Tables 1-3

and Figs. 2 and 3.
In order to further check the robustness of the time estimation on the choice of the substitution model, additional analyses based on a more realistic model of codon-substitution [21,22] were carried out (Tables S5 and S6 with and without the .65 Ma constraint to the Zea/Oryza separation, and Table S7 excluding Poaceae). Comparisons of Tables S5 vs 1, Tables S6 vs 2, and  Tables S7 vs 3 indicate that the estimated times are very similar between the two models, and that the estimated times are robust to the choice of a substitution model.

Adaptive evolution
Non-synonymous/synonymous rate ratio (v = d N /d S ) is widely used as an indicator of adaptive evolution or positive selection [23]. Table 4 compares v ratios along the branches estimated by different models. The minimum AIC [24] model shows that a pronounced increase of v ratio occurred in the common ancestral lineage of Poaceae after they diverged from Typha, followed by reversion in the terminal branches to the lower level typical of basal lineages. The elevation of the v ratio can be due either by adaptive evolution or by relaxation of selective constraints. A higher v value than 1 is usually regarded as an evidence of adaptive evolution, but since the analysis shown in the table averages over the entire genomes, we would not get such a high Figure 2. Posterior estimates of divergence times and rate change during evolution. Estimations were done by using MCMCTREE in PAML [19] with the IR model [15] for the rate change along lineages. Shape and scale parameters, a and b, in the gamma prior for parameter s 2 were 1.0 and 10.0, respectively. Only Poales+Musa part of the whole tree is shown, and a numbering of a node follows that of the whole tree in Fig. 3. The upper lines of the colored area trace the estimated rates along the lineage from the root to Oryza value even if positive selection operated in some parts of some proteins. Therefore, the branch-site model [25,26] was applied.
To identify positively selected sites, among 61 protein-encoding ''genes'', we at first selected 16 ''genes'', for which the reverted 2vmodel (with the rate ratio v 1 of the common ancestral branch of Poaceae larger than the rate ratio v 0 of other branches) is significantly better than the 1v-model (P,0.05) ( Table 5), and by using the branch-site model [25,26], we identified 5 genes (atpE, cemA, clpP, rpoB, and rps11) which have P value of the branch-site LRT less than 0.05 and contain positively selected sites (Table 6).
Among the 16 genes with significantly higher v 1 than v 0 in Table 5 and among the 5 genes with positively selected sites in Table 6, only atpE is among the 14 genes with significant heterogeneity of nucleotide substitution rates for maize vs. rice, maize vs. wheat, or rice vs. wheat comparisons listed in Table 5 of ref. [27]. Four ''genes'', psaC, rbcL, rpl6, and PS13, have significantly lower v 1 than v 0 (stronger purifying selection in the ancestral branch of Poaceae than in other branches). In Table 5, we carried out multiple tests for 61 ''genes''. The Bonferroni correction is a safeguard against multiple tests falsely giving the appearance of significance, since 1 out of every 20 hypotheses tests is expected to be significant at the 5% level purely by chance. After performing the Bonferroni correction, 7 genes with * in Table 5 remained significant, that means, all the genes listed in Table 6 remained significant even by the conservative test of Bonferroni. On the other hand, the 4 ''genes'' with lower v 1 than v 0 in Table 5 were not significant after the Bonferroni correction.

Discussion
In our study, the IR model gives more consistent results than the CR model, which has been widely used in estimating divergence times [9,[12][13][14][28][29][30][31][32][33][34]. A basic assumption of the CR model is that rates change gradually over the tree. Our results suggest that the magnitude of the rate acceleration is underestimated by the CR model and that the IR model is more appropriate in approximating the rate change in the grass chl evolution. Although there exists a case in which the CR model outperforms the IR model [34], a number of authors have recently begun to notice that the IR model is superior to the CR model in approximating the evolution of evolutionary rates in several cases [17,[35][36][37].
Our analysis has revealed an episodic acceleration of the evolutionary rate of chl genomes during the emergence of core Poaceae, accompanied by adaptive evolution in several protein-  [19] with the IR model [15]. The .65 Ma constraint to the Zea/Oryza separation was applied. doi:10.1371/journal.pone.0005297.g003  encoding genes. Because the elevation of the rate occurred not only in non-synonymous substitutions but also in synonymous substitutions and because the elevated substitution rates were accompanied also by an elevated rate of insertions/deletions of nucleotides [9], the elevation of the mutation rate of chl genomes might have acted as a trigger of the adaptive evolution in the ancestral grasses, which might have facilitated the successful radiation and diversification of their descendants. Suggested positive selection of clpP in Oenothera and Sileneae accompanied by elevated synonymous rate [38] might be related to our finding of rate acceleration in ancestral grasses both in terms of synonymous and non-synonymous substitutions. A more extensive study of chl genomes showed highly accelerated non-synonymous rates of ribosomal protein and RNA polymerase genes in Geraniaceae accompanied with the elevation of the v ratio [39]. Interestingly, the 4 genes (atpE, cemA, rpoB, and rps11) detected to have positively-selected sites in our analysis (Table 6) are included in the gene group with significantly high v ratio in Geraniaceae relative to other angiosperms (clpP was not analyzed in ref. [39]).
Recently, Smith and Donoghue [40] tested evolutionary rates across five groups of angiosperms, and found that the rates are generally low in trees/shrubs compared to related herbs. This is an interesting finding which links life history of plants to their rates of molecular evolution, and their conclusion generally holds in five different groups of Angiosperm. What we have shown in this work, however, is that the pattern of rate change during evolution is more complicated than has previously been anticipated. Our finding highlights the need for paying attention to rates of internal branches rather than averaging along a lineage in addressing the rate heterogeneity problem.

Materials and Methods
Since our main interest was on grass evolution, we used all the monocot genera (13 species) and selected 18 species from outside monocots (31 species in total) among the 64 species in ref. [6]. We used 75 chl genes among 77 protein-encoding genes in ref. [6], excluding infA and ycf2 because of missing data.  Table 4. Estimation of non-synonymous/synonymous rate ratio (v) under different models by using CODEML in PAML [19].  Estimation of divergence times The concatenated 75 gene sequences of chl from 31 species (from ref. [6]) and the tree topology in ref. [6] were used. To estimate divergence times and molecular evolutionary rates, a Bayesian method implemented in MCMCTREE (in PAML [19]) was applied either with the CR model [12,13] or with the IR model [15] (using the GTR model with a discrete gamma distribution with five rate categories (C 5 ) for nucleotide substitutions), and multiple calibrations were incorporated through the time prior. The Gymnosperm/ Angiosperm divergence time was set at 280-310 Ma [5,7]. Three nodes were constrained with minimum ages as follows; (1) the minimum age of the Zea/Oryza divergence was set either to 65 Ma [10,11] or without this constraint, (2) .115 Ma constraint to the divergence of Poales from other monocots based on the earliest fossils of Poales [41,42], and (3) .125 Ma for the most basal divergence in eudicots [7]. In order to check the robustness of the time estimation on the choice of the substitution model, the codon-substitution [21,22]+C 5 model was also used. The program adopts soft bounds, so that the probability that the true divergence time is outside the bounds is small but not zero [43]. In the Bayesian framework, priors are assigned not only on times, but also on the overall substitution rate parameter m and on the rate-drift parameter s 2 . So we roughly estimated the prior mean of the overall rate m using the strict molecular clock with 295 Ma constraint to Gymnosperm/Angiosperm divergence time, and assigned the gamma prior G(4, 80) and G (4,22) for this prior parameter in applying the nucleotide and codon substitution models, respectively. We next examined the impact of the rate-drift parameter s 2 by giving various priors for s 2 in applying the nucleotide substitution model. Posterior distributions of parameters were approximated using two independent MCMC analyses of 10 7 steps each, following a discarded burn-in of 10 6 steps. All the analyses were repeated with different inseed values to check for convergence of the MCMC chain.

Non-synonymous/synonymous rate ratio
To the concatenated sequences of 75 protein-encoding genes of chl from 6 Poaceae species (Oryza, Triticum, Hordeum, Zea,  The numberings of amino acids are those of Zea mays [45]. Positively selected sites were inferred at P b = 95% with those reaching 99% shown in bold italic. The analyses were carried out for the 16 genes selected in Table 5, and only genes with positively selected sites and with P,0.05 (LRT) are listed. doi:10.1371/journal.pone.0005297.t006 Saccharum, and Sorghum), Typha, and Musa, we applied the codonbased likelihood models that allow for variable v ratios among different lineages [44]. We used the likelihood ratio test (LRT) to compare the likelihood of one-v ratio model, which assumes the same v for all branch in the tree, with the two-v ratio models, which assumes two different v ratios. One of the two-ratio models (named ''Simple 2v-model'') assumes that Poaceae (including the common ancestral branch) has different v from other parts of the tree as is represented by while the other (named ''Reverted 2v-model'') assumes that only the ancestral branch of Poceae has a different v ratio than all the other branches in the tree as is represented by Musa#v 0 , Typha#v 0 , crown Poaceae #v 0 ð Þ #v 1 ð Þ : All the analyses were carried out with the CODEML program in PAML [19] using the codon-substitution model with the F61 codon frequency.

Branch-site test of positive selection
The branch-site test was applied to the dataset of 11 monocot species in our dataset excluding the two most basal monocots; i.e., Dioscorea and Acorus. The branch preceding the common ancestor of the core Poaceae was specified as a foreground branch, and all the others as background branches. LRT is constructed to compare an alternative model that allows for some codons under positive selection on the foreground branch with a null model that does not. The null model restricts codons on the foreground lineage to be undergoing neutral evolution (v = 1). The specific codons which evolved under positive selection were identified on the foreground branch using a Bayes empirical Bayes procedure [25,26].

Supporting Information
Table S1 Impact of the shape and scale parameters (a and b) in the gamma prior for parameter s 2 using IR model with the .65 Ma constraint to the Zea/Oryza separation. 95% HPD is shown in parentheses. Times and rates are represented in 100 Ma (10 8 years ago) and 10 28 substitutions/site/years, respectively. Found at: doi:10.1371/journal.pone.0005297.s001 (0.04 MB DOC) Table S2 Impact of the shape and scale parameters (a and b) in the gamma prior for parameter s 2 using CR model with the .65 Ma constraint to the Zea/Oryza separation. 95% HPD is shown in parentheses. Times and rates are represented in 100 Ma (10 8 years ago) and 10 28 substitutions/site/years, respectively. Found at: doi:10.1371/journal.pone.0005297.s002 (0.04 MB DOC) Table S3 Impact of the shape and scale parameters (a and b) in the gamma prior for parameter s 2 using IR model without the constraint to the Zea/Oryza separation. 95% HPD is shown in parentheses. Times and rates are represented in 100 Ma (10 8 years ago) and 10 28 substitutions/site/years, respectively. Found at: doi:10.1371/journal.pone.0005297.s003 (0.04 MB DOC) Table S4 Impact of the shape and scale parameters (a and b) in the gamma prior for parameter s 2 using CR model without the constraint to the Zea/Oryza separation. 95% HPD is shown in parentheses. Times and rates are represented in 100 Ma (10 8 years ago) and 10 28 substitutions/site/years, respectively. Found at: doi:10.1371/journal.pone.0005297.s004 (0.04 MB DOC)

Table S5
Posterior estimates of divergence times by MCMCTREE in PAML [19] using the codon-substitution+C 5 model with the .65 Ma constraint to the Zea/Oryza separation. Shape and scale parameters, a and b, in the gamma prior for parameter s 2 were 1.0 and 10.0, respectively. 95% HPD is shown in parentheses. Rate (10 28 substitutions/codon/year) refers to the rate of the branch preceding the node. Node numbers refer to those in Fig. 3, and taxa in parentheses refer to those branched off from the lineage leading to Oryza.   [19] using the codon-substitution+C 5 model without constraint to the Zea/Oryza separation. Shape and scale parameters, a and b, in the gamma prior for parameter s 2 were 1.0 and 10.0, respectively. 95% HPD is shown in parentheses. Rate (10 28 substitutions/codon/year) refers to the rate of the branch preceding the node. Node numbers refer to those in Fig. 3, and taxa in parentheses refer to those branched off from the lineage leading to Oryza.   [19] using the codon-substitution+C 5 model excluding Poaceae. Shape and scale parameters, a and b, in the gamma prior for parameter s 2 were 1.0 and 10.0, respectively. 95% HPD is shown in parentheses. Rate (10 28 substitutions/ codon/year) refers to the rate of the branch preceding the node. Node numbers refer to those in Fig. 3, and taxa in parentheses refer to those branched off from the lineage leading to Oryza. Found at: doi:10.1371/journal.pone.0005297.s007 (0.04 MB DOC)