Advantages of a Mechanistic Codon Substitution Model for Evolutionary Analysis of Protein-Coding Sequences

Background A mechanistic codon substitution model, in which each codon substitution rate is proportional to the product of a codon mutation rate and the average fixation probability depending on the type of amino acid replacement, has advantages over nucleotide, amino acid, and empirical codon substitution models in evolutionary analysis of protein-coding sequences. It can approximate a wide range of codon substitution processes. If no selection pressure on amino acids is taken into account, it will become equivalent to a nucleotide substitution model. If mutation rates are assumed not to depend on the codon type, then it will become essentially equivalent to an amino acid substitution model. Mutation at the nucleotide level and selection at the amino acid level can be separately evaluated. Results The present scheme for single nucleotide mutations is equivalent to the general time-reversible model, but multiple nucleotide changes in infinitesimal time are allowed. Selective constraints on the respective types of amino acid replacements are tailored to each gene in a linear function of a given estimate of selective constraints. Their good estimates are those calculated by maximizing the respective likelihoods of empirical amino acid or codon substitution frequency matrices. Akaike and Bayesian information criteria indicate that the present model performs far better than the other substitution models for all five phylogenetic trees of highly-divergent to highly-homologous sequences of chloroplast, mitochondrial, and nuclear genes. It is also shown that multiple nucleotide changes in infinitesimal time are significant in long branches, although they may be caused by compensatory substitutions or other mechanisms. The variation of selective constraint over sites fits the datasets significantly better than variable mutation rates, except for 10 slow-evolving nuclear genes of 10 mammals. An critical finding for phylogenetic analysis is that assuming variable mutation rates over sites lead to the overestimation of branch lengths.


Introduction
Growing DNA and protein sequence data is now a valuable source of knowledge in many fields of science, especially in evolutionary biology. Evolutionary history of DNA sequences is a key to understand the diversity of homologous sequences. Any method for inferring molecular phylogeny is implicitly or explicitly based on the evolutionary model of nucleotide or amino acid substitutions, and the reliability of phylogenetic analyses strongly depends on models designed to approximate the substitution processes of nucleotide and amino acid. For the evolutionary analysis of protein-coding sequences, three types of models can be used: nucleotide, amino acid, and codon substitution models. Which type of model fits any sequence data better than the others? Mutational events occur at the nucleotide level, but selective pressure primarily operates at the amino acid level. Thus, a codon substitution model has a potential to outperform both nucleotide substitution models [1][2][3] and amino acid substitution models [4][5][6][7][8][9][10][11][12], because it can take into account both mutational tendencies at the nucleotide level and selective pressure on amino acid replacements as well as a genetic code. Shapiro et al. [13] proposed a codon position model, in which codon position is incorporated into a nucleotide substitution model. This model is computationally efficient but insufficient to take account of the dependencies of selective pressure on amino acid replacements.
Codon substitution models are classified into either an empirical codon substitution model or a mechanistic codon substitution model. In empirical codon substitution models [14,15], substitution rates between codons were empirically estimated from a large set of protein-coding sequences, and mutational tendencies at the nucleotide level and selection pressure at the amino acid level cannot be separated at all. Therefore, there is no parameter except codon frequencies to tailor for each protein family. Delport et al. [16] showed that empirical substitution matrices represent the average tendencies of substitutions over various protein families by sacrificing gene-level resolution.
In mechanistic codon substitution models, a mutational mechanism at the nucleotide level and selection at the amino acid level are distinguished in various levels of separation. If no selection pressure on amino acids is taken into account, the codon substitution model will become essentially equivalent to a nucleotide substitution model. If mutation rates are assumed not to depend on the codon type, then the model will become essentially equivalent to an amino acid substitution model. Such a codon model with the infinitely large synonymous substitution rate, was proved [17] to be exactly equivalent to an amino acid substitution model. It was shown on protein-coding sequences that codon substitution models are statistically superior to the nucleotide and amino acid substitution models [18,19].
There are two type of models for the mutational scheme of codon, depending on whether multiple nucleotide changes in infinitesimal time are allowed [17,[19][20][21] or not. Even though all the empirical amino acid substitution models [4][5][6]8,10,11] and the empirical codon substitution model [15] allow amino acid or codon substitutions requiring multiple nucleotide changes in infinitesimal time, only single nucleotide changes were assumed to occur in infinitesimal time [7,18,[22][23][24][25]. Multiple nucleotide changes in infinitesimal time are biologically plausible, because they can be caused by successive compensatory substitutions [26], recombination, gene conversion and other mechanisms [27], especially in long branches. It has been pointed out that assuming multiple nucleotide changes in codon substitution models can significantly improve the maximum likelihood (ML) value [19,20].
In the present models, mutational tendencies at the nucleotide level are tailored to each gene by the general time-reversible (GTR) model, but multiple nucleotide changes in infinitesimal time are allowed. In the Singlet-Doublet-Triplet (SDT) mutation model [20], single-nucleotide, doublet and triplet mutations spanning codon boundaries are taken into account, but double nucleotide mutations at the first and the third positions in a codon were not taken into account. In the present model, it is assumed [19] that nucleotide mutations occur independently at each codon position and so any double nucleotide mutation occurs as frequently as doublet mutations.
There are a variety of models for selection pressure on amino acid replacements in mechanistic codon substitution models; (1) models [17,18,21] based on empirical amino acid substitution matrices, in which codon exchangeabilities for nonsynonymous substitutions were evaluated on the basis of empirical amino acid exchangeabilities, and selective constraints on amino acids are not well separated from codon mutation rates, (2) equal-constraint models [20,24,28,29], in which the difference between nonsynonymous and synonymous substitution rates was taken into account but the amino acid dependences of selective constraints on amino acids were not taken into account, i.e., single selective constraints for all types of amino acid substitutions, (3) physico-chemicalconstraint models [7,22,23], in which selective constraints for each protein family were approximated in a linear function of the selective constraints evaluated from physico-chemical properties of amino acids, (4) fully-parameterized-constraint models [7,16,25], in which selective constraints were grouped, and the number of groups and the strength of selective constraint of each group were optimized for a given protein phylogeny, and (5) site-specific selection models [30], in which site-specific selection was modeled in terms of site-specific residue frequencies in a codon substitution model.
In the models [17,21] of the first category, codon exchangeabilities for nonsynonymous codon substitutions requiring multiple nucleotide changes are set to non-zero according to the empirical amino acid exchangeabilities; the exchangeability is defined to be an instantaneous rate divided by the equilibrium composition of destination codon or amino acid. The method in the fourth category has the highest resolution of selective constraints employing as many substitution groups as necessary. However, it seems to be a very computer-intensive calculation [25].
In the present model, selective constraints on the respective types of amino acid replacements are tailored to each gene in a linear function of a given estimate of selective constraints in the same way with the physico-chemical-constraint models. The simplest model for the selective constraints is to assume equal constraint on amino acid replacements and equivalent to the second category of model; it is named here the Equal-Constraint model. Of course, physico-chemical estimates of the selective constraints can also be used [19]. Better estimates are those that were estimated [19] by maximizing the respective likelihoods of observed amino acid or codon substitution frequency matrices.
A property of codon substitution models in which synonymous substitutions can be identified is an advantage over nucleotide and amino acid substitution models. Significance of rate variation over sites in proteins has been demonstrated mostly in nucleotide substitution models and empirical amino acid substitution models [31][32][33]. Variable rates of nucleotide and amino acid substitutions over sites can be caused not only by the variation of mutation rate but also by the variation of selective constraint over sites. However, in the nucleotide and the amino acid substitution models, synonymous substitutions cannot be recognized, and therefore the variations of mutation rate and of selective constraint over sites cannot be distinguished from each other. On the other hand, the variations of selective constraint and mutation rate can be distinguished from each other in codon substitution models, assuming no selective pressure on synonymous mutations at the amino acid level. It is reasonable from a viewpoint of protein structure and function that amino acid replaceabilities strongly depend on sites in a protein [34]. Molecular mechanisms are not known to cause significantly variable mutation rates over sites within the exons of a gene. Here, we examine which model fits data statistically better. In the present model, either the variation of mutation rate or the variation of selective constraint is taken into account, although both is not taken into account at the same time because of heavy computational load. Yang et al. [29] also studied heterogeneous selection pressure at amino acid sites by codon substitution models.
Besides the variation of substitution rate over sites, the variation of substitution rate over time at each site is also possible. The sitespecific variation of substitution rate over time was first discussed as a covarion model by Fitch and Markowitz [35]. Recently, a few cases indicating its significance have been reported [36,37]. Here we take into account the variation of mutation rate over time at each site in a simple approximation.
The estimation of branch lengths is critical on the estimation of phylogeny and divergence times. We examine how differently branch lengths are estimated between models. The present mechanistic codon substitution model can simulate a wide range of codon substitution processes by changing parameters, and can provide biologically meaningful information at both nucleotide and amino acid levels such as transition/transversion rate bias, the ratio of multiple nucleotide changes, the strength of selective constraints on amino acids, the variation of mutation rate or selective constraints over sites, and also the variation of mutation rate over time in branches. Here, the present codon substitution models with the various sets of parameters are extensively studied, and the advantages of the present model over other models are demonstrated.

A time-reversible Markov model for substitutions
When substitutions independently occur at each site with a constant substitution rate R kl per unit time from codon or amino acid k to l, the substitution probability matrix S(t) at time t is calculated as Assuming that the detailed balance condition between states is satisfied, i.e., f k S(t) kl~fl S(t) lk and f k R kl~fl R lk , the substitution rate R kl is represented as where f k is the equilibrium composition; P k f k S(t) kl~fl . The symmetric matrix r is named an exchangeability matrix. In the case of the codon substitution matrix, the equilibrium frequencies of stop codons are set to be equal to 0, and therefore the probability flow from any to a stop codon and its inverse flow are always equal to 0. The unit of time is chosen in such a way that the total rate of R is equal to 1; Therefore, only the relative values among r kl are meaningful. In a given phylogeny of molecular sequences, a substitution process of codon or amino acid is assumed to be in equilibrium. In other words, the substitution process is assumed to be timereversible. Also, exchangeabilities fr kl g are approximated not to depend on the equilibrium frequencies ff k g; this model is specified here with a suffix ''F'' according to a common naming convention.
Empirical amino acid substitution models converted into codon substitution models Amino acid exchangeabilities fr ab g for amino acid substitutions have been estimated from large sets of protein sequences. From nuclear proteins, the JTT [5], the WAG [10], and the LG [11] rate matrices were estimated. The mtREV [6] substitution probability matrix was estimated from vertebrate mitochondrial proteins, and the cpREV10 [8] and the cpREV64 [38] matrices were estimated from chloroplast proteins of 10 species and of 64 species, respectively. These amino acid substitution models can be converted into codon substitution models by defining codon exchangeabilities on the basis of amino acid exchangeabilities between encoded amino acids [17][18][19]21]. Here we consider the following simplest conversion from the amino acid models into codon models to examine the performance of the empirical amino acid substitution models in phylogenetic inference from coding sequences.
The codon exchangeability r mn between nonsynonymous codons m and n is defined to be proportional to the empirical amino acid exchangeability r empirical ambn between encoded amino acids a m and b n with a parameter w 0 to adjust the ratio of synonymous to nonsynonymous substitution exchangeability. Codon exchangeabilities between synonymous codons are taken to be all equal to one another in such a way that in the case of w 0~0 they are equal to the maximum exchangeability of nonsynonymous substitutions.
where 00 empirical 00 [f JTT, WAG, LG, mtREV, cpREV10, cp REV64 g. The arbitrary scaling constant C onst is determined by Eq. 3. In the limit of w 0 ?{?, this model is exactly equivalent to the corresponding amino acid substitution model [17]. This model was named as the SK-P1 model by Seo and Kishino [17], and is called here by the name of the empirical amino acid substitution matrix with a suffix meaning the number of ML parameters such as JTT-n, WAG-n, LG-n, mtREV-n, cpREV10-n, and cpREV64-n.

Empirical codon substitution models
Kosiol et al. [15] estimated codon exchangeabilities fr mn g from nuclear-encoded sequences; this substitution rate matrix is called here the KHG matrix. This empirical codon substitution model has been extended here with a parameter w 0 to adjust the ratio of synonymous to non-synonymous substitution exchangeability.
for m=n, where empirical [f KHG g, and d ambn is the Kronecker's d. The arbitrary scaling constant C onst is determined by Eq. 3. The exchangeabilities of nonsynonymous codon substitutions are scaled in such a way that in the case of w 0~0 the maximum exchangeability of nonsynonymous substitutions is equal to that of synonymous substitutions. This model is called KHG-n, where the suffix n means the number of ML parameters.

A mechanistic codon substitution model with multiple nucleotide changes
In the present mechanistic codon substitution model [19], the substitution rate R mn is represented as the product of a mutation rate M mn and the average rate of fixation F mn , which is defined to be the average fixation probability multiplied by the chromosomal population size, for mutations from codon m to n under selection pressure; R mn ! M mn F mn for m=n. The M is also assumed to satisfy the detailed balance condition; f mut m M mn~f mut n M nm , where f mut n is the equilibrium codon composition of the rate matrix M. Under this assumption, the average fixation rate F mn must be represented as the product of the two terms, f n =f mut n and e wmn , where w mn~wnm ; F mn~( f n =f mut n )e wmn for m=n. Then, the exchangeability r mn can be represented as The arbitrary scaling constant C onst is is determined by Eq. 3. The frequency-dependent term f n =f mut n represents the effects of selection pressures at the DNA level as well as at the amino acid level, which change the codon frequency from the mutational equilibrium frequency f mut n to the frequency f n specific to a gene. The fixation rate F was explicitly given as a function of the fitnesses of mutants m and n [28,30]. The fixation rate is obviously equal to 0 for lethal mutations and equal to 1 for neutral mutations. Here, we approximate the average quantity e wmn over mutants to be independent of codon frequencies. This quantity e wmn is essentially the same as the one called the rate of acceptance by Miyata et al. [39]. We assume that selection pressure against codon replacements appears primarily on an amino acid sequence encoded by a nucleotide sequence; in other words, w mn for codon pair (m,n) is equal to the selective constraint w ambn for the encoded amino acid pair (a m ,b n ). where w estimate ab with 00 estimate 00 [f Equal{Constraint, EI, JTT{ML91z, WAG{ML91z, LG{ML91z, KHG{ML 200g means the estimate of w ab , which is equal constraint on amino acids (w estimate ab~0 or b~0), a physico-chemical estimate based on the Energy-Increment-based (EI) method [19], or a ML estimate [19] from the empirical substitution frequency matrix of JTT, WAG, LG, or KHG. The value of w ab is non-positive, assuming that on average there is negative selection on amino acid replacements; of course, w estimate ab ƒ0 [19]. The parameter b, which is non-negative, adjusts the strength of selective constraints for a given protein family. The parameter w 0 directly controls the ratio of nonsynonymous to synonymous substitution exchangeability. Positive selection is taken into account when selective constraints are variable over sites.
The Equal-Constraint model with w 0~0 is called the No-Constraint model and is equivalent to a nucleotide substitution model. In the model EI,ŵ w estimate ab :Dê e c ab zDê e v ab , where Dê e c ab and Dê e v ab represent the mean increment of contact energies between residues and the mean volume change due to an amino acid replacement, respectively; see Supporting Information, Text S1, in [19]. The selective constraint matrices w estimate with 00 estima te 00 [f JTT{ML91z,WAG{ML91z, LG{ML91z g were those estimated by maximizing the respective likelihoods of the 1-PAM amino acid substitution frequency matrices of JTT, WAG, and LG in the ML-91+ model [19]. Similarly, the matrix w KHG{ML200 were estimated from the 1-PAM KHG codon substitution frequency matrix in the ML-200 model [19]. These estimates of selective constraints are available as Supporting Information, Data S1, in [19]. These models are called here by the name of a selective constraint matrix with a suffix meaning the number of ML parameters such as Equal-Constraint-n, EI-n, JTT/ WAG/LG-ML91+-n, and KHG-ML200-n.
The mutation rate matrix M is defined in terms of nucleotide mutation rates as follows.  where f mut i,n i is the equilibrium composition of nucleotide n i at the ith codon position, and m i,m i n i is the exchangeability between nucleotides m i and n i at the ith codon position. Because the B i is assumed to satisfy the detailed balance condition, the M also satisfies the detailed balance condition.
If multiple nucleotide changes were completely ignored, then Eq. 10 would be simplified as M mn~( (1{d , whose formulation for a codon mutation rate matrix with Eq. 11 is the same as the one proposed by Muse and Gault [24]. Here, it should be noted that B i,m i ni in Eq. 11 is defined to be proportional to the equilibrium nucleotide composition f mut i,ni . Alternatively, one may define M mn as M mn~P 3 i~1 ½d m i ni z(1{d m i ni ) m i,m i ni f mut n in the same way as Miyazawa and Jernigan [22] and others [7,23] defined it to be proportional explicitly to the composition of the base triplet, f mut n . This alternative definition with Eqs. 6 is equivalent to Eqs. 10 and 11 with f mut ni~0 :25 and m i,m i ni [ 4m i,m i ni , and thus it is a special case in the present formulation.
The No-Constraint model, in which there is no selection pressure on amino acid replacements (w mn~0 ), is a nucleotide substitution model extended to allow multiple nucleotide changes in infinitesimal time. Also, it is useful to note that the present model in the special case of M mn~c onstant becomes equivalent to an amino acid substitution model converted into a codon substitution model; if (m i ) m i ni~4 and f mut i,ni~0 :25, then M mn~1 and Eq. 6 will become r mn !e wmn and equivalent to Eq. 4 with r empirical ab !e bw estimate ab .
In the present analyses, we assume for simplicity that m i,m i ni and f mut i,ni do not depend on codon position i; that is, m i,jg~mjg and f mut i,j~f mut j , where j,g[fa,t,c,gg. This approximation is reasonable because mutational tendencies may be independent of a nucleotide position in a codon. Let us define m ½tc½ag to represent the average of the exchangeabilities of the transversion type, m ta , m tg , m ca , and m cg , and likewise m tcjag to represent the average of the exchangeabilities of the transition type, m tc and m ag . We use the ratios fm jg =m ½tc½ag g as parameters for exchangeabilities, and m(:m ½tc½ag ) to represent the ratio of the exchangeability of double nucleotide change to that of single nucleotide change and also the ratio of the exchangeability of triple nucleotide change to that of double nucleotide change; note that the exchangeabilities of single, double, and triple nucleotide changes are of O(m ½tc½ag ),O(m 2 ½tc½ag ), and O(m 3 ½tc½ag ) in Eq. 3, respectively, and that Eq. 3 must be satisfied. Then, multiple nucleotide changes in infinitesimal time can be completely neglected by making the parameter m(:m ½tc½ag ) approach zero with keeping fm jg = m ½tc½ag g constant in Eq. 3. Also, it is noted that unlike the SDT model [20] double nucleotide changes at the first and the third positions in a codon are assumed to occur as frequently as doublet changes.
The number of parameters except equilibrium codon frequencies in the mechanistic codon substitution model is equal to 11; they are b, w 0 , m(:m ½tc½ag ), m tcjag =m ½tc½ag , m ag =m tcjag , m ta = m ½tc½ag , m tg =m ½tc½ag , m ca =m ½tc½ag , f mut a , f mut c , and f mut g , and fixed at certain values or optimized as ML parameters.

Variations of mutation rate and of selective constraint across codon sites
Taking account of the variation of amino acid substitution rate over sites always increases the maximum likelihood of a phylogenetic tree in the analysis of amino acid sequences [31]. The variation of amino acid substitution rate can be caused by the variation of mutation rate and also by the variation of selective constraint on amino acids. Here, the variation of either mutation rate or selective constraint over sites is taken into account, but both are not taken into account at the same time because of a heavy computational load.
The variation of mutation rate over codon sites is also assumed to obey a C distribution [31] with a shape parameter a and the mean equal to 1, which is then approximated by a discrete-gamma distribution [32,40] with m categories, each with equal probability, This model is specified with a suffix dGmr whose m means the number of categories.
The variation of selective constraint over amino acid sites is assumed to obey a discrete-gamma distribution, too. In this model, the average of selective constraints over amino acid pairs (the mean acceptance rate), P a P bwa e w ab =190 in the mechanistic codon substitution model or w 0 in other codon substitution models, is assumed to vary according to a discrete-gamma distribution. The rate matrix of each category is scaled so that the mean rate matrix satisfies Eq. 3. This model is specified with a suffix dGms whose m means the number of categories.
In the mechanistic codon substitution model, selective constraint w i,ab for ith category in a discrete-gamma distribution is calculated to satisfy the following equations.
1 190 where C i is the value of the i th category in the discrete-gamma distribution whose mean is equal to the average of e w ab over all amino acid pairs and whose shape parameter is equal to a; 0ƒC i vC iz1 . If C i v1 and c i exp w ab ƒ1 for Va,b, c i will be simply equal to a point of the discrete-gamma distribution whose mean is equal to 1.
In the other codon models, the equal amino acid constraint w i,0 for i th category in Eq. 4 and Eq. 5 is calculated from the following equations.
In this case, C i is and c i are points of the discrete-gamma distributions, whose means are equal to exp w 0 and 1, respectively, with the shape parameter a.
The shape parameter a of the discrete-gamma distribution for the variation of mutation rate or selective constraint is optimized as one of ML parameters. Equal probability of each category is used for the mutation rate variation, but it may be inappropriate for the variation of selective constraint, because C i (iw1) is often too small for a rate matrix to be significantly different between C i{1 and C i . In such a case, the prior probability of C i{1 is increased to make the rate matrices for C i{1 and C i significantly different.
A simple approximation for the variation of mutation rate over time A mutation rate at each site may vary in each branch, especially long branches, of a phylogenetic tree. If the variation of mutation rate is synchronized among sites, it will be reflected by the length of each branch. The unsynchronized portion of rate variation among sites is considered. Here, a simple approximation for the variation of mutation rate over time is provided. The mutation rate matrix M and therefore the substitution rate matrix R are assumed to vary in time only by a scalar factor, m(t) at time t. The expected values of the mean and the variance of the total substitution rate in a branch whose length is equal to T are as follows.
The mutation rate as a function of time is assumed to be autocorrelated with a correlation time t%T. In this case, the mean and the variance are both the linear functions of T. For the variation of the total mutation rate in the branch of the length T, we assume a C distribution whose scale and shape parameters are equal to s^2tE((m{E(m)) 2 )=E(m) and a~E(m)T=s, respectively. Then, the expected substitution matrix is: where C(x; E(m)T=s,s) is the probability density function of a C distribution with a scale parameter s and a shape parameter equal to E(m)T=s, C(E(m)T=s) is the C function, and I is the identity matrix. Then, log E(S(1,1)) is used instead of R as a rate matrix; the rate matrix log E(S(1,1)) is scaled to make the mean rate matrix satisfy Eq. 3. A constant mutation rate corresponds to s~0. The scale parameter s is set to 0 or is optimized as a ML parameter. This approximation for the variation of mutation rate over time is very simple and does not require any additional computational time, although the performance will be limited in comparison with a more complete approximation [36]. However, the ML estimate of s in this approximation may be influenced by the variation of mutation rate across sites, because the mean of the substitution matrix over sites is represented by a similar functional form to E(S(E(m),T)); assuming that mutation rates vary across sites with a C distribution, the mean of substitution matrix over sites for a branch of the length T is formulated as SS(E(m),T)T: which is equal to the expected substitution matrix in the case of twT.
1. Dataset cpDNA-9: Divergent codon sequences consisting of 45 protein-coding genes from 9 chloroplast genomes, whose protein sequences were used to estimate the cpREV10 by [8]; Synechocystis PCC6803, which was the outgroup sequence in their analysis, is not used in the present analysis. The codon sequences were obtained from the NCBI RefSeq database of organelle genomes. The total codon length of aligned genes is equal to 12507, and the minimum amino acid identity between sequences is equal to 0.58. The tree topology that was estimated as Tree-1 by [8] is used here as the most probable tree. Overlapped segments between genes were removed from codon sequences. 2. Dataset cpDNA-55: Codon sequences consisting of 52 proteincoding genes from 55 chloroplast genomes of the major angiosperm lineages, which are genome sequences available in the NCBI RefSeq database out of the 64 genomes analyzed in [41], and which are genes owned by all 55 taxa. The tree topology estimated by [41] is used as the most probable tree in the present analysis. The total codon length of aligned genes is equal to 14128, and the minimum amino acid identity between the sequences is equal to 0.73. The cpREV64 [38] was estimated from the full set of 77 protein-coding genes in the 64 genomes. 3. Dataset mammalian-mtDNA: Interspecific mammalian mitochondrial codon sequences consisting of 12 protein-coding genes from 69 mammalian species [42], whose genome sequences were obtained from the NCBI RefSeq database of organelle genomes. The total codon length of aligned genes is equal to 3618, and the minimum amino acid identity between the sequences is equal to 0.66. The tree topology that was estimated as Tree-6 by [42] is used here as the most probable tree. Overlapped segments between genes were removed from codon sequences. 4. Dataset human-mtDNA: Intraspecific human mitochondrial codon sequences consisting of 12 protein-coding genes from 53 human races [43], whose genome sequences were obtained from a human mitochondrial genome database (MITOMAP). The total codon length of aligned genes is equal to 3579, and the minimum amino acid identity between the sequences is equal to 0.99. The present analyses are done using the neighbor-joining tree topology estimated by [43]. Overlapped segments between genes were removed from codon sequences. 5. Dataset nDNA: Codon sequences of the 10 most slowlyevolving genes out of the 2789 nuclear genes of 10 mammals that were analyzed by [44]. The tree topologies estimated by [44] are used for respective genes and the tree-1 named by them is used here for the analyses of the concatenated genes. The total codon length of aligned genes is equal to 1112, and the minimum amino acid identity between the sequences is equal to 0.97.
Homologous codon sequences are aligned every gene by ClustalW2 [45] that is modified to align codon sequences with codon score matrices [19]. The ML values for each model are calculated for each gene and also for the concatenated sequences of all genes by Phyml [46] also modified to analyze codon sequences.

Statistical comparison of codon substitution models
Model selection must be pursued with considerable attention [47]. For the comparison of models one of which is a special case of the other, the likelihood ratio test (LRT) [48] can be used to test the superiority of a nesting model to nested models. Models that are not nesting or nested can be compared using Akaike information criterion (AIC) [49], Bayesian information criterion (BIC) [50], a decision-theoretical approach [51,52], and the Bayes factor [53]. Here, AIC and BIC for a given tree topology of aligned codon sequences are used to compare codon substitution models derived from various empirical amino acid and codon substitution rate matrices and mechanistic codon substitution models with the wide range of selective constraint matrices. The AIC and BIC are defined as follows [18]: where K is the number of adjustable parameters,ĥ h is the vector of the ML estimates of the parameters, '(ĥ h ) is the maximum loglikelihood value, and n is the number of codons in a codon alignment. The model whose AIC or BIC is the minimum is regarded as the best model.

Results and Discussion
The naming convention of the present models is briefly described in Table 1. In all models, the equilibrium frequencies of codons are estimated to be equal to codon frequencies in sequences. Other parameters including the scale parameter s of a C distribution for the variation of mutation rate over time are set to a certain value or optimized by maximizing the likelihood of a given topology of a phylogenetic tree. For the empirical amino acid substitution models converted into the codon substitution models, s~0 was assumed, because it seems not to be well matched with these models. In the empirical codon substitution model, s was optimized as well as w 0 . In the mechanistic codon substitution models, all 12 parameters including s for the substitution rate matrix will be optimized if the AIC and the BIC values of a phylogenetic tree are decreased. For the separating analyses of human-mtDNA and the concatenating and the separating analyses of nDNA, which are both datasets consisting of highly-homologous sequences, the five parameters of b, w 0 , s, m(:m ½tc½ag ), and m tcjag =m ½tc½ag were optimized with f mut j0 :25 and m ag =m tcjag~mta =m ½tc½ag~mtg =m ½tc½ag~mca =m ½tc½ag1 :0. In all models, the variation of mutation rate or the variation of selective constraint over sites is taken into account. Both the variations over sites were approximated by a discrete-gamma distribution [32] with 4 categories. The shape parameter a of the discrete-gamma distribution is optimized by maximizing the likelihood. Equal probability was used for each category in all models of rate variation. In the models of variable selective constraints, equal probability was used only for the nonmechanistic codon models for the cpDNA-9, and the different sets of prior probabilities on the basis of the values of C i were used for the other models; p(C 1 )~0:50,p(C 2 )~0:25,p(C 3 )~p(C 4 )0 :125 for the datasets cpDNA-9, cpDNA-55, and mammalian-mtDNA, and p(C 1 )~0:75,p(C 2 )~0:125,p(C 3 )~p(C 4 )~0:0625 for the human-mtDNA and the nDNA. The AIC and the BIC are used for statistical comparisons of models.

Mechanistic codon substitution models outperform other substitution models
First, each gene in a dataset is separately aligned and then all aligned sequences are concatenated. The maximum log-likelihood values of a given phylogenetic tree of concatenated genes for various codon substitution models are listed in Tables 2, 3, 4, 5, 6 for cpDNA-9, cpDNA-55, mammalian-mtDNA, human-mtDNA, and nDNA, respectively. Values in parentheses indicate that the corresponding parameters are fixed at the value specified. The maximum log-likelihood ('), AIC and BIC values for each model are listed in these tables with the difference (D', DAIC , and D BIC) from those of a reference model. For the datasets cpDNA-9, cpDNA-55, and nDNA that use the universal codon table, the empirical codon substitution model KHG-2-F-dG4s estimated from nuclear-encoded sequences is used as a reference state; in the KHG-2-F-dG4s, s is optimized as well as w 0 . For mitochondrial genomes that use a minor genetic code, no empirical codon substitution rate matrix is available, and so the codon substitution model, mtREV-1-dG4s, which is converted from the empirical amino acid substitution matrix mtREV estimated from mitochondrial proteins, is used as a reference state; in the mtREV-1-F-dG4s, s~0 is assumed, and only w 0 is optimized.
In the case of mitochondrial genes, i.e., mammalian-mtDNA and human-mtDNA, the models based on mtREV always show the smallest DAIC and DBIC , i.e., the best performance, in the empirical amino acid substitution models converted into the codon substitution models. For the dataset cpDNA-55, the models converted from cpREV64 show the best performance in the models converted from the empirical amino acid substitution models, and the models converted from cpREV10 perform best for the dataset cpDNA-9. These results are reasonable because the amino acid substitution probability matrix mtREV [6] was estimated from mitochondrial proteins, and cpREV64 [38] and cpREV10 [8] were estimated from the full sets of chloroplast proteins corresponding to cpDNA-55 and cpDNA-9, respectively; see the method section. A rather interesting result is that the models converted from cpREV64 shows larger D AIC and DBIC for cpDNA-9 than the models converted from LG, WAG, and JTT that were estimated from nuclear-encoded proteins, This fact indicates that substitution tendencies vary between genes and cannot always be represented by the average tendencies of substitutions. Delport et al. [16] showed that the empirical substitution matrices represent the average tendencies of substitutions over various protein families by sacrificing gene-level resolution.
The empirical codon substitution model KHG performs significantly better for chloroplast-encoded and nuclear-encoded genes than all the amino acid substitution models converted into the codon models. It has often be insisted that synonymous substitutions are saturated between distantly related genes and so substitution analyses at the codon level hardly include more information than those at the amino acid level. However, a fact that KHG performs better even for the distantly related sequence family (cpDNA-9) than the models converted from cpREV10 indicates that codon sequences include more information than amino acid sequences even in the case of distantly related sequences.
If the amino acid substitution models converted into codon models are compared with the mechanistic codon substitution models, the superiority of the codon substitution models will be clearer. For all datasets, the mechanistic codon models with the various estimates of selective constraints show significantly lower The empirical amino acid exchangeabilities of JTT [5], WAG [10], LG [11], cpREV10 [8], cpREV64 [38], and mtREV [6] are used as fr empirical ab g in Eq. 4. The suffix n means the number of parameters optimized for the substitution rate matrix; the w 0 is a ML parameter when n §1.

B. Empirical codon substitution models
The empirical codon exchangeabilities of KHG [15] are used as fr empirical mn g in Eq. 5. The suffix n means the number of parameters optimized for the substitution rate matrix; the w 0 is a ML parameter when n §1, and s is equal to 0 for n~1 and optimized when n~2.

C. Mechanistic codon substitution models
No-Constraint-n -F-dGmr a , Equal-Constraint-n -F-dGm [rs] b~0 for both models and also w 0~0 for the No-Constraint model; see Eq. 9. The suffix n, whose maximum number is equal to 10 or 11, means the number of parameters optimized for the substitution rate matrix.
:Dê e c ab zDê e v ab based on the Energy-Increment-based (EI) method [19] is used to estimate w ab in Eq. 9. The Dê e c ab and Dê e v ab represent the mean increment of contact energies between residues, and the mean volume change due to an amino acid replacement, respectively; see Supporting Information, Text S1, in Miyazawa [19]. The suffix n, whose maximum number is equal to 12, means the number of parameters optimized for the substitution rate matrix.
LG-ML91+-n -F-dGm[rs] Selective constraints fw JTT=WAG=LG{ML91z ab g estimated by maximizing the likelihood of JTT/WAG/LG [5,10,11] in the ML-91+ model [19] are used as fw estimate ab g in Eq. 9. The suffix n, whose maximum number is equal to 12, means the number of parameters optimized for the substitution rate matrix.

KHG-ML200-n -F-dGm[rs] a
Selective constraints fw KHG{ML200 ab g estimated by maximizing the likelihood of the KHG codon substitution matrix [15] in the ML-200 model [19] are used as fw estimate ab g in Eq. 9. The suffix n, whose maximum number is equal to 12, means the number of parameters optimized for the substitution rate matrix. a In the models specified with the suffix ''F'', equilibrium codon frequencies are assumed to be equal to codon frequencies in codon sequences. dGm[rs], i.e., dGmr or dGms, means that the variation of mutation rate or selective constraint over site is approximated by a discrete gamma distribution with m categories [32], respectively; m~1 means no variation and the suffix dG1[rs] is omitted. doi:10.1371/journal.pone.0028892.t001 D AIC and DBIC than the amino acid substitution models converted into the codon models. The Equal-Constraint model always performs worst, and is far inferior to the amino acid dependent constraint models for the phylogenetic trees including long branches such as the datasets cpDNA-9 and mammalian-mtDNA. Only for the phylogenetic trees consisting of extremely short branches such as the datasets human-mtDNA and nDNA, it is not remarkably worse than the amino acid dependent constraint models; amino acid identities between sequences are equal to or larger than 0:99 in human-mtDNA and 0:97 in nDNA. Consistently, DAIC and D BIC for the No-Constraint model, which is essentially equivalent to a nucleotide substitution model, are extremely larger for cpDNA-9 and mammalian-mtDNA, but smaller for cpDNA-55 and human-mtDNA than those for the reference model. These results can be explained to be because the amino acid dependencies of selective constraints must be taken into account to correctly evaluate amino acid substitutions, which occur in long branches, in order to precisely estimate branch lengths. One of the interesting facts is that the No-Constraint model is better for cpDNA-55 and human-mtDNA but worse for nDNA than the reference model, even though the phylogenetic tree of nDNA consists of short branches. This characteristic feature results from a fact that the genes in nDNA are slowlyevolving genes with strong selective constraints on amino acids; note that sequences in the dataset nDNA are highly homologous with amino acid identities greater than 0.97 but are collected from a wide range of mammalian species, i.e., Borentheria, Xenarthra, and Afrotheria.
The EI model, in which the selective constraints were evaluated on the basis of average contact energies between residues in Table 2. Comparisons between various codon substitution models in the concatenating analysis of cpDNA-9. The prior probability of each category for the mechanistic codon models of ''dG4s'' is p(C 1 )~0:50, p(C 2 )~0:25, and p(C 3 )~p(C 4 )~0:125; equal probability is used in other models. b The number of adjustable parameters. proteins [19], always show better performance than the Equal-Constraint model but is always inferior to the other models, which use the selective constraints estimated from the empirical amino acid substitution frequency matrices, especially for the datasets cpDNA-9 and mammalian-mtDNA including long branches. The similar result was obtained in [19]. The selective constraint matrix LG-ML91+ performs better on average than the WAG-ML91+, JTT-ML91+, and KHG-ML200, although the differences of D' between them are small in comparison with the differences from the EI. An unexpected fact is that the selective constraint matrix KHG-ML200 estimated from the codon substitution rate matrix KHG tends to be inferior to the other selective constraint matrices estimated from the empirical amino acid substitution rate matrices, LG-ML91+, WAG-ML91+, and LG-ML91+, although it performs better except for nDNA than the EI.
In the concatenating analyses of multiple genes, it is assumed that all genes have no difference in equilibrium codon frequencies, nucleotide exchangeabilities, and the variations of mutation rate and of selective constraint. These assumptions are not always appropriate. Thus, the separating analyses of multiple genes have been carried out. The D BIC of each gene for some models are plotted against the maximum log-likelihood value for the best model in Fig. 1 for all datasets. In all datasets, the mechanistic codon substitution models show significantly lower DBIC than the best amino acid substitution model converted into the codon models, for almost all genes except some genes for which the maximum log-likelihood values are large owing to short sequences. The No-Constraint model is not shown for cpDNA-9 and mammalian-mtDNA, because its DBIC values for them are too large to show. For the phylogenetic trees of cpDNA-55 and Table 3. Comparisons between various codon substitution models in the concatenating analysis of cpDNA-55.

Codon substitution model
LG-1-F-dG4r The value parenthesized means that the parameter is fixed at the value specified. f The scale parameter of a C distribution for the variation of mutation rate over time. human-mtDNA consisting of relatively short branches, the No-Constraint model, i.e., a nucleotide substitution model, is better for most of the genes than the amino acid substitution models converted into the codon models, as also indicated by the concatenating analyses. Even for those datasets, D BIC can be further decreased by the mechanistic codon substitution models including the the Equal-Constraint model. However, differences of D BIC between the mechanistic substitution models with the different selective constraints are small for those dataset in comparison with the improvement from the amino acid substitution models converted into the codon models. For the phylogenetic trees of cpDNA-9 and mammalian-mtDNA consisting of long branches, the differences between the Equal-Constraint and the EI and between the EI and the best model with amino acid dependent selective constraints are very significant, as indicated by the concatenating analyses.
The mechanistic codon substitution model performs better for a wide range of sequences from highly-homologous to highlydiverged sequences than both nucleotide and amino acid substitution models. This is because it takes into account both mutational tendencies at the nucleotide level and selection at the amino acid level.

Variable mutation rates versus variable selective constraints over sites
Significance of rate variation over sites in proteins has been demonstrated in nucleotide substitution models and amino acid substitution models [32,33]. These results do not necessarily indicate the variation of mutation rate over sites, because the variation of selective constraint over sites in proteins can also cause the variation of amino acid substitution rate over sites even under a uniform mutation rate over sites. Here, we examine which model better fits the heterogeneity of amino acid substitution rate over sites.
The discrete gamma distribution with 4 categories has been used to emulate both the variations of selective constraint and of mutation rate over sites. The models with variable selective constraints and with variable mutation rates are specified by dG4s and dG4r, respectively. Tables 2, 3, 4 for the concatenating analyses of genes consistently indicate that the codon substitution LG The value parenthesized means that the parameter is fixed at the value specified. f The scale parameter of a C distribution for the variation of mutation rate over time. g The ratio of double to single and of triple to double nucleotide change exchangeability;m m:m m ½tc½ag . h The ratio of mean transitional to mean transversional exchangeability;m m tcjag =m m ½tc½ag . i The shape parameter of a discrete gamma distribution for the variation of mutation rate or selective constraint over sites. doi:10.1371/journal.pone.0028892.t004 models with the variation of selective constraint (dG4s) show significantly lower D AIC and DBIC than the corresponding models with the variation of mutation rate (dG4r) over sites for the datasets cpDNA-9, cpDNA-55, and mammalian-mtDNA. The comparisons of DBIC of each gene between those two types of the models are shown in Fig. 2 for all datasets. These figures also show that the variation of selective constraint is a statistically better model than the variation of mutation rate at least for cpDNA-9, cpDNA-55 and mammalian-mtDNA. This is reasonable because a mutation rate may not significantly differ among sites in a gene but selective constraints originating in the tertiary structure and the function of a protein should vary among sites in a protein.
Generally speaking, selective constraints on amino acid replacements are stronger in a protein core than on protein surface [54].
However, in both the concatenating analyses and the separating analyses of genes, the DAIC and the DBIC values for the models with the variation of selective constraint are not smaller for the nDNA than those for the models with the variation of mutation rate. For the human-mtDNA consisting of highly-polymorphic intraspecific mitochondrial genes, the mechanistic codon models with the variation of selective constraints attain slightly lower DAIC and DBIC than the corresponding models with rate variation, although the differences of DBIC between the two models are insignificant in the separating analyses of the genes. The phylogenetic trees of the datasets human-mtDNA and nDNA consist of extremely short branches only, in which nonsynonymous substitutions insignificantly occur under strong selective constraints. In such a phylogenetic tree, it is hard to estimate correctly the variation of selective constraint over sites as indicated by the high performance of the Equal-Constraint model. This would be the reason why the differences of DBIC between the mechanistic codon models of the dG4r and the dG4s are insignificant in the separating analyses of genes for the human-mtDNA. On the other hand, the present result for the nDNA, which consists of 10 genes that are not necessarily closely-located in the same chromosome, may indicate the possibility of rate variation over sites.

Site dependencies of selective constraints
Selective constraints against amino acid replacements at each site must reflect both structural and functional constraints on a Table 5. Comparisons between various codon substitution models in the concatenating analysis of human-mtDNA.
LG- The value parenthesized means that the parameter is fixed at the value specified. f The scale parameter of a C distribution for the variation of mutation rate over time. g The ratio of double to single and of triple to double nucleotide change exchangeability;m m:m m ½tc½ag . h The ratio of mean transitional to mean transversional exchangeability;m m tcjag =m m ½tc½ag . i The shape parameter of a discrete gamma distribution for the variation of mutation rate or selective constraint over sites. doi:10.1371/journal.pone.0028892.t005 residue type at each site, which are required for a protein to fold into a unique native structure and to properly function, and vary among residue sites in a protein. Here a simple analysis of site dependencies of selective constraints has been performed to ascertain the correlation between selective constraints and structural constrains at each site.
Site dependencies of selective constraints are evaluated [40] as a posterior mean of Se w i,ab T(: P a P bwa e w i,ab =190) over categories i for each site. Residue sites are categorized by the number of van der Waals contacts with surrounding non-solvent atoms in a protein structure, which are supposed to reflect the strength of structural constraints; neighboring residues along a polypeptide chain are not counted. Then, the posterior mean of Se w i,ab T are averaged over sites in each residue category and its dependence on the category is examined. In Fig. 3, the site dependencies of selective constrains are shown for the photosystem II CP47 chloroplast protein (psbB gene) and for the cytochrome c oxidase subunit 1 mitochondrial protein (COX1 gene). The van der Waals contacts were evaluated for the psbB in the 38-meric state of the photosystem II protein complex and for the COX1 in the biological 26-meric state of bovine heart cytochrome C oxidase in the fully reduced state; the protein coordinates 3ARC and 2EIJ in the PDB database were used. The posterior mean of selective constrains for each site was calculated in the LG-ML91+-12-F-dG4s for the concatenated sequences of the datasets cpDNA-9 and mammalian-mtDNA. It is clear that the selective constraints tend to be stronger at residues surrounded by more atoms, indicating that they reflect structural constraints at each residue site in a protein. Here we have taken account of purifying selection only, but positive selection can be also examined [29] in terms of exp w ab (fixation rate) at each site. Table 6. Comparisons between various codon substitution models in the concatenating analysis of nDNA. LG-

Estimates of branch lengths under different models
The correct estimation of branch length is critical for the estimations of phylogeny and divergence times. It is known that branch-length estimation is significantly influenced by model selection. Yang et al. [33] found that branch lengths are severely underestimated by nucleotide substitution models in which rate variation over site is ignored. Also they found that simpler and worse models tend to underestimate branch lengths more severely, and such a bias is more serious for longer branches.
In Fig. 4, branch lengths estimated by the models with a uniform substitution rate, with the variation of selective constraint, and with the variation of mutation rate over sites are plotted against those estimated by the model with the variation of selective constraint over sites. The dotted lines in these figures are ones connecting the origin and the point of the longest branch on the abscissa. Assuming the variation of mutation rate or selective constraint leads to longer estimates of branch lengths than the uniform substitution rate over sites. However, the estimates of branch lengths are significantly different between the schemes of variable mutation rates and of variable selective constraints over sites, and assuming the variation of mutation rate estimates branch lengths much longer for all datasets than the variation of selective constraint.
Branch lengths estimated by the models with the variation of mutation rate (dG4r) and with a uniform substitution rate are both roughly proportional to those shown on the abscissa, i.e., those estimated by the model with the variation of selective constraint (dG4s). However, as pointed out by Yang et al. [33], a systematic bias in the estimation of branch length is shown; the ratio of the branch length estimate of a worse model to that of the best model tends to be smaller for longer branches irrespective of overestimation or underestimation. For cpDNA-9, cpDNA-55 and mammalian-mtDNA, for which the dG4s is the best model, plus marks for a uniform substitution rate and cross marks for the dG4r are plotted in a concave pattern, although the concave pattern for dG4r is not clear in cpDNA-55. For nDNA, which the dG4r fits better than the dG4s, cross marks for the dG4r are plotted in a slightly convex pattern. This systematic bias indicates that the worse models tend to underestimate the frequencies of multiple substitutions in long branches in comparison with short branches.
When the different types of models are compared with each other, the correlation of branch lengths between the models is not always good. In Fig. 5, the estimates of branch lengths for cpDNA-9 and mammalian-mtDNA in the Equal-constraint model and in the amino acid substitution model converted into the codon models are plotted against those in the best model. These estimates for cpDNA-9 are roughly proportional to those in the best model, although there is a systematic bias. However, the correlation of branch lengths between the mtREV-1-F-dG4s and the best model for mammalian-mtDNA is not as good as those between the models for cpDNA-9.
In the result, except for the datasets consisting of highlyhomologous sequences, the variation of selective constraint is a better model than the variation of mutation rate, and assuming the variation of mutation rate leads to the overestimation of branch length. Even for highly-homologous sequence families, the model with the variation of selective constraint may not be too bad, because the differences of AIC and BIC between the models with variable mutation rates and with variable selective constraints are not significantly large, and the branch lengths estimated by those models are almost proportional to each other.

Multiple nucleotide changes in infinitesimal time
Codon substitutions requiring multiple nucleotide changes can be caused by either multiple steps of single nucleotide changes or single steps of multiple nucleotide changes. In the present mechanistic codon substitution model, codon mutations by multiple nucleotide changes in infinitesimal time are taken into account. The mechanistic codon substitution models with the various selective constraint matrices all indicate m(:m ½ tc½ag)w0 for the datasets cpDNA-9, cpDNA-55, and mammalian-mtDNA, which include long branches. The DAIC and DBIC values consistently indicate that the model LG-ML91+-12-F-dG4s, in which multiple nucleotide changes are allowed, fits these datasets better than the model LG-ML91+-11s-F-dG4s, in which multiple nucleotide changes are disallowed. Also, the LRTs for LG-ML91+-11s-F-dG4s nested by LG-ML91+-12-F-dG4s reject the assumption of single nucleotide changes with p{value%0:00001 for these datasets; see Tables 2, 3, 4. This result is consistent with a report [19] that the mechanistic codon model could not well fit observed substitution frequency data unless multiple nucleotide changes in infinitesimal time are allowed.
On the other hand, the parameter for multiple nucleotide changes is not significant for the datasets human-mtDNA and nDNA that consist of closely-related or highly-conserved sequences, and whose phylogenetic trees consist of short branches only. This fact indicates that multiple nucleotide changes rarely occur in short evolutionary periods, and multiple nucleotide changes detected in relatively long branches of cpDNA-9, cpDNA-55, and mammalian-mtDNA may result from compensatory substitutions that shortly succeed single nucleotide substitutions, or other mechanisms. A possibility of successive single compensatory substitutions for multiple nucleotide changes was pointed out by Bazykin et al. [26]. Whatever results in multiple nucleotide changes in long evolutionary periods, the present method, in which multiple nucleotide changes in infinitesimal time are allowed, for codon substitutions is effective to improve the likelihood of a phylogenetic tree with long branches.

Variation of mutation rate over time
The site-specific variation of amino acid substitution rate over time was first discussed as a covarion model by Fitch and Markowitz [35], and recently its significance have been indicated again for rRNA [36] and cytochrome b [37]. Although amino acid substitutions may occur in a concerted manner with other interacting sites, causing the variation of selective constraint over time, here we has examined the variation of mutation rate over time at each site.
The present model includes a parameter s for the variation of mutation rate over time. The scale factor s~0 for a C distribution , and nDNA (E) for each specified model is plotted against its log-likelihood value for the best model in the concatenating analysis of the genes. The horizontal dotted line of DBIC~0 shows the reference model for each dataset. The best model is shown by the lowest dot-dashed line. The lower dotted line in each figure shows the data points for the EI model. The lower and the upper broken lines show the Equal-Constraint and the No-Constraint models, respectively. The No-Constraint model is not shown for cpDNA-9 and mammalian-mtDNA, because its D BIC values are too large. In the models specified with the suffix ''-5-'' for human-mtDNA and nDNA, five parameters were optimized with f mut j~0 :25 and m ag =m tcjag~mta =m ½tc½ag~mtg =m ½tc½ag~mca =m ½tc½ag~1 :0. doi:10.1371/journal.pone.0028892.g001 means no variation of mutation rate over time. Because the present simple approximation works by replacing the substitution matrix S by its expected value E(S) under rate variations, the parameter s will not only reflect the variation of mutation rate over time but also be affected by the variations of selective constraints over time and of substitution rate over sites, especially if both the variations of mutation rate and of selective constraint over sites are not taken into account; it tends to take larger values in models assuming a uniform rate over sites than variable mutation rates or selective constraints. Also, if only single nucleotide changes in infinitesimal time are assumed, i.e., m(:m ½tc½ag )~0, this parameter (s) will be estimated to be larger to increase the probability of multiple steps of substitutions. The reverse is also true.
The mechanistic codon substitution models specified with a suffix dG4s, in which selective constraints are variable across sites, all indicateŝ sw0 for the datasets cpDNA-9, cpDNA-55, and mammalian-mtDNA, which include long branches. The D AIC and DBIC values indicate that the model LG-ML91+-12-F-dG4s including s as a parameter fits these datasets better than the model LG-ML91+-11-F-dG4s assuming s~0. Also, the LRTs for LG-ML91+-11-F-dG4s nested by LG-ML91+-12-F-dG4s reject a constant mutation rate over time with p{value%0:00001 for all cpDNA-9, cpDNA-55, and mammalian-mtDNA; see Tables 2,   3, 4. Therefore, rate variation over time should not be ignored for highly-diverged sequences. The ML estimate of s for mammalian-mtDNA is larger than 1, while it is less than 0:5 for the other two datasets. The variation of mutation rate among lineages in primate mtDNAs has been indicated [6,55].
As shown in Tables 2, 3, 4, 5, 6, when mutation rates are assumed to be variable across sites, i.e., in the mechanistic codon substitution models specified with a suffix dG4r, the parameter s has been estimated to be almost equal to zero for all the datasets, even for the datasets cpDNA-9, cpDNA-55, and mammalian-mtDNA, for which the models assuming variable selective constraints indicateŝ sw0. Variable mutation rates across sites are taken into account in such a way that each site has multiple mutation rates with certain probabilities given by a discrete gamma distribution. Thus, in the present approximation it would be hard to distinguish the variation of mutation rate over time at each site from that over sites in these models.

Transition/transversion bias
One of the advantages in mechanistic codon substitution models over amino acid substitution models is that mutational tendencies at the nucleotide level can be estimated. The estimation of mutational tendencies by mechanistic codon substitution models must be more precise than by nucleotide substitution models Figure 2. Comparison of D BIC of each gene in each dataset between the models with variable mutation rates and with variable selective constraints over sites. D BIC of each gene in cpDNA-9 (A), cpDNA-55 (B), mammalian-mtDNA (C), human-mtDNA (D), and nDNA (E) is compared between the models with the variation of mutation rate (dG4r) and with the variation of selection constraint (dG4s) over sites. The dotted line shows the line of equal values between the ordinate and the abscissa. In the models specified with the suffix ''-5-'' for human-mtDNA, five parameters were optimized with f mut j~0 :25 and m ag =m tcjag~mta =m ½tc½ag~mtg =m ½tc½ag~mca =m ½tc½ag~1 :0. doi:10.1371/journal.pone.0028892.g002 Waals contacts with surrounding non-solvent atoms in the protein structure; neighboring residues along a polypeptide chain are not counted. The degree of van der Waals contact for an atom pair, which is separated by r and whose van der Waals distance is equal to r m , is defined as 2(r m =r) 6 {(r m =r) 12 for r m =rv1 and 1 for r m =r §1. The van der Waals contacts are evaluated for the psbB in the 38-meric state of the photosystem II complex from Thermosynechococcus vulcanus, and for the COX1 in the biological 26-meric state of bovine heart cytochrome C oxidase in the fully reduced state; the protein coordinates 3ARC and 2EIJ in the PDB database were used. Posterior mean of selective constrains (Se wi,ab T) averaged over sites in each residue category is shown in the ordinate. The posterior mean of selective constrains were calculated by the LG-ML91+-12-F-dG4s for the concatenated sequences of the datasets cpDNA-9 and mammalian-mtDNA. doi:10.1371/journal.pone.0028892.g003 applied to all codon positions, because selection at the amino acid level is taken into account.
Transitional substitutions have been noted to occur more frequently than transversions [56,57], and transition/transversion rate bias is more pronounced in animal mitochondrial DNAs than in nuclear or chloroplast DNAs [58]. Different measures have been used for transition to transversion bias [55,58,59]. One is the ratio of transitional differences to transversional differences between two sequences. Another is the ratio of the total transitional to the total transversional rate. Also, the ratio of transitional to transversional substitution exchangeability has been used. Here, the ratio of the mean transitional to the mean transversional exchangeability is used, because each type of transitional and transversional mutations occurs with a different exchangeability. The ratio (m m tcjag =m m ½tc½ag ) of the mean transitional to the mean transversional exchangeability is listed in Tables 2, 3, 4, 5, 6 for all datasets. The values ofm m tcjag =m m ½tc½ag in the mechanistic codon substitution models with the various estimates of selective constraints fall into a narrow range for each dataset. They range from 3.7 to 7.2 for mammal-mtDNA, and from 30.5 to 39.0 for human-mtDNA. On the other hand, they fall into the range of much smaller values from 1.7 to 2.4 for cpDNA-9, from 2.4 to 2.7 for cpDNA-55, and from 2.3 to 2.8 for nDNA. The ratio of the mean transitional to the mean transversional exchangeabil-ity is estimated to be almost 10-20 times larger for human mitochondrial DNA but only 2-3 times larger for mammalian mitochondrial DNA than for nuclear and chloroplast DNAs. Adachi and Hasegawa [6] reported that the transitional mutation rate and the ratio of transitional to transversional mutation rate at four-fold degenerate sites of mtDNA were higher by about two times in humans than in apes. On the other hand, Yang and Yoder [55] showed that the maximum likelihood estimate of the ratio of transitional to transversional substitution rate changes with the species included in the analysis, and was always larger at low than at high sequence divergence. It was suggested [55] that the variable rates of transitional and transversional mutations among evolutionary lineages might cause such a sample dependence.

Conclusions
In the present mechanistic codon substitution model, single nucleotide mutations are modeled by the GTR model and multiple nucleotide mutations in infinitesimal time are assumed to occur independently at each position of codon, and selective constraints on amino acids are approximated by a linear function of the empirical selective constraints. It has been shown that even the Equal-Constraint model performs far better for a wide range of sequences from highly-homologous to highly-diverged sequences than both the No-Constraint model and the amino acid . Comparisons of branch lengths estimated by the models with a uniform rate, variable mutation rates, and variable selective constraints over sites. Branch lengths estimated for the phylogenetic trees of cpDNA-9 (A), cpDNA-55 (B), mammalian-mtDNA (C), human-mtDNA (D), and nDNA (E) are compared among models. The abscissa shows the branch lengths estimated by the model with the variation of selection constraint (dG4s). The LG-ML91+-12-F-dG4s is the best model except for human-mtDNA and nDNA. The best model is JTT-ML91+-12-F-dG4s for human-mtDNA and WAG-ML91+-5-F-dG4r for nDNA. The models with the variation of mutation rate (dG4r) and with a uniform substitution rate over sites are shown by cross and plus marks, respectively. The model with the variation of selection constraint (dG4s) is shown by the middle dotted line. The dotted lines in each figure are ones connecting the origin and the respective estimates for the longest branch on the abscissa. In the models specified with the suffix ''-5-'' for human-mtDNA and nDNA, five parameters were optimized with f mut j~0 :25 and m ag =m tcjag~mta =m ½tc½ag~mtg =m ½tc½ag~mca =m ½tc½ag~1 :0. doi:10.1371/journal.pone.0028892.g004 substitution models converted into the codon substitution models. The No-Constraint model is a nucleotide substitution model extended to allow multiple nucleotide changes in infinitesimal time. On the other hand, the codon substitution model converted from the amino acid substitution model is extended here in such a way that the special case of w 0~{ ? is exactly equivalent to the amino acid substitution model [17]. Thus, the performance of the Equal-Constraint model indicates that codon substitution models are superior to nucleotide and amino acid substitution models.
The present analyses have also shown that the mechanistic models with the amino acid dependent selective constraints do not only perform far better especially for phylogenetic trees consisting of relatively long branches than the Equal-Constraint model, but better even for phylogenetic trees consisting of short branches. This result indicates the superiority of the selective constraint matrices (w estimate ab ) estimated by maximizing the respective likelihoods of the observed substitution frequency matrices of 1-PAM [19]. In long branches, nonsynonymous substitutions increase, and therefore the proper evaluation of selective constraints on amino acids becomes critical. On the other hand, in short branches in which nonsynonymous substitutions are insignificant, the proper evaluation of mutational tendencies at the nucleotide level becomes important. The former is the situation in which amino acid substitution models perform better than nucleotide substitution models. Inversely, the latter is the situation in which nucleotide substitution models perform better, although they are not superior for slow-evolving proteins, because there is a possibility that synonymous substitutions are saturated even in short branches; the dataset nDNA is an example of such a case. However, mutational tendencies at the nucleotide level and the strength of selective constraints cannot be tailored to each gene in the amino acid substitution models, and selection on amino acid replacements cannot be taken into account in the nucleotide substitution models. Thus, mechanistic codon models that can tailor both mutational tendencies and the strength of selective constraints are superior to both nucleotide and amino acid substitution models.
It was pointed out [18] that codon substitution models require intensive computation to recalculate eigenvalues and eigenvectors of a 64-dimensional matrix. Simultaneous optimizations of a tree topology and model parameters may be hard. However, model parameters may be fixed at the values estimated for one of the reasonable trees, because the optimum values of model parameters do not severely depend on a tree topology, unless tree topologies are unrealistic. On the other hand, the mechanistic codon substitution model can provide much information on mutational tendencies and the strength of selective constraints. In addition, the present model enables us to distinguish the variations of mutation rate and of selective constraint over sites. The variation of mutation rate over time can also be discussed.
The present analyses show that multiple nucleotide changes in infinitesimal time are statistically significant in long branches as well as the variation of mutation rate over time. It has been also shown that the variation of amino acid substitution rate over sites results from variable selective constraints rather than variable mutation rates at least in the phylogenetic trees of cpDNA-9, cpDNA-55, and mammalian-mtDNA including long branches. Branch lengths will be overestimated for these datasets if the variation of mutation rate over sites is assumed instead of the variation of selective constraint. The capability of the mechanistic codon substitution models to extract biological knowledge from protein-coding sequences makes them superior to both nucleotide and amino acid substitution models.