Skip to main content
Advertisement
Browse Subject Areas
?

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

The impact of software and criteria on the selection of best-fit nucleotide substitution models for molecular evolutionary genetic analysis

  • Xingguang Li ,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    xingguanglee@hotmail.com

    Affiliations Ningbo No. 2 Hospital, Ningbo, China, Guoke Ningbo Life Science and Health Industry Research Institute, Ningbo, China

  • Olayinka Sunday Okoh,

    Roles Formal analysis, Methodology, Validation

    Affiliation Department of Chemical Sciences, Anchor University, Lagos, Nigeria

  • Nídia Sequeira Trovão

    Roles Formal analysis, Methodology, Validation, Writing – review & editing

    Affiliation Division of International Epidemiology and Population Studies, Fogarty International Center, National Institutes of Health, Bethesda, Maryland, United States of America

Abstract

The statistical selection of best-fit models of nucleotide substitution for multiple sequence alignments (MSAs) is routine in phylogenetics. Our analysis of model selection across three widely used phylogenetic programs (jModelTest2, ModelTest-NG, and IQ-TREE) demonstrated that the choice of program did not significantly affect the ability to accurately identify the true nucleotide substitution model. This finding indicates that researchers can confidently rely on any of these programs for model selection, as they offer comparable accuracy without substantial differences. However, our results underscore the critical impact of the information criterion chosen for model selection. BIC consistently outperformed both AIC and AICc in accurately identifying the true model, regardless of the program used. This observation highlights the importance of carefully selecting the information criterion, with a preference for BIC, when determining the best-fit model for phylogenetic analyses. This study provides an assessment of popular model selection programs while contributing to the advancement of more robust statistical methods and tools for accurately identifying the most suitable nucleotide substitution models.

Introduction

It is well known that nucleotide substitution models are widely used in phylogenetic analyses of sequence data, and distinct substitution models can change the outcome of phylogenetic analyses [13]. A nucleotide substitution model is a mathematical description of how DNA sequences change over time. It specifies the rates of substitution between all pairs of nucleotides, and the frequencies of each nucleotide in the sequence. A nucleotide substitution model can be simpler or more complex depending on how many parameters it has and how realistic it is. A simple model may assume that all substitutions are equally likely, and that all nucleotides have the same frequency. A complex model may allow for different rates of substitution for different types of changes (such as transitions and transversions), and for different frequencies of nucleotides depending on the context. A complex model may also account for variation in substitution rates among sites or among lineages [410]. Therefore, the selection of an appropriate substitution model is crucial for obtaining accurate phylogenetic inferences, as it directly influences the reliability of the resulting trees and downstream analyses [1117].

In the last 20 years, a number of software for selecting the best-fit substitution model on a given dataset have been developed [1822]. There are three statistical approaches to estimating how well a given substitution model fits a dataset, including the Akaike Information Criterion (AIC) [23], the Corrected Akaike Information Criterion (AICc) [24,25], both of them derived from frequentist probability, and the Bayesian Information Criterion (BIC) [26], which is derived from Bayesian probability. AIC [23], AICc [24,25] and BIC [26] are the most used model selection criteria and are implemented in a variety of softwares. BIC most heavily penalizes the addition of extra parameters, and substitution model selection parameters in turn. However, the results for selecting the best-fit model on a given dataset are not always consistent with one another, and the rule of thumb is that one should usually pick the model with smaller numbers of parameters for computational efficiency, expecially for a large dataset, when computing resources are limited [27]. Even though the selection of a simpler model might be preferable for computational efficiency, there are other points to be considered, such as the comparison of evolutionary rates among different genes/genomes/organisms, which are affected by the choice of substitution model applied to each dataset. jModelTest v2.1.10 [21,28], ModelTest-NG v0.1.7 [29,30] (ModelTest-NG is one to two orders of magnitude faster than jModelTest) [9], and IQ-TREE v2.2.0 [22,31] are some of the most popular software used for nucleotide substitution model selection, and all three have implemented AIC [23], AICc [24,25] and BIC [26] as model selection criteria.

Given the above, we sought to shed light on the following questions that molecular scientists frequently face. Are the statistical selection of best-fit models of nucleotide substitution by AIC [23] and AICc [24,25] consistent in most cases? If so, is statistical selection by AIC necessary when AICc has already been performed? When the best-fit model of nucleotide substitution selected by BIC [26] is inconsistent with that selected by AIC and AICc, should we use BIC or AIC/AICc? Furthermore, are the best-fit nucleotide substitution models selected by BIC usually simpler or more complex than those selected by AIC and AICc? If there is a difference, should we use the criterion that selects the best-fit nucleotide substitution models with fewer parameters? When the best-fit nucleotide substitution models selected irrespective of criteria in IQ-TREE are inconsistent with those selected in jModelTest2 or ModelTest-NG, which software results should be used? Furthermore, is the statistical selection by AIC, AICc, and BIC in jModelTest2 and ModelTest-NG usually simpler or more complex than those in IQ-TREE? If there is a difference, should we use the software that selects the best-fit nucleotide substitution model with fewer parameters? Notably, are the statistical selection of best-fit nucleotide substitution models by one criterion more frequently consistent with the real nucleotide substitution models used to generate simulated genetic datasets? If so, is statistical selection necessary when the former has already been performed?

While studies like Luo et al. [32], have explored this topic, a lack of consensus remains on which criteria or software should be prioritized in different modeling scenarios. Our study addresses this gap through a comprehensive comparative analysis. In summary, this study addresses the questions outlined above and provides insights into whether the selection of the best-fit nucleotide substitution models is influenced by the method and program used for implementation. If so, it suggests that, in certain cases, the selection of the best-fit nucleotide substitution model may lack objectivity.

Materials and methods

To evaluate these questions, 34 published real datasets from a previous study [33] were investigated. These datasets contained multilocus DNA alignments from the mitochondrial, nuclear, and chloroplast genomes from a diverse array of animals and plants with a varying number of taxa (13 up to 2,872) and alignment lengths (823 up to 25,919 sites), providing a comprehensive representation of the diversity of genetic sequences used in phylogenetic studies. In addition, 88 published simulated datasets each generated with different nucleotide substitution models [34] were also investigated. These datasets contained 100 taxa with 10,000 nucleotides in length generated based on 88 random trees by AliSim software [34]. In summary, we analysed 122 datasets (34 real datasets and 88 simulated datasets) in the present study. For each dataset, the statistical selection of best-fit nucleotide substitution model by AIC, AICc, and BIC was performed in jModelTest v2.1.10 [21,28], ModelTest-NG v0.1.7 [29,30], and IQ-TREE v2.2.0 [22,31] using all substitution models offered in these software (S1 Table). The specific commands used for the statistical selection of the best-fit model are provided in S2 Table.

If different substitution models are selected using different criteria within the same software, we assess their similarity. Based on S3 Table, models that differ by four or fewer are considered similar, while those differing by five or more are deemed dissimilar (see S4 and S5 Tables). To evaluate the concordance between nucleotide substitution model selection results obtained from three different programs (using AIC, AICc, and BIC) and the true nucleotide substitution model, we conducted a statistical analysis. Specifically, we evaluated whether the best-fit models identified by each program and selection criterion were consistent with each other (S4 and S5 Tables) and with the known true model (S5 Table). This resulted in a binary classification (yes/no) reflecting agreement among the programs and with the true model (S1, S4, and S5 Tables, respectively). A Chi-squared test of independence was employed to determine if any significant associations existed between the programs, selection criteria, and the consistency of model selection. All statistical analyses were conducted in RStudio [35,36] using various packages, including ‘rcompanion’ [37], which provides the pairwiseNominalIndependence() function for post hoc analysis.

Results

S4 and S5 Tables present the statistical selection of best-fit nucleotide substitution model for the 34 published real datasets [33] and 88 published simulated datasets [34]. The model selection was performed using three different criteria—AIC, AICc, and BIC—and evaluated with three state-of-the-art programs: jModelTest2, ModelTest-NG, and IQ-TREE. For the 34 published real datasets [33], the best-fit nucleotide substitution model selected by AIC and AICc was the same in jModelTest2, ModelTest-NG and IQ-TREE, except for one dataset (‘Devitt_2013’) in ModelTest-NG and IQ-TREE (Fig 1). For the 88 published simulated datasets [34], as shown in S1 Fig selection by AIC and AICc was also the same in jModelTest2, ModelTest-NG and IQ-TREE, except for three datasets (‘HKY_F_I_G_10000’, ‘JC_I_G_10000’ and ‘TIM2e_10000’) in jModelTest2; one dataset (‘JC_G_10000’) in jModelTest2 and IQ-TREE.

thumbnail
Fig 1. Results of statistical selection of best-fit models of nucleotide substitution by AIC in comparison to AICc using three different programs for real datasets.

https://doi.org/10.1371/journal.pone.0319774.g001

Notably, for the 34 published real datasets [33] (Fig 2) and 88 published simulated datasets [34] (S2 Fig), when the selection of the best-fit models among methods was inconsistent, the best-fit models of nucleotide substitution selected by BIC were relatively simpler than those selected by AIC and AICc using the three different programs, except for one dataset (‘TVMe_10000’) in ModelTest-NG (S2 Fig).

thumbnail
Fig 2. Results of statistical selection of best-fit models of nucleotide substitution by BIC in comparison to AIC and AICc using three different programs for real datasets.

https://doi.org/10.1371/journal.pone.0319774.g002

We also evaluated whether the best-fit nucleotide substitution models selected were consistent across software for each of the criteria. As shown in Fig 3 for the 34 published real datasets [33], when the best-fit nucleotide substitution models selected in jModelTest2 and ModelTest-NG were inconsistent with those selected in IQ-TREE, the statistical selection of the best-fit nucleotide substitution model selected by AIC performed in jModelTest2 and ModelTest-NG for six and seven datasets, respectively, prefered the relatively simpler models in comparison to the selection by AIC performed in IQ-TREE. However, the statistical selection of the best-fit nucleotide substitution model selected by AIC performed in IQ-TREE for three datasets tend to select relatively simpler models in comparison to the statistical selection by AIC performed both in jModelTest2 and ModelTest-NG. The statistical selection of the best-fit nucleotide substitution model selected by AICc performed in jModelTest2 and ModelTest-NG for six and seven datasets, respectively, prefers to select relatively simpler models in comparison to the selection by AICc performed in IQ-TREE, however, the statistical selection of the best-fit nucleotide substitution model selected by AICc performed in IQ-TREE for four and three datasets, respectively, tend to select relatively simpler models in comparison to the statistical selection by AICc performed in jModelTest2 and ModelTest-NG. For BIC, the statistical selection of the best-fit models of nucleotide substitution performed in jModelTest2 and ModelTest-NG for fourteen and twelve datasets, respectively, tends to select relatively simpler models in comparison to the statistical selection by BIC performed in IQ-TREE. However, the statistical selection of the best-fit nucleotide substitution model selected by BIC performed in IQ-TREE for two datasets tend to select relatively simpler models in comparison to the statistical selection by BIC performed both in jModelTest2 and ModelTest-NG.

thumbnail
Fig 3. Results of statistical selection of best-fit models of nucleotide substitution by AIC, AICc, and BIC using jModelTest2 and ModelTest-NG in comparison to IQ-TREE for real datasets.

https://doi.org/10.1371/journal.pone.0319774.g003

For the 88 published simulated datasets [34], as shown in S3 Fig, when the best-fit models of nucleotide substitution selected by AIC, AICc, and BIC in jModelTest2 and ModelTest-NG were inconsistent with those selected in IQ-TREE, the statistical selection of the best-fit models of nucleotide substitution by AIC performed in jModelTest2 and ModelTest-NG for thirteen and sixteen datasets, respectively, tends to select relatively simpler models in comparison to the statistical selection by AIC performed in IQ-TREE. However, the statistical selection of the best-fit models of nucleotide substitution by AIC performed in IQ-TREE for ten and eight datasets, respectively, leans towards selection of relatively simpler models in comparison to the statistical selection by AIC performed in jModelTest2 and ModelTest-NG. Similarly, for AICc, the statistical selection of the best-fit models of nucleotide substitution performed in jModelTest2 and ModelTest-NG for twelve and fifteen datasets, respectively, tends to select relatively simpler models in comparison to the statistical selection by AICc performed in IQ-TREE. But again, there are instances (eight datasets) where the statistical selection of the best-fit models of nucleotide substitution by AICc performed in IQ-TREE tends to select relatively simpler models in comparison to the statistical selection by AICc performed both in jModelTest2 and ModelTest-NG. Lastly, for BIC, the statistical selection of the best-fit models of nucleotide substitution performed in jModelTest2 and ModelTest-NG for just one dataset gravitates towards selection of relatively simpler models in comparison to the statistical selection by BIC performed in IQ-TREE. Interestingly, the statistical selection of the best-fit models of nucleotide substitution by BIC performed in IQ-TREE never selects a relatively simpler model in comparison to the statistical selection by BIC performed both in jModelTest2 and ModelTest-NG.

The results of statistical selection of best-fit models of nucleotide substitution by AIC, AICc, and BIC using three different programs in comparison to real nucleotide substitution model for the 88 published simulated datasets [34] are shown in Fig 4. The statistical selection of the best-fit models of nucleotide substitution by AIC performed in jModelTest2, ModelTest-NG, and IQ-TREE for 50 (50/88; 56.8%), 55 (55/88; 62.5%) and 51(51/88; 58.0%) datasets, respectively, were consistent with real nucleotide substitution models. Similarly and as expected, the statistical selection of the best-fit models of nucleotide substitution by AICc performed in jModelTest2, ModelTest-NG, and IQ-TREE for 51(51/88; 58.0%), 55 (55/88; 62.5%) and 51(51/88; 58.0%) datasets, respectively, were consistent with real nucleotide substitution models, and thus performed better than the previous criterion. Remarkably, the statistical selection of the best-fit models of nucleotide substitution by BIC performed in jModelTest2, ModelTest-NG, and IQ-TREE for 88 (88/88; 100%), 88 (88/88; 100%) and 86 (86/88; 97.7%) datasets, respectively, were consistent with real nucleotide substitution models.

thumbnail
Fig 4. Results of statistical selection of best-fit models of nucleotide substitution by AIC, AICc, and BIC using three different programs in comparison to real nucleotide substitution model for simulated datasets.

https://doi.org/10.1371/journal.pone.0319774.g004

Using AIC, AICc, and BIC criteria, for the 34 published real datasets [33], as shown in S4 Tables, the proportion of nucleotide substitution best model similarities detected by jModelTest2, ModelTest-NG, and IQ-TREE are 19/34, 24/34, and 26/34, respectively. In jModelTest2, the nucleotide substitution best model similarities in 19 out of 34 datasets were also detected by ModelTest-NG. ModelTest-NG detected the nucleotide substitution best model similarity in 5 additional datasets compared to jModelTest2. There are 5 datasets where the nucleotide substitution best model similarity detection results differ between ModelTest-NG and jModelTest2. There are 8 datasets where the nucleotide substitution best model similarity detection results differ between ModelTest-NG and IQ-TREE. The nucleotide substitution best model similarity detection results differ between ModelTest-NG, jModelTest2, and IQ-TREE for 11 datasets.

Using AIC, AICc, and BIC criteria, for the 88 published simulated datasets [34], as shown in S5 Table, the proportion of nucleotide substitution best model similarities detected by jModelTest2, ModelTest-NG, and IQ-TREE are 67/88, 71/88, and 64/88, respectively. In jModelTest2, the nucleotide substitution best model similarities in 67 out of 88 datasets were also detected by ModelTest-NG. ModelTest-NG detected the nucleotide substitution best model similarity in 4 additional datasets compared to jModelTest2. There are 4 datasets where the nucleotide substitution best model similarity detection results differ between ModelTest-NG and jModelTest2. There are 13 datasets where the nucleotide substitution best model similarity detection results differ between ModelTest-NG and IQ-TREE. The nucleotide substitution best model similarity detection results differ between ModelTest-NG, jModelTest2, and IQ-TREE for 13 datasets.

To assess the consistency of model selection across three different programs (jModelTest2, ModelTest-NG, and IQ-TREE), we evaluated their ability to identify the true nucleotide substitution model. Each program was used to select the best-fit model based on three information criteria (AIC, AICc, and BIC) (S5 Table). The results, summarized in Table 1, show the number of instances (irrespective of information criteria) where each program successfully identified the true model.

thumbnail
Table 1. Frequency of correctly identifying the true nucleotide substitution model.

https://doi.org/10.1371/journal.pone.0319774.t001

A Chi-squared test of independence was performed to determine if any significant differences existed in the accuracy of model selection among the three programs. Pairwise comparisons, presented in Table 2, revealed no significant differences (all adjusted p-values >  0.05). This suggests that the choice of program does not significantly impact the ability to identify the true nucleotide substitution model.

thumbnail
Table 2. Pairwise comparisons of model selection accuracy between programs.

https://doi.org/10.1371/journal.pone.0319774.t002

To assess the consistency of model selection across three different information criteria (AIC, AICc, and BIC) (S5 Table), we evaluated their ability to identify the true nucleotide substitution model across datasets. The results, summarized in Table 3, show the number of instances where each criterion successfully identified the true model.

thumbnail
Table 3. Frequency of correctly identifying the true nucleotide substitution model using different information criteria.

https://doi.org/10.1371/journal.pone.0319774.t003

A Chi-squared test of independence was performed to determine if any significant differences existed in the accuracy of model selection among the three criteria. The overall test was statistically significant (χ2 = 141.31, df =  2, p <  2.2 x 10-16), indicating that the choice of information criterion significantly impacts the ability to identify the true model.

Pairwise comparisons, presented in Table 4, were conducted post hoc to identify the source of these differences. Each pairwise comparison was also statistically significant (all adjusted p-values <  0.05), indicating that AIC, AICc, and BIC each differ significantly in their ability to select the true model. Notably, BIC demonstrated a substantially higher accuracy compared to both AIC and AICc.

thumbnail
Table 4. Pairwise comparisons of model selection accuracy between information criteria.

https://doi.org/10.1371/journal.pone.0319774.t004

These findings highlight the importance of carefully considering the choice of information criterion for model selection in phylogenetic analyses. While AIC and AICc produced similar results, BIC demonstrated a clear advantage in identifying the true nucleotide substitution model.

To assess the influence of information criteria on model selection across different programs, we evaluated the performance of AIC, AICc, and BIC in jModelTest2, ModelTest-NG, and IQ-TREE. Each program was used to select the best-fit nucleotide substitution model for 88 datasets (S5 Table), and the frequency of correctly identifying the true model was recorded (Tables 510).

thumbnail
Table 5. Frequency of correctly identifying the true nucleotide substitution model using different information criteria in jModelTest2.

https://doi.org/10.1371/journal.pone.0319774.t006

thumbnail
Table 6. Pairwise comparisons of model selection accuracy in jModelTest2 using different information criteria.

https://doi.org/10.1371/journal.pone.0319774.t007

thumbnail
Table 7. Frequency of correctly identifying the true nucleotide substitution model using different information criteria in ModelTest-NG.

https://doi.org/10.1371/journal.pone.0319774.t008

thumbnail
Table 8. Pairwise comparisons of model selection accuracy in ModelTest-NG using different information criteria.

https://doi.org/10.1371/journal.pone.0319774.t009

thumbnail
Table 9. Frequency of correctly identifying the true nucleotide substitution model using different information criteria in IQ-TREE.

https://doi.org/10.1371/journal.pone.0319774.t010

thumbnail
Table 10. Pairwise comparisons of model selection accuracy in IQ-TREE using different information criteria.

https://doi.org/10.1371/journal.pone.0319774.t005

A Chi-squared test of independence revealed significant differences in the accuracy of model selection among the three information criteria in jModelTest2 (χ2 =  52.409, df =  2, p <  4.164 x 10-12) (Table 5). Pairwise comparisons (Table 6) showed that BIC significantly outperformed both AIC and AICc (adjusted p <  0.05), while there was no significant difference between AIC and AICc.

Similarly, in ModelTest-NG, a significant difference was observed among the criteria (χ2 =  44, df =  2, p <  2.789 x 10-10) (Table 7). Again, BIC showed significantly higher accuracy compared to both AIC and AICc (adjusted p <  0.05), with no significant difference between AIC and AICc (Table 8).

In IQ-TREE, the pattern remained consistent. The overall Chi-squared test was significant (χ2 =  45.269, df =  2, p <  1.479 x 10-10) (Table 9), and BIC was significantly more accurate than both AIC and AICc (adjusted p <  0.05), with no difference between AIC and AICc (Table 10).

To assess the consistency of model selection across different programs, we evaluated the agreement between jModelTest2, ModelTest-NG, and IQ-TREE in identifying the best-fit nucleotide substitution model. Each program was used to select the best model based on three information criteria (AIC, AICc, and BIC) for the 34 real datasets (S4 Table). We then compared whether the models selected by each program were identical across all three criteria, resulting in a binary classification (yes/no) for each program (Table 11).

thumbnail
Table 11. Frequency of consistent model selection across different information criteria for each program.

https://doi.org/10.1371/journal.pone.0319774.t011

A Chi-squared test of independence was performed to determine if any significant differences existed in the consistency of model selection among the three programs. The test was not statistically significant (χ2 =  3.4941, df =  2, p =  0.1743), indicating that the choice of program does not significantly impact the agreement in model selection across different information criteria. This suggests that the three programs generally produce similar results when selecting the best-fit model, regardless of the specific criterion used.

To assess the consistency of model selection across different programs using simulated datasets, we evaluated the agreement between jModelTest2, ModelTest-NG, and IQ-TREE in identifying the best-fit nucleotide substitution model. Each program was used to select the best model based on three information criteria (AIC, AICc, and BIC) for 88 simulated datasets (S5 Table). We then compared whether the models selected by each program were identical across all three criteria, resulting in a binary classification (yes/no) for each program (Table 12).

thumbnail
Table 12. Frequency of consistent model selection across different information criteria for each program using simulated datasets.

https://doi.org/10.1371/journal.pone.0319774.t012

A Chi-squared test of independence was performed to determine if any significant differences existed in the consistency of model selection among the three programs. The test was not statistically significant (χ2 =  1.5599, df =  2, p =  0.4584), indicating that the choice of program does not significantly impact the agreement in model selection across different information criteria when using simulated data. This suggests that, similar to the results observed with real datasets, the three programs generally produce similar results when selecting the best-fit model from simulated data, regardless of the specific criterion used.

Discussion

The statistical selection of best-fit models of nucleotide substitution for multiple sequence alignments (MSAs) of DNA or RNA is routine in phylogenetics [38]. Previous study has shown that BIC is preferred for nucleotide substitution of molecular evolutionary genetic analysis in a comprehensive study [32]. In the present study, we investigated the general principles for statistical selection of best-fit models of nucleotide substitution using 122 published datasets (34 real datasets [33] and 88 simulated datasets [34]), using three selection methods (AIC, AICc, and BIC) and three state-of-the-art programs (jModelTest2, ModelTest-NG, and IQ-TREE). Our finding showed that model selections by AIC and AICc were the same in most cases for both the 34 published real datasets [33] and 88 published simulated datasets [34] (Figs 1 and S1). We observed that, when model selection was inconsistent across methods, the nucleotide substitution models selected by BIC were generally simpler than those chosen by AIC and AICc. This pattern was consistent across all 34 real [33] and 88 simulated datasets [34], except for one dataset (‘TVMe_10000’) in ModelTest-NG, using three different software programs (Figs 2 and S2). Additionally, though evolution is often a complex process, for computational purposes, researcher tend to select the simplest model that can appropriately characterize the evolutionary process [39]. This is in line with similar comparisons in the context of machine learning (https://machinelearningmastery.com/probabilistic-model-selection-measures).

Second, when best-fit model selection was inconsistent among different programs, AIC, AICc, and BIC tended to select relatively simpler best-fit models of nucleotide substitution in jModelTest2 and ModelTest-NG than in IQ-TREE in most cases for both the 34 published real datasets [33] and 88 published simulated datasets [34], especially, for BIC method (Figs 3 and S3). Notably, the statistical selection of the best-fit models of nucleotide substitution by BIC performed in jModelTest2, ModelTest-NG, and IQ-TREE were much more often consistent (100%, 100%, and 97.7%, respectively) with the real nucleotide substitution models of simulated datasets [34] (Fig 4) in comparision to those obtained using AIC (56.8%, 62.5%, 58.0%, respectively) or AICc (58.0%, 62.5%, 58.0%, respectively). We compared the performance of jModelTest2, ModelTest-NG, and IQ-TREE in selecting nucleotide substitution models using AIC, AICc, and BIC criteria across the 34 published real datasets [33] and 88 published simulated datasets [34] (S4 and S5 Tables). The performance of different programs in selecting the best-fit nucleotide substitution model can vary due to several key factors: algorithmic approach, selection criteria, model variety, handling complexity, computational efficiency, ease of use, and software updates/support [21,22,28,29,31,40,41]. Though not statistically significant, ModelTest-NG was often more reliable and accurate in selecting the best-fit nucleotide substitution model. It combines modern algorithms with a scientifically robust methodology to ensure that the selected models are both statistically sound and generalizable, making it the optimal choice for model selection in molecular evolutionary analyses.

Our analysis of model selection accuracy across three popular phylogenetic programs (jModelTest2, ModelTest-NG, and IQ-TREE) revealed that the choice of program had no significant impact on the ability to identify the true nucleotide substitution model. This finding suggests that researchers can confidently use any of these programs for model selection without concern for substantial differences in accuracy. However, in agreement with previous studies [32], our results did highlight the critical influence of the information criterion used for model selection. BIC consistently outperformed both AIC and AICc in identifying the true model, irrespective of the program employed. This observation underscores the importance of carefully considering the information criterion, and potentially favoring BIC, when selecting the best-fit model for phylogenetic analyses. While further research is needed to explore the generalizability of these findings across diverse datasets and evolutionary scenarios, our results provide valuable insights for researchers seeking to optimize model selection strategies in phylogenetics.

Other limitations include only testing 88 substitution models as per the 88 simulated datasets, however, jModelTest2 can test 1624 substitution models and IQ-TREE can test an even higher number of substitution models than jModelTest2. However, the substitution models studied here are a good representation of those implemented in the most popular phylogenetic tree reconstruction software (i.e., MEGA [42], FastTree [43], PhyML [44], RAxML [41], RAxML-NG [45], IQ-TREE [31], MrBayes [46], BEAST [47,48]). We did not test the substitution model selection using the famous MEGA software because it only supports 24 substitution models. The 88 published simulated datasets, each generated with different nucleotide substitution models, were tested using the three state-of-the-art programs (jModelTest2, ModelTest-NG, and IQ-TREE) for comparison, one of which defaults settings only allow testing 88 substitution models (ModelTest-NG).

While previous studies [32] have explored this topic, clear guidance on prioritizing specific criteria or software for different modeling scenarios remains lacking. Our study fills this gap with a comprehensive comparative analysis to resolve these uncertainties. Overall, our results indicate that the selection methods employed by different programs influence the choice of the best-fit nucleotide substitution model. Based on a comprehensive statistical analysis of these patterns, we recommend using the Bayesian Information Criterion (BIC) implemented in most softwares for the statistical selection of the best-fit nucleotide substitution model. We hope that this study will contribute to the development of more robust statistical selection methods and tools for accurately identifying the most appropriate nucleotide substitution models.

Supporting information

S1 Fig. Results of statistical selection of best-fit models of nucleotide substitution by AIC in comparison to AICc using three different programs for simulated datasets.

https://doi.org/10.1371/journal.pone.0319774.s001

(PDF)

S2 Fig. Results of statistical selection of best-fit models of nucleotide substitution by BIC in comparison to AIC and AICc using three different programs for simulated datasets.

https://doi.org/10.1371/journal.pone.0319774.s002

(PDF)

S3 Fig. Results of statistical selection of best-fit models of nucleotide substitution by AIC, AICc, and BIC using jModelTest2 and ModelTest-NG in comparison to IQ-TREE for simulated datasets.

https://doi.org/10.1371/journal.pone.0319774.s003

(PDF)

S1 Table. List of the 88 nucleotide substitution models sorted by ModelTest-NG.

https://doi.org/10.1371/journal.pone.0319774.s004

(XLSX)

S2 Table. Specific command lines used for statistical selection of best-fit models of nucleotide substitution.

https://doi.org/10.1371/journal.pone.0319774.s005

(XLSX)

S3 Table. All common DNA susbtitution models (ordered by complexity).

Adapted from http://www.iqtree.org/doc/Substitution-Models on December 11, 2024.

https://doi.org/10.1371/journal.pone.0319774.s006

(XLSX)

S4 Table. Results of statistical selection of best-fit models of nucleotide substitution by AIC, AICc, and BIC using three different programs for real datasets.

https://doi.org/10.1371/journal.pone.0319774.s007

(XLSX)

S5 Table. Results of statistical selection of best-fit models of nucleotide substitution by AIC, AICc, and BIC using three different programs for simulated datasets.

https://doi.org/10.1371/journal.pone.0319774.s008

(XLSX)

References

  1. 1. Buckley TR. Model misspecification and probabilistic tests of topology: evidence from empirical data sets. Syst Biol. 2002;51(3):509–23. pmid:12079647
  2. 2. Buckley TR, Cunningham CW. The effects of nucleotide substitution model assumptions on estimates of nonparametric bootstrap support. Mol Biol Evol. 2002;19(4):394–405. pmid:11919280
  3. 3. Lemmon AR, Moriarty EC. The importance of proper model assumption in bayesian phylogenetics. Syst Biol. 2004;53(2):265–77. pmid:15205052
  4. 4. Felsenstein J. Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol. 1981;17(6):368–76. pmid:7288891
  5. 5. Kimura M. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol. 1980;16(2):111–20. pmid:7463489
  6. 6. Hasegawa M, Kishino H, Yano T. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol. 1985;22(2):160–74. pmid:3934395
  7. 7. Tamura K, Nei M. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol Biol Evol. 1993;10(3):512–26. pmid:8336541
  8. 8. Kimura M. Estimation of evolutionary distances between homologous nucleotide sequences. Proc Natl Acad Sci U S A. 1981;78(1):454–8. pmid:6165991
  9. 9. Zharkikh A. Estimation of evolutionary distances between nucleotide sequences. J Mol Evol. 1994;39(3):315–29. pmid:7932793
  10. 10. Jukes TH, Cantor CR. Chapter 24 - evolution of protein molecules. In: Munro HN, editor. Mammalian protein metabolism. Academic Press; 1969. p. 21–132.
  11. 11. Duchêne S, Di Giallonardo F, Holmes EC. Substitution model adequacy and assessing the reliability of estimates of virus evolutionary rates and time scales. Mol Biol Evol. 2016;33(1):255–67. pmid:26416981
  12. 12. Del Amparo R, Arenas M. Consequences of substitution model selection on protein ancestral sequence reconstruction. Mol Biol Evol. 2022;39(7):msac144. pmid:35789388
  13. 13. Del Amparo R, Arenas M. Influence of substitution model selection on protein phylogenetic tree reconstruction. Gene. 2023;865:147336. pmid:36871672
  14. 14. Sumner JG, Jarvis PD, Fernández-Sánchez J, Kaine BT, Woodhams MD, Holland BR. Is the general time-reversible model bad for molecular phylogenetics?. Syst Biol. 2012;61(6):1069–74. pmid:22442193
  15. 15. Tao Q, Barba-Montoya J, Huuki LA, Durnan MK, Kumar S. Relative efficiencies of simple and complex substitution models in estimating divergence times in phylogenomics. Mol Biol Evol. 2020;37(6):1819–31. pmid:32119075
  16. 16. Yang Z. Among-site rate variation and its impact on phylogenetic analyses. Trends Ecol Evol. 1996;11(9):367–72. pmid:21237881
  17. 17. Hoff M, Orf S, Riehm B, Darriba D, Stamatakis A. Does the choice of nucleotide substitution models matter topologically? BMC Bioinformatics. 2016;17:143. pmid:27009141
  18. 18. Posada D, Crandall KA. MODELTEST: testing the model of DNA substitution. Bioinformatics. 1998;14(9):817–8. pmid:9918953
  19. 19. Posada D. jModelTest: phylogenetic model averaging. Mol Biol Evol. 2008;25(7):1253–6. pmid:18397919
  20. 20. Darriba D, Taboada GL, Doallo R, Posada D. ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics. 2011;27(8):1164–5. pmid:21335321
  21. 21. Darriba D, Taboada GL, Doallo R, Posada D. jModelTest 2: more models, new heuristics and parallel computing. Nat Methods. 2012;9(8):772. pmid:22847109
  22. 22. Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat Methods. 2017;14(6):587–9. pmid:28481363
  23. 23. Akaike H. A new look at the statistical model identification. IEEE Trans Automat Contr. 1974;19(6):716–23.
  24. 24. HURVICH CM, TSAI C-L. Regression and time series model selection in small samples. Biometrika. 1989;76(2):297–307.
  25. 25. Sugiura N. Further analysis of the data by Akaike’s information criterion and the finite corrections. Communications in Statistics - Theory and Methods. 1978;7(1):13–26.
  26. 26. Schwarz G. Estimating the dimension of a model. Ann Statist. 1978;6(2).
  27. 27. Posada D, Buckley TR. Model selection and model averaging in phylogenetics: advantages of akaike information criterion and bayesian approaches over likelihood ratio tests. Syst Biol. 2004;53(5):793–808. pmid:15545256
  28. 28. Guindon S, Gascuel O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 2003;52(5):696–704. pmid:14530136
  29. 29. Darriba D, Posada D, Kozlov AM, Stamatakis A, Morel B, Flouri T. ModelTest-NG: A new and scalable tool for the selection of DNA and protein evolutionary models. Mol Biol Evol. 2020;37(1):291–4. pmid:31432070
  30. 30. Flouri T, Izquierdo-Carrasco F, Darriba D, Aberer AJ, Nguyen L-T, Minh BQ, et al. The phylogenetic likelihood library. Syst Biol. 2015;64(2):356–62. pmid:25358969
  31. 31. Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol. 2020;37(5):1530–4. pmid:32011700
  32. 32. Luo A, Qiao H, Zhang Y, Shi W, Ho SY, Xu W, et al. Performance of criteria for selecting evolutionary models in phylogenetics: a comprehensive study based on simulated datasets. BMC Evol Biol. 2010;10:242. pmid:20696057
  33. 33. Kainer D, Lanfear R. The effects of partitioning on phylogenetic inference. Mol Biol Evol. 2015;32(6):1611–27. pmid:25660373
  34. 34. Ly-Trong N, Naser-Khdour S, Lanfear R, Minh BQ. AliSim: a fast and versatile phylogenetic sequence simulator for the genomic era. Mol Biol Evol. 2022;39(5):msac092. pmid:35511713
  35. 35. Team P. RStudio: Integrated Development Environment for R. Boston, MA: Posit Software, PBC; 2024.
  36. 36. Team RC. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2020.
  37. 37. Mangiafico SS. Rcompanion: Functions to Support Extension Education Program Evaluation. New Brunswick, New Jersey: Rutgers Cooperative Extension; 2024.
  38. 38. Sullivan J, Joyce P. Model selection in phylogenetics. Annu Rev Ecol Evol Syst. 2005;36(1):445–66.
  39. 39. Shapiro B, Rambaut A, Drummond AJ. Choosing appropriate substitution models for the phylogenetic analysis of protein-coding sequences. Mol Biol Evol. 2006;23(1):7–9. pmid:16177232
  40. 40. Lanfear R, Calcott B, Ho SYW, Guindon S. Partitionfinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol. 2012;29(6):1695–701. pmid:22319168
  41. 41. Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30(9):1312–3. pmid:24451623
  42. 42. Tamura K, Stecher G, Kumar S. MEGA11: molecular evolutionary genetics analysis version 11. Mol Biol Evol. 2021;38(7):3022–7. pmid:33892491
  43. 43. Price MN, Dehal PS, Arkin AP. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5(3):e9490. pmid:20224823
  44. 44. Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 2010;59(3):307–21. pmid:20525638
  45. 45. Kozlov AM, Darriba D, Flouri T, Morel B, Stamatakis A. RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics. 2019;35(21):4453–5. pmid:31070718
  46. 46. Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Höhna S, et al. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 2012;61(3):539–42. pmid:22357727
  47. 47. Suchard MA, Lemey P, Baele G, Ayres DL, Drummond AJ, Rambaut A. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 2018;4(1):vey016. pmid:29942656
  48. 48. Bouckaert R, Vaughan TG, Barido-Sottani J, Duchêne S, Fourment M, Gavryushkina A, et al. BEAST 2.5: an advanced software platform for Bayesian evolutionary analysis. PLoS Comput Biol. 2019;15(4):e1006650. pmid:30958812