Fig 1.
Pipeline for unbinned analyses, unweighted statistical binning, and weighted statistical binning.
The input to the pipeline is a set of sequences for different loci across different species. In the traditional pipeline, a multiple sequence alignment and a gene tree are computed for each locus; these gene trees are then given to the preferred coalescent-based summary method, and a species tree is returned. In the statistical binning pipeline, the estimated gene trees are used to compute an incompatibility graph, where each vertex represents a gene, and an edge between two genes indicates that the differences between the trees for these genes are considered significant (based on the bootstrap support of the conflicting edges between the trees). The vertices of the graph are then assigned colors, based on a heuristic for balanced minimum vertex coloring, so that no edge connects two vertices of the same color. The vertices with a given color are put into a bin, and the sequence alignments for the genes in a bin are combined into a supergene alignment. A (supergene) tree is then computed for each supergene alignment using a fully partitioned analysis. In the unweighted binning approach (presented in [23]), these supergene trees are then given to the preferred summary method, and a species tree is returned. In the weighted binning approach presented here, each supergene tree is repeated as many times as the number of genes in its bin, and this larger set is then given to the preferred summary method.
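The binning and weighting steps described in this caption can be illustrated with a minimal sketch. It assumes the incompatibility graph is already available as a list of conflicting gene pairs. The greedy rule used here (visit genes by decreasing conflict degree, place each in the smallest compatible bin) is only one plausible balancing heuristic and is not claimed to be the one used in the pipeline. The names `balanced_greedy_coloring` and `weight_supergene_trees` are hypothetical.

```python
from collections import defaultdict


def balanced_greedy_coloring(genes, conflicts):
    """Group genes into bins so that no two conflicting genes share a bin.

    Assumption: a simple greedy heuristic, not the published implementation.
    Genes are visited by decreasing conflict degree; each gene joins the
    smallest bin it is compatible with, or opens a new bin if none exists.
    """
    neighbors = defaultdict(set)
    for a, b in conflicts:
        neighbors[a].add(b)
        neighbors[b].add(a)

    bins = []  # each bin is a set of gene names (one color class)
    for gene in sorted(genes, key=lambda g: (-len(neighbors[g]), g)):
        compatible = [b for b in bins if not (b & neighbors[gene])]
        if compatible:
            min(compatible, key=len).add(gene)  # keep bin sizes balanced
        else:
            bins.append({gene})
    return bins


def weight_supergene_trees(bins, supergene_trees):
    """Weighted binning: repeat each supergene tree once per gene in its bin."""
    weighted = []
    for bin_genes, tree in zip(bins, supergene_trees):
        weighted.extend([tree] * len(bin_genes))
    return weighted


# Toy example: four genes, one significant conflict between g1 and g2.
example_bins = balanced_greedy_coloring(["g1", "g2", "g3", "g4"], [("g1", "g2")])
example_input = weight_supergene_trees(example_bins, ["T1", "T2"])
```

In the unweighted variant, the summary method would receive one tree per bin; in the weighted variant it receives `example_input`, which has one entry per original gene.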
Table 1.
Topological discordance between true gene trees and true species tree.
For each collection of simulated datasets (defined by the type of simulation and the ILS level), we show the average topological distance between true gene trees and the species tree.
Fig 2.
Divergence of estimated gene tree (triplet) distributions from true gene tree distributions for MP-EST analyses of simulated avian datasets.
In (a), we vary the gene sequence length (250bp genes have the highest error, and 1500bp genes have the lowest error) and explore 1000 genes under default ILS levels, and in (b) we vary the amount of ILS and fix the number of genes to 1000 and sequence length to 500bp. True triplet frequencies are estimated from the true gene trees for each of the $\binom{n}{3}$ possible triplets, where $n$ is the number of species. Similarly, triplet frequencies are calculated from estimated gene/supergene trees. For each of these $\binom{n}{3}$ triplets, we calculate the Jensen-Shannon divergence of the estimated triplet distribution from the true gene tree triplet distribution. We show the empirical cumulative distribution of these divergence scores, which gives the percentage of triplets whose divergence from the true triplet distribution is at or below the specified level. Results are shown for 10 replicates. We used a 50% bootstrap support threshold for binning, and estimated the supergene trees using RAxML with unpartitioned analyses.
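As an illustration of the divergence score used in this figure, the sketch below computes the Jensen-Shannon divergence between two triplet-topology distributions and evaluates an empirical cumulative distribution at a given level. The base-2 logarithm and the function names are assumptions made for the example; the counts are invented toy values, not data from the study.

```python
import math


def normalize(counts):
    """Turn the three topology counts for one triplet into frequencies."""
    total = sum(counts)
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    Assumption: base-2 logarithms, so the value lies in [0, 1].
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the KL divergence.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def empirical_cdf(scores, level):
    """Fraction of triplets whose divergence is at or below `level`."""
    return sum(s <= level for s in scores) / len(scores)


# Toy counts of the three possible topologies for one species triplet.
true_freq = normalize([600, 200, 200])
estimated_freq = normalize([450, 300, 250])
score = js_divergence(true_freq, estimated_freq)
```

Repeating the last three lines for every triplet yields the list of scores that `empirical_cdf` summarizes.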
Fig 3.
Species tree estimation error (FN) for MP-EST and ASTRAL with MLBS on avian simulated datasets.
(a) MP-EST on 1000 genes with varying gene sequence length (controlling gene tree error) and with 1X ILS, (b) ASTRAL under the same conditions, (c) MP-EST on varying numbers of genes with fixed default level of ILS (1X level) and 500bp sequence length, and (d) MP-EST on varying levels of ILS and 1000 genes of length 500bp. We show results for 20 replicates everywhere, except for the 2000-gene datasets, which are based on 10 replicates. Binning was performed using a 50% bootstrap support threshold. We estimated the supergene trees and performed concatenation using RAxML with unpartitioned analyses.
Table 2.
Statistical significance test results for choice of binning method on MP-EST.
We performed ANOVA to test the significance of the choice of method (unbinned, weighted binned, or unweighted binned; WSB-50: weighted statistical binning using a 50% bootstrap support threshold; WSB-75: weighted statistical binning using a 75% bootstrap support threshold). For weighted vs. unweighted, we compared the 50% bootstrap support threshold for avian, 75% for mammalian, and both 50% and 75% for the 15- and 10-taxon datasets. All p-values are corrected for multiple hypothesis testing using the FDR correction (n = 16). “n.a.” stands for “not available”.
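The caption does not say which FDR procedure was applied. As a hedged illustration of what such a correction does with a family of n tests, the sketch below implements the Benjamini-Hochberg step-up adjustment, which is an assumption here and may differ from the procedure actually used. The p-values in the example are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Assumption: BH is used only as a common example of an FDR correction.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Toy family of four tests (n = 4).
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

The value n in the caption plays the role of `len(p_values)`: every raw p-value is scaled by the size of the whole family of tests before being compared to the significance level.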
Table 3.
Statistical significance test results for choice of binning method on ASTRAL.
We performed ANOVA to test the significance of the choice of method (unbinned, weighted binned, or unweighted binned; WSB-50: weighted statistical binning using a 50% bootstrap support threshold; WSB-75: weighted statistical binning using a 75% bootstrap support threshold). For weighted vs. unweighted, we compared the 50% bootstrap support threshold for avian, 75% for mammalian, and both 50% and 75% for the 15- and 10-taxon datasets. All p-values are corrected for multiple hypothesis testing using the FDR correction (n = 14). “n.a.” stands for “not available”.
Table 4.
Statistical significance test results for interaction effects (binning and simulation parameter) on MP-EST.
We performed ANOVA to test the significance of whether there is an interaction between the choice of the method (unbinned, weighted binned, unweighted binned, WSB-50: weighted statistical binning using 50% bootstrap support threshold and WSB-75: weighted statistical binning using 75% bootstrap support threshold) and the variable changed in each dataset. For weighted vs. unweighted, we compared 50% bootstrap support threshold for avian, 75% for mammalian, and both 50% and 75% for 15- and 10-taxon datasets. All p-values are corrected for multiple hypothesis testing using the FDR correction (n = 21). “n.a.” stands for “not available”.
Table 5.
Statistical significance test results for interaction effects (binning and simulation parameter) on ASTRAL.
We performed ANOVA to test the significance of whether there is an interaction between the choice of the method (unbinned, weighted binned, unweighted binned, WSB-50: weighted statistical binning using 50% bootstrap support threshold and WSB-75: weighted statistical binning using 75% bootstrap support threshold) and the variable changed in each dataset. For weighted vs. unweighted, we compared 50% bootstrap support threshold for avian, 75% for mammalian, and both 50% and 75% for 15- and 10-taxon datasets. All p-values are corrected for multiple hypothesis testing using the FDR correction (n = 17). “n.a.” stands for “not available”.
Fig 4.
Effect of binning on the branch lengths (in coalescent units) estimated by MP-EST using MLBS on the avian and mammalian simulated datasets.
We show the species tree branch length error (the ratio of estimated branch length to true branch length for branches of the true tree that appear in the estimated tree; 1 indicates correct estimation). Results are shown for (a) 1000 avian genes of 1X ILS level with varying gene sequence length, (b) 1000 avian genes of 500bp and with varying levels of ILS, and (c) varying number of mammalian genes and varying sequence length (250bp, 500bp, and 1000bp) with 1X ILS level. Results are shown for 20 replicates. We used 50% and 75% bootstrap support threshold for binning on avian and mammalian datasets, respectively, and estimated the supergene trees using RAxML with unpartitioned analyses.
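The branch length error defined in this caption can be sketched as follows. The example assumes each tree has already been reduced to a mapping from a branch (represented as the set of taxa on one canonical side of the bipartition) to its length in coalescent units; that representation and the name `branch_length_ratios` are hypothetical choices for illustration.

```python
def branch_length_ratios(true_branches, estimated_branches):
    """Ratio of estimated to true length for true branches recovered in the estimate.

    Both arguments map a bipartition (frozenset of taxa on one canonical side)
    to a branch length in coalescent units. A ratio of 1 means the length was
    estimated correctly; true branches missing from the estimate are skipped.
    """
    return {
        split: estimated_branches[split] / true_length
        for split, true_length in true_branches.items()
        if split in estimated_branches and true_length > 0
    }


# Toy example: one shared branch is underestimated, one true branch is missing.
true_tree = {frozenset({"A", "B"}): 2.0, frozenset({"A", "B", "C"}): 1.0}
estimated_tree = {frozenset({"A", "B"}): 1.0, frozenset({"A", "C"}): 0.5}
ratios = branch_length_ratios(true_tree, estimated_tree)
```

Here `ratios` contains a single entry with value 0.5, meaning the shared branch was estimated at half its true length.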
Fig 5.
Cumulative distribution of the bootstrap support values (obtained using MLBS) of true positive (TP) and false positive (FP) edges estimated by binned and unbinned MP-EST on avian datasets.
In (a) we fix the number of genes to 1000, use default ILS levels, and vary sequence length to control gene tree estimation error, and in (b) we study 1000 genes with 500bp sequence length, and vary ILS levels. To produce the graph, we order the branches in the estimated species tree by their quality, so that the true positives with high support come first, followed by lower support true positives, then by false positives with low support, and finally by false positives with high support. The false positive branches with support above 75% are the most troublesome, and the highly supported false positives are indicated by the grey area. When the curve for a method lies above the curve for another method, then the first method has better bootstrap support. We used 50% bootstrap support threshold for binning, and estimated the supergene trees using RAxML with unpartitioned analyses.
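The ordering of branches by quality described in this caption can be sketched as below. The input format (a list of support values paired with a true-positive flag) and the function names are assumptions for the example; the 75% cutoff for highly supported false positives is taken from the caption.

```python
def order_branches_by_quality(branches):
    """Order (support, is_true_positive) pairs from best to worst.

    True positives come first by decreasing support, followed by false
    positives by increasing support, so highly supported false positives
    end up last.
    """
    true_pos = sorted((s for s, is_tp in branches if is_tp), reverse=True)
    false_pos = sorted(s for s, is_tp in branches if not is_tp)
    return [(s, True) for s in true_pos] + [(s, False) for s in false_pos]


def count_high_support_false_positives(branches, cutoff=75):
    """Number of false positive branches with support above the cutoff."""
    return sum(1 for s, is_tp in branches if not is_tp and s > cutoff)


# Toy species tree with four branches (support in percent).
toy_branches = [(60, True), (90, False), (100, True), (30, False)]
ordered = order_branches_by_quality(toy_branches)
```

For the toy input, `ordered` is `[(100, True), (60, True), (30, False), (90, False)]`, and one branch falls in the highly supported false positive region.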
Fig 6.
Species tree estimation error for MP-EST and ASTRAL using MLBS on mammalian simulated datasets.
We show average FN rate over 20 replicates. (a) Results for MP-EST. We varied the number of genes (50, 100, 200, 400 and 800) and sequence length (250bp (43% BS), 500bp (63% BS) and 1000bp (79% BS)) with default amount of ILS (1X level). (b) ASTRAL on varying numbers of genes with fixed 1X ILS level and 500bp sequence length. We used 50% and 75% bootstrap support threshold for binning on avian and mammalian datasets, respectively, and estimated the supergene trees using RAxML with unpartitioned analyses.
Fig 7.
Species tree estimation error for MP-EST and ASTRAL using MLBS on avian and mammalian simulated datasets with two support thresholds (B).
We show average FN rate for unbinned analyses and for weighted and unweighted binned analyses with both B = 50% and B = 75%. Results are shown for (a) the avian dataset with 10 replicates of 1000 genes of length 500bp and 1X ILS level, and (b) the mammalian dataset with 20 replicates of 400 mixed genes (200 genes with 500bp and 200 genes with 1000bp) with 1X ILS level.
Fig 8.
Species tree estimation error for MP-EST and ASTRAL with MLBS on 15-taxon simulated datasets.
We show average FN rate over 10 replicates. We varied the number of genes (100 and 1000) and sequence length (100bp and 1000bp). We used 50% and 75% bootstrap support thresholds for binning, and estimated the supergene trees using RAxML with fully partitioned analyses.
Fig 9.
Species tree estimation error for MP-EST and ASTRAL with MLBS on 10-taxon simulated datasets.
We show average FN rate over 20 replicates. We varied the amount of ILS and fixed the number of genes to 200 and gene sequence length to 100bp. We used 50% and 75% bootstrap support thresholds for binning, and estimated the supergene trees using RAxML with fully partitioned analyses.
Fig 10.
Trees computed on the avian biological dataset using MP-EST on MLBS gene trees.
We show results with weighted and unweighted binning (left), and unbinned analyses (right). We used 50% bootstrap support threshold for binning. Supergene trees were estimated using fully partitioned analyses. MP-EST with weighted and unweighted binning returned the same tree. The branches on the binned MP-EST tree are labeled with two support values side by side: the first is for unweighted binning and the second is for weighted binning; branches without designation have 100% support. Branches in red indicate contradictions to known subgroups.