Figure 1.
The three possible phylogenies for Dmel, Dere, and Dyak, with Dana as an outgroup.
Figure 2.
Widespread Incongruence of Substitutions, Indels, and Gene Trees
(A) The proportion of informative nucleotide substitutions in 9,405 genes supporting each of the three trees. Tree 1 (red) is supported by 170,002 (44.7%) nucleotide changes; tree 2 (green), 112,278 (29.5%) nucleotide changes; and tree 3 (purple), 98,117 (25.8%) nucleotide changes.
(B) The proportion of informative amino acid substitutions in 9,405 genes supporting each of the three trees. Tree 1 (red) is supported by 28,628 (49.3%) amino acid changes; tree 2 (green), 15,182 (26.2%) amino acid changes; and tree 3 (purple), 14,203 (24.5%) amino acid changes.
(C) The proportion of informative insertions or deletions (indels) in 9,405 genes supporting each of the three trees. Indels were filtered, requiring five flanking amino acids of perfect identity and no repetitive sequence. Tree 1 (red) is supported by 2 deletions and 6 insertions (66.7%); tree 2 (green), 1 deletion and 1 insertion (16.7%); and tree 3 (purple), 2 insertions (16.7%). Similar proportions but much larger counts are found when the indels are not filtered.
(D) The proportion of 9,315 genes with ML support for each of the three trees. Tree 1 (red) has ML support for 5,381 (57.8%); tree 2 (green), 2,188 (23.5%); and tree 3 (purple), 1,746 (18.7%).
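The percentages quoted in panels (A) through (D) follow directly from the counts given in the captions. A minimal sketch that recomputes them is shown here; the function name `proportions` and the dictionary layout are illustrative only and are not taken from the original analysis.

```python
# Counts supporting tree 1, tree 2, and tree 3, copied from the captions above.
counts = {
    "A: nucleotide substitutions": (170002, 112278, 98117),
    "B: amino acid substitutions": (28628, 15182, 14203),
    "C: filtered indels": (2 + 6, 1 + 1, 2),
    "D: genes with ML support": (5381, 2188, 1746),
}

def proportions(tree_counts):
    """Return the percentage of the total contributed by each tree, to one decimal."""
    total = sum(tree_counts)
    return [round(100.0 * c / total, 1) for c in tree_counts]

for panel, tree_counts in counts.items():
    print(panel, sum(tree_counts), proportions(tree_counts))
```

Summing the counts in panel (D) gives 9,315, which matches the number of genes stated in that caption.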
Figure 3.
The history of a gene (colored lines) is drawn in the context of a species tree (gray bars). New lineages arising from new polymorphisms in the gene are drawn in different colors. In this case, the two alleles in the population prior to the split of Dmel are maintained through to the split of Dere and Dyak, leading to incomplete lineage sorting and an incongruent genealogy (tree 2). The greater the diversity in the ancestral population and the shorter the time between speciation events, the more likely nonspecies genealogies are.
Figure 4.
Median synonymous branch length trees derived from the genes supporting each of the three trees are drawn to the same scale. The branch spanning the two speciation events is quite short for all trees.
Figure 5.
Coalescence Probabilities for Each Tree
Using the formula p(congruence) = 1 − (2/3)exp(−t), where t = generations / (2Ne), the probability of the species tree (black) and the probability of one of the two alternate trees (gray) were plotted as a function of t.
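The formula in this caption can be evaluated directly. The sketch below assumes, as is standard for three lineages under the neutral coalescent, that the remaining probability (2/3)exp(−t) is split equally between the two alternate trees; the function names are illustrative and not from the original work.

```python
import math

def p_congruent(t):
    """Probability that the gene tree matches the species tree.

    t is the internal branch length in coalescent units: generations / (2 * Ne).
    """
    return 1.0 - (2.0 / 3.0) * math.exp(-t)

def p_each_alternate(t):
    """Probability of one specific alternate tree (assumed equal for both)."""
    return (1.0 / 3.0) * math.exp(-t)

# Tabulate both curves over a few illustrative values of t.
for t in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(f"t={t:.1f}  species tree={p_congruent(t):.3f}  "
          f"each alternate={p_each_alternate(t):.3f}")
```

At t = 0 all three trees are equally likely (1/3 each), and as t grows the probability of the species tree approaches 1.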
Figure 6.
Clustering of Informative Sites
The enrichment of informative nucleotide (A) and amino acid (B) substitutions near other substitutions that support the same phylogeny was found for all three trees and is on a scale roughly similar to estimates of linkage disequilibrium. At each informative site in the genome, the counts of informative sites supporting each of the three trees in 1-kb windows extending 30 kb up- and downstream were measured. For each type of informative site, the enrichment of the same type of informative site in each 1-kb window was calculated using the observed counts and the expected number of sites based on their genome-wide frequency. Enrichment is log10(observed / expected).
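As an illustration of the enrichment statistic, the sketch below computes log10(observed / expected) for a single 1-kb window. How the expected count was derived in the original analysis is not spelled out here, so the sketch assumes it is the genome-wide per-base frequency of that site type multiplied by the window size and by the number of focal sites contributing to the window. The function and parameter names are hypothetical.

```python
import math

def log10_enrichment(observed, genome_wide_rate, window_bp=1000, n_focal_sites=1):
    """Return log10(observed / expected) for one distance window.

    observed         -- same-type informative sites counted in this window,
                        summed over all focal sites
    genome_wide_rate -- assumed per-base frequency of that site type genome-wide
    window_bp        -- window size in base pairs (1 kb in the caption)
    n_focal_sites    -- number of focal sites whose windows were summed
    """
    expected = genome_wide_rate * window_bp * n_focal_sites
    if observed <= 0 or expected <= 0:
        return float("nan")  # the log ratio is undefined for empty windows
    return math.log10(observed / expected)

# Made-up numbers: 200 sites observed where 100 are expected.
print(log10_enrichment(observed=200, genome_wide_rate=0.001, n_focal_sites=100))
```

A value of 0 means no enrichment relative to the genome-wide frequency, and positive values indicate clustering of sites that support the same tree.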
Figure 7.
An excess of incongruence above what is expected by chance was observed for the set of all genes (A) as well as the set of genes that consistently supported the same tree across models and species combinations (B). Genes were binned by bootstrap value, and the proportions of genes supporting tree 1 (red line), tree 2 (green line), and tree 3 (purple line) were plotted. The expected congruence based on the bootstrap value in each bin (black solid line) and the 95% confidence interval based on a χ² distribution (black dashed line) demonstrate the excess incongruence.
Table 1.
Congruence and Fit to Data across Six Models of Evolution
Table 2.
Congruence across 21 Species Combinations
Figure 8.
Sequence and Evolutionary Gene Properties
Sequence and evolutionary properties of the genes are unable to explain the incongruence. Distributions are calculated using results from the original ML analysis using the F3×4 model and the Dmel, Dere, Dyak, and Dana species combination. The distributions of informative synonymous divergence (ISD) in genes supporting each tree reveal a bias toward lower values for the incongruent genes (A). Nearly all genes with little or no ISD, however, are classified as inconsistent (B). Consistent genes therefore have very similar distributions of ISD across trees (C). TSD is distributed similarly across trees, suggesting that homoplasy due to increased mutation rates is not causing the incongruence (D). Gene length is slightly higher in tree 1 genes but overall is very similar across trees (E). Third codon position GC content is slightly biased toward lower values for Dmel and Dana and higher values for Dere and Dyak, creating a conservative bias for the incongruence (F).