Noise and biases in genomic data may underlie radically different hypotheses for the position of Iguania within Squamata

doi:10.1371/journal.pone.0202729

Fig 1.

Summary of phylogenetic relationships obtained using morphological (left), molecular (center) and combined datasets (right). Topologies correspond to the strict consensus of the optimal trees under equal-weights maximum parsimony and Bayesian inference for each dataset (original trees can be found in S4 File). Main lizard clades are color coded: light blue = Anguimorpha, purple = Gekkota, green = Iguania, brown = Lacertoidea, orange = Scincoidea, red = Serpentes. Circles on nodes correspond to support values, coded as shown at the lower left corner. JK = jackknife, PP = posterior probability.

Fig 2.

Incongruence in the inference of the lizard tree based on individual genes.

(A) Projection of topological differences among gene trees on a two-dimensional treespace. The histogram shows the distribution of pairwise Robinson-Foulds (RF) distances among gene trees, with grey bars representing the fraction of distances larger than that between the morphological and concatenated molecular topologies (yellow circles). (B) Supernetwork condensing gene tree incongruence. Clades are colored as in Fig 1. Most topological variability is restricted to the backbone (black branches), characterized by different resolutions (reticulations) found at low frequencies. (C) Histogram showing the types of topologies found in the confidence set of trees for all genes. Light grey bars show numbers of genes for which the confidence set contains more than one option (85%), revealing insufficient levels of phylogenetic signal to distinguish among competing resolutions of the lizard backbone clades.
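
As an illustration of the distance summarized in panel A, the following is a minimal sketch of a Robinson-Foulds-style comparison. It assumes rooted trees written as nested tuples of taxon names and counts the clades found in one tree but not the other; the standard RF distance is defined on bipartitions of unrooted trees, so this is a simplified variant. The function names and the four-taxon example are hypothetical and are not taken from the study.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted nested-tuple tree.

    Each clade is a frozenset of leaf names. Single leaves and the
    root clade (all taxa) are excluded because every tree shares them.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):          # a leaf is just its taxon name
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    root = walk(tree)
    found.discard(root)
    return found


def rf_distance(tree_a, tree_b):
    """Count the clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Hypothetical four-taxon trees that disagree on one grouping.
tree_1 = ((("A", "B"), "C"), "D")
tree_2 = ((("A", "C"), "B"), "D")
print(rf_distance(tree_1, tree_2))
```

Computing this value for every pair of gene trees would give the kind of distribution shown in the histogram, under the simplifying assumptions stated above.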

Fig 3.

Analysis of the rate of evolution in the molecular dataset.

(A) Time-calibrated topology of Zheng & Wiens (2016) with the PI profile of the molecular dataset superimposed (red curve). The informativeness of the dataset decays by the time spanned by the initial crown squamate radiation (branches highlighted in color). (B) Signal and noise analysis of individual genes. The y-axis represents the probability with which individual genes contribute to the correct resolution of the quartets centered on each of the four backbone branches (colored as in A). The stronger the color, the more likely a given probability-outcome is in the set of genes analyzed.

Fig 4.

Systematic biases in the molecular dataset.

(A) Inferred rate of evolution for each of the main clades of lizards studied. Both the median and 95% confidence intervals are represented. In the case of Iguania, the white dots also show the median values for the rate of evolution estimated individually for Acrodonta (right) and Iguanidae (left). (B) Nucleotide composition of snakes and iguanians differs systematically from that of the remaining squamates. Values correspond to the average percentage of AT ± 1 standard deviation. (C) Clustering of snakes and iguanians (grey dot) due to similar patterns of AT skewness. The tree represents a hierarchical clustering dendrogram, estimated using Euclidean distances of AT skewness per gene.
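
The quantities named in panels B and C can be sketched as follows. This is a minimal illustration, assuming AT skew is defined in the common way as (A - T) / (A + T) and that each clade is summarized by a vector of per-gene skew values; the caption does not state the exact definition used, and the function names and example values are hypothetical.

```python
import math


def at_percentage(seq):
    """Percentage of A and T among the unambiguous bases of a sequence."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (counts["A"] + counts["T"]) / total


def at_skew(seq):
    """AT skew, assumed here to be (A - T) / (A + T)."""
    seq = seq.upper()
    a, t = seq.count("A"), seq.count("T")
    if a + t == 0:
        raise ValueError("sequence contains no A or T")
    return (a - t) / (a + t)


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors of per-gene values."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Hypothetical per-gene skew vectors for two clades (not data from the study).
clade_x = [at_skew("AAAT"), at_skew("AATT")]
clade_y = [at_skew("ATTT"), at_skew("AATT")]
print(at_percentage("AATTGGCC"), euclidean(clade_x, clade_y))
```

A matrix of such pairwise distances is the usual input to a hierarchical clustering routine, which would then produce a dendrogram of the kind shown in panel C.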

Fig 5.

Comparison of the PI profiles for the protein-coding (same as in Fig 3) and UCE datasets with morphology.

Morphology evolves at a slower rate, leading to a PI-profile peak at around 171.5 Ma, an age markedly older than that of either molecular dataset. The rate of decay is also less steep for the morphological data. Thus, morphology accumulates noise at a much slower pace, potentially retaining more phylogenetic signal to resolve ancient and short branches. The height of the profiles is standardized to emphasize their temporal dynamics; when calculated without standardization and measured on a per-character basis, peak informativeness is 75% lower for morphology, and 55% lower for UCEs, than the peak for protein-coding data. This standardization is applied in recognition of the fact that morphological characters are preselected to be informative during the timeframe under study. Their absolute informativeness therefore cannot be compared directly, whereas the shape of their informativeness profile, which indicates when their rate of change makes them useful for phylogenetic inference, remains of interest.
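
The link between a slower rate and an older peak can be made explicit under one assumption: that the PI profiles follow Townsend's (2007) phylogenetic informativeness, which the caption does not state. In that formulation, a single character evolving at rate \lambda has the profile below, and a dataset's profile is the sum over its characters.

```latex
\rho(T;\lambda) = 16\,\lambda^{2}\,T\,e^{-4\lambda T},
\qquad
\frac{\partial \rho}{\partial T}
  = 16\,\lambda^{2}\,e^{-4\lambda T}\,(1 - 4\lambda T) = 0
\;\Longrightarrow\;
T_{\text{peak}} = \frac{1}{4\lambda}
```

Halving the rate therefore doubles the age at which a character is most informative. As a rough illustration only, a single character peaking at 171.5 Ma would have \lambda = 1 / (4 \times 171.5) \approx 0.0015 changes per million years; the peak of a whole dataset depends on its full distribution of rates, so this figure should not be read as an estimate for the morphological dataset.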
