Next-Generation Mitogenomics: A Comparison of Approaches Applied to Caecilian Amphibian Phylogeny

doi:10.1371/journal.pone.0156757

Table 1.

Voucher specimens (codes refer to vouchers: RAN = RAN’s field numbers; UMMZ = University of Michigan Museum of Zoology, Ann Arbor; MVZ = Museum of Vertebrate Zoology, Berkeley) and associated mitogenome sequence information for the six nominal species of Seychelles caecilians (species of Grandisonia, Hypogeophis, Praslinia).

GenBank codes in bold were published previously. bp = base pairs; Av. Cov. = average read coverage across mitogenome. * = genome sequence not fully complete; (1) = voucher incorrectly identified as G. alternans by Zhang & Wake (2009: see San Mauro et al. 2014). ^# = specimen that was excluded from phylogenetic analysis due to the mitogenome sequence being substantially incomplete.


Table 2.

Size ranges used to partition the Illumina HiSeq dataset into a manageable size based on a sliding window analysis.

Position 0 refers to the start of the trnF(gaa) tRNA gene.


Table 3.

Summary information for mitogenome data partitions and their best-fit models.

All data are for nucleotides, except “Amino Acid”. CS = number of constant sites, PI = number of parsimony informative sites, CP1, 2, 3 = protein-coding codon position 1, 2 and 3.


Table 4.

Coverage data and total length of mitogenome sequences generated by different platforms.

Coverage data for each platform are reported as the number of sequence reads used, with the approximate number of bp in parentheses based on the mean read length (RL). The total lengths of reconstructed mitogenomes are reported under the MtL (mitogenome length in bp) column. Numbers in parentheses within the header row refer to mean RL for each platform.
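The arithmetic behind the approximate bp figures can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: the function names and the example numbers are hypothetical, and dividing by MtL to obtain an average coverage is an assumption about how a figure such as "Av. Cov." in Table 1 might be approximated.

```python
def approx_bases(num_reads, mean_read_length):
    """Approximate total bp sequenced: number of reads times mean read length (RL)."""
    return round(num_reads * mean_read_length)


def approx_mean_coverage(num_reads, mean_read_length, mitogenome_length):
    """Assumed estimate of average coverage: approximate bp divided by MtL."""
    return num_reads * mean_read_length / mitogenome_length


if __name__ == "__main__":
    # Hypothetical values for illustration only; they are not taken from Table 4.
    reads, mean_rl, mtl = 2000, 100, 16000
    print(approx_bases(reads, mean_rl))
    print(approx_mean_coverage(reads, mean_rl, mtl))
```

Because the bp figure is derived from a mean read length, it is only an approximation of the bases actually sequenced.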


Table 5.

Number of single base pairs (bp) that were incorrectly called in the three long-amplicon multiplexed mitogenome sequences, as inferred from consensus reads across the sequencing platform data.


Fig 1.

The four phylogenetic tree topologies inferred from the five data sets.

(a) Complete nucleotide data and protein-coding nucleotide data; (b) rRNA; (c) tRNA; (d) amino acids. In (a) numbers above branches are support for the complete nucleotide data and below for the protein-coding nucleotides (BI/ML). In (b) and (c) numbers above branches are for analyses with BI/ML. In (d) values above branches are Bayesian posterior probabilities for the unpartitioned CAT and CATGTR analyses run on PhyloBayes, and BI/ML support for the gene-partitioned dataset. Maximal support is indicated by a single * and support values below 0.5/50% (BI/ML) are indicated by “-” (or by collapsed branches in the PhyloBayes tree (d)). Symbols at terminals refer to genus: stars = Praslinia; squares = Hypogeophis; circles = Grandisonia. Colours refer to species: black = P. cooperi; red = H. rostratus; turquoise = H. brevis; brown = G. alternans; yellow = G. larvata; blue = G. sechellensis. All trees were rooted with Praslinia cooperi. Source trees and branch lengths are deposited online with the Natural History Museum data repository.


Fig 2.

The fifteen rooted trees for the four taxa used to assess taxon instability and their percentage frequency of occurrence in 1000 Bayesian or Bootstrap (LogDet) trees.

Taxa are abbreviated as follows: Grandisonia alternans (A), Hypogeophis brevis (B), Grandisonia larvata + Grandisonia sechellensis (LS) and Hypogeophis rostratus (R). Numbers below trees are support values for analyses of: all nucleotides / protein-coding nucleotides / tRNAs / rRNAs / amino acids / LogDet for all data / LogDet for protein-coding data. < = less than 1% support, — = zero support.
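The count of fifteen follows from the standard result that n taxa admit (2n − 3)!! rooted binary trees, which gives 3 × 5 = 15 for n = 4. The Python sketch below checks this by brute-force enumeration. The function names are hypothetical and the nested-tuple tree representation is chosen only for illustration; it is not the procedure used to tabulate tree frequencies in the study.

```python
from itertools import combinations


def count_rooted_trees(n):
    """Number of rooted binary trees on n labelled taxa: (2n - 3)!!."""
    count = 1
    for k in range(3, 2 * n - 2, 2):
        count *= k
    return count


def enumerate_rooted_trees(taxa):
    """List every rooted binary topology on the given taxa as nested tuples."""
    taxa = tuple(taxa)
    if len(taxa) == 1:
        return [taxa[0]]
    first, rest = taxa[0], taxa[1:]
    trees = []
    # Keep `first` in the left subtree so each unordered root split is visited once.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            left = (first,) + extra
            right = tuple(t for t in rest if t not in extra)
            for left_tree in enumerate_rooted_trees(left):
                for right_tree in enumerate_rooted_trees(right):
                    trees.append((left_tree, right_tree))
    return trees


if __name__ == "__main__":
    topologies = enumerate_rooted_trees(["A", "B", "LS", "R"])
    print(count_rooted_trees(4), len(topologies))  # 15 15
```

Treating G. larvata + G. sechellensis as the single terminal LS is what keeps the problem at four taxa; with five terminals the same formula gives 105 rooted trees.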


Table 6.

Summary of percentage of support for clades presented in Fig 2.

A = G. alternans, B = H. brevis, L = G. larvata + G. sechellensis, R = H. rostratus. — indicates zero support. Abbreviations in column 1 are as follows: BI = Bayesian Inference analysis; LD = LogDet analysis; All = complete nucleotide dataset; rRNA = rRNA dataset; tRNA = tRNA dataset; PC = protein-coding nucleotide dataset; AA = amino acid dataset.


Table 7.

Comparison of performance of five approaches for generating our mitogenome sequence data from eight samples of Seychelles caecilians.

Approximate relative ‘values’ depicted are * = low, ** = moderate, *** = high.
