The authors have declared that no competing interests exist.
Conceived and designed the experiments: NZ EAZ LW. Performed the experiments: NZ. Analyzed the data: NZ. Contributed reagents/materials/analysis tools: NZ JW. Wrote the paper: NZ EAZ JW.
Vitaceae is well-known for having one of the most economically important fruits, i.e., the grape (
Vitaceae is an economically important plant family, containing the fruit species, the grape (
Molecular phylogenetic analyses of Vitaceae initially used several plastid genes, such as
Wen
It has been commonly reported that the evolutionary history of nuclear genes and that of organellar genes may be different due to incomplete lineage sorting [
With the recent rapid development of next-generation sequencing (NGS) tools, it has become efficient and cost-effective to obtain genomic sequence data and mine markers for plant phylogenetic analyses [
No specific permits were required for the collection of samples as they were all grown in the greenhouse, which complied with all relevant regulations. None of the samples represents endangered or protected species (
All voucher specimens were deposited at the United States National Herbarium (US).
| Species | Accession number | Data size (Gb) | Number of reads | Coverage of plastome | Plastome size (bp) |
|---|---|---|---|---|---|
| | | 3.52 | 23,496,856 | 1,689 | 160,964 |
| | | 3.11 | 20,763,492 | 1,671 | 161,011 |
| | NC_007957 | N/A | N/A | N/A | 160,928 |
| | | 2.12 | 14,118,242 | 1,178 | 161,275 |
| | | 2.95 | 19,648,540 | 773 | 155,700 |
| | | 2.97 | 19,825,858 | 880 | 155,686 |
| | | 2.70 | 17,988,788 | 679 | 161,640 |
| | | 2.80 | 18,672,228 | 1,133 | 161,467 |
| | | 3.12 | 20,780,600 | 1,064 | 156,880 |
| | | 2.57 | 17,137,048 | 161 | 160,089 |
| | | 2.67 | 17,809,412 | 602 | 160,232 |
| | | 3.01 | 20,078,184 | 142 | 159,986 |
| | | 2.71 | 18,063,906 | 829 | 158,240 |
| | | 2.31 | 15,375,710 | 480 | 159,245 |
| | | 2.15 | 14,353,720 | 136 | 158,800 |
| | | 2.17 | 14,450,758 | 724 | 159,196 |
| | | 2.47 | 16,487,240 | 400 | 159,495 |
| | | 2.35 | 15,641,708 | 2,006 | 160,220 |
| | | 1.93 | 12,848,572 | 995 | 159,928 |
| | | 2.80 | 18,638,058 | 307 | 160,390 |
| | | 3.11 | 20,737,692 | 953 | 158,713 |
| | | 3.08 | 20,537,768 | 2,309 | 160,382 |
| | | 2.84 | 18,913,388 | 595 | 161,706 |
| | | 3.62 | 24,125,332 | 1,883 | 162,649 |
| | | 2.93 | 19,539,876 | 879 | 161,955 |
| | | 2.51 | 16,705,996 | 681 | 160,619 |
| | | 3.23 | 21,529,740 | 713 | 162,625 |
| | | 2.45 | 16,356,482 | 254 | 160,555 |
*: small gaps were not bridged for this plastid genome
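As a quick sanity check, the per-sample data sizes in the table can be summed and compared with the roughly 74 Gb total reported in the Results. The sketch below is illustrative only: the list is copied from the table, the helper `bases_per_read` is a hypothetical name, and it assumes 1 Gb means 10^9 bases, which the text does not state.

```python
# Data sizes (Gb) for the 27 newly sequenced samples, copied from the table.
# The reference row (NC_007957) has no sequencing data and is left out.
DATA_SIZE_GB = [
    3.52, 3.11, 2.12, 2.95, 2.97, 2.70, 2.80, 3.12, 2.57,
    2.67, 3.01, 2.71, 2.31, 2.15, 2.17, 2.47, 2.35, 1.93,
    2.80, 3.11, 3.08, 2.84, 3.62, 2.93, 2.51, 3.23, 2.45,
]

total_gb = round(sum(DATA_SIZE_GB), 2)
smallest_gb = min(DATA_SIZE_GB)
largest_gb = max(DATA_SIZE_GB)


def bases_per_read(size_gb: float, n_reads: int) -> float:
    """Average number of bases per read (assumption: 1 Gb = 1e9 bases)."""
    return size_gb * 1e9 / n_reads


if __name__ == "__main__":
    print(f"samples: {len(DATA_SIZE_GB)}, total: {total_gb} Gb")
    print(f"range: {smallest_gb} to {largest_gb} Gb")
    # Smallest and largest samples, read counts taken from the table.
    print(round(bases_per_read(1.93, 12_848_572), 1))
    print(round(bases_per_read(3.62, 24_125_332), 1))
```

Under that assumption, both the smallest and the largest samples work out to about 150 bases per read. Because the data sizes are rounded to two decimals, this is only an approximation.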
To cover the major lineages of the grape family, 27 species were sampled in this study (
The raw data obtained from the GSAF was filtered using Trimmomatic version 0.32 [
Goremykin
In total, the plastomes of 28 species were aligned using MAFFT [
Because the sequences of plastid tRNA and rRNA are highly conserved, we excluded them from further analyses. In addition, sequences of only one copy of the inverted repeat (IR) region were included in the analyses because gene sequences of the two IR copies are completely or nearly identical. With these exclusions, the data set comprised 79 protein-coding genes (
For genes of mitochondrial origin, we selected and concatenated 16 regions (
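The concatenation step described for the plastid and mitochondrial data sets can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `concatenate_alignments`, the toy gene and taxon names, and the choice to pad missing taxa with gaps are all assumptions. The partition lines follow the `DNA, name = start-end` form that RAxML accepts for partitioned analyses.

```python
from typing import Dict, List, Tuple


def concatenate_alignments(
    genes: Dict[str, Dict[str, str]]
) -> Tuple[Dict[str, str], List[str]]:
    """Concatenate per-gene alignments into one supermatrix.

    `genes` maps gene name -> {taxon: aligned sequence}.
    Taxa absent from a gene are padded with '-' (an assumption).
    Returns the supermatrix and RAxML-style partition lines.
    """
    taxa = sorted({taxon for aln in genes.values() for taxon in aln})
    pieces: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[str] = []
    start = 1
    for name, aln in genes.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{name}: sequences must share one aligned length")
        length = lengths.pop()
        for taxon in taxa:
            pieces[taxon].append(aln.get(taxon, "-" * length))
        partitions.append(f"DNA, {name} = {start}-{start + length - 1}")
        start += length
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, partitions


if __name__ == "__main__":
    # Toy input; real alignments would be read from FASTA files.
    toy = {
        "geneA": {"sp1": "ATGC", "sp2": "ATGA"},
        "geneB": {"sp1": "GGT", "sp3": "GGA"},
    }
    supermatrix, parts = concatenate_alignments(toy)
    for taxon, seq in supermatrix.items():
        print(taxon, seq)
    print("\n".join(parts))
```

Keeping the partition boundaries alongside the matrix is what allows each gene to receive its own substitution model parameters in a partitioned analysis.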
Maximum likelihood (ML) and Bayesian trees were inferred with RAxML [
Overall, we obtained 74 Gb of data generated in a single Illumina lane for the 27 species. The minimum and maximum sizes of the NGS data were 1.93 Gb (12,848,572 reads) and 3.62 Gb (24,125,332 reads), for
For the three
Another strategy we used for the plastome assembly was a successive reference approach. After obtaining a complete plastome with high quality from one species (for example,
After extensive comparative assemblies, we obtained 17 complete plastomes with no gaps and ten plastomes with one to three gaps only (
Plastid genomes represent an important source for characters to be used in plant phylogenetic analyses. To test if the topology using plastomes was congruent with the one using the hundreds of nuclear genes reported by Wen
The tree was reconstructed using RAxML and MrBayes with gene partitioning, which resulted in the same topology. Numbers associated with the branches are bootstrap values and posterior probabilities, with an asterisk indicating a node with a bootstrap value of 100% and a posterior probability of 1.0.
Numbers associated with the branches are bootstrap values and posterior probabilities obtained using RAxML and MrBayes, respectively. The asterisk indicates that the bootstrap value is 100% and the posterior probability is 1.0 at the node.
We also tested the congruence of coding genes and noncoding regions in the plastomes, by analyzing the 79 protein-coding genes (
In addition, we explored whether genes with different evolutionary rates lead to congruent topologies. Evolutionary rates can be approximated by sequence identity, i.e., the percentage of identical sites among all sites of an alignment, with faster-evolving genes having lower identity scores. This strategy resulted in three data sets of 10, 26, and 56 genes, corresponding to identities of less than 80%, 85%, and 90%, respectively (
Matrix based on gene identity thresholds | Number of genes | Size of alignment (bp) | Parsimony informative sites |
---|---|---|---|
Less than 80% | 10 | 14,062 | 1,985 |
Less than 85% | 26 | 25,883 | 2,880 |
Less than 90% | 56 | 55,382 | 4,344 |
All 79 genes | 79 | 69,800 | 4,832 |
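The identity-based grouping and the parsimony-informative site counts in the table can be illustrated with a short sketch. It is not the authors' script: the function names are hypothetical, a column counts as identical only when every sequence carries the same character (gaps included, an assumption), and a site counts as parsimony-informative when at least two nucleotide states each occur in at least two sequences.

```python
from collections import Counter
from typing import Dict, List, Sequence


def percent_identity(seqs: Sequence[str]) -> float:
    """Percentage of alignment columns in which all sequences agree."""
    if not seqs or not seqs[0]:
        raise ValueError("alignment is empty")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must share one aligned length")
    columns = zip(*(s.upper() for s in seqs))
    identical = sum(1 for col in columns if len(set(col)) == 1)
    return 100.0 * identical / length


def parsimony_informative_sites(seqs: Sequence[str]) -> int:
    """Count columns with >= 2 nucleotide states each present >= 2 times."""
    count = 0
    for col in zip(*(s.upper() for s in seqs)):
        states = Counter(c for c in col if c in "ACGT")
        if sum(1 for n in states.values() if n >= 2) >= 2:
            count += 1
    return count


def bin_by_threshold(
    identities: Dict[str, float], thresholds: Sequence[float] = (80, 85, 90)
) -> Dict[float, List[str]]:
    """Nested gene sets: genes whose identity is below each threshold."""
    return {
        t: sorted(g for g, v in identities.items() if v < t)
        for t in thresholds
    }


if __name__ == "__main__":
    toy = ["ATGCA", "ATGCT", "ATGTT", "ATGTA"]  # toy alignment
    print(percent_identity(toy))             # 3 of 5 columns identical
    print(parsimony_informative_sites(toy))  # last two columns qualify
    print(bin_by_threshold({"g1": 78.0, "g2": 84.0, "g3": 95.0}))
```

The bins are nested by construction: every gene below 80% identity also falls below 85% and 90%, which matches the cumulative gene counts in the table (10, 26, 56).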
We used mitochondrial genes as another source of phylogenetic data for Vitaceae. Since the mitochondrial genome of the wine grape is large, i.e., 773,279 bp [
It has been reported extensively that phylogenies using genes from the three genomes (plastid, mitochondrial and nuclear) in plants may differ due to different evolutionary histories [
The position of
The
Although most studies using a genome skimming approach produced low-density coverage of the whole genome of a species, large data sets of chloroplast, mitochondrial, rDNA, or even other nuclear genes have been obtained [
Mitochondrial genes have been largely under-utilized in plant phylogenetic studies. Earlier plastome studies employed methods that isolated plastid DNA from fresh leaves via chloroplast enrichment using sucrose gradients [
Numbers associated with the branches are bootstrap values obtained using 10, 26, 56 and 79 genes, corresponding to sequence identities lower than 80, 85, 90 and 100%, respectively. The asterisk indicates a bootstrap value of 100%. If all four bootstrap values from the four matrices are 100%, a single diamond is placed at the node. The dashes indicate a relationship that is incongruent with the one obtained using the 79 protein-coding plastid genes (see text for details).
(EPS)
Numbers associated with the branches are bootstrap values. The asterisk indicates a bootstrap value of 100%.
(EPS)
This study was supported by a Peter Buck Postdoctoral Fellowship from the National Museum of Natural History, Smithsonian Institution, awarded to Ning Zhang, the Laboratories of Analytical Biology of the Smithsonian National Museum of Natural History (NMNH), a grant from the National Science Foundation (DEB 0743474 to J. Wen), a Smithsonian Endowment Grant, and the Small Grants Program of the National Museum of Natural History.