Phylogeny and Divergence Times of Gymnosperms Inferred from Single-Copy Nuclear Genes

Phylogenetic reconstruction is fundamental to study evolutionary biology and historical biogeography. However, there was not a molecular phylogeny of gymnosperms represented by extensive sampling at the genus level, and most published phylogenies of this group were constructed based on cytoplasmic DNA markers and/or the multi-copy nuclear ribosomal DNA. In this study, we use LFY and NLY, two single-copy nuclear genes that originated from an ancient gene duplication in the ancestor of seed plants, to reconstruct the phylogeny and estimate divergence times of gymnosperms based on a complete sampling of extant genera. The results indicate that the combined LFY and NLY coding sequences can resolve interfamilial relationships of gymnosperms and intergeneric relationships of most families. Moreover, the addition of intron sequences can improve the resolution in Podocarpaceae but not in cycads, although divergence times of the cycad genera are similar to or longer than those of the Podocarpaceae genera. Our study strongly supports cycads as the basal-most lineage of gymnosperms rather than sister to Ginkgoaceae, and a sister relationship between Podocarpaceae and Araucariaceae and between Cephalotaxaceae-Taxaceae and Cupressaceae. In addition, intergeneric relationships of some families that were controversial, and the relationships between Taxaceae and Cephalotaxaceae and between conifers and Gnetales are discussed based on the nuclear gene evidence. The molecular dating analysis suggests that drastic extinctions occurred in the early evolution of gymnosperms, and extant coniferous genera in the Northern Hemisphere are older than those in the Southern Hemisphere on average. This study provides an evolutionary framework for future studies on gymnosperms.


Introduction
A solid organismal phylogeny is fundamental to study evolutionary biology and historical biogeography. In recent years, the angiosperm phylogeny group (APG) III system has provided an evolutionary framework for studying angiosperms [1]. However, phylogenetic relationships of the main lineages of gymnosperms, either classified into four subclasses (Cycadidae, Ginkgoidae, Gnetidae and Pinidae) by Christenhusz et al. [2] or into the widely accepted five clades (cycads, ginkgos, cupressophytes, Pinaceae and gnetophytes), are still in hot debate. Gymnosperms, which have been resolved as the sister group of angiosperms by increasing evidence from morphological, molecular phylogenetic and evolutionary developmental studies [3][4][5][6][7][8][9][10][11][12][13], bear important information of seed-plant evolution, and represent an important link in the evolution of many gene families and biological pathways. Therefore, a better understanding of evolutionary relationships within gymnosperms can also help us to interpret the evolution of seed plants, and even molecular evolution in land plants.
Gymnosperms have a rich fossil record that is very useful for phylogenetic reconstruction, but this group suffered a dramatic extinction in the Cenozoic [14] and currently comprises 12 families, 83 genera, and only a little more than 1,000 species [2], which makes it difficult to resolve some interfamilial and intergeneric relationships (see review by Wang and Ran [13]). The early molecular phylogenetic studies of gymnosperms only sampled a small part of the recognized genera [4,5,[15][16][17], and in particular most published molecular phylogenies were constructed based on uniparentally inherited cytoplasmic DNA markers and/ or the multi-copy nuclear ribosomal DNA [4,5,[14][15][16]. Despite that 53 genera representing all extant main lineages of gymnosperms were studied in Ran et al. [16], the main focus of the study was the fast evolution of the mitochondrial gene rps3 in Conifer II (cupressophytes) and the underlying mechanisms. Some other studies of gymnosperms mainly focused on individual families or clades, such as conifers [17,18], Cupressaceae [9][10][11][12][13][14][15][16][17][18][19][20][21][22], Pinaceae [23] and cycads [24,25]. Although great progress has been made on understanding the phylogeny of gymnosperms in recent years, more interesting phylogenetic hypotheses have been proposed and hotly debated (see review by Wang and Ran [13]), like the phylogenetic position of Gnetales and the relationship between cycads and ginkgos [11,12,26,27]. Till now, there is not a molecular phylogeny of gymnosperms that is reconstructed based on a complete sampling of all extant genera, although this ancient and widespread plant group has huge ecological and economic value. Also, it would be interesting to know whether the phylogenetic relationships of gymnosperms inferred from cytoplasmic DNA are supported by evidence from the nuclear genome, given the complex inheritance patterns of organellar genes in this group [28]. Moreover, phylogenetic relationships within some lineages, such as Pinaceae [23,29], Podocarpaceae [30] and Zamiaceae [25,31,32], need to be further resolved.
Due to the fast development of genome sequencing technologies, phylogenomic analyses have been increasingly used in reconstructing the tree of life, and the efficiency of using multiple single-or low-copy nuclear genes for phylogenetic analysis has been widely recognized [33]. However, this is still difficult for gymnosperms with large and complex nuclear genomes characterized by long introns and numerous gene-like fragments [34]. For example, based on ESTs, Lee et al. [27] analyzed millions of amino acid sites from 150 species across land plants, and placed Gnetales as sister to the rest of the gymnosperms, but their dataset suffered greatly from missing data and poor alignment (our unpublished analysis). Nevertheless, Yang et al. [22] successfully used two sister nuclear genes LEAFY (LFY) and NEEDLY (NLY), which originated from an ancient gene duplication in the common ancestor of seed plants and encode transcription factors regulating the development of reproductive structures in gymnosperms [35,36], to reconstruct the phylogeny of Cupressaceae comprising all its 32 genera. They also confirmed that both LFY and NLY exist as single copy in gymnosperms, even in the polyploid species, and are excellent markers for studying the phylogeny and evolution of gymnosperms [22].
In this study, on the basis of Yang et al. [22], we use LFY and NLY gene sequences to reconstruct the phylogeny of gymnosperms based on a complete sampling of extant genera, in effort to provide an evolutionary framework for future studies on this important group. In addition, some controversial interfamilial and intergeneric relationships are resolved and discussed. Moreover, benefiting from the rich fossil record, we estimate the divergence times of different lineages, which would further help us understand the diversification history of gymnosperms.

Ethics statement
No specific permits were required for the sampling.

Taxon sampling
Ninety species representing all recognized genera of extant gymnosperms were sampled. Most genera were represented by one species each, since the coding sequences of LFY and NLY used to reconstruct the phylogeny of gymnosperms are very conserved among congeneric species. If using introns of the two genes, the sequences are unalignable between the main clades of gymnosperms [22], and most congeneric species do not form monophyletic groups, respectively, due to the wide interspecific sharing of alleles as reported in Pinus [37]. Therefore, the addition of more congeneric species can not significantly improve the resolution of intergeneric relationships of gymnosperms when using single-copy nuclear genes like LFY and NLY. Nevertheless, we sampled two species of Pinus to represent its two subgenera with an ancient divergence, and more species from the Juniperus-Cupressus-Callitropsis-Xanthocyparis-Hesperocyparis clade, in which the generic division is controversial [22]. The origins of materials, including the data downloaded from NCBI, are shown in Table  S1. DNA and RNA extraction, PCR and RT-PCR amplification, cloning and sequencing Total DNA was extracted from silica-gel dried leaves using either the modified CTAB method [38] or the DNAsecure Plant Kit (Tiangen, Beijing, China). Young leaves and reproductive organs of Ephedra equisetina were collected for total RNA extraction, which followed the modified Trizol method (Tiangen). The first-strand cDNA was produced using the 59 RACE system (Invitrogen) and the 39 RACE kit (Tiangen). Polymerase chain reaction (PCR) was conducted in a Veriti 96-Well Thermal Cycler (Applied Biosystems, Foster City, CA, USA) or an Eppendorf Mastercycler (Eppendorf Scintific, Westbury, NY, USA), in a volume of 25 ml containing 50-200 ng of DNA or cDNA template, 6.25 pmol of each primer, 0.2 mM of each dNTP, 2 mM MgCl 2 , and 0.75 U of ExTaq DNA polymerase (Takara Biotechnology, CO., Ltd. Dalian, China). PCR cycles were as follows: one cycle of 4 min at 94uC, four cycles of 1 min at 94uC, 30 s at 55-58uC, and 1.5-6.0 min at 72uC, followed by 32 cycles of 30 s at 94uC, 30 s at 53-55uC and 1.5-6.0 min at 72uC, with a final extension step for 10 min at 72uC.
After separation by 1.5% agarose gel electrophoresis, the PCR products were purified using the TIANgel Midi Purification Kit (Tiangen) and identified by direct sequencing with the PCR primers. Then, the correct PCR products were cloned with the pGEM-T Easy Vector System II (Promega, Madison, USA). Ten clones with the correct insertion, confirmed by EcoR I digestion, were picked for each species and screened for variation by sequencing with T7 primer. All distinct clones were further sequenced using SP6 and internal primers. Sequencing was performed using the BigDye Terminator v3.1 Cycle Sequencing Kit., and the sequencing products were separated on a 96-capillary 3730XL DNA analyzer (Applied Biosystems). All newly sequenced LFY and NLY genes, totaling 104 sequences, are deposited in NCBI under GenBank accession numbers KF377856-KF377901, KF377904-KF377918 and KF377921-KF377963 (Table S1). The primers used for amplifying and sequencing the LFY and NLY genes are shown in Table S2.

DNA sequence analysis
Sequence alignments were generated with CLUSTAL X [39] and manually refined. The variable sites and variability of conspecific clones were calculated using MEGA5 [40] and BioEdit v7.2.0 [41], respectively. Introns of the two nuclear genes could not be reliably aligned among distantly related gymnospermous families, and thus were excluded when constructing the entire phylogeny of gymnosperms. However, some intron regions are relatively conserved and alignable within cycads and Podocarpaceae, respectively, and thus were included in the alignments to infer the intergeneric relationships of these groups. The aligned sequences were further trimmed using the Gblocks server (http:// molevol.cmima.csic.es/castresana/Gblocks_server.html).
We used the software DAMBE [42] to test substitution saturation for the two datasets LFY and NLY, and the results showed that none of them was substitutionally saturated. To determine whether the two gene datasets can be combined, we checked variation of clones in each species, and found that many species did not show clone polymorphism of LFY and NLY and no more than two distinct clones occurred in the same individual. In particular, the conspecific clones showed a high sequence similarity of over 95%. Then, we tried to conduct separate phylogenetic analyses for LFY and NLY that included all distinct clones, and the results showed that conspecific clones grouped together except two LFY clones from the tetraploid species Fitzroya cupressoides that were discussed in Yang et al. [22]. Therefore, we randomly selected one clone from each species for the further analyses. The incongruence length difference test (ILD) [43], implemented in PAUP * 4.0b10 [44], CONCATERPILLAR (a hierarchical likelihood ratio test) [45], and CADM (a test of congruence among distance matrices) [46] were performed to assess congruence between different datasets. According to the three tests, no significant incongruence existed between LFY and NLY (Table 1), so we combined the two genes for phylogenetic analysis.

Phylogenetic analysis
Initially, we used the LFY + NLY coding sequences (CDS) and the 1 st +2 nd codon positions, respectively, to reconstruct the phylogeny of all sampled gymnosperms. The fern Angiopteris lygodiifolia was used as outgroup for two reasons. First, as mentioned in the introduction, the two nuclear genes of gymnosperms originated from a duplication event in the common ancestor of seed plants, and the NLY gene was lost in angiosperms. Thus, the LFY gene of ferns may represent an ancestral state of the two genes. Second, the LFY gene sequence cannot be reliably aligned between gymnosperms and angiosperms, although a sister relationship between the two groups is supported by most recent studies (see review by Wang and Ran [13]). The results showed that the phylogenetic trees generated from different methods all supported cycads as a monophyletic group and the basal-most clade of gymnosperms (Fig. S1). To avoid long-branch attraction (LBA) artifacts, the phylogeny of gymnosperms was further reconstructed using cycads as functional outgroups. In addition, to better resolve the intergeneric relationships within cycads and Podocarpaceae that were controversial, we conducted separate phylogenetic analyses for the two lineages with combined LFY + NLY sequences, and compared gene trees generated from CDS and CDS+intron, respectively. The sister groups were chosen as outgroups, including Ginkgo biloba for cycads [26], as well as Araucaria heterophylla and Agathis robusa for Podocarpaceae [22]. When introns were included, the sequences could not be aligned among different gymnospermous families, and therefore the generated trees were not rooted or rooted with a functional outgroup, such as Cycas for cycads. The details of all datasets used for phylogenetic analyses are shown in Table 2. The trees and alignments are deposited in TreeBase (number S16207).
Phylogenetic relationships were reconstructed using maximum parsimony (MP), maximum likelihood (ML) and Bayesian inference (BI), respectively. The MP analyses were implemented in PAUP * 4.0b10 [44], using heuristic searches with 1000 random addition sequence replicates, starting trees obtained via stepwise addition, tree-bisection-reconnection (TBR) branch swapping, MulTrees and Collapse options in effect, and a maximum of 2000 trees saved for each replicate. Robustness of the nodes (50% majority-rule consensus) was tested by the bootstrap analysis [47] using 1000 replicates with the same settings as above. The evolutionary models for the ML and BI analyses were optimized in jModeltest 2.0 [48] and MrModeltest 2.3 [49], using Akaike Information Criterion (AIC), respectively. The best models for analyses are shown in Table 2. The ML analyses were carried out in PHYML version 2.4.4 [50] with a BIONJ tree as a starting point, and support values for the nodes were calculated based on 100 bootstrap replicates. The Bayesian inference was performed with MrBayes 3.1.2 [51]. One cold and three heated Markov chain Monte Carlo chains were run for 10,000,000 generations with random initial trees, and every 1000 generations were sampled. The first 20% of the samples were discarded as burn-in and a 50% majority-rule consensus tree was generated based on the trees sampled after generation 2,000,000.
The Shimodaira-Hasegawa test (SH test) [52] and the Kishino-Hasegawa test (KH test) [53], implemented in PAUP * 4.0b10, were used to test alternative phylogenetic hypotheses for the deep lineages with controversial phylogenetic positions. The different positions of three taxa, including Ginkgoaceae (sister to cycads or conifers + Gnetales), Gnetales (sister to conifers, Conifer II, Pinaceae, or other gymnosperms), and Sciadopityaceae (sister to Cupressaceae + Taxaceae + Cephalotaxaceae or Araucariaceae + Podocarpaceae), were compared. Alternative tree topologies were generated in PhyML 2.4.4 [50], and the tree files were run in PAUP to calculate the p-value for each topology.

Divergence time estimation
Based on the LFY + NLY coding sequences, the divergence times of gymnosperms were estimated using the Markov chain Monte Carlo (MCMC) method, which was implemented in BEAST v1.7.5 [54], under an uncorrelated lognormal-relaxed clock model of rate variation among lineages. The topology was constrained to reflect the ML tree, and a GTR+I+G substitution model was used. Mean substitution rates were allowed to vary. Sauquet et al. [55] suggested that more age constraints could lead to improved time estimates, but risky age constraints might strongly influence estimated ages. Hence, we incorporated 11 fossil constraints that were widely recognized and used in previous molecular dating of gymnosperms or seed plants [18,22,56], and nearly each main lineage of gymnosperms was calibrated by at least one fossil record (For details, see Table S3).
For the most recent common ancestor (MRCA) of gymnosperms (A), a minimum age of 306.2 Ma was set based on Cordaixylon iowensis, the oldest cordaitean coniferophyte found in the Laddsdale Coals (Cherokee Group, Desmoinesian Series; 307.261.0 Ma) near What Cheer of Iowa, and a maximum age of 366. 8 Ma was set based on the well-documented first appearance of seeds (in the form of preovules) in the Upper Fammenian (Upper Devonian) VCo Spore Biozone [57][58][59]. In cycads, the stem age of Lepidozamia (B) was constrained to a minimum age of 33.9 Ma based on the fossil of Lepidozamia leaves from the Eocene of Australia, which possesses cuticular characters that are unique  99 Ma, Glyptostrobus in Cretaceous) [69,70], Diselma-Fitzroya-Widdringtonia (J, 95 Ma, Widdringtonia in Cretaceous) [71], and Juniperus-Cupressus-Hesperocyparis (K, 33.9 Ma, Juniperus in the Eocene/Oligocene boundary) [72]. Since the age estimates by BEAST are usually older than those by PL (penalized likelihood) and the ages estimated with lognormal priors are slightly younger than those estimated with either uniform or exponential priors, Sauquet et al. [55] suggested that using lognormal priors can decrease the uncertainty in age estimates. Therefore, in this study, all fossil constraints were given lognormal prior distributions in the BEAST estimate. For the root constraint, we used a stdev of 0.5, a prior mean of 3.6, and an offset of 290.7 Ma. For a better comparability of our results with previous divergence time estimates of gymnosperms, the other constraints were set following Leslie et al. [18] and Yang et al. [22]. The minimum age was set by the age of fossil, with a 95% confidence interval of the probability distribution extending 20 or 40 million years earlier than this minimum age, since the test by Leslie et al. [18] found that the fossil calibrations associated with the two prior age distributions led to very similar divergence time estimates. We ran four independent MCMC runs of 100 million generations, sampling every 2,500 generations. Tracer v1.5 was used to check convergence of the chains to the stationary distribution, ensuring the Effective Sample Size (ESS) .200. The first 20% of the generations were discarded as burn-in and trees were summarized with TreeAnnotator. The final tree and divergence times were visualized using FigTree v1.4.0.

Sequence characterization
In this study, we cloned and sequenced the LFY and NLY genes from 41 genera of 7 families (Table S1). These new data combined with the sequences downloaded from GenBank (mostly reported in Yang et al. [22]) completely represented all extant genera of gymnosperms. In Parasitaxus usta, the only parasitic conifer, we only got a pseudogene of NLY, in which several indels in the second exon led to an ORF shift. It is interesting that, by RT-PCR, we obtained cDNA sequences of both LFY and NLY genes from Ephedra equisetina and the LFY gene had two clone types that differed by a 9-bp deletion.
Both LFY and NLY sequences amplified from genomic DNA comprised three exons and two introns, and almost covered the full length of the two genes. The exon length was conserved, totally about 1000 bp, but the intron length varied greatly among different groups. A long repeat occurred in the first intron of the NLY gene of four Taxaceae genera (Pseudotaxus, Austrotaxus, Amentotaxus and Torreya), making it difficult to sequence the full length of the gene. Also, the first NLY intron of Cathaya and Pseudotsuga, two genera of the pine family, was difficult to sequence due to long length or complex structures. The detailed information of the sequence alignments for phylogenetic analyses, including sequence lengths and numbers of variable and parsimony-informative sites, is shown in Table 2.

Phylogenetic analysis
Since the MP analysis is more easily affected by long branch attraction (LBA) than the ML and BI analyses [73][74][75], we did not show the MP trees in this study. As mentioned earlier, when Angiopteris lygodiifolia was used as outgroup, all phylogenetic trees generated supported cycads as a monophyletic and basalmost group of gymnosperms, followed by Ginkgo (see the ML and BI trees in Fig. S1). When cycads were used as functional outgroups, the ML and BI trees generated from combined LFY and NLY CDS were topologically identical to each other, except for some branches with low statistical support. In the ML tree ( Fig. 1), Ginkgoaceae was sister to the remaining gymnosperms excluding cycads, and Pinaceae was sister to a clade that was further divided into two sister subclades, i.e., Gnetales and conifer II (Cupressophytes). The conifer II was split into two lineages. One consisted of Sciadopityaceae, Podocarpaceae and Araucariaceae, and Sciadopityaceae was weakly supported to be sister to Podocarpaceae-Araucariaceae. Within the other lineage, Cephalotaxaceae was embedded in Taxaceae, and the two families formed a monophyletic group sister to Cupressaceae. In addition, this nuclear gene tree provided a relatively good resolution for intergeneric relationships in some families such as Pinaceae and Cupressaceae.
Although the trees generated with different rooting or from different codon positions (all CDS vs. 1 st +2 nd codons) were similar in topology, they differed in the positions of Gnetales and Sciadopityaceae (Fig. 2). For instance, in the phylogenetic tree generated from the first and second codon positions and rooted with cycads (Fig. 2D), Gnetales was weakly supported to be sister to Pinaceae, and Sciadopityaceae sister to a well-supported clade containing Cupressaceae and Taxaceae-Cephalotaxaceae. Moreover, many intra-familial relationships were poorly resolved (tree not shown), perhaps due to the declined phylogenetic signals caused by the removal of the third codon positions.
Since the phylogenetic positions of some genera of cycads and Podocarpaceae were controversial in previous studies [14,24,[30][31][32][76][77][78][79], here we reconstructed internal relationships of the two groups, respectively. When only the LFY and NLY CDS was used, the phylogenetic signals were insufficient to resolve some intergeneric relationships (Figs. 3, 4), therefore we added the conserved intron regions of the two genes into analysis. For cycads, the addition of introns neither changed the tree topology nor greatly improved the resolution (Fig. 3), and the generated trees suggested a basal position of Dioon in Zamiaceae, a sister relationship between Zamia and Microcycas, and a close relationship among Encephalartos, Lepidozamia and Macrozamia. However, the resolution of internal relationships of Podocarpaceae was improved by adding intron sequences, with high support values for most nodes (Fig. 4). A large clade was strongly supported and well resolved, containing Microcachrys, Saxegothaea, Pherosphaera, Acmopyle, Dacrycarpus, Dacrydium, Falcatifolium, Afrocarpus, Podocarpus, Nageia and Retrophyllum. Within the clade, there existed two monophyletic sister groups. One was the 'dacrydioid' group comprising Dacrycarpus, Dacrydium and Falcatifolium, and the other was the 'podocarpoid' group including the Retrophyllum-Nageia subclade and the Afrocarpus-Podocarpus subclade. In addition, a close relationship among Manoao, Lagarostrobos and Parasitaxus was revealed (Fig. 4).
The results of the SH and KH tests are shown in Table S4. The trees placing Ginkgoaceae with conifers + Gnetales were better than the trees placing the family sister to cycads, but the trees placing Sciadopityaceae with Podocarpaceae + Araucariaceae were not significantly different from the trees placing Sciadopityaceae sister to Cupressaceae + Taxaceae + Cephalotaxaceae. The sister relationship between Gnetales and the other gymno- Phylogeny and Divergence Times of Gymnosperms PLOS ONE | www.plosone.org sperms was rejected by both SH and KH tests for the CDS dataset, and by the KH test for the dataset of the 1 st +2 nd codon positions. In addition, the topology placing Gnetales sister to conifers was rejected by the KH test for the CDS dataset. There was not significant difference in ln score between the other two topologies (Gnetales sister to Conifer II or Pinaceae).

Divergence time estimation
The divergence time estimation based on combined LFY and NLY CDS suggested a Triassic-Jurassic origin of the crown group for most families (Fig. 5). The mean ages and 95% HPDs are shown in Table S5. The most recent common ancestor (MRCA) of cycads was dated to the Middle Jurassic (158. 1 Ma), and that of Pinaceae to the Lower Triassic (198. 4 Ma). The divergence time between Cupressaceae and Taxaceae s.l. was close to that between Podocarpaceae and Araucariaceae, i.e., in the Late Triassic to the Early Jurassic. Most gymnosperm genera originated in the Cretaceous to the Cenozoic (Fig. 5).

Evolution and phylogenetic utility of the LFY and NLY genes in gymnosperms
Our study indicates that both LFY and NLY genes occur in all extant genera of gymnosperms except that NLY has not been found in Gnetum (Table S1). Frohlich and Parker [80] found that the LFY-NLY gene pair originated from a duplication event in the common ancestor of seed plants, and then both paralogous genes were remained in gymnosperms while NLY was lost in angiosperms. Using this information of gene evolution as strong evidence, they also proposed the mostly male theory of flower origin given the important role of LFY in flower development, although this theory is not supported by the study of Vazquez-Lobo et al. [35]. The study of Frohlich and Parker [80] only sampled a few species from gymnosperms, and supposed the existence of NLY in Gnetum. Frohlich [81] further mentioned the occurrence of both LFY and NLY in Ephedra (his unpublished observations), a close relative of Gnetum. Our present study has covered all extant gymnospermous genera, and the results suggest that each studied species harbors both LFY and NLY genes, except that NLY is still not found in Gnetum. Therefore, our study further supports that the LFY-NLY gene pair originated from an ancient gene duplication, at least before the divergence of gymnosperms. In addition, we have successfully obtained the LFY and NLY genes of Ephedra by RT-PCR and RACE. Moreover, the selection test suggests that both LFY and NLY genes have experienced strong purifying selection in gymnosperms (our unpublished data), implying their conserved functions.
Currently, functions of LFY and NLY in gymnosperms are still not very clear [35,[82][83][84][85][86][87], but it is clear that both of them exist as single-copy genes suitable for phylogenetic reconstruction in gymnosperms based on the present study and Yang et al. [22]. No more than two distinct clones were found in the same individual. Although the NLY sequence obtained from the parasitic Parasitaxus usta represents a pseudogene, its exon region shares a high similarity (over 90%) with that of other Podocarpaceae species, and thus still could be used in phylogenetic analysis. Actually, the LFY gene has been successfully utilized in phylogenetic and biogeographic studies of several gymnosperm groups, including Gnetum [88], Thuja [89], and Pseudotsuga [90]. In particular, the intergeneric relationships of Cupressaceae s.l. have been well resolved by the LFY and NLY genes [22].

Interfamilial relationships of gymnosperms
Our study provides the first molecular phylogeny of gymnosperms covering all extant families and genera, and the phylogeny is based on two single-copy nuclear genes LFY and NLY (Fig. 1). This nuclear gene phylogeny is topologically largely consistent with most previous phylogenies of gymnosperms constructed based on cytoplasmic and/or nuclear ribosomal DNA [4,14,16]. That is, the cycads diverged first, followed by Ginkgoaceae, and then conifers plus Gnetales. Within conifers, Podocarpaceae is sister to Araucariaceae, and Cephalotaxaceae-Taxaceae sister to Cupressaceae (Fig. 2). However, we did not found a sister relationship between cycads and Ginkgoaceae as suggested by chloroplast phylogenomic analyses [12,26] as well as genome-scale nuclear and plastid data [11].
The present study seems to supports the monophyly of Taxaceae s.l. that includes Cephalotaxus (Figs. 1, S1), which is consistent with the study of Leslie et al. [18] based on rbcL, matK, 18S and PHYP. To confirm whether the topology is really constant, we further conducted phylogenetic analyses for the Cephalotaxaceae-Taxaceae lineage using two species (Taiwania cryptomerioides and Cunninghamia lanceolata) of its sister group Cupressaceae as outgroups. The results indicate that Cephalotaxus is strongly supported to be sister to Taxaceae based on either LFY or LFY + NLY CDS, but is nested within Taxaceae with a weak support based on NLY (Fig. 6). The inconsistent positions of Cephalotaxaceae in different analyses could be caused by LBA artifacts or insufficient resolution of the markers. Actually, the evolutionary relationship of Cephalotaxaceae and Taxaceae has been controversial for a long time. All molecular studies based on chloroplast and/or nuclear ribosomal DNA suggested a sister relationship between the two families [91][92][93], but many morphological studies supported the merge of them (see review by Ghimire and Heo [94]). A more broadly defined Taxaceae including Cephalotaxaceae has been suggested by Quinn et al. [95] based on rbcL and matK sequence analyses, and by Ghimire and Heo [94] based on a cladistic analysis of morphological characters. Also, in the new gymnosperm classification scheme of Christenhusz et al. [2], Cephalotaxaceae was merged into Taxaceae, and this taxonomic treatment has been adopted by Lang et al. [96] in the revision of Cephalotaxus. As discussed above, more studies are still needed to resolve the relationship between Cephalotaxaceae and Taxaceae.
The systematic position of Gnetales has been debated for several decades, which involves six main hypotheses (see reviews by Braukmann et al. [8] and Wang and Ran [13]), i.e., anthophyte, gnetales-other seed plants, gnetales-other gymnosperms, Gnetifer, Gnecup and Gnepine [4,5,8,9,80,[97][98][99][100][101]. The last three hypotheses all support a close relationship between Gnetales and conifers. In particular, the Gnepine hypothesis (Gnetales sister to Pinaceae) is supported by more and more molecular phylogenetic studies after eliminating bias in data analyses (see review by Wang and Ran [13]), despite the fact that the Gnecup hypothesis (Gnetales sister to conifer II or cupressophytes) is still supported by a couple  of recent phylogenomic studies using all chloroplast genes [9,12]. According to the present study, the Gnetales has a close relationship with conifers, although it has not been resolved whether the Gnecup or Gnepine hypothesis is correct. In the trees generated from combined LFY and NLY CDS, Gnetales is strongly supported as sister to conifer II ( Fig. 2A and 2C), which is corroborated by the SH and KH tests (Table S4). However, when excluding the third codon positions and using cycads as functional outgroups, the most popular Gnepine hypothesis is recovered with low support (Fig. 2D, Table S4). A similar phenomenon is also observed in Sciadopityaceae. When all CDS sequences are used, this family is moderately supported as sister to the Podocarpaceae-Araucariaceae clade ( Fig. 2A and 2C), but when excluding the third codon positions it is revealed as sister to the Taxaceae-Cephalotaxaceae-Cupressaceae clade ( Fig. 2B and 2D) as found in most previous studies [14,[16][17][18]22]. The topological conflicts on phylogenetic positions of Gnetales and Sciadopityaceae may be attributed to LBA artifacts that could occur when the fast-evolving third codon positions are included in analyses. Zhong et al. [9] also found that the LBA artifacts and parallel changes could mislead the phylogenetic placement of Gnetales when using chloroplast genome data, and the removal of fast-evolving genes can effectively alleviate the LBA artifacts, thereby recovering a sister relationship between Gnetales and Pinaceae.

Intergeneric relationships within gymnospermous families
The combined LFY and NLY CDS phylogeny provides a good resolution for intergeneric relationships within four families including Cupressaceae, Pinaceae, Taxaceae and Araucariaceae (Figs. 1, S1). The LFY + NLY phylogeny of Cupressaceae has been discussed in detail by Yang et al. [22]. For Pinaceae, all of the eleven genera form two strongly supported clades. One clade comprises Cedrus, Pseudolarix, and two pairs of sister genera, i.e., Nothotsuga-Tsuga and Keteleeria-Abies, while the other clade includes the sister genera Pseudotsuga and Larix, and the three closely related genera Pinus, Cathaya and Picea (Figs. 1, S1). The revealed intergeneric relationships are largely congruent with the finding of Wang et al. [23], and are generally consistent with the results of morphological and anatomical analyses (see review by Farjón [102]). However, Wang et al. [23] did not completely resolve the systematic position of Cedrus. According to the present study, Cedrus is sister to the Nothotsuga-Tsuga-Pseudolarix-Keteleeria-Abies clade (Figs. 1, S1), which is consistent with most recent molecular phylogenetic studies [18,29]. Moreover, like Wang et al. [23], our study supports the monotypic genus Cathaya as sister to Picea (Figs. 1, S1), rather than to Pinus as suggested by Lin et al. [29]. In Taxaceae, Torreya is sister to Amentotaxus, and Austrotaxus is closely related to the sister genera Pseudotaxus and Taxus (Figs. 1, 6, S1), corroborating previous studies [18,[91][92][93]. For Araucariaceae, the previous rbcL gene analysis suggested a basal position of Wollemia in the family [103]. However, the present study supports Wollemia as sister to Agathis (Figs. 1, S1), consistent with more recent studies [18,95,104,105].
The concatenated LFY and NLY CDS can not resolve some intergeneric relationships of cycads and Podocarpaceae very well (Figs. 1, S1). It is interesting that the addition of intron sequences can improve the resolution in Podocarpaceae but not in cycads (Figs. 3, 4), although divergence times of the cycad genera are similar to or longer than those of the Podocarpaceae genera (Fig. 5) [18,25]. Consistent with most previous studies [24,31,32,79], the phylogeny of cycads inferred from either CDS or CDS+Intron sequences of LFY and NLY supports the genus Dioon from tropical America as the basal-most lineage in Zamiaceae (Fig. 3), rather than sister to the Bowenia-Ceratozamia-Stangeria-Microcycas-Zamia clade in the PHYP tree constructed by Nagalingum et al. [25]. Actually, despite low support values in some clades, the present LFY + NLY gene tree is topologically very similar to the recently reconstructed phylogeny of cycads based on five single-copy nuclear genes [32]. For instance, the two genera Zamia and Microcycas, also from tropical America, have a sister relationship and form a clade sister to Stangeria, while the African Encephalartos and the Australian Lepidozamia form a clade sister to Macrozamia from Australia (Fig. 3). Moreover, our study also does not support the establishment of the family Boweniaceae or Stangeriaceae that was based on morphological analyses [106,107], since the two genera Bowenia and Stangeria are nested within Zamiaceae and do not form a monophyletic clade (Fig. 3), as found in most previous molecular phylogenetic analyses [24,25,31,79].
Compared to the CDS dataset, the CDS+Intron dataset provides a much better resolution for intergeneric relationships of Podocarpaceae (Fig. 4), a large family comprising 19 genera with a wide distribution in the tropics, especially in the Southern Hemisphere [108,109]. Our study strongly supports a large clade comprising 11 genera, of which the Australian Microcachrys and the South American Saxegothaea diverged first, followed by the two Australian genera Pherosphaera and Acmopyle, and then the three genera Dacrycarpus, Dacrydium and Falcatifolium (all distributed in Asia and Australia) forming the 'dacrydioid' group sister to the 'podocarpoid' group that include Retrophyllum, Nageia, Afrocarpus and Podocarpus. In addition, we found a close relationship among the three Australian genera Manoao, Lagarostrobos and Parasitaxus and a sister relationship between Prumnopitys and Sundacarpus (Fig. 4B). This phylogeny of Podocarpaceae constructed from nuclear genes is topologically highly consistent with those inferred from plastid DNA fragments [110] and from a combined analysis of nrITS1, NLY intron 2 and rbcL sequences as well as anatomical and morphological data [30]. However, our nuclear gene phylogeny strongly supports two pairs of sister genera Retrophyllum-Nageia and Afrocarpus-Podocarpus (Fig. 4B). The genus Phyllocladus is nested within Podocarpaceae (Fig. 4), and thus the family status of Phyllocladaceae is not supported.

Divergence times of gymnosperms
The divergence time estimation is very helpful to interpret the temporal evolution of organisms. Previous studies have provided divergence time estimates for different gymnospermous groups, such as Pinaceae [23], cycads [25], Podocarpaceae [110], Cupressaceae [21,22], and conifers [18]. However, only Crisp and Cook [14] estimated divergence times of gymnosperms as a whole using molecular clock, and in their study many extant genera were not sampled.
Our present study provides divergence time estimates for gymnosperms based on a sampling of all extant families and genera (Fig. 5). The estimated crown ages of some groups such as Pinaceae, cycads and Podocarpaceae are approaching to those reported in previous studies [18,25,110]. However, the estimated crown age of Cupressaceae and divergence times of most genera of this family are a little younger than those reported in Yang et al. [22] and Mao et al. [21]. This could be attributed to the discrepancy of different dating methods and delineation of different fossil calibrations.
Based on the molecular dating analysis, all of the extant five lineages of gymnosperms (cycads, ginkgos, cupressophytes, Pinaceae and gnetophytes) originated at least before 300 Ma (in the Carboniferous), but the crown ages of all families except Ginkgoaceae and Sciadopityaceae are younger than 200 Ma (Fig. 5), indicating that drastic extinctions occurred in the early evolution of gymnosperms, which might be caused by the two extreme cooling events in the Carboniferous and Triassic [111]. After 200 Ma, the divergence speed of genera is moderate when extinction is not considered (Fig. 7), although recent studies showed that the pulse of extinction and speciation in the Cenozoic, even in the late Tertiary, shaped today's species diversity of gymnosperms [14,25]. Leslie et al. [18] found that lineages of conifers that diversified mainly in the Southern Hemisphere show a significantly older distribution of divergence ages than their counterparts in the Northern Hemisphere. However, interestingly, we found that extant coniferous genera in the Northern Hemisphere are older than those in the Southern Hemisphere on average (Fig. 8A). In fact, if excluding the several genera that originated before 150 Ma, the distribution of divergence ages of the remaining genera is very similar between the two hemispheres (Fig. 8B). Of great interest is to investigate why more ancient genera survive in the Northern Hemisphere than in the Southern Hemisphere. Moreover, to get a more accurate estimation of the divergence times and a solid reconstruction of the evolutionary dynamics of gymnosperms, more nuclear genes or genome sequences should be used in future studies, and more reliable fossils are needed to be found. Supporting Information Figure S1 The ML and BI trees of gymnosperms constructed from combined LFY and NLY sequences. Numbers associated with branches are bootstrap percentages of ML higher than 50% and Bayesian posterior probabilities greater than 0.90, respectively. A, ML tree from the CDS sequences with Angiopteris lygodiifolia as outgroup; B, BI tree from the CDS sequences with Angiopteris lygodiifolia as outgroup; C, ML tree from the 1 st +2 nd codon positions with Angiopteris lygodiifolia as outgroup; D, BI tree from the 1 st +2 nd codon positions with Angiopteris lygodiifolia as outgroup; E, BI tree from the CDS sequences with cycads as functional outgroups; F, ML tree from the 1 st +2 nd codon positions with cycads as functional outgroups; G, BI tree from the 1 st +2 nd codon positions with cycads as functional outgroups. (PDF)

Table S3
The eleven calibration points used in divergence time estimation for gymnosperms. All constraints were given lognormal prior distributions, where the minimum age was set by the age of the fossil constraint and 95% confidence interval of the probability distribution extending 20 or 40 million years earlier than the minimum age. (DOC)   Phylogeny and Divergence Times of Gymnosperms