Networks in a Large-Scale Phylogenetic Analysis: Reconstructing Evolutionary History of Asparagales (Lilianae) Based on Four Plastid Genes

Phylogenetic analysis aims to produce a bifurcating tree, which disregards conflicting signals and displays only those that are present in a large proportion of the data. However, any character (or tree) conflict in a dataset allows the exploration of support for various evolutionary hypotheses. Although data-display network approaches exist, biologists cannot easily and routinely use them to compute rooted phylogenetic networks on real datasets containing hundreds of taxa. Here, we constructed an original neighbour-net for a large dataset of Asparagales to highlight the aspects of the resulting network that will be important for interpreting phylogeny. The analyses were largely conducted with new data collected for the same loci as in previous studies, but from different species accessions and greater sampling in many cases than in published analyses. The network tree summarised the majority data pattern in the characters of plastid sequences before tree building, which largely confirmed the currently recognised phylogenetic relationships. Most conflicting signals are at the base of each group along the Asparagales backbone, which helps us to establish the expectancy and advance our understanding of some difficult taxa relationships and their phylogeny. The network method should play a greater role in phylogenetic analyses than it has in the past. To advance the understanding of evolutionary history of the largest order of monocots Asparagales, absolute diversification times were estimated for family-level clades using relaxed molecular clock analyses.


Introduction
The only figure in On the Origin of Species [1] is an evolutionary tree that reflects Darwin's vision of descent with modification from a common ancestor. Today, phylogenetic methods, or ''treethinking'' [2], form the foundation of inferences in evolutionary biology [3][4][5]. Bifurcating phylogenetic trees underlie our understanding of organismal evolution and are also proving instrumental in the development of a more robust classification system based on natural (evolutionary) relationships. Nevertheless, searches to determine ''the tree'' remain problematic, as they can often overlook conflicts in the dataset. Competing signals may arise from stochastic substitution processes, poorly fitting evolutionary models or the heuristic nature of many tree search algorithms. They may also be the result of hybridisation (including introgression), recombination, horizontal/lateral gene transfer, genome fusion, ancestral polymorphism/deep coalescence/incomplete lineage sorting and gene duplication-loss [6]. The detection of data conflicts, and the extent to which they affect analysis, becomes an important first step in phylogenetic analysis. Data-display networks may reveal reticulation patterns that are unsuspected in the data and that may have an important bearing on subsequent analyses and their interpretation. Unfortunately, this field is rather poorly developed at present [6,7], and no tools are available that biologists can easily and consistently use on real data [8].
A neighbour net [9] is a split network that visualises certain collections of splits that have been derived from a distance matrix. These splits are constructed in an iterative fashion using a criterion similar to that used in the neighbour-joining (NJ) algorithm for tree construction [6,10]. Morrison [6] reanalysed a dozen published datasets using split networks, highlighting aspects of the resulting network that could be important for interpretation of the phylogenetic tree and pointed out that the network method should play a greater role in phylogenetic analyses than it has in the past.
Asparagales is the largest order of monocots [11][12][13][14][15][16] with ca. 25,000-42,000 species (ca. 50% of monocots, or 10-15% of flowering plants), including important crop plants such as Allium, Asparagus and Vanilla, and a host of ornamentals such as irises, hyacinths and orchids [17]. The circumscription of Asparagales and the included families have undergone a series of changes in recent years. When the Angiosperm Phylogeny Group (APG) [18] was being formulated, numerous narrow circumscriptions for the families of Asparagales largely followed those of Dahlgren et al. [19], but it was noted (APG II, 2003) that broader circumscriptions were also possible, leading to a set of sensu lato (s.l.) families being proposed with the earlier set of sensu stricto families listed in brackets. In APG III [20], the number of families in Asparagales recognised fell from 26 [18] to 14 due to the elimination of these bracketed families. Furthermore, a set of subfamilies for the expanded asparagalean families was also published to be more manageable for teaching purposes and to facilitate communication among specialists [21]. A number of studies have sampled all/most families of Asparagales sensu APG [11,14,17,18,[22][23][24][25][26][27][28], which have generally clarified the relationships among the families within Asparagales. However, uncertainties remain in two parts of the Asparagales phylogenetic tree. First, the exact relationships of some small families (e.g. Boryaceae, Doryanthaceae, Ixioliriaceae) in lower Asparagales and Aphyllanthoideae, in higher Asparagales, remain unresolved [17,22,23]. Previous studies [17,22] found weak support for a sister relationship between Ixioliriaceae and Tecophilaeaceae, which in turn formed a polytomy or weakly supported sister group to Doryanthaceae. An analysis of morphological data, however, placed Doryanthes as sister to Iridaceae [24]. The position of Boryaceae also remains unclear relative to the rest of the families (except for the orchids) and the hypoxid clade [15,23]. The positions of all of these families require additional evidence to establish their interrelationships [15]. Fay et al. [22] demonstrated that Aphyllanthes (monotypic, Aphyllanthoideae) has a destabilising position within Asparagaceae s.l. Other studies found that incompatible patterns were produced when analyzing different genes [14,17]. The second problem, related to the extreme species richness, diverse morphology and complex taxonomic history of Asparagales, is that the sampling of taxa in previous studies has been limited, and many genera have not been included. Although it is clear that adding multigene sequences and sampling will produce a better hypothesis of evolutionary history, more incompatibilities could arise. Previous studies have demonstrated that bifurcating phylogenetic trees can be valuable tools for investigating the evolutionary history of Asparagales, but it is not possible to simultaneously display contradictory evolutionary signals on any such tree. Phylogenetic networks can provide a useful alternative means of analysis because they allow visualisation of competing evolutionary scenarios within a single figure [6,29]. Here, we used a phylogenetic network method, neighbour net, to reanalyze the evolutionary history of Asparagales using a new comprehensive sampling of taxa and genes. In addition, using our estimates of the time of origin, we discuss their possible evolutionary history to improve our understanding of the processes that have generated such high diversity on this branch of the tree of life.

Neighbour-net Pattern of the Data
To gain a better understanding how conflicting signals were contained in the datasets, we constructed a neighbour net for the combined matrix of the four plastid genes (Figure 1), in which indels were not considered as informative characters. The outgroup Pandanus consisting of two species (Pandanales), together with Commelinales and Liliales species, were included as they are closely related to Asparagales [26]. The centre of the neighbour net was slightly netted, implying that the data support many conflicting deep splits. Nonetheless, the clades identified appeared to be quite robust as 21 clades were generally recovered, as indicated by the colours and arc labelling in Figure 1. The neighbour net showed strong support for monophyletic Aspar-agales. Commelinales, Liliales and Pandanales formed a close clade as the outgroup of Asparagales. The network largely confirmed the current recognised phylogenetic relationships [14,22,28]. In addition, there were strongly supported splits (and clusters), corresponding largely to the well-supported clades in the topology of the combined tree obtained with our parsimony and Bayesian analyses (Figure 2), except Milla biflora, which netted with Orchidaceae. Furthermore, most of the difficult taxon, with conflict position or extremely low resolution from regular phylogenetic analyses, appeared in critical state on the network graph. For example, Orchidaceae competed with Boryaceae and Blandfordiaceae etc. to root of Asparagales in previously researches [12,28,[30][31][32].

Phylogenetic Relationships
The total aligned matrix had 6,862 characters with 3,122 potentially phylogenetically informative sites for the four plastid genes: 1,472 base pairs (bp) for atpB, 1,820 bp for matK, 2,234 bp for ndhF and 1,336 bp for rbcL. In total, 163 base pairs were excluded from the combined matrix (1-17, 1449-1472, 3292-3316, 5480-5560, 6847-6862 bp), either at the beginning or end of sequences or where alignment of the ndhF sequences was ambiguous. Of the included characters, the numbers of potentially parsimony informative characters were 499 (33.9%) for atpB, 1,123 (61.7%) for matK, 1,160 (34.7%) for ndhF and 437 (32.7%) for rbcL ( Table 2). The matK gene was the most variable among the four genes, but gave slightly fewer parsimony informative sites than ndhF due to the longer length of the latter. The rbcL gene was length-conserved with no gaps, and atpB had only few insertions/ deletions (indels), whereas matK and ndhF included a number of indels.
Parsimony analyses of the individual plastid genes gave similar topologies as expected because these genes are inherited on the same linkage group. Aphyllanthes L. has previously been discussed as a problem taxon because of its labile phylogenetic position according to the analyses by different genes [17,22]. As in previous analyses, we also performed analyses that excluded and included Aphyllanthes, which only affected position and support values in Asparagaceae s.l. Here we present the results found when Aphyllanthes was included.
The combined data Fitch analysis with equal weights (EW) produced 14,523 equally most-parsimonious trees of 24,168 steps, with a consistency index (CI, including autapomorphies) of 0.27 and a retention index (RI) of 0.75. With successive weights (SW), the number of equally most parsimonious trees was reduced to one (CI = 0.70, RI = 0.85). The SW tree is one of the trees found with Fitch weights. The Bayesian tree shows the PPs summarised from the set of recovered post-burn-in trees. The parameters of the GTR+I+G model used in this analysis are listed in Table 2. There was only one minor area of discordance between the maximum parsimony (MP) and Bayesian trees: the interrelationships among three families: Aphyllanthaceae, Themidaceae and Doryanthaceae.
Due to the similarity in topology of the strict consensus parsimonious tree and the Bayesian tree, the latter having higher resolution, only the Bayesian tree found in the combined analysis is shown in Figure 2. We report three kinds of support value: parsimony bootstrap percentages with EW, SW and PP for Bayesian analysis. Pandanales was the nominated outgroup in accordance with the results of previous studies [17,22]. Within Asparagales, SW analysis had more nodes with strong support than EW, and the PP offered strong support for most nodes on the phylogenetic tree ( Figure 2).

Divergence Time Estimation
The mean path lengths (MPL) clock tests [34] revealed significant deviations from clock-like behaviour at most nodes of the tree for Asparagales (clock tests: 265; accepted: 14; rejected: 251). Hence, we used BEAST [35], which implements a ''relaxed clock'' methodology that does not assume any correlation between rates (thus accounting for lineage-specific rate heterogeneity), to estimate ages and the phylogenetic tree simultaneously. At the same time, we also used PATHd8, with the mean path length method; this programme is faster for a large dataset and permits rate changes across the tree [34]. We obtained slight younger ages in the results using PATHd8 than using BEAST.
The BEAST analysis that treated fossil priors as lognormal distributions provided an older estimated age (102-143 Ma, data not presented) for crown group of Asparagales than that using an exponential distribution (93-101 Ma), as well as larger variances around age estimates, especially at the base of the tree (also see [36]). The topology showed good agreement with previous analyses of these data using Bayesian methods, with a few exceptions (Agavoideae, Scilloideae, Brodiaeoideae and Aphyllanthoideae present in some one clade but in different relatively position). The age estimates for crown and stem nodes are shown in Figure 3, with a chronogram calibrated against the geological timescale. Additional sampling and age estimates for families and subfamilies of Asparagales are summarised in Table 3.

The Network Reveals a Useful Pattern in Asparagales
The detection of data conflicts and the extent to which data conflicts will affect the data analysis becomes an important first step in a phylogenetic analysis [6]. Phylogenetic networks, such as the split graphs produced by the neighbour-net algorithm, give a broad overview of competing evolutionary scenarios within a dataset [37]. These methods have been successfully used to analyse multigene plastid datasets (e.g. ferns, [38]; Ranunculeae, [39]), nuclear ribosomal DNA; Acer, [40]), and microbial and fungal evolution [9,41,42]. They have also been used in the context of genome sequencing surveys [43,44]. However, the use of networks as a tool for large-scale phylogenetic research has rarely been reported in the scientific literature [6].
In this study, we used the phylogenetic network method neighbour net to analyse a larger-scale sampling datasets of Asparagales. The network tree summarised the majority data pattern in plastid sequences, which with long terminal edges clusters indicated strong support for the family system of Asparagales sensu APG III that was modified to include three expanded families [21], consistent with recently published analyses [14,17,22,[26][27][28]45]. Most of the subfamilies (formerly as families) are pretty clear sustaining their taxonomic status in the split graphic. Otherwise, the short central edges forming the extensive cycles indicate broadly conflicting signals along the Asparagales backbone, but it is still clearly reflected in the underlying ''skeleton'' of evolutionary history. From the dominating tree-like pattern, we can anticipate that the four chloroplast genes in the data are compatible with one another and successfully infer phylogenetic trees [6].
The split pattern revealed strength of conflicting signals and helping us to understand how to affect the phylogenetic analysis. The phylogenetic indistinct taxon in regular phylogenetic analyses well appeared critical state on the split graph. In our case, at the base of Asparagales, astelioid, together with Orchidaceae, joined the main stem base of the network tree at the same position. However this situation means only included very little information about their relationships. It is perhaps unsurprising that the relationships of astelioid (especially Boryaceae) and Orchidaceae  are unstable in some previous studies. For example, Boryaceae has sometimes been placed as sister to Orchidaceae (e.g. [11]), although with weak support, and there are other topologies, including one embedding Orchidaceae in a paraphyletic Boryaceae-Hypoxidaceae clade [32]. Unexpectedly, M. biflora complexly netted to Orchidaceae on network analyses (Figure 1), however this taxon has been grouped within Brodiaeoideae (Themidaceae sensu APG II) at present parsimony and Bayesian inference ( Figure 2, part A) in line with previously reports [17,22]. In case of sequencing or sampling errors, the split network is possibly more sensitive to exhibit artificial than regular phylogenetic analyses. The biased pattern of M. biflora suggests that resampling is necessary in order to find real situation.
The conflicting signals may be caused by homoplasy or stochastic noise rather than recombination that were not detected across the plastid genome in the core Asparagales [45]. DNA sequences from organellar genomes (e.g. mitochondria, plastids) are largely considered to be inherited uniparentally and nonrecombining, with a single shared evolutionary history for the entire organellar genome [46][47][48][49]. Systematic mutational biases may also introduce conflicting phylogenetic signals within organelle sequences, especially between long-diverged taxa [50]. Although there may be reasons weak signals are introduced giving conflicting relationships, additional sequence data should allow identification of the bifurcating phylogenetic history of the organelle genome. Not unexpectedly, the continued examination of additional characters per taxon, 7 [17] and 17 plastid genes [23], and whole plastome sequences [45] gave higher resolution and bootstrap support to many clades in Asparagales.
Undoubtedly, it would be very wise to survey phylogenetic data using network methods before attempting to infer phylogenetic trees. Some attempts have begun [45], nevertheless the network methods should play a greater role in phylogenetic analyses than it has done to date. Compared with our inferred phylogenetic tree, it is worth noting that the network patterns reflect the tree bootstrap support to an extent, despite contrary opinions expressed previously [6,51].

Phylogeny of Asparagales
This study, with relatively dense taxon sampling and more diverse species representing more genera compared to previous phylogenetic studies, documented the stability of relationships within Asparagales. The family-level phylogenetic relationships found here were particularly congruent with other broad studies [14,22,23,[26][27][28]45], indicating that the tree topologies in previous studies are robust with respect to the different samples used to represent genera and taxa sampled.
Relatively dense taxon sampling is generally a beneficial strategy for reducing long-branch attraction and obtaining more accurate inferences of phylogenetic relationships among and within large groups of organisms [52][53][54][55]. Long-branch attraction has been invoked for the placement of several problematic Asparagales taxa, such as Aphyllanthoideae and Ixioliriaceae, which are relatively isolated taxa with a long terminal branch. The position of Aphyllanthes in previous studies was labile and weakly supported [17,22,23]. In the neighbour-net tree in this study, Aphyllanthes had long edges that join to the base of Asparagaceae s.l., close to Lomandroideae, as has been found in other studies [17]. However, its position changed from sister to Agavoideae (Agavaceae sensu APG II) to sister to Brodiaeoideae (Themidaceae sensu APG II) in our MP and BI trees, respectively, but always formed a moderately to strongly supported group with Agavoideae, Scilloideae and Brodiaeoideae (63/91/1.0), which is also consistent with previous studies [22,23,26,28]. Based on genome data (79-plastid gene matrix), Steele et al. [45] found that Aphyllanthes was sister to Agavoideae with moderate support and confirmed that it links the same subfamilies mentioned above using neighbour-net analyses. Obviously, Aphyllanthes may be suffering from not only long branch attraction (LBA), but also too few characters to define individual nearby branches as a result of rapid radiation [45].
Ixioliriaceae was inferred as a strongly supported sister group to Tecophilaeaceae in this study, a result that had variable support in previous analyses [17,22,26,28]. Analyses of morphological data and base chromosome number support the sister relationship of these two families [56]. Doryanthaceae remain unresolved, forming either a polytomy or a weakly supported sister to the clade of Ixioliriaceae/Tecophilaeaceae and the remainder of Asparagales (except Astelioid and Orchidaceae), consistent with previous analyses [13,26,28].

Divergence Time Estimates
The age estimates obtained across the major clades of Asparagales from the PATHd8 and BEAST analyses compared here overlap considerably (see Table 3). Overall PATHd8 produced slightly younger ages than BEAST. The BEAST analyses that used multiple (three) constraints with exponential distribution may be a good alternative to a lognormal distribution in the face of inadequate palaeontological information [59], which yielded a narrower 95% higher posterior density (HPD) and generally younger node ages than the latter, as noted by Bell et al. [36].
We estimated that the stem group of Asparagales dates to ca. 99-113 Ma and that the crown group dates to ca. 93-101 Ma, which agrees reasonably with Bell et al. [36], who reported a crown age range of 83-103 Ma (see Appendix S15 in their paper). However, Janssen and Bremer [31] suggested somewhat older dates of ca. 122 Ma and ca. 119 Ma, respectively. The topology within Asparagales, especially near the base, in the latter differed substantially from our results; e.g. they did not place Orchidaceae as sister to the rest of the order. Comparable results in Magallón and Castillo [60] were ca. 133.1 (stem), 125 (crown), 118.6 (stem) and 112.6 (crown) Ma for relaxed and constrained penalised likelihood dating, respectively. These molecular-based estimates suggest a Cretaceous origin of Asparagales. In this study, the estimates are obviously close to the oldest known fossil record of Asparagales (93-105 Myr old, see [61] Supplementary Methods for details ).
Asparagales; # 2 83.5 Ma, age for the MRCA of Zingiberales; # 3 106.565.5 (93-120) Ma, age for the root of the tree (The upper age constraint of 120 Ma corresponds to the oldest known Monocot fossil). Detailed descriptions see the section of material and methods in text. doi:10.1371/journal.pone.0059472.g003       Our estimated divergence time for the families in Asparagales is much younger than previously suggested by Janssen and Bremer [31], in which most families were indicated to be older than ca. 90 Ma. Orchidaceae is the largest and one of the ecologically and morphologically most diverse families of flowering plants [62]. Our results indicated that the most recent common ancestor of extant orchids lived in the Late Cretaceous (54-82 Ma), slightly overlapping the estimated age (76-84 Ma) based on the discovery of the first unambiguous fossil of Orchidaceae and a pollinator in amber [61]. Moreover, adding two newly described orchid fossils [63], Gustafsson et al. [64] reassessed the data and reported that all extant orchids shared a most recent common ancestor in the Late Cretaceous (ca. 77 Ma), suggesting that the diversification of orchids occurred in a period of global cooling after the early Eocene climatic optimum.
Iridaceae, with over 2,030 species in 65-75 genera, is the second largest family of Asparagales [65]. Based on plastid sequences and molecular clock techniques, Goldblatt et al. [65] inferred that Iridaceae diverged from the most closely related family, Doryanthaceae, ca. 82 Ma and that the crown group of the family diverged in the late Cretaceous ca. 66 Ma. The divergence of the stem group was dated to ca. 75 Ma and crown group to ca. 58 Ma. Goldblatt et al. [65] used a secondary date for the calibration point of the root node of Iridaceae, and this was suggested not to be ideal.
The split between core Asparagales and the remaining families is estimated after the K/T boundary. Furthermore, our molecular phylogenetic analyses suggest multiple rapid radiations have inferred throughout the diversification of major groups of Asparagales. For example, the largest orchid subfamilies diversification occur in a period of global cooling [64] and the possible radiation of lineages of Nolinoideae revealed from this study.
The fossil record of Asparagales is comparatively poor, with few fossils attributable to families reaching back beyond the Late Eocene, perhaps because of the herbaceous habit and widespread zoophilous pollination [66]. The use of more fossils with more sophisticated prior distribution affords exciting opportunities for Orders

Plant Materials
The taxa used for this study included 253 species of 201 genera representing all families in Asparagales [20]. In addition, 29 species representatives of Arecales, Zingiberales, Commelinales, Poales, Liliales and Pandanales were included, with two species of Pandanales as the nominated outgroup. The plant material used was either fresh or dried, collected from the field and dried, taken from specimens in herbaria, from the DNA Bank of the Royal Botanic Gardens, Kew (http://data.kew.org/dnabank/ DnaBankForm.html) or the Medicinal Plant Resources Bank of the Korea National Research Resource Centre (KNRRC) at Gachon University (for details, see Table 1). All necessary permissions and approvals for the described plant and specimen sampling were obtained from the respective curators, i.e. RBG Kew Gardens (Dr. M. W. Chase), Kunming Botanic Garden (MOU), Ivana Franka Botanic Garden (MOU), Australia Royal Botanic Garden (MOU), KEW DNA Bank. Voucher specimens of the taxa were prepared; source, voucher information and database accession numbers are listed in Table 1.

DNA Extraction and Polymerase Chain Reaction Sequencing
Total genomic DNA was extracted from 0.5-1.0 g fresh or silica gel-dried leaves using the 26 CTAB buffer method [67]. Lipids were removed with SEVAG solution (24:1 chloroform:isoamyl alcohol), and DNA was precipitated with isopropanol at -20uC. Total extracted DNA was dissolved in 16 TE buffer and stored at -70uC. The atpB gene was amplified using the primers and protocols of White et al. [68], Nickrent and Soltis [69] and Soltis and Soltis [70]. The matK gene was amplified with the primers and protocols of Johnson and Soltis [71] and Hilu et al. [25]; ndhF was amplified with the primers reported by Terry et al. [72] and Olmstead et al. [73]; and rbcL was amplified with the primers and protocols of Olmstead et al. [74], Shinwari et al. [75] and Fay and Chase [76]. Amplifications were carried out in 50-mL reactions containing 2 units Taq DNA polymerase, 5 mL 106reaction buffer (100 mM Tris-HCl, 500 mM KCl, 15 mM MgCl 2 ), 2.5 mM dNTPs, and 5 pmol mL -1 forward and reverse primers using a Perkin-Elmer 9700 (Applied Biosystems, ABI, Beverly, MA, USA). Dimethyl sulphoxide (DMSO; 2%) was added to reduce the secondary structure in the polymerase chain reaction (PCR). PCR conditions consisted of an initial denaturation at 94uC for 2 min, followed by 30-35 cycles at 94uC for 1 min, 50uC-55uC for 1 min and 72uC for 3 min, followed by a final 7-min extension at 72uC. All PCR products were purified using ExoSAP-IT (USB Corporation, Cleveland, OH, USA), according to the manufacturer's protocols. Dideoxy cycle sequencing was performed using the chain-termination method and an ABI Prism BigDye Reaction Kit (ver. 3.1) in accordance with the manufacturer's protocols. Products were run on an ABI 3700 Genetic Analyser according to the manufacturer's protocols. Sequence editing and assembly of contigs were carried out using the Sequence Navigator and AutoAssembler software (ABI).

Sequence Alignment
All sequences were aligned initially in Muscal [77] and MacClade (ver. 4.0) [78] and then adjusted manually following the guidelines of Kelchner [79]. Manual alignment of rbcL and atpB was accomplished easily because no insertions/deletions occurred for rbcL and they were rare for atpB. In contrast, matK and ndhF were subject to length variation. These two genes were aligned and further edited manually by deleting small sections in which the homology of characters across taxa could not be determined with confidence. In total, the combined alignment was 6,699 characters in length ( Table 2). The aligned matrix has been submitted as Appendix S1.

Neighbour Net
Neighbour nets have the attractive property of always being represented in the plane through a circular ordering of the taxa. Although closely related to split decomposition [80], for larger datasets, the neighbour-net method often provides better resolution than split decomposition due to the criterion used to calculate support for relationships among taxa [9]. To construct neighbour nets, the default settings in SplitsTree4 [81] were used, applying uncorrected P distances with gaps and ambiguous sites coded as missing data. Bootstrap support for internal splits, which define clusters, was calculated with 1,000 replicates.

Parsimony Analysis
PAUP* ver. 4.10b for Macintosh [82] was used for parsimony analysis. Tree searches were conducted using the Fitch (equal weight, EW) [83] criterion with 1,000 random sequence additions and tree bisection/reconnection (TBR) branch swapping, but permitting only five trees to be held at each step to reduce the time spent searching suboptimal ''islands'' of trees. All shortest trees collected in the 1,000 replicates were swapped on to completion without a tree limit. DELTRAN character optimisation was used to illustrate branch length throughout. To evaluate internal support, 1,000 bootstrap replicates were conducted with equal weights (EW) and successive approximation weights (SW; [84]), and TBR branch swapping with five trees held at each step and simple taxon addition [85]. The following descriptions for categories of bootstrap percentages were used: weak, # 74; moderate, 75-84; well supported, 85-100 [14].

Bayesian Analysis
Further phylogenetic analyses were performed using BI as implemented in MrBayes ver. 3.12 [86]. PAUP* ver. 4.10b and MrModeltest ver. 2.2 [87] were used to determine the best model of DNA substitution for each partition by evaluating all models against defaults of the programme. The GTR+I+G model (a general time-reversible model with a proportion of invariable sites and a gamma-shaped distribution of rates across sites) was chosen as the best-fit substitution model in all four partions. Consequently, the combined data matrix was assigned a model of six substitution types (n = 6) with a proportion of invariable sites. Four simultaneous Markov chain Monte Carlo (MCMC) chains were run for 1610 7 generations and sampled every 1,000 generations, and the first 25% sampled trees were excluded as burn-in. Postburn-in samples of trees were used to construct a 50% majority rule consensus cladogram in PAUP* ver. 4.10b. The proportions of bifurcations found in this consensus tree are given as posterior clade probabilities (PPs). Bayesian analysis was performed twice to ensure convergence of the results.

Molecular Dating and Fossil Calibration
We used the combined dataset to estimate the age of origin of Asparagales using the programmes PATHd8 [34] and BEAST v1.7.4 [35,88]. The phylogenetic trees were constructed using MP with PAUP*4.0. The branch lengths on this tree were estimated using DELTRAN optimisation. We used the mean path length method of the PATHd8 programme. The MPL clock tests were used to test the molecular clock. The PATHd8 programme requires at least one reference point to be fixed. We used the oldest monocot fossil estimate of 120 Ma [89] as the fixed crown age of the root to calibrate the clock. BEAST v1.7.4 was also used to estimate the divergence times using multiple calibration points and a relaxed molecular clock approach. The BEAUti interface was used to create input files for BEAST with the tree priors set as follows: 1) age for the most recent common ancestor (MRCA) of extant Asparagales: exponential distribution with a mean of 2.0 and an offset 93 Ma that equalled the minimum age of the fossil (see discussion in [61], labelled # 1 in Figure 3); 2) age for the MRCA of Zingiberales: exponential distribution with a mean of 2.0 and an offset 83.5 Ma which equalled the minimum age of the fossil (see [36,90], labelled # 2 in Figure 3); 3) age for the root of the tree (The upper age constraint of 120 Ma for the calibrations above corresponds to the oldest known Monocot fossil [89]): normal prior distribution with mean 106.5 Ma and standard deviation of 5.5 (giving a 95% CI ranging from 93-120 Ma, labelled # 3 in Figure 3).
The general time-reversible (GTR+ I+G) nucleotide-substitution model was used for the molecular clock model and Yule Process was chosen as speciation process for data set. Several short BEAST runs were first performed to examine the performance of the MCMC. After optimal operator adjustment, as suggested by the output diagnostics, three final BEAST runs each containing 10,000,000 generations were performed, and a tree was saved every 1,000 generations. All resulting trees were then combined with LogCombiner v1.7.4 [35], with a burn-in of ca. 45%. Log files were analysed with Tracer v1.5 [91], to assess convergence and confirm that the combined effective sample sizes for all parameters were enough. A maximum credibility tree was then produced using TreeAnnotator v1.7.4 [35,88]. These were visualised using FigTree v.1.3.1 with means and 95% HPDs of age estimates. An XML file for analyses has been submitted as Appendix S2.

Supporting Information
Appendix S1 The aligned data matrix in this study (Nexus).

(NEX)
Appendix S2 The XML file used for divergence time estimates in BEAST analysis. (XML)