Frequent gene flow blurred taxonomic boundaries of sections in Lilium L. (Liliaceae)

Gene flow between species may last a long time in plants. Reticulation inevitably causes difficulties in phylogenetic reconstruction. In this study, we looked into the genetic divergence and phylogeny of 20 Lilium species based on multilocus analyses of 8 genes of chloroplast DNA (cpDNA), the internally transcribed nuclear ribosomal DNA (nrITS) spacer and 20 loci extracted from the expressed sequence tag (EST) libraries of L. longiflorum Thunb. and L. formosanum Wallace. The phylogeny based on the combined data of the maternally inherited cpDNA and nrITS was largely consistent with the taxonomy of Lilium sections. This phylogeny was deemed the hypothetical species tree and uncovered three groups, i.e., Cluster A consisting of 4 taxa from the sections Pseudolirium and Liriotypus, Cluster B consisting of the 4 taxa from the sections Leucolirion, Archelirion and Daurolirion, and Cluster C comprising 10 taxa mostly from the sections Martagon and Sinomartagon. In contrast, systematic inconsistency occurred across the EST loci, with up to 19 genes (95%) displaying tree topologies deviating from the hypothetical species tree. The phylogenetic incongruence was likely attributable to the frequent genetic exchanges between species/sections, as indicated by the high levels of genetic recombination and the IMa analyses with the EST loci. Nevertheless, multilocus analysis could provide complementary information among the loci on the species split and the extent of gene flow between the species. In conclusion, this study not only detected frequent gene flow among Lilium sections that resulted in phylogenetic incongruence but also reconstructed a hypothetical species tree that gave insights into the nature of the complex relationships among Lilium species.


Sampling and DNA extraction
In total, 29 Lilium samples, representing 20 species in seven sections, were collected from Taiwan, France, Japan, and China (Sichuan, Shandong, Yunnan, and Hubei Provinces). We confirm that the field studies did not involve endangered or protected species. No permission was needed because the samples were not collected in a protected area. The sample information is shown in Table 1. Leaf materials were dried with silica gel and stored at -80˚C for later experiments. Genomic DNA from each sample was extracted using a CTAB method [47], diluted to 2 ng/μL with TE solution, and stored at -20˚C.

Primer design and selection
In total, 29 loci were investigated in our study. Of these loci, 20 were randomly selected from the expressed sequence tags (EST) of L. formosanum [48] and the NCBI EST database of L. longiflorum (http://www.ncbi.nlm.nih.gov/nucest). Primers for each selected locus were designed using the software Primer 3.0 [49]. Gene codes, primer sequences, and the putative functions for the 20 EST loci are available in Table 2. In addition, the internal transcribed spacer of the nuclear ribosomal DNA (nrITS) [50], as well as the additional eight chloroplast DNA (cpDNA) loci were employed (Table 2).

PCR, cloning, and sequencing
PCR amplification was conducted with a reaction volume of 50 μL, containing 25 μL of 2×Taq polymerase master mix (Ampliqon, Denmark), 5 μL of template DNA (2 ng/μL), 5 μL of each primer (2 pM), and distilled water. The PCR reactions were performed using the MyCycler thermal cycler (Bio-Rad, USA) with 35 cycles. For each cycle, we set an initial denaturation at 94˚C for 50 s, annealing at 48-53˚C (optimized for each locus, Table 2) for 50 s, and extension at 72˚C for 80 s. A final extension at 72˚C for 10 min was applied. The PCR products were run on agarose gels, and the targeted DNA fragments were sliced and purified. The purified PCR products were ligated to a pGEM-T Easy vector at 4˚C overnight and transformed into E. coli DH5a cells. Positive clones were validated with blue-white screening followed by colony PCR. To ensure that both diploid alleles were sequenced, we randomly selected five to seven clones from each individual, and discarded the low-frequency clones due to possible PCR errors. Sanger sequencing was conducted in both directions with the universal T7P and SP6 primers using the 96-capillary 3730xl DNA Analyzer (Genomics Biotech Co., Ltd.).

Sequence alignments, indel identification, and tree reconstructions
The sequences of the EST, nrITS, and cpDNA loci of the 20 Lilium species were validated using BLAST on NCBI. All sequences were deposited in the NCBI GenBank, with accession numbers of KX863745-KX865072. In addition, the full-length chloroplast genome and nrITS sequences of Cardiocrinum cordatum (KX575837.1 and KP712019.1, respectively) and Fritillaria taipaiensis (KC543997.1 and KT861551.1, respectively) were downloaded from NCBI as outgroups for the Lilium species. Sequences of each locus were then aligned using CLC Free Workbench (http://www.clcbio.com/) with the default settings, and gap sites were manually checked. Indel events for each locus were identified and coded with SeqState [51] to be incorporated in the phylogeny reconstruction. Genetic distances of each alignment were estimated using the two-parameter model implemented in MEGA 6 [52][53].
For the cpDNA markers, the alignments of all the genes representing 5,432 bp were concatenated, the indels were coded according to Simmons and Ochoterena's simple coding method [54] in SeqState, and a Bayesian inference tree was generated with MrBayes v. 3.2.6 [55]. The best substitution models for the cpDNA were evaluated by MEGA6, and the substitution model used in MrBayes was set accordingly (S1 Table). We performed > 100,000 steps of a Markov chain Monte Carlo (MCMC) for each gene to ensure the average standard deviation of the split frequencies was lower than 0.01, with a sample frequency of 100, print frequency of 100, and diagnosis frequency of 1,000. After summarizing the parameter values, the potential scale reduction factor was confirmed to be approximately 1.0 for all parameters. Finally, the consensus tree was summarized using the default settings. For nrITS and EST markers, tree reconstructions were also conducted with MrBayes for individual loci with the same settings as the cpDNA.
For integrating the information of cpDNA and nrITS genes to reconstruct the phylogeny, the software BEAST version 1.7.5 [56] was used. An uncorrelated relaxed clock model was set  for both cpDNA and nrITS loci. The priors of the substitution rate were set as uniform distributions at initial values of 0.000933 and 0.00968, respectively. The prior of the divergence time of Lilium samples was set as a normal distribution with a mean of 13.6 million years and a standard deviation of 1.5 million years based on the estimation of Gao et al. [57]. The length of the MCMC run was set to 10 million, and the parameters were saved every 1,000 steps. The results of the trees for 10 independent runs with different seeds were combined with the program logcombiner [56], with 50% of the trees discarded as burn-in, and subsequently processed by the program treeannotator [56]

Intraspecific genetic variations, recombination rates (Rm)
Genetic diversities among the studied taxa were estimated using DnaSP version 5.0 [58]. The nucleotide diversities (π) and the minimum recombination events were calculated. The potential gene flow among the species was inferred from the recombination events and the shared genetic variations among the species [59].

IMa analyses
To sophisticatedly estimate gene flow between species clusters, the Isolation with Migration model that was implemented in program IMa2 [60] was applied, and six model parameters were calculated using coalescence simulations and Bayesian computational procedures: the divergence time (t), the bidirectional migration rates (m 1 and m 2 ), and the effective population sizes of the ancestral (θ A ) and two current populations (θ 1 and θ 2 ). Taxa with a possible hybrid origin, L. davidii var. willmotiae, L. bulbiferum, and L. 'Casa Blanca', were excluded. Using the coalescence time of 13.6 million years ago for the Lilium crown group [57], we estimated the substitution rates for the 20 EST loci by dividing the root height of the Bayesian trees by the coalescence time (S2 Table). The infinite site (IS) models were applied to all the loci. Because the IMa2 program does not accept genes containing recombinant fragments, the IMGC program was used to extract the largest non-recombinant DNA fragments from the aligned sequences [61]. After processing our data using the IMGC program, the DNA fragments that were shorter than 100 bp were excluded due to a lack of sufficient genetic information. To ensure that there were enough heating steps for obtaining a reliable result, 20 Markov chain Monte Carlo chains were used with the following heating parameters: ha of 0.96 and hb of 0.9. IMa2 runs were performed and saved using at least 3 million burn-in steps followed by at least 5 million steps (50,000 genealogies). All the effective parameter sample sizes were greater than 100. Three independent runs with different random seeds were performed to check for consistency across the results. The final results were calculated by averaging the values estimated in each run. The migration rate per generation (M) was calculated by multiplying the m value by the geometric mean of the substitution rates (μ).

Phylogeny based on eight cpDNA genes
The Bayesian inference phylogram based on the concatenation of eight cpDNA loci showed that the Pseudolirium section was monophyletic and at the basal position, while 18 other species from 5 sections were clustered into two main groups (Fig 1). The Leucolirion, Martagon, and Sinomartagon sections were paraphyletic or polyphyletic, whereas the Liriotypus section was monophyletic. Unexpectedly, three Lilium speciosum var. gloriosoides samples from different populations (two samples from Taiwan and one from Yunnan in China, see Table 1) were not clustered together. Within the Leucolirium section, L. formosanum was sister to L. leucanthum, L. sulphureum was sister to L. sargentiae, but the four taxa were not clustered together. Within the Sinomartagon section, L. leichtlinii was clustered with L. davidii var. davidii, but L. taliense, L. duchartrei, and L. nepalense were clustered with L. speciosum var. gloriosoides from Yunnan (PD08) of the Archelirion section. In addition, L. davidii var. willmottiae of the Sinomartagon section was sister to L. tsingtauense in the Martagon section, suggesting a hybrid origin.

Phylogeny based on nrITS
The Bayesian inference tree based on nrITS suggested that 20 Lilium species were divided into one main cluster and several smaller clusters (Fig 2).  displayed a closer relationship between L. bulbiferum and section Sinomartagon, which has also been revealed in previous studies [38,40,57].
Phylogeny based on the combined data of nrITS and cpDNA genes: Hypothetical species tree By combining nrITS and cpDNA regions using the BEAST software, the phylogeny showed better resolution on the delimitation of the sections (Fig 3A). This tree uncovered three clusters: Hence, the tree was deemed the hypothetical species tree (Fig 3B) Fig 3B). Furthermore, the topology comparisons between the hypothetical species tree (Fig 3B) and the 20 EST trees (S1 Appendix) were conducted focusing on the sister relationships among the groups. Of the gene phylogenies, only the LL22 tree uncovered all the sister groups (S1 Appendix). Of these sister groups, L. pardalinum and L. parryi showed the highest supporting rate (55%), followed by L. formosanum and L. leucanthum (40%) as well as L. sargentiae and L. sulphureum (40%) (S3 Table). Unexpectedly, the sister relationship between L. speciosum ssp. gloriosoides in Taiwan and L. maculatum was not supported by any EST tree.
We used IMa2 to evaluate the levels of gene flow among the three Lilium clusters (Fig 4). The taxa that had a possible hybrid origin were excluded from this analysis. The level of gene flow was estimated by the migration rates per generation (M). In the pairwise comparisons between Clusters A, B, and C, the highest level of gene flow occurred from Clusters C to A (M = 7.43 × 10 −7 ), followed by the gene flow from Clusters B to A (M = 5.21 × 10 −7 ) and the gene flow from Clusters A to B (M = 1.03 × 10 −7 ). All the above directions showed significant gene flow, as was suggested by the likelihood ratio test (p < 0.05) (S4 Table). These results suggested that hybridization across sections may have occurred frequently in Lilium.

Indel events
Our results revealed that indel distributions varied among the Lilium taxa in the nuclear loci. The indel events in cpDNA and nrITS were identified and were included in the phylogeny reconstruction, and some informative indels in the EST loci were shown in the S5 Table. For

Discussion
This study unraveled the factors that contribute to the phylogenetic conflicts by exploring multilocus phylogenies of Lilium. Although phylogenetic conflict can be rampant across loci, only a few studies have addressed the causal factors comprehensively [3][4]. For Lilium, several phylogenetic studies reflected its phylogenetic conflict but failed in illustrating the mechanism or factors resulting in the conflict [41][42]. Here, we evaluated the influences of the analytical and biological factors that led to the phylogenetic conflict in Lilium. Analytical and biological factors causing phylogenetic conflicts Multilocus approach. While the debate on the advantages and disadvantages of combining data in reconstructing multilocus phylogeny continues, many recent studies suggested that combining all the available data is feasible and reliable and that elucidating incongruence provides hints into the evolutionary history [21,23,25]. Unfortunately, most of the phylogenetic studies that combined sequences across loci provided very few explanations regarding the reliability of the combined trees (e.g., [62][63]). In our study, with a wide locus sampling, predominant phylogenetic incongruences across different loci were revealed (e.g., S3 Table and S1 Appendix). By combining the cpDNA and nrITS loci, species from the same sections were mostly clustered and the topology of the 17 Lilium species substantially agreed with the taxonomic sections by Comber [30] (Fig 3B). The hypothetical species tree (Fig 3B) uncovered three clusters: Cluster A (sect. Pseudolirium and Liriotypus), Cluster B (Leucolirion, Daurolirion, and Archelirion), and Cluster C (Leucolirion, Martagon and Sinomartagon). The multilocus analyses not only gave an insight into the Lilium phylogeny but also provided opportunities to uncover the taxa that had a hybrid origin as shown earlier and revealed the interspecific gene flow, which is addressed in the following paragraphs.
Interspecific gene flow. Intraspecific gene flow could provide concordance in the species genome. It would homogenize the genomes and thereby block the genome divergence of the isolated populations [64]. However, when gene flow among taxa occurs, phylogenetic incongruence ineluctably arises [14][15]. Interspecific gene flow is not rare in plants [65,66]. Examples include Howea belmoreana and H. forsteriana [17], and Arabidopsis halleri and A. lyrata [18], all revealing uninterrupted gene flow after speciation [18]. Interspecific gene flow causes extreme difficulty with regard to the phylogenetic reconstruction.
The divergence of the crown group of Lilium can be dated back to 13.6 million years ago [57]. All the examined taxa in our study diverged a very long time, even the closest sister pair, L. taliense and L. duchartrei, for whom the divergence was more than 1.4 million years. Of the three clusters identified in the hypothetical species tree (Fig 3B), long divergences between the clusters (more than 10 million years) tended to reject incomplete lineage sorting, which blurs the species delimitation in Lilium. Given the rampant phylogenetic incongruence across the loci, interspecific gene flow may have been largely involved in the evolution of the Lilium species. Here, three inspections were proposed to elucidate the possibility and strength of interspecific gene flow.
First, as demonstrated earlier, L. bulbiferum and L. davidii var. willmottiae were recognized as hybrids based on their inconsistent placements on the cpDNA and nrITS trees (Figs 1 and  2). Second, in the DnaSP analysis, 17 out of 20 (85%) EST loci appeared to have high number of recombination events (Table 3). Third, the result of the IMa2 analyses suggested that historical gene flow likely occurred between the three clusters of the hypothetical species tree, especially the genetic exchanges with Cluster A (Fig 4). It is noticeable that all the samples in Cluster A were cultivars, implying the possibility of artificial hybridization, whether intentional or not. It has been shown that the artificial crosses between the different Lilium sections were common and not difficult [67]. As the strongest gene flow occurred from Cluster C, which predominately originated from China, to Cluster A of America or Europe, the gene flow across continents also implied that artificial hybridization may have blurred the species/section boundaries of the Lilium species.
Overall, even though our inspections have limited power in surveying the quantity and direction of gene flow due to the small sample size of each taxon, our results suggested that extensive gene flow among taxa had occurred in Lilium. The plentiful interspecific gene flow apparently contributed to the difficulties in section delimitation. It was likely that interspecific gene flow arose from artificial hybridizations, and thereby caution ought to be exercised in using cultivated samples for phylogeny reconstruction.

Phylogenetic implications
Our results reflected adequate resolution on the phylogenies, despite a small sample size, reflecting the power of incorporating multiple loci in the phylogenetic reconstruction. Some taxa that were assigned to the same section appeared to be sister groups in the phylogenies of cpDNA and nrITS, thanks to the low genetic recombination and lower substitution rate of the maternally inherited cpDNA (Table 3 and S2 Table; [68]) and/or the concerted evolution of nrITS [69][70][71].
Our phylogenies affirmed the previous studies that identified the Martagon section as a monophyletic group [37][38][43][44] with close relationships to the Leucolirion and Sinomartagon sections [36] (Fig 3B). Moreover, the Liriotypus section was determined to be polyphyletic in previous studies that include L. bulbiferum [38,40,43], whereas it was determined to be monophyletic in the studies that excluded L. bulbiferum from the analysis [37,44]. Our study revealed similar results, with the cpDNA phylogeny uncovering monophyly of the Liriotypus section, whereas the nrITS phylogeny showed that L. bulbiferum clustered with the Sinomartagon section. Apparently, the hybrid origin of L. bulbiferum caused noise in the phylogenetic inference. Likewise, the BEAST analysis, which was based on the combined data of cpDNA and nrITS, further supported the monophyly of the Liriotypus section and its close affinity to the Sinomartagon section. Interestingly, when L. bulbiferum was removed from the phylogenetic analysis, the Liriotypus section became the neighbor of the Pseudolirium section ( Fig 3B). Furthermore, phylogenies of the EST loci revealed that L. bulbiferum was clustered with L. davidii, L. leichtlinii (sect. Sinomartagon), L. monadelphum, or L. pyrenaicum (sect. Liriotypus) (S1 Appendix). These results suggested that L. bulbiferum did show affinity to the Sinomartagon section, as suggested by other studies based on nrITS [38;40;57] and a maternal background from the Liriotypus section based on the chloroplast DNA. Altogether, we suggested that L. bulbiferum is likely a hybrid between the Sinomartagon and Liriotypus section, with the latter as the maternal parent. Furthermore, the Pseudolirium section appeared to be basal in Lilium, both in the phylogenies of cpDNA and the combined data, a finding consistent with the matK gene phylogeny [57].
Our study also revealed that the Leucolirion and Sinomartagon sections were polyphyletic, which largely corroborated earlier phylogenies [36][37][38][43][44]57]. In the Leucolirion section, 40% of the EST loci supported that L. formosanum and L. leucanthum were sisters, and 40% of nuclear loci showed that L. sargentiae was related to L. sulphureum, while no data collected here indicated the clustering of these four species (S3 Table, S1 Appendix). The close relationship between L. sargentiae and L. sulphureum also corresponded to the geographic regions where they grow. In contrast, no overlap in the distribution of L. formosanum and L. leucanthum has been reported. Although the allopatric distribution of this sister group might imply a longer divergence between L. formosanum and L. leucanthum, the close phylogenetic relationship indicated their affinity. However, the possibility of a sampling bias that contributes to this allopatric match relationship cannot be ruled out. The cpDNA, nrITS, and the combined data all suggested that L. 'Casa Blanca' was clustered with L. sargentiae and L. sulphureum (Figs 1-3). This may infer that part of the parental species of L. Casa Blanca was from the Leucolirion section. The largest section, Sinomartagon, contains 22 taxa, which are morphologically distinguishable from each other [30,72]. Nishikawa et al. [38] divided this section into five groups according to the phylogeny based on the nrITS data. Our results on the Sinomartagon section generally agreed with Nishikawa et al.'s work [38].
Even though our sampling on the Archelirion section was restricted to two populations of L. speciosum ssp. gloriosoides, the hypothetical species tree and most of the EST trees revealed genetic dissimilarities between the populations (Fig 3B and S1 Appendix). The individuals in Taiwan were closely related to the Daurolirion section, while the individual isolated from China was related to L. nepalense of the Sinomartagon section. Accordingly, we suggest assigning L. speciosum ssp. gloriosoides of China to the Sinomartagon section instead of the Archelirion section.

Conclusions
In summary, multilocus analyses enabled us to uncover interspecific gene flow, identify the taxa with hybrid origins, and comprehensively reconstruct the evolutionary history of Lilium. The hypothetical species tree better resolved the section classification of the Lilium species. Our study suggested that future studies exploring both analytical and biological factors that cause phylogenetic conflicts would provide a better understanding of the evolutionary relationship among plant species.
Supporting information S1 Table. The substitution models for all the loci used in this study.