Origin and Reticulate Evolutionary Process of Wheatgrass Elymus trachycaulus (Triticeae: Poaceae)

To study origin and evolutionary dynamics of tetraploid Elymus trachycaulus that has been cytologically defined as containing StH genomes, thirteen accessions of E. trachycaulus were analyzed using two low-copy nuclear gene Pepc (phosphoenolpyruvate carboxylase) and Rpb2 (the second largest subunit of RNA polymerase II), and one chloroplast region trnL–trnF (spacer between the tRNA Leu (UAA) gene and the tRNA-Phe (GAA) gene). Our chloroplast data indicated that Pseudoroegneria (St genome) was the maternal donor of E. trachycaulus. Rpb2 data indicated that the St genome in E. trachycaulus was originated from either P. strigosa, P. stipifolia, P. spicata or P. geniculate. The Hordeum (H genome)-like sequences of E. trachycaulus are polyphyletic in the Pepc tree, suggesting that the H genome in E. trachycaulus was contributed by multiple sources, whether due to multiple origins or introgression resulting from subsequent hybridization. Failure to recovering St copy of Pepc sequence in most accessions of E. trachycaulus might be caused by genome convergent evolution in allopolyploids. Multiple copies of H-like Pepc sequence from each accession with relative large deletions and insertions might be caused by either instability of Pepc sequence in H- genome or incomplete concerted evolution. Our results highlighted complex evolutionary history of E. trachycaulus.


Introduction
Interspecific or intergeneric hybridization and polyploidization are two widespread and evolutionarily important phenomena in plants, which play important roles in the formation of new allopolyploid species [1][2][3]. Numerous studies have indicated that many intra-and inter-genomic changes that accompanied allopolyploid formation such as rapid elimination and recombination of low-copy sequence fragment, DNA methylation pattern changes, retrotransposon activation, intergenomic conversion and epigenetic changes, might have produced a more harmonious behavior and activity of the different constituent genomes. More importantly, those genomic alterations exhibited different evolutionary dynamics which might lead to genetic asymmetry evolution resulting in conformity and convergent effects [4][5][6][7][8][9].
The tribe Triticeae combines a wide variety of biological mechanisms and genetic systems, and is an excellent group for studying evolutionary dynamics and speciation in plants [10]. Within this tribe, Elymus L. is the largest genus composed exclusively of allopolyploids with approximately 150 species [11]. Five basic genomes (St, H, Y, P, and W) have been cytologically assigned to the species in this genus (Genome symbols follow [12]). The St genome found in all Elymus species was supposedly donated by Pseudoroegneria (Nevski) Á. Löve. The H, P, and W genomes were derived from Hordeum L., Agropyron Gaertn., and Australopyrum (Tzvelev) Á. Löve, respectively, while the origin of Y genome is unknown [13][14][15][16][17][18][19][20][21].
Elymus trachycaulus (Link) (2n = 4x = 28) is a short-lived perennial, self-pollinating allotetraploid species. The distribution range of E. trachycaulus extends from Alaska to Newfoundland and all the way down south to Mexico, and usually grows in open forests and along roadsides [22]. The number of infraspecific taxa in the E. trachycaulus complex that are currently recognized varies from three to six, but considerably more have been recognized in the past [23]. The delimitation of taxa within E. trachycaulus complex is controversial and difficult, since the morphological characters used to distinguish infraspecific taxa (for instance, length and density of the spike), are at least partially under environmental control. Adding to this difficulty are some relatively distinct entities linked by morphologically intermediate plants derived from hybridization [23]. Previous studies have shown that E. trachycaulus complex is the most morphologically and geographically diverse species of Elymus in North America [24], and have showed considerable diversity [24][25][26][27][28][29][30][31][32][33]. Like most North American species of Elymus, E. trachycaulus is a tetraploid that combines the genomes of a Pseudoroegneria species (St genome) and a wild Hordeum species (H genome) [34][35][36][37], but little more is known about its origin and evolutionary dynamics.
In this study, we used two single copy nuclear genes: the second largest subunit of RNA polymerase II (Rpb2) and the phosphoenolpyruvate carboxylase (Pepc), along with chloroplast DNA trnL-trnF region (spacer between the tRNA-Leu (UAA) gene and the tRNA-Phe (GAA) gene) to explore genome evolutionary dynamics and the origin of tetraploid E. trachycaulus.

Plant materials and DNA extraction
Thirteen accessions of E. trachycaulus species were analyzed. DNA was extracted from fresh young leaf using the method of [38]. Two low copy nuclear gene (Rpb2 and Pepc) and chloroplast TrnL/F sequences from different accessions of E. trachycaulus were amplified and sequenced. Rpb2, Pepc and TrnL/F sequences for some diploid Triticeae species representing the St, H, I, Xa, Xu, W, P, E and V genomes were obtained from the published data [39][40][41], and included in the analyses. Plant materials with accession number, genomic constitution, geographical origin, and GenBank identification number are presented in Table 1.

DNA amplification and sequencing
The sequences of Rpb2, Pepc and cpDNA TrnL/F were amplified by polymerase chain reaction (PCR) using the primers P6F and P6FR [42], PEPC-F and PEPC-R [40], and TrnL and TrnF [41], respectively. DNA was amplified in a 20 μl reaction containing 20 ng template DNA, 0.25 mM of each deoxynucleotide (dATP, dCTP, dGTP and dTTP), 2.0 mM MgCl2, 2.0 U Taq polymerase (TransGen, Beijing, China), 0.25 μM of each primer. Amplification was performed in a DNA Thermo-cycler (iCycler, Bio-Rad). The amplification profile for the Rpb2 gene was: an initial denaturation at 94°C for 4 min; 35-40 cycles of 94 C for 40 s, 60°C for 50 s, 72°C for 2 min, and a final cycle of 72°C for 10 min. The PCR profile for amplifying Pepc gene was: an initial denaturation at 94°C for 4 min; 38 cycles of 94°C for 40 s, 65°C for 50 s, 72°C for 2 min,

Data analysis
The chromatographs of automated sequence results were visually checked. Multiple sequence alignments were made using Clustal X with default parameters and additional manual editing to minimize gaps [43]. Maximum-parsimony (MP) analysis was performed using the computer program PAUP ver. 4 beta 10 [44]. All characters were specified as unweighted and unordered, and gaps were excluded in the analysis. Most-parsimonious trees were obtained by performing a heuristic search using the Tree Bisection-Reconnection (TBR) option with MulTrees on, and ten replications of random addition sequences with the stepwise addition option. Multiple parsimonious trees were used to generate a strict consensus tree. Overall character congruence was estimated by the consistency index (CI), and the retention index (RI). In order to infer the robustness of clades, bootstrap values with 1000 replications [45] were calculated. In addition to MP analysis, Bayesian analyses analysis was also performed. Eight evolution models of sequence were tested using PhyML 3.0 [46]. For each data set, the general time-reversible (GTR) [47] model led to a largest ML score compared to the other 7 substitution models: JC69 [48], K80 [49], F81 [50], F84 [51], HKY85 [52], TN93 [53] and custom (data not shown). As the result, the GTR model was used in the Bayesian analysis using MrBayes 3.1 [54]. MrBayes 3.1 was run with the program's standard setting of two analyses in parallel, each with four chains, and estimates convergence of results by calculating standard deviation of split frequencies between analyses. In order to make the standard deviation of split frequencies fall below 0.01 so that the occurrence of convergence could be certain, 1,159,000 generations for Rpb2 data,

Rpb2 analysis
Maximum parsimony analysis of 58 Rpb2 sequences was conducted using B. inermis and B. catharticus as outgroup (125 parsimony informative characters, 316 equally most parsimonious trees, CI = 0.791; RI = 0.932). The separated Bayesian analyses using GTR model resulted in identical trees with mean loglikelihood values -3928.47 and -3973.80 (data not shown). The tree topology generated by Bayesian analyses using the GTR model is similar to those generated by maximum parsimony. One of the most parsimonious trees with Bayesian PP and maximum parsimony bootstrap (1000 replicates) value is shown (Fig 1).
Two distinct copies of sequences were recovered from each nine accessions of E. trachycaulus (PI387895, PI440098, PI440101, H10665A, H3526, H10140, PI232150, H4228, H3995). Phylogenetic analysis clearly separated the two copies of sequences from each accession into two distinct groups, one copy with St genome diploid species, and another copy associated with H genome diploid species. Two copies of sequences from accession PI 236685 were recovered, but both were grouped into H clade in 99% bootsrtap support and 0.99 PP (Fig 1). Three distinct copies of sequences were found from accession PI232147, one was grouped into the St and two were placed into the H genome clade. Only one copy of sequence from accession PI232151 was recovered, and was placed into the St clade. The three H genome species (H. roshevitzii, H. stenostachys, H. brevisubulatum) were grouped together with a 95% bootstrap support in MP, and 0.99 PP in Bayesian analysis, and was sister to the H-like copy sequences from nine accessions of E. trachycaulus which were grouped together in a 86% bootstrap support and 0.98 PP. The H-like sequence from EF596764 and H3526 formed a group, and was sister to the H genome diploids.

Pepc analysis
Pepc gene from 13 accessions of E. trachycaulus were cloned and sequenced. At least 10 clones from each cloned PCR product were screened and sequenced. Two distinct copies of sequences were recovered from each 9 accessions of E. trachycaulus. The relative large insertion/deletion was observed between the two copies sequences from each accession, and shown in Fig 2. Three copies of sequences were recovered from accession PI232147, and relative large insertion/deletion among the three copies of sequences was also observed (Fig 2). Only one copy of sequence was recovered each from accession H10140, PI 232150 and PI 440101.
Phylogenetic analysis of the 54 Pepc sequences was performed using B. tectorum as an outgroup. The data matrix contained 958 characters, of which 600 were constant, 154 were parsimony uninformative, and 204 were parsimony informative. Heuristic searches resulted in 570 most parsimonious trees with a CI = 0.735 (excluding uninformative characters), and RI = 0.906. The Bayesian analyses using GTR model resulted in identical trees with mean loglikelihood values -5561.85 and -5701.88 (data not shown). The tree topologies generated by MP and Bayesian analyses were similar to each other. One of the most parsimonious trees with BS and PP values is shown in  Phylogenetic analysis separated two copies sequences each from accession PI 387895, H10665a and H3526 into two distinct clades, one in H genome and another in St genome clade (Fig 3) PI232151 and PI 236685 were separated into two H subclades, whereas two different copies of sequences each from accession PI 440098, H3995 and H4228 were placed into the same subclade (H2). Three different copies of sequences from accession PI 232147 were placed into the H1, H2 and St clade, respectively. TrnL/F analysis Phylogenetic analysis of 67 TrnL/F sequences was performed using B. tectorum as an outgroup. The data matrix contained 793 characters, of which 684 were constant, and 36 were parsimony informative. Heuristic searches resulted in 134 most parsimonious trees with a CI = 0.903 (excluding uninformative characters) and RI = 0.941. The separated Bayesian analyses using GTR model resulted in identical trees with mean log-likelihood values -2494.43 and -2590.54 (data not shown). One of the most parsimonious trees with BS values from MP and PP value from Bayesian analysis is shown in Fig 4. Phylogenetic analyses divided the 67 sequences into two clades. All sequences from Hordeum species were placed into one clade with 69% bootstrap support. All sequences from E. trachycaulus were grouped with the St genome, E b , E e , V, W, P and F in a 51% bootstrap value and 0.68 PP. Within this clade, the sequence from accession PI232147of E. trachycaulus formed a subclade with P. strigosa subsp. aegilopoides (St) in 60% SB and 0.76 PP support. The sequences from F, P and W genomic species formed a subclade in 82% BS and 0.98 PP support.

Multiple origins of E. trachycaulus
Previous studies using cpDNA sequences have confirmed that the diploid St genome species, Pseudoroegneria, is the maternal donor of E. trachycaulus [41,[54][55][56]. At present study, the phylogenetic analysis of TrnL/F data placed all sequences from E. trachycaulus with the sequences from Pseudoroegneria (St), Thinopyrum (E b , E e ), Dasypyrum (V), Agropyron (P), Eremopyrum (F) and Australopyrum (W) (Fig 4). It is difficult to rule out Thinopyrum, Dasypyrum, Agropyron, Eremopyrum and Australopyrum as potential maternal donors to E. trachycaulus. Our result is consistent with a study based on combined cpDNA restriction sites, rpoA sequences, and tRNA spacer sequences, in which the several North American Elymus species including E. trachycaulus were also grouped with Pseudoroegneria, Thinopyrum and Dasypyrum [41]. In contrast to the chloroplast TrnL/F data, phylogenies of two nuclear gene sequences (Rpb2 and Pepc) placed the E. trachycaulus into Pseudoroegneria and Hordeum clades, and clearly separated from the Thinopyrum, Dasypyrum and other diploid species analyzed here (Figs 1 and 3). Thus, Pseudoroegneria as a maternal donor to E. trachycaulus is consistent with nuclear data in this study and previous chloroplast data [41,[54][55][56], as well as genomepairing data [13]. Two distinct copies of Rpb2 sequences each from 9 out of thirteen accessions of E. trachycaulus were obtained, and were separated into St and H clades by phylogenetic analysis, indicating that the StH genome constitution of these nine accessions (PI387895, PI440098, PI440101, H10665A, H3526, H10140, PI232150, H4228, H3995) of E. trachycaulus. The Pepc sequence data confirmed the presence of StH genome in PI 387895, H10665A and H3526 (Fig 3). Sequence alignment revealed two distinct copies from each accession PI440098, H4228, PI 236685 and H3995. However, phylogenetic analysis grouped the two copies sequences each from accession PI440098, H4228, and H3995 into the H2 group, while the two copies of sequences from accession PI236685 were separated into H1 and H2 groups (Fig 3). Only one copy of Pepc sequence each was recovered from accession PI 440101, H10140 and PI 232150, and grouped into H2 group. Both Rpb2 and Pepc data suggested that accession PI 236685 contained two different copies of H genome (Figs 1 and 3). Only one copy Rpb2 sequence was obtained from accession PI 232151 and PI 372644, but two copies of Pepc sequences each from these accessions were obtained, and were phylogenetically grouped into H1 and H2. Three copies of Rpb2 and Pepc sequences were recovered from the accession PI 232147. The Rpb2 sequence data indicated the presence of StHH, while Pepc data indicated the presence of H1H1H2 sequences in this accession. Chloroplast data well separated the sequences of E. trachycaulus from H-genome species, indicating non-Hordeum species as maternal donor to E. trachycaulus, and presence of one copy of non-Hordeum genome in nuclear of tetraploid E. trachycaulus, most likely St genome as discussed above and suggested previously [41,[54][55][56].
In a study of tetraploid Elymus canius with StH genomes by [57], the Rpb2 data also indicated presence of either St1 or St2 together with H genome in E. caninus. The GBSSI data indicated the presence of Pseudoroegneria (St), Hordeum (H) and an "unknown" Pseudoroegneria-like genome in Elymus repens [58]. Our Rpb2 data here indicated that the St genome in E. trachycaulus was originated from either P. strigosa, P. stipifolia, P. spicata or P. geniculate. The Hordeum-like sequences of E. trachycaulus are polyphyletic in the Pepc tree, suggesting that the H genome in E. trachycaulus were contributed by multiple sources (Figs 1 and 3), whether due to multiple origins or to subsequent hybridization.

Genome diversity and evolution
Allopolyploidization, brought about by inter-specific or inter-generic hybridization followed by chromosome doubling, contributes to the evolution of new functions in duplicated genes [59][60][61]. During or after the process of allopolyploidization, rapid sequence elimination and rearrangement, cytosine methylation, as well as transposable element activation and epigenetic gene silencing in allopolyploids might have been occurred [3][4][5][6]. Rapid elimination of lowcopy DNA gene from one genome is a general phenomenon in newly synthesized allopolyploids after hybridization or after chromosome doubling [7,9]. The genome asymmetry caused by the lost of one parental gene copy was not restricted in Triticum or Elymus [8,62], it was also evident in allotetraploid soybean [63][64][65][66][67].
It has cytological been confirmed that E. trachycaulus is allotetraploid [20,21]. Two distinct copies of sequences for each single copy of nuclear gene are expected to be recovered from allotetraploid E. trachycaulus. However, two distinct copies were not recovered from all accessions for either Rpb2 or Pepc gene. Only one copy Rpb2 sequence was obtained from accession PI 232151 and PI372644, and one copy of Pepc sequence each was recovered from accession PI 440101, H10140 and PI 232150 even though more than ten clones were screened from each accession. Assuming no bias in cloning or PCR amplification, this gives a 99.9% chance of obtaining at least one copy of each of the two ancestral allelic types for the allotetraploid [68]. This might be due to mutation in the primers region causing failure of amplification of the "missing" gene copy. Another possibility might be genome convergent evolution in allopolyploids, partly because the St genome in Elymus species acquired this part of the sequence by the intergenome introgression of sequence segments from the H genome to the St genome and abundant genome-wide recombination following the fusion of St and H gametes, accompanying the process of polyploidization. Genome-wide recombination between the St and H genomes could result in the two genome sequences at this location being identical to the extent that we could not distinguish one from the other in this specific DNA fragment [57]. There were growing evidences that homoeologous rearrangements in Brassica napus [69][70][71][72][73], and exchange among homoeologous chromosomes [74] might lead to genetic asymmetry expression and promote convergent evolution of the two parental genomes and phenotypic variation in newly formed polyploids.
Surprisingly, the Pepc phylogenetic tree showed that St copies were recovered from only 3 accessions (PI 387895, H10665A, and H3526), and other accessions had 2 to 3 different H copies except PI232150 and H10140, from which only one copy of H genome sequence was recovered (Figs 2 and 3). One scenario is that St copy might been missed and not be found, but this situation is less likely because most accessions did not show the St copy even though at least 10 clones screened, and it is less likely that the St copy from about 10 accessions missed at the same time.
Sequence alignment (Fig 2) revealed deletions and insertions between/among the different copies of H sequences from the same accession (Fig 2). It has been reported instability of the Pepc sequences within Hordeum as revealed by numerous insertions and deletions, with some of them involving gain or loss of Stowaway-like transposable elements [75]. The two copies of Pepc sequences each from accession H4228, PI 440098, and H3995 and PI 232147 which were phylogenetically grouped into the same clade might be caused by instability of Pepc sequences in Hgenome. The two/three H-like sequences from accession PI 372644, PI 232151, PI 232147, PI 236685 were clearly separated into H1 and H2 clades in the phylogenetic analysis. The two distinct sequences each isolated from those accessions might be less likely explained by Pepc instabilities in Hordeum since phylogenetic analysis excluded the insertion/deletions. The two phylogenetic distinct copies of H sequences in these accessions might be caused by gene introgression from Hordeum into E. trachycaulus following polyploidization. Incomplete concerted evolution cannot be excluded which incompletely homogenized St copy of Pepc toward second H copy of Pepc. Concerted evolution appears to be a common feature of highly repetitive nuclear sequences, however, low-copy nuclear genes are also not free from concerted evolution [76,77].