Dynamic Evolution of Rht-1 Homologous Regions in Grass Genomes

Hexaploid bread wheat contains three subgenomes, A, B, and D, whose well-characterized ancestral genomes exist at the diploid and tetraploid levels, making wheat a good model species for studying evolutionary genomic dynamics. Here, we performed intra- and inter-species comparative analyses of wheat and related grass genomes to examine the dynamics of the homologous regions surrounding Rht-1, a well-known “green revolution” gene. Our results showed that, in the Rht-1 region, the divergence between the A genomes of the diploid and tetraploid species is greater than that between the A genomes of tetraploid and hexaploid wheat. The divergence of the D genome between diploid and hexaploid is lower than that of the A genome, suggesting that the D genome diverged later than the others. The divergence among the A, B and D subgenomes was larger than that among different ploidy levels for each subgenome, which mainly resulted from structural variation caused by insertions, and perhaps deletions, of repetitive sequences. Meanwhile, the repetitive sequences caused further genome expansion after the divergence of the three subgenomes. However, several conserved non-coding sequences were found to be shared among the three subgenomes of wheat, suggesting that they may have played an important role in maintaining the homoeologous relationship of the three subgenomes. This is a pilot study of evolutionary dynamics across wheat ploidy levels, subgenomes and differently related grasses. Our results provide new insights into the evolutionary dynamics of the Rht-1 region at the sequence level as well as the evolution of wheat during polyploidization.


Introduction
"Evolutionary dynamics" can be used to study evolutionary mechanisms and processes. Because of the presence of homoeologous genes, allopolyploids are suitable for studying sequence structure, nucleotide diversity, and evolutionary relationships at homoeologous loci, providing insights into the evolutionary dynamics of functionally important loci [1,2]. Patterns and mechanisms of evolutionary dynamics underlying polyploid evolution, which are still poorly understood, can have an impact on breeding programs particularly for genetic improvement of new crop species such as Triticale.
Bread wheat (Triticum aestivum L.), one of the best-characterized examples of genome polyploidization, evolved through two spontaneous hybridization events. The first occurred some 500,000 years ago between the diploid species T. urartu (AA) and an unknown B genome species probably belonging to the Sitopsis group of Ae. speltoides, giving rise to a tetraploid AABB genome species. The second took place some 8,000-10,000 years ago between the tetraploid AABB genome species and the diploid Ae. tauschii, the D genome donor, giving rise to present-day bread wheat (2n=6x=42, AABBDD) [3]. Compared with other allopolyploids, wheat is considered a young polyploid because of its relatively recent speciation. Wheat has thus long been used as a classical system for studying allopolyploidization in flowering plants.
Comparative genomics is often used to investigate the evolutionary relationships of genomes from different species and serves as an efficient tool for studying genome sequence composition, structure, gene duplications, the origin of new genes and colinearity between different genomes [4][5][6][7][8]. Recently, bread wheat has been used to study the origin of species, chromosome rearrangements, structural variations, and the amplification of transposable elements during polyploidization [7][8][9][10]. The genome of hexaploid wheat is about 16,000 Mb and contains up to 80% repetitive sequences [11]. Furthermore, the complexity of the bread wheat genome as an allopolyploid makes it challenging to sequence completely. To gain a first view of the wheat genome, several studies have compared important genes and their flanking genomic regions in wheat, including the D-hordein, HMW-glutenin, Acc, Hardness, and Q gene loci [4,5,9,12,13]. These studies have provided in-depth knowledge of the composition and organization of the genomes and revealed subtle forms of conservation and divergence between homologous genomic regions. To date, the majority of comparative sequence analyses have centered on either wheat polyploids and their diploid ancestors or the grass genomes of distantly related diploid species. Hence, the evolutionary dynamics of loci controlling important traits after recent polyploidization in wheat, in comparison with related grasses representing broad lineages, have not been adequately addressed. A detailed sequence comparison of the genomes of wheat polyploids, their diploid ancestors and related grasses will allow a better understanding of the mechanisms underlying these evolutionary events during polyploidization.
The wheat reduced height-1 (Rht-1) genes play a major role in modern agriculture. The Rht-B1b and Rht-D1b alleles of the Rht-B1 and Rht-D1 genes have been widely used since the start of the green revolution and are an important component of crop yield improvement. Some alleles of Rht-1 conferring dwarfism have been cloned [14][15][16][17]. However, little is known about the molecular basis of the evolutionary events that have shaped the Rht-1 locus regions in wheat. The availability of several sequenced grass genomes representing diverse lineages, for example Oryza sativa, Sorghum bicolor, Brachypodium distachyon, Zea mays and Setaria italica [18][19][20][21][22][23][24], provides an unprecedented opportunity to better understand the genomic composition and evolution of Rht-1 homologous genomic regions across different grass species.
In this study, we identified and sequenced Rht-1 homoeologous BACs from the diploid, tetraploid and hexaploid wheat genomes of T. urartu, Ae. tauschii, T. durum and T. aestivum. We investigated the molecular basis of the genomic rearrangements that occurred at the Rht-1 locus by comparing the corresponding sequences of diploid, tetraploid, and hexaploid wheat species (Triticum and Aegilops), which diverged relatively recently. We also characterized sequence variation to investigate the molecular evolution of the wheat Rht-1 homologous genomic regions during polyploidization. To gain broad insight into the patterns and evolutionary mechanisms of the Rht-1 homologous regions across diverse grass lineages, we included the orthologous regions of O. sativa, B. distachyon, S. bicolor, Z. mays and S. italica in the comparison. The comparative analyses of these orthologous regions provided a first view of large-scale sequence divergence in the wheat A, B, and D genomes and enhanced our understanding of the molecular evolution of Rht-1 genomic regions across diverse lineages of grasses.

Screening and sequencing of the wheat BACs
The diploid and tetraploid wheat BAC clones of the A and B genomes were selected from the T. urartu and T. durum (cv. Langdon) BAC libraries by Southern hybridization screening. The hexaploid wheat BAC clones of the A, B and D subgenomes were obtained from the T. aestivum (cv. Chinese Spring) BAC library by screening with PCR primers specific to the Rht-B1b and Rht-D1b genes. The diploid wheat BAC clone of the D genome was selected from the Ae. tauschii (AL8/78) library with the same PCR primers. A total of four wheat BAC libraries were used to isolate the BACs covering Rht-D1b or its homologous genes. The diploid libraries, from T. urartu and Ae. tauschii (AL8/78), had coverages of 1.8-fold and 2-fold, respectively, and were screened first. The tetraploid library, constructed from T. durum (cv. Langdon) with 5.1-fold coverage, was kindly provided by Dr. Yong-Qiang Gu. Finally, the hexaploid library was constructed from an Aibai/CS near-isogenic line (NIL) of T. aestivum, with an estimated coverage of 6.5-fold.
BAC DNA free of E. coli genomic DNA was isolated with the QIAGEN Large-Construct Kit and mechanically sheared into fragments of 2-5 kb with a HydroShear (Gene Machines). The 2-5 kb fragments were blunt-ended with mung bean nuclease and dephosphorylated with shrimp alkaline phosphatase (SAP). They were then ligated into the pCR4-TOPO vector and transformed into TOP10 electrocompetent cells. Individual clones were sequenced in both forward and reverse directions using ABI BigDye 3.1 terminator chemistry and analyzed on an ABI 3730XL automated capillary sequencer. Sequencing reads were preprocessed with PHRED [25] and assembled with the Lasergene v7.10 software (http://www.dnastar.com/) using the parameters Match Size 40 and Minimum Match Percentage 98. Gaps were closed and weak consensus regions were strengthened by direct sequencing of subclones using primer walking, with dGTP mix and DMSO added to the sequencing reactions. The VISTA family of tools (http://genome.lbl.gov/vista/index.shtml) was used to identify the conserved non-coding sequences (CNSs) in the Rht-1 region [26]. [...] [27]. Then these candidate miRNA sequences were folded to test their secondary structures using the Mfold web server [28]. Target predictions were performed by searching the wheat EST (http://www.tigr.org/tdb/e2k1/tae1/index.shtml) and KOMUGI (http://www.shigen.nig.ac.jp/wheat/komugi/) databases for sequences complementary to the miRNAs, allowing up to three mismatches and no gaps between miRNAs and target mRNAs [29]. The software Gepard-1.2 (http://www.warezkeeper.com/gepard-v.1.2-crack-serial-keygendownload.html) was used for the dot-plot analysis, with a sequence criterion of 60% and a window size of 40 bp.
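As a rough illustration of the dot-plot criterion described above, the following Python sketch marks every pair of 40-bp windows whose identity is at least 60%. It is a naive sliding-window comparison written for clarity, not the algorithm implemented in Gepard; the function names, and the reading of the 60% criterion as a per-window identity threshold, are assumptions made here.

```python
def window_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length windows."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def dot_plot(seq1: str, seq2: str, window: int = 40, min_identity: float = 0.60):
    """Return (i, j) window start pairs whose identity meets the threshold.

    Brute-force comparison of every window pair; suitable only for short
    test sequences, not for BAC-sized inputs.
    """
    seq1, seq2 = seq1.upper(), seq2.upper()
    hits = []
    for i in range(len(seq1) - window + 1):
        w1 = seq1[i:i + window]
        for j in range(len(seq2) - window + 1):
            if window_identity(w1, seq2[j:j + window]) >= min_identity:
                hits.append((i, j))
    return hits
```

Plotting the returned (i, j) pairs gives the familiar diagonal for collinear sequence, and an insertion or deletion in one sequence appears as a break or shift in that diagonal.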
In addition, Rht-1 homologous genomic regions were identified in and downloaded from the genomes of Setaria italica (http://www.phytozome.net/foxtailmillet.php), Brachypodium distachyon (http://www.brachypodium.org/), Oryza sativa (http://rice.plantbiology.msu.edu/), Sorghum bicolor (http://www.phytozome.net/sorghum) and Zea mays (http://www.plantgdb.org/ZmGDB/). To keep the comparative analyses consistent, these sequences were annotated by the same standard used for the wheat sequences.

Data analyses
Full-length elements were dated by comparing their 5' and 3' LTR sequences [30]. The distance between the two LTR sequences of each element was calculated with MEGA 4.0 using the Kimura 2-parameter model to estimate the insertion times of LTR retrotransposons [31]. In this study, we used the average substitution rate of 6.5×10⁻⁹ substitutions per synonymous site per year, estimated from the adh1 and adh2 loci of grasses [32]. The time (T) since element insertion was calculated using the formula T = K/(2r), where T is the time of divergence, K is the divergence, and r is the substitution rate [33]. The molecular clock was calibrated using 60 MYA for the divergence of T. aestivum from Z. mays. MEGA 5.0 was also used to generate neighbor-joining (NJ) trees with bootstrap values. The codeml module of PAML version 4.7 with the F3X4 codon frequency model was used to calculate the pairwise nonsynonymous (Ka) and synonymous (Ks) substitution rates, while nucleotide substitutions in UTRs and introns were calculated with the baseml module [34].
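The dating procedure can be sketched in a few lines of Python. The example below computes the Kimura 2-parameter distance between two already aligned LTR sequences and converts it to an insertion time with T = K/(2r), using the rate of 6.5×10⁻⁹ quoted above. It is a minimal illustration under stated assumptions, not the MEGA implementation: the function names are hypothetical, and columns containing gaps or ambiguous bases are simply skipped.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gap and ambiguous columns
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitions
    q = transversions / compared  # proportion of transversions
    # The logarithm is undefined for saturated pairs; math.log then raises.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def insertion_time_years(k: float, rate: float = 6.5e-9) -> float:
    """Time since insertion, T = K / (2r), in years."""
    return k / (2 * rate)
```

For example, an LTR pair with K = 0.013 gives T = 0.013 / (2 × 6.5×10⁻⁹), which is one million years.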

Sequencing and analysis of the wheat BACs
A total of seven BAC clones containing the Rht-1 homologous regions of T. urartu, Ae. tauschii, T. durum and T. aestivum were selected for sequencing. Of these, three BAC clones were from the A genome, two from the B genome, and two from the D genome (Table 1).
Annotation of the BAC sequences indicated that the Fragile-X-F-like (gene 1), DUF6-like (gene 2) and Rht-1 (gene 3) genes were all supported by wheat ESTs in GenBank. These three genes were shared among the A, B, and D genomes (Table S1, Figure S1A, B, C). RT-PCR analyses revealed that the Rht-1 homologous genes of the A, B, and D genomes were ubiquitously expressed at all developmental stages and in all tissues examined, while the DUF6-like gene was expressed only in the stem and seed, indicative of its conditional expression in wheat (Figure S1D). To further examine the transcriptional regulation of the Rht-1 homologous genes, we analyzed their 1,500-bp upstream promoter sequences using the PlantCARE database. Blast searches found highly conserved essential cis-regulatory promoter elements, including TATA, CAAT, and GC-box, across the investigated species; the SP1 motif and MBS were also detected in all these subgenomes. The G-box (CACGTG) was shared by the A and B subgenomes but not the D subgenome of T. aestivum; the 5' UTR Py-rich stretch element (TTTCTTCTCT), which usually confers high transcription levels, was found in the B and D subgenomes of T. aestivum but not in the A subgenome. A gibberellin-responsive element, the P-box (CCTTTTG), was found only in the D subgenome of T. aestivum (Table S2). Repetitive sequences were the major components of the sequenced genomic regions and consisted of a wide variety of transposable elements (TEs). The repetitive sequence content of individual BACs ranged from 37.36% (BAC 105A8) to 75.31% (BAC C4) (Table S3). Among the TEs, the content of DNA transposons ranged from 0.63% to 11.44%, while retrotransposons accounted for 35.63% to 66.14%. Copia, Gypsy, CACTA, and MITEs were the most prominent superfamilies residing within the homologous genomic regions (Table S4, S5 & S6).
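To make the motif comparison concrete, the sketch below scans a promoter sequence for the three elements whose sequences are given above (G-box, P-box and the 5' UTR Py-rich stretch) on both strands. It uses exact string matching only and is not how PlantCARE performs its searches; the dictionary, the function names and the 0-based coordinate convention are assumptions introduced for illustration.

```python
MOTIFS = {
    "G-box": "CACGTG",
    "P-box": "CCTTTTG",
    "5'UTR Py-rich stretch": "TTTCTTCTCT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def scan_promoter(promoter: str, motifs: dict = MOTIFS) -> dict:
    """Return {motif name: [(0-based start, strand), ...]} for exact matches."""
    promoter = promoter.upper()
    hits = {}
    for name, motif in motifs.items():
        patterns = {"+": motif}
        rc = reverse_complement(motif)
        if rc != motif:  # palindromic motifs (e.g. CACGTG) are reported once
            patterns["-"] = rc
        found = []
        for strand, pattern in patterns.items():
            start = promoter.find(pattern)
            while start != -1:
                found.append((start, strand))
                start = promoter.find(pattern, start + 1)
        hits[name] = sorted(found)
    return hits
```

Running such a scan on the 1,500-bp upstream sequence of each homoeolog and comparing the resulting dictionaries would show which elements are present in one subgenome and absent from another.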
A total of 13 complete/intact LTR retrotransposons were identified, with variable insertion times. Of these, the oldest were RLG_Fatima_315P18-1 and RLG_Fatima_17O6-1, with insertion times of ~2.15 MYA, whereas RLC_WIS_1051O6-2 and RLC_WIS_351D1-2 were the youngest, with an insertion time of ~0.27 MYA (Table S4). In addition, four candidate miRNAs (TamiR1122, TamiR1137, TamiR1132 and TamiR1121) and simple sequence repeats (SSRs) were identified in these BAC sequences (Tables S7 & S8).

Genomic divergences of the homologous Rht-1 gene regions at different ploidy levels
To characterize the sequence variation in the Rht-1 homologous regions of the wheat genomes, we performed dot matrix analyses between pairs of corresponding genomes from two different ploidy levels. Genomic divergences appear as gaps in the main diagonal of the matrix (Figure 1). The average conserved fragment size (CFS) and the conserved sequence ratio (CSR) were calculated from sequence alignments to evaluate the sequence divergence (Table 2).
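The exact formulas for CSR and CFS are not spelled out here, so the following Python sketch rests on an assumed reading: CSR is taken as the percentage of the overlapping region covered by conserved aligned blocks, and CFS as the mean length of those blocks after merging overlaps. The function names and the half-open (start, end) block representation are likewise hypothetical.

```python
def merge_blocks(blocks):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(blocks):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def conservation_stats(blocks, overlap_length):
    """Return (CSR in percent, CFS in bp) under the assumed definitions."""
    merged = merge_blocks(blocks)
    lengths = [end - start for start, end in merged]
    conserved = sum(lengths)
    csr = 100.0 * conserved / overlap_length
    cfs = conserved / len(lengths) if lengths else 0.0
    return csr, cfs
```

Under these assumed definitions, a long uninterrupted alignment gives a CSR near 100% and a large CFS, whereas transposable element insertions fragment the alignment and lower both values.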
Sequence comparisons of the Rht-1 homologous regions among the three A genomes showed that they shared all three genes with highly conserved collinearity. These collinear genes have the same transcriptional orientations and exon/intron structures at the different ploidy levels (Figure S1A, B, C). However, when CSR and CFS were used to examine the sequence variation, both were much higher between tetraploid and hexaploid than between diploid and tetraploid or between diploid and hexaploid (98.9%, 67.0%, 46.9% and 47,962 bp, 9,173 bp, 1,739 bp, respectively; Table 2). This is in accordance with the larger number of gaps (8 gaps) detected in the overlapping regions between the diploid and tetraploid genomes (Figure 1A). Analysis of these gap regions revealed that LTR retroelements, a WIS-type element (WIS-2), a WIS-type element (WIS-3) and a Copia-type retroelement fragment caused Gap1 and Gap2, a DNA transposon MITE insertion caused Gap2, a tRNA element caused Gap4, and unknown sequence deletions caused Gap7 and Gap8. Gap3 and Gap6 were caused by two regions of high GC content that could not be sequenced through. In contrast, only one gap was observed between the A genomes of the tetraploid and hexaploid species. This gap was generated by the insertion of an 800-bp sequence in the T. aestivum sequence (Figure 1B, S2A). Because the tetraploid and hexaploid sequences are highly conserved, seven gaps were observed between the diploid and hexaploid species, similar to those between the diploid and tetraploid genomes (Figure 1C, S2A). Thus, greater genomic divergence was present between the diploid and tetraploid species than between tetraploid and hexaploid wheat. The CFS and CSR values in the 68,851-bp overlapping region of the B genomes of the tetraploid and hexaploid were very similar to those of the two A genomes from the polyploid wheats (Figure 1D, S2B and Table 2).
Only a few sequence differences arose during the evolution from tetraploid to hexaploid wheat, including a 43-bp insertion of unknown sequence encompassing a duplication of TGCGGGCATGCGGCCGATGGCGGA.
The divergence between the diploid and hexaploid D genomes (there is no tetraploid D genome in wheat) was also examined (Figure 1E, S2C and Table 2). An overlapping region of 81,518 bp was aligned between the two D genomes from the diploid and hexaploid species, 93.7% of which was conserved, including an LTR retrotransposon, a CACTA element, Stowaway MITEs, tRNA sequences of SINEs, the SSR sequences (CT)23, (GAA)8 and (CGGT)5, and two predicted genes. Only three gaps were observed in this region; one was caused by the deletion of a tRNA element in Ae. tauschii, and the other two may have been caused by sequencing problems in GC-rich genomic regions.
The nucleotide substitution rates (NSR) were also used to analyze sequence divergence. We estimated the NSR between the diploid, tetraploid and hexaploid wheats for the A, B, and D genomes, respectively, based on pairwise comparisons of the Rht-1 and DUF6-like genes (Table 3; Table S9). Few or no nucleotide substitutions were detected between the two homologous A genomes or between the two homologous B genomes from T. durum and T. aestivum, or between the two homologous D genomes from Ae. tauschii and T. aestivum. However, when the diploid A genome was compared with the A genomes from the polyploid wheats, higher nucleotide divergence was found, suggesting that the A genome diverged more from diploid to polyploid than from tetraploid to hexaploid, and that the D genome diverged little from diploid to hexaploid.

Pair-wise comparisons of the orthologous Rht-1 gene regions among the A, B, and D subgenomes
Pairwise comparisons of the orthologous Rht-1 regions of the A, B, and D genomes were further performed to examine sequence divergence and conservation among the three wheat subgenomes. The CSR values between the A (hexaploid) and B (hexaploid), A and D (hexaploid), B (hexaploid) and D (hexaploid), A (diploid) and B (tetraploid), B (tetraploid) and D (diploid), and A (diploid) and D (diploid) genomes were 39.8%, 28.3%, 24.2%, 46.9%, 21.4% and 22.7%, respectively, and the CFS values were 2,582 bp, 2,066 bp, 3,526 bp, 1,739 bp, 3,392 bp, and 3,606 bp. Both the CFS and CSR values were much smaller than those of homologous wheat genomes from different ploidy levels (Table 2), suggesting greater divergence among the three subgenomes than between homologous genomes at different ploidy levels. In our comparative analysis, only six regions were found to be conserved across the three wheat subgenomes (Figure 2, S3). Of these, regions I, II and III contained genes 1, 2 and 3, respectively. Gene 1 was absent from the B genome sequence, most likely because the sequenced BAC region did not cover the gene. Besides these gene regions, two sequence regions, Regions IV and V, contained three CNSs shared by all three wheat genomes. The average lengths of CNS 1-3 are about 525 bp, 559 bp and 676 bp, respectively. CNS 1 and CNS 2 were located about 10 kb and 8 kb downstream of gene 2 (D genome), respectively. CNS 3 was located about 6,000 bp upstream of Rht-1 (D genome) (Figure 3). These CNSs from the different subgenomes had sequence similarities of at least 80% (Figure S4) and belong to unknown sequences.
Several other conserved sequence regions shared by only two genomes, but not by all three subgenomes, were also detected. For instance, conserved region VI was shared by the A and D genomes (Figure 2A). Except for regions II, III, IV and V, no other sequences were shared by the B and D genomes (Figure 2B). Region VII, a conserved region composed of unknown sequences, was present only between the A and B genomes (Figure 2C). We also compared the orthologous Rht-1 regions between T. urartu and Ae. tauschii (Figure 2D). In addition to gene regions II and III, regions VIII (containing CNS 1 and CNS 2) and IX (containing CNS 3) were shared by the two genomes. The comparison between T. urartu and the B genome of T. durum showed three conserved regions, X (containing CNS 1 and CNS 2), XI (the same as region VII) and XII (containing CNS 3), in addition to gene regions II and III (Figure 2E). Regions XIII (containing CNS 1 and CNS 2) and XIV (containing CNS 3) were shared by the B genome of T. durum and the D genome of Ae. tauschii (Figure 2F).
The nucleotide substitution rates (NSR) in the Rht-1 and DUF6-like gene regions among the A, B, and D genomes of hexaploid wheat were much higher than the NSR between any two homologous wheat genomes from different ploidy levels (Table 3; Table S9). Phylogenetic analysis allowed us to establish the evolutionary relationships (orthology versus paralogy) between the different members of the Rht-1 and DUF6-like genes in wheat species at different ploidy levels. Although the two (AB) or three (ABD) subgenomes have been co-evolving in the tetraploid or hexaploid wheat species, phylogenetic inferences based on the Rht-1 and DUF6-like gene sequences showed that both genes formed clusters that grouped the sequences from homologous genomes together with strong bootstrap support (Figure S5).
Both DUF6-like and Rht-1 appeared to be under strong purifying selection, as evidenced by Ka/Ks values much less than 1 (Table 3, Table S9). Considerable variation in nucleotide substitutions was observed between these two genes, suggesting that they evolved at different rates; the number of nucleotide substitutions per site in the CDS of Rht-1 is greater than that in the DUF6-like gene (P < 0.001), except for the D: Ae. tauschii versus D: T. aestivum comparison, indicating that the former probably evolved faster than the latter (Table 4). [...] Table S10) revealed high synteny conservation of orthologous genes but large genomic divergence in the intergenic regions. The three genes, Fragile-X-F-like, DUF6-[...] Figure 4). In addition, high gene sequence conservation was also observed across the grass species (Figure S6). The Fragile-X-F-like gene harbored fourteen exons in all species, thirteen of which were identical in length across the species (Figure S1A). All species had eight exons in the DUF6-like gene, seven of which were identical in length across all grasses (Figure S1B). Comparisons of the amino acid sequences of the intronless Rht-1 homologous genes suggested that these proteins contain conserved domains (the N-terminal DELLA and TVHYNP motifs and the C-terminal VHIID, LHR I, LHR II, PFYRE and SAW domains) and non-conserved regions, including the spacers between DELLA and TVHYNP and between TVHYNP and poly S/T/V (Figure S6C, S6D). The majority of the predicted amino acid sequences, such as the C-terminal domain, were highly conserved among these species, with slightly variable lengths. Furthermore, we detected a CNS shared by all the surveyed grasses, in accordance with a previous report [35]. We further determined the phylogenetic relationships of S. italica, B. distachyon, O. sativa, S. bicolor, Z. mays and the wheat species based on the amino acid sequences of both the DUF6-like and Rht-1 genes (Figure S5).
Analyses of these two genes generated similar topologies with high bootstrap support, consistent with the commonly recognized evolutionary relationships of the grass species under study. The results suggested that B. distachyon is more closely related to wheat than the other four grass species, supporting the notion that Brachypodium can serve as a model plant for analyses of the wheat genome [36].

Sequence structure and molecular evolution of the orthologous Rht-1 regions across different grass species
The Rht-1 homologous regions varied in size among the grass genomes, in the order S. italica < B. distachyon < S. bicolor < O. sativa < T. aestivum < Z. mays (Figure 4). Accordingly, gene density was lowest in maize and highest in foxtail millet (Table S11). Further characterization of these Rht-1 orthologous regions demonstrated that TEs have played an important role in determining genome size variation, as indicated by TE contents of 4.84%, 12.08%, 8.49%, 39.14%, 50.57% and 80.78% in foxtail millet, Brachypodium, sorghum, rice, wheat and maize, respectively (Table S11). Although we did not detect any intact LTR retrotransposons in B. distachyon, sorghum or S. italica, one, two and three intact retrotransposons were found in wheat, rice and maize, respectively (Table S12). The estimated insertion times of these retrotransposons ranged from 0 to 1.08 MYA, indicating their active turnover during the evolution of the Rht-1 genomic regions. Therefore, the expansion and contraction of the Rht-1

Divergence times of the wheat homologous genomes and related grasses
The divergence times between the major Triticum and Aegilops lineages of the wheat species and related grasses were estimated separately using intron and synonymous sites of the Rht-1 and DUF6-like genes (Table 4). The results showed that the divergence of wheat from B. distachyon was more recent than that from the other grass species but much earlier than that of the three wheat subgenomes. The diploid Triticum and Aegilops progenitors of the A, B and D genomes all radiated at approximately the same time, 9.4751-11.8597 MYA. The divergence time of the homologous A genomes is estimated to be 1.2782 MYA both between diploid and tetraploid and between diploid and hexaploid wheat, while the diploid D-genome species, Ae. tauschii, was found to have diverged from hexaploid wheat only 0.0277 MYA (Table 4).
The identification of a few intact LTR retrotransposons shared by two homologous genomes in the Rht-1 regions permits us to further examine the sequence changes and divergence times of the homologous genomes in wheat. In this study, we used the colinear LTR retrotransposons identified in the Rht-1 regions and dated their insertion times to estimate the divergence of the two homologous genomes (Table S4). Of a total of 13 intact colinear LTR retrotransposons, three shared by the A genomes, two by the B genomes, and one by the D genomes were used to estimate the divergence times. Using the same molecular clock as for dating the LTR retrotransposon insertions, we estimated the divergence time by calculating the rates of nucleotide substitution between each pair of colinear retroelements (Table 5). The estimated divergence times of the two homologous A genomes from diploid and tetraploid and from diploid and hexaploid wheat are both around 0.68 million years ago (MYA). The divergence time of the two A genomes from tetraploid and hexaploid wheat ranged from 0.00 to 0.03 MYA, with an average of 0.013 MYA. The two B genomes in tetraploid and hexaploid wheat were estimated to have diverged within the last 0.01-0.03 MY, with an average of 0.02 MYA. The two D genomes from the diploid Ae. tauschii and hexaploid wheat diverged around 0.15 MYA (Table 5). The divergence times estimated from the colinear LTR retroelements appeared to be very similar to those based on the sequence similarity of the two genes described above.

Repetitive sequences were the main elements that have a great influence on conservation
In the present study, several parameters, including CSR, CFS, NSR and divergence time, were used to examine conservation and divergence in the Rht-1 regions among the three subgenomes, the three ploidy levels and the grass species. Our results revealed that the pattern of conservation and divergence was dynamic when sequences from the different ploidy levels, subgenomes, and grass genomes were compared. Repetitive sequences were the main elements with a great influence on conservation. In the present study, the main variation among the species and ploidy levels came from the insertion and/or deletion of repetitive sequences. The insertion time was around 2 MYA, much later than the divergence of the three subgenomes, suggesting that wheat genome expansion occurred after the subgenome divergence. Because of the rapid amplification, and probably deletion, of TEs, the intergenic regions of the wheat subgenomes are largely divergent. The repetitive sequences produced variation not only in the intergenic regions but also in the genic regions. For example, a new TRIM transposon inserted into the DELLA domain of Rht-B1 caused strongly reduced plant height [15,16]. Despite the large sequence divergence among subgenomes, we detected high conservation between homologous wheat genomes, especially between the A and B genomes of tetraploid and hexaploid wheat and between the D genomes of Ae. tauschii and hexaploid wheat. These results provide molecular support for the current breeding practice in which tetraploid wheat and Ae. tauschii are often used for modern wheat improvement.

The inter subgenome CNSs in wheat
In our study, we identified three CNSs shared by the three wheat subgenomes. CNSs are often rich in regulatory elements that may be involved in various biological functions. Uchida et al. reported that a K-box and an RB-box in the CNSs upstream of Knotted1 in grasses and SHOOT MERISTEMLESS (STM) in Arabidopsis regulate the expression of STM [37]. Another non-coding region, which contains maize-sorghum-rice CNSs, has been confirmed to play a cis-acting transcription-regulatory role [38]. The conservation of different CNSs can vary among plant species. CNSs conserved among all plant species can be named inter plant genome CNSs, while CNSs conserved among the grass species can be regarded as inter grass genome CNSs [35,39]. The three CNSs reported in the present study were conserved only among the three wheat subgenomes, so they were named inter subgenome CNSs. We speculate that widely distributed CNSs might serve more general biological functions, such as those of housekeeping genes, whereas CNSs shared by a small set of genomes might have more specific functions. Hence, the inter subgenome CNSs may play an important role in retaining the homoeologous relationship of the three subgenomes in wheat. The inter subgenome CNSs also suggest that the three subgenomes share a common ancestor. With the rapid progress of wheat genome sequencing, more inter subgenome CNSs will be discovered, and their structural characteristics and functions will be elucidated in the near future.

Evolution of the Rht-1 locus in different species
Comparative analyses of the Rht-1 locus regions from different species enhance our understanding of the structure and evolution of grass genomes. Divergence time analysis using the two gene sequences showed a significant overestimate of the time of tetraploid wheat formation, which occurred no more than 0.2756 MYA (Table 4). This upper estimate is fairly consistent with previous reports [3,5]. The estimates based on synonymous substitutions are only approximate because of the very low sequence divergence. Based on the nucleotide substitution rates of the two genes, the divergence times of A and B, A and D, and B and D were estimated to be 11.8597, 9.4751 and 9.8010 MYA, respectively. These estimates are very consistent with those based on the ACC1 loci [25]. The divergence of the B genome might have occurred prior to the separation of the A and D genomes. The diploid D genome was the last genome added to hexaploid wheat and showed high sequence conservation.