Comparative Sequence Analysis of the Ghd7 Orthologous Regions Revealed Movement of Ghd7 in the Grass Genomes

Ghd7 is an important rice gene that has a major effect on several agronomic traits, including yield. To reveal the origin of Ghd7 and sequence evolution of this locus, we performed a comparative sequence analysis of the Ghd7 orthologous regions from ten diploid Oryza species, Brachypodium distachyon, sorghum and maize. Sequence analysis demonstrated high gene collinearity across the genus Oryza and a disruption of collinearity among non-Oryza species. In particular, Ghd7 was not present in orthologous positions except in Oryza species. The Ghd7 regions were found to have low gene densities and high contents of repetitive elements, and the sizes of the orthologous regions varied tremendously. The large transposable element contents resulted in a high frequency of pseudogenization and gene movement events surrounding the Ghd7 loci. Annotation information and cytological experiments indicated that Ghd7 is a heterochromatic gene. Ghd7 orthologs were identified in B. distachyon, sorghum and maize by phylogenetic analysis; however, the positions of orthologous genes differed dramatically as a consequence of gene movements in grasses. In addition, we identified sequence remnants of gene movement of Ghd7 mediated by illegitimate recombination in the B. distachyon genome.


Introduction
Comparative genomics is a powerful tool to study gene and genome evolution [1]. However, studies based on model genomes are largely insufficient to interpret the evolutionary history and mechanism of genomic changes. The genus Oryza provides an excellent model to study gene and genome evolution, with well-defined phylogenetic relationships and rich genomic resources available [2][3][4][5][6]. Comparative genomics in Oryza has provided insights into genome evolution [7][8][9], genome size variation [10,11] and the dynamics of gene evolution, such as lineage-specific gene deletions, repeat-mediated gene movements and de novo gene formation [4][12][13][14].
The genus Oryza, together with several completely sequenced grass species, is becoming a powerful system to elucidate the evolutionary origin of agronomically important genes. Ghd7, a CCT domain-containing gene located on the short arm of rice chromosome 7, controls the number of grains per panicle, plant height and heading date [25]. Enhanced expression of Ghd7 under long-day conditions plays a central role in the photoperiod pathway of flowering. The reduced function of Ghd7 is associated with adaptation of rice to regions with low temperatures and short growth seasons [25]. Ghd7 is thought to be an evolutionarily new gene, because it does not have homologs in Arabidopsis thaliana, and the protein sequence lacks a B-box domain, and the non-CCT portion differs from other CCT domain-containing proteins [25,26]. Interestingly, comparative sequence analysis among rice, sorghum and maize indicated that Ghd7 was absent from orthologous regions in the Andropogoneae lineage [27]. In order to uncover the evolutionary history of the Ghd7 locus, we performed a comparative sequence analysis of the Ghd7 orthologous regions in ten diploid Oryza species and related regions from B. distachyon, sorghum and Zea mays (maize). The Ghd7 regions showed distinctive heterochromatic features compared to previously analyzed euchromatic regions (Adh1, Moc1, Hd1) in the genus Oryza [4,13,14]. The evolutionary history of Ghd7 and the mechanism of gene movements were interpreted and discussed.

Identification and Sequencing of BAC Clones of the Ghd7 Orthologous Regions from the Genus Oryza
BAC clones covering the Ghd7 orthologous regions were isolated from Oryza rufipogon (AA), Oryza nivara (AA), Oryza glaberrima (AA), Oryza glumaepatula (AA), Oryza punctata (BB), Oryza officinalis (CC), Oryza australiensis (EE), and Oryza brachyantha (FF). Thirty-three BAC clones were sequenced using Illumina Genome Analyzer II and Roche/454 Genome Sequencer (Table S1). In total, we generated ~5.83 Mb of DNA sequence, representing 3.86 Mb of the corresponding Ghd7 orthologous regions (Table S1). Additional data points from the syntenic regions of non-Oryza species would be instrumental in reconstructing the evolutionary history of duplication, retention and syntenic gene order. Therefore, we also included the corresponding orthologous regions from O. sativa L. ssp. japonica (japonica), O. sativa L. ssp. indica (indica), B. distachyon, sorghum and maize for data analysis. A total of ~7.6 Mb of genome sequence data were annotated (Table S2).

Gene Organization of the Ghd7 Regions
The Ghd7 region from japonica was used as a reference for all comparative analyses. Genes were reannotated as described in the experimental procedures. Twenty gene models and two pseudogenes were annotated in the 553 kb region (Tables S3, S4). The intron/exon structures of the annotated genes were corrected according to the full-length cDNA or EST sequences (Table S3). A total of 163 genes were annotated in other Oryza species, B. distachyon, sorghum and maize (Tables S2, S5, S6, S7, S8).
The gene densities of the Ghd7-surrounding regions were much lower than those of the Moc1, Adh1 and Hd1 regions [4,13,14], ranging from 9 kb/gene in B. distachyon to 110 kb/gene in O. officinalis (Figure S2 and Table S2). Eleven pseudogenes were annotated (Figure S1). The pseudogenes were caused by the insertion of repetitive elements (J-19, GLA-19, GLU-19, SB-6 and ZM-15), premature terminations (J-2, I-2, I-17, GLA-8 and GLA-18) and mutation at a splice site and the initiation codon (GLU-17) (Figure S1). Most of the pseudogenes were detected as duplicated genes or lineage-specific genes. However, three conserved genes were also observed to be pseudogenized in O. glaberrima (GLA-8), sorghum (SB-6) and maize (ZM-15). In addition, six gene models were observed to have variations in intron/exon structures in Oryza species, and their structures were confirmed by RT-PCR experiments (P-9, P-11, A-11, B-9, B-11 and B-22); the gene structures of 11 predicted gene models in the non-Oryza species were found to differ from their orthologs in japonica (Figure S1).

High Gene Collinearity in the Diploid Oryza Species and a Loss of Conservation in B. distachyon, Sorghum and Maize
A comparison of the Ghd7 orthologous regions indicated high gene collinearity within the genus Oryza and a loss of sequence conservation in B. distachyon, sorghum and maize (Figure 1, Tables S6, S7, S8). Indica has the highest level of sequence identity with japonica, differing mainly by the insertion or deletion of several repetitive elements. Even within the AA genomes, O. glaberrima and O. glumaepatula contain lineage-specific genes or pseudogenes. The core gene Ghd7 was present in the diploid Oryza species only, and was absent from the syntenic regions in B. distachyon, sorghum and maize. There were three pairs of tandem gene duplicates in japonica (Figure S1). Interestingly, these duplicated genes showed dramatic changes in distal species compared to rice, including copy number, the presence or absence of pseudogenes, gene structure and the direction of expression. In particular, one copy of a duplicated gene pair (J-18, I-18, GLA-18, GLU-18) contains a single exon and no introns, suggesting that this duplicated gene copy may have arisen by reverse transcription of processed mRNA, with subsequent integration into the genome.
The sizes of the Ghd7 orthologous regions varied tremendously among rice, B. distachyon, sorghum and maize. As shown in Figure 1 and Figure S1, B. distachyon, sorghum and maize only contained approximately half of the 22 genes or pseudogenes in japonica. The genome size of B. distachyon (~272 Mb) [15] was smaller than that of rice (~400 Mb) [28], and the corresponding orthologous region (107 kb) in B. distachyon was approximately 20% of the size of the rice syntenic region (553 kb). Annotation of the syntenic regions indicated that the reduced size was associated with the absence of ten genes, the high gene density (8.92 kb/gene) and low transposable element (TE) content (6.81%) in B. distachyon.
The genome size of maize is ~2400 Mb [29], much larger than that of rice; however, the Ghd7 orthologous region of maize (230 kb) is ~40% of the size of the rice region (553 kb). The eight annotated genes/gene fragments present in maize were divided into three gene islands surrounded by blocks of repetitive sequences. Many of the intergenic retrotransposon blocks were nested insertions. The factors contributing to the unexpectedly short sequence in maize differed from those in B. distachyon: 13 genes were absent and a high content of repetitive elements (78.04%) was observed, especially LTR retrotransposons.
Large regions of non-conservation were observed between the sorghum and rice Ghd7 orthologous regions. The genome size of sorghum is ~730 Mb [16]; however, the Ghd7 orthologous region is ~1.91 Mb. Five complex regions divided the sorghum region into six parts (Figure 1). A series of tandem duplicated genes were identified in the five complex regions, and many of them are related to biotic and abiotic stress responses, such as F-box genes (Table S7).
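The size comparisons above are simple ratios, and a short sketch makes the arithmetic explicit. The region lengths, the rice gene count and the B. distachyon gene density are taken from the text; the function names are illustrative assumptions, and the implied gene count is only a consistency check, not a reported result.

```python
# Region lengths in kb, as reported in the text (sorghum: ~1.91 Mb).
REGION_KB = {
    "rice (japonica)": 553,
    "B. distachyon": 107,
    "maize": 230,
    "sorghum": 1910,
}

def relative_size(region_kb, reference_kb=553):
    """Length of a region as a fraction of the rice (japonica) reference region."""
    return region_kb / reference_kb

def implied_gene_count(region_kb, kb_per_gene):
    """Number of genes implied by a region length and a density in kb/gene."""
    return round(region_kb / kb_per_gene)

for species, kb in REGION_KB.items():
    print(f"{species}: {kb} kb = {relative_size(kb):.2f} x rice region")

# B. distachyon: 107 kb at 8.92 kb/gene implies 12 genes, which matches
# the 22 japonica genes/pseudogenes minus the ten reported as absent.
print(implied_gene_count(107, 8.92))
```

The ratios come out at roughly 0.19 for B. distachyon and 0.42 for maize, in line with the "approximately 20%" and "~40%" figures quoted above, while the sorghum region is about 3.5 times the length of the rice region.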

Abundance and Variation of Transposable Elements Contributed to the Complexity of the Ghd7 Orthologous Regions
TEs were annotated as described in the experimental procedures (Tables 1 and S9, S10, S11). The Ghd7 orthologous regions showed much higher levels of RNA TEs than the regions analyzed previously (Table S13), with the single exception of a lower RNA TE content in B. distachyon (~6%). Most of the RNA TEs are LTR retrotransposons (Table 1). The average insertion time of LTR retrotransposons from each species indicated that these LTRs spread after speciation (Table S10). In addition to the insertions, LTR retrotransposons were removed through illegitimate recombination (IR) and unequal homologous recombination (UR, converting LTR retrotransposons into solo-LTRs) in these orthologous regions [30][31][32][33]. However, we observed that the removal efficiency was not sufficient to counter the expansion caused by LTR retrotransposons in the Ghd7 orthologous regions in O. rufipogon, O. punctata, sorghum and maize (Table 2).

Ghd7 is a Heterochromatic Gene in Rice
The Gypsy retrotransposon content of the Ghd7 orthologous regions was greater than that of Copia, especially in O. rufipogon, O. nivara and O. punctata (Table S12). However, this difference was not observed in B. distachyon or maize, suggesting the Ghd7 regions in different species may be located in distinct chromatin environments (Table S12). From our annotation information, the Ghd7 orthologous regions in most diploid Oryza species display features of heterochromatin compared to other well-characterized euchromatic regions (Figure S2, Tables S12 and S13). Previous cytological studies indicated that the Ghd7 locus in chromosome 7 of rice is likely to be sited within a high density, or at least a moderately condensed, region of chromatin [34,35]. Furthermore, strict recombination suppression was detected in the Ghd7 region from QTL analysis in rice [25]. All of these results suggested that Ghd7 might be a heterochromatic gene in the genus Oryza [36][37][38].
To confirm this possibility, we performed a fluorescence in situ hybridization (FISH) experiment using the PAC (P1 artificial chromosome) clone 46D03, which contains the Ghd7 gene, as a probe. The result indicated that the PAC clone was located in a deeply stained region on the short arm of chromosome 7 of rice (Figure 2), indicating that Ghd7 is a heterochromatic gene in rice.

The Evolutionary History of Ghd7 in the Grass Genomes
To trace the evolutionary history of Ghd7, the CCT domain-containing genes from rice, B. distachyon, sorghum and maize were identified by BLASTP based on homology to the Ghd7 CCT domain. A total of 28 homologous genes were identified (Table S14). As shown in Figure 3, the CCT domain-containing genes from different species can be classified into two clades by phylogenetic analysis. SMART and Pfam analyses indicated that genes in Clade I contain a CCT domain only, similar to Ghd7. In contrast, both a B-box and a CCT domain were detected for genes in Clade II, typified by the CCT family gene CO.
Clade I can be further divided into two subgroups, each containing four genes from rice, B. distachyon, sorghum and maize. In subgroup II, LOC_Os10g41100, Bradi3g33340, Sb01g029080 and GRMZM2G004483_P01 are orthologous genes within a syntenic region. LOC_Os10g41100 and Ghd7 are duplicated genes in rice, and the gene duplication occurred prior to the divergence of rice, sorghum, maize and Brachypodium. In subgroup I, Ghd7 (LOC_Os07g15770), Bradi3g10010, Sb06g000570 and GRMZM2G381691_T01 were also defined as gene orthologs, as they met the following criteria: they formed a monophyletic clade; systematic information within this clade conformed to a generally accepted species tree; and the bootstrap value was >55% [39]. These results indicated that Ghd7 has orthologs in non-Oryza species, but they are not located in syntenic regions (Table S15).
To investigate the origins of Ghd7 and its orthologs, we compared the corresponding orthologous regions from rice, B. distachyon, sorghum and maize. As a basal species of the genus Oryza, O. brachyantha (FF) had been subject to whole genome sequencing by our laboratory. To ensure a reasonable evolutionary gradient, the corresponding regions in O. brachyantha were included in the comparative analysis (Figure 4). Overall gene collinearity was observed in these genomic regions, but Ghd7 and its orthologs are located in three different syntenic regions in Oryza, B. distachyon and the Andropogoneae lineages (Figure 4), indicating Ghd7 was present prior to the divergence of rice, B. distachyon, sorghum and maize. Therefore, the Ghd7 orthologs must have moved to their current positions by undefined mechanisms in some species.

The Mechanisms for Movements of Ghd7 and its Orthologs
A whole genome duplication event occurred ~50-70 million years ago in grasses, prior to the divergence of major cereals [40,41]. Many of the duplicated genes were lost quickly after duplication through large-scale chromosomal rearrangements and deletions, leading to diploidization [42][43][44][45]. If the movement of Ghd7 or its orthologs resulted from the differential fractionation of duplicated genes derived from the whole genome duplication, Ghd7 and its orthologs should be located on homoeologous chromosomes. Rice chromosome 3 is homoeologous to chromosome 7 where Ghd7 resides. However, the Ghd7 orthologs are located on chromosome 3 of B. distachyon, chromosome 6 of sorghum, and chromosome 10 of maize, which are homoeologous to chromosome 2 and chromosome 4 of rice, respectively (Figure 4), indicating Ghd7 and its orthologs are not located on homoeologous chromosomes. In fact, the duplicated region of the Ghd7 orthologous region could not be identified in rice at all, probably due to the heterochromatic features. Therefore, we conclude that the movements of Ghd7 and/or its orthologs did not result from the fractionation of duplicated genes following the whole genome duplication in grasses.
Based on the model shown in Figure 5, we identified evidence for the movement of Bradi3g10010, the Ghd7 ortholog in B. distachyon. We modeled that a double-strand break (DSB) was caused by insertion of an unknown transposable element, and a DNA fragment containing Bradi3g10010 was used to fill the gap. A 10 bp target site duplication flanking the unknown transposable element and one side of Bradi3g10010 was detected (Figure 5c). Unfortunately, we could not detect clear evidence of Ghd7 movement in other species due to the high content of repetitive elements in the surrounding regions.

Additional Gene Movements Mediated by DSB Repair in the Ghd7 Orthologous Regions
Four lineage-specific genes or gene fragments in the Ghd7 orthologous regions have signatures of DSB repair (Figure 6). GLA-U1 in O. glaberrima and A-U4 in O. australiensis were hypothesized to have been moved to their current positions by repair of DSBs created by insertion of MuDR-5_OS and HELITRO-N4_OS, respectively (Figure 6a and 6c). Tandem repeat motifs are hotspots for recombination [46]. GLA-U2 located on chromosome 7 of O. glaberrima (the acceptor) and its homolog LOC_Os05g34200 located on chromosome 5 were flanked by an identical 6-bp sequence signature on one side (Figure 6b). In addition, simple sequence repeats surrounding A-U5 in O. australiensis suggested that the A-U5 movement was mediated by either unequal crossing-over or template slippage (Figure 6d). In particular, the donor regions of GLA-U1 and GLA-U2 were also observed.

Gene Movements Mediated by Repeat Elements
The movements of four genes were found to be mediated by transposition of repeat elements (Figure S3). GLU-U7, A-U6 and B-U3 were embedded within different types of Pack-MULEs. GLU-U7, a GRF domain-containing gene, evolved through the capture of a gene fragment from elsewhere in the genome by TNR12/MuDR and was inserted into the intron of pseudogene GLU-19. A-U6, a B-box-containing gene homologous to LOC_Os02g07930, moved to its current location by MERMITEA/MuDR. B-U3, which is located between B-8 and B-9 in O. brachyantha, was formed through transposition of OSTE1 carrying a DNA fragment from the second exon of LOC_Os10g34340. Finally, ZM7 moved to its current position by EnSpm-13_ZM, a CACTA family transposon in maize. Consistent with other studies, DNA TEs were more frequently found to capture gene fragments through transposition [16,17,21].

Discussion
A growing number of species have been fully sequenced, providing insight into genome structural organization and evolution. Comparative analyses have shown extensive conservation of gene order, but the loss or gain of genes or genomic segments can be easily detected in closely related species and is important for genome organization and evolution. For instance, comparison of grass genomes indicated gene expansion in the evolution of grass-specific genes [15]. However, gene loss in duplicated regions was implicated in returning a paleopolyploid to a diploid state after whole-genome duplication [16,17]. Gene movement is a specific type of gene gain in which genes or genomic segments are gained at acceptor sites; it appears as "movement" because of the subsequent gene loss at the donor sites. The loss and gain of genes or genomic segments usually occur during the cell cycle, when recombination takes place at double-strand breaks via the homologous recombination (HR) or non-homologous end joining (NHEJ) pathways [46][47][48][49]. This mechanism has been found to be involved in many human diseases [50,51]. In addition, gain of genes or genomic segments can also be mediated by transposable element capture or retroposition. However, the frequency of these mechanisms is much lower than that of recombination [21].
Ghd7 is an agriculturally important gene that affects the number of grains per panicle, plant height and heading date. In our study, we found that Ghd7 was conserved throughout the genus Oryza, but absent in orthologous positions in non-Oryza species. However, Ghd7 was not deleted from the genome, but moved to different genomic locations in different subfamilies of grasses. In B. distachyon, features indicating Ghd7 movement were detected. DNA double-strand break (DSB) repair can take place at any position in the genome and is another mechanism for gene movement. Wicker et al. (2010) proposed a model in which a DSB could be repaired with a foreign DNA fragment containing a gene after template slippage or an unequal crossing-over event through synthesis-dependent strand annealing. As a result of a DSB, which was induced by an unknown DNA TE, a DNA fragment containing Bradi3g10010 was used as a template for DSB repair by microhomology or non-homologous end joining [52]. Unfortunately, footprints for movements of Ghd7 in maize, sorghum, and foxtail millet have not been identified, possibly because they are obscured by a high density of repetitive elements.
The whole genome analysis of rice and B. distachyon has indicated that "copy-and-paste" is the dominant duplication process for most non-collinear genes. The apparent movement of genes may result from subsequent deletions in the donor region [21]. The putative donor regions of Ghd7 and its orthologs were not observed in the species analyzed, suggesting that the original copy in the donor region was deleted.
The Ghd7 orthologs were identified in both short-day and long-day plants. The expression of Ghd7 orthologs was detected by RT-PCR experiments in eight Oryza species grown under long-day conditions (Figure S4). Similar to Ghd7, the Ghd7 ortholog in maize (GRMZM2G381691_T01) was shown to regulate flowering time through determining photoperiod sensitivity [53]. We propose that the Ghd7 orthologs in sorghum (Sb06g000570) and Brachypodium (Bradi3g10010) have similar functions in controlling flowering time. Thus, the Ghd7 flowering pathway is not unique to rice; rather, it may regulate flowering properties in a wide range of grass species.
Heterochromatin is defined as densely coiled chromatin that generally replicates late during the S phase [38]. Low gene density and large blocks of repetitive DNA, especially a high Gypsy content, are characteristics of heterochromatic regions [36][37][38]. Gypsy and Copia, which differ in the order of RT (reverse transcriptase) and INT (DDE integrase) in the POL, are two subfamilies of LTR retrotransposons [54]: Gypsy tends to insert into heterochromatin, while Copia inserts into euchromatin [55]. Most of the Ghd7 orthologous regions in Oryza species displayed characteristic features of heterochromatin, especially in O. rufipogon, O. nivara and O. punctata (Figure S2, Tables S9 and S11). In contrast, the low TE content and high gene density are suggestive of a euchromatic environment for the Ghd7 region of O. brachyantha. Heterochromatin is commonly characterized as silent chromatin; however, several hundred heterochromatic genes in Drosophila, plants and mammals have been discovered recently, and many of them have transcriptional activity [56][57][58]. In plants, agriculturally important heterochromatic genes have rarely been reported [59][60][61]. The expression of Ghd7 was detected in diploid Oryza species under long-day conditions, suggestive of its conserved activity in a heterochromatic environment.
In summary, the dramatic position-shift of Ghd7 orthologs in the grass genomes and different allele distribution of Ghd7 in rice indicated plasticity of this agronomically important gene. The mechanism for Ghd7 movement in B. distachyon suggests that repetitive elements play an important role in gene and genome evolution in plants. Finally, as a heterochromatic gene, the regulation of Ghd7 might be an interesting model to understand the effect of chromatin environment on gene regulation in plants.

Materials, Growth Conditions and Gene Structure Verification
The seed dormancy of diploid Oryza species was broken by heat treatment (50-54°C for five days). Seeds were washed thoroughly and germinated on moist filter paper in petri dishes at 37°C. Seedlings were transferred to pots and placed in a greenhouse seven days later following the methods described by Xue et al. (2008): neutral day-length conditions (12 h sunlight/day) for 30 days; then long-day conditions (15 h sunlight/day) for ten days. Finally, leaves were harvested and stored at -80°C. Total RNA was extracted with TRIzol reagent (TIANGEN, Cat#DP405-02). The cDNA was synthesized by reverse transcriptase (Promega, Cat#M170A). The PCR conditions were as follows: 5 min at 94°C; 35 cycles of 30 sec at 94°C, 30 sec at the annealing temperature appropriate for each primer pair and 1 min at 72°C; 8 min at 72°C. Amplified products were cloned into a T vector (Promega, Cat#1360) and sequences were verified using an ABI 3730 automated capillary sequencer (Applied Biosystems).

Sequencing the BAC Clones from the Ghd7 Orthologous Regions
For each species, paired-end sequencing libraries were constructed with an insert size of approximately 500 bp and sequenced on Illumina Genome Analyzer II. Because of the high content of repetitive elements, the assembled sequences resulted in many scaffolds without ordering information. Therefore, we used the Roche/454 Genome Sequencer FLX Instrument as a complementary method to sequence these low-quality BAC clones. Overlaps between neighboring BACs were determined using BLASTN, and the resultant pseudomolecules were constructed after careful inspection and verification of each overlap. The sequences of all BACs used in this analysis were deposited in the GenBank data library under the following accession numbers [GenBank: JN873128-JN873135]. The sequences and related CDS (coding sequence) databases of japonica, indica, B. distachyon, sorghum and maize were downloaded from individual websites (http://rice.plantbiology.msu.edu; http://rice.genomics.org.cn/rice/index2.jsp; http://www.brachypodium.org; http://www.phytozome.net/sorghum; http://www.maizegdb.org).

Sequence Annotation of Protein-coding Genes and Transposable Elements
Sequences were annotated using the ab initio gene prediction program FGENESH (http://www.softberry.com) [62]. In addition, candidate genes had to meet the following criteria: not transposon-related, and containing a known functional domain or having a homolog to known proteins or having a homolog at the syntenic position. All annotations were overlaid on individual BAC sequences and were visualized and edited using ACT v5 and Artemis [63]. The exon-intron structures of gene models were verified by aligning the genomic sequence with cDNAs or ESTs and by sequencing RT-PCR products amplified using specific primers for each gene.
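The candidate-gene criteria above amount to a simple boolean rule, sketched below for illustration. The `GeneModel` record and its field names are hypothetical and are not taken from the annotation pipeline used in this study; in practice each flag would come from a separate analysis (repeat annotation, domain search, homology search, synteny comparison).

```python
from dataclasses import dataclass

@dataclass
class GeneModel:
    """Hypothetical summary of the evidence collected for one predicted gene."""
    name: str
    transposon_related: bool
    has_known_domain: bool
    has_protein_homolog: bool
    has_syntenic_homolog: bool

def is_candidate_gene(model: GeneModel) -> bool:
    """Apply the stated criteria: exclude transposon-related predictions,
    then require at least one line of supporting evidence."""
    if model.transposon_related:
        return False
    return (
        model.has_known_domain
        or model.has_protein_homolog
        or model.has_syntenic_homolog
    )

# Illustrative records only; these are not real annotation results.
models = [
    GeneModel("example-1", False, True, False, False),
    GeneModel("example-2", True, True, True, True),
    GeneModel("example-3", False, False, False, False),
]
print([m.name for m in models if is_candidate_gene(m)])
```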
Transposable elements (TEs) were identified by RepeatMasker (www.repeatmasker.org) and the signatures of each family of TE [54] using cross_match. The intact, truncated and recombined LTR elements and solo-LTRs were manually identified from the outputs of RepeatMasker and LTR_FINDER [64]. For sorghum and B. distachyon repeat analyses, the intact LTR retrotransposon sequences were isolated according to the results from LTR_FINDER, and then searched with BLAST against RepeatMasker TE libraries for subfamily classification.

Isolation of CCT Family Members from Rice, B. distachyon, Sorghum and Maize
The Ghd7 CCT domain protein sequence was used to search the rice, B. distachyon, sorghum and maize protein databases using BLASTP with the following criterion: E-value ≤ 1e-5. The CDS of Ghd7 homolog candidates were aligned using MUSCLE [65] and imported into GeneDoc (http://www.nrbsc.org/gfx/genedoc/index.html) for manual adjustment. The phylogenetic tree was built using MEGA4.0 [66]. The Neighbor-Joining method was used with the following parameters: pairwise deletion; bootstrap 1000 replicates and Kimura 2-parameter model. The domains of each gene were identified with SMART (http://smart.embl-heidelberg.de/smart/set_mode.cgi?NORMAL=1) and Pfam analysis. The dotplot analysis of Ghd7 homologous gene regions was carried out in the Plant Genome Duplication Database [20] (http://chibba.agtec.uga.edu/duplication/index/dotplot).
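To illustrate the distance measure named above, the following is a minimal sketch of the Kimura 2-parameter distance between two aligned nucleotide sequences, with gapped or ambiguous sites skipped for that pair (pairwise deletion). It is not the MEGA implementation; the function name and the toy sequences are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance for two aligned sequences of equal length.

    d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)), where P and Q are the
    proportions of transitions and transversions among compared sites.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # pairwise deletion: skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

# Toy alignment: one transition (A<->G) in ten compared sites.
print(round(k2p_distance("ACGTACGTAC", "GCGTACGTAC"), 4))
```

A matrix of such pairwise distances is the input that a Neighbor-Joining algorithm would then use to build the tree.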

Insertion Dating of LTR Retrotransposon
The insertion times of LTR retrotransposons were estimated from the divergence time (T) between the two LTRs of a single intact LTR retrotransposon [67], T = K/(2r), where K refers to the distance between the two LTRs and r refers to the average substitution rate. The two LTRs were aligned using MUSCLE [65]. The distance between each pair of LTRs (K) was calculated using the baseml program (runmode = 2; model = 4) in PAML [68]. We used a substitution rate (r) of 1.3 × 10^-8 substitutions per site per year to estimate the divergence time of LTRs, as repeat elements were suggested to evolve much more rapidly than coding regions [32,69].
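The dating formula can be written as a one-line function. The sketch below assumes that K has already been estimated elsewhere (for example by baseml, as described above); the function name and the example value of K are illustrative and do not correspond to any element in this study.

```python
# Substitution rate used in the text, in substitutions per site per year.
SUBSTITUTION_RATE = 1.3e-8

def insertion_time_years(k: float, r: float = SUBSTITUTION_RATE) -> float:
    """Estimate LTR retrotransposon insertion time as T = K / (2r).

    k is the distance between the two LTRs of one intact element
    (substitutions per site); r is the substitution rate per site per year.
    """
    if k < 0:
        raise ValueError("distance K must be non-negative")
    if r <= 0:
        raise ValueError("substitution rate r must be positive")
    return k / (2 * r)

# Hypothetical element whose two LTRs differ by K = 0.026 substitutions/site.
t = insertion_time_years(0.026)
print(f"{t / 1e6:.2f} million years")
```

The factor of 2 reflects that both LTRs, identical at the time of insertion, accumulate substitutions independently afterwards. With the hypothetical K = 0.026 the estimate is about one million years.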

Analysis of the Molecular Mechanism of Gene Movement
Dot plot alignment was used to determine the borders of the repetitive elements in species without TE libraries. Target site duplications around genes and repetitive elements were identified using DOTTER [70]. Tandem repeat motifs around target genes were identified using Tandem Repeats Finder (http://tandem.bu.edu/trf/trf.submit.options.html) and RepeatMasker.
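As a rough illustration of what a target site duplication (TSD) looks like computationally, the sketch below checks for an identical direct repeat immediately flanking an inserted element. This is not how DOTTER works (DOTTER is an interactive dot-plot viewer); the function, its parameters and the toy sequence are assumptions, and real TSDs may contain mismatches that an exact-match check would miss.

```python
def find_tsd(seq: str, start: int, end: int,
             min_len: int = 2, max_len: int = 12) -> str:
    """Return the longest exact direct repeat flanking seq[start:end].

    seq[start:end] is the inserted element (0-based, end-exclusive).
    The k bases immediately upstream are compared with the k bases
    immediately downstream, for k from max_len down to min_len.
    An empty string means no exact TSD was found.
    """
    seq = seq.upper()
    for k in range(max_len, min_len - 1, -1):
        if start - k < 0 or end + k > len(seq):
            continue  # not enough flanking sequence for this length
        left = seq[start - k:start]
        right = seq[end:end + k]
        if left == right:
            return left
    return ""

# Toy example: an 8 bp "element" flanked by a 10 bp direct repeat.
tsd = "ACGTACGTAC"
toy = "GG" + tsd + "TTTTTTTT" + tsd + "CC"
print(find_tsd(toy, start=12, end=20))
```

The default maximum of 12 bp is an arbitrary choice that happens to cover the 10 bp and 6 bp signatures mentioned in the results.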

Fluorescence In Situ Hybridization (FISH)
For japonica pachytene FISH, the PAC clone P0046D03 was used as a probe (obtained from the National Institute of Agrobiological Sciences in Japan; http://www.nias.affrc.go.jp/index_e.html). The chromosome 7 marker (BAC a0050F10) was obtained from Clemson University in the USA (http://www.clemson.edu/). The FISH procedure applied to meiotic chromosomes was essentially the same as previously published protocols [71].

Figure S1 Gene features of the Ghd7 regions in diploid Oryza species, B. distachyon, S. bicolor and Z. mays. Each gene is represented by a colored square. The yellow square with black borders indicates that the gene or gene fragment was captured by a transposable element. The numbers in purple rectangles represent the sequence length of the five complex regions in sorghum. The gene number is indicated above the black line and is summarized in Table