Sequence-Based Analysis of Translocations and Inversions in Bread Wheat (Triticum aestivum L.)

Structural changes of chromosomes are a primary mechanism of genome rearrangement over the course of evolution and detailed knowledge of such changes in a given species and its close relatives should increase the efficiency and precision of chromosome engineering in crop improvement. We have identified sequences bordering each of the main translocation and inversion breakpoints on chromosomes 4A, 5A and 7B of the modern bread wheat genome. The locations of these breakpoints allow, for the first time, a detailed description of the evolutionary origins of these chromosomes at the gene level. Results from this study also demonstrate that, although the strategy of exploiting sorted chromosome arms has dramatically simplified the efforts of wheat genome sequencing, simultaneous analysis of sequences from homoeologous and non-homoeologous chromosomes is essential in understanding the origins of DNA sequences in polyploid species.


Introduction
Scientists have long understood that chromosome translocation is a major driving force in shaping genomes during evolution [1][2][3][4][5]. It has been well documented that translocations are frequently associated with genomic disorders [6][7][8] and that translocated genes undergo an elevated rate of evolution [1,9]. Some studies also claim that translocation could alter levels of recombination [10,11] which is not only a major source of intra-specific variation but also an important constraint in crop improvement programs. Such programs aim to bring together multiple chromosomal segments containing favourable alleles into single plant lines.
The presence of non-homoeologous translocations between chromosomes 4A, 5A and 7B in hexaploid wheat (Triticum aestivum L., 2n = 6x = 42, genomes AABBDD) is well known, with the first evidence for these translocations coming from studies of chromosome pairing and gene-based marker locations [12]. Detailed linkage analyses with molecular markers confirmed the presence of these translocations and also allowed the development of hypotheses on the possible evolutionary origins of these translocation and inversion events [13][14][15]. Analyses of bin-mapped expressed sequence tags (ESTs) showed that, in addition to the two well-known reciprocal translocations and two inversions, a third inversion was also likely involved in generating the structure of the modern chromosome arm '4AL' [16]. It is believed that the 4AL/5AL translocation occurred at the diploid level as it is also present in T. monococcum (2n = 2x = 14, genome AA) and the 4A/7B translocation must have occurred at the tetraploid level as it is also present in T. durum (2n = 4x = 28, genomes AABB) [14]. Molecular marker profiles of chromosome addition and substitution lines indicated that a 4L/5L translocation may also exist in several other species within the tribe Triticeae [17]. However, due to the limited resolution of marker- or deletion bin-based analyses, fine details for any of these translocations and inversions are still not clear. As none of the techniques currently available allow rapid and accurate detection of these translocations in a given genotype, we still do not know the status of these translocations across the full spectrum of bread wheat and its close relatives. There is also no report on possible contributions of these translocations to wheat speciation.
Recent developments in genome sequencing offer an excellent opportunity to characterize these translocations at the gene level. Synteny-based comparisons of sequences from the sorted wheat chromosomes with those of other grass species identified five distinct segments forming the modern chromosome 4A and putative genes anchoring each of the breakpoints [18]. Similar approaches were used in identifying genes bordering the 7BS/5AL breakpoint on the modern 7BS [19]. Compared with previous data, which are predominantly based on chromosome pairing or molecular marker analyses, the resolution offered by these gene-based studies should be significantly higher. However, it is well known that duplications of genes or chromosome segments are common in wheat [20,21]. Thus, accurate identification of translocation breakpoints could be difficult when analyses are focused on a single chromosome or even a set of homoeologous chromosomes. Further, attempts to trace the evolutionary origins of the modern chromosomes by exploiting genome sequencing data have not been made. Working toward a better understanding of the biological consequences of translocations and tracing the evolutionary origins of the modern chromosomes, we systematically analysed the structures of the 4A, 5A and 7B chromosomes by comparing Brachypodium genes against survey sequences of sorted wheat chromosome arms and validating locations of selected genes using bin-mapped ESTs. These analyses identified genes neighbouring the breakpoints of these translocation and inversion events, thus allowing, for the first time, detailed descriptions of the origins of the modern chromosomes 4A, 5A and 7B of bread wheat at the gene level.

Materials and Methods
Previous evidence shows that genome segments are highly conserved between wheat and Brachypodium although small disruptions of colinearity are not uncommon [18,22,23]. Analyses carried out in this study focused on those chromosomal rearrangements evidenced by two or more genes with the same pattern of chromosome arm locations. The known structures of chromosomes 4A, 5A and 7B reported previously [13][14][15][16][24] were used to group Brachypodium orthologs examined in the initial analyses. Data based on comparison of Brachypodium genes with deletion bin-mapped wheat ESTs were used to determine the relative positions and orientations of orthologs within segments of chromosome arms.
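The criterion of two or more genes sharing the same pattern of chromosome arm locations can be illustrated with a minimal sketch. The function name, the input layout and the example gene labels below are assumptions made for illustration only and are not taken from the pipeline used in this study.

```python
from collections import defaultdict

def group_by_arm_pattern(gene_to_arms, min_genes=2):
    """Group genes by the set of wheat chromosome arms they detect.

    gene_to_arms maps a gene label to an iterable of arm names.
    Only patterns supported by at least `min_genes` genes are kept,
    mirroring the "two or more genes" criterion described above.
    """
    groups = defaultdict(list)
    for gene, arms in gene_to_arms.items():
        # A frozenset makes the pattern independent of the order of the hits.
        groups[frozenset(arms)].append(gene)
    return {pattern: sorted(genes)
            for pattern, genes in groups.items()
            if len(genes) >= min_genes}

# Hypothetical input: labels and arm assignments are invented for illustration.
example = {
    "geneA": ["5AL", "4BL", "4DL"],
    "geneB": ["4DL", "5AL", "4BL"],
    "geneC": ["5AL", "5BL", "5DL"],
}
supported = group_by_arm_pattern(example)
```

In this toy input only the 5AL/4BL/4DL pattern is retained, because the 5AL/5BL/5DL pattern is supported by a single gene.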
As a consequence of the high degree of rearrangement on what is considered the modern chromosome 4A, its arm ratio has been reversed [13][14][15][16][24]. As a result, discussion of the various historical states of this chromosome can become difficult to understand. To alleviate this confusion, this manuscript uses 4AS and 4AL to refer to the arms of the original ancestral version of this chromosome, while the modern chromosome arms are referred to as '4AS' and '4AL'.
Gene-coding sequences labelled as CDS from Brachypodium genome version 192 were downloaded from http://www.plantgdb.org/BdGDB. The locations of orthologs to these Brachypodium genes within the survey sequence data of wheat chromosome arms from the genotype 'Chinese Spring' were determined using the BLAST++ facility of the International Wheat Genome Sequencing Consortium (IWGSC) (http://www.wheatgenome.org/) hosted by URGI (http://urgi.versailles.inra.fr). The BLASTN algorithm was applied for all analyses using an E value cut-off of 0.0001. Wheat ESTs for individual deletion bins on chromosomes 4A, 5A and 7B were downloaded from http://wheat.pw.usda.gov/GG2/index.shtml. Comparison of bin-mapped ESTs against Brachypodium CDS sequences was performed using the BLAST++ BLASTN algorithm with an E value cut-off of 0.00001.
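Applying an E value cut-off to a set of hits can be sketched as follows. This sketch assumes that hits are available as tab-separated text in the common 12-column BLAST tabular layout, with the E value in the eleventh column; the export format of the BLAST++ facility used here is not described in this paper, so the layout, the function name and the demonstration rows are all assumptions. Whether the cut-off is treated as inclusive is likewise an assumption.

```python
import csv
import io

def filter_hits(tabular_text, max_evalue=1e-4):
    """Return (query, subject, evalue) for hits with evalue <= max_evalue.

    Assumes 12 tab-separated columns per hit with the E value in
    column 11 (index 10); lines starting with '#' are skipped.
    """
    kept = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        evalue = float(row[10])
        if evalue <= max_evalue:
            kept.append((row[0], row[1], evalue))
    return kept

# Hypothetical hits: identifiers and scores are invented for illustration.
rows = [
    ["geneA", "contig_5AL_1", "95.0", "300", "15", "0",
     "1", "300", "10", "309", "1e-50", "400"],
    ["geneA", "contig_2BS_7", "80.0", "60", "12", "0",
     "1", "60", "5", "64", "0.01", "40"],
]
demo = "\n".join("\t".join(r) for r in rows)
hits = filter_hits(demo)                            # cut-off of 0.0001
strict_hits = filter_hits(demo, max_evalue=1e-5)    # cut-off of 0.00001
```

The two calls correspond to the two cut-offs stated above: 0.0001 for the comparison against survey sequences and 0.00001 for the comparison of bin-mapped ESTs.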

Chromosomal Locations of Genes on the Modern 5AL
Brachypodium orthologs on this chromosome arm could be placed into two sets. Genes in Set 1 had homoeologous sequences on 5AL, 5BL and 5DL, respectively. The pattern of these chromosome arm locations shows that they are from the original 5AL and were not involved in any interchromosomal translocations. Genes in this set were orthologous to genes on two Brachypodium chromosomes, 1 and 4 with Bradi1g03330 bordering the breakpoint (Tables 1 and S1).
Genes in Set 2 detected homoeologous sequences on 5AL, 4BL and 4DL, respectively. The pattern of these chromosome arm locations shows that these genes were derived from the original 4AL. Genes in this set were orthologous to genes on three Brachypodium chromosomes, 1, 4 and 5 with Bradi1g75560 bordering the breakpoint (Tables 1 and S1).
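The reasoning that links a pattern of chromosome arm locations to an ancestral arm can be written as a small rule: when two of the three arms share the same homoeologous group and arm, that position is taken as ancestral, and the genome absent from that pair identifies the ancestral chromosome. The sketch below is a simplified illustration of this rule, not the procedure used in this study; the function name is hypothetical, and modern arm names must be supplied without the quotation marks used in the text.

```python
import re
from collections import defaultdict

ARM = re.compile(r"([1-7])([ABD])([SL])")

def infer_original_arm(arms):
    """Infer the ancestral arm from the arms detected by one gene.

    Returns None when all three homoeologous arms agree, i.e. there is
    no sign of an interchromosomal translocation.
    """
    genomes_at = defaultdict(set)  # (group, arm side) -> genomes observed
    for arm in arms:
        match = ARM.fullmatch(arm)
        if match is None:
            raise ValueError(f"unrecognised arm name: {arm!r}")
        group, genome, side = match.groups()
        genomes_at[(group, side)].add(genome)
    for (group, side), genomes in genomes_at.items():
        if len(genomes) == 3:
            return None
        if len(genomes) == 2:
            # The genome missing from the agreeing pair is the ancestral one.
            missing = ({"A", "B", "D"} - genomes).pop()
            return f"{group}{missing}{side}"
    raise ValueError("no two arms share a homoeologous group and arm")

set1_5AL = infer_original_arm(["5AL", "5BL", "5DL"])  # no translocation
set2_5AL = infer_original_arm(["5AL", "4BL", "4DL"])  # original 4AL
set1_7BS = infer_original_arm(["7BS", "5BL", "5DL"])  # original 5AL
```

The rule ignores duplicated genes and hits on additional arms, which is why patterns supported by only a single gene are not interpreted in this study.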

Chromosomal Locations of Genes on the Modern 7BS
Brachypodium orthologs on this chromosome arm were placed into two sets. Many of the genes in Set 1 detected homoeologous sequences on 7BS, 5BL and 5DL, respectively. The pattern of these chromosome arm locations shows that these genes were derived from the original 5AL. Genes in this set were orthologous to those on Brachypodium chromosome 1 with Bradi1g00580 as the most likely gene bordering the breakpoint (Tables 1 and S2).
Most of the genes in Set 2 detected homoeologous sequences on 7AS, 7BS and 7DS, respectively. The pattern of these chromosome arm locations shows that they were not involved in any interchromosomal translocations. Genes in this set were orthologous to genes on two Brachypodium chromosomes, 1 and 3. Bradi1g49340 can be conservatively assigned as the one bordering the breakpoint (Tables 1 and S2).

Chromosomal Locations of Genes on the Modern '4AL'
Brachypodium orthologs on the modern chromosome arm '4AL' could be placed into four sets based on chromosomal locations of wheat sequences they detect. Genes in Set 1 detected sequences on '4AL', 7AS and 7DS, respectively. This pattern of the chromosome arm locations shows that they were derived from the original 7BS. Genes in this set have orthologs on Brachypodium chromosomes 1 and 3. These genes could be further placed into two subsets based on deletion bin-mapped ESTs (Table S3) but the orientation of genes within these two sub-sets could not be determined.
Genes in Set 2 detected sequences on '4AL', 4BL and 4DL, respectively. This pattern of chromosome arm locations shows that they were derived from the original 4AL. Genes in this set have orthologs on three Brachypodium chromosomes, 1, 2 and 4. These genes were placed into three sub-sets based on deletion bin-mapped ESTs (Table S3). Definitions of these chromosome segments are provided in Figure 1.
Genes in Set 3 detected sequences on '4AL', 5BL and 5DL, respectively. This pattern of the chromosome arm locations shows that they were derived from the original 5AL. Genes in this set have orthologs on Brachypodium chromosome 1. The segment containing these genes is likely flanked by Bradi1g00587 and Bradi1g03320 and its orientation is such that the gene IDs, from the centromere, increase (Table S3).
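The orientation of a segment, expressed as gene IDs increasing or decreasing from the centromere, can be checked along the following lines. This is a minimal sketch under stated assumptions: deletion bins are supplied in order from the centromere outward, all genes lie on the same Brachypodium chromosome, and the median locus number of each bin is used as its position. The function names and the gene IDs in the example are hypothetical.

```python
import re
import statistics

def locus_number(gene_id):
    """Extract the numeric part of an ID such as 'Bradi1g00587' (-> 587)."""
    match = re.fullmatch(r"Bradi\dg(\d+)", gene_id)
    if match is None:
        raise ValueError(f"unrecognised gene ID: {gene_id!r}")
    return int(match.group(1))

def segment_orientation(bins_from_centromere):
    """Classify the trend of gene IDs across bins ordered from the centromere.

    Genes within a bin are unordered, so only bin medians are compared.
    """
    medians = [statistics.median([locus_number(g) for g in genes])
               for genes in bins_from_centromere if genes]
    if len(medians) < 2:
        return "undetermined"
    steps = list(zip(medians, medians[1:]))
    if all(a < b for a, b in steps):
        return "increasing"
    if all(a > b for a, b in steps):
        return "decreasing"
    return "undetermined"

# Hypothetical bins: gene IDs are invented for illustration.
bins = [["Bradi1g00100", "Bradi1g00300"],
        ["Bradi1g00900"],
        ["Bradi1g02000", "Bradi1g01500"]]
trend = segment_orientation(bins)
```

A single bin gives no ordering information, so the sketch returns "undetermined" in that case.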
Genes in Set 4 detected homoeologous sequences on '4AL', 4BS and 4DS, respectively. This pattern of the chromosome arm locations shows that they belong to the original 4AS. Genes in this set have orthologs on two Brachypodium chromosomes, 1 and 4. The segment containing these genes is likely flanked by Bradi1g09250 and Bradi4g14247 (Table S3).

Chromosomal Locations of Genes on the Modern '4AS'
Brachypodium orthologs on this chromosome arm were placed into two sets based on the chromosome arm locations of sequences they detect. Many of the genes in Set 1 detected homoeologous sequences on '4AS', 4BL and 4DL, respectively. This pattern of the chromosome arm locations shows that these genes belonged to the original 4AL. Genes in this set have orthologs on two Brachypodium chromosomes, 1 and 4. The chromosome segment containing these genes is likely flanked by Bradi1g74922 and Bradi4g14040 (Table S4).
Five genes were found to likely belong to Set 2 on this chromosome arm and they have orthologs on Brachypodium chromosome 4. These genes detect homoeologous sequences on '4AS', 4BS and 4DS, respectively. This pattern of the chromosome arm locations shows that they were translocated from the original 4AS to the modern '4AS'. Two of them (Bradi4g14830 and Bradi4g14990) also detected sequences on '4AL', showing that they are duplicated on this chromosome. The reason for their inclusion in this set is that genes flanking them all detected sequences on the three short arms of the homoeologous group 4 chromosomes (Table S4).

Discussion
By comparing Brachypodium genes against both survey sequences of sorted wheat chromosome arms and deletion bin-mapped wheat ESTs, we have identified Brachypodium orthologs bordering several translocation and inversion breakpoints on the modern wheat chromosomes 4A, 5A and 7B. This new analysis allowed detailed description of the evolutionary origins of these bread wheat chromosomes at the gene level (Fig. 1).
Several ESTs were mapped to each of the three smallest segments on the modern chromosome arm '4AL' (4AL-3, 4AL-4 and 7BS-1, respectively) [16]. However, we found corresponding Brachypodium genes for only a few of these ESTs (Table S3), preventing us from accurate allocation and orientation of the Brachypodium orthologs on these segments. Considering the orders of the genes on the original chromosomes and the translocation and inversion events, orientations of these fragments could be deduced as (from centromere): increasing gene IDs for those in 4AL-3, decreasing gene IDs for those in 4AL-4 and decreasing gene IDs for those in 7BS-1 (Fig. 1; Tables S2 and S3). The deduced orientations for those genes on 4AL-3 and 4AL-4 seem to be in agreement with the findings by Hernandez et al. [18] who reported that genes bordering the 'segment C' derived from the original 4AL on the modern '4AL' have opposite orientations. Our data confirm that these genes form two separate 4AL segments (Fig. 1) on the modern '4AL' as proposed by Miftahudin et al. [16].
Previous models of the structures of chromosomes 5A and 7B are highly consistent [12][13][14]. The structure of chromosome 4A is less clear. Based on next-generation sequencing, Hernandez et al. [18] suggested that five segments form the modern chromosome 4A. Our results show that this chromosome contains at least nine segments (Fig. 1), a structure similar to that deduced from deletion bin-mapped ESTs [16]. However, we found that many of the deletion bin-mapped ESTs detect sequences on large numbers of chromosome arms (Table S5) and thus could not be used reliably in tracing the origins of a gene or a chromosome segment. We also found evidence showing that a small segment (4AS-1 in Fig. 1) relocated from the original 4AS to the modern '4AS' during the second pericentric inversion (Fig. 1).
Analysis of chromosome pairing showed that the terminal segment of the modern '4AS' is homoeologous to 4BS and 4DS [12], indicating that the first pericentric inversion (marked as event 'B' in Fig. 1) was proximal to the telomere of the original 4AS. This chromosome pairing result seems to be supported by the locations of two deletion bin-mapped ESTs, BE518074 and BE494743 [16]. However, when these two ESTs were compared by sequence similarity against sequences of the sorted wheat chromosome arms, both detected sequences on the modern chromosome arm '4AS', but neither gave a clear pattern of matching chromosome arms to confirm that they were translocated from the original 4AS to the modern '4AS' (Table S5). BE518074 matched Brachypodium gene Bradi2g54210. However, Brachypodium orthologs on either side of this gene failed to detect sequences on each of the three short arms of the homoeologous group 4 chromosomes (Table S6). Thus the question of whether the inversion breakpoint was proximal to the telomere of the original 4AS remains unanswered.
The strategy of sequencing the wheat genome based on sorted chromosomes or chromosome arms has significantly simplified wheat genome research as it circumvents many of the complications caused by the hexaploid nature of this species [18,19,25,26]. However, caution is required when using results obtained from such a strategy to trace the evolutionary origin of a given gene or a chromosome segment. For example, Bradi1g00227 and Bradi1g02980 were reported to flank the original 5AL segment on the modern '4AL' [18]. Our results showed that the segment flanked by Bradi1g00450 and Bradi1g00580 was actually translocated from the original 5AL to the modern 7BS as the majority of the genes residing on this segment have homoeologous sequences on 7BS, 5BL and 5DL (Fig. 1, Table S7). The location of these genes on 7BS is further supported by the fact that most of these genes were found to be present on the 7BS syntenic build [19]. Many of the genes between Bradi1g00227 and Bradi1g00460 detected homoeologous sequences on the modern '4AL'. However, most of them also detected multiple sequences on chromosomes belonging to several homoeologous groups. For example, Bradi1g00227 detects homoeologous sequences on 21 chromosome arms belonging to six of the seven homoeologous groups of bread wheat (Table S7). The multiple locations of many genes in bread wheat are not surprising considering its hexaploid nature and the well-known fact that duplications of genes or chromosome segments are common in this species [20,21].
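Genes that detect sequences on many chromosome arms can be flagged before any attempt is made to interpret their arm patterns. The sketch below counts the distinct arms and homoeologous groups detected by one gene and flags genes spanning more than two groups; the threshold of two is an assumption chosen because a simple translocation pattern such as 5AL, 4BL and 4DL involves exactly two groups. The function names and the example hits are hypothetical.

```python
def summarise_locations(arms):
    """Return (number of distinct arms, number of homoeologous groups)."""
    unique_arms = set(arms)
    # The first character of an arm name such as '5AL' is its group (1-7).
    groups = {arm[0] for arm in unique_arms}
    return len(unique_arms), len(groups)

def is_multi_location(arms, max_groups=2):
    """Flag a gene whose hits span more than `max_groups` groups."""
    _, n_groups = summarise_locations(arms)
    return n_groups > max_groups

# Hypothetical hit lists, invented for illustration.
simple = ["5AL", "4BL", "4DL"]
scattered = ["1AS", "2BL", "3DS", "5AL", "4BL"]
simple_summary = summarise_locations(simple)
scattered_flag = is_multi_location(scattered)
```

A gene flagged in this way would need support from its neighbours, as was done for the genes discussed above, before being assigned to a segment.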
Another example is the 7BS syntenic build where Bradi1g49497 was suggested to be one of the genes neighbouring the 7BS/5AL breakpoint on this chromosome arm [19]. We found that the anchoring gene for this breakpoint is Bradi1g49340. Bradi1g49497 detected sequences on both 7BS and 4AL and most of the Brachypodium orthologs between these two genes belong to a segment translocated from the original 7BS to the modern '4AL' as they detected homoeologous sequences on '4AL', 7AS and 7DS, respectively (Table S8). These examples demonstrate that a more in-depth simultaneous analysis of sequences from homoeologous and non-homoeologous chromosomes is essential in understanding the origins of a DNA sequence in polyploid species.