De Novo Centromere Formation and Centromeric Sequence Expansion in Wheat and its Wide Hybrids

Centromeres typically contain tandem repeat sequences, but centromere function does not necessarily depend on these sequences. We identified functional centromeres with significant quantitative changes in the centromeric retrotransposons of wheat (CRW) contents in wheat aneuploids (Triticum aestivum) and the offspring of wheat wide hybrids. The CRW signals were strongly reduced or essentially lost in some wheat ditelosomic lines and in the addition lines from the wide hybrids. The total loss of the CRW sequences but the presence of CENH3 in these lines suggests that the centromeres were formed de novo. In wheat and its wide hybrids, which carry large complex genomes or no sequenced genome, we performed CENH3-ChIP-dot-blot methods alone or in combination with CENH3-ChIP-seq and identified the ectopic genomic sequences present at the new centromeres. In addition, the transcription of the identified DNA sequences was remarkably increased at the new centromere, suggesting that the transcription of the corresponding sequences may be associated with de novo centromere formation. Stable alien chromosomes with two and three regions containing CRW sequences induced by centromere breakage were observed in the wheat-Th. elongatum hybrid derivatives, but only one was a functional centromere. In wheat-rye (Secale cereale) hybrids, the rye centromere-specific sequences spread along the chromosome arms and may have caused centromere expansion. Frequent and significant quantitative alterations in the centromere sequence via chromosomal rearrangement have been systematically described in wheat wide hybridizations, which may affect the retention or loss of the alien chromosomes in the hybrids. Thus, the centromere behavior in wide crosses likely has an important impact on the generation of biodiversity, which ultimately has implications for speciation.


Introduction
Centromeres, which are located at the primary constriction of the chromosome, are required for the accurate segregation of chromosomes and serve as the sites for kinetochore assembly during mitosis and meiosis. The main DNA components of the centromere are highly repetitive, such as the 171-bp α-satellite repeat in humans and 150-to 180-bp simple tandem repeats in some flowering plants [1][2][3][4][5]. Long-terminal repeat (LTR) retrotransposons, also known as centromeric retrotransposons (CRs), are often intermingled with tandem repeats and are enriched in plant centromeric regions [6][7][8][9][10][11]. The highly conserved function of the centromere is correlated with its epigenetic features, including the histone H3 variant CENH3 in plants (CENP-A in mammals) [12][13][14][15], phosphorylation of histone H2A at Thr-133 [16] and H3 phosphorylation at Ser-10 [17,18]. Despite the conserved centromere function, centromeric repeat sequences apparently evolved rapidly in some species under specific circumstances. This phenomenon is known as the "centromere paradox" [13].
Centromeric sequences are highly variable between different species and different chromosomes and even between the same centromeres from different ecotypes or varieties [5,11,19,20]. Most of the centromeric tandem repeats in plants, such as CentO in rice (Oryza sativa), CentBd in Brachypodium distachyon, and CentC in maize (Zea mays), are likely to be species-specific [4,5,21]. Several wild Oryza species lack CentO and instead possess genome-specific satellite repeats [22]. Similarly, little homology was found between the centromeric sequences of the potato (Solanum tuberosum) and its wild relative S. verrucosum [23]. Moreover, centromeres showed diversity in the repeat-less and repeat-based sequences on different chromosomes of S. verrucosum [20]. Eukaryotic centromeres carrying novel satellites may have evolved from neocentromeres that experienced insertion and/or extensive amplification of satellite repeats [20,24]. Previous studies revealed that recent segmental duplication, abundant rearrangements, and reshuffling occurred in CEN4 and CEN8 of rice and that the changes in CEN8 seemed to appear after the divergence of the O. sativa subspecies japonica and indica from a common ancestor [24,25]. An analysis of centromere retention or loss indicated that the major events during the evolution of maize from a supposed tetraploid ancestor (Sorghum bicolor) were chromosomal rearrangements, such as insertions and translocations, resulting in dysploidy and reduced chromosome numbers [26]. Despite the observation that substantial variations in centromeres occurred during evolution, the relationship between centromere variations and species evolution remains uncertain.
In most eukaryotes, the centromeric sequences alone are insufficient to maintain a functional centromere [27]. In humans and plants, many newly formed centromeres are devoid of typical centromeric sequences, and their formation was likely determined by epigenetic mechanisms [28][29][30][31][32]. Additionally, the centromere activity of dicentric chromosomes is independent of centromeric sequences. Many stable dicentric chromosomes in maize, including A-A and A-B centromeres (the centromere of the supernumerary B chromosome contains B-specific repeats), contain one active and one inactive centromere, as determined by examining the epigenetic modifications [18,33]. Furthermore, the inactive centromere recovered its activity by switching its epigenetic features under certain circumstances [34]. The essential structural and functional components for the core chromatin of centromeres include pericentromeric heterochromatin and active transcription of centromeric DNA [35][36][37][38]. The epigenetic components, rather than the DNA sequences, are essential for the establishment of centromere function. However, it remains a mystery why most functional centromeres contain highly repetitive sequences.
Allopolyploid wheat, either tetraploid or hexaploid, originates from interspecific hybridizations that trigger striking chromosomal rearrangements, genome reorganization, and chromatin remodeling in the parental genomes [39][40][41][42][43]. Wheat also has the capacity to hybridize with its wild relatives, which provides a broader gene source for wheat germplasm enhancement through addition, translocation, and substitution lines containing alien chromosomes [44][45][46]. In fact, wheat appears to prefer alien chromosomes or fragments from specific genomes [47]. However, the mechanisms regulating the stable transmission of these alien genomic sources in defined genetic backgrounds are still unclear. A previous study indicated that the size of the maize centromere was expanded in oat (Avena sativa)-maize addition lines, which may be a key factor for the survival of neocentromeric chromosomes in natural populations [48]. As such, an understanding of the adaptation of centromeres to "genome shock" and their evolutionary history in the wide wheat hybrid will require additional studies.
Due to their repetitive structures and low sequence conservation, it is difficult to compare centromeric sequences across different species. Complete centromeres on partial chromosomes have been sequenced in rice and maize [8,49,50,51]. In wheat, only partial centromere sequences have been released from published bacterial artificial chromosome (BAC) sequences [49,52,53,54]. Here, we observed that the content of classical centromeric retrotransposon sequences was reduced or apparently lost in both aneuploid wheats (4D, 1B, 5D chromosomes) and their wild relatives, such as Th. intermedium and Th. elongatum, when hybridized with wheat (Fig 1 and Table 1). With new developments in wheat genome sequencing, we first uncovered the detailed sequences in the new centromere of the 4DS chromosome in wheat aneuploids by ChIP (chromatin immunoprecipitation)-sequencing with wheat CENH3 antibodies. Additionally, for Th. intermedium, which does not have a sequenced reference genome, we developed a new ChIP-dot-blot strategy (see Methods) to identify the novel centromeric sequences in the wheat-Th. intermedium addition line TAI-14. We also detected the expansion of centromeric sequences and the formation of multiple centromeres in wheat and its wide hybrid offspring (Fig 1 and Table 1). Finally, we provide a detailed analysis of centromere variations and offer some new insights into centromere evolution in wheat and its wild relatives (Fig 1).

Fig 1. Stable and novel chromosomes induced by universal centromere variations in wheat and wide hybrids. Centromere alterations occurred in aneuploid wheat and in wheat hybrids. Reduction or even deletion of the centromeric sequences did not affect the transmission of the chromosomes or chromosome fragments to the next generation because of de novo centromere formation. Centromere expansion and breakage led to novel alien chromosomes in wheat and wide hybrids.

Reduction and elimination of centromeric sequences in wheat aneuploids and derivatives of wheat wide hybrids
The loss of canonical centromere sequences can be induced by breakage, rearrangements and radiation at plant centromeres [31,32]. Here, we observed the elimination of centromeric sequences in both wheat aneuploids and their wide hybrids.
Compared with normal centromeres in the T. aestivum Chinese Spring background, weaker fluorescence in situ hybridization (FISH) signals from the CRW probes were detected in the ditelosomic lines 5DL, 5DS and 1BS [55] (Fig 2A-2C, 2E-2G and 2I-2K and S1 Fig). Thus, significant reductions of centromeric sequences can frequently occur in allopolyploid wheat. However, CENH3 immunostaining revealed that functional centromeres were present in these three lines (Fig 2D, 2H and 2L). Additionally, in the ditelosomic line 4DS [55], we were unable to detect any CRW signals with FISH in the centromere or the chromosome arms, which stands in stark contrast to the normal chromosome 4D (Figs 3A-3C and S1). However, the epigenetic marks of active centromeres, including CENH3 and H2A phosphorylation at Thr-133 and H3 phosphorylation at Ser-10, were correctly loaded on the short arm of the 4D chromosome, suggesting that a de novo centromere had formed that lacked the canonical centromeric sequences (Fig 3D-3F). The wheat-Th. intermedium addition line TAI-14 was generated from hybrids between T. aestivum Xinshuguang 1 and amphidiploids zhong2 (2n = 56) [56]. The CRW sequences were heterogeneously distributed in the centromeric region of the 42 chromosomes of Th. intermedium, although some FISH-detected signals were very weak (S2A Fig). However, there were no detectable CRW signals on the Th. intermedium-derived chromosome in TAI-14 ( Fig 4A). These chromosomes had functional centromeres, as revealed by the presence of CENH3 ( Fig 4B).
Additionally, most copies of CRW in the two alien chromosomes from Th. elongatum were eliminated in the derivatives of the Chinese Spring nulli-tetrasomic lines N6AT6B (2n = 42) × wheat-Th. elongatum amphidiploid 8802 (2n = 42, AABBE1E2) (Fig 5B). The genome of 8802 consists of 28 chromosomes from T. durum Kekeruite (2n = 28) and 14 chromosomes from Th. elongatum AE31 (2n = 28) [57]. No obvious CRW FISH signals were detected on the chromosomes of Th. elongatum in the addition line derived from T. durum Kekeruite × 8802 (Fig 5A). However, the chromosomes that lack CRW sequences were stably transferred to the next generation, indicating that functional centromeres were formed on these chromosomes.

Novel sequences are involved in de novo centromere formation on an alien chromosome
Without a reference genome sequence for Th. intermedium, it is difficult to characterize the sequences involved in the de novo formation of the centromere in TAI-14. We designed a new strategy to isolate the neocentromere sequences based on CENH3-ChIP and dot-blot methods (see Methods). The CENH3-ChIP-enriched DNAs in the control Chinese Spring (abbreviated as CS) and TAI-14 were further analyzed by dot-blotting. The signals from the dot-blots that were significantly different between CS and TAI-14 (e.g., signals present in TAI-14 and not in CS) were treated as potential elements involved in de novo centromere formation. Two sequences, TAI-14-1 and TAI-14-2, were identified as new centromeric sequences for the new centromere in TAI-14 (Fig 4C and 4D), and both sequences showed homology to known retrotransposons by alignment to a BAC genome sequence in wheat (S3 Fig). TAI-14-1 was widely dispersed on nearly all of the chromosomes, whereas TAI-14-2 was mainly detected in the pericentromeric regions of the wheat chromosomes (Fig 4C and 4D). We observed that TAI-14-2 was located at both the centromeric and pericentromeric regions on different chromosomes in Th. intermedium (S1B Fig). Interestingly, some chromosomes of Th. intermedium that showed less CRW distribution showed greater TAI-14-2 occupancy in the centromeric region (S2D and S2E Fig). This result suggests that CRW and TAI-14-2 may be complementary centromeric sequences in some chromosomes of Th. intermedium.

A 994-kb genomic sequence of the 4DS chromosome is involved in de novo centromere formation in 4DS
The total loss of CRW sequences and the presence of CENH3 in the 4DS ditelosomic line suggest that a new centromere formed on the 4DS chromosome (Fig 3). Because genome sequences have been published for Ae. tauschii (wheat D genome donor) and CS [58,59], we performed a ChIP-seq analysis with CENH3 antibodies and identified potential sequences in the new centromere on the 4DS chromosome. Because the genomes were not completely assembled, we chose a mapping strategy that equally mapped all reads to multiple loci. The raw reads of the 4DS and CS (as control) samples were mapped to the wheat D and CS genomes, respectively, using BWA software (S1 Table) [60].
Using the sequence of the CS genome as a reference, we identified 107 scaffolds on the short arm of the 4D chromosome, with different CENH3 enrichments between 4DS and CS. The sizes of the 107 scaffolds ranged from 1,594 to 32,269 bp, and these scaffolds were combined into a 994-kb region containing only 11 genes that code for ribosomal and photosystem proteins (Fig 6, S2 Table). Furthermore, we selected one of the assembled scaffolds, IWGSC_CSS_4DS_scaff_2287721 (3665 bp), as a FISH probe and confirmed its localization in the 4DS de novo centromere (Fig 7A). We mapped this scaffold to the genome of Ae. tauschii using BLASTN [61] and identified a 68-kb fragment (Scaffold 33994) that contained most of the sequences of the scaffold that showed mapping differences between the 4DS and CS (Fig 6). Both of the sequences, the homologous 3,665-bp sequence from the CS genome and the 68-kb sequence from the D genome, contained many transposable elements and similar GC levels (48.05% and 52.90%, respectively, Fig 6). Due to the incomplete genome sequence, we tentatively suggest that the partial sequences of the 994-kb region in the wheat CS genome and the 68-kb region in wheat D genome may underlie de novo centromere formation in 4DS.
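The GC levels quoted above can be recomputed for any sequence of interest. The following is a minimal Python sketch, assuming that GC content is defined as the share of G and C among unambiguous bases (A, C, G, T); the function name `gc_percent` and the toy sequence are illustrative only and are not taken from the analysis pipeline used in this study.

```python
def gc_percent(seq: str) -> float:
    """Return GC content as a percentage of unambiguous bases (A, C, G, T).

    Assumption: ambiguous symbols such as N are ignored rather than
    counted in the denominator.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / (gc + at)


# Hypothetical toy sequence, not derived from any wheat scaffold.
toy = "ATGCGCNNTA"
print(f"{gc_percent(toy):.2f}")  # prints 50.00
```

Ignoring ambiguous bases keeps the estimate comparable between draft scaffolds that differ in their proportion of gap characters.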
In addition, the same strategy that was used to isolate the neocentromere sequences on TAI-14 was employed with 4DS to better understand the sequences in the 994-kb region. A 769-bp fragment (named 4DS-1) near the original centromere was identified as a candidate

Transcription and histone modification accompanied de novo centromere formation in 4DS
A previous study showed that the transcripts of centromeric sequences can function as essential components of centromere structure and activity [37]. Histone modifications, such as methylation and phosphorylation, are important regulators of centromere stability and activity [35,38]. For sequence 4DS-1 in the 4DS de novo centromere, we tested whether there were changes in transcription and the transcripts that interacted with CENH3 via RT-qPCR and RNA-CENH3-ChIP. We selected two fragments (~300 bp) of the 4DS-1 sequence, termed 4DS-1-1 and 4DS-1-2. Compared with CS, the transcripts of 4DS-1-1 and 4DS-1-2 were slightly but not significantly decreased in 4DS (Fig 7C). However, the amounts of the 4DS-1-1 and 4DS-1-2 transcripts that were associated with CENH3 were remarkably increased in 4DS compared with CS (Fig 7D). These results suggest that increased transcription of the corresponding sequences may accompany de novo centromere formation in 4DS. We also checked the possible changes in six histone modifications between the normal and de novo centromeres via immunoassay. No significant signals for the euchromatin marks H2AZ and H3K4me3 were detected in either centromere (S6A and S6B Fig), and enrichment of the euchromatin-related histone mark H3K4me2 was discernible for both centromeres (S6C Fig). Compared with noncentromeric chromosome ends, both centromere types revealed a reduction in the heterochromatin marks H3K27me2 and H3K27me3 (S6D and S6E Fig), whereas there were no obvious differences in H3K9me2 in the two centromere types (S6F Fig).
In general, there were no significant differences in the accumulation of most euchromatic or heterochromatic histone markers between the normal and de novo centromeres.

Multi-locus centromere formation in wheat wide hybrids
We crossed the hexaploid amphidiploid 8802 with the T. aestivum Chinese Spring nulli-tetrasomic lines to establish new substitution lines for the chromosomes from the E genome. All chromosomes from 8802 have only one centromere, as determined by the FISH signals of CRW (S7A Fig). Chromosomes with two regions containing CRW sequences were identified in the F1 hybrids of the nulli-tetrasomic (3, 5 and 6 homologous groups) lines × 8802 (S7C-S7F Fig). In the F5 generation of the hybrids between nulli-tetrasomic line N6AT6B (2n = 42) × 8802, chromosomes containing two regions with centromeric sequences were inherited from the F1 generation, but only one region was functional (Fig 8A and 8E). In addition to the two-locus centromere, a three-locus centromere was discovered on an alien chromosome in the F5 generation (Fig 8B), and the middle centromere region was shown to act as the functional centromere (Fig 8F). Furthermore, two different three-locus centromeres were observed in the F6 generation (S8C and S8D Fig).
The repetitive sequences pAs1 and pSc119.2 were used to karyotype the chromosomes with two- or three-locus centromeres. The chromosomes with two centromeric regions originated from the 2E chromosome of 8802, but the three centromeric regions were produced by the combination of sequences from two different arms of chromosomes 2E and 5E, rather than direct inheritance from any chromosome (Figs 8B, 8D and S7B). Furthermore, two different three-locus centromeres were observed in the progeny. One progeny contained an isochromosome 2ES, and the other progeny contained a chromosome produced from the wheat 6BL chromosome and the Th. elongatum 2E chromosome (S8A and S8B Fig). However, neither was stably transferred to the next generation.
Multi-centric chromosomes are frequently formed in the hybrids of wheat and related species, such as Th. elongatum, Th. ponticum, Th. intermedium, Agropyron cristatum, Hordeum vulgare and S. cereale (S9 Fig). However, unlike the two-locus centromere in the hybrids of N6AT6B × 8802, both centromeres in these dicentric chromosomes were active, which caused chromosomal loss in the next generation.

Expansion of centromeric retroelements in wheat-rye addition lines
Heterochromatin alterations and chromosomal rearrangements associated with centromere changes have been reported in derivatives of wheat-rye hybrids [62]. The chromosomes containing altered centromeres were lost in the next generation. Here, we discovered another wheat-rye hybridization-promoted centromeric retrotransposon expansion in different rye addition lines. These changes can be stably transmitted to offspring.
The rye addition lines were generally obtained from successive backcrossing between wheat and triticale. A novel octoploid triticale (2n = 56) was generated by hybridization between T. aestivum Mianyang 11 (2n = 42) and S. cereale Kustro (2n = 14). In the wheat and octoploid triticale hybrids, a novel chromosome emerged after the joining of the two 2R chromosomes (2R-2R). Compared with the normal centromeres of chromosomes 2R and 2RL, the centromere in the 2R-2R chromosome was drastically expanded and was much larger than the 2RL arm (Figs 9A, 9D and S10). Further analysis showed that this large centromere consisted of two normal centromeres from chromosome 2R and a centromere-like region between them with dispersed pAWRC.1 (rye-specific centromeric retrotransposon) sequences [63] (Fig 9A). Dispersed centromeric retrotransposons may function as a part of an active centromere, as 2R-2R was broken into smaller fragments after self-pollination (Fig 9B and 9E). In the progeny, we detected a new 2R chromosome (smaller than the canonical 2R) that retained a region with dispersed pAWRC.1 sequences and was approximately half the size of 2R-2R (Figs 9B, 9E and S10B). A novel chromosome 6R contained pAWRC.1 sequences in a region near the functional centromere in the 6R addition line (Figs 9C, 9F and S10B). However, these regions did not have centromere activity in these progeny (S11 Fig).

Discussion
De novo centromere formation adjacent to native centromeres depends on unique epigenetic modifications in wheat and its wide hybrid
Functional centromeres without classic centromeric sequences have been reported in humans, fungi, and plants [29][30][31][32][64,65]. Here, we found that most neocentromeres in wheat and its addition lines consist of genomic sequences that have loose resemblance to the sequences of normal centromeres. The sequences involved in the de novo centromere formation in the ditelosomic lines 4DS and TAI-14 were located at the chromosome arms adjacent to the native centromere. The 4DS-1 sequence was located very near the centromere of chromosome 4D, and sequence TAI-14-2 was detected in the pericentromeric region before the new centromere had formed (Figs 7B and S4). The preferential localizations of the de novo centromeres in wheat and its addition lines were similar to some of the newly formed centromeres in chicken, Candida albicans and a wheat-barley addition line [66,67]. However, in contrast to wheat, human neocentromeres can originate at multiple positions on different chromosomes through the binding of essential kinetochore proteins [30]. In addition, neocentromeres also have been observed at multiple locations on chromosome arms in other plants [29,32,65]. This difference suggests that the de novo centromere formation was not dependent on location and occurred under appropriate epigenetic conditions, as indicated by previous studies [28,31,32]. We observed that CENH3, H2AT133ph and H3S10ph were deposited in all of the newly formed centromeres in wheat and its wide hybrids (Figs 2D, 2H, 2L, 3D-3F and 4B). Apart from CENH3 (CENP-A), other histone modifications in the centromeric region are also critical elements for centromere stability and activity. CENP-A nucleosomes are interspersed with the H3K4me2 nucleosomes within the centromeric chromatin of humans and flies [68,69]. Enrichment of H3K9me2 and H3K9me3, but not H3K4me2, has been observed in maize centromeres [70,71]. Furthermore, H3K4me2, H4ac and H3K27me1 were enriched in the centromeric chromatin of rice [72]. We observed that H3K4me2 was slightly enriched in the normal and newly formed centromeres of wheat, similar to previous reports in humans, flies and rice. However, weak signals for H3K27me and H3K9me2 were observed in these two centromeres, which were different from the centromeres of maize and rice (S6 Fig). Based on our results, the histone modifications in the new centromeres are similar to those in the normal centromeres, highlighting the possibility that unique histone modifications in wheat centromeric regions may be required for de novo centromere formation and stability.
Noncoding RNAs from the centromeric sequences directly interact with the kinetochore and recruit CENPC to centromeres [73]. In maize, transcripts of centromeric retrotransposons and repeat sequences have been associated with the CENH3 protein, as detected by ChIP with anti-CENH3 antibodies [74]. Our results showed that the expression of neocentromeric sequence 4DS-1 was greatly elevated during the process of de novo centromere formation (Fig 7C and 7D). The transcripts of centromeric sequences may serve as structural and regulatory components of the centromere. In fact, the transcription process, rather than the transcripts themselves, may have facilitated CENP-A deposition and nucleosome assembly at the centromere [35,37]. De novo centromere formation in 4DS increased the expression of the 4DS-1 sequence, which may provide the RNAs that regulate CENH3 deposition and recruit epigenetic elements [35,37].

High variability of new centromeric sequences between different chromosomes and species
For cereal centromeres, two common sequences, described as cereal centromeric sequence (CCS1) and Sau3A9, were first reported in Brachypodium and Sorghum bicolor [75,76]. In most cereals, these centromeric sequences represent parts of the Ty3/gypsy retroelement, e.g., 'cereba' of barley, implying that cereals maintain conserved retroelements in their centromeric regions [77]. However, the tandem repeats in the centromere are species-specific [77]. We observed that the pericentromeric sequence TAI-14-2 of wheat was located in the centromeric region of several chromosomes of the wild relative Th. intermedium (S2 Fig). The centromeric sequences may undergo insertion and amplification during species differentiation [20,25,51], providing insight into the mechanism by which centromeric sequences change differentially in wheat and its wild relatives.
Centromeres on different chromosomes of one species may evolve rapidly and are independent of each other [20]. The novel centromeric sequence 4DS-1 displayed a specific localization pattern in the different chromosomes of the wheat subgenomes, and its chromosomal location was very near the centromeres of chromosomes 7A, 7D, 2A and 2D (Figs 7B and S4). However, the sequence was not readily detectable on chromosomes 2B and 7B, and it was not found in other chromosomes, such as 4A and 4B. Similar to the situations in rice, maize, and potato, the centromeres on different chromosomes experienced differentiation by sequence loss and insertion [20,25,51].
Centromeric protein complexes, which are necessary for proper chromosome segregation, can be formed by the speciation factors HMR and LHR to mediate hybrid sterility and incompatibility in Drosophila [78]. During hybridization, the intragenomic conflicts of different centromeres may cause incompatibilities in hybrids [79,80]. These results strongly suggest that centromere divergence has an important impact on the generation of biodiversity [78]. To adapt to highly variable centromere DNA sequences, the centromere proteins undergo rapid evolution to maintain a functional centromere [81]. We observed that the new centromeric DNA sequences are highly variable between different genomes and chromosomes, which may allow the centromeric proteins to mediate intragenomic incompatibility and genomic specificity in the nascent hybrids. Although detailed reports are not available, it is likely that centromere sequence diversity has an important impact on speciation. The diversity of centromeric sequences and its link to speciation and genomic stability deserve further analysis.

Most variations in the centromeric sequences occur during wide hybrid formation
The formation of multiple centromeres, holocentromeres and neocentromeres implies that centromere positions may be alterable rather than permanently fixed [11,31,32,82,83,84]. Wide hybridization can trigger chromosomal rearrangements and genome reorganization, accompanied by centromere alterations [62,85]. Unstable di-centromeres and multiple centromeres have been associated with the formation of inter- and intra-chromosomal translocations in wheat-rye hybrids [62]. Our results demonstrated that multi-locus centromere formation, centromere expansion, and canonical centromeric sequence elimination may yield novel chromosomes in wheat and its wide hybrids (see summary Fig 1).
Chromosomes with two regions containing centromeric sequences were observed in the F1 hybrids of three nulli-tetrasomic lines and 8802 (S7 Fig). During meiosis I of the F1 generation, several univalents of the E genome in amphidiploid 8802 may experience centromere breakage. Centromere misdivision, which depends on the orientation of the univalent, may occur across either the centromere or the pericentric chromatin, but chromosome fragments containing centromeric and pericentric regions may survive [86]. This observation suggests the possibility that chromosomal fragments containing centromeric and pericentromeric regions were rejoined to a novel chromosome and induced the formation of two- and three-locus centromeric regions. Centromere inactivation allows dicentric chromosomes with only one functional centromere to be stably transmitted to the next generation (Fig 8E), similar to stable dicentric A-A and A-B chromosomes in maize [18,33,34]. However, the chromosome containing a three-locus centromere still suffered from centromere breakage, which led to its structural alterations in the progeny. As a consequence of a translocation in the nulli-tetrasomic line N6AT6B, a chromosome with a three-locus centromere included the 2E chromosome from 8802 and the 6B chromosome from N6AT6B (S8D Fig). Similar to rye and maize [18,86,87], intrachromosomal recombination and centromere breakage likely promoted the formation of multi-locus centromeres and novel chromosomes in wheat and its wide hybrids.
Retrotransposons can be activated during wide hybridization [88]. Interspecific hybrids triggered the amplification of centromeric satellite repeats and retrotransposons [85]. In a wheat-rye hybrid, we observed two chromosomes that had likely lost their telomeres and fused into one chromosome, as previously suggested [89]. The wide hybridization affected the stability of the centromeric retrotransposons and the activation of rye-specific retrotransposons in the 2R-2R and 6R addition lines, causing centromeric sequence expansion (Fig 9A and 9C). These unstable centromeres may subsequently lead to chromosome breakage and different progeny with expanded centromeres, suggesting that centromere variants may trigger the formation of novel chromosomes. Total centromere size has been postulated to be positively correlated with genome size rather than chromosome size [90]. Centromere domains of several maize chromosomes (average size 1.8 Mb) expanded to 3.6 Mb in the background of the oat genome [48]. However, the expanded centromere on chromosome 2R-2R may instead be a general response to genomic stress following the wheat-rye hybridization, rather than an adaptation to the wheat centromere size.
In summary, we observed that the elimination, rearrangement and expansion of centromeric sequences affect chromosome morphology and maintenance in wheat wide hybrids. De novo centromere formation promotes the accurate segregation of chromosomes that experienced centromere sequence elimination. The new centromeres have low sequence homology but high epigenetic similarity to normal centromeres. The highly variable centromeric sequences between genomes and chromosomes may facilitate genomic specificity and differentiation in hybrids. The centromeric sequences involved in de novo centromere formation are mainly retrotransposon-like sequences, and their RNAs are transcribed at high levels. Multiple centromeres and centromere sequence expansion strongly influence centromere activity and cause chromosome breakage and rearrangements. More importantly, centromere variations in the ditelosomic lines 4DS, 1BS, 5DL, 5DS and TAI-14 may specifically affect the size and DNA organization of normal chromosomes. Thus, these may be useful for future studies of chromosome sorting and sequencing.

Cytological preparation, probe labeling and image processing
The root tips were prepared for the FISH experiments and probes as previously described [91]. The CRW and other genomic DNAs were labeled with Alexa Fluor-488-dUTP (green) or Alexa Fluor-594-5-dUTP (red) as needed. The root tip samples of the different wheat lines were treated with the same conditions, and with an equal amount of probe. The FISH images were acquired using an epifluorescence Olympus BX61 microscope (Olympus China Inc, Beijing, China) with the same exposure time and were processed with Adobe Photoshop CS 3.0. For analysis, the fluorescence was quantified using ImageJ [92]. At least 20 cells from three different plants were counted in each of the different wheat lines. Significant differences were calculated using Microsoft Excel and Student's t-test (two-tailed).
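The comparison of quantified fluorescence between two lines can be illustrated with a two-sample Student's t statistic. The sketch below is a minimal Python example, assuming the classic pooled-variance (equal-variance) form of the test; the source states only that a two-tailed Student's t-test was run in Microsoft Excel, so the variance assumption, the function name `student_t` and the toy intensity values are all hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Return (t, degrees of freedom) for a pooled-variance two-sample t-test.

    Assumption: both groups share a common variance (classic Student's
    test). The two-tailed p-value is then read from a t distribution
    with the returned degrees of freedom.
    """
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two measurements")
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical signal intensities (arbitrary units), not measured data.
control = [1.0, 2.0, 3.0]
telosomic = [2.0, 3.0, 4.0]
t, df = student_t(control, telosomic)
print(round(t, 4), df)  # prints -1.2247 4
```

For a two-tailed test, the absolute value of t is compared with the critical value of the t distribution at the chosen significance level.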

Immunolocalization in somatic cells
The root tips were fixed with 4% formaldehyde in 1×PBS for 1 h, and the metaphase chromosomes were prepared as previously described [17]. The wheat-specific CENH3 antibodies were produced in our laboratory. The phH2AThr-133 and phH3Ser-10 antibodies were described previously [16,17].

ChIP-seq, RNA-ChIP-qPCR and RT-qPCR
Chromatin immunoprecipitation (ChIP) was performed according to a previously described method [93]. Approximately 20 g of fresh leaf tissue from Chinese Spring and 4DS was prepared for CENH3-ChIP (CENH3 antibody used as described above). ChIP-seq was conducted according to a previously described method [32]. Using an Illumina HiSeq2000 platform, the enriched DNA samples were sequenced to generate paired-end 100-bp sequence reads. RNA-ChIP was performed using a method that was similar to ChIP [93]. RNase activity was inhibited using Recombinant RNase inhibitor (RRI) in the RNA-ChIP process. The RNA was extracted from the sample using TRIzol reagent. The RNA was reverse-transcribed into cDNA using M-MLV reverse transcriptase (Promega) and random primers (New England Biolabs). The qPCR protocol was performed as described [94].

ChIP-seq data analysis
Nearly 14,000 Mbp of raw ChIP-seq paired-end 100-bp reads were mapped to the wheat CS and wheat D genomes using BWA software [60]. All mappable reads were randomly assigned a locus from the possible options, and the duplicated reads were removed. The reads per million (RPM) values of every scaffold in the genome were calculated to show the normalized enrichment. We selected the scaffolds that had enriched reads in both 4DS and the control, with the RPM ratios between two samples 20, or scaffolds that only had enriched reads in 4DS, with read counts 20. IGV Tools was used to visualize the normalized read distribution of each scaffold [95]. The anti-CENH3 ChIP-seq data were deposited in the Gene Expression Omnibus (GEO) database under number GSE63752.
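The scaffold selection described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline used in this study: per-scaffold read counts are taken as given, RPM is normalized by the sum of those counts, and the comparison against the thresholds of 20 is assumed to be "greater than or equal to" because the operator is not recoverable from the text. The names `rpm` and `select_scaffolds` and the toy counts are hypothetical.

```python
def rpm(counts):
    """Convert raw per-scaffold read counts to reads per million (RPM).

    Assumption: the normalizing total is the sum of the counts supplied.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mapped reads")
    return {scaffold: n * 1e6 / total for scaffold, n in counts.items()}


def select_scaffolds(test_counts, control_counts, min_ratio=20.0, min_reads=20):
    """Return scaffolds enriched in the test sample relative to the control.

    A scaffold is kept if (a) it has reads in both samples and its RPM
    ratio (test / control) is at least min_ratio, or (b) it has reads
    only in the test sample and at least min_reads of them.
    The ">=" comparisons are an assumption.
    """
    test_rpm = rpm(test_counts)
    control_rpm = rpm(control_counts)
    selected = []
    for scaffold, n in test_counts.items():
        ctrl = control_rpm.get(scaffold, 0.0)
        if ctrl > 0:
            if test_rpm[scaffold] / ctrl >= min_ratio:
                selected.append(scaffold)
        elif n >= min_reads:
            selected.append(scaffold)
    return sorted(selected)


# Hypothetical read counts for four scaffolds (not real data).
reads_4ds = {"s1": 400, "s2": 100, "s3": 25, "s4": 475}
reads_cs = {"s1": 10, "s2": 90, "s4": 900}
print(select_scaffolds(reads_4ds, reads_cs))  # prints ['s1', 's3']
```

In the toy data, s1 passes on the RPM ratio (40-fold) and s3 passes because it has 25 reads in the test sample and none in the control.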

Dot-blot hybridization of the ChIP-enriched DNAs
The ChIPed DNAs from CS and 4DS were amplified by DOP-PCR (primer sequence was 5'-CCGACTCGAGNNNNNNATGTGG-3') [96]. The DOP-PCR product was used as the DNA target, and the ChIPed DNAs were used as probes. The dot-blot protocol was performed as described [97]. The differences in the sequences between CS and 4DS that were determined by the dot-blot hybridization were selected as candidate centromeric sequences for FISH.

(A) and (B). The novel 2R and 6R addition lines, respectively. The genomic DNA of rye is labeled in green, CENH3 is labeled in red, and DAPI staining is labeled in blue. The insets show high-magnification images of the chromosomes with expanded centromeres. Bar = 10 μm. (TIF) S1