Nucleotide diversity patterns at the DREB1 transcriptional factor gene in the genome donor species of wheat (Triticum aestivum L.)

Bread wheat (AABBDD) originated from the diploid progenitors Triticum urartu (AA), a relative of Aegilops speltoides (BB), and Ae. tauschii (DD). The DREB1 transcriptional factor plays a key regulatory role in low-temperature tolerance. Modern breeding strategies have seriously decreased agricultural biodiversity, which has led to a loss of elite genes underlying abiotic stress tolerance in crops. However, the natural diversity of this gene is largely unknown in the genome donor species of wheat. We characterized the diversity pattern of the dehydration response element binding protein 1 (DREB1) gene in Ae. speltoides, Ae. tauschii, T. monococcum and T. urartu. The highest nucleotide diversity value was detected in Ae. speltoides, followed by Ae. tauschii and T. monococcum. The lowest nucleotide diversity value was observed in T. urartu. Nucleotide diversity and haplotype data might suggest no reduction of nucleotide diversity during T. monococcum domestication. Alignment of the 68 DREB1 sequences revealed a large (70 bp) insertion/deletion in accession PI486264 of Ae. speltoides, which differed from the sequences of the other Ae. speltoides accessions, suggesting the likely existence of two different ancestral Ae. speltoides forms. The implications of sequence variation in Ae. speltoides for the origin of the B genome in wheat are discussed.


Introduction
Frequent changes in climate, such as sudden low temperature, high temperature, and flooding, have caused serious damage to crop growth and development [1,2]. Meanwhile, drought and salinity under increased agricultural pressure have gradually constrained the yield and geographical distribution of global crops, resulting in a 70% reduction in their potential yield [3] and causing irreversible damage to field ecology [4]. Unreasonable practices will continue to increase soil salinity [5], and the global drought problem may gradually worsen in the foreseeable future [6]. Because of these effects on plant growth and development [1], the mechanisms of plant tolerance and adaptation to abiotic stress have become a research hotspot.
However, the nucleotide diversity of the DREB1 gene in wheat genome donor species is uncharacterized. To use wild relatives efficiently for improving wheat tolerance to abiotic stress, we analyzed the nucleotide diversity of the DREB1 gene among the genomes of Ae. speltoides, Ae. tauschii, T. monococcum, and T. urartu. Haplotype diversity and evolutionary factors, combined with the relationship between DREB1 transcriptional factors and stress resistance, were used to further explore the evolution and origin of the wheat-tribe genome.

Plant materials
Thirteen accessions of Aegilops speltoides (S genome), 12 accessions of Ae. tauschii (D genome), 24 accessions of Triticum monococcum (Am genome) and 19 accessions of T. urartu (Au genome) were sampled (Table 1). The seeds were provided by the USDA (United States Department of Agriculture). Germinated seeds were transplanted to a sand-peat mixture, and the plants were maintained in a greenhouse. Twenty-three sequences from other Triticeae species and Brachypodium distachyon, downloaded from the NCBI website, were included in the phylogenetic analysis.

DNA extraction, amplification and sequencing
Leaf tissue samples were collected and frozen in liquid nitrogen. DNA was isolated using the GeneJet Plant Genomic DNA Purification Mini Kit according to the manufacturer's instructions (Thermo Scientific). The isolated genomic DNA was stored at -20˚C until use.
PCR products were commercially sequenced by the Shanghai Sangon Biological Engineering & Technology Service Ltd (Shanghai, China). To enhance sequence quality, both forward and reverse strands were sequenced independently. To guard against errors introduced by Taq DNA polymerase during PCR amplification, each sample was independently amplified twice and sequenced.

Data analysis
Automated sequence outputs were visually inspected against their chromatograms. Multiple sequence alignments were performed using ClustalX with default parameters. The maximum-parsimony (MP) method was used for phylogenetic analysis with the computer program PAUP* ver. 4 beta 10 [28]. All characters were specified as unweighted and unordered. The most parsimonious trees were constructed by a heuristic search using Tree Bisection-Reconnection (TBR) with the following parameters: MulTrees on and ten replications of random addition sequences with the stepwise addition option. A strict consensus tree was generated from the multiple most parsimonious trees. The consistency index (CI) and the retention index (RI) were used to estimate the overall character congruence. Bootstrap (BS) values from 1,000 replications [29], calculated by a heuristic search using the TBR option with MulTrees on, were used to test the robustness of clades.
Bayesian analysis was used for the phylogenetic analysis of haplotypes. jModelTest 2.1.10 [30] was used with default parameters to identify the best-fitting model of sequence evolution. The maximum likelihood value (-lnL), Akaike information criterion (AIC) [31] and Bayesian information criterion (BIC) [32] were estimated. The model test showed that the TrN+I substitution model gave the best BIC and AIC scores; therefore, the TrN+I model was used in the Bayesian analysis with MrBayes 3.1 [33]. MrBayes 3.1 was run with the program's standard setting of two analyses in parallel, each with four chains, and convergence was assessed by calculating the standard deviation of split frequencies between the analyses. A total of 689,000 generations were run to bring the standard deviation of split frequencies below 0.01. Samples were taken every 1,000 generations. The first 25% of samples from each run were discarded as burn-in to ensure the stationarity of the chains. Bayesian posterior probability (PP) values were used to test the robustness of clades.
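The model-selection step can be illustrated with a minimal sketch of how AIC and BIC rank candidate substitution models. The log-likelihood values below are hypothetical placeholders, not results from this study, and the parameter counts cover only free substitution-model parameters (jModelTest may also count branch lengths). The alignment length of 829 is an assumption taken from the character counts reported for the parsimony analysis.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 lnL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: k ln(n) - 2 lnL (lower is better)."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


# Hypothetical (lnL, free substitution parameters) per model; illustrative only.
candidates = {
    "JC": (-2150.0, 0),
    "HKY": (-2105.0, 4),
    "TrN+I": (-2090.0, 6),
    "GTR+I+G": (-2088.5, 10),
}
N_SITES = 829  # assumed alignment length (642 + 110 + 77 characters)

scores = [
    (name, aic(lnl, k), bic(lnl, k, N_SITES))
    for name, (lnl, k) in candidates.items()
]
best_aic = min(scores, key=lambda row: row[1])[0]
best_bic = min(scores, key=lambda row: row[2])[0]

for name, a, b in scores:
    print(f"{name:8s} AIC={a:9.2f} BIC={b:9.2f}")
print("best by AIC:", best_aic, "| best by BIC:", best_bic)
```

Both criteria penalize extra parameters, BIC more strongly for long alignments, so a richer model is preferred only when its likelihood gain outweighs the penalty.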
Protein domains and conserved motifs of the sequences were characterized using Pfam (https://pfam.xfam.org/search) and Multiple Em for Motif Elicitation (MEME) software [38].

Sequence analysis
DNA from 13 accessions of Aegilops speltoides (S genome), 12 accessions of Ae. tauschii (D genome), 24 accessions of Triticum monococcum (Am genome), and 19 accessions of T. urartu (Au genome) was amplified using the Dreb1F/R primer pair. The amplified products were approximately 850 bp in size. Complete alignment of the 68 DREB1 sequences detected a large (70 bp) insertion/deletion in accession PI486264 of Ae. speltoides (Fig 1). A BLAST search against NCBI found that this sequence shared 100% identity with T. aestivum DREB transcription factor 6 (DREB6) mRNA (AY781361.1); the sequence on T. aestivum chromosome 3B (LS992087); and the T. aestivum genome B dehydration-responsive element-binding protein (DREB1) gene, partial cds (DQ195069.1). A BLASTX search found that the protein sequence of this accession matched DREB transcription factor 6 (AAX13289.1) of T. aestivum with 100% identity, while it lacked "KDESESPPSLISNAPTAALHRSDA" when compared with the AP2-containing protein in T. aestivum (AAL01124.1). The haplotypes of DREB1 sequences from Ae. speltoides, Ae. tauschii, T. monococcum, and T. urartu were calculated. A total of 19 haplotypes were identified in the 68 accessions of these four species. Seven, 7, 10, and 4 haplotypes were detected from 13 sequences of Ae. speltoides, 12 sequences of Ae. tauschii, 24 sequences of T. monococcum, and 19 sequences of T. urartu, respectively (Table 2). Twenty-six out of 68 accessions belonged to Hap 2, and 15 belonged

Phylogenetic analysis
The phylogenetic relationship of 90 DREB1 sequences from Ae. speltoides, Ae. tauschii, T. monococcum, and T. urartu, along with DREB1 sequences from other Triticeae species, was analyzed using maximum parsimony. The sequence from Brachypodium distachyon was used as an outgroup. The maximum parsimony analysis resulted in 236 most parsimonious trees (642 constant characters, 110 parsimony-uninformative characters, 77 parsimony-informative characters, CI excluding uninformative characters = 0.864; RI = 0.935). The strict consensus phylogenetic tree resolved distinct Aegilops + Triticum and Hordeum species groups (Fig 2) with high bootstrap support (78% and 100%, respectively). All sequences from the Aegilops and Triticum species studied here were grouped into the Aegilops + Triticum clade except the B-genome sequences of T. aestivum and Ae. speltoides accession PI 486264, which formed a group with 97% bootstrap support. Within the Aegilops + Triticum clade, the sequence DQ195070, encoding dehydration responsive element binding protein (DREB1) on the A genome of T. aestivum, was grouped with DQ022952, dehydration responsive element binding protein W73 mRNA (87% bootstrap support), and was nested within most sequences from Ae. speltoides, Ae. tauschii, T. monococcum, and T. urartu (56% bootstrap support). The sequence DQ195068 encoding dehydration-responsive element binding protein on D genome of T.  Fig 3 with Bayesian posterior probability (PP) values above branch. Haplotypes of Ae. speltoides, Ae. tauschii, T. monococcum, and T. urartu were grouped into different clades (Fig 3). Hap 4 and Hap 5 showed a close relationship with strong support (PP = 0.99).
Pfam analysis showed that all sequences contain the AP2 domain structure. The conserved motif analysis of the DREB proteins found that haplotype 19 lacked the motif "PPSLISNGPTAALHRSDAKDESESAGTVARKVKKEVSNDLRSTHEEHKTL", and that haplotypes 5, 9, 13, and 16 lacked the motif "KKVRRRSTGPDSVAETIKKWKEENQKLQQENGSRKAPAKGS" (Fig 4).

Nucleotide diversity in Ae. speltoides, Ae. tauschii, T. monococcum, and T. urartu
Previous studies have provided evidence that crop domestication and modern breeding strategies have seriously reduced genetic diversity in various species [39,40], which has led to a loss of elite genes underlying abiotic stress tolerance in crops [21]; therefore, exploiting the genetic resources of wild relatives is a widely used strategy to increase biodiversity for crop improvement. Characterization of genetic diversity not only has a significant effect on genetic improvement and resistance research, but also provides new direction for the conservation and utilization of genetic resources in germplasm gene banks [41]. The nucleotide diversity of Ae. speltoides, Ae. tauschii, T. monococcum, and T. urartu was examined here.
The genomes of T. urartu Thum ex Gand (genome Au) and T. monococcum Linn (genome Am) have similar genome size and gene content [42]. Triticum urartu, the wild diploid wheat from the Fertile Crescent region, has long been considered the A-genome donor to tetraploid and hexaploid wheat species [17,43]. The diploid wheat T. monococcum was among the first crops domesticated in the Fertile Crescent 10,000 years ago [44]. Our results showed that both the number of haplotypes and the nucleotide diversity values of T. monococcum were much higher than those of T. urartu. This might support the conclusion of "no reduction of nucleotide diversity during T. monococcum domestication" drawn from a study of 18 loci in 321 wild and 92 domesticated lines of Triticum species [44], but it was not consistent with Qi et al. [45]. The higher variability of DREB1 in T. monococcum than in T. urartu could be due to the regulatory role of DREB1 in response to abiotic stress, as this kind of gene might have experienced elevated rates of mutation and adaptation. This was evidenced by the highest number of haplotypes being detected in T. monococcum among the four species analyzed here. Indirectly, the variation generated in T. monococcum might be a resource for wheat breeding to improve resistance to abiotic stress.
The higher nucleotide diversity value of DREB1 in Ae. speltoides than in Ae. tauschii, T. monococcum, and T. urartu was expected, and might agree well with previous studies [45,46]. This might be attributable to the mating systems of these species. Aegilops speltoides is an outcrossing species, while Ae. tauschii, T. monococcum, and T. urartu are inbreeding species. Mating system is one of the major factors controlling molecular diversity [44,47,48]. It was reported that the average π value (0.01323) of the Acc-1 gene in the genomes of outcrossing Triticeae species was two-fold the value (0.005664) in the genomes of selfing species [49].
In general, genetic bottlenecks acting on neutrally evolving loci during the domestication process, subsequent breeding, or both, are sufficient to account for reduced diversity [50]. This reduction is evident as a shift toward more positive values of Tajima's D in domesticated populations relative to wild ones [51,52]. Domesticated T. monococcum showed negative Tajima's D values (Table 3), suggesting either that genetic bottlenecks might not have affected T. monococcum or that the species carries a signature of a recent population expansion.
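The statistics discussed in this section can be sketched as follows: nucleotide diversity (π), Watterson's θ, and Tajima's D computed from an aligned set of sequences using Tajima's (1989) standard formulas. This is an illustrative sketch only; the toy alignment is hypothetical, the function names are our own, and the code assumes a gap-free alignment of equal-length sequences (sites with gaps or missing data would need to be removed first). It is not the software used in this study.

```python
from itertools import combinations
from math import sqrt


def segregating_sites(seqs):
    """Count alignment columns that contain more than one nucleotide."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def diversity_stats(seqs):
    """Return (pi per site, Watterson's theta per site, Tajima's D or None)."""
    n, length = len(seqs), len(seqs[0])
    s = segregating_sites(seqs)
    k = mean_pairwise_differences(seqs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    pi = k / length
    theta_w = s / a1 / length
    # D is undefined without polymorphism; its variance vanishes for n < 4.
    if s == 0 or n < 4:
        return pi, theta_w, None
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
    return pi, theta_w, d


# Hypothetical toy alignment (four sequences, ten sites).
toy = ["ATGCATGCAT", "ATGCATGCAT", "ATGAATGCAT", "ATGAATGCTT"]
pi, theta_w, tajima_d = diversity_stats(toy)
print(f"pi={pi:.4f} theta_W={theta_w:.4f} D={tajima_d:.3f}")
```

A negative D arises when π is smaller than θ, that is, when there is an excess of rare variants, which is the pattern usually read as consistent with population expansion; a positive D indicates an excess of intermediate-frequency variants.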

Comparison of nucleotide diversity of DREB1 gene with other species
Since DREB1/CBF (dehydration responsive element binding/C-repeat binding factor) genes have important roles in responding to abiotic stress, the nucleotide diversity of the DREB1 gene has been characterized in several plants [53-57]. In 126 wheat lines, the nucleotide diversity π and θ values of the wheat DREB gene on chromosome 1A were 0.180 and 0.392, respectively [27], which were much higher than the values detected in the A-genome species T. monococcum and T. urartu. DREB1A nucleotide diversity was calculated from 126 wheat lines that were developed by the International Maize and Wheat Improvement Center from entries in the elite spring wheat yield trial, semiarid wheat yield trial, and high temperature wheat yield trial [57]. During breeding, these lines might have experienced different natural selection pressures, resulting in the wide range of diversity of this gene. Speculatively, the significant Tajima's D value of DREB1A in these wheat lines might indicate the presence of selection footprints, while the Tajima's D values of DREB1 in the T. monococcum and T. urartu accessions studied here did not reach significance.
The haplotype (gene) diversity of the DREB1 gene among 10 promising upland and lowland rice cultivars was 0.756 [53], which is comparable to the haplotype diversity detected in Ae. speltoides and T. monococcum, but higher than that in T. urartu. The nucleotide diversity (π) of DREB1 in chickpea (n = 191) was 0.0011 [54], which is similar to the values detected in this study, while the nucleotide diversity in the C. canephora CDS region (π = 0.0101, θ = 0.0080) [56] was much higher than that in our study. This might be attributed to the nature of the species.
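For readers comparing haplotype (gene) diversity values across studies, the sketch below shows how such a value is commonly obtained, using Nei's unbiased estimator Hd = n/(n-1) × (1 - Σ p_i²), where p_i is the frequency of haplotype i among n sequences. The toy sequences and the function name are hypothetical, and identical aligned strings are assumed to define one haplotype; this is not necessarily the exact procedure of the cited studies.

```python
from collections import Counter


def haplotype_diversity(seqs):
    """Nei's unbiased haplotype diversity for a list of aligned sequences."""
    n = len(seqs)
    counts = Counter(seqs)  # identical sequences collapse into one haplotype
    sum_sq = sum((count / n) ** 2 for count in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical sample: three haplotypes observed 3, 2 and 1 times.
toy = ["AAT", "AAT", "AAT", "ACT", "ACT", "GCT"]
n_haplotypes = len(set(toy))
hd = haplotype_diversity(toy)
print(f"haplotypes={n_haplotypes} Hd={hd:.4f}")
```

Because the estimator depends on both the number of haplotypes and how evenly they are represented, two samples with the same haplotype count can still differ in Hd.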

Conserved motif of DREB proteins
The conserved motif analysis of the DREB proteins found that some sequences lacked the motif "PPSLISNGPTAALHRSDAKDESESAGTVARKVKKEVSNDLRSTHEEHKTL" or the motif "KKVRRRSTGPDSVAETIKKWKEENQKLQQENGSRKAPAKGS". All sequences contain the AP2 domain structure, suggesting structural diversity but functional similarity of the DREB gene in these species. Allele mining in diverse rice genotypes also found indels across DREB1A and DREB1B [55]. Since DREBs are important transcriptional factors regulating stress-responsive gene expression, the highly conserved domains in these genes are essential for their specific biological functions. Further correlating the SNPs and indels in DREB1 with genotype response to stress will enhance our understanding of the role played by this gene.
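A simple way to tabulate which haplotypes lack a given motif is an exact-match scan of the translated sequences, sketched below. This is a deliberate simplification and not the MEME analysis itself, which discovers motifs statistically rather than by string matching. The two motif strings are taken from the text, with the spaces in the printed strings treated as line-break artifacts (an assumption); the protein sequences and haplotype labels are hypothetical constructs for illustration.

```python
MOTIF_A = "PPSLISNGPTAALHRSDAKDESESAGTVARKVKKEVSNDLRSTHEEHKTL"
MOTIF_B = "KKVRRRSTGPDSVAETIKKWKEENQKLQQENGSRKAPAKGS"
MOTIFS = {"motif_A": MOTIF_A, "motif_B": MOTIF_B}


def missing_motifs(proteins, motifs):
    """Map each sequence label to the names of motifs it does not contain."""
    return {
        label: [name for name, motif in motifs.items() if motif not in seq]
        for label, seq in proteins.items()
    }


# Hypothetical protein sequences built only to exercise the scan.
proteins = {
    "hap_full": "M" + MOTIF_B + "AA" + MOTIF_A + "W",
    "hap_no_A": "M" + MOTIF_B + "AAW",
    "hap_no_B": "MAA" + MOTIF_A + "W",
}

report = missing_motifs(proteins, MOTIFS)
for label, absent in report.items():
    print(label, "lacks:", ", ".join(absent) if absent else "none")
```

An exact-match scan reports a motif as absent even if a single residue differs, so a real analysis would usually allow for substitutions, for example with a position-specific scoring matrix.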

Implication of sequences variation of Ae. speltoides on origination of B genome in wheat
Overwhelming evidence has suggested that the diploid ancestor of the B genome of tetraploid and hexaploid wheat species is closely related to the S genome of Aegilops speltoides in the Sitopsis section (SS, 2n = 14) [19,42,43,58-60]. However, none of the presently known species in this group has all the properties of the B genome [60]. A study on transposable elements (TEs) suggested that the S genome of Ae. speltoides diverged very early from the progenitor of the B genome, which remains to be identified [58]. Analysis of the Pgk-1 gene among Ae. speltoides accessions revealed an 89 bp indel in the intron of the gene, indicating the likely existence of two different ancestral Ae. speltoides forms, which gave rise to two evolutionarily close lineages of polyploid wheats [61]. The Wcor15 results suggested that Ae. speltoides might be the direct donor of Wcor15-2B in tetraploid and hexaploid wheat varieties [42]. Our study also revealed two forms of DREB1 sequences in Ae. speltoides, suggesting the "likely existence of two different ancestral Ae. speltoides forms" [61]. The form in accession PI486264 shared 100% identity with the sequence on T. aestivum chromosome 3B, and might therefore more likely represent the B-genome donor of wheat.
Recent studies suggested that a mono- or polyphyletic origin of the B subgenome cannot entirely explain the accumulation of mutations observed during the evolution that shaped the modern bread wheat B subgenome. Differential evolutionary plasticity of the B subgenome was proposed as an alternative scenario to account for the increased divergence of the B subgenome in hexaploid wheat compared to Ae. speltoides at the sequence level [62]. Phylogenetic analysis, routinely applied to test evolutionary questions and to trace the origin of polyploidy, is based on the assumptions that intraspecific variation is smaller than interspecific variation, and that sample sizes within and between species are large enough to capture variation at both levels [63]. Sampling a single individual per species, or treating each individual or haplotype as a separate terminal taxon, carries a potential risk of bias [64]. Intraspecific variation is abundant in all types of systematic characters and could bias phylogenetic analyses [65], as in Ae. speltoides. Our results suggested that, in order to reveal the origin of the B subgenome in modern bread wheat, it is critical to include a wide range of Ae. speltoides accessions in phylogenetic analysis.
In summary, the highest DREB1 gene diversity was detected in Ae. speltoides, followed by Ae. tauschii and T. monococcum. The lowest nucleotide diversity value was observed in T. urartu. Both the number of haplotypes and the nucleotide diversity values of T. monococcum were much higher than those of T. urartu, which likely supports no reduction of nucleotide diversity during T. monococcum domestication [44]. Our study revealed two forms of DREB1 sequences in Ae. speltoides. The form in accession PI486264 shared 100% identity with the sequence on T. aestivum chromosome 3B, and might therefore more likely represent the B-genome donor of wheat. Our results suggested that, in order to reveal the origin of the B subgenome in modern bread wheat, it is critical to include a wide range of Ae. speltoides accessions in phylogenetic analysis. Stress tolerance studies, such as drought experiments, will be conducted on these materials in the future to possibly link haplotypes with gene expression.
Supporting information S1