A discriminatory test for the wheat B and G genomes reveals misclassified accessions of Triticum timopheevii and Triticum turgidum

The tetraploid wheat species Triticum turgidum and Triticum timopheevii are morphologically similar, and misidentification of material collected from the wild is possible. We compared published sequences for the Ppd-A1, Ppd-B1 and Ppd-G1 genes from multiple accessions of T. turgidum and T. timopheevii and devised a set of four polymerase chain reactions (PCRs), two specific for Ppd-B1 and two for Ppd-G1. We used these PCRs with 51 accessions of T. timopheevii and 20 of T. turgidum. Sixty of these accessions gave PCR products consistent with their taxon identifications, but the other eleven accessions gave anomalous results: ten accessions that were classified as T. turgidum were identified as T. timopheevii by the PCRs, and one T. timopheevii accession was typed as T. turgidum. We believe that these anomalies are not due to errors in the PCR tests because the results agree with a more comprehensive analysis of genome-wide single nucleotide polymorphisms, which similarly suggests that these eleven accessions have been misclassified. Our results therefore show that the accepted morphological tests for discrimination between T. turgidum and T. timopheevii might not be entirely robust, but that species identification can be made cheaply and quickly by PCRs directed at the Ppd-1 gene.

The wild versions of T. turgidum and T. timopheevii have restricted geographical ranges, overlapping in southeast Turkey, northwest Syria and in the mountainous regions of eastern Iraq/western Iran, with T. turgidum additionally present in the upper Jordan valley and T. timopheevii in the Caucasus [3,4]. Although both species were domesticated by early farmers, only cultivated T. turgidum is considered to be a major crop, being grown extensively at Neolithic sites throughout the Fertile Crescent [3,5,6], and forming part of the package of crops whose cultivation spread into Europe, Asia and North Africa [3]. In contrast, T. timopheevii is looked on as a secondary crop, being found today only in western Georgia [3], although it has been suggested that the 'new glume wheat', which was grown by prehistoric farmers throughout western Asia and eastern Europe but is extinct today, might have been a form of T. timopheevii [7].
The hulled subspecies of T. turgidum and T. timopheevii have very similar morphologies and taxonomic identification is based mainly on the greater degree of hairiness of the culm internodes and leaf sheaths of T. timopheevii [8]. Misclassification is therefore possible, and DNA typing methods that can make unambiguous and correct identifications of the two species have been sought. However, identification of diagnostic DNA markers is complicated by the divergence time of the B and G genomes, which at 2.5-3.5 million years ago [9] is very recent in evolutionary terms, meaning that the two genomes share extensive DNA sequence identity. Additionally, in order to discriminate between T. turgidum and T. timopheevii, a marker must also give a null or diagnostic signal for the A genome, which diverged from the ancestor of the B and G genomes approximately 7 million years ago [9,10] and so also has extensive sequence similarity. Early studies indicated that the multicopy ribosomal DNA (rDNA) transcription units have features that enable the three genomes to be distinguished [11,12], and two polymerase chain reactions (PCRs) intended to be specific for the internal transcribed spacer of the G genome rDNA units were designed for identification of archaeological specimens [13]. However, one of these PCRs gave nonspecific amplification products with modern T. turgidum accessions and neither was successful with the ancient material. More recently, PCRs targeting chloroplast and mitochondrial DNA markers have been used [14,15], but these tests assume that the cytotype is an accurate proxy for the nuclear genome, which may not always be the case [14].
In order to identify nuclear markers for discrimination between T. turgidum and T. timopheevii, gene resequencing data (i.e. the sequences of orthologous genes from multiple accessions of the two species) are required so that species-specific sequence variations can be identified. The wheat gene for which the greatest amount of resequencing data is available is Ppd-1, coding for the major photoperiod response protein, with complete sequences in Genbank for 74 copies of Ppd-B1, 16 Ppd-G1, and 93 Ppd-A1 (77 from T. turgidum and 16 from T. timopheevii) [16,17]. From this information we designed two PCRs that are specific for Ppd-B1 and another two specific for Ppd-G1. Through use of these PCRs, we identify germplasm accessions of T. turgidum that have been misclassified as T. timopheevii, and vice versa.
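The search for genome-specific variations in resequencing data can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the function name `specific_columns` and the toy six-base alignment are hypothetical, and the rule applied (the target genome is fixed for a base that never occurs at that column in any sequence from the other genomes) is an assumed working definition of "specific".

```python
def specific_columns(groups, target):
    """Return (column, base) pairs where every sequence of the target
    genome carries the same base and no sequence of any other genome does.

    groups maps a genome label to a list of equal-length aligned sequences.
    """
    target_seqs = groups[target]
    others = [s for label, seqs in groups.items() if label != target for s in seqs]
    hits = []
    for col in range(len(target_seqs[0])):
        target_bases = {s[col] for s in target_seqs}
        # Skip columns that are polymorphic or gapped within the target genome.
        if len(target_bases) != 1 or "-" in target_bases:
            continue
        base = next(iter(target_bases))
        if all(s[col] != base for s in others):
            hits.append((col, base))
    return hits


# Hypothetical toy alignment (not real Ppd-1 sequence).
toy = {
    "A": ["ACGTAC", "ACGTAC"],
    "B": ["ACTTAC", "ACTTAC"],
    "G": ["ACGTGC", "ACGTGC"],
}
b_specific = specific_columns(toy, "B")
g_specific = specific_columns(toy, "G")
```

Columns returned for the B or G genome are candidate sites for the 3´ ends of genome-specific primers; a column shared between the target and the A genome is rejected, which reflects the requirement that a marker must not give a signal from the A genome.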
DNA sequences were downloaded from Genbank for Ppd-B1 from 24 accessions of T. turgidum subsp. dicoccoides and 50 T. turgidum subsp. dicoccum, Ppd-G1 from 11 T. timopheevii subsp. armeniacum and 5 T. timopheevii subsp. timopheevii, and Ppd-A1 from 32 T. turgidum subsp. dicoccoides, 45 T. turgidum subsp. dicoccum, 11 T. timopheevii subsp. armeniacum and 5 T. timopheevii subsp. timopheevii (S2 Table). Sequences were aligned using the ClustalW, Muscle and Mafft programs in Geneious version R10 (https://www.geneious.com, [18]) and single nucleotide polymorphisms (SNPs) that are specific to the different genomes were identified. Primer pairs were identified for four PCRs (Table 1), two specific for Ppd-B1 and two for Ppd-G1. PCRs were carried out in a LightCycler480 (Roche) in 20 μl reaction volumes comprising 100 ng DNA extract, 1x SensiFAST SYBR No-ROX PCR master mix (Bioline), 100 nM forward primer, 100 nM reverse primer and PCR grade water. Cycling parameters were: 95˚C for 5 min; followed by 35 cycles of 20 s at 95˚C, 20 s at the annealing temperature, 20 s at 72˚C; followed by a final extension at 72˚C for 10 min. Product formation was assayed by melt curve analysis using the SYBR Green I/HRM Dye detection format (465 nm excitation, 510 nm emission). Melting data were obtained by heating the products to 95˚C for 5 s, cooling to 55˚C for 30 s and then heating to 99˚C with five data acquisitions/˚C. Melting peaks were obtained by plotting -(δF/δT) against temperature. PCR products were additionally visualized by electrophoresis in 3% agarose gels to confirm they were the correct length.
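The conversion of a melt curve into a melting peak, -(δF/δT) plotted against temperature, can be sketched as follows. This is an illustrative finite-difference calculation in Python, not the LightCycler software's algorithm; the function name `melting_peak` and the simulated sigmoid fluorescence curve (melting temperature set to 80˚C) are assumptions made for the example.

```python
import math


def melting_peak(temps, fluor):
    """Return (Tm estimate, list of -dF/dT values) from a melt curve.

    -dF/dT is approximated by finite differences between consecutive
    acquisitions and assigned to the midpoint of each temperature interval.
    """
    mids, neg_slope = [], []
    for i in range(len(temps) - 1):
        d_temp = temps[i + 1] - temps[i]
        mids.append((temps[i] + temps[i + 1]) / 2)
        neg_slope.append(-(fluor[i + 1] - fluor[i]) / d_temp)
    # The melting peak is the temperature at which -dF/dT is largest.
    peak = max(range(len(neg_slope)), key=neg_slope.__getitem__)
    return mids[peak], neg_slope


# Simulated data: 55 to 99 degrees C with five acquisitions per degree,
# and fluorescence falling as a sigmoid centred on a hypothetical Tm of 80.
temps = [55 + 0.2 * i for i in range(221)]
fluor = [1 / (1 + math.exp((t - 80.0) / 0.8)) for t in temps]
tm, curve = melting_peak(temps, fluor)
```

A reaction that forms a single product should give one dominant maximum in `curve`; additional maxima at other temperatures would point to further products with different sequences.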
Prior to sequencing, PCR products were cloned (Invitrogen TOPO TA Cloning Kit for Subcloning, with One Shot TOP10 chemically competent E. coli cells) and reamplified, using the conditions described above except for the final extension at 72˚C, with forward and reverse M13 primers (annealing temperature 55˚C) and recombinant colonies added directly to the PCR mixture. PCR products were purified with the NucleoSpin Gel and PCR Clean-up kit (Macherey-Nagel) and sequenced using the BigDye Terminator v3.1 kit chemistry (Applied Biosystems). Standard sequencing reactions of 20 μl comprised 20 ng PCR product, 1x BigDye sequencing buffer, 0.125x BigDye reaction mix, 4 pmoles M13 primer and UltraPure DNase/RNase-free distilled water. Cycling parameters were: 2 min at 96˚C; 35 cycles of 40 s at 96˚C, 15 s at 50˚C, 4 min at 60˚C; with products held at 4˚C before purification (Beckman Coulter  [19]. Unique sequence tags were aligned to release 31 of the genome of Triticum aestivum L. [20] using BWA v.0.7.8-r455 [21] and SNPs identified with the TASSEL-GBS pipeline [22]. Principal components analysis (PCA) was performed with TASSEL [23].

Results
The consensus sequence resulting from multiple alignment of the 173 Ppd-1 Genbank entries had a total length of 7302 bp with the first nucleotide of the initiation codon at position 3604 and the last nucleotide of the termination codon at position 6819. The alignment was used to design two PCRs specific for Ppd-B1, one of these located within exon 7 of the gene and the second mainly in intron 7 but with its 3´-terminus extending a short distance into exon 8, and a further two PCRs specific for Ppd-G1, both of these targeting sequences within exon 6 (Fig 1). The PCRs were designed so that each primer pair had a 100% match with its annealing sites on the target genome, but at least two mismatches with the equivalent sites on the non-target genomes (Table 2). Each primer pair gave a single product of the expected size when used with DNA from its target species, and no product with the non-target species (S1 and S2 Figs), confirming the specificities of the PCRs.
Table notes: Differences between the sequences of the primers and the non-target genomes are shown in bold. a In some accessions of T. turgidum subsp. dicoccum the target sequence is absent due to a larger deletion in the Ppd-A1 gene.
The PCRs were used with 51 accessions of T. timopheevii and 20 of T. turgidum (S3 Table). Sixty accessions gave PCR products consistent with their taxon identifications. The other eleven accessions gave anomalous results (Table 3). These accessions comprised ten that were classified as T. turgidum subsp. dicoccoides but which gave positive results with the Ppd-G1 but not the Ppd-B1 PCRs, and which were therefore typed as T. timopheevii, and one T. timopheevii subsp. armeniacum accession which gave positive results for Ppd-B1 but not Ppd-G1, and so was identified as T. turgidum (Fig 2). For each of these eleven anomalous accessions, the PCR products that were obtained were sequenced and their authenticity as Ppd-B1 or Ppd-G1 products confirmed from the presence of specific variations within the internal part of the amplicon (Fig 3).
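The typing logic applied here (Ppd-B1 positive and Ppd-G1 negative indicates T. turgidum, the reverse indicates T. timopheevii) can be written as a short Python sketch. The function names, the accession identifiers and the "unresolved" category for mixed outcomes are hypothetical additions for illustration; the text does not describe how a mixed result would be handled.

```python
def type_accession(b1_results, g1_results):
    """Call a species from the outcomes of the two Ppd-B1 and two Ppd-G1 PCRs.

    Each argument is a list of booleans (True = product of the expected size).
    """
    if all(b1_results) and not any(g1_results):
        return "T. turgidum"      # B genome marker present, G absent
    if all(g1_results) and not any(b1_results):
        return "T. timopheevii"   # G genome marker present, B absent
    return "unresolved"           # assumed category for mixed outcomes


def is_anomalous(catalogued_species, call):
    """True when a resolved PCR call contradicts the catalogued taxon."""
    return call != "unresolved" and call != catalogued_species


# Hypothetical accessions: (catalogued species, Ppd-B1 results, Ppd-G1 results).
panel = {
    "acc1": ("T. turgidum", [True, True], [False, False]),
    "acc2": ("T. turgidum", [False, False], [True, True]),
    "acc3": ("T. timopheevii", [True, True], [False, False]),
}
flagged = sorted(
    name
    for name, (species, b1, g1) in panel.items()
    if is_anomalous(species, type_accession(b1, g1))
)
```

In this toy panel, `acc2` and `acc3` are flagged because their PCR calls disagree with the catalogued taxon, which corresponds to the two kinds of anomaly reported in Table 3.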
GBS was carried out with 138 tetraploid wheats including each of the eleven accessions that gave anomalous results by Ppd-1 typing. The resulting dataset of 1,172,469 SNPs was examined by PCA. The first principal component (PC1) separated the T. turgidum and T. timopheevii accessions into distinct clusters (Fig 4). Each of the ten accessions classified as T. turgidum subsp. dicoccoides but identified as T. timopheevii by Ppd-1 typing was positioned within the T. timopheevii cluster, and the single accession classified as T. timopheevii subsp. armeniacum but identified as T. turgidum by Ppd-1 typing was located within the T. turgidum cluster.
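How a first principal component can separate two groups of accessions is shown in the following Python sketch. It is not the TASSEL implementation: it uses power iteration on the sample-by-sample Gram matrix of a column-centred genotype matrix, and the 0/1/2 genotype coding, the function name `pc1_scores` and the six-sample toy matrix are all assumptions made for the example.

```python
def pc1_scores(genotypes, iterations=200):
    """Return a unit-length vector proportional to the PC1 score of each sample.

    genotypes is a list of rows (samples), each a list of numeric SNP codes.
    """
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Gram matrix: inner products between every pair of centred samples.
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in centred] for r1 in centred]
    # Start from a non-constant vector; a constant vector lies in the null
    # space of the Gram matrix because the columns were centred.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(g * x for g, x in zip(row, v)) for row in gram]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            raise ValueError("genotype matrix has no variance")
        v = [x / norm for x in w]
    return v


# Hypothetical genotypes: three samples from each of two divergent groups.
toy = [
    [0, 0, 2, 2, 0],
    [0, 0, 2, 2, 1],
    [0, 1, 2, 2, 0],
    [2, 2, 0, 0, 0],
    [2, 2, 0, 0, 1],
    [2, 1, 0, 0, 0],
]
scores = pc1_scores(toy)
```

The sign of a principal component is arbitrary, so only the grouping matters: the first three samples receive scores of one sign and the last three the opposite sign, in the same way that an accession's position on PC1 places it in one cluster or the other.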

Discussion
We designed two PCRs specific for the Ppd-B1 gene and two for Ppd-G1 and tested these with 71 T. timopheevii and T. turgidum accessions. For 60 accessions, the results of the PCRs were consistent with the species identification, giving positive results for Ppd-B1 and negative for Ppd-G1, or vice versa, indicating that the PCRs were specific for their target sequences and that none of the PCRs gave products with the Ppd-A1 gene on the A genome. There were, however, eleven anomalous accessions, ten that gave positive results for Ppd-G1 despite being classified as T. turgidum, and one classified as T. timopheevii that was typed positive for Ppd-B1. Previous contradictions between the outcomes of PCR typing and the morphological identification of a wheat as T. timopheevii or T. turgidum have been dismissed as errors in the DNA method [17]. However, we believe that, for the anomalies we report, our DNA typing results are correct and the accessions have previously been misclassified. This is because each of these eleven accessions was included in a larger group of 138 T. timopheevii and T. turgidum wheats for which we obtained GBS data. PCA of the resulting SNPs separated the 138 accessions into two clusters, one cluster comprising T. timopheevii wheats plus the ten accessions that were classified as T. turgidum but which gave a positive result for Ppd-G1, and the second cluster made up of T. turgidum plus the one accession that was classified as T. timopheevii but which gave a Ppd-B1 result. As the SNPs used in the PCA mapped to all 14 tetraploid wheat chromosomes, with >59,000 markers per chromosome, we can be confident that the clustering reflects genome-wide differences between the groups of accessions, and therefore gives an accurate identification of whether each wheat has an AABB or AAGG genome set.
The agreement between the PCAs and the Ppd-1 typing therefore confirms that these eleven accessions have been misclassified, and that Ppd-1 typing (which is much less time-consuming and costly than GBS analysis) is an accurate means of distinguishing between T. timopheevii and T. turgidum.
The entries for the eleven misclassified accessions in the Germplasm Resources Information Network (GRIN) and the European Wheat Database (EWDB) give no indications that the original material that was collected might have been misidentified. However, the ten accessions misclassified as T. turgidum were collected from Turkey, Iran and Iraq, which are within the distribution range for wild T. timopheevii, and the one misidentified as T. timopheevii was collected in the Lebanon, which is outside of the area normally associated with T. timopheevii [3]. Three of the accessions misidentified as T. turgidum (PI 560697, PI 560873 and PI 560877) were previously reclassified by us as T. timopheevii based on the pattern of retrotransposon insertions in the 5S rDNA arrays [24], and two (PI 560697 and PI 560877) were similarly classified as T. timopheevii in a study of the grain Hardness locus [25]. In contrast, PI 560697 was included in a panel of 113 wild T. turgidum accessions used in a survey of allelic diversity at the ear-shattering loci, TtBtr1-A and TtBtr1-B [26], although PI 560697 gave an unusual result, being one of only two accessions that possessed the domesticate allele at TtBtr1-A. None of the other seven accessions that we reclassify as T. timopheevii (PI 656869, PI 656872, PI 656873, CGN 16098, CGN 16102, CGN 13161, CGN 24296) appear to have been extensively studied in the past. The single accession that we reclassify from T. timopheevii to T. turgidum (PI 427998) was listed as Triticum boeoticum, a wild diploid wheat, now called Triticum monococcum L. subsp. aegilopoides (Link) Thell., in a study of molecular diversity at 18 genetic loci [27], but was subsequently looked on as T. turgidum in the retrotransposon and Hardness projects mentioned above [24,25].

Conclusion
We show that the Ppd-1 gene of wheat displays species-specific variations that enable the B and G genomes to be distinguished via simple PCR tests, the outcomes of these tests agreeing with identifications made by more comprehensive, but more time-consuming and expensive, analysis of genome-wide SNPs. The use of Ppd-1 typing reveals a significant number of misclassified accessions, in particular wheats initially identified as T. turgidum but which we show to be T. timopheevii, suggesting that the accepted morphological tests for discrimination between the two species might not be entirely robust. The short lengths of the amplicons (61-100 bp) mean that the tests we report would be particularly suitable for typing ancient DNA, which is typically obtained as fragments <100 bp [28]. Among other archaeological applications, these tests might therefore make it possible to establish if the new glume wheat [7] is a type of T. turgidum or T. timopheevii.
Supporting information
S1 Fig. Agarose gel showing products [...] armeniacum Cltr 17678. The blue lines are no-template controls. Melting peak analysis enables PCR specificity to be confirmed because products with different sequences melt at different temperatures. A single peak therefore indicates that a single PCR product has been formed. (TIFF)