A DNA Barcoding Method to Discriminate between the Model Plant Brachypodium distachyon and Its Close Relatives B. stacei and B. hybridum (Poaceae)

Background Brachypodium distachyon s. l. has been widely investigated across the world as a model plant for temperate cereals and biofuel grasses. However, this annual plant shows three cytotypes that have been recently recognized as three independent species, the diploids B. distachyon (2n = 10) and B. stacei (2n = 20) and their derived allotetraploid B. hybridum (2n = 30). Methodology/Principal Findings We propose a DNA barcoding approach that consists of a rapid, accurate and automatable species identification method using the standard DNA sequences of complementary plastid (trnLF) and nuclear (ITS, GI) loci. The highly homogenous but largely divergent B. distachyon and B. stacei diploids could be easily distinguished (100% identification success) using direct trnLF (2.4%), ITS (5.5%) or GI (3.8%) sequence divergence. By contrast, B. hybridum could only be unambiguously identified through the use of combined trnLF+ITS sequences (90% of identification success) or by cloned GI sequences (96.7%) that showed 5.4% (ITS) and 4% (GI) rate divergence between the two parental sequences found in the allopolyploid. Conclusion/Significance Our data provide an unbiased and effective barcode to differentiate these three closely-related species from one another. This procedure overcomes the taxonomic uncertainty generated from methods based on morphology or flow cytometry identifications that have resulted in some misclassifications of the model plant and its allies. Our study also demonstrates that the allotetraploid B. hybridum has resulted from bi-directional crosses of B. distachyon and B. stacei plants acting either as maternal or paternal parents.


Introduction
The impact of the new model plant Brachypodium distachyon on grass genomic research has gathered pace since the publication in 2010 of the full genome sequence of the diploid genotype Bd21 (2n = 10) by the International Brachypodium Initiative [1]. This taxon shows one of the smallest genome sizes of the monocots (272 Mb), together with a short life cycle (6 weeks), an inbreeding nature and a close relationship to the temperate cereals and forage crops [2]. These features make it an optimal model for the cultivated temperate cereals, wheats and barley, and other Poaceae. Over the last decade, more than 400 laboratories worldwide have worked on investigating the genomics, transcriptomics and metabolomics of B. distachyon [2,3,4]. Lines of research include studies on grain production, pathogen resistance, and tolerance to drought and to other abiotic stresses that could be transferred to cereal breeding programs [2,3,5], to those on cell wall analyses focused on the improvement of biofuel grass production [2,5]. Other studies have highlighted the ecological plasticity of B. distachyon [6,7,8], adapted to different environmental conditions, as a suitable plant for ecosystem management and to prevent land erosion [7]. The compact genome of B. distachyon, which shows an extremely low amount of repetitive DNA [1,2], has facilitated the construction of single-copy BAC libraries for comparative genomics and of derived mutagenized T-DNA and TILLING lines as a further aid to investigate gene expression effects under different natural and induced conditions in the model grass [2]. Additionally, large B. distachyon germplasm collections have been built at USDA (http://www.ars-grin.gov/npgs), and in several European and Mediterranean institutions [2,3,4,9,10], containing accessions with both economically and ecologically relevant traits and showing large phenetic and genotypic variation for on-going mapping projects.
The taxonomic and genomic identity of B. distachyon has been recently challenged by the evolutionary and systematic study of Catalán and coworkers [11]. Three cytotypes of B. distachyon sensu lato (s. l.) are known (2n = 10, 2n = 20 and 2n = 30) which were previously attributed to different ploidy levels of the same taxon B. distachyon s. l. (e. g., an autopolyploid series of individuals with x = 5 and 2n = 10 (2x), 20 (4x), 30 (6x) chromosomes; [12]). Catalan and coworkers demonstrated, through exhaustive phylogenetic, cytogenetic and phenotypic analyses, that the three cytotypes should in fact be treated as three different species: two diploids, each with a different chromosome base number, B. distachyon (x = 5, 2n = 10) and B. stacei (x = 10, 2n = 20), and their derived allotetraploid B. hybridum (x = 5+10, 2n = 30). In-situ GISH and rDNA and single-BAC FISH hybridizations, nucleolar dominance, and Comparative Chromosome Painting (CCP) analyses have conclusively demonstrated that the genomes of the two diploid species participated in the origin of the allopolyploid B. hybridum genome [11,13,14,15,16]. Genome size analyses provided further evidence that the genome size of B. hybridum (c. 1.265 pg/2C) resulted from the sum of the genomes of the two parental species [11]. Phylogenetic analyses of two plastid (ndhF, trnLF) and four nuclear (ITS, ETS, CAL, GI, DGAT) genes indicated that the more basally-diverged B. stacei and the more recently evolved B. distachyon emerged from two independent lineages, confirming their contribution as genome donors of B. hybridum [11]. Statistical analysis of morphometric traits showed that five characters (stomata leaf guard cell length, pollen grain length, upper glume length, lemma length, and awn length) significantly discriminated among the three species when they were grown under controlled greenhouse conditions [11]. However, although the three species can be differentiated through several phenotypic and cytogenetic traits, their direct identification is not always straightforward as wild populations show overlapping phenotypic variation for some characters and a similar diploid genome size (B. distachyon 0.631 pg/2C, B. stacei 0.564 pg/2C; [11,17]). This has led to taxonomic uncertainty among, or even to taxonomic misclassifications of, the model species and its close allies when using currently employed identification methods such as morphology or flow cytometry (see Discussion).
The importance of B. distachyon and its recently split congeners, B. stacei and B. hybridum, has been underlined in newly addressed initiatives on re-sequencing 56 new accessions of B. distachyon and the de-novo genome sequencing of B. stacei and B. hybridum, a project undertaken by the Joint Genome Institute and the International Brachypodium Consortium (http://brachypodium.pw.usda.gov/ files/resequencing_description_110822.pdf). The genomic features of the three species of this complex, which are characterized by similar, small genomes with low repetitive DNA content, make it an ideal group to investigate the mechanisms of polyploid hybrid speciation, paralleling those of the major cereal (Triticum) crops [2,5]. The imminent genome sequences of B. stacei and B. hybridum will allow comparative genomic and functional genomic analyses on these diploid and polyploid grasses and their potential transfer to other cereals and forage crops. A large-scale phenomic study of a collection of different B. distachyon accessions, adapted to different selection pressures and currently undergoing re-sequencing (see above), is also under way (EPPN initiative; http://www.plantphenotyping-network.eu/) and could be extended to B. stacei and B. hybridum (John Doonan, pers. comm). These analyses would be hindered, however, by the lack of a reliable method to differentiate the individuals of the three species. This is particularly problematic in natural admixed populations, where B. hybridum grows in sympatry with one or the other parental species [6,11] López-Alvarez & Catalán, unpublished data]. Misidentified B. stacei and B. hybridum samples have also been found within the B. distachyon germplasm collections (see Discussion). Therefore, if the model plant is not one but three species, it is imperative to find an accurate and easily performed method to separate them. The DNA barcoding system offers a suitable approach to this problem.
From the several genes proposed as potential DNA barcodes for plants, the combination of the partial sequences of the plastid rbcL and matK coding genes was selected as the preferred core sequence by the CBOL Plant Working Group [18]. These authors also recommended the use of other fragments in combination with the rbcL+matK core to increase resolution within complex taxonomic groups. However, recent studies have proposed other, more variable genes as suitable candidates for the DNA barcoding of closely related plants [19,20,21]. Among these, the plastid trnLF region [20,21,22] and the nuclear rDNA ITS region [20,23] have demonstrated their utility to discriminate different angiosperms at the species level in many groups, though they are not effective in all cases [21,22,24,25]. A mini-barcoding fragment within the trnLF region, the P6 loop, has provided useful barcoding speciesspecific markers in ecological and dietary studies [22,25]. Analyses of large angiosperm data sets have demonstrated, however, that the inclusion of the nuclear ITS region significantly increased the discriminatory power of the barcoding method beyond that based on the plastid molecules alone [23]. Despite the drawbacks posed by the multicopy ITS region in plants, such as the potential presence of paralogous and recombinant copies, and its predominant concerted evolution towards one of the parental ribotypes in the hybrid species [26], there is overall agreement on the value of its use as a barcoding tool for plants [20,23]. In contrast, little consensus has been reached on the use of nuclear single-copy genes as barcoding molecules for plants. The problem stems from the inherent difficulty of finding appropriate unlinked and nonduplicated orthologous genes across a wide spectrum of angiosperms, capable of high-resolution species discrimination [20,27]. Initial progress, however, has been put forward in some plant groups, where the selection of various taxonomically widespread single-copy orthologous genes (COS) has helped to diagnose species [28,29,30].
The complexity of the appropriate barcoding method is undoubtedly related to the complexity and nature of the group under study. Thus, taxonomically complex groups where species boundaries are narrowly defined [31], recently radiated species which show incomplete lineage sorting and/or few private mutations [21], and polyploids of hybrid origin (allopolyploids) that inherited a maternal plastid genome but a biparental nuclear genome are among the most problematic plants to be barcoded [20]. The B. distachyon -B. stacei -B. hybridum complex fits these characteristics. However, the short generation time of these annuals likely allowed the accumulation of a high number of mutations in their plastid and nuclear genomes. This probably resulted in significantly higher evolutionary rates among these species than those detected in perennial Brachypodium species [11]. Although Catalán and co-workers conducted phylogenetic analyses using a restricted sampling of representatives of B. distachyon, B. stacei and B. hybridum (including type materials of the three species), they found evidence of low intraspecific variation and of high interspecific divergence in the studied plastid and nuclear DNA sequences of the diploids B. stacei and B. distachyon. Regarding the allotetraploid B. hybridum, the evolutionary analyses indicated that this species apparently inherited its maternal cpDNA genome from B. stacei, the paternal nrDNA ribotypes from B. distachyon, and one copy each of the nDNA single-copy CAL, GI, and DGAT genes from both parents [11]. These findings suggested that the studied fragments could be used as barcodes to discriminate among the three related species.
The first major aim of this study was to test whether two genes that have been previously proposed as barcoding tools for different angiosperms, the plastid trnLF region and the nuclear ITS region (both included in the study of Catalán and co-workers [11]), could be used as barcodes to discriminate the model plant B. distachyon and its close relatives B. stacei and B. hybridum when a large sample of representatives of the three taxa was surveyed. Secondly, we wanted to test whether the use of the two molecules would suffice to identify B. hybridum or if a third nuclear single-copy gene is necessary to unambiguously characterize the allotetraploid. A third goal of our study was to investigate whether B. stacei and B. distachyon were, respectively, the maternal and paternal genome donors of all the studied B. hybridum, in order to test whether this species had a monophyletic or polyphyletic origin.
The aligned GI region consisted of 665 nucleotide positions of which 146 (21.9%) were variable and 45 (6.8%) were potentially informative (Tables 2, S3). The GI sequences were more variable than those of either trnLF or ITS, grouping into 200 haplotypes of which approximately the same number were of B. distachyon (n = 90, 45%) and B. stacei-type (n = 106, 53%) (Tables 1, S2). These two groups were monophyletic with respect to one another (Figs. 2,3). The few cloned B. distachyon and B. stacei individuals showed GI haplotypes belonging to their respective groups but with slightly different allelic variants in most cases. These minor variants could represent genuine mutations but could be also a consequence of Taq polymerase errors (Harriet Hunt, pers. comm.). Four haplotypes (h12, h13, h102, h141) showed evidence of inter-parental recombination in B. hybridum (n = 4, 2%) (Tables 1,                                     Asterisks indicate chromosome counts obtained in this study using DAPI-staining methods. Other chromosome numbers and/or ploidy levels were retrieved from the literature or have been provided by the germplasm collectors and curators (see abbreviations below). For each sample, the sequences have been classified into the corresponding trnLF, ITS and GI haplotypes (see Table S2). Bdis and Bsta indicate B. distachyon-type and B. stacei-type sequences, respectively. S2). Most (n = 29, 96.7%) of the studied B. hybridum individuals showed two types of GI sequence, one type of which was inherited from each of the two parental species (Table 2); however the number of clones inherited from one or the other parent was dissimilar in some cases and, in only one instance, all of them were from a single parent (n = 1, 3.3% ) ( Tables 2, S2). K2P pairwise substitution rates, the recommended standard distance model in barcoding studies [32,33], showed high interspecific sequence divergence values and low intraspecific values between and among the diploids B. distachyon and B. stacei for the three analysed data sets (Table 2). Both the mean intra-and interspecific divergence values were higher for the more variable nuclear ITS (0.029 (2.9%) and 0.055 (5.5%) respectively) and GI loci (0.022 (2.2%) and 0.038 (3.8%) respectively) than for the more conserved plastid trnLF locus (0.011 (1.1%) and 0.024 (2.4%) respectively). Moreover, the percentage of correctly identified specimens of a given species was in all cases above the 50% cut-off threshold suggested as a baseline to discriminate among species [21] (trnLF: 100/100%; ITS: 100/100%; GI: 100/100%; for B. distachyon and B. stacei, respectively). This supported the existence of a typical barcode gap for B. distachyon and B. stacei in all the three loci. Regarding B. hybridum, the K2P ''intraspecific'' and ''interspecific'' divergence rate calculations, conducted separately with respect to their two parental-donor sequences, showed sequence divergence values similar to those found in B. distachyon and B. stacei for the three loci ( Table 2). The differences between the intraparental and inter-parental (B. distachyon-like vs B. stacei-like) mean values were equivalent to those found between and within the sequences of the two diploids, and the barcoding gaps were also present in all three loci ( Table 2). The percentage of individuals known from cytogenetic data to be B. hybridum, which showed the expected B. hybridum signature in the sequence data, was .50% with the use of either the combined trnLF+ITS core (90%) or the GI (96.7%) sequences (Tables 1, S2). We could therefore equate these values to the respective percentages of correct identification obtained from one and the other data set.
Haplotype networks constructed for each of the separate data sets using statistical parsimony methods (Fig. 2) showed a clear-cut separation between the B. distachyon-type and B. stacei-type classes of sequences in all cases. The plastid trnLF network required a connection of 23 steps between the two main haplogroups (Fig. 2a). The commonest B. stacei-type, h20, included B. stacei and B. hybridum individuals spread across all the SW Asian-Mediterranean and Macaronesian region (and also the respective type specimens of B. stacei (Spain: Formentera; ABR114) and B. hybridum (Portugal: Lisbon; ABR113). Its satellite haplotypes (h22-h28) corresponded to B. stacei and B. hybridum individuals from distinct western and eastern Mediterranean localities; the most isolated, h21 (6 step connection) was shared by individuals from Eastern Spain and the Balearic Islands (Tables 1, S1; Fig. 2a). The B. distachyon-type network was more diverse, with haplotypes separated by several steps and containing almost exclusively B. distachyon individuals (Fig. 2a). The core-group was formed by three main haplotypes, the interconnected h2, h5 and h4, which were found in individuals from disparate Mediterranean localities, plus the B. distachyon type (Iraq, Bd21; h2).
The nuclear ITS network was more complex than the trnLF one; however, it also distinguished two highly divergent B. distachyonand B. stacei-type clusters that were separated by 33 steps (Fig. 2b). These clusters were linked by two intermediate haplotypes (h35, h42) from B. hybridum individuals from both sides of the Mediterranean that likely corresponded to inter-parental recombinant sequences (Table S2). The B. stacei cluster showed three main haplotypes interconnected by single steps (Fig. 2b). One haplotype (h19) comprised B. stacei and B. hybridum individuals from across the Mediterranean region and the Canary Islands, including most of the clones of the B. stacei type specimen (ABR114). The other two main haplotypes mostly comprised eastern Mediterranean (h20) or exclusively western Mediterranean  (h22) samples (Tables 1, S2). Among the satellite haplotypes of the latter group, close phylogeographic connections were also detected between E Spain and the Balearic Islands (h23 and h24). The group showed a pattern of few unresolved loops, likely caused by intraindividual or by intraspecific B. stacei-type sequence recombinations. The more diverse B. distachyon cluster contained five main haplotypes, four of them interconnected by single mutations (h1, h2, h3, h14) and a fifth one (h13) nested within a derived 14step subcluster (Fig. 2b). Haplotypes h1, h3, and h13 included B. distachyon and B. hybridum individuals from across the Mediterranean region and the first also included the B. distachyon type specimen (Bd21). However, h2 and h14 were more structured geographically, containing only Iberian-Balearic or mostly SW Asian-E Mediterranean individuals, respectively. The B. hybridum type specimen (ABR113) sequences were divided among two haplotypes (h38, h29). The B. distachyon cluster also showed one loop within the h13 subcluster, though the remaining satellite haplotypes were connected linearly, with different numbers of stepwise mutations (Fig. 2b). The level of diversity and complexity was higher in the GI network (Fig. 2c); nonetheless it also showed a clear-cut split between the B. distachyonand B. stacei-type clusters that required a connection of 30 steps. Two kinds of potential interspecific recombinant haplotypes, closer to either the B. distachyon (h13) or the B. stacei (h12, h102) clusters, were observed between them (see also Fig. 3c). Within the B. distachyon cluster, the commonest haplotype, h1, included B. distachyon and B. hybridum individuals from across the Mediterranean region (including the B. distachyon (Bd21) and B. hybridum (ABR113; B. distachyon-like copy) type specimens). Most of the h1 satellite haplotypes differed by one or two stepwise mutations; however, a more distantly related subclade was also present, formed exclusively of Iberian haplotypes (h50, h55-h56, h58, h136, h138, h139). Four unresolved loops involving B. hybridum haplotypes indicated the likely occurrence of intraspecific B. distachyon-like sequence recombinations in the hybrids. The B. stacei cluster comprised four main haplotypes. Two of them, h15 (including the B. stacei (ABR114) and the B. hybridum (ABR113; B. stacei-like copy) type specimens) and h19, included individuals of both species from the whole Mediterranean region. A third one, h39, comprised only B. hybridum individuals mostly from the eastern Mediterranean region. In contrast, the more derived h16 comprised only B. stacei individuals from the Iberian Peninsula. An isolated subcluster (separated by 10 steps) was formed by six haplotypes (h49, h51, h57, h137, h140, h141) from southern Spain. The B. stacei group showed a more intricate pattern of loops and divergences among the haplotypes than that of the B. distachyon group (Fig. 2B), likely reflecting a more complex evolutionary history.
The NJ trees based on K2P distances (Fig. S1) reflected the above findings and their topologies were highly congruent with the Bayesian halfcompat consensus trees shown here. In the trnLF tree (Fig. 3a) the B. distachyon and B. stacei sequences fell into two separate fully supported clades (1.00 posterior probability support); these clades collapsed into a polytomy with the core-perennial clade, B. boissieri and B. mexicanum. The nine haplotypes of the B. stacei clade were unresolved; however, the 19 haplotypes of the B. distachyon clade split into two strongly supported clades. One of them included the 5 divergent haplotypes of intermediate placement in the haplotype network (Fig. 2a), which are mostly distributed in the western Mediterranean region, and the other included the majority of the remaining haplotypes (Fig. 3a). Within this second group, some resolution was obtained for three separate Iberian (0.94), Turkish (0.82) and Middle East (0.98) subclades. The ITS tree depicted a strong divergence of the highly supported B. stacei (0.95) and B. distachyon (0.85) clades (Fig. 3b); B. stacei was unresolved in a sub-basal position with B. mexicanum, whereas B. distachyon was resolved as sister to the core perennials clade (0.92). The internal resolution of both clades was poor; however, two separate eastern Spain/Balearic Islands (0.95) and Iranian (0.94) subclades and two Balearic Islands (0.97, 0.95) subclades were recovered within, respectively, the B. stacei and B. distachyon clades. The GI tree also supported the divergent history of the B. distachyon (0.99) and B. stacei (0.99) lineages (Fig. 3c). B. stacei was sister to B. mexicanum p. p. (0.99), whereas B. distachyon was unresolved with respect to the weakly supported B. boissieri-B. retusum/B. mexicanum p.p. clade. The resolution within the B. distachyon clade was low except for a well supported (0.99) Iberian subclade that corresponded to the isolated subcluster of southern Spain B. hybridum (B. distachyon-like) haplotypes (h50, h55, h56, h58) detected in the network (Fig. 2c). Similarly, the B. stacei clade split into two well supported subclades, one of which also corresponded to a subcluster of highly isolated southern Spain B. hybridum (B. staceilike) haplotypes (h49, h51, h57) recovered in the network (Fig. 2c; Tables 2, S1).

DNA Barcodes for B. distachyon, B. stacei and B. hybridum
Under the premise that a successful barcode locus should enable the recovery of monophyletic clusters corresponding to individual species [34], we found that any one of the three assayed loci (trnLF, ITS, GI) could unambiguously differentiate the two monophyletic diploid species from direct sequencing of PCR amplicons. However the identity of the allotetraploid requires combined analysis of direct trnLF and direct or cloned ITS sequences or through analysis of cloned GI sequences.
Our results demonstrate that the widely employed barcoding regions trnLF and ITS [20,23] clearly discriminate between B. distachyon and B. stacei. Both regions showed: i) high inter-vs intraspecific distance divergences, ii) significant barcoding gaps (Table 2), iii) extremely distant monophyletic clusters in the parsimony networks (Figs. 2a, b); and iv) highly supported divergent monophyletic clades in both the NJ (Results not shown) and the Bayesian trees (Figs. 3a, b). They also comply with the requirements of feasibility and rapid and easy production of the sequences to be considered optimal barcoding molecules [20]. However, the allopolyploid nature of B. hybridum, together with its estimated recent origin (c. 1 Ma; [11]), prevents their direct use as single standard barcodes for this taxon and its two parental taxa. Our study has shown that the maternally-inherited B. hybridum trnLF haplotype sequences could have been acquired from either of the two parents (Table 1; Figs. 2a, 3a) and that the biparentallyinherited B. hybridum ITS copies (B. distachyon-like and B. stacei-like) could either have remained intact in the hybrid genome or could Figure 2. Haplotype networks of the Brachypodium distachyon s. l. taxa (B. distachyon (blue), B. stacei (red) and B. hybridum (purple) constructed from DNA sequences of each of the three studied barcoding loci using statistical parsimony methods. a) trnLF network; b) ITS network; c) GI network (boxes A and B show additional B. distachyon-type and B. stacei-type haplotypes, respectively). Each haplotype is represented by a circle with size proportional to the number of sequences that share the haplotype. Haplotype numbers correspond to those indicated in Tables 1 and S2 Figs. 2b, 3b). This creates the possibility of misleading results if the B. hybridum trnLF and ITS sequences had been respectively inherited from and (co-inherited but) converted into the same progenitor sequences, causing confusion between the parent and the allotetraploid taxa (e. g. Bhyb26, Bhyb30and Bhyb35 with B. distachyon, and Bhyb28, Bhyb40, Bhyb41, Bhyb 47, Bhyb49 and Bhyb105 with B. stacei; Table 1). Cloning of the ITS sequences can help to solve the uncertainty if both parental copies are detected, as demonstrated in several studied cases (e. g. Bhyb9, Bhyb10, Bhyb14, Bhyb15, Bhyb18, Bhyb19, Bhyb22, Bhyb23 and Bhyb38; Table 1). The use of the combined trnLF+ITS barcode shows high percentages of successful species discrimination among the species in the reticulate triangle using either direct trnLF and ITS sequences (93.3%) or direct trnLF and cloned ITS sequences (94% (Tables 1, S2). The barcoding would remain untractable, however, if the concerted-evolution mechanism that operates in the multicopy nuclear ribosomal genes [26,35] had converted all the co-inherited copies into the same parental copy.
Because of the drawbacks posed by the use of these classical barcodes, we searched for an alternative nuclear locus that could unambiguously differentiate the three species. This could only be a single-copy nuclear gene that retained both parental copies in the allotetraploid without undergoing convergent evolution towards one of them. Among the several COS proposed as appropriate candidates to differentiate closely related plant species [28,29,30] and to discriminate among Brachypodium taxa [11,17] we selected a 665 bp fragment of the GIGANTEA gene, one of the key regulators of flowering promotion and phase transition [36]. This GI region has proved to be a strong candidate barcode for the B. distachyon s. l. taxa based on: i) its easy amplification, cloning and sequencing; ii) its single-copy orthologous nature: iii) the accumulation of discriminating mutations between the B. distachyon and B. stacei sequences (3.8% of mean inter-vs. intraspecific distance divergence and a significant barcode gap, Table 2); iv) the common presence of the two different co-inherited parental B. distachyon-like and B. stacei-like GI sequences in B. hybridum (Table 1); and v) rarely, the presence of inter-parental recombinant sequences that could be easily detected (Table1). The genetic differences were reflected in the GI parsimony network (Fig. 2c) and in the NJ (Results not shown) and Bayesian GI (Fig. 3c) topologies that recovered, respectively, distant clusters and well supported divergent monophyletic clades for B. distachyon and B. stacei, each of them including their respective derived B. hybridum copies. Although 5 cloned sequences were sufficient to detect both parental copies in most of the studied B. hybridum samples, a few difficult samples required the screening of up to 10-16 clones (e. g. Bhyb13, Bhyb34, Bhyb35 Bhyb50) or even a larger number, like in the case of the Bhyb69 sample (58 clones), to pick up variation from both parental species. Nonetheless, one sample (Bhyb2) showed only one parental copy after a relatively intensive clonal screening (49 clones; Tables 1, S2). This implies that a larger number of GI clones should be sequenced in order to detect coinherited copies from both parents, providing that they are still maintained in the hybrid genome.
All the above evidence supports the choice of the GI locus as an alternative or as an additional suitable barcode for discriminating among the triangle species of the B. distachyon s. l. complex. This demands the use of cloning procedures but reduces the number of surveyed loci to just one. Moreover, the percentage of successful species discrimination increases to 98.2% (Tables 1, S2), which is above than that of the combined trnLF+ITS barcode. It further complements alternative cytogenetic identifications based on genome size or chromosome counting. The choice of the best method in a given situation would depend on considerations of facilities and costs, the acceptable error rate, and a priori information on the levels of polyploids in the sample. Very likely, other single-copy genes, such as those analysed within Brachypodium that also showed both co-inherited parental copies in the derived hybrid (e. g. CAL, DGAT, SST3; [11,17]), could also serve as barcodes for this group of taxa. Single-copy nuclear genes are not ideal universal barcodes for plants as their priming sites cannot be easily transferred to non-related groups (e. g. [37]). The GI locus has been successfully amplified and sequenced in different representatives of Pooideae (López-Á lvarez & Catalán, unpublished data) and could probably be extended to all the grass family. We propose the use of single-copy genes as a suitable barcoding alternative to circumvent the problem posed by the existence of recently evolved hybrids and polyploids within specific plant groups. In the future, the use of Next Generation Sequencing (NGS) data (e. g. [38,39,40]), may facilitate the barcoding of problematic plant groups which contain recently evolved hybrids and polyploids. Although the availability of NGS data is still limited both taxonomically and among laboratories, its use for this purpose is rapidly increasing. In the mean time, the use of singlecopy genes is the most practicable current solution for barcoding such plant groups.

Utility of the Proposed Barcoding Method
The new DNA barcoding method proposed here has direct applications to many on-going studies of the model plant B. distachyon and its close allies [2,4]. It has great relevance to the selection of wild germplasm for genomic (http://brachypodium. pw.usda.gov) and plant breeding programs, and for ecological and evolutionary studies of wild populations [6,11]. For this, the correct identification of the three species is crucial but still troublesome due to uncertainty in identifications based on highly variable morphological traits and on ambiguous genome sizes, which show overlapping sizes for B. distachyon and B. stacei [11,41]. Our study has revealed several misidentifications of B. distachyon and its close relatives B. stacei and B. hybridum in germplasm collections (e. g. USDA, ABR) and inbred lines (cf. [7,8,11,42]; e. g., Bsta9, Bsta42, Bsta43, Bhyb9, Bhyb10, Bhyb19, Bhyb20, Bhyb21, Bhyb38, Bhyb39, see Table 1) that likely resulted from incorrect orcein-staining chromosome counts or misleading genome size measurements. Alternatively, the misidentifications could also result from the mixed sampling of individuals or seeds of different species from admixed populations. This problem has been manifested in the failure of 'intraspecific' B. distachyon crossing programs, which were in fact interspecific (Magda Opanowicz and Figure 3. Bayesian halfcompat consensus trees of the Brachypodium distachyon s. l. taxa (B. distachyon (blue), B. stacei (red) and B. hybridum (purple) based on analysis of DNA sequences. a) trnLF tree; b) ITS tree; c) GI tree. B. distachyon-type and B. stacei-type clades are shown as blue and red triangles, respectively, in the small subfigures; potential recombinant parental sequences of B. hybridum (BdisBsta, see Table  S2) are indicated in green. 'i' and 'am' indicate, respectively, incomplete and ambiguous sequences. Numbers below branches correspond to posterior probability support (PPS) values above 0.5. Geographical distributions of sequenced samples are indicated in the large subfigures (CircumMed -circumMediterranean; E Med -eastern Mediterranean; IB -Iberian Peninsula; Mo -Morocco; SW As -southwestern Asia; W Medwestern Mediterranean). doi:10.1371/journal.pone.0051058.g003 John Doonan, pers. comm.) and in unexpected results from cell wall analyses of putative B. distachyon lines, which corresponded to B. stacei or B. hybridum lines (Richard Sibout, pers. comm.). Our barcoding method overcomes these problems, providing an efficient and automatable method to discriminate among the three species.
The validity of our proposed barcoding method depends on the large genetic divergences detected between the diploid B. distachyon and B. stacei genomes for the three analysed loci (Tables 1, 2). The high number of synapomorphic mutations separating them (23,33, and 30, respectively, for the trnLF, ITS and GI loci; Fig. 2), facilitates the immediate classification of the genomes, even from incomplete sequences (Figs. 3a, b, c). Furthermore, the three loci provide informative indels that differentiated B. distachyon and B. stacei, like the two 6-bp gaps in the trnLF locus, the two 3-and 4-nts gaps in the ITS locus, and the one 1-nt gap in the GI locus (see Table S3). Within the ITS region, the ITS2 spacer covers the two diagnostic indels and more than half of the synapomorphic markers detected within the locus (24 out of 43; Table S3), supporting the proposal that the ITS2 subregion could be used alone (Hollingstworth 2011) to barcode case study species. The correct identification of B. hybridum would always require, however, the combined use of, at least, the trnLF+ITS barcoding sequences. Our data indicate that direct PCR sequences from the two genes could discriminate B. hybridum from its two parental species in a high percentage of the cases (88.75%; Table 1). This value increases to 90.0% when the ITS products are cloned. However, as the method might not permit full resolution, due to the potential inheritance of the same parental plastid trnLF and converted nuclear ITS sequences in the hybrid (cf. [25]), the single-copy GI locus was selected as an alternative barcode for the species in the triangle. The random screening of 5 individual GI clones gave a relatively high resolution (80%) that became higher (96.7%) when up to 10-16 (and exceptionally more, e. g. 58) clones were sequenced within our surveyed samples (Table 1).
Recently, Giraldo et al. (2012) [41] proposed a new molecular method to differentiate the three taxa based on the different allelic SSR profiles of B. distachyon and B. stacei at four nuclear microsatellite loci and their additive patterns in B. hybridum. This represents an important step forward for rapid molecular identification of the species, similar to the molecular markerbased barcoding methods proposed for taxonomically complex and highly reticulate plant (e. g. [43]) and animal (e. g. [44]) groups. However, these methods could be less stable and prone to substantial changes than the sequence-based ones as the SSR allelic variation of the barcoded species might be greater than their DNA sequences (and consequently overlap) when a wider range of samples is used [45]. The discriminating SSR markers proposed by Giraldo et al. (2012) [41] were tested across a wide representation of Spanish samples and in the type specimens of the three taxa, but they were not studied in samples from other Mediterranean regions. Thus, our barcoding approach and that of Giraldo et al. (2012) [41] could be used in a complementary way (e. g. [44]) for rapid and accurate molecular identification of the 'Brachy-complex' taxa [2], allowing for confident identification even when unusual allelic variation renders one or other method unreliable.

Genetics and Geographical Distributions of the Three Species and the Polyphyletic Origin of B. hybridum
Our current barcoding survey of B. distachyon, B. stacei and B. hybridum samples has encompassed the whole Mediterranean region, the native distribution area of the three species [11]. One of the main findings of the study is the detection of B. stacei populations in both the western and eastern Mediterranean regions (Table 1; Fig. 1). This rare species was until recently only known from the type locality (Spain: Balearic Islands: Formentera) [11]. However, other recent studies have indicated its presence in other localities of SE Spain [7,41] and in the Canary Islands [41]. Our analyses have confirmed most of these findings and have also revealed its presence in other western Mediterranean localities (Mallorca (Balearic Islands), S Spain, NW Morocco; Table 1, Fig. 1) where it was mislabelled as B. distachyon in the herbaria vouchers. Most notably, we have revealed the presence of B. stacei in the SW Asian-Middle East region (Iran, Israel, Lebanon, Palestine; Table 1), from which it was unknown and also misclassified as B. distachyon. Knowledge of this broader native geographical distribution area of B. stacei will be highly valuable for the selection of new ecotypes and local lines that could be used in the generation of F2 progenies to help the assembly of the newly sequenced B. stacei ABR114 genome (http://brachypodium.pw. usda.gov; John Vogel, pers. comm.). Our study has also contributed to understanding the native distribution areas of the more widely distributed species B. distachyon and B. hybridum (Table 1; Fig. 1). Both taxa are widespread in the Mediterranean region and largely overlap [2,8,11]. The new barcoding data confirm their presence on both sides of the Mediterranean basin, from which regions most the germplasm lines have been generated [2,8,41], and also report their presence in the central Mediterranean area (Table 1). This would be also a valuable source of information for the selection of new B. hybridum ecotypes and lines for the production of F2 progenies that would complement the assembly of the newly sequenced B. hybridum ABR113 genome (http://brachypodium.pw.usda.gov; John Vogel, pers. comm.), and those of B. distachyon that could be added to the resequencing project of the model plant.
Despite their abundant distributions in the Mediterranean, the intraspecific genetic diversities of the parental B. distachyon (0.5% trnLF and ITS; 0.4%GI) and B. stacei (0.1% trnLF; 0.5% ITS; 0.3% GI) sequences were low (Table 2). This was manifested in the sharing of their respective most common trnLF, ITS and GI haplotypes by individuals from populations located far apart in the circumMediterranean region (Tables 1, S2; Fig. 2). In contrast, individuals from geographically close populations, or even intraindividual clones, showed different haplotypes. Our results agree with those of Vogel and co-workers [8] and Mur and coworkers [2], based on SSR markers, which found close genetic connections between geographically distant B. distachyon populations in Turkey and between Spain and Turkey, respectively. Selfing species are expected to show low within-population and high among-population genetic diversities [46]. However, the autogamous B. distachyon and B. stacei samples show low overall geographical structuring of genetic diversity. This might be a consequence of the long distance dispersal of their seeds (cf. [8]) and the high capability of these annuals to adapt to different environmental conditions (cf. [6]). The genetic diversity of the less abundant B. stacei could be lower than that of the more widespread B. distachyon, as deduced from the proportionally fewer trnLF and ITS haplotypes detected in the former (Table 1). Both taxa show, however, some traces of geographic isolation between the western and eastern Mediterranean regions, evidenced by the detection of regional haplotypic clades (e. g. B. distachyon: western Mediterranean, Iberian, Turkish and Middle East subclades (trnLF, Fig. 3a); B. stacei: E Iberian-Balearic and Turkish subclades (ITS, Fig. 3b). The phylogeographic study of these populations is currently in progress (López-Á lvarez and coauthors, unpublished results).
Another striking finding of our study is the demonstration of the existence of different directional crosses that likely gave rise to the new allotetraploid species (Tables 1, S2; Figs. 3a, b, c). In the more restricted study of Catalán and co-workers [11], all the surveyed B. hybridum individuals showed the inheritance of a B. stacei-like plastid genome, resulting from a cross with maternal B. stacei and paternal B. distachyon parents. However, our survey with larger sample sizes shows that, although the above seems to be most common cross direction, in a few cases the B. hybridum individuals are derived from a cross between maternal B. distachyon and paternal B. stacei parents (Table 2; Fig. 3a). The fact that B. hybridum plants derived from the alternate-direction crosses occurred in different Mediterranean localities (Table 1; Fig. 1) supports the multiple and polytopic origins of the allotetraploid B. hybridum. A closer inspection of the more variable ITS and GI networks and phylogenetic trees also reveals distinct relationships of the B. hybridum sequences to different parental haplotypic groups (  Figs. 2, 3). Furthermore, the low mean 'interspecific' divergence rates shown by the B. distachyon-like and B. stacei-like sequences of B. hybridum with respect to those of the two progenitors for the three studied loci ( Table 2) indicate that the two genomes of the hybrid have kept the same or similar signatures as those of the ancestral genomes, supporting the recent origin of B. hybridum in the Pleistocene (cf. [11]). Additionally, the low mean 'intraspecific' divergence rates of the respective B. distachyon-like and B. stacei-like sequences of B. hybridum (Table 2), which are similar to the parental ones, suggests that the original genomes have remained largely intact and that the time elapsed since the hybridizations took part was a brief one. Nonetheless, the detection of some interspecific ITS and GI recombinant sequences in B. hybridum (Table 1; Figs. 2, 3) points towards the occurrence of frequent genomic rearrangements within the hybrid nucleus. This agrees with cytogenetic CCP evidence demonstrating the existence of structural rearrangements in the B. hybridum chromosomes with respect to the B. distachyon and B. stacei ones [16].
The recurrent formation of allopolyploid plant species has been largely documented in the literature [47,48] and references therein). Their predominance over their parental diploid progenitors has been explained as the result of their higher fitness or their higher capability to colonize new habitats and new lands [49,50]. The wide distribution of B. hybridum, which exceeds those of B. distachyon and B. stacei in their native Mediterranean region, as the only known species of the complex to have colonized other continents [9,11], could be a consequence of its more genetically diverse hybrid genome and the likely recurrent origin of new hybrid variants. This could have resulted in fit and well adapted individuals that have displaced the parental species from their habitats and/or have invaded new niches [50]. Current studies are under way to investigate the recurrent origins of B. hybridum through time (López-Alvarez & Catalán, unpublished results).

Future Perspectives of the Barcoding Method for Other Brachypodium Taxa
The almost exclusively self-fertile breeding system of the cleistogamous B. distachyon [8] and of B. stacei (L. Mur, pers. comm.) resulted in highly homozygous genomes of the two diploid parental species that contributed to the heterozygous allotetraploid B. hybridum genome [41]. In a recent assessment of genetic distances between different parent-pairs of hybrid plants, Paun and co-workers [51] concluded that parental species of allopolyploids were genetically more divergent that those of homoploid hybrids. Within Brachypodium, the differences in the inter-vs.
intraspecific divergence values between the B. stacei and B. distachyon sequences were significant (Table 2). Catalán and coworkers [11] also found significant differences in the evolutionary rates of the B. stacei and B. distachyon ITS sequences, the former being significantly higher than the later. The salient features of the two distinct genomes were demonstrated through incompatible cross-GISH hybridizations [13,14]. Their genomic divergences could have triggered the allopolyploidization process that resulted in the B. hybridum populations, and the long isolation of the two parental taxa has facilitated the detection of the proposed trnLF -ITS -GI barcoding method to distinguish the parents and the hybrid.
The usefulness of our DNA barcoding approach at the generic level could however be less successful among recently evolved taxa, like the core-perennial group of Brachypodium species, due to their close relationships [11,52]. No significant differences in plastid trnLF and nuclear ITS sequences were detected between pairs of long rhizomatous Brachypodium species, nor between them and B. distachyon [11]. They were found, however, between the ancestral short-rhizomatous B. mexicanum and annual B. stacei taxa. Widespread geographical sampling would be required to test the utility of the trnLF and ITS barcodes within Brachypodium as a whole. Regarding GI, all the six analysed Brachypodium species [17] showed different sequences and copies, with copy numbers related to their ploidy levels. The apparently more-promising GI barcode should also be evaluated within a wide geographical and taxonomical sample of Brachypodium representatives. Brachypodium has been proposed as a model plant genus for temperate grasses [15], based on the overall small genome size of its members, their compact genomes and an extensive reticulate evolutionary and polyploid history [16]. Diverse stable species (e. g. B. phoenicoides, 2n = 4x = 28) and cytotypes (e. g. B. pinnatum 2n = 4x = 28) are of hybrid origin [16,17] and most of the polyploids (e. g. B. mexicanum, B. retusum) are of suspected hybrid origin. Further research is currently under way to find a universal barcoding system for Brachypodium.

Sampling
A total of 210 samples (56 of B. distachyon, 43 of B. stacei and 111 of B. hybridum) were included in the study (Table 1). Those samples corresponded to inbred lines generated at CRF-INIA, INRA, USDA, and Aberystwyth (ABR), Alcalá de Henares (UAH), Jaén (UJA), Politécnica de Madrid (UPM), Tel-Aviv (TAU) and Zaragoza (Unizar) Universities and to new germplasm accessions collected over their entire native distribution areas in the Mediterranean region (Fig. 1). The identity of most of the samples was tested through DAPI-staining chromosome counting of the studied materials, which has proved to be the most accurate cytogenetic method to differentiate the three taxa. This was coupled with other identifications based on flow cytometry measurements (genome size) and anatomical stomata-leaf guard cell length measurements which separated, respectively, B. hybridum from the diploids, and all the three species [11]. We used this information as an a priori method to validate the resolution power of the proposed barcodes for the discrimination of the three species amplified with highly conserved primers.

DNA Barcode Sequences of Plastid and Nuclear Genes
Three loci were tested as potential tools for effective DNA barcoding of B. distachyon, B. stacei and B. hybridum, the maternally inherited plastid trnLF region (trnL(UAA) intron -trnL(UAA) exon -trnL(UAA)/trnF(GAA) spacer) and the biparentally inherited nuclear multicopy ribosomal internal transcribed spacer ITS region (ITS1-5.8S-ITS2) and single-copy GIGANTEA (GI) gene. Total DNA was extracted from dried leaf tissue using a modified CTAB method of Doyle and Doyle [53]. The plastid trnLF region was amplified and sequenced from direct PCR products using the universal primers 'c' forward (59-CGAAATCGGTAGACGC-TACG-39) and 'f' reverse (59-ATTTGAACTGGTGACAC-GAG-39) of Taberlet and co-workers [54]. PCR products were purified using ExoSAP-ITTM (USB Corporation, Cleveland, OH) and sequenced in both directions by cycle-sequencing using the Big-Dye version 3 chemistry (Perkin-Elmer), with a Prism 3100 Genetic Analyzer (ABI). The nuclear multicopy ITS region was amplified using primers ITSL forward (59-TCGTAA-CAAGGTTTCCGTAGGTG-39) and ITS4 reverse (59-TCCTCCGCTTATTGATATGC-39) optimized for grasses [55]. A 665 bp portion of the nuclear single-copy GI gene was amplified with primers GIGIE1F (59-TATGTCWGYNT-CAAATGGGAAGTGG-39) and HGIE5R (59-AACTTTRAA-GATTGGCCTRTTGTRGTGA-39) designed for Brachypodium [56]. Due to the, respectively, plausible and known existence of multiple ITS and GI copies in the B. hybridum samples, and the potential existence of more than one ITS copy in the B. distachyon and B. stacei samples, all GI and most ITS amplified products were cloned and sequenced, aiming to detect their potential intraindividual copy number variation. Sixty one amplified ITS (24) and GI (37) fragments were cloned separately into a pGEMH-T Vector System I cloning vector (Promega, USA) following the manufacturers' instructions. These were transformed into Escherichia coli JM109 competent cells. Five colonies of each individual sample were randomly picked for a first ITS and GI screen, and each clone was sequenced using M13 forward (59-GTTTTCCCAGT-CACGAC-39) and reverse (59-CAGGAAACAGCTATGAC-39 ) primers following the same procedure as for direct-PCR products. In some ambiguous cases, up to 10-16 (58) clones per individual sample had to be sequenced to detect the copy number variation (see Results). The characteristics of the amplification conditions for each barcoding gene are indicated in Supplemental methods (see Methods S1).

Data Analyses
Data alignment. The independent trnLF, ITS and GI sequence data matrices were aligned using Geneious 4.7 (Biomatters, Auckland, New Zealand) and MacClade v4.08 [57]. The alignments were edited manually to remove adapters, PCR primers and bases that had been added during the ligation process and to optimize the final alignment. A single consensus sequence per individual sample was obtained from direct-PCR sequencing of the trnLF region (and in some cases of the ITS region) and several sequences per individual sample were obtained from the cloned ITS and GI regions (see Results).
Genetic distance analysis. Because the fundamental requirement of a suitable barcoding method is to attain a level of interspecific polymorphism high enough to allow the monophyletic grouping of individuals from the same species and the recovery of distinct clusters at the interspecific level [21], inter-and intraspecific genetic distances among and within B. distachyon, B. stacei and B. hybridum (B. distachyonand B. stacei-like) sequences were calculated using the Kimura's 2-parameter (K2P) model implemented in MEGA4 [58] for each DNA barcoding locus. The K2P method has been reported as a fast and accurate method to examine relationships among species and to assign unidentified samples to known species [59]. Due to the distinct nature and inheritance of each molecule, we analysed each locus separately rather than combining sequences. We also generated the respective Neighbor-Joining (NJ) trees based on the K2P sequence divergences estimated. A species was discriminated when more than 50% of the sampled individuals fell in the same monophyletic group in the NJ tree. This relatively low threshold has been chosen to reflect the minimum probability for which a correct identification would be more likely than a wrong identification [21].
Haplotype network analysis. The number of haplotypes of each separate locus was obtained from statistical parsimony analysis of the complete and unambiguous trnLF, ITS and GI aligned data matrices using TCS 1.21 [60]. The respective haplotype networks were constructed with this software imposing a 95% connection limit for up to 30 steps and treating the gaps as a 5 th character state. The clustering of similar haplotypes in groups and their divergence, based on the large number of mutational steps, were used as additional evidence supporting the barcoding method.
Phylogenetic analysis. Independent analyses were conducted on each separate data matrix that included, respectively, all the newly sequenced B. distachyon, B. stacei and B. hybridum samples, other representatives of the close Brachypodium perennials and other more distantly related Triticeae (Hordeum, Secale) outgroups that were used to root the trees ( Table 1). The DNA sequences of the perennial Brachypodium taxa and of the outgroups corresponded to those analysed in the study of Catalán and co-workers [11]. Bayesian phylogenetic analyses were performed using the program MrBayes 3.1.2 [61]. The best nucleotide substitution model (GTR+ G) was previously selected using the hierarchical likelihood ratio test and the Akaike criterion implemented in MrModeltest 2.3 [62]. The Bayesian Inference search was computed imposing the nst = 6 and rates = gamma parameters to the nucleotide sequence partition and leaving the program to estimate the remaining parameters. A total of 3750 posterior probability Bayesian trees were saved for each separate data matrix after performing two runs, each with 5000000 generations and four chains, sampling trees every 1000 generations, and a burn-in option of 1250 trees per run once stability in the likelihood values was attained. A Bayesian halfcompat consensus tree of all saved trees was computed for each separate data set; the posterior probability values of the branches of each consensus tree were used as a measure of their nodal support.