Allopolyploid Origin of Chenopodium album s. str. (Chenopodiaceae): A Molecular and Cytogenetic Insight

Reticulate evolution is characterized by occasional hybridization between two species, creating a network of closely related taxa below and at the species level. In the present research, we aimed to verify the hypothesis of the allopolyploid origin of hexaploid C. album s. str., identify its putative parents and estimate the frequency of allopolyploidization events. We sampled 122 individuals of the C. album aggregate, covering most of its distribution range in Eurasia. Our samples included putative progenitors of C. album s. str. of both ploidy levels, i.e. diploids (C. ficifolium, C. suecicum) and tetraploids (C. striatiforme, C. strictum). To fulfil these objectives, we analysed sequence variation in the nrDNA ITS region and the rpl32-trnL intergenic spacer of cpDNA and performed genomic in-situ hybridization (GISH). Our study confirms the allohexaploid origin of C. album s. str. Analysis of cpDNA revealed tetraploids as the maternal species. In most accessions of hexaploid C. album s. str., ITS sequences were completely or nearly completely homogenized towards the tetraploid maternal ribotype; a tetraploid species therefore served as one genome donor. GISH revealed a strong hybridization signal on the same eighteen chromosomes of C. album s. str. with both diploid species C. ficifolium and C. suecicum. The second genome donor was therefore a diploid species. Moreover, some individuals with completely unhomogenized ITS sequences were found. Thus, hexaploid individuals of C. album s. str. with ITS sequences homogenized to different degrees may represent hybrids of different ages. This proves the existence of at least two different allopolyploid lineages, indicating a polyphyletic origin of C. album s. str.


Introduction
How new forms arise in nature is a key question in evolutionary biology. It is believed that a significant part of speciation and diversification in plants (particularly in angiosperms and ferns) involved reticulate evolution characterized by occasional hybridization creating a in the polyphyletic evolution of this widely distributed species. To test this, we need to know the origin of each polyploid species.
Recently, several attempts were made to elucidate the origin of the hexaploid C. album s. str. [7,37,47,48]. However, none of them has provided a satisfactory explanation. Gangopadhyay et al. [47] speculate about the involvement of three diploid species. According to them, narrow-leaved and broad-leaved diploid cytotypes of C. album and C. murale L., a species not belonging to the C. album aggregate, were involved. According to Rahiminejad & Gornall [48] the diploids C. ficifolium and C. suecicum (or taxa very similar to these two) are considered the putative parents of C. album s. str., based on an analysis of secondary metabolites. Mandák et al. [7] compared four hypothetical scenarios, including that of Rahiminejad & Gornall [48], by analysing genome size variation among European taxa belonging to the C. album aggregate. Hybridization between diploid and tetraploid taxa via unreduced gametes turned out to be the most likely scenario. Nonetheless, genome size was unable to discriminate between species with the same ploidy level, so the exact combination of parental taxa of C. album s. str. remains unclear. The allopolyploid origin of C. album s. str. was recently confirmed by the molecular systematic study of Walsh et al. [37]. No study, however, was able to identify the putative parents of C. album s. str. The study of Mandák et al. [7] thus still represents the closest approximation of the species' origin.
Fuentes-Bazan et al. [49,50] sequenced the nrDNA ITS region, but did not report on the occurrence of sequence heterogeneity in C. album s. str. However, our pilot study revealed a certain degree of intraindividual polymorphism in ITS sequences of this species, indicating that this marker may be a valuable source of information about its origin. We therefore sampled C. album s. str. accessions covering a significant part of the species' distribution range in Eurasia (from the Iberian Peninsula to western China). Besides the analysis of ITS, we sequenced cpDNA and performed genomic in situ hybridization (GISH), aiming to: (1) verify the hypothesis of Mandák et al. [7] that C. album s. str. originated via hybridization between diploid and tetraploid species; (2) identify the direction of the hybridization; (3) evaluate the pattern of intraspecific variation of C. album s. str. at the continental and local scales; and (4) estimate whether C. album s. str. has a mono- or polyphyletic origin.

Ethics statement
The collections used for this study did not involve endangered or protected species, and no specific permissions were required for sampling activities in these locations.

Plant material
We sampled 52 populations of hexaploid Chenopodium album s. str., each represented by 1 accession, covering most of the species distribution range in Eurasia. In addition to this coarse sampling, we sampled 5-13 individuals from five populations in the Czech Republic to investigate genetic variation on a fine scale. In total, we sampled 96 individuals of C. album. Further, we sampled potential parental species of C. album: diploid C. ficifolium (5 accessions) and C. suecicum (9 accessions) as well as tetraploid C. striatiforme (2 accessions) and C. strictum (8 accessions). Chenopodium chenopodioides (L.) Aellen, C. glaucum L., C. rubrum L., and C. urbicum L. served as outgroup taxa. The origins of plant material used for the present study are given in Table 1. When available, ripe seeds were collected in the field; if not, leaf material from well-developed flowering plants was collected and dried in silica gel and stored until DNA extraction. The plants were grown in the experimental garden of the Institute of Botany, Czech Academy of Sciences, Průhonice, Czech Republic (49.991667, 14.566667, ca. 320 m above sea level) between 2011 and 2015. The seeds were germinated in 5 × 5 cm bedding cells with homogenous garden compost and later moved to 19 × 19 × 19 cm (6.9 L) pots filled with common garden substrate. Fresh leaves were collected from each accession and stored in silica gel until DNA extraction. For genomic in situ hybridization, seeds from selected accessions were germinated again and seedlings were cultivated. Root tips were sampled from young plants and stored in a fixative solution (3:1 mixture of absolute ethanol and acetic acid) prior to the analyses.

Table 1 (caption): Information on the origin, cpDNA haplotype (cpDNA) and nrDNA ITS ribotype (ITS) is presented for all accessions analyzed. The GenBank accession numbers of the representative sequences for each cpDNA haplotype and nrDNA ribotype are given by the accession that was submitted to the database. Samples used for chromosome spread preparation for GISH experiments are marked with "GISH", and samples used as probes for GISH experiments are marked with "GISH-probe". The herbarium specimens are deposited in the herbarium of the Institute of Botany of the Czech Academy of Sciences (PRA).

DNA extraction, PCR and sequencing
Total genomic DNA was extracted following the sorbitol extraction method [51]. The noncoding rpl32-trnL region of chloroplast DNA was analysed to determine the maternal lineage of C. album s. str. The primers described in [52] were used. The PCRs were carried out in 25 μl reactions containing 12.5 μl of Plain PP MasterMix (TopBio, Prague, Czech Republic), 0.2 μM of each primer and 20 ng of genomic DNA. The cycling conditions were as follows: 4 min at 95°C followed by 35 cycles of 95°C for 30 s, 55°C for 30 s and 72°C for 1 min, and final extension at 72°C for 15 min. The nrDNA ITS region was amplified using the primers AC-ITS5 [49] and ITS4 [53]. The PCRs were carried out in 25 μl, each reaction containing 1× PCR buffer with KCl (Fermentas, St. Leon-Rot, Germany), 3 mM of MgCl2, 0.2 mM of each dNTP, 0.2 μM of each primer, 0.5 U of Taq DNA polymerase (Fermentas) and 20 ng of genomic DNA. The cycling conditions were as follows: 4 min at 95°C followed by 35 cycles of 95°C for 30 s, 50°C for 30 s and 72°C for 1 min, and final extension at 72°C for 10 min. All samples were amplified in three independent reactions that were pooled equimolarly prior to any downstream analysis to minimize the stochastic effects leading to the elimination of underrepresented ITS sequence types during the amplification step.
The samples were purified and both strands were sequenced using the services of Macrogen (Amsterdam, The Netherlands).
One individual of hexaploid C. album s. str. showing a high level of intraindividual polymorphism in ITS was cloned to confirm the presence of ribotypes inferred based on direct sequencing. The TOPO TA cloning kit (Invitrogen, Carlsbad, CA) was used following the manufacturer's instructions, only downscaled to half reactions. Ten positive clones were transferred into 20 μl ddH2O and denatured at 95°C for 10 min. They served as templates for subsequent PCR amplifications and sequencing. Representative sequences were deposited in GenBank under the accession numbers KU517365-KU517408 (Table 1).

Analysis of DNA sequences
Sequences were proofread and corrected manually for inadequate base calling using Chromas Lite v 2.1 (Technelysium Pty., Brisbane, Queensland, Australia). The ITS sequences were checked for intraindividual polymorphism, and if mixed bases occurred at the same position in both strands, IUPAC ambiguity codes were used to mark these sites. Sequence alignments were done using the MAFFT algorithm [54] as implemented by the GUIDANCE web service [55]. Alignments were adjusted manually using BioEdit [56]. Mononucleotide repeats were excluded from the cpDNA dataset due to a potentially high level of homoplasy [57] prior to further processing of the data. The "simple gap coding" method [58], as implemented in SeqState [59], was used to score variation in indels.

Phylogenetic analysis of cpDNA
Maximum parsimony (MP) analysis using heuristic search with 100 replicates of random sequence addition and TBR branch swapping, as implemented in PAUP* 4.0.b10 [60], was performed. Bayesian analysis (MB) was conducted using MrBayes 3.2.1 [61]. The GTR model was used as the best fitting model of nucleotide substitution as determined by the Hierarchical Likelihood Ratio Test, carried out using MrModeltest 2.3 [62]. Two parallel analyses with four chains each were run for 2 000 000 generations, sampling each 1000-th generation. The average deviation of split frequencies indicated that this number of generations was sufficient to reach convergence. Results of the first 500 steps were discarded as burn-in and the remaining 3002 trees were used to reconstruct the consensus tree. The MP and MB analyses were performed twice.
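The reported number of retained trees can be checked with simple arithmetic. The sketch below is one reading that reproduces the figure of 3002; it assumes (this is not stated in the text) that the starting state at generation 0 is also sampled and that the 500-sample burn-in was discarded from each of the two parallel runs separately. The function name is hypothetical.

```python
def retained_trees(generations, sample_every, burnin_samples, runs=2):
    """Number of post-burn-in trees pooled across parallel runs.

    Assumption: the starting state (generation 0) is sampled as well,
    so each run stores generations / sample_every + 1 trees.
    """
    samples_per_run = generations // sample_every + 1
    return runs * (samples_per_run - burnin_samples)


# cpDNA analysis: 2 000 000 generations, every 1000th sampled, burn-in 500
cpdna_trees = retained_trees(2_000_000, 1000, 500)
print(cpdna_trees)  # 3002
```

Under the same assumptions, the settings reported for the ITS analysis (1 000 000 generations, burn-in of 200) give `retained_trees(1_000_000, 1000, 200) == 1602`, which matches the number of trees stated for that dataset.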
The haplotype network was reconstructed using TCS 1.21 [63]. A considerable part of the observed genetic variation was represented by insertions and deletions. However, TCS cannot work with the presence/absence matrix created for indels and appended to the sequence alignment file by SeqState. To include this information in the analysis, the 0/1 characters were replaced manually by A/T in the nexus input file. The analysis was then run with the default conditions (i.e. with gap characters treated as missing data and a 95% connection limit), so the software could analyse information on indels as A/T variation, but the original gap characters (represented by "-" in the sequence alignment) were not taken into account. This approach allowed the incorporation and equal weighting of indels with different lengths in the analysis.
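The manual recoding step described above can be sketched in a few lines. This is a minimal illustration, not the procedure actually used (the replacement was done by hand in the nexus file). The function names and the toy sequences are hypothetical, and the choice of which state maps to A and which to T is an assumption, since the text only states that 0/1 were replaced by A/T.

```python
def recode_indel_block(indel_codes, absent="A", present="T"):
    """Map 0/1 indel characters to pseudo-nucleotides.

    Assumption: 0 -> A and 1 -> T; the source does not say which is which.
    Any other symbol (e.g. '-' or '?') is left unchanged.
    """
    table = str.maketrans({"0": absent, "1": present})
    return indel_codes.translate(table)


def append_recoded(alignment, indel_matrix):
    """Append the recoded indel block to each aligned sequence.

    Both arguments are dicts mapping a sequence name to a string.
    """
    return {
        name: seq + recode_indel_block(indel_matrix[name])
        for name, seq in alignment.items()
    }


# Toy data (hypothetical): one 2-bp indel scored as a single binary character
alignment = {"hapA": "ACGT--ACGT", "hapB": "ACGTTTACGT"}
indel_matrix = {"hapA": "1", "hapB": "0"}

recoded = append_recoded(alignment, indel_matrix)
print(recoded["hapA"])  # ACGT--ACGTT
print(recoded["hapB"])  # ACGTTTACGTA
```

Because the original gap characters are treated as missing data, each indel contributes exactly one A/T difference regardless of its length, which is the equal weighting described in the text.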

Phylogenetic analyses of ITS
Nucleotide diversities were calculated in MEGA 5.05 [64]. Maximum parsimony (MP) was performed as described for the cpDNA. The SYM+G model of nucleotide substitution was used for the MB analysis and the analysis was run for 1 000 000 generations, sampling each 1000-th generation. Results of the first 200 steps were discarded as burn-in and the remaining 1602 trees were used to reconstruct the consensus tree.
Neighbour net analysis as implemented in SplitsTree 4.12.6 [65], with ambiguous characters treated as average states, and uncorrected P distance was performed to better visualize the relationships of hexaploid ribotypes to those originating from diploids and tetraploids. Bootstrapping with 1000 replicates was performed to assess the support of the resulting groups.
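To illustrate why additive sequences end up between the two clades in such an analysis, the following sketch computes an uncorrected p-distance in which ambiguity codes are averaged over their possible bases. It is one plausible interpretation of "average states" and is not taken from the SplitsTree source code; the function names and the toy sequences are hypothetical.

```python
# IUPAC nucleotide codes and the bases they stand for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def site_mismatch(a, b):
    """Expected mismatch at one site, averaged over all base combinations."""
    bases_a, bases_b = IUPAC[a], IUPAC[b]
    matches = sum(1 for x in bases_a for y in bases_b if x == y)
    return 1.0 - matches / (len(bases_a) * len(bases_b))


def p_distance(seq1, seq2):
    """Uncorrected p-distance; sites with gaps or unknown symbols are skipped."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    total, compared = 0.0, 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in IUPAC or b not in IUPAC:
            continue
        total += site_mismatch(a, b)
        compared += 1
    return total / compared if compared else float("nan")


# Toy sequences (hypothetical, not real ribotypes)
diploid = "ACGTA"
tetraploid = "ATGCA"
additive_hexaploid = "AYGYA"  # Y = C or T at both discriminating sites

print(p_distance(diploid, tetraploid))
print(p_distance(additive_hexaploid, diploid))
print(p_distance(additive_hexaploid, tetraploid))
```

In this toy case the additive sequence is equally distant from both parental sequences, and each of these distances is half the distance between the parents.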

Chromosome preparations and genomic in situ hybridization (GISH) procedure
Root tips were fixed in 3:1 (v/v) 100% ethanol:acetic acid. The fixed root meristems were thoroughly washed in water and enzyme buffer (10 mM citrate buffer at pH 4.6), and partially digested in 0.3% (w/v) cytohelicase, pectolyase and 0.5% (w/v) cellulase (Sigma Aldrich, St. Louis, MO, USA) at 37°C for 2-3 hours followed by washes in enzyme buffer and water [66,67]. The material, in a water drop, was carefully transferred onto a grease-free microscope slide and the cells were spread according to the technique described in [68] with modifications according to [69]. Slides were examined and metaphase chromosomes were photographed in phase-contrast under an Olympus BX-53 microscope.
For GISH experiments, total genomic DNA of C. ficifolium, C. suecicum, C. strictum, C. striatiforme and C. album s. str. were used. DNA was sonicated in a BIORUPTOR PICO (Diagenode, Liege, Belgium) machine and labelled with Cy3 (GE Healthcare Life Sciences, Little Chalfont, UK) and ATTO 488 (Jena Bioscience, Jena Germany) according to the standard oligolabelling protocol. Two differently labelled DNA probes were hybridized simultaneously and GISH was carried out on a ThermoBrite (StatSpin, Hannover, Germany) programmable temperature controlled slide processing system at 63°C for 3 h. Slides were stained with DAPI and mounted in antifade mountant (Vector Laboratories, Peterborough, UK). After the GISH procedure, the chromosomes were again examined and photographed under a phase contrast microscope and the resulting images were computerized using the ProgRes MF Cool system (Laboratory Imaging, Prague, Czech Republic).

CpDNA haplotype diversity and phylogenetic relationships
The total length of the rpl32-trnL alignment was 1254 bp including outgroup taxa (1184 bp when only the samples of C. album agg. were considered). The total alignment contained 25 indels 1-385 bp in length (4 indels of 1-150 bp length in samples of C. album agg.). Seven cpDNA haplotypes were found in 122 analysed individuals of C. album agg. One haplotype was found in C. ficifolium (5 accessions analysed), two in C. suecicum (9 accessions), two in C. strictum (8 accessions) and one in C. striatiforme (2 accessions). The latter two species shared one haplotype (haplotype 1). The same haplotype was also found in hexaploid C. album s. str. In addition to this shared haplotype (haplotype 1), two further haplotypes were found within C. album s. str. (Fig 1).
Phylogenetic and haplotype network analyses revealed two major lineages of haplotypes, one diploid and one polyploid; the latter, however, did not receive significant bootstrap and posterior probability support (Fig 1). Whereas three haplotypes were found in diploid species (two of them formed a well-supported subclade in C. suecicum), 4 haplotypes were present in tetraploid and hexaploid species. No further subdivision correlated with species identity was found in the polyploid lineage.

Analysis of ITS
Phylogenetic relationships and intraindividual polymorphism in C. album s. str. The total length of the ITS alignment was 625 bp (624 bp for C. album agg. taxa only), containing five 1 bp long indels (only one indel was found in C. album agg.). Forty-nine variable positions were found in C. album agg.
Twenty-nine unique ITS sequences were revealed based on direct sequencing of 122 individuals of five C. album agg. species. In each of the diploid species, C. ficifolium and C. suecicum, three ribotypes were found. Two and three ribotypes were found in tetraploids C. striatiforme and C. strictum, respectively. While the sequences of the diploid and tetraploid species differed in 37 out of the 624 aligned characters (mean nucleotide diversity 1.3%), forming two well separated clades, the sequence divergence within the clades was small (mean nucleotide diversity 0.26% and 0.32% for diploids and polyploids, respectively) (Figs 2 and 3).
Three ribotypes of C. suecicum shared one synapomorphic mutation at position 529 (Fig 2) and formed a monophyletic group within the diploid clade (Fig 3). The sequences of C. ficifolium were placed to the base of this group.
Neither the sequences of C. striatiforme nor C. strictum formed a monophyletic group within the polyploid clade. Moreover, ribotype 7 was found in both species (Fig 3).
In Chenopodium album s. str., 21 ribotypes were identified, all of which were placed in the polyploid clade (Fig 3). The majority of these sequences (ribotypes 8-13, 16-27, see S1 Fig) were not homogenous and contained superimposed peaks at certain positions, indicating the presence of more than one sequence variant in these individuals. This intraindividual polymorphism varied largely between ribotypes (1-37 polymorphic sites), and most of the superimposed peaks occurred at positions where the sequences of diploids and tetraploids differed from each other and showed an additive pattern of diploid and tetraploid characters (Fig 2 and  S1 Fig).
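The notion of an additive pattern can be made concrete with a small sketch that classifies each site of a hexaploid sequence relative to a diploid and a tetraploid reference. It is only an illustration under simplifying assumptions: the reference sequences carry unambiguous bases at the discriminating sites, gaps are not handled, and all names and toy sequences are hypothetical. The ambiguity codes are those listed in the S1 Fig legend.

```python
from collections import Counter

# Bases plus the two-base IUPAC codes listed in the S1 Fig legend
IUPAC_SETS = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "Y": {"C", "T"}, "M": {"A", "C"}, "W": {"A", "T"},
    "K": {"G", "T"}, "R": {"A", "G"}, "S": {"C", "G"},
}


def classify_site(hexaploid, diploid, tetraploid):
    """Classify one aligned site of a hexaploid sequence."""
    h, d, t = (IUPAC_SETS[c] for c in (hexaploid, diploid, tetraploid))
    if d == t:
        return "non-discriminating"
    if h == d | t:
        return "additive"          # both parental bases are present
    if h == t:
        return "tetraploid-like"   # homogenized towards the tetraploid
    if h == d:
        return "diploid-like"      # homogenized towards the diploid
    return "other"


def summarize(hex_seq, dip_seq, tet_seq):
    """Count site classes over the discriminating sites only."""
    calls = (classify_site(h, d, t) for h, d, t in zip(hex_seq, dip_seq, tet_seq))
    return Counter(c for c in calls if c != "non-discriminating")


# Toy sequences (hypothetical, not real ribotypes)
diploid = "ACGTA"
tetraploid = "ATGCA"
partly_homogenized = "AYGCA"

summary = summarize(partly_homogenized, diploid, tetraploid)
print(summary)
```

For the toy input, one discriminating site is additive and the other has been homogenized towards the tetraploid base. A sequence in which every discriminating site is additive would correspond to a completely unhomogenized pattern.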
The majority of hexaploid individuals (54) possessed completely homogenized sequences, belonging to ribotypes 7, 14 and 15. The most frequent ribotype 7 was found in 50 hexaploid individuals. This ribotype, together with ribotype 14, was found also in tetraploids. Ribotype 15, found in 3 individuals, possesses tetraploid-like characters in all but two discriminative positions (443 and 576, Fig 2). At these sites, characters typical for diploid sequences were found, indicating that a chimeric sequence was maintained by concerted evolution.
We found only three ribotypes in four individuals that show complete (ribotype 12) or nearly complete (ribotypes 18 and 20) character state additivity in all 37 positions discriminating between the diploids and polyploids (Fig 2). Based on the results of neighbour net analysis, these sequences were placed at a position intermediate between the diploid and polyploid clades (Fig 4), illustrating the presence of an almost intact diploid-like ITS copy in addition to the tetraploid one in these individuals. The persistence of two different ITS copies in ribotype 12 was further confirmed by cloning and phylogenetic analyses. The cloned sequences were 100% identical to ribotypes 4 and 14, characteristic of C. suecicum and C. striatiforme, respectively (Fig 3).

Fig 1 (caption fragment; beginning not recovered): [...] Table 1; each haplotype is represented by one sequence. Numbers above and below branches indicate posterior probability (blue) and bootstrap support (red) values from Bayesian and maximum parsimony analysis, respectively. The presence of the particular haplotypes in species of the Chenopodium album agg. is indicated by coloured dots (see legend). (B) A cpDNA haplotype network. Each line represents one mutational step. Black bars represent missing haplotypes. The seven haplotypes identified in this study are represented by coloured circles. The size of each circle is proportional to the frequency of the particular haplotype. The occurrence of particular haplotypes in species of the C. album agg. is indicated by colours (see legend).
doi:10.1371/journal.pone.0161063.g001

Fig 3 (caption fragment; beginning not recovered): [...] Table 1 and Fig 2, and each ribotype is represented by one sequence. Numbers above and below branches indicate bootstrap support (red) and posterior probability (blue) values from the maximum parsimony and Bayesian analysis, respectively. The presence of the particular ribotypes in species of the Chenopodium album agg. is indicated by coloured dots (see legend). Sequences isolated from a hexaploid individual with a highly polymorphic direct sequence are indicated by black arrows.
doi:10.1371/journal.pone.0161063.g003
Geographic patterns of ITS sequence variation. We compared the composition and frequency of ribotypes both at the continental and local scales. The pattern of variation in ITS sequences did not reveal any geographic structure (S2 Fig). The most frequent ribotypes of all species were widely distributed across the sampled area. Some rare ribotypes were found hundreds or even thousands of kilometres apart; ribotype 15, for example, occurred in Central Europe, the Balkan Peninsula and the southwestern part of Siberia, the latter two locations being more than 4,000 km apart.
At the local scale, individuals with different numbers of polymorphic positions co-occurred in all five sampled populations (S3 Fig). The majority of sequences were completely or nearly completely homogenized towards the tetraploid parents. In all populations, the two most frequent ribotypes (ribotypes 7 and 8) recognized at the continental scale were also the most abundant ones. The exception is population 599, where eight ribotypes with more or less equal frequencies were found. In population 583, two individuals with ribotype 20 (representing completely unhomogenized sequences of both progenitors) were found.

Genomic in situ hybridization
Chromosomes of C. album were probed with total DNA of the putative diploid progenitors, C. ficifolium and C. suecicum, to determine the chromosomal distribution of common repetitive sequences. The intensity of red-orange fluorescence shows the level of affinity between repetitive sequences in the genomes tested. GISH revealed a strong hybridization signal on 18 chromosomes of C. album with both diploid species C. ficifolium and C. suecicum (Fig 5B and 5D). These GISH experiments, using both diploid probes, resulted in the same intensity of fluorescent signal on the same chromosomes, confirming the close relatedness between the diploid taxa.
Probing chromosomes of C. album s. str. with the total DNA of related tetraploids C. strictum and C. striatiforme yielded ambiguous results. The hybridization mixture consisted of self-DNA of C. album (green fluorescence) and DNA of one of the proposed tetraploid ancestors (red fluorescence). A bright orange signal showed high homoeology of C. strictum and C. striatiforme DNA to all chromosomes of C. album s. str. both in euchromatic and in heterochromatic regions (Fig 5F and 5H), making it difficult to divide chromosomes into groups by the level of fluorescence.
Individuals of C. album s. str. with completely homogenized ITS sequences as well as individuals with non-homogenized ITS sequences were used for GISH, and the results of these experiments did not differ.

Discussion
Allopolyploid origin of Chenopodium album s. str. and direction of hybridization
The origin of C. album s. str. has been a matter of discussions for several years. Different authors have proposed that this species may be of allopolyploid origin. However, controversy surrounds its progenitors [7,37,47,48,70].
Although we used several approaches, including the sequencing of ITS and cpDNA, and genomic in situ hybridization, none of them by itself has brought clear evidence concerning the origin of C. album s. str. However, all these approaches together have produced a collection of complementary evidence, suggesting that C. album s. str. originated by hybridization between a diploid and a tetraploid species, as suggested by Mandák et al. [7].
The GISH method with total genomic DNA as a probe provides us with unique information about similarities between repetitive DNA of related species as well as about the physical location of conserved sequences on chromosomes [71,72]. The GISH results for diploid species, C. ficifolium and C. suecicum, show close relatedness to eighteen C. album s. str. chromosomes. The results indicate that the donor of these chromosomes was highly similar to present-day C. ficifolium/C. suecicum.
GISH experiments with tetraploid DNA as a probe resulted in fluorescent signal in eu- and heterochromatic chromosome regions (Fig 5F and 5H) on all 54 chromosomes of C. album s. str. This finding does not fit the hypothesis that C. album s. str. originated by hybridization between a diploid and a tetraploid progenitor. If this were the case, we would expect fluorescent signal only on the 36 tetraploid-derived chromosomes. However, this pattern could be explained by the presence of tribe-specific sequences [73], the presence of similar families of transposable elements, homogenization of intrapolyploid repetitive DNA sequences by concerted evolution [74] and interhaplome transfer at the tetraploid and hexaploid levels [72]. An additional investigation, including the analysis of repetitive sequence variation, is required to explain this pattern. However, as regards the allopolyploid origin of C. album s. str., the GISH method is especially valuable because it provides evidence about the involvement of diploid taxa.
The involvement of tetraploids is supported by cpDNA and ITS data. All hexaploid individuals analysed had a tetraploid-like cpDNA sequence. We therefore assume that tetraploids served as maternal parents in these crosses and that diploids were the pollen donors. ITS is a biparentally inherited marker, so sequences of both parents can be anticipated in allopolyploids. However, most hexaploid accessions of C. album s. str. possessed only one sequence type, representing the tetraploid parent. The missing diploid sequences could be explained by their elimination due to concerted evolution [75], a series of intragenomic processes that may lead to complete homogenization of sequences across different loci [74]. We found some traces of diploid sequences in hexaploid individuals, representing partly homogenized sequences, as well as four individuals in which an intact diploid sequence was present. Similarly, intraspecific differences in the degree of sequence homogenization have also been reported for hybridogenous species of other genera (e.g. Tragopogon [76,77], Malus [78], Carapichea [79]).
Mandák et al. [7] mention two possible mechanisms of allohexaploid C. album formation from diploid and tetraploid taxa: hybridization via fusion of unreduced gametes and a two-step procedure involving a triploid bridge. The fact that no triploids were found among the 482 plants analysed by Mandák et al. [7], not even in their enlarged dataset of 1977 accessions, covering most of the Eurasian distribution range of C. album agg. [46], speaks in favour of the first possibility. However, C. album s. str. forms dense and numerous populations, in which rare triploid individuals can be overlooked with high probability despite dense sampling. Another argument for the involvement of a triploid bridge in the evolution of C. album s. str. is that it could have played a role in the group's evolutionary history even though no triploid has ever been found. On the other hand, the production of unreduced gametes in natural populations is relatively low (< 1%, see [80]). They, however, appear to be more frequent in populations exposed to environmental stress [81,82], which could have increased the probability of fusion of two unreduced gametes and significantly accelerated the evolution of C. album s. str.
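The chromosome arithmetic behind the two routes can be written out as a short sketch. It assumes a base chromosome number of x = 9, which follows from the 18 diploid-derived and 54 total chromosomes reported for C. album s. str. In the triploid-bridge route, the doubling step stands for either somatic doubling or fusion of two unreduced triploid gametes; the text does not specify which. Function and variable names are hypothetical.

```python
X = 9  # base chromosome number, implied by 18 (diploid) and 54 (hexaploid)


def somatic(ploidy):
    """Somatic chromosome number (2n) for a given ploidy level."""
    return ploidy * X


def gamete(ploidy, reduced=True):
    """Chromosome number of a reduced or unreduced gamete."""
    return somatic(ploidy) // 2 if reduced else somatic(ploidy)


# Route 1: fusion of two unreduced gametes (diploid x tetraploid)
route_unreduced = gamete(2, reduced=False) + gamete(4, reduced=False)

# Route 2: triploid bridge, i.e. reduced gametes first give a triploid,
# whose chromosome set is then doubled (mechanism of doubling assumed)
triploid = gamete(2) + gamete(4)
route_bridge = 2 * triploid

print(route_unreduced)  # 54
print(triploid)         # 27
print(route_bridge)     # 54
```

Both routes arrive at the same hexaploid complement of 54 chromosomes (18 diploid-derived plus 36 tetraploid-derived), so chromosome counts alone cannot distinguish them. Only the second route passes through a detectable triploid (27 chromosomes) stage.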

Parental species remain unknown
Based on our data, no further conclusions can be drawn regarding the exact parental combination of C. album s. str. The GISH and genome size analysis of Mandák et al. [7] was unable to discriminate between species of the same ploidy level. Similarly, the analysis of cpDNA pointed towards the tetraploid as the maternal parent, but did not bring enough resolution to discriminate between the two tetraploid taxa. On the other hand, while C. strictum is a late-flowering tetraploid with no overlap in flowering time with diploid species, C. striatiforme is an early-flowering tetraploid and phenologically overlaps at least partly with the diploids. Flowering time might represent a pre-zygotic isolation mechanism favouring C. striatiforme as a putative donor of the tetraploid genome. Likewise, ITS allowed for only restricted discrimination between species of the same ploidy, due to a low level of sequence variation, a lack of phylogenetic structure and ribotype sharing between the two tetraploid taxa. This pattern is further complicated by a strong effect of concerted evolution in C. album s. str.
Walsh et al. [37] used a low-copy nuclear marker to elucidate the origin of polyploids within the C. album aggregate. They found C. album s. str. to be composed of three sub-genomes, but offered no conclusion as to their donors. It should be noted, however, that of all the putative parental taxa proposed by Mandák et al. [7], only C. ficifolium was included in the study of Walsh et al. [37]. The tetraploids C. strictum and C. striatiforme and the diploid C. suecicum were not sampled. The combination of comprehensive sampling and more variable markers (e.g., microsatellites and low-copy nuclear genes of Štorchová et al. [36] or Walsh et al. [37]), may be suitable for identifying the putative parents of C. album s. str.
Polyphyletic origins of C. album s. str.
Recently, multiple origins, rather than a single one, have been confirmed for allopolyploid taxa from several genera such as Arabidopsis [16], Arabis [19], Asplenium [18], Cardamine [20] or Tragopogon [17,21]. High morphological variation has repeatedly been described in Chenopodium album s. str. [22,23,44]. Mandák et al. [7] proposed that this variation could be explained by co-existence of multiple lineages of different origin within C. album s. str. We identified two ribotypes in C. album s. str. (ribotypes 7 and 14), which were completely homogenized and matched the sequences of putative parental taxa, C. strictum and C. striatiforme. We therefore assume that individuals of C. album s. str. possessing these two ribotypes have originated from different tetraploid genotypes (or even different taxa) and thus represent divergent genetic lineages of independent origin. The rest of the ribotypes found in the hexaploid accessions represent intermediate stages of still ongoing concerted evolution or recombinant sequences and therefore could not be used to estimate the number of genotypes within C. album s. str. Altogether, three cpDNA haplotypes were found in C. album s. str. The most abundant haplotype (haplotype 1) was also found in tetraploid species. The other two haplotypes were unique to C. album s. str. However, they are rare and closely related to haplotype 1. This pattern may indicate that there are more maternal lineages of C. album s. str. than revealed by ITS, but that they were not found among the tetraploids analysed due to insufficiently detailed sampling. On the other hand, the two haplotypes specific to C. album s. str. may have originated only after the formation of the hexaploid. As regards the supposed polyphyletic origin of C. album s. str., we can conclude that two independently formed lineages could be readily confirmed and that most likely this number is strongly underestimated.

Ancient vs. recent hybridization
We found individuals with different degrees of ITS homogenization even within the same population. An extreme example is population 583, which besides individuals with partly or completely homogenized sequences comprised two non-homogenized accessions. The degree of sequence homogenization may be dependent on several factors. As reviewed in [75], besides the number and localization of rDNA loci, generation time and the time elapsed since the hybridization event may significantly affect the tempo of homogenization. Though there are numerous exceptions, in general it can be said that sequence homogenization should be more advanced in species with short generation times and in ancient hybrids [75]. All locations sampled in the Czech Republic for detailed analysis were in fact mixed populations of di-, tetra- and hexaploids. Such locations may represent ideal opportunities for natural hybridization. However, recent hybridization between taxa with different ploidy levels is considered to be rather rare [7,83]. Mandák et al. [7] failed to experimentally synthesize hexaploid C. album s. str., suggesting that reproductive barriers between taxa with different ploidy levels have already developed. However, hexaploid individuals with ITS sequences homogenized to different degrees may represent hybrids of different ages. Those with unhomogenized sequences may be of recent origin whereas those with completely homogenized ones may be ancient. On the other hand, the rates of gene conversion are very specific, and it may be very misleading to estimate the time frame of a hybrid's origin based on the patterns of sequence homogenization alone. Shortly after their formation, hybrids are known to exhibit additivity of their parental sequences [76,84,85], but one of the parental sequences may be eliminated from hybrids after only a few generations [85-87].
The most extreme example has been described for synthetic hybrids of Armeria [85], where one of the parental ribotypes was almost completely removed already in the F2 generation. By contrast, sequence homogenization in Nicotiana allopolyploids seems much slower, with complete elimination of one of the parental sequences taking hundreds of thousands to millions of years [88].

Conclusions
Our study confirms that hexaploid Chenopodium album s. str. is an allopolyploid formed by hybridization between diploid and tetraploid taxa. All hexaploid accessions possessed cpDNA sequences identical or closely related to those of tetraploid taxa, indicating that tetraploids served as the maternal and diploids as the paternal parent. However, none of the approaches used was able to identify the particular parental species for all hexaploid accessions, owing to the high similarity of nrDNA ITS, cpDNA and repetitive sequences (the last as revealed by GISH) between species of the same ploidy level.
In hexaploid C. album s. str., nrDNA ITS sequences with different degrees of intraindividual polymorphism were identified. Most accessions were completely or nearly completely homogenized towards the maternal ribotype. A homogeneous but apparently chimeric sequence, combining characters specific to diploids and tetraploids, was found in three individuals. Both parental sequences were found in only four accessions. Variation in the nrDNA ITS region showed no apparent geographic pattern. Two separate nrDNA lineages, matching the variation within tetraploids, indicate that C. album s. str. originated multiple times. Hexaploid individuals with different degrees of sequence homogenization may represent allopolyploid lineages of different ages.
Supporting Information
S1 Fig. Summary of variable sites in nrDNA ITS of Chenopodium album agg. The nucleotide characters of the nrDNA ITS ribotypes identified in the present study are indicated for each of the 49 variable sites. Sites where particular ribotypes showed intraindividual polymorphism are marked by IUPAC ambiguity codes (Y = C or T, M = C or A, W = A or T, K = T or G, R = A or G, S = C or G). For each ribotype, the level of intraindividual polymorphism (as the "number of polymorphic sites"), its presence in particular species of the C. album group (fic = C. ficifolium, sue = C. suecicum, stc = C. strictum, stf = C. striatiforme, alb = C. album s. str.) and its frequency in the analysed taxa are given. Blue columns indicate sites that discriminate between diploids and tetraploids or between individual species and that show character-state additivity in hexaploids. Grey columns indicate sites that are additive in hexaploids; this additivity could be inferred from patterns of intraspecific variation in at least some of the putative parental species. Green columns indicate sites that are variable or polymorphic in some of the putative parental species but not additive in hexaploids. White columns indicate sites that are polymorphic in hexaploids but invariable in diploids and tetraploids.
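The coding scheme of S1 Fig. can be illustrated with a minimal Python sketch. It is an illustration only and not the procedure used in this study: the function names (`bases`, `count_polymorphic_sites`, `is_additive`) and the example strings are hypothetical, and the additivity test assumes the simplest case, in which the hexaploid shows exactly the union of the diploid and tetraploid character states at a site where the two differ.

```python
# Two-base IUPAC ambiguity codes as listed in the S1 Fig. caption.
IUPAC = {
    "Y": {"C", "T"},
    "M": {"A", "C"},
    "W": {"A", "T"},
    "K": {"G", "T"},
    "R": {"A", "G"},
    "S": {"C", "G"},
}
PLAIN = {"A", "C", "G", "T"}


def bases(symbol):
    """Return the set of nucleotides that a single site symbol stands for."""
    symbol = symbol.upper()
    if symbol in PLAIN:
        return {symbol}
    if symbol in IUPAC:
        return set(IUPAC[symbol])
    raise ValueError(f"unsupported site symbol: {symbol!r}")


def count_polymorphic_sites(ribotype):
    """Count sites coded with an ambiguity code (intraindividual polymorphism)."""
    return sum(1 for s in ribotype if len(bases(s)) > 1)


def is_additive(diploid, tetraploid, hexaploid):
    """Assumed criterion: the parents differ at the site and the hexaploid
    shows exactly the union of both parental character states."""
    d, t = bases(diploid), bases(tetraploid)
    return d != t and bases(hexaploid) == (d | t)


# Hypothetical strings of variable sites, not data from this study.
example_ribotype = "ACYTRG"
n_polymorphic = count_polymorphic_sites(example_ribotype)
```

Under this assumed criterion, a site with C in the diploid, T in the tetraploid and Y in the hexaploid counts as additive, whereas the same site coded T in the hexaploid would be read as homogenized towards the tetraploid state.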