Identification of candidate flowering and sex genes in white Guinea yam (D. rotundata Poir.) by SuperSAGE transcriptome profiling

Dioecy (distinct male and female individuals) and scarce to non-flowering are common features of cultivated yam (Dioscorea spp.). However, the molecular mechanisms underlying flowering and sex determination in Dioscorea are largely unknown. We conducted SuperSAGE transcriptome profiling of male, female and monoecious individuals to identify flowering and sex-related genes in white Guinea yam (D. rotundata), generating 20,236 unique tags. Of these, 13,901 were represented by a minimum of 10 tags. A total 88 tags were significantly differentially expressed in male, female and monoecious plants, of which 18 corresponded to genes previously implicated in flower development and sex determination in multiple plant species. We validated the SuperSAGE data with quantitative real-time PCR (qRT-PCR)-based analysis of the expression of three candidate genes. We further investigated the flowering patterns of 1938 D. rotundata accessions representing diverse geographical origins over two consecutive years. Over 85% of accessions were either male or non-flowering, less than 15% were female, while monoecious plants were rare. Intensity of flowering varied between male and female plants, with the former flowering more abundantly than the latter. Candidate genes identified in this study can be targeted for further validation and to induce regular flowering in poor to non-flowering cultivars. Findings of the study provide important inputs for further studies aiming to overcome the challenge of flowering in yams and to improve efficiency of yam breeding.

Introduction White Guinea yam, Dioscorea rotundata Poir., is the most preferred and widely cultivated yam species in West-Africa [1]. Despite its considerable economic and socio-cultural importance, genetic improvement of cultivated yam remains difficult and slow due to its dioecy and poor to non-flowering nature [2]. Dioecy, the presence of distinct male and female individuals, is one of the major characteristics of the genus Dioscorea [3]. A major breeding challenge associated with dioecy is that synchronizing flowering time is difficult when making genetic crosses. In addition, many cultivars of D. rotundata rarely flower, while a significant proportion of those that do flower seldom set fertile seeds [4].
Mechanisms of sex determination in plants have been investigated in over 40 angiosperm species, which identified both heteromorphic and monomorphic sex chromosomes as well as XY and ZW sex-determination systems [5,6]. Due to the large number and small sizes of chromosomes, the identification of sex chromosomes at cytological level has remained difficult in Dioscorea [7]. In D. tokoro, a diploid wild species, and D. alata (water or winged yam), a heterogametic (XY: male) and homogametic (XX: female) sex has been proposed based on segregation patterns of AFLP markers tightly linked to sex and QTL analysis, respectively [5,6,8]. However, a more recent study has revealed that sex determination in the cultivated African species of D. rotundata follows the ZZ (male) and ZW (female) system [9]. In the same study, a candidate chromosomal region associated with sex was determined and a diagnostic marker for sex determination developed. However, gene(s) functionally responsible for sex determination are yet to be identified in yam species.
In angiosperms, flower development involves gene expression and pathway changes in vegetative meristems, leading to their conversion to flowering meristems in response to environmental cues and developmental signals [10]. Molecular genetic analyses in eudicots such as Arabidopsis thaliana and Antirrhinum majus have led to the identification of several floral organ-identity genes, which were originally grouped into three classes (class A, B, and C) based on the floral organ identity they specify [11,12]. The class A, B, and C genes are required for the development of sepals and petals, petals and stamens, and stamens and carpels, respectively. Class A and C genes are mutually antagonistic. As master regulators of floral organ identity, plant MADS-box transcription factors are central to the ABC model [13]. Further studies revealed that additional gene classes, class D and E, are important for ovule and organ development, respectively, leading to a modified ABCDE model for flower development [14].
A range of factors can regulate sex differentiation in angiosperms including genetic [15], epigenetic and environmental [16], as well as physiological regulation by phytohormones [17]. Understanding the molecular and genetic mechanisms of flowering is essential for efficient plant breeding. Recently, whole genome sequencing of D. rotundata has allowed the identification of a genomic region associated with sex [9]. This finding is expected to open opportunities for dissecting the molecular mechanisms regulating flowering and sex determination in yam. Identification of flowering and sex-related genes is critical to scaling up utilization of the available yam germplasm in breeding programs.
Various sequencing-based transcriptome profiling techniques are available for gene expression profiling, novel gene discovery, and genome annotation studies. These include Serial Analysis of Gene Expression (SAGE), which is based on unique short sequence tags 14-15 bp [18], LongSAGE which uses a different type IIS enzyme, MmeI, to generate 21-bp fragments from each transcript [19], Robust-LongSAGE (RL-SAGE) [20], Expressed Sequence Tag Analysis (EST) Analysis (Nielsen 2006) [20], Digital Gene Expression TAG (DGE-TAG), Deep-SAGE [21], and RNA-Seq [22]. The next-generation sequencing (NGS)-based and highthroughput SuperSAGE for tag-based gene expression profiling involves sequencing of longer fragments (26-bp) and simultaneous analysis of multiple samples by using indexing (barcoding) [23]. The longer tags generated by SuperSAGE compared to the relatively shorter tag reads obtained by other NGS-based techniques such as DGE-TAG significantly improve the accuracy of tag-to-gene annotation [23].
In this study, we applied SuperSAGE to analyze transcriptome changes in flowers at early developmental stages from D. rotundata accessions representing three flowering groups (male, female and monoecious) and identified genes that are differentially expressed among these flowering groups. We show that majority of these differentially expressed genes (DEGs) correspond to genes implicated in regulating flowering and sex determination in multiple species. Our findings suggest that known mechanisms underlying flowering and sex determination are also likely to be conserved in D. rotundata.

Morphological characterization
A total of 1938 D. rotundata accessions planted for routine field maintenance at IITA Genetic Resources Center in Ibadan (Nigeria) were characterized using 12 morphological traits for two consecutive seasons in 2010 and 2011.The flowering pattern of these accessions was also recorded over the same period. The 12 morphological traits were selected among the yam descriptors jointly developed by the former International Plant Genetic Resources Institute (current known by its new designation Bioversity International) and the International Institute of Tropical Agriculture [24]. Detailed description of the traits used is provided in Supplemental Table 1.

Sampling and RNA extraction for SuperSAGE
Tubers harvested from seven accessions including two male, two female, and three monoecious (Table 1) selected from the accessions used for morphological characterization based on their consistency of flowering over two years were planted in pots in a screen house. Samples for RNA extraction were collected from early stage flowers (Fig 1). Total RNA was extracted using the Qiagen RNeasy plant mini kit according to manufacturer's protocol (Qiagen, Venlo, the Netherlands). On-column DNAase treatment was performed to remove contaminating DNA.

SuperSAGE: Library preparation and sequencing
SuperScript II double-strand cDNA synthesis kit was used for cDNA synthesis with the biotinylated adapter-oligo (dT) primer (5'-bio-CTGATCTAGAGGTACCGGATCC-CAGCAGTT TTTTTTTTTTTTTTT-3'). Synthesized cDNA was purified using Qiagen PCR purification Kit. The library was prepared following the protocols of Matsumura et al. [23]. Briefly, the NlaIII anchoring enzyme was used for sample digestion, and the fragments (NlaIII-digested cDNA) were bound to streptavidin-coated magnetic beads (Dynabeads streptavidin M-270). Non-biotinylated cDNA fragments were removed by washing, and then adapter 2 was ligated to digested cDNA fragments bound to the magnetic beads. The type III restriction enzyme EcoP151 was used for digestion of adapter 2-cDNA after washing. Adapter2-26bp fragments were further ligated to adapter 1 (that are specific for each of the samples) as described Matsumura et al. [23] were used. The adapter2-tag-adapter1 ligates was amplified using Phusion High polymerase and GEX primers (5'-AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA-3' and 5'-CAAGCAGAAGACGGCATACGATCT-3'). The amplification program was 98˚C for 1min, 10 cycles at 98˚C for 35 sec, and 60˚C for 30 sec. The PCR product consisting of eight tubes per sample was pooled and concentrated using Qiagen MinElute reaction purification kit. The amplification product was run on an 8% non-denaturing polyacrylamide gel. After staining with SYBR green (Takkara Bio), the DNA band around 123-125-bp size was cut out and gel purified. The purified PCR product from each sample was analyzed for its quantity and quality on an Agilent Bioanalyzer 2. The PCR product was cloned using invitrogen:-zeroblunt Topo PCR cloning kit for sequencing and later transformation using one shot chemical transformation protocol. Colony PCR was performed using 2x colony PCR mixture and purified using QIAGEN PCR purification kit. Purified and mixed PCR products were applied to Illumina Genome Analyzer II for sequencing reactions following the manufacturers protocol.

Real-time quantitative reverse transcription PCR (qRT-PCR)
Total RNA was extracted from female and male flowers after flowering using the CTAB extraction protocol as described by Dellaporta [25] with slight modifications. Approximately 500 mg tissues were grinded using mortar and pestle in liquid nitrogen. A pre-heated 1000 uL of CTAB extraction buffer (2% CTAB, 2% PVP-40, 20 mM Tris-HCl pH 8.0, 1.4 M NaCl, 20 mM EDTA) was added to each sample and incubated at 65˚C for 15 minutes, vortexed every 5-minutes and centrifuged at 15000 rpm at 4˚C for 5 minutes. The aqueous top layer was transferred and purified with one volume of chloroform: isoamyl alcohol (24:1) and centrifuged for 10 minutes at 15000 rpm. The supernatant was mixed with 0.6 volume of cold isopropanol to precipitate the nucleic acid, mixed gently by inversion and centrifuged at 15000 rpm for 20 minutes. The precipitated pellet was washed with 100 uL of cold 70% ethanol and centrifuged for 5 minutes and air dried. RNA preparations were subjected to on-column DNase digestion (RNA clean and concentrator kit; ZymoResearch) and reverse-transcribed in the presence of random hexamers (LunaScript1RT SuperMix Kit, NEB Inc.) in 20 μl of total reaction volume containing 1 μg of total RNA and incubated at 55˚C for 15 minutes following manufacturer's instructions. Primer pairs used for qRT-PCR were designed for the three selected candidate genes (TK, GST and PIF3) with the support of the Integrated DNA Technologies' PrimerQuest software (S2 Table). Three different housekeeping genes; Adenine phosphoribosyl transferase (APT), Beta-Tubulin (Tub) and TIP41-like family protein (TIP41), were tested. Based on the results of melt and standard curves, Beta-Tubulin was chosen as the endogenous reference for quantification of targets genes. RT-PCR reaction volumes were set up to 12.5 μL containing Luna Universal qPCR Master Mix (New England Biolabs Inc.), 2 μl of a 1:10 dilution of cDNA reaction, and 400 nM each of corresponding forward and reverse primers. Five different biological replicates represented by total RNA extracted from three individual plants and reverse transcribed into cDNA by LunaScript1RT SuperMix Kit (New England Biolabs Inc.) were used for statistical analysis of the quantification. Each cDNA sample was amplified in duplicate on a single 96-well optical plate using LightCycler480 (Roche). The cycling profile consisted of 95˚C for 10 min followed by 40 cycles of 15 s at 95˚C and 60 s at 60˚C, as recommended by the manufacturer. Immediately after the final PCR cycle, a melting curve analysis was done to determine specificity of the reaction.

Data analysis
For analysis of data on phenotypic traits, multiple correspondence analyses (MCA) was performed using FactoMineR package [26] in R software [27] to detect the association between sex types and morphological variables.
For SuperSAGE data analysis, sequence reads obtained from the different libraries were first sorted into different flowering groups based on their respective index sequences. Then, the subsequent extraction of SuperSAGE tags from reads was conducted using a script written in Perl. The R package edgeR [28] was used to determine differentially expressed tags between pairs of flowering groups (male vs female, male vs monoecious and female vs monoecious). Tags that were expressed in at least one sample or flowering group with a minimum number of 10 were considered for the analysis. Additionally, since two replicate samples were sequenced from each flowering group, tags that were represented in both replicates were considered. The expression cutoff was 100 counts per million (CPM), corresponding to a read count of about 5 for each tag.
For annotation of SuperSAGE tags, the selected tags were first aligned to the draft D. rotundata scaffold sequence, followed by extraction of the 2000-bp upstream sequences. These sequences were fused as queries for BLAST search against the non-redundant (nr) database of the National Center for Biotechnology Information (NCBI) and the Universal Protein Resource (UniProt).
For qRT-PCR analysis, quantities of RNA accumulation levels were calculated as relative quantification (RQ) values using the comparative cycle threshold (Ct) (2-ΔΔCt) method [29]. Before quantitative analyses, validation experiments were carried out to confirm equal amplification efficiencies between the reference and target genes and the applicability of the comparative method. Assessing the relative amplification efficiencies was achieved by running standard curves for each amplicon (i.e., five serial log10 dilutions of starting cDNA were amplified and Ct values of target and reference genes were measured in triplicate and plotted against the log of the input cDNA amount). The efficiencies were considered comparable when falling within a range of 100 ± 10%, corresponding to a curve slope of -3.3 ± 0.33 [30], and all primers pairs that did not allow PCR performances matching these criteria were discarded.

Variation in flowering patterns and morphological traits in D. rotundata genebank accessions
The flowering patterns of 1938 D. rotundata accessions collected primarily from the main yam growing regions of West and Central Africa were assessed under field conditions at IITA over two consecutive growing seasons in 2010 and 2011. D. rotundata accessions are easily distinguished based on morphology of their flowers as female, male, or monoecious (unisexual female and male flowers on the same plant) (Fig 1). In 2010, majority of the accessions (996, 51.4%) failed to flower, while 745 (38.4%), 170 (8.8%), and 27 (1.4%) were male, female, and monoecious, respectively (Fig 2). In the 2011 growing season, 939 (48.5%) accessions were male, followed by 630 (32.5%) non-flowering, 287 (14.8%) female, and 82 (4.2%) monoecious accessions ( Fig 2B). Most accessions were consistent over the two seasons with respect to flowering, while some were not. About 326 of the male, 169 of the female and 43 of the monoecious accessions failed to flower at least in one of the seasons, hence the discrepancy observed between the two seasons with regard to the proportion of accessions with different sex groups (Fig 2). Overall, majority of D. rotundata accessions maintained at IITA were either male or non-flowering, while the female accessions represented less than 15% of the collection. Whereas, monoecious types are very rare in D. rotundata.
In addition to flowering and type of sex, we collected categorical data on 14 selected morphological traits over the same period (S1 Table). The data was subjected to multiple correspondence analysis (MCA), generating three major clusters that mainly reflected sex of the accessions (Fig 3). This suggested that certain morphological traits are associated with sex in D. rotundata with the two dimensions explaining about 38% of the total variations. Included in the first cluster were a group of non-flowering accessions that could be distinguished by traits such as purplish green stem, non-waxy stem, stem with non-barky patches, dark green leaf color, and hastate leaf shape. A second cluster composed entirely of male accessions was correlated with purplish green stem, presence of barky patches, non-waxy stem, dark green leaf, and sagittate leaf shape. The third group was composed of a mixture of male, female and monoecious flowering accessions that were identified by waxiness, absence of barky patches, either green, brownish green or purple stem color, and pale green or green leaf color as distinct traits.

Generation and analysis of SuperSAGE tags
SuperSAGE libraries representing early stage male, female, and monoecious flower were multiplexed and sequenced on a single lane of an Illumina Genome Analyzer IIx, generating a total of 8,332,523 tags following quality control (QC) based on sequence read length, adapter dimmer, and sequencing error rates (Table 1). After the tags were sorted into the different flowering groups using Adapter1 sequences and singleton tags were removed, 20,236 unique nonsingleton tags were obtained. Of these, 6,335 tags that were ten or less in number per library were removed, while the remaining 13,901 abundant tags were retained for further analysis (S3 Table). Of 13,901 abundant tags, 5985 (43%) were shared among the three flowering groups, whereas 1855 (13%), 1648 (12%) and 765 (5%) tags were specific to male, female, and monoecious flowers, respectively (Fig 4). The remaining tags were shared between male and female (2650 tags; 19.0%), male and monoecious (378 tags; 2.7%), and female and monoecious (620 tags; 4.5%) flower groups (Fig 4).

Identification of genes differentially expressed across different flowering groups of yam
The fold change of differential expression and gene abundance (count per million) of 13,901 unique tags was compared across the different flowering groups. A total of 100 tags were differentially expressed with p and false discovery rate (FDR) values of less than 0.01 (Fig 5; S4 Table). The number of differentially expressed genes-DEGs (or SuperSAGE tags) between male vs. female, male vs. monoecious and female vs. monoecious groups were 13, 67 and 20, respectively (S4 Table). Of the 13 tags differentially expressed between male and female groups, five were highly expressed in male while the remaining eight were expressed in the female sex type. Similarly, the male vs monoecious group comparison revealed that 25 tags were highly represented in male, and 42 were abundant in monoecious sex type. For the female vs monoecious flower group, 11 and nine tags were expressed highly in female and monoecious flowers, respectively. The tag abundance estimated by the average logCPM (counts per million) ranged from a minimum of 6.36 to 11.88 for all the DEGs (Fig 5 and S4 Table).

Annotation of SuperSAGE tags
Of the 100 differentially expressed tags, 88 tags were unique. These unique tags were aligned to the D. rotundata draft scaffold sequence to extract 2000-bp upstream sequences for use as a query for tag annotation by BLAST search against the National Center for Biotechnology Information (NCBI) and Universal Protein Resource (UniProt) databases. Of the 88-unique sequences, 87 could be aligned to the draft D. rotundata sequence, and 72 of these matched sequences available in the NCBI and UniProt databases. Sequences obtained for fifteen (17.24%) tags did not match sequences available in the two databases, while eight (9.09%) tags generated high e-values (>0.005). Of the 72 tags that could be annotated, 14 (19.5%) corresponded to proteins of unknown function, unnamed proteins, uncharacterized proteins or hypothetical proteins (S5 Table). However, a set of 18 (25.0%) tags representing 16 unique genes corresponded to genes that have been previously described as having a role in flower development and/or being expressed in flowers in multiple species (Table 2).
Among the 16 genes reported to have involvement in flowering and flower development, twelve were highly expressed in female flowers, while the remaining two had very low expression levels in the same flowers. They genes also had variable expression levels in male and monoecious flowers ( Table 2). Our results further indicated that 16, 15, and four genes were preferentially expressed in male, monoecious, and female flowers, respectively. However, only one gene was differentially expressed across all the flowering groups, whereas two, three and 14 genes were differentially expressed between female and monoecious, male and female, and male and monoecious flowers, respectively.
We conducted BLASTN search of the 88 deferentially expressed tags to D. rotundata scaffolds, eight tags returned multiple hits while 62 tags had a single hit each. The remaining 18 tags had no hit (S6 Table). The 62 tags with single hit were retained for further analysis. Further conversion of the scaffold position of the tags to recently published [9] pseudo chromosome as "TDr96_F1_Pseudo_Chromosome_v1.0.fasta" The tags were distributed across all chromosomes except chromosome_01, with number of tags per chromosome ranging from 2 to 11 on chromosome_05 (S6 Table). Moreover, the gene ID and annotations of the tags were confirmed using the published [9] gene model, "TDr96_F1_v1.0. gff3".

Validation of selected candidate genes by qRT-PCR
To independently validate our SuperSAGE results using a different technique, we selected three representative genes associated with flowering and included Transketolase (TK), Glutathione-S Transferase (GST) and Phytochrome Interacting Factor-3 (PIF3). Three different housekeeping genes previously used for qRT-PCR across different organs and developmental stages of Dioscorea opposita [31] including Adenine Phosphoribosyl Transferase (APT), Beta-Tubulin (Tub) and TIP41-like family protein (TIP41) were tested based on standard curve and melt curve analysis.
Each target gene was amplified in two different samples; female (F) and male (M) flowers and in five biological replicates. Beta-Tubulin was selected in the current experiments as the most suitable and stable housekeeping gene in D. rotundata. qRT-PCR amplification revealed higher expression of TK in F than in M (Fig 6A) whereas, both GST and PIF3 were highly expressed in M compared to their level of expression in F (Fig 6B and 6C). SuperSAGE transcriptome profiling for identification of flowering and sex-related genes

Flowering relationships with morphological variation
A number of previous studies have shown the existence of dioecious and monoecious flowering patterns in D. rotundata [32][33][34]. Here, we have also identified dioecious and monoecious, as well as non-flowering accessions in the D. rotundata germplasm maintained at IITA. The monoecious plants predominantly produced male flowers, a phenomenon previously described as trimonoecious [32]. Most of the inflorescences consisted only of male flowers, while a limited number of inflorescences contained few female flowers. Our observations over the two seasons demonstrate that majority of the D. rotundata accessions being maintained at IITA are either male or non-flowering types, with a significantly lower proportion of female accessions, while monoecious accessions are rare (Fig 2). Abundance of flowering appeared to vary according to sex, as male accessions flowered profusely compared to female ones. Moreover, higher numbers of male flowers develop on monoecious plants than female flowers. A similar observation from materials collected in Benin showed that female plants are rare and produce a limited number of flowers, whereas the male plants, whenever they flower, produce flowers in abundance [33]. Flowering pattern was largely inconsistent between the two seasons. This could be related to the effect of environmental factors as previously reported [32]. This supported by the differences in mean monthly temperature and rainfall we recorded at the IITA experimental site in 2010 and 2011, particularly during main flowering period between July and September (S7 Table).
Morphological characters that could be used to predict or distinguish the different flowering groups or sex types at the earliest growth stage possible are important particularly for breeders to select germplasm and cultivars in breeding experiments. Our study revealed that a set of selected morphological traits can be used to for predict sex in D. rotundata accessions (Fig 3; S1 Table). The correlation between flower type, morphological traits, and ploidy level previously reported in D. rotundata suggested that all triploid individuals are either male or non-flowering, and display some morphological features distinct from diploid individuals [35]. This morphological traits-based prediction of sex and ploidy level provides significant practical advantages for choosing of parents for crossing in breeding trials. However, we do recognize that the correlation identified needs to be validated with repeated experiments conducted under different environmental conditions to select those traits with the highest predictive value across multiple environments.

Identification of candidate genes associated with flowering in yams
In our study, a number of genes differentially expressed in D. rotundata flower types were identified (Table 2 and S4 Table). However, only a few of the DEGs corresponded to genes known to be associated with flowering in plants, including those that are known either for their expression in flowers, for organ or sex specific expression, for a role in the regulation of flowering time, photoperiod and the transition from vegetative to reproductive phase, or for flower color development ( Table 2). Examples of candidate genes with expression in flowers include the Pectinesterase inhibitor and AQPs/MIP gene in flowers of Arabidopsis thaliana [36,37] the Malonyltransferase gene in flowers of Salvia splendens [38], and the VPE gene in Citrus sinensis L. during flower development [39] and expression of Trichome birefringence-measured across female (F, solid) and male (M, pattern) sex types in D. rotundata. Error bars represent the standard error. Significance was determined by Student's t-test ( � P<0.05, �� P<0.01). Each value is the mean of five biological replicates. like 23 at mature pollen stage [40] and in petal, sepal, pedicel, stamen, pollen, and petal differentiation and expansion stages of Arabidopsis thaliana [41,42].
In tomato, the long chain fatty acid CoA synthetase gene that is highly expressed in anther and petals, specifically in the sites subject to epidermal fusion [33], has been shown to be important for flower development, as lack of this gene impairs fertility and floral morphology. Similarly, the GDSL esterase/lipase and APG genes have been suggested for its potential involvement in flowering [43]. Among genes with a role in flower color development, our SuperSAGE analysis identified a glutathione S-transferase-like gene, reported for flower color intensity in Dianthus caryophyllus L [44].
Additional genes identified include those involved in flowering time, photoperiod and senescence regulation, such as: (1) a gene encoding NAC domain containing protein important for the coordination of cold response and flowering time [45], (2) a gene encoding a Transferase family protein, which is implicated in regulation of flowering time via the flowering repressor FLOWERING LOCUS C in Arabidopsis thaliana [46], a Zinc finger family protein gene involved in regulating flowering time and abiotic stress tolerance in Chrysanthemum morifolium [47], and a LOX gene known to regulate cell death related to flower senescence and flower opening [48,49]. A dramatic increase in LOX gene in response to senescence was observed in Rosa hybrida cv. Kardinal [42], in addition the RPS4 gene is considered to play an important role in regulating flowering time, where a delay in flowering time was showed by silencing of the genes encoding RPS4 and Rhodanese in Glycine max [50]. The transcription factor PIF3 is considered to play an important role in the control of flowering through clock-independent regulation of CO and FT gene expression and is associated with early flowering in Arabidopsis thaliana [51]. Furthermore, genes known for organ specific expression were also identified including the DnaJ-like protein gene, detected predominantly in male flower of Salix bakko [52]. Likewise, transcriptome analysis identified male specific expression of the MYB-like gene Male Specific Expression 1 (MSE1) in the dioicous species of Asparagus officinalis [53]. Unlike D. rotundata, (ZW) sex determination in A. officinalis follows the XY system and MSE1 is tightly linked to the Y chromosome as well as specifically expressed in early stage of anther development.
Another gene identified by our SuperSAGE analysis was Transketolase (TK), whose overexpression was previously shown to result in a higher ratio of female flowers and increased yields in cucumber [54]. In D. rotundata, we found that TK is located on chromosome 11 (S6 Table). This corresponds to the same chromosomal region candidate where a candidate gene for sex determination was previously mapped by next-generation sequencing-based bulked segregant analysis (BSA) [9].
In this study, we have identified genes that are differentially expressed in D. rotundata male, female and monoecious flowers, including genes implicated for roles in flowering and flower development in multiple plant species. However, the mechanisms of flowering and sex determination in these species and the role these genes play remains largely unknown. Detailed functional characterization of the genes including analysis of allelic variants such as null and hypomorphic alleles, over and ectopic expression analyses etc can improve our knowledge of the underlying mechanisms. With the recent releases of a draft genome sequence of D. rotundata and a protocol for genetic transformation of the same species [9,55], such studies will provide powerful tools for dissection of the molecular networks controlling sex determination and flowering in yams and for genetic improvement of the crop.

Validation of selected candidate genes associated with flowering in yams
To validate the SuperSAGE data, we selected three of the DEGs and analyzed their expression levels in male and female flowers using qRT-PCR. Although samples for qRT-PCR analysis were obtained from plants independently grown from those used for SuperSAGE analysis, the results confirmed our SuperSAGE data for all three genes analyzed. Considering that one of the genes we identified and validated in this study, TK, is located within the same chromosomal region to which a candidate gene for sex determination in D. rotundata was previously mapped by NGS-based BSA [7], detailed functional characterization of these genes will improve how knowledge of how flowering and sex determination is regulated in yams.

Conclusions
The current study is the first to identify candidate genes associated with flowering and sex determination in yams using a genome-wide transcriptome profiling approach. Once validated with further experiments, the candidate sex genes identified in this study can be targeted for manipulation of flowering. Such genes provide important inputs for studies aiming to overcome the erratic to non-flowering nature of yam, thereby contributing to improving breeding efficiency in the crop. This has a great practical significance considering most D. rotundata accessions are either non-flowering or flowering is inconsistent from year to year. In addition to the need for further genetic and genomic studies on flowering and sex determination in yam, understanding the environmental and epigenetic factors controlling sex and flowering regulation is invaluable to underpin the improvement of breeding systems for yam.