Genome-Wide Identification and Characterization of R2R3MYB Family in Cucumis sativus

Background The R2R3MYB proteins comprise one of the largest families of transcription factors in plants. Although genome-wide analysis of this family has been carried out in some species, little is known about R2R3MYB genes in cucumber (Cucumis sativus L.). Principal Findings This study identified 55 R2R3MYB genes in the latest cucumber genome. The CsR2R3MYB family contained the smallest number of identified genes among the species studied so far, owing to the absence of recent gene duplication events. These results were also supported by genome distribution and gene duplication analysis. Phylogenetic analysis showed that the genes could be classified into 11 subgroups. The evolutionary relationships and the intron-exon organizations, which showed similarities with Arabidopsis, Vitis and Glycine R2R3MYB proteins, were also analyzed and suggested strong gene conservation but also the expansion of particular functional genes during the evolution of the plant species. In addition, we found that 8 out of 55 (∼14.54%) cucumber R2R3MYB genes underwent alternative splicing events, producing a variety of transcripts from a single gene, which illustrated the extremely high complexity of transcriptome regulation. Tissue-specific expression profiles showed that 50 cucumber R2R3MYB genes were expressed in at least one of the tissues and the other 5 genes showed very low expression in all tissues tested, which suggested that cucumber R2R3MYB genes took part in many cellular processes. The transcript abundance level analysis during abiotic conditions (NaCl, ABA and low temperature treatments) identified a group of R2R3MYB genes that responded to one or more treatments.
Conclusions This study has produced a comparative genomics analysis of the cucumber R2R3MYB gene family and has provided the first steps towards the selection of CsR2R3MYB genes for cloning and functional dissection that can be used in further studies to uncover their roles in cucumber growth and development.


Introduction
The MYB family of proteins is large, functionally diverse and represented in all eukaryotes. Most MYB proteins function as transcription factors, with the MYB binding domain conferring the ability to bind DNA [1]. The MYB gene family is divided into different types according to the number of repeat(s) in the MYB domain: 4RMYB has four repeats, 3RMYB (R1R2R3MYB) has three consecutive repeats, R2R3MYB has two repeats, and the MYB-related type usually, but not always, has a single repeat [1,2,3,4]. However, most plant MYB genes encode R2R3MYB class proteins, which contain two repeats [1,5]. Each of these MYB repeats contains three helices, with the second and third helices forming a helix-turn-helix structure when bound to DNA [6]. Moreover, R2R3MYB proteins are characterized by the presence of a conserved MYB domain and a highly variable C-terminal region. The C-terminal region is responsible for establishing protein-protein interactions with other components [7,8].
Extensive studies of the R2R3MYB gene family in various plant species have provided a better understanding of this gene family. However, little is known about this gene family in cucumber (Cucumis sativus L.). To date, none of the R2R3MYB genes have been reported in cucumber. Cucumber is not only one of the most important vegetables worldwide, but is also a model system for studies on sex determination and plant vascular biology [38]. Furthermore, its growth and production are severely affected by some abiotic stresses, such as high salinity [39,40], drought [41] and low temperature [42,43]. Therefore, the identification and functional study of cucumber stress response and tolerance genes may elucidate the molecular mechanisms behind plant stress responses and tolerance and could ultimately lead to improvements in stress tolerance. A draft of the Cucumis sativus genome sequence was reported recently [44], so R2R3MYB genes can now be identified and described genome-wide. In the present study, the genome sequence was searched to identify the CsR2R3MYB genes, to predict protein domain architectures and to assess the extent of conservation and divergence in the cucumber R2R3MYB family. A phylogenetic tree combining Arabidopsis, Vitis, Oryza, Populus and Glycine R2R3MYB proteins was constructed so that their evolutionary relationships and the putative functions of cucumber R2R3MYB proteins could be examined based on Arabidopsis R2R3MYB proteins with known functions. Alternative splicing (AS) analysis indicated that 8 out of 55 (∼14.54%) cucumber R2R3MYB genes underwent AS events, producing a variety of transcripts from a single gene. Tissue-specific analysis was performed and abiotic condition response expression profiles were generated so that genes that could potentially participate in the stress signal transduction pathway in cucumber could be identified.
This extended analysis is the first comprehensive study of the R2R3MYB gene family in cucumber and provides valuable information for further exploration into the functions of this significant gene family in cucumber. In addition, these results provide information about the relationship between evolution and functional divergence in the R2R3MYB family.

Identification and Sequence Conservation of Cucumber R2R3MYB Genes
One hundred and twenty-six Arabidopsis R2R3MYB proteins and the consensus protein sequences of the MYB-binding domain, Hidden Markov Model (HMM) profile (PF00249), were employed as a query to search against the cucumber genome database (http://cucumber.genomics.org.cn/page/cucumber/index.jsp) using the BlastP program. A total of 71 genes in the cucumber genome were identified as possible members of the CsR2R3MYB family. To confirm putative R2R3MYB genes in the cucumber genome, the amino acid sequences of all 71 proteins were searched for the presence of the R2R3 domain by Pfam and SMART. Following an extensive search for R2R3MYB genes, 55 typical R2R3MYB genes (named CsMYB0 to CsMYB54) were confirmed from the original data. These 55 cucumber R2R3MYB genes were subjected to further analyses (Table 1).
To gain insight into the cucumber R2R3MYB binding domains, sequence logos were produced to examine how well conserved the R2 and R3 repeats were in the R2R3MYB proteins within each residue position. As shown in Fig. 1, fifteen and five conserved amino acid residues were identical among the members detected in the R2 and R3 MYB repeat regions, respectively. Within the 55 cucumber R2R3MYB proteins, all the R2 repeat sequences contained three tryptophan residues. However, in the R3 repeats, the first tryptophan residue was replaced by phenylalanine. The second and third tryptophan residues were conserved in all the members. These results were consistent with those from Arabidopsis [5], Populus [10] and Triticum [4].
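The notion of residues that are "identical among the members" can be illustrated with a short sketch. The sequences below are hypothetical toy fragments, not actual CsR2R3MYB repeats, and the function name `identical_columns` is an assumption introduced only for illustration; it reports the alignment columns in which every sequence carries the same residue.

```python
from typing import List


def identical_columns(aligned: List[str]) -> List[int]:
    """Return 0-based column indices where all aligned sequences share one residue.

    Gap-only agreement ('-') is not counted as a conserved residue.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same aligned length")
    first = aligned[0]
    return [
        i
        for i in range(length)
        if first[i] != "-" and all(seq[i] == first[i] for seq in aligned)
    ]


# Hypothetical toy alignment (NOT real cucumber sequences).
toy_alignment = [
    "WTAEEDKKL",
    "WSPEEDELL",
    "WTKEEDQRL",
]
conserved = identical_columns(toy_alignment)
print(conserved)
```

A sequence logo conveys more than this all-or-nothing view, since it also shows how strongly each partially conserved position is biased toward particular residues.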

Phylogenetic Analysis of the Cucumber R2R3MYB Family
The phylogenetic relationship between the CsR2R3MYB proteins was examined by multiple sequence alignment of their MYB binding domain with bootstrap analysis (1,000 replicates). The 55 members of the CsR2R3MYB family were subdivided into 11 subgroups, designated S1 to S11, according to clades with at least 50% bootstrap support. Nineteen gene pairs were formed with strong bootstrap support. Comparison of the two phylogenetic trees, built from the cucumber R2R3MYB domains and from the complete protein sequences, respectively, showed similar subgroups, although the classification of a few members varied (Fig. 2; Fig. S1). This indicated that the conserved R2R3MYB domain was an important unit in CsR2R3MYB protein and the dramatic divergence of the C-terminal regions did not appear to have a large influence on the regulatory function of the corresponding proteins [8].
To obtain information about the evolutionary relationship of the CsR2R3MYB genes, an unrooted NJ phylogenetic tree using bootstrap analysis (1,000 replicates) was built from alignments of the R2R3MYB complete protein sequences from 55 CsR2R3MYB, 126 AtR2R3MYB, 117 VvR2R3MYB, 102 OsR2R3MYB, 197 PtR2R3MYB and 244 GmR2R3MYB genes (Fig. 3; Fig. S2). The phylogeny was very similar to a previously published phylogeny that included all known Arabidopsis, Vitis, Oryza, Populus and Glycine R2R3MYB proteins [1,7,10,45]. The resulting tree generated 90 subgroups (triangles), which were designated with a subgroup number (C1-C90). However, 48 proteins did not fit well into any subgroups (lines) (Fig. 3; Fig. S2). The 48 proteins were considered orphans, most likely representing highly diverged lineage-specific R2R3MYB protein sequences.
Figure 3. Phylogenetic relationships and subgroup designations in R2R3MYB proteins from cucumber (Cs), Arabidopsis (At), grape (Vv), rice (Os), poplar (Pt) and soybean (Gm). The neighbor-joining tree includes 55 R2R3MYB proteins from cucumber, 126 from Arabidopsis, 117 from Vitis, 102 from rice, 197 from poplar and 244 from soybean. Bootstrap values lower than 50 are not shown in the phylogenetic tree. The proteins are clustered into 90 subgroups (triangles), designated with a subgroup number (e.g. C1). Forty-eight proteins did not fit well into subgroups (lines). The membership of each subgroup is described in the table at right. Several subgroups are highlighted. 12 subgroups (yellow) are shared by all 6 species. 5 subgroups (red) are shared among the other 5 species but not with cucumber. The uncompressed tree with full taxa names is available as Figure S2. doi:10.1371/journal.pone.0047576.g003
Seventy  15,34,51,61) were shared among Arabidopsis, grape, rice, poplar and soybean but not in cucumber, which suggested that these R2R3MYB proteins may have specialized roles that were acquired or expanded in Arabidopsis, grape, rice, poplar and soybean after divergence from the last common ancestor with cucumber. Meanwhile, some species-specific subgroups were also observed, indicating that R2R3MYB genes may have evolved or been lost in a single species following divergence. For example, 10 subgroups (C8, 18 ) only soybean members, which indicated that these genes may have special functions in Arabidopsis, grape, rice, poplar and soybean, respectively. Interestingly, C66 did not include any Arabidopsis R2R3MYB proteins but only members from cucumber, grape, poplar and soybean. This suggested that the genes in C66 may have been lost in Arabidopsis during the evolutionary process. A similar reason could also explain why three subgroups (C5, C41 and C65) were absent from the rice genome but present in cucumber, Arabidopsis, grape, poplar and soybean.
Some cucumber R2R3MYB proteins were clustered into Arabidopsis functional clades (Fig. S2), which provided an excellent reference to explore the functions of the cucumber R2R3MYB genes. For example, CsMYB21 grouped together with Arabidopsis AtMYB21 and AtMYB24 into clade 41, a clade associated with the control of anther development [46,47]. CsMYB36 and CsMYB49 were clustered into clade 79 and shared a high level of sequence similarity with the male gamete cell formation protein AtMYB125 (DUO1). This implied that the possible functions of CsMYB36 and CsMYB49 were related to male gamete cell division and differentiation [22]. CsMYB6 and CsMYB26 were grouped into clade 7 with two Arabidopsis proteins: AtMYB16 (MIXTA), proposed to control the shape of petal epidermal cells [48], and AtMYB106 (NOK), a negative regulator of trichome branching [49]. This represented a functional clade containing proteins responsible for cell development or morphogenesis.
Remarkably, CsMYB0, 8, 16, 27, 48, 50, 51, 53 and 54 did not fit well into any of the clades, which indicated that these 9 proteins might have specialized roles in cucumber or were acquired after divergence from the last common ancestor with the other five species.

Intron-exon Structure of the Cucumber R2R3MYB Family
According to the results of intron-exon structure identification (Fig. 2), within the 55 CsR2R3MYB genes, the number of exons ranged from one to five and 44 out of 55 had more than one exon. As shown in Fig. 4A, exons 1 and 2 appeared to be more restricted in length, while exon 3 was more variable (31-850 bp). The presence of a fourth and fifth exon was exclusive to some specific genes. Despite this variability, the modal lengths of the first two exons were very similar (exon 1, 133 bp; exon 2, 130 bp) and highly conserved (exon 1, 32.7% occurrence; exon 2, 52.7% occurrence). Although exon 3 was the most diverse in size, R2R3MYB families from cucumber, Arabidopsis [7], grape [7] and soybean [45] were similarly distributed when the first three exon lengths were considered (Fig. 4B).
When the CsR2R3MYB gene structures were analyzed further, the number of introns contained in their R2 and R3 domains was determined. All 55 genes, according to relative positions and phases, could be arranged into 11 different splicing patterns (A-K) (Fig. 5). Patterns A to C, composed of one or two intron(s) distributed at two highly conserved specific positions (indicated by white inverted triangles), accounted for approximately 67% of CsR2R3MYB genes. Patterns F-I had introns at varying positions in the R2 or R3 domain and were observed in only 11% of the 55 genes. Approximately 22% of these 55 genes (patterns J and K) had no introns at the MYB binding domain. It was noteworthy that two genes (CsMYB36 and CsMYB49) from pattern J had one intron between the R2 and R3 domain and were classified into the same subgroup shown in Fig. 2.
Intron phases with respect to codons were investigated. An intron was designated as occurring in one of three phases. In phase 1, splicing occurred after the first nucleotide of the codon; in phase 2, splicing occurred after the second nucleotide and in phase 0, splicing occurred after the third nucleotide of the codon [50,51]. In contrast, the phases that contained five introns (7, 8, 9, 10 and 11), which were located in the R3 domain, were more variable. This suggested that the splicing phase was highly conserved during the evolution of CsR2R3MYB genes. Such conserved splicing patterns and phases were also observed in the MYB gene families of Arabidopsis [52], rice [52] and soybean [45].
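The phase definition above amounts to taking the number of coding nucleotides upstream of each intron modulo 3. The following is a minimal sketch under stated assumptions: the function name `intron_phases` is hypothetical, the exon lengths are assumed to count coding nucleotides only (starting at the first base of the start codon), and the example gene is made up, borrowing the modal lengths of exons 1 and 2 reported in the text plus an arbitrary third exon.

```python
from itertools import accumulate
from typing import List


def intron_phases(coding_exon_lengths: List[int]) -> List[int]:
    """Return the phase of each intron in a gene.

    Phase 0: the intron falls after the third nucleotide of a codon.
    Phase 1: after the first nucleotide. Phase 2: after the second.
    A gene with n coding exons has n - 1 introns.
    """
    # Coding nucleotides that precede each intron (the last exon has no intron after it).
    upstream = accumulate(coding_exon_lengths[:-1])
    return [n % 3 for n in upstream]


# Hypothetical three-exon gene: 133 bp and 130 bp are the modal exon 1 and exon 2
# lengths quoted in the text; 400 bp is an arbitrary illustrative value.
example_gene = [133, 130, 400]
phases = intron_phases(example_gene)
print(phases)
```

Because only the cumulative coding length matters, two genes can share intron positions within the MYB domain yet differ in phase if an upstream exon differs in length by a non-multiple of three.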

Genome Distribution and Gene Duplication of Cucumber R2R3MYB Genes
To determine the genomic distribution of the CsR2R3MYB genes, the DNA sequence of each CsR2R3MYB gene was used to search the cucumber genome database using BLASTN. A total of 52 R2R3MYB genes could be mapped on chromosomes 1 to 7 (Table 1; Fig. 6). Three genes (Csa015272, Csa022057 and Csa024079) could not be conclusively mapped on any chromosome. Although each of the seven cucumber chromosomes contained some CsR2R3MYB genes, the distribution seemed to be uneven (Fig. 6). The largest numbers of R2R3MYB genes were found on chromosomes 2 and 5 (ten genes each), followed by chromosome 3 (eight genes). Seven genes were distributed on each of chromosomes 1 and 6. Only five genes were located on each of chromosomes 4 and 7. This analysis revealed that cucumber R2R3MYB genes were found in all chromosomes. Relatively high densities of CsR2R3MYB genes were found in some chromosomal
In this study, gene duplication events, including tandem and segmental duplications, were investigated with the purpose of elucidating the mechanism behind the expansion of the CsR2R3MYB gene family that is thought to have occurred during the evolutionary process [53,54,55]. Huang et al. [44] reported that the recent whole-genome duplication event was absent in the cucumber genome. However, a few tandem duplications have been shown to exist in cucumber. The phylogenetic analysis indicated that there were no tandemly duplicated genes in the CsR2R3MYB family because no cucumber paralogs could be detected, which pointed to the absence of a recent tandem duplication event in the CsR2R3MYB family. The method utilized by Schauser et al. [56] was used to detect whether or not segmental duplication events had occurred in the CsR2R3MYB family; no CsR2R3MYB genes could be attributed to segmental duplication. Similar results were also found in the cucumber WRKY [57], MADS [58], LOX [59] and ERF [60] families.
According to Holub's [61] description, a chromosome region containing two or more genes within 200 kb can be defined as a gene cluster. Analysis of the positions of the 55 CsR2R3MYB genes in the cucumber genome did not reveal strong clustering on particular chromosomes (Table 1; Fig. 6). The exceptions were CsMYB7 and CsMYB8, located within 17 kb of each other on chromosome 7, and CsMYB13 and CsMYB15, located within 171 kb of each other on chromosome 1. Since none of the neighboring genes were duplicated, the two clusters likely arose from a local rather than a whole genome duplication event.
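The 200 kb cluster criterion can be sketched as a simple scan over sorted gene positions. This is an illustration only, not the procedure used in the study: the function name `find_clusters`, the gene names and all coordinates are hypothetical, and chaining consecutive neighbours (so that a cluster may span more than 200 kb in total) is a simplifying assumption.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

CLUSTER_WINDOW_BP = 200_000  # threshold quoted in the text from Holub [61]


def find_clusters(genes: List[Tuple[str, str, int]]) -> List[List[str]]:
    """Group genes on the same chromosome whose neighbours lie within the window.

    `genes` holds (name, chromosome, position_bp) tuples. Only groups of two or
    more genes are returned, ordered by chromosome and then by position.
    """
    by_chrom: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for name, chrom, pos in genes:
        by_chrom[chrom].append((pos, name))

    clusters: List[List[str]] = []
    for chrom in sorted(by_chrom):
        members = sorted(by_chrom[chrom])
        current = [members[0]]
        for pos, name in members[1:]:
            if pos - current[-1][0] <= CLUSTER_WINDOW_BP:
                current.append((pos, name))
            else:
                if len(current) >= 2:
                    clusters.append([n for _, n in current])
                current = [(pos, name)]
        if len(current) >= 2:
            clusters.append([n for _, n in current])
    return clusters


# Hypothetical coordinates (NOT taken from the cucumber genome annotation).
toy_genes = [
    ("geneA", "chr1", 1_000_000),
    ("geneB", "chr1", 1_171_000),  # 171 kb from geneA
    ("geneC", "chr1", 5_000_000),
    ("geneD", "chr7", 300_000),
    ("geneE", "chr7", 317_000),    # 17 kb from geneD
    ("geneF", "chr3", 2_000_000),
]
toy_clusters = find_clusters(toy_genes)
print(toy_clusters)
```

The toy spacings of 171 kb and 17 kb mirror the distances reported for the two exceptions in the text, which is why both pairs fall under the 200 kb threshold.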

Alternative Splicing (AS) Analysis
Alternative splicing (AS) is the mechanism by which a common precursor mRNA produces different mRNA variants, by extending, shortening, skipping, or including exon sequences, or retaining intron sequences [45]. The combinatorial joining of exons by AS is an elegant mechanism that most eukaryotes use to generate several distinct proteins from a single transcript [62]. In this paper, PCR amplification was conducted to screen for possible AS in all 55 CsR2R3MYB genes, and several distinctively spliced transcripts were successfully obtained.
As shown in Fig. 7, eight of the 55 R2R3MYB genes in cucumber contained two to five alternative structures, indicating that they had undergone AS, producing a variety of transcripts from a single gene. Two distinctively spliced transcripts were found for CsMYB30, 31 and 47, three for CsMYB5 and 43, four for CsMYB36 and 49, and five for CsMYB19. In general, these AS events resulted in a variety of sequence insertions and/or deletions in the corresponding ORFs. For instance, a 21 bp AS site in the R2 repeat of CsMYB19-2 allowed the lengthening of 16 and 7 amino acids, respectively. However, 15 bp and 57 bp AS sites in CsMYB30 and 31 resulted in deletions of 5 and 19 amino acids in the R3 repeat, respectively. Interestingly, we observed that some of the AS events changed the type of R2R3MYB protein. For example, a 189 bp AS site of CsMYB5 resulted in a frame shift, which changed the R2R3MYB (CsMYB5-1) into a single-repeat MYB type (CsMYB5-2). Similarly, CsMYB19-3, CsMYB30-2, CsMYB43-2, CsMYB47-2, CsMYB49-2, -3 and -4 were also confirmed as single-repeat MYB genes. In contrast, although AS resulted in an insertion of 21 bp in the R2 repeat of CsMYB19-2 and a deletion of 57 bp in the R3 repeat of CsMYB31-2, they were still typical R2R3MYB genes. Remarkably, some alternative types of splicing resulted in a long deletion at the 5′ terminus, for example, CsMYB5-3, CsMYB19-4, -5 and CsMYB36-2, -3, -4. However, these transcripts were unlikely to code for a protein, for two reasons: the seven upstream ORFs existing in the long leader region (at least 515 bp) of these transcripts would strongly repress translation of the downstream ORF [62,63,64,65,66]; and it has been shown that transcripts with long, AUG-burdened leader sequences were incapable of supporting protein synthesis [67,68,69]. More interestingly, all ORFs encode proteins that differ only in the MYB domains at the 5′ terminus.
Li et al. [62] reported that AtMYB59 and AtMYB48 underwent similar AS events; moreover, the conserved AS pattern was also found in two rice homologous genes (Os11g47460 and Os12g37970). As shown in Fig. 3 and Fig. S2, CsMYB43 and CsMYB47 were the two cucumber homologs of AtMYB59, AtMYB48, Os11g47460 and Os12g37970. The results in Fig. 7 demonstrated that these two cucumber homologous genes undergo AS similar to that of AtMYB59, AtMYB48, Os11g47460 and Os12g37970.
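The relationship between the length of an AS-derived insertion or deletion and its effect on the reading frame can be sketched as follows. The function name `indel_effect` is an assumption introduced for illustration, and the check is based on length alone: it ignores sequence content, such as stop codons introduced by the event, so it is not a substitute for inspecting the actual transcripts.

```python
def indel_effect(length_bp: int) -> str:
    """Classify an insertion or deletion by length only.

    A length divisible by 3 keeps the downstream reading frame and changes the
    protein by length_bp / 3 residues; any other length shifts the frame.
    """
    if length_bp <= 0:
        raise ValueError("length must be a positive number of base pairs")
    codons, remainder = divmod(length_bp, 3)
    if remainder == 0:
        return f"in-frame, {codons} amino acids"
    return "frameshift"


# Lengths quoted in the text for CsMYB19-2 (21 bp), CsMYB30 (15 bp) and CsMYB31 (57 bp).
for site_bp in (21, 15, 57):
    print(site_bp, "bp ->", indel_effect(site_bp))
```

The 15 bp and 57 bp sites correspond to the 5 and 19 amino acid deletions described above, and a 21 bp site corresponds to 7 amino acids.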

Expression Profiles for Cucumber R2R3MYB Genes in Different Tissues and Under Different Abiotic Conditions
Semi-quantitative and real-time quantitative RT-PCR were both used to detect the expression patterns for all cucumber R2R3MYB genes in the roots, stems, leaves, male flowers, fruits and tendrils, and under three treatments. The expression profiles of the 55 cucumber R2R3MYB genes showed different patterns of tissue-specific expression (Fig. 8; Fig. S3). Nineteen genes (CsMYB 0-2, 5, 6, 10, 11, 23-25, 28, 29, 34, 35, 36, 37, 41, 43 and 49) (34.5%) were expressed in all tissues tested, although the transcript abundance of some genes in particular tissues was very low.
Figure 8. Tissue-specific expression profiles of 50 cucumber R2R3MYB genes. Relative transcript abundances of CsR2R3MYB genes were examined by qRT-PCR. The Y axis is the scale of the relative transcript abundance level. The X axis is the tissues of cucumber. Total RNA was isolated from roots (R), stems (S), leaves (L), male flowers (MF), fruits (F) and tendrils (T), respectively. The cucumber β-actin gene (GenBank AB010922) was used as an internal control. Five genes (CsMYB9, CsMYB14, CsMYB33, CsMYB38 and CsMYB45) showed very low expression in the above tissues, so the qRT-PCR results of these five genes are not displayed. The PCR primers were designed to avoid the conserved region and to amplify products of 150 to 300 bp. Primer sequences are shown in detail in Table S2. doi:10.1371/journal.pone.0047576.g008
Figure 9. Expression patterns of the 12 cucumber R2R3MYB genes under NaCl (100 mM) treatment. Relative transcript abundances of CsR2R3MYB genes were examined by qRT-PCR. The Y axis is the scale of the relative transcript abundance level. The X axis is the time course of NaCl treatment. The cucumber β-actin gene (GenBank AB010922) was used as an internal control. The PCR primers were designed to avoid the conserved region and to amplify products of 150 to 300 bp. Primer sequences are shown in detail in Table S2. doi:10.1371/journal.pone.0047576.g009
Five genes (CsMYB9, CsMYB14, CsMYB33, CsMYB38 and CsMYB45) showed very low transcript abundances in the above tissues when tested using both semi-quantitative and real-time quantitative RT-PCR; they may be pseudogenes, may be expressed only at specific developmental stages or under special conditions, or may have higher transcript abundance in other tissues, e.g., seeds. The rest of the genes showed spatial variations in transcript abundance, with high levels of transcript abundance in one or some tissues and low transcript abundance in others. For example, CsMYB5, CsMYB7, CsMYB16 and CsMYB26 showed high levels of transcript abundance in stems, leaves, male flowers, fruits and tendrils but low levels in the roots. The transcript abundances of CsMYB15, CsMYB22, CsMYB43 and CsMYB47 were higher in the roots than in any other tissue. Only two genes, CsMYB18 and CsMYB21, showed tissue-specific expression and were only detected in male flowers. These results indicate that the cucumber R2R3MYB genes are involved in various aspects of physiological and developmental processes.
Figure 10. Expression patterns of the 14 cucumber R2R3MYB genes under ABA (100 mM) treatment. Relative transcript abundances of CsR2R3MYB genes were examined by qRT-PCR. The Y axis is the scale of the relative transcript abundance level. The X axis is the time course of ABA treatment. The cucumber β-actin gene (GenBank AB010922) was used as an internal control. The PCR primers were designed to avoid the conserved region and to amplify products of 150 to 300 bp. Primer sequences are shown in detail in Table S2. doi:10.1371/journal.pone.0047576.g010
Figure 11. Expression patterns of the 9 cucumber R2R3MYB genes under low temperature (4°C) treatment. Relative transcript abundances of CsR2R3MYB genes were examined by qRT-PCR. The Y axis is the scale of the relative transcript abundance level. The X axis is the time course of low temperature treatment. The cucumber β-actin gene (GenBank AB010922) was used as an internal control. The PCR primers were designed to avoid the conserved region and to amplify products of 150 to 300 bp. Primer sequences are shown in detail in Table S2. doi:10.1371/journal.pone.0047576.g011
Mounting evidence suggests that R2R3MYB transcription factors play important roles in the response to abiotic stresses [1]. In this study, the transcript abundances of the CsR2R3MYB genes at the three-true-leaf stage were investigated under NaCl (100 mM), low temperature (4°C) and ABA (100 mM) treatments. Leaves were harvested after being treated for 0, 1, 3, 5 and 10 h. The results indicated that 27 (∼49.1%) genes responded to at least one treatment: 12 genes responded to NaCl treatment, 14 genes to ABA and 9 genes to low temperature, which suggested that these CsR2R3MYB genes were involved in responses to high salinity, ABA signaling and low temperature, respectively (Fig. 9, 10, 11 and S4). Among these genes, 4 genes (CsMYB16, 29, 35 and 53) responded to two treatments and 2 genes (CsMYB0 and 2) to all three treatments. The remaining 21 genes responded to only a single treatment. The expression of 10, 10 and 7 genes was induced by NaCl, ABA and low temperature treatment, respectively, whereas 2, 4 and 2 genes were repressed, respectively (Fig. 9, 10, 11 and S4; Table S3). Interestingly, some genes responded in opposite directions when subjected to different treatments. For example, CsMYB16 was induced by high salinity but was repressed by ABA, and CsMYB0 and 53 were induced by ABA but were repressed by low temperature.
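The counts reported in this paragraph can be cross-checked against one another with a few lines of arithmetic. All numbers below are taken from the text; the variable names are assumptions introduced for illustration.

```python
TOTAL_GENES = 55

# Genes responding to each treatment, split into (induced, repressed), as reported.
induced_repressed = {
    "NaCl": (10, 2),
    "ABA": (10, 4),
    "low temperature": (7, 2),
}
responding_per_treatment = {t: up + down for t, (up, down) in induced_repressed.items()}

# Number of genes that responded to exactly 1, 2 or 3 treatments, as reported.
genes_by_treatment_count = {1: 21, 2: 4, 3: 2}

responsive_genes = sum(genes_by_treatment_count.values())
# Each gene-treatment response counted once, from both directions.
responses_from_treatments = sum(responding_per_treatment.values())
responses_from_genes = sum(k * n for k, n in genes_by_treatment_count.items())
percent_responsive = round(100 * responsive_genes / TOTAL_GENES, 1)

print(responding_per_treatment)
print(responsive_genes, responses_from_treatments, responses_from_genes, percent_responsive)
```

Both ways of counting give the same number of gene-treatment responses, so the per-treatment totals (12, 14 and 9) agree with the breakdown into genes responding to one, two or three treatments.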

Characterization of the Cucumber R2R3MYB Family
R2R3MYBs are widely distributed in higher plants and comprise one of the largest known families of regulatory proteins [8,10]. However, no related information has been reported in cucumber. This study identified and characterized 55 cucumber R2R3MYB genes through genome-wide analysis. Compared to Arabidopsis (126) [5,7], Vitis (117) [7,10], rice (102) [9], poplar (197) [10] and soybean (244) [45], the size of the R2R3MYB family was small in cucumber, which suggested that the R2R3MYB gene family in Arabidopsis, Vitis, Oryza, Populus and Glycine had expanded compared to cucumber.
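The family sizes quoted above can be compared directly. The sketch below uses only the counts given in the text; the variable names are assumptions, and the ratios are a simple illustration of relative family size rather than an analysis performed in the study.

```python
# R2R3MYB family sizes as quoted in the text.
family_sizes = {
    "cucumber": 55,
    "Arabidopsis": 126,
    "grape": 117,
    "rice": 102,
    "poplar": 197,
    "soybean": 244,
}

total_proteins = sum(family_sizes.values())
smallest_family = min(family_sizes, key=family_sizes.get)
# Size of each family relative to the cucumber family.
fold_vs_cucumber = {
    species: round(count / family_sizes["cucumber"], 2)
    for species, count in family_sizes.items()
}

print(total_proteins, smallest_family)
print(fold_vs_cucumber)
```

The total matches the 841 R2R3MYB proteins used in the six-species phylogenetic analysis.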
Complete and accurate annotation of genes is an essential starting point for further evolutionary and functional studies of a gene family. A total of 55 CsR2R3MYB genes were identified among the 26,682 annotated genes in the cucumber genome. Moreover, the draft genome sequence of Cucumis sativus var. sativus L. was assembled using a novel combination of traditional Sanger and next-generation Illumina GA sequencing technologies to obtain 72.2-fold genome coverage, and the high coverage of the cucumber genome by this assembly was also confirmed using the available EST, fosmid and BAC sequences [44]. Therefore, the small size of the CsR2R3MYB family was not the result of inadequate depth of genome coverage.
Many angiosperms underwent whole genome duplication events (γ, β, α). The recent gene duplication events were the most important for the rapid expansion and evolution of gene families [54,57]. Arabidopsis (as well as rice and poplar) underwent the recent duplication events, which led to the large-scale expansion of the R2R3MYB family in their genomes [54,70]. However, Huang et al. [44] reported that the cucumber genome did not undergo the recent whole-genome duplication events and tandem duplications. The method utilized by Schauser et al. [56] was used to detect whether recent small duplication blocks occurred in the CsR2R3MYB family. This study found no CsR2R3MYB gene locus on any recent duplication blocks. In addition, phylogenetic analysis revealed that the cucumber R2R3MYB family contained nineteen sister pairs. However, none of these pairs were genetically linked to each other on their corresponding chromosomal locations, which indicated the absence of a recent tandem duplication event in CsR2R3MYB genes. Furthermore, the cucumber genome contained the smallest average gene family size (1.71) compared to Arabidopsis, poplar, rice and grape [44]. This may explain, in part, the small number of genes found in cucumber.

Phylogenetic Analysis and Evolution of Cucumber R2R3MYB Genes
Phylogenetic analyses of the R2R3MYB proteins have been conducted extensively in Arabidopsis [1,5,7], grape [7], poplar [10], rice [9] and soybean [45], and the evolutionary relationship of this gene family within and among the different species has been systematically studied. To obtain an overall picture of the 55 cucumber R2R3MYB proteins and their relationships with those of Arabidopsis, grape, rice, poplar and soybean, phylogenetic trees combining cucumber, Arabidopsis, grape, rice, poplar and soybean R2R3MYB proteins were constructed, which divided the 841 R2R3MYB proteins into 90 clades and the 55 CsR2R3MYB members into 20 clades. There are anatomical and physiological differences between cucumber, Arabidopsis, grape, rice, poplar and soybean. In addition, gene loss and lineage-specific expansions were likely to be accounted for by genomic drift [71], so it is possible that some clades expanded differently in the cucumber, Arabidopsis, grape, rice, poplar and soybean R2R3MYB families.
Seventy clades did not include any cucumber R2R3MYB, which suggested that these clades were either lost in cucumber or were acquired after divergence from the last common ancestor. For example, the C59 subgroup genes are known to be involved in epidermis cell-fate determination in Arabidopsis. In cucumber, no C59 subgroup genes were observed, which indicated possible gene loss and/or lineage-specific expansions that may reflect species-specific adaptations [71]. A possible reason is that multi-cellular trichomes in cucumber (as well as Solanum lycopersicum) develop through a transcriptional regulatory network that differs from those regulating unicellular trichome formation in Arabidopsis (and perhaps cotton) [72,73]. The AtMYB75, 90, 113 and 114 genes in subgroup C52 play a role in the regulation of anthocyanin biosynthesis [74,75]. At least five, six and seven C52 subfamily members have been identified in Vitis [7], Populus [10] and soybean [45], respectively. It would be interesting to characterize the possible mechanism underlying the absence of anthocyanin-related R2R3MYB genes in the cucumber genome. One possible reason for the apparent absence of epidermis cell-fate determination and anthocyanin-related R2R3MYB genes in cucumber is that the related genes were not identified in this study, so new CsR2R3MYB genes may be identified in the future as annotations improve.
Clade 66 did not include any Arabidopsis R2R3MYB but only members from cucumber, grape, rice, poplar and soybean, which implied that these proteins might have specialized roles that were either acquired or expanded in the cucumber, grape, rice, poplar and soybean lineages. Similar reasons could explain why none of the rice R2R3MYB members were grouped within clades 5, 41 and 65.
As shown in Fig. 3, several cucumber R2R3MYB proteins were clustered into some Arabidopsis functional clades, which provided valuable information on the functions of cucumber R2R3MYB genes. Remarkably, none of the cucumber proteins were grouped within the Arabidopsis 'glucosinolate biosynthesis' clade (C19). A previous study indicated that this clade was derived from a β-type duplication event [76] after Arabidopsis diverged from monocots but before diverging from brassicas [77,78], which may explain its absence in cucumber, wheat [4], grape [7], rice [7,9], poplar [10] and soybean [45].
In addition, CsMYB0, 8, 16, 27, 48, 50, 51, 53 and 54 did not fit well into any of the clades, suggesting a gene acquisition mechanism from the most recent common ancestor with the other five species during evolution. Our expression analysis revealed that cucumber R2R3MYBs had a variety of expression patterns in different tissues. Therefore, we believe that these genes may regulate essential biological processes during cucumber development.
Usually, the pattern of intron positioning can provide important evidence for evolutionary relationships. Previous studies demonstrated that the intron-exon structure was conserved within the same subgroup, but differed between subgroups in the MYB gene family in Arabidopsis, rice [52] and soybean [45]. Unexpectedly, among the 11 subgroups, R2R3MYB genes in six subgroups (1, 3, 4, 5, 8 and 11) did not always show similar intron-exon structures. In addition, intron-exon structure was not conserved even within the same sister pair (CsMYB17 and 35; CsMYB0 and 51; CsMYB25 and 37; CsMYB20 and 53; and CsMYB11 and 33). Furthermore, the intron phases were not conserved within the six subgroups and five sister pairs either. These results, combined with the gene duplication analysis, suggested that these five gene pairs, as well as the other 14 pairs, were not duplicated genes and confirmed the absence of recent whole-genome duplication events and tandem duplications in the CsR2R3MYB family.
As previously observed in Arabidopsis, grape [7] and soybean [45] R2R3MYB genes, the modal lengths of the first two exons were very similar (exon 1, 133 bp; exon 2, 130 bp) and highly conserved. The exon lengths of the CsR2R3MYB family were also investigated, and the lengths of the first two exons were very similar to those in Arabidopsis, grape and soybean, which suggested that the MYB binding domain could be partially conserved because the exons coding for this domain have all evolved with restricted lengths.

Alternative Splicing (AS) Analysis
AS of pre-mRNAs is one of the most complex cellular processes in eukaryotes and accounts for a large proportion of proteomic complexity [62,79,80]. It allows the production of many gene products with enriched functions from a single coding sequence. However, only a small number of AS events have been reported in plants. To date, up to 18 (∼14.29%) R2R3MYB genes in Arabidopsis have been reported to undergo AS events [45]. In the present study, AS of R2R3MYB genes was detected in cucumber. We found that 8 of 55 (∼14.54%) R2R3MYB genes in cucumber contained two to five alternative structures, which indicated that they had undergone AS, thus producing a variety of transcripts from a single gene (Fig. 7).
Some of the AS events changed the R2R3MYB into a single-repeat MYB type; for example, CsMYB5-2, CsMYB19-3, CsMYB30-2, CsMYB43-2, CsMYB47-2, and CsMYB49-2, -3 and -4 were confirmed as single-repeat MYB genes. As all ORFs encode proteins that differ only in the MYB domains, CsMYB5, 19, 30, 31, 36, 43, 47 and 49 will be able to encode MYB proteins with one or two MYB repeats, which are known to bind DNA. Therefore, these types of MYB proteins may have binding affinities for different target genes. We also observed that six homologous genes (AtMYB59, AtMYB48, Os11g47460, Os12g37970, CsMYB43 and CsMYB47) in C73 [62] underwent similar AS events. This AS pattern, which may have arisen before the divergence of monocots and dicots, was conserved in this subgroup of genes during evolution [62]. A previous study on the MYB genes of Arabidopsis and rice indicated that the intron-exon structure was conserved among subgroups [52]. Similar results were also observed in the cucumber R2R3MYB family. These results demonstrated that, besides the conserved intron-exon structure, the AS pattern may also be conserved in some subgroups of MYB genes in both monocotyledonous (rice) and dicotyledonous (Arabidopsis and cucumber) plants, although its biological significance is not yet known [62].
Some alternative types of splicing resulted in a long deletion at the 5′ terminus, for example, CsMYB5-3, CsMYB19-4, -5 and CsMYB36-2, -3, -4. Since these seven transcripts were unlikely to encode a protein, the biological relevance of this type of transcript remains to be determined.

Expression Analysis of Cucumber R2R3MYB Genes in Response to Abiotic Conditions
Numerous R2R3MYB proteins have been characterized by genetic analysis and have been found to be involved in responses to various abiotic stresses [1,4]. However, no R2R3MYB family genes have been shown to respond to abiotic conditions in cucumber. For this reason, the expression patterns of cucumber R2R3MYB genes were investigated under NaCl (100 mM), low temperature (4°C) and ABA (100 μM) treatments. The results demonstrated that 27 genes responded to at least one treatment, of which 6 genes responded to multiple treatments. Additionally, some genes, such as CsMYB0, CsMYB16 and CsMYB53, showed opposing expression patterns under different stress conditions, which suggested that they may play a major role in the plant response to abiotic conditions and may be involved in communication between different signal transduction pathways.
A total of 126 Arabidopsis R2R3MYB proteins were used as query sequences in BlastP searches against the predicted cucumber proteins. In addition, the Hidden Markov Model (HMM) profile for the MYB binding domain (PF00249) from the Pfam database (http://pfam.janelia.org) was also used as a query to identify all MYB-containing sequences in cucumber by searching the MYB binding domain sequence against the cucumber genome database using the BlastP program. To further verify the reliability of these candidate sequences, the Pfam database (http://pfam.sanger.ac.uk/search) and SMART (http://smart.embl-heidelberg.de/) [81] were used to confirm each candidate CsR2R3MYB protein as a member of the R2R3MYB family.
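As a minimal sketch of the candidate selection logic described above, the following Python snippet merges the hits from the two searches and keeps only proteins for which domain confirmation reports two MYB repeats. The protein identifiers, the repeat counts and the function name `select_r2r3_candidates` are hypothetical, and the exactly-two-repeats criterion is an assumption based on the definition of the R2R3 type rather than a filter documented in this study.

```python
def select_r2r3_candidates(blast_hits, hmm_hits, myb_repeat_counts):
    """Return sorted IDs found by either search that carry exactly two MYB repeats."""
    # Union of both search strategies, so a protein missed by one is still considered.
    candidates = set(blast_hits) | set(hmm_hits)
    # Assumption: a confirmed R2R3MYB protein has exactly two MYB repeats.
    return sorted(pid for pid in candidates if myb_repeat_counts.get(pid, 0) == 2)


# Hypothetical protein IDs and repeat counts, for illustration only.
blast_hits = ["protA", "protB", "protC"]
hmm_hits = ["protB", "protC", "protD"]
myb_repeat_counts = {"protA": 2, "protB": 2, "protC": 1, "protD": 3}

selected = select_r2r3_candidates(blast_hits, hmm_hits, myb_repeat_counts)
print(selected)
```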
To analyze the features of the MYB domain of cucumber R2R3MYB proteins, the sequences of the R2 and R3 MYB repeats of the 55 CsR2R3MYB proteins were aligned with ClustalX 1.81 and adjusted manually. The sequence logos for the R2 and R3 MYB repeats were obtained by submitting the multiple sequence alignments to the WebLogo website (http://weblogo.berkeley.edu/logo.cgi) [82].
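The quantity displayed by a sequence logo can be illustrated with a short Python sketch that computes the per-column information content of an alignment. The toy alignment and the function name `column_information` are assumptions for illustration; the sketch ignores gaps and omits any small-sample correction that a logo tool may apply, so it is not a reimplementation of the web service used here.

```python
import math
from collections import Counter


def column_information(alignment, alphabet_size=20):
    """Information content (bits) of each alignment column, ignoring gaps."""
    max_bits = math.log2(alphabet_size)  # maximum for a 20-letter protein alphabet
    info = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            info.append(0.0)  # all-gap column carries no information
            continue
        counts = Counter(residues)
        total = len(residues)
        # Shannon entropy of the observed residue frequencies.
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        info.append(max_bits - entropy)
    return info


# Toy alignment (hypothetical sequences, equal length).
toy = ["WTAE", "WSKE", "WTPE", "W-RE"]
bits = column_information(toy)
print([round(b, 3) for b in bits])
```

Fully conserved columns, such as the tryptophan in the first position of the toy alignment, reach the maximum of log2(20) bits, whereas variable columns score lower.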

Phylogenetic Analysis
Multiple sequence alignments were performed using ClustalX 1.81 with default parameters, and the alignments were then adjusted manually before the phylogenetic trees were constructed. Phylogenetic trees were constructed separately from the aligned R2R3MYB binding domains and from the full predicted protein sequences of the 55 CsR2R3MYB genes using MEGA 4 [83]. The neighbor-joining (NJ) method was used with the following parameters: Poisson correction, pairwise deletion, and bootstrap (1,000 replicates; random seed). The complete amino acid sequences of 841 R2R3MYB proteins, including 126 AtR2R3MYB, 117 VvR2R3MYB, 102 OsR2R3MYB, 197 PtR2R3MYB, 244 GmR2R3MYB and 55 CsR2R3MYB, were used to construct an NJ tree using MEGA 4 [83]. Classification of the CsR2R3MYB genes was then performed according to their phylogenetic relationships with the corresponding Arabidopsis, grape, rice, poplar and soybean R2R3MYB genes.
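The distance model named above can be illustrated with a small Python sketch of the Poisson-corrected distance, d = −ln(1 − p), where p is the proportion of differing sites, computed with pairwise deletion (sites gapped in either sequence of a pair are skipped for that pair only). The two sequences and the function name `poisson_distance` are hypothetical, and the sketch is not the code of the phylogenetics package used in this study.

```python
import math


def poisson_distance(seq_a, seq_b, gap="-"):
    """Poisson-corrected amino acid distance with pairwise deletion of gapped sites."""
    # Pairwise deletion: drop only the sites that are gapped in this particular pair.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)  # proportion of differing sites
    if p >= 1.0:
        raise ValueError("sequences are saturated; distance undefined")
    return -math.log(1.0 - p)


# Hypothetical aligned fragments: 9 comparable sites, 2 differences.
d = poisson_distance("KGAWTEEED-", "KGPWTKEEDL")
print(round(d, 4))
```

A matrix of such pairwise distances is the input from which a neighbor-joining tree is built.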

Intron-exon Structure Analysis
The DNA and cDNA sequences corresponding to each predicted gene from the cucumber genome and annotation database CuGI were downloaded, and then the intron distribution pattern and splicing phase were analyzed using the web-based bioinformatics tool GSDS (http://gsds.cbi.pku.edu.cn/) [84].
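The splicing phase of an intron follows from the length of the coding sequence that precedes it: phase 0 introns lie between codons, phase 1 introns after the first base of a codon, and phase 2 introns after the second base. A minimal Python sketch is given below; the coding exon lengths and the function name `intron_phases` are illustrative assumptions and are not taken from the cucumber annotation.

```python
def intron_phases(cds_exon_lengths):
    """Phase of each intron, given the coding lengths of consecutive exons."""
    phases = []
    cumulative = 0
    # The last exon is not followed by an intron, so it is excluded.
    for length in cds_exon_lengths[:-1]:
        cumulative += length
        phases.append(cumulative % 3)  # bases of the interrupted codon already read
    return phases


# Hypothetical three-exon gene (coding lengths in bp).
phases = intron_phases([133, 130, 500])
print(phases)
```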

Genome Distribution and Gene Duplication Analysis
Genes were mapped on chromosomes by identifying their chromosomal positions provided in the Cucumber Genome Database. The distribution of CsR2R3MYB family members throughout the cucumber genome was drawn manually. To detect segmental duplication events, the method of Schauser et al. [56] was used. Tandemly duplicated genes were identified using the method provided by He et al. [85] and Hu and Liu [58]. DNAMAN 5.2.2 software was used to analyze the similarity of the CsR2R3MYB homologs in the phylogenetic tree.

Alternative Splicing and Expression Analysis
Cucumber (Cucumis sativus L. cv. 'Daqingba') seeds were germinated on moist filter paper in an incubator at 28°C for 1 day. The germinated seeds were sown into a soil mixture in the greenhouse at Shandong Agricultural University. After 10 days, batches of ten seedlings were transferred to a plastic tank filled with an aerated nutrient solution (pH 6.0-6.5) containing: Ca (NO 3 [86]. The experiment was carried out in an illuminated incubator, and the air temperature (25°C during the day and 18°C during the night) and light intensity (400 μmol m⁻² s⁻¹) regimes were maintained throughout each treatment. When the cucumber seedlings were at the three-true-leaf stage, three treatments were applied separately: 100 mM NaCl, 100 μM ABA and 4°C. Leaves for RNA extraction were harvested at 0, 1, 3, 5 and 10 h after the start of each treatment. The roots, stems, leaves, male flowers, fruits and tendrils of mature plants were collected separately and used for tissue-specific expression analysis.
Total RNA was prepared from different tissues with an RNAprep pure Plant Kit (TIANGEN, China), according to the manufacturer's instructions. First-strand cDNA was synthesized using 1 μg of total RNA and the PrimeScript 1st Strand cDNA Synthesis Kit (TaKaRa, Japan).
For alternative splicing analysis, one pair of specific primers was designed for each gene (Table S1) to amplify fragments of the 55 CsR2R3MYB genes by RT-PCR with TransStart™ FastPfu DNA polymerase (TransGen, China). The amplified DNA fragments were purified using the TIANgel Midi Purification Kit (TIANGEN, China) and cloned with the CloneJET™ PCR Cloning Kit (Fermentas, China). Three independent clones for each of the different insert lengths were sequenced for sequence confirmation. Gene structures of the differently spliced transcripts were analyzed using GSDS (http://gsds.cbi.pku.edu.cn/) [84]. The ORFs of the cloned transcripts were predicted using ORF Finder software (http://www.ncbi.nlm.nih.gov/gorf/gorf.html).
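The idea behind ORF prediction for a cloned transcript can be illustrated with a minimal Python sketch that scans the three forward reading frames for the longest stretch running from an ATG to the first in-frame stop codon. The example sequence and the function name `longest_orf` are hypothetical, and this simplified scan is only an assumption-laden illustration; it is not the algorithm of the ORF Finder tool used in this study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Longest ATG-to-stop ORF in the three forward frames (stop codon included)."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        # Walk the frame codon by codon; incomplete trailing codons are skipped.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # look for a further ORF downstream
    return best


# Hypothetical transcript fragment with a short ORF in the third frame.
orf = longest_orf("ccATGGCTTGGACCTAAgg")
print(orf)
```

A transcript for which such a scan returns only very short ORFs would be a candidate for the class of splice variants that is unlikely to encode a protein.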
To analyze the expression patterns of CsR2R3MYB genes, semiquantitative RT-PCR was performed. The β-actin gene (GenBank AB010922) was used as an internal control. The PCR primers were designed to avoid the conserved region and to amplify products of 150 to 300 bp. Primer sequences are shown in detail in Table S2. Quantitative real-time PCR was carried out using the RealMasterMix (SYBR Green) kit (TIANGEN, China), and PCR amplification was quantified according to the manufacturer's protocol. Amplification was performed on an iCycler iQ™ multicolor real-time PCR detection system (Bio-Rad, Hercules, USA) and the analysis of each type of sample was repeated four times. The analysis of relative mRNA expression data was performed using the 2^−ΔΔCt method [87]. Each expression profile was independently verified in 3 replicate experiments performed under identical conditions.
… Table 1. Total RNA was isolated from roots (R), stems (S), leaves (L), male flowers (MF), fruits (F) and tendrils (T). The cucumber β-actin gene (GenBank AB010922) was used to adjust cDNA concentrations. The PCR primers were designed to avoid the conserved region and to amplify products of 150 to 300 bp. Primer sequences were shown in detail in Table S2. (TIF)
Figure S4 Expression patterns of cucumber abiotic-responsive R2R3MYB genes under different treatment conditions. A: NaCl (100 mM); B: ABA (100 μM); C: Low temperature (4°C). Cs represented CsR2R3MYB assigned in Table 1. The cucumber β-actin gene (GenBank AB010922) was used as an internal control. The PCR primers were designed to avoid the conserved region and to amplify products of 150 to 300 bp. Primer sequences were shown in detail in Table S2. (TIF)
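The relative quantification used for the real-time PCR data in the expression analysis can be sketched in Python as follows. The Ct values and the function name `relative_expression` are hypothetical; the sketch assumes that the reference gene is the internal control and that the untreated (0 h) sample serves as the calibrator, which is an assumption rather than a detail stated in this section.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^(-ddCt) method."""
    # dCt: target normalized to the reference gene within each sample.
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    # ddCt: treated sample relative to the calibrator sample.
    delta_delta = delta_treated - delta_control
    return 2.0 ** -delta_delta


# Hypothetical Ct values: the target amplifies two cycles earlier after treatment.
fold = relative_expression(24.0, 18.0, 26.0, 18.0)
print(fold)
```

A value above 1 indicates higher transcript abundance in the treated sample than in the calibrator, and a value below 1 indicates lower abundance.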

Supporting Information
Table S1 Specific primers for the 55 CsR2R3MYB genes used in the alternative splicing pattern analysis in this study. (DOCX)

Author Contributions
Conceived and designed the experiments: QL LW ZR. Performed the experiments: QL CZ JL. Analyzed the data: QL. Contributed reagents/materials/analysis tools: QL CZ LW. Wrote the paper: QL. Revised the manuscript: LW ZR.