Characterization of the Rosa roxburghii Tratt transcriptome and analysis of MYB genes

Rosa roxburghii Tratt belongs to the Rosaceae family, and its fruit is flavorful, economically valuable, and highly nutritious, providing health benefits. MYB proteins play key roles in R. roxburghii fruit development and quality. However, the available genomic and transcriptomic information is extremely limited. Here, a normalized cDNA library was constructed from five tissues (stem, leaf, flower, young fruit, and mature fruit), with three replicates each, and sequenced on the Illumina HiSeq 2500 platform. A total of 470.66 million clean reads were obtained and assembled de novo. In total, 63,727 unigenes, with an average GC content of 42.08%, were identified, and 59,358 were annotated. In addition, 9,354 unigenes were assigned to Gene Ontology categories, and 20,202 unigenes were assigned to 25 Eukaryotic Ortholog Groups. Additionally, 19,507 unigenes were classified into 140 pathways of the Kyoto Encyclopedia of Genes and Genomes database. Using the transcriptome, 18 candidate MYB genes that were significantly expressed in mature fruit, compared with other tissues, were obtained. Among them, 10 R2R3 MYB and 1 R1 MYB were identified. The expression levels of 12 MYB genes randomly selected for qRT-PCR analysis were consistent with the RNA-seq results. A total of 37,545 microsatellites were detected, with an average EST-SSR frequency of 0.59 (37,545/63,727). These transcriptome data will be valuable for identifying genes of interest and studying their expression and evolution.

[...] roxburghii MYB genes.

[...] helix-turn-helix domain that interacts with the major groove of specific DNA sequences [9]. MYB superfamily members can be classified into several subfamilies, including R1 [...]

To better understand the profiles of different tissues in R. roxburghii, leaf, stem, flower, young fruit, and mature fruit were collected. In this study, the Illumina platform was used to construct a cDNA library from 15 mixed tissue samples to obtain transcriptome information. MYBs that are significantly expressed in mature fruit were identified.

Clean reads were selected from the raw data by filtering out adaptor-only reads, reads with more than 5% unknown bases (N), and low-quality reads (reads in which more than 50% of bases had a Q-value ≤ 10). The Trinity program was used for assembly. To [...]

[...] designed (S1 Table) with Primer Premier 6. Total RNAs were isolated from leaf, stem, flower, young fruit (YF), and mature fruit (MF) using TRIzol, followed by purification with an RNA purification kit (Takara, Japan). Real-time RT-PCR was performed on a Roche LightCycler 480 machine using SYBR Green I, with β-actin as an endogenous control.
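The three read-filtering criteria stated above (adaptor-only reads, more than 5% N bases, more than 50% of bases with Q ≤ 10) can be illustrated with a minimal sketch. The function names, the `adaptor_only` flag, and the Phred+33 quality encoding are assumptions made for illustration; the text does not say which tool performed the filtering or how adaptor-only reads were detected.

```python
def phred33(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding (an assumption)."""
    return [ord(ch) - 33 for ch in quality_string]


def keep_read(seq, quals, adaptor_only=False,
              max_n_frac=0.05, low_q=10, max_low_q_frac=0.50):
    """Return True if a read passes the three filters described in the text.

    adaptor_only is a hypothetical flag supplied by an upstream adaptor check;
    how that check is done is not specified in the source.
    """
    if adaptor_only or not seq:
        return False
    # Discard reads with more than 5% unknown (N) bases.
    if seq.upper().count("N") / len(seq) > max_n_frac:
        return False
    # Discard reads in which more than 50% of bases have Q <= 10.
    low_frac = sum(q <= low_q for q in quals) / len(quals)
    return low_frac <= max_low_q_frac
```

For example, `keep_read("ACGT" * 5, phred33("I" * 20))` keeps a 20-base read whose bases all have Q = 40, whereas a read with four N bases out of 20 is discarded.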

Amplification was performed at 95°C for 2 min, followed by 40 cycles of 95°C for 15 s, annealing at 58°C for 30 s, and extension at 72°C for 30 s. The expression levels relative to the control were estimated by calculating ΔΔCt and subsequently analyzed using the 2^(−ΔΔCt) method.

[...] Rosa roxburghii. The numbers of core repeat motifs in mono-nucleotide, di-nucleotide, tri-nucleotide, tetra-nucleotide, penta-nucleotide, and hexa-nucleotide SSRs were counted.
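The relative-quantification step used for the qRT-PCR data can be written out as a short worked sketch. The function and argument names are hypothetical, the Ct values in the usage note are invented for illustration only, and the base of 2 assumes roughly 100% amplification efficiency; the text does not state which sample served as the calibrator.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Fold change of a target gene by the 2^(-ddCt) method.

    ct_ref_* are Ct values of the endogenous control (beta-actin in the text);
    the calibrator is the sample that expression is reported relative to.
    """
    # dCt: normalize the target gene to the reference gene in each sample.
    dct_sample = ct_target_sample - ct_ref_sample
    dct_calibrator = ct_target_calibrator - ct_ref_calibrator
    # ddCt: difference between the sample and the calibrator.
    ddct = dct_sample - dct_calibrator
    return 2.0 ** (-ddct)
```

With illustrative values of Ct = 22 (target) and 18 (reference) in the sample, and 25 and 18 in the calibrator, ΔΔCt = (22 − 18) − (25 − 18) = −3, so the relative expression is 2^3 = 8.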

Illumina sequencing and sequence assembly

To identify more genes, RNA-seq of different tissues, leaf, stem, flower, young fruit and [...] (Table 1). Of the 63,727 unigenes, 78.03% (49,727) were longer than 600 bp and 56.07% (35,732) were longer than 1 kb (Fig 2). In addition, most unigenes (60,901; 95.57%) were shorter than 5,200 bp (S3 Table and Table 1).

[...] Table). Swiss-Prot and Nr contained the most homologous unigenes, at 55,118 and 55,151, respectively. In total, 3,284 unigenes were annotated in all databases (Fig 3A). Comparative approaches can effectively reveal differences and similarities among species. Nucleic acid sequences from Rosaceae species, including strawberry, apple, and cherry, were aligned using a BLAST algorithm-based search of the UniProt database. We found that strawberry was the closest model species, followed by cherry and then apple, with the numbers of [...]

There are three GO categories: biological process, cellular component, and molecular function (S5 Table). The "biological process" category consisted of 20 functional groups, of which metabolic process (56.56%) and cellular process (54.02%) were the major groups. For the "cellular component" category, 16 groups were predicted; cell, cell part, and organelle were the three major groups. For "molecular function", binding (49.02%) and catalytic activity (46.01%) were the dominant groups, followed by structural molecule activity (14.89%) (Fig 4).

[...] 8,757 contained more than one SSR (23.32%; S11 Table). In total, 5,275 (14.05%) SSRs were present in compound formation. SSR types from mono-nucleotide to hexa-nucleotide were abundant in the transcriptome.

The most abundant motif was A or T (18,855, 50.22%), followed by AG or CT (8,055, 21.45%) (Table 2). Among SSRs with tri-, tetra-, and penta-nucleotides, the most [...] Table 2). The repeat numbers of the SSR types were analyzed and ranged from 5 to 121.
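Counting perfect SSRs by motif length and repeat number can be sketched with regular expressions. This is an illustration under stated assumptions, not the authors' pipeline: the SSR-detection tool is not named in the surviving text, and the minimum repeat thresholds in `MIN_REPEATS` are assumed values. Compound SSRs are not handled.

```python
import re

# Assumed minimum repeat counts per motif length (not given in the source).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least min_rep - 1 copies of itself.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(2)
            # Skip motifs that are repeats of a shorter unit, e.g. "AA" or "AGAG".
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(1)) // unit_len))
    return sorted(hits)
```

Applied to a toy sequence containing a run of 12 A bases and seven AG repeats, the function reports one mono-nucleotide and one di-nucleotide SSR, with repeat numbers 12 and 7. Tallying the third element of each tuple across all unigenes would give a repeat-number distribution of the kind summarized in the text.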

The largest class of SSRs was repeated more than 15 times (7,136, 19.01%), while those repeated 10 times accounted for 17.10% (6,422) (Table 3). Excluding mononucleotides, the repeat numbers for most SSRs ranged from 5 to 12 (9,612, 75.9%), with only a small percentage repeated more than 15 times (1,177, 9.3%).

In the present study, a deep RNA-seq analysis was conducted on five tissues, and a total of 469.5 million reads were generated. In total, 63,727 unigenes were obtained using Trinity [...]