Comparative transcriptome analysis of translucent flesh disorder in mangosteen (Garcinia mangostana L.) fruits in response to different water regimes

Translucent flash disorder (TFD) is one of the important physiological disorders in mangosteen (Garcinia mangostana L.). TFD has symptoms such as flesh arils that become firm and appear transparent similar to watercore in apple or pear. Information on the changes of gene expression in TFD-affected tissues remain limited, and investigations into the effects of different water regimes still need to be undertaken. Through an RNA sequencing approach using the Ion Proton, 183,274 contigs with length ranging from 173–13,035 bp were constructed by de novo assembly. Functional annotation was analyzed using various public databases such as non-redundant protein NCBI, SwissProt, and Gene Ontology, and KEGG pathway. Our studies compared different water regimes to incidence and differentially expressed genes of TFD-like physiological disorders. From the differentially expressed gene (DEG) between normal air and TFD-affected aril, we identified DEG-related TFD events, which 6228 DEGs in the control condition and 3327 DEGs in under water stress treatment condition remained, and confirmed these with RT-qPCR, including sucrose synthase (SUSY), endoglucanase (GUN), xyloglucan endotransglucosylase/hydrolase (XTH), and polygalacturonase (PG) showed statistically significant. In addition, transcription factors also indicated changes in MYB, NAC and WRKY between tissues and different water regimes.


Introduction
Plant physiological disorders occur during plant development; these are not visible and are usually difficult to identify from outside [1]. In horticultural products such as fruits that are consumed as a fresh product, the appearance of physiological disorders influences consumer tastes, downgrades quality, and lowers pricing. The postharvest losses in fresh-consumed horticultural products are estimated at approximately 5-35% in developed countries, and this increases to approximately 20-50% in developing countries [2]. One of the quality losses in fruits are caused by the appearance of physiological disorders, especially in tropical fruit crops such as mangosteen (translucent flesh disorder, TFD), citrus (cracking and puffing), mango (spongy tissue and internal breakdown), guava (bronzing), and pomegranate (fruit cracking and aril browning) [3,4]. Mangosteen (Garcinia mangostana L.) is a tropical fruit with high economical, nutritional, and health values. In Indonesia, most provinces produce mangosteen fruits, with total production increasing from 117,595 tons in 2011 to 203,100 tons in 2015. In total, 98,294.6 tons were exported in 2015, with a market value of US$ 44.9 million, and were exported to Asian, Middle East, and some European countries [5]. However, fruits with physiological disorders such as TFD are generally rejected by consumers less than 20% of total production. TFD is one of the major physiological disorders in mangosteen. TFD symptoms cause the flesh to appear transparent or water soaked, and in severe cases, may become firm and crisp. Disorders with symptoms similar to TFD in mangosteen have been reported and differentially termed depending on the fruit, such as watercore in apple [6], European pear [7], and Japanese pear [8]; water-soaked brown flesh disorder [9]; water-soaking in melon [10]; and flesh translucency in pineapple [11]. The phenomenon is not confined to fruits, similar symptoms have also been found in onion and termed translucent scale (Shock et al., 2007) [12]. In previous studies, specific gravity techniques have been used to separate TFD-affected from healthy mangosteen fruits [3]. Recently, more reliable and accurate techniques such as near infrared reflectance spectroscopy [13], vibration frequency base on strain gage sensor [14], electrical resistance and capacitance [15], and electrical impedance [16], have been used to predict TFDaffected fruits.
The causes of TFD are still unclear and the subject of speculation. However, mechanical injuries during harvesting and handling [17], water-related stress [18,19], and nutrient deficiency [20,21] are highly correlated to occurrences of TFD in mangosteen. The incidence of TFD increased when there was excess water during fruit development, around 9 weeks after anthesis, and when plants were re-watered suddenly after drying conditions to -100 kPa soil moisture, which trigger the increase in incidence of TFD-affected mangosteen fruits [19,22]. The intensive irrigation increased TFD-affected fruits [23]. Calcium and boron concentrations in TFD-affected flesh were different compared to normal flesh, in which calcium was lower and boron was higher [20]. Application of CaCl 2 and H 3 BO 4 by spraying decreased the percentage of TFD-affected fruits [21]. Some studies also suggest that changes in the cell wall led to the occurrence of TFD. Changes in the ultrastructure and biochemistry of the cell wall in TFD-affected flesh have been observed. TFD-affected cell walls showed that the cells had loss disintegration between middle lamella and primary cell wall and that they were swollen [24]. Pectin composition, as well as calcium distribution on the pectin, was higher in TFD than normal flesh [25].
Recently, massively parallel sequencing technologies such as next generation sequencing (NGS), have provided immense opportunities to obtain massive amounts of sequence data at very low cost and with high time efficiency compared to Sanger sequencing. Using NGS platforms allows the sequencing of all transcripts concurrently and can also quantify the gene expression level [26]. Although there have been various studies on TFD-like physiological disorders in mangosteen at the physiological and morphological levels, molecular level details have not been reported yet. Here, we used NGS platforms based on Ion Proton systems for RNA sequencing to obtain the reliable transcriptome reference for mangosteen and analysis of their differential gene expression in instances of translucent flesh disorder in mangosteen. Our studies compare different water regimes to incidence and differentially expressed genes of TFD-like physiological disorders. In further studies, these results can be used for better understanding of gene regulation and mechanisms of TFD-like physiological disorders in mangosteen.

Plant materials
Fruits were collected from mature mangosteen trees, which were grown and maintained at the Pasir kuda experimental field, Center for Tropical Horticultural Studies, Bogor Agricultural University (Bogor, Indonesia). The experimental conditions used four treatments in which started 7 weeks anthesis, including (1) control as the natural growth condition (Control), (2) irrigating the land surface under the canopy (about 50 cm from trunk) with drip irrigation for the whole day (ILS), (3) covering the land surface under the canopy with dark plastic sheet (CLS), and (4) the treatments condition copied from 3rd treatment (CLS condition), with applied water irrigation every morning (CLS+RW). Twenty fruits were collected from each treatment randomly, then the incidences of translucent flesh disorder (TFD) was characterized by scoring each fruit on a scale of 0-4 [27]. A score of 0 (none) was assigned when no part of the fruit was affected, and 1 (slight), 2 (moderate), 3 (severe), and 4 (most severe) were assigned when less than 25%, 50%, 75%, and 100% of the fruit was affected, respectively. The data were analyzed using SAS University Edition. Analysis of variance (ANOVA) was used to test for significant differences between treatments, and Duncan's Multiple Range Test was used to compare means within the ANOVA.
For RNA extraction, the fruit was harvested and collected from a normal aril and translucent affected-aril with three replicates per treatment. All samples were stored in RNALater solution (Life Technologies, Grand Island, NY) prior to RNA extraction at 4˚C. For RNA sequencing, 12 libraries were constructed from normal and TFD-affected arils of both of control and CLS treatments. The RNA sample were collected from two tissue type including normal aril and TFD-affected aril (Fig 1) RNA isolation, sequencing, data pre-processing, de novo transcriptome assembly, and contigs construction Total RNA was extracted using the hot-borate method [28]. Quantity and quality of total RNA were measured using the NanoDrop ND-1000 spectrophotometer (Thermo Scientific, Wilmington, DE) and the 2100 Bioanalyzer (Agilent, Santa Clara, CA), respectively. PolyA RNA was subsequently isolated from 1 μg total RNA using the Dynabeads mRNA DIRECT Micro Purification Kit (Life Technologies, Grand Island, NY). The 12 libraries were constructed using the Ion Total RNA-Seq Kit v2 and Ion PI Template OT2 200 Kit v3 (Life Technologies, Grand Island, NY) following the manufacturer's protocols. RNA sequencing was performed To perform transcriptome assembly, the raw data was obtained from DRA005014 [29], DRA005254, and DRA005255 (in this study). Data pre-processing was performed following [29] to obtain clean data without rDNA filtering. De novo assembly of the transcriptome was performed by Trinity 2.2.0 [30] with default parameters to set a minimum length of 200 bp and performing an in silico normalization to set coverage at 30x. To reduce redundant contigs, the contigs were performed using the CAP3 program with parameters, -p 90 [31] and CD-HIT-EST with parameters, -c 0.90 -M 0 -T 0 [32]. The ribosomal DNA (rDNA) sequences were removed from the contigs using the BLAST+ program [33] with megablast parameter matched to the SILVA rRNA 128 databases [34] and the mangosteen rDNA sequences downloaded from NCBI database. Then, rRNA-removed contigs were hierarchically clustered using the Corset 1.06 program [35] as the mangosteen reference transcriptome and then deposited as a transcriptome shotgun assembly (TSA) to DDBJ with accession numbers, IAC-D01000001-IACD01183274. The high-quality reads were mapped to the rRNA-removed contigs using the SortMeRNA program with default parameters [36] to remove potential rDNA from reads.

Functional annotation, classifications of contigs, and differential gene expression (DGE) analysis
The contigs were aligned using the BLAST+ program against the NCBI non-redundant databases (nr/nt) with gi accession subset to Viridiplantae and UniProtKB databases (Swiss-Prot and TrEMBL subset to Viridiplantae databases) with an E-value cut-off of 10 −5 . Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and INTERPRO were performed using BLAST2GO software [37] retrieved from BLASTX-contigs against the NCBI protein non-redundant database. To identify putative transcription factors (TFs), transcriptional regulators (TRs), and protein kinases (PKs), the contigs were performed using the iTAK program [38].
To determine the differential expressed genes, high quality reads from each sample were mapped onto reference transcriptome sequences using Bowtie2 with default parameters [39]. The abundance of each transcript was calculated using eXpress software [40]. The R package edgeR [41] was used to identify differential expression of transcripts across samples with three biological replicates and directly used for comparing among tissue types, namely normal and TFD tissues, and conditions, namely control and treatment respectively. The significance of differential expression was considered by the fold change (>2) and adjusted-p value using Benjamini and Hochberg's approach [42] approach to obtain the false discovery rate (FDR > 0.01). Gene ontology of DEGs from each pairwise comparison was performed using the DAVID 6.8 beta web interface [43,44]. KEGG analysis of DEGs from each pairwise comparison was performed using the KOBAS 2.0 web interface [45].
Twenty-one genes were selected from contigs to validate their reliability and actin gene (C12506.56596.Ctg.6192) was used as an internal reference gene (S8 File). One microgram of total RNA from each sample was treated with the PrimeScript RT reagent Kit with gDNA Eraser (Takara Bio, Japan) to remove any residual genomic DNA and convert it to cDNA, following the manufacturer protocol. The relative quantification was performed on Eco Real-Time PCR System (San Diego, California) using SYBR Premix Ex Taq (Takara Bio, Japan). The comparative 2 −ΔΔCt method [46] was used for relative quantification of the gene expression levels. The level of significance difference was calculated with independent-samples t-test (P<0.05).

Field experimental, sequencing, de novo transcriptome assembly, and functional annotation
The covered land surface (no irrigation) treatment showed incidence values of 60.00 ± 0.489% for TFD, higher than those of the other treatments (Fig 2). However, the incidence of total of physiological disorders was not significantly different between treatments. In the field experiment, TFD was scored [27]. The incidence of TFD was highest for the covered land surface (no irrigation), with an average score of 60%, and lowest for the control (26.67%). The scores for irrigated land surface and covered land surface with irrigation were 2 (18.33%) and 1 (20.00%), respectively.
An overview of the sequencing and de novo transcriptome assembly statistics is provided in Table 1. In total, 520,709,829 bp of clean reads (after removal of adaptors and low-quality reads) were obtained from 760,917,051 raw reads. The length of reads was between 35-230 bp and the average was 100-bp pooled-clean reads were assembled de novo using the Trinity In total, 183,274 contigs were annotated and are presented in Table 2. Functional annotation was performed against several public databases. The database was subset into the Viridiplantae database to reduce false positive sequence homology from other organisms. The Contigs from the NCBI non-redundant (nr) protein database were used to analyze gene ontology, INTERPRO, and KEGG. An overview of species distribution, e-value distribution, number of hits, and alignment-length are presented in Fig 3. In total, 74,349 (40.57%) and 47,660 (26.00%) of contigs were tagged with the best aligning sequence homology from non-redundant proteins and nucleotides of NCBI databases, respectively. In comparison, 84,222 (45.95%) and 56,815 (31.00%) of contigs were annotated using UniProt KB and SwissProt databases, respectively.
Gene Ontology (GO) terms and KEGG were retrieved from the Blast2GO database with associated BLAST search results using the NCBI non-redundant protein database. GO assigned contigs into three main GO categories, biological processes (BP), molecular function (MF), and cellular component (CC) (Fig 4). A total of 30,883 contigs had retrieved GO terms (S1 File). The highest abundance GO term from each category was identified. Hydrolase activity (2,225), metal ion binding (1,955), and protein binding (1,678) were the largest GO terms within the molecular function category. Single-organism cellular process (2,081), gene expression (1,896), and cellular nitrogen compound biosynthetic process (1,813) were the largest GO terms for biological process. Of the cellular component GO terms, integral component of membrane (2,448), cytoplasmic part (576), and intracellular organelle part (1,178) were the most represented. KEGG biological pathway analysis showed that 2,504 contigs were mapped into 138 KEGG pathways (S2 File). Purine metabolism pathway was the largest pathway assigned to the contigs (1,008), followed by thiamine metabolism (542), biosynthesis of antibiotics (476), and aminobenzoate degradation (257), respectively.

Differentially expressed genes (DEGs) and their functional annotation analyses
Using the Bowtie2 program, clean reads from 12 libraries were mapped individually onto the mangosteen transcriptome as a reference. Finally, count data were performed by the eXpress program before differential expression analysis using the edgeR program. For global DEGs analysis, counted-reads from 12 libraries were subjected to two pairwise comparisons, namely TFD vs normal aril in both control (unstressed water) and treatment (under water deficit) conditions. In total, 51,239 contigs for the control and 46,564 contigs for the treatment condition were used for DEG analysis after a reasonable level filtering from each sample and differences between samples were shown by hierarchal clustering analysis (Fig 5). DEGs were characterized after normalized-library by using TMM normalization and the normalized expression level using count per million (CPM). The number of DEGs was identified from each pairwise comparison. A cutoff was applied with FDR threshold of 0.01, after which 6228 DEGs under the control condition and 3327 DEGs in treatment condition (under water deficit condition) remained (S4 File). The up-regulated and down-regulated DEGs were identified ( Fig 5). The top ten up-regulated and down-regulated DEGs from each pairwise comparison is shown in Table 3 and all DEGs are listed in S4 File. Functional annotations of DEG for GO terms and KEGG pathway (Table 4 and   against the KEGG pathways database and assigned 26 pathways involving ath01130 (biosynthesis of antibiotics), ath01230 (biosynthesis of amino acids), and ath01200 (carbon metabolism). Interestingly, carbohydrate metabolisms-related pathways such as glycolysis/ gluconeogenesis, starch and sucrose metabolism, and pyruvate metabolism were highly observed. Under the treatment condition, a total of 3,326 DEGs was identified. 2,162 DEGs were upregulated and 1,164 DEGs were down-regulated. The highest up-regulated DEGs involved Cyclin-B2-3, Receptor-like protein kinase THESEUS 1, Protein DMR6-LIKE OXYGENASE 2, and pectinesterase 2. The most down-regulated DEGs included beta-galactosidase 1, secoisolariciresinol dehydrogenase, monothiol glutaredoxin-S6, and long-chain-alcohol oxidase FAO1. GO analysis assigned 101 terms of biological process, 67 terms of cellular component, and 74 terms of molecular function. The top 3 most counted enriched GO functional annotation categories were GO:0006351 (transcription, DNA-templated), GO:0006468 (protein phosphorylation), and GO:0009651 (response to salt stress) for the biological process category, GO:0005634 (nucleus), GO:0016021 (integral component of membrane), and GO:0005886 (plasma membrane) for the cellular component category, and GO:0005515 (protein binding), GO:0005524 (ATP binding, and GO:0046872 (metal ion binding) for the molecular function category. DEGs were annotated against the KEGG pathways database and assigned to 19 pathways involving ath01230 (biosynthesis of amino acids), and ath01200 (carbon metabolism).

Identification of candidate genes putatively involved in occurrences of TFD-like physiological disorders
We used GO-slim terms (S6 File) and KEGG pathway (S7 File) results annotated by using SwissProt accession on the KOBAS 2.0 web interface to determine related gene into mineral nutrition-related, phytohormone-related, maturation and ripening process-related, including cell wall modification and carbohydrate metabolism, and water-related genes. In total, 21 genes were selected for validity testing from the genes identified from contigs (Fig 6; S8 File). The candidate gene is also associated with several TFD-causing hypotheses, including genes associated with the ripening process. Genes involved in the process of cell wall synthesis and degradation became of interest to this study. The polygalacturonase gene under TFD conditions showed decreased expression level compared to control conditions in all treatments, and this is noticeably different from the expression level of the gene polygalacturonase inhibitor, which has increased expression level under TFD conditions, compared to the control in all treatments except CLS+RW treatments with  Transcriptome analysis of translucent flesh disorder in mangosteen and CLS treatment, however increased the expression level in ILS and CLS+RW (statistically significant) treatments. The sucrose synthase (SUSY) gene expression level showed a significant increase in TFD compared to normal conditions in almost treatments.
The PME gene showed very high expression level under TFD conditions compared to normal under control and CLS conditions, compared to the stable condition of ILS but in the CLS +RW condition, its expression level showed lower expression in TFD than normal tissue. The XTH gene showed an increased pattern of expression level in TFD compared to normal samples under all treatment conditions and only in ILS treatment showed statistically significant. The EXP gene increased in expression level in the TFD compared to normal samples under control and CLS conditions, but was inversely proportional under both other conditions. Invertase genes in the cell wall and cytoplasm tended to have increased expression level under TFD conditions compared to controls in almost all treatments.
ACC gene expression showed decreased expression level in TFD compared to normal samples in all treatments except CLS+RW. While the ACS gene showed an increase in TFD compared to controls in the control treatment and ILS, the conditions were reversed under CLS and CLS+RW conditions. Calcium-related genes, such as CAM, had increased expression level in TFD compared to controls under all treatments. However, CDPK gene expression level exhibited the same pattern except under CLS conditions that had decreased expression level under TFD conditions than controls.
Three types of transcription, i.e. NAC, MYB, and WRKY, have been identified to be related to the three major transcription factors. MYB1 and MYB2 show increased expression level in TFD tissue compared to normal tissue in almost all treatments except in CLS+RW treatments. NAC1 did not show significant differences between TFD and normal tissue, except for under the ILS treatment. NAC2 showed significant differences between TFD and normal tissue under normal and ILS conditions, whereas under CLS and CLS+RW conditions, it did not show significant differences. In contrast, WRKY1 showed higher expression level than TFD tissue under almost all treatments. WRKY2 and WRKY3 showed higher expression level in TFD than normal tissue under almost all treatments. WRKY2 shows significant expression level under normal treatment conditions. WKRY3 showed significant differences in expression level between TFD and normal tissues.

Discussion
The present study is to report transcriptomic analysis for elucidating the physiological disorders in mangosteen, namely translucent flesh disorder responses to different water regimes. Due to the lack of genomic and transcriptomic data for mangosteen, we used a massive sequencing technology called RNA Sequencing (RNA-seq) to read whole transcripts and determine their gene expression level. In the present study, we analyzed approximately 700 million raw reads (80-gb bases) and 183,274 contigs were generated by de novo assembly using the Trinity program after removal of redundant contigs. More than 50% of the contigs were annotated by at least one public database, such as the non-redundant (nr) protein of NCBI database and SwissProt of UniProtKB databases.
In apple, approximately 18 factors have been identified as either causing or being correlated with watercore such as water regime, temperature, mineral nutrition, source/sink ratio, and maturation or ripening processes [6]. Environmental factors such as high temperature [47] and sunlight exposure [48] have also been found to be highly correlated to increased incidence of watercore in Japanese pear. Physiological and biochemical changes also determined and were correlated with the incidence of TFD-like physiological disorders. Alteration of carbohydrate metabolism may play a role in increasing the incidence of watercore in apple [49], flesh translucency in pineapple [11], and water-soaked flesh disorder in peach [9]. A particular calmodulin-binding protein (CaM-BP) has also been observed to be absent in water-soaked melon fruit [10].
The results of this study indicate that the sucrose synthase (SUSY), endoglucanase (GUN) and polygalacturonase (PG) genes involved in the incidence of TFD in mangosteen are similar to those in other plants, such as watercore incidence in pear and apple. An increase in gene expression level associated with ripening such as sugar metabolism, cell wall loosening, and degradation related genes was observed. In apple, sorbitol transporter (mdSOT) became one of the indicators of watercore, due to an increase in expression level [50]. Sucrose synthase (SUSY) also provides an indicator accumulation of sucrose [51].
TFD tissue had a thickened cell wall was thicker and showed the disintegration of the middle lamella [24]. This cell state indicates the occurrence of cell wall loosening and disintegration, which allows strong cell wall strength to become weak and ultimately damaged. Cell walls are composed of carbohydrate complexes in the form of polysaccharides, such as cellulose and pectin. The presence of middle lamella disintegration may allow formation of pectin by the deesterification homogalacturonan complex [52], as seen from the increased pectin methylesterase (PME) expression level compared to normal tissue. This can also be attributed to to the deficiency of calcium, thereby crosslinking between the pectin in the cell tissue is not formed. Conversely, high expression of pectin methylesterases (PMEs) increases Ca 2+ bound to the cell wall [53]. Thus, the application of calcium can reduce the incidence of TFD [21].
Previous studies have shown that the availability of water affects the incidence of TFD. Availability of water is important for plants; fluctuation in water levels leads to instability in plant potential and caused the fragile cell wall to rupture [54]. Irrigation treatment on a daily basis induced the incidence of TFD; that explained it as loss of turgor pressure due to the accumulation of apoplastic solutes [55]. Continuous water application increases the incidence of TFD compared with other treatments [19]. The incidence of TFD was different among treatments. For both the irrigated land surface and covered land surface, the incidence of TFD was higher than in the other conditions. Similarly, TFD incidence was increased by daily watering or interval watering (7-or 4-day intervals) relative to the control (no irrigation added) [23]. The covered land surface with irrigation had a lower incidence of TFD than the irrigated land surface and covered land surface by 8-9 weeks after anthesis, although it still had a higher incidence of TFD than the control. Nine weeks after anthesis was a time at which water supply had a strong impact on TFD and GD incidence in mangosteen fruit [22]. TFD may be triggered by a water potential imbalance in the cells that results from excess water content in the fruit, which itself arises from continuous exposure of the plants to high soil water potential during the pre-harvest period. Water potential gradients are also involved in water uptake by the roots and transpiration in the leaves [56].
The suppression of the pectate lyase mRNA in transgenic strawberry fruits during ripening resulted in firmer fruits [57]. On the other hand, the XTH gene shows an increase in TFD that allows expression level in the maintenance structure compared to cell wall disassembly that enlarges the cell wall (loosening). Cell size in TFD is significantly increased than in the normal aril [24]. This cell enlargement is likely due to the modified cell wall due to the loosening process. This is demonstrated in the expression level of genes associated with loosening, including expansin and XTH. EXP mediates cell wall loosening [58]. The polygalacturonase gene decreases in TFD compared to normal conditions as it acts in the softening process, resulting in degradation or solubilization of the cell wall. Increased pectinesterase expression level in TFD showed that the pectin is bound to the cell walls and pectin-bound calcium is higher than normal [25]. Calcium-pectate forms cross-links between homogalacturonan (HGA) in the middle of the lamella so that the pectin-bound calcium is increased.
These results provide a basis for future studies on the molecular mechanisms of the incidence of TFD in mangosteen since the mangosteen known as polyploid plant [59]. The present study summarized incidence of TFD by the following (Fig 7): first, the cells undergo major osmotic changes between the inside and outside of the cell so that cell membranes push against the cell wall due to mass migration into cells by osmosis (over-turgor pressure) [54]. Second, at the same time the cell wall responds by stretching the cell (loosening). This can be measured by CSL, XTH, and EXP gene expression level compared to normal tissue on the cell wall. In the middle lamella, PME gene expression level, which is the process of formation of the homogalacturonan complex, is also higher. In condition of calcium deficiencies, the calcium-pectate complex is not formed which causes the lamella damage. Third, because of the large flow of water into the cell, and the damaged middle lamella, water fills the air cavity. However, regarding physiological disorders, e.g. watercore in apple or pear, there are other factors that cause TFD occurrence, such as temperature changes, and hormone factors such as gibberellin. Thus, further studies are needed to re-confirm the genes-related translucent flesh disorder using RT-qPCR and also elucidate the other gene-related disorders such as gamboge disorder.