Genome-wide identification and expression analysis of the SR gene family in longan (Dimocarpus longan Lour.)

Longan (Dimocarpus longan Lour.) is an important commercial fruit tree in southern China. The embryogenesis of longan affects the quality and yield of fruit. A large number of alternative splicing events occur during somatic embryogenesis (SE), and alternative splicing is regulated by serine/arginine-rich (SR) proteins. However, the functions of SR proteins in longan are poorly understood. In this study, 21 Dlo-SR gene family members belonging to six subfamilies were identified, among which Dlo-RSZ20a, Dlo-SR30, Dlo-SR17, Dlo-SR53, and Dlo-SR32 were localized in the nucleus; Dlo-RSZ20b, Dlo-RSZ20c, Dlo-RSZ20d, Dlo-SC18, Dlo-RS2Z29, Dlo-SCL41, and Dlo-SR33 were localized in chloroplasts; and Dlo-RS43, Dlo-SC33, Dlo-SC37, Dlo-RS2Z33, Dlo-RS2Z16, Dlo-RS2Z24, Dlo-SCL43, Dlo-SR112, and Dlo-SR59 were localized in both the nucleus and chloroplasts. The Dlo-SR genes exhibited differential expression patterns in different tissues of longan. The transcript levels of Dlo-RSZ20a, Dlo-SC18, Dlo-RS2Z29, Dlo-SR59, Dlo-SR53, and Dlo-SR17 were low in all analyzed tissues, whereas Dlo-RS43, Dlo-RS2Z16, Dlo-RS2Z24, and Dlo-SR30 were highly expressed in all tissues. To clarify their function during SE, the transcript levels of Dlo-SR genes were analyzed at four different stages of SE, comprising non-embryonic callus (NEC), friable-embryogenic callus (EC), incomplete compact pro-embryogenic culture (ICpEC), and globular embryo (GE). Interestingly, the transcript levels of Dlo-RS2Z29 and Dlo-SR112 were increased in embryogenic cells compared with the NEC stage, whereas the transcript levels of Dlo-RSZ20a, Dlo-RS43, Dlo-SC37, and Dlo-RS2Z16 were especially increased at the GE stage compared with the other stages. Alternative splicing events of Dlo-SR mRNA precursors (pre-mRNAs) were detected during SE, with totals of 41, 29, 35, and 44 events detected at the NEC, EC, ICpEC, and GE stages, respectively. Protein–protein interaction analysis showed that SR proteins were capable of interacting with each other.
The results indicate that the alternative splicing of Dlo-SR pre-mRNAs occurs during SE and that Dlo-SR proteins may interact to regulate embryogenesis of longan.

Introduction
Problems such as variable inter-annual yields and a low proportion of edible pulp seriously restrict expansion of the longan industry. Embryo development is strongly associated with the yield and fruit quality of longan. Plant somatic embryos show close morphological and molecular similarities to normal zygotic embryos [17][18][19][20]. Somatic embryos of longan have been used to investigate the mechanism of embryogenesis in vitro and in vivo [21][22][23]. Through analysis of the longan transcriptome, a large number of alternative splicing (AS) events have been detected during somatic embryogenesis (SE), which can be divided into six stages: friable-embryogenic callus (EC), incomplete compact pro-embryogenic cultures (ICpEC), globular embryos (GE), heart-shaped embryos, torpedo-shaped embryos, and cotyledonary embryos [24]. Among these six stages, the first three (EC, ICpEC, and GE) constitute the early stage of SE, and the culture of early-stage embryos requires the addition of the plant hormone 2,4-dichlorophenoxyacetic acid (2,4-D). In the present study, genome-wide identification and analysis of the Dlo-SR gene family members were performed and the expression pattern of Dlo-SR genes in different tissues of longan was analyzed. In addition, AS events of Dlo-SR genes were analyzed during SE in longan and the potential interactions among selected Dlo-SR proteins were predicted.

Experimental materials
Friable-embryogenic callus of D. longan 'Honghezi' (LC2 cell line), induced by Lai Zhongxiong in 1994, has been subcultured long-term by the Institute of Horticultural Biotechnology, Fujian Agriculture and Forestry University [21][22][25]. In the present study, callus of the LC2 cell line was first cultured on Murashige and Skoog (MS) medium supplemented with 2,4-D (1.0 mg/L), kinetin (0.5 mg/L), and AgNO3 (5 mg/L) for 20 days and was subsequently transferred to MS medium supplemented with 2,4-D (1.0 mg/L) for an additional 20 days to obtain friable-embryogenic callus (EC). The EC was cultured on MS medium supplemented with 2,4-D (0.5 mg/L) for an additional 20 days to obtain incomplete compact pro-embryogenic cultures (ICpEC). To obtain cells at the globular embryo (GE) stage, the EC was cultured on MS medium supplemented with 2,4-D (0.1 mg/L) for 20 days. Mature embryos were cultured on MS medium for about 45 days to obtain non-embryonic callus (NEC) [24][25][26]. Synchronized cultures at the NEC, EC, ICpEC, and GE stages were grown in three replicates. For each stage, samples for qRT-PCR analysis were collected from five bottles of synchronized culture, as described previously [24].

Identification of Dlo-SR gene family members
Longan genome data were downloaded from the GigaScience GigaDB repository (2017) (http://dx.doi.org/10.5524/100276). Longan SR family members were isolated from the genome data as follows. First, the accession numbers of known Arabidopsis and rice SR proteins were used to retrieve the amino acid sequences from TAIR (http://www.arabidopsis.org/, V10.0) and RGAP (http://www.rice.plantbiology.msu.edu), respectively [7]. Second, a longan genome-wide amino acid sequence database was generated using standalone BLAST. Alignment of the longan sequences with the amino acid sequences of known Arabidopsis and rice SR proteins (E-value = 0.001) identified the candidate longan SR family members. Third, the candidate Dlo-SR family members were verified using the Pfam database (PF00076.21). A total of 21 candidate sequences were finally obtained, and the CDS sequences of the Dlo-SRs are shown in S1 Table. Transcriptome data for nine tissues and organs of longan 'Si Ji Mi' were accessed in the National Center for Biotechnology Information (NCBI) GEO database (accession number GSE84467). The AS events of Dlo-SR genes at the NEC, EC, ICpEC, and GE stages were extracted from the longan transcriptome dataset (NCBI accession number SRA050205).
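The E-value screening step described above can be illustrated with a minimal sketch. It assumes the BLAST results were saved in the standard 12-column tabular format (`-outfmt 6`), in which the second column is the subject ID and the eleventh column is the E-value, and it treats the stated E-value of 0.001 as an upper cut-off. The function name and the sequence identifiers are hypothetical and are not taken from the study.

```python
def filter_blast_hits(tabular_lines, max_evalue=1e-3):
    """Return subject IDs that have at least one hit with E-value <= max_evalue.

    Assumes BLAST tabular output (-outfmt 6): qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    candidates = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject_id = fields[1]
        evalue = float(fields[10])
        if evalue <= max_evalue:
            candidates.add(subject_id)
    return candidates


# Hypothetical hits of known SR protein queries against a longan protein database.
example_hits = [
    "known_SR_1\tlongan_protein_A\t62.5\t200\t75\t0\t1\t200\t1\t200\t1e-80\t250",
    "known_SR_1\tlongan_protein_B\t30.1\t90\t60\t3\t5\t95\t10\t100\t0.5\t28.1",
    "known_SR_2\tlongan_protein_C\t45.0\t150\t80\t2\t1\t150\t3\t152\t2e-20\t95.0",
]

candidate_ids = filter_blast_hits(example_hits)
print(sorted(candidate_ids))
```

Candidates retained by such a filter would still need the domain check against Pfam that the text describes; the sketch covers only the E-value screen.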
A controlled plant chamber was used for cultivation of longan ECs under different light conditions. The intensity and photoperiod of blue (457 nm) and white light were fixed at 32 μmol m⁻² s⁻¹ and 12 h d⁻¹, respectively. Transcriptome data under different light qualities (white, blue, and dark) were extracted from the NCBI BioProject database (accession number PRJNA562034).

RNA extraction and gene expression analysis
Total RNA was extracted from longan cultures at the NEC, EC, ICpEC, and GE stages using Tri-Pure Isolation Reagent (Roche Diagnostics, Indianapolis, IN, USA). Extracts were treated with DNase I to remove any contaminating genomic DNA. RNA quality was assessed using a NanoDrop 2000 spectrophotometer (Thermo Scientific, Wilmington, DE, USA). The cDNA was synthesized with the PrimeScript RT Reagent Kit (TaKaRa, Japan), using 500 ng RNA in a 10 μL reaction volume. Transcript levels were analyzed by qRT-PCR performed on a LightCycler 480 system (Roche Applied Science, Basel, Switzerland). The 22.5 μL final reaction volume contained 12.5 μL SYBR II Premix Ex Taq™ (TaKaRa), 1 μL of 10× diluted cDNA, 0.8 μL specific primer pairs (100 nM), and 7.4 μL ddH2O [33]. The qRT-PCR protocol was as follows: 95 °C for 30 s, followed by 40 cycles of 95 °C for 5 s, 60 °C for 20 s, and 72 °C for 10 s. Each qRT-PCR analysis was performed with three biological replicates and technical replicates. The β-actin (ACTB) gene was used as an internal control for calculation of the relative expression level of SR genes following the 2^(−ΔΔCt) method [33]. The data presented are the mean ± standard error of three replicates. The primers for qRT-PCR were designed using Primer 3 software (Table 1).
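For readers unfamiliar with the 2^(−ΔΔCt) calculation, the following minimal sketch shows the arithmetic. It assumes that the reference gene is ACTB and that the calibrator is the NEC stage, as stated in the text. The function name and the Ct values are hypothetical and serve only as illustration.

```python
def relative_expression(ct_target_sample, ct_reference_sample,
                        ct_target_control, ct_reference_control):
    """Relative expression by the 2^(-ddCt) method.

    dCt  = Ct(target) - Ct(reference gene), computed per condition
    ddCt = dCt(sample) - dCt(control)
    """
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: a target gene at the GE stage versus the NEC control,
# both normalized to the reference gene.
fold_change = relative_expression(
    ct_target_sample=24.0, ct_reference_sample=18.0,
    ct_target_control=26.0, ct_reference_control=18.0,
)
print(fold_change)  # ddCt = (24 - 18) - (26 - 18) = -2, so 2**2 = 4.0
```

By construction the control condition itself always evaluates to 1, which is why expression at the other stages is read as a fold change relative to it.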

Construction of interaction networks of SR proteins in longan
STRING (https://string-db.org/) was used to construct the functional protein association networks for SR proteins on the basis of Arabidopsis orthologs. The minimum required interaction score was set to Medium confidence (0.400) and the maximum number of interactors was 10.
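The two network settings can be read as a simple filter, sketched below. This only illustrates what a confidence threshold of 0.400 and a cap of 10 interactors mean; it is not STRING's own selection procedure, and the function name, partner names, and scores are hypothetical.

```python
def select_interactors(scored_partners, min_score=0.400, max_interactors=10):
    """Keep partners whose combined score meets the threshold, best first,
    and return at most max_interactors of them."""
    kept = [(name, score) for name, score in scored_partners if score >= min_score]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:max_interactors]


# Hypothetical combined scores for partners of one query protein.
partners = [("partner_A", 0.912), ("partner_B", 0.350), ("partner_C", 0.400)]
print(select_interactors(partners))
```

With a cap of 10 interactors plus the query protein itself, a network built under these settings contains at most 11 nodes.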

Statistical analysis
Statistical analysis was performed using SPSS (version 19.0, Chicago, IL, USA). The gene expression data were analyzed by one-way analysis of variance (ANOVA). Different lower-case letters indicate differences that are significant at P < 0.05.
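The F statistic underlying a one-way ANOVA can be computed by hand, as in the minimal sketch below. It is not the SPSS procedure used in the study; the function name and the expression values are hypothetical. Obtaining a P value additionally requires the F distribution, for example through `scipy.stats.f_oneway`, which is omitted here to keep the example dependency-free.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)

    # Variation of group means around the grand mean.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        (value - sum(group) / len(group)) ** 2
        for group in groups
        for value in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical relative expression values for three stages, three replicates each.
f_stat, df_between, df_within = one_way_anova_f(
    [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
)
print(f_stat, df_between, df_within)
```

A large F value indicates that differences among stage means are large relative to the variation among replicates. The lower-case letters in the figures come from a subsequent multiple-comparison step that this sketch does not cover.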

Phylogenetic and conserved motif analysis of Dlo-SR gene family members
To analyze evolutionary relationships between the SR gene families of longan and Arabidopsis, a phylogenetic tree was constructed using the neighbor-joining method (Fig 1A). Consistent with the nomenclature for Arabidopsis SR proteins [7], Dlo-SR gene family members were named systematically according to the phylogenetic tree and the molecular weight (Table 2). The Dlo-SR genes consisted of six subfamilies: RSZ, RS, SC, RS2Z, SCL, and SR. Among these subfamilies, RS, RS2Z, and SCL are plant-specific [7]. The number of Dlo-SR members in each subfamily ranged from one (RS subfamily) to seven (SR subfamily). Bootstrap analysis supported a close relationship between the members of the RSZ and SC subfamilies, whereas the genetic distances indicated that the SCL subfamily members were distantly related. The members of the RS2Z and SR subfamilies were closely or distantly related, exhibiting conserved or protein-specific characteristics. Exon and intron positions in longan SR genes were mapped using the GSDS server. The number of introns ranged from two to ten and the number of exons ranged from three to 11 (Fig 1B). The genes that contained the most introns and the fewest exons were Dlo-SR32 and Dlo-SR59, respectively. Three conserved motifs in the Dlo-SR proteins were predicted, namely Motif 1 (RPRGFAFVEFEDRRDAEDAIRALDGKN), Motif 2 (LYVGNLSPRVTERELEDLFSKYGKVVDVD), and Motif 3 (GWRVELSHNSKGGGGRGGARGRGGGEDLKCYECPGHFARECRLRVGS) (Fig 1C). The motifs comprised 27, 29, and 49 amino acids, respectively. All four RSZ subfamily members contained the three conserved motifs. Dlo-SR17 in the SR subfamily, Dlo-SCL41 in the SCL subfamily, and Dlo-RS2Z32 in the RS2Z subfamily contained only Motif 1, and the remaining gene family members contained Motif 1 and Motif 2.

PLOS ONE https://doi.org/10.1371/journal.pone.0238032.t001

Prediction of cis-acting elements in the Dlo-SR gene promoter
The PlantCARE database was used to predict cis-acting elements in the 2 kb sequence upstream of the transcription start site (TSS) of the 21 Dlo-SR gene family members (Table 3). The 21 Dlo-SR genes all contained the core promoter elements CAAT-Box and TATA-Box. With the exception of Dlo-RS2Z16, the promoters of the Dlo-SR genes contained multiple cis-acting elements related to growth and development and to responses to phytohormones, stress, and light. In addition, the promoter of each Dlo-SR gene contained a large number of cis-acting elements of unknown function; therefore, the functions of Dlo-SR genes remain to be elucidated.

Expression pattern of Dlo-SR genes in different tissues of longan
The transcript levels of Dlo-SR genes in the root, stem, leaf, flower, flower bud, pericarp, pulp, seed, and young fruit of longan indicated that the expression levels varied among tissues (Fig 2). The expression levels of Dlo-RSZ20a, Dlo-RSZ20b, Dlo-SR17, Dlo-SC18, and Dlo-RS2Z29 were low in all analyzed organs, whereas the transcript levels of Dlo-RS43, Dlo-RS2Z16, Dlo-RS2Z24, Dlo-SCL43, Dlo-SR30, and Dlo-SR32 were high in all tissues. Transcript levels also differed among SR genes of the same subfamily. For example, in the SC subfamily, the abundance of Dlo-SC18 transcripts was distinctly lower than that of Dlo-SC33 and Dlo-SC37, and in the RS2Z subfamily, Dlo-RS2Z16, Dlo-RS2Z32, and Dlo-RS2Z24 transcripts were more abundant than Dlo-RS2Z29 transcripts. Certain Dlo-SR genes exhibited a tissue-specific expression pattern, and transcription of different members of the same subfamily varied within the same tissue. In the pulp, the expression levels of Dlo-RSZ20a and Dlo-RSZ20c were higher than those of Dlo-RSZ20b and Dlo-RSZ20d (all members of the RSZ subfamily). In the SR subfamily, the expression level of Dlo-SR33 was highest in the pulp, followed by those of Dlo-SR30, Dlo-SR32, and Dlo-SR112.

Expression analysis of Dlo-SR genes at early stages of longan somatic embryogenesis
To investigate the role of Dlo-SR genes during longan embryogenesis, qRT-PCR analysis of 18 Dlo-SR genes was performed at the early stages of SE, namely NEC, EC, ICpEC, and GE, with the NEC stage as a control (Fig 3). Compared with the NEC stage, higher transcript levels of Dlo-RS2Z29 and Dlo-SR112 were detected at the embryonic stages (EC, ICpEC, and GE). Moreover, specific induction of Dlo-RS2Z24 and Dlo-SCL41 was noted at the EC stage, while the expression levels of Dlo-RSZ20a, Dlo-SC37, and Dlo-RS2Z16 were highest at the GE stage. In addition, the expression levels of Dlo-RS43, Dlo-SC33, Dlo-RS2Z16, Dlo-SCL43, Dlo-SR17, Dlo-SR33, and Dlo-SR32 were reduced at the ICpEC and EC stages compared with those at the NEC stage.

Analysis of AS events of Dlo-SR genes during longan somatic embryogenesis
Alternative splicing can be divided into seven main forms according to the location of splicing in the mRNA precursor (pre-mRNA) sequence: exon skipping (ES), intron retention (RI), alternative 5′ splice site (A5SS), alternative 3′ splice site (A3SS), alternative first exon, alternative last exon, and mutually exclusive exon. The first four AS forms are the predominant types observed in eukaryotes. The AS events for Dlo-SR gene family members were analyzed at four stages, namely NEC, EC, ICpEC, and GE (Fig 4 and

Protein-protein interaction analysis of Dlo-SR in longan
To explore the potential functions of Dlo-SR proteins, six members (Dlo-RSZ20a, Dlo-RS43, Dlo-SC18, Dlo-RS2Z32, Dlo-SCL41, and Dlo-SR30) from different subfamilies were selected to construct protein-protein interaction networks using STRING 11 based on an Arabidopsis association model (Fig 5). The Dlo-RSZ20a protein showed high homology with At-RSZ21, and the protein interaction network consisted of 11 nodes and 50 edges. The associated biological processes included RNA splicing, mRNA processing, mRNA splicing via a spliceosome, and mRNA transport. Dlo-RS43 showed high homology with At-RS41, and the protein interaction network consisted of 11 nodes and 31 edges. The associated biological processes included mRNA splicing via a spliceosome, primary miRNA processing, gene expression, cellular response to dsRNA, and cellular nitrogen compound metabolic process. Dlo-SC18 showed high homology with At-SC35, and the protein interaction network consisted of 11 nodes and 54 edges. The associated biological processes included RNA splicing, mRNA processing, and mRNA splicing via a spliceosome. Dlo-RS2Z32 showed high homology with At-SR45a, and the protein interaction network consisted of 11 nodes and 55 edges. The associated biological processes included RNA splicing, mRNA processing, mRNA splicing via a spliceosome, gene expression, and RNA metabolic process. Dlo-SCL41 showed high homology with At-SCL30, and the protein interaction network consisted of 11 nodes and 46 edges. The associated biological processes included RNA splicing, mRNA processing, and mRNA splicing via a spliceosome. Dlo-SR30 showed high homology with At-RS31, and the protein interaction network consisted of 11 nodes and 29 edges. The associated biological processes included mRNA splicing via a spliceosome, gene expression, cellular nitrogen compound metabolic process, translational elongation, and glutathione metabolic process.
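One way to compare the six networks is by their density, that is, the fraction of all possible pairwise links that are present. The sketch below uses the node and edge counts reported above and assumes that each network is an undirected graph without self-loops or duplicate edges; the function name is hypothetical.

```python
def network_density(node_count, edge_count):
    """Fraction of possible undirected edges that are present."""
    max_edges = node_count * (node_count - 1) // 2
    return edge_count / max_edges


# Edge counts reported in the text; each network has 11 nodes.
reported_edges = {
    "Dlo-RSZ20a": 50,
    "Dlo-RS43": 31,
    "Dlo-SC18": 54,
    "Dlo-RS2Z32": 55,
    "Dlo-SCL41": 46,
    "Dlo-SR30": 29,
}

densities = {
    name: network_density(11, edges) for name, edges in reported_edges.items()
}
for name, density in densities.items():
    print(f"{name}: {density:.2f}")
```

With 11 nodes, at most 55 undirected edges are possible, so under this assumption the Dlo-RS2Z32 network is fully connected, whereas the Dlo-RS43 and Dlo-SR30 networks contain just over half of the possible links.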

Discussion
Alternative splicing events occur extensively in eukaryotic genes involved in the regulation of expression, such as genes related to signaling and programmed cell death [34][35][36][37]. Alternative splicing also affects the efficiency, stability and localization of cis-acting elements in the transport and translation of mRNA [37]. In the present study, 21 members of the Dlo-SR gene family belonging to six subfamilies were identified in longan based on the results of a neighbor-joining analysis with At-SR genes of Arabidopsis. The SR gene families of longan and Arabidopsis are relatively conserved in evolution. The AS events of Dlo-SR pre-mRNAs at the NEC, EC, ICpEC, and GE stages in longan were analyzed in this study, which confirmed the involvement of Dlo-SR and AS in longan SE. In plants, the study of the functions of SR genes is in its infancy. The mechanism of SR function in SE has not been reported in plants, but has been studied during embryogenesis in Caenorhabditis elegans and mouse. Longman et al. observed that RNA interference targeting Ce-SF2/ASF may be lethal at the late embryonic stage [8]. Jumaa et al. attempted to raise srp20-deficient mice using gene knockout technology, but failed to obtain viable offspring [9]. In the present study, we observed that during longan SE, except for Dlo-RSZ20a and Dlo-SR53, the remaining 16 Dlo-SR genes showed varied expression levels at the ICpEC, EC, and GE stages. It is worth noting that the expression levels of Dlo-RS2Z29 and Dlo-SR112 were significantly increased in the embryonic cells (EC, ICpEC, and GE) compared with those in NEC, whereas only small amounts of Dlo-RS2Z29 transcripts were detected in different tissues of longan, indicating that Dlo-RS2Z29 may function in an embryo-specific manner.
Transcripts of the genes Dlo-RS43, Dlo-SC33, Dlo-RS2Z16, Dlo-SCL43, Dlo-SR33, and Dlo-SR32 were detected in all tissues and showed varying degrees of reduction at the ICpEC and EC stages, while the transcripts of Dlo-RSZ20a, Dlo-SC37, and Dlo-RS2Z16 were abundant at the GE stage. The differential expression patterns of Dlo-SR genes during SE suggest their involvement in the regulation of longan embryogenesis. As shown in the promoter cis-element analysis, the majority of SR genes contain abundant photo-responsive elements. Under blue and white light conditions, Dlo-RS43 and Dlo-RS2Z24, which were highly expressed in all tissues, showed no differential response compared with the control (S1 Fig). In response to blue light, the transcript abundance of Dlo-SR33, Dlo-SR112, Dlo-SR17, and Dlo-RS2Z32 at the EC stage was higher than that of the control, whereas that of Dlo-RSZ20d, Dlo-SC33, Dlo-SC37, and Dlo-SCL43 was lower (S1 Fig). The response of SR proteins to light may also be associated with their localization in chloroplasts, as chloroplasts are an important plant-specific organelle and are critical for photosynthesis.
Whether SR genes in plants play different roles from those in mammals still needs verification. The present results provide information to support this distinction. In the traditional definition of SR proteins, the serine/arginine-rich RS domain is located at the C-terminus of the polypeptide chain. However, sequence analysis revealed that the RS domain of Dlo-RS2Z16 (Dlo_004147.1) is located at the N-terminus of the polypeptide chain. It is generally considered that the majority of SR proteins are stored in nuclear speckles in a stable state, and that some are shuttled between the nucleus and the cytoplasm. The longan SR proteins predicted to be localized in both the nucleus and chloroplasts may belong to the nucleus-cytoplasm shuttle proteins. In addition to affecting AS of pre-mRNAs, the nucleus-cytoplasm shuttle proteins also play an important role in mRNA transport, translation regulation, and mRNA stabilization and localization [38]. The serine residues of the RS domain are substrates for a variety of protein kinases, and their phosphorylation state is closely related both to the splicing regulated by SR proteins and to the localization of SR proteins in the cell [39][40][41].

Conclusion
In conclusion, a total of 21 Dlo-SR members were identified in longan, and the expression of these Dlo-SRs varied among different tissues of longan. During longan SE, higher expression levels of Dlo-RS2Z29 and Dlo-SR112 were noted in the embryogenic cells (EC, ICpEC, and GE) compared with those at the NEC stage. Moreover, the transcripts of Dlo-RSZ20a, Dlo-RS43, Dlo-SC37, and Dlo-RS2Z16 were increased especially at the GE stage compared with those at the other embryonic stages. In addition, AS events of Dlo-SR pre-mRNAs were observed during SE, with the fewest AS events occurring at the EC stage and a gradual increase during the later transition to the GE stage, indicating the involvement of AS of Dlo-SR pre-mRNAs in longan SE.
Supporting information
S1 Fig. Expression of Dlo-SRs in longan cells at the EC stage exposed to different light conditions. Longan cells at the EC stage were subjected to white and blue light conditions, with the dark condition as a control. (TIF)
S1