Genome-wide analysis of sulfotransferase genes and their responses to abiotic stresses in Chinese cabbage (Brassica rapa L.)

Sulfotransferases (SOTs; EC 2.8.2.-), which are widespread from prokaryotes to eukaryotes, constitute a multi-protein family that plays crucial roles in plant growth, development and stress adaptation. However, this family has not been systemically investigated in Brassica rapa. Here, a genome-wide systemic analysis of SOT genes in B. rapa subsp. pekinensis, a globally cultivated vegetable, were conducted. We identified 56 SOT genes from the whole B. rapa genome using Arabidopsis SOT sequences as queries and classified them into nine groups, rather than the eight groups of previous research. 56 B. rapa SOT genes (BraSOTs) were distributed on all 10 chromosomes except for chromosome 5. Of these, 27 BraSOTs were distributed in seven clusters on five chromosomes (ChrA01, ChrA02, Chr03, ChrA07, and Chr09). Among the BraSOT proteins, 48 had only one SOT_1 domain and 6 had two, while 2 had one SOT_3 domain. Additionally, 47 BraSOT proteins contained only known SOT domains. The remaining nine proteins, five in group-VIII and two in group-IX, contained additional transmembrane domains. Specific motif regions I and IV for 3′-phosphoadenosine 5′-phosphosulfate binding were found in 41 BraSOT proteins. Introns were present in only 18 BraSOT genes, and all seven BraSOT genes in groups VIII and IX had more than three introns. To identify crucial SOTs mediating the response to abiotic stress in B. rapa, expression changes in 56 BraSOT genes were determined by quantitative RT-PCR after drought, salinity, and ABA treatments, and some BraSOT genes were associated with NaCl, drought and ABA stress, e.g. Bra017370, Bra009300, Bra027880.


Introduction
Sulfotransferases (SOTs) catalyze the transfer of a sulfonate group from 3 0 -phosphoadenosine 5 0 -phosphosulfate (PAPS) to an appropriate hydroxyl/amino group on numerous substrates [1]. This reaction, generally referred to as sulfation or sulfonation, has been found to occur in a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 all organisms investigated to date [1] and is associated with various biological processes, such as cell communication, growth and development, and defense [2].
SOT substrates include numerous endogenous and xenobiotic molecules that contain hydroxyl or amino groups [3,4]. Sulfonation transforms drugs and other xenobiotic compounds into more water-soluble forms, thereby increasing their renal excretion and converting them into less toxic metabolites [5]; however, sulfonation can also activate molecules [6].
Because they all use PAPS as a co-substrate, SOT proteins are characterized by the presence of a histidine residue in the active site, defined PAPS-binding sites and a defined SOT domain (Pfam: PF00685) [1,7]. The PAPS-binding sites of SOTs are two highly conserved amino acid motifs known as regions I and IV [6,8,9], which are respectively localized near the N-and Ctermini. SOT proteins in mammals were originally classified on the basis of their affinity for different classes of substrates. One group of SOT proteins, mainly membrane-associated, accepts macromolecules such as proteins/peptides and glycosaminoglycans [10] as substrates. The second group, which are typically soluble proteins, has substrates consisting of small organic molecules with diverse chemical structures, such as flavonoids, steroids and xenobiotics [2]. SOT proteins in plants can be divided into cytosolic and membrane-associated proteins. Thus far, most identified plant SOTs are cytosolic proteins with no transmembrane region and their exact localization is largely unknown. Only a few membrane-associated proteins, which are bound to the plasma membrane or localized in the Golgi apparatus, have been characterized in plants [11][12][13].
In Arabidopsis thaliana, 18 SOT genes (AtSOT1-AtSOT18) were initially identified on the basis of the sequence similarities of their translated products, with an average identity of 51.1%, and were classified into seven groups [14]. In 2008, another three SOT genes (SOT19-SOT21) were added to the AtSOT family to form group VIII [2]. Additionally, a tyrosylprotein SOT protein was identified, named AtTPST, that has low sequence similarity with other AtSOTs, possesses more amino acids, and lacks conserved amino acid sequences such as regions I to IV [13]. AtTPST is also the only Arabidopsis SOT protein containing a Sulfotrans-fer_2 domain (PF03567) instead of a Sulfotransfer_1 domain (PF00685) [1]. Arabidopsis SOTs have different substrate specificities; for example, the preferred substrate of AtSOT5 and AtSOT13 is flavonol [1,15,16], AtSOT8 functions as a flavonoid glycoside sulfotransferase [17], and AtSOT10 displays catalytic activity toward brassinosteroids [18], while AtSOT12 seems to have more functions with activities towards flavonone, brassinosteroids and salicylic acid and is involved in plant responses to salt, osmotic stress and hormone treatments [18][19][20]. AtSOTs can also sulfonate the bacterial-produced toxin cycloheximide [21].
In addition to genome-wide analyses of SOTs in A. thaliana, several limited studies have been conducted in other Brassicaceae species. Two SOT proteins, BNST3 and BNST4, have been characterized in Brassica napus. Comparative analysis of B. napus and A. thaliana databases has revealed at least 11 putative desulfoglucosinolate SOTs. In addition to the discovery of homologs of AtSOT16-18, phylogenetic analyses have revealed new subfamilies of desulfoglucosinolate SOTs not present in A. thaliana [22]. A comparative genomics study of B. rapa and A. thaliana identified all glucosinolate biosynthesis genes [23]. Twelve desulfoglucosinolate SOTs were identified and found to be differentially expressed in B. rapa, compared with only three present in Arabidopsis. In addition to Brassicaceae, SOTs have been identified in rice (Oryza sativa L.). In particular, the resistant allele of rice STV11 was found to encode an SOT (OsSOT1) catalyzing the sulfonation of salicylic acid [24].
Arabidopsis SOT genes are so far the most studied plant SOT genes. Given the wide range of functions and substrate specificities of SOTs and the very limited number of plant species in which they have been fully characterized, genome-wide studies of SOTs in more plant species will be useful to explore their functions in detail. Here, we identified 56 putative SOT-domain proteins in B. rapa across the whole genome. Phylogenetic analysis and classification was performed based on the SOT genes in Arabidopsis, followed by gene and protein structural characteristic analysis. Expression pattern of 56 BraSOT genes were also investigated via qRT-PCR. Our results provided basic resources for further functional research.

Retrieval and identification of SOT domain proteins
Sequences of SOT domain proteins from A. thaliana were downloaded from the TAIR database (http://www.arabidopsis.org/) according to previous methods [2,13,14]. To identify all SOT proteins in B. rapa, all the AtSOTs were then used as queries for the blastp tool in the Brassica database (BRAD) (http://brassicadb.org/brad/index.php). An E-value of 10 based on the BLOSUM62 alignment score matrix was used as the matching criterion. Coding (CDS) and genomic sequences of BraSOTs were also verified according to CoGe FeatView (https:// genomevolution.org/coge/FeatView.pl). The online Compute pI/Mw tool (http://web.expasy. org/compute_pi/) was used to calculate the theoretical isoelectric point and molecular weight of each protein [25].

Chromosomal location and gene structural determinations
Location information for the BraSOT genes on the 10 B. rapa chromosomes, obtained from the latest version of BRAD, was depicted using the MapChart v2.3 (http://www.wageningenur. nl/en/show/Mapchart.htm) and Adobe Photoshop CS4 (http://www.adobe.com/) software. The online Gene Structure Display Server (http://gsds.cbi.pku.edu.cn) was used to generate IDD exon-intron structures including exon positions and gene length [26].

Identification of conserved domains and phylogenetic analysis
The protein sequences obtained from BRAD were analyzed to determine their domain organization using the SMART sequence analysis tool and the Pfam database (http://smart.emblheidelberg.de/). The domains were also verified and named according to the results of NCBI-CD searches against the CDD v3.14 database (47,363 position-specific scoring matrices). To visualize conserved motifs, SOT-domain amino acid sequences were analyzed with the WEBLOGO program (http://weblogo.threeplusone.com/). In addition, BraSOT protein motifs were identified using the MEME program (http://meme-suite.org/tools/meme), with motif discovery mode set to normal, motif number to 20, motif width to 5-50, and all other settings to default values. The results were manually adjusted.
Multiple sequence alignment of complete BraSOT and AtSOT sequences was performed using the T-Coffee program [27]. Phylogenetic and molecular evolutionary analyses were conducted in MEGA 6 [28]. Phylogenetic trees were constructed using the neighbor-joining method with the following parameters: gaps treated as missing data, the pairwise deletion option for total sequence analysis, the p-distance substitution method, and 1000 bootstrap replications to assess internal branch reliability [29].

Plant materials and treatment
B. rapa seedlings were grown in liquid plant nutrient medium prepared with green-leaf-vegetable universal formula [30] under a 16/8h day/night photoperiod at 20±2˚C and 50%-60% humidity. Two-week-old seedlings were prepared for stress treatment. For drought stress, seedlings were treated with 25% PEG-6000 once for 24 h. For salt stress, seedling roots were submerged in 200 mmol/L NaCl for 24 h. For ABA stress, plants were sprayed with 100 μmol/ L ABA once and allowed to continue growing for 12 h. Plants grown in normal Hoagland nutrient solution were used as a control (S1 Fig). Drought and salinity-treated leaves were collected at 6, 12, 24 h after treatment. ABA-treated samples were collected at 3, 6, 12 h. Root, hypocotyl and cotyledon of the control seedlings were also collected. Samples were immediately frozen in liquid nitrogen and stored at −80˚C for further analysis.

Expression analysis by quantitative real-time PCR (qRT-PCR)
Total RNA was extracted using Trizol Reagent, according to the manufacturer's specifications (Thermo Fisher, USA). First-strand cDNA was synthesized by reverse transcription of 2 μg of total RNA using the reverse transcription-PCR system (Thermo Scientific RevertAid RT Kit, USA).
To investigate the expression patterns of BraSOT genes under different stresses, all BraSOT genes were analyzed using real-time RT-PCR. The Brassica actin gene (Accession No. AF111812) was used as an internal control. Gene-specific primers (listed in S1 Table) were designed using the OligoArchitect Online software (http://www.oligoarchitect.com/ OligoArchitect/). The real-time PCR assay mix (20 μL) consisted of 3 μL cDNA sample (diluted 1:10), 10 μL 2× LightCycler1 480 SYBR Green I Master mix (Roche, USA), 0.5 μL of each primer (10 μM) (S1 Table), and 6.0 μL distilled deionized H 2 O. Each real-time PCR assay was run on a Light-Cycler1 480 System (Roche, USA) with an initial denaturation at 95˚C for 6 min, followed by 45 cycles of 95˚C for 15 s and 60˚C for 30 s. The resulting fragments were subjected to melting-curve analysis to verify the presence of gene-specific PCR products. The melting-curve analysis was performed directly after real-time PCR under conditions of 95˚C for 5 s followed by a constant increase from 65 to 97˚C at a 2.50˚C/s ramp rate. The 2 −ΔΔCt method was used to calculate the relative amount of template present in each PCR amplification mixture [31].

Identification of SOT proteins in B. rapa
Using the approaches described in the Methods to identify all the SOT-domain containing proteins in B. rapa, we obtained 56 proteins. To confirm these 56 putative SOT proteins, we performed SMART analysis and the results showed that all 56 proteins contained a sulfotransferase domain according the Pfam database. Therefore, we identified 56 BraSOT proteins in B. rapa, compared with 22 proteins in Arabidopsis. The BraSOT amino acid lengths ranged from 65 (Bra017006) to 589 (Bra017364) residues. Excluding the two shortest BraSOT proteins (Bra036654 and Bra017006), the average BraSOT length was 325.4 amino acids (Table 1), similar to the AtSOTs in Arabidopsis.

Chromosomal localization of the BraSOT gene family
Based on the latest information available in BRAD, we localized 53 of the 56 genes encoding SOT proteins on all 10 B. rapa chromosomes except for chromosome A05 (Fig 1). The locations of three BraSOT genes (Bra034520, Bra034521, Bra041015) assigned to scaffolds could not be determined. As shown in Fig 1, the BraSOT genes were unevenly distributed on the B. rapa chromosomes. Specifically, there were no SOT genes on chromosome A05, 1 on A04, 2 on A06 and A10, 3 on A08, 6 on A01 and A03, 7 on A02, 10 on A07 and 16 on A09. As defined by Holub et al. (2001) [32], a chromosome region containing two or more genes within 200 kb is considered to be a gene cluster. In B. rapa, 27 BraSOT genes formed seven such clusters ( Fig  1). In one case, eight (Bra017364, Bra017365, Bra017368, Bra017369, Bra017370, Bra017371, Bra017372 and Bra017373) of the 16 BraSOT genes on chromosome A09 were closely spaced together in a tandem fashion within a 120-kb region. The seven identified clusters were distributed as follows: two each on chromosomes A07 and A09 and one cluster each on chromosomes A01, A02 and A03. No clusters were located on chromosomes A04, A06, A08 or A10.

Phylogenetic analysis of the SOT family in B. rapa
To reveal the phylogenetic relationships among all 56 BraSOT proteins, phylogenetic analysis was performed with the MEGA software using the full-length amino acid sequences of the SOT proteins and a bootstrap value of 1000 (Fig 2). To obtain a reliable classification within different subfamilies, the SOT domain proteins from Arabidopsis were used as references. Similar to Arabidopsis, the results showed that the all but one BraSOT protein could be divided  into nine subfamilies according to their sequence similarities. One protein with a very short amino acid sequence (Bra017006) formed a single branch. Considering this protein was too short to constitute a complete SOT domain, we did not classify it into any group. The classification of AtSOT proteins on this phylogenetic tree was identical to that in previous research [2], while the proportions of each group in the whole set of SOT proteins in B. rapa and Arabidopsis were different.   [6,8];. The BraSOT proteins contained both regions, and their consensus sequences were highly similar to those of A. thaliana (Fig 4). Region I is located near the N-terminus, whereas region IV is adjacent to the C-terminus. To visualize the distribution of highly conserved motifs in the BraSOT proteins, 20 specific conserved motifs were identified using the MEME motif search tool. The distribution of these motifs in BraSOT proteins is displayed in Fig 5. Among the 56 BraSOT proteins, eight motifs appeared at least 40 times and nine were present fewer than 12 times. The most frequent motif (50 times) was Motif 7. None of these motifs were present in all BraSOT proteins. The motif types of groups VIII and IX appeared to be quite different from those of the other BraSOT groups. Motifs 11, 12, 13, 15, 16 and 18 were specific to groups VIII and IX. Motifs 2 and 3 respectively contained region I at the C-terminus and region IV near the N-terminus, the two amino acid regions involved in PAPS binding. Of the 56 BraSOT proteins, 41 possessed both regions I and IV, while four only contained region I and another four only contained region IV.

Gene structure analysis of the BraSOT gene family
A map of exon-intron structure based on the B. rapa genome and BraSOT CDS sequences was generated with the GSDS tool [26] to reveal the gene structures (Fig 3). This map revealed that 32.14% of BraSOT genes contained introns, a level similar to that of A. thaliana SOT genes (31.82%). Among the 18 intron-containing BraSOT genes, 9 genes had 1 intron, 2 had 3 introns, 2 had 4 introns, 2 had 5 introns, 1 had 7 introns and 2 genes from group IX had 11. Seven of the nine BraSOT genes containing more than three introns were members of groups VIII and I; the other two genes were in groups I and V.

Expression profiles of BraSOT genes in different tissus and the response to drought, salt and ABA
To investigate the potential functions of BraSOT family, Real-time RT-PCR was used to analyze the expression of BraSOT genes in different tissues of B. rapa seedlings. The gene expression level in leaf with no treatment was used as the reference for calculation, and results are presented in S2 Table. BraSOT genes showed different expression patterns in different tissues ( Fig 6).Notably, 3 genes in group IV (Bra034065, Bra034066, Bra027963), all in group VI (Bra005921, Bra009300, Bra028711) and most in group VII showed higher transcription in root, hypocotyl and cotyledon than in leaf (Fig 6). The tissue-specific expression BraSOT genes indicate the probable roles of members of different groups in various physiological and developmental processes.
To further explore BraSOT gene expression patterns under abiotic stresses and identify genes important for improving tolerance to abiotic stresses, B. rapa seedlings were challenged with 25% PEG-6000, 200mmol/L NaCl and 100μmol/L abscisic acid (ABA). The expression patterns of most BraSOT genes showed transcriptional changed under drought, salinity, and ABA stresses in different treatment time points, and this suggested that the response of Bra-SOTs to multiple stresses is a dynamic process (S2 Table and   Genome-wide analysis of sulfotransferase genes in Brassica rapa L. 56 tended to be down-regulated. Taken together, we found the expression patterns of some BraSOT genes are similar under drought, salinity, and ABA treatments (Fig 6).

Discussion
Members of the SOT family have been found in most organisms including mammals and plants. These enzymes use PAPS as the sulfuryl donor and transfer the sulfonate group to an appropriate hydroxyl group on several classes of substrates. Although several SOTs have been identified and characterized in mammals and plants, the biological functions of SOTs in plants are still largely unknown. SOTs modulate physiological processes such as growth, development, and adaptation to stress by affecting the biological activity of certain compounds [11,18,21]. In Brassica napus, a SOT catalyzes the O-sulfonation of brassinosteroids and thereby specifically abolishes the biological activity of 24-epibrassinolide [14]. To further determine the roles of this multi-protein family, identification and analysis of more SOTs in plants is necessary.

Identification and classification of B. rapa SOT proteins
In the present study, we analyzed the SOT protein family in B. rapa and identified 56 BraSOT proteins. The density (number/genome size) of SOT genes in the B. rapa genome is 0.197, similar to that in Arabidopsis (0.163) and much higher than the value of 0.093 in O. sativa [34]. The 56 SOT proteins have an average length of approximately 325.4 amino acids, similar to SOTs in A. thaliana. Fifty-three of the 56 BraSOT genes were localized to the 10 chromosomes of B. rapa and were found on all chromosomes except chromosome 5 (Fig 1). Because only scaffold information was available for the other three BraSOT genes (Table 1), they could not be positioned on an exact chromosome. Seven gene clusters including 27 BraSOT genes were detected on five chromosomes and accounted for nearly half (48.21%) of all BraSOTs. For example, eight (Bra017364, Bra017365, Bra017368, Bra017369, Bra017370, Bra017371,  Bra017372 and Bra017373) of the 16 BraSOT genes on chromosome A09 were tandemly clustered in a 120-kb region. The distribution of gene clusters on A02, A03, A07 and A09 suggests that tandem duplication has occurred during the process of Brassica evolution [35].
Considering the results of other studies, the division of the 56 BraSOT genes into nine groups is reasonable. In Arabidopsis, 18 SOT genes were initially classified into seven subfamilies (I-VII) [14]. In 2008, three new AtSOT genes (At3G50620, At2g15730 and At4g34420) were added to the AtSOT gene family and were respectively designated AtSOT19, AtSOT20 and AtSOT21. These three new genes were classified into group VIIIof the Arabidopsis SOT gene family [2]. A tyrosylprotein SOT gene (AT1G08030) was identified in 2009 and named AtTPST [13]. Because the AtTPST protein showed low similarity (less than 15%) to any other AtSOT of Arabidopsis, we considered it to constitute a new group of the AtSOT protein family, group IX. In the present investigation, two members of Group IX showed distinct features compared with the other eight groups: they contained SOT_2 but not SOT_1 and their motif patterns were very different. Our results indicate that tyrosylprotein SOT genes should be classified into an independent group, Group IX.

Domain organization diversity and putative function analysis
Forty-eight BraSOT proteins contain only one SOT_1 domain and no other domains, while six BraSOT proteins have SOT_1 and SOT_2 domains and two proteins in Group IX have one SOT_3 domain. Transmembrane regions were only detected in six BraSOT proteins, all in groups VIII and IX, using the SMART sequence analysis tool [33], indicating that most Bra-SOT proteins are not associated with membranes. Bra034521 in Group I contains several pentatricopeptide repeat domains, which are possibly involved in RNA stabilization [36] or editing [37], suggesting that Bra034521 is involved in those functions in B. rapa. Bra017364 in Group V contains a translin domain. Translin-domain proteins are DNA-binding proteins that specifically recognize consensus sequences at breakpoint junctions in chromosomal translocations [38]. The role of Bra017364 in B. rapa is thus most likely related to chromosomal translocation. In Group VII, Bra027118 contains a coiled-coil domain. Such domains are involved in important biological functions, such as regulation of transcription factors, which suggests that Bra027118 is involved in the regulation of gene expression in B. rapa. The information presented here provides a useful reference for further study of the functions of these genes in B. rapa.
The motif patterns of the genes are relatively similar in each SOT group. All BraSOT proteins in groups I to VII possess conserved regions I and IV, which are involved in PAPS binding. These conserved regions are absent from groups VIII and IX, suggesting that group-VIII and group-IX BraSOT proteins have PAPS-binding regions that differ from those of other BraSOTs.
In summary, this investigation was the first systematic analysis of the SOT protein family in B. rapa. Our findings regarding gene and protein structure, gene distribution and protein classification should aid further investigations of the functions of BraSOT genes in B. rapa and will provide a useful reference for the study of SOT genes in other plants. Our data will also promote study of the evolution of polyploid genomes and the genetic improvement of Brassica oil and vegetable crops.

The response of B. rapa SOT genes to abiotic stresses
It has been reported that AtSOT12 gene expression was strongly induced by salt, osmotic stress and hormone treatments. The T-DNA knock-out mutant sot12 exhibited hypersensitivity to NaCl and ABA during seed germination [20]. While ABA also has recently been reported to play crucial roles in responding to abiotic stresses, such as drought and salinity [39,40]. However, it is still unknown which BraSOTs is the important ABA receptor in response to abiotic stress in B. rapa. To investigate the possible roles of BraSOT genes in B. rapa growth and development, we analyzed the transcription of all 56 BraSOT genes by qRT-PCR (Fig 6). The Bra-SOT genes displayed different expression patterns, indicating that they may have different functions in B. rapa development. As shown in Fig 6, the expression of 5 BraSOT genes (Bra034065, Bra017370, Bra009300, Bra027880, Bra015936) was substantially up-regulated under PEG-6000, salinity and ABA treatment. The high expression levels of these genes indicate they have important roles in the plant response to abiotic stresses and the ABA signal transduction. However, some members of group V subfamily(Bra017364 and Bra026538), grouped with Bra017370, were down-regulated in response to salt and drought stresses. This is similar to the AtSOTs. For example, the AtSOT12 gene was clustered with and encoded a protein with high sequence similarity to AtSOT11 and AtSOT13, but mutations in AtSOT11 and AtSOT13 did not cause a more sensitive phenotype to SA like AtSOT12 mutation [20]. Our research suggests that some group-V BraSOTs are involved in plant responses to abiotic stresses, while others, even though they have similar sequences, seem to have no role in drought or salt stress responses. The functions of these genes require further study. Most Bra-SOT genes in group VIII and IX were down-regulated by drought, salinity and ABA treatment. Moreover, the structure of this two group genes were particularly different from other group. They have more introns (Fig 3). This may suggesting that these BraSOTs have disparate functional roles under abiotic stresses or involved in a negative-feedback regulatory mechanism. In Arabidopsis, AtSOT12, which is homologous to the Bra017370, is overexpressed to enhance abiotic tolerance. In our study, Bra017370 expression level increased by drought, salinity and ABA. We thus speculated that overexpression of Bra017370 could also promote abiotic resistance in B. rapa. ABA is produced rapidly in response to stress under drought and salinity conditions and then plays an important role in the regulation of the stress response [41]. Consequently, perhaps abiotic stress induced gene in this research (Bra034065, Bra009300, Bra027880, Bra015936) could also be overexpressed to enhance B.rapa abiotic tolerance. In conclusion, the results of the stress response experiments, combined with the analysis of the gene and protein structure (Figs 3 and 5), suggest that some BraSOTs respond to drought, salinity, and ABA treatments but have different mechanism. These BraSOTs may potentially be utilized for improving the tolerance of B. rapa to abiotic stresses.