Gene Structures, Evolution, Classification and Expression Profiles of the Aquaporin Gene Family in Castor Bean (Ricinus communis L.)

Aquaporins (AQPs) are a class of integral membrane proteins that facilitate the passive transport of water and other small solutes across biological membranes. Castor bean (Ricinus communis L., Euphorbiaceae), an important non-edible oilseed crop, is widely cultivated for industrial, medicinal and cosmetic purposes. Its recently available genome provides an opportunity to analyze specific gene families. In this study, a total of 37 full-length AQP genes were identified from the castor bean genome and assigned to five subfamilies on the basis of sequence similarities: 10 plasma membrane intrinsic proteins (PIPs), 9 tonoplast intrinsic proteins (TIPs), 8 NOD26-like intrinsic proteins (NIPs), 6 X intrinsic proteins (XIPs) and 4 small basic intrinsic proteins (SIPs). Functional prediction based on the analysis of the aromatic/arginine (ar/R) selectivity filter, Froger's positions and specificity-determining positions (SDPs) showed a remarkable difference in substrate specificity among subfamilies. Homology analysis supported the expression of all 37 RcAQP genes in at least one of the examined tissues, e.g., root, leaf, flower, seed and endosperm. Furthermore, global expression profiles obtained from deep transcriptome sequencing data revealed diverse expression patterns among various tissues. The current study presents the first genome-wide analysis of the AQP gene family in castor bean. Results obtained from this study provide valuable information for future functional analysis and utilization.


Introduction
Aquaporins (AQPs) are a special class of integral membrane proteins that belong to the ancient major intrinsic protein (MIP) superfamily [1,2]. Although they first attracted considerable interest for their high permeability to water, increasing evidence has shown that some of them also transport certain small molecules, e.g., glycerol, urea, boric acid, silicic acid, ammonia (NH3), carbon dioxide (CO2) and hydrogen peroxide (H2O2) [3,4]. Since the first AQP gene was reported in human erythrocytes [5], homologs have been identified over the past three decades in all types of organisms, including eubacteria, archaea, fungi, animals and plants [2,4]. The AQPs are characterized by six transmembrane helices (TM1-TM6) connected by five loops (i.e. LA-LE) as well as two highly conserved NPA motifs located at the N-termini of two half helices (i.e. HB and HE) in LB and LE. The NPA motifs, which create an electrostatic repulsion of protons and act as a size barrier, form one selectivity region of the pore, whereas another region called the aromatic/arginine (ar/R) selectivity filter (i.e. H2 in TM2, H5 in TM5, LE1 and LE2 in LE), which renders the pore constriction site diverse in both size and hydrophobicity, determines the substrate specificity [6][7][8]. Based on statistical analysis, Froger et al. proposed five conserved amino acid residues (called Froger's positions, P1-P5) for discriminating glycerol-transporting aquaglyceroporins (GLPs) from water-conducting AQPs: GLPs usually feature an aromatic residue at P1, an acidic residue at P2, a basic residue at P3, and a proline followed by a nonaromatic residue at P4 and P5, as in Y108-D207-K211-P236-L237 observed in the Escherichia coli glycerol facilitator GlpF, in contrast to A103-S190-A194-F208-W209 in the pure water channel AqpZ [9].
More recently, based on the analysis of structure-resolved and/or functionally characterized AQPs, nine specificity-determining positions (SDPs) for non-aqua substrates, i.e., urea, boric acid, silicic acid, NH3, CO2 and H2O2, were also predicted for each group [10]. Surveys at the global genome level indicate that the AQPs in terrestrial plants are especially abundant and diverse [11][12][13][14][15]. Based on sequence similarities, plant AQPs are divided into seven main subfamilies, i.e., plasma membrane intrinsic proteins (PIPs), tonoplast intrinsic proteins (TIPs), NOD26-like intrinsic proteins (NIPs), small basic intrinsic proteins (SIPs), uncategorized X intrinsic proteins (XIPs), GlpF-like intrinsic proteins (GIPs) and hybrid intrinsic proteins (HIPs) [16][17][18][19]. The first four subfamilies are widely distributed, whereas GIPs and HIPs are found only in algae and moss, and XIPs in moss and several dicots including poplar (Populus trichocarpa) [11][12][13][14][15][16][17][18][19][20].
Castor bean (Ricinus communis L., 2n = 20) is an annual or perennial shrub that belongs to the Euphorbiaceae family. The castor oil extracted from the seeds is an important raw material used for industrial, medicinal and cosmetic purposes [21]. Although it originated in Africa, castor bean is now cultivated in many tropical, subtropical and warm temperate regions around the world, especially under unfavorable conditions such as barren soil, drought and salt stress [22][23][24]. The recent completion of its draft genome and the available transcriptome datasets provide an opportunity to analyze specific gene families in castor bean [25][26][27][28][29][30][31][32]. In the present study, a genome-wide search was performed to identify the castor bean AQP (RcAQP) genes. Furthermore, functional prediction was performed based on the ar/R filter, Froger's positions and SDPs [6,7,9], and the expression profiles were examined using deep transcriptome sequencing data. Results obtained from this study provide global information for understanding the molecular basis of the AQP gene family in castor bean.

Identification of RcAQP Genes
The genome sequences of castor bean [25] were downloaded from Phytozome v9.1 (http://www.phytozome.net/), whereas the nucleotides, Sanger ESTs (expressed sequence tags) and raw RNA sequencing reads were downloaded from NCBI (http://www.ncbi.nlm.nih.gov/). The amino acid sequences of Arabidopsis (Arabidopsis thaliana) and poplar AQPs obtained from Phytozome v9.1 (the accession numbers are available in S1 Table) were used as queries to search for castor bean AQP homologs. Sequences with an E-value of less than 1E-5 in the tBlastn search [33] were selected for further analysis. The predicted gene models were further validated with cDNAs, ESTs and raw RNA sequencing reads derived from various tissues/organs such as root, stem, leaf, flower, seed, embryo, endosperm and callus [25][26][27][28][29][30][31][32]. Gene structures were displayed using GSDS [34]. Homology searches for nucleotides, ESTs or RNA sequencing reads were performed using Blastn [33], and sequences with a similarity of more than 98% were taken into account.
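The two homology thresholds described above can be expressed as simple filters over BLAST tabular output. The sketch below is illustrative only: it assumes the standard 12-column tabular format (`-outfmt 6`), uses percent identity as a stand-in for the "similarity" criterion, and all function names are hypothetical rather than taken from this study.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float  # percent identity (column 3 of BLAST tabular output)
    evalue: float    # expectation value (column 11)


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated BLAST hits, skipping blank and comment lines."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        hits.append(Hit(fields[0], fields[1], float(fields[2]), float(fields[10])))
    return hits


def candidate_homologs(hits: Iterable[Hit], max_evalue: float = 1e-5) -> List[Hit]:
    """Keep tBlastn hits with an E-value below the cutoff (1E-5 in the text)."""
    return [h for h in hits if h.evalue < max_evalue]


def supporting_transcripts(hits: Iterable[Hit], min_identity: float = 98.0) -> List[Hit]:
    """Keep Blastn hits above the identity cutoff (assumed proxy for >98% similarity)."""
    return [h for h in hits if h.identity > min_identity]
```

In practice the first filter would be applied to the tBlastn search of AQP protein queries against the genome, and the second to the Blastn search of ESTs or reads against the candidate gene models.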

Sequence Alignments and Phylogenetic Analysis
Multiple sequence alignments using deduced amino acid sequences were performed with ClustalX [35], and the unrooted phylogenetic tree was constructed by the maximum likelihood method using MEGA 6.0 [36]. The reliability of branches in the resulting trees was assessed with 1,000 bootstrap resamplings. Classification of AQPs into subfamilies and subgroups was done as described before [16,19].

Gene Expression Analyses
To analyze the global expression profiles of RcAQP genes among different tissues or developmental stages, RNA sequencing data of leaf, flower, endosperm (II/III, V/VI) and seed described before [30] were examined: the expanding true leaves, appearing after the first cotyledons and leaf-pair, represent the leaf tissue; the male flower tissue includes pollen and anthers but excludes sepals; the germinating seed tissue was obtained by soaking dry seeds in running water overnight followed by germination in the dark for 3 days; and the endosperm tissue includes two representative stages termed stages II/III (endosperm free-nuclear stage) and V/VI (onset of cellular endosperm development). The clean reads were obtained by removing adaptor sequences, adaptor-only reads, reads with an "N" rate larger than 10% ("N" representing ambiguous bases) and low-quality reads containing more than 50% bases with Q-value ≤ 5. Then, the clean reads were mapped to the 37 identified RcAQP genes (cDNA) and released transcripts using Bowtie 2 [40], and the mapped reads were counted. The abundance of each transcript can be measured by its read count. To remove technical biases inherent in the sequencing approach, the widely used RPKM (reads per kilobase per million reads) method [41], which was developed to correct for the length of the RNA species and the sequencing depth of a sample, was adopted for the expression annotation. Unless otherwise stated, the tools used in this study were run with default parameters.
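As a worked illustration of the RPKM normalization, the following sketch computes RPKM from raw read counts. It is a minimal example under stated assumptions, not the pipeline used in this study: the function names are hypothetical, and the denominator is taken to be the total number of mapped reads in the sample.

```python
from typing import Dict


def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = reads * 1e9 / (transcript length in bp * total mapped reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def rpkm_table(counts: Dict[str, int], lengths_bp: Dict[str, int],
               total_mapped_reads: int) -> Dict[str, float]:
    """Apply the RPKM formula to every gene in a {gene: read count} mapping."""
    return {gene: rpkm(n, lengths_bp[gene], total_mapped_reads)
            for gene, n in counts.items()}
```

For example, a 1,000-bp transcript with 500 mapped reads in a library of 10 million mapped reads has an RPKM of 500 × 10^9 / (1,000 × 10^7) = 50.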

Identification and Classification of RcAQP Genes
Via a comprehensive homology analysis, a total of 37 loci putatively encoding AQP-like genes were identified from the castor bean genome, corresponding to 36 loci in the genome annotation (Table 1) [25]. Among them, the locus 28747.t000001 was predicted to encode 243 residues (28747.m000131) with a gene length of 11054 bp (lacking part of the 3' sequence); however, EST and read mapping supported the existence of two genes, denoted RcXIP2;1 and RcXIP3;1, both of which harbor one intron and putatively encode 306 and 304 residues, respectively (see S1 File). In addition, although most predicted gene models were validated with ESTs and RNA sequencing reads, three loci (i.e. 28962.t000006, 30101.t000004 and 29816.t000013) seem not to be properly annotated. In Phytozome v9.1, the locus 28962.t000006 was predicted to harbor two introns and encode 198 residues (28962.m000437), which is shorter than any other PIP subfamily member; however, two nucleotide sequences (accession numbers HB466472 and HB466473) [42] and three ESTs (accession numbers EE257493, EE260412 and EE258867) suggested that this locus harbors four introns and encodes 270 or fewer residues depending on the alternative splicing isoform (at least four isoforms exist) (see S2 File), and the longest transcript, consistent with other PIPs, was selected for further analyses. The locus 30101.t000004 was predicted to encode 203 residues (30101.m000372); however, EST mapping indicated that 143 bp of coding sequence close to the second intron is absent from the assembled genome, which was further validated with PCR amplification and sequencing (see S3 File). The locus 29816.t000013 was predicted to harbor seven introns and encode 367 residues (29816.m000676), which is considerably longer than any other NIP; however, thousands of RNA sequencing reads indicated that this locus harbors only four introns and putatively encodes 269 residues (see S4 File).
Although the current draft genome of castor bean comprises 25,763 scaffolds that are not anchored to the 10 chromosomes [25], we still observed that 8 scaffolds harbor two AQP-encoding loci each, whereas the other 21 scaffolds contain only one (Table 1).

Analysis of Exon-Intron Structure
The exon-intron structures of the 37 RcAQP genes were analyzed based on the optimized gene models. Although the ORF (open reading frame) length of each gene is similar (627-930 bp), the gene size (from start to stop codons) is distinct (705-4934 bp) (Table 1 [...] 86-1469 bp and 91-650 bp, respectively); however, RcNIP5;1 contains three instead. Two of the three RcSIP1s harbor no introns; in contrast, RcSIP1;1 and RcSIP2;1 contain two. Most RcXIPs contain one intron, except for RcXIP1;4 and RcXIP2;1, which contain zero and two, respectively (Table 1 and Fig 2).

Structural Features of RcAQPs
Sequence analysis showed that the 37 deduced RcAQPs consist of 208-309 amino acids, with a theoretical molecular weight of 22.49-33.22 kDa and a pI value of 4.93-9.97. Homology analysis of these deduced proteins revealed a high sequence diversity within and between the five subfamilies. The sequence similarity of 65 [...] S2 Table).
Topological analysis showed that almost all RcAQPs were predicted to harbor six TMs, except for RcXIP1;4, which contains only four; this is consistent with the results from multiple alignments with structure-proven AQPs (Table 2 and S5 File). The subcellular localization of each RcAQP was also predicted (Table 2). RcPIPs, with an average pI value of 7.89, and RcNIPs, with an average pI value of 8.44, are localized to plasma membranes. RcTIPs, with an average pI value of 5.63, are mainly localized to vacuoles, though several members (RcTIP3;1 and RcTIP4;1) were mispredicted to target the cytosol by WoLF PSORT. RcSIPs (with an average pI value of 9.79) and RcXIPs (with an average pI value of 7.63) are mainly predicted to target the plasma membrane by Plant-mPLoc; in contrast, the localizations predicted by WoLF PSORT are diverse, including the plasma membrane, chloroplast, vacuole, peroxisome and cytosol. In addition, based on the multiple alignments with structurally or functionally characterized AQPs, the conserved residues typical of dual NPA motifs, the ar/R selectivity filter, five Froger's positions and nine SDPs were also identified (Tables 2 and 3, and S6 File).

RcPIP Subfamily
All RcPIPs were identified to have similar sequence length; however, RcPIP2s (270-288 residues) can be distinguished from RcPIP1s (286-288 residues) by their relatively shorter N-terminal and longer C-terminal sequences (see S5 File). The five RcPIP1s share sequence similarities of 90.7-99.7%, whereas the similarities among the five RcPIP2s are 73.3-94.5%. Between RcPIP1 and RcPIP2 members, sequence similarities of 59.1-65.9% were observed (see S1 Table). The dual NPA motifs, ar/R filter (F-H-T-R), and four of five Froger's positions are highly conserved in RcPIPs (Table 2). In contrast, the P1 position is more variable, with the appearance of an E, Q or M residue (Table 2). In addition, two phosphorylation sites corresponding to S115 and S274 in Spinacia oleracea PIP2;1 [8] are invariable in RcPIP2s, and the former is even highly conserved in all RcPIPs, RcTIPs and RcXIPs, and in most RcSIP1s, except for the S→T substitution in several members (see S5 File), implying their regulation by phosphorylation.

RcTIP Subfamily
[...] S1 Table). Dual NPA motifs and three Froger's positions (i.e. P3, P4 and P5) are highly conserved in RcTIPs (Table 2); in contrast, residue substitutions were observed at the P1 and P2 positions: the usual T is replaced by A in RcTIP5;1 at the P1 position, and the S is replaced by A in RcTIP3;1 and RcTIP5;1 at the P2 position. In the ar/R filter, the usual H residue at the H2 position and I at the H5 position are replaced by N and V in RcTIP5;1, respectively; at the LE1 position, members of the RcTIP1, RcTIP3 and RcTIP4 subgroups favor an A, whereas RcTIP2 and RcTIP5 members favor a G residue; residues at the LE2 position are more variable, including V, R or C (Table 2).

RcSIP Subfamily
There are only four members composing two subgroups in the RcSIP subfamily, which consist of 234-240 residues [...] (Table 2).

RcXIP Subfamily
RcXIPs vary from 208 to 309 residues in length (Table 2). With the exception of the RcXIP1 subgroup, which contains four members, the other two subgroups harbor only one member each. RcXIP2;1 shares 37.6-45.2% sequence similarity with RcXIP1s, which is considerably lower than that between RcXIP2;1 and RcXIP3;1 (76.2%), and slightly lower than that between RcXIP3;1 and RcXIP1s (39.5-47.1%) (see S2 Table). Among the four RcXIP1s, the highest sequence similarity of 90.6% was observed between RcXIP1;1 and RcXIP1;2, and the lowest of 42.8% between RcXIP1;3 and RcXIP1;4. Compared with other RcXIPs, RcXIP1;4 is relatively short (only 208 residues). Further sequence alignments indicated that RcXIP1;4 harbors only the first NPA motif and the H2 and P1 positions (Table 2 and [...] (Table 2). In addition, like most XIPs [11,47], two highly conserved C residues in the LG/AGC motif of LC and the NPARC motif of LE were also found in RcXIPs, except for RcXIP1;4, which lacks LE (see S5 File).

Expression Profiles of RcAQP Genes
To gain more information on the role of RcAQP genes in castor bean, RNA sequencing data of leaf, flower, endosperm (II/III, V/VI) and seed were investigated. Results showed that all 37 RcAQP genes were detected in at least one of the examined tissues, i.e., 35 in flower, 34 in leaf, 33 in seed, 31 in stage II/III endosperm and 29 in stage V/VI endosperm. According to the RPKM annotation, the leaf tissue was shown to harbor the most transcripts, followed by the seed and flower tissues, and the endosperm has the fewest. Although the flower tissue harbors the most expressed AQP genes, its total expression level is only 0.46- and 0.71-fold that of the leaf and seed tissues, respectively. In leaf, seed and flower tissues, the PIP subfamily contributes the major transcripts; in contrast, the TIP subfamily contributes the most in the endosperm. In the leaf tissue, [...] RcAQP genes is considerably low in the stage V/VI endosperm; nevertheless, RcTIP3;1 occupies more than 95% of the total TIP transcripts or 69% of the total AQP transcripts. Although most genes were shown to be constitutively expressed in the examined tissues, three genes seem to be tissue-specific, i.e., RcNIP4;2 in flower, RcXIP1;4 in leaf and RcXIP3;1 in seed. Castor bean AQP isoforms seem to play different roles in various tissues. For example, RcPIP2;4 is the most abundant transcript in both the flower and seed tissues, whereas RcPIP1;4 and RcTIP3;1 were shown to be expressed most in the leaf and endosperm tissues, respectively, suggesting their crucial roles in these tissues. In addition, the transcripts of RcNIP5;1, RcNIP7;1 and RcXIP1;1 were also shown to be relatively abundant in seed, flower and leaf, respectively (Fig 3). According to the cluster analysis shown in Fig 3, more similar expression patterns of RcAQP genes were observed between leaf and seed, and between the two stages of endosperm; two groups with distinct expression levels were also observed, where the more abundant group includes eight PIPs (i.e. RcPIP2;2, RcPIP1;2, RcPIP1;3, RcPIP2;3, RcPIP2;4, RcPIP1;4, RcPIP2;1 and RcPIP1;5), five TIPs (i.e. RcTIP1;1, RcTIP2;1, RcTIP4;1, RcTIP1;2 and RcTIP3;1), two NIPs (i.e. RcNIP1;1 and RcNIP5;1), four SIPs (i.e. RcSIP1;2, RcSIP1;3, RcSIP2;1 and RcSIP1;1).

Small Number but High Diversity of Castor Bean AQP Genes
Compared with animals and microbes, AQPs are particularly abundant and diverse in land plants. To date, a high number of homologs have been identified from several plant species, i.e., 19 from Selaginella moellendorffii [18], 23 from Physcomitrella patens [9], 23 from Vitis vinifera [20], 33 from Oryza sativa [48], 35 from A. thaliana [16], 36 from Zea mays [49], 41 from Solanum tuberosum [14], 47 from Solanum lycopersicum [13], 55 from P. trichocarpa [19], 66 from Glycine max [15], and 71 from Gossypium hirsutum [12]. In contrast, the characterization of castor bean AQPs is still in its infancy. In the present study, a total of 37 AQP genes were identified from castor bean through mining the genome and transcriptome datasets. Previously, one review also reported the presence of 37 RcAQP genes in castor bean; however, that result depended solely on the automatic genome annotation, and only 34 of the 37 RcAQP genes identified in this study were mentioned [50]. Although that paper focused on the analysis of TIPs, only seven RcTIP genes were described, whereas RcTIP2;2 (30101.m000372) and RcTIP5;1 (30147.m014231) were missed. Instead, three transcripts (i.e. 28962.m000435, 28962.m000436 and 30170.m014271) were misannotated as PIPs. In addition, four RcXIP1 genes were also misannotated as PIPs (S3 Table). The number of castor bean AQP family members is comparable to that of Arabidopsis and maize but smaller than that of potato, tomato, poplar, soybean and cotton.
Since the AQP genes in the model plants Arabidopsis and poplar have been well characterized [16,19,47,51], their deduced proteins were added to the phylogenetic analysis of RcAQPs, which assigned the 37 RcAQPs to five subfamilies. Compared with Arabidopsis, which lacks XIPs, castor bean contains six XIPs and one more SIP but fewer members of PIPs, TIPs and NIPs. In contrast, the number of members in each of the five castor bean subfamilies was shown to be smaller than that of poplar (Fig 4). With the exception of the XIP subfamily, the further classification of RcAQP subfamilies into subgroups is consistent with Arabidopsis, i.e., 2 PIP subgroups, 5 TIP subgroups, 7 NIP subgroups and 2 SIP subgroups. However, it should be noted that, as shown in Fig 1, the classification of AtNIP2;1 and AtNIP3;1 was not well resolved. In the case of AtNIP2;1, it shares the highest similarity of 67.0% with AtNIP1;2 in Arabidopsis, 64.7% with RcNIP1;1 in castor bean, or 62.0% with PtNIP1;2 in poplar. Given the same ar/R filter (W-V-A-R) and its closer clustering with the NIP1 subgroup, we recommend classifying AtNIP2;1 into the NIP1 subgroup; accordingly, no NIP2s are retained in Arabidopsis, in contrast to the NIP2s seen in castor bean (RcNIP2;1) and poplar (PtNIP2;1), which harbor a G-S-G-R filter. In the case of AtNIP3;1, although it was clustered with the NIP4 subgroup, its closest homolog is AtNIP1;2 (61.7%) in Arabidopsis, RcNIP1;1 (61.6%) in castor bean, or PtNIP1;1 (60.3%) in poplar; thus AtNIP3;1 can also be designated as an NIP1 member. In addition, according to the phylogenetic analysis and sequence similarity, we propose to rename PtNIP1;5, PtNIP1;3, PtNIP1;4, PtNIP3;3, PtNIP3;4, PtNIP3;1, PtNIP3;2 and PtNIP3;5 as PtNIP3;1, PtNIP4;1, PtNIP4;2, PtNIP5;1, PtNIP5;2, PtNIP6;1, PtNIP6;2 and PtNIP7;1, respectively (see S1 Table).
According to the nomenclature based on the ar/R filter that classes NIPs into subgroups NIP I, NIP II and NIP III [52,53], RcNIP1;1, RcNIP3;1 and two RcNIP4s can also be assigned to the subgroup NIP I, and three members (i.e. RcNIP5;1, RcNIP6;1 and RcNIP7;1) to the subgroup NIP II, whereas RcNIP2 forms a new subgroup termed the subgroup NIP III as observed in rice and maize [53]. Since no XIP homolog was found in the Arabidopsis genome, the nomenclature proposed by Lopez et al. [19] for poplar was adopted to divide RcXIPs into three subgroups. Besides being supported by high bootstrap values, XIP1s are characterized by the ar/R filter of V-F-V-R, XIP2s by I-F-V-R, and XIP3s by V-Y-A-R.
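The subgroup-specific ar/R signatures listed above lend themselves to a simple lookup. The sketch below is illustrative only and is not the classification procedure used in this study, which relied on phylogenetic analysis; the function name and the H2-H5-LE1-LE2 string format are assumptions made for the example.

```python
from typing import Optional

# ar/R filter signatures (H2, H5, LE1, LE2) for the XIP subgroups, as stated in the text.
AR_R_TO_XIP_SUBGROUP = {
    ("V", "F", "V", "R"): "XIP1",
    ("I", "F", "V", "R"): "XIP2",
    ("V", "Y", "A", "R"): "XIP3",
}


def xip_subgroup(ar_r_filter: str) -> Optional[str]:
    """Return the XIP subgroup for a filter such as 'V-F-V-R', or None if unmatched."""
    residues = tuple(part.strip().upper() for part in ar_r_filter.split("-"))
    return AR_R_TO_XIP_SUBGROUP.get(residues)
```

A filter that matches none of the three signatures, such as the F-H-T-R filter conserved in RcPIPs, returns `None` rather than a guess.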
As seen in Arabidopsis and poplar, gene pairs were also observed in the castor bean AQP gene family, though the number is considerably small (Fig 1). For example, five AtPIP1s were clustered together apart from PIP1s of castor bean and poplar; RcPIP1;2, RcPIP1;3 and RcPIP1;4 were clustered with PtPIP1;3; RcPIP2;1 and RcPIP2;2 were clustered with PtPIP2;3 and PtPIP2;4; four RcXIP1s were clustered with two PtXIP1s; RcSIP1;2 and RcSIP1;3 were clustered with PtSIP1;3 and PtSIP1;4. It is well established that poplar underwent one whole-genome triplication event (designated γ) and one doubling event, whereas Arabidopsis underwent the same γ event and two independent doubling events [54][55][56][57][58][59]. These duplication events mainly contribute to the gene expansion in these two plant species. Nevertheless, in contrast to poplar, the Arabidopsis genome encodes relatively fewer AQP genes due to massive gene loss and chromosomal rearrangement after genome duplications [54,60]. According to comparative genomics analyses, the γ duplication occurred approximately 117 million years ago, shortly before the origin of core eudicots [61]. As a core eudicot plant, castor bean was shown to have undergone only the shared whole-genome γ duplication [25]. Since most gene pairs tend to be clustered in the same scaffolds, tandem duplications are likely to be the main force for their expansion.

Subcellular Localization and Functional Inference of RcAQPs
In comparison to non-plants, plant AQPs exhibit a broader subcellular localization, including the plasma membrane, vacuole, endoplasmic reticulum (ER), Golgi apparatus, mitochondrion and chloroplast, corresponding to the high degree of compartmentalization of plant cells [3,62]. Our subcellular localization prediction of RcAQPs included the plasma membrane, vacuole, chloroplast, peroxisome and cytosol. As observed in other plants and suggested by their names [62], basic RcPIPs and acidic RcTIPs are localized to plasma membranes and vacuoles, respectively. All RcNIPs were predicted to target the plasma membrane, though their homologs in other organisms were determined to localize to the plasma membrane, ER or peribacteroid membrane of root nodules [63][64][65]. In contrast to the diverse localizations predicted by WoLF PSORT, all XIPs were predicted to localize to the plasma membrane by Plant-mPLoc, which is consistent with experimental results [66]. RcSIP2;1 was predicted to target the ER as reported in Arabidopsis and grapevine [67,68]; in contrast, the three RcSIP1s were predicted to localize to the plasma membrane, vacuole and chloroplast. Therefore, further investigations are required on the subcellular localization of RcAQPs.
Although plant AQPs were first known for their high water permeability when expressed in Xenopus oocytes or yeast cells, increasing evidence has shown that some of them also participate in the transport of other small molecules such as glycerol, urea, boric acid, silicic acid, NH3, CO2 and H2O2 [2]. As shown in Table 3, most RcAQPs exhibit AqpZ-like Froger's positions, which favor the permeability of water. In contrast, NIP subfamily members possess mixed key residues of GlpF for P1 and P5, and AqpZ for P2-P4. Given the glycerol permease activity of soybean NOD26 and Arabidopsis NIPs [69,70], RcNIPs are likely to transport glycerol and may play roles in oil formation/translocation.
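The rule of thumb behind Froger's positions, as summarized in the Introduction, can be sketched as a per-position check. This is a simplified illustration rather than the analysis performed in this study, which was based on multiple alignments: the function name is hypothetical, and the residue classes (aromatic = F/W/Y, acidic = D/E, basic = K/R/H) are assumptions made for the example.

```python
from typing import Dict

AROMATIC = set("FWY")  # assumed aromatic residues
ACIDIC = set("DE")     # assumed acidic residues
BASIC = set("KRH")     # assumed basic residues


def glpf_like_positions(froger: str) -> Dict[str, bool]:
    """Flag which of P1-P5 match the aquaglyceroporin (GlpF-like) pattern.

    `froger` lists the five residues as a string such as 'Y-D-K-P-L'.
    """
    p1, p2, p3, p4, p5 = (r.strip().upper() for r in froger.split("-"))
    return {
        "P1": p1 in AROMATIC,      # aromatic residue
        "P2": p2 in ACIDIC,        # acidic residue
        "P3": p3 in BASIC,         # basic residue
        "P4": p4 == "P",           # proline
        "P5": p5 not in AROMATIC,  # nonaromatic residue
    }
```

With the residues quoted in the Introduction, GlpF (Y-D-K-P-L) matches at all five positions and AqpZ (A-S-A-F-W) at none; a mixed profile, as described for the NIPs, would show matches at only some positions.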
Although highly variable in the ar/R filter, plant TIPs were shown to transport water as efficiently as PIPs [79]. Additionally, they also allow urea, NH3 and H2O2 through [80,81]. As shown in Table 3, all RcTIPs represent urea-type SDPs, whereas RcTIP5;1 represents H2O2-type SDPs, indicating similar functionality. Compared with typical NH3 SDPs, RcTIP2;2 seems to represent novel SDPs (T-K-T-V-A-S-A-P-S) with the substitution of S for A/R/T at SDP9.
As well as glycerol and water, plant NIPs have been found to transport urea, boric acid, silicic acid, NH3 and H2O2 [63][64][65][80][81][82]. As shown in Table 3, RcNIP5;1 is likely to be a transporter of urea, boric acid and H2O2; RcNIP1;1 is likely to be a urea and NH3 transporter; RcNIP3;1 and RcNIP4;2 are potential urea and H2O2 transporters; RcNIP4;1 and RcNIP7;1 are potential urea transporters; RcNIP2;1 is a potential urea and H2O2 transporter. In addition, compared with typical silicic acid SDPs, RcNIP2;1 seems to represent novel SDPs with the substitution of V for A/E/L at SDP3 or Q for A/K/P/T at SDP9, which is similar to that of GmNIP2;1 and GmNIP2;2 (S-Y-E-R-G-N-R-T-P) [83]. RcNIP6;1 may also be a urea transporter representing novel SDPs (H-P-I-A-L-E-G-S-N) with the substitution of E for A/G/P at SDP6.

Distinct Expression Profiles of RcAQP Isoforms in Various Tissues
Water is essential for all life on earth. As in other organisms, plant growth and development depend on water uptake and transport across cellular membranes and tissues. Water stress is therefore a major factor that decreases plant growth and productivity. One important response of plant cells to water stress is the regulation of AQPs [84][85][86][87]. Although plant AQPs were reported to be regulated by posttranslational modifications (e.g. phosphorylation, methylation and glycosylation), gating, heteromerization and cellular trafficking [88,89], transcriptional regulation still acts as the key mechanism. Since the organ specificity of AQP expression may be closely related to the physiological function of each organ, we took advantage of deep transcriptome sequencing (also known as RNA-Seq, an approach to transcriptome profiling through sequencing the total cDNA) data to survey the expression profiles of RcAQP genes from a global view. Results showed that the transcripts of RcPIP and RcTIP subfamily members are highly abundant in all examined tissues, which is consistent with observations in other plant species such as maize, Arabidopsis, tomato and potato [13,14,49,51]. Considering that PIPs and TIPs are highly permeable to water [79,90], their high abundance indicates crucial roles in the intracellular, cellular, organ and whole-plant water balance of castor bean. RcPIP1;4, RcPIP2;4 and RcTIP1;1 were considerably abundant in leaves and are likely to play the key role in leaf hydraulics. RcPIP2;4, RcPIP1;2 and RcTIP1;2 are more likely to control the flower water balance, given their high abundance in this tissue. The highly abundant RcPIP2;4 and RcTIP2;1 are likely to govern the seed water balance, and RcTIP3;1 is likely to regulate the water balance of endosperms (regardless of stage II/III or V/VI).
Compared with other tissues or developmental stages, RcTIP1;1 is expressed more in leaf and endosperm II/III; in contrast, its closest homolog in Arabidopsis (AtTIP1;1, encoding a transporter of water, urea and H2O2) was reported to be highly expressed in vascular tissues of root, stem, cauline leaf and flower but not in the apical meristem [79][91][92][93][94][95]. Although it lacks a strict organ-specific expression pattern, RcTIP1;2 is preferentially expressed in flowers; by contrast, its closest Arabidopsis homolog (AtTIP1;3, encoding a transporter of water and urea) was shown to be expressed highest in mature pollen, moderately in flower, very weakly in inflorescence and not detectably in any other tissue [93,96]. Although the castor bean genome encodes two TIP2s, these two genes exhibit distinct expression profiles. Although at very low levels, transcripts of RcTIP2;2 were observed in leaf, seed, flower and endosperm II/III, but were not detectable in endosperm V/VI. In contrast, RcTIP2;1 is expressed in all examined tissues and the transcript level is considerably high in seed and leaf. In Arabidopsis, the closest homolog of RcTIP2;1 is AtTIP2;1, encoding a transporter of water, urea and NH3, which was shown to be mainly expressed in flower, shoot and stem, and to a lower extent in roots [93][97][98][99]. AtTIP3;1, an ortholog of RcTIP3;1, was reported to be a seed- and embryo-specific AQP gene [100]; in contrast, RcTIP3;1 was preferentially expressed in the endosperm of developing seeds and at considerably lower levels in germinating seed. In addition, it is noteworthy that three putative non-aqua transporter encoding genes (i.e. RcNIP5;1, RcNIP7;1 and RcXIP1;1) were shown to be highly abundant in certain tissues. Compared with other tissues tested, the transcript level of RcNIP5;1 is considerably high in seed. Like castor bean, the Arabidopsis genome encodes a unique NIP5 member (AtNIP5;1), which was shown to transport boric acid and arsenite as well as water [65,101].
Expression analyses indicated that AtNIP5;1 is mainly expressed in root epidermal, cortical and endodermal cells, and that the transcript is upregulated in response to B deprivation [65]. RcNIP7;1 can be regarded as a flower-specific gene, because its expression level in leaf is extremely low. Similarly, AtNIP7;1, the ortholog of RcNIP7;1 in Arabidopsis, was also found to be specifically expressed in anther, encoding a less efficient boric acid transporter in comparison to AtNIP5;1 and AtNIP6;1 [102]. RcXIP1;1, a member of a less characterized AQP subfamily not found in Arabidopsis [19,47,66], was considerably abundant in leaf tissue; thus, further investigation of its function is of special interest. These results indicate that distinct AQP evolution occurred after the divergence of castor bean from Arabidopsis, and that the functional characterization of AQPs in specific biological processes needs to be performed beyond model plant species such as Arabidopsis.

Conclusions
To our knowledge, this is the first genome-wide study of the castor bean AQP gene family. Using systematic nomenclature, the 37 RcAQPs were assigned to five subfamilies based on sequence similarity and phylogenetic relationships with their Arabidopsis and poplar counterparts. Furthermore, their structural and functional properties were investigated using bioinformatics tools. The global expression profiles of these 37 RcAQP genes were examined with deep transcriptome sequencing. Putative transporters of water, glycerol, urea, boric acid, silicic acid, NH3, CO2 and H2O2 were also predicted and discussed. The RcAQP genes identified in this study represent an important resource for future functional analysis and utilization.

S6 File. SDP analysis of castor bean AQPs from alignments with amino acid sequences of AQPs transporting non-aqua substrates. (PDF)
S1 Table. List of the Phytozome accession numbers of the AQP genes identified in Arabidopsis (35) and poplar (55).