Genome-Wide Characterization of Major Intrinsic Proteins in Four Grass Plants and Their Non-Aqua Transport Selectivity Profiles with Comparative Perspective

Major intrinsic proteins (MIPs), commonly known as aquaporins, transport not only water in plants but also other substrates of physiological significance and heavy metals. In most of the higher plants, MIPs are divided into five subfamilies (PIPs, TIPs, NIPs, SIPs and XIPs). Herein, we identified 68, 42, 38 and 28 full-length MIPs, respectively in the genomes of four monocot grass plants, specifically Panicum virgatum, Setaria italica, Sorghum bicolor and Brachypodium distachyon. Phylogenetic analysis showed that the grass plants had only four MIP subfamilies including PIPs, TIPs, NIPs and SIPs without XIPs. Based on structural analysis of the homology models and comparing the primary selectivity-related motifs [two NPA regions, aromatic/arginine (ar/R) selectivity filter and Froger's positions (FPs)] of all plant MIPs that have been experimentally proven to transport non-aqua substrates, we predicted the transport profiles of all MIPs in the four grass plants and also in eight other plants. Groups of MIP subfamilies based on ar/R selectivity filter and FPs were linked to the non-aqua transport profiles. We further deciphered the substrate selectivity profiles of the MIPs in the four grass plants and compared them with their counterparts in rice, maize, soybean, poplar, cotton, Arabidopsis thaliana, Physcomitrella patens and Selaginella moellendorffii. In addition to two NPA regions, ar/R filter and FPs, certain residues, especially in loops B and C, contribute to the functional distinctiveness of MIP groups. Expression analysis of transcripts in different organs indicated that non-aqua transport was related to expression of MIPs since most of the unexpressed MIPs were not predicted to facilitate the transport of non-aqua molecules. Among all MIPs in every plant, TIP (BdTIP1;1, SiTIP1;2, SbTIP2;1 and PvTIP1;2) had the overall highest mean expression. Our study generates significant information for understanding the diversity, evolution, non-aqua transport profiles and insight into comparative transport selectivity of plant MIPs, and provides tools for the development of transgenic plants.

Plant AQPs have been reported recently to transport not only water but also a wide range of substrates such as ammonia, antimony, arsenite, boron, carbon dioxide, glycerol, hydrogen peroxide, silicon, urea etc [2,22,23,24,25].Almost all of these molecules are important for plant growth and development, plant nutrition, photosynthesis, structures of biological membranes and cell walls, tolerance to biotic and abiotic stresses, stomatal movement and senescence [26,27,28,29].These physiological roles as well as the chance of heavy metalloids such as arsenic and antimony to enter into the food chain through plant AQPs suggest that it is important to understand their transport selectivity profiles.Despite the discovery of more than 400 AQPs in plants, very few studies have been done to compare their transport profiles and the molecular determinants for the substrate selectivity.
AQPs consist of six transmembrane (TM) α-helices (helix H1-H6) and five loops (loops A-E).The N-and C-termini are located on the cytoplasmic side of the membrane.In the pore of the channel, two regions of constriction have been proposed to specify the transport selectivity profile.The first constriction is formed at the centre of the pore by oppositely juxtaposing two Asn-Pro-Ala (NPA) motifs in loops B and E [30].This constriction is supposed to be involved in proton exclusion [31].Consensus sequences are suggested for the first (SGXHXNPAVT) [32] and second (GXXXNPAR(S/D)XG) [33] NPA motifs.The second constriction known as the aromatic/arginine (ar/R) selectivity filter is formed at the extracellular mouth of the pore by four residues from H2, H5, and loop E (LE1 and LE2), respectively [34,35].Variability at the ar/R selectivity filter is thought to form the basis of the broad spectrum of substrate conductance in plant AQPs [11,14,36,37].Up to five relatively conserved amino acid residues known as the Froger's positions (FPs) and those designated P1-P5 play roles in substrate selectivity [32,38].Recently, some specificity-determining positions have been suggested by analyzing the protein sequences of MIPs transporting non-aqua substrates in wet-lab experiments [23].
Identification and characterization of the MIP gene family is the first step in investigating the role of MIPs in plant water relationships or transporting physiologically important small molecules.Grasses, plants of the Poaceae family, the largest plant family in the world, afford the bulk of human nutrition, and highly productive grasses are potential sources of sustainable biofuels [39,40].Phytozome (www.phytozome.net),which facilitates comparative genomic studies among green plants, provides access to six grass plants.The MIPs in rice and maize, among these six grass plants, have been reported [12,14,17].There has been no study for MIPs in the remaining four grass plants namely switchgrass (Panicum virgatum), foxtail millet (Setaria italica), sorghum (Sorghum bicolor) and Brachypodium distachyon.P. virgatum, which exists at multiple ploidies, is a drought tolerant plant and has been intensively studied as a source of lignocellulosic biomass to produce renewable energy [41,42].S. italica is closely related to P. virgatum.It is a small diploid C4 panicoid crop species and a more tractable experimental model because of its small genome [43].S. bicolor, related to sugar cane and maize, is grown for food, feed, fibre and biofuels [44].B. distachyon, related to rice, maize, wheat, barley, sorghum and millet, has several advantages as an experimental model organism for understanding genetic, cellular and molecular biology of temperate grasses [40].
In the study reported herein, we identified MIP genes in the genomes of P. virgatum, S. italica, S. bicolor and B. distachyon.We investigated the phylogeny, structural properties, in silico subcellular localization and expression profiles of MIPs in these plants.Based on structural analysis of the homology models and comparing the primary selectivity-related motifs, we further deciphered the non-aqua transport profiles (ammonia, antimony, arsenic, boron, CO 2 , H 2 O 2 , silicon and urea) and molecular determinants for substrate selectivity of the MIPs in the four grass plants and compared them with their counterparts in two grass plants such as rice (OsMIP) and maize (ZmMIP) and six non-grass plants such as soybean (GmMIP), poplar (PtMIP), cotton (GhMIP), Arabidopsis thaliana (AtMIP), Selaginella moellendorffii (SmMIP) and Physcomitrella patens (PpMIP).

Identification of PvMIP, SiMIP, SbMIP and BdMIP genes
The genomes of P. virgatum (JGI v1.1), S. italica (JGI v2.1), S. bicolor (v2.1) and B. distachyon (v1.2), available at Phytozome, were searched for MIPs using TBLASTN and BLASTp tools with the protein sequences of the complete set of 55 MIPs from P. trichocarpa and 22 MIPs from P. patens as queries.PvMIPs, SiMIPs, SbMIPs and BdMIPs were included until no more MIPs could be found from P. virgatum, S. italica, S. bicolor and B. distachyon, respectively.Every sequence from each plant was individually compared with functional annotations by browsing the Phytozome databases of P. virgatum, S. italica, S. bicolor and B. distachyon to indentify the maximum number of MIPs for further analyses.The genomic regions containing MIP genes were further used to determine the gene structure using the program GeneMark.hmm ES-3.0 [45] (http://exon.gatech.edu/GeneMark), a self-training based algorithm for prediction of genes from novel eukaryotic genomes, and Arabidopsis was chosen as a model organism in GeneMark for gene prediction in P. virgatum, S. italica, S. bicolor and B. distachyon.When short genes were found, their sequences with 1000 base flanking regions were subjected to Genetyx_SV_RC_version 7 to investigate their protein sequences.

Phylogenetic and domain analysis of PvMIPs, SiMIPs, SbMIPs and BdMIPs
PvMIPs, SiMIPs, SbMIPs or BdMIPs were separately aligned with PtMIPs using the Clustal Omega program (http://www.ebi.ac.uk/Tools/msa/clustalo/) and a phylogenetic tree was constructed using Molecular Evolution Genetic Analysis (MEGA), version 5.0 [46].The evolutionary history was inferred using the Neighbor-Joining method and the genetic distance was estimated by the p-distance method.To identify the total number of subfamilies present in PvMIPs, SiMIPs, SbMIPs and BdMIPs, phylogenetic analysis was also conducted with PpMIPs that have seven subfamilies [20], whereas PtMIPs have five subfamilies.The identified PvMIPs, SiMIPs, SbMIPs and BdMIPs were classified into different subfamilies and groups by their phylogenetic relationship with PtMIPs.To investigate the different subfamilies and groups, we further analyzed phylogeny separately with AtMIPs, ZmMIPs, OsMIPs and GmMIPs.PvMIPs, SiMIPs SbMIPs and BdMIPs were named according to the best similarities from the trees generated by phylogeny analysis.To construct the phylogenetic tree with the MIPs in the four grass plants, all of their MIPs were aligned as above.The TM α-helices were predicted by SOSUI (http://bp.nuap.nagoya-u.ac.jp/sosui/),TMpred (http://www.ch.embnet.org/software/TMPRED_form.html)and the tools of ExPASy (http://kr.expasy.org/tools/).

Homology modeling
Homology models were constructed using the Molecular Operating Environment software (MOE 2009.10;Chemical Computing Group, Quebec, Canada).The sequence of each MIP homologue was aligned with the open conformation of spinach PIP, SoPIP2;1 (PDB, Protein Data Bank ID: 2B5F) [47] using the MOE software as described previously [36].The alignment of the MIP homologue was based on both sequence and structural homology with the structure of SoPIP2;1.The 3D structure models were formed using the MOE homology program and the stereochemical quality of the templates and the models was assessed, as we described previously [36].

Prediction of subcellular localization and computation of Ka/Ks value
The subcellular localizations of PvMIPs, SiMIPs, SbMIPs and BdMIPs were predicted in silico by using tools of WoLF PSORT (http://www.genscript.com/wolf-psort.html),TargetP (www.cbs.dtu.dk/Services/TargetP),Cello prediction system (http://cello.life.nctu.edu.tw/) and MultiLoc2 (www.abi.inf.uni-tuebingen.de/Services/MultiLoc2). Ka and Ks are the numbers of non-synonymous and synonymous substitutions per site, respectively on a protein-coding gene.The Ka/Ks values of the PvMIPs, SiMIPs, SbMIPs and BdMIPs were calculated using an online Ka/Ks calculation tool at http://services.cbu.uib.no/tools/kaks.A Ka/Ks value greater than one implies gene evolution under positive or Darwinian selection; less than one indicates purifying (stabilizing) selection and a Ka/Ks value of one suggests a lack of selection or possibly a combination of positive and purifying selections at different points within the gene that cancel each other out [18].

Expression analysis
For expression analysis, a compendium of RNA-seq data for the plants in the Phytozome was used.In the Phytozome, P. virgutam, S. bicolor and B. distachyon were selected separately and the phytozome accession number of a specific MIP was entered to search the gene.Transcript level as FPKM (Fragments per Kilobase of Transcript per Million Mapped Reads) values of a MIP gene was achieved from the gene view link.The FPKM values for each MIP gene of S. italica was retrieved from the InterMine interface of Phytozome (https://phytozome.jgi.doe.gov/phytomine/template.do? name=One_Gene_Expression&scope=global) using phytozome accession number or identifier.The FPKM values of individual MIP gene in leaf, root and shoot under diverse conditions were retrieved and put into the Microsoft Excel.The heatmap was generated using conditional formatting based on the FPKM values.The FPKM values <1 were treated as no expression of the respective gene.

Determination of pore diameter and pore lining residues
To analyze the MIP channels, the poreWalker server [48] (http://www.ebi.ac.uk/thornton-srv/ software/PoreWalker/) was used.This is a fully automated method designed to detect and characterize transmembrane protein channels from their 3D structures.The 3D structure of a MIP in PDB format was uploaded to the server, which generated the specific pore characteristics, particularly the conformation and the regularity of the channel cavity, the corresponding pore lining residues and atoms, and the location of pore centers along the channel.From the PoreWalker outputs, the pore diameter profiles at different regions of a MIP channel were compiled.From the given pore diameter profile of a channel, continuous numerical data were constructed from the non-continuous numerical data through a customized statistical language R-script so that the precise pore diameter at a specific region particularly at the ar/R selectivity filter could be determined.The existing values of pore diameters generated by the PoreWalker were used as an input in the R-script to calculate the missing values of pore diameters to make a continuous pore diameter profile.Through the PoreWalker server, the pore lining residues, which are very important for the formation of a channel, were identified.

Results
Genome-wide identification of PvMIP, SiMIP, SbMIP and BdMIP genes The whole genome shotgun sequence (WGS) of P. virgatum, S. italica, S. bicolor and B. distachyon available at Phytozome was searched for PvMIP, SiMIP, SbMIP and BdMIP genes using TBLASTN.The query PtMIP and PpMIP sequences from P. trichocarpa and P. patens resulted in 116, 51, 44 and 37 hits for PvMIPs, SiMIPs, SbMIPs and BdMIPs, respectively.We further analyzed the PvMIP, SiMIP, SbMIP and BdMIP sequences for domain identification.Out of 116 unique hits for PvMIPs, 48 were deemed to be pseudo MIP genes after manual inspection of their amino acid sequences, TM domains and homology models, and were discarded (S1 Table ).Out of the 51 unique hits for SiMIPs, 9 were deemed to be pseudo MIP genes and were discarded (S1 Table ).On the other hand, 6 and 9 unique hits for SbMIPs and BdMIPs, respectively were deemed to be pseudo MIP genes and discarded (S1 Table ).We ultimately obtained 68, 42, 38 and 28 full-length PvMIP, SiMIP, SbMIP and BdMIP protein sequences from the WGS of P. virgatum, S. italica, S. bicolor and B. distachyon, respectively (Tables 1-4).

Nomenclature and predicted subcellular localization of PvMIPs, SiMIPs, SbMIPs and BdMIPs
The phylogenetic analysis showed that PvMIPs, SiMIPs, SbMIPs and BdMIPs were divided into four subfamilies.PIPs, TIPs, NIPs and SIPs of PvMIPs, SiMIPs SbMIPs and BdMIPs clustered with those subfamilies in the respective plant (Fig 1).However, no XIP was found.Sequences belonging to hybrid intrinsic proteins (HIPs) and a novel plant MIP (GIP, GlpF-like intrinsic protein) homologous to bacterial glycerol channel reported in the nonvascular moss P. patens [20] were not found.Fig 1 shows that most of the PIPs clustered either to PIP1s or PIP2s.However, some of the PIPs formed distinct clades from PIP1s and PIP2s.In contrast to PIP1s or PIP2s, they had no N-and C-terminal characteristic lengths [12], and in comparison with reference PIPs, they had the characteristic FPs (discussed later).The phylogenetic analysis with all PIPs from the 12 plants showed that these PIPs clustered with OsPIP2;7 and OsPIP2;8 (S1 Fig) .Moreover, their percentage of identity at amino acid level with OsPIP2;7 and OsPIP2;8 (~65% to 80%) was higher than that with PpPIP3;1 (~54%).We therefore named these PIPs as PIP2s.The PvTIPs, SiTIPs and BdTIPs had five subgroups (TIP1 to TIP5) similar to TIPs in Arabidopsis, maize, poplar, rice and soybean.However, SbTIPs had four subgroups (SbTIP1 to SbTIP4).Four subgroups of NIPs were found in P. virgatum, S. bicolor and B. distachyon.Nevertheless, NIPs in S. italica had three subgroups.Although Arabidopsis and soybean have seven NIP subgroups [13,18], poplar, rice and maize have three to four NIP subgroups [11,12,17].Similar to Arabidopsis, rice, maize, poplar and soybean, P. virgatum, S. italica and S. bicolor had two SIPs subgroups.However, B. distachyon had only one SIP of SIP1 subgroup.PvPIPs, SiPIPs, SbPIPs and BdPIPs were predicted to be localized in the PM or both in the PM and chloroplast (Tables 1-4).However, the predicted subcellular loclization of TIPs was diversed including vacuole, PM, mitochondria, chloroplast and cytosol.Most of the NIPs were predicted to be localizd in the PM.However, some of the NIPs were predicted to be localized in any of the endoplasmic reticulum, choroplast, vacuole or cytosol.The predicted subcellular localization of SIPs was either in the PM or in the chloroplast.However, 1 PvSIP and 1 SiSIP were predicted to be localized in the nucleus.The amino acid lengths of PvMIP, SiMIP, SbMIP and BdMIP homologues with their maximum sequence identity with MIP in other plants are tabulated in Tables 1-4.

Gene structure of MIPs in the four grass plants
All of the full-length MIP sequences found in P. virgatum, S. italica, S. bicolor and B. distachyon were analyzed for introns and exons.The introns in the MIPs of these plants were compared to OsMIPs and ZmMIPs of the two other grass plants as well as AtMIPs and PtMIPs of two nongrass plants ( Fig 2).The number of introns varied from zero to five.However, apart from some disparities, the number and positions of introns were conserved within the subfamilies of MIPs in the grass plants.Nevertheless, major differences were observed when subfamilies from monocots were compared to those from dicots [11].
A comparison of members of the PIP subfamily revealed that among the grass plants only PvPIP2;14 had four introns.Although the majority of AtPIPs and PtPIPs had three introns, only ~30% of PIPs in the six grass plants had three introns (Fig 2).The majority of PIPs in the grass plants had two introns because they lost one intron between helices H2 and H3; only PvPIP1;3 and SiPIP1;3 lost one intron between helices H5 and H6.Two PIP genes from each of P. virgatum, S. italica, S. bicolor, O. sativa and Z. mays had a single intron in the distal end of Loop E. The P. virgatum further had four PIP genes that carried a single intron between helices H4 and H5.Nonetheless, this intron position is conserved in all PIPs having more than one intron.Conversely, B. distachyon had no single intron bearing PIP gene.At least one PIP gene from each of S. italica, S. bicolor, O. sativa and Z. mays and two PIP genes from each of P. virgatum and B. distachyon had no intron.Members of the TIP subfamily showed the most stable gene structure in comparison with members of other subfamilies.The majority of TIPs in the grass plants including Arabidopsis and poplar had either two or one introns.Despite PvTIP1;4, TIPs with two introns had intron position at the end of helices H1 and H3.The position of the intron in TIPs having a single intron was at the end of helix H1.Two TIPs, each from P. virgatum and S. italica, had three introns.Similar to AtTIP1;3, only BdTIP4;3 had a gene structure without any intron.
The gene structures of members of NIP subfamily in grass plants diverged from their counterparts in Arabidopsis and poplar (Fig 2).The majority of NIPs had four or three introns with highly variable introns organization.Similar to OsNIP1;5, the SiNIP2;3 had five introns, which was the highest intron number among the MIPs.However, the intron positions in SiNIP2;3 and OsNIP1;5 were different.The SiNIP3;4 possesed a unique gene structure without any intron.Similar to AtSIPs, all SIPs in the grass plants had two introns having highly conserved positions in helix H3 and loop E.

Grouping of MIPs based on the ar/R selectivity filter and Froger's position
To group the MIPs based on the ar/R selectivity filter and FPs, we constructed 3D models of all MIPs in P. virgatum, S. italica, S. bicolor and B. distachyon.The structure-based alignments and multiple sequence alignments of MIPs helped us to identify the four amino acid residues at the ar/R selectivity filter and the five residues in the FPs.The residues at the ar/R selectivity filter and in the FPs were considered to group MIPs and to compare these groups with those of the eight plants ( Fig 3,S2 and S3 Figs).These groups were correlated with their expression and non-aqua transport profiles (discussed later).The ar/R selectivity filters in all PIPs of the four grass plants contained residues F, H, T and R in H2, H5, LE1 and LE2, respectively (Fig 3A) identical to those found in Arabidopsis, maize, rice and G. max, and hence there was no group in PIPs based on this filter.Based on ar/R selectivity filters, all TIPs in P. virgatum, S. italica, S. bicolor and B. distachyon were grouped into  However, the ar/R selectivity filter of TIP Group III, which was reported in Arabidopsis, G. max and poplar (S5 Fig; [11,18,30]), was not found in grass plants or cotton.Based on the ar/R selectivity filters, all NIPs were grouped into four (Fig 3C).All members of NIP1, NIP2, NIP3 and NIP4 were grouped to Groups I, III, II and IV, respectively.The tetrad of the ar/R selectivity filters in Group I (W, V/A, A and R) and Group II (A, A/I, A/P/G and R) were similar to those of Groups I and II, respectively in Arabidopsis, rice, maize, soybean, poplar and cotton.The tetrad of the ar/R filter in Group III (G, S, G and R) was conserved in the six glass plants as well as in soybean and poplar but was absent in Arabidopsis, cotton, P. patens and S. moellendorffii (Fig 3

MIPs with unusual NPA motifs
Like their counterparts in other plants, all PIPs, TIPs, NIP1s and NIP2s in the four grass plants had dual conserved NPA motifs in loops B and E, respectively.In the NIPs with unusual NPA motifs, A of the NPA in Loop B was substituted by S and that in Loop E was substituted by V or I, as was found in poplar and other plants (Table 5).However, substitution of A with I in LE of PvNIP4;1-2 and SbNIP4;1 has not so far been reported although it is found in XIPs [11].The NIPs with unusual NPA motifs in which A in loop B and that in loop E were substituted by S and V, respectively, had a characteristic Arg-rich C-termini (Table 5).In all SIPs in the grass plants, substitution of A by T (in SIP1s) or L (in SIP2s) in the NPA motif of Loop B was in agreement with other plants.The SIPs in all plants had the conserved NPA motif in Loop E with a unique characteristic Lys-rich C-termini (Table 5) which is a potential endoplasmic reticulum retention signal [1,49].

Substrate-specific signature sequences or specificity-determining positions and non-aqua transport profiles of plant MIPs
The 3D models and the multiple sequence alignments of plant MIPs that have been shown experimentally to facilitate the transport of physiologically important non-aqua molecules such as ammonia, boron, CO 2 , H 2 O 2 , silicon and urea as well as toxic heavy metals arsenic and antimony [22,23,24,25,26,27,28,50,51,52,53,54,55] were analyzed for predicting substrate-specific signature sequences (SSSS) or specificity-determining positions (SDPs) in NPA regions, ar/R filter and FPs.The predicted SSSS or SDPs in these three constrictions in the experimentally proven MIPs are summarized in Table 6.All of the MIPs in each of the 12 plant genomes were subjected to ScanProsite tool (http://prosite.expasy.org/scanprosite/) to identify the SSSS or SDPs, and thereby the non-aqua transporters MIPs were predicted.Only the common homologues supported by all the characteristic SSSS or SDPs in the three constrictions (two NPA regions, ar/R selectivity filter and FPs) were listed as the transporter of the specific nonaqua molecule ( AtPIP1;2 in Arabidopsis, no homologue in these 12 plants has experimental evidence, hence it would be interesting to test the CO 2 permeability of these predicted PIPs in higher and lower plants.However, the plant MIPs especially in Arabidopsis, barley and tobacco, which have been experimentally proven to transport CO 2, are dispersed to PIPs [51,52,56].Including a  Phytotoxic antimony and arsenic transported through MIPs in the form of antimonite and arsenite, respectively can enter the food chain [25,57].Our analysis predicted that the antimony and arsenic transporters were distributed only among the NIPs (either Group II or III NIPs based on the ar/R filter) in all grass plants including other higher plants ( Fig 3 and S6 Fig).The antimony and arsenic transporter MIPs so far reported based on wet lab experiments are NIPs [25,57].Therefore, antimony and arsenic transport through NIPs is a conserved and prehistoric characteristic.It was predicted that 24 MIPs from the 12 plants were arsenic transporters; of them 9 homologous were from the four grass plants (  However, A few PIP homologues in rice have been reported to have arsenic permeability [58].Therefore, SSSS or SDPs prediction based on only a few PIP homologues might not be significant, and hence, PIPs were not considered in the analysis to predict arsenic transport.Very few studies have examined the functions of SIPs.However, at least one of the two AtSIPs showed water channel activity when they were expressed in yeast [59].Our analysis based on the SSSS or SDPs in the NPA regions, ar/R filter and FPs determined from the experimental PIPs, TIPs and NIPs did not detect their non-aqua transport.However, based on only the FPs, almost all SIP1s were predicted as urea transporters and SIP2s in the grass plants were predicted as transporters of arsenic and H 2 O 2 in addition to urea (Fig 3D).

MIPs predicted with multi, dual and single molecule transport activity
We defined a multichannel MIP when one MIP homologue was predicted to facilitate the transport of three or more than three non-aqua substrates.The total number of such MIPs in the four grass and in the other 6 higher plants was 18 and 37,respectively (Fig 3 and S5 and S6 Figs).However, this types of multichannel MIPs were not predicted in the lower plants, P. patens and S. moellendorffii.This result indicated that the multichannel MIPs were members of TIP2s and NIP2s.The 12 plants had a total of 136 MIP homologues that were predicted to transport two non-aqua substrates; 54 homologous were predicted in the four grass species .A total of 78 MIP homologues in the 12 plants were predicted to transport only one non-aqua substrate.

Expression of MIP genes in roots, shoots and leaves
The FPKM values obtained from the Phytozome could be assigned to 176 MIP genes of the four species.A heatmap showing their transcript levels in roots, shoots and leaves of the four plants was generated (Fig 3A -3D).The percentage of MIP genes in P. virgatum, S. italica, S. bicolor and B. distachyon expressed in at least one organ analyzed was 70, 76, 75 and 89, respectively, and that of MIPs in those plants expressed in all organs analyzed was 47, 59, 50 and 34, respectively.Among the MIPs, PvTIP1;2 (FPKM = 411.5),SiTIP1;1 (FPKM = 846.5),SbTIP2;1 (FPKM = 941.5)and BdTIP1;1 (FPKM = 1076) showed the highest expression in roots and these TIP homologues were ubiquitously expressed in all organs analyzed.

Discussion
We identified and characterized a total of 176 MIP homologues from the genomes of four grass plants, P. virgatum, S. italica, S. bicolor and B. distachyon to predict and compare their structural properties and non-aqua transport functions to those in other two grass plants, rice and maize, as well as at least six non-grass plants comprising higher and lower plants.The genomes of all twelve plants included a total of 487 full-length MIP homologues.Therefore, this study provides a comparative particulars in context of their genome-wide number of homologues, subclasses or groups, non-aqua transport profile and structure-function relationships or nonaqua transport selectivity.

The genome of P. virgatum has the largest number of MIP homologues
Although the number of MIP homologues varies from plant to plant, dicot plants comparatively have more homologues than monocot plants.Before our report, the highest known number of 66 full-length MIP homologues was shown in the genome of the dicot species G. max [18].However, in the present study, we identified 68 full-length MIP homologues in a monocot species, P. virgatum (Table 1).This is the largest number of MIP homologues in a plant genome reported to date.It can be speculated that the polyploidy nature of P. virgatum resulted in duplication of these genes along the genome [33,42].The large numbers of MIPs reflects wide diversity in substrate specificity, subcellular localization, transcriptional and post-translational regulation.

Grass plants have the least number of MIP subfamilies
Similar to Arabidopsis [13], MIPs of grass plants comprise only four subfamilies, namely PIPs, TIPs, NIPs and SIPs (Fig 1), whereas MIPs of other higher plants with dicotyledon such as poplar, soybean, tomato and cotton have one more subfamily, XIPs [11,16,18,19].The earlybranched land plants, P. patens or mosses, possesses additional MIP subfamilies adding up to seven including GIPs and HIPs [20].The PtMIPs and PpMIPs were chosen as queries so that XIPs, GIPs or HIPs could be detected if they were encoded in the genomes of grass plants.Occurrence of gene duplication as well as horizontal gene transfer during evolution is an important consideration for diversification of MIPs [33].HIPs and GIPs might have been lost between the ancestor of early-branched vascular and seed plants and XIPs might have been lost between the ancestor and grass plants including Arabidopsis.Interestingly, although all higher plants have both SIP1s and SIP2s, B. distachyon possesses only SIP1 homologue as was found in lower plants, P. patens and S. moellendorffii [20,60].This indicated that either SIP2s were present in the early-branched land plants but were subsequently lost in B. distachyon.It might be because of rapid divergence of SIP2s from SIP1 in B. distachyon as was suggested for P. patens and S. moellendorffii.

Sub-cellular localizations and expression of plant MIPs are likely to be connected to their transport profiles
The sub-cellular localizations of plant MIPs are diversified, which might be connected to their functions.It was speculated that the same PIP localized in the PM and chloroplast might be responsible for transporting water and CO 2 , respectively [51,56].Dual or multiple localizations might be coherent with the dual or multi channel activities of MIPs (Tables 1-4, Fig 3 and S4-S6 Figs).We guess that the PIPs predicted to transport CO 2 are localized in the chloroplast in addition to PM.However, the score for localization in the chloroplast is lower than that in the PM.This is also applicable to the AtPIP1;2 in Arabidopsis, HvPIP2;1 in barley and NtAQP1 in Nicotina tabaccum (data not shown), which were shown experimentally to localize in PM and chloroplast and to transport CO 2 [51,52,56].Again PIP1 is localized in the PM when it is coexpressed with PIP2; if it is expressed alone, then it remains in the ER [61,62].TIPs and NIPs exhibit multiple sub-cellular localizations and high functional diversity with transport of water, glycerol, H 2 O 2 , NH 3 , urea or metalloids such as arsenic, antimony, boron and silicon (Tables 1-4, Fig 3 and S5 and S6 Figs; [55,63]).The multiple sub-cellular localizations and diversified transport activities of MIPs are associated with osmoregulation and transcellular water transport, cell elongation, cell signaling, detoxification of excess urea, NH 3 and H 2 O 2 [3,27,36,55,64].
Data in the present study revealed that out of the 36 CO 2 transporter PIPs, 32 were expressed in the leaves (Fig 3A).All predicted arsenic, silicon, boron, ammonia and H 2 O 2 transporters were expressed in the roots.Nevertheless, most of these MIPs were also expressed in shoots and leaves.Similarly, almost all MIPs predicted to transport other non-aqua substrates such as antimony and urea were also ubiquitously expressed in the three organs roots, shoots and leaves.Interestingly, most of the unexpressed MIPs were not predicted to have nonaqua transport activity (Fig 3A -3D).These results indicate that the predicted non-aqua transport profiles of MIPs have a close relation with their expression.Again higher level of expression of some PIPs, TIPs and NIPs suggest that they have central physiological role in regulating water homeostasis, cell growth and cell expansion [4,36].Therefore, the prediction of sub-cellular localization and expression profiles of MIPs in this study may be a nice direction for wet lab experiments to validate the relationship among the multiple sub-cellular localizations, expression and functional diversity.
Non-aqua transport selectivity profile might be MIP group-specific Mutagenesis studies might be interesting to validate this hypothesis.However, the pore diameter and the transport profile might be regulated by post translational modification [5,6,47] and/or by heteromerization through physical interaction [62,65].
Since all the TIPs conserve the NPA motifs and also the FPs except some disparities (Fig 3B and S5 Fig), their non-aqua transport selectivity profiles might be rendered by the ar/R selectivity filter.The substitution of the Arg in the LE2 position by the smaller Val present in group I TIPs results in wider pore diameter (data not shown).We thus support other reports [36,66,67] that the wider pore apertures in group I TIPs might have facilitated the transport of larger non-aqua susbstrates such as urea and H 2 O 2 .However, ammonia transporters clustered only to group IIA in grass plants and also to group IIB in other plants (Fig 3B and S5 Fig), most of which had smaller pore diameter than the diameter of the ammonia molecule (data not shown).This indicates that pore diameter alone is not a determinant for selectivity of all nonaqua solutes.The regulatory events, biochemical properties of the filters and elsewhere or SDPs [23] might have effects on the transport selectivity profiles of TIPs.The TIP2s and TIP4s that have been predicted to be ammonia transporters have two motifs, G-L-x-y-G-G and P-x-H in loops B and C, respectively (S2 Fig and S9 Fig).The hydrophobic pore-linning conserved Leu, Pro and basic His with imidazole ring in these motifs might have imparted a more hydrophobic channel above the ar/R selectivity filter.The greater hydrophobicity of the channel might have aided the transport of ammonia [68].Further studies such as mutagenesis are required to test the relevance of these motifs to ammonia transport.
The divergent NPA motifs, ar/R filter and FPs individually and/or collectively may play important roles in the substrate transport selectivity profiles of NIPs which were particularly predicted for metalloids such as arsenic, antimony, silicon and boron transporters in addition to H 2 O 2 and urea (Fig 3C and S8 Fig).Conserved NPA motifs in silicon transporter and silicon non-transporter NIPs might have a limited role in the selectivity for silicon and also urea [37,69].The ar/R filter in silicon transporter NIP2s characterized by the conserved G-S-G-R made the constriction wider [70].The wider pore diameter of NIP2s might be one of the reasons to facilitate the transport of the bulkiest silicon molecule as well as urea, antimony and arsenic (Fig 3C and S6 Fig).In addition to NPA motifs, ar/R filter and FPs, the pore-lining highly conserved His in F/L-x-H-F-P motif in loop B may also influence the transport selectivity of metalloids in NIPs (S10 Fig) .Hydrophobic Leu and Phe in the first position of this motif would be one of the determinants for boron and other metals such as arsenic, silicon and antimony, respectively.Interestingly, all of the predicted boron transporters conserved two unusual NPA motifs (NPS and NPV in Loops B and E, respectively) and Arg-enrich (R-x-x-R-S-F-R-R) C-termini (Table 5) that may play roles respectively in transport selectivity profiles and structural stabilization of the tetramers [49,71].Furthermore, the highly conserved porelining SGGVTVP motifs in loop C of boron transporter NIPs might have important roles in the transport selectivity profile (data not shown).
The substitutions of corresponding positions of E14, H66, I187 and F200 of GlpF were focused to affect the width of the pore and the hydrophilic-hydrophobic pattern inside the channel in SIPs [72].However, most of the substitutions were found to be SIP group-specific in the present study (Fig 3D,S11 Fig and S2 Table).Thus, the substrate selectivity profiles of SIP1s and SIP2s, notably both the width of the pore and the interior properties of the channels, are likely to differ.Comparison of the primary sequences of SIPs with GlpF and AQP1 suggests that SIPs are likely to transport solutes which are noble, hydrophobic and large in size [72].It is usually supposed that MIPs with unusual NPA motifs may not transport water.However, water transport activity has been demonstrated in two AtSIP1s but not in AtSIP2;1 and the latter is supposed to have non-aqua transport activity [62].Expression profiles of SIPs further indicated to have their transport activity.Nevertheless, wet-lab experiments are necessary to determine the intracellular localization, expression pattern and transport activities of SIPs.

Conclusions
Analysis of genome sequences in four monocot grass plants revealed a new highest number of MIP homologues in P. virgatum without the recently discovered XIP subfamily in the grass plants.Further sequence and homology models analysis indicated that the signatures for substrate selectivity are group-specific, and like the ar/R selectivity filter, FPs can be an important basis for phylogenetic and functional groupings of MIP subfamilies.While the amino acid residue at the P1 position of FPs is one of the critical molecular determinants of the transport selectivity profiles of PIPs, residues at the ar/R filter and FPs are critical for substrate selectivity in TIPs and NIPs.Besides, the ar/R filter and FPs appear to work in coordination with pore-lining residues, particularly in loops B and C. Comparison of the predicted transport profiles with the expression profiles of MIPs in the four grass plants elucidateed a close correlation.The signature sequences or residues identified in the present study are important for predicting the transport profiles of uncharacterized MIPs.Prediction of the transport profiles and substrate selectivity of MIPs in the present study will provide an inroad to develop genetically modified plants that are tolerant to toxicity of heavy metals such as arsenic and antimony or deficiency of microelements and nutritionally better or healthier.However, the computational analysisaided prediction for transport profiles, substrate selectivity and subcellular localization based on the critical primary sequence motifs and tertiary structural models of MIPs need to be validated by wet lab experiments.

yFig 1 .Fig 2 .Fig 3 .
Fig 1. Evolutionary relationship of MIPs in the four grass plants.Phylogenetic analysis of all MIPs from the four grass plants is shown along with MIPs from poplar.The deduced amino acid sequences of MIPs were aligned using the Clustal Omega computer program and a phylogenetic tree was constructed using MEGA.The evolutionary history was inferred using the Bootstrap Neighbor-Joining (1000 replicates) method and the genetic distance was estimated by the p-distance method.PIPs, TIPs, NIPs and SIPs from the four plants clustered with the corresponding PtMIP subfamilies.Each MIP subfamily is shown with a specific background color to distinguish them from others.doi:10.1371/journal.pone.0157735.g001 and S6 Fig).The ar/R selectivity filter in NIPs of Group IV (C/V, G, G and R) was found only in grass plants but completely absent in other six plants.The SIPs were grouped into Group I and II based on the tetrad of the ar/R selectivity filter (Fig 3D).All SIP1s in the grass plants were clustered together with the ar/R filter composed of L/V, V/I, P and N which was fully conserved in some of the SIP1 members in other plants.All SIP2s were clustered into Group II with the conserved ar/R selectivity filter composed of S, H, G and S. Based on the FPs, all PIPs from the four grass plants were clustered into two groups (Fig 3A).The P2-P5 positions were conserved in PIPs of both groups (Fig 3 and S3 Fig).While Gln was conserved in the P1 position in all members of Group I, the corresponding position in the homologues of Group II was substituted by H/V/T/M/N/E.The P3-P5 positions in all TIPs conserved the residues A, Y and W, respectively (Fig 3B).Based on the disparities in P1 and P2 positions, all TIPs could be divided into two groups.Despite three members of TIP3, all members of TIP1, TIP2, TIP3 and TIP4 were in Group I in which the P1 and P2 positions conserved T and S/V/A, respectively.All TIP5 members and a few members of TIP3 were in Group II in which the P1 and P2 positions conserved S and S/A, correspondingly.Similar FPs of Groups I and II TIPs were observed in rice, maize and other plants (S5 Fig).Based on the FPs, NIPs were clustered into four groups (Fig 3C).All NIP1 and NIP2 members were in Groups I and II, respectively, whereas all members of NIP3 and NIP4 clustered to Groups III and VI, individually.In all NIPs, P3 and P4 positions were conserved with A and Y, correspondingly.NIPs of rice and maize as well as other plants also followed this grouping (S6 Fig).Based on the FPs, all SIP1s and SIP2s clustered to Groups I and II, respectively, with the residues in P1-P5 positions correspondingly M, A, A, Y, W and F/L, A, A, Y, W (Fig 3D).However, the P2 position in other than grass plants was substituted by V.
Fig 3 and S4-S6 Figs).Our analysis showed that the predicted ammonia transporter MIPs were distributed to TIPs (TIP2s and TIP4s) (Fig 3 and S5 Fig), which was in agreement with experimental evidence [24].This result indicated that ammonia transport through TIPs might be a conserved and ancient feature in higher plants since early branched plants such as P. patens and S. moellendorffii have no ammonia transporter.At least 5 MIPs from the four grass plants and 12 MIPs of the other plants were predicted to transport boron and were distributed only to members of NIP3, NIP5 and NIP6 except OsNIP2;1 (Fig 3 and S6 Fig).Boron transport in plants could be an ancestral feature as each of the 12 plants except S. moellendorffii had at least one NIP homologue predicted to be boron transporter.Our data showed that 36 PIPs in the four grass plants and 55 PIPs in the other 8 plants were predicted to be CO 2 transporters with the highest and lowest numbers in cotton and S. moellendorffii, respectively (Fig 3 and S4 Fig).Despite

doi: 10 .
1371/journal.pone.0157735.t005total of 72 MIPs in the four species, more than 139 MIP homologues in the 12 plants were predicted to facilitate the transport of H 2 O 2 (Fig 3 and S4 and S5 Figs).These MIPs were mostly of PIPs and TIPs; the members of PIPs were of group I based on FPs (Fig 3 and S4 Fig, [24]).However, a few NIPs of group I from rice, poplar and Arabidopsis, and two HIPs each from P. patens and S. moellendorffii were predicted to be H 2 O 2 transporters (S6 Fig).Data showed that all of the six grass plants had more than one silicon transporter and all were members of NIP2s (Fig 3 and S6 Fig).Furthermore, except PtNIP2;1, no silicon transporter was predicted in the other 5 plants.This result indicated that silicon transport might not be an ancestral characteristic and may be inherited based on the plant species.Each of the 12 plants had multiple urea transporters that were distributed to TIPs and NIPs (Fig 3 and S5 and S6 Figs).This result indicated that urea transport might be an ancestral characteristic of plants.
Fig 3 and S6 Fig), and among the six grass plants, P. virgatum, O. sativa had the highest number of arsenic transporters.

Table 6 .
Substrate-specific signature sequences (SSSS) or specificity determining positions (SDPs) in MIPs transporting non-aqua substrates.A/T)(S/I/V/A)(G/ A)R SG(A/C)H(L/M)NP(S/A)(V/I/T)(T/S) (G/S)(G/A)SMNP(V/A)R(T/S)L (G/A) (L/F/Y/I)(T/S)AY(/M/I/V/)NP(A/S)(V/I)T(G/S)(A/G)SMNP(A/V)R(TG)(I/S)GR SGAH(M/L/I)NP(A/S)(V/L)T (G/S)(G/A)SMNP(A/V)R(SG/N)(I/V)(A/G)(R/ V/C) SGGH(V/I/L/M)NPAVT G(A/G)SMNPA(R/V/C)SFG T(S/A)AYW [24] NIP (G/A)(S/I)AR SGAH (M/ V/I/L/)NPAVT (G/S)(A/G)SMNP(A/V)R(T/S) LG (L/F/M/V/I)TAY(F/ L) SIP* (L/V/I/A)(V/I/F/M/T)P (NF/I) G(G/S)(V/A)(S/T)(F/W)NP(S/C/T/A)(T/A/G/D) (S/T/N/L/V/I/F) (G/R)P(S/A)MNPA(N/F/I)A(F/ Y) (M/I)AAYW a The diameter of the molecule is shown in the parenthesis.b SSSS or SDPs in two NPA regions, ar/R selectivity filter and FPs were determined by analyzing the MIPs that have been shown experimentally to transport CO 2 and silicon [24] which synchronized with the report of Hove and Bhave [23].c The SSSS or SDPs were determined in this study by analyzing the experimental MIP homologues mentioned in the references within the parenthesis.*SSSS and SDPs were not based on the experimental SIPs.SIPs that were predicted as arsenic, H 2 O 2 and urea transporter based on the FPs of experimental PIPs, TIPs and NIPs, were used to predict the SSSS or SDPs in NPA regions, ar/R selectivity filter and FPs.doi:10.1371/journal.pone.0157735.t006 The non-aqua transport activities are mostly related to the phylogenetic framework of MIPs (Fig3and S4-S6 Figs).Group IA PIPs (based on FPs) from every plant were predicted to transport dual substrate CO 2 and H 2 O 2 (Fig 3A and S4 Fig).Because all the PIPs conserve the NPA motifs and the ar/R selectivity filter, their non-aqua transport selectivity profiles might be rendered by FPs.Group I PIPs usually differ from Group II PIPs by P1 position among the pore lining and their neighboring residues (S7 Fig).The variety of hydogen bonding interaction of Gln and the substituted amino acid residue at P1 position (S8 Fig) might be a reason for the different conformation and thereby transport selectivity between group I and group II PIPs.The NH 2 of polar Gln at the P1 position of group I PIPs may further influence the permeate molecules.

Table 1 .
(Continued) Where, Ka and Ks are numbers of non-synonymous and synonymous substitutions per site, respectively.PPL: polypeptide length, aa: amino acid, PSCL: predicted subcellular localization, PLAS: plasma membrane.VACU: vacuolar membrane, CYTO: cytosol, ER: endoplasmic reticulum, MITO: mitochondrion, NUCL: Nucleous and CHLO: chloroplast.x A gene that shows the highest identity with MIP in other plants by BLASTp.Parenthesis indicates the percentage of identity at the amino acid level.a Sorghum bicolor b Zea mays c Setaria italica and d Oryza sativa Japonica Group y The same abbreviations have been used in Tables 1-4.doi:10.1371/journal.pone.0157735.t001

Table 2 .
x A gene that shows the highest identity with MIP in other plants by BLASTp.Parenthesis indicates the percentage of identity at the amino acid level.a Sorghum bicolor b Zea mays d Oryza sativa Japonica Group e Hordeum vulgare and f Aegilops tauschii y The same abbreviations have been used in Tables 1-4.doi:10.1371/journal.pone.0157735.t002

Table 3 .
(Continued)Oryza sativa Japonica Group y The same abbreviations have been used in Tables1-4.
Where, CL: chromosome location, U: Unknown chromosomal location x A gene that shows the highest identity with MIP in other plants by BLASTp.Parenthesis indicates the percentage of identity at the amino acid level.b Zea mays c Setaria italica and d
x A gene that shows the highest identity with MIP in other plants byBLASTp.Parenthesis indicates the percentage of identity at the amino acid level.a Sorghum bicolor e Hordeum vulgare f Aegilops tauschii g Lolium perenne h Stipa baicalensis and i Triticum urartu.

Table 5 .
NIPs and SIPs with unusual NPA motifs and the characteristic C-termini.

Table 5 .
(Continued) *LB and LE indicates loops B and E, respectively.