Conventional and Novel Gγ Protein Families Constitute the Heterotrimeric G-Protein Signaling Network in Soybean

Heterotrimeric G-proteins, composed of Gα, Gβ and Gγ proteins, are important signal transducers in all eukaryotes. The Gγ protein of the G-protein heterotrimer is crucial for the heterotrimer's proper targeting to the plasma membrane and its correct functioning. Gγ proteins are significantly smaller and more diverse than the Gα and Gβ proteins. In the model plants Arabidopsis and rice, which have a single Gα and a single Gβ protein, the presence of two canonical Gγ proteins provides some diversity in the possible heterotrimeric combinations. Our recent analysis of the latest version of the soybean genome has identified ten Gγ proteins, which belong to three distinct families based on their C-termini. We amplified the full-length cDNAs, analyzed their detailed expression profiles by quantitative PCR, assessed their localization and performed yeast-based interaction analysis to evaluate interaction specificity with different Gβ proteins. Our results show that ten Gγ genes are retained in the soybean genome and have interesting expression profiles across different developmental stages. Six of the newly identified proteins belong to two plant-specific Gγ protein families. Yeast-based interaction analyses suggest some degree of interaction specificity between different Gβ and Gγ proteins. This research thus identifies a highly diverse G-protein network from a plant species. Homologs of these novel proteins have previously been identified as QTLs for grain size and yield in rice.


Introduction
Heterotrimeric G-proteins, composed of three dissimilar subunits α, β and γ, are important signaling intermediates in all eukaryotes [1][2][3]. The Gα subunit, owing to its ability to switch between the GDP-bound inactive form and the GTP-bound active form, defines the status of signal transduction. Ligand binding to the GPCR (G-protein-coupled receptor) causes a change in its conformation, allowing GTP to replace GDP on the Gα subunit [4]. The GTP-bound Gα dissociates from the Gβγ subunits, and the released Gα·GTP and Gβγ dimer interact with a variety of effector proteins to transduce the signal. The intrinsic GTPase activity of Gα hydrolyzes GTP, converting Gα back to its GDP-bound state. Gα·GDP then reassociates with the Gβγ dimer and the proteins return to the trimeric conformation [4,5].
A wide range of fundamental signal transduction pathways are mediated by G-proteins in both plants and animals [4,6]. In non-plant systems, the multiplicity of each G-protein subunit, together with almost one thousand GPCRs, tissue-specific expression and signal-dependent heterotrimer formation, provides specificity of response [7,8]. In plants the repertoire of G-protein components is relatively simple; the two most investigated species, Arabidopsis and rice, have only a single Gα, a single Gβ and two canonical Gγ proteins [9]. Given the presence of a single Gα and Gβ, the specificity in Arabidopsis and rice G-protein signaling can be provided only by the multiplicity of Gγ proteins.
We recently carried out an analysis of the soybean genome to assess whether this paucity of G-protein components is the norm in plants and whether structural and functional diversity exists among the multiple copies of a gene present in highly duplicated genomes. Our analysis revealed a much more diverse plant G-protein family, with the soybean genome encoding four Gα, four Gβ and two Gγ proteins [10]. The number of Gα and Gβ proteins in the soybean genome corresponds well to the two recent genome duplication events [11], which resulted in four copies of each gene. Interestingly, the Gα and Gβ proteins exhibit some degree of interaction specificity between them. Moreover, based on GTP-binding and GTPase activity, the four Gα proteins form two distinct subgroups. These data thus revealed that G-protein signaling in plants is significantly more diverse and complex than predicted from studies in Arabidopsis and rice [10].
The presence of only two Gγ proteins in the soybean genome, however, did not correspond to what was expected based on the genome duplication events. Additionally, two of the Gβ proteins, GmGβ1 and GmGβ3, did not show any interaction with the GmGγ1 and GmGγ2 proteins, suggesting that additional Gγ proteins may exist. The small size and relatively low sequence conservation of Gγ proteins make their homology-based identification difficult; however, they do share certain conserved features. All known Gγ proteins contain a signature DPLL/I motif which, together with a few additional conserved amino acids in the middle coiled-coil region, is required for interaction with the Gβ proteins. Most of the known Gγ proteins also contain a CAAX motif at the C terminus which is isoprenylated, resulting in the targeting of the proteins to the plasma membrane [12,13].
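The two sequence signatures just described can be illustrated with a small regular-expression scan. This is a minimal sketch under stated assumptions, not the method used in this study (which relied on BLAST and manual annotation): the function name `scan_gamma_features`, the permissive "cysteine four residues from the end" reading of the CAAX box, and the toy sequences are all hypothetical.

```python
import re

# Simplified signatures based on the text: a DPLL or DPLI motif in the
# coiled-coil region, and a CAAX-type box, read permissively here as a Cys
# located four residues from the C terminus (e.g. CWIL).
DPL_MOTIF = re.compile(r"DPL[LI]")
CAAX_LIKE = re.compile(r"C[A-Z]{3}$")


def scan_gamma_features(seq: str) -> dict:
    """Report the 1-based start of a DPLL/I motif (or None) and the
    C-terminal CAAX-like box (or None) in a protein sequence."""
    seq = seq.strip().upper()
    hit = DPL_MOTIF.search(seq)
    return {
        "dpl_motif_start": hit.start() + 1 if hit else None,
        "caax_box": seq[-4:] if CAAX_LIKE.search(seq) else None,
    }


# Hypothetical toy sequences, not real soybean proteins.
print(scan_gamma_features("MSEKTLSRDPLLKEGSCWIL"))  # CWIL-type ending
print(scan_gamma_features("MSEKTLSRDPLLKEGSRWI"))   # RWI-type ending
```

A regex only flags candidates; a DPFT variant (as in GmGγ10, described later) would be missed by this pattern, which is one reason homology searches are needed.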
In plants, Gγ proteins have been reported from Arabidopsis, rice, pea and soybean. The Arabidopsis AGG1 and AGG2 proteins share 48% sequence identity and are involved in regulating plant defense responses [14][15][16]. These two proteins are the prototypical plant Gγ proteins. The rice RGG1 and soybean GmGγ1 and GmGγ2 proteins are highly homologous to AGG1 and contain all the conserved features and motifs of Gγ proteins. The rice RGG2 protein is relatively divergent: it has an extra 57-amino-acid extension at its N terminus (compared with RGG1) and does not contain the signature prenylation motif. The two reported pea Gγ proteins, PGG1 and PGG2, are somewhat unusual as they do not contain the highly conserved DPLL/I motif, even though a possible prenylation motif is present at their C termini [17]. The function of plant Gγ proteins has been evaluated only in Arabidopsis, where the proteins participate in known Gα- and/or Gβ-mediated signaling pathways. Molecular-genetic analysis of AGG1 and AGG2 knockout mutants reveals that the proteins are involved in regulating the response to fungal pathogens [16,18].
With the availability of a newer version of the soybean genome assembly (phytozome.net v7) and the use of a series of careful genome annotation programs, we queried the soybean genome to identify additional Gγ protein sequences. Our analysis identified two more canonical Gγ proteins located in not-yet-annotated genome regions, as well as six additional, novel Gγ proteins. The proteins display a great degree of diversity and can be grouped into three distinct families based on sequence features: the archetypal Gγ proteins, the prenylation-less Gγ proteins and the cysteine-rich Gγ proteins. This study describes the identification of these three families of Gγ proteins from soybean, details their expression profiles in comparison with those of the GmGβ genes and evaluates their interactions with specific GmGβ proteins. The presence of three different families of Gγ proteins in a single plant species supports a highly elaborate and diverse G-protein signaling network and provides clues to plant-specific G-protein signaling mechanisms distinct from those known from mammalian systems.

Identification of additional canonical and novel Gγ proteins from the soybean genome
Our previous analysis of the soybean genome had identified only two Gγ proteins [10]. We performed a careful search of the newer version of the soybean genome and identified eight additional Gγ proteins. Ten Gγ proteins together with four Gα and four Gβ proteins thus give a total of 4 × 4 × 10 = 160 possible heterotrimeric combinations.
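The count of 160 heterotrimers is a simple Cartesian product of the three subunit sets. The sketch below enumerates it explicitly; the subunit labels are written in the document's naming style, and the enumeration deliberately ignores the interaction specificity reported later (for example, group I Gγ proteins not binding GmGβ1 or GmGβ3), so it gives a theoretical upper bound rather than the set of complexes that actually form.

```python
from itertools import product

# Subunit repertoires as stated in the text: four Gα, four Gβ, ten Gγ.
G_ALPHA = [f"GmGα{i}" for i in range(1, 5)]
G_BETA = [f"GmGβ{i}" for i in range(1, 5)]
G_GAMMA = [f"GmGγ{i}" for i in range(1, 11)]

# Every (α, β, γ) triple counts as one theoretical heterotrimer.
trimers = list(product(G_ALPHA, G_BETA, G_GAMMA))
print(len(trimers))  # 160
print(trimers[0])    # ('GmGα1', 'GmGβ1', 'GmGγ1')
```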
Two of the newly identified Gγ proteins are highly homologous to the previously identified GmGγ1 and GmGγ2 and also show high sequence homology to each other (Table 1). We named these proteins GmGγ3 and GmGγ4. Both are located in regions of the soybean chromosomes that have not yet been annotated. GmGγ3 is located on chromosome 20 between the protein-coding regions Glyma20g33300.1 and Glyma20g33320.1; we annotated the GmGγ3 locus as Glyma20g33310.1. The open reading frame of this gene, along with the positions of its introns, is reported in Figure S1. GmGγ4 is located on chromosome 10 between the protein-coding regions Glyma10g32210.1 and Glyma10g32220.1; we annotated the GmGγ4 locus as Glyma10g32215.1. Figure S2 details the sequence of this newly annotated gene with its exons and introns.
These four GmGγ proteins (GmGγ1, 2, 3 and 4) have all the features of canonical Gγ proteins, namely the middle coiled-coil domain with the conserved DPLL motif at positions 66-69 and the conserved residues L30, E40 and S51 (amino acid numbering according to GmGγ1). These conserved features are important for the interaction of Gγ proteins with Gβ proteins [19]. GmGγ3 and GmGγ4 also contain the CWIL motif at their C termini, the most common isoprenylation motif among plant Gγ proteins. We assigned these four prototypical Gγ proteins to group I. Figure 1 shows the protein sequences and conserved motifs of these Gγ proteins.
BLAST analysis [20] of the soybean genome using the coiled-coil region of the Gγ proteins identified three additional Gγ-like proteins at loci Glyma11g18050.1, Glyma14g17060.1 and Glyma17g29590.1. We named these proteins GmGγ5, GmGγ6 and GmGγ7, respectively. These three proteins share a high degree of sequence similarity with each other (Table 1), and based on their unique features we assigned them to group II. Compared with the group I proteins, the group II proteins have an extra N-terminal extension of 20-25 amino acids. The middle coiled-coil region of these proteins is highly similar to that of the group I proteins (Figure 1), and the sequence features predicted to be involved in the Gβ-Gγ interaction are conserved [19]. The most distinct feature of the group II proteins is the lack of a C-terminal isoprenylation motif: the proteins end in an RWI sequence instead of a CWIL sequence. The group II proteins are thus somewhat similar to the rice RGG2 protein, which also has an N-terminal extension (albeit a longer one, 57 aa) and lacks the prenylation motif [21]. The group II Gγ proteins are also small: GmGγ5 and GmGγ6 consist of 131 amino acids and GmGγ7 of 126 amino acids, similar to canonical Gγ proteins [5]. Surprisingly, Arabidopsis has no Gγ protein lacking a prenylation motif.
The GmGγ5 gene is mis-annotated in the current version of the soybean genome. The protein predicted from the genome annotation is much smaller and lacks the first exon identified in our study. The genomic arrangement and experimental validation support the version of GmGγ5 presented here as correct. The corrected sequence of the gene is detailed in Figure S3.
We identified three additional novel Gγ-like proteins at loci Glyma15g19630.1, Glyma17g05640.1 and Glyma07g04510.1. We named these proteins GmGγ8, GmGγ9 and GmGγ10, respectively, and assigned them to group III based on their distinctive features. The group III proteins are significantly larger than conventional Gγ proteins: GmGγ8, GmGγ9 and GmGγ10 consist of 213, 228 and 159 amino acids, respectively. The N terminus of the group III proteins is similar to that of the group I proteins, and the middle coiled-coil domain is highly conserved (Figure 1). GmGγ8 and GmGγ9 have all the sequence features required for Gβ interaction. GmGγ10 is the only protein identified in our analysis that lacks the conserved DPLL motif; instead, it contains a similar DPFT motif (Figure 1). Interestingly, a sequence that is highly conserved in mammalian Gγ proteins, N62, P63, F64 (numbering based on human Gγ1), is not conserved in the plant Gγ proteins. This region is required for the GPCR-dependent conformational change in Gγ [22].
In addition to their large size, group III Gγ proteins have other unique features. They are predicted to contain a TNFR (tumor necrosis factor receptor) signature and a long cysteine-rich C-terminal region not found in any other Gγ protein known to date. The C terminus of these proteins is quite variable. Counting from the conserved DPLL motif (DPFT in the case of GmGγ10), the C-terminal regions of GmGγ8, GmGγ9 and GmGγ10 consist of 130, 145 and 74 amino acids, respectively. This variable region is unusually rich in cysteine: GmGγ8 contains 30% Cys (39 of 130 amino acids), GmGγ9 contains 33% Cys (49 of 145 amino acids) and GmGγ10 contains 26% Cys (19 of 74 amino acids).
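The cysteine percentages above follow from counting Cys residues in the region after the DPLL/DPFT motif. The sketch below recomputes them from the stated counts and shows the same calculation on a sequence; the function name `c_terminal_cys_fraction` and the toy sequence are hypothetical, and the printed one-decimal values can be compared with the rounded percentages given in the text.

```python
import re

# (length of region after DPLL/DPFT, number of Cys) as stated in the text.
REPORTED = {"GmGγ8": (130, 39), "GmGγ9": (145, 49), "GmGγ10": (74, 19)}


def c_terminal_cys_fraction(seq: str) -> float:
    """Fraction of Cys residues in the region after the first DPLL or DPFT motif."""
    seq = seq.strip().upper()
    hit = re.search(r"DP(?:LL|FT)", seq)
    if hit is None:
        raise ValueError("no DPLL/DPFT motif found")
    tail = seq[hit.end():]
    if not tail:
        raise ValueError("motif sits at the very C terminus")
    return tail.count("C") / len(tail)


for name, (length, n_cys) in REPORTED.items():
    print(f"{name}: {100 * n_cys / length:.1f}% Cys")
# GmGγ8: 30.0% Cys
# GmGγ9: 33.8% Cys
# GmGγ10: 25.7% Cys

# Hypothetical toy sequence: the 8 residues after DPLL contain 3 cysteines.
print(c_terminal_cys_fraction("MKDPLLCCAACGAA"))  # 0.375
```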
The sequences of GmGγ8 and GmGγ10 are mis-annotated in the current version of the soybean genome assembly; the predicted genes could never be amplified in our analysis. We manually annotated these genes and amplified the full-length products. Based on the experimentally obtained cDNAs, we marked the correct exon-intron boundaries of GmGγ8 (Figure S4) and GmGγ10 (Figure S5). We also identified a homolog of GmGγ9 in the Arabidopsis genome at locus At5g20635; this gene has recently been described as a Gγ protein in Arabidopsis [23]. Homologs of group III proteins are present in all plant species. The proteins also show some homology to keratin-associated proteins of mammals. Interestingly, the rice homologs of group III proteins, named DEP1 and GS3, have recently been identified as major QTLs for grain size and yield determination [24,25].

Genome organization and phylogenetic relationship analysis of soybean Gγ proteins
The availability of multiple Gγ protein sequences in soybean with seemingly variable sequence features raised the question of whether these protein families originated from the duplication of a single gene. We analyzed the chromosomal locations of all ten GmGγ genes and the organization of their exons and introns (Figure 2). Group I and group II GmGγ genes have four exons each, whereas group III genes have five exons. The lengths of the second and third exons are highly conserved across all ten GmGγ genes: the second exon is 52 bp in groups I and II and 53 bp in group III, and the third exon is 45 bp in groups I and II and 44 bp in group III. These two exons encode the highly conserved middle coiled-coil domain of GmGγ proteins. This extreme conservation of exon organization suggests that the proteins originated from this core sequence and acquired variable N- and C-terminal sequences. The group I and group II GmGγ genes also share a highly conserved fourth exon. It is also interesting that, despite the similar sizes of the cDNAs and proteins of group I and group II GmGγs, the group I genes span large genomic regions (~4-5 kb) and have a very long first intron ranging from 3.5 to 4.5 kb, a feature absent from the group II GmGγ genes. The group III GmGγ genes show relatively less conserved genome organization, even within the group. The last two exons of this group encode the cysteine-rich region of the proteins and vary in size (Figure 2).
Phylogenetic analysis of all soybean Gγ proteins, along with the Arabidopsis and rice sequences, places them in the three expected subgroups: the two Arabidopsis proteins, rice RGG1 and GmGγ1-4 in one subgroup; the group II GmGγ proteins and rice RGG2 in another; and the group III GmGγ proteins, Arabidopsis At5g20635 and rice DEP1 and GS3 in the third (Figure S6). The chromosomal locations of the GmGγ genes suggest that the four group I genes resulted from two genome duplication events: the first produced two related genes, each of which duplicated again, giving the highly homologous GmGγ1 and GmGγ2 as one pair and GmGγ3 and GmGγ4 as another. Similarly, within group II, GmGγ5 and GmGγ6 form a duplicated gene pair, and within group III, GmGγ8 and GmGγ9 form a duplicated gene pair. We did not identify duplicate partners for GmGγ7 and GmGγ10; these may have been lost during evolution.

Tissue-dependent expression analysis of soybean Gγ genes
In mammalian systems, where multiple Gγ isoforms are present, their expression is highly tissue specific. Similarly, the Arabidopsis Gγ proteins show distinct expression patterns [16]. We assessed the expression profiles of the ten GmGγ genes by real-time quantitative PCR to evaluate whether all ten genes are expressed and to compare their expression levels. We quantified the absolute expression of each gene using a 100-fold serial dilution of cloned plasmid DNA and ascertained the specificity and efficiency of the individual primer pairs (Table S2, Figure S7). A coefficient of determination (R²) of 0.98-0.99 was observed over a 100,000-fold dilution range. Interestingly, unlike the GmGα and GmGβ genes, which are expressed at relatively similar levels in different tissue types [10], the GmGγ genes showed widely variable expression levels.
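Absolute quantification from a plasmid dilution series is commonly done by fitting a standard curve of Ct against log10 input and reading unknowns off that line; the text does not spell out the calculation, so the sketch below assumes this standard approach. The Ct values are invented, the 10-fold step size is an assumption, and the amplification-efficiency formula E = 10^(-1/slope) - 1 is the conventional one rather than a value reported in this study.

```python
def fit_standard_curve(log10_qty, ct):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept.
    Returns (slope, intercept, R^2, amplification efficiency)."""
    n = len(ct)
    mx = sum(log10_qty) / n
    my = sum(ct) / n
    sxx = sum((x - mx) ** 2 for x in log10_qty)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_qty, ct))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(log10_qty, ct))
    ss_tot = sum((y - my) ** 2 for y in ct)
    r2 = 1 - ss_res / ss_tot
    efficiency = 10 ** (-1 / slope) - 1  # 1.0 means perfect doubling per cycle
    return slope, intercept, r2, efficiency


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the input quantity of an unknown."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical standard curve: six 10-fold dilutions spanning 100,000-fold
# (10^2 to 10^7 copies). The Ct values are invented for illustration.
log_qty = [2, 3, 4, 5, 6, 7]
cts = [30.1, 26.8, 23.4, 20.1, 16.7, 13.4]

slope, intercept, r2, eff = fit_standard_curve(log_qty, cts)
print(f"slope={slope:.3f} R^2={r2:.4f} efficiency={eff:.1%}")
print(f"Ct 22.0 -> ~{quantity_from_ct(22.0, slope, intercept):.0f} copies")
```

A slope near -3.32 corresponds to roughly 100% efficiency; primer pairs whose curves deviate strongly from this are usually redesigned before comparing absolute levels across genes.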
We analyzed the tissue-specific expression of the three families of GmGγ genes in vegetative and reproductive tissues. Additionally, given the possible role of G-protein-dependent signaling during nodulation [26][27][28][29][30], we also analyzed GmGγ expression in nodules, a legume-specific tissue. Of the group I genes, GmGγ4 shows high overall expression compared with the other members of this group (Figure 3A). In general, all four genes are expressed in all tissue types tested; however, GmGγ4 expression is comparatively lower in roots than in aerial tissues, whereas GmGγ3 is most highly expressed in nodules and expressed at a very low level in developing seeds (S4). The expression data of GmGγ1 and GmGγ2 are presented for comparison of relative expression levels with GmGγ3 and GmGγ4.
Within group II, GmGγ6 is expressed at a relatively low level compared with the other two members (Figure 3B). GmGγ7 generally has lower expression in leaves at all developmental stages. All three genes are expressed at a moderate level in nodules. Additionally, the group II genes are expressed at relatively lower levels in reproductive tissues (the inflorescence apex and flowers) than in vegetative tissues, and at almost undetectable levels in seeds (Figure 3B).
The group III genes show the greatest variability in tissue-specific expression. GmGγ9 is expressed at an extremely high level compared with the other members of this group or with any other GmGγ gene, whereas GmGγ10 is expressed at a very low level. Notably, this group of genes shows poor expression in nodules and seeds (Figure 3C).

Expression of GmGγ genes during seed development and germination
Our previous expression analysis of the soybean Gα and Gβ genes showed interesting expression patterns during seed development and germination [10]. We analyzed the expression pattern of the newly discovered GmGγ genes during seed development and germination and compared it with the expression of the GmGβ genes.
Specific expression patterns were observed for different GmGγ genes. For example, GmGγ3, GmGγ6 and GmGγ7 do not show any change in expression during seed development (Figure 4), similar to GmGβ1 and GmGβ4 [10], whereas GmGγ1, GmGγ2 and GmGγ9 show moderate down-regulation during seed maturation (Figure 4). Conversely, GmGγ4, GmGγ5 and GmGγ8 exhibit significant up-regulation during seed maturation (Figure 4), and thus show expression profiles similar to those of GmGβ2 and GmGβ3. Notably, the expression of these three Gγ genes is maintained at a high level in dry seeds, whereas the expression of GmGβ2 and GmGβ3 returns to basal levels in dry seeds [10].
G-proteins are involved in the regulation of seed germination in Arabidopsis [31,32], and the soybean GmGα and GmGβ genes show distinct expression patterns during different stages of seed germination [10]. GmGγ3 and GmGγ4 follow a pattern similar to that of GmGβ3 and GmGβ4, with higher expression starting 6 h after imbibition, peaking at 12 h and then decreasing (Figure 5). The expression of GmGγ5 and GmGγ9 resembles that of GmGβ1 and GmGβ2, peaking at 6 h after imbibition and then decreasing (Figure 5). GmGγ6 and GmGγ10 are expressed at a very low level in dry seeds and show no change in expression during germination (Figure 5). GmGγ7 is also expressed at a very low level in dry seeds (Figure 4C); however, its expression is significantly up-regulated during germination, rising from 6 h post-imbibition to a maximum at 24 h and then gradually decreasing. In contrast, GmGγ8, which is expressed at a high level in dry seeds, is significantly down-regulated after 6 h of imbibition.
We also tested the expression of the GmGγ genes in response to various stresses. No significant differences were observed in the expression of any of the genes under conditions in which the stress-marker gene GmRab18 was expressed at a significantly higher level (data not shown). These data are similar to our earlier observations for the GmGα and GmGβ genes [10].

Localization of soybean Gγ proteins
Canonical Gγ proteins are localized to the plasma membrane via isoprenylation of their C-terminal sequence [33,34]. The presence of three Gγ protein groups with distinctly different C-terminal sequences allowed us to assess whether the three groups exhibit any differences in localization. We transiently transformed tobacco leaves by Agrobacterium-mediated transformation with constructs encoding YFP (yellow fluorescent protein) fused to the N terminus of each GmGγ protein, and visualized the transformed leaves by confocal microscopy. The group I fusion proteins YFP-GmGγ1, YFP-GmGγ2, YFP-GmGγ3 and YFP-GmGγ4 showed fluorescence localized to the cell periphery, as expected from the canonical isoprenylation motif at their C termini (Figure 6). Similarly, the group II fusion proteins YFP-GmGγ5, YFP-GmGγ6 and YFP-GmGγ7 showed fluorescence restricted to the periphery, suggesting predominantly plasma membrane localization for these proteins (Figure 6). This is intriguing, as these proteins lack a canonical prenylation motif and have no cysteine residues near the C terminus for such modifications. It parallels the plasma membrane localization of the rice RGG2 protein, which also lacks a C-terminal prenylation motif [21].
GmGγ8 and GmGγ9 have a conserved cysteine at position 4 from the C terminus, and all three group III proteins have a conserved cysteine at position 6 from the C terminus, in addition to multiple other cysteine residues nearby. These cysteines could qualify for lipid modification by farnesyl transferase or geranylgeranyl transferase [35]. However, the localization of this group of proteins is interesting: in addition to most of the protein being present at the periphery, fluorescence was also observed as clear puncta, especially for YFP-GmGγ8 (Figure 6). Such patterns indicate either protein aggregate formation or endosomal localization [34]. Further studies with stably transformed plants will be required to critically assess the localization of group III GmGγ proteins.
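Positions "from the C terminus" are counted backwards, with the final residue as position 1. The helper below makes that convention concrete; the function name and the toy sequence are hypothetical and are chosen only so that cysteines fall at positions 4 and 6, as described for the group III proteins.

```python
def cys_positions_from_c_terminus(seq: str) -> list[int]:
    """Positions of Cys residues counted back from the C terminus
    (the last residue is position 1), in ascending order."""
    seq = seq.strip().upper()
    return sorted(len(seq) - i for i, aa in enumerate(seq) if aa == "C")


# Hypothetical toy sequence ending in ...C-A-C-V-A-L
print(cys_positions_from_c_terminus("MKRCACVAL"))  # [4, 6]
```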

Protein-protein interaction between GmGβ and GmGγ proteins
We assessed the interaction specificity of the three groups of Gγ proteins with all four soybean Gβ proteins using the ProQuest yeast two-hybrid system. The group I proteins GmGγ3 and GmGγ4 showed strong and specific interactions with GmGβ2 and GmGβ4 but not with GmGβ1 and GmGβ3 (Figure 7A), as is also the case for GmGγ1 and GmGγ2 [10]. The group II GmGγ proteins exhibit relatively weaker interactions with the GmGβ proteins than the group I proteins do, except for the interaction between GmGγ5 and GmGβ4. However, specific interactions of group II proteins were observed with all four GmGβ proteins (Figure 7B).
Establishing interaction with Gβ proteins was an essential requirement for classifying the group III proteins as novel Gγ proteins. As shown in Figure 7C, the group III GmGγ proteins showed very strong and specific interactions with GmGβ proteins. Like the group I proteins, the group III proteins interacted strongly with GmGβ2 and GmGβ4. However, unlike the group I proteins, which did not interact with GmGβ1 and GmGβ3, the group III proteins clearly showed weak but specific interactions with GmGβ1 and GmGβ3 (Figure 7C). We also tested whether either half of the protein was sufficient for this interaction, using deletion constructs that expressed either the N-terminal half (up to the DPLL/DPFT motif) or the C-terminal half (the sequence following the DPLL/DPFT motif). Significantly weaker interactions were detected with either truncated protein (Figure S8), suggesting that the full-length protein is required for such interactions.
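The Figure 7 legend states that each assay was run in triplicate, averaged and plotted with the standard error of the mean, alongside the kit's strong, weak and negative controls. The sketch below shows that summary step; the readings are invented, and the rule that ranks a pair against the control means (`call_interaction`) is a hypothetical illustration, not the scoring procedure used in the study.

```python
import math
import statistics


def mean_sem(values: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


def call_interaction(mean_signal: float, weak_ctrl: float, strong_ctrl: float) -> str:
    """Hypothetical rule: rank a Gβ-Gγ pair against the weak and strong control means."""
    if mean_signal >= strong_ctrl:
        return "strong"
    if mean_signal >= weak_ctrl:
        return "weak"
    return "none"


# Invented triplicate colorimetric readings for one pair and two controls.
pair = [1.0, 1.2, 1.4]
weak_mean, _ = mean_sem([0.4, 0.5, 0.6])
strong_mean, _ = mean_sem([2.8, 3.0, 3.2])

m, sem = mean_sem(pair)
print(f"mean={m:.2f} ± {sem:.2f} -> {call_interaction(m, weak_mean, strong_mean)}")
```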
The presence of long cysteine-rich C-terminal regions in the group III GmGγ proteins prompted us to test whether these proteins interact with one another. We tested interactions among GmGγ8, GmGγ9 and GmGγ10 in all nine possible combinations. The proteins exhibit strong interactions with each other (Figure 8).
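Nine combinations arise if every protein is tested both as bait and as prey against every other protein, including itself. That reading is an assumption; the sketch simply enumerates the ordered bait/prey pairs under it.

```python
from itertools import product

GROUP_III = ["GmGγ8", "GmGγ9", "GmGγ10"]

# Assumed design: each ordered (bait, prey) pair is one yeast two-hybrid test;
# self-pairs such as (GmGγ8, GmGγ8) test homotypic interaction.
pairs = list(product(GROUP_III, repeat=2))
for bait, prey in pairs:
    print(f"bait {bait} x prey {prey}")
print(len(pairs))  # 9
```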

Identification of three distinct Gγ protein families in soybean
Gγ proteins are an integral part of the G-protein heterotrimer. In mammalian systems, these proteins are relatively small (7-8.5 kDa) and are the most diverse of the three subunits, with only ~50% sequence homology between different isoforms. They also show a high degree of tissue-specific expression and isoform-specific interactions with Gβ proteins. In both plants and animals, Gγ proteins are required for proper targeting of the Gβ subunit and of the intact heterotrimer to the plasma membrane [5,36-38]. The identification of diverse Gγ proteins from soybean (Figure 1), together with multiple Gα and Gβ proteins, expands the number of possible heterotrimeric combinations in soybean and identifies novel, plant-specific components of G-protein signaling.
The C terminus of Gγ proteins is the basis of our classification of these proteins into three distinct groups. Group I comprises canonical Gγ proteins that have all the conserved features of Gγ proteins described from the mammalian paradigm. Homologs of this family are present in all plant species, including gymnosperms and mosses. Most of the plant Gγ proteins reported to date (AGG1, AGG2, RGG1 and GmGγ1-4) are members of this group (Figure 1). The group II proteins, GmGγ5, GmGγ6 and GmGγ7, differ from the group I proteins mainly in the absence of conserved cysteines in the C-terminal region. These genes appear to have evolved by a single amino acid substitution converting the CWIL motif into the RWI motif (the most common C-terminal motif in dicot plants), as most of the protein sequence and the exon-intron organization of group I and group II proteins are highly conserved (Figure 2). The rice RGG2 protein is a member of this family, although it ends in a KGDFS sequence that also seems to be conserved in other monocot species. The lack of cysteine residues in group II proteins precludes the possibility of prenylation; however, the proteins do seem to localize to the plasma membrane (Figure 6), as has also been reported for rice RGG2 [21]. A single cysteine present in the middle of these proteins could be a target for palmitoylation, which might assist in anchoring them to the plasma membrane [21]. Additionally, the proteins have a high number of positively charged and aromatic amino acids at the C terminus (7 out of 10 residues), which may target them to the plasma membrane through formation of an α helix [39]. In mammals, lack of prenylation, either by mutation of the conserved cysteine residue or by chemical inhibition, has been shown to result in localization of specific Gβγ proteins to the nucleus, with a possible role in regulating transcription, a function not typically associated with G-proteins [40]. The absence of a prenylation-less Gγ gene in the Arabidopsis genome has limited the functional characterization of this Gγ family in plants. The available insertional mutant line in RGG2 might help resolve whether such proteins play unconventional roles in plants.
Figure 7. Interaction between soybean G-protein β and γ subunits. Interaction between GmGβ (in pDEST32) and GmGγ (in pDEST22) proteins was determined using a yeast two-hybrid-based colorimetric assay. The assays were performed in triplicate and data were averaged. Error bars represent the standard error of the mean. Two biological replicates of the experiment were performed with similar results. (A) Interaction between GmGβ proteins and group I GmGγ proteins. (B) Interaction between GmGβ proteins and group II GmGγ proteins. (C) Interaction between GmGβ proteins and group III GmGγ proteins. Strong, weak and -ve refer to the interaction strength between RalGDS-wt-pDEST32 and Krev1-pDEST22, RalGDS-m1-pDEST32 and Krev1-pDEST22, and RalGDS-m2-pDEST32 and Krev1-pDEST22, respectively. The controls are provided with the ProQuest two-hybrid system (Invitrogen). doi:10.1371/journal.pone.0023361.g007
The group III proteins constitute a novel, plant-specific Gγ family. Homologs of these proteins are present in both angiosperms and gymnosperms. We applied the following criteria to establish the group III proteins as authentic Gγ proteins. First, the coiled-coil domain of group III Gγ proteins is highly similar to that of conventional Gγ proteins, with full conservation of the amino acid residues involved in the interaction with Gβ proteins (Figure 1). Second, the sizes of the second and third exons of these proteins are very similar to those of group I and group II Gγ proteins (Figure 2). Third, homology modeling-based analysis of three-dimensional protein structures using the fold recognition server Phyre (Protein Homology/analogY Recognition Engine, http://www.sbg.bio.ic.ac.uk/~phyre/) predicted these proteins to be Gγ proteins with 40-55% precision. Finally, the proteins showed strong and specific interactions with the GmGβ proteins. The Arabidopsis group III homolog At5g20635 has recently been identified as a novel Gγ protein [23].
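The three-way grouping used here rests on the C terminus: a CWIL-type prenylation box (group I), its absence (group II), or a long cysteine-rich region after the DPLL/DPFT motif (group III). The sketch below encodes that logic as a rule of thumb. The 20% cysteine cutoff, the function name and the toy sequences are hypothetical choices for illustration; the actual classification in this study also drew on alignments, exon structure and interaction data.

```python
import re
from typing import Optional

DPL_LIKE = re.compile(r"DP(?:LL|LI|FT)")


def classify_by_c_terminus(seq: str, cys_rich_cutoff: float = 0.2) -> Optional[str]:
    """Hypothetical rule of thumb echoing the grouping in the text:
    'III' if the region after the DPLL/DPFT motif is Cys-rich,
    'I'   if a Cys sits four residues from the C terminus (CaaX-like, e.g. CWIL),
    'II'  otherwise (e.g. an RWI ending). Returns None if no motif is found."""
    seq = seq.strip().upper()
    hit = DPL_LIKE.search(seq)
    if hit is None:
        return None
    tail = seq[hit.end():]
    if tail and tail.count("C") / len(tail) >= cys_rich_cutoff:
        return "III"
    if len(seq) >= 4 and seq[-4] == "C":
        return "I"
    return "II"


# Hypothetical toy sequences (not real GmGγ proteins).
for toy in ("MKDPLLKEGSCWIL", "MKDPLLKEGSRWI", "MKDPFTCCACCGCA"):
    print(toy, "->", classify_by_c_terminus(toy))
```

Checking the Cys-rich condition first matters: group III proteins also carry cysteines near the C terminus, so testing for a CaaX-like box first would misplace them in group I.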
Rice has two proteins homologous to the group III proteins, DEP1 (dense and erect panicle 1) and GS3 (grain size 3), which have been isolated as major QTLs for seed size and yield [24,25]. Interestingly, the rice Gα protein RGA1 is also involved in regulating seed size [41]. The rice DEP1 and GS3 proteins have been described as novel proteins containing a TNFR motif and a transmembrane domain, with homology to human keratin-associated proteins. We also identified a TNFR motif in the soybean group III proteins; however, using multiple transmembrane prediction programs, including TMHMM (http://www.cbs.dtu.dk/services/TMHMM/), HMMTOP (http://www.enzim.hu/hmmtop/) and DAS (http://www.sbc.su.se/~miklos/DAS/), we did not identify any transmembrane domains in GmGγ8, GmGγ9 or GmGγ10. The rice and Arabidopsis group III proteins are predicted to have a single transmembrane domain by DAS, but not by TMHMM or HMMTOP. Experimental verification of the presence of a transmembrane domain, and of any role it might play in localizing and/or positioning these proteins at the plasma membrane, will be needed to evaluate its importance. Interestingly, the YFP-fused group III GmGγ proteins, in addition to peripheral YFP fluorescence, also showed small vesicle-like structures, which were very evident for GmGγ8. These proteins might localize to endosomal structures in addition to the plasma membrane. However, since these proteins are highly cysteine-rich, such structures could also result from protein aggregate formation or self-interaction (Figure 8). Our current data cannot distinguish between these possibilities. Expressing the proteins from their native promoters in a protein-null background will help determine their correct localization.

Expression profile of soybean Gγ proteins and possible correlation with Gβ proteins
Analysis of the complete repertoire of GmGγ genes, and its comparison with the expression patterns of GmGα and GmGβ genes, began to reveal expression patterns specific to particular genes or gene families. Moreover, when absolute expression levels were compared across the different subunits, a wide range of expression levels was observed for the GmGγ genes (e.g. GmGγ9 versus GmGγ6 in Figure 3), whereas all GmGα and GmGβ genes were expressed at relatively similar levels. Additionally, duplicated gene pairs of GmGα or GmGβ typically showed similar expression patterns, a trend not observed for duplicated pairs of GmGγ genes. GmGγ9 was the most highly expressed gene, whereas its duplicate GmGγ8 was expressed at a moderate level. Likewise, GmGγ4 was highly expressed, but its duplicate GmGγ3 was relatively poorly expressed.
Some tissue specificity of gene expression was also evident when comparing multiple Gγ genes, such as the low expression of group II genes in reproductive organs and the lower expression of group III genes in nodules (Figure 3). Additionally, during seed development and germination, individual genes showed specific expression profiles, which in some cases corresponded well to the expression of GmGβ genes. These observations suggest that developmental stage-specific or tissue-specific expression of particular genes may lead to specific βγ combinations, similar to what is observed in mammalian systems [42,43].
Since the two rice homologs of the group III genes, DEP1 and GS3, are involved in grain size determination and yield, we focused on the expression pattern of soybean group III GmGγ genes during seed development. Our data showed that, of the three group III genes in soybean, GmGγ8 shows the most interesting expression pattern during seed development and germination (Figures 4 and 5). Expression of this gene was highly up-regulated as the seed undergoes maturation (stage S7 onwards), whereas a sharp decrease was observed during seed germination. This gene could be a true functional homolog of the rice DEP1 or GS3 gene. Additionally, the expression of this gene in seeds could depend on endogenous ABA and/or GA concentrations, as the levels of both hormones change significantly during seed maturation and germination. Interestingly, GmGγ5 followed a similar expression profile, with expression up-regulated during seed maturation and generally down-regulated during germination. These genes could be potential targets for manipulation to regulate soybean seed development. It was also evident that this group of genes is functionally highly variable, as GmGγ10 shows very little expression in seeds at all developmental stages and no change in expression during germination.

Figure 8. Interaction between different members of group III GmGγ proteins, determined using yeast two-hybrid-based growth and colorimetric assays. The assays were performed in triplicate and data were averaged; error bars represent the standard error of the mean. Two biological replicates of the experiment were performed with similar results. Inset shows growth of yeast colonies on media lacking Leu and Trp but containing 50 mM 3-AT. Strong, weak and -ve refer to the interaction strength between RalGDS-wt-pDEST32 with Krev1-pDEST22, RalGDS-m1-pDEST32 with Krev1-pDEST22 and RalGDS-m2-pDEST32 with Krev1-pDEST22, respectively. doi:10.1371/journal.pone.0023361.g008
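The Figure 8 legend states that the assays were run in triplicate, averaged, and plotted with standard-error-of-the-mean error bars. A minimal sketch of that summary follows, assuming SEM is computed as the sample standard deviation divided by √n (the usual definition; the legend does not give the formula). `mean_and_sem` is a hypothetical helper and the readings are toy numbers, not Figure 8 data.

```python
import math
from statistics import mean, stdev


def mean_and_sem(replicates):
    """Return (mean, standard error of the mean) for a list of replicate readings.

    SEM is taken here as the sample standard deviation (n - 1 denominator)
    divided by sqrt(n)."""
    n = len(replicates)
    if n < 2:
        raise ValueError("at least two replicates are needed for a standard error")
    return mean(replicates), stdev(replicates) / math.sqrt(n)


# Toy triplicate readings for one bait/prey pair (illustrative, not Figure 8 data).
avg, sem = mean_and_sem([1.0, 2.0, 3.0])
print(avg, round(sem, 4))  # → 2.0 0.5774
```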

Interaction specificity of GmGβ and GmGγ proteins
Specific mammalian Gβ and Gγ proteins form non-dissociable dimers and interact very strongly under a variety of in vitro and in vivo conditions. The data presented in this study suggest that there is specificity of interaction between different GmGβ and GmGγ proteins. It is especially intriguing that GmGβ1 and GmGβ3 are generally weaker interactors than GmGβ2 and GmGβ4, even though these proteins share more than 90% sequence similarity at the protein level (Figure 7). Additionally, the group II GmGγ proteins exhibited weaker interactions than the group I and group III proteins, although they interacted with similar strength with all four GmGβ proteins. Similar differences have been observed in the interactions between mammalian Gβ and Gγ proteins. The human Gβ1-4 share 80-90% sequence identity; however, Gβ1 generally interacts with multiple Gγ isoforms, Gβ2 is more restricted in its interaction partners and Gβ3 displays significantly weaker interactions [44][45][46]. In most cases, however, these interaction data are based on in vitro assays, and their relevance in the context of a specific cell type or signal remains to be evaluated in both mammalian and plant systems.
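The paragraph above quotes "sequence similarity" for the GmGβ proteins and "sequence identity" for human Gβ1-4; these are different measures. As a minimal sketch, assuming two sequences that have already been aligned with '-' for gaps, the hypothetical helper below computes percent identity only. Similarity would additionally credit conservative substitutions using a substitution matrix (for example BLOSUM62), and the choice of denominator is a convention that differs between tools.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped, so the
    denominator is the number of gap-free aligned columns (one of several
    conventions in use)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-"
    ]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


# Toy aligned fragments (hypothetical, not real Gβ sequences).
print(percent_identity("MSE-LK", "MTEALK"))  # → 80.0
```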
An intriguing observation in our studies is the strong interaction between different members of the group III proteins themselves (Figure 8). It would be interesting to assess how the oligomerization of these proteins might affect their interaction with GmGβ proteins or other possible interactors. The unusual nature of these proteins does not preclude their involvement in plant-specific signaling mechanisms that differ from those known from studies in mammalian systems. Plants do have several unconventional G-protein components, such as the extra-large G-proteins that have a Gα domain [47][48][49][50]; the GTG proteins that have GTP-binding and hydrolysis activity of their own and are regulated by GPA1 [51]; and the RGS1 protein that has a 7TM GPCR-like structure fused to an RGS domain [52]. Likewise, most of the known effector proteins of G-protein signaling in plants are distinct from the conventional effector proteins of mammalian systems. For example, PRN1 (Pirin1), a member of an iron-containing subgroup of the cupin superfamily, PD1 (prephenate dehydratase1), a protein involved in phenylalanine biosynthesis, and an NF-Y family transcription factor form a signaling complex in G-protein-mediated light and ABA signaling during early growth and development in Arabidopsis [53][54][55]. Similarly, the chloroplast-localized protein THF1 (thylakoid formation 1) is a GPA1 effector during sugar signaling [56]. The detailed study of specific pathways mediated by these unconventional proteins in the context of canonical heterotrimeric G-protein signaling is only in its infancy, and future work may reveal additional signaling mechanisms that evolved specifically in plants.

Conclusion
We have identified three distinct families of Gγ proteins, including a novel, plant-specific Gγ protein family, in the soybean genome. The elucidation of this complete repertoire of G-protein subunits in soybean reveals a highly elaborate G-protein signaling network in plants. Our data also suggest the presence of subunit-specific and tissue- or developmental stage-specific heterotrimeric combinations. Additionally, homologs of the group III Gγ proteins have been identified as major QTLs for grain size and yield in rice. Further work using RNAi and overexpression lines of soybean G-protein genes will help decipher their signaling mechanisms and evaluate their use as potential targets for biotechnological applications.

Plant material and growth conditions
Soybean (Glycine max L.) cv. Jack seeds were grown in a growth chamber (26/20°C day/night temperature, 14/10 h photoperiod, 800 µmol m⁻² s⁻¹ light intensity and 60% humidity). Different developmental stages of soybean plants were collected, immediately frozen in liquid nitrogen and stored at −80°C. Tissue for germination and stress-related experiments was prepared as described in [10].

Cloning of soybean G-protein genes
Soybean Gγ genes were identified by analysis of the latest soybean genome assembly (www.phytozome.net/soybean), using full-length and middle coiled-coil region Gγ protein sequences from Arabidopsis and rice as queries. Full-length Gγ genes were amplified from soybean seedling cDNA using gene-specific primers (Table S1). The eight newly identified genes were cloned into the pENTR/D-TOPO vector and confirmed by sequencing.

RNA isolation and qRT-PCR
Total RNA was isolated from different tissues of soybean plants using Trizol reagent (Invitrogen), and qRT-PCR experiments were performed essentially as described in [10]. The real-time PCR amplification was repeated three times and the data were averaged. Sequencing and melt curve analysis of the amplicons confirmed amplification specificity. A standard curve for each gene was generated using the cloned plasmid DNA of that gene.

Localization of GmGc proteins
The ten GmGγ genes were cloned into the pEarleyGate 104 [57] destination vector using LR clonase mix (Invitrogen). Sequence-confirmed recombinant plasmids containing YFP::GmGγ1-10 were transformed into A. tumefaciens strain GV3101 for subsequent plant transformation.
The abaxial surfaces of tobacco leaves were infiltrated with a log-phase culture of A. tumefaciens containing either the gene of interest or an empty vector control, as described in [58]. Infiltrated plants were incubated in darkness for 36 h, followed by 24 h in light. The leaves were imaged with a Zeiss LSM 510 laser scanning confocal microscope (Carl Zeiss) using a 40× water-immersion, 1.2 numerical aperture C-Apochromat objective. YFP was excited with the 458-nm line of the argon laser. At least three independent infiltrations were performed for each construct.

Protein-protein interaction assays
To determine the interaction between specific GmGβ and GmGγ proteins, a GATEWAY-based yeast two-hybrid assay was performed (ProQuest Two Hybrid System, Invitrogen). Briefly, GmGβ1-4 genes were cloned into the pDEST32 bait vector (containing the DNA-binding domain), and GmGγ1-10 genes (full-length, N-terminal and C-terminal parts) were cloned into the pDEST22 prey vector (containing the activation domain). The constructs were co-transformed into yeast host strain MaV203 (Invitrogen) in specific combinations. Interaction was determined by growth of diploid yeast colonies on minimal media lacking leucine and tryptophan but containing 50 mM 3-AT (3-amino-1,2,4-triazole). The quantitative strength of interaction was determined by a β-galactosidase (β-gal) expression assay using ONPG (o-nitrophenyl-β-D-galactopyranoside) as a substrate, according to the manufacturer's instructions.
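β-Galactosidase cleaves ONPG to yellow o-nitrophenol, which is read at 420 nm. The Methods defer to the manufacturer's protocol, so the normalization actually used is not stated here; as an assumption, the sketch below uses the classic Miller-unit definition, which normalizes OD420 by reaction time, assayed culture volume and cell density (OD600). The kit protocol may differ, and `miller_units` and the readings are hypothetical.

```python
def miller_units(od420, od600, minutes, volume_ml, od550=0.0):
    """β-galactosidase activity in Miller units from an ONPG assay.

    Units = 1000 * (OD420 - 1.75 * OD550) / (t * V * OD600), with t the
    reaction time in minutes and V the culture volume assayed in mL. OD550
    corrects OD420 for light scattering by cell debris; leave it at 0.0 if
    the lysate was cleared before reading."""
    if minutes <= 0 or volume_ml <= 0 or od600 <= 0:
        raise ValueError("time, volume and OD600 must be positive")
    return 1000.0 * (od420 - 1.75 * od550) / (minutes * volume_ml * od600)


# Toy readings for one bait/prey combination (illustrative only).
print(miller_units(od420=0.5, od600=1.0, minutes=10, volume_ml=0.1))  # → 500.0
```

Normalizing by OD600 is what makes values comparable across bait/prey combinations whose cultures grew to different densities.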

Supporting Information
Table S1 Gene-specific primers used for expression analysis of GmGγ genes.