The Repertoire of Heterotrimeric G Proteins and RGS Proteins in Ciona intestinalis

Background Heterotrimeric G proteins and regulators of G protein signaling (RGS) proteins are key downstream interacting partners in the G protein coupled receptor (GPCR) signaling pathway. The highly versatile GPCR transmembrane signaling system is a consequence of the coupling of a diverse set of receptors to downstream partners that include multiple subforms of G proteins and regulatory proteins including RGS proteins, among others. While the GPCR repertoire of Ciona intestinalis, a basal chordate, is known, the repertoire of the heterotrimeric G proteins and RGS proteins is unknown. Methodology/Principal Findings In the present study, we performed an in silico genome-wide search of C. intestinalis for its complement of G proteins and RGS proteins. The identification of several one-to-one orthologs of human G proteins at the levels of families, subfamilies and types, and of homologs of the human RGS proteins, suggests an evolutionarily conserved structure-function relationship of the GPCR signaling mechanism in the chordates. Conclusions The C. intestinalis genome encodes a highly conserved, albeit limited, repertoire of the heterotrimeric G protein complexes, with the number of subunit types comparable with that in lower eukaryotes.


Introduction
G protein coupled receptors (GPCRs) comprise a large family of diverse transmembrane signaling proteins that receive information from various extracellular stimuli including hormones, neurotransmitters or sensory stimuli. The three-component GPCR signaling system is so named based on its ability to recruit and regulate the activity of intracellular heterotrimeric G proteins (guanine nucleotide-binding protein). Ligand binding at the extracellular recognition sites on the GPCRs is transduced into intracellular signals through the coupling of the receptor and G protein and between G protein and other effector proteins, all of which can be independently regulated by additional proteins or at the transcriptional level.
The heterotrimeric G protein is composed of α, β, and γ subunits. The inactive, GDP-bound heterotrimer is activated when the agonist-activated receptor induces a conformational change in the G protein trimer, resulting in the Gα subunit binding GTP in exchange for GDP. This exchange leads to the dissociation of the Gα-GTP and Gβγ subunits, which then interact with downstream target effectors, thus activating and regulating a signaling cascade. The cellular response is turned off when the Gα subunit hydrolyses GTP to GDP and the Gα-GDP and Gβγ subunits re-associate to form the Gαβγ trimer. The Gβ and Gγ subunits work as an obligate complex, a functional unit that can only be dissociated under denaturing conditions. The structural and functional aspects of G proteins and their receptor-mediated activation have been extensively studied [1][2][3][4][5]. The third component of the GPCR signaling system is the G protein-regulated effector. Several effectors enhance the GTPase activity of the Gα subunit, thus playing a role in deactivation and modulation of G protein-mediated signaling. A recent modification to the standard model of GPCR signaling has come from a family of proteins called "regulators of G protein signaling", or RGS proteins, which have been found to accelerate the intrinsic GTPase activity of Gα subunits independent of the Gβγ subunits [6].
The modular architecture of the G protein-mediated transmembrane signaling system is highly versatile and specific, and rests on the fact that there are numerous receptors (around 800 in human) and several types of G proteins and effector proteins [7]. An enormous diversity of heterotrimeric complexes can be assembled from a limited repertoire of G protein subunits, which are then activated by different receptors [8]. Most receptors are also able to activate more than one type of G protein [9]. Moreover, several Gβγ complexes can interact with the same Gα, suggesting that differential expression or subcellular localization is important in the regulation of downstream signaling [10].
Ciona intestinalis (henceforth referred to as Ciona) is a protochordate belonging to the ascidian class of chordates that diverged from the vertebrate lineage about 520 million years ago. An out-group to the vertebrates, this ascidian has the smallest genome of any experimentally manipulable chordate. A translucent morphology, availability of developmental mutants, quickly spawning embryos, established transgenic, morpholino-based gene knockdown, in situ hybridization experimental procedures and extensive EST data are some of the many advantages that make Ciona an excellent model organism to study developmental and evolutionary biology of the vertebrate-invertebrate split. Moreover, Ciona possesses organ systems that are homologous to vertebrate heart, thyroid, blood, digestive and neural complex systems [11][12][13][14]. In an earlier study, a genome wide survey of the repertoire of GPCRs in Ciona reported the presence of 169 putative receptors. A comparative analysis of the repertoire revealed a high level of orthology with that of human GPCRs (about 40% of Ciona receptors) [15]. Here we extend the previous study and report the identification of the repertoire of the heterotrimeric G proteins and the RGS proteins in Ciona and present a comparative analysis with that in human. The analyses could provide insights into the origin and evolution of the GPCR signaling system in a protochordate and its further diversification into the vertebrate lineage. Our results thus could serve as a basis for carrying out experimental studies to address functional and regulatory aspects of GPCR mediated signaling.

Protein data mining
The Joint Genome Institute (JGI) C. intestinalis genome versions 2.0 and 1.0 databases (http://genome.jgi-psf.org/Cioin2/Cioin2.home.html; http://genome.jgi-psf.org/ciona4/ciona4.home.html) were used as the source for obtaining the complete proteome [16][17]. The v2.0 proteome dataset includes 15,852 proteins, and sequences in the current version include both automated and, to a lesser extent, manually curated annotations. The sequences of the query G proteins and RGS proteins were obtained from the NCBI non-redundant database and the gpDB Database [18]. The gpDB Database is a relational database of GPCRs and GPCR-interacting proteins based on interactions reported in the literature. The BLAST program was used for initial identification with an E-value cutoff of 10⁻³ [19]. Additionally, searches were carried out using customized Hidden Markov Models (HMMs) built separately for all known families/sub-families listed in the gpDB Database. The HMMs were constructed using the HMMBUILD and HMMCALIBRATE programs of the HMMER package version 2.3.2 [20]. Default settings were used for all models constructed with HMMBUILD and HMMCALIBRATE. Ciona sequences that returned hits with an E-value better than 0.01 were extracted. At this stage of the analysis, E-values of low stringency were used for the BLAST and HMM search algorithms to enable detection of all true positives. False positives, if any, were sifted out in the next stage with a higher level of stringency, where the results of BLAST, best reciprocal BLAST hit and HMM-based search were compared and common hits satisfying the cutoff criteria were taken as confirmed hits of Ciona. All except two of the Ciona sequence accession numbers reported in this study, including those of several manually edited fragments, refer to the identifiers assigned in the JGI v2.0 genome database. GenBank accession numbers, where available, are given in the supporting information.
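The two-stage filtering described above can be illustrated with a minimal sketch. It assumes the BLAST cutoff reads as 10⁻³ and uses the stated HMM cutoff of 0.01. The function names, the dictionary layout (sequence identifier mapped to best E-value) and the example identifiers are hypothetical and are not taken from the study.

```python
from typing import Dict, Set

# Assumption: the BLAST cutoff in the text is read as 1e-3.
BLAST_EVALUE_CUTOFF = 1e-3
# HMM cutoff as stated in the text ("better than 0.01").
HMM_EVALUE_CUTOFF = 1e-2


def passing(hits: Dict[str, float], cutoff: float) -> Set[str]:
    """Return identifiers whose best E-value is below the cutoff."""
    return {seq_id for seq_id, evalue in hits.items() if evalue < cutoff}


def confirmed_hits(
    blast_hits: Dict[str, float],
    reciprocal_best: Set[str],
    hmm_hits: Dict[str, float],
) -> Set[str]:
    """Keep only sequences supported by all three lines of evidence:
    a forward BLAST hit, a best reciprocal BLAST hit, and an HMM hit."""
    blast_ok = passing(blast_hits, BLAST_EVALUE_CUTOFF)
    hmm_ok = passing(hmm_hits, HMM_EVALUE_CUTOFF)
    return blast_ok & reciprocal_best & hmm_ok


# Hypothetical example data (identifiers are placeholders).
blast = {"seqA": 1e-50, "seqB": 1e-4, "seqC": 0.5}
reciprocal = {"seqA", "seqC"}
hmm = {"seqA": 1e-30, "seqB": 1e-5}

result = confirmed_hits(blast, reciprocal, hmm)
```

Here only the sequence supported by the forward BLAST search, the reciprocal best hit and the HMM search survives. Whether the second stage also applied tighter E-value thresholds is not specified in the text, so the sketch models only the intersection step.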

Sequence comparison and phylogenetic methods
Multiple sequence alignment of the Ciona sequences with known orthologs picked up from the gpDB database was performed using the CLUSTALX 1.8 program [21] with the default parameters for substitution matrices and gap penalties. Manual adjustments of the alignments were made when necessary using the BioEdit program [22]. All phylogenetic analysis was carried out using the PHYLIP package, as implemented in the MOBYLE Portal (http://mobyle.pasteur.fr/) [23]. Neighbor-Joining (NJ) and Maximum-Likelihood (ML) methods as implemented in the NEIGHBOR and PROML programs were employed for tree searching and inference. The statistical reliability of the phylogenetic trees was tested by interior branch analysis with 500 bootstrap replicates used to create a consensus tree. Phylogenetic trees created using the NJ method and the ML method resulted in trees with identical topologies and clusters with significant bootstrap support. Alignments and secondary structures were displayed using ESPript 2.2 (http://espript.ibcp.fr) [24].
Expressed Sequence Tag hits and tissue/developmental stage based expression data
The entire UniGene database for the Ciona transcript sequences was downloaded from the ftp site (ftp://ftp.ncbi.nih.gov/repository/UniGene/) and locally installed. All Ciona G protein and RGS protein sequences were queried against the UniGene EST database using TBLASTN with the E-value set at 1e-5. Identifiers of the statistically significant EST hits were imported into an Excel sheet and categorized based on the developmental stage or tissue from which they were derived.
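The EST filtering and categorization step can be sketched as follows. This assumes the TBLASTN results are available in the standard 12-column tab-separated tabular format, with the E-value in the eleventh column, and that the cutoff reads as 1e-5. The function names, EST identifiers and stage labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

# Assumption: the TBLASTN cutoff in the text is read as 1e-5.
EST_EVALUE_CUTOFF = 1e-5


def significant_est_hits(
    tabular_lines: Iterable[str], cutoff: float = EST_EVALUE_CUTOFF
) -> Dict[str, Set[str]]:
    """Collect EST identifiers per query from tabular BLAST output,
    keeping only hits at or below the E-value cutoff."""
    hits: Dict[str, Set[str]] = defaultdict(set)
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        query, est, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= cutoff:
            hits[query].add(est)
    return dict(hits)


def count_by_stage(
    hits: Dict[str, Set[str]], est_stage: Dict[str, str]
) -> Dict[str, Dict[str, int]]:
    """Count, for each query, how many EST hits derive from each stage/tissue."""
    counts: Dict[str, Dict[str, int]] = {}
    for query, ests in hits.items():
        per_stage: Dict[str, int] = defaultdict(int)
        for est in ests:
            per_stage[est_stage.get(est, "unknown")] += 1
        counts[query] = dict(per_stage)
    return counts


# Hypothetical example: two queries, three EST hits, one below significance.
lines = [
    "q1\test1\t90.0\t100\t10\t0\t1\t100\t1\t300\t2e-30\t120",
    "q1\test2\t40.0\t50\t30\t0\t1\t50\t1\t150\t0.5\t20",
    "q2\test3\t85.0\t80\t12\t0\t1\t80\t1\t240\t1e-12\t90",
]
stages = {"est1": "larva", "est3": "adult heart"}

hits = significant_est_hits(lines)
stage_counts = count_by_stage(hits, stages)
```

A query with no entry in the result corresponds to a predicted protein without EST support at this threshold.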

Results and Discussion
In order to generate an independent data set of the repertoire of G proteins and RGS proteins, a data mining approach was taken using the BLAST program and HMM-based searches. Sequences from the gpDB Database and the NCBI were collected and classified into different groups and families of G proteins and RGS proteins. The sequences were carefully chosen to represent diverse organismal lineages and groups as reported in the gpDB classification. The search strategy, which used sequences as single queries and also combined distantly homologous sequences into a profile for a more sensitive HMM-based search, resulted in the identification of unambiguous homologs in the Ciona proteome. The query sequences used for the BLAST searches and for the construction of HMMs are available as supplemental data (Data S1). Further classification of the Ciona sequences into orthologous sets of different classes, families, subfamilies and types was carried out based on phylogenetic analyses (Table 1, Table S1). The complete amino acid sequences of all identified Ciona proteins are available as supplemental data (Data S2).
Since EST evidence covers three quarters of the predicted Ciona genes, we also identified EST matches for the predicted Ciona G proteins and RGS proteins. Except for 2 out of 10 Ga sequences and 5 out of 14 RGS sequences, all other sequences had at least one EST match mapped to the gene sequences. EST matches for these sequences were sorted as being derived from different developmental stages as well as tissues (Table S2).

The Ga subunit
The Gα class of the heterotrimeric G proteins comprises four main families, Gαs, Gαi/o, Gαq/11 and Gα12/13, each consisting of several members with different expression patterns. Members of these families are structurally conserved and often share functional properties. The gpDB Database lists four other families that include members from nematodes, fruitfly, fungal_plant and one group designated as 'Unclassified Gα'. The human repertoire of the Gα class as listed in the gpDB database includes a total of 31 sequences, with the families Gαs, Gαi/o, Gαq/11 and Gα12/13 containing 9, 13, 5 and 2 members, respectively. In our study, a total of 10 unambiguous orthologs of the Gα class were identified in Ciona. Three of the ten sequences were fragments, possibly the result of errors in the automated splice site prediction methods used. In these cases, additional searches carried out against the NCBI non-redundant protein database and the Ciona proteome v1.0 yielded two sequences that were full-length proteins. The validity of the edited sequences was verified by comparison of these regions against the EST database. In the case of the third fragment (281048), the missing region was identified by the use of translated alignments against the Ciona genomic regions where the missing region was expected to be found based on alignment with the human orthologous protein. However, in this case, the EST database contained no hits corresponding to the genomic region. The Ciona Gα homologs share a relatively high pairwise sequence identity, ranging from 33-83%, with the best match in human (Table 1). The sequence identity values within the Ciona Gα proteins range from 26-99%.
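The pairwise identity values quoted above depend on how identity is defined. The text does not state the definition used, so the following minimal sketch makes an assumption: identity is the percentage of identical residues over alignment columns in which neither sequence has a gap. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of a pairwise or multiple alignment.

    Assumption: columns with a gap in either sequence are excluded from
    both the numerator and the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a.upper() == res_b.upper():
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy aligned fragments (not real Ciona or human sequences).
identity = percent_identity("GAGES-K", "GAGDSGK")
```

Other conventions, such as dividing by the length of the shorter sequence or by the full alignment length, give different values for the same alignment, which matters when comparing identities reported by different tools.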
The classification of the Ciona sequences into different families and subfamilies, carried out by phylogenetic analysis, revealed distinct clusters that corresponded to known families and subfamilies in human and other organisms (Figure 1). The phylogenetic tree showed monophyletic clades with good bootstrap support for each of the major families. The clustering shows two orthologs of the Gαq/11 family and one ortholog each of the Gαi/o family, the Gαs family and the Gα12/13 family. The Gαi/o ortholog (209567) shows the highest identity to that of human. We did not, however, find Ciona orthologs of the three non-vertebrate families. Five Ciona homologs clustered into three separate groups with no clear one-to-one orthologous relationship to the known families. However, among these, two Ciona pairs (220350/273922; 201119/287065), which are located basal to the main Gαi/o family grouping, share high sequence identities of about 59% to the human Gαi1. One of these pairs (220350/273922) shares an identity of 99% and appears to be the result of a recent gene duplication event.
The two genes lie at the same locus and are positioned in tandem, separated by about 3500 nucleotides. These divergent Ciona-specific Gα proteins were placed into a group designated 'Other'. One Ciona Gα sequence (281048) was found to be an outgroup to all the other clusters. Although this homolog retains all the critical and conserved motifs, it shows slight deviations in two consensus motifs, namely the GAGE motif in the N-terminal region and the TCAT motif in the C-terminal region of the sequence (Figure 2). This homolog is also the most divergent among the Ciona Gα proteins (26-34%) and shares an identity of 33% with the top human Gα match. Hence this homolog was placed into a group designated 'Unclassified'. The presence of only one or two members in Ciona belonging to each of the major Gα subunit families is in contrast to the presence of several members of each family in mammals. Moreover, a lower eukaryote like Caenorhabditis elegans has 20 Gα subunit genes, of which four belong to the four major mammalian families, while the remaining 16 are not homologous to the mammalian subunits and are a product of lineage-specific gene expansion [25].
The Gα subunits in Ciona share a strong level of sequence conservation with those from other organisms including human, reflecting common structure-function relationships (Figure 2). The Gα subunit contains two domains, a Ras-like domain with a nucleotide-binding pocket and an all α-helical domain, composed of a six-helix bundle that makes up a lid on the nucleotide-binding pocket. Binding to GTP causes conformational changes within three flexible segments of Gα, namely the "switch" regions (I-III), leading to a well-ordered, GTP-bound activated conformation with lowered affinity for the Gβγ complex and increased affinity for downstream effectors. Several crystal structures of Gα-effector complexes have revealed that a common site for effector interactions is provided by a highly conserved hydrophobic cleft [...] the critical nucleophilic water molecule responsible for hydrolysis of the γ-phosphate, is completely conserved (numbered as in human Gαi1, Figure 2) [26]. The communication between the activated GPCR and the Gα subunit is modulated primarily through the C terminus/α5 helix and the α4/β6 loop, which invoke further structural changes necessary for GDP release [27][28][29]. The loop connecting β6/α5 contains a completely conserved TCAT motif that stabilizes the binding of GDP. The release of GDP is triggered by the receptor contacts to the C terminus and through the α5 helix, which further structurally modulate the TCAT motif. Other highly conserved regions of the Gα subunit involved in transmitting conformational changes of the TCAT motif include the α3 helix, which connects the α3/β5 loop to switch III, and the β6 strand [30][31][5]. The N-terminal region of the Ciona Gα subunits also contains highly conserved Cys and Gly residues that are sites of palmitoylation and myristoylation, which anchor the subunits to the membrane, thus regulating membrane localization and protein-protein interaction [32].

The Gβγ complex
A total of two homologs of the Gβ class were identified in Ciona, with both homologs exhibiting the characteristic conserved set of WD repeat motifs. Figure 3 shows the relationships of the repertoire of Ciona Gβ homologs. The Gβ class of proteins consists of a single family and subfamily that includes five types of Gβ proteins named Gβ(1-5). In the mammalian genome, types 1 to 4 are highly conserved, sharing greater than 80% sequence identity, but Gβ5 is divergent, sharing only about 50% identity [33][34]. Our phylogenetic analysis indicates that one Ciona homolog (283613) exhibits a clear one-to-one orthologous relationship to the Gβ(1-4) group, while the other sequence (297548) is orthologous to the Gβ5 members (Figure 3). The two Ciona Gβ members share a sequence identity of about 41% and are more divergent from each other than their human counterparts. The Ciona Gβ(1-4) ortholog shares a high sequence identity of about 77% with the human Gβ(1-4) group while sharing about 40% identity with the human Gβ5. The Ciona Gβ5 ortholog shares a sequence identity of 51% with the human Gβ5.
The Ciona genome holds two homologs of the Gγ class of proteins. It should be noted that a search for Gγ homologs in the Ciona v2.0 database resulted in the identification of just a single hit. However, parallel searches of the v1.0 database resulted in the identification of another homolog of the Gγ protein that has not been modeled and annotated in the v2.0 database. We have therefore annotated this sequence by the accession number corresponding to the Ciona v1.0 genome database. For both Gγ homologs, the protein models have been verified as having EST hits. In human, 12 Gγ gene products are known. Furthermore, among all heterotrimeric G protein subunits identified in Ciona, the Gγ homologs display the largest divergence from those of human. The Ciona Gγ sequences share their highest sequence identities of 40% and 34% with human Gγ type 7 and 12 members. Phylogenetic analysis of the Gγ members revealed that the Ciona Gγ subunits show no reliable clustering to any known Gγ types as classified in the gpDB database (Figure 4).
The Gβ subunit has an extended N-terminal α-helix followed by a seven-bladed β-propeller fold that is composed of seven WD sequence repeats. The N-terminus is primarily supported by coiled-coil interactions with Gγ, which is an extended stretch of two α-helices joined by a loop, with additional interactions to loops that connect the CD and AB strands of Gβ blades 5 and 6 [35][36] (Figures 5a,b). The formation of the heterotrimeric Gα and Gβγ complex involves two major sites of interaction. The first is the extensive burial of the β3/α2 loop and α2 helix (switch II) of Gα within six of the seven WD repeats of Gβ. Numerous interactions involve a hydrophobic core formed by Trp211 of Gα and Trp99 of Gβ (numbering as in human Gβ1), which forms the basis for Gβγ-mediated guanine nucleotide dissociation inhibitor activity and for Gβγ binding between Gα-GDP and Gβγ-effectors [5]. Several key switch II residues, such as Lys209, Trp211, Ile212 and Phe215, that interact with Gβ1 are completely conserved in Ciona. The other major region of Gα/Gβ interactions involves the WD1-2 of Gβ and the extended N-terminal helix of Gα (Figure 2, Figures 5a,b). There is also considerable overlap of the effector binding site on Gβγ with the region that interacts with switch II of Gα [37]. Our sequence comparisons reveal that a majority of the functionally important residues of the Gβγ sequences are identical between Ciona and other organisms including human, indicating highly conserved modes of complex formation.
The selectivity and diversity of the Gβγ complex of mammalian G proteins arise from the numerous potential dimer combinations that can be assembled from a repertoire of 5 Gβ protein subunits and 12 Gγ subunits. The Gβ(1-4) subunits form a tight complex with Gγ subunits and can only be separated under denaturing conditions [38]. The sequence divergence between the Gβ(1-4) and the Gβ5 types is reflected in the relatively weak interactions of Gβ5 with the Gγ subunits, in contrast to those in the Gβ(1-4)γ complex. Moreover, unlike the other Gβ subunits, which are generally found associated with the membrane, Gβ5 exists in a significant amount in the soluble cell fraction [39]. It is now well established that Gβ5 forms an in vivo complex with the R7 subfamily of the RGS proteins in preference to Gγ, and that the novel Gβ5-RGS dimer formation is primarily mediated through interactions via a Gγ-like (GGL) domain present N-terminal to the RGS domain [34,40]. The identification of the Gβ5 ortholog in Ciona also suggests that the Ciona genome most likely encodes homologs of GGL domain-containing RGS proteins.

RGS proteins
The repertoire of the human RGS family includes at least 37 members that are broadly classified into eight subfamilies. All members contain a core Gα-interacting RGS domain, while they differ widely in their overall sizes and in the organization of several functional domains. RGS proteins are multifunctional and have the ability to negatively regulate Gα subunits and also function as effector antagonists. This multiple modular architecture enables regulatory selectivity and specificity in their mediation with diverse signaling partners that include all three components of the GPCR signaling pathway [41,6]. We identified 14 proteins in Ciona that possess the RGS domain (Table S1, Data S1). Eight of these are sequences that contain just the RGS domain at the C-terminal end, with N-terminal regions of variable lengths ranging from 28 to 349 amino acid residues. It should be noted that 9 sequences identified from the Ciona v2.0 database turned out to be fragments and, in all these cases, the complete sequences corresponding to these members were identified in the v1.0 database. One other fragment sequence (ci0100153349) was identified only from the v1.0 database, with no corresponding protein model in v2.0. The complete sequence corresponding to this member was identified in the GenBank database. An examination of the RGS protein sequences for additional domains, carried out using the Conserved Domain Database (CDD), indicated that six sequences contain additional domains that could confer further functionality [42]. One member has a DEP (Dishevelled, Egl-10 and Pleckstrin) domain N-terminal to the RGS domain, while two sequences contain a C2 domain in addition to the RGS domain. The RGS repertoire also includes one sequence that contains four domains (an N-terminal PXA (Phox-associated) domain, an RGS domain, a PX (Phox) domain and a Nexin_C domain) and another sequence with an RGS domain and a C-terminal DIX domain (Figure 6).
Among these various domains, the DEP domain has been shown to play an important role in mediating the interaction of an RGS protein with the C-terminal tail of the GPCR, thus placing the RGS protein in close proximity to its substrate Gα subunit [43]. The PX domain is an important phosphoinositide-binding module with varying lipid-binding specificities and is implicated in cell signaling.
Our results further revealed that Ciona does indeed contain one RGS homolog (373767) with a GGL domain located between the DEP and RGS domains, analogous to the domain organization that characterizes members of the R7 subfamily (RGS6, RGS7, RGS9, RGS11) of the human RGS family (Figure 6). This finding strongly supports the formation of an RGS complex with the Gβ5 ortholog in Ciona. Sequence comparison of the GGL domain with the Gγ subunits from Ciona as well as other organisms indicates the homologous relationship of the GGL domain to the Gγ subunit (Figure 5b). This is further confirmed by the existence of significant monophyletic clades that separate the Gγ and GGL domains (Figure 4). The GGL domain of Ciona shares an identity of 17% and 19% with the Gγ subunit in Ciona and human, respectively, and shares an identity of 28-49% with the four GGL domains in human. Recent structural studies show that the marked difference in specificity of Gβ5 for the GGL domain versus Gγ arises from the cumulative effect of several minor changes in the corresponding sequences [44]. It is evident from our comparative analysis that the Ciona GGL domain-containing RGS homolog could indeed interact with the Gβ5 homolog in an analogous mode.

Heterotrimeric G protein complexes
The results of our analysis, indicating a repertoire of only one homolog each of the Gβ(1-4) type and the Gβ5 type subunits, two homologs of the Gγ subunit and one homolog of the GGL-containing RGS protein, imply that at most five potential Gβ heterodimer complex combinations (two Gβ(1-4)γ complexes, two Gβ5γ complexes and one Gβ5-RGS complex) are likely to exist in Ciona. Furthermore, with 10 Gα subunits, the Ciona genome potentially displays a limited repertoire of 50 Gαβγ heterotrimeric complexes. It is noteworthy that the diversity in the types of Gβγ complexes in Ciona is comparable to that seen in lower eukaryotic model organisms. For example, yeast has one Gβ subunit (GenBank ID: AAA35114) that is equally distant to the mammalian Gβ(1-4) and Gβ5 types. C. elegans has two Gβ subunit sequences, one orthologous to the Gβ(1-4) type and the other to the Gβ5 type. Furthermore, yeast has a single Gγ subunit (GenBank ID: AAA35110) that shows poor similarity to the human homolog, while C. elegans has two Gγ subunits (GenBank ID: AAK55965, AAK55966) that do not fall into any particular mammalian subfamilies (Figure 4).
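The counting argument above can be made explicit with a short enumeration. The subunit labels below are placeholders rather than database identifiers, and the sketch follows the text in treating every Gα subunit as able to pair with every dimer, which gives an upper bound rather than a prediction of which complexes actually form.

```python
from itertools import product

# Placeholder labels for the Ciona subunits counted in the text.
G_BETA = ["Gbeta(1-4)", "Gbeta5"]    # one homolog of each type
G_GAMMA = ["Ggamma_a", "Ggamma_b"]   # two Ggamma homologs
GGL_RGS = ["RGS_GGL"]                # one GGL domain-containing RGS protein
N_G_ALPHA = 10                       # ten Galpha subunits


def beta_dimers():
    """Enumerate the potential dimers formed by the Gbeta subunits."""
    # Every Gbeta type paired with every Ggamma homolog.
    dimers = list(product(G_BETA, G_GAMMA))
    # Gbeta5 can additionally pair with the GGL domain of the RGS protein.
    dimers += [("Gbeta5", rgs) for rgs in GGL_RGS]
    return dimers


def heterotrimer_upper_bound():
    """Upper bound on complexes: each Galpha with each potential dimer."""
    return N_G_ALPHA * len(beta_dimers())


dimers = beta_dimers()
total = heterotrimer_upper_bound()
```

The same enumeration applied to the mammalian repertoire of 5 Gβ and 12 Gγ subunits yields 60 potential Gβγ dimers before any Gα pairing, which illustrates how much smaller the Ciona combinatorial space is.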

Conclusions
Ciona appears to have a highly compact set of heterotrimeric G protein complexes, and these results complement the earlier findings of a compact set of GPCRs in the Ciona genome. The high levels of sequence similarity to the human orthologs indicate that they serve functions that are general to vertebrate signaling biology and that Ciona can indeed be used as a model organism to study vertebrate GPCR signaling events. Extensive experimental characterization of the developmental regulatory network in Ciona points to the possibility of similar studies for GPCR signaling pathways [45]. The clear one-to-one orthology between the human and Ciona G protein repertoires, which extends to the levels of protein family, subfamily and type, indicates a strong conservation in the evolution of the mechanistic basis of specificity and diversity of the chordate GPCR signaling pathways. Thus, the limited repertoire of heterotrimeric G protein complexes in Ciona provides a unique opportunity to extend these predictions and guide experimental studies that explore the basis of receptor activation, specificity and selectivity and the phenotypic effects of GPCR signaling events, reinforce the functional importance of existing receptor-G protein/RGS protein interactions, and identify new partners and downstream targets.

Figure 5. A. Alignment of the Gβ sequences from representative organisms. Secondary structure assignments (α-helices, β-strands) are derived from mammalian Gβ1 (PDB ID: 1GP2). The four β-strands that comprise each of the seven WD repeat segments within Gβ subunits are marked with horizontal arrows to match the tertiary structure of Gβ1. Residues in Gβ that contact Gγ are marked with inverted triangles and the contacts with the switch regions in Gα are indicated with stars. Key Gβ1 residues (Trp99, Asp228, and Asp246) interacting with the Gαi1 switch II helix are represented with upright triangles. B. Alignments of the Gγ subunits and GGL domains from representative organisms. Secondary structure assignments are derived from mammalian Gγ2 (PDB ID: 1GP2). Gγ subunit-like (GGL) domains and Gγ subunits are structurally equivalent. Residues in Gγ that contact Gβ are marked with stars. Bold letters indicate highly conserved residues. The inverted triangle at the C-terminus of the Gγ sequences represents the prenylation site. The horizontal line in the alignment separates the two blocks of Gγ and RGS sequences (GGL domain region). The black vertical boxes in the lower block show conserved residues specific to the GGL sequences. The inverted triangles correspond to residues in the RGS9 of human and mouse that make contacts with the Gα subunit, while the upright triangles correspond to the specific contacts with the Gβ5 subunit. doi:10.1371/journal.pone.0007349.g005

Supporting Information
Data S1 Amino acid sequences of heterotrimeric G proteins and RGS proteins used as queries in BLAST and HMM search programs (FASTA format)