Identification, Activity and Disulfide Connectivity of C-di-GMP Regulating Proteins in Mycobacterium tuberculosis

Cyclic di-GMP (c-di-GMP), a bacterial second messenger, plays a key role in the survival and adaptation of bacteria under different environmental conditions. The level of c-di-GMP is regulated by two opposing activities, diguanylate cyclase (DGC) and phosphodiesterase (PDE-A), exhibited by the GGDEF and EAL domains, respectively, which can occur in the same protein. Previously, we reported a bifunctional GGDEF–EAL domain protein, MSDGC-1, from Mycobacterium smegmatis that shows both activities (Kumar and Chatterji, 2008). In this report, we have identified and characterized the homologous protein from Mycobacterium tuberculosis (Rv1354c), named MtbDGC. MtbDGC is also bifunctional and can synthesize and degrade c-di-GMP in vitro. Further, Mtbdgc expressed in M. smegmatis complemented the MSDGC-1 knockout strain by restoring its long-term survival. Another protein, Rv1357c, named MtbPDE, is an EAL domain protein that degrades c-di-GMP to pGpG in vitro. Rv1354c and Rv1357c have seven cysteine residues distributed along the full length of the protein. Disulfide bonds play an important role in stabilizing protein structure and regulating protein function. By proteolytic digestion and mass spectrometric analysis of MtbDGC, the connectivity of the cysteine pairs Cys94-Cys584, Cys2-Cys479 and Cys429-Cys614 was determined, whereas the third cysteine from the N terminus (Cys406) was found to be free, as further confirmed by iodoacetamide labeling. Bioinformatic modeling also supported the disulfide connectivity obtained by mass spectrometric analysis. Cys406 was mutated to serine by site-directed mutagenesis, and the mutant MtbC406S was inactive, being unable to either synthesize or degrade c-di-GMP. The disulfide connectivity established here should help in understanding the structure-function relationship of MtbDGC.


Introduction
Cell-cell communication, or quorum sensing, plays a major role in the survival and maintenance of bacteria during the stationary phase. One of the interesting aspects of quorum sensing is the coordinated response of bacteria, such as biofilm formation, antibiotic production, sporulation and expression of virulence factors [1]. A cell produces a small autoinducer molecule and simultaneously senses the concentration of the autoinducer at the cell surface [2]. Second messengers act as autoinducers and relay signals received at the cell surface to target molecules within the cell. Nucleotide derivatives that act as second messengers have been extensively studied for their regulatory functions [3]. Cyclic adenosine monophosphate (cAMP), cyclic guanosine monophosphate (cGMP) and guanosine 3′,5′-bis(pyrophosphate) (ppGpp) are all important second messengers in both prokaryotes and eukaryotes. cGMP is commonly used in eukaryotes but plays only a minor role in bacteria [4]. cAMP activates the catabolite regulatory protein (CRP), a transcriptional regulator of genes involved in carbon metabolism [1,5]. ppGpp, on the other hand, regulates bacterial survival during nutrient starvation [6]. Another nucleotide, bis-(3′-5′)-cyclic dimeric guanosine monophosphate (c-di-GMP), is involved in modulating the cell surface and biofilm formation in several bacteria. This molecule was first reported more than 20 years ago as a positive allosteric regulator of cellulose synthesis [7,8].
c-di-GMP is synthesized from two GTP molecules by diguanylate cyclase (DGC) and degraded to linear diguanylic acid (pGpG) by phosphodiesterases (PDE) [9,10]. These two opposing enzymatic activities regulate the cellular pool of c-di-GMP. The DGC and PDE activities are associated with the conserved amino acid motifs GGDEF and EAL or HD-GYP, respectively [3,11,12]. Whole-genome sequencing has shown that GGDEF and EAL domains are widespread in bacteria but absent from eukaryotes [13]. Gram-negative bacterial genomes harbor large numbers of proteins of the GGDEF-EAL domain superfamily, whereas Gram-positive bacteria have only a few. For example, Vibrio cholerae and Escherichia coli have 53 and 36 such proteins, whereas Bacillus subtilis and Mycobacterium smegmatis have 7 and 1 GGDEF-EAL domain proteins, respectively [14]. In many cases the GGDEF and EAL domains are present in tandem, and most of the proteins characterized so far have either DGC or PDE-A activity. Interestingly, opposing enzymatic activities co-existing in a single protein have also been reported [15,16]. Our previous report on MSDGC-1 from M. smegmatis described one such bifunctional protein [17].

Bacterial strains, plasmids and oligonucleotides
Bacterial strains, plasmids and oligonucleotides used in the study are listed in Table 1. E. coli strains DH5α and BL21 (DE3) were grown in LB broth at 37 °C with agitation or on plates containing 1.5% w/v agar. Antibiotics were used at the following concentrations when required: ampicillin (100 µg ml⁻¹) or kanamycin (35 µg ml⁻¹) for E. coli, and kanamycin (20 µg ml⁻¹) or hygromycin (20 µg ml⁻¹) for M. tuberculosis or M. smegmatis. PCR reactions were carried out using Dynazyme EXT polymerase. All clones generated were confirmed by sequencing (MWG, India). Restriction enzymes used for cloning were procured from New England Biolabs or Fermentas.

Expression, isolation and purification of MtbDGC and MtbPDE from inclusion bodies
E. coli BL21 cells carrying the MtbDGC and MtbPDE plasmids were grown at 37 °C in 2 L flasks containing 500 ml of LB medium supplemented with 100 µg/ml ampicillin. Protein expression was induced with 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) when the culture reached an OD600 of 0.6. After 4 h of induction, cells were harvested by centrifugation at 8,000 rpm for 15 min at 4 °C. Cells were lysed in lysis buffer containing 50 mM Tris-HCl (pH 7.9), 500 mM NaCl, 100 mM dithiothreitol (DTT), 1 mM ethylenediaminetetraacetic acid (EDTA), 6 M urea and 20% glycerol. Cell suspensions were sonicated at 150 kHz using a sonicator with a 13 mm probe; sonication was repeated three times, for 5 min each. Cell debris was removed by centrifugation at 12,000 rpm for 20 min at 4 °C. The supernatant was loaded on a nickel-nitrilotriacetic acid (Ni-NTA) column and washed with 100 column volumes of wash buffer containing 10 mM imidazole. The protein was eluted with elution buffer containing 500 mM imidazole, and the eluted protein was then dialyzed as described below.

Refolding of MtbDGC and MtbPDE proteins with stepwise dialysis
The protein in elution buffer (containing 500 mM imidazole) was dialyzed stepwise, with decreasing concentrations of urea, against buffer containing 50 mM Tris-HCl (pH 7.9 at 4 °C), 250 mM NaCl, 10 mM DTT and 5% glycerol. The urea concentration was lowered from 6 M to 0 M, and the dialysate was allowed to reach equilibrium at each step [30]. As reported earlier, disulfide-containing proteins can refold even in the presence of concentrated denaturant [31]. No precipitation of the protein was observed at the final, urea-free stage. Finally, the protein was dialyzed against the same buffer without DTT and assayed for activity.
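Why stepwise dialysis lowers the denaturant gradually can be illustrated with a simple equilibrium calculation. The sketch below is illustrative only: it assumes free diffusion of urea, no change in sample volume, and a hypothetical schedule of a 2 ml sample dialyzed against 200 ml of buffer per step at 4, 2, 1 and 0 M urea. The text states only that urea was lowered from 6 M to 0 M.

```python
def equilibrium_conc(c_in, v_in, c_out, v_out):
    """Urea concentration on both sides of the membrane at equilibrium.

    Assumes urea diffuses freely, the protein contributes no volume and the
    sample volume does not change during dialysis.
    """
    return (c_in * v_in + c_out * v_out) / (v_in + v_out)


def dialysis_schedule(start_molar, buffer_steps_molar, v_sample_ml, v_buffer_ml):
    """Urea concentration in the sample after each successive buffer change."""
    conc = start_molar
    history = []
    for buffer_molar in buffer_steps_molar:
        conc = equilibrium_conc(conc, v_sample_ml, buffer_molar, v_buffer_ml)
        history.append(conc)
    return history


# Hypothetical schedule: 2 ml sample in 6 M urea, 200 ml of buffer per step.
steps = [4.0, 2.0, 1.0, 0.0, 0.0]
for buffer_molar, conc in zip(steps, dialysis_schedule(6.0, steps, 2.0, 200.0)):
    print(f"buffer {buffer_molar:.1f} M urea -> sample {conc:.4f} M urea")
```

With a 100-fold buffer excess, each step leaves the sample within about 0.02 M of the buffer concentration, so the rate of urea removal is set mainly by the chosen steps rather than by the dialysis itself.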

Enzymatic assays
Diguanylate cyclase and phosphodiesterase assays were adapted from procedures described previously [17,32]. Both activities were assayed in 50 µl reactions containing 5 µM protein (MtbDGC, MtbPDE or MSDGC-1), 25 mM Tris-HCl (pH 7.9), 250 mM NaCl and 10 mM MgCl2. For MtbDGC, the reaction was started by adding a mixture of 0.1 mM guanosine 5′-triphosphate (GTP) and [α-32P]GTP (Board of Radiation and Isotope Technology (BRIT), India; 0.01 mCi ml⁻¹, 3500 Ci mmol⁻¹). MtbPDE activity was assayed by replacing GTP with c-di-GMP. Aliquots were withdrawn at regular intervals, and the reaction was stopped with an equal volume of 50 mM EDTA. Reaction products (2.5 µl) were separated on polyethyleneimine-cellulose plates (Merck) in 1:1.5 (v/v) saturated (NH4)2SO4 and 1.5 M KH2PO4 (pH 3.6), and the plates were exposed to a phosphor-imager screen. [32P]c-di-GMP, used as the substrate for PDE-A activity, and non-labelled c-di-GMP were prepared by the protocol described earlier [10].
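The text does not say how the phosphor-imager signal was quantified. One common approach, shown here as a hypothetical sketch, expresses each TLC spot as a fraction of the total background-corrected 32P signal in its lane. Because the α-phosphate of GTP is retained in both c-di-GMP and pGpG, these fractions approximate the share of input label present in each species. The spot intensities and the background value are invented for illustration.

```python
def spot_fractions(spots, background=0.0):
    """Share of background-corrected 32P signal in each TLC spot of one lane.

    `spots` maps a species name to its integrated phosphor-imager intensity.
    Because GTP's alpha-phosphate is kept in c-di-GMP and pGpG, each share
    approximates the fraction of input label present as that species.
    """
    corrected = {name: max(value - background, 0.0) for name, value in spots.items()}
    total = sum(corrected.values())
    if total == 0:
        raise ValueError("no signal above background")
    return {name: value / total for name, value in corrected.items()}


# Hypothetical time course (arbitrary units) for an MtbDGC reaction.
time_course = {
    0: {"GTP": 10500, "c-di-GMP": 500, "pGpG": 500},
    30: {"GTP": 6500, "c-di-GMP": 3500, "pGpG": 1500},
}
for minutes, spots in sorted(time_course.items()):
    shares = spot_fractions(spots, background=500)
    print(minutes, {name: round(share, 2) for name, share in shares.items()})
```

Plotting these shares against time gives a simple progress curve; the approach assumes every labelled species in the lane is resolved and integrated.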

Western blot analysis
Western blot analysis for the detection of MtbDGC was carried out with primary antibodies raised in rabbit against the purified protein.
Total cellular protein extracted from cells grown for different times was normalized, and 100 µg of lysate was separated by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) and blotted onto a polyvinylidene fluoride (PVDF) membrane. The polyclonal antiserum was used as the primary antibody, and the secondary antibody was purchased from Sigma-Aldrich; both were used at a 1:10,000 dilution. The blots were developed with 26 mg ml⁻¹ aminoethylcarbazole and 0.01% v/v H2O2.

Long term starvation cultures
The ΔMSDGC-1 strain, the wild-type M. smegmatis mc²155 strain and ΔMSDGC-1 complemented with Mtbdgc were grown in MB7H9 with 0.02% w/v glucose and 0.05% v/v Tween 80 until saturation. Cultures were declumped before plating on agar as described earlier [29]. The colony-forming units (CFU) of these cultures were determined at regular intervals for up to 20 days. Antibiotics were omitted from the cultures to rule out any effect of antibiotics on long-term survival.
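Survival curves of this kind are usually plotted as log10 CFU per ml. A minimal sketch of the standard viable-count arithmetic follows; the colony counts, dilution and plated volume are hypothetical, since the text does not specify the dilutions or plating volumes used.

```python
import math


def cfu_per_ml(colonies, dilution_factor, plated_volume_ml):
    """Viable count from one plate: colonies x dilution factor / volume plated."""
    if plated_volume_ml <= 0:
        raise ValueError("plated volume must be positive")
    return colonies * dilution_factor / plated_volume_ml


def mean_log10_cfu(replicate_counts, dilution_factor, plated_volume_ml):
    """Mean log10(CFU/ml) over replicate plates of the same dilution."""
    if not replicate_counts or min(replicate_counts) <= 0:
        raise ValueError("need at least one plate and non-zero colony counts")
    logs = [math.log10(cfu_per_ml(c, dilution_factor, plated_volume_ml))
            for c in replicate_counts]
    return sum(logs) / len(logs)


# Hypothetical plates: 120, 100 and 140 colonies from 0.1 ml of a
# 10^-5 dilution (dilution factor 1e5).
print(round(mean_log10_cfu([120, 100, 140], 1e5, 0.1), 2))
```

Averaging on the log scale is a common choice for plate counts because their spread grows with the mean; averaging raw CFU would be an equally simple alternative.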

Proteolytic digestion of protein (MtbDGC)
10 µg of purified full-length protein was diluted to a final acetonitrile concentration of 5-10% in digestion buffer (50 mM ammonium bicarbonate) and digested overnight at 37 °C with the following proteolytic enzymes in separate experiments: 200 ng trypsin (sequencing grade modified trypsin, Promega), 200 ng chymotrypsin (sequencing grade, Promega), or a double digest with 100 ng trypsin and 100 ng chymotrypsin.

Reduction and alkylation of digested protein sample
For alkylation, 10 µl of purified protein was taken in 15 µl of 0.1 M ammonium bicarbonate buffer, pH 8.0. For reduction, DTT was added to a final concentration of 8 mM and the sample was incubated at 37 °C for 3 h. For alkylation, iodoacetamide (IAM) was added to a final concentration of 40 mM, and the mixture was incubated at room temperature in the dark for 90 min. The reaction mixture was then incubated at 37 °C overnight for further digestion. Two sets of reactions were carried out: one with alkylation only and the other with reduction followed by alkylation.
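The peptides expected from the trypsin, chymotrypsin and double digests can be predicted in silico before interpreting the spectra. The sketch below uses simplified, commonly quoted cleavage rules (trypsin after K/R, high-specificity chymotrypsin after F/Y/W, neither before P), ignores missed cleavages and low-specificity chymotrypsin sites, and runs on a hypothetical toy sequence rather than Rv1354c; a real analysis would use the full SWISS-PROT sequence (P64826).

```python
import re

# Simplified cleavage rules: cut on the C-terminal side of these residues,
# except when the next residue is proline. Missed cleavages and the
# low-specificity chymotrypsin sites (e.g. L, M) are ignored.
RULES = {"trypsin": "KR", "chymotrypsin": "FYW"}


def digest(sequence, enzymes):
    """Split a protein sequence at every site cut by any of the enzymes."""
    residues = "".join(RULES[name] for name in enzymes)
    pattern = rf"(?<=[{residues}])(?!P)"  # zero-width split point
    return [piece for piece in re.split(pattern, sequence) if piece]


def cysteine_peptides(sequence, enzymes):
    """(start, end, peptide) with 1-based positions for Cys-containing peptides."""
    hits = []
    start = 1
    for peptide in digest(sequence, enzymes):
        end = start + len(peptide) - 1
        if "C" in peptide:
            hits.append((start, end, peptide))
        start = end + 1
    return hits


toy = "MCAKLPFGCRWSTKPCYEGR"  # hypothetical sequence, not Rv1354c
for enzymes in (["trypsin"], ["chymotrypsin"], ["trypsin", "chymotrypsin"]):
    print(enzymes, cysteine_peptides(toy, enzymes))
```

The double digest yields shorter cysteine-containing peptides than either enzyme alone, which is one reason combined digests are useful for isolating individual disulfide-linked pairs.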

MALDI-TOF Mass spectrometry
Mass spectra of digested gel spots were obtained by matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry on an Ultraflex TOF/TOF spectrometer (Bruker Daltonics). All mass spectra were acquired in positive-ion mode with reflectron optics. The instrument was equipped with a 50 Hz pulsed nitrogen laser (λ = 337 nm) and operated under delayed extraction conditions (delay time 90 ns, accelerating voltage 25 kV). All peptide samples were mixed with an equal volume of dihydroxybenzoic acid/α-cyano-4-hydroxycinnamic acid matrix saturated in 0.1% trifluoroacetic acid and acetonitrile (1:1). Typically, 50-100 laser shots were used to record each spectrum.

Tandem Mass Spectrometry (LC-ESI-MS, MS/MS)
For all disulfide connectivity studies, the proteolytic peptide mixture was analyzed by reverse-phase HPLC-ESI-MS and MS/MS (electrospray ionization, ESI; Bruker Daltonics). Peptides were separated on an Agilent 1100 series HPLC system equipped with a C18 column (Supelco). The column eluant was coupled directly to an HCT Ultra PTM Discovery System (ETD II, Bruker Daltonics). Mass spectra (ESI-MS) and tandem mass spectra (ESI-MS/MS) were recorded in positive-ion mode with a resolution of 12,000-15,000 full width at half maximum (FWHM). For tandem mass spectrometry, the mass analyzer was set to ±1 m/z. The precursor ions were fragmented in a collision cell using nitrogen as the collision gas. Spectra were calibrated in static mode using MS/MS fragment ions of a standard supplied by the manufacturer. The digested peptides were separated using mobile phase A, consisting of 0.05% formic acid in 98% H2O/2% ACN (acetonitrile), and mobile phase B, consisting of 0.05% formic acid in 98% ACN/2% H2O, with a gradient of 5% B for the first 5 min, 95% B over the next 50 min and 5% B for the last 10 min. The gradient times were varied depending on the sample.
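The gradient description above can be read as a piecewise-linear program. The sketch below encodes one plausible reading (hold at 5% B for 5 min, ramp linearly to 95% B over 50 min, then return to 5% B for 10 min of re-equilibration); the breakpoints and the linear ramp are assumptions, not the instrument method file.

```python
def percent_b(t_min, program):
    """%B at time t by linear interpolation between (time, %B) breakpoints.

    Two breakpoints at the same time encode a step change; the value before
    the step is returned at that exact time.
    """
    if t_min <= program[0][0]:
        return program[0][1]
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            if t1 == t0:
                return b0
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    return program[-1][1]


# Hypothetical breakpoints (minutes, %B) for the 65 min run.
PROGRAM = [(0, 5), (5, 5), (55, 95), (55, 5), (65, 5)]
for t in (0, 5, 30, 55, 60, 65):
    print(t, round(percent_b(t, PROGRAM), 1))
```

Because the text notes that gradient times were varied per sample, only the breakpoint list would need to change between runs.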

Homology modeling, minimization and model validation
The sequence and domain information of the 623-residue Rv1354c was taken from SWISS-PROT (ID: P64826; http://us.expasy.org/sprot/) [33]. The protein consists of three domains: GAF (residues 28-171), GGDEF (residues 212-345) and EAL (residues 354-609). Since no homologous structure was found for the full-length protein, the individual domains were modeled separately and joined to give the complete structure. BLAST [34] and PSI-BLAST [35] searches against the Protein Data Bank (PDB) database at the National Center for Biotechnology Information (NCBI) were used to find the top hit as a template, and target-template sequence alignments for the individual domains were generated with ClustalW2 [36]. On the basis of sequence similarity, 10 models each of the EAL and GGDEF domains were built with MODELLER release 9v7 [38] using PDB [37] entries 2R6O (chain A; identity 40%) and 1W25 (chain A; identity 40%), respectively, and a threaded model of the GAF domain was produced with the PHYRE [39] server. The N-terminal residues (1-27) and C-terminal residues (610-623) of the full-length protein were included with the GAF and EAL domains, respectively. Using the three modeled domains as templates, full-length models were built and the model with the minimum DOPE score was selected. The model was energy-minimized [44] using the OPLS [45] force field and the steepest descent method; the minimization was set to run for 10000 steps with a 0.002 ps time step or until convergence to machine precision. The minimized structure was validated with PROCHECK (91.8%) [40], VERIFY3D (78.04%) [41] and ProSA (score −6.68) [42]. PyMOL [43] was used for 3D visualization.

Alignment of proteins containing GAF, GGDEF, EAL architecture
Since the protein of interest has three domains, namely GAF, GGDEF and EAL, protein sequences with this architecture were searched in the PFAM database [46]. A total of 183 sequences were found. Many of these sequences contained transmembrane or low-complexity regions, and their lengths varied widely. This variation was most prominent in the region preceding the GAF domain and in the positions of the domains relative to one another. The linker region between the GAF and GGDEF domains also varied in length. Each domain was excised from its sequence using the domain boundaries given in the PFAM database, as SWISS-PROT did not provide domain start and end positions for all 183 sequences. The sequences of each domain were aligned separately using ClustalW2. Although the start and end positions of the domains were assigned ambiguously between the PFAM and SWISS-PROT databases, the SWISS-PROT data were used for modeling because they are supported by experimental methods.

Cysteine mutation using site directed mutagenesis
The third cysteine of MtbDGC, at position 406, was mutated to serine by a PCR-based approach. Two complementary oligonucleotides containing the desired mutation were used to prime PCR on a pET MtbDGC plasmid DNA template. Each 50 µl reaction contained 100 ng template DNA, 5 µl of 10× thermostable polymerase buffer, 20 pmol mutagenic primer, 0.6 µl of 25 mM dNTP solution and 2-5 U thermostable polymerase. The reaction was heated at 94 °C for 2 min, followed by 19 cycles of 94 °C for 1 min, annealing at 54 °C for 1 min and extension at 72 °C for 8 min; it was then held at 72 °C for 20 min and, if required, at 4 °C indefinitely. The reaction mixture was then digested with DpnI to remove the methylated parental template DNA, leaving the newly synthesized double-stranded mutant PCR product, designated MtbC406S [47,48]. The primers used for mutagenesis are listed in Table 1. The clone was confirmed by sequencing at MWG, Bangalore (data not shown). The MtbC406S plasmid was transformed into E. coli BL21 cells. The protein was expressed and purified as described in section 2.3, and the activity of the mutant protein was assayed as described in section 2.5.
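The complementary-primer design described above can be sketched in a few lines. The example assumes a hypothetical in-frame toy coding sequence, a 6-base flank and the serine codon AGC (a single T-to-A change from the cysteine codon TGC); the actual mutagenic primers are those listed in Table 1 and are not reproduced here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
CYS_CODONS = {"TGT", "TGC"}


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mutagenic_primers(cds, codon_number, new_codon, flank=15):
    """Complementary primer pair carrying one codon substitution.

    `cds` is an in-frame coding sequence whose first codon is number 1. The
    primer spans `flank` unchanged bases on each side of the new codon.
    """
    start = (codon_number - 1) * 3
    if codon_number < 1 or start + 3 > len(cds):
        raise ValueError("codon outside the coding sequence")
    if cds[start:start + 3] not in CYS_CODONS:
        raise ValueError("target codon is not a cysteine codon")
    mutant = cds[:start] + new_codon + cds[start + 3:]
    forward = mutant[max(0, start - flank):start + 3 + flank]
    return forward, reverse_complement(forward)


# Hypothetical 5-codon ORF; codon 3 (TGC) encodes Cys and becomes Ser (AGC).
fwd, rev = mutagenic_primers("ATGGCTTGCAAAGGC", 3, "AGC", flank=6)
print(fwd)
print(rev)
```

Choosing a serine codon that differs from the cysteine codon at a single position keeps the mismatch in the primer small; in practice the flank length would be tuned for primer melting temperature.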

Circular dichroism study of MtbDGC and MtbC406S
After refolding, purified MtbDGC and MtbC406S were dialyzed against CD buffer (10 mM phosphate buffer, pH 8.0, 150 mM KCl) and filtered through a 0.2 µm PVDF filter (Millipore). Proteins were used at 5 µM for CD studies. CD spectra were recorded on a Jasco J-715 spectropolarimeter at 22 °C in a quartz cuvette with a 0.2 cm path length, 2.0 nm bandwidth and 2 s response time. Spectra were recorded from 250 nm to 200 nm at 50 nm/min, and three scans were averaged for each sample.

MtbDGC possesses both DGC and PDE-A activities and MtbPDE possesses PDE-A activity
The domain organization of Rv1354c and Rv1357c was obtained from the SMART database (Figure 2A). Rv1354c and Rv1357c are referred to throughout the manuscript as MtbDGC and MtbPDE, respectively. MtbDGC and MtbPDE were purified in soluble form as described in Materials and Methods; as judged by SDS-PAGE, both proteins were more than 80% pure (Figure 2B) (Protein id. NP-215870.1). In-gel digestion of the bands followed by MALDI and MASCOT analysis confirmed the identity of MtbDGC and MtbPDE (data not shown). These protein preparations were used to assay DGC and PDE-A activity. The reaction products were separated by reverse-phase high-performance liquid chromatography (HPLC) alongside GTP and purified c-di-GMP as standards, and the peaks were collected and subjected to MALDI-TOF analysis. The major ions detected were m/z 542 (M+H)+ for GTP, 691 (M+H)+ and 713 (M+Na)+ for c-di-GMP, and 709 (M+H)+ and 731 (M+Na)+ for pGpG, confirming that MtbDGC can both synthesize and degrade c-di-GMP (Figure 3A). On thin-layer chromatography (TLC) plates, a spot at the Rf of c-di-GMP appeared, followed by a spot at the Rf of pGpG.
The reaction mixture with MSDGC-1 (Kumar and Chatterji, 2008), a known DGC of Mycobacterium smegmatis, was used as the positive control, and a reaction mixture with cell lysate as the negative control (Figure 3B).
Similarly, when c-di-GMP was incubated with MtbPDE, the c-di-GMP peak area decreased and an additional peak appeared at 16.4 min, the retention time of pGpG. The mass spectrum of the 16.4 min peak showed a major ion at m/z 709, corresponding to c-di-GMP plus one water molecule (Figure 3C). Purified pGpG gave an identical mass spectrum (m/z 709), confirming that MtbPDE possesses phosphodiesterase activity.
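The c-di-GMP and pGpG ion assignments above can be checked arithmetically. A minimal sketch, assuming the neutral free-acid formulas C20H24N10O14P2 for c-di-GMP and C20H26N10O15P2 for pGpG and standard monoisotopic element masses (none of which are given in the text), computes [M+H]+ and [M+Na]+ values. Rounded to integers they give 691 and 713 for c-di-GMP and 709 and 731 for pGpG, and the two neutral masses differ by one water molecule (about 18.011 Da).

```python
import re

# Monoisotopic masses (Da) of the elements needed here.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462,
        "P": 30.97376151, "Na": 22.98976928}
PROTON = 1.00727646688
ELECTRON = 0.00054858


def mono_mass(formula):
    """Monoisotopic mass of a neutral molecule from a formula like 'C20H24N10O14P2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO[element] * (int(count) if count else 1)
    return total


def adducts(formula):
    """Singly charged protonated and sodiated ion m/z values."""
    m = mono_mass(formula)
    return {"[M+H]+": m + PROTON, "[M+Na]+": m + MONO["Na"] - ELECTRON}


for name, formula in {"c-di-GMP": "C20H24N10O14P2", "pGpG": "C20H26N10O15P2"}.items():
    ions = adducts(formula)
    print(name, {ion: round(mz, 3) for ion, mz in ions.items()})
```

The linearization by PDE-A adds H2O across one phosphodiester bond, which is why the pGpG ions sit 18 units above the c-di-GMP ions.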

MtbDGC is functional in vivo
Previously we reported that MSDGC-1, which possesses both DGC and PDE-A activities, is required for long-term survival of M. smegmatis under nutritional starvation. MtbDGC is also bifunctional and homologous to MSDGC-1. It was therefore of interest to test the in vivo functionality of MtbDGC in the MSDGC-1 disruption mutant ΔMSDGC-1. We complemented this mutant by chromosomal integration of the Rv1354c gene into the ΔMSDGC-1 strain using the integration vector pMV361. These strains were grown under carbon starvation, and their survival was followed in terms of CFU for 20 days. MtbDGC restored the long-term survival phenotype of the mutant strain, as the CFU of the wild-type and complemented strains were comparable (Figure 4A).

Level of MtbDGC increases in the stationary phase of growth
As described earlier, the level of MtbDGC in bacteria varies with environmental conditions and growth phase. We determined the level of MtbDGC expression in three phases of growth: exponential, early stationary and late stationary. A band corresponding to MtbDGC (Figure 4B) was detected, with increased intensity in the stationary phase. Quantification of the blot showed that the amount of MtbDGC was three times higher in the stationary phase culture than in the exponential phase (Figure 4B). The same amount of protein (100 µg) was loaded in each lane. Notably, bacterial growth was slow when the dgc gene was overexpressed (data not shown).

Proteolytic digestion of MtbDGC protein
The flow chart for protein digestion is depicted in Figure 5. The protein was digested with trypsin, chymotrypsin or a combination of both in three separate experiments. The enzyme-to-protein ratio affects the LC-ESI-MS/MS response, and the optimal ESI signal was obtained at an enzyme:protein ratio of 1:20. Injection of 10-15 nmol of digested protein into the capillary of the LC-ESI-MS/MS system produced a signal sufficient to characterize peptides containing labeled Cys and disulfide-bonded Cys residues (data not shown).

Mass spectral characterization of free cysteine residues and disulfide bonds
To determine the number of disulfide bonds and free cysteines in MtbDGC, the intact protein was analyzed by LC-ESI-MS, which gave a mass of 68475.99 Da (His6-tagged protein). After reduction with 8 mM DTT, the mass increased by 7 Da to 68482.99 Da (His6-tagged protein) (Figure 6A and 6B). Because reduction of each disulfide adds two hydrogen atoms (about 2 Da) and seven cysteines can form at most three disulfides, this shift is consistent with the presence of three disulfide bonds in the protein.
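The reasoning from the intact-mass shift can be made explicit: reducing one disulfide adds two hydrogen atoms (about 2.016 Da), so the mass gain divided by 2.016 estimates the number of reduced disulfides. The sketch below applies this to the masses reported above; a 7 Da gain gives about 3.5, which rounds to three, the maximum possible with seven cysteines. A 1 Da difference is small relative to a 68 kDa protein, so this is a consistency check rather than an independent measurement.

```python
H_ATOM = 1.00782503  # monoisotopic mass of a hydrogen atom (Da)


def disulfides_from_reduction(mass_oxidized, mass_reduced, n_cys=None):
    """Estimate disulfide bonds from the mass gained on full reduction.

    Each reduced S-S bond gains two hydrogen atoms (about 2.016 Da). If the
    cysteine count is known, the result is capped at n_cys // 2.
    """
    estimate = (mass_reduced - mass_oxidized) / (2 * H_ATOM)
    bonds = round(estimate)
    if n_cys is not None:
        bonds = min(bonds, n_cys // 2)
    return estimate, bonds


# Intact masses reported above for His-tagged MtbDGC before and after DTT.
estimate, bonds = disulfides_from_reduction(68475.99, 68482.99, n_cys=7)
print(f"{estimate:.2f} H2 equivalents -> {bonds} disulfide bonds")
```

Capping at n_cys // 2 encodes the structural constraint that an odd number of cysteines leaves at least one free thiol.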
A common problem in detecting peptides that contain free cysteine residues is that the sulfhydryl group reacts with other chemicals in the solution. Free cysteine residues are therefore alkylated with reagents such as iodoacetamide, iodoacetic acid or 4-vinylpyridine. Here, we alkylated the samples with a five-fold molar excess of iodoacetamide and detected all seven cysteine-containing peptides in the proteolytic digests by both MALDI and LC-ESI-MS analysis. Samples that displayed the expected isotopic distribution for the required peptides were used for further studies.

Disulfide connectivity
Disulfide-linked dipeptides were analyzed by LC-ESI-MS/MS, and the individual peptides were identified from their tandem mass spectra. Fragmentation of the peptide backbone generates predominantly b- and y-type ions [49]. LC-ESI-MS/MS can give ambiguous results because of thiol-disulfide exchange reactions. To overcome this problem, all analyses were carried out in the presence of the alkylating agent iodoacetamide. To verify that the disulfide bonding was not altered, alkylation was performed either before or after proteolytic digestion.
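The masses used to interpret these spectra follow simple additive rules. A minimal sketch, assuming standard monoisotopic residue masses, computes the neutral mass of a linear peptide, the +57.021 Da shift from iodoacetamide (carbamidomethyl) labeling of cysteine, the [M+H]+ of two peptides joined by one disulfide (two hydrogen atoms lost), and singly charged b- and y-ion series. The peptides are hypothetical, not the MtbDGC peptides discussed here.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
H_ATOM = 1.00783
CARBAMIDOMETHYL = 57.02146  # mass added to Cys by iodoacetamide


def residue_masses(peptide, alkylated_cys=False):
    """Per-residue masses, optionally with carbamidomethylated cysteines."""
    extra = CARBAMIDOMETHYL if alkylated_cys else 0.0
    return [RESIDUE[aa] + (extra if aa == "C" else 0.0) for aa in peptide]


def peptide_mass(peptide, alkylated_cys=False):
    """Neutral monoisotopic mass of a linear peptide."""
    return sum(residue_masses(peptide, alkylated_cys)) + WATER


def disulfide_dipeptide_mh(pep1, pep2):
    """[M+H]+ of two peptides joined by one S-S bond (two H atoms lost)."""
    return peptide_mass(pep1) + peptide_mass(pep2) - 2 * H_ATOM + PROTON


def b_y_ions(peptide, alkylated_cys=False):
    """Singly charged b- and y-ion m/z values of a linear peptide."""
    masses = residue_masses(peptide, alkylated_cys)
    b = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b, y


# Hypothetical peptides, not the MtbDGC peptides analysed above.
print(round(peptide_mass("GCR", alkylated_cys=True) + PROTON, 4))
print(round(disulfide_dipeptide_mh("GCR", "STKPCY"), 4))
b, y = b_y_ions("STKPCY")
print([round(m, 3) for m in b], [round(m, 3) for m in y])
```

Comparing an alkylated free-cysteine peptide (+57.021 Da) with a disulfide-linked pair (-2.016 Da relative to the two free peptides) is what distinguishes free from bonded cysteines in a digest.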
MS/MS analysis of the singly charged [M+H]+ ion at m/z 776.6 (residues 400-407), obtained from the trypsin- and chymotrypsin-digested protein after DTT treatment, confirmed that the peptide containing Cys406 carries a free cysteine (Figure 8A). As shown in Figure 9C, the tandem mass spectrum obtained from the ion at m/z 1976.8 confirms the structure of the peptide containing the third disulfide bridge, Cys429-Cys614; the prominent fragments derive from the singly charged ion of the dipeptide containing Cys429 and Cys614. The MS/MS analysis shows that only minor fragmentation occurred within the amino acid sequence between the disulfide-linked cysteine residues, and no disulfide-linked peptides other than the expected ones were observed. This is consistent with the difficulty of fragmenting peptides that contain -S-S- bonds, because releasing a fragment from within a disulfide-bonded loop requires cleavage of two bonds [28]. No evidence was found for fragmentation of amide bonds and formation of linearized sequences by cleavage of peptide bonds inside the disulfide ring [50].
Similar fragmentation behavior has been reported earlier for intramolecular disulfide-bonded peptides [26]. The data confirmed intramolecular, but not intermolecular, disulfide connectivity. Results similar to those shown in Figure 9(A-C) were observed for chymotrypsin- and trypsin-digested samples treated with or without iodoacetamide.

Table 3. Theoretical and experimental masses of peptides generated from intact protein samples of MtbDGC upon oxidation.

Sequence alignment of proteins containing GAF, GGDEF, EAL domains and homology modeling of Rv1354c
No structure is available for a full-length GAF-GGDEF-EAL bifunctional protein. However, even an approximate structure is very useful for probing disulfide connectivity. The structure obtained by modeling (Figure 10) corresponded well with the disulfide connectivity determined by mass spectrometric analysis.
Schmidt and coworkers [51] first noticed that catalytically active EAL domains seem to contain a conserved motif, later confirmed to form loop 6 [DFG(T/A)GYSS], the loop between helix α6 and strand β6 that contains one of the Mg2+-binding residues (Asp) [51,52]. Alignment of the EAL domain sequences shows, apart from the conserved EAL motif and many conserved acidic residues, a conserved loop-6 motif [DFG(A/S/T)(A/G)(Y/F)(S/T)(S/T/G/A/N)] (Figure S1). A similar result was obtained when the EAL domain family (PF00563; 8082 sequences) in the PFAM database was analyzed (data not shown). Of the three domains, GAF (residues 28-171), GGDEF (residues 212-345) and EAL (residues 354-609), the EAL domain is the largest, with a (β/α)10 barrel fold of ten α-helices and ten β-strands as assigned by STRIDE [53]. The percentage consensus of G2, G4, F3, S7 and S8 in loop 6, based on the alignment of the 8082 EAL domain family sequences, was 88%, 79%, 80%, 65% and 61%, respectively, highlighting the importance of both glycine residues and the phenylalanine in the loop. G514 and G516 are involved in side chain-main chain hydrogen bonding with E484, while F513 and D534 are involved in main chain-main chain hydrogen bonding (Figure 11) [54]. Hence, these residues appear to be important for positioning loop 6. Based on their involvement in hydrogen bonding, the loop-6 residues can be divided into two parts: (i) D512F513G514T515G516 and (ii) Y517S518A519. The first part stabilizes and positions the loop, while the second part is flexible. Alignment of the EAL domain sequences from proteins with the GAF-GGDEF-EAL architecture shows very high conservation of P376, E484, G505, L523, K532, E568, G569, V570, E571, Q588 and G589. As noted above, E484 is involved in hydrogen bonding with the two glycines of loop 6.
When P376, K532, E568, G569, E571, Q588 and G589 were mapped onto the modeled structure, they were found to lie close to the E389A390L391 motif (Figure 12) and may also play an important role in coordinating Mg2+ ions.
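The loop-6 motif and the percentage-consensus values lend themselves to a short sketch. The bracketed motif is written as a regular expression, and the consensus for an alignment column is taken as the share of sequences carrying the most frequent residue. Counting gaps in the denominator is an assumption, since the text does not say how gaps were handled, and the toy alignment is hypothetical. The MtbDGC segment D512-A519 (DFGTGYSA) matches the pattern.

```python
import re
from collections import Counter

# Loop-6 consensus from the alignment: DFG(A/S/T)(A/G)(Y/F)(S/T)(S/T/G/A/N).
LOOP6 = re.compile(r"DFG[AST][AG][YF][ST][STGAN]")


def find_loop6(sequence):
    """(1-based start, matched text) for every loop-6-like segment."""
    return [(m.start() + 1, m.group()) for m in LOOP6.finditer(sequence)]


def column_consensus(aligned, column):
    """Most frequent residue in one alignment column and its percentage.

    Gaps ('-') cannot be the consensus residue but stay in the denominator,
    so the percentage is relative to all sequences in the alignment.
    """
    counts = Counter(seq[column] for seq in aligned)
    counts.pop("-", None)
    if not counts:
        return None, 0.0
    residue, n = counts.most_common(1)[0]
    return residue, 100.0 * n / len(aligned)


# Toy alignment of hypothetical loop-6 segments (not the PFAM data).
toy = ["DFGTGYSA", "DFGAGYSS", "DFGSAFTG", "DLG-GYSS"]
print(find_loop6("MKDFGTGYSAEL"))
print([column_consensus(toy, i) for i in range(8)])
```

The regular expression also accepts the Schmidt motif DFG(T/A)GYSS, since it is a subset of the broader alignment-derived pattern.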
GGDEF domain sequences were aligned in the same way as the EAL domain sequences and showed high conservation of G232, D237, G307 and R259 in addition to the GGDEF motif (residues 261-265) (Figure S2). When these residues were mapped onto the structure, D237 was found on the surface, suggesting a possible role in oligomerization. The high conservation of G232 may reflect the need to avoid steric clashes and to orient the two consecutive helices properly. R259, being close to the GGDEF motif, may help stabilize the GTP-GGDEF domain complex. P131 and G140 of the GAF domain are also highly conserved (Figure S3). Both are part of β-strands and are involved in van der Waals interactions with R381 and N54, respectively.
Although part of the linker region between the GAF and GGDEF domains (linker 1; residues 184-211) is considered part of the GGDEF domain, it was treated here as part of the linker (according to SWISS-PROT, linker 1 spans residues 172-211). The analysis shows high conservation of D187, T190, N194 and R195 (Figure S4). All are polar residues and may play an important role in stabilizing the protein oligomer.
Several N-terminal residues (residues 1-28) are involved in either main chain-main chain (M1-V565, C2-V565, R583-D4, A6-D581) or side chain-main chain (T7-T165, Q9-A159) interactions with residues of the GAF and EAL domains. In addition, C2 forms a disulfide bond with C479, underlining the importance of the N-terminal residues in stabilizing the GAF-EAL interface. Although the C-terminal residues do not form hydrogen bonds, C614 forms a disulfide bond with C429. Hydrogen bonds were also found between residues of the GAF and EAL domains (data not shown). Interestingly, the GAF domain does not interact with the GGDEF domain, but interactions involving linker 1 (residues 172-211) were found.
The EAL-GGDEF interface has fewer interactions than the GAF-EAL interface; a few main chain-main chain and side chain-main chain hydrogen bonds were found (data not shown). Van der Waals interactions between atoms were evaluated using the CONTACT program of the CCP4 suite with a distance cutoff of 4.5 Å [55]. The results show that, although EAL is the last domain, it is in contact with both the GAF and GGDEF domains.
In the proposed structure of Rv1354c, the GGDEF motif is well exposed. The disulfide bond lengths were 2.3 Å (Cys2-Cys479, Cys429-Cys614) and 2.2 Å (Cys94-Cys584). Cys406 of the EAL domain is more solvent-accessible than the other six cysteine residues.
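A CONTACT-style search of the kind described above reduces to listing atom pairs closer than a cutoff. The brute-force sketch below is not the CCP4 program itself; it assumes atoms are given as labels with Cartesian coordinates in ångströms, uses the 4.5 Å cutoff from the text, and runs on hypothetical coordinates.

```python
import math
from itertools import product


def contacts(group_a, group_b, cutoff=4.5):
    """Atom pairs from two groups separated by less than `cutoff` angstroms.

    Each atom is (label, (x, y, z)). This is an O(n*m) brute-force search,
    adequate for a sketch but slow for whole domains.
    """
    hits = []
    for (label_a, xyz_a), (label_b, xyz_b) in product(group_a, group_b):
        distance = math.dist(xyz_a, xyz_b)
        if distance < cutoff:
            hits.append((label_a, label_b, round(distance, 2)))
    return hits


# Hypothetical atoms of two domains (labels and coordinates are invented).
domain_a = [("A1", (0.0, 0.0, 0.0)), ("A2", (10.0, 0.0, 0.0))]
domain_b = [("B1", (3.0, 0.0, 0.0)), ("B2", (12.0, 0.0, 0.0))]
print(contacts(domain_a, domain_b))
```

The same distance calculation applied to the SG atoms of two cysteines gives the S-S separation used when reporting modeled disulfide bond lengths.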

Construction of Cysteine mutant by site directed mutagenesis
The mass spectrometric data showed that the third cysteine (Cys406) is free. To determine whether Cys406 has a role in the enzymatic activity of the protein, it was mutated to serine. The mutation was confirmed by LC-MS: the His-tagged mutant protein had a mass of 68459.93 Da (data not shown). After reduction of MtbC406S with 8 mM DTT, the mass increased by 7 Da, consistent with the presence of three disulfide bonds, as in the wild-type protein (data not shown). In the trypsin-chymotrypsin digest of MtbC406S, the peptide bearing the third cysteine position (residues 400-407) showed a 16 Da mass shift, confirming the mutation of Cys406 to Ser406 (Figure 13A). MS/MS analysis of this mutated peptide (residues 400-407) also confirmed the cysteine-to-serine substitution (data not shown). To check the ability of MtbC406S to synthesize and degrade c-di-GMP, mass spectrometric and thin-layer chromatographic analyses were carried out (Figure 13B and 13C). The data show that the free cysteine Cys406 plays a critical role in enzymatic activity: mutating it to serine abolishes protein activity. However, the CD spectrum of the mutant protein did not differ from that of MtbDGC (Figure 14), indicating no change in folding. Thus Cys406 plays an important role in c-di-GMP regulation but no significant role in protein folding.
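The 16 Da shift follows from replacing the sulfur of cysteine with the oxygen of serine. The sketch below compares monoisotopic and average residue-mass differences (standard table values, not taken from the text) with the intact masses reported above. Intact-protein masses are usually reported as average masses, which is consistent with the observed shift of about -16.06 Da.

```python
# Residue masses (Da) as (monoisotopic, average), from standard tables.
CYS = (103.00919, 103.1388)
SER = (87.03203, 87.0782)


def cys_to_ser_shift():
    """Mass change (Da) when one Cys is replaced by Ser: S swapped for O."""
    return SER[0] - CYS[0], SER[1] - CYS[1]


mono, avg = cys_to_ser_shift()
print(f"monoisotopic {mono:.3f} Da, average {avg:.3f} Da")
# Intact masses reported above for His-tagged MtbDGC and MtbC406S.
print(f"observed intact shift {68459.93 - 68475.99:.2f} Da")
```

At the peptide level, where monoisotopic masses are resolved, the expected shift is about -15.977 Da, which also rounds to the 16 Da change reported for the residue 400-407 peptide.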

Concluding remarks
The cyclic dinucleotide 3′,5′-cyclic diguanylic acid (c-di-GMP) has been characterized as an important second messenger that affects a range of physiological traits in bacteria. Many recent reports describe c-di-GMP signaling pathways in diverse bacterial lineages [56,57]. Before our study in Mycobacterium smegmatis (Kumar and Chatterji, 2008), nothing was known about the role of c-di-GMP in mycobacteria. We reported that MSDGC-1 is a bifunctional protein that can synthesize and degrade c-di-GMP in vitro. The present study was undertaken to examine the role of c-di-GMP in M. tuberculosis. In this work we investigate MtbDGC, the homologue of MSDGC-1 in M. tuberculosis H37Rv. Our main findings are: (1) M. tuberculosis has one protein with GAF-GGDEF-EAL domains (MtbDGC) and another with only an EAL domain (MtbPDE); (2) MtbDGC is a bifunctional multidomain protein exhibiting both DGC and PDE-A activities; (3) MtbPDE is a single-EAL-domain protein exhibiting only PDE-A activity; (4) MtbDGC contains three disulfide bonds and one free cysteine; (5) upon mutation of the free cysteine at position 406 to serine, synthesis of c-di-GMP was completely abolished.
All known GGDEF-EAL domain proteins have either DGC or PDE-A activity, although the dominant activity is difficult to predict from the primary sequence [56,58]. In Rhodobacter sphaeroides and Vibrio parahaemolyticus, GGDEF/EAL proteins show both DGC and PDE-A activities as a function of environmental cues [59,60], but the two activities could not be clearly distinguished. In this regard, the mycobacterial proteins (MSDGC-1 and MtbDGC) are the first shown to exhibit both activities at the same time in vitro. The M. tuberculosis and M. smegmatis proteins appear to have similar functions; the role of the additional EAL domain protein in M. tuberculosis remains to be addressed.
The M. tuberculosis DGC protein reported here was predominantly expressed in stationary phase, suggesting a role in survival during stationary phase. This hypothesis was substantiated by complementation of the MSDGC-1 knockout strain of M. smegmatis with mtbdgc. Sequence analysis and the presence of dual activities suggest that the M. smegmatis and M. tuberculosis DGC proteins have similar functions in their respective hosts. The only difference noted is the presence of seven cysteines in MtbDGC: sequence comparison revealed that only three of these seven cysteines are conserved in MSDGC-1. This high cysteine content makes MtbDGC an interesting subject for studying protein folding and the regulation of c-di-GMP signaling at the enzymatic level.
To address this question, we studied the disulfide-bonded and free cysteines in MtbDGC. Disulfide-bonded cysteines might play a crucial role in the folding of MtbDGC, whereas a free cysteine may be involved in regulating enzyme activity. Extensive mass spectrometric analysis confirmed the presence of three disulfide bonds, Cys94-Cys584, Cys2-Cys479 and Cys429-Cys614, and one free cysteine, Cys406. The mass spectrometric data were further supported by homology modeling.
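Disulfide connectivity of this kind is typically assigned by matching proteolytic peptides. Two peptides joined by one disulfide bond weigh their summed mass minus two hydrogens, so each observed mass can be compared against every pair of cysteine-containing peptides. The Python sketch below illustrates that matching step. It is an assumption about the general approach, not the authors' exact workflow. The peptide masses, the 0.02 Da tolerance and the helper names are all hypothetical, and only neutral masses with a single inter-peptide link are considered.

```python
from itertools import combinations

H = 1.007825  # monoisotopic hydrogen mass (Da)


def linked_mass(m1, m2):
    """Neutral mass of two peptides joined by one disulfide (two H atoms lost)."""
    return m1 + m2 - 2 * H


def match_disulfides(peptides, observed, tol=0.02):
    """Return (cys_1, cys_2, observed_mass) for linked-peptide masses matching within tol Da.

    peptides: dict mapping a cysteine label to the neutral monoisotopic mass of
              the proteolytic peptide that carries it (hypothetical values here)
    observed: list of observed neutral masses (Da)
    """
    hits = []
    for (c1, m1), (c2, m2) in combinations(peptides.items(), 2):
        expected = linked_mass(m1, m2)
        for obs in observed:
            if abs(obs - expected) <= tol:
                hits.append((c1, c2, obs))
    return hits


# Hypothetical peptide masses; only the Cys2/Cys479 pair matches the observation.
peptides = {"Cys2": 1000.50, "Cys94": 1200.60, "Cys406": 900.40, "Cys479": 1500.70}
observed = [2499.184]
print(match_disulfides(peptides, observed))  # [('Cys2', 'Cys479', 2499.184)]
```

A cysteine whose peptide is observed only at its unlinked mass, or only after alkylation, would be a candidate free cysteine.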
The structure obtained by homology modeling revealed three disulfide bonds and one free cysteine, with an exposed GGDEF motif. The homology model presented here is the first example showing that the last (EAL) domain sits in the middle, interacting with the other two (GAF and GGDEF) domains. This arrangement may hint at a mechanism that regulates the activities of proteins having both GGDEF and EAL domains so as to avoid a futile cycle. Apart from conserved motifs and polar residues, a significant number of conserved glycine residues was also found. Almost all conserved residues are involved either in van der Waals interactions or in hydrogen bonding, and are important in maintaining the orientation of the domains and thereby the shape of the protein.
Cys406 lies far from the remaining six cysteines and does not participate in any disulfide bond. Because we hypothesized that a free cysteine might have a regulatory role, we mutated Cys406 to serine and assayed enzymatic activity. Synthesis of c-di-GMP was completely abolished in the mutant protein. The exact role of Cys406 in the mechanism of c-di-GMP synthesis remains to be investigated. To the best of our knowledge, this is the first example of regulation of enzymatic activity by a free cysteine in a GAF-GGDEF-EAL protein.