Crystal Structure of a Monomeric Thiolase-Like Protein Type 1 (TLP1) from Mycobacterium smegmatis

An analysis of the Mycobacterium smegmatis genome suggests that it codes for several thiolases and thiolase-like proteins. Thiolases are an important family of enzymes that are involved in fatty acid metabolism. They occur as either dimers or tetramers. Thiolases catalyze the Claisen condensation of two acetyl-Coenzyme A molecules in the synthetic direction and the thiolytic cleavage of 3-ketoacyl-Coenzyme A molecules in the degradative direction. Some of the M. smegmatis genes have been annotated as thiolases of the poorly characterized SCP2-thiolase subfamily. The mammalian SCP2-thiolase consists of an N-terminal thiolase domain followed by an additional C-terminal domain called sterol carrier protein-2 or SCP2. The M. smegmatis protein selected in the present study, referred to here as the thiolase-like protein type 1 (MsTLP1), has been biochemically and structurally characterized. Unlike classical thiolases, MsTLP1 is a monomer in solution. Its structure has been determined at 2.7 Å resolution by the single wavelength anomalous dispersion method. The structure of the protomer confirms that the N-terminal domain has the thiolase fold. An extra C-terminal domain is indeed observed. Interestingly, it consists of six β-strands forming an anti-parallel β-barrel which is completely different from the expected SCP2-fold. Detailed sequence and structural comparisons with thiolases show that the residues known to be essential for catalysis are not conserved in MsTLP1. Consistent with this observation, activity measurements show that MsTLP1 does not catalyze the thiolase reaction. This is the first structural report of a monomeric thiolase-like protein from any organism. These studies show that MsTLP1 belongs to a new group of thiolase related proteins of unknown function.


Introduction
The genus Mycobacterium comprises some of the most devastating pathogens that infect both animals and humans. To date, twenty-eight genomes of mycobacterial species have been sequenced completely [http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi]. A striking feature in all these genomes is the abundance of genes coding for enzymes involved in fatty acid and lipid metabolism; more than 250 in Mycobacterium tuberculosis compared to only 50 in Escherichia coli. The mycobacterial genome codes for over a hundred enzymes involved in fatty acid degradation [1]. Apart from providing energy, lipids and fatty acids also form an integral part of the cell wall and cell membrane of Mycobacteria. The abundance and importance of lipid metabolizing enzymes in Mycobacteria make them attractive targets for drug discovery [2,3]. It is therefore necessary to biochemically and structurally characterize these enzymes.
Thiolases are a ubiquitous group of enzymes that are involved in biosynthetic and degradative pathways of lipid metabolism. In the last step of the β-oxidation pathway [4], degradative thiolases catalyze the shortening of the fatty acid chains by degrading 3-ketoacyl CoA to form acetyl CoA and a shortened acyl CoA species (Figure 1). Thiolases are a subfamily of the thiolase superfamily. This superfamily also includes the Ketoacyl-(Acyl-carrier-protein)-Synthase (KAS) enzymes, polyketide synthases and chalcone synthases [5,6,7]. Most members of this superfamily are dimers and only a few tetramers have been reported. The tetramers are dimers of tight dimers. The best characterized member of the thiolase subfamily is the tetrameric biosynthetic bacterial thiolase from Zoogloea ramigera [8,9]. The key catalytic residue in all characterized members of the thiolase superfamily is a cysteine residue (Cys89 in Z. ramigera thiolase). This nucleophilic cysteine participates in the reaction by forming a covalent intermediate with the substrate. There are two other important catalytic residues, either two histidines or an asparagine and a histidine (Asn316 and His348 in Z. ramigera thiolase) that are also conserved in the thiolase superfamily [10]. These two residues are important for the formation of an oxyanion hole, which stabilizes the enolate intermediate of the Claisen condensation reaction. The reaction also requires a base to abstract a proton from the substrate. In the biosynthetic thiolase of Z. ramigera, a cysteine (Cys378) has been shown to play this role [8]. In thiolases, these catalytic residues are present in four highly conserved loops with characteristic sequence fingerprints [7,11,12]. An additional sequence fingerprint near the catalytic site is important for substrate discrimination between unbranched and 2-methyl branched fatty acid tails [7].
Each thiolase subunit is about 400 residues long and has three regions: an N-terminal core region, a loop region (referred to as the thiolase loop domain) and a C-terminal core region [7]. Each of these three regions consists of about 120 residues. The thiolase loop domain shapes the binding pocket of the Coenzyme A (CoA) moiety of the substrate. The N- and C-terminal core regions have similar topologies [12] and appear to be the result of gene duplication. Most of the catalytic residues occur in the C-terminal half but the conserved reactive nucleophile, a cysteine (Cys89 of Z. ramigera), is found in the N-terminal half of the protomer. Catalysis in both the synthetic and the degradative directions starts with the acylation of the catalytic, nucleophilic cysteine (Cys89 in Z. ramigera thiolase) [8].
In humans, six different thiolases have been identified (CT, T1, T2, TFE, AB and SCP2) [12,13,14] with distinct distribution in cellular compartments, quaternary structure, substrate specificity and enzyme kinetics. The sequences of these six thiolases are similar. Of these thiolases, AB [12], T2 [11] and CT [15] have been well characterized and a crystal structure has been determined for each of these enzymes. In contrast, no crystal structures are available for T1, SCP2 and TFE-thiolases. The mammalian SCP2-thiolase has an additional sterol carrier protein C-terminal domain (SCP2) [16]. The structure of SCP2 is known [17,18].
Examination of the Mycobacterium smegmatis genome revealed the presence of several putative thiolase genes [19]. These genes have been annotated as thiolases on the basis of sequence analysis. However, none of them has been biochemically characterized. The sequence identity between some of these proteins and the other well-characterized thiolases is rather low. The protein encoded by one of the M. smegmatis thiolase-like genes (the thiolase-like protein type 1, MsTLP1) was overexpressed in E. coli, purified and crystallized [19]. The crystal structure of MsTLP1 is described here. Analysis of the structure revealed that the protein has an N-terminal domain with a typical thiolase fold and an additional C-terminal domain. The C-terminal domain is a six-stranded anti-parallel β-barrel which is completely different from the structure of SCP2. A detailed analysis of the structure and biochemical properties of this protein revealed that its fold agrees with that of classical thiolases, but the nucleophilic active site cysteine and other characteristic thiolase sequence fingerprints are not conserved. Accordingly, it is found that the protein does not exhibit thiolase activity. Therefore, this protein can be classified as a member of a new subfamily of the thiolase superfamily.

Bioinformatics analysis
From a genome database search, two TLP coding genes (MsTLP1 and MsTLP2) were identified in M. smegmatis. The pairwise sequence identity between the two corresponding proteins is 31%. A similar search of the M. tuberculosis genome revealed the presence of only one such gene. The pairwise sequence identities of the protein encoded by this gene with MsTLP1 and MsTLP2 are 33% and 69%, respectively. Subsequent searches revealed the presence of TLP homologs in several other bacterial genomes.
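The pairwise identities quoted above are percentages of matching residues over aligned positions. The following is a minimal sketch of one common way to compute such a value, assuming two already-aligned sequences and counting only columns in which neither sequence has a gap. The function name and the toy fragments are hypothetical and are not real TLP sequences. Alignment programs differ in the denominator they use, so this is not necessarily the convention behind the reported numbers.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy aligned fragments (hypothetical, for illustration only).
toy_a = "MKT-AYIAG"
toy_b = "MRTLAY-AG"
toy_identity = percent_identity(toy_a, toy_b)  # 6 identical of 7 ungapped columns
```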
However, a protein homologous to TLP was not found in the human genome. A multiple sequence alignment was performed using six human thiolases, several other eukaryotic thiolases and seven TLPs from various bacterial sources including M. smegmatis and M. tuberculosis. The phylogenetic relationships of TLPs with thiolases were analyzed and the results are presented in Figure 2. The TLPs are clustered as a distinct group and the branch point is near the SCP2-thiolase group with high confidence. The mammalian SCP2-thiolase gene codes for an N-terminal thiolase domain and a C-terminal SCP2 domain. Interestingly, the TLP gene also codes for a thiolase-like N-terminal domain and an additional C-terminal domain. However, the latter domain is not related to the C-terminal domain of SCP2-thiolase in sequence and its function is yet to be established, as discussed below.

Enzymological characterization
The molecular mass of an MsTLP1 protomer as calculated from the sequence is 56 kDa. Static Light Scattering (SLS) and size exclusion chromatography experiments show that, unlike other thiolases, MsTLP1 is a monomer of 56 kDa in solution (Figure S1). Similarities between the structures of MsTLP1 and thiolases suggest that MsTLP1 might be able to bind CoA and its derivatives. However, thiolase activity could not be detected in either the forward or the reverse direction. The assays were carried out with two different batches of freshly purified enzyme.
Surface Plasmon Resonance (SPR) binding assays showed that TLP1 binds CoA (Figure S2). The binding curves were fitted to the Langmuir binding equation and the affinity of MsTLP1 for CoA was found to be in the millimolar range (Kd = 0.6 ± 0.3 mM; n = 3). The assays were carried out using two different batches of freshly purified enzyme.
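For reference, the Langmuir (1:1) binding model relates the equilibrium response R to the ligand concentration C as R = Rmax · C / (Kd + C). The sketch below illustrates how a Kd could be estimated from such a curve by a simple grid search. It is not the fitting procedure used in this study: the concentrations, the Rmax of 100 response units and the noise-free synthetic data are assumptions made purely for illustration, and only the Kd of 0.6 mM is taken from the text.

```python
def langmuir_response(conc_mM: float, kd_mM: float, r_max: float) -> float:
    """Equilibrium response of a 1:1 Langmuir binding model."""
    return r_max * conc_mM / (kd_mM + conc_mM)


def fit_langmuir(concs, responses, kd_grid):
    """Scan candidate Kd values; for each, solve Rmax by linear least squares."""
    best = None
    for kd in kd_grid:
        # For a fixed Kd the model is linear in Rmax, so Rmax has a closed form.
        basis = [c / (kd + c) for c in concs]
        r_max = sum(b * r for b, r in zip(basis, responses)) / sum(b * b for b in basis)
        sse = sum((r - r_max * b) ** 2 for b, r in zip(basis, responses))
        if best is None or sse < best[0]:
            best = (sse, kd, r_max)
    return best[1], best[2]


# Synthetic, noise-free data generated from the model itself (not measurements).
concs = [0.1, 0.2, 0.5, 1.0, 2.0, 5.0]             # mM, hypothetical
synthetic = [langmuir_response(c, 0.6, 100.0) for c in concs]
kd_grid = [k / 100 for k in range(5, 301)]          # 0.05 to 3.00 mM
kd_fit, r_max_fit = fit_langmuir(concs, synthetic, kd_grid)
```

At a ligand concentration equal to Kd the model predicts half-maximal response, which is why a millimolar Kd corresponds to weak binding.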

Structure determination and model quality
The crystal structure of MsTLP1 was determined using the single wavelength anomalous dispersion (SAD) method. As the triclinic P1 cell could accommodate six protomers of the polypeptide, rotation functions were computed using reflections in the 10 Å-4 Å resolution shell and a radius of integration of 30 Å. Rotation functions corresponding to κ = 180° and κ = 120° hemispheres had significant peaks consistent with non-crystallographic 32 symmetry [19].
Figure 1. The degradative reaction catalyzed by thiolase. The substrate specificities of different thiolases are distinct: the mammalian SCP2-thiolase accepts as substrate a molecule in which R1 is a 2-methyl group and R2 is a steroid moiety. In all degradative thiolases, the reaction proceeds via a covalent intermediate in which a nucleophilic cysteine is acylated in the first step by the 3-keto substrate, releasing free acetyl CoA (when R1 is H) or propionyl CoA (when R1 is the 2-methyl moiety). The acyl group is subsequently transferred to CoA in the second step of the reaction. doi:10.1371/journal.pone.0041894.g001
Figure 2. Evolutionary tree analysis of the thiolase sequences. The phylogenetic tree was constructed using the neighbor-joining method, with 10,000 bootstrap replicates in MEGA5 software. Only the region corresponding to the thiolase domain was used for these calculations. The group identifiers correspond to the nomenclature described in the text. The numbers next to each node indicate bootstrap values as percentages. The following sequences (listed with their NCBI accession codes and, for trypanosomatid sequences, their NCBI and GeneDB accession codes) were used for creating the evolutionary tree.
The structure determination was not straightforward and various strategies as described in the methods section had to be used for obtaining a nearly complete structure. About 63 residues of the thiolase domain distributed over four loops could not be built, indicating that these residues belong to truly disordered segments. Electron density is missing for four segments consisting of residues 17-25, 64-77, 135-170 and 253-254 in the thiolase domain (see also Figure 4) as well as for the N-terminal hexahistidine tag. In the rest of the polypeptide, adequate density is not observed for the side chains of only a few surface residues. The data processing and structure refinement statistics are shown in Tables 1 and 2, respectively. The quality of the final electron density map is illustrated in Figure 5. The electron density difference map has an elongated unmodeled feature at an interhexamer crystal contact region close to the disordered loop 63-76. The uninterpreted density is not connected to the residues at the beginning or end of the disordered loop. The final R-factor and Rfree of the refinement at 2.7 Å resolution are 23.3% and 26.1%, respectively (Table 2).
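The R-factor quoted above is conventionally defined as the summed absolute difference between observed and calculated structure factor amplitudes divided by the summed observed amplitudes. Rfree is the same statistic computed over a test set of reflections excluded from refinement. The sketch below assumes the calculated amplitudes are already on the same scale as the observed ones. The amplitudes are toy numbers and not data from this study.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R = sum(| |Fo| - |Fc| |) / sum(|Fo|).

    Assumes f_calc has already been scaled to f_obs (scale factor of 1).
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty amplitude lists of equal length")
    numerator = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


# Toy amplitudes (hypothetical, for illustration only).
toy_f_obs = [100.0, 50.0, 25.0, 25.0]
toy_f_calc = [90.0, 55.0, 30.0, 20.0]
toy_r = r_factor(toy_f_obs, toy_f_calc)  # (10 + 5 + 5 + 5) / 200
```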

The overall fold
The MsTLP1 polypeptide fold is illustrated in Figure 6. The polypeptide folds into two distinct domains: an N-terminal thiolase-like domain (residues 1-407) and an additional C-terminal domain (residues 435-507), with a 27-residue-long linker (Figure 6A) connecting the two domains. The N-terminal domain of MsTLP1 exhibits the characteristic thiolase fold. A DALI search against the PDB using the N-terminal domain as the template revealed several structural homologs. Functionally, all these proteins are involved in fatty acid metabolism. The N-terminal domain can be further divided into two topologically similar N-terminal and C-terminal halves, both of which exhibit the βαβαβαββ topology characteristic of thiolase domains (Figure 6). Between the N- and C-terminal halves is a layer of two α-helices contributed by the two halves of the N-terminal domain. The β-sheets of the N- and C-terminal halves are also surrounded by a helical layer on the side facing the bulk solvent (Figure 6). Thus, the N-terminal domain is organized into a five-layered α/β/α/β/α structure characteristic of the thiolase superfamily. In thiolases, the topologically similar N- and C-terminal halves are connected by the long thiolase loop domain consisting of about 120 residues. In MsTLP1, this region corresponds to residues 133 to 253. The lengths of the loop domains in thiolases and MsTLP1 are comparable. However, there is no electron density for a large segment at the N-terminal region of this loop in MsTLP1 (135-170), indicating that this region of the loop domain is highly flexible. The end of the thiolase loop domain is also disordered in MsTLP1 (Figure 4). Two other disordered loops of the MsTLP1 thiolase domain consisting of 9 and 14 residues, respectively, occur after Nβ1 and Nβ2.
The C-terminal extra domain (residues 435-507) is made up of six β-strands and resembles a barrel-like structure with three antiparallel β-strands on either side of the barrel (Figure 6A). There is a short helix located at the top of the barrel resembling a lid. The C-terminal extra domain is connected to the N-terminal thiolase domain by a long loop containing a short helix (Figure 6A). The topology of the C-terminal domain is reminiscent of single strand nucleic acid binding proteins. A DALI search using this domain against the PDB shows structural similarities to a molybdenum binding protein (PDB id: 1H9K) [20] and a single strand DNA binding protein (PDB id: 3PGZ) (Seattle Structural Genomics Center for Infectious Disease; Unpublished). However, no characteristic DNA binding sequence motif could be identified in the domain.

Organization of protomers in the crystal asymmetric unit
The six protomers in the asymmetric unit are related by a nearly perfect non-crystallographic 32 symmetry (Figure 7). The layer of protomers (labeled A, B, C in Figure 7A) related by the non-crystallographic 3-fold symmetry is staggered with respect to the second layer of protomers (labeled A′, B′, C′) related to the first by the non-crystallographic 2-fold symmetry. In the non-crystallographic 3-fold related layer, the inter-protomer contacts are between the thiolase domain of one protomer and the C-terminal domain of the other protomer (Figure 7). The rotational relationships of neighboring protomers related by the non-crystallographic 3-fold axis are within ±0.5° of 120°. Similarly, the rotations relating protomers of the two layers are within ±0.5° of 180°.
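A rotation angle such as the ones compared above can be recovered from a 3×3 rotation matrix through its trace, since trace(R) = 1 + 2 cos κ. The sketch below shows this relation and a tolerance check against an ideal 120° or 180° rotation. The helper names and the test matrices (rotations about the z axis) are hypothetical, and the superposition matrices of the actual protomers are not reproduced here.

```python
import math


def rotation_angle_deg(r):
    """Rotation angle kappa (degrees) of a 3x3 rotation matrix, from its trace."""
    trace = r[0][0] + r[1][1] + r[2][2]
    # Clamp to [-1, 1] to guard against rounding just outside the acos domain.
    cos_kappa = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_kappa))


def rot_z(angle_deg):
    """Rotation matrix about the z axis (used here only to build test input)."""
    a = math.radians(angle_deg)
    return [
        [math.cos(a), -math.sin(a), 0.0],
        [math.sin(a), math.cos(a), 0.0],
        [0.0, 0.0, 1.0],
    ]


def within_tolerance(r, ideal_deg, tol_deg=0.5):
    """True if the rotation angle of r is within tol_deg of ideal_deg."""
    return abs(rotation_angle_deg(r) - ideal_deg) <= tol_deg


near_threefold = within_tolerance(rot_z(120.3), 120.0)
near_twofold = within_tolerance(rot_z(179.6), 180.0)
off_threefold = within_tolerance(rot_z(121.0), 120.0)
```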

Analysis of inter-protomer contacts
The buried surface areas between protomers related by the non-crystallographic 3-fold and 2-fold axes were computed. The solvent accessible surface area of a protomer is 18,145 Å². Due to the 3-fold related interactions, the area that gets buried in the A/B interface is 561 Å² per protomer. This represents only 3.2% of the total area and is small compared to the buried surface area of tightly associated subunits in protein oligomers. This suggests that the interface between 3-fold related subunits is not strong. On average, about 20 residues from the N-terminal domain of one protomer and 15 residues from the C-terminal domain of the adjacent protomer participate in the interface. The interface is stabilized by 5 hydrogen bonds (Arg 18-Asp 452, Thr 16-Thr 454, Pro 219-Thr 454, Glu 35-Lys 475, Asp 49-Glu 499, Lys 38-Arg 503) and one salt bridge (Glu 35-Lys 473). The contact between protomers related by the non-crystallographic 2-fold is much weaker. No significant hydrogen bonds or salt bridges were detected between the layers. In contrast, an area of 2546 Å² representing 16% of the total surface area gets buried in the dimerization of Z. ramigera thiolase. The interface is stabilized by 62 H-bonds and 18 salt bridges and is completely different from the interface between the A and A′ subunits of MsTLP1. Therefore, the hexameric state observed in the crystal structure of MsTLP1 is likely to be the result of crystal packing. This observation is consistent with the monomeric state of MsTLP1 in solution suggested by size exclusion chromatography and Static Light Scattering (SLS). Modeling studies suggested that the linker region between the N- and C-terminal domains of MsTLP1 (Figure 6A) obstructs the formation of an interface similar to that of Z. ramigera thiolase due to steric clashes. Also, the tetramerization loop of Z. ramigera thiolase, which occurs at the N-terminal end of the thiolase   Figure 8A).
It is noteworthy that most of the structural differences observed between MsTLP1 and Z. ramigera thiolase are in the thiolase loop domain (119-249 of Z. ramigera thiolase). In Z. ramigera thiolase, five segments of this domain appear to be important for catalysis and substrate specificity [11,12]. Figure 8B shows a structural superposition of the thiolase loop domains of Z. ramigera thiolase and MsTLP1 highlighting the five functionally important segments. The tetramerization loop occurring at the N-terminus of this domain is essential for the tetrameric organization of Z. ramigera thiolase. Residues from this loop also interact with the substrate. The covering loop occurs immediately after the tetramerization loop and covers the active site pocket. The pantetheine loop interacts with the pantetheine part of bound CoA. The covering loop and the pantetheine loop together shape the entrance to the catalytic pocket of Z. ramigera thiolase. The cationic loop is solvent-exposed and is thought to capture the negatively charged substrate. The adenine binding loop promotes binding of the adenosine moiety of CoA. The tetramerization loop, at the beginning of the thiolase loop domain, and the pantetheine loop, at its end, are disordered in MsTLP1. From sequence comparison (Figure 4), it is evident that the tetramerization loop is longer in MsTLP1 by about eight residues when compared to the corresponding loop of Z. ramigera thiolase. The cationic loop is substantially shortened in MsTLP1 and hence does not resemble the corresponding loop of Z. ramigera thiolase. These differences in the conformation of the loops surrounding the active site pocket are likely to be functionally significant. The adenine loop, however, is in a similar position and is of the same length in both proteins.

Sequence fingerprints of thiolases
All enzymes in the thiolase family have five highly conserved sequence fingerprints as shown in Figure 4 [7,11,14]. The corresponding residues in MsTLP1 were identified by structure based sequence alignment and careful manual examination (Figure 4). The most significant difference is that the active site nucleophilic cysteine, Cys89 of the CXS motif of Z. ramigera thiolase, has been replaced by Gly102 in MsTLP1. This fingerprint is labeled 1 in Figures 3 and 4. The VMG motif (residues 287-289 of thiolase; fingerprint 2) determines the size of the substrate that can be accommodated in the active site cavity. In T2-thiolase [10], the active site cavity can accommodate a branched fatty acid with a 2-methyl group. The inability of Z. ramigera thiolase to bind long chain acyl CoA moieties as well as 2-methyl branched fatty acid CoAs has been attributed to the presence of two residues, Met157 and Met288, which protrude into the binding pocket [21,22]. Met157 of Z. ramigera thiolase has been replaced by a smaller residue, Ala178, in MsTLP1. The VMG motif that includes Met288 is significantly displaced in MsTLP1 such that a much larger volume is available for ligand binding (Figure 9). Therefore, this region of the putative binding pocket of MsTLP1 may bind a more extended ligand. Substantial differences are also observed between Z. ramigera thiolase and MsTLP1 in other motifs that define the nature of the active site. Asn316 in the NEAF motif (fingerprint 3; residues 316-319) of Z. ramigera thiolase makes a hydrogen bond with the catalytic water [8]. This motif has been replaced by YSCF (residues 330-333) in MsTLP1. The GHP motif (residues 347-349 of Z. ramigera thiolase, fingerprint 4), which is involved in the formation of the oxyanion hole, has been replaced by GGL (residues 364-366 of MsTLP1). Finally, Cys378 of the CXG motif (residues 378-380 of Z. ramigera thiolase, fingerprint 5), which acts as the base for proton abstraction from the substrate in thiolase, has been replaced by Asn395 of the corresponding NGG motif in MsTLP1 (395-397). The altered motifs YSCF, GGL and NGG are, however, conserved in homologs of MsTLP1 in other prokaryotes (Figure 3), suggesting that these residues may be important for function. The conservation of the fingerprint sequences within TLP homologs and within classical thiolases, but not between the two families, may be a key feature that determines the functional differences between proteins belonging to these families.
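The substitutions described above can be tabulated position by position. The sketch below is only a bookkeeping aid built from the motifs named in the text. The index marking the key residue within each motif is inferred from the residue numbering given above and is an assumption of this sketch. Fingerprint 2 (VMG) is omitted because the text does not give the MsTLP1 sequence at that position, and for fingerprint 1 only the Cys to Gly substitution is stated.

```python
# (fingerprint label, Z. ramigera motif, index of key residue, MsTLP1 counterpart)
FINGERPRINTS = [
    (1, "C", 0, "G"),        # nucleophilic Cys of the CXS motif -> Gly
    (3, "NEAF", 0, "YSCF"),  # Asn that hydrogen bonds the catalytic water
    (4, "GHP", 1, "GGL"),    # His involved in forming the oxyanion hole
    (5, "CXG", 0, "NGG"),    # Cys acting as the base for proton abstraction
]


def key_residue_changes(fingerprints):
    """Map each fingerprint label to (thiolase residue, MsTLP1 residue)."""
    changes = {}
    for label, thiolase_motif, key_index, tlp_motif in fingerprints:
        if len(thiolase_motif) != len(tlp_motif):
            raise ValueError(f"motif lengths differ for fingerprint {label}")
        changes[label] = (thiolase_motif[key_index], tlp_motif[key_index])
    return changes


changes = key_residue_changes(FINGERPRINTS)
# Fingerprints whose key catalytic residue is not conserved in MsTLP1.
lost = sorted(label for label, (before, after) in changes.items() if before != after)
```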

The putative Coenzyme A binding groove of MsTLP1
The surface characteristics of MsTLP1 are shown in Figure 10A. A large groove is observed extending across the full length of the MsTLP1 molecule. Comparison with the Z. ramigera thiolase structure shows that this groove corresponds to the binding site for a CoA molecule or a fatty acyl CoA molecule. As in Z. ramigera thiolase, several positively charged residues, notably Arg227, Arg240 and Arg248, line this putative CoA-binding pocket, presumably to stabilize binding of the negatively charged phosphate groups of the ligand (Figure 10A). An analysis of the loops of TLP1 (Figure 4) shows that the shape and binding properties of the groove extending beyond the CoA molecule will be affected by the disordered loops occurring at the beginning (135-169) and end (residues 253, 254) of the loop domain, as visualized in Figure 10B. In particular, the disordered loop at the beginning of the loop domain (135-169) is rather long and could cover part of the binding pocket for CoA or a fatty acyl CoA molecule after complex formation. Both loops may become ordered on binding the physiological ligand, thereby shielding the bound ligand from bulk solvent.

The C-terminal domain
The C-terminal domain of MsTLP1 does not have the same fold as the SCP2 domain of human SCP2-thiolase. The residues of the linker region between the thiolase domain and the C-terminal domain prevent formation of dimers resembling those of classical thiolases. For example, the MsTLP1 linker helix (residues 419-427) clashes with the loop region just before the Nβ3 dimer interface β-strand of the other subunit of the hypothetical dimer (residues 73-83), as shown in Figure 10B.
The interface area between the N-terminal thiolase domain and the C-terminal domain is 1035 Å². This constitutes 22% of the C-terminal domain and 4% of the N-terminal domain surface areas, respectively. There are 19 hydrogen bonds and one salt bridge (between Asp487 of the C-terminal domain and Arg190 of the thiolase domain) between the two domains. Interestingly, Arg190 is largely conserved in the homologs of TLP. Asp487 is also conserved in the homologs of TLP. In three of the seven homologs shown in Figure 3, it has been replaced by glutamate and hence the charge is conserved. Thus it appears that most of the TLP homologs have a C-terminal domain anchored to the thiolase domain by similar interactions. This domain is not close to the putative active site, except for residues DWTP (452-455) of the long loop occurring after Eβ1 (Figure 4, Figure 8A). This loop is hydrogen bonded to the thiolase domain via an anti-parallel β-sheet interaction between residues TVR (448-450) and IVD (242-244) of the region just before Nβ5 of the thiolase domain. It is interesting to note that residues of this domain have a slightly higher B-factor (54.3 Å²) as compared to the N-terminal thiolase domain (41.6 Å²). The role of the C-terminal domain is yet to be elucidated. Deletion studies are underway to shed more light on the functional importance of this domain.

Concluding remarks
All enzymes of the thiolase superfamily that have been structurally characterized so far share four features: 1) conservation of the core α/β/α/β/α-layered structure of the thiolase domain, 2) conservation of the extensive dimerization interface, 3) the location of the active site pocket and conservation of key active site residues and 4) the use of a nucleophilic cysteine residue in catalysis. MsTLP1 has the conserved core α/β/α/β/α structure and conforms to the thiolase fold strictly. Although the dimerization is not a conserved feature in MsTLP1, the location of the putative active site is similar to those in other thiolases. The ligand binding groove of MsTLP1, identified by structural superposition with Z. ramigera thiolase, is larger than that of Z. ramigera thiolase. The absence of the catalytic cysteine and the consequent lack of thiolase activity suggest that though the protein has the strictly conserved thiolase fold, it might perform an entirely different function. Therefore, the protein appears to belong to a new subfamily in the thiolase superfamily. MsTLP1 is the first monomeric member of this superfamily. The initial sequence annotation suggested correctly that MsTLP1 is a thiolase-like protein, although the sequence identities are not very significant.
Figure 9. MsTLP1 has a larger binding pocket near its putative catalytic site. In Z. ramigera thiolase the size of the pocket is restricted by two methionines (shown in green) that protrude into the binding cavity. In MsTLP1 (brown), one of these residues (Met157 of Z. ramigera thiolase) is replaced by Ala178, while the second (Met288 of Z. ramigera thiolase) is replaced by a loop containing five residues. However, the loop is displaced away from the cavity. doi:10.1371/journal.pone.0041894.g009
Interestingly, none of the catalytic residues of the thiolase subfamily is conserved in MsTLP1. Nevertheless, the fold of the thiolase part of MsTLP1 closely resembles the Z. ramigera thiolase structure. The location of the ligand binding pocket for CoA in the two proteins also appears to be conserved. MsTLP1 has a unique extra C-terminal domain of unknown function. The structure of this domain is completely different from that of the SCP2 domain of SCP2-thiolase. Examination of TLP1 homologs suggests that the interactions between the thiolase-like domain and the additional C-terminal domain are largely conserved. The physiological function of the full length MsTLP1 protein has not yet been established. Further studies have been initiated to establish the intriguing role of MsTLP1 in the lipid metabolism of mycobacteria and other microorganisms.

Cloning, expression and purification
The gene coding for M. smegmatis TLP1 (YP_889758) was cloned, over-expressed in E. coli and purified as described earlier [19].

Selenomethionine incorporation
The MsTLP1 clone was transformed into the BL21 (DE3) Rosetta strain of E. coli. A single colony was picked and grown in 1 ml LB-ampicillin overnight at 37 °C. This was then inoculated into 500 ml minimal media with 100 μg/ml ampicillin and grown for 12 h at 37 °C. Methionine synthesis was inhibited by the addition of an amino acid mixture containing 50 mg/l of leucine, isoleucine, valine, lysine, threonine and phenylalanine. After incubation for half an hour, protein expression was induced with 1.0 mM isopropyl-β-thiogalactopyranoside (IPTG). Selenomethionine was added at the time of induction. Purification of the protein was carried out as described for the native enzyme. Selenomethionine incorporation was confirmed by accurate mass determination using electrospray ionisation mass spectrometry (ESI-MS).

Crystallization, data collection and data processing
The crystals were obtained as described previously [19]. Briefly, the crystals were grown at room temperature using the microbatch method. A crystallization droplet consisted of 2 μl of protein solution (5 mg/ml in 50 mM Tris-HCl pH 8.0 containing 100 mM NaCl, 10% glycerol, 5 mM 2-mercaptoethanol) and 2 μl of crystallization solution (100 mM 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES) pH 7.5, 20% polyethylene glycol (PEG) 4000, 10% isopropanol). Selenium-incorporated MsTLP1 crystals were obtained under the same conditions. Three data sets of selenomethionine labeled crystals (Se-MsTLP1) were collected to 2.7 Å resolution at a wavelength of 0.9789 Å (peak) at beamline BM14 of the European Synchrotron Radiation Facility (ESRF), Grenoble, France, using three different crystals. The data were processed using DENZO and SCALEPACK from the HKL-2000 suite [23]. Data processing revealed that the crystal belonged to the triclinic space group P1 and that the size of the unit cell is compatible with 4-8 protomers of MW 56 kDa, corresponding to Matthews coefficients of 4.0 Å³ Da⁻¹ to 2.0 Å³ Da⁻¹, respectively. The final statistics for data collection and processing are summarized in Table 1.
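The Matthews coefficient is the unit cell volume divided by the total protein mass in the cell, V_M = V_cell / (Z × MW), so for a P1 cell it falls in inverse proportion to the number of protomers. The sketch below illustrates this relation. The cell volume used is not taken from Table 1: it is back-calculated from the statement that four protomers of 56 kDa correspond to 4.0 Å³/Da, and should be read as an assumption of this example. The solvent fraction uses the common approximation 1 − 1.23 / V_M.

```python
PROTOMER_MASS_DA = 56_000.0  # molecular mass of one MsTLP1 protomer (from the text)

# Assumed cell volume, back-calculated from V_M = 4.0 A^3/Da for 4 protomers.
ASSUMED_CELL_VOLUME_A3 = 4.0 * 4 * PROTOMER_MASS_DA


def matthews_coefficient(cell_volume_a3: float, n_protomers: int, mass_da: float) -> float:
    """V_M in cubic angstroms per dalton for a P1 cell holding n_protomers copies."""
    return cell_volume_a3 / (n_protomers * mass_da)


def solvent_fraction(v_m: float) -> float:
    """Approximate solvent content from V_M (standard 1.23 / V_M estimate)."""
    return 1.0 - 1.23 / v_m


v_m_by_copies = {
    n: matthews_coefficient(ASSUMED_CELL_VOLUME_A3, n, PROTOMER_MASS_DA)
    for n in range(4, 9)
}
solvent_for_six = solvent_fraction(v_m_by_copies[6])
```

Under this assumption, six protomers per cell give a V_M of about 2.67 Å³/Da, between the two limits quoted above.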

Structure solution and refinement
The structure of MsTLP1 was determined as follows. Using anomalous differences, 63 selenium positions were identified and refined using the program SHELXD [24]. Initial selenium-position-based phases obtained by the PHENIX program were improved by solvent flattening using the program DM of the CCP4 suite [25]. The resultant map was used for automated model building with the program AutoBuild of the PHENIX suite [26]. The first round of automated model building could generate only 40% of the structure. The model obtained was manually adjusted using the interactive graphics program COOT [27] and refined using REFMAC5 [28] of the CCP4 suite. Subsequent rounds of manual model building and refinement extended the structure by an additional 20%. At this stage, a phase combination of the initial phases obtained by anomalous data and the model based phases was carried out using the program Phase-Combine of the CCP4 suite [25]. The electron density map obtained using the combined phases was used for a new round of automated model building. This extended the structure by an additional 5%. At this stage, an anomalous difference Fourier map was calculated and selenium positions were re-determined. The 62 highest peaks in the difference Fourier map were accepted as representing selenium atoms. Between this set and the earlier set of selenium positions, 50 were common. The new selenium positions were subjected to extensive refinement using PHASER [29]. Phases obtained using these positions were used for map calculation and automated model building. At this stage, 80% of the structure could be built.
Figure 10. The putative CoA binding groove of MsTLP1 from the comparison with the Z. ramigera thiolase. A) The shape of the binding groove. The binding mode of acetyl CoA (red) as obtained by superposition of the complexed Z. ramigera thiolase structure. The N-terminal thiolase domain is shown in pale blue and the C-terminal domain is shown in green. Also highlighted in blue are the positively charged residues Arg227, Arg240, and Arg248 which line the putative CoA binding pocket. B) The disordered loops. Loop regions that are disordered in MsTLP1 but ordered in Z. ramigera thiolase are highlighted in blue. Also included in the Z. ramigera reference structure is the loop just before the dimer interface Nβ3 strand (residues 73-83) of the adjacent subunit (red). This superposition shows that the linker region of MsTLP1 clashes with this loop, thereby preventing the formation of Z. ramigera thiolase-like dimers. doi:10.1371/journal.pone.0041894.g010
These calculations were based on only one of the three data sets collected at the peak wavelength. At this stage, all three peak data sets collected at ESRF were merged and scaled, and the resulting data were used to perform Molecular Replacement-Single wavelength Anomalous Dispersion (MR-SAD) using PHASER [29], with the 80% model already built serving as the partial structure for initial phase calculation. This extended the model by an additional 10% and brought the structure to its current state of completion, i.e. 90%.

Structure analysis
The geometry of the final model was examined using PROCHECK [30]. All structural superpositions were achieved using the SSM superpose feature of COOT [27]. Average B-factors for protein atoms, water molecules and ligands were calculated using BAVERAGE of the CCP4 suite. The PISA [31] server was used for interface area calculations. Surface area calculations and analysis of contacts were performed using programs available in the CCP4 suite [25]. All figures were prepared using PYMOL (The PyMOL Molecular Graphics System, Version 1.2r3pre, Schrödinger, LLC). Bacterial Z. ramigera thiolase (PDB ID: 1DM3) was used for all sequence and structural alignments. 1DM3 represents the acetylated Z. ramigera thiolase structure complexed with acetyl CoA.
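As an illustration of the kind of per-class B-factor averaging reported here, the following minimal Python sketch reads fixed-column PDB records and averages the B-factor column separately for protein atoms, waters and other hetero groups. It is not the BAVERAGE implementation; the function name and the choice to count selenomethionine (MSE) HETATM records as protein are assumptions made for this example.

```python
# Residues stored as HETATM that are nevertheless part of the polypeptide.
# Treating MSE (selenomethionine) this way is an assumption of this sketch.
MODIFIED_RESIDUES = {"MSE"}


def average_b_factors(pdb_lines):
    """Return the mean B-factor for 'protein', 'water' and 'ligand' atoms.

    Classes without any atoms are omitted from the returned dictionary.
    """
    totals = {}  # class name -> [sum of B-factors, atom count]
    for line in pdb_lines:
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        res_name = line[17:20].strip()   # PDB columns 18-20
        b_factor = float(line[60:66])    # PDB columns 61-66
        if record == "ATOM" or res_name in MODIFIED_RESIDUES:
            group = "protein"
        elif res_name == "HOH":
            group = "water"
        else:
            group = "ligand"
        entry = totals.setdefault(group, [0.0, 0])
        entry[0] += b_factor
        entry[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}
```

Typical use would be `average_b_factors(open("model.pdb"))`, where `model.pdb` is a hypothetical file name.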

Bioinformatics analysis
Only the thiolase domain of MsTLP1 was used in all sequence analyses. The amino acid sequence of MsTLP1 was used as a starting point in an extensive bioinformatics analysis of the M. smegmatis genome: (i) Proteins homologous to MsTLP1 were identified in the non-redundant protein sequence database of SWISSPROT [32] using BLAST [33]. (ii) Additional sequences that might be related to MsTLP were identified using the six characterized human thiolases as query sequences in the BLAST program. Two TLP homologs, TLP1 and TLP2 (YP_887911.1 and YP_889758.1), were identified in the M. smegmatis genome. The catalytic cysteine is not conserved in either of these proteins although they have been annotated as thiolases. These proteins appear to have an N-terminal thiolase-like domain and an extra domain at the C-terminus. (iii) A BLAST search against the human genome using the MsTLP1 amino acid sequence did not reveal any hits. However, homologs of MsTLP1 could be identified in many prokaryotic genomes. TLP homologs were also observed to be present in all other Mycobacterium spp. Interestingly, only one TLP sequence was found in the M. tuberculosis genome. (iv) A multiple sequence alignment was performed by ClustalW [34] using amino-acid sequences of the six human thiolases, several other eukaryotic thiolases and seven TLPs from various sources including two mycobacterial spp. This alignment was used to generate a phylogenetic tree using the neighbor-joining method with 10,000 bootstrap replicates in MEGA5 [35].
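The bootstrap step of the tree building in (iv) can be sketched as follows. This is a simplified Python illustration, not the MEGA5 procedure: each replicate resamples alignment columns with replacement, and a pairwise distance matrix is recomputed from the replicate before a tree is built from it. The p-distance used here is an assumption, since the distance model is not stated in the text, and all function names are hypothetical.

```python
import random


def p_distance(seq_a, seq_b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped positions in common")
    return sum(a != b for a, b in pairs) / len(pairs)


def bootstrap_replicate(alignment, rng):
    """Resample the columns of an alignment with replacement.

    `alignment` maps sequence names to equal-length aligned strings.
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def distance_matrix(alignment):
    """Pairwise p-distances, keyed by (name_a, name_b) with name_a < name_b."""
    names = sorted(alignment)
    return {(a, b): p_distance(alignment[a], alignment[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
```

In a full analysis, a neighbor-joining tree would be built from each replicate's distance matrix, and the bootstrap support of a branch would be the fraction of replicate trees that contain it.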

Oligomeric state of MsTLP1
The oligomeric state of MsTLP1 in solution was determined using a static light scattering (SLS) system (Wyatt Minidawn) with its flow cell connected to an ÄKTA purifier operated at 17 °C. MsTLP1 (0.5 ml of 4.0 mg/ml) protein solution in 50 mM Tris-HCl pH 8.0, 100 mM NaCl, 10% glycerol, 5 mM 2-mercaptoethanol was loaded on a Superdex 200 (10/300) GL column (GE Healthcare) attached to the ÄKTA purifier. The elution profiles (Supplementary Figure S1) were also monitored using RI and UV detectors and analysed using the ASTRA program. The molecular mass expected from the sequence was calculated using the online tool PROTPARAM [36].
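The expected mass used for comparison with the SLS result follows from a simple sum: the average masses of the amino acid residues plus one water molecule for the free termini. The Python sketch below illustrates that calculation with standard average residue masses. It is not the PROTPARAM code, the function names are hypothetical, and it handles only the twenty standard amino acids (selenomethionine, for instance, is not covered).

```python
# Average residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01524  # Da, added once for the N- and C-terminal groups


def expected_mass(sequence):
    """Average molecular mass (Da) of an unmodified polypeptide chain."""
    sequence = sequence.upper()
    unknown = set(sequence) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError("unsupported residue codes: %s" % sorted(unknown))
    return sum(RESIDUE_MASS[res] for res in sequence) + WATER_MASS


def subunits_per_particle(observed_mass, sequence):
    """Ratio of a measured molar mass to the sequence-derived chain mass."""
    return observed_mass / expected_mass(sequence)
```

A ratio close to 1 from `subunits_per_particle` corresponds to a monomer and a ratio close to 2 to a dimer.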

Thiolase activity assays
Thiolase activities in the degradative (Figure 1) and synthetic directions were estimated as described previously [8] at 30 °C using a Shimadzu UV-1800 spectrophotometer. The concentrations of the substrates CoA, acetyl CoA and acetoacetyl CoA in the respective stock solutions were verified using Ellman's test [37]. In all the assays, a cocktail without MsTLP1 served as the negative control and measurements with active trypanosomal thiolase were used as the positive control.
In the thiolytic direction, the 500 µl reaction cocktail contained 50 mM Tris-HCl pH 7.8, 25 mM MgCl2, 60 µM CoA and 50 µM acetoacetyl CoA. The disappearance of the Mg2+/acetoacetyl CoA complex was measured at 303 nm for 3 min after the addition of 10 µg of MsTLP1. In the synthetic direction, the activity assay was carried out using a short chain 3-hydroxyacyl CoA dehydrogenase (SCHAD) as the linker enzyme [9]. In this assay, the enzyme is incubated with acetyl CoA and the acetoacetyl CoA formed is detected through its reduction by nicotinamide adenine dinucleotide (NADH), as catalyzed by the linker enzyme (as described previously [11]). The reaction was initiated by the addition of 10 µg of MsTLP1 to a mixture containing 50 mM Tris-HCl, pH 7.8, 40 mM KCl, 0.2 mM NADH, 1 U of SCHAD (1 U is 1 µmol/(min·mg) of protein), 0.5 mM dithiothreitol (DTT) and 2 mM acetyl CoA. The total reaction volume was 500 µl and the rate of NADH oxidation was monitored at 340 nm for 3 minutes.
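In both assays an absorbance slope is converted into an enzyme rate through the Beer-Lambert law. The Python sketch below shows that conversion under stated assumptions: a 1 cm path length by default, and the commonly used molar extinction coefficient of NADH at 340 nm (6220 M^-1 cm^-1). The extinction coefficient of the Mg2+/acetoacetyl CoA complex at 303 nm is not given in this section and has to be supplied by the user. The function name is hypothetical.

```python
NADH_EXTINCTION_340 = 6220.0  # M^-1 cm^-1, widely used literature value


def specific_activity(delta_a_per_min, extinction_m_cm, volume_ml,
                      enzyme_mg, path_cm=1.0):
    """Specific activity in umol/(min*mg) from a linear absorbance change.

    delta_a_per_min : absorbance change per minute (sign is ignored)
    extinction_m_cm : molar extinction coefficient in M^-1 cm^-1
    volume_ml       : total reaction volume in ml
    enzyme_mg       : mass of enzyme in the reaction in mg
    """
    # Beer-Lambert: concentration change (mol/l per min) = dA / (epsilon * l)
    rate_molar_per_min = abs(delta_a_per_min) / (extinction_m_cm * path_cm)
    # mol/l -> umol/l, then multiply by the volume in litres
    umol_per_min = rate_molar_per_min * 1e6 * (volume_ml / 1000.0)
    return umol_per_min / enzyme_mg
```

Subtracting the slope of the negative control from the sample slope before calling the function removes the background rate. An inactive protein gives a corrected slope that is indistinguishable from zero.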

CoA affinity studies
The binding of CoA to MsTLP1 was studied with a Biacore 3000 (Biacore, Uppsala, Sweden) optical biosensor at 25 °C. MsTLP1 protein (2 µM) was immobilized by amine coupling on the surface of a CM5 sensor chip (GE Healthcare) to a level of ~3,000 response units using the GE Healthcare standard immobilization protocol. Binding and dissociation were measured at a flow rate of 20 µl per minute. The ligand solution was run over the sensor surface at five different ligand concentrations (0.3, 0.6, 1.25, 2.50 and 5 mM) in a running buffer of 10 mM HEPES pH 7.4, 150 mM NaCl, 3 mM ethylenediaminetetraacetic acid (EDTA) and 0.005% P-20. The sensor surface was regenerated with borate buffer pH 8.5. The binding curves were corrected for non-specific binding by subtracting the signal obtained for the negative control flow cell. Kinetic constants for association and dissociation were derived from linear transformations of the binding data. The data were fit to the simple 1:1 Langmuir interaction model using the BIA EVALUATION software.

Figure S1 Oligomeric state of the recombinantly expressed, purified MsTLP1, determined by static light scattering (SLS) in combination with size-exclusion chromatography. A Superdex 200 10/300 GL column (GE Healthcare) was used for the size-exclusion chromatography. The elution profile is provided by the UV signal (thin red line). The SLS signal is combined with the RI signal for the calculation of the molar mass (thick red line) by the ASTRA program. (TIFF)

Figure S2 Kinetics of binding of CoA to MsTLP1 determined using SPR. The SPR sensorgram was obtained by flowing different CoA solutions (see inset) over the TLP1 immobilized sensor chip. The data show that CoA binds to TLP1 with a Kd in the millimolar range. (TIFF)
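The 1:1 Langmuir interaction model applied to the SPR data can be summarised in a few lines of Python. In this model the response follows dR/dt = ka·C·(Rmax − R) − kd·R, so the association phase rises exponentially with an observed rate k_obs = ka·C + kd. A plot of k_obs against ligand concentration C is therefore a straight line whose slope is ka and whose intercept is kd. The sketch below is a generic illustration, not the BIA EVALUATION algorithm, and all function names are hypothetical.

```python
import math


def association_response(t, conc, r_max, k_a, k_d):
    """Response at time t during association for a 1:1 Langmuir interaction."""
    k_obs = k_a * conc + k_d
    r_eq = r_max * k_a * conc / k_obs  # equals r_max * C / (K_D + C)
    return r_eq * (1.0 - math.exp(-k_obs * t))


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


def rate_constants_from_kobs(concs, k_obs_values):
    """Return (k_a, k_d, K_D) from the linear relation k_obs = k_a*C + k_d."""
    k_a, k_d = fit_line(concs, k_obs_values)
    return k_a, k_d, k_d / k_a
```

With concentrations in M and k_obs in s^-1, ka is obtained in M^-1 s^-1, kd in s^-1 and KD = kd/ka in M.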