The C-Terminal Domain of the Arabinosyltransferase Mycobacterium tuberculosis EmbC Is a Lectin-Like Carbohydrate Binding Module

The d-arabinan-containing polymers arabinogalactan (AG) and lipoarabinomannan (LAM) are essential components of the unique cell envelope of the pathogen Mycobacterium tuberculosis. Biosynthesis of AG and LAM involves a series of membrane-embedded arabinofuranosyl (Araf) transferases whose structures are largely uncharacterised, despite the fact that several of them are pharmacological targets of ethambutol, a frontline drug in tuberculosis therapy. Herein, we present the crystal structure of the C-terminal hydrophilic domain of the ethambutol-sensitive Araf transferase M. tuberculosis EmbC, which is essential for LAM synthesis. The structure of the C-terminal domain of EmbC (EmbCCT) encompasses two sub-domains of different folds, of which subdomain II shows distinct similarity to lectin-like carbohydrate-binding modules (CBM). Co-crystallisation with a cell wall-derived di-arabinoside acceptor analogue and structural comparison with ligand-bound CBMs suggest that EmbCCT contains two separate carbohydrate binding sites, associated with subdomains I and II, respectively. Single-residue substitution of conserved tryptophan residues (Trp868, Trp985) at these respective sites inhibited EmbC-catalysed extension of LAM. The same substitutions differentially abrogated binding of di- and penta-arabinofuranoside acceptor analogues to EmbCCT, linking the loss of activity to compromised acceptor substrate binding, indicating the presence of two separate carbohydrate binding sites, and demonstrating that subdomain II indeed functions as a carbohydrate-binding module. This work provides the first step towards unravelling the structure and function of a GT-C-type glycosyltransferase that is essential in M. tuberculosis.


Introduction
Tuberculosis (TB) affects large parts of the world's population, particularly in developing countries [1]. The antibiotics isoniazid (INH) and ethambutol (EMB) [2] have been used for decades as frontline drugs to treat infections with Mycobacterium tuberculosis, the causative agent of TB, but the rise of multi-drug resistant (MDR) and extensively drug resistant (XDR) strains poses a serious threat to present treatment options [3]. Both INH and EMB inhibit the synthesis of essential components of the mycobacterial cell wall. This unique and highly impermeable barrier surrounds a single phospholipid bilayer membrane and is composed of an outer segment of solvent-extractable lipids, glycans and proteins, and a covalently linked inner segment, known as the mycolyl-arabinogalactan-peptidoglycan (mAGP) core [4]. Perturbations to the mAGP core tend to undermine viability of M. tuberculosis, a major reason why mAGP biosynthesis constitutes an attractive target for drug design efforts. The mycobacterial cell wall also encompasses various membrane-anchored lipoglycans, a group that includes lipoarabinomannan (LAM), which plays a key role in modulating the host immune response [5]. The arabinogalactan (AG) segment of the mAGP core and LAM both contain D-arabinan polymer, composed of α(1→5), α(1→3) and β(1→2)-linked arabinofuranosyl (Araf) residues that are assembled in distinct structural motifs (Fig. 1A) [4,5].
Owing to their hydrophobic nature, generating recombinant Emb proteins in soluble form has proved difficult, hampering in vitro characterisation. As a result, the function of the Emb enzymes has been delineated by genetics, phenotypic analysis of the cell envelope and cell-free assays. Single gene deletions of embC or embB in M. tuberculosis are lethal [18,19], but corresponding knock-outs in Mycobacterium smegmatis or Corynebacterium glutamicum yield viable, albeit slow-growing mutants, whose cell wall defects can be analysed [8,9]. Following attachment of the initial Araf residue to the linear galactan polymer [Galf-β(1→5)Galf-β(1→6)]n, catalysed by the Araf-transferase AftA [12], EmbA and EmbB extend the arabinan chain in AG synthesis, transferring Araf residues from DPA to polysaccharide acceptors [8,9]. Although highly similar in amino acid sequence (~40% identity, see also Supporting Fig. S1), EmbA,B and EmbC have differential roles: the ΔembA,B deletions inhibit AG synthesis, but leave LAM synthesis intact, whereas the ΔembC deletion only affects LAM synthesis. Chimaeric forms of the Emb enzymes, where the hydrophilic C-terminal domain of EmbC was swapped for that of EmbB, led to a hybrid-LAM, bearing an AG-specific, branched Araf6 group instead of the characteristic LAM-specific linear Araf4 [9]. These data indicated that the hydrophilic C-terminal domain makes a critical contribution to determining the structure of the resulting AG or LAM segments.
To date, the Emb enzymes have remained poorly characterised in structural terms, despite their central significance as targets of the TB antibiotic EMB and their link to drug resistance [20]. Herein, we present the crystal structure of the C-terminal hydrophilic domain of M. tuberculosis EmbC (residues 719-1094, henceforth EmbCCT), as a first step towards the elucidation of the 3D structure of the full-length enzyme.

Structure determination and domain architecture
EmbCCT crystallised in space group P6₅22 over a diverse range of reservoir conditions, with one molecule in the crystallographic asymmetric unit. Crystals were generated with or without an Araf acceptor analogue (see below) present in the crystallisation droplet. The experimental density, phased by multi-wavelength anomalous dispersion (2.7 Å, Table 1), was of very good quality (Fig. S2A), defining the structure for residues 735-1067, except for two disordered loops (795-824 and 1016-1037, Fig. 2A). EmbCCT is composed of two distinct subdomains, separated by a deep crevice marked by the disordered loops (residues 795-824 and 1016-1037). Subdomain I, which encompasses residues 746-760 and 967-1067, displays a mixed α/β structure, with a 5-stranded β-sheet forming a semi-barrel (Fig. 2A). The long H6-S13 loop, which forms a minor crystal packing interface, protrudes from the core of subdomain I with a helical half-turn at its tip (Fig. 2A). Subdomain II (residues 761-966) forms an anti-parallel β-sandwich structure, of which the 'outer' sheet (S2, S4, S10, S6, S7) faces solvent while the 'inner' sheet (S3, S11, S5, S9, S8) packs against the core of the domain (Fig. 2A). The β-sandwich of subdomain II assumes a jellyroll fold (Fig. 2B), a fold typical for polysaccharide binding units in plant lectins and carbohydrate active enzymes [21]. Although not part of the formal jellyroll description, strands S2 and S8 extend the 'outer' and 'inner' sheet, respectively, while helix H4 forms a boundary to the 'outer' sheet. A high-density peak (14σ, anomalous density difference map, Fig. 3A) is embedded between loops S3-S4 and S10-S11. Quasi-octahedral coordination geometry and the distribution of peak-ligand distances from 2.40 to 2.63 Å (Fig. 3A) suggest a bound Ca2+ ion [22]. The metal ion appears shielded from solvent, although including 10 mM EDTA in the cryoprotectant buffer significantly diminished the height of the density peak (Fig. S2B).
Substitution by serine of Asp949, the only side chain in direct contact with the Ca2+ ion (2.6 Å, bidentate, Fig. 3A), resulted in very poor recombinant expression of EmbCCT compared to wild-type and other point mutants probed in this study (see below). Together these observations suggest that the Ca2+ ion is important for the structural integrity of EmbCCT.

Structural neighbours
The fold of subdomain II is consistent with the proposed role of EmbCCT as an acceptor saccharide recognition module. The comparison with structural homologues, identified via distance matrix alignment using the DALI program (http://ekhidna.biocenter.helsinki.fi/dali_server/, [23]), reinforces this notion. The vast majority of PDB entries retrieved by DALI (over 300 entries above the default significance threshold of Z = 2) match the β-sandwich fold of subdomain II and represent 'carbohydrate binding modules' (CBM), structural domains that confer carbohydrate-binding specificity, but that lack intrinsic catalytic activity [21]. CBMs occur frequently as a part of glycoside hydrolase enzymes and fall into (to date) 61 distinct CBM families (http://www.cazy.org/). While none of the structural homologues is particularly close to subdomain II (Z-scores ≤ 6.9, root mean square deviation (RMSD) ≥ 3.0 Å), the top 10 hits include the calcium-containing CBM families 6 and 36 (Fig. S3A-C). Interestingly, in the DALI-generated superposition of EmbCCT with Paenibacillus polymyxa endo-1,4-β-xylanase (PDB entry 1UX7, CBM 36), the Ca2+ sites match to within 0.9 Å, and in the latter, the Ca2+ ion makes direct contact with the bound xylobiose ligand (Fig. S3A). In contrast, only three hits were obtained for subdomain I, of which only the best (PDB entry 2ZAG, Z = 3.0, RMSD 3.4 Å for 66 Cα pairs) showed weak similarity in terms of secondary structure topology in a limited region of overlap (Fig. S4). This PDB entry describes the hydrophilic C-terminal domain of oligosaccharyltransferase STT3 from Pyrococcus furiosus [24], a membrane-embedded glycosyltransferase of the GT-C superfamily that catalyses transfer of glycosyl groups from a lipid donor to Asn-glycosylation sites of the acceptor protein.

Author Summary
Tuberculosis (TB), an infectious disease caused by the bacillus Mycobacterium tuberculosis, burdens large swaths of the world population. Treatment of active TB typically requires administration of an antibiotic cocktail over several months that includes the drug ethambutol. This front-line compound inhibits a set of arabinosyltransferase enzymes, called EmbA, EmbB and EmbC, which are critical for the synthesis of arabinan, a vital polysaccharide in the pathogen's unique cell envelope. How precisely ethambutol inhibits arabinosyltransferase activity is not clear, in part because structural information on its pharmacological targets has been elusive. Here, we report the high-resolution structure of the C-terminal domain of the ethambutol target EmbC, a 390-amino acid fragment responsible for acceptor substrate recognition. Combining the X-ray crystallographic analysis with structural comparisons, site-directed mutagenesis, activity and ligand binding assays, we identified two regions in the C-terminal domain of EmbC that are capable of binding acceptor substrate mimics and are critical for activity of the full-length enzyme. Our results begin to define structure-function relationships in a family of structurally uncharacterised membrane-embedded glycosyltransferases, which are an important target for tuberculosis therapy.
ed three prominent interaction surfaces burying 390 Å², 670 Å² and 1100 Å² of solvent accessible surface (SAS) per monomer, respectively (Fig. S5). We probed self-assembly of EmbCCT by sedimentation velocity at three different protein concentrations (Fig. 4A). The distribution C(S) of the sedimentation coefficient S indicates a dynamic equilibrium between three different molecular species at 3.1S, 4.6S and 7S, which correspond to apparent molecular weights of 46.5 kDa, 75.8 kDa and 138.0 kDa, respectively, compared to the calculated monomer mass of 39.9 kDa. Bearing in mind that under- or overestimates of apparent masses can occur as a result of fitting a single frictional coefficient for an ensemble of species with different frictional ratios, the dominant peak at 4.6S most likely represents a dimer. The higher molecular weight peak at 7.6S could be a trimer or tetramer, and strongly suggests that more than one of the crystal packing interfaces is able to mediate oligomerisation of EmbCCT in vitro.
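The oligomer assignments above follow from dividing each apparent mass by the calculated monomer mass. The short sketch below only repeats that arithmetic with the values quoted in the text; the peak labels and function name are illustrative, and the caveat about fitting a single frictional coefficient still applies to the apparent masses themselves.

```python
# Apparent masses (kDa) from the sedimentation velocity analysis, as quoted in the text.
MONOMER_KDA = 39.9  # calculated monomer mass of EmbCCT

apparent_masses_kda = {
    "slowest peak": 46.5,
    "dominant peak": 75.8,
    "fastest peak": 138.0,
}


def monomer_equivalents(apparent_kda, monomer_kda=MONOMER_KDA):
    """Return the apparent mass expressed in multiples of the monomer mass."""
    return apparent_kda / monomer_kda


ratios = {name: monomer_equivalents(m) for name, m in apparent_masses_kda.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} monomer equivalents")
```

The ratios come out at roughly 1.2, 1.9 and 3.5, which is why the dominant peak is read as a dimer while the fastest-sedimenting species cannot be assigned unambiguously to a trimer or a tetramer.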

Carbohydrate binding
Previous studies had attributed to the C-terminal domain of the Emb proteins a critical role in arabinan chain extension [9,11]. Therefore, we asked whether the isolated domain is able to bind synthetic acceptor analogues. As the physiological substrate is chemically complex and diverse, using synthetic acceptor analogues offered the best chance to obtain an experimental acceptor-bound complex structure. In previous work, our laboratory had chemically synthesised neo-glycolipid acceptors that were modelled on motifs found in mycobacterial AG and LAM. When incubated with [14C]-labelled Araf-donor substrate DPA and isolated mycobacterial membranes in a cell-free Araf transferase assay, these molecules acted as potent acceptor mimics [25]. One of these acceptors was the di-arabinoside α-D-Araf-(1→5)-α-D-Araf-O-(CH2)7CH3 (for short: Ara(1→5)Ara-O-C8, Fig. 4B).
The O-linked octyl tail allowed extraction of the reaction products for qualitative characterisation in vitro. Importantly, the closely related di-arabinoside α-D-Araf-(1→5)-α-D-Araf-O-CH3 (Ara(1→5)Ara-O-C1) exhibited similar levels of acceptor activity, demonstrating that the O-linked octyl was dispensable for activity [25]. By way of intrinsic tryptophan fluorescence, we probed binding of Ara(1→5)Ara-O-C8 to EmbCCT, as well as that of analogous tri- and penta-arabinofuranosides,  (Table 2), while the disaccharide lacking the octyl chain, Ara(1→5)Ara-O-C1, resulted in a Kd of 11.0 mM. These data confirmed that in the solution state the octyl chain is not essential for binding, although it may enhance affinity. Soaking EmbCCT crystals in cryoprotectant solution containing 27 mM Ara(1→5)Ara-O-C8 (~3-fold excess of ligand relative to protein concentration in the crystal) reproducibly resulted in defined ligand density (Fig. 3B), allowing us to unequivocally build one Araf unit and the octyl chain of Ara(1→5)Ara-O-C8, while the second Araf ring remained invisible, even when contouring the map at near-noise level. Soaking experiments using the other acceptor analogues, for which solution binding was examined, failed to reveal electron density for the ligand. The soaked di-arabinofuranoside ligand is positioned between two symmetry-related copies of EmbCCT, forming non-covalent contacts only with residues in subdomain I, but not with the CBM-like subdomain II, in contrast to our expectation. The Araf moiety packs against helix H6 and the H6-S13 loop (Fig. 2), forming three direct H-bond contacts with protein: O2 binds to , and O3 to Nδ2 of Asn740′ (primed residues indicating the symmetry mate). In contrast, the octyl chain binds between helix H0 and the S13-S14 loop of the symmetry mate (Fig. 3B).
Ligand binding promotes ordering of the N-terminus of helix H0, where 3 additional residues become visible compared to apo, and induces a conformational shift of aspartate residues 1051 and 1052 in the S13-S14 loop (Fig. S6). While this crystallographic complex structure did not reveal binding to the CBM-like subdomain II, it is possible that crystal lattice formation of EmbCCT interferes with binding at a site on subdomain II. We, therefore, asked whether the structural superimposition with saccharide-bound CBM domains could be exploited to predict potential additional binding sites. We note that ligand binding modes and substrate specificity of CBM domains can differ even within the same CBM family [21,26]. Thus, structural alignments of the protein scaffolds are unlikely to accurately predict the precise modes of binding and potential specificity-determining interactions. Nevertheless, superimposing carbohydrate-bound structures of CBM domains with the 10 highest DALI Z-scores (with respect to the non-redundant PDB90 subset) shows two clusters of putative ligand binding sites in subdomain II (Fig. 3C): (1)

Mutagenesis and activity in full-length EmbC
The crystallographic complex of EmbCCT bound to Ara(1→5)Ara-O-C8 and the structural superposition with carbohydrate-bound homologues had indicated two distinct regions in EmbCCT as potential sites for carbohydrate binding (Fig. S7A). In order to probe the relevance of these two sites, we asked whether replacement of endogenous EmbC with recombinant EmbC carrying appropriate point mutations would alter the cell wall composition of M. smegmatis. Aromatic residues frequently mediate binding of carbohydrate ligands to CBMs [21]. Given the H-bond contacts between Trp985 and Ara(1→5)Ara-O-C8 in subdomain I, and the central position of Trp868 of the 'outer' (solvent-exposed) β-sheet of subdomain II (Fig. 3C and Fig. S7A), we probed these two residues in the first instance.
Using a phage-mediated transduction method for allelic exchange [27], we generated an EmbC-deficient strain of M. smegmatis (M. smegmatis ΔembC), which was complemented with plasmids encoding either wild-type (full-length) M. tuberculosis EmbC or mutant forms thereof. In accordance with previously reported data [9], our M. smegmatis ΔembC strain retains lipomannan (LM) synthesis, but is deficient in LAM (Fig. 4C, lane 2). The abrogation of LAM biosynthesis can be directly attributed to the loss of EmbC, which is involved in the early steps of α(1→5)-Araf arabinan elongation of LM, the immediate LAM precursor (Fig. 1A) [9]. We utilised this phenotype by analysing LM/LAM resulting from complementation of M. smegmatis ΔembC with plasmid pVV16-Mt-embC, encoding full-length M. tuberculosis EmbC, and plasmids pVV16-Mt-embC W868A or pVV16-Mt-embC W985A, which encode point mutants W868A and W985A of full-length M. tuberculosis EmbC, respectively. Complementation with wild-type EmbC largely restored the normal phenotype (Fig. 4C, lane 3), whereas complementation with the point mutants failed to re-establish LAM synthesis (Fig. 4C, lanes 4, 5). We verified by Western blot that loss of LAM synthesis was not due to failure of the plasmid-encoded protein to incorporate into the membrane of M. smegmatis ΔembC (Supporting Fig. S7B). These results suggest that the structural perturbations caused by the individual single-site mutations are sufficient to disrupt the function of EmbC.

Differential acceptor binding of EmbCCT mutants
In order to establish whether loss of activity was linked to compromised acceptor binding, we introduced the single-residue mutations W868A or W985A into expression plasmids encoding EmbCCT. In addition, we prepared analogous expression plasmid constructs bearing mutations on Asn740 (to Ala, binding site subdomain I), Gln899 (to Ser) and His911 (to Ala, binding site subdomain II) and Asp949 (to Ser, Ca2+ binding site, see Supporting Fig. S7A). Two constructs (Q899S, D949S) did not express well enough to yield protein suitable for in vitro assays. For those proteins that were produced successfully, proper folding was verified by far-UV circular dichroism spectroscopy (Supporting Fig. S7C). When comparing binding of the di- and penta-arabinoside acceptor analogues (Fig. 4B and Fig. 5) that both carry the O-linked octyl tail, it was striking that the substitutions W868A and W985A affected binding of these ligands in a differential fashion. While the W985A mutation virtually abrogated binding of the disaccharide Ara(1→5)Ara-O-C8, the W868A substitution preserved binding of this particular ligand, with only a modestly higher Kd (Table 2, Fig. 5A). In contrast, binding of the penta-arabinoside Ara(1→5)4Ara-O-C8 was insensitive to the W985A mutation, but completely inhibited in response to the W868A mutation. Likewise, mutating Asn740 to Ala weakened binding of the disaccharide (Table 2), consistent with its position within H-bond distance of the ordered Araf in subdomain I, whereas the distant H911A mutation in subdomain II had no effect on this ligand. Thus, the differential effect of mutations in the putative binding sites in subdomains I and II on binding of acceptor analogues that differ only in length strongly suggests that these bind preferentially to distinct sites on EmbCCT.

Discussion
Polyprenyl-dependent glycosyltransferases of superfamily GT-C are still awaiting the determination of a structure of an intact, full-length enzyme, but structures of individual hydrophilic domains have begun to emerge [24] (see also PDB entry 3BYW). As a first step towards the complete structural characterisation of the Emb Araf-transferases in M. tuberculosis, we have determined the crystal structure of the hydrophilic C-terminal domain of EmbC, the enzyme responsible for arabinan chain elongation in LAM synthesis and a target for the front-line antibiotic EMB [5]. We found that the architecture of this domain comprises two subdomains, one of which folds as a lectin- or CBM-like domain, while the other shows weak similarity to the C-terminal hydrophilic domain of an unrelated GT-C glycosyltransferase, oligosaccharyl transferase STT3 [24]. The match between subdomain I and the so-called CC region of STT3 is poor (Fig. S4), and is limited to core secondary structure elements. Nevertheless, the DALI-derived superposition aligns the second Trp in STT3's highly conserved WWDYG motif with EmbC's Trp985, a side chain we showed is critical for enzymatic activity. Thus the alignment lends additional support to the notion of Trp985 sitting at a critical junction of the C-terminal domain of EmbC.
Sequence comparison of the Emb C-terminal domains (Fig. S1) strongly suggests that the disulfide bond Cys749-Cys993 is a conserved structural feature. Forming a topologically intuitive demarcation of this domain, this covalent link presumably enhances the stability of the C-terminal domain at physiological conditions in the host. The disordered loops (residues 794-825, 1016-1037) encompass regions of high sequence diversity as opposed to otherwise remarkably conserved regions of the structure. Given the latter, one could speculate that these disordered regions are linked to acceptor discrimination, and/or that ordering might be induced by contacts with adjacent structural elements in the context of the full-length enzyme.
Figure 3 (partial caption): with carbohydrate-bound structural homologues. Ligands (shown as stick models) were drawn according to the DALI alignment with the Cα traces of structural neighbours. Ligand structures shown in this diagram encompass PDB entries 1ux7, 1w9t, 1o8s, 1w9w, 1uy2, 1od3, 2vzq, 2w47, 2w87, 2cdp, 2cdo, 1uyy, 1uy0, representing the top 10 matches of the DALI search against the PDB90 subset (chains that are less than 90% identical in sequence to each other; Z-scores 6.9-6.3, RMSD 3.0-3.6 Å). doi:10.1371/journal.ppat.1001299.g003
It has previously been proposed that the Emb enzymes may function as dimers, possibly in the combination EmbA/EmbB and EmbC/EmbC [11,28]. Our sedimentation velocity data now provide supporting evidence for self-assembly of EmbC, although we cannot rule out that the observed oligomerisation occurs solely as a result of separating EmbCCT from the rest of the protein. However, the presence of dimers and trimers (or tetramers) (Fig. 4A) in solution demonstrated that at least two of the observed crystal packing interfaces were able to mediate self-assembly of EmbCCT. While the most extended packing interface (SAS buried 1100 Å²) is mediated by structural elements (helices H0 and H6) that are close to the truncation site, the second-largest interface (SAS buried 670 Å²) is mediated by strand S2, and is distant from the truncation site. Indeed, the latter self-assembly interface generates a continuous β-sheet that extends across the monomer-monomer boundary (Fig. S5C), hinting that it could be preserved in the full-length enzyme.
The presence of a CBM-like subdomain in EmbCCT is consistent with the proposed role of the C-terminal domain in acceptor substrate recognition [10,11]. Among these structurally diverse carbohydrate binding modules, the β-sandwich fold seen in EmbCCT is most common [21]. The differential response of the ligands of different length to the Trp mutations in subdomains I and II provides compelling evidence for the presence of two separate ligand binding sites in EmbCCT. This response also links the loss of Araf transferase activity in the Trp mutants to compromised acceptor binding. Although we were not successful in crystallising a complex structure that directly demonstrates binding of an acceptor analogue to the CBM-like subdomain II, the dramatic loss of binding affinity of the penta-arabinoside acceptor for the mutant EmbCCT(W868A) (Fig. 5B, Table 2) and the corresponding loss of LAM synthesis are strong indications that subdomain II indeed functions as a carbohydrate binding module. We note that the W868A mutation also has a modest effect on binding of Ara(1→5)Ara-O-C8 (~2.5-fold increase in Kd, Table 2), despite the obvious preference of this ligand for binding to subdomain I, as shown by the structure and the response to the W985A mutation. This observation could indicate that Ara(1→5)Ara-O-C8 also associates with the CBM-like subdomain II, albeit with considerably lower affinity. The converse may be true for the penta-saccharide as well, although the affinities we measured show no corresponding signature. Comparison of the affinities for binding of the tri- and penta-saccharide to wild-type EmbCCT clearly indicates that binding to subdomain II is tighter for longer polysaccharides, as these can be expected to make additional contacts. However, the apparent switch in binding preference from the site in subdomain I to that in subdomain II on going from two to five Araf units is less straightforward to explain.
If, as the structure suggests, only the octyl tail and the first Araf unit were the major determinants of binding to subdomain I, one would expect to see evidence for binding of Ara(1→5)4Ara-O-C8 to subdomain I, that is, a significant change in affinity when mutating Trp985. Thus, while the octyl tail clearly influences binding of the di-saccharide, this appears to be less the case for the tri- and penta-saccharides. This observation is in line with the dispensable nature of the octyl chain when the above ligands are used as acceptor mimics in cell-free Araf transferase assays [25].
Overall, a string of genetic and biochemical evidence consistently indicated that enzymatic activity of the Emb Araf-transferases is associated with loops displayed on the extra-cellular face of the membrane. For instance, the most frequent point mutation present in EMB-resistant clinical isolates of M. tuberculosis concerns residue Met306 in EmbB (= Met300 in EmbC, see Fig. 1) [20], only a few residues downstream of the GT-C-specific, strictly conserved DDX motif in the E2 loop [15]. Berg et al. showed that loop E6 carries a functionally relevant, conserved proline-containing sequence motif [10], consistent with findings in the Emb protein of C. glutamicum [14]. Moreover, a crystal structure of the first extracellular loop of the Emb Araf-transferase of the related organism Corynebacterium diphtheriae has become available very recently (PDB entry 3BYW; Tan K., Hatzos C., Abdullah J., Joachimiak A., unpublished). The domain of the E1 loop displays a β-sandwich fold with similarity to the fold of galectin [29], but is not superimposable on that of subdomain II of EmbCCT. The galectin-like fold again hints at a potential function in carbohydrate binding, perhaps of the sugar moiety of the Araf-donor DPA. In conclusion, the present structure of the C-terminal domain of M. tuberculosis EmbC provides a first cornerstone towards assembling the structure of the full-length enzyme, and allows us to begin probing this essential enzyme in a rational and targeted fashion.

Reagents
Plasmids were propagated during cloning in E. coli Top10 cells (Invitrogen). All restriction enzymes, T4 DNA ligase and Phusion DNA polymerase enzymes were sourced from New England Biolabs. Oligonucleotides were from MWG Biotech Ltd and PCR fragments were purified using the QIAquick gel extraction kit (Qiagen). Plasmid DNA was purified using the QIAprep purification kit (Qiagen).

Solution binding assay by intrinsic tryptophan fluorescence
Intrinsic tryptophan fluorescence (ITF) experiments were carried out using a PTI QuantaMaster 40 spectrofluorimeter, recording data with the FeliX32 software package (PTI, Birmingham, New Jersey, USA). The excitation wavelength was set to 294 nm and the fluorescence emission (F_emission) was recorded between 300 and 400 nm for each ligand aliquot added to a 200 μl solution containing 20 μM EmbCCT in 50 mM KH2PO4 (pH 7.9), 300 mM NaCl. For EmbCCT, the emission maximum (F_emission,max) was at λ = 338 nm, providing a basal F_emission coordinate for the collection of subsequent ITF data. The change in fluorescence emission (ΔF_emission) was calculated by subtracting F_emission (recorded 2 min after each ligand addition) from F_emission,max, and the data were then plotted against ligand concentration, [L] (3 independent experiments). A plot of ΔF_emission at λ = 338 nm vs. [L] was fitted to the saturation binding equation using GraphPad Prism software:
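The equation itself is not reproduced in the text. A common form of a one-site saturation binding model, assumed here purely for illustration, is ΔF = ΔF_max · [L] / (Kd + [L]). The sketch below fits that assumed model in plain Python: for a fixed Kd the best ΔF_max has a closed form, so only Kd is searched (golden-section search on a log scale). The function names, search bounds and synthetic data are hypothetical, and this is not the GraphPad Prism fitting routine used for the reported Kd values.

```python
import math


def one_site(conc, df_max, kd):
    """Assumed one-site saturation model: dF = dF_max * [L] / (Kd + [L])."""
    return df_max * conc / (kd + conc)


def best_df_max(concs, dfs, kd):
    """Least-squares dF_max for a fixed Kd (the model is linear in dF_max)."""
    g = [c / (kd + c) for c in concs]
    return sum(y * gi for y, gi in zip(dfs, g)) / sum(gi * gi for gi in g)


def sse(concs, dfs, kd):
    """Sum of squared residuals at a given Kd, using the best dF_max for that Kd."""
    df_max = best_df_max(concs, dfs, kd)
    return sum((y - one_site(c, df_max, kd)) ** 2 for c, y in zip(concs, dfs))


def fit_one_site(concs, dfs, kd_lo=1e-3, kd_hi=1e3, iters=100):
    """Golden-section search over log10(Kd); returns (dF_max, Kd).

    kd_lo and kd_hi are illustrative bounds in the same units as concs.
    """
    phi = (math.sqrt(5) - 1) / 2
    a, b = math.log10(kd_lo), math.log10(kd_hi)
    c, d = b - phi * (b - a), a + phi * (b - a)
    for _ in range(iters):
        if sse(concs, dfs, 10 ** c) < sse(concs, dfs, 10 ** d):
            b, d = d, c
            c = b - phi * (b - a)
        else:
            a, c = c, d
            d = a + phi * (b - a)
    kd = 10 ** ((a + b) / 2)
    return best_df_max(concs, dfs, kd), kd


# Synthetic, noise-free titration (hypothetical values, arbitrary units).
ligand = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
delta_f = [one_site(c, 100.0, 2.0) for c in ligand]
fitted_df_max, fitted_kd = fit_one_site(ligand, delta_f)
print(f"Kd = {fitted_kd:.3f}, dF_max = {fitted_df_max:.1f}")
```

With real titration data, ΔF at 338 nm from the replicate experiments would take the place of the synthetic values, and the fitted Kd carries the concentration units of the ligand axis.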

Circular dichroism spectroscopy
Far-UV circular dichroism (CD) spectra were recorded at 25 °C using a Jasco J-715 spectropolarimeter and a cell of 0.01 cm path length. Proteins EmbCCT, EmbCCT(N740A), EmbCCT(W868A), EmbCCT(H911A) and EmbCCT(W985A) were dialysed into 50 mM KH2PO4 (pH 7.9), 50 mM NaF to a final concentration of 0.5 mg/ml each. Spectra of 250 μl aliquots of each protein were recorded by measuring ellipticity from 195 to 260 nm, using a bandwidth of 2 nm and a scan speed of 100 nm/min. Spectra were normalised by subtracting the spectrum of buffer alone (baseline).
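The baseline correction described above is a pointwise subtraction on a shared wavelength grid, and the scan settings imply a scan time per spectrum. The sketch below illustrates both steps; the function names and the ellipticity values are hypothetical and do not come from the measured spectra.

```python
def subtract_baseline(sample, buffer):
    """Subtract a buffer spectrum from a sample spectrum.

    Both spectra are lists of (wavelength_nm, ellipticity) pairs and must
    share the same wavelength grid.
    """
    if [w for w, _ in sample] != [w for w, _ in buffer]:
        raise ValueError("sample and buffer spectra must share a wavelength grid")
    return [(w, s - b) for (w, s), (_, b) in zip(sample, buffer)]


def scan_seconds(start_nm=195.0, end_nm=260.0, speed_nm_per_min=100.0):
    """Duration of one scan over the stated range at the stated scan speed."""
    return abs(end_nm - start_nm) / speed_nm_per_min * 60.0


# Hypothetical ellipticity readings (mdeg) at three wavelengths.
protein_spectrum = [(208, -12), (215, -10), (222, -9)]
buffer_spectrum = [(208, -1), (215, 0), (222, 1)]

corrected = subtract_baseline(protein_spectrum, buffer_spectrum)
print(corrected)
print(f"one scan takes about {scan_seconds():.0f} s")
```

At 100 nm/min, the 195-260 nm range corresponds to roughly 39 s per scan.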

Analytical ultracentrifugation
Sedimentation velocity experiments were performed using a Beckman Proteome XL-I analytical ultracentrifuge equipped with absorbance optics. EmbCCT was dialysed into 50 mM KH2PO4 (pH 7.9), 300 mM NaCl, and loaded into cells with two-channel Epon centre pieces and quartz windows. A total of 100 absorbance scans (280 nm) were recorded (40,000 rpm, 4 °C) for each sample, representing the full extent of sedimentation of the sample. Data analysis was performed using the SEDFIT software, fitting a single friction coefficient [40].

Generation of embC-deficient M. smegmatis and complementation plasmids
Approximately 1 kb of upstream and downstream flanking sequences of the embC gene (MSMEG2785) were PCR amplified from M. smegmatis mc²155 genomic DNA using the primer pairs MSEMBCLL/MSEMBCLR and MSEMBCRL/MSEMBCRR, respectively (sequences listed in Supporting Information Table S1). Following restriction digestion of the primer-incorporated AlwNI sites, the PCR fragments were cloned into AlwNI-digested p0004S to yield the knockout plasmid pΔMSMEGEMBC, which was then packaged into the temperature-sensitive mycobacteriophage phAE159 as described previously [27] to yield phasmid DNA of the knockout phage phΔMSMEGEMBC. Generation of high-titre phage particles and specialized transduction were performed as described earlier [27,41]. Deletion of MSMEGEMBC in one hygromycin-resistant transductant was confirmed by Southern blot. For complementation, M. tuberculosis embC was cloned using primer pairs Mt-embC-forward and Mt-embC-reverse (sequences listed in Supporting Information Table S1) and blunt-end ligated into SmaI-digested pUC18. For QuikChange mutagenesis (Stratagene) of pUC18-Mt-embC at the W868A and W985A codons, primer pairs W868A-sense/-antisense and W985A-sense/-antisense (sequences in Supporting Information Table S1, each with 5′-phosphate modifications) were used. The 3301 bp product was extracted from plasmids (pUC18-Mt-embC, pUC18-Mt-embC W868A and pUC18-Mt-embC W985A) digested with NdeI and HindIII, and sub-cloned into the similarly digested mycobacterial shuttle vector pVV16 to yield pVV16-Mt-embC, pVV16-Mt-embC W868A and pVV16-Mt-embC W985A. These plasmids were then used to transform M. smegmatis ΔembC to yield clones resistant to both hygromycin and kanamycin.

Point mutations in recombinant EmbCCT
QuikChange mutagenesis (Stratagene) was carried out using pET23b-Mt-embCCT (generated as described above). Primer pairs used for the codon alterations N740A, W868A, Q899S, H911A and W985A are listed in the Supporting Information Table S1. Mutant plasmids were subsequently transformed individually into E. coli C41 (DE3). Mutant proteins were expressed and purified as described above.

Analysis of lipoglycans
Lipoglycans from M. smegmatis strains were extracted as described previously [42]. Dried cells were resuspended in deionized water and disrupted by sonication (MSE Soniprep 150, 12 µm amplitude, 60 s on, 90 s off for 10 cycles, at 4°C). An equal volume of ethanol was added to the cell suspension and the mixture was refluxed at 68°C, for 12 h intervals, followed by centrifugation and recovery of the supernatant. The C₂H₅OH/H₂O extraction process was repeated five times and the combined supernatants were dried. The dried supernatant was then subjected to hot-phenol treatment by addition of phenol/H₂O (80%, w/w) at 70°C for 1 h, followed by centrifugation; the aqueous phase was dialyzed using a 1500 MWCO membrane (Spectrapore) against deionized water. The retentate was dried, resuspended in water and sequentially digested with α-amylase, DNase, RNase, chymotrypsin and trypsin. The retentate was further dialyzed using a 1500 MWCO membrane (Spectrapore) against deionized water. The eluates were collected, extensively dialyzed against deionized water, concentrated and analyzed by 15% SDS-PAGE using Pro-Q Emerald glycoprotein stain (Invitrogen).

Accession numbers
The accession number for the coordinates and structure factors of the C-terminal domain of EmbC in the Protein Data Bank (http://www.rcsb.org) is 3PTY.

Figure S1. Sequence alignment of EmbC CT. CLUSTALW2-aligned sequences of the C-terminal domain of EmbC (residues 719-1094) and related Emb enzymes. Species names are abbreviated as Mt = M. tuberculosis, Ms = M. smegmatis, Cg = C. glutamicum. The sequence alignment was formatted using ESPript (espript.ibcp.fr, reference [43]).

Figure S2. Experimental electron density and Ca²⁺ site. A) Solvent-flattened electron density map, contoured at 1.2 σ, calculated based on the seleno-methionine substructure, and superimposed over the final refined model of EmbC CT (yellow sticks). The region shown is the S10-S11 loop with the Ca²⁺ binding site. B) Comparison of σA-weighted Fo−Fc density (contour level 4.5 σ) without EDTA (green) and with 10 mM EDTA (purple) in the cryo-buffer. Density was calculated with phases and calculated amplitudes of a protein-only coordinate set. The height of the Ca²⁺ peak is 21 σ (no EDTA) and 7 σ (10 mM EDTA), respectively, while the height of the nearby phosphate peak is ~7.5 σ in both maps. Found at: doi:10.1371/journal.ppat.1001299.s002 (3.06 MB TIF)

Figure S3. Comparison of subdomain II of EmbC CT with structural neighbours. EmbC CT (blue strands, green helices) superimposed over structural neighbours (yellow ribbons) identified by DALI (reference [23]). A) Carbohydrate binding module (CBM) of Paenibacillus polymyxa endo-1,4-β-xylanase (CBM family 36) in complex with β-D-xylopyranose trisaccharide (yellow sticks, 1UX7, reference [44]). B) CBM family 6: Cellvibrio mixtus cellulase B bound to a β-D-glucose trisaccharide (red sticks, 1UYY, reference [26]). C) CBM family 6: Bacillus halodurans BH0236 bound to xylobiose (red sticks, 1W9T, reference [45]). Bound Ca²⁺ ions are shown as spheres in green and magenta for EmbC CT and the superimposed CBM, respectively. The side chain of Trp868 in the 'outer' β-sheet of EmbC CT is shown in grey sticks. Found at: doi:10.1371/journal.ppat.1001299.s003 (1.97 MB TIF)

Figure S4. Superposition of EmbC CT with Pyrococcus furiosus STT3's C-terminal domain. Superposition of EmbC CT with the 'central core' domain of the C-terminal hydrophilic domain of oligosaccharyltransferase Pyrococcus furiosus STT3 (yellow ribbon, reference [24]) calculated using DALI. Secondary structure elements of EmbC CT with matches in STT3 are labelled in accordance with Figs. 2 and S1. Side chains of the catalytic WWDYG motif in STT3 and of the corresponding tryptophan residue in EmbC CT (Trp985) are shown in blue and red sticks, respectively. The view in panel B is rotated by 90° about the vertical axis relative to panel A, and restricted to subdomain I (residues 735-759, 968-1067). Found at: doi:10.1371/journal.ppat.1001299.s004 (1.19 MB TIF)

Figure S5. Major packing interfaces of the EmbC CT crystal lattice. A) Arrangement of 3 copies of EmbC CT on the crystal lattice around the two major packing interfaces, burying 1100 Å² (green-magenta) and 670 Å² (green-gray) of solvent-accessible surface (SAS) per monomer. B) The helix H0-mediated packing interface burying 1100 Å² SAS per monomer. C) The strand S2-mediated packing interface (670 Å² SAS buried per monomer) demonstrating β-sheet formation across the interface.

A) Ribbon diagram of EmbC CT, with subdomains I and II shown with orange and blue β-strands, respectively. The Ara(1→5)Ara-O-C8 ligand (and one of its symmetry-related copies) are shown in grey sticks. The semi-transparent sticks show a β-D-Gal hexamer from the structural superposition of EmbC CT with the family 6 CBM of β-agarase (PDB entry 2CDO, reference [46]). Mutated residues are indicated with their sequence numbers. B) Plasmids pVV16 encoding full-length EmbC, or point mutants thereof, were transformed into an embC-deficient M. smegmatis. Cell homogenates were separated into membrane (M) and cytosolic (C) fractions, and probed with an anti-His6 antibody (Roche). The lanes are as follows: 1, pVV16 (empty vector); 2, pVV16-Mt-embC; 3, pVV16-Mt-embC W868A; 4, pVV16-Mt-embC W985A. C) Far-UV circular dichroism spectra of recombinant EmbC CT (wild-type and point mutants). Found at: doi:10.1371/journal.ppat.1001299.s007 (1.28 MB TIF)