Evolution-guided adaptation of an adenylation domain substrate specificity to an unusual amino acid

Adenylation domains CcbC and LmbC control the specific incorporation of amino acid precursors in the biosynthesis of the lincosamide antibiotics celesticetin and lincomycin. Both proteins originate from a common L-proline-specific ancestor, but LmbC has evolved to use an unusual substrate, (2S,4R)-4-propyl-proline (PPL). Using site-directed mutagenesis of the LmbC substrate binding pocket and an ATP-[32P]PPi exchange assay, we identified three residues, G308, A207 and L246, as crucial for PPL activation. Together, they presumably form a channel of the proper size, shape and hydrophobicity to accommodate the propyl side chain of PPL. We then experimentally simulated the molecular evolution leading from an L-proline-specific substrate binding pocket to the PPL-specific LmbC. Changing only three amino acid residues in the originally strictly L-proline-specific CcbC switched its substrate specificity to prefer PPL and even synthetic alkyl-L-proline derivatives with longer side chains. This is the first comparative study to provide evidence of an evolutionarily relevant adaptation of an adenylation domain substrate binding pocket to a new, sterically different substrate through a few point mutations. The rearrangement of the substrate binding pocket simulated here appears to be a general principle of the de novo emergence of unusual substrate specificities in adenylation domains. However, to maintain the natural overall catalytic efficiency of the enzyme, natural evolution would probably involve a more comprehensive rearrangement of the whole protein.


Introduction
Lincosamides are a small but clinically important group of antibiotics, of which only two compounds, lincomycin and celesticetin (Fig 1A), have a characterised biosynthetic gene cluster; they are produced by Streptomyces lincolnensis and Streptomyces caelestis, respectively. The crucial step of lincosamide biosynthesis is the condensation of amino sugar and amino acid precursors via an amide bond [1]. While the amino sugar precursor of both natural lincosamides is identical, the biosynthetic origin and availability of the incorporated amino acid (green in Fig 1A) differ. The celesticetin precursor, L-proline, is a regular component of the cellular proteinogenic amino acid pool, while the lincomycin precursor is an unusual alkyl-L-proline derivative (APD), (2S,4R)-4-propyl-proline (PPL). The adenylation domain (A-domain) recognises the amino acid substrate and activates its carboxyl group by attaching adenosine monophosphate [3]. The activated amino acid precursor is subsequently attached to the carrier protein [7] and condensed with the activated amino sugar precursor [1]. In lincomycin biosynthesis, the PPL precursor is specifically recognised and activated by the A-domain LmbC, while the homologous protein CcbC from celesticetin biosynthesis is strictly L-proline-specific [3]. The substrate specificity of the A-domain thus determines which amino acid is incorporated into the resulting lincosamide. Phylogenetic analysis of CcbC and LmbC revealed that both belong to the subfamily of stand-alone L-proline-specific A-domains. Their sequence identity to these L-proline-specific A-domains from the biosynthesis of various natural products ranges from 33.0 to 39.7% [3]. By contrast, the mutual CcbC/LmbC identity of 55.7% [7,8] significantly exceeds this level, suggesting that they evolved directly from a common L-proline-specific ancestor. This makes the pair a suitable experimental model for studying the molecular evolution of A-domain substrate specificity.
The substrate specificity of an A-domain is determined by a "nonribosomal code" consisting of 10 amino acid residues that form the substrate binding pocket (SBP). Two SBP residues (lysine and glutamate), which interact with the carboxy- and amino-groups of the substrate, respectively, are conserved in all amino acid-activating A-domains. The remaining eight variable residues are thought to determine substrate specificity [9][10][11]. The nonribosomal code of LmbC differs from that of CcbC in five of the eight variable residues (Fig 1B), likely as a result of its adaptation to the unusual PPL precursor. Homology models of the LmbC and CcbC SBPs with PPL and L-proline, respectively, show that these differences in the nonribosomal codes probably result in differences in the overall size, shape and hydrophobicity of the two SBPs (Fig 1C) [3]. The modelled CcbC SBP has a smaller cavity, in which the substrate contacts only three variable residues of the nonribosomal code: V202, A274 and V306. This binding site thus appears too small to accommodate the alkyl side chain of PPL. In contrast, a hydrophobic channel accommodating the alkyl side chain of PPL has been predicted in the homology model of the LmbC SBP [3].
A-domains that activate proteinogenic or, even more often, unusual amino acids are an indispensable part of the biosynthesis of a large portion of existing natural compounds. Here, we used a unique system of two functionally characterised and evolutionarily closely related stand-alone A-domains, LmbC and CcbC, to simulate the molecular evolution of an L-proline-specific A-domain toward activation of the unusual APD.

Site-directed mutagenesis and construction of expression vectors
Site-directed mutagenesis of lmbC was performed using the vector plmbC3 [3] as a template and the QuikChange Site-Directed Mutagenesis Kit (Stratagene, USA), as described previously for the preparation of LmbC G308V [3]. Site-directed mutagenesis of ccbC was performed analogously: first, the ccbC gene was excised from the pccbC vector via the NdeI and HindIII restriction sites and inserted into the pJAKO cloning vector [12] using the same restriction sites. The resulting pccbC2 plasmid was then used as a template for in vitro site-directed mutagenesis of ccbC. Primers used for site-directed mutagenesis are listed in S1 Table. Multiple mutations were prepared by repeating the site-directed mutagenesis protocol using the already mutated plmbC3 or pccbC2 as a template.
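Building multiple mutants by repeated rounds of mutagenesis can be mirrored at the protein level by applying point mutations, written in the notation used in this study (wild-type residue, position, new residue, e.g. G308V), one at a time and checking each time that the expected wild-type residue is present. The Python sketch below is a hypothetical illustration: the function names and the toy sequence are assumptions, and it operates on protein sequences rather than on the plasmid DNA that was actually mutated. Positions are assumed to be 1-based, as in the residue numbering used here.

```python
from __future__ import annotations

import re

# Point-mutation notation such as "G308V":
# wild-type residue, 1-based position, substituted residue.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(code: str) -> tuple[str, int, str]:
    """Split a mutation code like 'G308V' into ('G', 308, 'V')."""
    match = MUTATION_RE.match(code)
    if match is None:
        raise ValueError(f"not a point mutation: {code!r}")
    wild_type, position, new = match.groups()
    return wild_type, int(position), new


def apply_mutations(sequence: str, codes: list[str]) -> str:
    """Apply point mutations sequentially, each round using the already
    mutated sequence as the template (hypothetical helper)."""
    residues = list(sequence)
    for code in codes:
        wild_type, position, new = parse_mutation(code)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{code}: position outside the sequence")
        current = residues[position - 1]
        if current != wild_type:
            raise ValueError(f"{code}: expected {wild_type}, found {current}")
        residues[position - 1] = new
    return "".join(residues)
```

For example, `apply_mutations("MAGLV", ["G3V", "L4Y"])` returns `"MAVYV"`, whereas a code whose wild-type residue does not match the template raises an error, which catches numbering mistakes before they propagate into later rounds.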
The mutated genes lmbC (excised via NdeI and XhoI restriction sites) and ccbC (excised via NdeI and HindIII restriction sites) were inserted into the expression vectors pET42b and pET28b, respectively. The open reading frames were confirmed by sequencing. The resulting vectors were used for the production of LmbC and CcbC mutant proteins with a C-terminal His8-tag and an N-terminal His6-tag, respectively.

Preparation of the chimeric adenylation domain
Outer portions of the ccbC gene were amplified from the plasmid pccbC2 using the primer pairs CcbC1_for/CcbC1_rev and CcbC2_for/CcbC2_rev (S2 Table). The central part of the lmbC gene, coding for amino acid residues 173 to 315, was amplified from the plasmid plmbC3 using the primers LmbC1_for and LmbC1_rev (S2 Table). The outer and central parts were fused by PCR using the primers CcbC1_for and CcbC2_rev. The chimeric ccbC gene was inserted into the pET28b expression vector via the NdeI and HindIII restriction sites. The resulting plasmid was used for production of the N-terminal His6-tagged protein. The open reading frame of the chimeric gene was confirmed by sequencing.

Heterologous production and purification of proteins
All proteins were heterologously produced in Escherichia coli BL21 (DE3) as described previously [3], at a post-induction temperature of 17 °C for 20 hours. Protein purification was performed according to a previously described method [3]. The CcbC, CcbC mutant and chimeric CcbC proteins were washed on the column with TS-8 buffer containing 50 mM imidazole, and LmbC and the LmbC mutants were washed with TS-8 buffer containing 100 mM imidazole. All proteins were eluted with TS-8 buffer containing 250 mM imidazole. The concentration of purified proteins was determined spectrophotometrically.

Enzyme activity assay
The A-domains were biochemically characterised using the ATP-[32P]PPi exchange assay, which measures the amino acid-dependent exchange of radioactivity from [32P]-labelled PPi into ATP. This standard method has previously been used to characterise other stand-alone A-domains [13][14][15]. The enzyme activity assay was conducted as described previously [3] to ensure comparability of the results. Linearity of the reaction velocity over the 30-minute assay period was confirmed. Negative control reactions were conducted without substrate. The kinetic parameters were determined by non-linear regression using the programme KaleidaGraph 4.5.2.
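To illustrate the kind of non-linear regression used to obtain Km and Vmax, the Python sketch below fits the Michaelis-Menten equation v = Vmax[S]/(Km + [S]) to initial rates by Gauss-Newton least squares. It is an assumed stand-in for, not a reproduction of, the KaleidaGraph fitting procedure; the function name `fit_michaelis_menten`, the starting-value heuristic and the synthetic data in the usage note are hypothetical. The turnover number would then follow as kcat = Vmax/[E].

```python
def fit_michaelis_menten(s, v, iterations=50):
    """Least-squares fit of v = vmax * s / (km + s) by Gauss-Newton.

    s: substrate concentrations; v: initial velocities (same length).
    Returns the fitted (vmax, km)."""
    # Starting guesses: the highest observed rate, and the concentration
    # whose rate is closest to half of that value.
    vmax = max(v)
    km = min(zip(s, v), key=lambda point: abs(point[1] - vmax / 2))[0]
    for _ in range(iterations):
        # Accumulate the 2x2 normal matrix J^T J and the vector J^T r.
        a11 = a12 = a22 = b1 = b2 = 0.0
        for si, vi in zip(s, v):
            denom = km + si
            d_vmax = si / denom             # partial derivative w.r.t. Vmax
            d_km = -vmax * si / denom ** 2  # partial derivative w.r.t. Km
            residual = vi - vmax * si / denom
            a11 += d_vmax * d_vmax
            a12 += d_vmax * d_km
            a22 += d_km * d_km
            b1 += d_vmax * residual
            b2 += d_km * residual
        det = a11 * a22 - a12 * a12
        if det == 0:
            break  # singular normal equations; keep the current estimate
        # Solve the 2x2 system for the parameter update.
        step_vmax = (a22 * b1 - a12 * b2) / det
        step_km = (a11 * b2 - a12 * b1) / det
        vmax += step_vmax
        km += step_km
        if abs(step_vmax) <= 1e-12 * abs(vmax) and abs(step_km) <= 1e-12 * abs(km):
            break  # converged
    return vmax, km
```

For example, noise-free synthetic rates generated with Vmax = 5 and Km = 3 (arbitrary units) at [S] = 1, 2, 4, 8 and 16 are fitted back to those two values. Real exchange-assay data would additionally need replicate measurements and standard errors of the fitted parameters, which this sketch does not compute.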

Results and discussion
LmbC SBP mutagenesis: Detection of residues affecting the affinity for PPL
We assessed the impact of amino acid residues of the LmbC SBP on its preference for PPL over L-proline. Residues of the LmbC nonribosomal code that differ from the corresponding residues of the CcbC nonribosomal code (Fig 1B) were individually replaced by their CcbC counterparts. His-tagged forms of the mutated A-domains were heterologously produced and purified as described in the experimental section, and their activities were determined using the ATP-[32P]PPi exchange assay. The kinetic parameters of LmbC and the LmbC single mutants for PPL and L-proline are summarised in Table 1, and the Michaelis-Menten plots are shown in S1 Fig. The LmbC single mutants can be divided into two groups according to their activity with PPL. The first group includes the mutants LmbC I300L and LmbC V274C, whose kinetic parameters differ only slightly from those of LmbC. Residues at these positions may have been subject to random mutation during the evolution of LmbC, with minimal influence on the final PPL specificity. Variability in one or two residues of the nonribosomal code of related stand-alone proline-specific A-domains has been reported previously [16,17].
The remaining three mutations significantly affected the acceptance of PPL by LmbC. A comparison of the Km values of LmbC G308V and LmbC A207F shows that the affinity of these mutants for PPL was more than 10 times lower than that of LmbC. The LmbC residues G308 and A207, with no side chain and a minimal side chain, respectively, likely contribute to the formation of a channel of the proper shape and size to accommodate the propyl side chain of PPL (Fig 1C, red and orange). Conversely, the function of residue L246, which experimentally had the largest impact on the affinity for PPL (Fig 1C, light blue), cannot be fully explained by the homology models, except for a possible adjustment of the hydrophobicity of the channel in the SBP [3]. However, both the affinity and the catalytic rate constant of LmbC L246Y for PPL were two orders of magnitude lower than those of LmbC. The L246Y mutation is also the only one that decreases the Km value of LmbC for L-proline, by an order of magnitude. We can only speculate that the large planar side chain of tyrosine may stabilise the SBP and make it more compact and suitable for binding L-proline, similar to the CcbC SBP. It is also possible that the corresponding tyrosine residue in CcbC, Y244, interacts with F205, either by π-π stacking or simply through steric effects, to better accommodate L-proline. This is not the case in LmbC L246Y, where A207 (corresponding to F205 of CcbC) cannot similarly constrain the orientation of the artificially introduced Y246. Together, our results suggest that L246 may fulfil an important role in the LmbC SBP, but this role probably cannot be elucidated without crystal structures of the LmbC and CcbC proteins.
Single mutations of the three important residues identified above each reduced the affinity of LmbC for PPL, and their combinations completely abolished PPL activation, confirming the significance of these residues (S3 Table and S1 Fig). The channel that accommodates the propyl side chain was probably completely blocked in the LmbC double and triple mutants. Notably, all of these mutants were still active with L-proline, indicating that the proper protein fold was at least partially preserved.
In summary, these experiments tested the previously designed homology models of the CcbC/LmbC SBPs and revealed three amino acid residues (G308, A207 and L246) in the LmbC SBP that are significant for the affinity of LmbC for PPL. Together, these residues likely contribute to forming a channel of the proper size, shape and hydrophobicity to accommodate the propyl side chain of PPL.
CcbC mutagenesis: Verification of the evolutionary adaptation of the A-domain SBP to accommodate unusual PPL
After identifying the key LmbC residues affecting the acceptance of PPL, we used CcbC to experimentally verify the evolutionary adaptation of L-proline-specific A-domain substrate specificity toward a preference for PPL. Residues of the CcbC SBP at positions corresponding to the three significant LmbC residues were replaced by their LmbC counterparts. All CcbC mutants included a mutation of the essential residue V306, which interferes with the proximal atoms of the substrate's alkyl side chain and sterically hinders its accommodation (Fig 1C, red). This mutation was then combined with the mutations F205A and/or Y244L, which are located deeper in the channel that accommodates the alkyl side chain, resulting in double and triple mutants.
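Transferring mutations between homologues requires mapping residue numbers through a pairwise alignment; here, LmbC A207, L246 and G308 correspond to CcbC F205, Y244 and V306. The Python sketch below shows one way to perform such a mapping from two gapped, aligned sequences. The function name and the toy alignment are hypothetical, and the real correspondence depends on the LmbC/CcbC alignment used by the authors.

```python
from typing import Optional


def map_position(aligned_a: str, aligned_b: str, position_a: int) -> Optional[int]:
    """Return the 1-based residue number in sequence B that is aligned with
    residue `position_a` of sequence A, or None if it faces a gap in B.

    Gaps are written as '-'; both aligned strings must have equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    index_a = index_b = 0
    for char_a, char_b in zip(aligned_a, aligned_b):
        # Advance the residue counters only for non-gap characters.
        if char_a != "-":
            index_a += 1
        if char_b != "-":
            index_b += 1
        if char_a != "-" and index_a == position_a:
            return index_b if char_b != "-" else None
    raise ValueError(f"sequence A has no residue {position_a}")
```

In the toy alignment `MKAGLV` / `--AYLV`, residue 4 of the first sequence (G) maps to residue 2 of the second (Y), while residue 1 faces a gap and maps to `None`. A constant offset, such as the shift of two between the LmbC and CcbC numbering of the three residues above, simply reflects a net two-residue length difference upstream of these positions in the alignment.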
The affinity of all tested CcbC mutants for L-proline and various APDs is summarised in Table 2. Another CcbC double mutant (V306G + Y244L) exhibits modified substrate specificity and is also capable of activating APDs with 2C or 3C side chains. However, the natural substrate of LmbC, PPL, is strongly preferred over L-proline only by the CcbC triple mutant carrying the additional F205A mutation. Based on the homology model, this mutation likely facilitates the accommodation of the distal atoms of the PPL side chain in the SBP channel. Accordingly, the triple mutant also activates the synthetic L-proline derivatives with longer alkyl side chains, (2S,4R)-4-butyl-proline (BuPL) and (2S,4R)-4-pentyl-proline (PePL), with Km values even lower than that for PPL. This decreasing trend of Km values from L-proline to PePL mimics the substrate preference of LmbC [3]. Notably, the CcbC triple mutant retains 99.4% identity with the strictly L-proline-specific CcbC but only 56.2% identity with the PPL-preferring LmbC. In other words, 224 differences (214 substitutions and 10 insertions/deletions) remain between LmbC and the CcbC triple mutant, even though their substrate specificity patterns are identical. Our results show that a modification of the primary structure as minor as these three substitutions in the SBP of an L-proline-specific A-domain is sufficient to simulate the evolutionary adaptation of its substrate specificity to a new, unusual substrate. Site-directed mutagenesis guided by the nonribosomal code has previously been used in several studies to alter A-domain substrate specificity [18][19][20][21][22]. However, none of these studies examined a pair of A-domains as closely related in evolution, yet as divergent in substrate specificity, as CcbC/LmbC.
This is the first comparative study to provide evidence of the evolutionary adaptation of A-domain substrate specificity to a new, sterically different substrate through a few point mutations.
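Percent identity depends on how it is defined. A common convention, assumed here because the alignment program and identity definition are not stated in this section, divides the number of identical aligned positions by the number of alignment columns, counting gap columns as differences. The Python sketch below follows that convention and also splits the differences into substitutions and gap positions, analogous to the 214 + 10 breakdown given above. Note that it counts gap columns, not indel events. The function name and toy sequences are hypothetical.

```python
def compare_alignment(aligned_a: str, aligned_b: str) -> dict:
    """Summarise a pairwise alignment (gaps written as '-').

    Identity = identical columns / columns containing at least one residue.
    Other definitions (e.g. dividing by the shorter sequence length) give
    slightly different values."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = substitutions = gap_columns = columns = 0
    for char_a, char_b in zip(aligned_a, aligned_b):
        if char_a == "-" and char_b == "-":
            continue  # column empty in both sequences; ignore it
        columns += 1
        if char_a == "-" or char_b == "-":
            gap_columns += 1  # one position of an insertion/deletion
        elif char_a == char_b:
            identical += 1
        else:
            substitutions += 1
    if columns == 0:
        raise ValueError("alignment contains no residues")
    return {
        "identity_percent": 100.0 * identical / columns,
        "substitutions": substitutions,
        "gap_columns": gap_columns,
    }
```

As a consistency check on the arithmetic, three substitutions in an ungapped alignment of 500 positions give 497/500 = 99.4% identity, so under this definition the 99.4% figure corresponds to a protein of roughly 500 residues.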
At the molecular level, this evolutionary shift is probably caused by a dramatic rearrangement of the SBP, specifically the formation of a hydrophobic channel that accommodates the alkyl side chain of the substrate, whereas binding of L-proline, which lacks an alkyl side chain, is disfavoured. The formation of an SBP channel accommodating a longer alkyl side chain was recently reported in a comparative study of another pair of related A-domains with different substrate specificities. The incednine A-domain has a shallow SBP in which the bulky residue L220 prevents accommodation of a substrate with a longer side chain. In contrast, the cremimycin A-domain possesses a smaller residue, G220, at the corresponding position, allowing the tunnel to extend over the position of G220 and accommodate the substrate's side chain [23].
In contrast to the shift in substrate specificity, the overall catalytic efficiency of the PPL-activating CcbC double and triple mutants falls far short of that of LmbC. As shown in Table 3, their catalytic rate constant, and thus their overall catalytic efficiency, is significantly lower than that of LmbC. This can be explained at least partially by a nonselective decrease in the overall catalytic efficiency of the CcbC mutants, as the catalytic rate constant for L-proline is also reduced (see Table 3, and S4 Table for other tested CcbC mutants). Such a nonselective decrease in overall catalytic efficiency is, however, a common consequence of multiple artificial changes in natural proteins [18,19,21,24,25,26]. Moreover, unlike substrate specificity, the overall catalytic efficiency is also affected by the amino acid residues neighbouring the SBP and by the entire tertiary structure of the A-domain [18,24]. We suggest that the lower overall catalytic efficiency of the CcbC mutants for PPL may result from incompatibility between the artificially changed residues in the SBP and some of the hundreds of residues outside the SBP that changed during the separate evolution of CcbC and LmbC from their common L-proline-specific ancestor.
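The comparison above separates affinity (Km) from overall catalytic efficiency, conventionally expressed as the specificity constant kcat/Km. The short Python sketch below computes that constant, the preference of one enzyme for substrate A over substrate B, and the fold decrease in a variant relative to a reference enzyme. The class and function names are hypothetical, and the numbers in the usage note are placeholders, not values from Tables 1 to 3.

```python
from dataclasses import dataclass


@dataclass
class Kinetics:
    kcat: float  # turnover number, e.g. in min^-1
    km: float    # Michaelis constant, e.g. in mM (use one unit throughout)

    @property
    def efficiency(self) -> float:
        """Catalytic efficiency (specificity constant) kcat/Km."""
        return self.kcat / self.km


def preference(substrate_a: Kinetics, substrate_b: Kinetics) -> float:
    """How many times more efficiently one enzyme uses substrate A than
    substrate B (ratio of their kcat/Km values)."""
    return substrate_a.efficiency / substrate_b.efficiency


def fold_change(reference: Kinetics, variant: Kinetics) -> float:
    """Fold decrease in kcat/Km of a variant relative to a reference enzyme."""
    return reference.efficiency / variant.efficiency
```

With placeholder values, an enzyme with kcat = 10 and Km = 0.1 for one substrate and kcat = 10 and Km = 10 for another prefers the first 100-fold. Conversely, a variant with Km = 0.5 and kcat = 1 (kcat/Km = 2) is less efficient than a reference with Km = 2 and kcat = 6 (kcat/Km = 3) despite its lower Km. This is why a shift in substrate preference and a loss of overall catalytic efficiency can coexist.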
In addition to their evolutionary significance, studies of this type also have application potential. More than a hundred hybrid lincosamide compounds were recently prepared in vitro by combining enzymatic activities from celesticetin and lincomycin biosynthesis [5]. Those combining the lincomycin-specific PPL precursor with the celesticetin-specific salicylate unit exhibited even higher antibacterial activity than the clinically important lincomycin. Based on the knowledge of salicylate attachment in celesticetin biosynthesis [5,27,28,29], a celesticetin-producing strain with a CcbC genetically engineered to accept PPL or APDs with longer side chains could be used for the mutasynthetic preparation of the most potent lincosamide compounds, including those with significant antimalarial activity [4][5][6]. However, a fully active enzyme is necessary for such practical purposes. Approaches to increase the overall catalytic efficiency should take the entire protein sequence into consideration. These methods resemble recombination, an evolutionary mechanism described for modular NRPS A-domains [30][31][32][33]. Artificial recombination has been successfully used to prepare chimeric proteins from modular NRPS A-domains in hormaomycin biosynthesis [32]. Using this approach, we prepared a soluble chimeric LmbC/CcbC protein; however, it was inactive with both L-proline and PPL.

Evolutionary impact of the lincosamide model in the context of other APD activating A-domains
Adaptation of the L-proline-specific A-domain to use the unusual PPL precursor was an important milestone in the molecular evolution of lincosamide biosynthesis, resulting in the production of the more efficient antibiotic lincomycin. An analogous scenario, i.e. the evolution of metabolites containing an APD moiety instead of L-proline, has occurred several times in nature. APD precursors nearly identical to PPL are incorporated into the anticancer pyrrolo[2,1-c][1,4]benzodiazepines (PBDs; S3 Fig) [34,35] and the bacterial signalling molecule hormaomycin (S3 Fig) [32]. Accordingly, the biosynthetic pathways of all these APD-containing compounds share a nearly identical set of 5-6 enzymes encoded by an APD biosynthetic gene cluster that has spread by horizontal gene transfer [2,36,37,38,39].
In contrast to the common origin of the APD biosynthetic genes, phylogenetic analysis has convincingly shown that the relevant APD-specific A-domains evolved independently from different ancestors in the biosynthesis of PBDs, hormaomycin and lincomycin [3] (updated in S4 Fig). Nevertheless, in all three cases, the APD-specific A-domains arose from L-proline-specific ancestors. We suggest that their adaptation to a new, unusual amino acid substrate occurred by the same molecular mechanism as that of LmbC, i.e. by point mutations in the SBP of an L-proline-specific ancestor. This can be illustrated by the SibD A-domain from the biosynthesis of the PBD sibiromycin. The variable residues of its nonribosomal code (VMFYTALV) differ from the consensus code of related L-proline-specific modular NRPS A-domains (VQ(F/Y)IAHVV) in five residues, at positions 2 and 4 to 7. This resembles the dramatic rearrangement of the SBP in the A-domains from lincosamide biosynthesis.
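The comparison of the SibD code with the proline consensus can be reproduced with a short script that treats a parenthesised group such as (F/Y) as a set of residues allowed at that position. The Python sketch below is a minimal illustration. The parsing convention is an assumption based on the notation quoted above, the function names are hypothetical, and only the eight variable positions are compared, as in the quoted codes.

```python
def parse_code(code: str) -> list:
    """Turn a code such as 'VQ(F/Y)IAHVV' into one set of allowed residues
    per position; parentheses group alternatives separated by '/'."""
    positions = []
    i = 0
    while i < len(code):
        if code[i] == "(":
            end = code.index(")", i)  # raises ValueError if unbalanced
            positions.append(set(code[i + 1:end].split("/")))
            i = end + 1
        else:
            positions.append({code[i]})
            i += 1
    return positions


def differing_positions(query: str, consensus: str) -> list:
    """1-based positions where no query residue is allowed by the consensus."""
    query_sets = parse_code(query)
    consensus_sets = parse_code(consensus)
    if len(query_sets) != len(consensus_sets):
        raise ValueError("codes must have the same number of positions")
    return [
        index
        for index, (q, c) in enumerate(zip(query_sets, consensus_sets), start=1)
        if not q & c  # empty intersection means the residue differs
    ]
```

For SibD, `differing_positions("VMFYTALV", "VQ(F/Y)IAHVV")` returns `[2, 4, 5, 6, 7]`, i.e. five differing positions. Position 3 is not counted because F is one of the residues allowed by the consensus.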
Because compounds incorporating unusual amino acid precursors form a large portion of all natural products, the genesis of the substrate specificity of the corresponding A-domains is a topic of high general significance. Here, we documented this process using a model of the molecular evolution of a pair of stand-alone A-domains. Although the evolutionary mechanism of recombination has been described for the more common A-domains of modular NRPSs [30][31][32][33], this mechanism can only explain the emergence of new combinations of incorporated amino acid units, not the de novo genesis of unusual substrate specificity. The SBP rearrangement presented here thus appears to be a general principle for the molecular evolution of both groups of A-domains.