Enzyme engineering: A synthetic biology approach for more effective library generation and automated high-throughput screening

The Golden Gate strategy entails the use of type IIS restriction enzymes, which cut outside of their recognition sequence. It enables unrestricted design of unique DNA fragments that can be readily and seamlessly recombined. This strategy has been successfully employed in other synthetic biology applications; here, we demonstrate its advantageous use to engineer a biocatalyst. Hot-spots for mutation were identified in three distinct regions of Candida antarctica lipase A (Cal-A), the biocatalyst chosen as a target to demonstrate the versatility of this recombination method. The three corresponding gene segments were subjected to the most appropriate method of mutagenesis (targeted or random). Their straightforward reassembly allowed combining products of different mutagenesis methods in a single round for rapid production of a series of diverse libraries, thus facilitating directed evolution. Screening to improve discrimination of short-chain versus long-chain fatty acid substrates was aided by development of a general, automated method for visual discrimination of the hydrolysis of varied substrates by whole cells.


Introduction
Effective mutagenesis strategies in enzyme engineering are often dependent on the generation of small and targeted, high-quality libraries of mutants [1]. Such 'smart' libraries are consistent with practical constraints imposed by the screening effort: while point-mutant libraries are readily screened, we need creative solutions to improve our capacity to explore the combinatorial complexity of sequence space. Indeed, simultaneous amino acid substitutions may have non-additive or epistatic effects (cooperative or antagonistic) on protein function [1][2][3][4]. To sample complex mutational patterns, efforts are increasingly made to maximize protein sequence diversity while keeping the library size manageable. Strategies include controlling mutational bias through the use of a reduced genetic alphabet (i.e. NNK, NDT, or more [...] in a unidirectional manner [19]. In addition, since the restriction site is absent from the assembled construct, restriction and ligation can be performed in a one-pot fashion.

Library design
Our smart library design considered the reported knowledge on the putative mode of Cal-A substrate binding. The long chain of a C18 fatty acid substrate has been hypothesized to bind in a tunnel where PEG crystallized, and targeted NDT mutagenesis of that region had previously shown some effect on cis-trans substrate selectivity [25]. To explore greater sequence diversity, the entire tunnel region (residues 211-350) was targeted for random mutagenesis and thus constitutes one of the three Cal-A regions we mutated.
Based on further structural analysis (PDB identification code: 2VEO) [26] and proximity to the catalytic triad (within 5 Å from the hydroxyl oxygen of Ser184), residues Tyr93, Tyr183 and Phe431 were selected as potential hot-spots for mutagenesis. Residues Tyr93 and Tyr183 are located in the N-terminal region of Cal-A, which we name 'part 1'. Mutation of the bulky Tyr93, located below the catalytic triad, could modulate substrate access; removal of the bulky Tyr183 appears to open a putative tunnel (confirmed by molecular dynamics simulations; results not shown). Finally, Phe431 is located in 'part 3', corresponding to the Cal-A C-terminal region. Phe431 belongs to the 'small loop', hypothesized to gate substrate entry [26] (motion observed by molecular dynamics simulations; results not shown). We chose NDT degeneracy at these three positions (where N = A, C, G or T, and D = A, G or T; covering 12 codons, 12 amino acids) as it provides good chemical diversity while keeping library size modest [29].
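The codon count quoted for NDT degeneracy can be checked by enumeration. The following is a minimal sketch, not part of the original work: it expands the NDT triplet and translates each codon with the standard genetic code. The function and variable names are illustrative.

```python
from itertools import product

# Standard genetic code, restricted to the 12 codons that NDT can encode.
NDT_TRANSLATION = {
    "AAT": "Asn", "AGT": "Ser", "ATT": "Ile",
    "CAT": "His", "CGT": "Arg", "CTT": "Leu",
    "GAT": "Asp", "GGT": "Gly", "GTT": "Val",
    "TAT": "Tyr", "TGT": "Cys", "TTT": "Phe",
}

def ndt_codons():
    """Enumerate every codon encoded by the degenerate triplet NDT."""
    first = "ACGT"   # N = A, C, G or T
    second = "AGT"   # D = A, G or T (no C)
    third = "T"      # fixed T
    return ["".join(bases) for bases in product(first, second, third)]

codons = ndt_codons()
amino_acids = {NDT_TRANSLATION[codon] for codon in codons}
print(len(codons), "codons ->", len(amino_acids), "distinct amino acids")
print(sorted(amino_acids))
```

Because each NDT codon encodes a different amino acid and no stop codon is included, the library has no codon redundancy. Tyr (TAT) and Phe (TTT) are both in the set, so the wild-type residue at each of the three targeted positions is expected to reappear among the variants.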
To increase the likelihood of modulating Cal-A substrate specificity, we sought potential synergistic effects that required combining libraries of the above mutations [1,30]. This strategy integrates random and focused mutagenesis, proposed to be a hallmark of the most successful mutagenic strategies [3]. Our target residues/areas span the entire Cal-A gene (Tyr93, Tyr183, the 211-350 putative tunnel region, and Phe431) and require different mutagenesis methodologies, and are, therefore, ideally suited to being combined for seamless assembly using the Golden Gate method (Fig 1).

Fig 1. Breakdown of Cal-A into three parts for independent mutagenesis. Shown in cartoon representation with the catalytic triad (Ser184, Asp334, His366; yellow sticks) and key residues (Tyr93, Tyr183 and Phe431; purple sticks). PART 1 (N-terminal region 11-210, in green) comprises the α/β fold and includes Tyr93 and Tyr183. PART 2 (tunnel region from 211-350, in blue) has been hypothesized to bind the substrate [25]. PART 3 (from 351 to C-terminal His-tag, in red) contains the small loop (in orange), with Phe431 that may act as a gate-keeping residue. (PDB identification code: 2VEO) [26] doi:10.1371/journal.pone.0171741.g001
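To illustrate why combining focused libraries quickly raises the screening effort, the sketch below estimates how many clones would need to be screened to sample a given fraction of a combined NDT library. It relies on a commonly used oversampling estimate, N = -V ln(1 - F), which assumes that all V variants are equally likely; this formula and the 95% target are assumptions for illustration and are not taken from the present work.

```python
import math

CODONS_PER_NDT_SITE = 12  # from the NDT definition (4 x 3 x 1 codons)

def clones_for_coverage(n_variants, coverage=0.95):
    """Clones to screen so that the expected fraction of distinct variants
    sampled reaches `coverage`, assuming equiprobable variants."""
    return math.ceil(-n_variants * math.log(1.0 - coverage))

for n_sites in (1, 2, 3):
    library_size = CODONS_PER_NDT_SITE ** n_sites
    print(f"{n_sites} NDT site(s): {library_size} variants, "
          f"~{clones_for_coverage(library_size)} clones for 95% coverage")
```

Under these assumptions, one NDT position calls for a few dozen clones, whereas three combined positions call for several thousand, which is the practical motivation for keeping each library small and focused.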
The three 'parts' of Cal-A were synthesized in a codon-optimized form (S1 Fig) and inserted into the DNA2.0 pM269 mother vector. When ready to be recombined, for example upon library generation, the parts were reassembled into the complete gene in an Electra daughter vector (pD441pelB, DNA2.0) customized to carry the pelB leader sequence for periplasmic expression under control of the T5 promoter (constructs are illustrated in S2 Fig). The pelB signal sequence provides high expression of Cal-A [25]. It is worth noting that it is possible to generate a variety of library combinations: in fact, mutated parts can be recombined with other mutated parts or with the native ones. Each new combination is a library per se, and this ensures maximum combinatorial freedom. This method finds an interesting application in the generation of randomized libraries limited to specific parts of a protein. For instance, in our work, it allowed for the generation of a randomized library of part 2 of Cal-A, while maintaining parts 1 and 3 native. This is advantageous when only specific parts of the sequence are deemed worth targeting with random mutagenesis. In this work, it is not known which residues within part 2 of Cal-A are involved in substrate binding, justifying a random approach. The recombination of this randomized library with native parts 1 and 3 ensured minimal potential disruption in the rest of the protein.
Our assembly strategy used the BsaI and SapI type IIS restriction enzymes, which allowed for the design of customized overhangs. The junctions between parts were designed with a unique BsaI restriction site while two unique SapI restriction sites were designed to ligate the assembled gene into the daughter vector (Fig 2). We note that the parts could equally be ligated into any expression vector containing appropriately designed SapI sites. Furthermore, codon optimization is optional.
The major strength of our method is that the parts can be treated independently for the purposes of library generation, and can be assembled at will to recombine the full gene with any or all parts mutated (primers and conditions are given in S1-S5 Tables). We thus rapidly obtained a variety of complex, mutated libraries, ready to bring forward to screening (Table 1). The final constructs have lost the BsaI and SapI recognition sites. As a consequence, the ligated parts cannot be directly separated by restriction. However, by designing primers with overhangs that reintroduce the BsaI/SapI sites, it is possible to amplify the mutated parts by PCR and mix and match them at any time. Herein rests the flexibility of the system. We note that, as for all cloning strategies, care should be taken to ensure that no unwanted BsaI/SapI restriction sites (or sites for whichever type IIS restriction enzyme is used) are present in the sequence of interest. As discussed above, the diversity included in the ten libraries (Table 1) was generated by mutagenesis using NDT degeneracy for parts 1 and 3 and using error-prone PCR for part 2 (S5 Table).
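The check for unwanted type IIS sites can be automated. The following minimal sketch scans a sequence on both strands for the BsaI (GGTCTC) and SapI (GCTCTTC) recognition sequences. The function names and the example sequence are hypothetical; in practice the same check is available in sequence-design software such as the SnapGene package used in this work.

```python
# Recognition sequences (5'->3') of the two type IIS enzymes used here.
SITES = {"BsaI": "GGTCTC", "SapI": "GCTCTTC"}

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_sites(seq):
    """List (enzyme, strand, 0-based start) for every recognition site."""
    seq = seq.upper()
    hits = []
    for enzyme, site in SITES.items():
        # Neither site is palindromic, so each strand is searched separately.
        for strand, motif in (("+", site), ("-", reverse_complement(site))):
            start = seq.find(motif)
            while start != -1:
                hits.append((enzyme, strand, start))
                start = seq.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[2])

# Hypothetical test sequence containing three sites, for illustration only.
example = "ATGGGTCTCAAAGAGACCTTTGAAGAGCTAA"
for enzyme, strand, start in find_sites(example):
    print(f"{enzyme} site on {strand} strand at position {start}")
```

An empty result for a gene part (outside the deliberately designed junctions) indicates that the part can be used in the assembly without internal cleavage by either enzyme.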
The mutants were assembled according to Fig 2. The PCR steps 1 and 4 ensure a high availability of DNA, maximizing the transformation efficiency. Furthermore, when performing restriction and assembling the parts together and with the vector, the total DNA mass in the reaction is reduced if the mother vectors are not present, minimizing reaction volume and units of restriction enzyme needed. Transformation of the mutant libraries yielded at least 10³ transformants for each library; between 80 and 100% of the colonies contained the desired constructs. In one instance (construction of library Tyr93-Phe431), use of the circularized daughter plasmid afforded no transformants. This issue was resolved by using the commercial, pre-cut daughter plasmid, although the reason for this difference in performance is not clear.
Colonies were picked and propagated individually. The quality of each of the degenerate NDT libraries, whether resulting from mutation of one part or assembly of more than one mutated part, was routinely assessed by sequencing pooled clones. As illustrated in Fig 3, codon degeneracy is clearly visible in the DNA sequencing electropherogram, confirming the expected distribution of nucleotides in place of the original codon. Sequencing of individual clones further confirmed library quality (S6 Table). The quality of the randomly mutated library was assessed by sequencing 20 clones (Table 2). The 20 clones carried a total mutational load of 34, with approximately 25% of the clones being wild-type and the majority of clones carrying two mutations (average = 1.65), for a maximum of 4 mutations per clone.

Fig 2. Facile reassembly of individually mutated gene parts. The Cal-A gene was obtained as three separate parts in DNA2.0 mother vectors. In this method, the parts can be mutated independently as appropriate for each part (Table 1, yellow stars represent illustrative mutations). As a proof of concept to demonstrate the versatility of the method, NDT libraries were generated for parts 1 and 3, and part 2 was randomly mutated. The parts (both mutated and wild-type) were then amplified by PCR and purified (steps 1, 2 and S3 Fig) for assembly into a selection of the possible combinations of mutated parts (see Table 1 for chosen combinations), in a one-pot restriction-ligation reaction using BsaI (3). The library of assembled genes was PCR amplified and gel purified (4, 5 and S4 Fig). Each amplified library was inserted into the daughter vector (6) using SapI in a one-pot restriction-ligation reaction, for transformation into E. coli (7). Note that a simplified version of this strategy is also possible, but was found to work only when applied to the wild-type parts (S1 and S5 Figs).

Screening
The resulting Cal-A libraries expressed in E. coli BL21 (DE3) were screened to evaluate selectivity of hydrolase activity toward long versus short-chain triglycerides. To achieve the required screening throughput, we devised a novel strategy based on a well-established in vivo screening assay for esterases [31,32]. Lipases have been screened in the past using robust in-plate assays involving either tributyrin or olive oil/rhodamine emulsions in agar: plates were inoculated with lipase variants, and hydrolytic activity could be detected upon triglyceride hydrolysis either with the formation of a halo of clearance (in the case of tributyrin) or a fluorescent rhodamine halo upon fatty acid release due to a change in pH [31,33]. Importantly, the aforementioned techniques are well suited to automation and allow for the screening of the lipase towards triglycerides, which are substrates of industrial interest.
Our strategy makes use of a liquid-handler robot to inoculate individual clones onto rectangular agar plates containing the emulsified triglycerides, starting from saturated cultures of the variants. This results in an ordered, compact array of variants exposed to the substrate of interest. Either tributyrin (C4), or olive oil (70% oleic acid, C18) with rhodamine, was emulsified into the agar. The inoculated plates were incubated at 30˚C overnight (16 h) until colonies of manageable size appeared (between 0.2 and 0.5 cm). Active clones gave rise to a clear halo around the colony against the otherwise opaque tributyrin emulsion, or a fluorescent halo in the case of the olive oil and rhodamine emulsion (Fig 4). The use of a saturated bacterial inoculum (overnight growth) provided highly reproducible results on triplicate plates (S2 Fig). This method allowed us to rapidly and reliably identify Cal-A lipase variants that discriminate between short- and long-chain triglyceride substrates (Fig 4). It was possible to test four plates of 96 variants at a time. Pictures of the plates were collected using a gel imager. Analysis of the plates revealed variants able to discriminate between the substrates, whilst the wild-type was active with tributyrin and olive oil without distinction. In the representative example of screened plates presented in Fig 4, variant B6 hydrolyzes olive oil preferentially over tributyrin, while variants D4, D6 and G9 show the inverse selectivity. The majority of the variants do not show any improved selectivity compared to the wild-type Cal-A (e.g. A2, A3, the majority of row B, etc., relative to wild-type A11, starred) or they lose the ability to hydrolyze both of the substrates (A1, A4, A5, and so on).
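The yes/no reading of the two plate types maps onto four phenotype classes. The sketch below shows one way this bookkeeping could be done once halos have been scored by eye. The function name and the class labels are illustrative, and the halo calls listed for wells A11, B6, D4 and A1 are simplified readings of the description of Fig 4 above, not measured data.

```python
def classify(tributyrin_halo, olive_oil_halo):
    """Assign a selectivity class from qualitative (True/False) halo calls."""
    if tributyrin_halo and olive_oil_halo:
        return "non-selective (wild-type-like)"
    if tributyrin_halo:
        return "short-chain selective"
    if olive_oil_halo:
        return "long-chain selective"
    return "inactive"

# Illustrative halo calls: (tributyrin halo, olive oil/rhodamine halo).
observations = {
    "A11": (True, True),    # wild-type control
    "B6": (False, True),
    "D4": (True, False),
    "A1": (False, False),
}

for well, (short_chain, long_chain) in observations.items():
    print(well, "->", classify(short_chain, long_chain))
```

Applied to all wells of a plate, a tally of these classes gives the fraction of active and selective variants per library.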
Our protocol allows for screening automation and augments the throughput of an otherwise classical assay for lipase/esterase activity because it arrays whole cells rather than lysates or purified enzyme. The use of a liquid handler ensures reproducibility, precision, and fast operations, making the assay robust and convenient. The rhodamine version of the method is highly versatile, as oils from diverse sources can be used as substrates for the lipase (e.g. coconut oil, palm oil). A total of 735 clones were screened (Table 1). Among these, 88 clones (12%) showed clear ability to discriminate between long- and short-chain fatty acids whereas the wild-type Cal-A was indiscriminate (Fig 4 and Table 3). When Tyr93 was mutated, 89% of the clones retained activity and discriminated preferentially towards long-chain fatty acids. In contrast, Tyr183 was crucial for activity: when mutated, only 10% of the variants retained activity, with a preference for the hydrolysis of short-chain fatty acids. Phe431 tolerated mutation with little effect, as activity was maintained in 90% of the variants and no increased discrimination was found. For the randomized Part 2 library, 66% of the variants retained activity: the ratio between variants active on short-chain vs. long-chain fatty acids was 39:1. These results demonstrate the suitability of the Golden Gate strategy to achieve rapid generation of readily recombined functional diversity.

Conclusions
Our method allows for maximum versatility both in the generation and the screening of smart libraries of mutants. The generation of libraries based on the Golden Gate strategy is quick and enables easy library combination to study the synergistic effect of the mutations. Furthermore, each part of the gene is treated separately, with the possibility of applying different strategies for mutagenesis to each part. Here, we illustrated the use of targeted mutagenesis for parts 1 and 3 and random mutagenesis for part 2. As a result, this method provides a means to create highly mutated portions of genes that can be inserted either into the wild-type background (see Library 10) or recombined together (Libraries 8,9) such that epistasis among residues within specified regions can readily be explored. Moreover, there is no need to engineer restriction sites into the gene sequence, as the method is seamless. This novel strategy can be applied to any protein, offering countless possibilities for facile generation of smart libraries. Furthermore, the Golden Gate method was previously used to assemble up to nine parts of DNA [19], hence we envisage that the method in this paper could be extended to recombine more than three parts of an enzyme, to achieve even higher flexibility. As a proof of concept, the method was successfully applied in conjunction with a novel automated strategy for qualitative screening to alter the selectivity of Cal-A in hydrolyzing short-chain and long-chain fatty acid esters.

Materials, strains, vectors and culture conditions
Unless otherwise stated, all chemical reagents and DNA primers were purchased as analytical grade from Sigma-Aldrich. Nunc™ OmniTray™ rectangular petri dishes were purchased from Thermo Fisher Scientific. Our in vivo screening was performed using a Beckman Coulter Biomek NXp robot. Chemically competent E. coli BL21 (DE3) were prepared by the CaCl2 method. The original pelB-Cal-A construct was kindly donated by Prof. Uwe Bornscheuer, Dept. of Biotechnology & Enzyme Catalysis, Greifswald University, Germany. This construct was used to produce the initial Tyr93 NDT library. The library was re-generated in the pM269 DNA2.0 mother vector, and subsequently in the pD441pelB daughter vector, as reported below.
The codon-optimized Cal-A parts provided in mother vectors (pM269; chloramphenicol resistant) and the linearized daughter vectors (pD441pelB, pD441OmpA; kanamycin resistant) were purchased from DNA2.0 (California, USA; https://www.dna20.com). The DNA2.0 terminology was used throughout this report, where 'mother vector' refers to a plasmid carrying a part, and 'daughter vector' is the expression plasmid. The codon-optimized sequence of wild-type Cal-A resulting from assembly of the three parts is reported in the supplemental information (S1 Fig). Transformed E. coli strains were generally cultured on Luria-Bertani (LB) agar and in LB broth, both containing ampicillin (100 μg/mL), kanamycin (50 μg/mL) or chloramphenicol (35 μg/mL), depending on the resistance marker, at 37˚C for 16 hours, with shaking at 250 rpm when appropriate.

Assembly of plasmids and library generation
Restriction digestion of both daughter plasmids and parts was performed routinely at 37˚C over 30 to 45 minutes using either SapI or BsaI restriction enzymes. Ligation of the parts among themselves and with the vector was performed at 37˚C, over 30 to 45 minutes. PCR conditions for the amplification of the parts (wild-type and mutated) are detailed in the supplementary information (S2 Table). Mutants of Cal-A were generated by circular mutagenesis according to the manufacturer's instructions using the QuikChange Lightning Site-Directed Mutagenesis kit and mutagenic primers (S1 Table). All genetic constructs were designed and assembled in silico using the SnapGene software (GSL Biotech; available at snapgene.com). The mutational frequency and Ts/Tv (transition/transversion ratio) were calculated with the Mutanalyst software, available online. Library properties calculated at the PEDEL-AA server page [34] assumed a Poisson distribution [35]. The resulting ligation mixes were transformed into the E. coli BL21 (DE3) expression host. Colonies were picked individually and grown in 96-deep-well plates in LB supplemented with the appropriate antibiotic.
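As a rough illustration of what a Poisson assumption implies for the randomly mutated part 2 library, the sketch below computes the expected number of clones carrying 0 to 4 mutations, using the average of 1.65 mutations per clone and the 20 sequenced clones reported in the main text. This is a simplified calculation written for this purpose; it does not reproduce the PEDEL-AA computation, and the function name is illustrative.

```python
import math

def poisson_pmf(k, mean):
    """Probability of observing exactly k events for a Poisson mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

MEAN_MUTATIONS = 1.65   # average mutations per clone reported in the text
N_CLONES = 20           # number of part 2 clones sequenced

for k in range(5):
    p = poisson_pmf(k, MEAN_MUTATIONS)
    print(f"{k} mutation(s): P = {p:.3f}, expected clones = {p * N_CLONES:.1f}")
```

Under this model roughly one fifth of the clones are expected to be unmutated, which is of the same order as the approximately 25% wild-type clones observed by sequencing.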

Screening of Cal-A variants through automation by liquid handler robot
The screening of the variants against tributyrin and olive oil was based on previously reported techniques [31,32]. With the help of a Beckman Coulter Biomek NXp robot, we transformed the manual screen into an automated version that allows for higher throughput. We used rectangular petri dishes to cast the growth media to be inoculated with the library variants. Tributyrin plates were prepared as follows: 1.5 g of agar was added to 100 mL of auto-inducing ZY medium [36] adjusted to pH 8. Tributyrin oil (2.5 g) was added after autoclaving, with kanamycin. For the olive oil/rhodamine plates, the tributyrin was replaced with 2.5 g olive oil, and rhodamine was added at a concentration of 0.001% w/v. The media were thoroughly shaken to generate strong emulsions before plating. Each plate was cast with 42 mL of medium and left to set on a smooth and perfectly horizontal surface. The plates were stacked in the liquid handler, ready to be picked up by the robotic arm. On the deck of the robot, four 96-well plates containing 1 mL aliquots of the library variants pre-grown to saturation (LB, overnight, shaking, 37˚C) served as inoculum for the agar plates. A script was designed to pick up 20 μL of culture, column by column, from the inoculum plates and spot 8 μL of inoculum by lightly touching the agar surface. The pipette tips were re-used for the same inoculum. The excess liquid culture was released into an ethanol waste. Inoculated plates were moved by the robotic arm to a second stacker to be picked up by the user for overnight incubation at 30˚C. The incubated plates were visualized by a gel imager and analyzed. As a negative control, a culture of E. coli BL21 (DE3) harboring an inducible, unrelated gene (cytochrome P450 BM3) was tested under the same conditions.
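The column-by-column spotting order can be written out as a simple plan. The sketch below is a generic illustration in Python and is not the Biomek script used in this work: it lists the wells of four 96-well inoculum plates in column-major order together with the volumes stated above. The assumption that the whole remainder of each 20 μL aspiration goes to waste, and all names, are illustrative.

```python
from string import ascii_uppercase

ROWS = ascii_uppercase[:8]   # rows A-H of a 96-well plate
COLUMNS = range(1, 13)       # columns 1-12
ASPIRATE_UL = 20             # volume picked up per well (from the text)
SPOT_UL = 8                  # volume spotted onto the agar (from the text)

def column_major_wells():
    """Well names in column-by-column order: A1, B1, ..., H1, A2, ..."""
    return [f"{row}{col}" for col in COLUMNS for row in ROWS]

def spotting_plan(n_plates=4):
    """One entry per well: source plate, well and volumes in microlitres."""
    plan = []
    for plate in range(1, n_plates + 1):
        for well in column_major_wells():
            plan.append({
                "plate": plate,
                "well": well,
                "aspirate_ul": ASPIRATE_UL,
                "spot_ul": SPOT_UL,
                # Assumption: everything not spotted is discarded to waste.
                "waste_ul": ASPIRATE_UL - SPOT_UL,
            })
    return plan

plan = spotting_plan()
print(len(plan), "spots in total")
print([step["well"] for step in plan[:10]])
```

A multichannel head would handle the eight wells of one column in a single step; the flat list here simply makes the order of positions on the agar plate explicit.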
A halo of clearance around a colony on tributyrin-containing medium indicates activity toward the short-chain substrate while a halo of fluorescence around a colony on olive oil/rhodamine-containing medium indicates activity toward the long-chain substrate. Halo analysis is qualitative, giving a yes/no response as to activity of a variant toward a specific substrate. The method is qualitatively robust, in that an active variant consistently shows a halo and a negative variant consistently does not (S6 Fig; triplicate plates). Although reproducible, the current method does not lend itself to quantitative activity measurements because of factors including insufficient precision in the number of cells inoculated, which affects colony size and thus appearance of the halo (diameter and/or intensity).

S2 Table. PCR conditions routinely used to amplify the parts from the mother vectors. The Phusion Green High-Fidelity DNA Polymerase was used to ensure maximum fidelity. Primers Inner34_fwd and part1_rvs were used for amplification of part1, part2_fwd and part2_rvs for part2 and part3_fwd and Inner34_rvs for part3 (S1 Table).

S3 Table. PCR conditions routinely used to amplify the ligated parts before ligation into the daughter vector. Phusion Green High-Fidelity DNA Polymerase was used to ensure maximum fidelity. Primers Inner34_fwd and Inner34_rvs were used for amplification (S1 Table).

S4 Table. Conditions routinely used to perform colony PCR in order to screen for clones carrying the correctly assembled constructs. (DOCX)

S5 Table. Conditions used to perform error-prone PCR. The random library in part2 was generated using the GeneMorph II Random Mutagenesis Kit by Agilent. This kit allows for the elimination of any bias during mutation. Following the manufacturer's instructions, we applied the conditions yielding the highest mutation rate. This is achieved by using very little template DNA (0.1 ng) and by repeating the PCR for 30 cycles using primers part2_fwd and part2_rvs (S1 Table). For error-prone PCR to be effective, the DNA yield at the end of the reaction must be between 500 ng and 10 μg. Our reaction yielded 5 μg of DNA, which is within parameters. Once part2 was randomly mutated, we assembled it with the wild-type part1 and part3 and screened for positive clones. Upon screening more than 15 randomly-selected clones for each recombined library, we observed 100% correct ligation products by colony PCR. DNA sequencing of a number of those clones served to assess the library quality (refer to text in main paper and Table 2).

In this simplified one-pot assembly procedure the three mother vectors carrying the parts are restricted and ligated together with the daughter vector in a single reaction (1) and the ligated product is then directly transformed (2). It is worth noting that the use of type IIS restriction enzymes should, in principle, allow the parts and the vector to be cut and ligated together directly, without the need for PCR pre-amplification steps. Furthermore, the use of different selective antibiotic markers on the mother and daughter vectors eliminates the worry of carrying forward the mother constructs. In this scenario, the three parts carried in the circular mother vectors are directly pooled with the daughter vector, cut at the respective restriction sites at the same time and assembled by adding the ligase directly to the mixture. We used this simplified assembly strategy during our experiments to generate the wild-type Cal-A construct. Screening by colony PCR (S2 and S4 Tables) showed that the five tested clones contained the desired fragment (S5 Fig). DNA sequencing of three of the clones confirmed that no undesired events had occurred. However, when we applied this simplified strategy to the recombination of the mutant parts, the success rate of ligation was reduced. Even though we cannot at present explain this phenomenon, the use of our standard method (Fig 2 in main article) solved the problem. (DOCX)