Golden Gate Shuffling: A One-Pot DNA Shuffling Method Based on Type IIs Restriction Enzymes

We have developed a protocol to assemble in one step and one tube at least nine separate DNA fragments together into an acceptor vector, with 90% of recombinant clones obtained containing the desired construct. This protocol is based on the use of type IIs restriction enzymes and is performed by simply subjecting a mix of 10 undigested input plasmids (nine insert plasmids and the acceptor vector) to a restriction-ligation and transforming the resulting mix into competent cells. The efficiency of this protocol allows libraries of recombinant genes to be generated by combining in one reaction several fragment sets prepared from different parental templates. As an example, we have applied this strategy to the shuffling of trypsinogen from three parental templates (bovine cationic trypsinogen, bovine anionic trypsinogen and human cationic trypsinogen), each divided into nine separate modules. We show that one round of shuffling using the 27 trypsinogen entry plasmids can easily produce the 19,683 different possible combinations in one single restriction-ligation, and that expression screening of a subset of the library allows identification of variants that can lead to higher expression levels of trypsin activity. This protocol, which we call ‘Golden Gate shuffling’, is robust, simple and efficient, can be performed with templates that have no homology, and can be combined with other shuffling protocols in order to introduce any variation in any part of a given gene.


Introduction
Current protocols for assembling variant gene libraries have evolved from the relatively simple early protocols that generated random variability through error prone PCR [1] into a rich variety of protocols that allows introduction of virtually any type of variation in any given gene [2,3,4]. For example, libraries can be constructed from pools of DNaseI digested fragments prepared from parental templates [5,6,7,8], from degenerate oligonucleotides [9,10] or from mixtures of both, or even from undigested parental templates [11,12,13], and are usually assembled through PCR. Libraries can also be made from parental sequences recombined in vivo or in vitro by either homologous or nonhomologous recombination [14,15,16].
Despite the large diversity of existing DNA shuffling protocols, standard cloning methods based on restriction enzymes are not widely used in these protocols. One obvious reason is that current cloning methods are usually not efficient enough to generate the large number of variants required for DNA shuffling. Using restriction enzymes would have several advantages, such as providing the ability to shuffle genes irrespective of their degree of homology, providing flexibility and control regarding the number of recombination events in each shuffled gene, and the ability to shuffle very large genes or several regions within large genes (independence from PCR amplification). In fact, two DNA shuffling strategies have earlier been developed based on the use of type IIB or type IIs restriction enzymes [17,18,19,20]. However, these protocols are quite complex to perform, require several successive steps, and in many cases still rely on PCR for amplification of the library since only a small amount of recombinant template is obtained.
We have recently developed a protocol that allows subcloning a DNA fragment from one plasmid to another with very high efficiency in one tube and one step [21]. This protocol is also based on the use of type IIs restriction enzymes, and allows the conversion of more than half of all input plasmids into the desired recombinant product in just a 30-minute restriction-ligation. High efficiency was also reported for the cloning of one to three PCR products using a similar cloning strategy [22]. We have now developed a protocol for cloning multiple fragments at once, and show here that at least 9 different fragments can be assembled together in a defined linear order and inserted into a recipient plasmid in one step, and that this procedure is so efficient that the majority (about 90%) of colonies growing on selection plates contain the desired constructs. This efficiency is sufficient to generate libraries of recombinant genes from several parental templates.
We have used trypsinogen as a test protein for this shuffling protocol. In plants, trypsinogen (bovine cationic trypsinogen) is expressed only at a low level ([23], and our unpublished results), and we suspect that this is due to instability of the protein. We have shuffled trypsinogen together with the genes for bovine anionic and human cationic trypsinogen. Screening of just 225 recombinant clones by transient expression in Nicotiana benthamiana leaves led to selection of variants that allow production of a higher amount of trypsin activity per gram of leaf tissue.

DNA shuffling strategy
In earlier work, we showed that a DNA fragment of interest can be subcloned with very high efficiency in one step and one tube from one plasmid to another [21]. The principle of the cloning strategy is based on the ability of type IIs restriction enzymes to cut outside of their recognition site. Two DNA ends can be designed to be flanked by a type IIs restriction site such that digestion of the fragments removes the enzyme recognition sites and generates ends with complementary 4 nt overhangs; such ends can be ligated seamlessly, creating a junction that lacks the original site (Fig. 1A). This property allows cloning to be performed using a one-step restriction-ligation. This strategy was shown to result in the conversion of more than half of all input plasmids present into the desired recombinant product in just a 30-minute restriction-ligation. Subcloning was also found to be very efficient when two and three inserts were subcloned, but the total amount of recombined plasmid was lower.
This cloning strategy could also be used for DNA shuffling if the entry modules that are subjected to restriction-ligation are prepared from a set of homologous genes rather than from a single gene. Such a DNA shuffling protocol would consist of first selecting a number of 4-nucleotide 'recombination sites' (sequences f1 to fn+1, Fig. 1B) on a nucleotide sequence alignment of several homologous genes. Recombination sites would be chosen on sequences that are identical among all homologues, but different from all other selected sites within the same gene. The selection of these recombination sites defines modules that consist of a core sequence (C, sequence variable among homologues) flanked by two 4 nt sequences (f). These modules can be amplified by PCR with primers designed to add flanking BsaI sites on each side of the modules (the BsaI cleavage sites perfectly overlapping with the recombination sites), cloned in an intermediate cloning vector and sequenced. A restriction-ligation performed on a mix containing all intermediate plasmids (total number of plasmids: x multiplied by n), the recipient acceptor vector, BsaI enzyme and ligase is expected to allow assembly of a library of shuffled genes. This is because each module is compatible with, and can be ligated only to, a module belonging to the next consecutive set of homologous modules (or to the acceptor vector for the first and last modules), and because each module from a set of homologous modules can be ligated with equal probability to each module of a contiguous set. In addition, because of the restriction-ligation, only the desired assembled products are expected to accumulate, since all other ligation products (for example, ligated products containing plasmid backbone DNA from the intermediate constructs) will contain BsaI sites and should therefore be immediately redigested with BsaI.
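The ordering logic described above can be illustrated with a minimal Python sketch. It is not part of the original protocol: the four recombination sites in `SITES` are invented for illustration, only three module sets are modeled instead of nine, and the module names simply reuse the BA/BC/HC naming that appears later in the text. Each module is represented as (left overhang, name, right overhang), and assembly proceeds by following matching overhangs from one vector end to the other.

```python
import itertools
import random

# Hypothetical 4-nt recombination sites f1..f4 (not taken from the paper).
SITES = ["AATG", "AGTC", "GGTG", "GCTT"]
PARENTS = ("BA", "BC", "HC")  # three parental genes

# One module set per position; each module is (left site, name, right site).
MODULE_SETS = [
    [(SITES[i], f"{parent}{i + 1}", SITES[i + 1]) for parent in PARENTS]
    for i in range(len(SITES) - 1)
]

def assemble(module_sets, vector_left, vector_right, rng):
    """Build one product by repeatedly picking a module whose left
    overhang matches the current free end, until the vector's other
    end is reached."""
    by_left = {}
    for modules in module_sets:
        for module in modules:
            by_left.setdefault(module[0], []).append(module)
    product, overhang = [], vector_left
    while overhang != vector_right:
        _, name, overhang = rng.choice(by_left[overhang])
        product.append(name)
    return product

clone = assemble(MODULE_SETS, SITES[0], SITES[-1], random.Random(0))
library_size = len(list(itertools.product(*MODULE_SETS)))
print(clone, library_size)
```

Because every site is unique, a module can only follow a module from the preceding set, so each simulated clone contains exactly one module per position, and the number of distinct products is the product of the set sizes (27 for this three-set toy case).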
As a first step toward testing this protocol, we decided to try to assemble a plasmid from 10 separate input plasmids (9 module plasmids and one vector plasmid) in a restriction-ligation.
One-pot one-step assembly of a GFP construct from 10 constructs
We chose to make a construct containing a GFP gene with 4 introns and 5 exons (the same construct as described previously [21] but with introns). The introns and exons were defined as separate modules (sequence of the flanking BsaI restriction sites shown in Fig. 2, sequence of the complete modules given in Fig. S1). The 9 fragments were amplified from a cloned GFP gene (for the GFP exons) or from Arabidopsis thaliana genomic DNA (for the introns), and cloned into the SmaI site of pUC19spec (a derivative of pUC19 with the ampicillin-resistance β-lactamase gene replaced by a spectinomycin-resistance gene) and sequenced.
The recipient expression vector, pX-LacZ (Fig. 2, described previously in [21]), contains two BsaI sites compatible with the first and last exon modules. To define optimal restriction-ligation conditions, a first experiment was performed using only the nine GFP intron/exon constructs without the acceptor vector. The result of the restriction-ligation is expected to be a 1.17 kb linear fragment containing the assembled GFP exons and introns, in addition to all linear entry vector backbone fragments (2.8 kb). Restriction-ligations were set up by pipetting into a tube 75 ng of each of the 9 plasmids (5 exons, 4 introns), 2.5 units of BsaI enzyme (NEB) and either 2.25 or 15 units of T4 DNA ligase (Promega; 0.75 µl of normal ligase at 3 u/µl or of high-concentration (HC) ligase at 20 u/µl, respectively) in a total volume of 15 µl in ligation buffer (Promega). The restriction-ligations were incubated at 37°C for 3 or 6 hours and then run on an agarose gel. The expected 1.17 kb band could be seen only when the ligation was performed with HC ligase, and mostly after a 6 hour ligation (Fig. 2B). To try to improve the amount of assembled ligated product, we modified the restriction-ligation parameters so as to alternate between conditions optimal for annealing of the DNA ends and conditions optimal for enzymatic reactions (digestion or ligation), and for this purpose performed the restriction-ligation in a thermocycler. Programs were defined with the following steps: incubation for 2 minutes at 37°C and 5 minutes at 16°C, both steps repeated either 25 or 50 times, followed by incubation for 5 minutes at 50°C (final digestion) and then 5 minutes at 80°C (heat inactivation).
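As a quick check on how the cycling programs compare in duration with the continuous incubations, the hold times can be summed as in the short Python sketch below. This is only arithmetic on the step times given above; it ignores the time the thermocycler spends ramping between temperatures, and the function name is a hypothetical helper.

```python
def program_minutes(cycle_steps, cycles, final_steps):
    """Total hold time in minutes (temperature ramping is ignored)."""
    return cycles * sum(cycle_steps) + sum(final_steps)

# 2 min at 37 C + 5 min at 16 C per cycle, then 5 min at 50 C and 5 min at 80 C.
gfp_25 = program_minutes([2, 5], 25, [5, 5])
gfp_50 = program_minutes([2, 5], 50, [5, 5])
print(gfp_25, gfp_50)  # 185 360
```

Under this assumption the 25- and 50-cycle programs hold for about 3 and 6 hours in total, which is comparable to the 3 and 6 hour incubations at 37°C.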
These conditions were more efficient than a continuous incubation at 37°C: the expected product was visible on a gel even when normal ligase was used, and was highest after 50 cycles. The same conditions as described above were also used with a mix containing the acceptor expression vector in addition to the nine GFP module plasmids (75 ng of each of the 10 plasmids). The ligation was transformed into 100 µl of chemically competent DH10B cells and 20 µl out of a final volume of 1 ml was plated on Kanamycin X-gal plates. For all restriction-ligations performed with BsaI and ligase, the number of white colonies mirrored the efficiency of ligation observed in the ligation assay described above (Table 1). In general, high concentration ligase was more efficient than normal ligase for restriction-ligations performed without cycling, but both normal and high concentration ligases appeared to work well with a program with 50 cycles.
Plasmid DNA was prepared from 12 white colonies for each of 6 transformations (6 hr 37°C, 25 cycles and 50 cycles, each with normal and HC ligase) and was analyzed by gel electrophoresis, undigested or digested with XmaI and AvrII. Analysis of undigested DNA (not shown) indicated that 4 out of 72 clones consisted of dimers (vector-insert-vector-insert religated, star in Fig. 2C). Analysis of digested DNA revealed that 67 out of 72 clones had the expected restriction pattern, or 93% of white colonies. When both incorrect inserts and dimers are taken into account, this leads to a success rate of 63 correct colonies out of 72, or 87.5% of all white colonies. By extrapolating this frequency of correct clones to the entire transformation, one can conclude that up to 7918 correct clones were obtained with restriction-ligation performed at 37°C, and up to 16581 correct clones with restriction-ligation performed with cycling.
Six clones with a correct restriction digest pattern were sequenced as well as all incorrect constructs. Sequencing confirmed that all 6 clones with the correct restriction pattern had the expected sequence. For the incorrect constructs, 3 clones (24, 34 and 64) contained an insertion of one extra C7 module between modules C7 and C8, while one clone (22) had a deletion of module C7 (Fig. 3). Both types of events can be explained by ligation of inappropriate DNA ends complementary for 3 out of 4 nucleotides.

DNA shuffling of trypsinogen
The first set of experiments allowed us to establish restriction-ligation conditions that are efficient enough to allow DNA shuffling. Since we did not have multiple GFP homologues to test the complete shuffling protocol, another protein, trypsinogen, was selected for further experiments. We had earlier tried to express trypsinogen (bovine cationic trypsinogen, UniProtKB database ID P00760) but only low levels of expressed protein were obtained (unpublished results), and we hypothesized that low expression might result either from toxicity of trypsinogen to plant tissues or from instability of the protein in plant cells. It is therefore possible that related but different trypsinogen proteins might lead to a higher level of expression in plant cells. The genes for two other related proteins were therefore selected for shuffling: bovine anionic trypsinogen (UniProtKB database ID Q29463) and human cationic trypsinogen (P07477). The coding sequence for bovine cationic trypsinogen was obtained by PCR amplification of exon sequences from calf thymus DNA. The bovine anionic and human cationic trypsinogen genes were chemically synthesized by Entelechon GmbH, with a Nicotiana codon usage (sequences given in Fig. S2). The three genes display 66 to 73% identity at the nucleotide level and 74 to 78% identity at the amino acid level. Eight recombination points were chosen on conserved amino acids (Fig. 4A). These recombination points were selected randomly at positions throughout the genes to define 27 modules (9 sets of 3 modules), with the only requirement that each final module contains a distinct amino acid sequence. The resulting 27 trypsinogen fragments were amplified by PCR with primers containing flanking BsaI sites, cloned blunt [24] in the SmaI site of pUC19spec and sequenced.
A restriction-ligation was set up by adding into a single tube 50 ng of each of the 27 trypsinogen fragment constructs (Fig. 4B), 50 ng of vector, 10 units of BsaI enzyme (NEB) and 3 units of T4 DNA ligase (Promega) in a total volume of 15 µl in ligation buffer (Promega). The restriction-ligation mix was incubated in a thermocycler with the following program: 5 minutes at 37°C and 5 minutes at 16°C, both steps repeated 50 times, followed by incubation for 5 minutes at 50°C and 5 minutes at 80°C (trypsin shuffling experiment 1, ts1, Table 2). The ligation was transformed into 100 µl of chemically competent cells and 50 µl out of a final volume of 1 ml was plated on Kanamycin X-gal plates. After counting the number of white colonies per plate and extrapolating to the whole transformation, a total of 7320 white colonies was obtained. Plasmid DNA from 24 white colonies was analyzed by gel electrophoresis (of cut and uncut DNA); four clones had an incorrect restriction pattern (Fig. 4C) and two were dimers (not shown). The 18 correct clones were sequenced and found to have correctly assembled inserts, and all of these were different (structure of all sequenced clones shown in Fig. S3). This shows that out of the 7320 colonies, 5490 colonies are estimated to contain correct constructs. Shuffling was repeated using the same amount of inserts but three times more vector, to be in the same molar ratio as the inserts (shuffling experiment ts2, Table 2), and 10190 white colonies were obtained, with an estimated number of 8492 correct constructs. In order to get a complete library (the maximal theoretical diversity for shuffling 3 genes in nine fragments is 19,683), one only needs to transform three separate 15 µl reactions or use more efficient competent cells (for example electrocompetent cells).
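The extrapolation used above can be reproduced with a few lines of Python. This is a sketch of the arithmetic only, using the numbers quoted in the text; the function name is a hypothetical helper. The last line reads "complete library" as obtaining at least as many correct clones as there are theoretical variants, which appears to be the criterion used here; it does not model sampling statistics, and covering every variant in practice would require oversampling.

```python
import math
from fractions import Fraction

def estimate_correct(white_colonies, clones_checked, clones_correct):
    """Scale the white-colony count by the fraction of analyzed clones
    that were correct."""
    return round(white_colonies * Fraction(clones_correct, clones_checked))

# ts1: 7320 white colonies, 18 of 24 analyzed clones correct.
ts1_correct = estimate_correct(7320, 24, 18)

LIBRARY_SIZE = 19_683   # theoretical diversity quoted in the text
TS2_CORRECT = 8_492     # estimated correct clones per reaction in ts2
reactions_needed = math.ceil(LIBRARY_SIZE / TS2_CORRECT)
print(ts1_correct, reactions_needed)  # 5490 3
```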

Optimization of ligation parameters and module design
All plasmids with an incorrect restriction pattern were sequenced. The majority of incorrect clones (clones ts1-1/7/17, ts2-37 and 39) had a deletion of 5 modules (modules 3 to 7, Fig. 5). As for the incorrect GFP constructs, these can be explained by ligation of DNA ends complementary for 3 out of the 4 nucleotides, in this case between modules 3 (sequence of the top strand: agtg) and 8 (ggtg). Finally, one trypsinogen construct contained 6 extra modules (modules 3 to 8) between modules 8 and 9. In this case, exonucleolytic removal of one terminal base from the 5′ end of each DNA overhang led to two complementary three-base extensions that were able to anneal and become ligated (Fig. 5). This base removal can be explained by the presence of trace amounts of a contaminating exonuclease in one of the components introduced in the ligation mix (the plasmids, the enzymes or the buffer).
Two approaches were used to try to further improve the efficiency of cloning. One consisted of modifying the ligation conditions, in particular the temperature, so as to minimize ligation of ends that are not perfectly complementary. For example, shuffling was performed using programs in which the temperature of the 16°C incubation was raised to 20, 25, 30 or 37°C (experiments ts3 to ts6 and ts30, Table 2). However, these modifications did not significantly affect cloning efficiency.
The second approach consisted of modifying the sequence joining modules 2 and 3 (which is involved in inappropriate ligation to module 8): the sequence was changed from agtg to agtc (a silent substitution) to prevent inappropriate ligation to module 8. This means that six modules, BA2, BC2, HC2, BA3, BC3 and HC3, had to be recloned. Shuffling was then repeated with the 6 new modules using a range of different restriction-ligation conditions (experiments ts15, ts16, ts23 to 29, Table 2). This modification led to an increase in clones with the correct restriction pattern from 91% to 97%. After subtracting the clones that contained dimers, the overall proportion of correct clones increased from 87.5% to 91.5%. Constructs with the incorrect pattern obtained with the new modules were also sequenced. The majority of incorrect clones were generated as a result of exonucleolytic removal of at least one base at the 3′ end of the overhang (Fig. S4).
All clones with a correct restriction pattern from cloning experiments ts15 and 16 (43 clones) were sequenced. All constructs were found to contain shuffled trypsinogen genes as expected, and all were different (structure in Fig. S3). None contained any single nucleotide mutation. This is expected since these constructs are assembled without using PCR amplification.

Screening of the shuffled trypsinogen constructs
The 81 sequenced constructs (all different) and 87 constructs analyzed by restriction digest but not sequenced were transformed in Agrobacterium strain GV3101:pMP90. In addition, a library of unscreened recombinant plasmids was directly transformed in Agrobacterium, and 53 Agrobacterium colonies were picked and grown separately for infiltration. In addition, two other Agrobacterium strains were grown: a strain containing a 5′ viral vector containing an Arabidopsis SUMO gene and a strain containing a construct for plant expression of recombinase [25]. The outcome of coinfiltration of three strains in plant tissues (the 5′ vector, the recombinase, and a trypsinogen construct 3′ vector) leads to recombination in plant tissues of the 5′ construct and the trypsinogen construct, and to expression and secretion in the apoplast of a fusion protein: Arabidopsis SUMO-shuffled trypsinogen (Fig. 6A). Autocatalytic conversion of trypsinogen to trypsin then occurs (either in plant tissues or during extraction).
Table 2 (legend). Constructs were made from a first set of trypsinogen modules (mod1, the junction between modules 2 and 3 is agtg) or a second set of modules (mod2, junction between modules 2 and 3 is agtc). White and blue are the total number of colonies obtained per transformation (extrapolated from the number of colonies obtained per plate). All restriction-ligations were performed using equimolar amounts of insert and vector except for ts1, which was made using three times less vector (50 ng) than insert. Dimers were identified by running uncut DNA on an agarose gel. Programs used either 6 hours at 37°C (37°C 6 hr) or 50 cycles (conditions given in program); all programs are followed by digestion for 5 min at 50°C and heat inactivation for 5 min at 80°C. hc, use of high concentration ligase. doi:10.1371/journal.pone.0005553.t002
The 221 different shuffled trypsinogen constructs were infiltrated. Three constructs containing the parental genes were also infiltrated. Plant tissue was harvested at 7 days post infiltration (dpi). Trypsin enzymatic activity was determined using a colorimetric assay based on the conversion of a colorless substrate, BAPNA, into a yellow product by digestion with trypsin. Four clones (two of the previously sequenced constructs, clones ts15-7 and ts15-21, one from the characterized but non-sequenced minipreps, clone ts4-80, and one from the library of noncharacterized plasmids, clone ts1-103) were found to provide a higher level of activity than the bovine trypsinogen construct control, with the best clone, ts1-103, displaying approximately 4-fold higher activity (Fig. 6B). Both non-sequenced clones were then sequenced (structure shown in Fig. 6B).
A second round of shuffling was performed using information from the 3 best clones obtained (ts15-21, ts4-80 and ts1-103). Shuffling was performed by setting up a restriction-ligation containing modules in the same molar ratio as in the three selected parents combined (module set 1: BA1/BC1/HC1, 100/50/0 ng; module set 2: BA2/BC2/HC2, 0/150/0 ng; etc.). Since not all of the 27 modules are used, the number of theoretically possible combinations is only 256 different constructs. Nevertheless, one construct with ninefold higher activity than bovine trypsinogen was obtained after screening 24 new recombinants (Fig. 6C). Preliminary data (not shown) suggest that the high activity of these clones is due to an increase of specific activity toward the BAPNA substrate rather than an increase in the amount of expressed protein; more precise quantification will be the subject of a separate study.

Discussion
Figure 6. Activity assay of the shuffled trypsinogen constructs. (A) Three constructs (in Agrobacterium) are coinfiltrated for each trypsinogen construct: the 5′ viral vector module (pICH30211), a trypsinogen construct, and an integrase construct (not shown). In planta recombination leads to formation of an assembled construct (1) which leads to viral expression of a fusion protein (2) containing a signal peptide (SP), Arabidopsis thaliana SUMO exons, and trypsinogen. The signal peptide is cleaved upon import through the ER (3), and trypsin is obtained by autocatalytic cleavage of the proprotein (red arrow). Grey boxes represent introns. (B) Activity and structure of some of the constructs obtained from the first round of shuffling (name, column 1 and activity, column 2), activity expressed relative to activity of bovine cationic trypsinogen (BC). Activity for the parents (BA, HC, BC) was also measured (from corresponding constructs infiltrated as a control). GFP is used as a negative control. The last 3 constructs (boxed) were used for a second round of shuffling. (C) Best construct obtained with the second round of shuffling. doi:10.1371/journal.pone.0005553.g006
We have shown here that inserts from nine separate plasmids (or nine sets of modules) can be easily and efficiently assembled and cloned in an acceptor vector in one step and one tube. The efficiency of this protocol comes from the fact that the only stable product(s) issued from the restriction-ligation are the desired product(s) [21]; these products are formed continuously with each cycle and with increasing length of incubation. Assembly was shown to be efficient with two independent sets of BsaI restriction site overhangs, one set with the GFP construct, and the second set with trypsinogen. Sequencing of the constructs with an incorrect restriction pattern obtained with both sets has allowed us to draw some conclusions as to how these overhangs should be selected to maximize cloning efficiency. The majority of incorrect constructs for both experiments were found to occur as a result of ligation of two DNA ends complementary for three consecutive nucleotides out of the four nucleotides of the overhang. This occurrence can be explained by inappropriate ligation of improperly annealed ends. An alternative explanation would consist of removal of a terminal nucleotide from one of the DNA ends by a contaminating exonuclease, and ligation of only one of the DNA strands of the annealed product. Whatever the mechanism, this occurrence can be reduced by selecting a set of recombination sites in which no site shares three consecutive nucleotides with any other selected site. This is usually not a problem since a large number of possible sequences, 240 (256 theoretically possible sequences, minus the 16 palindromes that should be excluded), can be chosen from. For each additional site to select, the sequence of previously selected sites or their complement should be excluded as well (use of 2 sites with complementary sequences would allow one fragment to be inappropriately ligated at the wrong position and in the opposite orientation). With a set of overhangs chosen according to these criteria (the trypsinogen second set of modules), 97% of constructs obtained contained only correctly ligated DNA ends. This efficiency suggests that it is likely that more than nine fragments could be ligated together and still result in a high percentage of correct constructs.
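The site-selection criteria discussed above can be written as a small checker. The following Python sketch is one possible reading of those criteria, not a tool from the original work: it counts the non-palindromic 4-nt sequences, and flags a pair of sites as conflicting when they are identical, when one is the reverse complement of the other, or when they share three consecutive nucleotides. It assumes that "three consecutive nucleotides" means the first three or the last three positions of the overhang (as in the agtg/ggtg case); all function names are hypothetical.

```python
from itertools import combinations, product

COMPLEMENT = str.maketrans("acgt", "tgca")

def revcomp(site):
    """Reverse complement of a lowercase DNA string."""
    return site.translate(COMPLEMENT)[::-1]

ALL_SITES = ["".join(bases) for bases in product("acgt", repeat=4)]
# Palindromic overhangs are self-complementary and are excluded.
CANDIDATES = [site for site in ALL_SITES if site != revcomp(site)]

def share_three(a, b):
    """True if a and b agree on their first three or last three bases
    (assumed interpretation of 'three consecutive nucleotides')."""
    return a[:3] == b[:3] or a[1:] == b[1:]

def conflicts(a, b):
    """True if two sites should not be used together in one assembly."""
    return a == b or a == revcomp(b) or share_three(a, b)

def conflicting_pairs(sites):
    """All pairs in a proposed site set that violate the criteria."""
    return [(a, b) for a, b in combinations(sites, 2) if conflicts(a, b)]

print(len(ALL_SITES), len(CANDIDATES))        # 256 240
print(conflicting_pairs(["agtg", "ggtg"]))    # [('agtg', 'ggtg')]
print(conflicting_pairs(["agtc", "ggtg"]))    # []
```

With this check, the original junction agtg is flagged against ggtg, while the modified junction agtc is not, which is consistent with the improvement seen with the second set of trypsinogen modules.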
Sequencing of 87 constructs with a correct restriction pattern (81 trypsinogen constructs and 6 GFP constructs) showed that none contained any single mutation in the shuffled genes. This is expected because shuffling is performed without the use of PCR; the modules in the intermediate constructs are made using PCR but are then sequenced before being used for restriction-ligation assembly.
A second reason for the efficiency of the assembly protocol is that the number of procedures performed on DNA has been brought down to a minimum. Indeed, any manipulation performed on DNA, including extraction, digestion, buffer exchange, dephosphorylation, DNA precipitation, column purification, or any other DNA manipulation procedure, is likely to result in some amount of DNA damage and in loss of some of the DNA. With the protocol described here, the plasmids used for assembly are not pre-digested but simply added to the restriction-ligation mix. Only one step and one buffer are used, and the time between digestion and ligation is brought to a minimum. No purification step is required between DNA preparation of the input modules and transformation of the library in competent cells. Using undigested plasmids for this procedure rather than digested gel-purified DNA fragments has an added advantage: it allows estimating the relative DNA concentration of the modules (which might vary in size significantly) more precisely; this is because the relative size difference between modules is much lower for plasmids than for purified inserts. This precision is very important when it comes to ligating many fragments, since a module present in too low or too high an amount would become a limiting factor and reduce the number of final clones. Unlike for standard cloning, where only one clone is usually required, obtaining the maximum number of independent recombinant plasmids is a necessity for DNA shuffling.
The protocol described here has two applications: (1) making constructs and (2) DNA shuffling. Regarding the first application, the ability to assemble in one step a construct from 10 different plasmids should allow much more flexibility and efficiency in making constructs than is now possible. Cloning strategies that require many successive steps can now be done in two steps: one being preparation of the intermediate constructs and the second, assembly of the final construct. The cloning protocol described here is in fact an extension of the 'Golden Gate' cloning protocol described earlier [21]. A protocol based on ligation-independent cloning has been reported that also allows cloning nine fragments into a vector [26], but efficiency was lower, at about 17%. Moreover, that protocol is based on assembly of PCR products rather than of sequenced inserts in plasmids, which means that a portion of the constructs obtained will contain mutations derived from the primers or the PCR amplification. A protocol has also been reported for the cloning of four fragments using a restriction-ligation with type II enzymes that produce compatible ends, such as EcoRI and MfeI [27]. However, this strategy required adding eight different enzymes to the restriction-ligation mix for ligation of just four fragments. Ligation of nine fragments in one vector would require the simultaneous use of 20 different enzymes in the same mix. Finding such a combination would impose extreme limitations on the design of any cloning experiment.
The application of this cloning protocol to DNA shuffling results in a protocol that we call 'Golden Gate shuffling'. The use of type IIs enzymes for DNA shuffling has been reported before [17,18,19,20,28]. However, assembly of the modules was performed from gel-purified pre-digested DNA fragments. As a result, in many cases, ligation had to be done module set by module set in consecutive steps, which led to a low amount of assembled product, often requiring PCR amplification of the library before cloning in the expression vector. In contrast, with Golden Gate shuffling, once modules are made, assembling a defined set of modules is easy to perform. A first round of shuffling might provide a number of improved constructs that an experimenter might want to subject to a second round of shuffling. In that case, performing the second round of shuffling may consist of performing a one-tube restriction-ligation with different relative ratios of already made input modules. Another advantage comes from the fact that PCR is not required for assembly of the final library; this is useful since no PCR mutations will be present in the final library. Because of this feature, large genes can, in theory, be shuffled. Another advantage of this technology is that shuffling can be done between parental templates that have no homology at all, one application being exon shuffling (as previously described in [28]). The only requirement is the presence of 4 identical nucleotides at the chosen junction points. This means only one fixed amino acid at each junction point.
Shuffling of three genes divided in 9 modules (9 sets of modules, each set containing 3 modules) provides a theoretical number of variants of 19,683, and shuffling of four or five genes would provide a maximal theoretical diversity of 262,144 and about 2 million combinations, respectively. However, Golden Gate shuffling does not need to be limited to assembly of pre-made sequenced modules. In fact, it can be used to combine module sets that have been prepared with different shuffling protocols. For example, one set of modules might be made using any of the existing DNA shuffling protocols and might consist of thousands or even millions of variants. These module sets can be combined with other less variable module sets, depending on the need of the experimenter. At the same time, not all sets of modules need to contain the same number of modules. For example, one module set might consist of only one module of defined sequence that is used as a linker between two highly variable sets of modules. Therefore, the flexibility and efficiency of Golden Gate shuffling, as well as its compatibility and complementarity with other DNA shuffling protocols, should make it a valuable tool for molecular evolution.
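The diversity figures quoted above follow from multiplying the number of modules in each set. A minimal Python sketch of this calculation is given below; the function name is hypothetical, and the last example (set sizes 3, 1000, 1 and 3) is an invented illustration of unequal module sets, not a library described in this work.

```python
from math import prod

def theoretical_diversity(set_sizes):
    """Number of distinct assemblies when one module is drawn from each set."""
    return prod(set_sizes)

print(theoretical_diversity([3] * 9))  # 19683
print(theoretical_diversity([4] * 9))  # 262144
print(theoretical_diversity([5] * 9))  # 1953125, i.e. about 2 million

# Hypothetical mixed design: a highly variable set and a single linker module.
print(theoretical_diversity([3, 1000, 1, 3]))  # 9000
```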

Molecular biology techniques
Chemically competent cells were prepared as described earlier [21]. Agrobacterium infiltration of plant tissue has been described in [25]. Plasmid DNA minipreps were made using the Nucleospin Plasmid Quick Pure kit from Macherey-Nagel, Düren, Germany.
Trypsin enzymatic assay
100 mg of plant tissue was ground in liquid nitrogen and mixed with 300 µl of extraction buffer (0.15 M Tris pH 8.0, 2 mM EDTA). The extract was incubated for 10 minutes on ice and centrifuged for 15 minutes at 13,000 rpm. 20 µl of the supernatant was mixed with 20 µl of 2 mM BAPNA substrate (Sigma Aldrich). OD was read every 5 min from 5 to 45 min using a BioTek ELx808 Absorbance Microplate Reader with a 405 nm filter. Enzymatic activity was measured in the linear part of the curve (5-20 min) as the slope of the curve, from which the background activity of uninfiltrated WT tissue was subtracted. Activity was then expressed relative to the activity of the bovine cationic trypsinogen parent construct.
Figure S1 Sequence of GFP intron and exon modules and of the final assembled construct. The sequence of the 5 GFP exon modules, the 4 intron modules and of the final assembled GFP construct is given.
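The activity calculation described in the trypsin enzymatic assay can be sketched as follows in Python. This is an illustration under stated assumptions, not the analysis script used for the paper: the slope is obtained by ordinary least squares over the 5-20 min window, the WT background slope is subtracted from both the sample and the bovine cationic reference before taking the ratio, and all OD readings below are invented numbers.

```python
def slope(times, values):
    """Least-squares slope of values against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den

def relative_activity(times, sample_od, background_od, reference_od,
                      window=(5, 20)):
    """Background-corrected rate of a sample relative to the reference."""
    idx = [i for i, t in enumerate(times) if window[0] <= t <= window[1]]
    t_lin = [times[i] for i in idx]

    def rate(od):
        return slope(t_lin, [od[i] for i in idx])

    background = rate(background_od)
    return (rate(sample_od) - background) / (rate(reference_od) - background)

# Invented readings every 5 min from 5 to 45 min (OD at 405 nm).
times = list(range(5, 50, 5))
wt = [0.10 + 0.001 * t for t in times]       # uninfiltrated WT tissue
bc = [0.10 + 0.005 * t for t in times]       # bovine cationic reference
variant = [0.10 + 0.009 * t for t in times]  # hypothetical shuffled clone

activity = relative_activity(times, variant, wt, bc)
print(round(activity, 3))  # 2.0
```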