Circular Polymerase Extension Cloning of Complex Gene Libraries and Pathways

High-throughput genomics and the emerging field of synthetic biology demand ever more convenient, economical, and efficient technologies to assemble and clone genes, gene libraries and synthetic pathways. Here, we describe the development of a novel and extremely simple cloning method, circular polymerase extension cloning (CPEC). This method uses a single polymerase to assemble and clone multiple inserts with any vector in a one-step reaction in vitro. No restriction digestion, ligation, or single-stranded homologous recombination is required. In this study, we elucidate the CPEC reaction mechanism and demonstrate its usage in demanding synthetic biology applications such as one-step assembly and cloning of complex combinatorial libraries and multi-component pathways.


Introduction
Molecular cloning is a foundational technology for molecular biology and biotechnology. Since the pioneering restriction digestion and ligation based method [1][2][3], new cloning technologies have continuously been invented and refined to suit various requirements and applications. Depending on whether specific sites or sequences are used in the insert and the vector for cloning, cloning methods can be broadly divided into two categories: sequence-dependent and sequence-independent. Sequence-dependent cloning is based either on restriction digestion and ligation, or on site-specific recombination, such as the Univector plasmid-fusion system [4] and Gateway [5,6]. Sequence-independent cloning is largely based on homologous recombination and includes methods such as ligase-free [7] or ligation-independent cloning (LIC) [8], LIC with uracil DNA glycosylase (UDG or USER cloning) [9,10], MAGIC [11], SLIC [12], In-Fusion (Clontech) [13], and PIPE [14]. Although these methods all have their own characteristics and advantages, new developments, especially the emergence of synthetic biology, have created an ever-increasing demand for more accurate, efficient, convenient and economical cloning technologies for purposes such as creating complex combinatorial synthetic gene libraries, gene circuits and metabolic pathways.
For synthetic biology applications involving high-complexity or multi-fragment cloning, sequence-dependent methods are generally inconvenient because they require unique and specific sites in both the insert and the vector in order to generate the initial plasmids [4][5][6]. For this reason, the more flexible sequence-independent cloning methods are preferred. However, such methods usually require generating complementary single-stranded overhangs in both the insert and vector fragments, with or without RecA mediation [8,12,14]. Some of these methods are also not strictly sequence-independent because they require the presence or absence of specific nucleotides at certain positions in the overlapping region [8,15]. The generation of complementary single-stranded overhangs takes additional preparation steps and often uses expensive enzyme systems. These manipulations generally require large amounts of starting DNA and tend to have insufficient efficiency for library cloning. Furthermore, the annealing step in these methods is normally performed at ambient temperature, which allows non-specific hybridization among single-stranded overhangs and leads to frequent assembly errors in multi-fragment cloning. Therefore, for the demanding tasks of assembling and cloning complex synthetic gene libraries and pathways, further improvements in accuracy and efficiency over existing methods would be highly desirable. For routine and high-throughput cloning, fewer steps and lower cost are always significant improvements.
Polymerase extension is the basis of the polymerase chain reaction (PCR) used for amplification of DNA sequences. The same principle is also used for gene assembly with overlapping oligonucleotides or gene fragments [16][17][18]. However, to our knowledge, there has been no reported gene cloning method that relies solely on this mechanism. Here we report the development of a much simplified sequence-independent cloning technology based entirely on the polymerase extension mechanism. This method extends overlapping regions between the insert and vector fragments to form a complete circular plasmid and is therefore named "Circular Polymerase Extension Cloning". In the current study, we elucidate the reaction mechanism and demonstrate the broad utility and advantages of CPEC in cloning of synthetic genes, complex combinatorial libraries and metabolic pathways. We performed extensive tests on CPEC and recommend it as one of the most convenient, economical, and accurate cloning methods.

Single gene cloning using CPEC
Existing sequence-independent cloning methods require generating complementary single-stranded overhangs between the insert and the vector, a time-consuming and expensive process. We reasoned that it might be possible to eliminate this requirement by using the polymerase extension mechanism to extend double-stranded overlapping insert and vector to form a complete plasmid (Fig. 1A). In this mechanism, the insert and the vector share overlapping sequences on both ends. After denaturation and annealing, the insert and the vector hybridize and extend using each other as a template to form a complete double-stranded plasmid, leaving only one nick in each strand.
To confirm the validity of this mechanism, we first attempted cloning of a simple test gene, lacZα, into a modified pUC19 expression vector (see Methods S1 for sequence information). The vector was linearized by either restriction digestion or PCR. We added sequences on both ends of the lacZα gene to overlap with the ends of the linearized vector. The overlapping regions between the insert and the vector were designed to have similar melting temperatures (Tm), which were typically between 60-70 °C (see Methods S1 for sequence information). We mixed the linearized vector with the lacZα gene without adding any PCR primers in an otherwise typical PCR reaction mixture.
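The outcome of this mechanism can be illustrated with a minimal Python sketch. It is not the authors' software: the function names are hypothetical, only the top strand is modelled, and the sequences are assumed to be given in the same orientation, so that the 3' end of the vector matches the 5' end of the insert and vice versa.

```python
def find_overlap(left: str, right: str, min_len: int = 15) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no shared region of at least `min_len` bases is found.
    """
    max_len = min(len(left), len(right))
    for k in range(max_len, min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def cpec_join(vector: str, insert: str, min_len: int = 15) -> str:
    """Return the circular product as a linear string starting at the vector's first base.

    Assumption: both sequences are top strands in the same orientation.
    """
    k_start = find_overlap(vector, insert, min_len)  # vector 3' end / insert 5' end
    k_end = find_overlap(insert, vector, min_len)    # insert 3' end / vector 5' end
    if k_start == 0 or k_end == 0:
        raise ValueError("insert and vector must share overlaps at both ends")
    # Each shared region appears only once in the circular plasmid.
    return vector + insert[k_start:len(insert) - k_end]
```

The product length equals the sum of the two fragment lengths minus the two overlap lengths, because each overlap is counted once in the finished plasmid.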
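The requirement for similar melting temperatures can be checked roughly in code. The text does not state how Tm was calculated, so the sketch below assumes a common GC-content approximation, Tm = 64.9 + 41 (nG + nC - 16.4) / N, which ignores salt and nearest-neighbor effects. The function names are hypothetical.

```python
def gc_tm(seq: str) -> float:
    """Rough Tm estimate in degrees C from GC content (assumed approximation)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / n


def tm_spread(overlaps: list) -> float:
    """Difference between the highest and lowest estimated Tm among the overlaps."""
    tms = [gc_tm(s) for s in overlaps]
    return max(tms) - min(tms)
```

A small spread suggests that all overlaps will anneal at a comparable temperature. For real designs, a nearest-neighbor calculation would give more reliable values than this estimate.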
We performed CPEC cloning as we would perform one cycle of PCR using a high-fidelity DNA polymerase. The reaction involved a brief denaturation step to denature the double-stranded insert and the linear vector, an annealing step for the overlapping ends of the insert and the vector to hybridize, and an extension step to form a complete plasmid (see Methods). We examined a small aliquot of the reaction mixture using DNA agarose gel electrophoresis and used another small amount for transformation.
The gel electrophoresis results showed formation of a significant amount of vector-insert merging product. An examination of the transformation results found that approximately 100% of the colonies were blue, indicating minimal cloning error or carry-over of intact vectors. Sequencing of randomly picked colonies confirmed that the cloning reaction happened exactly as expected, with no mutations at the cloning junctions.

Gene library cloning using CPEC
For individual gene cloning, we determined that one cycle of CPEC reaction was optimal. For complex gene library cloning where sufficient numbers of clones need to be obtained in order to maintain the complexity of the library, more cycles of CPEC reaction might be needed. To determine the optimal library cloning conditions with CPEC, we examined the cloning of a synthetic library containing codon variants of the lacZα gene, which was designed and synthesized for studying the effects of synonymous codon usage on protein expression. We selected the lacZα gene because we could use the blue or white color of the colonies to demonstrate the cloning and expression results. For this complex gene library, no convenient restriction sites could be found for cloning into the modified pUC19 expression vector without cutting a fraction of the insert sequences. Therefore, a sequence-independent cloning method had to be used.
We performed CPEC cloning of the library and determined the cloning efficiency at different cycle numbers (see Methods). The results indicated that 5,333 transformants were obtained from one nanogram of insert after only one cycle, which was sufficient for routine library cloning. The number of transformants obtained peaked at around 15 cycles and reached 56,000 colony forming units (c.f.u.) per nanogram of insert (Fig. 2A). As a comparison, we typically achieved approximately 1,200-1,500 c.f.u. per nanogram of control insert using the ligation method. Approximately 100% of the colonies on the positive control plate transformed with the wt-lacZα gene were blue, while the colonies with codon variations showed, as expected, a wide range of blue color intensities. We isolated and sequenced plasmids from several hundred independent colonies which showed different intensities of blue color. The sequencing results showed the presence of a distinct codon variant sequence of lacZα in every plasmid, which demonstrated correct and unbiased cloning of the complex library using CPEC. We further investigated the percentage of clones that might carry more than one insert after different numbers of CPEC cycles by single-colony PCR using primers on the vector (Fig. 2B). Sixteen colonies were randomly picked from plates of cells transformed with CPEC reaction product after 1, 2, 5 and 15 cycles. All 64 plasmids examined contained the correct, single-copy insert. This result suggested that the CPEC cloning mechanism was highly specific and did not favor carryover of concatemers, if any, into the final clones.
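The yields quoted above can be compared with a few lines of arithmetic. The sketch below uses only the numbers given in the text and reads the ligation figure as a range of 1,200 to 1,500 c.f.u. per nanogram. Because the ligation control used a different insert, the ratios are indicative only.

```python
CFU_ONE_CYCLE = 5_333            # c.f.u. per ng insert after 1 CPEC cycle
CFU_PEAK = 56_000                # c.f.u. per ng insert at around 15 cycles
LIGATION_RANGE = (1_200, 1_500)  # c.f.u. per ng control insert, ligation


def fold(numerator: float, denominator: float) -> float:
    """Fold difference between two yields."""
    return numerator / denominator


gain_from_cycling = fold(CFU_PEAK, CFU_ONE_CYCLE)
one_cycle_vs_ligation = (fold(CFU_ONE_CYCLE, LIGATION_RANGE[1]),
                         fold(CFU_ONE_CYCLE, LIGATION_RANGE[0]))
peak_vs_ligation = (fold(CFU_PEAK, LIGATION_RANGE[1]),
                    fold(CFU_PEAK, LIGATION_RANGE[0]))

print(f"15 cycles vs 1 cycle: {gain_from_cycling:.1f}-fold")
print(f"1 cycle vs ligation: {one_cycle_vs_ligation[0]:.1f}- to {one_cycle_vs_ligation[1]:.1f}-fold")
print(f"15 cycles vs ligation: {peak_vs_ligation[0]:.1f}- to {peak_vs_ligation[1]:.1f}-fold")
```

On these figures, cycling raises the yield roughly ten-fold over a single cycle, and a single cycle already exceeds the typical ligation yield several-fold.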

Combinatorial library cloning using CPEC
Construction of combinatorial libraries is extremely useful for synthetic biology and molecular evolution. We designed and tested two strategies for constructing complex combinatorial synthetic gene libraries using CPEC. The first strategy was to assemble the full-length inserts from shorter fragments first, followed by cloning the pre-assembled full-length inserts into a vector by CPEC. The second strategy was to combine the assembly and cloning steps into one CPEC reaction (Fig. 3).
For these tests, we selected a synthetic library which contained codon variants of the HIV envelope gene, gp120. The 1.7-kb full-length codon variant library was divided into two fragments of approximately equal lengths, which were synthesized separately (see Methods S1 for sequence information). The two fragments and the vector were designed to share overlapping sequences with similar melting temperatures. To test the first strategy, we pre-assembled the 1.7-kb combinatorial library using a two-step polymerase cycling assembly (PCA) reaction [19] (see Methods) and then mixed the insert with the linearized vector and performed a multi-cycle CPEC reaction. An aliquot of the reaction was taken after 5, 10 and 20 cycles, respectively, and the reaction products were analyzed by gel electrophoresis (Fig. 3A, lanes 1-3). The results indicated that after 5 cycles of CPEC, a significant amount of the full-length 6.4-kb plasmid had formed. By 10 cycles, approximately 80% of the 1.7-kb inserts had merged with the vector. After 20 cycles, all free inserts and vector DNA had merged to form the complete plasmid.
Next, we tested one-step combinatorial assembly and cloning of the library from two sub-libraries using CPEC. We mixed the two insert libraries with the linearized vector in equal molar concentrations and performed 25 cycles of CPEC. The annealing temperature and cooling rate of the annealing step were carefully controlled in order to achieve the highest hybridization efficiency and accuracy. A single band representing the 6.4-kb complete plasmid was clearly visible in gel electrophoresis analysis (Fig. 3B). We then transformed an aliquot of the reaction mixture directly into competent cells. Approximately 2.43 × 10^5 colonies were obtained from one picomole of vector DNA. We randomly picked independent colonies on the plate and performed single-colony PCR to verify the presence of the correct insert. The result showed that all 16 colonies examined contained the full-length insert, indicating a 100% cloning efficiency (Fig. 3C).

Multi-component pathway assembly using CPEC
We then tested whether CPEC can be applied for efficient assembly and cloning of multi-component pathways in a single reaction. The proposed multi-way cloning mechanism is shown in Figure 4A. Unlike single-insert CPEC, which requires only one cycle, multi-way CPEC usually requires multiple cycles to assemble a full-length product. However, unlike PCR, multi-cycle CPEC is not an amplification process and therefore will not accumulate or propagate errors generated by the DNA polymerase.
We tested multi-way CPEC by constructing and cloning a metabolic pathway for synthesizing a biodegradable plastic material, poly(3HB-co-4HB), in E. coli. The pathway consisted of four genes and additional regulatory elements. To construct the plasmid, we needed to assemble four PCR fragments of various lengths: 3280, 2959, 2047, and 171 bp. The total length of the assembly was 8360 bp (see Methods S1 online for sequence information). We mixed these fragments in equal molar concentrations and carried out 20 cycles of CPEC reaction. Gel electrophoresis analysis showed the formation of a single prominent band of approximately 8.4 kb, representing the full-length plasmid (Fig. 4B). To dissect the formation process of the full-length plasmid in a multi-cycle CPEC reaction, we analyzed the reaction intermediates after 2, 5, and 10 cycles (Fig. 4C, lanes 1-3). Discrete bands representing extension products joining neighboring pieces to form longer and longer intermediates were clearly visible. The 8.4-kb full-length band was already strong by 10 cycles as the lengths of the intermediate bands shifted upward. The results supported the proposed mechanism (Fig. 4A) and suggested that multiple cycles are necessary in order to drive the reaction to completion.
To assess the quality of the cloned plasmid, we transformed 1.25 µl of the 20-cycle reaction mixture (~20 ng of DNA) into competent E. coli cells and plated aliquots of the cells on chloramphenicol plates containing Nile Red. We calculated that approximately 1,000 transformants were obtained from 0.1 µl of the 20-cycle reaction. All of the colonies turned pink, suggesting the presence of a functional pathway [20,21]. We isolated plasmids from randomly picked colonies and performed restriction mapping. The results confirmed that all colonies examined contained the full-length plasmids (8.4 kb) with individual components at their expected positions (Fig. 4D).
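Mixing fragments of very different lengths in equal molar concentrations means using very different masses. The sketch below converts a target molar amount into nanograms for each of the four fragment lengths given above. It assumes an average mass of about 650 g/mol per base pair of double-stranded DNA, and the 0.05 pmol target is a hypothetical value, since the text does not state the amounts used.

```python
BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one double-stranded base pair

FRAGMENT_LENGTHS_BP = [3280, 2959, 2047, 171]  # lengths stated in the text


def ng_for_pmol(pmol: float, length_bp: int) -> float:
    """Mass in ng of `pmol` picomoles of a double-stranded fragment."""
    # pmol * (g/mol) gives picograms; divide by 1000 to get nanograms.
    return pmol * length_bp * BP_MASS_G_PER_MOL / 1000.0


target_pmol = 0.05  # hypothetical amount per fragment
for length in FRAGMENT_LENGTHS_BP:
    print(f"{length:>5} bp: {ng_for_pmol(target_pmol, length):.1f} ng")
```

At equal molarity the required mass scales with length, so the 171-bp fragment needs only about one-twentieth of the mass of the 3280-bp fragment.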

Discussion
Unlike any other cloning method, CPEC relies solely on the simple and robust polymerase extension mechanism to clone individual genes, libraries, or multiple fragments. In a single closed-tube reaction, the insert and vector fragments are first heat denatured, then annealed at elevated temperatures to ensure specific hybridization of overlapping regions, and finally extended to form complete plasmids, leaving only one nick in each strand. The fully-formed relaxed double-stranded plasmids are then efficiently introduced into E. coli cells where the nicks are sealed and covalently closed plasmids are formed. The most significant advantages of CPEC include accuracy, efficiency, convenience and cost-effectiveness in complex library and pathway assembly.
For routine single-gene cloning, one denaturation-annealing-extension cycle is sufficient and optimal, and can be completed in five minutes. For library cloning where sufficient numbers of clones need to be obtained in order to maintain the complexity of the library, CPEC offers the unique advantage of being able to perform multiple cycles to maximize the total number of clones constructed without using excessive amounts of vector DNA. We recommend 2-5 cycles depending on library complexity. For multi-fragment cloning, 5-25 cycles may be used depending on the number of fragments.
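These recommendations can be captured in a small lookup, shown here as a sketch. The task labels are hypothetical names chosen for illustration, and the ranges are those given in the paragraph above.

```python
from typing import Dict, Tuple

# (minimum, maximum) number of CPEC cycles per cloning task, as stated in the text.
RECOMMENDED_CYCLES: Dict[str, Tuple[int, int]] = {
    "single_gene": (1, 1),
    "library": (2, 5),
    "multi_fragment": (5, 25),
}


def recommended_cycles(task: str) -> Tuple[int, int]:
    """Return the recommended cycle range for a cloning task label."""
    try:
        return RECOMMENDED_CYCLES[task]
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None
```

Within each range, the appropriate number still depends on library complexity or on the number of fragments.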
Unlike PCR, CPEC is not an amplification process and therefore will not accumulate mutations. However, excessive numbers of cycles should be avoided in order to minimize possible concatemer formation. In cases where concatemers may form, the cloning efficiency will not be significantly affected because concatemers usually do not have the correct complementary ends for efficient circularization and therefore will not form covalently closed plasmids in the cells.
Only PCR polymerases with no strand displacement activity should be used in CPEC reactions to avoid long concatemer formation or other cloning artifacts. Many of the commercially available PCR polymerases belong to this category. The Phusion DNA polymerase was used in this study due to its robustness, speed and accuracy. The reaction conditions, especially the extension time, may need to be adjusted if other polymerases are used. PCR polymerases with low efficiency or low fidelity should be avoided for demanding cloning tasks using CPEC.
Compared to sequence-dependent cloning methods, such as those mediated by restriction-ligation or site-specific recombination, CPEC has the advantage of complete flexibility with respect to sequence junctions. Compared to other sequence-independent cloning methods, in addition to sharing all of their benefits, CPEC offers other significant advantages. First, CPEC eliminates the extra steps or enzymes required by other sequence-independent cloning methods to generate single-stranded regions for annealing. For example, in LIC, overlapping sequences lacking a particular dNTP are added to the insert by PCR and complementary 12-nt single-stranded regions in both the insert and the vector are generated by T4 DNA polymerase treatment in the presence of that particular dNTP. In UDG-based methods, a ribonucleotide U replaces a T in the PCR primers used to add overlapping sequences to the insert and subsequent treatment with UDG enzyme generates single-stranded ends in both the insert and the vector for annealing. In SLIC, T4 DNA polymerase treatment or incomplete PCR with two pairs of primers is used to generate mixed products containing single-stranded overlapping regions. In PIPE, a different version of incomplete PCR is used so that some PCR products are not fully extended and therefore leave heterogeneous single-stranded regions toward the ends. These extra preparation steps take more time and more DNA, and many require extra expensive enzymes or reagents. In contrast, CPEC uses double-stranded overlapping inserts and vector directly without any treatment. The whole single-cycle CPEC reaction can be completed in 5 minutes and uses only a PCR polymerase, making CPEC one of the most convenient, economical and versatile cloning methods, which can also be easily adapted for high-throughput cloning.
Another notable advantage of CPEC is its high cloning accuracy and efficiency, which makes it uniquely suitable for complex, combinatorial, multi-fragment or multi-library cloning. In CPEC, all overlapping regions among fragments are designed to have similar high melting temperatures (typically 55-70 °C) so annealing between fragments can be very specific. This is most desirable for complex, combinatorial or multi-fragment cloning where non-specific annealing can cause cloning errors. It is our experience that typically 95-100% of CPEC-generated colonies contain the correct inserts, including in multi-way assembly. All other sequence-independent cloning methods use ambient annealing temperatures and, as a result, the specificity and success rate of multi-way cloning can be significantly compromised.
The high cloning efficiency of CPEC, especially for multi-way or complex library cloning, comes from a combination of two special features. First, CPEC forms covalently joined complete plasmids in vitro. Second, multiple CPEC cycles can drive the reaction to near completion. In contrast, all other sequence-independent cloning methods either loosely anneal fragments without covalent bonding or allow only a small fraction of the fragments to form plasmids due to the low efficiency of multi-fragment hybridization.
With increasing demands for complex or combinatorial library cloning and multi-fragment gene pathway and network assembly, we expect CPEC to play a significant role in various applications of synthetic biology. It will enable rapid and high-throughput construction of combinatorial libraries, gene circuits and pathways. It will also liberate researchers from tedious and time-consuming everyday cloning tasks.

CPEC
We obtained linear vectors by PCR amplification and gel purified them using E.Z.N.A gel extraction kits (Omega Bio-Tek). We added vector-overlapping sequences onto the lacZα gene (with a C-terminal His6 tag) using PCR (see Methods S1 for primer sequences) and gel purified the insert. 200 ng of the linear vector was mixed with insert DNA at equal molar ratio in a 20 µl volume containing Phusion DNA polymerase reaction mixture (Finnzymes). We denatured the insert and vector mixture at 98 °C for 30 seconds, annealed them at 55 °C for 30 seconds, and performed polymerase extension for 15 seconds per kb according to the length of the longest piece. We normally added an extra extension period equivalent to 1-2 times the required extension step at the end. For average-sized vectors and inserts, the total reaction time was less than 5 minutes. We transformed 1-4 µl of the mixture into 50 µl of chemically competent GC5α cells and plated a fraction of them on carbenicillin plates with 2% X-gal.
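The single-cycle conditions above can be written out as a short program to check the total hold time. This is a sketch under stated assumptions: the extension temperature of 72 °C is taken from the multi-cycle protocol because it is not given for the single-cycle reaction, instrument ramp times between steps are ignored, and the function names are hypothetical.

```python
from typing import List, Tuple

Step = Tuple[str, float, float]  # (name, temperature in degrees C, duration in seconds)


def single_cycle_cpec(longest_kb: float,
                      sec_per_kb: float = 15.0,
                      extra_factor: float = 1.0,
                      extension_temp_c: float = 72.0) -> List[Step]:
    """Build the single-cycle CPEC steps for a given longest fragment length.

    extra_factor is the extra extension expressed as a multiple (1-2 in the text)
    of the required extension time. extension_temp_c is an assumption.
    """
    extension = sec_per_kb * longest_kb
    return [
        ("denaturation", 98.0, 30.0),
        ("annealing", 55.0, 30.0),
        ("extension", extension_temp_c, extension),
        ("extra extension", extension_temp_c, extra_factor * extension),
    ]


def total_seconds(program: List[Step]) -> float:
    """Sum of the hold times, ignoring instrument ramping."""
    return sum(duration for _, _, duration in program)
```

For a longest piece of a few kilobases, the summed hold times stay well under 300 seconds, consistent with the stated total of less than 5 minutes.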

Multi-cycle CPEC for library cloning
We set up the cloning reaction in exactly the same way as for single-cycle CPEC. After the initial 30-second denaturation step, we performed multiple cycles, each consisting of 10 seconds of denaturation at 98 °C, 30 seconds of annealing at 55 °C, and extension at 72 °C for 20-30 seconds per kb according to the length of the longest piece. We ended the reaction with an extra 5 minutes of extension. We transformed a fraction of the reaction mixture into cells and plated an aliquot of the cells on a carbenicillin plate with 2% X-gal.

Combinatorial library cloning
For the strategy combining PCA with CPEC, we set up the PCA reaction by mixing 100 ng of each sub-library (VacF1 and VacF2) with the Phusion reaction mixture in a 50 µl volume. After an initial 30-second denaturation at 98 °C, we performed 5 cycles of PCA, each consisting of 7 seconds of denaturation at 98 °C, 30 seconds of annealing at 52 °C, and extension at 72 °C for 20 seconds, and completed the reaction with an extension step at 72 °C for 5 minutes. We used 0.5 µl of the PCA reaction as a template and performed 30 cycles of PCR amplification of the full-length library using GP140R and GP140L primers (Methods S1) and the Phusion enzyme. We performed multi-cycle CPEC using the same conditions as described above, except that during the annealing step we applied slow ramping at 0.1 °C/second from 70 °C to 55 °C before annealing at 55 °C for 30 seconds.
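The effect of the slow-ramp annealing on run time can be estimated with a short calculation. The sketch below combines the multi-cycle CPEC conditions with the optional ramp from 70 °C to 55 °C at 0.1 °C/second. It assumes the lower bound of 20 seconds per kb by default and ignores all other instrument ramp times, and the function name is hypothetical.

```python
def multi_cycle_seconds(cycles: int,
                        longest_kb: float,
                        sec_per_kb: float = 20.0,
                        slow_ramp: bool = False,
                        ramp_from_c: float = 70.0,
                        ramp_to_c: float = 55.0,
                        ramp_rate_c_per_s: float = 0.1) -> float:
    """Estimated hold time in seconds for a multi-cycle CPEC run."""
    initial_denaturation = 30.0
    final_extension = 5 * 60.0
    # Per cycle: 10 s denaturation, 30 s annealing, length-dependent extension.
    per_cycle = 10.0 + 30.0 + sec_per_kb * longest_kb
    if slow_ramp:
        # Controlled cooling before the 55 degree C annealing hold.
        per_cycle += (ramp_from_c - ramp_to_c) / ramp_rate_c_per_s
    return initial_denaturation + cycles * per_cycle + final_extension
```

The ramp alone adds about 150 seconds per cycle, so for many-cycle reactions it dominates the total run time.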

Multi-way CPEC
We mixed equal molar concentrations of the insert and vector fragments for multi-way CPEC. We used an extension time sufficient to cover the full length of the plasmid. Otherwise the reaction conditions were identical to those of multi-cycle CPEC. We transformed 1.25 µl of the reaction into 50 µl of chemically competent DH5α cells and plated 100-200 µl aliquots from 1 ml of culture on 2% agar plates containing 20 µg/ml chloramphenicol and 0.5 µg/ml Nile Red.

Additional Methods
Information about plasmid construction, vector, inserts and primer sequences is available in Supplementary Methods S1.