Improved PCR Amplification of Broad Spectrum GC DNA Templates

Many applications in molecular biology can benefit from improved PCR amplification of DNA segments containing a wide range of GC content. Conventional PCR amplification of DNA sequences with regions of GC less than 30%, or higher than 70%, is complex due to secondary structures that block the DNA polymerase as well as mispriming and mis-annealing of the DNA. This complexity will often generate incomplete or nonspecific products that hamper downstream applications. In this study, we address multiplexed PCR amplification of DNA segments containing a wide range of GC content. In order to mitigate amplification complications due to high or low GC regions, we tested a combination of different PCR cycling conditions and chemical additives. To assess the fate of specific oligonucleotide (oligo) species with varying GC content in a multiplexed PCR, we developed a novel method of sequence analysis. Here we show that subcycling during the amplification process significantly improved amplification of short template pools (~200 bp), particularly when the template contained a low percent of GC. Furthermore, the combination of subcycling and 7-deaza-dGTP achieved efficient amplification of short templates ranging from 10–90% GC composition. Moreover, we found that 7-deaza-dGTP improved the amplification of longer products (~1000 bp). These methods provide an updated approach for PCR amplification of DNA segments containing a broad range of GC content.


Introduction
De novo gene synthesis relies on the chemical synthesis of oligonucleotides that are used as the building blocks for enzymatic assembly, making it possible to synthesize multi-kilobase genes and even whole genomes [1,2,3]. This technology is now more affordable than actually cloning genes and it enables the use of existing genome databases to construct any intended target. Such targets can include synthetic enzymes, reporter genes, regulatory elements, and even entire pathways [4,5,6,7]. However, most gene synthesis approaches utilize PCR amplification, which has been shown to be difficult for DNA templates with extreme GC content (i.e., greater than 75%). These sequences can be problematic during PCR amplification because they form hairpins and other secondary structures that lead to premature termination of polymerase extension, yielding incomplete and non-specific products [8].
Researchers have previously modified cycling conditions to cope with low and high GC DNA sequences. Subcycling of the annealing and elongation steps during PCR amplification of long DNA segments was shown to achieve efficient amplification, especially when the segments contain a variety of GC content [9]. For example, this method has been successfully used to detect an inversion mutation of intron 22 in the Factor VIII gene in patients with severe hemophilia A [10,11,12,13], a region of the gene known for containing long sequences with high and low GC content.
Researchers have also overcome amplification unevenness, or bias, caused by extreme GC content in a multiplexed PCR by performing PCR reactions in the presence of organic molecules such as dimethyl sulfoxide (DMSO), betaine, glycerol, or 7-deaza-dGTP. These additives improve the amplification specificity and yield [14,15,16]. Recently, researchers have reported that a combination of either two additives (betaine and DMSO) [17] or three additives (betaine, DMSO, and 7-deaza-dGTP) [18] resolves the complex secondary structures found in GC-rich sequences and synergistically enhances their amplification. To date, no better method has been clearly defined for de novo synthesis of genes containing a broad range of GC content.
In this study, we use a single-shot assembly process to synthesize genes. The process includes PCR amplification of pools of multiple short DNA templates differing in GC contents that often results in unsuccessful amplification. In order to achieve a uniform, or non-biased, amplification of the templates, we combined the subcycling protocol with chemical additives to expand the viable amplification range of short templates by greater than 10% on both ends of the GC spectrum. These pools of short templates were subsequently assembled into discrete longer templates (~1000 bp), where we further show that 7-deaza-dGTP particularly improved the amplification product specificity at the high GC end of the spectrum. This new method for PCR amplification of DNA segments with a broad range of GC sequences allows for efficient production of a variety of gene constructs.

Oligonucleotide pools
The oligonucleotide pools used in these experiments were synthetic DNA designed to have the desired GC content. Sequences are provided in supporting information S1 File. Synthesis was performed using Agilent's Sureprint Oligonucleotide Library Synthesis (OLS) technology [19]. The work presented in this publication utilized a small portion of an OLS design. Oligonucleotides contained 10% GC to 90% GC with the bases distributed randomly throughout the molecules. The pools were mixed such that each pool would have the range of oligonucleotides with differing GC content to test many combinations in one multiplexed PCR. A single-shot assembly process was used to assemble synthesized oligonucleotides into larger genes [20].

PCR conditions
The subcycling PCR of the short oligonucleotides was carried out with the Phusion HF polymerase (Thermo Fisher Scientific, Inc) as well as with the KAPA HotStart ReadyMix (Kapa Biosystems). For the Phusion PCR without additives we used 200μM dNTPs and a 1x final concentration of the Phusion 5x HF buffer from Thermo Fisher Scientific, Inc. The KAPA PCR was carried out with the master mix as indicated in the KAPA literature. To address high GC content, various additives were included in the Phusion HF PCR reactions as follows: 7-deaza-dGTP (NEB) at 40:60, 50:50, and 60:40 ratios with normal dGTP, keeping the final concentration of dNTPs constant; DMSO (Sigma) at a final concentration of 2.5%, 5%, or 10%; and betaine (Sigma) at a final concentration of 1M, 2M, or 4M. Oligonucleotide pools were amplified using one common set of primers in the multiplexed amplification so as to avoid bias from unique primers for each sequence. These primers (TGCAACCCCCAAGACAACGT, TGGTTGACTCTTGTGCCGCA) are 20 bases in length with a melting temperature of 65°C and GC content of 55%. The PCR amplification reactions for the short oligonucleotides of varying GC content were based on the subcycling protocol previously mentioned [9] with the following thermocycle conditions: 95°C for 5 min, 29 cycles at 98°C for 20 sec, with 4 subcycles of 60°C for 15 sec and 65°C for 15 sec, a final extension of 65°C for 5 min and held at 12°C. The standard PCR protocol for the short oligonucleotides, to which we compared our subcycling conditions, was: 98°C for 30 sec, 40 cycles of 98°C for 5 sec, 65°C for 10 sec and 72°C for 10 sec, then 72°C for 3 min. No additives were used in these PCR reactions.
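The reaction setup arithmetic above can be checked with a short script. This is a minimal sketch, not part of the published protocol: it recomputes the GC content of the two common primers, and it splits a dGTP pool between 7-deaza-dGTP and normal dGTP at the stated ratios. The function names are hypothetical, and the 200 μM figure is assumed here to be the concentration of each individual dNTP, which the text does not state explicitly.

```python
# Common primer pair quoted in the PCR conditions.
PRIMERS = ("TGCAACCCCCAAGACAACGT", "TGGTTGACTCTTGTGCCGCA")


def gc_percent(seq):
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def split_dgtp(total_um, deaza_percent):
    """Split a dGTP pool into (7-deaza-dGTP, dGTP) concentrations in uM.

    Assumption: total_um is the per-nucleotide concentration, so the sum of
    7-deaza-dGTP and dGTP stays equal to total_um at every ratio.
    """
    deaza = total_um * deaza_percent / 100.0
    return deaza, total_um - deaza


if __name__ == "__main__":
    for primer in PRIMERS:
        print(primer, len(primer), "nt,", gc_percent(primer), "% GC")
    for pct in (40, 50, 60):  # 40:60, 50:50 and 60:40 ratios from the text
        deaza, dgtp = split_dgtp(200.0, pct)
        print(f"{pct}% 7-deaza-dGTP: {deaza} uM 7-deaza-dGTP + {dgtp} uM dGTP")
```

Both primers contain 11 G or C bases out of 20, which agrees with the 55% GC content stated in the text.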
The longer fragments were amplified with KOD polymerase master mix (EMD Millipore). Conditions for amplification of longer fragments were: 98°C for 30 sec, 40 cycles of 98°C for 5 sec, 65°C for 10 sec and 72°C for 10 sec, then 72°C for 3 min.
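To make the difference between the two thermocycle programs concrete, the sketch below adds up the programmed hold times of the subcycling and standard protocols described above. It is an illustration only: the helper name is hypothetical, and ramp times between temperatures and the final 12°C hold are ignored, so real run times will be longer.

```python
def programmed_seconds(initial_s, n_cycles, cycle_steps_s, final_s):
    """Sum the programmed hold times of a PCR program (ramp times ignored)."""
    return initial_s + n_cycles * sum(cycle_steps_s) + final_s


# Subcycling protocol: 98°C for 20 s, then 4 subcycles of 60°C 15 s + 65°C 15 s.
SUBCYCLING_CYCLE = [20] + 4 * [15, 15]
# Standard protocol: 98°C 5 s, 65°C 10 s, 72°C 10 s.
STANDARD_CYCLE = [5, 10, 10]

# 95°C 5 min, 29 cycles, final extension 65°C 5 min.
SUBCYCLING_TOTAL_S = programmed_seconds(5 * 60, 29, SUBCYCLING_CYCLE, 5 * 60)
# 98°C 30 s, 40 cycles, final extension 72°C 3 min.
STANDARD_TOTAL_S = programmed_seconds(30, 40, STANDARD_CYCLE, 3 * 60)

if __name__ == "__main__":
    print("subcycling:", SUBCYCLING_TOTAL_S, "s programmed")
    print("standard:  ", STANDARD_TOTAL_S, "s programmed")
    # Time per cycle spent below the denaturation temperature.
    print("anneal/extend per cycle:",
          sum(SUBCYCLING_CYCLE[1:]), "s vs", sum(STANDARD_CYCLE[1:]), "s")
```

Under these assumptions each subcycling cycle spends 120 s in the combined annealing/elongation window, against 20 s for the standard cycle, even though the subcycling program uses fewer cycles.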

MiSeq sequencing and analysis
To determine the relative representation of each individual oligonucleotide in a multiplexed PCR, we used the Illumina MiSeq sequencer to identify individual sequences and count the number of appearances of each unique sequence. These oligonucleotides were first labeled with Illumina adapters, using labeling primers that contained a region identical to the oligonucleotide primers listed above together with the adapter sequence appended to the ends (TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGCAACCCCCAAGACAACGT, TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGCAACCCCCAAGACAACGT). The labeling PCR was carried out using NEB Phusion master mix with the following thermocycle: 95°C for 2 min, 5 cycles of 95°C for 30 sec, 60°C for 30 sec and 72°C for 2 min, and finally 72°C for 5 min. The labeled material was SPRI cleaned using 1.8x Agencourt AMPure XP beads (Beckman Coulter Inc.) and placed into an indexing reaction as indicated in the Illumina MiSeq prep instructions. Due to the length of these oligonucleotides they were loaded onto the MiSeq at 3pM. The data were analyzed using in-house software.
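The in-house analysis software is not described, so the following is only a sketch of the counting idea stated above: tally how often each designed oligonucleotide appears among the reads and report its share of the matched reads. It assumes exact, full-length matches between a read and a reference sequence, which real sequencing data would not satisfy without trimming and error tolerance; the function name, reference names, and sequences are all hypothetical.

```python
def relative_representation(reads, references):
    """Return {reference name: fraction of matched reads}.

    reads: iterable of read sequences (already trimmed of adapters/primers).
    references: dict mapping an oligo name to its designed sequence.
    Assumption: a read counts only if it matches a reference exactly.
    """
    name_by_seq = {seq.upper(): name for name, seq in references.items()}
    counts = {name: 0 for name in references}
    for read in reads:
        name = name_by_seq.get(read.upper())
        if name is not None:
            counts[name] += 1
    total = sum(counts.values())
    return {name: (n / total if total else 0.0) for name, n in counts.items()}


if __name__ == "__main__":
    # Toy reference set and reads, for illustration only.
    refs = {"low_gc": "ATATTA", "high_gc": "GCGGCC"}
    toy_reads = ["ATATTA", "atatta", "GCGGCC", "NNNNNN"]
    print(relative_representation(toy_reads, refs))
```

An even amplification would give every oligonucleotide in a pool a similar fraction, whereas GC bias would show up as low fractions for the extreme-GC members.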

GX Analysis
Capillary electrophoresis DNA analysis post amplification was done with the LabChip GX instrument from Perkin Elmer. The DNA 5K/RNA/CZE LabChip was used in the GX instrument as appropriate for the sizes of DNA being analyzed. LabChip GX software version 4.1 was used to analyze the GX data.

Results
Subcycling improves amplification of DNA templates containing low GC content
PCR amplification of multiple DNA oligonucleotides containing a wide range of GC content often results in uneven amplification, where one sequence in a complex pool is unintentionally amplified in preference to the others, preventing efficient assembly of a complete larger construct. We hypothesized that insertion of a subcycling step during the PCR amplification process would yield a more uniform amplification of all types of oligonucleotides in the mix. To test this hypothesis we introduced a subcycling step that alternates four times between 60°C and 65°C within each one of the regular PCR amplification cycles, as indicated in the scheme in Fig 1a. Using this subcycling protocol we first tested amplification of oligonucleotides of low GC content, ranging between 12% and 45% GC. Under these conditions we compared amplification using Phusion or KAPA DNA polymerases. KAPA HiFi DNA polymerase was chosen due to its known ability to significantly improve the quality of PCR products from difficult templates containing AT- or GC-rich sequences. As presented in Fig 1b, under normal conditions (i.e. without subcycling), Phusion produced amplified product from all low GC 200bp oligonucleotide templates, while KAPA showed poor amplification for most of the short templates. In contrast, with the subcycling protocol, the amplification of all oligonucleotides was significantly improved for both polymerases, yet there was a clear performance advantage to Phusion (Fig 1b subcycling).
To determine whether the subcycling protocol improved the evenness of amplification, pools of 6 oligonucleotides of different GC content (12% to 45% GC) were amplified as a group by multiplex PCR with and without the subcycling protocol. Following the amplification, individual oligonucleotides were subjected to a MiSeq analysis. Fig 2 shows the copy numbers of amplified oligonucleotides from each group. From the data, it is clear that the subcycling protocol greatly improved the amounts of amplified oligonucleotides, and that it was particularly effective for those oligos at the lower end of the spectrum, containing between 12% and 25% GC. Thus, the subcycling protocol resulted in a more even amplification across this range (12%-45% GC).

Combination of subcycling and 7-deaza-dGTP improves PCR amplification of oligonucleotides with high and low GC content
Prior studies report that DMSO and betaine improve PCR amplification of GC-rich DNA when added to the PCR reaction [21,22,23,24]. To examine whether such additives would improve amplification in our study, we used them to amplify oligonucleotides with a wider range of GC content (from 10% to 90%). The analysis of these data is shown in Fig 3 and clearly indicates that subcycling improves amplification of both the oligonucleotides with low GC content (10%) and those with high GC content (79%) (Fig 3 panels A&B compared to C&D). In addition, combining subcycling PCR with chemical additives, as indicated in Fig 3, yielded complete full-length product and fewer truncated non-specific products for oligonucleotide templates in the low GC range (10% GC) (Fig 3 panels A&C). In contrast, the high GC range (79% GC) templates (Fig 3 panels B&D) showed only slight amplification improvement when subcycling was combined with different additives.
Interestingly, the addition of 7-deaza-dGTP (all ranges) or betaine (0.1M) was sufficient to provide strong PCR product without any subcycling. Unexpectedly, 0.2M and 0.4M betaine resulted in yield loss when using a standard anneal temperature. Use of 7-deaza-dGTP did not result in a yield loss, regardless of the final ratio of 7-deaza-dGTP:dGTP used. With subcycling, 7-deaza-dGTP and the addition of betaine (Fig 3) reduced the truncated oligonucleotide products and clearly improved the correct amplification of the high GC oligonucleotides (Fig 3 lanes 8&9). In the following experiments, the mid-point condition of 0.2M betaine was used in order to maintain balance with subcycling, while the percentage of 7-deaza-dGTP was maximized (60%). Applying the same experimental conditions to the amplification of oligonucleotides in the middle GC range (between 30% and 70% GC) showed only a slight improvement over the normal amplification conditions (data not shown).
In order to define a PCR condition that amplifies the broadest range of GC, we compared amplification of oligonucleotides from 10% to 90% GC using the subcycling protocol combined with the following: no additives, 60% 7-deaza-dGTP, or 0.2M betaine, as presented in Fig 4. The results indicate that 60% 7-deaza-dGTP combined with the subcycling protocol (Fig 4B lane 8 is stronger than that of A and C) is the best condition, of those listed above, to amplify the broadest range of GC content.

Subcycling PCR combined with 7-deaza-dGTP increases the purity and concentration of the synthesized full-length constructs
Finally, we sought to establish a quality measurement, expressed as build success. To that end, we performed a GX analysis (see methods for details) of the correct full-length constructs following assembly. We compared the correct full-length constructs, grouped by their GC content, that were generated by the standard protocol with the same constructs generated by the modified GC protocol. Data are presented as percent build success, calculated as the number of successful assemblies of the correct full-length construct per number of attempts, for each of the synthesis protocols. As seen in the Fig 5 bar chart and Table 1, constructs containing 5-10% GC, 11-15% GC, or 81-90% GC showed decreased build success compared with sequences of more moderate GC content when the standard protocol was used. However, using the broad spectrum protocol (Fig 5 and Table 2) greatly increased the build success, from 20% to 40% for the constructs containing 5-10% GC, from 46.6% to 80% for the 11-15% GC constructs, and from 41.2% to 82.4% for the constructs containing 81-90% GC. Importantly, the modified GC protocol did not adversely affect the synthesis of constructs containing GC content between 16-80%.
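The build success metric defined above is a simple ratio, and the reported percentages can be compared directly. The sketch below is illustrative only: the function names are hypothetical, the percentages are the ones quoted in the text, and the underlying success and attempt counts are not given here, so the counts in the usage line are made up.

```python
def build_success_percent(successes, attempts):
    """Percent of assembly attempts that gave the correct full-length construct."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return 100.0 * successes / attempts


def fold_change(standard_pct, modified_pct):
    """Ratio of build success with the modified protocol to the standard one."""
    return modified_pct / standard_pct


# (standard protocol %, broad spectrum protocol %) as quoted in the text.
REPORTED = {
    "5-10% GC": (20.0, 40.0),
    "11-15% GC": (46.6, 80.0),
    "81-90% GC": (41.2, 82.4),
}

if __name__ == "__main__":
    for gc_bin, (standard, modified) in REPORTED.items():
        print(f"{gc_bin}: {standard}% -> {modified}% "
              f"({fold_change(standard, modified):.2f}x)")
    # Hypothetical counts, only to show how the metric is computed.
    print(build_success_percent(2, 5))
```

By this calculation the build success roughly doubles in the 5-10% GC and 81-90% GC bins and rises about 1.7-fold in the 11-15% GC bin.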

Discussion
The ability to design and synthesize DNA constructs has many applications within molecular biology, including the study of large sets of single genes [25], engineering of metabolic pathways for target molecule manufacturing [26], and the ability to safely obtain genes for vaccine research without the need to grow the full pathogens [27,28,29]. All methods of gene synthesis use chemically synthesized small oligonucleotides; however, it is difficult to successfully synthesize DNA sequences that are longer than 150-200 nucleotides by chemical synthesis [19,30,31]. One method uses oligonucleotides synthesized on microarray chips. While chip-based synthesis can yield up to 10^5 different sequences per chip, the total synthetic yield of any given oligonucleotide is too low to be used directly in conventional molecular biology methods. Therefore, PCR amplification is needed to increase oligonucleotide quantity. However, a major impediment to such PCR amplification is the presence of multiple oligonucleotides with a variety of GC content, including oligonucleotides containing less than 30% GC or more than 70% GC. Multiplex amplification of such an oligonucleotide mix can be uneven, resulting in low copy numbers of the high and low GC oligonucleotide products, which becomes a limiting factor for the final assembly of the larger constructs.
Here we show that a combination of subcycling and 7-deaza-dGTP in the PCR amplification step greatly improves multiplex amplification of oligonucleotides containing a wide range of GC content. During subcycling, the combined annealing/elongation step is composed of subcycles that shuttle between a low and a high temperature, which helps mitigate secondary structure formation. In a similar manner, 7-deaza-dGTP reduces secondary structure formation during PCR as it makes one less chemical contact than dGTP and behaves more like dATP. Our single-shot synthesis process involves multiplex PCR amplification of DNA segments with GC contents from low GC (10%) to high GC (90%). Subcycling PCR was previously reported to improve multiplex PCR amplification of long DNA templates containing regions with low and high GC content [9]. Similarly, we introduced 4X subcycling in the PCR amplification protocol in order to achieve a sufficient amount of amplified synthetic DNA where sequence conservation was critical. We first examined multiple oligonucleotides containing low GC sequences (in the 10%-45% range), and as shown in Fig 1b the subcycling clearly improved the amplification and produced a significantly higher copy number (Fig 2) of the low GC (12-25% GC) oligonucleotides. In this experiment, we also compared the activity of the two DNA polymerases Phusion and KAPA. Both enzymes offer high fidelity and amplify long templates. In our study, Phusion produced more consistent results for all conditions and oligonucleotides (Fig 1b). While subcycling improved amplification of oligonucleotides containing low GC (Fig 2), it had less impact on oligonucleotides containing high GC (data not shown). Thus, we combined subcycling with different chemical additives to improve amplification of higher GC material.
Recent work by Jensen et al. reported that betaine and DMSO, when added to the PCR reaction in both the amplification and assembly steps of the synthesis, greatly improved de novo synthesis. In their study, Jensen et al. used two GC-rich templates, the insulin-like growth factor 2 receptor (IGF2R) and the V-raf murine sarcoma viral oncogene homolog B1 (BRAF) [32]. Even though organic chemical additives are known to optimize and improve PCR amplification, very little is known about the precise structural features that make these additives effective [33,34]. We examined the effect of several different additives at multiple concentrations on PCR amplification of DNA oligonucleotide templates containing a wide range of GC content (from 10% GC to 90% GC). We found that 7-deaza-dGTP and betaine were both successful in amplifying higher GC material (Fig 3). However, betaine at the higher concentrations (0.2M, 0.4M) resulted in yield loss with a standard anneal temperature for higher GC material. This was an unexpected and repeatable result. We ultimately determined that 7-deaza-dGTP in combination with subcycling significantly improved the correct amplification of all the different oligonucleotides across the board, was less sensitive to cycling conditions, and produced more mass than betaine in the PCR for the 90% GC oligonucleotide sequence (Fig 4). As the data presented indicate, most additives do not improve the amplification and can result in truncated shorter products, specifically at the high GC range. To verify the significance of this result we used capillary electrophoresis to compare the correctly assembled long constructs produced by the subcycling synthesis protocol with those produced by the regular protocol. Capillary electrophoresis allows comparison of size, purity, and concentration of the assembled products.
The subcycling protocol greatly improved the synthesis as well as the correct assembly of the long constructs, especially for the constructs containing low GC content (5%-15%) and those containing high GC content (81%-90%), while not significantly affecting constructs in the middle range of GC content (16%-80%) (Fig 5, Tables 1 and 2).
This work provides an updated method for PCR amplification of DNA oligonucleotides containing a broad range of GC content without the need for additional protocol modifications or sample extraction and purification. As these challenges are addressed, synthetic biologists will be able to construct next-generation synthetic gene networks with useful applications and advance the game-changing areas of regenerative medicine, gene editing and metabolic engineering.

Supporting Information
S1 File. DNA Sequences. MS-Excel spreadsheet containing DNA sequences used in this study. (XLSX)