A Simple and Efficient Method for Assembling TALE Protein Based on Plasmid Library

The DNA-binding domain of transcription activator-like effectors (TALEs) from Xanthomonas sp. consists of tandem repeats that can be rearranged according to a simple cipher to target new DNA sequences with high DNA-binding specificity. This technology has been successfully applied to genome engineering in a variety of species. However, assembling long TALE tandem repeats remains a major challenge that limits wider use of the technology. Although several new methods for efficiently assembling TALE repeats have recently been reported, all of them require either sophisticated facilities or skilled technicians. Here, we describe a simple and efficient method for generating customized TALE nucleases (TALENs) and TALE transcription factors (TALE-TFs) based on a TALE repeat tetramer library. The library of 256 tetramers covers all possible combinations of 4 base pairs. A set of unique primers was designed to amplify these tetramers, and the PCR products were assembled in a single digestion/ligation reaction. Using our method, we generated 12 TALE constructs: 4 TALEN pairs targeting sequences in the mouse Gt(ROSA)26Sor and Mstn genes, and 4 TALE-TF constructs targeting the mouse Oct4, c-Myc, Klf4 and Sox2 promoters. Each construction took 3 days, and multiple constructs could be assembled in parallel. On average, 64% of colonies were positive in colony PCR verification. Sequencing showed that all TALE constructs were assembled successfully. The method is rapid and cost-efficient, uses only the most common enzymes and equipment, and has a high success rate.


Introduction
Efficient targeted genome editing relies on engineered nucleases: artificial proteins composed of a customizable, sequence-specific DNA-binding domain fused to a nuclease that cleaves DNA in a non-sequence-specific manner. The technology platform therefore depends heavily on engineering sequence-specific DNA-binding domains. TALEN is a newly developed genome editing technology that followed zinc finger nuclease (ZFN) technology [1]. Transcription activator-like effectors (TALEs) from the genus Xanthomonas are a family of novel DNA-binding proteins [2,3]. Deciphering the DNA-binding mechanism of TALE repeats has opened a new avenue for developing TALE-based genome editing [4,5]. The sequence specificity of a TALE is determined by highly conserved tandem repeats in its central DNA-binding domain. Naturally occurring TALE repeats consist of 33–35 amino acids, and the two repeat-variable di-residues (RVDs) at positions 12 and 13 together specify one of the four bases [4,5]. TALE repeats can be joined into arrays of custom length to target specific DNA sequences. Artificial TALE transcription factors (TALE-TFs) and TALE nucleases (TALENs) have been constructed by fusing customized repeat arrays to a transcriptional activation domain or to the FokI cleavage domain, respectively [6,7]. Customized TALE-TFs have efficiently up-regulated the expression of the endogenous human Sox2 and Klf4 genes [8] and the mouse Oct4 gene [9], which might provide a route to inducing pluripotent stem cells. Customized TALENs induce DNA double-strand breaks (DSBs), and their repair by non-homologous end joining (NHEJ) or homologous recombination (HR) has generated genome alterations in plants [6], yeast [10], zebrafish [11,12,13], Xenopus embryos [14], rat embryos [15], and human somatic cells [16,17] and pluripotent stem cells [18].
Few researchers have chosen to use commercially synthesized TALENs in their studies [16]. Although many assembly methods have recently been reported, including a series of methods derived from Golden Gate cloning [6,8,10,19,20,21,22], conventional cloning methods [12,23] and high-throughput methods [24,25], researchers interested in TALE applications still need a rapid, convenient and more cost-efficient method with a high success rate.
Here we describe a rapid, convenient and cost-efficient method with a high success rate based on the Golden Gate cloning strategy. Our results indicate that the method is feasible and may make TALEN and TALE-TF construction more widely accessible.

Molecular Biology Reagents
Restriction enzymes and T4 DNA ligase used in this study were purchased from New England Biolabs (NEB, USA). Taq and Pfu DNA polymerases were purchased from Transgen Biotech (Beijing, China). The pGEM-T easy vector was purchased from Tiangen Biotech (Beijing, China). Four monomer fragments (NI, HD, NG and NN), four pLenti-EF1a-Backbone plasmids (backbone plasmids for TALE-TF construction and expression) and the pST1374 vector (backbone plasmid for TALEN construction and expression) were obtained from Addgene (USA).
Plasmids were prepared with the Plasmid Mini Kit (Omega, USA) and DNA was gel-extracted with the AxyPrep™ DNA Gel Extraction Kit (Axygen, USA), following the manufacturers' protocols. Plasmid DNA concentration was measured with a NanoDrop 2000 spectrophotometer (Thermo, USA).

Construction of Monomer Plasmids
Four optimized monomers (NI, HD, NN and NG) with minimized repetitiveness were used as templates for monomer plasmid construction, as previously described [8]. Each monomer was amplified with each of four primer pairs: TALE-F1/TALE-R1, TALE-F2/TALE-R2, TALE-F3/TALE-R3 and TALE-F4/TALE-R4. The primer pairs (Table 1) were designed so that, after BsaI digestion, each monomer carried one of four possible pairs of 5′/3′ cohesive ends: TGAC/ACTC, ACTC/CCTC, CCTC/ATTA or ATTA/CTTA [8]. The amplified monomers were purified and ligated into the pGEM-T easy vector. Transformed cells were plated on LB solid medium containing 100 μg/ml ampicillin, with X-gal and IPTG for blue/white screening of recombinants. White colonies were picked and cultured overnight at 37 °C in LB medium with 100 μg/ml ampicillin. Monomer plasmids were sequenced with the T7 forward primer (Table 2) on the pGEM-T easy vector. In this way, a monomer library of 16 monomer plasmids was constructed.

Construction of Tetramer Plasmids
Tetramer plasmids were constructed by cut/ligation reactions based on the Golden Gate cloning strategy, using the 16 monomer plasmids as building blocks. Each cut/ligation reaction was set up in 20 μl containing 100 ng of each of 4 monomer plasmids, 1 μl BsaI (10 U/μl), 1 μl T4 DNA ligase (400 U/μl), 2 μl 10× T4 DNA ligase buffer and distilled water. The reaction was run in a thermocycler with the following program: 35 cycles of 5 min at 37 °C and 5 min at 16 °C, followed by 2 h at 16 °C and 20 min at 80 °C. 3 μl of the reaction mixture was transformed into E. coli DH5α, and the cells were plated on LB solid medium containing 100 μg/ml ampicillin. Plates were incubated overnight at 37 °C. Colonies were verified by colony PCR with primers T7 (forward primer annealing on the pGEM-T vector ~60 bp upstream of the tetramer) and SP6 (reverse primer annealing on the vector ~80 bp downstream of the tetramer). A positive clone should give a single band of 641 bp. Positive clones were grown overnight in LB liquid medium with 100 μg/ml ampicillin, and plasmids were extracted the next day. Tetramer plasmids were identified by sequencing with the T7 forward primer. By repeating these steps, we constructed a library of 256 tetramer plasmids.

Figure 1. Steps of the construction of TALENs and TALE-TFs. TALE constructs can be generated in 3 days following the described steps. Samples from each step can be stored at −20 °C for further use. doi:10.1371/journal.pone.0066459.g001
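Because the tetramer cut/ligation program is long, its run time matters when scheduling the workflow. The sketch below totals a thermocycler program described as blocks of repeated steps. It assumes the 35 cycles apply to the 37 °C/16 °C pair of steps and ignores ramp times; the names `program_seconds` and `TETRAMER_CUT_LIGATION` are illustrative only.

```python
# Tetramer cut/ligation program, written as (steps, repeats) blocks.
# Each step is (temperature_C, seconds). Assumption: the 35 cycles cover
# the 37 °C / 16 °C pair of steps.
MIN = 60

TETRAMER_CUT_LIGATION = [
    ([(37, 5 * MIN), (16, 5 * MIN)], 35),  # BsaI digestion / T4 ligation cycles
    ([(16, 120 * MIN)], 1),                # final 2 h ligation
    ([(80, 20 * MIN)], 1),                 # heat inactivation
]


def program_seconds(blocks):
    """Total run time of a thermocycler program in seconds (ramp times ignored)."""
    return sum(repeats * sum(seconds for _temp, seconds in steps)
               for steps, repeats in blocks)


total = program_seconds(TETRAMER_CUT_LIGATION)
print(f"{total // 3600} h {total % 3600 // 60} min")  # 8 h 10 min
```

The same helper can total the Day 2 colony PCR program (95 °C 4 min; 30 × [30 s + 30 s + 1 min 45 s]; 72 °C 10 min), which comes to 5790 s, or about 97 min before ramping.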

Construction of TALEN Expression Backbones
The region comprising the N-terminus, two BsmBI sites, a 0.5 repeat (encoding half of repeat NI, HD, NG or NN) and the truncated C-terminus of the TALE gene was amplified by PCR from each of the four pLenti-EF1a-Backbones with forward primer N-F and reverse primer C-R (Table 2). XbaI and BamHI restriction sites were added to the 5′ ends of the forward and reverse primers, respectively. The PCR products were cloned into the pST1374 vector, and the resulting plasmids were designated pST-TALEN-Backbones. Sequence fidelity was verified by sequencing with the T7 forward primer.

Construction Protocol for TALEN or TALE-TF Constructs
Assembly of a customized TALE-TF or TALEN construct takes about 3 days with our tetramer library. A schematic representation is shown in Figure 1. The protocol for constructing a customized TALEN with 12.5 tandem repeats is summarized as follows.
Day 1 AM. Tetramer plasmids containing repeats 1–4, 5–8 and 9–12 were amplified with primer pairs Tetramer-Fv/Tetramer-R1, Tetramer-F2/Tetramer-R2 and Tetramer-F3/Tetramer-Rv, respectively. Primer sequences are given in Table 3. Typically, the PCR reaction was prepared in a total volume of 100 μl containing 40 ng of tetramer plasmid, 10 μl of 10 mM […] size. The tetramer amplicon was purified with a PCR purification kit and eluted in 30 μl of distilled water. The purified tetramers (100 ng each) and the expression vector pST-TALEN-Backbone (200 ng) were subjected to a cut/ligation reaction in a 10 μl volume containing 1 μl BsmBI (10 U/μl), 1 μl T4 DNA ligase (400 U/μl), 1 μl 10× T4 DNA ligase buffer and distilled water. The reaction was cycled in a thermocycler: BsmBI digestion at 42 °C for 5 min followed by ligation at 16 °C, repeated for 30 cycles. Although the optimal temperature of BsmBI is 55 °C, its activity at 42 °C is adequate for the cut/ligation reaction (BsmBI still retains 20% of its cleavage activity at 37 °C), whereas higher temperatures would accelerate inactivation of the DNA ligase.
Day 1 PM. 3 μl of the cut/ligation reaction mixture was transformed into 40 μl of competent E. coli DH5α cells. Cells were plated on LB solid medium containing 100 μg/ml ampicillin and incubated overnight at 37 °C.
Day 2. Typically, hundreds to thousands of colonies grew on the plates, compared with few on the negative control. Twenty colonies were picked for colony PCR verification. Primers SeqF (forward primer annealing near the end of the N-terminal region on the backbone) and SeqR (reverse primer annealing near the beginning of the C-terminal region on the backbone) were used for amplification; their sequences are listed in Table 2. Each PCR was set up in a 20 μl reaction containing 1 μl colony suspension, 2 μl 10 mM dNTPs (2.5 mM each), 0.4 μl forward primer (5 μM), 0.4 μl reverse primer (5 μM), 0.2 μl Taq DNA polymerase (5 U/μl) and 2 μl 10× Taq DNA polymerase buffer. The thermocycler program was: 95 °C for 4 min; 30 cycles of 94 °C for 30 s, 56 °C for 30 s and 72 °C for 1 min 45 s; and a final extension at 72 °C for 10 min.
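With 20 colonies per construct, it is convenient to prepare the colony PCR as a master mix. The sketch below scales the per-reaction recipe given above. Two assumptions are not in the protocol: distilled water makes up the rest of the 20 μl volume, and a 10% excess is added to cover pipetting losses. The function name `master_mix` is illustrative.

```python
# Per-reaction volumes (ul) from the colony PCR recipe above.
PER_REACTION_UL = {
    "10x Taq buffer": 2.0,
    "dNTP mix (2.5 mM each)": 2.0,
    "SeqF primer (5 uM)": 0.4,
    "SeqR primer (5 uM)": 0.4,
    "Taq DNA polymerase (5 U/ul)": 0.2,
}
REACTION_UL = 20.0  # total reaction volume
COLONY_UL = 1.0     # colony suspension, pipetted into each tube separately


def master_mix(n_reactions, excess=0.1):
    """Scale the per-reaction recipe to n reactions plus a pipetting excess.

    Assumption: distilled water makes up the rest of the 20 ul volume.
    """
    water = REACTION_UL - COLONY_UL - sum(PER_REACTION_UL.values())
    scale = n_reactions * (1 + excess)
    mix = {name: round(vol * scale, 2) for name, vol in PER_REACTION_UL.items()}
    mix["water"] = round(water * scale, 2)
    return mix


for component, volume in master_mix(20).items():
    print(f"{component}: {volume} ul")
```

Each tube then receives 19 μl of master mix plus 1 μl of colony suspension.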
PCR products were analyzed on a 1% (wt/vol) agarose gel. For a complete insert of 12 monomers (three tetramers ligated into the TALE backbone vector) plus the half repeat on the backbone, the product should be a single band of 1535 bp. Clones giving the correct band were cultured overnight in 3 ml of LB medium with 100 μg/ml ampicillin with shaking at 37 °C.
Day 3. Plasmids were prepared with the Plasmid Mini Kit following the manufacturer's protocol and verified by XbaI/BamHI restriction digestion before being sent for sequencing with primers SeqF and SeqR.
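The 1535 bp reference band can also be used to estimate the band expected when a clone has gained or lost a tetramer. The sketch below assumes that each full repeat contributes 102 bp (a 34-amino-acid repeat; the exact length of the optimized monomers is an assumption here) and attributes the remainder of the 1535 bp product to the backbone flanks, primers and 0.5 repeat. The estimates are not measured values.

```python
REPEAT_BP = 102          # assumption: one 34-amino-acid repeat = 102 bp
OBSERVED_BP_12 = 1535    # SeqF/SeqR product for 12 full repeats + 0.5 repeat (text)
FIXED_BP = OBSERVED_BP_12 - 12 * REPEAT_BP  # everything that is not a full repeat


def expected_amplicon_bp(full_repeats):
    """Estimated SeqF/SeqR colony PCR product size for a number of full repeats."""
    return FIXED_BP + full_repeats * REPEAT_BP


for n in (8, 12, 16):  # one tetramer lost, correct array, one tetramer gained
    print(n, expected_amplicon_bp(n))
```

Under these assumptions, a clone that lost or gained one tetramer differs from the correct product by about 408 bp, which should be distinguishable on a 1% gel.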

Construction of a Tetramer Library
We previously assembled a TALEN pair targeting the human CCR5 gene into our pST-TALEN-Backbones following the described method [8], and found that it was not only time-consuming but also of low fidelity. This prompted us to build a tetramer library covering all 256 possible repeat combinations for binding 4 nucleotides, to serve as building blocks for assembling TALE proteins. The repeat tetramers in the library are all flanked by BsmBI sites, which generate TGAC at the 5′ end and CTTA at the 3′ end after digestion. To generate different cohesive ends for assembling tetramers, we designed BsmBI-tagged primers that yield the required cohesive ends and used them to amplify the tetramers (Figure 2). The first primer pair, used to amplify the first tetramer, generates a 5′ cohesive end compatible with the corresponding end of the digested vector and a 3′ cohesive end compatible with the 5′ end of the second tetramer, which is amplified with the second primer pair. The second primer pair generates a tetramer whose 5′ end is compatible with the 3′ end of the first tetramer and whose 3′ end is compatible with the 5′ end of the third tetramer, and so on. In theory, a TALE protein containing up to 30 monomers can be assembled at once.
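The library sizes follow directly from the RVD code (NI = A, HD = C, NG = T, NN = G): a block of k repeats can read 4^k different k-nucleotide targets. The sketch below enumerates such a library as a mapping from target bases to RVD arrays; the function name `library` is illustrative. With k = 4 it gives the 256 tetramers, and k = 3 or 2 gives 64 trimers or 16 dimers.

```python
from itertools import product

# RVD code used by the monomers in this method.
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def library(k):
    """Map every k-nucleotide target to the k-repeat RVD array that reads it."""
    return {"".join(bases): tuple(RVD_FOR_BASE[b] for b in bases)
            for bases in product("ACGT", repeat=k)}


tetramers = library(4)
print(len(tetramers))     # 256
print(tetramers["GATC"])  # ('NN', 'NI', 'NG', 'HD')
```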
To construct the tetramer library efficiently, we first generated a monomer library of 16 monomer plasmids. Four pairs of unique primers were designed so that the amplified monomers could be ligated into any chosen position (1, 2, 3 or 4) of a tetramer. The four monomers recognizing the four DNA bases (NI = A, HD = C, NG = T and NN = G) were used as templates for PCR amplification with these 4 primer pairs, yielding 16 plasmids. The fidelity of all 16 monomer plasmids was confirmed by sequencing. A tetramer library covering all 256 possible combinations of four monomers was then constructed by cut/ligation reactions based on the Golden Gate cloning strategy. Because BsaI and BsmBI have been shown to retain activity in DNA ligase buffer, digestion and ligation can conveniently be performed in one tube simply by changing the temperature in a thermocycler. We designed the tetramer so that BsmBI sites are located at the 5′ end of monomer 1 and at the 3′ end of monomer 4, while the remaining monomer ends are joined after BsaI digestion. To assemble a tetramer, we selected 4 monomer plasmids from the monomer library and carried out the cut/ligation reaction in one step with BsaI and T4 DNA ligase (Figure 3). A BsaI site in the ampicillin resistance gene of the pGEM-T vector improved the efficiency of tetramer construction: after BsaI digestion, this site cut the monomer 2 and monomer 3 plasmids into 3 fragments, likely preventing self-ligation of these plasmids. Sequencing showed that some colonies either carried mutations introduced by PCR or contained only 3 monomers. The efficiency of tetramer assembly was about 85%, and the 256-tetramer library was constructed in 4 weeks.
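The one-pot tetramer assembly works because each monomer position carries a unique pair of BsaI cohesive ends (TGAC/ACTC, ACTC/CCTC, CCTC/ATTA, ATTA/CTTA), so the fragments can join in only one order. The sketch below is an idealized model that assumes exact overhang matching only and ignores the vector and misligation. It orders fragments by chaining each 3′ overhang to the next 5′ overhang and reports a failure when a position is missing, as in the 3-monomer clones described above. The names `POSITION_ENDS` and `assemble` are illustrative.

```python
# BsaI cohesive ends (5' overhang, 3' overhang) of a monomer at each tetramer position.
POSITION_ENDS = {
    1: ("TGAC", "ACTC"),
    2: ("ACTC", "CCTC"),
    3: ("CCTC", "ATTA"),
    4: ("ATTA", "CTTA"),
}


def assemble(fragments, start="TGAC", end="CTTA"):
    """Order (name, five_prime, three_prime) fragments into one linear chain.

    Idealized: overhangs pair only with exact matches. Raises ValueError if
    the fragments cannot form a single chain from start to end.
    """
    by_five = {}
    for name, five, three in fragments:
        if five in by_five:
            raise ValueError(f"two fragments share the 5' overhang {five}")
        by_five[five] = (name, three)
    order, current = [], start
    while current != end:
        if current not in by_five:
            raise ValueError(f"no fragment carries the 5' overhang {current}")
        name, current = by_five.pop(current)
        order.append(name)
    if by_five:
        raise ValueError("unused fragments: " + ", ".join(n for n, _ in by_five.values()))
    return order


rvds = ["NN", "NI", "NG", "HD"]  # reads the target GATC
fragments = [(f"{rvd}@{pos}", *POSITION_ENDS[pos]) for pos, rvd in enumerate(rvds, start=1)]
print(assemble(list(reversed(fragments))))  # ['NN@1', 'NI@2', 'NG@3', 'HD@4']
```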
Using a similar approach, we also constructed a trimer library covering all 64 combinations of 3 monomers (targeting 3 nucleotides) and a dimer library covering all 16 combinations of 2 monomers (targeting 2 nucleotides).
Using these libraries (monomer, dimer, trimer and tetramer) as building blocks, we can assemble a TALE protein containing any number of monomer repeats.
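One way to choose building blocks for an arbitrary array length is sketched below. This splitting rule is a hypothetical strategy, not one specified in the text: it uses as many tetramers as possible and covers the remainder with a single trimer, dimer or monomer. The 0.5 repeat supplied by the backbone is not counted.

```python
def split_into_blocks(full_repeats, block=4):
    """Hypothetical plan: tetramers first, then one smaller block for the remainder."""
    if full_repeats < 1:
        raise ValueError("need at least one full repeat")
    blocks = [block] * (full_repeats // block)
    if full_repeats % block:
        blocks.append(full_repeats % block)  # trimer, dimer or monomer
    return blocks


print(split_into_blocks(12))  # [4, 4, 4]
print(split_into_blocks(14))  # [4, 4, 4, 2]
```

Placing the smaller block last is arbitrary. In practice, the position of each block would follow the primer pair used to amplify it.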

Assembling Tetramers for Construction of TALEN or TALE-TF Expression Vectors
All repeat tetramers in the library plasmids are flanked by BsmBI sites, which generate TGAC at the 5′ end and CTTA at the 3′ end after digestion. For assembly, each tetramer was amplified with a primer pair chosen according to its position in the TALE protein. The primers were designed with bases mismatched to the tetramer template so that, after BsmBI digestion, each yields a unique 4-bp cohesive end (the last nucleotide of a Gly codon followed by the three nucleotides of a Leu codon) (Figure 4, Table 3). For a TALE protein with any number of tetramers, the first tetramer must be amplified with Tetramer-Fv and the last with Tetramer-Rv, because their outer ends must be compatible with the vector ends. Using this approach, we assembled 3 tetramers, equivalent to 12 monomer repeats (Figure 5). The 3 tetramers were selected according to the target sequence. The first tetramer was amplified with Tetramer-Fv/Tetramer-R1, and the second and third tetramers with Tetramer-F2/Tetramer-R2 and Tetramer-F3/Tetramer-Rv, respectively. The PCR products were gel-purified, and a cut/ligation reaction with the backbone vector was set up in one tube containing BsmBI and T4 DNA ligase. Once the correct tetramers are cut out and ligated into the expression vector, the vector no longer contains any BsmBI site. In contrast, expression vectors without inserts are cut into 2 pieces by BsmBI during a 1 h incubation at 55 °C, which prevents false-positive clones.
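The primer assignment rule above (Tetramer-Fv on the first tetramer, Tetramer-Rv on the last, numbered primers in between) can be written as a small lookup. For 3 tetramers, the sketch reproduces the pairs used in the text. Its extension to more tetramers assumes that the additional primers in Table 3 (Tetramer-R3 to Tetramer-F7) follow the same Fi/Ri numbering, which is not confirmed in the text. The function name `primer_pairs` is illustrative.

```python
def primer_pairs(n_tetramers):
    """Primer pair for each tetramer position.

    Assumption: intermediate primers follow the Tetramer-Fi / Tetramer-Ri
    numbering; only the 3-tetramer case is given explicitly in the text.
    """
    if n_tetramers < 1:
        raise ValueError("need at least one tetramer")
    pairs = []
    for i in range(1, n_tetramers + 1):
        fwd = "Tetramer-Fv" if i == 1 else f"Tetramer-F{i}"            # first matches vector
        rev = "Tetramer-Rv" if i == n_tetramers else f"Tetramer-R{i}"  # last matches vector
        pairs.append((fwd, rev))
    return pairs


print(primer_pairs(3))
# [('Tetramer-Fv', 'Tetramer-R1'), ('Tetramer-F2', 'Tetramer-R2'), ('Tetramer-F3', 'Tetramer-Rv')]
```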
Each TALEN backbone contains a CMV promoter, a FLAG epitope tag, a nuclear localization signal (NLS), the truncated N- and C-terminal regions of the Xanthomonas campestris pv. armoraciae TALE hax3, two closely spaced BsmBI sites oriented in opposite directions, a 0.5 repeat encoding one of the RVDs (NI, HD, NG or NN) and the wild-type FokI nuclease domain (Figure 6). The TALE-TF backbones were the pLenti-EF1a-Backbones used in the previous study [8].

Testing of TALENs or TALE-TFs Assembly
To test the feasibility of our method, we constructed 4 TALEN pairs targeting sequences in the mouse Gt(ROSA)26Sor and Mstn genes, and 4 TALE-TF constructs targeting the mouse Oct4, c-Myc, Klf4 and Sox2 promoters. Binding sites were selected with the online tool TAL Effector-Nucleotide Targeter 2.0 (TALE-NT 2.0; https://boglab.plp.iastate.edu/) [26]. All constructs contained 12.5 repeats. Binding sites and the spacer lengths between TALEN pair sites are shown in Table 4.
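For a 12.5-repeat construct, the target splits naturally into three tetramers plus the base read by the backbone's 0.5 repeat. The sketch below turns a binding site into an assembly plan: three tetramer plasmids, the full RVD array, and the choice among the four backbones. It assumes the input is the 13 bases read by the repeats, excluding the 5′ T conventionally required upstream of TALE binding sites. The function name `design_12_5` is illustrative.

```python
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def design_12_5(target):
    """Plan a 12.5-repeat TALE from the 13 bases read by its repeats.

    Assumption: `target` excludes the 5' T preceding the binding site.
    """
    target = target.upper()
    if len(target) != 13 or set(target) - set("ACGT"):
        raise ValueError("expected 13 bases of A/C/G/T")
    return {
        "tetramer_plasmids": [target[i:i + 4] for i in range(0, 12, 4)],
        "rvds": [RVD_FOR_BASE[b] for b in target],
        "backbone_half_repeat": RVD_FOR_BASE[target[12]],  # 0.5 repeat on the backbone
    }


plan = design_12_5("GATCCTAGTTGCA")
print(plan["tetramer_plasmids"], plan["backbone_half_repeat"])  # ['GATC', 'CTAG', 'TTGC'] NI
```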
Each TALEN or TALE-TF construct was assembled within 3 days before being sent for sequencing. For each of the 12 constructs, we picked 20 colonies for verification by colony PCR. Positive rates for the different constructs ranged from 50% to 90% (Tables 4 and 5). In total, 153 of 240 colonies were positive, an average positive rate of about 64%. Five positive clones of each construct were sent for sequencing. According to the sequencing results, all constructs were generated successfully, although a few carried point mutations (Table S1). Some false clones were also sequenced (Text S2). These results showed that false clones were mainly caused by the gain or loss of one tetramer, resulting from incorrect ligation between similar adaptors (GCTC and ACTC in primers Tetramer-R2 and Tetramer-Rv, which can cause tetramer 2 to ligate directly to the backbone). We therefore redesigned Tetramer-R2 and Tetramer-F3 into more divergent primers (RDT-R2 and RDT-F3 in Table 3) and used them to construct a TALEN pair targeting the sequence 5′-tccggagccccccctacctccgggggctcgcccgcgtca-3′. The positive rate was 80% (data not shown).
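Misligation between adaptors that differ by a single base, such as GCTC and ACTC, can be flagged before primers are ordered. The sketch below is a simple Hamming-distance heuristic, not a model of ligase fidelity. It compares every pair of overhangs both as written and against the other's reverse complement (which would allow an inverted junction), and the one-mismatch threshold is an assumption. The function names are illustrative.

```python
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def risky_pairs(overhangs, max_mismatch=1):
    """Flag overhang pairs within max_mismatch bases of each other.

    Each pair is compared as written and against the other's reverse
    complement; returns (a, b, distance) tuples.
    """
    flagged = []
    for a, b in combinations(overhangs, 2):
        d = min(hamming(a, b), hamming(a, revcomp(b)))
        if d <= max_mismatch:
            flagged.append((a, b, d))
    return flagged


print(risky_pairs(["TGAC", "GCTC", "ACTC", "CTTA"]))  # [('GCTC', 'ACTC', 1)]
```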

Discussion
Although engineering a TALE protein to target a specific DNA sequence is much easier than engineering a zinc finger protein, assembling TALE repeats is technically challenging. We sought a rapid, inexpensive and convenient method with a high success rate for TALE assembly and construction that any laboratory with basic molecular biology equipment can adopt quickly.
Compared with traditional cloning approaches, the Golden Gate cloning strategy is a convenient and fast method for assembling multiple fragments [27]. With our repeat tetramer library and cut/ligation reactions, TALE constructs can be generated as quickly as with other reported Golden Gate-based methods [8,19]. Because we start from tetramers, only one cut/ligation reaction is needed to generate TALE constructs with repeat arrays of sufficient length (12.5 repeats). We used PCR amplification to mutate the BsmBI cut sequences flanking the tetramers, so that only one tetramer library was needed rather than three. Because only basic molecular biology procedures (PCR, gel purification, cut/ligation and transformation) with the most common enzymes and equipment are required, the method described here is convenient and can be carried out in any laboratory.
Incorrect ligation between incompatible adaptors may be the main factor lowering the rate of positive clones. Replacing the codons in the adaptors with more divergent ones may promote accurate ligation. Because linearized vectors are transformed into E. coli competent cells together with correctly constructed vectors, homologous recombination between repeats may produce incomplete repeat arrays and increase the number of false clones. Using PlasmidSafe exonuclease, as previously described [19], may solve this problem. Because the method relies on PCR, amplification errors are unavoidable. In future work, a high-fidelity (HiFi) DNA polymerase could be tried for tetramer amplification to reduce mutations. The length of the tandem repeat array can be chosen freely by combining our tetramer, trimer, dimer and monomer plasmids. We also designed additional primer pairs for assembling longer repeat arrays (Table 3, Tetramer-R3 to Tetramer-F7). However, longer repeat arrays cannot currently be verified by sequencing, because sequencing reads longer than 700–800 bp are not reliable and internal sequencing primers cannot bind specifically among the repeats. Despite these limitations, the success rates demonstrate the feasibility of this method.

Text S1 Sequencing result of a correct Gt(ROSA)26Sor-L1 colony. Five positive colonies of Gt(ROSA)26Sor-L1 were sent for sequencing. The sequencing result of one of the four correct colonies is given in the text.

(DOC)
Text S2 Sequencing result of a Gt(ROSA)26Sor-L1 colony that lost a tetramer. A false colony of Gt(ROSA)26Sor-L1 was sent for sequencing. The sequencing result presented in the text indicates that the last repeat tetramer, between adaptors GCTC and ACTC, was lost; the resulting adaptor, ACTC, is marked in red. (DOC)