Restriction Site Extension PCR: A Novel Method for High-Throughput Characterization of Tagged DNA Fragments and Genome Walking

Background: Insertion mutant isolation and characterization are extremely valuable for linking genes to physiological function. Once an insertion mutant phenotype is identified, the challenge is to isolate the responsible gene. Multiple strategies have been employed to isolate unknown genomic DNA that flanks mutagenic insertions; however, all these methods suffer from limitations due to inefficient ligation steps, inclusion of restriction sites within the target DNA, and non-specific product generation. These limitations become close to insurmountable when the goal is to identify insertion sites in a high-throughput manner.
Methodology/Principal Findings: We designed a novel strategy called Restriction Site Extension PCR (RSE-PCR) to efficiently conduct large-scale isolation of unknown genomic DNA fragments linked to DNA insertions. The strategy is a modified adaptor-mediated PCR without ligation. An adaptor, with complementarity to the 3′ overhang of the endonuclease (KpnI, NsiI, PstI, or SacI) restricted DNA fragments, extends the 3′ end of the DNA fragments in the first cycle of the primary RSE-PCR. During subsequent PCR cycles and a second semi-nested PCR (secondary RSE-PCR), touchdown and two-step PCR are combined to increase the amplification specificity of target fragments. The efficiency and specificity were demonstrated in our characterization of 37 tex mutants of Arabidopsis. All the steps of RSE-PCR can be executed in a 96-well PCR plate. Finally, RSE-PCR serves as a successful alternative to Genome Walker, as demonstrated by gene isolation from maize, a plant with a more complex genome than Arabidopsis.
Conclusions/Significance: RSE-PCR has high potential application in identifying tagged (T-DNA or transposon) sequences or walking from known DNA toward unknown regions in large-genome plants, with likely application in other organisms as well.


Introduction
Linking gene identity to function is critical for genetic approaches to unravelling complex biological phenomena. For mutant genes disrupted by DNA insertions, the DNA insertion acts as a tag to enable the identification of the mutated gene. Obtaining flanking DNA is also valuable for isolating sequences upstream or downstream from a gene fragment. Unfortunately, especially for high-throughput screens, the current methods of isolating flanking DNA sequence are less than ideal. There are three types of PCR-based techniques for walking into an unknown region from a known genomic fragment. The first type, which includes TAIL-PCR, uses nested specific primers from the ends of the known region and degenerate primers that anneal randomly with the genome to obtain unknown flanking fragments [1,2]. The second type usually digests the genomic DNA with a restriction enzyme to generate an overhang, followed by ligation of a complementary adaptor. Primers derived from the adaptor and the known sequence amplify the flanking sequences through successive rounds of PCR [3-12]. The third type, such as inverse PCR (iPCR), begins like the second with digestion of genomic DNA with a restriction enzyme; however, subsequent intramolecular ligation generates a small DNA circle. Two primers designed in opposite directions from the known fragment can then amplify the unknown junction region [13-18]. TAIL-PCR and iPCR have been used widely for identifying genes from Arabidopsis and rice. TAIL-PCR usually requires three rounds of amplification and special treatment of PCR samples before direct sequencing, and non-specific products are often a problem. iPCR requires sufficiently long sequences for two pairs of nested primers and the presence of two appropriate restriction sites within an amplifiable range.
Adaptor-based PCR usually suffers from non-specific amplification from the adaptor primers, and panhandle suppression may be inadequate, especially when the genome in question is very complex.
We have modified adaptor-based PCR such that non-specific amplification is reduced and ligation is avoided. Specifically, we designed a novel PCR strategy called Restriction Site Extension PCR (RSE-PCR). Genomic DNA targets are specifically and efficiently amplified through two rounds of PCR. During the first cycle of the first round of RSE-PCR (primary RSE-PCR), a short extension of 5 seconds extends the 3′ end of the endonuclease-restricted DNA fragments through a 5 bp terminal region complementary to the 3′ end of the 1st adaptor primer. At the same time, the 1st adaptor primer cannot complete its own extension in such a short period along the majority of genomic DNA templates, which are kilobases long. During the subsequent cycles and the second round of semi-nested RSE-PCR (secondary RSE-PCR), touchdown and two-step PCR are combined to further enhance the amplification specificity.
The success of this novel strategy is demonstrated by the isolation of T-DNA flanking sequences from 23 out of 37 Arabidopsis mutants of interest, and of an unknown fragment of a particular maize gene. The ease and specificity of RSE-PCR support the efficacy of this approach for high-throughput application in genetics and for genome walking in diverse organisms, including those with large, complex genomes.

Materials and Methods
An ethics statement is not required for this work.

Genomic DNA isolation and restriction
Genomic DNA was isolated from young leaves of Arabidopsis and maize as described [19]. One µL (500 ng to 1 µg) of genomic DNA was digested with 10 units of restriction endonuclease generating 3′ overhangs in a 100 µL volume containing 1× BSA, 1× buffer and 1 µL RNase A (10 mg/mL) for 3 hours under appropriate temperatures. The restriction endonucleases were subsequently heat inactivated.

PCR primers and conditions
All primers were synthesized by GenoMechanix (Gainesville, FL) or Invitrogen, and are summarized in Table 1. One microlitre of the above restricted genomic DNA was added to a 10 µL primary RSE-PCR reaction comprising 0.5 µL of each primer (10 µM; one is JL270, BIL1 or LB1; the other is AdKpnI, AdNsiI, AdPstI, or AdSacI), 1× PCR buffer, 0.5 µL of 50 mM MgCl2, 0.5 µL of dNTP (2.5 mM each), and 0.25 U of Platinum Taq Polymerase (Invitrogen). The primary RSE-PCR program was performed as shown in Table 2. After amplification, 190 µL of autoclaved ddH2O was added to each sample to make a 20-fold dilution, from which 1 µL was removed for the secondary RSE-PCR. The secondary RSE-PCR contained the same ratio of reagents in a 20 µL volume, except that it used a nested specific primer from the known sequence (such as JL202, BIL2 or LB2) and the 2nd general adaptor primer (AP), as described in Table 2.
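The reagent amounts above translate into final concentrations and a dilution factor by simple arithmetic. The following is a minimal sketch, assuming that the stated reaction volume (10 µL) is the final volume including template and that the primer stock is 10 µM; the function names are illustrative and not part of the published protocol.

```python
def final_concentration(stock_conc, stock_volume_ul, total_volume_ul):
    """Concentration after dilution, in the same unit as stock_conc."""
    return stock_conc * stock_volume_ul / total_volume_ul


def dilution_factor(sample_volume_ul, added_water_ul):
    """Fold dilution obtained by adding water to a sample."""
    return (sample_volume_ul + added_water_ul) / sample_volume_ul


# Assumption: 10 uL is the final primary reaction volume (template included).
PRIMARY_VOLUME_UL = 10.0

primer_um = final_concentration(10.0, 0.5, PRIMARY_VOLUME_UL)  # primer, in uM
mgcl2_mm = final_concentration(50.0, 0.5, PRIMARY_VOLUME_UL)   # MgCl2, in mM
dntp_mm = final_concentration(2.5, 0.5, PRIMARY_VOLUME_UL)     # each dNTP, in mM

# 190 uL of water added to the 10 uL primary reaction before the secondary PCR.
fold = dilution_factor(PRIMARY_VOLUME_UL, 190.0)

print(f"primer: {primer_um} uM, MgCl2: {mgcl2_mm} mM, dNTP: {dntp_mm} mM each")
print(f"dilution before secondary RSE-PCR: {fold}-fold")
```

Because the secondary reaction keeps the same reagent ratios in 20 µL, the final concentrations are unchanged and only the volumes double.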

Gel analysis and DNA sequencing
Five µL of the secondary RSE-PCR products were loaded in a 1.2% agarose gel stained with ethidium bromide in 1× TAE or 1× TBE buffer and visualized under a UV illumination system. The remaining 15 µL of PCR products were purified through a Sephadex G-50 column and subjected to sequencing (Lone Star Labs, Houston, TX).

Principle of RSE-PCR and optimization of PCR parameters
In 1993, Upcroft and Healey employed PCR priming from SacI-restricted DNA of Giardia duodenalis (an intestinal protozoan parasite, genome size ≈ 12 Mb) to successfully extend the 5′ flanking fragment of a drug resistance related gene [20]. Although there was no description of their PCR procedure, their idea could be extended and tested in large-scale plant genetics. We designed the 1st adaptor primers to contain a core part of 22 bp (GTAATACGACTCACTATAGGGC, a derivative of the Genome Walker upper adaptor strand (Clontech)) and a 3′ terminus of 5 bp (GTACC for KpnI, TGCAT for NsiI, TGCAG for PstI, and AGCTC for SacI), as shown in Table 1. Theoretically, the recognition site of a six-base-pair endonuclease occurs once in every 4^6 (4096) base pairs, meaning that the average size of the restricted genomic DNA is about 4 Kb. If the sequence around the middle of a fragment is known, the isolation of its flanking 5′ and 3′ parts (about 2 Kb each) will be compatible with the amplification capacity of Platinum Taq Polymerase (Invitrogen). The chance of successful isolation will be further increased through separate digestions with four different endonucleases.
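The primer design and the site-frequency estimate described above can be sketched in a few lines. This is only an illustration: the core and 3′ terminus sequences are those quoted in the text, the recognition sequences are the standard ones for these four enzymes, and the function names are hypothetical. Table 1 remains the authoritative source for the actual primer sequences.

```python
# 22-nt core shared by all 1st adaptor primers (as quoted in the text).
CORE = "GTAATACGACTCACTATAGGGC"

# Enzyme-specific 5-nt 3' termini (as quoted in the text).
THREE_PRIME_TERMINI = {
    "KpnI": "GTACC",
    "NsiI": "TGCAT",
    "PstI": "TGCAG",
    "SacI": "AGCTC",
}

# Standard 6-bp recognition sequences; each enzyme leaves a 4-nt 3' overhang.
RECOGNITION_SITES = {
    "KpnI": "GGTACC",
    "NsiI": "ATGCAT",
    "PstI": "CTGCAG",
    "SacI": "GAGCTC",
}


def adaptor_primer(enzyme):
    """Assumed construction: core followed by the enzyme-specific terminus."""
    return CORE + THREE_PRIME_TERMINI[enzyme]


def expected_site_spacing(site_length, alphabet_size=4):
    """Average spacing of a site in random sequence with equal base use."""
    return alphabet_size ** site_length


for enzyme, terminus in THREE_PRIME_TERMINI.items():
    # Each terminus equals the last five bases of the recognition site.
    assert RECOGNITION_SITES[enzyme].endswith(terminus)
    primer = adaptor_primer(enzyme)
    print(enzyme, primer, len(primer))

print("expected spacing of a 6-bp site:", expected_site_spacing(6), "bp")
```

The 4096 bp figure assumes random sequence with equal base frequencies; real genomes deviate from this, which is one reason to digest separately with several enzymes.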
During the primary RSE-PCR, a 5-second extension during the first cycle extends the 3′ end of the endonuclease-restricted DNA strands through a 5 bp terminal region complementary to the 3′ end of the 1st adaptor primer, whereas the extension of the 1st adaptor primer along the majority of genomic DNA templates is not completed. Subsequent specific exponential amplification of the target is favored through the combination of touchdown, two-step and semi-nested PCR strategies and is driven by primers from a known fragment such as a T-DNA border sequence. This gives rise to the 5′ flanking sequence of a known fragment (Figure 1).
If nested reverse primers are used instead, the 3′ flanking sequence can be isolated from a known sequence. Five microlitres of the secondary PCR products are gel-checked as detailed in Materials and Methods, and if the result is positive, the remaining 15 µL of PCR products are purified through Sephadex G-50 and subjected to sequencing.
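To illustrate why separate digestions with four endonucleases raise the chance of obtaining an amplifiable fragment, the sketch below scans a sequence for the nearest recognition site lying before a known primer position. It is a toy model, not part of the published method: the function names, the example sequence and the 3000 bp size limit (chosen only because the reported products reached nearly 3 Kb) are assumptions.

```python
# Standard recognition sequences of the four 3'-overhang enzymes used here.
RECOGNITION_SITES = {
    "KpnI": "GGTACC",
    "NsiI": "ATGCAT",
    "PstI": "CTGCAG",
    "SacI": "GAGCTC",
}


def nearest_site_distances(sequence, primer_pos):
    """Distance (bp) from primer_pos back to the closest site of each enzyme.

    Only sites lying entirely before primer_pos (string coordinates) are
    considered. The value is None when the enzyme has no such site.
    """
    seq = sequence.upper()
    distances = {}
    for enzyme, site in RECOGNITION_SITES.items():
        idx = seq.rfind(site, 0, primer_pos)  # last occurrence before primer
        distances[enzyme] = None if idx == -1 else primer_pos - idx
    return distances


def usable_enzymes(distances, max_bp=3000):
    """Enzymes whose nearest site is within an assumed amplifiable size."""
    return sorted(e for e, d in distances.items() if d is not None and d <= max_bp)


# Hypothetical flanking sequence with one PstI site and one SacI site.
example = "AAAA" + "CTGCAG" + "T" * 20 + "GAGCTC" + "A" * 10
result = nearest_site_distances(example, len(example))
print(result)
print(usable_enzymes(result))
```

In this toy example only PstI and SacI yield a candidate product, so a line digested with KpnI or NsiI alone would give no band. Running all four digestions in parallel covers such cases.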

Isolating T-DNA flanking sequence in Arabidopsis transformed with different vectors
To elucidate molecular mechanisms involved in the complex regulation of the TCH4 (TOUCH4) gene [21], one transgenic line harboring the −258 to +48 region of TCH4 fused to LUC in the Col-0 background was mutagenized with the pSuperTag2 vector to generate T-DNA insertion mutations [22,23]. Genetic screens identified 37 mutants that showed altered TCH4 expression (tex) after heat shock. Previous attempts with TAIL-PCR worked with only one of the 37 tex mutants (unpublished data, Luis & Braam). Using RSE-PCR, sequences flanking T-DNA insertions were isolated and sequenced from 23 of the 37 tex mutants (Table 3). Figure 2 shows representative RSE-PCR products from one tex mutant digested with four restriction endonucleases. The RSE-PCR product size ranged from about 300 bp to nearly 3 Kb (data not shown). All the purified RSE-PCR products sequenced with the JL202 primer contained T-DNA left border sequence and genomic sequence from Arabidopsis. Flanking sequences in the remaining 14 tex mutants could not be isolated, possibly due to tandem insertion, lack of intact T-DNA border sequence, DNA rearrangement, or complicated DNA context [24,25]. One tex mutant contains two insertions. Six tex mutants contain insertions in exons or introns, four downstream of protein-coding regions, and 14 upstream of protein-coding regions.

Figure 1. During the primary RSE-PCR, a short extension in the first cycle extends the 3′ end of the endonuclease (PstI as an example) restricted DNA fragments through a 5 bp terminal region complementary to the 3′ end of the 1st adaptor primer (AdPstI), whereas the extension of the 1st adaptor primer (AdPstI) along the majority of genomic DNA templates is not completed (not shown here). Subsequent specific exponential amplification of the target is favored through the combination of touchdown, two-step and semi-nested PCR strategies (secondary RSE-PCR), and is driven by a T-DNA primer (for example, JL270 from pSuperTag2 for primary RSE-PCR and JL202 from pSuperTag2 for secondary RSE-PCR) or a gene-specific primer. In this case, the 5′ flanking sequence from a known fragment will be extended. The 3′ flanking sequence will be isolated if nested reverse primers from the known fragment are used. Note that the sizes of the primers and the genomic DNA fragment are not to scale. doi:10.1371/journal.pone.0010577.g001

In addition, we found that RSE-PCR also works with other vectors commonly used in SALK and SAIL T-DNA insertion lines. The insertion sites of the xth22-A (SAIL_158_A07) and xth24-1 (SALK_005941.51.20.x) mutants were successfully analyzed; the nested primers from the left borders of the vectors used in generating SAIL (LB1 and LB2) and SALK (BIL1 and BIL2) lines are listed in Table 1.

Isolation of multiple insertions in a single line
Multiple bands could be amplified after the secondary RSE-PCR, which suggests the presence of several T-DNA insertions in the line. After gel separation of the bands, 1-200 µL pipette tips were used to pick up a tiny piece of agarose gel directly from individual PCR bands under UV light. These were resuspended in 20 µL of autoclaved water by pipetting up and down several times. One microlitre was then used for another round of PCR with the same primers and cycling program as in the secondary RSE-PCR. The PCR product was gel-checked and purified as described above and subjected to sequencing. As an example, tex87 mutants were found to contain two T-DNA insertions: one is 2982 bp from the 5′ end of AT5G67390, and the other is 562 bp downstream of At2g16290 (F-box family protein).

Isolating unknown sequence from a particular known gene sequence in different plant species
The above work suggests that as long as the sequence of a DNA fragment is known, the specific flanking sequence can be isolated; therefore, we next tested the feasibility of the approach in more complex plant genomes. The maize ns2 gene from the B73 inbred line was specifically amplified after SacI restriction (AdSacI and ZmPFR13 for the primary RSE-PCR, and AP and ZmPFR14 for the secondary RSE-PCR). Sequencing with primer ZmPFR14 recovered 863 bp of sequence, identical to that obtained previously with the Genome Walker Kit from Clontech [26]. These data suggest that RSE-PCR can substitute for the Genome Walker Kit in gene cloning. Together, the data here indicate that a new strategy, RSE-PCR, has high potential application in identifying tagged (T-DNA or transposon) sequences or walking from known DNA toward unknown regions in large-genome plants, with likely application in other organisms as well.

Table 3. T-DNA insertion sites in 23 tex mutants obtained with RSE-PCR.

Figure 2. Gel image of one representative tex mutant (tex34) after two rounds of RSE-PCR. Lanes 1-4 were tex34 restricted with KpnI, NsiI, PstI and SacI, while lanes 5-8 were tex34 without any digestion. Primers AdKpnI (lanes 1 and 5), AdNsiI (lanes 2 and 6), AdPstI (lanes 3 and 7) and AdSacI (lanes 4 and 8) were used respectively in four primary RSE-PCRs together with JL270 from pSuperTag2, while primers JL202 from pSuperTag2 and AP were used for all four secondary RSE-PCRs. Lanes 9 and 10 were 100 bp and lambda BstEII ladders. Five µL of each secondary RSE-PCR product was loaded in a 1.2% agarose gel stained with ethidium bromide in 1× TBE buffer. The arrows indicate the specific amplification products. doi:10.1371/journal.pone.0010577.g002

Author Contributions
Conceived and designed the experiments: JJ JB. Performed the experiments: JJ. Analyzed the data: JJ JB. Contributed reagents/materials/analysis tools: JB. Wrote the paper: JJ JB. Major corresponding author: JB.