Fragmentation Through Polymerization (FTP): A new method to fragment DNA for next-generation sequencing

Fragmentation of DNA is the first critical step in preparing nucleic acids for next-generation sequencing (NGS). Here we report a novel Fragmentation Through Polymerization (FTP) technique, a simple, robust, and low-cost enzymatic method of fragmentation. The method generates double-stranded DNA fragments that are suitable for direct use in NGS library construction and eliminates the additional step of DNA end repair.


Introduction
Next Generation Sequencing (NGS) has become one of the most widely used techniques in genomic research and genetic diagnostics. Fragmentation of DNA is the first main step in preparing a sequencing library for NGS. The well-known NGS technologies, such as Illumina or Ion Torrent, generate a plethora of reads with lengths under 600-1000 bases. For library preparation, purified DNA samples are sheared into shorter fragments, then platform-specific adapters are ligated to the molecules to provide primer-binding sites for further amplification and sequencing. The high resolution of NGS is achieved by representing every DNA region in multiple different reads, regardless of its sequence and context. In other words, the sequences of the fragments must overlap. Thus, the quality of NGS is largely dependent on the randomness of DNA fragmentation and the overlap of the resulting library fragments. This makes the fragmentation step critical in the process of library construction.
There are three typical approaches to shorten long DNA for library preparation: physical (by using acoustic sonication or by hydrodynamic shearing), enzymatic (based on the usage of endonucleases or transposase) and chemical shearing (by hydrolyzing DNA through heating it with divalent metal cations) [1,2].
Acoustic shearing with Covaris ultrasonicators (Covaris, Woburn, MA, USA) is currently the gold standard for fragmentation at random nucleotide locations for NGS library construction; random fragmentation is very important for high-quality NGS library sample preparation. Unfortunately, the equipment can be financially inaccessible for many laboratories [3]. An additional disadvantage of acoustic shearing is that it can be a source of oxidative damage to DNA that may result in sequencing artifacts [4]. Enzymatic methods and acoustic shearing have similar levels of efficiency, but enzymatic methods do not need expensive equipment [2]. Commercially available Fragmentase (New England Biolabs, Ipswich, MA, USA) and Nextera tagmentation (Illumina, San Diego, CA, USA) are the most popular enzymatic techniques. Nextera uses a transposase to simultaneously fragment dsDNA and insert adapters into it [5]. Fragmentase contains two enzymes: one randomly nicks dsDNA and the other cuts the strand opposite to the nicks [2]. Enzymatic digestion is simple and very efficient, but it may introduce an enzymatic bias, such as insertions and deletions (indels) [2,6]. These biases are associated with DNA sequence content and may produce a non-random fragmentation [6].
DNA fragments obtained by physical fragmentation or by the Fragmentase method require repair of the DNA ends before ligation with adapters during subsequent NGS library construction [1,2]. To improve the protocol for NGS library generation and eliminate the end repair stage, we have developed a new enzymatic method for DNA fragmentation: Fragmentation Through Polymerization (FTP). Our FTP method is based on the use of two enzymes: a nonspecific endonuclease, which randomly nicks dsDNA (DNase I), and a thermostable DNA polymerase with strong strand-displacement activity (SD DNA polymerase) [7]. At the first stage of FTP, DNase I introduces nicks into the dsDNA, and at the second stage, SD DNA polymerase elongates the 3'-ends of the nicks in a strand-displacement manner. As a result, FTP generates multiple double-stranded DNA fragments with extended overlapping sequences at the ends (Fig 1). Additionally, the SD polymerase adds 3'-A-overhangs, which make the fragments suitable for direct ligation with T-tailed DNA adapters without requiring DNA end repair.
A random fragmentation process is an important feature for high-quality NGS library sample preparation. It is known that DNA cleaving is not an entirely random process because cleaving/nicking enzymes, including DNase I, are sequence-dependent [8,9], and physical methods for fragmentation are partly sequence-specific as well [10,11]. Like other enzymatic methods, FTP relies on a nicking enzyme, in this case DNase I. In contrast to other digesting techniques, the fragments obtained by FTP from a long DNA molecule have overlapping sequences at the ends (Fig 1), which may help to overcome the problem of sequence-dependent DNA nicking by DNase I.
Here we describe the detailed FTP method of DNA fragmentation and compare it with the well-known and widely used Fragmentase technique (New England Biolabs). Systematic comparison of Fragmentase with other fragmentation methods has been described earlier [2].
NEBNext dsDNA Fragmentase and the NEBNext Ultra II DNA Library Prep kit were supplied by New England Biolabs, Inc. (Ipswich, MA, USA). Russian State Budget. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Syntol JSC provided support in the form of salary for Y.A. M. and reagents for NGS-analysis, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of this author are articulated in the 'author contributions' section. Evrogen JSC provided equipment for NGS-analysis and provided support in the form of salary for author K.A.B., but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the 'author contributions' section. Bioron GmbH provided enzymes and reagents (including SD DNA polymerase), but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

dsDNA Fragmentation Through Polymerization (FTP)
For fragmentation, 200 ng of gDNA of the E. coli strain BL21(DE3) was added to the following reaction mixture: 1X reaction buffer for SD polymerase (Bioron GmbH), 3.5 mM MgCl2, 0.25 mM dNTPs (each), 1 ng/μl DNase I, and 1.5 U/μl SD DNA polymerase. The total volume of the reaction was 25 μl. The reaction mixture was assembled at 4˚C (on wet ice). The fragmentation of gDNA was carried out by two-step incubation: 20 minutes at 30˚C and then 20 minutes at 70˚C. For incubation, we used a thermal cycler with a heated lid. The reaction was stopped by cooling the mixture down to 10˚C. The mixture was diluted 1:1 with sterile water, and fragmented DNA was purified with SPRI beads. After the amplification stage, all libraries were quantified with a Quant-iT PicoGreen dsDNA Assay Kit (Molecular Probes, Inc., Eugene, OR, USA) and with the Agilent 2200 TapeStation Instrument with a D1000 Tape System (Agilent Technologies, Waldbronn, Germany), pooled (500 ng of each), and purified with AMPure XP beads.
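The final concentrations listed above translate directly into total amounts per 25 μl reaction. The following is a minimal arithmetic sketch of that conversion; the function and constant names are illustrative and not part of the protocol, and stock concentrations are deliberately left out because they are not stated in the text.

```python
# Final concentrations and volume as stated in the protocol text.
REACTION_VOLUME_UL = 25.0
GDNA_INPUT_NG = 200.0
DNASE_I_NG_PER_UL = 1.0
SD_POLYMERASE_U_PER_UL = 1.5
DNTP_MM_EACH = 0.25
MGCL2_MM = 3.5


def reaction_totals(volume_ul=REACTION_VOLUME_UL):
    """Return the total amount of each component in one reaction.

    Illustrative helper: 1 mM equals 1 nmol/ul, so mM * ul gives nmol.
    """
    dnase_ng = DNASE_I_NG_PER_UL * volume_ul
    return {
        "dnase_i_ng": dnase_ng,
        "sd_polymerase_u": SD_POLYMERASE_U_PER_UL * volume_ul,
        "dntp_nmol_each": DNTP_MM_EACH * volume_ul,
        "mgcl2_nmol": MGCL2_MM * volume_ul,
        # Mass ratio of template DNA to nicking enzyme.
        "gdna_to_dnase_mass_ratio": GDNA_INPUT_NG / dnase_ng,
    }


if __name__ == "__main__":
    for name, value in reaction_totals().items():
        print(f"{name}: {value}")
```

Scaling the reaction to another volume only requires passing a different `volume_ul`, provided the gDNA input is adjusted separately.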

NGS and bioinformatic analysis
The pooled libraries were sequenced with the Illumina MiSeq Instrument (Illumina, California, USA) with a 300 Cycles MiSeq Sequencing Kit v2 in paired-end mode, resulting in 12×10^6 reads. Each of the reads was approximately 150 nt long. The FASTQ files generated on the instrument were uploaded to the NCBI Sequence Read Archive under project ID: PRJNA509202.

Digestion of gDNA with the FTP method
We compared two enzymatic methods of dsDNA fragmentation for NGS library construction: digestion with Fragmentase from New England Biolabs and FTP. The FTP method consists of two enzymatic reactions: random DNA nicking and elongation of the 3' ends of the nicked DNA in a strand-displacement manner. As a result, multiple double-stranded DNA fragments with overlapping sequences at the ends are generated. The general overview of the FTP method is outlined in Fig 1. We carried out FTP in a one-tube format as described above. Mesophilic DNase I and thermophilic SD DNA polymerase were added to the reaction mixture that contained the gDNA of the E. coli strain BL21(DE3). The reaction was incubated at 30˚C for 20 minutes, followed by an additional 20 minutes at 70˚C. DNase I has an optimum performance temperature between 30˚C and 40˚C. During the first stage of incubation at 30˚C, DNase I introduced nicks into the dsDNA. To obtain fragments of the desired average size, we tested different DNase I concentrations and incubation times (S1 Fig). During the second stage, the DNase I was heat-inactivated and the SD polymerase was activated by increasing the reaction temperature to 70˚C. The SD polymerase is a Taq DNA polymerase mutant that has strong 5'-3' strand-displacement and 5'-3' polymerase activities [7]. It has neither 5'-3' nor 3'-5' exonuclease activity. Unlike natural enzymes with strong strand-displacement activity, such as Phi29 or Bst polymerase, which are stable and active below 70˚C, SD polymerase is stable up to 93˚C and has its optimum level of enzymatic activity at 70-75˚C. Additionally, the enzyme adds 3'-A-overhangs, which make the product of its polymerization suitable for ligation with T-tailed DNA adaptors. These properties of SD DNA polymerase make it very suitable for the FTP technique.
In summary, DNase I generated 3' ends by nicking dsDNA at 30˚C, and SD polymerase then used these ends for strand-displacement DNA polymerization at 70˚C, which resulted in separated dsDNA fragments (Fig 1). As a result, A-tailed dsDNA fragments with overlapping sequences and with an average size of about 500 bp (in a range from 150 to 1500 bp) were obtained from the intact gDNA. Agarose-gel electrophoresis of gDNA fragmented by FTP is shown in Fig 2. As seen in this figure, both DNase I and SD polymerase are required for the DNA fragmentation and complete separation of the fragments (Fig 2, lanes 4 and 5).
Fragmentase and other methods of fragmentation, with the exception of Illumina's Nextera tagmentation, generate DNA fragments by introducing nicks and counter nicks in DNA strands that dissociate at 8-12 nucleotides downstream or upstream from the nick site. Thus, the generated fragments need repair of DNA ends for the subsequent NGS library construction [1,2]. Unlike in other methods, in FTP the DNA fragments are separated by strand-displacement DNA polymerization and not by counter nicks. SD polymerase also carries out A-tailing of the ends. As a result of FTP, double-stranded DNA fragments have ends that are suitable for direct NGS library construction, and the additional step of DNA end repair is no longer necessary.

NGS library construction from Fragmentase- and FTP-digested gDNA
Two techniques, FTP and standard Fragmentase, were used to digest the gDNA of the E. coli strain BL21(DE3). The fragmented DNA samples were then used for the construction of NGS libraries with the NEBNext Ultra II DNA Library Prep Kit from New England Biolabs. Four libraries were prepared from the DNA samples digested with Fragmentase by the standard protocol, which included the stage of DNA end repair.
Another four libraries were prepared using the same NEBNext kit, but the DNA samples for these libraries were generated with the FTP method without the stage of DNA end repair. It is worth noting that when the DNA fragments are obtained by physical fragmentation or by the Fragmentase method, the repair of the DNA ends is necessary for library construction [1,2]. The FTP method does not require this step; therefore, the procedure of NGS library preparation is simpler. As mentioned above, FTP generates A-tailed DNA fragments which are suitable for direct ligation with T-tailed adaptors. As a result, the preparation time for an NGS library decreased by 70 minutes, from 180 minutes (with the end repair stage) to 110 minutes (without the end repair stage).
The DNA amount in each library was quantified with the Quant-iT PicoGreen dsDNA Assay Kit and with the Agilent 2200 TapeStation. All NGS libraries generated with both the Fragmentase and the FTP method contained similar amounts of dsDNA (800 ± 50 ng) and had similar mean insert sizes in a range from 400 to 500 bp. This result shows that the yield of the NGS libraries generated with the FTP method is comparable to the yield obtained with the Fragmentase technique.

Assessment of NGS libraries generated from Fragmentase- and FTP-digested gDNA
The NGS libraries of E. coli BL21(DE3) gDNA were sequenced at 48× depth with an Illumina MiSeq Instrument. The raw data (about 220 Mb for each DNA sample) generated in this study have been deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive under BioProject accession number PRJNA509202 (https://www.ncbi.nlm.nih.gov/sra/PRJNA509202).
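The stated 48× depth is consistent with the data volume per sample. The sketch below shows the arithmetic under one loudly stated assumption: the genome size is taken as roughly 4.56 Mb for the NC_012971.2 reference, a value that is not given in the text and should be checked against the record itself. The function name is illustrative.

```python
# ASSUMPTION: approximate length of the E. coli BL21(DE3) reference
# (NC_012971.2); this figure is not stated in the text.
ASSUMED_GENOME_SIZE_BP = 4_560_000


def mean_depth(total_bases, genome_size_bp=ASSUMED_GENOME_SIZE_BP):
    """Mean sequencing depth = sequenced bases / genome size."""
    return total_bases / genome_size_bp


if __name__ == "__main__":
    # About 220 Mb of raw data per DNA sample, as stated in the text.
    per_sample = mean_depth(220e6)
    print(f"Per-sample depth: {per_sample:.1f}x")

    # Cross-check from the run totals: 12e6 reads of ~150 nt over 8 libraries.
    per_library_bases = 12e6 * 150 / 8
    print(f"Bases per library: {per_library_bases / 1e6:.0f} Mb")
```

Both routes give a value close to the reported depth, so small errors in the assumed genome size do not change the conclusion.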
Different fragmentation and NGS library preparation protocols could potentially affect the quality of the reads. We therefore estimated the quality of reads as described in [2] for comparison of different fragmentation methods. PHRED quality scores for each base provide a sequencing error estimate and are a good tool to assess the quality of sequences and to compare the reliability of different sequencing runs on the same instrument [16]. We did not detect any significant differences in the quality scores obtained from the Fragmentase and FTP NGS libraries (S2 Fig). The randomness of DNA digestion for both fragmentation methods was compared using nucleotide composition plots, which show the mean base composition for every read cycle of NGS; the composition at the beginning of the reads indicates the quality of the random fragmentation (Fig 3A). The difference between the mean base composition for every read cycle and the average base composition in the reads was estimated using the chi-squared test (Fig 3B).
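The per-cycle comparison described above can be sketched as follows. This is a minimal illustration, not the exact procedure used for Fig 3: it assumes the expected counts at each cycle are the overall base frequencies in the reads scaled by the number of bases observed at that cycle, and it returns only the chi-squared statistic (no p-value). All function names are hypothetical.

```python
from collections import Counter

BASES = "ACGT"


def per_cycle_counts(reads, n_cycles):
    """Count A/C/G/T at each of the first n_cycles read positions."""
    counts = [Counter() for _ in range(n_cycles)]
    for read in reads:
        for i, base in enumerate(read[:n_cycles]):
            if base in BASES:  # skip N and other ambiguity codes
                counts[i][base] += 1
    return counts


def overall_frequencies(reads):
    """Average base composition over all positions of all reads."""
    total = Counter()
    for read in reads:
        total.update(b for b in read if b in BASES)
    n = sum(total.values())
    return {b: total[b] / n for b in BASES}


def chi_squared_per_cycle(reads, n_cycles):
    """Chi-squared statistic of each cycle against the average composition."""
    expected_freq = overall_frequencies(reads)
    stats = []
    for cycle in per_cycle_counts(reads, n_cycles):
        n = sum(cycle.values())
        chi2 = 0.0
        for b in BASES:
            expected = expected_freq[b] * n
            if expected > 0:
                chi2 += (cycle[b] - expected) ** 2 / expected
        stats.append(chi2)
    return stats
```

A high value at the first few cycles, falling to a low plateau afterwards, is the pattern that points to sequence preference at the fragmentation sites.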
The deviations of the plots from the average base composition in the first three positions of the reads (Fig 3A) and the increased chi-squared value at the first positions of the reads (Fig 3B) indicate that the sites of DNA fragmentation for both enzymatic methods are partly associated with DNA sequence content. This is no surprise because all methods of fragmentation are partly sequence-specific [2,6,10,11]. We expected FTP digestion to be less random than Fragmentase digestion because DNase I, which is used in FTP, is a sequence-dependent enzyme [8,9]. However, the FTP method provided better randomization of the fragmentation sites than Fragmentase (Fig 3). Perhaps the generation of overlapping sequences at the ends of FTP fragments (Fig 1) counterbalances the sequence-dependent DNA nicking by DNase I.
For the efficient and complete extraction of information from the NGS assay, the full and uniform representation of the whole genome sequence in the NGS library is essential. Among other factors, this heavily depends on the level of randomization during the fragmentation step of the library preparation. To assess the representation of the sequences in the FTP and Fragmentase libraries, we visualized the read coverage uniformity over the genome (Fig 4A) and GC coverage bias (Fig 4B) for both methods. The E. coli BL21(DE3) genome sequence (NCBI Ref Seq: NC_012971.2) was used as the reference sequence.
To evaluate the read coverage uniformity throughout the genome, Lorenz curves were used. A Lorenz curve shows the cumulative fraction of reads as a function of the cumulative fraction of the genome. The plotted curves (Fig 4A) demonstrate that both the Fragmentase and FTP methods exhibit the same uniformity. GC coverage bias plots allow the evaluation of the read coverage depending on GC content. A normalized (relative) coverage in the plots is a relative measure of sequence coverage by the reads at a particular GC content. The plot visualizes the normalized coverage across the entire GC spectrum by grouping all 100-base sliding windows across the genome by their GC content and reporting the average normalized coverage for each GC content percentage. A normalized coverage of 1 indicates that a particular base is covered at the expected average rate. A relative coverage above 1 indicates higher than expected coverage, and a value below 1 indicates lower than expected coverage. The obtained GC bias plots (Fig 4B) demonstrate similarly uniform coverage across GC content for both methods, while FTP provides more uniform coverage for GC-rich sequences.
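Both plots described above can be computed from a per-base depth vector. The sketch below is illustrative only: it approximates the cumulative fraction of reads by the cumulative fraction of aligned bases, uses a naive window loop that is adequate for a bacterial genome but not optimized, and the function names are hypothetical rather than taken from any tool used in the study.

```python
def lorenz_curve(coverage):
    """Return (genome_fraction, coverage_fraction) points.

    Positions are sorted by depth, so a perfectly uniform library
    gives the diagonal and any unevenness bends the curve below it.
    """
    ordered = sorted(coverage)
    total = sum(ordered)
    points = [(0.0, 0.0)]
    running = 0
    for i, depth in enumerate(ordered, start=1):
        running += depth
        points.append((i / len(ordered), running / total))
    return points


def gc_bias(genome, coverage, window=100):
    """Average normalized coverage for each GC percentage.

    Every sliding window is binned by its GC content; the mean depth of
    the windows in a bin is divided by the genome-wide mean depth, so a
    value of 1 means coverage at the expected average rate.
    """
    mean_cov = sum(coverage) / len(coverage)
    sums, counts = {}, {}
    for start in range(len(genome) - window + 1):
        seq = genome[start:start + window]
        gc_pct = round(100 * (seq.count("G") + seq.count("C")) / window)
        win_cov = sum(coverage[start:start + window]) / window
        sums[gc_pct] = sums.get(gc_pct, 0.0) + win_cov
        counts[gc_pct] = counts.get(gc_pct, 0) + 1
    return {gc: sums[gc] / counts[gc] / mean_cov for gc in sorted(sums)}
```

Comparing two libraries then amounts to overlaying their Lorenz points and their GC-bin values on shared axes.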
There are several key characteristics of NGS that depend on the quality of the library: genome coverage, identity with a reference sequence, the rate of errors, and the number of unmappable sequences. These characteristics were estimated for different sequencing depths of the NGS libraries. For the simulation of different depths, random samples of NGS reads were generated. To compare the genome coverage (the total number of aligned bases in the reference divided by the genome size), we used the genome sequence NCBI Ref Seq: NC_012971.2 as the reference with the assumption that this represented 100% coverage. For the computation of genome coverage, a base in the reference genome is counted as aligned if there is at least one contig with at least one alignment to this base. Contigs from repeat regions may map to multiple places and thus may be counted multiple times in this quantity. Unmappable sequences were calculated as a rate of unmappable reads. A large fraction of these reads  Table 1. The statistics for the Fragmentase and FTP NGS libraries were calculated from the data of the four independent libraries for each fragmentation method. The detailed data for each NGS library are shown in the Supporting information (S1 Table). The obtained characteristics are identical or very similar for the assembled sequences from the libraries generated by the different methods (Table 1). The FTP method gives a greater proportion of unmappable reads compared to Fragmentase, but the difference is less than 1% of all reads in the library. This may be explained by the assumption that FTP generates additional non-specific sequences during the polymerization stage of the fragmentation. Potentially, FTP may increase the level of mismatches, because SD polymerase does not have proofreading activity. In practice, we did not see any significant difference between the methods.
The ratio of FTP to Fragmentase mismatches is 1 for deep sequencing and 1.08 for shallow (3× depth) sequencing.
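The depth simulation and the genome coverage measure described above can be sketched as follows. This is a minimal illustration under stated assumptions: reads are subsampled uniformly at random with a fixed seed, alignments are given as half-open (start, end) intervals on the reference, and each reference base is counted once even if several alignments cover it. The function names are hypothetical and do not correspond to the software used in the study.

```python
import random


def reads_for_depth(depth, genome_size, read_length):
    """Number of reads needed to reach a target mean depth."""
    return round(depth * genome_size / read_length)


def subsample(reads, depth, genome_size, read_length, seed=0):
    """Draw a random subset of reads that simulates a lower depth."""
    n = min(len(reads), reads_for_depth(depth, genome_size, read_length))
    return random.Random(seed).sample(reads, n)


def genome_coverage(intervals, genome_size):
    """Fraction of reference bases covered by at least one alignment.

    Intervals are half-open (start, end); overlapping intervals are
    merged on the fly so that each base is counted once.
    """
    covered = 0
    current_end = 0
    for start, end in sorted(intervals):
        start = max(start, current_end)
        if end > start:
            covered += end - start
            current_end = end
    return covered / genome_size
```

For example, a 3× sample of a 4.56 Mb genome with 150 nt reads would need `reads_for_depth(3, 4_560_000, 150)` reads; the genome size here is an assumed round figure, not a value from the text.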
To evaluate the de novo genome assembly of the Fragmentase and FTP libraries, we used QUAST software (quality assessment tool for genome assemblies) [15]. We compared the following assembly metrics:
• Number of contigs: the total number of contigs in the assembly.
• Largest contig: the length of the largest contig in the assembly.
• Total length: the total number of bases in the assembly.
• N50 and N75: the contig length such that using equal or longer length contigs produces at least 50% and 75% (respectively) of the bases of the assembly length [15,17,18].
• NG50 and NG75: the contig length such that using equal or longer length contigs produces at least 50% and 75% (respectively) of the length of the reference genome, rather than 50% and 75% of the assembly length [15,17,18].
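The N50/N75 and NG50/NG75 definitions above can be expressed in a few lines. The sketch below is an illustration of the definitions only, not QUAST's implementation; the function name `nx` and its parameters are hypothetical.

```python
def nx(contig_lengths, fraction, target_length=None):
    """Contig length at which the cumulative length reaches a fraction.

    With target_length=None the threshold is a fraction of the assembly
    length (N50, N75). Passing the reference genome length instead gives
    the NG variant (NG50, NG75). Returns None if the assembly is too
    short to reach the threshold.
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = target_length if target_length is not None else sum(lengths)
    threshold = fraction * total
    running = 0
    for length in lengths:
        running += length
        if running >= threshold:
            return length
    return None


if __name__ == "__main__":
    contigs = [80, 70, 50, 40, 30, 20, 10]  # toy assembly, 300 bases total
    print("N50:", nx(contigs, 0.50))
    print("N75:", nx(contigs, 0.75))
    print("NG50:", nx(contigs, 0.50, target_length=400))
```

Because NG50 is tied to the reference length, it stays comparable between assemblies of different total size, which is why both variants are reported.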
The assembly metrics were calculated for different sequencing depths of the libraries obtained with the Fragmentase and FTP methods. The mean statistics calculated from the data
In summary, the Fragmentation Through Polymerization method is a novel, robust, and simple method of DNA fragmentation which is suitable for NGS. In comparison with Fragmentase, it provides very similar characteristics for NGS libraries. Potential disadvantages of FTP are associated with biases of the enzymes used in the method, such as non-random DNA fragmentation and mismatch errors. These characteristics of FTP were compared with the Fragmentase method. The experimental data demonstrate that FTP yields more random fragmentation (Fig 3) and better coverage of GC-rich content (Fig 4B) than Fragmentase. Levels of mismatch errors are similar for both methods. FTP generates a greater number of unmappable reads than Fragmentase, but the difference is less than 1% of all reads in the library.
The main advantage of the FTP method lies in the simplification of NGS library preparation by eliminating the DNA end repair and A-tailing stage from the protocol. As a result, the working time of the procedure can be decreased from 180 minutes to 110 minutes (the repair/A-tailing stage takes 70 minutes according to the manual). Additionally, it can reduce the price of the library preparation. For example, the current price of the NEBNext Ultra II DNA Library Prep kit for 24 reactions is 535 Euros; the price of the NEBNext Ultra II End Repair/dA-Tailing Module for 24 reactions is 262 Euros. Thus, the elimination of this module from the kit can decrease the cost of NGS library preparation.
Based on our data, we hope that the FTP method can become a helpful tool for NGS.
Supporting information S1