PCR-free whole exome sequencing: Cost-effective and efficient in detecting rare mutations

In this study, we describe the development of a PCR-free whole exome sequencing method. Using this method, 2 μg DNA was sufficient for library preparation for whole exome sequencing. Furthermore, the method is simple and makes use of a commercial kit, with the additional step of concentrating the captured library by ethanol precipitation. The accuracy of the PCR-free method was found to be equivalent to that of the unique molecular identifier-corrected analysis method, which is commonly used to detect rare mutations. Thus, the PCR-free whole exome sequencing method is cost-effective as well as efficient in detecting rare mutations.


Introduction
Whole exome sequencing (WES) with next-generation sequencing (NGS) is a powerful and cost-effective method for detecting mutations and small indels in all exons, and is widely utilized for analyses of inherited diseases [1][2][3]. The application of WES has been extended to analyses of somatic mutations [4][5][6]. However, polymerase chain reaction (PCR) error during library preparation is the most persistent obstacle to the detection of de novo, low-frequency mutations [7][8][9][10]. Unique molecular identifiers (UMIs) have been developed to detect rare mutations with NGS [11]. The UMI method uses molecular tags to recover the original sequence and to quantify unique DNA and RNA molecules. Moreover, duplex sequencing, in which the tags present on each end of the paired reads are utilized, is a very powerful method with extremely low error rates [12][13][14]. Many kits with UMI are provided by manufacturers for DNA-Seq and RNA-Seq, and these kits have become easy for customers to use, since most come with their own data analysis software. However, these UMI-based kits, including those for WES, are expensive. Furthermore, in clinical applications, a large number of samples must be checked for rare de novo mutations in cancer tissues, and quality inspection is required before transplantation of human iPS cells. Therefore, in the current study, we attempted to develop a PCR-free WES technique to detect rare mutations in a cost-effective manner.

Ultrasonication of DNA
To keep the ultrasonic fragmentation conditions identical across libraries, a total of 20 μg DNA was divided into two tubes (10 μg DNA/tube), sheared using a Covaris instrument (Covaris, MA, USA), and used for every library preparation.

PCR-free library preparation of DNA sheared by Covaris and adaptor ligation using KAPA Hyper Prep Kit (PCRfree-Soni)
Approximately 20 μg sonicated DNA was size-selected using 2% agarose gel electrophoresis. The DNA from 100 bp to 300 bp was excised; the size-selected DNA was not stained with ethidium bromide (EtBr); instead, a precut marker DNA lane stained with EtBr was used as a guide for DNA size. Subsequently, the DNA was extracted from the gel using the Wizard SV Gel and PCR Clean-Up System (Promega, WI, USA), according to the manufacturer's instructions (Fig 1). The extracted DNA was then purified with AMPure XP (Beckman Coulter, USA) (Fig 2). Of the 4.43 μg of purified DNA, 4 μg DNA was subjected to end repair, A-tailing, and adaptor ligation with the KAPA Hyper Prep Kit (Kapa Biosystems, MA, USA), according to the manufacturer's instructions.

PCR-free library preparation of DNA sheared enzymatically, followed by adaptor ligation using NEBNext Ultra II FS DNA Library Prep Kit (PCRfree-Frag)
Using the NEBNext Ultra II FS DNA Library Prep Kit for Illumina (NEB, MA, USA), 2 μg DNA was fragmented by DNA Fragmentase at 37 °C for 15 min, followed by end repair, A-tailing, and adaptor ligation according to the manufacturer's instructions, except that the ligation condition used was 4 °C for 4 h to maximize the efficiency of adaptor ligation (Table 1).

Target enrichment with SureSelect XT Human All Exon V5 kit
The input DNA amount for hybridization with the V5 kit varied from 500 ng to 3000 ng depending on the library preparation method (Table 1). The quality and concentration of the libraries were verified using the Agilent 2100 Bioanalyzer and Qubit Fluorometer (Thermo Fisher Scientific, MA, USA), respectively. Target enrichment of all libraries was conducted according to the manufacturer's instructions.

Library quantification and sequencing of PCR-free captured libraries
Eluted PCR-free captured libraries (25 μL) were mixed with an equal volume of 0.2 N NaOH and allowed to stand for 3 min at room temperature to separate the capture probes. After the released probes were removed with magnetic beads, the supernatant containing the single-stranded PCR-free libraries was neutralized with 50 μL of 200 mM Tris-HCl (pH 7.5). Next, 5 μg of glycogen (Thermo Fisher Scientific) was added to the collected PCR-free captured libraries as a co-precipitant. The libraries were precipitated by adding 100 μL of isopropyl alcohol, and the obtained pellet was washed once with 70% ethanol and then dissolved in 35 μL or 15 μL RNase-free water for PCRfree-Soni and PCRfree-Frag, respectively. Library quantification was conducted by qPCR with the GenNext NGS library quantification kit (TOYOBO, Japan). The libraries were directly mixed with another 10 pM UMI or non-UMI library diluted with HT1 buffer. All libraries were sequenced on an Illumina HiSeq 2500 system with 100 bp paired-end reads. The raw data were deposited in the DNA Data Bank of Japan (DDBJ; accession nos. DRA008877, PRJDB8701).
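The quantities in the denaturation and neutralization step can be checked with simple mole arithmetic. The sketch below uses only the volumes and concentrations stated above; the variable names are illustrative, and it deliberately ignores buffer equilibria (no pH is predicted), so it only compares the amount of hydroxide added with the amount of Tris supplied.

```python
# Worked arithmetic for the alkaline denaturation step.
# Volumes are in microlitres and concentrations in mM, so that
# volume x concentration gives nanomoles directly (uL x mM = nmol).

ELUATE_UL = 25    # eluted captured library (from the text)
NAOH_UL = 25      # "equal volume" of NaOH (from the text)
NAOH_MM = 200     # 0.2 N NaOH corresponds to 200 mM hydroxide
TRIS_UL = 50      # volume of Tris-HCl added (from the text)
TRIS_MM = 200     # concentration of Tris-HCl (from the text)

# Hydroxide added and its concentration while the probes are released.
naoh_nmol = NAOH_UL * NAOH_MM
denaturation_volume_ul = ELUATE_UL + NAOH_UL
naoh_during_denaturation_mm = naoh_nmol / denaturation_volume_ul

# Total Tris supplied in the neutralization step.
tris_nmol = TRIS_UL * TRIS_MM

print(f"NaOH added: {naoh_nmol} nmol")
print(f"NaOH during denaturation: {naoh_during_denaturation_mm} mM")
print(f"Tris added: {tris_nmol} nmol")
```

Under these numbers the library is denatured in 100 mM NaOH, and the Tris-HCl supplies twice as many moles of buffer as the hydroxide that was added.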

Exome sequence data analysis
All data analyses were conducted using the CLC Genomics Workbench (CLCGW, v12, QIAGEN), except for the UMI consensus reads of HS-UMI, which were built from aligned reads sharing the same UMI using Strand NGS (v3.3, Agilent). Prior to importing into CLCGW, the UMI reads of HS-UMI were attached to the head of read1 of HS-UMI, because the library prepared using SureSelect XT HS Reagent has a 10-bp UMI on the i5 index read. After importing into CLCGW and trimming adaptors from the fastq reads, only the reads of HS-UMI were imported into Strand NGS. UMI consensus read sequences of HS-UMI were generated, and those with a family size (the number of reads in each family) of less than 2 were removed. Then, the reads of HS-UMI were re-imported into CLCGW. All fastq reads were mapped to the hg19 reference genome. PCR duplicate reads were removed from the XT-PCR library. To analyze low-frequency mutations, the basic variant detection operation was performed following the local realignment operation. The results were corrected using VCF data of Illumina platinum genome NA12878 (https://www.illumina.com.cn/platinumgenomes.html) and compared among library preparation methods under the conditions of read coverage (the number of unique reads that include a given nucleotide) ≥ 20 and read count (the number of variant-supporting reads) ≥ 2.
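The idea behind UMI consensus reads and the family-size cut-off can be illustrated with a minimal sketch. This is not the algorithm implemented in Strand NGS, which is not described here; the function name `umi_consensus`, the `(umi, position, sequence)` input format, and the simple per-base majority vote are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, min_family_size=2):
    """Collapse reads sharing a UMI and mapping position into one consensus.

    reads: iterable of (umi, position, sequence) tuples (hypothetical format).
    Families with fewer than min_family_size reads are discarded.
    Returns a dict mapping (umi, position) to the consensus sequence.
    """
    # Group reads into families by UMI and alignment start position.
    families = defaultdict(list)
    for umi, position, sequence in reads:
        families[(umi, position)].append(sequence)

    consensus = {}
    for key, sequences in families.items():
        if len(sequences) < min_family_size:
            continue  # family too small to correct errors
        length = min(len(s) for s in sequences)
        bases = []
        for i in range(length):
            base, votes = Counter(s[i] for s in sequences).most_common(1)[0]
            # Require a strict majority; otherwise call the base ambiguous.
            bases.append(base if votes * 2 > len(sequences) else "N")
        consensus[key] = "".join(bases)
    return consensus


example_reads = [
    ("AACCGGTTAA", 100, "ACGTAC"),
    ("AACCGGTTAA", 100, "ACGTAC"),
    ("AACCGGTTAA", 100, "ACTTAC"),  # one read carries a PCR/sequencing error
    ("GGGGCCCCAA", 100, "ACGTAC"),  # singleton family, removed
]
result = umi_consensus(example_reads)
print(result)
```

In this toy input the three-read family outvotes the single erroneous base, while the singleton family is dropped because no error correction is possible with one read.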

PCR-free WES
First, to strictly exclude DNA fragments shorter than the sequence read length of 100 bp and to easily confirm the status of adaptor ligation to the fragmented DNA, we began the experiment using DNA of 100 bp to 300 bp obtained by agarose-gel size selection for PCR-free library preparation. The size-selected DNA (4 μg) was ligated to the adaptor using the KAPA Hyper Prep Kit. The adaptor ligation efficiency, roughly estimated from the Bioanalyzer results, was about 70-80% (Fig 2). We hybridized as much as 3000 ng of library with the V5 probe. The captured library was denatured, followed by buffer exchange and concentration. Library quantification by qPCR showed that the estimated concentration of the libraries was 70.39 pM. Since this concentration was higher than the final concentration of the sequence library required for HiSeq (10 pM), we considered that these libraries could be sequenced by HiSeq. Therefore, we directly blended the PCR-free library with another 10 pM UMI or non-UMI library, and sequenced 76 million reads, with the sequence yield being about 70% of the yield estimated from the amount of input PCR-free library quantified by qPCR. On the other hand, the yield of the UMI and non-UMI libraries sequenced with the PCR-free library was as expected from qPCR.
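The comparison between the qPCR estimate (70.39 pM) and the 10 pM loading concentration is ordinary C1V1 = C2V2 arithmetic. The following sketch is only an illustration of that headroom: the function names are hypothetical, and the 35 μL volume is the resuspension volume given for PCRfree-Soni in the Methods, not a dilution that the protocol states was performed.

```python
def dilution_factor(stock_pm, target_pm):
    """Fold dilution needed to bring a stock concentration down to a target."""
    if stock_pm <= 0 or target_pm <= 0:
        raise ValueError("concentrations must be positive")
    return stock_pm / target_pm


def volume_at_target_ul(stock_pm, stock_ul, target_pm):
    """Total volume obtainable at the target concentration (C1*V1 = C2*V2)."""
    return stock_ul * dilution_factor(stock_pm, target_pm)


QPCR_ESTIMATE_PM = 70.39   # from the text
HISEQ_LOADING_PM = 10.0    # from the text
RESUSPENSION_UL = 35.0     # PCRfree-Soni resuspension volume (Methods)

fold = dilution_factor(QPCR_ESTIMATE_PM, HISEQ_LOADING_PM)
volume = volume_at_target_ul(QPCR_ESTIMATE_PM, RESUSPENSION_UL, HISEQ_LOADING_PM)
print(f"fold excess over loading concentration: {fold:.2f}")
print(f"equivalent volume at 10 pM: {volume:.1f} uL")
```

The roughly seven-fold excess is what makes it possible to blend the PCR-free library directly with another 10 pM library without a further concentration step.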

Comparison among three library preparation methods
To evaluate the accuracy of the PCR-free library method, we carried out three library preparation methods, PCRfree-Soni, HS-UMI, and XT-PCR, and compared the results of variant detection (Fig 1). For the HS-UMI method, we sequenced 359 million reads, of which only 102 million reads (28.4%) were used to make UMI consensus reads, and 41.8 million consensus reads were obtained (Table 2). After adaptor trimming, mapping to hg19, removing duplicates (only XT-PCR), and making UMI consensus reads (only HS-UMI), the numbers of reads overlapping with the V5 target regions of HS-UMI, XT-PCR, and PCRfree-Soni were 36,808,046, 63,572,153, and 63,936,438, respectively (Table 2). After the local realignment operation, the basic variant detection operation was conducted. Reads were mapped throughout the V5 target regions for all three library preparation methods (Fig 3). The coverage map of XT-PCR showed larger variation than that of PCRfree-Soni, although the numbers of mapped reads of the two were approximately equal. The corrected frequency of detected SNPs and small indels of PCRfree-Soni was almost the same as that of HS-UMI (Table 2 and Fig 4) and was lower than that of XT-PCR. These results showed that the accuracy of the PCR-free method was superior to that of normal exome sequencing with PCR (XT-PCR) and equal to that of the UMI-corrected method (HS-UMI).
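The comparison conditions used for variant detection (read coverage and variant-supporting read count) amount to a simple threshold filter. The sketch below takes the thresholds as lower bounds (coverage of at least 20, at least 2 supporting reads); the `VariantCall` record and its field names are hypothetical, and the correction against the NA12878 VCF is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantCall:
    """Hypothetical minimal record for one candidate variant."""
    chrom: str
    pos: int
    ref: str
    alt: str
    coverage: int  # unique reads covering the position
    count: int     # reads supporting the variant allele

    @property
    def frequency(self):
        # Variant allele frequency; defined as 0.0 where nothing is covered.
        return self.count / self.coverage if self.coverage else 0.0


def passes_thresholds(call, min_coverage=20, min_count=2):
    """Keep calls with enough total coverage and enough supporting reads."""
    return call.coverage >= min_coverage and call.count >= min_count


calls = [
    VariantCall("chr1", 1000, "A", "G", coverage=100, count=2),  # kept, 2% allele
    VariantCall("chr1", 2000, "C", "T", coverage=100, count=1),  # single read only
    VariantCall("chr2", 3000, "G", "A", coverage=15, count=5),   # coverage too low
]
kept = [c for c in calls if passes_thresholds(c)]
for c in kept:
    print(c.chrom, c.pos, c.ref, c.alt, f"{c.frequency:.3f}")
```

Requiring two supporting reads removes calls that rest on a single read, which is where isolated sequencing errors would otherwise be counted as rare variants.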

PCR-free library preparation using DNA sheared by Fragmentase
We confirmed that PCR-free WES is viable, as stated above. Next, we tried to use DNA Fragmentase for PCR-free library preparation, because commercial DNA Fragmentase-based kits, such as the KAPA Hyper Plus kit and the NEBNext Ultra II FS DNA Library Prep Kit for Illumina, showed a higher adaptor-ligated library yield than did the kits that process Covaris-sheared DNA. The starting DNA amount was reduced to 2 μg, and the shearing condition was adapted to lengthen the DNA insert (Fig 2). The estimated concentration of the final library (15 μL) was 137.11 pM. The total yield of the final library was enough to sequence over 200 million reads by HiSeq. We then sequenced 79.7 million reads, which again corresponded to about 70% of the yield estimated by qPCR. The proportion of reads overlapping with the V5 target regions was almost the same between PCRfree-Frag and PCRfree-Soni, and the accuracy of the two methods was also similar (Table 2). These results showed that the performance of PCRfree-Frag was almost equal to that of PCRfree-Soni.
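The statement that 15 μL of a 137.11 pM library is enough for over 200 million reads can be sanity-checked by converting the qPCR estimate into a number of library molecules. This is a rough sketch with a hypothetical function name; it counts molecules only and makes no assumption about how efficiently molecules form clusters on the flow cell.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def library_molecules(concentration_pm, volume_ul):
    """Number of library molecules in a given volume at a given pM concentration."""
    moles = (concentration_pm * 1e-12) * (volume_ul * 1e-6)  # mol/L x L
    return moles * AVOGADRO


FINAL_CONCENTRATION_PM = 137.11  # from the text
FINAL_VOLUME_UL = 15.0           # from the text

n_molecules = library_molecules(FINAL_CONCENTRATION_PM, FINAL_VOLUME_UL)
print(f"estimated library molecules: {n_molecules:.3e}")
print(f"exceeds 200 million: {n_molecules > 200e6}")
```

Under these numbers the final library contains on the order of a billion molecules, several-fold more than the 200 million reads quoted.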

Discussion
Our results showed that 2 μg DNA is sufficient to conduct PCR-free WES analysis, with the rate of mutation detection equaling that achieved with UMI-based methods. The PCR-free WES method described here satisfied the practical level required for detection of cancer-specific mutations and iPS cell quality checks. The PCR-free method has been shown to be effective not only in the detection of rare mutations but also in the detection of long repeat expansions [15]. PCR-free WES analysis could be conducted with a smaller amount of DNA (500 ng-1000 ng) in combination with longer read lengths, such as 125 bp, 150 bp, and 250 bp by HiSeq. For practical analysis, it is desirable to utilize the consensus reads of UMI family size more than 2 [16]. The UMI library amplified by PCR from 200 ng DNA contained too many members to make UMI consensus reads efficiently, and the reads generated from 359 million fastq reads were very few (6,045,390 reads). Of course, if we used 10 ng DNA for HS-UMI, more UMI consensus reads could be obtained. However, our goal was to establish a cost-effective method for detecting rare somatic mutations using WES; therefore, reducing the DNA amount is not appropriate for the purpose of detecting rare mutations.
Notably, the sequence yield of PCR-free captured libraries was reproducibly about 70% of that estimated by qPCR quantification. This might be because the DNA standard in the qPCR kit was double-stranded DNA, whereas the PCR-free captured libraries were single-stranded. Nonetheless, we believe that the PCR-free WES method is powerful and cost-effective for screening a large number of samples to detect rare mutations and small indels in cancer tissues and human iPS cells.