Inexpensive Multiplexed Library Preparation for Megabase-Sized Genomes

Whole-genome sequencing has become an indispensable tool of modern biology. However, the cost of sample preparation relative to the cost of sequencing remains high, especially for small genomes where the former is dominant. Here we present a protocol for rapid and inexpensive preparation of hundreds of multiplexed genomic libraries for Illumina sequencing. By carrying out the Nextera tagmentation reaction in small volumes, replacing costly reagents with cheaper equivalents, and omitting unnecessary steps, we achieve a cost of library preparation of $8 per sample, approximately 6 times cheaper than the standard Nextera XT protocol. Furthermore, our procedure takes less than 5 hours for 96 samples. Several hundred samples can then be pooled on the same HiSeq lane via custom barcodes. Our method will be useful for re-sequencing of microbial or viral genomes, including those from evolution experiments, genetic screens, and environmental samples, as well as for other sequencing applications including large amplicon, open chromosome, artificial chromosomes, and RNA sequencing.


Introduction
Sequencing has become an indispensable tool in modern microbiology, dramatically changing the resolution and speed of studies of biodiversity [1], evolution [2][3][4][5][6], and molecular biology [7], and improving pathogen surveillance [8] and clinical diagnostics [9,10]. With current technology, hundreds of full megabase-size genomes can be sequenced in a single Illumina HiSeq lane at over 30x coverage, for a cost of about $15 per sample. Thus, the costs of standard library preparation methods, which typically exceed $50 per sample, substantially limit the amount of microbial genome sequencing. Two studies have recently proposed ways to alleviate this limitation [11,12]. Based on similar principles to those proposed by Lamble et al. [12] and in the Illumina Nextera XT kit [13], we developed a library-preparation protocol that achieves further reductions in costs and increases in efficiency. Specifically, we improve on the cost-limiting steps of these protocols by substantially decreasing the tagmentation reaction volume (to 2.5μl), replacing bead-based standardization with inexpensive fluorescent standardization, substituting inexpensive third-party PCR reagents, and replacing the cleanup beads with functionally equivalent but much cheaper beads. The protocol described here costs approximately $750 per 96 samples including consumables, with under 3 hours of hands-on time and under 5 hours of total time.

Protocol Overview and Important Considerations
Our protocol consists of 5 modules (Fig 1). We assume that the protocol is executed with purified genomic DNA (gDNA) but other types of purified DNA can be used. This protocol is adaptable to any application where template size exceeds read length (e.g. not short amplicon). Since the reliability of the tagmentation reaction (Module 2) is sensitive to the purity of input gDNA [12], we recommend using column-based genomic extraction, such as the Invitrogen PureLink 96-well kit. The cost of consumables per sample is rounded to the nearest $0.25.
Module 1: Standardization of gDNA concentrations across samples ($0.50/sample, 60 min)
The goal of this module is to standardize the gDNA concentration across samples to achieve uniform reaction efficiency in the tagmentation step (Module 2). Tagmentation is sensitive to the input gDNA concentration, and the optimal concentration varies with the organism, DNA type (e.g., genomic versus PCR product), DNA extraction method, and application. In our experience, the optimal concentrations for both Gram-negative (e.g., Escherichia coli) and Gram-positive (e.g., Staphylococcus aureus) bacteria were in the range of 0.5 to 1ng/μl, while for the yeast Saccharomyces cerevisiae it was about 2ng/μl. See "Selecting input DNA concentration and bead to sample ratio" and Fig 2 for more information.
We use SYBR Green I to quantify gDNA, which gives sufficiently precise measurements and is markedly cheaper than other dyes. For lower-throughput work, Qubit quantification can also be used. We do not recommend absorbance quantification methods such as NanoDrop because they have lower sensitivity and can be affected by the presence of single-stranded nucleic acids.
Module 2: Tagmentation
In this module, the transposase loaded with a part of the Illumina adaptors (also referred to as the "tagmentation enzyme") and the tagmentation buffer provided in an Illumina Nextera kit are used to simultaneously fragment gDNA and incorporate sequencing adaptors. We follow the steps described in the standard Nextera protocol, but with a smaller reaction volume. We have found that a tagmentation-reaction volume as small as 2.5μl does not significantly limit the diversity of sequenced DNA for megabase-sized genomes. Specifically, libraries of E. coli DNA (genome size 4.64Mb) prepared with this protocol typically yield over 98% unique reads (Fig 3). Since each position in the genome is represented on thousands of tagmented DNA fragments, the fraction of false-positive variants created by errors in subsequent PCR amplification (Module 3) is negligible. Larger tagmentation-reaction volumes may be necessary for larger genomes to achieve sufficient library complexity and avoid PCR-induced errors. The final fragment size distribution critically depends on the stoichiometry of gDNA and transposase [14]. Thus, to achieve results consistent across samples, it is essential to accurately standardize input DNA (Module 1) and to thoroughly mix the tagmentation master mix with gDNA. Tagmented DNA can be used directly, without purification, as template for the subsequent PCR step.
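The statement that each genomic position is represented on many tagmented fragments can be checked with a rough estimate of how many genome copies enter the reaction. The sketch below is illustrative only: the function name is hypothetical, and the average mass of 660 g/mol per double-stranded base pair is a commonly used approximation, not a value taken from this protocol.

```python
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 660.0    # assumed average mass of one double-stranded base pair


def genome_copies(input_ng: float, genome_size_bp: float) -> float:
    """Approximate number of genome copies contained in input_ng of dsDNA."""
    grams_per_genome = genome_size_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return input_ng * 1e-9 / grams_per_genome


if __name__ == "__main__":
    # 1 ul of E. coli gDNA (4.64 Mb genome) at 0.5 ng/ul gives 0.5 ng of input.
    copies = genome_copies(input_ng=0.5, genome_size_bp=4.64e6)
    print(f"{copies:.2e} genome copies in the tagmentation reaction")
```

Under these assumptions the input contains on the order of 10^5 genome copies, so even with substantial losses each position should still be covered by many independent fragments. For a genome 100 times larger, the same input mass would contain 100 times fewer copies, which is the reason larger reaction volumes may be needed.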
Module 3: PCR-mediated adapter addition and library amplification ($1.50/sample, 75 min)
In this module, PCR is used to add the remaining Illumina adaptor sequences and sample barcodes to the tagmented DNA fragments. The adaptors bind fragments to the flow cell [14], and the barcodes allow for multiplexed sequencing. If 96 or fewer samples are pooled on a single lane, we use the Illumina TruSeq primers S501-S508 and N701-N712. For higher multiplexing requirements, we developed custom row and column primers, labeled R09-R36 and C13-C24. These were derived from the TruSeq primers and are compatible with them (S1 Table). Illumina now also offers additional TruSeq barcodes. When combining the barcodes in this paper with sets other than S501-S508 and N701-N712, care should be taken to verify that all pairs of barcodes remain separated by a sufficient Hamming distance for disambiguation (we recommend at least 3bp). We substitute the Nextera-provided PCR reagents with KAPA high-fidelity library amplification reagents. While we have tested only KAPA reagents, in principle any hot-start high-fidelity enzyme with low GC-bias amplification should work. Compared to the PCR program recommended in the original Nextera protocol, we recommend a longer initial denaturation to promote inactivation of the tagmentation enzyme, a shorter extension time to enrich for smaller fragments, and more cycles to increase the yield from the smaller tagmentation reaction.
Samples had between 0.2 and 2.8 million reads with 90% of samples having over 1 million reads. Raw reads were filtered and then aligned to a reference genome using bowtie2. Unique reads are those that appear only once in the alignment for a particular sample. These are the reads that remain after use of the rmdup tool in samtools. Non-unique reads arise primarily when the same tagmented fragment is amplified during PCR. A low fraction of non-unique reads implies a diversity of fragments after tagmentation, and that errors introduced during PCR will not reach high frequencies.
An anomalous secondary peak can occur at >1kb in Bioanalyzer traces. While the nature of these apparently long fragments is unclear, they do not appear to substantially affect the Illumina sequencing reaction. Nevertheless, they can bias the fragment-size estimation and thereby lead to a decrease in the number and quality of sequenced reads (see Section "Selecting input DNA concentration and bead to sample ratio"). However, in our experience, this never led to a complete failure of the sequencing run (see Fig 4). We also found that various combinations of (a) good mixing prior to tagmentation, (b) higher initial primer concentration, and (c) "reconditioning PCR" (i.e., 3-4 additional cycles with fresh primers and polymerase [15]) can ameliorate this problem.
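The barcode check recommended in Module 3 (a pairwise Hamming distance of at least 3bp when mixing barcode sets) is straightforward to automate. The following is a minimal sketch; the function names are hypothetical and the example sequences are made up for illustration, not real Illumina or custom index sequences.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def close_pairs(barcodes: dict, min_distance: int = 3) -> list:
    """Return (name1, name2, distance) for every pair closer than min_distance."""
    return [
        (n1, n2, hamming(s1, s2))
        for (n1, s1), (n2, s2) in combinations(barcodes.items(), 2)
        if hamming(s1, s2) < min_distance
    ]


if __name__ == "__main__":
    # Made-up 8 bp sequences; substitute the actual index sequences in use.
    example = {"idxA": "ACGTACGT", "idxB": "ACGTACGA", "idxC": "TTGCAAGC"}
    for n1, n2, d in close_pairs(example):
        print(f"{n1} and {n2} differ at only {d} position(s)")
```

An empty result from `close_pairs` indicates that every pair in the combined set meets the recommended minimum distance.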
Module 4: PCR clean-up and size selection ($0.50/sample, 40 min)
In this module, PCR products are purified with magnetic beads and enriched for fragments of the desired length for sequencing. In lieu of the significantly more expensive Illumina-recommended AMPure XP beads, we use the simple magnetic bead solution with the "MagNA" bead-extraction protocol from [11,16] and Thermo Sera-Mag SpeedBeads. See the "Selecting input DNA concentration and bead to sample ratio" section for more information and Fig 2 for an example. In general, we found that a 1:1 volumetric ratio of sample to beads works well for MagNA as well as AMPure beads. It may also be possible to use a sample purification and standardization kit, such as SequalPrep (Life Technologies, ~$0.50/sample).
In this final module (Module 5), sample concentrations and fragment size distributions are estimated and libraries are pooled. We measure the DNA concentration of each sample fluorescently, as in Module 1; quantification by qPCR is unnecessary at this stage. We discard samples with less than 0.5ng of DNA. Fragment size distribution can be measured with an Agilent BioAnalyzer, TapeStation, Bio-Rad Experion, or a number of other devices. While it would be ideal to measure the size distribution of every sample, this is not practical or economically feasible at large scale. Moreover, we found that sample preps from the same 96-well plate typically have similar post-cleanup fragment-size distributions. Thus, we estimate this distribution for a subset of samples (5 to 10). Then, based on individual sample concentrations and the common average fragment length, we calculate the DNA molarity of each sample and pool variable volumes of samples to achieve equimolar concentrations in the pool. Although average fragment length can vary across a plate (Fig 5A, Plate 2), calculating molarity based on a few samples results in roughly uniform numbers of reads for ~90% of samples (Fig 5B).
For applications that are sensitive to fragment size (e.g. de novo assembly), modification of the tagmentation reaction ratios and size-selection cleanup (e.g. via bead-based dual purification, PippinPrep or E-Gel) may be required.
As fragment size distributions often vary more between plates than within plates (Fig 5A), multiple plates should be pooled with equal molarity rather than equal concentration. A final verification of the pooled sample concentration (e.g., by qPCR) can be useful before sequencing, though most sequencing centers perform this as part of sample quality control.
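The molarity calculation and equimolar pooling described above can be sketched as follows. This is an illustration under stated assumptions rather than the exact calculation used for the data shown: the function and parameter names are hypothetical, the average mass of 660 g/mol per base pair is a common approximation for double-stranded DNA, and the cutoff for dropping dilute samples is left as a user-supplied parameter.

```python
BP_MASS_G_PER_MOL = 660.0  # assumed average mass of one double-stranded base pair


def molarity_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a mass concentration (ng/ul) of dsDNA fragments to nanomolar."""
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * mean_fragment_bp)


def pooling_volumes(conc_ng_per_ul: dict, mean_fragment_bp: float,
                    fmol_per_sample: float,
                    min_conc_ng_per_ul: float = 0.0) -> dict:
    """Volume (ul) of each library that contributes fmol_per_sample to the pool.

    mean_fragment_bp is the plate-wide average estimated from a few samples.
    Samples at or below min_conc_ng_per_ul are left out of the pool.
    """
    volumes = {}
    for well, conc in conc_ng_per_ul.items():
        if conc <= min_conc_ng_per_ul:
            continue  # too dilute to pool
        # 1 nM equals 1 fmol/ul, so fmol divided by nM gives microlitres.
        volumes[well] = fmol_per_sample / molarity_nM(conc, mean_fragment_bp)
    return volumes


if __name__ == "__main__":
    plate = {"A1": 10.0, "A2": 20.0, "A3": 0.0}  # hypothetical readings, ng/ul
    for well, vol in pooling_volumes(plate, mean_fragment_bp=500,
                                     fmol_per_sample=30.0).items():
        print(f"{well}: {vol:.2f} ul")
```

When pooling several plates, the same function can be called once per plate with that plate's own average fragment length, so that plates are combined by molarity rather than by mass.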

Selecting input DNA concentration and bead to sample ratio
For each sample, this protocol produces an adaptor-ligated, barcoded library of DNA fragments with some distribution of fragment lengths. The optimal fragment-size distribution depends on the sequencing protocol. To maximize the amount of useful sequenced DNA, most fragments should be longer than the combined length of the sequencing reads and adaptors (e.g., above 338bp for paired-end 100bp sequencing). However, as longer fragments are underrepresented in aligned sequencing reads (Fig 4) and can lower overall cluster density (Illumina Nextera technical notes), fragment length should ideally be kept under 1kb.
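The lower bound in the example above is simple arithmetic: two 100bp reads plus the adaptors. A small sketch for other read lengths follows. The 138bp combined adaptor length is inferred here from the 338bp figure (338 minus 2 x 100) and should be treated as an assumption, as should the function names.

```python
def min_useful_fragment_bp(read_length_bp: int,
                           adaptor_total_bp: int = 138,
                           paired_end: bool = True) -> int:
    """Shortest library fragment whose reads do not overlap each other or the adaptors."""
    n_reads = 2 if paired_end else 1
    return n_reads * read_length_bp + adaptor_total_bp


def in_target_window(fragment_bp: int, read_length_bp: int,
                     max_fragment_bp: int = 1000) -> bool:
    """True if a fragment is long enough to be fully useful and still under ~1 kb."""
    return min_useful_fragment_bp(read_length_bp) <= fragment_bp < max_fragment_bp


if __name__ == "__main__":
    for read_len in (100, 150):
        print(read_len, min_useful_fragment_bp(read_len))
```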
We found that the distribution of fragment sizes depends mainly on three parameters: (1) the input concentration of gDNA during tagmentation (output of Module 1); (2) the PCR extension time (Module 3); and (3) the amount of beads during post-PCR clean-up (Module 4). Higher input gDNA concentrations generally result in longer fragments because each tagmentation enzyme makes a single cut [13]. Shorter PCR extension time biases the distribution towards shorter fragments. Finally, higher bead to sample ratios for purification can yield smaller fragments because beads preferentially capture larger fragments [16].
When using a novel DNA source or extraction method, we strongly recommend calibrating the first and the third parameters by dilution series on a representative sample (Fig 2) prior to scaling up the preparation. For example, the initial concentration of DNA might be increased to increase average fragment size.

Detailed Protocol
This protocol is for the preparation of 96 samples (8 rows x 12 columns) but can be modified for either higher or lower throughput.
Table). Input gDNA concentrations ranged from 2 to 25ng/μl, and were standardized to 0.5ng/μl. Based on estimated fragment-length distributions, Plates 1 and 2 were pooled in mass ratio 0.8:1. In this preparation, 2 samples (1%) failed to yield libraries, and 17 (10%) produced low, but usable, numbers of reads (between 0. 1
General tips
• We advise cleaning pipettes and your station with DNA-Away (Thermo Scientific 7010) to reduce contamination from the environment and previous samples.
• Many steps call for centrifugation of 96-well plates. These steps are necessary for consistency across wells when transferring small volumes of liquid and should not be omitted.
• The potential for cross-contamination of samples and primers, in particular, is high. We recommend the use of filter tips, exclusively aspirating samples or primers with fresh tips, and avoiding "blowing out" the pipette.
Materials and equipment used throughout the protocol
• DNA standards in range of 1-10ng/μl (we use those that come with Life Q-33120)
• Plate reader with SYBR-compatible filters
Procedure. Note: This procedure assumes gDNA concentration in the range of 1-10ng/μl. If samples cover a different range of concentrations, the procedure should be modified accordingly. We recommend a two-step dilution for samples with a broad range of concentrations.
1. Mix 5μl of concentrated SYBR Green I with approximately 25mL of TE in a clean reservoir to make a working dye solution.
a. The same dye solution should be used across all samples and standards.
3. Add 10μl of gDNA to each well of a fluorometry plate.
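Converting plate-reader fluorescence to concentration, and then to a dilution volume, can be sketched as below. This is a minimal illustration that assumes fluorescence is linear in DNA concentration over the range of the standards. The function names and the example readings are hypothetical, and the dilution step is the standard C1V1 = C2V2 relation rather than a step quoted from this protocol.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_fluorescence(signal, slope, intercept):
    """Invert the standard curve to get concentration in the units of the standards."""
    return (signal - intercept) / slope


def diluent_volume_ul(stock_conc, target_conc, stock_volume_ul):
    """Diluent to add to stock_volume_ul so the final concentration is target_conc."""
    if stock_conc < target_conc:
        raise ValueError("sample is already below the target concentration")
    return stock_volume_ul * (stock_conc / target_conc - 1.0)


if __name__ == "__main__":
    standards_ng_per_ul = [1.0, 2.5, 5.0, 10.0]      # within the 1-10 ng/ul range
    standard_signal = [1200.0, 2700.0, 5200.0, 10200.0]  # hypothetical readings
    slope, intercept = fit_line(standards_ng_per_ul, standard_signal)
    conc = concentration_from_fluorescence(4200.0, slope, intercept)
    water = diluent_volume_ul(conc, target_conc=0.5, stock_volume_ul=5.0)
    print(f"sample: {conc:.2f} ng/ul; add {water:.1f} ul diluent to 5 ul of sample")
```

For samples far above the target, a single dilution calls for an impractically small sample volume or a large diluent volume, which is one reason a two-step dilution is convenient.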

Module 2. Tagmentation
Goal. Mix 1.25μl of TD buffer, 0.25μl of TDE1, and 1μl of gDNA in each well. Final total volume per well is 2.5μl. Carry out the tagmentation reaction in a thermocycler.

Materials and equipment
Procedure. Note: All reagents should be kept on ice. The 96-well plate containing samples should also be kept on ice while assembling the mix, and all steps should be done quickly. Small-volume reactions can be difficult to work with; do not skip centrifugation.
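The master-mix volumes used in step 3 of this procedure (156μl of buffer and 31.2μl of enzyme) are consistent with the per-well volumes in the Goal scaled to 96 wells plus 30% excess. The sketch below makes that scaling explicit for other plate sizes. The 30% figure is inferred from those numbers rather than stated in the protocol, and the function name is hypothetical.

```python
def master_mix_volumes(n_samples: int = 96,
                       buffer_per_well_ul: float = 1.25,
                       enzyme_per_well_ul: float = 0.25,
                       excess_fraction: float = 0.30) -> dict:
    """Total TD buffer and TDE1 volumes (ul) for n_samples wells plus excess."""
    scale = n_samples * (1.0 + excess_fraction)
    return {
        "TD buffer": buffer_per_well_ul * scale,
        "TDE1": enzyme_per_well_ul * scale,
    }


if __name__ == "__main__":
    for reagent, volume in master_mix_volumes().items():
        print(f"{reagent}: {volume:.1f} ul")
```

Each well then receives 1.5μl of master mix (1.25μl buffer plus 0.25μl enzyme) and 1μl of gDNA, for the 2.5μl total given in the Goal.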
1. Preheat thermocycler to 55°C. If starting from frozen gDNA, thaw, vortex and spin down gDNA. Thaw TD buffer and TDE1 on ice.
2. Invert TD buffer and TDE1 gently to mix, spin down, and replace on ice.
3. Make the tagmentation master mix (TMM) by mixing 156μl buffer and 31.2μl enzyme in a PCR tube. Mix thoroughly by gently pipetting up and down 10 times.
a. Excess volumes enable rapid and even distribution of tagmentation enzyme and buffer. It is essential that all samples receive the same amount of enzyme.

Procedure. Note: The first 6 steps can be done during the tagmentation reaction (Module 2, step 8). Special care should be taken to avoid primer cross-contamination; only fresh tips should be inserted into the primer stock. It saves significant effort to aliquot the primers into 8- and 12-strip PCR tubes, so that multi-channel pipettes can be used. We recommend strip tubes with individual attached caps or using fresh cap strips each time to minimize the potential for cross-contamination. When using Microseal 'A', it is important to make sure the plate is fully sealed to prevent evaporation; when using Microseal 'B', care must be taken when removing the seal to minimize cross-contamination.
1. Preparation. Thaw indexing primers at room temperature. Invert gently to mix. Spin down.
Record which indexing primers you are using. Thaw the KAPA master mix at room temperature. Invert gently to mix.
2. Label one fresh 8-well PCR strip for the row master mix (RMM) and one fresh 12-well PCR strip for the column master mix (CMM).
a. It is easy to accidentally rotate these strips, so proper labeling is essential.
7. Transfer 10μl of RMMs into each well of the tagmentation plate using a multi-channel pipette, so that each row receives the same Rxxx index.
a. Change tips after every transfer.
b. Make sure that the row number corresponds to the Rxxx.
8. Transfer 10μl of CMMs into each well of the plate using a multi-channel pipette, so that each column receives the same Cxxx/Nxxx index. Mix by gently pipetting up and down 10 times.
a. Change tips after every transfer.
b. Make sure that the column number corresponds to the Cxxx.
a. Gently press the seal on each well, especially edge wells, before spinning down; this seal is non-adhesive until heat is added.
10. Place plate in the thermocycler.
a. Ensure that the lid is tight and is heated during thermocycling.
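Because every row shares one row index and every column shares one column index, each well is identified by a unique pair of indexes. Recording that mapping before pipetting helps catch rotated strips or mislabeled primers. The sketch below builds the map for one 8 x 12 plate using the primer names S501-S508 (rows) and N701-N712 (columns) mentioned in the overview; the function name is hypothetical, and custom Rxxx/Cxxx names can be substituted.

```python
from string import ascii_uppercase


def index_layout(row_indexes, column_indexes):
    """Map each well name (e.g. 'A1') to its (row index, column index) pair."""
    if len(row_indexes) != 8 or len(column_indexes) != 12:
        raise ValueError("expected 8 row indexes and 12 column indexes")
    return {
        f"{ascii_uppercase[r]}{c + 1}": (row_idx, col_idx)
        for r, row_idx in enumerate(row_indexes)
        for c, col_idx in enumerate(column_indexes)
    }


if __name__ == "__main__":
    rows = [f"S5{i:02d}" for i in range(1, 9)]      # S501 ... S508
    columns = [f"N7{i:02d}" for i in range(1, 13)]  # N701 ... N712
    layout = index_layout(rows, columns)
    for well in ("A1", "A12", "H1", "H12"):
        print(well, layout[well])
```

The resulting dictionary can be written out alongside the sample names to serve as the demultiplexing key for the lane.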