Miniaturization and optimization of 384-well compatible RNA sequencing library preparation.

Preparation of high-quality sequencing libraries is a costly and time-consuming component of metagenomic next generation sequencing (mNGS). While the overall cost of sequencing has dropped significantly over recent years, the reagents needed to prepare sequencing samples are likely to become the dominant expense in the process. Furthermore, libraries prepared by hand are subject to human variability and needless waste due to limitations of manual pipetting volumes. Reduction of reaction volumes, combined with sub-microliter automated dispensing of reagents without consumable pipette tips, has the potential to provide significant advantages. Here, we describe the integration of several instruments, including the Labcyte Echo 525 acoustic liquid handler and the iSeq and NovaSeq Illumina sequencing platforms, to miniaturize and automate mNGS library preparation, significantly reducing the cost and the time required to prepare samples. Through the use of External RNA Controls Consortium (ERCC) spike-in RNAs, we demonstrated the fidelity of the miniaturized preparation to be equivalent to full volume reactions. Furthermore, detection of viral and microbial species from cell culture and patient samples was also maintained in the miniaturized libraries. For 384-well mNGS library preparations, we achieved cost savings of over 80% in materials and reagents alone, and reduced preparation time by 90% compared to manual approaches, without compromising quality or representation within the library.


Introduction
Metagenomic next-generation sequencing (mNGS) is becoming an increasingly useful tool in the field of biology and clinical medicine. Its applications are almost limitless-any nucleic acid can be turned into a library, amplified, and sequenced, making mNGS an appealing technology for labs and hospitals alike. As sequencers such as the Illumina NovaSeq increase throughput, hundreds to thousands of libraries can be sequenced in a single run. Although the a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 per-base cost of sequencing has become less expensive over the last several decades, the cost and time associated with sample preparation remain disproportionately high [1,2].
Manual library preparation is tedious and is often the bottleneck for many sequencing projects. Numerous library preparation protocols have been adapted for automation through the use of various positive displacement tip-based liquid handler instruments, including the Beckman Coulter Biomek, Hamilton Star, Agilent Technologies Bravo, TTP LabTech Mosquito, and others [3][4][5]. Though these provide more hands-off time during the library preparation process, the overall cost can often exceed that of hand-prepared libraries due to the increased dead volume of reagents and the large number of expensive, sometimes proprietary tips required for liquid handlers. Furthermore, sub-microliter miniaturization is a challenge for the majority of positive displacement based liquid handlers.
Recently, acoustic liquid handlers with sub-microliter precision have become commercially available and have been used for a large range of applications, including RT-qPCR, mass spectrometry, drug discovery, and compound dosing assays [6][7][8][9]. To date, performance metrics for end-to-end protocols for miniaturized mNGS preparation are scarce. Given the cost and time limitations of current library preparation techniques, we sought to adapt our mNGS library preparation protocol into a high-throughput protocol by leveraging the small dispensing volumes of the Labcyte Echo 525. Here, we describe a detailed protocol that provides high-fidelity, miniaturized, automated, cost-and time-efficient 384-well library preparation together with its quality and performance metrics.

Methods and results
All next generation sequencing workflows begin with isolation of nucleic acid; however, such protocols are highly dependent on the source of the sample and desired product, such as DNA, RNA, mRNA, cell-free RNA, and so on. The protocol described herein is independent of nucleic acid isolation methods, and for the purpose of this work, we assume an input of isolated total RNA. To optimize miniaturization of our laboratory's current library preparation protocol, we prepared libraries from varying concentrations of HeLa RNA using the New England Biolabs Ultra II RNA Library Prep Kit (E7770S/L). The miniaturized and automated workflow is presented in Fig 1 and described in detail below and at https://www.protocols.io/ private/07DC2C6765A0D9665C227879D935EED3.

Sample dehydration
To achieve miniaturized volumes, nucleic acid samples must first be dehydrated. Several vacuum evaporators were tested to dehydrate input RNA: Thermo Savant AES2010, Thermo Savant SC110A, Thermo Savant DNA110, and GeneVac EZ-2. 5uL of extracted RNA were loaded into a 384-well PCR plate (E&K Cat No EK-75009) and were spun at low (no heat/room temperature), medium (40˚C), or high (65˚C) temperatures until all wells were completely dry. Room temperature drying was prohibitively slow; at medium and high temperatures, drying times were comparable. At 40˚C, sample plates dried fastest with the GeneVac E-Z2 (25 minutes) and ranged from 35-50 minutes using other machines. Of note, variable drying times were observed between brands of 384-well PCR plates. After drying RNA at 40˚C for 25 minutes, samples were rehydrated in 5uL water. Parallel capillary electrophoresis (Agilent Bioanalyzer) RNA Integrity Number showed no significant difference between the resuspended sample and the original sample (T-test p = 0.306, Fig 2), demonstrating that dehydration does not compromise RNA quality within these parameters.

Intra-sample positive control ERCC spike-ins
Internal spike-in control nucleic acids are useful indicators of potential library preparation errors. Furthermore, carefully designed spike-in controls, such as the External RNA Controls Consortium (ERCC) collection [10][11][12][13], which consists of 92 variable-length templates present at a pre-defined range of concentrations, may be used to establish the relationship between read count and input RNA concentration. For this mNGS protocol, 25 picograms of ERCC

Library preparation
Our laboratory has previously described a mNGS library prep protocol for RNA using the New England Biolabs Ultra II RNA Library Prep Kit [14][15][16]. In this protocol, RNA was quantified using QuBit, fragmented in a magnesium-based buffer at 94˚C, primed with random hexamers, and reverse transcribed to form cDNA. Libraries were made Illumina-compatible by blunting ends of cDNA and adding non-templated d-A tails. Loop adaptors were ligated and cleaved with uracil-specific excision reagent (USER) enzyme before PCR enrichment was performed with sample-specific octamer primers.
To miniaturize and adapt this procedure to the Echo 525, we tested several protocol modifications involving reduced volume of reagents using variable HeLa RNA input (0, 0.1, 0.5, 1, and 5 ng in 5uL of water). Reagents for each step, including ERCC spike-ins, were prepared as a master mix and miniaturized volumes were dispensed using the Echo 525. The ideal number of PCR cycles is input-dependent and should be optimized depending on sample characteristics; for this miniaturized protocol, 19 PCR cycles were used to achieve adequate amplification of very low input samples; however, samples with a higher concentration of RNA could be appropriately amplified either by diluting the input or lowering the number of PCR cycles.
Although all dispensing steps utilized acoustic liquid handling in this protocol, magnetic bead-cleaning steps still required pipetting automation. To facilitate high-throughput beadcleaning, we programmed the Beckman Coulter Biomek NX P with a 384 multichannel head to perform a simultaneous 384-well bead clean and size selection using AMPure XP beads (Beckman Coulter Cat. No A63881). Bead volumes were chosen to select for a final library size of �200 base pairs and to remove adaptor-and primer-dimers. An Alpaqua 384 Post Magnet Plate (SKU A001222), adjusted with a 3D-printed plastic adaptor (available for download at http://derisilab.ucsf.edu/index.php?3D=225), was used to minimize library elution volume. The adaptor raises the PCR plate when placed on the magnet plate, thereby lowering the height of the bead-DNA pellet on the side of each well and allowing for complete resuspension of beads in as low as 6uL of eluent.
To check the quality of libraries prepared using the miniaturized protocol, we used a parallel capillary electrophoresis assay to process up to 95 samples simultaneously (Advanced Analytical Fragment Analyzer). Libraries were assessed for distribution of cDNA fragment size, estimated molarity and concentration, and for the presence of primer-and adaptor-dimers. Library size distributions were consistent between libraries prepared with both full and miniaturized volumes (Fig 3).

Sequencing results
Final HeLa RNA libraries were sequenced on the Illumina MiSeq to an average depth of 2.5 million paired-end reads. Resultant data were processed through a pipeline for pathogen detection developed in our laboratory involving subsequent removal of duplicate reads, reads with low quality, and reads aligning to phage [14,17,18]. Original FASTQ files are available at BioProject Accession #PRJNA493096.
Analysis of reads aligning to ERCC transcripts showed strong correlation to their original molar spike-in concentrations, which spanned six orders of magnitude, indicative of a successful library preparation and reflected uniform PCR amplification across fragment size (R 2 = 0.93, Fig 4A). The linear association between input RNA concentration and sequencing output results was used to calculate the approximate amount of HeLa RNA present in the original sample by solving the following equation: ERCC mass ðpgÞ Total mass ðpgÞ ¼ ERCC reads Total reads . The mass of input RNA measured by fluorometric quantification (Qubit HS RNA) correlated strongly with estimations using the ERCC back-calculation method (R 2 = 0.995, Fig 4B) indicating that ERCC controls are an effective way to assess the mass of input nucleic acid, even when present at subnanogram levels.
The HPV18 genome is integrated into the HeLa cell line and results in HPV RNA expression [19,20]. We used HPV18 RNA reads recovered from HeLa samples as a proxy for a human metagenomic sample. Libraries were analyzed for percent and depth of coverage of the reference sequence of integrated HPV18, as well as the human transcriptome. At 1ng and 5ng of input, HPV18 coverage was essentially indistinguishable for reactions performed with full or miniaturized volumes of reagents, indicating that using as low as 10% of the standard reagent volume provided equivalent detection of the viral genome ( Table 1). Assessment of human transcriptome coverage in the HeLa libraries demonstrated high correlation between full volume and miniaturized reactions (for 5ng RNA input: Spearman's ρ = 0.79, p<0.001; for 1ng RNA input: Spearman's ρ = 0.69, p<0.001, Fig 5). Additionally, the miniaturized protocol demonstrated strong correlation between replicate 1ng HeLa RNA samples (Spearman's ρ = 0.84, p<0.001). Together, these metrics show that libraries prepared using the miniaturized protocol are equally sensitive for detection of both pathogen and human reads.

Improving high-throughput library pooling
With advances in sequencing technology, the number of reads provided by a single run on high-throughput sequencers such as the HiSeq or NovaSeq is driving the creation of larger sample pools to take advantage of lower sequencing costs. Samples can be pooled to occupy equal portions of a flow cell lane by capillary electrophoresis, fluorimetry, or qPCR. These processes are costly, tedious, and error-prone due to imprecise estimations and inaccurate pipetting, especially when pooling large numbers of libraries together. To overcome these difficulties, we employed a two-step process to optimize the pooling of hundreds of samples (Fig 6). First, we used the Echo to dispense equal volumes (500nL) from each sample of a set of 265 libraries. This pool was then sequenced on the Illumina iSeq to a total combined depth of 4.5 million reads (other existing options include a MiSeq). Use of the Echo to dispense nanoliter volumes allowed preservation of the bulk of each original sample library. The read distribution of the 265 libraries resulted in a normal distribution with each library occupying a mean of 0.377% ± 0.125% of the total reads per sample (Fig 6).
By virtue of the internal ERCC controls, the number of reads belonging to each library was proportional to the partial concentration occupied by each library. This enabled estimation of the partial concentrations for each of the original 265 libraries. Next, the Echo 525 was used to dispense calculated equimolar volumes of each library (ranging from 160nL to 3800nL) into a final pool. The pool was sequenced on the Illumina NovaSeq to a total combined depth of approximately 11 billion paired-end reads. Sequencing results from the NovaSeq yielded a

Cost and time comparison
This miniaturized, high-throughput protocol significantly reduces the materials cost and time associated with library prep. The materials cost, at the time of this writing, for each library preparation using the manual protocol, including consumables such as reagents and tips, was approximately $43 (~$16,648 for 384 samples). Using this miniaturized protocol, the cost per sample dropped to approximately $8 per sample (~$3,161 for 384 samples), resulting in cost savings of over 80% (Table 2)  time to complete the same number using the automated miniaturized protocol was approximately 10 hours (Table 3). This significant time savings further increases the cost savings of this protocol, as the cost estimates above do not take labor into account.

Use-case example: mNGS for pathogen detection
As a demonstration of the applicability of this protocol, we used mNGS for pathogen detection in clinical samples. Sequencing libraries were prepared, as described above, from RNA isolated from two clinical samples. Patient #1: Endotracheal tube aspirate from a patient with respiratory failure was collected with IRB approval (UCSF IRB 14-13546) and informed consent. An aliquot was sent to the UCSF hospital clinical lab and another aliquot was placed immediately on dry ice. Standard bacterial culture of this sample produced heavy Enterobacter cloacae growth. Sequencing libraries produced by hand with full reagent volume, and again as described in the miniaturized protocol above, both demonstrated robust detection of E. cloacae RNA (109 unique rpm vs 103 unique rpm, respectively). Patient #2: bronchoalveolar lavage sample was

Discussion
In this study, we present a miniaturized, highly automated, high-throughput protocol for preparation of high-quality mNGS sequencing libraries. Compared to the standard full-volume protocol, this protocol is faster, less expensive, produced data of equivalent quality for pathogen detection and human transcriptome coverage in HeLa preparations, and, for use-case patient samples, demonstrated correlation with clinical microbiology test results. To our knowledge, this is the first application of the Echo for metagenomic and metatranscriptomic sequencing analysis and adds to existing Echo-based protocols for DNA synthesis and plasmid sequencing [21,22]. This protocol has several advantages over alternative library preparation approaches. First, since the Labcyte Echo does not use pipette tips, tip contamination is eliminated as is the pipetting error inherent to transferring small volumes by volumetric displacement [23]. Second, simultaneous processing of up to 384 samples significantly reduces variance between library preps and exposes all samples to the same environmental and reagent contaminants, minimizing batch effects. Third, use of ERCC spike-in controls allows the determination of the original RNA input quantity, which is highly variable in human patient samples, and is often difficult to measure by traditional spectroscopy. The ERRC controls also help establish the degree of linearity between read counts and input concentrations. The latter is particularly important to avoid calling false negatives in clinical metagenomics. Lastly, the robots used in this miniaturized and automated library prep protocol enable rapid and simultaneous processing of a up to 384 samples at a time. This results in significant cost and time savings and can allow large-scale projects to be completed at an expedited timeline.
There are several limitations to this protocol. First, although this protocol reduces the time needed to prepare libraries for sequencing, it does not take into account the necessary upstream procedures to isolate nucleic acid [24]. Second, Labcyte Echo source plates have a limited working volume which results in dead volume loss of reagents in each source well; however, because the overall volume of each master mix is greatly reduced by the miniaturization, this dead volume does not greatly affect the overall cost of the prep. Third, the ability to estimate the relative concentrations of samples within a pool may be limited by the broad range of sample input concentrations and the shallow depth of sequencing on the iSeq. This may result in some samples not achieving equimolar pooling on the NovaSeq and could be addressed by additional sequencing depth. Fourth, the bead-based clean-up steps described here require aspiration with traditional tip based liquid handlers. Finally, the capital expense cost of robots used in this protocol is significant. Access to a core lab facility with these machines will greatly reduce the initial start-up cost of this protocol.
In conclusion, we present an automated, miniaturized, high-throughput protocol to prepare RNA sequencing libraries using the NEB Ultra II RNA Library Prep Kit. With this workflow, it is possible to prepare 384 high-quality sequencing libraries with just 10% of the regular reagent volume, at less than 20% of the cost and in less than 10% of the time compared to the regular hand-prep. The workflow presented here may support the further advancement of clinical metagenomics as well as large scale sequencing projects.