Nanoliter Reactors Improve Multiple Displacement Amplification of Genomes from Single Cells

Since only a small fraction of environmental bacteria are amenable to laboratory culture, there is great interest in genomic sequencing directly from single cells. Sufficient DNA for sequencing can be obtained from one cell by the Multiple Displacement Amplification (MDA) method, thereby eliminating the need to develop culture methods. Here we used a microfluidic device to isolate individual Escherichia coli and amplify genomic DNA by MDA in 60-nl reactions. Our results confirm a report that reduced MDA reaction volume lowers nonspecific synthesis that can result from contaminant DNA templates and unfavourable interaction between primers. The quality of the genome amplification was assessed by qPCR and compared favourably to single-cell amplifications performed in standard 50-μl volumes. Amplification bias was greatly reduced in nanoliter volumes, thereby providing a more even representation of all sequences. Single-cell amplicons from both microliter and nanoliter volumes provided high-quality sequence data by high-throughput pyrosequencing, thereby demonstrating a straightforward route to sequencing genomes from single cells.


Introduction
Recovery of whole genome sequences from single cells greatly facilitates the study of microbial ecology and evolution because the majority of microorganisms cannot be obtained in pure culture [1,2]. A method called Multiple Displacement Amplification (MDA) [3][4][5][6] enables genome amplification from single cells isolated by flow cytometry (FACS) [7] or by serial dilution [8]. Micromanipulation methods [9] have allowed isolation of cells identified by fluorescent in situ hybridization (FISH) using 16S rRNA gene probes, allowing specific microbes to be selected and increasing the confidence of asserting the presence of single cells in MDA reactions. Partial genome sequencing from single-cell amplicons has also been demonstrated [8,10,11]. MDA suffers from two unwanted characteristics: (1) nonspecific synthesis [3][4][5][8] coming from either DNA contamination competing with the intended template or endogenously generated DNA such as primer dimers, and (2) uneven representation of the template due to amplification bias [4,5] that is worsened by stochastic effects of MDA from a single copy of the genome. In the initial report of MDA from a single bacterial cell [7], an estimated 70% of DNA synthesis was nonspecific and, of the 30% that was specific to the cell isolated, amplification bias ranged over several orders of magnitude. A recent study using a combination of MDA and rolling circle amplification showed single-molecule amplification of circular 7-kb DNA templates, and demonstrated that improved specificity could be achieved by reducing the volume of the MDA reaction [12] from the standard 50 μl down to 600 nl. The effect of the lower volume on amplification bias was not determined.
Meticulous reagent cleaning and strict sample handling procedures can be used to make background amplification negligible in microliter MDA reactions; this enabled accurate assembly of 62% and 66% of individual Prochlorococcus genomes after conventional Sanger sequencing to depths of 3.5× and 4.7×, respectively [8]. Due to amplification bias, these sequencing depths are greater than would be required for unamplified DNA template.
Here, we studied the performance of MDA on single-cell genome amplification and show by means of a direct comparison that amplification bias is reduced and specific amplification is increased as the reaction volume shrinks from microliters to nanoliters. Parallel single-cell isolation and whole genome amplification was performed using a dedicated microfluidic chip with 60-nl reactors. The microfluidic device has an integrated cell sorter to isolate selected individual cells, thereby allowing flexible sample selection and avoiding the contamination that can reportedly be introduced by conventional FACS [8]. Parallel amplification of defined cells greatly reduces reagent consumption, making the process more economical. This technique allows for significant improvement of the amplification specificity by strongly reducing the background amplification without requiring stringent reagent or sample handling protocols. Direct pyrosequencing of the amplification product shows highly specific assembly and mapping onto the consensus genome.

Results/Discussion
The microfluidic chip allows for the parallel isolation and addressing of eight single bacteria into separate 3-nl chambers in an automatic fashion, and has also been used to perform a manual survey of environmental microbes in the human oral cavity [13]. We used it here to perform an automatic isolation and addressing of single E. coli fluorescently labeled by a generic DNA stain. Both the microscope and the valve operation are driven by custom software. Cells are pumped into the sorting channel ( Figure 1). As soon as a fluorescent signal is detected in front of the isolation region, the isolation valve is closed. An image is taken and analyzed to count the number of cells. If the cell number is not equal to one, the algorithm goes back to the previous step. If there is one cell, the sorting valve is opened and the single cell is pumped into the template chamber until it is detected at its entrance and the chamber is then closed. The algorithm is then iterated for the next processing units, uniquely addressed using the multiplexer. Once all chambers are processed, they are carefully checked to verify the presence of a single cell (or absence in the no-cell control chamber). An example of one round of isolation is shown in Figure 1. The first seven chambers contain one single fluorescently labeled cell and the last one is used as a control. A semiautomated (or manual) procedure using unstained cells was essentially similar to the automated one, the different steps being performed by the operator instead of the algorithm. The cell detection and counting were performed visually using phase contrast microscopy.
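The isolation loop described above can be summarized as a small control routine. The following is only a minimal sketch and not the custom software used on the chip: the function name `sort_single_cells`, the `count_cells` callback (standing in for valve closure, image capture, and particle counting), and the `max_attempts` safety limit are all hypothetical names introduced for illustration.

```python
from typing import Callable, List


def sort_single_cells(count_cells: Callable[[], int],
                      n_chambers: int = 7,
                      max_attempts: int = 1000) -> List[int]:
    """Fill each chamber with exactly one cell.

    count_cells is a hypothetical stand-in for "close the isolation valve,
    record an image, and count the cells trapped in the isolation region".
    Returns the number of trapping attempts needed for each chamber.
    """
    attempts_per_chamber = []
    for _chamber in range(n_chambers):
        attempts = 0
        while True:
            attempts += 1
            if attempts > max_attempts:
                raise RuntimeError("no single cell trapped within max_attempts")
            if count_cells() == 1:
                # Exactly one cell: the sorting valve would be opened here and
                # the cell pumped into the template chamber, which is then closed.
                break
            # Zero or several cells: discard the plug and wait for the next signal.
        attempts_per_chamber.append(attempts)
    return attempts_per_chamber
```

With a simulated sequence of cell counts, the routine reports how many trapping attempts each chamber required; the no-cell control chamber is simply left out of the loop.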
On-chip whole-genome amplification of single E. coli cells was performed in a final reactor volume of 60 nl, yielding about 10^7 copies of the genome. The amplification units contain four chambers, each one dedicated to a single step of the amplification protocol: a template chamber (3-nl volume), an alkaline cell lysis and DNA denaturation chamber (3.5 nl), a neutralization chamber (3.5 nl), and a reaction chamber (50 nl). Chip operation is shown in Figure 2. The lysis buffer was pushed into the feed line until all the air escaped from the channel. We then opened both the feed and lysis valve. The solution containing the cell was pushed into the lysis chamber, replacing the air inside. Once the chambers filled, we closed the feed valve, waited about 20 min, and washed the feed line first with air, then with the neutralization solution; the waste valve was open during this operation. We then reopened the feed valve, opened the neutralization valve, and pushed the liquid until complete filling of the neutralization chamber was achieved. Neutralization occurred for about 15 min, during which time the feed line was washed again, this time with the reaction mix. The reaction chamber was finally loaded by opening the feed and reaction valves. All the chip valves were then closed, except for the three valves separating the chambers, and the chip was placed onto a hotplate set at 32 °C for 10 h to 16 h to carry out the amplification step. We retrieved the samples in a volume of about 10 μl. The amount of DNA resulting from the amplification was about 40 ng to 50 ng. We estimated the copy number at this step using quantitative PCR of a region of the gene coding for the SSU rRNA (Figure 2). The 60-nl reactor provided fairly reproducible DNA yields; the distribution of yields over 25 separate single-cell amplifications was approximately lognormal with median 1.4 × 10^7 and geometric standard deviation of 19.
We reamplified half of each sample, this time using the protocol recommended by the manufacturer. This second amplification yielded about 50 μg of DNA in the 50-μl reaction. The yields are also highly reproducible and demonstrate the utility of serial amplifications when starting from small initial reaction volumes. MDA tends to be a self-limiting reaction, with DNA yields reaching a plateau at about 0.7-1.0 μg/μl. Therefore, the overall amplification should still be about 10^9- to 10^10-fold [7], whether amplifying from one E. coli cell in a single 50-μl reaction or serially amplifying, first to the plateau level in 60 nl, and then transferring half or more of this to a 50-μl MDA reaction.
For control reactions, 12 replicate MDAs were carried out in a 50-μl volume as recommended by the manufacturer (and referred to hereafter as the ''standard reaction''), with each reaction receiving a single E. coli cell isolated by micromanipulation [9,10]. The yield for the control reactions ranged between 40 μg and 50 μg of DNA. Quantitative PCR (qPCR) using Taqman assays for ten different single-copy loci [7] of the E. coli genome was used to evaluate all MDA reactions. The results are reported as concentration (copies/μl) (Figure 3). Amplification bias is indicated by the over- and underrepresentation of the ten loci. The bias is far greater for the 50-μl MDA reactions, and several loci were not detected at all in four of the 12 reactions, whereas all loci were detected in the microfluidic reactions. On a percent basis, where a value of 100% indicates that a locus is still present at one copy per genome [5] after the amplification, the loci ranged from >0.1% to 106% with coefficient of variation of 223% for the twelve 50-μl MDA reactions and 21% to 80% and 16% to 77%, with coefficient of variation of 88% and 135%, respectively, for the two microfluidic MDA reaction sets (Figure 3D).
It is not immediately obvious why bias would be lower in smaller volumes, as bias is generally thought to be a concentration-independent phenomenon resulting from differences in local priming efficiency at different DNA sequences. Bias is also thought to result from stochastic effects early in the reaction, in which uneven amplification occurs by chance across the template. One possible explanation is that the small volume reactions have reduced competition with contaminant or endogenously generated background, thus providing more DNA polymerase molecules per E. coli template and ensuring more uniform amplification. It is also possible that damage to the DNA template is reduced in the microfluidic system, and this results in lower amplification bias. As pointed out above, the 60-nl MDA and subsequent MDA up to a final volume of 50 μl gives a net amplification of 10^9- to 10^10-fold, just as for the single 50-μl standard reactions. Therefore, any difference in amplification bias or coverage between nanoliter and microliter reactions should not be simply due to a lower total-fold amplification in the 60-nl case.
qPCR carried out at a sufficient number of different loci can also be used to estimate the specificity of the reaction for the single E. coli DNA template [5]. While amplification bias will result in over- and underrepresentation of various loci, the average representation of the combined single-copy loci tested will approach one copy per genome if the amplified DNA is entirely E. coli sequence. The average of the ten loci for the two sets of 60-nl MDA reactions was 0.47 and 0.38 copies per genome, respectively, indicating that approximately 47% and 38% of the amplified DNA was specifically derived from the E. coli DNA template and implying that 53% and 62%, respectively, was nonspecific DNA synthesis such as primer dimers or amplification from contaminating DNA templates. The larger-volume control reactions were less specific, with an average locus representation of only 27%. This was consistent with a previous report of 30% for standard 50-μl MDA reactions [7].
In order to measure the specificity of the amplification, large-scale pyrosequencing was performed on the two best single-cell amplicons from each method (Figure 3, dashed lines), based on their performance in the qPCR assays. In the qPCR assays, these had loci representations, averaged over all ten loci tested, of 80% for the 60-nl microfluidic MDA reactions and 88% for the 50-μl control reaction. Therefore, it was estimated that 20% and 12%, respectively, was nonspecific DNA synthesis. When these samples were used in pyrosequencing, the nanoliter MDA sample produced 97,470 usable reads, totaling 9,833,093 bases, and the microliter MDA sample produced 114,551 usable reads, totaling 11,518,280 bases, each sample being processed on one half of a 454 Life Sciences Genome Sequencer 20 picotiter plate. For each sample, a de novo assembly of the reads was performed using the GS 20 Data Processing Software runAssembly script. For both samples, the read status statistics for assembly were similar (Table 1): 82% of the reads assembled, 7%-8% of the reads had no overlap and remained singleton, 10% of the reads were identified as likely repeats, and less than 0.4% of the reads were identified as problematic by the assembler, possibly due to being chimeric sequences.
Individual reads were mapped to the E. coli K12 reference strain genome using the GS 20 Data Processing Software runMapping script. For both samples, >95% of all reads mapped fully or partially to the reference genome. This level is reasonably consistent with the qPCR analysis (Figure 3), which estimated that about 80% and 88% of DNA was specific for the 60-nl and 50-μl reactions, respectively, but was based on the average of only ten loci. The qPCR reactions provide a useful quality-control assay prior to sequencing, as well as testing the relative performance between different reaction conditions, such as two volumes (Figure 3). Even for novel microbes, qPCR quality-control assays based on 16S rDNA sequences or other initially identified loci can be used to assure a high level of success in subsequent sequencing efforts. One possibility that remains to be tested is that the 454 sequencing protocol itself may also improve specificity if small nonspecific DNAs such as primer dimers are lost during PCR or DNA clean-up steps. If so, the combination of MDA and emulsion PCR sample preparation will be a particularly powerful approach to single-cell sequencing applications. The assembly process also greatly improved specificity for E. coli sequence, with over 99% of the contigs mapping to the reference genome. For the nanoliter sample, contigs covered 33% of the reference genome. For the microliter sample, contigs covered 39% of the reference genome. Since the microliter sample was prescreened by qPCR to assure relatively low amplification bias, and was therefore not a representative sample, it is not surprising that its performance was comparable to the nanoliter sample.
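The two qPCR-derived quantities used here, the mean locus representation (taken as the specific fraction) and the coefficient of variation across loci (taken as a measure of bias), can be computed as in the following minimal sketch. The function name `locus_summary` is hypothetical, the input values in the test are illustrative rather than measured data, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def locus_summary(copies_per_genome):
    """Summarize qPCR results for a set of single-copy loci.

    copies_per_genome: representation of each locus after amplification,
    where 1.0 means the locus is still present at one copy per genome.
    """
    avg = mean(copies_per_genome)
    return {
        # Mean representation approaches 1.0 if all amplified DNA is on-target.
        "specific_fraction": avg,
        "nonspecific_fraction": 1.0 - avg,
        # Assumption: sample standard deviation (n - 1 in the denominator).
        "cv": stdev(copies_per_genome) / avg,
    }
```

For example, a mean representation of 0.47 corresponds to an estimated 47% specific and 53% nonspecific synthesis, as reported for the first set of 60-nl reactions.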
The average length of mapped reads was 100 bases. For the nanoliter sample, reads covered 61% of the reference genome. For the microliter sample, reads covered 67% of the reference genome. Under the ideal assumptions implicit in the Lander-Waterman model for the fraction of genome expected to be covered by reads, for the nanoliter sample, with 9,408,767 mapped bases expected to provide an average of 2.03-fold sampling of the genome, the expected genome coverage is 87%, but only 61% was covered. For the microliter sample, with 10,878,753 mapped bases, the expected genome coverage is 90%, but only 67% was covered. Thus, for both samples, only 70%-75% of the expected genome coverage was achieved. The amplification bias created by MDA [4,5,7,14] can account for this discrepancy. Overrepresentation of some sequences and underrepresentation of others should result in the coverage being lower than the ideal value. Nevertheless, the ability to sequence unculturable microbes, even with the requirement of sequencing to greater depth in order to close genomes, promises to be a major advancement in microbiology.
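The expected coverage figures follow from the Lander-Waterman relation, in which the fraction of the genome covered is 1 - exp(-c) for an average sampling depth c. The sketch below reproduces the arithmetic under one stated assumption: the reference genome length, which is not given in the text, is taken to be 4,639,675 bp (the commonly cited length of the E. coli K12 MG1655 sequence). The names `expected_coverage` and `GENOME_LENGTH` are introduced here for illustration.

```python
import math

# Assumption: E. coli K12 MG1655 reference length; not stated in the text.
GENOME_LENGTH = 4_639_675


def expected_coverage(mapped_bases: int, genome_length: int = GENOME_LENGTH) -> float:
    """Lander-Waterman expected fraction of the genome covered by reads."""
    depth = mapped_bases / genome_length  # average fold sampling, c
    return 1.0 - math.exp(-depth)


nano = expected_coverage(9_408_767)    # nanoliter sample
micro = expected_coverage(10_878_753)  # microliter sample

# Observed coverage (61% and 67%) as a fraction of the ideal expectation.
nano_ratio = 0.61 / nano
micro_ratio = 0.67 / micro
```

Under this assumption the depths are about 2.03 and 2.34, the expected coverages round to 87% and 90%, and the observed-to-expected ratios fall between 0.70 and 0.75, in line with the values quoted above.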
A second method used to assess representational bias was to partition the reference genome into ten equally sized segments, count the number of reads whose centers mapped to each segment, and analyze the resulting histograms. For the nanoliter sample, the standard deviation of counts among the ten segments was 29% of the mean value. For the microliter sample, the standard deviation of counts among the ten segments was 38% of the mean value. By comparison, in four regions of 454 sequencing data from a non-MDA sample (Erythrobacter litoralis), the standard deviation of counts among ten segments ranged from 3% to 4% of the mean value. The lower bias for sequences derived from the 60-nl MDA reaction is in agreement with the qPCR analysis, which also indicated that lower reaction volume reduces bias (Figure 3).
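The segment-based measure can be sketched as follows. This is an illustration only: the function name `segment_count_cv` is hypothetical, read centers are assumed to be 0-based reference coordinates, and the sample standard deviation is assumed because the text does not specify the estimator.

```python
def segment_count_cv(read_centers, genome_length, n_segments=10):
    """Count reads per equal-sized genome segment and return (counts, sd/mean).

    read_centers: 0-based reference positions of read centers (assumption).
    """
    if not read_centers:
        raise ValueError("at least one read is required")
    counts = [0] * n_segments
    for pos in read_centers:
        # Integer arithmetic assigns each position to its segment; the min()
        # guards against a position equal to genome_length.
        idx = min(pos * n_segments // genome_length, n_segments - 1)
        counts[idx] += 1
    mean_count = sum(counts) / n_segments
    # Assumption: sample standard deviation across the segments.
    variance = sum((c - mean_count) ** 2 for c in counts) / (n_segments - 1)
    return counts, variance ** 0.5 / mean_count
```

A perfectly uniform sample gives a ratio of 0; the values reported above correspond to ratios of 0.29 and 0.38 for the MDA samples and 0.03 to 0.04 for the non-MDA sample.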
The MDA reaction is known to generate DNA chimeras. We used the sequence data to measure the frequency of chimeras. Using the reads that only partially mapped onto the reference genome, we discovered 792 chimeras in the nanoliter sample and 495 chimeras in the microliter sample. This corresponds to a chimera rate of 1 in 10 kbp and 1 in 20 kbp, respectively. This is slightly better than observed when sequencing Sanger libraries of MDA amplifications, and slightly worse than observed when the samples are treated with an S1 nuclease [8]. Additional studies will be required to determine if the difference in chimera rates between nanoliter and microliter samples is significant. The reaction pathway that leads to chimera formation was recently solved [15] and may be unrelated to amplification bias, template specificity, and percent coverage. Chimera formation results from alternative secondary structures that can occur in the branched DNA formed during the MDA reaction. DNA 3′ termini extended on an initial template can be displaced by a branch migration mechanism and then extended on a new template, creating the rearranged sequence. Eighty-five percent of chimeras consisted of two segments joined in inverted orientation as predicted by the model.
In conclusion, we have shown that microfluidic devices allow accurate and high fidelity single-cell genome amplification by MDA in nanoliter volumes. The advantages of going to smaller volumes include a higher percentage of specific product from the targeted DNA template, reduced amplification bias, and significant economies of scale in terms of reagent consumption. High throughput pyrosequencing of the amplicons shows that the specificity from single cells is extraordinarily high, which suggests that it may be possible to perform full genome assemblies with this procedure. Pyrosequencing has the advantage of simplified library construction, and although current pyrosequencing technologies have shorter read lengths than Sanger sequencing, substantial de novo assembly was achieved. For unculturable organisms, this approach will greatly increase the diversity of species amenable to genomic study. The ability to rapidly acquire even substantial portions of the genome, when used to aid in assembly of existing metagenomic shotgun data [16], promises to greatly accelerate genomic discovery of new microbial species.

Materials and Methods
Chip fabrication. Chips were made as described for a push up geometry [17] with some adjustments described in detail in Protocol S1.
Chip design. The chip has nine processing units; eight of them have both the cell sorting and amplification features and one is dedicated to the positive control and lacks the sorting capability.
The chip contains 24 control lines operating 225 valves. The valves define: six valves to control input lines, 9 × 1 pumps, a base-3 multiplexer replicated on both sides of the chip to uniquely address each processing unit for input and output, 8 × 3 valves for the sorting process, and 9 × 6 valves for the amplification operation. The attribution of the different inlets is described in the Supporting Information.
Chip operation for cell sorting. All operations were performed using custom software written in LabVIEW (National Instruments, http://www.ni.com/) driving both the valves through a BOB3 controller (Fluidigm, http://www.fluidigm.com/) operating 24 solenoid valves and an automated microscope (DMI6000; Leica, http://www.leica-microsystems.com/) onto which the chip was mounted. The automatic cell sorting and addressing was performed with a 20× phase contrast objective and the image recorded using a CCD camera (Retiga 2000; Qimaging, http://www.qimaging.com/). The chip was positioned using dedicated alignment marks replicated for each processing unit. One phase contrast image of the first mark was recorded and used as a reference to automatically position and adjust the focus for the next processing units by image comparison analysis. The automatic cell sorting was operated using fluorescence. Every detection step was performed by recording an image of a small area. This image was analyzed in real time using a LabVIEW particle analysis subroutine after low pass filtering and thresholding. The final verification of the chambers was performed either by fluorescence or by subtracting two phase contrast images a few seconds apart. Since we never observed a cell adhering to the template chamber, the resulting image always exhibited a bright and a dark spot corresponding to two different positions of a single cell displaced by Brownian motion.
On-chip amplification protocol. The on-chip amplification procedure was made with the Qiagen REPLI-g MIDI kit (http://www1.qiagen.com/) using a modified protocol. The template chamber where the cells were addressed was filled with PBS buffer with 0.2% Triton X-100. The lysis buffer and neutralization buffer were prepared as recommended in the protocol for cell amplification. The reaction mix was supplemented to reach a final concentration of 2× kit polymerase and 0.2% Tween 20. After amplification, the samples were retrieved. The feed line was flushed with air and then washed with TE buffer containing 0.2% Tween 20. Then the feed and the output valves were opened and the amplified reactions in the chamber were pushed into gel loading tips placed into the outlet holes until reaching the desired volume.
The off-chip reamplification was performed following the protocol recommended by the manufacturer for genomic DNA using 5 μl of the amplified samples.
Micromanipulated cells. The system for micromanipulation has been described in detail elsewhere [9]. Briefly, an inverted microscope (Olympus IX70, http://www.olympusmicro.com/) with micromanipulation equipment (TransferMan NK2 and CellTram Vario; Eppendorf, http://www.eppendorfna.com/) was used with sterilized glass capillaries (ID 10 μm, Eppendorf) to isolate single E. coli cells from a suspension of cells into 200 nl of TE buffer. The cells were placed on ice until all cells for the control series were collected. TE buffer (2.8 μl) was added and MDA carried out using the Repli-g kit (Qiagen) following the manufacturer's recommended protocol. After incubation for 16 h at 30 °C the reactions were terminated at 65 °C for 3 min.
PCR analysis. The on chip amplification was analysed using a SYBR Green qPCR assay (Bio-Rad Laboratories, http://www.bio-rad.com/), with primers directed at a 200-bp region of the 16S rRNA gene of E. coli strain K12. The resulting copy number was then divided by seven, the number of copies of this gene in the genome.
The reamplified MDA products were analyzed by Taqman assay for ten different single-copy loci together with the control reactions [7]. The concentration of double stranded DNA was measured using a PicoGreen assay (Invitrogen, http://www.invitrogen.com/).
Pyrosequencing. Approximately 5 μg of the MDA products chosen for sequencing were used for 454 library construction according to the recommended procedures of 454 Life Sciences (http://www.454.com/). Sixteen emulsion PCRs were set up for each sample, and 454 standard protocols were followed for both enrichment and sequencing.
Sequence Analysis. For each sample, de novo assembly of the pyrosequencing data was performed using the GS 20 Data Processing Software runAssembly script. Assembly metrics were collected from the output 454NewblerMetrics.txt file.
For each sample, a mapping of the pyrosequencing reads to the E. coli K12 reference sequence was performed using the GS 20 Data Processing Software runMapping script. Mapping metrics were collected from the output 454NewblerMetrics.txt file. Reads categorized as PartiallyMapped (Partial) were further analyzed by using National Center for Biotechnology Information (NCBI) BLAST Version 2.2.10 (http://www.ncbi.nlm.nih.gov/) to align each of these reads to the E. coli K12 reference sequence. Reads that had two segments of length >20 bp that mapped to noncontiguous portions of the reference genome were characterized as chimeric.
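The chimera criterion can be illustrated with a short classifier over the alignment segments of one read. This is a sketch under stated assumptions rather than the procedure actually used: the `Hit` record, the `is_chimeric` function, and the `max_gap` tolerance that defines "noncontiguous" are hypothetical, and the sketch ignores reads spanning the origin of the circular reference.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Hit:
    """One aligned segment of a read (0-based, half-open coordinates)."""
    read_start: int
    read_end: int
    ref_start: int  # always < ref_end, regardless of strand
    ref_end: int
    strand: str     # "+" or "-"


def is_chimeric(hits: Iterable[Hit], min_len: int = 20, max_gap: int = 10) -> bool:
    """True if two segments longer than min_len map to noncontiguous places.

    Assumption: consecutive segments are "contiguous" when they lie on the
    same strand and meet on the reference within max_gap bases.
    """
    long_hits = sorted(
        (h for h in hits if h.read_end - h.read_start > min_len),
        key=lambda h: h.read_start,
    )
    for a, b in zip(long_hits, long_hits[1:]):
        if a.strand != b.strand:
            return True  # inverted junction
        if a.strand == "+":
            ref_gap = b.ref_start - a.ref_end
        else:
            # On the minus strand the read runs toward lower reference positions.
            ref_gap = a.ref_start - b.ref_end
        if abs(ref_gap) > max_gap:
            return True  # segments map to distant reference positions
    return False
```

A read whose two segments join in inverted orientation, the most common arrangement reported above, is flagged by the strand comparison alone.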