Conceived and designed the experiments: NS JJDM MB. Performed the experiments: NS. Analyzed the data: NS. Contributed reagents/materials/analysis tools: NS. Wrote the paper: NS MB.
The authors have declared that no competing interests exist.
The ability to efficiently and economically generate libraries of defined pieces of DNA would have a myriad of applications, not least in the area of defined or directed sequencing and synthetic biology, but also in applications associated with encoding and tagging. In this manuscript, DNA microarrays were used to allow the linear amplification of immobilized DNA sequences from the array, followed by PCR amplification. Arrays of increasing sophistication (1, 10, 3,875 and 10,000 defined sequences) were used to validate the process, with sequences verified by selective hybridization to a complementary DNA microarray and by DNA sequencing, which demonstrated a PCR error rate of 9.7×10⁻³/site/duplication. This technique offers an economical and efficient way of producing specific DNA libraries of hundreds to thousands of members, with the DNA-arrays being used as “factories” allowing specific DNA oligonucleotide pools to be generated. We also observed substantial variance between the sequence frequencies determined by Solexa sequencing and by microarray analysis, highlighting the care needed in the interpretation of profiling data.
The ability to efficiently and economically generate libraries of defined pieces of DNA would have a myriad of applications, not least in the area of defined or directed sequencing and synthetic biology, but also in applications associated with encoding and tagging. There are many examples in which DNA has been used as an encoding device for peptides or small molecules, enabling the high-throughput screening of peptide/small molecule interactions with a range of biological targets.
Perhaps the first use of DNA encoding in this scenario was in the early days of combinatorial chemistry, with bead-based, DNA-encoded libraries composed of up to 800,000 heptapeptides.
Another application of DNA libraries is the generation of nucleic acid aptamers, which are able to bind molecular targets such as small molecules, proteins, nucleic acids, and even cells, tissues and organisms.
DNA microarrays can be efficiently and economically custom synthesized to contain high numbers (up to millions) of relatively long (up to 200 bp) DNA oligonucleotides.
Efforts have been made to obtain oligonucleotide libraries from a microarray by cleaving the oligonucleotides off the array followed by PCR amplification, thereby generating multiplex DNA libraries for parallel genomic assays.
Here we demonstrate an approach to the generation of DNA libraries from DNA microarrays, allowing the efficient and inexpensive production of custom-made thousand-member DNA libraries. The DNA libraries were generated while keeping the array intact and usable for subsequent applications, such as additional rounds of DNA production. This was achieved by fabricating arrays of up to 10,000 oligonucleotides, followed by “read-off” from the array using a DNA polymerase with subsequent amplification by PCR.
An ssDNA microarray was incubated with a primer (16 h) followed by elongation using
In order to explore the fidelity of the approach, microarrays were designed to contain increasing numbers of different DNA oligonucleotides (1, 10, 3,875 or 10,000) and were based on the 17 bp sequences (with a 12 bp variable region) complementary to a previously reported 10,000-member PNA-encoded peptide library.
The first oligonucleotide array was designed to contain just one sequence.
H = A, C, or T.
The 3,875 and 10,000-member oligonucleotide arrays were designed with the variable domain (12 bp;
Number of oligonucleotides × replicate spots | Number of spots |
2,000 oligonucleotides × 16 | 32,000 |
1,000 oligonucleotides × 8 | 8,000 |
500 oligonucleotides × 4 | 2,000 |
250 oligonucleotides × 2 | 500 |
125 oligonucleotides × 1 | 125 |
3,875 oligonucleotides in total | 42,625 |
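The totals in the table above can be checked, and the expected composition of the read-off pool derived, with a few lines of Python. This is a minimal sketch that assumes the yield of each oligonucleotide scales linearly with its number of replicate spots; the names `LAYOUT`, `layout_totals` and `expected_fraction` are hypothetical.

```python
from fractions import Fraction

# Replicate layout of the 3,875-member array, taken from the table above:
# (number of distinct oligonucleotides, replicate spots per oligonucleotide)
LAYOUT = [(2000, 16), (1000, 8), (500, 4), (250, 2), (125, 1)]


def layout_totals(layout):
    """Return (distinct oligonucleotides, total spots) for a replicate layout."""
    n_oligos = sum(n for n, _ in layout)
    n_spots = sum(n * reps for n, reps in layout)
    return n_oligos, n_spots


def expected_fraction(reps, layout):
    """Expected share of the pool for ONE oligonucleotide printed on `reps` spots.

    Assumption: read-off yield is proportional to the number of spots, so every
    spot contributes equally to the final pool.
    """
    _, n_spots = layout_totals(layout)
    return Fraction(reps, n_spots)


if __name__ == "__main__":
    print(layout_totals(LAYOUT))  # (3875, 42625)
    # A 16-spot oligonucleotide is expected at 16x the level of a 1-spot one.
    print(expected_fraction(16, LAYOUT) / expected_fraction(1, LAYOUT))  # 16
```

Under this assumption each 16-spot sequence would make up 16/42,625 (about 0.04%) of the pool, and each 1-spot sequence one sixteenth of that.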
The first steps in the process involved primer hybridization and elongation on the solid support and required extended reaction times for efficient production of a double-stranded (ds) DNA microarray, with one DNA strand covalently attached to the surface. The newly synthesized DNA strands could then function as templates for solution-phase PCR carried out over the microarray, leading to amplification of the ssDNA displayed on the microarray.
PCR “read-off” of the 1-member oligonucleotide array gave a 50 bp band by DNA gel electrophoresis.
Amplification off the 10,000-member oligonucleotide microarray was repeated five times after the initial round of primer hybridization, elongation and washing, without stripping off the newly synthesized DNA. The repeated rounds gave similar isolated yields of 39–40% (Eq. 1), illustrating that “read-off” can be performed multiple times without damaging the array.
Previous studies have shown that spacer molecules reduce steric interference of the support on the hybridization efficiency of immobilized oligonucleotides.
To allow microarray quantification of the DNA microarray “read-off” libraries, these were further amplified by PCR with a FAM-labeled primer and an unlabeled primer (primer-1 and primer-2-FAM for DNA-10; primer-3 and primer-4-FAM for DNA-3,875 and DNA-10,000), producing FAM-labeled dsDNA libraries (DNA-10-FAM, DNA-3,875-FAM and DNA-10,000-FAM).
The dsDNA-10-FAM was hybridized onto a complementary DNA microarray identical to the “read-off” DNA microarray (above). Fluorescent microarray imaging in combination with BlueFuse technology (BlueGenome) was used to quantify the intensity of the FAM-label and thereby determine the amount of DNA hybridized to each spot (ArrayExpress: E-MEXP-3102).
The double-stranded DNA-3,875 and DNA-10,000 libraries needed to be hybridized to DNA microarrays that encode only the 12 bp variable domain of the DNA-10,000 library.
Raw microarray data were obtained from BlueFuse, which performs grid alignment and signal estimation. The top ∼5% and the bottom ∼5% of each replicate set were removed as outliers (erroneous values caused by dust, scrapes, etc.).
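The outlier removal described above amounts to a trimmed mean over each replicate set. The sketch below assumes a fixed 5% cut at each end, rounded down to whole spots; the exact rule applied with BlueFuse is not given in this section, and `trimmed_mean` and `summarize` are hypothetical names.

```python
import math
from statistics import mean


def trimmed_mean(values, trim=0.05):
    """Mean of one replicate set after removing the top and bottom `trim` fraction.

    floor(trim * n) values are dropped from each end, so with trim=0.05 a set
    needs at least 20 spots before anything is removed.
    """
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    data = sorted(values)
    k = math.floor(trim * len(data))
    kept = data[k:len(data) - k]
    if not kept:
        raise ValueError("empty replicate set")
    return mean(kept)


def summarize(replicate_sets, trim=0.05):
    """Map each oligonucleotide ID to the trimmed mean of its spot intensities."""
    return {oligo: trimmed_mean(spots, trim) for oligo, spots in replicate_sets.items()}
```

For the 100 spots of the 1-member array (10×10 pattern) this rule drops five values from each end; for replicate sets of 16 or fewer spots it removes nothing, so the cut-off would need adapting for the 3,875-member array.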
The slight differences in average intensities for the 10 oligonucleotide graph (
The graph for the 3,875 oligonucleotides shows a linear relationship between the microarray intensity and the number of replicates, illustrating that the 3,875 DNA templates had been “read-off” and amplified in proportion to the number of replicates of each oligonucleotide on the microarray.
The average intensity versus the number of replicates for the 10,000 oligonucleotides showed a curved distribution, illustrating that the microarray “read-off” occurs uniformly over high-content arrays with few replicates of each oligonucleotide.
Solexa sequencing of the DNA-10,000 oligo-pool identified 9976 sequences from the possible 10,000 DNA oligonucleotides synthesized on the DNA microarray, giving a loss rate of 0.24% (24 oligonucleotides not seen out of 10,000).
Sequence | Microarray intensity |
(sequences not recoverable) | 1.53 E+04, 2.64 E+04, 1.12 E+04, 6.38 E+03, 1.53 E+04, 2.33 E+04, 3.81 E+04, 3.64 E+04, 1.69 E+04, 2.06 E+04, 1.80 E+04, 2.54 E+04, 3.57 E+04, 2.37 E+04, 3.47 E+04, 2.30 E+04, 3.25 E+04, 3.28 E+04, 2.37 E+04, 2.09 E+04, 3.75 E+04, 3.52 E+04, 2.13 E+04, 1.47 E+04 |
Of interest was that the 9976 sequences were seen between 1 and 4837 times each.
36-bp reads from the Solexa primer of the dsDNA-10,000 oligo-pool generated by “read-off” from the 10,000-oligonucleotide microarray.
The relatively high number of rare hits seen in
Based on the data from sequencing and the microarray screening it can be assumed that the relative amounts observed by sequencing are an effect of the
The PCR error rate was calculated using the formula given by Hayes (1965;
The effective number of duplications can be calculated from the template-product ratio. The amount of PCR product amplified from ∼2.9×10⁻¹³ g of microarray-supported template DNA (Agilent) was determined to be 111 µg, and the effective number of duplications was calculated to be 18.8.
This error rate is slightly higher than the error rate typically observed for the
Four microarrays with 1, 10, 3,875 or 10,000 different oligonucleotide sequences were used to determine whether they could serve as a platform for large-scale DNA synthesis. A novel microarray “read-off” technology was established that allows high-throughput amplification of microarray-supported DNA probes and the production of DNA libraries containing tens of thousands of members.
DNA sequencing and microarray hybridization of the 1-, 10-, 3,875- and 10,000-member DNA oligonucleotide “read-off” libraries illustrated that microarray “read-off” had occurred uniformly over the whole of the high-content DNA microarrays, and that the amount of each oligonucleotide in the library mixture was determined by its number of replicates on the “read-off” array. The DNA-arrays could be used as “factories” allowing specific DNA oligo pools to be generated with or without masking. The PCR error rate for the combined microarray PCR “read-off” and subsequent PCRs was calculated to be 9.7×10⁻³/site/duplication, which is comparable to the error rate typically observed for the
This technique offers efficient and inexpensive generation of thousands of defined oligonucleotides, which could allow the rapid synthesis of specific primers for use in genome sequencing and genotyping assays or DNA-encoding methods and aptamer screening. Furthermore, this method gives easy access to unpurified mixtures of microarray-synthesized oligonucleotides, which have been used directly in the generation of high-quality gene assembly
Another application of the technique could be the synthesis of defined siRNA libraries by employing an RNA polymerase.
Interestingly, we also observed that the results of microarray hybridization analysis did not correlate with those of Solexa sequencing, owing to specific consensus sequences that sequenced poorly. The oligonucleotides not seen by sequencing were identified in substantial amounts by microarray hybridization. Together with the relatively low PCR error rate of the combined microarray PCR “read-off” and subsequent PCR amplification, this demonstrates that the “read-off” approach is not sequence dependent, whereas Solexa sequencing is. Similarly, significant skewing has previously been reported in Solexa sequencing of a PCR-amplified synthetic oligonucleotide library.
The 1-member oligonucleotide microarray was generated by contact printing a 3′-amino-modified DNA oligonucleotide (Microsynth) onto a Codelink® slide in a 10×10 pattern. After printing, the unreacted sites on the slide were blocked with ethanolamine, and the array was washed briefly with 0.2% SDS in 4× SSC (Fisher Scientific), then with 0.1% SDS in 2× SSC for 2×5 min, 0.2× SSC for 5 min and 0.1× SSC for 5 min, and dried under a flow of N₂. All other DNA microarrays were custom fabricated by Oxford Gene Technologies (OGT).
Samples (20–30 µL) were mixed with 6× Blue/Orange Loading Dye (5 µL, Promega) and DNA-grade H2O and run on a 5% (w/v) agarose gel (Promega, preparative grade for small fragments) in 1× Tris-borate-EDTA buffer (TBE, pH 8.3, Fisher Scientific) for approximately 1 h. The gel was analyzed under UV light, and the appropriate bands were excised with a scalpel. DNA was purified using a QIAEX II Agarose Gel Extraction Kit (Qiagen) according to the manufacturer's protocol.
Elongation reaction mix (200 µL) without primers was prepared according to a Promega standard protocol using a PCR Master Mix (Promega, 25 U/mL
The purified products (250 ng) from each of the PCR “read-off” microarrays were used as templates in another round of PCR with primer-1 and 2 (1 µM)
dsDNA-10,000-2 (200 ng)
The purified fluorescent PCR constructs were dissolved in 0.1% SDS in 4× SSPE buffer (110 µL; 0.6 M NaCl, 40 mM NaH2PO4, 5 mM EDTA in H2O at pH 7.4) and denatured at 65°C for a minimum of 5 min. This solution was hybridized on a customized DNA array (OGT) in an Agilent hybridization chamber, with the temperature decreased from 65 to 27°C over 24 h (conditions were optimized to exclude mismatches during hybridization). The arrays were washed with 0.2% sodium dodecyl sulphate (SDS, Promega) in 2× saline-sodium citrate (SSC, 20 mL, Promega) for 5 min, 0.2× SSC (20 mL) for 5 min and 0.1× SSC (20 mL) for 5 min, briefly rinsed in DNA-grade H2O (20 mL) and Tris buffer at pH 8.0 (20 mL, 10 mM), and dried under a N₂ flow. The microarrays were imaged with a Tecan LS Reloaded microarray scanner using a FITC filter, and the images were analyzed using BlueFuse (BlueGenome) software (ArrayExpress accession number E-MEXP-3102; all microarray data comply with the Minimal Information About a Microarray Experiment (MIAME) guidelines).
dsDNA-LL10,000 (200 nmol) was Illumina-sequenced with 36-base reads off the Solexa-primer-1 domain at the end of each oligonucleotide (The GenePool, The University of Edinburgh). The resulting reads were clustered against a list of the 10,000 oligonucleotides in the 10,000-member library, and a list of the identified sequences was generated, including the number of times each oligonucleotide was seen. A second list of the sequences not seen by Illumina Solexa sequencing was also generated (ArrayExpress accession number E-MTAB-540).
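The clustering and "not seen" lists described above can be approximated, for illustration, by exact matching of a fixed window of each read against the designed sequences. This is a minimal sketch under stated assumptions: the window offset `tag_start`, the exact-match criterion and the function names `tally_reads` and `loss_rate` are hypothetical, and the clustering method actually used (which may tolerate mismatches) is not specified here.

```python
from collections import Counter


def tally_reads(reads, designed, tag_start=0):
    """Count how often each designed sequence is seen in a set of reads.

    Hypothetical matching rule: all designed sequences share one length, sit at
    the fixed offset `tag_start` in every read, and must match exactly; reads
    that do not match any designed sequence are ignored.
    Returns (Counter of seen sequences, list of designed sequences never seen).
    """
    lengths = {len(s) for s in designed}
    if len(lengths) != 1:
        raise ValueError("designed sequences must all have the same length")
    (tag_len,) = lengths
    wanted = set(designed)
    counts = Counter()
    for read in reads:
        window = read[tag_start:tag_start + tag_len]
        if window in wanted:
            counts[window] += 1
    unseen = [s for s in designed if s not in counts]
    return counts, unseen


def loss_rate(designed, unseen):
    """Fraction of designed sequences with no matching read."""
    return len(unseen) / len(designed)
```

With 9976 of 10,000 designed sequences observed, the same arithmetic gives 24 unseen sequences and a loss rate of 24/10,000 = 0.0024, i.e. 0.24%.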
The PCR error rate was estimated using the formula given by Hayes (1965, Eq. 2):
The observed error number per sequence was calculated as follows:
The length of the microarray supported DNA templates is 60 bp (see
After elongation on the microarray and PCR, the product (1.65 µg; dsDNA-10,000) was used as template in a subsequent PCR with Solexa primers. The amount of PCR product obtained (dsDNA-10,000-2) was 110.8 µg. The effective number of duplications (# of cycles) was calculated from Eq. 6: