Conceived and designed the experiments: JD FA MR BG. Performed the experiments: SS SM BG. Analyzed the data: JD SS RJ BG. Contributed reagents/materials/analysis tools: JWL RWD MR. Wrote the paper: BG.
The authors have declared that no competing interests exist.
Pyrosequencing is a DNA sequencing method based on the principle of sequencing-by-synthesis and pyrophosphate detection through a series of enzymatic reactions. This bioluminometric, real-time DNA sequencing technique offers unique applications that are cost-effective and user-friendly. In this study, we have combined a number of methods to develop an accurate, robust and cost-efficient method to determine allele frequencies in large populations for association studies. The assay offers the advantage of minimal systemic sampling errors, uses a general biotin amplification approach, and replaces dATP-alpha-thio readouts with dTTP readouts to avoid non-uniformly high peaks and thereby increase accuracy. We demonstrate that this newly developed assay is a robust, cost-effective, accurate and reproducible approach for large-scale genotyping of DNA pools. We also discuss potential improvements of the software for more accurate allele frequency analysis.
Population-based studies are commonly used to locate genes that underlie complex diseases in genetic association studies, which have been shown to be a crucial tool for mapping complex diseases and traits. Although the cost of individual SNP genotyping has been reduced significantly, the use of DNA pooling decreases the cost even further, especially for large-scale genetic studies.
Pyrosequencing
In this study, we have addressed these issues by developing a robust, cost-effective, accurate and reproducible assay for large-scale genotyping of DNA pools. The assay combines robotic DNA pooling, universal biotin amplification, touchdown PCR, lower DNA concentrations, and finally the replacement of dATP-alpha-thio readouts with dTTP readouts, achieved by redesigning the genotyping for accurate peak uniformity. The assay is remarkably cost-effective and generally applicable.
A total of 192 patients with Parkinson's disease (PD) and 192 control individuals were enrolled in this study. A consent form was signed by all patients participating in this project. Genomic DNA from PD patients and control individuals was extracted from blood and quantified by spectrophotometer (Qiagen, Hilden, Germany). DNA quality was verified by both gel electrophoresis and spectrophotometry in order to evaluate DNA integrity and any possible contamination of the DNA samples by RNA or protein. DNA was accepted only if it showed no degradation on the electrophoresis gel and had a 260/280 ratio between 1.7 and 2.
The initial concentration of each DNA sample from both controls and cases was 15 ng/μl. The DNA concentration measurements were performed with a NanoDrop ND spectrophotometer (NanoDrop Technologies, Wilmington, DE). All the samples were robotically diluted to 1.5 ng/μl with TE buffer (10 mM Tris, brought to pH 8.0 with HCl, and 1 mM EDTA) using a Biomek FX Dual Bridge Laboratory Automation Workstation (Beckman Coulter, Fullerton, CA). From the diluted samples, 10 μl of each of the 192 controls was robotically pooled from 96-well Thermo-Fast microplates (Abgene, Surrey, UK) into Deep Well titer plates (Beckman, Fullerton, CA). The same procedure was applied separately to the 192 DNA samples from the cases to give a final pooled DNA concentration of 1.5 ng/μl (15 ng/10 μl). To evaluate the dilutions, control and case samples were also robotically pooled, separately, into pools of 96 and 192 samples.
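The dilution and pooling arithmetic described above can be checked with a short sketch. This is only an illustration, not the workstation protocol: the function names and the 100 μl final dilution volume are assumptions made for the example, while the concentrations, the 10 μl aliquot and the pool size of 192 come from the text.

```python
def dilution_volumes(stock_conc, target_conc, final_volume):
    """Split a final volume into stock and diluent using C1*V1 = C2*V2."""
    if not 0 < target_conc <= stock_conc:
        raise ValueError("target must be positive and no higher than stock")
    stock_volume = target_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume


def pool_summary(n_samples, volume_per_sample_ul, conc_ng_per_ul):
    """Describe a pool built from equal volumes of equally concentrated samples."""
    total_volume_ul = n_samples * volume_per_sample_ul
    mass_per_sample_ng = volume_per_sample_ul * conc_ng_per_ul
    total_mass_ng = n_samples * mass_per_sample_ng
    return {
        "total_volume_ul": total_volume_ul,
        "mass_per_sample_ng": mass_per_sample_ng,
        "total_mass_ng": total_mass_ng,
        "pool_conc_ng_per_ul": total_mass_ng / total_volume_ul,
        "share_per_sample": 1.0 / n_samples,
    }


# 15 ng/ul stock diluted to 1.5 ng/ul; the 100 ul final volume is hypothetical.
stock_ul, buffer_ul = dilution_volumes(15.0, 1.5, 100.0)

# 192 samples, 10 ul each, at 1.5 ng/ul (values taken from the text).
pool = pool_summary(192, 10.0, 1.5)
print(stock_ul, buffer_ul, pool)
```

Because every sample enters the pool at the same concentration and volume, each individual contributes the same DNA mass, which is the condition under which pooled peak heights can be read as allele frequencies.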
The amplification primers were designed by the online software SOP3 version 2 (
Sequencing primers were designed by SOP3. Single strand preparation and sequencing primer hybridization were performed semi-automatically using a Vacuum Prep Tool and Vacuum Prep Worktable (Biotage AB, Uppsala, Sweden) as described before
To achieve high-precision pooling with minimal sampling errors and the same systemic error across pools, a Biomek automation workstation was used to separately pool 192 controls and 192 cases with a final concentration of 15 ng/μl. For assay accuracy evaluation, pools of 96 and 192 control samples were also prepared.
For universal biotin amplification, a 22-mer sequence that has no interaction with the human genome was selected from a previous study.
The amplicons were prepared for DNA sequencing (single-strand separation and sequencing primer annealing) with the Vacuum Prep WorkStation using 10 μl of each PCR product. The primed amplicons were sequenced and genotyped on a high-sensitivity pyrosequencer, which requires lower amounts of sample and reagents. The genotyping results were analyzed with the HS96A software, version 1.2. The genotyping results of the test pool samples were also analyzed manually to investigate the accuracy of the software.
To evaluate the accuracy of the assay, pyrosequencing was performed on 192 control samples individually for 3 SNPs (CYP2E1[rs:915906], DrD2[rs:6279] and COMT[rs:933271]) to obtain the true SNP genotypes. For these SNPs, pools of 96 and 192 samples were genotyped by pyrosequencing as described above. The pooled results were compared with the individual genotyping results.
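The reference ("true") allele frequency for a SNP follows from the individual genotypes by simple allele counting, since each individual contributes two alleles. The sketch below illustrates that counting under stated assumptions: the function name is ours and the genotype calls are hypothetical, not data from this study.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return allele frequencies (in percent) from diploid genotype strings."""
    counts = Counter()
    for genotype in genotypes:
        if len(genotype) != 2:
            raise ValueError(f"expected a two-allele genotype, got {genotype!r}")
        counts.update(genotype)  # each individual contributes two alleles
    total_alleles = sum(counts.values())
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    return {allele: 100.0 * n / total_alleles for allele, n in counts.items()}


# Hypothetical genotype calls for 100 individuals (not data from the study).
example_genotypes = ["CC"] * 10 + ["CT"] * 20 + ["TT"] * 70
frequencies = allele_frequencies(example_genotypes)
print(frequencies)
```

Frequencies obtained this way serve as the reference against which the pooled estimates are compared.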
Each pool consists of DNA from 192 individuals. Pyrogram signal peak heights, compared with the reference, demonstrate (a) a low error rate and (b) a higher error rate between manual and software analysis.
Gene name: CYP2E1, SNP ID: rs915906
No. of Samples | Individual C% | Individual T% | Pooled (Software) C% | Pooled (Software) T% | Pooled (Manual) C% | Pooled (Manual) T% | % Error (Software) | % Error (Manual)
1-96 | 18.75 | 81.25 | 18.8 | 81.2 | 18.2 | 81.8 | 0.05 (±4.42) | 0.55 (±0.063)
1-192 | 16.4 | 83.6 | 14.2 | 85.8 | 15.62 | 84.38 | 2.2 (±4.72) | 0.78 (±0.042)

Gene name: DrD2, SNP ID: rs6279
No. of Samples | Individual C% | Individual G% | Pooled (Software) C% | Pooled (Software) G% | Pooled (Manual) C% | Pooled (Manual) G% | % Error (Software) | % Error (Manual)
1-96 | 28.64 | 71.35 | 30.5 | 69.5 | 29.55 | 70.45 | 1.86 (±0.49) | 0.9 (±0.065)
1-192 | 31.9 | 68.1 | 36.1 | 63.9 | 33.58 | 66.42 | 4.2 (±1.42) | 1.68 (±0.034)

Gene name: COMT, SNP ID: rs933271
No. of Samples | Individual C% | Individual T% | Pooled (Software) C% | Pooled (Software) T% | Pooled (Manual) C% | Pooled (Manual) T% | % Error (Software) | % Error (Manual)
1-96 | 26.04 | 73.96 | 27.98 | 72.02 | 27.59 | 72.41 | 1.94 (±6.59) | 1.55 (±0.18)
1-192 | 26.04 | 73.96 | 29.25 | 70.75 | 26.59 | 73.4 | 3.21 (±4.40) | 0.55 (±0.12)
Comparison of manual and software analysis with the reference (true values) for pools of 96 and 192 for three SNPs from the genes CYP2E1, DrD2 and COMT. The table demonstrates significantly lower error rates with manual evaluation.
The pool of 192 cases and the pool of 192 controls were genotyped for 230 SNPs, and the genotyping results are available on the above-mentioned website. Amplification and sequencing primer sequences are listed in Table A, and the dispensation orders in Table B, both online. The names of all SNPs, their positions, the genotyping results for controls and cases, and the control-case differences are listed in the online Table C.
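The error columns in the table appear to be the absolute difference, in percentage points, between the pooled estimate and the individually genotyped reference. The sketch below recomputes them from the C-allele values in the table under that assumption. The function and variable names are ours, and the recomputed values agree with the reported ones up to rounding (for example 0.91 against the reported 0.9 for the DrD2 pool of 96, manual analysis).

```python
# C-allele frequencies (%) copied from the table: (individual, software, manual).
TABLE = {
    ("CYP2E1", 96): (18.75, 18.8, 18.2),
    ("CYP2E1", 192): (16.4, 14.2, 15.62),
    ("DrD2", 96): (28.64, 30.5, 29.55),
    ("DrD2", 192): (31.9, 36.1, 33.58),
    ("COMT", 96): (26.04, 27.98, 27.59),
    ("COMT", 192): (26.04, 29.25, 26.59),
}


def pooling_error(reference_pct, pooled_pct):
    """Absolute difference, in percentage points, rounded to two decimals."""
    return round(abs(pooled_pct - reference_pct), 2)


def error_table(table):
    """Map each (gene, pool size) to its (software error, manual error)."""
    return {
        key: (pooling_error(ref, software), pooling_error(ref, manual))
        for key, (ref, software, manual) in table.items()
    }


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


errors = error_table(TABLE)
mean_software = mean(sw for sw, _ in errors.values())
mean_manual = mean(man for _, man in errors.values())

for (gene, pool_size), (sw, man) in errors.items():
    print(f"{gene:7s} pool of {pool_size:3d}: software {sw:.2f}, manual {man:.2f}")
print(f"mean software {mean_software:.2f}, mean manual {mean_manual:.2f}")
```

Averaged over the six SNP and pool-size combinations, the manual error is lower than the software error, in line with the table caption.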
In our study, we avoided using nucleotide A signal peaks. In pyrosequencing, the intensity of the dATP-alpha-thio signal peak is usually 10 to 15 percent higher than that of other nucleotides.
For manual analysis of SNPs in repeat regions, there is sometimes no adjacent single base before the SNP that can be used for accurate measurement. To address this, we recommend measuring the peak height of the next single base after the SNP in the pyrogram for correct allele frequency determination.
In conclusion, we have developed an automated high-throughput assay for large-scale DNA pool analysis for allele frequency estimation and determination. The assay is highly robust, accurate and cost-effective. The universal biotin amplification is a general approach and could be used for studies of any scale. The assay addresses challenges that limit the accuracy and precision of allele frequency estimation. Although not all labs have access to robotic sample pooling, this step could most likely be outsourced. Universal biotin labeling and DNA pooling decrease the cost by many orders of magnitude, which makes many large-scale studies possible. Furthermore, the pooled DNA samples could be stored for future analysis of other relevant markers.
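As an illustration of the manual calculation, the sketch below estimates allele percentages from peak heights in two ways: from the two allele-specific peaks alone, and by dividing one allele peak by a single-base reference peak such as the one recommended above. This is a minimal sketch under the assumption that peak height is proportional to the number of incorporated nucleotides. The function names and the peak heights are hypothetical and are not taken from the HS96A software or from this study.

```python
def allele_percent_from_pair(peak_a, peak_b):
    """Allele percentages from the two allele-specific peak heights."""
    total = peak_a + peak_b
    if total <= 0:
        raise ValueError("peak heights must sum to a positive value")
    return 100.0 * peak_a / total, 100.0 * peak_b / total


def allele_percent_from_reference(allele_peak, reference_peak):
    """Allele percentage relative to a single-base reference peak."""
    if reference_peak <= 0:
        raise ValueError("reference peak height must be positive")
    return 100.0 * allele_peak / reference_peak


# Hypothetical peak heights in arbitrary light units (not data from the study).
c_peak, t_peak, single_base_peak = 21.0, 79.0, 100.0
pair_estimate = allele_percent_from_pair(c_peak, t_peak)
reference_estimate = allele_percent_from_reference(c_peak, single_base_peak)
print(pair_estimate, reference_estimate)
```

The reference-based estimate is useful when one allele peak cannot be read cleanly, for example when it merges with a neighbouring identical base in a repeat region.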
Gel staining figure of different SNPs amplified with universal biotin sequence tag from genomic DNA.
(10.29 MB TIF)
We would like to thank Monika Trebo for her excellent web technical assistance.