Rapid advancements in long-read sequencing technologies have extended read lengths from base pairs to megabase pairs, which has enabled chromosome-scale genome assemblies. However, read lengths are now becoming limited by the extraction of pure high-molecular weight DNA suitable for long-read sequencing, which is particularly challenging in plants and fungi. To overcome this, we present a protocol collection: high-molecular weight DNA extraction, clean-up and size selection for long-read sequencing. We optimised a gentle magnetic-bead-based high-molecular weight DNA extraction, which is presented here in detail. The protocol circumvents spin columns and high-speed centrifugation to limit DNA fragmentation. The protocol scales with tissue input and can be used on many species of plants, fungi, reptiles and bacteria. It is also cost effective compared to kit-based protocols and hence applicable at scale in low-resource settings. An optional sorbitol wash is listed and is highly recommended for plant and fungal tissues. To further remove any remaining contaminants such as phenols and polysaccharides, optional DNA clean-up and size selection strategies are given. This protocol collection is suitable for all common long-read sequencing platforms, such as technologies offered by PacBio and Oxford Nanopore. Using these protocols, sequencing on the Oxford Nanopore MinION can achieve read length N50 values of 30–50 kb, with reads exceeding 200 kb and outputs ranging from 15–30 Gbp. This has been routinely achieved with various plant, fungal, animal and bacterial samples.
Citation: Jones A, Torkel C, Stanley D, Nasim J, Borevitz J, Schwessinger B (2021) High-molecular weight DNA extraction, clean-up and size selection for long-read sequencing. PLoS ONE 16(7): e0253830. https://doi.org/10.1371/journal.pone.0253830
Editor: Mark Eppinger, University of Texas at San Antonio, UNITED STATES
Received: March 1, 2021; Accepted: June 9, 2021; Published: July 15, 2021
Copyright: © 2021 Jones et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Sequencing data and reference genomes generated with this protocol are being made publicly available on the Sequence Read Archive (SRA, NCBI) under BioProjects PRJNA509734 and PRJNA510265. https://www.ncbi.nlm.nih.gov/bioproject/509734 https://www.ncbi.nlm.nih.gov/sra?linkname=bioproject_sra_all&from_uid=510265.
Funding: A.J. and B.S. received sequencing funds from Bioplatforms Australia, as part of the Genomics for Australian Plants initiative (www.genomicsforaustralianplants.com). J.B. received funds from an Australian Research Council Centre of Excellence (Plant Energy Biology) (CE140100008) and Discovery Project (DP150103591) (www.arc.gov.au). B.S. received funds from an Australian Research Council Future Fellowship (FT180100024) (www.arc.gov.au). The funders had no role, and will not have a role, in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: “For two plant species, Wahlenbergia ceracea and Phebalium stellatum, sequencing funds have been generously provided by Bioplatforms Australia, under the Genomics for Australian Plants initiative. The funder had no influence on the interpretation of the data.” This does not alter our adherence to PLOS ONE policies on sharing data and materials. Many datasets are publicly available on NCBI, others will become available in due time with publication of the genome assembly. There are no competing interests relating to employment, consultancy, patents, products in development, or marketed products. We present a full and objective manuscript that has no interference to its objectivity.
DNA sequencing technologies have transformed genomics due to rapid advances in read length, throughput and application, combined with an increasingly competitive price. Short-read sequencing platforms provide billions of reads 100–250 bp in length at unrivalled accuracy, while long-read platforms can provide millions of reads of 1 kbp to 1 Mbp, at the cost of accuracy [1]. Long-read platforms have been at the forefront of recent advancements, as they offer unprecedented opportunities for de novo assembly of full-length chromosomes and phasing of haplotypes [2,3]. With long-read sequencing platforms, read length has shifted from being limited by technology to being limited by the quality and length of the DNA input. This has given rise to a new challenge: the extraction of pure high-molecular weight DNA suitable for long-read sequencing, which is particularly troublesome in plants and fungi. The difficulty is often caused by the presence of secondary metabolites such as polyphenols and polysaccharides. Polyphenols within the cytosol are exposed to DNA after cell lysis and can interact with it irreversibly [4]. Polysaccharides can co-precipitate with DNA in the presence of alcohol and can inhibit many downstream molecular biology techniques [5]. Isolation of nuclei can help resolve these issues and yield high-molecular weight DNA [6]. Indeed, nuclei preps have been further developed for long-read sequencing but remain laborious and low throughput [7]. One approach that is becoming widely used for long-read sequencing relies on carboxylated magnetic beads, to which DNA binds in the presence of polyethylene glycol and sodium chloride [8]. This method does not isolate nuclei but still circumvents the use of binding columns and high-speed centrifugation, both of which can fragment DNA. Here we present a modified version of the protocol of Mayjonade et al. [8] that has been used on samples from a wide variety of genera, including recalcitrant plants. 
For plants containing excessive phenols and polysaccharides, an optional wash of the homogenate with sorbitol is included, which helps remove these contaminants [9]. Lastly, DNA clean-up and size selection options are presented, which can greatly enhance the success of long-read sequencing platforms. This protocol is part of a larger repository hosted on Protocols.io, as part of the public workspace ‘High-molecular weight DNA extraction from all kingdoms’ (https://www.protocols.io/workspaces/high-molecular-weight-dna-extraction-from-all-kingdoms).
The protocol described in this article is published on protocols.io, https://dx.doi.org/10.17504/protocols.io.bss7nehn.
Using the protocol described, we have been obtaining large yields of high-molecular weight DNA (Table 1, Fig 1). DNA fragment sizes range from 20–200 kb, which is ideal for long-read sequencing (Fig 1). To remove small DNA fragments and clean plant DNA preps, which can be somewhat crude, size selection with the PippinHT (Sage Science) for fragments of 20 kb and above has been very efficient (Table 1). Other DNA clean-up options are presented in the protocol and achieve similar results, but are more labour intensive. During sequencing, we reproducibly obtain 15–30 Gbp of reads from a single Oxford Nanopore MinION flow cell, with read length N50s of 30–50 kb (Table 2, Fig 2). This includes reads over 200 kb in length that pass quality filtering (> Q7, Phred scale). Smaller fragments are likely favoured during sequencing because of their higher molarity, and library preparation is likely to cause some DNA shearing. With PacBio Sequel II sequencing (circular consensus sequencing mode for HiFi reads), yields of over 20 Gbp can be achieved at very high accuracy (> Q30), but at shorter read lengths, as this technology is optimised for 10–20 kb fragments. High-performing sequencing results have been achieved with various plant, fungal, animal and bacterial samples (Table 2). The sequencing data are being used for de novo genome assemblies and, in some instances, haplotype phasing. Sequencing data and the subsequent reference genomes generated in this project are being made publicly available on the Sequence Read Archive (SRA, NCBI). Multiple Eucalyptus genomic datasets are available under BioProject PRJNA509734 and Acacia datasets are available under BioProject PRJNA510265. Supporting publications and other genera are soon to follow.
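The metrics quoted above (read length N50, Phred quality thresholds and the molarity argument for shorter fragments) can be illustrated with a minimal Python sketch. It is not part of the published protocol or of any tool used in this work. The function names are hypothetical, the read lengths in the usage example are toy values, and the figure of roughly 660 g/mol per base pair of double-stranded DNA is a commonly used approximation that is assumed here rather than taken from the article.

```python
def read_n50(lengths):
    """Read length N50: the length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    if not lengths:
        raise ValueError("no read lengths given")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest read down until half of the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


def phred_to_error(q):
    """Convert a Phred quality score Q to an error probability, 10^(-Q/10)."""
    return 10 ** (-q / 10)


def dsdna_fmol(mass_ng, length_bp, g_per_mol_per_bp=660.0):
    """Approximate femtomoles of double-stranded DNA molecules.

    Assumes an average mass of ~660 g/mol per base pair (an approximation).
    """
    moles = (mass_ng * 1e-9) / (length_bp * g_per_mol_per_bp)
    return moles * 1e15


if __name__ == "__main__":
    toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]  # toy values, not real reads
    print("N50:", read_n50(toy_lengths))
    print("Error probability at Q7:", round(phred_to_error(7), 3))
    print("Error probability at Q30:", round(phred_to_error(30), 4))
    # Same mass of DNA, different fragment lengths: shorter fragments
    # give more molecules (higher molarity) in the library.
    print("fmol in 1000 ng at 20 kb:", round(dsdna_fmol(1000, 20_000), 1))
    print("fmol in 1000 ng at 50 kb:", round(dsdna_fmol(1000, 50_000), 1))
```

Under these assumptions, a Q7 threshold corresponds to an error probability of about 0.2 per base and Q30 to 0.001. For a fixed mass of DNA, a 20 kb library contains 2.5 times as many molecules as a 50 kb library, which is consistent with shorter fragments being over-represented among sequenced reads.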
The peak at 200 kb represents all fragments > 200 kb, as they cannot be resolved with the technology. The sample was crude DNA prior to any size selection or further DNA clean-up.
Image generated with NanoPlot 1.28.2 [10].
Crude DNA was first extracted and then size selected for fragments of 20 kb and above with a PippinHT (Sage Science).
- 1. Logsdon GA, Vollger MR, Eichler EE. Long-read human genome sequencing and its applications. Nature Reviews Genetics. 2020;21(10):597–614. pmid:32504078
- 2. Miga KH, Koren S, Rhie A, Vollger MR, Gershman A, Bzikadze A, et al. Telomere-to-telomere assembly of a complete human X chromosome. Nature. 2020;585(7823):79–84. pmid:32663838
- 3. Shafin K, Pesout T, Lorig-Roach R, Haukness M, Olsen HE, Bosworth C, et al. Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes. Nature Biotechnology. 2020;38(9):1044–53. pmid:32686750
- 4. Leng M, Drocourt J-L, Helene C, Ramstein J. Interactions between phenol and nucleic acids. Biochimie. 1974;56(6):887–91. pmid:4447802
- 5. Do N, Adams RP. A simple technique for removing plant polysaccharide contaminants from DNA. BioTechniques. 1991;10(2):162, 4, 6. pmid:2059438
- 6. Bolger A, Scossa F, Bolger ME, Lanz C, Maumus F, Tohge T, et al. The genome of the stress-tolerant wild tomato species Solanum pennellii. Nature Genetics. 2014;46:1034. pmid:25064008
- 7. Workman R, Fedak R, Kilburn D, Hao S, Liu K, Timp W. High Molecular Weight DNA Extraction from Recalcitrant Plant Species for Third Generation Sequencing. 2018.
- 8. Mayjonade B, Gouzy J, Donnadieu C, Pouilly N, Marande W, Callot C, et al. Extraction of high-molecular-weight genomic DNA for long-read sequencing of single molecules. BioTechniques. 2016;61(4):203–5. pmid:27712583
- 9. Inglis PW, Pappas MdCR, Resende LV, Grattapaglia D. Fast and inexpensive protocols for consistent extraction of high quality DNA and RNA from challenging plant and fungal samples for high-throughput SNP genotyping and sequencing applications. PLOS ONE. 2018;13(10):e0206085. pmid:30335843
- 10. De Coster W, D’Hert S, Schultz DT, Cruts M, Van Broeckhoven C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics. 2018;34(15):2666–9. pmid:29547981