Bisulfite sequencing is a valuable tool for mapping the position of 5-methylcytosine in the genome at single base resolution. However, the associated chemical treatment causes strand scission, which depletes the number of sequenceable DNA fragments in a library and thus necessitates PCR amplification. The AT-rich nature of the library generated from bisulfite treatment adversely affects this amplification, resulting in the introduction of major biases that can confound methylation analysis. Here, we report a method that enables more accurate methylation analysis, by rebuilding bisulfite-damaged components of a DNA library. This recovery after bisulfite treatment (ReBuilT) approach enables PCR-free bisulfite sequencing from low nanogram quantities of genomic DNA. We apply the ReBuilT method for the first whole methylome analysis of the highly AT-rich genome of Plasmodium berghei. Side-by-side comparison to a commercial protocol involving amplification demonstrates a substantial improvement in uniformity of coverage and reduction of sequence context bias. Our method will be widely applicable for quantitative methylation analysis, even for technically challenging genomes, and where limited sample DNA is available.
Citation: McInroy GR, Beraldi D, Raiber E-A, Modrzynska K, van Delft P, Billker O, et al. (2016) Enhanced Methylation Analysis by Recovery of Unsequenceable Fragments. PLoS ONE 11(3): e0152322. https://doi.org/10.1371/journal.pone.0152322
Editor: Jorg Tost, CEA - Institut de Genomique, FRANCE
Received: October 1, 2015; Accepted: March 11, 2016; Published: March 31, 2016
Copyright: © 2016 McInroy et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Raw sequencing files can be found in the GEO under accession number GSE65116. The public link to the raw data presented in the manuscript is here: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE65116.
Funding: GRM is supported by funding from Trinity College Cambridge and Herchel Smith. DB is supported by funding from the Wellcome Trust and Herchel Smith. EAR is a Herchel Smith Fellow. PVD is a Marie Curie Fellow of the European Union (FP7-PEOPLE-2013-IEF/624885). The Balasubramanian lab is supported by a Senior Investigator Award from the Wellcome Trust (099232/Z/12/Z to SB) and by core funding from Cancer Research UK. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors of this manuscript have read the journal’s policy and have the following competing interests: EAR, GRM, and SB are named inventors on a filed patent relating to this work (Nucleic acid preparation method, WO/2015/145133, PCT/GB2015/050871). This does not alter the authors’ adherence to PLOS ONE policies on sharing data and materials.
5-methylcytosine (5mC) is the primary epigenetic DNA modification in eukaryotes. This covalent base modification regulates gene expression and is important to genomic imprinting and disease states across a wide range of organisms [2, 3]. Accurate, quantitative detection and mapping of 5mC in genomic DNA is essential to understand its function. The core methodology used to provide single base resolution methylation maps is bisulfite sequencing (BS-seq), which exploits the differential deamination kinetics of cytosine and 5mC when treated with sodium bisulfite. Bisulfite treatment causes cytosine to rapidly deaminate to uracil, while 5mC reacts more than two orders of magnitude more slowly. Subsequent sequencing reveals cytosine-to-thymine switches at unmodified cytosine sites and cytosine calls at 5mC loci. Additionally, BS-seq may be used to measure the extent of methylation at a single genomic locus in a population of cells, by dividing the number of reads carrying an unconverted cytosine by the total number of reads covering that site. For genomes containing 5-hydroxymethylcytosine (5hmC) [6, 7] it is essential to be aware that 5mC and 5hmC are indistinguishable in BS-seq data. New methods have been developed to distinguish 5mC from 5hmC [9, 10].
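The read-out described above can be illustrated with a minimal Python sketch. It assumes idealised chemistry (every unmethylated cytosine converts, every 5mC is retained) and considers a single strand only; the function names and the toy sequence are hypothetical and are not part of any published pipeline.

```python
def bisulfite_convert(seq, methylated_positions):
    """Idealised single-strand bisulfite read-out.

    Unmethylated C is read as T after conversion and sequencing;
    C at a methylated position (0-based index) is retained as C.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def methylation_level(unconverted_reads, total_reads):
    """Fraction of reads covering a site that carry an unconverted cytosine."""
    if total_reads == 0:
        return None  # no coverage, so the level is undefined
    return unconverted_reads / total_reads


# Toy example: only the cytosine at index 4 is methylated.
converted = bisulfite_convert("ACGTCCGA", {4})
print(converted)                 # ATGTCTGA
print(methylation_level(3, 12))  # 0.25
```

Because the level is a simple ratio of read counts, any library preparation step that changes the relative abundance of converted and unconverted fragments will shift the estimate directly.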
While BS-seq has been regarded as the gold standard for methylation analysis, there are some serious deficiencies with the method. One major issue is the formation of abasic sites via the loss of pyrimidine bases. Heat and alkaline conditions, both of which are employed during bisulfite treatment, can induce strand scission at these sites. Current library preparation techniques require addition of sequencing adapters to both ends of the fragmented DNA of interest prior to bisulfite conversion [12, 13]. The retention of these ligated adapter sequences at both ends of a fragment is an absolute requirement for generation of a read during sequencing; consequently, just one DNA cleavage event precludes data acquisition. Because complete cytosine conversion requires harsh bisulfite conditions, the resulting strand scission can render up to 99.9% of DNA fragments in a library unsequenceable. PCR amplification is therefore required to enrich for the remaining minority of uncleaved fragments bearing both adapter sequences. Bisulfite-induced strand scission has been exploited to fragment genomic DNA in the post-bisulfite adapter tagging method. This method has enabled notable advances such as single-cell genome-wide bisulfite sequencing.
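To give a sense of scale for the amplification burden, the following Python sketch estimates how many PCR cycles would be needed simply to restore the number of sequenceable molecules lost to strand scission. It is a back-of-the-envelope model, not a description of any protocol: the function name is hypothetical, the per-cycle efficiency is an assumed parameter, and 99.9% is the upper bound quoted above rather than a typical value.

```python
import math


def cycles_to_restore(surviving_fraction, efficiency=1.0):
    """Minimum PCR cycles to amplify the surviving fragments back to the
    original molecule count.

    surviving_fraction: fraction of fragments still carrying both adapters.
    efficiency: assumed per-cycle efficiency; 1.0 means perfect doubling.
    """
    if not 0 < surviving_fraction <= 1:
        raise ValueError("surviving_fraction must be in (0, 1]")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    fold_needed = 1 / surviving_fraction
    return math.ceil(math.log(fold_needed) / math.log(1 + efficiency))


# If 99.9% of fragments are lost, 0.1% survive (a 1000-fold deficit).
print(cycles_to_restore(0.001))       # 10 cycles with perfect doubling
print(cycles_to_restore(0.001, 0.8))  # 12 cycles at an assumed 80% efficiency
```

The model treats all fragments as amplifying equally well, which is precisely the assumption that fails for libraries with skewed base composition.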
Post-bisulfite DNA is invariably AT-rich due to the conversion of cytosine to uracil. Even originally balanced genomes become highly skewed; for example, the AT-content of a human genome rises from approximately 57% to 78% following bisulfite conversion. Sequences with highly skewed base compositions amplify poorly or not at all, and therefore require an increased number of PCR cycles to obtain sufficient material for sequencing. Moreover, the ratio of DNA fragments following amplification is not truly representative of the input material, as those fragments tending towards a more balanced AT/GC composition will be amplified preferentially. Therefore, the accuracy of ‘quantitative’ methylation data generated from PCR-amplified DNA libraries must be called into question. Fragments of DNA containing 5mC will retain a more balanced AT/GC composition than fragments without the modification, because 5mC is retained while cytosine is converted to uracil. Amplification can lead to the overrepresentation of DNA fragments with a more balanced composition, and thus to an overestimation of methylation levels.
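The shift in base composition quoted above follows from simple arithmetic, sketched here in Python. The sketch assumes that cytosine and guanine are equally frequent on any one strand (so cytosine makes up half of the GC fraction) and that all unmethylated cytosines convert; the function name and the `methylated_fraction` parameter are illustrative only.

```python
def at_after_bisulfite(at_before, methylated_fraction=0.0):
    """Expected per-strand AT fraction after bisulfite conversion.

    at_before: AT fraction of the untreated genome (0 to 1).
    methylated_fraction: assumed fraction of cytosines protected as 5mC.
    """
    gc = 1 - at_before
    cytosine = gc / 2  # assumption: C and G equally frequent on a strand
    converted = cytosine * (1 - methylated_fraction)
    return at_before + converted


print(round(at_after_bisulfite(0.57), 3))  # 0.785, in line with the ~78% quoted above
```

Setting `methylated_fraction` above zero lowers the post-conversion AT fraction, which is the basis of the composition difference between methylated and unmethylated fragments described in the text.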
Previous studies have shown that prudent enzyme choice and minimizing amplification cycles can limit enrichment biases in bisulfite data, though not eliminate them entirely. One method designed to produce representative sequencing libraries from samples requiring amplification is the linear amplification for deep sequencing (LADS) protocol. This method relies upon in vitro transcription for amplification before cDNA synthesis to regenerate a sequenceable library, and has only slightly decreased coverage uniformity compared to amplification-free techniques. However, it has not yet been applied to bisulfite-treated genomic samples for methylation analysis.
We have developed a PCR-free library preparation for bisulfite sequencing. A two-step ligation protocol enables us to rebuild ‘damaged’ fragments into sequenceable strands, thus regaining library diversity and quantity. As a result, we obtain virtually unbiased data from low nanogram quantities of input sample. We employed our method to obtain the first methylome of the blood-borne stages of the murine malarial model Plasmodium berghei, which has a starting genome composition of 78% AT and so poses a challenge for amplification dependent techniques. While epigenetic control mechanisms in Plasmodium spp. have attracted much study [19–21], DNA modification has been largely neglected. The ability to obtain an accurate methylation map would add to the knowledge base of the existing epigenetic network, and may offer new therapeutic targets.
0.1 Preparation of PCR-free bisulfite libraries
Whilst bisulfite treatment depletes sequenceable DNA due to loss of adapters by fragmentation, the majority of cleaved fragments still contain useful information and are of a mappable length. We recover these lost fragments and the associated information by employing a two-step ligation procedure, where the P7′ adapter is added before bisulfite treatment and the P5 adapter afterwards.
The recovery after bisulfite treatment (ReBuilT) method begins with fragmentation, end repair and A-tailing. We then employ custom methylated adapters, with one strand bearing a 3′ biotin label and the other a 3′ dideoxythymidine (ddT) terminator. The presence of a 3′ ddT prevents ligation to the 5′ end of the insert DNA, resulting in a single-stranded directional ligation to the 3′ insert terminus. Furthermore, adapter dimerisation is not possible during this ligation, thus preventing formation of common sequencing contaminants. Following bisulfite conversion, primer extension with a high fidelity uracil tolerant polymerase generates blunt ended double stranded DNA, which is immobilized on streptavidin coated magnetic beads via the biotin label. Immobilization enables near lossless manipulation of the library during subsequent processes. The immobilized DNA is A-tailed before ligation of a complementary P5 adapter. The biotin bearing strand of this fully adapted DNA contains uracils, so is not suitable for standard next-generation sequencing, while the other strand contains only the canonical nucleobases. Denaturing conditions elute the canonical DNA strand ready for sequencing.
As a proof-of-concept experiment we generated sequencing libraries from E. coli, chosen due to its small genome (4.6 Mb) and balanced base composition (50% GC). We employed qPCR to compare the concentration of sequenceable fragments obtained with either ReBuilT or a standard BS-seq library preparation protocol. With equal input quantities of DNA, the concentration of sequenceable fragments was two orders of magnitude higher with the ReBuilT protocol than with the standard protocol without PCR amplification (S1 Fig). We then sequenced both libraries with paired-end reads on an Illumina MiSeq, having first amplified the BS-seq library to obtain sufficient adapter-ligated DNA for sequencing. Upon inspection of the genomic coverage, the ReBuilT method had a significantly more uniform profile than the amplified library (termed PCR-BS). Notably, a number of regions had very few reads in the PCR-BS dataset, yet were efficiently sequenced via the ReBuilT method (S2 Fig).
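One simple way to put a number on coverage uniformity is sketched below in Python. This is an illustration under stated assumptions, not the analysis used in this study: the coefficient of variation and the low-coverage fraction are generic summary statistics, the function names are hypothetical, and the depth values are invented toy data.

```python
from statistics import mean, pstdev


def coverage_cv(depths):
    """Coefficient of variation of per-position read depth.

    Lower values indicate more uniform coverage.
    """
    mu = mean(depths)
    if mu == 0:
        raise ValueError("mean depth is zero; CV is undefined")
    return pstdev(depths) / mu


def low_coverage_fraction(depths, min_depth):
    """Fraction of positions covered by fewer than min_depth reads."""
    return sum(d < min_depth for d in depths) / len(depths)


# Invented toy depth profiles with the same mean depth (10x).
uniform_profile = [10, 10, 10, 10]
skewed_profile = [0, 2, 18, 20]

print(coverage_cv(uniform_profile))              # 0.0
print(coverage_cv(skewed_profile) > 0)           # True
print(low_coverage_fraction(skewed_profile, 5))  # 0.5
```

Both statistics are computed at matched mean depth in this toy case; when comparing real libraries sequenced to different depths, the depths would first need to be normalised.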
With this promising method in hand, we focused on our system of interest, the challenging AT-rich P. berghei genome. Since there have been no reports of the DNA base composition of P. berghei, we first analysed the global DNA modification levels by tandem mass-spectrometry. We found the level of 5mC to be 0.31% of total cytosine species, and detected no other oxidised cytosine derivatives (S3 Fig).
We employed the ReBuilT method to generate PCR-free libraries from 50 ng of P. berghei DNA, extracted from an asynchronous population of erythrocytic stages. In parallel we generated traditional bisulfite libraries that included post-bisulfite PCR amplification (again termed PCR-BS). We sequenced multiplexed libraries on the Illumina NextSeq platform, with paired-end reads of 75 or 100 bases. We obtained up to 285 million reads from 13% of an amplification-free library generated from 50 ng, i.e. equivalent to 6.5 ng of input DNA, which provided ample data for analysis of low methylation levels with high confidence.
0.2 Comparison of sequencing data quality
To evaluate the possible benefits of ReBuilT over the PCR-BS method, we compared a range of data quality metrics for the two systems (Fig 1). As the sets of libraries were generated from the same source of genomic DNA, any differences should be solely due to the library preparation method.
The possible fragments generated by cleavage (indicated with red stars) during bisulfite treatment are illustrated, and annotated to indicate if they remain sequenceable. Left track: adapter ligation precedes bisulfite treatment, after which