Targeted sequencing of PCR amplicons generated from bisulfite deaminated DNA is a flexible, cost-effective way to study methylation of a sample at single CpG resolution and perform subsequent multi-target, multi-sample comparisons. Currently, no platform specific protocol, support, or analysis solution is provided to perform targeted bisulfite sequencing on a Personal Genome Machine (PGM). Here, we present a novel tool, called TABSAT, for analyzing targeted bisulfite sequencing data generated on Ion Torrent sequencers. The workflow starts with raw sequencing data, performs quality assessment, and uses a tailored version of Bismark to map the reads to a reference genome. The pipeline visualizes results as lollipop plots and is able to deduce specific methylation-patterns present in a sample. The obtained profiles are then summarized and compared between samples. In order to assess the performance of the targeted bisulfite sequencing workflow, 48 samples were used to generate 53 different Bisulfite-Sequencing PCR amplicons from each sample, resulting in 2,544 amplicon targets. We obtained a mean coverage of 282X using 1,196,822 aligned reads. Next, we compared the sequencing results of these targets to the methylation level of the corresponding sites on an Illumina 450k methylation chip. The calculated average Pearson correlation coefficient of 0.91 confirms the sequencing results with one of the industry-leading CpG methylation platforms and shows that targeted amplicon bisulfite sequencing provides an accurate and cost-efficient method for DNA methylation studies, e.g., to provide platform-independent confirmation of Illumina Infinium 450k methylation data. TABSAT offers a novel way to analyze data generated by Ion Torrent instruments and can also be used with data from the Illumina MiSeq platform. It can be easily accessed via the Platomics platform, which offers a web-based graphical user interface along with sample and parameter storage. 
TABSAT is freely available under a GNU General Public License version 3.0 (GPLv3) at https://github.com/tadkeys/tabsat/ and http://demo.platomics.com/.
Citation: Pabinger S, Ernst K, Pulverer W, Kallmeyer R, Valdes AM, Metrustry S, et al. (2016) Analysis and Visualization Tool for Targeted Amplicon Bisulfite Sequencing on Ion Torrent Sequencers. PLoS ONE 11(7): e0160227. https://doi.org/10.1371/journal.pone.0160227
Editor: Matteo Pellegrini, UCLA-DOE Institute for Genomics and Proteomics, UNITED STATES
Received: April 22, 2016; Accepted: July 16, 2016; Published: July 28, 2016
Copyright: © 2016 Pabinger et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All files for testing and running the software are available at https://github.com/tadKeys/tabsat and https://github.com/tadKeys/tabsat/tree/master/test_data.
Funding: This work was supported by the European Union, FP7 small medium focused project 277849 EurHEALTHAgeing (http://eurhealth.org/). Platomics GmbH provided support in the form of salaries for authors [DK, AN, AK], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the "author contributions" section.
Competing interests: The authors have read the journal's policy and the authors of this manuscript have the following competing interests: Denis Katic, Angelo Nuzzo, and Albert Kriegner are employed by Platomics GmbH. The authors wish to confirm that this Competing Interest does not alter the authors' adherence to PLOS ONE policies on sharing data and materials.
DNA methylation is one of the most important epigenetic modifications of the eukaryotic genome and plays essential roles in several biological processes, such as alternative splicing , regulation of temporal and spatial gene expression [2,3], as well as genome stabilization . It is the most widely studied epigenetic modification in humans and describes the adding of a methyl group to DNA nucleotides, in humans typically occurring in a CpG dinucleotide context . These CpG sites are often clustered together as CpG islands, which co-locate to distinct gene regions, modifying the expression of these genes. DNA methylation is heritable and known to cause genomic imprinting . It is highly relevant for the study of human disease, as somatic alterations of the methylation status have been described to be associated with aging , atherosclerosis , and several human diseases , most notably cancer .
The gold-standard for studying single-base methylation is bisulfite sequencing, also referred to as BS-Seq . Here, DNA is treated with bisulfite, which changes the DNA sequence depending on the methylation status of individual cytosine residues. Unmethylated cytosines are converted to uracil, which is amplified and read by the sequencer as thymine, whereas methylated cytosines are left unconverted. Next, the generated sequences are compared to a known reference yielding single nucleotide resolution information about the methylation status of the DNA. In cases where whole bisulfite genome sequencing is too expensive or not required, a practical alternative is to limit the sequencing per individual to selected, meaningful targets. This approach, commonly termed as targeted sequencing, allows generating consistent data with high coverage around regions of interest .
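The read-versus-reference comparison described above can be illustrated with a minimal sketch. It assumes an ungapped, forward-strand alignment and complete bisulfite conversion; the function name `call_cpg_methylation` and the toy sequences are hypothetical and are not part of TABSAT or Bismark.

```python
def call_cpg_methylation(reference, read, offset=0):
    """Call methylation at reference CpGs covered by one bisulfite read.

    Assumptions (illustrative only): the read aligns without gaps to the
    forward strand of `reference`, starting at reference index `offset`.
    A 'C' in the read at a reference CpG cytosine means the base was
    protected (methylated); a 'T' means it was converted (unmethylated).
    """
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i:i + 2] != "CG":
            continue  # only cytosines in a CpG context are considered
        j = i - offset  # position of this cytosine within the read
        if 0 <= j < len(read):
            if read[j] == "C":
                calls[i] = "methylated"
            elif read[j] == "T":
                calls[i] = "unmethylated"
            # any other base is treated as a sequencing error and skipped
    return calls


reference = "ATCGTTCGAACGT"   # CpGs at reference indices 2, 6 and 10
read = "ATCGTTTGAATGT"        # first CpG protected, the other two converted
print(call_cpg_methylation(reference, read))
```

Real pipelines must additionally handle reverse-strand reads, insertions and deletions, and incomplete conversion, all of which this sketch ignores.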
Due to their low entry cost and medium throughput, bench-top next-generation DNA sequencers (454 GS Junior from Roche, Illumina MiSeq, or Ion Torrent Personal Genome Machine (PGM)) are especially well suited for targeted sequencing. The PGM applies a sequencing-by-synthesis approach, uses native dNTP chemistry, and relies on a modified silicon chip to detect hydrogen ions released during base incorporation by DNA polymerase. The sequencer generates single-end (SE) reads of varying quality and length. A known caveat of the PGM is its tendency to over-call or under-call the number of bases in homopolymer stretches, which needs to be specifically addressed by dedicated downstream analysis methods and tools.
Bisulfite sequencing data analysis involves several steps including quality assessment, alignment, and methylation calling . Several tools are available, which use different approaches to analyze the data [17,18]. An important part of the bisulfite sequencing workflow is the translation of raw sequence information into bisulfite calls for each investigated base. The widely used tool Bismark  contains multiple routines to carry out alignment of bisulfite-treated reads to a reference genome as well as cytosine methylation calling.
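Bisulfite-aware aligners such as Bismark typically map reads after an in silico conversion of both reads and reference to a reduced alphabet, so that a read matches its origin regardless of methylation state. The following is a simplified sketch of that idea, not Bismark's actual code; the helper names are hypothetical.

```python
def c_to_t(seq):
    """In silico C->T conversion (forward-strand view)."""
    return seq.upper().replace("C", "T")


def g_to_a(seq):
    """In silico G->A conversion (reverse-strand view)."""
    return seq.upper().replace("G", "A")


reference = "ATCGTTCGAACGT"
read = "ATCGTTTGAATGT"  # bisulfite read: one CpG protected, two converted

# After C->T conversion the methylation-dependent differences disappear,
# so the read can be placed with an ordinary aligner.
print(c_to_t(reference) == c_to_t(read))
```

Once the read has been placed, the original (unconverted) read sequence is compared with the original reference to make the methylation call.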
According to omicsmaps.com, there are over 300 Ion Torrent PGM machines in use, which could potentially be applied to targeted bisulfite sequencing. However, no protocol or analysis solution for bisulfite sequencing on the PGM is currently officially provided. We have therefore created a novel tool called TABSAT for the analysis and visualization of targeted bisulfite sequencing data generated by Ion Torrent instruments. The tool accepts raw sequencing files as input and outputs result tables containing information about the methylation status of covered CpG sites. Read mapping and methylation calling are handled by Bismark, which has been modified to use the TMAP mapper instead of the default mapper Bowtie2. Results are aggregated in tabular format and automatically visualized as lollipop figures. TABSAT has been designed to run with a minimal set of input parameters but can be customized to support specific questions. In addition, it can be used with data from the Illumina MiSeq platform. The software is freely available at https://github.com/tadkeys/tabsat/.
Overview of TABSAT
We have developed TABSAT, a tool for the analysis of targeted bisulfite sequencing data generated on Ion Torrent systems. It was implemented using Python (version 2.7.6), R (version 3.1.2), and Perl (version 5.18.2) and makes use of several third party tools, including PRINSEQ , Bismark , and TMAP (version 4.4.8) to address different tasks of the analysis workflow (depicted in Fig 1). TABSAT is available as a standalone application (either as Docker image or as installable source code) and as an application in the web-based Platomics platform (see Fig 2 and section “Availability of TABSAT”). In addition to Ion Torrent data, TABSAT can be used with data from the Illumina MiSeq platform. The tool is capable of handling data generated by Bisulfite-Sequencing PCR (BSP)  as well as Methylation-specific PCR (MSP) .
The first step conducts quality assessment and generation of a report. Next, sequences are mapped to a reference genome and the methylation information is extracted. Based on several quality statistics and thresholds the generated results are filtered and aggregated into a final output table. In the next step, reads covering all CpGs in a target region are used to calculate methylation-pattern statistics, which are subsequently compared between samples. The last step creates the final output table, graphical representations of the results, and reports basic statistics generated during the analysis workflow.
After setting the correct parameters, a new analysis is started by clicking on the “Start Job” button. Running and finished analyses are displayed in the jobs panel and can be selected by clicking on the corresponding job.
Upon execution of TABSAT, data is gathered from the user-specified sources and processed by the application. Results are organized into folders and stored under the given output path (see section “Using TABSAT”). The first step of the pipeline performs quality assessment using the PRINSEQ  software. For each input file, summary statistics, including read length, GC content, quality score distributions, and the number of read duplicates, are created. These metrics along with graphics are summarized in HTML reports, which can be easily displayed in a web browser. After inspection of the quality results, the user can decide which TABSAT parameters (see section “Using TABSAT”) should be used for further downstream processing. By default low quality bases (phred score < 20) are clipped at the end of reads. In addition, filtering of reads based on a user defined length can be performed, which removes all reads that do not meet the minimum length requirement. After finishing the quality control step a second HTML report is generated, which allows users to compare the quality of data before and after quality assessment.
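The default clipping and length filtering described above can be sketched as follows. This is a simplified illustration that assumes Phred+33 encoded quality strings and base-by-base clipping from the 3' end; PRINSEQ's own trimming logic may differ, and the function names are hypothetical.

```python
def decode_phred33(qual_string):
    """Convert a FASTQ quality string (assumed Phred+33) to integer scores."""
    return [ord(char) - 33 for char in qual_string]


def trim_3prime(seq, quals, min_quality=20):
    """Clip bases from the 3' end while their quality is below min_quality."""
    end = len(quals)
    while end > 0 and quals[end - 1] < min_quality:
        end -= 1
    return seq[:end], quals[:end]


def filter_read(seq, qual_string, min_quality=20, min_length=0):
    """Return the trimmed sequence, or None if it is shorter than min_length."""
    trimmed, _ = trim_3prime(seq, decode_phred33(qual_string), min_quality)
    return trimmed if len(trimmed) >= min_length else None


# 'I' encodes Q40, '#' encodes Q2 and '!' encodes Q0 in Phred+33.
print(filter_read("ACGTACGT", "IIIII#!!", min_quality=20, min_length=4))
print(filter_read("ACGTACGT", "IIIII#!!", min_quality=20, min_length=6))
```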
The subsequent alignment module uses Bismark to map the reads to the defined reference genome (download links for human genome version hg19 and mm10 are provided at http://github.com/tadkeys/tabsat). In order to improve the performance of Bismark with Ion Torrent data, we have included the TMAP mapper into the program (see section “Modifications to Bismark”). The pipeline uses the non-directional Bismark setting as default, but can be changed to directional depending on the library used. Next, methylation levels are determined based on the previous mapping result. The final output table (see Fig 3) is generated using a BED  file, which is created during the methylation calling step. Alignment strand-specific as well as CpH information is available in additional output files.
The output table of a run is presented in the upper part of the large panel containing the sequencing results for each CpG site of all samples. The lower panel shows the graphical representation of the methylation results. Additional information, such as mapping statistics and quality control information, about each sample can be accessed by clicking on the corresponding links.
Because the PGM can produce reads of up to 400 base pairs, entire PCR amplicons can be covered by single reads. Consequently, it is possible to determine the methylation pattern across a single amplicon by taking the information from reads spanning the whole amplicon target (see Fig 4). For example, an amplicon containing 3 CpG sites has 8 (2³) different methylation patterns, ranging from fully unmethylated to fully methylated. For each sample and each target, the first step is the selection of all reads that completely cover the target. The percentage of the target that needs to be covered by a single read is controlled by a user-adjustable value. Next, the different methylation-patterns are extracted, aggregated, and ranked based on their occurrence. In a final step, the patterns of all samples are compared with each other and their respective frequencies are reported in a text document.
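The pattern extraction and ranking can be sketched as below. The read representation (start, end, and a dictionary of per-CpG calls), the half-open coordinates, and the function names are assumptions made for illustration; they do not reflect TABSAT's internal data structures.

```python
from collections import Counter
from itertools import product


def all_patterns(n_cpgs):
    """Enumerate every possible pattern: 2**n strings over U (unmeth.) and M (meth.)."""
    return ["".join(p) for p in product("UM", repeat=n_cpgs)]


def rank_patterns(reads, cpg_positions, target_start, target_end, min_cover_pct=100.0):
    """Rank methylation patterns of reads that cover enough of the target.

    reads: iterable of (start, end, calls), with calls mapping CpG position
    to 'M' or 'U'. Coordinates are treated as half-open intervals.
    """
    target_len = target_end - target_start
    counts = Counter()
    for start, end, calls in reads:
        overlap = max(0, min(end, target_end) - max(start, target_start))
        if 100.0 * overlap / target_len < min_cover_pct:
            continue  # read does not cover the required share of the target
        if not all(pos in calls for pos in cpg_positions):
            continue  # a pattern needs a call at every CpG of the target
        counts["".join(calls[pos] for pos in cpg_positions)] += 1
    return counts.most_common()


reads = [
    (100, 200, {110: "M", 150: "M", 190: "U"}),
    (95, 205, {110: "M", 150: "M", 190: "U"}),
    (100, 200, {110: "U", 150: "U", 190: "U"}),
    (100, 160, {110: "M", 150: "M"}),  # covers only 60% of the target
]
print(len(all_patterns(3)))
print(rank_patterns(reads, [110, 150, 190], 100, 200))
```

Comparing samples then amounts to comparing the ranked lists (or relative frequencies) produced for the same target.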
The top line shows an example target region where the CpG coordinates are highlighted. Displayed below are several reads mapping to this target with different methylation patterns. From these reads methylation-patterns are extracted, ranked according to their frequency, and the patterns are compared between samples.
The output creation module aggregates the generated methylation information of all individual targets for all samples into a final table (see Fig 3). Several filtering steps are applied to remove false methylation calls (see section “Using TABSAT”). First, low-coverage methylation calls are removed from the output using a cutoff filter that can be set as a parameter. In addition, TABSAT determines the average coverage for each CpG per target/sample using the methylated and unmethylated coverage information. Next, it calculates a mean from these averages for each sample/target combination, which is used to remove every methylation call where the total coverage of a CpG is lower than the calculated mean coverage minus one standard deviation. As we use an amplicon sequencing approach, we expect a uniform coverage for each sample/target. By comparing the total (methylated plus unmethylated) coverage of each CpG with the average coverage of all CpGs within the same sample/target, we ensure that only extreme outliers are removed. In addition, all CpGs that are not on the user-specified strand are filtered out.
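One possible reading of the two coverage filters is sketched below for a single sample/target combination. Several details are assumptions rather than statements about TABSAT: the default cutoff of 10 reads, the use of the sample standard deviation, and the computation of mean and standard deviation after the fixed cutoff has been applied.

```python
from statistics import mean, stdev


def filter_cpg_calls(calls, min_reads=10):
    """Filter CpG calls of one sample/target by coverage.

    calls: dict mapping CpG position -> (methylated_reads, unmethylated_reads).
    Step 1 drops CpGs whose total coverage is below min_reads (hypothetical default).
    Step 2 drops CpGs whose total coverage is below mean - 1 SD of the rest.
    """
    kept = {pos: mu for pos, mu in calls.items() if sum(mu) >= min_reads}
    if len(kept) < 2:
        return kept  # a standard deviation needs at least two values
    totals = [sum(mu) for mu in kept.values()]
    threshold = mean(totals) - stdev(totals)
    return {pos: mu for pos, mu in kept.items() if sum(mu) >= threshold}


calls = {
    101: (150, 150),
    120: (140, 170),
    133: (160, 130),
    150: (20, 30),   # outlier: far below the coverage of its neighbours
    171: (2, 3),     # below the fixed minimum-coverage cutoff
}
print(sorted(filter_cpg_calls(calls)))
```

With uniform amplicon coverage the second threshold sits close to the mean, so only CpGs with markedly lower coverage are discarded.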
Finally, for each sample and for each CpG in the target region the number of methylated reads, the number of unmethylated reads, and the corresponding methylation percentages are reported. The results are also visualized as lollipop diagrams (see Fig 5) providing an intuitive way to investigate the methylation of the CpGs. Depending on the size of the target regions, it might be the case that all CpGs of the target region cannot be adequately displayed if they are spaced according to their position in the target region. Therefore, two versions of the lollipop diagram are created: a) CpGs are equally spaced along the target; b) CpGs are spaced according to their actual chromosomal coordinate. In addition to the graphical representations, general statistics, such as mapping results, and target coverage are reported.
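The reported methylation percentage and the two lollipop layouts can be sketched as follows. The normalisation of x positions to the range 0 to 1 and the function names are illustrative assumptions; the plotting itself is omitted.

```python
def methylation_percent(methylated, unmethylated):
    """Percentage of reads supporting methylation at one CpG."""
    total = methylated + unmethylated
    return 100.0 * methylated / total if total else float("nan")


def lollipop_x(cpg_positions, target_start, target_end, equal_spacing):
    """Relative x positions (0..1) of the CpG lollipops along the target.

    equal_spacing=True  -> layout (a): CpGs evenly distributed along the target.
    equal_spacing=False -> layout (b): CpGs placed at their true coordinates.
    """
    positions = sorted(cpg_positions)
    if equal_spacing:
        n = len(positions)
        return [(i + 1) / (n + 1) for i in range(n)]
    span = target_end - target_start
    return [(pos - target_start) / span for pos in positions]


print(methylation_percent(30, 10))
print(lollipop_x([110, 112, 190], 100, 200, equal_spacing=True))
print(lollipop_x([110, 112, 190], 100, 200, equal_spacing=False))
```

The example shows why both layouts are useful: in layout (b) the CpGs at positions 110 and 112 nearly overlap, whereas layout (a) keeps every CpG readable at the cost of positional information.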
Displayed is an example of the two automatically generated lollipop diagrams containing the CpGs of the target region: a) all CpGs equally spaced across the target; b) all CpGs spaced according to their true chromosomal coordinate.
Availability of TABSAT
TABSAT is open-source and freely available. The pipeline consists of several modules including quality assessment, genomic alignment, methylation calling, CpG filtering, merging of samples, as well as methylation pattern calculation, visualization, and summarization of results. The pipeline is available in three versions: i) as Docker container with configured dependencies, required software, human genome assembly hg19, and mouse genome mm10 (available at https://github.com/tadkeys/tabsat) ii) as source code to be installed on a local infrastructure (accessible at https://github.com/tadkeys/tabsat) and iii) as a Platomics (http://demo.platomics.com) web application.
The Platomics platform supports creation, execution, and distribution of life-science applications. It manages storing, retrieving, organizing, and analysis of data and provides an interface to create life-science applications. A configurable output dashboard enables analysis results to be presented according to the needs of the application. The Platomics TABSAT version can be freely accessed at http://demo.platomics.com and after successful login (credentials available on github page) the user can perform analyses by selecting TABSAT in the workspace panel on the left side. After setting the correct parameters, target, and input files, the analysis is started by clicking on the “Start Job” button (see Fig 2). Currently running analyses are displayed in the lower left panel and results can be viewed by clicking on the corresponding job (see Fig 3). The complete result set can be downloaded as a compressed file allowing further downstream analyses.
The user manual including information for the setup of the pipeline using the source code as well as the Docker container can be found at https://github.com/tadkeys/tabsat. Our pipeline has no limitations concerning the number or length of FASTQ sequences and the used reference genome assembly. Test datasets can be downloaded at https://github.com/tadkeys/tabsat. They include FASTQ files and target files.
Modifications to Bismark
Bismark is a software package containing routines to map reads to a reference genome (using Bowtie , or alternatively Bowtie 2 ) and determine their methylation state. It provides an attractive combination of processing speed, genomic coverage, and quantitative accuracy and is one of the most widely used tools to map and analyze bisulfite treated short reads [17,18,27]. The software is able to handle single-end as well as paired-end reads of both directional and non-directional bisulfite libraries. Bismark is open-source, written in Perl and executed from the command line. Due to the initial poor alignment results (see Discussion) and the specific error profile of reads produced by the Ion Torrent systems, we decided to replace Bismark’s default mapper Bowtie2 (2.2.4) with TMAP (4.4.8) , the read mapping program provided by the Ion Torrent suite. TMAP has been specifically optimized to handle reads from Ion Torrent sequencing platforms in order to meet specific data mapping challenges. We have therefore extended the source code of Bismark to support the use of TMAP as an additional mapping program. Specifically, a new method to create the index of the reference genome and to perform single-end (SE) alignment using TMAP was integrated.
We processed the sequencing data of 48 samples (as described in the methods section) using the standalone version of TABSAT on a Linux Ubuntu 14.04 server with 16 cores and 32 GB memory. Using the Ion Torrent “314” sequencing chip, 16 barcoded samples (each having 53 amplicon targets) were multiplexed per run yielding on average approximately 25,000 reads per sample (~380 reads per amplicon) with an average length of 125 base pairs. In total 1,543,767 reads were analyzed, where 1,196,822 reads could be aligned to the human reference genome (hg19) yielding a mapping average of 77.9% and a mean target coverage of 282X. Results of all 48 samples were aggregated and reported in a final table.
To assess the performance of the improved mapping capabilities, we used one of the three targeted bisulfite datasets (314 chip) with 16 samples (each with 53 targets) to compare the output of the modified Bismark-TMAP version to the default version of Bismark (0.13.1). Hereby, we could observe that the average number of mapped reads per dataset increased from 31,469 to 54,395 (45.23% to 78.12%). Consequently, the average coverage of the targets improved from 362X to 483X. In addition, we could observe an increase of the average coverage of CpG sites corresponding with the 450k chip (see below) from 392X to 599X (see Table 1). It should be noted that bisulfite conversion introduces additional homopolymer stretches, which are traditionally difficult to analyze on Ion Torrent systems. However, TABSAT analyses only CpG sites, and therefore avoids these homopolymer rich regions.
Correlation with Illumina Infinium HumanMethylation450
In order to assess the performance of the targeted bisulfite sequencing setup, we compared the obtained sequencing results to an Illumina 450k methylation chip experiment using the same 48 DNA samples as used for targeted bisulfite sequencing (see Fig 6).
The 450k target is depicted below as well as the CpGs measured using targeted sequencing. Grey color intensity is correlated to the methylation signal (450k) and percent methylation (sequencing).
We correlated the methylation results of single Illumina 450k probes with the sequencing result of the corresponding single CpG for 53 targets per sample (see Fig 7).
Presented is the correlation result between the Illumina 450k methylation chip and targeted bisulfite sequencing data: a) Barplot of the correlation for each sample; b) Scatterplot of 450k (x-axis) versus targeted bisulfite data (y-axis).
Details about the selected amplicons, such as position, length, number of CpGs, melting temperature, GC percentage, and number of mapped reads, are presented in Table 2 and S1 Table. The calculated median Pearson correlation is 0.91, with a median R² of 0.83. We also calculated the correlation between the average methylation of each target, instead of the single corresponding CpG, and the 450k result, yielding a median correlation of 0.89 with a median R² of 0.79.
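For readers who want to reproduce this kind of comparison, the following sketch computes a Pearson correlation between array beta values and sequencing-derived methylation percentages for one sample. The numbers are made-up illustrative values, not data from this study, and the function name is hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Illustrative values only: 450k beta values (0..1) and sequencing results (%).
beta_450k = [0.05, 0.20, 0.45, 0.70, 0.95]
seq_percent = [8.0, 18.0, 50.0, 66.0, 97.0]

r = pearson_r(beta_450k, seq_percent)
print(round(r, 3), round(r ** 2, 3))
```

Because the Pearson coefficient is invariant to linear rescaling, beta values (0 to 1) and percentages (0 to 100) can be compared directly. The reported R² values are consistent with squaring the reported correlations (0.91² ≈ 0.83 and 0.89² ≈ 0.79).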
We have developed a novel tool for the analysis of targeted bisulfite sequencing, which is especially equipped to handle, in addition to Illumina data, sequences generated on Ion Torrent systems. To date, several tools exist for the analysis of bisulfite data (reviewed here), amplicons from bisulfite flowgram sequencing (Amplikyzer), and locus-specific analysis of 5-methylcytosine (BiQ Analyzer HiMod). However, none of these tools is specifically tailored for the analysis of Ion Torrent sequencing data and provides a one-stop solution from raw sequencing data to final results. Furthermore, the Ion Torrent PGM software platform currently does not support the analysis of bisulfite sequencing data. TABSAT comprises an analysis pipeline containing quality control, alignment, methylation calling, and output generation. In order to select the best mapping software for our purpose, we evaluated several bisulfite analysis programs, such as Bismark, BS-Seeker2, and BSMAP. All programs produced similar or poorer mapping results and offer different downstream analysis capabilities. Preliminary analysis with one input file using default parameters resulted in around 45%, 42%, and 20% of mapped reads for Bismark, BS-Seeker2, and BSMAP, respectively (see https://github.com/tadkeys/tabsat/tree/master/comp_tools). Based on the availability of the source code, the possibility to integrate a different mapping program, and the positive reviews, we decided to include Bismark in our workflow to handle mapping of sequencing reads and methylation calling.
Due to the initially suboptimal alignment results of the default Bismark version, we decided to incorporate TMAP, a dedicated mapper for Ion Torrent reads, into the Bismark software. Reads from Ion Torrent sequencing devices are usually longer than their Illumina counterparts and show a distinctly different error profile, especially in homopolymer regions. The increased read length leads to a higher number of sequencing errors per read, which requires changing the mapping settings, as the standard parameters for controlling mapping error may be too strict. As the aligners supported by Bismark (Bowtie or Bowtie 2) are configured for Illumina-sized reads, we decided to include the Ion Torrent TMAP program, which has been designed to overcome these limitations.
Consequently, we could show that the number of mapped reads increased on average from 31,469 to 54,395 (45.2% to 78.1%) on an evaluation dataset of 16 samples. Therefore, users will be able to process more samples on a chip, reducing the cost per sample. In order to check for correct mapping, we compared the number of positions where reads mapped outside of the target regions between the default and the TMAP Bismark versions. We expect some reads to map outside of the target region, for example, due to unspecific PCR amplification. On average, we observed 2.7 additional positions where reads mapped outside of the target region when running the TMAP version, which can be explained by the increased mapping potential of the TMAP program.
TABSAT supports the analysis of multiple samples (e.g. all barcoded samples on one chip) at once and outputs a combined result table for all samples. This facilitates interpretation and comparison of multi-sample studies. An important issue when working with sequencing data is to remove false results introduced by sequencing errors. Therefore, we have included a filtering mechanism into the workflow to reduce the number of uncertain methylation calls based on read coverage and artifacts. This strategy allows the generation of reliable results from which to draw biological meanings. Another important part of data analysis is the intuitive visualization of results. We have incorporated an automatic graph generation procedure which outputs two different types of lollipop diagrams. In addition, results are reported in tabular format, which can be used in further downstream analysis methods.
Since the Ion Torrent PGM is capable of producing reads with a length of up to 400 base pairs, we can use them to extract specific methylation-patterns. As one read originates from one distinct biological source, it could be possible to deduce biological causes that lead to different methylation-patterns in one sample. Especially interesting is the comparison of methylation-patterns between different biological groups, which may contribute to new methods for group classification. Consequently, descriptive statistics based on these methylation-patterns are automatically calculated for each sample and subsequently compared between samples.
The whole analysis solution has been designed to work with minimal user input and outputs results in clearly arranged tables and lollipop diagrams. The large number of Ion Torrent PGM sequencers available world-wide shows that there is a large community, which would benefit from this tailored analysis of bisulfite sequencing data. In addition, more than 100 Ion Proton sequencers have been registered on omicsmaps.com, which generate reads with a similar error profile as the PGM. Consequently, they would also benefit from a dedicated analysis workflow for generating high-quality results. Therefore, the presented work will help to unlock the power of Ion Torrent (and potentially Ion Proton) for bisulfite sequencing and DNA methylation analysis.
TABSAT is designed for amplicon studies and does not support the analysis of whole genome bisulfite sequencing projects. Furthermore, it is not limited to, but works best with, amplicons smaller than 500 bp, as these can be efficiently visualized using lollipop diagrams. The aim of TABSAT is to cover the primary analysis of raw targeted bisulfite sequencing data to obtain methylation information for each analyzed site. Given the comprehensive output provided, researchers can use additional downstream tools, such as the R project for statistical computing, to compare the methylation level between different groups of samples.
To assess the accuracy of TABSAT, we compared the sequencing results with data from an Illumina 450k methylation chip. The calculated median correlation of 0.91 confirms that targeting bisulfite sequencing on an Ion Torrent PGM yields accurate and reproducible results. A recent publication  has demonstrated effective methylated DNA immunoprecipitation sequencing (MeDIP-Seq)  on an Ion Torrent PGM, which was successfully validated using a 450k chip. This study shows the practicability of performing DNA methylation sequencing studies on a PGM and emphasizes the need for dedicated analysis solutions.
TABSAT can be used as standalone software, conveniently available as a Docker container. Docker containers wrap up software in a complete filesystem that contains everything it needs to run, making it an ideal solution to execute bioinformatics software in a self-contained and precisely controlled environment . In addition, TABSAT is available as an embedded application within the Platomics life-science data analysis system. This graphical, web-based user interface has been especially designed to be usable without informatics knowledge. The results are presented in an intuitive way enabling an exploration of the data. As the complete result set can be downloaded as a compressed file, further in-depth downstream analysis can be easily performed.
In summary, TABSAT offers a novel and unreported way to analyze and interpret targeted bisulfite sequencing data.
Materials and Methods
DNA samples from the TwinsUK cohort were provided by the Department of Twin Research at King’s College London (KCL). TwinsUK is the largest registry of adult twins in the United Kingdom. It started in 1992 and currently encompasses approximately 12,000 volunteer twins from all over the UK. The study was approved by St. Thomas’ Hospital Research Ethics Committee, and all twins provided informed written consent to participate in the study. 24 monozygotic twin pairs (n = 48 samples) with a difference of 800 or more grams in birthweight between twins were selected for this study. DNA was isolated from whole blood at KCL’s laboratories and subsequently provided to AIT alongside sample annotation.
Illumina Infinium HumanMethylation450 BeadChip
gDNA isolated from peripheral blood of the 24 monozygotic twin pairs (MZTs) was subjected to DNA methylation analysis using the Illumina Infinium HumanMethylation450 BeadChip (450k Chip) following the manufacturer’s protocol. Briefly, 500 ng DNA per sample was deaminated using the Zymo EZ DNA Methylation Kit. The contained bisulfite solution converted unmethylated cytosines into uracil by removing the amino group of the cytosine, while 5-methylated cytosines remained unaffected. Incubation was done by applying 16 cycles of a two-step temperature program, starting at 95°C for 30 seconds followed by 50°C for 60 minutes.
The deaminated DNA was eluted in 12 μl elution buffer of which 4 μl were subjected to the 450k protocol. That protocol combines a genome wide amplification, an enzyme based targeted fragmentation, and a cleanup of the DNA prior to hybridization onto the bead chips. Since one 450k chip allows the investigation of 12 samples, the 48 samples were randomized before hybridization on four 450k chips. The methylation module of Illumina’s Genome Studio Software was used to subtract the background from the raw data and to perform an Illumina specific normalization which depends on internal control probes. Received beta-values were exported to a text-file for further analysis using R (V2.15) and the IMA package .
Targeted bisulfite sequencing
Fifty-three differentially methylated loci (DML) identified by the 450k chip were selected for further analyses by targeted bisulfite sequencing using the Ion Torrent PGM platform (see S1 Table). MSRE-HTPrimer  was used to design high quality assays with a length of 150–320 bp for these 53 DML. BSP assays to amplify the methylated and unmethylated alleles were designed to cover the region of each DML probe of the 450k chip. Consequently, each individual assay contains the CpG of interest and multiple additional CpGs (typically up to 30). The designed BSP assays were setup and subsequently used to enrich the respective regions in the same 48 samples used for the 450k analysis. Target enrichment was done by qPCR single reaction, followed by pooling together all 53 targets for each sample. Library preparation and targeted sequencing on the PGM was conducted according to the manufacturer protocol. In brief, an individual barcode and a sequencing adapter were attached to the pooled targets (53) of each sample. Next, a pool containing all targets and all barcoded samples (48 samples each with 53 targets) was prepared for the final sequencing protocol, which includes an emulsion PCR to enrich the targets on the microspheres. The 48 samples were split into 3 equal batches of 16 samples. Each batch was loaded onto an Ion 314 chip and analyzed on the PGM.
Running the command-line version of TABSAT requires a Linux system with current hardware (50 GB HDD, 16 GB RAM). Installation instructions are available at http://github.com/tadkeys/tabsat. The Platomics version of TABSAT can be used free of charge and does not require any installation. To try out the software, please consult the information at https://github.com/tadkeys/tabsat/blob/master/demo.md. TABSAT can be used with the following parameters:
Target file [-t]: The first input is the target file containing the region of interest. The pipeline accepts a tab-separated text file containing name, chromosome, start position, end position, and strand for all targets.
Library [-l]: This input can be either directional (DIR) or non-directional (NONDIR), depending on the bisulfite sequencing adapters used.
Genome [-g]: This parameter can be either hg19 (human) or mm10 (mouse).
Sequencing library [-e]: This input can be either SE (single-end) or PE (paired-end).
Aligner [-a] (optional): This input selects the aligner (TMAP or Bowtie2) used for mapping the reads to the reference genome.
Minimum read length [-m] (optional): This parameter is used for filtering reads that are shorter than the given threshold.
Minimum 3’ read quality [-q] (optional): Bases with a quality below the given threshold are removed from the 3’ end of the reads.
Percent of target covered by a read for pattern creation [-p] (optional): This value specifies the percentage of the target that needs to be covered by a read to include it for pattern analysis.
Minimum number of mapped reads per CpG [-r] (optional): The minimum number of reads that need to be present at each CpG site.
Lollipop sort order [-s] (optional): List of samples that is used to specify the order in the lollipop plots.
Output directory [-o]: This parameter determines where the output of the analysis run is stored.
Input files: As the last input, the user specifies a list of FASTQ files (one or many targeted deep bisulfite sequencing runs) as produced by the sequencer. If a barcoded library was used, FASTQ files need to be generated for all barcodes before use (barcode splitting is not performed by TABSAT). Alternatively, the user can specify a directory (using -d) containing several FASTQ files.
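As an illustration of how the parameters above fit together, the following sketch parses a target file in the described five-column layout and assembles an argument list from the documented flags. Several details are assumptions rather than documented behavior: the executable name `tabsat`, the absence of a header row, the `+`/`-` strand encoding, the example coordinates and file names, and the order of options and FASTQ files on the command line. The repository documentation should be consulted for the actual invocation.

```python
import csv
import io


def parse_targets(text):
    """Parse tab-separated targets: name, chromosome, start, end, strand."""
    targets = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) != 5:
            raise ValueError(f"expected 5 tab-separated columns, got {len(row)}")
        name, chrom, start, end, strand = row
        start, end = int(start), int(end)
        if start >= end:
            raise ValueError(f"{name}: start must be smaller than end")
        if strand not in ("+", "-"):  # assumed strand encoding
            raise ValueError(f"{name}: unexpected strand {strand!r}")
        targets.append((name, chrom, start, end, strand))
    return targets


def build_command(executable, target_file, library, genome, seq_library,
                  out_dir, fastq_files, min_read_length=None):
    """Assemble an argument list from the flags listed in the text."""
    if library not in ("DIR", "NONDIR"):
        raise ValueError("library must be DIR or NONDIR")
    if genome not in ("hg19", "mm10"):
        raise ValueError("genome must be hg19 or mm10")
    if seq_library not in ("SE", "PE"):
        raise ValueError("sequencing library must be SE or PE")
    cmd = [executable, "-t", target_file, "-l", library, "-g", genome,
           "-e", seq_library, "-o", out_dir]
    if min_read_length is not None:
        cmd += ["-m", str(min_read_length)]  # optional minimum read length
    return cmd + list(fastq_files)  # assumed: FASTQ files come last


# Hypothetical targets and file names, for illustration only.
example_targets = (
    "amplicon_1\tchr1\t1000\t1250\t+\n"
    "amplicon_2\tchr7\t5000\t5300\t-\n"
)
parsed_targets = parse_targets(example_targets)
tabsat_cmd = build_command("tabsat", "targets.txt", "NONDIR", "hg19", "SE",
                           "out", ["s1.fastq", "s2.fastq"], min_read_length=50)
print(parsed_targets)
print(" ".join(tabsat_cmd))
```

Validating the target file and the enumerated parameter values before launching a run catches malformed inputs early, before any time is spent on read mapping.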
This work was supported by the European Union, FP7 small medium focused project 277849 EurHEALTHAgeing (http://eurhealth.org/).
Conceived and designed the experiments: SP KE WP RK AV SM DK AN AK KV AW. Performed the experiments: SP KE WP DK. Analyzed the data: SP KE WP. Contributed reagents/materials/analysis tools: WP AV SM DK AW. Wrote the paper: SP WP RK AW.
- 1. Maunakea AK, Chepelev I, Cui K, Zhao K. Intragenic DNA methylation modulates alternative splicing by recruiting MeCP2 to promote exon recognition. Cell Res. 2013; 23: 1256–1269. pmid:23938295
- 2. Muers M. Gene expression: Disentangling DNA methylation. Nat. Rev. Genet. 2013; 14: 519. pmid:23797851
- 3. Wagner JR, Busche S, Ge B, Kwan T, Pastinen T, Blanchette M. The relationship between DNA methylation, genetic and expression inter-individual variation in untransformed human fibroblasts. Genome Biol. 2014; 15: R37. pmid:24555846
- 4. Jones PA, Gonzalgo ML. Altered DNA methylation and genome instability: a new pathway to cancer. Proc. Natl. Acad. Sci. U.S.A. 1997; 94: 2103–2105. pmid:9122155
- 5. Portela A, Esteller M. Epigenetic modifications and human disease. Nat. Biotechnol. 2010; 28: 1057–1068. pmid:20944598
- 6. Paulsen M, Ferguson-Smith AC. DNA methylation in genomic imprinting, development, and disease. J Pathol. 2001; 195: 97–110. pmid:11568896
- 7. Horvath S. DNA methylation age of human tissues and cell types. Genome Biol. 2013; 14: R115. pmid:24138928
- 8. Dong C, Yoon W, Goldschmidt-Clermont PJ. DNA methylation and atherosclerosis. J. Nutr. 2002; 132: 2406S–2409S. pmid:12163701
- 9. Robertson KD. DNA methylation and human disease. Nat. Rev. Genet. 2005; 6: 597–610. pmid:16136652
- 10. Kulis M, Esteller M. DNA methylation and cancer. Adv. Genet. 2010; 70: 27–56. pmid:20920744
- 11. Noehammer C, Pulverer W, Hassler MR, Hofner M, Wielscher M, Vierlinger K, et al. Strategies for validation and testing of DNA methylation biomarkers. Epigenomics. 2014; 6: 603–622. pmid:25531255
- 12. Mertes F, Elsharawy A, Sauer S, van Helvoort J, van der Zaag P, Franke A, et al. Targeted enrichment of genomic DNA regions for next-generation sequencing. Brief Funct Genomics. 2011; 10: 374–386. pmid:22121152
- 13. Loman NJ, Misra RV, Dallman TJ, Constantinidou C, Gharbia SE, Wain J, et al. Performance comparison of benchtop high-throughput sequencing platforms. Nat. Biotechnol. 2012; 30: 434–439. pmid:22522955
- 14. Rothberg JM, Hinz W, Rearick TM, Schultz J, Mileski W, Davey M, et al. An integrated semiconductor device enabling non-optical genome sequencing. Nature. 2011; 475: 348–352. pmid:21776081
- 15. Bragg LM, Stone G, Butler MK, Hugenholtz P, Tyson GW. Shining a light on dark sequencing: characterising errors in Ion Torrent PGM data. PLoS Comput. Biol. 2013; 9: e1003031. pmid:23592973
- 16. Krueger F, Kreck B, Franke A, Andrews SR. DNA methylome analysis using short bisulfite sequencing data. Nat. Methods. 2012; 9: 145–151. pmid:22290186
- 17. Tran H, Porter J, Sun M, Xie H, Zhang L. Objective and comprehensive evaluation of bisulfite short read mapping tools. Adv Bioinformatics. 2014; 2014: 472045. pmid:24839440
- 18. Kunde-Ramamoorthy G, Coarfa C, Laritsky E, Kessler NJ, Harris RA, Xu M, et al. Comparison and quantitative verification of mapping algorithms for whole-genome bisulfite sequencing. Nucleic Acids Res. 2014; 42: e43. pmid:24391148
- 19. Krueger F, Andrews SR. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics. 2011; 27: 1571–1572. pmid:21493656
- 20. IonTorrent. Torrent Suite; 2015. Available: https://github.com/iontorrent/TS.
- 21. Schmieder R, Edwards R. Quality control and preprocessing of metagenomic datasets. Bioinformatics. 2011; 27: 863–864. pmid:21278185
- 22. Clark SJ, Harrison J, Paul CL, Frommer M. High sensitivity mapping of methylated cytosines. Nucleic Acids Res. 1994; 22: 2990–2997. pmid:8065911
- 23. Herman JG, Graff JR, Myöhänen S, Nelkin BD, Baylin SB. Methylation-specific PCR: a novel PCR assay for methylation status of CpG islands. Proc. Natl. Acad. Sci. U.S.A. 1996; 93: 9821–9826. pmid:8790415
- 24. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, et al. The human genome browser at UCSC. Genome Res. 2002; 12: 996–1006. pmid:12045153
- 25. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009; 10: R25. pmid:19261174
- 26. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat. Methods. 2012; 9: 357–359. pmid:22388286
- 27. Chatterjee A, Stockwell PA, Rodger EJ, Morison IM. Comparison of alignment software for genome-wide bisulphite sequence data. Nucleic Acids Res. 2012; 40: e79. pmid:22344695
- 28. Rahmann S, Beygo J, Kanber D, Martin M, Horsthemke B, Buiting K. Amplikyzer: Automated methylation analysis of amplicons from bisulfite flowgram sequencing. 2013. https://doi.org/10.7287/peerj.preprints.122v2
- 29. Becker D, Lutsik P, Ebert P, Bock C, Lengauer T, Walter J. BiQ Analyzer HiMod: an interactive software tool for high-throughput locus-specific analysis of 5-methylcytosine and its oxidized derivatives. Nucleic Acids Res. 2014; 42: W501–7. pmid:24875479
- 30. Guo W, Fiziev P, Yan W, Cokus S, Sun X, Zhang MQ, et al. BS-Seeker2: a versatile aligning pipeline for bisulfite sequencing data. BMC Genomics. 2013; 14: 774. pmid:24206606
- 31. Xi Y, Li W. BSMAP: whole genome bisulfite sequence MAPping program. BMC Bioinformatics. 2009; 10: 232. pmid:19635165
- 32. Corley MJ, Zhang W, Zheng X, Lum-Jones A, Maunakea AK. Semiconductor-based sequencing of genome-wide DNA methylation states. Epigenetics. 2015; 10: 153–166. pmid:25602802
- 33. Weber M, Davies JJ, Wittig D, Oakeley EJ, Haase M, Lam WL, et al. Chromosome-wide and promoter-specific analyses identify sites of differential DNA methylation in normal and transformed human cells. Nat. Genet. 2005; 37: 853–862. pmid:16007088
- 34. Di Tommaso P, Palumbo E, Chatzou M, Prieto P, Heuer ML, Notredame C. The impact of Docker containers on the performance of genomic pipelines. PeerJ. 2015; 3: e1273. pmid:26421241
- 35. Moayyeri A, Hammond CJ, Valdes AM, Spector TD. Cohort Profile: TwinsUK and healthy ageing twin study. Int J Epidemiol. 2013; 42: 76–85. pmid:22253318
- 36. Wang D, Yan L, Hu Q, Sucheston LE, Higgins MJ, Ambrosone CB, et al. IMA: an R package for high-throughput analysis of Illumina's 450K Infinium methylation data. Bioinformatics. 2012; 28: 729–730. pmid:22253290
- 37. Pandey RV, Pulverer W, Kallmeyer R, Beikircher G, Pabinger S, Kriegner A, et al. MSRE-HTPrimer: a high-throughput and genome-wide primer design pipeline optimized for epigenetic research. Clin Epigenetics. 2016; 8: 26. pmid:26949424