The authors have declared that no competing interests exist.
Conceived and designed the experiments: PC MC AC FF. Performed the experiments: AC FF. Analyzed the data: AC FF. Contributed reagents/materials/analysis tools: PC MC. Wrote the paper: AC FF AOD CS RS PC MC.
Rapid advancements in sequencing technologies along with falling costs present widespread opportunities for microbiome studies across a vast and diverse array of environments. These impressive technological developments have been accompanied by a considerable growth in the number of methodological variables, including sampling, storage, DNA extraction, primer pairs, sequencing technology, chemistry version, read length, insert size, and analysis pipelines, amongst others. This increase in variability threatens to compromise both the reproducibility and the comparability of studies conducted. Here we perform the first reported study comparing both amplicon and shotgun sequencing for the three leading next-generation sequencing technologies. These were applied to six human stool samples using Illumina HiSeq, MiSeq and Ion PGM shotgun sequencing, as well as amplicon sequencing across two variable 16S rRNA gene regions. Notably, we found that the factor responsible for the greatest variance in microbiota composition was the chosen methodology rather than the natural inter-individual variance, which is commonly one of the most significant drivers in microbiome studies. Amplicon sequencing suffered from this to a large extent, and this issue was particularly apparent when the 16S rRNA V1-V2 region amplicons were sequenced with MiSeq. Somewhat surprisingly, the choice of taxonomic binning software for shotgun sequences proved to be of crucial importance, with even greater discriminatory power than sequencing technology and choice of amplicon. Optimal N50 assembly values for the HiSeq were obtained at 10 million reads per sample, whereas the applied MiSeq and PGM sequencing depths proved less sufficient for shotgun sequencing of stool samples. The latter technologies, on the other hand, provide a better basis for functional gene categorisation, possibly due to their longer read lengths.
Hence, in addition to highlighting methodological biases, this study demonstrates the risks associated with comparing data generated using different strategies. We also recommend that laboratories with particular interests in certain microbes should optimise their protocols to accurately detect these taxa using different techniques.
The use of Next Generation Sequencing (NGS) for the analysis of complex microbial communities has increased dramatically in recent years. Reasons for this include a continual decrease in cost and an ever greater appreciation of the ability of NGS to more comprehensively characterise microbial communities than traditional culture based methods. NGS has been advantageous in determining the role of the microbiome in disorders like Inflammatory Bowel Disease [
There are many methodological choices to be made when conducting a sequence-based microbiome study. These decisions have led to the introduction of a variety of technical variables that affect the compositional signal to various degrees, potentially limiting the ability to investigate the main hypothesis or to compare results relating to communities that are similar but which have been investigated using different methods. Factors such as sampling methods, DNA extraction protocol [
The majority of microbiome studies have relied on 16S rRNA gene amplicon sequencing. There are nine variable regions (V1-V9) within the 16S rRNA gene, which is ubiquitous among prokaryotes, each flanked by highly conserved stretches of DNA suitable for primer binding [
One of the first considerations before embarking on a microbiota project is to select a sequencing technology. Traditionally, the most common options are Roche 454 GS-FLX, the Illumina MiSeq (lower output, longer reads) and HiSeq (higher output, shorter reads) and the Ion PGM, each offering a series of advantages and disadvantages (see
Comparative studies were also conducted to assess the initial potential of the MiSeq to replace the Roche 454 GS-FLX, while also evaluating the effect of the variable region studied. Kozich and co-authors established a dual-index barcoding approach suitable for variable MiSeq read lengths and amplicon regions, in particular V3-V4, V4 and V4-V5 regions [
With the ever increasing number of technological variables that have the potential to have non-trivial effects on microbiota composition analysis, it is critically important to maintain a consistent methodology within studies and when comparing studies, or to have evidence that any inconsistencies that exist do not bias results. A more expensive alternative to 16S rRNA gene amplicon sequencing is shotgun metagenomic sequencing, which bypasses gene-specific amplification and potentially sequences all fragmented DNA, including that from other microorganisms and viruses, in a community. While providing much more information, including encoded functions of the microbiota, the vast amount of sequence data obtained however leads to a new set of challenges in terms of data processing, storage and analysis. For instance, the Illumina HiSeq 2500 platform can yield over 1,000,000,000,000 bp (1 Tbp) of raw sequence data, which may increase several-fold during downstream processing and analysis. Shotgun sequencing is also possible using both the Illumina MiSeq and Ion PGM albeit with less throughput compared to HiSeq. Some non-metagenomic studies have evaluated these platforms and demonstrated comparable results when used to detect blood pathogens [
In the current study we investigated the impact of various amplicon primer combinations and sequencing technologies on the analysis of complex microbial communities. More specifically we compared amplicon and shotgun data generated by Illumina MiSeq, HiSeq and Ion PGM through the use of six human stool samples using two primer sets covering two different 16S rRNA gene regions (V1-V2 [
Stool samples were collected from six elderly individuals and stored at -80°C during the ELDERMET project [
For Illumina MiSeq shotgun sequencing, samples were initially tagmented using the Nextera XT kit from Illumina, whereby the Nextera Transposome, carrying sequencing adaptors, binds to the template DNA, resulting in fragmentation of the DNA and the addition of adaptors. A limited 12-cycle PCR was completed, during which sequencing adaptors and indexing primers were added to the DNA. Amplicon samples were then normalized and pooled, followed by sequencing on the MiSeq platform using Illumina protocols for a 2 x 300 cycle run, with an insert size of 400 bases.
Shotgun libraries for Ion PGM were generated according to instructions from the ‘Ion Xpress™ Plus gDNA Fragment Library Preparation’ User guide (Publication number MAN0007044). Libraries were sheared, size selected and individually barcoded using the Ion Xpress Barcode Adapters. Following library quantification and equimolar pooling, the Ion OneTouch™ 2 system was used to prepare template positive ion sphere particles containing the clonally amplified DNA libraries using the ION PGM™ Template OT2 400 Kit, allowing up to 400 bp single-end reads. Enrichment of the template positive ISPs was performed using the Ion OneTouch™ ES and an enrichment percentage of 18% was obtained, which was within the range recommended in the ION PGM™ Template OT2 400 Kit guide (Publication number MAN0007218). Sequencing was performed on the Ion PGM using an Ion 318v2 chip and the Ion PGM Sequencing 400 kit (guide number MAN0007242).
Shotgun Illumina HiSeq sequencing reads were obtained from the published ELDERMET dataset [
MiSeq reads were merged and filtered using
Reads from all three shotgun datasets were aligned to the human genome version 20 (hg20) to filter out human-derived sequences using Bowtie2 version 2.2.3. Illumina HiSeq and MiSeq reads were subsequently quality filtered and trimmed using Trimmomatic version 0.32 [
All metagenome assemblies were performed using IDBA_UD version 4.1.2 [
All statistical analysis was performed in R version 3.1.3. In each of the heatplots, Spearman correlations, along with Ward D2 clustering, were performed on the relative abundance at genus level of each sample. As the data were largely non-normally distributed, Spearman correlations were chosen to avoid violating the statistical assumptions of Pearson correlations. A Mann-Whitney test was used to analyse differences in the taxa between clusters. Where necessary, the P-values were corrected for multiple testing using Benjamini and Hochberg [
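To make the two less familiar steps of this analysis concrete, the following is a minimal Python sketch of a Spearman rank correlation and a Benjamini and Hochberg adjustment. It is not the authors' R code: the function names are hypothetical, the P-values in the example are invented, and only the standard library is used so that the arithmetic stays visible.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    m = len(p_values)
    # Walk from the largest P-value to the smallest, keeping a running minimum
    # so that the adjusted values stay monotone and never exceed 1.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = m - pos  # 1-based rank of this P-value in ascending order
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented P-values, for illustration only.
example_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Ranking with averaged ties is what lets the correlation ignore the shape of the abundance distribution, which is the reason given above for preferring Spearman over Pearson.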
The data generated reflected the different outputs of the three platforms. For the amplicon datasets the PGM produced 57,720 (mean) ± 9,841 (SD) V1-V2, and 33,454 ± 10,488 V4-V5 reads per sample, respectively, while the MiSeq produced 181,758 ± 108,343 V1-V2, and 102,824 ± 22,154 V4-V5 reads per sample, respectively. For the shotgun datasets there was also a marked difference between the three sequencing technologies, with 26,590,475 ± 51,650 HiSeq, 1,352,748 ± 458,483 MiSeq and 962,226 ± 170,251 PGM reads generated per sample, respectively.
We performed hierarchical clustering analysis on the microbiota composition of all six stool samples in order to assess the effect of the amplification primer combination (where relevant), sequencing strategy (16S rRNA gene or shotgun), sequencing technology and type along with metagenomic read classifier.
The heat-plot includes amplicon data along with shotgun datasets from three classifiers, namely MetaPhlAn2, Kraken and GOTTCHA. Only genera present in a minimum of 20% of datasets were retained. The method of correlation used was Spearman along with Ward D2 clustering (PGM = Ion Personal Genome Machine).
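The 20% prevalence filter described in this caption can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the function names and the toy counts are hypothetical, "present" is taken to mean a read count above zero, and the order of filtering and normalisation is a guess, since the text does not specify it.

```python
from typing import Dict

Counts = Dict[str, Dict[str, int]]  # dataset name -> genus -> read count


def filter_by_prevalence(counts: Counts, min_fraction: float = 0.2) -> Counts:
    """Keep genera detected (count > 0) in at least min_fraction of datasets."""
    n_datasets = len(counts)
    presence: Dict[str, int] = {}
    for genera in counts.values():
        for genus, count in genera.items():
            if count > 0:
                presence[genus] = presence.get(genus, 0) + 1
    kept = {g for g, k in presence.items() if k / n_datasets >= min_fraction}
    return {
        dataset: {g: c for g, c in genera.items() if g in kept}
        for dataset, genera in counts.items()
    }


def relative_abundance(counts: Counts) -> Dict[str, Dict[str, float]]:
    """Convert per-dataset genus counts into proportions that sum to 1."""
    result = {}
    for dataset, genera in counts.items():
        total = sum(genera.values())
        result[dataset] = {
            g: (c / total if total else 0.0) for g, c in genera.items()
        }
    return result


# Toy counts, invented for illustration only.
toy = {
    "s1": {"Bacteroides": 50, "Rare": 1},
    "s2": {"Bacteroides": 30, "Faecalibacterium": 10},
    "s3": {"Bacteroides": 20},
    "s4": {"Bacteroides": 5, "Faecalibacterium": 15},
    "s5": {"Bacteroides": 8},
    "s6": {"Bacteroides": 12},
}
filtered = filter_by_prevalence(toy)      # "Rare" occurs in 1 of 6 datasets and is dropped
proportions = relative_abundance(filtered)
```

Filtering on prevalence before correlating keeps genera seen in only one or two datasets from dominating the rank-based comparison.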
For the amplicon datasets, sample-wise clustering was less prevalent than for the metagenomic datasets. MiSeq V1V2 amplicons were contained in a distinctive sub-cluster, contained within the cluster labelled 1 in
As for bacterial taxa that were the most abundant across all of the datasets, there were some families that differentiate the six subjects regardless of methodology used (
The families are first organised by phylum abundance (highest to lowest) followed by family abundance (highest to lowest) in each of the phyla. The numbers of observed species are located at the top of each bar.
The data points represent the median values across the 6 samples and the error bars are the 25% and 75% quartile ranges.
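As a small illustration of the summary statistics named in this caption, the sketch below computes a median with its 25% and 75% quartiles using Python's standard library. The quantile definition is an assumption: the `inclusive` method is chosen here because it interpolates linearly between data points, and the source does not state which definition was used. The function name and the input values are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def median_and_quartiles(values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (median, lower quartile, upper quartile) for one group of samples."""
    # n=4 splits the data into quarters, yielding the three cut points Q1, Q2, Q3.
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return med, q1, q3


# Six invented values standing in for one measurement across the 6 samples.
point, lower, upper = median_and_quartiles([1, 2, 3, 4, 5, 6])
print(point, lower, upper)  # 3.5 2.25 4.75
```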
To investigate which technology was most suitable for shotgun sequencing, we performed random subsampling of reads to determine occurrences at even sequencing depths, in recognition of the fact that the HiSeq coverage was substantially higher than the coverage for MiSeq and PGM.
Each point represents the median value across each of the 6 samples per technology (including 3 replicates per sample). Error bars are the 25% and 75% quartile ranges.
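The random subsampling to even depths, with three replicates per sample as stated in the caption, might be sketched as below. This is a hedged illustration, not the tool the authors used: the function names, the seeding scheme and the choice to skip depths that exceed the library size are all assumptions.

```python
import random
from typing import Dict, List, Sequence


def subsample_reads(reads: Sequence[str], depth: int, seed: int) -> List[str]:
    """Draw `depth` reads without replacement, reproducibly for a given seed."""
    if depth > len(reads):
        raise ValueError("requested depth exceeds the number of reads")
    return random.Random(seed).sample(list(reads), depth)


def replicate_subsamples(
    reads: Sequence[str],
    depths: Sequence[int],
    n_replicates: int = 3,
    base_seed: int = 0,
) -> Dict[int, List[List[str]]]:
    """Draw n_replicates independent subsamples at every reachable depth."""
    result: Dict[int, List[List[str]]] = {}
    for depth in depths:
        if depth > len(reads):
            continue  # e.g. a depth that only the deepest library can reach
        result[depth] = [
            subsample_reads(reads, depth, seed=base_seed + rep)
            for rep in range(n_replicates)
        ]
    return result


# Hypothetical read identifiers standing in for one sample's library.
library = [f"read_{i}" for i in range(100)]
subsamples = replicate_subsamples(library, depths=[10, 50, 500])
```

Sampling without replacement means each subsample is a true subset of the library, so differences between platforms at a given depth cannot be attributed to unequal read counts.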
Furthermore, unique species detection was also performed on the sub-sampled shotgun sequencing-derived reads (
Each point represents the median value across each of the 6 samples per technology (including 3 replicates per sample). Error bars are the 25% and 75% quartile ranges.
From within the categories of shotgun datasets, the core and unique genes were predicted using Metaphor (
The numbers represent the total number of predicted complete or incomplete genes for each metagenome.
The NGS technologies Illumina MiSeq, HiSeq and Ion PGM have shown significant promise in delivering cost-effective, high-resolution insights into microbiomes from various environments. However, due to a multitude of technical variables, careful comparisons are required to provide recommendations for suitable methodological approaches. In response to this, we compared the taxonomic composition of six stool samples using two different primer combinations covering two 16S rRNA gene variable regions. We then compared these results with those of shotgun sequencing using Illumina and Ion technologies.
Following either OTU clustering of amplicon reads or taxonomic classification by binning of shotgun reads, all at genus level, we compared microbiota composition of the different datasets. Even though the gut microbiota is generally regarded as individual specific, it was apparent that some amplicon datasets clustered according to technology and/or primer set, rather than by subject. In particular, microbiota composition from all V1V2 MiSeq and four of the six V4V5 MiSeq datasets grouped together in separate sub-clusters. The V1V2 and V4V5 PGM datasets clustered by sample, as opposed to technology, in 3 of the 6 samples (samples 1, 3 and 6), while the V4V5 MiSeq data clustered with V4V5 PGM data per sample in 2 of the 6 samples (samples 5 and 6).
To ensure that the differences in classifications between shotgun and amplicon sequencing were not simply due to a particular shotgun classification method, we compared the compositional clustering with three classifiers of shotgun reads: MetaPhlAn2, GOTTCHA and Kraken. The shotgun datasets grouped together in a sub-cluster separated from the amplicon datasets, which might be expected as these methods are independent of amplification bias and 16S rRNA gene copy number differences. With MetaPhlAn2, all Illumina HiSeq and MiSeq datasets were consistently closer to each other than to the PGM shotgun sequences. This was seen to a lesser degree with GOTTCHA, where the Illumina technologies sub-clustered for three of the six samples, but not at all with Kraken classifications. In terms of clustering by sample over method, MetaPhlAn2 gave the best results, with all datasets clustering by sample, closely followed by Kraken, where this occurred for 5 of the 6 samples in separate sub-clusters. GOTTCHA failed to cluster any dataset by sample, indicating its higher sensitivity to technological artefacts between sequencing methods. However, it must be noted that measuring accuracy based on individual sample clustering is not always a reflection of performance: GOTTCHA datasets clustered more closely to MetaPhlAn2, and although sample clustering was observed when using Kraken, many of the taxonomic assignments may be false positives, as previously mentioned.
Unsurprisingly, Illumina HiSeq shotgun sequences translated to the highest number of species, compared to the other two shotgun datasets, which were more than an order of magnitude smaller. Sub-sampling that simulated lower HiSeq coverage revealed, however, that even an equal number of reads could result in more observed species for HiSeq. As this technology produces shorter reads compared to MiSeq and PGM, it is possible that the number of species is artificially inflated as a result of higher sequence variation created from incorrect alignment to the reference marker genes. While not directly comparable with species observed through shotgun sequencing, V1-V2 amplicons, which are expected to be more variable than V4-V5 amplicons, sequenced by PGM resulted in the highest species counts.
Despite having the largest number of reads per sample, the V1-V2 region on the MiSeq had at each subsampling point the lowest number of unique species identified. This could be due to the questionable reliability for this primer combination in relation to unexpected clustering and failure to detect expected genera. Curiously, Salipante
The benefits of using metagenomic shotgun over amplicon sequencing are clear in terms of increased information content and reduced biases related to amplification and gene copy numbers. However, it is currently not established what sequencing depth is required for the different technologies; this is a more pertinent issue for shotgun than for amplicon sequencing, due to its much higher cost per sample. We therefore assembled the randomly sub-sampled shotgun datasets and compared the common N50 metric across the three sequencing technologies. As expected, the MiSeq technology, with its non-overlapping 300 bp paired-end reads, had marginally higher N50 values than HiSeq and PGM. An N50 peak occurred at 10 million reads for the HiSeq data, suggesting that this was the optimal sequencing depth for stool samples with 100 bp paired-end reads and a 300 bp insert size. There was no peak observed for the PGM or the MiSeq in the available coverage range, which suggests that the coverage may not be sufficient to reach an optimal level of assembly. Somewhat surprisingly, for two of the six samples there were drastically elevated N50 values at 600,000 HiSeq reads, irrespective of the random sub-sampling set used. Such early N50 peaks were also observed using two other assemblers, albeit for a different number of reads, and have previously been reported when assembling sub-samples of an isolated bacterium [
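For readers unfamiliar with the N50 metric compared above, the following minimal Python sketch computes it from a list of contig lengths using the standard definition: the length L such that contigs of length L or longer contain at least half of the assembled bases. The function name and the example lengths are hypothetical and are not taken from the study's assemblies.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly given its contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig that brings the running sum to half the total.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable: the running sum always reaches the total")


# Invented contig lengths, for illustration only.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # 8
print(n50([10, 20, 30, 40]))          # 30
```

Because N50 depends only on the length distribution of the contigs, it can rise sharply when a few long contigs dominate a small assembly, so values from shallow sub-samples call for cautious interpretation.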
In terms of functional categorisation of assembled shotgun sequences, we found the MiSeq and PGM datasets to largely contain equal proportions of predicted core genes from the assembled contigs. For the HiSeq assemblies there were, however, substantially fewer core genes involved in “Metabolism” and more genes with unknown function. This may be attributable to the smaller number of predicted complete genes, which is plausible for this shorter-read technology.
To summarise, this is, to our knowledge, the first reported study comparing both amplicon and shotgun sequencing for Illumina and Ion technologies. Although shotgun sequencing did not suffer from the same degree of technology-dependent bias seen with the amplicon sequencing, there were some major differences between phylogenetic binning software, with MetaPhlAn2 producing the most favourable results. GOTTCHA failed to cluster any datasets by sample but sub-clustered with MetaPhlAn2, while Kraken clustered separately from the other two binners and also appeared to produce a high number of false positive taxonomic assignments. The variation of microbiota composition between the majority of gut samples proved to be smaller than that between the compared sequencing technologies and variable 16S rRNA gene regions. In particular, the V1-V2 MiSeq showed poor performance, while the V4-V5 region was marginally more reliable on both platforms. There is evidence that the MiSeq and PGM offer valuable information when used for shotgun sequencing; however, in order to detect the majority of species in samples and to perform a high quality assembly, deeper sequencing is required. Species assignment is also dependent on read length, which is shorter for the HiSeq. We subsequently showed that there may be no assembly-related benefit in sequencing more than 10 million HiSeq reads per stool sample. Nevertheless, as the cost of shotgun sequencing is lower on the HiSeq instrument compared to MiSeq or PGM, this platform may still be preferable even though MiSeq produces longer reads and somewhat better assemblies at low sequencing depth. Caution should, however, be applied with regard to taxonomic binning, and comparisons such as those described in this study must be carried out to prevent methodological biases eclipsing the true biological picture.
Hence, we advise laboratories with particular interests in certain microbes to optimise their protocols to accurately detect these taxa using different techniques.
Each point represents the median value across each of the 6 samples per technology (including 3 replicates per sample). Error bars are the 25% and 75% quartile ranges.
(PDF)
Each point represents the median value across each of the 6 samples per technology (including 3 replicates per sample). Error bars are the 25% and 75% quartile ranges.
(PDF)
The table also contains the PCR conditions for 16S rRNA gene amplification and sequence length for quality filtering during read processing.
(XLSX)
A Mann-Whitney test was used to analyse differences and the P-values were corrected for multiple testing using Benjamini and Hochberg.
(XLSX)
(XLSX)
The authors wish to thank Dr. Fiona Crispie and Ms. Vicki Murray for their extensive help with the sequencing in this study. This publication has emanated from research supported in part by a research grant from Science Foundation Ireland (SFI) under Grant Numbers SFI/12/RC/2273 and 11/PI/1137 and by FP7 funded CFMATTERS (Cystic Fibrosis Microbiome-determined Antibiotic Therapy Trial in Exacerbations: Results Stratified, Grant Agreement no. 603038).