Assessing the Fecal Microbiota: An Optimized Ion Torrent 16S rRNA Gene-Based Analysis Protocol

Assessing the distribution of 16S rRNA gene sequences within a biological sample represents the current state-of-the-art for determination of human gut microbiota composition. Advances in dissecting the microbial biodiversity of this ecosystem have very much been dependent on the development of novel high-throughput DNA sequencing technologies, like the Ion Torrent. However, the precise representation of this bacterial community may be affected by the protocols used for DNA extraction as well as by the PCR primers employed in the amplification reaction. Here, we describe an optimized protocol for 16S rRNA gene-based profiling of the fecal microbiota.


Introduction
The gut microbiota is presumed to play a key role in human health and disease through its impact on nutrition, pathogenesis and immunology [1]. During the last decade extensive efforts have been made to determine the complexity of the microbial communities residing in the gut (for review see [2]). In fact, (gut) disorders such as inflammatory bowel disease [3,4], irritable bowel syndrome [5], obesity [6,7] and necrotizing enterocolitis [8] have been considered as deviations from a healthy gut microbiota composition.
The capability of high through-put sequencing of 16S rRNA gene sequences by means of Next Generation Sequencing (NGS) technologies has been pivotal in facilitating the discovery of gut microbiota biodiversity [9]. The Ion Torrent PGM instrument represents a recently commercialized bench-top NGS platform and is marketed as being less costly and with a faster turnaround as compared to other NGS techniques such as the 454 and Illumina platforms [10,11]. Application of the Ion Torrent technology to 16S rRNA-based profiling of complex bacterial communities has been achieved for the investigation of the aquatic microbial community structure of the Athabasca river [12], the bacterial and archaeal community dynamics in a covered anaerobic pond that was utilized to treat waste from a piggery [13], and the microbial population residing in human subgingival plaque [14].
Previous studies have pointed out how the obtained biodiversity image of the gut microbiota is affected by various protocols used for DNA extraction, as well as by the particular PCR primers used for amplification of the targeted region of the 16S rRNA gene [15][16][17], leading to an underestimation of key components of the gut microbiota of infants, in particular bifidobacteria [18]. In fact, based on both culture-based techniques and analysis using speciesspecific DNA probes, bifidobacteria were considered to represent the dominant component of the neonatal gut microbiota, [19][20][21], though other microbiota studies have suggested that bifidobacteria are present at low abundance or even absent in the infant gut microbiota [22,23].
These findings reinforce the need for a reliable protocol to investigate the composition of the human gut microbiota. Here, we describe a procedure specifically designed for the Ion Torrent PGM technology to determine the biodiversity of the human gut by means of 16S rRNA gene-based sequence profiling.

Subject Recruitment and Fecal Sample Collection
The study was approved by the Ethical Committee of the Regional Asturias Public Health Service (SESPA) and informed written consent was obtained from the mothers. All subjects were healthy and had not received any antibiotic or probiotic in the previous 3 months. Stool samples consisted of 6-10 gr of fresh fecal material, and were immediately frozen upon collection at 280uC until processed for DNA extraction.

PCR Primer Design
Primers Probio_Uni/Probio_Rev (Table 1) were assessed for specificity using the ARB software package [25] and the SILVA 108 SSU Reference 16S rRNA gene database release [26]. In order to validate the designed primers, we used an in silico approach based on BLASTN matches with corresponding 16S rRNA gene sequences from various bacteria that are commonly found in the gut as well as those that are not expected in this environment, which provided allowed us to evaluate if these primers may also be used to examine the biodiversity of different microbial ecosystems. In order to increase the number of microbial taxa recognized by the primers designed in this study we introduced a certain level of nucleotide degeneracy.

Deliberate Contamination of Faeces from Germ-free Rats and DNA Extraction
Pellets for each strain were extensively washed with PBS, concentrated in a PBS solution and cell counts were calculated in a Neubauer Chamber. Feces from germ-free Fisher 344 rats (males) were deliberately contaminated with three different mixes of microorganisms, named mix P1, mix P2 and mix P3, and homogenised for 1 min using a stomacher (IUL Instruments, Barcelona, Spain). In all cases, M. smithii was added at a final concentration of 1.30E9 cells/g faeces. In the first mix (P1), both Gram positive and Gram negative microorganisms were added at a final viable count of 3.84 E9 cells/g feces for each strain. In the second mix (P2) Gram positive numbers were the same as in the P1 mix, but Gram negative bacteria were added at a final concentration of 3.84 E7 cells/g faeces for each strain, resulting in a cell population in which the numbers of each Gram negative strain are 2 logs below the numbers of Gram positive strains. Finally, in the third mix (P3) the number of each introduced Gram positive bacterial strain was 2 logs below the Gram negatives: 3.84 E7 cells/g feces for each Gram positive strain, and 3.84 E9 cells/g feces for each Gram negative strain.

Mouse Trial
All animals used in this study were cared for in compliance with guidelines established by the Italian Ministry of Health. All procedures were approved by the University of Parma, as executed by the Institutional Animal Care and Use Committee (Dipartimento per la Sanità Pubblica Veterinaria, la Nutrizione e la Sicurezza degli Alimenti Direzione Generale della Sanità Animale e del Farmaco Veterinario, Italy). Fecal samples were collected after removal of animals from their box and no sacrifice of the animals was performed.

16S rRNA Gene Amplification
Partial 16S rRNA gene sequences were amplified from extracted DNA using primer pair Probio_Uni and/Probio_Rev, which targets the V3 region of the 16S rRNA gene sequence, or employing primer pair P1 and P2 [27], which targets the V3 region of the 16S rRNA gene sequences, or primer pair 520F and 802R [28] corresponding to the V4 region of the 16S rRNA gene sequences. These primers were designed to include at their 59 end one of the two adaptor sequences used in the Ion Torrentsequencing library preparation protocol linking a unique Tag barcode of 10 bases to identify different samples. The complete list of the primers used in this study is reported in Table 1.
The PCR conditions used were 5 min at 95uC, 35 cycles of 30 s at 94uC, 30 s at 55uC and 90 s at 72uC, followed by 10 min at 72uC. Amplification was carried out by using a Verity Thermocycler (Applied Biosystems). The integrity of the PCR amplicons was analyzed by electrophoresis on an Experion workstation (BioRad, UK).

Ion Torrent PGM Sequencing of 16S rRNA Gene-based Amplicons
The PCR products derived from amplification of specific 16S rRNA gene hypervariable regions were purified by electrophoretic separation on an 1.5% agarose gel and the use of a Wizard SV Gen PCR Clean-Up System (Promega), followed by a further purification step involving the Agencourt AMPure XP DNA purification beads (Beckman Coulter Genomics GmbH,Bernried, Germany) in order to remove primer dimers. DNA concentration of the amplified sequence library was estimated through the Experion system (BioRad). From the concentration and the average size of each amplicon library, the amount of DNA fragments per microliter was calculated and libraries for each run were diluted to 3E9 DNA molecules prior to clonal amplification. Emulsion PCR was carried out using the Ion OneTouch TM 200 Template Kit v2 DL (Life Technologies) according to the manufacturer's instructions. Se-quencing of the amplicon libraries was carried out on a 314 chip using the Ion Torrent PGM system and employing the Ion Sequencing 200 kit (Life Technologies) according to the supplier's instructions. After sequencing, the individual sequence reads were filtered by the PGM software to remove low quality and polyclonal sequences. Sequences matching the PGM 39 adaptor were also automatically trimmed. All PGM quality-approved, trimmed and filtered data were exported as sff files.

Sequence-based Microbiota Analysis
The sff files were processed using QIIME [29]. Quality control retained sequences with a length between 150 and 200 bp, mean sequence quality score .25, with truncation of a sequence at the first base if a low quality rolling 10 bp window was found. Presence of homopolymers .6 bp, and sequences with mismatched primers were omitted. In order to calculate downstream diversity measures (alpha and beta diversity indices, Unifrac analysis), 16S rRNA Operational Taxonomic Units (OTUs) were defined at $97% sequence homology. All reads were classified to the lowest possible taxonomic rank using QIIME and a reference dataset from the Ribosomal Database Project [30].
OTUs were assigned using uclust [31]. The hierarchical clustering based on population profiles of most common and abundant taxa was performed using UPGMA clustering (Unweighted Pair Group Method with Arithmetic mean, also known as average linkage) on the distance matrix of OTU abundance. This resulted in a Newick formatted tree, which was obtained utilizing the QIIME package.

Nucleotide Sequence Accession Numbers
The raw sequences reported in this article have been deposited in the NCBI Short Read Archive (SRA) (SAMN02009352, SAMN02009353, SAMN02009354, SAMN02009355, SAMN02009356, SAMN02009357).

Results and Discussion
Design of a Suitable PCR Primer Pair for 16S rRNA Gene Sequence Profiling on the Ion Torrent NGS Platform Previous investigations on taxonomic classification of the human gut microbiota were based on PCR primers that target the 16S rRNA gene. In order to develop a specific and reliable 16S rRNA gene-based primer set that is suitable for Ion Torrent sequencing technology, we selected, based on currently available literature [27,28], primer pairs that generate an amplicon of a maximum size of 200 bp and that target the V4 or V3 hypervariable region of the 16S rRNA gene, corresponding to positions 563-797 and 341-534 (coordinates based on the 16S rRNA gene of Escherichia coli strain K-12 substr. MG1655). In addition, a novel primer set, i.e. Probio_Uni/Probio_Rev, was designed for the purpose of this study by modification of the well established primer set P1/P2, originally designed by [27]. Compared to other primer pairs mentioned above, the Probio_Uni/Probio_Rev primer pair perfectly matches with a higher number of human gut microbiota components as detected by an in silico analysis against Ribosomal Database Project's (RDP) sequences. As displayed in Figure 1, the Probio_Uni/Probio_Rev primer pair theoretically targets all 16S rRNA gene sequences of the gut microbiota bacterial orders we selected with an average similarity of 94.4% and generating Figure 1. Heat map showing the order classification rates for optimal choice of primers. Cells are colored on a gradient from 0% to 100% matches of the primer with the 16S rRNA gene target sequences of the indicated microbial order. The size of the amplicons generated by the various primer pairs are indicated. Furthermore, the percentage of the amplification rate from each primer pair on the various microbial taxa here described is indicated at the top of the heat map. doi:10.1371/journal.pone.0068739.g001 an optimal Ion Torrent amplicon with an average length of 181 bp. Specifically, compared to the primer pair P1/P2 [27], the Probio_Uni/Probio_Rev primer pair perfectly targets (is fully complementary without mismatches) relevant 16S rRNA gene sequences of Archaeoglobales, Thermococcales, Sulfolobales, Cenarchaeales, Methanococcales and Methanopyrales (Fig. 1).

Development of an Appropriate Methodology for DNA Amplification
Precise assessment of the composition of the human gut microbiota crucially depends on the reliability and specificity of the PCR primers employed to amplify the 16S rRNA gene of Figure 2. Fecal levels of different microorganisms as analyzed by PCR using different 16S rRNA gene-based primers. Panel a depicts PCR amplification targeting the 16S rRNA gene of different enteric bifidobacteria as well as key intestinal bacteria using the primers described by [22] [27,32,33,43] and the primer set Probio_Uni/Probio_Rev. Panel b. displays the microbial profile of the amplicons generated by the primer sets described by [22,32,33] (Fig. 2). In particular, the microbial taxon Bifidobacterium appears to be completely absent when employing primer combinations ENV1/ENV2 and Bact-8F/Bact-1510R [22,33]. This is consistent with these primers exhibiting the lowest level of homology with the corresponding 16S rRNA gene sequences of members of the taxa Bifidobacteriaceae, Lactobacillus and Enterobacteriaceae (Fig. 1). Altogether these results lead us to conclude that the apparent lack of bifidobacterial sequences from certain previously published 16S rRNA gene profiling gut microbiota studies is in part due to PCR primers that were biased against key members of the gut microbiota such as bifidobacteria. Thus, many of the published studies describing the bacterial composition of the infant gut seem to have significantly underestimated the incidence and diversity of these commensals.

Setting up of a Valid Methodology for Sample Processing
Various studies investigating the human fecal microbiota biodiversity have reported highly variable and sometimes contradictory results [22,23,35]. Although these results may have been due to biological variation between individuals, it has been shown that sample preparation methods, such as DNA extraction does play a significant role in explaining the reported variations [16]. In order to address this issue we evaluated four fecal DNA extraction methods, i.e. involving different commercial kits and by introducing an initial enzymatic treatment prior to the actual DNA extraction protocol (see materials and methods), and assessed which of these methods would provide the most accurate representation of microbial populations in fecal samples by means of 16S rRNA gene-based profiling using the Ion Torrent PGM technology. Germ-free fecal samples from rats were mixed with various combinations of nine different microbial strains belonging to the following species: Bifidobacterium longum, Collinsella intestinalis, Blautia coccoides, Blautia producta, Faecalibacterium prausnitzii, Escherichia coli, Prevotella copri, Klebsiella pneumoniae, Bacteroides thetaiotaomicron and the Archeal strain Methanobrevibacter smithii, representing some of the most abundant bacterial species in the human gastrointestinal tract, according to several reports [36,37], while M. smithii is the most frequently found archeal bacterium in the human gut [38]. Total DNA samples, extracted by different methods from feces contaminated with microbial mixes of these ten microorganisms in different proportions, were used as template for amplification with the primers developed in this study, i.e., Probio_Uni and Probio_Rev. Gnotobiotic feces were used as negative controls in the four extraction procedures, yielding as expected no amplicons. The Ion Torrent sequencing platform, as optimized here, was shown to detect all bacterial strains present in the samples, independent of the relative amount of each strain. However, M. smithii was not consistently detected in all samples analyzed. Since the primers Probio_Uni and Probio_Rev match with a high score to the 16S rRNA gene sequences of this archaea, our results suggest incomplete lysis of archaeal cells. In this regard, Methanobrevibacter cells are covered by pseudomurein, a polymer that is similar to bacterial peptidoglycan with the exception of the l-N-acetyltalosaminuronic acid linked (b-1,3) to d-N-acetylglucosamine (GlcNAc) and the absence of D-amino acids in the interpeptide bridges [39]. Therefore, it seems that methods focused on pseudomurein digestion are needed for optimizing methanobacteria lysis. Overall, our results point that this microorganism is present in the human intestine in quantities higher than previously reported in some microbiota studies. Notably, our results indicate that several species, such as F. prausnitzii, C. intestinalis and B. thetaiotaomicron, are normally overestimated, whereas others, such as B. longum, are underestimated.
In order to determine which of the four tested DNA extraction methods provided the most reliable representation of the fecal bacterial community, we determined the proportion of each microbial group using the percentages of sequences obtained after 16S rRNA gene-based microbial profile analysis and compared this to the known number of microorganisms added to each sample. As previously shown, the evaluation of the abundance of members of the gut microbiota based on 16S rRNA gene profiling might be influenced by the copy number of 16S rRNA gene sequences [40].
Thus, we decided to normalize the number of reads linked to microbial taxon for the copy number of 16S rRNA gene loci present in their genome sequences (Table 2 and Figure S1). The detected difference, expressed in log units, between the number of microorganisms added and the number of sequences, either normalized for the copy number of 16S rRNA gene loci or not normalized, was calculated for each microorganism. The Bray-Curtis dissimilarity indexes were then calculated between the experimental data obtained for each bacterial mix (P1, P2 and P3) and the true levels of added microbes present in these mixes. This approach allows the identification of the best method without introducing subjective biases. In fact, the method showing the lowest mean value is considered to provide the most accurate Figure 3. Rarefaction curves generated for 16S rRNA gene sequences and Principal Coordinate Analysis (PCoA) based on the phylotypes identified from different PCR primer sets as well as from different samples (stool samples of infants and fecal samples of mothers). Panels a and c display the rarefaction curves and the PCoA from stool samples of infants. Panels b and d show the rarefaction curves and the PCoA from fecal samples of mothers. In panels c and d percentages shown along the axes represent the proportion of dissimilarities captured by the axes. Each symbol represents the 16S rRNA gene sequences from each sample which are displayed in a different colour and shape according to the PCR primer pair. In panels a and b the plots depicted on the left represent the rarefaction curves determined using the PD index whereas the plots show on the right constitute the rarefaction curves obtained using the Chao index. A 95% confidence intervals was added to the rarefaction curves. doi:10.1371/journal.pone.0068739.g003 overall representation of the bacterial communities in feces, based on the fact that the obtained Bray-Curtis dissimilarity index value is the lowest for the situation where the observed discrepancy between the number of microorganisms added and detected is the smallest. Thus, when normalization for the copy number of 16S rRNA gene loci was not considered, we found that the Qia-DNA extraction method yielded the most reliable results out of the four methods tested here, since it provides the most accurate reflection of the fecal microbiota composition. In contrast, the MBioEz-DNA extraction method generated the least accurate results. The Bray-Curtis distances using the four different methods were 0.17 (Qia), 0.23 (QiaEz), 0.29 (MBio) and 0.45 (MBioEz). However, if normalization for the number of 16S rRNA gene copies was taken into account, the Qia and QiaEz-DNA extraction methods showed the best results, with Bray-Curtis distance values being 0.17, 0.16, 0.39, and 0.42 for the Qia, QiaEz, MBio, and MBioEz-DNA extraction methods, respectively. We also noticed that an initial enzymatic treatment to enhance cell lysis increases the representation of some species, such as B. thetaiotaomicron. It is worthwhile mentioning that results are expected to be different when alternative DNA extraction protocols are used. In fact, other studies have recently indicated that the Qia-DNA extraction method is not as accurate as alternative bead-beating methods [41,42].

Comparison of Gut Microbiota Profiling using Different Primer Sets
In order to further evaluate the efficacy of different primer sets, including Probio_Uni/Probio_Rev, P1/P2 [27], 520F/802R [43] to delineate the microbiota composition of human fecal samples, we sequenced 16S rRNA gene-based amplicons achieved with either of these primers pairs, using the same DNA stock extracted from two different fecal samples. For this purpose two human fecal DNA samples were used, one retrieved from a three month infant stool sample, which based on previous published data was considered to possess a low level of complexity [18] and another one extracted from a mother's fecal sample assumed to include a diverse composition of microbiota [18,32,43]. In total, 1,276,969 sequence reads, representing 633,877 and 643,092 reads per infant-and mother sample, respectively, were generated on the Ion Torrent PGM machine ( Table 3). The decrease in the rate of phylotype detection and the plateauing of various diversity indices for this set of PCR primers demonstrated that a large part of the diversity in these libraries had been detected, even though a higher number of phylotypes was identified for the primer pair Probio_Uni/Probio_Rev as compared to the other PCR primer sets (Fig. 3).
The significance test in UniFrac [44] reported P-Values (Bonferroni corrected) equal to 1e-02 for every pair of samples of both mother and infant sets, meaning that the results are highly significant (if P-values #0.05 are considered to be statistical significant). This method was used to evaluate if the cluster distribution of the sequences achieved with the different PCR primers differs from random expectations. Principal Coordinate Analysis (PCoA), applied using the UniFrac program, showed that, while the datasets obtained with the P1/P2 and Probio_uni/ Probio_rev primer pairs cluster together, the dataset achieved with the 520F/802R primer pair is rather different (Fig. 3). This suggests that the microbial composition assessed by this latter primer set is different than those detected by either P1/P2 or Probio_uni/Probio_rev.
Clustering of de-noised high quality reads generated 18,910 and 2,361 OTUs for the mother and infant samples, respectively. As expected the microbial composition of the two fecal samples Table 3. Quantitative data of the 16S rRNA gene sequence datasets used in this study. displayed a much simpler OTU organization in the infant stool sample as compared to that of the mother. At genus level, sequencing reads could be assigned to 83 main individual taxons, of which 30 were present in all the three PCR-primer dataset in at least one of the two samples (Fig. 4). Furthermore, we analyzed the similarities and differences in the composition of the fecal sample assayed using the three PCR primer sets by hierarchical clustering based on population profiles of the most common and abundant taxa (Fig. 4). The clustering patterns are also reflected in the corresponding bar diagram (Fig. 4), highlighting bias in resolution of the PCR primer sets against specific microbial taxa. Microbiota composition observed with these different primer sets, highlighted a discrepancy with respect the Enterococcaceae, Lachnospiraceae, Lactobacillaceae, Streptococcaceae and Veillonellaceae (Fig. 4). In fact, primer set Probio_Uni/Probio_Rev displayed a higher proportion of members of these bacterial groups compared to the other set of primers, where a lower number or no phylotypes corresponding to the above mentioned microbial taxa were identified (Fig. 4). The latter finding thus reinforces the importance of a reliable PCR primer set for the appropriate delineation of the biodiversity of the human gut.

Conclusions
Appropriate primer selection as well as DNA extraction protocols in microbiota studies using 16S rRNA gene sequencing approach is essential to enable trustworthy representation of the organisms present in an environment such as human gut ecosystem. In this context, dominant infant gut microbiota members such as bifidobacteria are under-represented in many published metagenomic studies of the microbial biodiversity of the infant gut [22] due to technical biases. In fact, the presence of a thick cell wall of bifidobacterial cells as well as a polysaccharide surface layer [45] might render these microorganisms recalcitrant to cell lysis, thus causing a low recovery of their chromosomal DNA. In contrast, other key members of the gut microbiota such as Faecalibacterium and Bacteroides might be over-represented in most of the current human gut metagenomic studies. Here, we were able to demonstrate that the bias observed against the detection of bifidobacteria was due to the DNA extraction as well as PCR steps. We have shown that erroneous conclusions about the presence/absence as well as relative proportion of bifidobacteria are likely if primers that do not sufficiently complement the target 16S rRNA gene sequence are used. Noticeably, the error frequencies predicted to occur within DNA sequences generated by IonTorrent is equivalent to that estimated for other NGS technologies such as 454, i.e., about 1% [9]. Such a low level of error should not affect the final OTUs prediction, since these are calculated at 97% of nucleotide identity.
In this study we have designed a PCR primer set allowing accurate detection of bifidobacteria as well as other main members of the human gut microbiota, which, in combination with a fast and cheap sequencing approach like the Ion Torrent PGM, is suitable for the investigation of the microbial composition of the human gut microbiota. Figure S1 Ratio of 16S rRNA gene sequences obtained after the analysis of the artificially contaminated gnotobiotic fecal samples. Four different DNA extraction procedures (Qia, QiaEz, MBio and MBioEz DNA-extractions) and three different mixes of microorganisms (P1, P2 and P3) were used (see material and methods section). A, B and C represent the results obtained for the three different mixes (P1, P2 and P3, respectively) when the relative abundance of 16S rRNA gene sequences was not normalized. D, E and F represent the results obtained for the three different mixes (P1, P2 and P3, respectively) when the relative abundance of 16S sequences was normalized considering the predicted 16S copy number of the strains. In the center of each panel, the expected result according to the real microbial population present in each sample is depicted, and the graphics on the corners show the results using the four different extraction methods. The sequence of Blautia coccoides and Blautia producta were indistinguishable and were included in the same group. (TIF) Figure 4. Number of sequences per phylotype for each PCR primer pair and sample (stool samples of infants and fecal samples of mothers). The y axis displays the OTUs at family level detected in this study; each row represents a different OTU. Increasing darkness of the colour scale corresponds to higher estimated relative abundance. At the bottom of the heatmap a representation of the abundance of different phylotypes at the genus level detected by different PCR primer set for each sample has been displayed. doi:10.1371/journal.pone.0068739.g004