Reader Comments

Response to Dr. Rodriguez-Valera

Posted by PLOSBiology on 07 May 2009 at 22:28 GMT

Author: David Relman
Position: Professor of Microbiology & Immunology, and of Medicine
Institution: Stanford University
E-mail: relman@stanford.edu
Additional Authors: Les Dethlefsen, Sue Huse, Mitchell L. Sogin
Submitted Date: December 17, 2008
Published Date: December 18, 2008
This comment was originally posted as a “Reader Response” on the publication date indicated above. All Reader Responses are now available as comments.

We thank Dr. Rodriguez-Valera for his comment; he is certainly correct that intragenomic heterogeneity among multiple copies of the 16S rRNA gene can exceed several percent in some cases. Such divergence reaches 5% in the archaeon Haloarcula marismortui; in this case the diversity is related to adaptation for growth at different temperatures (Mylvaganam and Dennis 1992; Lopez-Lopez et al. 2007). Multiple rRNA operons in a genome can persist for the duration of microbial species divergence; for example, the close relatives Escherichia coli and Salmonella enteritidis each have 7 rrn operons per genome, and spore-forming genera within the Firmicutes typically have an even higher number (Lee et al. 2008). Despite the example of H. marismortui, most rRNA sequence variation within a genome has no known functional significance, and regular intragenomic recombination generally limits the sequence divergence between multiple operons in a genome to a much smaller value than would be expected given their evolutionary age (Hashimoto et al. 2003).

The method we used for clustering pyrosequencing tags of 16S rRNA hypervariable regions into operational taxonomic units (OTUs) grouped all tags that shared the same reference sequence as their closest match. The set of reference sequences comprised all unique hypervariable region sequences extracted from full length sequences in release 92 of the SILVA database (Huse et al. 2008). Conversely, our reference-based OTUs (refOTUs) would separate tags that had their closest match to distinct reference sequences, even if those reference sequences were very similar and possibly derived from different rrn operons in the same genome.
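The grouping rule described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study (see Huse et al. 2008 for that): plain edit distance stands in for the actual closest-match search, and the sequences, the reference IDs, and the function names `edit_distance` and `assign_ref_otus` are hypothetical.

```python
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, used here only as a stand-in similarity measure."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def assign_ref_otus(tags, references):
    """Group tags by the ID of their closest reference sequence (ties go to the first ID)."""
    otus = defaultdict(list)
    for tag in tags:
        best_ref = min(references,
                       key=lambda ref_id: edit_distance(tag, references[ref_id]))
        otus[best_ref].append(tag)
    return dict(otus)


# Toy data: hypothetical sequences, not real 16S hypervariable regions.
references = {
    "refA": "ACGTACGTAC",
    "refB": "ACGTACGTAT",  # differs from refA at a single position
    "refC": "TTGGCCAATT",
}
tags = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "TTGGCCAATA"]

ref_otus = assign_ref_otus(tags, references)
for ref_id, members in ref_otus.items():
    print(ref_id, members)
```

In this toy case the tags matching refA and refB land in separate refOTUs even though the two references differ at one position, which is the situation that would arise if refA and refB were different rrn operons from one genome.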

Hence, Dr. Rodriguez-Valera is correct that the number of refOTUs inferred from a pyrosequencing survey of a microbial community could overestimate the true taxonomic diversity of the community. For a community that has near-complete representation in public rRNA databases, the number of refOTUs could approach the number of unique 16S rRNA gene sequences in the community, rather than the number of distinct strains or species. As Dr. Rodriguez-Valera points out, the potential for sequence-based OTUs to overestimate taxonomic diversity is not restricted to the pyrosequencing approach to 16S rRNA community surveys. Full length sequences of cloned 16S rRNA genes could differ by more than 1-3%, and thus be grouped into different species-level OTUs according to commonly used thresholds, despite representing multiple operons within a single genome rather than distinct microbial lineages.

In practice, we believe the consequences are unlikely to be so extreme for our study or for other 16S rRNA-based community surveys. Although this result was not reported in the article, we found that clustering our V3 pyrosequencing reads into OTUs at a 3% genetic distance threshold resulted in somewhat more OTUs than the number of refOTUs that we did report (data not shown). (Although 3% genetic distance is a commonly used threshold to approximate species-level clustering, our interest was motivated more by the fact that V3 tags containing 2 pyrosequencing errors will be less than 3% distant from the error-free sequence from which they are derived. Extrapolating from published rates of pyrosequencing errors based on the V6 region (Huse et al. 2007), tags containing more than 2 pyrosequencing errors are expected to be rare.)

For comparison, we assessed the average level of intragenomic 16S rRNA heterogeneity among completely sequenced genomes. From the Integrated Microbial Genomes website of the Joint Genome Institute (http://img.jgi.doe.gov, accessed 12/1/2008), we downloaded all 16S rRNA genes from 519 completely sequenced bacterial genomes that contained at least two 16S rRNA genes (total of 2445 16S rRNA genes, 2-14 per genome). We aligned the sequences using the Infernal-based aligner of RDP-10 at the Ribosomal Database Project website (http://rdp.cme.msu.edu), and imported the aligned sequences into Arb for the calculation of distance matrices that included all sequence positions (no mask) and used no distance correction. Averaged over all 519 genomes, the mean intragenomic 16S rRNA distance was less than 0.2%. Approximately 90% of the genomes had a mean intragenomic 16S rRNA distance of 0.5% or less; about 97.5% of the genomes had a mean intragenomic 16S rRNA distance of 1% or less. These distances would be smaller if a mask were used to exclude sequence positions that are highly variable and difficult to align, which is a typical practice for clustering full length sequences into OTUs, but was not done for our pyrosequencing tags.
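As a rough illustration of the calculation described above, the following Python sketch computes uncorrected distances between aligned 16S rRNA gene copies and averages them per genome. It is not the Arb computation itself, and it rests on several assumptions: the per-genome mean is taken over all pairs of copies, a gap opposite a base counts as a mismatch, and columns where both sequences have a gap are skipped. The toy sequences and the names `p_distance` and `mean_intragenomic_distance` are hypothetical.

```python
from itertools import combinations
from statistics import mean


def p_distance(a: str, b: str) -> float:
    """Uncorrected distance: fraction of compared alignment columns that differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # assumption: skip columns that are gaps in both sequences
        compared += 1
        if x != y:    # assumption: a gap opposite a base counts as a mismatch
            mismatches += 1
    return mismatches / compared if compared else 0.0


def mean_intragenomic_distance(aligned_copies):
    """Mean pairwise distance among the 16S rRNA gene copies of one genome."""
    pairs = list(combinations(aligned_copies, 2))
    if not pairs:
        raise ValueError("need at least two gene copies per genome")
    return mean(p_distance(a, b) for a, b in pairs)


# Toy alignments: hypothetical sequences, far shorter than real 16S rRNA genes.
genomes = {
    "genome1": ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT"],
    "genome2": ["TTGGCCAATT", "TTGGCCAATT"],
}

per_genome = {name: mean_intragenomic_distance(copies)
              for name, copies in genomes.items()}
overall_mean = mean(per_genome.values())
# Fraction of genomes whose mean intragenomic distance is 1% or less.
fraction_within_1pct = sum(d <= 0.01 for d in per_genome.values()) / len(per_genome)

print(per_genome, overall_mean, fraction_within_1pct)
```

No positional mask is applied here, mirroring the unmasked calculation in the text; excluding hard-to-align columns before calling `p_distance` would be the analogue of using a mask.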

Hence, if the sequenced bacterial genomes are representative of bacteria as a whole, we would expect few distinct 16S rRNA sequences from the same genome to be classified into different OTUs based on a 3% genetic distance. The number of refOTUs that we reported is even lower than the number of OTUs we obtained at the 3% distance threshold, so we are confident that we have not overestimated the taxonomic diversity of the gut communities at approximately the species level. It would appear that for both full length sequencing and for hypervariable region pyrosequencing, OTU definitions that protect against treating sequencing errors as biological diversity will also provide reasonable protection against mistaking intragenomic 16S rRNA heterogeneity for taxonomic diversity.

Nonetheless, Dr. Rodriguez-Valera is quite right to remind us and other researchers that although a particular pair of sequences may have been grouped into distinct OTUs (whether full length or hypervariable region sequences, and whether the OTUs are based on reference sequences or genetic distance), they may in fact be derived from different genes in the same genome. This caveat can be added to the list of issues, including PCR bias, differential cell lysis, and variation in the number of rrn operons per genome, that prevent us from considering 16S rRNA gene surveys as direct measurements of microbial diversity.

References

Hashimoto JG, Stevenson BS, Schmidt TM (2003) Rates and consequences of recombination between rRNA operons. J Bacteriol 185(3): 966-972.

Huse SM, Huber JA, Morrison HG, Sogin ML, Welch DM (2007) Accuracy and quality of massively parallel DNA pyrosequencing. Genome Biol 8(7): R143.

Huse SM, Dethlefsen L, Huber JA, Welch DM, Relman DA et al. (2008) Exploring microbial diversity and taxonomy using SSU rRNA hypervariable tag sequencing. PLoS Genet 4(11): e1000255.

Lee ZM, Bussema C 3rd, Schmidt TM (2008) rrnDB: documenting the number of rRNA and tRNA genes in bacteria and archaea. Nucleic Acids Res.

Lopez-Lopez A, Benlloch S, Bonfa M, Rodriguez-Valera F, Mira A (2007) Intragenomic 16S rDNA divergence in Haloarcula marismortui is an adaptation to different temperatures. J Mol Evol 65(6): 687-696.

Mylvaganam S, Dennis PP (1992) Sequence heterogeneity between the two genes encoding 16S rRNA from the halophilic archaebacterium Haloarcula marismortui. Genetics 130(3): 399-410.

Competing interests declared: Authors of the Research Article