The Genome of Akkermansia muciniphila, a Dedicated Intestinal Mucin Degrader, and Its Use in Exploring Intestinal Metagenomes

Background The human gastrointestinal tract contains a complex community of microbes that fulfil important health-promoting functions. However, this vast complexity of species hampers the assignment of these functions to particular organisms. Recently, Akkermansia muciniphila, a new species from the deeply branched phylum Verrucomicrobia, was isolated from the human intestinal tract based on its capacity to efficiently use mucus as a carbon and nitrogen source. This anaerobic resident is associated with the protective mucus lining of the intestines.
Methodology/Principal Findings To uncover the functional potential of A. muciniphila, we sequenced and annotated its genome. It was found to contain numerous candidate mucinase-encoding genes but to lack genes encoding canonical mucus-binding domains. Numerous phage-associated sequences throughout the genome indicate that viruses have played an important part in the evolution of this species. Furthermore, we mined 37 GI tract metagenomes for the presence and genetic diversity of Akkermansia sequences. Of these 37, eleven contained 16S ribosomal RNA gene sequences that are >95% identical to that of A. muciniphila. Based on average nucleotide identity scores, these libraries were also found to contain large amounts of Akkermansia DNA, and in one subject the scores indicated co-colonization by different Akkermansia phylotypes. An additional 12 libraries also contained Akkermansia sequences, making a total of ∼16 Mbp of new Akkermansia pangenomic DNA. The relative abundance of Akkermansia DNA ranged from <0.01% to nearly 4% of the assembled metagenomic reads. Finally, by testing a large collection of full-length 16S sequences, we found at least eight different representative species in the genus Akkermansia.
Conclusions/Significance These large repositories allow us to further mine for genetic heterogeneity and species diversity in the genus Akkermansia, providing novel insight towards the functionality of this abundant inhabitant of the human intestinal tract.


Introduction
Humans host a vast variety of microorganisms associated with their various body surfaces, such as the skin [1] and the gastrointestinal (GI) tract [2,3]. However, only a minor fraction of these microorganisms has been shown to be amenable to cultivation [4]. One way of probing the diversity of commensals and mutualists in the GI tract microbiota is through metagenomics. This culture-independent approach can capture, for example, 16S ribosomal RNA (rRNA) gene sequences, the default proxies for species richness [5]. Subsequent analyses allow quantification of the differences in colonization diversity between individuals, as well as of their overlapping core microbiota [6,7]. Alternatively, attempts have been made to sequence all microbial DNA of different individuals [8], which now provide extensive gene catalogues of the human GI tract microbiome [9].
Due to its high phylogenetic and functional diversity, the GI tract microbial ecosystem represents a virtual organ that performs an array of health-promoting functions, ranging from metabolizing otherwise inaccessible foods and storing fat to producing important vitamins [10,11,12,13]. However, determining which species is responsible for which function can be an arduous task, since relatively little of the microbial diversity has been functionally characterized. This hampers our understanding of the relationships between the host and the residents of the microbiota, and among the residents themselves.
Recently, Akkermansia muciniphila, a novel representative of the deeply rooted phylum Verrucomicrobia, was isolated from the human GI tract [14] using mucin, a complex glycosylated protein, as the sole carbon and nitrogen source. Mucin is the major component of the protective coating of the human intestinal epithelium, where bacteria live in close proximity to human cells [15,16]. The Gram-negative anaerobe A. muciniphila is known to colonize a substantial part of the human population, starting in early childhood [17], and by adulthood reaches densities of up to 3% (~1 × 10^9) of the total bacterial count in feces [18]. Recently, A. muciniphila has been found to be inversely related to the severity of appendicitis [19] and to be present in lower numbers in patients with inflammatory bowel disease [20], providing first glimpses of its association with human health.
Apart from its significant colonization levels, little is known about this frequent and abundant resident of the GI tract. Thus far, only a few genomes of Verrucomicrobia are available, which hampers insight into the evolutionary history of this phylum. To uncover the functional capacity of A. muciniphila, we sequenced its complete genome, since its full genetic repertoire is key to understanding the role of this abundant colonizer. Furthermore, by probing available GI tract metagenomic libraries with the full genome, we shed light on the abundance and diversity of Akkermansia.

General characteristics of the genome
The complete genome of A. muciniphila ATCC BAA-835 is composed of one circular chromosome of 2,664,102 bp with an average G+C content of 55.8%. The genome has a total of 2,176 predicted protein-coding sequences, with an overall coding capacity of 88.8%. Of the predicted protein-coding genes, 1,408 (65%) could be assigned a putative function, whereas 768 (35%) encode hypothetical proteins, with 38 (1.7%) of all protein-coding genes classified as pseudogenes.
Comparison with the six other available complete and draft genome sequences of representatives of the phylum Verrucomicrobia showed that A. muciniphila shares 28.8%, 24.5%, 19.8%, 17.9%, 16.0% and 14.6% of coding sequences (CDS) with Verrucomicrobium spinosum DSM 4136, Chthoniobacter flavus Ellin428, Pedosphaera parvula Ellin514, Opitutus terrae PB90-1, Methylacidiphilum infernorum V4 and Opitutaceae bacterium TAV2, respectively. Overall, the available verrucomicrobial genomes vary widely in G+C content and genome size. A brief summary of the main characteristics of these seven genomes is provided in Table 1.
Further analysis of the COG distribution of verrucomicrobial genomes shows a broadly similar relative abundance of genes in the main COG categories for A. muciniphila and Methylacidiphilum infernorum V4. Compared with the other verrucomicrobial genomes, both have an above-average proportion of genes in the classes "Coenzyme metabolism" (H), "Nucleotide transport and metabolism" (F) and "Translation, ribosomal structure and biogenesis" (J), and a lower proportion in the categories "Transcription" (K) and "Signal transduction mechanisms" (T) (Table S1). Furthermore, comparing the COG distribution of all A. muciniphila genes with that of the 1,337 A. muciniphila-specific genes absent from the other six verrucomicrobial genomes revealed that the categories "Carbohydrate transport and metabolism" (G) and "Cell envelope biogenesis, outer membrane" (M) were enriched among the A. muciniphila-specific genes. In contrast, the categories "Translation, ribosomal structure and biogenesis" (J) and "Nucleotide transport and metabolism" (F) were underrepresented among them.
In line with the above, further inspection of the annotated genome showed that A. muciniphila is predicted to synthesize all 20 canonical amino acids as well as important cofactors and vitamins (data not shown), indicating that developing defined synthetic media for future post-genomic studies should be feasible. Furthermore, genome analysis suggested the ability to metabolize a variety of carbohydrates not previously found to be metabolized [14], such as galactose, cellobiose, melibiose and fructose; this will be addressed by ongoing efforts towards the generation and experimental validation of genome-based metabolic models.
A large proportion (26%, 567 proteins) of the predicted A. muciniphila proteome contains a signal peptide cleavage site as predicted by SignalP [21], which appears to be a general trend for the Verrucomicrobia (Table S2). Of this putative secretome, 61 proteins (11%) are annotated as glycosyl hydrolases, proteases, sulfatases or sialidases (35, 13, 11 and 2, respectively) and are therefore strong candidates for involvement in mucin degradation. A substantial fraction of the proteins predicted to be secreted are hypothetical proteins (242; 43%), some of which may also be involved in mucin degradation and processing.
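The percentages in this paragraph follow directly from the reported counts. The minimal Python check below recomputes them; rounding to whole percentages is our assumption, chosen to match the text.

```python
# Counts reported in the text for the predicted A. muciniphila proteome.
proteome = 2176              # predicted protein-coding sequences
secreted = 567               # proteins with a predicted signal peptide (SignalP)
hypothetical_secreted = 242  # secreted proteins annotated as hypothetical
candidate_mucinases = {      # annotation class -> number of secreted proteins
    "glycosyl hydrolases": 35,
    "proteases": 13,
    "sulfatases": 11,
    "sialidases": 2,
}

def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)

n_candidates = sum(candidate_mucinases.values())
print(n_candidates)                          # 61
print(pct(secreted, proteome))               # 26
print(pct(n_candidates, secreted))           # 11
print(pct(hypothetical_secreted, secreted))  # 43
```

Note that the 11% and 43% figures are fractions of the 567 secreted proteins, not of the whole proteome.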
Remarkably, no canonical mucus-binding domains are encountered in the proteome of A. muciniphila, so there are no candidates for adherence to the host mucus layer via these domains [22]. However, a recent study identified a novel module termed BACON (Bacteroidetes-associated carbohydrate-binding Often N-terminal) [23], which is thought to be involved in mucin binding and is also found in two A. muciniphila candidate mucinases (encoded by Amuc_0953, a sulfatase, and Amuc_2164, a glycosyl hydrolase). In contrast to most BACON-containing proteins, however, the two Akkermansia proteins carry the motif at the C-terminus. Finally, a novel C-terminal targeting signal (TIGR02595), the PEP-CTERM sequence, which consists of a near-invariant C-terminal Pro-Glu-Pro motif, was recently identified in proteins from a variety of mainly Gram-negative species [24]. As predicted, this motif was found in 21 A. muciniphila proteins, encoded by genes scattered across the genome, together with the corresponding exosortase EpsH (encoded by Amuc_1470); together, these form a protein-sorting system associated with exopolysaccharide expression.
Clustered regularly interspaced short palindromic repeat (CRISPR) loci represent heritable and adaptive primitive immune systems in bacteria and archaea against invading agents such as bacteriophages and plasmids [25]. Two CRISPR loci are detected in the A. muciniphila genome, one of them in close proximity to a predicted mobile element (an integrase encoded by Amuc_2006). CRISPR loci 1 and 2 comprise direct repeats of 36 and 33 bp, interspersed with 11 and 3 spacers, and are located at coordinates 2,438,206-2,438,965 bp and 2,507,588-2,507,825 bp, respectively. Whereas the 36 bp repeat could not be classified, the 33 bp repeat is similar to repeat cluster 3 [26]. Homologues of the CRISPR-associated genes cas1, cas2 and csn1 were identified in close proximity to CRISPR locus 1 (Amuc_2008, Amuc_2009 and Amuc_2010). CRISPR locus 2, however, lacks proximal homologues of known CRISPR-associated genes, and because the repeats of the two loci differ in sequence and length, it is unclear whether both loci can be processed by the cas system located near CRISPR locus 1. In addition to these CRISPR loci, the presence of 9 predicted phage-related sequences (Amuc_0323, Amuc_0551, Amuc_1116, Amuc_1335, Amuc_1348, Amuc_1355, Amuc_1367, Amuc_1711 and Amuc_2017) suggests that A. muciniphila has experienced frequent infection by bacteriophages.
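The CRISPR description gives each array's coordinates, repeat length and spacer count, but not the spacer lengths. Assuming 1-based inclusive coordinates and a canonical layout in which n spacers are separated by n + 1 repeats of equal length (both assumptions on our part), the mean spacer length can be estimated as (span - (n + 1) × repeat length) / n. The hypothetical `CrisprLocus` class below is only a sketch of that arithmetic.

```python
from dataclasses import dataclass

@dataclass
class CrisprLocus:
    name: str
    start: int       # assumed 1-based, inclusive
    end: int         # assumed 1-based, inclusive
    repeat_len: int  # direct repeat length (bp)
    n_spacers: int

    def span(self) -> int:
        """Length of the array in bp."""
        return self.end - self.start + 1

    def mean_spacer_len(self) -> float:
        # Assumes n spacers separated by n + 1 full-length repeats, with the
        # array boundaries falling exactly on the outermost repeats.
        repeat_bp = (self.n_spacers + 1) * self.repeat_len
        return (self.span() - repeat_bp) / self.n_spacers

loci = [
    CrisprLocus("CRISPR 1", 2438206, 2438965, 36, 11),
    CrisprLocus("CRISPR 2", 2507588, 2507825, 33, 3),
]
for locus in loci:
    print(locus.name, locus.span(), round(locus.mean_spacer_len(), 1))
```

Under these assumptions the arrays span 760 and 238 bp, with mean spacer lengths of about 30 and 35 bp. These are approximations, since arrays can end in truncated or degenerate repeats.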
Recently, the human microbiota were found to be a natural reservoir of antibiotic resistance genes [27]. Because A. muciniphila is a frequent and abundant human resident, we queried its genome for possible antibiotic resistance-associated genes. We found two potential beta-lactamase genes (Amuc_0106 and Amuc_0183), belonging to beta-lactamase classes C and A, respectively, as well as a gene coding for a 5-nitroimidazole antibiotic resistance protein (Amuc_1953). Furthermore, the A. muciniphila genome contains a gene coding for a putative secreted antibiotic biosynthesis monooxygenase (Amuc_1805, Pfam PF03992).
Long mononucleotide repeats in A. muciniphila are overrepresented at the gene termini (Table S3 and Figure S1), as found previously for archaea, bacteria and eukaryotes [28,29]. Such repeats are known to be involved in transcriptional or translational phase variation in prokaryotes [30]. Long homopolymeric tracts of >8 bp are found in 17 A. muciniphila genes, among them 2 genes involved in capsular polysaccharide biosynthesis: a capsular exopolysaccharide biosynthesis gene with a (G)8 repeat and a gene coding for an acyltransferase with a (C)10 repeat (Amuc_1413 and Amuc_2098, respectively).
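Table S3 and Figure S1 summarize where long mononucleotide runs fall within genes. A minimal Python sketch of that kind of analysis follows: it finds runs with a regular expression, assigns each run to a gene quintile by its midpoint (the source does not say whether the start, midpoint or end was used, so this is an assumption), and reports the deviation from the 20% per quintile expected without positional bias. Gene sequences are assumed to be given 5′ to 3′ on the coding strand; all function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterator, Tuple

RUN = re.compile(r"(.)\1*")  # a maximal run of one repeated character

def homopolymer_runs(seq: str, min_len: int) -> Iterator[Tuple[str, int, int]]:
    """Yield (base, 0-based start, length) for each mononucleotide run of >= min_len bp."""
    for m in RUN.finditer(seq.upper()):
        length = m.end() - m.start()
        if length >= min_len:
            yield m.group(1), m.start(), length

def quintile(start: int, length: int, gene_len: int) -> int:
    """Quintile (1 = 5' end ... 5 = 3' end) containing the midpoint of a run."""
    midpoint = start + length / 2
    return min(5, int(5 * midpoint / gene_len) + 1)

def quintile_deviation(genes: Dict[str, str], min_len: int) -> Dict[int, float]:
    """Percentage of runs per quintile minus the 20% expected under no positional bias."""
    counts = Counter()
    for seq in genes.values():
        for _base, start, length in homopolymer_runs(seq, min_len):
            counts[quintile(start, length, len(seq))] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {q: 100.0 * counts[q] / total - 20.0 for q in range(1, 6)}
```

For example, `quintile_deviation(genes, 5)` would correspond to the ">4 nucleotides" class of Figure S1, and `min_len=9` to the cut-off used for Table S3.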
Images of A. muciniphila have shown that the cells are frequently covered with flagella-like structures [14]. However, no obvious candidate genes that could encode the putative proteinaceous building blocks of these filaments have been discerned in the genome. Recently, studies of Lactobacillus rhamnosus have shown that this bacterium has pili that are indispensable for interactions with human mucus [31]. The flagella-like structures of A. muciniphila are therefore interesting targets for proteomic investigation, since the availability of the genome sequence enables straightforward determination of the amino acid composition of extracellular proteins.

Mining for Akkermansia DNA in metagenomic libraries of human GI tracts
Previous studies have shown that A. muciniphila is a common and abundant colonizer of the human GI tract, detectable in approximately 75% of the human population [17]. We therefore queried 37 metagenomic libraries from an international effort to catalogue human GI tract microbiomes (M. Arumugam et al., under revision) for the presence of A. muciniphila 16S rRNA gene sequences. Eleven (30%) of the 37 libraries contained sequences >95% identical to the A. muciniphila 16S rRNA gene query (Table 2, Table S4). In most cases, the nucleotide identity was >99% (ambiguous nucleotides excluded); the exception was one subject (Italian male, 87 years old, subject B), in whom a complete 16S rRNA gene locus was identified with only 98% identity to that of A. muciniphila.
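The identities above were computed with ambiguous nucleotides excluded. A minimal Python sketch of such a calculation follows, assuming the two sequences are already aligned to equal length with `-` for gaps; how gaps are scored (counted as mismatches here) is our assumption, not something the source specifies.

```python
UNAMBIGUOUS = set("ACGT-")  # '-' marks an alignment gap

def identity_excluding_ambiguous(query: str, subject: str) -> float:
    """Percent identity between two pre-aligned sequences, skipping ambiguous columns.

    Columns where either sequence has an IUPAC ambiguity code (N, R, Y, ...) are left
    out of both numerator and denominator; gap-versus-base columns count as mismatches
    and gap-versus-gap columns are ignored.
    """
    if len(query) != len(subject):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for q, s in zip(query.upper(), subject.upper()):
        if q not in UNAMBIGUOUS or s not in UNAMBIGUOUS:
            continue  # ambiguous nucleotide: excluded
        if q == "-" and s == "-":
            continue  # column empty in both sequences
        compared += 1
        matches += q == s
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared
```

A read would then count as Akkermansia-like when `identity_excluding_ambiguous(read, reference) > 95.0`.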
Subsequently, we queried each of the metagenomic libraries with the entire genome sequence in order to identify all Akkermansia carriers in this dataset. When queried with the A. muciniphila genome (rRNA sequences excluded, data not shown), the combined set of bacterial and archaeal genome sequences at NCBI (1026 genomes, retrieved 22 January 2010) yielded no non-Akkermansia hit with a nucleotide identity >90% over more than 200 bp. Applying these values as a conservative cutoff, we identified putative Akkermansia DNA in a further 12 metagenomic databases (Table 2; all predicted Akkermansia contigs are listed in Table S5), bringing the total number of Akkermansia-containing libraries to 23 (62%). The 11 libraries that contained Akkermansia 16S rRNA gene sequences contained on average over 1.3 Mbp of Akkermansia DNA, compared with an average of 133 kbp in the 12 libraries lacking the ribosomal proxy. The amount of Akkermansia DNA per database varied from a single contig of 575 bp (Japanese female, 4 months old, subject In-M) to well over 2.5 Mbp in 1102 contigs (Danish healthy male, 54 years old, subject MH13). The largest relative amount of Akkermansia DNA, 3.9% of the total assembled DNA, was found in a 61-year-old healthy French male (subject NO3). In total, 13,589 contigs comprising 15.9 Mbp of novel Akkermansia sequence were identified in over 1.98 Gbp of assembled metagenomic data.
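Per-library abundance is the summed length of the predicted Akkermansia contigs divided by the assembled size of that metagenome. The sketch below shows that calculation (the function name is hypothetical) and checks that the two reported group averages are consistent with the 15.9 Mbp total and with 23 of 37 libraries being positive.

```python
from typing import Iterable, Tuple

def akkermansia_load(contig_lengths: Iterable[int], assembled_bp: int) -> Tuple[int, float]:
    """Total Akkermansia bp in one metagenome and its percentage of all assembled bp."""
    total = sum(contig_lengths)
    return total, 100.0 * total / assembled_bp

# Consistency check on the reported averages (values in bp).
with_16s = 11 * 1.3e6     # 11 libraries with the 16S proxy, just over 1.3 Mbp each on average
without_16s = 12 * 133e3  # 12 libraries without it, 133 kbp each on average
print(round((with_16s + without_16s) / 1e6, 1))  # 15.9, matching the reported total
print(round(100 * 23 / 37))                      # 62 (% of libraries with Akkermansia DNA)
```

Because the abundance is expressed relative to assembled rather than raw sequence, it inherits any assembly bias; this caveat is discussed below.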
The average nucleotide identity (ANI) has been proposed as a means of refining the definition of species boundaries in prokaryotes [32]. For each metagenomic database containing more than ten predicted Akkermansia sequences (i.e., stretches of >200 bp with >90% ANI), we analyzed the distribution of ANI scores for all contigs predicted to derive from Akkermansia (Figure 1). For most of the metagenomic libraries (20 of 23 metagenomes), these contigs have an ANI of around 98% relative to the A. muciniphila genome, and the subjects can be considered A. muciniphila carriers. However, the metagenomic datasets from three individuals (A, B and MH6) display a distinctly different distribution, with a much lower ANI (84-88%, although MH6 shows several peaks; Figure 1). This indicates that these subjects are likely colonized by uncultured, unknown representatives of the genus Akkermansia. Moreover, (parts of) both BACON domain-containing protein-coding genes are found in databases A and B, suggesting mucolytic potential in these other Akkermansia species.
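The per-library ANI distributions in Figure 1 can be summarized by binning contig identities and locating the main peak. The sketch below assumes per-contig BLAST identities are used as the ANI proxy, 1% bins, and weighting by aligned length; the figure may instead count contigs, so treat these choices and the function names as hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Hit = Tuple[float, int]  # (percent identity, aligned length in bp)

def identity_histogram(hits: List[Hit], bin_width: float = 1.0) -> Dict[float, int]:
    """Aligned bp per identity bin, keyed by the bin's lower edge."""
    hist: Counter = Counter()
    for pident, length in hits:
        hist[bin_width * int(pident // bin_width)] += length
    return dict(hist)

def weighted_mean_identity(hits: List[Hit]) -> float:
    """Length-weighted mean identity of all contigs from one metagenome."""
    total = sum(length for _, length in hits)
    return sum(p * length for p, length in hits) / total

def modal_bin(hits: List[Hit], bin_width: float = 1.0) -> float:
    """Lower edge of the bin holding the most aligned bp (the main peak)."""
    hist = identity_histogram(hits, bin_width)
    return max(hist, key=hist.get)
```

A library dominated by A. muciniphila would peak near 98%, whereas profiles like those of subjects A and B would peak in the 84-88% range; a multi-peaked library such as MH6 shows up as several well-populated bins.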
All 14 databases that lacked both Akkermansia-like 16S rRNA gene sequences and sequences >200 bp with >90% identity to Akkermansia DNA contained contigs with nucleotide identity scores between 75 and 90% and an average ANI of 80%. These sequences could belong to other species of the genus Akkermansia, though this is tentative. To corroborate possible co-colonization of individual microbiomes by different Akkermansia species, we queried 9773 nearly full-length 16S rRNA sequences from a microbiome study of lean and obese twins [6]: of 30 sampled individuals, 15 harbour sequences with over 95% identity to the A. muciniphila 16S rRNA sequence (Table S7). One individual (TS148) harbours only 16S rRNA sequences with <98% identity, indicative of colonization by an unknown species of the genus Akkermansia, whereas four individuals (TS1, TS6, TS51 and TS150) harbour both A. muciniphila and other Akkermansia spp. 16S rRNA sequences, suggesting simultaneous colonization of these hosts by different species of this genus.
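The host-level reasoning in this paragraph amounts to a small decision rule. The sketch below is a hypothetical helper, not the pipeline used in the study: it takes the identities of one individual's 16S sequences to the A. muciniphila 16S gene and applies the thresholds stated in the text (>95% counts as Akkermansia, 98% or more as A. muciniphila, anything in between as another Akkermansia species). Treating exactly 98.0% as A. muciniphila is our assumption.

```python
from typing import Iterable

def classify_host(identities: Iterable[float]) -> str:
    """Classify one individual from the percent identities of its 16S sequences
    to the A. muciniphila 16S rRNA gene."""
    akk = [p for p in identities if p > 95.0]  # Akkermansia-like sequences only
    if not akk:
        return "no Akkermansia"
    has_amuc = any(p >= 98.0 for p in akk)     # A. muciniphila
    has_other = any(p < 98.0 for p in akk)     # another Akkermansia species
    if has_amuc and has_other:
        return "A. muciniphila + other Akkermansia"
    return "A. muciniphila only" if has_amuc else "other Akkermansia only"
```

In these terms, TS148 falls into the "other Akkermansia only" class and TS1, TS6, TS51 and TS150 into the co-colonized class.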
Using these sequences and an identity threshold of 98%, we find a total of eight different species in the genus Akkermansia (each represented by at least two individual sequence traces), suggesting a large, still unexplored intrageneric diversity, some representatives of which also colonize human microbiomes.
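The source states only the 98% identity threshold and the two-member minimum, not the clustering algorithm. One simple way to apply those rules is greedy centroid clustering, similar in spirit to tools such as CD-HIT. The Python sketch below assumes pre-aligned, equal-length sequences and uses a hypothetical `aligned_identity` helper, so it illustrates the grouping rule rather than reproducing the published analysis.

```python
from typing import List, Sequence

def aligned_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_species_groups(seqs: Sequence[str], threshold: float = 98.0,
                          min_members: int = 2) -> List[List[str]]:
    """Each sequence joins the first centroid it matches at >= threshold percent
    identity, otherwise it founds a new group. Groups with fewer than min_members
    sequences are discarded, mirroring the two-representative rule."""
    groups = []  # list of (centroid, members)
    for s in seqs:
        for centroid, members in groups:
            if aligned_identity(centroid, s) >= threshold:
                members.append(s)
                break
        else:
            groups.append((s, [s]))
    return [members for _, members in groups if len(members) >= min_members]
```

Greedy clustering depends on input order; sorting sequences longest-first before clustering is a common convention.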

Discussion
We sequenced the genome of the human gut colonizer Akkermansia muciniphila, a representative of the phylum Verrucomicrobia. A. muciniphila has been isolated in basal medium using mucin as a sole carbon and nitrogen source [14], showing that this species is able to degrade the major component of the mucosal lining of the GI tract. Analyses of the distribution of this species have shown it to be a frequent and abundant colonizer of GI tracts in a range of animal hosts [33]. A further exploration into the functional roles of A. muciniphila in the GI tract, however, would be greatly facilitated by the availability of its genetic repertoire.
In genomic terms, A. muciniphila is an average member of the Verrucomicrobia: both its number of protein-coding genes and its genomic G+C content fall within the range of the other sequenced genomes of this phylum. Verrucomicrobial genomes seem to contain a relatively large number of genes encoding a signal peptide compared with other phyla, and correspondingly, approximately a quarter of the proteins encoded in the A. muciniphila genome contain a signal peptide and are therefore potentially secreted. Several of these proteins are predicted to be involved in the different steps of mucin degradation, whereas the large number of genes encoding signal peptide-bearing hypothetical proteins suggests a large undiscovered capacity of A. muciniphila to break down extracellular polymeric substrates, including mucin. Recent analyses have confirmed the mucolytic activity of Akkermansia [34], and future studies, including proteomic analyses and functional screening of genomic libraries, can be expected to shed light on the enzymes involved.
As found for other, mainly human-associated, organisms such as Neisseria spp., Haemophilus spp., Campylobacter spp. and Helicobacter spp., A. muciniphila may employ phase variation via slippage at mononucleotide repeats. Notably, two genes involved in capsule synthesis contain very long guanine repeats, which are known to be particularly strongly underrepresented in the coding regions of genome sequences [35]. Capsules are well-known antigens of numerous human pathogens [36], but they can also protect against desiccation [37], which may facilitate the transmission of Akkermansia via the fecal-oral route.
The presence of two distinct CRISPR loci [38] and of numerous presumably phage-derived sequences in the genome suggests that viral infections have played an important part in the evolutionary history, and perhaps the speciation, of A. muciniphila. Little was known about the variety of Akkermansia strains and species that colonize a single microbiome, but the current analyses suggest that at least eight different species of the genus Akkermansia colonize the human GI tract, and that even simultaneous colonization by different species takes place. Whether this means that distinct niches exist for different specialist mucin degraders in the GI tract, or that humans are continuously infected by different Akkermansia species, resulting in discontinuous (co-)colonization, is unknown.
In three libraries we encountered divergent Akkermansia sequences, based on their low ANI values relative to the sequenced A. muciniphila genome. The current datasets do not allow us to determine whether these three represent the same species, since Akkermansia 16S rRNA gene sequences are lacking from two of these metagenomes. It is, however, tempting to speculate that these other Akkermansia species can also thrive on mucin as a carbon source, based on the presence of the BACON domain-containing protein-coding genes, and therefore occupy a similar, if not the same, niche as A. muciniphila.
We approached the investigation of metagenomic libraries using a complete genome as the query sequence. As expected, this increases the detection of closely related strains and species in large metagenomes and aids in the quantification of bacterial abundance. Many metagenomic repositories contain assembled sequencing reads, which may skew estimates of the actual abundance of the organism of interest compared with estimates based on the total number of raw sequence reads. However, hybridization signal strength in phylogenetic microarray analyses identified the same metagenome library as containing the largest amount of Akkermansia DNA. Further investigation of the congruence between different abundance estimates may help to validate their applicability.
Together, we present the genome sequence of Akkermansia muciniphila, as well as a number of its features that may be important for its ecology and for its evolutionary adaptation to its niche, the human GI tract. These data enable further characterization of the functional role of this abundant human-associated commensal.

DNA isolation
A glycerol stock of the Akkermansia muciniphila type strain (ATCC BAA-835) was inoculated into 500 ml anoxic basal medium containing pork gastric mucin as carbon and energy source and incubated overnight at 37°C as described previously [14]. Cells were harvested by centrifugation, and high-molecular-weight DNA was isolated using the standard "Bacterial genomic DNA isolation using CTAB" protocol recommended by the DOE Joint Genome Institute (JGI, Walnut Creek, CA), with minor modifications. In short, cells were resuspended in 14.8 ml modified TE (10 mM Tris, 20 mM EDTA, pH 8.0), which has been found to prevent DNA degradation (data not shown). Cells were then lysed using lysozyme and proteinase K, and DNA was extracted and purified by CTAB and phenol:chloroform:isoamyl alcohol extractions. After precipitation in 2-propanol and washing in 70% ethanol, the DNA was resuspended in 400 µl TE containing 40 µg RNase A. Following a quality check by agarose gel electrophoresis in the presence of ethidium bromide and a quantity check by spectrophotometric measurement using a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA), the DNA was precipitated in 2-propanol and shipped to the JGI for whole-genome shotgun sequencing.

Genome sequencing and assembly
The genome of Akkermansia muciniphila was sequenced at the JGI using a combination of 3 kb, 8 kb and 40 kb (fosmid) DNA libraries. In addition to Sanger sequencing, 454 pyrosequencing was performed to a depth of 20× coverage. All general aspects of library construction and sequencing performed at the JGI can be found at http://www.jgi.doe.gov/. Draft assemblies were based on 51,010 total reads and provided approximately 15.5× coverage of the genome. The Phred/Phrap/Consed software package (www.phrap.com) was used for sequence assembly and quality assessment [39,40,41]. Gaps between contigs were closed by custom primer walks on gap-spanning clones or PCR products. A total of 567 additional reactions were necessary to close gaps and to raise the quality of the finished sequence. The completed genome sequence of A. muciniphila comprises 50,774 reads, achieving an average of 17.7-fold sequence coverage per base with an error rate of less than 1 in 100,000.

Gene calling
The gene modeling program Prodigal (http://prodigal.ornl.gov/) was run on the finished genome, using default settings that permit overlapping genes and using ATG, GTG and TTG as potential starts. The resulting protein translations were compared with GenBank's non-redundant database (NR), Swiss-Prot/TrEMBL, PRIAM, Pfam, TIGRFam, InterPro, KEGG and COG databases using BLASTP or HMMER, and product assignments were made from these results. Initial criteria for automated functional assignment set priority based on PRIAM, TIGRFam, Pfam and InterPro profiles, pairwise BLAST against Swiss-Prot/TrEMBL, KEGG and COG groups. Manual corrections to automated functional assignments were made on an individual gene-by-gene basis as needed. The annotation was imported into the Joint Genome Institute Integrated Microbial Genomes system (IMG; http://img.jgi.doe.gov/cgi-bin/pub/main.cgi) [42]. Singletons were identified as described by Blom et al. [43]. Genes from the other available verrucomicrobial genomes were assigned to COGs using RPS-BLAST (Reverse Position-Specific BLAST) against NCBI's Conserved Domain Database (CDD). Top hits were taken with an E-value cut-off of 10^-2. The A. muciniphila genome sequence is available at NCBI under accession number NC_010655.
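The methods state that top RPS-BLAST hits were kept with an E-value cut-off of 10^-2 but do not describe how the output was parsed. A minimal sketch, assuming BLAST+ tabular output (`-outfmt 6`, whose twelve default columns are listed in the docstring) and ranking hits by bit score, could look like this; the function name is hypothetical, and mapping the resulting CDD identifiers to COG categories is not shown.

```python
import csv
from typing import Dict, TextIO, Tuple

def best_hits(handle: TextIO, max_evalue: float = 1e-2) -> Dict[str, str]:
    """Return {query_id: subject_id} for the best hit per query in BLAST tabular output.

    Assumed columns (BLAST+ -outfmt 6 default): qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore. The best hit is the one with the
    highest bit score among hits with E-value <= max_evalue.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}
```

Ranking by bit score rather than E-value avoids ties when several E-values round to zero.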

Metagenome mining
The GI tract metagenomes originate from previous studies, all based on Sanger sequencing, and were re-processed with the SMASH pipeline [44]. General characteristics of these metagenomes are given in Table 2 and Table S6. These 37 metagenomes were queried with the A. muciniphila 16S rRNA gene sequence or with its entire genome sequence using BLAST [45]; hits were required to be over 200 bp in length with over 90% nucleotide identity (rRNA regions were filtered out in the whole-genome BLAST analyses).
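The hit criteria above (over 200 bp, over 90% identity, rRNA regions excluded) can be applied to BLAST tabular output as follows. This sketch assumes the A. muciniphila genome was the BLAST query, so rRNA exclusion is done on query coordinates, and that the caller supplies the rRNA gene intervals; the function names and interval format are hypothetical. Dropping hits that overlap rRNA coordinates is one way to filter them; masking the query before searching is another.

```python
import csv
from typing import Iterable, List, TextIO, Tuple

Interval = Tuple[int, int]  # inclusive (start, end) on the genome

def overlaps(a: Interval, b: Interval) -> bool:
    """True if two inclusive intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]

def akkermansia_hits(handle: TextIO, rrna: Iterable[Interval],
                     min_len: int = 200, min_pident: float = 90.0) -> List[dict]:
    """Keep -outfmt 6 hits longer than min_len bp with identity above min_pident
    that do not overlap any rRNA interval on the (genome) query."""
    rrna = list(rrna)
    kept = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        pident, length = float(row[2]), int(row[3])
        qstart, qend = sorted((int(row[6]), int(row[7])))
        if length <= min_len or pident <= min_pident:
            continue  # the text requires strictly > 200 bp and > 90%
        if any(overlaps((qstart, qend), r) for r in rrna):
            continue  # hit touches an rRNA region
        kept.append({"contig": row[1], "pident": pident, "length": length})
    return kept
```

The contig lengths returned here are what a per-library abundance calculation would sum.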
Nearly full-length 16S rRNA sequences from a twin microbiome study [6] (accessions FJ362604-FJ372382; 9773 sequences extracted from NCBI) were included for co-colonization analyses and species determination. Species were assigned using a 98% sequence identity cut-off, and each species group was required to have at least two representatives.
Figure S1 Positional bias of homopolymeric repeats within all protein-coding genes of Akkermansia muciniphila. All genes were divided proportionally into five quintiles (Quintile 1 at the 5′ end, followed by Quintiles 2, 3 and 4, and Quintile 5 at the 3′ end). With increasing repeat length (from >4 to >7 nucleotides), the repeats are progressively more abundant in the first quintile. Percentages are depicted as deviations from the expected value of 20% per gene quintile for an unbiased intragenic distribution of repeats. (TIFF)
Table S1 COG assignments for the seven verrucomicrobial genomes (for full names, see Table 2). (DOCX)
Table S2 SignalP predictions for a range of bacterial phyla and species (based on JGI predictions and curations). Phylum averages are shown in bold. (DOCX)
Table S3 List of Akkermansia muciniphila protein-coding genes that include mononucleotide repeats of 9 bp or longer. The relative gene position (between 0 and 1) is calculated based on the start (relative gene position 0) and end (relative gene position 1) of each gene. (DOCX)