Saliva Microbiota Carry Caries-Specific Functional Gene Signatures

Human saliva microbiota is phylogenetically divergent among host individuals yet their roles in health and disease are poorly appreciated. We employed a microbial functional gene microarray, HuMiChip 1.0, to reconstruct the global functional profiles of human saliva microbiota from ten healthy and ten caries-active adults. Saliva microbiota in the pilot population featured a vast diversity of functional genes. No significant distinction in gene number or diversity indices was observed between healthy and caries-active microbiota. However, co-presence network analysis of functional genes revealed that caries-active microbiota was more divergent in non-core genes than healthy microbiota, despite both groups exhibited a similar degree of conservation at their respective core genes. Furthermore, functional gene structure of saliva microbiota could potentially distinguish caries-active patients from healthy hosts. Microbial functions such as Diaminopimelate epimerase, Prephenate dehydrogenase, Pyruvate-formate lyase and N-acetylmuramoyl-L-alanine amidase were significantly linked to caries. Therefore, saliva microbiota carried disease-associated functional signatures, which could be potentially exploited for caries diagnosis.


Introduction
Caries is the most common infectious disease throughout the world [1]. Lesions and cavities on tooth surfaces, caused by caries activity, result in infection and pain and can lead to decay and even the loss of tooth structure. Furthermore, once started, the destruction process is usually irreversible. Therefore, preventive measures against caries, as well as the prognosis and early diagnosis, are of particular clinical significance.
Human saliva is home to numerous microorganisms [2,3,4,5,6]. Evidences have recently emerged from our group and others that the organismal structure of saliva microbiota is highly individualized among human hosts [3,4,5,6,7,8] and that changes in organismal structure are linked to caries [9], gingivitis [10] and periodontitis [11]. However, the functional characteristics of saliva microbiota are not well understood [12] and the potential roles of saliva microbiota in health and diseases remain elusive, as (i) organismal lineages do not necessarily correlate with functional activities; (ii) many organisms in a given microbiota are either novel or uncultured; (iii) the degree of microbial functional divergence among host individuals is presently unknown.
Here we reported the global functional profiles of human saliva microbiota associated with dental caries and health. Saliva samples from ten healthy (''H'') and ten caries-active (''C'') hosts were analyzed using HuMiChip 1.0, a new generation of Geochip targeting microbial metabolism in human and mouse microbiota, based on a modified pipeline in the well validated GeoChip3.0 [13]. Our results showed that the functional gene structure of saliva microbiota is able to distinguish caries-active patients from healthy hosts, suggesting that the structure and selected microbial functional gene markers can be potentially exploited for caries diagnosis and perturbation. Thus saliva can serve as a sensitive and non-invasive venue for simultaneously tracking the host, microbial and environmental attributes whose interactions underlie health and disease.

Study design
All human host volunteers (nearly 700 individuals) were from an oral health census on the undergraduates from the east campus of Sun Yat-sen University, Guangzhou, China, in September, 2009 [9]. After oral health survey, ''healthy'' individuals (DMFT = 0) and ''caries-active'' subjects (DMFT §6) were chosen for saliva sample collection (Materials S1). All volunteers provided written informed consent in accordance with the sampling protocol with approval of the ethical committee of the Guanghua Stomatological Hospital, Sun Yat-sen University. They were all unrelated individuals of both genders, aged between 18 and 23 years and shared a relatively homogeneous college-campus living environment. All reported no antibiotics intake for the preceding at least six months and no smoking or tobacco used. All were asked to avoid eating or drinking for 1 h before oral sampling. Those with other oral (for example, periodontitis or halitosis) or systematic diseases were excluded. To decipher the functional landscape of saliva microbiota, 20 saliva samples (including ten from the ''healthy'' group and ten from the ''caries-active'' group) were randomly selected for HuMiChip analysis ( Table 1).

Sample collection and processing
Two milliliters of saliva were collected from each human-host individual into a tube containing an equal volume of lysis buffer (50 mM Tris, pH 8.0, 50 mM EDTA, 50 mM sucrose, 100 mM NaCl and 1% SDS) [10]. Samples were stored at 280uC before high-salt DNA extraction [14]. Thirty microliters of proteinase K (20 mg/mL, Sigma, USA) and 150 mL of 10% SDS were added to 2 mL of the saliva extraction buffer mixture, which was then incubated overnight at 53uC in a shaking water bath. After addition of 400 mL 5 M NaCl and 10 min incubation on ice, the mixture was equally distributed into two 2-mL centrifuge tubes and centrifuged for 10 min at 13,000 rpm in an Eppendorf 5415D centrifuge. The supernatant from each tube was transferred to a new tube, where 800 mL isopropanol was added. The tubes were then incubated for 10 min at room temperature and centrifuged for 15 min at 13,000 rpm. The supernatants were discarded and then the DNA pellets were washed once with 500 mL 70% ethanol, dried and dissolved in 30 mL double-distilled water. Concentrations of the resulted total DNA were measured by Nanovue (GE, USA). DNA purity was determined by A 260 /A 280 , with the inclusion criteria of above 1.8. DNA integrity was verified via agarose gel electrophoresis after ethidium bromide staining under ultraviolet light. DNA Samples were stored at 220uC before further processing.

HuMiChip analysis of saliva microbiota function
A functional gene microarray (HuMiChip1.0) was developed to interrogate microbial metabolism in human and mouse microbiota (details in Materials S1). The design of HuMiChip employed a modified pipeline as that in the well validated GeoChip 3.0 [15,16,17]. In total, 36,056 probes targeting 139 functional genes families were included in HuMiChip 1.0, covering 50,007 coding sequences from 322 draft/finished bacterial genomes and 27 shotgun metagenome datasets from various human body sites. The microarrays were synthesized and manufactured by NimbleGen.
HuMiChip analysis was performed for totally 20 saliva microbiota that include ten healthy and ten caries-active ones ( Table 1). Microarray sample preparation, hybridization, and scaling were performed as previously described [16]. We used minimal signal intensity of 1000 and SNR (Signal to Noise Ratio) cutoff of 2 for positive callings of the presence of a protein. Raw data obtained from microarray image analysis was uploaded to microarray data manager for preprocessing and analysis (http:// ieg2.ou.edu/NimbleGen). Functional gene diversity (e.g., Shannon-Weaver index), detrended correspondence analysis (DCA) and permutation t-tests were performed using R (version 2.9.1). Permutation t-tests were performed based on host dental healthstate. All statistical tests were two-sided, with asterisks denoting statistical significance (NS: not significant; *: p,0.1; **: p,0.05;

Statistical analysis in network reconstruction and biomarker detection
The 3,656 functional genes with hybridization signals on HuMiChip were grouped into ''complete-presence proteins'' (''core'', i.e., those present in all the 20 saliva microbiota) and ''partial-presence proteins'' (''non-core'', i.e., those missing in at least one saliva microbiota). The ''complete-presence proteins'' were represented as normalized values according to their signal intensity, while the ''partial-presence proteins'' as binary values (either 1 or 0). The network of core functional genes was built based on the normalized values of ''all-presence genes'' for the H and C Groups respectively. The network of non-core functional genes was based on the binary values of ''partial-presence genes'' on specific metabolic pathways that include Carbon-associated Pathway (including 'Complex carbohydrates' and 'Feeder pathways to glycolysis' and 'Respiration'), AA-associated Pathway (including ''Amino acid transport and metabolism'' and ''Amino acid synthesis'') and Nitrogen-associated Pathway (''Nitrogen Metabolism'').
To identify those markers that reliably distinguish caries microbiota, the ten healthy samples were grouped into training (seven samples) and testing (three samples) by all possible combinations (thus 120 different groupings). Based on the binary presence profiles of non-core genes, bootstrapping method was used to randomly select a grouping and then used two steps (''feature selection'' and ''classification'') to identify biomarkers based on triplet feature selection. Features with the highest discrimination power on the training data were selected and then employed to ''predict'' the presence profiles of the testing data ( Figure S1; Materials S1). The biomarkers (each represented as a triplet-feature set of microbial genes) identified after each of the two steps were then subjected to manual inspections before retrieving the final list of biomarkers.

Functional gene diversity in healthy and caries-active saliva microbiota
To interrogate microbial metabolisms in human and mouse microbiota, we developed a functional gene microarray (HuMi-Chip1.0) based on our well validated GeoChip 3.0 platform [15,16,17]. HuMiChip 1.0 contains 36,056 oligonucleotide probes targeting 139 functional genes families and covering 50,007 coding sequences from 322 draft/finished bacterial genomes and 27 shotgun metagenome datasets from various human body sites (Table S1; Materials S1). For a pilot-population of 20 human adults (whose organismal structure were decoded [9]) that included ten healthy (H Group) and ten caries-active (C Group) ( Table 1), metabolic functions of saliva microbiota were analyzed via hybridizing the saliva DNA to the microarray. In total, 3,685 genes distributed in 19 gene categories were identified within the collection of 20 saliva microbiota. For each microbiota, the number of detected genes was 2,246,2,880 ( Table 1). In terms of signal intensity, gene categories such as ''Amino acid synthesis'' (accounting for ,25% of all detected genes), ''Amino acid transport and metabolism'' (,15%), ''Central Carbon Metabolism Pathways'' (,10%), ''Cofactor Biosynthesis'' (,8%) and ''Complex Carbohydrates'' (,7%) were the most prominent across all samples ( Figure 1A). The results indicated that the overall functional gene diversity was similar among the 20 samples ( Figure 1B), and no significant difference in gene number or diversity indices was observed between the two groups (p.0.05) ( Table 1).
Among the detected 3,685 genes, averagely 70.86% genes (62.68%,79.45%) were shared between any two of the ten H microbiota, and 69.47% genes (63.37%,76.04%) were shared between any pair of the ten C microbiota ( Table S2). The functional core, of either the H or C group, mainly consisted of ''Amino acid synthesis'' (,25%), ''Amino acid transport and metabolism'' (,15%) and ''Central carbon metabolism pathways'' (,5%). The top 20 most abundant shared genes in the functional core of all 20 samples were shown in Table S3. The most varied functions (measured by standard deviation of signal intensity) in the functional core was ''Cytidylate kinase'' (''Pyrimidine metabolism''), while the most conserved (least varied in signal intensity) functions in the functional core was ''Acetyl-CoA acyltransferase anaerobic'' (''Fatty Acid Metabolism'') ( Table S4). Inside either the H or the C group, a gradual decrease and eventual saturation of shared genes with individual additions of hosts were apparent (Figure 2), where 35.6% (1,134/3,187) and 35.0% (1,179/3,366) of the genes were shared for individuals within H and those within C group respectively (in contrast, only 0.08% (for C group) and 0.53% (for H group) of species-level OTUs were shared for these hosts [9]). Four functional genes were found with an ''exclusive pattern'' (found in either H or C triplet-features but not both): Diaminopimelate epimerase (''Amino Acid Synthesis''; C-exclusive), Prephenate dehydrogenase (''Amino Acid Synthesis''; C-exclusive), Pyruvate-formate lyase (''Respiration''; H-exclusive) and N-acetylmuramoyl-Lalanine amidase (''Glycan Biosynthesis and Metabolism''; C-exclusive).

Distinctions in functional structure between the healthy and caries-active saliva microbiota
Interestingly, functional modularity in saliva microbiota was apparent, as unveiled by the co-presence network of functional genes (i.e. among the 3,685 positive genes; Materials and Methods). First, the network of core functional gene (genes shared by every microbiota) in the H Group (the core H-network) consisted of 21 modules, with 348 nodes and 22,964 edges (links), while the core C-network consisted of 19 modules, with 363 nodes and 22,031 links ( Figure 3A). Sizes of these modules were quite similar between the two core networks. The largest module in the core H-network consisted of 222 nodes and 22,658 links, while that in the core C-network consisted of 230 nodes and 21,800 links; furthermore, between the two modules, there was a large overlap of functional genes (96.9%,98.3% of the genes). In fact, the nodes in the two networks displayed a similar functional pattern ( Figure 3A and Table S5).
Second, sub-networks of non-core genes (genes without signals in at least one microbiota) were constructed based on specific metabolic pathways, such as the Carbon-associated, AA (amino acid)-associated and Nitrogen-associated pathways. The largest module in the Carbon-associated non-core sub-networks for H Group featured 35 genes, while that for C Group features only 14 genes ( Figure 3B). The Amino-acids-associated non-core subnetworks exhibited a similar trend, with 337 genes in the biggest module in H Group while 74 in that of the C Group ( Figure 4). The largest Nitrogen-metabolism-associated module was consisted of seven genes in H Group (with most encoding Spermidine synthase), yet was absent in C Group. Therefore, healthy saliva microbiota exhibited more conservation in non-core genes than caries-active ones. Interestingly, healthy saliva microbiota was also more conserved in organismal structure than caries-active ones [9].

Functional gene markers of saliva microbiota that were linked to caries
Although the overall functional gene diversity of saliva microbial communities remained unchanged between the C and H groups, their composition and structure were significantly different as demonstrated by dissimilarity analysis (MRPP with p, 0.01) and detrended correspondence analysis (DCA) from all 3,685 detected genes on HuMiChip 1.0 ( Figure 5A), indicating a significant link between the host disease state and saliva microbiota functioning. We have previously demonstrated a high degree of divergence in organismal structure and a minimal organismal core in human saliva microbiota among host individuals [9]. Our data here showed that functional-gene structure of saliva microbiota was able to distinguish the caries state from the healthy state of human hosts. Thus a function-based strategy via HuMiChip appears to be more effective than an organism-based strategy via 16S amplicon sequencing in our case ( Figure 5B). Therefore, functional gene structure of saliva microbiota can potentially be a more reliable predictor of caries than established risk factors such as Streptococcus mutans [18].
To understand the link between functional-gene structure of saliva microbiota to caries-state, signal intensities of genes and gene categories detected by HuMiChip were compared between the two groups of hosts. Significant differences were detected for gene categories of Complex carbohydrates, Nitrogen metabolisms and Amino acid transport and metabolism ( Figure 5C), and for functional genes such as Xylose isomerase, N-acetylmuramoyl-L-alanine amidase, Alpha-glucosidase, etc ( Figure 5D). Through a ''feature selection'' strategy (Materials and Methods) based on the 2,822 non-core functional genes, 1,247 triplet features were selected whose accuracy was at least 80% each among all possible permutations (Table S6). Among them, eight triplet-features were identified with high predictive power for H Group, and nine triplet-features for C Group ( Table 2). These 17 triplet-feature sets thus represented salivary microbial gene markers that were of value in dissecting and diagnosing caries etiology.
Interestingly, those genes presented with the highest frequency in the 17 triplet-features were those that exhibited an ''exclusive pattern'' (found in either H or C triplet-features but not in both): Diaminopimelate epimerase (''Amino Acid Synthesis''; C-exclusive), Prephenate dehydrogenase (''Amino Acid Synthesis''; C-exclusive), Pyruvate-formate lyase (''Respiration''; H-exclusive) and N-acetylmuramoyl-Lalanine amidase (''Glycan Biosynthesis and Metabolism''; C-exclusive). In contrast, for these 20 saliva microbiota, not a single taxon, from the phylogenetic level of phylum to that of OTUs (totally 9,628 OTU found), was identified with such an ''exclusive pattern'' of distribution in either the H or C Group [9], suggesting functionbased strategies can potentially be more effective than organismbased ones in diagnosis and treatment of oral infectious diseases.

Discussion
There has been a long history in sialometry and sialochemistry diagnosis of both oral and systemic diseases, such as caries [19], primary Sjogren's Syndrome [20], oral squamous cell carcinoma [21] and pancreatic diseases [22]. For caries, previous works in saliva have mainly focused on human-host attributes such as Glucosyltransferase B [23], antimicrobial peptides [24], past caries experience [25], soluble CD14 [26] and trace elements [27], while only a few have exploited individual microbial features, such as specific microbiological counts [28] and microbial nitrate reductase activities [29]. Few global functional analysis and comparison of saliva microbiota function was available, due to (i) the organismal complexity of the microbiota [9] and (ii) the observations that metagenome-sequencing based functional comparison of microbiota can be hampered by sequencing biases [30], the paucity of reference genomes and the small percentage of annotatable reads [31,32].
Microarray-based technologies are generally robust for community comparisons and more resistant to contaminants [33]. Therefore, we developed a functional gene microarray (HuMi-Chip1.0) to interrogate microbial metabolism in human and mouse microbiota. This comprehensive survey of saliva microbiota functions on the 10 healthy and 10 caries-active adults suggested that saliva microbiota carried disease-associated functional signatures.
The global functional landscapes of saliva microbiota in healthy and diseased hosts revealed a series of microbial functional markers strongly linked to caries in the pilot populations. Most of these microbial markers were novel and could lead to new clinical applications once validated in larger cohorts. One class of them was affiliated with Amino acid synthesis, suggesting the close link between the microbial activity and caries. Diaminopimelate epimerase is central to the biosynthesis of both lysine and cell-wall peptidoglycan in many bacteria. It catalyzes the stereoinversion of LL-DAP to meso-DAP, a precursor of L-lysine and an essential component of bacterial peptidoglycans [34]. Prephenate dehydrogenase (PDH) is a bacterial enzyme that converts prephenate to 4-hydroxyphenylpyruvate through the oxidative decarboxylation pathway for tyrosine biosynthesis [34]. Aspartate-ammonia ligase (Asparagine synthetase) catalyses the conversion of L-aspartate to L-asparagine in the presence of ATP and ammonia. These findings were consistent with previous works linking compounds with amine functional groups to caries (such as association of free salivary proline and glycine with caries [35]) and reporting higher levels of free salivary arginine and lysine in caries-free adults than those with caries history [36]). Microbial catabolism of dibasic amino acids might contribute to neutralization of plaque acids and thus partially accounted for the higher resting plaque pH in caries-free hosts [37].
Another class of candidate caries biomarkers we identified was consisted of those involved in carbohydrate hydrolysis. Pyruvate formate-lyase (PFL), exclusively existent in the H Group, converts sugar into volatile compounds (formate, acetate and ethanol) and serves in ATP synthesis and NAD+/NADH recycling [38]. This enzyme is extremely sensitive to oxygen and can be key to anaerobic fermentation in dental plaques [39]. N-acetylmuramoyl-Lalanine amidase is an autolytic enzyme bound to the surface of bacterial cell walls. It hydrolyzes the link between N-acetylmuramoyl residues and L-amino acid residues in certain cell wall glycopeptides (particularly peptidoglycan). It was reported that mutanolysin, one of the petidoglycan-degradative enzymes, exhibited lytic activity against the ''etiologic agents'' of dental caries, e.g. Streptococcus mutans, Streptococcus salivarius, Streptococcus sanguis, Lactobacillus acidophilus and Actinomyces viscosus [40]. Alphaglucosidase is hypothesized to participate in the induction of dental caries [41]. Alpha-glucosidase and Glucosyltransferases (Gtfs) are both from GH13 family (Glycoside Hydrolase Family 13; http://www. cazy.org); Gtfs are a major virulence factor in caries-pathogens in that Gtfs adsorb to enamel and synthesize extracellular glucans in situ, providing sites for colonization by microbes and an insoluble Figure 3. Co-presence networks of core and non-core functional genes in the healthy and caries-active host groups. A node (squares: core genes; circles: non-core genes) represents a functional gene. A solid line linking two nodes stands for positive correlation between the two. The sub-networks of Carbon-associated pathway (including 'Complex Carbohydrates' , 'Feeder Pathways to Glycolysis' and 'Respiration') were shown. The top five most abundant gene categories (the core-genes network) or genes (the non-core-genes network) in each network were labeled with different colors. doi:10.1371/journal.pone.0076458.g003 matrix for plaque [42]. Xylose isomerase is a key enzyme in xylose to xylitol conversion, which is carried out by bacteria. Xylitol has been recommended for its positive caries-prevention effect, demonstrated in various clinical trials using xylitol-containing chewing gum [43].
Microarray-based technology has served as useful tools for sensitive, specific, and quantitative analysis of microbial communities, yet their limitations in dissecting the functional composition of complex microbial communities still remain. For example, functional features that can be revealed were dependent on the defined probe sets with known functions. With the development of high-throughput sequencing, the number of functional gene sequences of interest has been increasing rapidly, thus the probes must be continuously updated and improved for comprehensive analysis [44].
In summary, our work unveiled the global functional features of human saliva microbiota. The sensitivity to host disease state, links to systematic body functions (due to circulation; [45,46]), easy accessibility and non-invasiveness in sampling, susceptibility for in situ analysis, feasibility of genotyping microbiota, as well as the  [9]). Thus in this study functional-gene structure of saliva microbiota can be more sensitive than organismal structure in distinguishing caries state from healthy state. (C) Categories of functional genes with significant differences between the two groups. (D) Functional genes that were of significant differences between the two groups. Means of signal intensity for each gene (or gene category) between the two groups were compared (*p,0.1; **p,0.05, ***p,0.01; mean6s.e.m.). doi:10.1371/journal.pone.0076458.g005 extensive clinical knowledge base [47] and accumulating salivaomics data (e.g. the salivary proteome [48]) suggest saliva as an advantageous venue and valuable research model for tracking intricate interactions that underlie oral and even systemic diseases. Figure S1 Computational strategy for selecting the functionalgene markers associated with caries in saliva microbiota.

Supporting Information
(TIF)      Author Contributions