Detection of Novel Integrons in the Metagenome of Human Saliva

Integrons are genetic elements capable of capturing and expressing open reading frames (ORFs) embedded within gene cassettes. They are involved in the dissemination of antibiotic resistance genes (ARGs) in clinically important pathogens. Although ARGs are common in the oral cavity, an association between integrons and antibiotic resistance has not been reported there. In this work, a PCR-based approach was used to investigate the presence of integrons and associated gene cassettes in human oral metagenomic DNA obtained from both the UK and Bangladesh. We identified a diverse array of gene cassettes containing ORFs predicted to confer antimicrobial resistance and other adaptive traits. The predicted proteins include a putative streptogramin A O-acetyltransferase, a bleomycin binding protein, a cof-like hydrolase, and competence- and motility-related proteins. This is the first study to detect integron gene cassettes directly from oral metagenomic DNA samples. The predicted proteins are likely to carry out a multitude of functions; however, the function of the majority is as yet unknown.


Introduction
Integrons are commonly found in bacterial genomes, especially those of Gram-negative bacteria. They are involved in the dissemination and differential expression of genes in the bacterial population [1][2][3]. They contain two common features: a functional platform and an array of gene cassettes (GCs). The former, also called the 5' conserved segment (5'CS), contains the integrase gene, intI, an attI recombination site and the promoter Pc. This platform is used for the capture and expression of the GCs, non-replicative mobile elements which generally couple one or more open reading frames (ORFs) with the cassette-associated recombination site, attC. The intI gene encodes a site-specific tyrosine recombinase, IntI, which catalyses the integration and excision of the GCs. The expression of integrase genes can be upregulated by the SOS response, a bacterial stress response induced by the accumulation of single-stranded DNA in a cell, which occurs during events such as transformation, conjugation, starvation and exposure to antibiotics such as quinolones and trimethoprim [4].
Recombination usually occurs between attI, located immediately adjacent to intI, and attC, which is present on circular gene cassettes [5][6][7]. The attC sites contain two inverted regions of homology (L'-R' and R"-L"), which flank a central region of highly variable sequence. The size of attC ranges from 57 to 141 bp [8]. Even though the sequences of attC sites are not conserved, they show a palindromic organization which is essential for the formation of the correct hairpin structure, the recognition site of the integrase during integron GC recombination reactions [9]. GCs normally do not have a promoter, so the Pc promoter is usually required for the transcription of GC ORFs; the first GC following Pc therefore often has the highest level of expression relative to ORFs located in downstream GCs [10].
Integrons have the potential to drive bacterial evolution and adaptation by differential expression of ORFs within GCs. One of the most clinically significant adaptive traits is antibiotic resistance [11]. The first integrons were identified by their association with antibiotic resistance genes (ARGs) [12]. Among hundreds of classes of integrons, class 1 integrons are the most commonly associated with multiple ARGs in clinical strains. More than 130 different GCs carried by integrons were predicted to confer resistance to a variety of classes of antibiotics such as aminoglycosides, beta-lactams, chloramphenicol, trimethoprim, and streptothricin [13].
Gene cassettes are abundant and widely disseminated in diverse environments. Different isolates of the same bacterial species can have different GC arrays [14]. The predicted protein functions of ORFs within GCs are varied and include, in addition to antibiotic resistance, virulence and secondary metabolism, which are likely to be niche-specific [1,2]. However, metagenomic analyses of the integron cassette gene pool in several studies revealed that the vast majority of GCs were novel [15][16][17].
Because less than 1% of the bacterial population is culturable in many environments [18], one approach to investigating the GCs of an entire bacterial community is PCR-based amplification of GCs using metagenomic DNA as a template [19]. Several studies have used this approach to examine the diversity of GCs in different environments, such as soil, seawater, marine sediment and deep-sea vents [16,17,19,20].
The human oral cavity is one of the most complex microbial ecosystems in the human body. More than 700 bacterial species have been detected in the oral cavity [21,22]. Many ARGs have been detected and discovered in the oral cavity, including the tetracycline resistance genes tet(Q), tet(W), tet(M), tet(37) and tet(32); the erythromycin resistance genes ermB and mef; and the kanamycin resistance gene aphA-3 [23][24][25]. Recent genetic analysis of the oral metagenome showed that 2.8% of the predicted genes had the potential to encode proteins with antibiotic and toxin resistance [26]. However, very few studies investigating integrons in the human oral cavity have been performed. There are two major reports on integrons in the human oral cavity: one describes an unusual or reverse integron (an integron with the integrase gene oriented in the same direction as the gene cassette array) in Treponema denticola ATCC35405, identified by whole genome sequencing analysis, and the other is an in silico analysis of an integron associated with Treponema species using metagenomic datasets of the Human Microbiome Project [14,27]. The presence of integrons in other oral bacterial species remains to be determined.
Despite the oral microbiota being recognised as a potential source of ARGs and the oral environment providing conducive conditions for the transfer of ARGs between a range of species [25], no in depth studies have been carried out to detect integrons and GCs within the oral microbiota. In this study, we have investigated the presence of integrons and associated GCs in the human oral metagenomic DNA from two countries, the UK and Bangladesh using a PCR approach. Different sets of primers targeting different regions of integrons were used for PCR amplification, in which multiple GCs were identified and predicted to encode various proteins including some likely to confer antibiotic resistance.

Saliva sample collection and ethical approval
Saliva samples were collected from 11 and 10 healthy volunteers (both male and female, aged between 21 and 65) from the UK and Bangladesh, respectively. The UK samples were collected from staff and international postgraduate students of the UCL Eastman Dental Institute, who represent various ethnic and cultural backgrounds including Asian, Australian, European, African and Middle-Eastern, and some of whom had moved to the UK in the past few months. Therefore, the UK samples represent an international metagenome. The Bangladeshi samples were collected from staff, undergraduate and postgraduate students of the Department of Pharmacy of Rajshahi University, all of whom were Bangladeshi. All of the volunteers read the study information and gave written consent before sample collection. None of the volunteers had received antibiotic treatment for 3 months before the sample collection day. Ethical approvals for the analysis of pooled saliva as part of this project were obtained from the University College London (UCL) Ethics Committee (project number 5017/001) and the Institutional Animal, Medical Ethics, Biosafety and Biosecurity Committee (IAMEBBC) for Experimentations on Animal, Human, Microbes and Living Natural Sources, University of Rajshahi (project number 54/320/IAMEBBC/IBSC). Both ethics committees approved the consent procedures for sample collection and processing. For the UK samples, 2 mL of saliva were collected in a sterile plastic tube and processed immediately. The samples from Bangladesh were collected using Norgen's Saliva DNA Collection and Preservation Device (Norgen, Canada), following the manufacturer's guidelines, and transported to the UK for analysis. All samples were anonymised.

Extraction of oral metagenomic DNA
The freshly collected UK saliva samples were pooled together into a sterile plastic tube in a class I microbiological safety cabinet. The pooled saliva sample was then divided into 1.5 mL aliquots and centrifuged at 20238 g for 1 min. The UK oral metagenomic DNA was then extracted using the Puregene DNA extraction kit (Qiagen, UK), following the Gram-positive bacteria and yeasts protocol with one modification to the final step: the DNA pellets were dissolved in 400 μL of molecular grade water at room temperature, instead of 100 μL.
The Bangladeshi oral metagenomic DNA was extracted from Norgen's Saliva DNA storage buffer by ethanol precipitation according to the manufacturer's protocol. The preservative buffer of the Norgen devices is designed for rapid cellular lysis and subsequent preservation of DNA from fresh saliva samples. Prior to DNA isolation, the storage devices were incubated for 1 h at 50°C and mixed by inversion and gentle shaking for 10 seconds. DNA was then extracted from 500 μL of pooled saliva in preservative buffer, prepared by taking a 50 μL aliquot from each of the 10 saliva samples.

PCR amplification
The primers and their sequences are listed in S1 Table, and the target sites for the primers are shown in Fig 1. A typical PCR was prepared as a 30 μL reaction containing 15 μL of 2x BioMix Red (Bioline, UK), each primer at a final concentration of 0.2 μM (from 10 μM stocks), 50-100 ng of DNA template, and molecular grade water (Sigma, UK) up to 30 μL. The standard PCR programme consisted of (i) initial denaturation at 94°C for 5 minutes; (ii) denaturation at 94°C for 1 minute; (iii) annealing at 50-65°C, depending on the primers, for 30 seconds; (iv) elongation at 72°C for 30 seconds to 3 minutes, depending on the size of the expected products; steps (ii)-(iv) were repeated for 35 cycles, followed by (v) a final elongation at 72°C for 10 min.
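The reaction set-up described above can be checked with a short calculation. The sketch below assumes a 30 μL final volume (implied by 15 μL of 2x master mix and water added "up to 30 μL") and uses C1V1 = C2V2 for the primers; the template volume and the function names are hypothetical, and ramp times between steps are ignored.

```python
# Minimal sketch of the PCR set-up arithmetic; names and the template
# volume are illustrative assumptions, not taken from the protocol.

REACTION_UL = 30.0   # assumed final reaction volume
MIX_UL = 15.0        # 2x master mix, giving 1x in 30 uL
N_PRIMERS = 2        # forward and reverse


def stock_volume_ul(final_conc_um, stock_conc_um, reaction_volume_ul):
    """Volume of stock needed for a given final concentration (C1*V1 = C2*V2)."""
    return final_conc_um * reaction_volume_ul / stock_conc_um


def water_volume_ul(template_ul, primer_ul):
    """Water needed to bring the reaction up to the final volume."""
    return REACTION_UL - MIX_UL - N_PRIMERS * primer_ul - template_ul


def cycling_time_min(extension_s, anneal_s=30, denature_s=60,
                     cycles=35, initial_min=5, final_min=10):
    """Total hold time of the programme in minutes (ramp times ignored)."""
    per_cycle_s = denature_s + anneal_s + extension_s
    return initial_min + cycles * per_cycle_s / 60 + final_min


primer_ul = stock_volume_ul(0.2, 10.0, REACTION_UL)   # 0.2 uM from a 10 uM stock
water_ul = water_volume_ul(template_ul=2.0, primer_ul=primer_ul)

print(f"primer volume (each): {primer_ul:.2f} uL")
print(f"water volume: {water_ul:.2f} uL")
print(f"programme length: {cycling_time_min(30):.1f} to {cycling_time_min(180):.1f} min")
```

Under these assumptions each primer contributes 0.6 μL, and the programme runs for roughly 85 to 172.5 minutes of hold time depending on the elongation step.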

PCR purification and gel extraction
The PCR products were subjected to electrophoresis on 1% agarose gel stained with 1:10,000 dilution of GelRed nucleic acid stain (Biotium, UK). The products were then purified by using either QIAquick PCR Purification Kit (Qiagen, UK) or QIAquick Gel Extraction Kit (Qiagen, UK), depending on the amplification results and the target amplicons, according to the manufacturer's instructions.

Ligation and transformation
Purified PCR products were ligated into pGEM-T Easy vector (Promega, UK). The ligation mixtures were transformed into Escherichia coli α-Select Silver Efficiency competent cells (Bioline, UK) by heat shock at 42°C for 40 s, and plated on Luria-Bertani (LB) agar with ampicillin (100 μg/mL) as a selective marker for the plasmids and 40 μg/ml X-Gal plus 0.4 mM IPTG for the blue-white colony screening.

Plasmid isolation and sequencing
White colonies were subcultured in 5 mL of LB broth with ampicillin (100 μg/mL) and incubated overnight. Plasmids were isolated using the QIAprep Spin Miniprep Kit (Qiagen, UK) following the manufacturer's instructions. The presence of the insert in a plasmid was verified by a 10 μL DNA digestion reaction containing 0.5 μL of EcoRI restriction enzyme (20 units/μL, New England Biolabs, UK), 1 μL of 10x EcoRI buffer, 100-500 ng of DNA and molecular grade water (Sigma, UK) up to 10 μL. The reactions were incubated at 37°C for 1 hour and electrophoresed on a 1% agarose gel.
DNA sequences were aligned and manipulated using BioEdit software version 7.2.0 (http://www.mbio.ncsu.edu/bioedit/bioedit.html). For inserts which required sequencing with more than one primer, the sequences were assembled using the CAP contig function in the BioEdit program [28]. The sequences were screened for vector contamination using the VecScreen analysis tool (http://www.ncbi.nlm.nih.gov/tools/vecscreen). The primer binding sites were then identified by searching the sequences by eye. The sequences and their translations were compared against databases using National Center for Biotechnology Information (NCBI) tools, including BlastN and BlastX [29], ORF finder and Clustal Omega.
A sequence obtained using the attC-based primers was considered a putative GC if (i) it contained both primer sequences (designed from conserved nucleotides of attC), (ii) the sites included an integrase-like simple site at each end [10], and (iii) the primer sites flanked a putative ORF beginning with ATG, TTG or GTG [17]. Sequences which did not contain an ORF, but contained the attC site, were considered empty GCs. The putative translated sequences were subjected to BlastX searches and matches were considered significant if the e-value was <0.001.
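Criteria (i) and (iii) and the e-value cut-off lend themselves to a simple programmatic screen. The following is a minimal sketch, not the procedure used in the study (primer sites were located by eye and ORFs with NCBI ORF finder): the primer sequences passed in, the 90-nt minimum ORF length and all function names are hypothetical, and criterion (ii), the simple-site check, is not implemented.

```python
# Sketch of a putative gene cassette screen. Assumptions: exact primer
# matches, a hypothetical minimum ORF length, and no simple-site check.

START_CODONS = {"ATG", "TTG", "GTG"}   # start codons accepted in criterion (iii)
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=90):
    """Return (start, end, strand) for ORFs in all six frames.

    Coordinates are relative to the scanned strand and include the stop codon.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_len:
                        orfs.append((start, i + 3, strand))
                    start = None
    return orfs


def classify_cassette(seq, fwd_primer, rev_primer, min_len=90):
    """Label a sequence 'putative GC', 'empty GC' or 'not a cassette'."""
    seq = seq.upper()
    fwd = fwd_primer.upper()
    rev_site = reverse_complement(rev_primer.upper())
    fwd_at = seq.find(fwd)
    if fwd_at < 0:
        return "not a cassette"
    inner_start = fwd_at + len(fwd)
    rev_at = seq.find(rev_site, inner_start)
    if rev_at < 0:
        return "not a cassette"
    inner = seq[inner_start:rev_at]          # region flanked by the primer sites
    return "putative GC" if find_orfs(inner, min_len) else "empty GC"


def is_significant(evalue, threshold=1e-3):
    """BlastX significance rule: e-value below 0.001."""
    return evalue < threshold
```

A real screen would also need to tolerate primer mismatches and confirm the simple sites at each end, which is why manual inspection remains a sensible final step.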

Nomenclature and accession number of the gene cassettes
The gene cassettes (GCs) were named according to the source and the primers used. The first two letters indicate the forward and reverse primers used for amplification. The third letter indicates the source of the oral metagenomic DNA from which the GC was amplified (U for UK; B for Bangladesh), and is followed by the clone number. For example, TMB1 is the first GC obtained from Bangladeshi samples using the primers TDIF and MARS2.
The sequences of integron regions, which contained intI, Pc, attI and gene cassettes, were deposited in the GenBank DNA database under accession numbers KT921469 to KT921473. Accession numbers KT921474 to KT921509 and KT921510 to KT921531 represent gene cassette sequences generated by the T. denticola primers from UK and Bangladeshi samples, respectively.
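The naming scheme is regular enough to parse mechanically. The sketch below is illustrative only: it splits a clone name into its primer codes, source and clone number, and expands the slash-separated shorthand used when several clones share a prefix (for example TMB1/8/12/15). The function names are hypothetical, and the single-letter primer codes are returned as-is because the full letter-to-primer mapping is given in S1 Table rather than here.

```python
import re

SOURCES = {"U": "UK", "B": "Bangladesh"}

# Two primer letters, one source letter, then the clone number.
_NAME = re.compile(r"^([A-Z])([A-Z])([UB])(\d+)$")
_SHORTHAND = re.compile(r"^([A-Z]{2}[UB])(\d+(?:/\d+)*)$")


def parse_clone_name(name):
    """Split a clone name such as 'TMB1' into its components."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"not a valid clone name: {name!r}")
    fwd, rev, source, number = match.groups()
    return {
        "forward_code": fwd,
        "reverse_code": rev,
        "source": SOURCES[source],
        "clone": int(number),
    }


def expand_clone_list(shorthand):
    """Expand 'TMB1/8/12/15' into ['TMB1', 'TMB8', 'TMB12', 'TMB15']."""
    match = _SHORTHAND.match(shorthand)
    if match is None:
        raise ValueError(f"not a valid clone list: {shorthand!r}")
    prefix, numbers = match.groups()
    return [prefix + n for n in numbers.split("/")]


print(parse_clone_name("TMB1"))
print(expand_clone_list("TMB1/8/12/15"))
```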

Recovery and characterization of PCR products containing intI and the first gene cassette
Initially, we used previously published primers that had been used to successfully amplify gene cassettes from a range of environments (Fig 1, S1 and S2 Tables). Unexpectedly, none of these primers produced amplicons with the structural features of a gene cassette [17] when oral metagenomic DNA isolated from the UK and Bangladesh was used as a template (see Materials and Methods).
As Treponema denticola integrons are the only ones that have been described in the oral microbiota [27], new primers were designed based on this integron. PCRs were performed using the intI-based primer TDIF (designed from the conserved amino acid sequence SSQNQAL of IntI of the Treponema denticola integron) coupled with the attC-based primer MARS2. The resulting amplicons were cloned into the pGEM-T Easy vector; a total of 17 clones were randomly selected from both cohorts and the inserts within the plasmids were sequenced. All of these contained the basic features of an integron. Within the amplicons, a major part of intI (768 bp), the full-length attI site and a putative integron promoter, Pc, were detected. A total of 5 different amplicons containing 5 different GCs, including one empty GC with no identifiable ORF, were found (Fig 2). The putative ORFs detected on the GCs ranged in size from 258 to 777 bp (Table 1).
Among the 17 clones sequenced from both cohorts, 8 clones (TMB3/5/6/10/11/13/14/16) had a GC with an ORF (768 bp) predicted to encode a protein homologous to a cof-like hydrolase of Treponema putidum. Two of the first gene cassettes, with ORFs of 258 bp and 387 bp, present on clones TMB1/8/12/15 and TMU18, respectively, had no nucleotide sequence similarity to anything in GenBank. However, at the amino acid level the 387-bp ORF on TMU18 showed 100% identity with a hypothetical protein of T. denticola. Another GC, detected on clones TMU3/4/11 and carrying an ORF of 777 bp, was found to encode a hypothetical protein of Treponema denticola (Table 1). Finally, an empty first GC was found on clone TMB4. All but one of the ORFs detected on the first GCs had putative ribosomal binding sites (RBS) less than 8 bp upstream of the ORF. In all first gene cassettes, two putative integrase binding sites (L and R; also termed S2 and S1, respectively) were detected on the attI sites, and the integrase binding sites S1 (R) were found to contain a plausible attI-attC junction (GTT). The 7-bp core site Rʹʹ (1L) of attC was also detected upstream of the reverse MARS2 primer, with the consensus sequence RYY(/R)YAAC (S3 Table). In most cases, the stop codons of the ORFs were located at these Rʹʹ integrase binding sites of attC [8,30].

Gene cassettes amplified using attC-based primers
A library of PCR amplicons obtained using a different set of attC-based primers was constructed in the pGEM-T Easy vector and the inserts from 100 clones were sequenced (Table 2, Fig 1, S1 and S2 Tables). By analysing the sequences with different bioinformatics tools, we detected a total of 58 unique GCs that had the features of an integron GC and were flanked by the primer binding sites. The size of the cassettes ranged from 425 to 1144 bp. Of the 58 GCs, 12 had no identifiable ORFs and the remaining 46 GCs contained one or more putative ORFs, giving a total of 72 different ORFs with sizes ranging from 117 to 894 bp.
As the forward and reverse primers were designed based on the consensus Lʹ (2R) and Lʹʹ (2L) core sites, respectively, we were able to locate the Rʹ (1R) core sites in all GCs, with a consensus GTTRR(Y)R(Y)Y(R), after the forward primer sequence. The complementary Rʹʹ (1L) core sites, with a consensus R(Y)Y(R)Y(R)YAAC, were also detected upstream of, or as a part of, the reverse attC primers. This indicates that the putative GCs are not PCR artefacts and is consistent with the attC structure of a GC [31]. The majority of the Rʹ and Rʹʹ core sites (51 out of 58 GCs) exhibited 100% complementarity with each other. In the remaining seven, 6 out of 7 bp were complementary (S4 Table). By analysing the arrangement of genetic features within the GCs, we found that they were arranged in seven different ways (Fig 2), as defined by the direction, position and number of ORFs within the GCs. The type C arrangement accounted for the largest group and was found in 24 cassettes. The sequences of the clones containing two or more ORFs were examined for the presence of other putative attC sequences between the ORFs, but none were found. These observations show that the attC-based primers based on the T. denticola integron are able to amplify GCs from oral metagenomic DNA. Of the 72 putative ORFs found in all GCs, 63 had ribosomal binding sites located upstream of the predicted start codons. As in previous studies, the GCs other than the toxin-antitoxin encoding GCs did not contain an identifiable promoter, and are thus likely to be dependent on the Pc of the cassette array for expression [19].
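The complementarity reported for the Rʹ and Rʹʹ core sites can be scored by pairing each base of Rʹ with the base at the mirrored position of Rʹʹ. The sketch below illustrates that count under simple assumptions; the function name and the two example 7-bp sites are hypothetical and are not sequences from S4 Table.

```python
# Watson-Crick pairs used to score complementarity between core sites.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def complementary_positions(r_prime, r_double_prime):
    """Count positions at which R' pairs with the reversed R'' core site.

    Position i of R' is compared with position (n - 1 - i) of R'', which is
    how two inverted repeats on the same strand face each other in a hairpin.
    """
    if len(r_prime) != len(r_double_prime):
        raise ValueError("core sites must have the same length")
    a = r_prime.upper()
    b = r_double_prime.upper()
    return sum((x, y) in PAIRS for x, y in zip(a, reversed(b)))


# Hypothetical sites matching the general consensus shapes GTT.... and ....AAC
perfect = complementary_positions("GTTAGGC", "GCCTAAC")
one_mismatch = complementary_positions("GTTAGGC", "GCTTAAC")
print(perfect, one_mismatch)
```

With these made-up sites the first pair is fully complementary (7 of 7) and the second has a single mismatch (6 of 7), mirroring the two categories described in the text.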

Diversity of the functions of putative proteins encoded by ORFs within the GCs detected by attC primers
Out of 72 putative ORFs detected on 58 different GCs amplified using attC primers, 66 (91.67%) of the predicted proteins had a homologue in GenBank. However, only 24 of the 66 ORFs (36.36%) were found to encode proteins with known function, and the remaining 42 matched hypothetical proteins. With regard to sequence similarity of the ORFs with those in GenBank, we found that 45 of the 66 ORFs (68.2%) exhibited >90% amino acid identity. Ten putative ORFs were predicted to encode completely novel proteins (no match with an e-value <0.001).
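The proportions in this paragraph follow directly from the reported counts. As a quick check, the sketch below recomputes them; the variable names are illustrative, and values rounded to two decimal places may differ in the last digit from figures quoted with fewer decimals.

```python
def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimal places."""
    return round(100 * part / whole, 2)


total_orfs = 72        # putative ORFs on the 58 GCs
with_homologue = 66    # predicted proteins with a homologue in GenBank
known_function = 24    # homologues with a known function
high_identity = 45     # homologues with >90% amino acid identity

hypothetical = with_homologue - known_function   # matched hypothetical proteins

print(pct(with_homologue, total_orfs))     # share of ORFs with a homologue
print(pct(known_function, with_homologue)) # share with known function
print(pct(high_identity, with_homologue))  # share with >90% identity
print(hypothetical)
```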
The putative ORFs detected on the gene cassettes were predicted to encode proteins of diverse functions including antibiotic resistance, host adaptation to stress and competence (Table 2). Four different putative antibiotic resistance genes were found among the cassette ORFs. BlastX searches showed that the clone MMB22 contained an ORF that encoded a protein with 99% identity to streptogramin A O-acetyltransferase from T. denticola. The single ORF (390 bp) present in the UK clones SSU3, SSU4 and SSU30 was predicted to encode a glyoxalase/bleomycin antibiotic binding protein. Two ORFs were detected in the clone SSU28, encoding a potassium ABC transporter ATPase and the multidrug transporter MatE. Proteins related to adaptation to stress include different toxin-antitoxin systems and a twitching motility protein. The clones containing ORFs encoding toxin-antitoxin systems include SSU27, MMB23 and MMB38, which encoded HicA (toxin)-HicB (antitoxin), peptidase (antitoxin)-PemK (toxin) and HigA (antitoxin)-HigB (toxin), respectively.
Most of the proteins encoded by the ORFs on GCs showed similarity to proteins in the database from Treponema spp. (60 out of 66), most often from T. denticola (24 out of 60), followed by T. putidum, T. medium, T. vincentii, T. pedis, T. phagedenis, and T. socranskii. This observation supports the previous reports that T. denticola, T. vincentii and T. phagedenis carry chromosomal integrons [14,32]. However, we have also identified 27 ORFs related to other Treponema spp.: T. putidum, T. medium, T. pedis and T. socranskii. Only six ORFs out of 66 were predicted to encode proteins related to those of non-treponemes, including Paenibacillus sp., Clostridium sp. and Mariprofundus sp.; however, the identities of the ORFs with sequences from these species were low (<70%) at the amino acid level.

Discussion
PCR strategies to recover novel integron cassettes from metagenomic DNA using primers targeting the conserved sequences of IntI and attC have been successful in previous studies [15-17, 19, 33]. However, all of these metagenomic studies were carried out on non-human environmental samples. Most of the metagenomic studies involving human microbiota were either sequence-based [34], focusing on the recovery of all genetic features, or focused on a function of interest such as antibiotic resistance [35]. No studies have been reported so far that use a PCR approach to detect integrons in metagenomes obtained from human saliva. We detected mostly Treponema integrons and GCs in metagenomic DNA from human saliva in both the Bangladeshi and UK samples, suggesting that this methodology is broadly applicable to oral metagenomic samples.
This study provides an analysis of the diversity of integron GCs amplifiable from saliva metagenomic DNA. Using novel primer combinations based on the structural features of the reverse integron of T. denticola ATCC 35405 [27], we have uncovered a diverse array of gene cassettes, including those in the first position, most of which are novel. Although the chromosomal integron of T. denticola ATCC 35405 is the only integron described from oral bacteria (it has 45 gene cassettes in the array), in silico analysis of metagenomic data sets from the Human Microbiome Project (HMP) showed that two other Treponema species, T. vincentii ATCC 35580 and T. phagedenis F0421, also carry integron GCs [14,27,32]. However, the PCR strategies used in this study recovered novel GCs that were predicted to encode proteins related to those from genera other than Treponema.
Analysis of the proteins encoded by the GCs amplified from the oral cavity revealed several interesting ORFs. GC SSU3 was predicted to encode a protein with 97% amino acid identity to the glyoxalase of Treponema pedis (WP_009105863.1, 100% coverage). It contains the Glo-EDI-BRP-like domain, which can be found in metalloproteins including glyoxalase I, type I extradiol dioxygenases and bleomycin sequester proteins. Bleomycin is a glycopeptide antibiotic which inhibits peptidoglycan synthesis in bacteria; it is also used as an antitumor drug, which binds to DNA and generates free radicals that cause both double-strand and single-strand DNA breaks [36,37]. Another ORF, found on GC MMB22 detected in the Bangladeshi sample, was predicted to encode a streptogramin A O-acetyltransferase, which had 77.0% nucleotide identity with Clostridium sp. BLN1100 and 99.0% amino acid identity with the streptogramin A O-acetyltransferase from T. denticola. Streptogramin A O-acetyltransferases mediate resistance to the streptogramin A-B combination by adding an acetyl group to streptogramin, which inactivates the drug [38].
Finally, a cof-like hydrolase gene (a member of haloacid dehalogenase superfamily) was predicted to be within a GC amplified using both GC primers and first gene cassette primers (GC SSU26 and GC TMB3). Cof-like hydrolases are a group of enzymes that inactivate halogenated aliphatic hydrocarbons by hydrolysing the carbon-halogen bonds. They are essential for detoxification of many chlorinated compounds [39,40]. Therefore, a cof-like hydrolase in the oral cavity could play a role in detoxifying or inactivating antimicrobials or other compounds with carbon-halogen bonds that are used as antibiotics, pesticides and food preservatives such as chloramphenicol, atrazine and brominated vegetable oil, respectively.
Another function of predicted GC ORFs was related to the adaptation of bacteria to environmental stress. For example, the twitching motility protein PilT was predicted to be encoded by the ORFs in the GCs of clones SSU5, MMB3 and MMB9. It has been shown to be involved in type IV fimbria-mediated twitching motility and protease secretion [41]. Twitching motility was also shown to play a key role in biofilm development by Pseudomonas aeruginosa [42]. As many oral bacteria can form biofilms on surfaces in the human oral cavity, having a PilT-encoding GC could help them to develop biofilms and survive environmental stress.
As in previous metagenomic studies to detect integron GCs [16,17,43], ORFs predicted to encode proteins with regulatory functions, such as toxin-antitoxin (TA) systems, have been detected. Four different TA operons, including HicAB, HigBA, RelBE and MazF, were detected on GCs in our study. TA cassettes are usually abundant in chromosomal integrons and are thought to have a role in the stability of the integron GC arrays [27,44]. All of the detected TA cassettes are members of the type II toxin-antitoxin systems [45]. The toxins (HicA, HigB, RelE and MazF) work by cleaving mRNA, thereby inhibiting translation and exhibiting bacteriostatic activity, and the antitoxins (HicB, HigA, RelB and MazE) can inhibit the action of the toxin by protein-protein complex formation [46][47][48][49]. Among the four detected TA operons, only the HicAB TA system was previously found on the T. denticola integron. The HicA and HicB sequences found on the SSU27 cassette exhibited 97% and 99% nucleotide identity, respectively, to those of the fourth gene cassette of the T. denticola integron, which contains the HicA (TDE1838) and HicB (TDE1837) genes [27]. We have detected two HigBA TA systems in our GCs (MMU24 and MMB38), and this system has also been detected on the Vibrio cholerae super integron.
Several recovered GCs did not contain ORFs. Such ORF-less GCs were found both in the first GC position and in other GC positions in the integron (clones TMB4, SSU29 and MMU2). Noncoding cassettes have previously been found in other cassette arrays, comprising, for example, between 4 and 49% of Vibrio spp. cassette arrays [50]. They have been hypothesised to contain promoters or encode regulatory RNAs [2]. It was previously shown that a Xanthomonas campestris integron GC encoded a trans-acting small RNA which was capable of regulating virulence in Xanthomonas [51].
This survey on the presence of integrons and associated GCs in salivary metagenomic DNA has resulted in new information regarding the putative functions and diversity of GCs which likely reflects the highly variable physicochemical and stressful environment of the human oral cavity.
Supporting Information S1