A Method for Selectively Enriching Microbial DNA from Contaminating Vertebrate Host DNA

DNA samples derived from vertebrate skin, bodily cavities and body fluids contain both host and microbial DNA; the latter is often present as a minor component. Consequently, DNA sequencing of a microbiome sample frequently yields reads originating from the microbe(s) of interest, but with a vast excess of host genome-derived reads. In this study, we used a methyl-CpG binding domain (MBD) to separate methylated host DNA from microbial DNA based on differences in CpG methylation density. MBD fused to the Fc region of a human antibody (MBD-Fc) binds strongly to protein A paramagnetic beads, forming an effective one-step enrichment complex that was used to remove human or fish host DNA from bacterial and protistan DNA for subsequent sequencing and analysis. We report enrichment of DNA samples from human saliva, human blood, a mock malaria-infected blood sample and a black molly fish. When reads were mapped to reference genomes, sequence reads aligning to host genomes decreased 50-fold, while bacterial and Plasmodium DNA sequence reads increased 8–11.5-fold. The Shannon-Wiener diversity index was calculated for 149 bacterial species in saliva before and after enrichment. Unenriched saliva had an index of 4.72, while the enriched sample had an index of 4.80. The similarity of these indices demonstrates that bacterial species diversity and relative phylotype abundance remain conserved in enriched samples. Enrichment using the MBD-Fc method holds promise for targeted microbiome sequence analysis across a broad range of sample types.


Introduction
From birth, humans participate in an intimate life-long relationship with their microbiome. Indeed, the number of microorganisms living in a human body is about 10-fold greater than the number of human cells [1]. Human-associated microbial communities affect diverse processes including digestion, immune system maturation, polysaccharide production, toxin degradation and pathogen defense [2]. Not all results of human-microbial interactions are positive; microbiota have been implicated as contributors to metabolic diseases through the modulation of host metabolism and inflammation. For example, bacteria have been implicated as a causative agent of atherosclerosis, which is associated with lipid accumulation and inflammation in the arterial wall [3]. Similarly, bacterial species are responsible for the two most common oral diseases in humans: dental caries (tooth decay) and periodontal (gum) disease [4]. Therefore, identification and characterization of complex microbial communities associated with humans is of increasing interest to the research community, medicine and public health.
It is now possible to study human-microbe relationships via DNA sequencing and analysis to establish identities, abundances, and functional characteristics of microbial community members [5,6], but these studies are hindered by the complex nature of typical samples. Libraries prepared from many biological samples represent DNA from a mixture of bacteria, fungi (mainly yeasts), viruses, protists and an overwhelming amount of host genomic DNA. Nucleic acid-based techniques such as polymerase chain reaction (PCR), quantitative PCR (qPCR) and massively parallel sequencing offer rapid and highly sensitive options for detecting microbial species in collected specimens. As many microorganisms are difficult to grow or are unculturable [7], nucleic acid techniques offer significant advantages in breadth and depth of coverage. Currently, 16S ribosomal RNA gene-based sequencing can detect both abundant and rare members of a microbial community [8]; however, 16S rRNA gene approaches are not fully adequate for epidemiological studies or virulence factor identification. To circumvent the limitations of gene-based amplicon (e.g. 16S rRNA gene) sequencing, whole genome shotgun sequencing (WGS) has emerged as an alternative strategy for assessing microbial diversity [9]. One limitation of species identification by this method is the presence of large amounts of host genomic DNA in addition to microbial DNA. Metagenomics of clinical samples by direct sequencing or PCR can be inefficient and time consuming since most reads are derived from the host. Of particular relevance are clinical samples containing the malaria parasite, Plasmodium falciparum, which often contain greater than 90% human DNA.
In this report we describe a method for the separation of large pieces of DNA containing methyl-CpG from a complex mixture of human and bacterial DNA using the constant region of human IgG genetically fused to a human methyl-CpG binding domain (MBD-Fc) known to interact specifically with methyl-CpG elements [10,11]. We demonstrate effective separation of vertebrate DNA from microbial DNA on the basis of differences in abundance of CpG methylation. We have analyzed the microbiomes of fish, human saliva, and human blood, as well as synthetic mixtures containing Escherichia coli and Plasmodium falciparum DNA with this method, and show evidence of relatively unbiased enrichment in all tested samples.

MBD-Fc Fusion Selectively Depletes Human DNA from Mixed DNA Samples
Our MBD-Fc approach uses the following strategy: The methyl-CpG binding domain of human MBD2 protein was genetically fused to the Fc tail of human IgG1 (MBD-Fc). A truncated form of recombinant Protein A was covalently coupled to a paramagnetic bead and was used to bind the MBD-Fc protein. This complex selectively binds double-stranded DNA containing 5-methyl CpG dinucleotides [12].
To demonstrate specific interaction between the MBD-Fc fusion and methylated DNA, we prepared defined mixtures of ³H-labeled E. coli K12 MG1655 DNA with mammalian genomic DNA from IMR-90, HeLa, and mouse NIH 3T3 cell lines. These cell lines were chosen because their genomic DNAs exhibit varying levels of CpG methylation density, with IMR-90 being most dense, followed by NIH 3T3, and HeLa DNA being least dense [13]. Each mixture contained 10% ³H-labeled E. coli DNA and 90% mammalian DNA (weight:weight). We first prebound the MBD-Fc protein with paramagnetic Protein A beads, then incubated increasing amounts of MBD-Fc bound Protein A with 500 ng of input DNA, and separated bead-bound from unbound DNA fractions using a magnetic field as outlined in Figure 1. Subsequently, we analyzed the amount of E. coli DNA in bound and unbound fractions with a scintillation counter, and mammalian DNA by gel densitometry measurements. We observed that mammalian DNA was efficiently depleted from the supernatant fraction after MBD-Fc enrichment. For the enrichment experiment in which we used 40 µL MBD protein, 1-4% of mammalian DNA remained in the supernatant for the IMR-90+E. coli, NIH 3T3+E. coli, and HeLa+E. coli mixed samples. Conversely, 84-100% of the ³H-labeled E. coli DNA (lacking CpG methylation) remained in the supernatant fraction (Figure 2).
We further characterized and optimized the separation and enrichment of microbial DNA from human DNA using defined mixtures of human (IMR-90) and E. coli DNA with the MBD-Fc enrichment (Figure 1). 1-2 million reads from each sample were acquired on an Ion Torrent® PGM sequencer and aligned to a synthetic reference genome containing both the hg19 human reference sequence [14] and the E. coli MG1655 reference sequence [15] using Bowtie 2.0.4 [16]. We calculated the percentage of reads mapping to the E. coli genome vs. human chromosomes from an unenriched control sample, the enriched supernatant fraction and DNA eluted from the fraction bound to the magnetic beads. The starting unenriched DNA mixtures varied from 2.5-10% E. coli DNA. After the enrichment, 65-85% of the reads mapped to the E. coli reference genome with only a small percentage (15-35%) of reads mapping to the human reference (Figure 3). We also detected increased numbers of sequences mapping to the human mitochondrial genome in the supernatant fraction. While mitochondrial reads in the unenriched sample made up only 0.3% of human reads, mapped mitochondrial reads in the enriched sample made up 40% of human reads (Figure S1). As a further control, we eluted and sequenced DNA that remained bound to the beads after enrichment. Analysis of reads from these samples showed 97-99% aligned to human chromosomes, and 1-3% aligned to the E. coli genome. The small number of E. coli reads from the bead-bound samples was evenly distributed across the genome (data not shown).
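The read bookkeeping described above can be illustrated with a minimal sketch that tallies mapped reads by the reference sequence they align to. It is not the authors' pipeline. It assumes alignments are available as SAM text and, hypothetically, that host sequences in the combined reference are named with a `chr` prefix while any other reference name is microbial. The function name `count_mapped_reads` and the toy records are illustrative only.

```python
from collections import Counter


def count_mapped_reads(sam_lines, host_prefix="chr"):
    """Tally mapped reads as 'host' or 'microbe' by reference name.

    Assumption (not from the paper): host contigs in the combined
    reference start with `host_prefix`; everything else is microbial.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & 0x4 or rname == "*":  # read is unmapped
            continue
        if flag & 0x900:  # skip secondary (0x100) and supplementary (0x800) records
            continue
        label = "host" if rname.startswith(host_prefix) else "microbe"
        counts[label] += 1
    return counts


# Toy SAM records (hypothetical read and reference names).
toy_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tecoli_MG1655\t200\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r4\t256\tchr2\t300\t0\t4M\t*\t0\t0\tACGT\tIIII",
]
print(count_mapped_reads(toy_sam))
```

Under this naming assumption, reads aligned to the mitochondrial sequence (commonly named `chrM`) are counted with the host, which matches the way mitochondrial reads are reported above as a fraction of human reads.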
To determine the CpG methylation density required for effective binding of DNA by the MBD-Fc-protein A complex, T7 (39.9 kb) and lambda phage (48.5 kb) DNAs were cleaved with BstEII and XbaI restriction enzymes, respectively. BstEII cuts T7 once, forming two ~20 kb fragments; XbaI cuts lambda once, forming two ~24 kb fragments. The T7 and lambda DNA fragments were then methylated with M.HhaI (GmCGC), M.HpaII (CmCGG) or both methyltransferases. The resulting pool of six DNA fragments had methylation densities ranging from 1.5 to 14 methyl-CpG residues per kilobase (Figure 4A). From each pool, 250 ng of DNA was added to 40 µL of MBD-Fc-protein A beads, and the supernatant containing the unbound DNA was quantified using polyacrylamide gel electrophoresis and gel densitometry. Plotting the percentage of bound DNA vs. the number of methyl-CpG sites in these fragments reveals a threshold for efficient binding between 2 and 3 methyl-CpG per kilobase (Figure 4B). Lister et al. [17] report between 45 and 62 million methyl-CpG sites in IMR90 and H1 cell lines. Assuming that these cell lines reflect typical human methylation density and assuming even distribution, we can estimate ~15-20 methyl-CpGs per kilobase of human DNA. Bacterial genomes generally do not contain sufficient CpG methylation density to efficiently bind MBD-Fc [18,19].
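The per-kilobase estimate above follows from simple arithmetic, which the sketch below reproduces. The genome length is an assumption, since the text does not state the value used: a haploid human genome of roughly 3.1 billion base pairs is taken here. The helper name `methyl_cpg_per_kb` is illustrative.

```python
# Assumption: haploid human genome length of ~3.1e9 bp (not given in the text).
HUMAN_GENOME_BP = 3.1e9

# Upper edge of the binding threshold reported above (methyl-CpG per kb).
BINDING_THRESHOLD_PER_KB = 3.0


def methyl_cpg_per_kb(methylated_sites, genome_bp=HUMAN_GENOME_BP):
    """Average methyl-CpG sites per kilobase, assuming an even distribution."""
    return methylated_sites / genome_bp * 1000.0


low = methyl_cpg_per_kb(45e6)   # lower site count reported by Lister et al. [17]
high = methyl_cpg_per_kb(62e6)  # upper site count reported by Lister et al. [17]
print(f"{low:.1f} to {high:.1f} methyl-CpG per kb")
print("above binding threshold:", low > BINDING_THRESHOLD_PER_KB)
```

With this assumed genome length the estimate comes out at roughly 14.5 to 20 sites per kilobase, several-fold above the 2 to 3 sites per kilobase needed for efficient binding.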

Human Microbiome Analysis of Enriched Samples from Saliva and Blood DNA
To determine the level of enrichment in biological samples, we analyzed microbiomes from DNA extracted from human blood and saliva samples before and after enrichment with the MBD-Fc protein bead complex. We produced libraries from enriched and unenriched samples, as well as from DNA that remained bound to the bead complex. All libraries were sequenced on the SOLiD® 4 platform, acquiring 174-346 million reads per blood sample and 501-537 million reads per saliva sample. Reads from the enriched library (unbound supernatant fraction) were compared with reads from the unenriched DNA library. Reads were aligned to the hg19 human reference genome [14], the Human Oral Microbiome Database (HOMD) [20], and the PhageSeed database of phage genomes (http://phantome.org). After enrichment, we observed a dramatic increase in reads mapping to the HOMD database (Figure 5A) and the PhageSeed database (Figure 5B). 94-96% of reads aligning to the human reference genome in the unenriched experiment were depleted after enrichment, corresponding to an 8-fold increase in reads mapping to the HOMD database. We also sequenced libraries prepared from the DNA that remained bound to the paramagnetic bead pellet. As expected, the vast majority of mapped reads from the bound fraction (99.3%) aligned to the human reference genome (Figure 5A). Plotting the abundance of known oral microbes observed by analysis with MetaPhlAn 1.7.1 [21] reveals high concordance between enriched and unenriched libraries for the most abundant species (Figure 6). Good concordance was maintained even as the limit of detection for low abundance population members was approached (Figure 6, inset). The predominant genera in saliva-extracted DNA were Haemophilus, Streptococcus, Neisseria and Veillonella (Table S1). In the commercial blood-derived DNA samples, Pseudomonas, Escherichia and Acinetobacter genera were predominant (Table S2).
We also observed several common genera of bacteria in both saliva and blood microbiota including Klebsiella, Haemophilus and Escherichia. An analysis of phage sequences before and after enrichment in the saliva sample showed strong enrichment of many phage found in bacterial genera including Streptococcus, Enterobacteria and Haemophilus (Table S3).

Microbial Species Diversity and Abundance is Similar by MBD-Fc Enrichment
The high-abundance microorganisms from the saliva microbiome were very similar between the unenriched and MBD-Fc enriched samples. For example, the percentages of total sequence reads matching microbial genomes in unenriched vs. enriched samples, respectively, were 33% vs. 34% for Streptococcus, 19% vs. 16% for Neisseria, 12% vs. 12% for Veillonella and 10% vs. 10% for Haemophilus (Table S1). A Shannon-Wiener diversity index was calculated for 147 bacterial species in saliva that were observed in the unenriched (H′ = 4.72) and enriched (H′ = 4.80) datasets, indicating that our enrichment method preserves the diversity of the microbiome sample. We observed one abundant species, Neisseria flavescens, which was anomalous in the saliva enrichment. This organism may exhibit an unusual methylation density [22], allowing it to bind the enriching beads at a low level. Other Neisseria species (N. mucosa, N. sicca and N. elongata) are represented, but did not exhibit this anomalous enrichment. These data demonstrate relatively unbiased enrichment of microbial genomes after MBD-Fc enrichment. A strong correlation between the low-abundance microbial species in the unenriched and enriched samples is further evidence of unbiased enrichment. Additionally, we observed low abundance microbes that would have been difficult to detect without enrichment.
[Figure 4 caption, fragment] ... [17] (red) and total 5-methylcytosine levels found in E. coli (blue) are shaded and indicated with arrows. Since it is not reported to have CpG methylation, E. coli DNA is not expected to bind the beads. Boundaries for the bacterial reference area are derived from chromatography results of nuclease digested DNA [46], since this represents the maximum possible amount of CpG methylation. Replicates of each methylation density are overlaid in this plot. doi:10.1371/journal.pone.0076096.g004
Enrichment of Microbial DNA from Host DNA. PLOS ONE | www.plosone.org
Deinococcus, Treponema and Bulleidia sequence reads cumulatively represented less than 0.003% of the population after enrichment and were not detected by MetaPhlAn analysis in the unenriched samples (Figure 7A vs. 7B). Species reported at extremely low abundance (less than 0.01%) are derived from very few sequence reads and may represent false positives resulting from imperfect sequence data used to construct or query the MetaPhlAn database.
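For readers unfamiliar with the diversity index used in this section, the following minimal sketch computes a Shannon-Wiener index from per-species abundances. The logarithm base used for the reported values is not stated in the text, so the natural logarithm is used here as an assumption and the base is left as a parameter. The function name and the example abundances are hypothetical.

```python
import math


def shannon_index(abundances, base=math.e):
    """Shannon-Wiener index H' = -sum(p_i * log(p_i)).

    `abundances` are non-negative counts or relative abundances, one per
    species; zero entries contribute nothing. The default natural log is
    an assumption, since the text does not state the base used.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total
            h -= p * math.log(p, base)
    return h


# Hypothetical abundance profiles before and after enrichment.
unenriched = [33, 19, 12, 10, 26]
enriched = [34, 16, 12, 10, 28]
print(shannon_index(unenriched), shannon_index(enriched))
```

Because the index depends on both the number of species and the evenness of their proportions, similar values before and after enrichment are consistent with, but do not by themselves prove, preserved relative abundances. The concordance plots address that point directly.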
Enrichment of Plasmodium falciparum DNA from Human-Plasmodium Mixture
MBD-Fc was effective in separating human host contamination from bacterial DNA based on differential CpG methylation density. Therefore, we hypothesized that Plasmodium falciparum DNA could also be enriched from human host genomic DNA. To test this, a mock sample (containing 90% human and 10% P.

Black Molly Fish Microbiome can be Enriched using MBD-Fc Beads
Since most vertebrate genomes contain methyl-CpG dinucleotide residues, we hypothesized that MBD-Fc based beads would also be effective at enriching for microbial DNA in other vertebrate samples. We enriched DNA derived from an entire black molly fish (Poecilia cf. sphenops), prepared libraries, and sequenced them on Illumina GAIIx and MiSeq sequencers. We analyzed the microbiome of the black molly using the MG-RAST [23] server and compared relative abundance of microbes between unenriched and enriched samples and observed an even enrichment, as demonstrated by the concordance plot ( Figure 9). The most abundant genera of bacteria reported were Aeromonas, Pseudomonas, Vibrio, and Shewanella. Reads were also analyzed by MetaPhlAn, which reported the most abundant genera in the same order as MG-RAST (Table S4).
Enrichment of microbial DNA from black molly fish using MBD-Fc beads also gave us sufficient sequence coverage of microbial genomes to build meaningful assemblies and high quality protein annotations. Our 50-bp single end metagenomic reads were assembled using the CLC de novo assembler [24] and yielded fungal, viral and bacterial contigs greater than 4000 bp, often with upwards of 20-fold coverage. Analysis of protein predictions using the MG-RAST server [23] yielded 198 contigs with more than 10-fold coverage matching bacterial derived drug resistance genes. More than 15 drug resistance genes derived from each of Aeromonas hydrophila, A. salmonicida, Vibrio cholerae and E. coli were also detected. These species are all known fish pathogens, and are likely targeted by antibiotic treatment as regular practice in the ornamental fish industry [25][26][27]. The most abundant drug resistance related protein had 49× coverage over a 1,390 bp contig and was annotated as the quinolone resistance protein QnrS2, derived from another fish pathogen, A. caviae. Quinolones are powerful broad-spectrum antibiotics with known usage in aquaculture, and QnrS2 has been previously implicated in antibiotic resistance in ornamental fish [25].

Discussion
We have described a robust, even, and broadly applicable enrichment method taking advantage of the properties of a methyl-binding domain genetically fused to the constant region of human IgG and linked to paramagnetic beads. This protein is known to specifically interact with 5-methyl CpG motifs found in vertebrate DNA but largely absent from microbial and mitochondrial DNA [28,29]. DNA fragments having sufficient CpG methylation (greater than ~3 sites/kilobase) efficiently bind MBD-Fc conjugated magnetic beads and can be quickly separated from less methylated DNA, which remains in the supernatant. We have demonstrated the utility of this method in a series of experiments of varied complexity. We began with experiments on DNA fragments of controlled methylation, providing evidence of the mechanism and capabilities of the MBD-Fc enrichment method. Next, mixtures of ³H-labeled E. coli DNA with mammalian DNA were examined and showed that this technique is effective in a variety of cell types. Sequencing and analysis of experiments on defined mixtures of human and microbe DNA characterized the specificity of this method. Finally, we report experiments on a variety of real samples including human blood, human saliva, and fish, showing unbiased and robust enrichment.
In addition to the method described in this paper, we are aware of two other strategies available for enriching microbial DNA from human samples, both of which have been available commercially. The MolYsis® (Molzym GmbH & Co. KG, Bremen, Germany) strategy takes advantage of differential lysis of human and microbial cells. In short, human cells are lysed under chaotropic conditions, human DNA is removed by enzymatic digestion, the enzyme is then removed from the sample, and finally microbial cells are lysed and microbial DNA is purified. The MolYsis method could introduce bias in selection of microbial DNA since   [30]. Another method, Pureprover® (previously available from SIRS-Lab GmbH, Jena, Germany), uses conventionally extracted microbiome/host DNA, and a protein to bind non-methylated CpG motifs in bacterial genomes [30]. Since microbes have a range of density of CpG motifs, capture efficiency of microbial DNA may vary, changing the microbiome species distribution in the analyzed sample.
Other methods of eliminating human DNA contamination from clinical samples (e.g. malarial patient samples) have various limitations [31]. Alternative methods might be specific to a single organism, involve a laborious and bias-prone culture step or employ complex lab techniques [32]. A method utilizing a methylcytosine dependent restriction enzyme (e.g. MspJI) to enrich Plasmodium DNA in malarial DNA samples was recently reported [33]. This approach uses a restriction enzyme requiring a methylated recognition sequence to selectively digest host DNA contamination in malarial samples. The method depletes 80% of the host DNA and enriches the Plasmodium DNA 9-fold but requires a 16-hour incubation at 37°C. Since malaria is often found in regions where sample collection, storage and processing are difficult, a more desirable approach would involve blood collection followed by DNA extraction of the clinical sample in the field for subsequent enrichment and analysis in a laboratory.
Many organisms present in vertebrate microbiomes are not culturable; therefore, high throughput sequencing of 16S ribosomal RNA gene amplicons is often used for characterization of natural communities. However, 16S ribosomal RNA gene sequencing only provides information about identity and abundance of community members without considering the substantial additional information available in the bacterial genomes. Additionally, 16S rRNA gene methods cannot be used to simultaneously detect both bacteria and the non-bacterial  [34,35]. Other culture independent techniques such as whole genome shotgun sequencing (WGS) are being used more frequently to identify organisms, provide insights into gene function and allow for inference of the functional potential of a microbial population. The presence of host DNA in such samples can result in a significant number of sequencing reads that must be discarded since they do not contain information about the microbial community. In the absence of an enrichment step, extensive sequence data must be obtained in order to achieve sufficient coverage of the sample of interest. In a recent report, analysis of the human salivary microbiome using WGS sequencing revealed a large fraction of the BLASTN [36] hits matched to human DNA, and only a small fraction represented bacterial and viral sequences amounting to 0.73% and 0.0036%, respectively [37]. Additionally, in 2012, the Human Microbiome Project Consortium [38] reported especially high levels of human DNA in soft tissue samples such as mid-vagina, anterior nares and throat; as well as high levels from DNA extracted from saliva. In our study of saliva-associated microbes, we observed low abundance of host sequences and high abundance of bacterial genera including Rothia, Streptococcus, Haemophilus, Veillonella, and Neisseria, among more than 100 total observed species.
We also observed strong enrichment of human mitochondrial DNA in both saliva and blood libraries. Mitochondrial and chloroplast DNA have been previously reported to have very low abundance of CpG methylation [28,29,39,40]. Although recent literature suggests that some CpG methylation is present in mitochondrial DNA [41,42], our observation of strong enrichment of mitochondrial DNA leads us to believe that the level of CpG methylation in the samples we have studied (blood, saliva, and cell lines) cannot be sufficient to promote efficient binding of the MBD-Fc beads. Similarly, we have observed strong enrichment of chloroplast DNA in plant samples using MBD-Fc based enrichment steps (data not shown). Furthermore, analysis of phage sequences before and after enrichment revealed increases in sequence reads from many species of phage with Streptococcus, Enterobacteria and Haemophilus phage being the major species.
Our studies with blood enrichment are complicated by the fact that the source material was from a commercial vendor with no specifications of collection date, source information (normal, disease or surgical patient) or method of storage prior to shipment. Bacteria from the genus Pseudomonas represented the majority of reads in the enriched dataset, which could reflect the presence of these organisms in the original host, but may represent environmental contamination during or after sample collection.
MBD-Fc bead enrichment is effective in non-human vertebrates as well. Various bacterial genera including Aeromonas, Pseudomonas, Vibrio, and Shewanella were observed in high abundance in our study of the black molly fish. We were not able to robustly assign reads to specific bacterial species, probably due to short read
Depletion of methylated DNA using paramagnetic bead-coupled MBD-Fc is a simple and rapid procedure that can easily be automated or performed with minimal equipment. We have demonstrated that this method can be used to separate host DNA from microbial DNA in a variety of contexts (human saliva, human blood, black molly fish, and artificial mixtures of human and E. coli). In addition, we have successfully enriched P. falciparum DNA from a prepared mixture of human and P. falciparum DNA, demonstrating its practicality for improving the laborious enrichment of Plasmodium DNA in patient blood samples. We have observed efficient separation of low-methylation density DNA from DNA with higher 5-methyl CpG content in varied contexts. We expect the MBD-Fc bead method to be a robust method of separating microbial DNA from plant, animal, or any host organism DNA containing sufficient CpG methylation. Finally, through sequencing data analysis, we have shown that this enrichment method evenly enriches microbial DNA, accurately reflecting the diversity of microbial species in the original sample.

Ethics Statement
This study was carried out in strict accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health. The black molly (Poecilia cf. sphenops) protocols were conducted under the Marine Biological Laboratory IACUC protocol #12-35 by Linda Amaral-Zettler. Fish were maintained under the supervision of an institutional veterinarian, and sick or moribund fish were euthanized immediately to minimize suffering. To minimize suffering at the point of sacrifice, fish were euthanized immediately upon collection using a 250 ppm buffered Tricaine methanesulfonate (MS-222) solution.
Genomic DNA Preparation
E. coli K-12 MG1655 and IMR-90 cells (ATCC CCL-186) were suspended in 50 mL of lysis buffer, 50 mM Tris (pH 7.5), 20 mM EDTA, 1% SDS, 100 µg/mL Proteinase K (NEB #P8102S), and incubated at 50°C for 3 hours with frequent mixing. DNA was extracted from samples once with Tris-EDTA equilibrated phenol and once with methylene chloride [43]. DNA was precipitated with two volumes of ethanol and washed twice with 70% ethanol. The precipitates were then centrifuged at 14,000 × g for 15 minutes at 4°C and the pellets were air dried at room temperature and suspended in 200 µL of 1× TE buffer (pH 7.5). Twenty µL of 0.4 mg/mL RNase A was added and the samples were incubated at 37°C for one hour. The samples were phenol/methylene chloride extracted, ethanol precipitated as above and DNA pellets were suspended in 1× TE buffer (pH 7.5). The final concentration of genomic DNA was adjusted to 100 µg/mL. Care was taken in pipetting steps to avoid unnecessary fragmentation of genomic DNA. DNA quality and quantity were assayed by agarose gel electrophoresis of the samples alongside a DNA marker (2-log DNA ladder, NEB #N3200S) and by Nanodrop® spectrophotometry.
Preparation of ³H E. coli DNA
E. coli cells were grown in Luria broth (LB) containing tetracycline and tritiated thymidine (Moravek Biochemicals MT 6036) for 6-8 hours at 30°C in a shaker incubator (300 rpm). Bacterial cells were then pelleted by centrifugation at 14,000 × g for 10 min. DNA was extracted with phenol/chloroform and precipitated with ethanol.
Specific activity of labeled E. coli DNA was adjusted to 30,000 cpm/ml by mixing with unlabeled E. coli DNA.

Metagenomic DNA Preparation
25 mL of human buffy coats/leukocytes (Innovative Research) were suspended in 50 mL of lysis buffer and genomic DNA was prepared as described above. 500 mL of pooled human saliva (Innovative Research) was centrifuged at 14,000 × g for 15 minutes at 4°C, the pellet was resuspended in 50 mL of lysis buffer, and genomic DNA was prepared as described above. Agarose gel analysis revealed ~50% of DNA from the pooled saliva sample was shorter than 10 kb. To further purify the DNA and enrich for high molecular weight fragments, the sample was loaded on a 1% low melt agarose gel with 1× SYBR® Safe DNA Gel Stain (Life Technologies). The ~15 kb band was cut out of the gel. The resulting gel slice was melted at 50°C and cooled to 42°C. 200 µL of 10× β-Agarase I buffer and 20 units β-Agarase (NEB #M0392S) were added to the sample and incubated at 42°C for 30 minutes. DNA was precipitated using 2 volumes of ethanol, air dried at room temperature and resuspended in 1× TE buffer (pH 7.5) to a concentration of 100 µg/mL.

Black Molly DNA Preparation
A single ~5 cm black molly was euthanized in MS-222, rinsed and homogenized in 30 mL 1× PBS with dissection scissors. The homogenate was vortexed for 10 minutes to dissociate microbial cells from host tissue, and then filtered through a 5-µm Isopore™ polycarbonate filter (Millipore part no. TMTP02500). The filtrate was centrifuged at 14,000 × g for 10 minutes to pellet microbial components of the fish sample. The pellet was then dried and resuspended in 600 µL Gentra® Puregene Yeast/Bact. Kit cell lysis solution (Qiagen #158722), and DNA purification was performed according to the manufacturer's protocol.

Simulation of Clinical Malaria Samples
For the malaria DNA enrichment, P. falciparum 3D7 genomic DNA was obtained from Prof. Chris Newbold's laboratory at the University of Oxford, UK. Human genomic DNA was purchased from Promega (#G3401). Mock samples were manually prepared by mixing 0.2 µg of P. falciparum 3D7 genomic DNA with 1.8 µg of human genomic DNA to obtain 2 µg of a simulated clinical genomic DNA sample.

Prebinding of MBD-Fc Protein to Protein A Paramagnetic Beads
Protein A paramagnetic beads (NEB #E2612, NEB #E2615A) were uniformly suspended in bind/wash buffer (NEB #E2612, NEB #E2616A) by gentle pipetting. 1 mL of the suspension was transferred to a 1.5 mL microcentrifuge tube, and 100 µL of MBD-Fc protein solution (NEB #E2612, NEB #E2614A) was added to the tube. The paramagnetic bead and MBD-Fc protein mixture was gently rotated for 10 minutes at room temperature. The tube was placed in a magnetic separator at room temperature until the supernatant was clear and beads were collected on the wall of the tube (5 minutes). The supernatant was removed and discarded using a pipette without disturbing the beads. 1 mL of 1× ice-cold bind/wash buffer was added to the tube to wash the beads, the tube was removed from the rack, and the solution was pipetted up and down three times. The sample was mixed on a rotating mixer for 3 minutes at room temperature, and then briefly centrifuged. The tube was placed in a magnetic separator at room temperature until the supernatant was clear and beads were collected on the wall of the tube (5 minutes). The supernatant was removed and discarded using a pipette without disturbing the beads. The wash step was repeated. After the final wash, beads were resuspended in 1 mL of ice-cold 1× bind/wash buffer and kept at 4°C for no more than 7 days.

Enrichment of Microbial DNA
The purified sample DNA prepared as described above was mixed with MBD-Fc protein A beads in a ratio of 1 µg of sample DNA to 160 µL of beads. The sample DNA was directly added to the bead slurry and incubated for 15 minutes at room temperature with gentle rotation. The incubated mixture was placed on a magnetic rack at room temperature until the supernatant was clear and beads were collected on the wall of the tube (2-5 minutes). The supernatant, containing enriched microbial DNA, was carefully removed with a pipette without disturbing the beads and purified with 1.8× volume of Agencourt AMPure® XP beads (Beckman Coulter #A63880) according to the manufacturer's instructions, and the DNA was eluted in 150 µL 1× TE buffer (pH 7.5). Volumes for this procedure were scaled directly depending on the amount of input DNA.
For the samples described in this study, the specific input amounts used for the microbial DNA enrichment experiments were as follows: for saliva microbiome enrichment, 12 µg of input DNA was enriched using 2 mL of MBD-Fc Protein A paramagnetic beads; for black molly microbiome enrichment, 9.12 µg of input DNA was enriched using 1.5 mL of MBD-Fc Protein A paramagnetic beads; and for Plasmodium falciparum DNA enrichment, 2 µg of Plasmodium falciparum/human DNA mixture was enriched using 320 µL of MBD-Fc Protein A paramagnetic beads.
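The direct scaling of bead volume with input DNA can be checked with a short sketch. It takes the stated ratio as 160 µL of MBD-Fc Protein A bead slurry per 1 µg of input DNA. The function and constant names are illustrative, not part of any kit documentation.

```python
# Ratio from the protocol text, read as 160 uL of bead slurry per 1 ug of input DNA.
BEADS_UL_PER_UG = 160.0


def bead_volume_ul(input_dna_ug):
    """Bead slurry volume (uL) needed for a given mass of input DNA (ug)."""
    if input_dna_ug < 0:
        raise ValueError("input DNA mass must be non-negative")
    return input_dna_ug * BEADS_UL_PER_UG


# Input amounts reported for the three enrichment experiments.
samples = [("saliva", 12.0), ("black molly", 9.12), ("P. falciparum mock", 2.0)]
for name, ug in samples:
    print(f"{name}: {bead_volume_ul(ug):.0f} uL of beads")
```

The computed volumes (about 1920, 1459 and 320 µL) agree with the rounded 2 mL, 1.5 mL and 320 µL quoted for the saliva, black molly and Plasmodium experiments.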

Elution of Host DNA
Host DNA bound to the MBD-Fc protein A beads during the microbial DNA enrichment procedure was recovered for further analysis. After the supernatant containing enriched microbial DNA was removed (described above), any remaining liquid was removed from the bottom of the tube without disturbing the beads and was discarded. The beads were rinsed with 1 mL of ice-cold 1× bind/wash buffer and placed on a magnetic rack at room temperature until the supernatant was clear and beads were collected on the wall of the tube (2–5 minutes), and the supernatant was discarded. The beads were then resuspended in 150 µL of 1× TE buffer (pH 7.5), and 15 µL of Proteinase K (NEB #P8102) was added before incubating at 70 °C for 20 minutes with occasional mixing. The tube was briefly centrifuged and placed on a magnetic rack at room temperature until the supernatant was clear and beads were collected on the wall of the tube (2–5 minutes).

Illumina libraries for the Plasmodium falciparum enrichment experiment were prepared as follows: Genomic DNA (500 ng) was sheared using a Covaris S2 device to obtain an average fragment size of ~350 bp. Illumina paired-end sequencing libraries were constructed using the NEBNext DNA Library Prep Reagent Set for Illumina (NEB #E6000) following the standard Illumina sample preparation protocol. PCR library amplifications were performed with an MJ Research Thermo Cycler PTC-225. Illumina PE 1.0 and 2.0 primers or PE 1.0 and 2.0-derived indexing primers were used to amplify adapter-ligated library fragments by PCR. Libraries were amplified using optimized PCR conditions described in [44].

Calculation of Percent Read Metrics
Unless otherwise specified, percent enrichment was calculated as below using only mapped reads.
Percent microbe reads = (number of microbe reads/(number of microbe reads + number of host reads)) * 100.
Percent host reads = (number of host reads/(number of microbe reads + number of host reads)) * 100.
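The two formulas above can be expressed as a small helper. This is a minimal sketch; the function name and the example read counts are hypothetical and are not data from this study.

```python
from typing import Tuple


def percent_reads(microbe_reads: int, host_reads: int) -> Tuple[float, float]:
    """Return (percent microbe reads, percent host reads) from mapped read counts."""
    total = microbe_reads + host_reads
    if total == 0:
        raise ValueError("no mapped reads to compute percentages from")
    return 100.0 * microbe_reads / total, 100.0 * host_reads / total


# Hypothetical counts of mapped reads, for illustration only.
pct_microbe, pct_host = percent_reads(microbe_reads=25, host_reads=75)
print(f"microbe: {pct_microbe:.1f}%  host: {pct_host:.1f}%")
```

Because only mapped reads enter the denominator, the two percentages always sum to 100.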

Ion Torrent Sequencing of Defined Mixtures
Fastq files were processed to remove reads shorter than 50 base pairs, and the remaining reads were mapped to a combined reference sequence database containing both the hg19 human reference genome and the E. coli MG1655 genome. Mapping was performed with Bowtie 2.0.4 using the sensitive, end-to-end options.
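The length filter applied before mapping can be sketched as follows. This is an illustrative example only: the tool actually used for filtering is not specified here, the function names are hypothetical, and the parser assumes simple four-line FASTQ records.

```python
from typing import Iterable, Iterator, Tuple

# One FASTQ record: (header, sequence, separator, quality string).
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield four-line FASTQ records; an incomplete trailing record is dropped."""
    stripped = (line.rstrip("\n") for line in lines)
    # Consuming the same iterator four times groups the lines into records.
    for header, seq, sep, qual in zip(stripped, stripped, stripped, stripped):
        yield header, seq, sep, qual


def filter_min_length(records: Iterable[FastqRecord], min_len: int = 50) -> Iterator[FastqRecord]:
    """Keep reads with at least min_len bases, i.e. remove reads shorter than min_len."""
    return (rec for rec in records if len(rec[1]) >= min_len)


# Two synthetic reads: one of 60 bases (kept) and one of 30 bases (removed).
demo_lines = [
    "@long_read", "A" * 60, "+", "I" * 60,
    "@short_read", "C" * 30, "+", "I" * 30,
]
for header, seq, _, _ in filter_min_length(read_fastq(demo_lines)):
    print(header, len(seq))
```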

SOLiD 4 Sequencing of Blood and Saliva DNA Samples
Reads of 50 base pairs were acquired from blood libraries and saliva libraries. SOLiD csfasta and qual files were combined using the solid_to_fastq.py script from Galaxy [45]. The resultant fastq files were mapped to hg19, the PhAnToMe PhageSeed database (http://phantome.org/, downloaded on 09/01/2012) and the Human Oral Microbiome Database (HOMD) [20] using Bowtie 0.12.7 with parameters allowing 2 mismatches in a 28 bp seed region. The MetaPhlAn [21] database was manually indexed using Bowtie 0.12.7 tools to allow alignment of SOLiD colorspace reads.
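As an illustration of the seed settings described above, the sketch below assembles a Bowtie 1 command line with colorspace alignment (-C), FASTQ input (-q), at most 2 seed mismatches (-n 2), a 28 bp seed (-l 28) and SAM output (-S). The exact command used in this study is not given, so the function name, the index prefix and the file names are assumptions, and other options may have been set.

```python
from typing import List


def bowtie1_colorspace_cmd(
    index_prefix: str,
    fastq_path: str,
    sam_path: str,
    seed_mismatches: int = 2,
    seed_length: int = 28,
) -> List[str]:
    """Build an argument list for a Bowtie 1 colorspace alignment (not executed here)."""
    return [
        "bowtie",
        "-C",                        # reads and index are in colorspace
        "-q",                        # input reads are in FASTQ format
        "-n", str(seed_mismatches),  # maximum mismatches allowed in the seed
        "-l", str(seed_length),      # seed length in bases
        "-S",                        # write alignments in SAM format
        index_prefix,
        fastq_path,
        sam_path,
    ]


# Hypothetical index prefix and file names, for illustration only.
print(" ".join(bowtie1_colorspace_cmd("hg19_colorspace", "saliva.fastq", "saliva_hg19.sam")))
```

The list could be passed to `subprocess.run` on a system where Bowtie 1 and a colorspace index are available.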

Illumina Sequencing of Black Molly DNA Sample
Fastq reads from a 66 bp MiSeq run and a 50 bp GAIIx run were combined and analyzed with MG-RAST [23]. Relative abundance was calculated as the number of hits reported for each bacterial genus divided by the total number of bacterial hits. For the MetaPhlAn analysis, reads were trimmed using Sickle (available at https://github.com/vsbuffalo/sickle) until quality scores from 50 bases averaged at least Q20 (Sanger units). These reads were mapped to the MetaPhlAn database using the Bowtie 2 [16] "very sensitive" parameter set.
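The relative abundance calculation can be sketched as follows. The function name, genus labels and hit counts are hypothetical placeholders and do not correspond to the MG-RAST output of this study.

```python
from typing import Dict


def relative_abundance(genus_hits: Dict[str, int]) -> Dict[str, float]:
    """Divide the hits for each genus by the total number of bacterial hits."""
    total = sum(genus_hits.values())
    if total == 0:
        raise ValueError("no bacterial hits")
    return {genus: hits / total for genus, hits in genus_hits.items()}


# Placeholder hit counts, for illustration only.
hits = {"genus_A": 50, "genus_B": 30, "genus_C": 20}
for genus, fraction in relative_abundance(hits).items():
    print(f"{genus}: {fraction:.2f}")
```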

Shannon-Wiener Diversity Index Calculation
The Shannon-Wiener diversity index, $H'$, was calculated using the following equation: $H' = -\sum_{i} p_i \ln(p_i)$, where $p_i$ is the proportion of each species in the sample. 149 species from the saliva HOMD dataset were used in the calculation.
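The index can be computed from per-species read counts as in the minimal sketch below. The function name and the example counts are hypothetical; species with zero counts are skipped, following the usual convention that their contribution to the sum is zero.

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[float]) -> float:
    """Return the Shannon diversity index, -sum(p_i * ln(p_i)), from species counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


# Hypothetical counts: four equally abundant species give ln(4).
print(round(shannon_index([10, 10, 10, 10]), 4))
```

For a community of 149 equally abundant species the index reaches its maximum of ln(149), about 5.0, which gives a sense of scale for index values computed on a 149-species dataset.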