Metagenomic Analysis of Viral Communities in (Hado)Pelagic Sediments

In this study, we analyzed viral metagenomes (viromes) in the sedimentary habitats of three geographically and geologically distinct (hado)pelagic environments in the northwest Pacific: the Izu-Ogasawara Trench (water depth = 9,760 m) (OG), the Challenger Deep in the Mariana Trench (10,325 m) (MA), and the forearc basin off the Shimokita Peninsula (1,181 m) (SH). Virus abundance ranged from 10⁶ to 10¹¹ viruses/cm³ of sediment (down to 30 cm below the seafloor [cmbsf]). We recovered viral DNA assemblages (viromes) from the (hado)pelagic sediment samples and obtained a total of 37,458, 39,882, and 70,882 sequence reads by 454 GS FLX Titanium pyrosequencing from the virome libraries of the OG, MA, and SH (hado)pelagic sediments, respectively. Only 24–30% of the sequence reads from each virome library exhibited significant similarities to the sequences deposited in the public nr protein database (E-value <10⁻³ in BLAST). Among the sequences identified as potential viral genes based on the BLAST search, 95–99% of the sequence reads in each library were related to genes from single-stranded DNA (ssDNA) viral families, including Microviridae, Circoviridae, and Geminiviridae. A relatively high abundance of sequences related to the genetic markers (major capsid protein [VP1] and replication protein [Rep]) of two ssDNA viral groups was also detected in these libraries, revealing a high genotypic diversity of these viruses (833 genotypes for VP1 and 2,551 genotypes for Rep). A majority of the viral genes predicted from each library were classified into three ssDNA viral protein categories: Rep, VP1, and minor capsid protein. The deep-sea sedimentary viromes were distinct from the viromes obtained from oceanic and fresh waters and from marine eukaryotes; thus, deep-sea sediments harbor novel viromes, including previously unidentified ssDNA viruses.


Introduction
Viruses are by far the most abundant biological entities in aquatic ecosystems [1], and viral ecology in environments such as oceanic surface waters, coastal waters, and fresh waters has been intensively investigated [2]. Viral activity in aquatic environments is known to regulate the dynamics and mortality of the host microbial community [3][4][5][6][7][8][9]. The lysis of host microbial cells infected by marine viruses, termed the "viral shunt", supplies organic matter to dissolved carbon and nutrient pools [10][11][12]. Furthermore, viruses have been noted as natural genetic vectors for horizontal gene transfer events [13,14]. Despite their ecological and evolutionary importance, our current knowledge of marine viruses is restricted to the euphotic zone, which represents only a limited portion of the oceanic biosphere [15]. Viral ecology in sedimentary environments has been poorly studied, although seafloor sediments cover almost two-thirds of the Earth's surface and serve as vital and dynamic interface habitats in global ocean biogeochemical cycles [16].
Deep-sea sediments (down to 10 cm below the seafloor [cmbsf]) harbor a great number of viral particles (>10⁷ viruses/cm³ sediment) and high virus productivity associated with large prokaryotic biomasses ranging from 10⁶ to 10⁸ cells/cm³ sediment [17,18]. These observations suggest that viral infections have a large impact on deep-sea sedimentary microbial communities and that the benthic prokaryotic biomass is sustained by the "viral shunt", which is estimated to provide 35% of the organic carbon for total benthic microbial production [18]. However, the genetic composition and diversity of viral communities in deep-sea sediments have not yet been reported.
A comprehensive metagenomic approach to environmental viral populations (viromes) can provide insight into the genetic diversity and previously unidentified constituents of the viral communities of various ecosystems [19][20][21][22][23][24][25][26]. Two different whole-genome amplification methods have been used for virome analysis. One is known as the linker-amplified shotgun library (LASL) method [27] and is only applicable to double-stranded DNA (dsDNA). The LASL method has been applied in several virome studies, such as those of surface seawater [27], human feces [28], and fermented foods [29]. These virome studies have suggested that a large proportion of the DNA viruses infect prokaryotic hosts, while most RNA viruses analyzed by reverse transcription infect eukaryotes [30]. The other method, known as multiple displacement amplification (MDA) with phi29 polymerase [31], can amplify both the dsDNA and single-stranded DNA (ssDNA) of viral genomes, although this method is known to have a considerable bias toward the preferential amplification of small circular genomes (1–9 kb) from ssDNA viruses [32,33]. Using this method, the distribution and diversity of ssDNA viruses (including both phages and eukaryotic viruses) have been investigated in various environments, such as marine waters [19,34], modern microbialites [35], coral [36], temperate freshwater lakes [37], an Antarctic lake [21], reclaimed water [38], the human gut [39,40], and rice paddy soil [33]. However, the host ranges and ecological impacts of these ssDNA viruses are still largely uncertain [41].
In this study, we used 454 pyrosequencing to conduct a virome analysis of deep-sea shallow subseafloor sediments (down to 40 cmbsf) in three distinct (hado)pelagic environments of the northwest Pacific: the hadopelagic sediments in the Izu-Ogasawara Trench (water depth = 9,760 m), the hadopelagic sediments in the Challenger Deep of the Mariana Trench (water depth = 10,325 m), and the pelagic sediments off Shimokita Peninsula (water depth = 1,181 m). To our knowledge, this study is the first to describe the characteristics of viromes in deep-sea sediments and identify novel ssDNA viruses that are distinct from viral genotypes previously known to occur in ocean environments.

Ethics Statement
The sampling in the Mariana Trench during the JAMSTEC KR08-05 cruise was approved by the U.S. Government. No specific permits were required for the other field studies described here and sampling locations are not privately-owned or protected. The field studies did not involve endangered or protected species.

Sediment Samples
Sediment cores from the Izu-Ogasawara Trench (29°09′ N, 142°49′ E; 9,760 m water depth) (Fig. 1) and the Challenger Deep in the Mariana Trench (11°22′ N, 142°42′ E; 10,332 m water depth) (Fig. 1) were obtained with a gravity corer of the ROV ABISMO (Automatic Bottom Inspection and Sampling Mobile) during the JAMSTEC KR07-17 (December 2007) and KR08-05 (May–June 2008) cruises with the R/V Kairei [42], respectively. The lengths of the cores were 1.0 m and 1.3 m from the Izu-Ogasawara and Mariana Trench, respectively. A short core (40 cm in length) of seafloor surface sediment from offshore of the Shimokita Peninsula was obtained using a push corer of the ROV HyperDolphin during the JAMSTEC NT06-13 cruise (Dive #581: 41°10′ N, 142°12′ E; 1,181 m water depth) (Fig. 1) with the R/V Natsushima. Each sediment core was subsampled from top to bottom at 2–10 cm intervals using sterilized top-cut 50 mL syringes or spatulas. Subsamples were stored at −80°C until the viral DNA was collected. The total organic carbon of each subsample was estimated with a Flash EA1112 elemental analyzer (Thermo Fisher Scientific, Waltham, MA, USA) at S1 Science (Saitama, Japan).

Prokaryotic 16S rRNA Gene Clone Analyses
To identify the phylotype compositions of the prokaryotic communities in the (hado)pelagic surface sediments, DNA was extracted with a PowerSoil DNA Isolation kit (Mo Bio Laboratories, Carlsbad, CA, USA) following the manufacturer's instructions. DNA was extracted from approximately 0.25 g of sediment taken from the same subsample used for the virome analysis. The method for this clone library analysis is described in the supplementary material (Materials and Methods S1).
Figure 2. Prokaryotic community structures based on the bacterial and archaeal 16S rRNA gene clone sequences detected from the deep-sea shallow subseafloor sediments from the Ogasawara, Mariana, and Shimokita locations. The numbers on the right of each row show the numbers of the sequenced clones in each library. The "Others" category represents the bacterial taxa that compose less than 5% of the total clone numbers. doi:10.1371/journal.pone.0057271.g002

Direct Counting of Viral Particles
To enumerate the viral particles, approximately 1 cm³ of frozen sediment was promptly suspended in 10 mL of modified SM buffer (10 mM MgSO₄; 50 mM Tris-HCl, pH 7.5) containing 3% NaCl (w/v) and 2% formaldehyde in a 50 mL centrifuge tube. The slurry was shaken with a ShakeMaster (BioMedical Science, Tokyo, Japan) for 1 min at the maximum speed and then sonicated for 1 min with an ultrasonic homogenizer (UH-50; SMT company, Tokyo, Japan) to detach viruses from sediment matrices [43]. After centrifugation, the size fraction of prokaryotes was removed from the supernatants through a 0.2 µm cut-off filter. The viral population was then filtered onto a 0.02 µm pore-size Anodisc membrane filter (Whatman, Kent, UK). The filters were rinsed thoroughly three times with 2 mL of fresh SM buffer, and the viruses on the filter were stained with 20× SYBR Gold (Invitrogen, Carlsbad, CA, USA) at room temperature for 20 min [44]. After rinsing with pure water, each filter was mounted on a glass slide with immersion oil. Viruses on the filter were observed with a fluorescence microscope (model BX61; Olympus, Tokyo, Japan) using a fluorescence filter set (WIB; Olympus). The number of viruses was counted in at least 10 microscopic fields for each sample.
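The conversion from field counts to abundance per unit sediment volume is not spelled out in this section. The following is a minimal sketch, assuming the usual epifluorescence calculation: the mean count per field is scaled by the ratio of effective filter area to field area, then by the fraction of the extract that was filtered, and finally divided by the sediment volume. The function name, all parameter names, and any numeric values supplied to it are hypothetical and are not taken from this study.

```python
from statistics import mean

def viruses_per_cm3(field_counts, filter_area_um2, field_area_um2,
                    buffer_volume_ml, filtered_volume_ml,
                    sediment_volume_cm3, min_fields=10):
    """Estimate viral abundance (viruses per cm^3 sediment) from field counts.

    All arguments are illustrative assumptions; the study only states that
    at least 10 microscopic fields were counted per sample.
    """
    if len(field_counts) < min_fields:
        raise ValueError("count at least %d microscopic fields" % min_fields)
    # Scale the mean count per field up to the whole effective filter area.
    on_filter = mean(field_counts) * (filter_area_um2 / field_area_um2)
    # Scale from the filtered aliquot to the full volume of extraction buffer.
    in_extract = on_filter * (buffer_volume_ml / filtered_volume_ml)
    # Normalize by the volume of sediment that was suspended.
    return in_extract / sediment_volume_cm3
```

Any correction for extraction efficiency or for the fraction lost in the 0.2 µm prefiltration step would have to be added separately; none is assumed here.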

Construction of Virome Libraries
Sediment core samples at core depths of 20–30, 0–10, and 5–10 cmbsf in the Ogasawara Trench, the Mariana Trench, and off the Shimokita Peninsula, respectively, were used to construct the libraries of viral metagenomes (viromes). The libraries were obtained by following the procedures described by Casas and Rohwer [45] with minor modifications. A total of approximately 100 cm³ of frozen sediment was suspended in 400 mL of modified SM buffer containing 3% NaCl (w/v), dispensed into 50 mL centrifuge tubes, and incubated for 1 h at 4°C. The slurry was shaken for 1 min with a ShakeMaster (BioMedical Science) at the maximum speed and centrifuged at 6,000 × g for 15 min. The supernatant was filtered with a 0.2 µm filter, and viral particles were precipitated with 10% polyethylene glycol (PEG) 8000 (w/v) overnight at 4°C. Viral particles were collected by centrifugation at 11,000 × g for 30 min. The viral fractions were further purified by cesium chloride (CsCl) density centrifugation as described previously [45]. The virome library was then obtained by using formamide and CTAB/NaCl according to Casas and Rohwer [45]. The obtained libraries were amplified with a REPLI-g Midi Kit (Qiagen), and remnant ssDNA in the amplified genomes was digested with S1 nuclease (Invitrogen, Carlsbad, CA, USA).

Virome Composition Analysis
The virome libraries from the deep-sea shallow subseafloor sediments were analyzed with a 454 GS FLX Titanium pyrosequencer (Roche, Basel, Switzerland) by Beckman Coulter Genomics (Danvers, MA, USA). An eighth of a PicoTiterPlate device was used to sequence each of the three virome libraries. The CLC Genomics Workbench ver. 5.5.1 (CLC Bio, Aarhus, Denmark) was used to remove poor-quality reads and artificial duplicates and to assemble the trimmed reads from each library. Poor-quality reads were defined as those shorter than 100 bp after the parts with Phred quality scores lower than 20 had been trimmed off. Artificial duplicates were defined as reads that share a common sequence of at least 20 bp at the beginning and whose remaining sequences have alignment scores above 80% of the optimal score. The default values were used for all the parameters in the assembly. The obtained sequences of contigs and singletons were subjected to BLASTx analyses against the NCBI GenBank nonredundant (nr) protein database [46]. MEGAN (MEtaGenome Analyzer; version 4.61.6) software was used to assign taxonomic groups of viruses and cellular organisms (bacteria, archaea, and eukaryotes) to the sequences with significant BLAST hits (E-values <10⁻³) in the three libraries [47,48]. The MEGAN-based taxonomic assignment was performed based on the top 10% of the significant hits.
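The read-cleaning criteria above can be illustrated with a simplified sketch. It is not the algorithm implemented in the CLC Genomics Workbench: trimming is reduced here to removing low-quality bases from both ends of a read, and the alignment-score criterion for duplicates is replaced by an ungapped identity over the overlapping remainder. All function and parameter names are hypothetical.

```python
def trim_low_quality_ends(seq, quals, min_q=20):
    """Trim bases with Phred score below min_q from both ends of a read."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end]

def is_artificial_duplicate(a, b, prefix_len=20, min_identity=0.8):
    """Same first prefix_len bases and a highly similar remainder.

    Ungapped identity stands in for the alignment score (an assumption).
    """
    if len(a) < prefix_len or len(b) < prefix_len:
        return False
    if a[:prefix_len] != b[:prefix_len]:
        return False
    rest_a, rest_b = a[prefix_len:], b[prefix_len:]
    overlap = min(len(rest_a), len(rest_b))
    if overlap == 0:
        return True
    matches = sum(x == y for x, y in zip(rest_a, rest_b))
    return matches / overlap >= min_identity

def clean_reads(reads, min_q=20, min_len=100):
    """reads: iterable of (sequence, list of Phred scores); returns kept sequences."""
    kept = []
    for seq, quals in reads:
        trimmed = trim_low_quality_ends(seq, quals, min_q)
        if len(trimmed) < min_len:
            continue  # too short after quality trimming
        if any(is_artificial_duplicate(trimmed, k) for k in kept):
            continue  # duplicate of a read that was already kept
        kept.append(trimmed)
    return kept
```

The pairwise duplicate check is quadratic in the number of reads, which is acceptable for a sketch but would need indexing by prefix for libraries of tens of thousands of reads.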

Functional Analysis of Virome Genes
Predicted protein-encoding sequences (CDSs) from the contigs in the virome libraries were identified with the MetaGeneMark [49] and Glimmer-MG [50] programs, and additional CDSs were identified by BLASTx searches. Where CDSs predicted by two different methods partially overlapped, the longer one was used for the analysis. These full and partial CDSs were classified functionally according to the SEED-subsystems [51] based on the BLASTp search results. The top 10% of the significant hits (E-value <10⁻³) were used to infer gene functions.
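The rule for reconciling overlapping predictions can be sketched as follows. This is an illustrative reading, not the authors' script: CDSs are represented as hypothetical (contig, start, end, strand) tuples with 1-based inclusive coordinates, two CDSs are treated as overlapping only if they lie on the same contig and strand (an assumption), and conflicts are resolved greedily from longest to shortest.

```python
def merge_cds_predictions(cds_a, cds_b):
    """Combine two CDS lists, keeping the longer CDS wherever two overlap."""
    def length(cds):
        return cds[2] - cds[1] + 1

    def overlaps(x, y):
        same_place = x[0] == y[0] and x[3] == y[3]
        return same_place and x[1] <= y[2] and y[1] <= x[2]

    merged = []
    # Visit the longest predictions first so that they win any conflict.
    for cds in sorted(cds_a + cds_b, key=length, reverse=True):
        if not any(overlaps(cds, kept) for kept in merged):
            merged.append(cds)
    return sorted(merged)
```

Identical predictions from both programs collapse to a single entry, because the second copy overlaps the first and is discarded.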

Phylogenetic Analysis of the ssDNA Viral Genes
Two ssDNA viral markers (the major capsid protein [VP1] gene and the putative replication-associated protein [Rep] gene) were used to construct the phylogenetic trees. These markers were screened from the virome genes based on significant sequence similarity (E-value <10⁻³ in BLASTp) to the references in the GenBank nr protein database and the presence of conserved Pfam domains (Pfam 26.0; http://pfam.sanger.ac.uk/): the Phage_F (PF02305) domain in the VP1 genes, and the Viral_Rep (PF02407) or Gemini_AL1 (PF00799) domains in the Rep genes. Multiple sequence alignments of the conserved domains in the marker genes were constructed with the MAFFT program [52,53]. Phylogenetic analyses with the neighbor-joining method [54] were performed with the MEGA5.05 program [55].
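The two screening criteria (a significant BLASTp hit and the presence of a marker-specific Pfam domain) amount to a simple filter, sketched below. The Pfam accessions and the E-value threshold come from the text; the function name, its arguments, and the assumption that domain hits are already available as a set of accessions per CDS are hypothetical.

```python
# Pfam accessions named in the text, grouped by marker gene.
MARKER_DOMAINS = {
    "VP1": {"PF02305"},             # Phage_F
    "Rep": {"PF02407", "PF00799"},  # Viral_Rep, Gemini_AL1
}

def classify_marker(best_evalue, pfam_accessions, max_evalue=1e-3):
    """Return 'VP1', 'Rep', or None for one CDS.

    best_evalue: lowest BLASTp E-value of the CDS against nr.
    pfam_accessions: set of Pfam accessions detected in the CDS.
    """
    if best_evalue >= max_evalue:
        return None  # no significant similarity to a reference protein
    for marker, domains in MARKER_DOMAINS.items():
        if domains & set(pfam_accessions):
            return marker
    return None
```

The sketch does not check whether a domain is complete or nearly complete, which the selection of sequences for tree building additionally requires.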

Comparison of Viromes
The virome libraries from the deep-sea shallow subseafloor sediments were compared with the virome data from other environments with MetaVir (http://metavir-meb.univ-bpclermont.fr/) [56]. In the MetaVir workflow, viromes were compared based on sequence similarity with a cross-tBLASTx search as described in Martín-Cuadrado et al. [57]. The viromes in deep-sea sediments were compared with all viromes deposited in MetaVir using tBLASTx. A similarity score between two viromes was computed as the sum of the best BLAST hit scores of each sequence in one virome library against the other virome library. Finally, the resulting score matrix (i.e., similarity scores for all virome pairs) was used to cluster the viromes with R software (version 2.14.0; http://www.r-project.org/) [58] and the PVCLUST package (http://www.is.titech.ac.jp/shimo/prog/pvclust/) [59] using a construction method based on the unweighted pair-group method with arithmetic averages (UPGMA). The confidence of the clustering was assessed with the multiscale bootstrap resampling clustering algorithm in PVCLUST [59] and indicated by the approximate unbiased bootstrap probability at selected nodes.
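The similarity score described above can be sketched as a small aggregation over tabular tBLASTx hits. This is an illustration of the stated definition only, not MetaVir's implementation: the input layout (query virome, query sequence, subject virome, bit score) is a hypothetical convention, and any normalization that MetaVir may apply before clustering is not reproduced.

```python
from collections import defaultdict

def virome_similarity(hits):
    """Sum, for each ordered virome pair, the best hit score of every query.

    hits: iterable of (query_virome, query_seq, subject_virome, bit_score).
    Returns {(query_virome, subject_virome): summed best scores}.
    """
    best = defaultdict(float)
    for q_virome, q_seq, s_virome, score in hits:
        key = (q_virome, s_virome, q_seq)
        # Keep only the best-scoring hit of this query against this virome.
        best[key] = max(best[key], score)

    totals = defaultdict(float)
    for (q_virome, s_virome, _), score in best.items():
        totals[(q_virome, s_virome)] += score
    return dict(totals)
```

The resulting pairwise scores would populate the score matrix that is passed to the clustering step; the UPGMA clustering and multiscale bootstrap themselves are performed in R with PVCLUST and are not sketched here.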

Nucleotide Sequence Accession Numbers
All pyrosequencing read data from the three virome libraries obtained in this study have been submitted to the DDBJ Sequence Read Archive (DRA) (http://trace.ddbj.nig.ac.jp/dra/index_e.shtml) under the accession number DRA000564. The sequences of the VP1 and Rep genes from the virome libraries used for the phylogenetic analyses of the ssDNA viral assemblages were deposited into the DDBJ/EMBL/GenBank nucleotide sequence databases under the accession numbers BAKA01000001 to BAKA01000006 (Ogasawara library), BAKB01000001 to BAKB01000011 (Mariana library), and BAKC01000001 to BAKC01000114 (Shimokita library). The 16S rRNA gene sequences obtained in this study were deposited in the DDBJ/EMBL/GenBank nucleotide sequence databases under the accession numbers AB734482 to AB734640.

Sample Characteristics
The deep-sea sediments used in this study were obtained from three geographically and geologically distinct areas in the Northwest Pacific (Fig. 1). The Challenger Deep in the Mariana Trench is the deepest part of the world's oceans and lies beneath oligotrophic water masses [60]. The forearc basin off the Shimokita Peninsula is located near the coast of northeastern Japan in an area of high primary production, and its sediments are characterized by a large amount of subseafloor microbial biomass [61]. The sampling station at the Izu-Ogasawara Trench is one of the deepest points of the trench system. The shallowest sediment (down to 30 cmbsf) off the Shimokita Peninsula contained high organic carbon contents (2.62–3.81 wt% TOC) compared with the shallow subseafloor sediments at the Ogasawara Trench and Mariana Trench (0.67–0.86 and 0.15–0.23 wt%, respectively), reflecting the different oceanographic backgrounds of the (hado)pelagic sedimentary habitats [62].
The abundance of viruses in the shallowest 30 cmbsf sediments was determined to be 9.9 × 10⁷ to 1.8 × 10¹¹ viruses/cm³ sediment in the sediments off the Shimokita Peninsula (SH), 5.8 × 10⁷ to 6.6 × 10⁷ viruses/cm³ in the Ogasawara Trench (OG) sediments, and 2.4 × 10⁶ to 5.3 × 10⁷ viruses/cm³ in the Mariana Trench (MA) sediments (Table S2). The abundance of viruses in the SH and MA sediments decreased with increasing sediment depth but did not decrease significantly in the OG sediments. Based on the virus abundance data in the shallow subseafloor sediments, we chose subsamples of the sediments for the subsequent virome analysis and prokaryotic 16S rRNA gene clone analysis. The sediment samples with relatively high virus abundances of 7.6 × 10¹⁰ viruses/cm³ at a depth of 5–10 cmbsf in the SH sediments, 6.6 × 10⁷ viruses/cm³ at a depth of 20–30 cmbsf in the OG sediments, and 1.2 × 10⁷ to 5.3 × 10⁷ viruses/cm³ at a depth of 0–10 cmbsf in the MA sediments were used for the subsequent investigations.
The phylotype compositions of the prokaryotic communities in the deep-sea shallow subseafloor sediment samples that hosted relatively abundant virus populations were assessed by 16S rRNA gene clone analysis. Most of the 16S rRNA gene phylotypes recovered from the sediments were derived from the previously uncultivated prokaryotes but were related to environmental sequences that have frequently been identified in deep-sea surface and subseafloor sedimentary habitats. In addition, the proportions of the phylum-level compositional groups in the 16S rRNA gene clone libraries (Fig. 2) were different between the three sedimentary habitats, but a considerable portion of their constituent phylotypes was identified commonly among the deep-sea shallow subseafloor sediments. The predominant phylogroups in the SH sediment were Deltaproteobacteria (32%) and Gammaproteobacteria (24%) (Fig. 2). In contrast, phylotypes affiliated with Chloroflexi (35%) and Planctomycetes (25%) dominated the phylotype composition of the OG sediment. In the MA sediment, the phylotypes of the marine group I archaea represented the most predominant prokaryotic components (20%) (Fig. 2).

Compositions of the Viromes
A total of 37,458, 39,882, and 70,882 sequence reads were obtained from the pyrosequencing libraries of the three surface sedimentary viromes in the OG, MA, and SH sediments, respectively (Table 1). Only 24–30% of the sequence reads in the libraries exhibited significant similarity (E-value <10⁻³ in BLASTx) to the sequences deposited in the GenBank nr protein database (Fig. 3A), and these reads were further classified into viral, prokaryotic, or eukaryotic sequences based on the top 10% of the significant hits (Fig. 3A). In the SH pyrosequencing library, a relatively high proportion (28%) of reads was assigned a potentially viral origin, and 0.3% and 0.5% of the reads were categorized as being of potential bacterial and eukaryotic origin, respectively. Reads of potentially viral origin accounted for 10% and 4% of the OG and MA pyrosequencing libraries, respectively, while 11% and 6% of the OG and MA pyrosequencing libraries, respectively, were likely derived from a bacterial origin. However, as reported in other environmental virome studies (e.g., [19,21,35,63]), the similarity analysis revealed that most of the sequences in all of the pyrosequencing libraries were unclassified (Fig. 3A).
The potentially virus-derived sequences were further classified into sequences tentatively associated with the family-level taxa of viruses (Fig. 3B). Most of the viral reads in all three libraries were found to be genetic components from ssDNA viral families, including Microviridae, Circoviridae, Nanoviridae, and Geminiviridae. These tentative ssDNA viral sequences together occupied 95–99% of the total viral reads in each library (Fig. 3B). The ssDNA viral sequences from the OG sediment were related to the genetic components from Microviridae phages (81% of the total viral reads), whereas 59% of the total viral reads from the MA sediment library were likely derived from the Circoviridae–Nanoviridae viral group, which is known to infect eukaryotes [34]. In the SH sediment library, the sequences associated with both viral groups (Microviridae and Circoviridae–Nanoviridae groups) were predominant (53% and 44% of the total viral reads, respectively). In contrast, the possible viral reads related to dsDNA viruses, including the order Caudovirales, known as "tailed bacteriophages", were detected as very minor populations (0.03–3.2% of the total viral reads) in the three libraries.

Profile of Functional Genes from the Viromes
All of the constituent sequence reads of the genes predicted from the deep-sea shallow subseafloor sedimentary viromes were subjected to functional assignments based on the SEED-subsystems (Fig. S1). Most of the functionally assigned sequences among all libraries belonged to the viral protein category (37–98% of the total reads assigned). Only a small fraction of the sequences from the OG and MA sediment libraries were classified into various functional categories, including microbial metabolism (Fig. S1). The sequences assigned to the viral protein category were further subgrouped into several subcategories (Table 2). A majority (76–94%) of the viral genes from each library were classified into three ssDNA viral protein categories: replication proteins, major capsid proteins, and minor capsid proteins (Table 2).

Diversity of ssDNA Viral Sequences in the Viromes
The genetic diversity of the ssDNA viral sequences obtained from the three libraries was examined with the MetaVir tool [56], which enables the detection of the diversity of representative viral marker genes (Table S1). As a result, only three viral markers were identified, and these markers are summarized in Table 3: the conserved major capsid protein (VP1) of Microviridae phages [35], a putative replication initiation protein (Rep) of the eukaryote-infecting Circoviridae–Nanoviridae–Geminiviridae group [34], and a terminase large subunit (TerL) [64] of the dsDNA viruses affiliated with Caudovirales. A high genotypic diversity of the two ssDNA viral sequence groups was found (833 genotypes for VP1 and 2,551 genotypes for Rep).
Based on the MetaVir data, we selected two ssDNA viral markers (the VP1 and Rep genes) to construct phylogenetic trees and determine the phylogenetic relationship between the potential deep-sea shallow subseafloor sedimentary ssDNA viruses and previously identified viruses, including environmental sequences. From the virome CDSs identified by multiple gene-finding programs for metagenomic sequences (e.g., MetaGeneMark [49]), we screened 100, 35, and 686 CDSs encoding partial or complete viral VP1 genes and 85, 57, and 784 CDSs encoding partial or complete putative viral Rep genes from the OG, MA, and SH virome libraries, respectively. Then, three conserved domains in the VP1 and Rep genes were searched for on the Pfam website: the major capsid protein F domain (PF02305, Phage_F) in VP1 and the putative viral replication protein domain (PF02407, Viral_Rep) and Geminivirus Rep protein catalytic domain (PF00799, Gemini_AL1) in Rep. Consequently, we obtained 11 (from the OG library), 7 (from the MA library), and 71 (from the SH library) CDSs that harbored at least one complete or nearly complete conserved domain (7, 2, and 36 sequences for the major capsid protein F domain; 4, 5, and 72 sequences for the putative viral replication protein domain; and 0, 0, and 17 sequences for the Geminivirus Rep protein catalytic domain).
We constructed a phylogenetic tree for each viral marker gene domain (Fig. 4, 5, 6). The phylogenetic tree of the Microviridae capsid protein F domain revealed that the sequences obtained in this study were more closely related to sequences from a group of phages that infect intracellular parasitic bacteria (Chlamydia, Bdellovibrio, and Spiroplasma phages) or to environmental sequence groups detected in oceanic waters and marine sedimentary microbialites than to those from the Enterobacteria and Bacteroidetes phage groups (Fig. 4). However, our benthic virome sequences did not fall within any groups composed of known Microviridae sequences (Fig. 4). The phylogenetic tree of the Viral_Rep domain group indicated that the sequences obtained from the deep-sea shallow subseafloor sediments were very diverse and that most were distinct from previously characterized ssDNA viral groups (Fig. 5). In addition, the phylogenetic analysis of the Gemini_AL1 domain revealed that the sequences identified in this study were moderately related to the known members of the Geminiviridae family and that all the virome sequences, with the exception of MPSH00373, formed a novel phylogenetic cluster (Fig. 6).
We also employed an alternative approach to examine the ssDNA viral diversity by using the automatic tree construction tool in MetaVir [56]. In contrast to the above-mentioned phylogenetic analysis, which was performed with sequences that were as long as possible, this tree construction tool has been developed to analyze as many metagenomic sequences as possible in phylogenetic trees with reference sequences for each genetic marker (for details, see Materials and Methods S2 in the supplemental material). Representative reliable phylogenetic trees of the VP1 and Rep sequences, including relatively abundant genotypes obtained in this study, are shown in Fig. S2 and Fig. S3, respectively. The phylogenetic trees also indicated that the VP1 and Rep sequences from the deep-sea shallow subseafloor sediments were phylogenetically distinct from those of any previously known Microviridae phages and eukaryotic infectious ssDNA viruses. The results also supported the phylogenetic topology and diversity found in the trees constructed from longer sequences (Fig. 4, 5, 6).

Discussion
Most of the potentially virus-originating sequences from the deep-sea shallow subseafloor sediments were similar to sequences from ssDNA viruses, such as the families Microviridae, Circoviridae, Nanoviridae, and Geminiviridae (Fig. 3B). Until recently, these ssDNA viruses had been isolated only from terrestrial environments; however, both isolation and metagenomic studies have now revealed the existence of ssDNA viruses in marine environments. To date, seven ssDNA viruses infecting marine diatoms (the new genus Bacilladnavirus) [65][66][67][68] have been isolated, and Holmfeldt et al. [69] reported the first description of Microviridae phages that infect marine Bacteroidetes (Cellulophaga baltica). Furthermore, diverse ssDNA virus-related sequences have also been recovered in metagenomic investigations of marine environments, such as oceanic waters [19,38,70], coastal microbialites [35], coral [36], and marine protist cells [71].
Of the ssDNA viral families phylogenetically associated with the sequences from the deep-sea shallow subseafloor sediments, only Microviridae is a bacteriophage family, whereas the other families are known as eukaryotic viral families; Circoviridae infects animals, and Nanoviridae and Geminiviridae infect plants. The phylogenetic analyses of the virome genes revealed that the viromes in the deep-sea shallow subseafloor sediments harbored diverse phage VP1- and viral Rep-related sequences (Table 3) that were genetically distinct from the previously known Microviridae phages and eukaryote-infecting ssDNA groups and their homologs identified by metagenomic characterizations of other environments (e.g., oceanic and fresh waters) (Fig. 4, 5, 6; see also Figs. S2 and S3 in the supplemental material). In eukaryotic ssDNA viruses, the Rep protein family is known to include non-viral replication-associated proteins from bacterial plasmids (Bifidobacterium pseudocatenulatum pM4 and Phytoplasma sp.) and protozoan genomes (Giardia intestinalis and Entamoeba histolytica) [72]. In particular, the genetic diversity of the Rep genes obtained from the deep-sea sedimentary viromes suggests that the potential ssDNA viruses harboring the Rep genes would have much greater diversity in host selection than presently expected [73,74]. Thus, the genetically diverse ssDNA virus candidates in the deep-sea shallow subseafloor sediments may infect not only eukaryotic but also prokaryotic hosts, although it is still unclear whether such potential ssDNA viruses actively interact with the host eukaryotic and prokaryotic populations in the in situ sedimentary habitats or other ocean environments.
Table 3. Distribution of the viral genetic marker-related sequences from the virome assemblies obtained through the MetaVir workflow [56].
The proportion of ssDNA viral sequences among all the possible viral sequences is significantly higher (95–99%) in the deep-sea sedimentary viromes (Fig. 3B) than in previously described ocean planktonic viromes [19]. However, it should also be noted that the predominance of ssDNA viral sequences in the deep-sea sedimentary viromes may be biased by the method (MDA) used to construct the virome libraries in this study. The MDA method has been adopted in several metagenomic studies of planktonic viromes in seawater samples of the Arctic Ocean, Gulf of Mexico, British Columbia, and Sargasso Sea, and a lower proportion (0.7 to 25.0%) of ssDNA viral sequences among all viral sequences was reported [19]. Thus, although methodological biases cannot be completely excluded, the comparison of the ocean planktonic and benthic viromes suggests the potential predominance of the ssDNA viral components in the viral populations of the (hado)pelagic sedimentary habitats.
Although a high abundance of ssDNA viral sequences was commonly noted in the viromes of the (hado)pelagic sedimentary habitats, many differences also became evident upon the detailed comparison of the three deep-sea sedimentary viromes (Fig. 3B). For example, the MA virome library was dominated by sequences related to ssDNA viruses of the eukaryote-infecting Circoviridae–Nanoviridae group, while the OG virome library was dominated by Microviridae-related sequences (Fig. 3B). We expected that the difference in viral genotype compositions was most likely associated with the different host community compositions, specifically, the compositional differences between prokaryotic communities as the predominant microbial populations in the deep-sea sedimentary habitats. However, the 16S rRNA gene phylotype analysis revealed a difference in the phylum-level composition of the prokaryotic phylotypes but a considerably similar pattern of constituent phylotypes in the three deep-sea shallow subseafloor sediments (Fig. 2). We therefore could not determine how the viral genotype compositions are coupled with the potential host microbial (prokaryotic and eukaryotic) community compositions in the deep-sea shallow subseafloor sediments.
In contrast, a relatively high abundance of sequences potentially originating from bacteria was indicated in the virome libraries of the OG and MA sediments (11% and 6%, respectively), and each of the three viromes represented a unique composition of viral and non-viral sequence origins (Fig. 3A). In previous metagenomic virome studies, the sequences identified as of non-viral origin, such as prokaryotic and eukaryotic sequences, were interpreted to be the result of a potential misclassification of viral sequences as host (prokaryote and eukaryote) genomic components [19][20][21]27,28,75]. However, in this study, the non-viral sequences found in the viromes may be derived from the potential contamination of the extracellular DNA by the indigenous prokaryotic and eukaryotic populations during the viral purification processes. To purify viral particles from the sediment samples, we used the CsCl density centrifugation method but did not perform DNase digestion of the extracellular free DNA fragments in the purified viral fractions. The functional profiling of the virome sequences (Fig. S1) revealed that the genes related to virus-mediated gene transfer, such as those encoding integrases and transposases and belonging to the category of prophages and transposable elements, were rarely observed (1.4% and 0.2% in the OG and MA virome libraries, respectively). Because the viral abundance was significantly lower in the OG and MA sediments (6.6 × 10⁷ and 1.2 × 10⁷ to 5.3 × 10⁷ viruses/cm³, respectively) than in the SH sediment (7.6 × 10¹⁰ viruses/cm³) (Table S2), the influence of contaminating extracellular DNA would be greater in the OG and MA virome libraries than in the SH virome library. Therefore, the bacterial sequences identified in this study may be due to contamination by extracellular DNA from cellular organisms.
The deep-sea shallow subseafloor sedimentary viromes were compared with previously characterized viromes from other environments on the basis of pairwise sequence similarities using the MetaVir workflow (Fig. 7). Because this analysis considers not only sequences of known function but also sequences of unknown function, which constitute most of the virome sequences in public databases, it can compare viromes across a broader spectrum of genetic information. The cluster analysis revealed that all of the deep-sea shallow subseafloor sedimentary viromes and the coastal microbialite virome form a novel group that is clearly differentiated from the viromes of other environments, particularly the aquatic (marine and freshwater) viromes (Fig. 7). The distinctness of the deep-sea shallow subseafloor sedimentary viromes in the statistical analysis is consistent with their novel viral genotype compositions, which are dominated by sequences from ssDNA viruses (Fig. 3B).
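To illustrate how a cluster analysis of pairwise virome similarities can group samples, the following Python sketch applies average-linkage (UPGMA) agglomerative clustering to a toy similarity table. It is a minimal sketch under stated assumptions: the similarity scores are hypothetical placeholders (not values from Fig. 7), the "seawater" and "freshwater" labels stand in for generic aquatic viromes, and average linkage is chosen here for simplicity rather than taken from the MetaVir documentation.

```python
from itertools import combinations

# HYPOTHETICAL pairwise similarity scores (0 to 1) between viromes; these are
# illustrative placeholders, not values from Fig. 7.
labels = ["OG", "MA", "SH", "microbialite", "seawater", "freshwater"]
similarity = {
    frozenset(("OG", "MA")): 0.60,
    frozenset(("OG", "SH")): 0.55,
    frozenset(("MA", "SH")): 0.50,
    frozenset(("OG", "microbialite")): 0.40,
    frozenset(("MA", "microbialite")): 0.42,
    frozenset(("SH", "microbialite")): 0.45,
    frozenset(("seawater", "freshwater")): 0.50,
}
BACKGROUND_SIMILARITY = 0.05  # assumed score for every pair not listed above


def distance(a, b):
    """Convert a similarity score into a distance (1 - similarity)."""
    return 1.0 - similarity.get(frozenset((a, b)), BACKGROUND_SIMILARITY)


def average_linkage(cluster_a, cluster_b):
    """Mean pairwise distance between members of two clusters (UPGMA criterion)."""
    pairs = [(a, b) for a in cluster_a for b in cluster_b]
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


def upgma(items):
    """Merge the closest clusters repeatedly; return (cluster_a, cluster_b, distance) per merge."""
    clusters = [(name,) for name in items]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: average_linkage(*pair))
        merges.append((a, b, average_linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


merges = upgma(labels)
for a, b, d in merges:
    print(f"merge {a} + {b} at distance {d:.3f}")
```

With these placeholder scores, the three sediment viromes and the microbialite virome join one another before either joins the aquatic pair, which mirrors the grouping described in the text. Real analyses would use measured similarity scores and typically assess cluster support statistically.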
Although many differences in the virome compositions (e.g., the detailed viral genotype composition [Fig. 3B]) and environmental conditions (e.g., geographical location, geological and oceanographic settings, physical and chemical environments, and potential prokaryotic community structures [Fig. 2]) were identified, the deep-sea shallow subseafloor sedimentary viromes were statistically related to each other (Fig. 7). Interestingly, the virome of an extant microbialite habitat also showed a significant relationship with the deep-sea sedimentary viromes (Fig. 7). Microbialites are complex sedimentary structures of minerals and microbes that grow through photosynthetic primary production and the associated heterotrophic populations, and they are formed by microbially controlled mineral deposition in coastal and freshwater environments [35]. The microbialite virome from a shallow coastal area has been characterized by a high abundance of previously known viral sequences from ssDNA viruses and several marine cyanophages [35]. The deposition rates and properties and the indigenous microbial processes appear to differ considerably between the (hado)pelagic sediments and the microbialites. Nevertheless, both of these aquatic sedimentary habitats may share similar environmental and microbiological interactions in the development of the in situ viral community.
Here, we described the characteristics of viromes in deep-sea sediments. The virome investigations revealed that the (hado)pelagic sediments harbored novel viromes, including previously unidentified ssDNA viruses distinct from the viral genotypes previously identified in ocean environments, although the relative abundance of these ssDNA viral assemblages was likely biased during the construction of the metagenomic library. Less biased methods for preparing virome libraries continue to be developed [76][77][78], including new copurification methods that allow simultaneous access to dsDNA, ssDNA, and RNA viruses from the same sample [79]. Therefore, further advanced investigations of community metagenomes of multiple DNA and RNA viral families are required to obtain a more comprehensive and reliable overview of the viral community in the deep-sea sedimentary environment. Moreover, in-depth analyses of the viral and host microbial community metagenome datasets in the (hado)pelagic sedimentary zones would provide a better understanding of the host-virus systems in the deep-sea sediments.
Figure S1 Profiles of the function categories for the genes predicted from three deep-sea shallow subseafloor sedimentary viromes. The relative abundance of the constituent sequence reads of the virome genes assigned to SEED subsystems [51] with significance (E-value <10⁻³ in BLASTp) is shown. (TIF)
Figure S2 Maximum-likelihood tree of the 58 amino acid sequences of the major capsid protein (VP1 marker for Microviridae) from the contigs in the virome libraries, as represented by a tree gallery (the 50 'best' trees) with the MetaVir workflow [56]. The virome sequences from the Ogasawara (OG), Mariana (MA), and Shimokita (SH) libraries are highlighted in blue, red, and green, respectively. The numbers in parentheses indicate the DDBJ/EMBL/GenBank accession numbers for the sequences. Only bootstrap values of >50% are indicated at the nodes of the tree. (TIF)
Figure S3 Neighbor-joining phylogenetic tree of the 52 amino acid sequences of the replication protein (Rep marker for the Circoviridae–Nanoviridae–Geminiviridae group) from the contigs in the virome libraries, as represented by a tree gallery (the 50 'best' trees) with the MetaVir workflow [56]. The virome sequences from the Ogasawara (OG), Mariana (MA), and Shimokita (SH) libraries are highlighted in blue, red, and green, respectively. The numbers in parentheses indicate the DDBJ/EMBL/GenBank accession numbers for the sequences. Only bootstrap values of >50% are indicated at the nodes of the tree. (TIF)