Genomic features of “Candidatus Venteria ishoeyi”, a new sulfur-oxidizing macrobacterium from the Humboldt Sulfuretum off Chile

The Humboldt Sulfuretum (HS), in the productive Humboldt Eastern Boundary Current Upwelling Ecosystem, extends under the hypoxic waters of the Peru-Chile Undercurrent (ca. 6°S and ca. 36°S). Studies show that primeval sulfuretums held diverse prokaryotic life, and, while rare today, still sustain species-rich giant sulfur-oxidizing bacterial communities. We here present the genomic features of a new bacteria of the HS, “Candidatus Venteria ishoeyi” (“Ca. V. ishoeyi”) in the family Thiotrichaceae.Three identical filaments were micro-manipulated from reduced sediments collected off central Chile; their DNA was extracted, amplified, and sequenced by a Roche 454 GS FLX platform. Using three sequenced libraries and through de novo genome assembly, a draft genome of 5.7 Mbp, 495 scaffolds, and a N50 of 70 kbp, was obtained. The 16S rRNA gene phylogenetic analysis showed that “Ca. V. ishoeyi” is related to non-vacuolate forms presently known as Beggiatoa or Beggiatoa-like forms. The complete set of genes involved in respiratory nitrate-reduction to dinitrogen was identified in “Ca. V. ishoeyi”; including genes likely leading to ammonification. As expected, the sulfur-oxidation pathway reported for other sulfur-oxidizing bacteria were deduced and also, key inorganic and organic carbon acquisition related genes were identified. Unexpectedly, the genome of “Ca. V. ishoeyi” contained numerous CRISPR repeats and an I-F CRISPR-Cas type system gene coding array. Findings further show that, as a member of an eons-old marine ecosystem, “Ca. V. ishoeyi” contains the needed metabolic plasticity for life in an increasingly oxygenated and variable ocean.


Introduction
Below the oxygen-deficient waters of the Peru-Chile Undercurrent, between central Peru (ca. 6˚S) and central Chile (ca. 36˚S), the Humboldt Sulfuretum (HS), a sulfur compound-and macro-, megabacteria-rich biotope has been described [1][2][3][4][5]. Of varying morphology, the diameter of the filamentous macrobacteria typically range from ca. 1 μm to ca. 10 μm, and their lengths from around 10 μm to mostly several hundreds or even several thousands of micrometers. The megabacteria in turn, are defined as having a diameter greater than around 10 μm, most generally exhibiting a vacuolar system, and lengths that can reach several hundred to several thousands, and even millimeters. This classification of large filamentous bacteria, while ecologically useful, probably reflects their adaptation and evolution to an ocean evolving from fully or mostly anoxic to fully or mostly oxic [2]. Presenting results from a study on a member of the macrobacteria group is the objective of the present work.
While the term sulfuretum stems from stagnant fresh-water systems [6], recent study suggest that this type of benthic habitat and its typical biological complements, i.e., a highly diverse community of big filamentous and spherical non-photosynthetic sulfur-oxidizing bacteria [5], and small Archaea, might have been the only or the dominant types of life in primeval ocean bottoms when free oxygen was lacking or scarce [5,7]. Paleontological, geological, and geochemical evidence from this habitat and its biota have been reported from the Archaean and Proterozoic Eons [8][9][10], and from late Miocene (Tertiary), Italian mudstone beds [11]. We further posit that such evidence might still be present in the late Jurassic (Oxfordian), "proto-Humboldt Sulfuretum"-derived fossil-bearing deposits in northern Chile's "Cordillera Domeyko" [12]. Further, the recently described early occupation of new substrate created by a submarine volcanic eruption, of the closely related Thiotrichaceae "Ca. Thiolava veneris", could also be interpreted as evidence of biotic linkages between present sulfur bacteria and primeval benthic ecosystems [13].
Following the probable course of evolution, filamentous sulfur oxidizing macrobacteria oxidize sulfur directly using environmental nitrate [14][15][16], while sulfur oxidizing megabacteria do it through their highly concentrated vacuole-stored nitrate. Moreover, most big sulfur-oxidizing bacteria utilize stored intracellular elemental sulfur, and in some of them, phosphorus accumulations as polyphosphate have been reported [17][18][19].
According to the currently accepted classification system all discovered colorless sulfuroxidizing macro-and megabacteria e.g., [2,25,26], including the organisms of the present study, are classified in the family Thiotrichaceae [27], where the founder genus was Beggiatoa [28].
Seven filaments were initially isolated through micromanipulation from highly reduced soft HS-sediment samples, collected in 35 m depth, near the mouth of the Bay of Concepción, in central Chile. Next, the DNA from three out of seven filaments, evaluated as belonging to the same species ("Ca. V. ishoeyi"), were amplified by multiple displacement amplification (MDA) and sequenced through the Roche 454 GS FLX sequencing platform, to finally assemble these libraries in a single draft genome.
To uncover the main genomic features of the "Ca. V. ishoeyi" filaments, the potential for nitrate-reducing, sulfur-oxidizing metabolism, phosphorus, and carbon acquisition, as well as, the presence of CRISPR-Cas system, and gene clusters potentially involved in the biosynthesis of secondary metabolites, were assessed.

Sampling
Sampling was carried out on December 13, 2008, a normal El Niño Southern Oscillation (ENSO) cycle year, i.e., a non-warm El Niño and non-extremely cold La Niña year, at the benthic Station 7 (Lat. -36.64, Long. -73.04) [3] located near the mouth of the Bay of Concepcion (central Chile) at 35 m depth (Fig 1). Sediments were sampled from the 8.2 m R/L "Otilia" motor-whale boat with an in-house built mono-coring device equipped with a 1 m long, 5 cm in diameter sampling Plexiglas tube (Note: being a public space, special permissions was not required for sampling in this area. Moreover, the sampling area is not inhabited for protected species and we have not used sampling or experimental methods that endangered the place or the species inhabit there).
During a normal cold ENSO phase at the sampling site, dissolved oxygen near the bottom varies between ca. 0.2 mL L -1 during mid-summer and 1.3 mL L -1 in late winter. Over the shallow shelf at the end of the austral summer (March) maximum and minimum temperatures at surface and at 50 m depth are typically of ca. 14˚C and 11˚C, respectively. At the bottom, redox measurements showed highly reduced conditions with the abundant presence of hydrogen sulfide felt organoleptically [3,4]. Detailed hydrographic conditions data and information on the coastal sea dynamics of the area resulting from two summer cruises of the "Thioploca-Chile 1994 Expedition", carried out in cooperation with the Max Planck Institute for Marine Microbiology, Bremen, Germany; and the Chilean Navy are available [3,38].

Filament micromanipulation and amplification-sequencing of DNA
Micromanipulation of seven single narrow, non-vacuolated filaments (Fig 2) from freshly collected samples was performed following Ishoey's [39] method, and their whole genome was amplified. Genome amplifications were carried out during two separate sessions (Dec. 2008 andFeb. 2009). Firstly, samples were diluted in sterile filtered seawater to reduce cell densities. This step was followed by isolation, micromanipulation, and washing/cleaning of selected filaments in sterile filtered seawater. Next, filaments were transferred to 0.5 or 1 μL sterile PBS in a 200 μL PCR tube for whole genome amplification. Amplifications were done using the GenomiPhi HY kit (GE Healthcare) by the alkaline lysis method and were terminated after 6 hr at 30˚C. WGA products were diluted 2-fold in TE-buffer (stored at -20˚C) and a 20-fold dilution was prepared as working solution for PCR analysis and quantification. The purity of the amplified DNA was examined by PCR/sequencing of the 16S rRNA gene using universal primers 27F (5'-AGA GTT TGA TCM TGG CTC AG-3') and 1492R (5'-CGG TTA CCT TGT TAC GAC TT-3') and sequenced by Sanger sequencing. PCR products were visualized by agarose gel electrophoresis (81%) E-gels (Invitrogen) with a low DNA mass 2kb ladder. MDA and corresponding full length (~1.5 kbp) 16S rDNA PCR products are shown in S1 Fig. Analysis of the 16S rRNA sequences confirmed that MDA1, MDA2 and MDA5 (S1 Table) contained identical species of bacteria. In consequence, MDA1 and MDA2 were selected for genome analysis by sequencing of 3kb mate pair library, and MDA5 for single-end library through the Roche 454 GS FLX sequencing platform. Next, in order to assess the reads identity level, the three libraries were aligned using the software bbmap from BBTools suite tools (http://jgi.doe. gov/data-and-tools/bbtools/), which showed that~94% of the reads aligned effectively. Thus, based on the previous results, the MDA1, MDA2, and MDA5 library, were pooled to rebuild the draft genome of "Ca. V. ishoeyi".
Reads and annotation of the draft genome assembly are available through the PRJEB15360 Project id of The European Nucleotide Archive (ENA). Filament micromanipulation, amplification, and DNA sequencing were performed at Synthetic Genomics Inc. (SGI), La Jolla, CA, USA.

Genome assembly and annotation
Read sequences were converted and divided (if mate pair reads were detected) using sffToCA, a subroutine of the Celera (WGS) assembler [40]. Read quality control was performed through the FastQC (www.bioinformatics.babraham.ac.uk/projects/), which provides for a fast and easy display of quality. Then, low-quality reads and noncoding DNA fragments were filtered and removed through Prinseq-lite software (http://prinseq.sourceforge.net).
De novo genome assembly was carried out using Celera assembler on a Rocks cluster of 64 cores. The main assembly metrics were estimated by a custom python script and scaffolds >1000 bp were selected for gene prediction.
In order to determine possible contamination, the draft genome obtained was subjected to binning analysis using the MetaWatt software [41]. Moreover, the Amphora2 software [42] was used to evaluate the presence of 32 bacterial marker genes. The genome completeness of "Ca. V. ishoeyi" was assessed using the number of identified tRNAs, in reference to the number of tRNAs present in the complete genome of Thioploca ingrica (44 tRNAs-1 contig) [33] and Beggiatoa leptomitiformis (47 tRNAs-1 contig) [37], next to the presence of 139 single-copy genes [43].
Coding DNA sequences (CDSs) were predicted using a combination of the RAST platform [44] and the stand-alone Prokka program (rapid prokaryotic genome annotation) [45].

Phylogenetic analysis
After genome assembly, the 16S rRNA gene (MBHS_03239-1,537 bp in length) was identified using Barrnap. The phylogenetic tree was built using the 16S rRNA gene sequence identified in the genome of "Ca. V. ishoeyi", together with 16S rRNA gene sequences of 37 representative sulfur-oxidizing bacteria, obtained from NCBI. Sequences were aligned using Clustalw2 (http://www.clustal.org/clustal2/) and the phylogenetic tree was built through the bayesian approach with MrBayes tool [55], using Markov chain Monte Carlo (MCMC) method to estimate the a posteriori probability, one million of generations, GTR (General Time Reversible) as substitution model (substitution rate = 6), highest likelihood, and discarding 25% of the sampled trees. The 16S rRNA gene sequences of Thiothrix nivea (L40993), Leucothrix mucor (X87277), and Achromatium oxaliferum (L42543), were used as outgroup. The a posteriori probability was represented in percentage from 0 to 1 over the nodes.
The amino acid sequences were aligned using mafft v7.221 [56], and the phylogenetic tree was built with the same method used for the 16S rRNA gene sequences, but protein as nucmodel and 500.000 generations.

Results and discussion
Metrics and general features of the draft genome of "Ca. V. ishoeyi" The sequencing of two 3 kbp mate pair libraries and a single-end library rendered 377 Mbp in 1,175,553 raw reads with 240 bp in average for mate pair libraries and 450 bp for the singleend library. After the preprocessing stage, 244,880 and 245,912 reads for the 3 kbp mate pair libraries and 476,770 reads for the single-end library were obtained.
De novo assembly was applied to obtain a draft genome, through the Celera assembler, taking as input two mate-pair libraries and a single-end library. A genome size of 5.7 Mbp (Table 1), 719 contigs in 495 scaffolds, a N50 of 30.9 kbp for contigs and 70.2 kbp for scaffold, 41.1% of GC content, and 34X of coverage, was produced.
In the draft genome of "Ca. V. ishoeyi" (Fig 3), 4,767 coding sequences (CDSs), 5 rRNA (one 16S, one 23S, and three 5S), 41 tRNA, and one tmRNA, were identified. Out of 4,767 CDSs, 1,939 corresponded to hypothetical proteins leaving 2,828 CDSs encoding for proteins with a known function. Three sequences of 5S rRNA, each one of 109 bp in length, were identified, a feature that may respond to the nature of the assembly, which resulted from pooling the DNA of three different "Ca. V. ishoeyi" filaments. The 23S rRNA gene was broken in two separated segments of ca. 1,600 bp in a contiguous region (MBHS_03236-37).

16S rRNA phylogeny
The 16S rRNA gene is the most utilized molecular marker for bacteria identification since it possesses a constant functionality and it is thus considered a valid molecular chronometer, essential to inferring precise phylogenetic relationships among organisms [57-60]. The phylogenetic analysis was made using the 16S rRNA gene sequence identified in "Ca. V. ishoeyi" and the 16S rRNA gene sequences of 37 representative sulphur-oxidizing colorless bacteria. Further, three sequences belonging to Thiothrix nivea (L40993), Leucothrix mucor (X87277), and Achromatium oxaliferum (L42543), were used as outgroup. The phylogenetic tree (Fig 4) shows that "Ca. V. ishoeyi" affiliates to the root of the tree in a monophyletic clade composed of the marine Beggiatoa sp. MS-81-6 [61], Beggiatoa sp. Arauama I, Beggiatoa sp. Arauama II, from hypersaline coastal lagoon Araruama [62], the marine Beggiatoa sp. 35Flor [63], the uncultured Beggiatoa sp. HMW from the Hakon Mosby mud volcano area [64], and close to Beggiatoa sp. MS-81-1c [61]. Within the clade, "Ca. V. ishoeyi" shares between 90% and 91% sequence identity with the other 16S rRNA gene sequences, and have even less sequence identity with narrow freshwater Beggiatoa or the wide, vacuolated marine sulfur-oxidizing bacteria. The species close to "Ca. V. ishoeyi" are "narrow" marine or from hypersaline systems, non-vacuolate macrobacteria from different sources, i.e., Beggiatoa sp. Araruama 1 II and Beggiatoa sp. Araruama I (6.5 to 12.3 μm and 2.4 to 6.5 μm in diameter, respectively) both isolated from a narrowly ocean-connected, organically enriched, hypersaline lagoon [62]; the 4-5.  a microbial consortium of a black band disease of an epibenthic scleractinian coral [63]; the uncultured Beggiatoa spp. HMW (6-10 μm in diameter) collected from the Hakon Mosby mud volcano in Barents Sea (1,250 m depth) [64]; and Beggiatoa sp. MS-81-1c, (1.6-2.2 μm in diameter) also from the marine-influenced Sippewissett Marsh near Woods Hole, Massachusetts, USA. Thus, "Ca. V. ishoeyi" shares a similar, macrobacteria morphology and a marine or marine-related habitat with the related organisms within this clade.
It should also be emphasized that "Ca. V. ishoeyi" is clearly close to the tree root, which is consistent with the old age, at least Mesozoic, of the HS off Chile [12]. Also, consistent with the consensus criterion to define a new prokaryotic species, less than 97% in identity of the 16S rRNA gene sequences, which corresponds to 70% DNA-DNA hybridization [65], and less than 95% for a new genus [66], "Ca. V. ishoeyi" appears as a valid new genus and species within the Thiotrichaceae family.
Gene prediction for "Ca. V. ishoeyi" indicates the presence of a dissimilatory nitrate reduction (denitrification) pathway leading to N 2 as final product, and likely ammonification, which is the dissimilatory reduction of nitrate to ammonia (DNRA) (Fig 5). The first step into denitrification is conducted by the respiratory nitrate reductase (NAR) and also the periplasmic nitrate reductase (NAP) enzymes, while NAP can as well acts in DNRA [71]. In the genome of "Ca. V. ishoeyi", napAB genes (MBHS_00506 and MBHS_00509), encoding for a periplasmic nitrate reductase were identified. On the contrary, narGHJI genes encoding for NAR enzymes were not identified in "Ca. V. ishoeyi". The same result was obtained with the macrobacteria Beggiatoa alba B18LD (BioProject PRJNA224116), which only harbors NAP enzyme encoding genes. Conversely, in megabacteria "Ca. Maribbegiatoa sp." [31,32], "Ca. Isobeggiatoa divolgata" [30], "Ca. Thiomargarita nelsonii Bud S10" [35], and in "Ca. Thiomargarita nelsonii Thio36" [34], NAR and NAP encoding genes have been identified. NAP is expressed under aerobic and anaerobic conditions [72], but its role is not totally clear, since it has been found predominantly associated to aerobic denitrification [72]. Moreover, in nitrate ammonifiers NAP is highly effective under low nitrate conditions and its effectiveness is even higher when a reduced carbon source is available for bacterial growth [71]. However, Li et al. (2012) [73] reported that in Magnetospirillum gryphiswaldense NAP is required for anaerobic growth instead of NAR. On the other hand, Mußmann et al. (2007) [30] state that NapAB could allow a large vacuolate filamentous Beggiatoa, to respire nitrate at low nitrate concentrations, or even under aerobic conditions. Two copies of the nirS gene (MBHS_03552-03560), encoding for nitrite reductase precursors, were identified close to each other in scaffold scf_477. In a contiguous position, both the nirT (MBHS_03559) and the nirQ (MBHS_03566) genes were identified that encode for regulatory components. The nitrite removal from the cell is essential, since this compound is generally toxic to bacteria, thus, if the process of nitrite diffusing through the membrane is not efficient, it becomes necessary to convert nitrite to a less toxic form such as N 2 O (nitrous oxide), N 2 (dinitrogen), or NH 4 + (ammonium). It should be noted that the gene encoding for the pentaheme catalytic subunit (NrfA), involved in dissimilatory nitrite reduction to ammonium [74] was not identified in "Ca. V. ishoeyi". However, the presence of an nrfH homologously encoding a subunit of nitrite reductase complex (MBHS_04162), which mediates the electron transfer from the quinol to the catalytic subunit (NrfA), was identified. The NrfH protein has been originally described in the anaerobic epsilon-proteobacterium Wolinella succinogenes, predicted to be a membrane-bound tetrahaem cytochrome c belonging to the NapC/ NirT family [75]. Thus, the presence of genes encoding for the enzymes NrfH and NapAB suggests that "Ca. V. ishoeyi" could obtain energy through nitrate reduction to ammonia, besides denitrification. Both these capacities are of interest with respect to understanding fundamental regional oceanographic processes, in particular among the traditionally high productivity regions, such as the Humboldt Current ecosystem and the Benguela Current ecosystem, where sulfur oxidizing bacteria are present today, but may have been even more dominant in the geological past [76]. Similar ways for nitrite reduction have been reported for "Ca. Maribeggiatoa sp." [32] and "Ca. Isobeggiatoa divolgata" [30], which, besides both NAR and NAP enzymes present a candidate gene that would encode for an octaheme cytochrome nitrite reductase capable for DNRA. On the other hand, "Ca. Thiomargarita nelsonii Thio36" [34], "Ca. Thiomargarita nelsonii Bud S10" [35], Thioploca ingrica [33], and Beggiatoa alba B18LD, contain genes encoding for the assimilatory nitrate reductase (nasA) and nitrite reductase (nirBD) enzymes, involved in both assimilatory and dissimilatory pathways. Conversely, nasA nor nirBD genes, were identified in the draft genome of "Ca. V. ishoeyi". The third intermediate in the denitrification pathway is nitrous oxide (N 2 O), generated by the nitric oxide reductases. In the genome of "Ca. V. ishoeyi", the norC gene (MBHS_03562), encoding for nitric oxide reductase subunit C and the norB gene (MBHS_03563), encoding to nitric oxide reductase subunit B, were identified in a contiguous segment of the genome.
Evidence suggests that "Ca. V. ishoeyi" would be capable of respiring nitrate through both pathways: denitrification and DNRA. These alternatives give "Ca. V. ishoeyi" the metabolic versatility needed by an organism facing the challenges arising in a very dynamic and variable environment, including a great variation in redox conditions, even in the microscale. Therefore, it is highly probable that "Ca. V. ishoeyi" can comfortably occupy the sediment surface, but also could sustain anoxic periods and/or occupy deeper anoxic sediment conditions.

Sulfur oxidation
Regarding the fact that the HS is a highly sulfidic environment and intracellular sulfur globules could be observed in "Ca. V. ishoeyi" filaments under the microscope, adaptation to sulfide oxidation could be expected. Accordingly, we were able to identify most of the genes (S2 Table) required for the oxidation of sulfide, sulfur and thiosulfate to sulfate, shared by many free-living Gammaproteobacteria such as; Thiomicrospira, Halothiobacillus, and Beggiatoa [77][78][79].
Sulfur bacteria separate the two half reactions and oxidize sulfide to elemental sulfur using nitrate when present in lower, sulfidic layers of the sediment, after which filaments move up to the oxic zone where the stored elemental sulfur is oxidized to sulfate using oxygen [20,30].
The prediction for fccAB genes and a homolog of the sqr gene in "Ca. V. ishoeyi" would allow regulating the sulfide oxidation, depending on the environment conditions. It is reasonable to hypothesize that "Ca. V. ishoeyi" has been shaped by the evolution and selection to thrive in the upper microoxic benthic layers, under oxygen deficient water masses such as those of the Peru-Chile Subsurface Countercurrent [24].
"Ca. V. ishoeyi" appears to be capable of thiosulfate oxidation by the Sox (sulfur oxidation) system which consists of three core parts with SoxAX, SoxYZ, and SoxB [84][85][86], and which are suggested essential to thiosulfate oxidation in this system [33,87]. Two copies of the gene coding for SoxAX (MBHS_04050 and MBHS_04079) were identified in two contiguous scaffolds, adjacent to the genes encoding for SoxY and SoxZ (MBHS_04077-04078) proteins. However, the SoxB encoding genes or homologous sequences were not identified, perhaps due to assembly errors in the draft genome, or lack of sequence coverage to assemble into a contig of >1000 bp. Moreover, in "Ca. V. ishoeyi" the SoxCD encoding genes or homologous sequences were not identified, which is accordance with reported observations that sulfur-oxidizing bacteria typically lacks the SoxCD genes [88,89], when the DSR system genes is present, except in environmental fosmid sequences [33].
Our results from the genome analysis confirm that "Ca. V. ishoeyi" can obtain energy from sulfide and sulfur oxidation, maintaining a chemolithoautotrophic mode of life, where its intracellular sulfur globules would function as a stored energy source.

Oxygen and dimethyl sulfoxide respiration
The draft genome of "Ca. V. ishoeyi" encodes the high-affinity Cbb3-type cytochrome c oxidase subunit CcoP2 (MBHS_02490). The Cbb3-type cytochrome c oxidase is activated under microoxic conditions to oxidize inorganic sulfur species [30,90]. Thus, the presence of the subunit CcoP2 from the Cbb3-type cytochrome c oxidase suggests that "Ca. V. ishoeyi" would be able to use oxygen as terminal electron acceptor under microoxic conditions to growth. Conversely, the low-affinity cytochrome c aa3-oxidase, is activate under high oxygen conditions [30], was not identified in "Ca. V. ishoeyi". Additionally, "Ca. V. ishoeyi" could respire dimethyl sulfoxide (DMSO), due to the presence of the dsmA (MBHS_03149) and dsmB (MBHS_03148) genes, which encode for DMSO reductases. The DMSO reduction represents an alternative respiration pathway, since (CH 3 ) 2 OS is found in significant concentrations in aquatic environments, sometimes representing the most abundant methylated sulfur compound present [91].
In Thioploca ingrica a complete ccoNOQP gene cluster, encoding for the high-affinity cytochrome c bb3-oxidase, as well as the DMSO reductases, have been identified, but no genes encoding for the low-affinity cytochrome c aa3-oxidase [33]. In addition, the recently published genome of "Ca. Thiomargarita nelsonii Thio36" [34] harbors genes encoding for highand low-affinity cytochrome c oxidases, as well the dmsA gene. Besides, in "Ca. Isobeggiatoa divolgata" and "Ca. Parabeggiatoa communis", both high-and low-affinity terminal oxidases, plus dmsABC genes were identified [30]. Thus, results indicate that "Ca. V. ishoeyi", as several other sulfur-oxidizing macrobacteria have a flexible response when facing changes in oxygen concentration, being able to use different respiration pathways. This plasticity in respiratory pathways would allow "Ca. V. ishoeyi" inhabiting the variable oceanographic conditions over the HS [92].

Carbon metabolism
Most of currently available Thiotrichaceae genomes harbor protein-coding genes for the tricarboxylic acid cycle (TCA cycle) and the genome of "Ca. V. ishoeyi" is no exception, harboring genes coding for almost the full set of proteins that constitute the TCA cycle (S3A Table). Among them, we found the sucA (MBHS_00167) and sucB genes (MBHS_01480-04108), encoding for the 2-oxoglutarate dehydrogenase complex, an essential enzyme in the TCA cycle. The absence of this enzyme has been indicated as a key feature for obligate autotrophy [93], thus its presence would be characteristic of an organism living under conditions of variable amounts of sulfur that could take advantage of a variety of organic compounds, such as carboncontaining molecules or other energy sources, as a facultative chemolithoautotrophic. Moreover, according to earlier studies on the marine non-vacuolate Beggiatoa, the ability to utilize a wide range of organic compounds was described [94]. On the other hand, the presence of the complete protein-coding genes for the TCA cycle has been reported in other related genomes i. e., Thioploca ingrica, Beggiatoa sp. 35Flor, and "Ca. Thiomargarita nelsonii Thio36" [34].
It should be noted that in "Ca. V. ishoeyi" most of the genes required to encode the glycolysis pathway were identified (S3B Table). However, "Ca. V. ishoeyi" lacks the pdhAB genes, coding for pyruvate dehydrogenase; the pdhC gene, coding for dihydrolipoamide acetyltransferase, and the ppgk gene, coding for polyphosphate glucokinase. A similar condition has been reported for Beggiatoa sp. 35Flor [95]. The lack of these genes could be a feature of the non-vacuolated sulfur-oxidizing bacteria. However, additional genomic studies are required to discard the possibility that absence of these genes is due to incomplete assembly.
The ccbM gene (MBHS_00487) coding for form II of the ribulose-bisphosphate carboxylase oxygenase (RubisCO), which mediates CO 2 fixation, was identified in "Ca. V. ishoeyi". Besides this finding, five other genes coding for enzymes of the Calvin-Benson-Bassham cycle (S3C Table), were identified in "Ca. V. ishoeyi", but the gene coding for form I RuBisCO was not found. On the other hand, aceA and aceB genes, coding for key enzymes in the glyoxylate cycle: isocitrate lyase and malate synthase, were not identified in "Ca. V. ishoeyi". The glyoxylate cycle can bypass the TCA, to form glyoxylate and succinate from isocitrate for gluconeogenesis, using compounds such as acetate. The full set of genes coding for enzymes that participate in the glyoxylate cycle has been identified in the genome of Beggiatoa sp. 35Flor, Beggiatoa alba B18LD, and Thioploca ingrica.
The protein-coding gene for acetate assimilation, actP (MBHS_02529), which encodes for a cation/acetate symporter, and two copies of the acsA gene (MBHS_00422-03461), encoding for an acetyl-CoA synthetase, which mediates the transformation from acetate to acetyl-CoA, were, identified in "Ca. V. ishoeyi".
These findings suggest that "Ca. V. ishoeyi" could behave as a facultative chemolithoautotroph, being able to use acetate as an alternative energy source, and CO 2 fixation, using inorganic carbon source from sediments, such as bicarbonate to form biomass.

Phosphorus metabolism
The ability of Beggiatoa to store phosphorus as polyphosphate has been experimentally documented for the narrow non-vacuolated Beggiatoa sp. strain 35Flor [18,19], and was further demonstrated in natural samples of large vacuolated megabacteria [17][18][19]. Therefore, natural communities of macro-and megabacteria can play an important role in removing phosphorus from the marine environment and releasing it back in, a feature already proposed earlier following the observation of extensive Thioploca mats and the abundance of large phosphorite deposits off western South America [76,96]. Moreover, these findings, in combination with reported oceanographic conditions prevailing in the Proto-Humboldt Sulfuretum could add a new potential explanation for the unusual phosphatic preservation of Oxfordian Jurassic marine life as found in the "Cordillera Domeyko", in northern Chile [12].
According to the annotation, 20 CDSs related to the phosphorus metabolism were identified in the genome of "Ca. V. ishoeyi", among them, genes encoding for the transport, biosynthesis, and degradation of polyphosphates. The pitA gene (MBHS_01834), identified in "Ca. V. ishoeyi" coding for a phosphate inorganic transport (Pit), is indicated as a low-affinity transporter taking up inorganic phosphate through the phosphate transport system. However, the high-affinity transporter (pstSBCA) was not identified. Moreover, the polyphosphate kinase (PPK2) encoding gene (MBHS_02186) was identified in the scf_436 scaffold. The synthesis of polyphosphates in prokaryotes is mediated mostly by PPK1, which was not identified in "Ca. V. ishoeyi", and PPK2 which prefers ATP rather than GTP (guanosine-5'-triphosphate), and Mg 2+ over Manganese. PPK2 has also been found in Pseudomonas aeruginosa as an enzyme catalyzing GTP synthesis from GDP and polyphosphate. Conversely to PPK1, PPK2 preferentially catalyzes the reverse reaction [97]. Once the polyphosphate is synthesized, it may be stored intracellularly and generate phosphate from the degradation of polyphosphate by exopolyphosphatase (PPX), which catalyzes the hydrolysis of terminal phosphate residues from long chain polyphosphate [98]. Then, the phosphate may be transported out of the cell by the Pit transport system. The ppx gene (MBHS_00946) was identified in "Ca. V. ishoeyi", which encodes for PPX. In other sulphur-oxidizers such as Thioploca ingrica, ppk and ppgk genes, encoding for polyphosphate glucokinase have been identified [33]. Ppgk catalyses the phosphorylation of glucose and glucosamine to glucose-6-phosphate and glucosamine-6-phosphate respectively, however, it was not identified in "Ca. V. ishoeyi". Also, ppgk has been reported in "Ca. Isobeggiatoa" and "Ca. Parabeggiatoa" from Eckernförde Bay, Germany, Baltic Sea [30]. Furthermore, under laboratory conditions, Brock et al. (2012) [19] confirm the presence of polyphosphate inclusions in Beggiatoa sp. 35Flor which are enclosed by a lipid layer and store cations. The same authors further suggest that these inclusions "represent a new type of membrane-surrounded storage compartment within the genus Beggiatoa, distinct from the mostly nitrate-storing vacuoles known from other marine sulfur-oxidizing Thiotrichaceae".
The presence of genes related to transport, biosynthesis, and degradation of polyphosphates suggests that "Ca. V. ishoeyi" would be capable of storing polyphosphates, however, experimental testing including the visualization of potential phosphate inclusions as suggested by Brock et al. (2012) [19] would be necessary to confirm this.

CRISPR-Cas system
The CRISPR-Cas (clustered regularly interspaced short palindromic repeats-CRISPR-associated proteins) modules are adaptive immunity systems that are encoded by most Archaea and many Bacteria to act against invading genetic elements such as bacteriophages and plasmids [99,100].
Through the CRISPRFinder tool [51], 18 CRISPR repeats or Direct Repeats (DRs) were identified in "Ca. V. ishoeyi". These consist of a succession of highly conserved regions (S4A Table), which varied between 26 and 35 bp. Remarkable is the CRISPR repeat DR-16 ("GTTGACTGCCGCACAGGCAGCTTAGAAA") consists of 28 bp and 110 spacers, which covers a size of 6,750 bp in the scf_483. A total of 33 CRISPR-associated (Cas) protein encoding genes, were detected in "Ca. V. ishoeyi" (S4B Table), most of them, associated with a DR. Cas proteins are directly implicated in different stages of the processing of CRISPR loci transcripts, i.e., cleavage of the target DNA or RNA and new spacer integrations which correspond to alien DNA fragments that are included in the CRISPR cassettes [100,101]. Thus, in "Ca. V. ishoeyi" genes encoding for proteins such as: Cas4/endonuclease, Cas1 fusion (MBHS_00983), non-defined Cas; and RAMP Superfamily protein (MBHS_00987-00989-00990-009929, and Cmr5 protein (MBHS_00991) were identified associated to DR-4. Moreover, flanking to the DR-8 CRISPR repeat, a noteworthy gene array, coding for an enzymatic complex of CRISPR-Cas, was identified (9 spacers). Here, genes coding for the key protein Cas1 (MBHS_01576) that allows the CRISPR system memorize previous contacts with infectious agents [101], and cas3 (MBHS_01575), coding for CRISPR-associated nuclease/helicase Cas3, which acts as a single-strand nuclease and helicase in the CRISPR-Cas immune system [102], together with the protein-encoding genes cys1, cys2, cys3, and cas6f (MBHS_01573-01572-01571-01570), were identified. This gene array conforms the canonical type I-F cas, which encode for a I-F CRISPR-Cas system. The type I-F CRISPR-Cas system has been identified and well characterized in bacteria such as Pseudomonas aeruginosa [103], where it has been demonstrated that under lab conditions it is capable of acquiring protective spacers during challenges by the bacteriophage DMS3vir [104].
Indeed, the numerous CRISPR repeats detected could well reflect the pressure that bacteriophages have exerted on these organisms along the geological time scale. Morphologically similar to "Ca. V. ishoeyi" fossil prokaryotic filamentous forms have been cited between ca. 3,496 and 2,560 million years ago within the Archean Eon by Schopf [8]. These findings have given rise to a whole system of generic names such as: Archeoscillatoriopsis sp. (4-16 μm in diameter), and binomial designations such as Archaescillariopsis disciformis (ca. 3-5 μm in diameter). Significantly, according to Schopf's compilation work, broad tubular and sheath-like structures first appeared 2,560 and 2,516 million years ago and were named Siphonophycus transvaalense (ca. 10-28 and 15-27 μm in diameter), perhaps signaling the advent of oxygen at the bottom and the setting of a selection process that lead to a vacuolate, Thioploca-like morphology.
The presence in "Ca. V. ishoeyi" of canonical type I-F cas genes, together with eighteen CRISPR repeats suggests with high probability that "Ca. V. ishoeyi" is capable of generate a strong defense against bacteriophages. In addition, considering the proximity of "Ca. V. ishoeyi" to the root of the phylogenetic tree and the old age of sulfuretum ecosystems on Earth, we here tentatively posit that, besides being a good taxonomic discriminator and a measure of the actual bacteriophage pressure, the presence and complexity of the CRISPR-Cas system could be a good measure of the species age and the ecological maturity of a given prokaryotic community, with advantages for taxonomy and nomenclature.

Secondary metabolites
Although not thoroughly investigated in sulfur-oxidizing bacteria, several enzyme-encoding gene clusters involved in the biosynthesis of secondary metabolites have been identified, i.e., non-ribosomal peptide synthetase (NRPSs), and polyketide synthetase (PKSs) genes [30], presumably originating from cyanobacteria, as suggested after genome analysis of other vacuolated sulfur bacteria [30,31]. For example, a new macrolide compound, denominated Macplocimine A, was isolated from massive Thioploca-dominated mat samples, collected off central Chile [105].
In "Ca. V. ishoeyi", the presence of gene clusters related to the biosynthesis of secondary metabolites was assessed through the antiSMASH tool [47][48][49]. A gene cluster of 21,082 bp containing genes for the synthesis of a terpene type compound, which includes a wide range of complex natural products such as toxins, repellents, and attractants [106], and an undetermined gene cluster of 41,958 bp, were identified in "Ca. V. ishoeyi". The terpene gene cluster candidate ( Fig  7A) contains a key gene sequence coding for a phytoene synthase (MBHS_04759). On the other hand, the undetermined gene cluster has a sequence identified as pksj gene (MBHS_04255), encoding Adenylation (A) and Carrier Protein (Thiolation) domains, similar to a non-ribosomal peptide-synthetase (NRPS) module. However, it lacks the condensation (C) domain to complete a typical NRPS module (Fig 7B). Therefore, it is highly probable that the enzyme encoded by the pksj gene is involved in the biosynthesis of some non-canonical secondary metabolite, e.g., in non-  Table. https://doi.org/10.1371/journal.pone.0188371.g007 proteinogenic amino acid biosynthesis, such as many enzymes with A-CP domain structures that are not 'real' NRPSs (Personal communication from Dr. Marnix Medema, Wageningen University, Netherlands).
About 50,000 terpenoid metabolites within nearly 400 different structural families are known, mostly isolated from plants and only a few originating from prokaryotes. However, in recent studies, using bacterial proteins from public databases [107], 262 putative bacterial terpene synthases were identified. The phytoene synthase is a key enzyme in biosynthesis of carotenoids, natural pigments synthesized by plants, and some microorganisms that carry out important physiological functions. Thus, they are essential for photosynthesis and photoprotection in photosynthetic organisms, while in non-photosynthetic organisms, they take part in mitigating photooxidative damage [108]. Despite the presence of phytoene synthases in several Thiotrichaceae, the possible role in them and in "Ca. V. ishoeyi" is not clear since they are neither photosynthetic organisms not are they exposed to sunlight. Therefore, its presence and function is an issue to be clarified.

Conclusions
The draft genome of a marine, non-vacuolated filamentous macrobacterium, inhabiting strongly reduced soft sediment, in the Humboldt Sulfuretum, off central Chile, was successfully obtained.
Here the new bacterium, "Candidatus Venteria ishoeyi" ("Ca. V. ishoeyi"), within the family Thiotrichaceae, is proposed. The phylogenetic analysis indicated that the "Ca. V. ishoeyi" affiliated closely with other marine and hypersaline narrow non-vacuolate Beggiatoa spp. close to the phylogenetic root of the family Thiotrichaceae. Although "Ca. V. ishoeyi" lacks a nitrate-storing vacuole, genes coding for enzymes enabling both, denitrification and possibly also ammonification, were identified, which is a feature that was already detected in some other members of the family of large sulfur bacteria [34,35]. Besides, genes for sulfur oxidation, oxygen, and dimethyl sulfoxide respiration under micro-oxic conditions, and organic and inorganic carbon fixation, were identified in "Ca. V. ishoeyi", suggesting a facultative chemolithoautotrophic lifestyle. Moreover, judging from the possession of a comparatively richly endowed CRISPR-Cas system, it is suggested that within its ancient and competitive habitat, "Ca. V. ishoeyi" in time has undergone, and presently is, subjected to great pressure from bacteriophages, within the highly diversified HS filamentous microbial community, which according to the micropaleontology may have already existed for Eons.  Table. S4A. List of CRISPR repeats identified in "Ca. V. ishoeyi". The table shows: scaffolds containing start and end of CRISPR repeat regions, Direct Repeat (DR) sequences, length of the DR, and number of spacers. S4B. CRISPR-associated protein encoding genes identified in "Ca. V. ishoeyi". (PDF) S5 Table. S5A. Gene and product names of the Terpene type gene cluster. S5B. Gene and product names of the gene cluster, related with the potential biosynthesis of a non-classified secondary metabolite. (PDF)

Acknowledgments
Firstly, we deeply thank Dr. J. Craig Venter for his fundamental contribution to the advancement of knowledge on oceanic bacteria through his Sorcerer II Expedition, and, in particular, of the benthic large filamentous bacteria living within the Humboldt Sulfuretum off the central coast of Chile. We further acknowledge the valuable support from the international program Census of Marine Life (CoML) and its project International Census of Marine Microbes (ICOMM). VAG acknowledges the facilities offered by the National Taiwan Ocean University, Keelung, Taiwan.