Functional Divergence in the Genus Oenococcus as Predicted by Genome Sequencing of the Newly-Described Species, Oenococcus kitaharae

Oenococcus kitaharae is only the second member of the genus Oenococcus to be identified and is the closest relative of the industrially important wine bacterium Oenococcus oeni. To provide insight into this new species, the genome of the type strain of O. kitaharae, DSM 17330, was sequenced. Comparison of the sequenced genomes of both species shows that the genome of O. kitaharae DSM 17330 contains many genes with predicted functions in cellular defence (bacteriocins, antimicrobials, restriction-modification systems and a CRISPR locus) which are lacking in O. oeni. The two genomes also appear to differentially encode several metabolic pathways associated with amino acid biosynthesis and carbohydrate utilization, and these differences have direct phenotypic consequences. This would indicate that the two species have evolved different survival techniques to suit their particular environmental niches. O. oeni has adapted to survive in the harsh, but predictable, environment of wine that provides very few competitive species. However, O. kitaharae appears to have adapted to a growth environment in which biological competition provides a significant selective pressure by accumulating biological defence molecules, such as bacteriocins and restriction-modification systems, throughout its genome.


Introduction
Oenococcus kitaharae is a lactic acid bacterium (LAB) that was recently isolated from composting distilled shochu residue [1]. This species represents only the second member of the genus Oenococcus to be identified, with Oenococcus oeni, the founding member of this genus, being reclassified from Leuconostoc oenos by Dicks et al. in 1995 [2]. Whereas little is known regarding the biology or ecology of O. kitaharae, O. oeni plays a pivotal role in the production of wine (its almost exclusive habitat) where it is responsible for performing malolactic fermentation (MLF) [3]. However, initial phenotypic comparisons would indicate that the environmental niche of O. kitaharae is very different to that of O. oeni. The two species display markedly different pH (6.0 to 6.8 versus 4.8, respectively) and temperature (30 °C versus 22 °C, respectively) optima and O. kitaharae is also incapable of growth in concentrations of ethanol routinely found in wine [1].
Given the importance of LAB to the food and beverage industries, it is not surprising that this group of bacteria has been the focus of extensive research, including several genome sequencing efforts. These have resulted in a broad phylogenetic genome sequencing survey of eight LAB genera (covering over 80 species) and, of particular relevance to the study of O. kitaharae, an intra-specific study of the genomes of three individual O. oeni isolates [4-6]. The results of these preliminary comparative genomic studies indicated that the LAB group harbours extensive genetic variation, such that even within single species such as O. oeni, coding potential can be over 10% different between any two strains [6].
In order to expand our understanding of the LAB group and to provide a point of comparison for understanding the genome dynamics of O. oeni, we have sequenced the genome of the O. kitaharae type strain DSM 17330 [1]. Comparisons between the Oenococcus genomes uncovered several genetic differences that have the potential to translate into important points of interspecific phenotypic differentiation. These include several major metabolic differences such as the ability to ferment maltose, citrate and malate and the ability to synthesize specific amino acids such as L-arginine and L-histidine. In addition to these metabolic differences, the O. kitaharae genome also encodes many proteins involved in defence against both bacteriophage (restriction-modification and CRISPR) and other microorganisms (bacteriocins), and has had its genome populated by at least two conjugative transposons. This is in contrast to currently available genome sequences of O. oeni, which lack the vast majority of these defence proteins. It therefore appears that the genome of O. kitaharae has been shaped by its need to survive in a competitive growth environment that is vastly different from that encountered by O. oeni, where environmental stresses provide the greatest challenge to growth and reproduction.

Results
The O. kitaharae DSM 17330 genome was assembled from 16 × 10^6 Illumina paired-end reads (500 bp spacing) into 17 contigs, comprising 12 single-copy contigs in addition to five contigs that were present in two copies each (with three of these having 100% identity). Through the application of paired-end information and the precise order of the unique sequences bounding these repeats, these 17 contigs were manually arranged into two replicons (Fig. 1). The first of these is a 1.84 Mb circular chromosome which, due to the presence of a highly repetitive cluster that is associated with the coding region of a serine-repeat protein, contains one assembly gap. The second replicon is predicted to be an 8.3 kb plasmid, which, on the basis of sequencing coverage, is predicted to be present in low copy number (~2 copies per cell, data not shown).
Genome annotation using a combination of Glimmer [7] and RAST [8] identified a total of 1833 predicted open reading frames (ORFs), four rRNA genes (two copies of each of the large and small ribosomal subunits), which are identical in sequence to the previously published ribosomal sequences for O. kitaharae [1], and 44 tRNA genes (see Table S1).  Fig. 2A), while the remainder display no recognizable homolog and either represent novel protein sequences or false positives of the ORF prediction methodology applied.
In order to compare the phylogenetic position of O. kitaharae as determined from 16S rDNA sequencing [1] with that based upon whole genome data, homologs of each of the predicted ORFs of O. kitaharae were sought from whole genome sequence annotations of thirteen strains of LAB representing the genera Enterococcus, Lactobacillus, Lactococcus, Leuconostoc, Oenococcus, Pediococcus and Weissella. A total of 561 of the O. kitaharae ORFs were shown to be conserved across all thirteen strains (BLAST e-value < 10^-20, minimum 50% coverage of the query protein). Of these, 95 were then selected based on high sequence conservation and lack of potential paralogous sequences (see Table S2 for a complete list of the protein sequences used). The individual protein alignments produced from these conserved groups of ORFs were then concatenated and used to construct a single maximum-likelihood phylogeny (Fig. 2B). The result of this analysis was consistent with phylogenies based upon 16S rDNA [1,9,10] and positions O. kitaharae as a clear sister species to O. oeni, with both Leuconostoc spp. and Weissella spp. being the next closest evolutionary relatives.
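The conservation screen described above (e-value below 10^-20 and at least 50% coverage of the query protein, in every strain) can be sketched in a few lines. This is an illustrative sketch only, not the authors' custom scripts: it assumes tab-separated BLAST+ output with the hypothetical column layout `qseqid sseqid pident qlen qstart qend evalue`, and the function names are invented for the example.

```python
import csv
import io

# Assumed (hypothetical) column layout, e.g. BLAST+ run with
# -outfmt "6 qseqid sseqid pident qlen qstart qend evalue"
COLUMNS = ["qseqid", "sseqid", "pident", "qlen", "qstart", "qend", "evalue"]


def passes_screen(row, max_evalue=1e-20, min_coverage=0.5):
    """Return True if a hit meets the e-value and query-coverage thresholds."""
    aligned = abs(int(row["qend"]) - int(row["qstart"])) + 1
    coverage = aligned / int(row["qlen"])
    return float(row["evalue"]) < max_evalue and coverage >= min_coverage


def conserved_queries(tables, **thresholds):
    """Queries with at least one passing hit in every per-strain hit table.

    `tables` maps a strain name to the text of its tabular BLAST output.
    """
    per_strain = []
    for text in tables.values():
        reader = csv.DictReader(
            io.StringIO(text), fieldnames=COLUMNS, delimiter="\t"
        )
        per_strain.append(
            {r["qseqid"] for r in reader if passes_screen(r, **thresholds)}
        )
    return set.intersection(*per_strain) if per_strain else set()
```

Coverage here is computed from a single alignment per hit; a fuller treatment would merge multiple high-scoring segments of the same subject before applying the threshold.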
Chromosomal elements of "foreign" origin. The majority of the O. kitaharae genome is conserved with that of O. oeni with the exception of several large islands (Fig. 3). In all but a limited number of cases, these O. kitaharae-specific islands were also shown to lack identifiable homology with other species of LAB used in the phylogenetic constructions. In addition, the majority of these regions display a high probability of being acquired by horizontal gene transfer (HGT) [11] and would be expected to have entered the O. kitaharae genome via genetic elements such as bacteriophage or conjugative plasmids.
As previous investigations between strains of O. oeni have shown that non-conserved genomic islands can often be attributed to the differential presence of prophage elements [12], homology searches were used to identify classical prophage genes, such as those that encode conserved bacteriophage integrase and lyase proteins, in the genome of DSM 17330. While several O. kitaharae ORFs were found to be homologous to phage proteins (Table S1), each potential prophage region lacked the repertoire of proteins which would be expected for the presence of functional prophage elements [12]. In addition, in all but one case (the genomic region from 382,388 bp to 404,717 bp), these genomic islands were not found downstream of tRNA genes as is observed for O. oeni phage, which use tRNA genes as attachment sites [12].
Whereas the non-conserved genomic islands in the O. kitaharae genome do not appear to encode active phage, two of these islands have the hallmarks of genomically-integrated conjugative transposons (Fig. 4). Conjugative transposons are DNA elements that combine features of bacteriophage, transposons and conjugative plasmids [13]. These elements are often found integrated into the genome and are able to transpose to new sites in the host genome via the formation of a circular intermediate, which can sometimes replicate as a plasmid. As the name suggests, conjugative transposons also encode the ability to be horizontally transferred to new cells (and across species boundaries) via the classical conjugation pathway [13]. As yet, there have been no documented cases of conjugative transposons found in the genome of any strain of O. oeni.
The two potential conjugative transposons of O. kitaharae share a conserved core of 12 proteins. These proteins are predicted to encode the machinery necessary for conjugation and are highly homologous to a 12 kb region of the conjugative plasmid pWCFS103 from Lactobacillus plantarum [14]. In addition to the conserved conjugative core, one transposon is predicted to encode transposase-based integrative functions and the ability to replicate as a plasmid via the RepA and RepB plasmid replication proteins, whereas the other appears to use bacteriophage-based proteins for integration into the genome.

Figure 3. Homologs of O. kitaharae ORFs were sought from thirteen strains of LAB using BLAST and individual results are displayed for each strain, color-coded by individual protein identity scores. In addition, an overall median identity was calculated by applying a sliding window of syntenic ORFs (n = 9, step = 1) to obtain a median percent identity for each strain, with regions of low conservation highlighted (grey shading). Both the average GC percentage (5000 bp window, 200 bp step) and Alien Hunter foreign DNA likelihood scores [11] across the genome are also shown to compare areas of low sequence conservation with possible instances of HGT. The positions of sequences associated with toxin-antitoxin modules, phage integrase proteins, conjugative transposons and the CRISPR array are also shown. doi:10.1371/journal.pone.0029626.g003

Accompanying the replicative functions of the O. kitaharae conjugative transposons, each element is predicted to encode several proteins that can be broadly categorised as functioning as part of "cellular defence" from either foreign DNA or from competition imposed by other microorganisms. One conjugative transposon is predicted to encode two separate restriction-modification (RM) systems (one combined Type III enzyme, and one Type IIs RM pair) and the other contains at least five ORFs that are potentially involved in the production of antibacterial compounds (Fig. 4). In addition to the gene content of these potentially transmissible elements, the O. kitaharae DSM 17330 genome encodes another twelve proteins with putative roles in bacteriocin/antibacterial manufacture, transport or detoxification, four proteins involved in DNA RM and one CRISPR array which, in other bacterial species, has been shown to provide a memory-based immunity to bacteriophage infection and possibly to the transmission of plasmid DNA (reviewed in [15]) (Table 1). All of these cellular defence mechanisms lack conserved homologs in the sequenced strains of O. oeni, with the exception of a single Type III RM enzyme which is found specifically in O. oeni strain AWRIB429 [6].
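The two sliding-window summaries mentioned in the Figure 3 legend (GC percentage over a 5000 bp window with a 200 bp step, and median percent identity over nine syntenic ORFs with a step of one) could be computed along the following lines. This is a minimal sketch under stated assumptions rather than the plotting code used for the figure: the chromosome is treated as linear (windows do not wrap around the origin), only full-length windows are reported, and the function names are hypothetical.

```python
from statistics import median


def gc_percent_windows(seq, window=5000, step=200):
    """Yield (start, GC%) for each full window along a linear sequence."""
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        yield start, 100.0 * gc / window


def sliding_median(identities, n=9, step=1):
    """Median percent identity over windows of n consecutive ORFs.

    `identities` lists one value per ORF in genome order; how ORFs without
    a BLAST hit are scored (for example as 0) is left to the caller.
    """
    return [
        median(identities[i:i + n])
        for i in range(0, len(identities) - n + 1, step)
    ]
```

Using the median rather than the mean keeps a single poorly conserved ORF from marking an otherwise conserved neighbourhood as divergent.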

Phenotypic differences attributable to genomic variation
In the initial characterization of O. kitaharae [1] several phenotypic traits were noted that differentiate this new species from O. oeni. Comparative genomics reveals a basis for some of these known differences while also suggesting several additional points of phenotypic differentiation.
Sugar utilization. One of the defining biochemical differences between O. kitaharae and O. oeni that was noted in its original isolation was the ability of O. kitaharae to produce acid from maltose [1]. This trait is rare in O. oeni, which is formally classified as maltose negative [16,17]. By comparing available whole-genome annotations for O. oeni with O. kitaharae DSM 17330 [8], it was possible to identify several genes associated with sugar utilization that are differentially present across the species (Table 2). Of these, at least four genes which are present in O. kitaharae, but absent in the O. oeni genomes, are predicted to be involved in the utilization of maltose, providing a direct genetic basis for this phenotype. In addition to genes predicted to be involved in the species-specific utilization of maltose, there are several ORFs predicted to be involved in the metabolism of trehalose, D-gluconate, D-ribose and fructose that are specifically present in O. kitaharae. While the assimilation of these sugars is often carried out by specific strains of O. oeni [17], these genotypic data agree well with biochemical tests performed previously that indicated that O. kitaharae was able to utilize all of these various carbon sources [1,17].
In addition to those genes that are specifically present in O. kitaharae DSM 17330, several were identified that were present only in strains of O. oeni and which are predicted to be involved in the uptake and metabolism of arabinose and xylose (Table 2). This is consistent with the inability of O. kitaharae to produce acid from either L-arabinose or D-xylose, two biochemical reactions that many strains of O. oeni, including those for which genome sequences are available, often perform [1,17].
Amino acid biosynthesis. Both O. oeni and O. kitaharae are fastidious microorganisms that require many exogenous vitamins and amino acids. However, it appears that the O. kitaharae genome encodes biosynthetic pathways for at least two amino acids, arginine and histidine, which are lacking in O. oeni [18].
The O. kitaharae DSM 17330 genome encodes the six genes necessary for the production of arginine from glutamate via the ornithine/carbamoyl-phosphate (CP) pathway, in addition to encoding a second set of carbamoyl-phosphate synthase (CPSase) proteins (Table 3). As CP is an important intermediate in both the arginine and pyrimidine biosynthetic pathways, many bacteria, such as Lactobacillus plantarum, contain two completely separate sets of CPSase proteins [19]. In this situation, one set is encoded in an operon with genes involved in arginine biosynthesis and regulated by arginine, while the second is located in the pyrimidine biosynthetic operon and regulated by exogenous pyrimidines. O. kitaharae contains both sets of CPSase enzymes while the O. oeni genomic sequences are predicted to encode only the single pyrimidine-associated CPSase [19].
O. kitaharae also appears to encode all of the enzymes necessary for the synthesis of histidine from the pentose phosphate pathway intermediate 5-phosphoribosyl 1-pyrophosphate (PRPP) ( Table 3)  genes in the O. oeni PSU-1 genome have been identified previously that would provide this strain with the ability to convert citrate to pyruvate [4] (Table 4). These genes are absent from the O. kitaharae DSM 17330 genome leading to the prediction that, unlike O. oeni, this strain would lack the ability to ferment this organic acid.
One of the key defining biochemical features that separates O. oeni from O. kitaharae is the ability to perform malolactic fermentation. Malolactic fermentation has been shown to require the action of three proteins: a malate permease, which transports malate into the cell; the malolactic enzyme, which is responsible for converting malic acid into lactic acid; and a regulatory protein for these two downstream genes [22]. Surprisingly, the O. kitaharae genome was shown to contain genes that are orthologous to those which encode all three of these activities in O. oeni (Fig. 5A). It was subsequently shown that, while the sequences of all three genes are present in the O. kitaharae genome, the gene encoding malolactic enzyme contained a nonsense mutation that would prematurely truncate the protein coding region (Fig. 5B). The alteration of a single base in this premature stop codon would be sufficient to restore the full-length malolactic enzyme coding region (Fig. 5B), which is highly conserved with malolactic enzymes from many bacteria (Fig. S1). Furthermore, the O. kitaharae gene was shown to have a low ratio of non-synonymous to synonymous mutations (dN/dS = 0.0123) when compared with its O. oeni homologue. This would indicate that there has been limited opportunity for the unconstrained accumulation of non-synonymous mutations in the two fragments of the malolactic enzyme coding region in O. kitaharae (as would be expected in a non-functional gene undergoing random drift). It is therefore likely that the conversion of the malolactic enzyme to a pseudogene is a very recent event in O. kitaharae, and it may be possible to obtain a functional enzyme through reversion of the nonsense mutation or to find a functional malate pathway in strains of O. kitaharae other than DSM 17330.
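The statement that a single base change could restore the reading frame can be illustrated by enumerating the sense codons reachable from a stop codon by one substitution. The sketch below is a generic illustration using the standard genetic code; the text does not state which stop codon interrupts the malolactic enzyme gene, so the codon passed in is an assumption, and the function name is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# ('*' marks a stop codon).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def single_base_revertants(stop_codon):
    """Map each sense codon one substitution away from a stop codon
    to the amino acid it encodes."""
    if CODON_TABLE.get(stop_codon) != "*":
        raise ValueError(f"{stop_codon!r} is not a stop codon")
    revertants = {}
    for pos in range(3):
        for base in BASES:
            if base == stop_codon[pos]:
                continue
            codon = stop_codon[:pos] + base + stop_codon[pos + 1:]
            if CODON_TABLE[codon] != "*":
                revertants[codon] = CODON_TABLE[codon]
    return revertants
```

For any of the three stop codons, most single substitutions yield a sense codon, which is why reversion of a recent nonsense mutation is a plausible route to a full-length coding region; whether the restored residue matches the ancestral one depends on the original codon.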

Discussion
O. kitaharae and O. oeni comprise the only known members of their genus. Sequencing of the O. kitaharae DSM 17330 genome has provided important insights into the genetic diversity across this genus. These two species of Oenococcus appear to inhabit significantly different ecological niches, with O. oeni being found almost exclusively in the highly stressful environment of wine whereas O. kitaharae was isolated from a composting shochu residue of unknown nutrient composition. Accordingly, the two species have accumulated genetic adaptations that reflect different metabolic needs and environmental constraints.
Although little is known regarding the exact nutrient profile of the residue from which O. kitaharae was isolated, the average composition of the major wine metabolites is well known. Finished wine has little or no glucose, fructose or maltose but does contain significant quantities of arabinose and xylose [23,24]. Many strains of O. oeni are capable of exploiting these carbohydrates, and contain genes whose biochemical functions are consistent with this ability, but, in all but a limited number of cases [16,17], cannot utilize maltose. In comparison, O. kitaharae DSM 17330 lacks the genes required for the use of arabinose and xylose but has the ability to utilize the maltose that would be present in the feedstocks, such as barley, which are used in the production of shochu. In addition, while little is known regarding the amino acid profile of the shochu residue from which O. kitaharae was isolated, it would be predicted that these feedstocks would generally be lower in arginine and histidine than wine (where they are often amongst the most prevalent amino acids [24]) given the presence of the biosynthetic pathways for both of these amino acids in O. kitaharae. Interestingly, both the histidine and arginine biosynthetic pathways display a scattered pattern of presence throughout the LAB phylogeny, with only a limited number of species within a genus possessing these pathways (Fig. S2). It appears that there must be significant selective pressure working for and against these biosynthetic pathways in an environmentally-dependent manner across the LAB. For O. kitaharae, the evolutionary origins of both pathways are more consistent with loss of these enzymes followed by horizontal gain from a Lactobacillus-related species (Fig. S2).
Wine represents a harsh growth environment in which only a select few species of bacteria are capable of growth to significant levels [25]. O. oeni is therefore faced with little competition from other species of bacteria during its growth and its genome is almost devoid of proteins that are involved in defence against other bacteria or even against invasion by bacteriophage. In contrast, the O. kitaharae genome contains numerous proteins that potentially provide a selective advantage over other bacteria (bacteriocins/antimicrobials), defend against attack by other species of bacteria (bacteriocin immunity proteins) and defend against invasion by foreign DNA, such as that introduced by bacteriophage (restriction-modification systems and the CRISPR element). Although the biological diversity of the composting residue from which O. kitaharae was isolated was not formally evaluated, it can be assumed that there was sufficient microbiological competition for resources to justify the selective advantage for the presence of these defence compounds. This argument is further supported by the fact that at least two other novel species of LAB have been isolated from this environment, in addition to many other species of LAB that have been shown to be present during the shochu production process [1,26-29]. O. kitaharae has therefore evolved to compete in a mixed-species environment whereas O. oeni has adapted to a niche in which the extreme nature of the growth substrate has removed the majority of biological competition.
Whereas the applicability of O. kitaharae for use as an industrial species is yet to be determined, it could prove useful for the development of improved strains of its relative O. oeni. Despite the environment of wine providing protection from competition for O. oeni, it is still argued that many instances of failed malolactic fermentations are due to the action of bacteriophage on susceptible strains [30,31]. If it were possible to move genes, such as those of the CRISPR array, from O. kitaharae into O. oeni via the conjugation machinery which is predicted to be present in O. kitaharae, this could provide a non-GM means of equipping industrial O. oeni strains with general resistance to bacteriophage infection. Likewise, if the genes involved in antimicrobial production can be transferred to O. oeni, these strains could limit potential negative impacts on wine quality due to the growth of spoilage bacteria such as Pediococcus spp., Lactobacillus spp. and acetic acid bacteria, and may provide the means to reduce the amount of sulfite that is currently used for the microbial stabilization of wine [32]. The use of genes from O. kitaharae may therefore allow the production of strains of O. oeni which are not only able to thrive in the harshness of the wine environment, but are also more resistant to potential biological competition from bacteriophage or other microorganisms.

DNA isolation and sequencing
O. kitaharae DSM 17330 was obtained from DSMZ (Germany) and was grown in modified MRS media (Amyl, Australia). Genomic DNA was isolated using standard phenol-chloroform extractions. Sequencing was performed on an Illumina GAIIx using 2 × 100 bp paired-end reads with an average library size of 500 bp (Ramaciotti Centre, NSW, Australia).

Genome assembly
A total of 990,000 reads (~50-fold expected genome coverage) were randomly selected and assembled using MIRA (version 3.2.1). The MIRA output was imported into Seqman Pro (DNAstar, Madison, WI) for manual alignment and editing of the assembly. This Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession AFVZ00000000. The version described in this paper is the first version, AFVZ01000000.
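The expected coverage quoted above follows from the usual relation coverage = (number of reads × read length) / genome size. The short check below is a sketch that assumes each of the 990,000 reads is a single 100 bp read and uses the replicon sizes reported in the Results (1.84 Mb chromosome plus 8.3 kb plasmid); the function name is hypothetical.

```python
def expected_coverage(n_reads, read_length_bp, genome_size_bp):
    """Average fold coverage expected from n_reads reads of a given length."""
    return n_reads * read_length_bp / genome_size_bp


# Assumed inputs: 100 bp reads, chromosome (1.84 Mb) plus plasmid (8.3 kb).
genome_size = 1_840_000 + 8_300
cov = expected_coverage(990_000, 100, genome_size)
print(f"expected coverage: {cov:.1f}-fold")
```

Under these assumptions the result is a little above 50-fold, consistent with the approximate figure given in the text.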

Genome Annotation
Gene predictions were made using Glimmer 3.02 [7]. Gene functional annotations were performed using the RAST server [8] and BLAST [33] with comparisons to the non-redundant GenBank database. Predictions of genomic regions likely to have been acquired by horizontal gene transfer were calculated using Alien Hunter [11]. dN/dS ratios were calculated using PAL2NAL [34]. Circular genome plots were compiled using Artemis [35] and DNAPlotter [36].
Comparisons to the various LAB genomes were performed using BLAST [33] and custom written scripts. For phylogenetic analysis, proteins were first screened to ensure that they were conserved (minimum 60% identity when compared to the homologous O. kitaharae protein) in all of the LAB genomes used in this study (see Table S2). Next, proteins which had potential paralogs (which could confound the phylogeny) were identified by assigning each protein to specific orthoMCL [37] clusters and then only retaining those groups of orthologs in which each protein was the only member of a particular orthoMCL group. Individual protein alignments were then performed on each set of homologous sequences using MUSCLE [38]. These individual alignments were then concatenated into a single large sequence for each strain, which was used to construct a maximum-likelihood phylogenetic tree using PhyML [39].

Table S1 Oenococcus kitaharae genome annotation. O. kitaharae ORFs as predicted by Glimmer [7] are matched against an automated annotation and functional prediction performed using RAST [8]. Comparative analysis of each ORF was also performed using BLASTp comparisons against O. oeni PSU-1 [33], with the comparison parameters listed for each O. kitaharae ORF and its closest O. oeni PSU-1 match (if present).

(XLS)
Table S2 Evolutionarily conserved ORFs in lactic acid bacteria (LAB). Homologs of each of the predicted O. kitaharae ORFs were sought from thirteen strains of LAB using BLASTp [33]. Evolutionarily conserved proteins were classified as exhibiting sequence conservation across all fourteen species (minimum 60% identity when compared to the homologous O. kitaharae protein). This list was further refined by mapping each ORF from each species to an orthoMCL group [37] and only retaining those ORFs which were found to be unique within their particular orthoMCL cluster (which removes the potential for paralogous ORFs to interfere with the phylogeny).
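The paralog screen and alignment concatenation described in the Methods and in Table S2 can be sketched as two small steps. This is an illustrative sketch and not the authors' custom scripts: the input structures (a mapping from ortholog group to member proteins, and a list of per-protein alignments keyed by strain) and the function names are assumptions, and the 60% identity screen is taken to have been applied beforehand.

```python
def single_copy_groups(groups, strains):
    """Keep ortholog groups with exactly one protein from every strain.

    `groups` maps a group id to a list of (strain, protein_id) pairs.
    """
    wanted = sorted(strains)
    kept = {}
    for group_id, members in groups.items():
        # A group qualifies only if each strain appears once and only once.
        if sorted(strain for strain, _ in members) == wanted:
            kept[group_id] = dict(members)
    return kept


def concatenate_alignments(alignments, strains):
    """Join per-protein alignments into one long sequence per strain.

    `alignments` is a list of dicts mapping strain -> aligned sequence;
    within each alignment all sequences must have the same length.
    """
    for aln in alignments:
        lengths = {len(aln[s]) for s in strains}
        if len(lengths) != 1:
            raise ValueError("aligned sequences must share one length")
    return {s: "".join(aln[s] for aln in alignments) for s in strains}
```

Requiring exactly one member per strain discards any group with a duplicated or missing protein, so every column of the concatenated alignment compares orthologous positions across all strains.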