Recent Origin of the Methacrylate Redox System in Geobacter sulfurreducens AM-1 through Horizontal Gene Transfer

The origin and evolution of novel biochemical functions remain among the key questions in molecular evolution. We study the methacrylate reductase function, which is thought to have emerged within the last century and has been reported in Geobacter sulfurreducens strain AM-1. We report the sequence of the operon coding for the flavin-containing methacrylate reductase (Mrd) and the tetraheme cytochrome c (Mcc) in the genome of G. sulfurreducens AM-1 and study its evolution. The different types of signal peptides in the functionally interlinked proteins Mrd and Mcc suggest a complex mechanism of biogenesis for the chromoproteids of the methacrylate redox system. The homologs of Mrd and Mcc found in δ-Proteobacteria and Deferribacteres are also organized into an operon, and their phylogenetic distribution suggests that the two genes tend to be horizontally transferred together. In particular, the mrd and mcc genes from G. sulfurreducens AM-1 are not monophyletic with any of the homologs found in other Geobacter genomes. The acquisition of methacrylate reductase function by G. sulfurreducens AM-1 therefore appears linked to a horizontal gene transfer event. However, the new function of the mrd and mcc products may have evolved either before or after their acquisition by G. sulfurreducens AM-1.

acid sequences [18] to the genome sequence. We found that the genes coding for Mrd and Mcc were arranged in tandem and organized in one transcription unit (Fig 1). The mrd gene (1581 bp) was separated from mcc (696 bp) by 56 nucleotides. Upstream, the operon was flanked by a transposase gene located 3297 nucleotides from mrd and separated from it by two pseudogenes; downstream, it was flanked by a GTP cyclohydrolase gene located 505 nucleotides from mcc. Both flanking genes have the same orientation as mrd and mcc.
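As a consistency check between the gene lengths above and the protein lengths reported below, an open reading frame length can be converted into the number of encoded residues. This sketch assumes that the reported lengths (1581 bp for mrd, 696 bp for mcc) include the stop codon; `protein_length` is a hypothetical helper written for illustration.

```python
def protein_length(orf_bp: int) -> int:
    """Residues encoded by an ORF whose length includes the stop codon."""
    if orf_bp % 3:
        raise ValueError(f"{orf_bp} bp is not a whole number of codons")
    return orf_bp // 3 - 1  # the stop codon encodes no residue


for gene, orf_bp in [("mrd", 1581), ("mcc", 696)]:
    print(gene, protein_length(orf_bp))  # mrd 526, then mcc 231
```

Under this assumption the gene lengths agree with the 526-residue Mrd and 231-residue Mcc precursors.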
Putative promoter sites were found close to the predicted start codon: the sequences located 75 to 97 bp upstream of the translation start codon resemble the consensus promoter elements typically located at positions -35 and -10 relative to the transcription start site. Furthermore, two transcription factor binding sites are predicted in this region, supporting the hypothesis that this promoter is a common regulatory element of the redox operon. A potential ρ-independent transcriptional terminator (terminator energy -8.9) was found 75 nucleotides downstream of mcc. A second potential transcriptional terminator (terminator energy -9.4) is located in the spacer between the two genes and partially overlaps the mcc gene. This additional termination signal between the genes of the operon suggests that the redox system is subject to complex regulation at the transcriptional level.
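The promoter search described above can be illustrated with a simple mismatch-count scan for the σ70-type consensus hexamers TTGACA (-35) and TATAAT (-10) separated by a 15 to 19 bp spacer. This is only a sketch under stated assumptions: the text does not name the promoter-prediction tool, the consensus hexamers and spacer range are textbook values rather than parameters from this study, and `best_promoter` and the toy sequence are hypothetical. Dedicated tools score candidates with position weight matrices rather than exact-match counts.

```python
from typing import Optional

# Assumed sigma70-type consensus elements (not taken from this study).
MINUS35 = "TTGACA"
MINUS10 = "TATAAT"


def matches(window: str, consensus: str) -> int:
    """Number of positions where the window agrees with the consensus."""
    return sum(a == b for a, b in zip(window, consensus))


def best_promoter(seq: str, spacers=range(15, 20)) -> Optional[tuple[int, int, int]]:
    """Best -35/-10 pair as (score, start_35, start_10) with 0-based starts.

    The score is the number of matching positions in both hexamers (max 12).
    Spacers must be given in ascending order for the early break to be valid.
    """
    seq = seq.upper()
    best = None
    for i in range(len(seq) - len(MINUS35) + 1):
        score35 = matches(seq[i:i + 6], MINUS35)
        for spacer in spacers:
            j = i + 6 + spacer
            if j + 6 > len(seq):
                break  # longer spacers would run past the end of seq
            score = score35 + matches(seq[j:j + 6], MINUS10)
            if best is None or score > best[0]:
                best = (score, i, j)
    return best


toy = "GCGC" + "TTGACA" + "A" * 17 + "TATAAT" + "GCGC"
print(best_promoter(toy))  # (12, 4, 27)
```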

Evolution of the methacrylate redox system
To elucidate the evolutionary history of the methacrylate redox system, we searched for orthologs of mrd and mcc. First, we searched for homologs in the eleven Geobacter genomes available in GenBank. The closest homologs of Mrd by protein sequence similarity were found in three strains: G. lovleyi SZ (YP_001951186.1, YP_001953845.1, YP_001953762.1), G. bemidjiensis Bem (YP_002140822.1, YP_002140385.1) and Geobacter sp. M21 (YP_003023900.1) (Table 1, Fig 2a). One of the homologs from G. lovleyi SZ, a strain capable of chlororespiration, was the only protein in this list (YP_001951186.1) that lacks CXXCH heme-binding sites. The other homologs found in Geobacter genomes have four heme-binding sites, and different regions of their sequences are homologous to either Mrd or Mcc of the methacrylate redox system of G. sulfurreducens AM-1 (Tables 1 and 2; Fig 2a and 2b). Mcc of G. sulfurreducens AM-1 was homologous to the N-terminal region of these Geobacter flavocytochromes (usually the first 125 amino acids), whereas Mrd was homologous to their C-terminal region (usually between the 140th and 590th amino acids) and showed higher sequence identity with them than Mcc did (see column 5 of Tables 1 and 2). Thus, in bacteria of the genus Geobacter, homologs of the methacrylate redox system are often present as a single multifunctional flavoprotein that combines electron delivery and catalysis of reduction. Many other cytochrome c sequences are encoded in Geobacter genomes [30–36], but they are much more diverged than the homologs considered here and were therefore excluded from our phylogenetic analysis.
To test whether the Mrd and Mcc homologs found in other Geobacter species are direct orthologs of the G. sulfurreducens AM-1 genes, we performed a phylogenetic analysis of the homologs, including several Geobacter sequences most similar to Mrd of G. sulfurreducens AM-1. The analysis showed that G. sulfurreducens AM-1 Mrd and Mcc share a more recent common ancestor with sequences from distant bacterial clades than with any Geobacter sequence. This indicates that the methacrylate redox system genes, mrd and mcc, were likely acquired by G. sulfurreducens AM-1 through recent horizontal gene transfer and that their orthologs are absent from the sequenced Geobacter genomes (Fig 2a and 2b).
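The monophyly argument can be made concrete on a rooted tree: a query sequence groups with a genus only if, for at least one member of that genus, the smallest clade containing both includes no sequences from outside the genus. The sketch below checks this on trees written as nested Python tuples. The leaf names and topologies are hypothetical, not the trees of Fig 2, and the outcome depends on where the tree is rooted.

```python
from typing import Union

# A rooted tree: either a leaf name or a tuple of child subtrees.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> set[str]:
    """All leaf names below (and including) this node."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def smallest_clade(tree: Tree, targets: set[str]) -> set[str]:
    """Leaf set of the smallest clade that contains every target leaf."""
    if not isinstance(tree, str):
        for child in tree:
            if targets <= leaves(child):
                return smallest_clade(child, targets)
    return leaves(tree)


def groups_within(tree: Tree, query: str, group: set[str]) -> bool:
    """True if query forms a clade with some group member and no outsiders."""
    for member in group:
        clade = smallest_clade(tree, {query, member})
        if clade - {query} <= group:
            return True
    return False


geobacter = {"Geo_1", "Geo_2", "Geo_3"}  # hypothetical Geobacter homologs
tree = ((("Geo_1", "Geo_2"), "Geo_3"), (("AM1_Mrd", "Other_1"), "Other_2"))
print(groups_within(tree, "AM1_Mrd", geobacter))  # False
```

A `False` result corresponds to the situation reported here: the query clusters with non-Geobacter sequences before it joins any Geobacter homolog.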

Products of the methacrylate redox system genes
The protein encoded by mrd has 526 amino acids (Mr 57.2 kDa). Its N-terminal sequence contains a 55-amino-acid signal peptide with the Tat motif RRDFLK at position 25 (Fig 3, Table 1). The mature protein is therefore predicted to contain 471 amino acids (estimated Mr 51.4 kDa). Previous results have shown that the mature Mrd contains 1 mol of FAD per mol of protein [18]; the Mr of the mature Mrd with FAD should therefore be 52.2 kDa, which is consistent with the experimental data. We verified the start and flanking regions of mrd by Sanger sequencing of both strands; the resulting sequences were identical to those obtained by next-generation sequencing of the entire genome. Thus, the unusually long predicted signal peptide does not result from a sequencing or assembly error.
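The expected mass of the FAD-containing mature Mrd follows from simple arithmetic. This sketch assumes one FAD per polypeptide and uses the average molecular mass of FAD (785.55 Da); the apoprotein value of 51.4 kDa and the residue counts are taken from the text.

```python
FAD_MASS_DA = 785.55  # average molecular mass of FAD, C27H33N9O15P2

precursor_length = 526      # residues encoded by mrd
signal_peptide_length = 55  # predicted Tat-type signal peptide
mature_length = precursor_length - signal_peptide_length

apoprotein_kda = 51.4  # predicted Mr of the mature apoprotein (from the text)
fad_per_protein = 1    # 1 mol FAD per mol protein [18]
holoprotein_kda = apoprotein_kda + fad_per_protein * FAD_MASS_DA / 1000

print(mature_length, round(holoprotein_kda, 1))  # 471 52.2
```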
The mcc gene encodes a protein of 231 amino acids (Mr 24.5 kDa). Its N-terminal region contains a shorter, Sec-type signal peptide of 23 amino acids (Fig 4, Table 2), so the mature protein is predicted to have 208 amino acids (Mr 22.1 kDa). Previous experiments showed that the mature Mcc contains 4 mol of heme c and has an Mr of nearly 30 kDa [16]. Consistent with these results, the GENE RUNNER program identified four CXXCH heme-binding motifs [37]. However, the Mr of a mature Mcc with 4 hemes is 24.8 kDa, substantially lower than the experimental value. A visual inspection of the Mcc sequence revealed three additional CXXCH heme-binding motifs, which bring the Mr of the mature Mcc with 7 hemes to 27.9 kDa (Fig 4).
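A regular-expression scan illustrates the motif search described above and would report every exact CXXCH occurrence in a sequence. This is a sketch, not the GENE RUNNER procedure: the pattern covers only the canonical Cys-X-X-Cys-His motif, and the helper `heme_motifs` and the toy sequence are hypothetical (the toy sequence is not Mcc).

```python
import re

# Canonical c-type heme-binding motif Cys-X-X-Cys-His. The lookahead lets
# overlapping occurrences be reported as well.
CXXCH = re.compile(r"(?=(C..CH))")


def heme_motifs(protein: str) -> list[tuple[int, str]]:
    """1-based start positions and sequences of all CXXCH motifs."""
    return [(m.start() + 1, m.group(1)) for m in CXXCH.finditer(protein.upper())]


toy = "MKCAACHGGCKLCHPPP"
print(heme_motifs(toy))  # [(3, 'CAACH'), (10, 'CKLCH')]
```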
The closest identifiable homologs of Mrd (Fig 2a) are likely FAD-binding proteins and flavocytochromes c, as indicated by the conserved phosphate-binding regions at their N-termini (Fig 3). This phosphate-binding site is typical of all FAD- and NAD(P)H-dependent oxidoreductases: xhxhGxGxxGxxxhxxh(x)8hxhE(D), where x is any amino acid and h is a hydrophobic amino acid [38]. In Mrd, this site is located between amino acids 69 and 98 of the immature protein (Fig 3). The central part of the consensus, GxGxxG, is the glycine-rich loop linking the first β-strand of the Rossmann fold to the first α-helix, which points toward the pyrophosphate group for charge compensation. Generally, this motif has a β-strand-turn-β-strand structure and forms a flexible clamp that surrounds and anchors the pyrophosphate of FAD [39]. Another conserved FAD-binding motif, the eleven-amino-acid segment T(S)xxxxxF(Y)hhGD(E) [40], was present in the amino acid sequences of Mrd and its homologs. In Mrd, this site is slightly truncated: the first threonine is absent, while all other residues are present (487-491).
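The consensus notation used above maps directly onto regular expressions: x becomes any residue, h a hydrophobic residue, (x)8 a fixed repeat and E(D) a choice between two residues. The sketch below performs that translation. The hydrophobic set assigned to h is an assumption (definitions differ between studies), and `consensus_to_regex` and the test peptide are hypothetical.

```python
import re

# Assumed hydrophobic set for the symbol "h"; definitions differ between studies.
HYDROPHOBIC = "ACFGILMVW"


def symbol(ch: str) -> str:
    """Regex fragment for one consensus symbol."""
    if ch == "x":
        return "."                 # any amino acid
    if ch == "h":
        return f"[{HYDROPHOBIC}]"  # any hydrophobic amino acid
    return ch                      # a fixed residue such as G or E


def consensus_to_regex(consensus: str) -> str:
    """Translate notation such as 'xhxhGxGxxG...(x)8hxhE(D)' into a regex."""
    parts: list[str] = []
    i = 0
    while i < len(consensus):
        if consensus[i] == "(":
            close = consensus.index(")", i)
            inner = consensus[i + 1:close]
            j = close + 1
            while j < len(consensus) and consensus[j].isdigit():
                j += 1
            repeat = consensus[close + 1:j]
            if repeat:
                # "(x)8": the bracketed symbol repeated a fixed number of times
                parts.append(symbol(inner) + "{" + repeat + "}")
            else:
                # "E(D)": an alternative residue at the previous position
                parts[-1] = "[" + parts[-1].strip("[]") + inner + "]"
            i = j
        else:
            parts.append(symbol(consensus[i]))
            i += 1
    return "".join(parts)


ROSSMANN = consensus_to_regex("xhxhGxGxxGxxxhxxh(x)8hxhE(D)")
FAD_MOTIF = consensus_to_regex("T(S)xxxxxF(Y)hhGD(E)")
print(FAD_MOTIF)  # [TS].....[FY][ACFGILMVW][ACFGILMVW]G[DE]
```

A strict pattern like `FAD_MOTIF` would miss the truncated form of the second motif in Mrd (no leading threonine), so searches for degenerate motifs usually allow partial matches.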
The heme-binding sites of the Mcc homologs identified by the phylogenetic analysis (Fig 2b) are shown in Fig 4. Their presence suggests that these homologs, which are not annotated with any function, are cytochromes c containing either four heme-binding sites (YP_002429920.1 from D. alkenivorans AK-01) or seven heme-binding sites (the homologs from the other species).
The Mrd sequence of G. sulfurreducens AM-1 showed higher similarity to its homologs (Table 1) than Mcc of G. sulfurreducens AM-1 did to its homologs (Table 2). This observation is consistent with the relatively poor conservation of cytochromes c [36] and probably with an evolutionarily earlier origin of the flavin-containing Mrd homologs.
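Comparisons of this kind rest on percent identity computed from pairwise alignments. Several definitions are in use, and the text does not state which one underlies Tables 1 and 2. The sketch below assumes one common choice: identical residues as a percentage of alignment columns, ignoring columns that are gaps in both rows. The example alignment is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical residues as a percentage of alignment columns.

    Assumes two rows of the same pairwise alignment, with '-' for gaps.
    Columns where both rows are gaps are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    identical = sum(a == b and a != "-" for a, b in columns)
    return 100 * identical / len(columns)


print(round(percent_identity("MKV-LA", "MRVQLA"), 1))  # 66.7
```

Dividing by the length of the shorter sequence instead of the alignment length gives higher values for gapped alignments, so identities from different tools are not always directly comparable.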

Discussion
The methacrylate redox system genes in the genome of G. sulfurreducens AM-1 appear to be arranged in a single operon. The absence of orthologs from several other Geobacter genomes, together with the lack of closely related orthologs in bacteria of any other closely related genus, strongly suggests that the methacrylate redox system genes were acquired recently by G. sulfurreducens AM-1 (Fig 5). The striking similarity between the phylogenetic distributions of the closest homologs of mrd and mcc suggests that the two genes tend to be horizontally transferred together, which supports their close functional relationship. The high congruence of the evolutionary histories of mrd and mcc is consistent with their organization into a single operon and with their joint functional role.
Unfortunately, experimental data on substrate specificity are not available for most of the identified homologs. This lack of data prevents us from determining whether the methacrylate-reducing function was acquired before or after the horizontal gene transfer. Furthermore, even the closest identified homologs are evidently too diverged to be identified as the source of the horizontal gene transfer. This conclusion is based on the divergence of Mrd and Mcc from their closest homologs, compared with the high similarity among the genomes of different Geobacter species.
Conserved residues in the Mrd sequence (H461, R353 and R501; Fig 3) may stabilize the transition state during catalysis by delocalizing the negative charge of the carbanion intermediate, as in Shewanella fumarate reductases [16]. Point mutagenesis showed that the arginine homologous to R353 of Mrd is the proton donor for the carbanion [44]. The fumarate reductase arginine homologous to R501 of Mrd interacts through its guanidino group with both oxygen atoms of a carboxyl group of succinate, positioning it parallel to the isoalloxazine ring [16]. Mrd lacks two other conserved residues that interact with succinate or fumarate: it has a tryptophan instead of a histidine at position 311 and a valine instead of a serine or threonine at position 324. Since these residues are also involved in substrate binding, their replacement may be related to the substrate specificity of Mrd of G. sulfurreducens AM-1.
The two chromoproteids of the methacrylate redox system probably undergo biogenesis via different mechanisms. The immature Mrd protein has a longer and less hydrophobic Tat-type signal peptide (Table 1, Fig 3), a type found in proteins of Bacteria, Archaea and chloroplasts. Proteins carrying such signal peptides are transported across the membrane after folding [45]. In contrast, the immature Mcc protein carries a Sec-type signal peptide (Table 2, Fig 4). Such proteins are translocated across the membrane before acquiring their tertiary structure [45,46], and heme attachment occurs in the periplasm [37,46]. Thus, both the Tat and Sec secretion pathways are likely required for the maturation of the methacrylate redox system proteins.
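The two signal-peptide features discussed above, overall hydrophobicity and the twin-arginine motif, can be quantified with a short script. This sketch uses the Kyte-Doolittle hydropathy scale and reports the grand average of hydropathy (GRAVY), where lower values mean a less hydrophobic peptide. The motif pattern is a simplified, strict form of the S/T-R-R-x-F-L-K Tat consensus chosen for illustration. The helpers and test peptides are hypothetical; in practice they would be applied to the predicted signal peptides of Mrd (55 residues) and Mcc (23 residues).

```python
import re

# Kyte-Doolittle hydropathy values per amino acid.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Strict core of the S/T-R-R-x-F-L-K twin-arginine consensus (assumption);
# real Tat motifs tolerate substitutions at the F, L and K positions.
TAT_CORE = re.compile(r"RR.FLK")


def gravy(peptide: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    residues = peptide.upper()
    return sum(KD[aa] for aa in residues) / len(residues)


def tat_motif_position(peptide: str):
    """1-based start of the first twin-arginine core, or None if absent."""
    match = TAT_CORE.search(peptide.upper())
    return match.start() + 1 if match else None


print(tat_motif_position("MSRRDFLKAA"))  # 3
```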
The genes for the methacrylate redox system components of G. sulfurreducens AM-1 are organized similarly to the genes of their closest homologs in four representatives of δ-Proteobacteria and one representative of Deferribacteres (see Results). Thus, these organisms may also be able to grow with methacrylate as a terminal electron acceptor, or at least show some methacrylate-reducing activity. The methacrylate redox system belongs to a large family of flavocytochromes c and flavoproteins with reducing properties. These reducing complexes probably act on a natural substrate, for example, acrylate produced by marine bacteria [26,47]. The methacrylate redox system reduces acrylate and methacrylate at comparable rates [18], which supports the hypothesis that these proteins use a natural substrate. Methacrylate reduction may thus be an additional property of this redox system.
The components of the methacrylate redox system from G. sulfurreducens AM-1 and the DddY-type dimethylsulphoniopropionate (DMSP) lyase from marine microorganisms share several features: 1) a distribution in certain groups of proteobacteria, 2) a gene organization with cytochrome c genes adjacent to the enzyme genes (reductase or lyase) and 3) a cleavable signal peptide in the immature enzymes. DddY catalyzes the cleavage of DMSP to the volatile compound dimethyl sulphide (DMS) and the toxic acrylate [47]. We suggest that the reductase evolved to convert the toxic acrylate formed by lyases into a less toxic compound. The cytochromes c whose genes are located near the reductase or lyase genes may be homologous. The methacrylate redox system evolved from a cytochrome c and a flavoprotein, which were recently acquired by G. sulfurreducens AM-1 through horizontal gene transfer, either before or after the evolution of their substrate specificity. Furthermore, these proteins likely constitute an adaptation to growth in sludge microbial communities, in particular in the wastewater of plastics manufacturing plants.

Experimental Procedures
The object of investigation was the anaerobic bacterium G. sulfurreducens AM-1 from the culture collection of the Laboratory of Microorganisms Adaptation at the Institute of Biochemistry and Physiology of Microorganisms (Pushchino, Russia).
The subject of investigation was the operon containing the mrd and mcc genes of the methacrylate redox system of G. sulfurreducens AM-1.

Genome sequencing
The draft genome sequence was obtained by sequencing paired-end and mate-pair libraries on an Illumina HiSeq 2000. The resulting sequence was submitted to GenBank under accession number CP010430.

Genome assembly
The genome was assembled de novo with SOAPdenovo [48], Velvet [49] and the SPAdes genome assembler [50]. Assembly quality was assessed with QUAST [51] and by aligning the contigs with Mauve [52] to the complete Geobacter sulfurreducens genomes available in GenBank: Geobacter sulfurreducens KN400 and Geobacter sulfurreducens PCA.
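One of the standard contiguity statistics reported when comparing assemblies is N50: the contig length at which contigs of that length or longer contain at least half of the assembled bases. This sketch implements that definition for illustration only; it is not QUAST's code, and the contig lengths in the example are hypothetical.

```python
def n50(contig_lengths: list[int]) -> int:
    """Smallest contig length L such that contigs >= L cover half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return lengths[-1]  # not reached for positive lengths


print(n50([100, 200, 300, 400]))  # 300
```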
The SPAdes contigs were of the highest quality, but SPAdes did not assemble the genome into a single sequence. We therefore used SSPACE [53] for scaffolding, which yielded the genome as a single scaffold, and then applied GapFiller [54] to close the remaining gaps.
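Gap closing can be monitored by counting the unresolved gaps in a scaffold before and after gap filling. This sketch assumes that gaps are written as runs of N, a common convention in scaffold FASTA output. The helper `gap_summary` and the example sequence are hypothetical.

```python
import re

# Assumed gap encoding: runs of N (either case) inside the scaffold sequence.
GAP = re.compile(r"[Nn]+")


def gap_summary(scaffold: str) -> tuple[int, int]:
    """Number of gaps and total gap length (bp) in a scaffold sequence."""
    runs = [m.end() - m.start() for m in GAP.finditer(scaffold)]
    return len(runs), sum(runs)


print(gap_summary("ACGTNNNNACGTNNACGT"))  # (2, 6)
```

A fully closed scaffold would report `(0, 0)`.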