Evidence of horizontal gene transfer by transposase gene analyses in Fervidobacterium species

Horizontal gene transfer (HGT) plays an important role in the physiology and evolution of microorganisms, above all thermophilic prokaryotes. Some members of the Phylum Thermotogae (i.e., Thermotoga spp.) have been reported to present genomes constituted by a mosaic of genes from a variety of origins. This study presents a novel approach to explore the potential plasticity of Fervidobacterium genomes using putative transposase-encoding genes as the target of analysis. Transposases are key proteins involved in genomic DNA rearrangements. A comprehensive comparative analysis, including phylogeny, non-metric multidimensional scaling analysis of tetranucleotide frequencies, repetitive flanking sequences and divergence estimates, was performed on the transposase genes detected in four Fervidobacterium genomes: F. nodosum, F. pennivorans, F. islandicum and a new isolate (Fervidobacterium sp. FC2004). Transposase sequences were classified into groups by their degree of similarity. The different methods used in this study indicated that over half of the transposase genes represented putative HGT events, with their closest relative sequences within the phylum Firmicutes; Caldicellulosiruptor was the genus showing the highest gene sequence proximity. These results confirmed a direct evolutionary relationship through HGT between specific Fervidobacterium species and thermophilic Firmicutes, leading to a potential sharing of gene sequences and functions that helps them thrive under similar environmental conditions. Transposase-encoding genes represent suitable targets to assess the plasticity and potential mosaicism of bacterial genomes.


Introduction
The microbial world presents an astonishingly high diversity [1,2]. Natural habitats contain complex microbial communities in which different microorganisms interact with the environment, and this interaction is a driving force for adaptation to the available conditions. Microbial cells live in communities and continuously interact with others both within and between species [3]. This interactive scenario generates opportunities to exchange functional capabilities and accelerate the pace of evolutionary adaptation. Horizontal gene transfer (HGT) is the exchange of DNA between species, and it is even more interesting when it occurs among scarcely related cells, for example, between different phyla. HGT events occur frequently in the prokaryotic world and are thought to have enormous relevance in the evolution of prokaryotes [4,5,6]. However, the mechanisms governing these events are poorly understood. Phylogenetic studies are providing valuable information on potential DNA mobility events during microbial evolution. In this respect, mobile genetic elements are a major factor in the genomic evolution of prokaryotes [6,7].
The activities of different mobile genetic elements, both within a single genome and across genomes of different microorganisms, have serious impacts on genome structure and function. For instance, mobile genetic elements are able to transpose DNA fragments, resulting in gene inactivation or activation and in gene deletions and insertions [8]. Among the mobile elements, insertion sequences (ISs) are considered the simplest autonomous transposition elements. ISs are constituted by a transposase gene flanked upstream and downstream by repetitive flanking sequences. Consequently, transposase genes are expected to show genome rearrangements more frequently than other genes [9]. Transposases act through replicative and non-replicative (conservative) modes that move the transposition element to a different location. As a consequence, gene disruptions caused by the excision of an IS and its insertion into the genome are usually traced back to understand these evolutionary events [10]. The repetitive flanking sequences, or inverted repeat (IR) sequences, are the targets of the transposases and are generally included in the transposed DNA. The involvement of transposases in the mobility of DNA within and likely between genomes [6,7] suggests that these genes can be used as a proxy to better understand the potential of HGT in bacterial evolution and genome plasticity.
Extreme conditions, such as high temperatures, represent environments fostering rapid adaptation mechanisms. HGT has proven to represent a major mechanism allowing the adaptation of microorganisms to these environments [11,12,13]. For instance, the phylum Thermotogae, mostly represented by thermophiles, has been reported to present a high frequency of HGT events, such as some DNA fragments putatively coming from members of the Phylum Firmicutes and the Domain Archaea. This has been pointed out in the genome of Thermotoga maritima [6,14], genomes of Thermotoga and Thermosipho species and Fervidobacterium nodosum [13,15].
In this study, we analyzed transposase genes in Fervidobacterium genomes as potential targets to infer the putative occurrence of HGT events in this genus and their links to distantly and closely related taxa.

Genomes, genes and phylogenies
The genomes of F. nodosum Rt17-B1 (NC_009718) [13,16], F. pennivorans DSM 9078 (NC_017095) [17], F. islandicum AW-1 (NZ_CP014334) [18,19] and a new isolate [20], Fervidobacterium thailandense strain FC2004 (LWAF01000000), were used in this analysis. Transposases were detected in these four genomes from the existing annotations and by homology searching (using blast algorithms) [21]. They were classified based on their similarity to the transposase families proposed by Siguier et al. [8] at the ISFinder database (http://www-is.biotoul.fr). Blastp was used to determine the closest relatives to the transposase genes detected in Fervidobacterium genomes. Besides the closest relative sequences detected in GenBank, the closest sequences to those transposase families within the Phylum Thermotogae were also incorporated into the analyses as a comparative threshold for the detection of HGT events between different phyla. Sequence alignments were performed with ClustalW [22]. Phylogenetic trees based on the amino acid sequences encoded by the detected genes were constructed using MEGA [23] by the neighbor-joining method with 1000 bootstrap replicates.
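The role of the closest Thermotogae sequences as a comparative threshold can be illustrated with a minimal Python sketch. It is not the procedure used in the study, which combines several lines of evidence; it only shows the similarity comparison. The function name and the `margin` parameter are hypothetical, and the example values are the upper ends of the amino acid similarity ranges reported in the Results for the closest Firmicutes and the closest Thermotogae sequences.

```python
def is_candidate_interphylum_hgt(best_outside_phylum, best_within_phylum, margin=0.0):
    """Flag a transposase as a candidate interphylum HGT event.

    A gene is flagged when its best amino acid similarity (%) to a sequence
    from another phylum exceeds its best similarity to a sequence from its
    own phylum by more than `margin`. The margin is an illustrative
    assumption, not a value taken from the study.
    """
    return best_outside_phylum - best_within_phylum > margin


# (closest Firmicutes similarity, closest Thermotogae similarity), in percent,
# using the upper ends of the ranges reported in the Results.
reported = {
    "IS605_I": (77.0, 43.0),
    "type C": (99.0, 53.0),
    "type D": (72.0, 65.0),
}

flags = {
    name: is_candidate_interphylum_hgt(outside, within)
    for name, (outside, within) in reported.items()
}
```

With a margin of zero all three groups are flagged; a stricter margin (for example 10 percentage points) would exclude type D, where the Thermotoga thermarum sequence is almost as similar as the Firmicutes sequences.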

Multivariate analyses of tetranucleotide frequencies
Non-metric multidimensional scaling (NMDS) analyses were performed to obtain graphical distributions of the transposase gene sequences corresponding to each transposase family detected in the Fervidobacterium genomes and their related genes. The frequencies of tetranucleotides [24] were used in NMDS analyses. NMDS plots were constructed using R with the vegan package [25].
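As an illustration of the input to such an analysis, the following Python sketch computes a tetranucleotide frequency vector for a single sequence. It is a minimal example, not the code used in the study: counting overlapping windows on the given strand only, and skipping windows with ambiguous bases, are assumptions, and the function name is hypothetical.

```python
from collections import Counter
from itertools import product

# All 256 tetranucleotides in a fixed order, so that vectors from
# different genes are directly comparable.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq):
    """Return the relative frequencies of the 256 tetranucleotides in seq.

    Overlapping windows are counted on the given strand only (assumption).
    Windows containing characters other than A, C, G or T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    valid = {t: counts.get(t, 0) for t in TETRAMERS}
    total = sum(valid.values())
    if total == 0:
        raise ValueError("sequence contains no valid tetranucleotide")
    return [valid[t] / total for t in TETRAMERS]


# Toy sequence for demonstration; real input would be a transposase gene.
freqs = tetranucleotide_frequencies("ATGCATGCATGC")
```

One such vector per gene, stacked into a matrix, is the kind of table that an ordination routine such as NMDS takes as input.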

Repetitive flanking sequences
The conservation of insertion sequence ends was studied by visual inspection of the sequence alignments corresponding to related transposase genes plus an additional 1000 nucleotides upstream and downstream. When present, the inverted repeats or palindromic sequences (depending on the transposase family) were identified with guidance from the information available at the ISFinder database. The percentage of identity between aligned, conserved ends and the distance to the annotated start or stop codons were noted.
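The percentage of identity between aligned ends can be computed as in the Python sketch below. The study relied on visual inspection, so this is only an illustration: the function names, the toy sequences and the choice of denominator (all alignment columns except those where both sequences have a gap) are assumptions.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def percent_identity(a, b):
    """Percent identity between two pre-aligned sequences of equal length.

    Gaps are written as '-'. Columns where both sequences have a gap are
    ignored; a gap opposite a base counts as a mismatch (assumption).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (x, y) for x, y in zip(a.upper(), b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


# Hypothetical left and right ends of an IS; a perfect inverted repeat
# reads the same on the left end as the reverse complement of the right end.
left_end = "GGTTCTGA"
right_end = "TCAGAACC"
ir_identity = percent_identity(left_end, reverse_complement(right_end))
```

The same `percent_identity` function applies when comparing the left (or right) end of one IS with the corresponding end of a related IS from another genome.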

K-L divergence estimates
The Kullback-Leibler (K-L) divergence metric [26,27] was used to estimate the divergence between related transposase genes clustered in a transposase family. K-L divergence (D_KL) was calculated on tetranucleotide frequencies for pairwise comparisons as

$$D_{KL}(g \,\|\, G) = \sum_{i} g_i \log \frac{g_i}{G_i}$$

where g_i is the parameter used (e.g., the frequency of a particular tetranucleotide, i) for the analyzed gene and G_i the same parameter for a reference transposase gene or whole genome.
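A minimal Python sketch of this calculation follows, using the standard definition D_KL(g || G) = sum over i of g_i log(g_i / G_i). The natural logarithm, the pseudocount and the renormalisation are assumptions introduced here to keep the estimate finite when a tetranucleotide is absent from one of the sequences; the source does not state how zero frequencies were handled.

```python
import math


def kl_divergence(g, G, pseudocount=1e-6):
    """Kullback-Leibler divergence D_KL(g || G) between two frequency vectors.

    g holds the frequencies for the analyzed gene and G those for the
    reference gene or genome. A small pseudocount is added to every entry
    and both vectors are renormalised (assumption), which avoids division
    by zero and log(0).
    """
    if len(g) != len(G):
        raise ValueError("frequency vectors must have the same length")
    g_s = [x + pseudocount for x in g]
    G_s = [x + pseudocount for x in G]
    g_total, G_total = sum(g_s), sum(G_s)
    return sum(
        (a / g_total) * math.log((a / g_total) / (b / G_total))
        for a, b in zip(g_s, G_s)
    )


# Toy two-category example; real input would be 256 tetranucleotide frequencies.
d_same = kl_divergence([0.5, 0.5], [0.5, 0.5])
d_diff = kl_divergence([0.5, 0.5], [0.9, 0.1])
```

The measure is asymmetric, so the choice of which sequence acts as the reference G affects the value obtained.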

Classification of transposases in Fervidobacterium genomes
The number of putative transposase genes detected in the Fervidobacterium (Phylum Thermotogae) genomes ranged between 39 and 48 per genome (Table 1). These transposase genes were classified into six IS families (IS6, IS110, IS200/IS605, ISL3, IS3 and IS4) plus four undefined groups (types A, B, C and D) that were unrelated to the transposase families listed in ISFinder [28]. As shown in Table 1, ISCpe7-like transposases (IS6 family) were the most frequent IS elements in the Fervidobacterium genomes: F. nodosum Rt17-B1 (23 copies), F. pennivorans DSM 9078 (24 copies), Fervidobacterium sp. FC2004 (12 copies) and F. islandicum AW-1 (34 copies). Based on their degree of similarity and phylogenetic relationships, the IS110 (IS110_I-IS110_III), IS200 (IS200_I-IS200_III), IS605 (IS605_I-IS605_III) and IS3 (IS3_I-IS3_III) transposases were each further classified into three subgroups. Evidence of HGT events in Fervidobacterium genomes that might have occurred across phyla (i.e., with phyla other than Thermotogae) was characterized. The results revealed that ISCpe7, IS110, IS605_I, IS605_II, ISL3, IS3_II, IS3_III and types B, C and D might have been involved in HGT events with species belonging to other phyla. IS605_I, ISL3, type C and type D were chosen as four representative models and are described below.

IS605_I transposases
Phylogenetic analysis using transposase sequences revealed a copy of IS605_I in F. nodosum that shares the highest similarity with a clade formed by Caldicellulosiruptor (Phylum Firmicutes; amino acid similarity 77%; Fig 1). Substantial transposase amino acid sequence similarities of 60-65% and 51% were observed in bacteria belonging to the phyla Aquificae, Synergistetes and Deinococcus-Thermus and in a member of the Domain Archaea (Ferroplasma acidarmanus), respectively. In contrast, the closest Thermotogae representatives (Thermotoga spp.) for this transposase group formed a different cluster (Fig 1) showing much lower similarity (32-43%). Thermotogae and Firmicutes are two different phyla, represented in phylogenetic dendrograms as independent low-branching bifurcations [29]. This suggests that the potential origin of this transposase in F. nodosum might have been the Phylum Firmicutes rather than its closest relatives in the Phylum Thermotogae. NMDS analysis of tetranucleotide frequencies also showed the IS605_I transposase from F. nodosum (1381_Fn) clustering in the vicinity of its closest relatives (Fig 2), mainly from the genus Caldicellulosiruptor (Phylum Firmicutes). The results imply that the IS605_I element from F. nodosum is most closely related to this type of transposition element from Caldicellulosiruptor. Analysis of the palindromic IR sequences flanking the transposase genes revealed that the left and right IR sequences of F. nodosum show high nucleotide identity, 67-70% and 71-75% respectively, to sequences from species of the genus Caldicellulosiruptor (Phylum Firmicutes), and to other Firmicutes: 55% and 60% to Thermacetogenium phaeum and 73% and 78% to Ammonifex degensii. Lower identities to the left and right IR sequences from members of the Phylum Aquificae (Hydrogenivirga, 42% and 57%, and Desulfurobacterium, 44-46% and 76-80%) were observed. In contrast, the lowest identities to the left and right IRs were obtained for members of the Phylum Thermotogae (42-43% and 55-60%). Besides, these IR sequences were highly conserved at the genus level, and the high identity observed between these related sequences (i.e., Fervidobacterium and Caldicellulosiruptor) suggested interphylum relationships (Table 2).
The K-L divergence estimator (Fig 3) also confirmed the proximity of the F. nodosum and Caldicellulosiruptor transposases within the IS605_I subgroup. The increase in divergence estimates agreed with phylogenetic distance (Fig 1). Related transposase genes from other phyla showed slightly increasing divergence (Fig 3). The results from the different methods confirmed a closer relationship of the F. nodosum IS605_I transposase to members of the Firmicutes and other phyla than to the closest Thermotogae sequences.

ISL3 transposases
The clustering of ISL3 transposases observed in the phylogeny (Fig 4) agreed with their distribution in the NMDS analysis of tetranucleotide frequencies (Fig 5). Thus, the major lineages differentiated in the phylogenetic dendrogram were also differentiated in the NMDS plot (Fig 5). F. nodosum and Caldicellulosiruptor sequences formed a single cluster, clearly differentiated from the other related sequences, which showed decreasing similarity.
The IR sequences surrounding the ISL3 transposase genes in F. nodosum were practically identical to each other (Table 3). Only minor differences (89-92% identity) were observed with the corresponding sequences in Caldicellulosiruptor (Table 3). Other Firmicutes ISL3 transposase genes showed close but slightly more distant IR sequences (75-86% identity), which paralleled their increasing phylogenetic distance (Fig 4). The closest Thermotogae ISL3 transposase sequences were distant from those in F. nodosum, with identities below 35%, which also corresponded to the divergent phylogeny shown for this group (Fig 4).
Similarly, results from K-L divergence analyses (Fig 6) showed increasing divergence with increasing phylogenetic distance, in agreement with the results from Figs 4 and 5 and Table 3. The clusters observed in the phylogeny (Fig 4) corresponded to stepwise increases in divergence and showed Caldicellulosiruptor as the genus with the lowest divergence from F. nodosum ISL3 transposase genes.

Type C transposases
Three genes classified within the type C transposase group were found in the F. pennivorans genome and one in the F. islandicum genome. The closest sequences to the Fervidobacterium type C transposase genes belonged to the genus Caldicellulosiruptor (Firmicutes; 98-99% amino acid similarity), which formed a single clade together with the Fervidobacterium sequences (Fig 7). Other type C related transposases included transposases from Firmicutes such as Thermoanaerobacter, Thermobrachium and Clostridium (70-82% similarity). The closest Thermotogae transposase sequences belonged to Petrotoga mobilis (53% similarity). Other, more distant sequences from Thermotogae forming an independent cluster were also detected, but they showed lower levels of similarity (35-43%). NMDS analysis of tetranucleotide frequencies (Fig 8) differentiated the major phylogenetic clade formed by Fervidobacterium and Caldicellulosiruptor type C transposases. Additionally, other related phylogenetic clades observed within the Firmicutes clustered on the NMDS plot. Most Thermotogae sequences were clearly separated in a different and distant group (Fig 8).
F. pennivorans and F. islandicum showed identical IR sequences (Table 4). The IR sequence downstream of the F. islandicum type C transposase remained undetected. Caldicellulosiruptor type C transposases presented IR sequences practically identical (97-100% identity) to those of Fervidobacterium. Other related Firmicutes showed 74-86% identity to Fervidobacterium type C IR sequences. For the divergent lineage of Thermotogae, the IR sequence identity was much lower than for the related Firmicutes, down to 42-56% identity.
K-L divergence analyses of tetranucleotide frequencies also confirmed the proximity of Caldicellulosiruptor type C sequences to Fervidobacterium (Fig 9). Low divergence estimates were observed between Fervidobacterium and Caldicellulosiruptor type C transposase genes. Other Firmicutes and Thermotogae presented much higher K-L divergence values.

Type D transposases
Transposase type D was only detected in Fervidobacterium sp. strain FC2004, with five copies found in its genome. The closest relatives of these Fervidobacterium transposase genes were genes from Firmicutes (68-72% similarity; Fig 10), specifically from the genus Caldicellulosiruptor and some sequences from Caldanaerobacter, Thermoanaerobacter and Clostridium, which formed a single cluster with the Fervidobacterium type D transposases. A relatively close transposase gene was found in Thermotoga thermarum, but it formed a divergent clade and presented lower similarity (65%) than those Firmicutes genes. Other related genes from the Thermotogae presented much lower similarities (42-44%) and represented a divergent clade. Transposase genes forming related lineages with decreasing similarity to Fervidobacterium type D transposases were detected in Aquificae (63-67%), Proteobacteria (58%), Thermodesulfobacteria (59%) and Archaea (49%) (Fig 10). A pattern similar to that observed in the phylogeny was observed through NMDS analysis of tetranucleotide frequencies (Fig 11). The Fervidobacterium type D transposase genes clustered close to the most related Firmicutes genes and were clearly differentiated from more distant transposase genes from other taxonomic groups, including Thermotogae.
Fig 6. Squares represent pairwise divergence estimates between transposase genes; the line indicates the divergence estimates for transposase to whole genome comparisons. Identification of transposases follows the IDs used in the dendrogram for these sequences (Fig 4). https://doi.org/10.1371/journal.pone.0173961.g006
Palindromic IR sequences were detected surrounding the transposase type D genes (Table 5). The IRs within the Fervidobacterium FC2004 genome were highly conserved (95-100% identity). Sequence identities between the IRs of Fervidobacterium and those of some Caldicellulosiruptor species were 59% and 67% upstream and downstream, respectively. In other Firmicutes (Caldicellulosiruptor, Thermoanaerobacter, Clostridium), identities between 50% and 69% were observed, revealing proximity between IR sequences from Fervidobacterium and some Firmicutes. The closest Thermotogae transposase genes to these type D transposases presented IR nucleotide identities between 30% and 43%, in agreement with a much greater phylogenetic distance to the Fervidobacterium type D transposase genes than that of the Firmicutes indicated above.
Gene sequences corresponding to type D transposases presented the lowest K-L divergence estimates when comparing Fervidobacterium and the Firmicutes indicated above. Increasing phylogenetic distance (Fig 10) resulted in a progressive increase of K-L divergence estimates.

Discussion
Transposases are closely linked to the mobility of DNA within and likely between genomes [6,7], which suggests that they represent a proxy to detect and analyze HGT events and their influence on bacterial evolution and genome plasticity. In this case, we studied transposases in Fervidobacterium genomes. Using different methods to test for potential HGT events involving these sequences, the results indicated the presence of a number of transposases reflecting HGT events.
Due to the generally observed IS sequence conservation over a broad range of prokaryotic taxa [9,30] and their functional trait of promoting their own duplication, transposase genes are likely to show relatively homogeneous sequences in comparison to other genes forming the pangenome of a taxon. Wagner [9] proposed that ISs may periodically go extinct in prokaryotic populations and become reintroduced through HGT events, alleviating the potential negative long-term effects of their massive expansion in prokaryote genomes. Nevertheless, ISs are suggested to provide evolutionary benefits in the short run, which justifies their implication in species adaptation and genome plasticity [5,6,31]. Thus, the detection of transposase genes within a genome represents a snapshot of a prokaryote genome and provides indications of the potential genetic flow affecting the relatively recent history of specific taxa.
Using the variability range between a reference IS sequence within a Fervidobacterium genome and its closest relatives and clades within other Thermotogae makes it possible to distinguish those ISs putatively incorporated through interphylum HGT events. Thus, these phylogenetically distant HGT events are easily discriminated from potential vertical transfers of DNA within the genomic context of a taxon. The approach involved different strategies to discriminate IS sequences, including phylogeny and tree topology, multidimensional analysis and divergence metrics on tetranucleotide frequencies, and comparative IR sequence conservation. These methods were mostly in agreement in suggesting HGT events of IS sequences between Fervidobacterium and other phyla. The use of transposase genes as target sequences, combined with different approaches to detect interphylum HGT events, has proven to be a valuable tool for HGT analyses.
Table 4. Repetitive flanking sequences detected in Fervidobacterium and related sequences of the type C transposase group. The inverted repeats (IR) upstream (left end) and downstream (right end) and their percentage of identity with respect to Fervidobacterium sequences are presented. Identification numbers correspond to those used in the figures for this transposase group.
Phylogeny showed clear clustering of similar sequences. Multidimensional analyses (i.e., NMDS) of tetranucleotide frequencies also confirmed the clustering of transposase sequences from Fervidobacterium with related transposase genes. Divergence metrics of transposase genes increased in parallel with increasing phylogenetic distance. In addition, the detection of IR sequences flanking those transposase genes is consistent with their being functional and, likely, with their relatively recent incorporation [9] into these genomes, even if the actual mechanism of transfer is poorly understood. This study clearly shows gene sharing between Fervidobacterium genomes and other prokaryotic phyla. The most frequently detected interphylum HGT relationship was observed between Fervidobacterium and Firmicutes genomes. Specifically, Caldicellulosiruptor genomes shared multiple ISs with Fervidobacterium genomes, and some of them presented up to 99% amino acid similarity in their transposases and 100% IR nucleotide sequence identity. These results were compared to the closest IS sequences detected within each IS family in other Thermotogae (beyond the Fervidobacterium genus), which showed amino acid similarities and IR nucleotide sequence identities generally below 50%. These data confirmed the occurrence of clear interphylum HGT events between those two genera. Other members of the Firmicutes (i.e., Thermoanaerobacter, Caldanaerobacter, Caloramator, Thermobrachium and Clostridium, among others) also showed putative HGT events with the genus Fervidobacterium. Bacteria from other phyla (i.e., Aquificae, Synergistetes, Deinococcus-Thermus, Thermodesulfobacteria, Proteobacteria) and some Archaea were also detected as putatively involved in relatively recent HGT events with Fervidobacterium. Interestingly, most of these representatives showing an HGT relationship to Fervidobacterium were also thermophiles, which supports the requirement of potentially sharing a habitat to make DNA exchanges possible [30].
Fig 9. Symbols represent pairwise divergence estimates between transposase genes; the line indicates the divergence estimates for transposase to whole genome comparisons. Divergence estimates were calculated with respect to sequences 43_Fi (triangles), 646_Fp (squares), 687_Fp (circles) and 1012_Fp (diamonds). Identification of transposases follows the IDs used in the dendrogram for these sequences (Fig 7). https://doi.org/10.1371/journal.pone.0173961.g009
Fig 12. Symbols represent pairwise divergence estimates between transposase genes; the line indicates the divergence estimates for transposase to whole genome comparisons. Divergence estimates were calculated with respect to the sequences 3_Ft (triangles), 10_Ft (squares), 11_Ft (circles) and 19_Ft (diamonds). Identification of transposases follows the IDs used in the dendrogram for these sequences (Fig 10). https://doi.org/10.1371/journal.pone.0173961.g012

Conclusions
The diversity of transposase genes detected within the genus Fervidobacterium suggests that these sequences contribute substantially to the adaptation and evolution of these bacteria and their genomes in response to the environment. The actual implication of transposases, their role in HGT, the rate of HGT events and of transposase duplication and exchange remain to be fully understood. This study shows the relevance of transposases in HGT using the genus Fervidobacterium as a model and shows that IS sequences can be used to detect the occurrence of HGT events between phylogenetically distant prokaryotes and to identify the taxa involved in potential exchanges of genomic DNA.