Trends of the Major Porin Gene (ompF) Evolution: Insight from the Genus Yersinia

OmpF is one of the major general porins of Enterobacteriaceae that belongs to the first line of bacterial defense and interactions with the biotic as well as abiotic environments. Porins are surface exposed and their structures strongly reflect the history of multiple interactions with the environmental challenges. Unfortunately, little is known on diversity of porin genes of Enterobacteriaceae and the genus Yersinia especially. We analyzed the sequences of the ompF gene from 73 Yersinia strains covering 14 known species. The phylogenetic analysis placed most of the Yersinia strains in the same line assigned by 16S rDNA-gyrB tree. Very high congruence in the tree topologies was observed for Y. enterocolitica, Y. kristensenii, Y. ruckeri, indicating that intragenic recombination in these species had no effect on the ompF gene. A significant level of intra- and interspecies recombination was found for Y. aleksiciae, Y. intermedia and Y. mollaretii. Our analysis shows that the ompF gene of Yersinia has evolved with nonrandom mutational rate under purifying selection. However, several surface loops in the OmpF porin contain positively selected sites, which very likely reflect adaptive diversification Yersinia to their ecological niches. To our knowledge, this is a first investigation of diversity of the porin gene covering the whole genus of the family Enterobacteriaceae. This study demonstrates that recombination and positive selection both contribute to evolution of ompF, but the relative contribution of these evolutionary forces are different among Yersinia species.


Introduction
The genus Yersinia, a member of the Enterobacteriaceae family, is currently composed of 14 known species: Y. pestis, Y. pseudotuberculosis, Y. enterocolitica Y. aldovae, Y. aleksiciae, Y. bercovieri, Y. frederiksenii, Y. intermedia, Y. kristensenii, Y. massiliensis, Y. mollaretii, Y. rohdei, Y. ruckeri, and Y. similis [1][2][3]. Three of them are well documented human pathogens. Y. pestis is the etiologic agent of plague while Y. pseudotuberculosis and Y. enterocolitica are known to cause a variety of gastrointestinal symptoms [4]. The characterization of the remaining 11 species is more limited. However, these species accepted as human nonpathogenic possess novel virulence mechanisms, and some of them have been associated with human cases [5,6]. Yersinia are disseminated all over the world in terrestrial and aquatic environments, and associated with many different hosts (plants, animals, insects, fish and so on). Despite recent advances in our understanding of the pathobiology of Yersinia, the molecular-genetic mechanisms by which Yersinia colonizes and adapts to various host or environmental conditions are still poorly understood. In this context, membrane surface molecules are considered the major targets of the membrane-environment interaction.
General bacterial porins (GBPs) are one of the most abundant proteins (up to 10 5 copies per cell) in the outer membrane of the gram-negative bacteria [7,8]. Structurally, a typical GBP subunit consists of 16 antiparallel b-strands forming a b-barrel, with short turns facing the periplasmic space and long loops facing the external surface of bacterial membrane [9][10][11]. Three porin subunits are assembled into stable homotrimers. The best-studied GBPs, which include OmpF, OmpC and PhoE of E. coli, differ in their solute selectivity, porin activity and gene expression in response to many environmental factors, such as osmotic pressure, temperature and pH [12][13][14]. Porins are one of the first molecules responding to environmental changes and at least for some bacteria have been found to reflect their ecological niche by the sequence type [15,16]. As the major components of the outer membrane, some pore-forming proteins play a role in bacterial pathogenesis, such as adherence, invasion, and serum resistance [17][18][19][20].
Little is known about evolution and diversity of GBPs of the Enterobacteriaceae at all and the Yersinia especially. Scattered reports showed that Yersinia's major porin is the b-structured protein resistant to high temperature, proteases, and detergents [21][22][23]. Primary structure and topology of the OmpF porin of pathogenic Yersinia was determined and demonstrated 55% homology with E. coli and 70% homology with Serratia marcescens OmpFs, respectively [24,25]. Here we conducted an in-depth study of the ompF gene diversity in all currently known Yersinia paying special attention to evolution inference and phylogenetic relationships of these bacteria.

16S rDNA and gyrB sequence variations and genetic relationships among Yersinia species
To justify evolutionary relationships and taxonomic position, 16S rDNA and gyrB genes sequences were analysed in all Yersinia strains used in this study ( Table 1). The 16S rDNA gene sequencing has definitely allowed Yersinia identification [26] and recognizing novel species and subspecies within the genus [3,27,28]. However, the 16S rDNA sequence analysis cannot resolve the phylogenetic relationships between closely related Yersinia species [2,29]. Recently, gyrB has been successfully applied to characterization of Y. frederiksenii genomospecies [30] and was included as one of the MLST gene targets for studying genetic relationships among Yersinia species [29].
In this study, the total number of Yersinia strains was 65, covering all Yersinia species, and originating from different sources and geographic locations (Table 1). 16S rDNA and gyrB sequences from all the above strains were PCR amplified and sequenced. Eight additional sequences of each gene were obtained from publicly available Yersinia genomes (http://www.ncbi.nlm.nih. gov). In total 73 16S rDNA and gyrB sequences were analysed. The sequences were aligned and adjusted to 750 bp for 16S rDNA and to 838 bp for gyrB. Each unique sequence, differing in one or more nucleotide or amino acids sites, was assigned as a distinctive allele, resulting in 31 alleles for 16S rDNA and 48 alleles for gyrB or 21 alleles for GyrB (Table 1). The number of the detected alleles was ranged from 1 of 16S rDNA per species (Y. pestis, Y. pseudotuberculosis, Y. enterocolitica subsp. palearctica and Y. aldovae) or GyrB (Y. mollaretii, Y. pestis, Y. pseudotuberculosis, Y. similis Y. ruckeri, Y. rohdei and Y. bercovieri) and to 13 alleles of gyrB (Y. intermedia). The number of allele variants slightly varied from those published previously [29] possibly because of inclusion of more distant strains and/or increasing the lengths of the analyzed fragments.
In order to correctly identify each strain examined, a neighbour-joining tree was constructed from the 16S rDNA-gyrB concatenated sequences (Fig. 1). Ten Yersinia species (Y. aldovae, Y. bercovieri, Y. enterocolitica, Y. intermedia, Y. mollaretii, Y. pestis, Y. pseudotuberculosis, Y. similis, Y. rohdei and Y. ruckeri) were clearly grouped into relatively distinct clusters. The intraspecies genetic distance means of these species were up to 0.012. Y. pestis strains clustered tightly with the Y. pseudotuberculosis strains and the distance mean for this group was 0,001. Since, only one Y. similis strain was examined, the genetic distance of that species could not be estimated. Y. similis is a novel species in Yersinia, recently separated from its nearest phylogenetic neighbor Y. pseudotuberculosis [3]. As expected, Y. similis Y239 was clustered with Y. pseudotuberculosis and Y. pestis, forming a distinctive long branch. Strains of Y. enterocolitica were divided into three groups mainly caused by gyrB sequences, while 16S rDNA sequences separated strains into two subspecies (Y. enterocolitica subsp. enterocolitica and Y. enterocolitica subsp. palearctica), previously described by Neubauer et al., 2000 [28]. Strain Y. frederiksenii 2043 did not group with other five isolates of this species. It branched with Y. aleksiciae, Y. bercovieri and Y. mollaretii. Based on these results phylogenetic relations and BLAST (data not shown), Y. frederiksenii 2043 was more closely related to Y. massiliensis. Similar partition was observed for Y. kristensenii, three of which (991, Y332 and 6266) diverged from the other eight strains (6572, 8914, H17-36/83, 5868, 6032, 5862, 5306 and 5932) with a genetic distance about 0.055 and clustered with Y. aleksiciae Y159, sharing the genetic distance by 0,005. The data definitely indicated that these uncommon strains of Y. kristensenii and Y. frederiksenii might be members of Y. aleksiciae sp. nov. and Y. massiliensis sp. nov., since, Y. aleksiciae was recently separated from Y. kristensenii [1] and Y. massiliensis is more closely related to Y. frederiksenii [2]. Therefore, these strains were designated as Y. aleksiciae-like and Y. massiliensis-like, respectively. Based on the 16S rDNA-gyrB tree, most Y. intermedia clustered together into one of two branches; four Russian strains (6044, 5934, 6270 and 601) were located on the line leading to the rest Y. intermedia, shared the intraspecies distances up to 0.007.

Phylogenetic and recombination analyses of the ompF gene
We investigated phylogenetic relationships and recombination of the ompF gene from all Yersinia strains ( Table 1). The ompF gene was amplified, using primers, derived from a CLUSTALX alignment of the published ompF nucleotide sequences. 73 complete coding nucleotide sequences of the ompF gene were aligned to infer ompF phylogenetic tree. We found 62 unique nucleotide alleles of the ompF gene (table 1), which clustered into 18 groups on the tree (Fig. 2). Though different algorithms and clustering methods produced similar topologies of the ompF tree, phylogenetic clustering of the strains performed by neighborjoining method with Kimura 2-parameter algorithm gave the highest bootstrap values. With the exception of five species, Y. similis, Y. intermedia, Y. mollaretii, Y. frederiksenii and Y. aleksiciae, all strains that belong to the same Yersinia species were clustered in one group. Five strains of Y. ruckeri clustered together in a distinct group V and showed the intragroup distance mean by 0.002 and the largest intergroup genetic distance means from 0.166 to 0.197. It reaffirmed that Y. ruckeri has been fairly clonal and genetically the most distant species within the genus [26,29]. For Y. enterocolitica (group VII) the intraspecies genetic distance mean was 0.029, and the groups means of 0.097-0.166. Phylogenetic grouping of Y. enterocolitica ompF genes exactly replicated that of 16S rDNA-gyrB sequences with division in two subspecies, Y. enterocolitica subsp. palearctica (Y11, 1234, 2974/81, 6579, 1245, 2720/87, and 1215) and Y. enterocolitica subsp. enterocolitica (WA220 and ATCC 8081), supported by a high bootstrap value (100%). Interestingly, in both phylogenetic trees, Y. enterocolitica subsp. palearctica clearly splits into two lines (bootstrap value 100%), one of them was only formed by Y. enterocolitica strains (1215, 1234, and 1245) isolated in Russian Far-East. Strains of Y. kristensenii formed group X with intragroup distance mean 0.020, and intergroup distance means 0.072-0.183. The Y. bercovieri (group VI), Y. rohdei (group XIV) and Y. aldovae (group IV) were represented by only two strains and the within and between group distance means were up to 0.009 and 0.068-0.188, respectively. The strains of Y. pestis, Y. pseudotuberculosis and Y. similis grouped together (group VIII) with intragroup distance mean of 0.037, and between group distance means being 0.138-0.196. The VIII group splits into two subgroups with bootstrap value of 100%. One of these subgroups included two Y. pseudotuberculosis strains IP32953, IP31758 and Y. similis Y239, while the other-all Y. pestis strains and Y. pseudotuberculosis YPIII. This ompF tree topology did not correlate with the 16S rDNA-gyrB tree branching, possibly indicating interspecies recombination between Y. pseudotuberculosis and Y. similis, or/and diversification of the ompF gene of Y. pseudotuberculosis before emergence of Y. pestis by adaptive evolution.
The ompF sequences of the remaining species, Y. intermedia, Y. mollaretii, Y. frederiksenii and Y. aleksiciae, exhibited different phylogenetic relationships and produced incongruent molecular phylogenies with the 16S rDNA-gyrB tree. The Y. frederiksenii strains, that were genetically distinct and not closely related to each other according to the 16S rDNA-gyrB tree, split into three groups; two groups (XI and XII) were presented by single strains, and IX group was by strains with intragroup of 0.043 and intergroup from 0.112 to 0.178 distance means. From previously characterized Y. frederiksenii genomic groups [31], Y. frederiksenii IX and XI groups of ompF could corresponded to genomic groups 1b and 1a, respectively, and XII group (Y. massiliensis)-to genomospecies 2. A mixed branching pattern was found in Y. mollaretii and Y. aleksiciae strains. Two Y. aleksiciae strains (Y159 and Y332) grouped together with three Y. mollaretii strains (H279-36/85, Nr850/89 and ATCC43969), whereas two others Y. aleksiciae strains (991 and 6266)-with three other Y. mollaretii (87-36/87, H87/82 and Nr846/89). Therefore, Y. aleksiciae (groups XVI and XVII) and Y. mollaretii (groups XV and XVIII) strains split into two relatively distinct groups with intragroup distance means up to 0.007, and intergroup distance means of 0.048-0.197. Interestingly, Y. aleksiciae recently isolated from Y kristensenii was more closely related to Y. bercovieri and Y. mollaretii than to Y kristensenii, and that was confirmed by the 16S rDNA-gyrB tree. Previously, based on the concatenated tufA-tufB tree, Y. aleksiciae type strain LMG 22254 was found to be distinct from the Y. kristensenii cluster and clearly grouped with Y. bercovieri and Y. mollaretii [32]. The most genetically heterogeneous was Y. intermedia that formed four different groups (I-III, XIII) with intragroup distance means up to 0.039, and with between groups means being 0.076-0.195. Moreover, XIII group, formed by five Y. intermedia strains (601, Nr12/84, 1948, 5375 and 5631) was separated from the rest Y. intermedia groups by a number of genetic clades.
As mentioned above, some species produced incongruent 16S rDNA-gyrB and ompF phylogenies. A mix branching pattern can be a sign of recombination, whereas in the case of mutation the gene trees look the same [33]. To verify this assumption, we used four tests (RDP, MaxChi, Chimera, and Geneconv) in the RDP3.34 package for investigation of the ompF gene of all Yersinia groups. We detected four recombination events with brake-points involving three species, Y. intermedia (groups I, II, XIII), Y. aleksiciae (groups XVI and XVII) and Y. mollaretii (groups XVIII and XV)  intermedia ompF. This might be a subject of further research, as well as investigation of associations within a specific niche. It is very interesting to note that one of the brake-points of all recombinant ompF is located in the same region (431-501 bp), corresponding to 6-th b-strand of OmpF. The reconstruction of the ompF tree for the region 1-501 bp produced very similar branch pattern with that of the 16S rDNA-gyrB tree (data not shown). We suspect that significant nucleotide similarity in this region (with the exception of the external loops) reflects a strong selective pressure (purifying selection) due to an important functional role of this region as a zone of monomers contacts in a porin trimer. This might be an evident example of protein structural constraints.
Examples of the homologous recombination in porin genes have been recognized for some bacteria, mainly for naturally transformable species as Pseudomonas (OprD), Neisseria (PorB, OmpA), Chlamydia (OmpA), and Leptospira (OmpL1) [16,[34][35][36][37][38]. For these genes different mosaic patterns have been identified. The intragenic recombination has been frequently observed within species due to the transfer of a portion or an entire gene. As a rule, the exchanges occur only in the loop regions and do not affect the transmembrane domains. Moreover, rare cases of interspecies recombination of porin genes have been described in literature. It was suggested that porB2, an allele of porB, arose in meningococci by interspecies recombination between ancestral pathogenic and commensal Neisseria species [39]. Also, an interspecies recombination in ompA between a mouse strain of C. trachomatis and a horse strain of C. pneumoniae was documented [35]. Multiple interspecies recombination patterns were observed among ompL1 genes, belonging to four different Leptospira species [38].
The phylogenetic analysis of ompF sequences placed most of the Yersinia strains in the same line assigned by 16S rDNA-gyrB tree with the exception of six species, Y. pseudotuberculosis, Y. similis, Y. frederiksenii, Y. intermedia, Y. mollaretii, and Y. aleksiciae. The incongruence of ompF and 16S rDNA-gyrB trees indicated the inter-and intraspecies recombination. Despite extensive recombi- nation events in the Yersinia ompF genes, this seems to happen not so often to remove all phylogenetic signals.

Adaptive evolution of the ompF gene in Yersinia
As it was shown above, the ompF gene of Yersinia is more divergent than the 16S rDNA and gyrB genes. The nucleotide diversity for all ompF genes (0,13160,005) is twofold higher than for housekeeping genes (0,05160,004). The common alignment of 73 ompF sequences contain 40% (479/1200 bp) of polymorphic nucleotide sites, which distributed strikingly nonrandom and formed hypervariable and conserved regions (Fig. 4). We have divided yersinia's ompF gene into 18 regions, according to domain organization of Escherichia coli OmpF protein [9]. Loops L2, L4-L7 were characterized by nucleotide deletions and/or insertions. Comparative analysis of surface-exposed loops exhibited significant heterogeneity of L4 and L5 (4664.5%). The highest homology was conserved in L3 (8.261.6%). The same nonrandom heterogeneity with characteristic conserved regions forming the bbarrel structure of the proteins, and variable regions, making up the putative surface-exposed loops, has been shown in some other porins [39,40].
To estimate deviation in codon usage, the codon adaptation index (CAI) was calculated for the ompF gene. CAI is a measure of the relative adaptiveness of the codon usage of a gene towards the codon usage of highly expressed genes of that organism: the higher the index value, the greater the codon usage bias [41]. As a reference for highly expressed genes, we used the 27 concatenated ribosomal genes for ten Yersinia species. The genes of the ribosomal proteins had a CAI value from 0.52 to 0.56 for all species, but CAI values for the ompF gene were higher (from 0.64 to 0.75). Therefore, there is a strong codon usage bias in the ompF gene in all Yersinia species, as expected for highly expressed genes. This is another reason to assume that the high level of ompF transcription may be also responsible for nonrandom heterogeneity in the gene.
To determine how the level of selective constraint varies along the ompF gene, we estimated the numbers of synonymous substitutions per synonymous site (dS) and nonsynonymous substitutions per nonsynonymous site (dN) and calculated the dS/dN ratio for the ompF gene. If purifying selection has occurred, a gene has a dS/dN.1. Absence of selection should generate dS/ dN = 1. A ratio dS/dN,1 indicates diversifying selection or accelerated evolution [42,43]. We excluded Yersinia groups with recombination events from analysis and dealt only with six ompF groups of Yersinia (VII, VIII, IX, I, X, XIII). The dS/dN ratio was calculated as an average over all of the codon sites in each ompF group using the Nei-Gojobori method by MEGA 4 of Jukes-Cantor model. Statistical significance was tested by Codon-based Z-test. For all groups we detected approximately identical dS/dN  means from 4.224 to 5.748 with p,0.05 of purifying selection. Thus, ompF gene is under strong purifying selection in all six Yersinia groups. Neilsen and Yang method [44], compiled in Sitewise likehood ratio estimation programme [45], was used to identify the sites with the evidence of positive selection in selected ompF groups. The porin protein structures for these groups have been simulated and sites with weak or strong positive selection have been located on the models (Fig. 5).
When these selected sites were mapped onto three-dimensional structural models, it becomes clear that the majority fell within regions predicted to encode surface-exposed loop regions. It is important to note that these sites were located in different surface loops of analyzed Yersinia groups. For example, three residues in putative loop L1 were shown to be under strong selection in the group VIII, whereas there is no evidence of positive selected sites in putative loop L1 for groups VII, XIII, I and X. Smith N.H. observed unlike distribution of positive selected regions in porB genes in N. meningitides and N. gonorrhoeae [46]. Authors explained this by differences in the immune response to these two organisms. The impact of diversifying selection on ompC, ompF, lamB and fhuA omp's genes of Escherichia and Shigella [47,48], ompC, ompS1 and ompS2 genes of Salmonella [49] has been demonstrated. Authors proposed that positive selection in omp genes may be an important mechanism that facilitates adaptation of bacterial pathogens allowing them to escape recognition by the host immune system, phages and penetration of antibiotics.
Our analyses demonstrated that the Yersinia ompF gene has evolved with nonrandom mutational rate under purifying selection in overall. However, the surface loops of the OmpF porin contain sites subjected to positive selection. Interestingly, such sites are located in different surface loops in different Yersinia species. We suppose that the ompF genes of different Yersinia species have evolved under individual constraints associated with unlike environmental challenges. Existence of both positive selection and recombination in porin genes has previously been reported for Neisseria porB and porA genes [37,50] as well as for ompA from Chlamidia [35] and Wolbachia [51]. In case of Yersinia ompF gene we consider that horizontally acquired fragments of some surface loops may be fixed by positive selection in process of species adaptation to new ecological niches. Such recombinant genes might supply their new hosts with benefits allowing to escape a deadly response of the immune system as well as lethal attacks of phages and antimicrobials. This might be more easily achieved by gene recombination rather than by random mutations. Moreover, these mechanisms seem to operate in evolution of porins genes of all taxonomic groups.
Conclusion. Genetic diversity of outer membrane proteins might result from bacterial adaptation to different ecological niches. Porins are surface exposed and their structure strongly reflects the history of multiple interactions with the environmental changes in their ecological niches. The evolution of the ompF gene of Yersinia clearly demonstrates a combination of diversifying selection (recombination and positive selection) and functionstructure constraint (translational selection and purifying selection). The data can be important for clarification the role of porin's surface exposed loops on bacterial adaptation and development of broad-spectrum Yersinia vaccine antigens and serological methods of diagnostics.

Materials and Methods
Bacterial strains, growth conditions, and DNA isolation A total of 65 Yersinia strains from the collections of Max von Pettenkofer Institute (Munchen, Germany) and Research institute of epidemiology and microbiology, Siberian branch of Russian academy of medical sciences, (Vladivostok, Russia) were used in this study. Strain selection was intended to include strains of all known Yersinia species with a high degree of diversity. All strains were grown overnight at 30uC or 37uC under aerobic conditions on LB medium. Bacterial DNA was isolated from overnight cultures of the selected strains using Genomic DNA Purification Kit (Fermentas, EU). The DNA concentration was determined by agarose gel electrophoresis. The gels were scanned and the signals were analyzed with the VersaDoc 4000 MP system (Bio-Rad Laboratories AG, Switzerland). Additionally, eleven Yersinia strains for which the genome sequences are available on the GeneBank of NCBI website were analyzed.

Comparative sequence analysis and phylogeny inference
Nucleotide sequence data from forward-and reverse-strand chromatograms were assembled into single contiguous sequences using the Vector NTI Advance 9.1.0 software. Sequences were aligned by ClustalW 2.0.10 [52]. MEGA version 4.1 [53] was used to calculate genetic distances between sequences and to produce phylogenetic trees. To construct the tree from nucleotide sequences, all three coding positions were examined and the Neighbour-Joining model with Kimura 2-parameter method [54] was applied. The reliability of the inferred trees was assessed using the bootstrap test (1000 replications) [55]. Alignment gaps were excluded using function ''Pairwise Deletion'' from all analyses.

Evolution analyses
Nucleotide divergence (Pi) along ompF sequences was determined by DnaSP v5 [56] using Sliding window with length of 20 and step size of 7. Adaptive evolution of ompF gene was calculated as proportion of synonymous (silent; ds) and non-synonymous (amino acid-changing; dn) substitution rates in MEGA 4 using the Nei-Gojobori method with Jukes-Cantor correction and SLR [45] software. Recombinant ompF sequences were detected with the RDP v3.34 software [57] using four automated recombination detection methods including RDP [58], Genconv [59], Chimaera [60], Maximum Chi Square [60,61]. For the RDP method, internal reference sequences were used, the window size was set to 20, and 0-100 sequence identity was used. For both the MaxChi and the Chimera methods, the number of variable sites was set to 40. For the GENCONV method, we used standard settings. A maximum P value of 0.01 and a Bonferroni correction were used. Results were then checked by visual inspection. CAI index was calculated by CodonW 1.3 (ftp://molbiol.ox.ac.uk/cu/codonW. tar.Z) software for 11 Yersinia species (Y. pestis CO92, Y. pseudotuberculosis IP32953, Y. enterocolitica ATCC 8081, Y. intermedia ATCC 29909, Y. rohdei ATCC 43380, Y. kristensenii ATCC 33638, Y. frederiksenii ATCC 33641, Y. mollaretii ATCC 43969, Y. ruckeri ATCC 29473, Y. bercovieri ATCC 43970, Y. aldovae ATCC 35236). As a reference for highly expressed genes, we used the 26 concatenated ribosomal genes for each organism.