Main Functions and Taxonomic Distribution of Virulence Genes in Brucella melitensis 16 M

Many virulence genes have been detected in attenuated mutants of Brucella melitensis 16 M; nevertheless, a complete report of these genes, including the main Cluster of Orthologous Groups (COG) represented as well as the taxonomical distribution among all complete bacterial and archaeal genomes, has not been analyzed. In this work a total of 160 virulence genes that have been reported in attenuated mutants in B. melitensis were included and analyzed. Additionally, we obtained 250 B. melitensis randomly selected genes as a reference group for the taxonomical comparisons. The COGs and the taxonomical distribution profile for 789 nonredundant bacterial and archaeal genomes were obtained and compared with the whole-genome COG distribution and with the 250 randomly selected genes, respectively. The main COGs associated with virulence genes corresponded to the following: intracellular trafficking, secretion and vesicular transport (U); cell motility (N); nucleotide transport and metabolism (F); transcription (K); and cell wall/membrane/envelope biogenesis (M). In addition, we found that virulence genes presented a higher proportion of orthologs in the Euryarchaeota and Proteobacteria phyla, with a significant decrease in Chlamydiae, Bacteroidetes, Tenericutes, Firmicutes and Thermotogae. In conclusion, we found that genes related to specific functions are more relevant to B. melitensis virulence, with the COG U the most significant. Additionally, the taxonomical distribution of virulence genes highlights the importance of these genes in the related Proteobacteria, being less relevant in distant groups of organisms with the exception of Euryarchaeota.


Introduction
Bacteria of the genus Brucella are the etiological agents of brucellosis, the most widely spread zoonotic disease worldwide. Brucella spp. are able to infect a wide range of mammals, including humans, with an estimated 500,000 new reported human cases per year [1,2]. Although bacteria belonging to this genus, such as B. melitensis, B. suis, B. abortus and B. canis, are known to infect humans, only B. melitensis and B. abortus are considered the main causal agents of human infections [3,4].
B. melitensis is a Gram-negative facultative intracellular pathogen that belongs to the alpha-2 proteobacteria group. This bacterium has the ability to infect phagocytic macrophages and nonphagocytic cells (epithelial cells, fibroblasts and osteoblasts), and its virulence relies on the ability to replicate inside these cells [5,6]. Brucella lacks known bacterial pathogenic factors that can directly harm eukaryotic cells, such as cytolisins, exotoxins, exoproteases and exoenzymes, nor does it express pathogenic determinants, like fimbriae, capsules, antigenic variation and plasmids [5,6]. All these data suggest that the bacterium produces direct tissue damage, probably by the activation of host immune responses [5]. Despite the absence of classical virulence factors, many virulence genes have been detected via random [7,8] and directed [9] mutagenesis in attenuated phenotypes. The best studied are the virB operon, which encodes the Type IV secretion system (T4SS), and genes related to lipopolysaccharide production and membrane proteins [10,11,12]. To date, an integrative analysis of these genes in B. melitensis has not been addressed; therefore, the main goal of this work is to achieve a comprehensive comparison of virulencerelated genes identified in B. melitensis 16 M under different approaches, to gain insights into their main functions and their taxonomical distribution across bacterial and archaeal genomes.

Material and Methods
In this analysis, we included all the identifiable genes (by the BME locus) reported to date in vivo or in cells with an attenuated phenotype in B. melitensis 16 M [7][8][9][10][12][13][14][15][16][17][18][19][20][21][22][23][24]. We achieved a comparison of the functional annotations based on the Cluster of Orthologous Groups (COG) classification in relation to the wholegenome COG distribution. Additionally, a taxonomical distribution analysis profile of these genes was conducted and the results were compared with randomly selected genes of B. melitensis.

Functional annota tions
The functional annotations were retrieved from the DOE Joint Genome Institute Integrated Microbial Genomics Project server (http://img.gji.doe.gov).

Identification of orthologous genes
The orthologs for each gene in 789 nonredundant bacterial and archaeal genomes that have been completely sequenced were obtained by using the bidirectional best hits (BDBHs) definition in the protein sequence through depurated genomes at 95% identity, with an E-value of #1e-6, as described elsewhere [25]. The taxonomical distribution of the virulence genes was compared with the taxonomical distribution of 250 randomly selected genes of the B. melitensis 16 M genome that did not overlap with virulence genes; these randomly selected genes were obtained using the function of random gene selection of the RSA tools database (http://rsat.ulb.ac.be). These genes are described in Supplementary Table S1.

Statistical analysis
The descriptive statistics for qualitative variables consisted of frequencies and percentages and for quantitative variables they consisted of medians and ranges. For the comparison of the COG classification and the taxonomical distribution, the chi-squared test and Fisher exact test were performed. For the comparison of the number of orthologs between virulence and randomly selected genes, we used the Mann-Whitney U test (considering the nonparametric distribution of the data). The statistical analysis was carried out with the software SPSS v 10.0 and Epi-info. Statistical significance was set at p#0.05.

Results
A total of 160 virulence genes of B. melitensis 16 M identified in attenuated mutants were gathered from the literature. The BME locus, functional description, COG annotation and number of orthologs identified in all the bacterial and archaeal genomes included, as well as the reference, are listed in Table 1.

COG classification and comparison with the whole genome COG distribution
In order to have a better approximation of the functional annotations of the virulence-related genes described in previous studies, the genes were classified in terms of their COGs, and afterwards their frequencies were evaluated. From these analyses, we observed that 20 of the 24 COG categories were present in the virulence genes ( Table 2). The most frequent COGs observed were genes for cell wall/membrane/envelope biogenesis (M) (10.62%), transcription (K) (10.44%), intracellular trafficking, secretion and vesicular transport (U) (7.81%) and carbohydrate transport and metabolism (G) (7.19%) while the least frequent COGs were secondary metabolites biosynthesis (Q) (0.62%), lipid transport and metabolism (I) (0.63%), defense mechanisms (V) (1.25%) and inorganic ion transport and metabolism (P) (1.88%) ( Table 2). In the comparison of virulence genes with the wholegenome COG distribution, we found a significant overrepresentation of the COGs for intracellular trafficking, secretion and vesicular transport (U) (7.81% vs 1.5%), nucleotide transport and metabolism (F) (6.56% vs 2.2%), cell motility (N) (4.69% vs 0.92%), transcription (K) (10.44% vs 5.62%) and cell wall/ membrane/envelope biogenesis (M) (10.62% vs 5.0%) and an underrepresentation of genes not categorized in a COG (NC) ( Table 2). The difference was more significant for COGs U, N and F, which are composed mainly of genes grouped in the virB operon, flagellar genes and genes for enzymes related to nucleotide synthesis, respectively; COG M includes predominantly genes for enzymes related to lipopolysaccharide production, while COG K was mainly represented by RNA polymerase sigma factors and transcription factors of the families MerR, LysR, LuxR, AsnC and GntR (Table 1).

Taxonomical distribution and comparison with random genes
We observed a significantly higher number of orthologs among the virulence genes in comparison with random genes (p = 0.034). The virulence genes presented a median (range) of 174 (0-727) orthologs and 119 (0-788) for random genes. These data were obtained by adding all the orthologs of each gene included and then obtaining and comparing (with the Mann-Whitney U test) the median and range per group of genes (random and virulence). In Table 3 we show the taxonomical distribution of the orthologs present among the virulence and random genes. These analyses were conducted by comparing the frequencies and percentages of the number of orthologs in each organism division and not the numerical count (i.e., medians and ranges). We used this method after considering that the presence or absence of an ortholog is a dichotomous qualitative variable and the best way to analyze its behavior is by using qualitative probes, which better describe their distribution in the different bacterial and archaeal divisions, thus avoiding a possible bias by the total increase or decrease of orthologs in a particular set of genes. By using the frequencies and percentages, we were able to identify an increase or a decrease in the proportion with respect to the total number of orthologs in a particular bacterial or archaeal division with higher accuracy than when using quantitative probes.
In order to know the distribution of the orthologs in the 2 sets of genes in the different bacterial and archaeal genomes, we adjusted the percentages of the observed orthologs by dividing the total number of orthologs for each organism by the number of genes in its genome, and the sum of these values in one particular division was divided by the number of organisms analyzed in that division. Finally we adjusted the total to 100% and obtained the corresponding percentage by group of organisms. After adjustments by genome size and number of organisms analyzed per group, we observed that the highest proportions of orthologs in random genes were found in Aquificae, Nitrospirae, Elusimicrobia and Proteobacteria, and the lowest frequencies were in Nanoarchaeota, Crenarchaeota and Euryarchaeota. Virulence genes presented the highest frequencies in Aquificae, Synergistetes, Nitrospirae and Deferribacteres and the lowest frequencies in the archaeal groups Nanoarchaeota, Korarchaeota and Crenachaeota (Table 3).
In the comparison of virulence genes with the randomly selected genes, we observed a slight but significant increase of orthologs in Euryarchaeota and Proteobacteria, with a significant decrease in Tenericutes, Chlamydiae, Bacteroidetes, Firmicutes and Thermotogae. The differences within the Proteobacteria, and the Alphaproteobacteria groups are described in Table 4. Within Proteobacteria division, in virulence genes, we found a slight but significant decrease of orthologs in Gammaproteobacteria, and within Alphaproteobacteria we found a significant decrease in the order of Rickettsiales when compared with randomly selected genes. It is important to mention that in order to observe the differences in the comparisons of the orthologs present, the absolute percentage and not the adjusted percentage must be considered, because for comparison purposes the number of genes per genome or the number of genomes by each particular division did not affect the differences observed, as the same organisms were used to obtain the orthologs in the 2 sets of genes. This is why these percentages are more accurate and better reflect the differences between both groups. The adjusted percentages must be taken into account only to appreciate the orthologs distribution in each set of genes, as previously mentioned.   The virulence genes with the lowest number of orthologs were BMEII0039 (gltD) and BMEI0972 (gor), with no orthologs, followed by BMEI0998 (wboA), BMEI0926 (emrA) and the three hypothetical proteins BMEI0540, BMEI0514 and BMEI0193, with only one ortholog (in B. suis). The gene BMEII0031 (virB7) only presented 2 orthologs (in B. suis and Methylobacterium radiotolerans JCM1831) ( Table 1). The genes with the highest number of orthologs were BMEI1301 (dapA), BMEI0296 (purE), BMEI1519 (purD), BMEI1127 (purL) and BMEI0781 (rpoA), with around 700 orthologs (Table 1). Additionally, orthologs of the virB operon were found in different organisms; nevertheless, in addition to B. suis (the only member of the family Brucelaceae included), Methylobacterium radiotolerans JCM1831 was the only analyzed bacterium that presented an ortholog for the 11 genes of this operon, followed by Burkholderia vietnamiensis G4, B. xenovorans LB400 and Leptothrix cholodnii SP6, with 10 virB genes.

Discussion
In this report we comprehensively evaluated all the attenuated mutants in B. melitensis described in the literature. Additionally, we performed a COG and taxonomical comparison in order to gain insights into the function and distribution of these genes.
From these analyses we observed virulence genes in all but 4 COGs, which indicate that virulence genes are present in almost all functional categories. The absence of virulence genes in COGs A, B, D and W is probably due to the extremely low frequencies of these COGs in the B. melitensis genome (,1%). In the comparison of the most highly represented COGs among the virulence genes, we found good agreement with the largest study performed in random mutants of B. melitensis [8], in which COGs related to intracellular trafficking and vesicular transport (U), nucleotide transport and metabolism (F), transcription (K), cell wall/ membrane/envelope biogenesis (M) and energy production and conversion (C) were overrepresented in attenuated mutants. In this study, where the information was complemented with information on the rest of the mutants reported so far for B. melitensis, we also found that COG N (composed of flagellar genes) was overrepresented, while COG C was no more significantly affected (Table 2). These results showed that genes related to intracellular trafficking, mainly composed of virB and flagellar genes and also to genes related to nucleotide synthesis, lipopolysaccharide production and transcription factors of the families MerR, LysR, LuxR, AsnC and GntR, are essential for B. melitensis virulence. These observations have been mainly confirmed for virB genes, for which many independent studies have shown that virB is required for infection persistence [8,11,20,24]. As mutants lacking this operon show normal dissemination, it has been proposed that this genes products are not required in early infection establishment but instead in the late stages of infection [20]. Some mutants classified in the COGs related to transcription (K), such as RNA polymerase sigma factors and VjbR and ArsR6 transcription factors, are associated with virulence based on their importance in virB transcription ( Table 1). The underrepresentation of genes not categorized in a COG suggests that uncharacterized genes are less relevant in B. melitensis virulence.
The increased number of orthologs for virulence genes compared with randomly selected genes suggests an increased conservation of these genes in different bacterial and archaeal groups, which is probably related to a higher importance of their functions in the analyzed organisms. It is important to mention that some of these genes affect the intracellular replication of B. melitensis by disruption of general metabolism (i.e., nucleotide metabolism), which is also essential in other nonpathogenic organisms, and so these genes could be contributing to the higher number of orthologs found compared to results with the randomly selected genes.
The adjusted taxonomical distribution of orthologs among virulence and random genes was unexpectedly found with a high frequency in bacterial groups different from Proteobacteria, including Aquificae, Nitrospirae, Deferribacteres and Elusimicrobia (Table 3). This  could be explained by events of massive horizontal transfer between the Brucellaceae family and these groups, as proposed in deviations from the genome-wide molecular clock [18], although it is also important to consider that these groups are composed of a small number of organisms, which diminishes the representativeness and could affect their true distribution. The significantly high number of orthologs in Euryarchaeota, which was even more pronounced in Proteobacteria, together with the low proportion of orthologs in other bacterial groups, including Tenericutes, Chlamydiae, Bacteroidetes, Firmicutes and Thermotogae, indicates that virulence genes are more conserved in Euryarchaeota (Archaea) and Proteobacteria than expected by chance, with an underrepresentation of many bacterial groups, including intracellular microorganisms (Table 3). These observations seem contradictory, considering that it is thought that virulence genes are more likely to be associated with intracellular adaptations; nevertheless, it has been shown that some virulence genes in Brucella spp. are also needed for plant symbiosis or even plant virulence in members of the Rhizobiaceae family, a phylogenetically related family with a different lifestyle (plant symbionts or plant pathogens) [27,28,29]. This suggests an evolution and adaptation of these genes ancestors to different environments, such as animal or plant intracellular life. To date, the main virulence genes associated with intracellular survival in B. melitensis are composed of the virB operon, which is more closely related to Proteobacteria than to other bacterial groups. Other virulence-related genes reported so far exhibit a wide diversity of functions, including transcription, membrane structure and cell motility, which are more likely to be associated with phylogenetically related organisms than with distantly related ones when compared with randomly selected genes. The significant diminishment in the number of orthologs in Gammaproteobacteria and Rickettisales (Table 4) suggests a diminished conservation of these genes in these particular groups within the Proteobacteria and the Alphapoteobacteria groups, respectively, which could also be explained by gene loss in these specific divisions. It is important to mention that the proportion of orthologs in virulence genes in the Alphaproteobacteria was not increased within the Proteobacteria group; likewise, there was not an increased proportion of orthologs  in the Rhizobiales (the order to which B. melitensis belongs) within the Alphaproteobacteria subgroup. These results could indicate that although there is not an increased proportion of orthologs in these subgroups, the increased number of orthologs in the Proteobacteria suggests that a higher number of orthologs is distributed in all the subgroups of Proteobacteria in a similar proportion, including the Alphaproteobacteria and hence the Rhizobiales, with the exception of Gammaproteobacteria and Rickettsiales, which presented a diminished presence (as previously mentioned). The lack of a particular increase of orthologs in Alphaproteobacteria and Rhizobiales could be due to a nonpreferential conservation of these genes in these subgroups and/or that the slight increase observed (Table 4) does not reach statistical significance due to the small sample size, considering that these comparisons were performed with a smaller sample size than the comparisons for the main groups of organisms. With respect to the number of orthologs present in specific virulence genes, we found that genes related to specialized functions, such as BMEII0039 (gltD), BMEI0972 (gor), BMEI0998 (wboA) and BMEI0926 (emrA), as well as 3 hypothetical proteins with unknown function (BMEI0540, BMEI0514 and BMEI0193) exhibited the lowest number of orthologs, suggesting a set of unique genes involved in the B. melitensis virulence network that could be candidates for specific drug targets. While the genes with known functions that are important but more general, such as nucleotide metabolism [BMEI0296 (purE), BMEI1519 (purD) and BMEI1127 (purL)] and the RNA polymerase BMEI0781 (rpoA), presented the highest number of orthologs. It is noteworthy that the gene with the highest number of orthologs was BMEI1301 (dapA), a gene involved in lysine biosynthesis (Table 1).
Considering the few organisms with orthologs for the complete set of genes included in the virB operon (besides B. suis), we propose a possible mechanism of horizontal transfer or a similar evolution to other Rhizobiales (in the case of Methylobacterium radiotolerans) or even Betaproteobacteria (in the case of Burkholderia vietnamensis, B. xenovorans and Leptothrix cholodnii), as these bacteria are not the closest organisms to B. melitensis that were analyzed. A mechanism of gene loss in closer organisms is also possible.
In conclusion, 160 virulence-related genes in B. melitensis were retrieved from the literature. They exhibited a wide range of functions with an overrepresentation of specific COGs, such as intracellular trafficking and vesicular transport, transcription, cell wall and membrane biogenesis, nucleotide transport and metabolism and cell motility.
The taxonomical distribution analysis showed low but significant differences between virulence and randomly selected genes in B. melitensis. These differences could be related to the conservation of these genes in different bacterial groups and are probably related to their functions (considering that a higher conservation is closely related to a more required function). In this line, the high number of orthologs in Proteobacteria and to a lesser extent in Euryarchaeota (from the Archaea domain) with a significant decrease in other groups when compared with randomly selected genes is remarkable and indicates a higher conservation of these genes within the Proteobacteria group, which in the case of B. melitensis required adaptation to animal intracellular pathogenicity. These differences would also indicate a reduced likelihood of horizontal transfer with distant related organisms with the exception of Euryarchaeota; nevertheless, further experimental and integrative analyses will redefine the main functions and taxonomical distribution of virulence genes in B. melitensis. Finally, our integrative approach suggests potential specific drug targets, such as the gltD, gor and wboA genes, that when combined with drugs that affect virulence genes of different functions could be used in a synergistic fashion to treat B. melitensis infection.  : adjusted percentage, this is the corresponding percentage of the number of orthologs per each 100 genes of each organisms and divided by the number of organisms in each group, P value obtained with chi-squared test and Fisher exact test, *p value #0.05. In order to understand the differences in the percentages between the groups, the percentage and not the adjusted percentage must be taken into account, the adjusted percentage should only be considered in descriptive results, i.e., to know the distribution of orthologs in the organisms groups in each set of genes. doi:10.1371/journal.pone.0100349.t004