Phylogeny in Aid of the Present and Novel Microbial Lineages: Diversity in Bacillus

Bacillus represents microbes of high economic, medical and biodefense importance. Bacillus strain identification based on 16S rRNA sequence analyses is invariably limited to species level. Secondly, certain discrepancies exist in the segregation of Bacillus subtilis strains. In the RDP/NCBI databases, out of a total of 2611 individual 16S rDNA sequences belonging to the 175 different species of the genus Bacillus, only 1586 have been identified up to species level. 16S rRNA sequences of Bacillus anthracis (153 strains), B. cereus (211 strains), B. thuringiensis (108 strains), B. subtilis (271 strains), B. licheniformis (131 strains), B. pumilus (83 strains), B. megaterium (47 strains), B. sphaericus (42 strains), B. clausii (39 strains) and B. halodurans (36 strains) were considered for generating a species-specific framework and probes as tools for their rapid identification. Phylogenetic segregation of 1121 16S rDNA sequences of 10 different Bacillus species into 89 clusters enabled us to develop a phylogenetic framework of 34 representative sequences. Using this phylogenetic framework, 305 out of 1025 16S rDNA sequences presently classified as Bacillus sp. could be identified up to species level. This identification was supported by signature sequences 20 to 30 nucleotides long and by in silico restriction enzyme analysis specific to the 10 Bacillus species. This integrated approach resulted in identifying around 30% of Bacillus sp. up to species level and revealed that B. subtilis strains can be segregated into two phylogenetically distinct groups, such that one of them may be renamed.


Introduction
Phylogenetics, the science of estimating the evolutionary past, is based on the comparison of DNA or protein sequences [1]. In this age of rapid and rampant gene sequencing, the availability of a large amount of genomic information from 639 sequenced genomes (http://www.ncbi.nlm.nih.gov) and 16S rDNA sequencing data of 451545 isolates (http://rdp.cme.msu.edu/) has given new dimensions to microbial taxonomy and is likely to lead to revision of concepts such as species, organism and evolution [2]. 16S rDNA gene sequencing is often used as an alternative method to define microbes at species level. Protein-coding genes with high variability have been successfully used to differentiate taxa that cannot be identified solely on the basis of 16S rDNA sequences, e.g., heat shock protein (hsp65) [3], hsp70, ATPase β-subunit, RNA polymerases and recombinase (recA) [4]. In addition, partial rpoB sequences have been applied to classify members of the genus Mycobacterium [3,5]; gyrB gene sequences have been used to define members of Acinetobacter [6], Mycobacterium [7], Pseudomonas [8] and Shewanella [9]; and the gyrA gene has been used for defining Bacillus subtilis and related taxa [10].
Members of the genus Bacillus comprise gram-positive, spore-forming, rod-shaped, aerobic bacteria. Bacillus species are phenotypically and genotypically heterogeneous [11,12]. Bacillus represents microbes of high economic, medical and biodefense importance, such as bio-pesticides [13], biofuel producers [14][15][16] and pathogens [17,18]. Bacillus thuringiensis is currently used for the biological control of insects and in crop protection [19]. B. subtilis strains produce a broad spectrum of bioactive peptides with great potential for biotechnological and biopharmaceutical applications [20]. Bacillus licheniformis strains also produce a variety of peptide antibiotics such as bacitracin [21,22] and bacteriocin [23], and are also known to contaminate industrial processes [24][25][26] and cause food poisoning [27,28]. Bacillus spores are being used as human and animal probiotics despite the fact that studies now indicate extensive mislabeling of constituent Bacillus strains [29,30]. Therefore, it is becoming increasingly clear that a more rigorous selection process is required for Bacillus probiotic candidates [31,32]. Because of these divergent characteristics, questions arise concerning intra-species diversity that could differentiate isolates of potential economic importance. It is for these reasons that 11 closely related Bacillus members are among the 29 Bacillales sequenced so far (http://www.ncbi.nlm.nih.gov).

Bacterial Systematics
Bacterial systematics began long before the discovery of DNA as the heritable material. Bacteria were originally classified largely on the basis of phenotype, morphology, ecology and other associated metabolic characteristics. Bacterial taxonomy has been a tedious, esoteric and uncertain discipline. Some excitement was brought in by the team led by Dr. Carl Woese. They provided a detailed insight into bacterial phylogeny by exploiting molecular biology in an innovative manner [33]. Genomic discoveries are posing a challenge to the classical bacterial systematics [34].

Defining Diversity of Bacillus
Bacillus cereus group. Three species of the Bacillus cereus group (B. cereus, Bacillus anthracis and B. thuringiensis) have a marked impact on human activity. B. cereus and B. anthracis are important pathogens of mammals, including humans, and B. thuringiensis is extensively used in the biological control of insects [68]. In the B. cereus group the chromosomes of the sequenced members are extremely similar. The number of genes unique to one species is quite limited and often represents metabolic adaptations [34]. Although B. anthracis can be distinguished from B. cereus on the basis of biochemical tests [69], certain isolates on the periphery, such as the pathogenic B. cereus G9241 [70], cannot be properly classified unless they are checked for the plasmid pXO1, which is native to B. anthracis [34]. Since comparisons of chromosomal contents cannot easily distinguish the different members of the B. cereus group (B. cereus, B. anthracis and B. thuringiensis), one may have to look for more easily recognizable markers. Based on phylogenetic homogeneity, 86 strains of B. thuringiensis could be closely clustered together in four different groups (Bt group I-IV) at a DNA similarity rate of 93% [54]. B. thuringiensis is closely related to B. anthracis and Bacillus mycoides and is regarded as a subvariant of B. cereus based on genotypic data [12,71,72]. Demonstration of the high genetic relatedness of B. thuringiensis, B. anthracis and B. cereus has led to the suggestion that these are members of a single species, B. cereus sensu lato [48,72,73]. Overall, genetic studies have shown that B. cereus and B. thuringiensis are essentially identical [74]. B. anthracis can be distinguished from B. cereus and B. thuringiensis through microbiological and biochemical tests. B. anthracis isolates are non-hemolytic, non-motile, penicillin sensitive, susceptible to γ-phage and produce a poly-γ-D-glutamic acid capsule [70].
The classification and taxonomic separation of members of B. cereus group is rather difficult even with modern molecular tools.
Analysis of large culture collections of B. cereus, B. anthracis and B. thuringiensis by AFLP and MLST [70,75,76] has identified a class of organisms containing toxigenic B. cereus and B. thuringiensis that are closely related to B. anthracis. These isolates were phylogenetically distinct from environmental B. cereus and B. thuringiensis [75] and might represent the closest ancestor of B. anthracis.
Bacillus subtilis group. Although several species resembling B. subtilis have been described over the last two decades, identification of B. subtilis-like organisms has been quite difficult and laborious. Presently, the difficult part lies in confirming the significance of the sequence clusters. Such organisms have almost identical 16S rDNA sequences (99.2 to 99.6% sequence similarity) [42,77]. Comparative sequence analysis of the gyrA gene, which codes for DNA gyrase subunit A, in 7 representatives of B. subtilis and allied taxa provided a framework for their rapid and accurate classification and identification [10]. B. subtilis has also been divided [65] into two subspecies, namely B. subtilis subsp. subtilis and B. subtilis subsp. spizizenii, on the basis of cell wall chemistry and DNA-DNA relatedness data.
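To give a sense of what 99.2 to 99.6% similarity means in practice, the sketch below computes percent identity between two pre-aligned sequences. It is only an illustration: the function name `percent_identity` and the toy fragments are hypothetical, and the choice to skip gapped columns is an assumption, not necessarily the convention used in the cited studies. For a 16S rDNA gene of roughly 1500 nts (an assumed round figure), this similarity range corresponds to only about 6 to 12 differing positions.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real 16S rDNA data)
seq_a = "ACGTACGTAC-GTACGTACGT"
seq_b = "ACGTACGTACAGTACGTACGA"
print(f"{percent_identity(seq_a, seq_b):.1f}% identity")
```

With so few informative positions, small alignment or sequencing errors can change the apparent relationships, which is one reason a second marker such as gyrA is useful.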
Bacillus licheniformis. The intra-specific diversity of B. licheniformis, studied by means of MLEE and phenotypic analysis, separated the isolates into two main subgroups [78]. Despite the high phenotypic similarities among the 182 isolates of B. licheniformis, DNA-DNA reassociation studies showed three very distinct groups. These were therefore regarded by Manachini et al. [79] as genomovars of B. licheniformis. A 5′ hypervariable region of the 16S rDNA, corresponding to B. subtilis positions 41-307, has been targeted, and a B. licheniformis-specific Taq probe, 5′-FAM-GAG CTT GCT CCC TTA GGT CAG-Dabsyl-3′, was designed for a section of this region corresponding to B. subtilis 16S rDNA numbering positions 79-99. The B. licheniformis chromosome contains large regions that are co-linear with the genomes of B. subtilis and Bacillus halodurans, and approximately 80% of the predicted B. licheniformis coding sequences have B. subtilis orthologs [80]. Recent taxonomic studies indicate that B. licheniformis is closely related to Bacillus amyloliquefaciens and B. subtilis on the basis of comparisons of 16S rDNA and 16S-23S ITS nucleotide sequences. Lapidus et al. [81] and Xu and Cote [82] recently constructed a physical map of the B. licheniformis chromosome using a PCR approach and established a number of regions of co-linearity where gene content and organization were conserved with the B. subtilis genome. The close relationship between B. licheniformis and B. halodurans compared to B. subtilis has been shown on the basis of i) the replication terminator protein (rtp), which is lacking in B. licheniformis [80] and B. halodurans [83]; ii) a putative transposase of B. licheniformis that shows a close relation to B. halodurans [80,83]; iii) the 27 predicted extracellular proteins encoded by the B. licheniformis ATCC 14580 genome that are not found in B. subtilis 168 [80]; iv) two gene clusters involved in cellulose degradation and utilization that were discovered in B. licheniformis and have no counterparts in B. subtilis 168. Sixty-six per cent of the predicted B. licheniformis genes have orthologs in B. subtilis, and 55% of the gene models are represented by orthologous sequences in B. halodurans; 1719 orthologs are common to all three species. These conservations clearly support the previous hypothesis [82] that B. subtilis and B. licheniformis are phylogenetically and evolutionarily closer to each other than to B. halodurans [80]. In our study, B. halodurans reference strains were very distinct from B. licheniformis and B. subtilis.
Bacillus halodurans. B. halodurans is a rod-shaped, gram-positive, aerobic or anaerobic bacterium. An alkaliphilic bacterium, strain C-125 (JCM9153), isolated in 1975, was reidentified as B. halodurans based on 16S rDNA sequence and DNA-DNA hybridization analysis. Out of 11 sigma factors that belong to the extracytoplasmic function family, 10 are unique to B. halodurans. One hundred and twelve CDSs in the B. halodurans genome showed significant similarity to the transposases or recombinases from various species such as Anabaena sp., Rhodobacter capsulatus, Lactococcus lactis, Enterococcus faecium, Clostridium beijerinckii, Staphylococcus aureus and Yersinia pseudotuberculosis, indicating that these have played an important evolutionary role in HGT and also in internal rearrangement of the genome [83].
B. halodurans and B. subtilis similarities: genome sequence comparisons between B. halodurans and B. subtilis reveal that, among the total CDSs, 8.8% match sequences of proteins found only in B. subtilis. The Shine-Dalgarno (SD) sequence was complementary to the one found at the 3′ end of 16S rDNA (UCU UUC UCC ACU AG…) of alkaliphilic B. halodurans C-125 [83] and is the same as that of B. subtilis. B. halodurans C-125 is quite similar to B. subtilis in terms of genome size, G+C content of the genomic DNA and the physiological properties used for taxonomical identification, except for the alkaliphilic phenotype [84]. Also, the phylogenetic placement of B. halodurans C-125 based on 16S rDNA sequence analysis indicates that this organism is more closely related to B. subtilis than to other members of the genus Bacillus. Four types of ATPases were also well conserved between B. halodurans and B. subtilis. ABC transporter genes are the most frequent class of protein-coding genes found in the B. halodurans genome, as in the case of B. subtilis.
Bacillus pumilus. Bacillus pumilus is commonly isolated from a variety of environmental sources, particularly faeces of animals. B. pumilus grows as a smooth colony that becomes yellow with increased incubation; the organism is motile, β-hemolytic on blood agar, catalase positive, salt tolerant and penicillin susceptible, and does not grow under strict anaerobic conditions [85]. B. pumilus has toxic properties; it has cytopathic effects in Vero cells, hemolytic activity, lecithinase production, and proteolytic action on casein. Recently, From et al. [86] detected an emetic toxin that can be related to food poisoning incidents. Human infection due to B. pumilus is exceptional.
Bacillus megaterium. Bacillus megaterium is a gram-positive, mainly aerobic spore forming bacterium found in widely diverse habitats from soil to seawater, sediment, rice paddies, honey, fish and dried food. B. megaterium has been industrially employed for more than 50 years, as it possesses some very useful and unusual enzymes and a high capacity for the production of exoenzymes. Genetic tools for this species include transducing phages and several hundred mutants covering the processes of biosynthesis, catabolism, division, sporulation, germination, antibiotic resistance, and recombination [87].
Bacillus sphaericus. Bacillus sphaericus is an aerobic, mesophilic, spore-forming bacterium with terminal swollen sporangia and spherical spores [88]. Strains of B. sphaericus are toxic towards mosquito larvae and can be used as biological control agents of the important vectors of filariasis, malaria and yellow fever [89]. Most studies interested in the use of these highly toxic strains in biocontrol programmes, such as in Brazil, have focused on the isolation of strains better adapted to tropical conditions [90]. Strains of B. sphaericus were divided into five distinct groups, and group II was formed by two subgroups, IIA and IIB. All toxic strains were located in DNA homology subgroup IIA [91], but this homology group also contained non-pathogens [92]. Different techniques such as phage typing [93], serotyping [94,95], cellular fatty acid analysis [96] and MLEE on agarose gel [97] were also used in order to identify entomopathogenic B. sphaericus. However, only a few works report the diversity within mosquito-toxic B. sphaericus strains [98,99]. As a result, the attempt to define a new species (DNA homology group IIA) based on mosquito pathogenicity as the unique characteristic was discarded [92]. Subsequently, using the cloned toxin genes bin and mtx from B. sphaericus as probes resulted in segregating 30 strains into 22 groups within the DNA homology group IIA [37]. From the variation in the number and size of bands it was possible to identify similarities among the strains, resulting in 5 groups in BOX-PCR and 8 groups in REP-PCR. B. sphaericus strains isolated from diverse habitats and geographically different locations in Brazil were phenotypically and genetically quite heterogeneous and can be potentially useful as biological control agents against mosquito larvae [37].
The current classification of species within the genus Bacillus and related genera is well established and is based on a combination of numerous approaches. The present study aims at determining whether a new set of parameters can be deduced to develop a method that is informative enough to be useful for the more effective classification of Bacillus species and other microbes. Since the 16S rDNA sequence repository is quite large, we focused our studies on this gene.
The genus Bacillus contains a heterogeneous assembly of aerobic or facultatively anaerobic bacteria, widely distributed in the environment. The phenotypic protocols, though important, need to be supplemented by molecular approaches. Molecular approaches based on DNA sequence minimize problems associated with typability and reproducibility and enable the assembly of large reference databases [100]. Sequence-specific primers for the 16S rDNA gene have proved to be gold standards for the identification of pure cultures of Bacillus sp. such as B. subtilis [101], B. cereus and B. thuringiensis [102], and Paenibacillus alvei (formerly Bacillus alvei) [103]. Genus-specific primers have been successfully developed for Lactobacillus, Mycoplasma, Bifidobacterium, Pandoraea, Clostridium and recently for certain Bacillus strains [104].

Results
During the investigation, 1121 individual 16S rDNA sequences belonging to the genus Bacillus (from RDP/NCBI sites: http://rdp.cme.msu.edu/; http://www.ncbi.nlm.nih.gov/) were analyzed for generating a species-specific framework and probes as a rapid tool for their identification. These sequences represented a total of 10 different species.

Phylogeny of Bacillus species
Bacillus cereus group. (i) B. anthracis: The phylogenetic tree based on the 16S rDNA sequences from 153 strains of B. anthracis revealed 6 clusters (BAI to BAVI) (Figure S1). These different clusters were represented by 5 to 87 strains. Twelve strains could not be segregated clearly into any of these clusters. Multiple alignment of all the sequences from each cluster showed that the strains were fairly similar over a large region. A visual scan of the profiles in each of the clusters (BAI to BAVI) showed that the level of genetic variability is quite low, since 84.2% of the total sequence, i.e. 1309/1554 nucleotides (nts), was deemed to be indistinguishable. Further, a comparison of the total length of each of the groups supports the limited genetic diversity between clusters BAI-BAIV on one hand and BAV-BAVI on the other. These clusters possess nearly identical similarity rates and were considered to be identical. Hence, the 153 B. anthracis 16S rDNA sequences in fact belonged to 4 representative groups varying in length from 1309 nts to 1554 nts. The variability among the four clusters was found to extend on either side of the core region, 78 nts upstream and 169 nts downstream.
(ii) B. cereus: B. cereus, another member of the B. cereus group, was represented by 211 different strains. A phylogenetic tree based on the 16S rDNA sequences showed 7 clusters, BCI to BCVII (Figure S2). These 7 clusters consisted of 12 to 48 strains. Three strains could not be segregated clearly into any of these clusters. Multiple alignment of all the strains revealed the conserved region in each of the 7 clusters. A visual scan of the profiles in each of the clusters BCI to BCVII reflects that there is around 83.8% (1276/1522 nts) similarity among them. This 1276 nts long sequence may represent the core region of this Bacillus species. The 16.22% variability present in the region flanking the core sequence stretch was found to extend up to 86 nts upstream and 142 nts downstream. However, some further similarity was recorded in two sets of clusters, i) BCI, BCIV and BCVI, and ii) BCV and BCVII, which were thus considered as identical. These two major groups comprised 77 and 87 strains, respectively. The final number of B. cereus clusters could thus be reduced to 4 from the 7 clusters observed in the phylogenetic tree.
(iii) B. thuringiensis: The third member of the B. cereus group is represented by 108 strains of B. thuringiensis. The phylogenetic tree based on the nucleotide sequences of the 16S rDNA gene of these 108 strains showed 12 different clusters. The number of strains in each of these 12 clusters, BTI to BTXII, varied from 3 to 17 (Figure S3). Twenty-six strains could not be segregated clearly into any of these clusters. Multiple alignment of all members within each group revealed their respective conserved regions. Further alignment of the representative conserved regions from all the 12 clusters BTI to BTXII revealed a completely conserved stretch of 1322/1516 nts, equal to 81.2%. This seems to represent the core region of B. thuringiensis. A visual scan of the profiles of the sequences flanking the core region in each of the BTI to BTXII clusters shows that there is 100% similarity between clusters BTIII-BTX and BTV-BTXII, which thus reduced the final tally of B. thuringiensis phylogenetic clusters from 12 to 10. The overall genetic variability around the core region extended up to 42 nts upstream and 142 nts downstream.
The 472 strains of the B. cereus group could thus be segregated into 18 clusters, where the core region varied from 1276 to 1322 nts, covering 81.2% to 84.2% of the total 16S rDNA gene length.
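The notion of a core region flanked by variable stretches can be illustrated with a short sketch. The code below finds the longest run of alignment columns that are identical in every sequence and expresses a core length as a percentage of the total length. This is a simplified stand-in for the visual scan described above, not the procedure actually used in the study; the function names and the toy alignment are hypothetical.

```python
def longest_conserved_run(aligned):
    """Return (start, length) of the longest run of columns that are
    identical and ungapped in all aligned sequences (start is 0-based)."""
    if not aligned:
        raise ValueError("need at least one sequence")
    width = len(aligned[0])
    if any(len(s) != width for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    best_start, best_len = 0, 0
    run_start = None
    for i in range(width + 1):  # one extra step closes a run at the end
        conserved = (
            i < width
            and aligned[0][i] != "-"
            and all(s[i] == aligned[0][i] for s in aligned)
        )
        if conserved:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    return best_start, best_len


def core_percent(core_len, total_len):
    """Core region length as a percentage of the total sequence length."""
    return 100.0 * core_len / total_len


# Toy alignment (hypothetical, not real 16S rDNA data)
toy = [
    "AAGGCTAGCTAGGT",
    "ATGGCTAGCTAGCT",
    "ACGGCTAGCTAG-T",
]
start, length = longest_conserved_run(toy)
print(start, length)

# Core fractions from the lengths reported for B. anthracis and B. cereus
print(round(core_percent(1309, 1554), 1))
print(round(core_percent(1276, 1522), 1))
```

In the toy alignment the conserved run covers columns 2 to 11, with variable columns on both sides, mirroring the upstream and downstream variability reported for each species.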
B. subtilis. Apart from the B. cereus group of 472 strains, B. subtilis is represented by 271 different strains. All the 16S rDNA sequences of B. subtilis were segregated into 30 clusters (BSI to BSXXX) on the phylogenetic tree (Figure S4). Each cluster had 4 to 18 organisms. Thirty-nine strains could not be segregated clearly into any of these clusters. Multiple alignment within each group revealed the regions that were shared by all the strains. The length of the conserved region within clusters BSI to BSXXX varied from 1190 to 1554 nts and, exceptionally, was 900 nts long. A further realignment of the representative conserved sequences of each cluster revealed a DNA stretch of 1278 nts to be common to all. It represented 82.1% of the total length of the 16S rDNA gene of B. subtilis. The 17.9% variability in the flanking regions was found to extend on either side of the core region, up to 180 nts upstream and 174 nts downstream. It also reflects that there is a large genetic diversity in B. subtilis. On the basis of a visual scan of the regions flanking the conserved sequences, the 30 BS clusters could be reduced to 26. The four pairs of clusters that were quite similar to each other were BSI-BSII, BSVIII-BSXXI, BSXV-BSXXV and BSXXII-BSXXIV. In addition to the long and variable regions, there were two unique groups, BSXVII and BSXXIII, where the highly conserved region was interrupted by a highly variable region of 105 nts and 24 nts length, respectively. These were the only two such cases, comprising 14 sequences of B. subtilis strains, with this unique characteristic among all the 2146 sequences studied here.
Table 1. 16S rDNA sequences of Bacillus species and number of sequences used in this study (http://rdp.cme.msu.edu/).
B. licheniformis. In the RDP/NCBI database, B. licheniformis was represented by 131 different strains. These strains were distributed as 13 clusters, BLI to BLXIII, on the 16S rDNA phylogenetic tree. Three to 15 organisms were present in each of the clusters (Figure S5). Multiple alignment of the 16S rDNA sequences of all the members in each group revealed the regions that were shared among them. The length of the conserved region in clusters BLI to BLXIII varied from 1219 to 1497 nts. Further alignment of the representative conserved sequences from each of the clusters revealed a completely conserved core DNA stretch of 1139 nts, i.e. at positions 210 to 1348. On the basis of this high similarity between the representative core regions, equivalent to 74.59% of the total length, the 13 clusters could be reduced to 10. Clusters BLI, BLIII and BLVII, and clusters BLX and BLXI, were almost indistinguishable among themselves. The long and variable regions flanking the core sequence indicate the high genetic variability within B. licheniformis. The 25.41% variability in the flanking regions was found to extend on either side of the core region, up to 183 nts upstream and 180 nts downstream.
B. pumilus. A group of 83 strains represents the B. pumilus species in the 16S rDNA RDP database. The phylogenetic tree of 16S rDNA from B. pumilus strains showed 12 clusters (Figure S6). Nine strains could not be segregated clearly into any of these clusters. Each of the clusters, BPI to BPXII, was represented by 4 to 13 strains. Multiple alignment of gene sequences of members of each cluster showed that the conserved regions vary in length from 1215 nts to 1503 nts. Further alignment of the representative 16S rDNA sequences from each of the 12 clusters showed that there is a conserved region of 1215 nts. The core region represented 79.61% of the total length of the B. pumilus 16S rDNA sequence. It indicates the extent of similarity within this Bacillus species. On the other hand, the genetic variability extends on either side of the core region, up to 103 nts upstream and 205 nts downstream. Taking into account the core region and the flanking regions, the 12 clusters BPI to BPXII did not show any redundancy and could be easily distinguished from each other.
B. sphaericus. The phylogenetic tree based on the 16S rDNA gene from a small group of 42 strains of B. sphaericus showed 7 clusters (Figure S7). Each of the clusters BSPI to BSPVII was represented by 2 to 12 different strains. Five strains could not be segregated clearly into any of these clusters. Multiple alignment of gene sequences of members of each cluster revealed that the conserved regions varied in length from 1179 nts to 1501 nts. Subsequent alignment of the representative 16S rDNA sequences from each of the 7 clusters showed that there is a conserved region of 1081 nts. It represented 72.06% of the total length of the B. sphaericus 16S rDNA gene. There is thus a great genetic variability among the different clusters. The variable region extended up to 248 nts upstream of the 16S rDNA core region and 174 nts downstream. When all the clusters and their core regions are taken into account, two clusters, BSPI and BSPVI, showed high similarity and could be considered as one. The final tally of clusters of B. sphaericus could thus be reduced to six.
B. halodurans. The B. halodurans group was constituted by 36 strains. The phylogenetic tree of the 16S rDNA sequences could segregate these 36 strains into 4 clusters (Figure S8). These 4 clusters, BHI to BHIV, were represented by 3 to 13 strains. Three strains could not be segregated clearly into any of these clusters. Multiple alignment of 16S gene sequences within each group showed that the conserved region varies from 1443 to 1548 nts in length. Further alignment of the representative sequences of each group revealed that the core conserved region is 1416 nts long. It represents 91% of the total length. The low variability in the 16S rDNA gene sequence is evident from the length of the flanking regions, which varies from 65 to 70 nts. However, the 4 clusters were still maintained in spite of low genetic variability.
Bacillus clausii. The Bacillus clausii group is constituted of 39 strains. On the basis of the 16S rDNA gene phylogenetic tree, 6 clusters were observed for 34 of the strains (Figure S9). Five strains did not fall clearly into clusters. The 6 clusters BCI to BCVI were represented by 2 to 10 different strains. Multiple alignment of gene sequences of members of each cluster revealed the size of the conserved region, which varied in length from 1343 nts to 1551 nts. However, when the representative 16S rDNA sequences from each of the 6 clusters were subsequently considered, the conserved region was found to be 1337 nts long. It represented 86.2% of the total length to which the 16S rDNA gene of B. clausii may extend. It reflects that there is quite a large genetic variability among the different clusters. The variable region extends only up to 164 nts upstream of the 16S rDNA core region and 52 nts downstream. In spite of a large homologous region among all the clusters of B. clausii, they were still distinguishable from each other, particularly the clusters BCII, BCIII and BCVI.
B. megaterium. The B. megaterium group was represented by 47 strains. A phylogenetic tree based on the 16S rDNA gene segregated these 47 strains into 8 clusters, BMI to BMVIII (Figure S10). Each of the clusters was composed of 2 to 12 isolates; 6 strains could not be clustered concretely. Multiple alignment of gene sequences of members of these 8 clusters revealed the core region of the 16S rDNA, which is 1240 nts long. It represented 81.2% of the total length of the B. megaterium 16S rDNA gene. The 18.8% diversity in the region flanking the core region reflects the extent and range of genetic variability among the clusters BMI to BMVIII. Some of the clusters had high bootstrap values of 963 to 1000, while others had moderate bootstrap values of 562 to 661.
[…] and five members of B. thuringiensis were selected to represent these species. Thus 34 sequences were chosen to represent the 10 Bacillus species (Figure 2). This reference phylogenetic framework tree has been used to segregate those Bacillus isolates which have been presently classified only as Bacillus sp.

Phylogeny of Bacillus Core Groups
A total of 1025 sequences of the 16S rDNA gene from Bacillus sp. were screened in batches of 52 entries along with the 34 reference phylogenetic framework sequences to generate 22 phylogenetic trees. From all the phylogenetic trees it was possible to classify a total of 305 isolates of Bacillus sp.: 75 isolates as B. cereus, 2 isolates as B. thuringiensis, 44 isolates as B. subtilis, 21 isolates as B. licheniformis, 32 isolates as B. pumilus, 23 isolates as B. sphaericus, 7 isolates as B. halodurans, 69 isolates as B. megaterium, 31 isolates as B. clausii and 1 as B. anthracis. The final phylogenetic trees drawn on the basis of this preliminary screening showed that 305 out of these 1025 sequences group with their respective species with bootstrap values of 800 to 1000 (Figures 3-7). The accession numbers of these Bacillus species are given in Table S1. At this rate, about 29.75% of the unclassified Bacillus sp. could be identified up to species level. This phylogenetic framework, based on 34 16S rDNA sequences from 10 Bacillus species, can be used to identify Bacillus strains up to species level.
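The arithmetic behind these figures can be checked with a few lines of code. The sketch below simply tallies the per-species assignments listed above and expresses them as a fraction of the 1025 unclassified sequences; the variable names are illustrative only.

```python
# Isolates assigned to each species, as listed in the text
assigned = {
    "B. cereus": 75,
    "B. thuringiensis": 2,
    "B. subtilis": 44,
    "B. licheniformis": 21,
    "B. pumilus": 32,
    "B. sphaericus": 23,
    "B. halodurans": 7,
    "B. megaterium": 69,
    "B. clausii": 31,
    "B. anthracis": 1,
}
total_unclassified = 1025  # sequences deposited only as "Bacillus sp."

total_assigned = sum(assigned.values())
percent_identified = 100.0 * total_assigned / total_unclassified

print(total_assigned)
print(f"{percent_identified:.2f}%")

# Species ranked by number of newly assigned isolates
for species, count in sorted(assigned.items(), key=lambda kv: -kv[1]):
    print(f"{species}: {count}")
```

The tally reproduces the total of 305 isolates, and the ratio 305/1025 is just under 29.76%, in line with the roughly 30% quoted in the text.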

Signature Analysis
Bacillus species-specific signature. The sequences of the 10 data sets were submitted groupwise to the MEME program (http://meme.nbcr.net/meme/meme.html). Ten signatures, each 25-30 nts long, were identified for each species (Table S2). Any signature that was not present in other Bacillus spp. and was found to be distinct was used in a BLAST search against the NCBI database. As seen from Table 2 […]
Cluster-specific signature. In the case of 11 clusters (Figure 8), on the basis of the ClustalW alignment, the size of the conserved region varied from 1226 to 1547 nts. These 11 clusters, consisting of 18 to 50 isolates each, did not appear to fall within any of the aligned species and may represent sub-species or novel lineages. The sequences of these 11 clusters were submitted groupwise to the MEME program (http://meme.nbcr.net/meme/meme.html). Ten signatures, each 25-30 nts long, were identified for each cluster (Table S3). As seen from the Table, there were 1-2 regions in each of the clusters which had the potential to be used as […] (Table 2). The signatures identified through the MEME program revealed that there are certain similarities and even overlapping sequence stretches. Comparison of the signatures and the conserved sequence regions against all other known Bacillus spp. indicated that they were not significantly similar and are not homologous. The signature sequence 5′-TTTAATTCGAAGCAACGCGAAGAACCTTA-3′, shared by all the Clusters except 5 and 7, also indicates that these sequences may have a common origin similar to Lactobacillus, Paenibacillus, Clostridium, rumen bacterium, etc. [105]. Each of the clusters has certain unique signatures, which did not match with the 10 Bacillus spp. Clusters 2, 3, 4 and 8, in contrast, did not show any unique signatures by this approach. In Cluster 2, however, 5′-AAATGATTGGGGTGAAGTCGTAACAAGGTA-3′ was the only signature which showed remarkable closeness to B. clausii, in 31 out of 39 sequences.
Similarly, only one out of 10 signatures in Clusters 3 and 4 showed closest matches with B. licheniformis and B. megaterium at a frequency of 54/131 and 13/47 sequences, respectively. As far as Cluster 8 is concerned, one of the signatures could be traced among 18/37 sequences of Cluster 7 and with negligible frequency in other clusters.
The unique signatures of sequences present in Cluster 1 showed similarity to uncultured Bacillus sp. and uncultured Macrococcus sp. with a frequency of 28% among the top 50 BLAST hits. Similarly, the signatures of Clusters 6, 9 and 11 shared similar sequences with Virgibacillus, B. megaterium and Geobacillus, respectively, at frequencies in the range of 16 to 74%. In spite of employing all approaches to assign the signatures and sequences to known organisms, the signatures of Clusters 5, 7 and 10 could not be properly categorized. In brief, all these 11 clusters, consisting of 18 to 50 sequences each, did not appear to fall within any of the aligned species and may represent sub-species or novel lineages. It is probable that the Bacillus community is more diverse than reported so far (http://rdp.cme.msu.edu). This suggests that bacterial communities in a variety of soils, environmental habitats, etc., may appear more similar when assessed by molecular methods on their 16S rDNA than on other metabolic genes.

Restriction Enzyme Analysis
A total of 14 Type II restriction enzymes (REs), which are independent of methylase and cleave at very specific sites within or close to the recognition sequence (4 to 6 mer), were considered (Table S4). The sites for the two enzymes EcoRI and SmaI appeared in 96 to 97% of the sequences, but because only one site was present per sequence, these two enzymes could not serve any significant purpose at this stage.
Six REs with more than 1 site for their action were compared for all the 344 sequences of 10 Bacillus species (Figure 9). The pattern of the number and lengths of the fragments served as reference standards for those strains which have been designated so far as Bacillus sp.
The three additional RE sites appeared at the 5' end of the 16S rDNA genes. Similarly, with AluI, B. subtilis groups I and II differed not only in the number of RE sites but also in their positions. It may not be premature to conclude that B. subtilis needs to be subdivided into at least two groups, since a similar situation was observed in the signature analyses presented in the previous sections. Nakamura et al. [77] and Chun and Bae [10] divided B. subtilis into two subspecies, namely B. subtilis subsp. subtilis and B. subtilis subsp. spizizenii, on the basis of cell wall chemistry and DNA-DNA relatedness data.
The second distinct feature to emerge from the classification of these 10 Bacillus spp. on the basis of the 6 REs concerns the members of the B. cereus group (B. cereus, B. anthracis and B. thuringiensis), which were indistinguishable from one another. However, they were very clearly distinguishable from all other Bacillus spp. by the action of DpnII and AluI. B. halodurans could be distinguished from all other species on the basis of the combined action of Tru9I and HaeIII. B. clausii could be segregated from the others by using Tru9I and RsaI. B. megaterium, though quite close to B. licheniformis for BfaI and Tru9I and to B. clausii for RsaI, was distinguishable by cutting its 16S rDNA gene sequences with a set of 4 REs: BfaI, Tru9I, HaeIII and RsaI. B. megaterium appears to be the originator of B. halodurans, B. clausii, B. licheniformis, B. subtilis and B. pumilus, with which it shares different RE sites and fragment lengths (Figure 10).

Pattern of restriction enzyme sites in Bacillus spp.
On the basis of sequence similarity (phylogenetic closeness) and signature patterns (previous sections), 288 Bacillus sp. and 47 sequences of Jeotgalibacillus, Brevibacillus, Geobacillus, Marinibacillus, Paenibacillus, Pontibacillus, Virgibacillus, etc. could be segregated into 11 clusters. RE patterns of the 344 16S rDNA sequences of 10 Bacillus spp. served as references for the segregation of strains designated as Bacillus sp. (Figure 9).
Out of the 14 Type II REs employed for in silico digestions, SmaI, EcoRI, DpnII, RsaI, BfaI, HaeIII, Tru9I and AluI could cleave 16S rDNA gene sequences with a frequency of 1 to 7 sites, as was observed with known Bacillus spp. RE sites for BamHI, NotI and SacI occurred with a very low frequency of 0.4 to 3.0%, and these enzymes thus proved to be "non-cutters". On the other hand, the sites for NruI, HindIII and PstI occurred with moderate frequency in some of the clusters and could not be detected in others. SmaI and EcoRI sites were observed with high frequency but, with only one site per sequence, strong conclusions were difficult to draw. Hence, once again, information based on the 6 REs generating more than 2 fragments (DpnII, RsaI, BfaI, HaeIII, Tru9I and AluI) proved useful for reaching meaningful conclusions.
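An in silico digestion of the kind described here can be illustrated with a minimal Python sketch. It is not the software used in this study; it assumes linear sequences without ambiguity codes and a complete digest, and it uses the standard published recognition sequences of the six informative enzymes. The dictionary layout and function names are chosen for illustration only.

```python
# Recognition sequence and top-strand cut offset (number of bases of the
# site that lie 5' of the cut) for the six informative enzymes.
ENZYMES = {
    "DpnII":  ("GATC", 0),   # ^GATC
    "RsaI":   ("GTAC", 2),   # GT^AC
    "BfaI":   ("CTAG", 1),   # C^TAG
    "HaeIII": ("GGCC", 2),   # GG^CC
    "Tru9I":  ("TTAA", 1),   # T^TAA
    "AluI":   ("AGCT", 2),   # AG^CT
}

def cut_positions(seq, site, offset):
    """Return top-strand cut coordinates for every occurrence of the site."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)  # step by 1 to catch overlapping sites
    return positions

def fragment_lengths(seq, enzyme):
    """Ordered fragment lengths (5' to 3') after a complete linear digest."""
    site, offset = ENZYMES[enzyme]
    bounds = [0] + cut_positions(seq, site, offset) + [len(seq)]
    # Zero-length pieces (a cut exactly at the sequence start) are dropped.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]

# Toy 15-nt sequence, invented for demonstration only.
toy = "AAAGCTTTTGGCCAA"
for name in ("AluI", "HaeIII", "DpnII"):
    print(name, fragment_lengths(toy, name))
```

Because all six recognition sequences are palindromic, scanning one strand is sufficient; an enzyme with a non-palindromic site would also need the reverse complement to be searched.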
On the basis of RE patterns (Figure 9), Cluster 1, comprising 46 sequences (22 Bacillus sp. and 24 sequences of Jeotgalibacillus, Marinibacillus, Ureibacillus and Sporosarcina), showed similarity to 4 different Bacillus spp.: B. subtilis, B. licheniformis, B. sphaericus and B. halodurans (Table 4). Clusters 4 and 7 showed similarity to 7 and 8 Bacillus spp., respectively. Incidentally, except for Tru9I, the pattern of fragment lengths (nts) and their order showed that Clusters 1, 4 and 8 resemble B. licheniformis, B. sphaericus and B. megaterium. However, there was considerable variation among these clusters for the other 5 REs, which implies that the organisms in these clusters might have a common origin but are presently well distinguishable from each other. Clusters 2, 3, 5, 6, 8, 9 and 10 showed RE fragment length and order patterns quite similar to 1 to 5 of the known Bacillus spp., but for certain REs there was no resemblance to known species. Clusters 3, 6, 9 and 10 were quite distinct and hence categorized as "unique" on the basis of their response to AluI, DpnII and HaeIII: the fragment lengths and order were unique to each of them. A comparison of Clusters 3, 6, 9 and 10 with respect to their similarity to known species revealed that no two clusters showed an exact match. The only exception to this observation was Clusters 6 and 7 with respect to the action of HaeIII, where the order and size of the fragments were quite similar. For BfaI, RsaI and Tru9I, the similarity to known Bacillus spp. varied from cluster to cluster in two respects: firstly, the species identified varied from RE to RE, and secondly, no two clusters showed similar results with any two REs (Table 4).
Clusters 1, 2, 6 and 11 were composed of Bacillus sp. sequences that could be categorized into two groups: Group 1 consisted of Bacillus sp. that have now been reclassified as Jeotgalibacillus, Marinibacillus, Ureibacillus or Sporosarcina, and Group 2 consisted of those that have been defined only up to genus level (Bacillus sp.). These served as controls for the observations made for Bacillus spp. with different REs. Cluster 1 showed closest matches with Marinibacillus, B. subtilis and B. licheniformis for RsaI (Figure 9). Such a dual relationship was also recorded for Cluster 1 with BfaI, DpnII and Tru9I. In these cases, the closest matches were Sporosarcina, Jeotgalibacillus, Marinibacillus, B. licheniformis and B. sphaericus. The variability in the closest matches among different REs suggests that Cluster 1 is a "unique" group intermediate between Bacillus and other closely related genera such as Jeotgalibacillus, Marinibacillus, Sporosarcina, etc. Similarly, Clusters 2 and 6 showed uniqueness in their fragment order and length with 3 and 5 different REs, respectively. This implies that these groups of organisms need attention from taxonomists and may be classified as new organisms. Cluster 11 turned out to be primarily a set of 46 sequences of Geobacillus and 4 sequences of Anoxybacillus. These sequences largely resembled known Geobacillus sp. with respect to their RE patterns. It thus served primarily as validation of the results recorded with Bacillus sp. (Table 4).
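The comparison of cluster RE patterns with the reference species patterns can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the procedure actually used: patterns are represented as ordered lists of fragment lengths, the `tolerance` parameter is hypothetical, and the toy values in the usage lines are invented for demonstration and are not data from this study.

```python
def matching_species(cluster_patterns, reference_patterns, tolerance=0):
    """For each enzyme, list the reference species whose pattern matches.

    cluster_patterns:   dict enzyme -> ordered fragment lengths for a cluster
    reference_patterns: dict species -> dict enzyme -> ordered fragment lengths
    tolerance:          allowed difference (nt) per fragment (hypothetical)
    An empty list for an enzyme corresponds to a "unique" pattern.
    """
    def same(a, b):
        return len(a) == len(b) and all(
            abs(x - y) <= tolerance for x, y in zip(a, b)
        )

    result = {}
    for enzyme, pattern in cluster_patterns.items():
        result[enzyme] = sorted(
            species
            for species, by_enzyme in reference_patterns.items()
            if enzyme in by_enzyme and same(pattern, by_enzyme[enzyme])
        )
    return result

# Toy reference and cluster patterns (invented values, for illustration).
reference = {
    "species_A": {"AluI": [200, 800], "RsaI": [400, 600]},
    "species_B": {"AluI": [205, 795], "RsaI": [100, 900]},
}
cluster = {"AluI": [200, 800], "RsaI": [100, 900], "HaeIII": [300, 700]}
print(matching_species(cluster, reference))
print(matching_species(cluster, reference, tolerance=5))
```

A cluster whose closest match changes from enzyme to enzyme, as in the toy output, is the situation described above for Cluster 1.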

Discussion
Among the group of aerobic endospore-forming bacteria, comprising 25 genera and over 200 species, Bacillus is the largest and most prominent. Bacillus comprises a heterogeneous assembly of Gram-positive, rod-shaped, spore-forming bacteria, which may grow aerobically or as facultative anaerobes. Further characterization and identification have traditionally been based on biochemical tests and fatty acid methyl ester (FAME) analysis [106,107]. With further developments, the API (Analytab Products, Inc.) system of identification has proved quite reproducible and reliable [108]. Microbial genotype or DNA sequence based analytical approaches are highly reproducible and enable a large number of samples to be processed at a time [100]. 16S rDNA sequencing has proved to be one of the most powerful tools for the classification of microorganisms [109-111]. It has been used for the identification of Bacillus spp. such as B. subtilis [101], B. cereus and B. thuringiensis [102], and P. alvei (formerly B. alvei) [103].
The genus Bacillus has undergone considerable taxonomic changes. The number of species within this genus was reduced from 146 in the 5th Edition of Bergey's Manual of Determinative Bacteriology [112] to 22 [113]. In the Approved List of Bacterial Names [114], 31 of the 38 aerobic endospore formers were Bacillus. However, at present there are 175 Bacillus species (http://rdp.cme.msu.edu/). The two factors behind this rapid increase are the application of more diverse and intelligent methods for enrichment and isolation, and the development of new and ever more sophisticated methods for amplification and sequencing of genes [115].
The identification of Bacillus species on the basis of the 16S rDNA gene sequence is done by BLAST searches against the available databases. The need for developing a tool for identifying Bacillus species arose for two reasons: (i) more than 50% of the 16S rDNA sequences deposited in the databases have been annotated/identified only as Bacillus sp., and (ii) B. subtilis strains were seen to cluster in different clades and widely placed clusters on the phylogenetic tree (data not shown), raising doubts about the sequencing quality. This gives the impression that B. subtilis perhaps defies the phylogenetic pattern. It also suggests that the genus Bacillus may be further divided into sub-species, particularly as far as B. subtilis is concerned. Our data imply that the 16S rDNA molecule of B. subtilis may exist in two different states, much as in B. cereus. This may lead to a different structure of the 30S subunit, since binding of the primary binding proteins affects binding of the secondary and tertiary binding proteins [116]. In fact, Nakamura et al. [77] divided B. subtilis into two subspecies, namely B. subtilis subsp. subtilis and B. subtilis subsp. spizizenii, on the basis of cell wall chemistry and DNA-DNA relatedness data. These were therefore regarded as genomovars. In view of this scenario, we set out to develop a tool that would enable identification of new Bacillus isolates at species level. In fact, a few Bacillus species, such as B. kaustophilus, B. stearothermophilus, B. thermoglucosidasius and B. thermoleovorans, have been transferred to the newly created genus Geobacillus [65]. Certain others, such as Bacillus globisporus, B. pasteurii and B. psychrophilus, have been reclassified to the genus Sporosarcina [66]. In the same manner, B. marinus has been reclassified as M. marinus [67].

Figure 4. Phylogenetic tree (Figures S11 to S32 and Table S1). A neighbor-joining analysis with Jukes-Cantor correction and bootstrap support was performed on the gene sequences. Bootstrap values are given at nodes; 1000 bootstrap replicates were run. Values in parentheses are accession numbers (http://rdp.cme.msu.edu/). doi:10.1371/journal.pone.0004438.g004

Problems with Bacillus Species
The members of the B. cereus group show 99.5 to 100% similarity in their 16S and 23S rDNA sequences [117,118]. The genetic diversity of B. cereus isolates is enhanced by extrachromosomal elements [92,119-122]. Approaches that rapidly identify the "near neighbours" of the B. cereus group are of great interest for the study of B. anthracis virulence mechanisms, as well as to prevent the use of such strains for B. anthracis based bioweapon development [123]. In this study, all three members of the B. cereus group had very large conserved regions and appeared indistinguishable on the basis of the 16S rDNA gene sequence. B. thuringiensis and B. mycoides differ from B. anthracis and B. cereus by 0 to 9 nucleotides [47]. Even single-strand conformation polymorphism (SSCP) did not allow species discrimination within the B. cereus group [124]. Variable region VI of the 16S rDNAs of B. cereus and B. thuringiensis is useful for differentiating between these species [125]. However, our approaches have been able to distinguish B. cereus and B. thuringiensis from B. anthracis. With the first approach, using the representative sequences for each species, these three, though distinguishable, were always placed next to each other. With specific signature sequences, on the other hand, it was possible to distinguish B. cereus from the other two; it was difficult to identify species-specific signatures for B. anthracis and B. thuringiensis. A strategy has been proposed by Daffonchio et al. [123] for the identification of near neighbours of B. anthracis based on single nucleotide polymorphisms (SNPs) in the 16S-23S rDNA ITS containing tRNA genes, characteristic of B. anthracis. Two B. cereus strains and one B. thuringiensis strain showed RSI-PCR profiles identical to that of B. anthracis. The strict relationship with B. anthracis was confirmed by MLST of four independent loci: the 16S-23S rDNA long ITS; the SG-749 fragment, which includes a region homologous to the B. subtilis ypnA gene; the AC-390 fragment, which is homologous to the B. subtilis ywfK gene encoding a hypothetical transcriptional regulator belonging to the LysR family; the plcR gene, encoding a pleiotropic regulator previously identified as one of the principal regulators of B. cereus virulence genes; and the cerA gene, which encodes the cereolysin A phospholipase. In yet another population genetic study among a strain collection of B. cereus group species, it was found by MLEE and MLST that the strains could be divided into two main groups [123]. The difficulty in distinguishing B. cereus group members by 16S rDNA based diagnosis, however, correlated well with gyrase B (gyrB) as a molecular diagnostic marker [126].
Rep-PCR has been shown to be a useful technique for subtyping Bacillus species [37,38]. However, protein coding genes such as gyrA and rpoB exhibit much higher genetic variation and have thus been used for the classification of closely related taxa within the B. subtilis group [10,39]. In spite of such difficulties, our framework is proving to be an efficient tool for handling such problems. With our approach, two B. subtilis groups could be easily segregated on the basis of the 16S rDNA gene sequence itself. These reference sequences could in fact segregate the two subspecies proposed by Nakamura et al. [77] on the basis of cell wall chemistry and DNA-DNA relatedness data, namely B. subtilis subsp. subtilis and B. subtilis subsp. spizizenii. Of the two strains chosen within B. subtilis Gr1, one was highly similar to B. licheniformis (Accession No. DQ504376), whereas the other was quite distinct (B. subtilis Accession No. AY631853).
The large degree of variation in the individual group fingerprints suggests that substantial intra-species genetic diversity may exist and highlights the very high resolution of 16S rDNA gene sequencing. To our knowledge, this is the first time that molecular techniques have been exploited and applied in this manner for assigning Bacillus isolates to different species. It would be difficult to extrapolate from these data to define the potential limits of genetic variability within Bacillus. However, such a strategy can be extended to other genera as well. It could form the basis for developing reference phylogenetic trees for the various genera for which a large amount of 16S rDNA sequence data is available.
The 16S rDNA sequences therefore give consistent separation of the strains into 9 major, non-overlapping clusters. The significance of this formal cluster assignment is made clear by the inclusion in our data set of sequences from independently reported strains. The signature nucleotide offers the opportunity for designing species-specific probes as primers for a rapid identification/segregation of new isolates.
Since the pathogenic capacities of Bacillus species are often plasmid linked, e.g., the pXO2-encoded capsule gene cluster, they may not necessarily be linked to the internal genotypic grouping of taxa. The fact that plasmids can easily be transferred or lost makes such criteria unacceptable for typing purposes. Our reference phylogenetic tree may partially replace the use of schemes such as MLST.

Signatures
16S rDNA sequences vary due to substitutions and not due to insertion or deletion of bases [127,128]. On this basis, repeating elements that are highly conserved across different species of Pseudomonas could be identified [128]. Programs are available for designing polymerase chain reaction (PCR) primer pairs as a means of rapid detection [129,130]. These programs select primer pairs based on user-defined parameters. However, it becomes difficult to select the best primers, since the programs do not provide information regarding the specificity of the oligonucleotides or patterns [131-134]. B. cereus has time and again been reported to be highly homogeneous; however, signature sequences of the 16S rDNA could distinguish the psychrotolerant and mesophilic strains. Single base pair substitutions were randomly distributed over the gene. The most obvious difference was one signature located at bp 180 to 192 (E. coli nomenclature) or bp 180 to 201 (B. cereus nomenclature) [127].

Restriction Enzyme Analysis
The phenomenon of host-specific restriction and modification of bacterial viruses stems from endonucleases within the cells that destroy foreign DNA molecules. REs cleave DNA at specific sites, generating discrete, gene-size fragments, and have proved to be a remarkable tool for investigating gene organization, function and expression [135-137]. REs occur in combination with 1 or 2 modification enzymes (MEs; DNA methyltransferases) that protect the cell's own DNA from cleavage by the RE. Since the ME methylates at the same site where the RE cuts, the R-M system perhaps ensures that 16S rDNA remains conserved in spite of the fact that a large number of sites for each RE are present. In other genes, the presence of RE sites increases diversity by promoting recombination [138,139]. PCR restriction analysis (PRA) of 16S rDNA has been shown to contribute to the rapid and reliable identification of newly isolated strains belonging to recognized species [140]. Having analyzed a large data set encompassing many species, we may extend this statement: the method can be applied for recognizing so far unrecognized species as well. In that study [140], four REs were applied: HaeIII, HinfI, TaqI and RsaI. HinfI and RsaI showed no distinctive patterns for the strains tested, whereas TaqI proved instrumental in clearly distinguishing one of the groups. A wide range of REs such as HaeIII, DpnII, RsaI, BfaI and Tru9I were used for defining the genus Virgibacillus [141,142]. The B. licheniformis 16S rDNA sequence was cut by AluI into five bands of 270, 140, 180, 200 and 800 nts, whereas RsaI resulted in four bands of 2110, 400, 450 and 500 nts [105].
An innovative approach for revealing intraspecific genomic variability of B. cereus and B. licheniformis was PCR fingerprinting of the spacer regions between the 16S and 23S rRNA genes and of intergenic tRNA gene regions [60]. Although RAPD showed remarkable diversity among B. cereus strains, it was realized that this genetic diversity can arise from wide variability in the plasmid profiles. B. licheniformis formed 2 groups with all the methods. Based on single-strand conformation polymorphism analysis after RE (AluI and RsaI) digestion of 16S rDNA, two different evolutionary schemes were proposed for the two Bacillus species, B. cereus and B. licheniformis [60]. In our analysis, only 6 out of 14 REs (DpnII, RsaI, BfaI, HaeIII, Tru9I and AluI) proved beneficial for easy distinction. The remaining 8 REs (BamHI, EcoRI, HindIII, NotI, NruI, SacI, SmaI and PstI) could not be exploited to a significant extent for this purpose. However, in a different gene such as gyrB, digestion with EcoRI (and ClaI) could distinguish the four members of the B. cereus group, whereas HindIII could not [38]. Thus the same set of REs may not hold good for different gene sequences. In fact, HindIII gave poor results even while screening Corynebacterium spp. and hence was discontinued, and the use of multiple REs was recommended [143]. In silico digestion with AluI was found to be the most discriminative [144] and generated 3 to 13 fragments depending on the Mycoplasma species. Although 73 Mycoplasma species could be differentiated using AluI, other species gave indistinguishable patterns. For these, an additional restriction digestion, typically with BfaI (or HpyF10VI), was needed for a final identification [145]. This was confirmed by application of ARDRA to 27 species and subspecies. We also validated our findings by applying REs to species of Virgibacillus, Gracilibacillus and Geobacillus; the approach can be exploited for describing new species [146].

Novel Lineages
Among the innovative strategies applied by various researchers, phylogenetic relationships between Bacillus species and related genera were inferred from comparison of 3' end 16S rDNA and 5' end 16S-23S ITS nucleotide sequences [82]. Among the 40 Bacillaceae species, Bacillus circulans remained ungrouped. Of the ten groups, Group VI consisted of B. licheniformis, B. subtilis and B. sphaericus along with B. amyloliquefaciens, B. atrophaeus, B. mojavensis, B. macroides and B. fusiformis. Group X was placed independent of the other 6 Bacillus groups and comprised B. anthracis, B. cereus, B. thuringiensis, B. mycoides and B. lentus. This indicates that the B. cereus group is quite different from other Bacillus species. The separation of Bacillus species by Paenibacillus, Brevibacillus, Geobacillus, Marinibacillus and Virgibacillus species indicates that in some cases further divisions or, conversely, further grouping might be warranted. Our work has provided tools for defining the thresholds of each species and enables us to pose questions such as: should current classifications be re-examined?
Lower levels of similarity were found with other alkalitolerant Bacillus strains, particularly DSM8714 and DSM877, which still lack taxonomic standing. The results obtained confirm that the four Enterogermina strains belong to a unique genospecies, which can be unequivocally identified as B. clausii. The finding is of intrinsic value, since some bacterial strains described as B. clausii have been reported to exhibit levels of DNA hybridization with the reference type strain of less than 61% [147], emphasizing the great genomic heterogeneity of the strains placed in the species B. clausii [32].
As Ash et al. [42] predicted, their phylogenetic groups have been redefined as separate genera, and their outlying species have served as the basis for novel taxa as well. At the present pace of refined discoveries, we can expect new Bacillus-like genera to be defined in the near future. In fact, Nazina et al., through physiological and genetic analysis, proposed the validly described genus name Geobacillus [65].
Heyndrickx et al. [142] undertook a polyphasic study, which revealed the presence within Virgibacillus of an as yet undescribed new species, for which the name Virgibacillus proomii was proposed (V. proomii was distinguished from V. pantothenticus, from members of Bacillus sensu stricto, and from members of Paenibacillus and other aerobic endospore-forming bacteria by routine phenotypic tests). Comparisons of the 16S rDNA sequences of type strains of Bacillus and Sporosarcina species indicated that Bacillus pantothenticus lies at the periphery of rDNA gr1 (Bacillus sensu stricto) [42]. Virgibacillus was proposed to accommodate B. pantothenticus and two related organisms, which appeared to belong to an as yet (as of 1999) undescribed new species. It appears that the relationship between Bacillus laevolacticus and V. pantothenticus (Group III) and between Bacillus badius and M. marinus (Group IV) could still be open to debate. The robustness of this classification tool will be assessed by comparison with the current Bacillaceae classifications.

Figure 6. Phylogenetic tree of 34 framework sequences (bold values) and 69 Bacillus sp., which could be designated as B. megaterium (Figures S11 to S32 and Table S1). A neighbor-joining analysis with Jukes-Cantor correction and bootstrap support was performed on the gene sequences. Bootstrap values are given at nodes; 1000 bootstrap replicates were run. Values in parentheses are accession numbers (http://rdp.cme.msu.edu/). doi:10.1371/journal.pone.0004438.g006
Parameters such as signatures (generated by MEME), restriction enzyme (RE) sites, nucleotide stretches "generated" by REs and the phylogenetic framework can together provide a battery of markers. These are likely to define the variability between the species of a specific genus and the specificity of the genus. The use of these parameters is a simple, rapid approach, suitable for larger screening programs and easily accessible to most laboratories.

Sequence data
A total of 2146 16S rDNA sequences belonging to the genus Bacillus (from the RDP and NCBI sites: http://rdp.cme.msu.edu/; http://www.ncbi.nlm.nih.gov/) were analysed in the present study. These included 271 sequences belonging to isolates of B. subtilis, 211 to isolates of B. cereus, 153 to isolates of B. anthracis, 131 to isolates of B. licheniformis, 108 to isolates of B. thuringiensis, 83 to isolates of B. pumilus, 47 to isolates of B. megaterium, 42 to isolates of B. sphaericus, 39 to isolates of B. clausii, 36 to isolates of B. halodurans and 1025 to isolates identified only as Bacillus sp. (Table 1). The first ten sets were used as the master species set for generating a phylogenetic framework, while the 1025 Bacillus sp. sequences were used as the data set of unclassified isolates to be segregated. The other Bacillus species, which were represented by relatively small numbers of sequences, were not considered for the development of the identification tool (Table S5).

Phylogenetic Analyses
For phylogenetic analyses of each of these 10 species data sets, the sequences were assembled and aligned using the multiple alignment program CLUSTALW version 1.82 [133]. To estimate evolutionary distance, pairwise distances between all taxa were calculated with the program DNADIST of the PHYLIP 3.6 package. The resultant distance matrix was then used to draw a neighbor-joining tree with the program NEIGHBOR. The program SEQBOOT [148] was used for statistical testing of the trees by resampling the data set 1000 times. The trees were viewed through HyperTree Version 1.0.0 [133] and TreeView Version 1.6.6 [149] (Figures S1 to S10).
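The two numerical steps underlying this pipeline, the Jukes-Cantor corrected pairwise distance and bootstrap resampling of alignment columns, can be illustrated with a minimal Python sketch. This is not the PHYLIP code; the function names are ours, and the treatment of gaps and ambiguity codes (such columns are skipped) is an assumption of the sketch rather than a documented setting of the original analysis.

```python
import math
import random

def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap or ambiguity code in either sequence are skipped
    (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared

def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor distance d = -(3/4) * ln(1 - 4p/3)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        return float("inf")  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def bootstrap_columns(alignment, rng):
    """One bootstrap replicate: alignment columns resampled with replacement."""
    length = len(alignment[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in cols) for seq in alignment]

# Toy aligned sequences, invented for demonstration only.
aln = ["ACGTACGT", "ACGTACGA"]
print(round(jukes_cantor(aln[0], aln[1]), 4))
replicate = bootstrap_columns(aln, random.Random(0))
print(replicate)
```

In a full analysis, each of the 1000 replicates would be passed through the distance and tree-building steps, and the support for a node would be the number of replicate trees in which it appears.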
For each of these 10 data sets, sequences that fell in the same clade were grouped together. Candidate sequences of these individual groups were aligned and a consensus was obtained by removing ambiguous parts using the JALVIEW sequence editor. The consensus from each group was chosen as the representative of that group. A phylogenetic tree was drawn on the basis of these representative sequences of the 16S rDNA gene. Members from each of the clusters of the tree were selected to define the range of each of the Bacillus species. Thus a reference set of 34 sequences was selected, containing two to five sequences from each represented topology (Table 5); these were regarded as likely candidates that could give information about the organismal phylogeny.

Bacillus Species-Specific Signature
MEME (Multiple EM for Motif Elicitation) is used for searching for novel motifs or signatures in sets of biological sequences. MEME works by searching for repeated, ungapped sequence patterns that occur in DNA or protein sequences [150,151]. MEME searches can be performed via the web server (http://meme.nbcr.net) and its mirror sites [151]. The same web server also provides access to the Motif Alignment and Search Tool for searching sequence databases for matches to motifs. To successfully discover motifs with MEME, it is necessary to choose and prepare the input sequences carefully. Ideally, the sequences should be <1000 base pairs long [152]. In our analysis, sequences of the 10 data sets in FASTA format were submitted groupwise to the MEME program Version 4.0.0 (http://meme.nbcr.net/meme4/cgi-bin/meme.cgi). In order to obtain the maximum number of motifs in our sequences, we changed the default setting from 3 motifs to 10 motifs. A MEME search stops when this number of motifs has been found, or when none can be found with an E-value less than 10000 (http://meme.nbcr.net/meme4/meme-input.html#width). We used the default setting of zero or one motif occurrence per sequence to detect single occurrences of a motif distributed among the sequences. The default motif widths, set between 6 (minimum) and 50 (maximum), were reset to between 25 and 30, respectively. Each of the 10 signatures (25 to 30 nucleotides long) (Table S2) was checked for its frequency of occurrence among all the sequences of a particular Bacillus species, and those that had the highest frequency and did not appear in other Bacillus spp. were considered unique to that species. These unique motifs were used as query sequences in BLAST searches against the sequenced microbial genomes available in the NCBI database (http://www.ncbi.nlm.nih.gov/) to validate the results.
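The frequency and uniqueness check applied to the candidate signatures can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure as actually run: signatures are matched as exact substrings (no mismatches or reverse complements), and the function names and toy sequences are ours.

```python
def occurrence_frequency(signature, sequences):
    """Fraction of sequences containing the signature as an exact substring."""
    if not sequences:
        return 0.0
    signature = signature.upper()
    hits = sum(signature in s.upper() for s in sequences)
    return hits / len(sequences)

def unique_signatures(candidates, target_seqs, other_species_seqs):
    """Rank candidate signatures that occur in the target species only.

    candidates:         list of motif strings (e.g. the 10 motifs of a species)
    target_seqs:        sequences of the species under study
    other_species_seqs: dict mapping species name -> list of sequences
    Returns (frequency, signature) pairs, highest frequency first.
    """
    ranked = []
    for sig in candidates:
        found_elsewhere = any(
            occurrence_frequency(sig, seqs) > 0
            for seqs in other_species_seqs.values()
        )
        if not found_elsewhere:
            ranked.append((occurrence_frequency(sig, target_seqs), sig))
    return sorted(ranked, reverse=True)

# Toy data, invented for demonstration only.
target = ["AAACCCGGG", "TTACCCGGT", "GGGTTTAAA"]
others = {"other_species": ["ACGTTTACG"]}
print(unique_signatures(["CCCGG", "TTT"], target, others))
```

In the toy example the motif TTT is discarded because it also occurs in the other species, and CCCGG is retained with a frequency of 2 out of 3 target sequences.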

Restriction Enzyme Analysis
A total of 14 Type II Restriction enzymes (Table 3) were considered for these analyses. The criteria for selecting type II RE are independence of methylase and occurrence of cleavage(s) at very specific sites that are within or close to the recognition sequence.
All 10 Bacillus species considered were checked for all 14 Type II REs using the online software Restriction Mapper Version 3 (http://restrictionmapper.org/index.htm). Sequences were entered one at a time in the Restriction Mapper site, the results obtained were analyzed, and a consensus pattern was determined for each species according to its frequency of occurrence among the sequences. Ten known Bacillus spp. and 11 clusters belonging to Bacillus sp. were used as data sets. For B. megaterium, B. sphaericus, B. clausii and B. halodurans, all the sequences were analysed; for B. cereus, B. anthracis, B. licheniformis, B. thuringiensis and B. pumilus, 30 sequences were taken into consideration; for B. subtilis, since no conclusive pattern could be obtained using 30 sequences, 150 sequences were taken. The pattern developed for each Bacillus species was considered representative of that species.
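Determining a consensus pattern by frequency of occurrence can be sketched as below. The sketch assumes that each sequence has already been reduced to an ordered list of fragment lengths for a given enzyme and that patterns are compared exactly; the function name and the toy values are ours and are not taken from the study.

```python
from collections import Counter

def consensus_pattern(patterns):
    """Most frequent fragment pattern and the fraction of sequences showing it.

    patterns: list of fragment-length lists, one per sequence, all obtained
    with the same enzyme. Ties are resolved in favour of the pattern seen first.
    """
    if not patterns:
        raise ValueError("at least one pattern is required")
    counts = Counter(tuple(p) for p in patterns)
    pattern, n = counts.most_common(1)[0]
    return list(pattern), n / len(patterns)

# Toy patterns for one enzyme across five sequences (invented values).
observed = [[200, 800], [200, 800], [1000], [200, 800], [150, 50, 800]]
print(consensus_pattern(observed))
```

Exact matching is a simplification: in practice, sequences of slightly different length give terminal fragments that differ by a few nucleotides, so a tolerance or a site-position based comparison may be preferable.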

Clusters/Potential Novel Lineages
In addition to the identification of some Bacillus sp. up to species level, 366 sequences were aligned (according to the clusters formed in the 22 trees of Bacillus sp.), 52 representatives were chosen and a phylogenetic tree was constructed. A total of 335 isolates (44 of the 52 representatives) were found to cluster into 11 groups (Figure 8). These 11 clusters, consisting of 18 to 50 isolates each, did not appear to fall within any of the aligned species and so may represent sub-species or novel lineages. They were also checked for the signatures identified through the MEME program to reveal whether there were certain similarities. Each of the 11 clusters was also checked for digestion with the 14 Type II restriction enzymes. The pattern so obtained was checked against those of the representative Bacillus species for a match, to determine whether these might belong to some Bacillus species.

Figure 8. Phylogenetic tree of the 52 representative sequences (Figures S11 to S32). These 52 representative sequences were observed to segregate into 11 clusters: cluster 1 to cluster 11. A neighbor-joining analysis with Jukes-Cantor correction and bootstrap support was performed on the gene sequences. Bootstrap values are given at nodes; 1000 bootstrap replicates were run. Accession numbers of the representative sequences are given, and values in parentheses against these are equal to the number of sequences in that group within each cluster.

Author Contributions
Conceived and designed the experiments: VCK. Performed the experiments: SP SL SC. Analyzed the data: SP SL SC VCK. Wrote the paper: VCK.