Table 1.

Proposed species names and assembly data for strains used for pipeline construction and testing.


Fig 1.

Pipeline for circumscription of B. pumilus group strains.

The diagram describes the informatics tools used and the pipeline integrating genomic (A), phylogenomic (B) and functional (C) approaches for bacterial circumscription. A) ANI approach. Average nucleotide identity (ANI) values were calculated for every pair of genomes among the strains under study and then used to perform a correlation analysis. B) Phylogenomic approach. Core genes were searched for using BLAST in all bacterial genomes under study. Orthologous genes were individually aligned, concatenated, and trimmed. Finally, the best substitution model was selected and the evolutionary history was inferred. C) Encoded function repertoires approach. Functions were assigned to all encoded proteins analyzed, and the presence or absence of particular biological functions in each microorganism was determined. This binary information was then used to perform a hierarchical cluster analysis. Similarities or differences between the phylogenomic (B) and functional (C) dendrograms were used to define ecologically distinct strains or to reinforce a species definition. When necessary, complementary analyses such as in silico DNA-DNA hybridization (is-DDH) were performed.


Fig 2.

Correlation plot based on strain ANI values.

ANI values between each pair of the indicated strains (type strains in bold) were calculated using the JSpecies software [14] and used to build a Pearson correlation matrix in R [19]. The plot shows the correlation matrix, ordered by hierarchical clustering, drawn with the R package “corrplot” [20]. The minimum ANI percentage between strains of a given cluster is indicated in brackets.
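
The correlation step described above can be illustrated with a minimal sketch. It is not the authors' R code: it uses plain Python in place of R and “corrplot”, and the strain names (`s1` to `s4`) and ANI values are invented for illustration only. Each strain's row of ANI values is treated as a profile, and the Pearson correlation is computed between every pair of profiles.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def correlation_matrix(ani):
    """Correlate the ANI profile (row) of every strain with every other."""
    names = sorted(ani)
    rows = {a: [ani[a][b] for b in names] for a in names}
    return {a: {b: pearson(rows[a], rows[b]) for b in names} for a in names}

# Hypothetical ANI percentages (not data from the article).
ani = {
    "s1": {"s1": 100.0, "s2": 98.0, "s3": 90.0, "s4": 89.0},
    "s2": {"s1": 98.0, "s2": 100.0, "s3": 91.0, "s4": 90.0},
    "s3": {"s1": 90.0, "s2": 91.0, "s3": 100.0, "s4": 97.0},
    "s4": {"s1": 89.0, "s2": 90.0, "s3": 97.0, "s4": 100.0},
}
corr = correlation_matrix(ani)
```

In this toy input, strains with similar ANI profiles (`s1` and `s2`) correlate positively, while strains from the other group correlate negatively with them, which is the pattern a clustered correlation plot makes visible.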


Fig 3.

Comparison of phylogenomic and functional dendrograms of Bacillus pumilus group strains.

Phylogenomic and functional dendrograms were compared and plotted with the R package “dendextend” [28]. A) Phylogenomic dendrogram. The 109 core genes identified with BLAST were individually aligned, concatenated and trimmed, resulting in a final alignment of 104022 residues. The evolutionary history of the indicated strains was inferred with RAxML [24]. Reliability of the inferred tree was tested by bootstrapping with 1000 replicates. Where not indicated, bootstrap support values were 100. B) Functional dendrogram. Biological functions of the proteins encoded in the genomes of the indicated strains (type strains in bold) were inferred using the OrthoMCL software [26] and then used as binary scores for hierarchical cluster analysis implemented with the R package “pvcluster” [27].
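
As a rough illustration of clustering on binary presence/absence scores, the sketch below groups strains by the functions they encode. It makes several assumptions that are not in the source: it uses Jaccard distance and average linkage (the caption does not state which distance or linkage was used), it omits the bootstrap support that the R package provides, and the strain names and function labels are invented.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - |intersection| / |union| for two sets of function labels."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def cluster_distance(c1, c2, profiles):
    """Average pairwise Jaccard distance between members of two clusters."""
    pairs = [(x, y) for x in c1 for y in c2]
    total = sum(jaccard_distance(profiles[x], profiles[y]) for x, y in pairs)
    return total / len(pairs)

def average_linkage(profiles):
    """Agglomerative clustering; returns the merges as (a, b, distance)."""
    clusters = [frozenset([name]) for name in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of current clusters and merge them.
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], profiles),
        )
        merges.append((c1, c2, cluster_distance(c1, c2, profiles)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges

# Hypothetical presence sets: each strain maps to the functions it encodes.
profiles = {
    "s1": {"f1", "f2", "f3"},
    "s2": {"f1", "f2", "f3", "f4"},
    "s3": {"f5", "f6"},
    "s4": {"f5", "f6", "f7"},
}
merges = average_linkage(profiles)
```

The merge list is the information a dendrogram encodes: the order in which strains join and the distance at which they do so.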


Fig 4.

Analysis of functional repertoires among clusters of Bacillus pumilus group strains.

The numbers of biological functions of proteins encoded in each cluster, in all B. pumilus group strains, or in all 26 strains under analysis are indicated.


Fig 5.

Pipeline to circumscribe bacteria as well as to rank genes based on their importance.

First, genetic distances are calculated for each individually aligned core gene. Then, a forest of decision trees is constructed using all of these variables, with the suggested species names that resulted from the genomic, phylogenomic and functional cluster analyses (pipeline described in Fig 1) as classes. The importance of each variable is computed using the random forest (RF) algorithm [30]. Finally, the distances of the most important gene are used to perform a PCA to circumscribe bacteria and identify outliers. Further analyses (based on phylogenomic, genomic, and experimental phenotypic information) must be performed to classify those outlier strains.
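
The first step of this pipeline, computing distances from a single aligned gene, can be sketched as follows. The caption does not say which distance measure was used, so the uncorrected p-distance (the proportion of differing sites, skipping gapped positions) is an assumption here, and the strain names and sequences are invented.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) positions")
    return differing / compared

def distance_matrix(alignment):
    """Pairwise p-distances for one gene alignment, keyed by strain name."""
    names = sorted(alignment)
    return {
        a: {b: p_distance(alignment[a], alignment[b]) for b in names}
        for a in names
    }

# Hypothetical alignment of one core gene across three strains.
alignment = {
    "s1": "ACGT-A",
    "s2": "ACGA-A",
    "s3": "TCGA-C",
}
dist = distance_matrix(alignment)
```

Each strain's row in such a matrix gives one set of variables per gene; stacking these across genes yields the kind of table on which a random forest could rank genes by importance.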


Fig 6.

Ranking of genes based on their RF importance and PCA for outlier detection.

A) Importance and error rate plot. The importance of each gene was computed using RF and plotted against its mean genetic distance. Symbols represent the percentage classification error rate. B) PCA plot. The PCA was conducted using R [19], using as variables the distances to each of the ybbP orthologs from the strains listed in Table 1 and S4 Table. PC1 vs. PC2 and 95% confidence interval ellipses were plotted with the R package “ggbiplot” [31]. Symbols used for strains listed in Table 1 are depicted in the figure. The 15 B. pumilus group strains listed in S4 Table (New strains) are depicted as closed triangles. A, JPL_MERTA2; B, RIT372; C, SCAL1; D, 15.1; E, LK12; F, LK21; G, LK32; H, LK31; I, LK18; J, W3; K, RIT380; L, LK23; M, LK33; N, LK5; O, DSM 26896.


Table 2.

Statistics of the 10 most important genes for RF species circumscription.
