Table 1.

Training, test data and model parameters.

Table 2.

MCC on cross validation and independent test-set.

Table 3.

Top 10 ranking pathogenic protein families and annotated functions of their proteins for TM-Gammaproteobacteria model.

Table 4.

Top 10 ranking non-pathogenic protein families and annotated functions of their proteins for TM-Gammaproteobacteria model.

Table 5.

Top 10 ranking pathogenic protein families and annotated functions of their proteins for the WDM model.

Table 6.

Top 10 ranking non-pathogenic protein families and annotated functions of their proteins for the WDM model.

Figure 1.

Pratio and Z-score histograms for TM-Betaproteobacteria model.

The model was built with MinOrg = 2, HT = 0.9 and LT = 0.3. (A) and (B) show the Pratio and Z-score histograms, respectively, for the clusters i such that ORGi ≥ MinOrg. This step reduces the original 69,744 clusters to 26,706. In (A) the bars at the extremes count the clusters containing genes only from pathogenic organisms (right bar) or only from non-pathogenic ones (left bar), while the small peak in the middle corresponds to clusters containing the same number of pathogenic and non-pathogenic organisms; these clusters are not used, since they provide no discriminative information about pathogenicity. (C) and (D) show the same histograms for the PFs obtained by removing all the significant clusters with a Pratio value between LT and HT. Panel (C) shows that non-pathogenic PFs outnumber pathogenic ones. HT and LT can be used to adjust the number of pathogenic and non-pathogenic PFs, which is useful for models whose training-set has an unbalanced number of pathogenic and non-pathogenic organisms. In (D) the negative Z-scores are associated with non-pathogenic PFs, while the positive ones are associated with pathogenic PFs.
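
The two filtering steps described in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that Pratio is the fraction of pathogenic organisms among the organisms represented in a cluster, which is consistent with the histogram extremes described above but is not defined on this page. It also assumes that the LT and HT bounds are inclusive on the kept side. The names `Cluster` and `select_families` are hypothetical, and the Z-score is left out because its definition is not given here.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    n_pathogenic: int       # pathogenic organisms contributing a gene to the cluster
    n_non_pathogenic: int   # non-pathogenic organisms contributing a gene

    @property
    def n_org(self) -> int:
        # ORG_i: number of organisms represented in the cluster
        return self.n_pathogenic + self.n_non_pathogenic

    @property
    def pratio(self) -> float:
        # ASSUMPTION: Pratio is the pathogenic fraction, ranging from 0 to 1
        return self.n_pathogenic / self.n_org


def select_families(clusters, min_org=2, ht=0.9, lt=0.3):
    """Split clusters into pathogenic and non-pathogenic families.

    Step 1 drops clusters with fewer than min_org organisms.
    Step 2 drops clusters whose Pratio lies strictly between lt and ht
    (ASSUMPTION: values equal to lt or ht are kept).
    """
    pathogenic, non_pathogenic = [], []
    for c in clusters:
        if c.n_org == 0 or c.n_org < min_org:
            continue
        if c.pratio >= ht:
            pathogenic.append(c)
        elif c.pratio <= lt:
            non_pathogenic.append(c)
    return pathogenic, non_pathogenic


# Toy data, invented purely for illustration
clusters = [
    Cluster("a", 5, 0),  # only pathogenic organisms
    Cluster("b", 0, 4),  # only non-pathogenic organisms
    Cluster("c", 2, 2),  # balanced, carries no discriminative information
    Cluster("d", 1, 0),  # too few organisms for MinOrg = 2
    Cluster("e", 1, 3),  # mostly non-pathogenic
]
pathogenic, non_pathogenic = select_families(clusters)
print([c.name for c in pathogenic], [c.name for c in non_pathogenic])
```

Under these assumptions, raising LT or lowering HT admits more clusters into the respective class, which is how the thresholds could rebalance the two sets of families.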

Figure 2.

PFDB, training and test-set for each model.

Each bar-plot shows the percentage of pathogenic (orange) and non-pathogenic (light-blue) organisms in the training and test-set, and the percentage of pathogenic and non-pathogenic protein families in the PFDB of the model named in the bar-plot title (e.g. WDM). Below each horizontal bar-plot are shown the number of protein families in that model's PFDB, its size in megabytes and its number of sequences.
