Fig 1.

ViraMiner architecture.

ViraMiner takes raw sequences in one-hot encoded form as input. The sequences are processed by two different convolutional branches (the pattern branch and the frequency branch). The architecture of the branches is shown in detail in Methods and Material. The outputs of the branches (two 1D vectors) are concatenated, and the concatenated vector is fully connected (all-to-one) to the output node. The output node passes the weighted sum of its inputs through a sigmoid activation function, resulting in a value restricted to the range [0, 1]. This value reflects the likelihood that the sequence belongs to the virus class.
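The input encoding and the output node described in this caption can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the base ordering `ACGT`, the all-zero encoding of ambiguous bases, and the function names `one_hot` and `output_node` are hypothetical and not taken from the article, and the two branch outputs are passed in as plain vectors rather than computed by trained convolutional layers.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed base ordering, not specified in the caption


def one_hot(seq):
    """Encode a DNA string as a (len(seq), 4) array.

    Bases outside ALPHABET (for example N) are left as all-zero rows;
    this handling is an assumption made for the sketch.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = np.zeros((len(seq), len(ALPHABET)), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in index:
            encoded[pos, index[base]] = 1.0
    return encoded


def sigmoid(z):
    """Squash a real value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def output_node(pattern_vec, frequency_vec, weights, bias):
    """Concatenate the two branch outputs and apply one sigmoid unit."""
    merged = np.concatenate([pattern_vec, frequency_vec])
    return float(sigmoid(merged @ weights + bias))
```

With all weights and the bias set to zero, the weighted sum is zero and `output_node` returns 0.5, the midpoint of the sigmoid.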

Fig 2.

ROC curve of ViraMiner model on the testing dataset.

The blue line on the figure represents the performance of ViraMiner (AUROC: 0.923). For comparison, the green line depicts the performance of a random model.
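As a reminder of what the AUROC value summarizes, the sketch below computes it directly from its pairwise definition: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `auroc` is hypothetical, and this pairwise approach is an illustration rather than the evaluation code used in the study.

```python
import numpy as np


def auroc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    # Compare every positive score with every negative score.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))
```

A model that assigns the same score to every sequence gets 0.5 under this definition, which is the level of the random baseline drawn in the figure.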

Fig 3.

The trade-off between precision and recall for classifying viral genomes.

A genome is labeled as virus if the output value is higher than a certain threshold. The blue line represents the trade-off. The crossing point of the vertical and horizontal red dotted lines indicates that the model can achieve 90% precision with 32% recall.
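The thresholding rule in this caption can be made concrete with a small sketch. The function name `precision_recall` and the convention of returning a precision of 1.0 when nothing is predicted positive are assumptions made for illustration only.

```python
import numpy as np


def precision_recall(labels, scores, threshold):
    """Precision and recall when scores above `threshold` are called viral."""
    labels = np.asarray(labels, dtype=bool)
    predicted = np.asarray(scores, dtype=float) > threshold
    tp = np.sum(predicted & labels)    # viral, predicted viral
    fp = np.sum(predicted & ~labels)   # non-viral, predicted viral
    fn = np.sum(~predicted & labels)   # viral, predicted non-viral
    # Assumed convention: no positive predictions gives precision 1.0.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return float(precision), float(recall)
```

Raising the threshold typically increases precision while lowering recall, which is the trade-off the blue line traces.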

Fig 4.

Comparison with baseline models.

The blue line depicts the performance of ViraMiner (AUROC: 0.923); the red line corresponds to the best Random Forest (RF) model trained on k-mers (using 6-mers); the brown line shows RF performance on raw metagenomic contigs.

Fig 5.

Performance comparison of models designed in this study.

Models based on k-mers (red bars) were trained using Random Forest (RF); the best performance was achieved with 6-mers, with a test AUROC of 0.875. Convolutional Neural Networks (blue bars) use the raw sequence as input and outperform the RF models. The Pattern and Frequency branches both yield AUROC values above 0.9. ViraMiner combines both branches and achieves the highest test AUROC (0.923) of all models.
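For readers unfamiliar with k-mer features, the sketch below shows one common way to turn a sequence into a fixed-length count vector that a Random Forest could consume. Whether the study used raw counts or normalized frequencies, and how it treated ambiguous bases, is not stated in this caption; the function name `kmer_counts` and the choice to skip windows containing non-ACGT characters are assumptions.

```python
from itertools import product

import numpy as np


def kmer_counts(seq, k):
    """Count every overlapping k-mer over ACGT; returns a vector of length 4**k."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = np.zeros(len(kmers), dtype=np.int64)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        idx = index.get(seq[start:start + k])
        if idx is not None:  # assumed: windows with ambiguous bases are skipped
            counts[idx] += 1
    return counts
```

For 6-mers the feature vector has 4^6 = 4096 entries regardless of contig length.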

Table 1.

ViraMiner performance on unseen serum metagenomic experiments.

The number of viral samples in these experiments varies from 22 to 732.

Fig 6.

ViraMiner architecture.

(a) ViraMiner full model, with the branches not expanded. (b) Architecture of the Frequency model, with the best-performing layer sizes shown. The dotted parts are discarded when the pre-trained parameter values of this model are used as the Frequency Branch in the full model. (c) Architecture of the Pattern model (similar to DeepVirFinder), with the best-performing layer sizes shown. The dotted parts are discarded when the pre-trained parameter values of this model are used as the Pattern Branch in the full model.
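The caption does not say how the two branches differ internally. The sketch below assumes, as an illustration only, that the Pattern branch applies global max pooling over the convolution output (reporting whether a motif occurs anywhere) and that the Frequency branch applies global average pooling (reporting how often it occurs). The function name `conv1d_relu`, the filter shapes, and the pooling assignment are all assumptions of this sketch, not details confirmed by the text above.

```python
import numpy as np


def conv1d_relu(x, filters):
    """Valid 1D convolution followed by ReLU.

    x: (length, 4) one-hot sequence.
    filters: (n_filters, width, 4).
    Returns an array of shape (length - width + 1, n_filters).
    """
    n_filters, width, _ = filters.shape
    steps = x.shape[0] - width + 1
    out = np.empty((steps, n_filters))
    for t in range(steps):
        window = x[t:t + width]
        # Sum over filter width and the 4 base channels for each filter.
        out[t] = np.tensordot(filters, window, axes=([1, 2], [0, 1]))
    return np.maximum(out, 0.0)


# One-hot encoding of "ACAC" with assumed column order A, C, G, T.
x = np.eye(4)[[0, 1, 0, 1]]

# A single hand-made filter that responds to the dinucleotide "AC".
filters = np.zeros((1, 2, 4))
filters[0, 0, 0] = 1.0  # A at the first position
filters[0, 1, 1] = 1.0  # C at the second position

activations = conv1d_relu(x, filters)
pattern_features = activations.max(axis=0)     # assumed global max pooling
frequency_features = activations.mean(axis=0)  # assumed global average pooling
```

In this toy case the filter fires at two of the three positions, so the max-pooled feature only records that the motif is present while the average-pooled feature also reflects how often it appears.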
