Fig 1.
Top 20 genera in the plasmid and chromosomal datasets.
A total of 10,654 chromosomal contigs and 10,584 plasmid contigs were used. Genera representing more than 1% of the contigs are labeled.
Fig 2.
Workflow for the study.
Table 1.
Accuracy of machine learning (ML) models built to classify plasmid and chromosomal sequences using 6-mers as features*.
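As an illustration of what "6-mers as features" can mean, the sketch below turns one sequence into a fixed-length vector of 4^6 = 4,096 relative 6-mer frequencies. It assumes plain (non-canonical) k-mers, normalization to frequencies, and skipping of k-mers that contain N or other ambiguous bases. These are illustrative choices, not necessarily the encoding used for the models in Table 1.

```python
from itertools import product

BASES = "ACGT"

def kmer_index(k):
    """Map every possible k-mer over A/C/G/T to a fixed column index."""
    return {"".join(p): i for i, p in enumerate(product(BASES, repeat=k))}

def kmer_profile(seq, k=6, index=None):
    """Return relative k-mer frequencies (a list of length 4**k) for one sequence.

    Assumption: k-mers containing characters other than A/C/G/T (e.g. N)
    are skipped, and counts are divided by the number of valid k-mers.
    """
    if index is None:
        index = kmer_index(k)
    counts = [0] * len(index)
    seq = seq.upper()
    total = 0
    # Slide a window of length k along the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        col = index.get(seq[i:i + k])
        if col is not None:
            counts[col] += 1
            total += 1
    # Normalize to frequencies; a sequence with no valid k-mer stays all zeros.
    return [c / total for c in counts] if total else counts
```

Each contig or fragment yields one such vector, and stacking the vectors gives the feature matrix that a classifier is trained on.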
Fig 3.
Receiver operating characteristic (ROC) curve for the 10-fold cross-validated neural network model based on 6-mers and randomly selected 5-kb sequence fragments.
Results for each fold are shown.
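Since one ROC curve is drawn per fold, the sketch below computes the (false positive rate, true positive rate) points and the trapezoidal area under the curve for a single fold's held-out predictions. It assumes plasmid is coded as the positive class (label 1) and that the model outputs a score per fragment. In practice a library routine would normally be used; this stdlib version only shows the calculation.

```python
def roc_curve_points(labels, scores):
    """Return ROC points (FPR, TPR), sweeping the threshold from high to low.

    labels: 1 = plasmid (treated as the positive class, an assumption), 0 = chromosome.
    scores: predicted score for the positive class, one per sample.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    # Sort samples by score, highest first.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        thr = pairs[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == thr:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

Calling these once per fold, on that fold's held-out labels and scores, gives the ten curves (and ten AUC values) summarized in the figure.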
Fig 4.
Classification accuracy for the top 20 genera in the test set.
The dotted line indicates an accuracy of 0.9. Error bars show the standard deviation across the 10 folds of the cross-validation.
Table 2.
Accuracy of a voting classifier that predicts whether a contig is chromosomal or plasmid by classifying 5-kb sequence fragments from the contig and taking the majority vote*.
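A minimal sketch of contig-level majority voting follows. Several details are assumptions made only for illustration. Fragments are consecutive, non-overlapping 5-kb windows, and any remainder is dropped. A contig shorter than 5 kb is classified whole. The hypothetical callable `classify_fragment` stands in for a trained fragment-level model, and the hypothetical `tie_label` parameter decides evenly split votes.

```python
from collections import Counter

def fragments(seq, size=5000):
    """Split a contig into consecutive non-overlapping windows of `size` bp.

    Assumptions: a trailing remainder shorter than `size` is dropped, and a
    contig shorter than `size` is returned whole as a single fragment.
    """
    if len(seq) < size:
        return [seq]
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]

def classify_contig(seq, classify_fragment, size=5000, tie_label="plasmid"):
    """Label a contig by majority vote over its fragment-level predictions.

    classify_fragment: hypothetical callable mapping one fragment to
    "plasmid" or "chromosome" (e.g. a model trained on 6-mer features).
    tie_label: hypothetical rule for breaking an even split of votes.
    """
    votes = Counter(classify_fragment(f) for f in fragments(seq, size))
    (top, n_top), *rest = votes.most_common()
    # If the runner-up has as many votes as the leader, fall back to tie_label.
    if rest and rest[0][1] == n_top:
        return tie_label
    return top
```

Voting over several fragments lets correct fragment-level predictions outvote occasional misclassified fragments from the same contig.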
Table 3.
The most common annotations among misclassified sequence fragments.
Table 4.
Performance of PlasFlow and PlasClass compared with the models built in this study, evaluated on the same test datasets used in the 10-fold cross-validation*.