
RESCRIPt: Reproducible sequence taxonomy reference database management

Fig 13

Comparison of taxonomic information and simulated classification accuracy from BOLD and NCBI GenBank COI gene databases for available arthropod and chordate sequences.

All sequences were dereplicated and trimmed to a common primer region. NCBI references were split into those containing a cross-reference term to BOLD (“ncbiOB”) and those without one (“ncbiNB”), or were combined (“ncbiAll”). A, Number of unique taxonomic labels; B, Taxonomic entropy; C, Proportion of unclassified taxa at each rank; D, Optimal classification accuracy (as F-measure) without cross-validation, simulating the best possible classification accuracy when the true label is known but classification may still be confounded by other similar hits in the database; E, Classification accuracy (as F-measure) with cross-validation. Rank labels on x-axis: K = kingdom, P = phylum, C = class, O = order, F = family, G = genus, S = species.
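The quantities in panels A, B, D, and E can be illustrated with a minimal Python sketch. It assumes that taxonomic entropy means Shannon entropy over the frequencies of taxonomic labels at a given rank, and that the F-measure is the usual harmonic mean of precision and recall. The function names, the semicolon-delimited label format, and the toy labels are hypothetical and are not taken from RESCRIPt itself.

```python
import math
from collections import Counter


def truncate(taxonomy, depth):
    """Keep the first `depth` ranks of a semicolon-delimited label (assumed format)."""
    return ";".join(taxonomy.split(";")[:depth])


def unique_labels(taxonomies, depth):
    """Number of distinct labels at the given rank depth (cf. panel A)."""
    return len({truncate(t, depth) for t in taxonomies})


def label_entropy(taxonomies, depth):
    """Shannon entropy (natural log) of label frequencies at a rank depth (cf. panel B)."""
    counts = Counter(truncate(t, depth) for t in taxonomies)
    total = sum(counts.values())
    return sum(-(n / total) * math.log(n / total) for n in counts.values())


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (cf. panels D and E)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only; not data from the study.
toy = [
    "Animalia;Arthropoda;Insecta",
    "Animalia;Arthropoda;Arachnida",
    "Animalia;Chordata;Aves",
    "Animalia;Chordata;Aves",
]
```

With these toy labels, `unique_labels(toy, 2)` counts two phylum-level labels, and `label_entropy(toy, 2)` equals `math.log(2)` because the two phyla are equally frequent. Entropy rises as labels become more numerous and more evenly distributed, which is why it complements a simple count of unique labels.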

doi: https://doi.org/10.1371/journal.pcbi.1009581.g013