
Predicting host taxonomic information from viral genomes: A comparison of feature representations

Fig 9

Signal loss for holdout classifiers.

Violin plots of the ratio of the AUC score of the holdout classifier (AUC_ho) to that of the standard classifier (AUC_all) for each dataset, showing how signal loss varies across feature sets. In the feature set labels, the letters indicate the genome representation and the number indicates the k-mer size. Genome representations: DNA, nucleotide sequence (blue); AA, amino acid sequence of CDS regions (orange); PC, physico-chemical properties, with each amino acid residue binned into one of seven classes according to its physico-chemical properties (green); Domains, presence of Pfam domains in the sequence.
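The quantity plotted in each violin is a per-dataset ratio, AUC_ho / AUC_all, where a value of 1 means no signal is lost and values below 1 mean the holdout classifier performs worse. Below is a minimal sketch of that computation, assuming binary labels and classifier scores are already available for both classifiers on a dataset. The function names `auc` and `signal_ratio` and the toy scores are hypothetical and are not taken from the authors' pipeline. AUC is computed here via its rank interpretation: the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as one half.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of random positive > score of random negative); ties count 1/2.

    Hypothetical helper for illustration; O(n_pos * n_neg), fine for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def signal_ratio(auc_ho: float, auc_all: float) -> float:
    """Ratio AUC_ho / AUC_all for one dataset; values below 1 indicate signal loss."""
    if auc_all <= 0:
        raise ValueError("AUC_all must be positive")
    return auc_ho / auc_all


if __name__ == "__main__":
    # Toy scores (hypothetical): the holdout classifier misranks one positive.
    labels = [1, 1, 0, 0]
    auc_all = auc(labels, [0.9, 0.8, 0.3, 0.1])  # 1.0
    auc_ho = auc(labels, [0.9, 0.2, 0.3, 0.1])   # 0.75
    print(signal_ratio(auc_ho, auc_all))         # 0.75
```

Collecting one such ratio per dataset for a given feature set (for example, all ratios for AA with a fixed k) gives the sample of values summarized by one violin in the figure.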

doi: https://doi.org/10.1371/journal.pcbi.1007894.g009