Skip to main content
Advertisement

< Back to Article

Predicting host taxonomic information from viral genomes: A comparison of feature representations

Fig 3

Comparison of the results for all the eukaryote datasets across all the feature sets.

The heatmap shows that most of the feature sets contain some predictive signal, AUC > 0.5, for the majority of the eukaryote datasets and for all Baltimore groupings (indicated by the inner colour bar on the right). Each row corresponds to a dataset and are ordered by taxonomic rank (indicated by the outer colour bar on the right) and each column corresponds to a feature set. For the feature set labels the letters indicate the genome representation and the number the k-mer size. DNA—nucleotide sequence; AA—amino acid sequence of CDS regions; PC—Physio-chemical properties, each amino acid residue binned into one of seven bins based on its physio-chemical property; Domains—presence of PFAM domain in the sequence. The colour indicates the AUC score for each classifier. All AUC scores of less than 0.5 were set 0.5, i.e., no predictive signal. The number of viruses in each dataset is in brackets.

Fig 3

doi: https://doi.org/10.1371/journal.pcbi.1007894.g003