Prediction of virus-host associations using protein language models and multiple instance learning
Fig 5
Confusion matrices for prokaryotic hosts (A) and eukaryotic hosts (B) based on EvoMIL.
Panels A and B show the performance of the EvoMIL model on 22 prokaryotic hosts and 36 eukaryotic hosts, respectively. Each matrix was constructed from the model's predictions on a test set comprising 20% of the dataset; the model was trained on the remaining 80%. The matrices show how accurately the model predicts the host species of the test viruses.
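As an illustration of how such a matrix can be computed, the following is a minimal sketch in plain Python, not the EvoMIL implementation. The function names, the toy host labels, and the toy predictions are hypothetical. The caption does not state whether the 80/20 split was stratified or whether the plotted values are normalized, so the random unstratified split and the row normalization shown here are assumptions. Rows correspond to true hosts and columns to predicted hosts, so correct predictions fall on the diagonal.

```python
import random


def split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle indices 0..n-1 and return (train, test) index lists.

    Assumption: a simple random split; the caption does not say
    whether the original split was stratified by host.
    """
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def confusion_matrix(y_true, y_pred, labels):
    """Count predictions: rows are true labels, columns are predicted labels."""
    position = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[position[true_label]][position[pred_label]] += 1
    return matrix


def row_normalize(matrix):
    """Divide each row by its sum so each row shows per-host proportions."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


# Hypothetical toy data: three hosts, six test viruses.
labels = ["host_A", "host_B", "host_C"]
y_true = ["host_A", "host_A", "host_B", "host_B", "host_C", "host_C"]
y_pred = ["host_A", "host_B", "host_B", "host_B", "host_C", "host_A"]

train_idx, test_idx = split_indices(10, test_fraction=0.2)
cm = confusion_matrix(y_true, y_pred, labels)
cm_norm = row_normalize(cm)

# Overall accuracy is the diagonal sum divided by the number of test samples.
accuracy = sum(cm[i][i] for i in range(len(labels))) / len(y_true)
```

With row normalization, each diagonal entry is the fraction of viruses from a given host that were assigned to that host, which keeps hosts with few test viruses comparable to hosts with many.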